Re: RFC: drop support for gcc < 4.0

2007-08-21 Thread Robert P. J. Day
On Tue, 21 Aug 2007, Adrian Bunk wrote:

> It is an option to say "gcc >= 4.0 on i386 and >= 3.4 on all other
> architectures is required".

  if you're going to do something like that, you might as well take
the extra step and start keeping track of which versions of gcc work
with which architectures, along the lines of what dan kegel did with
the results matrix of crosstool:

http://www.kegel.com/crosstool/crosstool-0.43/buildlogs/

  i'm being only moderately facetious, of course, but on the other
hand, if there's all this anecdotal information regarding which
combinations work and which don't, maybe it's worth codifying that
into a compilation check somewhere in the build process.

  after all, at the moment in init/main.c, any gcc < 3.2 is rejected
outright, while gcc-4.1.0 generates a warning.  that's incredibly ad
hoc and certainly incomplete.  might as well just write a script for
the scripts/ directory which accepts an architecture and a version of
gcc and tells you what the current situation is and what you can do
about it.
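Robert's scripts/ idea could start as little more than a lookup table. The sketch below is in C rather than a shell script, and it encodes only the rule Adrian Bunk floated above (gcc >= 4.0 on i386, >= 3.4 elsewhere) plus the gcc-4.1.0 warning mentioned for init/main.c. The table, the function names and the "i386 versus everything else" split are illustrative assumptions, not the kernel's actual policy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-architecture minimums, encoding only the rule
 * proposed in this thread: gcc >= 4.0 on i386, >= 3.4 elsewhere. */
struct gcc_min {
	const char *arch;
	int major;
	int minor;
};

static const struct gcc_min gcc_arch_minimums[] = {
	{ "i386", 4, 0 },
};

static const struct gcc_min gcc_default_minimum = { "*", 3, 4 };

/* Return 1 if gcc major.minor meets the minimum for arch, else 0. */
static int gcc_version_ok(const char *arch, int major, int minor)
{
	const struct gcc_min *m = &gcc_default_minimum;
	size_t i;

	assert(arch != NULL);
	for (i = 0; i < sizeof(gcc_arch_minimums) / sizeof(gcc_arch_minimums[0]); i++)
		if (strcmp(gcc_arch_minimums[i].arch, arch) == 0)
			m = &gcc_arch_minimums[i];

	if (major != m->major)
		return major > m->major;
	return minor >= m->minor;
}

/* Return 1 for a release singled out as known-bad (gcc-4.1.0, which
 * init/main.c currently only warns about), else 0. */
static int gcc_version_known_bad(int major, int minor, int patchlevel)
{
	return major == 4 && minor == 1 && patchlevel == 0;
}
```

A build-time caller could feed it gcc's predefined __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ macros; tightening the policy for another architecture would then mean adding a table row rather than another #if block.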

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add I/O hypercalls for i386 paravirt

2007-08-21 Thread Zachary Amsden

Avi Kivity wrote:
> Zachary Amsden wrote:
>> In general, I/O in a virtual guest is subject to performance
>> problems.  The I/O can not be completed physically, but must be
>> virtualized.  This means trapping and decoding port I/O instructions
>> from the guest OS.  Not only is the trap for a #GP heavyweight, both
>> in the processor and the hypervisor (which usually has a complex #GP
>> path), but this forces the hypervisor to decode the individual
>> instruction which has faulted.  Worse, even with hardware assist such
>> as VT, the exit reason alone is not sufficient to determine the true
>> nature of the faulting instruction, requiring a complex and costly
>> instruction decode and simulation.
>>
>> This patch provides hypercalls for the i386 port I/O instructions,
>> which vastly helps guests which use native-style drivers.  For certain
>> VMI workloads, this provides a performance boost of up to 30%.  We
>> expect KVM and lguest to be able to achieve similar gains on I/O
>> intensive workloads.
>
> Won't these workloads be better off using paravirtualized drivers?
> i.e., do the native drivers with paravirt I/O instructions get anywhere
> near the performance of paravirt drivers?

Yes, in general, this is true (better off with paravirt drivers).
However, we have "paravirt" drivers which run in both
fully-paravirtualized and fully traditionally virtualized environments.
As a result, they use native port I/O operations to interact with
virtual hardware.

Since not all hypervisors have paravirtualized driver infrastructures
and guest O/S support yet, these hypercalls can be advantageous in a wide
range of scenarios.  Using I/O hypercalls as such gives exactly the same
performance as paravirt drivers for us, by eliminating the costly decode
path, and the simplicity of using the same driver code makes this a huge
win in code complexity.

Zach



Re: [PATCH] Console events and accessibility

2007-08-21 Thread Greg KH
On Tue, Aug 21, 2007 at 11:29:39PM +0200, Samuel Thibault wrote:
> Some external modules like Speakup need to monitor console output.
> 
> This adds a VT notifier that such modules can use to get console output 
> events:
> allocation, deallocation, writes, other updates (cursor position, switch, 
> etc.)
> 
> Signed-off-by: Samuel Thibault <[EMAIL PROTECTED]>

Will speakup work with this kind of change?

thanks,

greg k-h
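The notifier Samuel describes is a publish/subscribe hook: the VT layer emits events (allocation, deallocation, writes, other updates) and any registered module, such as Speakup, receives them. A standalone C model of that shape follows. The event names, the struct layout and the register/unregister helpers are hypothetical; an in-kernel version would presumably sit on the kernel's own notifier_block machinery rather than a hand-rolled list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes modelled on the patch description. */
enum vt_event { VT_EV_ALLOC, VT_EV_DEALLOC, VT_EV_WRITE, VT_EV_UPDATE };

struct vt_listener {
	void (*notify)(enum vt_event ev, int console, void *data);
	struct vt_listener *next;
};

static struct vt_listener *vt_listeners;	/* head of the listener list */

static void vt_listener_register(struct vt_listener *l)
{
	assert(l != NULL && l->notify != NULL);
	l->next = vt_listeners;
	vt_listeners = l;
}

static void vt_listener_unregister(struct vt_listener *l)
{
	struct vt_listener **p;

	for (p = &vt_listeners; *p != NULL; p = &(*p)->next) {
		if (*p == l) {
			*p = l->next;
			return;
		}
	}
}

/* Called by the (modelled) console code for every event. */
static void vt_emit(enum vt_event ev, int console, void *data)
{
	struct vt_listener *l;

	for (l = vt_listeners; l != NULL; l = l->next)
		l->notify(ev, console, data);
}

/* Example listener: counts console writes, as a screen reader might. */
static int vt_writes_seen;

static void count_writes(enum vt_event ev, int console, void *data)
{
	(void)console;
	(void)data;
	if (ev == VT_EV_WRITE)
		vt_writes_seen++;
}
```

A module like Speakup would register once at load time and unregister on exit, seeing every write and cursor update without further changes to the console code.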


Re: Restricting CDC-ACM devices

2007-08-21 Thread Greg KH
On Tue, Aug 21, 2007 at 05:03:54PM -0500, Nate wrote:
> I would like to use the cdc-acm driver in the Linux kernel (2.6.22-rc1),
> but restrict the access to only my VID/PID devices.  Is there an easy way
> to do this without modifying cdc-acm.c?

Why do you not want to modify the driver?

> In a past prototype I made a simple wrapper driver for usb serial by
> adding my VID/PID numbers to the wrapper driver's id_table.  Then when
> that usb driver was accessed on connection, the driver just pointed to the
> usb_serial_* functions (probe, disconnect, etc).  I tried to do the same
> with the cdc-acm driver, but the cdc-acm driver's probe function was
> called before my driver's probe.  I noticed that the cdc-acm driver will
> attach when it detects the two CDC-ACM interfaces, so I removed the
> cdc-acm driver with "make menuconfig".  This didn't work because the
> cdc-acm functions I was attempting to call from my driver do not exist.

You can disconnect the device from the driver from userspace for any
device you just don't want to have connected by using the sysfs
bind/unbind files.  That doesn't require any kernel changes at all.
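Greg's bind/unbind suggestion needs nothing more than writing an interface name into a file. A minimal C sketch follows. It assumes the driver is registered as "cdc_acm" under /sys/bus/usb/drivers/ and that the interface to detach is named something like "1-1:1.0"; both are placeholders to check against the actual sysfs tree, and writing there needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one value to a sysfs attribute file; returns 0 on success. */
static int sysfs_write(const char *path, const char *value)
{
	size_t len = strlen(value);
	FILE *f = fopen(path, "w");
	int ok;

	if (f == NULL)
		return -1;
	ok = fwrite(value, 1, len, f) == len;
	if (fclose(f) != 0)	/* buffered data, and any store() error, surface here */
		ok = 0;
	return ok ? 0 : -1;
}

/* Build "/sys/bus/usb/drivers/<driver>/<attr>"; -1 if it does not fit. */
static int usb_driver_attr_path(char *buf, size_t len,
				const char *driver, const char *attr)
{
	int n = snprintf(buf, len, "/sys/bus/usb/drivers/%s/%s", driver, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an interface name (placeholder, e.g. "1-1:1.0") to a driver's
 * "unbind" or "bind" file. */
static int usb_driver_attr_write(const char *driver, const char *attr,
				 const char *intf)
{
	char path[256];

	assert(driver != NULL && attr != NULL && intf != NULL);
	if (usb_driver_attr_path(path, sizeof(path), driver, attr) != 0)
		return -1;
	return sysfs_write(path, intf);
}
```

Called as usb_driver_attr_write("cdc_acm", "unbind", "1-1:1.0"), it detaches just that interface and leaves every other CDC-ACM device bound; writing the same name to "bind" reattaches it.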

Why do you want to do this, what are you expecting to achieve with such
a change?

thanks,

greg k-h


Re: [PATCH] Add I/O hypercalls for i386 paravirt

2007-08-21 Thread Avi Kivity
Zachary Amsden wrote:
> In general, I/O in a virtual guest is subject to performance
> problems.  The I/O can not be completed physically, but must be
> virtualized.  This means trapping and decoding port I/O instructions
> from the guest OS.  Not only is the trap for a #GP heavyweight, both
> in the processor and the hypervisor (which usually has a complex #GP
> path), but this forces the hypervisor to decode the individual
> instruction which has faulted.  Worse, even with hardware assist such
> as VT, the exit reason alone is not sufficient to determine the true
> nature of the faulting instruction, requiring a complex and costly
> instruction decode and simulation.
>
> This patch provides hypercalls for the i386 port I/O instructions,
> which vastly helps guests which use native-style drivers.  For certain
> VMI workloads, this provides a performance boost of up to 30%.  We
> expect KVM and lguest to be able to achieve similar gains on I/O
> intensive workloads.
>


Won't these workloads be better off using paravirtualized drivers? 
i.e., do the native drivers with paravirt I/O instructions get anywhere
near the performance of paravirt drivers?


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[PATCH] Add I/O hypercalls for i386 paravirt

2007-08-21 Thread Zachary Amsden
In general, I/O in a virtual guest is subject to performance problems.  
The I/O can not be completed physically, but must be virtualized.  This 
means trapping and decoding port I/O instructions from the guest OS.  
Not only is the trap for a #GP heavyweight, both in the processor and 
the hypervisor (which usually has a complex #GP path), but this forces 
the hypervisor to decode the individual instruction which has faulted.  
Worse, even with hardware assist such as VT, the exit reason alone is 
not sufficient to determine the true nature of the faulting instruction, 
requiring a complex and costly instruction decode and simulation.


This patch provides hypercalls for the i386 port I/O instructions, which 
vastly helps guests which use native-style drivers.  For certain VMI 
workloads, this provides a performance boost of up to 30%.  We expect 
KVM and lguest to be able to achieve similar gains on I/O intensive 
workloads.
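The core of the patch is an indirection: every port I/O accessor goes through a paravirt_ops slot that defaults to the native instruction and can be repointed by a backend such as VMI. The user-space model below shows only that shape. The port array, the hv_* functions and the hypercall counter are stand-ins invented for the sketch, since real in/out instructions and a real hypervisor are not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of a paravirt-style ops table for byte-wide port I/O. */
struct io_ops {
	void (*outb)(uint8_t value, uint16_t port);
	uint8_t (*inb)(uint16_t port);
};

static uint8_t fake_io_space[65536];	/* stands in for the I/O ports */
static unsigned long hypercalls;	/* counts backend calls */

/* "Native" path: in a guest this is the instruction that traps. */
static void native_outb(uint8_t value, uint16_t port) { fake_io_space[port] = value; }
static uint8_t native_inb(uint16_t port) { return fake_io_space[port]; }

/* Backend path: a real backend would issue a hypercall here, so the
 * hypervisor never has to decode a faulting instruction. */
static void hv_outb(uint8_t value, uint16_t port)
{
	hypercalls++;
	fake_io_space[port] = value;
}

static uint8_t hv_inb(uint16_t port)
{
	hypercalls++;
	return fake_io_space[port];
}

static struct io_ops io_ops = { native_outb, native_inb };

/* Same shape as the activate_vmi() hunk: only repoint the slots when
 * I/O ops have not been disabled. */
static void activate_backend(int disable_io_ops)
{
	if (!disable_io_ops) {
		io_ops.outb = hv_outb;
		io_ops.inb = hv_inb;
	}
	assert(io_ops.outb != NULL && io_ops.inb != NULL);
}
```

Because drivers only ever call through io_ops, the same driver code runs natively or under a hypervisor, which is the code-complexity argument made for the patch.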


This patch is against 2.6.23-rc2-mm2, and should be targeted for 2.6.24.

Zach

Virtualized guests in general benefit from having I/O hypercalls.  This
patch adds support for port I/O hypercalls to VMI and provides the
infrastructure for other backends to make use of this feature.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/paravirt.c b/arch/i386/kernel/paravirt.c
index ea962c0..4d0d150 100644
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -329,6 +329,18 @@ struct paravirt_ops paravirt_ops = {
 
 	.set_iopl_mask = native_set_iopl_mask,
 	.io_delay = native_io_delay,
+	.outb = native_outb,
+	.outw = native_outw,
+	.outl = native_outl,
+	.inb = native_inb,
+	.inw = native_inw,
+	.inl = native_inl,
+	.outsb = native_outsb,
+	.outsw = native_outsw,
+	.outsl = native_outsl,
+	.insb = native_insb,
+	.insw = native_insw,
+	.insl = native_insl,
 
 #ifdef CONFIG_X86_LOCAL_APIC
 	.apic_write = native_apic_write,
diff --git a/arch/i386/kernel/vmi.c b/arch/i386/kernel/vmi.c
index 44feb34..5ecd85b 100644
--- a/arch/i386/kernel/vmi.c
+++ b/arch/i386/kernel/vmi.c
@@ -56,6 +56,7 @@ static int disable_tsc;
 static int disable_mtrr;
 static int disable_noidle;
 static int disable_vmi_timer;
+static int disable_io_ops;
 
 /* Cached VMI operations */
 static struct {
@@ -72,6 +73,18 @@ static struct {
 	void (*set_initial_ap_state)(int, int);
 	void (*halt)(void);
   	void (*set_lazy_mode)(int mode);
+	void (*outb)(u8 value, u16 port);
+	void (*outw)(u16 value, u16 port);
+	void (*outl)(u32 value, u16 port);
+	u8 (*inb)(u16 port);
+	u16 (*inw)(u16 port);
+	u32 (*inl)(u16 port);
+	void (*outsb)(const void *addr, u16 port, u32 count);
+	void (*outsw)(const void *addr, u16 port, u32 count);
+	void (*outsl)(const void *addr, u16 port, u32 count);
+	void (*insb)(void *addr, u16 port, u32 count);
+	void (*insw)(void *addr, u16 port, u32 count);
+	void (*insl)(void *addr, u16 port, u32 count);
 } vmi_ops;
 
 /* Cached VMI operations */
@@ -565,6 +578,33 @@ static void vmi_set_lazy_mode(enum paravirt_lazy_mode mode)
 	}
 }
 
+#define BUILDIO(bwl,type) \
+static void vmi_out##bwl(type value, int port) { \
+	__asm__ __volatile__("call *%0" : : \
+		"r"(vmi_ops.out##bwl), "a"(value), "d"(port)); \
+} \
+static type vmi_in##bwl(int port) { \
+	type value; \
+	__asm__ __volatile__("call *%1" : \
+		"=a"(value) : \
+		"r"(vmi_ops.in##bwl), "d"(port)); \
+	return value; \
+} \
+static void vmi_outs##bwl(int port, const void *addr, unsigned long count) { \
+	__asm__ __volatile__("call *%2" : \
+		"+S"(addr), "+c"(count) : \
+		"r"(vmi_ops.outs##bwl), "d"(port)); \
+} \
+static void vmi_ins##bwl(int port, void *addr, unsigned long count) { \
+	__asm__ __volatile__("call *%2" : \
+		"+D"(addr), "+c"(count) : \
+		"r"(vmi_ops.ins##bwl), "d"(port)); \
+} 
+
+BUILDIO(b,unsigned char)
+BUILDIO(w,unsigned short)
+BUILDIO(l,unsigned int)
+
 static inline int __init check_vmi_rom(struct vrom_header *rom)
 {
 	struct pci_header *pci;
@@ -791,6 +831,21 @@ static inline int __init activate_vmi(void)
 	para_wrap(load_esp0, vmi_load_esp0, set_kernel_stack, UpdateKernelStack);
 	para_fill(set_iopl_mask, SetIOPLMask);
 	para_fill(io_delay, IODelay);
+	if (!disable_io_ops) {
+		para_wrap(inb, vmi_inb, inb, INB);
+		para_wrap(inw, vmi_inw, inw, INW);
+		para_wrap(inl, vmi_inl, inl, INL);
+		para_wrap(outb, vmi_outb, outb, OUTB);
+		para_wrap(outw, vmi_outw, outw, OUTW);
+		para_wrap(outl, vmi_outl, outl, OUTL);
+		para_wrap(insb, vmi_insb, insb, INSB);
+		para_wrap(insw, vmi_insw, insw, INSW);
+		para_wrap(insl, vmi_insl, insl, INSL);
+		para_wrap(outsb, vmi_outsb, outsb, OUTSB);
+		para_wrap(outsw, vmi_outsw, outsw, OUTSW);
+		para_wrap(outsl, vmi_outsl, outsl, OUTSL);
+	}
+
 	para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);
 
 	/* user and kernel flush are just handled with different flags to FlushTLB */
@@ -968,6 +1023,8 @@ static int __init parse_vmi(char *arg)
 		disable_noidle = 1;
 	} else if (!strcmp(arg, "disable_noidle"))
 		disable_noidle 

Re: [PATCH 11/23] make atomic_read() and atomic_set() behavior consistent on m32r

2007-08-21 Thread Hirokazu Takata
Hi, Chris,

From: Hirokazu Takata <[EMAIL PROTECTED]>
Date: Wed, 22 Aug 2007 10:56:54 +0900
> From: Chris Snook <[EMAIL PROTECTED]>
> Date: Mon, 13 Aug 2007 07:24:52 -0400
> > From: Chris Snook <[EMAIL PROTECTED]>
> > 
> > Use volatile consistently in atomic.h on m32r.
> > 
> > Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
> 
> Thanks,
> 
> Acked-by: Hirokazu Takata <[EMAIL PROTECTED]>

Hmmm.. It seems my reply was overhasty.

After applying the above patch, I get many warning messages like this:

<-- snip -->
  ...
  CC  kernel/sched.o
In file included from /project/m32r-linux/kernel/work/linux-2.6_dev.git/include/linux/netlink.h:139,
                 from /project/m32r-linux/kernel/work/linux-2.6_dev.git/include/linux/genetlink.h:4,
                 from /project/m32r-linux/kernel/work/linux-2.6_dev.git/include/net/genetlink.h:4,
                 from /project/m32r-linux/kernel/work/linux-2.6_dev.git/include/linux/taskstats_kern.h:12,
                 from /project/m32r-linux/kernel/work/linux-2.6_dev.git/include/linux/delayacct.h:21,
                 from /project/m32r-linux/kernel/work/linux-2.6_dev.git/kernel/sched.c:61:
/project/m32r-linux/kernel/work/linux-2.6_dev.git/include/linux/skbuff.h: In function 'skb_shared':
/project/m32r-linux/kernel/work/linux-2.6_dev.git/include/linux/skbuff.h:521: warning: passing argument 1 of 'atomic_read' discards qualifiers from pointer target type
  ...
<-- snip -->

In this case, it is because skb_shared() in include/linux/skbuff.h is
defined with a "const"-qualified parameter.

static inline int skb_shared(const struct sk_buff *skb)
{
	return atomic_read(&skb->users) != 1;
}

I think the parameter of atomic_read() should have a "const"
qualifier to avoid these warnings, and IMHO this modification might be
worth applying to other archs as well.
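The warning comes from passing a pointer-to-const where the callee takes a plain pointer. A user-space reduction follows: atomic_t mirrors the m32r typedef in the patch, while sk_buff_model is a hypothetical stand-in for struct sk_buff. One design note, offered as an aside: casting through const volatile int * keeps the const, whereas the patch's plain (volatile int *) cast drops it explicitly, which compiles cleanly by default but would be flagged under gcc's -Wcast-qual.

```c
#include <assert.h>

/* Same layout as the m32r typedef quoted in the patch. */
typedef struct { int counter; } atomic_t;

/* const-qualified parameter: callers holding a const pointer no longer
 * trigger "discards qualifiers"; the const volatile cast keeps const. */
static inline int atomic_read(const atomic_t *v)
{
	return *(const volatile int *)&v->counter;
}

/* Hypothetical stand-in for struct sk_buff, reduced to its refcount. */
struct sk_buff_model {
	atomic_t users;
};

/* Same body as skb_shared() in include/linux/skbuff.h. */
static inline int skb_shared(const struct sk_buff_model *skb)
{
	return atomic_read(&skb->users) != 1;
}
```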

Here is an additional patch to revise the previous one for m32r.
I also tried to rewrite it with inline asm code, but the kernel text size
became roughly 2kB larger, so I prefer the C version.

Thanks, 

-- Takata


[PATCH] m32r: Add "const" qualifier to the parameter of atomic_read()

Update atomic_read() to avoid the following gcc-4.1.x warning:
  warning: passing argument 1 of 'atomic_read' discards qualifiers
  from pointer target type

Signed-off-by: Hirokazu Takata <[EMAIL PROTECTED]>
Cc: Chris Snook <[EMAIL PROTECTED]>
---
 include/asm-m32r/atomic.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-m32r/atomic.h b/include/asm-m32r/atomic.h
index ba19689..9d46f86 100644
--- a/include/asm-m32r/atomic.h
+++ b/include/asm-m32r/atomic.h
@@ -32,7 +32,7 @@ typedef struct { int counter; } atomic_t;
  *
  * Atomically reads the value of @v.
  */
-static __inline__ int atomic_read(atomic_t *v)
+static __inline__ int atomic_read(const atomic_t *v)
 {
 	return *(volatile int *)&v->counter;
 }
-- 
1.5.2.4

--
Hirokazu Takata <[EMAIL PROTECTED]>
Linux/M32R Project:  http://www.linux-m32r.org/


[RFC][PATCH][BUGFIX] fix rcu_read_lock in page migration

2007-08-21 Thread KAMEZAWA Hiroyuki
This is a patch for the problem Shaohua reported.
It is just one idea for fixing it.
What do you think? Is a dummy vma better? (I don't like the dummy vma.)

-Kame
==
In the migration fallback path, write_page() or lock_page() is called,
which can sleep while rcu_read_lock() is held.
To avoid that, take rcu_read_lock() only if the page is anonymous (this is enough).

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>


---
 mm/migrate.c |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux-2.6.23-rc2-mm2/mm/migrate.c
===
--- linux-2.6.23-rc2-mm2.orig/mm/migrate.c
+++ linux-2.6.23-rc2-mm2/mm/migrate.c
@@ -611,6 +611,7 @@ static int unmap_and_move(new_page_t get
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, &result);
+   int rcu_locked = 0;
 
if (!newpage)
return -ENOMEM;
@@ -636,8 +637,13 @@ static int unmap_and_move(new_page_t get
 * we cannot notice that anon_vma is freed while we migrates a page.
 * This rcu_read_lock() delays freeing anon_vma pointer until the end
 * of migration. File cache pages are no problem because of page_lock()
+* File caches may use write_page() or lock_page() during migration,
+* so only anonymous pages need this protection.
 */
-   rcu_read_lock();
+   if (PageAnon(page)) {
+   rcu_read_lock();
+   rcu_locked = 1;
+   }
/*
 * This is a corner case handling.
 * When a new swap-cache is read into, it is linked to LRU
@@ -656,7 +662,8 @@ static int unmap_and_move(new_page_t get
if (rc)
remove_migration_ptes(page, page);
 rcu_unlock:
-   rcu_read_unlock();
+   if (rcu_locked)
+   rcu_read_unlock();
 
 unlock:
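The fix follows a common pattern: record whether the lock was taken and make the unlock conditional on that record, so the file-page path, which may sleep, never runs inside the read-side section. A user-space model follows; the nesting counter replaces rcu_read_lock()/rcu_read_unlock() and the anon flag replaces PageAnon(), both purely hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a nesting counter instead of the real RCU
 * read-side primitives, and a flag instead of PageAnon(). */
static int rcu_nesting;

static void model_rcu_read_lock(void) { rcu_nesting++; }

static void model_rcu_read_unlock(void)
{
	assert(rcu_nesting > 0);	/* catches an unbalanced unlock */
	rcu_nesting--;
}

struct model_page {
	bool anon;
};

/* Mirrors the patched unmap_and_move(): lock only for anonymous pages and
 * unlock only if we locked.  Returns the nesting depth seen in the middle,
 * where the file-page path may sleep and therefore needs depth 0. */
static int model_unmap_and_move(const struct model_page *page)
{
	int rcu_locked = 0;
	int depth_in_middle;

	if (page->anon) {
		model_rcu_read_lock();
		rcu_locked = 1;
	}

	depth_in_middle = rcu_nesting;	/* migration work would happen here */

	if (rcu_locked)
		model_rcu_read_unlock();
	return depth_in_middle;
}
```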
 



Re: [2.6 patch] ppc .gitignore update

2007-08-21 Thread Paul Mackerras
Adrian Bunk writes:
> From: Grant Likely <[EMAIL PROTECTED]>
> 
> arch/ppc/.gitignore shouldn't exclude arch/ppc/boot/include

Already in my for-2.6.24 and master branches.

Paul.



Re: [PATCH] Smack: Simplified Mandatory Access Control Kernel

2007-08-21 Thread Casey Schaufler

--- Kyle Moffett <[EMAIL PROTECTED]> wrote:

> On Aug 21, 2007, at 11:50:48, Casey Schaufler wrote:
> > --- Kyle Moffett <[EMAIL PROTECTED]> wrote:
> >> Well, in this case the "box" I want to secure will eventually be  
> >> running multi-user X on a multi-level-with-IPsec network.  For  
> >> that kind of protection profile, there is presently no substitute  
> >> for SELinux with some X11 patches.  AppArmor certainly doesn't  
> >> meet the confidentiality requirements (no data labelling), and  
> >> SMACK has no way of doing the very tight per-syscall security  
> >> requirements we have to meet.
> >
> > And what requirements would those be? Seriously, I've done Common  
> > Criteria and TCSEC evaluations on systems with less flexibility and  
> > granularity than Smack that included X, NFSv3, NIS, clusters, and  
> > all sorts of spiffy stuff.
> 
> These are requirements more of the "give the client warm fuzzies".

OK, that's perfectly reasonable. If the client has been sold
on the concept of SELinux the client will get warm fuzzies
only from SELinux. Security is how you feel about it, after all.
   
> On the other hand, when designing a box that could theoretically be  
> run on a semi-public unclassified network and yet still be safe  
> enough to run classified data over IPsec links, you want to give the  
> client all the warm fuzzies they ask for and more.

Yes. Of course, a little hard technology behind it doesn't hurt, either.
 
> > I mean, if the requirement is anything short of "runs SELinux" I  
> > have good reason to believe that a Smack based system is up to it.
> 
> "up to it", yes, but I think you'll find that beyond the simplest  
> policies, an SELinux policy that properly uses the SELinux  
> infrastructure will be much shorter than the equivalent SMACK policy,  

Well, I find that hard to believe. Maybe I'm only thinking of what
you would consider the simplest policies.

> not even including all the things that SELinux does and SMACK doesn't.

Of course.
 
> >> I didn't make this clear initially but that is the kind of system  
> >> I'm talking about wanting to secure some 50 million lines of code on.
> >
> > Cool. SELinux provides one approach to dealing with that, and the  
> > huge multiuser general purpose machine chuck full of legacy  
> > software hits the SELinux sweet spot.
> 
> Well, given that 99.9% of the systems people are really concerned  
> about security on are multi-user general-purpose machines chock full
> of legacy software, that seems to work just fine.

Err, no. By unit count such systems are extremely rare. There is
tremendous concern for security in your cell phone, your DVR,
your PDA, and even your toaster.

> If it's a single- 
> user box then you don't even need MAC, just a firewall, a good locked  
> rack/case/keyboard/etc, and decent physical security.

Your cell phone has really lousy physical security.

> If it's  
> entirely custom-controlled software then you can just implement the  
> "MAC" entirely in your own software.  "General-purpose" vs
> "special-purpose" is debatable, so I'll just let that one lie.

Indeed. Total control over the software on your phone is not
a competitive option for a provider.

> Replying to another email:
> >> but you wrote it in the wrong language. You wrote it in C, while
> >> you should have written it in SELinux policy language (and your  
> >> favourite scripting language as frontend).
> >
> > I have often marvelled at the notion of a simplification layer.  I  
> > believe that you build complex things on top of simple things, not  
> > the other way around.
> 
> There is no "one answer" to this question in software development.   

You're correct. Can I quote you on that?

> Generally you prioritize things based on maximizing maintainability  
> and speed and minimizing code, bugs, and complexity.  Those are often  
> both conflicting and in agreement.  Here are a few common examples of  
> simple-thing-on-complex-thing:
> ...
> 
> Look at the SELinux model again; it has the following things:
>   (A) Labels on almost-all user-visible kernel objects
>   (B) Individual access rules for almost every operation on those
>       objects
>   (C) "Transition" rules to set the label on newly created objects.
>   (D) Fundamental "constraints" which enforce hard limits on what
>       may be permitted with "allow" rules
>
> From a fundamental standpoint it's harder to get much simpler than
> that.

It's easy to get simpler than that:
(A) Labels on all objects and subjects
(B) Access rules for subjects and objects

No transformations. Operations in terms of rwx. lots simpler.
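Casey's two-part model fits in a few lines: each subject and object carries a label, and a rule table maps a (subject, object) pair to an rwx mask. The sketch below is a hypothetical illustration of that model, not Smack's actual implementation. The labels and rules are invented, and the "same label gets full access, no rule means deny" defaults are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define ACC_READ  4
#define ACC_WRITE 2
#define ACC_EXEC  1

struct access_rule {
	const char *subject;	/* label on the acting process */
	const char *object;	/* label on the file, socket, ... */
	int access;		/* permitted ACC_* bits */
};

/* Hypothetical rule set. */
static const struct access_rule rules[] = {
	{ "Secret",  "Unclass", ACC_READ | ACC_EXEC },
	{ "Unclass", "Public",  ACC_READ | ACC_WRITE },
};

/* Return 1 if every bit in request is allowed for subject on object. */
static int access_allowed(const char *subject, const char *object, int request)
{
	size_t i;

	assert(subject != NULL && object != NULL);
	if (strcmp(subject, object) == 0)	/* assumption: same label => all access */
		return 1;
	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strcmp(rules[i].subject, subject) == 0 &&
		    strcmp(rules[i].object, object) == 0)
			return (rules[i].access & request) == request;
	return 0;				/* assumption: no rule => deny */
}
```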

> On top of that model, we also have a bit of additional
> *flexibility* for MLS/RBAC, although that flexibility may be ignored
> completely.
>   (1) You can define "users" which may only assume some "roles"
>   (2) You can define "roles" which may only run in some "types"
>   (3) There's a simple way of declaring multiple "levels" and

[PATCH 10/11] cxgb3 - Firmware update

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Update the firmware version to 4.6.0.
Allow the driver to be up and running with an older FW image,
as long as its major version matches.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/common.h |2 +-
 drivers/net/cxgb3/cxgb3_main.c |9 +
 drivers/net/cxgb3/t3_hw.c  |   20 +++-
 drivers/net/cxgb3/version.h|2 +-
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index b665b20..ff867c2 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -691,7 +691,7 @@ int t3_read_flash(struct adapter *adapter, unsigned int addr,
  unsigned int nwords, u32 *data, int byte_oriented);
 int t3_load_fw(struct adapter *adapter, const u8 * fw_data, unsigned int size);
 int t3_get_fw_version(struct adapter *adapter, u32 *vers);
-int t3_check_fw_version(struct adapter *adapter);
+int t3_check_fw_version(struct adapter *adapter, int *must_load);
 int t3_init_hw(struct adapter *adapter, u32 fw_params);
 void mac_prep(struct cmac *mac, struct adapter *adapter, int index);
 void early_hw_init(struct adapter *adapter, const struct adapter_info *ai);
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 65ded16..eaebd7f 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -814,11 +814,12 @@ static int cxgb_up(struct adapter *adap)
int must_load;
 
if (!(adap->flags & FULL_INIT_DONE)) {
-   err = t3_check_fw_version(adap);
-   if (err == -EINVAL)
+   err = t3_check_fw_version(adap, &must_load);
+   if (err == -EINVAL) {
err = upgrade_fw(adap);
-   if (err)
-   goto out;
+   if (err && must_load)
+   goto out;
+   }
 
err = t3_check_tpsram_version(adap, &must_load);
if (err == -EINVAL) {
diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 63032e8..3d47627 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -957,16 +957,18 @@ int t3_get_fw_version(struct adapter *adapter, u32 *vers)
 /**
  * t3_check_fw_version - check if the FW is compatible with this driver
  * @adapter: the adapter
- *
+ * @must_load: set to 1 if loading a new FW image is required
+ *
  * Checks if an adapter's FW is compatible with the driver.  Returns 0
  * if the versions are compatible, a negative error otherwise.
  */
-int t3_check_fw_version(struct adapter *adapter)
+int t3_check_fw_version(struct adapter *adapter, int *must_load)
 {
int ret;
u32 vers;
unsigned int type, major, minor;
 
+   *must_load = 1;
ret = t3_get_fw_version(adapter, &vers);
if (ret)
return ret;
@@ -979,9 +981,17 @@ int t3_check_fw_version(struct adapter *adapter)
minor == FW_VERSION_MINOR)
return 0;
 
-   CH_ERR(adapter, "found wrong FW version(%u.%u), "
-  "driver needs version %u.%u\n", major, minor,
-  FW_VERSION_MAJOR, FW_VERSION_MINOR);
+   if (major != FW_VERSION_MAJOR)
+   CH_ERR(adapter, "found wrong FW version(%u.%u), "
+  "driver needs version %u.%u\n", major, minor,
+  FW_VERSION_MAJOR, FW_VERSION_MINOR);
+   else {
+   *must_load = 0;
+   CH_WARN(adapter, "found wrong FW minor version(%u.%u), "
+   "driver compiled for version %u.%u\n", major, minor,
+   FW_VERSION_MAJOR, FW_VERSION_MINOR);
+   }
+
return -EINVAL;
 }
 
diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h
index eb508bf..ef1c633 100644
--- a/drivers/net/cxgb3/version.h
+++ b/drivers/net/cxgb3/version.h
@@ -39,6 +39,6 @@
 
 /* Firmware version */
 #define FW_VERSION_MAJOR 4
-#define FW_VERSION_MINOR 3
+#define FW_VERSION_MINOR 6
 #define FW_VERSION_MICRO 0
 #endif /* __CHELSIO_VERSION_H */
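The policy this patch introduces boils down to two decisions: a major-version mismatch makes loading new firmware mandatory, a minor-only mismatch only warns, and cxgb_up() gives up only when a mandatory upgrade fails. The sketch below models just those decisions in plain C. It leaves out the firmware type field that the real t3_check_fw_version() also reads, and upgrade_result is a hypothetical stand-in for upgrade_fw()'s return value.

```c
#include <assert.h>
#include <errno.h>

#define FW_VERSION_MAJOR 4
#define FW_VERSION_MINOR 6

/* Returns 0 on an exact match, else -EINVAL; *must_load says whether
 * running without a matching image is unacceptable (major mismatch). */
static int check_fw_version(unsigned int major, unsigned int minor, int *must_load)
{
	*must_load = 1;
	if (major == FW_VERSION_MAJOR && minor == FW_VERSION_MINOR)
		return 0;
	if (major == FW_VERSION_MAJOR)
		*must_load = 0;	/* minor mismatch: warn and carry on */
	return -EINVAL;
}

/* Mirrors the cxgb_up() hunk: fail only if the upgrade failed AND was required. */
static int bring_up(unsigned int major, unsigned int minor, int upgrade_result)
{
	int must_load;
	int err = check_fw_version(major, minor, &must_load);

	if (err == -EINVAL) {
		err = upgrade_result;	/* stands in for upgrade_fw(adap) */
		if (err && must_load)
			return err;
	}
	return 0;
}
```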


[PATCH 11/11] cxgb3 - log and clear PEX errors

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Clear PCIe PEX errors late at module load time.
Log details when PEX errors occur.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/t3_hw.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 3d47627..538b254 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -1355,6 +1355,10 @@ static void pcie_intr_handler(struct adapter *adapter)
{0}
};
 
+   if (t3_read_reg(adapter, A_PCIE_INT_CAUSE) & F_PEXERR)
+   CH_ALERT(adapter, "PEX error code 0x%x\n",
+t3_read_reg(adapter, A_PCIE_PEX_ERR));
+
if (t3_handle_intr_status(adapter, A_PCIE_INT_CAUSE, PCIE_INTR_MASK,
  pcie_intr_info, adapter->irq_stats))
t3_fatal_err(adapter);
@@ -1806,6 +1810,8 @@ void t3_intr_clear(struct adapter *adapter)
for (i = 0; i < ARRAY_SIZE(cause_reg_addr); ++i)
t3_write_reg(adapter, cause_reg_addr[i], 0xffffffff);
 
+   if (is_pcie(adapter))
+   t3_write_reg(adapter, A_PCIE_PEX_ERR, 0xffffffff);
t3_write_reg(adapter, A_PL_INT_CAUSE0, 0xffffffff);
t3_read_reg(adapter, A_PL_INT_CAUSE0);  /* flush */
 }
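The two hunks follow a check, log, clear flow: test the PEX error bit in the PCIe cause register, report the detail register if it is set, and clear both when interrupts are reset. A minimal model of that flow follows. The write-one-to-clear behaviour, the bit position and the register struct are assumptions made for illustration, not taken from T3 documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register model; write-one-to-clear is assumed. */
struct w1c_reg {
	uint32_t value;
};

static void w1c_write(struct w1c_reg *r, uint32_t mask)
{
	r->value &= ~mask;	/* each 1 bit written clears that bit */
}

#define MODEL_F_PEXERR (1u << 1)	/* hypothetical bit position */

/* Mirrors the pcie_intr_handler() hunk: if the PEX error bit is set in the
 * cause register, report the detail register.  Returns 1 if it logged. */
static int log_pex_error(const struct w1c_reg *cause, const struct w1c_reg *pex_err)
{
	assert(cause != NULL && pex_err != NULL);
	if (!(cause->value & MODEL_F_PEXERR))
		return 0;
	fprintf(stderr, "PEX error code 0x%x\n", (unsigned int)pex_err->value);
	return 1;
}
```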


[PATCH 8/11] cxgb3 - Update internal memory management

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Set the PM1 internal memory to round-robin mode.
This balances access to this internal memory on multiport adapters.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/regs.h  |2 ++
 drivers/net/cxgb3/t3_hw.c |2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h
index 2824278..5e1bc0d 100644
--- a/drivers/net/cxgb3/regs.h
+++ b/drivers/net/cxgb3/regs.h
@@ -1326,6 +1326,7 @@
 #define V_D0_WEIGHT(x) ((x) << S_D0_WEIGHT)
 
 #define A_PM1_RX_CFG 0x5c0
+#define A_PM1_RX_MODE 0x5c4
 
 #define A_PM1_RX_INT_ENABLE 0x5d8
 
@@ -1394,6 +1395,7 @@
 #define A_PM1_RX_INT_CAUSE 0x5dc
 
 #define A_PM1_TX_CFG 0x5e0
+#define A_PM1_TX_MODE 0x5e4
 
 #define A_PM1_TX_INT_ENABLE 0x5f8
 
diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index 23b1a16..13bfbec 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -3189,6 +3189,8 @@ int t3_init_hw(struct adapter *adapter, u32 fw_params)
t3_set_reg_field(adapter, A_PCIX_CFG, 0, F_CLIDECEN);
 
t3_write_reg(adapter, A_PM1_RX_CFG, 0xffffffff);
+   t3_write_reg(adapter, A_PM1_RX_MODE, 0);
+   t3_write_reg(adapter, A_PM1_TX_MODE, 0);
init_hw_for_avail_ports(adapter, adapter->params.nports);
t3_sge_init(adapter, &adapter->params.sge);
 


[PATCH 9/11] cxgb3 - engine microcode update

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Load the microcode engine when the interface
is configured up.
Bump the version up to 1.1.0.
Allow the driver to be up and running with
older microcode images.
Allow ethtool to log the microcode version.
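The protocol SRAM image that the new update_tpsram() asks request_firmware() for is chosen purely from the chip revision and the TP_VERSION_* constants. The self-contained sketch below reproduces just that name construction, with the enum values and format string copied from the patch. tpsram_name() is a hypothetical helper; where the patch simply skips the SRAM update for an unknown revision, this helper reports -1 instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the chip-revision enum in the patch's common.h. */
enum { T3_REV_A = 0, T3_REV_B = 2, T3_REV_B2 = 3, T3_REV_C = 4 };

#define TPSRAM_NAME "t3%c_protocol_sram-%d.%d.%d.bin"

/* Same mapping as the patch's t3rev2char(); 0 means "unknown revision". */
static char t3rev2char(int rev)
{
	switch (rev) {
	case T3_REV_A:
		return 'a';
	case T3_REV_B:
	case T3_REV_B2:
		return 'b';
	case T3_REV_C:
		return 'c';
	default:
		return 0;
	}
}

/* Hypothetical helper: format the image name, or return -1 if the
 * revision is unknown or buf is too small. */
static int tpsram_name(char *buf, size_t len, int rev,
		       int major, int minor, int micro)
{
	char c = t3rev2char(rev);
	int n;

	assert(buf != NULL);
	if (c == 0)
		return -1;
	n = snprintf(buf, len, TPSRAM_NAME, c, major, minor, micro);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the 1.1.0 TP version from this patch, a rev B2 adapter would look for t3b_protocol_sram-1.1.0.bin.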

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/common.h |8 ++-
 drivers/net/cxgb3/cxgb3_main.c |  116 
 drivers/net/cxgb3/t3_hw.c  |   43 +--
 3 files changed, 113 insertions(+), 54 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index d54446f..b665b20 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -127,8 +127,8 @@ enum {  /* adapter interrupt-maintained statistics */
 
 enum {
TP_VERSION_MAJOR= 1,
-   TP_VERSION_MINOR= 0,
-   TP_VERSION_MICRO= 44
+   TP_VERSION_MINOR= 1,
+   TP_VERSION_MICRO= 0
 };
 
 #define S_TP_VERSION_MAJOR 16
@@ -438,6 +438,7 @@ enum {  /* chip revisions */
T3_REV_A  = 0,
T3_REV_B  = 2,
T3_REV_B2 = 3,
+   T3_REV_C  = 4,
 };
 
 struct trace_params {
@@ -682,7 +683,8 @@ const struct adapter_info *t3_get_adapter_info(unsigned int board_id);
 int t3_seeprom_read(struct adapter *adapter, u32 addr, u32 *data);
 int t3_seeprom_write(struct adapter *adapter, u32 addr, u32 data);
 int t3_seeprom_wp(struct adapter *adapter, int enable);
-int t3_check_tpsram_version(struct adapter *adapter);
+int t3_get_tp_version(struct adapter *adapter, u32 *vers);
+int t3_check_tpsram_version(struct adapter *adapter, int *must_load);
 int t3_check_tpsram(struct adapter *adapter, u8 *tp_ram, unsigned int size);
 int t3_set_proto_sram(struct adapter *adap, u8 *data);
 int t3_read_flash(struct adapter *adapter, unsigned int addr,
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index e5744e7..65ded16 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -721,6 +721,7 @@ static void bind_qsets(struct adapter *adap)
 }
 
 #define FW_FNAME "t3fw-%d.%d.%d.bin"
+#define TPSRAM_NAME "t3%c_protocol_sram-%d.%d.%d.bin"
 
 static int upgrade_fw(struct adapter *adap)
 {
@@ -742,6 +743,61 @@ static int upgrade_fw(struct adapter *adap)
return ret;
 }
 
+static inline char t3rev2char(struct adapter *adapter)
+{
+   char rev = 0;
+
+   switch(adapter->params.rev) {
+   case T3_REV_A:
+   rev = 'a';
+   break;
+   case T3_REV_B:
+   case T3_REV_B2:
+   rev = 'b';
+   break;
+   case T3_REV_C:
+   rev = 'c';
+   break;
+   }
+   return rev;
+}
+
+int update_tpsram(struct adapter *adap)
+{
+   const struct firmware *tpsram;
+   char buf[64];
+   struct device *dev = &adap->pdev->dev;
+   int ret;
+   char rev;
+   
+   rev = t3rev2char(adap);
+   if (!rev)
+   return 0;
+
+   snprintf(buf, sizeof(buf), TPSRAM_NAME, rev,
+TP_VERSION_MAJOR, TP_VERSION_MINOR, TP_VERSION_MICRO);
+
+   ret = request_firmware(&tpsram, buf, dev);
+   if (ret < 0) {
+   dev_err(dev, "could not load TP SRAM: unable to load %s\n",
+   buf);
+   return ret;
+   }
+   
+   ret = t3_check_tpsram(adap, tpsram->data, tpsram->size);
+   if (ret)
+   goto release_tpsram;
+
+   ret = t3_set_proto_sram(adap, tpsram->data);
+   if (ret)
+   dev_err(dev, "loading protocol SRAM failed\n");
+
+release_tpsram:
+   release_firmware(tpsram);
+   
+   return ret;
+}
+
 /**
  * cxgb_up - enable the adapter
  * @adapter: adapter being enabled
@@ -755,6 +811,7 @@ static int upgrade_fw(struct adapter *adap)
 static int cxgb_up(struct adapter *adap)
 {
int err = 0;
+   int must_load;
 
if (!(adap->flags & FULL_INIT_DONE)) {
err = t3_check_fw_version(adap);
@@ -763,6 +820,13 @@ static int cxgb_up(struct adapter *adap)
if (err)
goto out;
 
+   err = t3_check_tpsram_version(adap, &must_load);
+   if (err == -EINVAL) {
+   err = update_tpsram(adap);
+   if (err && must_load)
+   goto out;
+   }
+
err = init_dummy_netdevs(adap);
if (err)
goto out;
@@ -1097,9 +1161,11 @@ static int get_eeprom_len(struct net_device *dev)
 static void get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
 {
u32 fw_vers = 0;
+   u32 tp_vers = 0;
struct adapter *adapter = dev->priv;
 
t3_get_fw_version(adapter, &fw_vers);
+   t3_get_tp_version(adapter, &tp_vers);
 
strcpy(info->driver, DRV_NAME);
strcpy(info->version, DRV_VERSION);
@@ 

[PATCH 6/11 RESEND] cxgb3 - Fatal error update

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Stop the MAC when a fatal error is detected.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/cxgb3_main.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index dc5d269..a1f94cf 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -2270,6 +2270,10 @@ void t3_fatal_err(struct adapter *adapter)
 
if (adapter->flags & FULL_INIT_DONE) {
t3_sge_stop(adapter);
+   t3_write_reg(adapter, A_XGM_TX_CTRL, 0);
+   t3_write_reg(adapter, A_XGM_RX_CTRL, 0);
+   t3_write_reg(adapter, XGM_REG(A_XGM_TX_CTRL, 1), 0);
+   t3_write_reg(adapter, XGM_REG(A_XGM_RX_CTRL, 1), 0);
t3_intr_disable(adapter);
}
CH_ALERT(adapter, "encountered fatal error, operation suspended\n");


[PATCH 7/11 RESEND] cxgb3 - log adapter serial number

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Log HW serial number when cxgb3 module is loaded.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/common.h |2 ++
 drivers/net/cxgb3/cxgb3_main.c |6 --
 drivers/net/cxgb3/t3_hw.c  |3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index 55922ed..d54446f 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -97,6 +97,7 @@ enum {
MAX_NPORTS = 2, /* max # of ports */
MAX_FRAME_SIZE = 10240, /* max MAC frame size, including header + FCS */
EEPROMSIZE = 8192,  /* Serial EEPROM size */
+   SERNUM_LEN = 16,/* Serial # length */
RSS_TABLE_SIZE = 64,/* size of RSS lookup and mapping tables */
TCB_SIZE = 128, /* TCB size */
NMTUS = 16, /* size of MTU table */
@@ -391,6 +392,7 @@ struct vpd_params {
unsigned int uclk;
unsigned int mdc;
unsigned int mem_timing;
+   u8 sn[SERNUM_LEN + 1];
u8 eth_base[6];
u8 port_type[MAX_NPORTS];
unsigned short xauicfg[2];
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index a1f94cf..e5744e7 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -2333,10 +2333,12 @@ static void __devinit print_port_info(struct adapter *adap,
   (adap->flags & USING_MSIX) ? " MSI-X" :
   (adap->flags & USING_MSI) ? " MSI" : "");
if (adap->name == dev->name && adap->params.vpd.mclk)
-   printk(KERN_INFO "%s: %uMB CM, %uMB PMTX, %uMB PMRX\n",
+   printk(KERN_INFO
+  "%s: %uMB CM, %uMB PMTX, %uMB PMRX, S/N: %s\n",
   adap->name, t3_mc7_size(&adap->cm) >> 20,
   t3_mc7_size(&adap->pmtx) >> 20,
-  t3_mc7_size(&adap->pmrx) >> 20);
+  t3_mc7_size(&adap->pmrx) >> 20,
+  adap->params.vpd.sn);
}
 }
 
diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
index dd3149d..23b1a16 100644
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -505,7 +505,7 @@ struct t3_vpd {
u8 vpdr_len[2];
VPD_ENTRY(pn, 16);  /* part number */
VPD_ENTRY(ec, 16);  /* EC level */
-   VPD_ENTRY(sn, 16);  /* serial number */
+   VPD_ENTRY(sn, SERNUM_LEN); /* serial number */
VPD_ENTRY(na, 12);  /* MAC address base */
VPD_ENTRY(cclk, 6); /* core clock */
VPD_ENTRY(mclk, 6); /* mem clock */
@@ -648,6 +648,7 @@ static int get_vpd_params(struct adapter *adapter, struct vpd_params *p)
p->uclk = simple_strtoul(vpd.uclk_data, NULL, 10);
p->mdc = simple_strtoul(vpd.mdc_data, NULL, 10);
p->mem_timing = simple_strtoul(vpd.mt_data, NULL, 10);
+   memcpy(p->sn, vpd.sn_data, SERNUM_LEN);
 
/* Old eeproms didn't have port information */
if (adapter->params.rev == 0 && !vpd.port0_data[0]) {


[PATCH 3/11 RESEND] cxgb3 - use immediate data for offload Tx

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Send small TX_DATA work requests as immediate data even when
there are fragments. This avoids doing multiple DMAs for
small fragmented packets.
The driver already implements this optimization for small
contiguous packets.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/sge.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 9213cda..dca2716 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -1182,8 +1182,8 @@ int t3_eth_xmit(struct sk_buff *skb, struct net_device *dev)
  *
  * Writes a packet as immediate data into a Tx descriptor.  The packet
  * contains a work request at its beginning.  We must write the packet
- * carefully so the SGE doesn't read accidentally before it's written in
- * its entirety.
+ * carefully so the SGE doesn't read it accidentally before it's written
+ * in its entirety.
  */
 static inline void write_imm(struct tx_desc *d, struct sk_buff *skb,
 unsigned int len, unsigned int gen)
@@ -1191,7 +1191,11 @@ static inline void write_imm(struct tx_desc *d, struct sk_buff *skb,
struct work_request_hdr *from = (struct work_request_hdr *)skb->data;
struct work_request_hdr *to = (struct work_request_hdr *)d;
 
-   memcpy(&to[1], &from[1], len - sizeof(*from));
+   if (likely(!skb->data_len))
+   memcpy(&to[1], &from[1], len - sizeof(*from));
+   else
+   skb_copy_bits(skb, sizeof(*from), &to[1], len - sizeof(*from));
+
to->wr_hi = from->wr_hi | htonl(F_WR_SOP | F_WR_EOP |
V_WR_BCNTLFLT(len & 7));
wmb();
@@ -1261,7 +1265,7 @@ static inline void reclaim_completed_tx_imm(struct sge_txq *q)
 
 static inline int immediate(const struct sk_buff *skb)
 {
-   return skb->len <= WR_LEN && !skb->data_len;
+   return skb->len <= WR_LEN;
 }
 
 /**
@@ -1467,12 +1471,13 @@ static void write_ofld_wr(struct adapter *adap, struct sk_buff *skb,
  */
 static inline unsigned int calc_tx_descs_ofld(const struct sk_buff *skb)
 {
-   unsigned int flits, cnt = skb_shinfo(skb)->nr_frags;
+   unsigned int flits, cnt;
 
-   if (skb->len <= WR_LEN && cnt == 0)
+   if (skb->len <= WR_LEN)
return 1;   /* packet fits as immediate data */
 
flits = skb_transport_offset(skb) / 8;  /* headers */
+   cnt = skb_shinfo(skb)->nr_frags;
if (skb->tail != skb->transport_header)
cnt++;
return flits_to_desc(flits + sgl_len(cnt));


[PATCH 4/11 RESEND] cxgb3 - Expose HW memory page info

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

A HW issue requires limiting the receive window size
to 23 pages of internal memory.
These pages can be configured to different sizes,
thus the RDMA driver needs to know the
page size to enforce the upper limit.

Also assign explicit enum values.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/cxgb3_ctl_defs.h |   52 +---
 drivers/net/cxgb3/cxgb3_offload.c  |7 +
 2 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_ctl_defs.h b/drivers/net/cxgb3/cxgb3_ctl_defs.h
index 2095dda..6c4f320 100644
--- a/drivers/net/cxgb3/cxgb3_ctl_defs.h
+++ b/drivers/net/cxgb3/cxgb3_ctl_defs.h
@@ -33,27 +33,29 @@
 #define _CXGB3_OFFLOAD_CTL_DEFS_H
 
 enum {
-   GET_MAX_OUTSTANDING_WR,
-   GET_TX_MAX_CHUNK,
-   GET_TID_RANGE,
-   GET_STID_RANGE,
-   GET_RTBL_RANGE,
-   GET_L2T_CAPACITY,
-   GET_MTUS,
-   GET_WR_LEN,
-   GET_IFF_FROM_MAC,
-   GET_DDP_PARAMS,
-   GET_PORTS,
-
-   ULP_ISCSI_GET_PARAMS,
-   ULP_ISCSI_SET_PARAMS,
-
-   RDMA_GET_PARAMS,
-   RDMA_CQ_OP,
-   RDMA_CQ_SETUP,
-   RDMA_CQ_DISABLE,
-   RDMA_CTRL_QP_SETUP,
-   RDMA_GET_MEM,
+   GET_MAX_OUTSTANDING_WR  = 0,
+   GET_TX_MAX_CHUNK= 1,
+   GET_TID_RANGE   = 2,
+   GET_STID_RANGE  = 3,
+   GET_RTBL_RANGE  = 4,
+   GET_L2T_CAPACITY= 5,
+   GET_MTUS= 6,
+   GET_WR_LEN  = 7,
+   GET_IFF_FROM_MAC= 8,
+   GET_DDP_PARAMS  = 9,
+   GET_PORTS   = 10,
+
+   ULP_ISCSI_GET_PARAMS= 11,
+   ULP_ISCSI_SET_PARAMS= 12,
+
+   RDMA_GET_PARAMS = 13,
+   RDMA_CQ_OP  = 14,
+   RDMA_CQ_SETUP   = 15,
+   RDMA_CQ_DISABLE = 16,
+   RDMA_CTRL_QP_SETUP  = 17,
+   RDMA_GET_MEM= 18,
+
+   GET_RX_PAGE_INFO= 50,
 };
 
 /*
@@ -161,4 +163,12 @@ struct rdma_ctrlqp_setup {
unsigned long long base_addr;
unsigned int size;
 };
+
+/*
+ * Offload TX/RX page information.
+ */
+struct ofld_page_info {
+   unsigned int page_size;  /* Page size, should be a power of 2 */
+   unsigned int num;/* Number of pages */
+};
 #endif /* _CXGB3_OFFLOAD_CTL_DEFS_H */
diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
index e620ed4..522c1be 100644
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -317,6 +317,8 @@ static int cxgb_offload_ctl(struct t3cdev *tdev, unsigned int req, void *data)
struct iff_mac *iffmacp;
struct ddp_params *ddpp;
struct adap_ports *ports;
+   struct ofld_page_info *rx_page_info;
+   struct tp_params *tp = &adapter->params.tp;
int i;
 
switch (req) {
@@ -382,6 +384,11 @@ static int cxgb_offload_ctl(struct t3cdev *tdev, unsigned int req, void *data)
if (!offload_running(adapter))
return -EAGAIN;
return cxgb_rdma_ctl(adapter, req, data);
+   case GET_RX_PAGE_INFO:
+   rx_page_info = data;
+   rx_page_info->page_size = tp->rx_pg_size;
+   rx_page_info->num = tp->rx_num_pgs;
+   break;
default:
return -EOPNOTSUPP;
}


[PATCH 5/11 RESEND] cxgb3 - tighten checks on TID values

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Enforce validity checks on connection ids

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/cxgb3_defs.h|   20 ++--
 drivers/net/cxgb3/cxgb3_offload.c |   28 +++-
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_defs.h b/drivers/net/cxgb3/cxgb3_defs.h
index 483a594..45e9216 100644
--- a/drivers/net/cxgb3/cxgb3_defs.h
+++ b/drivers/net/cxgb3/cxgb3_defs.h
@@ -79,9 +79,17 @@ static inline struct t3c_tid_entry *lookup_tid(const struct tid_info *t,
 static inline struct t3c_tid_entry *lookup_stid(const struct tid_info *t,
unsigned int tid)
 {
+   union listen_entry *e;
+
if (tid < t->stid_base || tid >= t->stid_base + t->nstids)
return NULL;
-   return &(stid2entry(t, tid)->t3c_tid);
+
+   e = stid2entry(t, tid);
+   if ((void *)e->next >= (void *)t->tid_tab &&
+   (void *)e->next < (void *)&t->atid_tab[t->natids])
+   return NULL;
+
+   return &e->t3c_tid;
 }
 
 /*
@@ -90,9 +98,17 @@ static inline struct t3c_tid_entry *lookup_stid(const struct tid_info *t,
 static inline struct t3c_tid_entry *lookup_atid(const struct tid_info *t,
unsigned int tid)
 {
+   union active_open_entry *e;
+
if (tid < t->atid_base || tid >= t->atid_base + t->natids)
return NULL;
-   return &(atid2entry(t, tid)->t3c_tid);
+
+   e = atid2entry(t, tid);
+   if ((void *)e->next >= (void *)t->tid_tab &&
+   (void *)e->next < (void *)&t->atid_tab[t->natids])
+   return NULL;
+
+   return &e->t3c_tid;
 }
 
 int process_rx(struct t3cdev *dev, struct sk_buff **skbs, int n);
diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
index 522c1be..7fb526a 100644
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -57,7 +57,7 @@ static DEFINE_RWLOCK(adapter_list_lock);
 static LIST_HEAD(adapter_list);
 
 static const unsigned int MAX_ATIDS = 64 * 1024;
-static const unsigned int ATID_BASE = 0x10;
+static const unsigned int ATID_BASE = 0x1;
 
 static inline int offload_activated(struct t3cdev *tdev)
 {
@@ -684,10 +684,19 @@ static int do_cr(struct t3cdev *dev, struct sk_buff *skb)
 {
struct cpl_pass_accept_req *req = cplhdr(skb);
unsigned int stid = G_PASS_OPEN_TID(ntohl(req->tos_tid));
+   struct tid_info *t = &(T3C_DATA(dev))->tid_maps;
struct t3c_tid_entry *t3c_tid;
+   unsigned int tid = GET_TID(req);
 
-   t3c_tid = lookup_stid(&(T3C_DATA(dev))->tid_maps, stid);
-   if (t3c_tid->ctx && t3c_tid->client->handlers &&
+   if (unlikely(tid >= t->ntids)) {
+   printk("%s: passive open TID %u too large\n",
+  dev->name, tid);
+   t3_fatal_err(tdev2adap(dev));
+   return CPL_RET_BUF_DONE;
+   }
+   
+   t3c_tid = lookup_stid(t, stid);
+   if (t3c_tid && t3c_tid->ctx && t3c_tid->client->handlers &&
t3c_tid->client->handlers[CPL_PASS_ACCEPT_REQ]) {
return t3c_tid->client->handlers[CPL_PASS_ACCEPT_REQ]
(dev, skb, t3c_tid->ctx);
@@ -769,16 +778,25 @@ static int do_act_establish(struct t3cdev *dev, struct sk_buff *skb)
 {
struct cpl_act_establish *req = cplhdr(skb);
unsigned int atid = G_PASS_OPEN_TID(ntohl(req->tos_tid));
+   struct tid_info *t = &(T3C_DATA(dev))->tid_maps;
struct t3c_tid_entry *t3c_tid;
+   unsigned int tid = GET_TID(req);
 
-   t3c_tid = lookup_atid(&(T3C_DATA(dev))->tid_maps, atid);
+   if (unlikely(tid >= t->ntids)) {
+   printk("%s: active establish TID %u too large\n",
+  dev->name, tid);
+   t3_fatal_err(tdev2adap(dev));
+   return CPL_RET_BUF_DONE;
+   }
+
+   t3c_tid = lookup_atid(t, atid);
if (t3c_tid && t3c_tid->ctx && t3c_tid->client->handlers &&
t3c_tid->client->handlers[CPL_ACT_ESTABLISH]) {
return t3c_tid->client->handlers[CPL_ACT_ESTABLISH]
(dev, skb, t3c_tid->ctx);
} else {
printk(KERN_ERR "%s: received clientless CPL command 0x%x\n",
-  dev->name, CPL_PASS_ACCEPT_REQ);
+  dev->name, CPL_ACT_ESTABLISH);
return CPL_RET_BUF_DONE | CPL_RET_BAD_MSG;
}
 }


[PATCH 2/11 RESEND] cxgb3 - SGE doorbell overflow warning

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Log doorbell FIFO overflow.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/regs.h |8 
 drivers/net/cxgb3/sge.c  |4 
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h
index aa80313..2824278 100644
--- a/drivers/net/cxgb3/regs.h
+++ b/drivers/net/cxgb3/regs.h
@@ -172,6 +172,14 @@
 
 #define A_SG_INT_CAUSE 0x5c
 
+#define S_HIPIODRBDROPERR    11
+#define V_HIPIODRBDROPERR(x) ((x) << S_HIPIODRBDROPERR)
+#define F_HIPIODRBDROPERR    V_HIPIODRBDROPERR(1U)
+
+#define S_LOPIODRBDROPERR    10
+#define V_LOPIODRBDROPERR(x) ((x) << S_LOPIODRBDROPERR)
+#define F_LOPIODRBDROPERR    V_LOPIODRBDROPERR(1U)
+
 #define S_RSPQDISABLED    3
 #define V_RSPQDISABLED(x) ((x) << S_RSPQDISABLED)
 #define F_RSPQDISABLED    V_RSPQDISABLED(1U)
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index a2cfd68..9213cda 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -2476,6 +2476,10 @@ void t3_sge_err_intr_handler(struct adapter *adapter)
 "(0x%x)\n", (v >> S_RSPQ0DISABLED) & 0xff);
}
 
+   if (status & (F_HIPIODRBDROPERR | F_LOPIODRBDROPERR))
+   CH_ALERT(adapter, "SGE dropped %s priority doorbell\n",
+status & F_HIPIODRBDROPERR ? "high" : "lo");
+
t3_write_reg(adapter, A_SG_INT_CAUSE, status);
if (status & (F_RSPQCREDITOVERFOW | F_RSPQDISABLED))
t3_fatal_err(adapter);


[PATCH 1/11 RESEND] cxgb3 - Update rx coalescing length

2007-08-21 Thread Divy Le Ray
From: Divy Le Ray <[EMAIL PROTECTED]>

Reduce Rx coalescing length to 12288.
Large bursts from the adapter to the host create back pressure
on the chip. Reducing the burst size avoids the issue.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/common.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index c46c249..55922ed 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -104,7 +104,7 @@ enum {
PROTO_SRAM_LINES = 128, /* size of TP sram */
 };
 
-#define MAX_RX_COALESCING_LEN 16224U
+#define MAX_RX_COALESCING_LEN 12288U
 
 enum {
PAUSE_RX = 1 << 0,


[PATCH 0/11] cxgb3 - driver updates

2007-08-21 Thread Divy Le Ray

Jeff,

I'm resubmitting the last cxgb3 patch series against netdev-2.6#upstream,
minus the first patch that you already applied and the last patch.

Here is a brief description:
-   Modify max HW Rx coalescing size
-   Log SGE doorbell FIFO overflow
-   Use Tx immediate data for offload packets whenever possible
-   RDMA can get internal mem info to work around HW issues
-   More validity checks on connection ids
-   Stop MAC when a fatal error is detected
-   Log HW serial number
-   Update internal mem operating mode
-   Update engine microcode management, version is now 1.1.0
-   Update FW management, version is now 4.6.0
-   Ignore some HW errors until the HW is initialized

Cheers,
Divy





Re: [PATCH] Smack: Simplified Mandatory Access Control Kernel

2007-08-21 Thread Kyle Moffett

On Aug 21, 2007, at 11:50:48, Casey Schaufler wrote:

> --- Kyle Moffett <[EMAIL PROTECTED]> wrote:
>> Well, in this case the "box" I want to secure will eventually be
>> running multi-user X on a multi-level-with-IPsec network.  For
>> that kind of protection profile, there is presently no substitute
>> for SELinux with some X11 patches.  AppArmor certainly doesn't
>> meet the confidentiality requirements (no data labelling), and
>> SMACK has no way of doing the very tight per-syscall security
>> requirements we have to meet.


> And what requirements would those be? Seriously, I've done Common
> Criteria and TCSEC evaluations on systems with less flexibility and
> granularity than Smack that included X, NFSv3, NIS, clusters, and
> all sorts of spiffy stuff.


These are requirements more of the "give the client warm fuzzies"
variety.  On the other hand, when designing a box that could theoretically be
run on a semi-public unclassified network and yet still be safe
enough to run classified data over IPsec links, you want to give the
client all the warm fuzzies they ask for and more.



> I mean, if the requirement is anything short of "runs SELinux" I
> have good reason to believe that a Smack based system is up to it.


"up to it", yes, but I think you'll find that beyond the simplest  
policies, an SELinux policy that properly uses the SELinux  
infrastructure will be much shorter than the equivalent SMACK policy,  
not even including all the things that SELinux does and SMACK doesn't.



>> I didn't make this clear initially but that is the kind of system
>> I'm talking about wanting to secure some 50 million lines of code on.


> Cool. SELinux provides one approach to dealing with that, and the
> huge multiuser general purpose machine chock full of legacy
> software hits the SELinux sweet spot.


Well, given that 99.9% of the systems people are really concerned
about security on are multi-user general-purpose machines chock full
of legacy software, that seems to work just fine.  If it's a
single-user box then you don't even need MAC, just a firewall, a good locked
rack/case/keyboard/etc, and decent physical security.  If it's
entirely custom-controlled software then you can just implement the
"MAC" entirely in your own software.  "General-purpose" vs
"special-purpose" is debatable, so I'll just let that one lie.


Replying to another email:
>> but you wrote it in the wrong language. You wrote it in C, while
>> you should have written it in SELinux policy language (and your
>> favourite scripting language as frontend).


> I have often marvelled at the notion of a simplification layer.  I
> believe that you build complex things on top of simple things, not
> the other way around.


There is no "one answer" to this question in software development.
Generally you prioritize things based on maximizing maintainability
and speed and minimizing code, bugs, and complexity.  Those goals
sometimes conflict and sometimes agree.  Here are a few common examples of
simple-thing-on-complex-thing:

  *  pthreads on top of clone()
  *  open(some_string) on top of all the complex VFS machinery
  *  "netcat" on top of the vast Linux network stack including  
support for arbitrary packet filtering and transformation.


In addition, "simple" is undesirable if it makes the implementation  
less generic for no good reason.  Would you want to use the "simple"  
MS Windows disk-drive model under Linux?  Every disk is its own  
letter and has its files under it.  Oh, you wanted to mount a  
filesystem over C:\tmp?  Sorry, we don't support that, too bad.   
Under Linux we have a very flexible and powerful VFS which lets you  
do very crazy things, and then for the user's convenience we have  
various "simple" interfaces (like Gnome/KDE/XFCE).


Software development is very much about finding the Right Model(TM)
to underlie the system, and then building any simplifications-to-the-user
on top of the very simple model.


Look at the SELinux model again; it has the following things:
  (A) Labels on almost-all user-visible kernel objects
  (B) Individual access rules for almost every operation on those objects
  (C) "Transition" rules to set the label on newly created objects
  (D) Fundamental "constraints" which enforce hard limits on what may be permitted with "allow" rules


From a fundamental standpoint it's harder to get much simpler than  
that.  On top of that model, we also have a bit of additional  
*flexibility* for MLS/RBAC, although that flexibility may be ignored  
completely.

  (1) You can define "users" which may only assume some "roles"
  (2) You can define "roles" which may only run in some "types"
  (3) There's a simple way of declaring multiple "levels" and "dominance".


So you see, SELinux is a pretty fundamental description of the  
degrees of flexibility needed to secure everything.  That kind of  
FUNDAMENTAL description is what belongs in the kernel.  Anything else  
can and should be built on top with 

Re: [linux-usb-devel] [4/4] 2.6.23-rc3: known regressions

2007-08-21 Thread Linus Torvalds


On Tue, 21 Aug 2007, Linus Torvalds wrote:
> 
> Side note: after reverting 196705c9bb I can't get the mouse to skip any 
> more on that mac mini. But since the bad behaviour wasn't 100% reliable to 
> begin with, that's not really a guarantee of anything. Two out of three 
> kids are off on camp this week, so that machine probably won't be getting 
> a lot of testing ;/

Well, my one remaining child said today that "I got so much time on 
webkinz today - yesterday the mouse locked up after five minutes".

Apparently it hadn't had the mouse lock up at all today.

So I really do believe that that 196705c9bb commit caused problems on 
intel-only USB machines too ("ondemand" cpufreq governor, switching 
between 1.0-1.66 GHz using acpi-cpufreq: totally bog-standard in all 
respects, in other words).

Linus


Re: [autofs] [PATCH] autofs4: reinstate negative timeout if mount fails

2007-08-21 Thread Ian Kent
On Wed, 2007-08-22 at 10:56 +0800, Ian Kent wrote:
> On Tue, 2007-08-21 at 13:15 -0700, Andrew Morton wrote:
> > 
> > It seems to use a lot of list_for_each[_safe] which could
> > have been coded as list_for_each_entry[_safe], btw.
> 
> Mmm .. good point. I've not noticed the list_for_each_entry* macros.

A good idea but that change would cover more than just this patch so I'd
rather leave the patch as is and submit a cleanup patch to cover this
later.

Ian




Re: How to learn Linux Kernel Programming

2007-08-21 Thread Glauber de Oliveira Costa
On 8/21/07, Noud Aldenhoven <[EMAIL PROTECTED]> wrote:

> I'm a simple Math/Computer Science student and would like to learn
> more about Linux and its kernel.
> To be more precise, I'd like to learn how to program in the Linux kernel
> and maybe become a developer,
> if everything goes fine.
> But where do I start? Almost all information I found on the Internet
> is from before 2005 and I think that
> means it's out-of-date. Is there up-to-date documentation that is
> useful to read and explains how
> the kernel is built? (For example, is /usr/src/linux/Documentation a
> useful dir?)

Besides the sources already mentioned, there are a couple of quite
good books. I know at least Robert Love's Linux Kernel Development,
Rubini et al.'s Linux Device Drivers (O'Reilly), and Mel Gorman's book
about Virtual Memory, whose exact name I can't recall.

You can also try to start following LKML's flow. Maybe you won't
understand much in the beginning, but your comprehension of the
discussions will improve over time. (Reading a subsystem mailing
list, which has less traffic, may be a good idea if you have some
specific interests.)

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."


[PATCH 1/1] fix - ensure we don't use bootconsoles after init has been released

2007-08-21 Thread Robin Getz
From: Robin Getz <[EMAIL PROTECTED]>

Gerd Hoffmann pointed out that my patch from yesterday can lead 
to a null pointer dereference if the kernel is booted with no
console, and no earlyprintk defined. This fixes that issue.

 printk.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Signed-off-by: Robin Getz <[EMAIL PROTECTED]>

---

Index: linux-2.6.x/kernel/printk.c
===
--- linux-2.6.x/kernel/printk.c 
+++ linux-2.6.x/kernel/printk.c
@@ -1106,10 +1106,12 @@
 
 static int __init disable_boot_consoles(void)
 {
-   if (console_drivers->flags & CON_BOOT) {
-   printk(KERN_INFO "turn off boot console %s%d\n",
-   console_drivers->name, console_drivers->index);
-   return unregister_console(console_drivers);
+   if (console_drivers != NULL) {
+   if (console_drivers->flags & CON_BOOT) {
+   printk(KERN_INFO "turn off boot console %s%d\n",
+   console_drivers->name, console_drivers->index);
+   return unregister_console(console_drivers);
+   }
}
return 0;
 }


Re: [patch 09/14] Convert from class_device to device for SPI

2007-08-21 Thread Tony Jones
On Tue, Aug 21, 2007 at 11:28:28AM -0700, David Brownell wrote:
> Can you update the Documentation/spi/spi-summary text which is
> invalidated by this change?  That's part of why I rejected an
> earlier version of this patch:  since it broke the documentation,
> it was incomplete.

I believe these are the necessary documentation changes.  Alas, I can't write
verbiage you will necessarily be happy with; only you can do that.  If there is
a factual error, I'll be happy to correct it, but feel free to edit for personal
style.  I'll be gone through Sunday, so if it needs more adjustment I'll do it
then.

Tony

--

Convert from class_device to device for drivers/spi.  This is part of the work 
to eliminate struct class_device.

Signed-off-by: Tony Jones <[EMAIL PROTECTED]>

---
 Documentation/spi/spi-summary |   13 +++--
 drivers/spi/spi.c |   36 ++--
 drivers/spi/spi_bitbang.c |2 +-
 drivers/spi/spi_lm70llp.c |2 +-
 include/linux/spi/spi.h   |   12 ++--
 5 files changed, 33 insertions(+), 32 deletions(-)

--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -207,7 +207,7 @@ struct spi_device *spi_new_device(struct
  struct spi_board_info *chip)
 {
struct spi_device   *proxy;
-   struct device   *dev = master->cdev.dev;
+   struct device   *dev = master->dev.parent;
int status;
 
/* NOTE:  caller did any chip->bus_num checks necessary.
@@ -242,7 +242,7 @@ struct spi_device *spi_new_device(struct
proxy->modalias = chip->modalias;
 
snprintf(proxy->dev.bus_id, sizeof proxy->dev.bus_id,
-   "%s.%u", master->cdev.class_id,
+   "%s.%u", master->dev.bus_id,
chip->chip_select);
proxy->dev.parent = dev;
proxy->dev.bus = &spi_bus_type;
@@ -341,18 +341,18 @@ static void scan_boardinfo(struct spi_ma
 
 /*-*/
 
-static void spi_master_release(struct class_device *cdev)
+static void spi_master_release(struct device *dev)
 {
struct spi_master *master;
 
-   master = container_of(cdev, struct spi_master, cdev);
+   master = container_of(dev, struct spi_master, dev);
kfree(master);
 }
 
 static struct class spi_master_class = {
.name   = "spi_master",
.owner  = THIS_MODULE,
-   .release= spi_master_release,
+   .dev_release= spi_master_release,
 };
 
 
@@ -360,7 +360,7 @@ static struct class spi_master_class = {
  * spi_alloc_master - allocate SPI master controller
  * @dev: the controller, possibly using the platform_bus
  * @size: how much zeroed driver-private data to allocate; the pointer to this
- * memory is in the class_data field of the returned class_device,
+ * memory is in the driver_data field of the returned device,
  * accessible with spi_master_get_devdata().
  * Context: can sleep
  *
@@ -386,9 +386,9 @@ struct spi_master *spi_alloc_master(stru
if (!master)
return NULL;
 
-   class_device_initialize(&master->cdev);
-   master->cdev.class = &spi_master_class;
-   master->cdev.dev = get_device(dev);
+   device_initialize(&master->dev);
+   master->dev.class = &spi_master_class;
+   master->dev.parent = get_device(dev);
spi_master_set_devdata(master, &master[1]);
 
return master;
@@ -418,7 +418,7 @@ EXPORT_SYMBOL_GPL(spi_alloc_master);
 int spi_register_master(struct spi_master *master)
 {
static atomic_t dyn_bus_id = ATOMIC_INIT((1<<15) - 1);
-   struct device   *dev = master->cdev.dev;
+   struct device   *dev = master->dev.parent;
int status = -ENODEV;
int dynamic = 0;
 
@@ -443,12 +443,12 @@ int spi_register_master(struct spi_maste
/* register the device, then userspace will see it.
 * registration fails if the bus ID is in use.
 */
-   snprintf(master->cdev.class_id, sizeof master->cdev.class_id,
+   snprintf(master->dev.bus_id, sizeof master->dev.bus_id,
"spi%u", master->bus_num);
-   status = class_device_add(&master->cdev);
+   status = device_add(&master->dev);
if (status < 0)
goto done;
-   dev_dbg(dev, "registered master %s%s\n", master->cdev.class_id,
+   dev_dbg(dev, "registered master %s%s\n", master->dev.bus_id,
dynamic ? " (dynamic)" : "");
 
/* populate children from any spi device tables */
@@ -481,8 +481,8 @@ void spi_unregister_master(struct spi_ma
 {
int dummy;
 
-   dummy = device_for_each_child(master->cdev.dev, NULL, __unregister);
-   class_device_unregister(&master->cdev);
+   dummy = device_for_each_child(master->dev.parent, NULL, __unregister);
+   
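
The heart of the conversion is that struct spi_master now embeds a struct
device, recovered with container_of() in the release hook, whose parent
points at the controller (previously cdev.dev). A minimal user-space sketch,
assuming heavily simplified stand-in structs (not the real <linux/device.h>
or <linux/spi/spi.h> definitions) and a hypothetical alloc_master() in place
of spi_alloc_master():

```c
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel macro: step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins, NOT the real kernel structures. */
struct device {
	struct device *parent;		/* was master->cdev.dev before the patch */
	void *driver_data;
};

struct spi_master {
	struct device dev;		/* was struct class_device cdev */
	unsigned short bus_num;
};

static int last_released_bus = -1;

/* Mirrors the patched hook: the callback receives the embedded device. */
static void spi_master_release(struct device *dev)
{
	struct spi_master *master = container_of(dev, struct spi_master, dev);

	last_released_bus = master->bus_num;
	free(master);
}

/* Hypothetical allocator: private data lives right after the master. */
static struct spi_master *alloc_master(struct device *controller, size_t size)
{
	struct spi_master *master = calloc(1, sizeof(*master) + size);

	if (!master)
		return NULL;
	master->dev.parent = controller;
	master->dev.driver_data = &master[1];
	return master;
}
```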

Re: bug in migrate page

2007-08-21 Thread KAMEZAWA Hiroyuki
On Wed, 22 Aug 2007 10:50:53 +0800
Shaohua Li <[EMAIL PROTECTED]> wrote:

> > At a quick glance, the above path has no writepage() ops;
> > it just replaces swap's radix tree entry.
> I missed that swap has .migratepage and thought fallback_migrate_page was
> used; with that, I think doing the rcu lock in the PageAnon case is OK.
> 
Thank you, I'll write a patch.

Regards,
-Kame



Re: bug in migrate page

2007-08-21 Thread Shaohua Li
On Wed, 2007-08-22 at 11:52 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 22 Aug 2007 10:08:09 +0800
> Shaohua Li <[EMAIL PROTECTED]> wrote:
> 
> > commit dc386d4d1e98bb39fb967ee156cd456c802fc692 adds rcu_read_lock, but
> > some routines in the locked range might sleep (like lock_buffer and
> > aops->writepage); I saw a 'sleep in atomic' warning. It appears the
> > patch went through several versions before. Doing rcu_read_lock in the
> > PageAnon case seems to break the PageAnon(page) && PageSwapCache(page)
> > case, as .writepage might be called. The dummy anon patch may be OK.
> > 
> 
> Thanks for catching this.
> 
> Maybe you're correct.
> 
> BTW, in the PageAnon(page) && PageSwapCache(page) case, I can't find where
> .writepage is called. Could you explain?
> 
> In my understanding,
> 
> rcu_read_lock()
>   -> try_to_unmap()
>   -> move_to_new_page()
>   -> migrate_page() // swap has .migratepage member.
>   -> migrate_page_move_mapping().
>   -> migrate_page_copy().
>   -> remove_migration_ptes().
> 
> 
> At a quick glance, the above path has no writepage() ops;
> it just replaces swap's radix tree entry.
I missed that swap has .migratepage and thought fallback_migrate_page was
used; with that, I think doing the rcu lock in the PageAnon case is OK.

Thanks,
Shaohua


What must I do to get HPET

2007-08-21 Thread Russ Dill
I have an Acer Ferrari 5000 laptop; it has an AMD64 TL-60 processor and an
RS480 host bridge. I'm running 2.6.22 and I get no HPET.

From looking at the kernel source, the HPET driver is looking for PNP0103. I see
no PNP0103 entry on my machine, just PNP0100. Searching around, it looks like
kernels running on other RS480 systems are finding HPETs.

http://lists.openwall.net/linux-kernel/2007/03/26/215

Do I need to get Acer to update the BIOS to include a PNP0103 entry? Is there
some way I can force this?



Re: [PATCH] autofs4: reinstate negative timeout of mount fails

2007-08-21 Thread Ian Kent
On Tue, 2007-08-21 at 13:15 -0700, Andrew Morton wrote:
> On Tue, 21 Aug 2007 17:26:09 +0800
> Ian Kent <[EMAIL PROTECTED]> wrote:
> 
> > Due to a change to fs/dcache.c:d_lookup() in the 2.6 kernel whereby only
> > hashed dentries are returned, the negative caching of mount failures
> > stopped working in the autofs4 module for nobrowse mounts (i.e. the directory
> > is created at mount time and removed at umount or following a mount
> > failure).
> > 
> > This patch keeps track of the dentries from mount failures in order to be
> > able to check the timeout since the last failure and return the appropriate
> > status. In addition, the timeout value is settable at load time as a
> > module option and via sysfs using the module
> > parameter /sys/module/autofs4/parameters/negative_timeout.
> 
> Boy, that's a complex-looking patch.  I think I'll sit on this one
> for 2.6.24 ;)

Yes, that's fine .. the principle isn't that complex.
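
The principle can be sketched as a timestamp check: remember when the mount
last failed and keep answering with the cached failure until negative_timeout
seconds have passed. This is a minimal sketch, not code from the patch;
struct fail_record, negative_entry_valid() and the fixed HZ value are
hypothetical, and the wrap-safe comparison follows the style of the kernel's
time_before():

```c
#include <stdbool.h>

#define HZ 100	/* assumed tick rate, for illustration only */

struct fail_record {
	unsigned long last_fail;	/* jiffies value at the last failed mount */
	unsigned int negative_timeout;	/* seconds, e.g. copied from the module parameter */
};

/* Wrap-safe "a is before b", in the style of the kernel's time_before(). */
static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* True while the failure should still be reported without asking the daemon. */
static bool negative_entry_valid(const struct fail_record *r, unsigned long now)
{
	unsigned long expires = r->last_fail + (unsigned long)r->negative_timeout * HZ;

	return before(now, expires);
}
```

Doing the comparison on the difference rather than on the raw values keeps it
correct when the jiffies counter wraps around.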

> 
> It seems to use a lot of list_for_each[_safe] which could
> have been coded as list_for_each_entry[_safe], btw.

Mmm .. good point. I hadn't noticed the list_for_each_entry* macros.
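
For reference, the difference Andrew points out: list_for_each_safe() walks
raw struct list_head pointers and needs a list_entry() in the body, while
list_for_each_entry_safe() hands back the containing structure directly. The
sketch below re-implements simplified versions of these macros in user space
(modelled on include/linux/list.h, not copied from it; struct info is a
hypothetical stand-in for struct autofs_info) and drains a list both ways:

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;	/* the kernel poisons these instead */
}

/* Iterate over raw list_head pointers; safe against removing 'pos'. */
#define list_for_each_safe(pos, n, head) \
	for (pos = (head)->next, n = pos->next; pos != (head); \
	     pos = n, n = pos->next)

/* Iterate over the containing structures directly; also removal-safe. */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

struct info {			/* hypothetical stand-in for struct autofs_info */
	int id;
	struct list_head rehash;
};

/* Style used in the patch: list_head cursor plus list_entry() in the body. */
static int drain_old(struct list_head *head)
{
	struct list_head *p, *next;
	int sum = 0;

	list_for_each_safe(p, next, head) {
		struct info *ino = list_entry(p, struct info, rehash);

		sum += ino->id;
		list_del(&ino->rehash);
	}
	return sum;
}

/* Suggested style: the cursor is already a struct info pointer. */
static int drain_new(struct list_head *head)
{
	struct info *ino, *tmp;
	int sum = 0;

	list_for_each_entry_safe(ino, tmp, head, rehash) {
		sum += ino->id;
		list_del(&ino->rehash);
	}
	return sum;
}
```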

Ian




Re: bug in migrate page

2007-08-21 Thread KAMEZAWA Hiroyuki
On Wed, 22 Aug 2007 10:08:09 +0800
Shaohua Li <[EMAIL PROTECTED]> wrote:

> commit dc386d4d1e98bb39fb967ee156cd456c802fc692 adds rcu_read_lock, but
> some routines in the locked range might sleep (like lock_buffer and
> aops->writepage); I saw a 'sleep in atomic' warning. It appears the
> patch went through several versions before. Doing rcu_read_lock in the
> PageAnon case seems to break the PageAnon(page) && PageSwapCache(page)
> case, as .writepage might be called. The dummy anon patch may be OK.
> 

Thanks for catching this.

Maybe you're correct.

BTW, in the PageAnon(page) && PageSwapCache(page) case, I can't find where
.writepage is called. Could you explain?

In my understanding,

rcu_read_lock()
-> try_to_unmap()
-> move_to_new_page()
-> migrate_page() // swap has .migratepage member.
-> migrate_page_move_mapping().
-> migrate_page_copy().
-> remove_migration_ptes().


At a quick glance, the above path has no writepage() ops;
it just replaces swap's radix tree entry.

Thanks,
-Kame



Re: [PATCH] autofs4: reinstate negative timeout of mount fails

2007-08-21 Thread Ian Kent
On Tue, 2007-08-21 at 10:19 -0400, Peter Staubach wrote:
> Ian Kent wrote:
> > Hi,
> >
> > Due to a change to fs/dcache.c:d_lookup() in the 2.6 kernel whereby only
> > hashed dentries are returned, the negative caching of mount failures
> > stopped working in the autofs4 module for nobrowse mounts (i.e. the directory
> > is created at mount time and removed at umount or following a mount
> > failure).
> >
> > This patch keeps track of the dentries from mount failures in order to be
> > able to check the timeout since the last failure and return the appropriate
> > status. In addition, the timeout value is settable at load time as a
> > module option and via sysfs using the module
> > parameter /sys/module/autofs4/parameters/negative_timeout.
> >
> > Signed-off-by: Ian Kent <[EMAIL PROTECTED]>
> >
> > ---
> > --- linux-2.6.23-rc2-mm2/fs/autofs4/init.c.negative-timeout 2007-07-09 07:32:17.0 +0800
> > +++ linux-2.6.23-rc2-mm2/fs/autofs4/init.c  2007-08-21 15:44:34.0 +0800
> > @@ -14,6 +14,10 @@
> >  #include 
> >  #include "autofs_i.h"
> >  
> > +unsigned int negative_timeout = AUTOFS_NEGATIVE_TIMEOUT;
> > +module_param(negative_timeout, uint, S_IRUGO | S_IWUSR);
> > +MODULE_PARM_DESC(negative_timeout, "Cache mount fails negatively for this many seconds");
> > +
> >  static int autofs_get_sb(struct file_system_type *fs_type,
> > int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> >  {
> > --- linux-2.6.23-rc2-mm2/fs/autofs4/inode.c.negative-timeout 2007-08-17 11:52:33.0 +0800
> > +++ linux-2.6.23-rc2-mm2/fs/autofs4/inode.c 2007-08-21 15:44:34.0 +0800
> > @@ -46,6 +46,7 @@ struct autofs_info *autofs4_init_ino(str
> > ino->inode = NULL;
> > ino->dentry = NULL;
> > ino->size = 0;
> > +   ino->negative_timeout = negative_timeout;
> >  
> > INIT_LIST_HEAD(&ino->rehash);
> >  
> > @@ -98,11 +99,24 @@ void autofs4_free_ino(struct autofs_info
> >  static void autofs4_force_release(struct autofs_sb_info *sbi)
> >  {
> > struct dentry *this_parent = sbi->sb->s_root;
> > -   struct list_head *next;
> > +   struct list_head *p, *next;
> >  
> > if (!sbi->sb->s_root)
> > return;
> >  
> > +   /* Cleanup the negative dentry cache */
> > +   spin_lock(&sbi->rehash_lock);
> > +   list_for_each_safe(p, next, &sbi->rehash_list) {
> > +   struct autofs_info *ino;
> > +   struct dentry *dentry;
> > +   ino = list_entry(p, struct autofs_info, rehash);
> > +   dentry = ino->dentry;
> > +   spin_unlock(&sbi->rehash_lock);
> > +   dput(ino->dentry);
> >   
> 
> Should this be dput(dentry);?

It could be, since they're the same; or maybe I should get rid of the
assignment. That might save a couple of CPU cycles.

> 
> Thanx...
> 
>ps
> 
> 
> > +   spin_lock(&sbi->rehash_lock);
> > +   }
> > +   spin_unlock(&sbi->rehash_lock);
> > +
> > spin_lock(&dcache_lock);
> >  repeat:
> > next = this_parent->d_subdirs.next;
> > --- linux-2.6.23-rc2-mm2/fs/autofs4/autofs_i.h.negative-timeout 2007-08-17 11:52:33.0 +0800
> > +++ linux-2.6.23-rc2-mm2/fs/autofs4/autofs_i.h  2007-08-21 15:44:34.0 +0800
> > @@ -40,6 +40,14 @@
> >  #define DPRINTK(fmt,args...) do {} while(0)
> >  #endif
> >  
> > +/*
> > + * If the daemon returns a negative response (AUTOFS_IOC_FAIL) then we keep
> > + * the negative response cached for up to the time given here, although
> > + * the time can be shorter if the kernel throws the dcache entry away.
> > + */
> > +#define AUTOFS_NEGATIVE_TIMEOUT 60  /* default 1 minute */
> > +extern unsigned int negative_timeout;
> > +
> >  /* Unified info structure.  This is pointed to by both the dentry and
> > inode structures.  Each file in the filesystem has an instance of this
> > structure.  It holds a reference to the dentry, so dentries are never
> > @@ -52,8 +60,16 @@ struct autofs_info {
> >  
> > int flags;
> >  
> > +   /*
> > +* Two types of unhashed dentry can exist on this list.
> > +* Negative dentrys from failed mounts and positive dentrys
> > +* resulting from a race between expire and mount. This 
> > +* fact is used when looking for dentrys in the list.
> > +*/
> > struct list_head rehash;
> >  
> > +   unsigned int negative_timeout;
> > +
> > struct autofs_sb_info *sbi;
> > unsigned long last_used;
> > atomic_t count;
> > --- linux-2.6.23-rc2-mm2/fs/autofs4/root.c.negative-timeout 2007-08-17 11:53:38.0 +0800
> > +++ linux-2.6.23-rc2-mm2/fs/autofs4/root.c  2007-08-21 15:44:34.0 +0800
> > @@ -238,6 +238,125 @@ out:
> > return dcache_readdir(file, dirent, filldir);
> >  }
> >  
> > +static int autofs4_compare_dentry(struct dentry *parent, struct dentry *dentry, struct qstr *name)
> > +{
> > +   unsigned int len = name->len;
> > +   unsigned int hash = name->hash;
> > +   const unsigned char *str = name->name;
> > +   struct qstr *qstr = &dentry->d_name;
> 

Re: Problems with IDE on linux 2.6.22.X

2007-08-21 Thread Rene Herman

On 08/22/2007 03:39 AM, José Luis Patiño Andrés wrote:

> > You have a SATA harddrive (Hitachi Travelstar 5K100 100GB SATA/2.5") and an
> > IDE (also known as PATA) DVD drive (LG GMA-4082N). That is, your disk
> > should be driven by the:
> >
> > "Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support"
> >
> > under the "Serial ATA (prod) and Parallel ATA (experimental) drivers" menu,
> > and it seems this driver should also take care of your DVD. Not sure from
> > your report what you are using -- first try with only that driver, and
> > nothing from the old "ATA/ATAPI/MFM/RLL support" menu selected.
> >
> > In that situation, your harddrive works, but your DVD does not?
>
> Okay, I've now tested it as you said. Indeed, with only the SATA
> drivers enabled and ATA/ATAPI support completely deselected, my HDD works
> but my DVD does not.

Okay. Jeff, Alan -- 2.6.20.15 is apparently working. A few weeks ago there
was another report of a DVD drive failing detection on pata_amd (my CD and
DVD drives work fine on pata_amd). Did some ATAPI timeouts change or
something?

He's using:

00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial
ATA Storage Controller IDE (rev 02) (prog-if 80 [Master])

> And so...
>
> > If so, this should be fixed in the driver, but to get things working I
> > believe you may try with both the above driver for your harddisk and the
> > old IDE driver for the DVD:
> >
> > <*>   Enhanced IDE/MFM/RLL disk/cdrom/tape/floppy support
> > <*> Include IDE/ATAPI CDROM support (NEW)
> > [*] PCI IDE chipset support
> > [*] Generic PCI bus-master DMA support
> > <*>   Intel PIIXn chipsets support
>
> Checked.
>
> > (do not select IDE/ATA-2 disk support)
>
> Unselected.
>
> Now, I have this kernel panic:
> ###
> #VFS: cannot open root device "sda3" or unknown-block (0,0)
> #Please, append a correct "root=" boot option; here are the available
> #partitions:
>
> #1600 4194302 hdc driver: ide-cdrom

Okay, that makes sense: it seems the new driver simply can't grab the SATA
part anymore once the old driver has already got the IDE part. I wasn't sure
about that (I'm not a SATA user myself; I only noticed your report because
I'd noticed that previous pata_amd one...).

The old SATA driver available from the IDE menu also does not support your
chip, so I don't believe there are any workarounds -- you'll need the issue
fixed.


Rene.



Re: [PATCH][2.6.23-rc2-mm2] small fix for ia64 icache sync patch

2007-08-21 Thread KAMEZAWA Hiroyuki
This is the updated version.

Andrew, could you replace it?

-Kame
==
Fixes two small issues pointed out by Tony Luck.

Changelog v1 -> v2
* add pte_present_exec_user()
* remove pte_user
* fixed comments.

v1.
* removing redundant BUG_ON in __ia64_sync_icache_dcache().
* check pte_present() first.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 arch/ia64/mm/init.c        |    2 --
 include/asm-ia64/pgtable.h |   17 +++--
 2 files changed, 11 insertions(+), 8 deletions(-)

Index: linux-2.6.23-rc2-mm2/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc2-mm2.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc2-mm2/include/asm-ia64/pgtable.h
@@ -297,7 +297,6 @@ ia64_phys_addr_valid (unsigned long addr
 /*
  * The following have defined behavior only work if pte_present() is true.
  */
-#define pte_user(pte)  ((pte_val(pte) & _PAGE_PL_MASK) == _PAGE_PL_3)
 #define pte_write(pte) ((unsigned) (((pte_val(pte) & _PAGE_AR_MASK) >> _PAGE_AR_SHIFT) - 2) <= 4)
 #define pte_exec(pte)  ((pte_val(pte) & _PAGE_AR_RX) != 0)
 #define pte_dirty(pte) ((pte_val(pte) & _PAGE_D) != 0)
@@ -324,14 +323,20 @@ ia64_phys_addr_valid (unsigned long addr
  *  set_pte() is also called by the kernel, but we can expect that the kernel
  *  flushes icache explicitly if necessary.
  */
+#define pte_present_exec_user(pte)\
+   ((pte_val(pte) & (_PAGE_P | _PAGE_PL_MASK | _PAGE_AR_RX)) == \
+   (_PAGE_P | _PAGE_PL_3 | _PAGE_AR_RX))
+
 extern void __ia64_sync_icache_dcache(pte_t pteval);
 static inline void set_pte(pte_t *ptep, pte_t pteval)
 {
-   if (pte_exec(pteval) &&// flush only new executable page.
-   pte_present(pteval) && // swap out ?
-   pte_user(pteval) &&// ignore kernel page
-   (!pte_present(*ptep) ||// do_no_page or swap in, migration,
-   pte_pfn(*ptep) != pte_pfn(pteval))) // do_wp_page(), page copy
+   /* page is present && page is user  && page is executable
+* && (page swapin or new page or page migration
+*  || copy_on_write with page copying.)
+*/
+   if (pte_present_exec_user(pteval) &&
+   (!pte_present(*ptep) ||
+   pte_pfn(*ptep) != pte_pfn(pteval)))
/* load_module() calles flush_icache_range() explicitly*/
__ia64_sync_icache_dcache(pteval);
*ptep = pteval;
Index: linux-2.6.23-rc2-mm2/arch/ia64/mm/init.c
===
--- linux-2.6.23-rc2-mm2.orig/arch/ia64/mm/init.c
+++ linux-2.6.23-rc2-mm2/arch/ia64/mm/init.c
@@ -60,8 +60,6 @@ __ia64_sync_icache_dcache (pte_t pte)
struct page *page;
unsigned long order;
 
-   BUG_ON(!pte_exec(pte));
-
page = pte_page(pte);
addr = (unsigned long) page_address(page);
 



[PATCH] get_nodes should ignore invalid node

2007-08-21 Thread Shaohua Li
get_nodes() doesn't check whether the nodes in the node mask are valid, which
causes a kernel oops when an invalid node is used.
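
The effect of the added nodes_and() can be sketched on a single word of node
bits: first trim the user mask to the number of bits supplied (the endmask
step), then intersect it with the online map so offline or nonexistent node
numbers drop out. A simplified user-space model, assuming at most one
unsigned long of nodes; model_get_nodes() is hypothetical:

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* nbits: how many node bits the caller supplied (1..BITS_PER_LONG). */
static unsigned long model_get_nodes(unsigned long user_word, unsigned long nbits,
				     unsigned long online_map)
{
	unsigned long endmask;

	if (nbits % BITS_PER_LONG)
		endmask = (1UL << (nbits % BITS_PER_LONG)) - 1;
	else
		endmask = ~0UL;

	user_word &= endmask;		/* like nodes_addr(*nodes)[nlongs-1] &= endmask */
	user_word &= online_map;	/* like nodes_and(*nodes, *nodes, node_online_map) */
	return user_word;
}
```

For example, asking for nodes 0, 1 and 5 on a machine with only nodes 0 and 1
online leaves just nodes 0 and 1 in the mask.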

Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>

Index: linux/mm/mempolicy.c
===
--- linux.orig/mm/mempolicy.c   2007-07-25 09:14:33.0 +0800
+++ linux/mm/mempolicy.c2007-08-21 13:15:41.0 +0800
@@ -850,6 +850,8 @@ static int get_nodes(nodemask_t *nodes, 
if (copy_from_user(nodes_addr(*nodes), nmask, nlongs*sizeof(unsigned long)))
return -EFAULT;
nodes_addr(*nodes)[nlongs-1] &= endmask;
+
+   nodes_and(*nodes, *nodes, node_online_map);
return 0;
 }
 


bug in migrate page

2007-08-21 Thread Shaohua Li
commit dc386d4d1e98bb39fb967ee156cd456c802fc692 adds rcu_read_lock, but
some routines in the locked range might sleep (like lock_buffer and
aops->writepage); I saw a 'sleep in atomic' warning. It appears the
patch went through several versions before. Doing rcu_read_lock in the
PageAnon case seems to break the PageAnon(page) && PageSwapCache(page)
case, as .writepage might be called. The dummy anon patch may be OK.

Thanks,
Shaohua


Re: [patch] add some Blackfin specific checks to checkpatch.pl

2007-08-21 Thread Mike Frysinger
Check for a few common errors in Blackfin-specific code wrt MMR loading in
assembly and doing core/system syncs.  Restrict the Blackfin MMR checks to 
actual Blackfin assembly files as pointed out by Joe Perches.

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
CC: Bryan Wu <[EMAIL PROTECTED]>
CC: Andy Whitcroft <[EMAIL PROTECTED]>
---
 scripts/checkpatch.pl |   22 
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dae7d30..ead9675 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -486,9 +486,31 @@ sub process {
WARN("line over 80 characters\n" . $herecurr);
}
 
+# Blackfin: use hi/lo macros
+   if ($realfile =~ [EMAIL PROTECTED]/blackfin/.*\.S$@) {
+   if ($line =~ /\.[lL][[:space:]]*=.*&[[:space:]]*0x[fF][fF][fF][fF]/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the LO() macro, not (... & 0xFFFF)\n" . $herevet);
+   }
+   if ($line =~ /\.[hH][[:space:]]*=.*>>[[:space:]]*16/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the HI() macro, not (... >> 16)\n" . $herevet);
+   }
+   }
+
 # check we are in a valid source file *.[hc] if not then ignore this hunk
next if ($realfile !~ /\.[hc]$/);
 
+# Blackfin: don't use __builtin_bfin_[cs]sync
+   if ($line =~ /__builtin_bfin_csync/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the CSYNC() macro in asm/blackfin.h\n" . $herevet);
+   }
+   if ($line =~ /__builtin_bfin_ssync/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
+   }
+
 # at the beginning of a line any tabs must come first and anything
 # more than 8 must use tabs.
if ($line=~/^\+\s* \t\s*\S/ or $line=~/^\+\s*\s*/) {
-- 
1.5.3.rc5
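
What the new MMR checks ask for, in C terms: split a 32-bit MMR address into
its low and high halves with LO()/HI() instead of open-coding "& 0xFFFF" and
">> 16". The macro bodies below are an assumption about how such helpers are
typically written, not copied from the Blackfin headers; in assembly they
would be used along the lines of "p0.l = LO(addr); p0.h = HI(addr);":

```c
#include <stdint.h>

/* Assumed definitions of the helpers checkpatch steers people toward. */
#define LO(con32) ((con32) & 0xFFFF)
#define HI(con32) (((con32) >> 16) & 0xFFFF)

/* Rebuild a 32-bit value from its halves, as the .l/.h register loads would. */
static uint32_t join_halves(uint32_t hi, uint32_t lo)
{
	return (hi << 16) | lo;
}
```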




Re: [PATCH 11/23] make atomic_read() and atomic_set() behavior consistent on m32r

2007-08-21 Thread Hirokazu Takata
From: Chris Snook <[EMAIL PROTECTED]>
Date: Mon, 13 Aug 2007 07:24:52 -0400
> From: Chris Snook <[EMAIL PROTECTED]>
> 
> Use volatile consistently in atomic.h on m32r.
> 
> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>

Thanks,

Acked-by: Hirokazu Takata <[EMAIL PROTECTED]>
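
The stated aim of the series is that atomic_read()/atomic_set() always go
through a volatile access, so the compiler performs a real load or store each
time instead of caching the counter. A minimal sketch of that pattern,
assuming the usual atomic_t layout; the m32r hunk itself is not quoted here,
so this is illustrative rather than the patch text, and wait_for_change() is
a hypothetical caller:

```c
/* Minimal stand-in for the kernel's atomic_t. */
typedef struct { int counter; } atomic_t;

/* A volatile access forces a real load/store at every use. */
#define atomic_read(v)		(*(volatile int *)&(v)->counter)
#define atomic_set(v, i)	(*(volatile int *)&(v)->counter = (i))

/* Poll until the value differs from 'old' or max_spins is reached. Without
 * the volatile read, a compiler could legally hoist the load out of the loop. */
static int wait_for_change(atomic_t *v, int old, int max_spins)
{
	int spins = 0;

	while (atomic_read(v) == old && spins < max_spins)
		spins++;
	return spins;
}
```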


Re: [linux-usb-devel] on the system with companion host controller, error -71 returns

2007-08-21 Thread Kiyoshi Sasaki

Alan

Thank you for your comment. I'll try to change the load order.

--

Kiyoshi Sasaki <[EMAIL PROTECTED]>


> On Tue, 21 Aug 2007, Kiyoshi Sasaki wrote:
>
> > Hello,
> >
> > I see the errors below in dmesg on an ICH6/ICH7 machine:
> >
> > usb 1-1: device not accepting address 2, error -71
> > or
> > usb 1-1: device descriptor read/all, error -71
> >
> > I'm trying to debug it, but so far I haven't managed to.
> > Can you give me some help?
>
> There's nothing to debug; these messages are perfectly normal.  If you
> want to prevent them from occurring, you should change the load order
> of your modules: Load ehci-hcd before uhci-hcd.
>
> Alan Stern





Re: Problems with IDE on linux 2.6.22.X

2007-08-21 Thread José Luis Patiño Andrés
On Wednesday, 22 August 2007 00:08, Rene Herman wrote:

> You have a SATA harddrive (Hitachi Travelstar 5K100 100GB SATA/2.5") and an
> IDE (also known as PATA) DVD drive (LG GMA-4082N). That is, your disk
> should be driven by the:
>
> "Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support"
>
> under the "Serial ATA (prod) and Parallel ATA (experimental) drivers" menu,
> and it seems this driver should also take care of your DVD. Not sure from
> your report what you are using -- first try with only that driver, and
> nothing from the old "ATA/ATAPI/MFM/RLL support" menu selected.
>
> In that situation, your harddrive works, but your DVD does not?

Okay, I've now tested it as you said. Indeed, with only the SATA
drivers enabled and ATA/ATAPI support completely deselected, my HDD works
but my DVD does not.

And so...

> If so, this should be fixed in the driver, but to get things working I
> believe you may try with both the above driver for your harddisk and the
> old IDE driver for the DVD:
>
> <*>   Enhanced IDE/MFM/RLL disk/cdrom/tape/floppy support
> <*> Include IDE/ATAPI CDROM support (NEW)
> [*] PCI IDE chipset support
> [*] Generic PCI bus-master DMA support
> <*>   Intel PIIXn chipsets support

Checked.

> (do not select IDE/ATA-2 disk support)

Unselected.

Now, I have this kernel panic:
###
#VFS: cannot open root device "sda3" or unknown-block (0,0)
#Please, append a correct "root=" boot option; here are the available 
#partitions:
#1600 4194302 hdc driver: ide-cdrom
#Kernel panic - not syncing: VFS: Unable to mount root fs on 
#unknown-block(0,0)
###

> where you may need to boot with a "libata.atapi_enabled=0" kernel
> parameter.

This parameter has no effect. I have the same kernel panic with or without it.

José Luis Patiño.


Re: [patch] fix the max path calculation in radix-tree.c

2007-08-21 Thread Nick Piggin
On Tue, Aug 21, 2007 at 03:48:42PM -0400, Jeff Moyer wrote:
> Hi,
> 
> A while back, Nick Piggin introduced a patch to reduce the node memory
> usage for small files (commit cfd9b7df4abd3257c9e381b0e445817b26a51c0c):
> 
> -#define RADIX_TREE_MAP_SHIFT 6
> +#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
> 
> Unfortunately, he didn't take into account the fact that the
> calculation of the maximum path was based on an assumption of having
> to round up:
> 
> #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)
> 
> So, if CONFIG_BASE_SMALL is set, you will end up with a
> RADIX_TREE_MAX_PATH that is one greater than necessary.  The practical
> upshot of this is just a bit of wasted memory (one long in the
> height_to_maxindex array, an extra pre-allocated radix tree node per
> cpu, and extra stack usage in a couple of functions), but it seems
> worth getting right.
> 
> It's also worth noting that I never build with CONFIG_BASE_SMALL.
> What I did to test this was duplicate the code in a small user-space
> program and check the results of the calculations for max path and the
> contents of the height_to_maxindex array.
> 
> Cheers.
> 
> Signed-off-by: Jeff Moyer <[EMAIL PROTECTED]>
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 514efb2..67c908f 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -60,7 +60,8 @@ struct radix_tree_path {
>  };
>  
>  #define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
> -#define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)
> +#define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
> +   RADIX_TREE_MAP_SHIFT) + 1)
>  
>  static unsigned long height_to_maxindex[RADIX_TREE_MAX_PATH] __read_mostly;
>  

OK, after you DIV_ROUND_UP, what is the extra 1 for? For paths, it is because
they are NULL terminated paths I guess (without remembering too hard), and for
height_to_maxindex array it is needed for 0-height trees I think. So it would
be kinda cleaner to have the _real_ MAX_PATH, and two other constants for
this array and the paths arrays (that just happen to be identical due to
implementation). Don't you think?

But that's not to nack this patch. On the contrary I think your logic is
correct, and it should be fixed. I didn't check the maths myself but I trust
you :)
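
The arithmetic is easy to check in user space, in the spirit of Jeff's test:
compare the old "bits/shift + 2" formula against "DIV_ROUND_UP(bits, shift) + 1"
for both map shifts and both word sizes. old_max_path() and new_max_path() are
hypothetical helpers, not kernel functions:

```c
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old: relies on "+ 2" to cover both the round-up and the extra slot. */
static unsigned int old_max_path(unsigned int index_bits, unsigned int shift)
{
	return index_bits / shift + 2;
}

/* New: round up explicitly, then add the one extra slot. */
static unsigned int new_max_path(unsigned int index_bits, unsigned int shift)
{
	return DIV_ROUND_UP(index_bits, shift) + 1;
}
```

With a shift of 6 the two agree (12 for 64-bit, 7 for 32-bit). With the
CONFIG_BASE_SMALL shift of 4 the old formula gives 18 and 10 where 17 and 9
suffice: 64 and 32 divide evenly by 4, so there is nothing to round up and the
"+ 2" over-counts by one. The remaining "+ 1" is the extra slot Nick describes.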



[BUG?] 2.6.23-rc3 on alpha

2007-08-21 Thread Bob Tracy
Thanks to Richard for the "aboot" fixes.

I'm seeing something new and strange with 2.6.23-rc3 that I wasn't
seeing in the 2.6.22+ kernels.  I've got the bootlogo code enabled,
and at the point during system initialization where the logo
disappears, the console switches from tty1 to tty2.  I can switch
back to tty1, so other than the unexpected console tty switch, there
doesn't seem to be anything "unfortunate" happening.

Any ideas/explanations?  It's completely repeatable.  I don't think
it's related to the "aboot" patches :-).

-- 
---
Bob Tracy   | "Eagles may soar, but weasels don't get
[EMAIL PROTECTED]|  sucked into jet engines."   --Anon
---


[PATCH] Fix lazy mode vmalloc synchronization for paravirt

2007-08-21 Thread Zachary Amsden

Found this while running Ubuntu installs in a loop with VMI.

If unlucky enough to hit a vmalloc sync fault during a lazy mode 
operation (from an IRQ handler for a module which was not yet populated 
in current page directory, or from inside copy_one_pte, which touches 
swap_map, and hit in an unused 4M region), the required PDE update would 
never get flushed, causing an infinite page fault loop.


This bug affects any paravirt-ops backend which uses lazy updates; I
believe that makes it a bug in Xen, VMI and lguest.  It only happens on
LOWMEM kernels.


Currently for 2.6.23, but we'll want to backport to -stable as well.

Zach
Touching vmalloc memory in the middle of a lazy mode update can generate
a kernel PDE update, which must be flushed immediately.  The fix is to
leave lazy mode when doing a vmalloc sync.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff --git a/arch/i386/mm/fault.c b/arch/i386/mm/fault.c
index 01ffdd4..fcb38e7 100644
--- a/arch/i386/mm/fault.c
+++ b/arch/i386/mm/fault.c
@@ -249,9 +249,10 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pmd_k = pmd_offset(pud_k, address);
 	if (!pmd_present(*pmd_k))
 		return NULL;
-	if (!pmd_present(*pmd))
+	if (!pmd_present(*pmd)) {
 		set_pmd(pmd, *pmd_k);
-	else
+		arch_flush_lazy_mmu_mode();
+	} else
 		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));
 	return pmd_k;
 }
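
Why a missing flush loops forever can be seen in a toy model of lazy (batched)
page-table updates: while in lazy mode a set_pmd() is only queued, so unless
something flushes the queue the faulting address still looks unmapped and the
access faults again. Everything here (struct toy_mmu and its helpers) is
hypothetical and only models the idea; it is not the paravirt-ops interface:

```c
#include <stdbool.h>

#define NPMD 4	/* toy number of page-directory slots */
#define QLEN 8	/* toy capacity of the lazy-update queue */

struct toy_mmu {
	bool pmd_present[NPMD];	/* what the hypervisor/hardware actually sees */
	int queue[QLEN];	/* set_pmd() calls deferred while lazy */
	int queued;
	bool lazy;		/* inside a lazy-mode batch */
};

static void toy_set_pmd(struct toy_mmu *m, int idx)
{
	if (!m->lazy)
		m->pmd_present[idx] = true;	/* applied immediately */
	else if (m->queued < QLEN)
		m->queue[m->queued++] = idx;	/* deferred until a flush */
}

/* Analogue of arch_flush_lazy_mmu_mode(): apply everything queued so far. */
static void toy_flush_lazy(struct toy_mmu *m)
{
	for (int i = 0; i < m->queued; i++)
		m->pmd_present[m->queue[i]] = true;
	m->queued = 0;
}

/* Analogue of vmalloc_sync_one(); returns whether the entry is now usable. */
static bool toy_vmalloc_sync(struct toy_mmu *m, int idx, bool flush_after_set)
{
	if (!m->pmd_present[idx]) {
		toy_set_pmd(m, idx);
		if (flush_after_set)
			toy_flush_lazy(m);
	}
	return m->pmd_present[idx];
}
```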


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > As I am going back through the initial cmpxchg_local implementation, it
> > seems like it was executing __slab_alloc() with preemption disabled,
> > which is wrong. new_slab() is not designed for that.
> 
> The version I sent you did not use preemption.
> 
> We need to make a decision if we want to go without preemption and cmpxchg 
> or with preemption and cmpxchg_local.
> 

I don't expect any performance improvements with cmpxchg() over irq
disable/restore. I think we'll have to use cmpxchg_local.

Also, we may argue that locked cmpxchg will have more scalability impact
than cmpxchg_local. Actually, I expect the LOCK prefix to have a bigger
scalability impact than the irq save/restore pair.

> If we really want to do this then the implementation of all of these 
> components needs to result in competitive performance on all platforms.
> 

The minor issue I see here is on architectures where we have to simulate
cmpxchg_local with irq save/restore. Depending on how we implement the
code, it may result in two irq save/restore pairs instead of one, which
could make the code slower. However, if we are clever enough in our
low-level primitive usage, I think we could make the code use
cmpxchg_local when available and fall back on only _one_ irq disabled
section surrounding the whole code for other architectures.
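A minimal userspace sketch of that "cmpxchg_local when available, otherwise one irq-disabled section" idea is shown below. Every name in it is hypothetical: `struct cpu_cache`, `alloc_cmpxchg()`, `alloc_irqoff()`, `HAVE_CMPXCHG_LOCAL`, and the counter that stands in for local_irq_save()/local_irq_restore(). GCC's `__sync_val_compare_and_swap` stands in for cmpxchg_local. It emits a LOCKed cmpxchg on x86, so the sketch shows the control flow, not the cost. It also does not model preemption or migration, which the real per-CPU code has to handle.

```c
#include <stddef.h>

/* Hypothetical per-CPU freelist; objects link through their first word. */
struct object {
	struct object *next;
};

struct cpu_cache {
	struct object *freelist;
};

/* Counts simulated irq-off sections (stand-in for local_irq_save/restore). */
static int irq_sections;

static void fake_irq_save(void)    { irq_sections++; }
static void fake_irq_restore(void) { }

/* Variant A: retry loop around a compare-and-exchange, no irq-off section. */
static struct object *alloc_cmpxchg(struct cpu_cache *c)
{
	struct object *obj;

	do {
		obj = c->freelist;
		if (!obj)
			return NULL;
	} while (__sync_val_compare_and_swap(&c->freelist, obj, obj->next) != obj);
	return obj;
}

/* Variant B: where cmpxchg_local would itself be emulated with an irq
 * save/restore, take ONE irq-off section around the whole fast path
 * instead of one per emulated cmpxchg. */
static struct object *alloc_irqoff(struct cpu_cache *c)
{
	struct object *obj;

	fake_irq_save();
	obj = c->freelist;
	if (obj)
		c->freelist = obj->next;
	fake_irq_restore();
	return obj;
}

/* Compile-time choice; HAVE_CMPXCHG_LOCAL is a hypothetical arch flag. */
#ifdef HAVE_CMPXCHG_LOCAL
#define alloc_fast alloc_cmpxchg
#else
#define alloc_fast alloc_irqoff
#endif
```

The selection happens at compile time, so an architecture pays for exactly one of the two fast paths.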

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] writeback time order/delay fixes take 3

2007-08-21 Thread Fengguang Wu
On Tue, Aug 21, 2007 at 08:23:14PM -0400, Chris Mason wrote:
> On Sun, 12 Aug 2007 17:11:20 +0800
> Fengguang Wu <[EMAIL PROTECTED]> wrote:
> 
> > Andrew and Ken,
> > 
> > Here are some more experiments on the writeback stuff.
> > Comments are highly welcome~ 
> 
> I've been doing benchmarks lately to try and trigger fragmentation, and
> one of them is a simulation of make -j N.  It takes a list of all
> the .o files in the kernel tree, randomly sorts them and then
> creates bogus files with the same names and sizes in clean kernel trees.
> 
> This is basically creating a whole bunch of files in random order in a
> whole bunch of subdirectories.
> 
> The results aren't pretty:
> 
> http://oss.oracle.com/~mason/compilebench/makej/compare-compile-dirs-0.png
> 
> The top graph shows one dot for each write over time.  It shows that
> ext3 is basically writing all over the place the whole time.  But, ext3
> actually wins the read phase, so the layout isn't horrible.  My guess
> is that if we introduce some write clustering by sending a group of
> inodes down at the same time, it'll go much much better.
> 
> Andrew has mentioned bringing a few radix trees into the writeback paths
> before, it seems like file servers and other general uses will benefit
> from better clustering here.
> 
> I'm hoping to talk you into trying it out ;)

Thank you for the description of the problem. So far I have a similar one
in mind: if we are to delay writeback of atime-dirty-only inodes to
above 1 hour, some grouping/piggy-backing scenario would be
beneficial.  (Which I guess does not deserve the complexity now that
we have Ingo's make-relatime-default patch.)

My vague idea is to
- keep the s_io/s_more_io as a FIFO/cyclic writeback dispatching queue.
- convert s_dirty to some radix-tree/rbtree based data structure.
  It would have dual functions: delayed-writeback and clustered-writeback.
  
clustered-writeback:
- Use inode number as clue of locality, hence the key for the sorted
  tree.
- Drain some more s_dirty inodes into s_io on every kupdate wakeup,
  but do it in the ascending order of inode number instead of
  ->dirtied_when. 

delayed-writeback:
- Make sure that a full scan of the s_dirty tree takes <=30s, i.e.
  dirty_expire_interval.

Notes:
(1) I'm not sure inode number is correlated to disk location in
filesystems other than ext2/3/4. Or parent dir?
(2) It duplicates some of the elevators' functionality. Why is it necessary?
Maybe we have no clue on the exact data location at this time?
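The clustered-writeback half of the idea is sketched below under several assumptions. The names are hypothetical. A qsort()-ed array stands in for the radix-tree/rbtree. dirtied_when and re-dirtying are ignored. kupdate is assumed to wake every 5 s (the usual dirty_writeback_interval) against the 30 s expire interval. Each wakeup drains the next `budget` inodes in ascending inode-number order from a wrapping cursor. The budget is sized so that a full sweep fits in the expire interval.

```c
#include <stddef.h>
#include <stdlib.h>

struct dirty_inode {
	unsigned long ino;   /* inode number, used as the locality key */
};

static int cmp_ino(const void *a, const void *b)
{
	unsigned long x = ((const struct dirty_inode *)a)->ino;
	unsigned long y = ((const struct dirty_inode *)b)->ino;

	return (x > y) - (x < y);
}

/* Inodes to drain per wakeup: n * wakeup_secs / expire_secs, rounded up,
 * so a full sweep takes about expire_secs / wakeup_secs wakeups. */
static size_t sweep_budget(size_t n, unsigned wakeup_secs, unsigned expire_secs)
{
	return (n * wakeup_secs + expire_secs - 1) / expire_secs;
}

/* One kupdate wakeup: copy up to 'budget' inodes, in ascending inode
 * order starting at *cursor, into 'out' (the s_io side); wraps around.
 * Returns how many were copied. */
static size_t drain_clustered(const struct dirty_inode *sorted, size_t n,
			      size_t *cursor, size_t budget,
			      struct dirty_inode *out)
{
	size_t i;

	for (i = 0; i < budget && i < n; i++) {
		out[i] = sorted[*cursor];
		*cursor = (*cursor + 1) % n;
	}
	return i;
}
```

Sorting once and walking a cursor is what makes neighbouring inode numbers go down together. That helps only as far as note (1) holds, i.e. inode number tracks disk location.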

Fengguang



Re: [PATCH] ver_linux is [censored]

2007-08-21 Thread Jesper Juhl

Fix ver_linux glibc version printing (for real this time)

Alexey Dobriyan reported that commit 
4a645d5ea65baaa5736bcb566673bf4a351b2ad8
broke ver_linux when glibc has a 3 digit 
version number, and proposed a patch.
Al Viro then suggested a simpler way to 
solve the problem which I've then simply 
put into patch form.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
Signed-off-by: Al Viro <[EMAIL PROTECTED]>
Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 scripts/ver_linux |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/scripts/ver_linux b/scripts/ver_linux
index 8f8df93..5a16bad 100755
--- a/scripts/ver_linux
+++ b/scripts/ver_linux
@@ -65,9 +65,8 @@ isdnctrl 2>&1 | grep version | awk \
 showmount --version 2>&1 | grep nfs-utils | awk \
 'NR==1{print "nfs-utils ", $NF}'
 
-ls -l `ldd /bin/sh | awk '/libc/{print $3}'` | sed \
--e 's/\.so$//' | sed -e 's/>//' | \
-awk -F'[.-]' '{print "Linux C Library"$(NF-1)"."$NF}'
+echo -n "Linux C Library"
+sed -n -e '/^.*\/libc-\([^/]*\)\.so$/{s//\1/;p;q}' < /proc/self/maps
 
 ldd -v > /dev/null 2>&1 && ldd -v || ldd --version |head -n 1 | awk \
 'NR==1{print "Dynamic linker (ldd)  ", $NF}'
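
For readers who find the sed expression dense, here is a C sketch of the same match. It is an illustration only, not part of the patch. `libc_version_from_maps_line()` and `libc_version()` are hypothetical names. The libc path must be the last '/'-component of the line, must start with "libc-" and must end in ".so", and everything in between is the version. A three-component version such as 2.3.6 therefore comes out whole, and the first matching line wins, like sed's `p;q`.

```c
#include <stdio.h>
#include <string.h>

/* Mirror of '/^.*\/libc-\([^/]*\)\.so$/': copy the version into ver.
 * Returns 1 on a match, 0 otherwise (or if ver is too small). */
static int libc_version_from_maps_line(const char *line, char *ver, size_t len)
{
	size_t n = strcspn(line, "\n");          /* ignore trailing newline */
	const char *slash = NULL;

	for (const char *p = line; p < line + n; p++)
		if (*p == '/')
			slash = p;               /* [^/]* means: last '/' on the line */
	if (!slash || strncmp(slash, "/libc-", 6) != 0)
		return 0;

	const char *start = slash + 6;
	const char *end = line + n;
	if (end - start < 3 || memcmp(end - 3, ".so", 3) != 0)
		return 0;

	size_t vlen = (size_t)(end - 3 - start);
	if (vlen + 1 > len)
		return 0;
	memcpy(ver, start, vlen);
	ver[vlen] = '\0';
	return 1;
}

/* Scan a maps file (e.g. /proc/self/maps); first match wins, like 'p;q'. */
static int libc_version(FILE *maps, char *ver, size_t len)
{
	char line[512];

	while (fgets(line, sizeof line, maps))
		if (libc_version_from_maps_line(line, ver, len))
			return 1;
	return 0;
}
```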




Re: [PATCH 2/3] dma: override "dma_flags_set_dmaflush" for sn-ia64

2007-08-21 Thread James Bottomley
On Tue, 2007-08-21 at 17:34 -0700, [EMAIL PROTECTED] wrote:
> On Tue, Aug 21, 2007 at 03:55:29PM -0500, James Bottomley wrote:
> 
> > .
> > Almost every platform supports posted DMA ... it's a property of most PCI
> > bridge chips.
> > 
> 
> The term "posted DMA" is used to describe this behavior in the Altix 
> Device Driver Writer's Guide, but it may be confusing things here. 
> Maybe a better term will suggest itself if I can clarify:

OK, but posted DMA has a pretty specific meaning in terms of PCI, hence
the confusion.

> On Altix, DMA from a device isn't guaranteed to arrive in host memory 
> in the order it was sent from the device. This reordering can happen 
> in the NUMA interconnect (it's specifically not a PCI reordering.)

This is mmiowb and read_relaxed() again, isn't it?

> > ..
> > This isn't possible on most platforms.  PCI write posting can only be
> > flushed by a read transaction on the device (or sometimes any device on
> > the bridge).  Either this interface is misnamed and misdescribed, or it
> > can't work for most systems.
> > 
> 
> Clearly it wasn't described adequately...
> 
> A read transaction on the device will flush pending writes to the 
> device. But I'm worried about DMA from the device to host memory. 
> On Altix, there are two mechanisms that flush all in-flight DMA 
> to host memory: 1) an interrupt, and 2) a write to a memory region 
> which has a "barrier" attribute set. Obviously option 1 isn't 
> viable for performance reasons. This new interface is about making 
> "option 2" generally available. (As it is now, the only way to get 
> memory with the "barrier" attribute is to allocate it with 
> dma_alloc_coherent().)

Which sounds exactly like what mmiowb does ... is there a need for a new API?
Can't you just use mmiowb()?

James




Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> Ok. Measurements vs. simple cmpxchg on an Intel(R) Pentium(R) 4 CPU 3.20GHz
> (hyperthreading enabled). Test runs with your module show only minor
> performance improvements and lots of regressions. So we must have 
> cmpxchg_local to see any improvements? Some kind of a recent optimization 
> of cmpxchg performance that we do not see on older cpus?
> 

I did not expect the cmpxchg with LOCK prefix to be faster than irq
save/restore. You will need to run these tests using cmpxchg_local to
see an improvement.

Mathieu

> 
> Code of kmem_cache_alloc (to show you that there are no debug options on):
> 
> Dump of assembler code for function kmem_cache_alloc:
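> 0x4015cfa9 <kmem_cache_alloc+0>:    push   %ebp
> 0x4015cfaa <kmem_cache_alloc+1>:    mov    %esp,%ebp
> 0x4015cfac <kmem_cache_alloc+3>:    push   %edi
> 0x4015cfad <kmem_cache_alloc+4>:    push   %esi
> 0x4015cfae <kmem_cache_alloc+5>:    push   %ebx
> 0x4015cfaf <kmem_cache_alloc+6>:    sub    $0x10,%esp
> 0x4015cfb2 <kmem_cache_alloc+9>:    mov    %eax,%esi
> 0x4015cfb4 <kmem_cache_alloc+11>:   mov    %edx,0xffe8(%ebp)
> 0x4015cfb7 <kmem_cache_alloc+14>:   mov    0x4(%ebp),%eax
> 0x4015cfba <kmem_cache_alloc+17>:   mov    %eax,0xfff0(%ebp)
> 0x4015cfbd <kmem_cache_alloc+20>:   mov    %fs:0x404af008,%eax
> 0x4015cfc3 <kmem_cache_alloc+26>:   mov    0x90(%esi,%eax,4),%edi
> 0x4015cfca <kmem_cache_alloc+33>:   mov    (%edi),%ecx
> 0x4015cfcc <kmem_cache_alloc+35>:   test   %ecx,%ecx
> 0x4015cfce <kmem_cache_alloc+37>:   je     0x4015d00a <kmem_cache_alloc+97>
> 0x4015cfd0 <kmem_cache_alloc+39>:   mov    0xc(%edi),%eax
> 0x4015cfd3 <kmem_cache_alloc+42>:   mov    (%ecx,%eax,4),%eax
> 0x4015cfd6 <kmem_cache_alloc+45>:   mov    %eax,%edx
> 0x4015cfd8 <kmem_cache_alloc+47>:   mov    %ecx,%eax
> 0x4015cfda <kmem_cache_alloc+49>:   lock cmpxchg %edx,(%edi)
> 0x4015cfde <kmem_cache_alloc+53>:   mov    %eax,%ebx
> 0x4015cfe0 <kmem_cache_alloc+55>:   cmp    %ecx,%eax
> 0x4015cfe2 <kmem_cache_alloc+57>:   jne    0x4015cfbd <kmem_cache_alloc+20>
> 0x4015cfe4 <kmem_cache_alloc+59>:   cmpw   $0x0,0xffe8(%ebp)
> 0x4015cfe9 <kmem_cache_alloc+64>:   jns    0x4015d006 <kmem_cache_alloc+93>
> 0x4015cfeb <kmem_cache_alloc+66>:   mov    0x10(%edi),%edx
> 0x4015cfee <kmem_cache_alloc+69>:   xor    %eax,%eax
> 0x4015cff0 <kmem_cache_alloc+71>:   mov    %edx,%ecx
> 0x4015cff2 <kmem_cache_alloc+73>:   shr    $0x2,%ecx
> 0x4015cff5 <kmem_cache_alloc+76>:   mov    %ebx,%edi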
> 
> Base
> 
> 1. Kmalloc: Repeatedly allocate then free test
> 1 times kmalloc(8) -> 332 cycles kfree -> 422 cycles
> 1 times kmalloc(16) -> 218 cycles kfree -> 360 cycles
> 1 times kmalloc(32) -> 214 cycles kfree -> 368 cycles
> 1 times kmalloc(64) -> 244 cycles kfree -> 390 cycles
> 1 times kmalloc(128) -> 320 cycles kfree -> 417 cycles
> 1 times kmalloc(256) -> 438 cycles kfree -> 550 cycles
> 1 times kmalloc(512) -> 527 cycles kfree -> 626 cycles
> 1 times kmalloc(1024) -> 678 cycles kfree -> 775 cycles
> 1 times kmalloc(2048) -> 748 cycles kfree -> 822 cycles
> 1 times kmalloc(4096) -> 641 cycles kfree -> 650 cycles
> 1 times kmalloc(8192) -> 741 cycles kfree -> 817 cycles
> 1 times kmalloc(16384) -> 872 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 1 times kmalloc(8)/kfree -> 332 cycles
> 1 times kmalloc(16)/kfree -> 327 cycles
> 1 times kmalloc(32)/kfree -> 323 cycles
> 1 times kmalloc(64)/kfree -> 320 cycles
> 1 times kmalloc(128)/kfree -> 320 cycles
> 1 times kmalloc(256)/kfree -> 333 cycles
> 1 times kmalloc(512)/kfree -> 332 cycles
> 1 times kmalloc(1024)/kfree -> 330 cycles
> 1 times kmalloc(2048)/kfree -> 334 cycles
> 1 times kmalloc(4096)/kfree -> 674 cycles
> 1 times kmalloc(8192)/kfree -> 1155 cycles
> 1 times kmalloc(16384)/kfree -> 1226 cycles
> 
> Slub cmpxchg.
> 
> 1. Kmalloc: Repeatedly allocate then free test
> 1 times kmalloc(8) -> 296 cycles kfree -> 515 cycles
> 1 times kmalloc(16) -> 193 cycles kfree -> 412 cycles
> 1 times kmalloc(32) -> 188 cycles kfree -> 422 cycles
> 1 times kmalloc(64) -> 222 cycles kfree -> 441 cycles
> 1 times kmalloc(128) -> 292 cycles kfree -> 476 cycles
> 1 times kmalloc(256) -> 414 cycles kfree -> 589 cycles
> 1 times kmalloc(512) -> 513 cycles kfree -> 673 cycles
> 1 times kmalloc(1024) -> 694 cycles kfree -> 825 cycles
> 1 times kmalloc(2048) -> 739 cycles kfree -> 878 cycles
> 1 times kmalloc(4096) -> 636 cycles kfree -> 653 cycles
> 1 times kmalloc(8192) -> 715 cycles kfree -> 799 cycles
> 1 times kmalloc(16384) -> 855 cycles kfree -> 927 cycles
> 2. Kmalloc: alloc/free test
> 1 times kmalloc(8)/kfree -> 354 cycles
> 1 times kmalloc(16)/kfree -> 336 cycles
> 1 times kmalloc(32)/kfree -> 335 cycles
> 1 times kmalloc(64)/kfree -> 337 cycles
> 1 times kmalloc(128)/kfree -> 337 cycles
> 1 times kmalloc(256)/kfree -> 355 cycles
> 1 times kmalloc(512)/kfree -> 354 cycles
> 1 times kmalloc(1024)/kfree -> 337 cycles
> 1 times kmalloc(2048)/kfree -> 339 cycles
> 1 times kmalloc(4096)/kfree -> 674 cycles
> 1 times kmalloc(8192)/kfree -> 1128 cycles
> 1 times kmalloc(16384)/kfree -> 1240 cycles
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
Ok. Measurements vs. simple cmpxchg on an Intel(R) Pentium(R) 4 CPU 3.20GHz
(hyperthreading enabled). Test runs with your module show only minor
performance improvements and lots of regressions. So we must have 
cmpxchg_local to see any improvements? Some kind of a recent optimization 
of cmpxchg performance that we do not see on older cpus?


Code of kmem_cache_alloc (to show you that there are no debug options on):

Dump of assembler code for function kmem_cache_alloc:
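0x4015cfa9 <kmem_cache_alloc+0>:    push   %ebp
0x4015cfaa <kmem_cache_alloc+1>:    mov    %esp,%ebp
0x4015cfac <kmem_cache_alloc+3>:    push   %edi
0x4015cfad <kmem_cache_alloc+4>:    push   %esi
0x4015cfae <kmem_cache_alloc+5>:    push   %ebx
0x4015cfaf <kmem_cache_alloc+6>:    sub    $0x10,%esp
0x4015cfb2 <kmem_cache_alloc+9>:    mov    %eax,%esi
0x4015cfb4 <kmem_cache_alloc+11>:   mov    %edx,0xffe8(%ebp)
0x4015cfb7 <kmem_cache_alloc+14>:   mov    0x4(%ebp),%eax
0x4015cfba <kmem_cache_alloc+17>:   mov    %eax,0xfff0(%ebp)
0x4015cfbd <kmem_cache_alloc+20>:   mov    %fs:0x404af008,%eax
0x4015cfc3 <kmem_cache_alloc+26>:   mov    0x90(%esi,%eax,4),%edi
0x4015cfca <kmem_cache_alloc+33>:   mov    (%edi),%ecx
0x4015cfcc <kmem_cache_alloc+35>:   test   %ecx,%ecx
0x4015cfce <kmem_cache_alloc+37>:   je     0x4015d00a <kmem_cache_alloc+97>
0x4015cfd0 <kmem_cache_alloc+39>:   mov    0xc(%edi),%eax
0x4015cfd3 <kmem_cache_alloc+42>:   mov    (%ecx,%eax,4),%eax
0x4015cfd6 <kmem_cache_alloc+45>:   mov    %eax,%edx
0x4015cfd8 <kmem_cache_alloc+47>:   mov    %ecx,%eax
0x4015cfda <kmem_cache_alloc+49>:   lock cmpxchg %edx,(%edi)
0x4015cfde <kmem_cache_alloc+53>:   mov    %eax,%ebx
0x4015cfe0 <kmem_cache_alloc+55>:   cmp    %ecx,%eax
0x4015cfe2 <kmem_cache_alloc+57>:   jne    0x4015cfbd <kmem_cache_alloc+20>
0x4015cfe4 <kmem_cache_alloc+59>:   cmpw   $0x0,0xffe8(%ebp)
0x4015cfe9 <kmem_cache_alloc+64>:   jns    0x4015d006 <kmem_cache_alloc+93>
0x4015cfeb <kmem_cache_alloc+66>:   mov    0x10(%edi),%edx
0x4015cfee <kmem_cache_alloc+69>:   xor    %eax,%eax
0x4015cff0 <kmem_cache_alloc+71>:   mov    %edx,%ecx
0x4015cff2 <kmem_cache_alloc+73>:   shr    $0x2,%ecx
0x4015cff5 <kmem_cache_alloc+76>:   mov    %ebx,%edi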

Base

1. Kmalloc: Repeatedly allocate then free test
1 times kmalloc(8) -> 332 cycles kfree -> 422 cycles
1 times kmalloc(16) -> 218 cycles kfree -> 360 cycles
1 times kmalloc(32) -> 214 cycles kfree -> 368 cycles
1 times kmalloc(64) -> 244 cycles kfree -> 390 cycles
1 times kmalloc(128) -> 320 cycles kfree -> 417 cycles
1 times kmalloc(256) -> 438 cycles kfree -> 550 cycles
1 times kmalloc(512) -> 527 cycles kfree -> 626 cycles
1 times kmalloc(1024) -> 678 cycles kfree -> 775 cycles
1 times kmalloc(2048) -> 748 cycles kfree -> 822 cycles
1 times kmalloc(4096) -> 641 cycles kfree -> 650 cycles
1 times kmalloc(8192) -> 741 cycles kfree -> 817 cycles
1 times kmalloc(16384) -> 872 cycles kfree -> 927 cycles
2. Kmalloc: alloc/free test
1 times kmalloc(8)/kfree -> 332 cycles
1 times kmalloc(16)/kfree -> 327 cycles
1 times kmalloc(32)/kfree -> 323 cycles
1 times kmalloc(64)/kfree -> 320 cycles
1 times kmalloc(128)/kfree -> 320 cycles
1 times kmalloc(256)/kfree -> 333 cycles
1 times kmalloc(512)/kfree -> 332 cycles
1 times kmalloc(1024)/kfree -> 330 cycles
1 times kmalloc(2048)/kfree -> 334 cycles
1 times kmalloc(4096)/kfree -> 674 cycles
1 times kmalloc(8192)/kfree -> 1155 cycles
1 times kmalloc(16384)/kfree -> 1226 cycles

Slub cmpxchg.

1. Kmalloc: Repeatedly allocate then free test
1 times kmalloc(8) -> 296 cycles kfree -> 515 cycles
1 times kmalloc(16) -> 193 cycles kfree -> 412 cycles
1 times kmalloc(32) -> 188 cycles kfree -> 422 cycles
1 times kmalloc(64) -> 222 cycles kfree -> 441 cycles
1 times kmalloc(128) -> 292 cycles kfree -> 476 cycles
1 times kmalloc(256) -> 414 cycles kfree -> 589 cycles
1 times kmalloc(512) -> 513 cycles kfree -> 673 cycles
1 times kmalloc(1024) -> 694 cycles kfree -> 825 cycles
1 times kmalloc(2048) -> 739 cycles kfree -> 878 cycles
1 times kmalloc(4096) -> 636 cycles kfree -> 653 cycles
1 times kmalloc(8192) -> 715 cycles kfree -> 799 cycles
1 times kmalloc(16384) -> 855 cycles kfree -> 927 cycles
2. Kmalloc: alloc/free test
1 times kmalloc(8)/kfree -> 354 cycles
1 times kmalloc(16)/kfree -> 336 cycles
1 times kmalloc(32)/kfree -> 335 cycles
1 times kmalloc(64)/kfree -> 337 cycles
1 times kmalloc(128)/kfree -> 337 cycles
1 times kmalloc(256)/kfree -> 355 cycles
1 times kmalloc(512)/kfree -> 354 cycles
1 times kmalloc(1024)/kfree -> 337 cycles
1 times kmalloc(2048)/kfree -> 339 cycles
1 times kmalloc(4096)/kfree -> 674 cycles
1 times kmalloc(8192)/kfree -> 1128 cycles
1 times kmalloc(16384)/kfree -> 1240 cycles
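The next block makes the "minor performance improvements and lots of regressions" summary concrete. It tallies the per-operation cycle counts transcribed from the two tables above. The array and function names are hypothetical, and no new measurements are involved.

```c
#include <stdio.h>

/* Per-operation cycle counts transcribed from the tables above. */
#define NSIZES 12
static const unsigned sizes[NSIZES] =
	{ 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 };
/* Test 1: kmalloc and kfree timed separately. */
static const unsigned base_alloc[NSIZES] =
	{ 332, 218, 214, 244, 320, 438, 527, 678, 748, 641, 741, 872 };
static const unsigned cmpx_alloc[NSIZES] =
	{ 296, 193, 188, 222, 292, 414, 513, 694, 739, 636, 715, 855 };
static const unsigned base_free[NSIZES] =
	{ 422, 360, 368, 390, 417, 550, 626, 775, 822, 650, 817, 927 };
static const unsigned cmpx_free[NSIZES] =
	{ 515, 412, 422, 441, 476, 589, 673, 825, 878, 653, 799, 927 };
/* Test 2: kmalloc immediately followed by kfree. */
static const unsigned base_pair[NSIZES] =
	{ 332, 327, 323, 320, 320, 333, 332, 330, 334, 674, 1155, 1226 };
static const unsigned cmpx_pair[NSIZES] =
	{ 354, 336, 335, 337, 337, 355, 354, 337, 339, 674, 1128, 1240 };

/* Percent change from base to cmpxchg; negative means cmpxchg is faster. */
static double pct_change(unsigned base, unsigned cmpx)
{
	return 100.0 * ((double)cmpx - (double)base) / (double)base;
}

/* Number of sizes at which the cmpxchg variant took more cycles. */
static int count_slower(const unsigned *base, const unsigned *cmpx)
{
	int n = 0;

	for (int i = 0; i < NSIZES; i++)
		if (cmpx[i] > base[i])
			n++;
	return n;
}

static void print_pair_deltas(void)
{
	for (int i = 0; i < NSIZES; i++)
		printf("kmalloc(%u)/kfree: %+.1f%%\n", sizes[i],
		       pct_change(base_pair[i], cmpx_pair[i]));
}
```

With these numbers, in test 1 kmalloc alone is faster at 11 of the 12 sizes, but kfree is slower at 10 of them. In test 2 the combined alloc/free is slower at 10 of the 12 sizes.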




Re: Trouble booting with new 2.6.22.3 kernel

2007-08-21 Thread Hex Star
I resolved the issue like so:

1) Download kernel source (K) from kernel.org

2) move the archive to the directory that will hold the source,
/home/user/linux for example (create it first with mkdir
/home/user/linux), via mv /home/user/Desktop/archive /home/user/linux/

3) Extract the archive: tar -jxvf archive.tar.bz2 for a .tar.bz2, or
tar -zxvf archive.tar.gz for a .tar.gz

4) cd into resulting directory

5) run sudo make defconfig, then sudo make xconfig (or sudo make
menuconfig in a terminal) and customize the options for your system;
filesystem and controller support must be built in, not as modules!

6) do a sudo make

7) then sudo make modules_install

8) then sudo make install

9) then do a sudo update-initramfs -k kernelversion -c -v

10) then sudo update-grub

11) reboot and enjoy!


Re: [PATCH] ver_linux is [censored]

2007-08-21 Thread Al Viro
On Wed, Aug 22, 2007 at 02:02:44AM +0200, Jesper Juhl wrote:
> > How about simply doing
> > sh -c 'cat /proc/$$/maps'|sed -n -e '/^.*\/libc-\([^/]*\)\.so$/{s//\1/;p;q}'
> > and to hell with parsing ls -l output?
> >
> Works for me.

or, simpler yet,

sed -n -e '/^.*\/libc-\([^/]*\)\.so$/{s//\1/;p;q}' < /proc/self/maps


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Paul E. McKenney
On Tue, Aug 21, 2007 at 06:51:16PM -0400, [EMAIL PROTECTED] wrote:
> On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:
> 
> > I agree that instant gratification is hard to come by when synching
> > up compiler and kernel versions.  Nonetheless, it should be possible
> > to create APIs that are conditioned on the compiler version.
> 
> We've tried that, sort of.  See the mess surrounding the whole
> extern/static/inline/__whatever boondoggle, which seems to have
> changed semantics in every single gcc release since 2.95 or so.
> 
> And recently mention was made that gcc4.4 will have *new* semantics
> in this area. Yee. Hah.

;-)

Thanx, Paul


Re: How to learn Linux Kernel Programming

2007-08-21 Thread Jesper Juhl
On 21/08/07, Noud Aldenhoven <[EMAIL PROTECTED]> wrote:
> Hello Kernel Develop mailing list,
>
...
>
> I'm a simple Math/Computer Science student and would like to learn
> more about Linux and its kernel.
> To be more precise, I'd like to learn how to program in the Linux kernel
> and maybe become a developer,
> if everything goes fine.
> But where do I start?

Start by reading Documentation/HOWTO from a recent copy of the kernel source.


> Almost all information I found on the Internet
> is from before 2005

There's lots of good kernel-related material to be found online. See,
for example:

http://kernelnewbies.org/
http://janitor.kernelnewbies.org/
http://lwn.net/Kernel/LDD3/
http://lwn.net/Kernel/
http://kerneltrap.org/
http://kerneltraffic.org/


> and I think that
> means it's out-of-date.

That's not always true.


> Is there up-to-date documentation that is
> useful to read and that explains how
> the kernel is built? (For example, is /usr/src/linux/Documentation a
> useful dir?)

Yes it is useful.  Not everything in there is 100% up-to-date, but
there is still a *LOT* of useful documentation to be found there.


> Another question I'd like to ask is how and where you started. I'd
> like to know how you managed to become
> Linux kernel developers.
>
Most people start out by fixing small bugs, doing cleanups, etc., or by
implementing some small feature or driver that they need. There's no
fixed way.


-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Andi Kleen ([EMAIL PROTECTED]) wrote:
> Mathieu Desnoyers <[EMAIL PROTECTED]> writes:
> > 
> > The measurements I get (in cycles):
> >                 enable interrupts (STI)   disable interrupts (CLI)   local CMPXCHG
> > IA32 (P4)       112                       82                         26
> > x86_64 AMD64    125                       102                        19
> 
> What exactly did you benchmark here? On K8 CLI/STI are only supposed
> to be a few cycles. pushf/popf might be more expensive, but not that much.
> 

Hi Andi,

I benchmarked cmpxchg_local vs local_irq_save/local_irq_restore.
Details, and code, follow.

* cpuinfo:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 35
model name  : AMD Athlon(tm)64 X2 Dual Core Processor  3800+
stepping: 2
cpu MHz : 2009.204
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 
3dnow pni lahf_lm cmp_legacy
bogomips: 4023.38
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 35
model name  : AMD Athlon(tm)64 X2 Dual Core Processor  3800+
stepping: 2
cpu MHz : 2009.204
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 
3dnow pni lahf_lm cmp_legacy
bogomips: 4018.49
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


* Test ran:


/* test-cmpxchg-nolock.c
 *
 * Compare local cmpxchg with irq disable / enable.
 */


#include <linux/jiffies.h>
#include <linux/compiler.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/calc64.h>
#include <asm/timex.h>
#include <asm/system.h>

#define NR_LOOPS 2

int test_val;

static void do_test_cmpxchg(void)
{
	int ret;
	unsigned long flags;
	unsigned int i;
	cycles_t time1, time2, time;
	long rem;

	local_irq_save(flags);
	preempt_disable();
	time1 = get_cycles();
	for (i = 0; i < NR_LOOPS; i++) {
		ret = cmpxchg_local(&test_val, 0, 0);
	}
	time2 = get_cycles();
	local_irq_restore(flags);
	preempt_enable();
	time = time2 - time1;

	printk(KERN_ALERT "test results: time for non locked cmpxchg\n");
	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
	printk(KERN_ALERT "total time: %llu\n", (unsigned long long)time);
	time = div_long_long_rem(time, NR_LOOPS, &rem);
	printk(KERN_ALERT "-> non locked cmpxchg takes %llu cycles\n",
		(unsigned long long)time);
	printk(KERN_ALERT "test end\n");
}

/*
 * This test will have a higher standard deviation due to incoming interrupts.
 */
static void do_test_enable_int(void)
{
	unsigned long flags;
	unsigned int i;
	cycles_t time1, time2, time;
	long rem;

	local_irq_save(flags);
	preempt_disable();
	time1 = get_cycles();
	for (i = 0; i < NR_LOOPS; i++) {
		local_irq_restore(flags);
	}
	time2 = get_cycles();
	local_irq_restore(flags);
	preempt_enable();
	time = time2 - time1;

	printk(KERN_ALERT "test results: time for enabling interrupts (STI)\n");
	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
	printk(KERN_ALERT "total time: %llu\n", (unsigned long long)time);
	time = div_long_long_rem(time, NR_LOOPS, &rem);
	printk(KERN_ALERT "-> enabling interrupts (STI) takes %llu cycles\n",
		(unsigned long long)time);
	printk(KERN_ALERT "test end\n");
}

static void do_test_disable_int(void)
{
	unsigned long flags, flags2;
	unsigned int i;
	cycles_t time1, time2, time;
	long rem;

	local_irq_save(flags);
	preempt_disable();
	time1 = get_cycles();
	for (i = 0; i < NR_LOOPS; i++) {
		local_irq_save(flags2);
	}
	time2 = get_cycles();
	local_irq_restore(flags);
	preempt_enable();
	time = time2 - time1;

	printk(KERN_ALERT "test results: time for disabling interrupts (CLI)\n");
	printk(KERN_ALERT "number of loops: %d\n", NR_LOOPS);
	printk(KERN_ALERT "total time: %llu\n", (unsigned long long)time);
	time = div_long_long_rem(time, NR_LOOPS, &rem);
	printk(KERN_ALERT "-> disabling interrupts (CLI) takes %llu cycles\n",
		(unsigned long long)time);
	printk(KERN_ALERT "test end\n");
}

Re: [PATCH 2/3] dma: override "dma_flags_set_dmaflush" for sn-ia64

2007-08-21 Thread akepner
On Tue, Aug 21, 2007 at 03:55:29PM -0500, James Bottomley wrote:

> .
> Almost every platform supports posted DMA ... it's a property of most PCI
> bridge chips.
> 

The term "posted DMA" is used to describe this behavior in the Altix 
Device Driver Writer's Guide, but it may be confusing things here. 
Maybe a better term will suggest itself if I can clarify:

On Altix, DMA from a device isn't guaranteed to arrive in host memory 
in the order it was sent from the device. This reordering can happen 
in the NUMA interconnect (it's specifically not a PCI reordering.)

> ..
> This isn't possible on most platforms.  PCI write posting can only be
> flushed by a read transaction on the device (or sometimes any device on
> the bridge).  Either this interface is misnamed and misdescribed, or it
> can't work for most systems.
> 

Clearly it wasn't described adequately...

A read transaction on the device will flush pending writes to the 
device. But I'm worried about DMA from the device to host memory. 
On Altix, there are two mechanisms that flush all in-flight DMA 
to host memory: 1) an interrupt, and 2) a write to a memory region 
which has a "barrier" attribute set. Obviously option 1 isn't 
viable for performance reasons. This new interface is about making 
"option 2" generally available. (As it is now, the only way to get 
memory with the "barrier" attribute is to allocate it with 
dma_alloc_coherent().)

-- 
Arthur



Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> As I am going back through the initial cmpxchg_local implementation, it
> seems like it was executing __slab_alloc() with preemption disabled,
> which is wrong. new_slab() is not designed for that.

The version I sent you did not use preemption.

We need to make a decision if we want to go without preemption and cmpxchg 
or with preemption and cmpxchg_local.

If we really want to do this then the implementation of all of these 
components need to result in competitive performance on all platforms.





Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Andi Kleen
Mathieu Desnoyers <[EMAIL PROTECTED]> writes:
> 
> The measurements I get (in cycles):
>                 enable interrupts (STI)   disable interrupts (CLI)   local CMPXCHG
> IA32 (P4)       112                       82                         26
> x86_64 AMD64    125                       102                        19

What exactly did you benchmark here? On K8 CLI/STI are only supposed
to be a few cycles. pushf/popf might be more expensive, but not that much.

-Andi


Re: [PATCH 0/3] x86_64 EFI runtime service support

2007-08-21 Thread Andi Kleen
> current LinuxBIOS's path: the elfboot in LinuxBIOS will prepare the
> e820 table, and jump to startup_32 in kernel. is that not good and
> simple? 

The problem is that the zero page cannot be changed at all in this
setup. Or rather, it can only be changed by breaking LinuxBIOS.

-Andi



[LOCKDEP][2.6.23-rc2-rt]

2007-08-21 Thread Sven-Thorsten Dietrich
Hi Ingo,

here is a lockdep trace I just encountered in the latest rt patch series.

(which has gotten a bit stale, btw.)

Enjoy,

Sven


=   
[ BUG: bad unlock balance detected! ]   
-   
swapper/1 is trying to release lock (per_cpu_lock__slab_irq_locks_locked) at:   
[] kmem_cache_alloc+0xb4/0x150
but there are no more locks to release! 

other info that might help us debug this:   
1 lock held by swapper/1:   
 #0:  (per_cpu_lock__slab_irq_locks_locked#7){--..}, at: [] c0

stack backtrace:

Call Trace: 
 [] print_unlock_inbalance_bug+0xf7/0x100 
 [] lock_release_non_nested+0x111/0x1a0   
 [] kmem_cache_alloc+0xb4/0x150   
 [] lock_release+0xd2/0x1f0   
 [] rt_spin_unlock+0x26/0x40  
 [] kmem_cache_alloc+0xb4/0x150   
 [] kobject_uevent_env+0x13c/0x520
 [] trace_hardirqs_on+0xd/0x10
 [] rt_mutex_slowunlock+0x54/0x90 
 [] get_bus+0x9/0x40  
 [] kobject_uevent+0x10/0x20  
 [] device_add+0x516/0x680
 [] device_register+0x1e/0x30 
 [] device_create+0xec/0x130  
 [] sprintf+0x6d/0x70 
 [] add_preempt_count+0x2b/0x150  
 [] put_lock_stats+0x13/0x40  
 [] lock_release_holdtime+0x6b/0x90   
 [] mark_held_locks+0x10/0x90 
 [] trace_hardirqs_on+0xd/0x10
 [] tty_register_device+0x74/0x100
 [] rt_mutex_slowunlock+0x54/0x90 
 [] tty_register_driver+0x16c/0x2a0   
 [] pty_init+0x22e/0x570  
 [] kernel_init+0x194/0x490   
 [] trace_hardirqs_on+0xd/0x10
 [] mark_held_locks+0x10/0x90 
 [] trace_hardirqs_on_thunk+0x3a/0x3c 
 [] trace_hardirqs_on_caller+0xd7/0x170   
 [] child_rip+0xa/0x12
 [] restore_args+0x0/0x30 
 [] kernel_init+0x0/0x490 
 [] child_rip+0x0/0x12

INFO: lockdep is turned off.
--- 
| preempt count:  ] 
| 0-level deep critical section nesting:


[ cut here ]
kernel BUG at kernel/rtmutex.c:682! 
invalid opcode:  [1] PREEMPT SMP
CPU 6   
Modules linked in:  
Pid: 1, comm: swapper Not tainted 2.6.23-rc2-rt1-debug #1   
RIP: 0010:[]  [] rt_spin_lock_slowlock+0x1b0
RSP: 0018:81041d837a10  EFLAGS: 00010046
RAX: 81031d836040 RBX: 81032c1542a0 RCX:    
RDX: 81031d836040 RSI: 81032c1542b8 RDI: 81032c1542a0   
RBP: 81041d837ad0 R08: 0002 R09: 0001   
R10:  R11:  R12: 0246   
R13: 80d0 R14: 81011b9009c0 R15: 805606b4   
FS:  

Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > - Rounding error.. you seem to round at 0.1ms, but I keep the values in
> >   cycles. The times that you get (1.1ms) seems strangely higher than
> >   mine, which are under 1000 cycles on a 3GHz system (less than 333ns).
> >   I guess there is both a ms - ns error there and/or not enough
> >   precision in your numbers.
> 
> Nope the rounding for output is depending on the amount. Rounds to one 
> digit after whatever unit we figured out is best to display.
> 
> And multiplications (cyc2ns) do not result in rounding errors.
> 

Ok, I see now that the 1.1ms was for the 1 iterations, which makes
it about 230 ns/iteration for the 1 times kmalloc(8) = 2.3ms test.
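
The cycle/ns arithmetic in this exchange (e.g. 1000 cycles on a 3 GHz CPU being about 333 ns) can be sketched as follows. The function names and SCALE_SHIFT = 10 are assumptions for illustration. The second form shows the division-free multiply-and-shift style that cyc2ns()-like helpers use, not the kernel's exact code.

```c
#include <stdint.h>

/* Plain conversion: a CPU at cpu_mhz runs cpu_mhz cycles per microsecond,
 * so ns = cycles * 1000 / cpu_mhz (truncated toward zero). */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t cpu_mhz)
{
	return cycles * 1000 / cpu_mhz;
}

/* Division-free form: precompute a fixed-point ns-per-cycle multiplier
 * once, then each conversion is one multiply and one shift. */
#define SCALE_SHIFT 10

static uint64_t ns_scale(uint64_t cpu_mhz)
{
	return (1000ULL << SCALE_SHIFT) / cpu_mhz;
}

static uint64_t cycles_to_ns_scaled(uint64_t cycles, uint64_t scale)
{
	return (cycles * scale) >> SCALE_SHIFT;
}
```

For example, the 332-cycle kmalloc(8) on the 3.2 GHz P4 comes out at 103 ns with either form.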

As I am going back through the initial cmpxchg_local implementation, it
seems like it was executing __slab_alloc() with preemption disabled,
which is wrong. new_slab() is not designed for that.

I'll try to run my tests on AMD64.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/6] writeback time order/delay fixes take 3

2007-08-21 Thread Chris Mason
On Sun, 12 Aug 2007 17:11:20 +0800
Fengguang Wu <[EMAIL PROTECTED]> wrote:

> Andrew and Ken,
> 
> Here are some more experiments on the writeback stuff.
> Comments are highly welcome~ 

I've been doing benchmarks lately to try and trigger fragmentation, and
one of them is a simulation of make -j N.  It takes a list of all
the .o files in the kernel tree, randomly sorts them and then
creates bogus files with the same names and sizes in clean kernel trees.

This is basically creating a whole bunch of files in random order in a
whole bunch of subdirectories.
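The random-order step of such a benchmark can be done with a Fisher-Yates shuffle. A minimal sketch, assuming a hypothetical (name, size) record per .o file and a small seeded xorshift generator for reproducible runs; it only reorders the list, leaves out the file creation, and is not the actual benchmark code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one entry per .o file found in the kernel tree. */
struct fake_obj {
	const char *name;
	long size;
};

/* xorshift64 PRNG; deterministic for a given seed. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Fisher-Yates shuffle: walk from the end, swapping each slot with a
 * randomly chosen slot at or before it (ignoring the small modulo bias). */
static void shuffle_objs(struct fake_obj *objs, size_t n, uint64_t seed)
{
	uint64_t state = seed ? seed : 1;	/* xorshift must not start at 0 */

	assert(objs != NULL || n == 0);
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)(xorshift64(&state) % i);	/* 0 <= j < i */
		struct fake_obj tmp = objs[i - 1];

		objs[i - 1] = objs[j];
		objs[j] = tmp;
	}
}
```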

The results aren't pretty:

http://oss.oracle.com/~mason/compilebench/makej/compare-compile-dirs-0.png

The top graph shows one dot for each write over time.  It shows that
ext3 is basically writing all over the place the whole time.  But, ext3
actually wins the read phase, so the layout isn't horrible.  My guess
is that if we introduce some write clustering by sending a group of
inodes down at the same time, it'll go much much better.

Andrew has mentioned bringing a few radix trees into the writeback paths
before; it seems like file servers and other general workloads would
benefit from better clustering here.

I'm hoping to talk you into trying it out ;)

-chris


Please pull from 'fixes-2.6.23' branch

2007-08-21 Thread Kumar Gala
Please pull from 'fixes-2.6.23' branch of

master.kernel.org:/pub/scm/linux/kernel/git/galak/powerpc.git fixes-2.6.23

to receive the following updates:

 arch/powerpc/sysdev/fsl_pci.c |    2 ++
 include/linux/pci_ids.h       |    6 ++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

Kumar Gala (1):
  [POWERPC] Fix PCI Device ID for MPC8544/8533 processors

commit 15f6ddc7d9cf96f2ee88897c7164198ed6e45a77
Author: Kumar Gala <[EMAIL PROTECTED]>
Date:   Tue Aug 21 19:15:31 2007 -0500

[POWERPC] Fix PCI Device ID for MPC8544/8533 processors

The initial user manuals for MPC8544/8533 did not correctly document
the device IDs for MPC8544/8533.  These processors are almost
identical and both show up on the reference boards.

Fix up the quirks for PCIe support to handle MPC8533/E.

Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 9fb0ce5..114c90f 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -251,6 +251,8 @@ DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8568E, quirk_fsl_pcie_transpare
 DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8568, quirk_fsl_pcie_transparent);
 DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8567E, quirk_fsl_pcie_transparent);
 DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8567, quirk_fsl_pcie_transparent);
+DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8533E, quirk_fsl_pcie_transparent);
+DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8533, quirk_fsl_pcie_transparent);
 DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8544E, quirk_fsl_pcie_transparent);
 DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8544, quirk_fsl_pcie_transparent);
 DECLARE_PCI_FIXUP_EARLY(0x1957, PCI_DEVICE_ID_MPC8641, quirk_fsl_pcie_transparent);
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 07fc574..8938d59 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2092,8 +2092,10 @@
 #define PCI_DEVICE_ID_MPC8568  0x0021
 #define PCI_DEVICE_ID_MPC8567E 0x0022
 #define PCI_DEVICE_ID_MPC8567  0x0023
-#define PCI_DEVICE_ID_MPC8544E 0x0030
-#define PCI_DEVICE_ID_MPC8544  0x0031
+#define PCI_DEVICE_ID_MPC8533E 0x0030
+#define PCI_DEVICE_ID_MPC8533  0x0031
+#define PCI_DEVICE_ID_MPC8544E 0x0032
+#define PCI_DEVICE_ID_MPC8544  0x0033
 #define PCI_DEVICE_ID_MPC8641  0x7010
 #define PCI_DEVICE_ID_MPC8641D 0x7011
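A small sketch of why the wrong IDs mattered: PCI fixups such as DECLARE_PCI_FIXUP_EARLY are matched on exact vendor/device pairs, so a quirk registered under an incorrect device ID simply never runs for the real part. The table and lookup functions below are hypothetical illustrations using the corrected values from this patch, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ID table holding the corrected values from this patch. */
struct fsl_pci_id {
	uint16_t vendor;
	uint16_t device;
	const char *name;
};

static const struct fsl_pci_id fsl_ids[] = {
	{ 0x1957, 0x0030, "MPC8533E" },
	{ 0x1957, 0x0031, "MPC8533" },
	{ 0x1957, 0x0032, "MPC8544E" },
	{ 0x1957, 0x0033, "MPC8544" },
};

#define N_FSL_IDS (sizeof(fsl_ids) / sizeof(fsl_ids[0]))

/* Exact vendor/device match, the same kind of match a PCI fixup uses.
 * Returns NULL when nothing matches: a quirk keyed on a wrong ID never runs. */
static const char *fsl_part_name(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < N_FSL_IDS; i++)
		if (fsl_ids[i].vendor == vendor && fsl_ids[i].device == device)
			return fsl_ids[i].name;
	return NULL;
}

/* Reverse lookup by part name; returns 0 if the name is unknown. */
static uint16_t fsl_device_id(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < N_FSL_IDS; i++)
		if (strcmp(fsl_ids[i].name, name) == 0)
			return fsl_ids[i].device;
	return 0;
}
```

Judging by the corrected values, the old definitions put the MPC8544E/MPC8544 names on 0x0030/0x0031, so fixups meant for the MPC8544 actually matched MPC8533 parts, while a real MPC8544 (0x0032/0x0033) matched none.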



Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> - Rounding error: you seem to round at 0.1ms, but I keep the values in
>   cycles. The times that you get (1.1ms) seem strangely higher than
>   mine, which are under 1000 cycles on a 3GHz system (less than 333ns).
>   I guess there is an ms/ns unit error there and/or not enough
>   precision in your numbers.

Nope, the rounding for output depends on the magnitude. It rounds to one
digit after whatever unit we decided is best to display.

And multiplications (cyc2ns) do not result in rounding errors.



Re: RFC: drop support for gcc < 4.0

2007-08-21 Thread Segher Boessenkool

How many people e.g. test -rc kernels compiled with gcc 3.2?


Why would that matter?  It either works or not.  If it doesn't
work, it can either be fixed, or support for that old compiler
version can be removed.


One bug report "kernel doesn't work / crash / ... when compiled with
gcc 3.2, but works when compiled with gcc 4.2" will most likely be lost
in the big pile of unhandled bugs, not cause the removal of gcc 3.2
support...


While that might be true, it's a separate problem.


The only other policy than "only remove support if things are
badly broken" would be "only support what the GCC team supports",
which would be >= 4.1 now; and there are very good arguments for
supporting more than that with the Linux kernel.


No, it's not about bugs in gcc, it's about kernel+gcc combinations that
are mostly untested but officially supported.


What does "officially supported" mean?  Especially the
"officially" part.  Is this documented somewhere?


E.g. how many kernel developers use kernels compiled without
unit-at-a-time? And unit-at-a-time does paper over some bugs,
e.g. about half a dozen section mismatch bugs I've fixed
recently are not present with it.


If any developer is interested in supporting a certain old
compiler version, he should be testing regularly with it.  Sounds
like that's you ;-)

If no developer is interested, we shouldn't claim to support
using that compiler version.


But as the discussions have shown, gcc 4.0 is currently too high for
making a cut, and it is not yet the right time for raising the minimum
required gcc version.


Agreed.


Segher



Re: [PATCH][2.6.23-rc2-mm2] small fix for ia64 icache sync patch

2007-08-21 Thread KAMEZAWA Hiroyuki
On Tue, 21 Aug 2007 14:12:02 -0700
"Luck, Tony" <[EMAIL PROTECTED]> wrote:

> > +   if (pte_present(pteval) &&// swap out ?
> > +   pte_exec(pteval) &&// flush only new executable page.
> > pte_user(pteval) &&// ignore kernel page
> > (!pte_present(*ptep) ||// do_no_page or swap in, migration,
> > pte_pfn(*ptep) != pte_pfn(pteval))) // do_wp_page(), page copy
> 
> David Mosberger was concerned about the increase in code
> size from this inline function.  We can reduce the bloat
> a bit by defining a macro that tests for "present &&
> executable && user-mode" in one go:
> 
> #define pte_pux(pte)  ((pte_val(pte) & (_PAGE_P|_PAGE_PL_MASK|_PAGE_AR_RX)) == \
>                        (_PAGE_P|_PAGE_PL_3|_PAGE_AR_RX))
> 
Hmm, ok.

> Perhaps there is a better name than "pte_pux"?  I don't know whether
> the code that this generates is faster, but it is smaller (bloat
> is only 3k instead of 4k).
> 
> One last cleanup needed ... don't use C-99/C++ style comments.
> 
ok.

-Kame
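To illustrate the idea behind pte_pux(): when each condition has the form "this masked field equals this value", the separate tests can be folded into one AND plus one compare. The bit positions below are assumptions chosen for illustration, not the real ia64 layout, and the three-test version uses simplified stand-ins for pte_present(), pte_user() and pte_exec(); if the real helpers accept extra cases, the combined form would not be exactly equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout, assumed for this sketch only. */
#define PTE_P        (1u << 0)                 /* present */
#define PTE_PL_SHIFT 7
#define PTE_PL_MASK  (3u << PTE_PL_SHIFT)      /* 2-bit privilege-level field */
#define PTE_PL_3     (3u << PTE_PL_SHIFT)      /* user mode */
#define PTE_AR_X     (1u << 9)                 /* execute permission */

/* Three separate tests, as in the original patch. */
static bool pte_present_user_exec(uint32_t v)
{
	return (v & PTE_P) && (v & PTE_PL_MASK) == PTE_PL_3 && (v & PTE_AR_X);
}

/* The same check folded into a single mask-and-compare. */
static bool pte_pux_sketch(uint32_t v)
{
	return (v & (PTE_P | PTE_PL_MASK | PTE_AR_X)) ==
	       (PTE_P | PTE_PL_3 | PTE_AR_X);
}
```

The combined form compiles to one load, one AND and one compare-and-branch, which is where the smaller code size comes from.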



[PATCH] Add missing PCI capability IDs

2007-08-21 Thread Alex Chiang

These IDs are in pciutils, but haven't been added to the kernel
yet.

Signed-off-by: Alex Chiang <[EMAIL PROTECTED]>
Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---

diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index 495d368..1ef8712 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -202,8 +202,12 @@
 #define  PCI_CAP_ID_CHSWP   0x06    /* CompactPCI HotSwap */
 #define  PCI_CAP_ID_PCIX    0x07    /* PCI-X */
 #define  PCI_CAP_ID_HT      0x08    /* HyperTransport */
-#define  PCI_CAP_ID_VNDR    0x09    /* Vendor specific capability */
+#define  PCI_CAP_ID_VNDR    0x09    /* Vendor specific */
+#define  PCI_CAP_ID_DBG     0x0A    /* Debug port */
+#define  PCI_CAP_ID_CCRC    0x0B    /* CompactPCI Central Resource Control */
 #define  PCI_CAP_ID_SHPC    0x0C    /* PCI Standard Hot-Plug Controller */
+#define  PCI_CAP_ID_SSVID   0x0D    /* Bridge subsystem vendor/device ID */
+#define  PCI_CAP_ID_AGP3    0x0E    /* AGP Target PCI-PCI bridge */
 #define  PCI_CAP_ID_EXP     0x10    /* PCI Express */
 #define  PCI_CAP_ID_MSIX    0x11    /* MSI-X */
 #define PCI_CAP_LIST_NEXT   1       /* Next capability in the list */
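Capability IDs like these are found by walking a linked list in configuration space: the Capabilities List bit (0x10) in the Status register at offset 0x06 says a list exists, offset 0x34 holds the first pointer, and each entry starts with an ID byte followed by a next-pointer byte (the PCI_CAP_LIST_NEXT offset of 1). A minimal userspace sketch over a 256-byte config-space snapshot; the macro names are local stand-ins for the <linux/pci_regs.h> constants, and the 48-step bound is a loop guard similar in spirit to the kernel's own capability search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* Status register (low byte used here) */
#define CFG_STATUS_CAP_LIST 0x10  /* "capability list present" bit */
#define CFG_CAP_PTR         0x34  /* pointer to the first capability */
#define CAP_LIST_ID         0     /* offset of the ID byte in an entry */
#define CAP_LIST_NEXT       1     /* offset of the next pointer in an entry */

/* Return the config-space offset of capability `id`, or 0 if absent. */
static uint8_t find_cap(const uint8_t cfg[256], uint8_t id)
{
	int ttl = 48;	/* bound the walk so a looping list cannot hang us */
	uint8_t pos;

	assert(cfg != NULL);
	if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST))
		return 0;
	pos = cfg[CFG_CAP_PTR];
	while (ttl-- > 0 && pos >= 0x40) {	/* entries live past the header */
		pos &= 0xFC;			/* low two pointer bits are reserved */
		if (cfg[pos + CAP_LIST_ID] == id)
			return pos;
		pos = cfg[pos + CAP_LIST_NEXT];
	}
	return 0;
}
```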


Re: [PATCH] ver_linux is [censored]

2007-08-21 Thread Jesper Juhl
On 22/08/07, Al Viro <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 21, 2007 at 11:56:32AM +0200, Jesper Juhl wrote:
> > On 21/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > > Commit 4a645d5ea65baaa5736bcb566673bf4a351b2ad8 broke ver_linux
> > > on etch, whose glibc has a 3-digit version number.
> >
> > Whoops, sorry about that.
> >
> > > Patch replaces awk
> > > wanking with more robust sed wanking.
> > >
> > > Tested on gentoo, etch, centos 4.2.
> > >
> > I tested your patch on Slackware 12.0, Debian 3.1 & Gentoo Base System
> > release 1.12.9 and it works fine on those as well.
>
> How about simply doing
> sh -c 'cat /proc/$$/maps'|sed -n -e '/^.*\/libc-\([^/]*\)\.so$/{s//\1/;p;q}'
> and to hell with parsing ls -l output?
>
Works for me.

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > Are you running a UP or SMP kernel? If you run a UP kernel,
> > cmpxchg_local and cmpxchg are identical.
> 
> UP.
> 
> > Oh, and if you run your tests at boot time, the alternatives code may
> > have removed the lock prefix, therefore making cmpxchg and cmpxchg_local
> > exactly the same.
> 
> Tests were run at boot time.
> 
> That still does not explain kmalloc not showing improvements.
> 

Hrm, weird, because it should. Here are the numbers I posted
previously:


The measurements I get (in cycles):
               enable interrupts (STI)   disable interrupts (CLI)   local CMPXCHG
IA32 (P4)                112                        82                   26
x86_64 AMD64             125                       102                   19

So both AMD64 and IA32 should be improved.

So why are those improvements not showing up in your test? A few
possible causes:

- Do you have any CONFIG_DEBUG_* options activated? smp_processor_id()
  may end up being more expensive in these cases.
- Rounding error: you seem to round at 0.1ms, but I keep the values in
  cycles. The times that you get (1.1ms) seem strangely higher than
  mine, which are under 1000 cycles on a 3GHz system (less than 333ns).
  I guess there is an ms/ns unit error there and/or not enough
  precision in your numbers.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH 0/3] x86_64 EFI runtime service support

2007-08-21 Thread Yinghai Lu
On 8/21/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 21, 2007 at 03:41:44AM -0700, H. Peter Anvin wrote:
> > Andi Kleen wrote:
> > >> - "struct boot_params" (the zeropage) is kept as a legacy interface.
> > >
> > > Legacy interface for what?  Just for kexec utils which never should
> > > have been using it anyways, keeping backwards cruft around seems
> > > misplaced.
> >
> > Worse.  LinuxBIOS.  :(
>
> Sigh. Perhaps it should be renamed AntiLinuxBios: it seems to be actively 
> adverse.

The current LinuxBIOS path: elfboot in LinuxBIOS prepares the e820
table and jumps to startup_32 in the kernel. Isn't that good and
simple? The kernel is not supposed to switch back and forth to get
such a memory map...

Why does not using ACPI mean AntiLinux?
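The e820 table that elfboot prepares is a list of (base, length, type) ranges handed to the kernel. A sketch of that layout and of summing the usable RAM it describes; the struct and constant names are hypothetical, while the 20-byte packed entry and the type codes (1 = usable RAM, 2 = reserved) follow the BIOS E820 convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One memory-map entry: 64-bit base, 64-bit length, 32-bit type. */
struct e820_entry_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
} __attribute__((packed));

#define E820_TYPE_RAM      1
#define E820_TYPE_RESERVED 2

/* Sum the bytes the table marks as usable RAM. */
static uint64_t e820_usable_bytes(const struct e820_entry_sketch *map, size_t n)
{
	uint64_t total = 0;

	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (map[i].type == E820_TYPE_RAM)
			total += map[i].size;
	return total;
}
```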

YH


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> Are you running a UP or SMP kernel? If you run a UP kernel,
> cmpxchg_local and cmpxchg are identical.

UP.

> Oh, and if you run your tests at boot time, the alternatives code may
> have removed the lock prefix, therefore making cmpxchg and cmpxchg_local
> exactly the same.

Tests were run at boot time.

That still does not explain kmalloc not showing improvements.



Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > Using cmpxchg_local vs cmpxchg has a clear impact on the fast paths, as
> > shown below: it saves about 60 to 70 cycles for kmalloc and 200 cycles
> > for the kmalloc/kfree pair (test 2).
> 
> Hmm.. I wonder if the AMD processors simply do the same in either 
> version.

Not supposed to. I remember having posted numbers that show a
difference.

Are you running a UP or SMP kernel? If you run a UP kernel,
cmpxchg_local and cmpxchg are identical.

Oh, and if you run your tests at boot time, the alternatives code may
have removed the lock prefix, therefore making cmpxchg and cmpxchg_local
exactly the same.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * Christoph Lameter ([EMAIL PROTECTED]) wrote:
> > On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> > 
> > > - Replaced smp_rmb() with barrier(). We are not interested in read
> > >   ordering across cpus; we only need ordering with respect to local
> > >   interrupts. barrier() is much cheaper than rmb().
> > 
> > But this means a preempt disable is required. RT users do not want that.
> > Without disabling preemption, the task can be moved to another processor
> > after c has been determined. That is why the smp_rmb() is there.
> 
> Disabling preemption is required if we want to use cmpxchg_local anyway.
> 
> We may have to find a way to disable preemption while still being able
> to give an upper bound on the preempt-disabled execution time. I think
> I found a way to do this yesterday; I'll dig through my patches.
> 

Yeah, I remember having done so: moving the preempt disable closer to
the cmpxchg, checking whether the cpu id changed between the
raw_smp_processor_id() read and the later preempt_disable(), and redoing
the operation if it did. It makes the slow path faster but the fast path
more complex, so I finally dropped the patch. And we are talking about
~10 cycles on the slow path here; I doubt it's worth the complexity
added to the fast path.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> Using cmpxchg_local vs cmpxchg has a clear impact on the fast paths, as
> shown below: it saves about 60 to 70 cycles for kmalloc and 200 cycles
> for the kmalloc/kfree pair (test 2).

Hmm.. I wonder if the AMD processors simply do the same in either 
version.


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> kmalloc(8)/kfree = 112 cycles
> kmalloc(16)/kfree = 103 cycles
> kmalloc(32)/kfree = 103 cycles
> kmalloc(64)/kfree = 103 cycles
> kmalloc(128)/kfree = 112 cycles
> kmalloc(256)/kfree = 111 cycles
> kmalloc(512)/kfree = 111 cycles
> kmalloc(1024)/kfree = 111 cycles
> kmalloc(2048)/kfree = 121 cycles

Looks good. This improves handling for short lived objects about 
threefold.
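A quick arithmetic check of the "about threefold" estimate, using the alloc/free cycle counts posted in this thread for slub HEAD (test 1) and for the cmpxchg_local version, for sizes 8 to 2048 bytes:

```c
#include <assert.h>
#include <stddef.h>

/* kmalloc(N)/kfree cycle counts for N = 8, 16, ..., 2048, copied from the
 * "slub HEAD, test 1" and "cmpxchg_local Slub test" alloc/free tables. */
static const double head_cycles[]  = { 322, 318, 318, 325, 318, 328, 328, 328, 328 };
static const double local_cycles[] = { 112, 103, 103, 103, 112, 111, 111, 111, 121 };

_Static_assert(sizeof(head_cycles) == sizeof(local_cycles), "tables must match");
#define NSIZES (sizeof(head_cycles) / sizeof(head_cycles[0]))

/* Speedup of cmpxchg_local over HEAD for size index i (8 << i bytes). */
static double speedup(size_t i)
{
	assert(i < NSIZES);
	return head_cycles[i] / local_cycles[i];
}
```

The ratios range from about 2.7x (2048 bytes) to about 3.2x (64 bytes), with roughly 2.9x for kmalloc(8).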

> kmalloc(4096)/kfree = 650 cycles
> kmalloc(8192)/kfree = 1042 cycles
> kmalloc(16384)/kfree = 1149 cycles

Hmmm... The page allocator is really bad here.

Could we use the cmpxchg_local approach for the per-cpu queues in the 
page allocator? That may have an even greater influence on overall system 
performance than the SLUB changes.



Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > SLUB Use cmpxchg() everywhere.
> > 
> > It applies to "SLUB: Single atomic instruction alloc/free using
> > cmpxchg".
> 
> > +++ slab/mm/slub.c  2007-08-20 18:42:28.0 -0400
> > @@ -1682,7 +1682,7 @@ redo:
> >  
> > object[c->offset] = freelist;
> >  
> > -   if (unlikely(cmpxchg_local(&c->freelist, freelist, object) != freelist))
> > +   if (unlikely(cmpxchg(&c->freelist, freelist, object) != freelist))
> > goto redo;
> > return;
> >  slow:
> 
> Ok so regular cmpxchg, no cmpxchg_local. cmpxchg_local does not bring 
> anything more? My measurements did not show any difference. I measured on 
> Athlon64. What processor is being used?
> 

This patch only cleans up the tree before proposing my cmpxchg_local
changes. There was an inconsistent use of cmpxchg/cmpxchg_local there.

Using cmpxchg_local vs cmpxchg has a clear impact on the fast paths, as
shown below: it saves about 60 to 70 cycles for kmalloc and 200 cycles
for the kmalloc/kfree pair (test 2).

Pros:
- we can use barrier() instead of rmb()
- cmpxchg_local is faster

Con:
- we must disable preemption
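For readers less familiar with the "redo:" pattern in the patch, here it is as a lock-free freelist push written with C11 atomics. This is a sketch with hypothetical type names; C11 has no direct counterpart to cmpxchg_local() (on x86, a cmpxchg without the LOCK prefix, atomic only with respect to the local CPU), so the sketch uses a full atomic compare-and-swap, which corresponds to plain cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Free object whose "next" link lives in the object itself, as with
 * object[c->offset] in the SLUB code. */
struct free_obj {
	struct free_obj *next;
};

/* Hypothetical per-cpu structure holding the freelist head. */
struct cpu_slab_sketch {
	_Atomic(struct free_obj *) freelist;
};

/* Link the object in front of the current head and try to publish it;
 * if the head changed meanwhile, the CAS reloads it and we redo. */
static void freelist_push(struct cpu_slab_sketch *c, struct free_obj *object)
{
	struct free_obj *head = atomic_load_explicit(&c->freelist, memory_order_relaxed);

	assert(object != NULL);
	do {
		object->next = head;	/* link before publishing */
	} while (!atomic_compare_exchange_weak_explicit(&c->freelist, &head, object,
			memory_order_release, memory_order_relaxed));
}
```

A matching pop is harder to get right because of the ABA problem; the push alone does not have to worry about it.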

I use a 3GHz Pentium 4 for my tests.

Results (compared to cmpxchg_local numbers):

SLUB Performance testing

1. Kmalloc: Repeatedly allocate then free test
(kfree here is slow path)

* cmpxchg
kmalloc(8) = 271 cycles kfree = 645 cycles
kmalloc(16) = 158 cycles  kfree = 428 cycles
kmalloc(32) = 153 cycles  kfree = 446 cycles
kmalloc(64) = 178 cycles  kfree = 459 cycles
kmalloc(128) = 247 cycles kfree = 481 cycles
kmalloc(256) = 363 cycles kfree = 605 cycles
kmalloc(512) = 449 cycles kfree = 677 cycles
kmalloc(1024) = 626 cycles  kfree = 810 cycles
kmalloc(2048) = 681 cycles  kfree = 869 cycles
kmalloc(4096) = 471 cycles  kfree = 575 cycles
kmalloc(8192) = 666 cycles  kfree = 747 cycles
kmalloc(16384) = 736 cycles kfree = 853 cycles

* cmpxchg_local
kmalloc(8) = 83 cycles  kfree = 363 cycles
kmalloc(16) = 85 cycles kfree = 372 cycles
kmalloc(32) = 92 cycles kfree = 377 cycles
kmalloc(64) = 115 cycles    kfree = 397 cycles
kmalloc(128) = 179 cycles   kfree = 438 cycles
kmalloc(256) = 314 cycles   kfree = 564 cycles
kmalloc(512) = 398 cycles   kfree = 615 cycles
kmalloc(1024) = 573 cycles  kfree = 745 cycles
kmalloc(2048) = 629 cycles  kfree = 816 cycles
kmalloc(4096) = 473 cycles  kfree = 548 cycles
kmalloc(8192) = 659 cycles  kfree = 745 cycles
kmalloc(16384) = 724 cycles kfree = 843 cycles


2. Kmalloc: alloc/free test

* cmpxchg
kmalloc(8)/kfree = 321 cycles
kmalloc(16)/kfree = 308 cycles
kmalloc(32)/kfree = 311 cycles
kmalloc(64)/kfree = 310 cycles
kmalloc(128)/kfree = 306 cycles
kmalloc(256)/kfree = 325 cycles
kmalloc(512)/kfree = 324 cycles
kmalloc(1024)/kfree = 322 cycles
kmalloc(2048)/kfree = 309 cycles
kmalloc(4096)/kfree = 678 cycles
kmalloc(8192)/kfree = 1027 cycles
kmalloc(16384)/kfree = 1204 cycles

* cmpxchg_local
kmalloc(8)/kfree = 112 cycles
kmalloc(16)/kfree = 103 cycles
kmalloc(32)/kfree = 103 cycles
kmalloc(64)/kfree = 103 cycles
kmalloc(128)/kfree = 112 cycles
kmalloc(256)/kfree = 111 cycles
kmalloc(512)/kfree = 111 cycles
kmalloc(1024)/kfree = 111 cycles
kmalloc(2048)/kfree = 121 cycles
kmalloc(4096)/kfree = 650 cycles
kmalloc(8192)/kfree = 1042 cycles
kmalloc(16384)/kfree = 1149 cycles

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: 2.6.23-rc2-mm2

2007-08-21 Thread Andrew Morton
On Sun, 19 Aug 2007 15:56:07 + (UTC)
richard kennedy <[EMAIL PROTECTED]> wrote:

> On Thu, 09 Aug 2007 22:42:54 -0700, Andrew Morton wrote:
> 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc2/2.6.23-rc2-mm2/
> > 
> > - Various problems from 2.6.23-rc2-mm1 were fixed
> > 
> > 
> > 
> > Boilerplate:
> > 
> > - See the `hot-fixes' directory for any important updates to this
> > patchset.
> > 
> > - To fetch an -mm tree using git, use (for example)
> > 
> >   git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
> >   git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1
> > 
> Hi Andrew,

Please always do reply-to-all.  Otherwise you end up thinking that you're
being ignored ;)

> The git tree you mentioned in the boilerplate doesn't seem to have been 
> updated in about 7 weeks.
> 2.6.22-rc6-mm1 is the last tag I can see on the summary page. Is 
> something broken?

Yes, the software which auto-imports -mm into git appears to have broken
a few weeks ago.  Matthias has been informed, but I guess he is busy.


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> * cmpxchg_local Slub test
> kmalloc(8) = 83 cycles      kfree = 363 cycles
> kmalloc(16) = 85 cycles     kfree = 372 cycles
> kmalloc(32) = 92 cycles     kfree = 377 cycles
> kmalloc(64) = 115 cycles    kfree = 397 cycles
> kmalloc(128) = 179 cycles   kfree = 438 cycles

So for consecutive allocs of small slabs up to 128 bytes this effectively 
doubles the speed of kmalloc.

> kmalloc(256) = 314 cycles   kfree = 564 cycles
> kmalloc(512) = 398 cycles   kfree = 615 cycles
> kmalloc(1024) = 573 cycles  kfree = 745 cycles

Less of a benefit.

> kmalloc(2048) = 629 cycles  kfree = 816 cycles

Almost as before.

> kmalloc(4096) = 473 cycles  kfree = 548 cycles
> kmalloc(8192) = 659 cycles  kfree = 745 cycles
> kmalloc(16384) = 724 cycles kfree = 843 cycles

Page allocator pass through measurements.


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > - Replaced smp_rmb() with barrier(). We are not interested in read
> >   ordering across cpus; we only need ordering with respect to local
> >   interrupts. barrier() is much cheaper than rmb().
> 
> But this means a preempt disable is required. RT users do not want that.
> Without disabling preemption, the task can be moved to another processor
> after c has been determined. That is why the smp_rmb() is there.

Disabling preemption is required if we want to use cmpxchg_local anyway.

We may have to find a way to disable preemption while still being able
to give an upper bound on the preempt-disabled execution time. I think
I found a way to do this yesterday; I'll dig through my patches.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
Reformatting...

* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> Hi Christoph,
> 
> If you are interested in the raw numbers:
> 
> The (very basic) test module follows. Make sure you change get_cycles()
> for get_cycles_sync() if you plan to run this on x86_64.
> 
> (tests taken on a 3GHz Pentium 4)
> 

(Note: test 1 uses the kfree slow path, as figured out by
instrumentation)
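Mathieu's module times kmalloc()/kfree() inside the kernel with get_cycles(). A rough userspace analogue of test 2 (allocate, then immediately free) might look like the sketch below, assuming POSIX clock_gettime() and the C library's malloc()/free() in place of the kernel primitives. It reports nanoseconds per pair rather than cycles and measures the C allocator, not SLUB, so its numbers are not comparable to the tables here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average nanoseconds per malloc(size)/free pair over `iters` iterations. */
static double alloc_free_avg_ns(size_t size, int iters)
{
	long long start, end;

	assert(iters > 0);
	start = now_ns();
	for (int i = 0; i < iters; i++) {
		void *volatile p = malloc(size);	/* volatile keeps the pair from being optimized away */

		free(p);
	}
	end = now_ns();
	return (double)(end - start) / iters;
}
```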

SLUB Performance testing

1. Kmalloc: Repeatedly allocate then free test

* slub HEAD, test 1
kmalloc(8) = 201 cycles kfree = 351 cycles
kmalloc(16) = 198 cycles  kfree = 359 cycles
kmalloc(32) = 200 cycles  kfree = 381 cycles
kmalloc(64) = 224 cycles  kfree = 394 cycles
kmalloc(128) = 285 cycles kfree = 424 cycles
kmalloc(256) = 411 cycles kfree = 546 cycles
kmalloc(512) = 480 cycles kfree = 619 cycles
kmalloc(1024) = 623 cycles  kfree = 750 cycles
kmalloc(2048) = 686 cycles  kfree = 811 cycles
kmalloc(4096) = 482 cycles  kfree = 538 cycles
kmalloc(8192) = 680 cycles  kfree = 734 cycles
kmalloc(16384) = 713 cycles kfree = 843 cycles

* Slub HEAD, test 2
kmalloc(8) = 190 cycles kfree = 351 cycles
kmalloc(16) = 195 cycles  kfree = 360 cycles
kmalloc(32) = 201 cycles  kfree = 370 cycles
kmalloc(64) = 245 cycles  kfree = 389 cycles
kmalloc(128) = 283 cycles kfree = 413 cycles
kmalloc(256) = 409 cycles kfree = 547 cycles
kmalloc(512) = 476 cycles kfree = 616 cycles
kmalloc(1024) = 628 cycles  kfree = 753 cycles
kmalloc(2048) = 684 cycles  kfree = 811 cycles
kmalloc(4096) = 480 cycles  kfree = 539 cycles
kmalloc(8192) = 661 cycles  kfree = 746 cycles
kmalloc(16384) = 741 cycles kfree = 856 cycles

* cmpxchg_local Slub test
kmalloc(8) = 83 cycles  kfree = 363 cycles
kmalloc(16) = 85 cycles kfree = 372 cycles
kmalloc(32) = 92 cycles kfree = 377 cycles
kmalloc(64) = 115 cycles  kfree = 397 cycles
kmalloc(128) = 179 cycles kfree = 438 cycles
kmalloc(256) = 314 cycles kfree = 564 cycles
kmalloc(512) = 398 cycles kfree = 615 cycles
kmalloc(1024) = 573 cycles  kfree = 745 cycles
kmalloc(2048) = 629 cycles  kfree = 816 cycles
kmalloc(4096) = 473 cycles  kfree = 548 cycles
kmalloc(8192) = 659 cycles  kfree = 745 cycles
kmalloc(16384) = 724 cycles kfree = 843 cycles



2. Kmalloc: alloc/free test

* slub HEAD, test 1
kmalloc(8)/kfree = 322 cycles
kmalloc(16)/kfree = 318 cycles
kmalloc(32)/kfree = 318 cycles
kmalloc(64)/kfree = 325 cycles
kmalloc(128)/kfree = 318 cycles
kmalloc(256)/kfree = 328 cycles
kmalloc(512)/kfree = 328 cycles
kmalloc(1024)/kfree = 328 cycles
kmalloc(2048)/kfree = 328 cycles
kmalloc(4096)/kfree = 678 cycles
kmalloc(8192)/kfree = 1013 cycles
kmalloc(16384)/kfree = 1157 cycles

* Slub HEAD, test 2
kmalloc(8)/kfree = 323 cycles
kmalloc(16)/kfree = 318 cycles
kmalloc(32)/kfree = 318 cycles
kmalloc(64)/kfree = 318 cycles
kmalloc(128)/kfree = 318 cycles
kmalloc(256)/kfree = 328 cycles
kmalloc(512)/kfree = 328 cycles
kmalloc(1024)/kfree = 328 cycles
kmalloc(2048)/kfree = 328 cycles
kmalloc(4096)/kfree = 648 cycles
kmalloc(8192)/kfree = 1009 cycles
kmalloc(16384)/kfree = 1105 cycles

* cmpxchg_local Slub test
kmalloc(8)/kfree = 112 cycles
kmalloc(16)/kfree = 103 cycles
kmalloc(32)/kfree = 103 cycles
kmalloc(64)/kfree = 103 cycles
kmalloc(128)/kfree = 112 cycles
kmalloc(256)/kfree = 111 cycles
kmalloc(512)/kfree = 111 cycles
kmalloc(1024)/kfree = 111 cycles
kmalloc(2048)/kfree = 121 cycles
kmalloc(4096)/kfree = 650 cycles
kmalloc(8192)/kfree = 1042 cycles
kmalloc(16384)/kfree = 1149 cycles

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: + git-net-fix-export.patch added to -mm tree

2007-08-21 Thread David Miller
From: [EMAIL PROTECTED]
Date: Tue, 21 Aug 2007 16:03:06 -0700

> Subject: git-net fix export
> From: Andrew Morton <[EMAIL PROTECTED]>
> 
> Must be silly season or something.
> 
> Cc: "David S. Miller" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>


I've applied this to net-2.6.24, thanks Andrew.


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> SLUB Use cmpxchg() everywhere.
> 
> It applies to "SLUB: Single atomic instruction alloc/free using
> cmpxchg".

> +++ slab/mm/slub.c2007-08-20 18:42:28.0 -0400
> @@ -1682,7 +1682,7 @@ redo:
>  
>   object[c->offset] = freelist;
>  
> - if (unlikely(cmpxchg_local(&c->freelist, freelist, object) != freelist))
> + if (unlikely(cmpxchg(&c->freelist, freelist, object) != freelist))
>   goto redo;
>   return;
>  slow:

Ok so regular cmpxchg, no cmpxchg_local. cmpxchg_local does not bring 
anything more? My measurements did not show any difference. I measured on 
Athlon64. What processor is being used?
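The `redo:` loop in the quoted hunk is a compare-and-swap retry: link the object to the head you just read, then install it only if the head has not changed in the meantime. A minimal userspace sketch of that pattern, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's `cmpxchg()` (the names `struct object`, `freelist_push()` and `freelist_pop()` are hypothetical, not SLUB's). C11 has no direct counterpart to `cmpxchg_local()`, which on x86 drops the `lock` prefix and is only atomic with respect to the local CPU:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free object: the first word links to the next free one. */
struct object {
	struct object *next;
};

/* Freelist head, only ever updated via compare-and-swap. */
static _Atomic(struct object *) freelist;

/* Push: retry until the head did not change between load and swap. */
static void freelist_push(struct object *obj)
{
	struct object *old = atomic_load(&freelist);

	do {
		obj->next = old;	/* like object[c->offset] = freelist */
	} while (!atomic_compare_exchange_weak(&freelist, &old, obj));
}

/* Pop one object, or return NULL if the list is empty. */
static struct object *freelist_pop(void)
{
	struct object *old = atomic_load(&freelist);

	/* On failure, 'old' is reloaded with the current head. */
	while (old && !atomic_compare_exchange_weak(&freelist, &old, old->next))
		;
	return old;
}
```

The pop side ignores the ABA problem, which real concurrent code has to deal with; the sketch only illustrates the retry loop.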



Re: [PATCH] ver_linux is [censored]

2007-08-21 Thread Al Viro
On Tue, Aug 21, 2007 at 11:56:32AM +0200, Jesper Juhl wrote:
> On 21/08/07, Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> > Commit 4a645d5ea65baaa5736bcb566673bf4a351b2ad8 broke ver_linux
> > on etch, whose glibc has a 3-digit version number.
> 
> Whoops, sorry about that.
> 
> > Patch replaces awk
> > wanking with more robust sed wanking.
> >
> > Tested on gentoo, etch, centos 4.2.
> >
> I tested your patch on Slackware 12.0, Debian 3.1 & Gentoo Base System
> release 1.12.9 and it works fine on those as well.

How about simply doing
sh -c 'cat /proc/$$/maps'|sed -n -e '/^.*\/libc-\([^/]*\)\.so$/{s//\1/;p;q}'
and to hell with parsing ls -l output?
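Al's one-liner picks the libc version out of the shell's own `/proc/<pid>/maps`. A minimal C sketch of the same parsing step, assuming libc is mapped as `.../libc-<version>.so` as on the distributions discussed here (recent glibc releases map `libc.so.6` instead, so the scan would find nothing there). The helper names `libc_version_from_maps_line()` and `libc_version()` are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * If 'line' (one line of /proc/<pid>/maps) ends in ".../libc-<ver>.so",
 * copy <ver> into 'out' and return 1; otherwise return 0.
 * Mirrors the sed expression /^.*\/libc-\([^/]*\)\.so$/.
 */
static int libc_version_from_maps_line(const char *line, char *out, size_t outlen)
{
	const char *base = strrchr(line, '/');
	size_t len;

	if (!base)
		return 0;		/* anonymous mapping, [stack], ... */
	base++;				/* skip the '/' */
	len = strcspn(base, "\n");	/* ignore a trailing newline */
	if (len < 8 || strncmp(base, "libc-", 5) != 0 ||
	    strncmp(base + len - 3, ".so", 3) != 0)
		return 0;
	len -= 5 + 3;			/* strip "libc-" and ".so" */
	if (len + 1 > outlen)
		return 0;
	memcpy(out, base + 5, len);
	out[len] = '\0';
	return 1;
}

/* Scan our own mappings, as the one-liner does via $$. */
static int libc_version(char *out, size_t outlen)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];
	int found = 0;

	if (!f)
		return 0;
	while (!found && fgets(line, sizeof(line), f))
		found = libc_version_from_maps_line(line, out, outlen);
	fclose(f);
	return found;
}
```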


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:

> - Replaced smp_rmb() with barrier(). We are not interested in read order
>   across cpus; what we want is ordering wrt local interrupts only.
>   barrier() is much cheaper than an rmb().

But this means a preempt disable is required. RT users do not want that.
Without disabling preemption, the task can be migrated to another processor
after c has been determined. That is why the smp_rmb() is there.


Re: Problems with IDE on linux 2.6.22.X

2007-08-21 Thread Rene Herman

On 08/22/2007 01:00 AM, Alan Cox wrote:

> > "Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support"
> 
> Not for the newer chips. You want ATA/SATA (PIIX and possibly AHCI)
> support from the new drivers, SCSI disk and SCSI cd.

That _is_ the *config description for the new (CONFIG_ATA_PIIX) driver (in 
2.6.22.x).

> > where you may need to boot with a "libata.atapi_enabled=0" kernel parameter.
> 
> Why deliberately disable atapi when you need atapi ?

Because he described the problem that if he got his (SATA) disk supported, he 
lost his (PATA) DVD drive. Although, as I said, I'm not completely sure it would 
actually work, I suggested compiling in both ATA_PIIX (yes, and sd) for his 
drive and the IDE PIIX/ICH driver and ide-cd for his DVD, where, if it works 
at all, passing the above option may or may not be useful.

But his report, although expansive, was a little unclear. Let's wait and see 
what happens with only ATA_PIIX.

Rene.


Re: [PATCH] SLUB use cmpxchg_local

2007-08-21 Thread Mathieu Desnoyers
* Christoph Lameter ([EMAIL PROTECTED]) wrote:
> On Tue, 21 Aug 2007, Mathieu Desnoyers wrote:
> 
> > - Fixed an erroneous test in slab_free() (logic was flipped from the 
> >   original code when testing for slow path. It explains the wrong 
> >   numbers you have with big free).
> 
> If you look at the numbers that I posted earlier then you will see that 
> even the measurements without free were not up to par.
> 

I seem to get a clear performance improvement in the kmalloc fast path.

> > It applies on top of the 
> > "SLUB Use cmpxchg() everywhere" patch.
> 
> Which one is that?
> 

This one:


SLUB Use cmpxchg() everywhere.

It applies to "SLUB: Single atomic instruction alloc/free using
cmpxchg".

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
---
 mm/slub.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: slab/mm/slub.c
===
--- slab.orig/mm/slub.c 2007-08-20 18:42:16.0 -0400
+++ slab/mm/slub.c  2007-08-20 18:42:28.0 -0400
@@ -1682,7 +1682,7 @@ redo:
 
object[c->offset] = freelist;
 
-   if (unlikely(cmpxchg_local(&c->freelist, freelist, object) != freelist))
+   if (unlikely(cmpxchg(&c->freelist, freelist, object) != freelist))
goto redo;
return;
 slow:

> >                | slab.git HEAD slub (min-max) | cmpxchg_local slub
> > kmalloc(8)     | 190 - 201                    |  83
> > kfree(8)       | 351 - 351                    | 363
> > kmalloc(64)    | 224 - 245                    | 115
> > kfree(64)      | 389 - 394                    | 397
> > kmalloc(16384) | 713 - 741                    | 724
> > kfree(16384)   | 843 - 856                    | 843
> > 
> > Therefore, there seems to be a repeatable gain on the kmalloc fast path
> > (about twice as fast). No significant performance hit for the kfree
> > case, but no gain either; the same holds for large kmalloc, as expected.
> 
> There is a consistent loss on slab_free it seems. The 16k numbers are 
> irrelevant since we do not use slab_alloc/slab_free due to the direct pass 
> through patch but call the page allocator directly. That also explains 
> that there is no loss there.
> 

Yes. slab_free in these tests falls mostly into __slab_free() slow path
(I instrumented the number of slow and fast path to get this). The small
performance hit (~10 cycles) can be explained by the added
preempt_disable()/preempt_enable().
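The size of the kmalloc fast-path gain in the table above can be checked with a little arithmetic: the speedup is the old cycle count divided by the new one, evaluated at both ends of the min-max range. A small C sketch (the `struct row` type and the `speedup()` and `print_speedups()` helpers are our own, not from the thread), using the kmalloc(8) and kmalloc(64) rows:

```c
#include <stdio.h>

/* One kmalloc row of the table: old min/max cycles and the new cycles. */
struct row {
	const char *name;
	double old_min, old_max, new_cycles;
};

/* Speedup factor: cycles before the change divided by cycles after it. */
static double speedup(double before, double after)
{
	return before / after;
}

/* Print the speedup range for each row. */
static void print_speedups(const struct row *rows, int n)
{
	for (int i = 0; i < n; i++)
		printf("%-12s %.2fx - %.2fx\n", rows[i].name,
		       speedup(rows[i].old_min, rows[i].new_cycles),
		       speedup(rows[i].old_max, rows[i].new_cycles));
}
```

For those rows this gives about 2.29x to 2.42x for kmalloc(8) and 1.95x to 2.13x for kmalloc(64).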

> The kmalloc numbers look encouraging. I will check to see if I can 
> reproduce it once I sort out the patches.

Ok.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: NFS hang + umount -f: better behaviour requested.

2007-08-21 Thread Valdis . Kletnieks
On Tue, 21 Aug 2007 14:50:42 EDT, John Stoffel said:

> Now maybe those issues are raised when you have a Linux NFS server
> with Solaris clients.  But in my book, reliable NFS servers are key,
> and if they are reliable, 'soft,intr' works just fine.

And you don't need all that ext3 journal overhead if your disk drives
are reliable too.  Gotcha. :)




input: limit memory allocated by uinput ff drivers

2007-08-21 Thread Chuck Ebbert
input: limit memory allocated by uinput ff drivers

Don't let force feedback drivers allocate more than 256K of kernel
memory. On kernel 2.6.22 this causes a kernel OOPS with the SLUB
memory allocator; on later kernels the drivers may allocate large
amounts of memory.

Signed-off-by: Chuck Ebbert <[EMAIL PROTECTED]>

---
 drivers/input/ff-core.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- linux-2.6.22.noarch.orig/drivers/input/ff-core.c
+++ linux-2.6.22.noarch/drivers/input/ff-core.c
@@ -306,6 +306,7 @@ int input_ff_create(struct input_dev *de
 {
struct ff_device *ff;
int i;
+   int needed_mem;
 
if (!max_effects) {
printk(KERN_ERR
@@ -313,8 +314,11 @@ int input_ff_create(struct input_dev *de
return -EINVAL;
}
 
-   ff = kzalloc(sizeof(struct ff_device) +
-max_effects * sizeof(struct file *), GFP_KERNEL);
+   needed_mem = sizeof(struct ff_device) + max_effects * sizeof(struct file *);
+   if (needed_mem > 256 * 1024)
+   return -ENOMEM;
+
+   ff = kzalloc(needed_mem, GFP_KERNEL);
if (!ff)
return -ENOMEM;
 
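The patch computes the allocation size up front and rejects anything above 256K before calling `kzalloc()`. The same check can be sketched in userspace C as below. `ff_alloc_size()` and `FF_MAX_ALLOC` are hypothetical names; unlike the patch, which uses an `int`, the sketch does the arithmetic in `size_t` and tests the multiplication for overflow before performing it, so a huge `max_effects` cannot wrap around to a value below the cap:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FF_MAX_ALLOC (256 * 1024)	/* same 256K cap as the patch */

/*
 * Compute header + count * elem into *out. Fails with -ENOMEM if the
 * result would exceed FF_MAX_ALLOC or overflow size_t on the way there.
 */
static int ff_alloc_size(size_t header, size_t count, size_t elem, size_t *out)
{
	if (elem != 0 && count > (SIZE_MAX - header) / elem)
		return -ENOMEM;			/* would overflow */
	if (header + count * elem > FF_MAX_ALLOC)
		return -ENOMEM;			/* over the cap */
	*out = header + count * elem;
	return 0;
}
```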


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Valdis . Kletnieks
On Tue, 21 Aug 2007 09:16:43 PDT, "Paul E. McKenney" said:

> I agree that instant gratification is hard to come by when synching
> up compiler and kernel versions.  Nonetheless, it should be possible
> to create APIs that are conditioned on the compiler version.

We've tried that, sort of.  See the mess surrounding the whole
extern/static/inline/__whatever boondoggle, which seems to have
changed semantics in every single gcc release since 2.95 or so.

And recently mention was made that gcc4.4 will have *new* semantics
in this area. Yee. Hah.
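An API conditioned on the compiler version usually comes down to a preprocessor test on GCC's predefined `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__` macros (which clang also defines for compatibility). A minimal sketch; the `COMPILER_VERSION` and `VERSION_PATH` macros and `double_it()` are hypothetical. It sticks to `static inline`, which means the same thing under both the gnu89 and C99 `inline` rules, whereas bare `inline` and `extern inline` swapped meanings between the two:

```c
/* Encode e.g. gcc 4.1.2 as 40102; 0 if not a GCC-compatible compiler. */
#if defined(__GNUC__)
#define COMPILER_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define COMPILER_VERSION 0
#endif

/* Hypothetical version-dependent choice; both branches are harmless here. */
#if COMPILER_VERSION >= 40300
#define VERSION_PATH "gcc >= 4.3"
#else
#define VERSION_PATH "older or non-GCC compiler"
#endif

/* 'static inline' behaves identically under gnu89 and C99 inline rules. */
static inline int double_it(int x)
{
	return 2 * x;
}
```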









Re: Problems with IDE on linux 2.6.22.X

2007-08-21 Thread Alan Cox
> "Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support"

Not for the newer chips. You want ATA/SATA (PIIX and possibly AHCI)
support from the new drivers, SCSI disk and SCSI cd.

> where you may need to boot with a "libata.atapi_enabled=0" kernel parameter.

Why deliberately disable atapi when you need atapi ?

Alan


Re: [RFC 0/7] Postphone reclaim laundry to write at high water marks

2007-08-21 Thread Christoph Lameter
On Wed, 22 Aug 2007, Peter Zijlstra wrote:

> Also, all I want is for slab to honour gfp flags like page allocation
> does, nothing more, nothing less.
> 
> (well, actually slightly less, since I'm only really interrested in the
> ALLOC_MIN|ALLOC_HIGH|ALLOC_HARDER -> ALLOC_NO_WATERMARKS transition and
> not all higher ones)

I am still not sure what that brings you. There may be multiple 
PF_MEMALLOC going on at the same time. On a large system with N cpus
there may be more than N of these that can steal objects from one another. 

A NUMA system will be shot anyways if memory gets that problematic to 
handle since the OS cannot effectively place memory if all zones are 
overallocated so that only a few pages are left.


> I want slab to fail when a similar page alloc would fail, no magic.

Yes, I know. I do not want allocations to fail; I want reclaim to occur so 
that no allocation has to fail. We need provisions that make sure we never 
get into such a bad memory situation, which would cause severe slowness 
and usually end up in a livelock anyway.

> > > Anonymous pages are here to stay, and we cannot tell people how to
> > > use them. So we need some free or freeable pages in order to avoid the
> > > vm deadlock that arises from all memory dirty.
> > 
> > No one is trying to abolish Anonymous pages. Free memory is readily 
> > available on demand if one calls reclaim. Your scheme introduces complex 
> > negotiations over a few scraps of memory when large amounts of memory 
> > would still be readily available if one would do the right thing and call 
> > into reclaim.
> 
> This is the thing I contend, there need not be large amounts of memory
> around. In my test prog the hot code path fits into a single page, the
> rest can be anonymous.

That's a bit extreme. We need to make sure that there are larger amounts 
of memory around. Pages are used for all sorts of short-term uses (like 
slab shrinking etc.). If memory is so low that a single page matters,
then we are in very bad shape anyway.

> > Sounds like you would like to change the way we handle memory in general 
> > in the VM? Reclaim (and thus finding freeable pages) is basic to Linux 
> > memory management.
> 
> Not quite, currently we have free pages in the reserves, if you want to
> replace some (or all) of that by freeable pages then that is a change.

We have free pages primarily to optimize the allocation. Meaning we do not 
have to run reclaim on every call. We want to use all of memory. The 
reserves are there for the case that we cannot call into reclaim. The easy 
solution if that is problematic is to enhance the reclaim to work in the
critical situations that we care about.


> > Sorry I just got into this a short time ago and I may need a few cycles 
> > to get this all straight. An approach that uses memory instead of 
> > ignoring available memory is certainly better.
> 
> Sure if and when possible. There will always be need to fall back to the
> reserves.

Maybe. But we can certainly avoid that as much as possible, which would 
also increase our ability to use all available memory instead of leaving 
some of it unused.

> A bit off-topic, re that reclaim from atomic context:
> Currently we try to hold spinlocks only for short periods of time so
> that reclaim can be preempted, if you run all of reclaim from a
> non-preemptible context you get very large preemption latencies and if
> done from int context it'd also generate large int latencies.

If you call into the page allocator from an interrupt context then you are 
already in bad shape, since we may check the pcp lists and then potentially 
have to traverse the zonelists and check all sorts of things. If we 
implemented atomic reclaim, the reserves would become a latency 
optimization. At least we would no longer fail when the reserves run out.



Re: [RFC 0/7] Postphone reclaim laundry to write at high water marks

2007-08-21 Thread Christoph Lameter
On Tue, 21 Aug 2007, Rik van Riel wrote:

> Christoph Lameter wrote:
> 
> > I want general improvements to reclaim to address the issues that you see
> > and other issues related to reclaim instead of the strange code that makes
> > PF_MEMALLOC allocs compete for allocations from a single slab and putting
> > logic into the kernel to decide which allocs to fail. We can reclaim after
> > all. It's just a matter of finding the right way to do this. 
> 
> The simplest way of achieving that would be to allow
> recursion of the page reclaim code, under the condition
> that the second level call can only reclaim clean pages,
> while the "outer" call does what the VM does today.

Yes that is what the precursor to this patchset does.

See http://marc.info/?l=linux-mm=118710207203449=2

This one did not even come up to the level of the earlier one. Sigh.

The way forward may be:

1. Like in the earlier patchset allow reentry to reclaim under 
   PF_MEMALLOC if we are out of all memory.

2. Do the laundry as here but do not write out laundry directly.
   Instead move laundry to a new lru style list in the zone structure.
   This will allow the recursive reclaim to also trigger writeout
   of pages (what this patchset was supposed to accomplish).

3. Perform writeback only from kswapd. Make other threads
   wait on kswapd if memory is low and they are able to wait;
   writeback still has to make progress.

4. Then allow reclaim of GFP_ATOMIC allocs (see
   http://marc.info/?l=linux-kernel=118710595617696=2). Atomic
   reclaim can then also put pages onto the zone laundry lists from where
   it is going to be picked up and written out by kswapd ASAP. This one
   may be tricky so maybe keep this separate.
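Steps 2 and 3 amount to a hand-off: reclaim stops writing dirty pages itself and instead parks them on a per-zone list that only kswapd drains. A toy userspace model of that hand-off, assuming plain singly linked lists; every name here (`struct page_model`, `struct zone_model`, `laundry_add()`, `reclaim_page()`, `kswapd_writeback()`) is hypothetical and unrelated to the real `struct page` or zone code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page: just a dirty flag and a list link. */
struct page_model {
	bool dirty;
	struct page_model *next;
};

/* Toy stand-in for a zone with its laundry list. */
struct zone_model {
	struct page_model *laundry;	/* dirty pages awaiting writeout */
	int freed;			/* pages reclaimed so far */
};

/* Park a dirty page instead of writing it out from reclaim. */
static void laundry_add(struct zone_model *z, struct page_model *p)
{
	p->next = z->laundry;
	z->laundry = p;
}

/*
 * Try to reclaim one page: clean pages are freed immediately, dirty
 * ones go onto the laundry list and are never written out from here.
 */
static bool reclaim_page(struct zone_model *z, struct page_model *p)
{
	if (!p->dirty) {
		z->freed++;
		return true;
	}
	laundry_add(z, p);
	return false;
}

/* kswapd stand-in: the only place writeback happens. Returns pages cleaned. */
static int kswapd_writeback(struct zone_model *z)
{
	int n = 0;

	while (z->laundry) {
		struct page_model *p = z->laundry;

		z->laundry = p->next;
		p->dirty = false;	/* pretend the write completed */
		p->next = NULL;
		n++;
	}
	return n;
}
```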



Re: CFS review

2007-08-21 Thread Al Boldi
Ingo Molnar wrote:
> * Al Boldi <[EMAIL PROTECTED]> wrote:
> > There is one workload that still isn't performing well; it's a
> > web-server workload that spawns 1K+ client procs.  It can be emulated
> > by using this:
> >
> >   for i in `seq 1 to `; do ping 10.1 -A > /dev/null & done
>
> on bash i did this as:
>
>   for ((i=0; i<; i++)); do ping 10.1 -A > /dev/null & done
>
> and this quickly creates a monster-runqueue with tons of ping tasks
> pending. (i replaced 10.1 with the IP of another box on the same LAN as
> the testbox) Is this what should happen?

Yes, sometimes they start pending and sometimes they run immediately.

> > The problem is that consecutive runs don't give consistent results and
> > sometimes stalls.  You may want to try that.
>
> well, there's a natural saturation point after a few hundred tasks
> (depending on your CPU's speed), at which point there's no idle time
> left. From that point on things get slower progressively (and the
> ability of the shell to start new ping tasks is impacted as well), but
> that's expected on an overloaded system, isn't it?

Of course, things should get slower with higher load, but it should be 
consistent without stalls.

To see this problem, make sure you boot into /bin/sh with the normal VGA 
console (ie. not fb-console).  Then try each loop a few times to show 
different behaviour; loops like:

# for ((i=0; i<; i++)); do ping 10.1 -A > /dev/null & done

# for ((i=0; i<; i++)); do nice -99 ping 10.1 -A > /dev/null & done

# { for ((i=0; i<; i++)); do
ping 10.1 -A > /dev/null &
done } > /dev/null 2>&1

Especially the last one sometimes causes a complete console lock-up, while 
the other two sometimes stall then surge periodically.

BTW, I am also wondering how one might test threading behaviour wrt 
startup and sync-on-exit with parent thread.  This may not show any problems 
with small number of threads, but how does it scale with 1K+?
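One way to probe thread startup and sync-on-exit is to time how long a parent takes to create N threads and then join them all, and watch how that figure grows as N goes into the thousands. A minimal POSIX-threads sketch; `spawn_and_join()` and its timing approach are our own illustration, not a test from the thread, and on systems with a low thread limit `pthread_create()` may start failing for large N (the function then reports how many it managed):

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Trivial thread body: exit at once so only startup/join is measured. */
static void *worker(void *arg)
{
	return arg;
}

/*
 * Create n threads, then join them all. Returns the number of threads
 * created and joined; stores the elapsed wall-clock time in *secs.
 */
static int spawn_and_join(int n, double *secs)
{
	pthread_t *tids = malloc(sizeof(*tids) * (size_t)n);
	struct timespec t0, t1;
	int created = 0;

	if (!tids)
		return 0;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (created < n &&
	       pthread_create(&tids[created], NULL, worker, NULL) == 0)
		created++;
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	free(tids);

	*secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	return created;
}
```

Running it for N = 100, 1000 and 2000 and checking whether the time grows roughly linearly is one way to spot a scaling problem.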


Thanks!

--
Al



[patch] add some Blackfin specific checks to checkpatch.pl

2007-08-21 Thread Mike Frysinger
Check for a few common errors in Blackfin-specific code wrt MMR loading in
assembly and doing core/system syncs.

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
CC: Bryan Wu <[EMAIL PROTECTED]>
CC: Andy Whitcroft <[EMAIL PROTECTED]>
---
 scripts/checkpatch.pl |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dae7d30..ead9675 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -486,9 +486,29 @@ sub process {
WARN("line over 80 characters\n" . $herecurr);
}
 
+# Blackfin: use hi/lo macros
+   if ($line =~ /\.[lL][[:space:]]*=.*&[[:space:]]*0x[fF][fF][fF][fF]/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the LO() macro, not (... & 0x)\n" . $herevet);
+   }
+   if ($line =~ /\.[hH][[:space:]]*=.*>>[[:space:]]*16/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the HI() macro, not (... >> 16)\n" . $herevet);
+   }
+
 # check we are in a valid source file *.[hc] if not then ignore this hunk
next if ($realfile !~ /\.[hc]$/);
 
+# Blackfin: don't use __builtin_bfin_[cs]sync
+   if ($line =~ /__builtin_bfin_csync/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the CSYNC() macro in asm/blackfin.h\n" . $herevet);
+   }
+   if ($line =~ /__builtin_bfin_ssync/) {
+   my $herevet = "$here\n" . cat_vet($line) . "\n";
+   ERROR("use the SSYNC() macro in asm/blackfin.h\n" . $herevet);
+   }
+
 # at the beginning of a line any tabs must come first and anything
 # more than 8 must use tabs.
if ($line=~/^\+\s* \t\s*\S/ or $line=~/^\+\s*\s*/) {
-- 
1.5.3.rc5
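To see which assembly lines the two new hi/lo checks flag, the patterns can be tried outside Perl. A minimal C sketch using POSIX `regcomp()`/`regexec()`; the patterns are copied from the patch (POSIX extended syntax accepts them as written), while `line_matches()` and the sample Blackfin lines in the test are made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Patterns from the checkpatch.pl hunk, as POSIX extended regexes. */
static const char *const lo_pattern =
	"\\.[lL][[:space:]]*=.*&[[:space:]]*0x[fF][fF][fF][fF]";
static const char *const hi_pattern =
	"\\.[hH][[:space:]]*=.*>>[[:space:]]*16";

/* Return true if 'line' matches 'pattern' (false on a bad pattern too). */
static bool line_matches(const char *pattern, const char *line)
{
	regex_t re;
	bool hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```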


Restricting CDC-ACM devices

2007-08-21 Thread Nate
I would like to use the cdc-acm driver in the Linux kernel (2.6.22-rc1),
but restrict the access to only my VID/PID devices.  Is there an easy way
to do this without modifying cdc-acm.c?

In a past prototype I made a simple wrapper driver for usb serial by
adding my VID/PID numbers to the wrapper driver's id_table.  Then when
that usb driver was accessed on connection, the driver just pointed to the
usb_serial_* functions (probe, disconnect, etc).  I tried to do the same
with the cdc-acm driver, but the cdc-acm driver's probe function was
called before my driver's probe.  I noticed that the cdc-acm driver will
attach when it detects the two CDC-ACM interfaces, so I removed the
cdc-acm driver with "make menuconfig".  This didn't work because the
cdc-acm functions I was attempting to call from my driver do not exist.

Thanks for the help,
-Nate


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Add stack checking for Blackfin

2007-08-21 Thread Mike Frysinger
Simply fill out the bits in checkstack.pl for Blackfin.  I thought I already
sent this, but I don't see it in -mm anywhere ...

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
CC: Bryan Wu <[EMAIL PROTECTED]>
---
 scripts/checkstack.pl |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/scripts/checkstack.pl b/scripts/checkstack.pl
index f7844f6..9226381 100755
--- a/scripts/checkstack.pl
+++ b/scripts/checkstack.pl
@@ -73,6 +73,9 @@ my (@stack, $re, $x, $xs);
# pair for larger users. -- PFM.
#a00048e0:   d4fc40f0addi.l  r15,-240,r15
$re = qr/.*addi\.l.*r15,-(([0-9]{2}|[3-9])[0-9]{2}),r15/o;
+   } elsif ($arch =~ /^blackfin$/) {
+   #   0:   00 e8 38 01 LINK 0x4e0;
+   $re = qr/.*[[:space:]]LINK[[:space:]]*(0x$x{1,8})/o;
} else {
print("wrong or unknown architecture\n");
exit
-- 
1.5.3.rc5
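The new checkstack.pl branch pulls the frame size out of the operand of the Blackfin `LINK` instruction in objdump output. The same extraction, sketched in C with POSIX regex: `link_frame_size()` is a hypothetical helper, `[0-9a-f]` stands in for the script's `$x` (assumed here to be a lowercase hex-digit class), and the test reuses the sample line from the patch comment:

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdlib.h>

/*
 * Return the LINK operand from an objdump line such as
 * "   0:   00 e8 38 01 LINK 0x4e0;", or -1 if there is none.
 */
static long link_frame_size(const char *line)
{
	regex_t re;
	regmatch_t m[2];
	long size = -1;

	if (regcomp(&re, "[[:space:]]LINK[[:space:]]*(0x[0-9a-f]{1,8})",
		    REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, line, 2, m, 0) == 0)
		/* strtol() in base 16 accepts the leading "0x". */
		size = strtol(line + m[1].rm_so, NULL, 16);
	regfree(&re);
	return size;
}
```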


Re: [PATCH 2/4] Fix mainline filesystems to handle ATTR_KILL_ bits correctly

2007-08-21 Thread Jeff Layton
On Tue, 21 Aug 2007 17:21:28 -0400
Josef Sipek <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 21, 2007 at 07:35:51AM -0400, Jeff Layton wrote:
> > On Tue, 21 Aug 2007 15:35:08 +1000
> > Timothy Shimmin <[EMAIL PROTECTED]> wrote:
> > 
> > > Jeff Layton wrote:
> > > > This should fix all of the filesystems in the mainline kernels to handle
> > > > ATTR_KILL_SUID and ATTR_KILL_SGID correctly. For most of them, this is
> > > > just a matter of making sure that they call generic_attrkill early in
> > > > the setattr inode op.
> > > > 
> > > > Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
> > > > ---
> > > >  fs/xfs/linux-2.6/xfs_iops.c   |5 -
> > > > --- a/fs/xfs/linux-2.6/xfs_iops.c
> > > > +++ b/fs/xfs/linux-2.6/xfs_iops.c
> > > > @@ -651,12 +651,15 @@ xfs_vn_setattr(
> > > > struct iattr*attr)
> > > >  {
> > > > struct inode*inode = dentry->d_inode;
> > > > -   unsigned intia_valid = attr->ia_valid;
> > > > +   unsigned intia_valid;
> > > > bhv_vnode_t *vp = vn_from_inode(inode);
> > > > bhv_vattr_t vattr = { 0 };
> > > > int flags = 0;
> > > > int error;
> > > >  
> > > > +   generic_attrkill(inode->i_mode, attr);
> > > > +   ia_valid = attr->ia_valid;
> > > > +
> > > > if (ia_valid & ATTR_UID) {
> > > > vattr.va_mask |= XFS_AT_UID;
> > > > vattr.va_uid = attr->ia_uid;
> > > 
> > > Looks reasonable to me for XFS.
> > > Acked-by: Tim Shimmin <[EMAIL PROTECTED]>
> > > 
> > > So before, this clearing would happen directly in notify_change()
> > > and now this won't happen until notify_change() calls i_op->setattr
> > > which for a particular fs it can call generic_attrkill() to do it.
> > > So I guess for the cases where i_op->setattr is called outside of
> > > via notify_change, we don't normally have ATTR_KILL_SUID/SGID
> > > set so that nothing will happen there?
> > 
> > Right. If neither ATTR_KILL bit is set then generic_attrkill is a
> > noop.
> > 
> > > I guess just wondering the effect with having the code on all
> > > setattr's. (I'm not familiar with the code path)
> > > 
> > 
> > These bits are referenced in very few places in the current kernel
> > tree -- mostly in the VFS layer. The *only* place I see that they
> > actually get interpreted into a mode change is in notify_change. So
> > places that call setattr ops w/o going through notify_change are
> > not likely to have those bits set.
> > 
> > But hypothetically, if a fs did set ATTR_KILL_* and call setattr
> > directly, then the setattr would now include a mode change that
> > clears setuid or setgid bits where it may not have before.
> 

I should probably clarify -- in the hypothetical situation above,
the setattr function would have to call generic_attrkill (as most
filesystems should do with this change).

> It almost sounds like an argument for a new inode op (NULL would use
> generic_attr_kill).
> 

That's not a bad idea at all. I suppose that would be easier than
modifying every fs like this, and it does seem like it might be
cleaner. I need to mull it over, but that might be the best
solution.
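The ATTR_KILL_SUID/ATTR_KILL_SGID bits ask for the setuid/setgid bits to be cleared as part of the setattr, and the helper turns that request into an explicit mode change. A userspace model of that translation, written from our reading of the notify_change() logic of this era rather than copied from `generic_attrkill()`; `struct iattr_model`, `attrkill_model()` and the `M_ATTR_*` flag values are hypothetical. Note the setgid case: the bit is only dropped when group-execute is also set, because setgid without group-execute marks a file for mandatory locking rather than a setgid program:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical flag values; the kernel's ATTR_* constants differ. */
#define M_ATTR_MODE       (1u << 0)
#define M_ATTR_KILL_SUID  (1u << 1)
#define M_ATTR_KILL_SGID  (1u << 2)

struct iattr_model {
	unsigned int ia_valid;
	mode_t ia_mode;
};

/* Turn KILL_SUID/KILL_SGID requests into an explicit mode change. */
static void attrkill_model(mode_t i_mode, struct iattr_model *attr)
{
	unsigned int valid = attr->ia_valid;
	mode_t clear = 0;

	if ((valid & M_ATTR_KILL_SUID) && (i_mode & S_ISUID))
		clear |= S_ISUID;
	/* setgid without group-exec means mandatory locking: leave it */
	if ((valid & M_ATTR_KILL_SGID) &&
	    (i_mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
		clear |= S_ISGID;

	attr->ia_valid &= ~(M_ATTR_KILL_SUID | M_ATTR_KILL_SGID);
	if (!clear)
		return;			/* nothing to do: a no-op */
	if (!(valid & M_ATTR_MODE)) {	/* start from the current mode */
		attr->ia_valid |= M_ATTR_MODE;
		attr->ia_mode = i_mode;
	}
	attr->ia_mode &= ~clear;
}
```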

-- 
Jeff Layton <[EMAIL PROTECTED]>


Re: Problems with IDE on linux 2.6.22.X

2007-08-21 Thread Rene Herman

On 08/21/2007 09:49 PM, José Luis Patiño Andrés wrote:

> Somebody told me that I can solve this problem by unchecking the
> IDE_GENERIC option in the kernel configuration. It's true, but when I do
> this the DVD device is not recognized by the kernel. It doesn't exist.

The OpenSuSE Live CD thing not booting may mean you have a deeper problem,
but please note that you shouldn't be using ide-disk:

> In my working 2.6.20.15 kernel, the 'cat /proc/ide/drivers' command
> outputs this:
> 
> ###
> #ide-disk   version 1.18
> #ide-cdrom  version 4.61
> ###
> 
> But in 2.6.22.X, the output is only:
> ###
> #ide-disk   version 1.18
> ###


You have a SATA harddrive (Hitachi Travelstar 5K100 100GB SATA/2.5") and an
IDE (also known as PATA) DVD drive (LG GMA-4082N). That is, your disk should 
be driven by the:


"Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support"

under the "Serial ATA (prod) and Parallel ATA (experimental) drivers" menu, 
and it seems this driver should also take care of your DVD. Not sure from 
your report what you are using -- first try with only that driver, and 
nothing from the old "ATA/ATAPI/MFM/RLL support" menu selected.


In that situation, your harddrive works, but your DVD does not?

If so, this should be fixed in the driver, but to get things working I 
believe you may try with both the above driver for your harddisk and the old 
IDE driver for the DVD:


<*>   Enhanced IDE/MFM/RLL disk/cdrom/tape/floppy support
<*> Include IDE/ATAPI CDROM support (NEW)
[*] PCI IDE chipset support
[*] Generic PCI bus-master DMA support
<*>   Intel PIIXn chipsets support

(do not select IDE/ATA-2 disk support)

where you may need to boot with a "libata.atapi_enabled=0" kernel parameter.

I'm not particularly sure that works, given that it seems to be the same 
chip and all, but anyway, please first verify results with only that 
SATA driver.


Rene.


Re: RFC: drop support for gcc < 4.0

2007-08-21 Thread Adrian Bunk
On Tue, Aug 21, 2007 at 04:49:38PM -0500, James Bottomley wrote:
> On Tue, 2007-08-21 at 23:21 +0200, Adrian Bunk wrote:
> > On Tue, Aug 21, 2007 at 10:49:49PM +0200, Segher Boessenkool wrote:
> > >> How many people e.g. test -rc kernels compiled with gcc 3.2?
> > >
> > > Why would that matter?  It either works or not.  If it doesn't
> > > work, it can either be fixed, or support for that old compiler
> > > version can be removed.
> > 
> > One bug report "kernel doesn't work / crash / ... when compiled with
> > gcc 3.2, but works when compiled with gcc 4.2" will most likely be lost 
> > in the big pile of unhandled bugs, not cause the removal of gcc 3.2 
> > support...
> 
> What's the bugzilla or pointer to this report please?  Those of us who
> use gcc-3 as the default kernel compiler will take it seriously (if it
> looks to have an impact to our kernel builds) otherwise we can tell you
> it's unreproducible/not a problem etc.

This was an example in response to Segher's point that we would remove 
support for a gcc version in such a case.

I remember we had such issues, but I don't find any pointer to a 
specific one at the moment.

I'll keep you informed when bug reports come in that only occur with 
older gcc versions and that aren't easily fixable.

> James

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


