Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

2007-05-10 Thread Nicolas Mailhot
On Thursday 10 May 2007 at 16:01 -0700, Christoph Lameter wrote:
> On Fri, 11 May 2007, Mel Gorman wrote:
> 
> > Nicholas, could you backout the patch
> > dont-group-high-order-atomic-allocations.patch and test again please?
> > The following patch has the same effect. Thanks
> 
> Great! Thanks.

The proposed patch did not apply:

+ cd /builddir/build/BUILD
+ rm -rf linux-2.6.21
+ /usr/bin/bzip2 -dc /builddir/build/SOURCES/linux-2.6.21.tar.bz2
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd linux-2.6.21
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ echo 'Patch #2 (2.6.21-mm2.bz2):'
Patch #2 (2.6.21-mm2.bz2):
+ /usr/bin/bzip2 -d
+ patch -p1 -s
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ echo 'Patch #3 (md-improve-partition-detection-in-md-array.patch):'
Patch #3 (md-improve-partition-detection-in-md-array.patch):
+ patch -p1 -R -s
+ echo 'Patch #4 (bug-8464.patch):'
Patch #4 (bug-8464.patch):
+ patch -p1 -s
1 out of 1 hunk FAILED -- saving rejects to file include/linux/pageblock-flags.h.rej
6 out of 6 hunks FAILED -- saving rejects to file mm/page_alloc.c.rej

Backing out dont-group-high-order-atomic-allocations.patch worked and
seems to have cured the system so far (I need to load it a bit longer to
be sure).

-- 
Nicolas Mailhot




Re: [PATCH] UDF: check for allocated memory for inode data

2007-05-10 Thread Cyrill Gorcunov
[Andrew Morton - Thu, May 10, 2007 at 03:46:40PM -0700]

[...snip...] 

| But please let's not add three copies of identical code.  Do something like:

[...snip...]

Thanks for comments, Andrew. Let me rewrite the patch...

Cyrill

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG][debian-2.6.20-1-686] bridging + vlans + "vconfig rem" == stuck kernel

2007-05-10 Thread Kyle Moffett

On May 10, 2007, at 00:34:11, Kyle Moffett wrote:
> On May 10, 2007, at 00:25:54, Ben Greear wrote:
>> Looks like a deadlock in the vlan code.  Any chance you can run
>> this test with lockdep enabled?
>>
>> You could also add a printk in vlan_device_event() to check which
>> event it is hanging on, and the netdevice that is passed in.
>
> Ok, I'll try building a 2.6.21 kernel with lockdep and some
> debugging printk()s in the vlan_device_event() function and get
> back to you tomorrow.  Thanks for the quick response!
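
For reference, the kind of debugging output suggested for vlan_device_event() could be sketched in user space roughly as follows. This is only an illustration: the helper names are hypothetical, the event values are assumed to mirror the kernel's NETDEV_* constants, and a real patch would call printk() inside the notifier rather than snprintf().

```c
#include <stdio.h>
#include <string.h>

/* Assumed event values, mirroring the kernel's NETDEV_* constants. */
enum {
    EV_NETDEV_UP         = 0x0001,
    EV_NETDEV_DOWN       = 0x0002,
    EV_NETDEV_CHANGE     = 0x0004,
    EV_NETDEV_UNREGISTER = 0x0006
};

/* Hypothetical helper: turn an event code into a printable name. */
static const char *netdev_event_name(unsigned long event)
{
    switch (event) {
    case EV_NETDEV_UP:         return "NETDEV_UP";
    case EV_NETDEV_DOWN:       return "NETDEV_DOWN";
    case EV_NETDEV_CHANGE:     return "NETDEV_CHANGE";
    case EV_NETDEV_UNREGISTER: return "NETDEV_UNREGISTER";
    default:                   return "UNKNOWN";
    }
}

/* Hypothetical helper: build the line a debugging printk() would emit,
 * naming both the event and the device that was passed in. */
static int format_event_line(char *buf, size_t len,
                             unsigned long event, const char *dev_name)
{
    return snprintf(buf, len, "vlan_device_event: %s on %s",
                    netdev_event_name(event), dev_name);
}
```

Logging the event name together with the device name makes it possible to see which notification was the last one delivered before the hang.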


Progress!!!  I built a 2.6.21.1 kernel with a 1MB dmesg buffer,  
almost all of the locking debugging options on (as well as a few  
others just for kicks), a VLAN debug #define turned on in the net/ 
8021q/vlan.h file, and lots of extra debugging messages added to the  
functions in vlan.c.  My initial interpretation is that due to the  
funny order in which "ifdown -a" takes down interfaces, it tries to  
delete the VLAN interfaces before the bridges running atop them have  
been taken down.  Ordinarily this seems to work, but when the  
underlying physical ethernet is down already, the last VLAN to be  
deleted seems to hang somehow.  The full results are as follows:


The lock dependency validator at startup passes all 218 testcases,  
indicating that all the locking crap is probably working correctly  
(those debug options chew up another meg of RAM).


ifup -a brings up the interfaces in this order (See previous email  
for configuration details):

lo net0 wfi0 world0 lan lan:0 world

ifdown -a appears to bring them down in the same order (at least,  
until it gets stuck).


Attached below is filtered debugging information.  I cut out 90% of  
the crap in the syslog, but there's still a lot left over to sift  
through; sorry.  If you want my .config or the full text of the log  
then email me privately and I'll send it to you, as it's kinda big.


I appreciate any advice, thanks for all your help

Cheers,
Kyle Moffett

This first bit is the "ifup -a -v -i interfaces":

ADDRCONF(NETDEV_UP): net0: link is not ready
vlan_ioctl_handler: args.cmd: 6
vlan_ioctl_handler: args.cmd: 0
register_vlan_device: if_name -:net0:-^Ivid: 2
About to allocate name, vlan_name_type: 3
Allocated new name -:net0.2:-
About to go find the group for idx: 2
vlan_transfer_operstate: net0 state transition applies to net0.2 too:
vlan_proc_add, device -:net0.2:- being added.
Allocated new device successfully, returning.
wfi0: add 33:33:00:00:00:01 mcast address to master interface
wfi0: add 01:00:5e:00:00:01 mcast address to master interface
ADDRCONF(NETDEV_UP): wfi0: link is not ready
vlan_ioctl_handler: args.cmd: 6
vlan_ioctl_handler: args.cmd: 0
register_vlan_device: if_name -:net0:-^Ivid: 4094
About to allocate name, vlan_name_type: 3
Allocated new name -:net0.4094:-
About to go find the group for idx: 2
vlan_transfer_operstate: net0 state transition applies to net0.4094 too:
vlan_proc_add, device -:net0.4094:- being added.
Allocated new device successfully, returning.
world0: add 33:33:00:00:00:01 mcast address to master interface
world0: add 01:00:5e:00:00:01 mcast address to master interface
ADDRCONF(NETDEV_UP): world0: link is not ready
tg3: net0: Link is up at 1000 Mbps, full duplex.
tg3: net0: Flow control is on for TX and on for RX.
ADDRCONF(NETDEV_CHANGE): net0: link becomes ready
Propagating NETDEV_CHANGE for device net0...
... to wfi0
vlan_transfer_operstate: net0 state transition applies to wfi0 too:
...found a carrier, applying to VLAN device
... to world0
vlan_transfer_operstate: net0 state transition applies to world0 too:
...found a carrier, applying to VLAN device
lan: port 1(net0) entering listening state
ADDRCONF(NETDEV_CHANGE): wfi0: link becomes ready
wfi0: dev_set_promiscuity(master, 1)
wfi0: add 33:33:ff:5f:60:92 mcast address to master interface
lan: port 2(wfi0) entering listening state
ADDRCONF(NETDEV_CHANGE): world0: link becomes ready
world0: add 33:33:ff:91:e2:4c mcast address to master interface
lan: no IPv6 routers present
world: no IPv6 routers present
net0: no IPv6 routers present
world0: no IPv6 routers present
wfi0: no IPv6 routers present
lan: port 1(net0) entering learning state
lan: port 2(wfi0) entering learning state
lan: topology change detected, propagating
lan: port 1(net0) entering forwarding state
lan: topology change detected, propagating
lan: port 2(wfi0) entering forwarding state


This bit is for "ifdown -a -v -i interfaces":

Propagating NETDEV_DOWN for device net0...
... to wfi0
wfi0: del 33:33:ff:5f:60:92 mcast address from vlan interface
wfi0: del 33:33:ff:5f:60:92 mcast address from master interface
wfi0: del 01:00:5e:00:00:01 mcast address from vlan interface
wfi0: del 01:00:5e:00:00:01 mcast address from master interface
wfi0: del 33:33:00:00:00:01 mcast address from vlan interface
wfi0: del 33:33:00:00:00:01 mcast address from master interface
lan: port 2(wfi0) entering disabled state
... to world0

[PATCH] PowerPC64 symbols start with '.'

2007-05-10 Thread Stephen Rothwell
which we want to skip during modpost processing.  We need this to make
some of the whitelisting work.

Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
---
 scripts/mod/modpost.c |   18 +-
 1 files changed, 17 insertions(+), 1 deletions(-)

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 113dc77..748b058 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -164,7 +164,13 @@ static inline unsigned int tdb_hash(const char *name)
 static struct symbol *alloc_symbol(const char *name, unsigned int weak,
                                    struct symbol *next)
 {
-   struct symbol *s = NOFAIL(malloc(sizeof(*s) + strlen(name) + 1));
+   struct symbol *s;
+
+   /* For our purposes, .foo matches foo.  PPC64 needs this. */
+   if (name[0] == '.')
+       name++;
+
+   s = NOFAIL(malloc(sizeof(*s) + strlen(name) + 1));
 
    memset(s, 0, sizeof(*s));
    strcpy(s->name, name);
@@ -180,6 +186,10 @@ static struct symbol *new_symbol(const char *name, struct module *module,
    unsigned int hash;
    struct symbol *new;
 
+   /* For our purposes, .foo matches foo.  PPC64 needs this. */
+   if (name[0] == '.')
+       name++;
+
    hash = tdb_hash(name) % SYMBOL_HASH_SIZE;
    new = symbolhash[hash] = alloc_symbol(name, 0, symbolhash[hash]);
    new->module = module;
@@ -684,6 +694,12 @@ static int secref_whitelist(const char *modname, const char *tosec,
        NULL
    };
 
+   /* For our purposes, .foo matches foo.  PPC64 needs this. */
+   if (atsym[0] == '.')
+       atsym++;
+   if (refsymname[0] == '.')
+       refsymname++;
+
    /* Check for pattern 1 */
    if (strcmp(tosec, ".init.data") != 0)
        f1 = 0;
-- 
1.5.1.4
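
The normalization the patch repeats in three places can also be looked at in isolation. The following is a user-space sketch only; skip_dot() and same_symbol() are hypothetical names and are not part of the patch or of modpost.

```c
#include <string.h>

/* Hypothetical helper (not in the patch): drop one leading '.', so that
 * a PPC64-style ".foo" is treated the same as "foo". */
static const char *skip_dot(const char *name)
{
    return (name[0] == '.') ? name + 1 : name;
}

/* Hypothetical helper: compare two symbol names, ignoring a single
 * leading '.' on either side. */
static int same_symbol(const char *a, const char *b)
{
    return strcmp(skip_dot(a), skip_dot(b)) == 0;
}
```

Only one dot is stripped, matching the `name[0] == '.'` test in the diff, so "..foo" would still differ from "foo".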



Re: [PATCH 1/12] crypto: don't pollute the global namespace with sg_next()

2007-05-10 Thread Benny Halevy
Jens Axboe wrote:
> On Thu, May 10 2007, Benny Halevy wrote:
>> Jens Axboe wrote:
>>> It's a subsystem function, prefix it as such.
>> Jens, Boaz and I talked about this over lunch.
>> I wonder whether the crypto code must use your implementation
>> instead of its own as it needs to go over the sglist, e.g. for
>> calculating iscsi (data) digest.
> 
> The thought did cross my mind, and yes I think that would be a good
> idea. The whole thing should probably just migrate to
> lib/scattersomething.c
> 
>> The crypto implementation of chained sglists in crypto/scatterwalk.h
>> determines the chain link by !sg->length which will sorta work
>> with your implementation, however the marker bit on page pointer must
>> be cleared to use it.
> 
> I don't like using sg->length, as that may be modified for legitimate
> reasons. That's why I chose to use the lsb of the page pointer.
> 
>> Also, is it possible that after the original sglist has gone through
>> dma_map_sg and entries were merged, some entries will have zero
>> length?  I'm not sure... If so, if the crypto implementation scans
>> the sg list after it was dma mapped (maybe in a retry path) it
>> may hit an entry that looks to it like a chaining link.  This
>> might be an existing bug and another reason for the crypto code
>> to use your implementation.
> 
> It's hard to say, depends heavily on the sub system or arch. Even if
> using the pointer tagging mechanism seems a bit nasty, I think it's the
> more resilient approach.
> 

We're in agreement then :)
I was trying to say that the methods should be compatible, otherwise
bugs can happen, and that your scheme is better since it can
handle sglists with zero-length entries that aren't the last one,
a case that might be valid after DMA mapping and merging.
If this case is indeed possible, this seems to be the right time
to converge on your scheme.
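
To make the difference between the two marker schemes concrete, here is a small user-space sketch. It is an assumption-laden model, not the kernel code: struct sg_entry, SG_CHAIN_TAG and all helper names are hypothetical stand-ins for struct scatterlist and the real chaining helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct scatterlist. */
struct sg_entry {
    uintptr_t page_link;  /* page pointer; low bit set marks a chain link */
    unsigned int length;  /* data length; may legitimately be zero */
};

#define SG_CHAIN_TAG ((uintptr_t)1)

/* Length-based test: treats every zero-length entry as a chain link. */
static int is_link_by_length(const struct sg_entry *sg)
{
    return sg->length == 0;
}

/* Tag-based test: only entries with the low pointer bit set are links. */
static int is_link_by_tag(const struct sg_entry *sg)
{
    return (sg->page_link & SG_CHAIN_TAG) != 0;
}

/* Build a tagged link value pointing at the next array of entries. */
static uintptr_t make_link(struct sg_entry *next)
{
    return (uintptr_t)next | SG_CHAIN_TAG;
}

/* Clear the tag bit to recover the pointer to the next array. */
static struct sg_entry *link_target(const struct sg_entry *sg)
{
    return (struct sg_entry *)(sg->page_link & ~SG_CHAIN_TAG);
}

/* Sum the lengths of nents data entries, following tagged links. */
static unsigned int total_length(struct sg_entry *sg, int nents)
{
    unsigned int total = 0;

    while (nents > 0) {
        if (is_link_by_tag(sg)) {
            sg = link_target(sg);
            continue;
        }
        total += sg->length;
        nents--;
        sg++;
    }
    return total;
}
```

In this model a zero-length data entry in the middle of a list is misread as a link by the length test but handled correctly by the tag test, which is the incompatibility described above.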

Benny


Re: slub-i386-support.patch

2007-05-10 Thread Christoph Lameter
On Thu, 10 May 2007, William Lee Irwin III wrote:

> Looking more closely at it, the entire attempt to avoid struct page
> pointers is far beyond pointless. The freeing functions unconditionally
> require struct page pointers to either be passed or computed and the
> allocation function's virtual address it returns as a result is not
> directly usable. The callers all have to do arithmetic on the result.
> One might as well stash precomputed pfn's (if not paddrs) and vaddrs in
> page->private and page->mapping, chain them with ->lru (use only .next
> if you care to stay singly-linked), and handle struct page pointers

Well then you'd have to rewrite the existing ways of fiddling with page 
structs. This way all is clear and you fiddle as you want. It just works. 
Could we get this in? You acked it once already?



[PATCH] drivers/scsi/aic7xxx_old: Convert to generic boolean

2007-05-10 Thread Richard Knutsson
Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
---
Compile-tested with all(yes|mod|no)config on x86(|_64) & sparc(|64)
  got some warnings on some builds, none related to this patch
Diffed against Linus' git-tree.

 aic7xxx_old.c  |  326 ++---
 aic7xxx_old/aic7xxx_proc.c |2
 2 files changed, 161 insertions(+), 167 deletions(-)


diff --git a/drivers/scsi/aic7xxx_old.c b/drivers/scsi/aic7xxx_old.c
index a988d5a..afa5ded 100644
--- a/drivers/scsi/aic7xxx_old.c
+++ b/drivers/scsi/aic7xxx_old.c
@@ -255,12 +255,6 @@
 #define ALL_LUNS -1
 #define MAX_TARGETS  16
 #define MAX_LUNS 8
-#ifndef TRUE
-#  define TRUE 1
-#endif
-#ifndef FALSE
-#  define FALSE 0
-#endif
 
 #if defined(__powerpc__) || defined(__i386__) || defined(__x86_64__)
 #  define MMAPIO
@@ -1382,7 +1376,7 @@ aic7xxx_setup(char *s)
 char *tok, *tok_end, *tok_end2;
 char tok_list[] = { '.', ',', '{', '}', '\0' };
 int i, instance = -1, device = -1;
-unsigned char done = FALSE;
+bool done = false;
 
 base = p;
 tok = base + n + 1;  /* Forward us just past the ':' */
@@ -1410,14 +1404,14 @@ aic7xxx_setup(char *s)
 case ',':
 case '.':
   if (instance == -1)
-done = TRUE;
+done = true;
   else if (device >= 0)
 device++;
   else if (instance >= 0)
 instance++;
   if ( (device >= MAX_TARGETS) || 
(instance >= ARRAY_SIZE(aic7xxx_tag_info)) )
-done = TRUE;
+done = true;
   tok++;
   if (!done)
   {
@@ -1425,10 +1419,10 @@ aic7xxx_setup(char *s)
   }
   break;
 case '\0':
-  done = TRUE;
+  done = true;
   break;
 default:
-  done = TRUE;
+  done = true;
   tok_end = strchr(tok, '\0');
   for(i=0; tok_list[i]; i++)
   {
@@ -1436,7 +1430,7 @@ aic7xxx_setup(char *s)
 if ( (tok_end2) && (tok_end2 < tok_end) )
 {
   tok_end = tok_end2;
-  done = FALSE;
+  done = false;
 }
   }
   if ( (instance >= 0) && (device >= 0) &&
@@ -1512,7 +1506,7 @@ pause_sequencer(struct aic7xxx_host *p)
  *   warrant an easy way to do it.
  *-F*/
 static void
-unpause_sequencer(struct aic7xxx_host *p, int unpause_always)
+unpause_sequencer(struct aic7xxx_host *p, bool unpause_always)
 {
   if (unpause_always ||
   ( !(aic_inb(p, INTSTAT) & (SCSIINT | SEQINT | BRKADRINT)) &&
@@ -1771,7 +1765,7 @@ aic7xxx_loadseq(struct aic7xxx_host *p)
   aic_outb(p, 0, SEQADDR0);
   aic_outb(p, 0, SEQADDR1);
   aic_outb(p, FASTMODE | FAILDIS, SEQCTL);
-  unpause_sequencer(p, TRUE);
+  unpause_sequencer(p, true);
   mdelay(1);
   pause_sequencer(p);
   aic_outb(p, FASTMODE, SEQCTL);
@@ -1820,7 +1814,7 @@ aic7xxx_print_sequencer(struct aic7xxx_host *p, int 
downloaded)
   aic_outb(p, 0, SEQADDR0);
   aic_outb(p, 0, SEQADDR1);
   aic_outb(p, FASTMODE | FAILDIS, SEQCTL);
-  unpause_sequencer(p, TRUE);
+  unpause_sequencer(p, true);
   mdelay(1);
   pause_sequencer(p);
   aic_outb(p, FASTMODE, SEQCTL);
@@ -1868,7 +1862,7 @@ aic7xxx_find_syncrate(struct aic7xxx_host *p, unsigned 
int *period,
   unsigned int maxsync, unsigned char *options)
 {
   struct aic7xxx_syncrate *syncrate;
-  int done = FALSE;
+  bool done = false;
 
   switch(*options)
   {
@@ -1924,7 +1918,7 @@ aic7xxx_find_syncrate(struct aic7xxx_host *p, unsigned 
int *period,
 case MSG_EXT_PPR_OPTION_DT_UNITS:
   if(!(syncrate->sxfr_ultra2 & AHC_SYNCRATE_CRC))
   {
-done = TRUE;
+done = true;
 /*
  * oops, we went too low for the CRC/DualEdge signalling, so
  * clear the options byte
@@ -1938,7 +1932,7 @@ aic7xxx_find_syncrate(struct aic7xxx_host *p, unsigned 
int *period,
   }
   else
   {
-done = TRUE;
+done = true;
 if(syncrate == &aic7xxx_syncrates[maxsync])
 {
   *period = syncrate->period;
@@ -1948,7 +1942,7 @@ aic7xxx_find_syncrate(struct aic7xxx_host *p, unsigned 
int *period,
 default:
   if(!(syncrate->sxfr_ultra2 & AHC_SYNCRATE_CRC))
   {
-done = TRUE;
+done = true;
 if(syncrate == &aic7xxx_syncrates[maxsync])
 {
   *period = syncrate->period;
@@ -2375,22 +2369,22 @@ scbq_insert_tail(volatile scb_queue_type *queue, struct 
aic7xxx_scb *scb)
  *   on the 

Re: [patch 05/10] Linux Kernel Markers - i386 optimized version

2007-05-10 Thread Ananth N Mavinakayanahalli
On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> * Alan Cox ([EMAIL PROTECTED]) wrote:

...
> > > * Third issue : Scalability. Changing code will stop every CPU on the
> > >   system for a while. Compared to this, the int3-based approach will run
> > >   through the breakpoint handler "if" one of the CPU happens to execute
> > >   this code at the wrong time. The standard case is just an IPI (to
> > 
> > If I read the errata right then patching in an int3 will itself trigger
> > the errata so anything could happen.
> > 
> > I believe there are other safe sequences for doing code patching - perhaps
> > one of the Intel folk can advise ?

IIRC, when the first implementation of what exists now as kprobes was
done (as part of the dprobes framework), this question did come up. I
think the conclusion was that the errata applies only to multi-byte
modifications and single-byte changes are guaranteed to be atomic.
Given int3 on Intel is just 1-byte, we are safe.
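
As a rough illustration of why only one byte is involved, the arm/disarm step can be modelled on a plain byte buffer. This is a sketch under stated assumptions: struct probe and both helpers are hypothetical, the buffer is ordinary data rather than live kernel text, and a real implementation such as kprobes also has to deal with page permissions and other CPUs.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC  /* x86 one-byte breakpoint instruction */

/* Hypothetical probe record: where the byte was patched and what it held. */
struct probe {
    uint8_t *addr;
    uint8_t saved;
};

/* Arm: remember the original first byte, then overwrite only that byte. */
static void probe_arm(struct probe *p, uint8_t *addr)
{
    p->addr = addr;
    p->saved = *addr;
    *addr = INT3_OPCODE;
}

/* Disarm: restore the single saved byte. */
static void probe_disarm(struct probe *p)
{
    *p->addr = p->saved;
}
```

Both operations store exactly one byte, so the remaining bytes of the instruction are never in a half-written state.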

> I'll let the Intel guys confirm this, I don't have the reference nearby
> (I got this information by talking with the kprobe team members, and
> they got this information directly from Intel developers) but the
> int3 is the one special case to which the errata does not apply.
> Otherwise, kprobes and gdb would have a big, big issue.

Perhaps Richard/Suparna can confirm.

Ananth


Re: Regression in 2.6.21-mm1 (git-input) on Dell D610 laptop

2007-05-10 Thread Dmitry Torokhov
Hi Remi,

On Thursday 10 May 2007 21:50, Andrew Morton wrote:
> On Thu, 10 May 2007 15:05:25 +0200
> Remi Colinet <[EMAIL PROTECTED]> wrote:
> 
> > My D610 ALPS Glide Point is unresponsive with 2.6.21-mm1 patch.
> > No problem noticed with 2.6.21.
> > 
> > The culprit seems to be git-input. I have applied 2.6.21-mm1 on top of 
> > 2.6.21
> > and then removed git-input patch. It is ok since then.

Have you tried any other -mm? Also, does it help if you stick

ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_SETSTREAM);

at the very beginning of psmouse_initialize() in
drivers/input/mouse/psmouse-base.c?

-- 
Dmitry


Re: x86 setup rewrite tree ready for flamage^W review

2007-05-10 Thread Vivek Goyal
On Tue, May 08, 2007 at 10:15:21PM -0700, H. Peter Anvin wrote:
> Hello all,
> 
> I believe the x86 setup tree is now finished.  I will turn it into a
> "clean patchset" later this week, but I wanted to get flamed^W feedback
> on it first.
> 
> The git tree is at:
> 
> http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=summary
> git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-newsetup.git
> ...
> 
> ... and a flat patch at ...
> 
> http://www.kernel.org/pub/linux/kernel/people/hpa/newsetup-36f021b5.patch
> 

Wow, reading code in C is so much better than decoding assembly. :-)

Had a quick look, mainly from relocatable kernel code point of view. Yet
to dive deeper. 

PHYSICAL_ALIGN needs to be 2MB on x86_64 instead of 1MB.
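
To show what the difference means for a load address, here is a small arithmetic sketch. The macro and function names are hypothetical and are not taken from the setup tree; only the 1MB and 2MB values come from the remark above.

```c
#include <stdint.h>

#define ALIGN_1MB (UINT64_C(1) << 20)
#define ALIGN_2MB (UINT64_C(1) << 21)

/* Round addr up to the next multiple of align.
 * align must be a power of two. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

For example, an address of 0x100000 is already 1MB-aligned but would have to move up to 0x200000 to satisfy a 2MB alignment.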

Thanks
Vivek


Re: 2.6.21 broke arm scsi on qemu -M volatilepb

2007-05-10 Thread Randy Dunlap
[adding linux-scsi]

On Thu, 10 May 2007 23:59:10 -0400 Rob Landley wrote:

> Booting a 2.6.20 kernel under qemu works fine and gets me to a shell prompt, 
> but booting a 2.6.21.1 kernel cycles endlessly on scsi, going:
> 
> Loading iSCSI transport class v2.0-724.
> PCI: enabling device 0000:00:0c.0 (0140 -> 0143)
> sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 0
> sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> sym0: SCSI BUS has been reset.
> scsi0 : sym-2.2.3
> scsi 0:0:0:0: ABORT operation started.
> scsi 0:0:0:0: ABORT operation timed-out.
> scsi 0:0:0:0: DEVICE RESET operation started.
> scsi 0:0:0:0: DEVICE RESET operation timed-out.
> scsi 0:0:0:0: BUS RESET operation started.
> scsi 0:0:0:0: BUS RESET operation timed-out.
> scsi 0:0:0:0: HOST RESET operation started.
> sym0: SCSI BUS has been reset.
> ...
> And so on.
> 
> If you're interested in reproducing this, download the most recent 
> http://landley.net/hg/firmware snapshot (links up top), run "./build.sh 
> armv4l", and when that's done "cd build" and "./run-armv4l.sh".
> 
> Is this a known issue?  A quick google for "arm scsi 2.6.21" didn't turn up 
> anything relevant...
> 
> Rob


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: (hacky) [PATCH] silence MODPOST section mismatch warnings

2007-05-10 Thread Sam Ravnborg
On Thu, May 10, 2007 at 11:18:50PM +0100, Russell King wrote:
> On Fri, May 11, 2007 at 12:16:59AM +0200, Sam Ravnborg wrote:
> > On Thu, May 10, 2007 at 10:59:20PM +0100, Russell King wrote:
> > > file:(section+offset): message
> > 
> > I like the new format - thanks!
> > Did you drop the ':' after the file on purpose?
> 
> Oops, yes.
> 
> > PS. Will apply the patch you submitted in next mail.
> 
> Do you want a patch with added colons?
I will add them locally - no problem.

Sam


2.6.21 broke arm scsi on qemu -M volatilepb

2007-05-10 Thread Rob Landley
Booting a 2.6.20 kernel under qemu works fine and gets me to a shell prompt, 
but booting a 2.6.21.1 kernel cycles endlessly on scsi, going:

Loading iSCSI transport class v2.0-724.
PCI: enabling device 0000:00:0c.0 (0140 -> 0143)
sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 0
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
scsi 0:0:0:0: ABORT operation started.
scsi 0:0:0:0: ABORT operation timed-out.
scsi 0:0:0:0: DEVICE RESET operation started.
scsi 0:0:0:0: DEVICE RESET operation timed-out.
scsi 0:0:0:0: BUS RESET operation started.
scsi 0:0:0:0: BUS RESET operation timed-out.
scsi 0:0:0:0: HOST RESET operation started.
sym0: SCSI BUS has been reset.
...
And so on.

If you're interested in reproducing this, download the most recent 
http://landley.net/hg/firmware snapshot (links up top), run "./build.sh 
armv4l", and when that's done "cd build" and "./run-armv4l.sh".

Is this a known issue?  A quick google for "arm scsi 2.6.21" didn't turn up 
anything relevant...

Rob


3 years since last 2.2 release, why still on kernel.org main page?

2007-05-10 Thread Rob Landley
Out of curiosity, since 2.2 hasn't had a release in 3 years, and the last 
prepatch was 2 years ago, why is its status still on the kernel.org main 
page?

Not exactly something people are checking the status of on a daily basis...

Just wondering...

Rob


exporting variables across modules

2007-05-10 Thread Bhuvan Kumar MITTAL
I have 2 kernel modules. In one module I have 2 global pointers (of type 
unsigned char and int respectively). I'd like to use these 2 pointers in the 
other module. I am adopting the following approach but am not really sure 
whether it's right or not:

1. I have exported both the pointers using EXPORT_SYMBOL(sym1) and 
EXPORT_SYMBOL(sym2)

2. Then in the module in which I want to use these pointers I have declared 
these 2 pointers as global extern variables as:
extern unsigned char * sym1;
extern unsigned int * sym2;

Is this the right way to use these pointers? Kindly guide me.

Regards,

Bhuvan




Re: 2.6.21-mm2 - 100% CPU on ksoftirqd/1

2007-05-10 Thread Valdis . Kletnieks
On Wed, 09 May 2007 12:08:43 EDT, [EMAIL PROTECTED] said:

> On Wed, 09 May 2007 01:23:22 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21/2.6.21-mm2/
> 
> Boots up to multiuser mostly OK.  However...
> 
> It comes up with a screaming ksoftirqd - usually /1 but one boot had /0.
> Just sitting there, 100% CPU according to 'top'.  Tried 'echo t > 
> /proc/sysrq-trigger' to get
> a trace, but it was always running on the other CPU - even after I reniced
> it down to 19 and launched 2 'for(;;)' C programs to suck the cycles.  It 
> would
> be failing to get any CPU - until I did the 'echo t' and then it would be
> "running" again.  Anybody got any good debugging ideas here?

OK, finally tracked this one down - the out-of-tree iwlwifi git tree for the
Intel 3945ABG card had some disagreements with the 2.6.21-mm2 git-wireless.patch

Unfortunately, the last known-working for this was -rc5-mm2, as I didn't test
-rc6-mm* or -rc7-mm* for this (I hit other issues with those so I didn't
notice this one).  I'll try to work up some of those tomorrow and see if
I can narrow it down at least a *little* bit.







Re: [GIT PATCH] ACPI patches for 2.6.22 - part 2

2007-05-10 Thread Linus Torvalds


On Thu, 10 May 2007, Len Brown wrote:
> 
> That said, can you send me or point me to the acpidump output
> for your EVO.  Yes, I'm sure you've sent it before a long time
> ago, but that was about probably 2,000,000 e-mail messages
> and a couple of disk crashes ago:-)

Sure. If you send me a pointer to "acpidump" again, because I've long 
since updated that machine, and no longer have it.

Linus


Re: [PATCH] PCI legacy I/O port free driver - Making Intel e1000 driver legacy I/O port free

2007-05-10 Thread Kok, Auke

Tomohiro Kusumi wrote:

Dear Auke

 > I'm ok with the bottom part of the patch, but I do not like
 > the modification of the pci device ID table in this way. As
 > Arjan van de Ven previously commented as well, this makes
 > it hard for future device ID's to be bound to the driver.

  I googled the previous comment by Arjan. Now I understand
  that the patch makes it difficult to add PCI ID's to the
  driver at runtime.

 > On top of that, there is no logical correlation between the
 > mapping and chipsets, so a lot of information is lost in that
 > table. It really does not show which _chipsets_ support this
 > functionality.

  Thanks for pointing out the problem, but I can't quite understand
  what you are trying to say. What do you mean by the chipset?
  Are you talking about the chipset of the NIC? or the South bridge?
  I'd be glad if you can explain it to me.


Perhaps my wording was poor. I was referring to the NIC chip. Since there are 
about 12 different physical e1000 NIC chips (and lots of different pci IDs per 
e1000 NIC chip), it would be best to record, per NIC chip, whether it is able 
to work without legacy IO mode, instead of providing this mapping based on 
the PCI device ID.


It would serve two purposes: new pci id's for a chipset of which we already know 
that it can work without legacy IO can automatically inherit this property from 
the NIC chipset properties, and new e1000 chips would automatically get a 
default property for this value.


I will (time permitting) try to reverse your matrix to chip numbers and see if 
we can add this property in a much easier way.
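
A per-chip property along these lines might look roughly like the sketch below. Everything in it is made up for illustration: the chip names, the device IDs, the flag values and the conservative default for unknown devices are assumptions, not data from the driver or the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chip families; the real driver knows many more. */
enum chip_type { CHIP_A, CHIP_B, CHIP_UNKNOWN };

/* Per-chip property: does this chip need legacy I/O port access? */
struct chip_info {
    enum chip_type type;
    int needs_ioport;
};

/* Made-up values, not taken from the patch or hardware documentation. */
static const struct chip_info chip_table[] = {
    { CHIP_A, 1 },
    { CHIP_B, 0 },
};

/* Many PCI device IDs map onto one chip. The IDs below are invented. */
struct id_map {
    uint16_t device_id;
    enum chip_type type;
};

static const struct id_map id_table[] = {
    { 0xA001, CHIP_A },
    { 0xA002, CHIP_A },
    { 0xB001, CHIP_B },
};

/* Look up the chip behind a PCI device ID. */
static enum chip_type chip_for_device(uint16_t device_id)
{
    size_t i;

    for (i = 0; i < sizeof(id_table) / sizeof(id_table[0]); i++)
        if (id_table[i].device_id == device_id)
            return id_table[i].type;
    return CHIP_UNKNOWN;
}

/* The property is inherited from the chip; unknown chips fall back to
 * the assumed conservative default of requiring I/O ports. */
static int device_needs_ioport(uint16_t device_id)
{
    enum chip_type type = chip_for_device(device_id);
    size_t i;

    for (i = 0; i < sizeof(chip_table) / sizeof(chip_table[0]); i++)
        if (chip_table[i].type == type)
            return chip_table[i].needs_ioport;
    return 1;
}
```

With this shape, adding a new device ID for a known chip is a one-line change to the ID table and the I/O port property follows automatically.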


Auke



Tomohiro Kusumi


Kok, Auke wrote:

Tomohiro Kusumi wrote:

Hi

As you can see in the "10. pci_enable_device_bars() and Legacy I/O
Port space" of the Documentation/pci.txt, the latest kernel has
interfaces for PCI device drivers to tell the kernel which resource
the driver wants to use, e.g. I/O port or MMIO.

I've made a patch which makes Intel e1000 driver legacy I/O port
free by using the PCI core changes I mentioned above. The Intel
e1000 driver can handle some of its devices without using I/O port.
So this patch changes the driver not to enable/request I/O port
region depending on the device id.

As a result, the driver can handle its device even when there is a
huge number of PCI devices being used on the system and no I/O
port region is assigned to the device.

Tomohiro,

I'm ok with the bottom part of the patch, but I do not like the 
modification of the pci device ID table in this way. As Arjan van de 
Ven previously commented as well, this makes it hard for future device 
ID's to be bound to the driver.


On top of that, there is no logical correlation between the mapping and 
chipsets, so a lot of information is lost in that table. It really does 
not show which _chipsets_ support this functionality.


I think if we want to work with this, we need some way of mapping the 
device ID's back to chipsets, and enable the feature on that basis.


Auke


Tomohiro Kusumi

Signed-off-by: Tomohiro Kusumi <[EMAIL PROTECTED]>

---
 e1000.h      |    6 +-
 e1000_main.c |  152 +++
 2 files changed, 86 insertions(+), 72 deletions(-)

diff -uprN linux-2.6.21.orig/drivers/net/e1000/e1000.h linux-2.6.21/drivers/net/e1000/e1000.h
--- linux-2.6.21.orig/drivers/net/e1000/e1000.h  2007-05-09 18:02:26.0 +0900
+++ linux-2.6.21/drivers/net/e1000/e1000.h  2007-05-09 18:02:59.0 +0900

@@ -74,8 +74,9 @@
 #define BAR_1          1
 #define BAR_5          5

-#define INTEL_E1000_ETHERNET_DEVICE(device_id) {\
-PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id)}
+#define E1000_USE_IOPORT   (1 << 0)
+#define INTEL_E1000_ETHERNET_DEVICE(device_id, flags) {\
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id), .driver_data = flags}

 struct e1000_adapter;

@@ -347,6 +348,7 @@ struct e1000_adapter {
 boolean_t quad_port_a;
 unsigned long flags;
 uint32_t eeprom_wol;
+int bars;   /* BARs to be enabled */
 };

 enum e1000_state_t {
diff -uprN linux-2.6.21.orig/drivers/net/e1000/e1000_main.c linux-2.6.21/drivers/net/e1000/e1000_main.c
--- linux-2.6.21.orig/drivers/net/e1000/e1000_main.c  2007-05-09 18:02:27.0 +0900
+++ linux-2.6.21/drivers/net/e1000/e1000_main.c  2007-05-09 18:03:00.0 +0900

@@ -48,65 +48,65 @@ static char e1000_copyright[] = "Copyrig
  *   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id)}
  */
 static struct pci_device_id e1000_pci_tbl[] = {
-INTEL_E1000_ETHERNET_DEVICE(0x1000),
-INTEL_E1000_ETHERNET_DEVICE(0x1001),
-INTEL_E1000_ETHERNET_DEVICE(0x1004),
-INTEL_E1000_ETHERNET_DEVICE(0x1008),
-INTEL_E1000_ETHERNET_DEVICE(0x1009),
-INTEL_E1000_ETHERNET_DEVICE(0x100C),
-INTEL_E1000_ETHERNET_DEVICE(0x100D),
-INTEL_E1000_ETHERNET_DEVICE(0x100E),
-INTEL_E1000_ETHERNET_DEVICE(0x100F),
-INTEL_E1000_ETHERNET_DEVICE(0x1010),
-

Re: Kconfig warnings on latest GIT

2007-05-10 Thread Simon Horman
On Fri, May 11, 2007 at 11:27:22AM +0900, Simon Horman wrote:
> On Thu, May 10, 2007 at 09:13:34PM -0500, Kumar Gala wrote:
> > On Fri, 11 May 2007, Simon Horman wrote:
> > 
> > > On Thu, May 10, 2007 at 08:47:05PM -0500, Kumar Gala wrote:
> > > > Try this patch:
> > >
> > > That certainly resolves the problem for me.
> > > I'll see about doing something like that for the similar
> > > Kconfig problems that I see.
> > 
> > I've got a similar fix for SYS_SUPPORTS_APM_EMULATION already.  I'll push
> > both of these to Paul.  If you can put something in place for the
> > Atari/68k and send it to Geert that would be good (feeling a little lazy
> > right now :)
> > 
> > I'm still not happy about this fix.  I'd like to get Sam's feeling on if
> > > we can fixup kconfig not to warn if the dependency isn't met.  I think
> > the select is valid, and would prefer to fix this properly before we paper
> > tape over it.
> 
> I agree. I had thought a little about a kconfig fix. Though I'm
> wondering if removing the warning will lead to oodles of dangling
> symbols and invalid checks over time.
> 
> In any case, I'll look into the Atari problem. At least that way
> there will be some patches to add to the discussion.

The fix below seems to work for the ATARI problem.
Do you want me to submit it properly, do you want to
submit it along with the other patches, or do you think
we should sit on things for a bit?

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

From: Simon Horman <[EMAIL PROTECTED]>
Subject: [PATCH] [IA64] ATARI_KBD_CORE only exists on m68k

ATARI_KBD_CORE doesn't exist on architectures other than m68k,
which causes the following warnings:

drivers/input/keyboard/Kconfig:170:warning: 'select' used by config symbol 
'KEYBOARD_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'
drivers/input/mouse/Kconfig:181:warning: 'select' used by config symbol 
'MOUSE_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'

By reversing the Kconfig logic, the same results should occur on
m68k as the current code, but the warnings go away on other platforms.
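
As a rough illustration of why the warning shows up only off m68k, here
is a small userspace C model. It is not kconfig's real code, and the
struct and function names are made up for this sketch. It treats an
architecture as a list of defined symbols and flags every 'select'
whose target is missing from that list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: a "select TARGET" line inside the entry for OWNER. */
struct select_ref {
    const char *owner;
    const char *target;
};

/* Return true if NAME is among the symbols this architecture defines. */
static bool is_defined(const char *const *defined, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(defined[i], name) == 0)
            return true;
    return false;
}

/* Count (and print) selects whose target is undefined on this architecture. */
static int count_select_warnings(const char *const *defined, size_t ndefined,
                                 const struct select_ref *refs, size_t nrefs)
{
    int warnings = 0;

    for (size_t i = 0; i < nrefs; i++) {
        if (!is_defined(defined, ndefined, refs[i].target)) {
            printf("warning: 'select' used by config symbol '%s' "
                   "refers to undefined symbol '%s'\n",
                   refs[i].owner, refs[i].target);
            warnings++;
        }
    }
    return warnings;
}
```

Fed the two selects from KEYBOARD_ATARI and MOUSE_ATARI and a symbol
list that lacks ATARI_KBD_CORE, the model reports two warnings. Once
the selects are dropped and ATARI_KBD_CORE carries the 'default y if'
line itself, there are no select references left to flag.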

Cc: Kumar Gala <[EMAIL PROTECTED]>
Signed-off-by: Simon Horman <[EMAIL PROTECTED]>

--- 
 arch/m68k/Kconfig  |1 +
 drivers/input/keyboard/Kconfig |1 -
 drivers/input/mouse/Kconfig|1 -
 3 files changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/arch/m68k/Kconfig
===
--- linux-2.6.orig/arch/m68k/Kconfig2007-05-11 11:37:25.0 +0900
+++ linux-2.6/arch/m68k/Kconfig 2007-05-11 11:42:48.0 +0900
@@ -410,6 +410,7 @@ config STRAM_PROC
  Say Y here to report ST-RAM usage statistics in /proc/stram.
 
 config ATARI_KBD_CORE
+   default y if KEYBOARD_ATARI || MOUSE_ATARI
bool
 
 config HEARTBEAT
Index: linux-2.6/drivers/input/keyboard/Kconfig
===
--- linux-2.6.orig/drivers/input/keyboard/Kconfig   2007-05-11 
11:37:25.0 +0900
+++ linux-2.6/drivers/input/keyboard/Kconfig2007-05-11 11:42:53.0 
+0900
@@ -167,7 +167,6 @@ config KEYBOARD_AMIGA
 config KEYBOARD_ATARI
tristate "Atari keyboard"
depends on ATARI
-   select ATARI_KBD_CORE
help
  Say Y here if you are running Linux on any Atari and have a keyboard
  attached.
Index: linux-2.6/drivers/input/mouse/Kconfig
===
--- linux-2.6.orig/drivers/input/mouse/Kconfig  2007-05-11 11:40:32.0 
+0900
+++ linux-2.6/drivers/input/mouse/Kconfig   2007-05-11 11:42:58.0 
+0900
@@ -178,7 +178,6 @@ config MOUSE_AMIGA
 config MOUSE_ATARI
tristate "Atari mouse"
depends on ATARI
-   select ATARI_KBD_CORE
help
  Say Y here if you have an Atari and want its native mouse
  supported by the kernel.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] v4l: saa7134: support ir-remote for 10moons TM300

2007-05-10 Thread Tony Wan
Enable the IR-remote of the 10moons TM300 card and add the key-codes for
its remote.

It has been tested using lirc. All the key codes are accepted.

Signed-off-by: Tony Wan <[EMAIL PROTECTED]>
---
 drivers/media/common/ir-keymaps.c   |   69
+++
 drivers/media/video/saa7134/saa7134-cards.c |1 +
 drivers/media/video/saa7134/saa7134-input.c |6 ++
 include/media/ir-common.h   |1 +
 4 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/media/common/ir-keymaps.c
b/drivers/media/common/ir-keymaps.c
index cbd1184..5aa293e 100644
--- a/drivers/media/common/ir-keymaps.c
+++ b/drivers/media/common/ir-keymaps.c
@@ -1783,3 +1783,72 @@ IR_KEYTAB_TYPE ir_codes_tt_1500[IR_KEYTAB_SIZE] =
{
 };
 
 EXPORT_SYMBOL_GPL(ir_codes_tt_1500);
+
+/* 10MOONS TM300 */
+IR_KEYTAB_TYPE ir_codes_10moonstm3[IR_KEYTAB_SIZE] = {
+   [ 0x10 ] = KEY_POWER,   // Power
+   [ 0x0d ] = KEY_MUTE,// Mute
+   [ 0x1e ] = KEY_TUNER,   // Cable
+   [ 0x00 ] = KEY_VIDEO,   // Composite / S-Video
+   [ 0x01 ] = KEY_RADIO,   // Music
+   [ 0x02 ] = KEY_TEXT,// Photo
+
+   [ 0x1f ] = KEY_1,
+   [ 0x03 ] = KEY_2,
+   [ 0x04 ] = KEY_3,
+   [ 0x05 ] = KEY_4,
+   [ 0x1c ] = KEY_5,
+   [ 0x06 ] = KEY_6,
+   [ 0x07 ] = KEY_7,
+   [ 0x08 ] = KEY_8,
+   [ 0x1d ] = KEY_9,
+   [ 0x09 ] = KEY_SELECT,  // 2 digit select (-/--)
+   [ 0x0a ] = KEY_0,
+   [ 0x0b ] = KEY_AGAIN,   // Recall
+
+   [ 0x14 ] = KEY_F1,  // Begin
+   [ 0x15 ] = KEY_F2,  // End
+
+   [ 0x16 ] = KEY_CHANNELUP,   // CH+
+   [ 0x12 ] = KEY_CHANNELDOWN, // CH-
+   [ 0x0c ] = KEY_VOLUMEUP,// VOL+
+   [ 0x17 ] = KEY_VOLUMEDOWN,  // VOL-
+   [ 0x18 ] = KEY_OK,  // OK
+
+   [ 0x0e ] = KEY_EXIT,// Exit
+   [ 0x13 ] = KEY_COMPUTER,// Desktop
+   [ 0x11 ] = KEY_TAB, // TAB
+   [ 0x19 ] = KEY_CYCLEWINDOWS,// Switch task
+
+   [ 0x1a ] = KEY_MENU,// Menu
+   [ 0x1b ] = KEY_ZOOM,// Fullscreen
+   [ 0x24 ] = KEY_ARCHIVE, // Time shifting
+   [ 0x20 ] = KEY_SWITCHVIDEOMODE, // Select source
+
+   [ 0x3a ] = KEY_RECORD,  // Record
+   [ 0x22 ] = KEY_PLAY,// Play/Pause
+   [ 0x25 ] = KEY_STOP,// Stop
+   [ 0x23 ] = KEY_CAMERA,  // Snapshot
+
+   [ 0x28 ] = KEY_BACK,// Backward <<
+   [ 0x2a ] = KEY_FORWARD, // Forward >>
+   [ 0x29 ] = KEY_PREVIOUS,// Back |<<
+   [ 0x2b ] = KEY_NEXT,// End >>|
+
+   [ 0x2c ] = KEY_PROGRAM, // Multi-view
+   [ 0x2d ] = KEY_AUDIO,   // Audio Tracks
+   [ 0x2e ] = KEY_SOUND,   // Sound
+   [ 0x2f ] = KEY_SUBTITLE,// Subtitles
+
+   [ 0x30 ] = KEY_TIME,// Set timer
+   [ 0x31 ] = KEY_CHANNEL, // Stereo
+   [ 0x32 ] = KEY_LANGUAGE,// Language
+   [ 0x33 ] = KEY_TEXT,// Text
+
+   [ 0x39 ] = KEY_RED, // RED
+   [ 0x21 ] = KEY_GREEN,   // GREEN
+   [ 0x27 ] = KEY_YELLOW,  // YELLOW
+   [ 0x37 ] = KEY_BLUE,// BLUE
+};
+
+EXPORT_SYMBOL_GPL(ir_codes_10moonstm3);
diff --git a/drivers/media/video/saa7134/saa7134-cards.c
b/drivers/media/video/saa7134/saa7134-cards.c
index 44f2077..5813509 100644
--- a/drivers/media/video/saa7134/saa7134-cards.c
+++ b/drivers/media/video/saa7134/saa7134-cards.c
@@ -4368,6 +4368,7 @@ int saa7134_board_init1(struct saa7134_dev *dev)
case SAA7134_BOARD_AVERMEDIA_A16AR:
case SAA7134_BOARD_ENCORE_ENLTV:
case SAA7134_BOARD_ENCORE_ENLTV_FM:
+   case SAA7134_BOARD_10MOONSTVMASTER3:
dev->has_remote = SAA7134_REMOTE_GPIO;
break;
case SAA7134_BOARD_FLYDVBS_LR300:
diff --git a/drivers/media/video/saa7134/saa7134-input.c
b/drivers/media/video/saa7134/saa7134-input.c
index c0de37e..c87755b 100644
--- a/drivers/media/video/saa7134/saa7134-input.c
+++ b/drivers/media/video/saa7134/saa7134-input.c
@@ -333,6 +333,12 @@ int saa7134_input_init1(struct saa7134_dev *dev)
mask_keyup   = 0x04;
polling  = 50; // ms
break;
+   case SAA7134_BOARD_10MOONSTVMASTER3:
+   ir_codes = ir_codes_10moonstm3;
+   mask_keycode = 0x4f8;
+   mask_keyup   = 0x800;
+   polling  = 50; //ms
+   break;
}
if (NULL == ir_codes) {
printk("%s: Oops: IR config error [card=%d]\n",
diff --git a/include/media/ir-common.h b/include/media/ir-common.h
index 9807a7c..4e4d207 100644
--- a/include/media/ir-common.h
+++ b/include/media/ir-common.h
@@ -140,6 +140,7 @@ extern IR_KEYTAB_TYPE

Re: [PATCH] PCI legacy I/O port free driver - Making Intel e1000 driver legacy I/O port free

2007-05-10 Thread Tomohiro Kusumi

Dear Auke

> I'm ok with the bottom part of the patch, but I do not like
> the modification of the pci device ID table in this way. As
> Arjan van der Ven previously commented as well, this makes
> it hard for future device ID's to be bound to the driver.

 I googled the previous comment by Arjan. Now I understand
 that the patch makes it difficult to add PCI ID's to the
 driver at runtime.

> On top of that, there is no logical correlation between the
> mapping and chipsets, so a lot of information is lost in that
> table. It really does not show which _chipsets_ support this
> functionality.

 Thanks for pointing out the problem, but I can't quite understand
 what you are trying to say. What do you mean by the chipset?
 Are you talking about the chipset of the NIC? or the South bridge?
 I'd be glad if you can explain it to me.

Tomohiro Kusumi


Kok, Auke wrote:

Tomohiro Kusumi wrote:

Hi

As you can see in section "10. pci_enable_device_bars() and Legacy I/O
Port space" of Documentation/pci.txt, the latest kernel has
interfaces for PCI device drivers to tell the kernel which resources
the driver wants to use, e.g. I/O port or MMIO.

I've made a patch which makes Intel e1000 driver legacy I/O port
free by using the PCI core changes I mentioned above. The Intel
e1000 driver can handle some of its devices without using I/O port.
So this patch changes the driver not to enable/request I/O port
region depending on the device id.

As a result, the driver can handle its device even when there are
a huge number of PCI devices in use on the system and no I/O
port region is assigned to the device.
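
To make the idea concrete, here is a userspace C sketch of choosing
which BARs to enable from a per-device flag. Only E1000_USE_IOPORT
comes from the patch; the MODEL_* constants, the struct and both
functions are hypothetical stand-ins, and the part of the patch that
does the real work is not shown in this mail.

```c
#include <stddef.h>

#define E1000_USE_IOPORT  (1 << 0)   /* flag name taken from the patch */

/* Hypothetical resource-type bits for this model only. */
#define MODEL_RES_IO      0x1ul
#define MODEL_RES_MEM     0x2ul
#define MODEL_NUM_BARS    6

/* Hypothetical stand-in for a PCI device: one type flag per BAR. */
struct model_pci_dev {
    unsigned long bar_flags[MODEL_NUM_BARS];
};

/* Build a bitmask of the BARs whose type matches any bit in 'wanted'. */
static int model_select_bars(const struct model_pci_dev *dev,
                             unsigned long wanted)
{
    int mask = 0;

    for (int i = 0; i < MODEL_NUM_BARS; i++)
        if (dev->bar_flags[i] & wanted)
            mask |= 1 << i;
    return mask;
}

/* Always ask for memory BARs; add I/O BARs only when the flag is set. */
static int e1000_model_bars(const struct model_pci_dev *dev,
                            unsigned long driver_data)
{
    unsigned long wanted = MODEL_RES_MEM;

    if (driver_data & E1000_USE_IOPORT)
        wanted |= MODEL_RES_IO;
    return model_select_bars(dev, wanted);
}
```

For a device with a memory BAR 0 and an I/O BAR 2, the model yields
mask 0x1 without the flag and 0x5 with it, so a device that does not
need port I/O never asks for an I/O region.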


Tomohiro,

I'm ok with the bottom part of the patch, but I do not like the 
modification of the pci device ID table in this way. As Arjan van der 
Ven previously commented as well, this makes it hard for future device 
ID's to be bound to the driver.


On top of that, there is no logical correlation between the mapping and 
chipsets, so a lot of information is lost in that table. It really does 
not show which _chipsets_ support this functionality.


I think if we want to work with this, we need some way of mapping the 
device ID's back to chipsets, and enable the feature on that basis.


Auke



Tomohiro Kusumi

Signed-off-by: Tomohiro Kusumi <[EMAIL PROTECTED]>

---
 e1000.h  |6 +-
 e1000_main.c |  152 
+++

 2 files changed, 86 insertions(+), 72 deletions(-)

diff -uprN linux-2.6.21.orig/drivers/net/e1000/e1000.h 
linux-2.6.21/drivers/net/e1000/e1000.h
--- linux-2.6.21.orig/drivers/net/e1000/e1000.h2007-05-09 
18:02:26.0 +0900
+++ linux-2.6.21/drivers/net/e1000/e1000.h2007-05-09 
18:02:59.0 +0900

@@ -74,8 +74,9 @@
 #define BAR_1 1
 #define BAR_5 5

-#define INTEL_E1000_ETHERNET_DEVICE(device_id) {\
-PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id)}
+#define E1000_USE_IOPORT   (1 << 0)
+#define INTEL_E1000_ETHERNET_DEVICE(device_id, flags) {\
+   PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id), .driver_data = flags}

 struct e1000_adapter;

@@ -347,6 +348,7 @@ struct e1000_adapter {
 boolean_t quad_port_a;
 unsigned long flags;
 uint32_t eeprom_wol;
+int bars;   /* BARs to be enabled */
 };

 enum e1000_state_t {
diff -uprN linux-2.6.21.orig/drivers/net/e1000/e1000_main.c 
linux-2.6.21/drivers/net/e1000/e1000_main.c
--- linux-2.6.21.orig/drivers/net/e1000/e1000_main.c2007-05-09 
18:02:27.0 +0900
+++ linux-2.6.21/drivers/net/e1000/e1000_main.c2007-05-09 
18:03:00.0 +0900

@@ -48,65 +48,65 @@ static char e1000_copyright[] = "Copyrig
  *   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, device_id)}
  */
 static struct pci_device_id e1000_pci_tbl[] = {
-INTEL_E1000_ETHERNET_DEVICE(0x1000),
-INTEL_E1000_ETHERNET_DEVICE(0x1001),
-INTEL_E1000_ETHERNET_DEVICE(0x1004),
-INTEL_E1000_ETHERNET_DEVICE(0x1008),
-INTEL_E1000_ETHERNET_DEVICE(0x1009),
-INTEL_E1000_ETHERNET_DEVICE(0x100C),
-INTEL_E1000_ETHERNET_DEVICE(0x100D),
-INTEL_E1000_ETHERNET_DEVICE(0x100E),
-INTEL_E1000_ETHERNET_DEVICE(0x100F),
-INTEL_E1000_ETHERNET_DEVICE(0x1010),
-INTEL_E1000_ETHERNET_DEVICE(0x1011),
-INTEL_E1000_ETHERNET_DEVICE(0x1012),
-INTEL_E1000_ETHERNET_DEVICE(0x1013),
-INTEL_E1000_ETHERNET_DEVICE(0x1014),
-INTEL_E1000_ETHERNET_DEVICE(0x1015),
-INTEL_E1000_ETHERNET_DEVICE(0x1016),
-INTEL_E1000_ETHERNET_DEVICE(0x1017),
-INTEL_E1000_ETHERNET_DEVICE(0x1018),
-INTEL_E1000_ETHERNET_DEVICE(0x1019),
-INTEL_E1000_ETHERNET_DEVICE(0x101A),
-INTEL_E1000_ETHERNET_DEVICE(0x101D),
-INTEL_E1000_ETHERNET_DEVICE(0x101E),
-INTEL_E1000_ETHERNET_DEVICE(0x1026),
-INTEL_E1000_ETHERNET_DEVICE(0x1027),
-INTEL_E1000_ETHERNET_DEVICE(0x1028),
-INTEL_E1000_ETHERNET_DEVICE(0x1049),
-INTEL_E1000_ETHERNET_DEVICE(0x104A),
-INTEL_E1000_ETHERNET_DEVICE(0x104B),
-INTEL_E1000_ETHERNET_DEVICE(0x104C),
-

Re: Kconfig warnings on latest GIT

2007-05-10 Thread Timur Tabi

Simon Horman wrote:


I agree. I had thought a little about a kconfig fix. Though I'm
wondering if removing the warning will lead to oodles of dangling
symbols and invalid checks over time.


I'm pretty sure it will.  Perhaps we need to have a lint for Kconfig?



Re: [ckrm-tech] [PATCH 3/9] Containers (V9): Add tasks file interface

2007-05-10 Thread Balbir Singh
Paul Menage wrote:
> On 5/8/07, Balbir Singh <[EMAIL PROTECTED]> wrote:
>>
>> I now have a use case for maintaining a per-container task list.
>> I am trying to build a per-container stats similar to taskstats.
>> I intend to support container accounting of
>>
>> 1. Tasks running
>> 2. Tasks stopped
>> 3. Tasks un-interruptible
>> 4. Tasks blocked on IO
>> 5. Tasks sleeping
>>
>> This would provide statistics similar to the patch that Pavel had sent
>> out.
>>
>> I faced the following problems while trying to implement this feature
>>
>> 1. There is no easy way to get a list of all tasks belonging to a
>> container
>>(we need to walk all threads)
> 
> Well, walking the task list is pretty easy - but yes, it could become
> inefficient when there are many small containers in use.
> 
> I've got some ideas for a way of tracking this specifically for
> containers with subsystems that want this, while avoiding the overhead
> for subsystems that don't really need it. I'll try to add them to the
> next patchset.

Super!

> 
>> 2. There is no concept of a container identifier. When a user issues a
>> command
>>to extract statistics, the only unique container identifier is the
>> container
>>path, which means that we need to do a path lookup to determine the
>> dentry
>>for the container (which gets quite ugly with all the string
>> manipulation)
> 
> We could just cache the container path permanently in the container,
> and invalidate it if any of its parents gets renamed. (I imagine this
> happens almost never.)
>

Here's what I have so far: I cache the mount point of the container
and add the container path to it. I'm now stuck examining tasks.
While walking through a bunch of tasks, there is no easy way of
knowing the container path of a task without walking all subsystems
and then extracting the container's absolute path.
 
>>
>>Adding  a container id, will make it easier to find a container and
>> return
>>statistics belonging to the container.
> 
> Not unreasonable, but there are a few questions that would have to be
> answered:
> 
> - how is the container id picked? Like a pid, or user-defined? Or some
> kind of string?
> 

I was planning on using a hierarchical scheme: top 8 bits for
the container hierarchy and bottom 24 for a unique id. The id
is automatically selected. Once we know the container id, we'll
need a more efficient mechanism to map the id to the container.
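
A minimal sketch of that 8/24 split, assuming a 32-bit id. The macro
and helper names are hypothetical; nothing like this exists in the
posted patches.

```c
#include <stdint.h>

/* Assumed layout: [31..24] hierarchy number, [23..0] unique id. */
#define CONTAINER_UNIQUE_BITS 24u
#define CONTAINER_UNIQUE_MASK ((UINT32_C(1) << CONTAINER_UNIQUE_BITS) - 1u)

/* Combine a hierarchy number and a per-hierarchy unique id into one id. */
static uint32_t container_id_pack(uint8_t hierarchy, uint32_t unique)
{
    return ((uint32_t)hierarchy << CONTAINER_UNIQUE_BITS) |
           (unique & CONTAINER_UNIQUE_MASK);
}

/* Extract the top 8 bits: which container hierarchy the id belongs to. */
static uint8_t container_id_hierarchy(uint32_t id)
{
    return (uint8_t)(id >> CONTAINER_UNIQUE_BITS);
}

/* Extract the bottom 24 bits: the id within that hierarchy. */
static uint32_t container_id_unique(uint32_t id)
{
    return id & CONTAINER_UNIQUE_MASK;
}
```

Packing hierarchy 3 with unique id 42 gives 0x0300002a. The split
caps a system at 256 hierarchies and about 16.7 million ids per
hierarchy, and the id-to-container lookup still needs its own
structure.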

> - how would it be exposed to userspace? A generic control file
> provided by the container filesystem in all container directories?
> 

A file in all container directories is an option

> - can you give a more concrete example of how this would actually be
> useful? For your container stats, it seems that just reading a control
> file in the container's directory would give you the stats that you
> want, and userspace already knows the container's name/id since it
> opened the control file.
> 

Sure, the plan is to build a containerstats interface like taskstats.
In taskstats, we exchange data between user space and kernel space
using genetlink sockets. We have a push and pull mechanism for statistics.


> Paul


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: Kconfig warnings on latest GIT

2007-05-10 Thread Simon Horman
On Thu, May 10, 2007 at 09:13:34PM -0500, Kumar Gala wrote:
> On Fri, 11 May 2007, Simon Horman wrote:
> 
> > On Thu, May 10, 2007 at 08:47:05PM -0500, Kumar Gala wrote:
> > > Try this patch:
> >
> > That certainly resolves the problem for me.
> > I'll see about doing something like that for the similar
> > Kconfig problems that I see.
> 
> I've got a similar fix for SYS_SUPPORTS_APM_EMULATION already.  I'll push
> both of these to Paul.  If you can put something in place for the
> Atari/68k and send it to Geert that would be good (feeling a little lazy
> right now :)
> 
> I'm still not happy about this fix.  I'd like to get Sam's feeling on if
> we can fixup kconfig not to warn if the dependency isn't met.  I think
> the select is valid, and would prefer to fix this properly before we paper
> tape over it.

I agree. I had thought a little about a kconfig fix. Though I'm
wondering if removing the warning will lead to oodles of dangling
symbols and invalid checks over time.

In any case, I'll look into the Atari problem. At least that way
there will be some patches to add to the discussion.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



Re: [RFC] Slab allocators: Drop support for destructors

2007-05-10 Thread Paul Mundt
On Fri, May 11, 2007 at 08:35:27AM +0900, Paul Mundt wrote:
> On Thu, May 10, 2007 at 12:00:08PM -0700, Christoph Lameter wrote:
> > As far as I can tell there is only a single slab destructor left (there 
> > is currently another in i386 but it's going to go as soon as Andi merges 
> > i386's support for quicklists).
> > 
> > I wonder how difficult it would be to remove it? If we have no need for 
> > destructors anymore then maybe we could remove destructor support from the 
> > slab allocators? There is no point in checking for destructor uses in 
> > the slab allocators if there are none.
> > 
> > Or are there valid reasons to keep them around? It seems they were mainly 
> > used for list management which required them to take a spinlock. Taking a 
> > spinlock in a destructor is a bit risky since the slab allocators may run 
> > the destructors anytime they decide a slab is no longer needed.
> > 
> > Or do we want to continue to support destructors? If so, why?
> > 
> [snip pmb stuff]
> 
> I'll take a look at tidying up the PMB slab, getting rid of the dtor
> shouldn't be terribly painful. I simply opted to do the list management
> there since others were doing it for the PGD slab cache at the time that
> was written.

And here's the bit for dropping pmb_cache_dtor(), moving the list
management up to pmb_alloc() and pmb_free().

With this applied, we're all set for killing off slab destructors
from the kernel entirely.
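
For readers following along, the pointer-to-pointer list walk that
pmb_list_add() and pmb_list_del() rely on can be sketched in plain
userspace C. The struct and function names below are illustrative
only, and the locking is left out; in the patch the calls sit inside
spin_lock_irq()/spin_unlock_irq() on pmb_list_lock.

```c
#include <stddef.h>

/* Illustrative entry type; only the 'next' link matters here. */
struct entry {
    struct entry *next;
    int id;
};

/* Append 'e' by walking the links until the terminating NULL slot. */
static void entry_list_add(struct entry **head, struct entry *e)
{
    struct entry **p = head;

    while (*p != NULL)
        p = &(*p)->next;

    e->next = NULL;
    *p = e;
}

/* Unlink 'e' if present; returns 1 when it was found, 0 otherwise. */
static int entry_list_del(struct entry **head, struct entry *e)
{
    for (struct entry **p = head; *p != NULL; p = &(*p)->next) {
        if (*p == e) {
            *p = e->next;   /* works for head and interior nodes alike */
            return 1;
        }
    }
    return 0;
}
```

Because 'p' always points at the link being examined, removing the
head needs no special case. Doing this from the alloc and free paths
rather than from a slab destructor keeps the spinlock out of code the
slab allocator may run at times of its own choosing.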

Signed-off-by: Paul Mundt <[EMAIL PROTECTED]>

--

 arch/sh/mm/pmb.c |   79 ++-
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/arch/sh/mm/pmb.c b/arch/sh/mm/pmb.c
index 02aae06..b6a5a33 100644
--- a/arch/sh/mm/pmb.c
+++ b/arch/sh/mm/pmb.c
@@ -3,7 +3,7 @@
  *
  * Privileged Space Mapping Buffer (PMB) Support.
  *
- * Copyright (C) 2005, 2006 Paul Mundt
+ * Copyright (C) 2005, 2006, 2007 Paul Mundt
  *
  * P1/P2 Section mapping definitions from map32.h, which was:
  *
@@ -68,6 +68,32 @@ static inline unsigned long mk_pmb_data(unsigned int entry)
return mk_pmb_entry(entry) | PMB_DATA;
 }
 
+static DEFINE_SPINLOCK(pmb_list_lock);
+static struct pmb_entry *pmb_list;
+
+static inline void pmb_list_add(struct pmb_entry *pmbe)
+{
+   struct pmb_entry **p, *tmp;
+
+   p = &pmb_list;
+   while ((tmp = *p) != NULL)
+   p = &tmp->next;
+
+   pmbe->next = tmp;
+   *p = pmbe;
+}
+
+static inline void pmb_list_del(struct pmb_entry *pmbe)
+{
+   struct pmb_entry **p, *tmp;
+
+   for (p = &pmb_list; (tmp = *p); p = &tmp->next)
+   if (tmp == pmbe) {
+   *p = tmp->next;
+   return;
+   }
+}
+
 struct pmb_entry *pmb_alloc(unsigned long vpn, unsigned long ppn,
unsigned long flags)
 {
@@ -81,11 +107,19 @@ struct pmb_entry *pmb_alloc(unsigned long vpn, unsigned 
long ppn,
pmbe->ppn   = ppn;
pmbe->flags = flags;
 
+   spin_lock_irq(&pmb_list_lock);
+   pmb_list_add(pmbe);
+   spin_unlock_irq(&pmb_list_lock);
+
return pmbe;
 }
 
 void pmb_free(struct pmb_entry *pmbe)
 {
+   spin_lock_irq(&pmb_list_lock);
+   pmb_list_del(pmbe);
+   spin_unlock_irq(&pmb_list_lock);
+
kmem_cache_free(pmb_cache, pmbe);
 }
 
@@ -167,31 +201,6 @@ void clear_pmb_entry(struct pmb_entry *pmbe)
clear_bit(entry, &pmb_map);
 }
 
-static DEFINE_SPINLOCK(pmb_list_lock);
-static struct pmb_entry *pmb_list;
-
-static inline void pmb_list_add(struct pmb_entry *pmbe)
-{
-   struct pmb_entry **p, *tmp;
-
-   p = &pmb_list;
-   while ((tmp = *p) != NULL)
-   p = &tmp->next;
-
-   pmbe->next = tmp;
-   *p = pmbe;
-}
-
-static inline void pmb_list_del(struct pmb_entry *pmbe)
-{
-   struct pmb_entry **p, *tmp;
-
-   for (p = &pmb_list; (tmp = *p); p = &tmp->next)
-   if (tmp == pmbe) {
-   *p = tmp->next;
-   return;
-   }
-}
 
 static struct {
unsigned long size;
@@ -283,25 +292,14 @@ void pmb_unmap(unsigned long addr)
} while (pmbe);
 }
 
-static void pmb_cache_ctor(void *pmb, struct kmem_cache *cachep, unsigned long 
flags)
+static void pmb_cache_ctor(void *pmb, struct kmem_cache *cachep,
+  unsigned long flags)
 {
struct pmb_entry *pmbe = pmb;
 
memset(pmb, 0, sizeof(struct pmb_entry));
 
-   spin_lock_irq(&pmb_list_lock);
-
pmbe->entry = PMB_NO_ENTRY;
-   pmb_list_add(pmbe);
-
-   spin_unlock_irq(&pmb_list_lock);
-}
-
-static void pmb_cache_dtor(void *pmb, struct kmem_cache *cachep, unsigned long 
flags)
-{
-   spin_lock_irq(&pmb_list_lock);
-   pmb_list_del(pmb);
-   spin_unlock_irq(&pmb_list_lock);
 }
 
 static int __init pmb_init(void)
@@ -312,8 +310,7 @@ static int __init pmb_init(void)
BUG_ON(unlikely(nr_entries >= NR_PMB_ENTRIES));
 
pmb_cache = kmem_cache_create("pmb", sizeof(struct pmb_entry), 0,
-  

Re: [PATCH] utimensat implementation

2007-05-10 Thread Neil Brown
On Thursday May 10, [EMAIL PROTECTED] wrote:
> Ulrich Drepper wrote:
> > Neil Brown wrote:
> >> Does it also specify how to find out what granularity is used by the
> >> filesystem?  I had a need for this just recently and couldn't see any
> >> way to extract it.
> > 
> > That's still on the table.  We might end up with an fpathconf() solution.
> 
> OK, the pathconf()-based solution will most probably be in the next 
> POSIX spec.
> 
> Now, somebody has to provide a way to get to this information.  The 
> kernel does not export it so far.  Is it finally time to break down and 
> allow pathconf() and fpathconf() syscalls into the kernel?

Maybe... certainly we want some way to get at this information.

It has occurred to me a number of times that there is no easy way to
export information about filesystems from the kernel.

One specific example is request statistics for an NFS filesystem.  We
can get system-wide statistics, but to get stats for a single
filesystem isn't possible, and a big reason for this is that there is
nowhere to put that information.

Filesystems also have a variety of mount options and they are only
available through "/proc/mounts" and can only be changed by "remount"
which is a bit of a clunky interface.

Just about every other kernel object is, or can be, exposed through
sysfs.  But filesystems cannot.  This is presumably because there is
no unique handle for them (what with name spaces and bind mounts and
so forth).
Each filesystem still has a unique device number (->s_dev), so that
could be used. e.g. we could create
  /sysfs/filesystem/00:03/

which would contain info about the filesystem with device number 0:3.
We could then put time-granularity and other fs-specific info in
there.
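
Purely as an illustration of the naming scheme, a userspace helper
that builds such a directory name from a major/minor pair might look
like this. The function name is hypothetical, no such sysfs directory
exists, and the two-digit hex formatting is my assumption about what
"00:03" is meant to encode.

```c
#include <stdio.h>

/*
 * Format the proposed per-filesystem directory for a device number.
 * Returns what snprintf() returns: the length the full path needs,
 * which is >= len when the buffer was too small.
 */
static int fs_sysfs_path(char *buf, size_t len,
                         unsigned int major, unsigned int minor)
{
    return snprintf(buf, len, "/sysfs/filesystem/%02x:%02x/", major, minor);
}
```

A userspace tool could take the numbers from the st_dev field that
stat(2) returns for any file on the filesystem, which is what makes
the device number usable as a handle despite bind mounts and name
spaces.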

I feel that would be more flexible than a specific fpathconfat system
call.  But would it be enough?
The pathconf values can apparently be different for different files in
a filesystem.  Is that important?  If it is, we really would want
some new syscall rather than just sysfs attributes.

So that makes two questions for anyone with opinions:
 1/ Does pathconf have to be per-file, or is per-filesystem OK?
 2/ Can we have a way to put attributes for filesystems in sysfs?

NeilBrown


Re: Kconfig warnings on latest GIT

2007-05-10 Thread Kumar Gala
On Fri, 11 May 2007, Simon Horman wrote:

> On Thu, May 10, 2007 at 08:47:05PM -0500, Kumar Gala wrote:
> > Try this patch:
>
> That certainly resolves the problem for me.
> I'll see about doing something like that for the similar
> Kconfig problems that I see.
>
> --
> Horms
>   H: http://www.vergenet.net/~horms/
>   W: http://www.valinux.co.jp/en/
>

I've got a similar fix for SYS_SUPPORTS_APM_EMULATION already.  I'll push
both of these to Paul.  If you can put something in place for the
Atari/68k and send it to Geert that would be good (feeling a little lazy
right now :)

I'm still not happy about this fix.  I'd like to get Sam's feeling on if
we can fixup kconfig not to warn if the dependency isn't met.  I think
the select is valid, and would prefer to fix this properly before we paper
tape over it.

- k


[PATCH] powerpc: fix Kconfig 'select' warning with UCC_FAST

2007-05-10 Thread Timur Tabi
The UCC_GETH Kconfig option in drivers/net/Kconfig had a line to select
the UCC_FAST option in arch/powerpc/sysdev/qe_lib/Kconfig, which is only used
on PowerPC builds.  On other architectures, this would generate a warning.
The fix is to have UCC_FAST default to y when UCC_GETH is enabled.

Signed-off-by: Timur Tabi <[EMAIL PROTECTED]>
---

The reason I used 'select' in the first place was because I didn't want to
have to update the definitions of UCC_FAST or UCC_SLOW every time we added
a new UCC device driver, but I guess that's unavoidable.

 arch/powerpc/sysdev/qe_lib/Kconfig |4 +---
 drivers/net/Kconfig|1 -
 2 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/sysdev/qe_lib/Kconfig 
b/arch/powerpc/sysdev/qe_lib/Kconfig
index 887739f..f611d34 100644
--- a/arch/powerpc/sysdev/qe_lib/Kconfig
+++ b/arch/powerpc/sysdev/qe_lib/Kconfig
@@ -5,15 +5,13 @@
 config UCC_SLOW
bool
default n
-   select UCC
help
  This option provides qe_lib support to UCC slow
  protocols: UART, BISYNC, QMC
 
 config UCC_FAST
bool
-   default n
-   select UCC
+   default y if UCC_GETH
help
  This option provides qe_lib support to UCC fast
  protocols: HDLC, Ethernet, ATM, transparent
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b86ccd2..5a5c026 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2276,7 +2276,6 @@ config GFAR_NAPI
 config UCC_GETH
tristate "Freescale QE Gigabit Ethernet"
depends on QUICC_ENGINE
-   select UCC_FAST
help
  This driver supports the Gigabit Ethernet mode of the QUICC Engine,
  which is available on some Freescale SOCs.
-- 
1.5.0.2.260.g2eb065



Re: Kconfig warnings on latest GIT

2007-05-10 Thread Simon Horman
On Thu, May 10, 2007 at 08:47:05PM -0500, Kumar Gala wrote:
> Try this patch:

That certainly resolves the problem for me.
I'll see about doing something like that for the similar
Kconfig problems that I see.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



Re: Hi, I have one question about rt_mutex.

2007-05-10 Thread Li Yu
Steven Rostedt wrote:
> Li Yu wrote:
>   
>>> Now since mutexes can be defined by user-land applications, we don't
>>>   
>> want a DOS
>> 
>>> type of application that nests large amounts of mutexes to create a large
>>> PI chain, and have the code holding spin locks while looking at a large
>>> amount of data. So to prevent this, the implementation not only implements
>>> a maximum lock depth, but also only holds at most two different locks at a
>>> time, as it walks the PI chain. More about this below.
>>>   
>> After reading the implementation of rt_mutex_adjust_prio_chain(), I found
>> that we really do enforce a maximum lock depth (1024 by default), but I
>> cannot see the check for duplication of the same locks. Is this doc
>> inconsistent with the code?
>> 
>
> Nope, the code and the doc are still the same.
>
> The thing that was most difficult in writing that document, was a way to
> talk about the user locks (futex - fast user mutex) and the kernel locks
> (spin_locks) without confusing the two.  The max depth is in reference
> to the user futex, but the comment about the "at most two different
> locks" is referencing the kernel's spin_locks.
>
> I don't remember talking about looking for "lock duplication", which I'm
> thinking you are referring to circular dead locks. I didn't cover that
> in the document and I believe I even mentioned that I would not cover
> the debug aspect of the code which would handle catching circular deadlocks.
>
> But back to the "no more than two kernel locks held". This is very
> important. Some PI implementations require all locks in the PI chain to
> have their internal locks held (as in spin_locks).  But letting user
> space determine the number of spin locks held can cause large latencies
> for the rest of the system.  So we designed a method to only need to
> hold two internal spin_locks in the PI chain at a time.  The kernel
> doesn't care if the user application is abusing itself (holding too many
> of its own user locks).  But the kernel does care if a user application
> can affect other non related applications.
>
> As Esben already mentioned, the PI chain even lets the locking user
> mutex schedule without holding any kernel locks.  This is very key. It
> keeps the latency down on setting up a PI chain which can be very expensive.
>
> Note: Esben helped a lot in the development of the final design of
> rtmutex.c.
>
> -- Steve
>   
First, thanks to you two gurus for such a good and timely explanation.

Er, I think these two locks which you said are task->pi_lock and
rt_mutex->wait_lock.

>The max depth is in reference to the user futex, but the comment
>about the "at most two different locks" is referencing the 
>kernel's spin_locks.

This sentence makes everything clear to me now ;)

However, I found that sys_futex() does not use rt_mutex, so what do you mean
by the user futex?
I have not even found any use of rt_mutex in the kernel code. Or will some
beautiful story happen in the future?

Good luck.

- Li Yu
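
The "at most two internal locks at a time" rule discussed above can be illustrated with a small userspace sketch. It is not the rtmutex.c code: struct chain_node, boost_chain() and the held/max_held counters are hypothetical names, pthread mutexes stand in for the kernel's spin locks, and the plain hand-over-hand walk is a simplification of what rt_mutex_adjust_prio_chain() does. It only shows that a chain of arbitrary length can be walked while never holding more than two internal locks, with a depth limit as a guard against runaway chains.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one lock in a PI chain; not a kernel type. */
struct chain_node {
	pthread_mutex_t guard;     /* plays the role of an internal spin lock */
	int prio;                  /* in this sketch, lower value = higher priority */
	struct chain_node *next;   /* next lock in the chain, or NULL */
};

/* Bookkeeping for the demo only (single-threaded use assumed). */
static int held;      /* internal locks currently held by the walker */
static int max_held;  /* highest value 'held' ever reached */

static void guard_lock(struct chain_node *n)
{
	pthread_mutex_lock(&n->guard);
	if (++held > max_held)
		max_held = held;
}

static void guard_unlock(struct chain_node *n)
{
	held--;
	pthread_mutex_unlock(&n->guard);
}

/*
 * Propagate 'prio' along the chain starting at 'head'.
 * Returns the number of nodes visited, or -1 if the chain is
 * longer than 'max_depth'.
 */
static int boost_chain(struct chain_node *head, int prio, int max_depth)
{
	struct chain_node *cur = head, *next;
	int depth = 0;

	if (!cur)
		return 0;

	guard_lock(cur);
	for (;;) {
		if (++depth > max_depth) {
			guard_unlock(cur);
			return -1;
		}
		if (prio < cur->prio)
			cur->prio = prio;      /* boost this node */

		next = cur->next;
		if (!next)
			break;

		guard_lock(next);              /* two guards held here */
		guard_unlock(cur);             /* back down to one */
		cur = next;
	}
	guard_unlock(cur);
	assert(held == 0);
	return depth;
}
```

However long the chain is, max_held never exceeds 2, which is the property Steve describes: the number of user-level locks in the chain does not determine how many internal locks are held at once.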





Re: Kconfig warnings on latest GIT

2007-05-10 Thread Kumar Gala


On May 10, 2007, at 8:25 PM, Simon Horman wrote:

> On Thu, May 10, 2007 at 11:56:48AM -0500, Timur Tabi wrote:
> > Simon Horman wrote:
> >
> > >>So my question is: in which Kconfig do I define "UCC_FAST_TEMP" and
> > >>"UCC_SLOW_TEMP"?  At first I thought, just put it in drivers/Kconfig, but
> > >>that Kconfig does nothing but include other Kconfigs.  I believe that
> > >>if I submit a patch that adds "UCC_FAST_TEMP" and "UCC_SLOW_TEMP" to
> > >>drivers/Kconfig, it will be rejected.  Either that, or I'll spend six
> > >>weeks trying to persuade everyone that it's a good idea.
> > >>
> > >>Does anyone have any suggestions on how I can fix this?
> > >That does seem like a reasonable suggestion, and one that
> > >would probably work well with the other similar problems
> > >that have been introduced since 2.6.21.
> >
> > Looks like the fix is simpler than I thought.  Instead of having
> >
> > UCC_GETH
> >   select UCC_FAST
> >
> > I need to do
> >
> > UCC_FAST
> >   default y if UCC_GETH
>
> I pondered something like that, but I couldn't get it quite right :(
>
> > I'll have a patch that fixes this out later today.
> >
> > I chose the first method because I wanted each individual UCC device
> > driver to select UCC_FAST or UCC_SLOW as appropriate, so that I
> > wouldn't have to update arch/powerpc/sysdev/qe_lib/Kconfig every time
> > we add a new UCC driver.  Oh well.

It really seems like the kconfig shouldn't complain if the depends
isn't satisfied.


- k


Re: Conveying memory pressure to userspace?

2007-05-10 Thread Martin J. Bligh

Bas Westerbaan wrote:
> Hello,
>
> Quite a lot of userspace applications cache.  Firefox caches pages;
> MySQL caches queries;  libc's free (like practically all other
> userspace allocators) won't directly return the first byte of memory
> freed, etc.
>
> These applications are unaware of memory pressure.  While the kernel
> is trying its best to free memory by, for instance, dropping
> possibly more valuable caches, userspace is left blissfully unaware.
>
> Obviously this isn't a really big problem, given that we've still got
> swap to swap out those rarely used caches, except for when the caches
> aren't _that_ rarely used and of which the backing store (e.g.
> precomputed values) might be faster than the disk to swap back the
> pages from.
>
> A solution would be to either
>
>  a) let the application make the kernel aware of pages that, when in
> memory pressure, may be dropped.  This would be tricky to implement
> for the userspace: it's hard to keep an application from racing into a
> dropped page.  However, the kernel can directly free a page from
> userspace, which makes it useful when under real pressure.  This in
> contrast to
>  b) letting the application register itself with a cache share
> priority.  The application (and other aware applications) would then
> be able to query how fair they are at the moment proportional to their
> cache share priority.  Freeing would still be completely in their own
> hands.
>
>
> The only relevant related matter I could find were madvise and mincore.
>
> With madvise pages can be marked to be unnecessary and these should be
> swapped out earlier.  With mincore one can determine whether pages are
> resident (not cached).  This would make an existing alternative to
> solution a.  However, this doesn't eliminate the writes to the swap
> and polling every time before accessing a cache isn't really pretty.
>
> I did consider guessing the memory pressure by looking at
> /proc/meminfo, but I think it isn't that accurate.

The prev_priority field in the zoneinfo stuff is more useful for
memory pressure. I'm playing with making a blocking callback that
can wake someone up when this gets down to a certain priority level
(prio=12 => everything's rosy, prio=0 => we're in deep shit).

> Before hacking something together (and being uncertain about the
> thoroughness with which I searched for existing work, sorry), I would
> like your thoughts on this.
>
> Please CC me, I'm not in the list.
>
>  Bas
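
For reference, the madvise/mincore approach Bas mentions looks roughly like this from userspace. It is a minimal sketch, assuming Linux and a cache kept in an anonymous private mapping; cache_alloc(), cache_drop(), resident_pages() and MAX_CACHE_PAGES are hypothetical names for illustration, and only mmap(), madvise(), mincore() and sysconf() are real calls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for mincore(), madvise(), MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_CACHE_PAGES 64      /* arbitrary limit for this sketch */

/* Map 'npages' of anonymous memory and touch each page so it is faulted in. */
static void *cache_alloc(size_t npages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	for (size_t i = 0; i < npages; i++)
		p[i * page] = 1;
	return p;
}

/*
 * Count how many of the 'npages' pages starting at 'addr' are resident.
 * 'addr' must be page-aligned.  Returns -1 on error.
 */
static long resident_pages(void *addr, size_t npages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char vec[MAX_CACHE_PAGES];
	long count = 0;

	if (npages > MAX_CACHE_PAGES || mincore(addr, npages * page, vec) != 0)
		return -1;
	for (size_t i = 0; i < npages; i++)
		count += vec[i] & 1;    /* bit 0 set means the page is resident */
	return count;
}

/* Tell the kernel that the cache contents are no longer needed. */
static int cache_drop(void *addr, size_t npages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	return madvise(addr, npages * page, MADV_DONTNEED);
}
```

An application would call resident_pages() before trusting a cached region and rebuild it when pages are missing, which is exactly the per-access polling Bas finds unattractive. Note that on Linux, MADV_DONTNEED on an anonymous private mapping discards the contents outright: later reads see zero-filled pages rather than the old data.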





Re: Regression in 2.6.21-mm1 (git-input) on Dell D610 laptop

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 15:05:25 +0200
Remi Colinet <[EMAIL PROTECTED]> wrote:

> My D610 ALPS Glide Point is unresponsive with 2.6.21-mm1 patch.
> No problem noticed with 2.6.21.
> 
> The culprit seems to be git-input. I have applied 2.6.21-mm1 on top of 2.6.21
> and then removed git-input patch. It is ok since then.
> 
> From what I can see, no interrupt is raised from the GlidePoint with git-input
> applied. The IRQ 12 count does not increase while the touchpad is in use.
> 
>CPU0
>   0:160   IO-APIC-edge  timer
>   1:935   IO-APIC-edge  i8042
>   7:  0   IO-APIC-edge  parport0
>   8:  1   IO-APIC-edge  rtc
>   9:  2   IO-APIC-fasteoi   acpi
> => 12:114   IO-APIC-edge  i8042 <=
>  14:   3223   IO-APIC-edge  libata
>  15:  5   IO-APIC-edge  libata
>  16:  0   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb5, Intel ICH6
>  17:  1   IO-APIC-fasteoi   uhci_hcd:usb2, ipw2200, Intel ICH6 Modem
>  18:  0   IO-APIC-fasteoi   uhci_hcd:usb3
>  19:  1   IO-APIC-fasteoi   uhci_hcd:usb4, yenta
> NMI:  0
> LOC:   4051
> ERR:  0
> MIS:  0
> 
> I have also tried to disable the ALPS driver in the .config file. IRQ 12 are
> then raised when using the Glide Point. X refuses to start.
> 

Are you able to test 2.6.21-mm2?

Thanks.


Re: Kconfig warnings on latest GIT

2007-05-10 Thread Kumar Gala
On Fri, 11 May 2007, Simon Horman wrote:

> On Thu, May 10, 2007 at 11:56:48AM -0500, Timur Tabi wrote:
> > Simon Horman wrote:
> >
> > >>So my question is: in which Kconfig do I define "UCC_FAST_TEMP" and
> > >>"UCC_SLOW_TEMP"?  At first I thought, just put it in drivers/Kconfig, but
> > >>that Kconfig does nothing but include other Kconfigs.  I believe that
> > >>if I
> > >>submit a patch that adds "UCC_FAST_TEMP" and "UCC_SLOW_TEMP" to
> > >>drivers/Kconfig, it will be rejected.  Either that, or I'll spend six 
> > >>weeks
> > >>trying to persuade everyone that it's a good idea.
> > >>
> > >>Does anyone have any suggestions on how I can fix this?
> > >That does seem like a reasonable suggestion, and one that
> > >would probably work well with the other similar problems
> > >that have been introduced since 2.6.21.
> >
> > Looks like the fix is simpler than I thought.  Instead of having
> >
> > UCC_GETH
> > select UCC_FAST
> >
> > I need to do
> >
> > UCC_FAST
> > default y if UCC_GETH
>
> I pondered something like that, but I couldn't get it quite right :(
>
> > I'll have a patch that fixes this out later today.
> >
> > I chose the first method because I wanted each individual UCC device
> > driver to select UCC_FAST or UCC_SLOW as appropriate, so that I
> > wouldn't have to update arch/powerpc/sysdev/qe_lib/Kconfig every time
> > we add a new UCC driver.  Oh well.
>
> --
> Horms
>   H: http://www.vergenet.net/~horms/
>   W: http://www.valinux.co.jp/en/
>

Try this patch:

diff --git a/arch/powerpc/sysdev/qe_lib/Kconfig b/arch/powerpc/sysdev/qe_lib/Kconfig
index 887739f..5de7aba 100644
--- a/arch/powerpc/sysdev/qe_lib/Kconfig
+++ b/arch/powerpc/sysdev/qe_lib/Kconfig
@@ -12,6 +12,7 @@ config UCC_SLOW

 config UCC_FAST
bool
+   default y if UCC_GETH
default n
select UCC
help
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index fa489b1..b159c6c 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2276,7 +2276,6 @@ config GFAR_NAPI
 config UCC_GETH
tristate "Freescale QE Gigabit Ethernet"
depends on QUICC_ENGINE
-   select UCC_FAST
help
  This driver supports the Gigabit Ethernet mode of the QUICC Engine,
  which is available on some Freescale SOCs.


Re: slub-i386-support.patch

2007-05-10 Thread William Lee Irwin III
On Thu, May 10, 2007 at 05:07:02PM -0700, William Lee Irwin III wrote:
> I described it as motivated by such, not really correctly handling it.
> I didn't bother analyzing it for correctness. I'm not surprised at all
> that the TLB flush can be missed where it now stands in the patch. I
> wanted to move it to tlb_finish_mmu() all along, along with quicklist
> management of lower levels of hierarchy.
> quicklist_free() with unflushed TLB entries admits speculation through
> the pagetable entries corresponding to the list links. So tlb_finish_mmu()
> is the place to call quicklist_free() on pagetables. This requires
> distinguishing preconstructed pagetables from freed user pages, which
> is not done in include/asm-generic/tlb.h (and core callers may need
> to be adjusted, pending the results of audits).
> To clarify, upper levels of pagetables are indeed cached by x86 TLB's.
> The same kind of deferral of freeing until the TLB is flushed required
> for leaf pagetables is required for the upper levels as well.

Never mind. The present bit ends up unset because all the vaddrs are
page-aligned, and PDPTE entries (which lack present bits) aren't ever
internally updated until explicit reloads. I'm still not wild about it,
but can't be arsed to deal with it unless it actually breaks.


-- wli


Re: Kconfig warnings on latest GIT

2007-05-10 Thread Simon Horman
On Thu, May 10, 2007 at 11:56:48AM -0500, Timur Tabi wrote:
> Simon Horman wrote:
> 
> >>So my question is: in which Kconfig do I define "UCC_FAST_TEMP" and 
> >>"UCC_SLOW_TEMP"?  At first I thought, just put it in drivers/Kconfig, but 
> >>that Kconfig does nothing but include other Kconfigs.  I believe that if
> >>I 
> >>submit a patch that adds "UCC_FAST_TEMP" and "UCC_SLOW_TEMP" to 
> >>drivers/Kconfig, it will be rejected.  Either that, or I'll spend six weeks 
> >>trying to persuade everyone that it's a good idea.
> >>
> >>Does anyone have any suggestions on how I can fix this?
> >That does seem like a reasonable suggestion, and one that
> >would probably work well with the other similar problems
> >that have been introduced since 2.6.21.
> 
> Looks like the fix is simpler than I thought.  Instead of having
> 
> UCC_GETH
>   select UCC_FAST
> 
> I need to do
> 
> UCC_FAST
>   default y if UCC_GETH

I pondered something like that, but I couldn't get it quite right :(

> I'll have a patch that fixes this out later today.
> 
> I chose the first method because I wanted each individual UCC device
> driver to select UCC_FAST or UCC_SLOW as appropriate, so that I
> wouldn't have to update arch/powerpc/sysdev/qe_lib/Kconfig every time
> we add a new UCC driver.  Oh well.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



Re: 2.6.21-gitX: known regressions

2007-05-10 Thread Kok, Auke

Andrew Morton wrote:
> On Thu, 10 May 2007 14:04:13 +0200
> Michal Piotrowski <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> Here is a list of some known regressions in 2.6.21-gitX.
>>
>> Feel free to add new regressions/remove fixed etc.
>> http://kernelnewbies.org/known_regressions
>>
>> Networking:
>>
>> Subject: panic with e1000 driver on HP Integrity servers
>> References : http://bugzilla.kernel.org/show_bug.cgi?id=8455
>> Submitter  : Doug Chapman <[EMAIL PROTECTED]>
>> Caused-By  : Auke Kok <[EMAIL PROTECTED]>
>>  commit e0aac5a289b1dacbc94bd9ae8c449bcdf9ab508c
>> Status : Unknown


We're trying to reproduce this in our labs here but that piece of code has been 
extensively tested on various platforms and architectures, so I'm a bit 
surprised about it. I've asked for more info on the bugzilla as well.


So, this is being worked on actively.

Cheers,

Auke


[PATCH 5/5] lguest console driver feedback tidyups

2007-05-10 Thread Rusty Russell
1) Use new lguest_send_dma & lguest_bind_dma functions.
2) sparse: lguest_cons can be static.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/char/hvc_lguest.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

===
--- a/drivers/char/hvc_lguest.c
+++ b/drivers/char/hvc_lguest.c
@@ -36,7 +36,7 @@ static int put_chars(u32 vtermno, const 
dma.len[1] = 0;
dma.addr[0] = __pa(buf);
 
-   hcall(LHCALL_SEND_DMA, LGUEST_CONSOLE_DMA_KEY, __pa(&dma), 0);
+   lguest_send_dma(LGUEST_CONSOLE_DMA_KEY, &dma);
return count;
 }
 
@@ -59,7 +59,7 @@ static int get_chars(u32 vtermno, char *
return count;
 }
 
-struct hv_ops lguest_cons = {
+static struct hv_ops lguest_cons = {
.get_chars = get_chars,
.put_chars = put_chars,
 };
@@ -75,14 +75,17 @@ console_initcall(cons_init);
 
 static int lguestcons_probe(struct lguest_device *lgdev)
 {
-   lgdev->private = hvc_alloc(0, lgdev->index+1, &lguest_cons, 256);
+   int err;
+
+   lgdev->private = hvc_alloc(0, lgdev_irq(lgdev), &lguest_cons, 256);
if (IS_ERR(lgdev->private))
return PTR_ERR(lgdev->private);
 
-   if (!hcall(LHCALL_BIND_DMA, LGUEST_CONSOLE_DMA_KEY, __pa(_input),
-  (1<<8) + lgdev->index+1))
+   err = lguest_bind_dma(LGUEST_CONSOLE_DMA_KEY, _input, 1,
+ lgdev_irq(lgdev));
+   if (err)
printk("lguest console: failed to bind buffer.\n");
-   return 0;
+   return err;
 }
 
 static struct lguest_driver lguestcons_drv = {




[PATCH 4/5] lguest block driver feedback tidyups

2007-05-10 Thread Rusty Russell
1) Use new dma wrapper functions, and handle bind failure (may happen
   in future)
2) Use new lgdev_irq() "get me a good interrupt number" function.
3)  __force the ioremap: guests can use it as normal memory.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/block/lguest_blk.c |   16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

===
--- a/drivers/block/lguest_blk.c
+++ b/drivers/block/lguest_blk.c
@@ -123,7 +123,7 @@ static void do_write(struct blockdev *bd
pr_debug("lgb: WRITE sector %li\n", (long)req->sector);
setup_req(bd, 1, req, );
 
-   hcall(LHCALL_SEND_DMA, bd->phys_addr, __pa(), 0);
+   lguest_send_dma(bd->phys_addr, );
 }
 
 static void do_read(struct blockdev *bd, struct request *req)
@@ -134,7 +134,7 @@ static void do_read(struct blockdev *bd,
setup_req(bd, 0, req, &bd->dma);
 
empty_dma();
-   hcall(LHCALL_SEND_DMA, bd->phys_addr, __pa(), 0);
+   lguest_send_dma(bd->phys_addr, );
 }
 
 static void do_lgb_request(request_queue_t *q)
@@ -183,13 +183,13 @@ static int lguestblk_probe(struct lguest
return -ENOMEM;
 
spin_lock_init(&bd->lock);
-   bd->irq = lgdev->index+1;
+   bd->irq = lgdev_irq(lgdev);
bd->req = NULL;
bd->dma.used_len = 0;
bd->dma.len[0] = 0;
bd->phys_addr = (lguest_devices[lgdev->index].pfn << PAGE_SHIFT);
 
-   bd->lb_page = (void *)ioremap(bd->phys_addr, PAGE_SIZE);
+   bd->lb_page = (__force void *)ioremap(bd->phys_addr, PAGE_SIZE);
if (!bd->lb_page) {
err = -ENOMEM;
goto out_free_bd;
@@ -225,7 +225,9 @@ static int lguestblk_probe(struct lguest
if (err)
goto out_cleanup_queue;
 
-   hcall(LHCALL_BIND_DMA, bd->phys_addr, __pa(&bd->dma), (1<<8)+bd->irq);
+   err = lguest_bind_dma(bd->phys_addr, &bd->dma, 1, bd->irq);
+   if (err)
+   goto out_free_irq;
 
bd->disk->major = bd->major;
bd->disk->first_minor = 0;
@@ -241,6 +243,8 @@ static int lguestblk_probe(struct lguest
lgdev->private = bd;
return 0;
 
+out_free_irq:
+   free_irq(bd->irq, bd);
 out_cleanup_queue:
blk_cleanup_queue(bd->disk->queue);
 out_put_disk:
@@ -248,7 +252,7 @@ out_unregister_blkdev:
 out_unregister_blkdev:
unregister_blkdev(bd->major, "lguestblk");
 out_unmap:
-   iounmap(bd->lb_page);
+   iounmap((__force void *__iomem)bd->lb_page);
 out_free_bd:
kfree(bd);
return err;




[PATCH 3/5] lguest network driver feedback tidyups

2007-05-10 Thread Rusty Russell
Feedback from Jeff Garzik:
1) Use netdev_priv instead of dev->priv.
2) Check for ioremap failure
3) iounmap on failure.
4) Wrap SEND_DMA and BIND_DMA calls
5) Don't set NETIF_F_SG unless we set NETIF_F_NO_CSUM
6) Use SET_NETDEV_DEV()
7) Don't set dev->irq, mem_start & mem_end (deprecated)

Sparse warnings:
8) __force the ioremap: guests can use it as normal memory.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/lguest_net.c |   53 ++
 1 file changed, 31 insertions(+), 22 deletions(-)

===
--- a/drivers/net/lguest_net.c
+++ b/drivers/net/lguest_net.c
@@ -35,6 +35,9 @@ struct lguestnet_info
unsigned long peer_phys;
unsigned long mapsize;
 
+   /* The lguest_device I come from */
+   struct lguest_device *lgdev;
+
/* My peerid. */
unsigned int me;
 
@@ -84,7 +87,7 @@ static void skb_to_dma(const struct sk_b
 
 static void lguestnet_set_multicast(struct net_device *dev)
 {
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
 
if ((dev->flags & (IFF_PROMISC|IFF_ALLMULTI)) || dev->mc_count)
info->peer[info->me].mac[0] |= PROMISC_BIT;
@@ -110,13 +113,13 @@ static void transfer_packet(struct net_d
struct sk_buff *skb,
unsigned int peernum)
 {
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
struct lguest_dma dma;
 
skb_to_dma(skb, skb_headlen(skb), &dma);
pr_debug("xfer length %04x (%u)\n", htons(skb->len), skb->len);
 
-   hcall(LHCALL_SEND_DMA, peer_key(info,peernum), __pa(&dma), 0);
+   lguest_send_dma(peer_key(info, peernum), &dma);
if (dma.used_len != skb->len) {
dev->stats.tx_carrier_errors++;
pr_debug("Bad xfer to peer %i: %i of %i (dma %p/%i)\n",
@@ -137,7 +140,7 @@ static int lguestnet_start_xmit(struct s
 {
unsigned int i;
int broadcast;
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
 
pr_debug("%s: xmit %02x:%02x:%02x:%02x:%02x:%02x\n",
@@ -162,7 +165,7 @@ static int lguestnet_start_xmit(struct s
 /* Find a new skb to put in this slot in shared mem. */
 static int fill_slot(struct net_device *dev, unsigned int slot)
 {
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
/* Try to create and register a new one. */
info->skb[slot] = netdev_alloc_skb(dev, ETH_HLEN + ETH_DATA_LEN);
if (!info->skb[slot]) {
@@ -180,7 +183,7 @@ static irqreturn_t lguestnet_rcv(int irq
 static irqreturn_t lguestnet_rcv(int irq, void *dev_id)
 {
struct net_device *dev = dev_id;
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
unsigned int i, done = 0;
 
for (i = 0; i < ARRAY_SIZE(info->dma); i++) {
@@ -220,7 +223,7 @@ static int lguestnet_open(struct net_dev
 static int lguestnet_open(struct net_device *dev)
 {
int i;
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
 
/* Set up our MAC address */
memcpy(info->peer[info->me].mac, dev->dev_addr, ETH_ALEN);
@@ -232,8 +235,8 @@ static int lguestnet_open(struct net_dev
if (fill_slot(dev, i) != 0)
goto cleanup;
}
-   if (!hcall(LHCALL_BIND_DMA, peer_key(info, info->me), __pa(info->dma),
-  (NUM_SKBS << 8) | dev->irq))
+   if (lguest_bind_dma(peer_key(info,info->me), info->dma,
+   NUM_SKBS, lgdev_irq(info->lgdev)) != 0)
goto cleanup;
return 0;
 
@@ -246,13 +249,13 @@ static int lguestnet_close(struct net_de
 static int lguestnet_close(struct net_device *dev)
 {
unsigned int i;
-   struct lguestnet_info *info = dev->priv;
+   struct lguestnet_info *info = netdev_priv(dev);
 
/* Clear all trace: others might deliver packets, we'll ignore it. */
memset(&info->peer[info->me], 0, sizeof(info->peer[info->me]));
 
/* Deregister sg lists. */
-   hcall(LHCALL_BIND_DMA, peer_key(info, info->me), __pa(info->dma), 0);
+   lguest_unbind_dma(peer_key(info, info->me), info->dma);
for (i = 0; i < ARRAY_SIZE(info->dma); i++)
dev_kfree_skb(info->skb[i]);
return 0;
@@ -290,30 +293,34 @@ static int lguestnet_probe(struct lguest
/* Turning on/off promisc will call dev->set_multicast_list.
 * We don't actually support multicast yet */
dev->set_multicast_list = lguestnet_set_multicast;
-   dev->mem_start = ((unsigned long)desc->pfn << PAGE_SHIFT);
-   dev->mem_end = dev->mem_start + 

Re: Kconfig warnings on latest GIT

2007-05-10 Thread Simon Horman
On Thu, May 10, 2007 at 05:39:29PM +0200, Johannes Berg wrote:
> On Thu, 2007-05-10 at 14:10 +0900, Simon Horman wrote:
> 
> > drivers/macintosh/Kconfig:112:warning: 'select' used by config symbol 
> > 'PMAC_APM_EMU' refer to undefined symbol 'SYS_SUPPORTS_APM_EMULATION'
> 
> Argh. Is that with ARCH=ppc? I keep forgetting that it still exists,
> sorry.

Actually, it was with ARCH=ia64. I have a feeling that you can get
it to show up quite easily with anything other than ARCH=powerpc.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



[PATCH 2/5] lguest guest feedback tidyups

2007-05-10 Thread Rusty Russell
1) send-dma and bind-dma hypercall wrappers for drivers to use,
2) formalization of the convention that devices can use the irq
   corresponding to their index on the lguest_bus.
3) __force to shut up sparse: guests *can* use ioremap as virtual mem.
4) lguest.c should include "lguest_bus.h" for lguest_devices declaration.
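
The argument packing that the new bind wrapper hides from drivers can be sketched in isolation. This is illustrative userspace C, not part of the patch: pack_bind_arg(), bind_arg_num(), bind_arg_irq() and irq_for_index() are hypothetical names that mirror the `(num << 8) | irq` expression and the index+1 irq convention used in the code below.

```c
#include <stdint.h>

/*
 * Pack a buffer count and an irq into one hypercall argument,
 * following the (num << 8) | irq layout used by the bind wrapper.
 */
static unsigned long pack_bind_arg(unsigned int num, uint8_t irq)
{
	return ((unsigned long)num << 8) | irq;
}

/* Recover the buffer count from a packed argument. */
static unsigned int bind_arg_num(unsigned long arg)
{
	return (unsigned int)(arg >> 8);
}

/* Recover the irq (low 8 bits) from a packed argument. */
static uint8_t bind_arg_irq(unsigned long arg)
{
	return (uint8_t)(arg & 0xff);
}

/* By convention a device may use irq index+1, as lgdev_irq() does. */
static int irq_for_index(unsigned int index)
{
	return (int)index + 1;
}
```

Typing the irq as an 8-bit value keeps it from spilling into the count bits, which an open-coded `(1<<8) + index+1` would not guarantee for a large index. A packed value of 0 carries no buffers, which is what the unbind wrapper passes.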

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/lguest/lguest.c |   20 
 drivers/lguest/lguest_bus.c |2 +-
 include/linux/lguest_bus.h  |   13 -
 3 files changed, 33 insertions(+), 2 deletions(-)

===
--- a/include/linux/lguest_bus.h
+++ b/include/linux/lguest_bus.h
@@ -7,7 +7,6 @@
 
 struct lguest_device {
/* Unique busid, and index into lguest_page->devices[] */
-   /* By convention, each device can use irq index+1 if it wants to. */
unsigned int index;
 
struct device dev;
@@ -15,6 +14,18 @@ struct lguest_device {
/* Driver can hang data off here. */
void *private;
 };
+
+/* By convention, each device can use irq index+1 if it wants to. */
+static inline int lgdev_irq(const struct lguest_device *dev)
+{
+   return dev->index + 1;
+}
+
+/* dma args must not be vmalloced! */
+void lguest_send_dma(unsigned long key, struct lguest_dma *dma);
+int lguest_bind_dma(unsigned long key, struct lguest_dma *dmas,
+   unsigned int num, u8 irq);
+void lguest_unbind_dma(unsigned long key, struct lguest_dma *dmas);
 
 struct lguest_driver {
const char *name;
===
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include <linux/lguest_bus.h>
 #include 
 #include 
 #include 
@@ -99,6 +100,25 @@ void async_hcall(unsigned long call,
next_call = 0;
}
local_irq_restore(flags);
+}
+
+void lguest_send_dma(unsigned long key, struct lguest_dma *dma)
+{
+   dma->used_len = 0;
+   hcall(LHCALL_SEND_DMA, key, __pa(dma), 0);
+}
+
+int lguest_bind_dma(unsigned long key, struct lguest_dma *dmas,
+   unsigned int num, u8 irq)
+{
+   if (!hcall(LHCALL_BIND_DMA, key, __pa(dmas), (num << 8) | irq))
+   return -ENOMEM;
+   return 0;
+}
+
+void lguest_unbind_dma(unsigned long key, struct lguest_dma *dmas)
+{
+   hcall(LHCALL_BIND_DMA, key, __pa(dmas), 0);
 }
 
 static unsigned long save_fl(void)
===
--- a/drivers/lguest/lguest_bus.c
+++ b/drivers/lguest/lguest_bus.c
@@ -136,7 +136,7 @@ static int __init lguest_bus_init(void)
return 0;
 
/* Devices are in page above top of "normal" mem. */
-   lguest_devices = ioremap(max_pfn << PAGE_SHIFT, PAGE_SIZE);
+   lguest_devices = (__force void*)ioremap(max_pfn

[PATCH 1/5] lguest host feedback tidyups

2007-05-10 Thread Rusty Russell
1) Sam Ravnborg says lg-objs is deprecated, use lg-y.
2) Sparse: page_tables.c unnecessary initialization
3) Lots of __force to shut sparse up: guest "physical" addresses are
   userspace virtual.
4) Change prototype of run_lguest and do cast in caller instead (when we add
   __iomem to cast, it runs over another line).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/lguest/Makefile   |2 +-
 drivers/lguest/core.c |   16 
 drivers/lguest/hypercalls.c   |3 ++-
 drivers/lguest/interrupts_and_traps.c |4 ++--
 drivers/lguest/lg.h   |2 +-
 drivers/lguest/lguest_user.c  |2 +-
 drivers/lguest/page_tables.c  |2 +-
 7 files changed, 16 insertions(+), 15 deletions(-)

===
--- a/drivers/lguest/Makefile
+++ b/drivers/lguest/Makefile
@@ -3,5 +3,5 @@ obj-$(CONFIG_LGUEST_GUEST) += lguest.o l
 
 # Host requires the other files, which can be a module.
 obj-$(CONFIG_LGUEST)   += lg.o
-lg-objs := core.o hypercalls.o page_tables.o interrupts_and_traps.o \
+lg-y := core.o hypercalls.o page_tables.o interrupts_and_traps.o \
segments.o io.o lguest_user.o switcher.o
===
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -218,7 +218,7 @@ u32 lgread_u32(struct lguest *lg, u32 ad
 
/* Don't let them access lguest binary */
if (!lguest_address_ok(lg, addr, sizeof(val))
-   || get_user(val, (u32 __user *)addr) != 0)
+   || get_user(val, (__force u32 __user *)addr) != 0)
kill_guest(lg, "bad read address %u", addr);
return val;
 }
@@ -226,14 +226,14 @@ void lgwrite_u32(struct lguest *lg, u32 
 void lgwrite_u32(struct lguest *lg, u32 addr, u32 val)
 {
if (!lguest_address_ok(lg, addr, sizeof(val))
-   || put_user(val, (u32 __user *)addr) != 0)
+   || put_user(val, (__force u32 __user *)addr) != 0)
kill_guest(lg, "bad write address %u", addr);
 }
 
 void lgread(struct lguest *lg, void *b, u32 addr, unsigned bytes)
 {
if (!lguest_address_ok(lg, addr, bytes)
-   || copy_from_user(b, (void __user *)addr, bytes) != 0) {
+   || copy_from_user(b, (__force void __user *)addr, bytes) != 0) {
/* copy_from_user should do this, but as we rely on it... */
memset(b, 0, bytes);
kill_guest(lg, "bad read address %u len %u", addr, bytes);
@@ -243,7 +243,7 @@ void lgwrite(struct lguest *lg, u32 addr
 void lgwrite(struct lguest *lg, u32 addr, const void *b, unsigned bytes)
 {
if (!lguest_address_ok(lg, addr, bytes)
-   || copy_to_user((void __user *)addr, b, bytes) != 0)
+   || copy_to_user((__force void __user *)addr, b, bytes) != 0)
kill_guest(lg, "bad write address %u len %u", addr, bytes);
 }
 
@@ -294,7 +294,7 @@ static void run_guest_once(struct lguest
 : "memory", "%edx", "%ecx", "%edi", "%esi");
 }
 
-int run_guest(struct lguest *lg, char *__user user)
+int run_guest(struct lguest *lg, unsigned long __user *user)
 {
while (!lg->dead) {
unsigned int cr2 = 0; /* Damn gcc */
@@ -302,8 +302,8 @@ int run_guest(struct lguest *lg, char *_
/* Hypercalls first: we might have been out to userspace */
do_hypercalls(lg);
if (lg->dma_is_pending) {
-   if (put_user(lg->pending_dma, (unsigned long *)user) ||
-   put_user(lg->pending_key, (unsigned long *)user+1))
+   if (put_user(lg->pending_dma, user) ||
+   put_user(lg->pending_key, user+1))
return -EFAULT;
return sizeof(unsigned long)*2;
}
@@ -420,7 +420,7 @@ static int __init init(void)
lock_cpu_hotplug();
if (cpu_has_pge) { /* We have a broader idea of "global". */
cpu_had_pge = 1;
-   on_each_cpu(adjust_pge, 0, 0, 1);
+   on_each_cpu(adjust_pge, (void *)0, 0, 1);
clear_bit(X86_FEATURE_PGE, boot_cpu_data.x86_capability);
}
unlock_cpu_hotplug();
===
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -83,7 +83,8 @@ static void do_hcall(struct lguest *lg, 
guest_set_pmd(lg, regs->edx, regs->ebx);
break;
case LHCALL_LOAD_TLS:
-   guest_load_tls(lg, (struct desc_struct __user*)regs->edx);
+   guest_load_tls(lg,
+  (__force struct desc_struct __user*)regs->edx);
break;
case LHCALL_TS:
lg->ts = regs->edx;
===
--- 

[PATCH 0/5] lguest feedback tidyups

2007-05-10 Thread Rusty Russell
Hi all,

Gratefully-received recent feedback from CC'd was applied to excellent
effect (and the advice from Matt Mackall about my personal appearance is
best unrequited).

The patch is split in 5 parts to correspond with the 9 parts Andrew
sent out before, but here's the summary:

1) Sparse (thanks Christoph Hellwig):
- lguest_cons can be static now
- lguest.c should include "lguest_bus.h" for lguest_devices declaration.
- page_tables.c unnecessary initialization
- But the cost was high: lots of __force casts 8(
2) Jeff Garzik
- Use netdev_priv instead of dev->priv.
- Check for ioremap failure
- iounmap on failure.
- Wrap SEND_DMA and BIND_DMA calls
- Don't set NETIF_F_SG unless we set NETIF_F_NO_CSUM
- Use SET_NETDEV_DEV()
- Don't set dev->irq, mem_start & mem_end (deprecated)
3) Sam Ravnborg
- lg-objs is deprecated, use lg-y.

Cheers,
Rusty.



Re: 2.6.21-gitX: known regressions

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 14:04:13 +0200
Michal Piotrowski <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> Here is a list of some known regressions in 2.6.21-gitX.
> 
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
> 
> 
> 
> Unclassified:
> 
> Subject: 2.6.21-git10/11: files getting truncated on xfs (after 
> suspend/resume?)
> References : http://lkml.org/lkml/2007/5/9/410
> Submitter  : Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> Handled-By : David Chinner <[EMAIL PROTECTED]>
> Status : problem is being debugged
> 
> Subject: Current -git kernel kills X
> References : http://lkml.org/lkml/2007/5/8/667
> Submitter  : Jeff Garzik <[EMAIL PROTECTED]>
> Status : Unknown
> 
> 
> 
> Block devices:
> 
> Subject: BUG in loop.ko
> References : http://lkml.org/lkml/2007/5/9/510
> Submitter  : Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> Status : Unknown
> 
> 
> 
> Networking:
> 
> Subject: panic with e1000 driver on HP Integrity servers
> References : http://bugzilla.kernel.org/show_bug.cgi?id=8455
> Submitter  : Doug Chapman <[EMAIL PROTECTED]>
> Caused-By  : Auke Kok <[EMAIL PROTECTED]>
>  commit e0aac5a289b1dacbc94bd9ae8c449bcdf9ab508c
> Status : Unknown
> 
> 
> 
> Timers/NOHZ:
> 
> Subject: 2.6.21-git4 BUG: soft lockup detected on CPU#1! 
> References : http://lkml.org/lkml/2007/5/2/511
> Submitter  : Michal Piotrowski <[EMAIL PROTECTED]>
> Handled-By : Thomas Gleixner <[EMAIL PROTECTED]>
> Status : problem is being debugged
> 

Please also consider:

Subject: libata reset-seq merge broke sata_sil on sh
Subject: [Bugme-new] [Bug 8462] New: applications under wine freezes

But we have many many more regressions which are in 2.6.21.x, only nobody's
tracking those.  Nobody seems to be fixing them either.  Probably
everyone's busy on the 2.6.14 regressions.



Re: [RFC] memory hotremove patch take 2 [01/10] (counter of removable page)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 11:00:31 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Wed, 9 May 2007, Yasunori Goto wrote:
> 
> >  
> > +unsigned int nr_free_movable_pages(void)
> > +{
> > +   unsigned long nr_pages = 0;
> > +   struct zone *zone;
> > +   int nid;
> > +
> > +   for_each_online_node(nid) {
> > +   zone = &(NODE_DATA(nid)->node_zones[ZONE_MOVABLE]);
> > +   nr_pages += zone_page_state(zone, NR_FREE_PAGES);
> > +   }
> > +   return nr_pages;
> > +}
> 
> 
> Hmmm... This is redoing what the vm counters already provide
> 
> Could you add
> 
> NR_MOVABLE_PAGES etc.
> 
> instead and then let the ZVC counter logic take care of the rest?
> 
Okay, we'll try ZVC.

Thanks,
-Kame



Re: [RFC] memory hotremove patch take 2 [01/10] (counter of removable page)

2007-05-10 Thread KAMEZAWA Hiroyuki
On 10 May 2007 15:44:08 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:

> Yasunori Goto <[EMAIL PROTECTED]> writes:
> 
> 
> (not a full review, just something I noticed)
> > @@ -352,6 +352,8 @@ struct sysinfo {
> > unsigned short pad; /* explicit padding for m68k */
> > unsigned long totalhigh;/* Total high memory size */
> > unsigned long freehigh; /* Available high memory size */
> > +   unsigned long movable;  /* pages used only for data */
> > +   unsigned long free_movable; /* Avaiable pages in movable */
> 
> You can't just change that structure, it is exported to user space.
> 
Okay. We'll drop this.

Thanks,
-Kame
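
A minimal userspace sketch of why the layout of struct sysinfo is effectively frozen, assuming a Linux system with glibc. The helper names sysinfo_abi_size() and total_ram_bytes() are illustrative only and do not come from the patch. The size of the structure is compiled into every caller, so a kernel that writes a larger structure would run past the buffer an older binary provides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/sysinfo.h>

/* Size of struct sysinfo as seen at compile time; this value is baked
 * into the binary and cannot follow later kernel-side growth. */
static size_t sysinfo_abi_size(void)
{
    return sizeof(struct sysinfo);
}

/* Total RAM in bytes, using only fields that already exist in the ABI. */
static unsigned long long total_ram_bytes(void)
{
    struct sysinfo info;
    int rc = sysinfo(&info);

    assert(rc == 0);
    return (unsigned long long)info.totalram * info.mem_unit;
}
```

New counters are therefore usually exported through a separate interface rather than by appending fields to this structure.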



Re: slub-i386-support.patch

2007-05-10 Thread William Lee Irwin III
On Thu, May 10, 2007 at 05:07:02PM -0700, William Lee Irwin III wrote:
> quicklist_free() with unflushed TLB entries admits speculation through
> the pagetable entries corresponding to the list links. So tlb_finish_mmu()
> is the place to call quicklist_free() on pagetables. This requires
> distinguishing preconstructed pagetables from freed user pages, which
> is not done in include/asm-generic/tlb.h (and core callers may need
> to be adjusted, pending the results of audits).
> To clarify, upper levels of pagetables are indeed cached by x86 TLB's.
> The same kind of deferral of freeing until the TLB is flushed required
> for leaf pagetables is required for the upper levels as well.

Looking more closely at it, the entire attempt to avoid struct page
pointers is far beyond pointless. The freeing functions unconditionally
require struct page pointers to either be passed or computed and the
allocation function's virtual address it returns as a result is not
directly usable. The callers all have to do arithmetic on the result.
One might as well stash precomputed pfn's (if not paddrs) and vaddrs in
page->private and page->mapping, chain them with ->lru (use only .next
if you care to stay singly-linked), and handle struct page pointers
throughout. At that point quicklists not only become directly callable
for pagetable freeing (including upper levels) instead of needing calls
to quicklist freeing staged to occur at the time of tlb_finish_mmu(),
but also become usable for the highpte case.

The computations this is trying to save on are computing the virtual
and physical addresses (pfn's modulo a cheap shift; besides, all the
API's work on pfn's) of a page from the pointer to the struct page.
Chaining through the memory for the page incurs the cost of having to
stage freeing through tlb_finish_mmu() instead of using the quicklist
as a staging arena directly. So the translation from a struct page
pointer is not saving work. It's not saving cache, either. The page's
memory is no more likely to be hot than its struct page.

In the course of freeing, the pointer to the struct page is computed,
whether by the caller or the API function. So the translation to a
struct page pointer is done during freeing regardless.

A better solution would be to precompute those results and store
them in various fields of the struct page. i386 can move to using
generation numbers (->_mapcount and ->index are still available
for 64 bits there even after quicklists use ->lru, ->mapping, and
->private, and quicklists really only need half of ->lru) to handle
change_page_attr() and vmalloc_sync().


-- wli
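
The idea above can be modelled in userspace as a rough sketch. Everything here is an assumption made for illustration: struct page_desc stands in for struct page, and its fields stand in for ->private, ->mapping and ->lru.next. The point it shows is that the list is chained through the descriptor, so the page's own memory is never written while it sits on the quicklist.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct page (hypothetical, not kernel code). */
struct page_desc {
    unsigned long pfn;        /* precomputed; models page->private */
    void *vaddr;              /* precomputed; models page->mapping */
    struct page_desc *next;   /* singly-linked; models page->lru.next */
};

struct quicklist {
    struct page_desc *head;
    unsigned long nr;
};

/* Put a page on the list without touching the page's memory. */
static void quicklist_push(struct quicklist *q, struct page_desc *pd)
{
    assert(q != NULL && pd != NULL);
    pd->next = q->head;
    q->head = pd;
    q->nr++;
}

/* Take the most recently freed page, or NULL if the list is empty.
 * The caller gets pfn and vaddr back with no extra arithmetic. */
static struct page_desc *quicklist_pop(struct quicklist *q)
{
    struct page_desc *pd = q->head;

    if (pd) {
        q->head = pd->next;
        pd->next = NULL;
        q->nr--;
    }
    return pd;
}
```

Because nothing is stored in the page itself, stale translations that still reach the page cannot corrupt the list links in this model.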


Re: [PATCH] utimensat implementation

2007-05-10 Thread H. Peter Anvin
Christoph Hellwig wrote:
> 
> I'd be happy to have them.  While it's not the nicest API in the world
> it's in Posix and we have to support it at the library level, so we
> should better get it right.
> 
> I'd like to avoid having a big switch statement in every filesystem,
> though. Instead we should have a table-driven approach
> where each filesystem defines one table (or multiple ones when it
> supports subtypes with different limits) and just sets a pointer in
> the superblock to it.
> 

This is starting to sound an awful lot like statfs().  Maybe we could
create a new statfs call which takes a buffer size input (so that we can
add new fields as time goes on) and which returns the necessary information?

-hpa
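
A minimal sketch of the two ideas in this exchange, assuming nothing about any real kernel interface: one limits table per filesystem, and a query call that takes the caller's buffer size so fields can be appended later. The names struct fs_limits, example_fs_limits and fs_limits_query() are hypothetical, as are the field names and values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits record; new fields are only ever appended. */
struct fs_limits {
    unsigned long long max_file_size;
    unsigned int max_name_len;
    unsigned int time_granularity_ns;  /* imagined later addition */
};

/* One table per filesystem, pointed to from its superblock. */
static const struct fs_limits example_fs_limits = {
    .max_file_size = 1ULL << 41,
    .max_name_len = 255,
    .time_granularity_ns = 1000000000u,
};

/*
 * Copy as much of the table as fits into buf. Older callers pass a
 * smaller bufsize and simply do not see newer fields; larger buffers
 * have their tail zeroed. Returns the size the implementation knows.
 */
static size_t fs_limits_query(const struct fs_limits *table,
                              void *buf, size_t bufsize)
{
    size_t known = sizeof(*table);
    size_t n = bufsize < known ? bufsize : known;

    assert(table != NULL && buf != NULL);
    memcpy(buf, table, n);
    if (bufsize > known)
        memset((char *)buf + known, 0, bufsize - known);
    return known;
}
```

Returning the known size lets a caller detect that the implementation is newer or older than the structure it was compiled against.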


Re: [RFC] memory hotremove patch take 2 [05/10] (make basic remove code)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 11:09:29 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Wed, 9 May 2007, Yasunori Goto wrote:
> 
> > +/*
> > + * Just an easy implementation.
> > + */
> > +static struct page *
> > +hotremove_migrate_alloc(struct page *page,
> > +   unsigned long private,
> > +   int **x)
> > +{
> > +   return alloc_page(GFP_HIGH_MOVABLE);
> > +}
> 
> This would need to reflect the zone in which you are performing hot 
> remove. Or is hot remove only possible in the highest zone?
> 
No. We'll allow hot remove in any zone-type.
My old patchset didn't include Mel-san's page grouping and just had
ZONE_MOVABLE, so I wrote this. Reflecting the migration target's zone here
is reasonable.

Anyway, I think we'll need a more complicated function here.

Thanks,
-Kame



Re: [RFC] memory hotremove patch take 2 [04/10] (isolate all free pages)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 17:42:54 +0100 (IST)
Mel Gorman <[EMAIL PROTECTED]> wrote:

> > +   if (!pfn_valid(pfn))
> > +   return -EINVAL;
> 
> This may lead to boundary cases where pages cannot be captured at the 
> start and end of non-aligned zones due to memory holes.
> 
Hm, ok. maybe we can remove this.

> > +   zone = info->zone;
> > +   if ((zone != page_zone(pfn_to_page(pfn))) ||
> > +   (zone != page_zone(pfn_to_page(last_pfn))))
> > +   return -EINVAL;
> 
> Is this check really necessary? Surely a caller to 
> capture_isolate_freed_pages() will have already made all the necessary 
> checks when adding the struct isolation_info ?
> 
just because isolation_info is treated per zone.
Maybe MIGRATE_ISOLATING can allow us more flexible approach.


> > +   drain_all_pages();
> > +   spin_lock(&zone->lock);
> > +   while (pfn < info->end_pfn) {
> > +   if (!pfn_valid(pfn)) {
> > +   pfn++;
> > +   continue;
> > +   }
> > +   page = pfn_to_page(pfn);
> > +   /* See page_is_buddy()  */
> > +   if (page_count(page) == 0 && PageBuddy(page)) {
> 
> If PageBuddy is set it's free, you shouldn't have to check the page_count.
> 
ok.

> > +   order = page_order(page);
> > +   order_size = 1 << order;
> > +   zone->free_area[order].nr_free--;
> > +   __mod_zone_page_state(zone, NR_FREE_PAGES, -order_size);
> > +   list_del(&page->lru);
> > +   rmv_page_order(page);
> > +   isolate_page_nolock(info, page, order);
> > +   nr_pages += order_size;
> > +   pfn += order_size;
> > +   } else {
> > +   pfn++;
> > +   }
> > +   }
> > +   spin_unlock(&zone->lock);
> > +   return nr_pages;
> > +}
> > #endif /* CONFIG_PAGE_ISOLATION */
> >
> 
> This is all similar to move_freepages() other than the locking part. It 
> would be worth checking if there is code that could be shared or at least 
> have similar styles.

Thank you, I'll look into move_freepages().

-Kame




Re: x86 setup rewrite tree ready for flamage^W review

2007-05-10 Thread H. Peter Anvin
Martin Mares wrote:
> Hello!
> 
>> As far as I could tell, "scan" simply caused the nonstandard video
>> driver scan modules (unsafe probes) to be invoked.  Since those modules
>> are no longer present, there appeared to be no need for them.  The VGA
>> and VESA probes are safe.
> 
> "scan" is still useful, because it is able to find BIOS video modes with
> non-standard numbers (they are still sometimes found on recent cards).

Well, I don't have a card which does anything like that, but I did just
implement the "scan" functionality and pushed it out.  If anyone cares
about that functionality it would be good if they could test it out and
report if it works.

-hpa


Conveying memory pressure to userspace?

2007-05-10 Thread Bas Westerbaan

Hello,

Quite a lot of userspace applications cache.  Firefox caches pages;
MySQL caches queries;  libc's free (like practically all other
userspace allocators) won't directly return the first byte of memory
freed, etc.

These applications are unaware of memory pressure.  While the kernel
is trying its best to free memory by, for instance, dropping
possibly more valuable caches, userspace is left blissfully unaware.

Obviously this isn't a really big problem, given that we've still got
swap to swap out those rarely used caches, except for when the caches
aren't _that_ rarely used and of which the backing store (eg.
precomputed values) might be faster than the disk to swap back the
pages from.

A solution would be to either

 a) let the application make the kernel aware of pages that, when in
memory pressure, may be dropped.  This would be tricky to implement
for the userspace: it's hard to avoid an application to race into a
dropped page.  However, the kernel can directly free a page from
userspace, which makes it useful when under real pressure.  This in
contrast to
b) letting the application register itself with a cache share
priority.  The application (and other aware applications) would then
be able to query how fair they are at the moment proportional to their
cache share priority.  Freeing would still be completely in their own
hands.


The only relevant related matter I could find were madvise and mincore.

With madvise pages can be marked to be unnecessary and these should be
swapped out earlier.  With mincore one can determine whether pages are
resident (not cached).  This would make an existing alternative to
solution a.  However, this doesn't eliminate the writes to the swap
and polling every time before accessing a cache isn't really pretty.
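
A minimal sketch of the polling approach described above, assuming Linux and glibc. The helper names page_is_resident() and cache_drop() are made up for illustration. Note one assumption that differs from the description: on Linux, MADV_DONTNEED on private anonymous memory discards the contents right away instead of merely favouring the pages for swap-out, so it models a cache that is cheap to recompute.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page containing addr is resident in memory,
 * 0 if it is not, and -1 if mincore() fails. */
static int page_is_resident(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    unsigned char vec = 0;
    uintptr_t start;

    assert(page_size > 0);
    start = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
    if (mincore((void *)start, (size_t)page_size, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Tell the kernel the cached range is no longer needed. For private
 * anonymous mappings the next read returns zero-filled pages.
 * buf must be page-aligned. */
static int cache_drop(void *buf, size_t len)
{
    return madvise(buf, len, MADV_DONTNEED);
}
```

The check and the later access are still racy, which is the weakness of polling mentioned above: a page can be reclaimed between page_is_resident() and the read.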

I did consider guessing the memory pressure by looking at
/proc/meminfo, but I think it isn't that accurate.

Before hacking something together (and being uncertain about the
thoroughness with which I searched for existing work, sorry), I would
like your thoughts on this.

Please CC me, I'm not in the list.

 Bas

--
Bas Westerbaan
GPG 99BA289B | SINP [EMAIL PROTECTED]
http://blog.w-nz.com/


Re: [RFC] memory hotremove patch take 2 [03/10] (drain all pages)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 16:35:37 +0100 (IST)
Mel Gorman <[EMAIL PROTECTED]> wrote:

> On Wed, 9 May 2007, Yasunori Goto wrote:
> 
> > This patch adds the function drain_all_pages(void) to drain all
> > pages on per-cpu-freelist.
> > Page isolation will catch them in free_one_page.
> >
> 
> Is this significantly different to what drain_all_local_pages() currently 
> does?
> 

No difference; this duplicates it. Thank you for pointing that out.
Maybe I missed this because this func only exists in -mm.

Regards,
-Kame



[PATCH] module_author: don't advice putting in an email address

2007-05-10 Thread Rene Herman

Hi Rusty.

Following up the recent MODULE_MAINTAINER discussion:

http://lkml.org/lkml/2007/4/4/170

that concluded with MODULE_MAINTAINER not being a good idea, here's a small 
patch that just deletes the advice of including an email address in the 
MODULE_AUTHOR tag as suggested (and not objected to) at the end of it.


The email address is the problem I was trying to fix; with multiple current 
and non-current authors and maintainers who might not even be authors the 
address(es) available from the tag confuse the issue of whom to contact. 
It's moreover also information that is easily outdated.


A bit more than half of the tags in the tree don't include an email address 
already and I'll submit patches removing more...


Rene.

commit 3b4fa382d5a6a3d9afdcb5a9232d63c47391fb30
Author: Rene Herman <[EMAIL PROTECTED]>
Date:   Fri May 11 02:24:35 2007 +0200

module_author: don't advice putting in an email address

It's information that's easily outdated and easily mistaken for
a driver contact which is a problem especially for modules with
multiple current and non-current authors as well as for modules
with a maintainer who may not even be a module author.

Signed-off-by: Rene Herman <[EMAIL PROTECTED]>

diff --git a/include/linux/module.h b/include/linux/module.h
index 792d483..e6e0f86 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -124,7 +124,7 @@ extern struct module __this_module;
  */
 #define MODULE_LICENSE(_license) MODULE_INFO(license, _license)
 
-/* Author, ideally of form NAME [, NAME ]*[ and NAME ] */
+/* Author, ideally of form NAME[, NAME]*[ and NAME] */
 #define MODULE_AUTHOR(_author) MODULE_INFO(author, _author)
   
 /* What your module does. */


Re: [RFC] memory hotremove patch take 2 [03/10] (drain all pages)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 11:07:08 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Wed, 9 May 2007, Yasunori Goto wrote:
> 
> > This patch adds the function drain_all_pages(void) to drain all
> > pages on per-cpu-freelist.
> > Page isolation will catch them in free_one_page.
> 
> This is only draining the pcps of the local processor. I would think 
> that you need to drain all other processors pcps of this zone as well. And 
> there is no need to drain this processors pcps of other zones.
> 

As Mel-san pointed out, -mm has drain_all_local_pages(). We'll use it.

Thanks,
-Kame



Re: libata reset-seq merge broke sata_sil on sh

2007-05-10 Thread Paul Mundt
On Thu, May 10, 2007 at 03:08:59PM +0200, Tejun Heo wrote:
> Paul Mundt wrote:
> > The detection is simply flaky after that point, however before the
> > current master it never hit the 35 second point (and thus never implied
> > that the link was down). I'll double check the bisect log to see if there
> > was anything beyond that that may have caused it.
> > 
> > The -ENODEV at least implies that the SRST fails, so at least that's a
> > starting point.
> 
> If prereset() fails to get the initial DRDY before 10secs, it assumes
> something went wrong and escalates to hardreset.  sil family of
> controllers report 0xff status while the link is broken and it seems
> that your particular drive needs more than the current 150ms to recover
> phy link.  It probably went unnoticed till now because the device was
> never hardreset before.  If the diagnosis is correct, increasing the
> delay in hardreset should fix the problem.  Well, let's see.  :-)
> 
Bumping the hardreset delay up does indeed fix it, I've had to bump it up
to 1200 before it started working (at 600 it still fails):

[0.967379] scsi0 : sata_sil
[0.970425] scsi1 : sata_sil
[0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
[0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
[1.299353] ata1: device not ready (errno=-19), forcing hardreset
[2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
[2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
[2.837548] ata1.00: 39070080 sectors, multi 0: LBA
[2.842702] ata1.00: applying bridge limits
[2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
[2.858938] ata1.00: configured for UDMA/100
[3.172602] ata2: SATA link down (SStatus 0 SControl 310)
[3.175736] scsi 0:0:0:0: Direct-Access ATA  HHD424020F7SV00  00ML PQ: 0 ANSI: 5

I'm not sure if it matters or not, but this is an iVDR drive, so that
might also have additional implications.

--

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4595d1f..4dad3fd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3518,7 +3518,7 @@ int sata_std_hardreset(struct ata_port *ap, unsigned int *class,
}
 
/* wait a while before checking status, see SRST for more info */
-   msleep(150);
+   msleep(1200);
 
rc = ata_wait_ready(ap, deadline);
/* link occupied, -ENODEV too is an error */


Re: [PATCH] libata: add human-readable error value decoding

2007-05-10 Thread Alan Cox
> Scrollback rarely works as planned, for me.  Overall, a balance must be 
> found.
> 
> More information is more helpful.  But.
> 
> There are downsides to spewing everything possible, upon error.  You 
> cause logging to the possibly problematic disk, you push older messages 
> out of the printk ring buffer, etc., etc.

Get yourself a Voodoo5 or similar card cheap off ebay. The firmware on
most of them doesn't clear the top 30MB of RAM on a reboot/PCI reset
which makes them excellent debug buffers providing you empty the buffer
before you run the X server.

Alan


Re: [RFC] memory hotremove patch take 2 [02/10] (make page unused)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 11:04:37 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Wed, 9 May 2007, Yasunori Goto wrote:
> 
> > This patch is for supporting making page unused.
> > 
> > Isolate pages by capturing freed pages before inserting free_area[],
> > buddy allocator.
> > If you have an idea for avoiding spin_lock(), please advise me.
> 
> Using the zone lock instead may avoid introducing another lock? Or is the
> new lock here for performance reasons?
> 
> Isn't it possible to just add another flavor of pages like what Mel has
> been doing with reclaimable and movable? I.e. add another category of free 
> pages to Mel's scheme called isolated and use Mel's function to move stuff 
> over there?
> 
Mel-san's idea seems good. So we'll rewrite this whole patch.

Thank you.
-Kame



Re: [RFC] memory hotremove patch take 2 [02/10] (make page unused)

2007-05-10 Thread KAMEZAWA Hiroyuki
On Thu, 10 May 2007 16:34:01 +0100 (IST)
Mel Gorman <[EMAIL PROTECTED]> wrote:

> > +#ifdef CONFIG_PAGE_ISOLATION
> > +   /*
> > +*  For pages which are not used but not free.
> > +*  See include/linux/page_isolation.h
> > +*/
> > +   spinlock_t  isolation_lock;
> > +   struct list_headisolation_list;
> > +#endif
> 
> Using MIGRATE_ISOLATING instead of this approach does mean that there will 
> be MAX_ORDER additional struct free_area added to the zone. That is more 
> lists than this approach.
> 
Thank you! It's an interesting idea. I think it will make our code much
simpler. I'll look into it.


> I am somewhat surprised that CONFIG_PAGE_ISOLATION exists as a separate
> option. If it was a compile-time option at all, I would expect it to 
> depend on memory hot-remove being selected.
> 
I myself think CONFIG_PAGE_ISOLATION can be used by any code which needs to
isolate some amount of contiguous pages, so the config is separate for now.
Now, CONFIG_MEMORY_HOTREMOVE selects this.
CONFIG_PAGE_ISOLATION and CONFIG_MEMORY_HOTREMOVE will be merged later
if no one uses this except for hot-removal.



> > /*
> >  * zone_start_pfn, spanned_pages and present_pages are all
> >  * protected by span_seqlock.  It is a seqlock because it has
> > Index: current_test/mm/page_alloc.c
> > ===
> > --- current_test.orig/mm/page_alloc.c   2007-05-08 15:07:20.0 +0900
> > +++ current_test/mm/page_alloc.c        2007-05-08 15:08:34.0 +0900
> > @@ -41,6 +41,7 @@
> > #include 
> > #include 
> > #include 
> > +#include 
> >
> > #include 
> > #include 
> > @@ -448,6 +449,9 @@ static inline void __free_one_page(struc
> > if (unlikely(PageCompound(page)))
> > destroy_compound_page(page, order);
> >
> > +   if (page_under_isolation(zone, page, order))
> > +   return;
> > +
> 
> Using MIGRATE_ISOLATING would avoid a potential list search here.
> 
yes. thank you.

> > page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
> >
> > VM_BUG_ON(page_idx & (order_size - 1));
> > @@ -3259,6 +3263,10 @@ static void __meminit free_area_init_cor
> > zone->nr_scan_inactive = 0;
> > zap_zone_vm_stats(zone);
> > atomic_set(&zone->reclaim_in_progress, 0);
> > +#ifdef CONFIG_PAGE_ISOLATION
> > +   spin_lock_init(&zone->isolation_lock);
> > +   INIT_LIST_HEAD(&zone->isolation_list);
> > +#endif
> > if (!size)
> > continue;
> >
> > @@ -4214,3 +4222,182 @@ void set_pageblock_flags_group(struct pa
> > else
> > __clear_bit(bitidx + start_bitidx, bitmap);
> > }
> > +
> > +#ifdef CONFIG_PAGE_ISOLATION
> > +/*
> > + * Page Isolation.
> > + *
> > + * If a page is removed from usual free_list and will never be used,
> > + * It is linked to "struct isolation_info" and set Reserved, Private
> > + * bit. page->mapping points to isolation_info in it.
> > + * and page_count(page) is 0.
> > + *
> > + * This can be used for creating a chunk of contiguous *unused* memory.
> > + *
> > + * current user is Memory-Hot-Remove.
> > + * maybe move to some other file is better.
> 
> page_isolation.c to match the header filename seems reasonable. 
> page_alloc.c has a lot of multi-function stuff like memory initialisation 
> in it.

Hmm.

> 
> > + */
> > +static void
> > +isolate_page_nolock(struct isolation_info *info, struct page *page, int order)
> > +{
> > +   int pagenum;
> > +   pagenum = 1 << order;
> > +   while (pagenum > 0) {
> > +   SetPageReserved(page);
> > +   SetPagePrivate(page);
> > +   page->private = (unsigned long)info;
> > +   list_add(&page->lru, &info->pages);
> > +   page++;
> > +   pagenum--;
> > +   }
> > +}
> 
> It's worth commenting somewhere that pages on the list in isolation_info 
> are always order-0.
> 
okay.

> > +
> > +/*
> > + * This function is called from page_under_isolation()
> > + */
> > +
> > +int __page_under_isolation(struct zone *zone, struct page *page, int order)
> > +{
> > +   struct isolation_info *info;
> > +   unsigned long pfn = page_to_pfn(page);
> > +   unsigned long flags;
> > +   int found = 0;
> > +
> > +   spin_lock_irqsave(&zone->isolation_lock,flags);
> 
> An unwritten convention seems to be that __ versions of same-named 
> functions are the nolock version. i.e. I would expect 
> page_under_isolation() to acquire and release the spinlock and 
> __page_under_isolation() to do no additional locking.
> 
> Locking outside of here might make the flow a little clearer as well if 
> you had two returns and avoided the use of "found".
> 
Maybe MOVABLE_ISOLATING will simplify this code.
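
The naming convention Mel describes can be sketched in userspace as follows. This is only a model under stated assumptions: struct zone_model and struct isolation_range are invented here, and a C11 atomic_flag spin loop stands in for the kernel spinlock. The double-underscore name is kept to mirror the kernel convention even though such identifiers are reserved in ordinary userspace C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct isolation_range {
    unsigned long start_pfn;
    unsigned long end_pfn;     /* exclusive */
};

struct zone_model {
    atomic_flag lock;                      /* models zone->isolation_lock */
    const struct isolation_range *ranges;  /* models zone->isolation_list */
    size_t nr_ranges;
};

/* "__" version: does no locking, the caller must hold zm->lock.
 * Two plain returns replace the "found" flag. */
static int __pfn_under_isolation(const struct zone_model *zm,
                                 unsigned long pfn)
{
    for (size_t i = 0; i < zm->nr_ranges; i++)
        if (zm->ranges[i].start_pfn <= pfn && pfn < zm->ranges[i].end_pfn)
            return 1;
    return 0;
}

/* Plain version: takes the lock, calls the "__" helper, drops the lock. */
static int pfn_under_isolation(struct zone_model *zm, unsigned long pfn)
{
    int ret;

    assert(zm != NULL);
    while (atomic_flag_test_and_set_explicit(&zm->lock,
                                             memory_order_acquire))
        ;  /* spin until the lock is free */
    ret = __pfn_under_isolation(zm, pfn);
    atomic_flag_clear_explicit(&zm->lock, memory_order_release);
    return ret;
}
```

Keeping the locking in the wrapper leaves a single acquire/release pair to audit, and lets callers that already hold the lock use the helper directly.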


> > +   list_for_each_entry(info, &zone->isolation_list, list) {
> > +   if (info->start_pfn <= pfn && pfn < info->end_pfn) {
> > +   found = 1;
> > +   break;
> > +   }
> > +   }
> > +   if (found) {
> > 

Re: Please revert 5b479c91da90eef605f851508744bfe8269591a0 (md partition rescan)

2007-05-10 Thread H. Peter Anvin
Satyam Sharma wrote:
> On 5/10/07, Xavier Bestel <[EMAIL PROTECTED]> wrote:
>> On Thu, 2007-05-10 at 16:51 +0200, Jan Engelhardt wrote:
>> > >(But Andrew never saw your email, I suspect: "[EMAIL PROTECTED]" is
>> > probably
>> > >some strange mixup of Andrew Morton and Andi Kleen in your mind ;)
>> >
>> > What do the letters kp stand for?
> 
> Heh ... I've always wanted to know that myself. It's funny, no one
> seems to have asked that on lkml during all these years (at least none
> that a Google search would throw up).
> 
>> "Keep Patching" ?
> 
> Unlikely. "akpm" seems to be a pre-Linux-kernel nick.

http://en.wikipedia.org/wiki/Andrew_Morton_%28computer_programmer%29

-hpa



Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

2007-05-10 Thread David Chinner
On Thu, May 10, 2007 at 04:49:35PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > Ok, this is important to know because we merged a mod around that time
> > that changes the way we handle the updates to the file size i.e. the
> > fix for the NULL-files-on-crash problem:
> >
> > http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba87ea699ebd9dd577bf055ebc4a98200e337542
> >
> > and that means the size of the file is not updated to the incore
> > cached inode until after the data write is complete. The symptoms
> > being seen would match with a inode-not-being-written-after-last-
> > data-write-bug in this mod
> >   
> 
> Yes, that does look like a good candidate.  Should I try to
> before-and-after this change?

Yes please!

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Info about the new netlink layer userland API

2007-05-10 Thread Rodolfo Giometti
On Thu, May 10, 2007 at 04:01:52AM -0700, David Miller wrote:
> 
> It's not OK, please use the generic netlink interface and as
> such you will not need to allocate any numbers at all.
> 
> Documentation/networking/generic_netlink.txt gives a link
> to some information on this topic.

Where can I find some information about userland programming _without_ using
the libnl library?

Is there something similar to the magic command:

   ret = socket(PF_NETLINK, SOCK_RAW, NETLINK_PPSAPI);

in this new netlink API?
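
For reference, a minimal sketch of how this looks with generic netlink and no libnl, assuming the kernel UAPI headers are installed. The socket is opened with the fixed protocol NETLINK_GENERIC, and the family's numeric id is looked up at runtime by name through the controller family (GENL_ID_CTRL). The helper name genl_build_getfamily() and the family name "pps" used in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Build a CTRL_CMD_GETFAMILY request asking the generic netlink
 * controller for the family called family_name. buf must be suitably
 * aligned for struct nlmsghdr. Returns the message length, or -1 if
 * the name is too long or the buffer too small.
 */
static int genl_build_getfamily(void *buf, size_t bufsize,
                                const char *family_name)
{
    size_t name_len = strlen(family_name) + 1;   /* include the NUL */
    size_t attr_len = NLA_HDRLEN + name_len;
    size_t total = NLMSG_LENGTH(GENL_HDRLEN) + NLA_ALIGN(attr_len);
    struct nlmsghdr *nlh = buf;
    struct genlmsghdr *genl;
    struct nlattr *attr;

    assert(buf != NULL);
    if (name_len > GENL_NAMSIZ || total > bufsize)
        return -1;

    memset(buf, 0, total);
    nlh->nlmsg_len = (unsigned int)total;
    nlh->nlmsg_type = GENL_ID_CTRL;      /* talk to the controller */
    nlh->nlmsg_flags = NLM_F_REQUEST;

    genl = NLMSG_DATA(nlh);
    genl->cmd = CTRL_CMD_GETFAMILY;
    genl->version = 1;

    /* Single attribute: the family name as a NUL-terminated string. */
    attr = (struct nlattr *)((char *)genl + GENL_HDRLEN);
    attr->nla_type = CTRL_ATTR_FAMILY_NAME;
    attr->nla_len = (unsigned short)attr_len;
    memcpy((char *)attr + NLA_HDRLEN, family_name, name_len);
    return (int)total;
}
```

A caller would open fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC), send the buffer built for a name such as "pps", and read CTRL_ATTR_FAMILY_ID from the reply; that id then goes into nlmsg_type for all later messages to the family, so no protocol number has to be allocated.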

Thanks for your help,

Rodolfo

-- 

GNU/Linux Solutions  e-mail:[EMAIL PROTECTED]
Linux Device Driver [EMAIL PROTECTED]
Embedded Systems[EMAIL PROTECTED]
UNIX programming phone: +39 349 2432127


[ANNOUNCE] GIT 1.5.2-rc3

2007-05-10 Thread Junio C Hamano
Upcoming 1.5.2 will have three large-ish new features that the
user community wished to have for quite some time.  I usually do
not CC the kernel list for -rc releases, but this one will
hopefully be pretty much the same as what the final one would
look like, so here it is.

We may get around to fixing the long standing git-apply corner case
HPA's rebase problem unveiled before v1.5.2, but we might defer
it post 1.5.2 (it is not a regression).  We'll see.



GIT v1.5.2 Release Notes (draft)


Updates since v1.5.1


* Plumbing level subproject support.

  You can include a subdirectory that has an independent git
  repository in your index and tree objects as a
  "subproject".  This plumbing (i.e. "core") level subproject
  support explicitly excludes recursive behaviour.

  The "subproject" entries in the index and trees are
  incompatible with older versions of git.  Experimenting with
  the plumbing level support is encouraged, but be warned that
  unless everybody in your project updates to this release or
  later, using this feature would make your project
  inaccessible by people with older versions of git.

* Plumbing level gitattributes support.

  The gitattributes mechanism allows you to add 'attributes' to
  paths in your project, and affect the way certain git
  operations work.  Currently you can influence if a path is
  considered a binary or text (the former would be treated by
  'git diff' not to produce textual output; the latter can go
  through the line endings conversion process in repositories
  with core.autocrlf set), expand and unexpand '$ident$' keyword
  with blob object name, specify a custom 3-way merge driver,
  and specify a custom diff driver.  You can also apply an
  arbitrary filter to contents on the check-in/check-out codepath,
  but this feature is an extremely sharp-edged razor and needs
  to be handled with caution (do not use it unless you
  understand the earlier mailing list discussion on keyword
  expansion).

* The packfile format now optionally supports a 64-bit index.

  This release supports the "version 2" format of the .idx
  file.  This is automatically enabled when a huge packfile
  needs more than 32 bits to express offsets of objects in the
  pack.

* Comes with an updated git-gui 0.7.0

* Updated gitweb:

  - can show combined diff for merges;
  - uses font size of user's preference, not hardcoded in pixels;

* New commands and options.

  - "git bisect start" can optionally take a single bad commit and
    zero or more good commits on the command line.

  - "git shortlog" can optionally be told to wrap its output.

  - "subtree" merge strategy allows another project to be merged in as
    your subdirectory.

  - "git format-patch" learned a new --subject-prefix=
    option, to override the built-in "[PATCH]".

  - "git add -u" is a quick way to do the first stage of "git
    commit -a" (i.e. update the index to match the working
    tree); it obviously does not make a commit.

  - "git clean" honors a new configuration, "clean.requireforce".  When
    set to true, this makes "git clean" a no-op, preventing you
    from losing files by typing "git clean" when you meant to
    say "make clean".  You can still say "git clean -f" to
    override this.

  - "git log" family of commands learned --date={local,relative,default}
    option.  --date=relative is a synonym for --relative-date.
    --date=local gives the timestamp in local timezone.

* Updated behavior of existing commands.

  - When $GIT_COMMITTER_EMAIL or $GIT_AUTHOR_EMAIL is not set
    but $EMAIL is set, the latter is used as a substitute.

  - "git diff --stat" shows size of preimage and postimage blobs
    for binary contents.  Earlier it only said "Bin".

  - "git lost-found" shows stuff that is unreachable except
    from reflogs.

  - "git checkout branch^0" now detaches HEAD at the tip commit
    on the named branch, instead of just switching to the
    branch (use "git checkout branch" to switch to the branch,
    as before).

  - "git bisect next" can be used after giving only a bad commit
    without giving a good one (this starts bisection half-way to
    the root commit).  We used to refuse to operate without a
    good and a bad commit.

  - "git push", when pushing into more than one repository, does
    not stop at the first error.

  - "git archive" no longer insists that you give a --format
    parameter; it defaults to "tar".

  - "git cvsserver" can use backends other than sqlite.

  - "gitview" (in contrib/ section) learned to better support
    "git-annotate".

  - "git diff $commit1:$path2 $commit2:$path2" can now report
    mode changes between the two blobs.

  - Local "git fetch" from a repository whose object store is
    one of the alternates (e.g. fetching from the origin in a
    repository created with "git clone -l -s") avoids
    downloading objects unnecessarily.

  - 

Re: slub-i386-support.patch

2007-05-10 Thread William Lee Irwin III
On Thu, 10 May 2007, William Lee Irwin III wrote:
>> So now quicklist semantics vs. TLB flushing are the motive behind the
>> odd flush_tlb_mm() affair. The real trick with it is that flushing
>> must never occur until the TLB flush. Any change to the core quicklist
>> code that retires pages back to the page allocator earlier (e.g. based
>> on some limit) will break things badly.

On Fri, May 11, 2007 at 12:14:14AM +0100, Hugh Dickins wrote:
> I don't think that's right.  It's vital that TLB (of an active mm)
> be flushed before freeing its page back to the quicklist, before it's
> recycled to another mm (or elsewhere in this mm); but having done that,
> it really doesn't matter much when quicklist_trim() (check_pgt_cache)
> is called to free surplus pages from quicklist back to page_alloc.c.

What I was really going on about was that quicklist freeing can't
enforce any high watermarks in the future because it must wait until
the TLB flush unless it's guaranteed that TLB flushes are done prior
to quicklist freeing (which is furthermore required for other reasons,
to be described in the sequel).


On Fri, May 11, 2007 at 12:14:14AM +0100, Hugh Dickins wrote:
> tlb_finish_mmu() happens to be the traditional place it's done, and
> that's where we expect it.  flush_tlb_mm() avoids flushing TLB unless
> it's actually required for the mm in question: so wouldn't be a good
> place to rely on flushing TLB for pages freed earlier from other mms
> (but we'd already be in trouble to be leaving them that late).
> I'm guessing (haven't rechecked source) that the cpu_idle() call comes
> about because the top level pgd of a process gets freed very late in
> its exit, and after a great flurry of processes have just exited,
> perhaps there was nothing to free up the accumulation.  Though
> it still strikes me as an odd place to do it.

I described it as motivated by such, not really correctly handling it.
I didn't bother analyzing it for correctness. I'm not surprised at all
that the TLB flush can be missed where it now stands in the patch. I
wanted to move it to tlb_finish_mmu() all along, along with quicklist
management of lower levels of hierarchy.

quicklist_free() with unflushed TLB entries admits speculation through
the pagetable entries corresponding to the list links. So tlb_finish_mmu()
is the place to call quicklist_free() on pagetables. This requires
distinguishing preconstructed pagetables from freed user pages, which
is not done in include/asm-generic/tlb.h (and core callers may need
to be adjusted, pending the results of audits).

To clarify, upper levels of pagetables are indeed cached by x86 TLB's.
The same kind of deferral of freeing until the TLB is flushed required
for leaf pagetables is required for the upper levels as well.


-- wli


Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

2007-05-10 Thread Jeremy Fitzhardinge
David Chinner wrote:
> Ok, this is important to know because we merged a mod around that time
> that changes the way we handle the updates to the file size i.e. the
> fix for the NULL-files-on-crash problem:
>
> http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba87ea699ebd9dd577bf055ebc4a98200e337542
>
> and that means the size of the file is not updated to the incore
> cached inode until after the data write is complete. The symptoms
> being seen would match with an
> inode-not-being-written-after-last-data-write bug in this mod.
>   

Yes, that does look like a good candidate.  Should I try testing
before and after this change?

J


Re: Re: [2.6.21.1] SATA freeze

2007-05-10 Thread Fred Moyer

Robert Hancock wrote:
> Gerhard Mack wrote:
>> On Wed, 9 May 2007, Jeff Garzik wrote:
>>> Gerhard Mack wrote:
>>>> May  9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen
>>>> May  9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
>>>> May  9 14:51:35 mgerhard kernel:  res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout)
>>>> May  9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
>>>>
>>>> Anything I can do to figure out what's causing this?
> You're showing various flags set in the SError register, which
> suggests you're having SATA communication problems with the drive. A
> bad SATA cable or power problems would be a strong possibility.

I just joined the list today, so apologies if this email breaks
threading in anyone's mail client.

I have been seeing similar errors on two different systems.  I applied
Robert's sata_nv patch posted to the list on May 5th, and approved today
by Jeff Garzik.  I've taken several steps to ensure that this isn't a
faulty cable or drive issue.  This is running on an HP DL145 G2.  Here are
my lspci, dmesg, and relevant kernel config sections:



Linux version 2.6.21-gentoo ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1)) #6 SMP Sun May 6 16:44:40 PDT 2007

Command line: root=/dev/sda2
BIOS-provided physical RAM map:
 BIOS-e820:  - 00098800 (usable)
 BIOS-e820: 00098800 - 000a (reserved)
 BIOS-e820: 000c2000 - 0010 (reserved)
 BIOS-e820: 0010 - bff2 (usable)
 BIOS-e820: bff2 - bff29000 (ACPI data)
 BIOS-e820: bff29000 - bff8 (ACPI NVS)
 BIOS-e820: bff8 - c000 (reserved)
 BIOS-e820: d800 - d8000400 (reserved)
 BIOS-e820: d8001000 - d8001400 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec00400 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
 BIOS-e820: 0001 - 00014000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 256 used
Entering add_active_range(0, 256, 786208) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used
end_pfn_map = 1310720
DMI present.
Entering add_active_range(0, 0, 152) 0 entries of 256 used
Entering add_active_range(0, 256, 786208) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1310720
early_node_map[3] active PFN ranges
    0:        0 ->      152
    0:      256 ->   786208
    0:  1048576 ->  1310720
On node 0 totalpages: 1048248
  DMA zone: 56 pages used for memmap
  DMA zone: 1138 pages reserved
  DMA zone: 2798 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 767832 pages, LIFO batch:31
  Normal zone: 3584 pages used for memmap
  Normal zone: 258560 pages, LIFO batch:31
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: AMD  MPTABLE: Product ID: HAMMER   MPTABLE: APIC at: 0xFEE0

Processor #0 (Bootup-CPU)
Processor #1
I/O APIC #2 at 0xFEC0.
I/O APIC #3 at 0xD800.
I/O APIC #4 at 0xD8001000.
Setting APIC routing to flat
Processors: 2
Nosave address range: 00098000 - 00099000
Nosave address range: 00099000 - 000a
Nosave address range: 000a - 000c2000
Nosave address range: 000c2000 - 0010
Nosave address range: bff2 - bff29000
Nosave address range: bff29000 - bff8
Nosave address range: bff8 - c000
Nosave address range: c000 - d800
Nosave address range: d800 - d8001000
Nosave address range: d8001000 - e000
Nosave address range: e000 - f000
Nosave address range: f000 - fec0
Nosave address range: fec0 - fee0
Nosave address range: fee0 - fee01000
Nosave address range: fee01000 - fff8
Nosave address range: fff8 - 0001
Allocating PCI resources starting at c200 (gap: c000:1800)
PERCPU: Allocating 36608 bytes of per cpu data
Built 1 zonelists.  Total pages: 1029190
Kernel command line: root=/dev/sda2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Detected 2009.287 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Checking aperture...
CPU 0: aperture @ 233e00 size 32 MB

Re: [PATCH] use defines in sys_getpriority/sys_setpriority

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 10:22:23 -0700
Daniel Walker <[EMAIL PROTECTED]> wrote:

> Switch to the defines for these two checks, instead of hard
> coding the values.
> 
> Signed-Off-By: Daniel Walker <[EMAIL PROTECTED]>
> 
> ---
>  kernel/sys.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21/kernel/sys.c
> ===
> --- linux-2.6.21.orig/kernel/sys.c
> +++ linux-2.6.21/kernel/sys.c
> @@ -598,7 +598,7 @@ asmlinkage long sys_setpriority(int whic
>   int error = -EINVAL;
>   struct pid *pgrp;
>  
> - if (which > 2 || which < 0)
> + if (which > PRIO_USER || which < PRIO_PROCESS)
>   goto out;
>  
>   /* normalize: avoid signed division (rounding problems) */
> @@ -662,7 +662,7 @@ asmlinkage long sys_getpriority(int whic
>   long niceval, retval = -ESRCH;
>   struct pid *pgrp;
>  
> - if (which > 2 || which < 0)
> + if (which > PRIO_USER || which < PRIO_PROCESS)
>   return -EINVAL;
>  
>   read_lock(_lock);

I added this:

--- a/kernel/sys.c~use-defines-in-sys_getpriority-sys_setpriority-fix
+++ a/kernel/sys.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
_
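
For reference, a minimal userspace sketch of the same range check. It
assumes the PRIO_* constants from <sys/resource.h> and, like the patch,
relies on PRIO_PROCESS being the lowest and PRIO_USER the highest of
the three contiguous values. The helper name is made up for
illustration and is not kernel code.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper, not kernel code: mirrors the check in the patch.
 * It relies on PRIO_PROCESS <= PRIO_PGRP <= PRIO_USER being contiguous,
 * which is the same assumption the patched kernel code makes.
 */
int prio_which_is_valid(int which)
{
	if (which > PRIO_USER || which < PRIO_PROCESS)
		return 0;	/* the kernel paths return -EINVAL here */
	return 1;
}
```

The named constants keep the check readable, but it only stays correct
as long as nobody adds a PRIO_* value outside that range.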



Re: [PATCH] libata: add human-readable error value decoding

2007-05-10 Thread Jeff Garzik

Robert Hancock wrote:
> I don't think this is as big of a deal here as in other cases, like oops
> output. With libata errors, if they're at the console (which they'd have
> to be to see these messages), unless something has actually caused a
> panic the scrollback buffer should still be functional and they'd be
> able to see the entire output..



Scrollback rarely works as planned, for me.  Overall, a balance must be 
found.


More information is more helpful.  But.

There are downsides to spewing everything possible, upon error.  You 
cause logging to the possibly problematic disk, you push older messages 
out of the printk ring buffer, etc., etc.
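
To illustrate the ring buffer point, here is a toy userspace sketch.
It is not the printk implementation (which is a byte ring rather than
fixed message slots); the struct, the sizes, and the function names
are all assumptions made up for this example. It only shows how a
verbose error path evicts the older message that explained the root
cause.

```c
#include <stdio.h>

#define RING_SLOTS 4	/* toy capacity, chosen for the example */
#define MSG_LEN    32

struct msg_ring {
	char slot[RING_SLOTS][MSG_LEN];
	unsigned long head;	/* total number of messages ever logged */
};

/* Store a message, overwriting the oldest one once the ring is full. */
void ring_log(struct msg_ring *r, const char *msg)
{
	snprintf(r->slot[r->head % RING_SLOTS], MSG_LEN, "%s", msg);
	r->head++;
}

/* Oldest message still held, or NULL if nothing has been logged. */
const char *ring_oldest(const struct msg_ring *r)
{
	if (r->head == 0)
		return NULL;
	if (r->head <= RING_SLOTS)
		return r->slot[0];
	return r->slot[r->head % RING_SLOTS];
}
```

With four slots, logging one "root cause" line followed by four lines
of follow-up spew leaves only the spew in the ring.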


Jeff




Re: 2.6.21-mm2 -- powerpc missing kset

2007-05-10 Thread Stephen Rothwell
On Thu, 10 May 2007 08:48:02 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
> On Thu, 10 May 2007 22:16:31 +1000 Stephen Rothwell wrote:
>
> > On Thu, 10 May 2007 12:48:28 +0100 Andy Whitcroft <[EMAIL PROTECTED]> wrote:
> > >
> > > arch/powerpc/platforms/pseries/power.c:31: warning: `struct subsystem'
> > > declared inside parameter list
> >
> > There is no explicit reference to struct subsystem in the current version
> > of that file.
>
> There is in 2.6.21-mm2.  Are you saying that it's been fixed
> somewhere else?  (where?)

Linus' tree.  It was fixed by commit
823bccfc4002296ba88c3ad0f049e1abd8108d30 ('remove "struct subsystem" as
it is no longer needed') from Greg Kroah-Hartman dated 2007-04-14 which
was applied before v2.6.21-mm2 (according to gitk).

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: [PATCH] Use boot based time for process start time and boot time in /proc

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 19:10:42 +0200
Tomas Janousek <[EMAIL PROTECTED]> wrote:

> Commit 411187fb05cd11676b0979d9fbf3291db69dbce2 caused boot time to move and
> process start times to become invalid after suspend. Using boot based time for
> those restores the old behaviour and fixes the issue.
> 
> ..
>
> @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
>   unsigned long jif;
>   cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
>   u64 sum = 0;
> + struct timespec boottime;
>  
>   user = nice = system = idle = iowait =
>   irq = softirq = steal = cputime64_zero;
> - jif = - wall_to_monotonic.tv_sec;
> - if (wall_to_monotonic.tv_nsec)
> - --jif;
> + getboottime();
> + jif = boottime.tv_sec;
> + if (boottime.tv_nsec)
> + ++jif;
>

Is the switch from --jif to ++jif a functional change?  If so, how come?
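
A small userspace sketch of the arithmetic, under two assumptions that
are not confirmed by the quoted hunk: that wall_to_monotonic is a
normalized timespec holding the negated boot time, and that
getboottime() returns the normalized negation of it with no suspend
time accumulated. The function names are made up for illustration.

```c
#define NSEC_PER_SEC 1000000000L

/* Old code path: jif = -wall_to_monotonic.tv_sec, minus one if nsec != 0. */
long btime_old(long wtm_sec, long wtm_nsec)
{
	long jif = -wtm_sec;

	if (wtm_nsec)
		--jif;
	return jif;
}

/*
 * New code path, assuming boottime is the normalized negation of
 * wall_to_monotonic (0 <= tv_nsec < NSEC_PER_SEC) and no sleep time.
 */
long btime_new(long wtm_sec, long wtm_nsec)
{
	long sec = -wtm_sec;
	long nsec = -wtm_nsec;
	long jif;

	/* normalize the negated value */
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		--sec;
	}
	jif = sec;
	if (nsec)
		++jif;
	return jif;
}
```

Under those assumptions the two agree when tv_nsec is zero and differ
by one second otherwise: the old code rounds the boot time down, the
new code rounds it up.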

>   for_each_possible_cpu(i) {
>   int j;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 40645b4..386ff51 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -918,7 +918,7 @@ struct task_struct {
>   unsigned int rt_priority;
>   cputime_t utime, stime;
>   unsigned long nvcsw, nivcsw; /* context switch counts */
> - struct timespec start_time;
> + struct timespec start_time, real_start_time;

no, please prefer to do

struct timespec start_time;
struct timespec real_start_time;

which gives a nice place to add a comment documenting the field.

Please document fields.

What is the difference between start_time and real_start_time?




Re: [PATCH] Introduce boot based time

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 19:10:25 +0200
Tomas Janousek <[EMAIL PROTECTED]> wrote:

> The commits
>   411187fb05cd11676b0979d9fbf3291db69dbce2 (GTOD: persistent clock support)
>   c1d370e167d66b10bca3b602d3740405469383de (i386: use GTOD persistent clock
> support)
> changed the monotonic time so that it no longer jumps after resume, but it's
> not possible to use it for boot time and process start time calculations then.
> Also, the uptime no longer increases during suspend.
> 
> I add a variable to track the wall_to_monotonic changes, a function to get the
> real boot time and a function to get the boot based time from the monotonic
> one.

From: Andrew Morton <[EMAIL PROTECTED]>

- I don't think those symbols are needed in modules.

- Document total_sleep_time units (would have been better to call it
  total_sleep_time_secs, perhaps).

Cc: John Stultz <[EMAIL PROTECTED]>
Cc: Tomas Janousek <[EMAIL PROTECTED]>
Cc: Tomas Smetana <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 kernel/time/timekeeping.c |6 +-
 1 files changed, 1 insertion(+), 5 deletions(-)

diff -puN include/linux/time.h~introduce-boot-based-time-fix include/linux/time.h
diff -puN kernel/time/timekeeping.c~introduce-boot-based-time-fix kernel/time/timekeeping.c
--- a/kernel/time/timekeeping.c~introduce-boot-based-time-fix
+++ a/kernel/time/timekeeping.c
@@ -46,7 +46,7 @@ EXPORT_SYMBOL(xtime_lock);
  */
 struct timespec xtime __attribute__ ((aligned (16)));
 struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
-static unsigned long total_sleep_time;
+static unsigned long total_sleep_time; /* seconds */
 
 EXPORT_SYMBOL(xtime);
 
@@ -503,8 +503,6 @@ void getboottime(struct timespec *ts)
- wall_to_monotonic.tv_nsec);
 }
 
-EXPORT_SYMBOL(getboottime);
-
 /**
  * monotonic_to_bootbased - Convert the monotonic time to boot based.
  * @ts:pointer to the timespec to be converted
@@ -513,5 +511,3 @@ void monotonic_to_bootbased(struct times
 {
ts->tv_sec += total_sleep_time;
 }
-
-EXPORT_SYMBOL(monotonic_to_bootbased);
_



Re: [RFC] Slab allocators: Drop support for destructors

2007-05-10 Thread Paul Mundt
On Thu, May 10, 2007 at 12:00:08PM -0700, Christoph Lameter wrote:
> As far as I can tell there is only a single slab destructor left (there 
> is currently another in i386 but it's going to go as soon as Andi merges 
> i386's support for quicklists).
> 
> I wonder how difficult it would be to remove it? If we have no need for 
> destructors anymore then maybe we could remove destructor support from the 
> slab allocators? There is no point in checking for destructor uses in 
> the slab allocators if there are none.
> 
> Or are there valid reasons to keep them around? It seems they were mainly 
> used for list management which required them to take a spinlock. Taking a 
> spinlock in a destructor is a bit risky since the slab allocators may run 
> the destructors anytime they decide a slab is no longer needed.
> 
> Or do we want to continue to support destructors? If so, why?
> 
[snip pmb stuff]

I'll take a look at tidying up the PMB slab, getting rid of the dtor
shouldn't be terribly painful. I simply opted to do the list management
there since others were doing it for the PGD slab cache at the time that
was written.


Re: slub-i386-support.patch

2007-05-10 Thread William Lee Irwin III
William Lee Irwin III wrote:
>> Xen is not mandatory as it now stands.

On Thu, May 10, 2007 at 02:28:05PM -0700, Jeremy Fitzhardinge wrote:
> ?  I'm hoping to merge the Xen code in the next couple of days, so I'd
> appreciate it if we don't break the foundations just before building the
> building.

CONFIG_X86_PAE without CONFIG_PARAVIRT is the case in question here.
What's done in that case can't break Xen because it doesn't run under
Xen.


William Lee Irwin III wrote:
>>  Also, I intend to fix up Xen
>> at some point so it doesn't need this.

On Thu, May 10, 2007 at 02:28:05PM -0700, Jeremy Fitzhardinge wrote:
> As I mentioned in the previous mail, it's only really necessary for a
> 32-bit guest under a 32-bit hypervisor.  While that's going to be a
> supported configuration for a long time, we expect that people will
> increasingly use 64-bit hypervisors on new machines, so this will become
> less of an issue.
> We're also looking at shadowing the 4 top-level PAE entries rather than
> using them directly, since the shadows only need to be updated when
> reloading cr3.  This would allow us to use compact pgds, so long as
> there's some other way to maintain the pgd list (ideally, something that
> can be shared with non-PAE).

ISTR you describing this method earlier. This is what I had in mind
for fixing up Xen not to need full PAGE_SIZE-sized pgd's.


On Thu, May 10, 2007 at 02:28:05PM -0700, Jeremy Fitzhardinge wrote:
> Or did I miss something?  Is pgd_list being maintained some other way
> with slub/quicklists?

No, it's identical. clameter's code makes PAGE_SIZE-sized pgd's
unconditional for CONFIG_X86_PAE, which is what bothered me.


William Lee Irwin III wrote:
>> The alternative was 64-bit generation numbers incremented at the time
>> of change_page_attr(). If generation numbers were used, it would be
>> possible to dispose of the list altogether. Given the awkwardness of
>> the list maintenance for Xen, it may be worth using them now. PAE
>> pgd's could merely double in size to maintain those for the unshared
>> kernel pmd case, and remain 32B otherwise. Full PAGE_SIZE -sized pgd's
>> for 2-level pagetables could distribute the generation number across
>> page->index and page->private, or any other fields available.

On Thu, May 10, 2007 at 02:28:05PM -0700, Jeremy Fitzhardinge wrote:
> If you use page->index for that, how does pgd_list get linked together
> for vmalloc syncing?

It doesn't need to be linked together for vmalloc_sync(). Just increment
the generation number and walk the mmlist the same as for pageattr.c


-- wli


Re: [PATCH] libata: add human-readable error value decoding

2007-05-10 Thread Robert Hancock

Jeff Garzik wrote:
> Mark Lord wrote:
>> If we're compiling the messages into the kernel regardless,
>> then it doesn't really make much sense to NOT show all of them
>> on the error paths.
>
> Not true.  Uncontrolled message spewage inevitably results in critical
> information scrolling off the screen, before a user can take a digital
> photo of the output...  Or of users being confused by subsequent error
> fallout (i.e. multiple oopses reporting problem).
>
> Moderation and restraint still have roles to play...  :)
>
> Jeff

I don't think this is as big of a deal here as in other cases, like oops 
output. With libata errors, if they're at the console (which they'd have 
to be to see these messages), unless something has actually caused a 
panic the scrollback buffer should still be functional and they'd be 
able to see the entire output..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH] locks: fix F_GETLK regression (failure to find conflicts)

2007-05-10 Thread Doug Chapman
On Thu, 2007-05-10 at 18:38 -0400, J. Bruce Fields wrote:
> In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock
> to modify its single file_lock argument instead of taking separate input
> and output arguments.  This makes it no longer safe to set the output
> lock's fl_type to F_UNLCK before looking for a conflict, since that
> means searching for a conflict against a lock with type F_UNLCK.
> 
> This fixes a regression which causes F_GETLK to incorrectly report no
> conflict on most filesystems (including any filesystem that doesn't do
> its own locking).
> 
> Also fix posix_lock_to_flock() to copy the lock type.  This isn't
> strictly necessary, since the caller already does this; but it seems
> less likely to cause confusion in the future.
> 
> Thanks to Doug Chapman for the bug report.
> 
> Signed-off-by: "J. Bruce Fields" <[EMAIL PROTECTED]>
> ---
>  fs/locks.c |5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 671a034..8ec16ab 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -669,7 +669,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>  {
>   struct file_lock *cfl;
>  
> - fl->fl_type = F_UNLCK;
>   lock_kernel();
>   for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
>   if (!IS_POSIX(cfl))
> @@ -681,7 +680,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>   __locks_copy_lock(fl, cfl);
>   unlock_kernel();
>   return 1;
> - }
> + } else
> + fl->fl_type = F_UNLCK;
>   unlock_kernel();
>   return 0;
>  }
> @@ -1632,6 +1632,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
>   flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
>   fl->fl_end - fl->fl_start + 1;
>   flock->l_whence = 0;
> + flock->l_type = fl->fl_type;
>   return 0;
>  }
>  
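
For anyone wanting to reproduce the regression from userspace, a
minimal sketch of a conflict test follows. It is not the test program
mentioned below; the function name and the temporary file path are
assumptions made up for this example. The parent takes a write lock on
a temporary file and a forked child probes it with F_GETLK; since
POSIX locks are not inherited across fork(), the child should see a
conflict.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical test helper.  Returns 1 if a child process sees the
 * parent's write lock through F_GETLK, 0 if F_GETLK reports F_UNLCK
 * (the regression described above), and -1 on setup failure.
 */
int getlk_sees_conflict(void)
{
	char path[] = "/tmp/getlk-test-XXXXXX";	/* assumed scratch location */
	struct flock fl = {0};
	int fd, status;
	pid_t pid;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open descriptor keeps the file alive */

	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file" */
	if (fcntl(fd, F_SETLK, &fl) < 0) {
		close(fd);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* Child: the parent's lock is not inherited, so probe it. */
		struct flock probe = {0};

		probe.l_type = F_WRLCK;
		probe.l_whence = SEEK_SET;
		if (fcntl(fd, F_GETLK, &probe) < 0)
			_exit(2);
		_exit(probe.l_type == F_UNLCK ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status) == 0;
}
```

On a kernel with the fix applied this is expected to return 1; a
return of 0 would match the "no conflict" symptom in the description.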

I tested this both with my little hacked up test program as well as with
the LTP tests.  Looks good.  Nice job on the quick turnaround on this
Bruce.

- Doug




Re: [PATCH] libata: add human-readable error value decoding

2007-05-10 Thread Robert Hancock

Tejun Heo wrote:
>> +if (ehc->i.serror)
>> +ata_port_printk(ap, KERN_ERR,
>> +  "SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n",
>> +  ehc->i.serror & SERR_DATA_RECOVERED ? "RecovDataErr " : "",
>> +  ehc->i.serror & SERR_COMM_RECOVERED ? "RecovCommErr " : "",
>> +  ehc->i.serror & SERR_DATA ? "UnrecovDataErr " : "",
>> +  ehc->i.serror & SERR_PERSISTENT ? "PersistErr " : "",
>> +  ehc->i.serror & SERR_PROTOCOL ? "ProtocolErr " : "",
>> +  ehc->i.serror & SERR_INTERNAL ? "HostInternalErr " : "",
>> +  ehc->i.serror & SERR_PHYRDY_CHG ? "PHYRdyChg " : "",
>> +  ehc->i.serror & SERR_PHY_INT_ERR ? "PHYInternalErr " : "",
>> +  ehc->i.serror & SERR_COMM_WAKE ? "CommWake " : "",
>> +  ehc->i.serror & SERR_10B_8B_ERR ? "10B8BErr " : "",
>> +  ehc->i.serror & SERR_DISPARITY ? "Disparity " : "",
>> +  ehc->i.serror & SERR_CRC ? "CRCErr " : "",
>> +  ehc->i.serror & SERR_HANDSHAKE ? "HandshakeErr " : "",
>> +  ehc->i.serror & SERR_LINK_SEQ_ERR ? "LinkSeqErr " : "",
>> +  ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "",
>> +  ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
>> +  ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );
>
> I'm not really convinced whether this is necessary.  The human readable
> form is also a bit cryptic and can get quite long.  So, mild NACK from me.



It certainly seems useful when debugging hotplug issues or random SATA 
problems which end up being caused by communication problems. Without 
this output, Joe User stands no chance of figuring out what's going on, 
and neither does Joe libata Developer unless they really care to dig 
through the spec and count bits to figure out what they mean. At least 
with this you can see that there was a CRC error, etc. and go from that..
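
As a rough illustration of what the decoding buys you, here is a
table-driven userspace sketch of the same idea. It is not the patch
itself: the function and table names are made up, and the bit
positions are my understanding of the SError register layout, so treat
them as assumptions and check them against include/linux/libata.h
before relying on them.

```c
#include <stddef.h>
#include <stdio.h>

struct serr_name {
	unsigned int bit;
	const char *name;
};

/* Bit positions are assumptions; verify against the SERR_* definitions. */
static const struct serr_name serr_names[] = {
	{ 1u << 0,  "RecovDataErr" },
	{ 1u << 1,  "RecovCommErr" },
	{ 1u << 8,  "UnrecovDataErr" },
	{ 1u << 9,  "PersistErr" },
	{ 1u << 10, "ProtocolErr" },
	{ 1u << 11, "HostInternalErr" },
	{ 1u << 16, "PHYRdyChg" },
	{ 1u << 17, "PHYInternalErr" },
	{ 1u << 18, "CommWake" },
	{ 1u << 19, "10B8BErr" },
	{ 1u << 20, "Disparity" },
	{ 1u << 21, "CRCErr" },
	{ 1u << 22, "HandshakeErr" },
	{ 1u << 23, "LinkSeqErr" },
	{ 1u << 24, "TransStatTransErr" },
	{ 1u << 25, "UnrecogFIS" },
	{ 1u << 26, "DevExchanged" },
};

/*
 * Write the names of all set bits into buf, each followed by a space.
 * Returns the number of characters written (excluding the NUL).  Names
 * that do not fit are dropped whole rather than truncated.
 */
size_t serr_decode(unsigned int serror, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(serr_names) / sizeof(serr_names[0]); i++) {
		int n;

		if (!(serror & serr_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s ", serr_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* discard the partial name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

A table like this also keeps the output length proportional to the
number of bits actually set, which goes some way toward the concern
about long messages.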


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

2007-05-10 Thread David Chinner
On Thu, May 10, 2007 at 04:07:30PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > Just to confirm this isn't a result of a recent change, can you reproduce
> > this on a 2.6.20 or 2.6.21 kernel? (sorry if you've already done this -
> > I'm juggling so many things at once it's easy to forget little things).
> 
> It is the result of a recent change.  I had seen no problem until around
> 2.6.21-git8-11.  I will try again with a plain 2.6.21 kernel, just to
> confirm.

Ok, this is important to know because we merged a mod around that time
that changes the way we handle the updates to the file size i.e. the
fix for the NULL-files-on-crash problem:

http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ba87ea699ebd9dd577bf055ebc4a98200e337542

and that means the size of the file is not updated to the incore
cached inode until after the data write is complete. The symptoms
being seen would match with an
inode-not-being-written-after-last-data-write bug in this mod.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH 1/2] AFS: Fix interminable loop in afs_write_back_from_locked_page()

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 15:33:34 +0100
David Howells <[EMAIL PROTECTED]> wrote:

> Following bug was uncovered by compiling with '-W' flag:

gcc -W finds a number of fairly scary bugs.

More than one would expect, given that it is recommended in
Documentation/SubmitChecklist, which everyone reads ;)


Re: slub-i386-support.patch

2007-05-10 Thread Hugh Dickins
On Thu, 10 May 2007, William Lee Irwin III wrote:
> On Thu, May 10, 2007 at 09:03:39PM +0100, Hugh Dickins wrote:
> > Though when I look at the patchset (copied below), I do wonder why
> > it puts a quicklist_trim() into i386's cpu_idle() and flush_tlb_mm():
> > neither is where I'd expect us to be secretly freeing pages.  Ah,
> > several arches do it in cpu_idle(): how odd, oh well.
> 
> So now quicklist semantics vs. TLB flushing are the motive behind the
> odd flush_tlb_mm() affair. The real trick with it is that flushing
> must never occur until the TLB flush. Any change to the core quicklist
> code that retires pages back to the page allocator earlier (e.g. based
> on some limit) will break things badly.

I don't think that's right.  It's vital that TLB (of an active mm)
be flushed before freeing its page back to the quicklist, before it's
recycled to another mm (or elsewhere in this mm); but having done that,
it really doesn't matter much when quicklist_trim() (check_pgt_cache)
is called to free surplus pages from quicklist back to page_alloc.c.

tlb_finish_mmu() happens to be the traditional place it's done, and
that's where we expect it.  flush_tlb_mm() avoids flushing TLB unless
it's actually required for the mm in question: so wouldn't be a good
place to rely on flushing TLB for pages freed earlier from other mms
(but we'd already be in trouble to be leaving them that late).

I'm guessing (haven't rechecked source) that the cpu_idle() call comes
about because the top level pgd of a process gets freed very late in
its exit, and after a great flurry of processes have just exited,
perhaps there was nothing to free up the accumulation.  Though
it still strikes me as an odd place to do it.
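
The ordering constraint described above can be sketched as a small userspace
toy model. All names here (toy_quicklist, toy_mm, toy_tlb_flush() and so on)
are hypothetical and are not the kernel's quicklist API. The sketch assumes
only what the text says: a page may go onto the quicklist once the TLB has
been flushed, while trimming surplus pages back to the page allocator can
happen at any later time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names are illustrative, not the kernel quicklist API. */
struct toy_quicklist {
    size_t nr_pages;   /* page-table pages cached for reuse */
    size_t returned;   /* pages handed back to the page allocator */
};

struct toy_mm {
    int tlb_dirty;     /* 1 while stale TLB entries may still exist */
};

static void toy_tlb_flush(struct toy_mm *mm)
{
    assert(mm);
    mm->tlb_dirty = 0;
}

/* Freeing to the quicklist is only legal once the TLB has been flushed. */
static int toy_quicklist_free(struct toy_quicklist *q, const struct toy_mm *mm)
{
    assert(q && mm);
    if (mm->tlb_dirty)
        return -1;     /* another mm could reuse a page that is still mapped */
    q->nr_pages++;
    return 0;
}

/* Trimming has no ordering constraint: it only sheds surplus cached pages. */
static size_t toy_quicklist_trim(struct toy_quicklist *q, size_t min_pages)
{
    size_t freed = 0;

    assert(q);
    while (q->nr_pages > min_pages) {
        q->nr_pages--;
        q->returned++;
        freed++;
    }
    return freed;
}
```

In this model a trim call placed in an idle loop is harmless, because every
page on the list already had its TLB entries flushed before it got there.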

Hugh


Re: getcpu after sched_setaffinity

2007-05-10 Thread Ulrich Drepper

Andi Kleen wrote:
> Probably. In principle getcpu() (where does the sched_ come from btw?)

getcpu() is an unacceptable name.  All the other functions dealing with
CPU (sets, etc) have a sched_ prefix.

> is only designed for the case where you don't set the affinity explicitly;
> otherwise you should already know where you are and don't need it.

That's not true in general.  Yes, because I want to test vgetcpu() I
restrict the set to just one CPU.

But if I have more than 2 "CPUs" and I set the affinity to two CPUs
which currently are not used you cannot make this argument.

getcpu should always work correctly, not only if you cannot determine it
in another way.

> Hmm ok one could probably define memset(..., 0) as an invalidation
> interface, but because of the considerations above I don't think
> it is really needed.

It is needed.

For now I added the cache clearing in the setaffinity calls in libc.
Resetting the cache to {0,0} seems to work.


--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

2007-05-10 Thread David Chinner
On Thu, May 10, 2007 at 05:51:29PM -0400, Chuck Ebbert wrote:
> Jeremy Fitzhardinge wrote:
> > Chuck Ebbert wrote:
> >> What CPU architecture is this happening on? Not i686 with PAE by
> >> any chance?
> > 
> > Yes.  Why?
> 
> I have a bug report where NFS files are corrupted only with PAE clients.
> Corruption is at the end of the (newly untarred) files. Doesn't happen
> without PAE.

Chuck, can you post a pointer to this thread?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

2007-05-10 Thread Jeremy Fitzhardinge
David Chinner wrote:
> Just to confirm this isn't a result of a recent change, can you reproduce
> this on a 2.6.20 or 2.6.21 kernel? (sorry if you've already done this - I'm
> juggling so many things at once it's easy to forget little things).

It is the result of a recent change.  I had seen no problem until around
2.6.21-git8-11.  I will try again with a plain 2.6.21 kernel, just to
confirm.

J



Re: [FW ide-cs] Re: jvc cdrom drive lockup

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 14:06:54 +0800
"Zhang, Yanmin" <[EMAIL PROTECTED]> wrote:

> On Sun, 2007-05-06 at 16:00 +0100, Richard Kennedy wrote:
> > On Fri, 2007-05-04 at 23:32 +0900, Komuro wrote:
> > > On Thu, 03 May 2007 15:29:19 +0100
> > > Richard Kennedy <[EMAIL PROTECTED]> wrote:
> > > 
> > > 
> > > IDE bugs should be posted to the linux-ide mailing list.
> > > 
> > > 
> > > > Hi all, 
> > > > I have a JVC MP-CDX1 cdrom drive that came with my laptop which used to
> > > > work with ide-cs but stopped working with newer kernels.
> > > > 
> > > > I added its ident to ide-cs.c (see patch below) and the drive now is
> > > > detected and gets mounted when plugged in and seems to work correctly.
> > > > 
> > > > But when I eject the card, pccardctl eject 0, the laptop locks up
> > > > completely, there are no messages in the log, and the fan goes to full
> > > > speed so I guess the cpu is running at 100%.
> > > > Any ideas what's going wrong or how to debug it ? 
> > > > Is there anything else I need to patch to get this working ?
> > > > 
> > > > Thanks
> > > > Richard
> > > > 
> > > > card info :- 
> > > > 
> > > > May  3 11:22:52 mininote kernel: pccard: PCMCIA card inserted into slot 0
> > > > May  3 11:22:52 mininote kernel: cs: memory probe 0xa000-0xa0ff: clean.
> > > > May  3 11:22:52 mininote kernel: pcmcia: registering new device pcmcia0.0
> > > > May  3 11:22:53 mininote kernel: hdc: UJDB130, ATAPI CD/DVD-ROM drive
> > > > May  3 11:22:53 mininote kernel: ide1 at 0x190-0x197,0x396 on irq 3
> > > > May  3 11:22:53 mininote kernel: ide-cs: hdc: Vpp = 0.0
> > > > May  3 11:22:54 mininote kernel: hdc: ATAPI 20X CD-ROM drive, 128kB Cache
> > > > May  3 11:22:54 mininote kernel: Uniform CD-ROM driver Revision: 3.20
> > > > May  3 11:23:04 mininote hald: mounted /dev/hdc on behalf of uid 500
> > > > May  3 11:23:34 mininote hald: unmounted /dev/hdc from '/media/FC_4 i386 ftp #1' on behalf of uid 500
> > > > May  3 11:24:17 mininote kernel: pccard: card ejected from slot 0
> > > > << lockup happened here >>
> 
> > I rebuilt the kernel with the lock dependency checking turned on, which
> > shows up 2 problems (and also breaks the deadlock).
> > 
> > kernel: pccard: card ejected from slot 0
> > kernel: 
> 
> > kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
> > kernel: in_atomic():0, irqs_disabled():1
> > kernel: INFO: lockdep is turned off.
> > kernel: irq event stamp: 2258
> > kernel: hardirqs last  enabled at (2257): [] kfree+0x78/0x7f
> > kernel: hardirqs last disabled at (2258): [] _spin_lock_irq+0xc/0x3a
> > kernel: softirqs last  enabled at (2252): [] do_softirq+0x4d/0xb6
> > kernel: softirqs last disabled at (2243): [] do_softirq+0x4d/0xb6
> > kernel:  [] down_read+0x15/0x4d
> > kernel:  [] pci_get_subsys+0x68/0xea
> > kernel:  [] pci_get_device+0x16/0x19
> > kernel:  [] init_hwif_default+0x28/0xf0
> > kernel:  [] ide_unregister+0x242/0x573
> > kernel:  [] ide_release+0x18/0x28 [ide_cs]
> > kernel:  [] ide_detach+0x8/0x14 [ide_cs]
> > kernel:  [] pcmcia_device_remove+0x50/0xb5
> > kernel:  [] __device_release_driver+0x71/0x8e
> > kernel:  [] device_release_driver+0x31/0x46
> > kernel:  [] bus_remove_device+0x70/0x80
> > kernel:  [] device_del+0x162/0x1c6
> > kernel:  [] device_unregister+0x8/0x10
> > kernel:  [] pcmcia_card_remove+0x58/0x77
> > kernel:  [] ds_event+0x56/0x87
> > kernel:  [] kobject_get+0xf/0x13
> > kernel:  [] send_event+0x31/0x49
> > kernel:  [] socket_shutdown+0xc/0xb3
> > kernel:  [] socket_remove+0x1c/0x26
> > kernel:  [] pcmcia_eject_card+0x3f/0x4c
> > kernel:  [] pccard_store_eject+0x1b/0x22
> > kernel:  [] pccard_store_eject+0x0/0x22
> > kernel:  [] dev_attr_store+0x27/0x2c
> > kernel:  [] sysfs_write_file+0xbf/0xe8
> > kernel:  [] sysfs_write_file+0x0/0xe8
> > kernel:  [] vfs_write+0xa8/0x154
> > kernel:  [] sys_write+0x41/0x67
> > kernel:  [] sysenter_past_esp+0x5f/0x99
> > kernel:  ===
> Before calling init_hwif_default, ide_unregister takes the lock ide_lock and
> disables irqs. init_hwif_default calls ide_default_io_base, which calls
> pci_get_device; later pci_get_subsys tries to take the semaphore pci_bus_sem
> and goes to sleep.
> 
> In general, pci_get_device should be called with irqs enabled.
> 
> There is one thing I still don't understand. If you test it on a mobile
> machine, the process mostly won't sleep when taking pci_bus_sem, because
> there are not many opportunities for 2 processes to take the semaphore at
> the same time.
> 
> Since ide_default_io_base only needs to know whether PCI is initialized, it
> just needs to check whether the list pci_devices is empty.
> 
> Could you try below patch against 2.6.21?
> 
> Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> 
> ---
> 
> diff -Nraup linux-2.6.21/drivers/pci/probe.c 
> linux-2.6.21_fix/drivers/pci/probe.c
> --- linux-2.6.21/drivers/pci/probe.c  2007-05-10 11:35:06.0 +0800
> +++ 

Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

2007-05-10 Thread Mel Gorman
On (10/05/07 15:49), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
> 
> > > I cannot predict how allocations on a slab will be performed. In order 
> > > to avoid the higher order allocations we would have to add a flag 
> > > that tells SLUB at slab creation time that this cache will be 
> > > used for atomic allocs and thus we can avoid configuring slabs in such a 
> > > way that they use higher order allocs.
> > > 
> > 
> > It is an option. I had the gfp flags passed in to kmem_cache_create() in
> > mind for determining this but SLUB creates slabs differently and different
> > flags could be passed into kmem_cache_alloc() of course.
> 
> So we have a collection of flags to add
> 
> SLAB_USES_ATOMIC

This is a possibility.

> SLAB_TEMPORARY

I have a patch for this sitting in a queue waiting for testing

> SLAB_PERSISTENT
> SLAB_RECLAIMABLE
> SLAB_MOVABLE

I don't think these are required because the necessary information is
available from the GFP flags.
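
As a rough illustration of deriving a migrate type from the GFP flags alone,
here is a userspace sketch that mirrors the allocflags_to_migratetype() hunk
in the patch further down. The TOY_ names are hypothetical, and the bit
values chosen for the three flags are assumptions made for the example; only
the decision logic is taken from the patch.

```c
#include <assert.h>

/* Bit values are assumptions for illustration; only the logic follows the patch. */
#define TOY_GFP_WAIT         0x10u
#define TOY_GFP_RECLAIMABLE  0x80000u
#define TOY_GFP_MOVABLE      0x100000u

enum toy_migratetype {
    TOY_MIGRATE_UNMOVABLE   = 0,
    TOY_MIGRATE_RECLAIMABLE = 1,
    TOY_MIGRATE_MOVABLE     = 2,
    TOY_MIGRATE_HIGHATOMIC  = 3
};

static int toy_migratetype(unsigned int gfp_flags, unsigned int order, int in_irq)
{
    /* Cluster high-order allocations that are not allowed to sleep. */
    if (order > 0 && (!(gfp_flags & TOY_GFP_WAIT) || in_irq))
        return TOY_MIGRATE_HIGHATOMIC;

    /* Otherwise cluster on mobility: movable is bit 1, reclaimable is bit 0. */
    return (((gfp_flags & TOY_GFP_MOVABLE) != 0) << 1) |
           ((gfp_flags & TOY_GFP_RECLAIMABLE) != 0);
}
```

Under these assumed bit values, the mask from the bug title (0x84020, order 2)
has no wait bit set, so the sketch would place it in the high-order atomic
group.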

> 
> ?
> 
> > Another alternative is that anti-frag used to also group high-order
> > allocations together and make it hard to fallback to those areas
> > for non-atomic allocations. It is currently backed out by the
> > patch dont-group-high-order-atomic-allocations.patch because
> > it was intended for rare high-order short-lived allocations
> > such as e1000 that are currently dealt with by MIGRATE_RESERVE
> > (bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
> > The high-order atomic groupings may help here because the high-order
> > allocations are long-lived and would claim contiguous areas.
> > 
> > The last alternative I think I mentioned already is to have the minimum
> > order kswapd reclaims as the same order SLUB uses instead of 0 so that
> > min_free_kbytes is kept at higher orders than current.
> 
> Would you get a patch to Nicholas to test either of these solutions?

I do not have a kswapd related patch ready but the first alternative is
readily available.

Nicholas, could you back out the patch
dont-group-high-order-atomic-allocations.patch and test again please?
The following patch has the same effect. Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-grouphigh/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h   2007-05-09 10:21:28.0 +0100
+++ linux-2.6.21-mm2-grouphigh/include/linux/mmzone.h   2007-05-10 23:54:45.0 +0100
@@ -38,8 +38,9 @@ extern int page_group_by_mobility_disabl
 #define MIGRATE_UNMOVABLE     0
 #define MIGRATE_RECLAIMABLE   1
 #define MIGRATE_MOVABLE       2
-#define MIGRATE_RESERVE       3
-#define MIGRATE_TYPES         4
+#define MIGRATE_HIGHATOMIC    3
+#define MIGRATE_RESERVE       4
+#define MIGRATE_TYPES         5
 
 #define for_each_migratetype_order(order, type) \
    for (order = 0; order < MAX_ORDER; order++) \
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/pageblock-flags.h linux-2.6.21-mm2-grouphigh/include/linux/pageblock-flags.h
--- linux-2.6.21-mm2-clean/include/linux/pageblock-flags.h  2007-05-09 10:21:28.0 +0100
+++ linux-2.6.21-mm2-grouphigh/include/linux/pageblock-flags.h  2007-05-10 23:54:45.0 +0100
@@ -31,7 +31,7 @@
 
 /* Bit indices that affect a whole block of pages */
 enum pageblock_bits {
-   PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
+   PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
    NR_PAGEBLOCK_BITS
 };
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/page_alloc.c linux-2.6.21-mm2-grouphigh/mm/page_alloc.c
--- linux-2.6.21-mm2-clean/mm/page_alloc.c  2007-05-09 10:21:28.0 +0100
+++ linux-2.6.21-mm2-grouphigh/mm/page_alloc.c  2007-05-10 23:54:45.0 +0100
@@ -167,6 +167,11 @@ static inline int allocflags_to_migratet
    if (unlikely(page_group_by_mobility_disabled))
        return MIGRATE_UNMOVABLE;
 
+   /* Cluster high-order atomic allocations together */
+   if (unlikely(order > 0) &&
+           (!(gfp_flags & __GFP_WAIT) || in_interrupt()))
+       return MIGRATE_HIGHATOMIC;
+
    /* Cluster based on mobility */
    return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
        ((gfp_flags & __GFP_RECLAIMABLE) != 0);
@@ -713,10 +718,11 @@ static struct page *__rmqueue_smallest(s
  * the free lists for the desirable migrate type are depleted
  */
 static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
-   [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-   [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-   [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-   [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE,   

Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

2007-05-10 Thread Christoph Lameter
On Fri, 11 May 2007, Mel Gorman wrote:

> Nicholas, could you back out the patch
> dont-group-high-order-atomic-allocations.patch and test again please?
> The following patch has the same effect. Thanks

Great! Thanks.


Re: 2.6.21-git10/11: files getting truncated on xfs? or maybe an nlink problem?

2007-05-10 Thread David Chinner
On Thu, May 10, 2007 at 02:54:25PM -0700, Jeremy Fitzhardinge wrote:
> Chuck Ebbert wrote:
> > Jeremy Fitzhardinge wrote:
> >   
> >> Chuck Ebbert wrote:
> >> 
> >>> What CPU architecture is this happening on? Not i686 with PAE by
> >>> any chance?
> >>>   
> >> Yes.  Why?
> >> 
> >
> > I have a bug report where NFS files are corrupted only with PAE clients.
> > Corruption is at the end of the (newly untarred) files. Doesn't happen
> > without PAE.
> >   
> 
> Hm, suggestive, but I'm not convinced.  Two differences to this situation:
> 
>    1. Immediately after the clone ("untar"), the contents are completely
>       OK; it's only after a umount/mount cycle that problems appear
>2. There's no corruption as such; the files are just too short.  And
>   it seems they're at a previously OK length, not some random size.

Just to confirm this isn't a result of a recent change, can you reproduce
this on a 2.6.20 or 2.6.21 kernel? (sorry if you've already done this - I'm
juggling so many things at once it's easy to forget little things).

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [GIT PATCH] ACPI patches for 2.6.22 - part 2

2007-05-10 Thread Len Brown
On Thursday 10 May 2007 16:56, Linus Torvalds wrote:
> 
> On Thu, 10 May 2007, Linus Torvalds wrote:
> > 
> > Seems to work for me. My evo correctly started the fan, and stopped it 
> > when the temperature went down again.
> 
> Looking at things in "top", I do end up occasionally seeing spikes where 
> kacpid takes 17% of CPU time, and kacpi_notify takes a few percent too. 
> But the machine works ok, and it doesn't seem to be horrible:
> 
>64 ?S< 0:15 [kacpid]
>65 ?S< 0:08 [kacpi_notify]
> 
> so they've gotten 23 seconds of CPU time over the 37 minutes that laptop 
> has been up now. That's arguably too much, but on the other hand, I did 
> end up trying to stress it out by doing some 3D stuff while compiling the 
> kernel and doing "git grep" over the kernel tree etc.

Thanks, I noticed the same thing on an nx6325.
The goal at the moment is to revert to the simplest functional & stable
solution -- as what is shipping today crashes on some boxes.

We've got a couple of tweaks in mind where we think Linux can get
smarter -- but this is an area where several platform vendors are taking
advantage of Windows' implementation in (different) twisted ways.
For us to reach our goal of Linux handling any system out there
in as optimal a way as possible, we need to study each one in detail.

That said, can you send me or point me to the acpidump output
for your EVO.  Yes, I'm sure you've sent it before a long time
ago, but that was about probably 2,000,000 e-mail messages
and a couple of disk crashes ago:-)

thanks,
-Len


Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Yep, this patch gets rid of my spinning thread.  I can't find this patch
> or any discussion on marc.info; is there a better netdev list archive?

See the "linkwatch bustage in git-net" thread on netdev

http://thread.gmane.org/gmane.linux.network/61800/focus=61812


Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread David Miller
From: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Date: Thu, 10 May 2007 15:45:42 -0700

> David Miller wrote:
> > I'm not so certain now that we know it's the jiffies wrap point :-)
> >
> > The fixes in question are attached below and they were posted and
> > discussed on netdev:
> >   
> 
> Yep, this patch gets rid of my spinning thread.  I can't find this patch
> or any discussion on marc.info; is there a better netdev list archive?

I don't see it there either... let me check my mail archive...

Indeed, they were "posted" to netdev but were blocked by the vger
regexp filters on the keyword "urgent" so that postings never made it
to the list.  I removed that filter regexp so that never happens
again, sorry.


Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

2007-05-10 Thread Christoph Lameter
On Thu, 10 May 2007, Mel Gorman wrote:

> > I cannot predict how allocations on a slab will be performed. In order 
> > to avoid the higher order allocations we would have to add a flag 
> > that tells SLUB at slab creation time that this cache will be 
> > used for atomic allocs and thus we can avoid configuring slabs in such a 
> > way that they use higher order allocs.
> > 
> 
> It is an option. I had the gfp flags passed in to kmem_cache_create() in
> mind for determining this but SLUB creates slabs differently and different
> flags could be passed into kmem_cache_alloc() of course.

So we have a collection of flags to add

SLAB_USES_ATOMIC
SLAB_TEMPORARY
SLAB_PERSISTENT
SLAB_RECLAIMABLE
SLAB_MOVABLE

?

> Another alternative is that anti-frag used to also group high-order
> allocations together and make it hard to fallback to those areas
> for non-atomic allocations. It is currently backed out by the
> patch dont-group-high-order-atomic-allocations.patch because
> it was intended for rare high-order short-lived allocations
> such as e1000 that are currently dealt with by MIGRATE_RESERVE
> (bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
> The high-order atomic groupings may help here because the high-order
> allocations are long-lived and would claim contiguous areas.
> 
> The last alternative I think I mentioned already is to have the minimum
> order kswapd reclaims as the same order SLUB uses instead of 0 so that
> min_free_kbytes is kept at higher orders than current.

Would you get a patch to Nicholas to test either of these solutions?


Re: x86 setup rewrite tree ready for flamage^W review

2007-05-10 Thread H. Peter Anvin
Alexander van Heukelum wrote:
> On Thu, May 10, 2007 at 11:08:10AM -0700, H. Peter Anvin wrote:
>> As far as I could tell, "scan" simply caused the nonstandard video
>> driver scan modules (unsafe probes) to be invoked.  Since those modules
>> are no longer present, there appeared to be no need for them.  The VGA
>> and VESA probes are safe.
> 
> It doesn't probe the hardware in dangerous ways. (Search for mode_scan
> in video.S) It works by trying to set a mode via the normal
> AH=0/AL=mode/int 0x10 method for all possible values of mode. It then
> checks if the bios reports the new mode as being set and reads a few
> standard vga registers to determine if it is a text mode. It's
> completely independent of the CONFIG_VIDEO_SVGA stuff.

It's dangerous, all right (which is why it doesn't do it by default),
since you have no guarantee that the BIOS doesn't totally vomit on these
calls -- or, like my laptop, take about a minute before giving up
finding nothing.

Anyway, I re-implemented scanning and pushed it out to the git tree;
please try it out as it does absolutely nothing on any of my machines.

> That makes me wonder: (from arch/i386/boot/pmjump.S)
> 
>         movw    $__BOOT_DS, %cx
> 
>         movl    %cr0, %edx
>         orb     $1, %dl                 # Protected mode (PE) bit
>         movl    %edx, %cr0
> 
>         movw    %cx, %ds
>         movw    %cx, %es
>         movw    %cx, %fs
>         movw    %cx, %gs
>         movw    %cx, %ss
> 
>         # Jump to the 32-bit entrypoint
>         .byte   0x66, 0xea              # ljmpl opcode
> 2:      .long   0                       # offset
>         .word   __BOOT_CS               # segment
> 
> I thought the 32-bit jump was required to come before the segment loads.
> Does this code load values from the gdt, or are they just loaded as real
> mode segments? As long as it does not crash it does not matter, because
> head.S reloads them again.

Once CR0.PE is set, segments are loaded from the GDT.
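
To make the difference concrete, here is a small standalone sketch of how the
same 16-bit value is interpreted in each mode. The function and struct names
are hypothetical helpers for illustration and are not taken from the setup
code; the bit layouts are the standard x86 ones.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode: the segment base is simply the segment value times 16. */
static uint32_t real_mode_base(uint16_t seg)
{
    return (uint32_t)seg << 4;
}

/* Protected mode: the value is a selector, split into index, TI and RPL. */
struct selector {
    unsigned index;   /* descriptor slot within the table */
    unsigned ti;      /* 0 = GDT, 1 = LDT */
    unsigned rpl;     /* requested privilege level */
};

static struct selector decode_selector(uint16_t sel)
{
    struct selector s;

    s.index = (unsigned)sel >> 3;
    s.ti = ((unsigned)sel >> 2) & 1u;
    s.rpl = (unsigned)sel & 3u;
    return s;
}

/* The base is scattered across the 8-byte descriptor: bits 16-39 and 56-63. */
static uint32_t descriptor_base(uint64_t desc)
{
    return (uint32_t)(((desc >> 16) & 0xFFFFFFu) |
                      (((desc >> 56) & 0xFFu) << 24));
}
```

For example, a selector value of 0x10 decodes to GDT slot 2 with RPL 0, and a
flat 4 GB code descriptor such as 0x00CF9A000000FFFF yields a base of 0.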

-hpa


Re: getcpu after sched_setaffinity

2007-05-10 Thread Andi Kleen
On Thu, May 10, 2007 at 03:24:58PM -0700, Ulrich Drepper wrote:
> The attached test program fails on a dual core (and probably SMP) 
> machine on x86-64.  Depending on where the thread starts, in one of the 
> iterations the sched_setaffinity() call succeeds but then sched_getcpu() 
> fails to report the correct CPU.
> 
> In set_cpus_allowed migrate_task() is called if the new CPU set does not 
> include the current CPU.  I hope that migrate_task() also works for 
> p==current.
> 
> This leaves the x86-64 vgetcpu() implementation as the weak point.  Is 
> the caching causing problems?  

Probably. In principle getcpu() (where does the sched_ come from btw?)
is only designed for the case where you don't set the affinity explicitly;
otherwise you should already know where you are and don't need it.

The cache is optimized for the case when you run without affinity
and change CPUs only rarely (which is normal) so it is kept valid for a 
jiffy. And you always need to handle an outdated result from getcpu
anyways because you can't disable preemption from user space and could
switch any time.
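
The caching behaviour described here can be sketched as a userspace toy
model. The struct layout, the toy_ names and the two global stand-ins are
assumptions made for the example and are not the kernel's getcpu cache ABI.
The sketch shows why a result can stay stale for the rest of a jiffy after a
migration, and how zeroing the cache forces a fresh lookup.

```c
#include <assert.h>
#include <string.h>

/* Toy model; field layout and names are assumptions, not the kernel ABI. */
struct toy_getcpu_cache {
    unsigned long stamp;   /* jiffy at which 'cpu' was sampled */
    unsigned long cpu;     /* cached CPU number */
};

static unsigned long toy_jiffies = 1;      /* stand-in for the tick counter */
static unsigned long toy_current_cpu = 0;  /* stand-in for where the thread runs */

static unsigned long toy_getcpu(struct toy_getcpu_cache *cache)
{
    /* Fast path: reuse the cached answer for the rest of this jiffy. */
    if (cache && cache->stamp == toy_jiffies)
        return cache->cpu;

    /* Slow path: sample the current CPU and remember when it was sampled. */
    if (cache) {
        cache->stamp = toy_jiffies;
        cache->cpu = toy_current_cpu;
    }
    return toy_current_cpu;
}

/* What a setaffinity wrapper could do after a successful call. */
static void toy_invalidate(struct toy_getcpu_cache *cache)
{
    assert(cache);
    memset(cache, 0, sizeof(*cache));
}
```

Note that in this model a zeroed stamp only counts as invalid while the tick
counter is not itself 0, so the sketch starts the counter at 1.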

In short your test case has a broken design.

> is reset?

The vsyscall/kernel can't reset the cache because it is managed by the
application.

Hmm ok one could probably define memset(..., 0) as an invalidation
interface, but because of the considerations above I don't think
it is really needed.

-Andi


Re: [PATCH] UDF: check for allocated memory for inode data

2007-05-10 Thread Andrew Morton
On Thu, 10 May 2007 18:00:00 +0400
Cyrill Gorcunov <[EMAIL PROTECTED]> wrote:

> This patch adds checking for granted memory while
> filling up inode data to prevent possible NULL
> pointer usage. If there is not enough memory to
> fill inode data we just mark it as "bad".
> 
> Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>
> 
> Please check the patch, maybe just marking inode as
> "bad" is not a good solution.
> 

yes, make_bad_inode() is appropriate here.

> 
> diff --git a/fs/udf/inode.c b/fs/udf/inode.c
> index c846155..91cddae 100644
> --- a/fs/udf/inode.c
> +++ b/fs/udf/inode.c
> @@ -1144,6 +1144,13 @@ static void udf_fill_inode(struct inode *inode, struct buffer_head *bh)
>   UDF_I_EFE(inode) = 1;
>   UDF_I_USE(inode) = 0;
>   UDF_I_DATA(inode) = kmalloc(inode->i_sb->s_blocksize - sizeof(struct extendedFileEntry), GFP_KERNEL);
> + if (!UDF_I_DATA(inode))
> + {
> +         printk(KERN_ERR "udf: udf_fill_inode(ino %ld) no free memory\n",
> +                inode->i_ino);
> +         make_bad_inode(inode);
> +         return;
> + }

But please let's not add three copies of identical code.  Do something like:

static int udf_check_inode(struct inode *inode)
{
        if (!UDF_I_DATA(inode)) {
                printk(KERN_ERR "udf: udf_fill_inode(ino %ld) no free memory\n",
                        inode->i_ino);
                make_bad_inode(inode);
                return -1;
        }
        return 0;
}


        if (udf_check_inode(inode))
                return;

In fact you can also do the kmalloc in that helper function too:

static int udf_alloc_i_data(struct inode *inode, size_t size)
{
        UDF_I_DATA(inode) = kmalloc(...);
        ...
}
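
One way the combined helper could look, written as a userspace sketch so it
can be compiled and run on its own. Everything prefixed toy_ is a
hypothetical stand-in (struct toy_inode for struct inode, the i_data field
for UDF_I_DATA(), the is_bad flag for make_bad_inode()), and the allocator is
passed in as a function pointer only so the failure path can be exercised;
the real helper would call kmalloc() directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins; the real struct inode and UDF_I_DATA() differ. */
struct toy_inode {
    unsigned long i_ino;
    void *i_data;     /* plays the role of UDF_I_DATA(inode) */
    int is_bad;       /* plays the role of make_bad_inode() state */
};

typedef void *(*toy_alloc_fn)(size_t size);

/* Allocate the per-inode data in one place; mark the inode bad on failure. */
static int toy_alloc_i_data(struct toy_inode *inode, size_t size,
                            toy_alloc_fn alloc)
{
    assert(inode && alloc);
    inode->i_data = alloc(size);
    if (!inode->i_data) {
        fprintf(stderr, "udf: toy_alloc_i_data(ino %lu) no free memory\n",
                inode->i_ino);
        inode->is_bad = 1;
        return -1;
    }
    return 0;
}

/* Allocator that always fails, to exercise the error path. */
static void *toy_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

Each of the three call sites would then shrink to a single call followed by
an early return when the helper reports failure.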


Re: [1/2] [NET] link_watch: Move link watch list into net_device

2007-05-10 Thread Jeremy Fitzhardinge
David Miller wrote:
> I'm not so certain now that we know it's the jiffies wrap point :-)
>
> The fixes in question are attached below and they were posted and
> discussed on netdev:
>   

Yep, this patch gets rid of my spinning thread.  I can't find this patch
or any discussion on marc.info; is there a better netdev list archive?

J


Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

2007-05-10 Thread Mel Gorman
On (10/05/07 15:27), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
> 
> > On (10/05/07 15:11), Christoph Lameter didst pronounce:
> > > On Thu, 10 May 2007, Mel Gorman wrote:
> > > 
> > > > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > > > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > > > it would explain why this situation could still occur even though high-order
> > > > allocations that could sleep would succeed.
> > > 
> > > SLUB is following the gfp mask of the caller like all well behaved slab 
> > > allocators do. If the caller does not set __GFP_WAIT then the page 
> > > allocator also cannot wait.
> > 
> > Then SLUB should not use the higher orders for slab allocations that cannot
> > sleep during allocations. What could be done in the longer term is decide
> > how to tell kswapd to keep pages free at an order other than 0 when it is
> > known there are a large number of high-order long-lived allocations like this.
> 
> I cannot predict how allocations on a slab will be performed. In order 
> to avoid the higher order allocations we would have to add a flag 
> that tells SLUB at slab creation time that this cache will be 
> used for atomic allocs and thus we can avoid configuring slabs in such a 
> way that they use higher order allocs.
> 

It is an option. I had the gfp flags passed in to kmem_cache_create() in
mind for determining this but SLUB creates slabs differently and different
flags could be passed into kmem_cache_alloc() of course.

> The other solution is not to use higher order allocations by dropping the 
> antifrag patches in mm that allow SLUB to use higher order allocations. 
> But then there would be no higher order allocations at all that would
> use the benefits of antifrag measures.

That would be an immediate solution.

Another alternative is that anti-frag used to also group high-order
allocations together and make it hard to fallback to those areas
for non-atomic allocations. It is currently backed out by the
patch dont-group-high-order-atomic-allocations.patch because
it was intended for rare high-order short-lived allocations
such as e1000 that are currently dealt with by MIGRATE_RESERVE
(bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
The high-order atomic groupings may help here because the high-order
allocations are long-lived and would claim contiguous areas.

The last alternative I think I mentioned already is to have the minimum
order kswapd reclaims as the same order SLUB uses instead of 0 so that
min_free_kbytes is kept at higher orders than current.

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-10 Thread David Chinner
On Thu, May 10, 2007 at 05:26:20PM +0530, Amit K. Arora wrote:
> On Thu, May 10, 2007 at 10:59:26AM +1000, David Chinner wrote:
> > On Wed, May 09, 2007 at 09:31:02PM +0530, Amit K. Arora wrote:
> > > I have the updated patches ready which take care of Andrew's comments.
> > > Will run some tests and post them soon.
> > >
> > > But, before submitting these patches, I think it will be better to
> > > finalize on certain things which might be worth some discussion here:
> > >
> > > 1) Should the file size change when preallocation is done beyond EOF ?
> > > - Andreas and Chris Wedgwood are in favor of not changing the file size
> > > in this case. I also tend to agree with them. Does anyone have an
> > > argument in favor of changing the filesize ?  If not, I will remove the
> > > code which changes the filesize, before I resubmit the concerned ext4
> > > patch.
> > 
> > I think there needs to be both. If we don't have a mechanism to atomically
> > change the file size with the preallocation, then applications that use
> > stat() to work out if they need to preallocate more space will end up
> > racing.
> 
> By "both" above, do you mean we should give user the flexibility if it wants
> the filesize changed or not ? It can be done by having *two* modes for
> preallocation in the system call - say FA_PREALLOCATE and FA_ALLOCATE. If we
> use FA_PREALLOCATE mode, fallocate() will allocate blocks, but will not
> change the filesize and [cm]time. If FA_ALLOCATE mode is used, fallocate()
> will change the filesize if required (i.e.  when allocation is beyond EOF)
> and also update [cm]time.  This way, the application can decide what it
> wants.

Yes, that's right.

> This will be helpful for the partial allocation scenario also. Think of the
> case when we do not change the filesize in fallocate() and expect
> applications/posix_fallocate() to do ftruncate() after fallocate() for this.
> Now if fallocate() results in a partial allocation with -ENOSPC error
> returned, applications/posix_fallocate() will not know for what length
> ftruncate() has to be called.  :(

Well, posix_fallocate() either gets all the space or it fails. If
you truncate to extend the file size after an ENOSPC, then that is
a buggy implementation.

The same could be said for any application, or even the fallocate()
call itself, if it changes the filesize without having completely
preallocated the space asked for.
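
To make the two behaviours concrete, here is a minimal user-space sketch of
the semantics under discussion. It is a toy model, not the proposed kernel
code: struct model_file, model_fallocate() and the numeric values given to
FA_ALLOCATE and FA_PREALLOCATE are made up for illustration, and reserved
space is modelled as a single contiguous run starting at offset 0. What it
tries to show is that a request which cannot be satisfied in full leaves
both the reservation and the file size untouched, and that only
FA_ALLOCATE ever moves the size.

```c
#include <errno.h>

/* Mode names come from the thread; the values are invented for this sketch. */
enum { FA_ALLOCATE = 1, FA_PREALLOCATE = 2 };

/* Toy file: all fields are hypothetical, chosen only to show the semantics. */
struct model_file {
	long size;        /* what stat() would report as st_size */
	long reserved;    /* bytes [0, reserved) have space reserved */
	long free_bytes;  /* space left on the model filesystem */
};

/* Reserve [offset, offset + len) in full, or fail without side effects. */
static int model_fallocate(struct model_file *f, int mode, long offset, long len)
{
	long end, need;

	if (mode != FA_ALLOCATE && mode != FA_PREALLOCATE)
		return -EINVAL;
	if (offset < 0 || len <= 0)
		return -EINVAL;

	end = offset + len;
	need = end > f->reserved ? end - f->reserved : 0;

	/* All or nothing: no partial reservation, no size change on failure. */
	if (need > f->free_bytes)
		return -ENOSPC;

	f->free_bytes -= need;
	f->reserved += need;

	/* Only FA_ALLOCATE extends the visible file size. */
	if (mode == FA_ALLOCATE && end > f->size)
		f->size = end;
	return 0;
}
```

Starting from a 100-byte file with 50 bytes free, an FA_PREALLOCATE of 40
bytes at offset 100 reserves the space and leaves the size at 100. A
following FA_ALLOCATE that needs more than the remaining 10 bytes returns
-ENOSPC and changes nothing, so the caller never has to guess a length for
ftruncate().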

> Hence it may be a good idea to give user the flexibility if it wants to
> atomically change the file size with preallocation or not. But, with more
> flexibility there comes inconsistency in behavior, which is worth
> considering.

We've got different modes to specify different behaviour. That's
what the mode field was put there for in the first place - the
interface is *designed* to support different preallocation
behaviours.

> > > 2) For FA_UNALLOCATE mode, should the file system allow unallocation of
> > > normal (non-preallocated) blocks (blocks allocated via regular
> > > write/truncate operations) also (i.e. work as punch()) ?
> >
> > Yes. That is the current XFS implementation for XFS_IOC_UNRESVSP, and what
> > I did for FA_UNALLOCATE as well.
> 
> Ok. But, some people may not expect/like this. I think, we can keep it on
> the backburner for a while, till other issues are sorted out.

How can it be a "backburner" issue when it defines the
implementation?  I've already implemented something in XFS that
does roughly what I think the interface is supposed to do, but
I need that interface to be nailed down before proceeding any
further.

All I'm really interested in right now is that the fallocate
_interface_ can be used as a *complete replacement* for the
pre-existing XFS-specific ioctls that are already used by
applications.  What ext4 can or can't do right now is irrelevant to
this discussion - the interface definition needs to take priority
over implementation.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH 0/30] Use menuconfig objects

2007-05-10 Thread Andrew Morton
On Tue, 10 Apr 2007 21:17:40 +0200 (MEST)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:

> the following patch series turns some menus into menuconfigs, so they
> can be disabled whilst "walking" through the parent menu (check the
> videos [1], [2] to see what I mean), making it possible to disable lots of
> options _quickly_.

Well Martin's little tromp through the Kconfig menus meant that I had to
repair pretty much every one of these patches.  Could you please have a
look at http://userweb.kernel.org/~akpm/menuconfig/, see if I screwed
anything up?



[PATCH] locks: fix F_GETLK regression (failure to find conflicts)

2007-05-10 Thread J. Bruce Fields
In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock
to modify its single file_lock argument instead of taking separate input
and output arguments.  This makes it no longer safe to set the output
lock's fl_type to F_UNLCK before looking for a conflict, since that
means searching for a conflict against a lock with type F_UNLCK.

This fixes a regression which causes F_GETLK to incorrectly report no
conflict on most filesystems (including any filesystem that doesn't do
its own locking).

Also fix posix_lock_to_flock() to copy the lock type.  This isn't
strictly necessary, since the caller already does this; but it seems
less likely to cause confusion in the future.

Thanks to Doug Chapman for the bug report.

Signed-off-by: "J. Bruce Fields" <[EMAIL PROTECTED]>
---
 fs/locks.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 671a034..8ec16ab 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -669,7 +669,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 {
 	struct file_lock *cfl;
 
-	fl->fl_type = F_UNLCK;
 	lock_kernel();
 	for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
 		if (!IS_POSIX(cfl))
@@ -681,7 +680,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 		__locks_copy_lock(fl, cfl);
 		unlock_kernel();
 		return 1;
-	}
+	} else
+		fl->fl_type = F_UNLCK;
 	unlock_kernel();
 	return 0;
 }
@@ -1632,6 +1632,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
 	flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
 		fl->fl_end - fl->fl_start + 1;
 	flock->l_whence = 0;
+	flock->l_type = fl->fl_type;
 	return 0;
 }
 
-- 
1.5.1.1.107.g7a159
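
For readers who want to see the ordering problem in isolation, the
following is a simplified user-space model of it, not the kernel code. All
names (struct model_lock, model_conflict(), the MODEL_* constants) are
hypothetical, and the conflict rule is reduced to "different owners,
overlapping ranges, at least one write lock". Under those assumptions it
shows how overwriting the request's type before the search can hide a
conflict, and how deferring the overwrite until no conflict was found
avoids that.

```c
#include <stddef.h>

/* Lock types for this model only; these are not the <fcntl.h> constants. */
enum model_type { MODEL_RDLCK, MODEL_WRLCK, MODEL_UNLCK };

struct model_lock {
	enum model_type type;
	int owner;                /* stand-in for the lock owner */
	long start, end;          /* inclusive byte range */
	struct model_lock *next;  /* next lock held on the same file */
};

/* Model rule: different owners, overlapping ranges, one side writes. */
static int model_conflict(const struct model_lock *held,
			  const struct model_lock *req)
{
	if (held->owner == req->owner)
		return 0;
	if (held->end < req->start || req->end < held->start)
		return 0;
	return held->type == MODEL_WRLCK || req->type == MODEL_WRLCK;
}

/* Report the conflicting lock back through the caller's structure. */
static void model_copy(struct model_lock *dst, const struct model_lock *src)
{
	dst->type = src->type;
	dst->owner = src->owner;
	dst->start = src->start;
	dst->end = src->end;
}

/* Old ordering: the request type is overwritten before the search. */
static int model_test_lock_buggy(struct model_lock *held, struct model_lock *fl)
{
	struct model_lock *cfl;

	fl->type = MODEL_UNLCK;   /* the request is no longer a write lock */
	for (cfl = held; cfl; cfl = cfl->next)
		if (model_conflict(cfl, fl))
			break;
	if (cfl) {
		model_copy(fl, cfl);
		return 1;
	}
	return 0;
}

/* Patched ordering: mark the request unlocked only if nothing conflicts. */
static int model_test_lock_fixed(struct model_lock *held, struct model_lock *fl)
{
	struct model_lock *cfl;

	for (cfl = held; cfl; cfl = cfl->next)
		if (model_conflict(cfl, fl))
			break;
	if (cfl) {
		model_copy(fl, cfl);
		return 1;
	}
	fl->type = MODEL_UNLCK;
	return 0;
}
```

In this model, with one read lock held by another owner and a write-lock
query over an overlapping range, the first function reports no conflict
while the second returns 1 and hands back the held read lock.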



Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

2007-05-10 Thread Mel Gorman
On (10/05/07 14:49), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Andrew Morton wrote:
>
> > Christoph, can we please take a look at /proc/slabinfo and its slub
> > equivalent (I forget what that is?) and review any and all changes to the
> > underlying allocation size for each cache?
> >
> > Because this is *not* something we should change lightly.
>
> It was changed specially for mm in order to stress the antifrag code. If
> this causes trouble then do not merge the patches against SLUB that
> exploit the antifrag methods. This failure should help see how effective
> Mel's antifrag patches are. He needs to get on this discussion.
> 

The antifrag mechanism depends on the caller being able to sleep and reclaim
pages if necessary to get the contiguous allocation. No attempt is
currently made to keep pages free at a particular order.

I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
right? Does that mean that SLUB is trying to allocate pages atomically? If so,
it would explain why this situation could still occur even though high-order
allocations that could sleep would succeed.

> Upstream has slub_max_order=1.

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab