date:20080219

Re: pci/pcie/aer/aerdrv_acpi.c: inconsequent NULL checking

2008-02-19 Thread Adrian Bunk

On Tue, Feb 19, 2008 at 09:47:58PM -0800, Greg KH wrote:
> On Tue, Feb 19, 2008 at 09:29:02PM +0200, Adrian Bunk wrote:
> > The Coverity checker spotted the following inconsequent NULL checking 
> > introduced by commit 3c75e23784e6ed5f4841de43d0750fd9b37bafcb:
> > 
> > <--  snip  -->
> > 
> > ...
> > int aer_osc_setup(struct pcie_device *pciedev)
> > {
> > ...
> > 	while (pdev->bus && pdev->bus->self)
> > 		pdev = pdev->bus->self;
> 
> That could probably change to just pdev->bus->self, as a bus should
> always be there for a pdev, so I don't see this as a problem.

I'm not claiming this specific case was a problem.

When a NULL check is performed only in some places, that is sometimes a bug
that has to be fixed, but in most cases it is an unneeded check that should
be removed at some point in time.
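A minimal userspace sketch of the two consistent alternatives described above. It is not the aerdrv_acpi.c code (the rest of aer_osc_setup() is elided in the report). `struct dev` and `struct bus` are hypothetical stand-ins for `struct pci_dev` and `struct pci_bus`, reduced to the two fields the quoted loop touches, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dev / struct pci_bus, reduced to
 * the two fields used by the loop quoted from aer_osc_setup(). */
struct bus;

struct dev {
	struct bus *bus;	/* bus the device sits on */
};

struct bus {
	struct dev *self;	/* bridge leading to this bus; NULL for a root bus */
};

/* Option 1: rely on the invariant that every device has a bus.
 * No NULL test on pdev->bus; the asserts document the assumption
 * instead of silently tolerating a violation in some places only. */
struct dev *root_bus_device(struct dev *pdev)
{
	assert(pdev->bus != NULL);
	while (pdev->bus->self) {
		pdev = pdev->bus->self;
		assert(pdev->bus != NULL);
	}
	return pdev;
}

/* Option 2: treat a missing bus as possible and check it before every
 * dereference, reporting the failure to the caller as NULL. */
struct dev *root_bus_device_checked(struct dev *pdev)
{
	if (!pdev->bus)
		return NULL;
	while (pdev->bus->self) {
		pdev = pdev->bus->self;
		if (!pdev->bus)
			return NULL;
	}
	return pdev;
}
```

Which variant is right depends on whether a device without a bus can really occur. If, as Greg says, it cannot, the first form states that assumption once instead of half-checking it.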

> thanks,
> 
> greg k-h

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix dma_poor_create

2008-02-19 Thread Yinghai Lu

On Tuesday 19 February 2008 10:52:30 pm Ingo Molnar wrote:
> 
> * Yinghai Lu <[EMAIL PROTECTED]> wrote:
> 
> > dev_to_node() could return a node without RAM, so check it before using
> > it in kmalloc_node().
> 
> > -   retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
> > +   node = dev_to_node(dev);
> > +   if (node == -1 || !node_online(node))
> > +   node = numa_node_id();
> > +
> > +   retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, node);
> 
> so this is about not crashing during bootup on nodes that have CPUs but 
> which have no node-specific memory attached, right?
> 
> Shouldn't kmalloc_node() be made more robust instead? I.e. push the same 
> code into kmalloc_node() - and make sure it will allocate _something_? 
> That would probably also fix a similar bug in net/core/skbuff.c's 
> __netdev_alloc_skb(), which too passes a dev_to_node() result to an 
> allocator.

Sounds like a good idea to update dev_to_node() to make sure it returns either
-1 or an online node.

Will send updated one.

YH
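The fallback in the patch, and the idea of making the lookup itself return either -1 or an online node, can be modelled in a few lines. This is a userspace sketch only: `node_is_online[]`, `MAX_NODES`, `sanitize_node()` and `pick_alloc_node()` are hypothetical stand-ins for `node_online()`, `MAX_NUMNODES`, a fixed-up `dev_to_node()` and the check in the patch, and the caller's local node plays the role of `numa_node_id()`.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical; plays the role of MAX_NUMNODES */

/* Hypothetical stand-in for node_online(): filled in by the caller. */
bool node_is_online[MAX_NODES];

/* What a fixed-up dev_to_node() could guarantee: either -1
 * ("no preference") or a node that is actually online. */
int sanitize_node(int node)
{
	if (node < 0 || node >= MAX_NODES || !node_is_online[node])
		return -1;
	return node;
}

/* Mirrors the check added by the patch: fall back to the local node
 * when the device's node is unusable. */
int pick_alloc_node(int dev_node, int local_node)
{
	int node = sanitize_node(dev_node);

	/* The local node is expected to be online itself. */
	assert(sanitize_node(local_node) == local_node);
	return node == -1 ? local_node : node;
}
```

Putting the check in one place (the lookup or the allocator, as suggested in the reply) means every caller, including the skbuff path mentioned above, gets the same behaviour.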

Re: linux-next: Tree for Feb 20

2008-02-19 Thread Frank Seidel

Stephen Rothwell wrote:
> That would work.  Chris has the right idea, though.  Just set up
> linux-next as a remote on any existing clone of Linus' tree and the
> "fetch" will forcibly update the linux-next/master branch (remember to
> not have that branch checked out when you fetch).
> 
> If you keep a continuing git tree for this, you will have the history of
> all the next trees because I tag each one.

Thanks for that hint. Added it to the FAQ on the Wiki
(http://linux.f-seidel.de/linux-next/pmwiki/).

Frank

Re: 2.6.25-rc1 xen pvops regression

2008-02-19 Thread H. Peter Anvin


Ian Campbell wrote:
> On Mon, 2008-02-18 at 02:40 -0800, Joel Becker wrote:
> > On Sun, Feb 17, 2008 at 06:49:21PM +0000, Ian Campbell wrote:
> > > x86/xen: Do not scan for DMI unless the DMI region is reserved by e820.
> >
> > This fixed it.  I'm now booting successfully.  Thank you!
>
> Excellent. Jeremy, are you happy for this to go in?



NAK!

It's pretty standard for 0xf0000...0x100000 to be marked RESERVED in 
E820 on real hardware (including the system I'm typing on right now). 
It is so marked to indicate that hardware cannot be mapped into that 
space.  However, you can't rely on this fact -- heck, you can't rely on 
E820 even existing on a real machine.  I have specimens of real-life 
machines that go both ways.


This patch WILL break real hardware.

What's particularly damning is that it's titled "x86/xen: Do not scan 
for DMI unless the DMI region is reserved by e820." whereas in fact it 
changes (breaks) generic code.


-hpa
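For readers unfamiliar with the check in the patch: e820_all_mapped(start, end, type) asks whether every byte of [start, end) is covered by E820 entries of the given type. The following is a minimal userspace model of that question, assuming the map is sorted by address and non-overlapping; `struct map_entry`, `range_all_of_type()`, `MAP_RAM` and `MAP_RESERVED` are hypothetical names, not kernel API. It also shows the objection above: a map that does not mark the DMI window reserved, or no map at all, yields "no", so the patched dmi_scan_machine() would skip DMI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the numeric values match E820_RAM (1) and
 * E820_RESERVED (2). */
enum { MAP_RAM = 1, MAP_RESERVED = 2 };

struct map_entry {
	uint64_t addr;
	uint64_t size;
	int type;
};

/* True if every byte of [start, end) lies in entries of `type`.
 * Assumes entries are sorted by address and do not overlap. */
bool range_all_of_type(const struct map_entry *map, size_t n,
		       uint64_t start, uint64_t end, int type)
{
	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		uint64_t e_start = map[i].addr;
		uint64_t e_end = map[i].addr + map[i].size;

		/* Skip entries of another type and entries that do not
		 * cover the current start; coverage must chain from
		 * start to end through matching entries. */
		if (map[i].type != type || e_start > start || e_end <= start)
			continue;
		start = e_end;
		if (start >= end)
			return true;
	}
	return false;
}
```

With `n == 0` (no E820 map at all) the loop never runs and the function answers false, which is exactly the case where a generic check like this skips DMI on real hardware.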


From 23e4ec12b95064320f83fca1cc1ad5c7b2eb3386 Mon Sep 17 00:00:00 2001

From: Ian Campbell <[EMAIL PROTECTED]>
Date: Tue, 19 Feb 2008 21:57:45 +0000
Subject: [PATCH] x86/xen: Do not scan for DMI unless the DMI region is reserved by e820.

Under Xen the memory at 0xf0000 is regular RAM and so can potentially contain a
page table, and hence cannot be mapped. The e820 map given to the guest reflects
this.

Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
---
 drivers/firmware/dmi_scan.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 653265a..7d29403 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static char dmi_empty_string[] = "";
 
@@ -371,6 +372,9 @@ void __init dmi_scan_machine(void)

}
}
else {
+   if (!e820_all_mapped(0xF0000, 0xF0000+0x10000, E820_RESERVED))
+   goto out;
+
/*
 * no iounmap() for that ioremap(); it would be a no-op, but
 * it's so early in setup that sucker gets confused into doing


Re: Huawei E220 and usb storage

2008-02-19 Thread Norbert Preining

On Thu, 14 Feb 2008, Pete Zaitcev wrote:
> that you did, after taking care of detection and initialization.
> Look at his dmesg in comment #44 in this:

Yes, that looks very similar.

> > - changing the penultimate argument in the usb_stor_huawei_e220_init
> >   function from 0x1 to 0 stopped this misbehaviour, but
> > 
> > - with the change from 0x1 to 0 the initialization works automatically.
> 
> I may be able to test this.

I test it regularly, last with 2.6.25-rc1, and it works, always. Maybe
it is not the optimal solution, but who knows.

> As you recall, Huawei people themselves suggested nonzero length,

Umpf, ahh, very interesting.

Best wishes

Norbert

---
Dr. Norbert Preining <[EMAIL PROTECTED]>Vienna University of Technology
Debian Developer <[EMAIL PROTECTED]> Debian TeX Group
gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
---
Serious error.
All shortcuts have disappeared.
Screen.
Mind.
Both are blank.
   --- Windows Error Haiku

2.6.25-rc2: ohci1394 problem

2008-02-19 Thread Thomas Meyer


Hi.

With 2.6.25-rc2 my kernel log consists mainly of:

"ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
ohci1394: fw-host0: Unrecoverable error!
ohci1394: fw-host0: Async Rsp Tx Context died: ctrl[f0002a00] cmdptr[f0002a00]
ohci1394: fw-host0: Iso Recv 3 Context died: ctrl[d4000d0e] cmdptr[0005c806] match[]
ohci1394: fw-host0: Iso Recv 17 Context died: ctrl[7c006e38] cmdptr[f58b18cd] match[4910c683]
ohci1394: fw-host0: Iso Recv 18 Context died: ctrl[003cacf0] cmdptr[88f2eb10] match[46e8104e]
ohci1394: fw-host0: Iso Recv 19 Context died: ctrl[0c047e80] cmdptr[83060246] match[83060846]
ohci1394: fw-host0: Iso Recv 26 Context died: ctrl[00656c62] cmdptr[6e696461] match[706f2067]
ohci1394: fw-host0: Iso Recv 27 Context died: ctrl[4d006d65] cmdptr[61726570] match[676e6974]

ohci1394: fw-host0: physical posted write error
ohci1394: fw-host0: respTxComplete: dma prg stopped
ohci1394: fw-host0: SelfID received outside of bus reset sequence
ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
ohci1394: fw-host0: Unrecoverable error!
ohci1394: fw-host0: Async Rsp Tx Context died: ctrl[f0002a00] cmdptr[f0002a00]
ohci1394: fw-host0: Iso Recv 3 Context died: ctrl[d4000d0e] cmdptr[0005c806] match[]
ohci1394: fw-host0: Iso Recv 17 Context died: ctrl[7c006e38] cmdptr[f58b18cd] match[4910c683]
ohci1394: fw-host0: Iso Recv 18 Context died: ctrl[003cacf0] cmdptr[88f2eb10] match[46e8104e]
ohci1394: fw-host0: Iso Recv 19 Context died: ctrl[0c047e80] cmdptr[83060246] match[83060846]
ohci1394: fw-host0: Iso Recv 26 Context died: ctrl[00656c62] cmdptr[6e696461] match[706f2067]
ohci1394: fw-host0: Iso Recv 27 Context died: ctrl[4d006d65] cmdptr[61726570] match[676e6974]

ohci1394: fw-host0: physical posted write error
ohci1394: fw-host0: respTxComplete: dma prg stopped
ohci1394: fw-host0: SelfID received outside of bus reset sequence
ohci1394: fw-host0: Unhandled interrupt(s) 0xfc7cfe0c
ohci1394: fw-host0: Unrecoverable error!
"

Strange, isn't it?

regards
thomas


Re: linux-next: Tree for Feb 20

2008-02-19 Thread Jeff Garzik


Greg KH wrote:
> On Wed, Feb 20, 2008 at 04:34:57PM +1100, Stephen Rothwell wrote:
> > Hi all,
> >
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git.
> >
> > You can see which trees have been included by looking in the Next/Trees
> > file in the source.  There are also quilt-import.log and merge.log files
> > in the Next directory.  Between each merge, the tree was built with
> > allmodconfig for both powerpc and x86_64.
>
> What's the best way to constantly follow this tree?  I had cloned it a
> while ago, but now if I 'git pull' it wants to merge things, which isn't
> right.
>
> I'm guessing that this is constantly being rebased?  Against what,
> Linus's tree?  So we should be able to clone Linus's tree, and then pull
> in -next?
>
> Or am I totally missing something here?


You can use 'git fetch -f' to override your local tree with the remote 
contents.


I'm pretty sure there's a better way to do it, but I don't know it...

Jeff




Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Eric Dumazet


Zhang, Yanmin wrote:

On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:

Zhang, Yanmin wrote:
On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 

On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:


I also think __refcnt is the key. I did a new test by adding 2 unsigned long
paddings before lastuse, so the 3 members are moved to the next cache line. The
performance is recovered.

How about below patch? Almost all performance is recovered with the new patch.

Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>

Could you add a comment someplace that says "refcnt wants to be on a different
cache line from input/output/ops or performance tanks badly", to warn some
future kernel hacker who starts adding new fields to the structure?

Ok. Below is the new patch.

1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y, so that
sizeof(dst_entry)=200 no matter whether CONFIG_NET_CLS_ROUTE=y/n. I tested many
patches on my 16-core tigerton by moving tclassid to different places. It looks
like tclassid could also have an impact on performance. If I move tclassid
before metrics, or just don't move it, the performance isn't good. So I moved it
after metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.

Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800
@@ -52,15 +52,10 @@ struct dst_entry
    unsigned short  header_len; /* more space at head required */
    unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32             metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
    unsigned int    rate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
+   struct dst_entry*path;
 
    struct neighbour    *neighbour;
    struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
    int (*output)(struct sk_buff*);
 
    struct  dst_ops *ops;
-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+    * __refcnt wants to be on a different cache line from
+    * input/output/ops or performance tanks badly
+    */
    atomic_t        __refcnt;   /* client references */
    int             __use;
+   unsigned long   lastuse;
    union {
        struct dst_entry *next;
        struct rtable    *rt_next;




I prefer this patch, but unfortunately your perf numbers are for 64-bit kernels.

Could you please test now with a 32-bit one?

I tested it with 32-bit 2.6.25-rc1 on an 8-core stoakley. There is almost no
difference between the pure kernel and the patched kernel.

New update: on the 8-core stoakley, the regression becomes 2~3% with kernel
2.6.25-rc2. On tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa
(8 cores + hyperthreading), the regression is still 4% with 2.6.25-rc2.

With my patch, on tigerton, almost all of the regression disappears. On tulsa,
only about 2% of the regression disappears.

So this issue is triggered with multiple CPUs. Perhaps the process scheduler is
another factor causing the issue to happen, but it's very hard to change the
scheduler.



Thanks very much Yanmin, I think we can apply your patch as is, if no 
regression was found for 32bits.




Eric,

I tested your new patch in function loopback_xmit. It gives no improvement, but
it doesn't introduce new issues either. As you tested it on a dual-core machine
and got an improvement, how about merging your patch with mine?


No, thank you, that was an experiment and is not related to your findings on
dst_entry.


I am currently working on a 'distributed refcount' infrastructure, to be able
to spread the high pressure we currently have on some refcnts (struct
dst_entry, struct net_device, and many more refcnts ...) over several nodes
(for NUMA machines) or several cache lines (normal SMP machines).


Instead of NR_CPUS allocations, the goal is to be able to restrict the number
of 32-bit entities used to store one refcnt to a small value like 4, 8 or 16,
even if NR_CPUS=4096 or so.


atomic_inc(>refcnt) ->  distref_inc(>refcnt)

distref_inc(struct distref *p)
{
atomic_inc(myptr[p->offset]);
}
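The pseudo-code above only outlines the idea. Below is a hedged userspace model in C11, not the infrastructure being described: a reference count striped over a small, fixed number of slots (8 here, independent of how many CPUs exist), each slot on its own cache line. `DISTREF_SLOTS`, the 64-byte line size, and passing the CPU number explicitly are assumptions of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DISTREF_SLOTS 8		/* small fixed count, e.g. 4, 8 or 16 */
#define CACHELINE 64		/* assumed cache line size in bytes */

/* Each slot is aligned to, and therefore fills, its own cache line, so
 * CPUs updating different slots do not bounce the same line. */
struct distref_slot {
	_Alignas(CACHELINE) atomic_long count;
};

static_assert(sizeof(struct distref_slot) == CACHELINE,
	      "one slot per cache line");

struct distref {
	struct distref_slot slot[DISTREF_SLOTS];
};

void distref_init(struct distref *r, long initial)
{
	for (size_t i = 0; i < DISTREF_SLOTS; i++)
		atomic_init(&r->slot[i].count, i == 0 ? initial : 0);
}

/* `cpu` is supplied by the caller; many CPUs map onto one slot. */
void distref_inc(struct distref *r, unsigned int cpu)
{
	atomic_fetch_add(&r->slot[cpu % DISTREF_SLOTS].count, 1);
}

void distref_dec(struct distref *r, unsigned int cpu)
{
	atomic_fetch_sub(&r->slot[cpu % DISTREF_SLOTS].count, 1);
}

/* Sum of all slots. Without further synchronisation this is only a
 * snapshot while other CPUs keep updating the counter. */
long distref_read(struct distref *r)
{
	long sum = 0;

	for (size_t i = 0; i < DISTREF_SLOTS; i++)
		sum += atomic_load(&r->slot[i].count);
	return sum;
}
```

Increments and decrements touch one slot; only readers pay for summing all of them. A single slot can go negative when a reference taken on one CPU is dropped on another, and a reliable "has it reached zero?" test needs a consistent sum, which is the hard part such an infrastructure has to solve.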


Re: Integration of SCST in the mainstream Linux kernel

2008-02-19 Thread Erez Zilber

Bart Van Assche wrote:
> On Feb 18, 2008 10:43 AM, Erez Zilber <[EMAIL PROTECTED]> wrote:
>   
>> If you use a high value for FirstBurstLength, all (or most) of your data
>> will be sent as unsolicited data-out PDUs. These PDUs don't use the RDMA
>> engine, so you miss the advantage of IB.
>> 
>
> Hello Erez,
>
> Did you notice the e-mail Roland Dreier wrote on February 6, 2008?
> This is what Roland wrote:
>   
>> I think the confusion here is caused by a slight misuse of the term
>> "RDMA".  It is true that all data is always transported over an
>> InfiniBand connection when iSER is used, but not all such transfers
>> are one-sided RDMA operations; some data can be transferred using
>> send/receive operations.
>> 
>
>   
Yes, I saw that. I tried to give an explanation with more details.

> Or: data sent during the first burst is not transferred via one-sided
> remote memory reads or writes but via two-sided send/receive
> operations. At least on my setup, these operations are as fast as
> one-sided remote memory reads or writes. As an example, I obtained the
> following numbers on my setup (SDR 4x network):
> ib_write_bw: 933 MB/s.
> ib_read_bw: 905 MB/s.
> ib_send_bw: 931 MB/s.
>
>   
According to these numbers one can think that you don't need RDMA at
all, just send iSCSI PDUs over IB. The benchmarks that you use are
synthetic IB benchmarks that are not equivalent to iSCSI over iSER. They
just send IB packets. I'm not surprised that you got more or less the
same performance because, AFAIK, ib_send_bw doesn't copy data (unlike
iSCSI that has to copy data that is sent/received without RDMA).

When you use RDMA with iSCSI (i.e. iSER), you don't need to create iSCSI
PDUs and process them. The CPU is not busy as it is with iSCSI over TCP
because no data copies are required. Another advantage is that you don't
need header/data digest because the IB HW does that.

Erez
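As a rough sanity check on the quoted figures, assuming the usual SDR signalling rate of 2.5 Gbit/s per lane with 8b/10b encoding and reading "MB" as 10^6 bytes, the payload ceiling of a 4x SDR link is about 1000 MB/s, so all three synthetic benchmarks run within a few percent of each other near the link limit:

```latex
4 \times 2.5\ \text{Gbit/s} \times \tfrac{8}{10} = 8\ \text{Gbit/s} = 1000\ \text{MB/s},
\qquad
\tfrac{933}{1000} \approx 93\%,\quad
\tfrac{905}{1000} \approx 91\%,\quad
\tfrac{931}{1000} \approx 93\%
```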

Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-19 Thread Willy Tarreau

On Tue, Feb 19, 2008 at 10:28:46AM +0100, Andi Kleen wrote:
> > Sometimes, for performance critical paths, I would like gcc to be dumb and
> > follow *my* code and not its hard-coded probabilities. 
> 
> If you really want that, simple: just disable optimization @)

already tried. It fixed some difficulties, but created new, expected issues
with data being reloaded often from memory instead of being passed along
a few registers. Don't forget that optimizing for x86 requires a lot of
smartness from the compiler because of the very small number of registers!

> > Maybe one thing we would need would be the ability to assign probabilities
> > to each branch based on what we expect, so that gcc could build a better
> > tree keeping most frequently used code tight.
> 
> Just use profile feedback then for user space. I don't think it's a good
> idea for kernel code though because it leads to unreproducible binaries
> which would wreck the development model.

I never found this to be practically usable in fact, because you have to
use it on the *exact* same source. End of game for cross-compilation. It
would be good to be able to use a few pragmas in the code to say "hey, I
want this block optimized like this". This is what I understood the
__builtin_expect() was for, except that it tends to throw unpredicted
branches too far away.

> > Hmm I've just noticed -fno-guess-branch-probability in the man, I never 
> > tried
> > it.
> 
> Or -fno-reorder-blocks

Thanks for the hint, I will try it.

Willy
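The bug class named in the subject line is easy to demonstrate. The macros below are written essentially the way likely()/unlikely() are defined in the 2.6 kernel's include/linux/compiler.h and need GCC or Clang for __builtin_expect; the helper functions are hypothetical. Because unlikely(x) reduces x to 0 or 1 with !!, writing unlikely(x) == y compares that 0/1 against y, and the branch hint no longer covers the comparison.

```c
#include <assert.h>

/* GCC/Clang only: __builtin_expect(expr, expected) returns expr and
 * tells the compiler which value to optimise the branch for. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Buggy: the hint wraps only `ret`, and !!ret collapses it to 0 or 1
 * before the comparison with `err`. */
int matches_buggy(int ret, int err)
{
	return unlikely(ret) == err;
}

/* Intended: the whole comparison is the unlikely condition. */
int matches_fixed(int ret, int err)
{
	return unlikely(ret == err);
}

void demo_unlikely_misuse(void)
{
	/* -22 (e.g. -EINVAL on Linux): !!(-22) is 1, and 1 != -22. */
	assert(matches_buggy(-22, -22) == 0);
	assert(matches_fixed(-22, -22) == 1);
	assert(matches_fixed(0, -22) == 0);
}
```

The buggy form happens to give the right answer when y is 0 or 1, which is how such mistakes go unnoticed.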


Re: 2.6.25-rc2 hangs after "Suspending console(s)"

2008-02-19 Thread Tino Keitel

On Tue, Feb 19, 2008 at 09:52:22 +0100, Tino Keitel wrote:
> On Mon, Feb 18, 2008 at 20:49:04 +0100, Pavel Machek wrote:
> > On Mon 2008-02-18 01:28:15, Tino Keitel wrote:
> > > Hi folks,
> > > 
> > > with 2.6.25-rc2, my Mac mini Core Duo hangs at suspend. The last
> > > message on the console is "Suspending console(s)". I also tried some
> > > other versions after 2.6.24, all of them fail with this hang.
> > 
> > Try adding 'no_console_suspend' to kernel command line.
> 
> Thanks, that gave a bit of insight. The last message is now:
> 
> ACPI: PCI interrupt for device 0000:00:02.0 disabled
> 
> This is my graphics card:
> 
> 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile
> 945GM/GMS, 943/940GML Express Integrated Graphics Controller
> 
> So it looks like the recent changes that should repair console
> suspend/resume for Intel graphics broke suspend for me.

It looks like I was wrong. It also hangs when the i915 module isn't
loaded.

Regards,
Tino

Re: SMP-related kernel memory leak

2008-02-19 Thread Bart Van Assche

On Feb 19, 2008 7:18 PM, Oliver Pinter <[EMAIL PROTECTED]> wrote:
> On 2/19/08, Bart Van Assche <[EMAIL PROTECTED]> wrote:
> > I noticed that the amount of memory used by the Linux kernel steadily
> > increases over time on SMP systems (x86 architecture, 32-bit kernel).
> > This problem disappears when I add maxcpus=1 to the kernel command
> > line. I have observed this behavior both on the 2.6.22.18 and 2.6.24.2
> > kernels. Did anyone notice anything similar ?
> >
> > See also: http://bugzilla.kernel.org/show_bug.cgi?id=9991
>
> this patch fixed them http://lkml.org/lkml/2008/2/18/405 ?

Thanks for the hint. If I interpreted the 2.6.24 changelog correctly
this patch is already included with 2.6.24 ? The problem still occurs
with 2.6.24.2. I am currently trying to find the minimal kernel config
which still triggers this problem. Any other hints for finding the
cause of this issue are welcome of course.

Bart Van Assche.

Re: 2.6.25-rc2-mm1 (x64 thermal build failure)

2008-02-19 Thread Thomas Petazzoni

On Tue, 19 Feb 2008 15:21:29 -0800,
Andrew Morton <[EMAIL PROTECTED]> wrote:

> ug, sorry, if I'd realised it was like this I'd have said "don't
> bother". Apart from the obvious problem, this means that people will
> keep breaking CONFIG_DMI=n all the time, because they will forget the
> ifdefs, and the number of people who test with CONFIG_DMI=n will be
> small.

Yes, #ifdef CONFIG_DMI is not very comfortable. That's why I proposed
things such as DECLARE_DMI_FIXUP_TABLE(), because it would force people
to use these macros, which would then work correctly depending on
DMI=y/n. However, there's still the issue of driver_data that I
mentioned in my earlier post.

What should I do ? Option 1 ? Option 2 ? Give up with the patch ?

Thanks for your comments,

Thomas
-- 
Thomas Petazzoni, Free Electrons
Free Embedded Linux Training Materials
on http://free-electrons.com/training
(More than 1500 pages!)
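One common way to address the concern about people breaking CONFIG_DMI=n, independent of the DECLARE_DMI_FIXUP_TABLE() proposal (whose details are in the earlier post and are not reproduced here), is the stub idiom: the header provides a real function when DMI is configured and a static inline stub otherwise, so call sites and their tables compile, and are type-checked, in both configurations without any #ifdef. Everything below is hypothetical (`MY_CONFIG_DMI`, `struct my_dmi_id`, `my_dmi_check()`, `my_board_name`); the matching loop is only loosely modelled on dmi_check_system().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down DMI match entry. */
struct my_dmi_id {
	const char *ident;			/* board name to match */
	int (*callback)(const struct my_dmi_id *);
	void *driver_data;			/* per-quirk data for the callback */
};

#ifdef MY_CONFIG_DMI
#include <string.h>

extern const char *my_board_name;	/* hypothetical: read from firmware */

/* Count matching entries; stop early if a callback returns nonzero. */
int my_dmi_check(const struct my_dmi_id *list)
{
	int matches = 0;

	for (; list->ident; list++) {
		if (strcmp(list->ident, my_board_name) != 0)
			continue;
		matches++;
		if (list->callback && list->callback(list))
			break;
	}
	return matches;
}
#else
/* DMI configured out: nothing can match, but callers still compile. */
static inline int my_dmi_check(const struct my_dmi_id *list)
{
	assert(list != NULL);
	return 0;
}
#endif

/* Driver side: no #ifdef around the table or the call. */
static int apply_quirk(const struct my_dmi_id *id)
{
	return id->driver_data != NULL;
}

static int quirk_flag = 1;

static const struct my_dmi_id quirks[] = {
	{ "Example Board", apply_quirk, &quirk_flag },
	{ NULL, NULL, NULL }			/* terminator */
};

int driver_probe(void)
{
	return my_dmi_check(quirks);
}
```

Building with -DMY_CONFIG_DMI (plus a definition of my_board_name) selects the real implementation; the driver code is identical either way, so the DMI=n build cannot silently rot. The trade-off is that the quirk table may still end up in the DMI=n image unless the compiler discards it.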



Re: Linux 2.6.25-rc2

2008-02-19 Thread Pekka Enberg


On 2/20/2008, "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote:
> Kernel with the reverting patch is ok.
> I ran reboot/hackbench for more than 10 times on every one of my 3 x86-64 
> machines, and kernel didn't crash.

Great, Linus reverted the patch yesterday. Thanks for testing!

Re: tbench regression in 2.6.25-rc1

2008-02-19 Thread Zhang, Yanmin

On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:
> Zhang, Yanmin wrote:
> > On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 
> >> On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:
> >>
> >>> I also think __refcnt is the key. I did a new test by adding 2 unsigned long
> >>> paddings before lastuse, so the 3 members are moved to the next cache line. The
> >>> performance is recovered.
> >>>
> >>> How about below patch? Almost all performance is recovered with the new patch.
> >>>
> >>> Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> >> Could you add a comment someplace that says "refcnt wants to be on a different
> >> cache line from input/output/ops or performance tanks badly", to warn some
> >> future kernel hacker who starts adding new fields to the structure?
> > Ok. Below is the new patch.
> > 
> > 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y, so that
> > sizeof(dst_entry)=200 no matter whether CONFIG_NET_CLS_ROUTE=y/n. I tested many
> > patches on my 16-core tigerton by moving tclassid to different places. It looks
> > like tclassid could also have an impact on performance. If I move tclassid
> > before metrics, or just don't move it, the performance isn't good. So I moved it
> > after metrics.
> > 
> > 2) Add comments before __refcnt.
> > 
> > If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
> > the one without the patch.
> > 
> > If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
> > the one without the patch.
> > 
> > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> > --- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
> > +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-22 12:52:19.0 +0800
> > @@ -52,15 +52,10 @@ struct dst_entry
> >     unsigned short  header_len; /* more space at head required */
> >     unsigned short  trailer_len;/* space to reserve at tail */
> >  
> > -   u32             metrics[RTAX_MAX];
> > -   struct dst_entry*path;
> > -
> > -   unsigned long   rate_last;  /* rate limiting for ICMP */
> >     unsigned int    rate_tokens;
> > +   unsigned long   rate_last;  /* rate limiting for ICMP */
> >  
> > -#ifdef CONFIG_NET_CLS_ROUTE
> > -   __u32   tclassid;
> > -#endif
> > +   struct dst_entry*path;
> >  
> >     struct neighbour    *neighbour;
> >     struct hh_cache *hh;
> > @@ -70,10 +65,20 @@ struct dst_entry
> >     int (*output)(struct sk_buff*);
> >  
> >     struct  dst_ops *ops;
> > -   
> > -   unsigned long   lastuse;
> > +
> > +   u32 metrics[RTAX_MAX];
> > +
> > +#ifdef CONFIG_NET_CLS_ROUTE
> > +   __u32   tclassid;
> > +#endif
> > +
> > +   /*
> > +    * __refcnt wants to be on a different cache line from
> > +    * input/output/ops or performance tanks badly
> > +    */
> >     atomic_t        __refcnt;   /* client references */
> >     int             __use;
> > +   unsigned long   lastuse;
> >     union {
> >         struct dst_entry *next;
> >         struct rtable    *rt_next;
> > 
> > 
> > 
> 
> I prefer this patch, but unfortunately your perf numbers are for 64-bit
> kernels.
> 
> Could you please test now with a 32-bit one?
I tested it with 32-bit 2.6.25-rc1 on an 8-core stoakley. There is almost no
difference between the pure kernel and the patched kernel.

New update: on the 8-core stoakley, the regression becomes 2~3% with kernel
2.6.25-rc2. On tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa
(8 cores + hyperthreading), the regression is still 4% with 2.6.25-rc2.

With my patch, on tigerton, almost all of the regression disappears. On tulsa,
only about 2% of the regression disappears.

So this issue is triggered with multiple CPUs. Perhaps the process scheduler is
another factor causing the issue to happen, but it's very hard to change the
scheduler.


Eric,

I tested your new patch in function loopback_xmit. It gives no improvement, but
it doesn't introduce new issues either. As you tested it on a dual-core machine
and got an improvement, how about merging your patch with mine?

-yanmin
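The patch above separates __refcnt from input/output/ops by reordering fields, and the reviewer asked for a comment to protect that layout. A different technique, shown here only as a hedged illustration, is to force the write-hot counter onto its own cache line with explicit alignment (the kernel has ____cacheline_aligned_in_smp for this purpose) and to let the compiler verify the layout. `struct demo_entry` is a hypothetical, simplified stand-in, not `struct dst_entry`, and 64 bytes is an assumed line size.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cache line size */

struct sk_buff;		/* opaque; only pointers are used */

/* Hypothetical stand-in for a dst_entry-like object: fields read on every
 * packet first, the refcount that every CPU writes on a separate line. */
struct demo_entry {
	int (*input)(struct sk_buff *);
	int (*output)(struct sk_buff *);
	const void *ops;
	unsigned int metrics[14];	/* hypothetical size */

	/* Aligning this member also raises the alignment of the whole
	 * struct to CACHELINE, so the line boundary is real for static and
	 * automatic objects (heap objects need an aligned allocator). */
	_Alignas(CACHELINE) atomic_int refcnt;
	int use;
	unsigned long lastuse;
};

#define LINE_OF(type, member) (offsetof(type, member) / CACHELINE)

/* Compile-time guard: adding fields can no longer silently pull refcnt
 * back onto the line holding input/output/ops. */
static_assert(LINE_OF(struct demo_entry, refcnt) !=
	      LINE_OF(struct demo_entry, input), "refcnt shares input's line");
static_assert(LINE_OF(struct demo_entry, refcnt) !=
	      LINE_OF(struct demo_entry, ops), "refcnt shares ops' line");
```

Alignment costs padding (the struct grows to a multiple of 64 bytes), which is one reason a reordering that keeps sizeof(dst_entry) at 200, as in the patch, may be preferable; the static_assert idea works with either approach.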



Re: linux-next: Tree for Feb 20

2008-02-19 Thread Stephen Rothwell

Hi Greg,

On Tue, 19 Feb 2008 21:50:55 -0800 Greg KH <[EMAIL PROTECTED]> wrote:
>
> What's the best way to constantly follow this tree?  I had cloned it a
> while ago, but now if I 'git pull' it wants to merge things, which isn't
> right.
> 
> I'm guessing that this is constantly being rebased?  Against what,
> Linus's tree?  So we should be able to clone Linus's tree, and then pull
> in -next?

That would work.  Chris has the right idea, though.  Just set up
linux-next as a remote on any existing clone of Linus' tree and the
"fetch" will forcibly update the linux-next/master branch (remember to
not have that branch checked out when you fetch).

If you keep a continuing git tree for this, you will have the history of
all the next trees because I tag each one.

> Or am I totally missing something here?

I said in the original announcement that the "master" branch would be
rebasing every day (well, I actually said that I would recreate the tree
every day).

Each day, I start with the latest version of Linus' tree (my "stable"
branch) and then merge all the subsystem trees on that.

> I like seeing these, to know that things are at least still working.  I
> imagine you could script them, or just send them to the linux-next list
> if there are no problems, but lkml should probably be notified of any
> issues, right?

Sounds like a plan.  So new "normal" announcements will happen on the
linux-next mailing list and "abnormal" ones to LKML as well (at least).

-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/


Re: Linux 2.6.25-rc2

2008-02-19 Thread Zhang, Yanmin

On Wed, 2008-02-20 at 10:08 +0800, Zhang, Yanmin wrote:
> On Wed, 2008-02-20 at 08:36 +0800, Zhang, Yanmin wrote:
> > On Tue, 2008-02-19 at 17:52 +0200, Pekka Enberg wrote:
> > > Ingo Molnar wrote:
> > > > * Pekka Enberg <[EMAIL PROTECTED]> wrote:
> > > > 
> > > >>> Yes, this can happen. Are you saying it is not safe to be in the 
> > > >>> lockless path when an IRQ triggers?
> > > >> Hmm. The barrier() in slab_free() looks fishy. The comment says it's 
> > > >> there to make sure we've retrieved c->freelist before c->page but then 
> > > >> it uses a _compiler barrier_ which doesn't affect the CPU and the 
> > > >> reads may still be re-ordered... Not sure if that matters here though.
> > > > 
> > > > find a fix patch for that below - most systems affected seem to be SMP 
> > > > ones.
> > > > 
> > > > If this (or my other patch) indeed solves the problem i'd still favor a 
> > > > full revert of the SLUB_FASTPATH (commit 1f84260c8ce3b1ce26d4), it looks
> > > > quite un-cooked and quite un-tested for multiple independent reasons.
> > > > 
> > > > Sigh, why do i again have to be the messenger who brings the bad news to
> > > > SLUB land, and again when poor Christoph went on vacation? :-/
> > > > 
> > > > Ingo
> > > > 
> > > > -->
> > > > Subject: SLUB: barrier fix
> > > > From: Ingo Molnar <[EMAIL PROTECTED]>
> > > > 
> > > > ---
> > > >  mm/slub.c |2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > Index: linux/mm/slub.c
> > > > ===
> > > > --- linux.orig/mm/slub.c
> > > > +++ linux/mm/slub.c
> > > > @@ -1862,7 +1862,7 @@ static __always_inline void slab_free(st
> > > > debug_check_no_locks_freed(object, s->objsize);
> > > > do {
> > > > freelist = c->freelist;
> > > > -   barrier();
> > > > +   smp_mb();
> > > > /*
> > > >  * If the compiler would reorder the retrieval of c->page to
> > > >  * come before c->freelist then an interrupt could
> > > 
> > > Torsten/Yamin, does this fix things for you? What about reverting commit 
> > > 1f84260c8ce3b1ce26d4c1d6dedc2f33a3a29c0c ("SLUB: Alternate fast paths 
> > > using cmpxchg_local")?
> > I'm busy in another issue and will test it ASAP. Sorry.
> I tested it on my 3 x86-64 machines. The small fix to use smp_mb to replace
> barrier in slab_free doesn't work. Kernel still crashed at the same place.
> 
> I will test the reverting patch.
Kernel with the reverting patch is ok.
I ran reboot/hackbench more than 10 times on every one of my 3 x86-64
machines, and the kernel didn't crash.

-yanmin
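The commit under discussion, "SLUB: Alternate fast paths using cmpxchg_local", implements the freelist fast path as a compare-and-exchange retry loop (visible in the do { freelist = c->freelist; ... } loop in the diff above). The general retry pattern can be sketched in userspace with C11 atomics. This is a simplified illustration, not SLUB's code: `struct object`, `freelist_push()` and `freelist_pop()` are hypothetical names, the default sequentially consistent ordering is used (so each exchange also acts as a full barrier), and the pop side ignores the ABA problem that a real concurrent allocator must handle.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free object: the link to the next free object is stored
 * inside the free object itself. */
struct object {
	struct object *next;
};

/* Head of the freelist; only ever updated by compare-and-exchange. */
static _Atomic(struct object *) freelist = NULL;

/* Push: read the head, link the object in front of it, and retry if
 * another context changed the head in the meantime. */
static void freelist_push(struct object *obj)
{
	struct object *head = atomic_load(&freelist);

	do {
		obj->next = head;
		/* On failure, 'head' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&freelist, &head, obj));
}

/* Pop: returns NULL if the list is empty. Reading head->next here is
 * only safe single-threaded or with ABA protection added. */
static struct object *freelist_pop(void)
{
	struct object *head = atomic_load(&freelist);

	while (head != NULL &&
	       !atomic_compare_exchange_weak(&freelist, &head, head->next))
		; /* retry with the freshly reloaded head */
	return head;
}
```

Because the exchange only succeeds if the head still holds the value read at the top of the loop, a concurrent update forces a retry instead of corrupting the list. The question in the thread is whether the other loads in that loop (such as c->page) are ordered correctly around it.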


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix dma_poor_create

2008-02-19 Thread Ingo Molnar

* Yinghai Lu <[EMAIL PROTECTED]> wrote:

> dev_to_node() could return a node without RAM, so check it before using
> it in kmalloc_node().

> - retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, dev_to_node(dev));
> + node = dev_to_node(dev);
> + if (node == -1 || !node_online(node))
> + node = numa_node_id();
> +
> + retval = kmalloc_node(sizeof(*retval), GFP_KERNEL, node);

so this is about not crashing during bootup on nodes that have CPUs but 
which have no node-specific memory attached, right?

Shouldn't kmalloc_node() be made more robust instead? I.e. push the same 
code into kmalloc_node() - and make sure it will allocate _something_? 
That would probably also fix a similar bug in net/core/skbuff.c's 
__netdev_alloc_skb(), which too passes a dev_to_node() result to an 
allocator.

Ingo
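Ingo's suggestion amounts to doing the node check once, inside the allocator, instead of in every caller that passes a dev_to_node() result. A userspace sketch of such a helper follows. It assumes a hypothetical fixed online map and a `current_node` variable standing in for numa_node_id(); `alloc_target_node()` and `node_is_online()` are illustrative names, not kernel functions.

```c
#include <stdbool.h>

#define MAX_NUMNODES 4
#define NUMA_NO_NODE (-1)	/* the "-1" that dev_to_node() may return */

/* Hypothetical topology: node 1 is treated as unusable (e.g. CPUs but
 * no memory), node 3 does not exist. */
static const bool node_online_map[MAX_NUMNODES] = { true, false, true, false };

/* Hypothetical stand-in for numa_node_id(): node of the running CPU. */
static int current_node = 0;

static bool node_is_online(int node)
{
	return node >= 0 && node < MAX_NUMNODES && node_online_map[node];
}

/* Honour the caller's node hint only when it names an online node;
 * otherwise fall back to the local node, as the patch does. */
static int alloc_target_node(int hint)
{
	if (hint == NUMA_NO_NODE || !node_is_online(hint))
		return current_node;
	return hint;
}
```

With the check inside the allocator, callers such as dma_pool_create() and __netdev_alloc_skb() could keep passing dev_to_node(dev) unchanged.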

Re: tsc breaks atkbd suspend

2008-02-19 Thread Thomas Gleixner

On Tue, 19 Feb 2008, Len Brown wrote:

> On Tuesday 19 February 2008 11:51, Thomas Gleixner wrote:
> > On Tue, 19 Feb 2008, Ingo Molnar wrote:
> > > * Pavel Machek <[EMAIL PROTECTED]> wrote:
> > > 
> > > > TSC is used even on machines when CONFIG_X86_TSC is not set (X86_TSC 
> > > > means _require_ TSC), but it is not properly disabled when it is 
> > > > unusable, because acpi code understood the config switch as "may use 
> > > > TSC".
> > > > 
> > > > This actually fixes suspend problems on my x60.
> > > 
> > > ah! This makes tons of sense. I've applied your patch
> 
> please do not.

It's not in the -mm branch.
 
> > > - but i guess it  should go via the ACPI tree.
> 
> yes.
> 
> > Right. The breakage was introduced there when IA64 switched to
> > GENERIC_TIME and the X86_TSC dependency was added.
> 
> so do we need a patch for 2.6.23.stable and 2.6.24.stable?

Yes.

Thanks,
tglx

Re: [PATCH] documentation: fix firmware_sample_firmware_class to build

2008-02-19 Thread Greg KH

On Mon, Feb 18, 2008 at 04:22:16PM -0800, Randy Dunlap wrote:
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Fix firmware_sample_firmware_class module to build without error.
> sysfs.h already has the function prototypes and has them correctly.
> 
> Documentation/firmware_class/firmware_sample_firmware_class.c:37: error: 
> conflicting types for 'sysfs_remove_bin_file'
> include/linux/sysfs.h:100: error: previous declaration of 
> 'sysfs_remove_bin_file' was here
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
> ---
>  Documentation/firmware_class/firmware_sample_firmware_class.c |3 ---
>  1 file changed, 3 deletions(-)

Can we move this file to the samples/ directory, so the build will catch
stuff like this?

What's the final version of this patch?

thanks,

greg k-h

Re: [PATCH] UIO: introduce sysfs_ops for map_attr_ktype

2008-02-19 Thread Greg KH

On Tue, Feb 19, 2008 at 01:55:05AM -0800, Brandon Philips wrote:
> This fixes two bugs with UIO that cropped up recently in -rc1
> 
> 1) WARNING: at fs/sysfs/file.c:334 sysfs_open_file when trying to open
>    a map addr/size file - complaining about missing sysfs_ops for ktype
> 
> 2) Permission denied when reading uio/uio0/maps/map0/{addr,size} when
>    files are mode S_IRUGO
> 
> Also fix a typo: attr_attribute -> addr_attribute
> 
> Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>

Hm, I thought I could get away with just using a kobject attribute, not
a special one.

Thanks for fixing this, it was my fault.

greg k-h

Re: [2.6.25 patch] fix broken error handling in ieee80211_sta_process_addba_request()

2008-02-19 Thread Jarek Poplawski

On 19-02-2008 23:58, Adrian Bunk wrote:
...
> --- a/net/mac80211/ieee80211_sta.c
> +++ b/net/mac80211/ieee80211_sta.c
> @@ -1116,9 +1116,10 @@ static void ieee80211_sta_process_addba_request(struct 
> net_device *dev,
...
> + printk(KERN_ERR "can not allocate reordering buffer "

  + printk(KERN_ERR "cannot allocate reordering buffer "

Probably this can be fixed during the commit.

Jarek P.

Re: [PATCH 2.6.24] block2mtd: removing a device and typo fixes

2008-02-19 Thread Jörn Engel

On Tue, 19 February 2008 23:33:38 +0100, Arnd Bergmann wrote:
> 
> Given that loop works in this way, I certainly see that as doable,
> but then I'd vote for using the existing ioctl semantics of
> LOOP_SET_FD and LOOP_DEL_FD on the mtdchar device, which already
> comes with an ioctl interface for mtd devices.
> I'd probably also allow the LOOP_{GET,SET}_STATUS{,64} commands,
> so you can actually use the existing losetup tool.
> That way, we wouldn't have to introduce a new API, just extend
> an existing one to work on more things.

I like this approach.  It somewhat collides with the mtd principle of
having a separate module for every 2-3 lines of code, but maybe that is
not a bad thing after all.

Onto my list of code to write on rainy afternoons (and secretly hoping
for others to do it instead).

Jörn

-- 
Mundie uses a textbook tactic of manipulation: start with some
reasonable talk, and lead the audience to an unreasonable conclusion.
-- Bruce Perens

Re: [PATCH 0/8] AMD opteron mm config numa etc

2008-02-19 Thread Ingo Molnar


* Greg KH <[EMAIL PROTECTED]> wrote:

> > > could make up for systems that have ACPI problems, or still provide
> > > mmconf and NUMA when acpi=off
> > 
> > Greg, any deep objections against these patches? (other than that 
> > they need a good amount of testing) I personally think that the more 
> > independent the kernel is of the whims of the BIOS, the better ...
> 
> No objection from me, other than they need a LOT of testing. [...]

ok - have queued it up for v2.6.26. Note: Andrew might get grumpy when 
your PCI tree starts changing nearby places in arch/x86/pci again and it 
clashes with these changes in x86.git - in that case please pick up the 
full lot from x86.git#testing and carry it in the PCI tree. (or, 
alternatively, send me any trivial, arch/x86-only PCI bits to 
x86.git#testing so that we can keep it and test it all in a single place 
- whichever approach is more convenient to you)

> [...] Oh, and the networking patch is still wrong, and the poster has 
> been told this numerous times, which makes me wonder how well the pci 
> bridge patch was tested...

i think the optimization should be more correct now than in the past, 
its purpose and dependencies just have not been communicated fully. 
We'll get there eventually :-)

Ingo

Re: Status of storage autosuspend

2008-02-19 Thread Greg KH

On Mon, Feb 18, 2008 at 10:19:11PM -0500, Alan Stern wrote:
> On Mon, 18 Feb 2008, Pavel Machek wrote:
> 
> > > Should we ignore this issue and submit the patches anyway?
> > 
> > I think you should. "Easy" (and clean) solution to that issue is to
> > just return -EPERM from SG_IOCTL if autosuspend is configured in ;-).
> 
> :-)
> 
> Okay, I'll update the patches to 2.6.25-rc2 and submit them in a few
> days.  (Actually the SCSI patch has to go in first and the usb-storage
> patch afterward, which will probably cause it to be delayed one kernel
> version.  I don't know any good way to handle these cross-subsystem
> updates...)

Push the usb-storage one through the scsi tree as well.  The subsystem
maintainers handle this kind of thing all the time (for example, a sysfs
feature is about to go in through the ocfs tree for this very reason.)

thanks,

greg k-h

Re: [PCI] duplicate sysfs symbols getting registered in current git

2008-02-19 Thread Greg KH

On Mon, Feb 18, 2008 at 09:52:25PM +0100, Guennadi Liakhovetski wrote:
> Booting an x86 SMP PC with today's git snapshot or just with 2.6.25-rc2, I'm
> getting the following warnings (with a bit of context):

Can you try enabling CONFIG_DEBUG_KOBJECT and sending the output at boot
time from this?

thanks,

greg k-h

Re: [PATCH 6/8] net: use numa_node in net_device->dev instead of parent

2008-02-19 Thread Ingo Molnar


* Yinghai Lu <[EMAIL PROTECTED]> wrote:

> > > Can you check 5/8? That will make sure every struct device gets its
> > > numa_node assigned.
> >
> > Why do we need to bother with that if the parent will have the 
> > necessary information for us here?
> 
> less code?
> 
> or some kind of USB or other bus interface, which may have several levels...

you mean it's a small optimization: otherwise every struct device (net 
dev, usb...) needs to go back to find another pci device (parent or host 
controller) to use its numa_node.

Ingo
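The two approaches being weighed, assigning numa_node to every struct device when it is registered (as patch 5/8 proposes) versus looking it up from an ancestor at use time, can be contrasted with a hypothetical, stripped-down device structure. Neither function below is the kernel's driver-core code; `dev_node_lookup()` and `dev_register()` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical, simplified device: only the fields needed here. */
struct device {
	struct device *parent;
	int numa_node;		/* -1 when unknown */
};

/* Lookup at use time: walk up the hierarchy (net device -> USB device
 * -> host controller -> PCI device ...) until an ancestor knows its node. */
static int dev_node_lookup(const struct device *dev)
{
	for (; dev != NULL; dev = dev->parent)
		if (dev->numa_node != -1)
			return dev->numa_node;
	return -1;
}

/* Alternative: inherit the parent's node once at registration, so later
 * users read dev->numa_node directly without walking the chain. */
static void dev_register(struct device *dev, struct device *parent)
{
	dev->parent = parent;
	if (dev->numa_node == -1 && parent != NULL)
		dev->numa_node = parent->numa_node;
}
```

With the copy-at-registration variant, a device several levels below a PCI device reads its own field directly, which is the "less code" argument; the lookup variant needs no registration hook but walks the chain on every use.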

Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread Paul Menage

On Feb 19, 2008 10:14 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
> > On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
> > >
> > > it changes the format from "%s %lld" to "%s: %llu", right?
> > > why?
> > >
> >
> > The colon for consistency with maps in /proc. I think it also makes it
> > slightly more readable.
>
> can you be a little more specific?
>
> i object to the colon because i want to use the same parser for
> /proc/vmstat, which doesn't have colons.

Ah. This /proc behaviour of having multiple formats for reporting the
same kind of data (compare with /proc/meminfo, which does use colons)
is the kind of thing that I want to avoid with cgroups. i.e. if two
cgroup subsystems are both reporting the same kind of structured data,
then they should both use the same output format.

I guess since /proc has both styles, and memory.stat is the first file
reporting key/value pairs in cgroups, you get to call the format. OK,
I'll zap the colon.

Paul
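The practical reason for dropping the colon is that one parser can then read both memory.stat and /proc/vmstat, whose lines have the form `name value`. A minimal userspace sketch of such a parser follows, assuming keys shorter than 64 bytes and unsigned 64-bit values. `parse_stat_line()` is a hypothetical name, and rejecting a trailing colon is a design choice of this sketch, not cgroup code.

```c
#include <stdio.h>
#include <string.h>

/* Parse one "key value" line in the /proc/vmstat style. Returns 0 on
 * success, -1 if the line does not match. A key ending in ':' (the
 * /proc/meminfo style) is rejected, so every file must use one format. */
static int parse_stat_line(const char *line, char key[64],
			   unsigned long long *value)
{
	if (sscanf(line, "%63s %llu", key, value) != 2)
		return -1;
	if (key[strlen(key) - 1] == ':')	/* %s stored at least 1 char */
		return -1;
	return 0;
}
```

Without the explicit check, "cache: 4096" would still parse, but the key would come back as "cache:", so code matching key names would silently miss it.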

Re: [PATCH] exporting capability code/name pairs (try #6.1)

2008-02-19 Thread Kohei KaiGai

>> Could you also modify the documentation and the sample code to use this
>> new field, showing how it is to be used, and testing that it works
>> properly at the same time?
> 
> OK, Please wait for a while.

[3/3] Add a new example of kobject/attribute

The attached patch provides a new example of using kobject and attribute.
The _show() and _store() methods can read and store the private data field
of the kobj_attribute structure to know which entry is being accessed.
This makes it easier to share a single _show()/_store() method among
several entries.
Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
--
 samples/kobject/kobject-example.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/samples/kobject/kobject-example.c 
b/samples/kobject/kobject-example.c
index 08d0d3f..f99d734 100644
--- a/samples/kobject/kobject-example.c
+++ b/samples/kobject/kobject-example.c
@@ -77,6 +77,35 @@ static struct kobj_attribute baz_attribute =
 static struct kobj_attribute bar_attribute =
__ATTR(bar, 0666, b_show, b_store);

+/*
+ * You can store private data in the 'data' field of kobj_attribute.
+ * This allows several entries to share a single _show() or _store()
+ * method.
+ */
+static ssize_t integer_show(struct kobject *kobj,
+   struct kobj_attribute *attr,
+   char *buf)
+{
+   return scnprintf(buf, PAGE_SIZE, "%d\n", (int) attr->data);
+}
+
+static ssize_t integer_store(struct kobject *kobj,
+struct kobj_attribute *attr,
+const char *buf, size_t count)
+{
+   int code;
+
+   sscanf(buf, "%d", &code);
+   attr->data = (void *) code;
+   return count;
+}
+
+static struct kobj_attribute hoge_attribute =
+   __ATTR_DATA(hoge, 0666, integer_show, integer_store, 123);
+static struct kobj_attribute piyo_attribute =
+   __ATTR_DATA(piyo, 0666, integer_show, integer_store, 456);
+static struct kobj_attribute fuga_attribute =
+   __ATTR_DATA(fuga, 0444, integer_show, NULL, 789);

 /*
  * Create a group of attributes so that we can create and destory them all
@@ -86,6 +115,9 @@ static struct attribute *attrs[] = {
&foo_attribute.attr,
&baz_attribute.attr,
&bar_attribute.attr,
+   &hoge_attribute.attr,
+   &piyo_attribute.attr,
+   &fuga_attribute.attr,
NULL,   /* need to NULL terminate the list of attributes */
 };

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>
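The idea behind the new example can be illustrated outside the kernel. In this userspace sketch, `struct fake_attr` and `shared_show()` are hypothetical stand-ins for struct kobj_attribute and integer_show(); snprintf() replaces the kernel-only scnprintf(), and the integer is round-tripped through `intptr_t` rather than the direct `(int)`/`(void *)` casts in the patch, which avoids pointer-size warnings on 64-bit builds.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for struct kobj_attribute with the proposed
 * private 'data' field. */
struct fake_attr {
	const char *name;
	int (*show)(const struct fake_attr *attr, char *buf, size_t len);
	void *data;		/* per-entry private value */
};

/* One show method shared by every entry: the printed value comes from
 * the entry's own data field, not from the method. */
static int shared_show(const struct fake_attr *attr, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", (int)(intptr_t)attr->data);
}

static const struct fake_attr hoge = { "hoge", shared_show, (void *)(intptr_t)123 };
static const struct fake_attr piyo = { "piyo", shared_show, (void *)(intptr_t)456 };
```

Each entry carries its own value, so adding another attribute needs only another initializer, not another show function.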

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patchadded to gregkh-2.6 tree

2008-02-19 Thread Shaohua Li


On Tue, 2008-02-19 at 21:42 -0800, Greg KH wrote:
> On Wed, Feb 20, 2008 at 01:24:48PM +0800, Shaohua Li wrote:
> > 
> > On Tue, 2008-02-19 at 21:04 -0800, Greg KH wrote:
> > > On Wed, Feb 20, 2008 at 12:48:21PM +0800, Shaohua Li wrote:
> > > > 
> > > > On Tue, 2008-02-19 at 20:14 -0800, Greg KH wrote:
> > > > > On Wed, Feb 20, 2008 at 09:36:07AM +0800, Shaohua Li wrote:
> > > > > > --- linux.orig/include/linux/pci-acpi.h 2008-02-19 
> > > > > > 11:03:51.0 +0800
> > > > > > +++ linux/include/linux/pci-acpi.h  2008-02-20 09:19:15.0 
> > > > > > +0800
> > > > > > @@ -47,6 +47,7 @@
> > > > > > OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL)
> > > > > >  
> > > > > >  #ifdef CONFIG_ACPI
> > > > > > +#include 
> > > > > >  extern acpi_status pci_osc_control_set(acpi_handle handle, u32 
> > > > > > flags);
> > > > > >  extern acpi_status __pci_osc_support_set(u32 flags, const char 
> > > > > > *hid);
> > > > > >  static inline acpi_status pci_osc_support_set(u32 flags)
> > > > > > @@ -59,13 +60,11 @@ static inline acpi_status pcie_osc_suppo
> > > > > >  }
> > > > > >  #else
> > > > > >  #if !defined(AE_ERROR)
> > > > > > -typedef u32acpi_status;
> > > > > > -#define AE_ERROR   (acpi_status) (0x0001)
> > > > > > -#endif
> > > > > > -static inline acpi_status pci_osc_control_set(acpi_handle handle, 
> > > > > > u32 flags)
> > > > > > -{return AE_ERROR;}
> > > > > > -static inline acpi_status pci_osc_support_set(u32 flags) {return 
> > > > > > AE_ERROR;} 
> > > > > > -static inline acpi_status pcie_osc_support_set(u32 flags) {return 
> > > > > > AE_ERROR;}
> > > > > > +#define AE_ERROR   (0x0001)
> > > > > > +#endif
> > > > > > +#define pci_osc_control_set(handle, flags) (AE_ERROR)
> > > > > > +#define pci_osc_support_set(flags) (AE_ERROR)
> > > > > > +#define pcie_osc_support_set(flags) (AE_ERROR)
> > > > > 
> > > > > No, please use inline functions, don't change these functions that
> > > > > should be just fine.  Why are you needing to change them?
> > > > some types aren't defined in non-ACPI, like acpi_handle, acpi_status.
> > > 
> > > Then why include a non-ACPI header file in non-ACPI .c files?
> > ASPM is generic, but on ACPI platforms it needs special handling. I could
> > add '#ifdef CONFIG_ACPI' in aspm.c to avoid changing pci-acpi.h, but I
> > thought it was better for pci-acpi.h to be self-contained.
> 
> Ugh, "generic" stuff needing ACPI, that's an oxymoron...
> 
> Will this ever work on non-ACPI systems?  If so, then I expect to see
> some #ifdefs in the .c file (or split it into two) to handle that.  If
> not, why not only include it if ACPI is enabled, as that means it really
> is a dependency :(
It should work. Ok, this version doesn't change pci-acpi.h

PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, the device should be configured appropriately by software.
Enabling ASPM will save power, but will introduce device latency.

This patch adds ASPM support to Linux. It introduces a global ASPM policy,
which can be controlled through the sysfs file
/sys/module/pcie_aspm/parameters/policy. The same setting can also be given
as a boot option. The following settings are currently available:
-default: BIOS default setting
-powersave: highest power saving mode; enables all available ASPM
states and clock power management
-performance: highest performance; disables ASPM and clock power
management
The 'default' policy is used by default.
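The three settings map naturally onto an enum selected by name. The sketch below is hypothetical: the enum, `policy_names[]` and `aspm_policy_from_name()` are illustrative and not taken from aspm.c. It tolerates the trailing newline that `echo` adds when writing to a sysfs file.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative policy values matching the three settings listed above. */
enum aspm_policy {
	ASPM_POLICY_DEFAULT,	 /* keep the BIOS configuration */
	ASPM_POLICY_POWERSAVE,	 /* enable all available ASPM states and clock PM */
	ASPM_POLICY_PERFORMANCE, /* disable ASPM and clock PM */
};

static const char *const policy_names[] = {
	[ASPM_POLICY_DEFAULT]     = "default",
	[ASPM_POLICY_POWERSAVE]   = "powersave",
	[ASPM_POLICY_PERFORMANCE] = "performance",
};

/* Map a policy name to its enum value, ignoring a trailing newline.
 * Returns -1 for unknown names so the caller can reject the write. */
static int aspm_policy_from_name(const char *buf)
{
	size_t len = strcspn(buf, "\n");	/* length up to any newline */
	size_t i;

	for (i = 0; i < sizeof(policy_names) / sizeof(policy_names[0]); i++)
		if (strlen(policy_names[i]) == len &&
		    strncmp(buf, policy_names[i], len) == 0)
			return (int)i;
	return -1;
}
```

Comparing the full length as well as the prefix keeps a truncated name such as "power" from matching "powersave".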

In my test, the power difference between powersave mode and performance mode
is about 1.3 W in a system with 3 PCIe links.

Note: some devices might not work well with ASPM, because of chipset or
device issues. The patch provides an API (pci_disable_link_state) so that
a driver can disable ASPM for a specific device.

Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
---
 drivers/pci/pci-sysfs.c   |5 
 drivers/pci/pci.c |4 
 drivers/pci/pcie/Kconfig  |   20 +
 drivers/pci/pcie/Makefile |3 
 drivers/pci/pcie/aspm.c   |  811 ++
 drivers/pci/probe.c   |5 
 drivers/pci/remove.c  |4 
 include/linux/pci-aspm.h  |   56 +++
 include/linux/pci.h   |5 
 include/linux/pci_regs.h  |8 
 10 files changed, 921 insertions(+)

Index: linux/drivers/pci/pcie/Makefile
===
--- linux.orig/drivers/pci/pcie/Makefile2008-02-20 10:22:16.0 
+0800
+++ linux/drivers/pci/pcie/Makefile 2008-02-20 13:59:10.0 +0800
@@ -2,6 +2,9 @@
 # Makefile for PCI-Express PORT Driver
 #
 
+# Build PCI Express ASPM if needed
+obj-$(CONFIG_PCIEASPM) += aspm.o
+
 pcieportdrv-y

Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-19 Thread Jeff Chua

On Feb 20, 2008 12:32 PM, Jesse Barnes <[EMAIL PROTECTED]> wrote:
>
> On Tuesday, February 19, 2008 6:28 pm Linus Torvalds wrote:
> > On Tue, 19 Feb 2008, Jesse Barnes wrote:
> > > I found the same poweroff issue on my T61.  It turned out to be related
> > > to the C state code disabling interrupts when it shouldn't iirc.  Booting
> > > with 'idle=poll' seems to work around the problem.
> > >
> > > The "green screen" problem should be fixed (see the DRM git tree for
> > > details).
> Jeff, can you retest with Linus' tree?  If you're still seeing problems, it
> might help to add some printks to the i915 driver's suspend routine.  Just
> reading the regs really shouldn't cause a hang, but maybe the VGA bits are
> subtly wrong again...

The funny thing is the screen is now normal during suspend, but the
green came back after suspend!

And the suspend still does NOT power off with Linus's latest tree.

I'll try the "idle=poll" to see if that works and will try some printk as well.

Thanks,
Jeff.

Re: [PATCH] exporting capability code/name pairs (try #6.1)

2008-02-19 Thread Kohei KaiGai

[Sorry, I sent a patch with TABs translated into spaces.]

In the attached patch, every attribute entry stores its capability
identifier, in numerical or symbolic representation, in the private
data field of the kobj_attribute structure.
The rest is unchanged.


[2/3] Exporting capability code/name pairs

This patch exports the codes and names of the capabilities supported
by the running kernel.

Newer kernels sometimes add capabilities, such as CAP_MAC_ADMIN in
2.6.25. However, there is no interface to disclose which capabilities
the running kernel supports, so the libcap version has to be kept in
sync with the kernel.

This patch lets libcap collect the list of capabilities at run time
and provide it to users, which improves the portability of the library.

The information is exported as regular files under /sys/kernel/capability.
Reading a numeric node returns the capability's name; reading a symbolic
node returns its code.

Please consider queuing this patch for 2.6.25.

Thanks,
===
[EMAIL PROTECTED] ~]$ ls -R /sys/kernel/capability/
/sys/kernel/capability/:
codes  names  version

/sys/kernel/capability/codes:
0  10  12  14  16  18  2   21  23  25  27  29  30  32  4  6  8
1  11  13  15  17  19  20  22  24  26  28  3   31  33  5  7  9

/sys/kernel/capability/names:
cap_audit_control    cap_kill              cap_net_raw     cap_sys_nice
cap_audit_write      cap_lease             cap_setfcap     cap_sys_pacct
cap_chown            cap_linux_immutable   cap_setgid      cap_sys_ptrace
cap_dac_override     cap_mac_admin         cap_setpcap     cap_sys_rawio
cap_dac_read_search  cap_mac_override      cap_setuid      cap_sys_resource
cap_fowner           cap_mknod             cap_sys_admin   cap_sys_time
cap_fsetid           cap_net_admin         cap_sys_boot    cap_sys_tty_config
cap_ipc_lock         cap_net_bind_service  cap_sys_chroot
cap_ipc_owner        cap_net_broadcast     cap_sys_module
[EMAIL PROTECTED] ~]$ cat /sys/kernel/capability/version
0x20071026
[EMAIL PROTECTED] ~]$ cat /sys/kernel/capability/codes/30
cap_audit_control
[EMAIL PROTECTED] ~]$ cat /sys/kernel/capability/names/cap_sys_pacct
20
[EMAIL PROTECTED] ~]$
===
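The codes/ and names/ directories are two views of one code/name table. A userspace sketch of the two lookups follows, assuming a hand-written X-macro list in place of the cap_names.h that scripts/mkcapnames.sh generates. `CAP_LIST`, `cap_name_of()` and `cap_code_of()` are hypothetical names, and only four entries are shown, with values from include/linux/capability.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical X-macro list; the patch generates the equivalent from
 * include/linux/capability.h. */
#define CAP_LIST(X)                \
	X(0,  "cap_chown")         \
	X(20, "cap_sys_pacct")     \
	X(30, "cap_audit_control") \
	X(33, "cap_mac_admin")

struct cap_entry {
	int code;
	const char *name;
};

#define CAP_ENTRY(c, n) { c, n },
static const struct cap_entry cap_table[] = { CAP_LIST(CAP_ENTRY) };
#define CAP_COUNT (sizeof(cap_table) / sizeof(cap_table[0]))

/* codes/<n>: numeric code -> symbolic name, or NULL if unknown. */
static const char *cap_name_of(int code)
{
	size_t i;

	for (i = 0; i < CAP_COUNT; i++)
		if (cap_table[i].code == code)
			return cap_table[i].name;
	return NULL;
}

/* names/<name>: symbolic name -> numeric code, or -1 if unknown. */
static int cap_code_of(const char *name)
{
	size_t i;

	for (i = 0; i < CAP_COUNT; i++)
		if (strcmp(cap_table[i].name, name) == 0)
			return cap_table[i].code;
	return -1;
}
```

Generating the list from the header, as the patch does, keeps the table from drifting when a new capability is added.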

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
--
 Documentation/ABI/testing/sysfs-kernel-capability |   23 +
 scripts/mkcapnames.sh |   44 +
 security/Makefile |9 ++
 security/commoncap.c  |   99 +
 4 files changed, 175 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-capability 
b/Documentation/ABI/testing/sysfs-kernel-capability
index e69de29..402ef06 100644
--- a/Documentation/ABI/testing/sysfs-kernel-capability
+++ b/Documentation/ABI/testing/sysfs-kernel-capability
@@ -0,0 +1,23 @@
+What:  /sys/kernel/capability
+Date:  Feb 2008
+Contact:   KaiGai Kohei <[EMAIL PROTECTED]>
+Description:
+   The entries under /sys/kernel/capability are used to export
+   the list of capabilities the running kernel supports.
+
+   - /sys/kernel/capability/version
+ returns the most preferable version number for the
+ running kernel.
+ e.g) $ cat /sys/kernel/capability/version
+  0x20071026
+
+   - /sys/kernel/capability/codes/
+ returns its symbolic representation, on reading.
+ e.g) $ cat /sys/kernel/capability/codes/30
+  cap_audit_control
+
+   - /sys/kernel/capability/names/
+ returns its numerical representation, on reading.
+ e.g) $ cat /sys/kernel/capability/names/cap_sys_pacct
+  20
+
diff --git a/scripts/mkcapnames.sh b/scripts/mkcapnames.sh
index e69de29..5d36d52 100644
--- a/scripts/mkcapnames.sh
+++ b/scripts/mkcapnames.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+#
+# generate a cap_names.h file from include/linux/capability.h
+#
+
+CAPHEAD="`dirname $0`/../include/linux/capability.h"
+REGEXP='^#define CAP_[A-Z_]+[  ]+[0-9]+$'
+NUMCAP=`cat "$CAPHEAD" | egrep -c "$REGEXP"`
+
+echo '#ifndef CAP_NAMES_H'
+echo '#define CAP_NAMES_H'
+echo
+echo '/*'
+echo ' * Do NOT edit this file directly.'
+echo ' * This file is generated from include/linux/capability.h automatically'
+echo ' */'
+echo
+echo '#if !defined(SYSFS_CAP_NAME_ENTRY) || !defined(SYSFS_CAP_CODE_ENTRY)'
+echo '#error cap_names.h should be included from security/capability.c'
+echo '#else'
+echo "#if $NUMCAP != CAP_LAST_CAP + 1"
+echo '#error mkcapnames.sh cannot collect capabilities correctly'
+echo '#else'
+cat "$CAPHEAD" | egrep "$REGEXP" \
+| awk '{ printf("SYSFS_CAP_NAME_ENTRY(%s,%s);\n", tolower($2), $2); }'
+echo
+echo 'static struct attribute *capability_name_attrs[] = {'
+cat "$CAPHEAD" | egrep "$REGEXP" \
+| awk

Re: [PATCH] exporting capability code/name pairs (try #6.1)

2008-02-19 Thread Kohei KaiGai

[Sorry, I sent a patch with TABs translated into spaces.]

[1/3] Add a private data field within kobj_attribute structure.

This patch adds a private data field, declared as void *, to the kobj_attribute
structure. Users of sysfs can store private data there and refer to it from
their _show() and _store() methods.
This allows a single method function to be shared by several similar entries,
such as the ones exporting the list of capabilities the running kernel supports.

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
-- 
 include/linux/kobject.h |1 +
 include/linux/sysfs.h   |7 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index caa3f41..57d5bf1 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -130,6 +130,7 @@ struct kobj_attribute {
char *buf);
ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
 const char *buf, size_t count);
+   void *data; /* a private field */
 };

 extern struct sysfs_ops kobj_sysfs_ops;
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 8027104..6f40ff9 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -50,6 +50,13 @@ struct attribute_group {
.store  = _store,   \
 }

+#define __ATTR_DATA(_name,_mode,_show,_store,_data) {  \
+   .attr = {.name = __stringify(_name), .mode = _mode },   \
+   .show   = _show,\
+   .store  = _store,   \
+   .data   = (void *)(_data),  \
+}
+   
 #define __ATTR_RO(_name) { \
.attr   = { .name = __stringify(_name), .mode = 0444 }, \
.show   = _name##_show, \

-- 
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

Re: linux-next: Tree for Feb 20

2008-02-19 Thread Chris Wedgwood

On Tue, Feb 19, 2008 at 09:50:55PM -0800, Greg KH wrote:

> What's the best way to constantly follow this tree?  I had cloned it
> a while ago, but now if I 'git pull' it wants to merge things, which
> isn't right.

I would guess:

  $ git remote add linux-next 
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
  $ git fetch linux-next

then use the remote branch names when poking about:

  $ git log -p linux-next/master

etc?


Or is there a better way?

Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread YAMAMOTO Takashi

> On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
> >
> > it changes the format from "%s %lld" to "%s: %llu", right?
> > why?
> >
> 
> The colon for consistency with maps in /proc. I think it also makes it
> slightly more readable.

can you be a little more specific?

i object to the colon because i want to use the same parser for
/proc/vmstat, which doesn't have colons.

btw, when making ABI changes like this, can you please mention it
explicitly in the patch descriptions?

> For %lld versus %llu - I think that cgroup resource APIs are much more
> likely to need to report unsigned rather than signed values. In the
> case of the memory.stat file, that's certainly the case.
> 
> But I guess there's an argument to be made that nothing's likely to
> need the final 64th bit of an unsigned value, whereas the ability to
> report negative numbers could potentially be useful for some cgroups.
> 
> Paul

i don't have any strong opinions about signedness.

YAMAMOTO Takashi

Re: 2.6.25-rc2 regression - hang on suspend

2008-02-19 Thread Soeren Sonnenburg

On Wed, 2008-02-20 at 00:50 +0100, Rafael J. Wysocki wrote:
> On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > On Tue, 2008-02-19 at 22:06 +0100, Rafael J. Wysocki wrote:
> > > On Tuesday, 19 of February 2008, Soeren Sonnenburg wrote:
> > > > Hi,
> > > 
> > > Hi,
> > > 
> > > > since 2.6.25-rc1 (first version I tried) and still in rc2 (and git), I
> > > > see a hang on s2ram already when trying to suspend.
> > > 
> > > Does it work with 2.6.24?
> > 
> > yes.
> 
> Please take the current mainline (there are a couple of nasty bugs fixed in
> it), configure it with CONFIG_PM_DEBUG set, boot it with "no_console_suspend",
> run
> 
> # echo 8 > /proc/sys/kernel/printk
> # echo devices > /sys/power/pm_test
> # echo mem > /sys/power/state
> 
> If it hangs, it should leave a stack trace before and I need that trace to see
> what's going on.  If it doesn't hang, I'll tell you what to do next.

I tried 2.6.24.2 with CONFIG_PM_DEBUG set, following your steps, and yes, it
works flawlessly (though the display did not come back, I could
suspend/resume multiple times without problems, and finally s2ram -f -p
brought the display back).

So what next?

Soeren

Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread Paul Menage

On Feb 19, 2008 9:48 PM, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
>
> it changes the format from "%s %lld" to "%s: %llu", right?
> why?
>

The colon for consistency with maps in /proc. I think it also makes it
slightly more readable.

For %lld versus %llu - I think that cgroup resource APIs are much more
likely to need to report unsigned rather than signed values. In the
case of the memory.stat file, that's certainly the case.

But I guess there's an argument to be made that nothing's likely to
need the final 64th bit of an unsigned value, whereas the ability to
report negative numbers could potentially be useful for some cgroups.

Paul

Re: [PATCH] exporting capability code/name pairs (try #6)

2008-02-19 Thread Greg KH

On Wed, Feb 20, 2008 at 02:38:16PM +0900, Kohei KaiGai wrote:
> Greg KH wrote:
>> On Wed, Feb 20, 2008 at 01:38:59PM +0900, Kohei KaiGai wrote:
>>>>> If we can have a private member in kobj_attribute, we can found the
>>>>> content to be returned in a single step.
>>>> Ok, again, just send me a patch that adds this functionality and we will
>>>> be very glad to consider it.
>>> [1/2] Add a private data field within kobj_attribute structure.
>>>
>>> This patch add a private data field, declared as void *, within kobj_attribute
>>> structure. Anyone wants to use sysfs can store their private data to refer at
>>> _show() and _store() method.
>>> It enables to share a single method function with several similar entries,
>>> like ones to export the list of capabilities the running kernel supported.
>> But your patch 2/2 doesn't use this interface, why not?
>
> Really?
> The following two _show() methods shared by every capabilities refer
> the private member of kobj_attribute.
>
> | +static ssize_t capability_name_show(struct kobject *kobj,
> | +struct kobj_attribute *attr,
> | +char *buffer)
> | +{
> | +/* It returns numerical representation of capability. */
> | +return scnprintf(buffer, PAGE_SIZE, "%d\n", (int) attr->data);
> | +}
> | +
> | +static ssize_t capability_code_show(struct kobject *kobj,
> | +struct kobj_attribute *attr,
> | +char *buffer)
> | +{
> | +/* It returns symbolic representation of capability. */
> | +return scnprintf(buffer, PAGE_SIZE, "%s\n", (char *) attr->data);
> | +}

Ah, sorry, missed that.  I also missed where this was set up as well :(

thanks,

greg k-h

Re: [ofa-general] [2.6 patch] infiniband/hw/nes/nes_verbs.c: fix off-by-one

2008-02-19 Thread Adrian Bunk

On Tue, Feb 19, 2008 at 08:23:19PM -0800, Roland Dreier wrote:
> Thanks, this is already upstream as 51af33e8

No, 51af33e8 was for a similar bug 400 lines below this bug...

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: linux-next: Tree for Feb 20

2008-02-19 Thread David Miller

From: Greg KH <[EMAIL PROTECTED]>
Date: Tue, 19 Feb 2008 21:50:55 -0800

> On Wed, Feb 20, 2008 at 04:34:57PM +1100, Stephen Rothwell wrote:
> > I will stop making these announcements now unless there is some change to
> > the tree or things people should know.  There should be a new tree every
> > (Australian Capital Territory) working day.
> 
> I like seeing these, to know that things are at least still working.

FWIW, I like seeing them too.  It acts as a catalyst in my inbox
which works as a TODO list.

Re: atmel_spi clock polarity

2008-02-19 Thread Atsushi Nemoto

On Mon, 18 Feb 2008 15:31:58 +0100, Haavard Skinnemoen <[EMAIL PROTECTED]> wrote:
> > Anyway, I will try your patch in a few days.
> 
> Ok, thanks. If it works, that would be great, but given your
> description above I'm not sure if I dare hope for it.

Unfortunately it did not work.  The clock state did not change by
writing the MR register.

---
Atsushi Nemoto

Re: [PATCH 1/5] signal(x86_32): Improve the signal stack overflow check

2008-02-19 Thread Harvey Harrison

On Tue, 2008-02-19 at 18:49 -0800, Roland McGrath wrote:
> > I spent some time read you mail carefully and dig into the code again.
> > 
> > And yes, you are right. It's possible that SA_ONSTACK has been cleared
> > before the second signal on the same stack comes.
> 
> It's not necessary for SA_ONSTACK to have "been cleared", by which I assume
> you mean a sigaction call with SA_ONSTACK not set in sa_flags.  That is
> indeed possible, but it's not the only case your patch broke.  It can just
> be a different signal whose sigaction never had SA_ONSTACK, when you are
> still on the signal stack from an earlier signal that did have SA_ONSTACK.
> 
> > So this patch is wrong  :( . I will revise the other 4 patches.
> 
> For 2 and 3, I would rather just wait until we unify signal.c anyway.
> 

I've been looking at that; at the same time, a bunch of ia32/signal.c
looks like it can go away.

Harvey
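The case Roland describes can be observed from userspace. The sketch below is Linux-specific and uses hypothetical names: SIGUSR1 is installed with SA_ONSTACK, SIGUSR2 without it, and SIGUSR2 is raised from inside the SIGUSR1 handler. Because SIGUSR2 does not ask for a stack switch, its frame is built on the current stack, which is still the alternate one, so an overflow check keyed only on the handler's own SA_ONSTACK flag would not consider the signal stack at all.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)  /* generous: two signal frames live here */

static char *alt_base;                        /* start of the alternate stack */
static volatile sig_atomic_t second_on_alt = -1;

/* SIGUSR2 handler: its sigaction does NOT have SA_ONSTACK. */
static void usr2_handler(int sig)
{
	char probe;   /* a local, so its address is on the stack in use */
	uintptr_t p = (uintptr_t)&probe;

	(void)sig;
	second_on_alt = p >= (uintptr_t)alt_base &&
			p < (uintptr_t)alt_base + ALT_STACK_SIZE;
}

/* SIGUSR1 handler: installed with SA_ONSTACK, so it runs on the altstack. */
static void usr1_handler(int sig)
{
	(void)sig;
	raise(SIGUSR2);   /* delivered while we are still on the altstack */
}

/* Returns 1 if the SA_ONSTACK-less SIGUSR2 handler ran on the alternate
 * stack, 0 if it did not, and -1 if setup failed. */
static int nested_signal_uses_altstack(void)
{
	stack_t ss;
	struct sigaction sa;

	alt_base = malloc(ALT_STACK_SIZE);
	if (alt_base == NULL)
		return -1;
	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = alt_base;
	ss.ss_size = ALT_STACK_SIZE;
	if (sigaltstack(&ss, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = usr2_handler;       /* sa_flags == 0: no SA_ONSTACK */
	if (sigaction(SIGUSR2, &sa, NULL) != 0)
		return -1;

	sa.sa_handler = usr1_handler;
	sa.sa_flags = SA_ONSTACK;
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	raise(SIGUSR1);
	/* The alternate stack stays registered, so alt_base is not freed. */
	return second_on_alt;
}
```

On Linux we would expect `nested_signal_uses_altstack()` to return 1.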


Re: pci/pcie/aer/aerdrv_acpi.c: inconsequent NULL checking

2008-02-19 Thread Greg KH

On Tue, Feb 19, 2008 at 09:29:02PM +0200, Adrian Bunk wrote:
> The Coverity checker spotted the following inconsequent NULL checking 
> introduced by commit 3c75e23784e6ed5f4841de43d0750fd9b37bafcb:
> 
> <--  snip  -->
> 
> ...
> int aer_osc_setup(struct pcie_device *pciedev)
> {
> ...v
> while (pdev->bus && pdev->bus->self)
> pdev = pdev->bus->self;

That could probably change to just pdev->bus->self, as a bus should
always be there for a pdev, so I don't see this as a problem.
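With that simplification the loop only tests `bus->self`. A userspace sketch; the structs are cut-down hypothetical stand-ins, not the kernel definitions, and `topmost_dev()` is an illustrative name. If a device could ever have a NULL bus, the assert would fire, which is exactly the assumption being made here.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed, hypothetical stand-ins for struct pci_dev and struct pci_bus. */
struct pci_bus;

struct pci_dev {
	struct pci_bus *bus;   /* assumed always set */
};

struct pci_bus {
	struct pci_dev *self;  /* upstream bridge; NULL on a root bus */
};

/* Climb bridges until reaching the device that sits on a root bus,
 * testing only bus->self instead of bus && bus->self. */
static struct pci_dev *topmost_dev(struct pci_dev *pdev)
{
	assert(pdev != NULL && pdev->bus != NULL);
	while (pdev->bus->self)
		pdev = pdev->bus->self;
	return pdev;
}
```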

thanks,

greg k-h

Re: linux-next: Tree for Feb 20

2008-02-19 Thread Greg KH

On Wed, Feb 20, 2008 at 04:34:57PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> I have created today's linux-next tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git.
> 
> You can see which trees have been included by looking in the Next/Trees
> file in the source.  There are also quilt-import.log and merge.log files
> in the Next directory.  Between each merge, the tree was built with
> allmodconfig for both powerpc and x86_64.

What's the best way to constantly follow this tree?  I had cloned it a
while ago, but now if I 'git pull' it wants to merge things, which isn't
right.

I'm guessing that this is constantly being rebased?  Against what,
Linus's tree?  So we should be able to clone Linus's tree, and then pull
in -next?

Or am I totally missing something here?

> There were no merge conflicts and only one build failure!
> 
> We are up to 27 trees, more are welcome (even if they are currently
> empty).  I would encourage architecture maintainers, in particular, to
> set up a git branch or quilt tree now to avoid the rush after RC2 :-)
> 
> I will stop making these announcements now unless there is some change to
> the tree or things people should know.  There should be a new tree every
> (Australian Capital Territory) working day.

I like seeing these, to know that things are at least still working.  I
imagine you could script them, or just send them to the linux-next list
if there are no problems, but lkml should probably be notified of any
issues, right?

thanks,

greg k-h

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patch added to gregkh-2.6 tree

2008-02-19 Thread Greg KH

On Wed, Feb 20, 2008 at 01:24:48PM +0800, Shaohua Li wrote:
> 
> On Tue, 2008-02-19 at 21:04 -0800, Greg KH wrote:
> > On Wed, Feb 20, 2008 at 12:48:21PM +0800, Shaohua Li wrote:
> > > 
> > > On Tue, 2008-02-19 at 20:14 -0800, Greg KH wrote:
> > > > On Wed, Feb 20, 2008 at 09:36:07AM +0800, Shaohua Li wrote:
> > > > > --- linux.orig/include/linux/pci-acpi.h   2008-02-19 11:03:51.0 +0800
> > > > > +++ linux/include/linux/pci-acpi.h  2008-02-20 09:19:15.0 +0800
> > > > > @@ -47,6 +47,7 @@
> > > > >   OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL)
> > > > >  
> > > > >  #ifdef CONFIG_ACPI
> > > > > +#include 
> > > > >  extern acpi_status pci_osc_control_set(acpi_handle handle, u32 flags);
> > > > >  extern acpi_status __pci_osc_support_set(u32 flags, const char *hid);
> > > > >  static inline acpi_status pci_osc_support_set(u32 flags)
> > > > > @@ -59,13 +60,11 @@ static inline acpi_status pcie_osc_suppo
> > > > >  }
> > > > >  #else
> > > > >  #if !defined(AE_ERROR)
> > > > > -typedef u32  acpi_status;
> > > > > -#define AE_ERROR (acpi_status) (0x0001)
> > > > > -#endif
> > > > > -static inline acpi_status pci_osc_control_set(acpi_handle handle, u32 flags)
> > > > > -{return AE_ERROR;}
> > > > > -static inline acpi_status pci_osc_support_set(u32 flags) {return AE_ERROR;} 
> > > > > -static inline acpi_status pcie_osc_support_set(u32 flags) {return AE_ERROR;}
> > > > > +#define AE_ERROR (0x0001)
> > > > > +#endif
> > > > > +#define pci_osc_control_set(handle, flags) (AE_ERROR)
> > > > > +#define pci_osc_support_set(flags) (AE_ERROR)
> > > > > +#define pcie_osc_support_set(flags) (AE_ERROR)
> > > > 
> > > > No, please use inline functions, don't change these functions that
> > > > should be just fine.  Why are you needing to change them?
> > > some types aren't defined in non-ACPI, like acpi_handle, acpi_status.
> > 
> > Then why include a non-ACPI header file in non-ACPI .c files?
> aspm is generic, but in ACPI platform, it needs special handling. I can
> add 'ifdef CONFIG_ACPI' in aspm.c to avoid changing pci-acpi.h, but
> thought it's better pci-acpi.h is self-contained.

Ugh, "generic" stuff needing ACPI, that's an oxymoron...

Will this ever work on non-ACPI systems?  If so, then I expect to see
some #ifdefs in the .c file (or split it into two) to handle that.  If
not, why not only include it if ACPI is enabled, as that means it really
is a dependency :(
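For reference, the header pattern being defended looks roughly like this: a real prototype when the option is enabled and a `static inline` stub otherwise. Every name here (`CONFIG_FAKE_ACPI`, the `fake_` types and function) is hypothetical. Note that the stub still needs the handle and status types to be declared, which is the gap Shaohua ran into; the suggestion above is to keep the ACPI header out of non-ACPI builds rather than turn the stubs into macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; in the kernel these types come from ACPICA. */
typedef uint32_t fake_acpi_status;
typedef void *fake_acpi_handle;
#define FAKE_AE_ERROR ((fake_acpi_status)0x0001)

/* Leave CONFIG_FAKE_ACPI undefined to build the non-ACPI branch. */
#ifdef CONFIG_FAKE_ACPI
extern fake_acpi_status fake_osc_control_set(fake_acpi_handle handle,
					     uint32_t flags);
#else
/* A static inline stub is still a real function: callers get their
 * argument types checked and their arguments evaluated exactly once,
 * unlike a macro such as "#define f(handle, flags) (AE_ERROR)". */
static inline fake_acpi_status fake_osc_control_set(fake_acpi_handle handle,
						    uint32_t flags)
{
	(void)handle;
	(void)flags;
	return FAKE_AE_ERROR;
}
#endif
```

With the macro form, an argument such as `++calls` would silently never be evaluated; with the inline stub it is evaluated once, as a caller would expect.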

thanks,

greg k-h

Re: [PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread YAMAMOTO Takashi

> These patches add a new cgroup control file output type - a map from
> strings to u64 values - and make use of it for the memory controller
> "stat" file.
> 
> It is intended for use when the subsystem wants to return a collection
> of values that are related in some way, for which a separate control
> file for each value would make the reporting unwieldy.
> 
> The advantages of this are:
> 
> - more standardized output from control files that report
> similarly-structured data
> 
> - less boilerplate required in cgroup subsystems
> 
> - simplifies transition to a future efficient cgroups binary API
> 
> Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

it changes the format from "%s %lld" to "%s: %llu", right?
why?

YAMAMOTO Takashi

Re: [PATCH] exporting capability code/name pairs (try #6)

2008-02-19 Thread Kohei KaiGai


Greg KH wrote:
> On Wed, Feb 20, 2008 at 01:38:59PM +0900, Kohei KaiGai wrote:
>>>> If we can have a private member in kobj_attribute, we can found the
>>>> content to be returned in a single step.
>>> Ok, again, just send me a patch that adds this functionality and we will
>>> be very glad to consider it.
>> [1/2] Add a private data field within kobj_attribute structure.
>>
>> This patch add a private data field, declared as void *, within kobj_attribute
>> structure. Anyone wants to use sysfs can store their private data to refer at
>> _show() and _store() method.
>> It enables to share a single method function with several similar entries,
>> like ones to export the list of capabilities the running kernel supported.
>
> But your patch 2/2 doesn't use this interface, why not?


Really?
The following two _show() methods, shared by all the capability entries,
refer to the private member of kobj_attribute.

| +static ssize_t capability_name_show(struct kobject *kobj,
| +struct kobj_attribute *attr,
| +char *buffer)
| +{
| +/* It returns numerical representation of capability. */
| +return scnprintf(buffer, PAGE_SIZE, "%d\n", (int) attr->data);
| +}
| +
| +static ssize_t capability_code_show(struct kobject *kobj,
| +struct kobj_attribute *attr,
| +char *buffer)
| +{
| +/* It returns symbolic representation of capability. */
| +return scnprintf(buffer, PAGE_SIZE, "%s\n", (char *) attr->data);
| +}
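To illustrate what the new field buys, here is a userspace sketch with a hypothetical, trimmed `struct kobj_attr` (not the kernel's `kobj_attribute`), example entries, and an illustrative `read_attr()` standing in for sysfs dispatch: one `show()` per output format, with each entry's value carried in `data`. The integer is stored through `intptr_t` so the cast round-trips cleanly on 64-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Trimmed, hypothetical stand-in for struct kobj_attribute plus the
 * proposed "void *data" member; not the real kernel definition. */
struct kobj_attr {
	const char *name;
	int (*show)(struct kobj_attr *attr, char *buf, size_t len);
	void *data;
};

/* Shared by every numeric entry: the value lives in attr->data. */
static int show_int(struct kobj_attr *attr, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", (int)(intptr_t)attr->data);
}

/* Shared by every string entry. */
static int show_str(struct kobj_attr *attr, char *buf, size_t len)
{
	return snprintf(buf, len, "%s\n", (const char *)attr->data);
}

static struct kobj_attr attrs[] = {
	{ "cap_chown", show_int, (void *)(intptr_t)0 },
	{ "cap_kill",  show_int, (void *)(intptr_t)5 },
	{ "0",         show_str, (void *)"cap_chown" },
};

/* Stand-in for sysfs dispatch: look the file up by name, call its show(). */
static int read_attr(const char *name, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
		if (strcmp(attrs[i].name, name) == 0)
			return attrs[i].show(&attrs[i], buf, len);
	return -1;
}
```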


>>  include/linux/kobject.h |    1 +
>>  include/linux/sysfs.h   |    7 +++++++
>>  2 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>> index caa3f41..57d5bf1 100644
>> --- a/include/linux/kobject.h
>> +++ b/include/linux/kobject.h
>> @@ -130,6 +130,7 @@ struct kobj_attribute {
>>                         char *buf);
>>         ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
>>                          const char *buf, size_t count);
>> +       void *data;     /* a private field */
>
> Hm, can you really use this?

Yes,

>>  extern struct sysfs_ops kobj_sysfs_ops;
>> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
>> index 8027104..6f40ff9 100644
>> --- a/include/linux/sysfs.h
>> +++ b/include/linux/sysfs.h
>> @@ -50,6 +50,13 @@ struct attribute_group {
>>         .store  = _store,                                       \
>>  }
>>
>> +#define __ATTR_DATA(_name,_mode,_show,_store,_data) {          \
>> +       .attr = {.name = __stringify(_name), .mode = _mode },   \
>> +       .show   = _show,                                        \
>> +       .store  = _store,                                       \
>> +       .data   = (void *)(_data),                              \
>> +}
>
> I don't see how this would be any different from the original?  You are
> always passed a kobject, which can be embedded in anything else.


The intention of the latest patch is the same as that of the version
which used the capability_attribute structure: it lets the content to be
returned be stored in the extra field. Using kobj_attribute removes the
need to declare my own structure.

However, in the previous version every entry had its own _show() method,
generated automatically by macros. That is the fundamental difference
from the latest one.
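For comparison, the usual way to attach per-file data without a new field is to embed the attribute in a larger structure and recover it with `container_of()`, which is roughly what the earlier capability_attribute version did. A userspace sketch; `struct cap_attr`, the trimmed `struct kobj_attr` and the local `container_of` definition are all assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace copy of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed, hypothetical attribute type without a private data field. */
struct kobj_attr {
	const char *name;
	int (*show)(struct kobj_attr *attr, char *buf, size_t len);
};

/* Hypothetical wrapper carrying the per-file value. */
struct cap_attr {
	struct kobj_attr attr;
	int code;
};

static int cap_code_show(struct kobj_attr *attr, char *buf, size_t len)
{
	assert(attr != NULL);
	/* Recover the wrapper from the embedded member. */
	struct cap_attr *ca = container_of(attr, struct cap_attr, attr);

	return snprintf(buf, len, "%d\n", ca->code);
}

static struct cap_attr cap_kill_attr = {
	.attr = { .name = "cap_kill", .show = cap_code_show },
	.code = 5,
};
```

The wrapper keeps the value typed (`int code`) but needs its own structure and initializers; a `void *data` field avoids the wrapper at the cost of a cast in every `show()`.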


> Could you also modify the documentation and the sample code to use this
> new field, showing how it is to be used, and testing that it works
> properly at the same time?


OK, Please wait for a while.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

linux-next: Tree for Feb 20

2008-02-19 Thread Stephen Rothwell

Hi all,

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
allmodconfig for both powerpc and x86_64.

There were no merge conflicts and only one build failure!

We are up to 27 trees, more are welcome (even if they are currently
empty).  I would encourage architecture maintainers, in particular, to
set up a git branch or quilt tree now to avoid the rush after RC2 :-)

I will stop making these announcements now unless there is some change to
the tree or things people should know.  There should be a new tree every
(Australian Capital Territory) working day.

-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



[PATCH 0/2] cgroup map files: Add a key/value map file type to cgroups

2008-02-19 Thread menage

These patches add a new cgroup control file output type - a map from
strings to u64 values - and make use of it for the memory controller
"stat" file.

It is intended for use when the subsystem wants to return a collection
of values that are related in some way, for which a separate control
file for each value would make the reporting unwieldy.

The advantages of this are:

- more standardized output from control files that report
similarly-structured data

- less boilerplate required in cgroup subsystems

- simplifies transition to a future efficient cgroups binary API

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>


[PATCH 2/2] cgroup map files: Use cgroup map for memcontrol stats file

2008-02-19 Thread menage

Remove the seq_file boilerplate used to construct the memcontrol stats
map, and instead use the new map representation for cgroup control
files

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 mm/memcontrol.c |   30 ++
 1 file changed, 6 insertions(+), 24 deletions(-)

Index: cgroupmap-2.6.25-rc2-mm1/mm/memcontrol.c
===
--- cgroupmap-2.6.25-rc2-mm1.orig/mm/memcontrol.c
+++ cgroupmap-2.6.25-rc2-mm1/mm/memcontrol.c
@@ -974,9 +974,9 @@ static const struct mem_cgroup_stat_desc
[MEM_CGROUP_STAT_RSS] = { "rss", PAGE_SIZE, },
 };
 
-static int mem_control_stat_show(struct seq_file *m, void *arg)
+static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
+struct cgroup_map_cb *cb)
 {
-   struct cgroup *cont = m->private;
struct mem_cgroup *mem_cont = mem_cgroup_from_cont(cont);
struct mem_cgroup_stat *stat = &mem_cont->stat;
int i;
@@ -986,8 +986,7 @@ static int mem_control_stat_show(struct 
 
val = mem_cgroup_read_stat(stat, i);
val *= mem_cgroup_stat_desc[i].unit;
-   seq_printf(m, "%s %lld\n", mem_cgroup_stat_desc[i].msg,
-   (long long)val);
+   cb->fill(cb, mem_cgroup_stat_desc[i].msg, val);
}
/* showing # of active pages */
{
@@ -997,29 +996,12 @@ static int mem_control_stat_show(struct 
MEM_CGROUP_ZSTAT_INACTIVE);
active = mem_cgroup_get_all_zonestat(mem_cont,
MEM_CGROUP_ZSTAT_ACTIVE);
-   seq_printf(m, "active %ld\n", (active) * PAGE_SIZE);
-   seq_printf(m, "inactive %ld\n", (inactive) * PAGE_SIZE);
+   cb->fill(cb, "active", (active) * PAGE_SIZE);
+   cb->fill(cb, "inactive", (inactive) * PAGE_SIZE);
}
return 0;
 }
 
-static const struct file_operations mem_control_stat_file_operations = {
-   .read = seq_read,
-   .llseek = seq_lseek,
-   .release = single_release,
-};
-
-static int mem_control_stat_open(struct inode *unused, struct file *file)
-{
-   /* XXX __d_cont */
-   struct cgroup *cont = file->f_dentry->d_parent->d_fsdata;
-
-   file->f_op = &mem_control_stat_file_operations;
-   return single_open(file, mem_control_stat_show, cont);
-}
-
-
-
 static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -1044,7 +1026,7 @@ static struct cftype mem_cgroup_files[] 
},
{
.name = "stat",
-   .open = mem_control_stat_open,
+   .read_map = mem_control_stat_show,
},
 };
 


[PATCH 1/2] cgroup map files: Add cgroup map data type

2008-02-19 Thread menage

Adds a new type of supported control file representation, a map from
strings to u64 values.
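The callback indirection is easy to exercise outside the kernel. In this sketch the struct has the same shape as the `cgroup_map_cb` added below, while `buf_fill()`, `struct buf_state` and `demo_read_map()` are hypothetical: a fixed buffer replaces the seq_file used by `cgroup_map_add()`, and the keys and values are made up. The subsystem side only calls `cb->fill()`, so the output format is decided in one place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Same shape as struct cgroup_map_cb in this patch. */
struct cgroup_map_cb {
	int (*fill)(struct cgroup_map_cb *cb, const char *key, u64 value);
	void *state;
};

/* Hypothetical fixed-size buffer standing in for the seq_file. */
struct buf_state {
	char *buf;
	size_t len;
	size_t used;
};

/* Plays the role of cgroup_map_add(): one "key: value" line per call. */
static int buf_fill(struct cgroup_map_cb *cb, const char *key, u64 value)
{
	struct buf_state *s = cb->state;
	size_t room;
	int n;

	assert(s->used <= s->len);
	room = s->len - s->used;
	n = snprintf(s->buf + s->used, room, "%s: %llu\n",
		     key, (unsigned long long)value);
	if (n < 0 || (size_t)n >= room)
		return -1;          /* line did not fit */
	s->used += (size_t)n;
	return 0;
}

/* A hypothetical read_map(): it reports pairs and never formats them. */
static int demo_read_map(struct cgroup_map_cb *cb)
{
	int err = cb->fill(cb, "cache", 8192);

	if (!err)
		err = cb->fill(cb, "rss", 4096);
	return err;
}
```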

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 include/linux/cgroup.h |   19 +++
 kernel/cgroup.c|   59 -
 2 files changed, 77 insertions(+), 1 deletion(-)

Index: cgroupmap-2.6.25-rc2-mm1/include/linux/cgroup.h
===
--- cgroupmap-2.6.25-rc2-mm1.orig/include/linux/cgroup.h
+++ cgroupmap-2.6.25-rc2-mm1/include/linux/cgroup.h
@@ -166,6 +166,16 @@ struct css_set {
 
 };
 
+/*
+ * cgroup_map_cb is an abstract callback API for reporting map-valued
+ * control files
+ */
+
+struct cgroup_map_cb {
+   int (*fill)(struct cgroup_map_cb *cb, const char *key, u64 value);
+   void *state;
+};
+
 /* struct cftype:
  *
  * The files in the cgroup filesystem mostly have a very simple read/write
@@ -194,6 +204,15 @@ struct cftype {
 * single integer. Use it in place of read()
 */
u64 (*read_uint) (struct cgroup *cont, struct cftype *cft);
+   /*
+* read_map() is used for defining a map of key/value
+* pairs. It should call cb->fill(cb, key, value) for each
+* entry. The key/value pairs (and their ordering) should not
+* change between reboots.
+*/
+   int (*read_map) (struct cgroup *cont, struct cftype *cft,
+struct cgroup_map_cb *cb);
+
ssize_t (*write) (struct cgroup *cont, struct cftype *cft,
  struct file *file,
  const char __user *buf, size_t nbytes, loff_t *ppos);
Index: cgroupmap-2.6.25-rc2-mm1/kernel/cgroup.c
===
--- cgroupmap-2.6.25-rc2-mm1.orig/kernel/cgroup.c
+++ cgroupmap-2.6.25-rc2-mm1/kernel/cgroup.c
@@ -1487,6 +1487,46 @@ static ssize_t cgroup_file_read(struct f
return -EINVAL;
 }
 
+/*
+ * seqfile ops/methods for returning structured data. Currently just
+ * supports string->u64 maps, but can be extended in future.
+ */
+
+struct cgroup_seqfile_state {
+   struct cftype *cft;
+   struct cgroup *cgroup;
+};
+
+static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
+{
+   struct seq_file *sf = cb->state;
+   return seq_printf(sf, "%s: %llu\n", key, value);
+}
+
+static int cgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+   struct cgroup_seqfile_state *state = m->private;
+   struct cftype *cft = state->cft;
+   struct cgroup_map_cb cb = {
+   .fill = cgroup_map_add,
+   .state = m,
+   };
+   if (cft->read_map) {
+   return cft->read_map(state->cgroup, cft, &cb);
+   } else {
+   BUG();
+   }
+}
+
+int cgroup_seqfile_release(struct inode *inode, struct file *file)
+{
+   struct seq_file *seq = file->private_data;
+   kfree(seq->private);
+   return single_release(inode, file);
+}
+
+static struct file_operations cgroup_seqfile_operations;
+
 static int cgroup_file_open(struct inode *inode, struct file *file)
 {
int err;
@@ -1499,7 +1539,18 @@ static int cgroup_file_open(struct inode
cft = __d_cft(file->f_dentry);
if (!cft)
return -ENODEV;
-   if (cft->open)
+   if (cft->read_map) {
+   struct cgroup_seqfile_state *state =
+   kzalloc(sizeof(*state), GFP_USER);
+   if (!state)
+   return -ENOMEM;
+   state->cft = cft;
+   state->cgroup = __d_cgrp(file->f_dentry->d_parent);
+   file->f_op = &cgroup_seqfile_operations;
+   err = single_open(file, cgroup_seqfile_show, state);
+   if (err < 0)
+   kfree(state);
+   } else if (cft->open)
err = cft->open(inode, file);
else
err = 0;
@@ -1538,6 +1589,12 @@ static struct file_operations cgroup_fil
.release = cgroup_file_release,
 };
 
+static struct file_operations cgroup_seqfile_operations = {
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = cgroup_seqfile_release,
+};
+
 static struct inode_operations cgroup_dir_inode_operations = {
.lookup = simple_lookup,
.mkdir = cgroup_mkdir,


Re: [PATCH] [LMB]: Fix lmb_add_region if region should be added at the head

2008-02-19 Thread Kumar Gala

On Feb 19, 2008, at 11:26 PM, David Miller wrote:

> From: Kumar Gala <[EMAIL PROTECTED]>
> Date: Tue, 19 Feb 2008 23:16:18 -0600
>
>> The for loop above the code I added will move all the existing slots
>> up one.  Its just the tail cleanup we are missing.
>
> Aha, I see how this works now, thanks!
>
> I'll add this to my LMB tree.

Sounds good.  Now just convince Paul or Linus to pull this in for
2.6.25 :)

- k

Re: [PATCH] [LMB]: Fix lmb_add_region if region should be added at the head

2008-02-19 Thread David Miller

From: Kumar Gala <[EMAIL PROTECTED]>
Date: Tue, 19 Feb 2008 23:16:18 -0600

> The for loop above the code I added will move all the existing slots  
> up one.  Its just the tail cleanup we are missing.

Aha, I see how this works now, thanks!

I'll add this to my LMB tree.
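The shifting described above is ordinary insertion into a sorted array. A simplified sketch, with a hypothetical `struct region_table` and `region_insert()` and no coalescing, so not the actual lmb code: the loop moves every entry above the new base up one slot, and the slot left free is index 0 when the new region lies below all existing ones, which is the head case the patch adds.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 8

struct region {
	uint64_t base;
	uint64_t size;
};

struct region_table {
	int cnt;
	struct region r[MAX_REGIONS];
};

/* Insert [base, base + size) keeping r[] sorted by base (no coalescing).
 * Returns 0 on success, -1 if the table is full. */
static int region_insert(struct region_table *t, uint64_t base, uint64_t size)
{
	int i;

	assert(t->cnt >= 0);
	if (t->cnt >= MAX_REGIONS)
		return -1;
	/* Shift every entry above 'base' up one slot, from the top down. */
	for (i = t->cnt - 1; i >= 0 && base < t->r[i].base; i--)
		t->r[i + 1] = t->r[i];
	/* i + 1 is the free slot; it is 0 when base is below every entry. */
	t->r[i + 1].base = base;
	t->r[i + 1].size = size;
	t->cnt++;
	return 0;
}
```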

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patch added to gregkh-2.6 tree

2008-02-19 Thread Shaohua Li


On Tue, 2008-02-19 at 21:04 -0800, Greg KH wrote:
> On Wed, Feb 20, 2008 at 12:48:21PM +0800, Shaohua Li wrote:
> > 
> > On Tue, 2008-02-19 at 20:14 -0800, Greg KH wrote:
> > > On Wed, Feb 20, 2008 at 09:36:07AM +0800, Shaohua Li wrote:
> > > > --- linux.orig/include/linux/pci-acpi.h 2008-02-19 11:03:51.0 +0800
> > > > +++ linux/include/linux/pci-acpi.h  2008-02-20 09:19:15.0 +0800
> > > > @@ -47,6 +47,7 @@
> > > > OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL)
> > > >  
> > > >  #ifdef CONFIG_ACPI
> > > > +#include 
> > > >  extern acpi_status pci_osc_control_set(acpi_handle handle, u32 flags);
> > > >  extern acpi_status __pci_osc_support_set(u32 flags, const char *hid);
> > > >  static inline acpi_status pci_osc_support_set(u32 flags)
> > > > @@ -59,13 +60,11 @@ static inline acpi_status pcie_osc_suppo
> > > >  }
> > > >  #else
> > > >  #if !defined(AE_ERROR)
> > > > -typedef u32acpi_status;
> > > > -#define AE_ERROR   (acpi_status) (0x0001)
> > > > -#endif
> > > > -static inline acpi_status pci_osc_control_set(acpi_handle handle, u32 flags)
> > > > -{return AE_ERROR;}
> > > > -static inline acpi_status pci_osc_support_set(u32 flags) {return AE_ERROR;} 
> > > > -static inline acpi_status pcie_osc_support_set(u32 flags) {return AE_ERROR;}
> > > > +#define AE_ERROR   (0x0001)
> > > > +#endif
> > > > +#define pci_osc_control_set(handle, flags) (AE_ERROR)
> > > > +#define pci_osc_support_set(flags) (AE_ERROR)
> > > > +#define pcie_osc_support_set(flags) (AE_ERROR)
> > > 
> > > No, please use inline functions, don't change these functions that
> > > should be just fine.  Why are you needing to change them?
> > some types aren't defined in non-ACPI, like acpi_handle, acpi_status.
> 
> Then why include a non-ACPI header file in non-ACPI .c files?
ASPM is generic, but on ACPI platforms it needs special handling. I could
add '#ifdef CONFIG_ACPI' in aspm.c to avoid changing pci-acpi.h, but I
thought it was better for pci-acpi.h to be self-contained.

Thanks,
Shaohua


Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Menage

On Feb 19, 2008 9:17 PM, Paul Jackson <[EMAIL PROTECTED]> wrote:
>
> Perhaps my primary concern with these *.api files was that I did not
> understand who or what the critical use or user was; who found this
> essential, not just nice to have.
>

Right now, no-one would find it essential. If/when a binary API is
added, I guess I'll resurrect this part of the patchset.

Paul

Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-19 Thread KOSAKI Motohiro

> Did those jobs share nodes -- sometimes two or more jobs using the same
> nodes?  I am sure SGI has such users too, though such job mixes make
> the runtimes of specific jobs less obvious, so customers are more
> tolerant of variations and some inefficiencies, as they get hidden in
> the mix.

Hm,
our dedicated-node users set the memory limit to the machine's physical
memory size (minus a bit).

I think don't have so much share/dedicate and watch user-defined/swap.
Am I misunderstanding?



Re: [RFC][PATCH 1/7] CGroup API: Add cgroup.api control file

2008-02-19 Thread Paul Jackson

Paul M wrote:
> I guess it's not essential, I just figured that if we had that
> information, it made sense to make it available to userspace. I guess
> I'm happy with dropping the actual exposed cgroup.api file for now as
> long as we can work towards reducing the number of control files that
> just return strings, and make use of the structured output such as
> read_uint() miore.

I could certainly go along with that ... reducing the proportion of
control files returning untyped strings.

My sense of kernel-user API's is that usually the less said the better.
Identify the essential information that one side requires from the
other via a runtime API, and pass only that.  API's represent a
lifetime commitment, so the less promised the better.

Perhaps my primary concern with these *.api files was that I did not
understand who or what the critical use or user was; who found this
essential, not just nice to have.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214

[PATCH] -mm: fix nommu path broken by procfs task exe symlink

2008-02-19 Thread Matt Helsley

Hi Andrew,

nommu configurations will not compile because the "mm" variable does not
exist. Replace usage of the mm variable and the empty vma->vm_mm field
with correct mm pointers.

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
Cc: Mike Frysinger <[EMAIL PROTECTED]>
---
Needs testing on a nommu system. I am working on getting an emulated nommu
environment built but it is not coming together quickly.

 mm/nommu.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

Index: linux-2.6.24-mm1/mm/nommu.c
===
--- linux-2.6.24-mm1.orig/mm/nommu.c
+++ linux-2.6.24-mm1/mm/nommu.c
@@ -962,12 +962,14 @@ unsigned long do_mmap_pgoff(struct file 
 
INIT_LIST_HEAD(&vma->anon_vma_node);
atomic_set(&vma->vm_usage, 1);
if (file) {
get_file(file);
-   if (vm_flags & VM_EXECUTABLE)
-   added_exe_file_vma(mm);
+   if (vm_flags & VM_EXECUTABLE) {
+   added_exe_file_vma(current->mm);
+   vma->vm_mm = current->mm;
+   }
}
vma->vm_file= file;
vma->vm_flags   = vm_flags;
vma->vm_start   = addr;
vma->vm_end = addr + len;
@@ -1053,11 +1055,11 @@ unsigned long do_mmap_pgoff(struct file 
 EXPORT_SYMBOL(do_mmap_pgoff);
 
 /*
  * handle mapping disposal for uClinux
  */
-static void put_vma(struct vm_area_struct *vma)
+static void put_vma(struct mm_struct *mm, struct vm_area_struct *vma)
 {
if (vma) {
down_write(&nommu_vma_sem);
 
if (atomic_dec_and_test(&vma->vm_usage)) {
@@ -1078,11 +1080,11 @@ static void put_vma(struct vm_area_struc
askedalloc -= sizeof(*vma);
 
if (vma->vm_file) {
fput(vma->vm_file);
if (vma->vm_flags & VM_EXECUTABLE)
-   removed_exe_file_vma(vma->vm_mm);
+   removed_exe_file_vma(mm);
}
kfree(vma);
}
 
up_write(&nommu_vma_sem);
@@ -1116,11 +1118,11 @@ int do_munmap(struct mm_struct *mm, unsi
return -EINVAL;
 
  found:
vml = *parent;
 
-   put_vma(vml->vma);
+   put_vma(mm, vml->vma);
 
*parent = vml->next;
realalloc -= kobjsize(vml);
askedalloc -= sizeof(*vml);
kfree(vml);
@@ -1161,11 +1163,11 @@ void exit_mmap(struct mm_struct * mm)
 
mm->total_vm = 0;
 
while ((tmp = mm->context.vmlist)) {
mm->context.vmlist = tmp->next;
-   put_vma(tmp->vma);
+   put_vma(mm, tmp->vma);
 
realalloc -= kobjsize(tmp);
askedalloc -= sizeof(*tmp);
kfree(tmp);
}



Re: [PATCH] [LMB]: Fix lmb_add_region if region should be added at the head

2008-02-19 Thread Kumar Gala

On Feb 19, 2008, at 10:45 PM, David Miller wrote:

> From: Kumar Gala <[EMAIL PROTECTED]>
> Date: Tue, 19 Feb 2008 22:27:48 -0600 (CST)
>
> > We introduced a bug in fixing lmb_add_region to handle an initial
> > region being non-zero.  Before that fix it was impossible to insert
> > a region at the head of the list since the first region always started
> > at zero.
> >
> > Now that its possible for the first region to be non-zero we need to
> > check to see if the new region should be added at the head and if so
> > actually add it.
> >
> > Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>
>
> ...
> > @@ -184,6 +184,11 @@ static long __init lmb_add_region(struct lmb_region *rgn, u64 base, u64 size)
> > break;
> > }
> > }
> > +
> > +   if (base < rgn->region[0].base) {
> > +   rgn->region[0].base = base;
> > +   rgn->region[0].size = size;
> > +   }
> > rgn->cnt++;
> >
> > return 0;
>
> Are you sure this is sufficient?
>
> It seems to me, to handle this properly, you'll need to handle
> the case where the lower addressed entry you are inserting is
> not contiguous with the existing entry 0.
>
> Therefore, you need to move all existing entries up a slot,
> then you can set the 0 entry to 'base' and 'size'.

The for loop above the code I added will move all the existing slots
up one.  It's just the tail cleanup we are missing.
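Kumar's point is that the shifting loop already opens a slot at the front. Only the case where the loop runs off the front needs handling afterwards. Below is a userspace model of a sorted insert with that tail cleanup. The names `struct region_list`, `add_region_sorted()` and `MAX_REGIONS` are hypothetical. The model also leaves out the adjacent-region coalescing that the real lmb_add_region() performs before this point.

```c
#include <stdint.h>

#define MAX_REGIONS 8   /* hypothetical capacity */

struct region {
	uint64_t base;
	uint64_t size;
};

struct region_list {
	unsigned long cnt;
	struct region region[MAX_REGIONS];
};

/* Insert [base, base + size) keeping entries sorted by base.
 * Assumes the caller has already ruled out merging with a neighbour. */
static int add_region_sorted(struct region_list *rgn, uint64_t base, uint64_t size)
{
	long i;

	if (rgn->cnt >= MAX_REGIONS)
		return -1;

	/* Walk from the tail, moving every entry above 'base' up one slot. */
	for (i = (long)rgn->cnt - 1; i >= 0; i--) {
		if (base < rgn->region[i].base) {
			rgn->region[i + 1] = rgn->region[i];
		} else {
			rgn->region[i + 1].base = base;
			rgn->region[i + 1].size = size;
			break;
		}
	}

	/* Tail cleanup: the loop fell off the front, so 'base' is the new head. */
	if (i < 0) {
		rgn->region[0].base = base;
		rgn->region[0].size = size;
	}
	rgn->cnt++;
	return 0;
}
```

In this model the `i < 0` test is equivalent to the patch's `base < rgn->region[0].base` check. After a full shift, slot 0 still holds the old head, which is larger than `base`. The `i < 0` form also covers an empty list.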

- k

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patchadded to gregkh-2.6 tree

2008-02-19 Thread Greg KH

On Wed, Feb 20, 2008 at 12:48:21PM +0800, Shaohua Li wrote:
> 
> On Tue, 2008-02-19 at 20:14 -0800, Greg KH wrote:
> > On Wed, Feb 20, 2008 at 09:36:07AM +0800, Shaohua Li wrote:
> > > --- linux.orig/include/linux/pci-acpi.h   2008-02-19 11:03:51.0 +0800
> > > +++ linux/include/linux/pci-acpi.h   2008-02-20 09:19:15.0 +0800
> > > @@ -47,6 +47,7 @@
> > >   OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL)
> > >  
> > >  #ifdef CONFIG_ACPI
> > > +#include 
> > >  extern acpi_status pci_osc_control_set(acpi_handle handle, u32 flags);
> > >  extern acpi_status __pci_osc_support_set(u32 flags, const char *hid);
> > >  static inline acpi_status pci_osc_support_set(u32 flags)
> > > @@ -59,13 +60,11 @@ static inline acpi_status pcie_osc_suppo
> > >  }
> > >  #else
> > >  #if !defined(AE_ERROR)
> > > -typedef u32  acpi_status;
> > > -#define AE_ERROR (acpi_status) (0x0001)
> > > -#endif
> > > -static inline acpi_status pci_osc_control_set(acpi_handle handle, u32 flags)
> > > -{return AE_ERROR;}
> > > -static inline acpi_status pci_osc_support_set(u32 flags) {return AE_ERROR;}
> > > -static inline acpi_status pcie_osc_support_set(u32 flags) {return AE_ERROR;}
> > > +#define AE_ERROR (0x0001)
> > > +#endif
> > > +#define pci_osc_control_set(handle, flags) (AE_ERROR)
> > > +#define pci_osc_support_set(flags) (AE_ERROR)
> > > +#define pcie_osc_support_set(flags) (AE_ERROR)
> > 
> > No, please use inline functions, don't change these functions that
> > should be just fine.  Why are you needing to change them?
> some types aren't defined in non-ACPI, like acpi_handle, acpi_status.

Then why include a non-ACPI header file in non-ACPI .c files?
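Greg wants inline stubs, and Shaohua points out that the types are undefined without ACPI. One way to satisfy both is to define fallback types only in the !ACPI branch, as the pre-patch header already did for acpi_status. The sketch below works under that assumption. `MODEL_CONFIG_ACPI` and all `_model` names are hypothetical, and this is not presented as the fix that was eventually merged.

```c
#include <stdint.h>

/* Hypothetical config switch; the kernel header tests CONFIG_ACPI. */
/* #define MODEL_CONFIG_ACPI 1 */

#ifdef MODEL_CONFIG_ACPI
/* Real prototypes would come from the ACPI headers here. */
#else
/* Fallback types exist only when ACPI is configured out. */
typedef uint32_t acpi_status_model;
typedef void *acpi_handle_model;
#define AE_ERROR_MODEL ((acpi_status_model)0x0001)

/* Inline stubs keep full type checking of their arguments. */
static inline acpi_status_model
pci_osc_control_set_model(acpi_handle_model handle, uint32_t flags)
{
	(void)handle;
	(void)flags;
	return AE_ERROR_MODEL;
}

static inline acpi_status_model pci_osc_support_set_model(uint32_t flags)
{
	(void)flags;
	return AE_ERROR_MODEL;
}
#endif
```

Unlike the function-like macros in the proposed patch, the inline versions still type-check their arguments rather than silently discarding them.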

thanks,

greg k-h

[PATCH 2/2] Cpusets API: Update cpusets to use cgroup structured file API

2008-02-19 Thread Paul Menage

Many of the cpusets control files are simple integer values, which
don't require the overhead of memory allocations for reads and writes.

Move the handlers for these control files into cpuset_read_uint() and
cpuset_write_uint(). This also has the advantage that the control
files show up as "u64" rather than "string" in the cgroup.api file.
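The core of the new write path is a switch on the file type that receives an already-parsed u64. Below is a userspace model covering a hypothetical subset of file types and flag bits. `struct cpuset_model`, `write_uint_model()` and the bit numbers are illustrative only.

One detail is worth flagging. The patch passes the u64 `val` straight into update_flag()'s `int turning_on` parameter. With the usual truncating conversion, a value such as 4294967296 would arrive as 0 and clear the flag. The sketch passes `val != 0` instead.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of cpuset control files. */
enum file_type {
	FILE_CPU_EXCLUSIVE,
	FILE_MEM_EXCLUSIVE,
	FILE_MEMORY_PRESSURE_ENABLED,
	FILE_MEMORY_PRESSURE,
	FILE_MEMLIST,              /* string file: not handled by the uint path */
};

enum { CS_CPU_EXCLUSIVE_BIT, CS_MEM_EXCLUSIVE_BIT };

struct cpuset_model {
	unsigned long flags;
};

static int memory_pressure_enabled;

static int update_flag_model(int bit, struct cpuset_model *cs, int turning_on)
{
	if (turning_on)
		cs->flags |= 1UL << bit;
	else
		cs->flags &= ~(1UL << bit);
	return 0;
}

/* No buffer allocation or string parsing: the value arrives as a u64. */
static int write_uint_model(struct cpuset_model *cs, enum file_type type,
			    uint64_t val)
{
	switch (type) {
	case FILE_CPU_EXCLUSIVE:
		return update_flag_model(CS_CPU_EXCLUSIVE_BIT, cs, val != 0);
	case FILE_MEM_EXCLUSIVE:
		return update_flag_model(CS_MEM_EXCLUSIVE_BIT, cs, val != 0);
	case FILE_MEMORY_PRESSURE_ENABLED:
		memory_pressure_enabled = !!val;   /* any nonzero value becomes 1 */
		return 0;
	case FILE_MEMORY_PRESSURE:
		return -EACCES;                    /* read-only file */
	default:
		return -EINVAL;
	}
}
```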

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 kernel/cpuset.c |  156 +---
 1 file changed, 82 insertions(+), 74 deletions(-)

Index: cpusets-2.6.25-rc2-mm1/kernel/cpuset.c
===
--- cpusets-2.6.25-rc2-mm1.orig/kernel/cpuset.c
+++ cpusets-2.6.25-rc2-mm1/kernel/cpuset.c
@@ -999,19 +999,6 @@ int current_cpuset_is_being_rebound(void
 }
 
 /*
- * Call with cgroup_mutex held.
- */
-
-static int update_memory_pressure_enabled(struct cpuset *cs, char *buf)
-{
-   if (simple_strtoul(buf, NULL, 10) != 0)
-   cpuset_memory_pressure_enabled = 1;
-   else
-   cpuset_memory_pressure_enabled = 0;
-   return 0;
-}
-
-/*
  * update_flag - read a 0 or a 1 in a file and update associated flag
  * bit:the bit to update (CS_CPU_EXCLUSIVE, CS_MEM_EXCLUSIVE,
  * CS_SCHED_LOAD_BALANCE,
@@ -1023,15 +1010,13 @@ static int update_memory_pressure_enable
  * Call with cgroup_mutex held.
  */
 
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, char *buf)
+static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
+  int turning_on)
 {
-   int turning_on;
struct cpuset trialcs;
int err;
int cpus_nonempty, balance_flag_changed;
 
-   turning_on = (simple_strtoul(buf, NULL, 10) != 0);
-
trialcs = *cs;
if (turning_on)
set_bit(bit, &trialcs.flags);
@@ -1247,43 +1232,65 @@ static ssize_t cpuset_common_file_write(
case FILE_MEMLIST:
retval = update_nodemask(cs, buffer);
break;
+   default:
+   retval = -EINVAL;
+   goto out2;
+   }
+
+   if (retval == 0)
+   retval = nbytes;
+out2:
+   cgroup_unlock();
+out1:
+   kfree(buffer);
+   return retval;
+}
+
+static int cpuset_write_uint(struct cgroup *cgrp, struct cftype *cft, u64 val)
+{
+   int retval = 0;
+   struct cpuset *cs = cgroup_cs(cgrp);
+   cpuset_filetype_t type = cft->private;
+
+   cgroup_lock();
+
+   if (cgroup_is_removed(cgrp)) {
+   cgroup_unlock();
+   return -ENODEV;
+   }
+
+   switch (type) {
case FILE_CPU_EXCLUSIVE:
-   retval = update_flag(CS_CPU_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
break;
case FILE_MEM_EXCLUSIVE:
-   retval = update_flag(CS_MEM_EXCLUSIVE, cs, buffer);
+   retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
break;
case FILE_SCHED_LOAD_BALANCE:
-   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, buffer);
+   retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
break;
case FILE_MEMORY_MIGRATE:
-   retval = update_flag(CS_MEMORY_MIGRATE, cs, buffer);
+   retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
break;
case FILE_MEMORY_PRESSURE_ENABLED:
-   retval = update_memory_pressure_enabled(cs, buffer);
+   cpuset_memory_pressure_enabled = !!val;
break;
case FILE_MEMORY_PRESSURE:
retval = -EACCES;
break;
case FILE_SPREAD_PAGE:
-   retval = update_flag(CS_SPREAD_PAGE, cs, buffer);
+   retval = update_flag(CS_SPREAD_PAGE, cs, val);
cs->mems_generation = cpuset_mems_generation++;
break;
case FILE_SPREAD_SLAB:
-   retval = update_flag(CS_SPREAD_SLAB, cs, buffer);
+   retval = update_flag(CS_SPREAD_SLAB, cs, val);
cs->mems_generation = cpuset_mems_generation++;
break;
default:
retval = -EINVAL;
-   goto out2;
+   break;
}
-
-   if (retval == 0)
-   retval = nbytes;
-out2:
cgroup_unlock();
-out1:
-   kfree(buffer);
return retval;
 }
 
@@ -1345,30 +1352,6 @@ static ssize_t cpuset_common_file_read(s
case FILE_MEMLIST:
s += cpuset_sprintf_memlist(s, cs);
break;
-   case FILE_CPU_EXCLUSIVE:
-   *s++ = is_cpu_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_MEM_EXCLUSIVE:
-   *s++ = is_mem_exclusive(cs) ? '1' : '0';
-   break;
-   case FILE_SCHED_LOAD_BALANCE:
-   *s++ = is_sched_load_balance(cs) ? '1' : '0';
-   break;
-   case FILE_MEMORY_MIGRATE:
-

[PATCH 1/2] Cpusets API: From: Paul Jackson <[EMAIL PROTECTED]>

2008-02-19 Thread Paul Menage

Strip all trailing whitespace in cgroup_write_uint

This removes the need for people to remember to pass the -n flag to
echo when writing values to cgroup control files.
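The patch replaces the newline-only check with strstrip(). Below is a userspace sketch of the same parse path. `strstrip_model()` imitates the kernel helper: it trims trailing whitespace, then skips leading whitespace and returns the new start. `parse_uint_model()` is a hypothetical name.

The one-line call in the patch discards strstrip()'s return value. The sketch instead parses from the returned pointer. It also rejects an empty write, which is an assumption rather than something the patch does.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Trim trailing whitespace in place, then skip leading whitespace. */
static char *strstrip_model(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

/* Parse a control-file write such as "1\n" (echo without -n). */
static int parse_uint_model(char *buffer, unsigned long long *val)
{
	char *end;
	char *start = strstrip_model(buffer);

	if (*start == '\0')
		return -EINVAL;             /* nothing but whitespace */
	*val = strtoull(start, &end, 0);
	if (*end)
		return -EINVAL;             /* trailing garbage after the number */
	return 0;
}
```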

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---
 kernel/cgroup.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Index: cpusets-2.6.25-rc2-mm1/kernel/cgroup.c
===
--- cpusets-2.6.25-rc2-mm1.orig/kernel/cgroup.c
+++ cpusets-2.6.25-rc2-mm1/kernel/cgroup.c
@@ -1321,10 +1321,7 @@ static ssize_t cgroup_write_uint(struct 
return -EFAULT;
 
buffer[nbytes] = 0; /* nul-terminate */
-
-   /* strip newline if necessary */
-   if (nbytes && (buffer[nbytes-1] == '\n'))
-   buffer[nbytes-1] = 0;
+   strstrip(buffer);
val = simple_strtoull(buffer, &end, 0);
if (*end)
return -EINVAL;

--

[PATCH 0/2] Cpusets API: Update Cpusets control files

2008-02-19 Thread Paul Menage

This pair of patches simplifies the cpusets read/write path for the
control files that consist of simple integers.

--

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patchadded to gregkh-2.6 tree

2008-02-19 Thread Shaohua Li


On Tue, 2008-02-19 at 20:14 -0800, Greg KH wrote:
> On Wed, Feb 20, 2008 at 09:36:07AM +0800, Shaohua Li wrote:
> > --- linux.orig/include/linux/pci-acpi.h 2008-02-19 11:03:51.0 +0800
> > +++ linux/include/linux/pci-acpi.h  2008-02-20 09:19:15.0 +0800
> > @@ -47,6 +47,7 @@
> > OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL)
> >  
> >  #ifdef CONFIG_ACPI
> > +#include 
> >  extern acpi_status pci_osc_control_set(acpi_handle handle, u32 flags);
> >  extern acpi_status __pci_osc_support_set(u32 flags, const char *hid);
> >  static inline acpi_status pci_osc_support_set(u32 flags)
> > @@ -59,13 +60,11 @@ static inline acpi_status pcie_osc_suppo
> >  }
> >  #else
> >  #if !defined(AE_ERROR)
> > -typedef u32acpi_status;
> > -#define AE_ERROR   (acpi_status) (0x0001)
> > -#endif
> > -static inline acpi_status pci_osc_control_set(acpi_handle handle, u32 flags)
> > -{return AE_ERROR;}
> > -static inline acpi_status pci_osc_support_set(u32 flags) {return AE_ERROR;}
> > -static inline acpi_status pcie_osc_support_set(u32 flags) {return AE_ERROR;}
> > +#define AE_ERROR   (0x0001)
> > +#endif
> > +#define pci_osc_control_set(handle, flags) (AE_ERROR)
> > +#define pci_osc_support_set(flags) (AE_ERROR)
> > +#define pcie_osc_support_set(flags) (AE_ERROR)
> 
> No, please use inline functions, don't change these functions that
> should be just fine.  Why are you needing to change them?
some types aren't defined in non-ACPI, like acpi_handle, acpi_status.

Thanks,
Shaohua


Re: [PATCH] exporting capability code/name pairs (try #6)

2008-02-19 Thread Greg KH

On Wed, Feb 20, 2008 at 01:38:59PM +0900, Kohei KaiGai wrote:
> >> If we can have a private member in kobj_attribute, we can found the content
> >> to be returned in a single step.
> >
> > Ok, again, just send me a patch that adds this functionality and we will
> > be very glad to consider it.
>
> [1/2] Add a private data field within kobj_attribute structure.
>
> This patch add a private data field, declared as void *, within kobj_attribute
> structure. Anyone wants to use sysfs can store their private data to refer at
> _show() and _store() method.
> It enables to share a single method function with several similar entries,
> like ones to export the list of capabilities the running kernel supported.

But your patch 2/2 doesn't use this interface, why not?

>  include/linux/kobject.h |1 +
>  include/linux/sysfs.h   |7 +++
>  2 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index caa3f41..57d5bf1 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -130,6 +130,7 @@ struct kobj_attribute {
>   char *buf);
>   ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
>const char *buf, size_t count);
> + void *data; /* a private field */

Hm, can you really use this?

>  extern struct sysfs_ops kobj_sysfs_ops;
> diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
> index 8027104..6f40ff9 100644
> --- a/include/linux/sysfs.h
> +++ b/include/linux/sysfs.h
> @@ -50,6 +50,13 @@ struct attribute_group {
>   .store  = _store,   \
>  }
>
> +#define __ATTR_DATA(_name,_mode,_show,_store,_data) {\
> + .attr = {.name = __stringify(_name), .mode = _mode },   \
> + .show   = _show,\
> + .store  = _store,   \
> + .data   = (void *)(_data),  \
> +}

I don't see how this would be any different from the original?  You are
always passed a kobject, which can be embedded in anything else.
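Greg's alternative relies on the usual embedding idiom. A callback that receives only the kobject can recover the enclosing structure with container_of(). Below is a userspace sketch using hypothetical types (`struct kobject_model`, `struct cap_entry`) and a local `container_of_model()` macro.

```c
#include <stddef.h>

/* Userspace version of the kernel's container_of() idiom. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kobject_model {
	const char *name;
};

/* Hypothetical object that embeds its kobject. */
struct cap_entry {
	int code;
	struct kobject_model kobj;
};

/* Given only the embedded kobject, find the enclosing entry in O(1). */
static int cap_entry_code(struct kobject_model *kobj)
{
	return container_of_model(kobj, struct cap_entry, kobj)->code;
}
```

This recovers per-kobject data. KaiGai's case hangs many attribute files off a single kobject directory, which is why he asked for a per-attribute field instead.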

Could you also modify the documentation and the sample code to use this
new field, showing how it is to be used, and testing that it works
properly at the same time?

thanks,

greg k-h

Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-19 Thread Paul Jackson

Kosaki-san wrote:
> Yes.
> Fujitsu HPC middleware watching sum of memory consumption of the job
> and, if over-consumption happened, kill process and remove job schedule.

Did those jobs share nodes -- sometimes two or more jobs using the same
nodes?  I am sure SGI has such users too, though such job mixes make
the runtimes of specific jobs less obvious, so customers are more
tolerant of variations and some inefficiencies, as they get hidden in
the mix.

In other words, Rik, both yes and no ;).  Both sorts of HPC loads
exist, sharing nodes and a dedicated set of nodes for each job.


[PATCH] [RFC] Smack: unlabeled outgoing ambient packets - v2

2008-02-19 Thread Casey Schaufler


From: Casey Schaufler <[EMAIL PROTECTED]>

Smack uses CIPSO labeling, but allows for unlabeled packets
by specifying an "ambient" label that is applied to incoming
unlabeled packets. Because the other end of the connection
may dislike IP options, and ssh is one known application that
behaves thus, it is prudent to respond in kind. This patch
changes the network labeling behavior such that an outgoing
packet that would be given a CIPSO label that matches the
ambient label is left unlabeled. An "unlbl" domain is added
and the netlabel defaulting mechanism invoked rather than
assuming that everything is CIPSO. Locking has been added
around changes to the ambient label as the mechanisms used
to do so are more involved.
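The labeling rule described above, together with the new lock around the ambient label, can be modeled in userspace. This is a sketch under several assumptions. `net_ambient`, `outgoing_label()` and `set_ambient()` are hypothetical names. A pthread mutex stands in for the kernel mutex. The initial ambient value "_" is also assumed for illustration.

```c
#include <pthread.h>
#include <string.h>

static pthread_mutex_t ambient_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *net_ambient = "_";   /* assumed initial ambient label */

enum pkt_label { PKT_UNLABELED, PKT_CIPSO };

/* Outgoing traffic whose label equals the ambient label goes out
 * unlabeled; anything else would carry a CIPSO option. */
static enum pkt_label outgoing_label(const char *sock_label)
{
	enum pkt_label ret;

	pthread_mutex_lock(&ambient_lock);
	ret = strcmp(sock_label, net_ambient) == 0 ? PKT_UNLABELED : PKT_CIPSO;
	pthread_mutex_unlock(&ambient_lock);
	return ret;
}

/* Replace the ambient label under the lock and hand back the old one,
 * so the caller can drop its mapping (the patch does this inside
 * smk_unlbl_ambient() while still holding the lock). */
static const char *set_ambient(const char *label)
{
	const char *old;

	pthread_mutex_lock(&ambient_lock);
	old = net_ambient;
	net_ambient = label;
	pthread_mutex_unlock(&ambient_lock);
	return old;
}
```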

Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>

---

This patch differs significantly from the previous version.
I think that I am using the netlbl interfaces more appropriately,
Paul, please let me know if there's a better approach.

It's inconvenient that netlbl_sock_setattr frees the domain passed.
I see that it makes sense for SELinux with the way SELinux treats
secctx's, but Smack is more careful about memory usage and I have
to do what I consider a gratuitous kmalloc because of this behavior.
Would you be open to a patch to change this if it included the SELinux
changes?

Thank you.

security/smack/smack_lsm.c |   20 ++--
security/smack/smackfs.c   |   54 +--
2 files changed, 49 insertions(+), 25 deletions(-)

diff -uprN -X linux-2.6.25-g0210-base//Documentation/dontdiff linux-2.6.25-g0210-base/security/smack/smackfs.c linux-2.6.25-g0210/security/smack/smackfs.c
--- linux-2.6.25-g0210-base/security/smack/smackfs.c  2008-02-10 19:30:47.0 -0800
+++ linux-2.6.25-g0210/security/smack/smackfs.c  2008-02-11 07:14:54.0 -0800
@@ -45,6 +45,7 @@ enum smk_inos {
 */
static DEFINE_MUTEX(smack_list_lock);
static DEFINE_MUTEX(smack_cipso_lock);
+static DEFINE_MUTEX(smack_ambient_lock);

/*
 * This is the "ambient" label for network traffic.
@@ -363,6 +364,27 @@ void smk_cipso_doi(void)
   __func__, __LINE__, rc);
}

+/**
+ * smk_unlbl_ambient - initialize the unlabeled domain
+ */
+void smk_unlbl_ambient(char *oldambient)
+{
+   int rc;
+   struct netlbl_audit audit_info;
+
+   if (oldambient != NULL) {
+   rc = netlbl_cfg_map_del(oldambient, &audit_info);
+   if (rc != 0)
+   printk(KERN_WARNING "%s:%d remove rc = %d\n",
+  __func__, __LINE__, rc);
+   }
+
+   rc = netlbl_cfg_unlbl_add_map(smack_net_ambient, &audit_info);
+   if (rc != 0)
+   printk(KERN_WARNING "%s:%d add rc = %d\n",
+  __func__, __LINE__, rc);
+}
+
/*
 * Seq_file read operations for /smack/cipso
 */
@@ -709,7 +731,6 @@ static ssize_t smk_read_ambient(struct f
size_t cn, loff_t *ppos)
{
ssize_t rc;
-   char out[SMK_LABELLEN];
int asize;

if (*ppos != 0)
@@ -717,23 +738,18 @@ static ssize_t smk_read_ambient(struct f
/*
 * Being careful to avoid a problem in the case where
 * smack_net_ambient gets changed in midstream.
-* Since smack_net_ambient is always set with a value
-* from the label list, including initially, and those
-* never get freed, the worst case is that the pointer
-* gets changed just after this strncpy, in which case
-* the value passed up is incorrect. Locking around
-* smack_net_ambient wouldn't be any better than this
-* copy scheme as by the time the caller got to look
-* at the ambient value it would have cleared the lock
-* and been changed.
 */
-   strncpy(out, smack_net_ambient, SMK_LABELLEN);
-   asize = strlen(out) + 1;
+   mutex_lock(&smack_ambient_lock);

-   if (cn < asize)
-   return -EINVAL;
+   asize = strlen(smack_net_ambient) + 1;

-   rc = simple_read_from_buffer(buf, cn, ppos, out, asize);
+   if (cn >= asize)
+   rc = simple_read_from_buffer(buf, cn, ppos,
+smack_net_ambient, asize);
+   else
+   rc = -EINVAL;
+
+   mutex_unlock(&smack_ambient_lock);

return rc;
}
@@ -751,6 +767,7 @@ static ssize_t smk_write_ambient(struct 
 size_t count, loff_t *ppos)
{
char in[SMK_LABELLEN];
+   char *oldambient;
char *smack;

if (!capable(CAP_MAC_ADMIN))
@@ -766,7 +783,13 @@ static ssize_t smk_write_ambient(struct 
if (smack == NULL)
return -EINVAL;

+   mutex_lock(&smack_ambient_lock);
+
+   oldambient = smack_net_ambient;
smack_net_ambient = smack;
+   smk_unlbl_ambient(oldambient);
+
+   mutex_unlock(&smack_ambient_lock);

return count;
}
@@ -974,6 +997,7 @@ static int __init init_smk_fs(void)

sema_init(_write_sem, 1);
smk_cipso_doi();
+

Re: [PATCH] [LMB]: Fix lmb_add_region if region should be added at the head

2008-02-19 Thread David Miller

From: Kumar Gala <[EMAIL PROTECTED]>
Date: Tue, 19 Feb 2008 22:27:48 -0600 (CST)

> We introduced a bug in fixing lmb_add_region to handle an initial
> region being non-zero.  Before that fix it was impossible to insert
> a region at the head of the list since the first region always started
> at zero.
> 
> Now that its possible for the first region to be non-zero we need to
> check to see if the new region should be added at the head and if so
> actually add it.
> 
> Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>
 ...
> @@ -184,6 +184,11 @@ static long __init lmb_add_region(struct lmb_region *rgn, u64 base, u64 size)
>   break;
>   }
>   }
> +
> + if (base < rgn->region[0].base) {
> + rgn->region[0].base = base;
> + rgn->region[0].size = size;
> + }
>   rgn->cnt++;
> 
>   return 0;

Are you sure this is sufficient?

It seems to me, to handle this properly, you'll need to handle
the case where the lower addressed entry you are inserting is
not contiguous with the existing entry 0.

Therefore, you need to move all existing entries up a slot,
then you can set the 0 entry to 'base' and 'size'.

What do you think?

[PATCH] exporting capability code/name pairs (try #6)

2008-02-19 Thread Kohei KaiGai


Greg KH wrote:

On Mon, Feb 18, 2008 at 05:45:46PM +0900, Kohei KaiGai wrote:

Greg KH wrote:

Also, this code can be cleaned up a lot by just using the basic kobject
attributes, and not rolling your own types here.

I replaced my own defined capability_attribute by kobj_attribute.

It made the patch cleaned up, however, it also impossible to share a single
_show() method instance, because kobj_attribute does not have any private member.
Is there any reason why kobj_attribute does not have "void *private;"?

Because no one has asked for it?  :)

Or you can just do as the example in samples/kobject/ does it, no need
for the void pointer as that code shows.

It shows us a good example in samples/kobject.

However, it is unsuitable to export the list of capabilities.
The shared _show() method (b_show) calls strcmp() once with the name of kobject
attribute to switch its returning string.
If we have 34 of candidates to be returned, like the capability case, we have
to call strcmp() 33 times in maximum.

If we can have a private member in kobj_attribute, we can found the content
to be returned in a single step.


Ok, again, just send me a patch that adds this functionality and we will
be very glad to consider it.


In the attached patch, every attribute entry stores its capability
identifier, in numerical or symbolic representation, within the private
data field of the kobj_attribute structure.
The rest of the patch is unchanged.


[2/2] Exporting capability code/name pairs

This patch exports the codes and names of the capabilities supported
by the running kernel.

A newer kernel sometimes adds new capabilities, like CAP_MAC_ADMIN
in 2.6.25. However, we have no interface to disclose which capabilities
are supported by this kernel. Thus, we have to keep the libcap version
synchronized with the kernel.

This patch enables libcap to collect the list of capabilities at
run time and provide them to users.
It helps improve the portability of the library.

It exports this information as regular files under /sys/kernel/capability.
The numeric node exports its name; the symbolic node exports its code.

Please consider queueing this patch for 2.6.25.

Thanks,
===
[EMAIL PROTECTED] ~]$ ls -R /sys/kernel/capability/
/sys/kernel/capability/:
codes  names  version

/sys/kernel/capability/codes:
0  10  12  14  16  18  2   21  23  25  27  29  30  32  4  6  8
1  11  13  15  17  19  20  22  24  26  28  3   31  33  5  7  9

/sys/kernel/capability/names:
cap_audit_control    cap_kill              cap_net_raw     cap_sys_nice
cap_audit_write      cap_lease             cap_setfcap     cap_sys_pacct
cap_chown            cap_linux_immutable   cap_setgid      cap_sys_ptrace
cap_dac_override     cap_mac_admin         cap_setpcap     cap_sys_rawio
cap_dac_read_search  cap_mac_override      cap_setuid      cap_sys_resource
cap_fowner           cap_mknod             cap_sys_admin   cap_sys_time
cap_fsetid           cap_net_admin         cap_sys_boot    cap_sys_tty_config
cap_ipc_lock         cap_net_bind_service  cap_sys_chroot
cap_ipc_owner        cap_net_broadcast     cap_sys_module
[EMAIL PROTECTED] ~]$ cat /sys/kernel/capability/version
0x20071026
[EMAIL PROTECTED] ~]$ cat /sys/kernel/capability/codes/30
cap_audit_control
[EMAIL PROTECTED] ~]$ cat /sys/kernel/capability/names/cap_sys_pacct
20
[EMAIL PROTECTED] ~]$
===
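A consumer such as libcap could read these entries with ordinary file I/O. The sketch below assumes the /sys/kernel/capability layout shown in the transcript above, which is a proposal at this point and not an established ABI. `read_cap_entry()` is a hypothetical helper, not part of libcap.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Read one entry such as ".../codes/30" into buf, dropping the
 * trailing newline. Returns 0 on success or a negative errno value. */
static int read_cap_entry(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	size_t n;

	if (!f)
		return -errno;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	n = strlen(buf);
	if (n && buf[n - 1] == '\n')
		buf[n - 1] = '\0';
	return 0;
}
```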

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>

--
 Documentation/ABI/testing/sysfs-kernel-capability |   23 +
 scripts/mkcapnames.sh |   44 +
 security/Makefile |9 ++
 security/commoncap.c  |   99 +
 4 files changed, 175 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-capability b/Documentation/ABI/testing/sysfs-kernel-capability
index e69de29..402ef06 100644
--- a/Documentation/ABI/testing/sysfs-kernel-capability
+++ b/Documentation/ABI/testing/sysfs-kernel-capability
@@ -0,0 +1,23 @@
+What:  /sys/kernel/capability
+Date:  Feb 2008
+Contact:   KaiGai Kohei <[EMAIL PROTECTED]>
+Description:
+   The entries under /sys/kernel/capability are used to export
+   the list of capabilities the running kernel supports.
+
+   - /sys/kernel/capability/version
+ returns the most preferable version number for the
+ running kernel.
+ e.g) $ cat /sys/kernel/capability/version
+  0x20071026
+
+   - /sys/kernel/capability/code/
+ returns its symbolic representation, on reading.
+ e.g) $ cat /sys/kernel/capability/codes/30
+  cap_audit_control
+
+   - /sys/kernel/capability/name/
+ returns its numerical representation, on reading.
+ e.g) $ cat

[PATCH] exporting capability code/name pairs (try #6)

2008-02-19 Thread Kohei KaiGai


>> If we can have a private member in kobj_attribute, we can found the content
>> to be returned in a single step.
>
> Ok, again, just send me a patch that adds this functionality and we will
> be very glad to consider it.

[1/2] Add a private data field within kobj_attribute structure.

This patch adds a private data field, declared as void *, to the kobj_attribute
structure. Anyone using sysfs can store private data there and refer to it from
the _show() and _store() methods.
This makes it possible to share a single method function among several similar
entries, such as the ones that export the list of capabilities the running
kernel supports.
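The proposed field lets one show() method serve many attributes. Each attribute carries its own value, so there is no need to compare attribute names. Below is a userspace model of that idea. `struct kobj_attr_model`, `ATTR_DATA_MODEL()` and `cap_code_show()` are hypothetical stand-ins for kobj_attribute, __ATTR_DATA() and a capability show routine, and only three capabilities are listed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of kobj_attribute with the proposed 'data' field. */
struct kobj_attr_model {
	const char *name;
	int (*show)(const struct kobj_attr_model *attr, char *buf, size_t len);
	void *data;                         /* per-attribute private data */
};

/* One show() shared by every names/<cap> entry: the code comes straight
 * from attr->data, so no strcmp() chain over attribute names is needed. */
static int cap_code_show(const struct kobj_attr_model *attr, char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", (int)(intptr_t)attr->data);
}

#define ATTR_DATA_MODEL(_name, _show, _data) \
	{ .name = #_name, .show = (_show), .data = (void *)(intptr_t)(_data) }

static const struct kobj_attr_model cap_attrs[] = {
	ATTR_DATA_MODEL(cap_chown, cap_code_show, 0),
	ATTR_DATA_MODEL(cap_sys_pacct, cap_code_show, 20),
	ATTR_DATA_MODEL(cap_audit_control, cap_code_show, 30),
};
```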

Signed-off-by: KaiGai Kohei <[EMAIL PROTECTED]>
--
 include/linux/kobject.h |    1 +
 include/linux/sysfs.h   |    7 +++++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index caa3f41..57d5bf1 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -130,6 +130,7 @@ struct kobj_attribute {
char *buf);
ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
 const char *buf, size_t count);
+   void *data; /* a private field */
 };

 extern struct sysfs_ops kobj_sysfs_ops;
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 8027104..6f40ff9 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -50,6 +50,13 @@ struct attribute_group {
.store  = _store,   \
 }

+#define __ATTR_DATA(_name,_mode,_show,_store,_data) {  \
+   .attr = {.name = __stringify(_name), .mode = _mode },   \
+   .show   = _show,\
+   .store  = _store,   \
+   .data   = (void *)(_data),  \
+}
+   
 #define __ATTR_RO(_name) { \
.attr   = { .name = __stringify(_name), .mode = 0444 }, \
.show   = _name##_show, \

--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-19 Thread Paul Jackson

Rik wrote:
> In that case the user is better off having that job killed and
> restarted elsewhere, than having all of the jobs on that node
> crawl to a halt due to swapping.
> 
> Paul, is this guess correct? :)

Not for the loads I focus on.  Each job gets exclusive use of its own
dedicated set of nodes, for the duration of the job.  With that comes a
quite specific upper limit on how much memory, in total, including node
local kernel data, that job is allowed to use.

One problem with swapping is that nodes aren't entirely isolated.
They share buses, i/o channels, disk arms, kernel data cache lines and
kernel locks with other nodes, running other jobs.   A job thrashing
its swap is a drag on the rest of the system.

Another problem with swapping is that it's a waste of resources.  Once
a pure compute bound job goes into swapping when it shouldn't, that job
has near zero hope of continuing with the intended performance, as it
has just slowed from main memory speeds to disk speeds, which are
thousands of times slower.  Best to get it out of there, immediately.


Re: 2.6.25-rc2 System no longer powers off after suspend-to-disk. Screen becomes green.

2008-02-19 Thread Jesse Barnes

On Tuesday, February 19, 2008 6:28 pm Linus Torvalds wrote:
> On Tue, 19 Feb 2008, Jesse Barnes wrote:
> > I found the same poweroff issue on my T61.  It turned out to be related
> > to the C state code disabling interrupts when it shouldn't iirc.  Booting
> > with 'idle=poll' seems to work around the problem.
> >
> > The "green screen" problem should be fixed (see the DRM git tree for
> > details).
>
> ..and the latter is hopefully now merged in my tree too (at least some of
> the drm updates are).

Cool, thanks.

Jeff, can you retest with Linus' tree?  If you're still seeing problems, it 
might help to add some printks to the i915 driver's suspend routine.  Just 
reading the regs really shouldn't cause a hang, but maybe the VGA bits are 
subtly wrong again...

Thanks,
Jesse

[PATCH] [LMB]: Fix lmb_add_region if region should be added at the head

2008-02-19 Thread Kumar Gala

We introduced a bug in fixing lmb_add_region to handle an initial
region being non-zero.  Before that fix it was impossible to insert
a region at the head of the list since the first region always started
at zero.

Now that it's possible for the first region to be non-zero we need to
check to see if the new region should be added at the head and if so
actually add it.

Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>
---
 lib/lmb.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/lib/lmb.c b/lib/lmb.c
index e3c8dcb..3c43b95 100644
--- a/lib/lmb.c
+++ b/lib/lmb.c
@@ -184,6 +184,11 @@ static long __init lmb_add_region(struct lmb_region *rgn, u64 base, u64 size)
break;
}
}
+
+   if (base < rgn->region[0].base) {
+   rgn->region[0].base = base;
+   rgn->region[0].size = size;
+   }
rgn->cnt++;

return 0;
-- 
1.5.3.8


Re: [ofa-general] [2.6 patch] infiniband/hw/nes/nes_verbs.c: fix off-by-one

2008-02-19 Thread Roland Dreier

Thanks, this is already upstream as 51af33e8

Re: [patch] sysfs: small header file cleanup

2008-02-19 Thread Randy Dunlap


David Rientjes wrote:

Convert sysfs_remove_bin_file() to have a return type of 'void' for
!CONFIG_SYSFS configurations.  Also remove unnecessary semicolons from empty
void functions.

Cc: Randy Dunlap <[EMAIL PROTECTED]>


Reviewed-by: Randy Dunlap <[EMAIL PROTECTED]>

Thanks, David.


Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/sysfs.h |    9 ++-------
 1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -131,7 +131,6 @@ static inline int sysfs_create_dir(struct kobject *kobj)
 
 static inline void sysfs_remove_dir(struct kobject *kobj)
 {
-   ;
 }
 
 static inline int sysfs_rename_dir(struct kobject *kobj, const char *new_name)
@@ -160,7 +159,6 @@ static inline int sysfs_chmod_file(struct kobject *kobj,
 static inline void sysfs_remove_file(struct kobject *kobj,
 const struct attribute *attr)
 {
-   ;
 }
 
 static inline int sysfs_create_bin_file(struct kobject *kobj,
@@ -169,10 +167,9 @@ static inline int sysfs_create_bin_file(struct kobject *kobj,
return 0;
 }
 
-static inline int sysfs_remove_bin_file(struct kobject *kobj,
-   struct bin_attribute *attr)
+static inline void sysfs_remove_bin_file(struct kobject *kobj,
+struct bin_attribute *attr)
 {
-   return 0;
 }
 
 static inline int sysfs_create_link(struct kobject *kobj,
@@ -183,7 +180,6 @@ static inline int sysfs_create_link(struct kobject *kobj,
 
 static inline void sysfs_remove_link(struct kobject *kobj, const char *name)
 {
-   ;
 }
 
 static inline int sysfs_create_group(struct kobject *kobj,
@@ -195,7 +191,6 @@ static inline int sysfs_create_group(struct kobject *kobj,
 static inline void sysfs_remove_group(struct kobject *kobj,
  const struct attribute_group *grp)
 {
-   ;
 }
 
 static inline int sysfs_add_file_to_group(struct kobject *kobj,



--
~Randy
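The point of the return-type change is that each !CONFIG_SYSFS stub should have the same signature as the function it stands in for, so callers compile the same way whether or not sysfs is configured. A minimal user-space sketch, using hypothetical `ex_`-prefixed mocks rather than the real <linux/sysfs.h> definitions, and a hypothetical caller:

```c
/* Opaque types, as callers see them in the kernel headers. */
struct kobject;
struct bin_attribute;

/* Mock !CONFIG_SYSFS stubs in the post-patch style: same return
 * types as the real functions, and empty bodies with no stray ';'. */
static inline int ex_sysfs_create_bin_file(struct kobject *kobj,
					   struct bin_attribute *attr)
{
	(void)kobj;
	(void)attr;
	return 0;	/* report success: there is no sysfs to populate */
}

static inline void ex_sysfs_remove_bin_file(struct kobject *kobj,
					    struct bin_attribute *attr)
{
	(void)kobj;
	(void)attr;	/* nothing to remove */
}

/* Hypothetical driver path: the remove call is used only for its
 * side effect, exactly as it would be with sysfs enabled. */
static int ex_setup_and_teardown(struct kobject *kobj,
				 struct bin_attribute *attr)
{
	int err = ex_sysfs_create_bin_file(kobj, attr);

	if (err)
		return err;
	ex_sysfs_remove_bin_file(kobj, attr);
	return 0;
}
```

If the remove stub still returned int, a caller written as `err = ex_sysfs_remove_bin_file(...)` would build only with sysfs disabled, assuming the CONFIG_SYSFS=y version returns void.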

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patchadded to gregkh-2.6 tree

2008-02-19 Thread Greg KH

On Wed, Feb 20, 2008 at 09:36:07AM +0800, Shaohua Li wrote:
> --- linux.orig/include/linux/pci-acpi.h   2008-02-19 11:03:51.0 +0800
> +++ linux/include/linux/pci-acpi.h   2008-02-20 09:19:15.0 +0800
> @@ -47,6 +47,7 @@
>   OSC_PCI_EXPRESS_CAP_STRUCTURE_CONTROL)
>  
>  #ifdef CONFIG_ACPI
> +#include 
>  extern acpi_status pci_osc_control_set(acpi_handle handle, u32 flags);
>  extern acpi_status __pci_osc_support_set(u32 flags, const char *hid);
>  static inline acpi_status pci_osc_support_set(u32 flags)
> @@ -59,13 +60,11 @@ static inline acpi_status pcie_osc_suppo
>  }
>  #else
>  #if !defined(AE_ERROR)
> -typedef u32  acpi_status;
> -#define AE_ERROR (acpi_status) (0x0001)
> -#endif
> -static inline acpi_status pci_osc_control_set(acpi_handle handle, u32 flags)
> -{return AE_ERROR;}
> -static inline acpi_status pci_osc_support_set(u32 flags) {return AE_ERROR;} 
> -static inline acpi_status pcie_osc_support_set(u32 flags) {return AE_ERROR;}
> +#define AE_ERROR (0x0001)
> +#endif
> +#define pci_osc_control_set(handle, flags) (AE_ERROR)
> +#define pci_osc_support_set(flags) (AE_ERROR)
> +#define pcie_osc_support_set(flags) (AE_ERROR)

No, please use inline functions, don't change these functions that
should be just fine.  Why are you needing to change them?

thanks,

greg k-h
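Greg's reason is not spelled out in the thread, but one common argument for inline stubs over function-like macros shows up in a small user-space experiment: a macro that ignores its argument neither evaluates nor type-checks it, while a static inline function behaves like the real call. The names `macro_osc_support_set` and `inline_osc_support_set` are hypothetical, and `acpi_status` is simplified to a `uint32_t` typedef here.

```c
#include <stdint.h>

typedef uint32_t acpi_status;	/* simplified stand-in for the ACPI typedef */
#define AE_ERROR ((acpi_status)0x0001)

/* Macro-style stub, as in the proposed change: the argument tokens are
 * discarded, so they are never evaluated or type-checked. */
#define macro_osc_support_set(flags) (AE_ERROR)

/* Inline stub, the style being kept: the argument is evaluated and
 * converted to u32 exactly as for the real function. */
static inline acpi_status inline_osc_support_set(uint32_t flags)
{
	(void)flags;
	return AE_ERROR;
}
```

Passing `calls++` to the macro leaves `calls` unchanged, while the inline version increments it. With the macro, even an undeclared identifier as the argument would compile silently in !CONFIG_ACPI builds.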

Re: [PATCH] procfs task exe symlink

2008-02-19 Thread Mike Frysinger

On Feb 19, 2008 4:54 PM, Matt Helsley <[EMAIL PROTECTED]> wrote:
> On Sat, 2008-02-16 at 07:12 -0500, Mike Frysinger wrote:
> > On Feb 6, 2008 8:44 PM, Matt Helsley <[EMAIL PROTECTED]> wrote:
> > > The kernel implements readlink of /proc/pid/exe by getting the file from the
> > > first executable VMA. Then the path to the file is reconstructed and reported as
> > > the result.
> > >
> > > Because of the VMA walk the code is slightly different on nommu systems. This
> > > patch avoids separate /proc/pid/exe code on nommu systems. Instead of walking
> > > the VMAs to find the first executable file-backed VMA we store a reference to
> > > the exec'd file in the mm_struct.
> > >
> > > That reference would prevent the filesystem holding the executable file from
> > > being unmounted even after unmapping the VMAs. So we track the number of
> > > VM_EXECUTABLE VMAs and drop the new reference when the last one is unmapped.
> > > This avoids pinning the mounted filesystem.
> > >
> > > Andrew, these are the updates I promised. Please consider this patch for
> > > inclusion in -mm.
> >
> > mm/nommu.c wasn't compile-tested; it's trivially broken:
>
> Thanks for the report. I've looked into this and the "obvious" fix,
> using vma->vm_mm, isn't correct since it's never set. This means the
> portions of the procfs task exe symlink patch using vma->vm_mm in
> mm/nommu.c are incorrect.
>
> The patch below attempts to fix this by using current->mm during mmap
> and passing the current mm to put_vma() as well.
>
> Mike, does this patch fix the compile problem(s)?

thanks, that does fix compiling.

> Mike, I don't have a nommu compile or test environment. I have yet
> to generate a good CONFIG_MMU=n config on i386 (got one?). Since I
> don't have any nommu hardware I'm also looking into using qemu. It looks
> like I may need to build my own image from scratch. Do you have any
> recommendations on testing nommu configs without hardware?

it should be easy to cross-compile a kernel for the Blackfin
processor, but there is no qemu port.  the Blackfin port is the only
one i've done with no-mmu, so i can't comment on any other port.
-mike
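The reference-counting scheme in Matt's quoted changelog can be sketched in user space. Everything here is hypothetical: `struct ex_mm` stands in for mm_struct, `struct ex_file` for a refcounted struct file, and the function names are illustrative, not taken from the patch. The mm takes one reference to the executable at exec time, counts its VM_EXECUTABLE mappings, and drops the reference when the count reaches zero.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a refcounted file and a trimmed mm_struct. */
struct ex_file {
	int refcount;
};

struct ex_mm {
	struct ex_file *exe_file;	/* reference to the exec'd file */
	int num_exe_file_vmas;		/* VM_EXECUTABLE mappings still alive */
};

static void ex_get_file(struct ex_file *f) { f->refcount++; }
static void ex_put_file(struct ex_file *f) { f->refcount--; }

/* At exec time: pin the file once on behalf of the mm. */
static void ex_set_exe_file(struct ex_mm *mm, struct ex_file *f)
{
	ex_get_file(f);
	mm->exe_file = f;
}

/* A VM_EXECUTABLE mapping was created. */
static void ex_added_exe_file_vma(struct ex_mm *mm)
{
	mm->num_exe_file_vmas++;
}

/* A VM_EXECUTABLE mapping went away; the last one drops the mm's
 * reference so the filesystem is no longer pinned. */
static void ex_removed_exe_file_vma(struct ex_mm *mm)
{
	mm->num_exe_file_vmas--;
	if (mm->num_exe_file_vmas == 0 && mm->exe_file) {
		ex_put_file(mm->exe_file);
		mm->exe_file = NULL;
	}
}
```

The reference survives while any executable mapping remains and is released exactly once, on the last unmap.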

Re: [PATCH] Provide u64 version of jiffies_to_usecs() in kernel/tsacct.c

2008-02-19 Thread Randy Dunlap


Jonathan Lim wrote:

It's possible that the values used in and returned from jiffies_to_usecs() are
incorrect because of truncation when variables of type u64 are involved.  So a
function specific to that type is used instead.

This version implements a correction to jiffies_64_to_usecs() based on feedback
from Randy Dunlap.

Diff'd against: linux/kernel/git/torvalds/linux-2.6.git

Signed-off-by: Jonathan Lim <[EMAIL PROTECTED]>

--- a/include/linux/jiffies.h   Thu Feb 14 18:04:14 PST 2008
+++ b/include/linux/jiffies.h   Thu Feb 14 18:07:17 PST 2008
@@ -42,7 +42,7 @@
 /* LATCH is used in the interval timer and ftape setup. */
 #define LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ) /* For divider */
 
-/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, then we can
+/* Suppose we want to divide two numbers NOM and DEN: NOM/DEN, then we can
  * improve accuracy by shifting LSH bits, hence calculating:
  * (NOM << LSH) / DEN
  * This however means trouble for large NOM, because (NOM << LSH) may no
@@ -204,7 +204,7 @@ extern unsigned long preset_lpj;
  * operator if the result is a long long AND at least one of the
  * operands is cast to long long (usually just prior to the "*" so as
  * not to confuse it into thinking it really has a 64-bit operand,
- * which, buy the way, it can do, but it takes more code and at least 2
+ * which, by the way, it can do, but it takes more code and at least 2
  * mpys).
 
  * We also need to be aware that one second in nanoseconds is only a
@@ -269,6 +269,7 @@ extern unsigned long preset_lpj;
  */
 extern unsigned int jiffies_to_msecs(const unsigned long j);
 extern unsigned int jiffies_to_usecs(const unsigned long j);
+extern u64 jiffies_64_to_usecs(const u64 j);
 extern unsigned long msecs_to_jiffies(const unsigned int m);
 extern unsigned long usecs_to_jiffies(const unsigned int u);
 extern unsigned long timespec_to_jiffies(const struct timespec *value);
--- a/kernel/time.c Thu Feb 14 18:05:12 PST 2008
+++ b/kernel/time.c Tue Feb 19 17:00:11 PST 2008


kernel/time.c needs:
#include 

After that, it's
Acked-by: Randy Dunlap <[EMAIL PROTECTED]>


@@ -268,6 +268,12 @@ unsigned int inline jiffies_to_usecs(con
 }
 EXPORT_SYMBOL(jiffies_to_usecs);
 
+u64 jiffies_64_to_usecs(const u64 j)
+{
+   return div64_64(j*HZ_TO_USEC_NUM + HZ_TO_USEC_DEN-1, HZ_TO_USEC_DEN);
+}
+EXPORT_SYMBOL(jiffies_64_to_usecs);
+
 /**
  * timespec_trunc - Truncate timespec to a granularity
  * @t: Timespec
--- a/kernel/tsacct.c   Thu Feb 14 18:06:17 PST 2008
+++ b/kernel/tsacct.c   Thu Feb 14 18:08:47 PST 2008
@@ -85,8 +85,8 @@ void xacct_add_tsk(struct taskstats *sta
struct mm_struct *mm;
 
 	/* convert pages-jiffies to Mbyte-usec */
-   stats->coremem = jiffies_to_usecs(p->acct_rss_mem1) * PAGE_SIZE / MB;
-   stats->virtmem = jiffies_to_usecs(p->acct_vm_mem1) * PAGE_SIZE / MB;
+   stats->coremem = jiffies_64_to_usecs(p->acct_rss_mem1) * PAGE_SIZE / MB;
+   stats->virtmem = jiffies_64_to_usecs(p->acct_vm_mem1) * PAGE_SIZE / MB;
mm = get_task_mm(p);
if (mm) {
/* adjust to KB unit */



--
~Randy
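The truncation can be seen with a little arithmetic. Assume HZ=1000, so USEC_PER_SEC/HZ reduces to 1000/1 (the `EX_` constants below are stand-ins for HZ_TO_USEC_NUM and HZ_TO_USEC_DEN). Then 5,000,000 jiffies is 5,000,000,000 microseconds, which does not fit in the 32-bit unsigned int that jiffies_to_usecs() returns. The 64-bit variant mirrors the patch's round-up formula. The patch uses div64_64() because 32-bit kernels avoid a plain 64-bit `/`; in user space the compiler handles it, so the sketch uses `/`.

```c
#include <stdint.h>

/* Stand-ins assuming HZ=1000: one jiffy is exactly 1000 us. */
#define EX_HZ_TO_USEC_NUM 1000ULL
#define EX_HZ_TO_USEC_DEN 1ULL

/* Models jiffies_to_usecs()'s unsigned int return type: the product is
 * computed in 64 bits here, then truncated to 32 bits on return. */
static uint32_t ex_usecs_32(uint64_t j)
{
	return (uint32_t)(j * EX_HZ_TO_USEC_NUM / EX_HZ_TO_USEC_DEN);
}

/* Mirrors jiffies_64_to_usecs() from the patch: full 64-bit result,
 * rounded up by adding DEN - 1 before dividing. */
static uint64_t ex_usecs_64(uint64_t j)
{
	return (j * EX_HZ_TO_USEC_NUM + EX_HZ_TO_USEC_DEN - 1) / EX_HZ_TO_USEC_DEN;
}
```

For j = 5,000,000 the 64-bit version returns 5,000,000,000, while the 32-bit version wraps to 5,000,000,000 - 2^32 = 705,032,704. On a 32-bit kernel the unsigned long argument would also truncate a u64 input before any arithmetic.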

Re: [patch 5/6] mmu_notifier: Support for drivers with revers maps (f.e. for XPmem)

2008-02-19 Thread Nick Piggin

On Wednesday 20 February 2008 14:12, Robin Holt wrote:
> For XPMEM, we do not currently allow pages from file-backed
> mappings to be exported, so we should never reach this condition.
> It has been an issue since day 1.  We have operated with that assumption
> for 6 years and have not had issues with that assumption.  The user of
> xpmem is MPT and it controls the communication buffers so it is reasonable
> to expect this type of behavior.

OK, that makes things simpler.

So why can't you export a device from your xpmem driver, which
can be mmap()ed to give out "anonymous" memory pages to be used
for these communication buffers?

I guess you may also want an "munmap/mprotect" callback, which
we don't have in the kernel right now... but at least you could
prototype it easily by having an ioctl to be called before
munmapping or mprotecting (eg. the ioctl could prevent new TLB
setup for the region, and shoot down existing ones).

This is actually going to be much faster for you if you use any
threaded applications, because you will be able to do all the
shootdown round trips outside mmap_sem, and so you will be able
to have other threads faulting and even mmap()ing / munmapping
at the same time as the shootdown is happening.

I guess there is some catch...


[BUG] next-20080219 - ide oops while bootup at ide_device_add_all ()

2008-02-19 Thread Kamalesh Babulal

Hi, 

The next-20080219 kernel oopses while booting up on an x86_64 box. This bug
was fixed in the 2.6.24-git(s) with the patch posted at
http://lkml.org/lkml/2008/2/11/350 

ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
BUG: unable to handle kernel paging request at 80a37b8d
IP: [] ide_device_add_all+0x1a5/0x517
PGD 203067 PUD 207063 PMD 0 
Oops:  [1] SMP 
last sysfs file: 
CPU 1 
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.25-rc2-autotest-next-20080219 #1
RIP: 0010:[]  [] ide_device_add_all+0x1a5/0x517
RSP: :8101e7125db0  EFLAGS: 00010206
RAX: 000a8000 RBX: 0009 RCX: 0009
RDX: 8101e7125e60 RSI:  RDI: 8101e7125e60
RBP:  R08:  R09: 8101e68de870
R10: 0177 R11: 0174 R12: 80990080
R13:  R14: 000a R15: 
FS:  () GS:8101e710d740() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 80a37b8d CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 8101e7124000, task 8101e7123200)
Stack:  8101e7125e60 0009 01f0 80990080
 0001 000a  8088d725
 0170 0171 0172 0173
Call Trace:
 [] ide_generic_init+0x169/0x1d8
 [] kernel_init+0x17d/0x2e9
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x2e9
 [] child_rip+0x0/0x12


Code: 49 ff c5 49 83 fe 0a 0f 85 97 fe ff ff 45 31 ed 48 8b 14 24 41 8a 44 15 00 3c ff 0f 84 4d 01 00 00 0f b6 c0 48 69 c0 00 0c 00 00 <80> b8 8d fb 98 80 0d 48 8d 98 80 f4 98 80 75 15 48 8b 80 88 f4
RIP  [] ide_device_add_all+0x1a5/0x517
 RSP 
CR2: 80a37b8d
---[ end trace 31f82065a26d65bf ]---
Kernel panic - not syncing: Attempted to kill init!
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Please pull powerpc.git merge branch

2008-02-19 Thread Paul Mackerras

Linus,

Please do

git pull \
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git merge

to get a few more bug and warning fixes for powerpc.  The diffstat is
bloated by the defconfig updates -- the actual code changes are only a
few dozen lines.

Thanks,
Paul.

 arch/powerpc/boot/Makefile             |    8
 arch/powerpc/boot/dts/bamboo.dts       |    3
 arch/powerpc/boot/dts/ebony.dts        |    2
 arch/powerpc/boot/dts/katmai.dts       |    2
 arch/powerpc/boot/dts/kilauea.dts      |    3
 arch/powerpc/boot/dts/makalu.dts       |    3
 arch/powerpc/boot/dts/rainier.dts      |    4
 arch/powerpc/boot/dts/sequoia.dts      |    4
 arch/powerpc/boot/dts/taishan.dts      |    4
 arch/powerpc/configs/bamboo_defconfig  |   81 ++-
 arch/powerpc/configs/ebony_defconfig   |   79 ++-
 arch/powerpc/configs/ep405_defconfig   |   92 ++-
 arch/powerpc/configs/kilauea_defconfig |   69 ++
 arch/powerpc/configs/makalu_defconfig  |   69 ++
 arch/powerpc/configs/ppc44x_defconfig  |  904
 arch/powerpc/configs/rainier_defconfig |   82 ++-
 arch/powerpc/configs/sequoia_defconfig |   77 ++-
 arch/powerpc/configs/taishan_defconfig |   81 ++-
 arch/powerpc/configs/walnut_defconfig  |   81 ++-
 arch/powerpc/configs/warp_defconfig    |  139 +++--
 arch/powerpc/kernel/kprobes.c          |    9
 arch/powerpc/kernel/prom.c             |   13
 arch/powerpc/platforms/44x/Kconfig     |   10
 arch/powerpc/platforms/pseries/power.c |    2
 arch/ppc/platforms/4xx/ibm440ep.c      |    6
 drivers/net/ibm_newemac/rgmii.c        |    1
 26 files changed, 1497 insertions(+), 331 deletions(-)
 create mode 100644 arch/powerpc/configs/ppc44x_defconfig

Ananth N Mavinakayanahalli (1):
  [POWERPC] Kill sparse warnings in kprobes

Becky Bruce (1):
  [POWERPC] Fix dt_mem_next_cell() to read the full address

Josh Boyer (4):
  [POWERPC] 4xx: Update defconfigs for 2.6.25
  [POWERPC] 44x: Fix Kconfig formatting
  [POWERPC] 44x: Add multiplatform defconfig
  [POWERPC] Fix bootwrapper builds with older gcc versions

Stefan Roese (2):
  [POWERPC] net: NEWEMAC: Remove "rgmii-interface" from rgmii matching table
  [POWERPC] 4xx: Remove "i2c" and "xxmii-interface" device_types from dts

Stephen Rothwell (1):
  [POWERPC] Fix warning in pseries/power.c

Wolfgang Ocker (1):
  [POWERPC] PPC440EP Interrupt Triggering and Level Settings


Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-19 Thread Herbert Xu

On Tue, Feb 19, 2008 at 05:14:26PM -0500, Tony Battersby wrote:
>
> Update: when I revert Herbert's patch in addition to applying your
> patch, the iSCSI performance goes back up to 115 MB/s again in both
> directions.  So it looks like turning off SG for TX didn't itself cause
> the performance drop, but rather that the performance drop is just
> another manifestation of whatever bug is causing the data corruption.

Interesting.  So the workload that regressed is mostly RX with a
little TX traffic? Can you try to reproduce this with something
like netperf to eliminate other variables?

This is all very puzzling since the patch in question shouldn't
change an RX load at all.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: Where to put adapters, /proc is cool

2008-02-19 Thread Valdis . Kletnieks

On Tue, 19 Feb 2008 14:39:45 EST, Karl Dahlke said:

> Really, /proc is the only place for these virtual files that interact
> directly with the kernel and/or its modules;
> I just wanted a fixed place under /proc for adapters to live,
> like sys ttys scsi net, and so on.

There's an awful lot of stuff under /dev for virtual files and similar.
For starters, look at /dev/null, /dev/kmsg, all the dm/md stuff in /dev/mapper,
all the PTY stuff etc etc...



Re: Improve init/Kconfig help descriptions [PATCH 4/9]

2008-02-19 Thread Valdis . Kletnieks

On Wed, 20 Feb 2008 01:38:55 +1100, Nick Andrew said:

> +   Enable an auditing infrastructure that can be used with another
> +   kernel subsystem, such as Security-Enhanced Linux (SELinux),
> +   which requires this option for logging of AVC messages output.
> +
> +   AVC refers to Access Vector Cache, a subsystem used by SELinux
> +   to improve performance of the security checking by caching
> +   previous access decisions.

This paragraph can be dropped, as the reasons that SELinux denial messages
are tagged with 'avc' are mostly historical.   If you want to expand on anything
in here, explain that 'AVC' messages are interesting because they indicate
some sort of security rule denial.  So - if you don't enable auditing,
your security messages end up in the kernel syslog.  If you enable auditing,
they end up in the audit logs.  Explaining *that* clearly would be a lot
more useful than explaining what avc originally stood for.. ;)



Re: [PATCH v2][POWERPC] Fix initial lmb add region with a non-zero base

2008-02-19 Thread David Miller

From: Kumar Gala <[EMAIL PROTECTED]>
Date: Tue, 19 Feb 2008 21:02:04 -0600

> np.  Are we trying to get this into 2.6.25 or .26?

I'm ambivalent but I would obviously prefer 2.6.25 because
it would allow me to proceed more easily with my sparc64
NUMA work as well as get your bug fixes in more smoothly.

Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Robin Holt

On Wed, Feb 20, 2008 at 02:11:41PM +1100, Nick Piggin wrote:
> On Wednesday 20 February 2008 14:00, Robin Holt wrote:
> > On Wed, Feb 20, 2008 at 02:00:38AM +0100, Andrea Arcangeli wrote:
> > > On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote:
> 
> > > > Also, how do you resolve the case where you are not allowed to sleep?
> > > > I would have thought either you have to handle it, in which case nobody
> > > > needs to sleep; or you can't handle it, in which case the code is
> > > > broken.
> > >
> > > I also asked exactly this, glad you reasked this too.
> >
> > Currently, we BUG_ON having a PFN in our tables and not being able
> > to sleep.  These are mappings which MPT has never supported in the past
> > and XPMEM was already not allowing page faults for VMAs which are not
> > anonymous so it should never happen.  If the file-backed operations can
> > ever get changed to allow for sleeping and a customer has a need for it,
> > we would need to change XPMEM to allow those types of faults to succeed.
> 
> Do you really want to be able to swap, or are you just interested
> in keeping track of unmaps / prot changes?

I would rather not swap, but we do have one customer that would like
swapout to work for certain circumstances.  Additionally, we have
many customers that would rather that their system not die under I/O
termination.

Thanks,
Robin

Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Nick Piggin

On Wednesday 20 February 2008 14:00, Robin Holt wrote:
> On Wed, Feb 20, 2008 at 02:00:38AM +0100, Andrea Arcangeli wrote:
> > On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote:

> > > Also, how do you resolve the case where you are not allowed to sleep?
> > > I would have thought either you have to handle it, in which case nobody
> > > needs to sleep; or you can't handle it, in which case the code is
> > > broken.
> >
> > I also asked exactly this, glad you reasked this too.
>
> Currently, we BUG_ON having a PFN in our tables and not being able
> to sleep.  These are mappings which MPT has never supported in the past
> and XPMEM was already not allowing page faults for VMAs which are not
> anonymous so it should never happen.  If the file-backed operations can
> ever get changed to allow for sleeping and a customer has a need for it,
> we would need to change XPMEM to allow those types of faults to succeed.

Do you really want to be able to swap, or are you just interested
in keeping track of unmaps / prot changes?


Re: [patch 5/6] mmu_notifier: Support for drivers with revers maps (f.e. for XPmem)

2008-02-19 Thread Robin Holt

On Wed, Feb 20, 2008 at 10:55:20AM +1100, Nick Piggin wrote:
> On Friday 15 February 2008 17:49, Christoph Lameter wrote:
> > These special additional callbacks are required because XPmem (and likely
> > other mechanisms) do use their own rmap (multiple processes on a series
> > of remote Linux instances may be accessing the memory of a process).
> > E.g. XPmem may have to send out notifications to remote Linux instances
> > and receive confirmation before a page can be freed.
> >
> > So we handle this like an additional Linux reverse map that is walked after
> > the existing rmaps have been walked. We leave the walking to the driver
> > that is then able to use something other than a spinlock to walk its reverse
> > maps. So we can actually call the driver without holding spinlocks while we
> > hold the page lock.
> 
> I don't know how this is supposed to solve anything. The sleeping
> problem happens I guess mostly in truncate. And all you are doing
> is putting these rmap callbacks in page_mkclean and try_to_unmap.
> 
> 
> > However, we cannot determine the mm_struct that a page belongs to at
> > that point. The mm_struct can only be determined from the rmaps by the
> > device driver.
> >
> > We add another pageflag (PageExternalRmap) that is set if a page has
> > been remotely mapped (e.g. by a process from another Linux instance).
> > We can then only perform the callbacks for pages that are actually in
> > remote use.
> >
> > Rmap notifiers need an extra page bit and are only available
> > on 64 bit platforms. This functionality is not available on 32 bit!
> >
> > A notifier that uses the reverse maps callbacks does not need to provide
> > the invalidate_page() method that is called when locks are held.
> 
> That doesn't seem right. To start with, the new callbacks aren't
> even called in the places where invalidate_page isn't allowed to
> sleep.
> 
> The problem is unmap_mapping_range, right? And unmap_mapping_range
> must walk the rmaps with the mmap lock held, which is why it can't
> sleep. And it can't hold any mmap_sem so it cannot prevent address
> space modifications of the processes in question between the time
> you unmap them from the linux ptes with unmap_mapping_range, and the
> time that you unmap them from your driver.
> 
> So in the meantime, you could have eg. a fault come in and set up a
> new page for one of the processes, and that page might even get
> exported via the same external driver. And now you have a totally
> inconsistent view.
> 
> Preventing new mappings from being set up until the old mapping is
> completely flushed is basically what we need to ensure for any sane
> TLB as far as I can tell. To do that, you'll need to make the mmap
> lock sleep, and either take mmap_sem inside it (which is a
> deadlock condition at the moment), or make ptl sleep as well. These
> are simply the locks we use to prevent that from happening, so I
> can't see how you can possibly hope to have a coherent TLB without
> invalidating inside those locks.

All of that is correct.  For XPMEM, we do not currently allow pages from
file-backed mappings to be exported, so we should never reach this condition.
It has been an issue since day 1.  We have operated with that assumption
for 6 years and have not had issues with that assumption.  The user of
xpmem is MPT and it controls the communication buffers so it is reasonable
to expect this type of behavior.

Thanks,
Robin
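The gating in Christoph's quoted changelog (a page flag set only on remotely mapped pages, so the driver's rmap callback runs only for pages in remote use) can be modelled as below. The flag name, the callback and the unmap function are hypothetical. The sketch deliberately ignores the locking questions Nick raises and shows only the flag check.

```c
/* Hypothetical page model: one flag bit marks pages that a remote
 * instance has mapped through the driver's external rmap. */
#define EX_PG_EXTERNAL_RMAP (1UL << 0)

struct ex_page {
	unsigned long flags;
};

/* Counts how often the driver's rmap callback would be invoked. */
static int ex_external_callbacks;

static void ex_driver_rmap_callback(struct ex_page *page)
{
	(void)page;
	ex_external_callbacks++;
}

/* Unmap path: after the normal Linux rmaps are handled (omitted here),
 * call out to the driver only for pages flagged as remotely mapped,
 * so pages that were never exported pay no extra cost. */
static void ex_try_to_unmap(struct ex_page *page)
{
	if (page->flags & EX_PG_EXTERNAL_RMAP)
		ex_driver_rmap_callback(page);
}
```

An unexported page never reaches the driver; an exported one triggers exactly one callback per unmap.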

Re: Improve init/Kconfig help descriptions [PATCH 6/9]

2008-02-19 Thread Paul Menage

On Feb 19, 2008 6:54 PM, Nick Andrew <[EMAIL PROTECTED]> wrote:
>
> config CGROUPS
> bool "Control Group support"
> help
>   Control Groups enables processes to be tracked and grouped
>   into "cgroups". This enables you, for example, to associate
>   cgroups with certain CPU sets using "cpusets".
>
>   When enabled, a new filesystem type "cgroup" is available
>   and can be mounted to control cpusets and other
>   resource/behaviour controllers.
>
>   See  for more information.
>
>   If unsure, say N.
>
>
> I don't think that description is as clear as it could be. From
> the non-kernel-developer point of view, that is.

Originally this wasn't a user-selectable config value; it was
auto-selected by any subsystem that needed it. I think that was nicer
from a user-experience point of view, and it would eliminate the need for
this documentation, but there were concerns that it triggered unspecified
brokenness in the Kbuild system.

>
> Re "other resource/behaviour controllers", what in particular?
> I take it that our current controllers are cpusets, scheduler,
> CPU accounting and Resource counters?

Resource counters aren't a resource controller, they're a helper
library. The others are good examples, as is the memory controller
that's just been added to 2.6.25.

Paul

[PATCH] (linus git 02/19/08) Smack update for file capabilities

2008-02-19 Thread Casey Schaufler


From: Casey Schaufler <[EMAIL PROTECTED]>

Update the Smack LSM to allow the registration of the capability
"module" as a secondary LSM. Integrate the new hooks required for
file based capabilities.

Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>

---

security/smack/smack_lsm.c |   87 +--
1 file changed, 74 insertions(+), 13 deletions(-)

diff -uprN -X linux-2.6.25-g0219-precap/Documentation/dontdiff linux-2.6.25-g0219-precap/security/smack/smack_lsm.c linux-2.6.25-g0219/security/smack/smack_lsm.c
--- linux-2.6.25-g0219-precap/security/smack/smack_lsm.c   2008-02-19 10:15:30.0 -0800
+++ linux-2.6.25-g0219/security/smack/smack_lsm.c   2008-02-19 09:24:19.0 -0800
@@ -584,14 +584,20 @@ static int smack_inode_getattr(struct vf
static int smack_inode_setxattr(struct dentry *dentry, char *name,
void *value, size_t size, int flags)
{
-   if (!capable(CAP_MAC_ADMIN)) {
-   if (strcmp(name, XATTR_NAME_SMACK) == 0 ||
-   strcmp(name, XATTR_NAME_SMACKIPIN) == 0 ||
-   strcmp(name, XATTR_NAME_SMACKIPOUT) == 0)
-   return -EPERM;
-   }
+   int rc = 0;
+
+   if (strcmp(name, XATTR_NAME_SMACK) == 0 ||
+   strcmp(name, XATTR_NAME_SMACKIPIN) == 0 ||
+   strcmp(name, XATTR_NAME_SMACKIPOUT) == 0) {
+   if (!capable(CAP_MAC_ADMIN))
+   rc = -EPERM;
+   } else
+   rc = cap_inode_setxattr(dentry, name, value, size, flags);
+
+   if (rc == 0)
+   rc = smk_curacc(smk_of_inode(dentry->d_inode), MAY_WRITE);

-   return smk_curacc(smk_of_inode(dentry->d_inode), MAY_WRITE);
+   return rc;
}

/**
@@ -658,10 +664,20 @@ static int smack_inode_getxattr(struct d
 */
static int smack_inode_removexattr(struct dentry *dentry, char *name)
{
-   if (strcmp(name, XATTR_NAME_SMACK) == 0 && !capable(CAP_MAC_ADMIN))
-   return -EPERM;
+   int rc = 0;

-   return smk_curacc(smk_of_inode(dentry->d_inode), MAY_WRITE);
+   if (strcmp(name, XATTR_NAME_SMACK) == 0 ||
+   strcmp(name, XATTR_NAME_SMACKIPIN) == 0 ||
+   strcmp(name, XATTR_NAME_SMACKIPOUT) == 0) {
+   if (!capable(CAP_MAC_ADMIN))
+   rc = -EPERM;
+   } else
+   rc = cap_inode_removexattr(dentry, name);
+
+   if (rc == 0)
+   rc = smk_curacc(smk_of_inode(dentry->d_inode), MAY_WRITE);
+
+   return rc;
}

/**
@@ -1016,7 +1032,12 @@ static void smack_task_getsecid(struct t
 */
static int smack_task_setnice(struct task_struct *p, int nice)
{
-   return smk_curacc(p->security, MAY_WRITE);
+   int rc;
+
+   rc = cap_task_setnice(p, nice);
+   if (rc == 0)
+   rc = smk_curacc(p->security, MAY_WRITE);
+   return rc;
}

/**
@@ -1028,7 +1049,12 @@ static int smack_task_setnice(struct tas
 */
static int smack_task_setioprio(struct task_struct *p, int ioprio)
{
-   return smk_curacc(p->security, MAY_WRITE);
+   int rc;
+
+   rc = cap_task_setioprio(p, ioprio);
+   if (rc == 0)
+   rc = smk_curacc(p->security, MAY_WRITE);
+   return rc;
}

/**
@@ -1053,7 +1079,12 @@ static int smack_task_getioprio(struct t
static int smack_task_setscheduler(struct task_struct *p, int policy,
   struct sched_param *lp)
{
-   return smk_curacc(p->security, MAY_WRITE);
+   int rc;
+
+   rc = cap_task_setscheduler(p, policy, lp);
+   if (rc == 0)
+   rc = smk_curacc(p->security, MAY_WRITE);
+   return rc;
}

/**
@@ -1093,6 +1124,11 @@ static int smack_task_movememory(struct 
static int smack_task_kill(struct task_struct *p, struct siginfo *info,
   int sig, u32 secid)
{
+   int rc;
+
+   rc = cap_task_kill(p, info, sig, secid);
+   if (rc != 0)
+   return rc;
/*
 * Special cases where signals really ought to go through
 * in spite of policy. Stephen Smalley suggests it may
@@ -1778,6 +1814,27 @@ static int smack_ipc_permission(struct k
return smk_curacc(isp, may);
}

+/* module stacking operations */
+
+/**
+ * smack_register_security - stack capability module
+ * @name: module name
+ * @ops: module operations - ignored
+ *
+ * Allow the capability module to register.
+ */
+static int smack_register_security(const char *name,
+  struct security_operations *ops)
+{
+   if (strcmp(name, "capability") != 0)
+   return -EINVAL;
+
+   printk(KERN_INFO "%s:  Registering secondary module %s\n",
+  __func__, name);
+
+   return 0;
+}
+
/**
 * smack_d_instantiate - Make sure the blob is correct on an inode
 * @opt_dentry: unused
@@ -2412,6 +2469,8 @@ static struct security_operations smack_
.inode_post_setxattr =  smack_inode_post_setxattr,
.inode_getxattr =

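The patched Smack hooks above all follow one pattern. They consult the capability module first, and they run the Smack label check only if that step returned 0. The sketch below models the smack_inode_removexattr() ordering in userspace. It is a minimal sketch under stated assumptions: the struct and helper names are hypothetical, plain integers stand in for capable(CAP_MAC_ADMIN), cap_inode_removexattr() and smk_curacc(), and the literal attribute strings are assumed to be what XATTR_NAME_SMACK, XATTR_NAME_SMACKIPIN and XATTR_NAME_SMACKIPOUT expand to.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical inputs standing in for the kernel calls in the patch:
 * capable(CAP_MAC_ADMIN), cap_inode_removexattr() and smk_curacc(). */
struct xattr_check_inputs {
    int has_mac_admin;   /* would capable(CAP_MAC_ADMIN) succeed? */
    int cap_rc;          /* result the capability module would return */
    int smack_rc;        /* result the Smack label check would return */
};

/* Assumed expansions of XATTR_NAME_SMACK, _SMACKIPIN and _SMACKIPOUT. */
static int is_smack_xattr(const char *name)
{
    return strcmp(name, "security.SMACK64") == 0 ||
           strcmp(name, "security.SMACK64IPIN") == 0 ||
           strcmp(name, "security.SMACK64IPOUT") == 0;
}

/* Mirrors the order of checks in the patched smack_inode_removexattr():
 * Smack's own attributes need CAP_MAC_ADMIN, every other attribute is
 * first handed to the capability module, and the Smack access check
 * runs only if the earlier step returned 0. */
static int removexattr_sketch(const char *name,
                              const struct xattr_check_inputs *in)
{
    int rc = 0;

    assert(name != NULL && in != NULL);
    if (is_smack_xattr(name)) {
        if (!in->has_mac_admin)
            rc = -EPERM;
    } else {
        rc = in->cap_rc;
    }

    if (rc == 0)
        rc = in->smack_rc;
    return rc;
}
```

The first non-zero result wins, so a denial from the capability module can never be overridden by a permissive Smack label.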
Re: [PATCH 1/7] cgroup: fix and update documentation

2008-02-19 Thread Paul Menage

On Feb 18, 2008 12:39 AM, Li Zefan <[EMAIL PROTECTED]> wrote:
> Misc fixes and updates, make the doc consistent with current
> cgroup implementation.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

Thanks for these cleanups.

Paul

> ---
>  Documentation/cgroups.txt |   66 ++--
>  1 files changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/cgroups.txt b/Documentation/cgroups.txt
> index 42d7c4c..31d12e2 100644
>
> --- a/Documentation/cgroups.txt
> +++ b/Documentation/cgroups.txt
> @@ -28,7 +28,7 @@ CONTENTS:
>  4. Questions
>
>  1. Control Groups
> -==
> +=
>
>  1.1 What are cgroups ?
>  --
> @@ -143,10 +143,10 @@ proliferation of such cgroups.
>
>  Also lets say that the administrator would like to give enhanced network
>  access temporarily to a student's browser (since it is night and the user
> -wants to do online gaming :)  OR give one of the students simulation
> +wants to do online gaming :))  OR give one of the students simulation
>  apps enhanced CPU power,
>
> -With ability to write pids directly to resource classes, its just a
> +With ability to write pids directly to resource classes, it's just a
>  matter of :
>
> # echo pid > /mnt/network//tasks
> @@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup file system
>  containing the following files describing that cgroup:
>
>   - tasks: list of tasks (by pid) attached to that cgroup
> - - notify_on_release flag: run /sbin/cgroup_release_agent on exit?
> + - releasable flag: cgroup currently removeable?
> + - notify_on_release flag: run the release agent on exit?
> + - release_agent: the path to use for release notifications (this file
> +   exists in the top cgroup only)
>
>  Other subsystems such as cpusets may add additional files in each
> -cgroup dir
> +cgroup dir.
>
>  New cgroups are created using the mkdir system call or shell
>  command.  The properties of a cgroup, such as its flags, are
> @@ -257,7 +260,7 @@ performance.
>  To allow access from a cgroup to the css_sets (and hence tasks)
>  that comprise it, a set of cg_cgroup_link objects form a lattice;
>  each cg_cgroup_link is linked into a list of cg_cgroup_links for
> -a single cgroup on its cont_link_list field, and a list of
> +a single cgroup on its cgrp_link_list field, and a list of
>  cg_cgroup_links for a single css_set on its cg_link_list.
>
>  Thus the set of tasks in a cgroup can be listed by iterating over
> @@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code.
>  1.4 What does notify_on_release do ?
>  
>
> -*** notify_on_release is disabled in the current patch set. It will be
> -*** reactivated in a future patch in a less-intrusive manner
> -
>  If the notify_on_release flag is enabled (1) in a cgroup, then
>  whenever the last task in the cgroup leaves (exits or attaches to
>  some other cgroup) and the last child cgroup of that cgroup
> @@ -360,8 +360,8 @@ Now you want to do something with this cgroup.
>
>  In this directory you can find several files:
>  # ls
> -notify_on_release release_agent tasks
> -(plus whatever files are added by the attached subsystems)
> +notify_on_release releasable tasks
> +(plus whatever files added by the attached subsystems)
>
>  Now attach your shell to this cgroup:
>  # /bin/echo $$ > tasks
> @@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the cgroup system.
>  Other fields in the cgroup_subsys object include:
>
>  - subsys_id: a unique array index for the subsystem, indicating which
> -  entry in cgroup->subsys[] this subsystem should be
> -  managing. Initialized by cgroup_register_subsys(); prior to this
> -  it should be initialized to -1
> +  entry in cgroup->subsys[] this subsystem should be managing.
>
> -- hierarchy: an index indicating which hierarchy, if any, this
> -  subsystem is currently attached to. If this is -1, then the
> -  subsystem is not attached to any hierarchy, and all tasks should be
> -  considered to be members of the subsystem's top_cgroup. It should
> -  be initialized to -1.
> +- name: should be initialized to a unique subsystem name. Should be
> +  no longer than MAX_CGROUP_TYPE_NAMELEN.
>
> -- name: should be initialized to a unique subsystem name prior to
> -  calling cgroup_register_subsystem. Should be no longer than
> -  MAX_CGROUP_TYPE_NAMELEN
> +- early_init: indicate if the subsystem needs early initialization
> +  at system boot.
>
>  Each cgroup object created by the system has an array of pointers,
>  indexed by subsystem id; this pointer is entirely managed by the
> @@ -434,8 +428,6 @@ situation.
>  See kernel/cgroup.c for more details.
>
>  Subsystems can take/release the cgroup_mutex via the functions
> -cgroup_lock()/cgroup_unlock(), and can
> -take/release the callback_mutex via the functions
>

Re: [PATCH 4/7] cgroup: fix memory leak in cgroup_get_sb()

2008-02-19 Thread Paul Menage

On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> opts.release_agent is not kfree()ed in all necessary places.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

Good catch, although hopefully something that would be extremely rare
in practice.

Thanks,

Paul

> ---
>  kernel/cgroup.c |5 -
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 0c35022..aa76bbd 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -961,8 +961,11 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
> }
>
> root = kzalloc(sizeof(*root), GFP_KERNEL);
> -   if (!root)
> +   if (!root) {
> +   if (opts.release_agent)
> +   kfree(opts.release_agent);
> return -ENOMEM;
> +   }
>
> init_cgroup_root(root);
> root->subsys_bits = opts.subsys_bits;
> --
> 1.5.4.rc3
>
>

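The leak fixed here is the classic early-return case: opts.release_agent is heap-allocated while the mount options are parsed, and returning -ENOMEM when the root allocation fails dropped the only pointer to it. Below is a hypothetical userspace model of that path, with malloc()/free() wrapped in a counter so the leak becomes observable. The struct and function names are illustrative assumptions, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of the option/root handling in
 * cgroup_get_sb(); only the release_agent field mirrors the original. */
struct mount_opts {
    char *release_agent;          /* heap copy of the option, or NULL */
};

struct root_sketch {
    char *release_agent;          /* ownership moves here on success */
};

static int live_allocs;           /* outstanding heap objects, for leak checks */

static void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    /* Like free() and kfree(), accept NULL and do nothing. */
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* simulate_oom makes the root allocation fail, like kzalloc() returning NULL. */
static int get_sb_sketch(const char *agent, int simulate_oom,
                         struct root_sketch **out)
{
    struct mount_opts opts = { NULL };
    struct root_sketch *root;

    assert(out != NULL);
    if (agent) {
        opts.release_agent = tracked_malloc(strlen(agent) + 1);
        if (!opts.release_agent)
            return -ENOMEM;
        strcpy(opts.release_agent, agent);
    }

    root = simulate_oom ? NULL : tracked_malloc(sizeof(*root));
    if (!root) {
        /* The fix: release the option string on this early return. */
        tracked_free(opts.release_agent);
        return -ENOMEM;
    }

    root->release_agent = opts.release_agent;   /* root now owns it */
    *out = root;
    return 0;
}

static void put_root_sketch(struct root_sketch *root)
{
    tracked_free(root->release_agent);
    tracked_free(root);
}
```

Because free(), like the kernel's kfree(), ignores a NULL pointer, an unconditional release on the error path would be enough. The `if (opts.release_agent)` guard in the patch is harmless but not required.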
Re: [PATCH 6/7] cgroup: remove duplicate code in find_css_set()

2008-02-19 Thread Paul Menage

On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> The list head res->tasks gets initialized twice in find_css_set().
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

> ---
>  kernel/cgroup.c |1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e8c8e58..71cf961 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -473,7 +473,6 @@ static struct css_set *find_css_set(
> /* Link this cgroup group into the list */
> list_add(>list, _css_set.list);
> css_set_count++;
> -   INIT_LIST_HEAD(>tasks);
> write_unlock(_set_lock);
>
> return res;
> --
> 1.5.4.rc3
>
>

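Initializing a list head only points it back at itself. Running that twice on a head that is still empty therefore changes nothing, so the second INIT_LIST_HEAD() in find_css_set() was presumably just redundant, assuming no task is linked onto res->tasks between the two calls. The sketch below uses a small userspace re-implementation (hypothetical, modelled on but not identical to <linux/list.h>) to show when a second initialization would matter: once entries have been linked in, re-initializing the head silently orphans them.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace re-implementation of an intrusive circular list,
 * modelled on (but not identical to) the kernel's <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail_sketch(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static int list_empty_sketch(const struct list_head *h)
{
    return h->next == h;
}

/* Count the entries reachable from the head. */
static size_t list_count(const struct list_head *h)
{
    size_t n = 0;
    const struct list_head *p;

    assert(h != NULL);
    for (p = h->next; p != h; p = p->next)
        n++;
    return n;
}
```

In the usage below, the duplicate initialization on an empty head is harmless. Re-initializing after two entries were added leaves them still pointing at each other but no longer reachable from the head.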
Re: [PATCH v2][POWERPC] Fix initial lmb add region with a non-zero base

2008-02-19 Thread Kumar Gala

On Feb 19, 2008, at 6:30 PM, David Miller wrote:

> From: Kumar Gala <[EMAIL PROTECTED]>
> Date: Tue, 19 Feb 2008 13:51:37 -0600 (CST)
>
>> If we add to an empty lmb region with a non-zero base we will not
>> coalesce the number of regions down to one.  This causes problems on
>> ppc32 for the memory region as it's assumed to only have one region.
>>
>> We can fix this easily by causing the initial add to replace the
>> dummy region.
>>
>> Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>
>> ---
>>
>> Fix a bug the initial patch introduced: if we have a region that
>> gets added at the beginning of the list, we wouldn't actually add it.
>>
>> Dave, can you replace the patch in your tree with this one?
>
> I think my tree has already been, or will soon be, pulled in, so
> I don't want to rebase it.
>
> Why don't you simply send me the relative bug fix instead?

np.  Are we trying to get this into 2.6.25 or .26?

- k

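The coalescing problem Kumar describes can be reproduced with a simplified model that makes three assumptions. The region table starts as a single zero-sized dummy entry at base 0. An add is merged only with an adjacent region. Anything else is inserted as a new entry. Under those assumptions, a first add at a non-zero base leaves two entries, while replacing the dummy on the first add leaves one. The types, MAX_REGIONS and the replace_dummy switch are hypothetical; this sketches the idea and is not lib/lmb.c.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 8   /* hypothetical table size for the sketch */

struct region_sketch { uint64_t base, size; };

struct region_set {
    unsigned long cnt;
    struct region_sketch region[MAX_REGIONS];
};

/* Start with one zero-sized dummy region: the "empty" state. */
static void region_set_init(struct region_set *s)
{
    s->cnt = 1;
    s->region[0].base = 0;
    s->region[0].size = 0;
}

/* Returns 0 on success, -1 if the table is full. replace_dummy selects
 * the fixed (1) or the original (0) behaviour. */
static int region_add(struct region_set *s, uint64_t base, uint64_t size,
                      int replace_dummy)
{
    unsigned long i, j;

    assert(s->cnt >= 1);
    /* The fix: the very first add overwrites the dummy entry. */
    if (replace_dummy && s->cnt == 1 && s->region[0].size == 0) {
        s->region[0].base = base;
        s->region[0].size = size;
        return 0;
    }

    /* Coalesce with an adjacent region if possible. */
    for (i = 0; i < s->cnt; i++) {
        struct region_sketch *r = &s->region[i];
        if (base + size == r->base) {          /* new block just below r */
            r->base = base;
            r->size += size;
            return 0;
        }
        if (r->base + r->size == base) {       /* new block just above r */
            r->size += size;
            return 0;
        }
    }

    if (s->cnt >= MAX_REGIONS)
        return -1;

    /* Otherwise insert, keeping the table sorted by base. */
    for (i = 0; i < s->cnt && s->region[i].base < base; i++)
        ;
    for (j = s->cnt; j > i; j--)
        s->region[j] = s->region[j - 1];
    s->region[i].base = base;
    s->region[i].size = size;
    s->cnt++;
    return 0;
}
```

A first add at base 0 happens to be "adjacent" to the {0, 0} dummy and merges with it. That is why the problem only shows up for a non-zero base.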
Re: [PATCH 2/7] cgroup: fix comments

2008-02-19 Thread Paul Menage

On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> fix:
> - comments about need_forkexit_callback
> - comments about release agent
> - typo and comment style, etc.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>
> ---
>  include/linux/cgroup.h |2 +-
>  kernel/cgroup.c|   44 +---
>  2 files changed, 22 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index ff9055f..2ebf7af 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -175,7 +175,7 @@ struct css_set {
>   *
>   *
>   * When reading/writing to a file:
> - * - the cgroup to use in file->f_dentry->d_parent->d_fsdata
> + * - the cgroup to use is file->f_dentry->d_parent->d_fsdata
>   * - the 'cftype' of the file is file->f_dentry->d_fsdata
>   */
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 4766bb6..0c35022 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -113,9 +113,9 @@ static int root_count;
>  #define dummytop (_cgroup)
>
>  /* This flag indicates whether tasks in the fork and exit paths should
> - * take callback_mutex and check for fork/exit handlers to call. This
> - * avoids us having to do extra work in the fork/exit path if none of the
> - * subsystems need to be called.
> + * check for fork/exit handlers to call. This avoids us having to do
> + * extra work in the fork/exit path if none of the subsystems need to
> + * be called.
>   */
>  static int need_forkexit_callback;
>
> @@ -507,8 +507,8 @@ static struct css_set *find_css_set(
>   * critical pieces of code here.  The exception occurs on cgroup_exit(),
>   * when a task in a notify_on_release cgroup exits.  Then cgroup_mutex
>   * is taken, and if the cgroup count is zero, a usermode call made
> - * to /sbin/cgroup_release_agent with the name of the cgroup (path
> - * relative to the root of cgroup file system) as the argument.
> + * to the release agent with the name of the cgroup (path relative to
> + * the root of cgroup file system) as the argument.
>   *
>   * A cgroup can only be deleted if both its 'count' of using tasks
>   * is zero, and its list of 'children' cgroups is empty.  Since all
> @@ -521,7 +521,7 @@ static struct css_set *find_css_set(
>   *
>   * The need for this exception arises from the action of
>   * cgroup_attach_task(), which overwrites one tasks cgroup pointer with
> - * another.  It does so using cgroup_mutexe, however there are
> + * another.  It does so using cgroup_mutex, however there are
>   * several performance critical places that need to reference
>   * task->cgroup without the expense of grabbing a system global
>   * mutex.  Therefore except as noted below, when dereferencing or, as
> @@ -1192,7 +1192,7 @@ static void get_first_subsys(const struct cgroup *cgrp,
>   * Attach task 'tsk' to cgroup 'cgrp'
>   *
>   * Call holding cgroup_mutex.  May take task_lock of
> - * the task 'pid' during call.
> + * the task 'tsk' during call.
>   */
>  int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>  {
> @@ -1584,12 +1584,11 @@ static int cgroup_create_file(struct dentry *dentry, int mode,
>  }
>
>  /*

I think that docbook-style function comments need /** at the start of
the comment block.

Thanks,

Paul

Re: [PATCH 1/5] signal(x86_32): Improve the signal stack overflow check

2008-02-19 Thread Shi Weihua

Roland McGrath wrote:
>> I spent some time reading your mail carefully and digging into the code again.
>>
>> And yes, you are right. It's possible that SA_ONSTACK has been cleared
>> before the second signal on the same stack comes.
> 
> It's not necessary for SA_ONSTACK to have "been cleared", by which I assume
> you mean a sigaction call with SA_ONSTACK not set in sa_flags.  That is
> indeed possible, but it's not the only case your patch broke.  It can just
> be a different signal whose sigaction never had SA_ONSTACK, when you are
> still on the signal stack from an earlier signal that did have SA_ONSTACK.

Thanks for your explanation.

> 
>> So this patch is wrong  :( . I will revise the other 4 patches.
> 
> For 2 and 3, I would rather just wait until we unify signal.c anyway.

Ok. I see.


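Roland's point can be observed from userspace. Once a handler installed with SA_ONSTACK is running on the alternate stack, a second signal whose handler lacks SA_ONSTACK is still delivered on that stack. Without SA_ONSTACK, the new frame is placed below the current user stack pointer, and that pointer is already inside the alternate stack. The following sketch assumes Linux with glibc. The buffer size, the choice of SIGUSR1/SIGUSR2 and the function names are illustrative. It demonstrates the behaviour, not the kernel's overflow check itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static char alt_stack[1 << 16];                  /* 64 KiB alternate stack */
static volatile sig_atomic_t second_on_alt = -1; /* -1: handler never ran */

/* SIGUSR2 is installed WITHOUT SA_ONSTACK; record whether its frame
 * nonetheless landed inside alt_stack. */
static void second_handler(int sig)
{
    char probe;
    uintptr_t p = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_stack;

    (void)sig;
    second_on_alt = (p >= lo && p < lo + sizeof(alt_stack));
}

/* SIGUSR1 runs on the alternate stack (SA_ONSTACK) and raises SIGUSR2
 * while still executing there. */
static void first_handler(int sig)
{
    (void)sig;
    raise(SIGUSR2);
}

/* Returns 1 if the SA_ONSTACK-less handler ran on the alternate stack,
 * 0 if it ran elsewhere, -1 on setup failure or if it never ran. */
static int nested_signal_uses_alt_stack(void)
{
    stack_t ss;
    struct sigaction sa1, sa2;

    assert(sizeof(alt_stack) >= MINSIGSTKSZ);
    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof(alt_stack);
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa1, 0, sizeof(sa1));
    sa1.sa_handler = first_handler;
    sa1.sa_flags = SA_ONSTACK;
    sigemptyset(&sa1.sa_mask);

    memset(&sa2, 0, sizeof(sa2));
    sa2.sa_handler = second_handler;
    sa2.sa_flags = 0;                 /* deliberately no SA_ONSTACK */
    sigemptyset(&sa2.sa_mask);

    if (sigaction(SIGUSR1, &sa1, NULL) != 0 ||
        sigaction(SIGUSR2, &sa2, NULL) != 0)
        return -1;

    raise(SIGUSR1);
    return second_on_alt;
}
```

On such a system nested_signal_uses_alt_stack() should return 1. This is why a check keyed only on the SA_ONSTACK flag of the signal being delivered cannot tell whether the frame lands on the alternate stack.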
Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-02-19 Thread Robin Holt

On Wed, Feb 20, 2008 at 02:00:38AM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 20, 2008 at 10:08:49AM +1100, Nick Piggin wrote:
> > You can't sleep inside rcu_read_lock()!
> > 
> > I must say that for a patch that is up to v8 or whatever and is
> > posted twice a week to such a big cc list, it is kind of slack to
> > not even test it and expect other people to review it.
> 
> Well, xpmem requirements are complex. As a side effect of the
> simplicity of my approach, my patch has been 100% safe since #v1. Now it
> also works for GRU and it cluster invalidates.
> 
> > Also, what we are going to need here are not skeleton drivers
> > that just do all the *easy* bits (of registering their callbacks),
> > but actual fully working examples that do everything that any
> > real driver will need to do. If not for the sanity of the driver
> 
> I have a fully working scenario for my patch; in fact I didn't post the
> mmu notifier patch until I got KVM to swap 100% reliably to be sure I
> would post something that works well. mmu notifiers are already used
> in KVM for:
> 
> 1) 100% reliable and efficient swapping of guest physical memory
> 2) copy-on-writes of writeprotect faults after ksm page sharing of guest
>physical memory
> 3) ballooning using madvise to give the guest memory back to the host
> 
> My implementation is the most handy because it requires zero changes
> to the ksm code too (no explicit mmu notifier calls after
> ptep_clear_flush) and it's also 100% safe (no mess with schedules over
> rcu_read_lock), no "atomic" parameters, and it doesn't open a window
> where sptes have a view on older pages and linux pte has view on newer
> pages (this can happen with remap_file_pages with my KVM swapping
> patch to use V8 Christoph's patch).
> 
> > Also, how do you resolve the case where you are not allowed to sleep?
> > I would have thought either you have to handle it, in which case nobody
> > needs to sleep; or you can't handle it, in which case the code is
> > broken.
> 
> I also asked exactly this, glad you reasked this too.

Currently, we BUG_ON having a PFN in our tables and not being able
to sleep.  These are mappings which MPT has never supported in the past
and XPMEM was already not allowing page faults for VMAs which are not
anonymous, so it should never happen.  If the file-backed operations can
ever get changed to allow for sleeping and a customer has a need for it,
we would need to change XPMEM to allow those types of faults to succeed.

Thanks,
Robin

Re: [PATCH 5/7] cgroup: fix subsys bitops

2008-02-19 Thread Paul Menage

On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> Cgroup uses unsigned long for subsys bitops, not unsigned long long.
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

> ---
>  kernel/cgroup.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index aa76bbd..e8c8e58 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -320,7 +320,7 @@ static struct css_set *find_existing_css_set(
> /* Built the set of subsystem state objects that we want to
>  * see in the new css_set */
> for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> -   if (root->subsys_bits & (1ull << i)) {
> +   if (root->subsys_bits & (1UL << i)) {
> /* Subsystem is in this hierarchy. So we want
>  * the subsystem state from the new
>  * cgroup */
> @@ -696,7 +696,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
> added_bits = final_bits & ~root->actual_subsys_bits;
> /* Check that any added subsystems are currently free */
> for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> -   unsigned long long bit = 1ull << i;
> +   unsigned long bit = 1UL << i;
> struct cgroup_subsys *ss = subsys[i];
> if (!(bit & added_bits))
> continue;
> --
> 1.5.4.rc3
>
>

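The cgroup subsystem masks (subsys_bits, actual_subsys_bits and the values derived from them) are unsigned long. The patch therefore builds each probe bit with 1UL instead of a 64-bit 1ull, which on 32-bit builds widened every test to unsigned long long. Below is a minimal sketch of the added-subsystems scan in rebind_subsystems(). SUBSYS_COUNT_SKETCH is a hypothetical stand-in for CGROUP_SUBSYS_COUNT, and the "is this subsystem free" check is left as a comment.

```c
#include <assert.h>
#include <limits.h>

#define SUBSYS_COUNT_SKETCH 4   /* hypothetical stand-in for CGROUP_SUBSYS_COUNT */

/* Return the mask of subsystems requested in final_bits but not yet bound
 * (final_bits & ~actual_bits), probing each bit with an unsigned long
 * shift so the constant has the same type as the mask. */
static unsigned long added_subsys(unsigned long final_bits,
                                  unsigned long actual_bits)
{
    unsigned long added = final_bits & ~actual_bits;
    unsigned long seen = 0;
    int i;

    /* Every subsystem index must fit in one unsigned long mask. */
    assert(SUBSYS_COUNT_SKETCH <= (int)(sizeof(unsigned long) * CHAR_BIT));
    for (i = 0; i < SUBSYS_COUNT_SKETCH; i++) {
        unsigned long bit = 1UL << i;
        if (!(bit & added))
            continue;
        /* Real code would check here that subsystem i is not bound elsewhere. */
        seen |= bit;
    }
    return seen;
}
```

Bits above the subsystem count are never probed, so they never appear in the result.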
Re: [2.6 patch] scsi/qla4xxx/ql4_isr.c: remove dead code

2008-02-19 Thread Andrew Vasquez

On Tue, 19 Feb 2008, James Bottomley wrote:

> On Tue, 2008-02-19 at 18:35 -0800, Andrew Vasquez wrote:
> > On Tue, 19 Feb 2008, James Bottomley wrote:
> > 
> > > On Tue, 2008-02-19 at 21:29 +0200, Adrian Bunk wrote:
> > > > This patch removes dead code spotted by the Coverity checker.
> > > > 
> > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > > 
> > > > ---
> > > > 
> > > >  drivers/scsi/qla4xxx/ql4_isr.c |   18 +-
> > > >  1 file changed, 1 insertion(+), 17 deletions(-)
> > > > 
> > > > --- linux-2.6/drivers/scsi/qla4xxx/ql4_isr.c.old	2008-02-19 20:29:16.0 +0200
> > > > +++ linux-2.6/drivers/scsi/qla4xxx/ql4_isr.c	2008-02-19 20:30:37.0 +0200
> > > > @@ -91,38 +91,22 @@ static void qla4xxx_status_entry(struct 
> > > > if (scsi_status == 0) {
> > > > cmd->result = DID_OK << 16;
> > > > break;
> > > > }
> > > >  
> > > > if (sts_entry->iscsiFlags & ISCSI_FLAG_RESIDUAL_OVER) {
> > > > cmd->result = DID_ERROR << 16;
> > > > break;
> > > > }
> > > >  
> > > > -   if (sts_entry->iscsiFlags _FLAG_RESIDUAL_UNDER) {
> > > > +   if (sts_entry->iscsiFlags _FLAG_RESIDUAL_UNDER)
> > > > scsi_set_resid(cmd, residual);
> > > > -   if (!scsi_status && ((scsi_bufflen(cmd) - residual) <
> > > > -   cmd->underflow)) {
> > > > -
> > > > -   cmd->result = DID_ERROR << 16;
> > > > -
> > > > -   DEBUG2(printk("scsi%ld:%d:%d:%d: %s: "
> > > > -   "Mid-layer Data underrun0, "
> > > > -   "xferlen = 0x%x, "
> > > > -   "residual = 0x%x\n", ha->host_no,
> > > > -   cmd->device->channel,
> > > > -   cmd->device->id,
> > > > -   cmd->device->lun, __func__,
> > > > -   scsi_bufflen(cmd), residual));
> > > > -   break;
> > > > -   }
> > > > -   }
> > > 
> > > This code doesn't look dead to me, it looks to be enforcing
> > > cmd->underrun if set ... what makes the coverity checker think it can
> > > never be executed?
> > 
> > Hmm, guess it's the earlier 'if (scsi_status == 0)' check a few lines
> > up...  Dave S., can you take a look at this...  Thanks, av
> 
> Ah, so the !scsi_status is wrong; it was supposed to be scsi_status !=
> 0 ... and even then it can just be dropped.

My guess is that the check should have been written as:

...
if (sts_entry->iscsiFlags _FLAG_RESIDUAL_UNDER)
scsi_set_resid(cmd, residual);
if ((scsi_bufflen(cmd) - residual) < cmd->underflow) {
...

It looks to be a logic error made while porting from qla2xxx, where
scsi_status during CS_COMPLETE is the full 16-bit status (high-byte is
transport, low-byte SCSI status) from the FCP_RSP frame (not so
in iSCSI, where it's just the SCSI-status) and the residual check
in qla_isr.c::qla2x00_status_entry() looks like:

if (!lscsi_status &&
((unsigned)(scsi_bufflen(cp) - resid) <
 cp->underflow)) {
...

I'll defer to Dave S. for verification.

Re: [PATCH 3/7] cgroup: clean up cgroup.h

2008-02-19 Thread Paul Menage

On Feb 17, 2008 9:49 PM, Li Zefan <[EMAIL PROTECTED]> wrote:
> - replace old name 'cont' with 'cgrp' (Paul Menage did this cleanup for
>   cgroup.c in commit bd89aabc6761de1c35b154fe6f914a445d301510)
> - remove a duplicate declaration of cgroup_path()
>
> Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

Acked-by: Paul Menage <[EMAIL PROTECTED]>

> ---
>  include/linux/cgroup.h |   48 +++-
>  1 files changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 2ebf7af..028ba3b 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -186,15 +186,15 @@ struct cftype {
> char name[MAX_CFTYPE_NAME];
> int private;
> int (*open) (struct inode *inode, struct file *file);
> -   ssize_t (*read) (struct cgroup *cont, struct cftype *cft,
> +   ssize_t (*read) (struct cgroup *cgrp, struct cftype *cft,
>  struct file *file,
>  char __user *buf, size_t nbytes, loff_t *ppos);
> /*
>  * read_uint() is a shortcut for the common case of returning a
>  * single integer. Use it in place of read()
>  */
> -   u64 (*read_uint) (struct cgroup *cont, struct cftype *cft);
> -   ssize_t (*write) (struct cgroup *cont, struct cftype *cft,
> +   u64 (*read_uint) (struct cgroup *cgrp, struct cftype *cft);
> +   ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft,
>   struct file *file,
>   const char __user *buf, size_t nbytes, loff_t *ppos);
>
> @@ -203,7 +203,7 @@ struct cftype {
>  * a single integer (as parsed by simple_strtoull) from
>  * userspace. Use in place of write(); return 0 or error.
>  */
> -   int (*write_uint) (struct cgroup *cont, struct cftype *cft, u64 val);
> +   int (*write_uint) (struct cgroup *cgrp, struct cftype *cft, u64 val);
>
> int (*release) (struct inode *inode, struct file *file);
>  };
> @@ -218,41 +218,41 @@ struct cgroup_scanner {
>
>  /* Add a new file to the given cgroup directory. Should only be
>   * called by subsystems from within a populate() method */
> -int cgroup_add_file(struct cgroup *cont, struct cgroup_subsys *subsys,
> +int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
>const struct cftype *cft);
>
>  /* Add a set of new files to the given cgroup directory. Should
>   * only be called by subsystems from within a populate() method */
> -int cgroup_add_files(struct cgroup *cont,
> +int cgroup_add_files(struct cgroup *cgrp,
> struct cgroup_subsys *subsys,
> const struct cftype cft[],
> int count);
>
> -int cgroup_is_removed(const struct cgroup *cont);
> +int cgroup_is_removed(const struct cgroup *cgrp);
>
> -int cgroup_path(const struct cgroup *cont, char *buf, int buflen);
> +int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
>
> -int cgroup_task_count(const struct cgroup *cont);
> +int cgroup_task_count(const struct cgroup *cgrp);
>
>  /* Return true if the cgroup is a descendant of the current cgroup */
> -int cgroup_is_descendant(const struct cgroup *cont);
> +int cgroup_is_descendant(const struct cgroup *cgrp);
>
>  /* Control Group subsystem type. See Documentation/cgroups.txt for details */
>
>  struct cgroup_subsys {
> struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
> - struct cgroup *cont);
> -   void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cont);
> -   void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cont);
> + struct cgroup *cgrp);
> +   void (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> +   void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> int (*can_attach)(struct cgroup_subsys *ss,
> - struct cgroup *cont, struct task_struct *tsk);
> -   void (*attach)(struct cgroup_subsys *ss, struct cgroup *cont,
> -   struct cgroup *old_cont, struct task_struct *tsk);
> + struct cgroup *cgrp, struct task_struct *tsk);
> +   void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +   struct cgroup *old_cgrp, struct task_struct *tsk);
> void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
> void (*exit)(struct cgroup_subsys *ss, struct task_struct *task);
> int (*populate)(struct cgroup_subsys *ss,
> -   struct cgroup *cont);
> -   void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cont);
> +   struct cgroup *cgrp);
> +   void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> void (*bind)(struct cgroup_subsys
