date:20071203

Re: [patch] rewrite rd

2007-12-03 Thread Rob Landley

On Monday 03 December 2007 22:26:28 Nick Piggin wrote:
> There is one slight downside -- direct block device access and filesystem
> metadata access goes through an extra copy and gets stored in RAM twice.
> However, this downside is only slight, because the real buffercache of the
> device is now reclaimable (because we're not playing crazy games with it),
> so under memory intensive situations, footprint should effectively be the
> same -- maybe even a slight advantage to the new driver because it can also
> reclaim buffer heads.

For the embedded world, initramfs has pretty much rendered initrd obsolete, 
and that was the biggest user of the ramdisk code I know of.  Beyond that, 
loopback mounts give you more flexible transient block devices than ramdisks 
do.  (In fact, ramdisks are such an amazing pain to use/size/free that if I 
really needed something like that I'd just make a loopback mount in a ramfs 
instance.)

Embedded users who still want a block interface for memory are generally 
trying to use a cramfs or squashfs image out of ROM or flash, although there 
are flash-specific filesystems for this and I dunno if they're actually 
mounting /dev/mem at an offset or something (md?  losetup -o?  Beats me, I 
haven't tried that myself yet...)

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: The use of KOBJ_NAME_LEN

2007-12-03 Thread Dave Young

On Dec 4, 2007 3:46 PM, Greg KH <[EMAIL PROTECTED]> wrote:
> On Tue, Dec 04, 2007 at 02:45:47PM +0800, Dave Young wrote:
> > Hi,
> > Does KOBJ_NAME_LEN really mean the limit of the kobject name length? It seems
> > not. And if it does, is a KOBJ_NAME_LEN of 20 enough?
>
> No, not anymore, the kobject name is totally dynamic.

Eh, why does this macro still exist? If KOBJ_NAME_LEN is really
needed, maybe it should be renamed to something else to avoid
confusion.

>
> > In kobject_set_name(), the limit is 1024. Looks like either the comment or
> > the code should be updated.
>
> Here's a patch below for the comment updating it.

Thanks
>
> I also have a patch in my -mm series that takes away the 1024 character
> limit, but for 2.6.24 we'll have to live with it :)
>
> thanks,
>
> greg k-h
>
>
> ---
>  lib/kobject.c |   12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -234,13 +234,13 @@ int kobject_register(struct kobject * ko
>
>
>  /**
> - * kobject_set_name - Set the name of an object
> - * @kobj:  object.
> - * @fmt:   format string used to build the name
> + * kobject_set_name - Set the name of a kobject
> + * @kobj: kobject to name
> + * @fmt: format string used to build the name
>   *
> - * If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated
> - * string that @kobj->k_name points to. Otherwise, use the static
> - * @kobj->name array.
> + * This sets the name of the kobject.  If you have already added the
> + * kobject to the system, you must call kobject_rename() in order to
> + * change the name of the kobject.
>   */
>  int kobject_set_name(struct kobject * kobj, const char * fmt, ...)
>  {
>

Re: [PATCH] pktcdvd : add kobject_put when kobject register fails

2007-12-03 Thread Pekka Enberg

Hi Dave,

On Dec 4, 2007 3:31 AM, Dave Young <[EMAIL PROTECTED]> wrote:
> kobject_put() should be called when the kobject register function fails, so
> that the kobj refcount drops to zero and the proper cleanup routines are
> called.

[snip]

> diff -upr linux/drivers/block/pktcdvd.c linux.new/drivers/block/pktcdvd.c
> --- linux/drivers/block/pktcdvd.c   2007-11-30 13:13:44.0 +0800
> +++ linux.new/drivers/block/pktcdvd.c   2007-11-30 13:24:08.0 +0800
> @@ -117,8 +117,10 @@ static struct pktcdvd_kobj* pkt_kobj_cre
> p->kobj.parent = parent;
> p->kobj.ktype = ktype;
> p->pd = pd;
> -   if (kobject_register(&p->kobj) != 0)
> +   if (kobject_register(&p->kobj) != 0) {
> +   kobject_put(&p->kobj);
> return NULL;

This looks wrong to me. AFAICT the only thing that can fail
kobject_register() is kobject_add() and it cleans up after itself. Am
I missing something here?

   Pekka

Re: The use of KOBJ_NAME_LEN

2007-12-03 Thread Greg KH

On Tue, Dec 04, 2007 at 01:50:53AM -0500, Robert P. J. Day wrote:
> On Tue, 4 Dec 2007, Dave Young wrote:
> 
> > Hi,
> > Does KOBJ_NAME_LEN really mean the limit of the kobject name length? It seems
> > not. And if it does, is a KOBJ_NAME_LEN of 20 enough?
> >
> > In kobject_set_name(), the limit is 1024. Looks like either the comment
> > or the code should be updated.
> >
> > /**
> >  *  kobject_set_name - Set the name of an object
> >  *  @kobj:  object.
> >  *  @fmt:   format string used to build the name
> >  *
> >  *  If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated
> >  *  string that @kobj->k_name points to. Otherwise, use the static
> >  *  @kobj->name array.
> >  */
> 
> the comment seems fairly clear -- if the name is sufficiently short,
> it's stored in the static array.  if not, then it's stored in
> dynamically allocated space.

Unfortunately, it's totally wrong: the code was updated but the comment
wasn't, sorry.  See my patch to fix this.

thanks,

greg k-h

Re: The use of KOBJ_NAME_LEN

2007-12-03 Thread Greg KH

On Tue, Dec 04, 2007 at 02:45:47PM +0800, Dave Young wrote:
> Hi,
> Does KOBJ_NAME_LEN really mean the limit of the kobject name length? It seems
> not. And if it does, is a KOBJ_NAME_LEN of 20 enough?

No, not anymore, the kobject name is totally dynamic.

> In kobject_set_name(), the limit is 1024. Looks like either the comment or
> the code should be updated.

Here's a patch below for the comment updating it.

I also have a patch in my -mm series that takes away the 1024 character
limit, but for 2.6.24 we'll have to live with it :)

thanks,

greg k-h


---
 lib/kobject.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -234,13 +234,13 @@ int kobject_register(struct kobject * ko
 
 
 /**
- * kobject_set_name - Set the name of an object
- * @kobj:  object.
- * @fmt:   format string used to build the name
+ * kobject_set_name - Set the name of a kobject
+ * @kobj: kobject to name
+ * @fmt: format string used to build the name
  *
- * If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated
- * string that @kobj->k_name points to. Otherwise, use the static 
- * @kobj->name array.
+ * This sets the name of the kobject.  If you have already added the
+ * kobject to the system, you must call kobject_rename() in order to
+ * change the name of the kobject.
  */
 int kobject_set_name(struct kobject * kobj, const char * fmt, ...)
 {

kernel newbies list?

2007-12-03 Thread Robert P. J. Day


  does anyone know what's happened with the KN list?  it seems to have
gone utterly dead for the last day or so.

rday


Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca


Re: [REQUEST] New boot flag/kernel option

2007-12-03 Thread Pavel Machek

On Sat 2007-11-17 11:02:20, Raymano Garibaldi wrote:
> I would like to request a new boot flag/kernel option that would make
> the following scenario possible:
> 
> 1) Working on laptop with a live USB distro on a read-only USB stick.
> 2) Suspend laptop.
> 3) Detach USB stick.
> 
> 4) Do other things, get on a plane, go on a bus, deal with police
> officer giving you a ticket for operating a laptop while driving...
> 
> 5) Attach the same read-only USB stick.
> 6) Resume laptop.
> 7) Continue work as if nothing happened.
> 
> The last time we were able to do something like this was in 2.6.21.
> 
> If not, could you please advise a workaround to get this functionality
> with the latest kernels.

There's a CONFIG switch in USB exactly for this.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: The use of KOBJ_NAME_LEN

2007-12-03 Thread Dave Young

On Dec 4, 2007 2:50 PM, Robert P. J. Day <[EMAIL PROTECTED]> wrote:
>
> On Tue, 4 Dec 2007, Dave Young wrote:
>
> > Hi,
> > Does KOBJ_NAME_LEN really mean the limit of the kobject name length? It seems
> > not. And if it does, is a KOBJ_NAME_LEN of 20 enough?
> >
> > In kobject_set_name(), the limit is 1024. Looks like either the comment
> > or the code should be updated.
> >
> > /**
> >  *  kobject_set_name - Set the name of an object
> >  *  @kobj:  object.
> >  *  @fmt:   format string used to build the name
> >  *
> >  *  If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated
> >  *  string that @kobj->k_name points to. Otherwise, use the static
> >  *  @kobj->name array.
> >  */
>
> the comment seems fairly clear -- if the name is sufficiently short,
> it's stored in the static array.  if not, then it's stored in
> dynamically allocated space.

Yes, it's clear, but that is not what I'm talking about.
Please look at the KOBJ_NAME_LEN macro usage.

>
> rday
> 
> Robert P. J. Day
> Linux Consulting, Training and Annoying Kernel Pedantry
> Waterloo, Ontario, CANADA
>
> http://crashcourse.ca
> 
>

Re: [patch] rewrite rd

2007-12-03 Thread Nick Piggin

On Tue, Dec 04, 2007 at 08:01:31AM +0100, Nick Piggin wrote:
> Thanks for the review, I'll post an incremental patch in a sec.
 

Index: linux-2.6/drivers/block/brd.c
===
--- linux-2.6.orig/drivers/block/brd.c
+++ linux-2.6/drivers/block/brd.c
@@ -56,7 +56,7 @@ struct brd_device {
  */
 static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
 {
-   unsigned long idx;
+   pgoff_t idx;
struct page *page;
 
/*
@@ -87,13 +87,17 @@ static struct page *brd_lookup_page(stru
  */
 static struct page *brd_insert_page(struct brd_device *brd, sector_t sector)
 {
-   unsigned long idx;
+   pgoff_t idx;
struct page *page;
 
page = brd_lookup_page(brd, sector);
if (page)
return page;
 
+   /*
+* Must use NOIO because we don't want to recurse back into the
+* block or filesystem layers from page reclaim.
+*/
page = alloc_page(GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO);
if (!page)
return NULL;
@@ -148,6 +152,11 @@ static void brd_free_pages(struct brd_de
 
pos++;
 
+   /*
+* This assumes radix_tree_gang_lookup always returns as
+* many pages as possible. If the radix-tree code changes,
+* so will this have to.
+*/
} while (nr_pages == FREE_BATCH);
 }
 
@@ -159,7 +168,7 @@ static int copy_to_brd_setup(struct brd_
unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
size_t copy;
 
-   copy = min((unsigned long)n, PAGE_SIZE - offset);
+   copy = min_t(size_t, n, PAGE_SIZE - offset);
if (!brd_insert_page(brd, sector))
return -ENOMEM;
if (copy < n) {
@@ -181,7 +190,7 @@ static void copy_to_brd(struct brd_devic
unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
size_t copy;
 
-   copy = min((unsigned long)n, PAGE_SIZE - offset);
+   copy = min_t(size_t, n, PAGE_SIZE - offset);
page = brd_lookup_page(brd, sector);
BUG_ON(!page);
 
@@ -213,7 +222,7 @@ static void copy_from_brd(void *dst, str
unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
size_t copy;
 
-   copy = min((unsigned long)n, PAGE_SIZE - offset);
+   copy = min_t(size_t, n, PAGE_SIZE - offset);
page = brd_lookup_page(brd, sector);
if (page) {
src = kmap_atomic(page, KM_USER1);
@@ -318,6 +327,9 @@ static int brd_ioctl(struct inode *inode
/*
 * Invalidate the cache first, so it isn't written
 * back to the device.
+*
+* Another thread might instantiate more buffercache here,
+* but there is not much we can do to close that race.
 */
invalidate_bh_lrus();
truncate_inode_pages(bdev->bd_inode->i_mapping, 0);

Re: [patch] rewrite rd

2007-12-03 Thread Nick Piggin

On Mon, Dec 03, 2007 at 10:29:03PM -0800, Andrew Morton wrote:
> On Tue, 4 Dec 2007 05:26:28 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> > 
> > There is one slight downside -- direct block device access and filesystem
> > metadata access goes through an extra copy and gets stored in RAM twice.
> > However, this downside is only slight, because the real buffercache of the
> > device is now reclaimable (because we're not playing crazy games with it),
> > so under memory intensive situations, footprint should effectively be the
> > same -- maybe even a slight advantage to the new driver because it can also
> > reclaim buffer heads.
> 
> what about mmap(/dev/ram0)?

Same thing I guess -- it will use more memory in the common case, but should
actually be able to reclaim slightly more memory under pressure, so the
tiny systems guys shouldn't have too much trouble.

And we're avoiding the whole class of aliasing problems where mmap on
the old rd driver means that you're creating new mappings to your backing
store pages.

 
> > Text is larger, but data and bss are smaller, making total size smaller.
> > 
> > A few other nice things about it:
> > - Similar structure and layout to the new loop device handling.
> > - Dynamic ramdisk creation.
> > - Runtime flexible buffer head size (because it is no longer part of the
> >   ramdisk code).
> > - Boot / load time flexible ramdisk size, which could easily be extended
> >   to a per-ramdisk runtime changeable size (eg. with an ioctl).
> 
> This ramdisk driver can use highmem whereas the existing one can't (yes?). 
> That's a pretty major difference.  Plus look at the revoltingness in rd.c's
> mapping_set_gfp_mask()s.

Ah yep, there is the highmem advantage too.


> > +#define SECTOR_SHIFT   9
> 
> That's our third definition of SECTOR_SHIFT.

Or 7th, depending on how you count ;) I always thought redefining it is a
prerequisite to getting anything merged into the block layer, so I'm too
scared to put it in include/linux/blkdev.h ;)


> > +/*
> > + * Look up and return a brd's page for a given sector.
> > + */
> > +static struct page *brd_lookup_page(struct brd_device *brd, sector_t 
> > sector)
> > +{
> > +   unsigned long idx;
> 
> Could use pgoff_t here if you think that's clearer.

I guess it is. On one hand the radix-tree uses unsigned long, but
on the other hand, page->index is pgoff_t.

> > +{
> > +   unsigned long idx;
> > +   struct page *page;
> > +
> > +   page = brd_lookup_page(brd, sector);
> > +   if (page)
> > +   return page;
> > +
> > +   page = alloc_page(GFP_NOIO | __GFP_HIGHMEM | __GFP_ZERO);
> 
> Why GFP_NOIO?

I guess it has similar issues to rd.c -- we can't risk recursing
into the block layer here. However unlike rd.c, we _could_ easily
add some mode or ioctl to allocate the backing store upfront, with
full reclaim flags...

 
> Have you thought about __GFP_MOVABLE treatment here?  Seems pretty
> desirable as this sucker can be large.

AFAIK that only applies to pagecache but I haven't been paying much
attention to that stuff lately. It wouldn't be hard to move these
pages around, if we had a hook from the vm for it.
 

> > +static void brd_free_pages(struct brd_device *brd)
> > +{
> > +   unsigned long pos = 0;
> > +   struct page *pages[FREE_BATCH];
> > +   int nr_pages;
> > +
> > +   do {
> > +   int i;
> > +
> > +   nr_pages = radix_tree_gang_lookup(&brd->brd_pages,
> > +   (void **)pages, pos, FREE_BATCH);
> > +
> > +   for (i = 0; i < nr_pages; i++) {
> > +   void *ret;
> > +
> > +   BUG_ON(pages[i]->index < pos);
> > +   pos = pages[i]->index;
> > +   ret = radix_tree_delete(&brd->brd_pages, pos);
> > +   BUG_ON(!ret || ret != pages[i]);
> > +   __free_page(pages[i]);
> > +   }
> > +
> > +   pos++;
> > +
> > +   } while (nr_pages == FREE_BATCH);
> > +}
> 
> I have vague memories that radix_tree_gang_lookup()'s "naive"
> implementation may return fewer items than you asked for even when there
> are more items remaining - when it hits certain boundaries.

Good memory, but it's the low-level leaf traversal that bails out at
boundaries. The higher level code then retries, so we should be OK
here.

> > +/*
> > + * copy_to_brd_setup must be called before copy_to_brd. It may sleep.
> > + */
> > +static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, 
> > size_t n)
> > +{
> > +   unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
> > +   size_t copy;
> > +
> > +   copy = min((unsigned long)n, PAGE_SIZE - offset);
> 
> use min_t.  Or, better, sort out the types.

I guess the API is using size_t, so that would be the more appropriate type
to convert to. And I'll use min_t too.


> > +static void copy_to_brd(struct brd_device *brd, const void *src,
> > +   sector_t sector, size_t n)
> > +{
> > +   struct page *page;
> > +   void

Linux Kernel - Future works

2007-12-03 Thread Muhammad Nowbuth

Hi all,

Could anyone suggest some pending work that is still needed
on the Linux kernel?

Thanks

Re: The use of KOBJ_NAME_LEN

2007-12-03 Thread Robert P. J. Day

On Tue, 4 Dec 2007, Dave Young wrote:

> Hi,
> Does KOBJ_NAME_LEN really mean the limit of the kobject name length? It seems
> not. And if it does, is a KOBJ_NAME_LEN of 20 enough?
>
> In kobject_set_name(), the limit is 1024. Looks like either the comment or
> the code should be updated.
>
> /**
>  *  kobject_set_name - Set the name of an object
>  *  @kobj:  object.
>  *  @fmt:   format string used to build the name
>  *
>  *  If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated
>  *  string that @kobj->k_name points to. Otherwise, use the static
>  *  @kobj->name array.
>  */

the comment seems fairly clear -- if the name is sufficiently short,
it's stored in the static array.  if not, then it's stored in
dynamically allocated space.

rday

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://crashcourse.ca


The use of KOBJ_NAME_LEN

2007-12-03 Thread Dave Young

Hi,
Does KOBJ_NAME_LEN really mean the limit of the kobject name length? It seems
not. And if it does, is a KOBJ_NAME_LEN of 20 enough?

In kobject_set_name(), the limit is 1024. Looks like either the comment or
the code should be updated.

/**
 *  kobject_set_name - Set the name of an object
 *  @kobj:  object.
 *  @fmt:   format string used to build the name
 *
 *  If strlen(name) >= KOBJ_NAME_LEN, then use a dynamically allocated
 *  string that @kobj->k_name points to. Otherwise, use the static 
 *  @kobj->name array.
 */

Regards
dave

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Zhang, Yanmin

On Mon, 2007-12-03 at 11:05 +0100, Ingo Molnar wrote:
> * Zhang, Yanmin <[EMAIL PROTECTED]> wrote:
> 
> > Although we don't have the VolanoMark source code, I suspect it calls
> > Thread.yield. VolanoMark is a kind of chat room benchmark. When a
> > client sends out a message, the server sends the message to all
> > clients. I suspect the client calls Thread.yield after sending out a
> > couple of messages.
> 
> yeah, so far only volanomark seems to be affected by this, and if it 
> indeed calls Thread.yield artificially it's a pretty stupid benchmark 
> and it's not the fault of the JDK. If we had the source to volanomark we 
> could fix this easily.
> 
> > Both JVMs show a regression with sched_compat_yield=0.
> > 
> > I ran some tests, such as iozone/specjbb/tbench/dbench/sysbench,
> > and didn't see any regression.
> 
> which JVM was utilized by the specjbb (Java Business Benchmark) tests?
BEA JRockit. It supports huge pages, which improve performance by about 8% to 10%.

-yanmin

Re: [PATCH] Updates to nfsroot documentation

2007-12-03 Thread Amos Waterland

On Tue, Dec 04, 2007 at 01:24:40PM +0900, Simon Horman wrote:
> On Mon, Dec 03, 2007 at 10:43:45PM -0500, Amos Waterland wrote:
> > The difference between ip=off and ip=::off has been a cause of much
> > confusion.  Document how each behaves, and do not contradict ourselves
> > by saying that "off" is the default when in fact "any" is the default
> > and is described as such later in the file.
> 
> Is that really how it works? If so it sounds a bit silly to me.
> Surely it would be desirable for ip=off and ip=::off to
> do the same thing. Or am I missing the point?

Yes, that is how it works.  Pretty confusing, so I figured I'd better
send in a patch to document it.

In the ip=::off case, the code in ip_auto_config() sees that
ic_enable is asserted but that ic_myaddr is NONE and proceeds to do
autoconfiguration.

I'd welcome comments from people on whether we should change how it
works instead of just document it.


Re: [PATCH] Freezer: Fix JFFS2 garbage collector freezing issue (rev. 2)

2007-12-03 Thread Len Brown

Applied to suspend branch.

thanks,
-len

On Monday 03 December 2007 19:11, Rafael J. Wysocki wrote:
> [This is a replacement for
> freezer-fix-jffs2-garbage-collector-freezing-issue.patch]
> ---
> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Fix breakage caused by commit d5d8c5976d6adeddb8208c240460411e2198b393
> "freezer: do not send signals to kernel threads" in
> jffs2_garbage_collect_thread() that assumed it would be sent signals
> by the freezer.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> Cc: David Woodhouse <[EMAIL PROTECTED]>
> Cc: Pete MacKay <[EMAIL PROTECTED]>
> ---
>  fs/jffs2/background.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/fs/jffs2/background.c
> ===
> --- linux-2.6.orig/fs/jffs2/background.c
> +++ linux-2.6/fs/jffs2/background.c
> @@ -105,7 +105,7 @@ static int jffs2_garbage_collect_thread(
>  
>   /* Put_super will send a SIGKILL and then wait on the sem.
>*/
> - while (signal_pending(current)) {
> + while (signal_pending(current) || freezing(current)) {
>   siginfo_t info;
>   unsigned long signr;
>  
> 

Re: [patch] rewrite rd

2007-12-03 Thread Andrew Morton

On Tue, 4 Dec 2007 05:26:28 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> This is my proposal for a (hopefully) backwards compatible rd driver.
> The motivation for me is not pressing, except that I have this code
> sitting here that is either going to rot or get merged. I'm happy to
> make myself maintainer of this code, but if any real block device
> driver writer would like to, that would be fine by me ;)
> 
> Comments?
> 
> --
> 
> This is a rewrite of the ramdisk block device driver.
> 
> The old one is really difficult because it effectively implements a block
> device which serves data out of its own buffer cache. It relies on the dirty
> bit being set to pin its backing store in cache; however, there are
> non-trivial paths which can clear the dirty bit (eg. try_to_free_buffers()),
> which recently led to data corruption. And in general it is completely
> wrong for a block device driver to do this.
> 
> The new one is more like a regular block device driver. It has no idea
> about vm/vfs stuff. Its backing store is similar to the buffer cache
> (a simple radix-tree of pages), but it doesn't know anything about page
> cache (the pages in the radix tree are not pagecache pages).
> 
> There is one slight downside -- direct block device access and filesystem
> metadata access goes through an extra copy and gets stored in RAM twice.
> However, this downside is only slight, because the real buffercache of the
> device is now reclaimable (because we're not playing crazy games with it),
> so under memory intensive situations, footprint should effectively be the
> same -- maybe even a slight advantage to the new driver because it can also
> reclaim buffer heads.

what about mmap(/dev/ram0)?

> The fact that it now goes through all the regular vm/fs paths makes it
> much more useful for testing, too.
> 
>    text    data     bss     dec     hex filename
>    2837     849     384    4070     fe6 drivers/block/rd.o
>    3528     371      12    3911     f47 drivers/block/brd.o
> 
> Text is larger, but data and bss are smaller, making total size smaller.
> 
> A few other nice things about it:
> - Similar structure and layout to the new loop device handling.
> - Dynamic ramdisk creation.
> - Runtime flexible buffer head size (because it is no longer part of the
>   ramdisk code).
> - Boot / load time flexible ramdisk size, which could easily be extended
>   to a per-ramdisk runtime changeable size (eg. with an ioctl).

This ramdisk driver can use highmem whereas the existing one can't (yes?). 
That's a pretty major difference.  Plus look at the revoltingness in rd.c's
mapping_set_gfp_mask()s.
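For reference, the `size` figures quoted earlier in this message add up as follows; the `hex` column is just `dec` in base 16, and the net saving works out to 159 bytes.

```latex
\begin{aligned}
\text{rd.o:}\quad  & 2837 + 849 + 384 = 4070 = \mathtt{0xfe6} \\
\text{brd.o:}\quad & 3528 + 371 + 12 = 3911 = \mathtt{0xf47} \\
\Delta:\quad       & (+691) + (-478) + (-372) = -159
\end{aligned}
```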

> +++ linux-2.6/drivers/block/brd.c
> @@ -0,0 +1,536 @@
> +/*
> + * Ram backed block device driver.
> + *
> + * Copyright (C) 2007 Nick Piggin
> + * Copyright (C) 2007 Novell Inc.
> + *
> + * Parts derived from drivers/block/rd.c, and drivers/block/loop.c, copyright
> + * of their respective owners.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include  /* invalidate_bh_lrus() */
> +
> +#include 
> +
> +#define SECTOR_SHIFT 9

That's our third definition of SECTOR_SHIFT.

> +#define PAGE_SECTORS_SHIFT   (PAGE_SHIFT - SECTOR_SHIFT)
> +#define PAGE_SECTORS (1 << PAGE_SECTORS_SHIFT)
> +
> +/*
> + * Each block ramdisk device has a radix_tree brd_pages of pages that stores
> + * the pages containing the block device's contents. A brd page's ->index is
> + * its offset in PAGE_SIZE units. This is similar to, but in no way connected
> + * with, the kernel's pagecache or buffer cache (which sit above our block
> + * device).
> + */
> +struct brd_device {
> + int brd_number;
> + int brd_refcnt;
> + loff_t  brd_offset;
> + loff_t  brd_sizelimit;
> + unsigned        brd_blocksize;
> +
> + struct request_queue    *brd_queue;
> + struct gendisk          *brd_disk;
> + struct list_head        brd_list;
> +
> + /*
> +  * Backing store of pages and lock to protect it. This is the contents
> +  * of the block device.
> +  */
> + spinlock_t  brd_lock;
> + struct radix_tree_root  brd_pages;
> +};
> +
> +/*
> + * Look up and return a brd's page for a given sector.
> + */
> +static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
> +{
> + unsigned long idx;

Could use pgoff_t here if you think that's clearer.

> + struct page *page;
> +
> + /*
> +  * The page lifetime is protected by the fact that we have opened the
> +  * device node -- brd pages will never be deleted under us, so we
> +  * don't need any further locking or refcounting.
> +  *
> +  * This is strictly true for the radix-tree nodes as well (ie. we
> +  * don't actually need the rcu_read_lock()), however that is not a
> +  * documented feature of the radix-tree API so it is better to be
> +

Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-03 Thread Andrew Morgan

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

KaiGai Kohei wrote:
> Serge,
> 
> Please tell me the meanings of the following condition.
> 
>> diff --git a/security/commoncap.c b/security/commoncap.c
>> index 3a95990..cb71bb0 100644
>> --- a/security/commoncap.c
>> +++ b/security/commoncap.c
>> @@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target,
>> kernel_cap_t *effective,
>>  /* incapable of using this inheritable set */
>>  return -EPERM;
>>  }
>> +if (!!cap_issubset(*inheritable,
>> +   cap_combine(target->cap_inheritable,
>> +   current->cap_bset))) {
>> +/* no new pI capabilities outside bounding set */
>> +return -EPERM;
>> +}
>>  
>>  /* verify restrictions on target's new Permitted set */
>>  if (!cap_issubset (*permitted,
> 
> It seems to me this condition requires that the new inheritable capability
> set contain at least one capability outside the bounding set.
> What is the purpose of this checking?

Yes, the !! was a bug. The correct check is a single !.

(Thus, the correct check says no 'new' pI bits can be outside cap_bset.)

Cheers

Andrew

> 
> In the initial state, every process has an empty inheritable capability set
> and a full bounding set. Thus, capset() always fails.
> 
> Thanks,

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHVPBS+bHCR3gb8jsRAnxQAJ0Vna82bl9M11OL/uuEe21nF5+9TACfSzGi
aY0SUvMmLZCIF0KovBTpihE=
=wT9N
-----END PGP SIGNATURE-----

[PATCH] pci: Fix bus resource assignment on 32 bits with 64b resources

2007-12-03 Thread Benjamin Herrenschmidt

The current pci_assign_unassigned_resources() code doesn't work properly
on 32-bit platforms with 64-bit resources. The main reason is the use
of unsigned long in various places instead of resource_size_t.

This fixes it, along with some tricks to avoid casting to 64 bits in
every printk on platforms that don't need it.

This is a prerequisite for making powerpc use the generic code instead of
its own half-useful implementation.
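The format-string trick can be sketched in userspace as follows. DEMO_RESOURCES_64BIT and demo_resource_size_t are hypothetical stand-ins for CONFIG_RESOURCES_64BIT and resource_size_t; the point is that the conversion specifier is chosen to match the type, so no cast is needed at each call site.

```c
#include <stddef.h>
#include <stdio.h>

/* DEMO_RESOURCES_64BIT is a hypothetical stand-in for CONFIG_RESOURCES_64BIT,
 * and demo_resource_size_t for resource_size_t. */
#ifdef DEMO_RESOURCES_64BIT
typedef unsigned long long demo_resource_size_t;
#define RES_PR "%016llx"
#else
typedef unsigned long demo_resource_size_t;
#define RES_PR "%08lx"
#endif

/* Adjacent string literals are concatenated at compile time, so this expands
 * to a single format string such as "%08lx" "-" "%08lx". */
#define RANGE_PR RES_PR "-" RES_PR

/* Formats a bridge window line without casting start/end to a wider type. */
static int demo_format_window(char *buf, size_t len, const char *name,
			      demo_resource_size_t start,
			      demo_resource_size_t end)
{
	return snprintf(buf, len, "  %s window: " RANGE_PR, name, start, end);
}
```

Built without DEMO_RESOURCES_64BIT, demo_format_window(buf, sizeof buf, "MEM", 0x80000000UL, 0x8fffffffUL) produces "  MEM window: 80000000-8fffffff".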

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

 drivers/pci/pci.h   |   11 +++
 drivers/pci/setup-bus.c |   32 +---
 drivers/pci/setup-res.c |5 ++---
 include/linux/pci.h |4 ++--
 4 files changed, 32 insertions(+), 20 deletions(-)

Index: linux-work/drivers/pci/pci.h
===
--- linux-work.orig/drivers/pci/pci.h   2007-12-04 17:00:43.0 +1100
+++ linux-work/drivers/pci/pci.h2007-12-04 17:02:11.0 +1100
@@ -91,3 +91,14 @@ pci_match_one_device(const struct pci_de
 }
 
 struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev);
+
+#ifdef CONFIG_RESOURCES_64BIT
+#define RESOURCE_ORDER(order)  (1ULL << (order))
+#define RES_PR "%016llx"
+#else
+#define RESOURCE_ORDER(order)  (1UL << (order))
+#define RES_PR "%08lx"
+#endif
+
+#define RANGE_PR   RES_PR "-" RES_PR
+
Index: linux-work/drivers/pci/setup-bus.c
===
--- linux-work.orig/drivers/pci/setup-bus.c 2007-12-04 17:00:43.0 +1100
+++ linux-work/drivers/pci/setup-bus.c  2007-12-04 17:04:23.0 +1100
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+#include "pci.h"
 
 #define DEBUG_CONFIG 1
 #if DEBUG_CONFIG
@@ -89,8 +90,9 @@ void pci_setup_cardbus(struct pci_bus *b
 * The IO resource is allocated a range twice as large as it
 * would normally need.  This allows us to set both IO regs.
 */
-   printk("  IO window: %08lx-%08lx\n",
-   region.start, region.end);
+   printk(KERN_INFO "  IO window: 0x%08lx-0x%08lx\n",
+  (unsigned long)region.start,
+  (unsigned long)region.end);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_0,
region.start);
pci_write_config_dword(bridge, PCI_CB_IO_LIMIT_0,
@@ -99,7 +101,7 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[1]);
if (bus->resource[1]->flags & IORESOURCE_IO) {
-   printk("  IO window: %08lx-%08lx\n",
+   printk(KERN_INFO "  IO window: "RANGE_PR"\n",
region.start, region.end);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_1,
region.start);
@@ -109,7 +111,7 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[2]);
if (bus->resource[2]->flags & IORESOURCE_MEM) {
-   printk("  PREFETCH window: %08lx-%08lx\n",
+   printk(KERN_INFO "  PREFETCH window: "RANGE_PR"\n",
region.start, region.end);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_0,
region.start);
@@ -119,7 +121,7 @@ void pci_setup_cardbus(struct pci_bus *b
 
pcibios_resource_to_bus(bridge, &region, bus->resource[3]);
if (bus->resource[3]->flags & IORESOURCE_MEM) {
-   printk("  MEM window: %08lx-%08lx\n",
+   printk(KERN_INFO "  MEM window: "RANGE_PR"\n",
region.start, region.end);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_1,
region.start);
@@ -159,7 +161,8 @@ pci_setup_bridge(struct pci_bus *bus)
/* Set up upper 16 bits of I/O base/limit. */
io_upper16 = (region.end & 0xffff0000) | (region.start >> 16);
DBG(KERN_INFO "  IO window: %04lx-%04lx\n",
-   region.start, region.end);
+   (unsigned long)region.start,
+   (unsigned long)region.end);
}
else {
/* Clear upper 16 bits of I/O base/limit. */
@@ -180,7 +183,7 @@ pci_setup_bridge(struct pci_bus *bus)
if (bus->resource[1]->flags & IORESOURCE_MEM) {
l = (region.start >> 16) & 0xfff0;
l |= region.end & 0xfff00000;
-   DBG(KERN_INFO "  MEM window: %08lx-%08lx\n",
+   DBG(KERN_INFO "  MEM window: "RANGE_PR"\n",
region.start, region.end);
}
else {
@@ -199,7 +202,7 @@ pci_setup_bridge(struct pci_bus *bus)
if (bus->resource[2]->flags & IORESOURCE_PREFETCH) {
l = (region.start >> 16) & 0xfff0;

[PATCH v2] Fix hardware IRQ time accounting problem.

2007-12-03 Thread Tony Breeds

The commit fa13a5a1f25f671d084d8884be96fc48d9b68275 (sched: restore
deterministic CPU accounting on powerpc) unconditionally calls
account_process_tick() in system context.  In the deterministic accounting
case this is the correct thing to do.  However, in the non-deterministic
accounting case it must not be called, as doing so artificially elevates
the time accounted as hardware IRQ time.

Also this patch collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
checks in time.h into one for neatness.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
---
The problem was seen and reported by Johannes Berg  and Frederik Himpe.
Paul, I think this is good for 2.6.24.

Changes since v1:
 - I noticed that the #define was explicitly using "current" rather than
   the task passed in.  Using tsk is the right thing to do.
 - The whitespace changes clutter the patch and are unneeded with the
   change above.
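The config-dependent macro pattern used in time.h can be sketched in userspace like this. demo_task, demo_account_process_tick() and the two macro names are hypothetical; the sketch only illustrates two points from the changelog: the enabled variant should charge the task it is given rather than a global current, and the disabled variant should be a statement-shaped no-op. It also leaves the trailing semicolon out of the expansion so the macro is safe inside if/else.

```c
/* Hypothetical stand-in for struct task_struct. */
struct demo_task {
	unsigned long system_ticks;
};

/* Hypothetical stand-in for account_process_tick(tsk, user_tick). */
static void demo_account_process_tick(struct demo_task *tsk, int user_tick)
{
	if (!user_tick)
		tsk->system_ticks++;
}

/* CONFIG_VIRT_CPU_ACCOUNTING=y flavour: charge the task that was passed in
 * (not a global "current"), with no trailing ';' in the expansion. */
#define demo_vtime_on(tsk)  demo_account_process_tick((tsk), 0)

/* CONFIG_VIRT_CPU_ACCOUNTING=n flavour: a statement-shaped no-op. */
#define demo_vtime_off(tsk) do { (void)(tsk); } while (0)
```

Because both expansions behave like a single statement, either one can be used as the body of an if or else branch without braces.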

 arch/powerpc/kernel/process.c |2 +-
 include/asm-powerpc/time.h|8 ++--
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 41e13f4..b9d8837 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -350,7 +350,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
local_irq_save(flags);
 
account_system_vtime(current);
-   account_process_tick(current, 0);
+   account_process_vtime(current);
calculate_steal_time();
 
last = _switch(old_thread, new_thread);
diff --git a/include/asm-powerpc/time.h b/include/asm-powerpc/time.h
index 780f826..a7281e0 100644
--- a/include/asm-powerpc/time.h
+++ b/include/asm-powerpc/time.h
@@ -237,18 +237,14 @@ struct cpu_usage {
 
 DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void account_process_vtime(struct task_struct *tsk);
-#else
-#define account_process_vtime(tsk) do { } while (0)
-#endif
-
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING)
 extern void calculate_steal_time(void);
 extern void snapshot_timebases(void);
+#define account_process_vtime(tsk) account_process_tick(tsk, 0)
 #else
 #define calculate_steal_time() do { } while (0)
 #define snapshot_timebases()   do { } while (0)
+#define account_process_vtime(tsk) do { } while (0)
 #endif
 
 extern void secondary_cpu_time_init(void);
-- 
1.5.3.6


Linux 2.6.24-rc4

2007-12-03 Thread Linus Torvalds


We should have one week between -rc releases, but I was gone for a week 
over Thanksgiving (as were some other kernel developers), so this one is a
bit late. It's been almost the rule rather than the exception, but I 
promise I'll be better...

Anyway, there aren't a lot of exciting changes here, but there's still a 
_lot_ more churn than I really hoped for at the -rc4 stage. Blackfin, MIPS 
and Power do stand out in the diffstats, but ARM and x86 got some updates 
too.

And we had some ACPI churn (processor throttling etc), along with various 
driver updates: ATA, IDE, infiniband, SCSI, USB and network drivers. And
on the filesystem side, cifs, NFS, ocfs2 and proc. Ugh. Too much.

In fact, the diff from -rc3 is almost 36,000 lines, and that's the smaller 
git one with the renames shown as renames (not the ones I upload as 
patches to kernel.org - those are done so that people with GNU patch and 
other legacy patch programs can use the diffs). I'll blame the two-week 
window for some of it, but even so, this is a bit disheartening. I'm 
really hoping that we're slowing down and -rc5 won't be anywhere near that 
large.

That said, none of the changes are really _exciting_ or really scary. And 
we should have fixed a number of regressions, although more certainly 
remain.

Linus

Re: [PATCH] (2.6.24-rc3-mm2) -mm Smack mutex cleanup

2007-12-03 Thread Casey Schaufler


--- Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 12/03/2007 07:39 PM, Casey Schaufler wrote:
> > From: Casey Schaufler <[EMAIL PROTECTED]>
> > 
> > Clean out unnecessary mutex initializations for Smack list locks.
> > Once this is done, there is no need for them to be shared among
> > multiple files, so pull them out of the header file and put them
> > in the files where they belong.
> 
> Then it might be static.

Doh. Right you are.
 
> > Pull unnecessary locking from smack_inode_setsecurity, it used
> > to be required when the assignment was not guaranteed to be a
> > scalar value but isn't now.
> > 
> > Change uses of __capable(current,...) to capable(...).
> > Take out an inappropriate cast. Use container_of() instead
> > of doing the same calculation by hand.
> > Fix comment spelling errors.
> 
> Too many different changes according to the name of the patch.

OK, that's fair.
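For readers unfamiliar with the container_of() cleanup mentioned in the changelog, a minimal userspace sketch of the idiom follows. demo_container_of() is a simplified re-creation (the kernel macro also type-checks its pointer argument), and the demo_rule and demo_list_node structures are hypothetical, not Smack's.

```c
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(): subtract the
 * member's offset from the member's address to recover the enclosing struct. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical list linkage and an entry that embeds it. */
struct demo_list_node {
	struct demo_list_node *next;
};

struct demo_rule {
	int label;
	struct demo_list_node list;
};

/* Map a list node back to the rule that contains it, instead of
 * computing the offset by hand at each call site. */
static struct demo_rule *demo_rule_from_node(struct demo_list_node *node)
{
	return demo_container_of(node, struct demo_rule, list);
}
```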
 
> > Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> > Tested with stamp-2007-11-30-16-39
> > 
> >  security/smack/smack.h|3 --
> >  security/smack/smack_access.c |3 ++
> >  security/smack/smack_lsm.c|   34 +---
> >  security/smack/smackfs.c  |6 +
> >  4 files changed, 19 insertions(+), 27 deletions(-)
> > 
> > diff -uprN -X linux-2.6.24-rc3-mm2-base/Documentation/dontdiff linux-2.6.24-rc3-mm2-base/security/smack/smack_lsm.c linux-2.6.24-rc3-mm2-smack/security/smack/smack_lsm.c
> > --- linux-2.6.24-rc3-mm2-base/security/smack/smack_lsm.c	2007-11-27 16:47:05.0 -0800
> > +++ linux-2.6.24-rc3-mm2-smack/security/smack/smack_lsm.c	2007-11-28 11:46:13.0 -0800
> [...]
> > @@ -748,9 +746,7 @@ static int smack_inode_setsecurity(struc
> > return -EINVAL;
> >  
> > if (strcmp(name, XATTR_SMACK_SUFFIX) == 0) {
> > -   mutex_lock(&nsp->smk_lock);
> > nsp->smk_inode = sp;
> > -   mutex_unlock(&nsp->smk_lock);
> > return 0;
> > }
> > /*
> 
> Ok, it still might be atomic as a variable change, but it will break
> scenarios such as
> 
> mutex_lock(&nsp->smk_lock);
> create(nsp->smk_inode);
> cook_a_dinner();
> get_info(nsp->smk_inode);
> mutex_unlock(&nsp->smk_lock);
> 
> While cook_a_dinner() runs, smack_inode_setsecurity() may be called and the
> attribute changed...
> 
> Doesn't this matter?

The only place dinner can get cooked is during d_instantiate, and
you can't call inode_security until after that's finished. No,
it doesn't matter.


Casey Schaufler
[EMAIL PROTECTED]

[PATCH] Fix hardware IRQ time accounting problem.

2007-12-03 Thread Tony Breeds

Commit fa13a5a1f25f671d084d8884be96fc48d9b68275 unconditionally calls
account_process_tick() in system context.  In the deterministic accounting case
this is the correct thing to do.  However, in the non-deterministic accounting
case it must not be called, as doing so artificially elevates the time
accounted as hardware IRQ time.

Also this patch collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
checks in time.h into one for neatness.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
---
The problem was seen and reported by Johannes Berg  and Frederik Himpe.
Paul, I think this is good for 2.6.24.


 arch/powerpc/kernel/process.c |2 +-
 include/asm-powerpc/time.h|   12 
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 41e13f4..b9d8837 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -350,7 +350,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
local_irq_save(flags);
 
account_system_vtime(current);
-   account_process_tick(current, 0);
+   account_process_vtime(current);
calculate_steal_time();
 
last = _switch(old_thread, new_thread);
diff --git a/include/asm-powerpc/time.h b/include/asm-powerpc/time.h
index 780f826..8a2c8db 100644
--- a/include/asm-powerpc/time.h
+++ b/include/asm-powerpc/time.h
@@ -237,18 +237,14 @@ struct cpu_usage {
 
 DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void account_process_vtime(struct task_struct *tsk);
-#else
-#define account_process_vtime(tsk) do { } while (0)
-#endif
-
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING)
 extern void calculate_steal_time(void);
 extern void snapshot_timebases(void);
+#define account_process_vtime(tsk) account_process_tick(current, 0);
 #else
-#define calculate_steal_time() do { } while (0)
-#define snapshot_timebases()   do { } while (0)
+#define calculate_steal_time() do { } while (0)
+#define snapshot_timebases()   do { } while (0)
+#define account_process_vtime(tsk) do { } while (0)
 #endif
 
 extern void secondary_cpu_time_init(void);
-- 
1.5.3.6


Yours Tony

  linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!


Re: [PATCH] capabilities: introduce per-process capability bounding set (v10)

2007-12-03 Thread KaiGai Kohei


Serge,

Please tell me the meanings of the following condition.


diff --git a/security/commoncap.c b/security/commoncap.c
index 3a95990..cb71bb0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct *target, kernel_cap_t *effective,
/* incapable of using this inheritable set */
return -EPERM;
}
+   if (!!cap_issubset(*inheritable,
+  cap_combine(target->cap_inheritable,
+  current->cap_bset))) {
+   /* no new pI capabilities outside bounding set */
+   return -EPERM;
+   }
 
 	/* verify restrictions on target's new Permitted set */

if (!cap_issubset (*permitted,


It seems to me this condition requires the new inheritable capability
set to contain at least one capability outside the bounding set.
What is the purpose of this checking?

In the initial state, every process has an empty inheritable capability
set and a full bounding set. Thus, capset() always fails.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

[patch] rewrite rd

2007-12-03 Thread Nick Piggin

Hi,

This is my proposal for a (hopefully) backwards compatible rd driver.
The motivation for me is not pressing, except that I have this code
sitting here that is either going to rot or get merged. I'm happy to
make myself maintainer of this code, but if any real block device
driver writer would like to, that would be fine by me ;)

Comments?

--

This is a rewrite of the ramdisk block device driver.

The old one is really difficult because it effectively implements a block
device which serves data out of its own buffer cache. It relies on the dirty
bit being set, to pin its backing store in cache, however there are non
trivial paths which can clear the dirty bit (eg. try_to_free_buffers()),
which had recently lead to data corruption. And in general it is completely
wrong for a block device driver to do this.

The new one is more like a regular block device driver. It has no idea
about vm/vfs stuff. Its backing store is similar to the buffer cache
(a simple radix-tree of pages), but it doesn't know anything about page
cache (the pages in the radix tree are not pagecache pages).
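The page-based backing store can be pictured with a small userspace sketch. It is not brd.c: a flat array of page pointers stands in for the radix tree, the device size is fixed, and all demo_* names are hypothetical. In this sketch pages are allocated on first write, and reads of never-written pages return zeroes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE    4096u
#define DEMO_SECTOR_SHIFT 9
#define DEMO_MAX_PAGES    1024u /* assumption: fixed 4 MiB device */

/* Hypothetical sparse page store; the flat array stands in for a radix tree. */
struct demo_brd {
	unsigned char *pages[DEMO_MAX_PAGES];
};

/* Look up (and optionally allocate) the page holding byte offset pos. */
static unsigned char *demo_brd_page(struct demo_brd *b, uint64_t pos, int alloc)
{
	uint64_t idx = pos / DEMO_PAGE_SIZE;

	if (!b->pages[idx] && alloc)
		b->pages[idx] = calloc(1, DEMO_PAGE_SIZE);
	return b->pages[idx];
}

/* Copy n bytes to (write != 0) or from the device starting at sector,
 * splitting the copy at page boundaries.
 * Returns 0 on success, -1 if out of range or allocation fails. */
static int demo_brd_rw(struct demo_brd *b, uint64_t sector, void *buf,
		       size_t n, int write)
{
	unsigned char *p = buf;
	uint64_t pos = sector << DEMO_SECTOR_SHIFT;

	while (n) {
		size_t off = (size_t)(pos % DEMO_PAGE_SIZE);
		size_t chunk = DEMO_PAGE_SIZE - off;
		unsigned char *page;

		if (pos / DEMO_PAGE_SIZE >= DEMO_MAX_PAGES)
			return -1;
		if (chunk > n)
			chunk = n;
		page = demo_brd_page(b, pos, write);
		if (write) {
			if (!page)
				return -1;
			memcpy(page + off, p, chunk);
		} else if (page) {
			memcpy(p, page + off, chunk);
		} else {
			memset(p, 0, chunk); /* never-written page reads as zeroes */
		}
		p += chunk;
		pos += chunk;
		n -= chunk;
	}
	return 0;
}
```

Note that the store knows nothing about the page cache or buffer heads: it only maps sector offsets to private pages, which is the separation the changelog describes.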

There is one slight downside -- direct block device access and filesystem
metadata access go through an extra copy and get stored in RAM twice.
However, this downside is only slight, because the real buffercache of the
device is now reclaimable (because we're not playing crazy games with it),
so under memory intensive situations, footprint should effectively be the
same -- maybe even a slight advantage to the new driver because it can also
reclaim buffer heads.

The fact that it now goes through all the regular vm/fs paths makes it
much more useful for testing, too.

   text    data     bss     dec     hex filename
   2837     849     384    4070     fe6 drivers/block/rd.o
   3528     371      12    3911     f47 drivers/block/brd.o

Text is larger, but data and bss are smaller, making total size smaller.

A few other nice things about it:
- Similar structure and layout to the new loop device handling.
- Dynamic ramdisk creation.
- Runtime flexible buffer head size (because it is no longer part of the
  ramdisk code).
- Boot / load time flexible ramdisk size, which could easily be extended
  to a per-ramdisk runtime changeable size (eg. with an ioctl).

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
---
 MAINTAINERS|5 
 drivers/block/Kconfig  |   12 -
 drivers/block/Makefile |2 
 drivers/block/brd.c|  500 +
 drivers/block/rd.c |  536 -
 fs/buffer.c|1 
 6 files changed, 508 insertions(+), 548 deletions(-)

Index: linux-2.6/drivers/block/Kconfig
===
--- linux-2.6.orig/drivers/block/Kconfig
+++ linux-2.6/drivers/block/Kconfig
@@ -311,7 +311,7 @@ config BLK_DEV_UB
  If unsure, say N.
 
 config BLK_DEV_RAM
-   tristate "RAM disk support"
+   tristate "RAM block device support"
---help---
  Saying Y here will allow you to use a portion of your RAM memory as
  a block device, so that you can make file systems on it, read and
@@ -346,16 +346,6 @@ config BLK_DEV_RAM_SIZE
  The default value is 4096 kilobytes. Only change this if you know
  what you are doing.
 
-config BLK_DEV_RAM_BLOCKSIZE
-   int "Default RAM disk block size (bytes)"
-   depends on BLK_DEV_RAM
-   default "1024"
-   help
- The default value is 1024 bytes.  PAGE_SIZE is a much more
- efficient choice however.  The default is kept to ensure initrd
- setups function - apparently needed by the rd_load_image routine
- that supposes the filesystem in the image uses a 1024 blocksize.
-
 config CDROM_PKTCDVD
tristate "Packet writing on CD/DVD media"
depends on !UML
Index: linux-2.6/drivers/block/Makefile
===
--- linux-2.6.orig/drivers/block/Makefile
+++ linux-2.6/drivers/block/Makefile
@@ -11,7 +11,7 @@ obj-$(CONFIG_AMIGA_FLOPPY)+= amiflop.o
 obj-$(CONFIG_PS3_DISK) += ps3disk.o
 obj-$(CONFIG_ATARI_FLOPPY) += ataflop.o
 obj-$(CONFIG_AMIGA_Z2RAM)  += z2ram.o
-obj-$(CONFIG_BLK_DEV_RAM)  += rd.o
+obj-$(CONFIG_BLK_DEV_RAM)  += brd.o
 obj-$(CONFIG_BLK_DEV_LOOP) += loop.o
 obj-$(CONFIG_BLK_DEV_PS2)  += ps2esdi.o
 obj-$(CONFIG_BLK_DEV_XD)   += xd.o
Index: linux-2.6/drivers/block/brd.c
===
--- /dev/null
+++ linux-2.6/drivers/block/brd.c
@@ -0,0 +1,536 @@
+/*
+ * Ram backed block device driver.
+ *
+ * Copyright (C) 2007 Nick Piggin
+ * Copyright (C) 2007 Novell Inc.
+ *
+ * Parts derived from drivers/block/rd.c, and drivers/block/loop.c, copyright
+ * of their respective owners.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include  /*

Re: [PATCH] Updates to nfsroot documentation

2007-12-03 Thread Simon Horman

On Tue, Dec 04, 2007 at 01:24:40PM +0900, Simon Horman wrote:
> On Mon, Dec 03, 2007 at 10:43:45PM -0500, Amos Waterland wrote:
> > The difference between ip=off and ip=::off has been a cause of much
> > confusion.  Document how each behaves, and do not contradict ourselves
> > by saying that "off" is the default when in fact "any" is the default
> > and is described as being so lower in the file.
> 
> Is that really how it works? If so it sounds a bit silly to me.
> Surely it would be desirable for ip=off and ip=::off to
> do the same thing. Or am I missing the point?
> 
> > Signed-off-by: Amos Waterland <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> >  nfsroot.txt |9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt
> > index 16a7cae..ac04a1d 100644
> > --- a/Documentation/nfsroot.txt
> > +++ b/Documentation/nfsroot.txt
> > @@ -92,8 +92,11 @@ ip=::
> >autoconfiguration.
> >  
> >The  parameter can appear alone as the value to the `ip'
> > -  parameter (without all the ':' characters before) in which case auto-
> > -  configuration is used.
> > +  parameter (without all the ':' characters before).  If the value is
> > +  "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
> > +  autoconfiguration will take place.  Note that "ip=off" is not the same
> > +  thing as "ip=::off", because in the latter autoconfiguration will take
> > +  place if any of DHCP, BOOTP or RARP are compiled in the kernel.
> >  
> >  IP address of the client.
> >  
> > @@ -142,7 +145,7 @@ ip=::
> > into the kernel will be used, regardless of the value of
> > this option.
> >  
> > -  off or none: don't use autoconfiguration (default)
> > +  off or none: don't use autoconfiguration
> >   on or any:   use any protocol available in the kernel
> >   dhcp:use DHCP
> >   bootp:   use BOOTP
> 
> This second fragment seems fine, though perhaps the documentation in
> net/ipv4/ipconfig.c, just above ic_proto_name, should be updated as well.

Deleting the documentation in net/ipv4/ipconfig.c might actually be
better, as it just duplicates part of what is in Documentation/nfsroot.txt.

-- 
Horms


Re: remote debugging via FireWire * fast firedump!

2007-12-03 Thread Bernhard Kaindl


Hi,
   I just wanted to let you know that I have picked up the early
firewire patch again and cleaned it up a lot, so it should now
be ready to put on the patch-submission road.

What's left to do is to write a HOWTO like the one Stefan describes
below, but I'll try to get that done soon.

I've also started working on the userspace tools and got firescope
to work across 32/64-bit machines (in both directions). There is
one hack in that patch (which I should do in a clean way instead),
and I do not know whether it works on ppc/ppc64, but I could look
at it if needed or send the patch to Benjamin for adding ppc64 support.
To do it properly, we'll probably need a target architecture
option in firescope, and as I do not know whether Benjamin needs it,
I left out ppc64 for now.

I have just had the guts to explore __fast__ memory dumping over
firewire for full-system dumps (reading quadlets is __painfully__
slow if you want to read 2GB of memory over the bus: you only get
a few kilobytes per second), using raw1394_start_read()
to allow block reads instead of just quadlet reads.

The biggest block size that worked here was 2048 bytes, which was
enough to get nearly 10MB/s of data transfer rate from the remote
memory to disk. Dumping 2GB of remote memory took only about
three minutes.
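A back-of-the-envelope check of these numbers, written as a tiny C helper: 2 GiB split into 4-byte quadlet reads needs 536,870,912 requests, versus 1,048,576 with 2048-byte block reads, and at roughly 10 MiB/s the dump takes about 205 seconds, consistent with the roughly three minutes reported. The helper names are hypothetical, and the sketch assumes GB and MB mean binary units.

```c
#include <stdint.h>

/* Number of bus read requests needed to cover total bytes with requests
 * of chunk bytes each (rounded up). */
static uint64_t demo_read_requests(uint64_t total, uint64_t chunk)
{
	return (total + chunk - 1) / chunk;
}

/* Estimated dump time in seconds at a sustained rate in bytes per second. */
static double demo_dump_seconds(uint64_t total, double bytes_per_sec)
{
	return (double)total / bytes_per_sec;
}
```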

Afterwards, the victim was dead (I excluded the low MB of memory,
so something else must have caused this). At least the start of
the dump looked good. I haven't tested the error handling yet,
but I'll send you the tool (I called it firedump) soon.

Bernhard

On Sat, 10 Feb 2007, Stefan Richter wrote:


Andi Kleen wrote at LKML:
...

Not likely to make .21:

...

- Early firewire support for firescope at early boot

...

Was it seen in canonical patch format on a mailinglist before?
Is it Bernhard Kaindl's ohci1394_early?
http://www.suse.de/~bk/firewire/

Would be good to put this on the usual patch-submission road in order to
prep it for 2.6.22. Could be handled via linux1394-2.6.git, although a
different channel where the actual users of this facility watch would
IMO be more appropriate.

I also have suggestions, at least WRT Bernhard's code:
 - The Kconfig option could go into the "Kernel hacking" submenu rather
   than the IEEE 1394 submenu. (The driver source should stay in
   drivers/ieee1394.)
 - Leave a note in the Kconfig help how it is typically used, i.e. what
   is required on the remote terminal side, where to find firescope,
   fireproxy etc. and assorted HOWTOs.
 - Indicate in the Kconfig help that only a 4GB address range is made
   visible this way.

A mostly unrelated note: A simple to set up remote-dmesg utility would
be nice to have on the terminal side. Maybe a small ieee1394 high-level
driver which gives hints on the location of the dmesg buffer via
configuration ROM would be warranted. Or is it feasible to find the
dmesg buffer by plain memory analysis?
--
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/


Re: [PATCH] Updates to nfsroot documentation

2007-12-03 Thread Simon Horman

On Mon, Dec 03, 2007 at 10:43:45PM -0500, Amos Waterland wrote:
> The difference between ip=off and ip=::off has been a cause of much
> confusion.  Document how each behaves, and do not contradict ourselves
> by saying that "off" is the default when in fact "any" is the default
> and is described as being so lower in the file.

Is that really how it works? If so it sounds a bit silly to me.
Surely it would be desirable for ip=off and ip=::off to
do the same thing. Or am I missing the point?

> Signed-off-by: Amos Waterland <[EMAIL PROTECTED]>
> 
> ---
> 
>  nfsroot.txt |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt
> index 16a7cae..ac04a1d 100644
> --- a/Documentation/nfsroot.txt
> +++ b/Documentation/nfsroot.txt
> @@ -92,8 +92,11 @@ ip=::
>autoconfiguration.
>  
>The  parameter can appear alone as the value to the `ip'
> -  parameter (without all the ':' characters before) in which case auto-
> -  configuration is used.
> +  parameter (without all the ':' characters before).  If the value is
> +  "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
> +  autoconfiguration will take place.  Note that "ip=off" is not the same
> +  thing as "ip=::off", because in the latter autoconfiguration will take
> +  place if any of DHCP, BOOTP or RARP are compiled in the kernel.
>  
>IP address of the client.
>  
> @@ -142,7 +145,7 @@ ip=::
>   into the kernel will be used, regardless of the value of
>   this option.
>  
> -  off or none: don't use autoconfiguration (default)
> +  off or none: don't use autoconfiguration
> on or any:   use any protocol available in the kernel
> dhcp:use DHCP
> bootp:   use BOOTP

This second fragment seems fine, though perhaps the documentation in
net/ipv4/ipconfig.c, just above ic_proto_name, should be updated as well.

-- 
Horms


Re: Is BIO_RW_FAILFAST really usable?

2007-12-03 Thread Andrey Borzenkov

Jeff Garzik wrote:

> Neil Brown wrote:
>> I've been looking at using BIO_RW_FAILFAST in md/raid to improve
>> handling of some error cases.
>> 
>> This is particularly significant for the DASD driver (s390 specific).
>> I believe it uses optic fibre to connect to the drives.  When one of
>> these paths is unplugged, IO requests will block until an operator
>> runs a command to reset the card (or until it is plugged back in).

Are there any options? This reminds me of the Emulex lpfc driver, which by
default would retry forever but could be configured to fail requests after
a timeout.

>> The only way to avoid this blockage is to use BIO_RW_FAILFAST.  So
>> we really need BIO_RW_FAILFAST for a reliable RAID1 configuration on
>> DASD drives.
>> 
>> However, I just tested BIO_RW_FAILFAST on my SATA drives: controller
>> 
>> 02:06.0 RAID bus controller: Silicon Image, Inc. SiI 3114
>> [SATALink/SATARaid] Serial ATA Controller (rev 02)
>> 
>> (not using the cards minimal RAID functionality) and requests fail
>> immediately and always with e.g.
>> 
>> sd 2:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sdc, sector 2048
>> 
>> So fail fast obviously isn't generally usable.
>> 
>> What is the answer here?  Is the Silicon Image driver doing the wrong
>> thing, or is DASD doing the wrong thing, or is BIO_RW_FAILFAST
>> under-specified and we really need multiple flags or what?
> 
> It's a hard thing to implement, in general, for scalability reasons.
> 
> To make it work, you need to examine each driver's error handling to
> figure out what "fail fast" really means.
> 
> Most storage drivers are written to try as hard as possible to complete
> a request, where "try as hard as possible" can often mean internal
> retries while trying various multi-path configurations and hardware mode
> changes.  You might be catching SATA in the middle of error handling,
> for example.
> 
> So each driver really has a /slightly different/ version of "try to
> complete this request", which has the obvious effects on BIO_RW_FAILFAST.
> 
> No clue about DASD, but in SATA's case I bet that a media or transfer
> error could be returned to the system more rapidly, while we continue to
> try to recover in the background. 

Well, FAILFAST is really needed only for redundant configurations (either
multipath or RAID). But in that case, what is the point of retrying the
request at all? It just complicates the implementation.

Just fail the request as soon as possible and let the upper layer recover.
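The policy Andrey proposes can be modelled in a few lines of userspace C: a fail-fast request gets a single attempt at the driver level, and the redundant upper layer moves on to the next mirror instead. All names (demo_driver_submit, demo_raid1_read, DEMO_EIO, the retry counts) are hypothetical, and the sketch does not capture what BIO_RW_FAILFAST actually does in any particular driver, which, as Jeff notes, differs from driver to driver.

```c
#include <stdbool.h>

#define DEMO_EIO 5 /* stand-in for EIO */

/* Hypothetical low-level read of one mirror: 0 on success, -DEMO_EIO on error. */
typedef int (*demo_read_fn)(int mirror, void *ctx);

/* Hypothetical driver retry policy: a fail-fast request gets one attempt,
 * a normal request is retried up to max_retries more times. */
static int demo_driver_submit(demo_read_fn fn, int mirror, void *ctx,
			      bool failfast, int max_retries)
{
	int attempts = failfast ? 1 : 1 + max_retries;
	int err = -DEMO_EIO;

	for (int i = 0; i < attempts && err; i++)
		err = fn(mirror, ctx);
	return err;
}

/* Upper layer (RAID1-like): try each mirror fail-fast and let redundancy
 * do the recovery; only if every mirror fails is mirror 0 retried the
 * slow way. */
static int demo_raid1_read(demo_read_fn fn, void *ctx, int nmirrors,
			   int max_retries)
{
	for (int m = 0; m < nmirrors; m++)
		if (demo_driver_submit(fn, m, ctx, true, max_retries) == 0)
			return 0;
	return demo_driver_submit(fn, 0, ctx, false, max_retries);
}

/* Test double: only healthy_mirror succeeds; every attempt is counted. */
struct demo_disk {
	int healthy_mirror;
	int calls;
};

static int demo_disk_read(int mirror, void *ctx)
{
	struct demo_disk *d = ctx;

	d->calls++;
	return mirror == d->healthy_mirror ? 0 : -DEMO_EIO;
}
```

With a healthy second mirror, a read costs two attempts in total instead of waiting out the first mirror's full retry budget.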

-andrey

> libata doesn't have any direct 
> knowledge of fail-fast at this point, IIRC.
> 
> But overall it's a job where you must examine each driver, or set of
> drivers :/
> 
> Jeff
> 
> 

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Herbert Xu

On Tue, Dec 04, 2007 at 11:50:55AM +0800, Wang Chen wrote:
>
> >  #include 
> >  #include 
> > +#include 
> 
> Is there really a need to include smp.h?

We need it for smp_processor_id.  Well we needed it before too but
there's probably some implicit inclusion which has made it work.
It's better to declare these inclusions explicitly as otherwise
they may break on less common architectures later.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Wang Chen

Herbert Xu said the following on 2007-12-3 21:17:
> On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote:
> diff --git a/include/net/snmp.h b/include/net/snmp.h
> index ea206bf..9444b54 100644
> --- a/include/net/snmp.h
> +++ b/include/net/snmp.h
> @@ -23,6 +23,7 @@
>  
>  #include 
>  #include 
> +#include 

Is there any need to include smp.h?

--
WCN


Re: Is BIO_RW_FAILFAST really usable?

2007-12-03 Thread Jeff Garzik


Neil Brown wrote:
> I've been looking at using BIO_RW_FAILFAST in md/raid to improve
> handling of some error cases.
>
> This is particularly significant for the DASD driver (s390 specific).
> I believe it uses optic fibre to connect to the drives.  When one of
> these paths is unplugged, IO requests will block until an operator
> runs a command to reset the card (or until it is plugged back in).
> The only way to avoid this blockage is to use BIO_RW_FAILFAST.  So
> we really need BIO_RW_FAILFAST for a reliable RAID1 configuration on
> DASD drives.
>
> However, I just tested BIO_RW_FAILFAST on my SATA drives: controller
>
> 02:06.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
>
> (not using the card's minimal RAID functionality) and requests fail
> immediately and always with e.g.
>
> sd 2:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> end_request: I/O error, dev sdc, sector 2048
>
> So fail fast obviously isn't generally usable.
>
> What is the answer here?  Is the Silicon Image driver doing the wrong
> thing, or is DASD doing the wrong thing, or is BIO_RW_FAILFAST
> under-specified and we really need multiple flags or what?


It's a hard thing to implement, in general, for scalability reasons.

To make it work, you need to examine each driver's error handling to 
figure out what "fail fast" really means.


Most storage drivers are written to try as hard as possible to complete 
a request, where "try as hard as possible" can often mean internal 
retries while trying various multi-path configurations and hardware mode 
changes.  You might be catching SATA in the middle of error handling, 
for example.


So each driver really has a /slightly different/ version of "try to 
complete this request", which has the obvious effects on BIO_RW_FAILFAST.


No clue about DASD, but in SATA's case I bet that a media or transfer 
error could be returned to the system more rapidly, while we continue to 
try to recover in the background.  libata doesn't have any direct 
knowledge of fail-fast at this point, IIRC.


But overall it's a job where you must examine each driver, or set of 
drivers :/


Jeff



[PATCH] Updates to nfsroot documentation

2007-12-03 Thread Amos Waterland

The difference between ip=off and ip=::off has been a cause of much
confusion.  Document how each behaves, and do not contradict ourselves
by saying that "off" is the default when in fact "any" is the default
and is described as such further down in the file.

Signed-off-by: Amos Waterland <[EMAIL PROTECTED]>

---

 nfsroot.txt |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt
index 16a7cae..ac04a1d 100644
--- a/Documentation/nfsroot.txt
+++ b/Documentation/nfsroot.txt
@@ -92,8 +92,11 @@ 
ip=::
   autoconfiguration.
 
   The  parameter can appear alone as the value to the `ip'
-  parameter (without all the ':' characters before) in which case auto-
-  configuration is used.
+  parameter (without all the ':' characters before).  If the value is
+  "ip=off" or "ip=none", no autoconfiguration will take place, otherwise
+  autoconfiguration will take place.  Note that "ip=off" is not the same
+  thing as "ip=::off", because in the latter autoconfiguration will take
+  place if any of DHCP, BOOTP or RARP are compiled in the kernel.
 
 IP address of the client.
 
@@ -142,7 +145,7 @@ 
ip=::
into the kernel will be used, regardless of the value of
this option.
 
-  off or none: don't use autoconfiguration (default)
+  off or none: don't use autoconfiguration
  on or any:   use any protocol available in the kernel
  dhcp:use DHCP
  bootp:   use BOOTP
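A small model of the rule this text documents, written as a sketch under assumptions: it assumes the field order `<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>` and only decides whether autoconfiguration runs. `ip_autoconf_enabled()`, `is_off()` and the `have_proto` parameter (standing in for "DHCP, BOOTP or RARP compiled into the kernel") are hypothetical names; this is not the kernel's ipconfig parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the 'len'-byte string at 's' is exactly "off" or "none". */
static bool is_off(const char *s, size_t len)
{
	return (len == 3 && strncmp(s, "off", 3) == 0) ||
	       (len == 4 && strncmp(s, "none", 4) == 0);
}

/*
 * Decide whether autoconfiguration runs for an "ip=" value.
 * - No ':' at all: the value is the <autoconf> parameter on its own,
 *   so a bare "off"/"none" disables autoconfiguration.
 * - Otherwise the 7th colon-separated field is <autoconf>; if it is
 *   missing or empty it defaults to "any".
 * Autoconfiguration can only happen if some protocol is compiled in.
 */
static bool ip_autoconf_enabled(const char *val, bool have_proto)
{
	const char *p = val;

	assert(val != NULL);
	if (strchr(val, ':') == NULL)
		return !is_off(val, strlen(val)) && have_proto;

	/* Skip the six fields in front of <autoconf>. */
	for (int field = 0; field < 6; field++) {
		p = strchr(p, ':');
		if (p == NULL)
			return have_proto;   /* <autoconf> omitted: "any" */
		p++;
	}
	size_t len = strcspn(p, ":");   /* <autoconf> ends at ':' or '\0' */
	if (len == 0)
		return have_proto;           /* empty <autoconf>: "any" */
	return !is_off(p, len) && have_proto;
}
```

In `ip=::off` the `off` lands in the `<gw-ip>` field, so `<autoconf>` is missing and falls back to the default, which is why it behaves differently from a bare `ip=off`.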

[PATCH -rt] ARM: compile fix for event tracing

2007-12-03 Thread Kevin Hilman

The cycles/usecs conversion macros should be dependent on
CONFIG_EVENT_TRACE instead of CONFIG_LATENCY_TIMING.

Signed-off-by: Kevin Hilman <[EMAIL PROTECTED]>

--- a/include/asm-arm/timex.h
+++ b/include/asm-arm/timex.h
@@ -18,7 +18,7 @@ typedef unsigned long cycles_t;
 
 #ifndef mach_read_cycles
  #define mach_read_cycles() (0)
-#ifdef CONFIG_LATENCY_TIMING
+#ifdef CONFIG_EVENT_TRACE
  #define mach_cycles_to_usecs(d) (d)
  #define mach_usecs_to_cycles(d) (d)
 #endif

Re: Regression - 2.6.24-rc3 - umem nvram card driver oops

2007-12-03 Thread Neil Brown

On Tuesday December 4, [EMAIL PROTECTED] wrote:
> On Tue, Dec 04, 2007 at 11:14:12AM +1100, Neil Brown wrote:
> > On Tuesday December 4, [EMAIL PROTECTED] wrote:
> > > Neil,
> > > 
> > > I just upgraded an ia64 (Altix, 16k page size) test box to 2.6.24-rc3
> > > from 2.6.23 and I get it panicking on boot in the umem driver.
> > 
> > Cool - someone is using umem!  And even testing it.  Thanks!
> > 
> > A quick look shows a probable NULL deref.  Let me know if this fixes
> > it.  I'll read through the offending patch more carefully and make
> > sure there is nothing else wrong.
> 
> Yeah, that appears to fix the problem.
> 
> Tested-by: Dave Chinner <[EMAIL PROTECTED]>

Thanks. 
I couldn't find any other issues in the code.

NeilBrown

Is BIO_RW_FAILFAST really usable?

2007-12-03 Thread Neil Brown


I've been looking at using BIO_RW_FAILFAST in md/raid to improve
handling of some error cases.

This is particularly significant for the DASD driver (s390 specific).
I believe it uses optic fibre to connect to the drives.  When one of
these paths is unplugged, IO requests will block until an operator
runs a command to reset the card (or until it is plugged back in).
The only way to avoid this blockage is to use BIO_RW_FAILFAST.  So
we really need BIO_RW_FAILFAST for a reliable RAID1 configuration on
DASD drives.

However, I just tested BIO_RW_FAILFAST on my SATA drives: controller 

02:06.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

(not using the card's minimal RAID functionality) and requests fail
immediately and always with e.g.

sd 2:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 2048

So fail fast obviously isn't generally usable.

What is the answer here?  Is the Silicon Image driver doing the wrong
thing, or is DASD doing the wrong thing, or is BIO_RW_FAILFAST
under-specified and we really need multiple flags or what?

Any ideas?

Thanks,
NeilBrown

Re: Regression - 2.6.24-rc3 - umem nvram card driver oops

2007-12-03 Thread David Chinner

On Tue, Dec 04, 2007 at 11:14:12AM +1100, Neil Brown wrote:
> On Tuesday December 4, [EMAIL PROTECTED] wrote:
> > Neil,
> > 
> > I just upgraded an ia64 (Altix, 16k page size) test box to 2.6.24-rc3
> > from 2.6.23 and I get it panicking on boot in the umem driver.
> 
> Cool - someone is using umem!  And even testing it.  Thanks!
> 
> A quick look shows a probable NULL deref.  Let me know if this fixes
> it.  I'll read through the offending patch more carefully and make
> sure there is nothing else wrong.

Yeah, that appears to fix the problem.

Tested-by: Dave Chinner <[EMAIL PROTECTED]>

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Nick Piggin

On Tuesday 04 December 2007 11:30, David Schwartz wrote:

> Perhaps it might be possible to scan for the task at the same static
> priority level that is ready-to-run but last in line among other
> ready-to-run tasks and put it after that task?

Nice level versus posix static priority level debate aside, this
is the exact behaviour which the compat mode does now basically,
when you have all tasks running at nice 0 (which I assume is
essentially the case in both the jvm and firefox tests) (some
things, eg. kernel threads or X server could run at a higher prio,
but these are not the ones calling yield anyway...)
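A toy model of the ordering being discussed, assuming a plain FIFO run queue: the yielding task is moved behind the last runnable task with the same nice level. CFS itself keeps tasks in an rbtree ordered by virtual runtime, so this only illustrates the intended outcome, not the implementation; `struct runq` and `yield_to_tail()` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXTASKS 16

struct task {
	int pid;
	int nice;
};

/* Hypothetical run queue: q[0] runs next, tasks kept in queue order. */
struct runq {
	struct task q[MAXTASKS];
	int n;
};

/*
 * Move the task at index 'cur' behind the last runnable task with the
 * same nice level.  Tasks in between keep their relative order.  If no
 * later task shares its nice level, the queue is left unchanged.
 */
static void yield_to_tail(struct runq *rq, int cur)
{
	assert(cur >= 0 && cur < rq->n);
	struct task me = rq->q[cur];
	int last = cur;

	for (int i = cur + 1; i < rq->n; i++)
		if (rq->q[i].nice == me.nice)
			last = i;

	/* Shift q[cur+1 .. last] down by one slot, then place 'me'. */
	memmove(&rq->q[cur], &rq->q[cur + 1],
		(size_t)(last - cur) * sizeof(struct task));
	rq->q[last] = me;
}
```

For a queue of pids `1 2 3 4` with nice levels `0 0 5 0`, yielding task 1 gives `2 3 4 1`.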

> I think that's about as 
> close as we can get to the POSIX-specified behavior.

I don't think it is a question of POSIX being a bit fuzzy, or some
problem we have implementing it. It is explicitly specified to
allow any behaviour.

So the current default is not wrong, any more than the compat mode
is right.

[PATCH] ivtv: Some general fixes

2007-12-03 Thread Richard Knutsson

Fix "warning: Using plain integer as NULL pointer".
Convert 'x < y ? x : y' to use min() instead.

Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
Signed-off-by: Hans Verkuil <[EMAIL PROTECTED]>
---
Compile-tested on i386 with "allyesconfig" and "allmodconfig".
Resend, since the 'Remove a gcc-2.95 requirement'-part is taken away.

 ivtv-driver.c  |2 +-
 ivtv-ioctl.c   |2 +-
 ivtv-irq.c |4 ++--
 ivtv-streams.c |4 ++--
 ivtvfb.c   |2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)


diff --git a/drivers/media/video/ivtv/ivtv-driver.c b/drivers/media/video/ivtv/ivtv-driver.c
index 6d2dd87..96f340c 100644
--- a/drivers/media/video/ivtv/ivtv-driver.c
+++ b/drivers/media/video/ivtv/ivtv-driver.c
@@ -979,7 +979,7 @@ static int __devinit ivtv_probe(struct pci_dev *dev,
}
 
itv = kzalloc(sizeof(struct ivtv), GFP_ATOMIC);
-   if (itv == 0) {
+   if (itv == NULL) {
spin_unlock(_cards_lock);
return -ENOMEM;
}
diff --git a/drivers/media/video/ivtv/ivtv-ioctl.c b/drivers/media/video/ivtv/ivtv-ioctl.c
index fd6826f..24270de 100644
--- a/drivers/media/video/ivtv/ivtv-ioctl.c
+++ b/drivers/media/video/ivtv/ivtv-ioctl.c
@@ -688,7 +688,7 @@ static int ivtv_debug_ioctls(struct file *filp, unsigned int cmd, void *arg)
ivtv_reset_ir_gpio(itv);
}
if (val & 0x02) {
-   itv->video_dec_func(itv, cmd, 0);
+   itv->video_dec_func(itv, cmd, NULL);
}
break;
}
diff --git a/drivers/media/video/ivtv/ivtv-irq.c b/drivers/media/video/ivtv/ivtv-irq.c
index fd1688e..1384615 100644
--- a/drivers/media/video/ivtv/ivtv-irq.c
+++ b/drivers/media/video/ivtv/ivtv-irq.c
@@ -204,7 +204,7 @@ static int stream_enc_dma_append(struct ivtv_stream *s, u32 data[CX2341X_MBOX_MA
s->sg_pending[idx].dst = buf->dma_handle;
s->sg_pending[idx].src = offset;
s->sg_pending[idx].size = s->buf_size;
-   buf->bytesused = (size < s->buf_size) ? size : s->buf_size;
+   buf->bytesused = min(size, s->buf_size);
buf->dma_xfer_cnt = s->dma_xfer_cnt;
 
s->q_predma.bytesused += buf->bytesused;
@@ -705,7 +705,7 @@ static void ivtv_irq_dec_data_req(struct ivtv *itv)
s = >streams[IVTV_DEC_STREAM_TYPE_YUV];
}
else {
-   itv->dma_data_req_size = data[2] >= 0x1 ? 0x1 : data[2];
+   itv->dma_data_req_size = min_t(u32, data[2], 0x1);
itv->dma_data_req_offset = data[1];
s = >streams[IVTV_DEC_STREAM_TYPE_MPG];
}
diff --git a/drivers/media/video/ivtv/ivtv-streams.c b/drivers/media/video/ivtv/ivtv-streams.c
index aa03e61..0e9e7d0 100644
--- a/drivers/media/video/ivtv/ivtv-streams.c
+++ b/drivers/media/video/ivtv/ivtv-streams.c
@@ -572,10 +572,10 @@ int ivtv_start_v4l2_encode_stream(struct ivtv_stream *s)
clear_bit(IVTV_F_I_EOS, >i_flags);
 
/* Initialize Digitizer for Capture */
-   itv->video_dec_func(itv, VIDIOC_STREAMOFF, 0);
+   itv->video_dec_func(itv, VIDIOC_STREAMOFF, NULL);
ivtv_msleep_timeout(300, 1);
ivtv_vapi(itv, CX2341X_ENC_INITIALIZE_INPUT, 0);
-   itv->video_dec_func(itv, VIDIOC_STREAMON, 0);
+   itv->video_dec_func(itv, VIDIOC_STREAMON, NULL);
}
 
/* begin_capture */
diff --git a/drivers/media/video/ivtv/ivtvfb.c b/drivers/media/video/ivtv/ivtvfb.c
index 52ffd15..f73ce98 100644
--- a/drivers/media/video/ivtv/ivtvfb.c
+++ b/drivers/media/video/ivtv/ivtvfb.c
@@ -1053,7 +1053,7 @@ static int ivtvfb_init_card(struct ivtv *itv)
}
 
itv->osd_info = kzalloc(sizeof(struct osd_info), GFP_ATOMIC);
-   if (itv->osd_info == 0) {
+   if (itv->osd_info == NULL) {
IVTVFB_ERR("Failed to allocate memory for osd_info\n");
return -ENOMEM;
}
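For reference, a userspace sketch approximating the kernel's `min()` and `min_t()` macros of this era (GCC/Clang statement expressions and `__typeof__` are required; `u32` is typedef'd locally, and the temporaries are named differently from the kernel's). `min()` evaluates each argument once and compares the addresses of its temporaries so that mismatched argument types trigger a compiler warning; `min_t()` casts both sides to a named type, which is why the comparison of the u32 `data[2]` against an int constant above uses `min_t(u32, ...)`. `clamp_bytes()` and `clamp_req()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;   /* local stand-in for the kernel's u32 */

/* Type-checked minimum: comparing &_x with &_y warns if types differ. */
#define min(x, y) ({                    \
	__typeof__(x) _x = (x);         \
	__typeof__(y) _y = (y);         \
	(void)(&_x == &_y);             \
	_x < _y ? _x : _y; })

/* Minimum after converting both arguments to an explicit type. */
#define min_t(type, x, y) ({            \
	type _a = (x);                  \
	type _b = (y);                  \
	_a < _b ? _a : _b; })

/* Both arguments are u32, so plain min() is fine. */
static u32 clamp_bytes(u32 size, u32 buf_size)
{
	return min(size, buf_size);
}

/* u32 against an int limit: the types differ, so min_t() is used. */
static u32 clamp_req(u32 word, int limit)
{
	assert(limit >= 0);
	return min_t(u32, word, limit);
}
```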

Re: suspend-related lockdep warning

2007-12-03 Thread Andrew Morton

On Mon, 3 Dec 2007 23:34:26 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Monday, 3 of December 2007, Andrew Morton wrote:
> > On Sun, 2 Dec 2007 21:33:23 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
> > wrote:
> > 
> > > On Saturday, 1 of December 2007, Pavel Machek wrote:
> > > > Hi!
> > > > 
> > > > > 2.6.24-rc3-mm2 (which will be released if it boots on two more 
> > > > > machines and
> > > > > if I stay awake) will say this during suspend-to-RAM on the Vaio:
> > > > > 
> > > > > [   91.876445] Syncing filesystems ... done.
> > > > > [   92.382595] Freezing user space processes ... WARNING: at kernel/lockdep.c:2662 check_flags()
> > > > > [   92.384000] Pid: 1925, comm: dbus-daemon Not tainted 2.6.24-rc3-mm2 #32
> > > > > [   92.384177]  [] show_trace_log_lvl+0x12/0x25
> > > > > [   92.384335]  [] show_trace+0xd/0x10
> > > > > [   92.384469]  [] dump_stack+0x55/0x5d
> > > > > [   92.384605]  [] check_flags+0x7f/0x11a
> > > > > [   92.384746]  [] lock_acquire+0x3a/0x86
> > > > > [   92.384886]  [] _spin_lock+0x26/0x53
> > > > > [   92.385023]  [] refrigerator+0x13/0xc8
> > > > > [   92.385163]  [] get_signal_to_deliver+0x32/0x3fb
> > > > > [   92.385326]  [] do_notify_resume+0x8c/0x699
> > > > > [   92.385476]  [] work_notifysig+0x13/0x1b
> > > > > [   92.385620]  ===
> > > > > [   92.385719] irq event stamp: 309
> > > > > [   92.385809] hardirqs last  enabled at (309): [] syscall_exit_work+0x11/0x26
> > > > > [   92.386045] hardirqs last disabled at (308): [] syscall_exit+0x14/0x25
> > > > > [   92.386265] softirqs last  enabled at (0): [] copy_process+0x374/0x130e
> > > > > [   92.386491] softirqs last disabled at (0): [<>] 0x0
> > > > > [   92.392457] (elapsed 0.00 seconds) done.
> > > > > [   92.392581] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> > > > > [   92.392882] PM: Entering mem sleep
> > > > > [   92.392974] Suspending console(s)
> > > > > 
> > > > > this has been happening for quite some time and might even be 
> > > > > happening in
> > > > > mainline.  
> > > > 
> > > > Is it complaining that we entered refrigerator with irqs disabled?
> > > 
> > > Or that someone else called task_lock() with irqs disabled at one point 
> > > ...
> > > 
> > > Hm, perhaps it's related to kernel preemption.  Andrew, I guess the 
> > > kernel is
> > > preemptible?
> > > 
> > 
> > yup.  http://userweb.kernel.org/~akpm/config-sony.txt
> 
> Is this reproducible with kernel preemption off?

yes.  Current -mm lineup:

[   34.455096] ipw2200: Failed to send WEP_KEY: Command timed out.
[   34.911876] Syncing filesystems ... done.
[   34.934526] Freezing user space processes ... WARNING: at kernel/lockdep.c:2662 check_flags()
[   34.934917] Pid: 1922, comm: dbus-daemon Not tainted 2.6.24-rc3-mm3 #2
[   34.935036]  [] show_trace_log_lvl+0x12/0x25
[   34.935142]  [] show_trace+0xd/0x10
[   34.935231]  [] dump_stack+0x55/0x5d
[   34.935322]  [] check_flags+0x7f/0x11a
[   34.935417]  [] lock_acquire+0x3a/0x86
[   34.935511]  [] _spin_lock+0x1c/0x49
[   34.935603]  [] refrigerator+0x13/0xc8
[   34.935697]  [] get_signal_to_deliver+0x34/0x2e8
[   34.935807]  [] do_notify_resume+0x8c/0x6fe
[   34.935907]  [] work_notifysig+0x13/0x1b
[   34.936004]  ===
[   34.936072] irq event stamp: 253
[   34.936133] hardirqs last  enabled at (253): [] syscall_exit_work+0x11/0x26
[   34.936294] hardirqs last disabled at (252): [] syscall_exit+0x14/0x25
[   34.936446] softirqs last  enabled at (0): [] copy_process+0x300/0x1246
[   34.936599] softirqs last disabled at (0): [<>] 0x0
[   34.954308] (elapsed 0.01 seconds) done.
[   34.954389] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.


Re: Possibly SATA related freeze killed networking and RAID

2007-12-03 Thread Tejun Heo

Phillip Susi wrote:
> Tejun Heo wrote:
>> Surprise, surprise.  There's no way to tell whether the controller
>> raised interrupt or not if command is not in progress.  As I said
>> before, there's no IRQ pending bit.  While processing commands, you can
>> tell by looking at other status registers but when there's nothing in
>> flight and the controller determines it's a good time to raise a
>> spurious interrupt, there's no way you can tell.  That dang SFF
>> interface is like 15+ years old.
>>
>> But we can still make things pretty robust.  We're working on it.
> 
> It sounds like you mean that you know the controller did NOT raise the
> interrupt ( intentionally/correctly ) if there was no command in
> progress, as opposed to not being able to tell.  Unless there is some
> condition under which it is valid for the controller to raise an
> interrupt when it had no commands in progress?  And if that's the case
> and there's no way to know WHY, that's a broken design.

If everything works correctly, all interrupts can be accounted for.
It's just that there's no margin for erratic behaviors and most ATA
controllers are built really cheap.  So, yeah, it's a 15+ years old
half-broken design.

-- 
tejun

[PATCH] pktcdvd : add kobject_put when kobject register fails

2007-12-03 Thread Dave Young

kobject_put() should be called when kobject_register() fails, so that the
kobject's reference count drops to zero and the proper cleanup routines are called.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
drivers/block/pktcdvd.c |4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff -upr linux/drivers/block/pktcdvd.c linux.new/drivers/block/pktcdvd.c
--- linux/drivers/block/pktcdvd.c   2007-11-30 13:13:44.0 +0800
+++ linux.new/drivers/block/pktcdvd.c   2007-11-30 13:24:08.0 +0800
@@ -117,8 +117,10 @@ static struct pktcdvd_kobj* pkt_kobj_cre
p->kobj.parent = parent;
p->kobj.ktype = ktype;
p->pd = pd;
-   if (kobject_register(>kobj) != 0)
+   if (kobject_register(>kobj) != 0) {
+   kobject_put(>kobj);
return NULL;
+   }
return p;
 }
 /*
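A userspace sketch of the reference-counting rule behind this fix, assuming (as the description says) that the object still holds its initial reference after a failed registration: dropping that reference with a put is what lets the release callback run and free the object. `struct obj`, `obj_register()`, `obj_put()`, `obj_create()` and the `releases` counter are hypothetical stand-ins for the kobject calls, not the kobject API.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for a kobject: a refcount plus a release callback. */
struct obj {
	int refcount;
	void (*release)(struct obj *);
};

static int releases;   /* counts release() calls, for demonstration */

static void obj_release(struct obj *o)
{
	releases++;
	free(o);
}

static void obj_init(struct obj *o)
{
	o->refcount = 1;            /* the creator's reference */
	o->release = obj_release;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->release(o);
}

/* Hypothetical registration step that can fail (e.g. a name clash). */
static int obj_register(struct obj *o, int fail)
{
	(void)o;
	return fail ? -1 : 0;
}

/* Mirrors the shape of the patched pkt_kobj_create() error path. */
static struct obj *obj_create(int fail)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o == NULL)
		return NULL;
	obj_init(o);
	if (obj_register(o, fail) != 0) {
		obj_put(o);     /* drop the initial reference: release() frees it */
		return NULL;
	}
	return o;
}
```

Without the `obj_put()` in the error path, the failed object would keep a reference of 1 and never be released, which is the leak the patch closes.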

Re: pnpacpi : exceeded the max number of IO resources

2007-12-03 Thread Dave Young

On Tue, Dec 04, 2007 at 08:55:13AM +0800, Shaohua Li wrote:
> 
> On Mon, 2007-12-03 at 18:02 +0100, Rene Herman wrote:
> > On 30-11-07 23:22, Rene Herman wrote:
> > 
> > > On 30-11-07 14:14, Chris Holvenstot wrote:
> > > 
> > >> For what it is worth I too have seen this problem this morning and it
> > >> DOES appear to be new (in contrast to a previous comment)
> > >>
> > >> The message:  pnpacpi: exceeded the max number of mem resources: 12
> > >>
> > >> is displayed each time the system is booted with the 2.6.24-rc3-git5
> > >> kernel but is NOT displayed when booting 2.6.24-rc3-git4
> > >>
> > >> I have made no changes in my config file between these two kernels other
> > >> than to accept any new defaults when running make oldconfig.
> > >>
> > >> If you had already narrowed it down to a change between git4 and git5 I
> > >> apologize for wasting your time.  Have to run to work now.
> > > 
> > > Thanks, and re-added the proper CCs. Sigh...
> > > 
> > > Well, yes, the warning is actually new as well. Previously your kernel 
> > > just silently ignored 8 more mem resources than it does now it seems.
> > > 
> > > Given that people are hitting these limits, it might make sense to just 
> > > do away with the warning for 2.6.24 again while waiting for the dynamic 
> > > code?
> > 
> > Ping. Should these warnings be reverted for 2.6.24?
> Reverting the warning doesn't make any sense. I'd suggest increasing the
> maximum number of IO resources until Thomas's patch is in.
Agreed.
Changing it to 90 works for me, but I think 128 may be better.

include/linux/pnp.h |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff -upr linux/include/linux/pnp.h linux.new/include/linux/pnp.h
--- linux/include/linux/pnp.h   2007-12-04 09:09:23.0 +0800
+++ linux.new/include/linux/pnp.h   2007-12-04 09:09:40.0 +0800
@@ -13,7 +13,7 @@
 #include 
 #include 
 
-#define PNP_MAX_PORT   24
+#define PNP_MAX_PORT   128
 #define PNP_MAX_MEM12
 #define PNP_MAX_IRQ2
 #define PNP_MAX_DMA2
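A sketch of why a static limit like `PNP_MAX_PORT` produces this warning, assuming a fixed-size array per device: once the array is full, further ranges are dropped and reported. `struct res_table` and `add_port()` are hypothetical, and the message text is only modelled on the one quoted in the thread; the dynamic code mentioned earlier would replace the array altogether.

```c
#include <assert.h>
#include <stdio.h>

#define PNP_MAX_PORT 128   /* value proposed in the patch above (was 24) */

struct port_res {
	unsigned long start, end;
};

struct res_table {
	struct port_res port[PNP_MAX_PORT];
	int nports;
	int dropped;           /* ranges that did not fit */
};

/*
 * Add an I/O port range.  With a fixed-size table, anything past
 * PNP_MAX_PORT entries is dropped; here it is also reported.
 * Returns 0 if stored, -1 if dropped.
 */
static int add_port(struct res_table *t, unsigned long start, unsigned long len)
{
	assert(len > 0);
	if (t->nports >= PNP_MAX_PORT) {
		t->dropped++;
		fprintf(stderr, "exceeded the max number of IO resources: %d\n",
			PNP_MAX_PORT);
		return -1;
	}
	t->port[t->nports].start = start;
	t->port[t->nports].end = start + len - 1;
	t->nports++;
	return 0;
}
```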

cgit for kernel.org

2007-12-03 Thread Lars Hjemli

Jan Engelhardt wrote:
> Repository overview.
>   CGIT: http://cgit.freedesktop.org/
>   GITWEB: http://git.kernel.org/

Why not compare equivalent pages? freedesktop.org also runs gitweb:
  http://gitweb.freedesktop.org

>
>   Remarks:
>   cgit is lacking the links "shortlog", "log", "tree" and "git"
>   which gitweb prints for each repository line.

Adding "enable-index-links=1" to /etc/cgitrc will print links to "summary",
"log" and "tree" for each repository line, but you still have to open the
summary page to get the git (clone) link. It wouldn't be very hard to add
another link (see http://hjemli.net/git/cgit/tree/ui-repolist.c#n96), but I'm
not sure what actual value such a link would provide.

>
> Inside a repository.
>   CGIT: http://cgit.freedesktop.org/hal/

The similar page in gitweb:
  http://gitweb.freedesktop.org/?p=hal.git;a=summary

>   GITWEB:
>http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=summary

The similar page in cgit:
  http://hjemli.net/git/linux/

>   Remarks:
>   In cgit, the left gray panel went by quite unnoticed

IMHO the sidebar makes cgit way easier to navigate (especially the dropdown box
to switch between branches is pretty handy).

>
> Clicking on a commit.
>   CGIT:
>http://cgit.freedesktop.org/hal/commit/?id=b8f72cec53415ebc2d32805b049dcc94cef4a854

Similar page in gitweb:
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=b8f72cec53415ebc2d32805b049dcc94cef4a854


>   GITWEB:
>http://git.kernel.org/?p=linux/kernel/git/jgarzik/misc-2.6.git;a=commit;h=8c27eba54970c6ebbb408186e5baa2274435e869

Similar page in cgit:
http://hjemli.net/git/linux/commit/?id=8c27eba54970c6ebbb408186e5baa2274435e869


>
>   Remarks:
>   cgit comes with a diffstat. Not that I look at diffstats all that
>   often anyway. Still need to click on "(diff)" to get a diff.

There's always the link in the sidebar ;-)

But seriously, showing the diff on the same page as the commit message would
sometimes be nice and sometimes it would be a bit too much:
http://hjemli.net/git/linux/commit/?id=b5faa4b89e4d83203b1f44f143a351b518f7cda2
http://git.kernel.org/?p=linux/kernel/git/jgarzik/misc-2.6.git;a=commit;h=b5faa4b89e4d83203b1f44f143a351b518f7cda2


>
>   Filenames are displayed with monospace font in gitweb :)
>   Clicking on the commit actually shows its diff in gitweb.

I believe cgit and gitweb display roughly the same information on their commit
pages; I don't see a diff in gitweb either unless I click one of the
diff-links.


>
> Final remark:
>   gitweb has the warmer colors.

Yes, gitweb definitely looks more polished. It's a bit slow, though...

Another remark - IMHO there are some practical benefits to be had from the
virtual url support in cgit, e.g. linking to blobs:

http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=blob;h=ace11817054bfb5bcc8d55a31bd51b9dea49bbed;hb=fe25f897c62bb324660217e15dbd3091c808dbba;f=GL/mesa/X/Makefile.am

http://cgit.freedesktop.org/xorg/xserver/tree/GL/mesa/X/Makefile.am

--
larsh

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Nick Piggin

On Monday 03 December 2007 22:37, Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > given how poorly sched_yield() is/was defined the only "compatible"
> > > solution would be to go back to the old yield code.
> >
> > While it is technically allowed to do anything with SCHED_OTHER class,
> > putting the thread to the back of the runnable tasks, or at least
> > having it give up _some_ priority (like the old scheduler) is less
> > surprising than having it do _nothing_.
>
> wrong: it's not "nothing" that the new code does - run two yield-ing
> loops and they'll happily switch to each other, at a rate of a few
> million context switches per second.

OK, it's not nothing, it interacts with the quantisation of the
update granularity and wakeup granularity... It's definitely not
what would be expected if you didn't look at the implementation
though.
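A rough userspace experiment for the "two yield-ing loops" case, assuming Linux and glibc: a parent and a forked child are pinned to the same CPU (best effort, via `sched_setaffinity()`) and both call `sched_yield()` in a loop. It reports yields per second per process; it does not count context switches by itself (the `ru_nvcsw`/`ru_nivcsw` fields from `getrusage()` can be compared for that). `yield_pair_rate()` and its helpers are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Pin the calling process to one CPU; failures are ignored. */
static void pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	(void)sched_setaffinity(0, sizeof(set), &set);
}

static void yield_loop(long iters)
{
	for (long i = 0; i < iters; i++)
		sched_yield();
}

/*
 * Run two yield loops (parent and child) on the same CPU and return
 * yields per second per process, or -1.0 on error.
 */
static double yield_pair_rate(int cpu, long iters)
{
	struct timespec t0, t1;
	int status;
	pid_t pid;

	pin_to_cpu(cpu);            /* affinity is inherited across fork() */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	pid = fork();
	if (pid < 0)
		return -1.0;
	if (pid == 0) {
		yield_loop(iters);
		_exit(0);
	}
	yield_loop(iters);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (double)(t1.tv_sec - t0.tv_sec) +
		      (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	assert(secs >= 0.0);
	return secs > 0.0 ? (double)iters / secs : 0.0;
}
```

Running it with the default setting and with `sched_compat_yield` enabled would be one way to compare the two behaviours discussed here.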

> > Whereas JVMs (eg. that have garbage collectors call yield), presumably
> > get quite a lot of tuning, and that was probably done with the less
> > surprising (and more common) sched_yield behaviour.
>
> i disagree. To some of them, having a _more_ aggressive yield than 2.6.22
> might increase latencies and jitter - which can be seen as a regression
> as well. All tests i've seen so far show dramatically lower jitter in
> v2.6.23 and upwards kernels.

Right, so we should have one with about the _same_ aggressiveness.
Doesn't that make sense?

> anyway, right now what we have is a closed-source benchmark (which is a
> quite silly one as well) against a popular open-source desktop app and
> in that case the choice is obvious. Actual Java app server benchmarks
> did not show any regression so maybe Java's use of yield for locking is
> not that significant after all and it's only Volanomark that is doing
> extra (unnecessary) yields. (and java benchmarks are part of the
> upstream kernel test grid anyway so we'd have noticed any serious
> regression)

Sure I'm not basing this purely on volanomark at all. If you've tested
a reasonable range of actual java app server benchmarks with a range of
jvms then fine.

> if you insist on flipping the default then that just shows a blatant
> disregard to desktop performance

That statement is true. But you know I'm not insisting on flipping
the default, so I don't see how it is relevant.

BTW, can you tell me what workload firefox saw the sched_yield
pauses with, and/or where that thread is archived? I still think
firefox should not call sched_yield at all.

> > > i think the sanest long-term solution is to strongly discourage the
> > > use of SCHED_OTHER::yield, because there's just no sane definition
> > > for yield that apps could rely upon. (well Linus suggested a pretty
> > > sane definition but that would necessitate the burdening of the
> > > scheduler fastpath - we don't want to do that.) New ideas are welcome
> > > of course.
> >
> > sched_yield is defined to put the calling task at the end of the queue
> > for the given priority level as you know (ie. at the end of all other
> > priority 0 tasks, for SCHED_OTHER).
>
> almost: substitute "priority" with "nice level". One problem is, that's
> not what the old scheduler did.

I'm not sure if that's right. Posix realtime scheduling says that all
SCHED_OTHER tasks are priority 0. But I'm not much of a standards reader.
And even if it were just applied to a given nice level, that would be
more intuitive than the current default.

> > > [ also, actual technical feedback on the SCHED_BATCH patch i sent
> > >   (which was the only "forward looking" moment in this thread so far
> > >;-) would be nice too. ]
> >
> > I dislike a wholesale change in behaviour like that. Especially when
> > it is changing behaviour of yield among SCHED_BATCH tasks versus yield
> > among SCHED_OTHER tasks.
>
> There's no wholesale change in behavior, SCHED_BATCH tasks have clear
> expectations of being throughput versus latency, hence the patch makes
> quite a bit of sense to me. YMMV.

sched_yield semantics are totally different depending on whether the
process is SCHED_BATCH or not. That's what I was calling a change in
behaviour, so arguing otherwise is just arguing semantics.

I just would think it isn't such a good thing if you suddenly got a
500% speedup by making your jvm SCHED_BATCH, only to find that it
stops working when your batch cron jobs or something start running...
But if there are no real jvm workloads that would see such a speedup,
then I guess the point is moot ;)
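For anyone wanting to try that, a minimal sketch of putting the calling process into SCHED_BATCH with the standard `sched_setscheduler()` call (glibc needs `_GNU_SOURCE` for the constant). Whether `sched_yield()` then behaves differently depends on Ingo's proposed SCHED_BATCH patch being applied; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Switch the calling process (pid 0 means "self") to SCHED_BATCH.
 * Non-realtime policies require sched_priority == 0.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int make_self_batch(void)
{
	struct sched_param sp = { .sched_priority = 0 };

	return sched_setscheduler(0, SCHED_BATCH, &sp);
}

/* Returns 1 if the calling process currently runs under SCHED_BATCH. */
static int self_is_batch(void)
{
	int policy = sched_getscheduler(0);

	assert(policy >= 0);
	return policy == SCHED_BATCH;
}
```

From a shell, `chrt -b 0 <command>` (util-linux) should have the same effect on a newly started program.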


Re: pnpacpi : exceeded the max number of IO resources

2007-12-03 Thread Shaohua Li


On Mon, 2007-12-03 at 18:02 +0100, Rene Herman wrote:
> On 30-11-07 23:22, Rene Herman wrote:
> 
> > On 30-11-07 14:14, Chris Holvenstot wrote:
> > 
> >> For what it is worth I too have seen this problem this morning and it
> >> DOES appear to be new (in contrast to a previous comment)
> >>
> >> The message:  pnpacpi: exceeded the max number of mem resources: 12
> >>
> >> is displayed each time the system is booted with the 2.6.24-rc3-git5
> >> kernel but is NOT displayed when booting 2.6.24-rc3-git4
> >>
> >> I have made no changes in my config file between these two kernels other
> >> than to accept any new defaults when running make oldconfig.
> >>
> >> If you had already narrowed it down to a change between git4 and git5 I
> >> apologize for wasting your time.  Have to run to work now.
> > 
> > Thanks, and re-added the proper CCs. Sigh...
> > 
> > Well, yes, the warning is actually new as well. Previously your kernel 
> > just silently ignored 8 more mem resources than it does now it seems.
> > 
> > Given that people are hitting these limits, it might make sense to just 
> > do away with the warning for 2.6.24 again while waiting for the dynamic 
> > code?
> 
> Ping. Should these warnings be reverted for 2.6.24?

Reverting the warning doesn't make any sense. I'd suggest increasing the
number of IO resources until Thomas's patch goes in.
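The trade-off here is about a fixed-size resource table. When it is full, extra entries are either dropped silently (the old behaviour Rene describes) or dropped with a warning (the new one). Raising the limit is what Shaohua suggests. The sketch below is hypothetical and is not the pnpacpi code. The limit of 12 only mirrors the number in Chris's log message, and the struct and function names are invented.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_MEM_RES 12          /* illustrative; mirrors the "12" in the log */

/* Hypothetical fixed-size table of memory resource start addresses. */
struct mem_res_table {
	unsigned long start[MAX_MEM_RES];
	int count;              /* entries stored */
	int dropped;            /* entries that did not fit */
};

/* Store one resource.  When the table is full the entry is dropped;
 * with 'warn' set, the first overflow is reported once, otherwise the
 * loss is silent.  Returns 0 if stored, -1 if dropped. */
static int add_mem_resource(struct mem_res_table *t, unsigned long start,
			    int warn)
{
	if (t->count >= MAX_MEM_RES) {
		if (warn && t->dropped == 0)
			fprintf(stderr,
				"pnpacpi: exceeded the max number of mem resources: %d\n",
				MAX_MEM_RES);
		t->dropped++;
		return -1;
	}
	t->start[t->count++] = start;
	return 0;
}
```

Either way, the entries past the limit are lost. The warning only makes the loss visible, which is why raising the limit is the actual fix.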

Shaohua

newlist: public malware discussion [Re: Out of tree module using LSM]

2007-12-03 Thread Jon Masters

On Mon, 2007-12-03 at 23:45 +0100, Bodo Eggert wrote:
> Jon Masters <[EMAIL PROTECTED]> wrote:
> > On Thu, 2007-11-29 at 11:11 -0800, Ray Lee wrote:
> >> On Nov 29, 2007 10:56 AM, Jon Masters <[EMAIL PROTECTED]> wrote:
> >> > On Thu, 2007-11-29 at 10:40 -0800, Ray Lee wrote:
> >> > > On Nov 29, 2007 9:36 AM, Alan Cox <[EMAIL PROTECTED]> wrote:
> 
> >> > > > > closed. But more importantly further access to it can be blocked
> >> > > > > until appropriate actions are taken which also applies with your
> >> > > > > example, no? Is
> >> > > >
> >> > > > That bit is hard- very hard.
> 
> >> To lift Alan's example, a naive first implementation
> >> would be to create a suffix tree of all of ESR's works, then scan each
> >> page on fault to see if there are any partial matches in the tree.
> > 
> > Ah, but I could write a sequence of pages that on their own looked
> > garbage, but in reality, when executed would print out a copy of the
> > Jargon File in all its glory. And if you still think you could look for
> > patterns, how about executable code that self-modifies in random ways
> > but when executed as a whole actually has the functionality of fetchmail
> > embedded within it? How would you guard against that?
> 
> You can't scan all possible code for malware:
> Take a random piece of code, possibly halting. Replace all halting conditions
> using a piece of malware. Scan it. If it were possible to detect the malware
> without false positives, you'd have solved the halting problem.

Good. I think you got the point of my sarcasm. My *point* was that we
have two different camps of people here:

* Those who think some solution is better than none.
* Those who want an unobtainable, perfect solution.

I'm not criticising, each has their position. However, I was attempting
to explain that I do fully "get it" by running through an example of how
to work around more elementary on-access scanning schemes. I know that
(no matter what marketing exists to the contrary), it is never possible
to have perfect anti-malware software. But I do think there is a time
and a place for Linux to help make some folks feel safer - on access
file scanning isn't evil, and you don't have to use it! Freedom! :-)

Having spoken to a few people, I've created the following mailing list,
so we can rant away and come up with a list of requirements to present
for further discussion. Note that this is a case where I actually expect
people to be *happy* with yet another email list :-) 

http://lists.printk.net/cgi-bin/mailman/listinfo/malware-list

Please sign up, and encourage interested third parties to do so too.
Let's work this all out. Then I'll come back sometime over the holidays
with a summary and some followup.

> If I had to design a virus scanner interface, I'd e.g. create a library*
> providing an {open|mmap}_and_scan() function that would give me a clean
> copy/really-private mapping of a scanned file, and a scan_{blob,file}()
> function that would scan a block of memory/a file.

Although I'm open to the idea, I'm almost 100% convinced that nobody is
going to buy modifying userspace applications one at a time. I think
there is a legitimate feeling of this needing to be massaged by the
kernel on some level. But I might be wrong - don't flame me.
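Jon's split-pages trick can be reproduced with a scan_blob()-style interface like the one Bodo sketches. The scanner below is hypothetical and deliberately naive: it uses a plain substring search rather than the suffix tree mentioned earlier, and all names are invented for illustration. It shows that a signature straddling a page boundary is missed when each page is scanned on its own, but found when the whole blob is scanned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the non-empty signature 'sig' (siglen bytes) occurs in
 * buf[0..len), 0 otherwise.  Naive O(len * siglen) search standing in
 * for a real signature engine. */
static int scan_blob(const unsigned char *buf, size_t len,
		     const unsigned char *sig, size_t siglen)
{
	if (siglen == 0 || siglen > len)
		return 0;
	for (size_t i = 0; i + siglen <= len; i++)
		if (memcmp(buf + i, sig, siglen) == 0)
			return 1;
	return 0;
}

/* Scan a buffer one page at a time, the way a per-page-fault scanner
 * would see it.  Matches that cross a page boundary are never seen. */
static int scan_per_page(const unsigned char *buf, size_t len,
			 size_t page_size,
			 const unsigned char *sig, size_t siglen)
{
	for (size_t off = 0; off < len; off += page_size) {
		size_t chunk = len - off < page_size ? len - off : page_size;

		if (scan_blob(buf + off, chunk, sig, siglen))
			return 1;
	}
	return 0;
}
```

Closing this gap means scanning overlapping windows or whole files, which is part of why on-access scanning at page granularity is harder than it looks.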

Jon.


RE: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread David Schwartz

> * Mark Lord <[EMAIL PROTECTED]> wrote:

> > Ack.  And what of the suggestion to try to ensure that a yielding task
> > simply not end up as the very next one chosen to run?  Maybe by
> > swapping it with another (adjacent?) task in the tree if it comes out
> > on top again?

> we did that too for quite some time in CFS - it was found to be "not
> agressive enough" by some folks and "too agressive" by others. Then when
> people started bickering over this we added these two simple corner
> cases - switchable via a flag. (minimum agression and maximum agression)

They are both correct. It is not aggressive enough if there are tasks other
than those two that are at the same static priority level and ready to run.
It is too aggressive if the task it is swapped with is at a lower static
priority level.

Perhaps it might be possible to scan for the task at the same static
priority level that is ready-to-run but last in line among other
ready-to-run tasks and put it after that task? I think that's about as close
as we can get to the POSIX-specified behavior.
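David's suggestion can be sketched on a toy run queue. This is hypothetical code, not CFS. The queue is a plain array in run order, and a yielding task moves to just behind the last queued task at its own static priority. If it has no such peer, it stays where it is. Any tasks sitting between the two, whatever their priority, end up ahead of it.

```c
#include <assert.h>
#include <string.h>

/* Toy run-queue entry: index 0 runs next. */
struct toy_task {
	int id;
	int static_prio;
};

/* Yield the task at index 'self': move it behind the last task further
 * down the queue that has the same static priority.  Returns the task's
 * new index (unchanged if no peer is queued behind it). */
static int yield_behind_peers(struct toy_task *rq, int n, int self)
{
	int last = self;
	struct toy_task me;

	assert(self >= 0 && self < n);
	for (int i = self + 1; i < n; i++)
		if (rq[i].static_prio == rq[self].static_prio)
			last = i;
	if (last == self)
		return self;    /* nobody to yield to at this level */

	me = rq[self];
	/* Shift everything between self and last forward by one slot. */
	memmove(&rq[self], &rq[self + 1],
		(size_t)(last - self) * sizeof(rq[0]));
	rq[last] = me;
	return last;
}
```

With three equal-priority tasks, yielding the first one sends it to the back. With no peer at its level, the yield is a no-op.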

> > Thanks Ingo -- I *really* like this scheduler!

Just in case this isn't clear, I like CFS too and sincerely appreciate the
work Ingo, Con, and others have done on it.

DS


Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Nick Piggin

On Tuesday 04 December 2007 09:33, Ingo Molnar wrote:
> * Mark Lord <[EMAIL PROTECTED]> wrote:
> >> heh, thanks :) For which workload does it make the biggest difference
> >> for you? (and compared to what other scheduler you used before?
> >> 2.6.22?)
> >
> > ..
> >
> > Heh.. I'm just a very unsophisticated desktop user, and I like it when
> > Thunderbird and Firefox are unaffected by the "make -j3" kernel builds
> > that are often running in another window.  BIG difference there.
> >
> > And on the cool side, the Swarm game (swarm.swf) is a great example of
> > something that used to get jerky really fast whenever anything else
> > was running, and now it really doesn't seem to be affected by
> > anything. (I don't really play computer games, but this one is has a
> > very retro feel..).
>
> nice! Do you feel any difference between 2.6.23 and 2.6.24-rc for these
> workloads? (if you've tried .24 already)

I also wonder how the average timeslice and the number of context
switches compare between 2.6.22 and 2.6.23-4. That would be interesting to see.
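One way to collect those numbers from userspace is getrusage(). Its ru_nvcsw and ru_nivcsw fields count voluntary and involuntary context switches. The sketch below uses hypothetical helper names. It also derives a rough average timeslice as CPU time divided by the number of switches, which is only a crude proxy and not the scheduler's real slice length.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

struct csw_stats {
	long voluntary;         /* ru_nvcsw: gave up the CPU, e.g. blocked */
	long involuntary;       /* ru_nivcsw: preempted */
	double cpu_us;          /* user + system CPU time, microseconds */
};

/* Fill *s for the calling process.  Returns 0 on success, -1 on error. */
static int read_csw_stats(struct csw_stats *s)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	s->voluntary = ru.ru_nvcsw;
	s->involuntary = ru.ru_nivcsw;
	s->cpu_us = (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1e6 +
		    ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
	return 0;
}

/* Rough average CPU time per run between switches, in microseconds.
 * Returns 0 when no switch has been recorded yet. */
static double avg_timeslice_us(const struct csw_stats *s)
{
	long switches = s->voluntary + s->involuntary;

	return switches > 0 ? s->cpu_us / (double)switches : 0.0;
}
```

To compare kernels, call read_csw_stats() at the end of the same workload on 2.6.22 and on the CFS kernels, then look at both counters and the derived average.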

Re: Regression - 2.6.24-rc3 - umem nvram card driver oops

2007-12-03 Thread Neil Brown

On Tuesday December 4, [EMAIL PROTECTED] wrote:
> Neil,
> 
> I just upgraded an ia64 (Altix, 16k page size) test box to 2.6.24-rc3
> from 2.6.23 and I get it panicing on boot in the umem driver.

Cool - someone is using umem!  And even testing it.  Thanks!

A quick look shows a probable NULL deref.  Let me know if this fixes
it.  I'll read through the offending patch more carefully and make
sure there is nothing else wrong.

NeilBrown


Fix possible NULL dereference in umem.c

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/block/umem.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff .prev/drivers/block/umem.c ./drivers/block/umem.c
--- .prev/drivers/block/umem.c  2007-12-04 11:11:30.0 +1100
+++ ./drivers/block/umem.c  2007-12-04 11:11:42.0 +1100
@@ -484,7 +484,8 @@ static void process_page(unsigned long d
page->idx++;
if (page->idx >= bio->bi_vcnt) {
page->bio = bio->bi_next;
-   page->idx = page->bio->bi_idx;
+   if (page->bio)
+   page->idx = page->bio->bi_idx;
}
 
pci_unmap_page(card->dev, desc->data_dma_handle,

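The bug pattern is generic. After advancing to the next element of a singly linked list, the code read a field of that element without checking for the end of the chain. Below is a hypothetical userspace reduction of the fixed logic. It uses simplified stand-in types, not the real struct bio.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a chain of bios. */
struct fake_bio {
	int bi_idx;             /* first segment to process */
	int bi_vcnt;            /* number of segments */
	struct fake_bio *bi_next;
};

struct fake_page {
	struct fake_bio *bio;   /* bio currently being processed */
	int idx;                /* segment index within that bio */
};

/* Account for one completed segment and move to the next bio when the
 * current one is exhausted.  Mirrors the fixed umem logic: bi_idx is
 * only read when there actually is a next bio. */
static void complete_segment(struct fake_page *page)
{
	struct fake_bio *bio = page->bio;

	assert(bio != NULL);
	page->idx++;
	if (page->idx >= bio->bi_vcnt) {
		page->bio = bio->bi_next;
		if (page->bio)          /* the NULL check the patch adds */
			page->idx = page->bio->bi_idx;
	}
}
```

Without the check, the last segment of the last bio would dereference a NULL bi_next, which matches the oops on boot.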


Re: [BUG] Strange 1-second pauses during Resume-from-RAM

2007-12-03 Thread Jörn Engel

On Mon, 3 December 2007 01:57:02 +0100, Jörn Engel wrote:
> 
> After an eternity of compile time, this config does generate some useful
> output.  qemu is not to blame.

Or is it?  The output definitely looks suspicious.  Large amounts of
code get processed within a microsecond, while update_wall_time()
appears to cause huge delays every time it is called:
http://logfs.org/~joern/trace

Does this output make sense or does it rather indicate some sloppiness
wrt. time in the qemu virtual machine?

Jörn

-- 
tglx1 thinks that joern should get a (TM) for "Thinking Is Hard"
-- Thomas Gleixner

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Ingo Molnar


* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> > > Er, it won't play well if that happen when tasks are frozen for 
> > > suspend.
> > 
> > right now any suspend attempt times out after 20 seconds:
> > 
> >   $ grep TIMEOUT kernel/power/process.c
> >   #define TIMEOUT (20 * HZ)
> >   end_time = jiffies + TIMEOUT;
> 
> This is the timeout for freezing tasks, but if the freezing succeeds, 
> they can stay in TASK_UNINTERRUPTIBLE for quite some more time, 
> especially during a hibernation (the tasks stay frozen until we power 
> off the system after saving the image).

ah, ok. So this was a live bug - thanks for the clarification.
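Rafael's point implies two things. The detector must skip frozen tasks, and its timeout comparison should survive counter wraparound, the same idea as the kernel's time_after(). The sketch below is hypothetical and uses invented types. The real detector walks the task list and uses the kernel's own fields.

```c
#include <assert.h>

/* Wraparound-safe "a is after b" for a free-running tick counter,
 * the same idea as the kernel's time_after() macro. */
static int ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

enum watched_state { W_RUNNING, W_UNINTERRUPTIBLE };

struct watched_task {
	enum watched_state state;
	int frozen;                     /* parked by the freezer */
	unsigned long last_switch;      /* tick of last context switch */
};

/* Report a task as hung if it has sat in uninterruptible sleep for
 * longer than 'timeout' ticks.  Frozen tasks are skipped: during
 * hibernation they legitimately stay blocked until power-off. */
static int task_is_hung(const struct watched_task *t, unsigned long now,
			unsigned long timeout)
{
	if (t->state != W_UNINTERRUPTIBLE || t->frozen)
		return 0;
	return ticks_after(now, t->last_switch + timeout);
}
```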

Ingo

Re: [PATCH] SC26XX: New serial driver for SC2681 uarts

2007-12-03 Thread Arjan van de Ven

On Mon, 3 Dec 2007 15:53:17 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Sun,  2 Dec 2007 20:43:46 +0100 (CET)
> Thomas Bogendoerfer <[EMAIL PROTECTED]> wrote:
> 
> > New serial driver for SC2681/SC2691 uarts. Older SNI RM400 machines
> > are using these chips for onboard serial ports.
> > 
> 
> Little things...
> 
> > --- /dev/null
> > +++ b/drivers/serial/sc26xx.c
> > @@ -0,0 +1,757 @@
> > +/*
> > + * SC268xx.c: Serial driver for Philiphs SC2681/SC2692 devices.
> > + *
> > + * Copyright (C) 2006,2007 Thomas Bogendörfer ([EMAIL PROTECTED])
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#if defined(CONFIG_MAGIC_SYSRQ)
> > +#define SUPPORT_SYSRQ
> > +#endif
> > +
> > +#include 
> > +
> > +#define SC26XX_MAJOR 204
> > +#define SC26XX_MINOR_START   205
> > +#define SC26XX_NR            2

did lanana assign these numbers officially?


Re: [PATCH] SC26XX: New serial driver for SC2681 uarts

2007-12-03 Thread Andrew Morton

On Sun,  2 Dec 2007 20:43:46 +0100 (CET)
Thomas Bogendoerfer <[EMAIL PROTECTED]> wrote:

> New serial driver for SC2681/SC2691 uarts. Older SNI RM400 machines are
> using these chips for onboard serial ports.
> 

Little things...

> --- /dev/null
> +++ b/drivers/serial/sc26xx.c
> @@ -0,0 +1,757 @@
> +/*
> + * SC268xx.c: Serial driver for Philiphs SC2681/SC2692 devices.
> + *
> + * Copyright (C) 2006,2007 Thomas Bogendörfer ([EMAIL PROTECTED])
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#if defined(CONFIG_MAGIC_SYSRQ)
> +#define SUPPORT_SYSRQ
> +#endif
> +
> +#include 
> +
> +#define SC26XX_MAJOR 204
> +#define SC26XX_MINOR_START   205
> +#define SC26XX_NR            2
> +
> +struct uart_sc26xx_port {
> + struct uart_port  port[2];
> + u8 dsr_mask[2];
> + u8 cts_mask[2];
> + u8 dcd_mask[2];
> + u8 ri_mask[2];
> + u8 dtr_mask[2];
> + u8 rts_mask[2];
> + u8 imr;
> +};
> +
> +/* register common to both ports */
> +#define RD_ISR  0x14
> +#define RD_IPR  0x34
> +
> +#define WR_ACR  0x10
> +#define WR_IMR  0x14
> +#define WR_OPCR 0x34
> +#define WR_OPR_SET  0x38
> +#define WR_OPR_CLR  0x3C
> +
> +/* access common register */
> +#define READ_SC(p, r)        readb ((p)->membase + RD_##r)
> +#define WRITE_SC(p, r, v)    writeb ((v), (p)->membase + WR_##r)

No space before the (.  checkpatch misses this.

> +/* register per port */
> +#define RD_PORT_MRx 0x00
> +#define RD_PORT_SR  0x04
> +#define RD_PORT_RHR 0x0c
> +
> +#define WR_PORT_MRx 0x00
> +#define WR_PORT_CSR 0x04
> +#define WR_PORT_CR  0x08
> +#define WR_PORT_THR 0x0c
> +
> +/* access port register */
> +#define READ_SC_PORT(p, r) \
> + readb((p)->membase + (p)->line * 0x20 + RD_PORT_##r)
> +#define WRITE_SC_PORT(p, r, v) \
> + writeb((v), (p)->membase + (p)->line * 0x20 + WR_PORT_##r)

eww, ugly.  Why not have a nice C function which is passed RD_PORT_SR,
RD_PORT_RHR, etc?

That has the (minor) advantage that it won't malfunction if someone does

WRITE_SC_PORT(foo++, r, v);

(the macro references an arg more than once)
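Here is Andrew's point as a userspace sketch. A plain array stands in for the MMIO window, so no readb()/writeb() are involved, and every name is hypothetical. The function-like macro evaluates its port argument twice, while an inline helper evaluates it once. The sketch uses a side-effecting lookup function instead of foo++, because two unsequenced increments in one expression would be undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define WR_PORT_THR 0x0c

/* Minimal stand-in for struct uart_port: register window + line. */
struct fake_port {
	volatile uint8_t *membase;
	int line;
};

/* Macro in the style of the patch: 'p' is expanded twice. */
#define WRITE_SC_PORT(p, r, v) \
	((p)->membase[(p)->line * 0x20 + WR_PORT_##r] = (v))

/* Function alternative: the caller's argument is evaluated once. */
static inline void sc26xx_write_port(struct fake_port *port,
				     unsigned int reg, uint8_t val)
{
	port->membase[port->line * 0x20 + reg] = val;
}

static volatile uint8_t regs[0x40];
static struct fake_port ports[2] = { { regs, 0 }, { regs, 1 } };
static int lookups;

/* Side-effecting argument: counts how often it is evaluated. */
static struct fake_port *lookup_port(int i)
{
	lookups++;
	return &ports[i];
}

/* How many times each form evaluates its port argument. */
static int macro_evaluations(void)
{
	lookups = 0;
	WRITE_SC_PORT(lookup_port(1), THR, 'A');
	return lookups;
}

static int function_evaluations(void)
{
	lookups = 0;
	sc26xx_write_port(lookup_port(1), WR_PORT_THR, 'B');
	return lookups;
}
```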

> +/* SR bits */
> +#define SR_BREAK    (1 << 7)
> +#define SR_FRAME    (1 << 6)
> +#define SR_PARITY   (1 << 5)
> +#define SR_OVERRUN  (1 << 4)
> +#define SR_TXRDY    (1 << 2)
> +#define SR_RXRDY    (1 << 0)
> +
> +#define CR_RES_MR   (1 << 4)
> +#define CR_RES_RX   (2 << 4)
> +#define CR_RES_TX   (3 << 4)
> +#define CR_STRT_BRK (6 << 4)
> +#define CR_STOP_BRK (7 << 4)
> +#define CR_DIS_TX   (1 << 3)
> +#define CR_ENA_TX   (1 << 2)
> +#define CR_DIS_RX   (1 << 1)
> +#define CR_ENA_RX   (1 << 0)
> +
> +/* ISR bits */
> +#define ISR_RXRDYB  (1 << 5)
> +#define ISR_TXRDYB  (1 << 4)
> +#define ISR_RXRDYA  (1 << 1)
> +#define ISR_TXRDYA  (1 << 0)
> +
> +/* IMR bits */
> +#define IMR_RXRDY   (1 << 1)
> +#define IMR_TXRDY   (1 << 0)
> +
> +static void sc26xx_enable_irq(struct uart_port *port, int mask)
> +{
> + struct uart_sc26xx_port *up;
> + int line = port->line;
> +
> + port -= line;
> + up = (struct uart_sc26xx_port *)port;

Yeah, lots of serial drivers do this and it's old-fashioned.  It would be
clearer to use container_of() rather than the open-coded typecast.  That
way, the uart_port doesn't need to be the first member of uart_sc26xx_port,
too.
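Here is a userspace sketch of the container_of() approach. container_of() is reimplemented with offsetof() so the example compiles outside the kernel. The structs are trimmed stand-ins with hypothetical names. The extra leading member shows that the ports no longer need to come first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_uart_port {
	int line;               /* 0 or 1: index into the port array */
};

struct fake_sc26xx_port {
	uint8_t imr;            /* deliberately placed before the ports */
	struct fake_uart_port port[2];
};

/* Recover the containing chip structure from either of its ports. */
static struct fake_sc26xx_port *to_sc26xx(struct fake_uart_port *port)
{
	/* port - line points at port[0]; step back to the container. */
	return container_of(port - port->line, struct fake_sc26xx_port, port);
}
```

sc26xx_enable_irq() could then compute the container with a helper like to_sc26xx() instead of modifying port and casting it.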


> + up->imr |= mask << (line * 4);
> + WRITE_SC(port, IMR, up->imr);
> +}
>
> ...
>
> +
> +/* port->lock is not held.  */
> +static unsigned int sc26xx_tx_empty(struct uart_port *port)
> +{
> + unsigned long flags;
> + unsigned int ret;
> +
> + spin_lock_irqsave(&port->lock, flags);
> + ret = (READ_SC_PORT(port, SR) & SR_TXRDY) ? TIOCSER_TEMT : 0;
> + spin_unlock_irqrestore(&port->lock, flags);
> + return ret;
> +}

I suspect the locking here doesn't actually do anything?

> +/* port->lock is not held.  */
> +static void sc26xx_set_termios(struct uart_port *port, struct ktermios *termios,
> +   struct ktermios *old)

hm, termios stuff.  I'll cc Alan on the commit...

> +static struct uart_port *sc26xx_port;
> +
> +#ifdef CONFIG_SERIAL_SC26XX_CONSOLE
> +static inline void sc26xx_console_putchar(struct uart_port *port, char c)
> +{
> + unsigned long flags;
> + int limit = 100;
> +
> + spin_lock_irqsave(&port->lock, flags);
> +
> + while (limit-- > 0) {
> + if (READ_SC_PORT(port, SR) & SR_TXRDY) {
> + WRITE_SC_PORT(port, THR, c);
> + break;
> + }
> + udelay(2);
> + }
> +
> + spin_unlock_irqrestore(&port->lock, flags);
> +}

This is far too large to be inlined.

> +static void sc26xx_console_write(struct console *con, const char *s, unsigned n)
> +{
> + struct uart_port *port = sc26xx_port;
> + int i;
> +
> + for (i = 0; i <

[PATCH] Freezer: Fix JFFS2 garbage collector freezing issue (rev. 2)

2007-12-03 Thread Rafael J. Wysocki

[This is a replacement for
freezer-fix-jffs2-garbage-collector-freezing-issue.patch]
---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Fix breakage caused by commit d5d8c5976d6adeddb8208c240460411e2198b393
"freezer: do not send signals to kernel threads" in
jffs2_garbage_collect_thread() that assumed it would be sent signals
by the freezer.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Cc: David Woodhouse <[EMAIL PROTECTED]>
Cc: Pete MacKay <[EMAIL PROTECTED]>
---
 fs/jffs2/background.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/fs/jffs2/background.c
===
--- linux-2.6.orig/fs/jffs2/background.c
+++ linux-2.6/fs/jffs2/background.c
@@ -105,7 +105,7 @@ static int jffs2_garbage_collect_thread(
 
/* Put_super will send a SIGKILL and then wait on the sem.
 */
-   while (signal_pending(current)) {
+   while (signal_pending(current) || freezing(current)) {
siginfo_t info;
unsigned long signr;
 
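The difference between the two loop guards can be shown with a tiny model. It is hypothetical and single-threaded, not kernel code. It assumes, as the removed lines in Rafael's first version show, that the loop body starts with try_to_freeze(). With freezer signals gone, a freeze request that arrives while no signal is pending only reaches that freeze point if the loop condition also tests freezing().

```c
#include <assert.h>

/* Minimal model of the GC thread's state. */
struct model_task {
	int signal_pending;     /* e.g. SIGKILL from put_super */
	int freeze_requested;   /* freezer wants this thread stopped */
	int frozen;
};

/* Stand-in for try_to_freeze(): freeze if a request is pending. */
static int model_try_to_freeze(struct model_task *t)
{
	if (!t->freeze_requested)
		return 0;
	t->freeze_requested = 0;
	t->frozen = 1;
	return 1;
}

/* One pass over the signal-handling loop.  'check_freezing' selects
 * the rev. 2 guard (signal_pending || freezing) instead of the plain
 * signal_pending guard.  Returns 1 if the thread froze. */
static int model_signal_loop(struct model_task *t, int check_freezing)
{
	while (t->signal_pending ||
	       (check_freezing && t->freeze_requested)) {
		if (model_try_to_freeze(t))
			return 1;
		t->signal_pending = 0;  /* signal dequeued and handled */
	}
	return 0;
}
```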

Re: [PATCH] Freezer: Fix JFFS2 garbage collector freezing issue (was: Re: JFFS2 garbage collection threads not freezing?)

2007-12-03 Thread Rafael J. Wysocki

On Saturday, 1 of December 2007, Rafael J. Wysocki wrote:
> On Friday, 30 of November 2007, Pete MacKay wrote:
[--snip--]
> ---
> Subject: Freezer: Fix JFFS2 garbage collector freezing issue
> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Fix breakage caused by commit d5d8c5976d6adeddb8208c240460411e2198b393
> "freezer: do not send signals to kernel threads" in
> jffs2_garbage_collect_thread() that assumed it would be sent signals
> by the freezer.
> 
> Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> Cc: Pete MacKay <[EMAIL PROTECTED]>
> Cc: Andrew Morton <[EMAIL PROTECTED]>
> ---
>  fs/jffs2/background.c |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6/fs/jffs2/background.c
> ===
> --- linux-2.6.orig/fs/jffs2/background.c
> +++ linux-2.6/fs/jffs2/background.c
> @@ -103,15 +103,17 @@ static int jffs2_garbage_collect_thread(
>  get there first. */
>   yield();
>  
> + /* If system suspend is in progress, go to the refrigerator and
> +start again when the suspend is done */
> + if (try_to_freeze())
> + goto again;
> +

This still has the problem that, if the freeze request comes exactly here, the
loop below will not allow us to freeze.

>   /* Put_super will send a SIGKILL and then wait on the sem.
>*/
>   while (signal_pending(current)) {
>   siginfo_t info;
>   unsigned long signr;
>  
> - if (try_to_freeze())
> - goto again;
> -
>   signr = dequeue_signal_lock(current, &current->blocked, &info);
>  
>   switch(signr) {
> --

I'll send another patch for this in a while.

Greetings,
Rafael

RE: what happened to RELATIME?

2007-12-03 Thread Joakim Tjernlund

> -Original Message-
> From: Mark Fasheh [mailto:[EMAIL PROTECTED] 
> Sent: 3 December 2007 22:34
> To: Joakim Tjernlund
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: what happened to RELATIME?
> 
> On Mon, Dec 03, 2007 at 09:16:15PM +0100, Joakim Tjernlund wrote:
> > Looking in 2.6.23 sources it seems like only ocfs2 has added
> > support for RELATIME. Was RELATIME a bad idea or is there
> > some other reason other filesystems hasn't added support?
> 
> Ocfs2 just needs a bit of explicit support - as far as I recall, other
> file systems should have gotten it automagically via the generic vfs
> changes.
>   --Mark

Ahh, that explains it. Can anyone confirm that all in-kernel filesystems
honour relatime?
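For reference, here is my understanding of the generic rule Mark mentions. It is an assumption and has not been checked against every tree. With relatime, atime is only updated when it is not newer than mtime or ctime, so a file that is read repeatedly without changing costs one atime write rather than one per read. The helper below is hypothetical and uses plain seconds instead of struct timespec.

```c
#include <assert.h>
#include <time.h>

/* Relative-atime rule: only update atime when the file was modified or
 * had its inode changed since it was last read, i.e. when atime is not
 * newer than mtime or the inode change time. */
static int relatime_needs_update(time_t atime, time_t mtime,
				 time_t change_time)
{
	return mtime >= atime || change_time >= atime;
}
```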

Thanks, Jocke


Re: [Bugme-new] [Bug 9482] New: kernel GPF in 2.6.24 (g09f345da)

2007-12-03 Thread Ed L. Cashin

On Mon, Dec 03, 2007 at 03:13:49PM -0800, Andrew Morton wrote:
> On Mon, 3 Dec 2007 14:47:22 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > Does this fix?
> 
> Slightly more elaborate version

Yes, this patch does eliminate the problem.  Without it, no write can
complete, and with it I have seen many writes complete without any
trouble.

Thank you for looking into this.  I will look more closely at this
patch tomorrow.

-- 
  Ed L Cashin <[EMAIL PROTECTED]>

[GIT PULL] sh updates for 2.6.24-rc4, part 2.

2007-12-03 Thread Paul Mundt

Please pull from:

master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6.24.git

Which contains:

Nobuhiro Iwamatsu (2):
  sh: Fix PCI IO space base address of SH7780.
  sh: Support PCI IO access of SH7780 base boards.

 arch/sh/drivers/pci/ops-r7780rp.c |4 ++--
 arch/sh/drivers/pci/ops-se7780.c  |4 ++--
 arch/sh/drivers/pci/pci-sh7780.h  |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

Re: [PATCH] Add EXPORT_SYMBOL(ksize);

2007-12-03 Thread Andrew Morton

On Sun, 2 Dec 2007 14:48:42 +0100
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> On Sun, Dec 02, 2007 at 05:43:39PM +0900, Tetsuo Handa wrote:
> > 
> > mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't. I don't know 
> > why.
> >...
> 
> That's due to the fact that my patch to remove this unused export from 
> slub was not yet applied...
> 
> Where is the modular in-kernel user?
> 

binfmt_flat.c, binfmt_elf_fdpic.c.

Re: [PATCH 1/7][QUOTA] Move sysctl management code under ifdef CONFIG_SYSCTL

2007-12-03 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

>> +#ifdef CONFIG_SYSCTL
>>  static ctl_table fs_dqstats_table[] = {
>>  {
>>  .ctl_name   = FS_DQ_LOOKUPS,
>> @@ -1918,6 +1919,7 @@ static ctl_table sys_table[] = {
>>  },
>>  { .ctl_name = 0 },
>>  };
>> +#endif
>>  
>>  static int __init dquot_init(void)
>>  {
>> @@ -1926,7 +1928,9 @@ static int __init dquot_init(void)
>>  
>>  printk(KERN_NOTICE "VFS: Disk quotas %s\n", __DQUOT_VERSION__);
>>  
>> +#ifdef CONFIG_SYSCTL
>>  register_sysctl_table(sys_table);
>> +#endif
>>  
>>  dquot_cachep = kmem_cache_create("dquot",
>>  sizeof(struct dquot), sizeof(unsigned long) * 4,
>
> We should avoid the ifdefs around the register_sysctl_table() call.
>
> At present the !CONFIG_SYSCTL implementation of register_sysctl_table() is
> a non-inlined NULL-returning stub.  All we have to do is to inline that stub
> then these ifdefs can go away.

Yes agreed.  What we need to do is to give the compiler enough information
to know that the sysctl table is not used.

Making the function an inline and having the table marked "static"
should be enough for the compiler to do the optimization for us instead
of having to manually remove sysctl tables by hand.

Doing it with an inline function should save us a lot of work and maintenance
in the long run.   I will see if I can cook up that patch.

> The same applies to register_sysctl_paths().

Agreed.
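The pattern Andrew and Eric describe looks roughly like this. The sketch uses stand-in types and a hypothetical register_sysctl_table_sketch() name rather than the real kernel declarations. With CONFIG_SYSCTL unset, the registration call becomes a static inline stub that returns NULL. Callers then need no #ifdef, and the compiler is free to discard a static table that nothing else references.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types (sketch only). */
struct ctl_table { const char *procname; int *data; };
struct ctl_table_header { struct ctl_table *table; };

/* CONFIG_SYSCTL is deliberately left undefined here. */
#ifdef CONFIG_SYSCTL
struct ctl_table_header *register_sysctl_table_sketch(struct ctl_table *table);
#else
/* Inline no-op: callers compile unchanged, and because the argument is
 * never used, a static table passed only here can be optimized away. */
static inline struct ctl_table_header *
register_sysctl_table_sketch(struct ctl_table *table)
{
	(void)table;
	return NULL;
}
#endif

static int dq_lookups;
static struct ctl_table sys_table_sketch[] = {
	{ "lookups", &dq_lookups },
	{ NULL, NULL },
};

/* Caller in the style of dquot_init(): no #ifdef needed. */
static int init_sketch(void)
{
	struct ctl_table_header *h =
		register_sysctl_table_sketch(sys_table_sketch);

	return h != NULL;
}
```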

Eric

Re: SoftMAC: Getting essid from req_essid

2007-12-03 Thread Larry Finger


Ray Lee wrote:

Hey there Larry, all,

git blame fingered commit id efe870f9 (from Larry) for adding a couple
of fairly harmless looking messages to
net/ieee80211/softmac/ieee80211softmac_wx.c. The problem is that one
of them is clogging up my syslog to the tune of once a second or so
("SoftMAC: Getting essid from req_essid"), and rolling everything else
out of my dmesg.

I just rebooted into 2.6.23-rc3+some, and after 36 minutes of uptime I
already have:

$ dmesg | cut -d ']' -f2- | sort | uniq -c | sort -nr | head -3

   1047  SoftMAC: Getting essid from req_essid
 38  SoftMAC: Getting essid from associate_essid
 22  SoftMAC: Scanning finished: scanned 13 channels starting with channel 1

Is the message important for debugging, or can I make a patch to yank
the silly thing?


Just turn off SoftMAC debugging.

Larry

Re: [Bugme-new] [Bug 9482] New: kernel GPF in 2.6.24 (g09f345da)

2007-12-03 Thread Ed L. Cashin

On Mon, Dec 03, 2007 at 02:47:22PM -0800, Andrew Morton wrote:
> On Mon, 3 Dec 2007 16:38:37 -0500
> "Ed L. Cashin" <[EMAIL PROTECTED]> wrote:
...
> > It appears that the fbc->counters pointer is NULL.
> 
> Does this fix?
> 
> --- a/drivers/block/aoe/aoeblk.c~a
> +++ a/drivers/block/aoe/aoeblk.c
> @@ -6,6 +6,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -228,6 +229,7 @@ aoeblk_gdalloc(void *vp)
>  
>   spin_lock_irqsave(&d->lock, flags);
>   blk_queue_make_request(&d->blkq, aoeblk_make_request);
> + bdi_init(&d->blkq.backing_dev_info);
>   gd->major = AOE_MAJOR;
>   gd->first_minor = d->sysminor * AOE_PARTITIONS;
>   gd->fops = &aoe_bdops;
> _
> 
> 
> 

No, the behavior doesn't change with this patch applied.

Meanwhile I have started a git bisect, and hopefully that will turn up
a specific patch before I hit an unbootable kernel or get my machine
in a state where it won't boot.

-- 
  Ed L Cashin <[EMAIL PROTECTED]>

Re: Need lockdep help

2007-12-03 Thread Jarek Poplawski

Alan Stern wrote, On 12/03/2007 04:08 PM:

> On Mon, 3 Dec 2007, Jarek Poplawski wrote:
> 
>>> System sleep start:
>>> down_read(notifier-chain rwsem);
>>> call the notifier routine
>>> down_write(_sleep_in_progress_rwsem);
>>> up_read(notifier-chain rwsem);
>>>
>>> System sleep end:
>>> down_read(notifier-chain rwsem);
>>> call the notifier routine
>>> up_write(_sleep_in_progress_rwsem);
>>> up_read(notifier-chain rwsem);
>>>
>>> This creates a lockdep violation; each rwsem in turn is locked while 
>>> the other is being held.  However the only way this could lead to 
>>> deadlock would be if there was already a bug in the system Power 
>>> Management code (overlapping notifications).
>> Actually, IMHO, there is no reason for any lockdep violation:
>>
>> thread #1: has down_read(A); waits for #2 to down_write(B)
>> thread #2: has down_write(B); never waits for #1 to down_read(A)
>>
>> So, deadlock isn't possible here. If lockdep reports something else it
>> should be fixed (and you'd be right to omit lockdep until this is
>> done).
> 
> I think the reasoning goes the way Arjan described.  Suppose in between
> #1 and #2 there is thread #3 trying to do down_write(A) and waiting for
> #1.  Then thread #2 doesn't have to wait for #1 directly, but it would
> have to wait for #3.

As a matter of fact, I completely missed Arjan's point, because I thought
you had described these locks as lockdep reported them, and that report
says whether each lock is taken for reading or writing... Since you still
seem to be guessing at a possible scenario above, maybe it would be easier
to show this report (and maybe a piece of this code, if possible)?

Btw., even if it were as you suggest, it still shouldn't make any
difference: if thread #3 is only waiting for the lock taken for reading,
then I can't see why thread #2 has to wait for anything. Probably more
dangerous, at least for lockdep, would be thread #1 taking down_write(A)
in between, but if all this were possible only within one thread, there
would still be no reason to change a good program only to please lockdep.

> In my case the simplest answer appears to be the replace the rwsem
> with something slightly more complicated (a mutex plus a boolean flag 
> -- the loss of concurrency won't matter much since it isn't on a hot 
> path).

I'm not sure I understand your plan, but I doubt there should be such
problems with holding an rwsem across a system sleep, so maybe it would be
better to figure out what really scares lockdep and fix the right place?
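Alan's mutex-plus-boolean alternative can be sketched with POSIX threads so it runs in userspace (build with -pthread). All names are hypothetical, and this is not the actual PM code. The sleep-start notifier sets the flag and the sleep-end notifier clears it. Former down_read() users check the flag under the mutex, so no lock is held across the whole sleep, and lockdep has no ordering between two sleeping locks to complain about.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t sleep_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sleep_in_progress;

/* Called from the "system sleep start" notifier. */
static void sleep_start(void)
{
	pthread_mutex_lock(&sleep_lock);
	sleep_in_progress = true;
	pthread_mutex_unlock(&sleep_lock);
}

/* Called from the "system sleep end" notifier. */
static void sleep_end(void)
{
	pthread_mutex_lock(&sleep_lock);
	sleep_in_progress = false;
	pthread_mutex_unlock(&sleep_lock);
}

/* Former down_read() users: run 'work' only if no sleep is in progress.
 * Holding the mutex while working keeps sleep_start() from slipping in
 * halfway, at the price of serializing these callers.  Returns 0 if the
 * work ran, -1 if it was refused. */
static int run_unless_sleeping(void (*work)(void *), void *arg)
{
	int ret = -1;

	pthread_mutex_lock(&sleep_lock);
	if (!sleep_in_progress) {
		work(arg);
		ret = 0;
	}
	pthread_mutex_unlock(&sleep_lock);
	return ret;
}

/* Example work item: bump an int counter. */
static void increment(void *arg)
{
	(*(int *)arg)++;
}
```

The serialization in run_unless_sleeping() is the loss of concurrency Alan mentions. Whether it matters depends on how hot that path is.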

Jarek P.

Re: [PATCH] (2.6.24-rc3-mm2) -mm Smack mutex cleanup

2007-12-03 Thread Jiri Slaby

On 12/03/2007 07:39 PM, Casey Schaufler wrote:
> From: Casey Schaufler <[EMAIL PROTECTED]>
> 
> Clean out unnecessary mutex initializations for Smack list locks.
> Once this is done, there is no need for them to be shared among
> multiple files, so pull them out of the header file and put them
> in the files where they belong.

Then they could be static.

> Pull unnecessary locking from smack_inode_setsecurity, it used
> to be required when the assignment was not guaranteed to be a
> scalar value but isn't now.
> 
> Change uses of __capable(current,...) to capable(...).
> Take out an inappropriate cast. Use container_of() instead
> of doing the same calculation by hand.
> Fix comment spelling errors.

That is too many different changes for a patch with this name.

> Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>
> 
> ---
> 
> Tested with stamp-2007-11-30-16-39
> 
>  security/smack/smack.h|3 --
>  security/smack/smack_access.c |3 ++
>  security/smack/smack_lsm.c|   34 +---
>  security/smack/smackfs.c  |6 +
>  4 files changed, 19 insertions(+), 27 deletions(-)
> 
> diff -uprN -X linux-2.6.24-rc3-mm2-base/Documentation/dontdiff linux-2.6.24-rc3-mm2-base/security/smack/smack_lsm.c linux-2.6.24-rc3-mm2-smack/security/smack/smack_lsm.c
> --- linux-2.6.24-rc3-mm2-base/security/smack/smack_lsm.c  2007-11-27 16:47:05.0 -0800
> +++ linux-2.6.24-rc3-mm2-smack/security/smack/smack_lsm.c 2007-11-28 11:46:13.0 -0800
[...]
> @@ -748,9 +746,7 @@ static int smack_inode_setsecurity(struc
>   return -EINVAL;
>  
>   if (strcmp(name, XATTR_SMACK_SUFFIX) == 0) {
> - mutex_lock(&nsp->smk_lock);
>   nsp->smk_inode = sp;
> - mutex_unlock(&nsp->smk_lock);
>   return 0;
>   }
>   /*

Ok, it still might be atomic as a variable change, but it will break scenarios
such as

mutex_lock(&nsp->smk_lock);
create(nsp->smk_inode);
cook_a_dinner();
get_info(nsp->smk_inode);
mutex_unlock(&nsp->smk_lock);

If smack_inode_setsecurity() is called during cook_a_dinner(), the attribute
changes even though the caller holds smk_lock...

Doesn't this matter?
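The scenario can be modeled in user space: the pointer store itself is atomic, but a caller that holds smk_lock across two reads only sees a stable value if the setter takes the same lock. The names below are hypothetical stand-ins, and pthread mutexes substitute for kernel mutexes; this is a sketch of the argument, not Smack code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-ins for nsp->smk_lock and nsp->smk_inode. */
static pthread_mutex_t smk_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(const char *) smk_inode;

struct setter_arg {
	const char *label;
	int take_lock;
};

/* Models smack_inode_setsecurity(): one atomic pointer store. */
static void *setsecurity(void *p)
{
	struct setter_arg *a = p;

	if (a->take_lock)
		pthread_mutex_lock(&smk_lock);
	atomic_store(&smk_inode, a->label);
	if (a->take_lock)
		pthread_mutex_unlock(&smk_lock);
	return NULL;
}

/*
 * Models the caller in the scenario: it holds smk_lock across two reads and
 * expects the label not to change in between. Returns 1 if both reads match,
 * 0 if they differ, -1 if the setter thread could not be started.
 */
static int label_stable_under_lock(int setter_takes_lock)
{
	struct setter_arg arg = { "label-B", setter_takes_lock };
	struct timespec dinner = { 0, 50 * 1000 * 1000 };	/* 50 ms */
	const char *first, *second;
	pthread_t t;

	atomic_store(&smk_inode, "label-A");
	pthread_mutex_lock(&smk_lock);
	first = atomic_load(&smk_inode);
	if (pthread_create(&t, NULL, setsecurity, &arg) != 0) {
		pthread_mutex_unlock(&smk_lock);
		return -1;
	}
	nanosleep(&dinner, NULL);		/* cook_a_dinner() */
	second = atomic_load(&smk_inode);
	pthread_mutex_unlock(&smk_lock);
	pthread_join(t, NULL);
	return first == second;
}
```

With setter_takes_lock set to 0 the two reads will usually differ, since nothing stops the store from landing during the sleep; with it set to 1 they cannot.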

Re: [Bugme-new] [Bug 9482] New: kernel GPF in 2.6.24 (g09f345da)

2007-12-03 Thread Andrew Morton

On Mon, 3 Dec 2007 14:47:22 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> Does this fix?

Slightly more elaborate version

- handle errors

- don't do illegal things under spinlock

- clean up error unwinding

--- a/drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info
+++ a/drivers/block/aoe/aoeblk.c
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -210,25 +211,20 @@ aoeblk_gdalloc(void *vp)
if (gd == NULL) {
printk(KERN_ERR "aoe: cannot allocate disk structure for %ld.%ld\n",
d->aoemajor, d->aoeminor);
-   spin_lock_irqsave(&d->lock, flags);
-   d->flags &= ~DEVFL_GDALLOC;
-   spin_unlock_irqrestore(&d->lock, flags);
-   return;
+   goto err;
}
 
d->bufpool = mempool_create_slab_pool(MIN_BUFS, buf_pool_cache);
if (d->bufpool == NULL) {
printk(KERN_ERR "aoe: cannot allocate bufpool for %ld.%ld\n",
d->aoemajor, d->aoeminor);
-   put_disk(gd);
-   spin_lock_irqsave(&d->lock, flags);
-   d->flags &= ~DEVFL_GDALLOC;
-   spin_unlock_irqrestore(&d->lock, flags);
-   return;
+   goto err_disk;
}
 
-   spin_lock_irqsave(&d->lock, flags);
blk_queue_make_request(&d->blkq, aoeblk_make_request);
+   if (bdi_init(&d->blkq.backing_dev_info))
+   goto err_mempool;
+   spin_lock_irqsave(&d->lock, flags);
gd->major = AOE_MAJOR;
gd->first_minor = d->sysminor * AOE_PARTITIONS;
gd->fops = _bdops;
@@ -246,6 +242,16 @@ aoeblk_gdalloc(void *vp)
 
add_disk(gd);
aoedisk_add_sysfs(d);
+   return;
+
+err_mempool:
+   mempool_destroy(d->bufpool);
+err_disk:
+   put_disk(gd);
+err:
+   spin_lock_irqsave(&d->lock, flags);
+   d->flags &= ~DEVFL_GDALLOC;
+   spin_unlock_irqrestore(&d->lock, flags);
 }
 
 void
_


It was done lackadaisically and needs checking.
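The error handling above follows the usual kernel goto-unwind pattern: each failure jumps to a label that releases only what was set up before it, in reverse order, and every error path ends by clearing DEVFL_GDALLOC. A user-space model of that control flow, with hypothetical names and malloc() standing in for the disk and mempool allocations, might look like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the aoe device state touched by aoeblk_gdalloc(). */
struct aoedev_model {
	void *gd;		/* stands in for the gendisk */
	void *bufpool;		/* stands in for the mempool */
	int gdalloc_pending;	/* stands in for DEVFL_GDALLOC */
};

static int live_objects;	/* allocations not yet released */

static void *obj_alloc(int fail)
{
	void *p = fail ? NULL : malloc(1);

	if (p)
		live_objects++;
	return p;
}

static void obj_free(void *p)
{
	if (p) {
		live_objects--;
		free(p);
	}
}

/* fail_at selects the step that fails: 0 none, 1 disk, 2 pool, 3 bdi init. */
static int gdalloc_model(struct aoedev_model *d, int fail_at)
{
	d->gd = obj_alloc(fail_at == 1);
	if (d->gd == NULL)
		goto err;
	d->bufpool = obj_alloc(fail_at == 2);
	if (d->bufpool == NULL)
		goto err_disk;
	if (fail_at == 3)		/* like bdi_init() returning nonzero */
		goto err_mempool;
	return 0;			/* success: both objects stay attached */

err_mempool:				/* undo in reverse order of setup */
	obj_free(d->bufpool);
	d->bufpool = NULL;
err_disk:
	obj_free(d->gd);
	d->gd = NULL;
err:
	d->gdalloc_pending = 0;		/* the patch clears this under d->lock */
	return -1;
}
```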



Re: [PATCH 24/28] blk_end_request: changing ide normal caller (take 3)

2007-12-03 Thread Kiyoshi Ueda

Hi Bartlomiej,

On Sat, 1 Dec 2007 23:53:05 +0100, Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]> wrote:
> On Saturday 01 December 2007, Kiyoshi Ueda wrote:
> > This patch converts "normal" parts of ide to use blk_end_request().
> > 
> > Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
> > Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
> > ---
> >  drivers/ide/ide-cd.c |6 +++---
> >  drivers/ide/ide-io.c |   17 ++---
> >  2 files changed, 9 insertions(+), 14 deletions(-)
> 
> [...]
> 
> > Index: 2.6.24-rc3-mm2/drivers/ide/ide-io.c
> > ===
> > --- 2.6.24-rc3-mm2.orig/drivers/ide/ide-io.c
> > +++ 2.6.24-rc3-mm2/drivers/ide/ide-io.c
> > @@ -78,14 +78,9 @@ static int __ide_end_request(ide_drive_t
> > ide_dma_on(drive);
> > }
> >  
> > -   if (!end_that_request_chunk(rq, uptodate, nr_bytes)) {
> > -   add_disk_randomness(rq->rq_disk);
> > -   if (dequeue) {
> > -   if (!list_empty(&rq->queuelist))
> > -   blkdev_dequeue_request(rq);
> > +   if (!__blk_end_request(rq, uptodate, nr_bytes)) {
> > +   if (dequeue)
> > HWGROUP(drive)->rq = NULL;
> > -   }
> > -   end_that_request_last(rq, uptodate);
> > ret = 0;
> > }
> 
> Hmmm, this seems to change the old behavior (the request should
> be dequeued from the queue only if 'dequeue' variable is set)
> and AFAIR some error handling code (in ide-cd?) depends on the
> old behavior so please revisit this patch.

blk_end_request() takes care of the dequeue as shown below,
so I think there is no problem.  (Please see PATCH 01)

> + /* rq->queuelist of dequeued request should be list_empty() */
> + if (!list_empty(&rq->queuelist))
> + blkdev_dequeue_request(rq);

In the case of ide-cd,
  o 'dequeue' variable is 1 only when the request is still linked
to the queue (i.e. rq->queuelist is not empty)
  o 'dequeue' variable is 0 only when the request has already been
removed from the queue (i.e. rq->queuelist is empty)
So blk_end_request() can handle it correctly.
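The argument rests on an unlinked request having an empty queuelist, so the list_empty() guard quoted above turns the dequeue into a no-op in exactly the cases where ide-cd passes dequeue = 0. A self-contained user-space model of that guard; the list helpers are simplified re-implementations and all names are hypothetical:

```c
#include <assert.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_and_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);	/* an unlinked entry points at itself */
}

struct request_model {
	struct list_head queuelist;
};

/* The guard from PATCH 01: only dequeue a request that is still linked. */
static void maybe_dequeue(struct request_model *rq)
{
	if (!list_is_empty(&rq->queuelist))
		list_del_and_init(&rq->queuelist);	/* blkdev_dequeue_request() */
}
```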


If there are any drivers which don't want to dequeue the queued request,
the code above would not work.
But as far as I have investigated, I have not seen such a requirement
in device drivers.

Do you think ide may still have a problem with 'dequeue'?

Thanks,
Kiyoshi Ueda

Re: solid state drive access and context switching

2007-12-03 Thread Alan Cox

> Given a fast low-latency solid state drive, would it ever be beneficial 
> to simply wait in the kernel for synchronous read/write calls to 
> complete?  The idea is that you could avoid at least two task context 
> switches, and if the data access can be completed at less cost than 
> those context switches it could be an overall win.

In certain situations theoretically yes, the kernel is better off
continuing to poll than switching to the idle thread. You can do this to
some extent in a driver already today - just poll rather than sleeping
but respect the reschedule hints and don't do it with irqs masked.
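A bounded poll-then-fall-back loop captures the idea. This is a user-space sketch with a hypothetical fake device: sched_yield() is only a rough analogue of honouring the reschedule hint (cond_resched() in a driver), and user space cannot mask interrupts, so that half of the advice has no equivalent here.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical device that reports completion after `busy_polls` status reads. */
struct fake_ssd {
	int busy_polls;
};

static int ssd_done(struct fake_ssd *d)
{
	if (d->busy_polls > 0) {
		d->busy_polls--;
		return 0;
	}
	return 1;
}

/*
 * Spin on the status for at most max_spins reads. Returns the number of reads
 * it took, or -1 so the caller can fall back to sleeping until the interrupt.
 */
static int poll_for_completion(struct fake_ssd *d, int max_spins)
{
	for (int i = 1; i <= max_spins; i++) {
		if (ssd_done(d))
			return i;
		sched_yield();	/* give way if something else wants the CPU */
	}
	return -1;
}
```

The spin budget is the design knob: it should be comparable to the cost of the two context switches being avoided, otherwise sleeping wins.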
 
> Has anyone played with this concept?

For things like SATA based devices they aren't that fast yet. 

Alan

solid state drive access and context switching

2007-12-03 Thread Chris Friesen



Over on comp.os.linux.development.system someone asked an interesting 
question, and I thought I'd mention it here.


Given a fast low-latency solid state drive, would it ever be beneficial 
to simply wait in the kernel for synchronous read/write calls to 
complete?  The idea is that you could avoid at least two task context 
switches, and if the data access can be completed at less cost than 
those context switches it could be an overall win.


Has anyone played with this concept?

Chris



Re: remap_file_pages() broken in 2.6.23?

2007-12-03 Thread Nick Piggin

On Mon, Dec 03, 2007 at 06:01:40PM +0530, Supriya Kannery wrote:
> Nick Piggin wrote:
> >On Thu, Nov 29, 2007 at 02:45:23PM -0500, Chuck Ebbert wrote:
> >  
> >>Original report: https://bugzilla.redhat.com/show_bug.cgi?id=404201
> >>
> >>The test case below, taken from the LTP test code, prints -1 (as
> >>expected) on 2.6.22 and 0 on 2.6.23. It tries to remap an out-of-range
> >>page. Proposed patch follows the program. Bug was apparently caused by
> >>commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7.
> >>
> >
> >Ah, that's not such good behaviour anyway. mmap is allowed to map
> >outside the file offset, so you're telling me that remap_file_pages
> >just magically should not be allowed to remap these...?
> >
> >  
> Validation check for pgoff was there in populate() in earlier 
> kernels. When populate() got removed and populate_range() was added, 
> during the specified commit, validation for pgoff also got removed. This 
> change in semantics would break existing apps that expect an error from 
> remap_file_pages when a large value for pgoff is given. Though the 
> change is error handling related, it breaks ABI from previous kernel 
> versions.

But only Oracle uses it AFAIK, and they don't require this behaviour.

 
> For validation, we check whether the pgoff + size exceeds the file size, 
> all in page units. And while calculating file size in page units, one 
> additional page unit is taken into account to get the exact number of 
> pages that contain the file size in bytes.
> f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
>          <--- file size in bytes --->   <-- helps in rounding to next page unit -->
> 
> mmap() maps the minimum number of pages that can contain a 
> file. So the offset cannot be a large value compared to the file size. mmap() is 
> also supposed to return EINVAL when the offset is a large/invalid value, 
> as the man page mandates.

I don't think it is required that mmap must fail if it maps past i_size.
I don't think Linux fails in this case.
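The rounding Supriya describes is the usual round-up division by the page size. One detail worth keeping in mind when writing the check out: in C, `+` binds more tightly than `>>`, so `pgoff + size >> PAGE_CACHE_SHIFT` in the patch quoted further down parses as `(pgoff + size) >> PAGE_CACHE_SHIFT`. A sketch with explicit parentheses, assuming 4 KiB pages and that pgoff is in pages while size is in bytes (as in sys_remap_file_pages()); the helper names are hypothetical:

```c
#include <assert.h>

#define PG_SHIFT 12			/* assumes 4 KiB pages */
#define PG_SIZE (1UL << PG_SHIFT)

/* Pages needed to hold i_size bytes, rounding a partial last page up. */
static unsigned long file_pages(unsigned long i_size)
{
	return (i_size + PG_SIZE - 1) >> PG_SHIFT;
}

/* pgoff is already in pages and size is in bytes, so only size is shifted. */
static int range_fits_file(unsigned long pgoff, unsigned long size,
			   unsigned long i_size)
{
	return pgoff + (size >> PG_SHIFT) <= file_pages(i_size);
}
```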


> >  
> >>Patch:
> >>
> >>Signed-off-by: Supriya Kannery <[EMAIL PROTECTED]>
> >>
> >>--- linux-2.6.23/mm/fremap.c.orig   2007-11-22 00:56:09.0 -0600
> >>+++ linux-2.6.23/mm/fremap.c2007-11-26 03:08:55.0 -0600
> >>@@ -124,6 +124,7 @@ asmlinkage long sys_remap_file_pages(uns
> >>struct vm_area_struct *vma;
> >>int err = -EINVAL;
> >>int has_write_lock = 0;
> >>+   unsigned long f_size = 0;
> >> 
> >>if (__prot)
> >>return err;
> >>@@ -181,6 +182,14 @@ asmlinkage long sys_remap_file_pages(uns
> >>goto retry;
> >>}
> >>mapping = vma->vm_file->f_mapping;
> >>+
> >>+   f_size = i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1;
> >>+   f_size = f_size >> PAGE_CACHE_SHIFT;
> >>+   if ((pgoff + size >> PAGE_CACHE_SHIFT) > f_size) {
> >>+   err = -EINVAL;
> >>+   goto out;
> >>+   }
> >>+
> >>/*
> >> * page_mkclean doesn't work on nonlinear vmas, so if
> >> * dirty pages need to be accounted, emulate with linear
> >>
> >
> >
> >I don't think there is anything preventing truncate races here.
> >Theoretically we could do it by taking i_mutex around here, but then a
> >subsequent truncate is just going to be able to cause the mapping to be
> >out of bounds anyway.
> >
> >  
> i_size_read() takes care of syncing with writes/truncations
> in SMP/preemptible kernels. For SMP, it specifically takes care to get
> the value again if any changes happen to the source.

And then right afterwards, the file gets truncated, and you have remapped
past i_size. So what's the point of preventing it? We have SIGBUS for
that.
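Both behaviours mentioned here, that mapping past i_size does not fail and that touching pages beyond EOF (including after a later truncate) is reported with SIGBUS, can be checked from user space. A sketch for Linux, assuming /tmp is writable; the helper and struct names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct eof_demo {
	int mapped;			/* 2-page mmap over a 1-page file worked */
	int past_eof_sigbus;		/* touching the second page raised SIGBUS */
	int after_truncate_sigbus;	/* page 1 faulted after truncate to 0 */
};

static sigjmp_buf bus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(bus_env, 1);
}

/* Write one byte at addr; return 1 if the access raised SIGBUS. */
static int write_faults(volatile char *addr)
{
	struct sigaction sa, old;
	int faulted = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);
	if (sigsetjmp(bus_env, 1) == 0)
		*addr = 'x';
	else
		faulted = 1;
	sigaction(SIGBUS, &old, NULL);
	return faulted;
}

static struct eof_demo demo_mmap_past_eof(void)
{
	struct eof_demo r = { 0, 0, 0 };
	long page = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/eof-demo-XXXXXX";
	int fd = mkstemp(path);
	volatile char *p;

	if (fd < 0)
		return r;
	unlink(path);
	if (ftruncate(fd, page) == 0) {
		p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p != MAP_FAILED) {
			r.mapped = 1;
			r.past_eof_sigbus = write_faults(p + page);
			/* page 1 is inside the file until the truncate */
			if (!write_faults(p) && ftruncate(fd, 0) == 0)
				r.after_truncate_sigbus = write_faults(p);
			munmap((void *)p, 2 * page);
		}
	}
	close(fd);
	return r;
}
```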

Thanks,
Nick

Re: [PATCH 09/28] blk_end_request: changing ps3disk (take 3)

2007-12-03 Thread Kiyoshi Ueda

Hi Geert,

On Sun, 2 Dec 2007 10:34:56 +0100 (CET), Geert Uytterhoeven <[EMAIL PROTECTED]> wrote:
> On Fri, 30 Nov 2007, Kiyoshi Ueda wrote:
> > This patch converts ps3disk to use blk_end_request().
>  ^^^
> Patch subject and description are inconsistent with actual change.
> 
> > Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
> > Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
> > ---
> >  drivers/block/ps3disk.c |6 +-
> >  1 files changed, 1 insertion(+), 5 deletions(-)
> > 
> > Index: 2.6.24-rc3-mm2/drivers/block/ps3disk.c
> > ===
> > --- 2.6.24-rc3-mm2.orig/drivers/block/ps3disk.c
> > +++ 2.6.24-rc3-mm2/drivers/block/ps3disk.c
> > @@ -280,11 +280,7 @@ static irqreturn_t ps3disk_interrupt(int
> > }
> >  
> > spin_lock(>lock);
> > -   if (!end_that_request_first(req, uptodate, num_sectors)) {
> > -   add_disk_randomness(req->rq_disk);
> > -   blkdev_dequeue_request(req);
> > -   end_that_request_last(req, uptodate);
> > -   }
> > +   __blk_end_request(req, uptodate, num_sectors << 9);
>   ^

Thank you for the comment.
The description meant the blk_end_request family, not the actual function
blk_end_request().  But as you pointed out, it is misleading.
I'll change the description of all related patches.

Thanks,
Kiyoshi Ueda

Re: pnpacpi : exceeded the max number of IO resources

2007-12-03 Thread Chris Holvenstot


On Mon, 2007-12-03 at 18:02 +0100, Rene Herman wrote:
> On 30-11-07 23:22, Rene Herman wrote:
> 
> > On 30-11-07 14:14, Chris Holvenstot wrote:
> > 
> >> For what it is worth I too have seen this problem this morning and it
> >> DOES appear to be new (in contrast to a previous comment)
> >>
> >> The message:  pnpacpi: exceeded the max number of mem resources: 12
> >>
> >> is displayed each time the system is booted with the 2.6.24-rc3-git5
> >> kernel but is NOT displayed when booting 2.6.24-rc3-git4
> >>
> >> I have made no changes in my config file between these two kernels other
> >> than to accept any new defaults when running make oldconfig.
> >>
> >> If you had already narrowed it down to a change between git4 and git5 I
> >> apologize for wasting your time.  Have to run to work now.
> > 
> > Thanks, and re-added the proper CCs. Sigh...
> > 
> > Well, yes, the warning is actually new as well. Previously your kernel 
> > just silently ignored 8 more mem resources than it does now it seems.
> > 
> > Given that people are hitting these limits, it might make sense to just 
> > do away with the warning for 2.6.24 again while waiting for the dynamic 
> > code?
> 
> Ping. Should these warnings be reverted for 2.6.24?
> 
> Rene.
> 

Rene - 

Thanks for the follow-up. From my perspective, now that I know that the
condition that caused the warning messages has been with us for some
time, and that the messages were previously suppressed, it really does
not make much difference to me whether the warnings are reverted or not.

So I guess that I vote for doing whatever is best for the developers.
After all, they are the ones doing the heavy lifting. If the warning
message is able to provide some insight into the problem, so much the
better.

At this point my goal is just to learn enough to be an asset as a tester
instead of a net loss (defined as someone whose efforts cost the team
more man-hours than their contribution is worth).

Chris


Re: [PATCH 26/28] blk_end_request: changing ide-cd (take 3)

2007-12-03 Thread Kiyoshi Ueda

Hi Bartlomiej,

On Sat, 1 Dec 2007 23:42:51 +0100, Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]> wrote:
> On Saturday 01 December 2007, Kiyoshi Ueda wrote:
> > This patch converts ide-cd (cdrom_newpc_intr()) to use blk_end_request().
> > 
> > ide-cd (cdrom_newpc_intr()) has some tricky behaviors below which
> > need to use blk_end_request_callback().
> > Needs to:
> >   1. call post_transform_command() to modify request contents
> 
> Seems like post_transform_command() call can be removed (patch below).
> 
> >   2. wait completing request until DRQ_STAT is cleared
> 
> Would be great if somebody converted cdrom_newpc_intr() to use scatterlists
> also for PIO transfers (ide_pio_sector() in ide-taskfile.c should serve
> as a good starting base to see how to do PIO transfers using scatterlists)
> so we could get rid of partial request completions in cdrom_newpc_intr()
> and just fully complete request when the transfer is done.  Shouldn't be
> difficult but I guess that we can live with blk_end_request_callback() for
> the time being...
> 
> > after end_that_request_first() and before end_that_request_last().
> > 
> > As for the second one, ide-cd will wait for the interrupt from device.
> > So blk_end_request_callback() has to return without completing request
> > even if no leftover in the request.
> > ide-cd uses a dummy callback function, which just returns value '1',
> > to tell blk_end_request_callback() about that.
> > 
> > Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
> > Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
> 
> [PATCH] ide-cd: remove dead post_transform_command()
> 
> post_transform_command() call in cdrom_newpc_intr() has no effect because
> it is done after the request has already been fully completed (rq->bio and
> rq->data are always NULL).  It was verified to be true regardless of whether
> INQUIRY command is using DMA or PIO to transfer data (by using modified
> Tejun Heo's test-shortsg.c utility and adding a few printk()-s to ide-cd).
> 
> This was uncovered thanks to the "blk_end_request: full I/O completion
> handler (take 3)" patch series from Kiyoshi Ueda.
> 
> Cc: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Cc: Kiyoshi Ueda <[EMAIL PROTECTED]>
> Cc: Jun'ichi Nomura <[EMAIL PROTECTED]>
> Cc: Tejun Heo <[EMAIL PROTECTED]>
> Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
> ---
> Kiyoshi: please rebase your patch on top of this one (I'll send
> it to Linus in the next IDE update), should make your patch a bit
> simpler.
> 
> Tejun: you had really good timing with posting test-shortsg.c
> (it saved me some time coding user-space SG_IO tester), thanks!
> 
>  drivers/ide/ide-cd.c |   28 
>  1 file changed, 28 deletions(-)
> 
> Index: b/drivers/ide/ide-cd.c
> ===
> --- a/drivers/ide/ide-cd.c
> +++ b/drivers/ide/ide-cd.c
> @@ -1650,31 +1650,6 @@ static int cdrom_write_check_ireason(ide
>   return 1;
>  }
>  
> -static void post_transform_command(struct request *req)
> -{
> - u8 *c = req->cmd;
> - char *ibuf;
> -
> - if (!blk_pc_request(req))
> - return;
> -
> - if (req->bio)
> - ibuf = bio_data(req->bio);
> - else
> - ibuf = req->data;
> -
> - if (!ibuf)
> - return;
> -
> - /*
> -  * set ansi-revision and response data as atapi
> -  */
> - if (c[0] == GPCMD_INQUIRY) {
> - ibuf[2] |= 2;
> - ibuf[3] = (ibuf[3] & 0xf0) | 2;
> - }
> -}
> -
>  typedef void (xfer_func_t)(ide_drive_t *, void *, u32);
>  
>  /*
> @@ -1810,9 +1785,6 @@ static ide_startstop_t cdrom_newpc_intr(
>   return ide_started;
>  
>  end_request:
> - if (!rq->data_len)
> - post_transform_command(rq);
> -
>   spin_lock_irqsave(_lock, flags);
>   blkdev_dequeue_request(rq);
>   end_that_request_last(rq, 1);

Thank you for the comments.
I rebased my patch on top of 2.6.24-rc3-mm2 + the patch to remove
post_transform_command().

As a result, one callback function for DMA mode has been removed.
What do you think about the patch below?
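The deferral trick described for ide-cd (a dummy callback that returns 1 so that completion stops short of the final step) can be modeled as follows. The function and type names are hypothetical and this is not the signature of blk_end_request_callback(); it only illustrates the control flow the description relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a request completed in byte-sized chunks. */
struct req_model {
	unsigned int remaining;	/* bytes not yet completed */
	int finished;		/* set by the final completion step */
};

/*
 * Complete nr_bytes. Returns 1 if the request is still pending, 0 once it has
 * been fully finished. If nothing is left over but cb returns nonzero, the
 * final step is skipped so the driver can finish the request later.
 */
static int end_request_with_callback(struct req_model *rq, unsigned int nr_bytes,
				     int (*cb)(struct req_model *))
{
	rq->remaining -= nr_bytes < rq->remaining ? nr_bytes : rq->remaining;
	if (rq->remaining)
		return 1;
	if (cb && cb(rq))
		return 1;
	rq->finished = 1;	/* like end_that_request_last() */
	return 0;
}

/* Dummy callback in the style described for ide-cd: always defer. */
static int defer_completion(struct req_model *rq)
{
	(void)rq;
	return 1;
}
```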



Subject: [PATCH 26/28] blk_end_request: changing ide-cd (take 3)


This patch converts ide-cd (cdrom_newpc_intr()) to use blk_end_request
interfaces.

In PIO mode, ide-cd (cdrom_newpc_intr()) needs to defer
end_that_request_last() until the device clears DRQ_STAT and raises
an interrupt after end_that_request_first().
So blk_end_request() has to return without completing the request
even if nothing is left over in it.

ide-cd uses blk_end_request_callback() and a dummy callback function,
which just returns value '1', to tell blk_end_request_callback()
about that.

Signed-off-by: Kiyoshi Ueda <[EMAIL PROTECTED]>
Signed-off-by: Jun'ichi Nomura <[EMAIL PROTECTED]>
---
 drivers/ide/ide-cd.c |   49 +++--
 1 files changed, 35 insertions(+), 14 deletions(-)

Index: 2.6.24-rc3-mm2/drivers/ide/ide-cd.c

Re: [Bugme-new] [Bug 9482] New: kernel GPF in 2.6.24 (g09f345da)

2007-12-03 Thread Andrew Morton

On Mon, 3 Dec 2007 16:38:37 -0500
"Ed L. Cashin" <[EMAIL PROTECTED]> wrote:

> >   --- lx/lib/percpu_counter.c.20071130 2007-12-03 15:43:19.0 -0500
> >   +++ lx/lib/percpu_counter.c 2007-12-03 15:47:38.0 -0500
> >   @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_
> >   s64 count;
> >   s32 *pcount;
> >   int cpu = get_cpu();
> >   +   u64 badval = 0xULL;
> >
> >   +   BUG_ON(!cpu_possible(cpu));
> >   pcount = per_cpu_ptr(fbc->counters, cpu);
> >   count = *pcount + amount;
> >   if (count >= batch || count <= -batch) {
> 
> It appears that the fbc->counters pointer is NULL.

Does this fix?

--- a/drivers/block/aoe/aoeblk.c~a
+++ a/drivers/block/aoe/aoeblk.c
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -228,6 +229,7 @@ aoeblk_gdalloc(void *vp)
 
spin_lock_irqsave(&d->lock, flags);
blk_queue_make_request(&d->blkq, aoeblk_make_request);
+   bdi_init(&d->blkq.backing_dev_info);
gd->major = AOE_MAJOR;
gd->first_minor = d->sysminor * AOE_PARTITIONS;
gd->fops = _bdops;
_
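The quoted __percpu_counter_add() adds to a per-CPU delta and folds it into the shared count once it reaches the batch size, so even the first update goes through fbc->counters; if that array was never set up, the dereference is where the fault shows up. A simplified, single-threaded model with hypothetical names and a fixed CPU count (no locking or preemption control):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4		/* assumed CPU count for the sketch */

/* Simplified model of struct percpu_counter. */
struct pcounter_model {
	int64_t count;		/* shared total, updated in batches */
	int32_t *counters;	/* per-CPU deltas; NULL if never initialised */
};

/* Mirrors the quoted add logic, minus the spinlock and get_cpu()/put_cpu(). */
static void pcounter_add(struct pcounter_model *fbc, int cpu, int64_t amount,
			 int32_t batch)
{
	int32_t *pcount = &fbc->counters[cpu];
	int64_t count = *pcount + amount;	/* faults if counters is NULL */

	if (count >= batch || count <= -batch) {
		fbc->count += count;	/* fold the local delta into the total */
		*pcount = 0;
	} else {
		*pcount = (int32_t)count;
	}
}

/* Exact value: shared total plus every CPU's pending delta. */
static int64_t pcounter_sum(const struct pcounter_model *fbc)
{
	int64_t sum = fbc->count;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += fbc->counters[i];
	return sum;
}
```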




Re: Out of tree module using LSM

2007-12-03 Thread Bodo Eggert

Jon Masters <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-11-29 at 11:11 -0800, Ray Lee wrote:
>> On Nov 29, 2007 10:56 AM, Jon Masters <[EMAIL PROTECTED]> wrote:
>> > On Thu, 2007-11-29 at 10:40 -0800, Ray Lee wrote:
>> > > On Nov 29, 2007 9:36 AM, Alan Cox <[EMAIL PROTECTED]> wrote:

>> > > > > closed. But more importantly further access to it can be blocked
>> > > > > until appropriate actions are taken which also applies with your
>> > > > > example, no? Is
>> > > >
>> > > > That bit is hard- very hard.

>> To lift Alan's example, a naive first implementation
>> would be to create a suffix tree of all of ESR's works, then scan each
>> page on fault to see if there are any partial matches in the tree.
> 
> Ah, but I could write a sequence of pages that on their own looked
> garbage, but in reality, when executed would print out a copy of the
> Jargon File in all its glory. And if you still think you could look for
> patterns, how about executable code that self-modifies in random ways
> but when executed as a whole actually has the functionality of fetchmail
> embedded within it? How would you guard against that?

You can't scan all possible code for malware:
Take a random piece of code, possibly halting. Replace all halting conditions
using a piece of malware. Scan it. If it were possible to detect the malware
without false positives, you'd have solved the halting problem.

In practice, this does not hinder virus scanners from preventing most damage.
Therefore I think it's OK to have one.

If I had to design a virus scanner interface, I'd e.g. create a library*
providing an {open|mmap}_and_scan() function that would give me a clean
copy/really-private mapping of a scanned file, and a scan_{blob,file}()
function that would scan a block of memory/a file. Then, it's up to the
application to ensure that it uses that library. As a result, you could
e.g. run "less eicar.sh", but you could not run "bash eicar.sh"**, and an
application receiving a strangely encoded piece of malware into its
memory has a chance of avoiding an infection without writing it to a file.
Maybe gpg < eicar.gpg.sh|sh will unintentionally work, but I don't think
scanning pipes would be easy anyway. OTOH, maybe the library would make
it feasible at all, provided the malicious code is not located way before
the signature.

Of course I'd need to do something about binaries. At first glance, this
does not seem too bad, since there is a way to run ld*.so. I'd just use it
to enforce a preloader for static binaries, too. (I'm glad I can leave the
implementation details to somebody else.-)

*  Without having a virus scanner installed, this library will just NOOP
   by default.

** Bonus: I can unzip open_office_file; rm macros; zip open_office_file.
   OTOH, the scanner should provide a cleaner for those simple cases.
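The "NOOP by default" part of such a library interface could look like the sketch below. Everything here is hypothetical: the hook, the function names, and the made-up signature string, which stands in for a real scanner engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook a scanner package would install; NULL means none. */
typedef int (*scan_hook)(const unsigned char *buf, size_t len);
static scan_hook installed_scanner;

/* scan_blob(): 1 if the buffer is flagged, 0 if clean or no scanner installed. */
static int scan_blob(const void *buf, size_t len)
{
	if (installed_scanner == NULL)
		return 0;	/* library is a no-op without a scanner */
	return installed_scanner(buf, len);
}

/* Toy signature matcher; the marker is invented for this sketch. */
static int toy_scanner(const unsigned char *buf, size_t len)
{
	static const char sig[] = "TOY-SIGNATURE";
	size_t n = sizeof(sig) - 1;

	for (size_t i = 0; n <= len && i <= len - n; i++)
		if (memcmp(buf + i, sig, n) == 0)
			return 1;
	return 0;
}
```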


Re: [PATCH 0/4, v3] Physical PCI slot objects

2007-12-03 Thread Alex Chiang

Hi Kenji-san,

* Kenji Kaneshige <[EMAIL PROTECTED]>:
> Hi Alex-san,
>
>>> On my system, hotplug slots themselves can be added, removed
>>> and replaced with the other type of I/O box. 

Are you talking about some sort of I/O cabinet/chassis that you
can attach to the actual computer? Can the I/O expander unit be
hotplugged? Or do you need to power your machine down to attach
it?

If you can hotplug it, I'm guessing that is why your firmware
presents SxFy objects in the namespace with "weird" _SUN values,
and it's why you have to check _STA to see if the slots are valid
or not. That means the value returned by _SUN will change too,
right? What will it turn into?

Is that right? Or am I completely wrong? :)

>>> The ACPI firmware tells OS the presence of those slots using
>>> _STA method (That is, it doesn't use 'LoadTable()' AML
>>> operator). On the other hand, current pci_slot driver doesn't
>>> check _STA.  As a result, pci_slot driver tried to register
>>> the invalid (non-existing) slots. The ACPI firmware of my
>>> system returns '1023' if the invalid slot's _SUN is
>>> evaluated. This is the cause of Call Traces mentioned above.
>>> To fix this problem, pci_slot driver needs to check _STA when
>>> scanning ACPI Namespace.
>>
>> Now this is very curious. The relevant line in pci_slot is:
>> check_slot()
>>  status = acpi_evaluate_integer(handle, "_SUN", NULL, sun);
>>  if (ACPI_FAILURE(status))
>>  return -1;
>> Why does your firmware return the error information inside sun,
>> instead of returning an error in status? That doesn't seem right
>> to me...
>
> Because ACPI spec doesn't provide any way for firmware (AML)
> to return an error.

You are right -- I got confused about the interpreter returning
AE_NOT_FOUND vs the actual firmware returning an error value.
Thank you for this clarification.

> In addition, I think we should not trust the _SUN value of
> non-existing device because the ACPI spec says in "6.5.1 _INI
> (Init)" that _INI method is run before _ADR, _CID, _HID, _SUN, and
> _UID are run. It means _SUN could be initialized in _INI method
> implicitly. And it also says that "If the _STA method indicates
> that the device is not present, OSPM will not run the _INI and will
> not examine the children of the device for _INI methods.". After all,
> _SUN for non-existing device is not reliable because it might not be
> initialized by _INI method.

This is true, but HP platforms provide _INI at the root
device/host bridge level, not on SxFy objects, so it doesn't seem
that we would need to call _STA before calling _SUN for SxFy.

Does your firmware provide _INI on SxFy objects?

>>> I'm sorry for reporting this so late. I'm attaching the patch
>>> to fix the problem. This is against 2.6.24-rc3 with your
>>> patches applied. Could you try it?
>> Applying this patch causes me to only detect populated slots in
>> my system, which isn't what I want -- otherwise, I could have
>> just enumerated the PCI bus and found the devices that way. :)
>> Maybe on your machine, checking existence of _STA might do the
>> right thing, but I don't think we should actually be looking at
>> any of the actual bits returned. If we check
>> ACPI_STA_DEVICE_PRESENT, then we will not detect
>> empty slots on my system. Can you try this patch to see if at
>> least the first call to acpi_evaluate_integer helps? If that
>> doesn't help, maybe the second block will help you, but it breaks
>> my machine...
>
> Maybe the result is as you guess.
> The first block doesn't help me (with the first block, all of the
> slot disappeared. Please see the bottom of this mail for details).
> The second block helps me.
>
> There seems a difference of the interpretation about _STA for PCI
> hotplug slot between your firmware and my firmware. The difference
> is:
>
>  - Your firmware provides the _STA method to represent the presence
>of PCI adapter card on the slot.
>
>  - My firmware provides the _STA method to represent the presence
>of the slot.

Yes, that sounds right...
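The disagreement comes down to how the present bit of _STA (bit 0, which ACPICA names ACPI_STA_DEVICE_PRESENT) is read for an SxFy object. A hypothetical policy function makes the two readings explicit; an object without _STA is treated as present, as the ACPI spec specifies. This is an illustration of the two interpretations, not the pci_slot driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define STA_PRESENT 0x01	/* _STA bit 0: device present */

/* What a clear present bit means on a given firmware (hypothetical knob). */
enum sta_meaning {
	STA_DESCRIBES_SLOT,	/* Kenji's firmware: the slot itself may be absent */
	STA_DESCRIBES_CARD,	/* Alex's firmware: only the adapter may be absent */
};

/* Should an SxFy object be registered as a physical slot? */
static int should_register_slot(int has_sta, uint64_t sta, enum sta_meaning m)
{
	if (!has_sta)
		return 1;		/* no _STA: the spec says assume present */
	if (m == STA_DESCRIBES_CARD)
		return 1;		/* empty slots still exist */
	return (sta & STA_PRESENT) != 0;
}
```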

> Providing _STA method to represent the presence of PCI adapter card
> on the slot (as your firmware does) doesn't seem right to me because
> of the following reasons.
>
>  - ACPI spec says "After calling _EJ0, OSPM verifies the device no
>longer exists to determine if the eject succeeded. For _HID devices,
>OSPM evaluates the _STA method. For _ADR devices, OSPM checks with
>the bus driver for that device." in "6.3.3 _EJx (Eject)". Because
>PCI adapter card on the slot is _ADR device, the presence of the
>card must be checked with bus driver, not _STA.
>
>  - ACPI spec provides the example AML code which uses _STA to
>represent Docking Station (see "6.3.2 _EJD (Ejection Dependent
>Device)"). The usage of this is the same as my firmware.
>
> What do you think about that?

Our firmware teams seem to think that _STA should give the status
of the card for hotplug support and general functional state.
They claim that it

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Ingo Molnar


* Mark Lord <[EMAIL PROTECTED]> wrote:

>> heh, thanks :) For which workload does it make the biggest difference 
>> for you? (and compared to what other scheduler you used before? 
>> 2.6.22?)
> ..
>
> Heh.. I'm just a very unsophisticated desktop user, and I like it when 
> Thunderbird and Firefox are unaffected by the "make -j3" kernel builds 
> that are often running in another window.  BIG difference there.
>
> And on the cool side, the Swarm game (swarm.swf) is a great example of 
> something that used to get jerky really fast whenever anything else 
> was running, and now it really doesn't seem to be affected by 
> anything. (I don't really play computer games, but this one has a
> very retro feel..).

nice! Do you feel any difference between 2.6.23 and 2.6.24-rc for these 
workloads? (if you've tried .24 already)

Ingo

Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

2007-12-03 Thread Rafael J. Wysocki

On Monday, 3 of December 2007, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> 
> > > This feature will save one full reporter-developer round-trip during 
> > > investigation of a significant number of bug reports.
> > > 
> > > It might be more practical if it were to dump the traces for _all_ 
> > > D-state processes when it fires - basically an auto-triggered 
> > > sysrq-W.
> > 
> > Er, it won't play well if that happens when tasks are frozen for 
> > suspend.
> 
> right now any suspend attempt times out after 20 seconds:
> 
>   $ grep TIMEOUT kernel/power/process.c
>   #define TIMEOUT (20 * HZ)
>   end_time = jiffies + TIMEOUT;

This is the timeout for freezing tasks, but if the freezing succeeds, they
can stay in TASK_UNINTERRUPTIBLE for quite some more time, especially during
a hibernation (the tasks stay frozen until we power off the system after saving
the image).

> which should be well before the 120 seconds timeout that the detector 
> uses. But indeed you are right in that the refrigerator() works via 
> TASK_UNINTERRUPTIBLE too. I've updated the patch to exclude PF_FROZEN - 
> attached below. That should solve this particular issue, even if the 
> timeout is increased to above 20 secs, right?

Sure.
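One plausible shape for the per-task check being discussed: report a task only if it is in TASK_UNINTERRUPTIBLE, is not frozen, and has not been scheduled since the previous scan (one timeout period earlier). The field and flag names below are hypothetical stand-ins, and this is a sketch, not the code from the patch.

```c
#include <assert.h>

#define TASK_FROZEN_FLAG 0x1	/* hypothetical stand-in for PF_FROZEN */

enum task_state_model { TASK_RUNNING_M, TASK_UNINTERRUPTIBLE_M };

struct task_model {
	enum task_state_model state;
	unsigned int flags;
	unsigned long switch_count;		/* context switches so far */
	unsigned long last_switch_count;	/* value seen at the previous scan */
};

/* Called once per timeout period; returns 1 if the task should be reported. */
static int check_hung_task(struct task_model *t)
{
	if (t->state != TASK_UNINTERRUPTIBLE_M)
		return 0;
	if (t->flags & TASK_FROZEN_FLAG)
		return 0;		/* parked in the refrigerator on purpose */
	if (t->switch_count != t->last_switch_count) {
		t->last_switch_count = t->switch_count;
		return 0;		/* it ran since the last scan */
	}
	return 1;
}
```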

Thanks,
Rafael

Re: BUG: XFS/firefox-bin (2.6.23.8)

2007-12-03 Thread Rafael J. Wysocki

On Monday, 3 of December 2007, David Chinner wrote:
> On Sun, Dec 02, 2007 at 09:06:19AM -0800, Avuton Olrich wrote:
> > Adding xfs to CC
> > 
> > On Dec 2, 2007 9:02 AM, Avuton Olrich <[EMAIL PROTECTED]> wrote:
> > > Hello,
> > >
> > > 2.6.23.8 just crashed here, it had been up 8 days and suspended to
> > > disk many times in those 8 days. The process that crashed it was
> > > firefox-3.0b1. It crashed and could not be killed (please excuse me, I
> > > failed to get ps auxf output).
> > >
> > > All of the following information was after reboot, except, of course,
> > > for the BUG.
> .
> 
> > > [ 3158.936251] BUG: unable to handle kernel NULL pointer dereference
> > > at virtual address 
> > > [ 3158.936260]  printing eip:
> > > [ 3158.936261] c013405b
> > > [ 3158.936262] *pde = 
> > > [ 3158.936266] Oops: 0002 [#1]
> > > [ 3158.936276] PREEMPT
> > > [ 3158.936282] Modules linked in: cdc_acm netconsole snd_pcm_oss 
> > > snd_mixer_oss
> > > [ 3158.936297] CPU:0
> > > [ 3158.936298] EIP:0060:[]Not tainted VLI
> > > [ 3158.936299] EFLAGS: 00210246   (2.6.23.8 #4)
> > > [ 3158.936312] EIP is at current_kernel_time+0x2b/0x40
> 
> I don't think this is XFS related - there's something really screwed up
> if you've crashed in current_kernel_time().
> 
> We've got suspend/resume involved, so who knows what might have gone
> wrong.
> 
> Rafael, any ideas?

Well, I haven't seen symptoms like these yet.

I'd try to unset NO_HZ (if set) and HIGH_RES_TIMERS (if set), but it doesn't
look like a readily reproducible bug.

> > > [ 3158.936316] eax:    ebx: 24a60770   ecx:    edx: 
> > > 0f99c038
> > > [ 3158.936320] esi: 00402000   edi: 0007   ebp: 81a4   esp: 
> > > ee429ce0
> > > [ 3158.936323] ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
> > > [ 3158.936327] Process firefox-bin (pid: 11154, ti=ee428000
> > > task=eec00540 task.ti=ee428000)
> > > [ 3158.936331] Stack: f177ac00 f21128c0 0007 81a4 c026495a
> > > 8000 efeb4180 
> > > [ 3158.936348]c023ac60 098c745c  0001 0004
> > > ee429d30  f7c129e0
> > > [ 3158.936366]f21128c0  098c745c  f177ac00
> > > f7c129e0  ee429e18
> > > [ 3158.936451] Call Trace:
> > > [ 3158.936455]  [] xfs_ichgtime+0x1a/0xa0
> > > [ 3158.936465]  [] xfs_ialloc+0x230/0x620
> > > [ 3158.936473]  [] xfs_dir_ialloc+0x85/0x2d0
> > > [ 3158.936483]  [] xfs_trans_reserve+0x82/0x200
> > > [ 3158.936489]  [] xfs_create+0x386/0x690
> > > [ 3158.936494]  [] dput+0x20/0x150
> > > [ 3158.936501]  [] futex_wait+0x266/0x360
> > > [ 3158.936507]  [] xfs_create+0x0/0x690
> > > [ 3158.936511]  [] xfs_vn_mknod+0x15b/0x200
> > > [ 3158.936516]  [] xfs_vn_create+0x0/0x10
> > > [ 3158.936521]  [] vfs_create+0x93/0xd0
> > > [ 3158.936525]  [] open_namei+0x53e/0x650
> > > [ 3158.936530]  [] do_wp_page+0x312/0x4a0
> > > [ 3158.936537]  [] do_filp_open+0x2e/0x60
> > > [ 3158.936542]  [] get_unused_fd_flags+0x4e/0xe0
> > > [ 3158.936546]  [] do_sys_open+0x4c/0xe0
> > > [ 3158.936612]  [] sys_open+0x1c/0x20
> > > [ 3158.936616]  [] sysenter_past_esp+0x5f/0x85
> > > [ 3158.936622]  ===
> > > [ 3158.936624] Code: 55 8b 0d 80 a4 56 c0 57 56 53 eb 06 8d 74 26 00
> > > 89 d1 8b 1d b4 a4 56 c0 8b 35 b0 a4 56 c0 8b 15 80 a4 56 c0 89 c8 83
> > > e1 01 31 d0 <09> c8 75 e1 89 da 89 f0 5b 5e 5f 5d c3 90 8d b4 26 00 00
> > > 00 00

Re: [PATCH] NET: mac80211 - fix missed mutex unlock

2007-12-03 Thread Michael Wu

On Monday 03 December 2007 05:50:38 Cyrill Gorcunov wrote:
> This patch does fix missed mutex unlock. Please see it attached.
> (Can't send it as a plain text patch - have only IE based mail access)
>
> Cyrill
Acked-by: Michael Wu <[EMAIL PROTECTED]>

Thanks,
-Michael Wu



Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Mark Lord


Ingo Molnar wrote:
> * Mark Lord <[EMAIL PROTECTED]> wrote:
> 
>> Ack.  And what of the suggestion to try to ensure that a yielding task 
>> simply not end up as the very next one chosen to run?  Maybe by 
>> swapping it with another (adjacent?) task in the tree if it comes out 
>> on top again?
> 
> we did that too for quite some time in CFS - it was found to be "not 
> aggressive enough" by some folks and "too aggressive" by others. Then when 
> people started bickering over this we added these two simple corner 
> cases - switchable via a flag. (minimum aggression and maximum aggression)
> 
>> (I really don't know the proper terminology to use here, but hopefully 
>> Ingo can translate that).
> 
> the terminology you used is perfectly fine.
> 
>> Thanks Ingo -- I *really* like this scheduler!
> 
> heh, thanks :) For which workload does it make the biggest difference 
> for you? (and compared to what other scheduler you used before? 2.6.22?)
..

Heh.. I'm just a very unsophisticated desktop user, and I like it when
Thunderbird and Firefox are unaffected by the "make -j3" kernel builds
that are often running in another window.  BIG difference there.

And on the cool side, the Swarm game (swarm.swf) is a great example of
something that used to get jerky really fast whenever anything else was
running, and now it really doesn't seem to be affected by anything.
(I don't really play computer games, but this one has a very retro feel..).

Cheers

Re: suspend-related lockdep warning

2007-12-03 Thread Rafael J. Wysocki

On Monday, 3 of December 2007, Andrew Morton wrote:
> On Sun, 2 Dec 2007 21:33:23 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Saturday, 1 of December 2007, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > 2.6.24-rc3-mm2 (which will be released if it boots on two more machines and
> > > > if I stay awake) will say this during suspend-to-RAM on the Vaio:
> > > > 
> > > > [   91.876445] Syncing filesystems ... done.
> > > > [   92.382595] Freezing user space processes ... WARNING: at kernel/lockdep.c:2662 check_flags()
> > > > [   92.384000] Pid: 1925, comm: dbus-daemon Not tainted 2.6.24-rc3-mm2 #32
> > > > [   92.384177]  [] show_trace_log_lvl+0x12/0x25
> > > > [   92.384335]  [] show_trace+0xd/0x10
> > > > [   92.384469]  [] dump_stack+0x55/0x5d
> > > > [   92.384605]  [] check_flags+0x7f/0x11a
> > > > [   92.384746]  [] lock_acquire+0x3a/0x86
> > > > [   92.384886]  [] _spin_lock+0x26/0x53
> > > > [   92.385023]  [] refrigerator+0x13/0xc8
> > > > [   92.385163]  [] get_signal_to_deliver+0x32/0x3fb
> > > > [   92.385326]  [] do_notify_resume+0x8c/0x699
> > > > [   92.385476]  [] work_notifysig+0x13/0x1b
> > > > [   92.385620]  ===
> > > > [   92.385719] irq event stamp: 309
> > > > [   92.385809] hardirqs last  enabled at (309): [] syscall_exit_work+0x11/0x26
> > > > [   92.386045] hardirqs last disabled at (308): [] syscall_exit+0x14/0x25
> > > > [   92.386265] softirqs last  enabled at (0): [] copy_process+0x374/0x130e
> > > > [   92.386491] softirqs last disabled at (0): [<>] 0x0
> > > > [   92.392457] (elapsed 0.00 seconds) done.
> > > > [   92.392581] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> > > > [   92.392882] PM: Entering mem sleep
> > > > [   92.392974] Suspending console(s)
> > > > 
> > > > this has been happening for quite some time and might even be happening in
> > > > mainline.  
> > > 
> > > Is it complaining that we entered refrigerator with irqs disabled?
> > 
> > Or that someone else called task_lock() with irqs disabled at one point ...
> > 
> > Hm, perhaps it's related to kernel preemption.  Andrew, I guess the kernel is
> > preemptible?
> > 
> 
> yup.  http://userweb.kernel.org/~akpm/config-sony.txt

Is this reproducible with kernel preemption off?

Re: HMAC broken on s390

2007-12-03 Thread Herbert Xu

On Mon, Dec 03, 2007 at 02:31:40PM +0100, Jan Glauber wrote:
> Hi Herbert,
> 
> commit 788fefa33b0b50581585925c53c230a36af35d0e in cryptodev breaks hmac
> on s390 due to the usage of sg_chain():
> 
> static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
>                             struct scatterlist *sgl)
> {
> #ifndef ARCH_HAS_SG_CHAIN
>         BUG();
> #endif
> 
> ARCH_HAS_SG_CHAIN is not defined for s390 (and also for some other architectures).
> 
> What should we do with this?

Looks like we took a step backwards because the chaining I had
before worked on all architectures :)

I suppose either we'll have to do our own chaining again or implement
it for all architectures.  I'll look into this.
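
To illustrate what "chaining" means here, a minimal user-space model, assuming the encoding used by the generic scatterlist code, where the last slot of one array becomes a link to the next array by tagging the low bits of `page_link` (0x01 = chain link, 0x02 = last entry). `struct fake_sg` and the `fake_*` helpers are hypothetical stand-ins, not the kernel API; on architectures without ARCH_HAS_SG_CHAIN the real `sg_chain()` simply BUG()s, which is the breakage reported above.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scatterlist: the low two bits of
 * page_link mark a chain link (0x01) or the final entry (0x02). */
struct fake_sg {
    uintptr_t page_link;
    unsigned int length;
};

#define SG_CHAIN_BIT 0x01u
#define SG_END_BIT   0x02u

static int sg_is_chain(const struct fake_sg *sg) { return (sg->page_link & SG_CHAIN_BIT) != 0; }
static int sg_is_last(const struct fake_sg *sg)  { return (sg->page_link & SG_END_BIT) != 0; }

/* Turn the last slot of prv[] into a pointer to the next array. */
static void fake_sg_chain(struct fake_sg *prv, unsigned int prv_nents,
                          struct fake_sg *next)
{
    prv[prv_nents - 1].page_link = (uintptr_t)next | SG_CHAIN_BIT;
    prv[prv_nents - 1].length = 0;
}

/* Advance to the next data entry, following a chain link if present. */
static struct fake_sg *fake_sg_next(struct fake_sg *sg)
{
    if (sg_is_last(sg))
        return NULL;
    sg++;
    if (sg_is_chain(sg))
        sg = (struct fake_sg *)(sg->page_link & ~(uintptr_t)0x03);
    return sg;
}

/* Sum the lengths of every data entry reachable from sg. */
static unsigned int fake_sg_total(struct fake_sg *sg)
{
    unsigned int total = 0;
    for (; sg != NULL; sg = fake_sg_next(sg))
        total += sg->length;
    return total;
}
```

Chaining a three-slot array (two data entries plus the link) to a two-entry array and walking it visits all four data entries; note that the link slot itself carries no data.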

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Ingo Molnar

* Mark Lord <[EMAIL PROTECTED]> wrote:

> Ack.  And what of the suggestion to try to ensure that a yielding task 
> simply not end up as the very next one chosen to run?  Maybe by 
> swapping it with another (adjacent?) task in the tree if it comes out 
> on top again?

we did that too for quite some time in CFS - it was found to be "not 
aggressive enough" by some folks and "too aggressive" by others. Then when 
people started bickering over this we added these two simple corner 
cases - switchable via a flag. (minimum aggression and maximum aggression)

> (I really don't know the proper terminology to use here, but hopefully 
> Ingo can translate that).

the terminology you used is perfectly fine.

> Thanks Ingo -- I *really* like this scheduler!

heh, thanks :) For which workload does it make the biggest difference 
for you? (and compared to what other scheduler you used before? 2.6.22?)

Ingo
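
The two corner cases can be sketched with a toy run queue, assuming the queue is just an array of vruntime values and the next task is the one with the smallest vruntime. The "minimum aggression" yield leaves the yielder's vruntime alone; the "maximum aggression" yield moves it just past the rightmost task. This mirrors the idea behind the sysctl_sched_compat_yield switch, but `struct toy_rq`, `pick_next()` and `toy_yield()` are hypothetical, not CFS code.

```c
#include <stddef.h>

/* Toy run queue: each task is represented only by its vruntime. */
struct toy_rq {
    unsigned long long vruntime[8];
    size_t nr;
};

/* Index of the task a CFS-style pick would run next (leftmost). */
static size_t pick_next(const struct toy_rq *rq)
{
    size_t best = 0;
    for (size_t i = 1; i < rq->nr; i++)
        if (rq->vruntime[i] < rq->vruntime[best])
            best = i;
    return best;
}

/* Yield by task 'cur'. aggressive == 0: requeue with vruntime unchanged
 * (minimum aggression). aggressive != 0: place it after the rightmost
 * task (maximum aggression). */
static void toy_yield(struct toy_rq *rq, size_t cur, int aggressive)
{
    if (!aggressive)
        return; /* same vruntime, so it is likely to be picked again */

    unsigned long long rightmost = 0;
    for (size_t i = 0; i < rq->nr; i++)
        if (i != cur && rq->vruntime[i] > rightmost)
            rightmost = rq->vruntime[i];
    if (rightmost >= rq->vruntime[cur])
        rq->vruntime[cur] = rightmost + 1;
}
```

With vruntimes {100, 150, 200}, the gentle yield leaves task 0 leftmost, while the aggressive one moves it to 201 so task 1 runs next.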

[PATCH] AVR32 - Add HAVE_OPROFILE

2007-12-03 Thread Mathieu Desnoyers

Should be applied after the "Add HAVE_KPROBES" patch.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
CC: Haavard Skinnemoen <[EMAIL PROTECTED]>
CC: Andrew Morton <[EMAIL PROTECTED]>
---
 arch/avr32/Kconfig |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6-lttng/arch/avr32/Kconfig
===
--- linux-2.6-lttng.orig/arch/avr32/Kconfig 2007-12-03 16:55:43.0 -0500
+++ linux-2.6-lttng/arch/avr32/Kconfig  2007-12-03 16:56:01.0 -0500
@@ -12,6 +12,7 @@ config AVR32
# that we usually don't need on AVR32.
select EMBEDDED
select HAVE_KPROBES
+   select HAVE_OPROFILE
help
  AVR32 is a high-performance 32-bit RISC microprocessor core,
  designed for cost-sensitive embedded applications, with particular

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Re: [patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-12-03 Thread Andi Kleen

> All ptrace cleanup patches from Roland were posted by him to lkml, and 
> we picked them up from there. Review is ongoing, Roland replied to all 
> feedback with more patches, and those were integrated as well. Final 
> upstream merging of this depends on more review and test results (as 
> usual), but it's looking good so far.

Yes, clearly I overreacted. I actually saw the ptrace patches, but 
somehow didn't connect them with utrace which as I remember
was a much larger patch kit.  Sorry for that, Roland.

Anyway, I think my original point about not delaying jump ptrace
for utrace still holds, though. E.g. Markus' patches are clear .25 
candidates, but for utrace that is probably far too late by now.

-Andi


[PATCH] Fix AVR32 for instrumentation menu removal

2007-12-03 Thread Mathieu Desnoyers

Quoting Haavard Skinnemoen <[EMAIL PROTECTED]>:

Mathieu, I want to get oprofile support queued up for 2.6.25, and
since I don't see any signs that the Kconfig.instrumentation removal
patches are going in any time soon, I'm going to turn things around
and apply these two patches to my tree.

This unfortunately means that your patch won't apply cleanly to -mm
anymore, but it'll hopefully be easy to fix. When you do, please add
ARCH_SUPPORTS_OPROFILE to arch/avr32/Kconfig as well.

Me:

This patch fixes the changes made by [PATCH 2/2] [AVR32] Oprofile support so the
instrumentation removal patch applies correctly.

Should be applied after "[PATCH 2/2] [AVR32] Oprofile support", but before the
instrumentation menu removal patches.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
CC: Haavard Skinnemoen <[EMAIL PROTECTED]>
CC: Andrew Morton <[EMAIL PROTECTED]>
---
 kernel/Kconfig.instrumentation |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-lttng/kernel/Kconfig.instrumentation
===
--- linux-2.6-lttng.orig/kernel/Kconfig.instrumentation 2007-12-03 16:52:42.0 -0500
+++ linux-2.6-lttng/kernel/Kconfig.instrumentation  2007-12-03 16:52:54.0 -0500
@@ -21,7 +21,7 @@ config PROFILING
 config OPROFILE
tristate "OProfile system profiling (EXPERIMENTAL)"
depends on PROFILING
-   depends on ALPHA || ARM || AVR32 || BLACKFIN || X86_32 || IA64 || M32R || MIPS || PARISC || PPC || S390 || SUPERH || SPARC || X86_64
+   depends on ALPHA || ARM || BLACKFIN || X86_32 || IA64 || M32R || MIPS || PARISC || PPC || S390 || SUPERH || SPARC || X86_64
help
  OProfile is a profiling system capable of profiling the
  whole system, include the kernel, kernel modules, libraries,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Kernel Development & Objective-C

2007-12-03 Thread Alan Cox

> You could write an equally efficient kernel in languages like C++,
> using C++ abstractions as a high level organization, where

It's very very hard to generate good C++ code because of the numerous ways
objects get temporarily created, and the weak aliasing rules (as with C).

There are reasons that Fortran lives on (and no, I'm not suggesting one
should rewrite the kernel in Fortran ;)) and the fact it's not really got
pointer aliasing or "address of" operators and all the resulting
optimisation problems is one of the big ones.
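
A small illustration of the aliasing problem, in plain C: because `x` and `y` may point to the same object, the compiler must reload `*x` after the store through `y` instead of simply returning 1. C99's `restrict` lets the programmer promise that the pointers do not alias, which is roughly the guarantee Fortran gives for free. The function names are made up for the example.

```c
/* Without restrict, *x and *y may be the same object, so the compiler
 * has to reload *x after the store through y. */
static int store_then_load(int *x, int *y)
{
    *x = 1;
    *y = 2;
    return *x; /* 1 normally, 2 if x == y */
}

/* With restrict, the caller promises x and y never alias, so the
 * compiler may fold the return value to the constant 1. Passing the
 * same pointer twice here would be undefined behaviour. */
static int store_then_load_restrict(int *restrict x, int *restrict y)
{
    *x = 1;
    *y = 2;
    return *x;
}
```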

Alan

Built-in modules for PCI devices

2007-12-03 Thread Michael_chen


Hi guys,

Can anybody tell me how to find the mapping of PCI PnP IDs for built-in
modules (linked into the kernel)? AFAICT, the mapping between PnP IDs and
loadable modules can be found in /lib/modules/`uname -r`/modules.pcimap. I
think there should be a table or list inside the kernel mapping PnP IDs to
the built-in modules. My question is whether it can be accessed from user
space, and if so, how.

Thanks in advance,

-Michael  

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Mark Lord


Ingo Molnar wrote:
> * Mark Lord <[EMAIL PROTECTED]> wrote:
> 
>> That's not the same thing at all. I think that David is suggesting 
>> that the reinsertion logic should pretend that the task used up all of 
>> the CPU time it was offered in the slot leading up to the 
>> sched_yield() call.
> 
> we have tried this too, and this has problems too (artificial inflation 
> of the vruntime metric and domino effects on other portions of the 
> scheduler). So this is a worse solution than what we have now. (and this 
> has all been pointed out in past discussions in which David 
> participated. I'll certainly reply to any genuinely new idea.)
..

Ack.  And what of the suggestion to try to ensure that a yielding task
simply not end up as the very next one chosen to run?  Maybe by swapping
it with another (adjacent?) task in the tree if it comes out on top again?

(I really don't know the proper terminology to use here,
but hopefully Ingo can translate that).

That's probably already been covered too, but are the prior conclusions still valid?

Thanks Ingo -- I *really* like this scheduler!

Re: [Timers SMP] can this machine be helped?

2007-12-03 Thread Guennadi Liakhovetski

On Mon, 3 Dec 2007, Pavel Machek wrote:

> On Mon 2007-12-03 22:45:06, Guennadi Liakhovetski wrote:
> > On Sun, 2 Dec 2007, Pavel Machek wrote:
> 
> > > > CR0: 8005003b CR2: 081dcf88 CR3: 07e46000 CR4: 02d0
> > > > DR0:  DR1:  DR2:  DR3: 
> > > > DR6: 0ff0 DR7: 0400
> > > >  [] show_trace_log_lvl+0x1a/0x30
> > > >  [] show_trace+0x12/0x20
> > > 
> > > ...and disable softlockup watchdog, too...
> > 
> > So, you think those BUGs are bogus?
> 
> Well you want to make old machine usable, right? Disabling
> warnings is fair game.

Ouch, but not if a CPU __really__ gets stuck for 13 seconds...

Thanks
Guennadi
---
Guennadi Liakhovetski

Re: [patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-12-03 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> Ingo Molnar <[EMAIL PROTECTED]> writes:
> 
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> >> Just don't wait for that. utrace doesn't seem to have any concrete 
> >> plans to merge any time soon AFAIK[1] and it would be a shame to delay 
> >> a useful feature forever.
> >> 
> >> [1] At least the patches have not reached any mailing lists
> >
> > FYI, as far as arch/x86 goes, the merging of Roland's utrace 
> > preparatory patches is well underway in x86.git, and the merge went 
> > pretty well so far, with robust results. It's 49 patches so far:
> 
> I see. They are planning to just skip the public review stage?  
> Clever.

Andi, is this some kind of "destroy your years of kernel hacking 
credibility within a few days" contest that you are participating in?? 
What you are doing is getting really silly.

All ptrace cleanup patches from Roland were posted by him to lkml, and 
we picked them up from there. Review is ongoing, Roland replied to all 
feedback with more patches, and those were integrated as well. Final 
upstream merging of this depends on more review and test results (as 
usual), but it's looking good so far.

Ingo

Re: [Timers SMP] can this machine be helped?

2007-12-03 Thread Pavel Machek

On Mon 2007-12-03 22:45:06, Guennadi Liakhovetski wrote:
> On Sun, 2 Dec 2007, Pavel Machek wrote:
> 
> > > I compiled a .24-ish kernel for it with CONFIG_NO_HZ and 
> > > CONFIG_HIGH_RES_TIMERS. To get the system boot at least sometimes I have 
> > > to specify nohz=off. Then I get
> > 
> > Try highres=off, too... Hehe, and even idle=poll might help.
> 
> Ok, for now I've compiled a kernel with all "advanced" features off like 
> hrt, nohz. Will see how it behaves. But thanks for the hints.

Let us know...

> > > CR0: 8005003b CR2: 081dcf88 CR3: 07e46000 CR4: 02d0
> > > DR0:  DR1:  DR2:  DR3: 
> > > DR6: 0ff0 DR7: 0400
> > >  [] show_trace_log_lvl+0x1a/0x30
> > >  [] show_trace+0x12/0x20
> > 
> > ...and disable softlockup watchdog, too...
> 
> So, you think those BUGs are bogus?

Well you want to make old machine usable, right? Disabling
warnings is fair game.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Ingo Molnar

* Mark Lord <[EMAIL PROTECTED]> wrote:

> That's not the same thing at all. I think that David is suggesting 
> that the reinsertion logic should pretend that the task used up all of 
> the CPU time it was offered in the slot leading up to the 
> sched_yield() call.

we have tried this too, and this has problems too (artificial inflation 
of the vruntime metric and domino effects on other portions of the 
scheduler). So this is a worse solution than what we have now. (and this 
has all been pointed out in past discussions in which David 
participated. I'll certainly reply to any genuinely new idea.)

Ingo

Re: [Timers SMP] can this machine be helped?

2007-12-03 Thread Guennadi Liakhovetski

On Sun, 2 Dec 2007, Pavel Machek wrote:

> > I compiled a .24-ish kernel for it with CONFIG_NO_HZ and 
> > CONFIG_HIGH_RES_TIMERS. To get the system boot at least sometimes I have 
> > to specify nohz=off. Then I get
> 
> Try highres=off, too... Hehe, and even idle=poll might help.

Ok, for now I've compiled a kernel with all "advanced" features off like 
hrt, nohz. Will see how it behaves. But thanks for the hints.

> > Pid: 0, comm: swapper Not tainted (2.6.24-rc2-g8c086340 #3)
> > EIP: 0060:[] EFLAGS: 0283 CPU: 0
> > EIP is at acpi_processor_idle+0x2ae/0x477
> > EAX:  EBX: feab ECX: 0001 EDX: 0001
> > ESI: c7c5f2d0 EDI: 00122d9f EBP: c03ddfa8 ESP: c03ddf90
> >  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> > CR0: 8005003b CR2: 081dcf88 CR3: 07e46000 CR4: 02d0
> > DR0:  DR1:  DR2:  DR3: 
> > DR6: 0ff0 DR7: 0400
> >  [] show_trace_log_lvl+0x1a/0x30
> >  [] show_trace+0x12/0x20
> 
> ...and disable softlockup watchdog, too...

So, you think those BUGs are bogus?

Thanks
Guennadi
---
Guennadi Liakhovetski

Re: sched_yield: delete sysctl_sched_compat_yield

2007-12-03 Thread Mark Lord


Chris Friesen wrote:

David Schwartz wrote:

Chris Friesen wrote:

..

The problem is where do we insert the task that is yielding?  CFS is
based around a tree structure ordered by time.


We put it exactly where we would have when its timeslice ran out. If we can
reward it a little bit, that's great. But if not, we can live with that.
Just imagine that the timer interrupt fired to indicate the end of the
thread's run time when the thread called 'sched_yield'.


CFS doesn't really do "timeslice".  But in essence what you are 
describing is the default behaviour currently...it simply removes the 
task from the tree and reinserts it based on how much cpu time it used up.



Then what does he do when the task runs out of run time? It's hard to
imagine we can't do that when the task calls sched_yield.


It gets reinserted into the tree at a position based on how much cpu 
time it used.  This is exactly the current sched_yield() behaviour.

..

That's not the same thing at all.
I think that David is suggesting that the reinsertion logic
should pretend that the task used up all of the CPU time it
was offered in the slot leading up to the sched_yield() call.

If it did that, then the task would be far more likely not to
end up as the next task chosen to run.

Without doing that, the task is highly likely to be chosen
to run again immediately, as it will appear to have done
nothing since it was previously chosen -- and so the same 
criteria will result in it being chosen again, and again,
and again, until it finally wastes enough cycles to not
be reinserted into the "currently active" slot of the tree.
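
A toy comparison of the two reinsertion policies, assuming a single run queue keyed by vruntime and a fixed slice length. Under the current behaviour a task that yields right after being picked is charged only the little time it actually used, so it usually stays leftmost; under David's proposal it is charged the whole slice it was offered. `SLICE_NS`, the sample values and both helpers are hypothetical, and Ingo's reply in this thread notes that inflating vruntime this way has side effects elsewhere in the scheduler.

```c
#include <stddef.h>

#define SLICE_NS 4000000ULL /* hypothetical slice offered to each task */

/* Return the index with the smallest vruntime (the next task to run). */
static size_t leftmost(const unsigned long long *vr, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (vr[i] < vr[best])
            best = i;
    return best;
}

/* Account a sched_yield() by task 'cur' after it ran for 'used' ns.
 * charge_full_slice == 0: charge only the time actually used (current
 * behaviour). charge_full_slice != 0: charge the whole offered slice. */
static void account_yield(unsigned long long *vr, size_t cur,
                          unsigned long long used, int charge_full_slice)
{
    vr[cur] += (charge_full_slice && used < SLICE_NS) ? SLICE_NS : used;
}
```

Yielding after 1000 ns from the leftmost position keeps the task leftmost under the first policy but pushes it behind its neighbours under the second.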

Cheers

Re: [Bugme-new] [Bug 9482] New: kernel GPF in 2.6.24 (g09f345da)

2007-12-03 Thread Ed L. Cashin

On Mon, Dec 03, 2007 at 04:00:05PM -0500, Ed L. Cashin wrote:
...
> I'll keep looking at this, but at a glance it looks like the cpu
> number is valid, because I don't trip a BUG_ON when I make the change
> below (the badval variable is noise, sorry).
> 
>   --- lx/lib/percpu_counter.c.20071130 2007-12-03 15:43:19.0 -0500
>   +++ lx/lib/percpu_counter.c 2007-12-03 15:47:38.0 -0500
>   @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_
>   s64 count;
>   s32 *pcount;
>   int cpu = get_cpu();
>   +   u64 badval = 0xULL;
>
>   +   BUG_ON(!cpu_possible(cpu));
>   pcount = per_cpu_ptr(fbc->counters, cpu);
>   count = *pcount + amount;
>   if (count >= batch || count <= -batch) {

It appears that the fbc->counters pointer is NULL.  I added the line,

BUG_ON(!fbc->counters);

... (on line 39 in my percpu_counter.c), and it results in the trace
below.  It looks like when it's NULL, percpu_ptr passes it to
__percpu_disguise, which makes it all ones and then tries to
dereference 0x to access the "ptrs" member of the
struct percpu_data.
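
The batching logic in the hunk quoted above can be modelled in user space to show why a NULL `counters` pointer is fatal: every add goes through the per-CPU slot before anything is folded into the global count. This is a sketch assuming a fixed number of CPUs and an ordinary array in place of the real per-CPU allocation; `struct toy_percpu_counter`, `NR_TOY_CPUS` and the -1 error return are hypothetical, and the explicit NULL check stands in for the BUG_ON() added above.

```c
#include <stdint.h>
#include <stddef.h>

#define NR_TOY_CPUS 4 /* hypothetical CPU count for the model */

struct toy_percpu_counter {
    int64_t count;     /* global, approximately up to date */
    int32_t *counters; /* one slot per CPU; NULL if never initialised */
};

/* Mirror of the batching in __percpu_counter_add(): accumulate locally
 * and fold into the global count once |local| reaches the batch size.
 * Returns -1 instead of dereferencing a NULL counters array. */
static int toy_counter_add(struct toy_percpu_counter *fbc, int cpu,
                           int32_t amount, int32_t batch)
{
    if (fbc->counters == NULL || cpu < 0 || cpu >= NR_TOY_CPUS)
        return -1;

    int64_t count = (int64_t)fbc->counters[cpu] + amount;
    if (count >= batch || count <= -batch) {
        fbc->count += count;    /* fold the local delta into the total */
        fbc->counters[cpu] = 0;
    } else {
        fbc->counters[cpu] = (int32_t)count;
    }
    return 0;
}
```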

[ cut here ]
kernel BUG at lib/percpu_counter.c:39!
invalid opcode:  [1] SMP 
CPU 0 
Modules linked in: aoe
Pid: 3354, comm: bash Not tainted 2.6.24-rc3-47dbg #10
RIP: 0010:[]  [] __percpu_counter_add+0x2a/0x8f
RSP: 0018:810075031aa8  EFLAGS: 00010046
RAX:  RBX: 81007fd19bd8 RCX: 
RDX: 0010 RSI: 0001 RDI: 
RBP: 810075031ac8 R08: 81007cc077b0 R09: 802ae5ee
R10: 810075031aa8 R11: 8100750318e8 R12: 81007c81c380
R13: 810073ce8250 R14: 0200 R15: 8100755016b0
FS:  2b3e5c052db0() GS:8078b000() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 2b7f44fb64e0 CR3: 7c4b1000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process bash (pid: 3354, threadinfo 81007503, task 81007b4da040)
Stack:  810075031ac8 81007fd19bd8 81007c81c380 
 810075031af8 802ae682 100075031ae8 8100755016b0
 0200 81007fd19bd8 810075031b18 802ae75c
Call Trace:
 [] __set_page_dirty+0xdc/0x121
 [] mark_buffer_dirty+0x95/0x99
 [] __block_commit_write+0x72/0xac
 [] block_write_end+0x4f/0x5b
 [] blkdev_write_end+0x1b/0x38
 [] generic_file_buffered_write+0x1c0/0x648
 [] current_fs_time+0x22/0x29
 [] __generic_file_aio_write_nolock+0x358/0x3c2
 [] filemap_fault+0x1c4/0x320
 [] unlock_page+0x2d/0x31
 [] generic_file_aio_write_nolock+0x3b/0x8d
 [] do_sync_write+0xe2/0x126
 [] autoremove_wake_function+0x0/0x38
 [] do_page_fault+0x3f8/0x7bb
 [] fd_install+0x5f/0x68
 [] vfs_write+0xae/0x137
 [] sys_write+0x47/0x70
 [] system_call+0x7e/0x83


Code: 0f 0b eb fe 0f a3 3d 7e 08 4f 00 19 c0 85 c0 75 04 0f 0b eb 
RIP  [] __percpu_counter_add+0x2a/0x8f
 RSP 
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
INFO: lockdep is turned off.

Call Trace:
 [] debug_show_held_locks+0x1b/0x24
 [] __might_sleep+0xc7/0xc9
 [] down_read+0x1d/0x4a
 [] exit_mm+0x34/0xf7
 [] do_exit+0x247/0x75b
 [] kernel_math_error+0x0/0x7e
 [] do_trap+0x101/0x110
 [] do_invalid_op+0x91/0x9a
 [] __percpu_counter_add+0x2a/0x8f
 [] :aoe:aoeblk_make_request+0x1c3/0x1d0
 [] io_schedule+0x28/0x34
 [] error_exit+0x0/0x9a
 [] __set_page_dirty+0x48/0x121
 [] __percpu_counter_add+0x2a/0x8f
 [] __set_page_dirty+0xdc/0x121
 [] mark_buffer_dirty+0x95/0x99
 [] __block_commit_write+0x72/0xac
 [] block_write_end+0x4f/0x5b
 [] blkdev_write_end+0x1b/0x38
 [] generic_file_buffered_write+0x1c0/0x648
 [] current_fs_time+0x22/0x29
 [] __generic_file_aio_write_nolock+0x358/0x3c2
 [] filemap_fault+0x1c4/0x320
 [] unlock_page+0x2d/0x31
 [] generic_file_aio_write_nolock+0x3b/0x8d
 [] do_sync_write+0xe2/0x126
 [] autoremove_wake_function+0x0/0x38
 [] do_page_fault+0x3f8/0x7bb
 [] fd_install+0x5f/0x68
 [] vfs_write+0xae/0x137
 [] sys_write+0x47/0x70
 [] system_call+0x7e/0x83



-- 
  Ed L Cashin <[EMAIL PROTECTED]>

Re: [PATCH 1/7][QUOTA] Move sysctl management code under ifdef CONFIG_SYSCTL

2007-12-03 Thread Andrew Morton

On Fri, 30 Nov 2007 16:02:50 +0300
Pavel Emelyanov <[EMAIL PROTECTED]> wrote:

> This includes the tables themselves and the call to the
> register_sysctl_table(). Since this call is done from the __init
> call, I hope this is OK to keep the #ifdef inside the function, 
> rather than making proper helpers outside it.
> 
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> 
> ---
> 
> diff --git a/fs/dquot.c b/fs/dquot.c
> index 50e7c2a..efee14d 100644
> --- a/fs/dquot.c
> +++ b/fs/dquot.c
> @@ -1821,6 +1821,7 @@ struct quotactl_ops vfs_quotactl_ops = {
>   .set_dqblk  = vfs_set_dqblk
>  };
>  
> +#ifdef CONFIG_SYSCTL
>  static ctl_table fs_dqstats_table[] = {
>   {
>   .ctl_name   = FS_DQ_LOOKUPS,
> @@ -1918,6 +1919,7 @@ static ctl_table sys_table[] = {
>   },
>   { .ctl_name = 0 },
>  };
> +#endif
>  
>  static int __init dquot_init(void)
>  {
> @@ -1926,7 +1928,9 @@ static int __init dquot_init(void)
>  
>   printk(KERN_NOTICE "VFS: Disk quotas %s\n", __DQUOT_VERSION__);
>  
> +#ifdef CONFIG_SYSCTL
>   register_sysctl_table(sys_table);
> +#endif
>  
>   dquot_cachep = kmem_cache_create("dquot",
>   sizeof(struct dquot), sizeof(unsigned long) * 4,

We should avoid the ifdefs around the register_sysctl_table() call.

At present the !CONFIG_SYSCTL implementation of register_sysctl_table() is
a non-inlined NULL-returning stub.  All we have to do is to inline that stub
then these ifdefs can go away.

The same applies to register_sysctl_paths().
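
A sketch of the suggested header change, assuming simplified declarations: with CONFIG_SYSCTL unset, the header provides a `static inline` stub returning NULL, so a caller such as dquot_init() can call register_sysctl_table() unconditionally and the compiler drops the call. The struct definitions are placeholders rather than the real <linux/sysctl.h> contents, and `toy_dquot_init()` is hypothetical.

```c
#include <stddef.h>

/* Placeholder types; the real ones live in <linux/sysctl.h>. */
struct ctl_table { int ctl_name; };
struct ctl_table_header { struct ctl_table *table; };

/* CONFIG_SYSCTL is deliberately left undefined to exercise the stub. */
#ifdef CONFIG_SYSCTL
struct ctl_table_header *register_sysctl_table(struct ctl_table *table);
#else
/* Inline no-op: callers need no #ifdef, and the call compiles away. */
static inline struct ctl_table_header *
register_sysctl_table(struct ctl_table *table)
{
    (void)table;
    return NULL;
}
#endif

static struct ctl_table sys_table[] = { { 0 } };

/* Caller written without any #ifdef CONFIG_SYSCTL around the call.
 * Returns 1 if a sysctl header was registered, 0 otherwise. */
static int toy_dquot_init(void)
{
    struct ctl_table_header *hdr = register_sysctl_table(sys_table);
    return hdr != NULL;
}
```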


If that's all agreeable then there isn't a lot of point in me merging these
seven patches.



btw, administrivia detail: please don't put the patch's subsystem
identifier in [].  IOW, this:

Subject: [PATCH 1/7][QUOTA] Move sysctl management code under ifdef CONFIG_SYSCTL

should have been

Subject: [PATCH 1/7] quota: move sysctl management code under ifdef CONFIG_SYSCTL

for reasons described in section 2 of
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt.


Thanks.


Re: Kernel Development & Objective-C

2007-12-03 Thread J.A. Magallón

On Mon, 3 Dec 2007 22:13:53 +0100, Willy Tarreau <[EMAIL PROTECTED]> wrote:

...
> 
> It just depends how many times a second it happens. For instance, consider
> this trivial loop (fct is a two-function array which just return 1 or 2) :
> 
> i = 0;
> for (j = 0; j < (1 << 28); j++) {
> k = (j >> 8) & 1;
> i += fct[k]();
> }
> 
> It takes 1.6 seconds to execute on my athlon-xp 1.5 GHz. If, instead of
> changing the function once every 256 calls, you change it to every call :
> 
> i = 0;
> for (j = 0; j < (1 << 28); j++) {
> k = (j >> 0) & 1;
> i += fct[k]();
> }
> 
> Then it only takes 4.3 seconds, which is about 3 times slower. The number
> of calls per function remains the same (128M calls each), it's just the
> branch prediction which is wrong every time. The very few nanoseconds added
> at each call are enough to slow down a program from 1.6 to 4.3 seconds while
> it executes the exact same code (it may even save one shift). If you have
> such stupid code, say, to compute the color or alpha of each pixel in an
> image, you will certainly notice the difference.
> 
> And such poorly efficient code may happen very often when you blindly rely
> on function pointers instead of explicit calls.
> 
...
> 
> You are forgetting something very important : once you start stacking
> functions to perform the dirty work for you, you end up with so much
> abstraction that even new stupid code cannot be written at all without
> relying on them, and it's where the problem takes its roots, because
> when you need to write a fast function and you notice that you cannot
> touch a variable without passing through a slow pinhole, your fast
> function will remain slow whatever you do, and the worst of all is that
> you will think that it is normally fast and that it cannot be written
> faster.
> 

But don't forget that OOP is just another way to organize your code,
and to let the language/compiler do some error-prone things you shouldn't
be doing by hand, like filling in a vtable pointer.
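As a concrete illustration of the error-prone step mentioned above, here is a minimal hand-rolled vtable in plain C, in the style of the kernel's *_operations structs (e.g. struct file_operations). struct shape, struct shape_ops and the helper functions are hypothetical names chosen for this sketch. In C every constructor must remember to set ops; in C++ the compiler fills in the vtable pointer for each object of a class with virtual functions.

```c
/* A hand-written "vtable": a struct of function pointers. */
struct shape;
struct shape_ops {
	long (*area)(const struct shape *s);
};

struct shape {
	const struct shape_ops *ops;	/* the pointer C++ would fill in for us */
	long w, h;
};

static long rect_area(const struct shape *s) { return s->w * s->h; }
static long tri_area(const struct shape *s) { return s->w * s->h / 2; }

static const struct shape_ops rect_ops = { .area = rect_area };
static const struct shape_ops tri_ops = { .area = tri_area };

/* Each constructor must set ops by hand; forgetting it is the classic bug. */
struct shape make_rect(long w, long h)
{
	struct shape s = { &rect_ops, w, h };
	return s;
}

struct shape make_tri(long w, long h)
{
	struct shape s = { &tri_ops, w, h };
	return s;
}

/* Dispatch through the table, as a C++ virtual call would. */
long shape_area(const struct shape *s)
{
	return s->ops->area(s);
}
```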

And of course everything depends on what language you choose and how
you use it.
You could write an equally efficient kernel in a language like C++,
using C++ abstractions for high-level organization while the fast
paths are still coded the right way; we are not talking about
C# or Java, where even a sum is a call to an overloaded method.
It's the difference between doing school-book pushes and pops on lists
and suddenly inventing the splice operator...
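To try Willy's measurement, a minimal sketch of his loop as a standalone C unit follows. ret1(), ret2(), run_loop() and time_loop() are hypothetical names, and the two callees are assumed to return 1 and 2 as he describes. Timings will vary with compiler flags and CPU: newer indirect-branch predictors may learn a strict alternation, so the gap he saw on an Athlon XP may shrink or disappear.

```c
#include <time.h>

/* The two callees from the quoted example: each just returns a constant. */
static int ret1(void) { return 1; }
static int ret2(void) { return 2; }

/* Non-const global so the compiler cannot trivially fold the calls away. */
int (*fct[2])(void) = { ret1, ret2 };

/*
 * Run the quoted loop for 'iters' iterations, switching the call target
 * every 2^shift iterations (shift = 8 for the first variant, 0 for the
 * second).  Returns the accumulated sum so the work is not optimized out.
 */
long run_loop(long iters, int shift)
{
	long i = 0;
	for (long j = 0; j < iters; j++) {
		int k = (j >> shift) & 1;
		i += fct[k]();
	}
	return i;
}

/* Time one variant with clock(); returns elapsed CPU seconds. */
double time_loop(long iters, int shift, long *sum)
{
	clock_t start = clock();
	*sum = run_loop(iters, shift);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(1L << 28, 8, &sum) and then time_loop(1L << 28, 0, &sum) reproduces the two variants: both compute the same sum (3 * 2^27), and only the pattern of indirect-branch targets differs.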

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam03 (gcc 4.2.2 (4.2.2-1mdv2008.1)) SMP Sat Nov

Regression - 2.6.24-rc3 - umem nvram card driver oops

2007-12-03 Thread David Chinner

Neil,

I just upgraded an ia64 (Altix, 16k page size) test box from 2.6.23
to 2.6.24-rc3, and it now panics on boot in the umem driver.

[   55.499300] v2.3 : Micro Memory(tm) PCI memory board block driver
[   55.519331] ACPI: Unable to derive IRQ for device 0006:00:01.0
[   55.528294] ACPI: PCI Interrupt 0006:00:01.0[A]: no GSI
[   55.529214] umem 0006:00:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
[   55.530968] umem 0006:00:01.0: CSR 0xc0080da0 -> 0xc0080da0 (0x100)
[   55.552881] umem 0006:00:01.0: Size 1048576 KB, Battery 1 Disabled (FAILURE), Battery 2 Disabled (FAILURE)
[   55.559064] umem 0006:00:01.0: Window size 16777216 bytes, IRQ 64
[   55.560131] umem 0006:00:01.0: memory already initialized
[   55.561501]  umema:<1>Unable to handle kernel NULL pointer dereference (address 002a)
[   55.580231] swapper[0]: Oops 8813272891392 [1]
[   55.581096] Modules linked in: umem qla2xxx
[   55.582022]
[   55.582023] Pid: 0, CPU 0, comm:  swapper
[   55.608663] psr : 101008026018 ifs : 8b9c ip  : [] Not tainted
[   55.610226] ip is at process_page+0x1c0/0x760 [umem]
[   55.611107] unat:  pfs : 0b9c rsc : 0003
[   55.660528] rnat: e030023e5e40 bsps: e03002563000 pr  : a56911155aa696a5
[   55.661866] ldrs:  ccv :  fpsr: 0009804c0270033f
[   55.663204] csd :  ssd :
[   55.712930] b0  : a002046e1b70 b6  : a002046e1b20 b7  : a0010008d300
[   55.714267] f6  : 1003e0601 f7  : 1003e0040
[   55.715395] f8  : 1003e00180400 f9  : 1003e8000
[   55.816292] f10 : 1003e0400 f11 : 1003e00049919
[   55.817422] r1  : a002046ebe80 r2  : a002046f7640 r3  : 0040
[   55.818747] r8  :  r9  :  r10 :
[   55.866550] r11 : 0003 r12 : a00100bfbbb0 r13 : a00100bf4000
[   55.867886] r14 : 002a r15 :  r16 : 0003
[   55.894776] r17 : a002046e5630 r18 :  r19 : 0003
[   55.896113] r20 : e0398c980030 r21 : e03988381f40 r22 :
[   55.897450] r23 : 0001 r24 : 0001 r25 : e03988381f28
[   55.927571] r26 : 0001 r27 :  r28 : 0004
[   55.928734] r29 : cf0c01d0 r30 : e0398c980038 r31 :
[   55.930075]
[   55.930077] Call Trace:
[   56.031008]  [] show_stack+0x80/0xa0
[   56.031010] sp=a00100bfb780 bsp=a00100bf5238
[   56.033460]  [] show_regs+0x870/0x8a0
[   56.033462] sp=a00100bfb950 bsp=a00100bf51d8
[   56.093627]  [] die+0x210/0x3a0
[   56.093629] sp=a00100bfb950 bsp=a00100bf5190
[   56.213913]  [] ia64_do_page_fault+0x9c0/0xb00
[   56.213915] sp=a00100bfb950 bsp=a00100bf5138
[   56.231427]  [] ia64_leave_kernel+0x0/0x280
[   56.231429] sp=a00100bfb9e0 bsp=a00100bf5138
[   56.278001]  [] process_page+0x1c0/0x760 [umem]
[   56.278004] sp=a00100bfbbb0 bsp=a00100bf5058
[   56.280476]  [] tasklet_action+0x270/0x360
[   56.280478] sp=a00100bfbbb0 bsp=a00100bf5018
[   56.370160]  [] __do_softirq+0x200/0x240
[   56.370162] sp=a00100bfbbb0 bsp=a00100bf4f80
[   56.476607]  [] do_softirq+0x70/0xc0
[   56.476609] sp=a00100bfbbb0 bsp=a00100bf4f20
[   56.478889]  [] irq_exit+0x80/0xc0
[   56.478891] sp=a00100bfbbb0 bsp=a00100bf4f08
[   56.529780]  [] ia64_handle_irq+0x1b0/0x3c0
[   56.529782] sp=a00100bfbbb0 bsp=a00100bf4e98
[   56.636341]  [] ia64_leave_kernel+0x0/0x280
[   56.636343] sp=a00100bfbbb0 bsp=a00100bf4e98
[   56.638823]  [] default_idle+0x1a0/0x1c0
[   56.638825] sp=a00100bfbd80 bsp=a00100bf4e30
[   56.689924]  [] cpu_idle+0x210/0x440
[   56.689926] sp=a00100bfbe20 bsp=a00100bf4db8
[   56.796362]  [] rest_init+0x110/0x140
[   56.796364] sp=a00100bfbe20 bsp=a00100bf4da0
[   56.798668]  [] start_kernel+0x7c0/0x880
[   56.798670] sp=a00100bfbe20 bsp=a00100bf4d28
[   56.850239]  [] __kprobes_text_end+0x6d0/0x6f0
[   56.850241] sp=a00100bfbe30 bsp=a00100bf4c40

-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group