Re: [PATCH 0/2] file capabilities: Introduction

2007-05-16 Thread Suparna Bhattacharya
On Mon, May 14, 2007 at 08:00:11PM +, Pavel Machek wrote:
> Hi!
> 
> > "Serge E. Hallyn" <[EMAIL PROTECTED]> wrote:
> > 
> > > Following are two patches which have been sitting for some time in -mm.
> > 
> > Where "some time" == "nearly six months".
> > 
> > We need help considering, reviewing and testing this code, please.
> 
> I did quick scan, and it looks ok. Plus, it means we can finally start
> using that old capabilities subsystem... so I think we should do it.

FWIW, I looked through it recently as well, and it looked reasonable enough
to me, though I'm not a security expert. I did have a question about
testing corner cases etc, which Serge has tried to address.

Serge, are you planning to post an update without STRICTXATTR ? That should
simplify the second patch.

Regards
Suparna

> 
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] bluetooth: fix locking in hci_sock_dev_event()

2007-05-16 Thread David Miller
From: Satyam Sharma <[EMAIL PROTECTED]>
Date: Thu, 17 May 2007 11:13:36 +0530 (IST)

> [PATCH] bluetooth: fix locking in hci_sock_dev_event()
> 
> We presently use lock_sock() to acquire a lock on a socket in
> hci_sock_dev_event(), but this goes BUG because lock_sock()
> can sleep and we're already holding a read-write spinlock at
> that point. So, we must use the non-sleeping BH version,
> bh_lock_sock().
> 
> However, hci_sock_dev_event() is called from user context and
> hence using simply bh_lock_sock() will deadlock against a
> concurrent softirq that tries to acquire a lock on the same
> socket. Hence, disabling BHs before acquiring the socket lock
> and enabling them afterwards is the proper solution to fix
> socket locking in hci_sock_dev_event().
> 
> Cc: David Miller <[EMAIL PROTECTED]>
> Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
> Signed-off-by: Marcel Holtmann <[EMAIL PROTECTED]>
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

Thanks I'll merge this in.


[PATCH] bluetooth: fix locking in hci_sock_dev_event()

2007-05-16 Thread Satyam Sharma

[PATCH] bluetooth: fix locking in hci_sock_dev_event()

We presently use lock_sock() to acquire a lock on a socket in
hci_sock_dev_event(), but this goes BUG because lock_sock()
can sleep and we're already holding a read-write spinlock at
that point. So, we must use the non-sleeping BH version,
bh_lock_sock().

However, hci_sock_dev_event() is called from user context and
hence using simply bh_lock_sock() will deadlock against a
concurrent softirq that tries to acquire a lock on the same
socket. Hence, disabling BHs before acquiring the socket lock
and enabling them afterwards is the proper solution to fix
socket locking in hci_sock_dev_event().

Cc: David Miller <[EMAIL PROTECTED]>
Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Marcel Holtmann <[EMAIL PROTECTED]>
Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 net/bluetooth/hci_sock.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

---

diff -ruNp a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
--- a/net/bluetooth/hci_sock.c  2007-05-16 17:31:06.0 +0530
+++ b/net/bluetooth/hci_sock.c  2007-05-16 17:38:35.0 +0530
@@ -665,7 +665,8 @@ static int hci_sock_dev_event(struct not
 		/* Detach sockets from device */
 		read_lock(&hci_sk_list.lock);
 		sk_for_each(sk, node, &hci_sk_list.head) {
-			lock_sock(sk);
+			local_bh_disable();
+			bh_lock_sock_nested(sk);
 			if (hci_pi(sk)->hdev == hdev) {
 				hci_pi(sk)->hdev = NULL;
 				sk->sk_err = EPIPE;
@@ -674,7 +675,8 @@ static int hci_sock_dev_event(struct not

 				hci_dev_put(hdev);
 			}
-			release_sock(sk);
+			bh_unlock_sock(sk);
+			local_bh_enable();
 		}
 		read_unlock(&hci_sk_list.lock);
 	}



RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Jeff Zheng

Yeah, seems you've locked it down :D. I've written 600GB of data now,
and everything is still fine.
I'll let it run overnight and fill the whole 11T, and post the result
tomorrow.

Thanks a lot though.

Jeff 

> -Original Message-
> From: Neil Brown [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, 17 May 2007 5:31 p.m.
> To: [EMAIL PROTECTED]; Jeff Zheng; Michal Piotrowski; Ingo 
> Molnar; [EMAIL PROTECTED]; 
> linux-kernel@vger.kernel.org; [EMAIL PROTECTED]
> Subject: RE: Software raid0 will crash the file-system, when 
> each disk is 5TB
> 
> On Thursday May 17, [EMAIL PROTECTED] wrote:
> > 
> > Uhm, I just noticed something.
> > 'chunk' is unsigned long, and when it gets shifted up, we 
> might lose 
> > bits.  That could still happen with the 4*2.75T arrangement, but is 
> > much more likely in the 2*5.5T arrangement.
> 
> Actually, it cannot be a problem with the 4*2.75T arrangement.
>   chunk << chunksize_bits
> 
> will not exceed the size of the underlying device *in*kilobytes*.
> In that case that is 0xAE9EC800 which will fit in a 32bit long.
> We don't double it to make sectors until after we add
> zone->dev_offset, which is "sector_t" and so 64bit arithmetic is used.
> 
> So I'm quite certain this bug will cause exactly the problems 
> experienced!!
> 
> > 
> > Jeff, can you try this patch?
> 
> Don't bother about the other tests I mentioned, just try this one.
> Thanks.
> 
> NeilBrown
> 
> > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> > 
> > ### Diffstat output
> >  ./drivers/md/raid0.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c
> > --- .prev/drivers/md/raid0.c2007-05-17 
> 10:33:30.0 +1000
> > +++ ./drivers/md/raid0.c2007-05-17 15:02:15.0 +1000
> > @@ -475,7 +475,7 @@ static int raid0_make_request (request_q
> > x = block >> chunksize_bits;
> > tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> > }
> > -   rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
> > +   rsect = ((((sector_t)chunk << chunksize_bits) + zone->dev_offset)<<1)
> > + sect_in_chunk;
> >   
> > bio->bi_bdev = tmp_dev->bdev;
> 


RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Neil Brown
On Thursday May 17, [EMAIL PROTECTED] wrote:
> 
> Uhm, I just noticed something.
> 'chunk' is unsigned long, and when it gets shifted up, we might lose
> bits.  That could still happen with the 4*2.75T arrangement, but is
> much more likely in the 2*5.5T arrangement.

Actually, it cannot be a problem with the 4*2.75T arrangement.
  chunk << chunksize_bits

will not exceed the size of the underlying device *in*kilobytes*.
In that case that is 0xAE9EC800 which will fit in a 32bit long.
We don't double it to make sectors until after we add
zone->dev_offset, which is "sector_t" and so 64bit arithmetic is used.

So I'm quite certain this bug will cause exactly the problems
experienced!!
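
The arithmetic above can be reproduced in user space with a minimal
sketch. It assumes a 32-bit 'unsigned long' (modelled with uint32_t), a
64-bit sector_t, and 64K chunks (chunksize_bits = 6). The names
ulong32_t, sector64_t, rsect_old() and rsect_new() and the chunk numbers
are made up for illustration; only the two expressions come from the
patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ulong32_t;   /* assumption: 'unsigned long' on a 32-bit kernel */
typedef uint64_t sector64_t;  /* assumption: 64-bit sector_t */

/* Pre-patch expression: the shift is evaluated in 32 bits and may wrap. */
static sector64_t rsect_old(ulong32_t chunk, unsigned int chunksize_bits,
                            sector64_t dev_offset, ulong32_t sect_in_chunk)
{
    ulong32_t kb;

    assert(chunksize_bits < 32);
    kb = (ulong32_t)(chunk << chunksize_bits);  /* high bits are lost here */
    return ((kb + dev_offset) << 1) + sect_in_chunk;
}

/* Patched expression: widen to 64 bits before shifting. */
static sector64_t rsect_new(ulong32_t chunk, unsigned int chunksize_bits,
                            sector64_t dev_offset, ulong32_t sect_in_chunk)
{
    assert(chunksize_bits < 32);
    return ((((sector64_t)chunk << chunksize_bits) + dev_offset) << 1)
           + sect_in_chunk;
}
```

With a 2.75T component (2929641472 kilobytes) the last chunk is 45775647
and both functions agree. With a 5.5T component, chunk 80000000 gives
80000000 << 6 = 5120000000 kilobytes, which wraps to 825032704 in 32
bits, so rsect_old() yields sector 1650065408 where rsect_new() yields
10240000000.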

> 
> Jeff, can you try this patch?

Don't bother about the other tests I mentioned, just try this one.
Thanks.

NeilBrown

> Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> 
> ### Diffstat output
>  ./drivers/md/raid0.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c
> --- .prev/drivers/md/raid0.c  2007-05-17 10:33:30.0 +1000
> +++ ./drivers/md/raid0.c  2007-05-17 15:02:15.0 +1000
> @@ -475,7 +475,7 @@ static int raid0_make_request (request_q
>   x = block >> chunksize_bits;
>   tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
>   }
> - rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
> + rsect = ((((sector_t)chunk << chunksize_bits) + zone->dev_offset)<<1)
>   + sect_in_chunk;
>   
>   bio->bi_bdev = tmp_dev->bdev;


[patch 3/3] Support removal of unused dentry entries via SLUB defrag interface

2007-05-16 Thread clameter
This patch allows the removal of unused dentry entries in a partially
populated slab page. It is still very limited in what it can do for reclaim,
but it catches bad cases in which we have a long list of partial
slabs with a few entries in each of them. We can free up the slabs
that have only unused dentry entries in them.

get_dentry() takes the dcache lock and then uses dget_locked()
to obtain a reference to the dentry. An additional complication is that
the dentry may be in the process of being freed or it may just have been
allocated. In that case d_inode is NULL. If we discover this then we
simply stay away from the object and return 1 to indicate to the
defrag logic that this object will be free. Otherwise we increment
the refcount and return success.

kick_dentry() is called after get_dentry() has
been used and after the slab has dropped all of its own locks. The dentry
pruning for unused entries works in a straightforward way.

Note: The code here could be significantly improved. If we could
get to a point where all used dentries could be moved then full
dentry slab defragmentation and vacating of dentry slab pages would
become possible.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/dcache.c |   89 ++--
 1 file changed, 81 insertions(+), 8 deletions(-)

Index: slub/fs/dcache.c
===
--- slub.orig/fs/dcache.c   2007-05-16 20:58:02.0 -0700
+++ slub/fs/dcache.c2007-05-16 20:59:27.0 -0700
@@ -2114,18 +2114,91 @@ static void __init dcache_init_early(voi
 		INIT_HLIST_HEAD(&dentry_hashtable[loop]);
 }
 
+/*
+ * The slab is holding off frees. Thus we can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static int get_dentry(struct kmem_cache *s, void *private)
+{
+	struct dentry *dentry = private;
+	int result = 0;
+
+	spin_lock(&dcache_lock);
+	/*
+	 * dentry->d_inode is set to NULL when the dentry
+	 * is freed. Use that as an indicator that we should
+	 * not interfere with the freeing process.
+	 */
+	if (dentry->d_inode) {
+		dget_locked(dentry);
+		if (atomic_read(&dentry->d_count) > 2)
+			/*
+			 * Moving of dentries in use not
+			 * implemented yet.
+			 */
+			result = -EINVAL;
+	} else
+		result = 1;
+	spin_unlock(&dcache_lock);
+	return result;
+}
+
+static void put_dentry(struct kmem_cache *s, void *private)
+{
+	struct dentry *dentry = private;
+
+	dput(dentry);
+}
+
+/*
+ * Slab has dropped all the locks. Get rid of the
+ * refcount we obtained earlier and also rid of the
+ * object.
+ */
+static int kick_dentry(struct kmem_cache *s, void *private)
+{
+	struct dentry *dentry = private;
+
+	spin_lock(&dcache_lock);
+	spin_lock(&dentry->d_lock);
+	if (atomic_read(&dentry->d_count) > 1) {
+		/*
+		 * Reference count was increased.
+		 * We need to abandon the freeing of
+		 * objects.
+		 */
+		spin_unlock(&dentry->d_lock);
+		spin_unlock(&dcache_lock);
+		dput(dentry);
+		return -EBUSY;
+	}
+
+	/* Remove from LRU */
+	if (!list_empty(&dentry->d_lru)) {
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+	}
+	/* Drop the entry */
+	prune_one_dentry(dentry, 1);
+	spin_unlock(&dcache_lock);
+	return 0;
+}
+
+static struct kmem_cache_ops dentry_kmem_cache_ops = {
+	.get = get_dentry,
+	.put = put_dentry,
+	.kick = kick_dentry,
+	.sync = synchronize_rcu
+};
+
 static void __init dcache_init(unsigned long mempages)
 {
 	int loop;
 
-	/*
-	 * A constructor could be added for stable state like the lists,
-	 * but it is probably not worth it because of the cache nature
-	 * of the dcache.
-	 */
-	dentry_cache = KMEM_CACHE(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
-
+	dentry_cache = KMEM_CACHE_OPS(dentry,
+		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD,
+		&dentry_kmem_cache_ops);
+
 	register_shrinker(&dcache_shrinker);
 
 	/* Hash may have been set up in dcache_init_early */



[patch 1/3] SLUB: add support for kmem_cache_ops

2007-05-16 Thread clameter
We use the parameter formerly used by the destructor to pass an optional
pointer to a kmem_cache_ops structure to kmem_cache_create.

kmem_cache_ops is created as empty. Later patches populate kmem_cache_ops.

Create a KMEM_CACHE_OPS macro that allows the specification of the
kmem_cache_ops.

Code to handle kmem_cache_ops is added to SLUB. SLAB and SLOB are updated
to be able to take a kmem_cache_ops structure but will ignore it.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slab.h |   13 +
 include/linux/slub_def.h |1 +
 mm/slab.c|6 +++---
 mm/slob.c|2 +-
 mm/slub.c|   44 ++--
 5 files changed, 44 insertions(+), 22 deletions(-)

Index: slub/include/linux/slab.h
===
--- slub.orig/include/linux/slab.h  2007-05-15 21:19:51.0 -0700
+++ slub/include/linux/slab.h   2007-05-15 21:27:07.0 -0700
@@ -38,10 +38,13 @@ typedef struct kmem_cache kmem_cache_t _
 void __init kmem_cache_init(void);
 int slab_is_available(void);
 
+struct kmem_cache_ops {
+};
+
 struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
unsigned long,
void (*)(void *, struct kmem_cache *, unsigned long),
-   void (*)(void *, struct kmem_cache *, unsigned long));
+   const struct kmem_cache_ops *s);
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
@@ -59,9 +62,11 @@ int kmem_ptr_validate(struct kmem_cache 
  * f.e. add cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-   sizeof(struct __struct), __alignof__(struct __struct),\
-   (__flags), NULL, NULL)
+#define KMEM_CACHE_OPS(__struct, __flags, __ops) \
+   kmem_cache_create(#__struct, sizeof(struct __struct), \
+   __alignof__(struct __struct), (__flags), NULL, (__ops))
+
+#define KMEM_CACHE(__struct, __flags) KMEM_CACHE_OPS(__struct, __flags, NULL)
 
 #ifdef CONFIG_NUMA
 extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
Index: slub/mm/slub.c
===
--- slub.orig/mm/slub.c 2007-05-15 21:25:46.0 -0700
+++ slub/mm/slub.c  2007-05-15 21:29:36.0 -0700
@@ -294,6 +294,9 @@ static inline int check_valid_pointer(st
return 1;
 }
 
+struct kmem_cache_ops slub_default_ops = {
+};
+
 /*
  * Slow version of get and set free pointer.
  *
@@ -2003,11 +2006,13 @@ static int calculate_sizes(struct kmem_c
 static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags,
const char *name, size_t size,
size_t align, unsigned long flags,
-   void (*ctor)(void *, struct kmem_cache *, unsigned long))
+   void (*ctor)(void *, struct kmem_cache *, unsigned long),
+   const struct kmem_cache_ops *ops)
 {
memset(s, 0, kmem_size);
s->name = name;
s->ctor = ctor;
+   s->ops = ops;
s->objsize = size;
s->flags = flags;
s->align = align;
@@ -2191,7 +2196,7 @@ static struct kmem_cache *create_kmalloc
 
 	down_write(&slub_lock);
 	if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN,
-			flags, NULL))
+			flags, NULL, &slub_default_ops))
 		goto panic;
 
 	list_add(&s->list, &slab_caches);
@@ -2505,12 +2510,16 @@ static int slab_unmergeable(struct kmem_
if (s->ctor)
return 1;
 
+	if (s->ops != &slub_default_ops)
+		return 1;
+
return 0;
 }
 
 static struct kmem_cache *find_mergeable(size_t size,
size_t align, unsigned long flags,
-   void (*ctor)(void *, struct kmem_cache *, unsigned long))
+   void (*ctor)(void *, struct kmem_cache *, unsigned long),
+   const struct kmem_cache_ops *ops)
 {
struct list_head *h;
 
@@ -2520,6 +2529,9 @@ static struct kmem_cache *find_mergeable
if (ctor)
return NULL;
 
+	if (ops != &slub_default_ops)
+		return NULL;
+
size = ALIGN(size, sizeof(void *));
align = calculate_alignment(flags, align, size);
size = ALIGN(size, align);
@@ -2555,13 +2567,15 @@ static struct kmem_cache *find_mergeable
 struct kmem_cache *kmem_cache_create(const char *name, size_t size,
size_t align, unsigned long flags,
void (*ctor)(void *, struct kmem_cache *, unsigned long),
-   void (*dtor)(void *, struct kmem_cache *, unsigned long))
+   const struct kmem_cache_ops *ops)
 {
struct kmem_cache 

[patch 2/3] SLUB: Implement targeted reclaim and partial list defragmentation

2007-05-16 Thread clameter
Targeted reclaim makes it possible to target a single slab for reclaim. This
is done by calling

kmem_cache_vacate(page);

It will return 1 on success, 0 if the operation failed.

The vacate functionality is also used for slab shrinking. During the shrink
operation SLUB will generate a list sorted by the number of objects in use.

We extract pages off that list that are less than a quarter full. These
pages are then processed using kmem_cache_vacate.

In order for a slabcache to support this functionality a couple of functions
must be defined via kmem_cache_ops. These are

int get(struct kmem_cache *s, void *)

Must obtain a reference to the indicated object. SLUB guarantees that
the object is still allocated. However, another thread may be blocked
in slab_free attempting to free the same object. It may succeed as
soon as get() returns to the slab allocator. The function must
detect this situation and return 1 if that is the case.
If the object cannot be freed then a negative -Exx code must be
returned indicating the reason for the failure.
get() returns 0 on success.

No slab operations may be performed in get(). Interrupts
are disabled. What can be done is very limited. The slab lock
for the page with the object is taken. Any attempt to perform a slab
operation may lead to a deadlock.

void put(struct kmem_cache *, void *)

Used to restore the reference count obtained by get() if the reclaim
logic decides to abandon the attempt to vacate all objects in a slab.
This is usually the case if get() indicates that an object is not
freeable.
put() is optional. If it is not defined then it is assumed that we
can simply abandon get()s on slab objects.

int kick(struct kmem_cache *, void *)

After SLUB has established references to the remaining objects in a
slab it will drop all locks and then use kick() on each of the
objects. The existence of the object is guaranteed by virtue of the
earlier obtained reference. The callback may perform any slab operation
since no locks are held at the time of call.
The function must return 0 if the object was successfully freed.
Return -Exxx to indicate that the object is not freeable and to stop
further attempts to free objects in this slab.

The callback should remove the object from the slab in some way. This
may be accomplished by reclaiming the object and then running
kmem_cache_free() or reallocating it and then running
kmem_cache_free(). Reallocation is advantageous because the partial
slabs were just sorted to have the partial slabs with the most objects
first. Allocation is likely to result in filling up a slab so that
it can be removed from the partial list.

void sync(void)

After all objects have been removed by kick()s this function will be
called to ensure that all free operations have completed. Typically
the function called here is synchronize_rcu() if the slab cache uses
RCU to free objects. The function is optional. If it is not specified
then no synchronization is done before removing the slab.

If a kmem_cache_vacate on a page fails then the slab usually has a pretty
low usage ratio. Go through the slab and resequence the freelist so that
object addresses increase as we allocate objects. This will trigger the
cacheline prefetcher and increase allocation speed.
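
The get()/put()/kick()/sync() calling sequence described above can be
sketched in user space as follows. This is only an illustration of the
protocol, not the code in mm/slub.c: struct cache_ops, vacate(),
MAX_OBJS and the toy struct obj with its callbacks are hypothetical, the
struct kmem_cache * argument is dropped, and releasing the remaining
references when kick() fails is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space mirror of the callback table. */
struct cache_ops {
    int  (*get)(void *obj);   /* 0: reference taken, 1: being freed, <0: not freeable */
    void (*put)(void *obj);   /* optional: undo a successful get() */
    int  (*kick)(void *obj);  /* 0: object freed, <0: stop */
    void (*sync)(void);       /* optional: wait for deferred frees */
};

#define MAX_OBJS 64

/* Returns 1 if every object was vacated, 0 otherwise. */
static int vacate(const struct cache_ops *ops, void **objs, size_t n)
{
    int held[MAX_OBJS];
    size_t i, j;

    assert(n <= MAX_OBJS);

    /* Phase 1 (slab lock held, interrupts off in the kernel): pin objects. */
    for (i = 0; i < n; i++) {
        int rc = ops->get(objs[i]);

        if (rc < 0) {
            /* Unmovable object: drop the references taken so far. */
            for (j = 0; j < i; j++)
                if (held[j] && ops->put)
                    ops->put(objs[j]);
            return 0;
        }
        held[i] = (rc == 0);  /* rc == 1: a concurrent free will finish it */
    }

    /* Phase 2 (no locks held): ask the cache to get rid of each object. */
    for (i = 0; i < n; i++) {
        if (held[i] && ops->kick(objs[i]) < 0) {
            /* Assumption: release the references not yet kicked. */
            for (j = i + 1; j < n; j++)
                if (held[j] && ops->put)
                    ops->put(objs[j]);
            return 0;
        }
    }

    /* Phase 3: let deferred frees (e.g. RCU) complete if the cache needs it. */
    if (ops->sync)
        ops->sync();
    return 1;
}

/* Toy object and callbacks used to exercise the sketch. */
struct obj { int refs; int pinned; int freed; };

static int obj_get(void *p)
{
    struct obj *o = p;

    if (o->pinned)
        return -EBUSY;        /* in use elsewhere: cannot be freed */
    o->refs++;
    return 0;
}

static void obj_put(void *p)
{
    ((struct obj *)p)->refs--;
}

static int obj_kick(void *p)
{
    struct obj *o = p;

    o->refs--;                /* drop the reference taken by obj_get() */
    o->freed = 1;
    return 0;
}

static const struct cache_ops obj_ops = { obj_get, obj_put, obj_kick, NULL };
```

Calling vacate(&obj_ops, objs, n) on an array of unpinned objects frees
all of them and returns 1; a single pinned object makes it undo the
references already taken and return 0.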

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slab.h |   34 +
 mm/slab.c|9 +
 mm/slob.c|9 +
 mm/slub.c|  304 +--
 4 files changed, 346 insertions(+), 10 deletions(-)

Index: slub/include/linux/slab.h
===
--- slub.orig/include/linux/slab.h  2007-05-16 22:12:43.0 -0700
+++ slub/include/linux/slab.h   2007-05-16 22:12:44.0 -0700
@@ -39,6 +39,39 @@ void __init kmem_cache_init(void);
 int slab_is_available(void);
 
 struct kmem_cache_ops {
+   /*
+* Called with slab lock held and interrupts disabled.
+* No slab operation may be performed.
+*
+* Return 0 if reference was successfully obtained
+* Return 1 if a concurrent kmem_cache_free is waiting to free object
+* Return -errcode if it is not possible to free the object.
+*  No reference was obtained.
+*/
+   int (*get)(struct kmem_cache *, void *);
+
+   /*
+* Use to restore the reference count if we abandon the
+* attempt to vacate a slab page due to an unmovable
+* object.
+*/
+   void (*put)(struct kmem_cache *, void *);
+
+   /*
+* Called with no locks held and interrupts enabled.
+* Any operation may be 

[patch 0/3] Slab Defrag / Slab Targeted Reclaim

2007-05-16 Thread clameter
Initial support for Slab defragmentation and targeted reclaim. The
functionality here is minimal. We establish a slab API to allow removal
or moving of objects between slabs.

The only user provided here is a dentry cache reclaim capability. This is
limited to the removal of unused dentries for now. It is planned to later
add a similar inode reclaim capability and then extend the move/reclaim
to support moving of dentries and inodes.

Slab defragmentation is performed during kmem_cache_shrink. This is usually
triggered through the slab shrinkers but can also be manually triggered
through the slabinfo command.

Support is provided for antifrag/defrag to evict a specific slab page
through the kmem_cache_vacate function call. Since we can only reclaim
unused dentries for now, that functionality is pretty limited (we need to
put some work into making dentries and inodes more reclaimable or movable),
but we can increase the capabilities over time, which will allow us to move
slabs from the reclaimable area into the movable area. This will shrink
the reclaimable area significantly. Since we can target the vacating of
pages, this may allow the antifrag code to remove a page that hinders the
freeing of a higher-order page.



Re: usb-scanner-cameras kernel-2.6.22 and udev-095 problem

2007-05-16 Thread Greg KH
On Wed, May 16, 2007 at 02:58:22PM -0500, [EMAIL PROTECTED] wrote:
>  greg
> 
>  CONFIG_SYSFS_DEPRECATED=Y
> 
>  check that we have
>  rwxrwxrwx 1 root root   15 May 16 13:43 scanner-usbdev2.12 -> 
>  bus/usb/002/012
> 
>  in /dev directory after usb-scanner connection for kernel-2.6.20
>  we don't have this for kernel-2.6.22

Who creates that?  udev?  What udev rule does that?

Can you run 'udevtest' to see what is supposed to be matching here?
Odds are you have a broken rule somehow.


> 
>  2.6.20 usb-scanner connect
> 
>  # /usr/sbin/udevmonitor
> 
>  UEVENT[1179340996.886805] 
>  add@/devices/pci:00/:00:0b.1/usb2/2-2/2-2.2
>  UEVENT[1179340996.886864] add@/class/usb_endpoint/usbdev2.12_ep00
>  UEVENT[1179340996.887438] 
>  add@/devices/pci:00/:00:0b.1/usb2/2-2/2-2.2/2-2.2:1.0
>  UEVENT[1179340996.887467] add@/class/usb_endpoint/usbdev2.12_ep81
>  UEVENT[1179340996.887484] add@/class/usb_endpoint/usbdev2.12_ep02
>  UEVENT[1179340996.887499] add@/class/usb_endpoint/usbdev2.12_ep83
>  UEVENT[1179340996.887514] add@/class/usb_device/usbdev2.12
>  UDEV  [1179340996.921506] 
>  add@/devices/pci:00/:00:0b.1/usb2/2-2/2-2.2
>  UDEV  [1179340996.936005] add@/class/usb_endpoint/usbdev2.12_ep00
>  UDEV  [1179340996.960144] add@/class/usb_endpoint/usbdev2.12_ep81
>  UDEV  [1179340996.963672] add@/class/usb_endpoint/usbdev2.12_ep02
>  UDEV  [1179340996.967439] add@/class/usb_endpoint/usbdev2.12_ep83
>  UDEV  [1179340997.240934] 
>  add@/devices/pci:00/:00:0b.1/usb2/2-2/2-2.2/2-2.2:1.0
>  UDEV  [1179340997.473142] add@/class/usb_device/usbdev2.12

This last device is correct, and what your udev rule should be using to
create your symlink.

You didn't answer my, "what distro are you using" question.  Also, what
package created the udev rule that creates the above symlink?

thanks,

greg k-h



RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Jeff Zheng

> What is the nature of the corruption?  Is it data in a file 
> that is wrong when you read it back, or does the filesystem 
> metadata get corrupted?
The corruption is in fs metadata. jfs is completely destroyed: after
umount, fsck does not recognize it as jfs anymore. Xfs gives a kernel
crash, but still seems recoverable.
> 
> Can you try the configuration that works, and sha1sum the 
> files after you have written them to make sure that they 
> really are correct?
We have verified the data on the working configuration: we have written
around 900 identical 10G files and verified that the md5sum is actually
the same. The verification took two days though :)

> My thought here is "maybe there is a bad block on one device, 
> and the block is used for data in the 'working' config, and 
> for metadata in the 'broken' config.
> 
> Can you try a degraded raid10 configuration. e.g.
> 
>mdadm -C /dev/md1 --level=10 --raid-disks=4 /dev/first missing \
>/dev/second missing
> 
> That will lay out the data in exactly the same place as with 
> raid0, but will use totally different code paths to access 
> it.  If you still get a problem, then it isn't in the raid0 code.

I will try this later today, as I'm now trying different sizes of the
component.
3.4T seems to be working. Testing 4.1T right now.

> Maybe try version 1 metadata (mdadm --metadata=1).  I doubt 
> that would make a difference, but as I am grasping at straws 
> already, it may be a straw worth trying.

Well, the problem may also be in the 3ware disk array, or the disk array
driver. The guy complaining about the same problem is also using a 3ware
disk array controller. But there is no way to verify that, and a single
disk array has been working fine for us.

Jeff


Re: user pointers and race conditions

2007-05-16 Thread David Miller
From: sk b <[EMAIL PROTECTED]>
Date: Wed, 16 May 2007 22:56:22 -0600

> 3:if (!access_ok(VERIFY_READ,stp,sizeof(struct st)))
> 4:return;
> 5:if (!access_ok(VERIFY_WRITE,stp->u,sizeof(int)))
> 6:return;

This code would not exist in the kernel, the kernel cannot dereference
stp->u.  The stp->u dereference would silently work on x86 and x86_64
but it would generate an exception on sparc64 and other platforms.

User space accesses must go through the proper copy_from_user(),
copy_to_user(), get_user(), and put_user() interfaces.

It must first copy stp into a local kernel space copy, then it may
inspect the value of stp->u.

And yes sparse would catch this problem in your code, because the
"__user" annotations would catch the illegal "stp->u" dereference.


Re: user pointers and race conditions

2007-05-16 Thread Al Viro
On Wed, May 16, 2007 at 10:56:22PM -0600, sk b wrote:
> 
> Hello,
> 
> I'm wondering whether there is an exploitable TOCTTOU race condition in the 
> way user pointers are handled in the kernel. Consider the following code:
> 
> 1: struct st { int *u; };
> 2: void syscall(struct st * stp) {
> 3:if (!access_ok(VERIFY_READ,stp,sizeof(struct st)))
> 4:return;
> 5:if (!access_ok(VERIFY_WRITE,stp->u,sizeof(int)))

... and there's your bug - direct access to userland data.  The normal
variant is to use accessors (get_user() or copy_from_user()) to fetch
the value of stp->u.  At which point races of the kind you mentioned
take obviously dumb code (explicitly copying the same struct from
userland _twice_).
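
The difference between fetching once and fetching twice can be shown
with a small user-space simulation. Nothing here is kernel API:
fetch_user() is a stand-in for copy_from_user() that also plays the
racing user thread, range_ok() is a stand-in for access_ok(), and
USER_LIMIT, racy_target() and safe_target() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct st { uintptr_t u; };   /* user-supplied pointer, kept as an integer */

/* Hypothetical boundary: addresses below it count as user space. */
#define USER_LIMIT ((uintptr_t)0x1000)

/* Stand-in for access_ok(). */
static int range_ok(uintptr_t addr)
{
    return addr < USER_LIMIT;
}

/*
 * Stand-in for copy_from_user(). After every copy it simulates a racing
 * user thread that overwrites the user-side struct with a kernel address.
 */
static void fetch_user(struct st *dst, struct st *user_src)
{
    assert(dst && user_src);
    memcpy(dst, user_src, sizeof(*dst));
    user_src->u = USER_LIMIT + 0x100;
}

/* Racy: checks the first fetch but uses a second, unchecked fetch. */
static uintptr_t racy_target(struct st *user)
{
    struct st first, second;

    fetch_user(&first, user);
    if (!range_ok(first.u))
        return 0;
    fetch_user(&second, user);
    return second.u;          /* may differ from the value that was checked */
}

/* Safe: one fetch into a local copy, then check and use that same copy. */
static uintptr_t safe_target(struct st *user)
{
    struct st local;

    fetch_user(&local, user);
    if (!range_ok(local.u))
        return 0;
    return local.u;
}
```

Starting from a struct whose u is 0x10, racy_target() ends up with the
attacker's 0x1100, which would fail range_ok(), while safe_target()
returns the checked value 0x10.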



RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Neil Brown
On Wednesday May 16, [EMAIL PROTECTED] wrote:
> On Thu, 17 May 2007, Neil Brown wrote:
> 
> > On Thursday May 17, [EMAIL PROTECTED] wrote:
> >>
> >>> The only difference of any significance between the working
> >>> and non-working configurations is that in the non-working,
> >>> the component devices are larger than 2Gig, and hence have
> >>> sector offsets greater than 32 bits.
> >>
> >> Do u mean 2T here?, but in both configuartion, the component devices are
> >> larger than 2T (2.25T&5.5T).
> >
> > Yes, I meant 2T, and yes, the components are always over 2T.
> 
> 2T decimal or 2T binary?
> 

Either.  The smallest is actually 2.75T (typo above).
Precisely it was
  2929641472  kilobytes
or 
  5859282944 sectors
or 
  0x15D3D9000 sectors.

So it is over 32bits already...

Uhm, I just noticed something.
'chunk' is unsigned long, and when it gets shifted up, we might lose
bits.  That could still happen with the 4*2.75T arrangement, but is
much more likely in the 2*5.5T arrangement.

Jeff, can you try this patch?

Thanks.
NeilBrown


Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/raid0.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c
--- .prev/drivers/md/raid0.c2007-05-17 10:33:30.0 +1000
+++ ./drivers/md/raid0.c2007-05-17 15:02:15.0 +1000
@@ -475,7 +475,7 @@ static int raid0_make_request (request_q
x = block >> chunksize_bits;
tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
}
-   rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
+   rsect = ((((sector_t)chunk << chunksize_bits) + zone->dev_offset)<<1)
+ sect_in_chunk;
  
bio->bi_bdev = tmp_dev->bdev;


Re: Convert namespace_sem to a mutex

2007-05-16 Thread Satyam Sharma

On 5/17/07, Bharata B Rao <[EMAIL PROTECTED]> wrote:
> From: Bharata B Rao <[EMAIL PROTECTED]>
>
> namespace_sem is a rwsem. It is acquired as read sem at only one place(used
 ^^

Actually, this ...

> by /proc/mounts, /proc/<pid>/mounts and /proc/<pid>/mountstats). In all other
> cases it is acquired as a write sem. So, as there is not more than one reader
 ^

... does not mean this. Multiple threads could be reading mounts or
mountstats, and otoh mount(2) and umount(2) (which acquire it as write sem)
could be less frequent?

> for this sem, this can be a generic sem (and not rwsem) and if so it can be
> a mutex.



Re: Convert namespace_sem to a mutex

2007-05-16 Thread Al Viro
On Thu, May 17, 2007 at 10:20:41AM +0530, Bharata B Rao wrote:
> From: Bharata B Rao <[EMAIL PROTECTED]>
> 
> namespace_sem is a rwsem. It is acquired as read sem at only one place(used
> by /proc/mounts, /proc//mounts and /proc//mountstats). In all other
> cases it is acquired as a write sem. So, as there is not more than one reader
> for this sem, this can be a generic sem (and not rwsem) and if so it can be
> a mutex.

Except that read accesses outnumber write ones by far and we have no reason
for serializing them against each other.  NAK.
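
The difference can be illustrated with a toy model. This is only single-threaded bookkeeping with hypothetical names (toy_rwsem, toy_mutex and friends); it is not the kernel's rw_semaphore or mutex, and it does no real locking. It just counts how many simultaneous readers each scheme would let in without waiting.

```c
#include <stdbool.h>

/* Toy state only: no atomics, no blocking, single-threaded. */
struct toy_rwsem { int readers; bool writer; };
struct toy_mutex { bool held; };

/* A reader gets in unless a writer currently holds the lock. */
bool toy_read_trylock(struct toy_rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

/* A mutex admits exactly one holder, even if it only reads. */
bool toy_mutex_trylock(struct toy_mutex *m)
{
    if (m->held)
        return false;
    m->held = true;
    return true;
}

/* How many of n simultaneous readers are admitted without waiting? */
int readers_admitted_rwsem(int n)
{
    struct toy_rwsem s = { 0, false };
    int admitted = 0;

    for (int i = 0; i < n; i++)
        if (toy_read_trylock(&s))
            admitted++;
    return admitted;
}

int readers_admitted_mutex(int n)
{
    struct toy_mutex m = { false };
    int admitted = 0;

    for (int i = 0; i < n; i++)
        if (toy_mutex_trylock(&m))
            admitted++;
    return admitted;
}
```

In this model four concurrent readers of the mount list are all admitted under the rwsem, but only one under the mutex; the other three would have to wait even though none of them modifies anything.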


user pointers and race conditions

2007-05-16 Thread sk b

Hello,

I'm wondering whether there is an exploitable TOCTTOU race condition in the way 
user pointers are handled in the kernel. Consider the following code:

1: struct st { int *u; };
2: void syscall(struct st * stp) {
3:     if (!access_ok(VERIFY_READ, stp, sizeof(struct st)))
4:         return;
5:     if (!access_ok(VERIFY_WRITE, stp->u, sizeof(int)))
6:         return;
7:     foo();   // user app writes a kernel address to stp->u
8:     *(stp->u) = 0;
9: }

Suppose syscall is some system call and, thus, stp and stp->u are user 
pointers. The function checks the stp and stp->u pointers using the access_ok 
macro on lines 3 and 5. Also suppose that the call to foo on line 7  takes a 
non-trivial amount of time to execute. During the time it takes foo to execute, 
the user application writes a kernel address to stp->u. Note that this write 
occurs after the check on line 5. Then, on line 8, the kernel writes to stp->u 
which contains a kernel address. So, the user application could force the 
kernel to overwrite itself. Is it possible to exploit this race condition? If 
so, does Sparse check for this?
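
For illustration, the double-fetch pattern in question and the usual way around it (snapshot the user structure once, then check and use only the snapshot) can be modelled in user space. Everything below is a hypothetical sketch: user_space, kernel_secret and model_access_ok() are stand-ins invented for the example, and memcpy() stands in for copy_from_user(). In real kernel code the accessors such as copy_from_user() and put_user() perform the range check as part of the access itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct st { int *u; };

/* Hypothetical model: this buffer stands for user space; anything
 * outside it stands for kernel memory. */
int user_space[16];
int kernel_secret = 42;

/* Stand-in for access_ok(): is [p, p+n) inside user_space? */
int model_access_ok(const void *p, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)user_space;
    uintptr_t hi = lo + sizeof(user_space);

    return a >= lo && a <= hi && n <= hi - a;
}

/* Racy shape: check stp->u, then fetch it again for the write. */
int racy_store(struct st *stp, void (*foo)(struct st *))
{
    if (!model_access_ok(stp->u, sizeof(int)))
        return -1;
    foo(stp);            /* window in which stp->u may change */
    *(stp->u) = 0;       /* second fetch: value was never checked */
    return 0;
}

/* Fetch-once shape: copy, check the copy, use the copy. */
int fetch_once_store(struct st *stp, void (*foo)(struct st *))
{
    struct st local;

    memcpy(&local, stp, sizeof(local));  /* models copy_from_user() */
    if (!model_access_ok(local.u, sizeof(int)))
        return -1;
    foo(stp);            /* later changes to stp->u no longer matter */
    *local.u = 0;
    return 0;
}

/* Models the user thread swapping the pointer during foo(). */
void swap_to_kernel(struct st *stp)
{
    stp->u = &kernel_secret;
}

/* Runs one variant and reports what happened to kernel_secret. */
int demo(int (*store)(struct st *, void (*)(struct st *)))
{
    struct st s = { &user_space[3] };

    kernel_secret = 42;
    store(&s, swap_to_kernel);
    return kernel_secret;
}
```

In this model demo(racy_store) leaves kernel_secret overwritten with 0, while demo(fetch_once_store) leaves it at 42.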

-SKB


RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread david

On Thu, 17 May 2007, Neil Brown wrote:

> On Thursday May 17, [EMAIL PROTECTED] wrote:
>>
>>> The only difference of any significance between the working
>>> and non-working configurations is that in the non-working,
>>> the component devices are larger than 2Gig, and hence have
>>> sector offsets greater than 32 bits.
>>
>> Do u mean 2T here?, but in both configuartion, the component devices are
>> larger than 2T (2.25T&5.5T).
>
> Yes, I meant 2T, and yes, the components are always over 2T.

2T decimal or 2T binary?

> So I'm
> at a complete loss.  The raid0 code follows the same paths and does
> the same things and uses 64bit arithmetic where needed.
>
> So I have no idea how there could be a difference between these two
> cases.
>
> I'm at a loss...
>
> NeilBrown


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread H. Peter Anvin
Linus Torvalds wrote:
> 
> On Wed, 16 May 2007, H. Peter Anvin wrote:
>> It gets turned on by the code in arch/i386/kernel/cpu.  It's just that
>> the new code that Andi added runs during setup, i.e. in real mode, so
>> *way* earlier than that.
> 
> Ahh. Do we really need it that early?

The reason to do it early is so that we can still get a message out if
the CPU doesn't have the necessary features.  This is generic code and
not specific to CX8.

Since I'm rewriting the setup code in C, I have added code to enable
features on VIA and Transmeta CPUs (there was already code in there to
enable features on AMD; Intel isn't known to hide any features other
than PAE on 400 MHz FSB Pentium-M.)

I think the early feature detection makes good sense.  It's a heckuva
lot nicer to get a message on your screen saying that you can't boot
this kernel on this CPU than a crash, or an early_printk which may never
actually get to you.
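
As a rough sketch of what such an early check boils down to: compare the feature bits the CPU reports against the set the kernel was built to require, and refuse to continue if any are missing. The bit positions below are the architectural CPUID leaf 1 EDX positions; the function names are hypothetical and this is not the actual setup code.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits (architectural bit positions). */
#define FEAT_FPU   (1u << 0)
#define FEAT_PAE   (1u << 6)
#define FEAT_CX8   (1u << 8)
#define FEAT_CMOV  (1u << 15)

/* Returns the required bits that the CPU does not report. */
uint32_t missing_features(uint32_t cpu_edx, uint32_t required)
{
    return required & ~cpu_edx;
}

/* Hypothetical early check: 0 means it is safe to keep booting,
 * -1 means a message should be printed and the boot stopped.
 * If missing_out is non-null it receives the missing bits. */
int early_cpu_check(uint32_t cpu_edx, uint32_t required,
                    uint32_t *missing_out)
{
    uint32_t missing = missing_features(cpu_edx, required);

    if (missing_out)
        *missing_out = missing;
    return missing ? -1 : 0;
}
```

A CPU that hides CX8 until a vendor-specific enable step has run would fail this check if the check runs before that step, which is why the enabling code has to move earlier along with the detection.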

-hpa


Convert namespace_sem to a mutex

2007-05-16 Thread Bharata B Rao
From: Bharata B Rao <[EMAIL PROTECTED]>

namespace_sem is a rwsem. It is acquired as read sem at only one place(used
by /proc/mounts, /proc//mounts and /proc//mountstats). In all other
cases it is acquired as a write sem. So, as there is not more than one reader
for this sem, this can be a generic sem (and not rwsem) and if so it can be
a mutex.

Patch is for 2.6.22-rc1-mm1.

Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
---
 fs/namespace.c |   48 
 1 files changed, 24 insertions(+), 24 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -37,7 +37,7 @@ static int event;
 static struct list_head *mount_hashtable __read_mostly;
 static int hash_mask __read_mostly, hash_bits __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
-static struct rw_semaphore namespace_sem;
+static struct mutex namespace_mutex;
 
 int nr_user_mounts;
 int max_user_mounts = 1024;
@@ -396,7 +396,7 @@ static void *m_start(struct seq_file *m,
struct list_head *p;
loff_t l = *pos;
 
-   down_read(_sem);
+   mutex_lock(_mutex);
list_for_each(p, >list)
if (!l--)
return list_entry(p, struct vfsmount, mnt_list);
@@ -413,7 +413,7 @@ static void *m_next(struct seq_file *m, 
 
 static void m_stop(struct seq_file *m, void *v)
 {
-   up_read(_sem);
+   mutex_unlock(_mutex);
 }
 
 static inline void mangle(struct seq_file *m, const char *s)
@@ -683,7 +683,7 @@ static int do_umount(struct vfsmount *mn
return retval;
}
 
-   down_write(_sem);
+   mutex_lock(_mutex);
spin_lock(_lock);
event++;
 
@@ -696,7 +696,7 @@ static int do_umount(struct vfsmount *mn
spin_unlock(_lock);
if (retval)
security_sb_umount_busy(mnt);
-   up_write(_sem);
+   mutex_unlock(_mutex);
release_mounts(_list);
return retval;
 }
@@ -1002,12 +1002,12 @@ static int do_change_type(struct nameida
if (nd->dentry != nd->mnt->mnt_root)
return -EINVAL;
 
-   down_write(_sem);
+   mutex_lock(_mutex);
spin_lock(_lock);
for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
change_mnt_propagation(m, type);
spin_unlock(_lock);
-   up_write(_sem);
+   mutex_unlock(_mutex);
return 0;
 }
 
@@ -1030,7 +1030,7 @@ static int do_loopback(struct nameidata 
if (err)
return err;
 
-   down_write(_sem);
+   mutex_lock(_mutex);
err = -EINVAL;
if (IS_MNT_UNBINDABLE(old_nd.mnt))
goto out;
@@ -1062,7 +1062,7 @@ static int do_loopback(struct nameidata 
}
 
 out:
-   up_write(_sem);
+   mutex_unlock(_mutex);
path_release(_nd);
return err;
 }
@@ -1124,7 +1124,7 @@ static int do_move_mount(struct nameidat
if (err)
return err;
 
-   down_write(_sem);
+   mutex_lock(_mutex);
while (d_mountpoint(nd->dentry) && follow_down(>mnt, >dentry))
;
err = -EINVAL;
@@ -1176,7 +1176,7 @@ static int do_move_mount(struct nameidat
 out1:
mutex_unlock(>dentry->d_inode->i_mutex);
 out:
-   up_write(_sem);
+   mutex_unlock(_mutex);
if (!err)
path_release(_nd);
path_release(_nd);
@@ -1238,7 +1238,7 @@ int do_add_mount(struct vfsmount *newmnt
 {
int err;
 
-   down_write(_sem);
+   mutex_lock(_mutex);
/* Something was mounted here while we slept */
while (d_mountpoint(nd->dentry) && follow_down(>mnt, >dentry))
;
@@ -1267,11 +1267,11 @@ int do_add_mount(struct vfsmount *newmnt
list_add_tail(>mnt_expire, fslist);
spin_unlock(_lock);
}
-   up_write(_sem);
+   mutex_unlock(_mutex);
return 0;
 
 unlock:
-   up_write(_sem);
+   mutex_unlock(_mutex);
mntput(newmnt);
return err;
 }
@@ -1337,9 +1337,9 @@ static void expire_mount_list(struct lis
get_mnt_ns(ns);
 
spin_unlock(_lock);
-   down_write(_sem);
+   mutex_lock(_mutex);
expire_mount(mnt, mounts, );
-   up_write(_sem);
+   mutex_unlock(_mutex);
release_mounts();
mntput(mnt);
put_mnt_ns(ns);
@@ -1612,12 +1612,12 @@ static struct mnt_namespace *dup_mnt_ns(
init_waitqueue_head(_ns->poll);
new_ns->event = 0;
 
-   down_write(_sem);
+   mutex_lock(_mutex);
/* First pass: copy the tree topology */
new_ns->root = copy_tree(mnt_ns->root, mnt_ns->root->mnt_root,
CL_COPY_ALL | CL_EXPIRE, 0);
if (IS_ERR(new_ns->root)) {
-   up_write(_sem);
+   mutex_unlock(_mutex);
kfree(new_ns);
return NULL;
}
@@ -1651,7 +1651,7 @@ static struct 

Re: [BUG] (regression) AMD k6-III/450 won't boot w/2.6.22-rc1

2007-05-16 Thread Bob Tracy
Dave Jones wrote:
> Bob, does this patch make it boot again for you?
> 
>   Dave
> 
> Some AMD K6's advertise machine check capability, but don't actually
> have an Intel compatible implementation. It also doesn't actually work,
> so don't advertise it as being present.
> 
> Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

NAK.  No difference.  Identical panic message.  (Yes, I double-checked
to make sure I was booting the patched kernel :-)).

-- 
---
Bob Tracy   WTO + WIPO = DMCA? http://www.anti-dmca.org
[EMAIL PROTECTED]
---


RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Neil Brown
On Thursday May 17, [EMAIL PROTECTED] wrote:
> I tried the patch, same problem show up, but no bug_on report
> 
> Is there any other things I can do?
> 

What is the nature of the corruption?  Is it data in a file that is
wrong when you read it back, or does the filesystem metadata get
corrupted?

Can you try the configuration that works, and sha1sum the files after
you have written them to make sure that they really are correct?
My thought here is "maybe there is a bad block on one device, and the
block is used for data in the 'working' config, and for metadata in
the 'broken' config".

Can you try a degraded raid10 configuration?  E.g.

   mdadm -C /dev/md1 --level=10 --raid-disks=4 /dev/first missing \
   /dev/second missing

That will lay out the data in exactly the same place as with raid0,
but will use totally different code paths to access it.  If you still
get a problem, then it isn't in the raid0 code.

Maybe try version 1 metadata (mdadm --metadata=1).  I doubt that would
make a difference, but as I am grasping at straws already, it may be a
straw worth trying.

NeilBrown


Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

2007-05-16 Thread Ray Lee

On 5/16/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> This issue has actually been resolved, see the patch at:
> http://lkml.org/lkml/2007/5/16/149

Ah, excellent. Thanks!

Ray


Re: [PATCH] x86-64 highres/dyntick support 2.6.22-rc1-v5

2007-05-16 Thread Frank Sorenson
Frank Sorenson wrote:

> After adding *lots* of early_printks, I see that it hangs in
> hpet_is_known(hdp) called from hpet_alloc(), so something in the hpet
> code is still buggy.  Adding nohpet to the kernel command line allows it
> to boot correctly.

Hrm.  Looks like it gets past the hpet_is_known() call.  There's still something
wrong in the hpet detection code, but I didn't get to the bottom of it yet.
I'll do some more debugging to track down where it's really hanging.
Sorry for the noise.

Frank
-- 
Frank Sorenson - KD7TZK
Linux Systems Engineer, DSS Engineering, UBS AG
[EMAIL PROTECTED]


Re: PROBLEM: 2.6.21 - "make modules" with GREP_OPTIONS="-C1" (and other)

2007-05-16 Thread Kok, Auke

Martin Christoph wrote:
> [1] Summary:
> If I have some GREP_OPTIONS set (like -C1 or other) I get several errors
> while trying to do "make modules".
>
> [2] Full description:
> With some GREP_OPTIONS set "make modules" drops several errors like that:
>
> [EMAIL PROTECTED] /usr/src/linux # GREP_OPTIONS="-C1" make modules
>   CHK include/linux/version.h
>   CHK include/linux/utsrelease.h
>   Building modules, stage 2.
> [...]
> WARNING: "aes_enc_blk" [arch/i386/crypto/aes.ko] undefined!
> WARNING: "aes_dec_blk" [arch/i386/crypto/aes.ko] undefined!
> [...]
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
>
> [3] Keywords:
> "make modules", "GREP_OPTIONS", "WARNING", "undefined"
>
> [X.] Suggestion to fix:
> Unset GREP_OPTIONS within make process.

While I admit that this will break the build, I think it's safe to say that
there are hundreds of environment variables that will influence the kbuild
system and makefiles. It's going to be an uphill battle if you want to fix each
and every occurrence of a *possible* build breakage due to an environment
variable being set wrongly.

I think it's perfectly fine for the kbuild system to expect a reasonably sane
and clean build environment. Those who want to set specific variables to
influence their build should be able to do so as well, without getting settings
removed.

In your case, I would suggest not setting this option by default in your
shell ;)

Cheers,

Auke




Re: 2.6.22-rc1-mm1

2007-05-16 Thread Bharata B Rao

On 5/16/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> Andy Whitcroft wrote:
> > Getting this on both x86 and x86_64 boxes, they are the older boxen so
> > likely older compilers:
>
> Please give the gcc version number.
>
> >   CC  arch/x86_64/boot/memory.o
> > arch/i386/boot/memory.c: In function `detect_memory':
> > arch/i386/boot/memory.c:32: error: can't find a register in class `DREG'
> > while reloading `asm'
> >
> > Seems to come from git-netsetup, but that tree isn't pulled into your
> > git version of -mm so I can't be more specific.
>
> Does the following patch work for you?
>
> -hpa
>
>
> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 8a82aa9..d7b250b 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -30,7 +30,7 @@ static int detect_memory_e820(void)
> size = sizeof(struct e820entry);
> id = SMAP;
> asm("int $0x15; setc %0"
> -   : "=dm" (err), "+b" (next), "+d" (id), "+c" (size),
> +   : "=am" (err), "+b" (next), "+d" (id), "+c" (size),
>   "=m" (*desc)
> : "D" (desc), "a" (0xe820));

Observed same problem with gcc version 3.4.4 20050721 (Red Hat
3.4.4-2) and binutils-2.15.92.0.2-15 and the above patch fixes it.

Regards,
Bharata.
--
"Men come and go but mountains remain" -- Ruskin Bond.


Re: [PATCH] make hci_notifier a blocking notifier (was Re: BUG: sleeping function called from invalid context at net/core/sock.c:1523)

2007-05-16 Thread Satyam Sharma

On 5/17/07, Ray Lee <[EMAIL PROTECTED]> wrote:
> Apologies for taking so long to get back to you -- I've been on the
> road for the last week and have finally got to a point where I could
> test the patch.
>
> On 5/6/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> > (Dropped Pavel, Rafael and linux-pm from CC list, this isn't a PM
> > error so don't want to spam them; and added bluez-devel)
> >
> > On 5/7/07, Ray Lee <[EMAIL PROTECTED]> wrote:
> > > On 5/6/07, Alan Stern <[EMAIL PROTECTED]> wrote:
> > > > On Sun, 6 May 2007, Satyam Sharma wrote:
> > > >
> > > > > Anyway, the hci_notifier is called from the following six call sites:
> > > > >
> > > > > hci_dev_open() and hci_dev_close() -> both called from
> > > > > hci_sock_ioctl() => both can sleep
> > > > > hci_register_dev() and hci_unregister_dev() => again both are capable
> > > > > of sleeping
> > > > > hci_suspend_dev() and hci_resume_dev() -> called from the .suspend()
> > > > > and .resume() of the hci_usb_driver, and again both of these can sleep
> > > > >
> > > > > Is there any other reason why hci_notifier must be an atomic notifier?
> > > > >
> > > > > (CC'ing Alan Stern just in case, apparently hci_notifier became atomic
> > > > > when notifier chains were classified into atomic / blocking)
> > > >
> > > > I don't remember exactly why this particular choice was made.  Perhaps we
> > > > found that the notifier callout routines didn't use any blocking
> > > > primitives (we may have been mistaken about this -- there was a lot of
> > > > code to check) and so therefore the choice didn't matter.  In that case we
> > > > probably just decided to make it an atomic notifier to keep things simple.
> > > >
> > > > As you found, changing it to a blocking notifier is very easy.  Provided
> > > > all the callers are non-atomic it should work just fine.
> > >
> > > Okay, I'll go ahead and try the patch, then, and report back.
> >
> > You'd still get the BUG message. To fully resolve the problem, we need
> > to make the hci_sock_dev_event() notifier callout blocking (which
> > happened with this patch) but also convert hci_sk_list.lock to a
> > rwsem, but some users of that rwlock (other than hci_sock_dev_event)
> > are atomic.
> >
> > However, please do try and get back, as your testing would still be
> > helpful to see whether converting hci_notifier to blocking had other
> > side-effects -- if you only see the same message again and otherwise
> > things seem fine, then we're good as far as at least this change was
> > concerned.
>
> Yes, it's roughly the same trace. There are some differences, though
> those are likely due to me finding a new way to trigger the issue. (My
> laptop has a button to turn the WiFi/Bluetooth on and off. Hitting
> that and causing a disconnect of the internal Bluetooth connector
> triggers the same issue without going through a suspend/resume cycle.)


Hi Ray,

This issue has actually been resolved, see the patch at:
http://lkml.org/lkml/2007/5/16/149

[ We've slightly altered the locking scheme, but it's also good to know
that converting hci_notifier to a blocking notifier doesn't cause any
troubles either. If this is fine with other drivers too, this could actually
be a separate patch. ]

I'll also soon send that patch to Andrew, will Cc you too.

Thanks,
Satyam


PROBLEM: 2.6.21 - "make modules" with GREP_OPTIONS="-C1" (and other)

2007-05-16 Thread Martin Christoph
[1] Summary:
If I have some GREP_OPTIONS set (like -C1 or other) I get several errors
while trying to do "make modules".

[2] Full description:
With some GREP_OPTIONS set "make modules" drops several errors like that:

[EMAIL PROTECTED] /usr/src/linux # GREP_OPTIONS="-C1" make modules
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  Building modules, stage 2.
[...]
WARNING: "aes_enc_blk" [arch/i386/crypto/aes.ko] undefined!
WARNING: "aes_dec_blk" [arch/i386/crypto/aes.ko] undefined!
[...]
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

[3] Keywords:
"make modules", "GREP_OPTIONS", "WARNING", "undefined"

[X.] Suggestion to fix:
Unset GREP_OPTIONS within make process.






Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-16 Thread Henrique de Moraes Holschuh
On Wed, 16 May 2007, Rob Landley wrote:
> > But you need to detect if the kernel has proper SCSI device shutdown
> > support, because if it does not, you have to do a cache flush and spindown
> > on shutdown(8) if you can...
> 
> Or (and this is just a thought), you could upgrade your kernel so it correctly
> handles your hardware, treating this just like any other driver bug or other 

The distros can't update kernels that easily on their stable branches.

And in the userland side, we are not breaking things any further for users
of kernels before 2.6.22 anyway.  Don't expect shutdown(8) to remove support
for <2.6.22 any time soon, at least in Debian.  That will happen only when
we are forced by some other reason to completely break compatibility with
such kernels.

> Last I checked you didn't have to spin down a USB flash key.  If SATA is 

But you have to spin down a USB HD...  el-cheapo USB enclosures will NOT do
it for you.  It is not an easy problem.

> SCSI, what the heck is SAS?  (Answer: a cynical marketing hack to bleed SCSI 
> bigots for the huge margins they've always been bled for.  But oh well.)  It 

Actually, in my limited experience, SAS is marginally less crappy than SATA,
and has a higher MTBF, probably because the manufacturers try to cut fewer
corners.  But if one can get high-quality SATA drives (where?!), I don't
know why SAS would be superior to SATA.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Linus Torvalds


On Wed, 16 May 2007, H. Peter Anvin wrote:
> 
> It gets turned on by the code in arch/i386/kernel/cpu.  It's just that
> the new code that Andi added runs during setup, i.e. in real mode, so
> *way* earlier than that.

Ahh. Do we really need it that early?

Now, it's easy enough to just turn off CONFIG_X86_CMPXCHG64 (it really 
should be "8B" instead of "64", but that's another issue) for those 
things, and nobody should really care, but still, maybe we could re-do the 
early bits to be more polite to those VIA CPUs?

I thought the cmpxchg8b stuff was just used for page table setup. Do those 
things even _support_ PAE?

What else uses it? Early setup in real mode? What am I missing? My grep 
powers are waning..

Linus


Re: [PATCH] recalc_sigpending_tsk fixes

2007-05-16 Thread Roland McGrath
> We already discussed this, this is not so important, but how about
> 
>   void recalc_sigpending_and_wake(struct task_struct *t)
>   {
>   int was_pending = signal_pending(t);
> 
>   if (recalc_sigpending_tsk(t) && !was_pending)
>   signal_wake_up(t, 0);
>   }
> 
> ?
> 
> This "was_pending" is more documentation than an optimization.

I don't object, but I think another comment about the wakeup being
sometimes superfluous is enough, if anything.  


Thanks,
Roland


RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Jeff Zheng
I tried the patch; the same problem shows up, but there is no BUG_ON report.

Is there anything else I can do?


Jeff


> Yes, I meant 2T, and yes, the components are always over 2T.  
> So I'm at a complete loss.  The raid0 code follows the same 
> paths and does the same things and uses 64bit arithmetic where needed.
> 
> So I have no idea how there could be a difference between 
> these two cases.  
> 
> I'm at a loss...
> 
> NeilBrown
> 


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread H. Peter Anvin
Linus Torvalds wrote:
> 
> On Thu, 17 May 2007, Christian wrote:
> 
>> Linus Torvalds wrote:
>>> Can you check? The Nehemian (C3-2) should be model 9 or greater.
>> Yes, it's a Nehemiah
> 
> Ok. If so, we should blacklist both MCYRIXIII and MVIAC3_2, I suspect.
> 
>> lola:~ # cat /proc/cpuinfo
>> flags   : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr 
>> sse rng rng_en ace ace_en
> 
> However, it does seem to *claim* to support "cx8" aka cmpxchg8b. What's up 
> with that?

It gets turned on by the code in arch/i386/kernel/cpu.  It's just that
the new code that Andi added runs during setup, i.e. in real mode, so
*way* earlier than that.

-hpa


Re: dock support on thinkpad x60s for DVD/CDROM

2007-05-16 Thread Henrique de Moraes Holschuh
On Wed, 16 May 2007, George Nychis wrote:
> I was wondering if any progress was made on the docking station support for
> new thinkpads, like the x60s ultradock.  I'm looking for support for the
> CD/DVD burner on it.  I tried enabling the most recent dock support in the
> kernel and still nothing. (2.6.22-rc1)

ACPI generic dock driver, or thinkpad-acpi dock driver?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-16 Thread Rob Landley
On Wednesday 16 May 2007 8:58 pm, Henrique de Moraes Holschuh wrote:
> On Wed, 16 May 2007, Rob Landley wrote:
> > Ok, so the change is to get shutdown to _stop_ doing something stupid 
> > (spinning down the disk without first flushing the cache), and the correct 
> > thing for shutdown to do is keep its mitts off the thing and let the kernel
> > power down the darn hardware?
> 
> Yes, for *all* SCSI disk devices, libata or not.

I realize that this time next year it won't be possible to use a ramdisk or a 
network block device without going through the SCSI layer, but while it 
remains an option I'm relishing _not_ using it, thanks.

> But you need to detect if the kernel has proper SCSI device shutdown
> support, because if it does not, you have to do a cache flush and spindown
> on shutdown(8) if you can...

Or (and this is just a thought), you could upgrade your kernel so it correctly 
handles your hardware, treating this just like any other driver bug or other 
lack of proper hardware support in the history of Linux.  (Back when APM 
couldn't power off the machine at the end of the shutdown sequence, did we 
modify shutdown to try to work around this, or did we fix it in the kernel so 
it worked?)

Why does everybody want to shoehorn everything through the SCSI layer, anyway?  
Last I checked you didn't have to spin down a USB flash key.  If SATA is 
SCSI, what the heck is SAS?  (Answer: a cynical marketing hack to bleed SCSI 
bigots for the huge margins they've always been bled for.  But oh well.)  It 
would be hilarious if I didn't have to put up with it renumbering my devices 
and imposing requirements for hardware I haven't got on hardware I have 
got...

Rob


Re: [PATCH 0/5] make slab gfp fair

2007-05-16 Thread Christoph Lameter
On Mon, 14 May 2007, Peter Zijlstra wrote:

> 
> In the interest of creating a reserve based allocator; we need to make the slab
> allocator (*sigh*, all three) fair with respect to GFP flags.
> 
> That is, we need to protect memory from being used by easier gfp flags than it
> was allocated with. If our reserve is placed below GFP_ATOMIC, we do not want a
> GFP_KERNEL allocation to walk away with it - a scenario that is perfectly
> possible with the current allocators.

And the solution is to fail the allocation of the process which tries to 
walk away with it. The failing allocation will lead to the killing of the 
process right?

We already have an OOM killer which potentially kills random processes. We 
hate it.

Could you please modify the patchset to *avoid* failure conditions. This 
patchset here only manages failure conditions. The system should not get 
into the failure conditions in the first place! For that purpose you may 
want to put processes to sleep etc. But in order to do so you need to 
figure out which processes you need to make progress.





RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Neil Brown
On Thursday May 17, [EMAIL PROTECTED] wrote:
> 
> > The only difference of any significance between the working 
> > and non-working configurations is that in the non-working, 
> > the component devices are larger than 2Gig, and hence have 
> > sector offsets greater than 32 bits.
> 
> Do u mean 2T here?, but in both configuartion, the component devices are
> larger than 2T (2.25T&5.5T).

Yes, I meant 2T, and yes, the components are always over 2T.  So I'm
at a complete loss.  The raid0 code follows the same paths and does
the same things and uses 64bit arithmetic where needed.

So I have no idea how there could be a difference between these two
cases.  

I'm at a loss...

NeilBrown
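
For reference, the 2T figure in this exchange is where a 512-byte sector
number stops fitting in 32 bits (2^32 sectors * 512 bytes = 2 TiB). The
following is only a userspace sketch of that arithmetic; the helper names are
made up for illustration and are not taken from the md/raid0 code.

```c
#include <stdint.h>

/* Assumption for this sketch: offsets are counted in 512-byte sectors. */
#define DEMO_SECTOR_SHIFT 9

/* Convert a byte offset into a sector number using 64-bit arithmetic. */
static uint64_t demo_bytes_to_sector(uint64_t bytes)
{
	return bytes >> DEMO_SECTOR_SHIFT;
}

/* Nonzero if the sector number no longer fits in a 32-bit variable. */
static int demo_sector_needs_64bit(uint64_t sector)
{
	return sector > UINT32_MAX;
}

/* What a (hypothetical) 32-bit intermediate would silently keep. */
static uint32_t demo_truncated_sector(uint64_t sector)
{
	return (uint32_t)sector;
}
```

A component device of exactly 2 TiB ends at sector 2^32, so any code path that
stores a sector in 32 bits would wrap at that point.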


Re: [RFC][PATCH] XFS: memory leak in xfs_inactive() - is xfs_trans_free() enough or do we need xfs_trans_cancel() ?

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 11:31:16PM +0200, Jesper Juhl wrote:
> Hi,
> 
> The Coverity checker found a memory leak in xfs_inactive().

> So, the code allocates a transaction, but in the case where 'truncate' is
> !=0 and xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0); happens to return
> an error, we'll just return from the function without dealing with the
> memory allocated by xfs_trans_alloc() and assigned to 'tp', thus it'll be
> orphaned/leaked - not good.

Yeah, introduced by:

http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d3cf209476b72c83907a412b6708c5e498410aa7

Thanks for reporting the problem, Jesper.

> What I'm wondering is this; is it enough, at this point, to call
> xfs_trans_free(tp); (it would seem to me that would be OK, but I'm not
> intimate with this code) or do we need a full xfs_trans_cancel(tp, 0);  ???

xfs_trans_free() is not supposed to be called by anything but the transaction
code (it's static). So an xfs_trans_cancel() would need to be issued.

> In case I'm right and xfs_trans_free(tp); is all we need, then please
> consider the patch below. Otherwise please NACK the patch and I'll cook up
> another one :-)

NACK ;)

xfs_trans_cancel() is needed. Patch below.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/xfs_vnodeops.c |1 +
 1 file changed, 1 insertion(+)

Index: 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_vnodeops.c	2007-05-11 16:04:03.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_vnodeops.c 2007-05-17 12:37:25.671399078 +1000
@@ -1710,6 +1710,7 @@ xfs_inactive(
 
 		error = xfs_itruncate_start(ip, XFS_ITRUNC_DEFINITE, 0);
 		if (error) {
+			xfs_trans_cancel(tp, 0);
 			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 			return VN_INACTIVE_CACHE;
 		}
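
The shape of the fix above (release the transaction on every early return) can
be modelled in userspace. This is only a sketch under stated assumptions: the
demo_* names and the live_transactions counter are hypothetical stand-ins and
are not the XFS interfaces.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a transaction handle. */
struct demo_trans { int dirty; };

static int live_transactions;	/* allocations not yet released */

static struct demo_trans *demo_trans_alloc(void)
{
	struct demo_trans *tp = calloc(1, sizeof(*tp));

	if (tp)
		live_transactions++;
	return tp;
}

static void demo_trans_cancel(struct demo_trans *tp)
{
	if (!tp)
		return;
	live_transactions--;
	free(tp);
}

/*
 * Returns 0 on success, -1 on failure.  'fail_truncate' simulates the
 * step after allocation returning an error.
 */
static int demo_inactive(int fail_truncate)
{
	struct demo_trans *tp = demo_trans_alloc();

	if (!tp)
		return -1;

	if (fail_truncate) {
		demo_trans_cancel(tp);	/* without this, tp would leak */
		return -1;
	}

	/* ... the real work would go here and end in a commit ... */
	demo_trans_cancel(tp);
	return 0;
}
```

After either path, live_transactions should be back at zero; dropping the
cancel call in the error branch leaves it at one.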


[Patch] Allocate sparsemem memmap above 4G on X86_64

2007-05-16 Thread Zou Nan hai
On systems with a huge amount of physical memory, the VFS cache and the
memory memmap may eat all available system memory under 4G, and the system
may then fail to allocate the swiotlb bounce buffer.

There was a fix in arch/x86_64/mm/numa.c, but that fix does not cover the
sparsemem model.
This patch adds the fix to the sparsemem model.

Signed-off-by: Zou Nan hai <[EMAIL PROTECTED]>
Acked-by: Siddha, Suresh <[EMAIL PROTECTED]>
---
 include/asm-x86_64/mmzone.h |5 +
 include/linux/bootmem.h |3 +++
 mm/sparse.c |5 +
 3 files changed, 13 insertions(+)

diff -Nraup a/include/asm-x86_64/mmzone.h b/include/asm-x86_64/mmzone.h
--- a/include/asm-x86_64/mmzone.h   2007-05-17 09:38:02.0 +0800
+++ b/include/asm-x86_64/mmzone.h   2007-05-17 09:54:10.0 +0800
@@ -52,5 +52,10 @@ extern int pfn_valid(unsigned long pfn);
 #define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1uL))
 #endif
 
+#define ARCH_HAS_ALLOC_BOOTMEM_HIGH_NODE 1
+#define alloc_bootmem_high_node(pgdat,size) \
+({__alloc_bootmem_core(pgdat->bdata, size, SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);})
+
+
 #endif
 #endif
diff -Nraup a/include/linux/bootmem.h b/include/linux/bootmem.h
--- a/include/linux/bootmem.h   2007-05-17 09:38:02.0 +0800
+++ b/include/linux/bootmem.h   2007-05-17 09:37:00.0 +0800
@@ -131,5 +131,8 @@ extern void *alloc_large_system_hash(con
 #endif
 extern int hashdist;   /* Distribute hashes across NUMA nodes? */
 
+#ifndef ARCH_HAS_ALLOC_BOOTMEM_HIGH_NODE
+#define alloc_bootmem_high_node(pgdat, size) ({NULL;})
+#endif
 
 #endif /* _LINUX_BOOTMEM_H */
diff -Nraup a/mm/sparse.c b/mm/sparse.c
--- a/mm/sparse.c   2007-05-17 09:38:03.0 +0800
+++ b/mm/sparse.c   2007-05-17 09:54:27.0 +0800
@@ -219,6 +219,11 @@ static struct page __init *sparse_early_
 	if (map)
 		return map;
 
+	map = alloc_bootmem_high_node(NODE_DATA(nid),
+			sizeof(struct page) * PAGES_PER_SECTION);
+	if (map)
+		return map;
+
 	map = alloc_bootmem_node(NODE_DATA(nid),
 			sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)






Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Linus Torvalds


On Thu, 17 May 2007, Christian wrote:

> Linus Torvalds wrote:
> > Can you check? The Nehemiah (C3-2) should be model 9 or greater.
> 
> Yes, it's a Nehemiah

Ok. If so, we should blacklist both MCYRIXIII and MVIAC3_2, I suspect.

> lola:~ # cat /proc/cpuinfo
> flags   : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr sse rng rng_en ace ace_en

However, it does seem to *claim* to support "cx8" aka cmpxchg8b. What's up 
with that?

Linus


[PATCH]x86_64: early_print kernel console should send CRLF not LFCR

2007-05-16 Thread Yinghai Lu
[PATCH]x86_64: early_print kernel console should send CRLF not LFCR

   in 
commit d358788f3f30113e49882187d794832905e42592
Author: Russell King <[EMAIL PROTECTED]>
Date:   Mon Mar 20 20:00:09 2006 +

Glen Turner reported that writing LFCR rather than the more
traditional CRLF causes issues with some terminals.

Since this afflicts many serial drivers, extract the common code
to a library function (uart_console_write) and arrange for each
driver to supply a "putchar" function.

but early_printk is left out. 

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Russell King <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/kernel/early_printk.c b/arch/x86_64/kernel/early_printk.c
index 56eaa25..296d2b0 100644
--- a/arch/x86_64/kernel/early_printk.c
+++ b/arch/x86_64/kernel/early_printk.c
@@ -91,9 +91,9 @@ static int early_serial_putc(unsigned char ch)
 static void early_serial_write(struct console *con, const char *s, unsigned n)
 {
 	while (*s && n-- > 0) {
-		early_serial_putc(*s);
 		if (*s == '\n')
 			early_serial_putc('\r');
+		early_serial_putc(*s);
 		s++;
 	}
 }
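
The ordering change can be tried outside the kernel. Below is a minimal
userspace sketch of the CRLF behaviour; demo_putc, demo_out and
demo_console_write are hypothetical names, with a small capture buffer standing
in for the serial port.

```c
#include <stddef.h>

/* Hypothetical capture buffer standing in for the serial port. */
static char demo_out[64];
static size_t demo_len;

static void demo_putc(char ch)
{
	if (demo_len < sizeof(demo_out) - 1)
		demo_out[demo_len++] = ch;
	demo_out[demo_len] = '\0';
}

/* Write up to n bytes of s, emitting CR before every LF (CRLF order). */
static void demo_console_write(const char *s, size_t n)
{
	while (n-- > 0 && *s) {
		if (*s == '\n')
			demo_putc('\r');
		demo_putc(*s);
		s++;
	}
}
```

Writing the three bytes "a\nb" through demo_console_write() leaves 'a', CR, LF,
'b' in the buffer; with the pre-patch ordering the LF would come before the CR.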


Re: [PATCH] Serial 8250: Handle saving the clear-on-read bits from the LSR and MSR

2007-05-16 Thread Gene Heskett
On Wednesday 16 May 2007, Corey Minyard wrote:
>Russell King wrote:
>> On Sun, May 06, 2007 at 12:58:25PM -0400, Gene Heskett wrote:
>>> [long message snipped]
>>>
>>> Thanks for your patience Corey.
>>
>> So, in one sentence or preferably one word, did Corey's patch cause a
>> regression?
>
>From what Gene said, I think the final outcome is that this patch didn't
>seem to make any difference. It looks to me that the problems were
>elsewhere.
>
>So what's the state of this patch?
>
>Thanks,
>
>-corey

Gene here.  My impression was that this patch did help in that it appeared to 
clean up what was thought to be less than optimum code in that area.  There 
were a few times when it didn't seem to take quite as many kills and restarts 
of that ill-coded proprietary daemon to make things behave.

OTOH, I get the very strong impression there is another, more serious buglet 
someplace else that does a pretty good job of masking any black and white 
comparisons one might make about this patch.

Older code, as in 4 or 5 minor kernel versions back, appeared to work 
correctly, either on a serial port or on a usb port with a pl2303, if one could 
tolerate its occasional misfires.  Now, of course, the pl2303 seems to be 
broken; both of the ones I have quit working at all with 2.6.21 final.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
* |Rain| prepares for polygon soup
<|Rain|> sweet merciful crap, it works?
* |Rain| faints


RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Jeff Zheng

> The only difference of any significance between the working 
> and non-working configurations is that in the non-working, 
> the component devices are larger than 2Gig, and hence have 
> sector offsets greater than 32 bits.

Do you mean 2T here? In both configurations, the component devices are
larger than 2T (2.25T and 5.5T).
 
> This does cause a slightly different code path in one place, 
> but I cannot see it making a difference.  But maybe it does.
> 
> What architecture is this running on?
> What C compiler are you using?

I386(i686)
Gcc 4.0.2 20051125, 
Distro is Fedora core, we've tried fc4 and fc6.

> Can you try with this patch?  It is the only thing that I can 
> find that could conceivably go wrong.
> 

OK, I will try the patch and post the result.

Best Regards
Jeff Zheng



Re: [2.6.20.11] File system corruption with degraded md/RAID-5 device

2007-05-16 Thread Neil Brown
On Wednesday May 16, [EMAIL PROTECTED] wrote:
> 
> Here is what I am doing to test:
> 
> fdisk /dev/sda1 and /dev/sdb1 to type fd/Linux raid auto
> mdadm --create /dev/md1 -c 128 -l 5 -n 3 /dev/sda1 /dev/sdb1 missing
> mke2fs -j -b 4096 -R stride=32 /dev/md1
> e2fsck -f /dev/md1
> -
> Result: FAILS - fsck errors (Example: "Inode 3930855 is in use, but
> has dtime set.")

Very odd.  I cannot reproduce this, but then my drives are somewhat
smaller than yours (though I'm not sure how that could be
significant).

Can you try a raid0 across 2 drives?  That would be more like the
raid5 layout than raid1.

My guess is some subtle hardware problem,  as I would be very
surprised if the raid5 code is causing this.  Maybe run memtest86? 

NeilBrown


Re: 2.6.22-rc1-mm1

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 09:41:33AM -0700, Andrew Morton wrote:
> On Wed, 16 May 2007 18:24:44 +0200 Michal Piotrowski <[EMAIL PROTECTED]> 
> wrote:
> 
> > Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/
> > > 
> > 
> > Almost every time when I try to run this script I hit a bug. I'm wondering 
> > why...
> > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc1-mm1/test_mount_fs.sh
> > 
> > [ .713016] kernel BUG at /home/devel/linux-mm/include/linux/mm.h:288!
> 
> static inline int put_page_testzero(struct page *page)
> {
>   VM_BUG_ON(atomic_read(&page->_count) == 0);
>   return atomic_dec_and_test(&page->_count);
> }

I haven't seen that one. I expect that it will be the noaddr buffer allocation
changes that have triggered this...

> > [ .719690] invalid opcode:  [#1]
> > [ .723397] PREEMPT SMP
> > [ .725999] Modules linked in: xfs loop pktgen ipt_MASQUERADE 
> > iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT 
> > nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables 
> > ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc 
> > thermal processor fan container nvram snd_intel8x0 snd_ac97_codec ac97_bus 
> > snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device 
> > snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd soundcore intel_agp 
> > agpgart snd_page_alloc i2c_i801 ide_cd cdrom rtc unix
> > [ .776026] CPU:0
> > [ .776027] EIP:0060:[]Not tainted VLI
> > [ .776028] EFLAGS: 00010202   (2.6.22-rc1-mm1 #3)
> > [ .788519] EIP is at put_page+0x44/0xee
> > [ .792491] eax: 0001   ebx: c549f728   ecx: c04b27e0   edx: 0001
> > [ .799345] esi:    edi: 0080   ebp: d067e9e0   esp: d067e9c8
> > [ .806208] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> > [ .812104] Process mount (pid: 9419, ti=d067e000 task=d00a4070 
> > task.ti=d067e000)
> > [ .819486] Stack: d8980180 0080 d067e9f0 d8980180  0080 
> > d067e9f0 fdc8eda3
> > [ .828103]fffc d8980180 d067ea20 fdc8f7ff fdc9b425 fdc96e5c 
> > 0008 
> > [ .836635]c549dfd0 0200  cd44b8e0 2160 cd44b8e0 
> > d067ea30 fdc78937
> > [ .845253] Call Trace:
> > [ .847939]  [] xfs_buf_free+0x41/0x61 [xfs]
> > [ .853247]  [] xfs_buf_get_noaddr+0x10c/0x118 [xfs]
> > [ .859231]  [] xlog_get_bp+0x65/0x69 [xfs]

Yeah - that trace implies a memory allocation failure when allocating
log buffer pages and the cleanup looks like it does a double free
of the pages that got allocated. Patch attached below that should fix
this problem.

> > [ 6667.271984] XFS: Filesystem loop1 has duplicate UUID - can't mount
> >
> > ...
> >
> > [ 6670.074487] XFS: Filesystem loop1 has duplicate UUID - can't mount
> > [ 6670.240395] XFS: Filesystem loop1 has duplicate UUID - can't mount
> > [ 6670.350305] XFS: Filesystem loop1 has duplicate UUID - can't mount
> > [ 6670.458773] XFS: Filesystem loop1 has duplicate UUID - can't mount

I assume that the thread doing the mount got killed by the BUG and so the
normal error handling path on log mount failure was not executed and hence the
uuid for the filesystem never got removed from the table used to detect
multiple mounts of the same filesystem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/linux-2.6/xfs_buf.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c	2007-05-11 16:03:26.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c	2007-05-17 11:53:40.293585132 +1000
@@ -323,9 +323,16 @@ xfs_buf_free(
for (i = 0; i < bp->b_page_count; i++) {
struct page *page = bp->b_pages[i];
 
-   if (bp->b_flags & _XBF_PAGE_CACHE)
+   /* handle noaddr allocation failure case */
+   if (!page)
+   break;
+
+   if (bp->b_flags & _XBF_PAGE_CACHE) {
ASSERT(!PagePrivate(page));
-   page_cache_release(page);
+   page_cache_release(page);
+   } else {
+   __free_page(page);
+   }
}
_xfs_buf_free_pages(bp);
}
@@ -766,6 +773,8 @@ xfs_buf_get_noaddr(
goto fail;
_xfs_buf_initialize(bp, target, 0, len, 0);
 
+   bp->b_flags |= _XBF_PAGES;
+
error = _xfs_buf_get_pages(bp, page_count, 0);
if (error)
goto fail_free_buf;
@@ -773,15 +782,14 @@ xfs_buf_get_noaddr(
for (i = 0; i < 

dock support on thinkpad x60s for DVD/CDROM

2007-05-16 Thread George Nychis
Hey all,

I was wondering if any progress was made on the docking station support for new
thinkpads, like the x60s ultradock.  I'm looking for support for the CD/DVD
burner on it.  I tried enabling the most recent dock support in the kernel and
still nothing. (2.6.22-rc1)

Thanks!
George


[PATCH] fix kmalloc(0) in arch/ia64/pci/pci.c

2007-05-16 Thread KAMEZAWA Hiroyuki
Fix the following kmalloc(0) message. The patch is against 2.6.22-rc1-mm1.

==
BUG: at mm/slab.c:792 __find_general_cachep()

Call Trace:
 [] show_stack+0x40/0xa0
sp=e14042227b00 bsp=e14042220f90
 [] dump_stack+0x30/0x60
sp=e14042227cd0 bsp=e14042220f78
 [] kmem_find_general_cachep+0x90/0x140
sp=e14042227cd0 bsp=e14042220f48
 [] __kmalloc_node+0x30/0xa0
sp=e14042227cd0 bsp=e14042220f18
 [] pci_acpi_scan_root+0x180/0x4a0
sp=e14042227cd0 bsp=e14042220ed0
 [] acpi_pci_root_add+0x4e0/0x700
sp=e14042227cf0 bsp=e14042220e90
 [] acpi_device_probe+0xa0/0x160
sp=e14042227d10 bsp=e14042220e58
 [] driver_probe_device+0x250/0x380
sp=e14042227d10 bsp=e14042220e18
 [] __driver_attach+0xc0/0x160
sp=e14042227d10 bsp=e14042220de0
 [] bus_for_each_dev+0x80/0x100
sp=e14042227d10 bsp=e14042220da8
 [] driver_attach+0x40/0x60
sp=e14042227d30 bsp=e14042220d88
 [] bus_add_driver+0xf0/0x3c0
sp=e14042227d30 bsp=e14042220d48
 [] driver_register+0x140/0x160
sp=e14042227d30 bsp=e14042220d28
 [] acpi_bus_register_driver+0x50/0x80
sp=e14042227d30 bsp=e14042220d08
 [] acpi_pci_root_init+0x20/0x60
sp=e14042227d30 bsp=e14042220cf0
 [] kernel_init+0x450/0x7c0
sp=e14042227d30 bsp=e14042220ca8
 [] kernel_thread_helper+0x30/0x60
sp=e14042227e30 bsp=e14042220c80
 [] start_kernel_thread+0x20/0x40
sp=e14042227e30 bsp=e14042220c80
==

Fix kmalloc(0)

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc1-mm1/arch/ia64/pci/pci.c
===
--- linux-2.6.22-rc1-mm1.orig/arch/ia64/pci/pci.c
+++ linux-2.6.22-rc1-mm1/arch/ia64/pci/pci.c
@@ -354,6 +354,8 @@ pci_acpi_scan_root(struct acpi_device *d
 
 	acpi_walk_resources(device->handle, METHOD_NAME__CRS, count_window,
 			&windows);
+	if (!windows)
+		goto out2;
 	controller->window = kmalloc_node(sizeof(*controller->window) * windows,
 			GFP_KERNEL, controller->node);
 	if (!controller->window)
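
The guard added above follows a common pattern: check the element count before
asking the allocator for count * size bytes. A minimal userspace sketch is
shown here; the demo_* types and function are hypothetical and do not mirror
the ia64 pci_controller layout.

```c
#include <stdlib.h>

/* Hypothetical controller record used only for this sketch. */
struct demo_window { unsigned long start, end; };
struct demo_controller {
	unsigned int windows;
	struct demo_window *window;
};

/*
 * Returns 0 on success (including "no windows"), -1 on allocation failure.
 * A zero count skips the allocator instead of requesting 0 bytes.
 */
static int demo_alloc_windows(struct demo_controller *c, unsigned int count)
{
	c->windows = 0;
	c->window = NULL;

	if (!count)
		return 0;

	c->window = calloc(count, sizeof(*c->window));
	if (!c->window)
		return -1;

	c->windows = count;
	return 0;
}
```

Whether a zero count should be treated as success or as an error is a policy
choice; the sketch treats it as success with an empty window list.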





Re: radeonfb and X800 cards

2007-05-16 Thread Benjamin Herrenschmidt
On Wed, 2007-05-16 at 21:47 -0400, Daniel Drake wrote:
> Hi,
> 
> Did anything happen to the patch titled "radeonfb: add support for newer 
> cards"?
> http://lwn.net/Articles/215965/
> 
> Jimmy at http://bugs.gentoo.org/174063 has extended upon this with some 
> further fixes based on code in the X11 driver. The patches are on the 
> bug report.
> 
> Ben, where can the most up-to-date radeonfb code be found?

Upstream. I haven't released anything else so far. Does the patch still
apply?

Ben.




Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread H. Peter Anvin
H. Peter Anvin wrote:
> 
> Andi added code to verify that we can actually execute on the processor
> before protected mode (so we can still get a message out through the
> BIOS.)  That code presumably doesn't know of the MSR that needs to be
> touched.
> 
> That code is in assembly in Andi's version, my rewritten version has it
> in C.  I should add this code.
> 

The newsetup tree now has code to unmask features on VIA and Transmeta
(as well as AMD, which was already in there):

http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=commitdiff;h=c9cf55604433b386d0b499ed7bed654fd01c3be2

-hpa


patch jfs-fix-race-waking-up-jfsio-kernel-thread.patch queued to 2.6.21-stable tree

2007-05-16 Thread chrisw

This is a note to let you know that we have just queued up the patch titled

 Subject: JFS: Fix race waking up jfsIO kernel thread

to the 2.6.21-stable tree.  Its filename is

 jfs-fix-race-waking-up-jfsio-kernel-thread.patch

A git repo of this tree can be found at 

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From [EMAIL PROTECTED]  Tue May 15 20:55:43 2007
From: Dave Kleikamp <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: Tue, 15 May 2007 22:53:36 -0500
Message-Id: <[EMAIL PROTECTED]>
Cc: linux-kernel 
Subject: JFS: Fix race waking up jfsIO kernel thread

It's possible for a journal I/O request to be added to the log_redrive
queue and the jfsIO thread to be awakened after the thread releases
log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE.

The jfsIO thread should set the state before giving up the spinlock, so
the waking thread will really wake it.

Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 fs/jfs/jfs_logmgr.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.21.1.orig/fs/jfs/jfs_logmgr.c
+++ linux-2.6.21.1/fs/jfs/jfs_logmgr.c
@@ -2354,12 +2354,13 @@ int jfsIOWait(void *arg)
 			lbmStartIO(bp);
 			spin_lock_irq(&log_redrive_lock);
 		}
-		spin_unlock_irq(&log_redrive_lock);
 
 		if (freezing(current)) {
+			spin_unlock_irq(&log_redrive_lock);
 			refrigerator();
 		} else {
 			set_current_state(TASK_INTERRUPTIBLE);
+			spin_unlock_irq(&log_redrive_lock);
 			schedule();
 			current->state = TASK_RUNNING;
 		}
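
The race described in the changelog can be replayed deterministically. The
sketch below is a single-threaded model, not kernel code: the demo_* names are
hypothetical, and the "waker" is simply called at the point where it could run
once the lock has been dropped.

```c
enum demo_state { DEMO_RUNNING, DEMO_INTERRUPTIBLE };

struct demo_task {
	enum demo_state state;
	int queued;	/* requests on the (hypothetical) redrive queue */
};

/* Waker side: queue a request, then mark the task runnable. */
static void demo_queue_and_wake(struct demo_task *t)
{
	t->queued++;
	t->state = DEMO_RUNNING;
}

/* schedule() model: the task only sleeps if it is not runnable. */
static int demo_would_sleep(const struct demo_task *t)
{
	return t->state != DEMO_RUNNING;
}

/*
 * Replay one interleaving in which the waker runs right after the lock is
 * dropped.  Returns 1 if the task goes to sleep with work still queued.
 */
static int demo_lost_wakeup(int set_state_before_unlock)
{
	struct demo_task t = { DEMO_RUNNING, 0 };

	if (set_state_before_unlock) {
		t.state = DEMO_INTERRUPTIBLE;	/* still under the lock */
		demo_queue_and_wake(&t);	/* waker runs after unlock */
	} else {
		demo_queue_and_wake(&t);	/* waker runs after unlock */
		t.state = DEMO_INTERRUPTIBLE;	/* overwrites the wakeup */
	}
	return demo_would_sleep(&t) && t.queued > 0;
}
```

With the old ordering (state set after the unlock) the model goes to sleep with
a request still queued; with the patched ordering the wakeup is preserved.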


Patches currently in stable-queue which might be from [EMAIL PROTECTED] are

queue-2.6.21/jfs-fix-race-waking-up-jfsio-kernel-thread.patch


radeonfb and X800 cards

2007-05-16 Thread Daniel Drake

Hi,

Did anything happen to the patch titled "radeonfb: add support for newer 
cards"?

http://lwn.net/Articles/215965/

Jimmy at http://bugs.gentoo.org/174063 has extended upon this with some 
further fixes based on code in the X11 driver. The patches are on the 
bug report.


Ben, where can the most up-to-date radeonfb code be found?

Thanks,
Daniel


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Christian
Dave Jones wrote:
> The C3s all have cx8, but it needs to be enabled in an MSR first.
> (See arch/i386/kernel/cpu/centaur.c , search for CX8)
> 
> Did we add code that uses cmpxchg8b before identify_cpu() gets run ?
> I've not been paying attention to .22rc (busy trying to beat .21 into shape 
> for F7)
> so I may have missed something obvious. Andi?
> 
>   Dave
> 

Maybe I gave the wrong reason by blaming the cmpxchg64 instruction,
but disabling CONFIG_X86_CMPXCHG64 helps.


The VIA C3 EBGA datasheet R1.9 tells me this instruction always works:
>The CMPXCHG8B instruction is provided and always enabled, however, it appears
>disabled in the corresponding CPUID function bit 0 to avoid a bug in an early
>version of Windows NT. However, this default can be changed via a bit in the
>FCR MSR.


Hmm, with a little hint I should be able to add a few small "here I am"
markers to my local boot code.

Anyway, I will try tomorrow to find this on my own.
printfs for debugging are friendlier than assembler.

H. Peter Anvin wrote:
> Andi added code to verify that we can actually execute on the processor
> before protected mode (so we can still get a message out through the
> BIOS.)  That code presumably doesn't know of the MSR that needs to be
> touched.


Best regards,

Christian
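
For anyone adding such debug markers: the "cx8" flag in /proc/cpuinfo
corresponds to bit 8 of EDX from CPUID leaf 1. A minimal sketch of the bit test
follows; the demo_* names are hypothetical, and the EDX value is passed in
rather than read from the CPU so the snippet stays portable.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 8 is the CX8 (CMPXCHG8B) feature flag. */
#define DEMO_CPUID_EDX_CX8 (1u << 8)

/* Nonzero if the given leaf-1 EDX value advertises CMPXCHG8B. */
static int demo_edx_has_cx8(uint32_t edx)
{
	return (edx & DEMO_CPUID_EDX_CX8) != 0;
}
```

On a CPU that hides the flag until an MSR bit is set, a check like this made
before the MSR is touched would report the instruction as missing.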


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Christian
Linus Torvalds wrote:
> Can you check? The Nehemiah (C3-2) should be model 9 or greater.

Yes, it's a Nehemiah

lola:~ # cat /proc/cpuinfo
processor   : 0
vendor_id   : CentaurHauls
cpu family  : 6
model   : 9
model name  : VIA Nehemiah
stepping: 8
cpu MHz : 998.732
cache size  : 64 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr sse rng rng_en ace ace_en
bogomips: 1999.51
clflush size: 32


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread H. Peter Anvin
Dave Jones wrote:
> 
> The C3s all have cx8, but it needs to be enabled in an MSR first.
> (See arch/i386/kernel/cpu/centaur.c , search for CX8)
> 
> Did we add code that uses cmpxchg8b before identify_cpu() gets run ?
> I've not been paying attention to .22rc (busy trying to beat .21 into shape 
> for F7)
> so I may have missed something obvious. Andi?
> 

Andi added code to verify that we can actually execute on the processor
before protected mode (so we can still get a message out through the
BIOS.)  That code presumably doesn't know of the MSR that needs to be
touched.

That code is in assembly in Andi's version, my rewritten version has it
in C.  I should add this code.

-hpa


Re: UML doesn't compile in 2.6.21

2007-05-16 Thread Rob Landley
On Wednesday 16 May 2007 6:49 pm, Robert Schwebel wrote:
> Jeff,
> 
> Any idea how this could happen? I'm trying to build 2.6.21 for ARCH=um, and the
> linker stage explodes here:

2.6.21.1 built for me:

tar xvjf linux-2.6.21.1.tar.bz2 &&
cd linux-2.6.21.1 &&
cat > mini.conf << EOF
CONFIG_MODE_SKAS=y
CONFIG_BINFMT_ELF=y
CONFIG_HOSTFS=y
CONFIG_SYSCTL=y
CONFIG_STDERR_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_LBD=y
CONFIG_EXT2_FS=y
CONFIG_PROC_FS=y
EOF
make ARCH=um allnoconfig KCONFIG_ALLCONFIG=mini.conf &&
make ARCH=um &&
./linux rootfstype=hostfs rw init=/bin/sh

Does this not work for you?

Rob


Proposed update of the i386 boot document

2007-05-16 Thread H. Peter Anvin
I have noticed that a few items in the i386 boot document have become a
bit incoherent and/or dated over the years.  Attached is an attempt at
rewriting a few of the sections; the main thing is a field-by-field
description of the setup header.

I would appreciate comments as to whether this makes it easier to follow or not.

-hpa
 THE LINUX/I386 BOOT PROTOCOL
 

H. Peter Anvin <[EMAIL PROTECTED]>
Last update 2007-05-16

On the i386 platform, the Linux kernel uses a rather complicated boot
convention.  This has evolved partially due to historical aspects, as
well as the desire in the early days to have the kernel itself be a
bootable image, the complicated PC memory model and due to changed
expectations in the PC industry caused by the effective demise of
real-mode DOS as a mainstream operating system.

Currently, the following versions of the Linux/i386 boot protocol exist.

Old kernels:	zImage/Image support only.  Some very early kernels
		may not even support a command line.

Protocol 2.00:	(Kernel 1.3.73) Added bzImage and initrd support, as
		well as a formalized way to communicate between the
		boot loader and the kernel.  setup.S made relocatable,
		although the traditional setup area still assumed
		writable.

Protocol 2.01:	(Kernel 1.3.76) Added a heap overrun warning.

Protocol 2.02:	(Kernel 2.4.0-test3-pre3) New command line protocol.
		Lower the conventional memory ceiling.  No overwrite
		of the traditional setup area, thus making booting
		safe for systems which use the EBDA from SMM or 32-bit
		BIOS entry points.  zImage deprecated but still
		supported.

Protocol 2.03:	(Kernel 2.4.18-pre1) Explicitly makes the highest possible
		initrd address available to the bootloader.

Protocol 2.04:	(Kernel 2.6.14) Extend the syssize field to four bytes.

Protocol 2.05:	(Kernel 2.6.20) Make protected mode kernel relocatable.
		Introduce relocatable_kernel and kernel_alignment fields.

Protocol 2.06:	(Kernel 2.6.22) Added a field that contains the size of
		the boot command line.
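
A boot loader would typically gate features on the protocol version it finds
in the setup header. The sketch below assumes the version is packed as
(major << 8) | minor, so that 2.04 reads as 0x0204; that encoding is not stated
in the list above and should be checked against the header description. All
demo_* names are hypothetical.

```c
#include <stdint.h>

/* Pack a protocol version as (major << 8) | minor, e.g. 2.04 -> 0x0204. */
#define DEMO_PROTO(major, minor) ((uint16_t)(((major) << 8) | (minor)))

/* 2.02: new command line protocol. */
static int demo_has_new_cmdline(uint16_t v)
{
	return v >= DEMO_PROTO(2, 2);
}

/* 2.03: highest possible initrd address is exported. */
static int demo_has_initrd_max(uint16_t v)
{
	return v >= DEMO_PROTO(2, 3);
}

/* 2.05: relocatable_kernel and kernel_alignment fields exist. */
static int demo_has_relocatable_fields(uint16_t v)
{
	return v >= DEMO_PROTO(2, 5);
}

/* 2.06: a field holding the command line size exists. */
static int demo_has_cmdline_size(uint16_t v)
{
	return v >= DEMO_PROTO(2, 6);
}
```

Comparing against a packed version keeps each check to a single integer
comparison and lets later minor versions inherit earlier features.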


 MEMORY LAYOUT

The traditional memory map for the kernel loader, used for Image or
zImage kernels, typically looks like:

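        |                        |
0A0000  +------------------------+
        |  Reserved for BIOS     |  Do not use.  Reserved for BIOS EBDA.
09A000  +------------------------+
        |  Command line          |
        |  Stack/heap            |  For use by the kernel real-mode code.
098000  +------------------------+
        |  Kernel setup          |  The kernel real-mode code.
090200  +------------------------+
        |  Kernel boot sector    |  The kernel legacy boot sector.
090000  +------------------------+
        |  Protected-mode kernel |  The bulk of the kernel image.
010000  +------------------------+
        |  Boot loader           |  <- Boot sector entry point 0000:7C00
001000  +------------------------+
        |  Reserved for MBR/BIOS |
000800  +------------------------+
        |  Typically used by MBR |
000600  +------------------------+
        |  BIOS use only         |
000000  +------------------------+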


When using bzImage, the protected-mode kernel was relocated to
0x100000 ("high memory"), and the kernel real-mode block (boot sector,
setup, and stack/heap) was made relocatable to any address between
0x10000 and end of low memory. Unfortunately, in protocols 2.00 and
2.01 the 0x90000+ memory range is still used internally by the kernel;
the 2.02 protocol resolves that problem.

It is desirable to keep the "memory ceiling" -- the highest point in
low memory touched by the boot loader -- as low as possible, since
some newer BIOSes have begun to allocate some rather large amounts of
memory, called the Extended BIOS Data Area, near the top of low
memory.  The boot loader should use the "INT 12h" BIOS call to verify
how much low memory is available.

Unfortunately, if INT 12h reports that the amount of memory is too
low, there is usually nothing the boot loader can do but to report an
error to the user.  The boot loader should therefore be designed to
take up as little space in low memory as it reasonably can.  For
zImage or old bzImage kernels, which need data written into the
0x90000 segment, the boot loader should make sure not to use memory
above the 0x9A000 point; too many BIOSes will break above that point.
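
The ceiling rule above reduces to a small calculation. INT 12h reports
conventional memory in KiB; the sketch below takes that value as a parameter
(it does not issue the BIOS call) and clamps it for old kernels. The demo_*
names are hypothetical.

```c
#include <stdint.h>

/* Old zImage/bzImage layouts should not use memory above this address. */
#define DEMO_OLD_CEILING 0x9A000u

/* Convert the KiB count reported by INT 12h into a byte address. */
static uint32_t demo_low_mem_top(uint16_t int12_kib)
{
	return (uint32_t)int12_kib * 1024u;
}

/* Highest low-memory address the loader should consider usable. */
static uint32_t demo_usable_ceiling(uint16_t int12_kib, int old_kernel)
{
	uint32_t top = demo_low_mem_top(int12_kib);

	if (old_kernel && top > DEMO_OLD_CEILING)
		top = DEMO_OLD_CEILING;
	return top;
}
```

For a machine reporting 640 KiB, the top of low memory is 0xA0000, and the
usable ceiling for an old kernel is clamped to 0x9A000.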

For a modern bzImage kernel with boot protocol version >= 2.02, a
memory layout like the following is suggested:

        ~                        ~
        |  Protected-mode kernel |
100000  +------------------------+
        |  I/O memory hole       |
0A0000  +------------------------+
        |  Reserved for BIOS     |  Leave as 

Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Linus Torvalds


On Thu, 17 May 2007, Christian wrote:
>
> my small VIA C3_2 box does not boot with 2.6.22-rc1.
> It even does not uncompress the kernel.
> 
> The configuration as M386 M486 works. But M586 + MVIAC3_2
> does not work.

Ahh, from the EPIA HOWTO:

13.2. Is the C3 Pentium compatible?

Yes. But Samuel 2, Ezra, Ezra T C3 processors have a problem with 
the cmpxchg8b (i.e. CMOV) opcode. Nehemiah and Antaur processors 
are not affected.

However, that would imply that you don't have a VIA C3-2 (Nehemiah) at 
all, but the older original CyrixIII/VIA C3.

Can you check? The Nehemiah (C3-2) should be model 9 or greater.

So afaik, you should use MCYRIXIII, and make _that_ be the one that 
disables the use of the cmpxchg8b instruction.

Can you please verify?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] powernow-k8: depend on acpi-processor for SMP systems

2007-05-16 Thread Ed Sweetman

Daniel Drake wrote:

Joshua Hoblitt wrote:

I don't think this is quite right either as Ed Sweetman has reported
that this issue doesn't occur on single socket/multi-core systems.


Where did he write that? In an off-list mail, Ed seemed to agree with 
my patch.


Daniel


What I didn't agree with was the dependency on the acpi P-state driver 
for single socket multi-core systems, where in the original post of this 
thread, Joshua was stating that SMP systems required that driver.
Later it was found that the acpi P-state driver was only being used to 
enforce the dependency on the acpi_processor driver, which is the 
actual driver we care about (dependency-wise).

So yes, I do agree with your patch, in so far as my experience with the 
hardware.





Re: [PATCH 2.6.21-rc1-mm1] add check_highest_zone to build_zonelists_in_zone_order

2007-05-16 Thread KAMEZAWA Hiroyuki
On Wed, 16 May 2007 15:57:39 -0400
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> 
> [PATCH 2.6.21-rc1-mm1] add check_highest_zone to build_zonelists_in_zone_order
> 
> We missed this in the "change zone order" series.  We need to record
> the highest populated zone, just as build_zonelists_node() does.
> Memory policies apply only to this zone.  Without this, we'll be
> applying policy to all zones, including DMA, I think.  Not having
> thought about it much, I can't claim to understand the downside of
> doing so.
> 
> Also, display selected "policy zone" during boot or reconfig
> of zonelist order, if 'NUMA.  Inquiring minds [might] want to know...
> 
> Cleanup:  remove stale comment in set_zonelist_order()
> 
> Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>
>
Acked-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

-Kame



Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-16 Thread Henrique de Moraes Holschuh
On Wed, 16 May 2007, Rob Landley wrote:
> Ok, so the change is to get shutdown to _stop_ doing something stupid 
> (spinning down the disk without first flushing the cache), and the correct 
> thing for shutdown to do is keep its mitts off the thing and let the kernel 
> power down the darn hardware?

Yes, for *all* SCSI disk devices, libata or not.

But you need to detect if the kernel has proper SCSI device shutdown
support, because if it does not, you have to do a cache flush and spindown
on shutdown(8) if you can...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [PATCH] libata: implement ata_wait_after_reset()

2007-05-16 Thread Paul Mundt
On Wed, May 16, 2007 at 06:44:53PM +0200, Tejun Heo wrote:
> + /* FIXME: GoVault needs 2s but we can't afford that without
> +  * parallel probing.  800ms is enough for iVDR disk
> +  * HHD424020F7SV00.  Increase to 2secs when parallel probing
> +  * is in place.
> +  */
> + ATA_TMOUT_FF_WAIT   = 4 * HZ / 5,
> +

Changing this to 4 * HZ / 4 gets rid of the occasional COMRESET failure.
So it would seem that 800ms is good enough for the common case, but it
seems to be cutting it pretty close..
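
To make the two values concrete, here is a small sketch of the
arithmetic: 4 * HZ / 5 jiffies is 800ms regardless of the tick rate,
while 4 * HZ / 4 is simply HZ jiffies, i.e. one second. The helper names
are made up for illustration, and HZ is passed in as a parameter here
only so that several tick rates can be checked; in the kernel it is a
build-time constant.

```c
/* Convert a timeout in jiffies to milliseconds for a given tick rate. */
static unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
    return jiffies * 1000u / hz;
}

/* The value from the patch: 4 * HZ / 5 jiffies. */
static unsigned int tmout_patch(unsigned int hz)
{
    return 4 * hz / 5;
}

/* The value tried in this reply: 4 * HZ / 4 jiffies, which equals HZ. */
static unsigned int tmout_tried(unsigned int hz)
{
    return 4 * hz / 4;
}
```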


Re: select(0, ..) is valid ?

2007-05-16 Thread Badari Pulavarty
On Wed, 2007-05-16 at 10:37 -0500, Anton Blanchard wrote:
> Hi Hugh,
> 
> > It's interesting that compat_core_sys_select() shows this kmalloc(0)
> > failure but core_sys_select() does not.  That's because core_sys_select()
> > avoids kmalloc by using a buffer on the stack for small allocations (and
> > 0 sure is small).  Shouldn't compat_core_sys_select() do just the same?
> > Or is SLUB going to be so efficient that doing so is a waste of time?
> 
> Nice catch, the original optimisation from Andi is:
> 
> http://git.kernel.org/git-new/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=70674f95c0a2ea694d5c39f4e514f538a09be36f
> 
> And I think it makes sense for the compat code to do it too.
> 
> Anton

Here it is ..

Should I do one for poll() also ?

Thanks,
Badari

Optimize select by using stack space for small fd sets.
core_sys_select() already has this optimization. This patch
adds it to the compat version.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 fs/compat.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

Index: linux-2.6.22-rc1/fs/compat.c
===
--- linux-2.6.22-rc1.orig/fs/compat.c   2007-05-12 18:45:56.0 -0700
+++ linux-2.6.22-rc1/fs/compat.c2007-05-16 17:50:39.0 -0700
@@ -1544,9 +1544,10 @@ int compat_core_sys_select(int n, compat
compat_ulong_t __user *outp, compat_ulong_t __user *exp, s64 *timeout)
 {
fd_set_bits fds;
-   char *bits;
+   void *bits;
int size, max_fds, ret = -EINVAL;
struct fdtable *fdt;
+   long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
 
if (n < 0)
goto out_nofds;
@@ -1564,11 +1565,14 @@ int compat_core_sys_select(int n, compat
 * since we used fdset we need to allocate memory in units of
 * long-words.
 */
-   ret = -ENOMEM;
size = FDS_BYTES(n);
-   bits = kmalloc(6 * size, GFP_KERNEL);
-   if (!bits)
-   goto out_nofds;
+   bits = stack_fds;
+   if (size > sizeof(stack_fds) / 6) {
+   bits = kmalloc(6 * size, GFP_KERNEL);
+   ret = -ENOMEM;
+   if (!bits)
+   goto out_nofds;
+   }
fds.in  = (unsigned long *)  bits;
fds.out = (unsigned long *) (bits +   size);
fds.ex  = (unsigned long *) (bits + 2*size);
@@ -1600,7 +1604,8 @@ int compat_core_sys_select(int n, compat
compat_set_fd_set(n, exp, fds.res_ex))
ret = -EFAULT;
 out:
-   kfree(bits);
+   if (bits != stack_fds)
+   kfree(bits);
 out_nofds:
return ret;
 }
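
For readers who want the pattern outside the diff, the following is a
minimal userspace sketch of the same idea: start with a small on-stack
buffer, fall back to the heap only when six bitmaps of the requested
size do not fit, and free only what was heap-allocated. STACK_ALLOC,
scratch_demo() and the memset() stand-in are illustrative assumptions,
not kernel code.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_ALLOC 256  /* illustrative; the patch uses SELECT_STACK_ALLOC */

/* Returns 0 if the on-stack buffer was used, 1 if the heap was used,
 * and -1 if the heap allocation failed. */
static int scratch_demo(size_t size)
{
    long stack_buf[STACK_ALLOC / sizeof(long)];
    void *bits = stack_buf;
    int used_heap = 0;

    /* Six bitmaps of `size` bytes each are needed (in/out/ex and results). */
    if (size > sizeof(stack_buf) / 6) {
        bits = malloc(6 * size);
        if (!bits)
            return -1;
        used_heap = 1;
    }

    memset(bits, 0, 6 * size);  /* stand-in for the real work */

    if (bits != stack_buf)      /* never free the stack buffer */
        free(bits);
    return used_heap;
}
```

Note that a size of zero takes the stack path and never reaches the
allocator, which is the case that triggered this thread.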





Re: [PATCH] powernow-k8: depend on acpi-processor for SMP systems

2007-05-16 Thread Daniel Drake

Joshua Hoblitt wrote:

I don't think this is quite right either as Ed Sweetman has reported
that this issue doesn't occur on single socket/multi-core systems.


Where did he write that? In an off-list mail, Ed seemed to agree with my 
patch.


Daniel



Re: [PATCH] libata: implement ata_wait_after_reset()

2007-05-16 Thread Paul Mundt
On Wed, May 16, 2007 at 06:44:53PM +0200, Tejun Heo wrote:
> This patch is against the current libata-dev#upstream +
> pata_scc-fix-build-failure[1].
> 
>   [1] http://article.gmane.org/gmane.linux.kernel/528405
> 
> Paul, please verify this fixes your problem.  You can skip the
> pata_scc patch, it will cause pata_scc part to be rejected but doesn't
> matter.
> 
Yes, this does get iVDR detection working again. The only problem seems
to be that every now and then I end up with this:

scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
ata1: device not ready (errno=-19), forcing hardreset
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 9 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

So at least the drive detection works, but it would be nice not to
trigger this 9-second retry.


Re: Resending: RT patches expose netdev race [was Re: [RFC] [patch 2/2] powerpc 2.6.21-rt1: fix kernel hang and/or panic

2007-05-16 Thread Benjamin Herrenschmidt

> I do not know why sk_buff->head would be null, or
> would be set in a racy kind of way, or why the rt patches
> would cause this. But the evidence implicates that.

Would it be possible that a locking bug in spidernet would cause it
under some circumstances to get a stale skb pointer ?

Ben.




RE: Software raid0 will crash the file-system, when each disk is 5TB

2007-05-16 Thread Neil Brown
On Wednesday May 16, [EMAIL PROTECTED] wrote:
> Here is the information of the created raid0. Hope it is enough.

Thanks.
Everything looks fine here.

The only difference of any significance between the working and
non-working configurations is that in the non-working, the component
devices are larger than 2TB, and hence have sector offsets greater
than 32 bits.

This does cause a slightly different code path in one place, but I
cannot see it making a difference.  But maybe it does.
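
As a back-of-the-envelope check of where sector offsets stop fitting in
32 bits, the sketch below assumes 512-byte sectors, so the boundary
falls at 2^32 sectors, or 2 TiB. The helper names are hypothetical and
this is plain userspace arithmetic, not code from raid0.c.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u  /* bytes per sector, assumed for this sketch */

/* Number of whole 512-byte sectors contained in `bytes` bytes. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes / SECTOR_SIZE;
}

/* Returns 1 if the sector number survives a round trip through a
 * 32-bit variable, 0 if the upper bits would be lost. */
static int fits_in_32_bits(uint64_t sector)
{
    return (uint64_t)(uint32_t)sector == sector;
}
```

A 5TB component device therefore has sector numbers that need more than
32 bits, which is why any intermediate value held in a 32-bit type
would be a suspect.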

What architecture is this running on?
What C compiler are you using?

Can you try with this patch?  It is the only thing that I can find
that could conceivably go wrong.

Thanks,
NeilBrown

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/raid0.c |1 +
 1 file changed, 1 insertion(+)

diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c
--- .prev/drivers/md/raid0.c2007-05-17 10:33:30.0 +1000
+++ ./drivers/md/raid0.c2007-05-17 10:34:02.0 +1000
@@ -461,6 +461,7 @@ static int raid0_make_request (request_q
  
while (block >= (zone->zone_offset + zone->size)) 
zone++;
+   BUG_ON(block < zone->zone_offset);
 
sect_in_chunk = bio->bi_sector & ((chunk_size<<1) -1);
 


Re: 2.6.22-rc1-mm1: boot failure under qemu

2007-05-16 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>   
>> H. Peter Anvin wrote:
>> 
>>> Okay, I've established that this is a bug in the Qemu kernel loader: the
>>> Qemu loader puts zero in the loadflags, which is wrong no matter how you
>>> slice it.
>>>
>>> I have checked in a workaround in the git.newsetup tree; the workaround
>>> is to rely on a compile-time value for load low/load high instead of
>>> looking at loadflags.
>>>   
>>>   
>> Can you post a patch to try?
>>
>> 
>
> Cumulative diff from -rc1-mm1.
>   

Thanks, this works for me.

J


Re: 2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Dave Jones
On Thu, May 17, 2007 at 02:09:16AM +0200, Christian wrote:

 > my small VIA C3_2 box does not boot with 2.6.22-rc1.
 > It even does not uncompress the kernel.
 > 
 > The configuration as M386 M486 works. But M586 + MVIAC3_2
 > does not work.
 > 
 > solution for me, change arch/i386/Kconfig.cpu
 > 
 > --- arch/i386/Kconfig.cpu.before 2007-05-17 01:38:26.0 +0200
 > +++ arch/i386/Kconfig.cpu   2007-05-17 00:54:52.0 +0200
 > @@ -299,5 +299,5 @@
 > 
 >  config X86_CMPXCHG64
 > bool
 > -   depends on !M386 && !M486
 > +   depends on !M386 && !M486 && !MVIAC3_2
 > default y
 > 
 > 
 > The related #ifdef is in ./include/asm-i386/cmpxchg.h
 > May be cmpxchg8b is not supported by VIAC3_2 ?
 > 
 > May be some other non Intel/AMD need to be excluded from X86_CMPXCHG64 ?
 > May be the generic option CONFIG_X86_GENERIC need to switch this off also ?

The C3s all have cx8, but it needs to be enabled in an MSR first.
(See arch/i386/kernel/cpu/centaur.c , search for CX8)

Did we add code that uses cmpxchg8b before identify_cpu() gets run?
I've not been paying attention to .22rc (busy trying to beat .21 into
shape for F7), so I may have missed something obvious. Andi?

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] powernow-k8: depend on acpi-processor for SMP systems

2007-05-16 Thread Dave Jones
On Wed, May 16, 2007 at 02:26:14PM -1000, Joshua Hoblitt wrote:
 > I don't think this is quite right either as Ed Sweetman has reported
 > that this issue doesn't occur on single socket/multi-core systems.

I'm not sure why [*], because this should be preventing it..

        if (num_online_cpus() != 1) {
                printk(KERN_ERR PFX "MP systems not supported by PSB BIOS structure\n");
                kfree(data);
                return -ENODEV;
        }

num_online_cpus will return 2 in a dual-core system, even though there's
just one socket.  Given they share a power plane, if there's a valid
PSB structure however, it may be usable.  Though this isn't necessarily
true for all future dual-core AMD CPUs, and the ACPI tables really
should be preferred.

Dave

[*] unless you have the second core disabled or CONFIG_SMP=n

-- 
http://www.codemonkey.org.uk


Re: [RFC] select and dependencies in Kconfig

2007-05-16 Thread Stefan Richter
Timur Tabi wrote:
> For example, if I want to add a new driver C that uses library B, I can
> just add this:
> 
> C
> select B
> 
> If I have to use "depends on", then I would have to change the Kconfig
> option for B like this:
> 
> B
> depends on A || C

You mean, "B... serves A, C".

However, it shouldn't matter which way around the dependencies are
written down in the Kconfigs.  What does matter is how "make
{old,menu,...}config" deal with it.
-- 
Stefan Richter
-=-=-=== -=-= =---=
http://arcgraph.de/sr/


Re: [PATCH 1/2] scalable rw_mutex

2007-05-16 Thread Andrew Morton
On Wed, 16 May 2007 16:40:59 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> 
wrote:

> On Wed, 16 May 2007, Andrew Morton wrote:
> 
> > (I hope.  Might have race windows in which the percpu_counter_sum() count is
> > inaccurate?)
> 
> The question is how do these race windows affect the locking scheme?

The race to which I refer here is if another CPU is running
percpu_counter_sum() in the window between the clearing of the bit in
cpu_online_map and the CPU_DEAD callout.  Maybe that's too small to care
about in the short-term, dunno.

Officially we should fix that by taking lock_cpu_hotplug() in
percpu_counter_sum(), but I hate that thing.

I was thinking of putting a cpumask into the counter.  If we do that then
there's no race at all: everything happens under fbc->lock.  This would be
a preferable fix, if we need to fix it.

But I'd prefer that freezer-based cpu-hotplug comes along and saves us
again.



umm, actually, we can fix the race by using CPU_DOWN_PREPARE instead of
CPU_DEAD.  Because it's OK if percpu_counter_sum() looks at a gone-away
CPU's slot.


Resending: RT patches expose netdev race [was Re: [RFC] [patch 2/2] powerpc 2.6.21-rt1: fix kernel hang and/or panic

2007-05-16 Thread Linas Vepstas
(resending, Owa-san was cut from cc list!??)

Hi,

On Tue, May 15, 2007 at 08:09:02PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-05-15 at 17:47 +0900, Tsutomu OWA wrote:
> >   I encountered the following error when doing netperf from other machine 
> > to Celleb running RT kernel.  PREEMPT_NONE kernel works just fine as well.
> 
> Hrm... sounds a bit weird. I wonder if there's a locking bug in the
> driver in the first place.
> 
> Linas, what's your take ?

Heh. I almost deleted the entire email thread cause it
didn't say "spidernet" in the subject line. :-)
Seriously, I really almost did 

Since this is a long email; let me put a summary up front:
I think the RT/preemption patches are exposing some sort
of race in the ip header handling code. The rest of the 
note is forensics pointing to this.



Reading the patch, it looks like all it did was to move
around the locks, without changing the semantics. Two
comments about that:

-- The current spidernet locks are very fine-grained;
   this makes the whole thing function more smoothly.
   The patch would make them coarse-grained, I don't
   like that.

-- Moving around locks like that changes the timing
   completely, and changing the timing makes races
   come and go. The races seem to vanish, but that's
   only cause you are getting lucky.

Since I'm sick-n-tired of dealing with spidernet, I thought
I'd give this one a little extra attention.

The crash is a null pointer deref. The spidernet doesn't
use locks to protect null pointers. The spidernet mostly
doesn't play with pointers at all; they're mostly static.
So this crash is "unusual" from the get-go.

>> Instruction dump:
>> 6000 81790088 901f000c 913f0018 913f0008 917f0004 48132e8d
>> 6000
>> a019009e 2f800800 409e0038 e9390038 <88690009> 2f830006 419e0010
>> 2f830011

The crashing instruction is <88690009> which is very unique:
  lbz r3,9(r9)

load byte ... at an offset of 9 bytes!? spidernet does
nothing with bytes, so it's another reason it's not spidernet.

Below follows a manual disassembly. The guilty party appears
to be the skb, and specifically, skb->head has not been set.
You'll have to read the details below to see why.
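
For anyone who wants to double-check the decoding of <88690009> by hand,
here is a minimal sketch of a PowerPC D-form field extractor. The struct
and function names are made up for illustration; only the bit layout
(6-bit primary opcode, two 5-bit register fields, 16-bit signed
displacement) and the fact that lbz is primary opcode 34 are taken as
given.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction (used by loads such as lbz). */
struct dform {
    unsigned opcode;  /* primary opcode, top 6 bits */
    unsigned rt;      /* target register */
    unsigned ra;      /* base address register */
    int16_t  d;       /* signed 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
    struct dform f;

    f.opcode = insn >> 26;
    f.rt     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.d      = (int16_t)(insn & 0xffff);
    return f;
}
```

Decoding 0x88690009 this way gives opcode 34, rt 3, ra 9 and a
displacement of 9, i.e. lbz r3,9(r9). Since the effective address is
the base register plus the displacement, a zero in r9 would produce an
access at address 9, which is consistent with the faulting data address
reported in the oops.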

I do not know why sk_buff->head would be null, or
would be set in a racy kind of way, or why the rt patches
would cause this. But the evidence implicates that.

--linas

Long stuff below. For the record:

> > Unable to handle kernel paging request for data at address 0x0009
> > Faulting instruction address: 0xc0295434
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > PREEMPT SMP NR_CPUS=2 NUMA 
> > Modules linked in:
> > NIP: C0295434 LR: C0295420 CTR: 
> > REGS: c95d6e30 TRAP: 0300   Not tainted  (2.6.21-rc5-rt7)
> > MSR: 80009032   CR: 24000482  XER: 2000
> > DAR: 0009, DSISR: 4000
> > TASK = c1e7c440[626] 'netserver' THREAD: c95d4000 CPU: 0
> > GPR00: 0800 C95D70B0 C05D77B8 0001 
> > GPR04: 0001  C95D7080  
> > GPR08: C95D7030  C95D7040  
> > GPR12: FC69925300080D5D C04DE680  00422208 
> > GPR16: 0040 00420D10  C95D7C88 
> > GPR20: C1E7C440  0001 C8ACEAE0 
> > GPR24: 0020 C0E50C80 81F84C5E C1C00BE0 
> > GPR28: C1C05430 C1C00B80 C0570F30 C1FD1720 
> > NIP [C0295434] .spider_net_xmit+0x1dc/0x448
> > LR [C0295420] .spider_net_xmit+0x1c8/0x448
> > Call Trace:
> > [C95D70B0] [C0295420] .spider_net_xmit+0x1c8/0x448 
> > (unreliable)
> > [C95D7160] [C0327EE8] .dev_hard_start_xmit+0x238/0x300
> > [C95D7200] [C033A7F4] .__qdisc_run+0xdc/0x2a4
> > [C95D72B0] [C032A948] .dev_queue_xmit+0x1b0/0x2fc
> > [C95D7350] [C034B470] .ip_output+0x280/0x2d8
> > [C95D73F0] [C034C6CC] .ip_queue_xmit+0x448/0x4d8
> > [C95D74F0] [C035F6D8] .tcp_transmit_skb+0x850/0x8c0
> > [C95D75C0] [C035C394] .__tcp_ack_snd_check+0x84/0xc0
> > [C95D7650] [C035E114] .tcp_rcv_established+0x4f0/0x8ac
> > [C95D7700] [C0365B24] .tcp_v4_do_rcv+0x5c/0x448
> > [C95D77D0] [C031C2C4] .release_sock+0x94/0x11c
> > [C95D7870] [C0354E7C] .tcp_recvmsg+0x374/0x8d8
> > [C95D7960] [C031B8A0] .sock_common_recvmsg+0x5c/0x84
> > [C95D79F0] [C031921C] .sock_recvmsg+0x110/0x15c
> > [C95D7C00] [C031AA50] .sys_recvfrom+0xf0/0x174
> > [C95D7D90] [C0339368] .compat_sys_socketcall+0x178/0x214
> > [C95D7E30] [C0008634] syscall_exit+0x0/0x40
> > Instruction dump:
> > 6000 81790088 901f000c 913f0018 

Re: [PATCH] powernow-k8: depend on acpi-processor for SMP systems

2007-05-16 Thread Joshua Hoblitt
I don't think this is quite right either as Ed Sweetman has reported
that this issue doesn't occur on single socket/multi-core systems.

-J

--
On Thu, May 17, 2007 at 12:50:50AM +0100, Daniel Drake wrote:
> powernow-k8 uses PSB BIOS tables to read frequency info on UP systems, but
> on SMP it requires the acpi-processor driver. Kconfig should be updated
> accordingly to avoid the issues that users are running into.
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=8075
> https://bugs.gentoo.org/show_bug.cgi?id=178585
> 
> Signed-off-by: Daniel Drake <[EMAIL PROTECTED]>
> 
> Index: linux/arch/i386/kernel/cpu/cpufreq/Kconfig
> ===
> --- linux.orig/arch/i386/kernel/cpu/cpufreq/Kconfig
> +++ linux/arch/i386/kernel/cpu/cpufreq/Kconfig
> @@ -81,6 +81,7 @@ config X86_POWERNOW_K7_ACPI
>  config X86_POWERNOW_K8
>   tristate "AMD Opteron/Athlon64 PowerNow!"
>   select CPU_FREQ_TABLE
> + select ACPI_PROCESSOR if SMP
>   depends on EXPERIMENTAL
>   help
> This adds the CPUFreq driver for mobile AMD Opteron/Athlon64 
> processors.




Re: [RFC] select and dependencies in Kconfig

2007-05-16 Thread Stefan Richter
Timur Tabi wrote:
> Stefan Richter wrote:
>> "A... select B" is just a flavor of "A... depends on B",
...
> I think you mean "A... select B" is just a flavor of "B... depends on
> A".

No, A requires B's symbols.
-- 
Stefan Richter
-=-=-=== -=-= =---=
http://arcgraph.de/sr/


2.6.22-rc1-mm1: strange GPF when panicing under kvm

2007-05-16 Thread Jeremy Fitzhardinge
When I boot 2.6.22-rc1-mm1 under kvm, but forget to specify a root
filesystem, it panics as expected.  However, when panicking, it gets a
GPF in delay_tsc, and then starts recursively panicking.

I don't really understand what's going on; the instruction it's faulting
on seems to be "pause" (ie, rep;nop), which seems like it shouldn't
fault at all.  It looks like some kvm artifact to me, but I'm not sure.

Hm, given the error code, maybe it's a segment register problem.

VFS: Cannot open root device "" or unknown-block(254,0)
Please append a correct "root=" boot option; here are the available partitions:
0300    4194304 hda driver: ide-disk
  0301    4192933 hda1
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)
general protection fault: fffa [#1]
PREEMPT SMP 
Modules linked in:
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 0297   (2.6.22-rc1-mm1-paravirt #1391)
EIP is at delay_tsc+0x20/0x42
eax: 00025431   ebx:    ecx:    edx: 0002
esi: 55c34bd8   edi:    ebp: c1421e70   esp: c1421e5c
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 1, ti=c142 task=c141f4f0 task.ti=c142)
Stack: 001c096b 55c0f7a7  0003  c1421e80 c021dc90 0001 
    c1421e90 c021dcb9  0003 c1421ec0 c01298f3 c0426f0e 
   c051bd60 c1421ec0 c012a168 c041f0a4 c1421ecc c0430656 c1421ecc c1421ef4 
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0x9d/0xac
 [] show_registers+0x1f7/0x336
 [] die+0x119/0x21b
 [] do_general_protection+0x1bf/0x1c7
 [] error_code+0x72/0x78
 [] __delay+0xc/0xe
 [] __const_udelay+0x27/0x29
 [] panic+0xf8/0x101
 [] mount_block_root+0x221/0x236
 [] mount_root+0x59/0x5f
 [] prepare_namespace+0x102/0x149
 [] kernel_init+0x2bf/0x2ce
 [] kernel_thread_helper+0x7/0x10
 ===
INFO: lockdep is turned off.
Code: e2 8d 42 01 e8 cb ff ff ff c9 c3 55 89 e5 57 56 53 83 ec 08 89 45 ec 0f 
31 8d 74 26 00 b9 00 00 00 00 89 c6 89 c8 09 f0 89 45 f0  90 0f 31 8d 74 26 
00 b9 00 00 00 00 89 c6 89 c8 09 f0 2b 45 
EIP: [] delay_tsc+0x20/0x42 SS:ESP 0068:c1421e5c
general protection fault: fffa [#2]
PREEMPT SMP 
Modules linked in:
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 0282   (2.6.22-rc1-mm1-paravirt #1391)
EIP is at _spin_unlock_irqrestore+0x44/0x6d
eax: 0282   ebx: c048cd80   ecx: c01096f7   edx: 0001
esi: 0282   edi: 0068   ebp: c1421dbc   esp: c1421db4
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 1, ti=c142 task=c141f4f0 task.ti=c142)
Stack: c1421e24 c1421e5c c1421dec c01096f7 c0420228 0068 c1421e5c 0001 
   c048ecd4 c1421e24 0282 c1421e24 55c34bd8 fffa c1421e1c c037b992 
   fffa 000d 000b c011a748 0001 c11c2000 c141f6c0  
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0x9d/0xac
 [] show_registers+0x1f7/0x336
 [] die+0x119/0x21b
 [] do_general_protection+0x1bf/0x1c7
 [] error_code+0x72/0x78
 [] die+0x188/0x21b
 [] do_general_protection+0x1bf/0x1c7
 [] error_code+0x72/0x78
 [] __delay+0xc/0xe
 [] __const_udelay+0x27/0x29
 [] panic+0xf8/0x101
 [] mount_block_root+0x221/0x236
 [] mount_root+0x59/0x5f
 [] prepare_namespace+0x102/0x149
 [] kernel_init+0x2bf/0x2ce
 [] kernel_thread_helper+0x7/0x10
 ===
INFO: lockdep is turned off.
Code: 89 d8 e8 7a 4f ea ff f7 c6 00 02 00 00 75 13 89 f0 50 9d 90 8d b4 26 00 
00 00 00 e8 0d 9f dc ff eb 11 e8 b0 b4 dc ff 89 f0 50 9d <90> 8d b4 26 00 00 00 
00 b8 01 00 00 00 e8 c9 99 da ff 89 e0 25 
EIP: [] _spin_unlock_irqrestore+0x44/0x6d SS:ESP 0068:c1421db4
general protection fault: fffa [#3]

J


2.6.22-rc1 does not boot on VIA C3_2 cause of X86_CMPXCHG64

2007-05-16 Thread Christian
Hi,

my small VIA C3_2 box does not boot with 2.6.22-rc1.
It even does not uncompress the kernel.

The configuration as M386 M486 works. But M586 + MVIAC3_2
does not work.

solution for me, change arch/i386/Kconfig.cpu

--- arch/i386/Kconfig.cpu.before 2007-05-17 01:38:26.0 +0200
+++ arch/i386/Kconfig.cpu   2007-05-17 00:54:52.0 +0200
@@ -299,5 +299,5 @@

 config X86_CMPXCHG64
bool
-   depends on !M386 && !M486
+   depends on !M386 && !M486 && !MVIAC3_2
default y


The related #ifdef is in ./include/asm-i386/cmpxchg.h
Maybe cmpxchg8b is not supported by VIAC3_2?

Maybe some other non-Intel/AMD CPUs need to be excluded from X86_CMPXCHG64?
Maybe the generic option CONFIG_X86_GENERIC needs to switch this off also?


Just write an email to me if you want to send a patch to test on a C3_2.

Best regards,

Christian




Re: [PATCH] powernow-k8: depend on acpi-processor for SMP systems

2007-05-16 Thread Dave Jones
On Thu, May 17, 2007 at 12:50:50AM +0100, Daniel Drake wrote:
 > powernow-k8 uses PSB BIOS tables to read frequency info on UP systems, but
 > on SMP it requires the acpi-processor driver. Kconfig should be updated
 > accordingly to avoid the issues that users are running into.
 > 
 > http://bugzilla.kernel.org/show_bug.cgi?id=8075
 > https://bugs.gentoo.org/show_bug.cgi?id=178585

looks ok to me, but I'd like someone who has been seeing problems
to confirm this works first.

Dave

-- 
http://www.codemonkey.org.uk


Re: [2/3] 2.6.22-rc1: known regressions v2 - XFS

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 04:40:20PM -0700, Jeremy Fitzhardinge wrote:
> David Chinner wrote:
> > Jeremy has tentatively indicated that the patch has fixed the problem.
> > Have you seen any more problems since applying the patch, Jeremy?
> >   
> 
> No, it continues to seem sound with casual use; I would have expected to
> see the problem reoccur by now.  I'd like to rerun the full set of tests
> I did before to be sure, but so far so good.  No other apparent
> regressions either.

Good to hear. I think the problem is fixed, then.

> Also, the match between the observed symptoms and the bugfix is very
> good, which adds confidence (ie, no element of "it works now but we
> don't know why").  I guess the only remaining concern is whether there
> are any other paths which fail to dirty the inode.

There aren't any that I can see - if more come up we'll deal with
them then.

> Did you manage to repro the problem?

xfs_io is my friend ;)

Without patch:

# touch /mnt/scratch/fred
# xfs_io -c "pwrite 0 5" -c "s" -c "pwrite 5 5" /mnt/scratch/fred
wrote 5/5 bytes at offset 0
5.00 bytes, 1 ops; 0. sec (78.755 KiB/sec and 16129.0323 ops/sec)
wrote 5/5 bytes at offset 5
5.00 bytes, 1 ops; 0. sec (542.535 KiB/sec and 11. ops/sec)
# umount /mnt/scratch; mount /mnt/scratch; ls -l /mnt/scratch/fred
-rw-r--r-- 1 root  root  5 May 17 10:04 fred
#

So the second 5 byte write didn't change the file size.

With patch:

# touch /mnt/scratch/fred
# xfs_io -c "pwrite 0 5" -c "s" -c "pwrite 5 5" /mnt/scratch/fred
wrote 5/5 bytes at offset 0
5.00 bytes, 1 ops; 0. sec (76 KiB/sec and 15625. ops/sec)
wrote 5/5 bytes at offset 5
5.00 bytes, 1 ops; 0. sec (610 KiB/sec and 125000. ops/sec)
# umount /mnt/scratch; mount /mnt/scratch; ls -l /mnt/scratch/fred
-rw-r--r-- 1 root  root 10 May 17 09:53 fred
#

So yes, I've reproduced it and confirmed the patch fixes the problem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


[PATCH] powernow-k8: depend on acpi-processor for SMP systems

2007-05-16 Thread Daniel Drake
powernow-k8 uses PSB BIOS tables to read frequency info on UP systems, but
on SMP it requires the acpi-processor driver. Kconfig should be updated
accordingly to avoid the issues that users are running into.

http://bugzilla.kernel.org/show_bug.cgi?id=8075
https://bugs.gentoo.org/show_bug.cgi?id=178585

Signed-off-by: Daniel Drake <[EMAIL PROTECTED]>

Index: linux/arch/i386/kernel/cpu/cpufreq/Kconfig
===
--- linux.orig/arch/i386/kernel/cpu/cpufreq/Kconfig
+++ linux/arch/i386/kernel/cpu/cpufreq/Kconfig
@@ -81,6 +81,7 @@ config X86_POWERNOW_K7_ACPI
 config X86_POWERNOW_K8
tristate "AMD Opteron/Athlon64 PowerNow!"
select CPU_FREQ_TABLE
+   select ACPI_PROCESSOR if SMP
depends on EXPERIMENTAL
help
  This adds the CPUFreq driver for mobile AMD Opteron/Athlon64 processors.


Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-16 Thread Rob Landley
On Wednesday 16 May 2007 9:49 am, Francesco Pretto wrote:
> 2007/5/16, Stephen Clark <[EMAIL PROTECTED]>:
> > >On Tuesday 15 May 2007 5:08 pm, Dave Jones wrote:
> > >
> > >I'm confused.  Could someone please explain?
> > >
> > I agree. This didn't happen when I was just using the ide driver, why
> > can't libata work as well
> > as the old ide driver.
> >
> 
> Read my reply to that post. To summarize: libata, prior to 2.6.22rc1,
> lacked the feature to spin down the hard disk. The last discussion was
> about who's responsible for issuing the STANDBYNOW command to the hard
> disk. The response from the discussion is: the kernel. Trying to issue it
> from userspace (iff your shutdown(8) implementation does so) will now
> result in a big fat warning, until these compatibility measures are
> dropped from the kernel (sooner or later).

The last bit was what threw me.  It seemed that the kernel was changed to do 
the right thing, but only as a compatibility measure that would be dropped 
because userspace should be changed to start doing it (which seemed crazy).

It seems that the _warning_ is the compatibility measure that will be dropped 
(or perhaps the ability for userspace to do the wrong thing at all?), and the 
kernel will continue to DTRT.

It's a bit confusing for those of us coming in late in the discussion. :)

Rob


Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-16 Thread Rob Landley
On Wednesday 16 May 2007 7:41 am, Tejun Heo wrote:
> Hello,
> 
> Rob Landley wrote:
> > Um, hang on.  So libata can't reliably turn the system off without data loss
> > and potential damage to hardware unless userspace goes through a special song
> > and dance?  And this is _not_ considered a defect in the kernel?
> 
> Yeap, definitely a bug in the kernel and we're trying to fix it.  Just
> for the record, we have _always_ issued FLUSH CACHE, so there hasn't
> been and won't be any data loss problem.  The data loss problem was
> mentioned as why we can't do things completely inside kernel without
> updating userland shutdown(8) which issues STANDBYNOW.

Ok, so the change is to get shutdown to _stop_ doing something stupid 
(spinning down the disk without first flushing the cache), and the correct 
thing for shutdown to do is keep its mitts off the thing and let the kernel 
power down the darn hardware?

Woot,

Rob


Re: Weird hard disk noise on shutdown (bug #7674)

2007-05-16 Thread Rob Landley
On Wednesday 16 May 2007 5:15 am, Francesco Pretto wrote:
> - everyone else:
> // continue to do nothing :-)
> reboot();

That would be cool, but the impression I got from 
http://linux-ata.org/shutdown.html was that shutdown commands were supposed 
to _add_ quiescing of drives in order to avoid emergency head parking on 
poweroff.

That article says:

> Distros should update their shutdown(8) to do the following.
> Check whether /sys/modules/libata/parameters/spindown_compat exists. If it
> does, write 0 to it. For each libata harddisk
> Check whether /sys/class/scsi_disk/h:c:i:l/manage_start_stop exists. If it
> doesn't, synchronize cache and spin the disk down as before. If it does, do
> nothing and continue to the next disk. The file is also accessible as
> /sys/block/sdX/device/scsi_disk:*/manage_start_stop.
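
The procedure quoted from the article can be sketched roughly as follows. This is only an illustration of the decision logic, not a real shutdown(8): the function name, the sysfs_root parameter and the action strings are made up, and the paths are copied as quoted (on a real system the first directory may be spelled differently, so check before relying on it). The demo runs against a fake tree in a temporary directory and never touches /sys.

```python
import os
import tempfile


def plan_shutdown(sysfs_root, disks):
    """Follow the quoted steps against a sysfs-like tree.

    sysfs_root: directory standing in for /sys (hypothetical parameter).
    disks: mapping of disk name -> its scsi_disk directory, relative to root.
    Returns (wrote_compat, actions), where actions maps disk name -> decision.
    """
    compat = os.path.join(sysfs_root, "modules", "libata",
                          "parameters", "spindown_compat")
    wrote_compat = False
    if os.path.exists(compat):
        # Step 1: tell libata that userspace knows about the new behaviour.
        with open(compat, "w") as f:
            f.write("0")
        wrote_compat = True

    actions = {}
    for name, rel_dir in disks.items():
        knob = os.path.join(sysfs_root, rel_dir, "manage_start_stop")
        if os.path.exists(knob):
            actions[name] = "leave to kernel"
        else:
            actions[name] = "sync cache and spin down"
    return wrote_compat, actions


def demo():
    with tempfile.TemporaryDirectory() as root:
        params = os.path.join(root, "modules", "libata", "parameters")
        os.makedirs(params)
        with open(os.path.join(params, "spindown_compat"), "w") as f:
            f.write("1")
        new_disk = os.path.join("class", "scsi_disk", "0:0:0:0")
        old_disk = os.path.join("class", "scsi_disk", "1:0:0:0")
        os.makedirs(os.path.join(root, new_disk))
        os.makedirs(os.path.join(root, old_disk))
        # Only the first disk exposes manage_start_stop in this fake tree.
        open(os.path.join(root, new_disk, "manage_start_stop"), "w").close()
        wrote, actions = plan_shutdown(root, {"sda": new_disk, "sdb": old_disk})
        with open(os.path.join(params, "spindown_compat")) as f:
            value = f.read()
        return wrote, actions, value
```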

You're saying all this is to work around kernels _before_ 2.6.22, and instead 
of updating your shutdown you could just update the kernel?

> If, at this point, some exotic shutdown(8) implementation exists that
> is trying to issue STANDBYNOW on libata/scsi devices, it will get a
> fat warning. The warning could also state that the only supported way
> is to leave complete responsibility for spinning down the hard disks to
> the kernel, so eventually it could be cleaned of checks and compatibility
> options.

I'm all for leaving this to the kernel.  I play in the embedded space a lot, 
so the less I can get away with doing, the better. :)

Rob


Re: [patch] CFS scheduler, -v12

2007-05-16 Thread Peter Williams

Ingo Molnar wrote:
> * Peter Williams <[EMAIL PROTECTED]> wrote:
>
> > > As usual, any sort of feedback, bugreport, fix and suggestion is more
> > > than welcome,
> > Load balancing appears to be badly broken in this version.  When I
> > started 4 hard spinners on my 2 CPU machine one ended up on one CPU
> > and the other 3 on the other CPU and they stayed there.
>
> hm, i cannot reproduce this on 4 different SMP boxen, trying various
> combinations of SCHED_SMT/MC

You may need to try more than once.  Testing load balancing can be a 
pain as there's always a possibility you'll get a good result just by 
chance.  I.e. you need a bunch of good results to say it's OK but only 
one bad result to say it's broken.   This makes testing load balancing a 
pain.

> and other .config options that might make a
> difference to balancing. Could you send me your .config?

Sent separately.

Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: [PATCH] recalc_sigpending_tsk fixes

2007-05-16 Thread Oleg Nesterov
On 05/16, Roland McGrath wrote:
> 
> + * After recalculating TIF_SIGPENDING, we need to make sure the task wakes up.
> + * This is superfluous when called on current, the wakeup is a harmless no-op.
> + */
> +void recalc_sigpending_and_wake(struct task_struct *t)
> +{
> +	if (recalc_sigpending_tsk(t))
> +		signal_wake_up(t, 0);
>  }

We already discussed this, this is not so important, but how about

void recalc_sigpending_and_wake(struct task_struct *t)
{
	int was_pending = signal_pending(t);

	if (recalc_sigpending_tsk(t) && !was_pending)
		signal_wake_up(t, 0);
}

?

This "was_pending" is more documentation than an optimization.
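
To make the difference between the two variants concrete, here is a toy user-space model in Python. It is a heavily simplified sketch: the Task class, its fields and the wakeup counter are invented for illustration, and the real recalc_sigpending_tsk() looks at much more state than a single boolean.

```python
class Task:
    """Toy stand-in for a task; all fields are hypothetical."""

    def __init__(self):
        self.sigpending = False            # models TIF_SIGPENDING
        self.has_unblocked_signal = False  # models "a signal is deliverable"
        self.wakeups = 0                   # counts modelled signal_wake_up() calls


def recalc_sigpending_tsk(t):
    """Recompute the flag from task state and report whether it is set."""
    t.sigpending = t.has_unblocked_signal
    return t.sigpending


def recalc_and_wake(t):
    """Variant from the quoted patch: wake whenever a signal is pending."""
    if recalc_sigpending_tsk(t):
        t.wakeups += 1


def recalc_and_wake_once(t):
    """Variant suggested above: wake only when the flag newly becomes set."""
    was_pending = t.sigpending
    if recalc_sigpending_tsk(t) and not was_pending:
        t.wakeups += 1


def demo():
    a, b = Task(), Task()
    a.has_unblocked_signal = b.has_unblocked_signal = True
    for _ in range(2):
        recalc_and_wake(a)
        recalc_and_wake_once(b)
    return a.wakeups, b.wakeups
```

In this model the second call is a redundant wakeup for the first variant and a no-op for the second, which is the point the was_pending check documents.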

Oleg.



Re: [PATCH 1/2] scalable rw_mutex

2007-05-16 Thread Christoph Lameter
On Wed, 16 May 2007, Andrew Morton wrote:

> (I hope.  Might have race windows in which the percpu_counter_sum() count is
> inaccurate?)

The question is how do these race windows affect the locking scheme?


Re: [PATCH 1/5][TAKE3] fallocate() implementation on i86, x86_64 and powerpc

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 07:21:16AM -0500, Dave Kleikamp wrote:
> On Wed, 2007-05-16 at 13:16 +1000, David Chinner wrote:
> > On Wed, May 16, 2007 at 01:33:59AM +0530, Amit K. Arora wrote:
> 
> > > Following changes were made to the previous version:
> > >  1) Added description before sys_fallocate() definition.
> > >  2) Return EINVAL for len<=0 (With new draft that Ulrich pointed to,
> > > posix_fallocate should return EINVAL for len <= 0).
> > >  3) Return EOPNOTSUPP if mode is not one of FA_ALLOCATE or FA_DEALLOCATE
> > >  4) Do not return ENODEV for dirs (let individual file systems decide if
> > > they want to support preallocation to directories or not).
> > >  5) Check for wrap through zero.
> > >  6) Update c/mtime if fallocate() succeeds.
> > 
> > > Please don't make this always happen. c/mtime updates should be dependent
> > > on the mode being used and whether there is visible change to the file. If
> > > no userspace visible changes to the file occurred, then timestamps should
> > > not be changed.
> 
> i_blocks will be updated, so it seems reasonable to update ctime.  mtime
> shouldn't be changed, though, since the contents of the file will be
> unchanged.

That's assuming blocks were actually allocated - if the prealloc range already
has underlying blocks there is no change and so we should not be changing
mtime either. Only the filesystem will know if it has changed the file, so I
think that timestamp updates need to be driven down to that level, not done
blindly at the highest layer.
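
As a rough illustration of "drive the timestamp update down to the filesystem", here is a toy Python model. Everything in it is hypothetical (the Inode class, block numbers as a set, the fs_preallocate name); it only shows the shape of the idea discussed above: the filesystem updates ctime when it actually allocated something, leaves mtime alone because file contents are unchanged, and touches nothing when the range was already backed by blocks.

```python
class Inode:
    """Toy inode: a set of allocated block numbers plus two timestamps."""

    def __init__(self, allocated=()):
        self.blocks = set(allocated)
        self.ctime = 0
        self.mtime = 0


def fs_preallocate(inode, start, count, now):
    """Preallocate blocks [start, start + count).

    Returns True only if at least one block was newly allocated; the
    timestamp decision is made here, not by a generic caller.
    """
    wanted = set(range(start, start + count))
    newly_allocated = wanted - inode.blocks
    if not newly_allocated:
        return False  # nothing changed, so no timestamp update at all
    inode.blocks |= newly_allocated
    inode.ctime = now  # block count changed; contents did not, so mtime stays
    return True


def demo():
    inode = Inode(allocated=(0, 1))
    first = fs_preallocate(inode, 0, 2, now=100)   # range already backed
    ctime_after_first = inode.ctime
    second = fs_preallocate(inode, 1, 3, now=200)  # blocks 2 and 3 are new
    return first, ctime_after_first, second, inode.ctime, inode.mtime
```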

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [2/3] 2.6.22-rc1: known regressions v2 - XFS

2007-05-16 Thread Jeremy Fitzhardinge
David Chinner wrote:
> Jeremy has tentatively indicated that the patch has fixed the problem.
> Have you seen any more problems since applying the patch, Jeremy?
>   

No, it continues to seem sound with casual use; I would have expected to
see the problem reoccur by now.  I'd like to rerun the full set of tests
I did before to be sure, but so far so good.  No other apparent
regressions either.

Also, the match between the observed symptoms and the bugfix is very
good, which adds confidence (ie, no element of "it works now but we
don't know why").  I guess the only remaining concern is whether there
are any other paths which fail to dirty the inode.

Did you manage to repro the problem?

J


Re: [PATCH] Kconfig powernow-k8 driver should depend on ACPI P-States driver

2007-05-16 Thread Ed Sweetman

Daniel Drake wrote:
> Ed Sweetman wrote:
> > Like i mentioned off list, the problem here is that cpu freq modules
> > dont depend (Kconfig) on CONFIG_ACPI_PROCESSOR, yet they do.
>
> Not really. Firstly, some of the cpufreq modules *do* depend on
> CONFIG_ACPI_PROCESSOR. Secondly, the ones that don't have an existing
> dependency do not actually depend on ACPI_PROCESSOR in some/most
> configurations.
>
> I'll send in a patch to fix the real problem soon.
>
> Daniel


The way i patched it was to just include a "select ACPI_PROCESSOR" in
the X86_POWERNOW_K8 Kconfig entry.

The "driver" that the user sees in Kconfig, X86_POWERNOW_K8, is actually
not a driver at all, and the actual driver behaves differently because
the "pseudo" driver only depends on CPUFREQ.  This is misleading, as the
actual driver beneath it depends on ACPI_PROCESSOR too.  Now the driver
beneath it responds as you'd think: it disables itself when its depends
lines are invalid, which is good.  But the menu entry that we see is for
X86_POWERNOW_K8, and that isn't disabled or anything when the depends
lines of the driver it actually represents fail.


This is easily fixed with the select line in the "pseudo" driver
...which i find a little more appropriate than a depends line.


As for actual module dependency issues, i haven't bothered looking into
that.   As far as the Kconfig shows it shouldn't be allowed to have
ACPI_PROCESSOR as a module at all. So maybe that's intended.



Re: 2.6.22-rc1-mm1

2007-05-16 Thread H. Peter Anvin
Correction, does *this patch* do it for you?

-hpa

diff --git a/arch/i386/boot/setup.ld b/arch/i386/boot/setup.ld
index e9ca0c2..c9c5530 100644
--- a/arch/i386/boot/setup.ld
+++ b/arch/i386/boot/setup.ld
@@ -44,5 +44,5 @@ SECTIONS
 
/DISCARD/ : { *(.note*) }
 
-   ASSERT(_end <= 0x8000, "Setup too big!")
+   . = ASSERT(_end <= 0x8000, "Setup too big!");
 }


Re: [2/3] 2.6.22-rc1: known regressions v2 - XFS

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 10:31:39PM +0200, Michal Piotrowski wrote:
> Hi all,
> 
> Here is a list of some known regressions in 2.6.22-rc1.
> 
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
> 
> File systems
> 
> Subject: 2.6.21-git10/11: files getting truncated on xfs
> References : http://lkml.org/lkml/2007/5/9/410
> Submitter  : Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> Handled-By : David Chinner <[EMAIL PROTECTED]>
> Patch  : http://lkml.org/lkml/2007/5/12/93
> Status : patch was suggested

Jeremy has tentatively indicated that the patch has fixed the problem.
Have you seen any more problems since applying the patch, Jeremy?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH 1/2] scalable rw_mutex

2007-05-16 Thread Andrew Morton
On Sat, 12 May 2007 11:06:24 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On 12 May 2007 20:55:28 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > Andrew Morton <[EMAIL PROTECTED]> writes:
> > 
> > > On Fri, 11 May 2007 10:07:17 -0700 (PDT)
> > > Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On Fri, 11 May 2007, Andrew Morton wrote:
> > > > 
> > > > > yipes.  percpu_counter_sum() is expensive.
> > > > 
> > > > Capable of triggering NMI watchdog on 4096+ processors?
> > > 
> > > > Well.  That would be a millisecond per cpu which sounds improbable.  And
> > > > we'd need to be calling it under local_irq_save() which we presently
> > > > don't.  And nobody has reported any problems against the existing callsites.
> > > 
> > > But it's no speed demon, that's for sure.
> > 
> > > There is one possible optimization for this I did some time ago. You don't
> > > really need to sum all over the possible map, but only all CPUs that were
> > > ever online. But this only helps on systems where the possible map is bigger
> > > than online map in the common case. But that shouldn't be the case anymore
> > > on x86 -- it just used to be. If it's true on some other architectures it
> > > might be still worth it.
> > 
> 
> hm, yeah.
> 
> We could put a cpumask in percpu_counter, initialise it to
> cpu_possible_map.  Then, those callsites which have hotplug notifiers can
> call into new percpu_counter functions which clear and set bits in that
> cpumask and which drain percpu_counter.counts[cpu] into
> percpu_counter.count.
> 
> And percpu_counter_sum() gets taught to do for_each_cpu_mask(fbc->cpumask).

Like this:


From: Andrew Morton <[EMAIL PROTECTED]>

per-cpu counters presently must iterate over all possible CPUs in the
exhaustive percpu_counter_sum().

But it can be much better to only iterate over the presently-online CPUs.  To
do this, we must arrange for an offlined CPU's count to be spilled into the
counter's central count.

We can do this for all percpu_counters in the machine by linking them into a
single global list and walking that list at CPU_DEAD time.

(I hope.  Might have race windows in which the percpu_counter_sum() count is
inaccurate?)
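
The idea can be modelled in user space as follows. This is only a sketch of the bookkeeping in Python: the class and method names are invented, there is no locking and no batching, and it says nothing about the race windows mentioned above. It shows that spilling a dead CPU's delta into the central count keeps the sum unchanged while letting the sum skip that CPU.

```python
class PercpuCounterModel:
    """Toy model: a central count plus one delta per CPU."""

    def __init__(self, ncpus, amount=0):
        self.count = amount           # central count
        self.counters = [0] * ncpus   # per-CPU deltas
        self.online = set(range(ncpus))

    def mod(self, cpu, amount):
        """Adjust the delta of the CPU doing the update."""
        self.counters[cpu] += amount

    def sum(self):
        """Exhaustive sum, iterating only over online CPUs."""
        return self.count + sum(self.counters[c] for c in self.online)

    def cpu_dead(self, cpu):
        """Spill the offlined CPU's delta into the central count."""
        self.count += self.counters[cpu]
        self.counters[cpu] = 0
        self.online.discard(cpu)


def demo():
    c = PercpuCounterModel(ncpus=4, amount=10)
    c.mod(0, 5)
    c.mod(3, -2)
    before = c.sum()
    c.cpu_dead(3)
    return before, c.sum(), c.count, c.counters[3]
```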


Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 include/linux/percpu_counter.h |   18 ++--
 lib/percpu_counter.c   |   66 +++
 2 files changed, 72 insertions(+), 12 deletions(-)

diff -puN lib/percpu_counter.c~percpu_counters-use-cpu-notifiers lib/percpu_counter.c
--- a/lib/percpu_counter.c~percpu_counters-use-cpu-notifiers
+++ a/lib/percpu_counter.c
@@ -3,8 +3,17 @@
  */
 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include 
 
+#ifdef CONFIG_HOTPLUG_CPU
+static LIST_HEAD(percpu_counters);
+static DEFINE_MUTEX(percpu_counters_lock);
+#endif
+
 void percpu_counter_mod(struct percpu_counter *fbc, s32 amount)
 {
long count;
@@ -44,3 +53,60 @@ s64 percpu_counter_sum(struct percpu_cou
return ret < 0 ? 0 : ret;
 }
 EXPORT_SYMBOL(percpu_counter_sum);
+
+void percpu_counter_init(struct percpu_counter *fbc, s64 amount)
+{
+   spin_lock_init(&fbc->lock);
+   fbc->count = amount;
+   fbc->counters = alloc_percpu(s32);
+#ifdef CONFIG_HOTPLUG_CPU
+   mutex_lock(&percpu_counters_lock);
+   list_add(&fbc->list, &percpu_counters);
+   mutex_unlock(&percpu_counters_lock);
+#endif
+}
+EXPORT_SYMBOL(percpu_counter_init);
+
+void percpu_counter_destroy(struct percpu_counter *fbc)
+{
+   free_percpu(fbc->counters);
+#ifdef CONFIG_HOTPLUG_CPU
+   mutex_lock(&percpu_counters_lock);
+   list_del(&fbc->list);
+   mutex_unlock(&percpu_counters_lock);
+#endif
+}
+EXPORT_SYMBOL(percpu_counter_destroy);
+
+#ifdef CONFIG_HOTPLUG_CPU
+static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
+   unsigned long action, void *hcpu)
+{
+   unsigned int cpu;
+   struct percpu_counter *fbc;
+
+   if (action != CPU_DEAD)
+   return NOTIFY_OK;
+
+   cpu = (unsigned long)hcpu;
+   mutex_lock(&percpu_counters_lock);
+   list_for_each_entry(fbc, &percpu_counters, list) {
+   s32 *pcount;
+
+   spin_lock(&fbc->lock);
+   pcount = per_cpu_ptr(fbc->counters, cpu);
+   fbc->count += *pcount;
+   *pcount = 0;
+   spin_unlock(&fbc->lock);
+   }
+   mutex_unlock(&percpu_counters_lock);
+   return NOTIFY_OK;
+}
+
+static int __init percpu_counter_startup(void)
+{
+   hotcpu_notifier(percpu_counter_hotcpu_callback, 0);
+   return 0;
+}
+module_init(percpu_counter_startup);
+#endif
diff -puN include/linux/percpu.h~percpu_counters-use-cpu-notifiers include/linux/percpu.h
diff -puN include/linux/percpu_counter.h~percpu_counters-use-cpu-notifiers include/linux/percpu_counter.h
--- a/include/linux/percpu_counter.h~percpu_counters-use-cpu-notifiers
+++ a/include/linux/percpu_counter.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -17,6 +18,9 @@

[PATCH] spi: potential memleak in spidev_ioctl

2007-05-16 Thread Florin Malita
'ioc' should be deallocated if __copy_from_user fails (found by Coverity 
- CID 1644).


Signed-off-by: Florin Malita <[EMAIL PROTECTED]>
---

spidev.c |1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index c0a6dce..2464f34 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -364,6 +364,7 @@ spidev_ioctl(struct inode *inode, struct file *filp,
break;
}
if (__copy_from_user(ioc, (void __user *)arg, tmp)) {
+   kfree(ioc);
retval = -EFAULT;
break;
}



Re: 2.6.22-rc1-mm1

2007-05-16 Thread H. Peter Anvin
Mel Gorman wrote:
>   LD  arch/i386/boot/setup.elf
> ld:arch/i386/boot/setup.ld:47: syntax error

Does this patch fix it for you?

-hpa
diff --git a/arch/i386/boot/setup.ld b/arch/i386/boot/setup.ld
index e9ca0c2..c9c5530 100644
--- a/arch/i386/boot/setup.ld
+++ b/arch/i386/boot/setup.ld
@@ -44,5 +44,5 @@ SECTIONS
 
/DISCARD/ : { *(.note*) }
 
-   ASSERT(_end <= 0x8000, "Setup too big!")
+   . = ASSERT(_end <= 0x8000, "Setup too big!")
 }


Re: v2.6.21-rt2

2007-05-16 Thread Free Ekanayaka
Hi,

|--==> Ingo Molnar writes:

  IM> i'm pleased to announce the v2.6.20-rt2 kernel, which can be downloaded 
  IM> from the usual place:

  IM> http://redhat.com/~mingo/realtime-preempt/

This new version of the patch solves the amd64/udev bug I reported
against previous releases:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg00353.html

I'm going to test the patch on other amd64 machines as well, thanks a
lot and keep up the good work!

Ciao,

Free


Re: [RFC] select and dependencies in Kconfig

2007-05-16 Thread Timur Tabi

Stefan Richter wrote:

> "A... select B" is just a flavor of "A... depends on B", with the
> additional instruction to the Kconfig UIs:  Don't hide A if you can
> silently switch on B.

I think you mean "A... select B" is just a flavor of "B... depends on A".
There is one minor difference between the two.

If A is a driver and B is a library, then it's more intuitive to update the
Kconfig option for A than it is to update the Kconfig option for B.  For
example, if I want to add a new driver C that uses library B, I can just add
this:


C
select B

If I have to use "depends on", then I would have to change the Kconfig option 
for B like this:

B
depends on A || C

And every time I create a new driver that depends on library B, I have to update that 
"depends on" line *in addition to* creating the Kconfig line for the new driver.  If 10 
drivers use library B, you'll have this:


B
depends on A || C || D || E || F || G || H || I || J || K
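
The maintenance difference can be illustrated with a small Python sketch. It is not how the Kconfig tools are implemented; the function names and the dictionary layout are made up. With "select", each driver carries its own line and the library entry never changes; with "depends on", the library's line has to list every user.

```python
def enabled_with_select(chosen, selects):
    """Symbols that end up enabled when every chosen symbol forces on
    whatever it selects (followed transitively)."""
    enabled = set()
    stack = list(chosen)
    while stack:
        sym = stack.pop()
        if sym in enabled:
            continue
        enabled.add(sym)
        stack.extend(selects.get(sym, ()))
    return enabled


def depends_line(users):
    """The single 'depends on' line the library would need instead."""
    return "depends on " + " || ".join(users)


# Each driver declares its own need; adding driver C touches only C's entry.
SELECTS = {"A": ["B"], "C": ["B"]}
```

For example, enabled_with_select({"C"}, SELECTS) turns on B without the entry for B mentioning C at all, whereas depends_line(["A", "C"]) is the line that would have to be edited again for every new driver.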


> How about throwing "select" out of the Kconfig language and improving
> the UIs instead, so that users find what they want and need?


I know a lot of people don't like 'select', but I prefer it over 'depends on'.

--
Timur Tabi
Linux Kernel Developer @ Freescale


Re: [PATCH 1 of 2] block_page_mkwrite() Implementation V2

2007-05-16 Thread David Chinner
On Wed, May 16, 2007 at 11:19:29AM +0100, David Howells wrote:
> 
> However, page_mkwrite() isn't told which bit of the page is going to be
> written to.  This means it has to ask prepare_write() to make sure the whole
> page is filled in.  In other words, offset and to must be equal (in AFS I set
> them both to 0).

The assumption is the page is already up to date and we are writing
the whole page unless EOF lands inside the page. AFAICT, we can't
get called with a page that is not uptodate and so page filling is
not something we should be doing (or want to be doing) here. All we
want to do is to be able to change the mapping from a read to a
write mapping (e.g. a read mapping of a hole needs to be changed on
write) and do the relevant space reservation/allocation and buffer
mapping needed for this change.

> However, if someone adds a syscall to punch holes in files, this may change...

We already have them - ioctl(XFS_IOC_UNRESVSP) and
madvise(MADV_REMOVE) - and another - fallocate(FA_DEALLOCATE) - is
on its way. Racing with truncates should already be handled by the
truncate code (i.e. partial page truncation does the zero filling).

/me makes note to implement ->truncate_range() in XFS for MADV_REMOVE.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


bug seen with dynticks from CONFIG_HARDIRQS_SW_RESEND

2007-05-16 Thread Woodruff, Richard
Hi,

In testing we were noticing that we were getting some intermittent
crashes in profile_tick() when dyntick was enabled.

The crashes were because the frame pointer per_cpuirq_regs value was
0.  That code does a user_mode(get_irq_regs()).  Currently regs is set
only upon real hardware entry on an irq.

The crash path shows resend_irqs() could be called within a context
where set_irq_regs() was not executed.  In one specific case this was from
softirq->tasklet_action(resend_tasklet)->resend_irqs->handle_level_irq->
handle_IRQ_event->...->profile_tick.

It seems anyone calling kernel/irq/manage.c:enable_irq() at the wrong
time can trigger this crash.

Creating a fake stack and doing a set_irq_regs() fixes the crash.  Would
it be useful to set a pointer to the entry context on all state changes?
For ease I just hacked a default fake stack into the init process after
fork time so there is never a 0 but that doesn't seem so nice.

Regards,
Richard W.



Re: [PATCH] Kconfig powernow-k8 driver should depend on ACPI P-States driver

2007-05-16 Thread Daniel Drake

Ed Sweetman wrote:
> Like i mentioned off list, the problem here is that cpu freq modules
> dont depend (Kconfig) on CONFIG_ACPI_PROCESSOR, yet they do.


Not really. Firstly, some of the cpufreq modules *do* depend on 
CONFIG_ACPI_PROCESSOR. Secondly, the ones that don't have an existing 
dependency do not actually depend on ACPI_PROCESSOR in some/most 
configurations.


I'll send in a patch to fix the real problem soon.

Daniel


Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-05-16 Thread Jiri Kosina
On Wed, 16 May 2007, David Miller wrote:

> > I have just verified that this locking scheme is indeed correct. So you 
> > can add
> > 
> > Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
> > 
> > if you wish to, and submit the patch to Andrew.
> I guess I don't get sent networking patches any more?
> :-)

Well, this is bluetooth-specific, but it seemed to me that Marcel wasn't 
going to send pull requests to Linus any time soon, therefore I thought 
going through akpm is a thing to do.

Honestly, I really don't care through which tree this goes in, so sorry if 
any offence was caused here :)

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why can't we sleep in an ISR?

2007-05-16 Thread Dong Feng

OK. I think the gap between you and me is the definition of term
*context*. If you go to Linux Kernel Development, 2nd Edition (ISBN
0-672-32720-1), Page 6, then you will read the following:

  in Linux, ... each processor is doing one of three things at any
given moment:

1. In kernel-space, in process context, ...
2. In kernel-space, in interrupt context, not associated with a process, ...
3. In user-space ...

This list is inclusive. ...


Maybe you prefer another terminology, but I do like the above
definition given by Robert Love. So maybe in your terminology *context*
means something at the hardware level and you say an ISR is in process
context, but I think it is more of a logical-level concept, and I agree
with Robert's definition.

And at the hardware level, Robert's *context* definition also means
something specific, which I have started to become aware of. That is, *in the
same context* means kernel code is triggered by user-space code;
*in a different context* means kernel code is triggered by an external
interrupt source other than user-space code.

Context has nothing to do with whether an ISR borrows any data
structure of a process; instead, it's something logical, related to
causality.



2007/5/16, Phillip Susi <[EMAIL PROTECTED]>:
> Dong Feng wrote:
> > If what you say were true, then an ISR would be running in the same
> > context as the interrupted process.
>
> Yes, and it is, as others have said in this thread, which is a good
> reason why ISRs can't sleep.
>
> > But please check any article or
> > book, it will say ISR running in different context from any process.
> > So ISR is considered in its own context, although it shares a lot of
> > things with the interrupted process. I would only say *context* is a
> > higher-level logical concept.
>
> Depends on which book or article you are reading I suppose.  The
> generally accepted and often used thought is that ISRs technically are
> running in the context of the interrupted process, but because that
> context is unknown and therefore should not be used, it is often said
> that they run in no context, or outside of any context.  Sometimes
> people then assume that because they run outside of any ( particular )
> process context, they must be in their own context, but this is a mistake.





