Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Andi Kleen
On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 21 Jul 2007, Fengguang Wu wrote:
> >
> > Sorry, forgot to prefix the patch titles with [readahead].
> > Should I repost?
> 
> Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 

Haven't the readahead patches already essentially been in -mm* for some time?
I thought the new patches were some restructured code, but essentially
the tested algorithms? 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Steven Rostedt


> On Saturday 21 July 2007 01:55, Michal Piotrowski wrote:
> >
> > I really like this idea - code duplication is a bad thing.
>
> Did you actually look at the patch? It doesn't have a single line
> less duplication than there was before. Everything that could
> be easily shared was shared already.
>
> It's just new window dressing without any real advantages.

And did you read what tglx wrote?

This patch was the beginning of the merger, not the end result. It strived
for binary identical images. It was to put everything together as a
_starting_point_!   The next thing to do after this is to start the
merging.

-- Steve



Re: [PATCH 6/7] radixtree: introduce radix_tree_scan_hole()

2007-07-20 Thread Andrew Morton
On Sat, 21 Jul 2007 12:43:06 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:

> Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
> for the first hole. It will be used in interleaved readahead.

If you're ever feeling fantastically bored, please consider updating the
userspace radix-tree test harness for this?  Cook up a couple of testcases
for the new functionality?

Thanks.

http://www.zip.com.au/~akpm/linux/patches/stuff/rtth.tar.gz is the latest.


Re: [PATCH 2/7] console: fix section mismatch warning in vgacon.c

2007-07-20 Thread Sam Ravnborg
On Sat, Jul 21, 2007 at 07:37:29AM +0800, Antonino A. Daplas wrote:
> On Fri, 2007-07-20 at 23:27 +0200, Sam Ravnborg wrote:
> > Fix following section mismatch warning:
> > WARNING: vmlinux.o(.text+0x121e62): Section mismatch: reference to 
> > .init.text:__alloc_bootmem (between 'vgacon_startup' and 
> > 'vgacon_scrolldelta')
> > 
> > Browsing the code it seems that vgacon_scrollback_startup() is only
> > called during the init phase so the reference to the .init.text
> > section is OK.
> > Teach modpost not to warn, using __init_refok.
> > 
> > Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
> Acked-by: Antonino Daplas <[EMAIL PROTECTED]>

Thanks. Will you take care of forwarding it, or do we rely
on Andrew in this area?

Sam


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Andi Kleen
On Saturday 21 July 2007 01:55, Michal Piotrowski wrote:
> Hi,
>
> On 21/07/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> > We are pleased to announce a project we've been working on for some
> > time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> > to solicit feedback about it.
> >
> > What is this about?
>
> [..]
>
> > As usual, comments and suggestions are welcome!
>
> I really like this idea - code duplication is a bad thing.

Did you actually look at the patch? It doesn't have a single line
less duplication than there was before. Everything that could
be easily shared was shared already. 

It's just new window dressing without any real advantages.

-Andi


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Andi Kleen
On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> to solicit feedback about it.

Well you know my position on this. I think it's a bad idea because
it means we can never get rid of any old junk. IMNSHO arch/x86_64
is significantly cleaner and simpler in many ways than arch/i386 and I would
like to preserve that. Also in general arch/x86_64 is much easier to hack
than arch/i386 because it's easier to regression test and in general
has to care about much less junk. And I don't 
know of any way to ever fix that for i386 besides splitting the old
stuff off completely.

Besides, radical file movements like this are bad anyway. They cause
a big break in patchkits and forward/backwards porting that doesn't 
really help anybody.

> This causes double maintenance
> even for functionality that is conceptually the same for the 32-bit and
> the 64-bit tree. (such as support for standard PC platform architecture
> devices)

It's not really the same platform: one is PC hardware going back forever
with zillions of bugs, the other is modern PC platforms with far fewer
bugs and quirks.

Seen another way, it's more a junkification of arch/x86_64 than
a cleanup of arch/i386; in fact you didn't really clean up arch/i386
at all.

> How did we do it?
> -
>
> As an initial matter, we made it painstakingly sure that the resulting
> .o files in a 32-bit build are bit for bit equal.

Then you didn't get rid of a single line of code duplication, so I don't
really see the point of this.

-Andi


Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Nigel Cunningham
Hi.

On Saturday 21 July 2007 08:43:20 [EMAIL PROTECTED] wrote:
> On Fri, 20 Jul 2007, Alan Stern wrote:
> 
> > On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
> >
>  when doing a suspend-to-ram you get to a point where you just don't use
>  any userspace.
> >>
> >>> What do you mean?  How can you prevent user tasks from running?  That's
> >>> basically what the freezer does, and the whole point of this approach
> >>> is to eliminate the freezer.  Right?
> >>
> >> Presumably no tasks at all would be scheduled.
> >
> > How would you prevent tasks from being scheduled?  How would you
> > prevent drivers from deadlocking because in order to put their device
> > in a low-power state they need to acquire a lock which is held by a
> > user task?
> 
> you give up on the suspend because you have no way of getting the user
> task to give up the lock.
> 
> however, kernel locks should not be held by user tasks; user tasks are not
> expected to behave in rational ways, and allowing them to compete with kernel
> tasks for locks is a sure way to get a deadlock or an indefinite stall.
> 
> what locks are accessed this way?

Any userspace process can make a syscall. In the course of the syscall, it can
take kernel locks, and it can schedule (e.g., while trying to take a second
lock).

Regards,

Nigel




Re: where is the code for read system call?

2007-07-20 Thread Folkert van Heusden
> My application reads from socket. I need to change the behavior of read
> system call for an experiment. Can someone point me to code?

Wouldn't it be easier to write an LD_PRELOAD wrapper library around glibc's read()?
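A minimal sketch of such a wrapper, assuming glibc on Linux. It interposes `read()` and forwards to the real implementation found with `dlsym(RTLD_NEXT, "read")`; the byte counter `wrapped_read_bytes` is a hypothetical placeholder for whatever the experiment actually needs to change.

```c
#define _GNU_SOURCE		/* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical experiment state: total bytes returned through the wrapper. */
size_t wrapped_read_bytes;

typedef ssize_t (*read_fn)(int, void *, size_t);

ssize_t read(int fd, void *buf, size_t count)
{
	static read_fn real_read;	/* lazily resolved; not thread-safe on first use */
	ssize_t n;

	if (!real_read)
		real_read = (read_fn)dlsym(RTLD_NEXT, "read");	/* glibc's read() */
	if (!real_read) {
		errno = ENOSYS;
		return -1;
	}

	n = real_read(fd, buf, count);
	if (n > 0)
		wrapped_read_bytes += (size_t)n;	/* experimental behaviour goes here */
	return n;
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libreadwrap.so readwrap.c -ldl`; the file names are illustrative) it can be loaded into the unmodified application with `LD_PRELOAD=./libreadwrap.so ./app`. Only calls that go through the dynamic `read` symbol are caught; an application that uses `recv()` on its socket would need that function wrapped as well.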


Folkert van Heusden

-- 
MultiTail is a versatile tool for watching logfiles and output of
commands. Filtering, coloring, merging, diff-view, etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


[GIT PULL] please pull infiniband.git

2007-07-20 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get another small batch of changes for 2.6.23:

Arthur Jones (1):
  IB/ipath: Remove ipath_layer dead code

Florin Malita (1):
  IB/mlx4: Fix leaks in __mlx4_ib_modify_qp

Hoang-Nam Nguyen (3):
  IB/ehca: Support large page MRs
  IB/ehca: Generate async event when SRQ limit reached
  IB/ehca: Move ehca2ib_return_code() out of line

Joachim Fenkes (1):
  IB/ehca: Make internal_create/destroy_qp() static

Michael S. Tsirkin (1):
  IB/mthca: Change command token on timeout

Roland Dreier (2):
  mlx4_core: Change command token on timeout
  IB/mlx4: Fix error path in create_qp_common()

Stefan Roscher (1):
  IB/ehca: Support small QP queues

 drivers/infiniband/hw/ehca/ehca_classes.h |   50 +++--
 drivers/infiniband/hw/ehca/ehca_cq.c  |8 +-
 drivers/infiniband/hw/ehca/ehca_eq.c  |8 +-
 drivers/infiniband/hw/ehca/ehca_irq.c |   42 +++-
 drivers/infiniband/hw/ehca/ehca_main.c|   49 -
 drivers/infiniband/hw/ehca/ehca_mrmw.c|  371 -
 drivers/infiniband/hw/ehca/ehca_mrmw.h|2 +-
 drivers/infiniband/hw/ehca/ehca_pd.c  |   25 ++-
 drivers/infiniband/hw/ehca/ehca_qp.c  |  178 --
 drivers/infiniband/hw/ehca/ehca_tools.h   |   19 +--
 drivers/infiniband/hw/ehca/ehca_uverbs.c  |2 +-
 drivers/infiniband/hw/ehca/hcp_if.c   |   50 +++-
 drivers/infiniband/hw/ehca/ipz_pt_fn.c|  222 +
 drivers/infiniband/hw/ehca/ipz_pt_fn.h|   26 ++-
 drivers/infiniband/hw/ipath/Makefile  |1 -
 drivers/infiniband/hw/ipath/ipath_layer.c |  365 
 drivers/infiniband/hw/ipath/ipath_layer.h |   71 --
 drivers/infiniband/hw/ipath/ipath_verbs.h |2 -
 drivers/infiniband/hw/mlx4/qp.c   |   20 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c   |3 +-
 drivers/net/mlx4/cmd.c|3 +-
 21 files changed, 802 insertions(+), 715 deletions(-)
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_layer.c
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_layer.h


[PATCH 0/7] readahead cleanups and interleaved readahead take 3

2007-07-20 Thread Fengguang Wu
Andrew,

The following patches are based on yesterday's discussions, compiled and
tested OK:

smaller file_ra_state:
[PATCH 1/7] readahead: compacting file_ra_state
[PATCH 2/7] readahead: mmap read-around simplification
[PATCH 3/7] readahead: combine file_ra_state.prev_index/prev_offset into prev_pos

code cleanups:
[PATCH 4/7] readahead: remove several readahead macros
[PATCH 5/7] readahead: remove the limit max_sectors_kb imposed on max_readahead_kb

support of interleaved reads:
[PATCH 6/7] radixtree: introduce radix_tree_scan_hole()
[PATCH 7/7] readahead: basic support of interleaved reads

The diffstat is

 block/ll_rw_blk.c  |9 -
 fs/ext3/dir.c  |2 -
 fs/ext4/dir.c  |2 -
 fs/splice.c|2 -
 include/linux/fs.h |   14 +++-
 include/linux/mm.h |2 -
 include/linux/radix-tree.h |2 +
 lib/radix-tree.c   |   34 
 mm/filemap.c   |   17 +-
 mm/readahead.c |   58 +++
 10 files changed, 86 insertions(+), 56 deletions(-)

Regards,
Fengguang Wu
--


[PATCH 2/7] readahead: mmap read-around simplification

2007-07-20 Thread Fengguang Wu
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss
and make it an int.

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 include/linux/fs.h |3 +--
 mm/filemap.c   |4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -777,8 +777,7 @@ struct file_ra_state {
   there are only # of pages ahead */
 
unsigned int ra_pages;  /* Maximum readahead window */
-   unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
-   unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
+   int mmap_miss;  /* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
unsigned int prev_offset;   /* Offset where last read() ended in a page */
 };
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -1389,7 +1389,7 @@ retry_find:
 * Do we miss much more than hit in this file? If so,
 * stop bothering with read-ahead. It will only hurt.
 */
-   if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+   if (ra->mmap_miss > MMAP_LOTSAMISS)
goto no_cached_page;
 
/*
@@ -1415,7 +1415,7 @@ retry_find:
}
 
if (!did_readaround)
-   ra->mmap_hit++;
+   ra->mmap_miss--;
 
/*
 * We have a locked page in the page cache, now we need to check

--


[PATCH 3/7] readahead: combine file_ra_state.prev_index/prev_offset into prev_pos

2007-07-20 Thread Fengguang Wu
Combine the file_ra_state members
unsigned long prev_index
unsigned int prev_offset
into
loff_t prev_pos

It is more consistent and better supports huge files.

Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/ext3/dir.c  |2 +-
 fs/ext4/dir.c  |2 +-
 fs/splice.c|2 +-
 include/linux/fs.h |3 +--
 mm/filemap.c   |   11 ++-
 mm/readahead.c |   15 ---
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -778,8 +778,7 @@ struct file_ra_state {
 
unsigned int ra_pages;  /* Maximum readahead window */
int mmap_miss;  /* Cache miss stat for mmap accesses */
-   unsigned long prev_index;   /* Cache last read() position */
-   unsigned int prev_offset;   /* Offset where last read() ended in a page */
+   loff_t prev_pos;/* Cache last read() position */
 };
 
 /*
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -881,8 +881,8 @@ void do_generic_mapping_read(struct addr
 
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
-   prev_index = ra.prev_index;
-   prev_offset = ra.prev_offset;
+   prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+   prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -968,7 +968,6 @@ page_ok:
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
prev_offset = offset;
-   ra.prev_offset = offset;
 
page_cache_release(page);
if (ret == nr && desc->count)
@@ -1055,7 +1054,9 @@ no_cached_page:
 
 out:
*_ra = ra;
-   _ra->prev_index = prev_index;
+   _ra->prev_pos = prev_index;
+   _ra->prev_pos <<= PAGE_CACHE_SHIFT;
+   _ra->prev_pos |= prev_offset;
 
*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
if (filp)
@@ -1435,7 +1436,7 @@ retry_find:
 * Found the page and have a reference on it.
 */
mark_page_accessed(page);
-   ra->prev_index = page->index;
+   ra->prev_pos = page->index << PAGE_CACHE_SHIFT;
return page;
 
 outside_data_content:
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
ra->ra_pages = mapping->backing_dev_info->ra_pages;
-   ra->prev_index = -1;
+   ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
 
@@ -318,7 +318,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -342,11 +342,9 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   int max;/* max readahead pages */
-   int sequential;
-
-   max = ra->ra_pages;
-   sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+   int max = ra->ra_pages; /* max readahead pages */
+   pgoff_t prev_offset;
+   int sequential;
 
/*
 * It's the expected callback offset, assume sequential access.
@@ -360,6 +358,9 @@ ondemand_readahead(struct address_space 
goto readit;
}
 
+   prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+   sequential = offset - prev_offset <= 1UL || req_size > max;
+
/*
 * Standalone, small read.
 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-rc6-mm1.orig/fs/ext3/dir.c
+++ linux-2.6.22-rc6-mm1/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
sb->s_bdev->bd_inode->i_mapping,
>f_ra, filp,
index, 1);
-   filp->f_ra.prev_index = index;
+   filp->f_ra.prev_pos = index << PAGE_CACHE_SHIFT;
bh = ext3_bread(NULL, inode, blk, 0, );
}
 
--- linux-2.6.22-rc6-mm1.orig/fs/ext4/dir.c
+++ 

[PATCH 1/7] readahead: compacting file_ra_state

2007-07-20 Thread Fengguang Wu
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64-bit CPUs when
a lot of files are open.

CC: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 include/linux/fs.h |8 
 mm/filemap.c   |2 +-
 mm/readahead.c |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -771,12 +771,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-   pgoff_t start;  /* where readahead started */
-   unsigned long size; /* # of readahead pages */
-   unsigned long async_size;   /* do asynchronous readahead when
+   pgoff_t start;  /* where readahead started */
+   unsigned int size;  /* # of readahead pages */
+   unsigned int async_size;/* do asynchronous readahead when
   there are only # of pages ahead */
 
-   unsigned long ra_pages; /* Maximum readahead window */
+   unsigned int ra_pages;  /* Maximum readahead window */
unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -840,7 +840,7 @@ static void shrink_readahead_size_eio(st
if (count > 5)
return;
count++;
-   printk(KERN_WARNING "Reducing readahead size to %luK\n",
+   printk(KERN_WARNING "Reducing readahead size to %dK\n",
ra->ra_pages << (PAGE_CACHE_SHIFT - 10));
 }
 
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -342,7 +342,7 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   unsigned long max;  /* max readahead pages */
+   int max;/* max readahead pages */
int sequential;
 
max = ra->ra_pages;

--


[PATCH 6/7] radixtree: introduce radix_tree_scan_hole()

2007-07-20 Thread Fengguang Wu
Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct.
It can help debug (and document) a possible smarter one in the future.

Cc: Nick Piggin <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---

 include/linux/radix-tree.h |2 ++
 lib/radix-tree.c   |   34 ++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-rc6-mm1.orig/include/linux/radix-tree.h
+++ linux-2.6.22-rc6-mm1/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-rc6-mm1.orig/lib/radix-tree.c
+++ linux-2.6.22-rc6-mm1/lib/radix-tree.c
@@ -601,6 +601,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif
 
+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   unsigned long i;
+
+   for (i = 0; i < max_scan; i++) {
+   if (!radix_tree_lookup(root, index))
+   break;
+   if (++index == 0)
+   break;
+   }
+
+   return index;
+}
+
+/**
+ * radix_tree_scan_hole - scan for hole
+ * @root:  radix tree root
+ * @index: index key
+ * @max_scan:  advice on max items to scan (it may scan a little more)
+ *
+ *  Scan forward from @index for a hole/empty item, stop when
+ *  - hit hole
+ *  - wrap-around to index 0
+ *  - @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
unsigned int max_items, unsigned long *next_index)

--


[PATCH 5/7] readahead: remove the limit max_sectors_kb imposed on max_readahead_kb

2007-07-20 Thread Fengguang Wu
Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable, especially since max_sectors_kb cannot
grow larger than max_hw_sectors_kb, which can be rather small for some disk
drives.

Cc: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
Acked-by: Jens Axboe <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c |9 -
 1 file changed, 9 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/block/ll_rw_blk.c
+++ linux-2.6.22-rc6-mm1/block/ll_rw_blk.c
@@ -3945,7 +3945,6 @@ queue_max_sectors_store(struct request_q
max_hw_sectors_kb = q->max_hw_sectors >> 1,
page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
ssize_t ret = queue_var_store(_sectors_kb, page, count);
-   int ra_kb;
 
if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
return -EINVAL;
@@ -3954,14 +3953,6 @@ queue_max_sectors_store(struct request_q
 * values synchronously:
 */
spin_lock_irq(q->queue_lock);
-   /*
-* Trim readahead window as well, if necessary:
-*/
-   ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-   if (ra_kb > max_sectors_kb)
-   q->backing_dev_info.ra_pages =
-   max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
q->max_sectors = max_sectors_kb << 1;
spin_unlock_irq(q->queue_lock);
 

--


[PATCH 4/7] readahead: remove several readahead macros

2007-07-20 Thread Fengguang Wu
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 include/linux/mm.h |2 --
 mm/readahead.c |   10 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/mm.h
+++ linux-2.6.22-rc6-mm1/include/linux/mm.h
@@ -1148,8 +1148,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD   128 /* kbytes */
 #define VM_MIN_READAHEAD   16  /* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT   256 /* max pages in a row in cache before
-* turning readahead off */
 
 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);
 
-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define MAX_RA_PAGES   (VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define MIN_RA_PAGES   DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-   .ra_pages   = MAX_RA_PAGES,
+   .ra_pages   = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
.state  = 0,
.capabilities   = BDI_CAP_MAP_COPY,
.unplug_io_fn   = default_unplug_io_fn,

--


[PATCH 7/7] readahead: basic support of interleaved reads

2007-07-20 Thread Fengguang Wu
This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each other's readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect
interleaved reads _actively_, which would require a probe into the page cache
(which means a little more overhead for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it can do this job pretty well.

Negative and positive examples (or what you can expect from it):

1) It cannot detect and correctly serve perfect request-by-request
   interleaved reads:

    time    stream 1    stream 2
    0       1
    1                   1001
    2       2
    3                   1002
    4       3
    5                   1003
    6       4
    7                   1004
    8       5
    9                   1005

   Here no single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead being started is high. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to ramp up and won't be disturbed by other streams.

    time    stream 1    stream 2
    0       1
    1       2
    2                   1001
    3       3
    4                   1002
    5                   1003
    6       4
    7       5
    8                   1004
    9       6
    10                  1005
    11      7
    12                  1006
    13                  1007

   Here stream 1 will start a readahead at page 2, and stream 2 will start its
   first readahead at page 1003. From then on the two streams will be served
   correctly.

Cc: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -363,6 +363,29 @@ ondemand_readahead(struct address_space 
}
 
/*
+* Hit a marked page without valid readahead state.
+* E.g. interleaved reads.
+* Query the pagecache for async_size, which normally equals to
+* readahead size. Ramp it up and use it as the new readahead size.
+*/
+   if (hit_readahead_marker) {
+   pgoff_t start;
+
+   read_lock_irq(>tree_lock);
start = radix_tree_scan_hole(>page_tree, offset, max+1);
+   read_unlock_irq(>tree_lock);
+
+   if (!start || start - offset > max)
+   return 0;
+
+   ra->start = start;
+   ra->size = start - offset;  /* old async_size */
+   ra->size = get_next_ra_size(ra, max);
+   ra->async_size = ra->size;
+   goto readit;
+   }
+
+   /*
 * It may be one of
 *  - first read on start of file
 *  - sequential cache miss
@@ -373,16 +396,6 @@ ondemand_readahead(struct address_space 
ra->size = get_init_ra_size(req_size, max);
ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
-   /*
-* Hit on a marked page without valid readahead state.
-* E.g. interleaved reads.
-* Not knowing its readahead pos/size, bet on the minimal possible one.
-*/
-   if (hit_readahead_marker) {
-   ra->start++;
-   ra->size = get_next_ra_size(ra, max);
-   }
-
 readit:
return ra_submit(ra, mapping, filp);
 }

--


Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Fengguang Wu
On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 21 Jul 2007, Fengguang Wu wrote:
> >
> > Sorry, forgot to prefix the patch titles with [readahead].
> > Should I repost?
> 
> Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
> even if it does mean missing the merge window this time around. 

OK. Let me repost it...



Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Linus Torvalds


On Sat, 21 Jul 2007, Fengguang Wu wrote:
>
> Sorry, forgot to prefix the patch titles with [readahead].
> Should I repost?

Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
even if it does mean missing the merge window this time around. 

Linus


Re: [ofa-general] [PATCH 5/5] ehca: Support small QP queues

2007-07-20 Thread Roland Dreier
thanks, applied.  I fixed this up myself to work with commit 20c2df83,
which got rid of the destructor argument to kmem_cache_create() -- you
probably want to check my tree to make sure it's OK.

Also the same as I said before about checkpatch.pl's warning:

WARNING: externs should be avoided in .c files
#337: FILE: drivers/infiniband/hw/ehca/ehca_pd.c:91:
+   extern struct kmem_cache *small_qp_cache;

please fix that up when you get a chance


Fixing labels after GNU indent (Re: [PATCH 1/2] run scripts/Lindent on it to match Documentation/CodingStyle)

2007-07-20 Thread Oleg Verych
[]
> > > sed -i -e 's/^\t*  \(\w*:\)/ \1/' "$@"
> > >
> > > which will replace the leading tabs and spaces with one space.
> > > It should leave case labels unmolested, as they should be indented with
> > > tabs, not 6 spaces.
> > >
> > > Any regexp ninjas want to have a go at something better?
> > 
> > I'm the one. Trying to write portable, optimized and easy to
> > understand scripts [0].
> > 
> > Please describe in more detail what must be done, and I will do it. Case labels
> > are handled very strangely in your example.
> 
> OK.  indent will indent labels to a column number that's a multiple of
> 8, plus 6.  So it may start in column 6, 14, 22, 30, etc.  I'm not quite
> sure what the definition of a label is; I had it as \w*: up there, but I
> don't know if that would match the _.  The point is to *not* handle case
> labels, only goto labels.

t=`printf '\t'`
sed -i "s_^\($t*\)  *\([^:]*:\)_\1\2_" "$@"

I'm not sure about leaving one space between \1 and \2 in the replacement;
without it, this removes the spaces between the (supposedly right-indented)
line start, i.e. nothing or tab(s), and the label, i.e. `label_name:' with no
space before the colon (`label_name' here is any run of non-colon characters).
Let's leave that kind of breakage to the compiler.

The variable $t is used for readability of the regex, and because POSIX
BREs leave a backslash followed by an ordinary character (such as \t)
undefined; POSIX sed defines only \n, as a newline.
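For checking the substitution outside sed, here is a minimal C sketch of the same one-line transformation using POSIX <regex.h>. The helper name unindent_label() is hypothetical, and the pattern is an ERE transcription of the BRE above; the spaces after the leading tabs are skipped by hand instead of relying on how the regex engine splits them between subexpressions.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: apply s_^\(\t*\)  *\([^:]*:\)_\1\2_ to one line.
 * Returns 1 if the line was rewritten into out, 0 if it was copied
 * unchanged, and -1 on a regex or buffer-size error.
 */
int unindent_label(const char *line, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m[2];
	const char *rest;
	size_t tabs;
	int matched;

	/* ERE form: leading tabs, one or more spaces, non-colon run, ':' */
	if (regcomp(&re, "^(\t*) +[^:]*:", REG_EXTENDED) != 0)
		return -1;
	matched = regexec(&re, line, 2, m, 0) == 0;
	regfree(&re);

	if (!matched) {
		if (strlen(line) + 1 > outsz)
			return -1;
		strcpy(out, line);
		return 0;
	}

	/* \1 is the run of leading tabs; drop the spaces that follow it. */
	tabs = (size_t)(m[1].rm_eo - m[1].rm_so);
	rest = line + m[1].rm_eo;
	while (*rest == ' ')
		rest++;

	if (tabs + strlen(rest) + 1 > outsz)
		return -1;
	memcpy(out, line, tabs);
	strcpy(out + tabs, rest);
	return 1;
}
```

Note that any line made of tabs plus spaces followed by text containing a colon matches, so a continuation line such as "\t\t  x = y ? a : b;" also loses its alignment spaces; the output is worth reviewing by eye.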

--
-o--=O`C
 #oo'L O
<___=E M


Re: [PATCH 3/5] ehca: Make ehca2ib_return_code() non-inline

2007-07-20 Thread Roland Dreier
thanks, applied


Re: [PATCH 2/5] ehca: Generate event when SRQ limit reached

2007-07-20 Thread Roland Dreier
thanks, applied.

BTW, does your SRQ-capable hardware support generating the "last WQE
reached" event?  There's not any reliable way to avoid problems when
destroying QPs attached to an SRQ without it, and the IB spec requires
CAs that support SRQs to generate it (o11-5.2.5 in chapter 11 of vol 1).

I don't see any code in ehca to generate the event, and IPoIB CM at
least will be very unhappy when using SRQs if the event is not
generated.

 - R.


Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Fengguang Wu
Sorry, forgot to prefix the patch titles with [readahead].
Should I repost?



Re: [ofa-general] [PATCH 1/5] ehca: Supports large page MRs

2007-07-20 Thread Roland Dreier
I applied this, but I agree with checkpatch.pl:

 > WARNING: externs should be avoided in .c files
 > #227: FILE: drivers/infiniband/hw/ehca/ehca_mrmw.c:67:
 > +extern int ehca_mr_largepage;
 > 
 > WARNING: externs should be avoided in .c files
 > #949: FILE: drivers/infiniband/hw/ehca/hcp_if.c:753:
 > +extern int ehca_debug_level;

if you need to use a variable in more than one .c file, put the extern
declaration in a common header that's included everywhere you use the
variable, including the .c file that it is defined in.  That way the
compiler can see if you get confused about the type of the variable.
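A minimal single-file sketch of that layout, with the "header" and the two kinds of .c files marked by comments. The header name and the ehca_debug_enabled() helper are hypothetical; only the variable names come from the warnings above.

```c
#include <assert.h>

/* ---- ehca_shared.h (hypothetical): the only extern declarations ---- */
extern int ehca_debug_level;
extern int ehca_mr_largepage;

/*
 * ---- the .c file that defines the variables also includes the header,
 * so a mismatch such as "long ehca_debug_level = 0;" here would be a
 * compile-time error instead of a silent bug ----
 */
int ehca_debug_level = 0;
int ehca_mr_largepage = 0;

/* ---- a .c file that uses them: includes the header, no local extern ---- */
int ehca_debug_enabled(int level)
{
	return ehca_debug_level >= level;
}
```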

When you get a chance, please post a follow-on patch to fix this.

 - R.


[PATCH 4/8] trivial filemap.c cleanups

2007-07-20 Thread Fengguang Wu
- remove unused local next_index in do_generic_mapping_read()
- convert some 'unsigned long' to pgoff_t
- wrap a long line

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/filemap.c |   16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -866,11 +866,10 @@ void do_generic_mapping_read(struct addr
 read_actor_t actor)
 {
struct inode *inode = mapping->host;
-   unsigned long index;
-   unsigned long offset;
-   unsigned long last_index;
-   unsigned long next_index;
-   unsigned long prev_index;
+   pgoff_t index;
+   pgoff_t offset;
+   pgoff_t last_index;
+   pgoff_t prev_index;
unsigned int prev_offset;
struct page *cached_page;
int error;
@@ -878,7 +877,6 @@ void do_generic_mapping_read(struct addr
 
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
-   next_index = index;
prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
@@ -1219,7 +1217,8 @@ out:
 }
 EXPORT_SYMBOL(generic_file_aio_read);
 
-int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
+int file_send_actor(read_descriptor_t * desc, struct page *page,
+   unsigned long offset, unsigned long size)
 {
ssize_t written;
unsigned long count = desc->count;
@@ -1272,7 +1271,6 @@ asmlinkage ssize_t sys_readahead(int fd,
 }
 
 #ifdef CONFIG_MMU
-static int FASTCALL(page_cache_read(struct file * file, unsigned long offset));
 /**
  * page_cache_read - adds requested page to the page cache if not already there
  * @file:  file to read
@@ -1281,7 +1279,7 @@ static int FASTCALL(page_cache_read(stru
  * This adds the requested page to the page cache if it isn't already there,
  * and schedules an I/O to read in its contents from disk.
  */
-static int fastcall page_cache_read(struct file * file, unsigned long offset)
+static int fastcall page_cache_read(struct file * file, pgoff_t offset)
 {
struct address_space *mapping = file->f_mapping;
struct page *page; 

--


[PATCH 2/8] mmap read-around simplification

2007-07-20 Thread Fengguang Wu
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss
and make it an int.
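A user-space sketch of why one counter is enough: if a single signed counter is incremented on a miss and decremented on a hit, the old test "miss > hit + MMAP_LOTSAMISS" becomes "miss - hit > MMAP_LOTSAMISS". The value 100 for MMAP_LOTSAMISS and the increment-on-miss step are assumptions here (the increment is not in the hunks below), and all names besides the field names are hypothetical.

```c
#include <assert.h>

#define MMAP_LOTSAMISS 100	/* assumed value, for illustration */

/* Before: separate hit and miss counters. */
struct ra_old {
	unsigned long mmap_hit;
	unsigned long mmap_miss;
};

/* After: one signed counter, ++ on a miss and -- on a hit. */
struct ra_new {
	int mmap_miss;
};

/* Read-around stays enabled unless misses clearly dominate. */
int old_should_readaround(const struct ra_old *ra)
{
	return !(ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS);
}

int new_should_readaround(const struct ra_new *ra)
{
	return !(ra->mmap_miss > MMAP_LOTSAMISS);
}

/* Replay the same access history through both schemes. */
int schemes_agree(unsigned int misses, unsigned int hits)
{
	struct ra_old o = { 0, 0 };
	struct ra_new n = { 0 };
	unsigned int i;

	for (i = 0; i < misses; i++) {
		o.mmap_miss++;
		n.mmap_miss++;
	}
	for (i = 0; i < hits; i++) {
		o.mmap_hit++;
		n.mmap_miss--;	/* may go negative, hence a signed int */
	}
	return old_should_readaround(&o) == new_should_readaround(&n);
}
```

The counter has to be signed: when hits outnumber misses it drops below zero, and an unsigned counter would wrap to a huge value and switch read-around off.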

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 include/linux/fs.h |3 +--
 mm/filemap.c   |4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -703,8 +703,7 @@ struct file_ra_state {
   there are only # of pages ahead */
 
unsigned int ra_pages;  /* Maximum readahead window */
-   unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
-   unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
+   int mmap_miss;  /* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
unsigned int prev_offset;   /* Offset where last read() ended in a page */
 };
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -1369,7 +1369,7 @@ retry_find:
 * Do we miss much more than hit in this file? If so,
 * stop bothering with read-ahead. It will only hurt.
 */
-   if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+   if (ra->mmap_miss > MMAP_LOTSAMISS)
goto no_cached_page;
 
/*
@@ -1395,7 +1395,7 @@ retry_find:
}
 
if (!did_readaround)
-   ra->mmap_hit++;
+   ra->mmap_miss--;
 
/*
 * We have a locked page in the page cache, now we need to check

--


[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb

2007-07-20 Thread Fengguang Wu
Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable, especially since max_sectors_kb cannot
grow larger than max_hw_sectors_kb, which can be rather small for some disk
drives.
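A minimal sketch of the behavioral change, assuming 4 KiB pages (PAGE_CACHE_SHIFT of 12). The two helper functions are hypothetical and only mirror the trimming logic that this patch removes from queue_max_sectors_store().

```c
#include <assert.h>

#define PAGE_CACHE_SHIFT 12	/* assumed: 4 KiB pages */

/* Before: lowering max_sectors_kb also trimmed the readahead window. */
unsigned long ra_pages_before(unsigned long ra_pages,
			      unsigned long max_sectors_kb)
{
	unsigned long ra_kb = ra_pages << (PAGE_CACHE_SHIFT - 10);

	if (ra_kb > max_sectors_kb)
		ra_pages = max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
	return ra_pages;
}

/* After: max_sectors_kb no longer touches ra_pages. */
unsigned long ra_pages_after(unsigned long ra_pages,
			     unsigned long max_sectors_kb)
{
	(void)max_sectors_kb;
	return ra_pages;
}
```

For example, writing 64 to max_sectors_kb used to cut a 128 KiB readahead window (32 pages) down to 16 pages; with the patch the window stays at 32 pages.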

Cc: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
Acked-by: Jens Axboe <[EMAIL PROTECTED]>
---
 block/ll_rw_blk.c |9 -
 1 file changed, 9 deletions(-)

--- linux-2.6.22-git15.orig/block/ll_rw_blk.c
+++ linux-2.6.22-git15/block/ll_rw_blk.c
@@ -3946,7 +3946,6 @@ queue_max_sectors_store(struct request_q
max_hw_sectors_kb = q->max_hw_sectors >> 1,
page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-   int ra_kb;
 
if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
return -EINVAL;
@@ -3955,14 +3954,6 @@ queue_max_sectors_store(struct request_q
 * values synchronously:
 */
spin_lock_irq(q->queue_lock);
-   /*
-* Trim readahead window as well, if necessary:
-*/
-   ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-   if (ra_kb > max_sectors_kb)
-   q->backing_dev_info.ra_pages =
-   max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
q->max_sectors = max_sectors_kb << 1;
spin_unlock_irq(q->queue_lock);
 

--


[PATCH 7/8] introduce radix_tree_scan_hole()

2007-07-20 Thread Fengguang Wu
Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct.
It can help debug (and document) a possibly smarter implementation in the future.
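A user-space sketch of the same loop, with a caller-supplied predicate standing in for radix_tree_lookup() so it runs without the kernel. scan_hole() and sample_present() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for radix_tree_lookup(): is there an item at index? */
typedef int (*present_fn)(unsigned long index);

unsigned long scan_hole(present_fn present, unsigned long index,
			unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (!present(index))
			break;		/* found a hole */
		if (++index == 0)
			break;		/* wrapped around to index 0 */
	}
	return index;		/* may be a non-hole if max_scan ran out */
}

/* Example contents: items at 5..7 and at ULONG_MAX. */
int sample_present(unsigned long index)
{
	return (index >= 5 && index <= 7) || index == ULONG_MAX;
}
```

With these contents, scanning from 5 returns 8 (the first hole), but with max_scan of 2 it returns 7, which is not a hole. A return value of 0 is ambiguous between a hole at index 0 and a wrap-around.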

Cc: Nick Piggin <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---

 include/linux/radix-tree.h |2 ++
 lib/radix-tree.c   |   34 ++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-git15.orig/include/linux/radix-tree.h
+++ linux-2.6.22-git15/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-git15.orig/lib/radix-tree.c
+++ linux-2.6.22-git15/lib/radix-tree.c
@@ -599,6 +599,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif
 
+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   unsigned long i;
+
+   for (i = 0; i < max_scan; i++) {
+   if (!radix_tree_lookup(root, index))
+   break;
+   if (++index == 0)
+   break;
+   }
+
+   return index;
+}
+
+/**
+ * radix_tree_scan_hole - scan for hole
+ * @root:  radix tree root
+ * @index: index key
+ * @max_scan:  advice on max items to scan (it may scan a little more)
+ *
+ *  Scan forward from @index for a hole/empty item, stop when
+ *  - hit hole
+ *  - wrap-around to index 0
+ *  - @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
unsigned int max_items, unsigned long *next_index)

--


[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos

2007-07-20 Thread Fengguang Wu
Combine the file_ra_state members
unsigned long prev_index
unsigned int prev_offset
into
loff_t prev_pos

It is more consistent and better supports huge files.
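A small user-space sketch of packing and unpacking prev_pos, assuming 4 KiB pages and using uint32_t to model a 32-bit pgoff_t; the helper names are hypothetical. It also shows why the page index must be widened to 64 bits before shifting: on a 32-bit machine, index << PAGE_CACHE_SHIFT is computed in 32 bits and wraps for positions at or beyond 4 GiB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_CACHE_SHIFT 12			/* assumed: 4 KiB pages */
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

/* Combine a page index and an in-page offset into one byte position. */
int64_t pack_prev_pos(uint32_t index, uint32_t offset)
{
	/* Widen first: the shift must happen in 64 bits. */
	return ((int64_t)index << PAGE_CACHE_SHIFT) | offset;
}

/* Recover the two old fields from prev_pos. */
uint32_t prev_index_of(int64_t pos)
{
	return (uint32_t)(pos >> PAGE_CACHE_SHIFT);
}

uint32_t prev_offset_of(int64_t pos)
{
	return (uint32_t)(pos & (PAGE_CACHE_SIZE - 1));
}

/* What a 32-bit index << PAGE_CACHE_SHIFT yields without widening. */
uint32_t narrow_shift(uint32_t index)
{
	return index << PAGE_CACHE_SHIFT;
}
```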

Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/ext3/dir.c  |2 +-
 fs/ext4/dir.c  |2 +-
 fs/splice.c|2 +-
 include/linux/fs.h |3 +--
 mm/filemap.c   |   11 ++-
 mm/readahead.c |   15 ---
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -704,8 +704,7 @@ struct file_ra_state {
 
unsigned int ra_pages;  /* Maximum readahead window */
int mmap_miss;  /* Cache miss stat for mmap accesses */
-   unsigned long prev_index;   /* Cache last read() position */
-   unsigned int prev_offset;   /* Offset where last read() ended in a page */
+   loff_t prev_pos;/* Cache last read() position */
 };
 
 /*
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -879,8 +879,8 @@ void do_generic_mapping_read(struct addr
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
-   prev_index = ra.prev_index;
-   prev_offset = ra.prev_offset;
+   prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+   prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -966,7 +966,6 @@ page_ok:
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
prev_offset = offset;
-   ra.prev_offset = offset;
 
page_cache_release(page);
if (ret == nr && desc->count)
@@ -1056,7 +1055,9 @@ no_cached_page:
 
 out:
*_ra = ra;
-   _ra->prev_index = prev_index;
+   _ra->prev_pos = prev_index;
+   _ra->prev_pos <<= PAGE_CACHE_SHIFT;
+   _ra->prev_pos |= prev_offset;
 
*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
if (cached_page)
@@ -1415,7 +1416,7 @@ retry_find:
 * Found the page and have a reference on it.
 */
mark_page_accessed(page);
-   ra->prev_index = page->index;
+   ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
vmf->page = page;
return ret | VM_FAULT_LOCKED;
 
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
ra->ra_pages = mapping->backing_dev_info->ra_pages;
-   ra->prev_index = -1;
+   ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
 
@@ -326,7 +326,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -350,11 +350,9 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   int max;/* max readahead pages */
-   int sequential;
-
-   max = ra->ra_pages;
-   sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+   int max = ra->ra_pages; /* max readahead pages */
+   pgoff_t prev_offset;
+   int sequential;
 
/*
 * It's the expected callback offset, assume sequential access.
@@ -368,6 +366,9 @@ ondemand_readahead(struct address_space 
goto readit;
}
 
+   prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+   sequential = offset - prev_offset <= 1UL || req_size > max;
+
/*
 * Standalone, small read.
 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-git15.orig/fs/ext3/dir.c
+++ linux-2.6.22-git15/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
sb->s_bdev->bd_inode->i_mapping,
&filp->f_ra, filp,
index, 1);
-   filp->f_ra.prev_index = index;
+   filp->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
bh = ext3_bread(NULL, inode, blk, 0, );
  

[PATCH 5/8] remove several readahead macros

2007-07-20 Thread Fengguang Wu
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.
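For reference, here is what the removed macros evaluated to, as a sketch parameterized on the page size (the function names are hypothetical): MAX_RA_PAGES rounds down and MIN_RA_PAGES rounds up, which matters with 64 KiB pages.

```c
#include <assert.h>

#define VM_MAX_READAHEAD 128	/* kbytes */
#define VM_MIN_READAHEAD 16	/* kbytes (includes current page) */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old MAX_RA_PAGES: rounds down. */
unsigned long max_ra_pages(unsigned long page_size)
{
	return VM_MAX_READAHEAD * 1024 / page_size;
}

/* Old MIN_RA_PAGES: rounds up, so it never becomes 0. */
unsigned long min_ra_pages(unsigned long page_size)
{
	return DIV_ROUND_UP(VM_MIN_READAHEAD * 1024, page_size);
}
```

With 4 KiB pages this gives 32 and 4 pages; with 64 KiB pages it gives 2 and 1, whereas a rounded-down minimum would have been 0.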

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 include/linux/mm.h |2 --
 mm/readahead.c |   10 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/mm.h
+++ linux-2.6.22-git15/include/linux/mm.h
@@ -1136,8 +1136,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD   128 /* kbytes */
 #define VM_MIN_READAHEAD   16  /* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT   256 /* max pages in a row in cache before
-* turning readahead off */
 
 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);
 
-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define MAX_RA_PAGES   (VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define MIN_RA_PAGES   DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-   .ra_pages   = MAX_RA_PAGES,
+   .ra_pages   = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
.state  = 0,
.capabilities   = BDI_CAP_MAP_COPY,
.unplug_io_fn   = default_unplug_io_fn,

--


[PATCH 0/8] readahead cleanups and interleaved readahead take 2

2007-07-20 Thread Fengguang Wu
Linus,

To save you from some merge conflicts, I rebased this readahead patchset
to 2.6.22-git15.

The following patches are based on yesterday's discussions, compiled and
tested OK.

smaller file_ra_state:
[PATCH 1/8] compacting file_ra_state
[PATCH 2/8] mmap read-around simplification
[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos

code cleanups:
[PATCH 4/8] trivial filemap.c cleanups
[PATCH 5/8] remove several readahead macros
[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb

support of interleaved reads:
[PATCH 7/8] introduce radix_tree_scan_hole()
[PATCH 8/8] basic support of interleaved reads


The diffstat is

 block/ll_rw_blk.c  |9 -
 fs/ext3/dir.c  |2 -
 fs/ext4/dir.c  |2 -
 fs/splice.c|2 -
 include/linux/fs.h |   14 +++-
 include/linux/mm.h |2 -
 include/linux/radix-tree.h |2 +
 lib/radix-tree.c   |   34 
 mm/filemap.c   |   31 +-
 mm/readahead.c |   58 +++
 10 files changed, 92 insertions(+), 64 deletions(-)

Regards,
Fengguang Wu
---


[PATCH 8/8] basic support of interleaved reads

2007-07-20 Thread Fengguang Wu
This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each other's readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect
interleaved reads _actively_, which would require a probe into the page cache
(and thus a little more overhead for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it does this job pretty well.

Negative and positive examples (or what you can expect from it):

1) It cannot detect and correctly serve perfect request-by-request
   interleaved reads:
   time  stream 1  stream 2
      0         1
      1                1001
      2         2
      3                1002
      4         3
      5                1003
      6         4
      7                1004
      8         5
      9                1005
Here not a single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead being started is high. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to ramp up and won't be disturbed by other streams.

   time  stream 1  stream 2
      0         1
      1         2
      2                1001
      3         3
      4                1002
      5                1003
      6         4
      7         5
      8                1004
      9         6
     10                1005
     11         7
     12                1006
     13                1007
Here stream 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served correctly.

Cc: Nick Piggin <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -371,6 +371,29 @@ ondemand_readahead(struct address_space 
}
 
/*
+* Hit a marked page without valid readahead state.
+* E.g. interleaved reads.
+* Query the pagecache for async_size, which normally equals to
+* readahead size. Ramp it up and use it as the new readahead size.
+*/
+   if (hit_readahead_marker) {
+   pgoff_t start;
+
+   read_lock_irq(&mapping->tree_lock);
+   start = radix_tree_scan_hole(&mapping->page_tree, offset, max+1);
+   read_unlock_irq(&mapping->tree_lock);
+
+   if (!start || start - offset > max)
+   return 0;
+
+   ra->start = start;
+   ra->size = start - offset;  /* old async_size */
+   ra->size = get_next_ra_size(ra, max);
+   ra->async_size = ra->size;
+   goto readit;
+   }
+
+   /*
 * It may be one of
 *  - first read on start of file
 *  - sequential cache miss
@@ -381,16 +404,6 @@ ondemand_readahead(struct address_space 
ra->size = get_init_ra_size(req_size, max);
ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
-   /*
-* Hit on a marked page without valid readahead state.
-* E.g. interleaved reads.
-* Not knowing its readahead pos/size, bet on the minimal possible one.
-*/
-   if (hit_readahead_marker) {
-   ra->start++;
-   ra->size = get_next_ra_size(ra, max);
-   }
-
 readit:
return ra_submit(ra, mapping, filp);
 }

--


[PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Fengguang Wu
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64-bit CPUs when
a lot of files are open.
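A user-space sketch of the effect on struct size, using mock-ups of struct file_ra_state before the series, after [PATCH 1/8], and after patches 1-3 (pgoff_t modeled as unsigned long, loff_t as int64_t; the struct names are hypothetical). On a typical LP64 target such as x86_64 the sizes come out to 64, 56 and 32 bytes; on 32-bit targets the savings are small or nil.

```c
#include <assert.h>
#include <stdint.h>

/* Before the series (2.6.22-git15). */
struct ra_state_old {
	unsigned long start;		/* pgoff_t */
	unsigned long size;
	unsigned long async_size;
	unsigned long ra_pages;
	unsigned long mmap_hit;
	unsigned long mmap_miss;
	unsigned long prev_index;
	unsigned int prev_offset;
};

/* After [PATCH 1/8]: window sizes become unsigned int. */
struct ra_state_patch1 {
	unsigned long start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
	unsigned long mmap_hit;
	unsigned long mmap_miss;
	unsigned long prev_index;
	unsigned int prev_offset;
};

/* After patches 1-3: mmap_hit folded away, prev_* merged into prev_pos. */
struct ra_state_patch3 {
	unsigned long start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
	int mmap_miss;
	int64_t prev_pos;		/* loff_t */
};
```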

CC: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 include/linux/fs.h |8 
 mm/readahead.c |2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -697,12 +697,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-   pgoff_t start;  /* where readahead started */
-   unsigned long size; /* # of readahead pages */
-   unsigned long async_size;   /* do asynchronous readahead when
+   pgoff_t start;  /* where readahead started */
+   unsigned int size;  /* # of readahead pages */
+   unsigned int async_size;/* do asynchronous readahead when
   there are only # of pages ahead */
 
-   unsigned long ra_pages; /* Maximum readahead window */
+   unsigned int ra_pages;  /* Maximum readahead window */
unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -350,7 +350,7 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   unsigned long max;  /* max readahead pages */
+   int max;/* max readahead pages */
int sequential;
 
max = ra->ra_pages;

--


[PATCH] Kconfig: Remove top level menu "Code maturity level options"

2007-07-20 Thread Al Boldi

This patch removes the top level menu "Code maturity level options", and 
moves its options into menu "General setup".

This makes Kconfig less cluttered and easier to set up.


Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Al Boldi <[EMAIL PROTECTED]>

---
--- a/init/Kconfig  2007-07-09 06:38:47.0 +0300
+++ b/init/Kconfig  2007-07-21 06:42:06.0 +0300
@@ -7,7 +7,7 @@ config DEFCONFIG_LIST
default "/boot/config-$UNAME_RELEASE"
default "arch/$ARCH/defconfig"
 
-menu "Code maturity level options"
+menu "General setup"
 
 config EXPERIMENTAL
bool "Prompt for development and/or incomplete code/drivers"
@@ -61,9 +61,6 @@ config INIT_ENV_ARG_LIMIT
  Maximum of each of the number of arguments and environment
  variables passed to init from the kernel command line.
 
-endmenu
-
-menu "General setup"
 
 config LOCALVERSION
string "Local version - append to kernel release"



Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

On 7/21/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:

Hopefully this bug should be 100% reproducible at boot time anyway.
Don't care much for XFS and unionfs, but hoping deselecting ATA from
the config doesn't change the variables much in this equation. ]



Gargh! My system obviously cannot boot without libata. Guess it's
time to go through git log and see how to fix that build breakage
myself ...

Michal, how did you even manage to build / boot this kernel!



On 7/21/07, Greg KH <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
> > On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH <[EMAIL PROTECTED]> wrote:
> >
> > > --- a/kernel/params.c
> > > +++ b/kernel/params.c
> > > @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
> > > kobject_set_name(>kobj, name);
> > > kobject_init(>kobj);
> > > ret = kobject_add(>kobj);
> > > -   BUG_ON(ret < 0);
> > > +   if (ret) {
> > > +   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
> > > +   "the system will be unstable now.\n", name);
> > > +   return;
> > > +   }
> >
> > It would be nice to print the value of `ret' too.


What I'm surprised about is that %eax doesn't seem to contain the
return value `ret' of kobject_add(). It's 1, which is funny, given:

ret = kobject_add(>kobj);
BUG_ON(ret < 0);

One wouldn't expect BUG() -- or the corresponding exception handler --
to clobber registers, that would be a sad day.



But I cracked this one alright. His .config has CONFIG_PROFILE_LIKELY=y
which replaces unlikely() / likely() with do_check_likely() and forces
gcc to clobber %eax with the condition itself, which in our case was
(ret < 0) == TRUE, and thus, the "1" value we saw in %eax in the
register dumps.

We should probably document somewhere that CONFIG_PROFILE_LIKELY
is not good for debugging.

Hmmm ... thinking out aloud here, but probably I don't need to fix that
libata breakage at all. I'll just put the BUG_ON(ret < 0) back in the
code, deselect PROFILE_LIKELY, and this time we _will_ have the
return of kobject_add() in %eax ...


That'll at least clear up the EEXIST vs EINVAL mystery, that'll be a
good data point, yes.

Anyway, I guess I must stop my running commentary -- will only post
after this is cleared up now :-)

Satyam


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Rob Landley
On Friday 20 July 2007 4:09:36 am Greg KH wrote:
> On Fri, Jul 20, 2007 at 09:54:01AM +0200, Cornelia Huck wrote:
> > On Fri, 20 Jul 2007 00:00:01 -0700,
> >
> > Greg KH <[EMAIL PROTECTED]> wrote:
> > > > I don't insist on it, mknod insists on it.  You cannot mknod a dev
> > > > node without specifying block or char.
> > > >
> > > > You're saying that sysfs should provide major and minor numbers
> > > > without anywhere specifying "char" or "block", meaning the major and
> > > > minor numbers cannot be _used_.  I am insisting on getting the third
> > > > piece of information without which "major" and "minor" are useless.
> > > >
> > > > I asked very specifically about this at OLS, several times.  What
> > > > you're telling me now seems to contradict what you told me then.
> > >
> > > Here's the rule:
> > >   If the SUBSYSTEM is "block", it's a block device.  Otherwise
> > >   it's a char device.
> >
> > That's actually quite confusing to the casual reader, since:
> > > But also realize that the majority of events you will get have nothing
> > > to do with device nodes.  I think you are forgetting this fact.
> >
> > So the rule should be:
> > If the SUBSYSTEM is "block" (implying major/minor are provided),
> > it's a block device.
> > If the SUBSYSTEM is not "block", and major/minor are provided,
> > it's a char device.
> > If major/minor are not provided, the event/device is not
> > relevant to device node creation.
>
> Yes, that is much more descriptive, thanks.

agreed, thanks.

I'll try to post an updated version of my hotplug documentation later tonight.  
(Just a _touch_ jetlagged at the moment, though.  It may only be 9:47
California time, but it's 11:47 on the east coast.  I think.)

> greg k-h

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.


Re: possible latency issues in seq_read

2007-07-20 Thread Eric Dumazet

Chris Friesen wrote:
> Lee Revell wrote:
>> On 7/20/07, Chris Friesen <[EMAIL PROTECTED]> wrote:
>>> We've run into an issue (on 2.6.10) where calling "lsof" triggers lost
>>> packets on our server.  Preempt is disabled, and NAPI is enabled.
>>
>> Can you reproduce with a recent kernel?  Lots of latency issues have
>> been fixed since then.
>
> Unfortunately I have to fix it on this version (the bug was found on
> shipped product), so if there was a difference I'd have to isolate the
> changes and backport them.  Also, I can't run the software that triggers
> the problem on a newer kernel as it has dependencies on various patches
> that are not in mainline.
>
> Basically what I'd like to know is whether calling schedule() in
> seq_read() is safe or whether it would break assumptions made by
> seq_file users.

It wont help much. seq_read() is fine in itself.

The problem is in established_get_next() and established_get_first() not 
allowing softirq processing, while scanning a possibly huge hash table, even 
if few sockets are hashed in.


As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to 
check the diffs between linux-2.6.10 & linux-2.6.11


files:

include/linux/sched.h
net/core/sock.c  (__release_sock() latency)
net/ipv4/tcp_ipv4.c  (/proc/net/tcp latency)
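A userspace sketch of the pattern Eric describes, not the actual net/ipv4/tcp_ipv4.c code: the walk over a large, mostly empty hash table is split into batches, and between batches a hook runs at the point where kernel code would drop the bucket lock and let softirqs run (in 2.6.11+ via cond_resched_softirq()). `struct bucket`, `scan_buckets()`, `count_yield()` and the batch size are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for one bucket of the established hash table. */
struct bucket {
    int nr_sockets;
};

/* Called between batches; in the kernel this is roughly where pending
 * softirqs would get a chance to run. */
typedef void (*yield_fn)(void *ctx);

/* Walk all n buckets, counting sockets, and call yield() after every
 * `batch` buckets, so the work between two yield points is bounded by
 * the batch size rather than by the table size. Returns sockets found. */
static size_t scan_buckets(const struct bucket *tbl, size_t n, size_t batch,
                           yield_fn yield, void *ctx)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        found += (size_t)tbl[i].nr_sockets;
        if (batch != 0 && (i + 1) % batch == 0)
            yield(ctx);
    }
    return found;
}

/* Example hook: only count how often the scan would have yielded. */
static void count_yield(void *ctx)
{
    ++*(size_t *)ctx;
}
```

Even when almost every bucket is empty, the scan still visits them all, which is why the yield points are tied to buckets visited and not to sockets found.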




Re: [PATCH] Use descriptor's functions instead of inline assembly

2007-07-20 Thread Chris Wright
* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
> This patch provides a new set of functions for managing the descriptor
> tables that can be used instead of putting the raw assembly in .c files.

Looks alright, some cleanups below

> Remodeling of store_tr() suggested by Frederik Deweerdt.
> 
> Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c
> index 6c34bdd..dde41d7 100644
> --- a/arch/x86_64/kernel/head64.c
> +++ b/arch/x86_64/kernel/head64.c
> @@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * real_mode_data)
>  
>   for (i = 0; i < IDT_ENTRIES; i++)
>   set_intr_gate(i, early_idt_handler);
> - asm volatile("lidt %0" :: "m" (idt_descr));
> + load_idt((const struct desc_ptr *)&idt_descr);

No need for extra casting

>   early_printk("Kernel alive\n");
>  
> diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
> index 7503068..7c50a12 100644
> --- a/arch/x86_64/kernel/reboot.c
> +++ b/arch/x86_64/kernel/reboot.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -132,7 +133,7 @@ void machine_emergency_restart(void)
>   }
>  
>   case BOOT_TRIPLE: 
> - __asm__ __volatile__("lidt (%0)": :"r" (&no_idt));
> + load_idt((const struct desc_ptr *)&no_idt);

same here, plus opportunity for cleanup

>   __asm__ __volatile__("int3");
>  
>   reboot_type = BOOT_KBD;
> diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
> index 1200aaa..fef7290 100644
> --- a/arch/x86_64/kernel/setup64.c
> +++ b/arch/x86_64/kernel/setup64.c
> @@ -224,8 +224,8 @@ void __cpuinit cpu_init (void)
>   memcpy(cpu_gdt(cpu), cpu_gdt_table, GDT_SIZE);
>  
>   cpu_gdt_descr[cpu].size = GDT_SIZE;
> - asm volatile("lgdt %0" :: "m" (cpu_gdt_descr[cpu]));
> - asm volatile("lidt %0" :: "m" (idt_descr));
> + load_gdt((const struct desc_ptr *)&cpu_gdt_descr[cpu]);
> + load_idt((const struct desc_ptr *)&idt_descr);

same here

>   memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
>   syscall_init();
> diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c
> index b39d478..ddedadf 100644
> --- a/arch/x86_64/kernel/suspend.c
> +++ b/arch/x86_64/kernel/suspend.c
> @@ -32,9 +32,9 @@ void __save_processor_state(struct saved_context *ctxt)
>   /*
>* descriptor tables
>*/
> - asm volatile ("sgdt %0" : "=m" (ctxt->gdt_limit));
> - asm volatile ("sidt %0" : "=m" (ctxt->idt_limit));
> - asm volatile ("str %0"  : "=m" (ctxt->tr));
> + store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
> + store_idt((struct desc_ptr *)&ctxt->idt_limit);

same here, opportunity for cleanup

> + store_tr(ctxt->tr);
>  
>   /* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
>   /*
> @@ -91,8 +91,9 @@ void __restore_processor_state(struct saved_context *ctxt)
>* now restore the descriptor tables to their proper values
>* ltr is done i fix_processor_context().
>*/
> - asm volatile ("lgdt %0" :: "m" (ctxt->gdt_limit));
> - asm volatile ("lidt %0" :: "m" (ctxt->idt_limit));
> + load_gdt((const struct desc_ptr *)&ctxt->gdt_limit);
> + load_idt((const struct desc_ptr *)&ctxt->idt_limit);
> + 
>  
>   /*
>* segment registers
> diff --git a/include/asm-x86_64/desc.h b/include/asm-x86_64/desc.h
> index ac991b5..f2b0a6f 100644
> --- a/include/asm-x86_64/desc.h
> +++ b/include/asm-x86_64/desc.h
> @@ -20,6 +20,15 @@ extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
>  #define load_LDT_desc() asm volatile("lldt %w0"::"r" (GDT_ENTRY_LDT*8))
>  #define clear_LDT()  asm volatile("lldt %w0"::"r" (0))
>  
> +static inline unsigned long __store_tr(void)
> +{
> +   unsigned long tr;
> +   asm volatile ("str %w0":"=r" (tr));
> +   return tr;
> +}

native_store_tr (although I've no objection to just fixing the interface)
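To illustrate why the casts are unnecessary: once a descriptor variable is declared with the real type, as the follow-up patch does for `no_idt`, `&var` already matches a `const struct desc_ptr *` parameter. This is a userspace sketch under stated assumptions. The `struct desc_ptr` layout is assumed to be the packed limit/base pair that lgdt/lidt take, and this `load_idt()` is a hypothetical stand-in that only records its argument, since lidt is privileged.

```c
/* Assumed layout of the lgdt/lidt memory operand: 16-bit limit followed
 * by the base address, with no padding in between. */
struct desc_ptr {
    unsigned short size;
    unsigned long address;
} __attribute__((packed));

/* Hypothetical stand-in for load_idt(). A kernel version would issue
 * something like asm volatile("lidt %0" :: "m" (*dtr)); here we only
 * remember which descriptor would have been loaded. */
static const struct desc_ptr *last_loaded_idt;

static inline void load_idt(const struct desc_ptr *dtr)
{
    last_loaded_idt = dtr;
}

/* Declared with the real type instead of "static long no_idt[3]",
 * so callers can write load_idt(&no_idt) without a cast. */
static struct desc_ptr no_idt;
```

With the old `static long no_idt[3]`, `&no_idt` has type `long (*)[3]`, which is why the original patch needed a cast at that call site.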


Index: linus-2.6/arch/x86_64/kernel/head64.c
===
--- linus-2.6.orig/arch/x86_64/kernel/head64.c
+++ linus-2.6/arch/x86_64/kernel/head64.c
@@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * r
 
for (i = 0; i < IDT_ENTRIES; i++)
set_intr_gate(i, early_idt_handler);
-   load_idt((const struct desc_ptr *)&idt_descr);
+   load_idt(&idt_descr);
 
early_printk("Kernel alive\n");
 
Index: linus-2.6/arch/x86_64/kernel/reboot.c
===
--- linus-2.6.orig/arch/x86_64/kernel/reboot.c
+++ linus-2.6/arch/x86_64/kernel/reboot.c
@@ -24,7 +24,7 @@
 void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
 
-static long no_idt[3];
+static struct desc_ptr no_idt;
 static enum { 
BOOT_TRIPLE = 't',
BOOT_KBD = 'k'
@@ -133,7 +133,7 @@ void 

Re: [PATCH] infiniband mlx4: potential leaks in __mlx4_ib_modify_qp

2007-07-20 Thread Roland Dreier
thanks, applied.


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

[ Considering this has sufficiently excited me, I became the second person
to illegitimately download 2.6.22-mm1 and am presently building Michal's
config. The strange thing is that I couldn't get 22-mm1 to even build with
the posted .config -- so had to deselect XFS, ATA, unionfs.

Hopefully this bug should be 100% reproducible at boot time anyway.
Don't care much for XFS and unionfs, but hoping deselecting ATA from
the config doesn't change the variables much in this equation. ]


On 7/21/07, Greg KH <[EMAIL PROTECTED]> wrote:

On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
> On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH <[EMAIL PROTECTED]> wrote:
>
> > --- a/kernel/params.c
> > +++ b/kernel/params.c
> > @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
> > kobject_set_name(&mk->kobj, name);
> > kobject_init(&mk->kobj);
> > ret = kobject_add(&mk->kobj);
> > -   BUG_ON(ret < 0);
> > +   if (ret) {
> > +   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
> > +   "the system will be unstable now.\n", name);
> > +   return;
> > +   }
>
> It would be nice to print the value of `ret' too.



What I'm surprised about is that %eax doesn't seem to contain the
return value `ret' of kobject_add(). It's 1, which is funny, given:

ret = kobject_add(&mk->kobj);
BUG_ON(ret < 0);

One wouldn't expect BUG() -- or the corresponding exception handler --
to clobber registers, that would be a sad day.



Ok, how about this version:

---
 kernel/params.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,12 @@ static void __init kernel_param_sysfs_se
kobject_set_name(&mk->kobj, name);
kobject_init(&mk->kobj);
ret = kobject_add(&mk->kobj);
-   BUG_ON(ret < 0);
+   if (ret) {
+   printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
+ "error number %d\n", name, ret);
+   printk(KERN_ERR "The system will be unstable now.\n");
+   return;
+   }
param_sysfs_setup(mk, kparam, num_params, name_skip);
kobject_uevent(>kobj, KOBJ_ADD);
 }



I'm building with this:

if (ret) {
   printk("~ .%s.%d.%s. ~\n", name, ret, kparam->name);
   return;
}

To also print out the evil kparam->name that caused us to crash.
When ret == -EINVAL, name would be "", so it's not so helpful alone.
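A userspace sketch of the reporting being asked for: log the module name, the error returned by kobject_add() and the offending parameter's name, instead of a bare BUG_ON(). `report_sysfs_failure()` and its arguments are hypothetical. kobject_add() returns a negative errno, hence the `-ret` passed to strerror().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the failure message into buf. An empty
 * module name is printed as '' and the parameter name is included, so
 * the culprit can still be identified in that case. */
static int report_sysfs_failure(char *buf, size_t len, const char *modname,
                                const char *param, int ret)
{
    return snprintf(buf, len,
                    "Module '%s' (param '%s') failed to be added to sysfs, "
                    "error number %d (%s)",
                    modname, param, ret, strerror(-ret));
}
```

In the kernel the same information would go into the printk(KERN_ERR ...) in Greg's revised patch above, followed by the early return.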

Also enabling netconsole, though I'm sure there's zero chance
of NET / ethXXX / netconsole being up _this_ early in the boot ...

Will keep you guys posted :-)

Satyam


Re: [PATCH 4/5] [V2] Define is_global_init() and is_container_init()

2007-07-20 Thread sukadev
Andrew Morton [EMAIL PROTECTED] wrote:
| On Thu, 19 Jul 2007 00:21:58 -0700
| [EMAIL PROTECTED] wrote:
| 
| > --- lx26-22-rc6-mm1a.orig/kernel/pid.c  2007-07-16 12:55:15.0 -0700
| > +++ lx26-22-rc6-mm1a/kernel/pid.c   2007-07-16 13:10:48.0 -0700
| > @@ -69,6 +69,13 @@ struct pid_namespace init_pid_ns = {
| > .last_pid = 0,
| > .child_reaper = &init_task
| >  };
| > +EXPORT_SYMBOL(init_pid_ns);
| > +
| > +int is_global_init(struct task_struct *tsk)
| > +{
| > +   return tsk == init_pid_ns.child_reaper;
| > +}
| > +EXPORT_SYMBOL(is_global_init);
| 
| I don't immediately see why init_pid_ns was exported to modules.

| 
| It would need to be exported if is_global_init() was made static inline in a
| header (which seems like a sensible thing to do), but it wasn't.

It did not need to be exported in this patch.

I have a couple of follow-on patches that cleaned up some header-file
dependencies and made is_global_init() inline. Those patches are
changing a bit as I merge them with Pavel Emelianov's pid ns changes.

I will send a separate patch to inline is_global_init().
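A minimal sketch of what the inline variant could look like, which also shows Andrew's point: a static inline in a header dereferences init_pid_ns in every caller, so the symbol would need EXPORT_SYMBOL once modules use it. The structures below are cut-down, hypothetical stand-ins for the kernel's, just enough to compile in userspace.

```c
/* Cut-down stand-ins for the kernel structures. */
struct task_struct {
    int pid;
};

struct pid_namespace {
    struct task_struct *child_reaper;
};

static struct task_struct init_task = { .pid = 1 };

/* In the kernel this lives in kernel/pid.c; an inline helper in a
 * header refers to it directly from every caller, modules included. */
struct pid_namespace init_pid_ns = { .child_reaper = &init_task };

/* Header-style inline version of is_global_init(). */
static inline int is_global_init(const struct task_struct *tsk)
{
    return tsk == init_pid_ns.child_reaper;
}
```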

Suka


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Yinghai Lu

On 7/20/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:


* Jeff Garzik <[EMAIL PROTECTED]> wrote:

> I agree with Andi...  it's quite nice to be able to leave some
> arch/i386 stuff, and not carry it over to arch/x86-64.

we can leave those few items in arch/x86 just as much. No need to keep
around a legacy tree for that.


how about making all files and directories take _32 or _64 in the name,
except the files or dirs that are shared?

for example: k8_bus.c is only needed by 64-bit ===> change it to k8_bus_64.c
mach-generic===> mach-generic_32

YH


Re: PROBLEM: Dell Inspiron 1501 fails to boot in 2.6.21+

2007-07-20 Thread Glauber de Oliveira Costa

On 7/20/07, Mark Tiefenbruck <[EMAIL PROTECTED]> wrote:

I'd appreciate any help on getting this report sent to the appropriate
list and, of course, getting this fixed. I don't know what's useful,
so you're getting everything. This will be a very long e-mail.

My new laptop won't boot with kernel versions 2.6.21 or 2.6.22 . No
oops. No panic. It just stops printing messages. Maybe it would
eventually continue if I wait long enough, but it's unacceptable
either way. I include below the contents of dmesg for a working kernel
up to the point where it halts. I'm also including what it usually
does for a few lines after that point.

I did git-bisect on the 2.6.21.y tree. I'm including the result of
that as well. It mentions HPET, so I should mention my computer also
fails to boot when I enable HPET in my BIOS. I don't have the details
of this currently; I can reproduce it again if needed.

I've also included my kernel configuration and ver_linux output.
You'll notice that my gcc version is 4.2.0, but this also happens with
4.1.2. I'm including /proc/cpuinfo and lspci -vvv. I'm including
/proc/ioports and /proc/iomem. I don't have a /proc/scsi.

  Thanks,
 Mark


Here's the commit that causes the problem:



e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit
commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Author: Thomas Gleixner <[EMAIL PROTECTED]>
Date:   Fri Feb 16 01:28:04 2007 -0800

[PATCH] clockevents: i386 drivers

Add clockevent drivers for i386: lapic (local) and PIT/HPET (global).
Update the timer IRQ to call into the PIT/HPET driver's event handler and
the lapic-timer IRQ to call into the lapic clockevent driver.  The
assignement of timer functionality is delegated to the core framework code
and replaces the compile and runtime evalution in do_timer_interrupt_hook()

Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.

No changes to existing functionality.

[ kdump fix from Vivek Goyal <[EMAIL PROTECTED]> ]
[ fixes based on review feedback from Arjan van de Ven <[EMAIL PROTECTED]> ]
Cleanups-from: Adrian Bunk <[EMAIL PROTECTED]>
Build-fixes-from: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Cc: john stultz <[EMAIL PROTECTED]>
Cc: Roman Zippel <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>


As a wild guess, I'd bet that the rcu queues are failing to get called
(probably some problem with the timer interrupt in the APs?), thus
preventing the system from getting into a quiescent state.

It does seem timer related to me. Maybe one of the timer gurus has
something more to say on this?

--
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Glauber de Oliveira Costa

On 7/20/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:


> I really like the idea of a unified source tree for the 2 x86 variants.
> The technical differences are really small (of course there are
> differences, especially in the boot sequence), and striving to unify as
> much as possible while having a clean way to do per 32/64 bit parts as
> well is something that imo is the right thing.
>

Not to mention all the paravirt stuff that's going on. Having a single
x86 arch to work with would be greatly beneficial to the work being done
to port paravirt to x86_64.


As for paravirt, it'd really help. Since my tree lagged behind by
so much, a great part of the work now is checking where i386 is,
seeing if it applies to 64-bit, and so on. The differences are not so
huge, and I'm trying my best to not let them deviate too much. It
could mostly be built incrementally.

And I bet a huge part of the tree could be like this too: in most
places, they are different for no particular reason, just because two
people implemented them separately. It would take a huge effort to bring
those differences to an end, but I think it would pay off in future
development speed (not to mention the duplicate bugs Linus has
already talked about).


Way to go, Thomas and Ingo!

I am pretty much for it too.


--
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."


Re: net/ipv4/inetpeer.c stack warnings

2007-07-20 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 19 Jul 2007 14:48:59 +0200

> Gabriel C wrote:
> > Hello ,
> >
> > I noticed on current git this warning in net/ipv4/inetpeer.c
> 
> Yeah, I have no idea why the gcc people thought that this was
> something worth warning about. Especially since explicitly
> checking for != NULL silences the warning again.

Sigh, applied :-)


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Kay Sievers <[EMAIL PROTECTED]> wrote:

On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
> On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote:
>  > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
>  > > On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
>  > >  > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
>  > >  > > Just one of my machines to 2.6.22.1, and got this during boot..
>  > >  > >
>  > >  > > Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
>  > >  > >
>  > >  > > Under 2.6.21, all was fine.
>  > >  > >
>  > >  > > sdc is one disk of a 3 disk raid5 set.
>  > >  > > The raidset still manages to come up despite this.
>  > >  > >
>  > >  > > This is a Fedora 7 box, with udev-106-4.1.fc7
>  > >  > >
>  > >  > > What changed this time?
>  > >  >
>  > >  > CONFIG_BLK_DEV_BSG=y?
>  > >  >
>  > >  > There's a name-clash, because bsg tries to create devices with the same name.
>  > >  > James sent a patch, it's on lkml.
>  > >
>  > > BSG isn't in 2.6.22
>  >
>  > Ok. There has nothing else changed, that I could think of what could cause this.
>  >
>  > The code in udev that prints this message looks like:
>  >err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));
>  >
>  > That doesn't really match what you posted. Are there chars missing?
>
> Umm. Now I'm confused. Note above that it's talking about sdc.
> /dev/disk/by-uuid/ contains ..
>
> lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
> lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
> lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
> lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
> lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
> lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
> lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2
>
> note that uuid matches sdd instead.
>
>  > And what does:
>  >   udevtest /block/sdc
>  > print?
>
> parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
> parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
> This program is for debugging only, it does not create any node,
> or run any program specified by a RUN key. It may show incorrect results,
> if rules match against subsystem specfic kernel event variables.
>
> main: looking at device '/block/sdc' from subsystem 'block'
> run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
> run_program: '/bin/bash' (stdout) 'dm_multipath   28889  0 '
> run_program: '/bin/bash' returned with status 0
> run_program: '/lib/udev/usb_id -x'
> run_program: '/lib/udev/usb_id' returned with status 1
> run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
> run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
> run_program: '/lib/udev/scsi_id' returned with status 0
> udev_rules_get_name: add symlink 'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
> run_program: '/lib/udev/path_id /block/sdc'
> run_program: '/lib/udev/path_id' (stdout) 'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
> run_program: '/lib/udev/path_id' returned with status 0
> udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
> run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
> run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
> run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
> run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
> 

Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Steven Rostedt
On Sat, 21 Jul 2007, Arnd Bergmann wrote:

> On Saturday 21 July 2007, Thomas Gleixner wrote:

>
> In my experience, it's very helpful to have a single set of header
> files, and merging the two versions of one header usually exposes
> bugs that have been fixed in only one of the two, so you get
> to fix actual bugs in the process.

This can still be done after the merge tglx did.

>
> In the s390 merge, I also started out in an attempt to guarantee
> unchanged object files, much like what you describe. However, it
> turned out that fixing it in the process is actually easier.
> Either way, 'diff -D __x86_64__' is a great tool for a start, you
> should try it out to see how easy it is to merge a lot of files.
>
> To put it into perspective, I think the s390 merge was a lot easier
> than the x86 merge, because there is only a very limited set of
> hardware configurations for s390 compared to others. We ended up
> doing the full merge with three people within less than a week
> and no separate files at all.

This is the big reason they wanted to keep it binary identical: there
are just way too many different configs out there in the x86
world.

>
> OTOH, the powerpc merge is now going into its third year, mostly
> because it was started with the intention to remove all cruft
> in the process and to only allow sane code into the new architecture.

I'd expect x86 to move much faster, just because there are more developers
and users of x86 PCs than there are for powerpc.

>
> The steps that I'd suggest instead are:
>
> * merge all exported header files of the two architectures. This
>   alone is a worthy goal, because it allows us to get rid of
>   the ugly code for deciding which version to use in installed
>   headers and elsewhere.

I don't see why this can't be done after the first "Big" merge.

>
> * Create an arch/x86/Makefile that descends into ../i386/* and
>   ../x86_64/* instead of its subdirectories.

The point Thomas made is that the physical location of the source
actually does matter. Having two files side by side with the same name
except for a _32.c or _64.c suffix makes a developer want to merge them.

A perfect example is looking at both
  arch/x86/kernel/module_{32,64}.c
One would be encouraged to make that into a single file. But having
an arch/i386/kernel/module.c and an arch/x86_64/kernel/module.c would
take some time before anyone would care.

>
> * Merge the arch/x86/* subdirectories, one at a time, starting with
>   the low-hanging fruit like oprofile or pci, and do the hard
>   ones like mm and kernel last.

You're looking at a 10-year-plus merge with that approach. I think that is
exactly what Ingo and Thomas _don't_ want.  Doing it the big-bang way, as
posted in this patch, is the fastest way to get where we want to go.

>
> Unfortunately, I don't think I'll spend much time on this, so I
> don't get to decide on it, but you asked for feedback ;-)
>

I'm actually looking forward to helping out here ;-)

-- Steve



Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:

On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote:
 > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
 > > On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
 > >  > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
 > >  > > Just one of my machines to 2.6.22.1, and got this during boot..
 > >  > >
 > >  > > Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
 > >  > >
 > >  > > Under 2.6.21, all was fine.
 > >  > >
 > >  > > sdc is one disk of a 3 disk raid5 set.
 > >  > > The raidset still manages to come up despite this.
 > >  > >
 > >  > > This is a Fedora 7 box, with udev-106-4.1.fc7
 > >  > >
 > >  > > What changed this time?
 > >  >
 > >  > CONFIG_BLK_DEV_BSG=y?
 > >  >
 > >  > There's a name-clash, because bsg tries to create devices with the same name.
 > >  > James sent a patch, it's on lkml.
 > >
 > > BSG isn't in 2.6.22
 >
 > Ok. There has nothing else changed, that I could think of what could cause this.
 >
 > The code in udev that prints this message looks like:
 >err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));
 >
 > That doesn't really match what you posted. Are there chars missing?

Umm. Now I'm confused. Note above that it's talking about sdc.
/dev/disk/by-uuid/ contains ..

lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

note that uuid matches sdd instead.
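One way to read the log, under the assumption that sdc and sdd both report the same ID_FS_UUID (the by-uuid link above points to sdd, and vol_id prints that same UUID for sdc in the udevtest output below): two devices then compete for one /dev/disk/by-uuid name, and symlink(2) fails with EEXIST, which is the "File exists" in the boot message. The helper below is a hypothetical sketch, not udev's code. It only shows how such a collision can be told apart from a harmless re-creation of the same link.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create linkpath -> target.
 * Returns 0 if the link was created or already points at target,
 * 1 if linkpath already exists but points elsewhere (a collision),
 * -1 on any other error (errno is left set). */
static int make_link(const char *target, const char *linkpath)
{
    char cur[4096];
    ssize_t n;

    if (symlink(target, linkpath) == 0)
        return 0;
    if (errno != EEXIST)
        return -1;

    /* Something is already there: see where it points. */
    n = readlink(linkpath, cur, sizeof(cur) - 1);
    if (n < 0)
        return -1;
    cur[n] = '\0';  /* readlink() does not NUL-terminate */
    return strcmp(cur, target) == 0 ? 0 : 1;
}
```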

 > And what does:
 >   udevtest /block/sdc
 > print?

parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
This program is for debugging only, it does not create any node,
or run any program specified by a RUN key. It may show incorrect results,
if rules match against subsystem specfic kernel event variables.

main: looking at device '/block/sdc' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath   28889  0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
run_program: '/lib/udev/scsi_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/path_id /block/sdc'
run_program: '/lib/udev/path_id' (stdout) 'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/path_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL_SAFE='
run_program: 

Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
> On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > --- a/kernel/params.c
> > +++ b/kernel/params.c
> > @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
> > kobject_set_name(&mk->kobj, name);
> > kobject_init(&mk->kobj);
> > ret = kobject_add(&mk->kobj);
> > -   BUG_ON(ret < 0);
> > +   if (ret) {
> > +   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
> > +   "the system will be unstable now.\n", name);
> > +   return;
> > +   }
> 
> It would be nice to print the value of `ret' too.

Ok, how about this version:

---
 kernel/params.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,12 @@ static void __init kernel_param_sysfs_se
kobject_set_name(&mk->kobj, name);
kobject_init(&mk->kobj);
ret = kobject_add(&mk->kobj);
-   BUG_ON(ret < 0);
+   if (ret) {
+   printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
+ "error number %d\n", name, ret);
+   printk(KERN_ERR "The system will be unstable now.\n");
+   return;
+   }
param_sysfs_setup(mk, kparam, num_params, name_skip);
kobject_uevent(>kobj, KOBJ_ADD);
 }


Re: [PATCH] AFS: Fix file locking

2007-07-20 Thread Linus Torvalds


On Fri, 20 Jul 2007, Nick Piggin wrote:
> 
> So you did. Then to answer that, yes it could be faster because there are
> stupid volatiles sprinkled all over the bitops code so you could easily
> end up having to do more loads. Does it make a real difference? Unlikely,
> but David loves counting cycles :)

I thought we removed the volatiles long, long ago. They are buggy and
horrible, and we really want to let the compiler combine multiple
test-bits, and if they matter, that implies the locking is buggy, or
something worse.

Ie we'd *want*

if (test_bit(x, y) || test_bit(z,y))

to be rewritten by the compiler as testing bits x/z at the same time.
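A userspace sketch of the point, not the kernel's bitops: the helpers below handle only bit numbers smaller than the word size, and their names are hypothetical. Without `volatile`, the two-test form and the single-mask form are equivalent, so the compiler is free to emit the latter; through a `volatile` pointer, each evaluated test must do its own load.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain (non-volatile) bit test; assumes nr < bits in unsigned long. */
static inline bool plain_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Two separate tests: with no volatile, the compiler may merge these into
 * a single load of *y and one AND against a combined mask. */
static bool either_bit_separate(unsigned int x, unsigned int z,
				const unsigned long *y)
{
	return plain_test_bit(x, y) || plain_test_bit(z, y);
}

/* The single-mask form the compiler is allowed to produce. */
static bool either_bit_combined(unsigned int x, unsigned int z,
				const unsigned long *y)
{
	return (*y & ((1UL << x) | (1UL << z))) != 0;
}

/* Through a volatile pointer, each evaluated test performs its own load,
 * so the two tests cannot be folded into one. */
static bool either_bit_volatile(unsigned int x, unsigned int z,
				const volatile unsigned long *y)
{
	return ((*y >> x) & 1UL) || ((*y >> z) & 1UL);
}
```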

But now I'm too scared to look.

Linus


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Andrew Morton
On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH <[EMAIL PROTECTED]> wrote:

> --- a/kernel/params.c
> +++ b/kernel/params.c
> @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
>   kobject_set_name(&mk->kobj, name);
>   kobject_init(&mk->kobj);
>   ret = kobject_add(&mk->kobj);
> - BUG_ON(ret < 0);
> + if (ret) {
> + printk(KERN_ERR "module '%s' failed to be added to sysfs, "
> + "the system will be unstable now.\n", name);
> + return;
> + }

It would be nice to print the value of `ret' too.


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

(sorry if this is a resend... something bad seems to have happened to me)

Andrew Morton wrote:

On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:



This code doesn't have all the ghastly tricks which we deploy to handle
concurrent truncate.


Do I need to? Baaahh!!  I don't want to deal with them. 



Nick, can you think of any serious consequences of a read/truncate race in
there?  I can't..


As it doesn't allow writes, I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But I'm not completely sure.. we've had a lot of (and still have
some known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?
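Nick's suggestion can be pictured with a userspace analogue. A pthread mutex stands in for i_mutex, and `struct toy_inode`, `toy_read()` and `toy_truncate()` are hypothetical, not hugetlbfs code. Because the read path holds the lock across both the size check and the copy, a truncate cannot shrink the file in between.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory "inode": a mutex standing in for i_mutex, plus a
 * size that truncate can shrink. */
struct toy_inode {
	pthread_mutex_t i_mutex;
	size_t i_size;
	char data[64];
};

/* read(): the size check and the copy happen under one lock hold, so a
 * concurrent truncate cannot run between them. */
static ssize_t toy_read(struct toy_inode *inode, char *buf, size_t count,
			size_t pos)
{
	size_t avail;

	pthread_mutex_lock(&inode->i_mutex);
	if (pos >= inode->i_size) {
		pthread_mutex_unlock(&inode->i_mutex);
		return 0;	/* at or past EOF */
	}
	avail = inode->i_size - pos;
	if (count > avail)
		count = avail;
	memcpy(buf, inode->data + pos, count);
	pthread_mutex_unlock(&inode->i_mutex);
	return (ssize_t)count;
}

/* truncate(): takes the same mutex, so it serializes against toy_read(). */
static void toy_truncate(struct toy_inode *inode, size_t new_size)
{
	pthread_mutex_lock(&inode->i_mutex);
	if (new_size < inode->i_size) {
		memset(inode->data + new_size, 0, inode->i_size - new_size);
		inode->i_size = new_size;
	}
	pthread_mutex_unlock(&inode->i_mutex);
}
```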

--
SUSE Labs, Novell Inc.


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Dave Jones
On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote:
 > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
 > > On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
 > >  > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
 > >  > > Just updated one of my machines to 2.6.22.1, and got this during boot..
 > >  > >
 > >  > > Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
 > >  > >
 > >  > > Under 2.6.21, all was fine.
 > >  > >
 > >  > > sdc is one disk of a 3 disk raid5 set.
 > >  > > The raidset still manages to come up despite this.
 > >  > >
 > >  > > This is a Fedora 7 box, with udev-106-4.1.fc7
 > >  > >
 > >  > > What changed this time?
 > >  >
 > >  > CONFIG_BLK_DEV_BSG=y?
 > >  >
 > >  > There's a name-clash, because bsg tries to create devices with the same name.
 > >  > James sent a patch, it's on lkml.
 > >
 > > BSG isn't in 2.6.22
 > 
 > Ok. Nothing else has changed that I can think of which could cause this.
 > 
 > The code in udev that prints this message looks like:
 >err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));
 > 
 > That doesn't really match what you posted. Are there chars missing?

Umm. Now I'm confused. Note above that it's talking about sdc.
/dev/disk/by-uuid/ contains ..

lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

Note that this UUID points to sdd instead.

 > And what does:
 >   udevtest /block/sdc
 > print?

parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
This program is for debugging only, it does not create any node,
or run any program specified by a RUN key. It may show incorrect results,
if rules match against subsystem specfic kernel event variables.

main: looking at device '/block/sdc' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath   28889  0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
run_program: '/lib/udev/scsi_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/path_id /block/sdc'
run_program: '/lib/udev/path_id' (stdout) 'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/path_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL_SAFE='
run_program: '/lib/udev/vol_id' returned with 

Re: [PATCH 2/3] i386: use x86_64's desc_def.h

2007-07-20 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
> On Thu, 2007-07-19 at 09:27 +1000, Rusty Russell wrote:
> > On Wed, 2007-07-18 at 09:19 -0700, Zachary Amsden wrote:
> > > > +#define GET_CONTENTS(desc) (((desc)->raw32.b >> 10) & 3)
> > > > +#define GET_WRITABLE(desc) (((desc)->raw32.b >>  9) & 1)
> > > 
> > > You got rid of the duplicate definitions here, but then added new 
> > > duplicates (GET_CONTENTS / WRITABLE).  Can you stick them in desc.h?
> > 
> > To be honest, I got sick of counting bits at this point, and didn't want
> > to introduce bugs.
> > 
> > Here's the updated version of PATCH 1/3:
> 
> And 2/3:
> ===
> i386: use x86_64's desc_def.h

plus this is needed as well now
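The rename from `desc.a` to `desc.raw32.a`, together with the new `{ {...ULL} }` initializers in lg.h, is consistent with `struct desc_struct` becoming a union-wrapped 8-byte value. A hypothetical sketch of such a layout (`struct toy_desc_struct` is not the real desc_def.h definition) shows how the 64-bit view and the two 32-bit halves alias, assuming a little-endian CPU such as x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one 8-byte descriptor viewed either as a single
 * 64-bit value or as the two 32-bit halves i386 code used to call .a and
 * .b.  Field names follow the patch; the real header may differ. */
struct toy_desc_struct {
	union {
		uint64_t raw64;
		struct {
			uint32_t a;	/* low 32 bits on little-endian x86 */
			uint32_t b;	/* high 32 bits on little-endian x86 */
		} raw32;
	};
};

/* Build a descriptor from its halves, as the old two-member initializers
 * did, and return it through the 64-bit view. */
static uint64_t toy_desc_from_halves(uint32_t a, uint32_t b)
{
	struct toy_desc_struct d;

	d.raw32.a = a;
	d.raw32.b = b;
	return d.raw64;
}
```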

Index: linus-2.6/include/asm-i386/xen/hypercall.h
===
--- linus-2.6.orig/include/asm-i386/xen/hypercall.h
+++ linus-2.6/include/asm-i386/xen/hypercall.h
@@ -359,8 +359,8 @@ MULTI_update_descriptor(struct multicall
mcl->op = __HYPERVISOR_update_descriptor;
mcl->args[0] = maddr;
mcl->args[1] = maddr >> 32;
-   mcl->args[2] = desc.a;
-   mcl->args[3] = desc.b;
+   mcl->args[2] = desc.raw32.a;
+   mcl->args[3] = desc.raw32.b;
 }
 
 static inline void
Index: linus-2.6/drivers/lguest/interrupts_and_traps.c
===
--- linus-2.6.orig/drivers/lguest/interrupts_and_traps.c
+++ linus-2.6/drivers/lguest/interrupts_and_traps.c
@@ -103,9 +103,9 @@ void maybe_do_interrupt(struct lguest *l
}
 
idt = &lg->idt[FIRST_EXTERNAL_VECTOR+irq];
-   if (idt_present(idt->a, idt->b)) {
+   if (idt_present(idt->raw32.a, idt->raw32.b)) {
clear_bit(irq, lg->irqs_pending);
-   set_guest_interrupt(lg, idt->a, idt->b, 0);
+   set_guest_interrupt(lg, idt->raw32.a, idt->raw32.b, 0);
}
 }
 
@@ -116,7 +116,7 @@ static int has_err(unsigned int trap)
 
 int deliver_trap(struct lguest *lg, unsigned int num)
 {
-   u32 lo = lg->idt[num].a, hi = lg->idt[num].b;
+   u32 lo = lg->idt[num].raw32.a, hi = lg->idt[num].raw32.b;
 
if (!idt_present(lo, hi))
return 0;
@@ -139,7 +139,7 @@ static int direct_trap(const struct lgue
return 0;
 
/* Interrupt gates (0xE) or not present (0x0) can't go direct. */
-   return idt_type(trap->a, trap->b) == 0xF;
+   return idt_type(trap->raw32.a, trap->raw32.b) == 0xF;
 }
 
 void pin_stack_pages(struct lguest *lg)
@@ -170,15 +170,15 @@ static void set_trap(struct lguest *lg, 
u8 type = idt_type(lo, hi);
 
if (!idt_present(lo, hi)) {
-   trap->a = trap->b = 0;
+   trap->raw32.a = trap->raw32.b = 0;
return;
}
 
if (type != 0xE && type != 0xF)
kill_guest(lg, "bad IDT type %i", type);
 
-   trap->a = ((__KERNEL_CS|GUEST_PL)<<16) | (lo&0x);
-   trap->b = (hi&0xEF00);
+   trap->raw32.a = ((__KERNEL_CS|GUEST_PL)<<16) | (lo&0x);
+   trap->raw32.b = (hi&0xEF00);
 }
 
 void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
@@ -204,8 +204,8 @@ static void default_idt_entry(struct des
if (trap == LGUEST_TRAP_ENTRY)
flags |= (GUEST_PL << 13);
 
-   idt->a = (LGUEST_CS<<16) | (handler&0x);
-   idt->b = (handler&0x) | flags;
+   idt->raw32.a = (LGUEST_CS<<16) | (handler&0x);
+   idt->raw32.b = (handler&0x) | flags;
 }
 
 void setup_default_idt_entries(struct lguest_ro_state *state,
Index: linus-2.6/drivers/lguest/lg.h
===
--- linus-2.6.orig/drivers/lguest/lg.h
+++ linus-2.6/drivers/lguest/lg.h
@@ -44,8 +44,8 @@ void free_pagetables(void);
 int init_pagetables(struct page **switcher_page, unsigned int pages);
 
 /* Full 4G segment descriptors, suitable for CS and DS. */
-#define FULL_EXEC_SEGMENT ((struct desc_struct){0x, 0x00cf9b00})
-#define FULL_SEGMENT ((struct desc_struct){0x, 0x00cf9300})
+#define FULL_EXEC_SEGMENT ((struct desc_struct){ {0x00cf9b00ULL} })
+#define FULL_SEGMENT ((struct desc_struct){ {0x00cf9300ULL} })
 
 struct lguest_dma_info
 {
Index: linus-2.6/drivers/lguest/lguest.c
===
--- linus-2.6.orig/drivers/lguest/lguest.c
+++ linus-2.6/drivers/lguest/lguest.c
@@ -173,7 +173,7 @@ static void lguest_load_idt(const struct
struct desc_struct *idt = (void *)desc->address;
 
for (i = 0; i < (desc->size+1)/8; i++)
-   hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].a, idt[i].b);
+   hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].raw32.a, idt[i].raw32.b);
 }
 
 static void lguest_load_gdt(const struct Xgt_desc_struct *desc)
Index: linus-2.6/drivers/lguest/segments.c
===
--- 

Re: [PATCH 3/3] i386: Replace struct Xgt_desc_struct with struct desc_ptr

2007-07-20 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
> Remove i386's Xgt_desc_struct definition and use desc_def.h's desc_ptr.

plus this is needed now
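Both the old and the new code compute the number of table entries as `(desc->size+1)/8`, treating `size` as the descriptor-table limit. A sketch of that layout, assuming it matches the 6-byte operand `lgdt`/`lidt` take on i386; the name `toy_desc_ptr` is hypothetical and the real `struct desc_ptr` may use different field types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the lgdt/lidt operand on i386: a 16-bit limit
 * followed by a 32-bit linear base, packed to 6 bytes (GCC/Clang attribute). */
struct toy_desc_ptr {
	uint16_t size;		/* limit: table length in bytes, minus one */
	uint32_t address;	/* linear address of descriptor 0 */
} __attribute__((packed));

/* Number of 8-byte descriptors in the table, computed the same way as
 * (desc->size+1)/8 in lguest_load_idt() and lguest_load_gdt(). */
static unsigned int toy_desc_entries(const struct toy_desc_ptr *desc)
{
	return (desc->size + 1u) / 8u;
}
```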


Index: linus-2.6/drivers/lguest/lg.h
===
--- linus-2.6.orig/drivers/lguest/lg.h
+++ linus-2.6/drivers/lguest/lg.h
@@ -91,13 +91,13 @@ struct lguest_ro_state
 {
/* Host information we need to restore when we switch back. */
u32 host_cr3;
-   struct Xgt_desc_struct host_idt_desc;
-   struct Xgt_desc_struct host_gdt_desc;
+   struct desc_ptr host_idt_desc;
+   struct desc_ptr host_gdt_desc;
u32 host_sp;
 
/* Fields which are used when guest is running. */
-   struct Xgt_desc_struct guest_idt_desc;
-   struct Xgt_desc_struct guest_gdt_desc;
+   struct desc_ptr guest_idt_desc;
+   struct desc_ptr guest_gdt_desc;
struct i386_hw_tss guest_tss;
struct desc_struct guest_idt[IDT_ENTRIES];
struct desc_struct guest_gdt[GDT_ENTRIES];
Index: linus-2.6/arch/i386/xen/enlighten.c
===
--- linus-2.6.orig/arch/i386/xen/enlighten.c
+++ linus-2.6/arch/i386/xen/enlighten.c
@@ -301,7 +301,7 @@ static void xen_set_ldt(const void *addr
xen_mc_issue(PARAVIRT_LAZY_CPU);
 }
 
-static void xen_load_gdt(const struct Xgt_desc_struct *dtr)
+static void xen_load_gdt(const struct desc_ptr *dtr)
 {
unsigned long *frames;
unsigned long va = dtr->address;
@@ -401,7 +401,7 @@ static int cvt_gate_to_trap(int vector, 
 }
 
 /* Locations of each CPU's IDT */
-static DEFINE_PER_CPU(struct Xgt_desc_struct, idt_desc);
+static DEFINE_PER_CPU(struct desc_ptr, idt_desc);
 
 /* Set an IDT entry.  If the entry is part of the current IDT, then
also update Xen. */
@@ -433,7 +433,7 @@ static void xen_write_idt_entry(struct d
preempt_enable();
 }
 
-static void xen_convert_trap_info(const struct Xgt_desc_struct *desc,
+static void xen_convert_trap_info(const struct desc_ptr *desc,
  struct trap_info *traps)
 {
unsigned in, out, count;
@@ -452,7 +452,7 @@ static void xen_convert_trap_info(const 
 
 void xen_copy_trap_info(struct trap_info *traps)
 {
-   const struct Xgt_desc_struct *desc = &__get_cpu_var(idt_desc);
+   const struct desc_ptr *desc = &__get_cpu_var(idt_desc);
 
xen_convert_trap_info(desc, traps);
 }
@@ -460,7 +460,7 @@ void xen_copy_trap_info(struct trap_info
 /* Load a new IDT into Xen.  In principle this can be per-CPU, so we
hold a spinlock to protect the static traps[] array (static because
it avoids allocation, and saves stack space). */
-static void xen_load_idt(const struct Xgt_desc_struct *desc)
+static void xen_load_idt(const struct desc_ptr *desc)
 {
static DEFINE_SPINLOCK(lock);
static struct trap_info traps[257];
Index: linus-2.6/drivers/lguest/lguest.c
===
--- linus-2.6.orig/drivers/lguest/lguest.c
+++ linus-2.6/drivers/lguest/lguest.c
@@ -167,7 +167,7 @@ static void lguest_write_idt_entry(struc
hcall(LHCALL_LOAD_IDT_ENTRY, entrynum, low, high);
 }
 
-static void lguest_load_idt(const struct Xgt_desc_struct *desc)
+static void lguest_load_idt(const struct desc_ptr *desc)
 {
unsigned int i;
struct desc_struct *idt = (void *)desc->address;
@@ -176,7 +176,7 @@ static void lguest_load_idt(const struct
hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].raw32.a, idt[i].raw32.b);
 }
 
-static void lguest_load_gdt(const struct Xgt_desc_struct *desc)
+static void lguest_load_gdt(const struct desc_ptr *desc)
 {
BUG_ON((desc->size+1)/8 != GDT_ENTRIES);
hcall(LHCALL_LOAD_GDT, __pa(desc->address), GDT_ENTRIES, 0);


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:

On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
 > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
 > > Just updated one of my machines to 2.6.22.1, and got this during boot..
 > >
 > > Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
 > >
 > > Under 2.6.21, all was fine.
 > >
 > > sdc is one disk of a 3 disk raid5 set.
 > > The raidset still manages to come up despite this.
 > >
 > > This is a Fedora 7 box, with udev-106-4.1.fc7
 > >
 > > What changed this time?
 >
 > CONFIG_BLK_DEV_BSG=y?
 >
 > There's a name-clash, because bsg tries to create devices with the same name.
 > James sent a patch, it's on lkml.

BSG isn't in 2.6.22


Ok. Nothing else has changed that I can think of which could cause this.

The code in udev that prints this message looks like:
  err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));

That doesn't really match what you posted. Are there chars missing?
Can you please recheck?
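Kay's point is that this format string always puts ", " between the link target and the link name, so a message reading `symlink(../../sdc/dev/...)` suggests characters went missing in transit. A small userspace sketch using only POSIX `symlink()`; `try_symlink()` is a hypothetical helper, not udev's `udev_node_symlink()`. A second attempt on an existing name fails with EEXIST, whose strerror() text is the "File exists" seen in the boot log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to create filename -> linktarget.  On failure, format the message
 * with the same "symlink(%s, %s) failed: %s" pattern as the quoted err()
 * call, and return the negative errno. */
static int try_symlink(const char *linktarget, const char *filename,
		       char *msg, size_t msglen)
{
	int err;

	assert(msg != NULL && msglen > 0);
	if (symlink(linktarget, filename) == 0)
		return 0;
	err = errno;	/* save before snprintf() can change errno */
	snprintf(msg, msglen, "symlink(%s, %s) failed: %s",
		 linktarget, filename, strerror(err));
	return -err;
}
```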

And what does:
 udevtest /block/sdc
print?

Thanks,
Kay


[PATCH 7/7] lguest: documentation pt VII: FIXMEs

2007-07-20 Thread Rusty Russell
Documentation: The FIXMEs

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

---
 Documentation/lguest/lguest.c |   12 
 drivers/char/hvc_lguest.c |3 +++
 drivers/lguest/interrupts_and_traps.c |   14 ++
 drivers/lguest/io.c   |   10 ++
 drivers/lguest/lguest.c   |8 
 drivers/lguest/lguest_asm.S   |   14 ++
 drivers/lguest/page_tables.c  |5 +
 drivers/lguest/segments.c |4 
 drivers/net/lguest_net.c  |   19 +++
 9 files changed, 89 insertions(+)

===
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1536,3 +1536,15 @@ int main(int argc, char *argv[])
/* Finally, run the Guest.  This doesn't return. */
run_guest(lguest_fd, _list);
 }
+/*:*/
+
+/*M:999
+ * Mastery is done: you now know everything I do.
+ *
+ * But surely you have seen code, features and bugs in your wanderings which
+ * you now yearn to attack?  That is the real game, and I look forward to you
+ * patching and forking lguest into the Your-Name-Here-visor.
+ *
+ * Farewell, and good coding!
+ * Rusty Russell.
+ */
===
--- a/drivers/char/hvc_lguest.c
+++ b/drivers/char/hvc_lguest.c
@@ -13,6 +13,9 @@
  * functions.
  :*/
 
+/*M:002 The console can be flooded: while the Guest is processing input the
+ * Host can send more.  Buffering in the Host could alleviate this, but it is a
+ * difficult problem in general. :*/
 /* Copyright (C) 2006 Rusty Russell, IBM Corporation
  *
  * This program is free software; you can redistribute it and/or modify
===
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -231,6 +231,20 @@ static int direct_trap(const struct lgue
 * go direct, of course 8) */
return idt_type(trap->a, trap->b) == 0xF;
 }
+/*:*/
+
+/*M:005 The Guest has the ability to turn its interrupt gates into trap gates,
+ * if it is careful.  The Host will let trap gates go directly to the
+ * Guest, but the Guest needs the interrupts atomically disabled for an
+ * interrupt gate.  It can do this by pointing the trap gate at instructions
+ * within noirq_start and noirq_end, where it can safely disable interrupts. */
+
+/*M:006 The Guests do not use the sysenter (fast system call) instruction,
+ * because it's hardcoded to enter privilege level 0 and so can't go direct.
+ * It's about twice as fast as the older "int 0x80" system call, so it might
+ * still be worthwhile to handle it in the Switcher and lcall down to the
+ * Guest.  The sysenter semantics are hairy tho: search for that keyword in
+ * entry.S :*/
 
 /*H:260 When we make traps go directly into the Guest, we need to make sure
  * the kernel stack is valid (ie. mapped in the page tables).  Otherwise, the
===
--- a/drivers/lguest/io.c
+++ b/drivers/lguest/io.c
@@ -553,6 +553,16 @@ void release_all_dma(struct lguest *lg)
up_read(&lg->mm->mmap_sem);
 }
 
+/*M:007 We only return a single DMA buffer to the Launcher, but it would be
+ * more efficient to return a pointer to the entire array of DMA buffers, which
+ * it can cache and choose one whenever it wants.
+ *
+ * Currently the Launcher uses a write to /dev/lguest, and the return value is
+ * the address of the DMA structure with the interrupt number placed in
+ * dma->used_len.  If we wanted to return the entire array, we would need to return
+ * the address, array size and interrupt number: this seems to require an
+ * ioctl(). :*/
+
 /*L:320 This routine looks for a DMA buffer registered by the Guest on the
  * given key (using the BIND_DMA hypercall). */
 unsigned long get_dma_buffer(struct lguest *lg,
===
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -251,6 +251,14 @@ static void irq_enable(void)
 {
lguest_data.irq_enabled = X86_EFLAGS_IF;
 }
+/*:*/
+/*M:003 Note that we don't check for outstanding interrupts when we re-enable
+ * them (or when we unmask an interrupt).  This seems to work for the moment,
+ * since interrupts are rare and we'll just get the interrupt on the next timer
+ * tick, but now that we have CONFIG_NO_HZ, we should revisit this.  One way
+ * would be to put the "irq_enabled" field in a page by itself, and have the
+ * Host write-protect it when an interrupt comes in when irqs are disabled.
+ * There will then be a page fault as soon as interrupts are re-enabled. :*/
 
 /*G:034
  * The Interrupt Descriptor Table (IDT).
===
--- a/drivers/lguest/lguest_asm.S
+++ b/drivers/lguest/lguest_asm.S
@@ -41,6 +41,20 @@ LGUEST_PATCH(pushf, movl 

Re: [PATCH] AFS: Fix file locking

2007-07-20 Thread Nick Piggin

Andrew Morton wrote:

On Wed, 18 Jul 2007 15:56:53 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:



Andrew Morton wrote:


On Tue, 17 Jul 2007 13:47:32 +0100
David Howells <[EMAIL PROTECTED]> wrote:




+   if (type == AFS_LOCK_READ &&
+   vnode->flags & (1 << AFS_VNODE_READLOCKED)) {



Here we use

vnode->flags & (1 << foo)




+   set_bit(AFS_VNODE_LOCKING, &vnode->flags);



and elsewhere we use set_bit(foo, &vnode->flags) and clear_bit()

This is a bit strange.  Does the open-coded bit-test have any performance
benefit on any architecture?  Not on x86 at least, afaik.


It uses locked operations on x86, but you can use __set_bit instead
(which should always be at least as efficient as the C version).



I said "bit-test".  ie: test_bit().  That doesn't use a locked operation.


So you did. Then to answer that, yes it could be faster because there are
stupid volatiles sprinkled all over the bitops code so you could easily
end up having to do more loads. Does it make a real difference? Unlikely,
but David loves counting cycles :)
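The set_bit()/__set_bit() distinction can be sketched with C11 atomics. These are hypothetical userspace helpers, not the kernel's bitops (which also handle bit numbers beyond a single word): the atomic version is a locked read-modify-write on x86, while the non-atomic one is a plain load, OR and store that is only safe when concurrent writers are already excluded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic read-modify-write, analogous to set_bit(): safe against other
 * CPUs updating the same word (on x86 this needs a locked instruction). */
static void toy_set_bit_atomic(unsigned int nr, _Atomic unsigned long *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

/* Plain load/OR/store, analogous to __set_bit(): cheaper, but only correct
 * when the caller already excludes concurrent writers (e.g. holds a lock). */
static void toy_set_bit_nonatomic(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

/* Open-coded test in the style of "vnode->flags & (1 << foo)". */
static bool toy_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr & (1UL << nr)) != 0;
}
```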

--
SUSE Labs, Novell Inc.


[PATCH 6/7] lguest: documentation pt VI: Switcher

2007-07-20 Thread Rusty Russell
Documentation: The Switcher

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

---
 drivers/lguest/core.c |   51 +++-
 drivers/lguest/switcher.S |  271 ++---
 2 files changed, 276 insertions(+), 46 deletions(-)
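The S:010 comments in core.c describe a dirty-flag cache: copy everything if a different Guest, or different per-CPU pages, ran last; otherwise copy only what the Guest changed. A hypothetical single-CPU sketch of that decision, with invented `TOY_CHANGED_*` values standing in for lguest's CHANGED_* flags and result bits standing in for the actual copies.

```c
#include <assert.h>

/* Hypothetical flag values; the real CHANGED_* constants live in lguest. */
enum {
	TOY_CHANGED_IDT     = 1,
	TOY_CHANGED_GDT     = 2,
	TOY_CHANGED_GDT_TLS = 4,
	TOY_CHANGED_ALL     = 7,
};

struct toy_lguest {
	unsigned int changed;
	const void *last_pages;
};

/* Per-CPU "who ran here last"; a single global models one CPU. */
static const struct toy_lguest *last_guest;

/* Returns a bitmask recording which expensive copies would be done. */
static unsigned int toy_copy_in(struct toy_lguest *lg, const void *pages)
{
	unsigned int copied = 0;

	/* Another Guest ran here, or this Guest moved: assume all is stale. */
	if (last_guest != lg || lg->last_pages != pages) {
		last_guest = lg;
		lg->last_pages = pages;
		lg->changed = TOY_CHANGED_ALL;
	}

	if (lg->changed & TOY_CHANGED_IDT)
		copied |= TOY_CHANGED_IDT;	/* copy_traps() */
	if (lg->changed & TOY_CHANGED_GDT)
		copied |= TOY_CHANGED_GDT;	/* copy_gdt() */
	else if (lg->changed & TOY_CHANGED_GDT_TLS)
		copied |= TOY_CHANGED_GDT_TLS;	/* copy_gdt_tls() only */

	lg->changed = 0;	/* clean until something changes again */
	return copied;
}
```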

===
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -394,46 +394,89 @@ static void set_ts(void)
write_cr0(cr0|8);
 }
 
+/*S:010
+ * We are getting close to the Switcher.
+ *
+ * Remember that each CPU has two pages which are visible to the Guest when it
+ * runs on that CPU.  This has to contain the state for that Guest: we copy the
+ * state in just before we run the Guest.
+ *
+ * Each Guest has "changed" flags which indicate what has changed in the Guest
+ * since it last ran.  We saw this set in interrupts_and_traps.c and
+ * segments.c.
+ */
 static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
 {
+   /* Copying all this data can be quite expensive.  We usually run the
+* same Guest we ran last time (and that Guest hasn't run anywhere else
+* meanwhile).  If that's not the case, we pretend everything in the
+* Guest has changed. */
if (__get_cpu_var(last_guest) != lg || lg->last_pages != pages) {
__get_cpu_var(last_guest) = lg;
lg->last_pages = pages;
lg->changed = CHANGED_ALL;
}
 
-   /* These are pretty cheap, so we do them unconditionally. */
+   /* These copies are pretty cheap, so we do them unconditionally: */
+   /* Save the current Host top-level page directory. */
pages->state.host_cr3 = __pa(current->mm->pgd);
+   /* Set up the Guest's page tables to see this CPU's pages (and no
+* other CPU's pages). */
map_switcher_in_guest(lg, pages);
+   /* Set up the two "TSS" members which tell the CPU what stack to use
+* for traps which go directly into the Guest (ie. traps at privilege
+* level 1). */
pages->state.guest_tss.esp1 = lg->esp1;
pages->state.guest_tss.ss1 = lg->ss1;
 
-   /* Copy direct trap entries. */
+   /* Copy direct-to-Guest trap entries. */
if (lg->changed & CHANGED_IDT)
copy_traps(lg, pages->state.guest_idt, default_idt_entries);
 
-   /* Copy all GDT entries but the TSS. */
+   /* Copy all GDT entries which the Guest can change. */
if (lg->changed & CHANGED_GDT)
copy_gdt(lg, pages->state.guest_gdt);
/* If only the TLS entries have changed, copy them. */
else if (lg->changed & CHANGED_GDT_TLS)
copy_gdt_tls(lg, pages->state.guest_gdt);
 
+   /* Mark the Guest as unchanged for next time. */
lg->changed = 0;
 }
 
+/* Finally: the code to actually call into the Switcher to run the Guest. */
 static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)
 {
+   /* This is a dummy value we need for GCC's sake. */
unsigned int clobber;
 
+   /* Copy the guest-specific information into this CPU's "struct
+* lguest_pages". */
copy_in_guest_info(lg, pages);
 
-   /* Put eflags on stack, lcall does rest: suitable for iret return. */
+   /* Now: we push the "eflags" register on the stack, then do an "lcall".
+* This is how we change from using the kernel code segment to using
+* the dedicated lguest code segment, as well as jumping into the
+* Switcher.
+*
+* The lcall also pushes the old code segment (KERNEL_CS) onto the
+* stack, then the address of this call.  This stack layout happens to
+* exactly match the stack of an interrupt... */
asm volatile("pushf; lcall *lguest_entry"
+/* This is how we tell GCC that %eax ("a") and %ebx ("b")
+ * are changed by this routine.  The "=" means output. */
 : "=a"(clobber), "=b"(clobber)
+/* %eax contains the pages pointer.  ("0" refers to the
+ * 0-th argument above, ie "a").  %ebx contains the
+ * physical address of the Guest's top-level page
+ * directory. */
 : "0"(pages), "1"(__pa(lg->pgdirs[lg->pgdidx].pgdir))
+/* We tell gcc that all these registers could change,
+ * which means we don't have to save and restore them in
+ * the Switcher. */
 : "memory", "%edx", "%ecx", "%edi", "%esi");
 }
+/*:*/
 
 /*H:030 Let's jump straight to the main loop which runs the Guest.
  * Remember, this is called by the Launcher reading /dev/lguest, and we keep
===
--- a/drivers/lguest/switcher.S
+++ b/drivers/lguest/switcher.S
@@ -6,41 +6,131 @@
  * are feeling invigorated and refreshed then the next, more 

[PATCH 1/7] lguest: documentation pt I: Preparation

2007-07-20 Thread Rusty Russell
The netfilter code had very good documentation: the Netfilter Hacking
HOWTO.  No one ever read it.

So this time I'm trying something different, using a bit of
Knuthiness.  Start with drivers/lguest/README.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 Documentation/lguest/extract  |   58 +
 Documentation/lguest/lguest.c |9 +++--
 drivers/lguest/Makefile   |   12 ++
 drivers/lguest/README |   47 ++
 drivers/lguest/core.c |7 ++-
 drivers/lguest/hypercalls.c   |9 +++--
 drivers/lguest/interrupts_and_traps.c |   13 +++
 drivers/lguest/io.c   |8 +++-
 drivers/lguest/lguest.c   |   30 +++--
 drivers/lguest/lguest_bus.c   |3 +
 drivers/lguest/lguest_user.c  |7 +++
 drivers/lguest/page_tables.c  |   10 -
 drivers/lguest/segments.c |   11 ++
 drivers/lguest/switcher.S |   13 +++
 14 files changed, 218 insertions(+), 19 deletions(-)
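The extract script collates comments tagged with the section letter the Makefile passes as PREFIX, a colon and a number (for example `/*G:010`), and a section ends at a line containing `:*` before the comment close. A C sketch of just that tag recognition; `parse_marker()` and `ends_section()` are hypothetical and not part of the patch. This sketch takes the first marker on a line, whereas the script's greedy sed picks the last; they agree when a line holds one marker.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Find "<prefix>:<digits>" (e.g. 'G' then ":010") in line; on success
 * store the number in *num and return true.  Like the shell script, this
 * is a plain text match, not a C parser. */
static bool parse_marker(const char *line, char prefix, int *num)
{
	for (const char *p = line; (p = strchr(p, prefix)) != NULL; p++) {
		if (p[1] == ':' && isdigit((unsigned char)p[2])) {
			int n = 0;

			for (const char *d = p + 2; isdigit((unsigned char)*d); d++)
				n = n * 10 + (*d - '0');
			*num = n;
			return true;
		}
	}
	return false;
}

/* True if the line ends a tagged section, i.e. contains ":*" immediately
 * followed by the comment-closing slash. */
static bool ends_section(const char *line)
{
	return strstr(line, ":*/") != NULL;
}
```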

===
--- /dev/null
+++ b/Documentation/lguest/extract
@@ -0,0 +1,58 @@
+#! /bin/sh
+
+set -e
+
+PREFIX=$1
+shift
+
+trap 'rm -r $TMPDIR' 0
+TMPDIR=`mktemp -d`
+
+exec 3>/dev/null
+for f; do
+while IFS="
+" read -r LINE; do
+   case "$LINE" in
+   *$PREFIX:[0-9]*:\**)
+   NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
+   if [ -f $TMPDIR/$NUM ]; then
+   echo "$TMPDIR/$NUM already exists prior to $f"
+   exit 1
+   fi
+   exec 3>>$TMPDIR/$NUM
+   echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
+   /bin/echo "$LINE" | sed -e "s/$PREFIX:[0-9]*//" -e "s/:\*/*/" >&3
+   ;;
+   *$PREFIX:[0-9]*)
+   NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
+   echo "$TMPDIR/$NUM already exists prior to $f"
+   echo "$TMPDIR/$NUM already exits prior to $f"
+   exit 1
+   fi
+   exec 3>>$TMPDIR/$NUM
+   echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
+   /bin/echo "$LINE" | sed "s/$PREFIX:[0-9]*//" >&3
+   ;;
+   *:\**)
+   /bin/echo "$LINE" | sed -e "s/:\*/*/" -e "s,/\*\*/,," >&3
+   echo >&3
+   exec 3>/dev/null
+   ;;
+   *)
+   /bin/echo "$LINE" >&3
+   ;;
+   esac
+done < $f
+echo >&3
+exec 3>/dev/null
+done
+
+LASTFILE=""
+for f in $TMPDIR/*; do
+if [ "$LASTFILE" != $(cat $TMPDIR/.$(basename $f) ) ]; then
+   LASTFILE=$(cat $TMPDIR/.$(basename $f) )
+   echo "[ $LASTFILE ]"
+fi
+cat $f
+done
+
===
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1,5 +1,10 @@
-/* Simple program to layout "physical" memory for new lguest guest.
- * Linked high to avoid likely physical memory.  */
+/*P:100 This is the Launcher code, a simple program which lays out the
+ * "physical" memory for the new Guest by mapping the kernel image and the
+ * virtual devices, then reads repeatedly from /dev/lguest to run the Guest.
+ *
+ * The only trick: the Makefile links it statically at a high address, so it
+ * will be clear of the guest memory region.  It means that each Guest cannot
+ * have more than 2.5G of memory on a normally configured Host. :*/
 #define _LARGEFILE64_SOURCE
 #define _GNU_SOURCE
 #include 
===
--- a/drivers/lguest/Makefile
+++ b/drivers/lguest/Makefile
@@ -5,3 +5,15 @@ obj-$(CONFIG_LGUEST)   += lg.o
 obj-$(CONFIG_LGUEST)   += lg.o
 lg-y := core.o hypercalls.o page_tables.o interrupts_and_traps.o \
segments.o io.o lguest_user.o switcher.o
+
+Preparation Preparation!: PREFIX=P
+Guest: PREFIX=G
+Drivers: PREFIX=D
+Launcher: PREFIX=L
+Host: PREFIX=H
+Switcher: PREFIX=S
+Mastery: PREFIX=M
+Beer:
+   @for f in Preparation Guest Drivers Launcher Host Switcher Mastery; do echo "{==- $$f -==}"; make -s $$f; done; echo "{==-==}"
+Preparation Preparation! Guest Drivers Launcher Host Switcher Mastery:
+   @sh ../../Documentation/lguest/extract $(PREFIX) `find ../../* -name '*.[chS]' -wholename '*lguest*'`
===
--- /dev/null
+++ b/drivers/lguest/README
@@ -0,0 +1,47 @@
+Welcome, friend reader, to lguest.
+
+Lguest is an adventure, with you, the reader, as Hero.  I can't think of many
+5000-line projects which offer both such capability and glimpses of future
+potential; it is an exciting time to be delving into the source!
+
+But be warned; this is an arduous journey of several hours or more!  And as we
+know, all true Heroes are driven by a Noble Goal.  Thus I offer a Beer (or
+equivalent) to anyone 

[PATCH 2/7] lguest: documentation pt II: Guest

2007-07-20 Thread Rusty Russell
Documentation: The Guest

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

---
 drivers/lguest/lguest.c |  458 ---
 drivers/lguest/lguest_asm.S |   57 +++--
 include/linux/lguest.h  |   47 +++-
 3 files changed, 512 insertions(+), 50 deletions(-)

===
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -66,6 +66,12 @@
 #include 
 #include 
 
+/*G:010 Welcome to the Guest!
+ *
+ * The Guest in our tale is a simple creature: identical to the Host but
+ * behaving in simplified but equivalent ways.  In particular, the Guest is the
+ * same kernel as the Host (or at least, built from the same source code). :*/
+
 /* Declarations for definitions in lguest_guest.S */
 extern char lguest_noirq_start[], lguest_noirq_end[];
 extern const char lgstart_cli[], lgend_cli[];
@@ -84,7 +90,26 @@ struct lguest_device_desc *lguest_device
 struct lguest_device_desc *lguest_devices;
 static cycle_t clock_base;
 
-static enum paravirt_lazy_mode lazy_mode;
+/*G:035 Notice the lazy_hcall() above, rather than hcall().  This is our first
+ * real optimization trick!
+ *
+ * When lazy_mode is set, it means we're allowed to defer all hypercalls and do
+ * them as a batch when lazy_mode is eventually turned off.  Because hypercalls
+ * are reasonably expensive, batching them up makes sense.  For example, a
+ * large mmap might update dozens of page table entries: that code calls
+ * lguest_lazy_mode(PARAVIRT_LAZY_MMU), does the dozen updates, then calls
+ * lguest_lazy_mode(PARAVIRT_LAZY_NONE).
+ *
+ * So, when we're in lazy mode, we call async_hypercall() to store the call for
+ * future processing.  When lazy mode is turned off we issue a hypercall to
+ * flush the stored calls.
+ *
+ * There's also a hack where "mode" is set to "PARAVIRT_LAZY_FLUSH" which
+ * indicates we're to flush any outstanding calls immediately.  This is used
+ * when an interrupt handler does a kmap_atomic(): the page table changes must
+ * happen immediately even if we're in the middle of a batch.  Usually we're
+ * not, though, so there's nothing to do. */
+static enum paravirt_lazy_mode lazy_mode; /* Note: not SMP-safe! */
 static void lguest_lazy_mode(enum paravirt_lazy_mode mode)
 {
if (mode == PARAVIRT_LAZY_FLUSH) {
@@ -108,6 +133,16 @@ static void lazy_hcall(unsigned long cal
async_hcall(call, arg1, arg2, arg3);
 }
 
+/* async_hcall() is pretty simple: I'm quite proud of it really.  We have a
+ * ring buffer of stored hypercalls which the Host will run through next time we
+ * do a normal hypercall.  Each entry in the ring has 4 slots for the hypercall
+ * arguments, and a "hcall_status" word which is 0 if the call is ready to go,
+ * and 255 once the Host has finished with it.
+ *
+ * If we come around to a slot which hasn't been finished, then the table is
+ * full and we just make the hypercall directly.  This has the nice side
+ * effect of causing the Host to run all the stored calls in the ring buffer
+ * which empties it for next time! */
 void async_hcall(unsigned long call,
 unsigned long arg1, unsigned long arg2, unsigned long arg3)
 {
@@ -115,6 +150,9 @@ void async_hcall(unsigned long call,
static unsigned int next_call;
unsigned long flags;
 
+   /* Disable interrupts if not already disabled: we don't want an
+* interrupt handler making a hypercall while we're already doing
+* one! */
local_irq_save(flags);
if (lguest_data.hcall_status[next_call] != 0xFF) {
/* Table full, so do normal hcall which will flush table. */
@@ -124,7 +162,7 @@ void async_hcall(unsigned long call,
lguest_data.hcalls[next_call].edx = arg1;
lguest_data.hcalls[next_call].ebx = arg2;
lguest_data.hcalls[next_call].ecx = arg3;
-   /* Make sure host sees arguments before "valid" flag. */
+   /* Arguments must all be written before we mark it to go */
wmb();
lguest_data.hcall_status[next_call] = 0;
if (++next_call == LHCALL_RING_SIZE)
@@ -132,9 +170,14 @@ void async_hcall(unsigned long call,
}
local_irq_restore(flags);
 }
-
+/*:*/
+
+/* Wrappers for the SEND_DMA and BIND_DMA hypercalls.  This is mainly because
+ * Jeff Garzik complained that __pa() should never appear in drivers, and this
+ * helps remove most of them.   But also, it wraps some ugliness. */
 void lguest_send_dma(unsigned long key, struct lguest_dma *dma)
 {
+   /* The hcall might not write this if something goes wrong */
dma->used_len = 0;
hcall(LHCALL_SEND_DMA, key, __pa(dma), 0);
 }
@@ -142,11 +185,16 @@ int lguest_bind_dma(unsigned long key, s
 int lguest_bind_dma(unsigned long key, struct lguest_dma *dmas,
unsigned int num, u8 irq)
 {
+   /* This is the only hypercall which actually wants 5 

[PATCH 3/7] lguest: documentation pt III: Drivers

2007-07-20 Thread Rusty Russell
Documentation: The Drivers

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

---
 drivers/block/lguest_blk.c  |  171 +++---
 drivers/char/hvc_lguest.c   |   77 +
 drivers/lguest/lguest_bus.c |   72 
 drivers/net/lguest_net.c|  222 +++
 include/linux/lguest_bus.h  |5 
 include/linux/lguest_launcher.h |   60 ++
 6 files changed, 565 insertions(+), 42 deletions(-)

===
--- a/drivers/block/lguest_blk.c
+++ b/drivers/block/lguest_blk.c
@@ -1,6 +1,12 @@
-/* A simple block driver for lguest.
- *
- * Copyright 2006 Rusty Russell <[EMAIL PROTECTED]> IBM Corporation
+/*D:400
+ * The Guest block driver
+ *
+ * This is a simple block driver, which appears as /dev/lgba, lgbb, lgbc etc.
+ * The mechanism is simple: we place the information about the request in the
+ * device page, then use SEND_DMA (containing the data for a write, or an empty
+ * "ping" DMA for a read).
+ :*/
+/* Copyright 2006 Rusty Russell <[EMAIL PROTECTED]> IBM Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -25,27 +31,50 @@
 
 static char next_block_index = 'a';
 
+/*D:420 Here is the structure which holds all the information we need about
+ * each Guest block device.
+ *
+ * I'm sure at this stage, you're wondering "hey, where was the adventure I was
+ * promised?" and thinking "Rusty sucks, I shall say nasty things about him on
+ * my blog".  I think Real adventures have boring bits, too, and you're in the
+ * middle of one.  But it gets better.  Just not quite yet. */
 struct blockdev
 {
+   /* The block queue infrastructure wants a spinlock: it is held while it
+* calls our block request function.  We grab it in our interrupt
+* handler so the responses don't mess with new requests. */
spinlock_t lock;
 
-   /* The disk structure for the kernel. */
+   /* The disk structure registered with kernel. */
struct gendisk *disk;
 
-   /* The major number for this disk. */
+   /* The major device number for this disk, and the interrupt.  We only
+* really keep them here for completeness; we'd need them if we
+* supported device unplugging. */
int major;
int irq;
 
+   /* The physical address of this device's memory page */
unsigned long phys_addr;
-   /* The mapped block page. */
+   /* The mapped memory page for convenient access. */
struct lguest_block_page *lb_page;
 
-   /* We only have a single request outstanding at a time. */
+   /* We only have a single request outstanding at a time: this is it. */
struct lguest_dma dma;
struct request *req;
 };
 
-/* Jens gave me this nice helper to end all chunks of a request. */
+/*D:495 We originally used end_request() throughout the driver, but it turns
+ * out that end_request() is deprecated, and doesn't actually end the request
+ * (which seems like a good reason to deprecate it!).  It simply ends the first
+ * bio.  So if we had 3 bios in a "struct request" we would do all 3,
+ * end_request(), do 2, end_request(), do 1 and end_request(): twice as much
+ * work as we needed to do.
+ *
+ * This reinforced to me that I do not understand the block layer.
+ *
+ * Nonetheless, Jens Axboe gave me this nice helper to end all chunks of a
+ * request.  This improved disk speed by 130%. */
 static void end_entire_request(struct request *req, int uptodate)
 {
if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
@@ -55,30 +84,62 @@ static void end_entire_request(struct re
end_that_request_last(req, uptodate);
 }
 
+/* I'm told there are only two stories in the world worth telling: love and
+ * hate.  So there used to be a love scene here like this:
+ *
+ *  Launcher:  We could make beautiful I/O together, you and I.
+ *  Guest: My, that's a big disk!
+ *
+ * Unfortunately, it was just too raunchy for our otherwise-gentle tale. */
+
+/*D:490 This is the interrupt handler, called when a block read or write has
+ * been completed for us. */
 static irqreturn_t lgb_irq(int irq, void *_bd)
 {
+   /* We handed our "struct blockdev" as the argument to request_irq(), so
+* it is passed through to us here.  This tells us which device we're
+* dealing with in case we have more than one. */
struct blockdev *bd = _bd;
unsigned long flags;
 
+   /* We weren't doing anything?  Strange, but could happen if we shared
+* interrupts (we don't!). */
if (!bd->req) {
pr_debug("No work!\n");
return IRQ_NONE;
}
 
+   /* Not done yet?  That's equally strange. */
if (!bd->lb_page->result) {
pr_debug("No result!\n");
return IRQ_NONE;
   

Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Dave Jones
On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
 > On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:
 > > Just updated one of my machines to 2.6.22.1, and got this during boot..
 > >
 > > Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
 > >
 > > Under 2.6.21, all was fine.
 > >
 > > sdc is one disk of a 3 disk raid5 set.
 > > The raidset still manages to come up despite this.
 > >
 > > This is a Fedora 7 box, with udev-106-4.1.fc7
 > >
 > > What changed this time?
 > 
 > CONFIG_BLK_DEV_BSG=y?
 > 
 > There's a name-clash, because bsg tries to create devices with the same name.
 > James sent a patch, it's on lkml.

BSG isn't in 2.6.22

Dave
 
-- 
http://www.codemonkey.org.uk


Re: where is the code for read system call?

2007-07-20 Thread Karsten Wiese
On Saturday, 21 July 2007, Agarwal, Lomesh wrote:
> My application reads from a socket. I need to change the behavior of the read
> system call for an experiment. Can someone point me to the code?

fs/read_write.c: line 356
asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Dave Jones <[EMAIL PROTECTED]> wrote:

> Just updated one of my machines to 2.6.22.1, and got this during boot..
>
> Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
>
> Under 2.6.21, all was fine.
>
> sdc is one disk of a 3 disk raid5 set.
> The raidset still manages to come up despite this.
>
> This is a Fedora 7 box, with udev-106-4.1.fc7
>
> What changed this time?


CONFIG_BLK_DEV_BSG=y?

There's a name-clash, because bsg tries to create devices with the same name.
James sent a patch, it's on lkml.

Kay


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

Nishanth Aravamudan wrote:

On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:


On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <[EMAIL PROTECTED]> wrote:



+   }
+
+   offset += ret;
+   retval += ret;
+   len -= ret;
+   index += offset >> HPAGE_SHIFT;
+   offset &= ~HPAGE_MASK;
+
+   page_cache_release(page);
+   if (ret == nr && len)
+   continue;
+   goto out;
+   }
+out:
+   return retval;
+}


This code doesn't have all the ghastly tricks which we deploy to
handle concurrent truncate.


Do I need to ? Baaahh!!  I don't want to deal with them. 


Nick, can you think of any serious consequences of a read/truncate
race in there?  I can't..



All I want is a simple read() to get my oprofile working.  Please
advise.


Did you consider changing oprofile userspace to read the executable
with mmap?



It's not actually oprofile's code, though, it's libbfd (used by
oprofile). And it works fine (presumably) for other binaries.


So... what's the problem with changing it? The fact that it is a
library doesn't really make a difference except that you'll also
help everyone else who links with it.

It won't break backwards compatibility, and it will work on older
kernels...

--
SUSE Labs, Novell Inc.
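Nick's suggestion is to have the reader map the executable instead of calling read() on it. A minimal sketch of reading a whole file through POSIX mmap() follows; map_file() and unmap_file() are hypothetical helpers, not libbfd functions. One caveat we have not verified: on hugetlbfs the mapping length may have to be a multiple of the huge page size, which this sketch does not handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on error, or for an empty file
 * since mmap() rejects a zero length. On success *len is the file size. */
static const void *map_file(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd;

	assert(path != NULL && len != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return p;
}

static void unmap_file(const void *p, size_t len)
{
	munmap((void *)p, len);
}
```

Closing the descriptor right after mmap() is safe because the mapping keeps its own reference to the file.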


film at 11: kernel update breaks udev.

2007-07-20 Thread Dave Jones
Just updated one of my machines to 2.6.22.1, and got this during boot..

Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists

Under 2.6.21, all was fine.

sdc is one disk of a 3 disk raid5 set.
The raidset still manages to come up despite this.

This is a Fedora 7 box, with udev-106-4.1.fc7

What changed this time?

Dave

-- 
http://www.codemonkey.org.uk


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Gabriel C
Thomas Gleixner wrote:
>

[...]

> As usual, comments and suggestions are welcome!


Compiles and boots fine here (on my Dell Precision WorkStation 530 MT), and nothing broke so far.

I only got some Kconfig warnings[1] with my config[2], but that's all.

(I don't know whether this matters, but it boots 7.52 seconds faster than current git head.)

[1]http://194.231.229.228/linux-x86/warning
[2]http://194.231.229.228/linux-x86/config-x86

> 
>   Thomas, Ingo
> 
> 


Regards,

Gabriel C


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Greg KH
On Sat, Jul 21, 2007 at 02:28:52AM +0200, Michal Piotrowski wrote:
>  On 21/07/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> > Oh, which means ...
> >
> >
> > On 7/21/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> > > On 7/21/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > > On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > > > > On Fri, 20 Jul 2007 15:50:47 -0700
> > > > > Greg KH <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > > > > >  Hi Greg,
> > > > > > >
> > > > > > >  This looks like a sysfs bug
> > > > > > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > > > > > >  broken-out-2007-07-20-00-22/3.jpg
> > > > > > >
> > > > > > >  l *kernel_param_sysfs_setup+0x75
> > > > > > >  0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > > > > >  565 mk->mod = THIS_MODULE;
> > > > > > >  566 kobj_set_kset_s(mk, module_subsys);
> >
> > > > > > > >  567 kobject_set_name(&mk->kobj, name);
> >
> > Shouldn't the return of kobject_set_name() be checked here?
> >
> > [ Looking at code, and realizing that kobject_set_name() manages to
> > succeed even when given a null string! ]
> >
> > > > > > > >  568 kobject_init(&mk->kobj);
> > > > > > > >  569 ret = kobject_add(&mk->kobj);
> > > > > > > >  570 BUG_ON(ret < 0);
> > > > > > > >  571 param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > > > > > >  572 kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > > > > >  573 }
> > > > > > >  574
> > > > > > >
> > > > > > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > > > > > >  broken-out-2007-07-20-00-22/mm-config
> > > > > >
> > > > > > What kernel version is this happening on?  The -mm tree?  Can you try
> > > > > > Linus's tree instead?
> > > > > >
> > > > > > It looks like there was some needed information right before the first
> > > > > > stack dump, showing exactly what kobject was trying to be added that was
> > > > > > already present.  Odds are this is a kernel parameter with the same name
> > > > > > as a duplicate one within the same module,
> > >
> > > I don't think that's an -EEXIST.
> > >
> > > I think what we have here is kobject_add() exiting with -EINVAL.
> > > (kobject attempted to be registered with no name!)
> > >
> > > [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
> > > That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
> > > we would've seen an offset in kobject_shadow_add closer to 0x189,
> > > because the dump_stack() for EEXIST is barely 4 instructions before
> > > we return from that function. ]
> > >
> > > > > > but the trick is going to be
> > > > > > trying to figure out what module is causing this.
> > >
> > > So I'd guess we want to search for a module that's passing a kobject *
> > > to kobject_add() such that !kobj->k_name is true.
> >
> > Oh, that's kernel_param_sysfs_setup itself. So we actually need to
> > search for a built-in module in Michal's config that ... has an ... empty
> > "" modname !?
> 
>  I'll try to figure out this

Try the patch below to help you boot and figure out what went wrong.

Post the kernel log results and I'll try to help you out.

thanks,

greg k-h

---
 kernel/params.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
kobject_set_name(&mk->kobj, name);
kobject_init(&mk->kobj);
ret = kobject_add(&mk->kobj);
-   BUG_ON(ret < 0);
+   if (ret) {
+   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
+   "the system will be unstable now.\n", name);
+   return;
+   }
param_sysfs_setup(mk, kparam, num_params, name_skip);
kobject_uevent(&mk->kobj, KOBJ_ADD);
 }



Re: [PATCH] Use the tsk argument in init_new_context()

2007-07-20 Thread Diego Woitasen
On Thu, Jul 19, 2007 at 05:42:38PM -0700, Andrew Morton wrote:
> On Sun,  8 Jul 2007 22:55:08 -0300
> Diego Woitasen <[EMAIL PROTECTED]> wrote:
> 
> > Signed-off-by: Diego Woitasen <[EMAIL PROTECTED]>
> > ---
> >  arch/i386/kernel/ldt.c   |2 +-
> >  arch/x86_64/kernel/ldt.c |2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/i386/kernel/ldt.c b/arch/i386/kernel/ldt.c
> > index e0b2d17..c2eb4fb 100644
> > --- a/arch/i386/kernel/ldt.c
> > +++ b/arch/i386/kernel/ldt.c
> > @@ -96,7 +96,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> >  
> > init_MUTEX(&mm->context.sem);
> > mm->context.size = 0;
> > -   old_mm = current->mm;
> > +   old_mm = tsk->mm;
> > if (old_mm && old_mm->context.size > 0) {
> > down(&old_mm->context.sem);
> > retval = copy_ldt(&mm->context, &old_mm->context);
> > diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
> > index bc9ffd5..99a92ed 100644
> > --- a/arch/x86_64/kernel/ldt.c
> > +++ b/arch/x86_64/kernel/ldt.c
> > @@ -100,7 +100,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> >  
> > init_MUTEX(&mm->context.sem);
> > mm->context.size = 0;
> > -   old_mm = current->mm;
> > +   old_mm = tsk->mm;
> > if (old_mm && old_mm->context.size > 0) {
> > down(&old_mm->context.sem);
> > retval = copy_ldt(&mm->context, &old_mm->context);
> 
> 
> When called from dup_mm(), `tsk' refers to the new task and `current'
> refers to the old one.  I'd have expected this to crash during your testing?

Yes, sorry... that patch is bad. Now my question is: why do all
architectures have the task argument when none of them use it? I understand now
that init_new_context() works with current, but what about the *tsk arg?



-- 

--
Diego Woitasen


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
> >  Always look at the parent devices themselves for determining device
> >  context properties.
> 
> For determining?
> 
> What was the original language of this document?

Ok, that's just being mean, cut it out right now if you ever want my
help again.

I'll gladly accept patches for this document that is in the kernel tree
now if you want to send them.  But criticizing the grammar of a document
with statements like this one gets you nowhere and is damn rude.

I suggest you start this thread over if you want my feedback, I'm not
going to respond anymore to this one.

greg k-h


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
> I'm not trying to document /sys/devices.  I'm trying to document hotplug, 
> populating /dev, and things like firmware loading that fall out of that.  
> This requires use of sysfs, and I'm only trying to document as much of sysfs 
> as you need to do that.

Like I stated before, you do not need to even have sysfs mounted to have
a dynamic /dev.

And why do you need to document populating /dev dynamically?  udev
already solves this problem for you; it's not like people going off
and reinventing udev for their own enjoyment would not at least look at
how it solves this problem first.

To do otherwise would be foolish :)

Firmware loading is fine to document if you wish to do so.  But again,
why?  We already have multiple userspace programs that provide this
feature for them.  Perhaps you want to document how to add firmware to a
system in order for these different programs to pick them up?

Or perhaps you want to document how to add this kind of functionality to
your kernel driver so that it can handle firmware loading by using the
firmware interface that the kernel provides?

If you just want to document the hotplug/uevent api, then do just that.
However I think you are overreaching with your scope here and getting
mighty confused in the process.

thanks,

greg k-h


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
> Ok, back up.  /sys/devices does not contain all the information necessary to 
> populate /dev, because it hasn't got things like 
> ramdisks, /dev/zero, /dev/console which are THERE in sysfs, which may or may 
> not be supported by the kernel (the kernel might have ramdisk support, might 
> not).

Welcome to 2007:

$ ls /sys/devices/virtual/mem/
full  kmem  kmsg  mem  null  port  random  urandom  zero
$ ls /sys/devices/virtual/tty/
console  tty12  tty19  tty25  tty31  tty38  tty44  tty50  tty57  tty63
ptmx     tty13  tty2   tty26  tty32  tty39  tty45  tty51  tty58  tty7
tty      tty14  tty20  tty27  tty33  tty4   tty46  tty52  tty59  tty8
tty0     tty15  tty21  tty28  tty34  tty40  tty47  tty53  tty6   tty9
tty1     tty16  tty22  tty29  tty35  tty41  tty48  tty54  tty60
tty10    tty17  tty23  tty3   tty36  tty42  tty49  tty55  tty61
tty11    tty18  tty24  tty30  tty37  tty43  tty5   tty56  tty62

I suggest you take a close look at the kernel before making statements
like the above :)

> These things could also, in future, have their major and minor numbers 
> dynamically (even randomly) assigned.  That's been discussed on this list.

I tried that once, it will require some core api kernel changes and a
lot of infrastructure work to get that to work properly.  Not that it
will never happen in the future, but it's just not a trivial change at
the moment...

thanks,

greg k-h
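As the listing shows, null, zero, console and friends have entries under /sys/devices/virtual. A tool populating /dev typically reads each device's "dev" attribute, which holds the numbers as "major:minor" (for example "1:3" for null), and then creates the node. The sketch below covers only reading and parsing that attribute; parse_dev() and read_dev_attr() are hypothetical names, and creating the node itself (e.g. with mknod()) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs "dev" attribute of the form "major:minor" (newline optional).
 * Returns 0 on success, -1 if the text is not in that form. */
static int parse_dev(const char *s, unsigned int *major, unsigned int *minor)
{
	char *end;
	unsigned long ma, mi;

	assert(s != NULL && major != NULL && minor != NULL);
	if (*s < '0' || *s > '9') /* strtoul() would accept spaces and signs */
		return -1;
	ma = strtoul(s, &end, 10);
	if (*end != ':')
		return -1;
	s = end + 1;
	if (*s < '0' || *s > '9')
		return -1;
	mi = strtoul(s, &end, 10);
	if (*end != '\n' && *end != '\0')
		return -1;
	*major = (unsigned int)ma;
	*minor = (unsigned int)mi;
	return 0;
}

/* Read and parse e.g. /sys/devices/virtual/mem/null/dev. */
static int read_dev_attr(const char *path, unsigned int *major,
			 unsigned int *minor)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_dev(buf, major, minor);
}
```

On a running Linux system, read_dev_attr("/sys/devices/virtual/mem/null/dev", ...) should report major 1, minor 3.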


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Rob Landley
On Wednesday 18 July 2007 7:40:20 pm Greg KH wrote:
> On Wed, Jul 18, 2007 at 01:39:53PM -0400, Rob Landley wrote:
> > PICK ONE!  JUST 

[GIT PULL] MMC updates

2007-07-20 Thread Pierre Ossman
Linus, please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc.git for-linus

to receive the following updates:

 MAINTAINERS |7 ++-
 drivers/mmc/host/at91_mci.c |   13 -
 drivers/mmc/host/sdhci.c|2 ++
 drivers/mmc/host/sdhci.h|1 +
 4 files changed, 21 insertions(+), 2 deletions(-)

Marc Pignat (1):
  mmc: at91_mci: wakeup on card insertion (or removal)

Pierre Ossman (2):
  mmc: add maintainer for at91
  sdhci: make sure to clear the error interrupt

diff --git a/MAINTAINERS b/MAINTAINERS
index fbe0dca..c9fab2b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -645,7 +645,12 @@ W: http://linux-atm.sourceforge.net
 S: Maintained
 
 ATMEL AT91 MCI DRIVER
-S: Orphan
+P: Nicolas Ferre
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED] (subscribers-only)
+W: http://www.atmel.com/products/AT91/
+W: http://www.at91.com/
+S: Maintained
 
 ATMEL MACB ETHERNET DRIVER
 P: Haavard Skinnemoen
diff --git a/drivers/mmc/host/at91_mci.c b/drivers/mmc/host/at91_mci.c
index 28c8818..15aab37 100644
--- a/drivers/mmc/host/at91_mci.c
+++ b/drivers/mmc/host/at91_mci.c
@@ -903,8 +903,10 @@ static int __init at91_mci_probe(struct platform_device *pdev)
/*
 * Add host to MMC layer
 */
-   if (host->board->det_pin)
+   if (host->board->det_pin) {
host->present = !at91_get_gpio_value(host->board->det_pin);
+   device_init_wakeup(&pdev->dev, 1);
+   }
else
host->present = -1;
 
@@ -940,6 +942,7 @@ static int __exit at91_mci_remove(struct platform_device *pdev)
host = mmc_priv(mmc);
 
if (host->present != -1) {
+   device_init_wakeup(&pdev->dev, 0);
free_irq(host->board->det_pin, host);
cancel_delayed_work(&host->mmc->detect);
}
@@ -966,8 +969,12 @@ static int __exit at91_mci_remove(struct platform_device *pdev)
 static int at91_mci_suspend(struct platform_device *pdev, pm_message_t state)
 {
struct mmc_host *mmc = platform_get_drvdata(pdev);
+   struct at91mci_host *host = mmc_priv(mmc);
int ret = 0;
 
+   if (device_may_wakeup(&pdev->dev))
+   enable_irq_wake(host->board->det_pin);
+
if (mmc)
ret = mmc_suspend_host(mmc, state);
 
@@ -977,8 +984,12 @@ static int at91_mci_suspend(struct platform_device *pdev, pm_message_t state)
 static int at91_mci_resume(struct platform_device *pdev)
 {
struct mmc_host *mmc = platform_get_drvdata(pdev);
+   struct at91mci_host *host = mmc_priv(mmc);
int ret = 0;
 
+   if (device_may_wakeup(&pdev->dev))
+   disable_irq_wake(host->board->det_pin);
+
if (mmc)
ret = mmc_resume_host(mmc);
 
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 10d15c3..4a24db0 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1024,6 +1024,8 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
 
intmask &= ~(SDHCI_INT_CMD_MASK | SDHCI_INT_DATA_MASK);
 
+   intmask &= ~SDHCI_INT_ERROR;
+
if (intmask & SDHCI_INT_BUS_POWER) {
printk(KERN_ERR "%s: Card is consuming too much power!\n",
mmc_hostname(host->mmc));
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 7400f4b..a6c8704 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -107,6 +107,7 @@
 #define  SDHCI_INT_CARD_INSERT 0x00000040
 #define  SDHCI_INT_CARD_REMOVE 0x00000080
 #define  SDHCI_INT_CARD_INT    0x00000100
+#define  SDHCI_INT_ERROR       0x00008000
 #define  SDHCI_INT_TIMEOUT     0x00010000
 #define  SDHCI_INT_CRC         0x00020000
 #define  SDHCI_INT_END_BIT     0x00040000


-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
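The sdhci change drops SDHCI_INT_ERROR from the leftover mask. Our reading, which is an assumption: bit 15 of the status word only summarises the error bits in the upper half (timeout, CRC, end bit, ...), which the command and data handlers already consume, so without the new line an ordinary command error would leave bit 15 set and be treated as an unexpected interrupt. The user-space sketch below models only that masking step; leftover_bits() is a hypothetical name, and CMD_MASK/DATA_MASK are trimmed-down stand-ins for the driver's masks.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the SDHCI interrupt status register. */
#define INT_RESPONSE 0x00000001u
#define INT_DATA_END 0x00000002u
#define INT_ERROR    0x00008000u /* summary: some error bit above is set */
#define INT_TIMEOUT  0x00010000u
#define INT_CRC      0x00020000u
#define INT_END_BIT  0x00040000u

/* Trimmed-down stand-ins; the driver's real masks contain more bits. */
#define CMD_MASK  (INT_RESPONSE | INT_TIMEOUT | INT_CRC | INT_END_BIT)
#define DATA_MASK (INT_DATA_END)

/* Bits still set after the command and data handlers have taken theirs.
 * clear_summary = 1 models the patched handler, 0 the old one. */
static uint32_t leftover_bits(uint32_t intmask, int clear_summary)
{
	intmask &= ~(uint32_t)(CMD_MASK | DATA_MASK);
	if (clear_summary)
		intmask &= ~(uint32_t)INT_ERROR;
	return intmask;
}
```

A command timeout raises INT_TIMEOUT together with the summary bit: the old path leaves 0x00008000 behind, the patched path leaves nothing.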


Re: [git patches] two warning fixes

2007-07-20 Thread Benjamin Herrenschmidt
On Fri, 2007-07-20 at 20:34 +0200, Krzysztof Halasa wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> > More people *should* generally ask themselves: "was the warning worth it?" 
> > and then, if the answer is "no", they shouldn't add code, they should 
> > remove the thing that causes the warning in the first place.
> 
> Sure. If a routine uses must_check yet its return value may be
> safely ignored then that must_check is simply misplaced and should
> be removed. It does not mean all must_checks are bad - each of them
> isn't bad unless one can demonstrate it is.
> 
> Back to sysfs_create_bin_file() - if one can demonstrate a caller
> can safely ignore the return value (which, it seems, is the
> case), then exactly this very must_check should be removed

Typically, the EDID creation in radeonfb :-)

In fact, I'm not even sure there's -any- user of those sysfs files. I
added them back then to allow distros to extract the EDID infos that
were probed by radeonfb to properly configure the X server (because on
some machines, the EDID is coming from the firmware/BIOS, not from DDC,
and X can't get at it). I don't know if they ever used them.

In any case, it doesn't make sense to abort initialization of the driver
if for some reasons those files can't be created (for example, the core
fbdev starts exposing EDID files, radeonfb isn't properly updated, name
clash, error). Aborting the initialization will make sure that on some
machines such as powermacs with radeon, whatever error is displayed will
never be seen by the user.

That's a typical example, but I have plenty more.

For example, the powermac thermal control drivers. They work pretty well
by themselves. They also expose via sysfs all the current values, fan
speeds, temps, etc., for the sake of whoever wants to do a GUI or
"monitor" what's going on, but that is not critical to the operation of
the driver. Thus, failure to create those files is not critical.

I have plenty other examples.

Thus, we have two choices here:

 - The simple one: sysfs_create_blah() displays a warning when it fails
and has no must_check

 - The one that adds code everywhere (the current one):
sysfs_create_blah() returns an error, has must_check, and thus all
callers like I described above need to add code to test it and print a
warning. Lots of added .text and .data for little benefit.
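A minimal userspace sketch of the first option, assuming a hypothetical stand-in for the sysfs creation call (the real sysfs_create_bin_file() takes a kobject and a bin_attribute, not a name and a flag). The helper reports the failure itself and returns nothing, so a caller like radeonfb or a thermal driver carries no error-handling code; the file names are illustrative only:

```c
#include <errno.h>
#include <stdio.h>

/* Counts warnings so the behaviour can be observed. */
static int sysfs_warnings;

/* Hypothetical stand-in for a sysfs file-creation primitive;
 * fails with -EEXIST when asked to simulate a name clash. */
static int fake_sysfs_create(const char *name, int simulate_clash)
{
	(void)name;
	return simulate_clash ? -EEXIST : 0;
}

/* Option 1: no must_check. The helper warns on failure and the
 * caller simply carries on, since the file is not critical. */
static void sysfs_create_or_warn(const char *name, int simulate_clash)
{
	int err = fake_sysfs_create(name, simulate_clash);

	if (err) {
		sysfs_warnings++;
		fprintf(stderr, "warning: could not create sysfs file '%s' (%d)\n",
			name, err);
	}
}

/* Hypothetical probe: initialization succeeds even if the optional
 * diagnostic files cannot be created. */
static int driver_probe(int simulate_clash)
{
	sysfs_create_or_warn("edid1", simulate_clash);
	sysfs_create_or_warn("edid2", simulate_clash);
	return 0;
}
```

The warning lives in one place instead of being duplicated at every call site, which is the .text/.data saving the second option gives up.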

Cheers,
Ben.




Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Arnd Bergmann
On Saturday 21 July 2007, Thomas Gleixner wrote:
> The topic of sharing more x86 code has been discussed on LKML a number 
> of times. Various approaches were discussed and we decided to advance 
> the discussion by implementing a full solution that brings the 
> transition to a shared tree to completion.

Great stuff. I've worked on doing the same for s390 and powerpc
in the past, and really think it's the right thing to do. I've
even started my own x86 merge two or three times in the past
but never got very far because of the quickly moving source.

> In this initial implementation the old arch/i386 and arch/x86_64 trees 
> are removed _immediately_, in the same commit, and all future x86 
> development goes on in the new, shared tree. So the transition right now 
> is one atomic operation.
> 
> As a next step we plan to generate a gradual, fully bisectable, fully
> working switchover from the current code to the fully populated
> arch/x86 tree. It will result in about 1000-2000 commits. We are
> releasing our current solution because it 100% represents the finally
> resulting arch/x86 source tree already, and we first wanted to make
> sure that the new architecture layout works fine and folks are happy
> before we go and do the (even more complex) fine-grained work.

I don't think it's really good to do it this way, or maybe I'm still
misunderstanding where you're going. If you really want to end
up with the exact set of files that you have your tree now, I see
absolutely zero point in making it bisectable. On the contrary,
there is nothing particularly complicated in it, so once it has
seen some amount of testing it can better get merged in one
big changeset. I'm just not convinced that it actually is what
we want to end up with.

In my experience, it's very helpful to have a single set of header
files, and merging the two versions of one header usually exposes
bugs that have been fixed in only one of the two, so you get
to fix actual bugs in the process.

In the s390 merge, I also started out in an attempt to guarantee
unchanged object files, much like what you describe. However, it
turned out that fixing it in the process is actually easier.
Either way, 'diff -D __x86_64__' is a great tool to start with; you
should try it out to see how easy it is to merge a lot of files.
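Roughly, `diff -D __x86_64__ old.h new.h` writes one merged file: lines common to both inputs appear once, and differing hunks are wrapped as `#ifndef __x86_64__` (first file's lines), `#else` (second file's lines), `#endif`. A sketch of what a merged header might look like after running it on hypothetical i386 and x86_64 versions; every identifier here is invented for illustration:

```c
#include <stddef.h>

/* Identical in both inputs, so it is emitted once. */
#define EXAMPLE_PAGE_SHIFT 12

#ifndef __x86_64__
/* Lines that existed only in the (hypothetical) i386 header. */
typedef unsigned int example_reg_t;
#else /* __x86_64__ */
/* Lines that existed only in the (hypothetical) x86_64 header. */
typedef unsigned long example_reg_t;
#endif /* __x86_64__ */

/* Register width as seen by the merged header. */
static size_t example_reg_size(void)
{
	return sizeof(example_reg_t);
}
```

Hunks like this are then candidates for hand-merging into a single definition where the two sides turn out to be equivalent.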

To put it into perspective, I think the s390 merge was a lot easier
than the x86 merge, because there is only a very limited set of
hardware configurations for s390 compared to others. We ended up
doing the full merge with three people within less than a week
and no separate files at all.

OTOH, the powerpc merge is now going into its third year, mostly
because it was started with the intention to remove all cruft
in the process and to only allow sane code into the new architecture.

The steps that I'd suggest instead are:

* merge all exported header files of the two architectures. This
  alone is a worthy goal, because it allows us to get rid of
  the ugly code for deciding which version to use in installed
  headers and elsewhere.

* Merge the remaining header files, to end up with a single
  include/asm-x86 directory.

* Come up with a model that integrates the machine type selection
  of i386 with the way we build things on x86_64. One way would
  be to make X86_64 another platform next to X86_PC, X86_VOYAGER
  and the others.

* Create an arch/x86/Kconfig that handles the new common
  configuration

* Create an arch/x86/Makefile that descends into ../i386/* and
  ../x86_64/* instead of its subdirectories.

* Merge the arch/x86/* subdirectories, one at a time, starting with
  the low-hanging fruit like oprofile or pci, and do the hard
  ones like mm and kernel last.

Unfortunately, I don't think I'll spend much time on this, so I
don't get to decide on it, but you asked for feedback ;-)

Arnd <><


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Michal Piotrowski

On 21/07/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> Oh, which means ...
>
> On 7/21/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> > On 7/21/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > > > On Fri, 20 Jul 2007 15:50:47 -0700
> > > > Greg KH <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > > > >  Hi Greg,
> > > > > >
> > > > > >  This looks like a sysfs bug
> > > > > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > > > > >  broken-out-2007-07-20-00-22/3.jpg
> > > > > >
> > > > > >  l *kernel_param_sysfs_setup+0x75
> > > > > >  0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > > > >  565 mk->mod = THIS_MODULE;
> > > > > >  566 kobj_set_kset_s(mk, module_subsys);
> > > > > >  567 kobject_set_name(&mk->kobj, name);
>
> Shouldn't the return of kobject_set_name() be checked here?
>
> [ Looking at code, and realizing that kobject_set_name() manages to
> succeed even when given a null string! ]
>
> > > > > >  568 kobject_init(&mk->kobj);
> > > > > >  569 ret = kobject_add(&mk->kobj);
> > > > > >  570 BUG_ON(ret < 0);
> > > > > >  571 param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > > > >  572 kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > > > >  573 }
> > > > > >  574
> > > > > >
> > > > > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > > > > >  broken-out-2007-07-20-00-22/mm-config
> > > > >
> > > > > What kernel version is this happening on?  The -mm tree?  Can you try
> > > > > Linus's tree instead?
> > > > >
> > > > > It looks like there was some needed information right before the first
> > > > > stack dump, showing exactly what kobject was trying to be added that was
> > > > > already present.  Odds are this is a kernel parameter with the same name
> > > > > as a duplicate one within the same module,
> >
> > I don't think that's an -EEXIST.
> >
> > I think what we have here is kobject_add() exiting with -EINVAL.
> > (kobject attempted to be registered with no name!)
> >
> > [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
> > That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
> > we would've seen an offset in kobject_shadow_add closer to 0x189,
> > because the dump_stack() for EEXIST is barely 4 instructions before
> > we return from that function. ]
> >
> > > > > but the trick is going to be
> > > > > trying to figure out what module is causing this.
> >
> > So I'd guess we want to search for a module that's passing a kobject *
> > to kobject_add() such that !kobj->k_name is true.
>
> Oh, that's kernel_param_sysfs_setup itself. So we actually need to
> search for a built-in module in Michal's config that ... has an ... empty
> "" modname !?

I'll try to figure this out.

> Shouldn't that turn up pretty quickly in a grep?
>
> How do I do that, btw?



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: v2.6.22.1-rt3

2007-07-20 Thread Thomas Gleixner
On Thu, 2007-07-19 at 20:37 -0700, Daniel Walker wrote:
> The broken out series is here,
> ftp://source.mvista.com/pub/dwalker/rt/patch-2.6.22.1-rt4-dw1.tar.gz

I'll pick that up soon.

Thanks,

tglx




Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Rob Landley
On Thursday 19 July 2007 4:16:17 am Cornelia Huck wrote:
> On Wed, 18 Jul 2007 13:39:53 -0400,
>
> Rob Landley <[EMAIL PROTECTED]> wrote:
> > Nope.  If you recurse down under /sys/class following symlinks, you go
> > into an endless loop bouncing off of /sys/devices and getting pointed
> > back.  If you don't follow symlinks, it works fine up until about 2.6.20
> > at which point things that were previously directories BECAME symlinks
> > because the directories got moved, and it all broke.
>
> I have no idea what you're doing.

See the email to Kay Sievers.  In 2.6.14 following symlinks hit an endless
/sys/block/hda/device/block/device/block/device/block...  This has changed 
since, like much of sysfs, but in the absence of either a spec or a stable 
API there's no guarantee it won't reoccur.
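The no-follow traversal described above can be sketched with POSIX nftw() and the FTW_PHYS flag, which reports a symlink as FTW_SL instead of descending through it. This is a generic sketch of the technique, not code from udev or any other tool mentioned in the thread, and as noted above it also skips directories that have since been turned into symlinks:

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static int entries;   /* everything visited */
static int symlinks;  /* symlinks reported but not followed */

static int visit(const char *path, const struct stat *sb,
		 int typeflag, struct FTW *ftwbuf)
{
	(void)path; (void)sb; (void)ftwbuf;
	entries++;
	if (typeflag == FTW_SL)
		symlinks++;
	return 0; /* keep walking */
}

/* Walk a tree without following symlinks (FTW_PHYS), so a link that
 * points back up the hierarchy cannot cause an endless loop. */
static int walk_no_follow(const char *root)
{
	entries = symlinks = 0;
	return nftw(root, visit, 16, FTW_PHYS);
}
```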

> > Which is why I want it documented where to look for these suckers.  Just
> > give me ONE STABLE WAY TO FIND THIS INFORMATION, PLEASE.
>
> See Documentation/sysfs-rules.txt.

Ok:

Paragraph 1: "It's not stable."
Paragraph 2: "It's not stable."
Paragraph 3: If you really really need to access it directly...
Paragraph 4: DO NOT DO $XXX.
Paragraph 5: Expect it to be mounted at /sys
Paragraph 6: DO NOT DO $XXX.  (Specifically, the way you were distinguishing 
between block and char devices?  Don't do that.  No, we won't tell you what 
to replace it with, keep reading.)

So far, not exactly gripping reading.

Paragraph 7: What a devpath is.  Ok, is it just me or does it say that 
applications shouldn't use the symlinks in sysfs?  Why are they there, then?

Paragraph 8: The kernel has a name for the device.
Paragraph 9: Subsystem is a string.  What it means, we leave for you to guess.
Paragraph 10: Driver is the name of a driver.  (Does this mean a driver is 
currently loaded and handling the device, or that the kernel is suggesting a 
driver based on something like PCI ID, through the kind of mechanism that 
used to be used to request module loading?  Experimentally, it looks like the 
first, which makes sense but isn't specified.  Does something 
like /sys/class/mem/zero or have a driver?  Experimentally, no, it hasn't got 
a device link.)
Paragraph 11: Attributes, and yet more DO NOT DO $XXX.  It took me three reads 
of that to figure out they probably meant "Attributes belong to a device, 
don't confuse the attributes of another device with attributes of this 
device."  (Following _which_ device symlink?)

Ok, back up.  /sys/devices does not contain all the information necessary to 
populate /dev, because it hasn't got things like 
ramdisks, /dev/zero, /dev/console which are THERE in sysfs, which may or may 
not be supported by the kernel (the kernel might have ramdisk support, might 
not).  These things could also, in future, have their major and minor numbers 
dynamically (even randomly) assigned.  That's been discussed on this list.

I'm not trying to document /sys/devices.  I'm trying to document hotplug, 
populating /dev, and things like firmware loading that fall out of that.  
This requires use of sysfs, and I'm only trying to document as much of sysfs 
as you need to do that.  I'm not documenting stuff 
like /sys/devices/system/cpu.

The consensus so far is "the udev implementation is the spec", except I 
watched the udev implementation change rather a lot before I stopped tracking 
it, and saw a number of people complain on this list about things breaking 
when they upgraded the kernel but not udev.

Back to reading the document:
> - Properties of parent devices never belong into a child device.

Belong into?

>  Always look at the parent devices themselves for determining device
>  context properties.

For determining?

What was the original language of this document?

> If the device 'eth0' or 'sda' does not have a
>   "driver"-link, then this device does not have a driver.

Again, whether they mean "the kernel was not built with a driver that can 
handle this device" or "no driver is currently loaded and handling this 
device".  It _sounds_ like "this device is not supported by Linux", which 
probably isn't what they meant.
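The quoted rule boils down to testing for a `driver` symlink in the device's sysfs directory. A minimal sketch, assuming (per the experimental reading above) that the link's presence means a driver is currently bound; the function name is hypothetical:

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if <devdir>/driver exists and is a symlink, 0 if not,
 * -1 if the path does not fit in the buffer. */
static int has_driver_link(const char *devdir)
{
	char path[4096];
	struct stat st;
	int n = snprintf(path, sizeof path, "%s/driver", devdir);

	if (n < 0 || (size_t)n >= sizeof path)
		return -1;
	/* lstat(), not stat(): inspect the link itself, not its target. */
	if (lstat(path, &st) != 0)
		return 0;
	return S_ISLNK(st.st_mode) ? 1 : 0;
}
```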

> Never copy any property of the parent-device into a child-device.

I note that the only mention made so far of parent-child relationships in 
devices is in terms of "don'ts".  I assume they're talking about how a 
partition can be the child of a block device, and a network controller card 
can be the child of a pci bus device?

Ah, I see.  The next paragraph is on hierarchy, yet doesn't actually explain 
anything, other than to imply that the device hierarchy being fully 
represented there is a dream to be achieved sometime in the future but not 
necessarily the truth with today's kernels, because stuff is still being 
_moved_ into /sys/devices.

> - Classification by subsystem
>  There are currently three places for classification of devices:
>  /sys/block, /sys/class and /sys/bus.

So if somebody wants to write code that runs on a current kernel, they have no 
alternative but to 

Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Michal Piotrowski

On 21/07/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Michal Piotrowski <[EMAIL PROTECTED]> wrote:
>
> > >We are pleased to announce a project we've been working on for some
> > >time: the unified x86 architecture tree, or "arch/x86" - and we'd
> > >like to solicit feedback about it.
> > >
> > >What is this about?
> > [..]
> > >As usual, comments and suggestions are welcome!
> >
> > I really like this idea - code duplication is a bad thing.
> >
> > BTW. I don't see any regression here :)
>
> cool - could you tell us a bit more about on what type of box you tried
> it,

it is an old P4 (i386)

> and how wide and versatile the .config is?

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-git15/config

> Ingo


Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread H. Peter Anvin
Alan Cox wrote:
> On Fri, 20 Jul 2007 18:38:39 -0400
> Jeff Garzik <[EMAIL PROTECTED]> wrote:
> 
>> I agree with Andi...  it's quite nice to be able to leave some arch/i386 
>> stuff, and not carry it over to arch/x86-64.
> 
> It's easy enough to push that stuff into arch/x86/legacy and have one
> subdirectory of stuff to pull in for ancient systems.

The other thing is that "legacy" in this context is fungible.  No IOMMU
was legacy until the Intel x86-64 chips came out, and I can promise you
that some legacy code will be necessary once we start seeing VIA and
others come out with embedded x86-64.

On the other hand, it's pretty bloody safe to assume that we'll never
see an x86-64 chip without CPUID, CMOV, FXSAVE, SSE-2, CMPXCHG, etc.
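Most of that list maps to feature bits that CPUID leaf 1 reports in EDX. A userspace sketch using the `__get_cpuid()` helper from GCC/Clang's `<cpuid.h>`; reading "CMPXCHG" as CMPXCHG8B is an assumption, and the function name is made up:

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* CPUID leaf 1, EDX feature bits. */
#define FEAT_CX8   (1u << 8)   /* CMPXCHG8B */
#define FEAT_CMOV  (1u << 15)  /* CMOVcc */
#define FEAT_FXSR  (1u << 24)  /* FXSAVE/FXRSTOR */
#define FEAT_SSE2  (1u << 26)  /* SSE2 */

/* Returns 1 if all of the listed baseline features are reported. */
static int has_x86_64_baseline(void)
{
	unsigned int eax, ebx, ecx, edx;
	const uint32_t need = FEAT_CX8 | FEAT_CMOV | FEAT_FXSR | FEAT_SSE2;

	/* __get_cpuid() returns 0 if the requested leaf is unavailable. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (edx & need) == need;
}
#endif
```

On i386 these bits have to be tested at runtime or selected in Kconfig; on x86-64 the check above should always pass.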

-hpa


Re: joydev.c and saitek cyborg evo force

2007-07-20 Thread Renato Golin

On 20/06/07, Jiri Kosina <[EMAIL PROTECTED]> wrote:
> Could you please send me the report descriptor of the device, so that I
> could debug it locally here?


Hi Jiri,

sorry for the delay; below is the report descriptor, and attached is the
full report from when I connected the joystick.


report descriptor (size 851, read 851) =  05 01 09 04 a1 01 09 01 a1
00 85 06 09 30 15 00 26 00 10 35 00 46 00 10 75 10 95 01 81 02 09 31
81 02 05 02 09 bb 26 ff 00 46 ff 00 75 08 81 02 05 09 19 01 29 0c 25
01 45 01 75 01 95 0c 81 02 05 01 09 39 25 07 46 3b 01 55 00 65 44 75
04 95 01 81 42 65 00 05 02 09 ba 26 ff 00 46 ff 00 75 08 81 02 c0 05
0f 09 92 a1 02 85 02 09 a6 09 a4 09 a0 09 9f 25 01 45 00 75 01 95 04
81 02 75 04 95 01 81 03 09 22 75 07 25 09 81 02 09 94 75 01 25 01 81
02 75 08 81 03 c0 09 21 a1 02 85 0b 09 22 25 09 91 02 09 25 a1 02 09
26 09 30 09 32 09 31 09 33 09 34 09 40 09 41 15 01 25 08 91 00 c0 09
53 25 0c 75 05 91 02 09 56 15 00 25 01 75 01 91 02 09 55 a1 02 05 01
09 30 09 31 95 02 91 02 c0 05 0f 09 50 27 fe ff 00 00 47 fe ff 00 00
75 10 95 01 55 fd 66 01 10 91 02 55 00 65 00 09 57 26 ff 00 46 68 01
75 08 65 44 91 02 65 00 09 54 27 fe ff 00 00 47 fe ff 00 00 75 10 55
fd 66 01 10 91 02 55 00 65 00 09 58 a1 02 05 0a 09 01 09 02 26 2b 01
45 00 95 02 91 02 c0 05 0f 09 a7 27 fe ff 00 00 47 fe ff 00 00 95 01
55 fd 66 01 10 91 02 55 00 65 00 c0 09 5a a1 02 85 0c 09 23 26 2b 01
45 00 91 02 09 5c 26 10 27 46 10 27 55 fd 66 01 10 91 02 55 00 65 00
09 5b 25 7f 75 08 91 02 09 5e 26 10 27 75 10 55 fd 66 01 10 91 02 55
00 65 00 09 5d 25 7f 75 08 91 02 c0 09 73 a1 02 85 0d 09 23 26 2b 01
45 00 75 10 91 02 09 70 15 81 25 7f 36 f0 d8 46 10 27 75 08 91 02 c0
09 6e a1 02 85 0e 09 23 15 00 26 2b 01 35 00 45 00 75 10 91 02 09 70
25 7f 46 10 27 75 08 91 02 09 6f 15 81 36 f0 d8 91 02 09 71 15 00 26
ff 00 35 00 46 68 01 91 02 09 72 26 10 27 46 10 27 75 10 55 fd 66 01
10 91 02 55 00 65 00 c0 09 5f a1 02 85 0f 09 23 26 2b 01 45 00 91 02
09 61 15 9c 25 64 36 f0 d8 46 10 27 75 08 91 02 09 62 91 02 09 60 16
0c fe 26 f4 01 75 10 91 02 09 65 15 00 26 e8 03 35 00 91 02 09 63 25
64 75 08 91 02 09 64 91 02 c0 09 77 a1 02 85 51 09 22 25 09 45 00 91
02 09 78 a1 02 09 7b 09 79 09 7a 15 01 25 03 91 00 c0 09 7c 15 00 26
fe 00 91 02 c0 09 92 a1 02 85 52 09 96 a1 02 09 9a 09 99 09 97 09 98
09 9b 09 9c 15 01 25 06 91 00 c0 c0 05 ff 0a 01 03 a1 02 85 40 0a 02
03 a1 02 1a 11 03 2a 20 03 25 10 91 00 c0 0a 03 03 15 00 27 ff ff 00
00 75 10 91 02 c0 05 0f 09 7d a1 02 85 43 09 7e 26 80 00 46 10 27 75
08 91 02 c0 09 85 a1 02 85 44 09 86 27 ff ff 00 00 45 00 75 10 91 02
09 87 91 02 09 88 91 02 c0 05 ff 0a 00 01 a1 02 85 81 05 01 09 30 15
81 25 7f 36 f0 d8 46 10 27 75 08 91 02 09 31 91 02 c0 05 0f 09 7f a1
02 85 0b 09 80 15 00 26 ff 7f 35 00 45 00 75 0f b1 03 09 a9 25 01 75
01 b1 03 09 83 26 ff 00 75 08 b1 03 09 84 25 10 b1 03 09 a8 a1 02 09
73 09 6e 09 5a 09 5f 95 04 b1 03 c0 c0 c0
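The dump above is a sequence of HID "short items": each prefix byte packs the payload size (bits 0-1, where 3 means 4 bytes), the item type (bits 2-3: Main, Global, Local) and the tag (bits 4-7). So `05 01 09 04 a1 01` reads as Usage Page (Generic Desktop), Usage (Joystick), Collection (Application). A minimal decoder sketch for walking the descriptor by hand; it ignores long items (prefix 0xFE), and the names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

struct hid_item {
	uint8_t  type;  /* 0 = Main, 1 = Global, 2 = Local */
	uint8_t  tag;   /* e.g. Global tag 0 = Usage Page, 2 = Logical Maximum */
	uint8_t  size;  /* payload bytes: 0, 1, 2 or 4 */
	uint32_t data;  /* little-endian payload */
};

/* Decode the short item at buf[pos]. Returns bytes consumed, or 0 on
 * truncation or a long item, which this sketch does not handle. */
static size_t hid_parse_short_item(const uint8_t *buf, size_t len,
				   size_t pos, struct hid_item *out)
{
	static const uint8_t sizes[4] = { 0, 1, 2, 4 };
	uint8_t prefix;
	size_t i;

	if (pos >= len || buf[pos] == 0xFE)
		return 0;
	prefix = buf[pos];
	out->size = sizes[prefix & 0x3];
	out->type = (prefix >> 2) & 0x3;
	out->tag = prefix >> 4;
	if (pos + 1 + out->size > len)
		return 0;
	out->data = 0;
	for (i = 0; i < out->size; i++)
		out->data |= (uint32_t)buf[pos + 1 + i] << (8 * i);
	return 1 + out->size;
}

/* Count the items in a descriptor; returns -1 on a parse error. */
static int hid_count_items(const uint8_t *buf, size_t len)
{
	struct hid_item it;
	size_t pos = 0;
	int count = 0;

	while (pos < len) {
		size_t used = hid_parse_short_item(buf, len, pos, &it);
		if (used == 0)
			return -1;
		pos += used;
		count++;
	}
	return count;
}
```

For example, `26 00 10` near the start decodes as Global tag 2 (Logical Maximum) with value 0x1000.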

cheers,
--renato

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm


joy-dmesg.log.gz
Description: GNU Zip compressed data


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

Oh, which means ...


On 7/21/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> On 7/21/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > > On Fri, 20 Jul 2007 15:50:47 -0700
> > > Greg KH <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > > >  Hi Greg,
> > > > >
> > > > >  This looks like a sysfs bug
> > > > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > > > >  broken-out-2007-07-20-00-22/3.jpg
> > > > >
> > > > >  l *kernel_param_sysfs_setup+0x75
> > > > >  0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > > >  565 mk->mod = THIS_MODULE;
> > > > >  566 kobj_set_kset_s(mk, module_subsys);
> > > > >  567 kobject_set_name(&mk->kobj, name);

Shouldn't the return of kobject_set_name() be checked here?

[ Looking at code, and realizing that kobject_set_name() manages to
succeed even when given a null string! ]

> > > > >  568 kobject_init(&mk->kobj);
> > > > >  569 ret = kobject_add(&mk->kobj);
> > > > >  570 BUG_ON(ret < 0);
> > > > >  571 param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > > >  572 kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > > >  573 }
> > > > >  574
> > > > >
> > > > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > > > >  broken-out-2007-07-20-00-22/mm-config
> > > >
> > > > What kernel version is this happening on?  The -mm tree?  Can you try
> > > > Linus's tree instead?
> > > >
> > > > It looks like there was some needed information right before the first
> > > > stack dump, showing exactly what kobject was trying to be added that was
> > > > already present.  Odds are this is a kernel parameter with the same name
> > > > as a duplicate one within the same module,
>
> I don't think that's an -EEXIST.
>
> I think what we have here is kobject_add() exiting with -EINVAL.
> (kobject attempted to be registered with no name!)
>
> [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
> That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
> we would've seen an offset in kobject_shadow_add closer to 0x189,
> because the dump_stack() for EEXIST is barely 4 instructions before
> we return from that function. ]
>
> > > > but the trick is going to be
> > > > trying to figure out what module is causing this.
>
> So I'd guess we want to search for a module that's passing a kobject *
> to kobject_add() such that !kobj->k_name is true.

Oh, that's kernel_param_sysfs_setup itself. So we actually need to
search for a built-in module in Michal's config that ... has an ... empty
"" modname !? Shouldn't that turn up pretty quickly in a grep?

How do I do that, btw?


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Ingo Molnar

* Michal Piotrowski <[EMAIL PROTECTED]> wrote:

> >We are pleased to announce a project we've been working on for some 
> >time: the unified x86 architecture tree, or "arch/x86" - and we'd 
> >like to solicit feedback about it.
> >
> >What is this about?
> [..]
> >As usual, comments and suggestions are welcome!
> 
> I really like this idea - code duplication is a bad thing.
> 
> BTW. I don't see any regression here :)

cool - could you tell us a bit more about on what type of box you tried 
it, and how wide and versatile the .config is?

Ingo


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Michal Piotrowski

Hi,

On 21/07/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or "arch/x86" - and we'd like
> to solicit feedback about it.
>
> What is this about?
>
[..]
>
> As usual, comments and suggestions are welcome!

I really like this idea - code duplication is a bad thing.

BTW. I don't see any regression here :)

> Thomas, Ingo


Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Alan Cox
On Fri, 20 Jul 2007 18:38:39 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> I agree with Andi...  it's quite nice to be able to leave some arch/i386 
> stuff, and not carry it over to arch/x86-64.

It's easy enough to push that stuff into arch/x86/legacy and have one
subdirectory of stuff to pull in for ancient systems.

Alan


Re: [PATCH v3] pcmcia: CompactFlash driver for PA Semi Electra boards

2007-07-20 Thread Andrew Morton
On Thu, 5 Jul 2007 09:49:14 -0500
[EMAIL PROTECTED] (Olof Johansson) wrote:

> Driver for the CompactFlash slot on the PA Semi Electra eval board. It's
> a simple device sitting on localbus, with interrupts and detect/voltage
> control over GPIO.
> 
> The driver is implemented as an of_platform driver, and adds localbus
> as a bus being probed by the of_platform framework.
> 
> 
> Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>
> 
> ---
> 
> On Mon, Jun 25, 2007 at 03:43:41PM -0500, olof wrote:
> 
> > The ifdef is needed since for CONFIG_PCMCIA=n builds, the bus notifier
> > isn't available. I wanted to do the bus notifier registration explicitly
> > before the of_platform bus probe to avoid later surprises due to reordered
> > initcalls in case it was split up in its own initcall.
> > 
> > I could add the code under ifdef as well, but it didn't seem too
> > critical. Once the second major board comes along I'll probably move it
> > out to a per-board file, there's no real need for it just yet.
> 
> Alright, turns out I still need to declare the extern bus type, which would
> mean two #ifdefs in one function. Moving it out instead.
> 
> I've addressed Milton's comments as well.
> 
> Who's maintaining PCMCIA? MAINTAINERS only lists a mailing list, no person.
> Seems weird for a component that's marked as maintained.

Dominik Brodowski.  He's having a bit of downtime at present (exams, I
think).  He expects to return.  Meanwhile, cc'ing me usually has some
effect.

>
> ...
>
> +static const char driver_name[] = "electra-cf";
>
> ...
>
> +static struct of_device_id electra_cf_match[] =
> +{
> + {
> + .compatible   = "electra-cf",
> + },
> + {},
> +};

Could have reused driver_name[] here, if that was appropriate.

> +static struct of_platform_driver electra_cf_driver =
> +{
> + .name  = (char *)driver_name,

ug.  But it's not your fault - we should have always made it const.

> --- mainline.orig/arch/powerpc/platforms/pasemi/setup.c
> +++ mainline/arch/powerpc/platforms/pasemi/setup.c

I never know who maintains random-scruffy-ppc code like this.  From a peek
in the git-whatchanged output, it appears to be yourself.


Have a few little fixies:

--- a/drivers/pcmcia/electra_cf.c~pcmcia-compactflash-driver-for-pa-semi-electra-boards-fix
+++ a/drivers/pcmcia/electra_cf.c
@@ -201,9 +201,7 @@ static int __devinit electra_cf_probe(st
if (!cf)
return -ENOMEM;
 
-   init_timer(&cf->timer);
-   cf->timer.function = electra_cf_timer;
-   cf->timer.data = (unsigned long) cf;
+   setup_timer(&cf->timer, electra_cf_timer, (unsigned long)cf);
cf->irq = NO_IRQ;
 
cf->ofdev = ofdev;
@@ -340,16 +338,14 @@ static int __devexit electra_cf_remove(s
return 0;
 }
 
-static struct of_device_id electra_cf_match[] =
-{
+static struct of_device_id electra_cf_match[] = {
{
.compatible   = "electra-cf",
},
{},
 };
 
-static struct of_platform_driver electra_cf_driver =
-{
+static struct of_platform_driver electra_cf_driver = {
.name  = (char *)driver_name,
.match_table= electra_cf_match,
.probe= electra_cf_probe,
@@ -371,4 +367,3 @@ module_exit(electra_cf_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR ("Olof Johansson <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("PA Semi Electra CF driver");
-
_
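The first hunk replaces an open-coded init_timer() plus two field assignments with the setup_timer() helper. A userspace model of that equivalence; the struct is a cut-down stand-in, not the kernel's struct timer_list, and the helper bodies only approximate what the kernel versions do:

```c
#include <stddef.h>

struct timer_list {
	void *entry_next;                 /* stand-in for the list linkage */
	void (*function)(unsigned long);  /* callback run on expiry */
	unsigned long data;               /* argument passed to function */
};

/* Prepares internal bookkeeping only; function/data are left alone. */
static void init_timer(struct timer_list *t)
{
	t->entry_next = NULL;
}

/* Same end state as init_timer() plus the two removed assignments. */
static void setup_timer(struct timer_list *t,
			void (*function)(unsigned long), unsigned long data)
{
	t->function = function;
	t->data = data;
	init_timer(t);
}
```

One call instead of three also makes it harder to forget the data field.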



Re: [PATCH] x86: Create clflush() inline, remove hardcoded wbinvd

2007-07-20 Thread H. Peter Anvin
Glauber de Oliveira Costa wrote:
> On Fri, 2007-07-20 at 14:19 -0700, H. Peter Anvin wrote:
>> Create an inline function for clflush(), with the proper arguments,
>> and use it instead of hard-coding the instruction.
>>
>> This also removes one instance of hard-coded wbinvd, based on a patch
>> by Bauder de Oliveira Costa.
> Hey, Who's that guy that got a name so close to mine? ;-)

That would be Mr. Typo!

>> Cc: Andi Kleen <[EMAIL PROTECTED]>
>> Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

I got it right here at least :-/

-hpa
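For reference, a clflush() wrapper of the kind the patch description mentions can be written as GCC inline assembly. This is a sketch assuming a GCC-compatible compiler on x86; it is not copied from the posted patch:

```c
#if defined(__x86_64__) || defined(__i386__)
/* Flush the cache line containing p. The "+m" constraint tells the
 * compiler the memory is read and written, so the flush is not
 * reordered against other accesses to that location. */
static inline void clflush(volatile void *p)
{
	__asm__ __volatile__("clflush %0" : "+m" (*(volatile char *)p));
}
#endif
```

Taking a pointer and expressing the operand as memory lets the compiler pick the addressing mode instead of hard-coding a register.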


Re: [PATCH] compat_ioctl requires CONFIG_BLOCK

2007-07-20 Thread Arnd Bergmann
On Saturday 21 July 2007, Sebastian Siewior wrote:
> 
> Got with randconfig
> include/linux/loop.h:66: error: expected specifier-qualifier-list before
> 'request_queue_t'
> make[1]: *** [fs/compat_ioctl.o] Error 1
> 
> parts of compat ioctl require CONFIG_BLOCK to be set.
> 
> Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
> Index: b/fs/compat_ioctl.c
> ===
> --- a/fs/compat_ioctl.c
> +++ b/fs/compat_ioctl.c
> @@ -63,7 +63,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_BLOCK
>  #include <linux/loop.h>
> +#endif

Adding #ifdef around an #include is considered bad style. Better just
make loop.h compile without any conditionals. Does the below
patch work for you?

Arnd <><

--- a/include/linux/loop.h
+++ b/include/linux/loop.h
@@ -63,7 +63,7 @@ struct loop_device {
struct task_struct  *lo_thread;
wait_queue_head_t   lo_event;
 
-   request_queue_t *lo_queue;
+   struct request_queue *lo_queue;
struct gendisk  *lo_disk;
struct list_head    lo_list;
 };
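Why the change works: a pointer to a struct can be declared when only the struct tag is known, whereas a typedef such as request_queue_t must be visible, and per the quoted error it is not in this configuration. A standalone sketch; the struct name is a hypothetical cut-down of loop_device:

```c
#include <stddef.h>

/* Only a declaration: no definition of struct request_queue, and no
 * request_queue_t typedef, is visible in this translation unit. */
struct request_queue;

/* A pointer to an incomplete struct type is legal, so this compiles
 * without any block-layer headers. */
struct loop_device_sketch {
	struct request_queue *lo_queue;
	int lo_number;
};

static int queue_attached(const struct loop_device_sketch *lo)
{
	return lo->lo_queue != NULL;
}
```

Replacing `struct request_queue *` with `request_queue_t *` here would reproduce the "expected specifier-qualifier-list" error.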


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

On 7/21/07, Greg KH <[EMAIL PROTECTED]> wrote:

On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> On Fri, 20 Jul 2007 15:50:47 -0700
> Greg KH <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > >  Hi Greg,
> > >
> > >  This looks like a sysfs bug
> > >  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
> > >  broken-out-2007-07-20-00-22/3.jpg
> > >
> > >  l *kernel_param_sysfs_setup+0x75
> > >  0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > >  565 mk->mod = THIS_MODULE;
> > >  566 kobj_set_kset_s(mk, module_subsys);
> > >  567 kobject_set_name(>kobj, name);
> > >  568 kobject_init(>kobj);
> > >  569 ret = kobject_add(>kobj);
> > >  570 BUG_ON(ret < 0);
> > >  571 param_sysfs_setup(mk, kparam, num_params, name_skip);
> > >  572 kobject_uevent(>kobj, KOBJ_ADD);
> > >  573 }
> > >  574
> > >
> > >  
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
> >
> > What kernel version is this happening on?  The -mm tree?  Can you try
> > Linus's tree instead?
> >
> > It looks like there was some needed information right before the first
> > stack dump, showing exactly what kobject was trying to be added that was
> > already present.  Odds are this is a kernel parameter with the same name
> > as a duplicate one within the same module,


I don't think that's an -EEXIST.

I think what we have here is kobject_add() exiting with -EINVAL.
(kobject attempted to be registered with no name!)

[ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
we would've seen an offset in kobject_shadow_add closer to 0x189,
because the dump_stack() for EEXIST is barely 4 instructions before
we return from that function. ]


> > but the trick is going to be
> > trying to figure out what module is causing this.


So I'd guess we want to search for a module that's passing a kobject *
to kobject_add() such that !kobj->k_name is true.
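
A simplified user-space model of the two failure modes distinguished above may help: a kobject with no name is rejected with -EINVAL, while a name that is already registered is rejected with -EEXIST. The types and function below (toy_kobject, toy_dir, toy_kobject_add) are hypothetical and only mirror the behavior described in this thread; they are not the lib/kobject.c code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct kobject: only the name matters here. */
struct toy_kobject {
	const char *k_name;
};

/* Hypothetical flat "directory" of already-registered names. */
#define TOY_MAX_CHILDREN 16
struct toy_dir {
	const char *names[TOY_MAX_CHILDREN];
	size_t count;
};

/*
 * Model of the two error paths discussed above: no name gives -EINVAL,
 * a duplicate name gives -EEXIST. The real code also dumps a stack trace;
 * this sketch just prints a line to stderr.
 */
static int toy_kobject_add(struct toy_dir *dir, const struct toy_kobject *kobj)
{
	size_t i;

	if (!kobj->k_name) {
		fprintf(stderr, "toy_kobject_add: kobject has no name\n");
		return -EINVAL;
	}
	for (i = 0; i < dir->count; i++) {
		if (strcmp(dir->names[i], kobj->k_name) == 0) {
			fprintf(stderr, "toy_kobject_add: '%s' already exists\n",
				kobj->k_name);
			return -EEXIST;
		}
	}
	if (dir->count == TOY_MAX_CHILDREN)
		return -ENOSPC;
	dir->names[dir->count++] = kobj->k_name;
	return 0;
}
```

Checking the return value against both codes is what lets a caller tell the two cases apart without relying on stack-dump offsets.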


> > So it's not a sysfs bug, but rather a driver issue that this is
> > catching.
>
> In that case a BUG was way too harsh treatment, and in fact directly
> contributed to our inability to debug the bug!
>
> Can we wind that back a bit?  Add some useful printks and then recover
> in some fashion?
[...]
> So I'm guessing he was trying to catch something specific here.


Considering that:

(1) this isn't a bug that should bring down the kernel that hard, and
(2) kobject_shadow_add() already dumps stacks and prints enough
diagnostics on errors,

I'd suggest just getting rid of the BUG_ON() in kernel_param_sysfs_setup().


Satyam


where is the code for read system call?

2007-07-20 Thread Agarwal, Lomesh
My application reads from socket. I need to change the behavior of read
system call for an experiment. Can someone point me to code?

thanks
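
On the kernel side, read(2) enters through sys_read() in fs/read_write.c, which hands off to vfs_read() and from there to the file's operations; for a socket, those operations are set up in net/socket.c. The user-space call in question looks like the minimal sketch below. A socketpair() stands in for the application's real socket, and the helper name socketpair_roundtrip is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writes msg into one end of a UNIX-domain socket pair and reads it back
 * from the other end with plain read(2). Returns the number of bytes read,
 * or -1 on error. read() does not NUL-terminate buf.
 */
static ssize_t socketpair_roundtrip(const char *msg, char *buf, size_t buflen)
{
	int sv[2];
	ssize_t n;
	size_t len = strlen(msg);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], msg, len) != (ssize_t)len) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	/* This read() on a socket descriptor is the call whose kernel-side
	 * path the question is about. */
	n = read(sv[1], buf, buflen);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```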


Re: [PATCH][SELinux] Let us not leak memory in SELinux : security_netlbl_cache_add()

2007-07-20 Thread James Morris
On Sat, 21 Jul 2007, Jesper Juhl wrote:

> Hi,
> 
> Leaking memory is a bad idea, so let's not do it, in 
> security/selinux/ss/services.c::security_netlbl_cache_add().
> 
> Note: The Coverity checker gets credit for spotting this one.
> Note: Patch has only been compile tested.

Thanks!

Verified and applied to:

git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-linus
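
The patch itself is not quoted in this message. As a generic illustration of the class of bug Coverity tends to flag, an allocation leaked on an error path, here is a small user-space sketch using the common single-exit cleanup idiom. All names (toy_cache_entry, toy_copy_label, toy_cache_entry_new, toy_cache_entry_free) are hypothetical; this is not the SELinux code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; stands in for whatever the real code allocates. */
struct toy_cache_entry {
	char *label;
};

/* Hypothetical step that can fail after the entry has been allocated. */
static int toy_copy_label(struct toy_cache_entry *e, const char *label)
{
	size_t len;

	if (!label)
		return -1;
	len = strlen(label) + 1;
	e->label = malloc(len);
	if (!e->label)
		return -1;
	memcpy(e->label, label, len);
	return 0;
}

/*
 * On any failure after the allocation, jump to one cleanup label that
 * frees everything allocated so far, instead of returning early and
 * leaking the entry.
 */
static struct toy_cache_entry *toy_cache_entry_new(const char *label)
{
	struct toy_cache_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	if (toy_copy_label(e, label) != 0)
		goto failure;
	return e;

failure:
	free(e->label);	/* free(NULL) is a no-op */
	free(e);
	return NULL;
}

static void toy_cache_entry_free(struct toy_cache_entry *e)
{
	if (e) {
		free(e->label);
		free(e);
	}
}
```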

- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH 2/7] console: fix section mismatch warning in vgacon.c

2007-07-20 Thread Antonino A. Daplas
On Fri, 2007-07-20 at 23:27 +0200, Sam Ravnborg wrote:
> Fix following section mismatch warning:
> WARNING: vmlinux.o(.text+0x121e62): Section mismatch: reference to 
> .init.text:__alloc_bootmem (between 'vgacon_startup' and 'vgacon_scrolldelta')
> 
> Browsing the code it seems that vgacon_scrollback_startup() is only
> called during the init phase so the reference to the .init.text
> section is OK.
> Teach modpost not to warn using __init_refok.
> 
> Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
Acked-by: Antonino Daplas <[EMAIL PROTECTED]>

Tony



Re: [kvm-devel] [GIT PULL][RESEND] Late KVM Updates for the 2.6.23 merge window

2007-07-20 Thread Linus Torvalds


On Sat, 21 Jul 2007, S.Çağlar Onur wrote:
> 
> With Linus's latest git, shutting down a guest (fired with -smp 2 -m 512)
> sometimes ends up like [1]; this occurred as soon as the qemu window closed.
> 
> [1] http://cekirdek.pardus.org.tr/~caglar/kvm/dmesg.latest

[  737.460654] Bad page state in process 'qemu-kvm'
[  737.460656] page:f5e68000 flags:0xea02 mapping: mapcount:2 count:0
[  737.460657] Trying to fix it up, but a reboot is needed
[  737.460659] Backtrace:
[  737.460691]  [] bad_page+0x64/0x8e
[  737.460733]  [] free_hot_cold_page+0x68/0x15a

That's the "free_pages_check()", and in particular it seems to be 
"page_mapcount()" being non-zero that triggered that thing.

So it looks like something in KVM isn't coherent about the mapping vs the 
usage counters..
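
A much-simplified user-space model of the kind of sanity check being pointed at, assuming only what the log and the note above say: a page being freed must no longer be mapped anywhere. The struct, its fields, and the flag mask are hypothetical; the real free_pages_check() in mm/page_alloc.c checks more state.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, much-reduced stand-in for struct page. In the kernel,
 * page_mapcount() is derived from an atomic field; here it is a plain int.
 */
struct toy_page {
	unsigned long flags;
	void *mapping;
	int mapcount;	/* number of page-table mappings */
	int count;	/* reference count, expected to be 0 at free time */
};

/* Hypothetical mask of flag bits that must be clear on a page being freed. */
#define TOY_FLAGS_CHECK_AT_FREE 0xff00UL

/*
 * Returns true if the page looks safe to hand back to the allocator.
 * A page that is still mapped (mapcount != 0), still attached to a
 * mapping, still referenced, or carrying a "busy" flag is reported as
 * bad; the log above shows mapcount:2 with count:0.
 */
static bool toy_free_pages_check(const struct toy_page *page)
{
	return page->mapcount == 0 &&
	       page->mapping == NULL &&
	       page->count == 0 &&
	       (page->flags & TOY_FLAGS_CHECK_AT_FREE) == 0;
}
```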

Linus


Re: [PATCH] x86: Create clflush() inline, remove hardcoded wbinvd

2007-07-20 Thread Glauber de Oliveira Costa
On Fri, 2007-07-20 at 14:19 -0700, H. Peter Anvin wrote:
> Create an inline function for clflush(), with the proper arguments,
> and use it instead of hard-coding the instruction.
> 
> This also removes one instance of hard-coded wbinvd, based on a patch
> by Bauder de Oliveira Costa.
Hey, who's that guy that got a name so close to mine? ;-)

> 
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
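
A user-space sketch of a clflush() inline written with GCC inline assembly, for illustration only and not the code from the patch. It assumes an x86 CPU that supports CLFLUSH, which can be executed at any privilege level; WBINVD, by contrast, is a privileged instruction and cannot be demonstrated from user space.

```c
/*
 * Flush the cache line containing p. On x86 this emits CLFLUSH; on other
 * architectures this sketch does nothing. The "+m" constraint tells the
 * compiler the instruction both reads and writes the byte at p, so the
 * compiler cannot keep that value cached in a register across the flush.
 */
static inline void clflush(volatile void *p)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("clflush %0" : "+m" (*(volatile char *)p));
#else
	(void)p;
#endif
}
```

Wrapping the instruction this way keeps the operand constraints in one place instead of repeating a hand-encoded instruction at each call site.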




[PATCH] i2o: defined but not used.

2007-07-20 Thread Sebastian Siewior
Seen with randconfig:

drivers/message/i2o/exec-osm.c:539: warning: 'i2o_exec_lct_notify' defined but not used

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
Index: b/drivers/message/i2o/exec-osm.c
===
--- a/drivers/message/i2o/exec-osm.c
+++ b/drivers/message/i2o/exec-osm.c
@@ -389,9 +389,7 @@ static void i2o_exec_lct_modified(struct
if (i2o_device_parse_lct(c) != -EAGAIN)
change_ind = c->lct->change_ind + 1;
 
-#ifdef CONFIG_I2O_LCT_NOTIFY_ON_CHANGES
i2o_exec_lct_notify(c, change_ind);
-#endif
 };
 
 /**
@@ -525,6 +523,7 @@ int i2o_exec_lct_get(struct i2o_controll
return rc;
 }
 
+#ifdef CONFIG_I2O_LCT_NOTIFY_ON_CHANGES
 /**
  * i2o_exec_lct_notify - Send a asynchronus LCT NOTIFY request
  * @c: I2O controller to which the request should be send
@@ -570,6 +569,13 @@ static int i2o_exec_lct_notify(struct i2
return 0;
 };
 
+#else
+static int i2o_exec_lct_notify(struct i2o_controller *c, u32 change_ind)
+{
+   return 0;
+}
+#endif
+
 /* Exec OSM driver struct */
 struct i2o_driver i2o_exec_driver = {
.name = OSM_NAME,

-- 



Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Randy Dunlap
On Fri, 20 Jul 2007 16:10:52 -0700 Greg KH wrote:

> On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > On Fri, 20 Jul 2007 15:50:47 -0700
> > Greg KH <[EMAIL PROTECTED]> wrote:
> > 
> > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > >  Hi Greg,
> > > > 
> > > >  This looks like a sysfs bug
> > > >  
> > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg
> > > > 
> > > >  l *kernel_param_sysfs_setup+0x75
> > > >  0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > >  565 mk->mod = THIS_MODULE;
> > > >  566 kobj_set_kset_s(mk, module_subsys);
> > > >  567 kobject_set_name(&mk->kobj, name);
> > > >  568 kobject_init(&mk->kobj);
> > > >  569 ret = kobject_add(&mk->kobj);
> > > >  570 BUG_ON(ret < 0);
> > > >  571 param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > >  572 kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > >  573 }
> > > >  574
> > > > 
> > > >  
> > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
> > > 
> > > What kernel version is this happening on?  The -mm tree?  Can you try
> > > Linus's tree instead?
> > > 
> > > It looks like there was some needed information right before the first
> > > stack dump, showing exactly what kobject was trying to be added that was
> > > already present.  Odds are this is a kernel parameter with the same name
> > > as a duplicate one within the same module, but the trick is going to be
> > > trying to figure out what module is causing this.
> > > 
> > > So it's not a sysfs bug, but rather a driver issue that this is
> > > catching.
> > 
> > In that case a BUG was way too harsh treatment, and in fact directly
> > contributed to our inability to debug the bug!
> > 
> > Can we wind that back a bit?  Add some useful printks and then recover
> > in some fashion?
> 
> Sure, I don't mind doing that at all.
> 
> Hm, it looks like Randy added this back in September last year with:
>   commit d8c7649e99e4b081b624aefe1e77caa30b53cb18
>   Author: Randy Dunlap <[EMAIL PROTECTED]>
>   Date:   Fri Sep 29 01:58:55 2006 -0700
> 
>   [PATCH] kernel/params: driver layer error checking
> 
>   Check driver layer return values in kernel/params.c
> 
>   Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
>   Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
>   Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> 
> (wow, I love git and the signed-off-tree for things like this, it's
> trivial to find this information out.)
> 
> So I'm guessing he was trying to catch something specific here.
> 
> Randy, any objection to changing that BUG_ON to a printk warning instead
> telling the user exactly what needs to be fixed and that the system is
> now going to be unstable when any module is unloaded?

Of course not (no objection).

I added a BUG_ON() ?  Shame on me.
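
A user-space model of the change being agreed on here: report the failure with enough context to identify the offending module, then return the error instead of calling BUG_ON(). The names (toy_register, toy_param_sysfs_setup) and the message text are hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for kobject_add(): fails when the name is missing. */
static int toy_register(const char *name)
{
	return name ? 0 : -EINVAL;
}

/*
 * Instead of BUG_ON(ret < 0): say which module failed and what that means,
 * skip the rest of the setup for that module, and hand the error back so
 * the caller can keep going.
 */
static int toy_param_sysfs_setup(const char *modname, const char *kobj_name)
{
	int ret = toy_register(kobj_name);

	if (ret < 0) {
		fprintf(stderr,
			"module '%s': failed to register parameters in sysfs "
			"(error %d); fix the module's kobject setup.\n",
			modname, ret);
		fprintf(stderr,
			"System may become unstable when modules are unloaded.\n");
		return ret;
	}
	/* ...the remaining setup (parameter files, uevent) would go here... */
	return 0;
}
```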

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

