Re: [PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS

2021-02-19 Thread Lennert Buytenhek
On Fri, Feb 19, 2021 at 08:07:04PM +0200, Lennert Buytenhek wrote:

> > > IORING_OP_GETDENTS may or may not update the specified directory's
> > > file offset, and the file offset should not be relied upon having
> > > any particular value during or after an IORING_OP_GETDENTS call.
> > 
> > This doesn't give me the warm fuzzies.  What I might suggest
> > is either passing a parameter to iterate_dir() or breaking out an
> > iterate_dir_nofpos() to make IORING_OP_GETDENTS more of a READV operation.
> > ie the equivalent of this:
> > 
> > @@ -37,7 +37,7 @@
> >  } while (0)
> >  
> >  
> > -int iterate_dir(struct file *file, struct dir_context *ctx)
> > +int iterate_dir(struct file *file, struct dir_context *ctx, bool use_fpos)
> >  {
> > struct inode *inode = file_inode(file);
> > bool shared = false;
> > @@ -60,12 +60,14 @@ int iterate_dir(struct file *file, struct dir_context 
> > *ctx)
> >  
> > res = -ENOENT;
> > if (!IS_DEADDIR(inode)) {
> > -   ctx->pos = file->f_pos;
> > +   if (use_fpos)
> > +   ctx->pos = file->f_pos;
> > if (shared)
> > res = file->f_op->iterate_shared(file, ctx);
> > else
> > res = file->f_op->iterate(file, ctx);
> > -   file->f_pos = ctx->pos;
> > +   if (use_fpos)
> > +   file->f_pos = ctx->pos;
> > fsnotify_access(file);
> > file_accessed(file);
> > }
> > 
> > That way there's no need to play with llseek or take a mutex on the
> > f_pos of the directory.
> 
> I'll try this!

The patch below (on top of v3) does what you suggest, and it removes
the vfs_llseek() call, but there's two issues:

- We still need to take some sort of mutex on the directory, because,
  while ->iterate_shared() can be called concurrently on different
  struct files that point to the same underlying dir inode, it cannot
  be called concurrently on the same struct file.  From
  Documentation/filesystems/porting.rst:

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file) but if your directory
has been opened several times, you can get these called in parallel.
[...]

- Calling a filesystem's ->iterate_shared() on the same dir with changing
  file positions but without calling the directory's ->llseek() in between
  to notify the filesystem of changes in the file position seems to violate
  an (unstated?) guarantee.  It works on my btrfs root fs, since that uses
  generic_file_llseek() for directory ->llseek(), but e.g. ceph does:

| static loff_t ceph_dir_llseek(struct file *file, loff_t offset, int whence)
| {
| struct ceph_dir_file_info *dfi = file->private_data;
| struct inode *inode = file->f_mapping->host;
| loff_t retval;
|
| inode_lock(inode);
| retval = -EINVAL;
| switch (whence) {
| case SEEK_CUR:
| offset += file->f_pos;
| case SEEK_SET:
| break;
| case SEEK_END:
| retval = -EOPNOTSUPP;
| default:
| goto out;
| }
|
| if (offset >= 0) {
| if (need_reset_readdir(dfi, offset)) {
| dout("dir_llseek dropping %p content\n", file);
| reset_readdir(dfi);
| } else if (is_hash_order(offset) && offset > file->f_pos) {
| /* for hash offset, we don't know if a forward seek
|  * is within same frag */
| dfi->dir_release_count = 0;
| dfi->readdir_cache_idx = -1;
| }
|
| if (offset != file->f_pos) {
| file->f_pos = offset;
| file->f_version = 0;
| dfi->file_info.flags &= ~CEPH_F_ATEND;
| }
| retval = offset;
| }
| out:
| inode_unlock(inode);
| return retval;
| }

So I think we probably can't get rid of the conditional vfs_llseek()
call for now (and we'll probably have to keep taking the dir's
->f_pos_lock) -- what do you think?

(The caveat about that the file pointer may or may not be updated by
IORING_OP_GETDENTS would allow making this optimization in the future,
and for now it would mean that you can't mix getdents64() and
IORING_OP_GETDENTS calls on t

Re: [PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS

2021-02-19 Thread Lennert Buytenhek
On Fri, Feb 19, 2021 at 12:34:03PM +, Matthew Wilcox wrote:

> > IORING_OP_GETDENTS may or may not update the specified directory's
> > file offset, and the file offset should not be relied upon having
> > any particular value during or after an IORING_OP_GETDENTS call.
> 
> This doesn't give me the warm fuzzies.  What I might suggest
> is either passing a parameter to iterate_dir() or breaking out an
> iterate_dir_nofpos() to make IORING_OP_GETDENTS more of a READV operation.
> ie the equivalent of this:
> 
> @@ -37,7 +37,7 @@
>  } while (0)
>  
>  
> -int iterate_dir(struct file *file, struct dir_context *ctx)
> +int iterate_dir(struct file *file, struct dir_context *ctx, bool use_fpos)
>  {
> struct inode *inode = file_inode(file);
> bool shared = false;
> @@ -60,12 +60,14 @@ int iterate_dir(struct file *file, struct dir_context 
> *ctx)
>  
> res = -ENOENT;
> if (!IS_DEADDIR(inode)) {
> -   ctx->pos = file->f_pos;
> +   if (use_fpos)
> +   ctx->pos = file->f_pos;
> if (shared)
> res = file->f_op->iterate_shared(file, ctx);
> else
> res = file->f_op->iterate(file, ctx);
> -   file->f_pos = ctx->pos;
> +   if (use_fpos)
> +   file->f_pos = ctx->pos;
> fsnotify_access(file);
> file_accessed(file);
> }
> 
> That way there's no need to play with llseek or take a mutex on the
> f_pos of the directory.

I'll try this!


Re: [PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS

2021-02-19 Thread Lennert Buytenhek
On Fri, Feb 19, 2021 at 12:05:58PM +, Pavel Begunkov wrote:

> > IORING_OP_GETDENTS behaves much like getdents64(2) and takes the same
> > arguments, but with a small twist: it takes an additional offset
> > argument, and reading from the specified directory starts at the given
> > offset.
> > 
> > For the first IORING_OP_GETDENTS call on a directory, the offset
> > parameter can be set to zero, and for subsequent calls, it can be
> > set to the ->d_off field of the last struct linux_dirent64 returned
> > by the previous IORING_OP_GETDENTS call.
> > 
> > Internally, if necessary, IORING_OP_GETDENTS will vfs_llseek() to
> > the right directory position before calling vfs_getdents().
> > 
> > IORING_OP_GETDENTS may or may not update the specified directory's
> > file offset, and the file offset should not be relied upon having
> > any particular value during or after an IORING_OP_GETDENTS call.
> > 
> > Signed-off-by: Lennert Buytenhek 
> > ---
> >  fs/io_uring.c | 73 +++
> >  include/uapi/linux/io_uring.h |  1 +
> >  2 files changed, 74 insertions(+)
> > 
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 056bd4c90ade..6853bf48369a 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -635,6 +635,13 @@ struct io_mkdir {
> > struct filename *filename;
> >  };
> >  
> [...]
> > +static int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
> > +{
> > +   struct io_getdents *getdents = >getdents;
> > +   bool pos_unlock = false;
> > +   int ret = 0;
> > +
> > +   /* getdents always requires a blocking context */
> > +   if (issue_flags & IO_URING_F_NONBLOCK)
> > +   return -EAGAIN;
> > +
> > +   /* for vfs_llseek and to serialize ->iterate_shared() on this file */
> > +   if (file_count(req->file) > 1) {
> 
> Looks racy, is it safe? E.g. can be concurrently dupped and used, or
> just several similar IORING_OP_GETDENTS requests.

I thought that it was safe, but I thought about it a bit more, and it
seems that it is unsafe -- if you IORING_REGISTER_FILES to register the
dirfd and then close the dirfd, you'll get a file_count of 1, while you
can submit concurrent operations.  So I'll remove the conditional
locking.  Thanks!

(If not for IORING_REGISTER_FILES, it seems safe, because then
io_file_get() will hold a(t least one) reference on the file while the
operation is in flight, so then if file_count(req->file) == 1 here,
then it means that the file is no longer referenced by any fdtable,
and nobody else should be able to get a reference to it -- but that's
a bit of a useless optimization.)

(Logic was taken from __fdget_pos, where it is safe for a different
reason, i.e. __fget_light will not bump the refcount iff current->files
is unshared.)


> > +   pos_unlock = true;
> > +   mutex_lock(>file->f_pos_lock);
> > +   }
> > +
> > +   if (req->file->f_pos != getdents->pos) {
> > +   loff_t res = vfs_llseek(req->file, getdents->pos, SEEK_SET);
> 
> I may be missing the previous discussions, but can this ever become
> stateless, like passing an offset? Including readdir.c and beyond. 

My aim was to only make the minimally required change initially, but
to make that optimization possible in the future (e.g. by reserving the
right to either update or not update the file position) -- but I'll
try doing the optimization now.


> > +   if (res < 0)
> > +   ret = res;
> > +   }
> > +
> > +   if (ret == 0) {
> > +   ret = vfs_getdents(req->file, getdents->dirent,
> > +  getdents->count);
> > +   }
> > +
> > +   if (pos_unlock)
> > +   mutex_unlock(>file->f_pos_lock);
> > +
> > +   if (ret < 0) {
> > +   if (ret == -ERESTARTSYS)
> > +   ret = -EINTR;
> > +   req_set_fail_links(req);
> > +   }
> > +   io_req_complete(req, ret);
> > +   return 0;
> > +}
> [...]
> 
> -- 
> Pavel Begunkov


[PATCH v3 1/2] readdir: split the core of getdents64(2) out into vfs_getdents()

2021-02-18 Thread Lennert Buytenhek
So that IORING_OP_GETDENTS may use it, split out the core of the
getdents64(2) syscall into a helper function, vfs_getdents().

vfs_getdents() calls into filesystems' ->iterate{,_shared}() which
expect serialization on struct file, which means that callers of
vfs_getdents() are responsible for either using fdget_pos() or
performing the equivalent serialization by hand.

Cc: Al Viro 
Signed-off-by: Lennert Buytenhek 
---
 fs/readdir.c   | 25 +
 include/linux/fs.h |  4 
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 19434b3c982c..f52167c1eb61 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -348,10 +348,9 @@ static int filldir64(struct dir_context *ctx, const char 
*name, int namlen,
return -EFAULT;
 }
 
-SYSCALL_DEFINE3(getdents64, unsigned int, fd,
-   struct linux_dirent64 __user *, dirent, unsigned int, count)
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+unsigned int count)
 {
-   struct fd f;
struct getdents_callback64 buf = {
.ctx.actor = filldir64,
.count = count,
@@ -359,11 +358,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
};
int error;
 
-   f = fdget_pos(fd);
-   if (!f.file)
-   return -EBADF;
-
-   error = iterate_dir(f.file, );
+   error = iterate_dir(file, );
if (error >= 0)
error = buf.error;
if (buf.prev_reclen) {
@@ -376,6 +371,20 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
else
error = count - buf.count;
}
+   return error;
+}
+
+SYSCALL_DEFINE3(getdents64, unsigned int, fd,
+   struct linux_dirent64 __user *, dirent, unsigned int, count)
+{
+   struct fd f;
+   int error;
+
+   f = fdget_pos(fd);
+   if (!f.file)
+   return -EBADF;
+
+   error = vfs_getdents(f.file, dirent, count);
fdput_pos(f);
return error;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd47deea7c17..114885d3f6c4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3109,6 +3109,10 @@ extern const struct inode_operations 
simple_symlink_inode_operations;
 
 extern int iterate_dir(struct file *, struct dir_context *);
 
+struct linux_dirent64;
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+unsigned int count);
+
 int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
int flags);
 int vfs_fstat(int fd, struct kstat *stat);
-- 
2.29.2


[PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS

2021-02-18 Thread Lennert Buytenhek
IORING_OP_GETDENTS behaves much like getdents64(2) and takes the same
arguments, but with a small twist: it takes an additional offset
argument, and reading from the specified directory starts at the given
offset.

For the first IORING_OP_GETDENTS call on a directory, the offset
parameter can be set to zero, and for subsequent calls, it can be
set to the ->d_off field of the last struct linux_dirent64 returned
by the previous IORING_OP_GETDENTS call.

Internally, if necessary, IORING_OP_GETDENTS will vfs_llseek() to
the right directory position before calling vfs_getdents().

IORING_OP_GETDENTS may or may not update the specified directory's
file offset, and the file offset should not be relied upon having
any particular value during or after an IORING_OP_GETDENTS call.

Signed-off-by: Lennert Buytenhek 
---
 fs/io_uring.c | 73 +++
 include/uapi/linux/io_uring.h |  1 +
 2 files changed, 74 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 056bd4c90ade..6853bf48369a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -635,6 +635,13 @@ struct io_mkdir {
struct filename *filename;
 };
 
+struct io_getdents {
+   struct file *file;
+   struct linux_dirent64 __user*dirent;
+   unsigned intcount;
+   loff_t  pos;
+};
+
 struct io_completion {
struct file *file;
struct list_headlist;
@@ -772,6 +779,7 @@ struct io_kiocb {
struct io_renamerename;
struct io_unlinkunlink;
struct io_mkdir mkdir;
+   struct io_getdents  getdents;
/* use only after cleaning per-op data, see io_clean_op() */
struct io_completioncompl;
};
@@ -1030,6 +1038,11 @@ static const struct io_op_def io_op_defs[] = {
.work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_FILES |
IO_WQ_WORK_FS | 
IO_WQ_WORK_BLKCG,
},
+   [IORING_OP_GETDENTS] = {
+   .needs_file = 1,
+   .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_FILES |
+   IO_WQ_WORK_FS | 
IO_WQ_WORK_BLKCG,
+   },
 };
 
 static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
@@ -4677,6 +4690,61 @@ static int io_sync_file_range(struct io_kiocb *req, 
unsigned int issue_flags)
return 0;
 }
 
+static int io_getdents_prep(struct io_kiocb *req,
+   const struct io_uring_sqe *sqe)
+{
+   struct io_getdents *getdents = >getdents;
+
+   if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+   return -EINVAL;
+   if (sqe->ioprio || sqe->rw_flags || sqe->buf_index)
+   return -EINVAL;
+
+   getdents->pos = READ_ONCE(sqe->off);
+   getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
+   getdents->count = READ_ONCE(sqe->len);
+   return 0;
+}
+
+static int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
+{
+   struct io_getdents *getdents = >getdents;
+   bool pos_unlock = false;
+   int ret = 0;
+
+   /* getdents always requires a blocking context */
+   if (issue_flags & IO_URING_F_NONBLOCK)
+   return -EAGAIN;
+
+   /* for vfs_llseek and to serialize ->iterate_shared() on this file */
+   if (file_count(req->file) > 1) {
+   pos_unlock = true;
+   mutex_lock(>file->f_pos_lock);
+   }
+
+   if (req->file->f_pos != getdents->pos) {
+   loff_t res = vfs_llseek(req->file, getdents->pos, SEEK_SET);
+   if (res < 0)
+   ret = res;
+   }
+
+   if (ret == 0) {
+   ret = vfs_getdents(req->file, getdents->dirent,
+  getdents->count);
+   }
+
+   if (pos_unlock)
+   mutex_unlock(>file->f_pos_lock);
+
+   if (ret < 0) {
+   if (ret == -ERESTARTSYS)
+   ret = -EINTR;
+   req_set_fail_links(req);
+   }
+   io_req_complete(req, ret);
+   return 0;
+}
+
 #if defined(CONFIG_NET)
 static int io_setup_async_msg(struct io_kiocb *req,
  struct io_async_msghdr *kmsg)
@@ -6184,6 +6252,8 @@ static int io_req_prep(struct io_kiocb *req, const struct 
io_uring_sqe *sqe)
return io_unlinkat_prep(req, sqe);
case IORING_OP_MKDIRAT:
return io_mkdirat_prep(req, sqe);
+   case IORING_OP_GETDENTS:
+   return io_getdents_prep(req, sqe);
}
 
printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6428,6 +6498,9 @@ static int io_issue_sqe(struct io_kiocb *re

[PATCH v3 0/2] io_uring: add support for IORING_OP_GETDENTS

2021-02-18 Thread Lennert Buytenhek
These patches add support for IORING_OP_GETDENTS, which is a new io_uring
opcode that more or less does an lseek(sqe->fd, sqe->off, SEEK_SET)
followed by a getdents64(sqe->fd, (void *)sqe->addr, sqe->len).

A dumb test program for IORING_OP_GETDENTS is available here:

https://krautbox.wantstofly.org/~buytenh/uringfind-v2.c

This test program does something along the lines of what find(1) does:
it scans recursively through a directory tree and prints the names of
all directories and files it encounters along the way -- but then using
io_uring.  (The io_uring version prints the names of encountered files and
directories in an order that's determined by SQE completion order, which
is somewhat nondeterministic and likely to differ between runs.)

On a directory tree with 14-odd million files in it that's on a
six-drive (spinning disk) btrfs raid, find(1) takes:

# echo 3 > /proc/sys/vm/drop_caches
# time find /mnt/repo > /dev/null

real24m7.815s
user0m15.015s
sys 0m48.340s
#

And the io_uring version takes:

# echo 3 > /proc/sys/vm/drop_caches
# time ./uringfind /mnt/repo > /dev/null

real10m29.064s
user0m4.347s
sys 0m1.677s
#


The fully cached case also shows some speedup.  find(1):

# time find /mnt/repo > /dev/null

real0m5.223s
user0m1.926s
sys 0m3.268s
#

Versus the io_uring version:

# time ./uringfind /mnt/repo > /dev/null

real0m3.604s
user0m2.417s
sys 0m0.793s
#


That said, the point of this patch isn't primarily to enable
lightning-fast find(1) or du(1), but more to complete the set of
filesystem I/O primitives available via io_uring, so that applications
can do all of their filesystem I/O using the same mechanism, without
having to manually punt some of their work out to worker threads -- and
indeed, an object storage backend server that I wrote a while ago can
run with a pure io_uring based event loop with this patch.

Changes since v2 RFC:

- Rebase onto io_uring-2021-02-17 plus a manually applied version of
  the mkdirat patch.  The latter is needed because userland (liburing)
  has already merged the opcode for IORING_OP_MKDIRAT (in commit
  "io_uring.h: 5.12 pending kernel sync") while this opcode isn't in
  the kernel yet (as of io_uring-2021-02-17), and this means that this
  can't be merged until IORING_OP_MKDIRAT is merged.

- Adapt to changes made in "io_uring: replace force_nonblock with flags"
  that are in io_uring-2021-02-17.

Changes since v1 RFC:

- Drop the trailing '64' from IORING_OP_GETDENTS64 (suggested by
  Matthew Wilcox).

- Instead of requiring that sqe->off be zero, use this field to pass
  in a directory offset to start reading from.  For the first
  IORING_OP_GETDENTS call on a directory, this can be set to zero,
  and for subsequent calls, it can be set to the ->d_off field of
  the last struct linux_dirent64 returned by the previous call.

Lennert Buytenhek (2):
  readdir: split the core of getdents64(2) out into vfs_getdents()
  io_uring: add support for IORING_OP_GETDENTS

 fs/io_uring.c |   73 ++
 fs/readdir.c  |   25 +-
 include/linux/fs.h|4 ++
 include/uapi/linux/io_uring.h |1 
 4 files changed, 95 insertions(+), 8 deletions(-)


Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-29 Thread Lennert Buytenhek
On Fri, Jan 29, 2021 at 07:37:03AM +0200, Lennert Buytenhek wrote:

> > > > One open question is whether IORING_OP_GETDENTS64 should be more like
> > > > pread(2) and allow passing in a starting offset to read from the
> > > > directory from.  (This would require some more surgery in fs/readdir.c.)
> > > 
> > > Since directories are seekable this ought to work.
> > > Modulo horrid issues with 32bit file offsets.
> > 
> > The incremental patch below does this.  (It doesn't apply cleanly on
> > top of v1 of the IORING_OP_GETDENTS patch as I have other changes in
> > my tree -- I'm including it just to illustrate the changes that would
> > make this work.)
> > 
> > This change seems to work, and makes IORING_OP_GETDENTS take an
> > explicitly specified directory offset (instead of using the file's
> > ->f_pos), making it more like pread(2) [...]
> 
> ...but the fact that this patch avoids taking file->f_pos_lock (as this
> proposed version of IORING_OP_GETDENTS avoids using file->f_pos) means
> that ->iterate_shared() can then be called concurrently on the same
> struct file, which breaks the mutual exclusion guarantees provided here.
> 
> If possible, I'd like to keep the ability to explicitly pass in a
> directory offset to start reading from into IORING_OP_GETDENTS, so
> perhaps we can simply satisfy the mutual exclusion requirement by
> taking ->f_pos_lock by hand -- but then I do need to check that it's OK
> for ->iterate{,_shared}() to be called successively with discontinuous
> offsets without ->llseek() being called in between.
> 
> (If that's not OK, then we can either have IORING_OP_GETDENTS just
> always start reading at ->f_pos like before (which might then require
> adding a IORING_OP_GETDENTS2 at some point in the future if we'll
> ever want to change that), or we could have IORING_OP_GETDENTS itself
> call ->llseek() for now whenever necessary.)

Having IORING_OP_GETDENTS seek to sqe->off if needed seems easy
enough to implement, and it simplifies the other code as well, so
I'll send out a v2 RFC shortly that does this.


Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-28 Thread Lennert Buytenhek
On Fri, Jan 29, 2021 at 01:07:10AM +0200, Lennert Buytenhek wrote:

> > > One open question is whether IORING_OP_GETDENTS64 should be more like
> > > pread(2) and allow passing in a starting offset to read from the
> > > directory from.  (This would require some more surgery in fs/readdir.c.)
> > 
> > Since directories are seekable this ought to work.
> > Modulo horrid issues with 32bit file offsets.
> 
> The incremental patch below does this.  (It doesn't apply cleanly on
> top of v1 of the IORING_OP_GETDENTS patch as I have other changes in
> my tree -- I'm including it just to illustrate the changes that would
> make this work.)
> 
> This change seems to work, and makes IORING_OP_GETDENTS take an
> explicitly specified directory offset (instead of using the file's
> ->f_pos), making it more like pread(2) [...]

...but the fact that this patch avoids taking file->f_pos_lock (as this
proposed version of IORING_OP_GETDENTS avoids using file->f_pos) means
that ->iterate_shared() can then be called concurrently on the same
struct file, which breaks the mutual exclusion guarantees provided here.

If possible, I'd like to keep the ability to explicitly pass in a
directory offset to start reading from into IORING_OP_GETDENTS, so
perhaps we can simply satisfy the mutual exclusion requirement by
taking ->f_pos_lock by hand -- but then I do need to check that it's OK
for ->iterate{,_shared}() to be called successively with discontinuous
offsets without ->llseek() being called in between.

(If that's not OK, then we can either have IORING_OP_GETDENTS just
always start reading at ->f_pos like before (which might then require
adding a IORING_OP_GETDENTS2 at some point in the future if we'll
ever want to change that), or we could have IORING_OP_GETDENTS itself
call ->llseek() for now whenever necessary.)


> , and I like the change from
> a conceptual point of view, but it's a bit ugly around
> iterate_dir_use_ctx_pos().  Any thoughts on how to do this more
> cleanly (without breaking iterate_dir() semantics)?
> 
> 
> > You'd need to return the final offset to allow another
> > read to continue from the end position.
> 
> We can use the ->d_off value as returned in the last struct
> linux_dirent64 as the directory offset to continue reading from
> with the next IORING_OP_GETDENTS call, illustrated by the patch
> to uringfind.c at the bottom.
> 
> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 13dd29f8ebb3..0f9707ed9294 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -576,6 +576,7 @@ struct io_getdents {
>   struct file *file;
>   struct linux_dirent64 __user*dirent;
>   unsigned intcount;
> + loff_t  pos;
>  };
>  
>  struct io_completion {
> @@ -4584,9 +4585,10 @@ static int io_getdents_prep(struct io_kiocb *req,
>  
>   if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>   return -EINVAL;
> - if (sqe->ioprio || sqe->off || sqe->rw_flags || sqe->buf_index)
> + if (sqe->ioprio || sqe->rw_flags || sqe->buf_index)
>   return -EINVAL;
>  
> + getdents->pos = READ_ONCE(sqe->off);
>   getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
>   getdents->count = READ_ONCE(sqe->len);
>   return 0;
> @@ -4601,7 +4603,8 @@ static int io_getdents(struct io_kiocb *req, bool 
> force_nonblock)
>   if (force_nonblock)
>   return -EAGAIN;
>  
> - ret = vfs_getdents(req->file, getdents->dirent, getdents->count);
> + ret = vfs_getdents(req->file, getdents->dirent, getdents->count,
> +>pos);
>   if (ret < 0) {
>   if (ret == -ERESTARTSYS)
>   ret = -EINTR;
> diff --git a/fs/readdir.c b/fs/readdir.c
> index f52167c1eb61..d6bd78f6350a 100644
> --- a/fs/readdir.c
> +++ b/fs/readdir.c
> @@ -37,7 +37,7 @@
>  } while (0)
>  
>  
> -int iterate_dir(struct file *file, struct dir_context *ctx)
> +int iterate_dir_use_ctx_pos(struct file *file, struct dir_context *ctx)
>  {
>   struct inode *inode = file_inode(file);
>   bool shared = false;
> @@ -60,12 +60,10 @@ int iterate_dir(struct file *file, struct dir_context 
> *ctx)
>  
>   res = -ENOENT;
>   if (!IS_DEADDIR(inode)) {
> - ctx->pos = file->f_pos;
>   if (shared)
>   res = file->f_op->iterate_shared(file, ctx);
>   else
>   res = file->f_op->iterate(file, ctx);
> - file->f_pos = ctx->pos;
>   fsnotify_access

Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-28 Thread Lennert Buytenhek
On Sun, Jan 24, 2021 at 10:21:38PM +, David Laight wrote:

> > One open question is whether IORING_OP_GETDENTS64 should be more like
> > pread(2) and allow passing in a starting offset to read from the
> > directory from.  (This would require some more surgery in fs/readdir.c.)
> 
> Since directories are seekable this ought to work.
> Modulo horrid issues with 32bit file offsets.

The incremental patch below does this.  (It doesn't apply cleanly on
top of v1 of the IORING_OP_GETDENTS patch as I have other changes in
my tree -- I'm including it just to illustrate the changes that would
make this work.)

This change seems to work, and makes IORING_OP_GETDENTS take an
explicitly specified directory offset (instead of using the file's
->f_pos), making it more like pread(2), and I like the change from
a conceptual point of view, but it's a bit ugly around
iterate_dir_use_ctx_pos().  Any thoughts on how to do this more
cleanly (without breaking iterate_dir() semantics)?


> You'd need to return the final offset to allow another
> read to continue from the end position.

We can use the ->d_off value as returned in the last struct
linux_dirent64 as the directory offset to continue reading from
with the next IORING_OP_GETDENTS call, illustrated by the patch
to uringfind.c at the bottom.



diff --git a/fs/io_uring.c b/fs/io_uring.c
index 13dd29f8ebb3..0f9707ed9294 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -576,6 +576,7 @@ struct io_getdents {
struct file *file;
struct linux_dirent64 __user*dirent;
unsigned intcount;
+   loff_t  pos;
 };
 
 struct io_completion {
@@ -4584,9 +4585,10 @@ static int io_getdents_prep(struct io_kiocb *req,
 
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
-   if (sqe->ioprio || sqe->off || sqe->rw_flags || sqe->buf_index)
+   if (sqe->ioprio || sqe->rw_flags || sqe->buf_index)
return -EINVAL;
 
+   getdents->pos = READ_ONCE(sqe->off);
getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
getdents->count = READ_ONCE(sqe->len);
return 0;
@@ -4601,7 +4603,8 @@ static int io_getdents(struct io_kiocb *req, bool 
force_nonblock)
if (force_nonblock)
return -EAGAIN;
 
-   ret = vfs_getdents(req->file, getdents->dirent, getdents->count);
+   ret = vfs_getdents(req->file, getdents->dirent, getdents->count,
+  >pos);
if (ret < 0) {
if (ret == -ERESTARTSYS)
ret = -EINTR;
diff --git a/fs/readdir.c b/fs/readdir.c
index f52167c1eb61..d6bd78f6350a 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -37,7 +37,7 @@
 } while (0)
 
 
-int iterate_dir(struct file *file, struct dir_context *ctx)
+int iterate_dir_use_ctx_pos(struct file *file, struct dir_context *ctx)
 {
struct inode *inode = file_inode(file);
bool shared = false;
@@ -60,12 +60,10 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 
res = -ENOENT;
if (!IS_DEADDIR(inode)) {
-   ctx->pos = file->f_pos;
if (shared)
res = file->f_op->iterate_shared(file, ctx);
else
res = file->f_op->iterate(file, ctx);
-   file->f_pos = ctx->pos;
fsnotify_access(file);
file_accessed(file);
}
@@ -76,6 +74,17 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 out:
return res;
 }
+
+int iterate_dir(struct file *file, struct dir_context *ctx)
+{
+   int res;
+
+   ctx->pos = file->f_pos;
+   res = iterate_dir_use_ctx_pos(file, ctx);
+   file->f_pos = ctx->pos;
+
+   return res;
+}
 EXPORT_SYMBOL(iterate_dir);
 
 /*
@@ -349,16 +358,18 @@ static int filldir64(struct dir_context *ctx, const char 
*name, int namlen,
 }
 
 int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
-unsigned int count)
+unsigned int count, loff_t *pos)
 {
struct getdents_callback64 buf = {
.ctx.actor = filldir64,
+   .ctx.pos = *pos,
.count = count,
.current_dir = dirent
};
int error;
 
-   error = iterate_dir(file, );
+   error = iterate_dir_use_ctx_pos(file, );
+   *pos = buf.ctx.pos;
if (error >= 0)
error = buf.error;
if (buf.prev_reclen) {
@@ -384,7 +395,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
if (!f.file)
return -EBADF;
 
-   error = vfs_getdents(f.file, dirent, count);
+   error = vfs_getdents(f.file, dirent, count, >f_pos);
fdput_pos(f);
return error;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 114885d3f6c4..4d9d96163f92 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3107,11 +3107,12 @@ const 

Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-28 Thread Lennert Buytenhek
On Sat, Jan 23, 2021 at 04:33:34PM -0700, Jens Axboe wrote:

> >> IORING_OP_GETDENTS64 behaves like getdents64(2) and takes the same
> > 
> > Could we drop the '64'?  We don't, for example, have IOURING_OP_FADVISE64
> > even though that's the name of the syscall.
> 
> Agreed, only case we do mimic the names are for things like
> IORING_OP_OPENAT2 where it does carry meaning. For this one, it should
> just be IORING_OP_GETDENTS.

OK, I've made this change.


Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-23 Thread Lennert Buytenhek
On Sat, Jan 23, 2021 at 10:37:25AM -0700, Jens Axboe wrote:

> > IORING_OP_GETDENTS64 behaves like getdents64(2) and takes the same
> > arguments.
> > 
> > Signed-off-by: Lennert Buytenhek 
> > ---
> > This seems to work OK, but I'd appreciate a review from someone more
> > familiar with io_uring internals than I am, as I'm not entirely sure
> > I did everything quite right.
> > 
> > A dumb test program for IORING_OP_GETDENTS64 is available here:
> > 
> > https://krautbox.wantstofly.org/~buytenh/uringfind.c
> > 
> > This does more or less what find(1) does: it scans recursively through
> > a directory tree and prints the names of all directories and files it
> > encounters along the way -- but then using io_uring.  (The uring version
> > prints the names of encountered files and directories in an order that's
> > determined by SQE completion order, which is somewhat nondeterministic
> > and likely to differ between runs.)
> > 
> > On a directory tree with 14-odd million files in it that's on a
> > six-drive (spinning disk) btrfs raid, find(1) takes:
> > 
> > # echo 3 > /proc/sys/vm/drop_caches 
> > # time find /mnt/repo > /dev/null
> > 
> > real24m7.815s
> > user0m15.015s
> > sys 0m48.340s
> > #
> > 
> > And the io_uring version takes:
> > 
> > # echo 3 > /proc/sys/vm/drop_caches 
> > # time ./uringfind /mnt/repo > /dev/null
> > 
> > real10m29.064s
> > user0m4.347s
> > sys 0m1.677s
> > #
> > 
> > These timings are repeatable and consistent to within a few seconds.
> > 
> > (btrfs seems to be sending most metadata reads to the same drive in the
> > array during this test, even though this filesystem is using the raid1c4
> > profile for metadata, so I suspect that more drive-level parallelism can
> > be extracted with some btrfs tweaks.)
> > 
> > The fully cached case also shows some speedup for the io_uring version:
> > 
> > # time find /mnt/repo > /dev/null
> > 
> > real0m5.223s
> > user0m1.926s
> > sys 0m3.268s
> > #
> > 
> > vs:
> > 
> > # time ./uringfind /mnt/repo > /dev/null
> > 
> > real0m3.604s
> > user0m2.417s
> > sys 0m0.793s
> > #
> > 
> > That said, the point of this patch isn't primarily to enable
> > lightning-fast find(1) or du(1), but more to complete the set of
> > filesystem I/O primitives available via io_uring, so that applications
> > can do all of their filesystem I/O using the same mechanism, without
> > having to manually punt some of their work out to worker threads -- and
> > indeed, an object storage backend server that I wrote a while ago can
> > run with a pure io_uring based event loop with this patch.
> 
> The results look nice for sure.

Thanks!  And thank you for having a look.


> Once concern is that io_uring generally
> guarantees that any state passed in is stable once submit is done. For
> the below implementation, that doesn't hold as the linux_dirent64 isn't
> used until later in the process. That means if you do:
> 
> submit_getdents64(ring)
> {
>   struct linux_dirent64 dent;
>   struct io_uring_sqe *sqe;
> 
>   sqe = io_uring_get_sqe(ring);
>   io_uring_prep_getdents64(sqe, ..., );
>   io_uring_submit(ring);
> }
> 
> other_func(ring)
> {
>   struct io_uring_cqe *cqe;
> 
>   submit_getdents64(ring);
>   io_uring_wait_cqe(ring, );
>   
> }
> 
> then the kernel side might get garbage by the time the sqe is actually
> submitted. This is true because you don't use it inline, only from the
> out-of-line async context. Usually this is solved by having the prep
> side copy in the necessary state, eg see io_openat2_prep() for how we
> make filename and open_how stable by copying them into kernel memory.
> That ensures that if/when these operations need to go async and finish
> out-of-line, the contents are stable and there's no requirement for the
> application to keep them valid once submission is done.
> 
> Not sure how best to solve that, since the vfs side relies heavily on
> linux_dirent64 being a user pointer...

No data is passed into the kernel on a getdents64(2) call via user
memory, i.e. getdents64(2) only ever writes into the supplied
linux_dirent64 user pointer, it never reads from it.  The only things
that we need to keep stable here are the linux_dirent64 pointer itself
and the 'count' argument and those are both passed in via the SQE, and
we READ_ONCE() them from the SQE in the prep function.  I think that's
probably the source of confusion here?


Cheers,
Lennert


[RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64

2021-01-23 Thread Lennert Buytenhek
IORING_OP_GETDENTS64 behaves like getdents64(2) and takes the same
arguments.

Signed-off-by: Lennert Buytenhek 
---
This seems to work OK, but I'd appreciate a review from someone more
familiar with io_uring internals than I am, as I'm not entirely sure
I did everything quite right.

A dumb test program for IORING_OP_GETDENTS64 is available here:

https://krautbox.wantstofly.org/~buytenh/uringfind.c

This does more or less what find(1) does: it scans recursively through
a directory tree and prints the names of all directories and files it
encounters along the way -- but then using io_uring.  (The uring version
prints the names of encountered files and directories in an order that's
determined by SQE completion order, which is somewhat nondeterministic
and likely to differ between runs.)

On a directory tree with 14-odd million files in it that's on a
six-drive (spinning disk) btrfs raid, find(1) takes:

# echo 3 > /proc/sys/vm/drop_caches 
# time find /mnt/repo > /dev/null

real24m7.815s
user0m15.015s
sys 0m48.340s
#

And the io_uring version takes:

# echo 3 > /proc/sys/vm/drop_caches 
# time ./uringfind /mnt/repo > /dev/null

real10m29.064s
user0m4.347s
sys 0m1.677s
#

These timings are repeatable and consistent to within a few seconds.

(btrfs seems to be sending most metadata reads to the same drive in the
array during this test, even though this filesystem is using the raid1c4
profile for metadata, so I suspect that more drive-level parallelism can
be extracted with some btrfs tweaks.)

The fully cached case also shows some speedup for the io_uring version:

# time find /mnt/repo > /dev/null

real0m5.223s
user0m1.926s
sys 0m3.268s
#

vs:

# time ./uringfind /mnt/repo > /dev/null

real0m3.604s
user0m2.417s
sys 0m0.793s
#

That said, the point of this patch isn't primarily to enable
lightning-fast find(1) or du(1), but more to complete the set of
filesystem I/O primitives available via io_uring, so that applications
can do all of their filesystem I/O using the same mechanism, without
having to manually punt some of their work out to worker threads -- and
indeed, an object storage backend server that I wrote a while ago can
run with a pure io_uring based event loop with this patch.

One open question is whether IORING_OP_GETDENTS64 should be more like
pread(2) and allow passing in a starting offset to read from the
directory from.  (This would require some more surgery in fs/readdir.c.)

 fs/io_uring.c |   51 ++
 fs/readdir.c  |   25 ++--
 include/linux/fs.h|4 +++
 include/uapi/linux/io_uring.h |1 
 4 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 985a9e3f976d..5d79b9668ee0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -572,6 +572,12 @@ struct io_unlink {
struct filename *filename;
 };
 
+struct io_getdents64 {
+   struct file *file;
+   struct linux_dirent64 __user*dirent;
+   unsigned intcount;
+};
+
 struct io_completion {
struct file *file;
struct list_headlist;
@@ -699,6 +705,7 @@ struct io_kiocb {
struct io_shutdown  shutdown;
struct io_renamerename;
struct io_unlinkunlink;
+   struct io_getdents64getdents64;
/* use only after cleaning per-op data, see io_clean_op() */
struct io_completioncompl;
};
@@ -987,6 +994,11 @@ static const struct io_op_def io_op_defs[] = {
.work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_FILES |
IO_WQ_WORK_FS | 
IO_WQ_WORK_BLKCG,
},
+   [IORING_OP_GETDENTS64] = {
+   .needs_file = 1,
+   .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_FILES |
+   IO_WQ_WORK_FS | 
IO_WQ_WORK_BLKCG,
+   },
 };
 
 enum io_mem_account {
@@ -4552,6 +4564,40 @@ static int io_sync_file_range(struct io_kiocb *req, bool 
force_nonblock)
return 0;
 }
 
+static int io_getdents64_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+   struct io_getdents64 *getdents64 = >getdents64;
+
+   if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+   return -EINVAL;
+   if (sqe->ioprio || sqe->off || sqe->rw_flags || sqe->buf_index)
+   return -EINVAL;
+
+   getdents64->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
+   getdents64->count = REA

Re: [PATCH 3/8] ARM: l2x0: add Marvell Tauros3 compatible

2013-10-11 Thread Lennert Buytenhek
On Wed, Oct 09, 2013 at 09:27:14PM +0200, Sebastian Hesselbarth wrote:

> >This add a compatible for the Marvell Tauros3 cache controller which
> >is compatible with l2x0 cache controllers. While updating the binding
> >documentation, clean up the list of possible compatibles.
> >
> >Signed-off-by: Sebastian Hesselbarth 
> >---
> 
> Added Jisheng and Lennert to Cc.
> 
> Lennert, while looking for differences between ARM PL310 and
> Marvell Tauros3 cache controller in a GPL'ed 2.6 kernel source
> from Asus, I found arch/arm/mm/cache-tauros3.c which states you
> as the original author. If that is wrong, please ignore this.

I'm the original author of cache-tauros2.c, but I've never heard of
Tauros3, and my name probably ended up in there via cp(1).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] ARM: l2x0: add Marvell Tauros3 compatible

2013-10-11 Thread Lennert Buytenhek
On Wed, Oct 09, 2013 at 09:27:14PM +0200, Sebastian Hesselbarth wrote:

 This add a compatible for the Marvell Tauros3 cache controller which
 is compatible with l2x0 cache controllers. While updating the binding
 documentation, clean up the list of possible compatibles.
 
 Signed-off-by: Sebastian Hesselbarth sebastian.hesselba...@gmail.com
 ---
 
 Added Jisheng and Lennert to Cc.
 
 Lennert, while looking for differences between ARM PL310 and
 Marvell Tauros3 cache controller in a GPL'ed 2.6 kernel source
 from Asus, I found arch/arm/mm/cache-tauros3.c which states you
 as the original author. If that is wrong, please ignore this.

I'm the original author of cache-tauros2.c, but I've never heard of
Tauros3, and my name probably ended up in there via cp(1).
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mv643xx_eth: Fix a possible deadlock upon ifdown

2013-03-11 Thread Lennert Buytenhek
On Fri, Mar 01, 2013 at 06:30:13PM +0100, Alexander Holler wrote:

> >>From: Lubomir Rintel 
> >>
> >>=
> >>[ INFO: inconsistent lock state ]
> >>3.7.0-6.luboskovo.fc19.armv5tel.kirkwood #1 Tainted: GW
> >>-
> >>inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> >>NetworkManager/337 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >>  (_xmit_ETHER#2){+.?...}, at: [] txq_reclaim+0x54/0x264 
> >> [mv643xx_eth]
> 
> I get the same annoying warning when the MTU gets changed (through dhcp).

That is actually an issue.


> >Maybe I'm not reading it right, but I doubt that this is an actual
> >deadlock or that the patch is needed.
> >
> >txq_reclaim() indeed doesn't disable BHs, but that's because it's
> >always called in BH context.  Almost always -- the only exception is
> >txq_deinit(), called from ->ndo_stop(), but by that time we've
> >already napi_disable()'d and netif_carrier_off()'d and free_irq()'d.
> 
> Agreed. I've just read me through that too and don't think a
> deadlock is possible.
> 
> >How to explain that to lockdep, though, I don't know.
> 
> The patch helps with that. ;)

It fixes a bug (the MTU change thing) and a non-bug (the lockdep
warning) at the expense of slowing down the much more common path,
and I don't like it for that reason.

Can you make a __txq_reclaim() which is basically txq_reclaim()
without grabbing the tx queue lock, and then move the lock grabbing
to the caller?

E.g. make __txq_reclaim() have two callers, txq_reclaim() and
txq_reclaim_bh(), and then use the appropriate wrapper depending on
the context.  (tx queue lock but no BH disable when called from
mv643xx_eth_poll(), tx queue lock plus BH disable for MTU change,
and no locking at all when called from ->ndo_stop().  Something
like that.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mv643xx_eth: Fix a possible deadlock upon ifdown

2013-03-11 Thread Lennert Buytenhek
On Fri, Mar 01, 2013 at 06:30:13PM +0100, Alexander Holler wrote:

 From: Lubomir Rintel lubo.rin...@gooddata.com
 
 =
 [ INFO: inconsistent lock state ]
 3.7.0-6.luboskovo.fc19.armv5tel.kirkwood #1 Tainted: GW
 -
 inconsistent {IN-SOFTIRQ-W} - {SOFTIRQ-ON-W} usage.
 NetworkManager/337 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (_xmit_ETHER#2){+.?...}, at: [bf07adfc] txq_reclaim+0x54/0x264 
  [mv643xx_eth]
 
 I get the same annoying warning when the MTU gets changed (through dhcp).

That is actually an issue.


 Maybe I'm not reading it right, but I doubt that this is an actual
 deadlock or that the patch is needed.
 
 txq_reclaim() indeed doesn't disable BHs, but that's because it's
 always called in BH context.  Almost always -- the only exception is
 txq_deinit(), called from -ndo_stop(), but by that time we've
 already napi_disable()'d and netif_carrier_off()'d and free_irq()'d.
 
 Agreed. I've just read me through that too and don't think a
 deadlock is possible.
 
 How to explain that to lockdep, though, I don't know.
 
 The patch helps with that. ;)

It fixes a bug (the MTU change thing) and a non-bug (the lockdep
warning) at the expense of slowing down the much more common path,
and I don't like it for that reason.

Can you make a __txq_reclaim() which is basically txq_reclaim()
without grabbing the tx queue lock, and then move the lock grabbing
to the caller?

E.g. make __txq_reclaim() have two callers, txq_reclaim() and
txq_reclaim_bh(), and then use the appropriate wrapper depending on
the context.  (tx queue lock but no BH disable when called from
mv643xx_eth_poll(), tx queue lock plus BH disable for MTU change,
and no locking at all when called from -ndo_stop().  Something
like that.)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drivers/net/wireless/mwl8k.c: avoid use-after-free

2013-01-06 Thread Lennert Buytenhek
On Sun, Jan 06, 2013 at 10:02:14PM -0500, Nickolai Zeldovich wrote:

> > Good catch, but the patch would be better titled "mwl8k.c: avoid
> > having a working driver", as the station_id return code _is_ needed
> > by the caller in case of success.
> 
> I'm not quite sure what you mean -- is there something subtle going on
> here?  I believe my patch preserves the semantics of the original
> code: it returns the value of p->station_id if mwl8k_post_cmd()
> returned 0, but it just does so by reading p->station_id first before
> calling kfree(cmd).

Oops!  You're right.  Sorry about that.

/me goes to order some crow for dinner
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drivers/net/wireless/mwl8k.c: avoid use-after-free

2013-01-06 Thread Lennert Buytenhek
On Sun, Jan 06, 2013 at 08:27:22PM -0500, Nickolai Zeldovich wrote:

> Do not dereference p->station_id after kfree(cmd) because p
> points into the cmd data structure.

Good catch, but the patch would be better titled "mwl8k.c: avoid
having a working driver", as the station_id return code _is_ needed
by the caller in case of success.


> Signed-off-by: Nickolai Zeldovich 
> ---
>  drivers/net/wireless/mwl8k.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/mwl8k.c b/drivers/net/wireless/mwl8k.c
> index f221b95..83564d3 100644
> --- a/drivers/net/wireless/mwl8k.c
> +++ b/drivers/net/wireless/mwl8k.c
> @@ -4250,9 +4250,11 @@ static int mwl8k_cmd_update_stadb_add(struct 
> ieee80211_hw *hw,
>   p->amsdu_enabled = 0;
>  
>   rc = mwl8k_post_cmd(hw, >header);
> + if (!rc)
> + rc = p->station_id;
>   kfree(cmd);
>  
> - return rc ? rc : p->station_id;
> + return rc;
>  }
>  
>  static int mwl8k_cmd_update_stadb_del(struct ieee80211_hw *hw,
> -- 
> 1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drivers/net/wireless/mwl8k.c: avoid use-after-free

2013-01-06 Thread Lennert Buytenhek
On Sun, Jan 06, 2013 at 08:27:22PM -0500, Nickolai Zeldovich wrote:

 Do not dereference p-station_id after kfree(cmd) because p
 points into the cmd data structure.

Good catch, but the patch would be better titled mwl8k.c: avoid
having a working driver, as the station_id return code _is_ needed
by the caller in case of success.


 Signed-off-by: Nickolai Zeldovich nicko...@csail.mit.edu
 ---
  drivers/net/wireless/mwl8k.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/wireless/mwl8k.c b/drivers/net/wireless/mwl8k.c
 index f221b95..83564d3 100644
 --- a/drivers/net/wireless/mwl8k.c
 +++ b/drivers/net/wireless/mwl8k.c
 @@ -4250,9 +4250,11 @@ static int mwl8k_cmd_update_stadb_add(struct 
 ieee80211_hw *hw,
   p-amsdu_enabled = 0;
  
   rc = mwl8k_post_cmd(hw, cmd-header);
 + if (!rc)
 + rc = p-station_id;
   kfree(cmd);
  
 - return rc ? rc : p-station_id;
 + return rc;
  }
  
  static int mwl8k_cmd_update_stadb_del(struct ieee80211_hw *hw,
 -- 
 1.7.10.4
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drivers/net/wireless/mwl8k.c: avoid use-after-free

2013-01-06 Thread Lennert Buytenhek
On Sun, Jan 06, 2013 at 10:02:14PM -0500, Nickolai Zeldovich wrote:

  Good catch, but the patch would be better titled mwl8k.c: avoid
  having a working driver, as the station_id return code _is_ needed
  by the caller in case of success.
 
 I'm not quite sure what you mean -- is there something subtle going on
 here?  I believe my patch preserves the semantics of the original
 code: it returns the value of p-station_id if mwl8k_post_cmd()
 returned 0, but it just does so by reading p-station_id first before
 calling kfree(cmd).

Oops!  You're right.  Sorry about that.

/me goes to order some crow for dinner
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mv643xx_eth: Fix a possible deadlock upon ifdown

2013-01-04 Thread Lennert Buytenhek
On Fri, Jan 04, 2013 at 03:07:02PM +0100, Lubomir Rintel wrote:

> From: Lubomir Rintel 
> 
> =
> [ INFO: inconsistent lock state ]
> 3.7.0-6.luboskovo.fc19.armv5tel.kirkwood #1 Tainted: GW
> -
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> NetworkManager/337 [HC0[0]:SC0[0]:HE1:SE1] takes:
>  (_xmit_ETHER#2){+.?...}, at: [] txq_reclaim+0x54/0x264 
> [mv643xx_eth]
> {IN-SOFTIRQ-W} state was registered at:
>   [] __lock_acquire+0x5b4/0x17d0
>   [] lock_acquire+0x160/0x1e0
>   [] _raw_spin_lock+0x50/0x88
>   [] sch_direct_xmit+0x4c/0x2d4
>   [] dev_queue_xmit+0x4b8/0x8d8
>   [] ip6_finish_output2+0x350/0x42c
>   [] mld_sendpack+0x2d0/0x514
>   [] mld_ifc_timer_expire+0x228/0x278
>   [] call_timer_fn+0x140/0x33c
>   [] run_timer_softirq+0x278/0x32c
>   [] __do_softirq+0x16c/0x398
>   [] irq_exit+0x5c/0xc0
>   [] handle_IRQ+0x6c/0x8c
>   [] __irq_svc+0x38/0x80
>   [] lock_is_held+0x4/0x54
>   [] __might_sleep+0x44/0x228
>   [] down_read+0x28/0x88
>   [] __copy_to_user_memcpy+0xa8/0x140
>   [] seq_read+0x3ac/0x474
>   [] vfs_read+0xac/0x184
>   [] sys_read+0x40/0x6c
>   [] ret_fast_syscall+0x0/0x38
> irq event stamp: 115119
> hardirqs last  enabled at (115119): [] 
> _raw_spin_unlock_irqrestore+0x44/0x64
> hardirqs last disabled at (115118): [] 
> _raw_spin_lock_irqsave+0x28/0xa0
> softirqs last  enabled at (114880): [] __do_softirq+0x2b8/0x398
> softirqs last disabled at (114873): [] irq_exit+0x5c/0xc0
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>CPU0
>
>   lock(_xmit_ETHER#2);
>   
> lock(_xmit_ETHER#2);
> 
>  *** DEADLOCK ***
> 
> 1 lock held by NetworkManager/337:
>  #0:  (rtnl_mutex){+.+.+.}, at: [] rtnetlink_rcv+0x14/0x2c
> 
> stack backtrace:
> [] (unwind_backtrace+0x0/0x124) from [] 
> (print_usage_bug.part.29+0x20c/0x26c)
> [] (print_usage_bug.part.29+0x20c/0x26c) from [] 
> (mark_lock+0x404/0x60c)
> [] (mark_lock+0x404/0x60c) from [] 
> (__lock_acquire+0x638/0x17d0)
> [] (__lock_acquire+0x638/0x17d0) from [] 
> (lock_acquire+0x160/0x1e0)
> [] (lock_acquire+0x160/0x1e0) from [] 
> (_raw_spin_lock+0x50/0x88)
> [] (_raw_spin_lock+0x50/0x88) from [] 
> (txq_reclaim+0x54/0x264 [mv643xx_eth])
> [] (txq_reclaim+0x54/0x264 [mv643xx_eth]) from [] 
> (txq_deinit+0x30/0xec [mv643xx_eth])
> [] (txq_deinit+0x30/0xec [mv643xx_eth]) from [] 
> (mv643xx_eth_stop+0x124/0x140 [mv643xx_eth])
> [] (mv643xx_eth_stop+0x124/0x140 [mv643xx_eth]) from [] 
> (__dev_close_many+0xb0/0xec)
> [] (__dev_close_many+0xb0/0xec) from [] 
> (__dev_close+0x30/0x44)
> [] (__dev_close+0x30/0x44) from [] 
> (__dev_change_flags+0x94/0x120)
> [] (__dev_change_flags+0x94/0x120) from [] 
> (dev_change_flags+0x18/0x4c)
> [] (dev_change_flags+0x18/0x4c) from [] 
> (do_setlink+0x2cc/0x7ac)
> [] (do_setlink+0x2cc/0x7ac) from [] 
> (rtnl_newlink+0x26c/0x4a8)
> [] (rtnl_newlink+0x26c/0x4a8) from [] 
> (rtnetlink_rcv_msg+0x280/0x29c)
> [] (rtnetlink_rcv_msg+0x280/0x29c) from [] 
> (netlink_rcv_skb+0x58/0xb4)
> [] (netlink_rcv_skb+0x58/0xb4) from [] 
> (rtnetlink_rcv+0x20/0x2c)
> [] (rtnetlink_rcv+0x20/0x2c) from [] 
> (netlink_unicast+0x158/0x208)
> [] (netlink_unicast+0x158/0x208) from [] 
> (netlink_sendmsg+0x310/0x3c0)
> [] (netlink_sendmsg+0x310/0x3c0) from [] 
> (sock_sendmsg+0xa8/0xd0)
> [] (sock_sendmsg+0xa8/0xd0) from [] 
> (__sys_sendmsg+0x1d8/0x280)
> [] (__sys_sendmsg+0x1d8/0x280) from [] 
> (sys_sendmsg+0x44/0x68)
> [] (sys_sendmsg+0x44/0x68) from [] 
> (ret_fast_syscall+0x0/0x38)
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c 
> b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index 84c1326..67a3e78 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -943,7 +943,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, 
> int force)
>   struct netdev_queue *nq = netdev_get_tx_queue(mp->dev, txq->index);
>   int reclaimed;
>  
> - __netif_tx_lock(nq, smp_processor_id());
> + __netif_tx_lock_bh(nq);
>  
>   reclaimed = 0;
>   while (reclaimed < budget && txq->tx_desc_count > 0) {
> @@ -989,7 +989,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, 
> int force)
>   dev_kfree_skb(skb);
>   }
>  
> - __netif_tx_unlock(nq);
> + __netif_tx_unlock_bh(nq);
>  
>   if (reclaimed < budget)
>   mp->work_tx &= ~(1 << txq->index);
> -- 

Maybe I'm not reading it right, but I doubt that this is an actual
deadlock or that the patch is needed.

txq_reclaim() indeed doesn't disable BHs, but that's because it's
always called in BH context.  Almost always -- the only exception is
txq_deinit(), called from ->ndo_stop(), but by that time we've
already napi_disable()'d and netif_carrier_off()'d and free_irq()'d.

How to explain that to lockdep, 

Re: [PATCH] mv643xx_eth: Fix a possible deadlock upon ifdown

2013-01-04 Thread Lennert Buytenhek
On Fri, Jan 04, 2013 at 03:07:02PM +0100, Lubomir Rintel wrote:

 From: Lubomir Rintel lubo.rin...@gooddata.com
 
 =
 [ INFO: inconsistent lock state ]
 3.7.0-6.luboskovo.fc19.armv5tel.kirkwood #1 Tainted: GW
 -
 inconsistent {IN-SOFTIRQ-W} - {SOFTIRQ-ON-W} usage.
 NetworkManager/337 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (_xmit_ETHER#2){+.?...}, at: [bf07adfc] txq_reclaim+0x54/0x264 
 [mv643xx_eth]
 {IN-SOFTIRQ-W} state was registered at:
   [c0068480] __lock_acquire+0x5b4/0x17d0
   [c0069d7c] lock_acquire+0x160/0x1e0
   [c04f41a0] _raw_spin_lock+0x50/0x88
   [c0407178] sch_direct_xmit+0x4c/0x2d4
   [c03ec978] dev_queue_xmit+0x4b8/0x8d8
   [c0492dc8] ip6_finish_output2+0x350/0x42c
   [c04b7fd8] mld_sendpack+0x2d0/0x514
   [c04b8834] mld_ifc_timer_expire+0x228/0x278
   [c002afe8] call_timer_fn+0x140/0x33c
   [c002bbf0] run_timer_softirq+0x278/0x32c
   [c0024288] __do_softirq+0x16c/0x398
   [c002488c] irq_exit+0x5c/0xc0
   [c0009c64] handle_IRQ+0x6c/0x8c
   [c04f5218] __irq_svc+0x38/0x80
   [c0065684] lock_is_held+0x4/0x54
   [c004d5a0] __might_sleep+0x44/0x228
   [c04f25a4] down_read+0x28/0x88
   [c0263c94] __copy_to_user_memcpy+0xa8/0x140
   [c01374d0] seq_read+0x3ac/0x474
   [c011623c] vfs_read+0xac/0x184
   [c0116354] sys_read+0x40/0x6c
   [c0008cc0] ret_fast_syscall+0x0/0x38
 irq event stamp: 115119
 hardirqs last  enabled at (115119): [c04f4cf0] 
 _raw_spin_unlock_irqrestore+0x44/0x64
 hardirqs last disabled at (115118): [c04f430c] 
 _raw_spin_lock_irqsave+0x28/0xa0
 softirqs last  enabled at (114880): [c00243d4] __do_softirq+0x2b8/0x398
 softirqs last disabled at (114873): [c002488c] irq_exit+0x5c/0xc0
 
 other info that might help us debug this:
  Possible unsafe locking scenario:
 
CPU0

   lock(_xmit_ETHER#2);
   Interrupt
 lock(_xmit_ETHER#2);
 
  *** DEADLOCK ***
 
 1 lock held by NetworkManager/337:
  #0:  (rtnl_mutex){+.+.+.}, at: [c03fad04] rtnetlink_rcv+0x14/0x2c
 
 stack backtrace:
 [c000f5a8] (unwind_backtrace+0x0/0x124) from [c04ebea8] 
 (print_usage_bug.part.29+0x20c/0x26c)
 [c04ebea8] (print_usage_bug.part.29+0x20c/0x26c) from [c0067cc4] 
 (mark_lock+0x404/0x60c)
 [c0067cc4] (mark_lock+0x404/0x60c) from [c0068504] 
 (__lock_acquire+0x638/0x17d0)
 [c0068504] (__lock_acquire+0x638/0x17d0) from [c0069d7c] 
 (lock_acquire+0x160/0x1e0)
 [c0069d7c] (lock_acquire+0x160/0x1e0) from [c04f41a0] 
 (_raw_spin_lock+0x50/0x88)
 [c04f41a0] (_raw_spin_lock+0x50/0x88) from [bf07adfc] 
 (txq_reclaim+0x54/0x264 [mv643xx_eth])
 [bf07adfc] (txq_reclaim+0x54/0x264 [mv643xx_eth]) from [bf07b03c] 
 (txq_deinit+0x30/0xec [mv643xx_eth])
 [bf07b03c] (txq_deinit+0x30/0xec [mv643xx_eth]) from [bf07b21c] 
 (mv643xx_eth_stop+0x124/0x140 [mv643xx_eth])
 [bf07b21c] (mv643xx_eth_stop+0x124/0x140 [mv643xx_eth]) from [c03e8bbc] 
 (__dev_close_many+0xb0/0xec)
 [c03e8bbc] (__dev_close_many+0xb0/0xec) from [c03e8c28] 
 (__dev_close+0x30/0x44)
 [c03e8c28] (__dev_close+0x30/0x44) from [c03ed154] 
 (__dev_change_flags+0x94/0x120)
 [c03ed154] (__dev_change_flags+0x94/0x120) from [c03ed260] 
 (dev_change_flags+0x18/0x4c)
 [c03ed260] (dev_change_flags+0x18/0x4c) from [c03fb174] 
 (do_setlink+0x2cc/0x7ac)
 [c03fb174] (do_setlink+0x2cc/0x7ac) from [c03fc5ec] 
 (rtnl_newlink+0x26c/0x4a8)
 [c03fc5ec] (rtnl_newlink+0x26c/0x4a8) from [c03fc104] 
 (rtnetlink_rcv_msg+0x280/0x29c)
 [c03fc104] (rtnetlink_rcv_msg+0x280/0x29c) from [c041245c] 
 (netlink_rcv_skb+0x58/0xb4)
 [c041245c] (netlink_rcv_skb+0x58/0xb4) from [c03fad10] 
 (rtnetlink_rcv+0x20/0x2c)
 [c03fad10] (rtnetlink_rcv+0x20/0x2c) from [c0411dec] 
 (netlink_unicast+0x158/0x208)
 [c0411dec] (netlink_unicast+0x158/0x208) from [c0412254] 
 (netlink_sendmsg+0x310/0x3c0)
 [c0412254] (netlink_sendmsg+0x310/0x3c0) from [c03d209c] 
 (sock_sendmsg+0xa8/0xd0)
 [c03d209c] (sock_sendmsg+0xa8/0xd0) from [c03d2314] 
 (__sys_sendmsg+0x1d8/0x280)
 [c03d2314] (__sys_sendmsg+0x1d8/0x280) from [c03d4054] 
 (sys_sendmsg+0x44/0x68)
 [c03d4054] (sys_sendmsg+0x44/0x68) from [c0008cc0] 
 (ret_fast_syscall+0x0/0x38)
 ---
  drivers/net/ethernet/marvell/mv643xx_eth.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c 
 b/drivers/net/ethernet/marvell/mv643xx_eth.c
 index 84c1326..67a3e78 100644
 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
 +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
 @@ -943,7 +943,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, 
 int force)
   struct netdev_queue *nq = netdev_get_tx_queue(mp-dev, txq-index);
   int reclaimed;
  
 - __netif_tx_lock(nq, smp_processor_id());
 + __netif_tx_lock_bh(nq);
  
   reclaimed = 0;
   while (reclaimed  budget  txq-tx_desc_count  0) {
 @@ -989,7 +989,7 @@ static int txq_reclaim(struct tx_queue *txq, int budget, 
 int force)
   dev_kfree_skb(skb);
   }
  
 - __netif_tx_unlock(nq);
 + __netif_tx_unlock_bh(nq);
  
   if 

Re: futex local DoS on most architectures

2008-02-11 Thread Lennert Buytenhek
On Mon, Feb 11, 2008 at 02:59:34PM +0100, Thomas Gleixner wrote:

> Subject: futex: disable PI/robust on archs w/o valid implementation
> From: Thomas Gleixner <[EMAIL PROTECTED]>
> 
> We have to disable the complete PI/robust functionality for those
> archs, which do not implement futex_atomic_cmpxchg_inatomic(). The
> code in question relies on a valid implementation and does not expect
> -ENOSYS, which is returned by the stub implementation in
> asm-generic/futex.h
> 
> Pointed out by: Mikael Pettersson, Riku Voipio and Adrian Bunk

FWIW, I reported it originally.


> This patch is intended for easy backporting and needs to be cleaned up
> further for current mainline.
> 
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
> Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Looks good to me.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: futex local DoS on most architectures

2008-02-11 Thread Lennert Buytenhek
On Mon, Feb 11, 2008 at 02:59:34PM +0100, Thomas Gleixner wrote:

 Subject: futex: disable PI/robust on archs w/o valid implementation
 From: Thomas Gleixner [EMAIL PROTECTED]
 
 We have to disable the complete PI/robust functionality for those
 archs, which do not implement futex_atomic_cmpxchg_inatomic(). The
 code in question relies on a valid implementation and does not expect
 -ENOSYS, which is returned by the stub implementation in
 asm-generic/futex.h
 
 Pointed out by: Mikael Pettersson, Riku Voipio and Adrian Bunk

FWIW, I reported it originally.


 This patch is intended for easy backporting and needs to be cleaned up
 further for current mainline.
 
 Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
 Acked-by: Ingo Molnar [EMAIL PROTECTED]

Looks good to me.  Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] asm-arm/{arch-omap,arch-ixp23xx}: parentheses around NR_IRQS definition

2007-11-29 Thread Lennert Buytenhek
On Thu, Nov 29, 2007 at 02:07:06AM +0100, Roel Kluin wrote:

> in include/asm-arm/arch-omap/board-innovator.h:40
> #define NR_IRQSIH_BOARD_BASE + NR_FPGA_IRQS
> 
> in include/asm-arm/arch-ixp23xx/irqs.h:156:
> #define NR_IRQS  NR_IXP23XX_IRQS + NR_IXP23XX_MACH_IRQS
> 
> This could lead to problems when this definition is used in:
> 
> arch/ia64/sn/kernel/irq.c:516:
> sn_irq_lh = kmalloc(sizeof(struct list_head *) * NR_IRQS, GFP_KERNEL);
> arch/x86/kernel/io_apic_32.c:693:
> irq_cpu_data[i].irq_delta = kmalloc(sizeof(unsigned long) * NR_IRQS, 
> GFP_KERNEL);
> 694:
> irq_cpu_data[i].last_irq = kmalloc(sizeof(unsigned long) * NR_IRQS, 
> GFP_KERNEL);
> 699:
> memset(irq_cpu_data[i].irq_delta,0,sizeof(unsigned long) * NR_IRQS);
> 700:
> memset(irq_cpu_data[i].last_irq,0,sizeof(unsigned long) * NR_IRQS);
> fs/proc/proc_misc.c:464:
> per_irq_sum = kzalloc(sizeof(unsigned int)*NR_IRQS, GFP_KERNEL);
> 
> I am not sure whether this definition actually is used in any of these files.
> Am I being paranoya? anyway, adding parentheses should be safe.
> --
> Add parentheses to prevent operator precedence errors
> 
> Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>

Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>

Please send to the ARM patch system.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] asm-arm/{arch-omap,arch-ixp23xx}: parentheses around NR_IRQS definition

2007-11-29 Thread Lennert Buytenhek
On Thu, Nov 29, 2007 at 02:07:06AM +0100, Roel Kluin wrote:

 in include/asm-arm/arch-omap/board-innovator.h:40
 #define NR_IRQSIH_BOARD_BASE + NR_FPGA_IRQS
 
 in include/asm-arm/arch-ixp23xx/irqs.h:156:
 #define NR_IRQS  NR_IXP23XX_IRQS + NR_IXP23XX_MACH_IRQS
 
 This could lead to problems when this definition is used in:
 
 arch/ia64/sn/kernel/irq.c:516:
 sn_irq_lh = kmalloc(sizeof(struct list_head *) * NR_IRQS, GFP_KERNEL);
 arch/x86/kernel/io_apic_32.c:693:
 irq_cpu_data[i].irq_delta = kmalloc(sizeof(unsigned long) * NR_IRQS, 
 GFP_KERNEL);
 694:
 irq_cpu_data[i].last_irq = kmalloc(sizeof(unsigned long) * NR_IRQS, 
 GFP_KERNEL);
 699:
 memset(irq_cpu_data[i].irq_delta,0,sizeof(unsigned long) * NR_IRQS);
 700:
 memset(irq_cpu_data[i].last_irq,0,sizeof(unsigned long) * NR_IRQS);
 fs/proc/proc_misc.c:464:
 per_irq_sum = kzalloc(sizeof(unsigned int)*NR_IRQS, GFP_KERNEL);
 
 I am not sure whether this definition actually is used in any of these files.
 Am I being paranoya? anyway, adding parentheses should be safe.
 --
 Add parentheses to prevent operator precedence errors
 
 Signed-off-by: Roel Kluin [EMAIL PROTECTED]

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]

Please send to the ARM patch system.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] some overdue I2C driver removal

2007-11-28 Thread Lennert Buytenhek
On Wed, Nov 28, 2007 at 11:23:57AM +0100, Jean Delvare wrote:

> > > 6a8e0e37019c0ffeb0071fae30210baf2c3bdd75 
> > > diff --git a/Documentation/feature-removal-schedule.txt 
> > > b/Documentation/feature-removal-schedule.txt
> > > index 2af3835..9114379 100644
> > > --- a/Documentation/feature-removal-schedule.txt
> > > +++ b/Documentation/feature-removal-schedule.txt
> > > @@ -218,14 +218,6 @@ Who: Len Brown <[EMAIL PROTECTED]>
> > >  
> > >  ---
> > >  
> > > -What:i2c-ixp2000, i2c-ixp4xx and scx200_i2c drivers
> > > -When:September 2007
> > > -Why: Obsolete. The new i2c-gpio driver replaces all hardware-specific
> > > - I2C-over-GPIO drivers.
> > > -Who: Jean Delvare <[EMAIL PROTECTED]>
> > > -
> > > 
> > 
> > Did anyone ever write a driver that does i2c on the scx200 using the new
> > i2c-gpio method?  I sure haven't seen it, although I am still running
> > 2.6.18 (since that is what debian stable uses) on my scx200 systems.  I
> > certainly haven't found any useful instructions on how to use the new
> > i2c-gpio driver on an scx200.
> 
> There's no driver to write (i2c-gpio is the driver). Instead, the
> scx200 platform needs to implement the standard GPIO API. I don't think
> that it has happened yet, and last time this was discussed, someone
> (can't remember who, sorry), complained that the GPIO API "sucked" and
> that the scx200 platform would not be updated to use it. I didn't buy
> the claim due to a lack of argumentation and the fact that the GPIO
> infrastructure seems to work well enough for many other platforms.

I expressed some concerns about the GPIO API some time ago, but
that was in the context of ep93xx.


> As far as I can see, the ixp2000 platform also doesn't implement the
> standard GPIO API yet, so of the 3 drivers that are about to be
> removed, only i2c-ixp4xx can be removed without functionality loss at
> the moment. Lennert, Russell, are there any plans to convert the
> ixp2000 platform to use the generic GPIO layer?

No plans at this point from my side.

I'm not entirely sold on the GPIO API, but nevertheless I expect
that ixp2000 will be converted sooner rather than later.

If you don't want people to use i2c-ixp2000 as an example for new
code, maybe you can stick in a comment at the top of the file that
says that what i2c-ixp2000 does isn't the Preferred(TM) way of doing
it?


> Maybe I shouldn't have added this entry in feature-removal-schedule.txt
> in the first place: these drivers should ideally be dropped in favor of
> i2c-gpio, but it can only happen for platforms that implement the
> standard GPIO API. As I am not the one who will convert these
> platforms, and some of them might as well never be converted (I don't
> know how active they are nowadays), there's not much I can do.

Sorry, I seem to have missed the addition of i2c-ixp2000 to
feature-removal-schedule.txt.  I would have spoken up earlier if I
had been aware of it being in there.

It seems odd to me to advocate removal of something which works fine
and for which no functional replacement is available just because it
doesn't conform to the Latest And Greatest API(TM).  That's like
suggesting that Jan 1st 2009 we should rm -f all .c files that still
reference the big kernel lock, or rm -f all ARM ports that don't
use the clocksource/clockevent mechanism yet (which includes
ixp2000 as well.)

Let's just remove i2c-ixp2000 when the GPIO API conversion happens, ok?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] some overdue I2C driver removal

2007-11-28 Thread Lennert Buytenhek
On Wed, Nov 28, 2007 at 11:23:57AM +0100, Jean Delvare wrote:

   6a8e0e37019c0ffeb0071fae30210baf2c3bdd75 
   diff --git a/Documentation/feature-removal-schedule.txt 
   b/Documentation/feature-removal-schedule.txt
   index 2af3835..9114379 100644
   --- a/Documentation/feature-removal-schedule.txt
   +++ b/Documentation/feature-removal-schedule.txt
   @@ -218,14 +218,6 @@ Who: Len Brown [EMAIL PROTECTED]

---

   -What:i2c-ixp2000, i2c-ixp4xx and scx200_i2c drivers
   -When:September 2007
   -Why: Obsolete. The new i2c-gpio driver replaces all hardware-specific
   - I2C-over-GPIO drivers.
   -Who: Jean Delvare [EMAIL PROTECTED]
   -
   
  
  Did anyone ever write a driver that does i2c on the scx200 using the new
  i2c-gpio method?  I sure haven't seen it, although I am still running
  2.6.18 (since that is what debian stable uses) on my scx200 systems.  I
  certainly haven't found any useful instructions on how to use the new
  i2c-gpio driver on an scx200.
 
 There's no driver to write (i2c-gpio is the driver). Instead, the
 scx200 platform needs to implement the standard GPIO API. I don't think
 that it has happened yet, and last time this was discussed, someone
 (can't remember who, sorry), complained that the GPIO API sucked and
 that the scx200 platform would not be updated to use it. I didn't buy
 the claim due to a lack of argumentation and the fact that the GPIO
 infrastructure seems to work well enough for many other platforms.

I expressed some concerns about the GPIO API some time ago, but
that was in the context of ep93xx.


 As far as I can see, the ixp2000 platform also doesn't implement the
 standard GPIO API yet, so of the 3 drivers that are about to be
 removed, only i2c-ixp4xx can be removed without functionality loss at
 the moment. Lennert, Russell, are there any plans to convert the
 ixp2000 platform to use the generic GPIO layer?

No plans at this point from my side.

I'm not entirely sold on the GPIO API, but nevertheless I expect
that ixp2000 will be converted sooner rather than later.

If you don't want people to use i2c-ixp2000 as an example for new
code, maybe you can stick in a comment at the top of the file that
says that what i2c-ixp2000 does isn't the Preferred(TM) way of doing
it?


 Maybe I shouldn't have added this entry in feature-removal-schedule.txt
 in the first place: these drivers should ideally be dropped in favor of
 i2c-gpio, but it can only happen for platforms that implement the
 standard GPIO API. As I am not the one who will convert these
 platforms, and some of them might as well never be converted (I don't
 know how active they are nowadays), there's not much I can do.

Sorry, I seem to have missed the addition of i2c-ixp2000 to
feature-removal-schedule.txt.  I would have spoken up earlier if I
had been aware of it being in there.

It seems odd to me to advocate removal of something which works fine
and for which no functional replacement is available just because it
doesn't conform to the Latest And Greatest API(TM).  That's like
suggesting that Jan 1st 2009 we should rm -f all .c files that still
reference the big kernel lock, or rm -f all ARM ports that don't
use the clocksource/clockevent mechanism yet (which includes
ixp2000 as well.)

Let's just remove i2c-ixp2000 when the GPIO API conversion happens, ok?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/1] ixp23xx: fix some double includes

2007-11-05 Thread Lennert Buytenhek
On Mon, Nov 05, 2007 at 03:23:30PM +0100, Andre Haupt wrote:

> From: Andre Haupt <[EMAIL PROTECTED]>
> Signed-off-by: Andre Haupt <[EMAIL PROTECTED]>

Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/1] ixp23xx: fix some double includes

2007-11-05 Thread Lennert Buytenhek
On Mon, Nov 05, 2007 at 03:23:30PM +0100, Andre Haupt wrote:

 From: Andre Haupt [EMAIL PROTECTED]
 Signed-off-by: Andre Haupt [EMAIL PROTECTED]

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix breakage in pegasos_eth (fallout from commit b45d9147f1582333e180e1023624c003874b7312)

2007-10-28 Thread Lennert Buytenhek
On Sat, Oct 27, 2007 at 09:02:32PM +0100, Al Viro wrote:

> diff --git a/include/linux/mv643xx_eth.h b/include/linux/mv643xx_eth.h
> index 3f27239..8df230a 100644
> --- a/include/linux/mv643xx_eth.h
> +++ b/include/linux/mv643xx_eth.h
> @@ -8,6 +8,9 @@
>  #define MV643XX_ETH_NAME "mv643xx_eth"
>  #define MV643XX_ETH_SHARED_REGS  0x2000
>  #define MV643XX_ETH_SHARED_REGS_SIZE 0x2000
> +#define MV643XX_ETH_BAR_40x220
> +#define MV643XX_ETH_SIZE_REG_4   0x224
> +#define MV643XX_ETH_BASE_ADDR_ENABLE_REG 0x0290

Ideally, pegasos_eth shouldn't be poking around in mv643xx_eth
registers directly (but I'll put that on the TODO list to try
and fix later.)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix breakage in pegasos_eth (fallout from commit b45d9147f1582333e180e1023624c003874b7312)

2007-10-28 Thread Lennert Buytenhek
On Sat, Oct 27, 2007 at 09:02:32PM +0100, Al Viro wrote:

 diff --git a/include/linux/mv643xx_eth.h b/include/linux/mv643xx_eth.h
 index 3f27239..8df230a 100644
 --- a/include/linux/mv643xx_eth.h
 +++ b/include/linux/mv643xx_eth.h
 @@ -8,6 +8,9 @@
  #define MV643XX_ETH_NAME mv643xx_eth
  #define MV643XX_ETH_SHARED_REGS  0x2000
  #define MV643XX_ETH_SHARED_REGS_SIZE 0x2000
 +#define MV643XX_ETH_BAR_40x220
 +#define MV643XX_ETH_SIZE_REG_4   0x224
 +#define MV643XX_ETH_BASE_ADDR_ENABLE_REG 0x0290

Ideally, pegasos_eth shouldn't be poking around in mv643xx_eth
registers directly (but I'll put that on the TODO list to try
and fix later.)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Remove pointless casts from void pointers,

2007-10-26 Thread Lennert Buytenhek
On Fri, Oct 26, 2007 at 05:40:22AM -0400, Jeff Garzik wrote:

>  arch/arm/mach-pxa/ssp.c|2 +-
>  arch/arm/mach-s3c2410/usb-simtec.c |2 +-
>  arch/arm/plat-omap/mailbox.c   |2 +-

FWIW

Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ARM: Misc minor interrupt handler cleanups

2007-10-26 Thread Lennert Buytenhek
On Fri, Oct 26, 2007 at 05:40:25AM -0400, Jeff Garzik wrote:

> mach-integrator/pci_v3.c: no need to reference 'irq' arg, its constant
> 
> mach-omap1/pm.c: remove extra whitespace
> 
> arch/arm/mach-sa1100/ssp.c: remove braces around single C stmt
> 
> arch/arm/plat-omap/mcbsp.c:
>   - remove pointless casts from void*
>   - make longer lines more readable
> 
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

FWIW

Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>


> ---
>  arch/arm/mach-integrator/pci_v3.c |8 +---
>  arch/arm/mach-omap1/pm.c  |2 +-
>  arch/arm/mach-sa1100/ssp.c|3 +--
>  arch/arm/plat-omap/mcbsp.c|   20 
>  4 files changed, 19 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm/mach-integrator/pci_v3.c 
> b/arch/arm/mach-integrator/pci_v3.c
> index d4d8134..d55fa4e 100644
> --- a/arch/arm/mach-integrator/pci_v3.c
> +++ b/arch/arm/mach-integrator/pci_v3.c
> @@ -440,7 +440,7 @@ v3_pci_fault(unsigned long addr, unsigned int fsr, struct 
> pt_regs *regs)
>   return 1;
>  }
>  
> -static irqreturn_t v3_irq(int irq, void *devid)
> +static irqreturn_t v3_irq(int dummy, void *devid)
>  {
>  #ifdef CONFIG_DEBUG_LL
>   struct pt_regs *regs = get_irq_regs();
> @@ -448,8 +448,10 @@ static irqreturn_t v3_irq(int irq, void *devid)
>   unsigned long instr = *(unsigned long *)pc;
>   char buf[128];
>  
> - sprintf(buf, "V3 int %d: pc=0x%08lx [%08lx] LBFADDR=%08x LBFCODE=%02x 
> ISTAT=%02x\n", irq,
> - pc, instr, __raw_readl(SC_LBFADDR), __raw_readl(SC_LBFCODE) & 
> 255,
> + sprintf(buf, "V3 int %d: pc=0x%08lx [%08lx] LBFADDR=%08x LBFCODE=%02x "
> + "ISTAT=%02x\n", IRQ_AP_V3INT, pc, instr,
> + __raw_readl(SC_LBFADDR),
> + __raw_readl(SC_LBFCODE) & 255,
>   v3_readb(V3_LB_ISTAT));
>   printascii(buf);
>  #endif
> diff --git a/arch/arm/mach-omap1/pm.c b/arch/arm/mach-omap1/pm.c
> index 3bf01e2..94c299b 100644
> --- a/arch/arm/mach-omap1/pm.c
> +++ b/arch/arm/mach-omap1/pm.c
> @@ -646,7 +646,7 @@ static void omap_pm_finish(void)
>  }
>  
>  
> -static irqreturn_t  omap_wakeup_interrupt(int irq, void *dev)
> +static irqreturn_t omap_wakeup_interrupt(int irq, void *dev)
>  {
>   return IRQ_HANDLED;
>  }
> diff --git a/arch/arm/mach-sa1100/ssp.c b/arch/arm/mach-sa1100/ssp.c
> index 59703c6..06206ce 100644
> --- a/arch/arm/mach-sa1100/ssp.c
> +++ b/arch/arm/mach-sa1100/ssp.c
> @@ -29,9 +29,8 @@ static irqreturn_t ssp_interrupt(int irq, void *dev_id)
>  {
>   unsigned int status = Ser4SSSR;
>  
> - if (status & SSSR_ROR) {
> + if (status & SSSR_ROR)
>   printk(KERN_WARNING "SSP: receiver overrun\n");
> - }
>  
>   Ser4SSSR = SSSR_ROR;
>  
> diff --git a/arch/arm/plat-omap/mcbsp.c b/arch/arm/plat-omap/mcbsp.c
> index f7b9ccd..2af5bd5 100644
> --- a/arch/arm/plat-omap/mcbsp.c
> +++ b/arch/arm/plat-omap/mcbsp.c
> @@ -98,9 +98,10 @@ static void omap_mcbsp_dump_reg(u8 id)
>  
>  static irqreturn_t omap_mcbsp_tx_irq_handler(int irq, void *dev_id)
>  {
> - struct omap_mcbsp * mcbsp_tx = (struct omap_mcbsp *)(dev_id);
> + struct omap_mcbsp *mcbsp_tx = dev_id;
>  
> - DBG("TX IRQ callback : 0x%x\n", OMAP_MCBSP_READ(mcbsp_tx->io_base, 
> SPCR2));
> + DBG("TX IRQ callback : 0x%x\n",
> + OMAP_MCBSP_READ(mcbsp_tx->io_base, SPCR2));
>  
>   complete(_tx->tx_irq_completion);
>   return IRQ_HANDLED;
> @@ -108,9 +109,10 @@ static irqreturn_t omap_mcbsp_tx_irq_handler(int irq, 
> void *dev_id)
>  
>  static irqreturn_t omap_mcbsp_rx_irq_handler(int irq, void *dev_id)
>  {
> - struct omap_mcbsp * mcbsp_rx = (struct omap_mcbsp *)(dev_id);
> + struct omap_mcbsp *mcbsp_rx = dev_id;
>  
> - DBG("RX IRQ callback : 0x%x\n", OMAP_MCBSP_READ(mcbsp_rx->io_base, 
> SPCR2));
> + DBG("RX IRQ callback : 0x%x\n",
> + OMAP_MCBSP_READ(mcbsp_rx->io_base, SPCR2));
>  
>   complete(_rx->rx_irq_completion);
>   return IRQ_HANDLED;
> @@ -118,9 +120,10 @@ static irqreturn_t omap_mcbsp_rx_irq_handler(int irq, 
> void *dev_id)
>  
>  static void omap_mcbsp_tx_dma_callback(int lch, u16 ch_status, void *data)
>  {
> - struct omap_mcbsp * mcbsp_dma_tx = (struct omap_mcbsp *)(data);
> + struct omap_mcbsp *mcbsp_dma_tx = data;
>  
> - DBG("TX DMA callback : 0x%x\n", OMAP_MCBSP_READ(mcbsp_dma_tx->io_base, 
> SPCR2));
> + DBG("TX DMA callback : 0x%x\n",
> + OMAP_MCBSP_READ(m

Re: [PATCH] ARM: Misc minor interrupt handler cleanups

2007-10-26 Thread Lennert Buytenhek
On Fri, Oct 26, 2007 at 05:40:25AM -0400, Jeff Garzik wrote:

 mach-integrator/pci_v3.c: no need to reference 'irq' arg, its constant
 
 mach-omap1/pm.c: remove extra whitespace
 
 arch/arm/mach-sa1100/ssp.c: remove braces around single C stmt
 
 arch/arm/plat-omap/mcbsp.c:
   - remove pointless casts from void*
   - make longer lines more readable
 
 Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

FWIW

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]


 ---
  arch/arm/mach-integrator/pci_v3.c |8 +---
  arch/arm/mach-omap1/pm.c  |2 +-
  arch/arm/mach-sa1100/ssp.c|3 +--
  arch/arm/plat-omap/mcbsp.c|   20 
  4 files changed, 19 insertions(+), 14 deletions(-)
 
 diff --git a/arch/arm/mach-integrator/pci_v3.c 
 b/arch/arm/mach-integrator/pci_v3.c
 index d4d8134..d55fa4e 100644
 --- a/arch/arm/mach-integrator/pci_v3.c
 +++ b/arch/arm/mach-integrator/pci_v3.c
 @@ -440,7 +440,7 @@ v3_pci_fault(unsigned long addr, unsigned int fsr, struct 
 pt_regs *regs)
   return 1;
  }
  
 -static irqreturn_t v3_irq(int irq, void *devid)
 +static irqreturn_t v3_irq(int dummy, void *devid)
  {
  #ifdef CONFIG_DEBUG_LL
   struct pt_regs *regs = get_irq_regs();
 @@ -448,8 +448,10 @@ static irqreturn_t v3_irq(int irq, void *devid)
   unsigned long instr = *(unsigned long *)pc;
   char buf[128];
  
 - sprintf(buf, V3 int %d: pc=0x%08lx [%08lx] LBFADDR=%08x LBFCODE=%02x 
 ISTAT=%02x\n, irq,
 - pc, instr, __raw_readl(SC_LBFADDR), __raw_readl(SC_LBFCODE)  
 255,
 + sprintf(buf, V3 int %d: pc=0x%08lx [%08lx] LBFADDR=%08x LBFCODE=%02x 
 + ISTAT=%02x\n, IRQ_AP_V3INT, pc, instr,
 + __raw_readl(SC_LBFADDR),
 + __raw_readl(SC_LBFCODE)  255,
   v3_readb(V3_LB_ISTAT));
   printascii(buf);
  #endif
 diff --git a/arch/arm/mach-omap1/pm.c b/arch/arm/mach-omap1/pm.c
 index 3bf01e2..94c299b 100644
 --- a/arch/arm/mach-omap1/pm.c
 +++ b/arch/arm/mach-omap1/pm.c
 @@ -646,7 +646,7 @@ static void omap_pm_finish(void)
  }
  
  
 -static irqreturn_t  omap_wakeup_interrupt(int irq, void *dev)
 +static irqreturn_t omap_wakeup_interrupt(int irq, void *dev)
  {
   return IRQ_HANDLED;
  }
 diff --git a/arch/arm/mach-sa1100/ssp.c b/arch/arm/mach-sa1100/ssp.c
 index 59703c6..06206ce 100644
 --- a/arch/arm/mach-sa1100/ssp.c
 +++ b/arch/arm/mach-sa1100/ssp.c
 @@ -29,9 +29,8 @@ static irqreturn_t ssp_interrupt(int irq, void *dev_id)
  {
   unsigned int status = Ser4SSSR;
  
 - if (status  SSSR_ROR) {
 + if (status  SSSR_ROR)
   printk(KERN_WARNING SSP: receiver overrun\n);
 - }
  
   Ser4SSSR = SSSR_ROR;
  
 diff --git a/arch/arm/plat-omap/mcbsp.c b/arch/arm/plat-omap/mcbsp.c
 index f7b9ccd..2af5bd5 100644
 --- a/arch/arm/plat-omap/mcbsp.c
 +++ b/arch/arm/plat-omap/mcbsp.c
 @@ -98,9 +98,10 @@ static void omap_mcbsp_dump_reg(u8 id)
  
  static irqreturn_t omap_mcbsp_tx_irq_handler(int irq, void *dev_id)
  {
 - struct omap_mcbsp * mcbsp_tx = (struct omap_mcbsp *)(dev_id);
 + struct omap_mcbsp *mcbsp_tx = dev_id;
  
 - DBG(TX IRQ callback : 0x%x\n, OMAP_MCBSP_READ(mcbsp_tx-io_base, 
 SPCR2));
 + DBG(TX IRQ callback : 0x%x\n,
 + OMAP_MCBSP_READ(mcbsp_tx-io_base, SPCR2));
  
   complete(mcbsp_tx-tx_irq_completion);
   return IRQ_HANDLED;
 @@ -108,9 +109,10 @@ static irqreturn_t omap_mcbsp_tx_irq_handler(int irq, 
 void *dev_id)
  
  static irqreturn_t omap_mcbsp_rx_irq_handler(int irq, void *dev_id)
  {
 - struct omap_mcbsp * mcbsp_rx = (struct omap_mcbsp *)(dev_id);
 + struct omap_mcbsp *mcbsp_rx = dev_id;
  
 - DBG(RX IRQ callback : 0x%x\n, OMAP_MCBSP_READ(mcbsp_rx-io_base, 
 SPCR2));
 + DBG(RX IRQ callback : 0x%x\n,
 + OMAP_MCBSP_READ(mcbsp_rx-io_base, SPCR2));
  
   complete(mcbsp_rx-rx_irq_completion);
   return IRQ_HANDLED;
 @@ -118,9 +120,10 @@ static irqreturn_t omap_mcbsp_rx_irq_handler(int irq, 
 void *dev_id)
  
  static void omap_mcbsp_tx_dma_callback(int lch, u16 ch_status, void *data)
  {
 - struct omap_mcbsp * mcbsp_dma_tx = (struct omap_mcbsp *)(data);
 + struct omap_mcbsp *mcbsp_dma_tx = data;
  
 - DBG(TX DMA callback : 0x%x\n, OMAP_MCBSP_READ(mcbsp_dma_tx-io_base, 
 SPCR2));
 + DBG(TX DMA callback : 0x%x\n,
 + OMAP_MCBSP_READ(mcbsp_dma_tx-io_base, SPCR2));
  
   /* We can free the channels */
   omap_free_dma(mcbsp_dma_tx-dma_tx_lch);
 @@ -131,9 +134,10 @@ static void omap_mcbsp_tx_dma_callback(int lch, u16 
 ch_status, void *data)
  
  static void omap_mcbsp_rx_dma_callback(int lch, u16 ch_status, void *data)
  {
 - struct omap_mcbsp * mcbsp_dma_rx = (struct omap_mcbsp *)(data);
 + struct omap_mcbsp *mcbsp_dma_rx = data;
  
 - DBG(RX DMA callback : 0x%x\n, OMAP_MCBSP_READ(mcbsp_dma_rx-io_base, 
 SPCR2));
 + DBG(RX DMA callback : 0x%x\n,
 + OMAP_MCBSP_READ(mcbsp_dma_rx-io_base, SPCR2));
  
   /* We can free the channels */
   omap_free_dma

Re: [PATCH] Remove pointless casts from void pointers,

2007-10-26 Thread Lennert Buytenhek
On Fri, Oct 26, 2007 at 05:40:22AM -0400, Jeff Garzik wrote:

  arch/arm/mach-pxa/ssp.c|2 +-
  arch/arm/mach-s3c2410/usb-simtec.c |2 +-
  arch/arm/plat-omap/mailbox.c   |2 +-

FWIW

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] pci: get rid of pci_dev::{vendor,device}_compatible fields

2007-10-25 Thread Lennert Buytenhek
The vendor_compatible and device_compatible fields in struct
pci_dev aren't used anywhere, and are somewhat pointless.  Assuming
that these are historical artifacts, remove them.

Signed-off-by: Lennert Buytenhek <[EMAIL PROTECTED]>

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5d2281f..d503ef1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -66,7 +66,6 @@ enum pci_mmap_state {
 #define PCI_DMA_FROMDEVICE 2
 #define PCI_DMA_NONE   3
 
-#define DEVICE_COUNT_COMPATIBLE4
 #define DEVICE_COUNT_RESOURCE  12
 
 typedef int __bitwise pci_power_t;
@@ -159,10 +158,6 @@ struct pci_dev {
pci_channel_state_t error_state;/* current connectivity state */
struct  device  dev;/* Generic device interface */
 
-   /* device is compatible with these IDs */
-   unsigned short vendor_compatible[DEVICE_COUNT_COMPATIBLE];
-   unsigned short device_compatible[DEVICE_COUNT_COMPATIBLE];
-
int cfg_size;   /* Size of configuration space */
 
/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugme-new] [Bug 9217] New: CONFIG_CMDLINE doesn't pass to kernel

2007-10-25 Thread Lennert Buytenhek
On Wed, Oct 24, 2007 at 10:35:33PM -0500, Bill Gatliff wrote:

> >Something broke CONFIG_CMDLINE of ARM (at least) between 2.6.22 and 2.6.23.
> >
> >I don't know whether it was an ARM patch one of those kernel-wide changes. 
> >We have futzed with the command-line parsing a bit recently, but the 2.6.23
> >changelog doesn't suggest anything obvious.
> 
> What does the affected system's bootloader pass in r2?  If it's nonzero, 
> ARM's 2.6.23 may interpret it as being an ATAGS pointer.  And when that 
> happens, the system prefers the ATAGS over CONFIG_CMDLINE.
> 
> There's sanity checking in __vet_atags, but maybe it isn't enough.  
> Other than that, I can't see anything yet.

If this is the SHARP Shepherd (which seems to be the Zaurus SL-C750,
which looks like it's the one the submitter is using), it's not setting
boot_params at all:

MACHINE_START(SHEPHERD, "SHARP Shepherd")
.phys_io= 0x4000,
.io_pg_offst= (io_p2v(0x4000) >> 18) & 0xfffc,
.fixup  = fixup_corgi,
.map_io = pxa_map_io,
.init_irq   = pxa25x_init_irq,
.init_machine   = corgi_init,
.timer  = _timer,
MACHINE_END
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bugme-new] [Bug 9217] New: CONFIG_CMDLINE doesn't pass to kernel

2007-10-25 Thread Lennert Buytenhek
On Wed, Oct 24, 2007 at 10:35:33PM -0500, Bill Gatliff wrote:

 Something broke CONFIG_CMDLINE of ARM (at least) between 2.6.22 and 2.6.23.
 
 I don't know whether it was an ARM patch one of those kernel-wide changes. 
 We have futzed with the command-line parsing a bit recently, but the 2.6.23
 changelog doesn't suggest anything obvious.
 
 What does the affected system's bootloader pass in r2?  If it's nonzero, 
 ARM's 2.6.23 may interpret it as being an ATAGS pointer.  And when that 
 happens, the system prefers the ATAGS over CONFIG_CMDLINE.
 
 There's sanity checking in __vet_atags, but maybe it isn't enough.  
 Other than that, I can't see anything yet.

If this is the SHARP Shepherd (which seems to be the Zaurus SL-C750,
which looks like it's the one the submitter is using), it's not setting
boot_params at all:

MACHINE_START(SHEPHERD, SHARP Shepherd)
.phys_io= 0x4000,
.io_pg_offst= (io_p2v(0x4000)  18)  0xfffc,
.fixup  = fixup_corgi,
.map_io = pxa_map_io,
.init_irq   = pxa25x_init_irq,
.init_machine   = corgi_init,
.timer  = pxa_timer,
MACHINE_END
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] pci: get rid of pci_dev::{vendor,device}_compatible fields

2007-10-25 Thread Lennert Buytenhek
The vendor_compatible and device_compatible fields in struct
pci_dev aren't used anywhere, and are somewhat pointless.  Assuming
that these are historical artifacts, remove them.

Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5d2281f..d503ef1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -66,7 +66,6 @@ enum pci_mmap_state {
 #define PCI_DMA_FROMDEVICE 2
 #define PCI_DMA_NONE   3
 
-#define DEVICE_COUNT_COMPATIBLE4
 #define DEVICE_COUNT_RESOURCE  12
 
 typedef int __bitwise pci_power_t;
@@ -159,10 +158,6 @@ struct pci_dev {
pci_channel_state_t error_state;/* current connectivity state */
struct  device  dev;/* Generic device interface */
 
-   /* device is compatible with these IDs */
-   unsigned short vendor_compatible[DEVICE_COUNT_COMPATIBLE];
-   unsigned short device_compatible[DEVICE_COUNT_COMPATIBLE];
-
int cfg_size;   /* Size of configuration space */
 
/*
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add missing KERN_ to multiline printk(KERN_...)

2007-08-09 Thread Lennert Buytenhek
On Wed, Aug 08, 2007 at 05:28:46PM -0700, Joe Perches wrote:

> Some uses of printk are missing KERN_ on the second
> and subsequent lines.

I'm not the ARM maintainer, but for what it's worth,
the arch/arm/kernel/ecard.c change looks fine to me.

Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add missing KERN_level to multiline printk(KERN_level...)

2007-08-09 Thread Lennert Buytenhek
On Wed, Aug 08, 2007 at 05:28:46PM -0700, Joe Perches wrote:

 Some uses of printk are missing KERN_level on the second
 and subsequent lines.

I'm not the ARM maintainer, but for what it's worth,
the arch/arm/kernel/ecard.c change looks fine to me.

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Lennert Buytenhek
On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:

> From: Chris Snook <[EMAIL PROTECTED]>
> 
> Some architectures currently do not declare the contents of an atomic_t to be
> volatile.  This causes confusion since atomic_read() might not actually read
> anything if an optimizing compiler re-uses a value stored in a register, which
> can break code that loops until something external changes the value of an
> atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
> of all registers used in the loop, thus hurting performance instead of helping
> it, particularly on architectures where it's unnecessary.  Since we generally
> want to re-read the contents of an atomic variable on every access anyway,
> let's standardize the behavior across all architectures and avoid the
> performance and correctness problems of requiring the use of barrier() in
> loops that expect atomic_t variables to change externally.  This is relevant
> even on non-smp architectures, since drivers may use atomic operations in
> interrupt handlers.
> 
> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>

Documentation/atomic_ops.txt would need updating:

[...]

One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] make atomic_t volatile on all architectures

2007-08-08 Thread Lennert Buytenhek
On Wed, Aug 08, 2007 at 07:07:33PM -0400, Chris Snook wrote:

 From: Chris Snook [EMAIL PROTECTED]
 
 Some architectures currently do not declare the contents of an atomic_t to be
 volatile.  This causes confusion since atomic_read() might not actually read
 anything if an optimizing compiler re-uses a value stored in a register, which
 can break code that loops until something external changes the value of an
 atomic_t.  Avoiding such bugs requires using barrier(), which causes re-loads
 of all registers used in the loop, thus hurting performance instead of helping
 it, particularly on architectures where it's unnecessary.  Since we generally
 want to re-read the contents of an atomic variable on every access anyway,
 let's standardize the behavior across all architectures and avoid the
 performance and correctness problems of requiring the use of barrier() in
 loops that expect atomic_t variables to change externally.  This is relevant
 even on non-smp architectures, since drivers may use atomic operations in
 interrupt handlers.
 
 Signed-off-by: Chris Snook [EMAIL PROTECTED]

Documentation/atomic_ops.txt would need updating:

[...]

One very important aspect of these two routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Lennert Buytenhek
On Thu, Aug 02, 2007 at 02:06:27AM +0200, Mikael Pettersson wrote:

> > > @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
> > >  static inline int
> > >  futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
> > >  {
> > > +#ifdef CONFIG_SMP
> > >   return -ENOSYS;
> > > +#else
> > 
> > Since the callers of futex_atomic_cmpxchg_inatomic() don't really
> > seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
> > infinite loop when it gets -ENOSYS), it seems better never to
> > return -ENOSYS from this function at all.
> > 
> > What if you just stick an #error in here in the SMP case?
> 
> The problem with #error is that it would cause compile-time
> regressions. I assume that e.g. alpha supports building SMP
> kernels, but #error would prevent that.
> 
> Thus I opted to fix the UP case while leaving the SMP case
> unchanged. Actually I think the SMP case should be a BUG()
> rather than -ENOSYS,

Probably.  Or handle -ENOSYS in the callers -- but that's more
work, and would cease to be necessary once everyone implements
futex_atomic_cmpxchg_inatomic().


> but that's a different issue from the UP case which I really do
> want to see fixed.

ACK.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Lennert Buytenhek
On Thu, Aug 02, 2007 at 01:00:21AM +0200, Mikael Pettersson wrote:

> @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
>  static inline int
>  futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
>  {
> +#ifdef CONFIG_SMP
>   return -ENOSYS;
> +#else

Since the callers of futex_atomic_cmpxchg_inatomic() don't really
seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
infinite loop when it gets -ENOSYS), it seems better never to
return -ENOSYS from this function at all.

What if you just stick an #error in here in the SMP case?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Lennert Buytenhek
On Thu, Aug 02, 2007 at 01:00:21AM +0200, Mikael Pettersson wrote:

 @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
  static inline int
  futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
  {
 +#ifdef CONFIG_SMP
   return -ENOSYS;
 +#else

Since the callers of futex_atomic_cmpxchg_inatomic() don't really
seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
infinite loop when it gets -ENOSYS), it seems better never to
return -ENOSYS from this function at all.

What if you just stick an #error in here in the SMP case?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Lennert Buytenhek
On Thu, Aug 02, 2007 at 02:06:27AM +0200, Mikael Pettersson wrote:

   @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
static inline int
futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
{
   +#ifdef CONFIG_SMP
 return -ENOSYS;
   +#else
  
  Since the callers of futex_atomic_cmpxchg_inatomic() don't really
  seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
  infinite loop when it gets -ENOSYS), it seems better never to
  return -ENOSYS from this function at all.
  
  What if you just stick an #error in here in the SMP case?
 
 The problem with #error is that it would cause compile-time
 regressions. I assume that e.g. alpha supports building SMP
 kernels, but #error would prevent that.
 
 Thus I opted to fix the UP case while leaving the SMP case
 unchanged. Actually I think the SMP case should be a BUG()
 rather than -ENOSYS,

Probably.  Or handle -ENOSYS in the callers -- but that's more
work, and would cease to be necessary once everyone implements
futex_atomic_cmpxchg_inatomic().


 but that's a different issue from the UP case which I really do
 want to see fixed.

ACK.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] EP93XX_ETH must select MII

2007-07-13 Thread Lennert Buytenhek
On Fri, Jul 13, 2007 at 02:12:08AM +0200, Adrian Bunk wrote:

> From: John Donoghue <[EMAIL PROTECTED]>
> 
> CONFIG_EP93XX_ETH=y, CONFIG_MII=n results in an obvious link error.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] EP93XX_ETH must select MII

2007-07-13 Thread Lennert Buytenhek
On Fri, Jul 13, 2007 at 02:12:08AM +0200, Adrian Bunk wrote:

 From: John Donoghue [EMAIL PROTECTED]
 
 CONFIG_EP93XX_ETH=y, CONFIG_MII=n results in an obvious link error.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Acked-by: Lennert Buytenhek [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


handle_futex_death() infinite loop on ARM (fwd)

2007-07-04 Thread Lennert Buytenhek
- Forwarded message from Lennert Buytenhek <[EMAIL PROTECTED]> -

Date: Wed, 13 Jun 2007 20:40:40 +0200
From: Lennert Buytenhek <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: handle_futex_death() infinite loop
User-Agent: Mutt/1.4.1i

Hi,

If you're also running into glibc's tst-robust1 test suite test
locking up your ARM machine, you're probably running into the fact
that asm-arm/futex.h includes asm-generic/futex.h, and
asm-generic/futex.h defines futex_atomic_cmpxchg_inatomic() to
return -ENOSYS.  This causes handle_futex_death() to loop forever.


cheers,
Lennert

- End forwarded message -
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


handle_futex_death() infinite loop on ARM (fwd)

2007-07-04 Thread Lennert Buytenhek
- Forwarded message from Lennert Buytenhek [EMAIL PROTECTED] -

Date: Wed, 13 Jun 2007 20:40:40 +0200
From: Lennert Buytenhek [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: handle_futex_death() infinite loop
User-Agent: Mutt/1.4.1i

Hi,

If you're also running into glibc's tst-robust1 test suite test
locking up your ARM machine, you're probably running into the fact
that asm-arm/futex.h includes asm-generic/futex.h, and
asm-generic/futex.h defines futex_atomic_cmpxchg_inatomic() to
return -ENOSYS.  This causes handle_futex_death() to loop forever.


cheers,
Lennert

- End forwarded message -
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-07-01 Thread Lennert Buytenhek
On Sun, Jul 01, 2007 at 12:23:16PM +0200, Michael Buesch wrote:

> > More or less.  You can't add the resistances like that, since the
> > bus isolation chip buffers the IDSEL signal, but it is correct that
> > if the host's IDSEL resistor is larger than a certain value, the
> > combination of the resistive coupling of IDSEL plus the extra buffer
> > in the isolator might be causing the IDSEL input on the 'guest' PCI
> > board to assert too late (or not assert at all), causing config
> > accesses to fail.
> > 
> > (This also depends on the specific 'guest' PCI board used, as you
> > noted, due to differing IDSEL trace lengths/capacitances and input
> > pin capacitances on different PCI boards.  Also, it might work at
> > 33 MHz but not work at 66 MHz, etc.)
> 
> It doesn't work on any of my boards :(

What extender board is this?  Do you have docs/schematics?
And what motherboard brand/type?


> > If you feel adventurous, you could try to hack around this by
> > figuring out which AD[31:16] line this PCI slot's IDSEL line is
> > resistively coupled to (depends on the slot), and then adding
> > another parallel resistor on the board itself to make the bus
> > isolator's input buffer charge faster.  Note that this does
> > increase the load on that specific AD[] line, which might cause
> > other funny effects.
> 
> Well, but how to find out to which address line it's connected to?
> Pretty hard to follow the PCB traces, especially since it's
> multilayered.

Actually, the IDSEL resistor would be on the computer's
motherboard, not on the PCI board.  And to which address line
the IDSEL line is connected depends on which PCI slot on the
motherboard you're looking at.

A multimeter should do the trick, but I would advise against this
if you're not totally comfortable with hacking hardware.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PXA27x UDC driver.

2007-07-01 Thread Lennert Buytenhek
On Thu, Jun 28, 2007 at 12:36:20PM +0200, Rodolfo Giometti wrote:

> attached you can find new version of my patch for PXA27x UDC driver
> support against kernel 2.6.22-rc3 (but it applies also ro rc6).
> 
> I'd like to know what I have to do in order to prepare this patch for
> kernel inclusion. It's time PXA27x has its UDC driver into linux! :)

Aren't there about 300 different versions of the PXA27x gadget
driver out there?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PXA27x UDC driver.

2007-07-01 Thread Lennert Buytenhek
On Thu, Jun 28, 2007 at 12:36:20PM +0200, Rodolfo Giometti wrote:

 attached you can find new version of my patch for PXA27x UDC driver
 support against kernel 2.6.22-rc3 (but it applies also ro rc6).
 
 I'd like to know what I have to do in order to prepare this patch for
 kernel inclusion. It's time PXA27x has its UDC driver into linux! :)

Aren't there about 300 different versions of the PXA27x gadget
driver out there?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-07-01 Thread Lennert Buytenhek
On Sun, Jul 01, 2007 at 12:23:16PM +0200, Michael Buesch wrote:

  More or less.  You can't add the resistances like that, since the
  bus isolation chip buffers the IDSEL signal, but it is correct that
  if the host's IDSEL resistor is larger than a certain value, the
  combination of the resistive coupling of IDSEL plus the extra buffer
  in the isolator might be causing the IDSEL input on the 'guest' PCI
  board to assert too late (or not assert at all), causing config
  accesses to fail.
  
  (This also depends on the specific 'guest' PCI board used, as you
  noted, due to differing IDSEL trace lengths/capacitances and input
  pin capacitances on different PCI boards.  Also, it might work at
  33 MHz but not work at 66 MHz, etc.)
 
 It doesn't work on any of my boards :(

What extender board is this?  Do you have docs/schematics?
And what motherboard brand/type?


  If you feel adventurous, you could try to hack around this by
  figuring out which AD[31:16] line this PCI slot's IDSEL line is
  resistively coupled to (depends on the slot), and then adding
  another parallel resistor on the board itself to make the bus
  isolator's input buffer charge faster.  Note that this does
  increase the load on that specific AD[] line, which might cause
  other funny effects.
 
 Well, but how to find out to which address line it's connected to?
 Pretty hard to follow the PCB traces, especially since it's
 multilayered.

Actually, the IDSEL resistor would be on the computer's
motherboard, not on the PCI board.  And to which address line
the IDSEL line is connected depends on which PCI slot on the
motherboard you're looking at.

A multimeter should do the trick, but I would advise against this
if you're not totally comfortable with hacking hardware.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek
On Sun, Jul 01, 2007 at 12:24:40AM +0200, Michael Buesch wrote:

> > > Hm, I was going to measure the real power advantage with a
> > > PCI-extender card. But my B44B0 card doesn't seem to work in
> > > that extender card. It works perfectly fine sticked directly into
> > > the motherboard, though, and other cards like a BCM4318 work in
> > > the extender, too.
> > > Not sure what this is.
> > > The extender has an application note about nonworking cards in the
> > > extender and a too big resistor on the board IDSEL pin being the
> > > cause of this.
> > 
> > Does the card show up in lspci at all?
> 
> No it doesn't.

Right, so it sounds like it might be this issue.


> > Does the extender board have a PCI-PCI bridge on it?  (If not,
> > there's not really any reason to resistively couple the IDSEL
> > line to the host, since the host should take care of that.)
> 
> There's no bridge. It just decouples all voltage lines, so you can
> drive it from external supply and/or measure voltages and current.
> On the PCB it looks like the the IDSEL line is rather directly
> routed to the host IDSEL. It just goes through one of the bus
> isolation chips. So I guess (just my guess) that this chip has some
> resistance and if the total resistance of the chip + the IDSEL
> resistor on the mainboard goes above some threshold it doesn't work
> anymore for some cards. In the application note they write
> about trouble for IDSEL resistors >51ohms.

More or less.  You can't add the resistances like that, since the
bus isolation chip buffers the IDSEL signal, but it is correct that
if the host's IDSEL resistor is larger than a certain value, the
combination of the resistive coupling of IDSEL plus the extra buffer
in the isolator might be causing the IDSEL input on the 'guest' PCI
board to assert too late (or not assert at all), causing config
accesses to fail.

(This also depends on the specific 'guest' PCI board used, as you
noted, due to differing IDSEL trace lengths/capacitances and input
pin capacitances on different PCI boards.  Also, it might work at
33 MHz but not work at 66 MHz, etc.)

If you feel adventurous, you could try to hack around this by
figuring out which AD[31:16] line this PCI slot's IDSEL line is
resistively coupled to (depends on the slot), and then adding
another parallel resistor on the board itself to make the bus
isolator's input buffer charge faster.  Note that this does
increase the load on that specific AD[] line, which might cause
other funny effects.


> > > Maybe I can try with another machine tomorrow.
> > 
> > That would only make a difference if there is no PCI-PCI bridge on
> > the extender board.
> 
> Well, they suggest it in the application note as a possible fix. ;)

The bus isolation chip doesn't count as a PCI-PCI bridge. :)  I'm just
saying that you wouldn't see the issue you are seeing now if the
extender board had a real PCI-PCI bridge on it, since in that case the
type 0 config access to the guest PCI board would be generated by the
bridge instead of by the host.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek
On Sat, Jun 30, 2007 at 11:53:25PM +0200, Michael Buesch wrote:

> > When the interface is down (or driver removed), the BroadCom 44xx card 
> > remains
> > powered on, and both its MAC and PHY is using up power.
> > This patch makes the driver issue a MAC_CTRL_PHY_PDOWN when the interface
> > is halted, and does a partial chip reset turns off the activity LEDs too.
> > 
> > Applies to 2.6.22-rc6, or current git head.
> > 
> > Tested on a Broadcom BCM4401-B0 card, it saves ~0.5W (measured using 
> > powertop).
> 
> Hm, I was going to measure the real power advantage with a
> PCI-extender card. But my B44B0 card doesn't seem to work in
> that extender card. It works perfectly fine sticked directly into
> the motherboard, though, and other cards like a BCM4318 work in
> the extender, too.
> Not sure what this is.
> The extender has an application note about nonworking cards in the
> extender and a too big resistor on the board IDSEL pin being the
> cause of this.

Does the card show up in lspci at all?  IDSEL drive strength
issues should only affect config space accesses.

Does the extender board have a PCI-PCI bridge on it?  (If not,
there's not really any reason to resistively couple the IDSEL
line to the host, since the host should take care of that.)


> Maybe I can try with another machine tomorrow.

That would only make a difference if there is no PCI-PCI bridge on
the extender board.

If the extender resistively couples the host's IDSEL line, you might
see different results on a different host bridge, since different
host bridges can use different numbers of IDSEL stepping cycles.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek
On Sat, Jun 30, 2007 at 04:19:23PM +0100, Matthew Garrett wrote:

> I'd agree that there's a need for a state where we power down as much as 
> possible (even at the cost of functionality), but where possible it 
> would also be nice to offer a state where the mac is powered down and 
> the phy left up.

There are PHYs which can detect that someone's on the other end even
when powered down..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek
On Sat, Jun 30, 2007 at 04:19:23PM +0100, Matthew Garrett wrote:

 I'd agree that there's a need for a state where we power down as much as 
 possible (even at the cost of functionality), but where possible it 
 would also be nice to offer a state where the mac is powered down and 
 the phy left up.

There are PHYs which can detect that someone's on the other end even
when powered down..
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek
On Sat, Jun 30, 2007 at 11:53:25PM +0200, Michael Buesch wrote:

  When the interface is down (or driver removed), the BroadCom 44xx card 
  remains
  powered on, and both its MAC and PHY is using up power.
  This patch makes the driver issue a MAC_CTRL_PHY_PDOWN when the interface
  is halted, and does a partial chip reset turns off the activity LEDs too.
  
  Applies to 2.6.22-rc6, or current git head.
  
  Tested on a Broadcom BCM4401-B0 card, it saves ~0.5W (measured using 
  powertop).
 
 Hm, I was going to measure the real power advantage with a
 PCI-extender card. But my B44B0 card doesn't seem to work in
 that extender card. It works perfectly fine sticked directly into
 the motherboard, though, and other cards like a BCM4318 work in
 the extender, too.
 Not sure what this is.
 The extender has an application note about nonworking cards in the
 extender and a too big resistor on the board IDSEL pin being the
 cause of this.

Does the card show up in lspci at all?  IDSEL drive strength
issues should only affect config space accesses.

Does the extender board have a PCI-PCI bridge on it?  (If not,
there's not really any reason to resistively couple the IDSEL
line to the host, since the host should take care of that.)


 Maybe I can try with another machine tomorrow.

That would only make a difference if there is no PCI-PCI bridge on
the extender board.

If the extender resistively couples the host's IDSEL line, you might
see different results on a different host bridge, since different
host bridges can use different numbers of IDSEL stepping cycles.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] b44: power down PHY when interface down

2007-06-30 Thread Lennert Buytenhek
On Sun, Jul 01, 2007 at 12:24:40AM +0200, Michael Buesch wrote:

   Hm, I was going to measure the real power advantage with a
   PCI-extender card. But my B44B0 card doesn't seem to work in
   that extender card. It works perfectly fine sticked directly into
   the motherboard, though, and other cards like a BCM4318 work in
   the extender, too.
   Not sure what this is.
   The extender has an application note about nonworking cards in the
   extender and a too big resistor on the board IDSEL pin being the
   cause of this.
  
  Does the card show up in lspci at all?
 
 No it doesn't.

Right, so it sounds like it might be this issue.


  Does the extender board have a PCI-PCI bridge on it?  (If not,
  there's not really any reason to resistively couple the IDSEL
  line to the host, since the host should take care of that.)
 
 There's no bridge. It just decouples all voltage lines, so you can
 drive it from external supply and/or measure voltages and current.
 On the PCB it looks like the the IDSEL line is rather directly
 routed to the host IDSEL. It just goes through one of the bus
 isolation chips. So I guess (just my guess) that this chip has some
 resistance and if the total resistance of the chip + the IDSEL
 resistor on the mainboard goes above some threshold it doesn't work
 anymore for some cards. In the application note they write
 about trouble for IDSEL resistors 51ohms.

More or less.  You can't add the resistances like that, since the
bus isolation chip buffers the IDSEL signal, but it is correct that
if the host's IDSEL resistor is larger than a certain value, the
combination of the resistive coupling of IDSEL plus the extra buffer
in the isolator might be causing the IDSEL input on the 'guest' PCI
board to assert too late (or not assert at all), causing config
accesses to fail.

(This also depends on the specific 'guest' PCI board used, as you
noted, due to differing IDSEL trace lengths/capacitances and input
pin capacitances on different PCI boards.  Also, it might work at
33 MHz but not work at 66 MHz, etc.)

If you feel adventurous, you could try to hack around this by
figuring out which AD[31:16] line this PCI slot's IDSEL line is
resistively coupled to (depends on the slot), and then adding
another parallel resistor on the board itself to make the bus
isolator's input buffer charge faster.  Note that this does
increase the load on that specific AD[] line, which might cause
other funny effects.


   Maybe I can try with another machine tomorrow.
  
  That would only make a difference if there is no PCI-PCI bridge on
  the extender board.
 
 Well, they suggest it in the application note as a possible fix. ;)

The bus isolation chip doesn't count as a PCI-PCI bridge. :)  I'm just
saying that you wouldn't see the issue you are seeing now if the
extender board had a real PCI-PCI bridge on it, since in that case the
type 0 config access to the guest PCI board would be generated by the
bridge instead of by the host.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek
On Wed, May 16, 2007 at 09:05:18PM +0930, Rod Whitby wrote:

> >> So, if the author of these patches wishes to concentrate on big-endian
> >> support first, then we will not say (and have not said) anything which
> >> will block inclusion of a big-endian only version of this driver.
> > 
> > The NSLU2 people are the ones here that are saying that the driver
> > should really support LE (because that is what they happen to be
> > using, the rest of the world runs the ixp4xx in BE)
> 
> I'll repeat again.  NSLU2-Linux supports both BE and LE.  We have
> about 5,000 users running BE and about 5,000 users running LE.

Perhaps, but somehow I don't think that we'd have seen any reaction
if the submitted driver had only supported LE and not BE.


> > Please just write the patch and let's get this over with.
> 
> Please let's just stop arguing about it.  If a patch appears before
> it gets merged, then great.  If it doesn't then it will appear at a
> later date.

Great.  I agree.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek
On Wed, May 16, 2007 at 08:16:38PM +0930, Rod Whitby wrote:

> So, if the author of these patches wishes to concentrate on big-endian
> support first, then we will not say (and have not said) anything which
> will block inclusion of a big-endian only version of this driver.

The NSLU2 people are the ones here that are saying that the driver
should really support LE (because that is what they happen to be
using, the rest of the world runs the ixp4xx in BE), and they keep
saying that it would be so easy to make a patch to add LE support,
but so far they haven't produced such a patch.

Please just write the patch and let's get this over with.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek
On Wed, May 16, 2007 at 08:13:01AM +0100, Christoph Hellwig wrote:

> > >>+#ifndef __ARMEB__
> > >>+#warning Little endian mode not supported
> > >>+#endif
> > >
> > >Personally I'm less fussed about WAN / LE support. Anyone with any  
> > >sense will run ixp4xx boards doing such a specialised network  
> > >operation as BE. Also, NSLU2-Linux can't test this functionality with  
> > >our LE setup as we don't have this hardware on-board. You may just  
> > >want to declare a depends on ARMEB in Kconfig (with or without OR  
> > >(ARM || BROKEN) ) and have done with it - it's up to you.
> > 
> > Christian Hohnstaedt's work did support LE though.
> > 
> > Not all ixp4xx boards are by definition "doing such a specialised 
> > network operation".
> > 
> > Krzysztof, why is LE not supported?
> > 
> > Do you need access to ixp4xx that starts in LE mode?
> 
> Not even trying to support LE is a clear merge blocker.  Maybe
> Krzysztof can't actually test it himself, which is fine - but
> not even pretending to be endian clean is not what proper Linux
> drivers do.

The issue is not that the driver is not 'endian clean'.

This is a driver for an on-chip ethernet MAC on an ARM CPU.  I.e. the
ethernet MAC is on the CPU itself, it's not some kind of PCI device or
something like that.  The ARM CPU in question can be run in either
little endian or big endian mode.  Making a driver work in both modes
of operation is generally not just an issue of adding a couple of
be32_to_cpu()s in the right places.

For example, intel IXP2000 and IXP23xx CPU support in arch/arm only
supports big-endian mode of operation, and none of the associated
drivers support little-endian mode.

Most of the other CPU support in arch/arm only supports
little-endian mode, and none of the associated drivers support
big-endian mode.  According to your criterion, that would mean that
most of the ARM drivers (alsa, usb, framebuffer, networking, etc.)
should never have been accepted in the kernel tree in the first place.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek
On Wed, May 16, 2007 at 08:13:01AM +0100, Christoph Hellwig wrote:

  +#ifndef __ARMEB__
  +#warning Little endian mode not supported
  +#endif
  
  Personally I'm less fussed about WAN / LE support. Anyone with any  
  sense will run ixp4xx boards doing such a specialised network  
  operation as BE. Also, NSLU2-Linux can't test this functionality with  
  our LE setup as we don't have this hardware on-board. You may just  
  want to declare a depends on ARMEB in Kconfig (with or without OR  
  (ARM || BROKEN) ) and have done with it - it's up to you.
  
  Christian Hohnstaedt's work did support LE though.
  
  Not all ixp4xx boards are by definition doing such a specialised 
  network operation.
  
  Krzysztof, why is LE not supported?
  
  Do you need access to ixp4xx that starts in LE mode?
 
 Not even trying to support LE is a clear merge blocker.  Maybe
 Krzysztof can't actually test it himself, which is fine - but
 not even pretending to be endian clean is not what proper Linux
 drivers do.

The issue is not that the driver is not 'endian clean'.

This is a driver for an on-chip ethernet MAC on an ARM CPU.  I.e. the
ethernet MAC is on the CPU itself, it's not some kind of PCI device or
something like that.  The ARM CPU in question can be run in either
little endian or big endian mode.  Making a driver work in both modes
of operation is generally not just an issue of adding a couple of
be32_to_cpu()s in the right places.

For example, intel IXP2000 and IXP23xx CPU support in arch/arm only
supports big-endian mode of operation, and none of the associated
drivers support little-endian mode.

Most of the other CPU support in arch/arm only supports
little-endian mode, and none of the associated drivers support
big-endian mode.  According to your criterion, that would mean that
most of the ARM drivers (alsa, usb, framebuffer, networking, etc.)
should never have been accepted in the kernel tree in the first place.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek
On Wed, May 16, 2007 at 08:16:38PM +0930, Rod Whitby wrote:

 So, if the author of these patches wishes to concentrate on big-endian
 support first, then we will not say (and have not said) anything which
 will block inclusion of a big-endian only version of this driver.

The NSLU2 people are the ones here that are saying that the driver
should really support LE (because that is what they happen to be
using, the rest of the world runs the ixp4xx in BE), and they keep
saying that it would be so easy to make a patch to add LE support,
but so far they haven't produced such a patch.

Please just write the patch and let's get this over with.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-16 Thread Lennert Buytenhek
On Wed, May 16, 2007 at 09:05:18PM +0930, Rod Whitby wrote:

  So, if the author of these patches wishes to concentrate on big-endian
  support first, then we will not say (and have not said) anything which
  will block inclusion of a big-endian only version of this driver.
  
  The NSLU2 people are the ones here that are saying that the driver
  should really support LE (because that is what they happen to be
  using, the rest of the world runs the ixp4xx in BE)
 
 I'll repeat again.  NSLU2-Linux supports both BE and LE.  We have
 about 5,000 users running BE and about 5,000 users running LE.

Perhaps, but somehow I don't think that we'd have seen any reaction
if the submitted driver had only supported LE and not BE.


  Please just write the patch and let's get this over with.
 
 Please let's just stop arguing about it.  If a patch appears before
 it gets merged, then great.  If it doesn't then it will appear at a
 later date.

Great.  I agree.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 12:35:40PM +0200, Mikael Pettersson wrote:

> > > Does that mean that the Debian ARM people have their heads so far
> > > up their collective asses that they think that every form of change
> > > is bad and are unable to accept that some forms of change might be
> > > for the better?
> > 
> > Well, I am not one of the Debian ARM people, just a user... and I
> > do hope the EABI port becomes supported in the future! But in the
> > meatime there is a crowd of users running Debian on consumer devices
> > like the NSLU2, and they need a LE network driver.
> 
> 1) Development _should_ happen in small individually-manageable steps.
>It's wrong to delay integration of the new IXP4xx eth driver just
>because it's not yet LE-compatible.

Exactly.


> 2) LE Debian/ARM users do have alternatives: they can use USB-Ethernet
>adapters, for instance.

Or just use Christian's driver.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 11:35:03AM +0200, Marcus Better wrote:

> > Does that mean that the Debian ARM people have their heads so far
> > up their collective asses that they think that every form of change
> > is bad and are unable to accept that some forms of change might be
> > for the better?
> 
> Well, I am not one of the Debian ARM people, just a user... and I
> do hope the EABI port becomes supported in the future! But in the
> meatime there is a crowd of users running Debian on consumer devices
> like the NSLU2, and they need a LE network driver.

"There's a crowd of users running Linux on TCP offload capable
cards, and they need TCP offload support in Linux."

The people who need a LE network driver can use Christian's driver,
as Christian's driver works in LE just fine.  The people who care
about LE support can add LE support to the driver that Krzysztof wrote.

I don't think that not supporting LE is a reason not to merge
Krzysztof's driver.  Don't make supporting LE systems Krzysztof's
problem.

Krzysztof has written an excellent driver, and while it would be 100%
Debian style to reject his driver just because it doesn't support LE[*],
thankfully, Linux is not Debian.  Please don't turn Linux into Debian.


[*] And if he were to complain about this, he would get slapped with
the standard "Our priorities are our users and free software"
Debian Social Contract rhetoric -- thank $DEITY we don't have a
Linux Kernel Social Contract with the same bullshit in it.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-09 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 06:59:36PM +0200, Krzysztof Halasa wrote:

> >> There may be up to 6 Ethernet ports (not sure about hardware
> >> status, not yet supported even by Intel) - 7 queues * 128 entries
> >> each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something
> >> for TX, and then crypto, and maybe other things.
> >
> > You're unlikely to be using all of those at the same time, though.
> 
> That's the point.
> 
> > And what do you do if the user does compile all of these features into
> > his kernel and then tries to use them all at the same time?  Return
> > -ENOMEM?
> 
> If he is able to do so, yes - there is nothing we can do. But
> I suspect a single machine would not have all possible hardware.
> The problem is, we don't know what would it have, so it must be
> dynamic.

Well, you _would_ like to have a way to make sure that all the
capabilities on the board can be used.  If you have a future ixp4xx
based board with 16 ethernet ports, you don't want 'ifconfig eth7 up'
to give you -ENOMEM just because we ran out of SRAM.

The way I see it, that means that you do want to scale back your
other SRAM allocations if you know that you're going to need a lot
of SRAM (say, for ethernet RX/TX queues.)

Either you can do this with an ugly hack a la:

/*
 * The FOO board has many ethernet ports, and runs out of
 * SRAM prematurely if we use the default TX/RX ring sizes.
 */
#ifdef CONFIG_MACH_IXP483_FOO_BOARD
#define IXP4XX_ETH_RXTX_QUEUE_SIZE  32
#else
#define IXP4XX_ETH_RXTX_QUEUE_SIZE  256
#endif

Or you can put this knowledge in the board support code (cleaner, IMHO.)

E.g. let arch/arm/mach-ixp4xx/nslu2.c decide, at platform device
instantiation time, which region of queue SRAM can be used by which
queue, and take static allocations for things like the crypto unit
into account.  (This is just one form of that idea, there are many
different variations.)

That way, you can _guarantee_ that you'll always have enough SRAM
to be able to use the functionality that is exposed on the board you
are running on (which is a desirable property, IMHO), which is
something that you can't achieve with an allocator, as far as I can
see.

I'm not per se against the allocator, I just think that there are
problems (running out of SRAM, fragmentation) that can't be solved
by the allocator alone (SRAM users have to be aware which other SRAM
users there are in the system, while the idea of the allocator is to
insulate these users from each other), and any solution that solves
those two problems IMHO also automatically solves the problem that
the allocator is trying to solve.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 10:58:06AM +0200, Marcus Better wrote:

> >> There _is_ an ARM BE version of Debian.
> >>
> >> It's not an official port, but it's not maintained any worse than
> >> the 'official' LE ARM Debian port is.
> 
> > Hmm... That changes a bit. Perhaps we should forget about
> > that LE thing then, and (at best) put that trivial workaround?
> 
> Please keep in mind that users are unlikely to install an unofficial port
> which lacks integration with the Debian infrastructure, security support
> and other services. The arm architecture (LE) is currently the third most
> popular in Debian, whereas I suspect (?) there are very few BE Debian
> systems out there.

Note that all of your arguments also apply to the experimental
EABI little-endian ARM port.

I.e.:
1. The EABI port is an unofficial port.
2. The EABI port is not integrated with the Debian infrastructure.
3. The EABI port lacks security support.

You could also argue that:
4. "There is no reason to use EABI -- old-ABI works just as well."
5. "The perceived floating point speedups that EABI gives are
   completely drowned out by the slowness of the rest of the system."
6. "A lot of programs assume old-ABI behavior, it is too much work
   to patch them all."

Does that mean that the Debian ARM people have their heads so far
up their collective asses that they think that every form of change
is bad and are unable to accept that some forms of change might be
for the better?

I think you've just summarised why I don't like working on Debian.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 10:58:06AM +0200, Marcus Better wrote:

  There _is_ an ARM BE version of Debian.
 
  It's not an official port, but it's not maintained any worse than
  the 'official' LE ARM Debian port is.
 
  Hmm... That changes a bit. Perhaps we should forget about
  that LE thing then, and (at best) put that trivial workaround?
 
 Please keep in mind that users are unlikely to install an unofficial port
 which lacks integration with the Debian infrastructure, security support
 and other services. The arm architecture (LE) is currently the third most
 popular in Debian, whereas I suspect (?) there are very few BE Debian
 systems out there.

Note that all of your arguments also apply to the experimental
EABI little-endian ARM port.

I.e.:
1. The EABI port is an unofficial port.
2. The EABI port is not integrated with the Debian infrastructure.
3. The EABI port lacks security support.

You could also argue that:
4. There is no reason to use EABI -- old-ABI works just as well.
5. The perceived floating point speedups that EABI gives are
   completely drowned out by the slowness of the rest of the system.
6. A lot of programs assume old-ABI behavior, it is too much work
   to patch them all.

Does that mean that the Debian ARM people have their heads so far
up their collective asses that they think that every form of change
is bad and are unable to accept that some forms of change might be
for the better?

I think you've just summarised why I don't like working on Debian.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-09 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 06:59:36PM +0200, Krzysztof Halasa wrote:

  There may be up to 6 Ethernet ports (not sure about hardware
  status, not yet supported even by Intel) - 7 queues * 128 entries
  each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something
  for TX, and then crypto, and maybe other things.
 
  You're unlikely to be using all of those at the same time, though.
 
 That's the point.
 
  And what do you do if the user does compile all of these features into
  his kernel and then tries to use them all at the same time?  Return
  -ENOMEM?
 
 If he is able to do so, yes - there is nothing we can do. But
 I suspect a single machine would not have all possible hardware.
 The problem is, we don't know what would it have, so it must be
 dynamic.

Well, you _would_ like to have a way to make sure that all the
capabilities on the board can be used.  If you have a future ixp4xx
based board with 16 ethernet ports, you don't want 'ifconfig eth7 up'
to give you -ENOMEM just because we ran out of SRAM.

The way I see it, that means that you do want to scale back your
other SRAM allocations if you know that you're going to need a lot
of SRAM (say, for ethernet RX/TX queues.)

Either you can do this with an ugly hack a la:

/*
 * The FOO board has many ethernet ports, and runs out of
 * SRAM prematurely if we use the default TX/RX ring sizes.
 */
#ifdef CONFIG_MACH_IXP483_FOO_BOARD
#define IXP4XX_ETH_RXTX_QUEUE_SIZE  32
#else
#define IXP4XX_ETH_RXTX_QUEUE_SIZE  256
#endif

Or you can put this knowledge in the board support code (cleaner, IMHO.)

E.g. let arch/arm/mach-ixp4xx/nslu2.c decide, at platform device
instantiation time, which region of queue SRAM can be used by which
queue, and take static allocations for things like the crypto unit
into account.  (This is just one form of that idea, there are many
different variations.)

That way, you can _guarantee_ that you'll always have enough SRAM
to be able to use the functionality that is exposed on the board you
are running on (which is a desirable property, IMHO), which is
something that you can't achieve with an allocator, as far as I can
see.

I'm not per se against the allocator, I just think that there are
problems (running out of SRAM, fragmentation) that can't be solved
by the allocator alone (SRAM users have to be aware which other SRAM
users there are in the system, while the idea of the allocator is to
insulate these users from each other), and any solution that solves
those two problems IMHO also automatically solves the problem that
the allocator is trying to solve.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 11:35:03AM +0200, Marcus Better wrote:

  Does that mean that the Debian ARM people have their heads so far
  up their collective asses that they think that every form of change
  is bad and are unable to accept that some forms of change might be
  for the better?
 
 Well, I am not one of the Debian ARM people, just a user... and I
 do hope the EABI port becomes supported in the future! But in the
 meatime there is a crowd of users running Debian on consumer devices
 like the NSLU2, and they need a LE network driver.

There's a crowd of users running Linux on TCP offload capable
cards, and they need TCP offload support in Linux.

The people who need a LE network driver can use Christian's driver,
as Christian's driver works in LE just fine.  The people who care
about LE support can add LE support to the driver that Krzysztof wrote.

I don't think that not supporting LE is a reason not to merge
Krzysztof's driver.  Don't make supporting LE systems Krzysztof's
problem.

Krzysztof has written an excellent driver, and while it would be 100%
Debian style to reject his driver just because it doesn't support LE[*],
thankfully, Linux is not Debian.  Please don't turn Linux into Debian.


[*] And if he were to complain about this, he would get slapped with
the standard Our priorities are our users and free software
Debian Social Contract rhetoric -- thank $DEITY we don't have a
Linux Kernel Social Contract with the same bullshit in it.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-09 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 12:35:40PM +0200, Mikael Pettersson wrote:

   Does that mean that the Debian ARM people have their heads so far
   up their collective asses that they think that every form of change
   is bad and are unable to accept that some forms of change might be
   for the better?
  
  Well, I am not one of the Debian ARM people, just a user... and I
  do hope the EABI port becomes supported in the future! But in the
  meatime there is a crowd of users running Debian on consumer devices
  like the NSLU2, and they need a LE network driver.
 
 1) Development _should_ happen in small individually-manageable steps.
It's wrong to delay integration of the new IXP4xx eth driver just
because it's not yet LE-compatible.

Exactly.


 2) LE Debian/ARM users do have alternatives: they can use USB-Ethernet
adapters, for instance.

Or just use Christian's driver.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 05:28:21PM +0200, Krzysztof Halasa wrote:

> > I was always curious, why do people want to run ixp4xx in LE mode? What
> > are the benefits that overweight the obvious performance degradation?
> 
> Debian is indeed a valid reason.
> I wonder if it would be much work to create BE Debian as well.

There _is_ an ARM BE version of Debian.

It's not an official port, but it's not maintained any worse than
the 'official' LE ARM Debian port is.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 04:31:12PM +0200, Krzysztof Halasa wrote:

> >> +/* Built-in 10/100 Ethernet MAC interfaces */
> >> +static struct mac_plat_info ixdp425_plat_mac[] = {
> >> +  {
> >> +  .phy= 0,
> >> +  .rxq= 3,
> >> +  }, {
> >> +  .phy= 1,
> >> +  .rxq= 4,
> >> +  }
> >> +};
> >
> > As with Christian's driver (I'm feeling like a bit of a broken record
> > here :-), putting knowledge of which queue to use (which is firmware-
> > specific) in the _board_ support file is almost certainly wrong.
> >
> > I would just put the port number in there, and let the ethernet
> > driver map the port number to the hardware queue number.  After all,
> > the ethernet driver knows which queues the firmware uses, while the
> > board support code doesn't.
> 
> No, quite the opposite. The board code knows its set of hardware
> interfaces etc. and can let Ethernet driver use, say, HSS queues.
> The driver can't know that.

You are attacking a point that I did not make.

The board support code knows such things as that the "front ethernet
port" on the board is connected to the CPU's MII port number #2, but
the board support code does _not_ know that MII port number #2
corresponds to "ixp4xx hardware queue #5."

If Intel puts out a firmware update next month, and your ethernet
driver is modified to take advantage of the new features in that
firmware and starts depending on the newer version of that firmware,
we will have to modify every ixp4xx board support file in case the
firmware update modifies the ixp4xx queue numbers in use.  The
mapping from hardware ports (MII port #0, MII port #6, HSS port
#42, whatever) to ixp4xx hardware queue numbers (0-63) should _not_
be put in every single ixp4xx board support file.

Even if you only change the

(in board support file)
.rxq= 4,

line to something like this instead:

(in some ixp4xx-specific or driver-specific header file)
#define IXP4XX_MII_PORT_1_RX_QUEUE  4

(in board support file)
.rxq= IXP4XX_MII_PORT_1_RX_QUEUE,

then you have remved this dependency, and then you only have to update
one place if you move to a newer firmware version.


> > I generally discourage the use of such wrappers, as it often makes
> > people forget that the set and clear operations are not atomic, and
> > it ignores the fact that some of the other bits in the register you
> > are modifying might have side-effects.
> 
> Without them the code in question is hardly readable,

You can read Polish, how can you complain about code readability. :-))

*runs*


> I pick the need to remember about non-atomicity and possible side
> effects instead :-)

Sure, point taken, it's just that the person after you might not
remember..
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 04:12:17PM +0200, Krzysztof Halasa wrote:

> > The queue manager interrupts should probably be implemented as an
> > irqchip, in the same way that GPIO interrupts are implemented.  (I.e.
> > allocate 'real' interrupt numbers for them, and use the interrupt
> > cascade mechanism.)  You probably want to have separate irqchips for
> > the upper and lower halves, too.  This way, drivers can just use
> > request_irq() instead of having to bother with platform-specific
> > qmgr_set_irq() methods.
> 
> Is there a sample somewhere?

See for example arch/arm/mach-ep93xx/core.c, handling of the A/B/F
port GPIO interrupts.

In a nutshell, it goes like this.

1) Allocate a set of IRQ numbers.  E.g. in include/asm-arm/arch-ixp4xx/irqs.h:

#define IRQ_IXP4XX_QUEUE_0  64
#define IRQ_IXP4XX_QUEUE_1  65
[...]

   Adjust NR_IRQS, too.

2) Implement interrupt chip functions:

static void ixp4xx_queue_low_irq_mask_ack(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_mask(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_unmask(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_set_type(unsigned int irq)
{
[...]
}

static struct irq_chip ixp4xx_queue_low_irq_chip = {
.name   = "QMGR low",
.ack= ixp4xx_queue_low_irq_mask_ack,
.mask   = ixp4xx_queue_low_irq_mask,
.unmask = ixp4xx_queue_low_irq_unmask,
.set_type   = ixp4xx_queue_low_irq_set_type,
};

3) Hook up the queue interrupts:

for (i = IRQ_IXP4XX_QUEUE_0; i <= IRQ_IXP4XX_QUEUE_31; i++) {
set_irq_chip(i, _queue_low_irq_chip);
set_irq_handler(i, handle_level_irq);
set_irq_flags(i, IRQF_VALID);
}

4) Implement an interrupt handler for the parent interrupt:

static void ixp4xx_qmgr_low_irq_handler(unsigned int irq, struct 
irq_des c *desc)
{
u32 status;
int i;

status = __raw_readl(IXP4XX_WHATEVER_QMGR_LOW_STATUS_REGISTER);
for (i = 0; i < 32; i++) {
if (status & (1 << i)) {
desc = irq_desc + IRQ_IXP4XX_QUEUE_0 + i;
desc_handle_irq(IRQ_IXP4XX_QUEUE_0 + i, desc);
}
}
}

5) Hook up the parent interrupt:

set_irq_chained_handler(IRQ_IXP4XX_QM1, ixp4xx_qmgr_low_irq_handler);


Or something like that.


> > As with Christian's driver, I don't know whether an SRAM allocator
> > makes much sense.  We can just set up a static allocation map for the
> > in-tree drivers and leave out the allocator altogether.  I.e. I don't
> > think it's worth the complexity (and just because the butt-ugly Intel
> > code has an allocator isn't a very good reason. :-)
> 
> It's a very simple allocator. I don't whink we have enough SRAM
> without it. For now it would work but it's probably too small
> for all potential users at a time.
> 
> There may be up to 6 Ethernet ports (not sure about hardware
> status, not yet supported even by Intel) - 7 queues * 128 entries
> each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something
> for TX, and then crypto, and maybe other things.

You're unlikely to be using all of those at the same time, though.

And what do you do if the user does compile all of these features into
his kernel and then tries to use them all at the same time?  Return
-ENOMEM?

Shouldn't we make sure that at least the features that are compiled in
can be used at the same time?  If you want that guarantee, then you
might as well determine the SRAM map at compile time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 04:47:31PM +0400, Alexey Zaytsev wrote:

> > As with Christian's driver, I don't know whether an SRAM allocator
> > makes much sense.  We can just set up a static allocation map for the
> > in-tree drivers and leave out the allocator altogether.  I.e. I don't
> > think it's worth the complexity (and just because the butt-ugly Intel
> > code has an allocator isn't a very good reason. :-)
> 
> Is the qmgr used when the NPEs are utilized as DMA engines?

I'm not sure, but probably yes.


> And is the allocator needed in this case?

If you statically partition the available queue SRAM, no.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek
On Mon, May 07, 2007 at 09:18:00PM +0100, Michael-Luke Jones wrote:

> >Well, I'm told that (compatible) NPEs are present on other IXP CPUs.
> >Not sure about details.
> 
> If, by a combined effort, we ever manage to create a generic NPE  
> driver for the NPEs found in IXP42x/43x/46x/2000/23xx then the driver  
> should go in arch/arm/npe.c

(Note that the ixp2000 doesn't have NPEs.)

(Both the 2000 and the 23xx have microengines, which are both
supported by arch/arm/common/uengine.c.)


> It's possible, but hard due to the differences in hardware design

The ixp23xx NPEs seem pretty much identical to me to the ixp4xx
NPEs.  There are some minor differences between the ixp2000 and
ixp23xx uengines, but those are easy enough to deal with.


> and the fact that boards based on anything other than 42x are few
> and far between. The vast majority of 'independent' users following
> mainline are likely running on 42x boards.

Sure, ixp23xx hardware is harder to get.  I'm not sure what you mean
by 'independent' users, though.  Are people with non-42x hardware
'dependent' users, and why?


> Thus, for now, I would drop the NPE / QMGR code in arch/arm/mach- 
> ixp4xx/ and concentrate on making it 42x/43x/46x agnostic. One step  
> at a time :)

I'd say that it's up to those who are interested in ixp23xx support
(probably only myself at this point) to add ixp23xx support.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek
On Mon, May 07, 2007 at 10:00:20PM +0200, Krzysztof Halasa wrote:

> > - the NPE can also be used as DMA engine and for crypto operations.
> >   Both are not network related.
> >   Additionally, the NPE is not only ixp4xx related, but is
> >   also used in IXP23xx CPUs, so it could be placed in
> >   arch/arm/common or arch/arm/xscale ?
> >
> > - The MAC is used on IXP23xx, too. So the drivers for
> >   both CPU familys only differ in the way they exchange
> >   network packets between the NPE and the kernel.
> 
> Hmm... perhaps someone have a spare device with such IXP23xx
> and wants to make it a donation for science? :-)

I have a couple of ixp23xx boards at home, but I'm not sure whether I
can give them away.  I can give you remote access to them, though.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek
On Mon, May 07, 2007 at 02:07:16AM +0200, Krzysztof Halasa wrote:

> + * Ethernet port config (0x00 is not present on IXP42X):
> + *
> + * logical port  0x000x100x20
> + * NPE   0 (NPE-A)   1 (NPE-B)   2 (NPE-C)
> + * physical PortId   2   0   1
> + * TX queue  23  24  25
> + * RX-free queue 26  27  28
> + * TX-done queue is always 31, RX queue is configurable

(Note that this assignment depends on the firmware, and different
firmware versions use different queues -- you might want to add a
note about which firmware version this holds for.)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 03:19:22AM +0200, Krzysztof Halasa wrote:

> diff --git a/arch/arm/mach-ixp4xx/ixdp425-setup.c 
> b/arch/arm/mach-ixp4xx/ixdp425-setup.c
> index ec4f079..f20d39d 100644
> --- a/arch/arm/mach-ixp4xx/ixdp425-setup.c
> +++ b/arch/arm/mach-ixp4xx/ixdp425-setup.c
> @@ -101,10 +101,35 @@ static struct platform_device ixdp425_uart = {
>   .resource   = ixdp425_uart_resources
>  };
>  
> +/* Built-in 10/100 Ethernet MAC interfaces */
> +static struct mac_plat_info ixdp425_plat_mac[] = {
> + {
> + .phy= 0,
> + .rxq= 3,
> + }, {
> + .phy= 1,
> + .rxq= 4,
> + }
> +};

As with Christian's driver (I'm feeling like a bit of a broken record
here :-), putting knowledge of which queue to use (which is firmware-
specific) in the _board_ support file is almost certainly wrong.

I would just put the port number in there, and let the ethernet
driver map the port number to the hardware queue number.  After all,
the ethernet driver knows which queues the firmware uses, while the
board support code doesn't.


> +#ifndef __ARMEB__
> +#warning Little endian mode not supported
> +#endif

Yay. :-)  /me hides


> +static inline void set_regbits(u32 bits, u32 __iomem *reg)
> +{
> + __raw_writel(__raw_readl(reg) | bits, reg);
> +}
> +static inline void clr_regbits(u32 bits, u32 __iomem *reg)
> +{
> + __raw_writel(__raw_readl(reg) & ~bits, reg);
> +}

I generally discourage the use of such wrappers, as it often makes
people forget that the set and clear operations are not atomic, and
it ignores the fact that some of the other bits in the register you
are modifying might have side-effects.

Didn't review the rest -- not sure whether I'm looking at the latest
version.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek
I'm not sure what the latest versions are, so I'm not sure which
patches to review and which patches are obsolete.


On Tue, May 08, 2007 at 02:46:28AM +0200, Krzysztof Halasa wrote:

> +struct qmgr_regs __iomem *qmgr_regs;
> +static struct resource *mem_res;
> +static spinlock_t qmgr_lock;
> +static u32 used_sram_bitmap[4]; /* 128 16-dword pages */
> +static void (*irq_handlers[HALF_QUEUES])(void *pdev);
> +static void *irq_pdevs[HALF_QUEUES];
> +
> +void qmgr_set_irq(unsigned int queue, int src,
> +   void (*handler)(void *pdev), void *pdev)
> +{
> + u32 __iomem *reg = _regs->irqsrc[queue / 8]; /* 8 queues / u32 */
> + int bit = (queue % 8) * 4; /* 3 bits + 1 reserved bit per queue */
> + unsigned long flags;
> +
> + src &= 7;
> + spin_lock_irqsave(_lock, flags);
> + __raw_writel((__raw_readl(reg) & ~(7 << bit)) | (src << bit), reg);
> + irq_handlers[queue] = handler;
> + irq_pdevs[queue] = pdev;
> + spin_unlock_irqrestore(_lock, flags);
> +}

The queue manager interrupts should probably be implemented as an
irqchip, in the same way that GPIO interrupts are implemented.  (I.e.
allocate 'real' interrupt numbers for them, and use the interrupt
cascade mechanism.)  You probably want to have separate irqchips for
the upper and lower halves, too.  This way, drivers can just use
request_irq() instead of having to bother with platform-specific
qmgr_set_irq() methods.  I think I also made this review comment
with Christian's driver.


> +int qmgr_request_queue(unsigned int queue, unsigned int len /* dwords */,
> +unsigned int nearly_empty_watermark,
> +unsigned int nearly_full_watermark)
> +{
> + u32 cfg, addr = 0, mask[4]; /* in 16-dwords */
> + int err;
> +
> + if (queue >= HALF_QUEUES)
> + return -ERANGE;
> +
> + if ((nearly_empty_watermark | nearly_full_watermark) & ~7)
> + return -EINVAL;
> +
> + switch (len) {
> + case  16:
> + cfg = 0 << 24;
> + mask[0] = 0x1;
> + break;
> + case  32:
> + cfg = 1 << 24;
> + mask[0] = 0x3;
> + break;
> + case  64:
> + cfg = 2 << 24;
> + mask[0] = 0xF;
> + break;
> + case 128:
> + cfg = 3 << 24;
> + mask[0] = 0xFF;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + cfg |= nearly_empty_watermark << 26;
> + cfg |= nearly_full_watermark << 29;
> + len /= 16;  /* in 16-dwords: 1, 2, 4 or 8 */
> + mask[1] = mask[2] = mask[3] = 0;
> +
> + if (!try_module_get(THIS_MODULE))
> + return -ENODEV;
> +
> + spin_lock_irq(_lock);
> + if (__raw_readl(_regs->sram[queue])) {
> + err = -EBUSY;
> + goto err;
> + }
> +
> + while (1) {
> + if (!(used_sram_bitmap[0] & mask[0]) &&
> + !(used_sram_bitmap[1] & mask[1]) &&
> + !(used_sram_bitmap[2] & mask[2]) &&
> + !(used_sram_bitmap[3] & mask[3]))
> + break; /* found free space */
> +
> + addr++;
> + shift_mask(mask);
> + if (addr + len > ARRAY_SIZE(qmgr_regs->sram)) {
> + printk(KERN_ERR "qmgr: no free SRAM space for"
> +" queue %i\n", queue);
> + err = -ENOMEM;
> + goto err;
> + }
> + }
> +
> + used_sram_bitmap[0] |= mask[0];
> + used_sram_bitmap[1] |= mask[1];
> + used_sram_bitmap[2] |= mask[2];
> + used_sram_bitmap[3] |= mask[3];
> + __raw_writel(cfg | (addr << 14), _regs->sram[queue]);
> + spin_unlock_irq(_lock);
> +
> +#if DEBUG
> + printk(KERN_DEBUG "qmgr: requested queue %i, addr = 0x%02X\n",
> +queue, addr);
> +#endif
> + return 0;
> +
> +err:
> + spin_unlock_irq(_lock);
> + module_put(THIS_MODULE);
> + return err;
> +}

As with Christian's driver, I don't know whether an SRAM allocator
makes much sense.  We can just set up a static allocation map for the
in-tree drivers and leave out the allocator altogether.  I.e. I don't
think it's worth the complexity (and just because the butt-ugly Intel
code has an allocator isn't a very good reason. :-)

I.e. an API a la:

ixp4xx_qmgr_config_queue(int queue_nr, int sram_base_address, int 
queue_size, ...);

might simply suffice.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek
I'm not sure what the latest versions are, so I'm not sure which
patches to review and which patches are obsolete.


On Tue, May 08, 2007 at 02:46:28AM +0200, Krzysztof Halasa wrote:

 +struct qmgr_regs __iomem *qmgr_regs;
 +static struct resource *mem_res;
 +static spinlock_t qmgr_lock;
 +static u32 used_sram_bitmap[4]; /* 128 16-dword pages */
 +static void (*irq_handlers[HALF_QUEUES])(void *pdev);
 +static void *irq_pdevs[HALF_QUEUES];
 +
 +void qmgr_set_irq(unsigned int queue, int src,
 +   void (*handler)(void *pdev), void *pdev)
 +{
 + u32 __iomem *reg = qmgr_regs-irqsrc[queue / 8]; /* 8 queues / u32 */
 + int bit = (queue % 8) * 4; /* 3 bits + 1 reserved bit per queue */
 + unsigned long flags;
 +
 + src = 7;
 + spin_lock_irqsave(qmgr_lock, flags);
 + __raw_writel((__raw_readl(reg)  ~(7  bit)) | (src  bit), reg);
 + irq_handlers[queue] = handler;
 + irq_pdevs[queue] = pdev;
 + spin_unlock_irqrestore(qmgr_lock, flags);
 +}

The queue manager interrupts should probably be implemented as an
irqchip, in the same way that GPIO interrupts are implemented.  (I.e.
allocate 'real' interrupt numbers for them, and use the interrupt
cascade mechanism.)  You probably want to have separate irqchips for
the upper and lower halves, too.  This way, drivers can just use
request_irq() instead of having to bother with platform-specific
qmgr_set_irq() methods.  I think I also made this review comment
with Christian's driver.


 +int qmgr_request_queue(unsigned int queue, unsigned int len /* dwords */,
 +unsigned int nearly_empty_watermark,
 +unsigned int nearly_full_watermark)
 +{
 + u32 cfg, addr = 0, mask[4]; /* in 16-dwords */
 + int err;
 +
 + if (queue = HALF_QUEUES)
 + return -ERANGE;
 +
 + if ((nearly_empty_watermark | nearly_full_watermark)  ~7)
 + return -EINVAL;
 +
 + switch (len) {
 + case  16:
 + cfg = 0  24;
 + mask[0] = 0x1;
 + break;
 + case  32:
 + cfg = 1  24;
 + mask[0] = 0x3;
 + break;
 + case  64:
 + cfg = 2  24;
 + mask[0] = 0xF;
 + break;
 + case 128:
 + cfg = 3  24;
 + mask[0] = 0xFF;
 + break;
 + default:
 + return -EINVAL;
 + }
 +
 + cfg |= nearly_empty_watermark  26;
 + cfg |= nearly_full_watermark  29;
 + len /= 16;  /* in 16-dwords: 1, 2, 4 or 8 */
 + mask[1] = mask[2] = mask[3] = 0;
 +
 + if (!try_module_get(THIS_MODULE))
 + return -ENODEV;
 +
 + spin_lock_irq(qmgr_lock);
 + if (__raw_readl(qmgr_regs-sram[queue])) {
 + err = -EBUSY;
 + goto err;
 + }
 +
 + while (1) {
 + if (!(used_sram_bitmap[0]  mask[0]) 
 + !(used_sram_bitmap[1]  mask[1]) 
 + !(used_sram_bitmap[2]  mask[2]) 
 + !(used_sram_bitmap[3]  mask[3]))
 + break; /* found free space */
 +
 + addr++;
 + shift_mask(mask);
 + if (addr + len  ARRAY_SIZE(qmgr_regs-sram)) {
 + printk(KERN_ERR qmgr: no free SRAM space for
 + queue %i\n, queue);
 + err = -ENOMEM;
 + goto err;
 + }
 + }
 +
 + used_sram_bitmap[0] |= mask[0];
 + used_sram_bitmap[1] |= mask[1];
 + used_sram_bitmap[2] |= mask[2];
 + used_sram_bitmap[3] |= mask[3];
 + __raw_writel(cfg | (addr  14), qmgr_regs-sram[queue]);
 + spin_unlock_irq(qmgr_lock);
 +
 +#if DEBUG
 + printk(KERN_DEBUG qmgr: requested queue %i, addr = 0x%02X\n,
 +queue, addr);
 +#endif
 + return 0;
 +
 +err:
 + spin_unlock_irq(qmgr_lock);
 + module_put(THIS_MODULE);
 + return err;
 +}

As with Christian's driver, I don't know whether an SRAM allocator
makes much sense.  We can just set up a static allocation map for the
in-tree drivers and leave out the allocator altogether.  I.e. I don't
think it's worth the complexity (and just because the butt-ugly Intel
code has an allocator isn't a very good reason. :-)

I.e. an API a la:

ixp4xx_qmgr_config_queue(int queue_nr, int sram_base_address, int 
queue_size, ...);

might simply suffice.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 03:19:22AM +0200, Krzysztof Halasa wrote:

 diff --git a/arch/arm/mach-ixp4xx/ixdp425-setup.c 
 b/arch/arm/mach-ixp4xx/ixdp425-setup.c
 index ec4f079..f20d39d 100644
 --- a/arch/arm/mach-ixp4xx/ixdp425-setup.c
 +++ b/arch/arm/mach-ixp4xx/ixdp425-setup.c
 @@ -101,10 +101,35 @@ static struct platform_device ixdp425_uart = {
   .resource   = ixdp425_uart_resources
  };
  
 +/* Built-in 10/100 Ethernet MAC interfaces */
 +static struct mac_plat_info ixdp425_plat_mac[] = {
 + {
 + .phy= 0,
 + .rxq= 3,
 + }, {
 + .phy= 1,
 + .rxq= 4,
 + }
 +};

As with Christian's driver (I'm feeling like a bit of a broken record
here :-), putting knowledge of which queue to use (which is firmware-
specific) in the _board_ support file is almost certainly wrong.

I would just put the port number in there, and let the ethernet
driver map the port number to the hardware queue number.  After all,
the ethernet driver knows which queues the firmware uses, while the
board support code doesn't.


 +#ifndef __ARMEB__
 +#warning Little endian mode not supported
 +#endif

Yay. :-)  /me hides


 +static inline void set_regbits(u32 bits, u32 __iomem *reg)
 +{
 + __raw_writel(__raw_readl(reg) | bits, reg);
 +}
 +static inline void clr_regbits(u32 bits, u32 __iomem *reg)
 +{
 + __raw_writel(__raw_readl(reg)  ~bits, reg);
 +}

I generally discourage the use of such wrappers, as it often makes
people forget that the set and clear operations are not atomic, and
it ignores the fact that some of the other bits in the register you
are modifying might have side-effects.

Didn't review the rest -- not sure whether I'm looking at the latest
version.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek
On Mon, May 07, 2007 at 02:07:16AM +0200, Krzysztof Halasa wrote:

 + * Ethernet port config (0x00 is not present on IXP42X):
 + *
 + * logical port  0x000x100x20
 + * NPE   0 (NPE-A)   1 (NPE-B)   2 (NPE-C)
 + * physical PortId   2   0   1
 + * TX queue  23  24  25
 + * RX-free queue 26  27  28
 + * TX-done queue is always 31, RX queue is configurable

(Note that this assignment depends on the firmware, and different
firmware versions use different queues -- you might want to add a
note about which firmware version this holds for.)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek
On Mon, May 07, 2007 at 09:18:00PM +0100, Michael-Luke Jones wrote:

 Well, I'm told that (compatible) NPEs are present on other IXP CPUs.
 Not sure about details.
 
 If, by a combined effort, we ever manage to create a generic NPE  
 driver for the NPEs found in IXP42x/43x/46x/2000/23xx then the driver  
 should go in arch/arm/npe.c

(Note that the ixp2000 doesn't have NPEs.)

(Both the 2000 and the 23xx have microengines, which are both
supported by arch/arm/common/uengine.c.)


 It's possible, but hard due to the differences in hardware design

The ixp23xx NPEs seem pretty much identical to me to the ixp4xx
NPEs.  There are some minor differences between the ixp2000 and
ixp23xx uengines, but those are easy enough to deal with.


 and the fact that boards based on anything other than 42x are few
 and far between. The vast majority of 'independent' users following
 mainline are likely running on 42x boards.

Sure, ixp23xx hardware is harder to get.  I'm not sure what you mean
by 'independent' users, though.  Are people with non-42x hardware
'dependent' users, and why?


 Thus, for now, I would drop the NPE / QMGR code in arch/arm/mach- 
 ixp4xx/ and concentrate on making it 42x/43x/46x agnostic. One step  
 at a time :)

I'd say that it's up to those who are interested in ixp23xx support
(probably only myself at this point) to add ixp23xx support.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] Intel IXP4xx network drivers

2007-05-08 Thread Lennert Buytenhek
On Mon, May 07, 2007 at 10:00:20PM +0200, Krzysztof Halasa wrote:

  - the NPE can also be used as DMA engine and for crypto operations.
Both are not network related.
Additionally, the NPE is not only ixp4xx related, but is
also used in IXP23xx CPUs, so it could be placed in
arch/arm/common or arch/arm/xscale ?
 
  - The MAC is used on IXP23xx, too. So the drivers for
both CPU familys only differ in the way they exchange
network packets between the NPE and the kernel.
 
 Hmm... perhaps someone have a spare device with such IXP23xx
 and wants to make it a donation for science? :-)

I have a couple of ixp23xx boards at home, but I'm not sure whether I
can give them away.  I can give you remote access to them, though.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 04:47:31PM +0400, Alexey Zaytsev wrote:

  As with Christian's driver, I don't know whether an SRAM allocator
  makes much sense.  We can just set up a static allocation map for the
  in-tree drivers and leave out the allocator altogether.  I.e. I don't
  think it's worth the complexity (and just because the butt-ugly Intel
  code has an allocator isn't a very good reason. :-)
 
 Is the qmgr used when the NPEs are utilized as DMA engines?

I'm not sure, but probably yes.


 And is the allocator needed in this case?

If you statically partition the available queue SRAM, no.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.3 - QMGR

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 04:12:17PM +0200, Krzysztof Halasa wrote:

  The queue manager interrupts should probably be implemented as an
  irqchip, in the same way that GPIO interrupts are implemented.  (I.e.
  allocate 'real' interrupt numbers for them, and use the interrupt
  cascade mechanism.)  You probably want to have separate irqchips for
  the upper and lower halves, too.  This way, drivers can just use
  request_irq() instead of having to bother with platform-specific
  qmgr_set_irq() methods.
 
 Is there a sample somewhere?

See for example arch/arm/mach-ep93xx/core.c, handling of the A/B/F
port GPIO interrupts.

In a nutshell, it goes like this.

1) Allocate a set of IRQ numbers.  E.g. in include/asm-arm/arch-ixp4xx/irqs.h:

#define IRQ_IXP4XX_QUEUE_0  64
#define IRQ_IXP4XX_QUEUE_1  65
[...]

   Adjust NR_IRQS, too.

2) Implement interrupt chip functions:

static void ixp4xx_queue_low_irq_mask_ack(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_mask(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_unmask(unsigned int irq)
{
[...]
}

static void ixp4xx_queue_low_irq_set_type(unsigned int irq)
{
[...]
}

static struct irq_chip ixp4xx_queue_low_irq_chip = {
.name   = QMGR low,
.ack= ixp4xx_queue_low_irq_mask_ack,
.mask   = ixp4xx_queue_low_irq_mask,
.unmask = ixp4xx_queue_low_irq_unmask,
.set_type   = ixp4xx_queue_low_irq_set_type,
};

3) Hook up the queue interrupts:

for (i = IRQ_IXP4XX_QUEUE_0; i = IRQ_IXP4XX_QUEUE_31; i++) {
set_irq_chip(i, ixp4xx_queue_low_irq_chip);
set_irq_handler(i, handle_level_irq);
set_irq_flags(i, IRQF_VALID);
}

4) Implement an interrupt handler for the parent interrupt:

static void ixp4xx_qmgr_low_irq_handler(unsigned int irq, struct 
irq_des c *desc)
{
u32 status;
int i;

status = __raw_readl(IXP4XX_WHATEVER_QMGR_LOW_STATUS_REGISTER);
for (i = 0; i  32; i++) {
if (status  (1  i)) {
desc = irq_desc + IRQ_IXP4XX_QUEUE_0 + i;
desc_handle_irq(IRQ_IXP4XX_QUEUE_0 + i, desc);
}
}
}

5) Hook up the parent interrupt:

set_irq_chained_handler(IRQ_IXP4XX_QM1, ixp4xx_qmgr_low_irq_handler);


Or something like that.


  As with Christian's driver, I don't know whether an SRAM allocator
  makes much sense.  We can just set up a static allocation map for the
  in-tree drivers and leave out the allocator altogether.  I.e. I don't
  think it's worth the complexity (and just because the butt-ugly Intel
  code has an allocator isn't a very good reason. :-)
 
 It's a very simple allocator. I don't whink we have enough SRAM
 without it. For now it would work but it's probably too small
 for all potential users at a time.
 
 There may be up to 6 Ethernet ports (not sure about hardware
 status, not yet supported even by Intel) - 7 queues * 128 entries
 each = ~ 3.5 KB. Add 2 long queues (RX) for HSS and something
 for TX, and then crypto, and maybe other things.

You're unlikely to be using all of those at the same time, though.

And what do you do if the user does compile all of these features into
his kernel and then tries to use them all at the same time?  Return
-ENOMEM?

Shouldn't we make sure that at least the features that are compiled in
can be used at the same time?  If you want that guarantee, then you
might as well determine the SRAM map at compile time.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 04:31:12PM +0200, Krzysztof Halasa wrote:

  +/* Built-in 10/100 Ethernet MAC interfaces */
  +static struct mac_plat_info ixdp425_plat_mac[] = {
  +  {
  +  .phy= 0,
  +  .rxq= 3,
  +  }, {
  +  .phy= 1,
  +  .rxq= 4,
  +  }
  +};
 
  As with Christian's driver (I'm feeling like a bit of a broken record
  here :-), putting knowledge of which queue to use (which is firmware-
  specific) in the _board_ support file is almost certainly wrong.
 
  I would just put the port number in there, and let the ethernet
  driver map the port number to the hardware queue number.  After all,
  the ethernet driver knows which queues the firmware uses, while the
  board support code doesn't.
 
 No, quite the opposite. The board code knows its set of hardware
 interfaces etc. and can let Ethernet driver use, say, HSS queues.
 The driver can't know that.

You are attacking a point that I did not make.

The board support code knows such things as that the front ethernet
port on the board is connected to the CPU's MII port number #2, but
the board support code does _not_ know that MII port number #2
corresponds to ixp4xx hardware queue #5.

If Intel puts out a firmware update next month, and your ethernet
driver is modified to take advantage of the new features in that
firmware and starts depending on the newer version of that firmware,
we will have to modify every ixp4xx board support file in case the
firmware update modifies the ixp4xx queue numbers in use.  The
mapping from hardware ports (MII port #0, MII port #6, HSS port
#42, whatever) to ixp4xx hardware queue numbers (0-63) should _not_
be put in every single ixp4xx board support file.

Even if you only change the

(in board support file)
.rxq= 4,

line to something like this instead:

(in some ixp4xx-specific or driver-specific header file)
#define IXP4XX_MII_PORT_1_RX_QUEUE  4

(in board support file)
.rxq= IXP4XX_MII_PORT_1_RX_QUEUE,

then you have remved this dependency, and then you only have to update
one place if you move to a newer firmware version.


  I generally discourage the use of such wrappers, as it often makes
  people forget that the set and clear operations are not atomic, and
  it ignores the fact that some of the other bits in the register you
  are modifying might have side-effects.
 
 Without them the code in question is hardly readable,

You can read Polish, how can you complain about code readability. :-))

*runs*


 I pick the need to remember about non-atomicity and possible side
 effects instead :-)

Sure, point taken, it's just that the person after you might not
remember..
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-08 Thread Lennert Buytenhek
On Tue, May 08, 2007 at 05:28:21PM +0200, Krzysztof Halasa wrote:

  I was always curious, why do people want to run ixp4xx in LE mode? What
  are the benefits that overweight the obvious performance degradation?
 
 Debian is indeed a valid reason.
 I wonder if it would be much work to create BE Debian as well.

There _is_ an ARM BE version of Debian.

It's not an official port, but it's not maintained any worse than
the 'official' LE ARM Debian port is.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.22 patch] iop: combined watchdog timer driver for iop3xx and iop13xx

2007-05-06 Thread Lennert Buytenhek
On Sun, May 06, 2007 at 01:13:58PM -0700, Dan Williams wrote:

> >> Here is a new watchdog driver for your review.  It supports two flavors
> >> of the iop watchdog timer.  The iop13xx watchdog can be stopped while
> >> the iop3xx version cannot.
> >
> >I started reviewing this patch yesterday. First thing I noticed was that
> >you seem to be moving some code from include/asm-arm/arch-iop13xx/system.h
> >to include/asm-arm/arch-iop13xx/iop13xx.h .
> >This should not be part of this patch since it is touching architecture
> >dependant code for which I do not have enough knowledge about this specific
> >architecture to tell if this is indeed the correct way to do this.
> >The maintainers of this architecture should imho comment on this.
> >Could you split this patch into 2 patches: one that deals with the moving 
> >of
> >the architecture dependant code (and explaining why) and one with the new
> >watchdog drivers? I will continue my review today.
> 
> I am one of the maintainers of this architecture, (Lennert Buytenhek
> is the other).

Dan has done more work on iop13xx than I have, and I'm OK with his
changes.

It's true that ARM-specific changes generally should go through the ARM
tree, but IMHO sometimes it makes sense to have one patch touch both
stuff under drivers/ and stuff under arch/arm/mach-foo, especially if
the changes are dependent and cause compile breakage if applied
separately.  Not sure whether that's the case here.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6.22 patch] iop: combined watchdog timer driver for iop3xx and iop13xx

2007-05-06 Thread Lennert Buytenhek
On Sun, May 06, 2007 at 01:13:58PM -0700, Dan Williams wrote:

  Here is a new watchdog driver for your review.  It supports two flavors
  of the iop watchdog timer.  The iop13xx watchdog can be stopped while
  the iop3xx version cannot.
 
 I started reviewing this patch yesterday. First thing I noticed was that
 you seem to be moving some code from include/asm-arm/arch-iop13xx/system.h
 to include/asm-arm/arch-iop13xx/iop13xx.h .
 This should not be part of this patch since it is touching architecture
 dependant code for which I do not have enough knowledge about this specific
 architecture to tell if this is indeed the correct way to do this.
 The maintainers of this architecture should imho comment on this.
 Could you split this patch into 2 patches: one that deals with the moving 
 of
 the architecture dependant code (and explaining why) and one with the new
 watchdog drivers? I will continue my review today.
 
 I am one of the maintainers of this architecture, (Lennert Buytenhek
 is the other).

Dan has done more work on iop13xx than I have, and I'm OK with his
changes.

It's true that ARM-specific changes generally should go through the ARM
tree, but IMHO sometimes it makes sense to have one patch touch both
stuff under drivers/ and stuff under arch/arm/mach-foo, especially if
the changes are dependent and cause compile breakage if applied
separately.  Not sure whether that's the case here.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFT] e100 driver on ARM

2007-04-26 Thread Lennert Buytenhek
On Thu, Apr 26, 2007 at 09:41:22AM -0400, David Acker wrote:

> Here is a quote from Russell that describes what I believe is the main 
> problem:
> http://www-gatago.com/linux/kernel/15457063.html
> "
> Has e100 actually been fixed to use the PCI DMA API correctly yet?
> Looking at it, it doesn't look like it, so until it does, eepro100
> is the far better bet for platforms needing working DMA API.
> 
> What I'm talking about is e100's apparant belief that it can modify
> rfd's in the receive ring on a non-cache coherent architecture and
> expect the data around it to remain unaffected (see e100_rx_alloc_skb):
> 
> struct rfd {
> u16 status;
> u16 command;
> u32 link;
> u32 rbd;
> u16 actual_size;
> u16 size;
> };
> 
> it touches command and link. This means that the whole rfd plus
> maybe the following or preceding 16 bytes get loaded into a cache
> line (assuming cache lines of 32 bytes), and that data written
> out again at sync. However, it does this on what seems to be an
> active receive chain.
> 
> So, both the CPU _and_ the device own the same data. Which is a
> violation of the DMA API.
> "
> 
> I think that the S-bit patch fixes it because the hardware spins on the 
> s-bit instead of using the packet.  With just the el-bit, the hardware 
> tries to use the same cache line that the software is updating.
> 
> Can someone from Intel let us know if I understand the hardware's 
> handling of the S and EL bits?  If my interpretation is correct, can the 
> s-bit patch be applied?  It seems like the correct way to lock out the 
> hardware while a packet is being updated.  I have not seen a reason 
> given not to apply the patch.

This is all a while ago now, but wasn't the e100 S-bit patch originally
written by Intel people in response to the very same quote by Russell
King that you've quoted above?  The S-bit patch should probably just
be applied, IMHO.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFT] e100 driver on ARM

2007-04-26 Thread Lennert Buytenhek
On Thu, Apr 26, 2007 at 09:41:22AM -0400, David Acker wrote:

 Here is a quote from Russell that describes what I believe is the main 
 problem:
 http://www-gatago.com/linux/kernel/15457063.html
 
 Has e100 actually been fixed to use the PCI DMA API correctly yet?
 Looking at it, it doesn't look like it, so until it does, eepro100
 is the far better bet for platforms needing working DMA API.
 
 What I'm talking about is e100's apparant belief that it can modify
 rfd's in the receive ring on a non-cache coherent architecture and
 expect the data around it to remain unaffected (see e100_rx_alloc_skb):
 
 struct rfd {
 u16 status;
 u16 command;
 u32 link;
 u32 rbd;
 u16 actual_size;
 u16 size;
 };
 
 it touches command and link. This means that the whole rfd plus
 maybe the following or preceding 16 bytes get loaded into a cache
 line (assuming cache lines of 32 bytes), and that data written
 out again at sync. However, it does this on what seems to be an
 active receive chain.
 
 So, both the CPU _and_ the device own the same data. Which is a
 violation of the DMA API.
 
 
 I think that the S-bit patch fixes it because the hardware spins on the 
 s-bit instead of using the packet.  With just the el-bit, the hardware 
 tries to use the same cache line that the software is updating.
 
 Can someone from Intel let us know if I understand the hardware's 
 handling of the S and EL bits?  If my interpretation is correct, can the 
 s-bit patch be applied?  It seems like the correct way to lock out the 
 hardware while a packet is being updated.  I have not seen a reason 
 given not to apply the patch.

This is all a while ago now, but wasn't the e100 S-bit patch originally
written by Intel people in response to the very same quote by Russell
King that you've quoted above?  The S-bit patch should probably just
be applied, IMHO.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: I/O memory barriers vs SMP memory barriers

2007-03-28 Thread Lennert Buytenhek
On Mon, Mar 26, 2007 at 01:07:11PM -0700, Paul E. McKenney wrote:

> > > > Does everybody agree on these semantics, though?  At least David
> > > > seems to think that mb/rmb/wmb aren't required to order normal
> > > > memory accesses against each other..
> > > 
> > > Not on UP.  On SMP, ordering is (almost certainly) required.
> > 
> > 'almost certainly'?  That sounds like there is a possibility that it
> > wouldn't have to?  What does this depend on?
> 
> The underlying memory model of the CPU.  For sequentially consistent
> systems, only compiler barriers are required.  There are very few such
> systems -- MIPS and PA-RISC, if I remember correctly.  Performance
> dictates otherwise.
> 
> I believe that MIPS is -not- sequentially consistent, but have not yet
> purchased an architecture reference manual.

ARM Normal memory (RAM) accesses are weakly ordered, so on SMP, you
need barriers.  (SMP ARM systems are the definite minority, though.)

(For ARM UP, we generally don't care, since most have virtual caches
and are not I/O coherent, and so DMA coherent mappings will be done
as uncached mappings, and uncached mappings are strongly ordered --
except on XScale V3, which supports I/O coherency, and so you need to
use barriers when operating on DMA coherent memory because DMA coherent
mappings are done as Normal memory (which is weakly ordered) when I/O
coherency is enabled.)


> Given that ARM device drivers are accessing MMIO locations, which are
> often slow anyway, how much is ARM really gaining by dropping memory
> barriers when only I/O accesses need be ordered?  Is it measurable?

No idea -- I assume Catalin has looked at this.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: I/O memory barriers vs SMP memory barriers

2007-03-28 Thread Lennert Buytenhek
On Mon, Mar 26, 2007 at 01:07:11PM -0700, Paul E. McKenney wrote:

Does everybody agree on these semantics, though?  At least David
seems to think that mb/rmb/wmb aren't required to order normal
memory accesses against each other..
   
   Not on UP.  On SMP, ordering is (almost certainly) required.
  
  'almost certainly'?  That sounds like there is a possibility that it
  wouldn't have to?  What does this depend on?
 
 The underlying memory model of the CPU.  For sequentially consistent
 systems, only compiler barriers are required.  There are very few such
 systems -- MIPS and PA-RISC, if I remember correctly.  Performance
 dictates otherwise.
 
 I believe that MIPS is -not- sequentially consistent, but have not yet
 purchased an architecture reference manual.

ARM Normal memory (RAM) accesses are weakly ordered, so on SMP, you
need barriers.  (SMP ARM systems are the definite minority, though.)

(For ARM UP, we generally don't care, since most have virtual caches
and are not I/O coherent, and so DMA coherent mappings will be done
as uncached mappings, and uncached mappings are strongly ordered --
except on XScale V3, which supports I/O coherency, and so you need to
use barriers when operating on DMA coherent memory because DMA coherent
mappings are done as Normal memory (which is weakly ordered) when I/O
coherency is enabled.)


 Given that ARM device drivers are accessing MMIO locations, which are
 often slow anyway, how much is ARM really gaining by dropping memory
 barriers when only I/O accesses need be ordered?  Is it measurable?

No idea -- I assume Catalin has looked at this.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >