Re: PCI DAC DMA APIs

2007-03-15 Thread Jesse Barnes
On Thursday, March 15, 2007 5:38 am Jan Beulich wrote:
> While the kernel headers provide for this, there don't appear to be
> any in-tree users (which seems contrary to general Linux policies).
> Would there be objections to remove all of these?

It should be safe to kill them, but I remember arguing with davem about 
this stuff in the past...

Jesse
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PCI DAC DMA APIs

2007-03-15 Thread Christoph Hellwig
On Thu, Mar 15, 2007 at 12:38:13PM +0000, Jan Beulich wrote:
> While the kernel headers provide for this, there don't appear to be any
> in-tree users (which seems contrary to general Linux policies). Would there
> be objections to remove all of these?

They should go away.  Having them in for more than five years without
any users is almost a guarantee for bitrot.


Re: [BUGFIX][PATCH] fixing placement of register stack under ulimit -s

2007-03-15 Thread KAMEZAWA Hiroyuki
On Thu, 15 Mar 2007 09:57:28 -0600
"David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:

> But aren't you going to be limited to less than a page worth of
> register-backing store even with your patch applied because the
> backing store will end up overflowing the memory stack?
> 

I think a pthread's stack, which is created by malloc, is also shared
between the register stack and the memory stack.
(glibc's pthread stack is limited by ulimit, too.)

So it seems consistent to treat
stack_size_limit = register_stack_limit + memory_stack_limit.
I'm sorry if I have missed your point.

 --Kame


>   --david
> 
> On 3/15/07, KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> > This patch fixes an ia64 bug in ulimit -s handling. It is against 2.6.21-rc3.
> >
> > The address of the register stack is currently set by this code:
> > == ia64_set_rbs_bot()
> > unsigned long stack_size = current->signal->rlim[RLIMIT_STACK].rlim_max & -16;
> >
> > if (stack_size > MAX_USER_STACK_SIZE)
> > stack_size = MAX_USER_STACK_SIZE;
> > current->thread.rbs_bot = STACK_TOP - stack_size;
> > }
> > ==
> > The default rlim[RLIMIT_STACK].rlim_max is very big.
> >
> > If ulimit -s is used, rlim_max can be small. As a result, rbs_bot can be a
> > very high address. Because of stack address randomization, the
> > "register stack" can end up at a higher address than the "memory stack".
> > == example...
> > [EMAIL PROTECTED] linux-2.6.21-rc2]$ ulimit -s
> > 8192
> > [EMAIL PROTECTED] linux-2.6.21-rc2]$ cat /proc/self/maps
> > -4000 r--p  00:00 0
> > 2000-20038000 r-xp  08:02 5991525 /lib/ld-2.5.so
> > 20044000-2005 rw-p 00034000 08:02 5991525 /lib/ld-2.5.so
> > 20064000-202c8000 r-xp  08:02 5990699 /lib/libc-2.5.so
> > 202c8000-202d4000 ---p 00264000 08:02 5990699 /lib/libc-2.5.so
> > 202d4000-202e rw-p 0026 08:02 5990699 /lib/libc-2.5.so
> > 202e-202ec000 rw-p 202e 00:00 0
> > 202ec000-23a24000 r--p  08:02 9472842 /usr/lib/locale/locale-archive
> > 4000-40008000 r-xp  08:02 4157490 /bin/cat
> > 60004000-6000c000 rw-p 4000 08:02 4157490 /bin/cat
> > 6000c000-6003 rw-p 6000c000 00:00 0 [heap]
> > 6f5f4000-6f648000 rw-p 6f5f4000 00:00 0 [stack]
> > 6f7fc000-6f80 rw-p 6f7fc000 00:00 0 (*)
> > a000-a002 ---p  00:00 0 [vdso]
> > (*) is register stack.
> > ==
> > This inversion of the register stack and the memory stack is not expected.
> > The current ia64 page fault handler doesn't handle this case, so
> > register stack expansion causes a SEGV.
> > This means that the user program can use only 1 page for its register stack.
> >
> > This patch fixes the above case by moving the register stack to a suitable place.
> > Note: fixing the page fault handler seems to be another way, but it is a bit
> > complicated.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
> > ---
> > Index: linux-2.6.21-rc3/arch/ia64/mm/init.c
> > ===
> > --- linux-2.6.21-rc3.orig/arch/ia64/mm/init.c
> > +++ linux-2.6.21-rc3/arch/ia64/mm/init.c
> > @@ -155,7 +155,7 @@ ia64_set_rbs_bot (void)
> >
> > if (stack_size > MAX_USER_STACK_SIZE)
> > stack_size = MAX_USER_STACK_SIZE;
> > -   current->thread.rbs_bot = STACK_TOP - stack_size;
> > +   current->thread.rbs_bot = PAGE_ALIGN(current->mm->start_stack - stack_size);
> >  }
> >
> >  /*
> >
> >
> 
> 
> -- 
> Mosberger Consulting LLC, http://www.mosberger-consulting.com/
> 
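
The address arithmetic discussed in this thread can be checked with a small user-space sketch. It is not the kernel code: the 16 KiB page size, the helper names (sketch_page_align, rbs_bot_before, rbs_bot_after) and the addresses in the note below are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16 KiB page size; the real value depends on the kernel configuration. */
#define SKETCH_PAGE_SIZE 0x4000ULL

/* Round addr up to the next page boundary, in the spirit of PAGE_ALIGN(). */
static uint64_t sketch_page_align(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Before the patch: the bottom is anchored to the fixed top of the stack area. */
static uint64_t rbs_bot_before(uint64_t stack_top, uint64_t stack_size)
{
    assert(stack_size <= stack_top);
    return stack_top - stack_size;
}

/* After the patch: the bottom is anchored to the (randomized) memory stack start. */
static uint64_t rbs_bot_after(uint64_t start_stack, uint64_t stack_size)
{
    assert(stack_size <= start_stack);
    return sketch_page_align(start_stack - stack_size);
}
```

With a hypothetical stack top of 0x800000000000, an 8 MiB limit, and a memory stack start randomized 16 MiB below the top (0x7fffff000000), rbs_bot_before() yields 0x7fffff800000, which lies above the memory stack start, while rbs_bot_after() yields 0x7ffffe800000, which lies below it.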



Re: [PATCH 2/3] FUTEX : introduce private hashtables

2007-03-15 Thread Nick Piggin

Ulrich Drepper wrote:
> On 3/15/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
> > There should be little contention on the memory in the global hash anyway,
> > because we can roughly reduce contention as a factor of
> > hash-size/cacheline-size.
> >
> > What we will have are cache misses on the global table... but we're going to
> > get cache misses on those private tables as well.
>
> I'm thinking about NUMA cases.  If you have private tables for a
> process which is pinned to some cluster in a NUMA machine the table is
> local to the node.  If you have a global table you cannot optimize
> your application for such a situation because at least some of the
> pages of the global table are remote.

That's true, but it also might be able to be improved in other ways.

At least once we get the basic support in the kernel, and glibc picks
it up, then we have a better base to evaluate these more exotic changes
against.

--
SUSE Labs, Novell Inc.


Re: [PATCH 2/3] swsusp: Do not use page flags

2007-03-15 Thread Andrew Morton
On Thu, 15 Mar 2007 22:05:53 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Thursday, 15 March 2007 20:08, Andrew Morton wrote:
> > > On Mon, 12 Mar 2007 22:19:20 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > +int create_basic_memory_bitmaps(void)
> > > +{
> > > + struct memory_bitmap *bm1, *bm2;
> > > + int error = 0;
> > > +
> > > + BUG_ON(forbidden_pages_map || free_pages_map);
> > > +
> > > + bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_ATOMIC);
> > > + if (!bm1)
> > > + return -ENOMEM;
> > > +
> > > + error = memory_bm_create(bm1, GFP_ATOMIC | __GFP_COLD, PG_ANY);
> > > + if (error)
> > > + goto Free_first_object;
> > > +
> > > + bm2 = kzalloc(sizeof(struct memory_bitmap), GFP_ATOMIC);
> > > + if (!bm2)
> > > + goto Free_first_bitmap;
> > > +
> > > + error = memory_bm_create(bm2, GFP_ATOMIC | __GFP_COLD, PG_ANY);
> > > + if (error)
> > 
> > What is the risk that we'll go OOM here?  GFP_ATOMIC is rather unreliable.
> 
> Well, this can be called after processes (including kswapd) have been frozen.
> We can't go to sleep at this point.

So it _is_ unreliable?

> > And why _does_ suspend use GFP_ATOMIC all over the place?
> 
> Generally, because it cannot sleep.

Why not?


Re: NPTL patch for linux 2.4.28

2007-03-15 Thread Syed Ahemed

Hello Willy,

As an afterthought: you had seamlessly backported 2.4.32 and other
patches for me some time back. Please see the article below.
http://lwn.net/Articles/169722/

I could provide you with patch rejects and any other help from the
2.4.20-8 patch (rpm2cpio'ed Red Hat source RPM) to see if we can work
together and provide a patch for other kernel releases too.

As for my problem, it will take some convincing on my part and a
feasibility study to persuade my company to move to a 2.6/RHEL
kernel. While that happens, I have the time and interest to make this
work. Sincerely.


Regards
Syed



On 3/15/07, Syed Ahemed <[EMAIL PROTECTED]> wrote:

Point taken, Sir. Change to 2.6 is inevitable, I believe, although
I shall give the RHEL patch port to 2.4.28 a try over the weekend :-)
Thanks a lot, Willy and Peter, for the help.

On 3/15/07, Willy Tarreau <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 15, 2007 at 08:53:06AM +0100, Peter Zijlstra wrote:
> > On Thu, 2007-03-15 at 03:14 +0530, Syed Ahemed wrote:
> >
> > > Getting RHEL's source ( http://lkml.org/lkml/2005/3/21/380 ) was an
> > > idea i thought about but then a download of the RHEL source from the
> > > following location was denied .
> > > http://download.fedora.redhat.com/pub/fedora/linux/core/1/SRPMS/  and
> > > the rpmfind site.
> > > (Guess need to be a paid subscriber for that right ?)
> >
> > Strangely enough you try to download Fedora Core SRPMs whilst you speak
> > of RHEL. Try this one:
> >
> > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS/kernel-2.4.21-4.EL.src.rpm
> >
> > Also, CentOS would have similar sources. Google could have informed you
> > on that.
> >
> > > I still wonder why there aren't any NPTL patches available in the
> > > non-redhat sites for kernels like 2.4.21 or more .
> >
> > Because most people, especially the ones on this mailing list have moved
> > on to 2.6 quite some time ago. May I suggest you do the same?
>
> ... or they stick to 2.4 for specific uses and do not need NPTL at all, which
> came late in the development cycle.
>
> Regards,
> Willy
>
>


--
Azhar khan

I'm afraid that I've seen too many people fix bugs by looking at
debugger output, and that almost inevitably leads to fixing the
symptoms rather than the underlying problems.

--Linus






Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Andrea Arcangeli
On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <[EMAIL PROTECTED]> wrote:
> > I still think the simple fix of removing the
> > condition is the best approach, but I'm certainly open to alternatives.
> 
> Yes, the problem of falsely activating pages when the file is read in small
> hunks is worse than the problem which your patch fixes.

Really? I would have expected all performance sensitive apps to read
in >=PAGE_SIZE chunks. And if they don't because they split their
dataset in blocks (like some database), it may not be so wrong to
activate those pages that have two "hot" blocks more aggressively than
those pages with a single hot block.

So I have a hard time advocating for the current behavior, but
certainly this can be "fixed" by caching the last_offset, as others
pointed out ;)


Re: [patch 6/13] signalfd/timerfd/asyncfd v5 - timerfd core ...

2007-03-15 Thread Thomas Gleixner
Davide,

On Wed, 2007-03-14 at 15:19 -0700, Davide Libenzi wrote:

> +static int timerfd_tmrproc(struct hrtimer *htmr)
> +{
> + struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
> + int rval = HRTIMER_NORESTART;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ctx->lock, flags);
> + ctx->ticks++;
> + wake_up_locked(&ctx->wqh);
> + if (ctx->tintv.tv64 != 0) {
> + hrtimer_forward(htmr, htmr->base->softirq_time, ctx->tintv);

Sorry, I missed that in the first reviews. Please use
hrtimer_cb_get_time(htmr) instead of htmr->base->softirq_time, so this
is high res timer safe.

> + rval = HRTIMER_RESTART;
> + }
> + spin_unlock_irqrestore(&ctx->lock, flags);
> +
> + return rval;
> +}
> +
> +
> +static int timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
> +  const struct itimerspec *ktmr)
> +{

Make this void; it returns 0 anyway.

> + enum hrtimer_mode htmode;
> +
> + htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_ABS: HRTIMER_REL;
> +
> + ctx->ticks = 0;
> + ctx->clockid = clockid;
> + ctx->flags = flags;
> + ctx->texp = timespec_to_ktime(ktmr->it_value);

clockid is stored in the timer on setup, so no need to store it again.
expiry time and flags are not used after setup.

Please remove those fields.

> + ctx->tintv = timespec_to_ktime(ktmr->it_interval);
> + hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
> + ctx->tmr.expires = ctx->texp;
> + ctx->tmr.function = timerfd_tmrproc;
> + if (ctx->texp.tv64 != 0)
> + hrtimer_start(&ctx->tmr, ctx->texp, htmode);
> +
> + return 0;
> +}
> +
> + if (ufd == -1) {
> + ctx = kmem_cache_alloc(timerfd_ctx_cachep, GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + init_waitqueue_head(&ctx->wqh);
> + spin_lock_init(&ctx->lock);
> + ctx->clockid = -1;
> +
> + error = timerfd_setup(ctx, clockid, flags, );
> + if (error)
> + goto err_ctxfree;

Timer setup can not fail

> + /*
> +  * When we call this, the initialization must be complete, since
> +  * aino_getfd() will install the fd.
> +  */
> + error = aino_getfd(, , , "[timerfd]",
> +_fops, ctx);
> + if (error)
> + goto err_ctxfree;

Again: please turn this around. There is no need to start the timer before
we know that everything works.

> + } else {
> + file = fget(ufd);
> + if (!file)
> + return -EBADF;
> + ctx = file->private_data;
> + if (file->f_op != _fops) {
> + fput(file);
> + return -EINVAL;
> + }
> + /*
> +  * We need to stop the existing timer before reprogramming
> +  * it to the new values.
> +  */
> + for (;;) {
> + spin_lock_irq(&ctx->lock);
> + if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> + break;
> + spin_unlock_irq(&ctx->lock);
> + cpu_relax();
> + }
> + /*
> +  * Re-program the timer to the new value ...
> +  */
> + error = timerfd_setup(ctx, clockid, flags, );

Timer setup can not fail

> + spin_unlock_irq(&ctx->lock);
> + fput(file);
> + if (error)
> + return error;
> + }
> +
> + return ufd;
> +
> +err_ctxfree:
> + timerfd_cleanup(ctx);
> + return error;
> +}
> +
> +
> +static void timerfd_cleanup(struct timerfd_ctx *ctx)
> +{
> + if (ctx->clockid >= 0)
> + hrtimer_cancel(&ctx->tmr);

You don't have a file descriptor when the setup failed, so the timer is
always initialized.

> + kmem_cache_free(timerfd_ctx_cachep, ctx);
> +}
> +
> +
> +static int timerfd_close(struct inode *inode, struct file *file)
> +{
> + timerfd_cleanup(file->private_data);
> + return 0;
> +}
> +

Please move the timerfd_cleanup code into close(). 

tglx
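
The first review comment, about which "now" is handed to hrtimer_forward(), can be illustrated with a simplified user-space model of forwarding a periodic timer. This is a sketch under stated assumptions, not the kernel implementation: struct sketch_timer and sketch_forward() are hypothetical names, times are plain nanosecond integers, and clock resolution is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a periodic timer; times are in nanoseconds. */
struct sketch_timer {
    int64_t expires;
};

/*
 * Advance t->expires past `now` in whole multiples of `interval`.
 * Returns the number of intervals added, or 0 if the timer has not
 * expired yet relative to `now`.
 */
static uint64_t sketch_forward(struct sketch_timer *t, int64_t now,
                               int64_t interval)
{
    int64_t delta = now - t->expires;
    uint64_t overruns = 1;

    assert(interval > 0);
    if (delta < 0)
        return 0;

    if (delta >= interval) {
        /* Skip all intervals that have already fully elapsed. */
        uint64_t whole = (uint64_t)(delta / interval);

        t->expires += (int64_t)whole * interval;
        overruns += whole;
    }
    t->expires += interval;
    return overruns;
}
```

With expires = 100 and interval = 10, forwarding against an accurate now = 135 moves the expiry to 140 and reports 4 intervals. Forwarding against a stale now = 105 moves it only to 110, which is already in the past if the real time is 135. That is the kind of drift the suggestion to use the callback's own notion of time is meant to avoid.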




Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Andrew Morton
On Thu, 15 Mar 2007 22:49:23 +0100
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <[EMAIL PROTECTED]> wrote:
> > > I still think the simple fix of removing the
> > > condition is the best approach, but I'm certainly open to alternatives.
> > 
> > Yes, the problem of falsely activating pages when the file is read in small
> > hunks is worse than the problem which your patch fixes.
> 
> Really? I would have expected all performance sensitive apps to read
> in >=PAGE_SIZE chunks. And if they don't because they split their
> dataset in blocks (like some database), it may not be so wrong to
> activate those pages that have two "hot" blocks more aggressively than
> those pages with a single hot block.

But the problem which is being fixed here is really obscure: an application
repeatedly reading the first page and only the first page of a file, always
via the same fd.

I'd expect that the sub-page-size read scenario happens heaps more often
than that, especially when dealing with larger PAGE_SIZEs.
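
The two scenarios weighed in this thread can be compared with a small user-space model. It is only a sketch: the 4096-byte page size and the function names are assumptions, each read is assumed to stay within one page, and "marking" is reduced to a counter rather than the kernel's mark_page_accessed() logic.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */

/*
 * Count how often a page would be marked accessed when marking happens
 * only if a read lands on a different page than the previous read.
 */
static unsigned long marks_if_page_changes(const unsigned long *offsets,
                                           size_t n)
{
    unsigned long marks = 0;
    unsigned long prev_index = (unsigned long)-1; /* no previous page yet */

    for (size_t i = 0; i < n; i++) {
        unsigned long index = offsets[i] / SKETCH_PAGE_SIZE;

        if (index != prev_index)
            marks++;
        prev_index = index;
    }
    return marks;
}

/* Count marks when every read marks its page unconditionally. */
static unsigned long marks_unconditional(const unsigned long *offsets,
                                         size_t n)
{
    (void)offsets; /* every read counts, whatever its offset */
    return (unsigned long)n;
}
```

Sixteen sequential 512-byte reads covering two pages give 2 marks with the conditional rule and 16 without it, which is the small-hunk over-activation concern. Five reads of offset 0 through the same descriptor give 1 mark with the conditional rule and 5 without it, which is the repeated-first-page case the patch targets.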




[PATCH 13 of 33] IB/ipath - Fix CQ flushing when QP is modified to error state

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID aaee8124e3bc44ec423b5e1c46ef90ede9f21483
# Parent  84a9691cf7ff54ce76de402d2353a451ba9c555b
IB/ipath - Fix CQ flushing when QP is modified to error state

If a receive work request has been removed from the queue but
has not had a CQ entry generated for it and the QP is modified
to the error state, the completion entry generated is
incorrect.  This patch fixes the problem.

Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 84a9691cf7ff -r aaee8124e3bc drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c	Thu Mar 15 14:34:25 2007 -0700
@@ -519,7 +519,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
break;
 
case IB_QPS_ERR:
-   ipath_error_qp(qp, IB_WC_GENERAL_ERR);
+   ipath_error_qp(qp, IB_WC_WR_FLUSH_ERR);
break;
 
default:
diff -r 84a9691cf7ff -r aaee8124e3bc drivers/infiniband/hw/ipath/ipath_ud.c
--- a/drivers/infiniband/hw/ipath/ipath_ud.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c	Thu Mar 15 14:34:25 2007 -0700
@@ -647,6 +647,7 @@ void ipath_ud_rcv(struct ipath_ibdev *de
ipath_skip_sge(&qp->r_sge, sizeof(struct ib_grh));
ipath_copy_sge(&qp->r_sge, data,
   wc.byte_len - sizeof(struct ib_grh));
+   qp->r_wrid_valid = 0;
wc.wr_id = qp->r_wr_id;
wc.status = IB_WC_SUCCESS;
wc.opcode = IB_WC_RECV;


[PATCH 17 of 33] IB/ipath - remove unused register read routine ipath_read_kreg64_port()

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID a023ffe32d9df8cba7d8b15c24e7918eeb236a2c
# Parent  4d22cec2265b606cecee72d5abca4436bb1e6cb7
IB/ipath - remove unused register read routine ipath_read_kreg64_port()

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 4d22cec2265b -r a023ffe32d9d drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Mar 15 14:34:25 2007 -0700
@@ -1804,29 +1804,6 @@ int ipath_set_lid(struct ipath_devdata *
return 0;
 }
 
-/**
- * ipath_read_kreg64_port - read a device's per-port 64-bit kernel register
- * @dd: the infinipath device
- * @regno: the register number to read
- * @port: the port containing the register
- *
- * Registers that vary with the chip implementation constants (port)
- * use this routine.
- */
-u64 ipath_read_kreg64_port(const struct ipath_devdata *dd, ipath_kreg regno,
-  unsigned port)
-{
-   u16 where;
-
-   if (port < dd->ipath_portcnt &&
-   (regno == dd->ipath_kregs->kr_rcvhdraddr ||
-regno == dd->ipath_kregs->kr_rcvhdrtailaddr))
-   where = regno + port;
-   else
-   where = -1;
-
-   return ipath_read_kreg64(dd, where);
-}
 
 /**
  * ipath_write_kreg_port - write a device's per-port 64-bit kernel register
diff -r 4d22cec2265b -r a023ffe32d9d drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Mar 15 14:34:25 2007 -0700
@@ -208,8 +208,8 @@ static const struct ipath_kregs ipath_ht
.kr_serdesstatus = IPATH_KREG_OFFSET(SerdesStatus),
.kr_xgxsconfig = IPATH_KREG_OFFSET(XGXSConfig),
/*
-* These should not be used directly via ipath_read_kreg64(),
-* use them with ipath_read_kreg64_port(),
+* These should not be used directly via ipath_write_kreg64(),
+* use them with ipath_write_kreg64_port(),
 */
.kr_rcvhdraddr = IPATH_KREG_OFFSET(RcvHdrAddr0),
.kr_rcvhdrtailaddr = IPATH_KREG_OFFSET(RcvHdrTailAddr0)
diff -r 4d22cec2265b -r a023ffe32d9d drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Mar 15 14:34:25 2007 -0700
@@ -207,8 +207,8 @@ static const struct ipath_kregs ipath_pe
.kr_ibpllcfg = IPATH_KREG_OFFSET(IBPLLCfg),
 
/*
-* These should not be used directly via ipath_read_kreg64(),
-* use them with ipath_read_kreg64_port()
+* These should not be used directly via ipath_write_kreg64(),
+* use them with ipath_write_kreg64_port(),
 */
.kr_rcvhdraddr = IPATH_KREG_OFFSET(RcvHdrAddr0),
.kr_rcvhdrtailaddr = IPATH_KREG_OFFSET(RcvHdrTailAddr0),
diff -r 4d22cec2265b -r a023ffe32d9d drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Mar 15 14:34:25 2007 -0700
@@ -756,8 +756,6 @@ int ipath_eeprom_write(struct ipath_devd
 /* these are used for the registers that vary with port */
 void ipath_write_kreg_port(const struct ipath_devdata *, ipath_kreg,
   unsigned, u64);
-u64 ipath_read_kreg64_port(const struct ipath_devdata *, ipath_kreg,
-  unsigned);
 
 /*
  * We could have a single register get/put routine, that takes a group type,


[PATCH 29 of 33] IB/ipath - fix unit selection due to all cpu affinity bits set

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 8c4f730dbde3eed6e066ead5be4746d58840f24f
# Parent  b436c73d4fe312c3cba092d5f642de5c0ff6aa91
IB/ipath - fix unit selection due to all cpu affinity bits set

At some point things changed so that all the affinity bits can be
set, but cpus_full() macro is not true.  This caused problems with
the unit selection logic on multi-unit (board) configurations.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
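
The selection rule this patch adjusts can be modelled in user space as follows. This is a sketch only: the function name, the use of an unsigned long as the affinity mask, and the -1 "no preference" return value are assumptions made for illustration, not the driver's interfaces.

```c
#include <assert.h>

/*
 * Pick a preferred unit from a CPU affinity mask (bit i set = CPU i allowed).
 * Returns -1 when there is no preference: empty mask, every online CPU
 * allowed, or no sensible unit count.
 */
static int sketch_preferred_unit(unsigned long allowed, int ncpus,
                                 int npresent)
{
    int curcpu = -1, nset = 0;

    assert(ncpus > 0 && ncpus <= (int)(8 * sizeof(allowed)));
    for (int i = 0; i < ncpus; i++) {
        if (allowed & (1UL << i)) {
            curcpu = i; /* remember the highest allowed CPU */
            nset++;
        }
    }

    /* All CPUs allowed means the process is not really pinned. */
    if (curcpu == -1 || nset == ncpus)
        return -1;
    /* Guard against dividing by zero below. */
    if (npresent <= 0 || npresent > ncpus)
        return -1;

    return curcpu / (ncpus / npresent);
}
```

On a hypothetical 4-CPU machine with 2 units, a mask containing only CPU 1 prefers unit 0, a mask containing only CPU 3 prefers unit 1, and a mask with all four CPUs set yields no preference, which is the case the added nset check covers.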

diff -r b436c73d4fe3 -r 8c4f730dbde3 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Mar 15 14:34:25 2007 -0700
@@ -1592,15 +1592,16 @@ static int find_best_unit(struct file *f
 */
if (!cpus_empty(current->cpus_allowed) &&
!cpus_full(current->cpus_allowed)) {
-   int ncpus = num_online_cpus(), curcpu = -1;
+   int ncpus = num_online_cpus(), curcpu = -1, nset = 0;
for (i = 0; i < ncpus; i++)
if (cpu_isset(i, current->cpus_allowed)) {
ipath_cdbg(PROC, "%s[%u] affinity set for "
-  "cpu %d\n", current->comm,
-  current->pid, i);
+  "cpu %d/%d\n", current->comm,
+  current->pid, i, ncpus);
curcpu = i;
+   nset++;
}
-   if (curcpu != -1) {
+   if (curcpu != -1 && nset != ncpus) {
if (npresent) {
prefunit = curcpu / (ncpus / npresent);
ipath_cdbg(PROC,"%s[%u] %d chips, %d cpus, "


[PATCH 22 of 33] IB/ipath - print better error messages if kernel is misconfigured

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID fe719d50378ce70909f96bd5e7bc8e4f28a5031b
# Parent  68302e9dbd8803f937af9f02ca26a63ff43e9afa
IB/ipath - print better error messages if kernel is misconfigured

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
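
The structural idea of the change below, keeping each case label outside the #ifdef so that a known device always reaches its own branch, can be sketched in user space. Everything here is hypothetical: the device ID values, the SKETCH_CONFIG_* switches and the result enum are stand-ins for illustration, and the real driver returns -ENODEV with a log message rather than an enum.

```c
#include <assert.h>

/* Hypothetical device IDs and config switches, for illustration only. */
#define SKETCH_DEVICE_HT 0x000dU
#define SKETCH_DEVICE_PE 0x0010U
#define SKETCH_CONFIG_PCI_MSI 1
/* SKETCH_CONFIG_HT_IRQ is deliberately left undefined. */

enum sketch_probe_result {
    SKETCH_PROBE_OK,
    SKETCH_PROBE_MISCONFIGURED, /* known device, support compiled out */
    SKETCH_PROBE_UNKNOWN        /* device ID not recognized at all */
};

static enum sketch_probe_result sketch_init_one(unsigned int device)
{
    switch (device) {
    case SKETCH_DEVICE_HT: /* label stays visible in every configuration */
#ifdef SKETCH_CONFIG_HT_IRQ
        return SKETCH_PROBE_OK;
#else
        return SKETCH_PROBE_MISCONFIGURED;
#endif
    case SKETCH_DEVICE_PE:
#ifdef SKETCH_CONFIG_PCI_MSI
        return SKETCH_PROBE_OK;
#else
        return SKETCH_PROBE_MISCONFIGURED;
#endif
    default:
        return SKETCH_PROBE_UNKNOWN;
    }
}

/* Map a probe result to the kind of message a user would want to see. */
static const char *sketch_probe_message(enum sketch_probe_result r)
{
    switch (r) {
    case SKETCH_PROBE_OK:
        return "device initialized";
    case SKETCH_PROBE_MISCONFIGURED:
        return "known device, but a required kernel option is not enabled";
    default:
        return "unknown device id";
    }
}
```

If the case labels were inside the #ifdef blocks instead, the HT device in this configuration would fall through to the default branch and be reported as unknown, which is the less helpful message the patch replaces.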

diff -r 68302e9dbd88 -r fe719d50378c drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Mar 15 14:34:25 2007 -0700
@@ -390,15 +390,23 @@ static int __devinit ipath_init_one(stru
 
/* setup the chip-specific functions, as early as possible. */
switch (ent->device) {
+   case PCI_DEVICE_ID_INFINIPATH_HT:
 #ifdef CONFIG_HT_IRQ
-   case PCI_DEVICE_ID_INFINIPATH_HT:
ipath_init_iba6110_funcs(dd);
break;
+#else
+   ipath_dev_err(dd, "QLogic HT device 0x%x cannot work if "
+ "CONFIG_HT_IRQ is not enabled\n", ent->device);
+   return -ENODEV;
 #endif
+   case PCI_DEVICE_ID_INFINIPATH_PE800:
 #ifdef CONFIG_PCI_MSI
-   case PCI_DEVICE_ID_INFINIPATH_PE800:
ipath_init_iba6120_funcs(dd);
break;
+#else
+   ipath_dev_err(dd, "QLogic PCIE device 0x%x cannot work if "
+ "CONFIG_PCI_MSI is not enabled\n", ent->device);
+   return -ENODEV;
 #endif
default:
ipath_dev_err(dd, "Found unknown QLogic deviceid 0x%x, "


[PATCH 04 of 33] IB/ipath - don't initialize port memory for subports

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID d90faa722f120e0896aff3643a623b1e0c0c69d0
# Parent  e2eec96f356a7269b46a68f29fc5e711d2f5a7a4
IB/ipath - don't initialize port memory for subports

A recent change was made to allocate memory for a port after CPU
affinity is set. That change didn't account for subports and
was trying to allocate memory for the port twice.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r e2eec96f356a -r d90faa722f12 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Mar 15 14:34:24 2007 -0700
@@ -178,8 +178,7 @@ static int ipath_get_base_info(struct fi
 
kinfo->spi_rcvhdr_base = ((u64) pd->subport_rcvhdr_base +
pd->port_rcvhdrq_size * slave) & MMAP64_MASK;
-   kinfo->spi_rcvhdr_tailaddr =
-   (u64) pd->port_rcvhdrqtailaddr_phys & MMAP64_MASK;
+   kinfo->spi_rcvhdr_tailaddr = 0;
kinfo->spi_rcv_egrbufs = ((u64) pd->subport_rcvegrbuf +
dd->ipath_rcvegrcnt * dd->ipath_rcvegrbufsize * slave) &
MMAP64_MASK;
@@ -1443,6 +1442,7 @@ static int init_subports(struct ipath_de
pd->port_subport_cnt = uinfo->spu_subport_cnt;
pd->port_subport_id = uinfo->spu_subport_id;
pd->active_slaves = 1;
+   set_bit(IPATH_PORT_MASTER_UNINIT, &pd->port_flag);
goto bail;
 
 bail_rhdr:
@@ -1764,11 +1764,17 @@ static int ipath_do_user_init(struct fil
  const struct ipath_user_info *uinfo)
 {
int ret;
-   struct ipath_portdata *pd;
+   struct ipath_portdata *pd = port_fp(fp);
struct ipath_devdata *dd;
u32 head32;
 
-   pd = port_fp(fp);
+   /* Subports don't need to initialize anything since master did it. */
+   if (subport_fp(fp)) {
+   ret = wait_event_interruptible(pd->port_wait,
+   !test_bit(IPATH_PORT_MASTER_UNINIT, &pd->port_flag));
+   goto done;
+   }
+
dd = pd->port_dd;
 
if (uinfo->spu_rcvhdrsize) {
@@ -1826,6 +1832,11 @@ static int ipath_do_user_init(struct fil
 dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD);
ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
 dd->ipath_rcvctrl);
+   /* Notify any waiting slaves */
+   if (pd->port_subport_cnt) {
+   clear_bit(IPATH_PORT_MASTER_UNINIT, &pd->port_flag);
+   wake_up(&pd->port_wait);
+   }
 done:
return ret;
 }
diff -r e2eec96f356a -r d90faa722f12 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Mar 15 14:34:24 2007 -0700
@@ -701,6 +701,8 @@ int ipath_set_rx_pol_inv(struct ipath_de
 #define IPATH_PORT_WAITING_RCV   2
/* waiting for a PIO buffer to be available */
 #define IPATH_PORT_WAITING_PIO   3
+   /* master has not finished initializing */
+#define IPATH_PORT_MASTER_UNINIT 4
 
 /* free up any allocated data at closes */
 void ipath_free_data(struct ipath_portdata *dd);


[PATCH 05 of 33] IB/ipath - fix case where SRQ limit event causes CQ entry to be dropped

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID fa38a027a0853a80c4f7dfc50345c89f195bc85b
# Parent  d90faa722f120e0896aff3643a623b1e0c0c69d0
IB/ipath - fix case where SRQ limit event causes CQ entry to be dropped

A silly programming error causes a CQ entry to not be generated if a
SRQ limit event is generated.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r d90faa722f12 -r fa38a027a085 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c   Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c   Thu Mar 15 14:34:24 2007 -0700
@@ -202,6 +202,7 @@ int ipath_get_rwqe(struct ipath_qp *qp, 
wq->tail = tail;
 
ret = 1;
+   qp->r_wrid_valid = 1;
if (handler) {
u32 n;
 
@@ -229,7 +230,6 @@ int ipath_get_rwqe(struct ipath_qp *qp, 
}
}
spin_unlock_irqrestore(>lock, flags);
-   qp->r_wrid_valid = 1;
 
 bail:
return ret;


[PATCH 33 of 33] IB/ipath - fix drift between WCs in user and kernel space

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Robert Walsh <[EMAIL PROTECTED]>
# Date 1173994466 25200
# Node ID e61b0123190cfbc01cc34d1c648d1752a89f8f3d
# Parent  c3b5b279bc90e5758da2ac382cbff4ee0245e84b
IB/ipath - fix drift between WCs in user and kernel space

The kernel ib_wc structure now uses a QP pointer, but the user space
equivalent uses a QP number instead.  This means we can no longer use
a simple structure copy to copy stuff into user space.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
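
Why a plain structure assignment stops working once the two layouts diverge can be shown with trimmed-down stand-ins. The struct and function names below are hypothetical and carry only three fields; they are not the real ib_wc or its user-space counterpart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the real structures. */
struct sketch_qp {
    uint32_t qp_num;
};

struct sketch_kernel_wc { /* kernel side: refers to the QP by pointer */
    uint64_t wr_id;
    uint32_t byte_len;
    struct sketch_qp *qp;
};

struct sketch_user_wc { /* user side: refers to the QP by number */
    uint64_t wr_id;
    uint32_t byte_len;
    uint32_t qp_num;
};

/* Copy field by field, translating the QP pointer into a QP number. */
static void sketch_wc_to_user(struct sketch_user_wc *dst,
                              const struct sketch_kernel_wc *src)
{
    dst->wr_id = src->wr_id;
    dst->byte_len = src->byte_len;
    dst->qp_num = src->qp->qp_num;
}
```

Because the two types differ, `*dst = *src` would not even compile here. In the driver the types were previously identical, which is why the single assignment removed in the diff below used to be enough.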

diff -r c3b5b279bc90 -r e61b0123190c drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c	Thu Mar 15 14:34:26 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c	Thu Mar 15 14:34:26 2007 -0700
@@ -76,7 +76,20 @@ void ipath_cq_enter(struct ipath_cq *cq,
}
return;
}
-   wc->queue[head] = *entry;
+   wc->queue[head].wr_id = entry->wr_id;
+   wc->queue[head].status = entry->status;
+   wc->queue[head].opcode = entry->opcode;
+   wc->queue[head].vendor_err = entry->vendor_err;
+   wc->queue[head].byte_len = entry->byte_len;
+   wc->queue[head].imm_data = (__u32 __force)entry->imm_data;
+   wc->queue[head].qp_num = entry->qp->qp_num;
+   wc->queue[head].src_qp = entry->src_qp;
+   wc->queue[head].wc_flags = entry->wc_flags;
+   wc->queue[head].pkey_index = entry->pkey_index;
+   wc->queue[head].slid = entry->slid;
+   wc->queue[head].sl = entry->sl;
+   wc->queue[head].dlid_path_bits = entry->dlid_path_bits;
+   wc->queue[head].port_num = entry->port_num;
wc->head = next;
 
if (cq->notify == IB_CQ_NEXT_COMP ||
@@ -122,9 +135,30 @@ int ipath_poll_cq(struct ib_cq *ibcq, in
if (tail > (u32) cq->ibcq.cqe)
tail = (u32) cq->ibcq.cqe;
for (npolled = 0; npolled < num_entries; ++npolled, ++entry) {
+   struct ipath_qp *qp;
+
if (tail == wc->head)
break;
-   *entry = wc->queue[tail];
+
+   qp = ipath_lookup_qpn(_idev(cq->ibcq.device)->qp_table,
+ wc->queue[tail].qp_num);
+   entry->qp = >ibqp;
+   if (atomic_dec_and_test(>refcount))
+   wake_up(>wait);
+
+   entry->wr_id = wc->queue[tail].wr_id;
+   entry->status = wc->queue[tail].status;
+   entry->opcode = wc->queue[tail].opcode;
+   entry->vendor_err = wc->queue[tail].vendor_err;
+   entry->byte_len = wc->queue[tail].byte_len;
+   entry->imm_data = wc->queue[tail].imm_data;
+   entry->src_qp = wc->queue[tail].src_qp;
+   entry->wc_flags = wc->queue[tail].wc_flags;
+   entry->pkey_index = wc->queue[tail].pkey_index;
+   entry->slid = wc->queue[tail].slid;
+   entry->sl = wc->queue[tail].sl;
+   entry->dlid_path_bits = wc->queue[tail].dlid_path_bits;
+   entry->port_num = wc->queue[tail].port_num;
if (tail >= cq->ibcq.cqe)
tail = 0;
else
diff -r c3b5b279bc90 -r e61b0123190c drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Mar 15 14:34:26 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Mar 15 14:34:26 2007 -0700
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ipath_layer.h"
 
@@ -188,7 +189,7 @@ struct ipath_cq_wc {
 struct ipath_cq_wc {
u32 head;   /* index of next entry to fill */
u32 tail;   /* index of next ib_poll_cq() entry */
-   struct ib_wc queue[1];  /* this is actually size ibcq.cqe + 1 */
+   struct ib_uverbs_wc queue[1]; /* this is actually size ibcq.cqe + 1 */
 };
 
 /*


[PATCH 30 of 33] IB/ipath - check reserved keys

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Robert Walsh <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID bf280d5f83d59788b58c17ff206bc3f54271a790
# Parent  8c4f730dbde3eed6e066ead5be4746d58840f24f
IB/ipath - check reserved keys

Don't let userspace use the direct-physical-map L-key or R-key.

Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]>
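
A small stand-alone sketch of the check, assuming (as the hunks below
suggest) that key 0 is the reserved direct-physical-map key.  The
demo_* names are made up for illustration and the real key lookup for
non-reserved keys is omitted.

```c
#include <stdint.h>

#define DEMO_RESERVED_KEY 0u	/* assumption: key 0 is the direct-physical-map key */

struct demo_pd {
	int user;	/* nonzero if the protection domain belongs to user space */
};

/*
 * Returns 0 if the key must be rejected outright, 1 if processing may
 * continue (the real driver would go on to validate the key).
 */
static int demo_reserved_key_ok(const struct demo_pd *pd, uint32_t key)
{
	if (key == DEMO_RESERVED_KEY && pd->user)
		return 0;
	return 1;
}
```

Kernel-owned protection domains keep access to the reserved key; only
user-owned ones are refused.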

diff -r 8c4f730dbde3 -r bf280d5f83d5 drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c  Thu Mar 15 14:34:25 2007 -0700
@@ -133,6 +133,12 @@ int ipath_lkey_ok(struct ipath_qp *qp, s
 * being reversible by calling bus_to_virt().
 */
if (sge->lkey == 0) {
+   struct ipath_pd *pd = to_ipd(qp->ibqp.pd);
+
+   if (pd->user) {
+   ret = 0;
+   goto bail;
+   }
isge->mr = NULL;
isge->vaddr = (void *) sge->addr;
isge->length = sge->length;
@@ -206,6 +212,12 @@ int ipath_rkey_ok(struct ipath_qp *qp, s
 * (see ipath_get_dma_mr and ipath_dma.c).
 */
if (rkey == 0) {
+   struct ipath_pd *pd = to_ipd(qp->ibqp.pd);
+
+   if (pd->user) {
+   ret = 0;
+   goto bail;
+   }
sge->mr = NULL;
sge->vaddr = (void *) vaddr;
sge->length = len;


[PATCH 32 of 33] IB/ipath - check that a UD work request's address handle is valid

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Robert Walsh <[EMAIL PROTECTED]>
# Date 1173994466 25200
# Node ID c3b5b279bc90e5758da2ac382cbff4ee0245e84b
# Parent  9f6468cddf59f26e087d100980a11ee9f1af4f56
IB/ipath - check that a UD work request's address handle is valid

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
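
The check amounts to comparing the protection domain of the address
handle with that of the QP.  A minimal sketch with hypothetical demo_*
types (not the real ib_ah / ib_qp structures):

```c
#include <errno.h>

/* Hypothetical stand-ins for the verbs objects involved. */
struct demo_pd {
	int id;
};

struct demo_ah {
	const struct demo_pd *pd;
};

struct demo_qp {
	const struct demo_pd *pd;
};

/* Returns 0 if the address handle may be used with this QP, else -EPERM. */
static int demo_check_ah(const struct demo_qp *qp, const struct demo_ah *ah)
{
	if (ah->pd != qp->pd)
		return -EPERM;
	return 0;
}
```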

diff -r 9f6468cddf59 -r c3b5b279bc90 drivers/infiniband/hw/ipath/ipath_ud.c
--- a/drivers/infiniband/hw/ipath/ipath_ud.cThu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.cThu Mar 15 14:34:26 2007 -0700
@@ -305,6 +305,11 @@ int ipath_post_ud_send(struct ipath_qp *
 
if (!(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) {
ret = 0;
+   goto bail;
+   }
+
+   if (wr->wr.ud.ah->pd != qp->ibqp.pd) {
+   ret = -EPERM;
goto bail;
}
 


[PATCH 31 of 33] IB/ipath - remove duplicate stuff from ipath_verbs.h

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Robert Walsh <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 9f6468cddf59f26e087d100980a11ee9f1af4f56
# Parent  bf280d5f83d59788b58c17ff206bc3f54271a790
IB/ipath - remove duplicate stuff from ipath_verbs.h

ipath_verbs.h declares ipath_cq_enter() more than once; remove the
duplicate declaration.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r bf280d5f83d5 -r 9f6468cddf59 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Mar 15 14:34:25 2007 -0700
@@ -731,8 +731,6 @@ int ipath_query_srq(struct ib_srq *ibsrq
 
 int ipath_destroy_srq(struct ib_srq *ibsrq);
 
-void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig);
-
 int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
 
 struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries,


[PATCH 14 of 33] IB/ipath - fix port sharing on powerpc

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 62da2fb770b66310ac06ba0190bf2bed2a5a764f
# Parent  aaee8124e3bc44ec423b5e1c46ef90ede9f21483
IB/ipath - fix port sharing on powerpc

The port sharing feature mixed kernel virtual addresses with physical
addresses in the offset used to describe the mmap address that maps the
InfiniPath hardware into user space.  The two could conflict on powerpc.
The new scheme converts kernel virtual addresses to physical addresses,
so they don't conflict with chip addresses and yet still fit in 40/44
bits, which means they aren't truncated by 32-bit applications calling
mmap64().

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
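
To illustrate the offset arithmetic, here is a user-space sketch that
turns a page frame number into a byte offset and checks how many bits
it needs.  The 4 KiB page size, the 32-bit frame number and the demo_*
names are assumptions made for the example.  The widening cast before
the shift is also my addition: if the frame number type were only 32
bits wide, shifting first would lose the high bits.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Widen first, then shift, so no high bits are lost. */
static uint64_t demo_pfn_to_offset(uint32_t pfn)
{
	return (uint64_t)pfn << DEMO_PAGE_SHIFT;
}

/* Returns 1 if offset is representable in 'bits' bits (bits < 64). */
static int demo_fits_in_bits(uint64_t offset, unsigned int bits)
{
	return (offset >> bits) == 0;
}
```

With these assumptions, frame number 0x100000 gives offset 0x100000000,
which needs more than 32 bits but is comfortably below the 40-bit limit
mentioned above.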

diff -r aaee8124e3bc -r 62da2fb770b6 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c  Thu Mar 15 14:34:25 2007 -0700
@@ -41,12 +41,6 @@
 #include "ipath_kernel.h"
 #include "ipath_common.h"
 
-/*
- * mmap64 doesn't allow all 64 bits for 32-bit applications
- * so only use the low 43 bits.
- */
-#define MMAP64_MASK0x7FFUL
-
 static int ipath_open(struct inode *, struct file *);
 static int ipath_close(struct inode *, struct file *);
 static ssize_t ipath_write(struct file *, const char __user *, size_t,
@@ -62,6 +56,24 @@ static const struct file_operations ipat
.poll = ipath_poll,
.mmap = ipath_mmap
 };
+
+/*
+ * Convert kernel virtual addresses to physical addresses so they don't
+ * potentially conflict with the chip addresses used as mmap offsets.
+ * It doesn't really matter what mmap offset we use as long as we can
+ * interpret it correctly. 
+ */
+static u64 cvt_kvaddr(void *p)
+{
+   struct page *page;
+   u64 paddr = 0;
+
+   page = vmalloc_to_page(p);
+   if (page)
+   paddr = page_to_pfn(page) << PAGE_SHIFT;
+
+   return paddr;
+}
 
 static int ipath_get_base_info(struct file *fp,
   void __user *ubase, size_t ubase_size)
@@ -173,15 +185,14 @@ static int ipath_get_base_info(struct fi
kinfo->spi_piocnt = dd->ipath_pbufsport / subport_cnt;
kinfo->spi_piobufbase = (u64) pd->port_piobufs +
dd->ipath_palign * kinfo->spi_piocnt * slave;
-   kinfo->__spi_uregbase = ((u64) pd->subport_uregbase +
-   PAGE_SIZE * slave) & MMAP64_MASK;
-
-   kinfo->spi_rcvhdr_base = ((u64) pd->subport_rcvhdr_base +
-   pd->port_rcvhdrq_size * slave) & MMAP64_MASK;
+   kinfo->__spi_uregbase = cvt_kvaddr(pd->subport_uregbase +
+   PAGE_SIZE * slave);
+
+   kinfo->spi_rcvhdr_base = cvt_kvaddr(pd->subport_rcvhdr_base +
+   pd->port_rcvhdrq_size * slave);
kinfo->spi_rcvhdr_tailaddr = 0;
-   kinfo->spi_rcv_egrbufs = ((u64) pd->subport_rcvegrbuf +
-   dd->ipath_rcvegrcnt * dd->ipath_rcvegrbufsize * slave) &
-   MMAP64_MASK;
+   kinfo->spi_rcv_egrbufs = cvt_kvaddr(pd->subport_rcvegrbuf +
+   dd->ipath_rcvegrcnt * dd->ipath_rcvegrbufsize * slave);
}
 
kinfo->spi_pioindex = (kinfo->spi_piobufbase - dd->ipath_piobufbase) /
@@ -199,11 +210,11 @@ static int ipath_get_base_info(struct fi
if (master) {
kinfo->spi_runtime_flags |= IPATH_RUNTIME_MASTER;
kinfo->spi_subport_uregbase =
-   (u64) pd->subport_uregbase & MMAP64_MASK;
+   cvt_kvaddr(pd->subport_uregbase);
kinfo->spi_subport_rcvegrbuf =
-   (u64) pd->subport_rcvegrbuf & MMAP64_MASK;
+   cvt_kvaddr(pd->subport_rcvegrbuf);
kinfo->spi_subport_rcvhdr_base =
-   (u64) pd->subport_rcvhdr_base & MMAP64_MASK;
+   cvt_kvaddr(pd->subport_rcvhdr_base);
ipath_cdbg(PROC, "port %u flags %x %llx %llx %llx\n",
kinfo->spi_port, kinfo->spi_runtime_flags,
(unsigned long long) kinfo->spi_subport_uregbase,
@@ -1131,13 +1142,11 @@ static int mmap_kvaddr(struct vm_area_st
struct ipath_devdata *dd;
void *addr;
size_t size;
-   int ret;
+   int ret = 0;
 
/* If the port is not shared, all addresses should be physical */
-   if (!pd->port_subport_cnt) {
-   ret = -EINVAL;
-   goto bail;
-   }
+   if (!pd->port_subport_cnt)
+   goto bail;
 
dd = pd->port_dd;
size = pd->port_rcvegrbuf_chunks * pd->port_rcvegrbuf_size;
@@ -1149,33 +1158,28 @@ static int mmap_kvaddr(struct vm_area_st
if (subport == 0) {
unsigned num_slaves = pd->port_subport_cnt - 1;
 
-   if (pgaddr == ((u64) pd->subport_uregbase & MMAP64_MASK)) {
+

[PATCH] powerpc: 8xx parenthesis balance

2007-03-15 Thread Mariusz Kozlowski
Hello,

This patch (against 2.6.21-rc3-mm1) balances the parentheses in
powerpc 8xx header files.

Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>

 arch/powerpc/platforms/8xx/mpc86xads.h |2 +-
 arch/powerpc/platforms/8xx/mpc885ads.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff -upr linux-2.6.21-rc3-mm1-a/arch/powerpc/platforms/8xx/mpc86xads.h linux-2.6.21-rc3-mm1-b/arch/powerpc/platforms/8xx/mpc86xads.h
--- linux-2.6.21-rc3-mm1-a/arch/powerpc/platforms/8xx/mpc86xads.h   2007-03-15 22:25:05.0 +0100
+++ linux-2.6.21-rc3-mm1-b/arch/powerpc/platforms/8xx/mpc86xads.h   2007-03-15 22:30:59.0 +0100
@@ -37,7 +37,7 @@
 #define CPM_MAP_ADDR   (get_immrbase() + MPC8xx_CPM_OFFSET)
 #define CPM_IRQ_OFFSET 16 // for compability with cpm_uart driver
 
-#define PCMCIA_MEM_ADDR        (uint)0xff02)
+#define PCMCIA_MEM_ADDR        ((uint)0xff02)
 #define PCMCIA_MEM_SIZE        ((uint)(64 * 1024))
 
 /* Bits of interest in the BCSRs.
diff -upr linux-2.6.21-rc3-mm1-a/arch/powerpc/platforms/8xx/mpc885ads.h linux-2.6.21-rc3-mm1-b/arch/powerpc/platforms/8xx/mpc885ads.h
--- linux-2.6.21-rc3-mm1-a/arch/powerpc/platforms/8xx/mpc885ads.h   2007-03-15 22:25:05.0 +0100
+++ linux-2.6.21-rc3-mm1-b/arch/powerpc/platforms/8xx/mpc885ads.h   2007-03-15 22:31:15.0 +0100
@@ -37,7 +37,7 @@
 #define CPM_MAP_ADDR   (get_immrbase() + MPC8xx_CPM_OFFSET)
 #define CPM_IRQ_OFFSET 16 // for compability with cpm_uart driver
 
-#define PCMCIA_MEM_ADDR        (uint)0xff02)
+#define PCMCIA_MEM_ADDR        ((uint)0xff02)
 #define PCMCIA_MEM_SIZE        ((uint)(64 * 1024))
 
 /* Bits of interest in the BCSRs.

Regards,

Mariusz Kozlowski


[PATCH 16 of 33] IB/ipath - fix RDMA reads of length zero and error handling

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 4d22cec2265b606cecee72d5abca4436bb1e6cb7
# Parent  5ff8f23d0e61169f598ab1d93aa6324d88c17921
IB/ipath - fix RDMA reads of length zero and error handling

Fix RDMA read response length checking for RDMA_READ_RESPONSE_ONLY
to allow a zero length response.
RDMA read responses which don't match the expected length or occur
in response to some other operation should generate a completion
queue error (see table 56, ch. 9.9.2.3).

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
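
The two length checks differ only in whether an empty payload is
acceptable.  A stand-alone sketch follows; the demo_* helpers are
hypothetical, and the pad-count bit position and the 8 bytes of
AETH + ICRC are taken from the hunks below.

```c
#include <stdint.h>

#define DEMO_AETH_ICRC_BYTES 8u	/* AETH header (4) + ICRC (4) */

/* Pad count: bits 20..21 of the first BTH word, already in host order. */
static uint32_t demo_pad_count(uint32_t bth0)
{
	return (bth0 >> 20) & 3;
}

/* "Response only": a payload of zero bytes is acceptable. */
static int demo_resp_only_len_ok(uint32_t tlen, uint32_t hdrsize,
				 uint32_t bth0)
{
	return tlen >= hdrsize + demo_pad_count(bth0) + DEMO_AETH_ICRC_BYTES;
}

/* "Response last": at least one byte of payload is required. */
static int demo_resp_last_len_ok(uint32_t tlen, uint32_t hdrsize,
				 uint32_t bth0)
{
	return tlen > hdrsize + demo_pad_count(bth0) + DEMO_AETH_ICRC_BYTES;
}
```

In the patch, a packet failing either check is routed to ack_len_err
and produces a completion with IB_WC_LOC_LEN_ERR instead of being
silently dropped.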

diff -r 5ff8f23d0e61 -r 4d22cec2265b drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.cThu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.cThu Mar 15 14:34:25 2007 -0700
@@ -1136,7 +1136,7 @@ static inline void ipath_rc_rcv_resp(str
goto ack_done;
hdrsize += 4;
if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
-   goto ack_done;
+   goto ack_op_err;
/*
 * If this is a response to a resent RDMA read, we
 * have to be careful to copy the data to the right
@@ -1154,12 +1154,12 @@ static inline void ipath_rc_rcv_resp(str
goto ack_done;
}
if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
-   goto ack_done;
+   goto ack_op_err;
read_middle:
if (unlikely(tlen != (hdrsize + pmtu + 4)))
-   goto ack_done;
+   goto ack_len_err;
if (unlikely(pmtu >= qp->s_rdma_read_len))
-   goto ack_done;
+   goto ack_len_err;
 
/* We got a response so update the timeout. */
spin_lock(>pending_lock);
@@ -1184,12 +1184,20 @@ static inline void ipath_rc_rcv_resp(str
goto ack_done;
}
if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
-   goto ack_done;
+   goto ack_op_err;
+   /* Get the number of bytes the message was padded by. */
+   pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+   /*
+* Check that the data size is >= 0 && <= pmtu.
+* Remember to account for the AETH header (4) and
+* ICRC (4).
+*/
+   if (unlikely(tlen < (hdrsize + pad + 8)))
+   goto ack_len_err;
/*
 * If this is a response to a resent RDMA read, we
 * have to be careful to copy the data to the right
 * location.
-* XXX should check PSN and wqe opcode first.
 */
qp->s_rdma_read_len = restart_sge(>s_rdma_read_sge,
  wqe, psn, pmtu);
@@ -1203,26 +1211,20 @@ static inline void ipath_rc_rcv_resp(str
goto ack_done;
}
if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
-   goto ack_done;
-   read_last:
-   /*
-* Get the number of bytes the message was padded by.
-*/
+   goto ack_op_err;
+   /* Get the number of bytes the message was padded by. */
pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
/*
 * Check that the data size is >= 1 && <= pmtu.
 * Remember to account for the AETH header (4) and
 * ICRC (4).
 */
-   if (unlikely(tlen <= (hdrsize + pad + 8))) {
-   /* XXX Need to generate an error CQ entry. */
-   goto ack_done;
-   }
+   if (unlikely(tlen <= (hdrsize + pad + 8)))
+   goto ack_len_err;
+   read_last:
tlen -= hdrsize + pad + 8;
-   if (unlikely(tlen != qp->s_rdma_read_len)) {
-   /* XXX Need to generate an error CQ entry. */
-   goto ack_done;
-   }
+   if (unlikely(tlen != qp->s_rdma_read_len))
+   goto ack_len_err;
if (!header_in_data)
aeth = be32_to_cpu(ohdr->u.aeth);
else {
@@ -1236,6 +1238,29 @@ static inline void ipath_rc_rcv_resp(str
 
 ack_done:
spin_unlock_irqrestore(>s_lock, flags);
+   goto bail;
+
+ack_op_err:
+   wc.status = IB_WC_LOC_QP_OP_ERR;
+   goto ack_err;
+
+ack_len_err:
+   wc.status = IB_WC_LOC_LEN_ERR;
+ack_err:
+   wc.wr_id = wqe->wr.wr_id;
+   wc.opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
+   wc.vendor_err = 0;
+   wc.byte_len = 0;
+   wc.imm_data = 0;
+   wc.qp = >ibqp;
+   wc.src_qp 

[PATCH 11 of 33] IB/ipath - Change packet problems vs chip errors handling and reporting

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID c793dc8a526564b73018924a707bcb21052f8f36
# Parent  4050989280f08d81d06642e3d6cf5c3ea4397107
IB/ipath - Change packet problems vs chip errors handling and reporting

Some types of packet errors are moderately common with longer IB
cables and large clusters, and are not reported with prints by
other IB HCA drivers.  This suppresses those messages unless the
new __IPATH_ERRPKTDBG bit is set in ipath_debug.  Reporting
of temporarily disabled frequent error interrupts was also made
clearer.

The wording of the messages now also distinguishes between chip errors
and bad packets sent or received.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
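
A sketch of the classification that the new return value encodes:
0 when only "normal packet error" bits are set, 1 otherwise.  The bit
positions and demo_* names are invented for the example and do not
match the INFINIPATH_E_* values.  demo_clear_flags shows the usual
complement form for clearing a flag so that it is reported only once.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_E_RICRC	(1ull << 0)
#define DEMO_E_RVCRC	(1ull << 1)
#define DEMO_E_REBP	(1ull << 2)
#define DEMO_E_RHDRLEN	(1ull << 8)
#define DEMO_E_PKTERRS	(DEMO_E_RICRC | DEMO_E_RVCRC | DEMO_E_REBP)

/* Returns 0 if err contains only packet errors, otherwise 1. */
static int demo_is_real_error(uint64_t err)
{
	if ((err & DEMO_E_PKTERRS) && !(err & ~DEMO_E_PKTERRS))
		return 0;
	return 1;
}

/* Clear the given flags from err and return the result. */
static uint64_t demo_clear_flags(uint64_t err, uint64_t flags)
{
	return err & ~flags;
}
```

A caller can then use the return value to decide whether the decoded
string deserves an unconditional print or only a debug-level one.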

diff -r 4050989280f0 -r c793dc8a5265 drivers/infiniband/hw/ipath/ipath_debug.h
--- a/drivers/infiniband/hw/ipath/ipath_debug.h Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_debug.h Thu Mar 15 14:34:24 2007 -0700
@@ -57,6 +57,7 @@
 #define __IPATH_PROCDBG 0x100
 /* print mmap/nopage stuff, not using VDBG any more */
 #define __IPATH_MMDBG   0x200
+#define __IPATH_ERRPKTDBG   0x400
 #define __IPATH_USER_SEND   0x1000 /* use user mode send */
 #define __IPATH_KERNEL_SEND 0x2000 /* use kernel mode send */
 #define __IPATH_EPKTDBG 0x4000 /* print ethernet packet data */
diff -r 4050989280f0 -r c793dc8a5265 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c    Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c    Thu Mar 15 14:34:24 2007 -0700
@@ -754,9 +754,42 @@ static int ipath_wait_linkstate(struct i
return (dd->ipath_flags & state) ? 0 : -ETIMEDOUT;
 }
 
-void ipath_decode_err(char *buf, size_t blen, ipath_err_t err)
-{
+/*
+ * Decode the error status into strings, deciding whether to always
+ * print * it or not depending on "normal packet errors" vs everything
+ * else.   Return 1 if "real" errors, otherwise 0 if only packet
+ * errors, so caller can decide what to print with the string.
+ */
+int ipath_decode_err(char *buf, size_t blen, ipath_err_t err)
+{
+   int iserr = 1;
*buf = '\0';
+   if (err & INFINIPATH_E_PKTERRS) {
+   if (!(err & ~INFINIPATH_E_PKTERRS))
+   iserr = 0; // if only packet errors.
+   if (ipath_debug & __IPATH_ERRPKTDBG) {
+   if (err & INFINIPATH_E_REBP)
+   strlcat(buf, "EBP ", blen);
+   if (err & INFINIPATH_E_RVCRC)
+   strlcat(buf, "VCRC ", blen);
+   if (err & INFINIPATH_E_RICRC) {
+   strlcat(buf, "CRC ", blen);
+   // clear for check below, so only once
+   err &= INFINIPATH_E_RICRC; 
+   }
+   if (err & INFINIPATH_E_RSHORTPKTLEN)
+   strlcat(buf, "rshortpktlen ", blen);
+   if (err & INFINIPATH_E_SDROPPEDDATAPKT)
+   strlcat(buf, "sdroppeddatapkt ", blen);
+   if (err & INFINIPATH_E_SPKTLEN)
+   strlcat(buf, "spktlen ", blen);
+   }
+   if ((err & INFINIPATH_E_RICRC) &&
+   !(err&(INFINIPATH_E_RVCRC|INFINIPATH_E_REBP)))
+   strlcat(buf, "CRC ", blen);
+   if (!iserr)
+   goto done;
+   }
if (err & INFINIPATH_E_RHDRLEN)
strlcat(buf, "rhdrlen ", blen);
if (err & INFINIPATH_E_RBADTID)
@@ -767,12 +800,12 @@ void ipath_decode_err(char *buf, size_t 
strlcat(buf, "rhdr ", blen);
if (err & INFINIPATH_E_RLONGPKTLEN)
strlcat(buf, "rlongpktlen ", blen);
-   if (err & INFINIPATH_E_RSHORTPKTLEN)
-   strlcat(buf, "rshortpktlen ", blen);
if (err & INFINIPATH_E_RMAXPKTLEN)
strlcat(buf, "rmaxpktlen ", blen);
if (err & INFINIPATH_E_RMINPKTLEN)
strlcat(buf, "rminpktlen ", blen);
+   if (err & INFINIPATH_E_SMINPKTLEN)
+   strlcat(buf, "sminpktlen ", blen);
if (err & INFINIPATH_E_RFORMATERR)
strlcat(buf, "rformaterr ", blen);
if (err & INFINIPATH_E_RUNSUPVL)
@@ -781,32 +814,20 @@ void ipath_decode_err(char *buf, size_t 
strlcat(buf, "runexpchar ", blen);
if (err & INFINIPATH_E_RIBFLOW)
strlcat(buf, "ribflow ", blen);
-   if (err & INFINIPATH_E_REBP)
-   strlcat(buf, "EBP ", blen);
if (err & INFINIPATH_E_SUNDERRUN)
strlcat(buf, "sunderrun ", blen);
if (err & INFINIPATH_E_SPIOARMLAUNCH)
strlcat(buf, "spioarmlaunch ", blen);
if (err & INFINIPATH_E_SUNEXPERRPKTNUM)

[PATCH 28 of 33] IB/ipath - Don't allow QP's 0 and 1 to be opened multiple times

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID b436c73d4fe312c3cba092d5f642de5c0ff6aa91
# Parent  fddf5d03720ca586054b66d250d84233bdb3bf86
IB/ipath - Don't allow QP's 0 and 1 to be opened multiple times

Signed-off-by: Robert Walsh <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
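
The idea is that QPNs 0 and 1 now go through the same bitmap as every
other QPN, so a second open finds the bit already set.  Below is a
non-atomic user-space sketch of that test-and-set pattern; the kernel's
test_and_set_bit() is atomic, and the demo_* helpers are only
illustrative.

```c
#include <errno.h>
#include <limits.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in: set bit nr and return its previous value. */
static int demo_test_and_set_bit(unsigned int nr, unsigned long *map)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_WORD);
	unsigned long *word = map + nr / DEMO_BITS_PER_WORD;
	int old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Returns the QPN on success, or -EBUSY if it is already taken. */
static int demo_reserve_qpn(unsigned long *map, unsigned int qpn)
{
	if (demo_test_and_set_bit(qpn, map))
		return -EBUSY;
	return (int)qpn;
}
```

Because the reserved QPNs now occupy bitmap bits, the free paths no
longer need the "qp_num > 1" special case, which matches the later
hunks in this patch.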

diff -r fddf5d03720c -r b436c73d4fe3 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.cThu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.cThu Mar 15 14:34:25 2007 -0700
@@ -81,11 +81,51 @@ static u32 credit_table[31] = {
32768   /* 1E */
 };
 
-static u32 alloc_qpn(struct ipath_qp_table *qpt)
+
+static void get_map_page(struct ipath_qp_table *qpt, struct qpn_map *map)
+{
+   unsigned long page = get_zeroed_page(GFP_KERNEL);
+   unsigned long flags;
+
+   /*
+* Free the page if someone raced with us installing it.
+*/
+
+   spin_lock_irqsave(>lock, flags);
+   if (map->page)
+   free_page(page);
+   else
+   map->page = (void *)page;
+   spin_unlock_irqrestore(>lock, flags);
+}
+
+
+static int alloc_qpn(struct ipath_qp_table *qpt, enum ib_qp_type type)
 {
u32 i, offset, max_scan, qpn;
struct qpn_map *map;
-   u32 ret;
+   u32 ret = -1;
+
+   if (type == IB_QPT_SMI)
+   ret = 0;
+   else if (type == IB_QPT_GSI)
+   ret = 1;
+
+   if (ret != -1) {
+   map = >map[0];
+   if (unlikely(!map->page)) {
+   get_map_page(qpt, map);
+   if (unlikely(!map->page)) {
+   ret = -ENOMEM;
+   goto bail;
+   }
+   }
+   if (!test_and_set_bit(ret, map->page))
+   atomic_dec(>n_free);
+   else
+   ret = -EBUSY;
+   goto bail;
+   }
 
qpn = qpt->last + 1;
if (qpn >= QPN_MAX)
@@ -95,19 +135,7 @@ static u32 alloc_qpn(struct ipath_qp_tab
max_scan = qpt->nmaps - !offset;
for (i = 0;;) {
if (unlikely(!map->page)) {
-   unsigned long page = get_zeroed_page(GFP_KERNEL);
-   unsigned long flags;
-
-   /*
-* Free the page if someone raced with us
-* installing it:
-*/
-   spin_lock_irqsave(>lock, flags);
-   if (map->page)
-   free_page(page);
-   else
-   map->page = (void *)page;
-   spin_unlock_irqrestore(>lock, flags);
+   get_map_page(qpt, map);
if (unlikely(!map->page))
break;
}
@@ -151,7 +179,7 @@ static u32 alloc_qpn(struct ipath_qp_tab
qpn = mk_qpn(qpt, map, offset);
}
 
-   ret = 0;
+   ret = -ENOMEM;
 
 bail:
return ret;
@@ -180,29 +208,19 @@ static int ipath_alloc_qpn(struct ipath_
   enum ib_qp_type type)
 {
unsigned long flags;
-   u32 qpn;
int ret;
 
-   if (type == IB_QPT_SMI)
-   qpn = 0;
-   else if (type == IB_QPT_GSI)
-   qpn = 1;
-   else {
-   /* Allocate the next available QPN */
-   qpn = alloc_qpn(qpt);
-   if (qpn == 0) {
-   ret = -ENOMEM;
-   goto bail;
-   }
-   }
-   qp->ibqp.qp_num = qpn;
+   ret = alloc_qpn(qpt, type);
+   if (ret < 0)
+   goto bail;
+   qp->ibqp.qp_num = ret;
 
/* Add the QP to the hash table. */
spin_lock_irqsave(>lock, flags);
 
-   qpn %= qpt->max;
-   qp->next = qpt->table[qpn];
-   qpt->table[qpn] = qp;
+   ret %= qpt->max;
+   qp->next = qpt->table[ret];
+   qpt->table[ret] = qp;
atomic_inc(>refcount);
 
spin_unlock_irqrestore(>lock, flags);
@@ -245,9 +263,7 @@ static void ipath_free_qp(struct ipath_q
if (!fnd)
return;
 
-   /* If QPN is not reserved, mark QPN free in the bitmap. */
-   if (qp->ibqp.qp_num > 1)
-   free_qpn(qpt, qp->ibqp.qp_num);
+   free_qpn(qpt, qp->ibqp.qp_num);
 
wait_event(qp->wait, !atomic_read(>refcount));
 }
@@ -270,8 +286,7 @@ void ipath_free_all_qps(struct ipath_qp_
 
while (qp) {
nqp = qp->next;
-   if (qp->ibqp.qp_num > 1)
-   free_qpn(qpt, qp->ibqp.qp_num);
+   free_qpn(qpt, qp->ibqp.qp_num);
if (!atomic_dec_and_test(>refcount) ||
!ipath_destroy_qp(>ibqp))
  

[PATCH 23 of 33] IB/ipath - Improve handling and reporting of parity errors, mostly cleanup

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 01cde17958018b5262570cd9ea399378f95051e7
# Parent  fe719d50378ce70909f96bd5e7bc8e4f28a5031b
IB/ipath - Improve handling and reporting of parity errors, mostly cleanup

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r fe719d50378c -r 01cde1795801 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c    Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c    Thu Mar 15 14:34:25 2007 -0700
@@ -605,8 +605,9 @@ static void __devexit cleanup_device(str
 
ipath_cdbg(VERBOSE, "Free shadow page tid array at %p\n",
   dd->ipath_pageshadow);
-   vfree(dd->ipath_pageshadow);
+   tmpp = dd->ipath_pageshadow;
dd->ipath_pageshadow = NULL;
+   vfree(tmpp);
}
 
/*
diff -r fe719d50378c -r 01cde1795801 drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c    Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c    Thu Mar 15 14:34:25 2007 -0700
@@ -626,6 +626,10 @@ void ipath_get_eeprom_info(struct ipath_
} else
memcpy(dd->ipath_serial, ifp->if_serial,
   sizeof ifp->if_serial);
+   if (!strstr(ifp->if_comment, "Tested successfully"))
+   ipath_dev_err(dd, "Board SN %s did not pass functional "
+   "test: %s\n", dd->ipath_serial,
+   ifp->if_comment);
 
ipath_cdbg(VERBOSE, "Initted GUID to %llx from eeprom\n",
   (unsigned long long) be64_to_cpu(dd->ipath_guid));
diff -r fe719d50378c -r 01cde1795801 drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c   Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c   Thu Mar 15 14:34:25 2007 -0700
@@ -284,6 +284,14 @@ static const struct ipath_cregs ipath_ht
 #define INFINIPATH_EXTS_MEMBIST_ENDTEST 0x4000
 #define INFINIPATH_EXTS_MEMBIST_CORRECT 0x8000
 
+
+/* TID entries (memory), HT-only */
+#define INFINIPATH_RT_ADDR_MASK 0xFFULL/* 40 bits valid */
+#define INFINIPATH_RT_VALID 0x8000ULL
+#define INFINIPATH_RT_ADDR_SHIFT 0
+#define INFINIPATH_RT_BUFSIZE_MASK 0x3FFFULL
+#define INFINIPATH_RT_BUFSIZE_SHIFT 48
+
 /*
  * masks and bits that are different in different chips, or present only
  * in one
@@ -402,6 +410,14 @@ static const struct ipath_hwerror_msgs i
INFINIPATH_HWE_MSG(SERDESPLLFAILED, "SerDes PLL"),
 };
 
+#define TXE_PIO_PARITY ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | \
+   INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) \
+   << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)
+#define RXE_EAGER_PARITY (INFINIPATH_HWE_RXEMEMPARITYERR_EAGERTID \
+ << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)
+
+static int ipath_ht_txe_recover(struct ipath_devdata *);
+
 /**
  * ipath_ht_handle_hwerrors - display hardware errors.
  * @dd: the infinipath device
@@ -450,13 +466,12 @@ static void ipath_ht_handle_hwerrors(str
 
/*
 * make sure we get this much out, unless told to be quiet,
+* it's a parity error we may recover from,
 * or it's occurred within the last 5 seconds
 */
-   if ((hwerrs & ~(dd->ipath_lasthwerror |
-   ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
- INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
-   << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) ||
-   (ipath_debug & __IPATH_VERBDBG))
+   if ((hwerrs & ~(dd->ipath_lasthwerror | TXE_PIO_PARITY |
+   RXE_EAGER_PARITY)) ||
+   (ipath_debug & __IPATH_VERBDBG))
dev_info(>pcidev->dev, "Hardware error: hwerr=0x%llx "
 "(cleared)\n", (unsigned long long) hwerrs);
dd->ipath_lasthwerror |= hwerrs;
@@ -467,7 +482,7 @@ static void ipath_ht_handle_hwerrors(str
  (hwerrs & ~dd->ipath_hwe_bitsextant));
 
ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control);
-   if (ctrl & INFINIPATH_C_FREEZEMODE) {
+   if ((ctrl & INFINIPATH_C_FREEZEMODE) && !ipath_diag_inuse) {
/*
 * parity errors in send memory are recoverable,
 * just cancel the send (if indicated in * sendbuffererror),
@@ -476,50 +491,14 @@ static void ipath_ht_handle_hwerrors(str
 * occur if a processor speculative read is done to the PIO
 * buffer while we are sending a packet, for example.
 */
-   if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
-  INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
- 

Re: [PATCH 2/3] swsusp: Do not use page flags

2007-03-15 Thread Jiri Kosina
On Thu, 15 Mar 2007, Andrew Morton wrote:

> > > And why _does_ suspend use GFP_ATOMIC all over the place?
> > Generally, because it cannot sleep.
> Why not?

I guess it's simply because kswapd is already frozen, so once a 
GFP_KERNEL allocation goes to sleep there is no chance that it will 
eventually get any free pages ... ?

-- 
Jiri Kosina


[PATCH 12 of 33] IB/ipath - fix bad argument to clear_bit that trashed memory and/or crashed

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 84a9691cf7ff54ce76de402d2353a451ba9c555b
# Parent  c793dc8a526564b73018924a707bcb21052f8f36
IB/ipath - fix bad argument to clear_bit that trashed memory and/or crashed

Code was converted from a &= ~mask to clear_bit, but the bit was left shifted
instead of being used directly, so we were either trashing memory several
pages away, or sometimes taking a kernel page fault on an invalid page.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
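
To show why passing a mask instead of a bit number goes so badly wrong,
here is a non-atomic user-space model of clear_bit().  The demo_*
helpers are hypothetical; the real clear_bit() is atomic, but it
selects the word to modify from the bit number in the same way.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Index of the word, relative to addr, that holds bit number nr. */
static size_t demo_bit_word(unsigned long nr)
{
	return nr / DEMO_BITS_PER_WORD;
}

/* Non-atomic model of clear_bit(): nr is a bit NUMBER, not a mask. */
static void demo_clear_bit(unsigned long nr, unsigned long *addr)
{
	addr[demo_bit_word(nr)] &= ~(1UL << (nr % DEMO_BITS_PER_WORD));
}
```

With an illustrative shift of 20, demo_bit_word(1UL << 20) selects a
word 128 KiB past addr, whereas demo_bit_word(20) is 0 and stays inside
the intended word.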

diff -r c793dc8a5265 -r 84a9691cf7ff drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c  Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c  Thu Mar 15 14:34:25 2007 -0700
@@ -842,11 +842,10 @@ static void handle_urcv(struct ipath_dev
struct ipath_portdata *pd = dd->ipath_pd[i];
if (portr & (1 << i) && pd && pd->port_cnt &&
test_bit(IPATH_PORT_WAITING_RCV, >port_flag)) {
-   int rcbit;
clear_bit(IPATH_PORT_WAITING_RCV,
  >port_flag);
-   rcbit = i + INFINIPATH_R_INTRAVAIL_SHIFT;
-   clear_bit(1UL << rcbit, >ipath_rcvctrl);
+   clear_bit(i + INFINIPATH_R_INTRAVAIL_SHIFT,
+ >ipath_rcvctrl);
wake_up_interruptible(>port_wait);
rcvdint = 1;
}


[PATCH 10 of 33] IB/ipath - fix PSN update for RC retries

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID 4050989280f08d81d06642e3d6cf5c3ea4397107
# Parent  ec38d8f91d79a765cf53aaa7e8a59622418f2c9f
IB/ipath - fix PSN update for RC retries

This patch fixes a number of bugs with updating the PSN for retries of
RC requests.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>
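
PSNs are 24-bit sequence numbers that wrap, so ordering has to be
decided modulo 2^24.  The sketch below shows one portable way to do
that in user space.  It is not necessarily how ipath_cmp24() is
implemented; the demo_* names and the mask macro are assumptions for
the example.

```c
#include <stdint.h>

#define DEMO_PSN_MASK 0xFFFFFFu	/* PSNs are 24 bits wide */

/*
 * Circular comparison of two 24-bit sequence numbers.
 * Returns <0 if a is before b, 0 if equal, >0 if a is after b.
 */
static int32_t demo_cmp24(uint32_t a, uint32_t b)
{
	uint32_t diff = (a - b) & DEMO_PSN_MASK;

	if (diff & 0x800000u)			/* bit 23 set: negative */
		return (int32_t)diff - 0x1000000;
	return (int32_t)diff;
}

/* Number of PSNs from 'from' up to 'to', allowing for wraparound. */
static uint32_t demo_psn_distance(uint32_t from, uint32_t to)
{
	return (to - from) & DEMO_PSN_MASK;
}
```

For example, PSN 1 counts as two steps after PSN 0xFFFFFF, which a
plain subtraction of the 24-bit values cast to int would report as a
large negative number.  demo_psn_distance mirrors the masked
subtraction used for n_rc_resends in the hunk below.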

diff -r ec38d8f91d79 -r 4050989280f0 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.cThu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.cThu Mar 15 14:34:24 2007 -0700
@@ -444,7 +444,7 @@ int ipath_make_rc_req(struct ipath_qp *q
qp->s_psn = wqe->lpsn + 1;
else {
qp->s_psn++;
-   if ((int)(qp->s_psn - qp->s_next_psn) > 0)
+   if (ipath_cmp24(qp->s_psn, qp->s_next_psn) > 0)
qp->s_next_psn = qp->s_psn;
}
/*
@@ -471,7 +471,7 @@ int ipath_make_rc_req(struct ipath_qp *q
/* FALLTHROUGH */
case OP(SEND_MIDDLE):
bth2 = qp->s_psn++ & IPATH_PSN_MASK;
-   if ((int)(qp->s_psn - qp->s_next_psn) > 0)
+   if (ipath_cmp24(qp->s_psn, qp->s_next_psn) > 0)
qp->s_next_psn = qp->s_psn;
ss = &qp->s_sge;
len = qp->s_len;
@@ -507,7 +507,7 @@ int ipath_make_rc_req(struct ipath_qp *q
/* FALLTHROUGH */
case OP(RDMA_WRITE_MIDDLE):
bth2 = qp->s_psn++ & IPATH_PSN_MASK;
-   if ((int)(qp->s_psn - qp->s_next_psn) > 0)
+   if (ipath_cmp24(qp->s_psn, qp->s_next_psn) > 0)
qp->s_next_psn = qp->s_psn;
ss = &qp->s_sge;
len = qp->s_len;
@@ -546,7 +546,7 @@ int ipath_make_rc_req(struct ipath_qp *q
qp->s_state = OP(RDMA_READ_REQUEST);
hwords += sizeof(ohdr->u.rc.reth) / sizeof(u32);
bth2 = qp->s_psn++ & IPATH_PSN_MASK;
-   if ((int)(qp->s_psn - qp->s_next_psn) > 0)
+   if (ipath_cmp24(qp->s_psn, qp->s_next_psn) > 0)
qp->s_next_psn = qp->s_psn;
ss = NULL;
len = 0;
@@ -779,7 +779,7 @@ void ipath_restart_rc(struct ipath_qp *q
if (wqe->wr.opcode == IB_WR_RDMA_READ)
dev->n_rc_resends++;
else
-   dev->n_rc_resends += (int)qp->s_psn - (int)psn;
+   dev->n_rc_resends += (qp->s_psn - psn) & IPATH_PSN_MASK;
 
reset_psn(qp, psn);
tasklet_hi_schedule(&qp->s_task);
@@ -915,15 +915,19 @@ static int do_rc_ack(struct ipath_qp *qp
if (qp->s_last == qp->s_cur) {
if (++qp->s_cur >= qp->s_size)
qp->s_cur = 0;
+   qp->s_last = qp->s_cur;
+   if (qp->s_last == qp->s_tail)
+   break;
wqe = get_swqe_ptr(qp, qp->s_cur);
qp->s_state = OP(SEND_LAST);
qp->s_psn = wqe->psn;
-   }
-   if (++qp->s_last >= qp->s_size)
-   qp->s_last = 0;
-   wqe = get_swqe_ptr(qp, qp->s_last);
-   if (qp->s_last == qp->s_tail)
-   break;
+   } else {
+   if (++qp->s_last >= qp->s_size)
+   qp->s_last = 0;
+   if (qp->s_last == qp->s_tail)
+   break;
+   wqe = get_swqe_ptr(qp, qp->s_last);
+   }
}
 
switch (aeth >> 29) {
@@ -935,6 +939,18 @@ static int do_rc_ack(struct ipath_qp *qp
list_add_tail(&qp->timerwait,
  &dev->pending[dev->pending_index]);
spin_unlock(&dev->pending_lock);
+   /*
+* If we get a partial ACK for a resent operation,
+* we can stop resending the earlier packets and
+* continue with the next packet the receiver wants.
+*/
+   if (ipath_cmp24(qp->s_psn, psn) <= 0) {
+   reset_psn(qp, psn + 1);
+   tasklet_hi_schedule(&qp->s_task);
+   }
+   } else if (ipath_cmp24(qp->s_psn, psn) <= 0) {
+   qp->s_state = OP(SEND_LAST);
+   qp->s_psn = psn + 1;
}
ipath_get_credit(qp, aeth);
qp->s_rnr_retry = qp->s_rnr_retry_cnt;
@@ -945,22 +961,23 @@ static int do_rc_ack(struct ipath_qp *qp
 
case 1: /* RNR NAK */
dev->n_rnr_naks++;
+   if (qp->s_last == qp->s_tail)
+   goto bail;
if 
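
The comparisons this patch changes rely on PSNs being 24-bit values that wrap. A minimal sketch of the idea follows, assuming a 24-bit circular sequence space; sketch_cmp24(), naive_cmp() and sketch_psn_distance() are illustrative names and are not claimed to match the driver's ipath_cmp24() implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PSN_MASK 0xFFFFFFu	/* PSNs occupy the low 24 bits */

/*
 * Circular compare of two 24-bit sequence numbers.  Shifting the
 * difference left by 8 moves bit 23 into the sign bit, so the result
 * is <0, 0 or >0 according to the 24-bit wraparound order.
 */
static int sketch_cmp24(uint32_t a, uint32_t b)
{
	return (int32_t)((a - b) << 8);
}

/* The replaced form: a plain 32-bit signed difference. */
static int naive_cmp(uint32_t a, uint32_t b)
{
	return (int)(a - b);
}

/* Number of packets from b up to a, modulo 2^24. */
static uint32_t sketch_psn_distance(uint32_t a, uint32_t b)
{
	return (a - b) & SKETCH_PSN_MASK;
}
```

With a = 1 and b = 0xFFFFFF (a is just past the wrap point), naive_cmp() reports a as older while sketch_cmp24() reports it as newer. The masked distance serves the same purpose for the n_rc_resends counter, which otherwise could be incremented by a negative or very large value.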

[PATCH 09 of 33] IB/ipath - fix QP error completion queue entries

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID ec38d8f91d79a765cf53aaa7e8a59622418f2c9f
# Parent  187ff5af5e5dd2b1f2ca48ba6ad0056ce7fc7403
IB/ipath - fix QP error completion queue entries

When switching to the QP error state, the completion queue entries
(error or flush) were not being generated correctly.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 187ff5af5e5d -r ec38d8f91d79 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c  Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c  Thu Mar 15 14:34:24 2007 -0700
@@ -361,7 +361,7 @@ static void ipath_reset_qp(struct ipath_
  * @err: the receive completion error to signal if a RWQE is active
  *
  * Flushes both send and receive work queues.
- * QP s_lock should be held and interrupts disabled.
+ * The QP s_lock should be held and interrupts disabled.
  */
 
 void ipath_error_qp(struct ipath_qp *qp, enum ib_wc_status err)
@@ -393,6 +393,8 @@ void ipath_error_qp(struct ipath_qp *qp,
wc.port_num = 0;
if (qp->r_wrid_valid) {
qp->r_wrid_valid = 0;
+   wc.wr_id = qp->r_wr_id;
+   wc.opcode = IB_WC_RECV;
wc.status = err;
ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1);
}
@@ -972,7 +974,7 @@ bail:
  * @wc: the WC responsible for putting the QP in this state
  *
  * Flushes the send work queue.
- * The QP s_lock should be held.
+ * The QP s_lock should be held and interrupts disabled.
  */
 
 void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc)
@@ -998,12 +1000,12 @@ void ipath_sqerror_qp(struct ipath_qp *q
wc->status = IB_WC_WR_FLUSH_ERR;
 
while (qp->s_last != qp->s_head) {
+   wqe = get_swqe_ptr(qp, qp->s_last);
wc->wr_id = wqe->wr.wr_id;
wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1);
if (++qp->s_last >= qp->s_size)
qp->s_last = 0;
-   wqe = get_swqe_ptr(qp, qp->s_last);
}
qp->s_cur = qp->s_tail = qp->s_head;
qp->state = IB_QPS_SQE;
diff -r 187ff5af5e5d -r ec38d8f91d79 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c  Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c  Thu Mar 15 14:34:24 2007 -0700
@@ -895,8 +895,10 @@ static int do_rc_ack(struct ipath_qp *qp
wc.opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
wc.vendor_err = 0;
wc.byte_len = wqe->length;
+   wc.imm_data = 0;
wc.qp = &qp->ibqp;
wc.src_qp = qp->remote_qpn;
+   wc.wc_flags = 0;
wc.pkey_index = 0;
wc.slid = qp->remote_ah_attr.dlid;
wc.sl = qp->remote_ah_attr.sl;


[PATCH 24 of 33] IB/ipath - fix driver crash (in interrupt or during unload) after chip reset

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Michael Albaugh <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 3e81a6b18b42bbe6dffab382fb26d754dfdf83a1
# Parent  01cde17958018b5262570cd9ea399378f95051e7
IB/ipath - fix driver crash (in interrupt or during unload) after chip reset

Re-init of the kernel structures after a chip reset was leaving the
portdata structure for port zero in an inconsistent state, and a pointer
to it either stale (in re-init code) or NULL (in devdata).  Fixing the
order of operations on this struct, and the condition for interrupt
access, prevents the crashes.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 01cde1795801 -r 3e81a6b18b42 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Mar 15 14:34:25 2007 -0700
@@ -216,6 +216,20 @@ static int bringup_link(struct ipath_dev
return ret;
 }
 
+static struct ipath_portdata *create_portdata0(struct ipath_devdata *dd)
+{
+   struct ipath_portdata *pd = NULL;
+
+   pd = kzalloc(sizeof(*pd), GFP_KERNEL);
+   if (pd) {
+   pd->port_dd = dd;
+   pd->port_cnt = 1;
+   /* The port 0 pkey table is used by the layer interface. */
+   pd->port_pkeys[0] = IPATH_DEFAULT_P_KEY;
+   }
+   return pd;
+}
+
 static int init_chip_first(struct ipath_devdata *dd,
   struct ipath_portdata **pdp)
 {
@@ -271,20 +285,16 @@ static int init_chip_first(struct ipath_
goto done;
}
 
-   dd->ipath_pd[0] = kzalloc(sizeof(*pd), GFP_KERNEL);
-
-   if (!dd->ipath_pd[0]) {
+   pd = create_portdata0(dd);
+
+   if (!pd) {
ipath_dev_err(dd, "Unable to allocate portdata for port "
  "0, failing\n");
ret = -ENOMEM;
goto done;
}
-   pd = dd->ipath_pd[0];
-   pd->port_dd = dd;
-   pd->port_port = 0;
-   pd->port_cnt = 1;
-   /* The port 0 pkey table is used by the layer interface. */
-   pd->port_pkeys[0] = IPATH_DEFAULT_P_KEY;
+   dd->ipath_pd[0] = pd;
+
dd->ipath_rcvtidcnt =
ipath_read_kreg32(dd, dd->ipath_kregs->kr_rcvtidcnt);
dd->ipath_rcvtidbase =
@@ -838,11 +848,24 @@ int ipath_init_chip(struct ipath_devdata
 * Set up the port 0 (kernel) rcvhdr q and egr TIDs.  If doing
 * re-init, the simplest way to handle this is to free
 * existing, and re-allocate.
+* Need to re-create rest of port 0 portdata as well.
 */
if (reinit) {
-   struct ipath_portdata *pd = dd->ipath_pd[0];
-   dd->ipath_pd[0] = NULL;
-   ipath_free_pddata(dd, pd);
+   /* Alloc and init new ipath_portdata for port0,
+* Then free old pd. Could lead to fragmentation, but also
+* makes later support for hot-swap easier.
+*/
+   struct ipath_portdata *npd;
+   npd = create_portdata0(dd);
+   if (npd) {
+   ipath_free_pddata(dd, pd);
+   dd->ipath_pd[0] = pd = npd;
+   } else {
+   ipath_dev_err(dd, "Unable to allocate portdata for"
+ "  port 0, failing\n");
+   ret = -ENOMEM;
+   goto done;
+   }
}
dd->ipath_f_tidtemplate(dd);
ret = ipath_create_rcvhdrq(dd, pd);
diff -r 01cde1795801 -r 3e81a6b18b42 drivers/infiniband/hw/ipath/ipath_stats.c
--- a/drivers/infiniband/hw/ipath/ipath_stats.c Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_stats.c Thu Mar 15 14:34:25 2007 -0700
@@ -207,7 +207,7 @@ void ipath_get_faststats(unsigned long o
 * don't access the chip while running diags, or memory diags can
 * fail
 */
-   if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT) ||
+   if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_INITTED) ||
ipath_diag_inuse)
/* but re-arm the timer, for diags case; won't hurt other */
goto done;
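
The re-init ordering described in this patch (build the replacement first, only then drop the old structure) can be sketched in userspace as below. This is an illustration under stated assumptions, not the driver code: struct sketch_portdata, struct sketch_devdata and both functions are hypothetical, and calloc()/free() stand in for kzalloc() and ipath_free_pddata().

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_portdata { int port_cnt; };
struct sketch_devdata { struct sketch_portdata *pd0; };

/* Allocate and initialize a fresh port 0 structure, or return NULL. */
static struct sketch_portdata *sketch_create_portdata0(void)
{
	struct sketch_portdata *pd = calloc(1, sizeof(*pd));

	if (pd)
		pd->port_cnt = 1;
	return pd;
}

/*
 * Replace port 0 data on re-init.  The new structure is created before
 * the old one is freed, so the device pointer is never left NULL or
 * stale.  Returns 0 on success, -1 if the allocation failed (the old
 * structure is then left in place).
 */
static int sketch_reinit_port0(struct sketch_devdata *dd)
{
	struct sketch_portdata *npd = sketch_create_portdata0();

	if (!npd)
		return -1;
	free(dd->pd0);
	dd->pd0 = npd;
	return 0;
}
```

The point of the ordering is that any code path reading dd->pd0 sees either the old, fully initialized structure or the new one, never a NULL pointer in between.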


[PATCH 26 of 33] IB/ipath - prevent random program use of diags interface

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 284c34f2a16f7cb4fe48a2f6fbe9ad4beea5
# Parent  e9895e2ad504a2590b0943c037d1fa5f9568fda3
IB/ipath - prevent random program use of diags interface

To prevent random utility reads and writes of the diag interface to the
chip, we first require a handshake of reading from offset 0 and writing
to offset 0 before any other reads or writes can be done through the
diags device.   Otherwise chip errors can be triggered.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r e9895e2ad504 -r 284c34f2addd drivers/infiniband/hw/ipath/ipath_diag.c
--- a/drivers/infiniband/hw/ipath/ipath_diag.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c  Thu Mar 15 14:34:25 2007 -0700
@@ -296,7 +296,7 @@ static int ipath_diag_open(struct inode 
}
 
fp->private_data = dd;
-   ipath_diag_inuse = 1;
+   ipath_diag_inuse = -2;
diag_set_link = 0;
ret = 0;
 
@@ -461,6 +461,8 @@ static ssize_t ipath_diag_read(struct fi
else if ((count % 4) || (*off % 4))
/* address or length is not 32-bit aligned, hence invalid */
ret = -EINVAL;
+   else if (ipath_diag_inuse < 1 && (*off || count != 8))
+   ret = -EINVAL;  /* prevent cat /dev/ipath_diag* */
else if ((count % 8) || (*off % 8))
/* address or length not 64-bit aligned; do 32-bit reads */
ret = ipath_read_umem32(dd, data, kreg_base + *off, count);
@@ -470,6 +472,8 @@ static ssize_t ipath_diag_read(struct fi
if (ret >= 0) {
*off += count;
ret = count;
+   if (ipath_diag_inuse == -2)
+   ipath_diag_inuse++;
}
 
return ret;
@@ -489,6 +493,9 @@ static ssize_t ipath_diag_write(struct f
else if ((count % 4) || (*off % 4))
/* address or length is not 32-bit aligned, hence invalid */
ret = -EINVAL;
+   else if ((ipath_diag_inuse == -1 && (*off || count != 8)) ||
+ipath_diag_inuse == -2)  /* read qw off 0, write qw off 0 */
+   ret = -EINVAL;  /* before any other write allowed */
else if ((count % 8) || (*off % 8))
/* address or length not 64-bit aligned; do 32-bit writes */
ret = ipath_write_umem32(dd, kreg_base + *off, data, count);
@@ -498,7 +505,9 @@ static ssize_t ipath_diag_write(struct f
if (ret >= 0) {
*off += count;
ret = count;
-   }
-
-   return ret;
-}
+   if (ipath_diag_inuse == -1)
+   ipath_diag_inuse = 1; /* all read/write OK now */
+   }
+
+   return ret;
+}
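
The handshake in this patch is a three-state gate driven by ipath_diag_inuse (-2 after open, -1 after the first read, 1 once unlocked). A minimal sketch of that state machine follows, modelled on the conditions in the diff; struct diag_gate and the diag_gate_*() helpers are illustrative names, and the sketch leaves out the alignment checks and the actual register access.

```c
#include <assert.h>
#include <stddef.h>

enum { DIAG_NEED_READ = -2, DIAG_NEED_WRITE = -1, DIAG_UNLOCKED = 1 };

struct diag_gate { int state; };

static void diag_gate_open(struct diag_gate *g)
{
	g->state = DIAG_NEED_READ;
}

/* Returns 0 if the read would be allowed, -1 if it is rejected. */
static int diag_gate_read(struct diag_gate *g, long off, size_t count)
{
	/* Until unlocked, only an 8-byte read at offset 0 is accepted. */
	if (g->state < 1 && (off != 0 || count != 8))
		return -1;
	if (g->state == DIAG_NEED_READ)
		g->state = DIAG_NEED_WRITE;
	return 0;
}

/* Returns 0 if the write would be allowed, -1 if it is rejected. */
static int diag_gate_write(struct diag_gate *g, long off, size_t count)
{
	/* No write at all before the first read. */
	if (g->state == DIAG_NEED_READ)
		return -1;
	/* The first write must be 8 bytes at offset 0. */
	if (g->state == DIAG_NEED_WRITE && (off != 0 || count != 8))
		return -1;
	if (g->state == DIAG_NEED_WRITE)
		g->state = DIAG_UNLOCKED;
	return 0;
}
```

A stray "cat" of the device node fails at the first read because it does not ask for exactly 8 bytes, so it never advances the gate.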


[PATCH 27 of 33] IB/ipath - cleaner shutdown at driver unload, disable IB link earlier

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID fddf5d03720ca586054b66d250d84233bdb3bf86
# Parent  284c34f2a16f7cb4fe48a2f6fbe9ad4beea5
IB/ipath - cleaner shutdown at driver unload, disable IB link earlier

Moved the code that shuts down the IB link earlier in the unload process, to
be sure no new packets can arrive while we are unloading.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 284c34f2addd -r fddf5d03720c drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:25 2007 -0700
@@ -536,8 +536,6 @@ static void __devexit cleanup_device(str
 {
int port;
 
-   ipath_shutdown_device(dd);
-
if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) {
/* can't do anything more with chip; needs re-init */
*dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT;
@@ -633,6 +631,12 @@ static void __devexit ipath_remove_one(s
struct ipath_devdata *dd = pci_get_drvdata(pdev);
 
ipath_cdbg(VERBOSE, "removing, pdev=%p, dd=%p\n", pdev, dd);
+
+   /*
+* disable the IB link early, to be sure no new packets arrive, which 
+* complicates the shutdown process
+*/
+   ipath_shutdown_device(dd);
 
if (dd->verbs_dev)
ipath_unregister_ib_device(dd->verbs_dev);


Re: [PATCH] oom fix: prevent oom from killing a process with children/sibling unkillable

2007-03-15 Thread William Lee Irwin III
On Thu, Mar 15, 2007 at 07:19:21PM +0530, Ankita Garg wrote:
> Looking at oom_kill.c, found that the intention to not kill the selected
> process if any of its children/siblings has OOM_DISABLE set, is not being met.
> Signed-off-by: Ankita Garg <[EMAIL PROTECTED]>
> Index: ankita/linux-2.6.20.1/mm/oom_kill.c
> ===
> --- ankita.orig/linux-2.6.20.1/mm/oom_kill.c  2007-02-20 12:04:32.0 +0530
> +++ ankita/linux-2.6.20.1/mm/oom_kill.c   2007-03-15 12:44:50.0 +0530
> @@ -320,7 +320,7 @@
>* Don't kill the process if any threads are set to OOM_DISABLE
>*/
>   do_each_thread(g, q) {
> - if (q->mm == mm && p->oomkilladj == OOM_DISABLE)
> + if (q->mm == mm && q->oomkilladj == OOM_DISABLE)
>   return 1;
>   } while_each_thread(g, q);

Acked-by: William Irwin <[EMAIL PROTECTED]>


-- wli


[PATCH 25 of 33] IB/ipath - On unrecoverable errors, force link down, LEDs off

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID e9895e2ad504a2590b0943c037d1fa5f9568fda3
# Parent  3e81a6b18b42bbe6dffab382fb26d754dfdf83a1
IB/ipath - On unrecoverable errors, force link down, LEDs off

If the chip is no longer usable, the LEDs should be turned off so the system
can be found easily in the cluster.

Also some minor reorganizing so both chips print the hardware error message
at the same point, and only if there were unrecovered errors.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 3e81a6b18b42 -r e9895e2ad504 drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c   Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c   Thu Mar 15 14:34:25 2007 -0700
@@ -42,6 +42,9 @@
 
 #include "ipath_kernel.h"
 #include "ipath_registers.h"
+
+static void ipath_setup_ht_setextled(struct ipath_devdata *, u64, u64);
+
 
 /*
  * This lists the InfiniPath registers, in the actual chip layout.
@@ -572,9 +575,14 @@ static void ipath_ht_handle_hwerrors(str
 * make the complaint once, in case it's stuck
 * or recurring, and we get here multiple
 * times.
+* force link down, so switch knows, and
+* LEDs are turned off
 */
-   ipath_dev_err(dd, "%s hardware error\n", msg);
if (dd->ipath_flags & IPATH_INITTED) {
+   ipath_set_linkstate(dd, IPATH_IB_LINKDOWN);
+   ipath_setup_ht_setextled(dd, 
+   INFINIPATH_IBCS_L_STATE_DOWN,
+   INFINIPATH_IBCS_LT_STATE_DISABLED);
ipath_dev_err(dd, "Fatal Hardware Error (freeze "
  "mode), no longer usable, SN %.16s\n",
  dd->ipath_serial);
@@ -592,6 +600,8 @@ static void ipath_ht_handle_hwerrors(str
}
else
*msg = 0; /* recovered from all of them */
+   if (*msg)
+   ipath_dev_err(dd, "%s hardware error\n", msg);
if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg)
/*
 * for status file; if no trailing brace is copied,
diff -r 3e81a6b18b42 -r e9895e2ad504 drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c   Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c   Thu Mar 15 14:34:25 2007 -0700
@@ -42,6 +42,8 @@
 
 #include "ipath_kernel.h"
 #include "ipath_registers.h"
+
+static void ipath_setup_pe_setextled(struct ipath_devdata *, u64, u64);
 
 /*
  * This file contains all the chip-specific register information and
@@ -407,8 +409,14 @@ static void ipath_pe_handle_hwerrors(str
 * if any set that we aren't ignoring only make the
 * complaint once, in case it's stuck or recurring,
 * and we get here multiple times
+* Force link down, so switch knows, and
+* LEDs are turned off
 */
if (dd->ipath_flags & IPATH_INITTED) {
+   ipath_set_linkstate(dd, IPATH_IB_LINKDOWN);
+   ipath_setup_pe_setextled(dd, 
+   INFINIPATH_IBCS_L_STATE_DOWN,
+   INFINIPATH_IBCS_LT_STATE_DISABLED);
ipath_dev_err(dd, "Fatal Hardware Error (freeze "
  "mode), no longer usable, SN %.16s\n",
  dd->ipath_serial);
@@ -482,7 +490,8 @@ static void ipath_pe_handle_hwerrors(str
 dd->ipath_hwerrmask);
}
 
-   ipath_dev_err(dd, "%s hardware error\n", msg);
+   if (*msg)
+   ipath_dev_err(dd, "%s hardware error\n", msg);
if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) {
/*
 * for /sys status file ; if no trailing } is copied, we'll


[PATCH 08 of 33] IB/ipath - fix up some debug messages

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID 187ff5af5e5dd2b1f2ca48ba6ad0056ce7fc7403
# Parent  02b57b02578b7ffb189de66f7886214e9d5f2045
IB/ipath - fix up some debug messages

ipath_dbg doesn't need the same prefixes that printk does.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 02b57b02578b -r 187ff5af5e5d drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:24 2007 -0700
@@ -1989,7 +1989,8 @@ static int __init infinipath_init(void)
 {
int ret;
 
-   ipath_dbg(KERN_INFO DRIVER_LOAD_MSG "%s", ib_ipath_version);
+   if (ipath_debug & __IPATH_DBG)
+   printk(KERN_INFO DRIVER_LOAD_MSG "%s", ib_ipath_version);
 
/*
 * These must be called before the driver is registered with
diff -r 02b57b02578b -r 187ff5af5e5d drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c  Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c  Thu Mar 15 14:34:24 2007 -0700
@@ -61,7 +61,7 @@ int ipath_alloc_lkey(struct ipath_lkey_t
r = (r + 1) & (rkt->max - 1);
if (r == n) {
spin_unlock_irqrestore(&rkt->lock, flags);
-   ipath_dbg(KERN_INFO "LKEY table full\n");
+   ipath_dbg("LKEY table full\n");
ret = 0;
goto bail;
}
diff -r 02b57b02578b -r 187ff5af5e5d drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c  Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c  Thu Mar 15 14:34:24 2007 -0700
@@ -274,7 +274,7 @@ void ipath_free_all_qps(struct ipath_qp_
free_qpn(qpt, qp->ibqp.qp_num);
if (!atomic_dec_and_test(&qp->refcount) ||
    !ipath_destroy_qp(&qp->ibqp))
-   ipath_dbg(KERN_INFO "QP memory leak!\n");
+   ipath_dbg("QP memory leak!\n");
qp = nqp;
}
}
@@ -369,7 +369,7 @@ void ipath_error_qp(struct ipath_qp *qp,
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ib_wc wc;
 
-   ipath_dbg(KERN_INFO "QP%d/%d in error state\n",
+   ipath_dbg("QP%d/%d in error state\n",
  qp->ibqp.qp_num, qp->remote_qpn);
 
spin_lock(&dev->pending_lock);
@@ -980,7 +980,7 @@ void ipath_sqerror_qp(struct ipath_qp *q
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
 
-   ipath_dbg(KERN_INFO "Send queue error on QP%d/%d: err: %d\n",
+   ipath_dbg("Send queue error on QP%d/%d: err: %d\n",
  qp->ibqp.qp_num, qp->remote_qpn, wc->status);
 
spin_lock(&dev->pending_lock);


[PATCH 21 of 33] IB/ipath - force PIOAvail update entry point

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Arthur Jones <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 68302e9dbd8803f937af9f02ca26a63ff43e9afa
# Parent  8a013b707785accfd71589334bbf8e4029ffa892
IB/ipath - force PIOAvail update entry point.

Due to a chip bug, the PIOAvail register is not always updated to memory.
This patch allows userspace to force an update.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 8a013b707785 -r 68302e9dbd88 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h  Thu Mar 15 14:34:25 2007 -0700
@@ -352,7 +352,7 @@ struct ipath_base_info {
  * may not be implemented; the user code must deal with this if it
  * cares, or it must abort after initialization reports the difference.
  */
-#define IPATH_USER_SWMINOR 4
+#define IPATH_USER_SWMINOR 5
 
 #define IPATH_USER_SWVERSION ((IPATH_USER_SWMAJOR<<16) | IPATH_USER_SWMINOR)
 
@@ -429,8 +429,11 @@ struct ipath_user_info {
 #define __IPATH_CMD_SLAVE_INFO 22  /* return info on slave processes (for old user code) */
 #define IPATH_CMD_ASSIGN_PORT  23  /* allocate HCA and port */
 #define IPATH_CMD_USER_INIT24  /* set up userspace */
-
-#define IPATH_CMD_MAX  24
+#define IPATH_CMD_UNUSED_1 25
+#define IPATH_CMD_UNUSED_2 26
+#define IPATH_CMD_PIOAVAILUPD  27  /* force an update of PIOAvail reg */
+
+#define IPATH_CMD_MAX  27
 
 struct ipath_port_info {
__u32 num_active;   /* number of active units */
diff -r 8a013b707785 -r 68302e9dbd88 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c  Thu Mar 15 14:34:25 2007 -0700
@@ -2047,6 +2047,17 @@ static int ipath_get_slave_info(struct i
return ret;
 }
 
+static int ipath_force_pio_avail_update(struct ipath_devdata *dd)
+{
+   u64 reg = dd->ipath_sendctrl;
+
+   clear_bit(IPATH_S_PIOBUFAVAILUPD, &reg);
+   ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, reg);
+   ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl);
+
+   return 0;
+}
+
 static ssize_t ipath_write(struct file *fp, const char __user *data,
   size_t count, loff_t *off)
 {
@@ -2106,22 +2117,30 @@ static ssize_t ipath_write(struct file *
dest = _mask_addr;
src = >cmd.slave_mask_addr;
break;
+   case IPATH_CMD_PIOAVAILUPD: // force an update of PIOAvail reg
+   copy = 0;
+   src = NULL;
+   dest = NULL;
+   break;
default:
ret = -EINVAL;
goto bail;
}
 
-   if ((count - consumed) < copy) {
-   ret = -EINVAL;
-   goto bail;
-   }
-
-   if (copy_from_user(dest, src, copy)) {
-   ret = -EFAULT;
-   goto bail;
-   }
-
-   consumed += copy;
+   if (copy) {
+   if ((count - consumed) < copy) {
+   ret = -EINVAL;
+   goto bail;
+   }
+
+   if (copy_from_user(dest, src, copy)) {
+   ret = -EFAULT;
+   goto bail;
+   }
+
+   consumed += copy;
+   }
+
pd = port_fp(fp);
if (!pd && cmd.type != __IPATH_CMD_USER_INIT &&
cmd.type != IPATH_CMD_ASSIGN_PORT) {
@@ -2172,6 +2191,9 @@ static ssize_t ipath_write(struct file *
   (void __user *) (unsigned long)
   cmd.cmd.slave_mask_addr);
break;
+   case IPATH_CMD_PIOAVAILUPD:
+   ret = ipath_force_pio_avail_update(pd->port_dd);
+   break;
}
 
if (ret >= 0)


[PATCH 20 of 33] IB/ipath - call free_irq on chip specific initialization failure

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Arthur Jones <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 8a013b707785accfd71589334bbf8e4029ffa892
# Parent  c96d13efde155eb60dc0eca0bd56e81ecd36281b
IB/ipath - call free_irq on chip specific initialization failure

In initialization, if we bailed at chip specific initialization, we
forgot to clean up the irq we had requested.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r c96d13efde15 -r 8a013b707785 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:25 2007 -0700
@@ -486,7 +486,7 @@ static int __devinit ipath_init_one(stru
 
ret = ipath_init_chip(dd, 0);   /* do the chip-specific init */
if (ret)
-   goto bail_iounmap;
+   goto bail_irqsetup;
 
ret = ipath_enable_wc(dd);
 
@@ -504,6 +504,9 @@ static int __devinit ipath_init_one(stru
ipath_register_ib_device(dd);
 
goto bail;
+
+bail_irqsetup:
+   if (pdev->irq) free_irq(pdev->irq, dd);
 
 bail_iounmap:
iounmap((volatile void __iomem *) dd->ipath_kregbase);
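
The fix above follows the usual goto-unwind idiom: each failure jumps to the label that releases everything acquired so far, in reverse order. A minimal userspace sketch of that idiom is below; struct sketch_res and sketch_init_one() are hypothetical, with flag fields standing in for the ioremap() mapping and the requested IRQ.

```c
#include <assert.h>

struct sketch_res {
	int mapped;	/* stand-in for the ioremap()ed register window */
	int irq_held;	/* stand-in for a successfully requested IRQ */
};

/*
 * Acquire resources in order and unwind in reverse on failure.
 * fail_irq and fail_chip_init select which step fails.
 */
static int sketch_init_one(struct sketch_res *r, int fail_irq,
			   int fail_chip_init)
{
	int ret = 0;

	r->mapped = 1;
	if (fail_irq) {
		ret = -1;
		goto bail_iounmap;	/* nothing but the mapping to undo */
	}
	r->irq_held = 1;

	if (fail_chip_init) {
		ret = -1;
		goto bail_irqsetup;	/* must release the IRQ as well */
	}
	goto bail;

bail_irqsetup:
	r->irq_held = 0;
	/* fall through to undo the mapping too */
bail_iounmap:
	r->mapped = 0;
bail:
	return ret;
}
```

Jumping to bail_iounmap after the IRQ has been requested, as the old code did, would skip the first cleanup step and leave irq_held set.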


Re: [PATCH 2/3] swsusp: Do not use page flags

2007-03-15 Thread Andrew Morton
On Thu, 15 Mar 2007 23:19:02 +0100 (CET)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> On Thu, 15 Mar 2007, Andrew Morton wrote:
> 
> > > > And why _does_ suspend use GFP_ATOMIC all over the place?
> > > Generally, because it cannot sleep.
> > Why not?
> 
> I guess it's simply because of kswapd being already frozen, so there is no 
> chance that once GFP_KERNEL allocation goes to sleep, it is going to get 
> any free pages eventually ... ?

No, things should run fine with a dead kswapd.

There are reasons why we can't call into filesystems from there, but
GFP_NOIO will ensure that and it is heaps better than GFP_ATOMIC.





Description of the ipath patches I just sent

2007-03-15 Thread Bryan O'Sullivan

My patch mailer decided to send them without a summary.  Oops.

This series is a variety of bugfixes and cleanups for the ipath driver. 
 It doesn't touch anything in IB-land.  The patches apply cleanly and 
run happily against 2.6.21-rc3.


 ipath_common.h    |   25 -
 ipath_cq.c        |   38 +
 ipath_debug.h     |    1
 ipath_diag.c      |   19
 ipath_driver.c    |  125 +++--
 ipath_eeprom.c    |    4
 ipath_file_ops.c  |  307 --
 ipath_iba6110.c   |  154 ---
 ipath_iba6120.c   |   73 ++-
 ipath_init_chip.c |   88 ++--
 ipath_intr.c      |  100 +++-
 ipath_kernel.h    |   10
 ipath_keys.c      |   14
 ipath_mr.c        |   12
 ipath_qp.c        |  133 +++---
 ipath_rc.c        |  960 +---
 ipath_registers.h |   22 -
 ipath_ruc.c       |   63 +-
 ipath_stats.c     |   16
 ipath_uc.c        |    6
 ipath_ud.c        |    8
 ipath_verbs.c     |   14
 ipath_verbs.h     |   57 +-
 23 files changed, 1387 insertions(+), 862 deletions(-)


[PATCH 03 of 33] IB/ipath - definitions of two of RXE parity error bits were reversed

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID e2eec96f356a7269b46a68f29fc5e711d2f5a7a4
# Parent  3337d450afeebc553a09fe5c18ed0b2444547c24
IB/ipath - definitions of two of RXE parity error bits were reversed

The chip documentation on the expected TID vs eager TID parity error
bits was reversed from what was implemented in the RTL, for both chips.
This corrects the definitions.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 3337d450afee -r e2eec96f356a drivers/infiniband/hw/ipath/ipath_registers.h
--- a/drivers/infiniband/hw/ipath/ipath_registers.h Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_registers.h Thu Mar 15 14:34:24 2007 -0700
@@ -128,7 +128,7 @@
 
 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */
 /* TXEMEMPARITYERR bit 0: PIObuf, 1: PIOpbc, 2: launchfifo
- * RXEMEMPARITYERR bit 0: rcvbuf, 1: lookupq, 2: eagerTID, 3: expTID
+ * RXEMEMPARITYERR bit 0: rcvbuf, 1: lookupq, 2:  expTID, 3: eagerTID
  * bit 4: flag buffer, 5: datainfo, 6: header info */
 #define INFINIPATH_HWE_TXEMEMPARITYERR_MASK 0xFULL
 #define INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT 40
@@ -143,8 +143,8 @@
 /* rxe mem parity errors (shift by INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) */
 #define INFINIPATH_HWE_RXEMEMPARITYERR_RCVBUF   0x01ULL
 #define INFINIPATH_HWE_RXEMEMPARITYERR_LOOKUPQ  0x02ULL
-#define INFINIPATH_HWE_RXEMEMPARITYERR_EAGERTID 0x04ULL
-#define INFINIPATH_HWE_RXEMEMPARITYERR_EXPTID   0x08ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_EXPTID   0x04ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_EAGERTID 0x08ULL
 #define INFINIPATH_HWE_RXEMEMPARITYERR_FLAGBUF  0x10ULL
 #define INFINIPATH_HWE_RXEMEMPARITYERR_DATAINFO 0x20ULL
 #define INFINIPATH_HWE_RXEMEMPARITYERR_HDRINFO  0x40ULL


[PATCH 00 of 33] Set of ipath patches for 2.6.22

2007-03-15 Thread Bryan O'Sullivan


Re: [PATCH 10/22 take 3] UBI: EBA unit

2007-03-15 Thread Randy Dunlap
On Thu, 15 Mar 2007 11:07:03 -0800 Andrew Morton wrote:

> 
> There's way too much code here to expect it to get decently reviewed, alas.

Yes.

/me repeats wish that Not Everything Should Be Sent to lkml.  :(

> > On Wed, 14 Mar 2007 17:20:24 +0200 Artem Bityutskiy <[EMAIL PROTECTED]> wrote:
> >
> > ...
> >
> > +/**
> > + * leb_get_ver - get logical eraseblock version.
> > + *
> > + * @ubi: the UBI device description object
> > + * @vol_id: the volume ID
> > + * @lnum: the logical eraseblock number
> > + *
> > + * The logical eraseblock has to be locked. Note, all this leb_ver stuff is
> > + * obsolete and will be removed eventually. FIXME: to be removed together with
> > + * leb_ver support.
> > + */

Please use kernel-doc syntax and test it.  Using and testing it
are really easy to do.  It's just a simple language.  Don't make
(even trivial) problems for others to clean up...

Documentation/kernel-doc-nano-HOWTO.txt

Above:  no "blank" line between the function name and its parameters.

> > +static inline int leb_get_ver(struct ubi_info *ubi, int vol_id, int lnum)
> > +{
> > +   int idx, leb_ver;
> > +
> > +   idx = vol_id2idx(ubi, vol_id);
> > +
> > +   spin_lock(&ubi->eba.eba_tbl_lock);
> > +   ubi_assert(ubi->eba.eba_tbl[idx].recs);
> > +   leb_ver = ubi->eba.eba_tbl[idx].recs[lnum].leb_ver;
> > +   spin_unlock(&ubi->eba.eba_tbl_lock);
> > +
> > +   return leb_ver;
> > +}
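
For illustration, the comment rewritten in kernel-doc form (no blank line between the function name line and the parameters) might look roughly like this. The structures below are hypothetical, trimmed-down user-space stand-ins so the fragment compiles on its own; locking and the vol_id to index translation are left out, and vol_id is used directly as the table index.

```c
/* Hypothetical stand-ins for the UBI structures, for illustration only. */
struct eba_rec { int leb_ver; };
struct eba_tbl_entry { struct eba_rec *recs; };
struct ubi_info { struct eba_tbl_entry *eba_tbl; };

/**
 * leb_get_ver - get logical eraseblock version
 * @ubi: the UBI device description object
 * @vol_id: the volume ID (used directly as the table index in this sketch)
 * @lnum: the logical eraseblock number
 *
 * The logical eraseblock has to be locked. All of the leb_ver handling
 * is obsolete and is to be removed together with leb_ver support.
 */
static inline int leb_get_ver(struct ubi_info *ubi, int vol_id, int lnum)
{
	return ubi->eba_tbl[vol_id].recs[lnum].leb_ver;
}
```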


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


[PATCH 01 of 33] IB/ipath - add ability to set and clear IB local loopback

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID b1d05f3486f8bba1dd3c5cbca39f06a5e1b3d6fb
# Parent  0d37971d4ab0c8b6f7a8f6e8222112321982498f
IB/ipath - add ability to set and clear IB local loopback

This is a sticky state.  It is useful for diagnosing problems with boards
versus cable/switch problems.

Signed-off-by: Dave Olson <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 0d37971d4ab0 -r b1d05f3486f8 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h  Wed Mar 14 17:53:43 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h  Thu Mar 15 14:34:24 2007 -0700
@@ -78,6 +78,8 @@
 #define IPATH_IB_LINKINIT  3
 #define IPATH_IB_LINKDOWN_SLEEP4
 #define IPATH_IB_LINKDOWN_DISABLE  5
+#define IPATH_IB_LINK_LOOPBACK 6 /* enable local loopback */
+#define IPATH_IB_LINK_EXTERNAL 7 /* normal, disable local loopback */
 
 /*
  * stats maintained by the driver.  For now, at least, this is global
diff -r 0d37971d4ab0 -r b1d05f3486f8 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c  Wed Mar 14 17:53:43 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c  Thu Mar 15 14:34:24 2007 -0700
@@ -1662,6 +1662,22 @@ int ipath_set_linkstate(struct ipath_dev
lstate = IPATH_LINKACTIVE;
break;
 
+   case IPATH_IB_LINK_LOOPBACK:
+   dev_info(&dd->pcidev->dev, "Enabling IB local loopback\n");
+   dd->ipath_ibcctrl |= INFINIPATH_IBCC_LOOPBACK;
+   ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl,
+dd->ipath_ibcctrl);
+   ret = 0;
+   goto bail; // no state change to wait for
+
+   case IPATH_IB_LINK_EXTERNAL:
+   dev_info(&dd->pcidev->dev, "Disabling IB local loopback (normal)\n");
+   dd->ipath_ibcctrl &= ~INFINIPATH_IBCC_LOOPBACK;
+   ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl,
+dd->ipath_ibcctrl);
+   ret = 0;
+   goto bail; // no state change to wait for
+
default:
ipath_dbg("Invalid linkstate 0x%x requested\n", newstate);
ret = -EINVAL;
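
The two new cases follow a common shadow-register pattern: update a software copy of the control register, then write the whole copy back. A minimal user-space sketch of that pattern, assuming a made-up bit position (the real INFINIPATH_IBCC_LOOPBACK value is not shown in this patch) and a plain struct field standing in for the hardware register:

```c
#include <stdint.h>

/* Hypothetical bit position, chosen only for this illustration. */
#define IBCC_LOOPBACK (1ULL << 62)

struct fake_dev {
	uint64_t ibcctrl_shadow;	/* software copy of the control register */
	uint64_t ibcctrl_hw;		/* stands in for the hardware register */
};

/* Set or clear local loopback; all other control bits are left untouched. */
static void set_loopback(struct fake_dev *dd, int enable)
{
	if (enable)
		dd->ibcctrl_shadow |= IBCC_LOOPBACK;
	else
		dd->ibcctrl_shadow &= ~IBCC_LOOPBACK;

	/* Write the complete shadow value back, as the patch does via ipath_write_kreg(). */
	dd->ibcctrl_hw = dd->ibcctrl_shadow;
}
```

Because the shadow copy keeps the bit until it is explicitly cleared, the state is "sticky" in the sense the changelog describes.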


[PATCH 02 of 33] IB/ipath - fix user memory region creation when IOMMU present

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID 3337d450afeebc553a09fe5c18ed0b2444547c24
# Parent  b1d05f3486f8bba1dd3c5cbca39f06a5e1b3d6fb
IB/ipath - fix user memory region creation when IOMMU present

The loop which initializes the user memory region from an array
of pages was using the wrong limit for the array.  This worked
OK when dma_map_sg() returned the same number as the number of pages.
This patch fixes the problem.

Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r b1d05f3486f8 -r 3337d450afee drivers/infiniband/hw/ipath/ipath_mr.c
--- a/drivers/infiniband/hw/ipath/ipath_mr.cThu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mr.cThu Mar 15 14:34:24 2007 -0700
@@ -210,9 +210,15 @@ struct ib_mr *ipath_reg_user_mr(struct i
m = 0;
n = 0;
list_for_each_entry(chunk, &region->chunk_list, list) {
-   for (i = 0; i < chunk->nmap; i++) {
-   mr->mr.map[m]->segs[n].vaddr =
-   page_address(chunk->page_list[i].page);
+   for (i = 0; i < chunk->nents; i++) {
+   void *vaddr;
+
+   vaddr = page_address(chunk->page_list[i].page);
+   if (!vaddr) {
+   ret = ERR_PTR(-EINVAL);
+   goto bail;
+   }
+   mr->mr.map[m]->segs[n].vaddr = vaddr;
mr->mr.map[m]->segs[n].length = region->page_size;
n++;
if (n == IPATH_SEGSZ) {


[PATCH 15 of 33] IB/ipath - allow receive ports mapped into userspace to be shared

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Mark Debbage <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 5ff8f23d0e61169f598ab1d93aa6324d88c17921
# Parent  62da2fb770b66310ac06ba0190bf2bed2a5a764f
IB/ipath - allow receive ports mapped into userspace to be shared

Improve port-sharing performance by allowing any process to receive
packets from the shared hardware port under a spin lock for mutual
exclusion. Previously, one process was nominated as the master and
that process was responsible for receiving all packets from the shared
hardware port and either consuming them or forwarding them to their
destination. This led to starvation problems for other processes when
the master process was busy in computation phases.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 62da2fb770b6 -r 5ff8f23d0e61 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h  Thu Mar 15 14:34:25 2007 -0700
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -318,10 +318,16 @@ struct ipath_base_info {
/* address of readonly memory copy of the rcvhdrq tail register. */
__u64 spi_rcvhdr_tailaddr;
 
-   /* shared memory pages for subports if IPATH_RUNTIME_MASTER is set */
+   /* shared memory pages for subports if port is shared */
__u64 spi_subport_uregbase;
__u64 spi_subport_rcvegrbuf;
__u64 spi_subport_rcvhdr_base;
+
+   /* shared memory page for hardware port if it is shared */
+   __u64 spi_port_uregbase;
+   __u64 spi_port_rcvegrbuf;
+   __u64 spi_port_rcvhdr_base;
+   __u64 spi_port_rcvhdr_tailaddr;
 
 } __attribute__ ((aligned(8)));
 
@@ -346,7 +352,7 @@ struct ipath_base_info {
  * may not be implemented; the user code must deal with this if it
  * cares, or it must abort after initialization reports the difference.
  */
-#define IPATH_USER_SWMINOR 3
+#define IPATH_USER_SWMINOR 4
 
 #define IPATH_USER_SWVERSION ((IPATH_USER_SWMAJOR<<16) | IPATH_USER_SWMINOR)
 
@@ -420,7 +426,7 @@ struct ipath_user_info {
 #define IPATH_CMD_TID_UPDATE   19  /* update expected TID entries */
 #define IPATH_CMD_TID_FREE 20  /* free expected TID entries */
 #define IPATH_CMD_SET_PART_KEY 21  /* add partition key */
-#define IPATH_CMD_SLAVE_INFO   22  /* return info on slave processes */
+#define __IPATH_CMD_SLAVE_INFO 22  /* return info on slave processes (for old user code) */
 #define IPATH_CMD_ASSIGN_PORT  23  /* allocate HCA and port */
 #define IPATH_CMD_USER_INIT24  /* set up userspace */
 
@@ -432,7 +438,7 @@ struct ipath_port_info {
__u16 port; /* port on unit assigned to caller */
__u16 subport;  /* subport on unit assigned to caller */
__u16 num_ports;/* number of ports available on unit */
-   __u16 num_subports; /* number of subport slaves opened on port */
+   __u16 num_subports; /* number of subports opened on port */
 };
 
 struct ipath_tid_info {
diff -r 62da2fb770b6 -r 5ff8f23d0e61 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c  Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c  Thu Mar 15 14:34:25 2007 -0700
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved.
  * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -99,7 +99,7 @@ static int ipath_get_base_info(struct fi
sz = sizeof(*kinfo);
/* If port sharing is not requested, allow the old size structure */
if (!shared)
-   sz -= 3 * sizeof(u64);
+   sz -= 7 * sizeof(u64);
if (ubase_size < sz) {
ipath_cdbg(PROC,
   "Base size %zu, need %zu (version mismatch?)\n",
@@ -177,38 +177,30 @@ static int ipath_get_base_info(struct fi
kinfo->spi_piobufbase = (u64) pd->port_piobufs +
dd->ipath_palign *
(dd->ipath_pbufsport - kinfo->spi_piocnt);
-   kinfo->__spi_uregbase = (u64) dd->ipath_uregbase +
-   dd->ipath_palign * pd->port_port;
} else {
unsigned slave = subport_fp(fp) - 1;
 
kinfo->spi_piocnt = dd->ipath_pbufsport / subport_cnt;
kinfo->spi_piobufbase = (u64) pd->port_piobufs +
dd->ipath_palign * kinfo->spi_piocnt * slave;
+   }
+   if (shared) {
+   kinfo->spi_port_uregbase = 

[PATCH 18 of 33] IB/ipath - Fix calculation for number of kernel PIO buffers

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID 878b6054e9ca5327db9c9438f66265afaf88b055
# Parent  a023ffe32d9df8cba7d8b15c24e7918eeb236a2c
IB/ipath - Fix calculation for number of kernel PIO buffers

If the module parameter "kpiobufs" is set too high, the calculation
to reset it to a sane value was incorrect.

Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r a023ffe32d9d -r 878b6054e9ca drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Mar 15 14:34:25 2007 -0700
@@ -668,6 +668,7 @@ int ipath_init_chip(struct ipath_devdata
 {
int ret = 0, i;
u32 val32, kpiobufs;
+   u32 piobufs, uports;
u64 val;
struct ipath_portdata *pd = NULL; /* keep gcc4 happy */
gfp_t gfp_flags = GFP_USER | __GFP_COMP;
@@ -702,16 +703,17 @@ int ipath_init_chip(struct ipath_devdata
 * the in memory DMA'ed copies of the registers.  This has to
 * be done early, before we calculate lastport, etc.
 */
-   val = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k;
+   piobufs = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k;
/*
 * calc number of pioavail registers, and save it; we have 2
 * bits per buffer.
 */
-   dd->ipath_pioavregs = ALIGN(val, sizeof(u64) * BITS_PER_BYTE / 2)
+   dd->ipath_pioavregs = ALIGN(piobufs, sizeof(u64) * BITS_PER_BYTE / 2)
/ (sizeof(u64) * BITS_PER_BYTE / 2);
+   uports = dd->ipath_cfgports ? dd->ipath_cfgports - 1 : 0;
if (ipath_kpiobufs == 0) {
/* not set by user (this is default) */
-   if ((dd->ipath_piobcnt2k + dd->ipath_piobcnt4k) > 128)
+   if (piobufs >= (uports * IPATH_MIN_USER_PORT_BUFCNT) + 32)
kpiobufs = 32;
else
kpiobufs = 16;
@@ -719,31 +721,25 @@ int ipath_init_chip(struct ipath_devdata
else
kpiobufs = ipath_kpiobufs;
 
-   if (kpiobufs >
-   (dd->ipath_piobcnt2k + dd->ipath_piobcnt4k -
-(dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT))) {
-   i = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k -
-   (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT);
+   if (kpiobufs + (uports * IPATH_MIN_USER_PORT_BUFCNT) > piobufs) {
+   i = (int) piobufs -
+   (int) (uports * IPATH_MIN_USER_PORT_BUFCNT);
if (i < 0)
i = 0;
-   dev_info(&dd->pcidev->dev, "Allocating %d PIO bufs for "
-"kernel leaves too few for %d user ports "
+   dev_info(&dd->pcidev->dev, "Allocating %d PIO bufs of "
+"%d for kernel leaves too few for %d user ports "
 "(%d each); using %u\n", kpiobufs,
-dd->ipath_cfgports - 1,
-IPATH_MIN_USER_PORT_BUFCNT, i);
+piobufs, uports, IPATH_MIN_USER_PORT_BUFCNT, i);
/*
 * shouldn't change ipath_kpiobufs, because could be
 * different for different devices...
 */
kpiobufs = i;
}
-   dd->ipath_lastport_piobuf =
-   dd->ipath_piobcnt2k + dd->ipath_piobcnt4k - kpiobufs;
-   dd->ipath_pbufsport = dd->ipath_cfgports > 1
-   ? dd->ipath_lastport_piobuf / (dd->ipath_cfgports - 1)
-   : 0;
-   val32 = dd->ipath_lastport_piobuf -
-   (dd->ipath_pbufsport * (dd->ipath_cfgports - 1));
+   dd->ipath_lastport_piobuf = piobufs - kpiobufs;
+   dd->ipath_pbufsport =
+   uports ? dd->ipath_lastport_piobuf / uports : 0;
+   val32 = dd->ipath_lastport_piobuf - (dd->ipath_pbufsport * uports);
if (val32 > 0) {
ipath_dbg("allocating %u pbufs/port leaves %u unused, "
  "add to kernel\n", dd->ipath_pbufsport, val32);
@@ -754,8 +750,7 @@ int ipath_init_chip(struct ipath_devdata
dd->ipath_lastpioindex = dd->ipath_lastport_piobuf;
ipath_cdbg(VERBOSE, "%d PIO bufs for kernel out of %d total %u "
   "each for %u user ports\n", kpiobufs,
-  dd->ipath_piobcnt2k + dd->ipath_piobcnt4k,
-  dd->ipath_pbufsport, dd->ipath_cfgports - 1);
+  piobufs, dd->ipath_pbufsport, uports);
 
dd->ipath_f_early_init(dd);
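
The corrected arithmetic can be tried outside the kernel with a small stand-alone sketch. It assumes a per-user-port minimum of 8 buffers purely for illustration (the value of IPATH_MIN_USER_PORT_BUFCNT is not shown in this patch), and the struct and function names are made up. The leftover count is reported separately here rather than folded back into the kernel's share.

```c
#include <stdint.h>

#define MIN_USER_PORT_BUFCNT 8	/* assumed value, for illustration only */

struct pio_split {
	uint32_t kernel_bufs;		/* buffers kept for the kernel */
	uint32_t bufs_per_user_port;	/* even share per user port */
	uint32_t leftover;		/* remainder after the even split */
};

/*
 * Split 'piobufs' PIO buffers between the kernel and the user ports.
 * Port 0 belongs to the kernel, so there are cfgports - 1 user ports.
 */
static struct pio_split split_pio_bufs(uint32_t piobufs, uint32_t cfgports,
				       uint32_t kpiobufs)
{
	struct pio_split s;
	uint32_t uports = cfgports ? cfgports - 1 : 0;
	uint32_t reserved = uports * MIN_USER_PORT_BUFCNT;
	uint32_t lastport;

	/* Requested kernel share leaves too few for the user ports: clamp it. */
	if (kpiobufs + reserved > piobufs)
		kpiobufs = piobufs > reserved ? piobufs - reserved : 0;

	lastport = piobufs - kpiobufs;
	s.kernel_bufs = kpiobufs;
	s.bufs_per_user_port = uports ? lastport / uports : 0;
	s.leftover = lastport - s.bufs_per_user_port * uports;
	return s;
}
```

For example, with 40 buffers, 5 configured ports and kpiobufs=32, the four user ports need 32 buffers between them, so the kernel share is clamped to 8.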
 


[PATCH 19 of 33] IB/ipath - Discard multicast packets without a GRH

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Bryan O'Sullivan <[EMAIL PROTECTED]>
# Date 1173994465 25200
# Node ID c96d13efde155eb60dc0eca0bd56e81ecd36281b
# Parent  878b6054e9ca5327db9c9438f66265afaf88b055
IB/ipath - Discard multicast packets without a GRH

This patch fixes a bug where multicast packets without a GRH
were not being dropped as per the IB spec.

Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 878b6054e9ca -r c96d13efde15 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Mar 15 14:34:25 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Mar 15 14:34:25 2007 -0700
@@ -438,6 +438,10 @@ void ipath_ib_rcv(struct ipath_ibdev *de
struct ipath_mcast *mcast;
struct ipath_mcast_qp *p;
 
+   if (lnh != IPATH_LRH_GRH) {
+   dev->n_pkt_drops++;
+   goto bail;
+   }
mcast = ipath_mcast_find(&hdr->u.l.grh.dgid);
if (mcast == NULL) {
dev->n_pkt_drops++;
@@ -445,8 +449,7 @@ void ipath_ib_rcv(struct ipath_ibdev *de
}
dev->n_multicast_rcv++;
list_for_each_entry_rcu(p, &mcast->qp_list, list)
-   ipath_qp_rcv(dev, hdr, lnh == IPATH_LRH_GRH, data,
-tlen, p->qp);
+   ipath_qp_rcv(dev, hdr, 1, data, tlen, p->qp);
/*
 * Notify ipath_multicast_detach() if it is waiting for us
 * to finish.


[PATCH 07 of 33] IB/ipath - support larger IB_QP_MAX_DEST_RD_ATOMIC and IB_QP_MAX_QP_RD_ATOMIC

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID 02b57b02578b7ffb189de66f7886214e9d5f2045
# Parent  78ae7bddbd5e205adc12993ad2956e0402ca01d7
IB/ipath - support larger IB_QP_MAX_DEST_RD_ATOMIC and IB_QP_MAX_QP_RD_ATOMIC

This patch adds support for multiple RDMA reads and atomics to be
sent before an ACK is required to be seen by the requester.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 78ae7bddbd5e -r 02b57b02578b drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.cThu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.cThu Mar 15 14:34:24 2007 -0700
@@ -320,7 +320,8 @@ static void ipath_reset_qp(struct ipath_
qp->remote_qpn = 0;
qp->qkey = 0;
qp->qp_access_flags = 0;
-   clear_bit(IPATH_S_BUSY, &qp->s_flags);
+   qp->s_busy = 0;
+   qp->s_flags &= ~IPATH_S_SIGNAL_REQ_WR;
qp->s_hdrwords = 0;
qp->s_psn = 0;
qp->r_psn = 0;
@@ -333,7 +334,6 @@ static void ipath_reset_qp(struct ipath_
qp->r_state = IB_OPCODE_UC_SEND_LAST;
}
qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
-   qp->r_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
qp->r_nak_state = 0;
qp->r_wrid_valid = 0;
qp->s_rnr_timeout = 0;
@@ -344,6 +344,10 @@ static void ipath_reset_qp(struct ipath_
qp->s_ssn = 1;
qp->s_lsn = 0;
qp->s_wait_credit = 0;
+   memset(qp->s_ack_queue, 0, sizeof(qp->s_ack_queue));
+   qp->r_head_ack_queue = 0;
+   qp->s_tail_ack_queue = 0;
+   qp->s_num_rd_atomic = 0;
if (qp->r_rq.wq) {
qp->r_rq.wq->head = 0;
qp->r_rq.wq->tail = 0;
@@ -503,6 +507,10 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
attr->path_mig_state != IB_MIG_REARM)
goto inval;
 
+   if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
+   if (attr->max_dest_rd_atomic > IPATH_MAX_RDMA_ATOMIC)
+   goto inval;
+
switch (new_state) {
case IB_QPS_RESET:
ipath_reset_qp(qp);
@@ -558,6 +566,12 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
 
if (attr_mask & IB_QP_QKEY)
qp->qkey = attr->qkey;
+
+   if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
+   qp->r_max_rd_atomic = attr->max_dest_rd_atomic;
+
+   if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC)
+   qp->s_max_rd_atomic = attr->max_rd_atomic;
 
qp->state = new_state;
spin_unlock_irqrestore(&qp->s_lock, flags);
@@ -598,8 +612,8 @@ int ipath_query_qp(struct ib_qp *ibqp, s
attr->alt_pkey_index = 0;
attr->en_sqd_async_notify = 0;
attr->sq_draining = 0;
-   attr->max_rd_atomic = 1;
-   attr->max_dest_rd_atomic = 1;
+   attr->max_rd_atomic = qp->s_max_rd_atomic;
+   attr->max_dest_rd_atomic = qp->r_max_rd_atomic;
attr->min_rnr_timer = qp->r_min_rnr_timer;
attr->port_num = 1;
attr->timeout = qp->timeout;
@@ -614,7 +628,7 @@ int ipath_query_qp(struct ib_qp *ibqp, s
init_attr->recv_cq = qp->ibqp.recv_cq;
init_attr->srq = qp->ibqp.srq;
init_attr->cap = attr->cap;
-   if (qp->s_flags & (1 << IPATH_S_SIGNAL_REQ_WR))
+   if (qp->s_flags & IPATH_S_SIGNAL_REQ_WR)
init_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
else
init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
@@ -786,7 +800,7 @@ struct ib_qp *ipath_create_qp(struct ib_
qp->s_size = init_attr->cap.max_send_wr + 1;
qp->s_max_sge = init_attr->cap.max_send_sge;
if (init_attr->sq_sig_type == IB_SIGNAL_REQ_WR)
-   qp->s_flags = 1 << IPATH_S_SIGNAL_REQ_WR;
+   qp->s_flags = IPATH_S_SIGNAL_REQ_WR;
else
qp->s_flags = 0;
dev = to_idev(ibpd->device);
diff -r 78ae7bddbd5e -r 02b57b02578b drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.cThu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.cThu Mar 15 14:34:24 2007 -0700
@@ -37,6 +37,19 @@
 /* cut down ridiculously long IB macro names */
 #define OP(x) IB_OPCODE_RC_##x
 
+static u32 restart_sge(struct ipath_sge_state *ss, struct ipath_swqe *wqe,
+  u32 psn, u32 pmtu)
+{
+   u32 len;
+
+   len = ((psn - wqe->psn) & IPATH_PSN_MASK) * pmtu;
+   ss->sge = wqe->sg_list[0];
+   ss->sg_list = wqe->sg_list + 1;
+   ss->num_sge = wqe->wr.num_sge;
+   ipath_skip_sge(ss, len);
+   return wqe->length - len;
+}
+
 /**
  * ipath_init_restart- initialize the qp->s_sge after a restart
  * @qp: the QP who's SGE we're restarting
@@ -47,15 +60,9 @@ static void ipath_init_restart(struct ip
 static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe)
 {
struct ipath_ibdev *dev;
-   u32 len;
-
-   len = 
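
The restart_sge() helper added above works out how many bytes of a work request were already covered before the PSN being restarted, using a masked subtraction so that the result stays correct when the packet sequence number wraps. A stand-alone sketch of just that arithmetic, assuming a 24-bit PSN space (the mask value and the function name below are illustrative, not taken from the driver headers):

```c
#include <stdint.h>

#define PSN_MASK 0xFFFFFFu	/* assumed 24-bit PSN space */

/*
 * Bytes already covered by packets first_psn .. psn-1, each carrying
 * 'pmtu' bytes. The mask keeps the difference correct across PSN wrap.
 */
static uint32_t restart_offset(uint32_t psn, uint32_t first_psn, uint32_t pmtu)
{
	return ((psn - first_psn) & PSN_MASK) * pmtu;
}
```

Restarting at PSN 1 of a request whose first PSN was 0xFFFFFE skips three packets' worth of data, even though 1 is numerically smaller than 0xFFFFFE.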

[PATCH 06 of 33] IB/ipath - NMI cpu lockup if local loopback used

2007-03-15 Thread Bryan O'Sullivan
# HG changeset patch
# User Ralph Campbell <[EMAIL PROTECTED]>
# Date 1173994464 25200
# Node ID 78ae7bddbd5e205adc12993ad2956e0402ca01d7
# Parent  fa38a027a0853a80c4f7dfc50345c89f195bc85b
IB/ipath - NMI cpu lockup if local loopback used

If a post send is done in loopback and there is no receive queue entry,
the sending QP is put on a timeout list for a while so the receiver has
a chance to post a receive buffer. If another post send is done,
the code incorrectly tried to put the QP on the timeout list again and
corrupted the timeout list. This eventually leads to a spin lock deadlock
NMI due to the timer function looping forever with the lock held.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r fa38a027a085 -r 78ae7bddbd5e drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c   Thu Mar 15 14:34:24 2007 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c   Thu Mar 15 14:34:24 2007 -0700
@@ -265,7 +265,8 @@ again:
 again:
spin_lock_irqsave(&sqp->s_lock, flags);
 
-   if (!(ib_ipath_state_ops[sqp->state] & IPATH_PROCESS_SEND_OK)) {
+   if (!(ib_ipath_state_ops[sqp->state] & IPATH_PROCESS_SEND_OK) ||
+   qp->s_rnr_timeout) {
spin_unlock_irqrestore(&sqp->s_lock, flags);
goto done;
}
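
The failure mode described in the changelog (linking the same node into a list twice) and the kind of guard the patch adds can be sketched in user space as follows. The list helpers and the fake_qp structure are hypothetical stand-ins, not the driver's own types; a nonzero rnr_timeout is taken to mean "already on the timeout list".

```c
#include <stddef.h>

struct node { struct node *next, *prev; };

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static size_t list_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

struct fake_qp {
	struct node timerwait;
	unsigned int rnr_timeout;	/* nonzero while queued on the timeout list */
};

/*
 * Queue the QP for a retry. 'timeout' must be nonzero. Returns 1 if the
 * QP was queued, 0 if it was already waiting and was left alone; adding
 * it a second time would corrupt the list links.
 */
static int queue_rnr_wait(struct fake_qp *qp, struct node *pending,
			  unsigned int timeout)
{
	if (qp->rnr_timeout)
		return 0;
	qp->rnr_timeout = timeout;
	list_add_tail(&qp->timerwait, pending);
	return 1;
}
```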


Re: thread stacks and strict vm overcommit accounting

2007-03-15 Thread Alan Cox
> Stack RSS should certainly be included in Committed_AS,
> but RLIMIT_STACK merely limits how big the stack vma may grow to:
> at any moment the stack vma is probably very much smaller,
> and only its current size is accounted in Committed_AS.

With a typical size as a fuzz factor preaccounted in later kernels.

> > > Is this the intended behaviour?
> > 
> > That sounds like a bug to me.
> 
> I'm suspecting it's an oddity rather than a bug.

It is intended behaviour.


Re: thread stacks and strict vm overcommit accounting

2007-03-15 Thread Andrew Morton
On Thu, 15 Mar 2007 23:33:43 +
Alan Cox <[EMAIL PROTECTED]> wrote:

> > Stack RSS should certainly be included in Committed_AS,
> > but RLIMIT_STACK merely limits how big the stack vma may grow to:
> > at any moment the stack vma is probably very much smaller,
> > and only its current size is accounted in Committed_AS.
> 
> With a typical size as a fuzz factor preaccounted in later kernels.

Where's that done?

> > > > Is this the intended behaviour?
> > > 
> > > That sounds like a bug to me.
> > 
> > I'm suspecting it's an oddity rather than a bug.
> 
> It is intended behaviour.

Each instance of

#include <unistd.h>

int main(void)
{
	sleep(100);
	return 0;
}

appears to increase Committed_AS by around 200kb.  But we've committed to
providing it with 8MB for stack.

How come this is correct?
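
One way to reproduce the measurement is to sample Committed_AS from /proc/meminfo before and after starting a number of such processes and divide the difference by the count. A minimal sketch of the parsing step, assuming the usual "Committed_AS:   <n> kB" line format; the function name is made up for this example:

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the Committed_AS value (in kB) from /proc/meminfo-style text.
 * Returns -1 if the field is not present or cannot be parsed.
 */
static long parse_committed_as_kb(const char *meminfo)
{
	const char *p = strstr(meminfo, "Committed_AS:");
	long kb;

	if (!p || sscanf(p, "Committed_AS: %ld kB", &kb) != 1)
		return -1;
	return kb;
}
```

The caller would read the whole of /proc/meminfo into a buffer and pass it in; keeping the parser separate from the file I/O makes it easy to check against canned input.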



Re: taskstats accounting info

2007-03-15 Thread Balbir Singh

Andrew Morton wrote:
> On Wed, 14 Mar 2007 17:48:32 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:
> > Randy.Dunlap wrote:
> > > Hi,
> > >
> > > Documentation/accounting/delay-accounting.txt says that the
> > > getdelays program has a "-c cmd" argument, but that option
> > > does not seem to exist in Documentation/accounting/getdelays.c.
> > >
> > > Do you have an updated version of getdelays.c?
> > > If not, please correct that documentation.
> >
> > Yes, I did, but then I changed my laptop. I should have it archived
> > at some place, I'll dig it out or correct the documentation.
> >
> > > Is getdelays.c the best available example of a program
> > > using the taskstats netlink interface?
> >
> > It's the most portable example, since it does not depend on libnl.
>
> err, what is libnl?

libnl is a library abstraction for netlink (libnetlink).

> If there exists some real userspace infrastructure which utilises
> taskstats, can we please get a reference to it into the kernel
> Documentation?  Perhaps in the TASKSTATS Kconfig entry, thanks.

That sounds like a good idea. I'll check for details and get back.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [BUG] Linux 2.6.20.2 - unable to handle kernel paging request - still accessing freed memory

2007-03-15 Thread Greg KH
On Wed, Mar 14, 2007 at 01:23:02PM +0200, Pekka Enberg wrote:
> Hi Greg,
> 
> I think there's some sort of reference counting problem with sysfs in
> 2.6.20 kernels. Can you please help us debug it further?

Is there any way you can use 'git bisect' to try to track down the root
cause of this?

thanks,

greg k-h


Re: Move to unshared VMAs in NOMMU mode?

2007-03-15 Thread David Howells
Hugh Dickins <[EMAIL PROTECTED]> wrote:

> But if "the SYSV SHM problem" you mention at the beginning
> is just the "nattch" problem you mention at the end, I doubt
> that's worth such a redesign as you're considering here.

Yes, as far as I know that's the problem.  nattch is available to userspace and
seems to misbehave as far as userspace programs are concerned (I think the
program sees that it is 1 and assumes itself to be the last user).

> Actually, I'm rather surprised SHM needs any such nattch count,
> I'd expect it to be deducible from file->f_count and mode_DEST
> (but haven't investigated whether that really works out at all).

Ummm...  Currently file->f_count doesn't count the number of shmats because the
VMAs are shared.  If they are no longer shared then the problem goes away.

There may be several VMLs for a particular process pointing to a VMA.

sys_shmdt() doesn't malfunction because it's not possible to split a VMA in
NOMMU mode, and so the whole VMA must match.

Actually, looking carefully at it, it might go wrong if someone does shmat(),
munmap(), shmdt().  do_munmap(), however, protects against too many munmaps (in
whatever form they're issued).

> If you just need a little CONFIG_MMU in ipc/shm.c to solve your
> problem, I don't think more is justified.

Hmmm... I'm not sure it's quite that simple.  SYSV SHM is provided by a chain
of shm -> tiny-shmem -> ramfs.  The mapping is actually managed by ramfs.

> Your struct vm_region idea does look more to my taste than what
> you presently have; yet if you pursue it, I think it would just
> make divergence worse wouldn't it?  NOMMU wanting vma to contain
> a pointer to vm_region, MMU wanting vm_region embedded in vma.

That bit of divergence is, in effect, already there.  In NOMMU-mode the VMA
owns the backing store; in MMU-mode it does not.  This would, at least, rectify
that: fixing it would mean that the backing store is no longer owned by the
VMA, and would permit more flexibility in overlapping mappings.

> I don't really understand why NOMMU chooses to share vmas, or
> vm_regions, rather than just sharing the data which they indicate.

Where would that data be?  How do you keep track of it?  How do you know when
to deallocate it?

I have considered co-opting the pagecache attached to the mapped inode (which
is exactly how I do shared-writable mappings on ramfs), but that only works for
shared mappings.  I still have to have a way to handle unshareable mappings.
At the moment, they're both the same way (unless overridden by the driver/fs),
and I just share the VMA.

> Just because you can use less memory that way?

That's one consideration.  The other is that it makes management of these
chunks of data simpler.  If the memory isn't attached to the VMA then it must
be managed in some other manner.

David


Re: [PATCH] Fix COMPAT_VDSO regression bug

2007-03-15 Thread Leroy van Logchem
Andrew Morton <[EMAIL PROTECTED]> writes:

> > Revert "[PATCH] Fix CONFIG_COMPAT_VDSO"
> > This reverts commit a1f3bb9ae4497a2ed3eac773fd7798ac33a0371f.
> > 
> > Several systems couldn't boot using CONFIG_HIGHMEM64G=y as
> > reported in bug #8040. Reverting the above patch solved the problem.

> I think reverting it is probably the right thing to do, unless we can fix
> it for real quite promptly.

Chuck Ebbert at redhat.com asked:

> Can you please double check this by trying with/without again -- sometimes
> bisects go bad.

As requested, I redid the test, this time without git, using kernel.org
tarballs. The results, still using the same .config:
linux-2.6.20.tar.gz  : bad
linux-2.6.20.1.tar.gz: bad (boot log equal)
linux-2.6.20.2.tar.gz: good
linux-2.6.20.3.tar.gz: good
(triple checked)

Chuck is right, the bisect went bad.
I asked Nilshar to try these kernels too with:
COMPAT_VDSO=y
CONFIG_HIGHMEM64G=y

He did and says 2.6.20.3 works fine. So only 2.6.20 and 2.6.20.1 had
this 'hang' at boot behavior on my Supermicro 7044 while Nilshar's
machine started working with 2.6.20.3

So the revert can be avoided, imo. I hope more of the people who reported
bug #8040 speak up and confirm it's fine with the latest stable.



Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Andrea Arcangeli
On Thu, Mar 15, 2007 at 05:44:01PM +, Hugh Dickins wrote:
> who removed the !offset condition, he should be consulted on its
> reintroduction.

the !offset check does look like a pretty broken heuristic, it would
break random I/O. The real fix is to add a ra.prev_offset along with
ra.prev_page, and if whoever implements it wants to be stylish, they can
instead use a ra.last_contiguous_read structure that has page and
offset fields (and then of course remove ra.prev_page).


Re: [PATCH 2/3] FUTEX : introduce private hashtables

2007-03-15 Thread William Lee Irwin III
On Fri, Mar 16, 2007 at 07:25:53AM +1100, Nick Piggin wrote:
> I would just avoid the complexity and setup/teardown costs, and just
> use a vmalloc'ed global hash for NUMA.

This patch is not the way to go, but neither are vmalloc()'d global
hashtables. When you just happen to hash to the wrong node, you're in
for quasi-unreproducible poor performance. The size is never right, at
which point RCU resizing is required with all its overhead and memory
freeing delays and failure to resize (even if only to contract) under
pressure. Better would be to use a different data structure admitting
locality of reference and adaptively sizing itself, furthermore
localized to the appropriate sharing domain.  For file-backed futexes,
this would be the struct address_space. For anonymous-backed futexes,
this would be the COW sharing group, which an anon_vma could almost be
used to represent. Using an object to properly represent the COW
sharing group (i.e. Hugh's struct anon) would do the trick, and one
might as well move the rmap code over to it while we're at it since the
anon_vma scanning tricks are all pointless overhead once the COW
sharing group is accurately tracked (the scanning around for nearby vmas
with ->anon_vma set is not great anyway, though the overhead is hidden
in the noise of large teardown and setup operations; inheriting on
fork() is much simpler and faster).

In such a manner localization is accomplished while no interface
extensions are required.


-- wli


Kconfig style question

2007-03-15 Thread Kumar Gala

For source lines I've seen both:

source "arch/powerpc/platforms/52xx/Kconfig"

and

source arch/powerpc/platforms/85xx/Kconfig

Is there a preferred style?  Quotes or not?

- k


Re: [PATCH 2/3] swsusp: Do not use page flags

2007-03-15 Thread Pavel Machek
Hi!

> > > > On Mon, 12 Mar 2007 22:19:20 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > > +int create_basic_memory_bitmaps(void)
> > > > +{
> > > > +   struct memory_bitmap *bm1, *bm2;
> > > > +   int error = 0;
> > > > +
> > > > +   BUG_ON(forbidden_pages_map || free_pages_map);
> > > > +
> > > > +   bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_ATOMIC);
> > > > +   if (!bm1)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   error = memory_bm_create(bm1, GFP_ATOMIC | __GFP_COLD, PG_ANY);
> > > > +   if (error)
> > > > +   goto Free_first_object;
> > > > +
> > > > +   bm2 = kzalloc(sizeof(struct memory_bitmap), GFP_ATOMIC);
> > > > +   if (!bm2)
> > > > +   goto Free_first_bitmap;
> > > > +
> > > > +   error = memory_bm_create(bm2, GFP_ATOMIC | __GFP_COLD, PG_ANY);
> > > > +   if (error)
> > > 
> > > What is the risk that we'll go OOM here?  GFP_ATOMIC is rather unreliable.
> > 
> > Well, this can be called after processes (including kswapd) has been frozen.
> > We can't go to sleep at this point.
> 
> So it _is_ unreliable?

We are careful to leave some memory aside for suspend... We actually
free memory at the beginning of suspend, and there's a simple "add a few
percent for our overhead" margin there.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 6/13] signalfd/timerfd/asyncfd v5 - timerfd core ...

2007-03-15 Thread Davide Libenzi
On Thu, 15 Mar 2007, Thomas Gleixner wrote:

> Davide,
> 
> On Wed, 2007-03-14 at 15:19 -0700, Davide Libenzi wrote:
> 
> > +static int timerfd_tmrproc(struct hrtimer *htmr)
> > +{
> > +   struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
> > +   int rval = HRTIMER_NORESTART;
> > +   unsigned long flags;
> > +
> > +   spin_lock_irqsave(&ctx->lock, flags);
> > +   ctx->ticks++;
> > +   wake_up_locked(&ctx->wqh);
> > +   if (ctx->tintv.tv64 != 0) {
> > +   hrtimer_forward(htmr, htmr->base->softirq_time, ctx->tintv);
> 
> Sorry, I missed that in the first reviews. Please use
> hrtimer_cb_get_time(htmr) instead of htmr->base->softirq_time, so this
> is high res timer safe.

Heh, I was actually looking for a function instead of peeking into the 
timer structure, but 2.6.20 did not have one. Rebased over 2.6.21-rc3 now, so I 
can use it.




> > +   rval = HRTIMER_RESTART;
> > +   }
> > +   spin_unlock_irqrestore(&ctx->lock, flags);
> > +
> > +   return rval;
> > +}
> > +
> > +
> > +static int timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
> > +const struct itimerspec *ktmr)
> > +{
> 
> Make this void, returns 0 anyway

Ack



> > +   enum hrtimer_mode htmode;
> > +
> > +   htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_ABS: HRTIMER_REL;
> > +
> > +   ctx->ticks = 0;
> > +   ctx->clockid = clockid;
> > +   ctx->flags = flags;
> > +   ctx->texp = timespec_to_ktime(ktmr->it_value);
> 
> clockid is stored in the timer on setup, so no need to store it again.
> expiry time and flags are not used after setup.
> 
> Please remove those fields.

Ack



> > +   if (ufd == -1) {
> > +   ctx = kmem_cache_alloc(timerfd_ctx_cachep, GFP_KERNEL);
> > +   if (!ctx)
> > +   return -ENOMEM;
> > +
> > +   init_waitqueue_head(&ctx->wqh);
> > +   spin_lock_init(&ctx->lock);
> > +   ctx->clockid = -1;
> > +
> > +   error = timerfd_setup(ctx, clockid, flags, );
> > +   if (error)
> > +   goto err_ctxfree;
> 
> Timer setup can not fail

Ack, the new version can't.



> > +   /*
> > +* When we call this, the initialization must be complete, since
> > +* aino_getfd() will install the fd.
> > +*/
> > +   error = aino_getfd(, , , "[timerfd]",
> > +  _fops, ctx);
> > +   if (error)
> > +   goto err_ctxfree;
> 
> Again: Please turn this around. No need to start the timer before we
> know, that everything works. 

The timerfd_setup() is not locked, so we need to make sure everything is 
setup, before advertising the fd (and aino_getfd does that).



> > +   kmem_cache_free(timerfd_ctx_cachep, ctx);
> > +}
> > +
> > +
> > +static int timerfd_close(struct inode *inode, struct file *file)
> > +{
> > +   timerfd_cleanup(file->private_data);
> > +   return 0;
> > +}
> > +
> 
> Please move the timerfd_cleanup code into close(). 

I usually prefer to have a cleanup function that works on the file's data, 
but I moved the code in the release function now.
Thx for the review! I'll repost a new version based on 2.6.21-rc3 ...




- Davide




Re: thread stacks and strict vm overcommit accounting

2007-03-15 Thread Hugh Dickins
On Thu, 15 Mar 2007, Andrew Morton wrote:
> On Thu, 15 Mar 2007 23:33:43 +
> Alan Cox <[EMAIL PROTECTED]> wrote:
> 
> > > Stack RSS should certainly be included in Committed_AS,
> > > but RLIMIT_STACK merely limits how big the stack vma may grow to:
> > > at any moment the stack vma is probably very much smaller,
> > > and only its current size is accounted in Committed_AS.
> > 
> > With a typical size as a fuzz factor preaccounted in later kernels.
> 
> Where's that done?

I don't know what Alan is referring to there.

> 
> > > > > Is this the intended behaviour?
> > > > 
> > > > That sounds like a bug to me.
> > > 
> > > I'm suspecting it's an oddity rather than a bug.
> > 
> > It is intended behaviour.

Intended in the way the different stacks are implemented,
but odd enough for us to wonder at the difference.

> 
> Each instance of
> 
> main()
> {
>   sleep(100);
> }
> 
> appears to increase Committed_AS by around 200kb.  But we've committed to
> providing it with 8MB for stack.
> 
> How come this is correct?

We've no more committed to providing each instance with 8MB of stack,
than we've committed to providing each instance with RLIMIT_AS of
address space.  The rlimits are limits, not commitments, surely?
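
To illustrate the limit-versus-commitment distinction, here is a minimal
user-space sketch, assuming a POSIX system. The helper names
stack_limit_bytes() and print_stack_limit() are made up for this example.
The value reported is only a ceiling on how far the stack vma may grow; it
says nothing about how much is currently accounted in Committed_AS.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft RLIMIT_STACK in bytes.
 * Returns 0 on success, -1 if getrlimit() fails.
 * RLIM_INFINITY means no cap is set. */
int stack_limit_bytes(rlim_t *soft)
{
    struct rlimit rl;

    assert(soft != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}

/* Print the limit; nothing here reserves or commits any memory. */
void print_stack_limit(void)
{
    rlim_t soft;

    if (stack_limit_bytes(&soft) != 0) {
        perror("getrlimit");
        return;
    }
    if (soft == RLIM_INFINITY)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu kB\n", (unsigned long long)(soft / 1024));
}
```

Calling print_stack_limit() from main() and comparing the result with the
Committed_AS line in /proc/meminfo should show the two numbers are unrelated
in magnitude.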

Hugh


Re: [QUICKLIST 0/4] Arch independent quicklists V2

2007-03-15 Thread William Lee Irwin III
On Tue, Mar 13, 2007 at 06:12:44PM -0700, William Lee Irwin III wrote:
> There are furthermore distinctions to make between fork() and execve().
> fork() stomps over the entire process address space copying pagetables
> en masse. After execve() a process incrementally faults in PTE's one at
> a time. It should be clear that if case analyses are of interest at
> all, fork() will want cache-hot pages (cache-preloaded pages?) where
> such are largely wasted on incremental faults after execve(). The copy
> operations in fork() should probably also be examined in the context of
> shared pagetables at some point.

To make this perfectly clear, we can deal with the varying usage cases
with hot/cold flags to the pagetable allocator functions. Where bulk
copies such as fork() are happening, it makes perfect sense to
precharge the cache by eager zeroing. Where sparse single pte affairs
such as incrementally faulting things in after execve() are involved,
cache cold preconstructed pagetable pages are ideal. Address hints
could furthermore be used to precharge single cachelines (e.g. via
prefetch) in the sparse usage case.


-- wli


Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Andrea Arcangeli
On Thu, Mar 15, 2007 at 03:06:01PM -0700, Andrew Morton wrote:
> On Thu, 15 Mar 2007 22:49:23 +0100
> Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <[EMAIL PROTECTED]> wrote:
> > > > I still think the simple fix of removing the 
> > > > condition is the best approach, but I'm certainly open to alternatives.
> > > 
> > > Yes, the problem of falsely activating pages when the file is read in small
> > > hunks is worse than the problem which your patch fixes.
> > 
> > Really? I would have expected all performance sensitive apps to read
> > in >=PAGE_SIZE chunks. And if they don't because they split their
> > dataset in blocks (like some database), it may not be so wrong to
> > activate those pages that have two "hot" blocks more aggressively than
> > those pages with a single hot block.
> 
> But the problem which is being fixed here is really obscure: an application
> repeatedly reading the first page and only the first page of a file, always
> via the same fd.
>
> I'd expect that the sub-page-size read scenario happens heaps more often
> than that, especially when dealing with larger PAGE_SIZEs.

Whatever that app is doing, clearly we have to keep those 4k in cache!
Like obviously the specweb demonstrated that as long as you are
_repeating_ the same read, it's correct to activate the page even if
it was reading from the same page as before.

What is wrong is to activate the page more aggressively if it's
_different_ parts of the page that are being read in a contiguous
way. I thought that the whole point of the ra.prev_page was to detect
_contiguous_ (not random) I/O made with a small buffer, anything else
doesn't make much sense to me.

In short I think taking a ra.prev_offset into account as suggested by
Dave Kleikamp is the best, it may actually benefit the obscure app too ;)
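
The prev_offset idea can be modelled in user space. What follows is only a
sketch under stated assumptions, not the kernel's readahead code: struct
ra_model, should_mark_accessed() and the 4 KiB page size are invented for
illustration, and only the first page touched by a read is considered. A read
is treated as a continuation (and not re-marked) when it starts exactly where
the previous read stopped, in the middle of the same page; anything else,
including a repeated read of the same bytes, marks the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the readahead state discussed above. */
struct ra_model {
    unsigned long prev_page;    /* page index where the last read ended */
    unsigned long prev_offset;  /* byte offset in that page where it ended */
    bool valid;                 /* false until the first read is recorded */
};

/* Decide whether the page at file position pos should be marked accessed,
 * then record where this read of len bytes ended. */
bool should_mark_accessed(struct ra_model *ra, unsigned long pos,
                          unsigned long len)
{
    unsigned long page = pos / MODEL_PAGE_SIZE;
    unsigned long off = pos % MODEL_PAGE_SIZE;
    unsigned long end = pos + len;
    bool continuation;

    assert(ra != NULL && len > 0);

    /* Contiguous small-buffer read: resumes mid-page exactly where the
     * previous read stopped, so the page was already marked once. */
    continuation = ra->valid && off != 0 &&
                   page == ra->prev_page && off == ra->prev_offset;

    ra->prev_page = end / MODEL_PAGE_SIZE;
    ra->prev_offset = end % MODEL_PAGE_SIZE;
    ra->valid = true;

    return !continuation;
}
```

With this rule, four sequential 1 KiB reads of one page mark it once, while
re-reading the first 1 KiB over and over marks it every time, which is the
behaviour argued for above.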


Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Dave Kleikamp
On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote:
> On Thu, Mar 15, 2007 at 05:44:01PM +, Hugh Dickins wrote:
> > who removed the !offset condition, he should be consulted on its
> > reintroduction.
> 
> the !offset check looks a pretty broken heuristic indeed, it would
> break random I/O.

I wouldn't call it broken.  At worst, I'd say it's imperfect.  But
that's the nature of a heuristic.  It most likely works in a huge
majority of cases.

> The real fix is to add a ra.prev_offset along with
> ra.prev_page, and if who implements it wants to be stylish he can as
> well use a ra.last_contiguous_read structure that has a page and
> offset fields (and then of course remove ra.prev_page).

I suggested something along these lines, but I wonder if it's overkill.
The !offset check is simple and appears to be a decent improvement over
the current code.
-- 
David Kleikamp
IBM Linux Technology Center



Re: [PATCH] fix cyclades.h for x86_64 (and probably others)

2007-03-15 Thread Klaus Kudielka
On Thu, Mar 15, 2007 at 11:07:08AM -0800, Andrew Morton wrote:
> Looks OK, thanks.
> 
> It would be nice as a followup patch to simply remove ucchar, uclong and
> all that gunk altogether from that driver  and just use u8, u16 etc.
> 
> But if you decide to do that, please fix your email client first - it is
> replacing tabs with spaces.

Something like this? Applies & compiles Ok on 2.6.20.
I don't have access to the hardware right now, but am pretty sure that
the result is the same.

BTW, it was a copy-paste which made the spaces ;)

Regards, Klaus

--- include/linux/cyclades.h.orig   2007-03-15 23:46:00.0 +0100
+++ include/linux/cyclades.h2007-03-15 23:14:26.0 +0100
@@ -67,6 +67,8 @@
 #ifndef _LINUX_CYCLADES_H
 #define _LINUX_CYCLADES_H
 
+#include 
+
 struct cyclades_monitor {
 unsigned long   int_count;
 unsigned long   char_count;
@@ -149,15 +151,6 @@
  * architectures and compilers.
  */
 
-#if defined(__alpha__)
-typedef unsigned long  ucdouble;   /* 64 bits, unsigned */
-typedef unsigned int   uclong; /* 32 bits, unsigned */
-#else
-typedef unsigned long  uclong; /* 32 bits, unsigned */
-#endif
-typedef unsigned short ucshort;/* 16 bits, unsigned */
-typedef unsigned char  ucchar; /* 8 bits, unsigned */
-
 /*
  * Memory Window Sizes
  */
@@ -174,24 +167,24 @@
  */
 
 struct CUSTOM_REG {
-   uclong  fpga_id;/* FPGA Identification Register */
-   uclong  fpga_version;   /* FPGA Version Number Register */
-   uclong  cpu_start;  /* CPU start Register (write) */
-   uclong  cpu_stop;   /* CPU stop Register (write) */
-   uclong  misc_reg;   /* Miscelaneous Register */
-   uclong  idt_mode;   /* IDT mode Register */
-   uclong  uart_irq_status;/* UART IRQ status Register */
-   uclong  clear_timer0_irq;   /* Clear timer interrupt Register */
-   uclong  clear_timer1_irq;   /* Clear timer interrupt Register */
-   uclong  clear_timer2_irq;   /* Clear timer interrupt Register */
-   uclong  test_register;  /* Test Register */
-   uclong  test_count; /* Test Count Register */
-   uclong  timer_select;   /* Timer select register */
-   uclong  pr_uart_irq_status; /* Prioritized UART IRQ stat Reg */
-   uclong  ram_wait_state; /* RAM wait-state Register */
-   uclong  uart_wait_state;/* UART wait-state Register */
-   uclong  timer_wait_state;   /* timer wait-state Register */
-   uclong  ack_wait_state; /* ACK wait State Register */
+   __u32   fpga_id;/* FPGA Identification Register */
+   __u32   fpga_version;   /* FPGA Version Number Register */
+   __u32   cpu_start;  /* CPU start Register (write) */
+   __u32   cpu_stop;   /* CPU stop Register (write) */
+   __u32   misc_reg;   /* Miscelaneous Register */
+   __u32   idt_mode;   /* IDT mode Register */
+   __u32   uart_irq_status;/* UART IRQ status Register */
+   __u32   clear_timer0_irq;   /* Clear timer interrupt Register */
+   __u32   clear_timer1_irq;   /* Clear timer interrupt Register */
+   __u32   clear_timer2_irq;   /* Clear timer interrupt Register */
+   __u32   test_register;  /* Test Register */
+   __u32   test_count; /* Test Count Register */
+   __u32   timer_select;   /* Timer select register */
+   __u32   pr_uart_irq_status; /* Prioritized UART IRQ stat Reg */
+   __u32   ram_wait_state; /* RAM wait-state Register */
+   __u32   uart_wait_state;/* UART wait-state Register */
+   __u32   timer_wait_state;   /* timer wait-state Register */
+   __u32   ack_wait_state; /* ACK wait State Register */
 };
 
 /*
@@ -201,34 +194,34 @@
  */
 
 struct RUNTIME_9060 {
-   uclong  loc_addr_range; /* 00h - Local Address Range */
-   uclong  loc_addr_base;  /* 04h - Local Address Base */
-   uclong  loc_arbitr; /* 08h - Local Arbitration */
-   uclong  endian_descr;   /* 0Ch - Big/Little Endian Descriptor */
-   uclong  loc_rom_range;  /* 10h - Local ROM Range */
-   uclong  loc_rom_base;   /* 14h - Local ROM Base */
-   uclong  loc_bus_descr;  /* 18h - Local Bus descriptor */
-   uclong  loc_range_mst;  /* 1Ch - Local Range for Master to PCI */
-   uclong  loc_base_mst;   /* 20h - Local Base for Master PCI */
-   uclong  loc_range_io;   /* 24h - Local Range for Master IO */
-   uclong  pci_base_mst;   /* 28h - PCI Base for Master PCI */
-   uclong  pci_conf_io;/* 2Ch - PCI configuration for Master IO */
-   uclong  filler1;/* 30h */
-   uclong  filler2;/* 34h */
-   uclong  filler3;/* 38h */
-   uclong  filler4;  

Re: [PATCH 10/22 take 3] UBI: EBA unit

2007-03-15 Thread Josh Boyer
On Thu, Mar 15, 2007 at 02:24:10PM -0700, Randy Dunlap wrote:
> On Thu, 15 Mar 2007 11:07:03 -0800 Andrew Morton wrote:
> 
> > 
> > There's way too much code here to expect it to get decently reviewed, alas.
> 
> Yes.
> 
> /me repeats wish that Not Everything Should Be Sent to lkml.  :(

Just curious, but where would you suggest this be sent to for review then?

josh


Re: [PATCH] i386: Simplify smp_call_function*() by using common implementation

2007-03-15 Thread Jeremy Fitzhardinge
Andrew Morton wrote:
> Hopeless, sorry.   It's probably time to start thinking about raising x86
> patches against the x86 tree (at least).
>   

How's this?

J

Subject: Simplify smp_call_function*() by using common implementation

smp_call_function and smp_call_function_single are almost complete
duplicates of the same logic.  This patch combines them by
implementing them in terms of the more general
smp_call_function_mask().
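
The mask handling that makes the common implementation possible can be
sketched in plain user-space C. This is an illustration only: it assumes at
most 64 CPUs represented as bits of a uint64_t, and cpumask_model_t,
target_cpus(), mask_weight() and is_broadcast() are hypothetical names, not
kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_model_t;  /* assumption: one bit per CPU, max 64 */

/* Restrict the requested set to online CPUs other than the caller. */
cpumask_model_t target_cpus(cpumask_model_t requested,
                            cpumask_model_t online, unsigned self)
{
    cpumask_model_t allbutself;

    assert(self < 64);
    allbutself = online & ~(UINT64_C(1) << self);
    return requested & allbutself;
}

/* Count set bits; plays the role of cpus_weight() in this model. */
unsigned mask_weight(cpumask_model_t m)
{
    unsigned n = 0;

    while (m) {
        m &= m - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Nonzero when the targets are "everyone but me", i.e. the case where a
 * broadcast IPI could be used instead of a targeted one. */
int is_broadcast(cpumask_model_t requested, cpumask_model_t online,
                 unsigned self)
{
    cpumask_model_t allbutself = online & ~(UINT64_C(1) << self);

    return target_cpus(requested, online, self) == allbutself;
}
```

In this model the "all CPUs" and "single CPU" variants differ only in the
mask they pass in, which is the point of the consolidation.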

[ Jan, Andi: This only changes arch/i386; can x86_64 be changed in the
  same way? ]

[ Rebased onto Jan's x86_64-mm-consolidate-smp_send_stop patch ]

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Jan Beulich <[EMAIL PROTECTED]>
Cc: Stephane Eranian <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: "Randy.Dunlap" <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>

---
 arch/i386/kernel/smp.c |  177 +++-
 1 file changed, 86 insertions(+), 91 deletions(-)

===
--- a/arch/i386/kernel/smp.c
+++ b/arch/i386/kernel/smp.c
@@ -515,14 +515,26 @@ void unlock_ipi_call_lock(void)
 
 static struct call_data_struct *call_data;
 
-static void __smp_call_function(void (*func) (void *info), void *info,
-   int nonatomic, int wait)
+
+static int __smp_call_function_mask(cpumask_t mask,
+   void (*func)(void *), void *info,
+   int wait)
 {
struct call_data_struct data;
-   int cpus = num_online_cpus() - 1;
+   cpumask_t allbutself;
+   int cpus;
+
+   /* Can deadlock when called with interrupts disabled */
+   WARN_ON(irqs_disabled());
+
+   allbutself = cpu_online_map;
+   cpu_clear(smp_processor_id(), allbutself);
+
+   cpus_and(mask, mask, allbutself);
+   cpus = cpus_weight(mask);
 
if (!cpus)
-   return;
+   return 0;
 
data.func = func;
data.info = info;
@@ -533,9 +545,12 @@ static void __smp_call_function(void (*f
 
call_data = 
mb();
-   
-   /* Send a message to all other CPUs and wait for them to respond */
-   send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+
+   /* Send a message to other CPUs */
+   if (cpus_equal(mask, allbutself))
+   send_IPI_allbutself(CALL_FUNCTION_VECTOR);
+   else
+   send_IPI_mask(mask, CALL_FUNCTION_VECTOR);
 
/* Wait for response */
while (atomic_read() != cpus)
@@ -544,6 +559,34 @@ static void __smp_call_function(void (*f
if (wait)
while (atomic_read() != cpus)
cpu_relax();
+
+   return 0;
+}
+
+/**
+ * smp_call_function_mask(): Run a function on a set of other CPUs.
+ * @mask: The set of cpus to run on.  Must not include the current cpu.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code. Does not return until
+ * remote CPUs are nearly ready to execute <<func>> or are or have finished.
+ *
+ * You must not call this function with disabled interrupts or from a
+ * hardware interrupt handler or from a bottom half handler.
+ */
+int smp_call_function_mask(cpumask_t mask,
+void (*func)(void *), void *info,
+int wait)
+{
+   int ret;
+
+   spin_lock(&call_lock);
+   ret = __smp_call_function_mask(mask, func, info, wait);
+   spin_unlock(&call_lock);
+
+   return ret;
 }
 
 /**
@@ -559,20 +602,43 @@ static void __smp_call_function(void (*f
  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.
  */
-int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
-   int wait)
-{
-   /* Can deadlock when called with interrupts disabled */
-   WARN_ON(irqs_disabled());
-
-   /* Holding any lock stops cpus from going down. */
-   spin_lock(&call_lock);
-   __smp_call_function(func, info, nonatomic, wait);
-   spin_unlock(&call_lock);
-
-   return 0;
+int smp_call_function(void (*func) (void *info), void *info, int nonatomic,
+ int wait)
+{
+   return smp_call_function_mask(cpu_online_map, func, info, wait);
 }
 EXPORT_SYMBOL(smp_call_function);
+
+/*
+ * smp_call_function_single - Run a function on another CPU
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @nonatomic: Currently unused.
+ * @wait: If true, wait until function has completed on other CPUs.
+ *
+ * Returns 0 on success, else a negative status code.
+ *
+ * Does not return until the remote CPU is nearly ready to execute 
+ 

Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Andrea Arcangeli
On Thu, Mar 15, 2007 at 06:15:45PM -0500, Dave Kleikamp wrote:
> On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote:
> > On Thu, Mar 15, 2007 at 05:44:01PM +, Hugh Dickins wrote:
> > > who removed the !offset condition, he should be consulted on its
> > > reintroduction.
> > 
> > the !offset check looks a pretty broken heuristic indeed, it would
> > break random I/O.
> 
> I wouldn't call it broken.  At worst, I'd say it's imperfect.  But
> that's the nature of a heuristic.  It most likely works in a huge
> majority of cases.

well, IMHO in the huge majority of cases the prev_page check isn't
necessary in the first place (and IMHO it hurts a lot more than it can
help, as demonstrated by specweb, since we'll bite on the good guys to
help the bad guys).

The only case where I can imagine the prev_page to make sense is to
handle contiguous I/O made with a small buffer, so clearly an
inefficient code in the first place. But if this guy is reading with


Re: [PATCH] [REPOST] x86_64, i386: Add command line length to boot protocol

2007-03-15 Thread H. Peter Anvin

Alon Bar-Lev wrote:
> Hello,
>
> I really don't understand why you insist that the boot protocol
> >=2.02 had 255 limit!
>
> Please remove this from the description.
> You want to add size, that's OK, but please don't mess with previous
> definitions.
> Boot protocol 2.02 introduced the null terminated string truncated by
> kernel, which can be at any size.



Well, except for a very brief window, the limit *was* 255.  If the boot 
loader wants to verify nontruncation, this is a valid concern.
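
As a rough sketch of what "verify nontruncation" could look like on the boot
loader side: the code below assumes the loader knows whether the kernel
header carries a command line length field and, if so, its value. The names
cmdline_limit(), cmdline_fits(), has_size_field and advertised_size are
invented for this example; only the 255 figure comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEGACY_CMDLINE_MAX 255u  /* limit cited above for older kernels */

/* Longest command line (not counting the terminating NUL) the loader can
 * pass without risking truncation by the kernel. */
size_t cmdline_limit(int has_size_field, size_t advertised_size)
{
    return has_size_field ? advertised_size : LEGACY_CMDLINE_MAX;
}

/* Returns 1 if cmdline fits within the limit, 0 if it would be truncated. */
int cmdline_fits(const char *cmdline, int has_size_field,
                 size_t advertised_size)
{
    assert(cmdline != NULL);
    return strlen(cmdline) <= cmdline_limit(has_size_field, advertised_size);
}
```

A loader could refuse to boot, or warn the user, when cmdline_fits() returns
0 rather than silently handing over a string the kernel will cut short.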


-hpa


Re: thread stacks and strict vm overcommit accounting

2007-03-15 Thread Dan Aloni
On Thu, Mar 15, 2007 at 03:36:13PM -0700, Andrew Morton wrote:
> 
> > > > > Is this the intended behaviour?
> > > > 
> > > > That sounds like a bug to me.
> > > 
> > > I'm suspecting it's an oddity rather than a bug.
> > 
> > It is intended behaviour.
> 
> Each instance of
> 
> main()
> {
>   sleep(100);
> }
> 
> appears to increase Committed_AS by around 200kb.  But we've committed to
> providing it with 8MB for stack.
> 
> How come this is correct?

Perhaps it makes a lot of sense if you regard stack growth in 
the same sense that you regard heap growth by means of brk(). 

Just because the stack is limited by default and RLIMIT_DATA 
is unlimited doesn't mean that we need to account for the maximum
stack size. 

Perhaps for embedded systems where you want to have overcommit_memory=2 
overcommit_ratio=100 and no swap (for design constraints), just to make
sure that allocations *always* fail before OOM gets triggered (and 
therefore OOM never gets triggered, thankfully), it would have been
useful to look at Committed_AS to see how close the system is 
to its maximum memory utilization potential.

Learning about this 'oddity' in Committed_AS, I'd guess it would be 
better for me not to rely on it for measurements and perhaps tweak 
smaller values of RSS_STACK for processes on that embedded system.
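
For reference, the kind of measurement described above can be sketched as
follows. This is a minimal example, assuming the usual "Name:   value kB"
layout of /proc/meminfo and the CommitLimit and Committed_AS fields; the
function names meminfo_field_kb() and commit_headroom_kb() are made up here.
The parser works on a text buffer so it can be tried without a live /proc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find "field:   <n> kB" in meminfo-formatted text.
 * Returns n, or -1 if the field is not present. */
long meminfo_field_kb(const char *text, const char *field)
{
    size_t flen;
    const char *line = text;

    assert(text != NULL && field != NULL);
    flen = strlen(field);

    while (line && *line) {
        if (strncmp(line, field, flen) == 0 && line[flen] == ':')
            return strtol(line + flen + 1, NULL, 10);
        line = strchr(line, '\n');  /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* Store CommitLimit - Committed_AS in *out. Returns 0 on success, -1 if
 * either field is missing. The result may be negative when strict
 * accounting is not in effect. */
int commit_headroom_kb(const char *text, long *out)
{
    long limit = meminfo_field_kb(text, "CommitLimit");
    long used = meminfo_field_kb(text, "Committed_AS");

    assert(out != NULL);
    if (limit < 0 || used < 0)
        return -1;
    *out = limit - used;
    return 0;
}
```

To use it on a real system, read /proc/meminfo into a NUL-terminated buffer
and pass that buffer as text. As the thread notes, the figure reflects only
the current size of each stack vma, not the RLIMIT_STACK ceiling.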

-- 
Dan Aloni
XIV LTD, http://www.xivstorage.com
da-x (at) monatomic.org, dan (at) xiv.co.il


Re: [PATCH 2/3] swsusp: Do not use page flags

2007-03-15 Thread Rafael J. Wysocki
On Thursday, 15 March 2007 23:23, Andrew Morton wrote:
> On Thu, 15 Mar 2007 23:19:02 +0100 (CET)
> Jiri Kosina <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, 15 Mar 2007, Andrew Morton wrote:
> > 
> > > > > And why _does_ suspend use GFP_ATOMIC all over the place?
> > > > Generally, because it cannot sleep.
> > > Why not?
> > 
> > I guess it's simply because of kswapd being already frozen, so there is no 
> > chance that once GFP_KERNEL allocation goes to sleep, it is going to get 
> > any free pages eventually ... ?
> 
> No, things should run fine with a dead kswapd.
> 
> There are reasons why we can't call into filesystems from there, but
> GFP_NOIO will ensure that and it is heaps better than GFP_ATOMIC.

In fact the role of swsusp_shrink_memory() is to ensure that our subsequent
atomic allocations won't fail.

Still, the particular allocations in create_basic_memory_bitmaps() are made
before we call swsusp_shrink_memory(), so it's better to use GFP_NOIO in there.

I'll prepare a patch for that on top of the current series.


[patch 1/13] signal/timer/event fds v6 - anonymous inode source ...

2007-03-15 Thread Davide Libenzi
This patch adds an anonymous inode source, to be used for files that need 
an inode only in order to create a file*. We do not care about having an 
inode for each file, and we do not even care about having different names in 
the associated dentries (dentry names will be the same for classes of file*).
This allows code reuse, and will be used by epoll, signalfd and timerfd 
(and whatever else there'll be).



Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.21-rc3.quilt/fs/anon_inodes.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc3.quilt/fs/anon_inodes.c 2007-03-15 15:32:33.0 -0700
@@ -0,0 +1,203 @@
+/*
+ *  fs/anon_inodes.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+static int ainofs_delete_dentry(struct dentry *dentry);
+static struct inode *aino_getinode(void);
+static struct inode *aino_mkinode(void);
+static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, struct vfsmount *mnt);
+
+
+
+static struct vfsmount *aino_mnt __read_mostly;
+static struct inode *aino_inode;
+static struct file_operations aino_fops = { };
+static struct file_system_type aino_fs_type = {
+   .name   = "ainofs",
+   .get_sb = ainofs_get_sb,
+   .kill_sb= kill_anon_super,
+};
+static struct dentry_operations ainofs_dentry_operations = {
+   .d_delete   = ainofs_delete_dentry,
+};
+
+
+
+int aino_getfd(int *pfd, struct inode **pinode, struct file **pfile,
+  char const *name, const struct file_operations *fops, void *priv)
+{
+   struct qstr this;
+   struct dentry *dentry;
+   struct inode *inode;
+   struct file *file;
+   int error, fd;
+
+   error = -ENFILE;
+   file = get_empty_filp();
+   if (!file)
+   goto eexit_1;
+
+   inode = aino_getinode();
+   if (IS_ERR(inode)) {
+   error = PTR_ERR(inode);
+   goto eexit_2;
+   }
+
+   error = get_unused_fd();
+   if (error < 0)
+   goto eexit_3;
+   fd = error;
+
+   /*
+* Link the inode to a directory entry by creating a unique name
+* using the inode sequence number.
+*/
+   error = -ENOMEM;
+   this.name = name;
+   this.len = strlen(name);
+   this.hash = 0;
+   dentry = d_alloc(aino_mnt->mnt_sb->s_root, &this);
+   if (!dentry)
+   goto eexit_4;
+   dentry->d_op = &ainofs_dentry_operations;
+   /* Do not publish this dentry inside the global dentry hash table */
+   dentry->d_flags &= ~DCACHE_UNHASHED;
+   d_instantiate(dentry, inode);
+
+   file->f_path.mnt = mntget(aino_mnt);
+   file->f_path.dentry = dentry;
+   file->f_mapping = inode->i_mapping;
+
+   file->f_pos = 0;
+   file->f_flags = O_RDWR;
+   file->f_op = fops;
+   file->f_mode = FMODE_READ | FMODE_WRITE;
+   file->f_version = 0;
+   file->private_data = priv;
+
+   fd_install(fd, file);
+
+   *pfd = fd;
+   *pinode = inode;
+   *pfile = file;
+   return 0;
+
+eexit_4:
+   put_unused_fd(fd);
+eexit_3:
+   iput(inode);
+eexit_2:
+   put_filp(file);
+eexit_1:
+   return error;
+}
+
+
+static int ainofs_delete_dentry(struct dentry *dentry)
+{
+   /*
+* We faked vfs to believe the dentry was hashed when we created it.
+* Now we restore the flag so that dput() will work correctly.
+*/
+   dentry->d_flags |= DCACHE_UNHASHED;
+   return 1;
+}
+
+
+static struct inode *aino_getinode(void)
+{
+   return igrab(aino_inode);
+}
+
+
+/*
+ * A single inode exist for all aino files. On the contrary of pipes,
+ * aino inodes has no per-instance data associated, so we can avoid
+ * the allocation of multiple of them.
+ */
+static struct inode *aino_mkinode(void)
+{
+   int error = -ENOMEM;
+   struct inode *inode = new_inode(aino_mnt->mnt_sb);
+
+   if (!inode)
+   goto eexit_1;
+
+   inode->i_fop = &aino_fops;
+
+   /*
+* Mark the inode dirty from the very beginning,
+* that way it will never be moved to the dirty
+* list because mark_inode_dirty() will think
+* that it already _is_ on the dirty list.
+*/
+   inode->i_state = I_DIRTY;
+   inode->i_mode = S_IRUSR | S_IWUSR;
+   inode->i_uid = current->fsuid;
+   inode->i_gid = current->fsgid;
+   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   return inode;
+
+eexit_1:
+   return ERR_PTR(error);
+}
+
+
+static int ainofs_get_sb(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, struct vfsmount *mnt)
+{
+   return get_sb_pseudo(fs_type, "aino:", NULL, 

[patch 2/13] signal/timer/event fds v6 - signalfd core ...

2007-03-15 Thread Davide Libenzi
This patch series implements the new signalfd() system call.
I took part of the original Linus code (and you know how
badly it can be broken :), and I added even more breakage ;)
Signals are fetched from the same signal queue used by the process,
so signalfd will compete with standard kernel delivery in dequeue_signal().
If you want to reliably fetch signals on the signalfd file, you need to
block them with sigprocmask(SIG_BLOCK).
This seems to be working fine on my Dual Opteron machine. I made a quick 
test program for it:

http://www.xmailserver.org/signafd-test.c

The signalfd() system call implements signal delivery into a file
descriptor receiver. The signalfd file descriptor is created with the
following API:

int signalfd(int ufd, const sigset_t *mask, size_t masksize);

The "ufd" parameter allows changing the sigmask of an existing signalfd
without going through a close/create cycle (Linus' idea). Use "ufd" == -1
if you want a brand new signalfd file.
The "mask" parameter specifies the set of signals that we are interested
in. The "masksize" parameter is the size of "mask".
The signalfd fd supports the poll(2) and read(2) system calls. The poll(2)
call will return POLLIN when signals are available to be dequeued. As a
direct consequence of supporting the Linux poll subsystem, the signalfd fd
can be used together with epoll(2) too.
The read(2) system call will return a "struct signalfd_siginfo" structure
in the userspace supplied buffer. The return value is the number of bytes
copied into the supplied buffer, or -1 in case of error. The read(2) call
can also return 0, in case the sighand structure to which the signalfd
was attached has been orphaned. The O_NONBLOCK flag is also supported, and
read(2) will return -EAGAIN in case no signal is available.
The format of struct signalfd_siginfo is shown below. Which fields are valid
depends on the (->code & __SI_MASK) value, in the same way as for a struct
siginfo:

struct signalfd_siginfo {
        __u32 signo;    /* si_signo */
        __s32 err;      /* si_errno */
        __s32 code;     /* si_code */
        __u32 pid;      /* si_pid */
        __u32 uid;      /* si_uid */
        __s32 fd;       /* si_fd */
        __u32 tid;      /* si_tid */
        __u32 band;     /* si_band */
        __u32 overrun;  /* si_overrun */
        __u32 trapno;   /* si_trapno */
        __s32 status;   /* si_status */
        __s32 svint;    /* si_int */
        __u64 svptr;    /* si_ptr */
        __u64 utime;    /* si_utime */
        __u64 stime;    /* si_stime */
        __u64 addr;     /* si_addr */
};
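
For illustration only, here is a minimal userspace sketch of the
block-then-read pattern described above. It is not part of the patch. It
assumes the glibc wrapper signalfd(fd, mask, flags) from <sys/signalfd.h>
and glibc's struct signalfd_siginfo (ssi_-prefixed field names), both of
which differ from the three-argument prototype and the field names in this
posting. The helper name signalfd_roundtrip() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Block SIGUSR1, attach it to a signalfd, raise it once and read it back.
 * Returns the signal number read from the descriptor, or -1 on error.
 * Assumption: glibc's signalfd(fd, mask, flags), not the prototype
 * proposed in this patch series.
 */
int signalfd_roundtrip(void)
{
	sigset_t mask, old;
	struct signalfd_siginfo info;
	ssize_t n;
	int sfd;

	sigemptyset(&mask);
	sigaddset(&mask, SIGUSR1);

	/* Block normal delivery so the signal stays queued for the fd. */
	if (sigprocmask(SIG_BLOCK, &mask, &old) < 0)
		return -1;

	sfd = signalfd(-1, &mask, 0);	/* -1 asks for a new descriptor */
	if (sfd < 0) {
		sigprocmask(SIG_SETMASK, &old, NULL);
		return -1;
	}

	/* The signal is blocked, so it becomes pending instead of running a handler. */
	if (raise(SIGUSR1) != 0) {
		close(sfd);
		sigprocmask(SIG_SETMASK, &old, NULL);
		return -1;
	}

	/* read(2) dequeues the pending signal and fills in one siginfo record. */
	n = read(sfd, &info, sizeof(info));
	close(sfd);
	sigprocmask(SIG_SETMASK, &old, NULL);

	if (n != (ssize_t) sizeof(info))
		return -1;
	return (int) info.ssi_signo;
}
```

Calling signalfd_roundtrip() from a single-threaded main() should hand back
SIGUSR1. The signal is consumed by the read before the old mask is restored,
so no handler runs.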



Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.21-rc3.quilt/fs/signalfd.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc3.quilt/fs/signalfd.c2007-03-15 15:33:52.0 
-0700
@@ -0,0 +1,381 @@
+/*
+ *  fs/signalfd.c
+ *
+ *  Copyright (C) 2003  Linus Torvalds
+ *
+ *  Mon Mar 5, 2007: Davide Libenzi 
+ *  Changed ->read() to return a siginfo structure instead of signal number.
+ *  Fixed locking in ->poll().
+ *  Added sighand-detach notification.
+ *  Added fd re-use in sys_signalfd() syscall.
+ *  Now using anonymous inode source.
+ *  Thanks to Oleg Nesterov for useful code review and suggestions.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+struct signalfd_ctx {
+   struct list_head lnk;
+   wait_queue_head_t wqh;
+   sigset_t sigmask;
+   struct task_struct *tsk;
+};
+
+
+
+static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx,
+  unsigned long *flags);
+static void signalfd_put_sighand(struct signalfd_ctx *ctx,
+struct sighand_struct *sighand,
+unsigned long *flags);
+static void signalfd_cleanup(struct signalfd_ctx *ctx);
+static int signalfd_close(struct inode *inode, struct file *file);
+static unsigned int signalfd_poll(struct file *file, poll_table *wait);
+static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
+siginfo_t const *kinfo);
+static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
+loff_t *ppos);
+
+
+
+static const struct file_operations signalfd_fops = {
+   .release= signalfd_close,
+   .poll   = signalfd_poll,
+   .read   = signalfd_read,
+};
+static struct kmem_cache *signalfd_ctx_cachep;
+
+
+
+static struct sighand_struct *signalfd_get_sighand(struct signalfd_ctx *ctx,
+  unsigned long *flags)
+{
+   struct sighand_struct *sighand;
+
+   rcu_read_lock();
+   sighand = lock_task_sighand(ctx->tsk, flags);
+   rcu_read_unlock();
+
+   if (sighand && 

[patch 6/13] signal/timer/event fds v6 - timerfd core ...

2007-03-15 Thread Davide Libenzi
This patch introduces a new system call for timer events delivered
through file descriptors. This allows timer events to be used with
standard POSIX poll(2), select(2) and read(2). As a consequence of
supporting the Linux f_op->poll subsystem, they can be used with
epoll(2) too.
The system call is defined as:

int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr);

The "ufd" parameter allows for re-use (re-programming) of an existing
timerfd without going through the close/open cycle (same as signalfd).
If "ufd" is -1, a new file descriptor will be created, otherwise the
existing "ufd" will be re-programmed.
The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME.
The time specified in the "utmr->it_value" parameter is the expiry
time for the timer.
If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute
time, otherwise it's a relative time.
If the time specified in "utmr->it_interval" is not zero (zero meaning
tv_sec == 0 and tv_nsec == 0), this is the period at which the following
ticks should be generated.
The "utmr->it_interval" should be set to zero if only one tick is requested.
Setting the "utmr->it_value" to zero will disable the timer, or will create
a timerfd without the timer enabled.
The function returns the new (or same, in case "ufd" is a valid timerfd
descriptor) file descriptor, or -1 in case of error.
As stated before, the timerfd file descriptor supports poll(2), select(2)
and epoll(2). When a timer event happens on the timerfd, a POLLIN mask
will be returned.
The read(2) call can be used, and it will return a u32 variable holding
the number of "ticks" that happened on the interface since the last call
to read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN
will be returned if no ticks happened.
A quick test program shows timerfd working correctly on my amd64 box:

http://www.xmailserver.org/timerfd-test.c




Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.21-rc3.quilt/fs/timerfd.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc3.quilt/fs/timerfd.c 2007-03-15 16:08:05.0 -0700
@@ -0,0 +1,257 @@
+/*
+ *  fs/timerfd.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi 
+ *
+ *
+ *  Thanks to Thomas Gleixner for code reviews and useful comments.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+struct timerfd_ctx {
+   struct hrtimer tmr;
+   ktime_t texp, tintv;
+   spinlock_t lock;
+   wait_queue_head_t wqh;
+   unsigned long ticks;
+};
+
+
+static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr);
+static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
+ const struct itimerspec *ktmr);
+static int timerfd_close(struct inode *inode, struct file *file);
+static unsigned int timerfd_poll(struct file *file, poll_table *wait);
+static ssize_t timerfd_read(struct file *file, char __user *buf, size_t count,
+   loff_t *ppos);
+
+
+
+static const struct file_operations timerfd_fops = {
+   .release= timerfd_close,
+   .poll   = timerfd_poll,
+   .read   = timerfd_read,
+};
+static struct kmem_cache *timerfd_ctx_cachep;
+
+
+
+static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
+{
+   struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
+   enum hrtimer_restart rval = HRTIMER_NORESTART;
+   unsigned long flags;
+
+   spin_lock_irqsave(&ctx->lock, flags);
+   ctx->ticks++;
+   wake_up_locked(&ctx->wqh);
+   if (ctx->tintv.tv64 != 0) {
+       hrtimer_forward(htmr, hrtimer_cb_get_time(htmr), ctx->tintv);
+       rval = HRTIMER_RESTART;
+   }
+   spin_unlock_irqrestore(&ctx->lock, flags);
+
+   return rval;
+}
+
+
+static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
+ const struct itimerspec *ktmr)
+{
+   enum hrtimer_mode htmode;
+
+   htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
+
+   ctx->ticks = 0;
+   ctx->texp = timespec_to_ktime(ktmr->it_value);
+   ctx->tintv = timespec_to_ktime(ktmr->it_interval);
+   hrtimer_init(&ctx->tmr, clockid, htmode);
+   ctx->tmr.expires = ctx->texp;
+   ctx->tmr.function = timerfd_tmrproc;
+   if (ctx->texp.tv64 != 0)
+       hrtimer_start(&ctx->tmr, ctx->texp, htmode);
+}
+
+
+asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
+   const struct itimerspec __user *utmr)
+{
+   int error;
+   struct timerfd_ctx *ctx;
+   struct file *file;
+   struct inode *inode;
+   struct itimerspec ktmr;
+
+   if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
+       return -EFAULT;
+
+   if (clockid != 

[patch 4/13] signal/timer/event fds v6 - signalfd wire up x86_64 arch ...

2007-03-15 Thread Davide Libenzi
This patch wires the signalfd system call to the x86_64 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h 2007-02-04 
10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h  2007-03-15 
15:34:29.0 -0700
@@ -619,8 +619,10 @@
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_signalfd  280
+__SYSCALL(__NR_signalfd, sys_signalfd)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_signalfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S2007-03-15 
15:19:20.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S 2007-03-15 
15:35:35.0 -0700
@@ -714,9 +714,10 @@
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
.quad sys_epoll_pwait
+   .quad sys_signalfd  /* 320 */
 ia32_syscall_end:  



[patch 10/13] signal/timer/event fds v6 - eventfd core ...

2007-03-15 Thread Davide Libenzi
This is a very simple and light file descriptor that can be used as an
event wait/dispatch mechanism by userspace (both wait and dispatch) and by
the kernel (dispatch only). It can be used instead of pipe(2) in all cases
where a pipe would simply be used to signal events. Its kernel overhead
is much lower than that of a pipe, and it does not consume two fds. When
used in the kernel, it can offer an fd-bridge to enable, for example,
facilities like KAIO or syslets/threadlets to signal to an fd the completion
of certain operations. More generally, an eventfd can be used by the kernel
to signal readiness, in a POSIX poll/select way, of interfaces that would
otherwise be incompatible with it. The API is:

int eventfd(unsigned int count);

The eventfd API accepts an initial "count" parameter, and returns an
eventfd fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and
write(2).
The POLLIN flag is raised when the internal counter is greater than zero.
The POLLOUT flag is raised when at least a value of "1" can be written to
the internal counter.
The POLLERR flag is raised when an overflow in the counter value is detected.
The write(2) operation can never overflow the counter, since it blocks
(unless O_NONBLOCK is set, in which case -EAGAIN is returned).
But the eventfd_signal() function can overflow it, since it is not supposed
to sleep during its operation.
The read(2) function reads the __u64 counter value and resets the internal
value to zero. If the value read is equal to (__u64) -1, an overflow
happened on the internal counter (due to 2^64 eventfd_signal() posts
that have never been retired; unlikely, but possible).
The write(2) call writes a __u64 count value and adds it
to the current counter. The eventfd fd supports O_NONBLOCK also.
On the kernel side, we have:

struct file *eventfd_fget(int fd);
int eventfd_signal(struct file *file, unsigned int n);

The eventfd_fget() should be called to get a struct file* from an eventfd
fd (this is an fget() + check of f_op being an eventfd fops pointer).
The kernel can then call eventfd_signal() every time it wants to post
an event to userspace. The eventfd_signal() function can be called from any
context.
A simple eventfd() test and benchmark is available here:

http://www.xmailserver.org/eventfd-bench.c

This is the eventfd-based version of pipetest-4 (pipe(2) based):

http://www.xmailserver.org/pipetest-4.c

Not that performance matters much in the eventfd case, but eventfd-bench
shows almost double the performance of pipetest-4.
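
To make the counter semantics above concrete, here is a short userspace
sketch. It is not part of the patch. It assumes the glibc wrapper
eventfd(initval, flags) from <sys/eventfd.h>, which takes a flags argument
that the one-argument prototype in this posting does not have. The helper
name eventfd_post_and_drain() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Create an eventfd with an initial count, post two more values with
 * write(2), then drain it with a single read(2).
 * Returns the value read (initial + a + b), or (uint64_t) -1 on error.
 * The sum must be non-zero, otherwise the first read has nothing to return.
 */
uint64_t eventfd_post_and_drain(unsigned int initial, uint64_t a, uint64_t b)
{
	const uint64_t fail = (uint64_t) -1;
	uint64_t value = 0, again = 0;
	int efd = eventfd(initial, EFD_NONBLOCK);

	if (efd < 0)
		return fail;

	/* Each write adds its 64-bit value to the internal counter. */
	if (write(efd, &a, sizeof(a)) != (ssize_t) sizeof(a) ||
	    write(efd, &b, sizeof(b)) != (ssize_t) sizeof(b) ||
	    read(efd, &value, sizeof(value)) != (ssize_t) sizeof(value)) {
		close(efd);
		return fail;
	}

	/* The read reset the counter to zero, so a second read must fail with EAGAIN. */
	if (read(efd, &again, sizeof(again)) != -1 || errno != EAGAIN)
		value = fail;

	close(efd);
	return value;
}
```

Only one descriptor is involved for both posting and waiting, which is the
saving over a pipe(2) pair mentioned above.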




Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.21-rc3.quilt/fs/Makefile
===
--- linux-2.6.21-rc3.quilt.orig/fs/Makefile 2007-03-15 15:53:07.0 
-0700
+++ linux-2.6.21-rc3.quilt/fs/Makefile  2007-03-15 16:11:54.0 -0700
@@ -11,7 +11,7 @@
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
pnode.o drop_caches.o splice.o sync.o utimes.o \
-   stack.o anon_inodes.o signalfd.o timerfd.o
+   stack.o anon_inodes.o signalfd.o timerfd.o eventfd.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
Index: linux-2.6.21-rc3.quilt/include/linux/syscalls.h
===
--- linux-2.6.21-rc3.quilt.orig/include/linux/syscalls.h2007-03-15 
15:53:07.0 -0700
+++ linux-2.6.21-rc3.quilt/include/linux/syscalls.h 2007-03-15 
16:11:54.0 -0700
@@ -605,6 +605,7 @@
 asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t 
sizemask);
 asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
const struct itimerspec __user *utmr);
+asmlinkage long sys_eventfd(unsigned int count);
 
 int kernel_execve(const char *filename, char *const argv[], char *const 
envp[]);
 
Index: linux-2.6.21-rc3.quilt/fs/eventfd.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc3.quilt/fs/eventfd.c 2007-03-15 16:11:54.0 -0700
@@ -0,0 +1,271 @@
+/*
+ *  fs/eventfd.c
+ *
+ *  Copyright (C) 2007  Davide Libenzi 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+
+
+struct eventfd_ctx {
+   spinlock_t lock;
+   wait_queue_head_t wqh;
+   __u64 count;
+};
+
+
+static void eventfd_cleanup(struct eventfd_ctx *ctx);
+static int eventfd_close(struct inode *inode, struct file *file);
+static unsigned int eventfd_poll(struct file *file, poll_table *wait);
+static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
+   loff_t *ppos);
+static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t 
count,
+

[patch 5/13] signal/timer/event fds v6 - signalfd compat code ...

2007-03-15 Thread Davide Libenzi
This patch implements the necessary compat code for the signalfd system call.


Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/fs/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/fs/compat.c 2007-02-04 10:44:54.0 
-0800
+++ linux-2.6.21-rc3.quilt/fs/compat.c  2007-03-15 15:35:58.0 -0700
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -2235,3 +2236,24 @@
return sys_ni_syscall();
 }
 #endif
+
+asmlinkage long compat_sys_signalfd(int ufd,
+   const compat_sigset_t __user *sigmask,
+   compat_size_t sigsetsize)
+{
+   compat_sigset_t ss32;
+   sigset_t tmp;
+   sigset_t __user *ksigmask;
+
+   if (sigsetsize != sizeof(compat_sigset_t))
+   return -EINVAL;
+   if (copy_from_user(&ss32, sigmask, sizeof(ss32)))
+       return -EFAULT;
+   sigset_from_compat(&tmp, &ss32);
+   ksigmask = compat_alloc_user_space(sizeof(sigset_t));
+   if (copy_to_user(ksigmask, &tmp, sizeof(sigset_t)))
+       return -EFAULT;
+
+   return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
+}
+



[patch 7/13] signal/timer/event fds v6 - timerfd wire up i386 arch ...

2007-03-15 Thread Davide Libenzi
This patch wires the timerfd system call to the i386 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S
2007-03-15 15:53:15.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S 2007-03-15 
16:11:47.0 -0700
@@ -320,3 +320,4 @@
.long sys_getcpu
.long sys_epoll_pwait
.long sys_signalfd  /* 320 */
+   .long sys_timerfd
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h   2007-03-15 
15:53:15.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h2007-03-15 
16:11:47.0 -0700
@@ -326,10 +326,11 @@
 #define __NR_getcpu318
 #define __NR_epoll_pwait   319
 #define __NR_signalfd  320
+#define __NR_timerfd   321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 321
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR



[patch 3/13] signal/timer/event fds v6 - signalfd wire up i386 arch ...

2007-03-15 Thread Davide Libenzi
This patch wires the signalfd system call to the i386 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S
2007-02-04 10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S 2007-03-15 
15:34:12.0 -0700
@@ -319,3 +319,4 @@
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_signalfd  /* 320 */
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h   2007-02-04 
10:44:54.0 -0800
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h2007-03-15 
15:34:12.0 -0700
@@ -325,10 +325,11 @@
 #define __NR_move_pages317
 #define __NR_getcpu318
 #define __NR_epoll_pwait   319
+#define __NR_signalfd  320
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 321
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR



[patch 9/13] signal/timer/event fds v6 - timerfd compat code ...

2007-03-15 Thread Davide Libenzi
This patch implements the necessary compat code for the timerfd system call.


Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/fs/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/fs/compat.c 2007-03-15 15:53:11.0 
-0700
+++ linux-2.6.21-rc3.quilt/fs/compat.c  2007-03-15 16:11:52.0 -0700
@@ -2257,3 +2257,23 @@
return sys_signalfd(ufd, ksigmask, sizeof(sigset_t));
 }
 
+
+asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
+  const struct compat_itimerspec __user *utmr)
+{
+   long res;
+   struct itimerspec t;
+   struct itimerspec __user *ut;
+
+   res = -EFAULT;
+   if (get_compat_itimerspec(&t, utmr))
+       goto err_exit;
+   ut = compat_alloc_user_space(sizeof(*ut));
+   if (copy_to_user(ut, &t, sizeof(t)))
+       goto err_exit;
+
+   res = sys_timerfd(ufd, clockid, flags, ut);
+err_exit:
+   return res;
+}
+
Index: linux-2.6.21-rc3.quilt/include/linux/compat.h
===
--- linux-2.6.21-rc3.quilt.orig/include/linux/compat.h  2007-03-15 
15:53:11.0 -0700
+++ linux-2.6.21-rc3.quilt/include/linux/compat.h   2007-03-15 
16:11:52.0 -0700
@@ -225,6 +225,11 @@
return lhs->tv_nsec - rhs->tv_nsec;
 }
 
+extern int get_compat_itimerspec(struct itimerspec *dst,
+const struct compat_itimerspec __user *src);
+extern int put_compat_itimerspec(struct compat_itimerspec __user *dst,
+const struct itimerspec *src);
+
 asmlinkage long compat_sys_adjtimex(struct compat_timex __user *utp);
 
 extern int compat_printk(const char *fmt, ...);
Index: linux-2.6.21-rc3.quilt/kernel/compat.c
===
--- linux-2.6.21-rc3.quilt.orig/kernel/compat.c 2007-03-15 15:53:11.0 
-0700
+++ linux-2.6.21-rc3.quilt/kernel/compat.c  2007-03-15 16:11:52.0 
-0700
@@ -475,8 +475,8 @@
return min_length;
 }
 
-static int get_compat_itimerspec(struct itimerspec *dst, 
-struct compat_itimerspec __user *src)
+int get_compat_itimerspec(struct itimerspec *dst,
+ const struct compat_itimerspec __user *src)
 { 
    if (get_compat_timespec(&dst->it_interval, &src->it_interval) ||
        get_compat_timespec(&dst->it_value, &src->it_value))
@@ -484,8 +484,8 @@
return 0;
 } 
 
-static int put_compat_itimerspec(struct compat_itimerspec __user *dst, 
-struct itimerspec *src)
+int put_compat_itimerspec(struct compat_itimerspec __user *dst,
+ const struct itimerspec *src)
 { 
    if (put_compat_timespec(&src->it_interval, &dst->it_interval) ||
        put_compat_timespec(&src->it_value, &dst->it_value))



[patch 8/13] signal/timer/event fds v6 - timerfd wire up x86_64 arch ...

2007-03-15 Thread Davide Libenzi
This patch wires the timerfd system call to the x86_64 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S2007-03-15 
15:53:13.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S 2007-03-15 
16:11:50.0 -0700
@@ -720,4 +720,5 @@
.quad sys_getcpu
.quad sys_epoll_pwait
.quad sys_signalfd  /* 320 */
+   .quad sys_timerfd
 ia32_syscall_end:  
Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h 2007-03-15 
15:53:13.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h  2007-03-15 
16:11:50.0 -0700
@@ -621,8 +621,10 @@
 __SYSCALL(__NR_move_pages, sys_move_pages)
 #define __NR_signalfd  280
 __SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_timerfd   281
+__SYSCALL(__NR_timerfd, sys_timerfd)
 
-#define __NR_syscall_max __NR_signalfd
+#define __NR_syscall_max __NR_timerfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR



[patch 12/13] signal/timer/event fds v6 - eventfd wire up x86_64 arch ...

2007-03-15 Thread Davide Libenzi
This patch wires the eventfd system call to the x86_64 architecture.



Signed-off-by: Davide Libenzi 


- Davide



Index: linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/x86_64/ia32/ia32entry.S2007-03-15 
16:11:50.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/x86_64/ia32/ia32entry.S 2007-03-15 
16:13:43.0 -0700
@@ -721,4 +721,5 @@
.quad sys_epoll_pwait
.quad sys_signalfd  /* 320 */
.quad sys_timerfd
+   .quad sys_eventfd
 ia32_syscall_end:  
Index: linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-x86_64/unistd.h 2007-03-15 
16:11:50.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-x86_64/unistd.h  2007-03-15 
16:13:43.0 -0700
@@ -623,8 +623,10 @@
 __SYSCALL(__NR_signalfd, sys_signalfd)
 #define __NR_timerfd   281
 __SYSCALL(__NR_timerfd, sys_timerfd)
+#define __NR_eventfd   282
+__SYSCALL(__NR_eventfd, sys_eventfd)
 
-#define __NR_syscall_max __NR_timerfd
+#define __NR_syscall_max __NR_eventfd
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR



[patch 11/13] signal/timer/event fds v6 - eventfd wire up i386 arch ...

2007-03-15 Thread Davide Libenzi
This patch wires the eventfd system call to the i386 architecture.



Signed-off-by: Davide Libenzi 


- Davide


Index: linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.21-rc3.quilt.orig/arch/i386/kernel/syscall_table.S
2007-03-15 16:11:47.0 -0700
+++ linux-2.6.21-rc3.quilt/arch/i386/kernel/syscall_table.S 2007-03-15 
16:13:40.0 -0700
@@ -321,3 +321,4 @@
.long sys_epoll_pwait
.long sys_signalfd  /* 320 */
.long sys_timerfd
+   .long sys_eventfd
Index: linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h
===
--- linux-2.6.21-rc3.quilt.orig/include/asm-i386/unistd.h   2007-03-15 
16:11:47.0 -0700
+++ linux-2.6.21-rc3.quilt/include/asm-i386/unistd.h2007-03-15 
16:13:40.0 -0700
@@ -327,10 +327,11 @@
 #define __NR_epoll_pwait   319
 #define __NR_signalfd  320
 #define __NR_timerfd   321
+#define __NR_eventfd   322
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 322
+#define NR_syscalls 323
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR



Re: thread stacks and strict vm overcommit accounting

2007-03-15 Thread Alan Cox
> > > With a typical size as a fuzz factor preaccounted in later kernels.
> > 
> > Where's that done?
> 
> I don't know what Alan is referring to there.

fs/exec.c - we add 20 pages to the stack vma size initially.

> We've no more committed to providing each instance with 8MB of stack,
> than we've committed to providing each instance with RLIMIT_AS of
> address space.  The rlimits are limits, not commitments, surely?

Yes, it's just that the C programming language is utterly and
mindbogglingly broken when it comes to resource exhaustion for the stack.

Alan


[patch 13/13] signal/timer/event fds v6 - KAIO eventfd support example ...

2007-03-15 Thread Davide Libenzi
This is an example of how to add eventfd support to the current KAIO code,
in order to enable KAIO to post readiness events to a pollable fd
(hence compatible with POSIX select/poll). The KAIO code simply signals
the eventfd fd when events are ready, and this triggers a POLLIN on the fd.
This patch uses a member of struct iocb that was reserved for future use to
pass an eventfd file descriptor, which KAIO will use to post events every
time a request completes. At that point, an io_getevents() call will return
the completed result in a struct io_event.
I made a quick test program to verify the patch, and it runs fine here:

http://www.xmailserver.org/eventfd-aio-test.c

The test program uses poll(2), but it would, of course, work with select and
epoll too.
This allows scheduling both block I/O requests and requests on other
pollable devices, and waiting for the results using select/poll/epoll.
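
The flow described above could look roughly like this from userspace. This
is a sketch, not the test program linked above. It uses raw syscalls and the
struct iocb layout from <linux/aio_abi.h>, and it assumes a kernel where the
eventfd is requested by setting IOCB_FLAG_RESFD in aio_flags together with
aio_resfd, which is not the "aio_resfd != 0" check used by this patch. The
helper name aio_eventfd_completions() and the /tmp scratch file are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Submit one AIO read on a scratch file and wait for its completion
 * through an eventfd. Returns the completion count read from the
 * eventfd, or -1 on error.
 */
long aio_eventfd_completions(void)
{
	static const char msg[] = "hello";
	char path[] = "/tmp/aio-eventfd-XXXXXX";
	char buf[16];
	aio_context_t ctx = 0;
	struct iocb cb, *list[1] = { &cb };
	struct io_event ev;
	uint64_t done = 0;
	long ret = -1;
	int fd, efd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open fd keeps the scratch file alive */

	efd = eventfd(0, 0);
	if (efd < 0)
		goto out_fd;
	if (write(fd, msg, sizeof(msg)) != (ssize_t) sizeof(msg))
		goto out_efd;
	if (syscall(SYS_io_setup, 1, &ctx) < 0)
		goto out_efd;

	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = IOCB_CMD_PREAD;
	cb.aio_fildes = fd;
	cb.aio_buf = (uint64_t) (uintptr_t) buf;
	cb.aio_nbytes = sizeof(buf);
	cb.aio_offset = 0;
	cb.aio_flags = IOCB_FLAG_RESFD;	/* ask for eventfd notification */
	cb.aio_resfd = efd;

	if (syscall(SYS_io_submit, ctx, 1, list) != 1)
		goto out_ctx;

	/* Blocks until the request completes; poll(2) on efd would work as well. */
	if (read(efd, &done, sizeof(done)) != (ssize_t) sizeof(done))
		goto out_ctx;

	/* The eventfd only says "something completed"; fetch the actual result. */
	if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
		goto out_ctx;
	if (ev.res != (long long) sizeof(msg))
		goto out_ctx;

	ret = (long) done;
out_ctx:
	syscall(SYS_io_destroy, ctx);
out_efd:
	close(efd);
out_fd:
	close(fd);
	return ret;
}
```

Because the eventfd is an ordinary pollable descriptor, the blocking read
could be replaced by adding efd to an existing select/poll/epoll set next
to sockets and other descriptors.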




Signed-off-by: Davide Libenzi 



- Davide



Index: linux-2.6.21-rc3.quilt/fs/aio.c
===
--- linux-2.6.21-rc3.quilt.orig/fs/aio.c2007-03-15 15:52:45.0 
-0700
+++ linux-2.6.21-rc3.quilt/fs/aio.c 2007-03-15 17:15:20.0 -0700
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -421,6 +422,7 @@
req->private = NULL;
req->ki_iovec = NULL;
    INIT_LIST_HEAD(&req->ki_run_list);
+   req->ki_eventfd = ERR_PTR(-EINVAL);
 
/* Check if the completion queue has enough free space to
 * accept an event from this io.
@@ -462,6 +464,8 @@
 {
    assert_spin_locked(&ctx->ctx_lock);
 
+   if (!IS_ERR(req->ki_eventfd))
+   fput(req->ki_eventfd);
if (req->ki_dtor)
req->ki_dtor(req);
    if (req->ki_iovec != &req->ki_inline_vec)
@@ -946,6 +950,14 @@
return 1;
}
 
+   /*
+* Check if the user asked us to deliver the result through an
+* eventfd. The eventfd_signal() function is safe to be called
+* from IRQ context.
+*/
+   if (unlikely(!IS_ERR(iocb->ki_eventfd)))
+   eventfd_signal(iocb->ki_eventfd, 1);
+
    info = &ctx->ring_info;
 
/* add a completion event to the ring buffer.
@@ -1555,6 +1567,19 @@
fput(file);
return -EAGAIN;
}
+   if (iocb->aio_resfd != 0) {
+   /*
+* If the aio_resfd field of the iocb is not zero, get an
+* instance of the file* now. The file descriptor must be
+* an eventfd() fd, and will be signaled for each completed
+* event using the eventfd_signal() function.
+*/
+   req->ki_eventfd = eventfd_fget((int) iocb->aio_resfd);
+   if (IS_ERR(req->ki_eventfd)) {
+   ret = PTR_ERR(req->ki_eventfd);
+   goto out_put_req;
+   }
+   }
 
req->ki_filp = file;
    ret = put_user(req->ki_key, &user_iocb->aio_key);
Index: linux-2.6.21-rc3.quilt/include/linux/aio.h
===
--- linux-2.6.21-rc3.quilt.orig/include/linux/aio.h 2007-03-15 
15:52:45.0 -0700
+++ linux-2.6.21-rc3.quilt/include/linux/aio.h  2007-03-15 16:13:45.0 
-0700
@@ -119,6 +119,12 @@
 
struct list_headki_list;/* the aio core uses this
 * for cancellation */
+
+   /*
+* If the aio_resfd field of the userspace iocb is not zero,
+* this is the underlying file* to deliver event to.
+*/
+   struct file *ki_eventfd;
 };
 
 #define is_sync_kiocb(iocb)((iocb)->ki_key == KIOCB_SYNC_KEY)
Index: linux-2.6.21-rc3.quilt/include/linux/aio_abi.h
===
--- linux-2.6.21-rc3.quilt.orig/include/linux/aio_abi.h 2007-03-15 
15:52:45.0 -0700
+++ linux-2.6.21-rc3.quilt/include/linux/aio_abi.h  2007-03-15 
16:13:45.0 -0700
@@ -84,7 +84,11 @@
 
/* extra parameters */
__u64   aio_reserved2;  /* TODO: use this for a (struct sigevent *) */
-   __u64   aio_reserved3;
+   __u32   aio_reserved3;
+   /*
+* If different from 0, this is an eventfd to deliver AIO results to
+*/
+   __u32   aio_resfd;
 }; /* 64 bytes */
 
 #undef IFBIG



Re: [PATCH] [REPOST] x86_64, i386: Add command line length to boot protocol

2007-03-15 Thread H. Peter Anvin

Bernhard Walle wrote:

Because the command line is increased to 2048 characters after 2.6.21,
it's not possible for boot loaders and userspace tools to determine the length
of the command line the kernel can understand. The benefit of knowing the
length is that users can be warned if the command line size is too long which
prevents surprise if things don't work after bootup.

This patch updates the boot protocol to contain a field called
"cmdline_size" that contains the length of the command line (excluding
the terminating zero).

The patch also adds missing fields (of protocol version 2.05) to the x86_64
setup code.


Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
Cc: Alon Bar-Lev <[EMAIL PROTECTED]>


Acked-by: H. Peter Anvin <[EMAIL PROTECTED]>
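
As an illustration of how a boot loader might use the new field, here is a
rough sketch. The struct setup_info_sketch, the function cmdline_fits, the
0x0206 version threshold and the 255-character fallback are all assumptions
made for this example; none of them are taken from the patch text quoted
above, so check them against the boot protocol documentation before relying
on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of what a boot loader reads from the
 * kernel setup header. */
struct setup_info_sketch {
	uint16_t version;       /* boot protocol version, e.g. 0x0206 */
	uint32_t cmdline_size;  /* max length, excluding the terminating zero */
};

#define LEGACY_CMDLINE_MAX       255u    /* assumed limit without the field */
#define CMDLINE_SIZE_MIN_VERSION 0x0206  /* assumed first version with it */

/* Returns 1 if the kernel can take the whole command line, 0 if the user
 * should be warned that it will be truncated. */
static int cmdline_fits(const struct setup_info_sketch *hdr,
			const char *cmdline)
{
	size_t limit = LEGACY_CMDLINE_MAX;

	if (hdr->version >= CMDLINE_SIZE_MIN_VERSION && hdr->cmdline_size != 0)
		limit = hdr->cmdline_size;

	return strlen(cmdline) <= limit;
}
```

The point of the check is the one made in the patch description: the loader
can warn before booting instead of leaving the user to discover a truncated
command line afterwards.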


Re: [PATCH take3 00/20] Make common x86 arch area for i386 and x86_64 - Take 3

2007-03-15 Thread Rusty Russell
On Thu, 2007-03-15 at 01:13 -0400, Steven Rostedt wrote:
> Once again here's an attempt to put the shared files of x86_64 and i386
> into a separate directory.

OK, that's fine, but the next step is to have "make ARCH=x86" compile,
with a config option as to whether to build 32 or 64 bit.  This will
involve a fair amount of Makefile hair, but if you can get Andi to buy
into that then the rest is a simple matter of code churn.  For most
kernel hackers, this would be the flag day.

Moving the rest of the files across to xxx_32.c, xxx_64.h etc is going
to involve a great deal of untangling and code cleanup.  It's also going
to completely screw a whole heap of my cleanup patches.  Oh well.

(Still hoping for an executive summary from the PPC folks).

Cheers!
Rusty.




Re: sky2 PHY setup

2007-03-15 Thread Thomas Glanzmann
Hello Stephen,

> yesterday I pulled from Linus tree because I saw the sky2 updated and I
> tried to break it but it seems that my problems are gone. I let you know
> if anything pops up in the future.

Bad news. Today I tried the sky2 driver from Linus's kernel tree
(HEAD) on a machine with very high network load, and it stopped working
without any kernel messages after doing a flawless job under high load
for 5 hours. My watchdog rebooted the machine after 500 seconds. ;-(

Thomas


Re: [RFC][PATCH 2/7] RSS controller core

2007-03-15 Thread Eric W. Biederman
Alan Cox <[EMAIL PROTECTED]> writes:

>> stuff is happening by comparing page->count and page->_mapcount, but it
>> certainly wouldn't be conclusive.  But, does this kind of nonsense even
>> happen in practice?  
>
> "Is it useful for me as a bad guy to make it happen ?"

To create a DOS attack.

- Allocate some memory you know your victim will want in the future,
  (shared libraries and the like).
- Wait until your victim is using the memory you allocated.
- Terminate your memory resource group.
- Victim is pushed over memory limits by your exiting.
- Victim can no longer allocate memory
- Victim dies

It's not quite that easy unless your victim calls mlockall(MCL_FUTURE),
but the potential is clearly there.

Am I missing something?  Or is this fundamental to any first touch scenario?

I just know I have problems with first touch because it is darn hard to
reason about.
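
The scenario above can be played through with a toy model. This is only a
sketch: page_sketch, group_sketch, touch, exit_group and over_limit are
hypothetical names, and the rule that pages of an exiting group are
re-charged to a remaining user is an assumption of the model, not a
statement about how the RSS controller patches behave.

```c
#include <stddef.h>

#define NO_GROUP (-1)

/* Toy model: every shared page is charged to exactly one group. */
struct page_sketch {
	int charged_to;
};

struct group_sketch {
	long usage;  /* pages currently charged to this group */
	long limit;  /* maximum pages the group may be charged */
};

/* First touch: only the first group to use a page pays for it. */
static void touch(struct page_sketch *pg, struct group_sketch *groups, int who)
{
	if (pg->charged_to == NO_GROUP) {
		pg->charged_to = who;
		groups[who].usage++;
	}
}

/* Assumption: when a group exits, its pages are re-charged to an heir
 * that is still using them. */
static void exit_group(struct page_sketch *pages, size_t n,
		       struct group_sketch *groups, int dying, int heir)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].charged_to == dying) {
			pages[i].charged_to = heir;
			groups[dying].usage--;
			groups[heir].usage++;
		}
	}
}

static int over_limit(const struct group_sketch *g)
{
	return g->usage > g->limit;
}
```

If the attacker touches eight pages first and the victim (limit of four)
then uses the same pages, the victim is charged nothing until the attacker
exits, at which point the victim is over its limit without having allocated
anything itself.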

Eric


[PATCH] Return EPERM not ECHILD on security_task_wait failure

2007-03-15 Thread Roland McGrath
wait* syscalls return -ECHILD even when an individual PID of a live
child was requested explicitly, when security_task_wait denies the
operation.  This means that something like a broken SELinux policy
can produce an unexpected failure that looks just like a bug with
wait or ptrace or something.

This patch makes do_wait return -EPERM instead of -ECHILD if some
children were ruled out solely because security_task_wait failed.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 kernel/exit.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f132349..a41052f 100644  
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1067,7 +1067,7 @@ static int eligible_child(pid_t pid, int
return 2;
 
if (security_task_wait(p))
-   return 0;
+   return -1;
 
return 1;
 }
@@ -1449,6 +1449,7 @@ static long do_wait(pid_t pid, int optio
DECLARE_WAITQUEUE(wait, current);
struct task_struct *tsk;
int flag, retval;
+   int allowed, denied;
 
add_wait_queue(&current->signal->wait_chldexit,&wait);
 repeat:
@@ -1457,6 +1458,7 @@ repeat:
 * match our criteria, even if we are not able to reap it yet.
 */
flag = 0;
+   allowed = denied = 0;
current->state = TASK_INTERRUPTIBLE;
read_lock(&tasklist_lock);
tsk = current;
@@ -1472,6 +1474,12 @@ repeat:
if (!ret)
continue;
 
+   if (unlikely(ret < 0)) {
+   denied = 1;
+   continue;
+   }
+   allowed = 1;
+
switch (p->state) {
case TASK_TRACED:
/*
@@ -1570,6 +1578,8 @@ check_continued:
goto repeat;
}
retval = -ECHILD;
+   if (unlikely(denied) && !allowed)
+   retval = -EPERM;
 end:
current->state = TASK_RUNNING;
remove_wait_queue(&current->signal->wait_chldexit,&wait);
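
The error selection added by the patch above can be modelled in isolation.
This sketch covers only the final choice between -ECHILD and -EPERM; it does
not model sleeping, reaping or the rest of do_wait. The function name
wait_result_sketch and the array-of-results interface are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Each entry follows the patched eligible_child() convention:
 *   0  = not a candidate for this wait,
 *   <0 = ruled out because security_task_wait() denied it,
 *   >0 = eligible.
 * Returns the error do_wait would fall back to when nothing is reaped. */
static long wait_result_sketch(const int *child_results, size_t n)
{
	int allowed = 0, denied = 0;

	for (size_t i = 0; i < n; i++) {
		if (child_results[i] == 0)
			continue;
		if (child_results[i] < 0) {
			denied = 1;
			continue;
		}
		allowed = 1;
	}

	/* -EPERM only if children were ruled out solely by the security hook. */
	if (denied && !allowed)
		return -EPERM;
	return -ECHILD;
}
```

A caller that sees -EPERM from wait* can then distinguish a policy denial
from a genuinely missing child, which is the diagnostic gain the commit
message describes.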


core2 duo, interrupts: is this normal?

2007-03-15 Thread Norberto Bensa
Hello,

Is this output normal? I mean, why are the counters on CPU1 zero? Aren't
interrupts supposed to be balanced?

$ cat /proc/interrupts
           CPU0       CPU1
  0:    4180170          0   IO-APIC-edge      timer
  1:       8060          0   IO-APIC-edge      i8042
  7:          0          0   IO-APIC-edge      parport0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          5          0   IO-APIC-edge      i8042
 16:     322297          0   IO-APIC-fasteoi   uhci_hcd:usb3, libata, nvidia, EMU10K1
 17:     896399          0   IO-APIC-fasteoi   bttv0, eth0, libata
 18:      72867          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7
 19:      27770          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6
 22:          3          0   IO-APIC-fasteoi   ohci1394
 23:        155          0   IO-APIC-fasteoi   HDA Intel
219:     103056          0   PCI-MSI-edge      libata
NMI:          0          0
LOC:    4077613    4077622
ERR:          0
MIS:          0


Many thanks in advance,
Norberto


Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd

2007-03-15 Thread David Chinner
On Wed, Mar 14, 2007 at 12:34:29PM +0100, Marco Berizzi wrote:
> Hello everybody.
> Since 2.6.19.2 + commit 7fbbb01dca7704d52ace6f45a805c98a5b0362f9

What commit is that? gitweb search tells me it's an nmi watchdog
change. Doesn't seem likely to change XFS behaviour - can
you post a url to the commit?

> I'm experiencing these errors.
> 2.6.19.1 worked well for more
> than 30 days.

With the above commit?

> I have reverted back to 2.6.19.1 to see if
> this problem happens again.

without the above commit?

> find_or_create_page+0x37/0x8e
> _xfs_buf_lookup_pages+0x132/0x2ea
> _xfs_buf_initialize+0xc8/0xf6
> xfs_buf_get_flags+0xf8/0x11d
> xfs_buf_read_flags+0x1c/0x7f
> xfs_trans_read_buf+0x16a/0x34f
> xfs_itobp+0x7c/0x242
> xfs_iread+0x68/0x1d3
> xfs_iget_core+0xe7/0x687
> xfs_iget+0xd8/0x150
> xfs_dir_lookup_int+0x98/0x10e
> xfs_lookup+0x5a/0x90
> xfs_vn_lookup+0x52/0x93

Curious - never seen this before - possibly a corrupted inode
number in the directory has led to this.

> ba 4e 8b cd
> Mar 12 14:35:21 Pleiadi kernel: Filesystem "sda8": XFS internal error
> xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c.  Caller
> 0xc01b00bd
> Mar 12 14:35:21 Pleiadi kernel:  [] xfs_da_do_buf+0x70c/0x7b1
> Mar 12 14:35:21 Pleiadi kernel:  [] xfs_da_read_buf+0x30/0x35
> Mar 12 14:35:21 Pleiadi kernel:  [] xfs_da_read_buf+0x30/0x35

Hmm - these could simply be follow-on errors from the first
problem - the buffer would now probably be bad or corrupted,
and the directory buffer read code here is saying the buffer
is bad. All the errors appear to have the same data in the buffer
(which is lacking the correct magic numbers) so I'd say they
are related to the above error.

Can you run xfs_repair on that filesystem and see if it reports
(and fixes) any problems?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH 10/13] BLK_DEV_IDE_CELLEB dependency fix

2007-03-15 Thread Akira Iguchi
Hi,

> Bart wrote:
>> Al wrote:
>> So AFAICS the minimal fix for that sucker is dependency on BLK_DEV_IDE=y;
>> however, I really wonder if
>>  * it needs to be linked into ide-core (as opposed to being a normal
>> module of its own)
>
>AFAICS there are no legacy device ordering issues with scc_pata so it doesn't
>need to be linked into ide-core but I'll leave the definitive answer to Akira
>
>>  * alternatively, its init should be called explicitly.

I don't know why scc_pata is linked into ide-core.
After reviewing your comments and the code, I will make the following fixes:
  * remove the link to ide-core and make it a normal module
  * move it from ide/ppc to ide/pci

I will send these patches later.

Best regards,
Akira Iguchi


Re: [PATCH 2/5] revoke: core code

2007-03-15 Thread Andrew Morton
On Sun, 11 Mar 2007 13:30:49 +0200 (EET) Pekka J Enberg <[EMAIL PROTECTED]> wrote:

> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> The revokeat(2) and frevoke(2) system calls invalidate open file
> descriptors and shared mappings of an inode. After a successful
> revocation, operations on file descriptors fail with the EBADF or
> ENXIO error code for regular and device files,
> respectively. Attempting to read from or write to a revoked mapping
> causes SIGBUS.
> 
> The actual operation is done in two passes:
> 
>  1. Revoke all file descriptors that point to the given inode. We do
> this under tasklist_lock so that after this pass, we don't need
> to worry about racing with close(2) or dup(2).
>
>  2. Take down shared memory mappings of the inode and close all file
> pointers.
> 
> The file descriptors and memory mapping ranges are preserved until the
> owning task does close(2) and munmap(2), respectively.
> 
> ...
>
> +asmlinkage int sys_revokeat(int dfd, const char __user *filename);
> +asmlinkage int sys_frevoke(unsigned int fd);

All system calls must return long.

> +static int revoke_vma(struct vm_area_struct *vma, struct zap_details *details)
> +{
> + unsigned long restart_addr, start_addr, end_addr;
> + int need_break;
> +
> + start_addr = vma->vm_start;
> + end_addr = vma->vm_end;
> +
> + /*
> +  * Not holding ->mmap_sem here.
> +  */
> + vma->vm_flags |= VM_REVOKED;

So the modification of vm_flags is racy?

> + smp_mb();

Please always document barriers.  There's presumably some vm_flags reader
we're concerned about here, but how is the code reader to know what the
code writer was thinking?


> +  again:
> + restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr,
> +   details);
> +
> + need_break = need_resched() || need_lockbreak(details->i_mmap_lock);
> + if (need_break)
> + goto out_need_break;
> +
> + if (restart_addr < end_addr) {
> + start_addr = restart_addr;
> + goto again;
> + }
> + return 0;
> +
> +  out_need_break:
> + spin_unlock(details->i_mmap_lock);
> + cond_resched();
> + spin_lock(details->i_mmap_lock);
> + return -EINTR;
> +}
> +
> +static int revoke_mapping(struct address_space *mapping, struct file *to_exclude)
> +{
> + struct vm_area_struct *vma;
> + struct prio_tree_iter iter;
> + struct zap_details details;
> + int err = 0;
> +
> + details.i_mmap_lock = &mapping->i_mmap_lock;
> +
> + spin_lock(&mapping->i_mmap_lock);
> + vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX) {
> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) {
> + err = revoke_vma(vma, &details);
> + if (err)
> + goto out;
> + }
> + }
> +
> + list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) {
> + if ((vma->vm_flags & VM_SHARED) && vma->vm_file != to_exclude) {
> + err = revoke_vma(vma, &details);
> + if (err)
> + goto out;
> + }
> + }
> +  out:
> + spin_unlock(&mapping->i_mmap_lock);
> + return err;
> +}

This all looks very strange.  If the calling process expires its timeslice,
the entire system call fails?

What's happening here?


> +
> +int generic_file_revoke(struct file *file)
> +{
> + int err;
> +
> + /*
> +  * Flush pending writes.
> +  */
> + err = do_fsync(file, 1);
> + if (err)
> + goto out;
> +
> + /*
> +  * Make pending reads fail.
> +  */
> + err = invalidate_inode_pages2(file->f_mapping);
> +
> +  out:
> + return err;
> +}
> +
> +EXPORT_SYMBOL(generic_file_revoke);

do_fsync() is seriously suboptimal - it will run an ext3 commit. 
do_sync_file_range(...,
SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
will run maybe five times quicker.

But otoh, do_sync_file_range() will fail to write back the pages for a
data=journal ext3 file, I expect (oops).
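
The suggestion above concerns the in-kernel do_sync_file_range(). As a rough
userspace analogue only, the same three flags are accepted by the Linux
sync_file_range(2) system call, which writes back and waits on a byte range
without forcing the metadata flush that fsync() implies. The wrapper name
flush_range_sketch is hypothetical, and this sketch says nothing about the
data=journal caveat raised above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

/* Wait for writeback already in flight, start writeback of dirty pages in
 * the range, then wait for that writeback to finish.
 * Returns 0 on success, -1 with errno set on failure. */
static int flush_range_sketch(int fd, off_t offset, off_t nbytes)
{
	return sync_file_range(fd, offset, nbytes,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
}
```

Note that sync_file_range(2) is Linux-specific and makes no durability
guarantee for file metadata, so it is not a drop-in replacement for fsync().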


Why is this code using invalidate_inode_pages2()?  That function keeps on
breaking, has ill-defined semantics and will probably change in the future.

Exactly what semantics are you looking for here, and why?

The blank line before the EXPORT_SYMBOL() is a waste of space.

> +/*
> + *   Filesystem for revoked files.
> + */
> +
> +static struct inode *revokefs_alloc_inode(struct super_block *sb)
> +{
> + struct revokefs_inode_info *info;
> +
> + info = kmem_cache_alloc(revokefs_inode_cache, GFP_NOFS);
> + if (!info)
> + return NULL;
> +
> + return &info->vfs_inode;
> +}

Why GFP_NOFS?

> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ uml-2.6/include/linux/revoked_fs_i.h  2007-03-11 13:09:20.0 
> +0200
> @@ -0,0 +1,20 @@
> +#ifndef 

Re: AMD64 kernel oops

2007-03-15 Thread Parag Warudkar
Joerg Platte  naasa.net> writes:

> Pid: 14, comm: events/0 Not tainted 2.6.18-4-amd64 #1
> RIP: 0010:[]  [] keyring_destroy+0x32/0x96

[Snip]

> Can this oops be caused by a known and already 
> fixed problem in a newer kernel versions? In this case I would submit a bug 
> to the Debian BTS. Otherwise what can I do to further reproduce and debug 
> this oops?
> 

Check out http://bugzilla.kernel.org/show_bug.cgi?id=8067 which is a duplicate
of http://bugzilla.kernel.org/show_bug.cgi?id=7727 which is fixed. There is a
patch available on the bugzilla if you want to try it out.

HTH
Parag





Re: CONFIG_REORDER Kconfig help strange sentence.

2007-03-15 Thread Randy Dunlap
On Tue, 13 Mar 2007 17:37:35 +1100 Rusty Russell wrote:

> On Tue, 2007-03-13 at 00:56 +0100, Andi Kleen wrote:
> > On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote:
> > > OK, this confused me:
> > > 
> > > Function reordering (REORDER) [N/y/?] (NEW) ?
> > > 
> > > This option enables the toolchain to reorder functions for a more 
> > > optimal TLB usage. If you have pretty much any version of 
> > > binutils, 
> > > this can increase your kernel build time by roughly one minute.
> > > 
> > > "If you have pretty much any version of binutils"?  Huh?
> > > 
> > > You mean "This will slow your kernel build by about a minute"?
> > 
> > Yes. Lots of sections seem to trigger some quadratic behaviour in ld.
> > 
> > It might be fixed in some unreleased CVS version though (not 100% sure) 
> > 
> > -Andi
> 
> OK, well here is a patch for the moment.
> 
> ==
> Clarify CONFIG_REORDER explanation
> 
> if (1 && X) => if (X).
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
> 
> diff -r de5618b5e562 arch/x86_64/Kconfig
> --- a/arch/x86_64/Kconfig Tue Mar 13 11:41:55 2007 +1100
> +++ b/arch/x86_64/Kconfig Tue Mar 13 17:27:05 2007 +1100
> @@ -632,8 +632,8 @@ config REORDER
>   default n
>   help
>   This option enables the toolchain to reorder functions for a more 
> - optimal TLB usage. If you have pretty much any version of binutils, 
> -  this can increase your kernel build time by roughly one minute.
> + optimal TLB usage.  This will slow your kernel build by
> +  roughly one minute.

Please consistently use  for help text.
Yes, it was already mucked up.

>  config K8_NB
>   def_bool y



---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH 10/22 take 3] UBI: EBA unit

2007-03-15 Thread Randy Dunlap
On Thu, 15 Mar 2007 18:29:51 -0500 Josh Boyer wrote:

> On Thu, Mar 15, 2007 at 02:24:10PM -0700, Randy Dunlap wrote:
> > On Thu, 15 Mar 2007 11:07:03 -0800 Andrew Morton wrote:
> > 
> > > 
> > > There's way too much code here to expect it to get decently reviewed, 
> > > alas.
> > 
> > Yes.
> > 
> > /me repeats wish that Not Everything Should Be Sent to lkml.  :(
> 
> Just curious, but where would you suggest this be sent to for review then?

Valid question.  I should have chosen some other more appropriate
patch to make that comment.

I don't see a better list for UBI patches, so lkml is OK IMO.


Here is a summary of my thinking on Linux-related mailing lists.

1.  Bug reports can go to lkml or focused mailing lists.

2.  Development (like patches) should go to focused mailing lists
if there is such a list and they have enough usage.

Development areas that qualify for this IMO are:
- ACPI
- ATA
- file systems
- frame buffer
- ieee1394
- MM/VM
- multimedia
- networking
- PCI
- power management, suspend/resume
- SCSI
- sound
- USB
- virtualization


(not that I expect anything close to consensus on this)
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-15 Thread Trond Myklebust
On Wed, 2007-03-14 at 21:13 -0700, Mark Fasheh wrote:
> Hi Nick,
> 
> On Wed, Mar 14, 2007 at 02:38:22PM +0100, Nick Piggin wrote:
> > Introduce write_begin, write_end, and perform_write aops.
> > 
> > These are intended to replace prepare_write and commit_write with more
> > flexible alternatives that are also able to avoid the buffered write
> > deadlock problems efficiently (which prepare_write is unable to do).
> 
> > Index: linux-2.6/include/linux/fs.h
> > ===
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -449,6 +449,17 @@ struct address_space_operations {
> >  */
> > int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
> > int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
> > +
> > +   int (*write_begin)(struct file *, struct address_space *mapping,
> > +   loff_t pos, unsigned len, int intr,
> > +   struct page **pagep, void **fsdata);
> > +   int (*write_end)(struct file *, struct address_space *mapping,
> > +   loff_t pos, unsigned len, unsigned copied,
> > +   struct page *page, void *fsdata);
> 
> Are we going to get rid of the file and intr arguments btw? I'm not sure
> intr is useful, and mapping is probably enough to get whatever we need inside
> ->write_begin / ->write_end.

Hell no! Struct file carries information that is essential for those of
us that use strong authentication. It stays.

Trond



Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-15 Thread Mark Fasheh
On Thu, Mar 15, 2007 at 04:06:39PM -0400, Trond Myklebust wrote:
> Hell no! Struct file carries information that is essential for those of
> us that use strong authentication. It stays.

Joel pointed this out yesterday for the nfs case. Not to worry, we'll keep
struct file around :)
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]


Re: [BUGFIX][PATCH] fixing placement of register stack under ulimit -s

2007-03-15 Thread KAMEZAWA Hiroyuki
On Fri, 16 Mar 2007 06:20:47 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Thu, 15 Mar 2007 09:57:28 -0600
> "David Mosberger-Tang" <[EMAIL PROTECTED]> wrote:
> 
> > But aren't you going to be limited to less than a page worth of
> > register-backing store even with your patch applied because the
> > backing store will end up overflowing the memory stack?
> > 
> 
> I think pthread's stack, which is created by malloc, is also shared
> among register-stack and memory-stack. 
> (glibc's pthread's stack is limited by ulimit, too.)
> 
> So, it seems stack_size_limit = register_stack_limit + memory_stack_limit
> is a consistent way. I'm sorry if I don't catch your point.
> 
BTW, which way do you recommend to fix this register-stack/memory-stack
upside-down problem?

Plan A) Just handle the upside-down case in the page fault handler.
        This means the ulimit -s limitation will limit the amount of
        memory-stack and register-stack independently.
Plan B) Handle the upside-down case in the page fault handler and modify
        acct_stack_growth() to be able to handle the limitation of the sum
        of the separate vmas (the vma for the register stack and the vma
        for the memory stack).
Plan C) Don't allow this upside-down case, as this patch does, but change
        the calculation of rbs_top.

Note:
To see the problem that my patch wants to fix, run the following code under
ulimit -s.

==
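#include <stdio.h>

/* Recurse without a base case so the stack grows until the limit is hit. */
void eat_stack(int num) {
	printf("%d\n", num);
	eat_stack(num - 1);
}

int main (void) {
	eat_stack(1);
	return 0;
}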
==

-- Kame


