Re: [bug] xfrm_state_lock: possible circular locking dependency detected

2007-11-23 Thread Ingo Molnar

* Herbert Xu <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 23, 2007 at 04:38:51PM +0100, Ingo Molnar wrote:
> > 
> > DaveJ's Fedora 8 rpm for 2.6.24 works pretty well, except for the
> > networking related lockdep assert attached below, which happened while
> > starting up ipsec. Let me know if you need any more info - it's a pretty 
> > stock setup.
> 
> Thanks for the report Ingo!
> 
> This is indeed a regression caused by:
> 
> commit 050f009e16f908932070313c1745d09dc69fd62b
> Author: Herbert Xu <[EMAIL PROTECTED]>
> Date:   Tue Oct 9 13:31:47 2007 -0700
> 
> [IPSEC]: Lock state when copying non-atomic fields to user-space
> 
> For 2.6.24 I'm simply going to revert this change since that just puts 
> us back to the same state we've been for the last few years.
> 
> For 2.6.25 I'll do a proper fix by making sure that every xfrm state 
> user obeys the rule that if x->lock is to be taken with 
> xfrm_state_lock then it must be done from within.
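
As a rough illustration of the nesting rule described above, here is a userspace sketch in which pthread mutexes stand in for the kernel spinlocks. The names `table_lock`, `struct state` and `read_state_under_both()` are hypothetical and do not come from the xfrm code; the only point is that the per-state lock is always taken inside the global one, never the other way round.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the global xfrm_state_lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for an xfrm state with its own x->lock. */
struct state {
	pthread_mutex_t lock;
	int value;
};

/*
 * Every path that needs both locks takes table_lock first and the
 * per-state lock second, and releases them in the reverse order.
 * A consistent order means no two paths can wait on each other.
 */
static int read_state_under_both(struct state *x)
{
	int v;

	pthread_mutex_lock(&table_lock);	/* outer lock */
	pthread_mutex_lock(&x->lock);		/* inner lock, nested inside */
	v = x->value;
	pthread_mutex_unlock(&x->lock);
	pthread_mutex_unlock(&table_lock);
	return v;
}
```

A path that took the per-state lock first and then the global lock would create the inverse ordering that lockdep reports as a possible circular dependency.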

ok, great. I cannot test the revert because i only run distro kernels on 
this box so i can only confirm that the bug is gone once your revert is 
upstream and DaveJ has built a new Fedora kernel for it (which is 1-2 
days after the commit goes upstream). So consider it fixed once you do 
the revert and i'll re-report it if i see any similar assert on a kernel 
that has this commit reverted.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-23 Thread Ulrich Drepper

Eric Dumazet wrote:
> 1) Can fd passing with recvmsg() on AF_UNIX also get O_CLOEXEC
> support?

Already there, see MSG_CMSG_CLOEXEC.


> 2) Why is this O_NONBLOCK ability needed for sockets? Is it a security
> issue, and if so, could you remind me what it is?

No security issue.  But look at any correct network program, all need to
set the mode to non-blocking.  Adding this support to the syscall comes
at almost no cost and it cuts the cost for every program down by one or
two syscalls.
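
For context, a minimal sketch of the sequence that programs use without such support: one syscall to create the socket, then two `fcntl()` calls to switch it to non-blocking mode. The helper name `nonblocking_socket()` is an assumption made for this illustration; the syscalls themselves are the standard POSIX ones.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a socket and switch it to non-blocking mode the traditional
 * way: socket(), then fcntl(F_GETFL), then fcntl(F_SETFL).
 * Returns the descriptor, or -1 on error.
 */
static int nonblocking_socket(int domain, int type, int protocol)
{
	int fd = socket(domain, type, protocol);
	int flags;

	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Letting the creating syscall set the flag would make the two `fcntl()` calls unnecessary, which is the saving described above.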

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets

2007-11-23 Thread Eric Dumazet

Ulrich Drepper wrote:
> This patch adds support for setting the O_NONBLOCK flag of the file
> descriptors returned by socket, socketpair, and accept.



Thanks Ulrich for this v5 series. I have two more questions.

1) Can fd passing with recvmsg() on AF_UNIX also get O_CLOEXEC support?

   (As I understand it, only accept(), socket(), socketcall() and socketpair()
are handled, so it might work on i386, because recvmsg() is multiplexed under
socketcall, but not on x86_64.)


2) Why is this O_NONBLOCK ability needed for sockets? Is it a security issue,
and if so, could you remind me what it is?


Thanks



Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver

2007-11-23 Thread Bryan Wu
On Nov 24, 2007 2:43 PM, David Woodhouse <[EMAIL PROTECTED]> wrote:
>
> On Fri, 2007-11-23 at 17:04 -0500, Robin Getz wrote:
> > It could be a runtime if() but we don't currently have the is_mach() all set
> > up properly today.
> >
> > This is because on most systems that Blackfin ships on - memory is the
> > dominant cost of the system, and end users don't want to take either the
> > storage (flash) hit of having code they don't use, or the run time (DRAM)
> > overhead. They are fine with compiling 2 kernels for two platforms if it
> > means things are cheaper. :)
> >
> > That being said, we still need to go back, and add things properly - and
> > just let gcc optimise things away if it is not used - C code is more
> > maintainable than all the ifdefs we have today.
> >
> > This is the goal - it will just take a little bit to get there.
>
> For now I suspect you could at least define machine_is_bf52x() and
> machine_is_bf54x() which are hard-coded to either zero or one according
> to the configuration, and at least you wouldn't need to add ifdefs to
> drivers.
>

We have a plan to do this, but maybe cpu_is_bf52x() and
cpu_is_bf54x() are better.
Thanks.

-Bryan


Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver

2007-11-23 Thread David Woodhouse

On Fri, 2007-11-23 at 17:04 -0500, Robin Getz wrote:
> It could be a runtime if() but we don't currently have the is_mach() all set 
> up properly today.
> 
> This is because on most systems that Blackfin ships on - memory is the
> dominant cost of the system, and end users don't want to take either the
> storage (flash) hit of having code they don't use, or the run time (DRAM)
> overhead. They are fine with compiling 2 kernels for two platforms if it
> means things are cheaper. :)
>
> That being said, we still need to go back, and add things properly - and just
> let gcc optimise things away if it is not used - C code is more maintainable
> than all the ifdefs we have today.
> 
> This is the goal - it will just take a little bit to get there.

For now I suspect you could at least define machine_is_bf52x() and
machine_is_bf54x() which are hard-coded to either zero or one according
to the configuration, and at least you wouldn't need to add ifdefs to
drivers.
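
A minimal sketch of that suggestion, assuming a hypothetical `CONFIG_BF52x` / `CONFIG_BF54x` configuration symbol and a made-up driver helper `nand_controller_name()`. The predicates expand to the constants 0 or 1, so the driver body stays plain C and the compiler can discard the branch that cannot be taken.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical configuration switch. In a kernel build this would come
 * from the build configuration rather than a #define in the source.
 */
#define CONFIG_BF52x 1

#ifdef CONFIG_BF52x
#define machine_is_bf52x()	1
#else
#define machine_is_bf52x()	0
#endif

#ifdef CONFIG_BF54x
#define machine_is_bf54x()	1
#else
#define machine_is_bf54x()	0
#endif

/* Hypothetical driver helper: ordinary if() tests, no #ifdef in the body. */
static const char *nand_controller_name(void)
{
	if (machine_is_bf52x())
		return "bf52x-nfc";
	if (machine_is_bf54x())
		return "bf54x-nfc";
	return "unknown";
}
```

Because each predicate is a compile-time constant, an optimising compiler can drop the dead branch, so the flash and DRAM footprint should match the #ifdef version while the source stays readable.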

-- 
dwmw2



Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-23 Thread James Bottomley

On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
> On 23.11.2007 12:38, Hannes Reinecke wrote:
> > Hannes Reinecke wrote:
> >> Laurent Riffard wrote:
> >>> On 21.11.2007 23:41, Andrew Morton wrote:
>  On Wed, 21 Nov 2007 22:45:22 +0100
>  Laurent Riffard <[EMAIL PROTECTED]> wrote:
> 
> > On 21.11.2007 05:45, Andrew Morton wrote:
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
> > Hello, 
> >
> > My system hangs shortly after I log in to the GNOME desktop. SysRq-W shows
> > that a bunch of tasks are blocked in "D" state; they seem to be waiting for
> > some I/O completion. I can try to hand-copy some data if requested.
> >
> > I found these messages in dmesg:
> >
> > ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
> > EXT3-fs: mounted filesystem with ordered data mode.
> > sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > end_request: I/O error, dev sda, sector 16460
> > ReiserFS: sda7: found reiserfs format "3.6" with standard journal
> > ReiserFS: sda7: using ordered data mode
> > --
> > ReiserFS: sda7: Using r5 hash to sort names
> > sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > end_request: I/O error, dev sdb, sector 19632
> > sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > end_request: I/O error, dev sdb, sector 40037363
> > Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 
> > extents:1 across:1048568k
> > lp0: using parport0 (interrupt-driven).
> >
> > These errors occur *only* with 2.6.24-rc3-mm1, they are 100% 
> > reproducible.
> > 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
> >
> > Maybe something is broken in pata_via driver ?
> >
>  Could be - 
>  libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
>  and 
>  pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
>  touch pata_via.c.
> >>> None of the above...
> >>>
> >>> I did a bisection, and it pointed at git-scsi-misc.patch.
> >>> I just ran 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
> >>>
> >>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not 
> >>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other 
> >>> commits are touching documentation or drivers I don't use. I'll try 
> >>> to revert only this one this evening.
> 
> I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 
> does fix the problem.
> 
> >> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an 
> >> error where
> >> I shouldn't. Checking ...
> >>
> > Ok, found it. We are blocking even special commands (ie requests with 
> > PREEMPT not set)
> > when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.
> 
> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O 
> errors.

I think the problem is the way we treat BLOCKED and QUIESCED (the latter
is the state that the domain validation uses and in which we cannot kill
failfast requests).  It's definitely wrong to kill failfast requests when
the state is QUIESCE.

This patch (which is applied on top of Hannes' original) separates the
BLOCK and QUIESCE states correctly ... does this fix the problem?

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 13e7e09..a7cf23a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1279,18 +1279,21 @@ int scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 				    "rejecting I/O to dead device\n");
 			ret = BLKPREP_KILL;
 			break;
-		case SDEV_QUIESCE:
 		case SDEV_BLOCK:
 			/*
-			 * If the devices is blocked we defer normal commands.
-			 */
-			if (!(req->cmd_flags & REQ_PREEMPT))
-				ret = BLKPREP_DEFER;
-			/*
 			 * Return failfast requests immediately
 			 */
 			if (req->cmd_flags & REQ_FAILFAST)
 				ret = BLKPREP_KILL;
+
+			/* fall through */
+
+		case SDEV_QUIESCE:
+			/*
+			 * If the devices is blocked we defer normal commands.
+			 */
+			if (!(req->cmd_flags & REQ_PREEMPT))
+				ret = BLKPREP_DEFER;
 			break;
 		default:
 			/*



Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Herbert Xu
Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> Also io->pending may need better protection - atomic, but missing memory
> barriers?  (May be getting away without sometimes due to side-effects of
> other function calls, but needs doing properly.)

If it's using atomic_dec_and_test then that comes with an implicit
memory barrier.
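
As a userspace analogue only (C11 atomics, not the kernel API), the pattern can be sketched as below. `struct io` and `io_dec_pending()` are hypothetical names for this illustration. `atomic_fetch_sub()` with its default sequentially consistent ordering plays the role that the fully ordered `atomic_dec_and_test()` plays in the kernel: writes made before dropping a reference are visible to whoever observes the count reach zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an I/O descriptor with a pending counter. */
struct io {
	atomic_int pending;	/* outstanding references */
	int error;		/* written by completers before they drop a ref */
};

/*
 * Drop one reference and report whether it was the last one.
 * atomic_fetch_sub() returns the previous value, so a result of 1 means
 * the counter has just reached zero. The default memory order is
 * memory_order_seq_cst, so no separate barrier is needed here.
 */
static bool io_dec_pending(struct io *io)
{
	return atomic_fetch_sub(&io->pending, 1) == 1;
}
```

Only the caller that sees `true` may free or complete the object; every other caller must not touch it after the decrement.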

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree

2007-11-23 Thread Andrew Morgan

I believe it was you who once eloquently observed that, at its heart,
the POSIX (sic) capabilities model was all about providing a mechanism
for overriding the prevailing security policy (be it MAC or DAC or
whatever) in a defined way.

Casey Schaufler wrote:
> Now I know that there are lots of people who don't share my
> views on granularity, but I have lots of experience with this
> and the cases where it actually makes sense to break the MAC
> capabilities up are rare.
> 
> That's my going in position, at any rate. I'm always open to
> finding out why I'm wrong.

It's not so much why you are wrong, as being clear that we're not using a
generic name and inadvertently limiting ourselves to a SMACK-like model...

It feels to me as if a MAC "override capability" is, if true to its
name, extra to the MAC model; any MAC model that needs an 'override' to
function seems under-specified... SELinux clearly feels no need for one,
and browsing through your SMACK patch, there are many instances where
this capability is used as a convenient privileged override. However,
in other situations, it appears as if the capability is required for
basic SMACK operations to succeed.

My sense is that there is a case to be made for: CAP_MAC_ADMIN and
CAP_MAC_OVERRIDE here. The former being for cases where SMACK (or
whatever MAC supports it) requires privilege to perform a privileged MAC
operation, and the latter for saying "OK, I'm without a paddle but need
one" (or words to that effect).

Cheers

Andrew



Re: [RFC/PATCH] SO_NO_CHECK for IPv6

2007-11-23 Thread Herbert Xu
David Schwartz <[EMAIL PROTECTED]> wrote:
> 
>> Regardless of whatever verifications your application is doing
>> on the data, it is not checksumming the ports and that's what
>> the pseudo-header is helping with.
> 
> So what? We are in the case where the data has already gotten to him. If it
> got to him in error, he'll reject it anyway. The receive checksum check will
> only reject packets that he would reject anyway. That makes it needless.

What if it goes to the wrong recipient who doesn't have the upper-
level checksums?

This is the whole point: IPv6, unlike IPv4, does not have an IP header
checksum, so the upper-level protocol needs to protect the header
fields by checksumming the pseudo-header.
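
To make the pseudo-header concrete, here is a small sketch of the upper-layer checksum calculation as laid out in the IPv6 specification: source address, destination address, 32-bit upper-layer length, three zero bytes and the next-header value, followed by the transport header and payload. The function names are assumptions for this illustration, and the 32-bit accumulator is only adequate for ordinary packet sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte string to a running sum as big-endian 16-bit words. */
static uint32_t sum_words(uint32_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/*
 * One's-complement checksum over the IPv6 pseudo-header followed by the
 * upper-layer header and payload. The checksum field inside `data` must
 * be zero when computing, or hold the received value when verifying.
 */
static uint16_t ipv6_upper_checksum(const uint8_t src[16], const uint8_t dst[16],
				    uint8_t next_header,
				    const uint8_t *data, uint32_t len)
{
	uint8_t tail[8] = {
		(uint8_t)(len >> 24), (uint8_t)(len >> 16),
		(uint8_t)(len >> 8), (uint8_t)len,	/* upper-layer length */
		0, 0, 0, next_header			/* zero padding + next header */
	};
	uint32_t sum = 0;

	sum = sum_words(sum, src, 16);
	sum = sum_words(sum, dst, 16);
	sum = sum_words(sum, tail, sizeof(tail));
	sum = sum_words(sum, data, len);

	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A packet whose stored checksum is valid verifies to zero. If the destination address is altered in transit the verification no longer yields zero, which is the misdelivery case described above.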

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)

2007-11-23 Thread Tejun Heo
Tejun Heo wrote:
> Robert Hancock wrote:
>> We need to run any DMA command with result taskfile requested in ADMA mode
>> when the port is in ADMA mode, otherwise it may try to use the legacy DMA
>> engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since
>> data corruption could potentially result if this happened. Also WARN_ON() if
>> we try and send result taskfile commands while NCQ commands are still active,
>> since the hardware doesn't allow this.
>>
>> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>>
>> @@ -1425,9 +1427,17 @@
>> +/* We can't handle result taskfile with NCQ commands active, since
>> +   retrieving the taskfile switches us out of ADMA mode and would abort
>> +   existing commands. */
>> +WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
>> +(qc->ap->qc_allocated & ~(1 << qc->tag)));
> 
> I owe an apology here.  ap->qc_allocated & ~(1 << qc->tag) test isn't
> correct.  Sorry.  qc deferring happens after qc is allocated so the
> condition can trigger (although it should be rare) even when nothing is
> going wrong, so I guess it should be WARN_ON((qc->flags &
> ATA_QCFLAG_RESULT_TF) && link->sactive).

Or, just make the assumption clear by not allowing NCQ w/ RESULT_TF at all.

	if (unlikely(qc->tf.protocol == ATA_PROT_NCQ &&
		     (qc->flags & ATA_QCFLAG_RESULT_TF))) {
		ata_dev_printk(qc->dev, KERN_ERR,
			       "NCQ w/ RESULT_TF not allowed\n");
		return AC_ERR_SYSTEM;
	}

Thanks.

-- 
tejun


Re: [PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)

2007-11-23 Thread Tejun Heo
Robert Hancock wrote:
> We need to run any DMA command with result taskfile requested in ADMA mode
> when the port is in ADMA mode, otherwise it may try to use the legacy DMA
> engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since
> data corruption could potentially result if this happened. Also WARN_ON() if
> we try and send result taskfile commands while NCQ commands are still active,
> since the hardware doesn't allow this.
> 
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
> 
> @@ -1425,9 +1427,17 @@
> + /* We can't handle result taskfile with NCQ commands active, since
> +retrieving the taskfile switches us out of ADMA mode and would abort
> +existing commands. */
> + WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
> + (qc->ap->qc_allocated & ~(1 << qc->tag)));

I owe an apology here.  ap->qc_allocated & ~(1 << qc->tag) test isn't
correct.  Sorry.  qc deferring happens after qc is allocated so the
condition can trigger (although it should be rare) even when nothing is
going wrong, so I guess it should be WARN_ON((qc->flags &
ATA_QCFLAG_RESULT_TF) && link->sactive).
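
For readers unfamiliar with the bitmask idiom, a small sketch of what the original expression computes. `other_tags_allocated()` is a hypothetical helper written for this illustration, and it assumes tags below 32. It answers "is any tag other than mine allocated?", which, as explained above, is not the same question as "is any other command active?", because a command can be allocated and still deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if any bit other than bit `tag` is set in `allocated`.
 * Mirrors the expression qc_allocated & ~(1 << qc->tag).
 */
static bool other_tags_allocated(uint32_t allocated, unsigned int tag)
{
	return (allocated & ~(UINT32_C(1) << tag)) != 0;
}
```

With only tag 3 allocated the helper returns false for tag 3; as soon as a second tag is allocated it returns true, whether or not that second command has been issued to the hardware.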

Sorry. :-)

-- 
tejun


Re: [patch 2/4] Timerfd v2 - new timerfd API

2007-11-23 Thread Michael Kerrisk
Hi Davide,

[...]

> +asmlinkage long sys_timerfd_create(int clockid, int flags)
>  {
> - int error;
> + int error, ufd;
>   struct timerfd_ctx *ctx;
>   struct file *file;
>   struct inode *inode;
> - struct itimerspec ktmr;
> -
> - if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> - return -EFAULT;
>  
>   if (clockid != CLOCK_MONOTONIC &&
>   clockid != CLOCK_REALTIME)
>   return -EINVAL;

Could I suggest the following placeholder addition here:

	if (flags != 0)
		return -EINVAL;

Later that can be replaced with something like:

	if (flags & ~(O_NONBLOCK | O_CLOEXEC))
		return -EINVAL;

Having the first of the checks above will allow userland to determine what
is implemented, rather than having non-zero flags silently ignored.
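
The two stages can be sketched as ordinary C functions to show how userland can probe for support. The function names are hypothetical; the logic is the same as the two checks above.

```c
#define _POSIX_C_SOURCE 200809L	/* for O_CLOEXEC */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Placeholder stage: no flags are defined yet, so reject any non-zero value. */
static int check_flags_initial(int flags)
{
	return flags != 0 ? -EINVAL : 0;
}

/* Later stage: accept the known flags and reject everything else. */
static int check_flags_extended(int flags)
{
	if (flags & ~(O_NONBLOCK | O_CLOEXEC))
		return -EINVAL;
	return 0;
}
```

An application can pass a flag and fall back to the older two-step approach when it gets EINVAL. That probing only works if unknown flags were rejected from the first release onwards.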

Cheers,

Michael

> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + init_waitqueue_head(&ctx->wqh);
> + ctx->clockid = clockid;
> + hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS);
> +
> + error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]",
> +  &timerfd_fops, ctx);
> + if (error) {
> + kfree(ctx);
> + return error;
> + }
> +
> + return ufd;
> +}



Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Tejun Heo
Robert Hancock wrote:
> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA 
> mode
> on systems with memory located above 4GB. We need to delay setting the 64-bit
> DMA mask until the PRD table and padding buffer are allocated so that they 
> don't
> get allocated above 4GB and break legacy mode (which is needed for ATAPI
> devices).
> 
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

Acked-by: Tejun Heo <[EMAIL PROTECTED]>

Thanks.

-- 
tejun


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 24, 2007 4:49 AM, Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> > ... or I just don't see the bug.
>
> See my earlier post in this thread: there's a race in the write loop
> where a work struct could be used twice on the same queue.
> (Needs data structure change to fix that, which nobody has attempted
> to do yet.)

As I wrote in an earlier post:
I did see this lockdep message even with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, so the
work struct is not used in the write loop.

> BTW To eliminate any internal lockdep concerns (and people say there
> should be no problem) temporarily add a second struct instead of reusing
> one on two queues.

I think this might really be a lockdep bug, but as I'm not fluent
enough in C, please check whether my logic is correct:

The freed-locked-lock test is the only user of this function in lockdep.c:
static inline int in_range(const void *start, const void *addr, const void *end)
{
	return addr >= start && addr <= end;
}
This returns true if addr is in the range from start (inclusive)
to end (inclusive).

But debug_check_no_locks_freed() seems to do:
const void *mem_to = mem_from + mem_len
-> mem_to is the last byte of the freed range, that fits in_range
lock_from = (void *)hlock->instance;
-> first byte of the lock
lock_to = (void *)(hlock->instance + 1);
-> first byte of the next lock, not last byte of the lock that is being checked!
(Or am I reading this wrong?)

The test is:
if (!in_range(mem_from, lock_from, mem_to) &&
    !in_range(mem_from, lock_to, mem_to))
	continue;
So it tests whether the first byte of the lock is in the range that is freed -> OK
And whether the first byte of the *next* lock is in the range that is freed
-> Not OK.
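
The suspected off-by-one can be reproduced in a small userspace sketch. This models the check as it is described in this message, not the actual lockdep source; `flagged_inclusive()` and `overlaps_half_open()` are hypothetical names, and the half-open comparison is one possible alternative rather than a proposed kernel fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive test as quoted above: true for start <= addr <= end. */
static bool in_range(const void *start, const void *addr, const void *end)
{
	return (const char *)addr >= (const char *)start &&
	       (const char *)addr <= (const char *)end;
}

/*
 * The check as described in the message: the lock is reported when
 * either of its end pointers falls inside the inclusive freed range.
 * Both mem_to and lock_to point one byte past their object.
 */
static bool flagged_inclusive(const char *mem_from, size_t mem_len,
			      const char *lock_from, size_t lock_len)
{
	const char *mem_to = mem_from + mem_len;
	const char *lock_to = lock_from + lock_len;

	return in_range(mem_from, lock_from, mem_to) ||
	       in_range(mem_from, lock_to, mem_to);
}

/* Alternative: treat both ranges as half-open and test for real overlap. */
static bool overlaps_half_open(const char *mem_from, size_t mem_len,
			       const char *lock_from, size_t lock_len)
{
	const char *mem_to = mem_from + mem_len;
	const char *lock_to = lock_from + lock_len;

	return lock_from < mem_to && mem_from < lock_to;
}
```

With a 64-byte object sitting directly before or directly after a 64-byte freed region, `flagged_inclusive()` reports a hit although the two do not share a single byte, while `overlaps_half_open()` does not.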

That would also explain the rather strange output:
=
[ BUG: held lock freed! ]
-
kcryptd/1022 is freeing memory
81011EBEFB00-81011EBEFB3F, with a lock still held there!
  (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
2 locks held by kcryptd/1022:
 #0:  (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
 #1:  (>work#2){--..}, at: [] run_workqueue+0x129/0x210

That claims that the lock of the *workqueue* struct, not the work
struct is getting freed!
But I'm still happily using the dm-crypt device, even 19 hours after
that message.

So my current best guess to the source of this message is, that with
the change in the ref counting it is now possible that the work struct
is really getting freed before the workqueue function returns. But as
the comment in run_workqueue() says, that is still legal.
But now the first byte of the next lock is part of the freed memory
and so the wrong "held lock freed" is triggered.

Torsten


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-23 Thread Rusty Russell
On Saturday 24 November 2007 06:53:30 Andi Kleen wrote:
> This serves as a documentation 
> on what is considered internal. And if some obscure module (in or
> out of tree) wants to use an internal interface they first have
> to send the module maintainer a patch and get some review this way.

So, you're saying that there's a problem with in-tree modules using symbols 
they shouldn't?  Can you give an example?

> I believe that is fairly important in tree too because the
> kernel has become so big now that review cannot be the only
> enforcement mechanism for this anymore.

If people aren't reviewing, this won't make them review.  I don't think the 
problem is that people are conniving to avoid review.

> Another secondary reason is that there are too many exported interfaces
> in general.

Probably, but this doesn't reduce it.  

> Several distributions have policies that require to 
> keep the changes to these exported interfaces minimal and that
> is very hard with thousands of exported symbols.  With name spaces
> the number of truly publicly exported symbols will hopefully
> shrink to a much smaller, more manageable set.

*This* makes sense.  But it's not clear that the burden should be placed on 
kernel coders.  You can create a list yourself.  How do I tell the difference 
between "truly publicly exported" symbols and others?

If a symbol has more than one in-tree user, it's hard to argue against an 
out-of-tree module using the symbol, unless you're arguing against *all* 
out-of-tree modules.

Sorry,
Rusty.


Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree

2007-11-23 Thread Casey Schaufler

--- Andrew Morgan <[EMAIL PROTECTED]> wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Casey Schaufler wrote:
> > In the end we can call it CAP_LATE_FOR_DINNER if that's the only way
> > I can move forward. CAP_MAC_OVERRIDE is the obvious partner to
> > CAP_DAC_OVERRIDE, so that's still my preference. CAP_SMACK_OVERRIDE
> > unnecessarily ties it to one LSM, and in spite of what some people
> > still seem to think, I see more LSMs in the pipeline.
> 
> I'd personally not like to see SMACK appear in a capability name. No
> offense Casey, but SMACK may be displaced with YAMAC (*) someday, and
> I'd hate to have wasted a capability on it.

No offense taken. Technology continues marching forward and all that.

> Using CAP_MAC_OVERRIDE makes
> sense to me - even if its not (yet/ever) honored by all MAC LSMs.

Thanks.

> I do have a question about whether one capability is sufficient in
> general for MAC. Looking at the:
> 
>   http://wt.xpilot.org/publications/posix.1e/download.html
> 
> last draft, there are no less than 5 capabilities (p173) allocated for
> MAC. Presumably there was a good reason for 5 and not 1 back then -
> could you summarize what is different now?

There are to my mind two important differences. The first is that
my experience with Trusted Irix (Trix from here on), which used (uses?)
capabilities and MAC together, is that the granularity is lost on
99 44/100% of programs, programmers, evaluators, admins, and problems.
You just don't get that many cases where it actually gets you anything
to have less than all the MAC capabilities. Applications that care
about MAC to the extent that they use the capabilities tend to use the
lot, if not all the time, in certain circumstances. I'm afraid that
I am not a major fan of fine grained privilege based on my experience
with it.

The second and perhaps more important reason is that the POSIX
draft assumed a Bell & LaPadula sensitivity model, or at least
a model very much like it. What would CAP_MAC_DOWNGRADE mean on
a Smack system configured:

OneHand      OtherHand    r---
OtherHand    GrippingHand r---
GrippingHand OneHand      r---

What would CAP_MAC_UPGRADE mean, for that matter? It's even worse
to consider that the relationships can change.

CAP_MAC_READ and CAP_MAC_WRITE still make sense, as does
CAP_MAC_RELABEL_SUBJ. But if you have CAP_MAC_WRITE you can
do pretty much the same damage as if you have CAP_MAC_RELABEL_SUBJ,
and the other way around, and if you're not going to use one
of the other capabilities after you find out interesting things
using CAP_MAC_READ it's hard to figure why you'd bother.

Now I know that there are lots of people who don't share my
views on granularity, but I have lots of experience with this
and the cases where it actually makes sense to break the MAC
capabilities up are rare.

That's my going in position, at any rate. I'm always open to
finding out why I'm wrong.

Thank you.


Casey Schaufler
[EMAIL PROTECTED]


ipmi_watchdog can not reset the kernel panic machine

2007-11-23 Thread youquan_song
With a kernel built from 2.6.24-rc3, ipmi_watchdog cannot reset the machine
after a kernel panic, and the watchdog never records the panic information
in the IPMI SEL.

1. Disable automatic reboot on panic with
   echo "0" > /proc/sys/kernel/panic

2. modprobe ipmi_watchdog timeout=120 action=reset

3. Load a driver that calls panic() when an ioctl is issued to it.

4. Issue that ioctl to the driver to panic the system.

A printk that I added to wdog_panic_handler shows
ipmi_watchdog_state == WDOG_TIMEOUT_NONE, so the condition below is false
and the watchdog never records the panic information in the IPMI SEL.


static int wdog_panic_handler(struct notifier_block *this,
			      unsigned long event,
			      void *unused)
{
	static int panic_event_handled = 0;

	/* On a panic, if we have a panic timeout, make sure to extend
	   the watchdog timer to a reasonable value to complete the
	   panic, if the watchdog timer is running.  Plus the
	   pretimeout is meaningless at panic time. */
	if (watchdog_user && !panic_event_handled &&
	    ipmi_watchdog_state != WDOG_TIMEOUT_NONE) {
		/* Make sure we do this only once. */
		panic_event_handled = 1;

		timeout = 255;
		pretimeout = 0;
		panic_halt_ipmi_set_timeout();
	}

	return NOTIFY_OK;
}


Re: Laptop keyboard unusable when ACPI is active was Re: [2.6.22] i8042, ACPI, ipw2100 and issues reported by psmouse.c atkbd.c

2007-11-23 Thread Len Brown
On Sunday 21 October 2007 05:43, [EMAIL PROTECTED] wrote:

> I have emerged lm_sensors but can't get it running - it keeps saying "No
> sensors found!" and complaining about kernel drivers not properly setup.
> I have attached the output of sensors-detect, from which it seems that
> the kernel is OK.

In this case, getting sensors installed is the opposite of what you want to do.
The idea is to simplify the system until it works, then figure out what
simplification made it work.

ie. disable sensors entirely by building a kernel with CONFIG_HWMON=n

If that makes things work, then it is a clue.
If that was disabled already, then just keep it disabled.

cheers,
-Len



Re: Laptop keyboard unusable when ACPI is active

2007-11-23 Thread Len Brown
On Thursday 22 November 2007 02:24, [EMAIL PROTECTED] wrote:

> It is also important to note that this bug always comes with bug 8740 
> http://bugzilla.kernel.org/show_bug.cgi?id=8740 (also confirmed and also 
> an ACPI issue).

No, 8740 is not an ACPI issue.
http://bugzilla.kernel.org/show_bug.cgi?id=8740#c2

-Len
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> Before the cleanup *all* calls to crypt_dec_pending() was via crypt_endio().
> Now there is an additional call to crypt_dec_pending() to balance the
> additional ref placed into crypt_write_io_process(). And that one is
> not called from whatever context/thread cleans up after
> make_generic_request, but directly in the context/thread of the caller
> of crypt_write_io_process(), and that is kcryptd.
 
Please do look at the latest patches (always at
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html
 )
where you'll see I've already disentangled the mess of functions
and given them more understandable names, so at least following the program
flow is easier.

Read and write do the ref counting differently (but correctly AFAICT) - I want
that changing, but held back from doing it without first checking whether the
later patches (not yet reviewed) provide a reason to prefer one method
over the other.

Alasdair
-- 
[EMAIL PROTECTED]


Re: Laptop keyboard unusable when ACPI is active

2007-11-23 Thread Len Brown
On Friday 23 November 2007 02:44, Mats Johannesson wrote:

> The bad interaction between ACPI controlled EC (embedded controller)
> and the i8042 interrupt handler is theorized about in detail at OLPCs
> http://dev.laptop.org/ticket/2401 - almost at the end of that page.
> Thanks to Daniele C for the link.

huh?

I believe that the OLPC XO1 does not run in ACPI mode
and thus does not use the ACPI EC driver to talk to
the EC on their board.

Presumably they use some native embedded controller driver to
talk to their platform specific embedded controller.

I don't know why they call their interrupt an SCI.
Per above, it can't be an ACPI SCI.  Presumably they
call it that b/c their chipset documentation calls it
that too, on the (invalid) assumption that an ACPI-enabled
OS and firmware would be running on the hardware.

Please let me know if I'm wrong.

-Len


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
Also io->pending may need better protection - atomic, but missing memory
barriers?  (May be getting away without sometimes due to side-effects of
other function calls, but needs doing properly.)

[BTW Other device-mapper atomic_t usage also needs reviewing.]
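
[Editorial sketch, not dm-crypt code: the concern about an atomic counter
that lacks ordering can be illustrated in userspace with C11 atomics. The
struct and function names below are hypothetical stand-ins. The point is
that the final decrement carries release/acquire ordering, so whoever drops
the last reference also sees every write made before the other drops.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-I/O structure; not the dm-crypt type. */
struct io_ref {
	atomic_int pending;	/* outstanding references */
	int error;		/* plain field written before a put */
};

void io_ref_init(struct io_ref *io)
{
	atomic_init(&io->pending, 1);
	io->error = 0;
}

void io_ref_get(struct io_ref *io)
{
	/* Taking a reference needs no ordering of its own. */
	atomic_fetch_add_explicit(&io->pending, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference. */
bool io_ref_put(struct io_ref *io)
{
	/*
	 * Release: writes made before this put are published.
	 * Acquire: the thread that observes the count reaching zero
	 * also observes the writes published by the other puts.
	 */
	return atomic_fetch_sub_explicit(&io->pending, 1,
					 memory_order_acq_rel) == 1;
}
```

[Only the caller for which io_ref_put() returns true may free or complete
the object; a relaxed decrement would give the same count but no such
visibility guarantee.]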

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> ... or I just don't see the bug.
 
See my earlier post in this thread: there's a race in the write loop
where a work struct could be used twice on the same queue.
(Needs data structure change to fix that, which nobody has attempted
to do yet.)

BTW To eliminate any internal lockdep concerns (and people say there
should be no problem) temporarily add a second struct instead of reusing
one on two queues.

Alasdair
-- 
[EMAIL PROTECTED]


Re: [PATCH] NET: dmfe: don't access configuration space in D3 state

2007-11-23 Thread Maxim Levitsky
On Saturday 24 November 2007 05:10:37 Jeff Garzik wrote:
> Maxim Levitsky wrote:
> >>From 7e24227257f315e52fe0b494dc1253d2a0ce5dff Mon Sep 17 00:00:00 2001
> > From: Maxim Levitsky <[EMAIL PROTECTED]>
> > Date: Fri, 23 Nov 2007 01:15:36 +0200
> > Subject: [PATCH] NET: dmfe: don't access configuration space in D3 state
> >  Accidently I reversed the order of pci_save_state and
> >  pci_set_power_state in .suspend()/.resume() callbacks
> > 
> > Signed-off-by: Maxim Levitsky <[EMAIL PROTECTED]>
> > ---
> >  drivers/net/tulip/dmfe.c |4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> 
> applied #upstream-fixes, after hand-editing the patch changelog taken by 
> git-am from the email body
> 
> 
> 

Hi,

Thanks,
Next time I will be more careful with changelogs.

Best regards,
Maxim Levitsky


Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree

2007-11-23 Thread Andrew Morgan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Casey Schaufler wrote:
> In the end we can call it CAP_LATE_FOR_DINNER if that's the only way
> I can move forward. CAP_MAC_OVERRIDE is the obvious partner to
> CAP_DAC_OVERRIDE, so that's still my preference. CAP_SMACK_OVERRIDE
> unnecessarily ties it to one LSM, and in spite of what some people
> still seem to think, I see more LSMs in the pipeline.

I'd personally not like to see SMACK appear in a capability name. No
offense Casey, but SMACK may be displaced with YAMAC (*) someday, and
I'd hate to have wasted a capability on it. Using CAP_MAC_OVERRIDE makes
sense to me - even if its not (yet/ever) honored by all MAC LSMs.

I do have a question about whether one capability is sufficient in
general for MAC. Looking at the:

  http://wt.xpilot.org/publications/posix.1e/download.html

last draft, there are no less than 5 capabilities (p173) allocated for
MAC. Presumably there was a good reason for 5 and not 1 back then -
could you summarize what is different now?

Thanks

Andrew

(*) yet-another example of yet-another

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHR5mc+bHCR3gb8jsRAlB9AJsHPi1+fFp1ONKJCMFDpLS1lYG4AwCfYxMX
8aaU+sOBNHU01uldtrJ8cEI=
=/USy
-END PGP SIGNATURE-


Re: [PATCH 1/2] Blackfin SMC91x Driver: punt CONFIG_BFIN -- we already have CONFIG_BLACKFIN

2007-11-23 Thread Jeff Garzik

Bryan Wu wrote:

From: Mike Frysinger <[EMAIL PROTECTED]>

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
---
 drivers/net/Kconfig  |2 +-
 drivers/net/smc91x.h |2 +-


applied 1-2 to #upstream-fixes




Re: [PATCH] NET: dmfe: don't access configuration space in D3 state

2007-11-23 Thread Jeff Garzik

Maxim Levitsky wrote:

From 7e24227257f315e52fe0b494dc1253d2a0ce5dff Mon Sep 17 00:00:00 2001

From: Maxim Levitsky <[EMAIL PROTECTED]>
Date: Fri, 23 Nov 2007 01:15:36 +0200
Subject: [PATCH] NET: dmfe: don't access configuration space in D3 state
 Accidently I reversed the order of pci_save_state and
 pci_set_power_state in .suspend()/.resume() callbacks

Signed-off-by: Maxim Levitsky <[EMAIL PROTECTED]>
---
 drivers/net/tulip/dmfe.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)



applied #upstream-fixes, after hand-editing the patch changelog taken by 
git-am from the email body





Re: [PATCH 1/2][2.6.24] ehea: Improve tx packets counting

2007-11-23 Thread Jeff Garzik

Thomas Klein wrote:

Using own tx_packets counter instead of firmware counters.

Signed-off-by: Thomas Klein <[EMAIL PROTECTED]>

---
 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)


applied 1-2 to #upstream-fixes




Re: [PATCH 1/9] cxgb3 - fix MSI-X failure path

2007-11-23 Thread Jeff Garzik

Divy Le Ray wrote:

From: Divy Le Ray <[EMAIL PROTECTED]>

Return error code when msi-x settings fail.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/cxgb3_main.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)


applied 1-9 to #upstream, then trimmed all trailing whitespace




Re: [PATCH] e100: free IRQ to remove warning when rebooting

2007-11-23 Thread Jeff Garzik

Ian Wienand wrote:

Hi,

When rebooting today I got

Will now restart.
ACPI: PCI interrupt for device :00:03.0 disabled
GSI 20 (level, low) -> CPU 1 (0x0100) vector 53 unregistered
Destroying IRQ53 without calling free_irq
WARNING: at 
/home/insecure/ianw/programs/git-kernel/linux-2.6/kernel/irq/chip.c:76 
dynamic_irq_cleanup()

Call Trace:
 [] show_stack+0x40/0xa0
sp=e0407c927b40 bsp=e0407c920eb8
 [] dump_stack+0x30/0x60
sp=e0407c927d10 bsp=e0407c920ea0
 [] dynamic_irq_cleanup+0x160/0x1e0
sp=e0407c927d10 bsp=e0407c920e70
 [] destroy_and_reserve_irq+0x30/0xc0
sp=e0407c927d10 bsp=e0407c920e40
 [] iosapic_unregister_intr+0x5b0/0x5e0
sp=e0407c927d10 bsp=e0407c920dd8
 [] acpi_unregister_gsi+0x30/0x60
sp=e0407c927d10 bsp=e0407c920db8
 [] acpi_pci_irq_disable+0x140/0x160
sp=e0407c927d10 bsp=e0407c920d88
 [] pcibios_disable_device+0xa0/0xc0
sp=e0407c927d20 bsp=e0407c920d68
 [] pci_disable_device+0x130/0x160
sp=e0407c927d20 bsp=e0407c920d38
 [] e100_shutdown+0x1c0/0x220
sp=e0407c927d30 bsp=e0407c920d08
 [] pci_device_shutdown+0x80/0xc0
sp=e0407c927d30 bsp=e0407c920ce8
 [] device_shutdown+0xf0/0x180
sp=e0407c927d30 bsp=e0407c920cc8
 [] kernel_restart+0x60/0x120
sp=e0407c927d30 bsp=e0407c920ca8
 [] sys_reboot+0x3b0/0x480
sp=e0407c927d30 bsp=e0407c920c30
 [] ia64_ret_from_syscall+0x0/0x20
sp=e0407c927e30 bsp=e0407c920c30
 [] ia64_ivt+0x00010620/0x400
sp=e0407c928000 bsp=e0407c920c30
Restarting system.

I think the solution might be to free the IRQ before the pci_device_shutdown

Signed-off-by: Ian Wienand <[EMAIL PROTECTED]>

---

 e100.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 3dbaec6..8ae5ac3 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2782,6 +2782,7 @@ static void e100_shutdown(struct pci_dev *pdev)
pci_enable_wake(pdev, PCI_D3cold, 0);
}
 
+	free_irq(pdev->irq, netdev);

pci_disable_device(pdev);
pci_set_power_state(pdev, PCI_D3hot);


agreed, though I think free_irq() should come after pci_disable_device() 
like it does in e100_suspend().


auke?




Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Mark Lord

Jeff Garzik wrote:

Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask 
unconditionally, but for non-ATA_PROT_DMA commands (which includes all 
ATAPI), it just falls back to ata_qc_issue_prot which issues via the 
legacy SFF interface and can only handle 32-bit addressing. So yes, it 
appears to have a similar bug as sata_nv had.



sata_mv doesn't do ATAPI at all...

..

Not yet, anyway.  Stay tuned..


Re: [bug] xfrm_state_lock: possible circular locking dependency detected

2007-11-23 Thread Herbert Xu
On Fri, Nov 23, 2007 at 04:38:51PM +0100, Ingo Molnar wrote:
> 
> DaveJ's Fedora 8 rpm for 2.6.24 works pretty well, except for the 
> networking related lockdep assert attached below, which happened while 
> starting up ipsec. Let me know if you need any more info - it's a pretty 
> stock setup.

Thanks for the report Ingo!

This is indeed a regression caused by:

commit 050f009e16f908932070313c1745d09dc69fd62b
Author: Herbert Xu <[EMAIL PROTECTED]>
Date:   Tue Oct 9 13:31:47 2007 -0700

[IPSEC]: Lock state when copying non-atomic fields to user-space

For 2.6.24 I'm simply going to revert this change since that
just puts us back to the same state we've been for the last
few years.

For 2.6.25 I'll do a proper fix by making sure that every xfrm
state user obeys the rule that if x->lock is to be taken with
xfrm_state_lock then it must be done from within.
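
[Editorial sketch, not xfrm code: reading the rule above as "the per-state
lock is only ever taken while already inside the global lock", the nesting
can be shown with userspace pthread mutexes. table_lock stands in for
xfrm_state_lock and entry.lock for x->lock; both names, and the two helper
functions, are hypothetical.]

```c
#include <pthread.h>

/* Hypothetical stand-in for the global xfrm_state_lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a state object carrying its own lock. */
struct entry {
	pthread_mutex_t lock;
	int value;
};

/* Outer lock first, inner lock nested inside it. */
int entry_read(struct entry *e)
{
	int v;

	pthread_mutex_lock(&table_lock);
	pthread_mutex_lock(&e->lock);
	v = e->value;
	pthread_mutex_unlock(&e->lock);
	pthread_mutex_unlock(&table_lock);
	return v;
}

/* Every other path that needs both locks uses the same order. */
void entry_write(struct entry *e, int v)
{
	pthread_mutex_lock(&table_lock);
	pthread_mutex_lock(&e->lock);
	e->value = v;
	pthread_mutex_unlock(&e->lock);
	pthread_mutex_unlock(&table_lock);
}
```

[A circular dependency of the kind lockdep reports arises when one path
nests the two locks in the opposite order; keeping a single order on every
path rules that out.]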

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 29/59] drivers/net/chelsio: Add missing "space"

2007-11-23 Thread Jeff Garzik

Joe Perches wrote:

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
---
 drivers/net/chelsio/cxgb2.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


applied 29-36 to netdev#upstream




netdev-2.6 rebased

2007-11-23 Thread Jeff Garzik
I pulled all the patches collected by DaveM in davem/netdev-2.6.git a 
few days ago into jgarzik/netdev-2.6.git#upstream.  As of a few minutes 
ago, jgarzik/netdev-2.6.git was rebased to the latest 2.6.24-rc 
(torvalds/linux-2.6.git).


Jeff




[PATCH] dmaengine: Correct invalid assumptions in the Kconfig text

2007-11-23 Thread Dan Williams
From: Haavard Skinnemoen <[EMAIL PROTECTED]>

This patch corrects recently changed (and now invalid) Kconfig
descriptions for the DMA engine framework:

 - Non-Intel(R) hardware also has DMA engines;
 - DMA is used for more than memcpy and RAID offloading.

In fact, on most platforms memcpy and RAID aren't factors, and DMA
exists so that peripherals can transfer data to/from memory while
the CPU does other work.

Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
Signed-off-by: David Brownell <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

This corrects a 'regression' of the Kconfig text that happened during
the 2.6.24 merge window.  Adrian was concerned that people were
needlessly turning this capability on without the requisite hardware,
but the wording is indeed misleading.

 drivers/dma/Kconfig |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6a7d25f..c46b7c2 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -3,11 +3,13 @@
 #
 
 menuconfig DMADEVICES
-   bool "DMA Offload Engine support"
+   bool "DMA Engine support"
depends on (PCI && X86) || ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX
help
- Intel(R) offload engines enable offloading memory copies in the
- network stack and RAID operations in the MD driver.
+ DMA engines can do asynchronous data transfers without
+ involving the host CPU.  Currently, this framework can be
+ used to offload memory copies in the network stack and
+ RAID operations in the MD driver.
 
 if DMADEVICES
 


Re: radeonfb i2c regression post-2.6.18.

2007-11-23 Thread Benjamin Herrenschmidt

On Fri, 2007-11-23 at 23:29 +0100, Jean Delvare wrote:
> On Fri, 23 Nov 2007 17:00:52 +0100, Michael Buesch wrote:
> > This patch fixes my crash problem.
> 
> Out of curiosity, what kind of crash was it? I admit that I can't see
> how the code could crash.

Really sneaky... apparently, keeping the i2c lines asserted on his
laptop model would drain enough current through the pullups (or the
chip) that the temperature will raise significantly, causing a thermal
shutdown if the machine was already warm.

A bit scary... looks to me that a pullup is a bit too weak somewhere on
the motherboard.

That also means that this fix should reduce power consumption on the
battery significantly on those machines as it must take quite a bit of
power to increase the temperature that significantly (either that, or
the heating part sits just next to the sensor).

Cheers,
Ben. 



Re: 2.6.24-rc3-mm1 - Kernel Panic on IO-APIC

2007-11-23 Thread Alexey Dobriyan
On Tue, Nov 20, 2007 at 10:18:39PM -0800, Andrew Morton wrote:
> On Wed, 21 Nov 2007 11:41:23 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew,
> > 
> > Kernel panic's across different architectures like powerpc, x86_64, 
> 
> powerpc complains about IO-APICs??
> 
> > Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
> > Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
> > Mount-cache hash table entries: 256
> > SMP alternatives: switching to UP code
> > ACPI: Core revision 20070126

Hmm, same date here. It's an Asus P5B-E motherboard.

> > ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> > Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the 
> > 'noapic' kernel parameter
> 
> ACPI or x86 breakage, I guess.
> 
> Did 'noapic' work?

No! The box freezes somewhere after "Freeing unused kernel memory"...

Bisection points to git-x86.patch, though.

git-bisect start
# good: [f05092637dc0d9a3f2249c9b283b973e6e96b7d2] Linux 2.6.24-rc3
git-bisect good f05092637dc0d9a3f2249c9b283b973e6e96b7d2
# bad: [46c8c396d2c87b786a7fac615c289f85a18e53ce] w1-build-fix
git-bisect bad 46c8c396d2c87b786a7fac615c289f85a18e53ce
# bad: [4e22f4852c48e1eddfe04299e78c0456164abe86] 
frv-move-dma-macros-to-scatterlisth-for-consistency
git-bisect bad 4e22f4852c48e1eddfe04299e78c0456164abe86
# bad: [4e22f4852c48e1eddfe04299e78c0456164abe86] 
frv-move-dma-macros-to-scatterlisth-for-consistency
git-bisect bad 4e22f4852c48e1eddfe04299e78c0456164abe86
# good: [d5135f31313af2be37d8ccb71e2a42f8e221d8c4] 
ide-mm-ide-disk-extend-timeout-for-pio-out-commands
git-bisect good d5135f31313af2be37d8ccb71e2a42f8e221d8c4
# good: [6be815e83f506f4c39a46cf59014e29a95c5e6c4] 
iommu-sg-merging-call-blk_queue_segment_boundary-in-__scsi_alloc_queue
git-bisect good 6be815e83f506f4c39a46cf59014e29a95c5e6c4
# good: [6be815e83f506f4c39a46cf59014e29a95c5e6c4] 
iommu-sg-merging-call-blk_queue_segment_boundary-in-__scsi_alloc_queue
git-bisect good 6be815e83f506f4c39a46cf59014e29a95c5e6c4
# bad: [c792db6d06114a85e33a27c89e9e979f11b951c4] 
slub-fix-coding-style-violations
git-bisect bad c792db6d06114a85e33a27c89e9e979f11b951c4
# bad: [c792db6d06114a85e33a27c89e9e979f11b951c4] 
slub-fix-coding-style-violations
git-bisect bad c792db6d06114a85e33a27c89e9e979f11b951c4
# bad: [76f3939b76ff557f73720b57a16716196f04e407] 
x86_64-make-sparsemem-vmemmap-the-default-memory-model-v2
git-bisect bad 76f3939b76ff557f73720b57a16716196f04e407
# good: [b8ba611566d8799a979b190d4bb14305ca64ee0e] 
sis-fb-driver-_ioctl32_conversion-functions-do-not-exist-in-recent-kernels
git-bisect good b8ba611566d8799a979b190d4bb14305ca64ee0e
# good: [e34995928859308d2abef1709332e2b12d36db2f] git-ipwireless_cs
git-bisect good e34995928859308d2abef1709332e2b12d36db2f
# bad: [f520abbbe11bc8253714bcd34aaaf19bdf82189e] git-x86-identify_cpu-fix
git-bisect bad f520abbbe11bc8253714bcd34aaaf19bdf82189e


I honestly tried fresh #mm from x86 tree -- the one which ends at commit
70be766db1105c0fc9aed8e954d0c343c1eda067 "x86: Add the RDC machine
specific reboot fixup". FWIW, commit "x86: validate against ACPI motherboard
resources" is innocent. After "x86: make stack size configurable" the damn thing
wouldn't build, and applying fixes from -mm doesn't help at 3AM.

Again, it's late here, I'll recheck today.


Re: [PATCH] pata_isapnp: Polled devices

2007-11-23 Thread Jeff Garzik

Alan Cox wrote:

If a card has no IRQ then pass no interrupt handler but allow polled
usage.

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>

diff -u --new-file --recursive --exclude-from /usr/src/exclude 
linux.vanilla-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c 
linux-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c
--- linux.vanilla-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c  2007-11-16 
17:54:39.0 +
+++ linux-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c  2007-11-16 
18:14:29.0 +
@@ -75,13 +75,16 @@
struct ata_host *host;
struct ata_port *ap;
void __iomem *cmd_addr, *ctl_addr;
+   int irq = 0;
+   irq_handler_t handler = NULL;
 
 	if (pnp_port_valid(idev, 0) == 0)


applied #upstream-fixes




Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Robert Hancock

Jeff Garzik wrote:

Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask 
unconditionally, but for non-ATA_PROT_DMA commands (which includes all 
ATAPI), it just falls back to ata_qc_issue_prot which issues via the 
legacy SFF interface and can only handle 32-bit addressing. So yes, it 
appears to have a similar bug as sata_nv had.



sata_mv doesn't do ATAPI at all...


Right.. missed that ATA_FLAG_NO_ATAPI. So these issues Tom is reporting 
are just with a normal SATA hard drive?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/




Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Jeff Garzik

Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask 
unconditionally, but for non-ATA_PROT_DMA commands (which includes all 
ATAPI), it just falls back to ata_qc_issue_prot which issues via the 
legacy SFF interface and can only handle 32-bit addressing. So yes, it 
appears to have a similar bug as sata_nv had.



sata_mv doesn't do ATAPI at all...

Jeff




Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Robert Hancock

Mark Lord wrote:

Morrison, Tom wrote:

I am hopeful that the sata_mv has this bug (I proved that the
problem I was experiencing was due to the sata_mv driver with 3.75Gig 
or more of memory)...
 
I am on vacation for a week or more ...or I'd tell you today

if it did have this bug!

..

Yeah, I kind of had your reports in mind when I asked that.  :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.


Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask 
unconditionally, but for non-ATA_PROT_DMA commands (which includes all 
ATAPI), it just falls back to ata_qc_issue_prot which issues via the 
legacy SFF interface and can only handle 32-bit addressing. So yes, it 
appears to have a similar bug as sata_nv had.


Likely it needs a similar slave_config trick to change bounce limit 
depending on the connected device, unless there is really a way to issue 
ATAPI commands with this EDMA interface, as the TODO list in sata_mv.c 
suggests may be possible..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



No error when inotify_add_watch(/an/NFS/file)

2007-11-23 Thread Phil Endecott

Dear Experts,

NFS doesn't work with inotify (and it looks like it can't, certainly 
not before NFS v4.1).  However, if I give an NFS filename to 
inotify_add_watch(), I don't get an error.


If it indicated an error in this case then I could easily fall back to 
some sort of polling.  Without an error, I need some other way to 
detect NFS (and any other non-inotify-compatible filesystems).


Any thoughts?


Phil.

(If you Cc: me in any replies I'll see them sooner.)
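
[Editorial sketch of one possible "other way to detect NFS", assuming Linux
and the f_type field that statfs(2) fills in. The helper names
fs_type_is_nfs() and path_is_on_nfs() are hypothetical; the magic number is
the NFS_SUPER_MAGIC value listed in the statfs(2) manual page. It covers
NFS only, not other filesystems on which inotify misses remote changes.]

```c
#include <sys/vfs.h>	/* statfs(2), Linux-specific */

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969	/* value listed in statfs(2) */
#endif

/* Pure helper: does this f_type value denote NFS? */
int fs_type_is_nfs(long f_type)
{
	return f_type == NFS_SUPER_MAGIC;
}

/* Returns 1 if path is on NFS, 0 if it is not, -1 if statfs() fails. */
int path_is_on_nfs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;
	return fs_type_is_nfs((long)sfs.f_type);
}
```

[A caller could check path_is_on_nfs() before inotify_add_watch() and fall
back to polling when it returns 1.]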





Re: sata NCQ blacklist entry

2007-11-23 Thread Jan-Simon Möller
Am Freitag 23 November 2007 08:21:09 schrieb Andrew Morton:
> On Tue, 13 Nov 2007 21:55:15 +0100 Jan-Simon Möller <[EMAIL PROTECTED]> 
> wrote:
> > Hi!
>
> You removed from cc the guys who are most likely to fix this.  Please
> always do reply-to-all.
Sorry, will remember that.
>
> > Just using kernel 2.6.24-rc2 (325d22df7b19e0116aff3391d3a03f73d0634ded).
> >
>
> So is this problem (which in another email you attributed to smartd) 
Even without smartd in my default runlevel it happens at some point.

> also 
> present in 2.6.23?
I compiled and tested 2.6.23.8. Smartd enabled, nothing noticed, dmesg is 
really clean:
dmesg | grep ata
ACPI: SSDT 7F6D3C3F, 02DD (r1 SataRe SataAhci 1000 INTL 20060912)
PERCPU: Allocating 46888 bytes of per cpu data
Memory: 2042960k/2087744k available (2062k kernel code, 44396k reserved, 982k 
data, 324k init)
ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
libata version 2.21 loaded.
ata1: SATA max UDMA/133 cmd 0xc234e100 ctl 0x bmdma 
0x irq 4347
ata2: SATA max UDMA/133 cmd 0xc234e180 ctl 0x bmdma 
0x irq 4347
ata3: SATA max UDMA/133 cmd 0xc234e200 ctl 0x bmdma 
0x irq 4347
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: WDC WD2500BEVS-22UST0, 01.01A01, max UDMA/133
ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata_piix :00:1f.1: version 2.12
scsi3 : ata_piix
scsi4 : ata_piix
ata4: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 
0x00011810 irq 14
ata5: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 
0x00011818 irq 15
ata4.00: ATAPI: HL-DT-ST DVDRAM GSA-T20N, WW01, max UDMA/33
ata4.00: configured for UDMA/33
EXT3-fs: mounted filesystem with ordered data mode.



>
> And is it still present in 2.6.24-rc3?
Went back to 2.6.24-rc3 ...
Yes, but not at boot when smartd is started.

dmesg | grep ata
ACPI: SSDT 7F6D3C3F, 02DD (r1 SataRe SataAhci 1000 INTL 20060912)
PERCPU: Allocating 46968 bytes of per cpu data
Memory: 2048732k/2087744k available (2219k kernel code, 38624k reserved, 992k 
data, 344k init)
ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
libata version 3.00 loaded.
ata1: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfc404100 irq 4347
ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfc404180 irq 4347
ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfc404200 irq 4347
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: WDC WD2500BEVS-22UST0, 01.01A01, max UDMA/133
ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata_piix :00:1f.1: version 2.12
scsi3 : ata_piix
scsi4 : ata_piix
ata4: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1810 irq 14
ata5: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1818 irq 15
ata4.00: ATAPI: HL-DT-ST DVDRAM GSA-T20N, WW01, max UDMA/33
ata4.00: configured for UDMA/33
EXT3-fs: mounted filesystem with ordered data mode.
ata1.00: exception Emask 0x2 SAct 0x73 SErr 0x0 action 0x2 frozen
ata1.00: spurious completions during NCQ issue=0x0 SAct=0x73 
FIS=004040a1:0008
ata1.00: cmd 60/10:00:d4:82:31/00:00:07:00:00/40 tag 0 cdb 0x0 data 8192 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/08:08:9c:e5:cc/00:00:08:00:00/40 tag 1 cdb 0x0 data 4096 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/10:20:24:61:25/00:00:09:00:00/40 tag 4 cdb 0x0 data 8192 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/58:28:c4:65:25/00:00:09:00:00/40 tag 5 cdb 0x0 data 45056 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/20:30:7c:f6:a3/00:00:05:00:00/40 tag 6 cdb 0x0 data 16384 in
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x2 SAct 0x187 SErr 0x0 action 0x2 frozen
ata1.00: spurious completions during NCQ issue=0x0 SAct=0x187 
FIS=004040a1:0040
ata1.00: cmd 60/08:00:ec:af:10/00:00:04:00:00/40 tag 0 cdb 0x0 data 4096 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/10:08:8c:e6:d8/00:00:04:00:00/40 tag 1 cdb 0x0 data 8192 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/20:10:24:1a:da/00:00:04:00:00/40 tag 2 cdb 0x0 data 16384 in
ata1.00: status: { DRDY }
ata1.00: cmd 61/01:38:15:b3:30/00:00:07:00:00/40 tag 7 cdb 0x0 data 512 out
ata1.00: status: { DRDY }
ata1.00: cmd 61/10:40:1c:b3:30/00:00:07:00:00/40 tag 8 cdb 0x0 data 8192 out
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete


Thanks !
Best regards,
Jan-Simon

Re: nfs failure causes bad page state

2007-11-23 Thread Trond Myklebust

On Fri, 2007-11-16 at 22:13 +, Russell King wrote:
> While testing a kernel based upon
> ecd744eec3aa8bbc949ec04ed3fbf7ecb2958a0e
> (with wrong boot arguments), I got the following bad page state entry
> while
> NFS was trying to mount it's rootfs:
> 
> IP-Config: Complete:
>   device=eth0, addr=192.168.1.101, mask=255.255.255.0,
> gw=255.255.255.255,
>  host=192.168.1.101, domain=, nis-domain=(none),
>  bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
> Looking up port of RPC 13/2 on 192.168.1.100
> rpcbind: server 192.168.1.100 not responding, timed out
> Root-NFS: Unable to get nfsd port number from server, using default
> Looking up port of RPC 15/1 on 192.168.1.100
> rpcbind: server 192.168.1.100 not responding, timed out
> Root-NFS: Unable to get mountd port number from server, using default
> mount: server 192.168.1.100 not responding, timed out
> Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
> VFS: Unable to mount root fs via NFS, trying floppy.
> Bad page state in process 'swapper'
> page:c02b1260 flags:0x0400 mapping: mapcount:0 count:0
> Trying to fix it up, but a reboot is needed
> Backtrace:
> [] (dump_stack+0x0/0x14) from [] (bad_page
> +0x70/0xac)
> [] (bad_page+0x0/0xac) from [] (free_hot_cold_page
> +0x80/0x178)
> [] (free_hot_cold_page+0x0/0x178) from []
> (free_hot_page+0x14/0x18)
> [] (free_hot_page+0x0/0x18) from [] (put_page
> +0xf8/0x154)
> [] (put_page+0x0/0x154) from [] (kfree+0xc8/0xd0)
> [] (kfree+0x0/0xd0) from [] (nfs_get_sb
> +0x230/0x710)
> [] (nfs_get_sb+0x0/0x710) from [] (vfs_kern_mount
> +0x58/0xac)[] (vfs_kern_mount+0x0/0xac) from []
> (do_kern_mount+0x38/0xf4)
> [] (do_kern_mount+0x0/0xf4) from [] (do_mount
> +0x1e8/0x614)
> ...
> 
> This seems to be caused by use of an uninitialised structure due to
> NULL
> options being passed to nfs_validate_mount_data().  Ensure that the
> parsed mount data is always initialised.
> 
> Signed-off-by: Russell King <[EMAIL PROTECTED]>
> 
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index fa517ae..0b1080c 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -1054,10 +1054,11 @@ static int nfs_validate_mount_data(void
> *options,
>  {
>   struct nfs_mount_data *data = (struct nfs_mount_data *)options;
>  
> + memset(args, 0, sizeof(*args));
> +
>   if (data == NULL)
>   goto out_no_data;
>  
> - memset(args, 0, sizeof(*args));
>   args->flags = (NFS_MOUNT_VER3 | NFS_MOUNT_TCP);
>   args->rsize = NFS_MAX_FILE_IO_SIZE;
>   args->wsize = NFS_MAX_FILE_IO_SIZE;

Thanks Russell,

It looks as if the same bug exists in nfs4_validate_mount_data(), so I
added the same fix.
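
For illustration only, the pattern behind both fixes can be sketched in
userspace C. The struct, field and function names below are hypothetical
stand-ins, not the real fs/nfs/super.c code; the sketch assumes the caller
frees a pointer inside the parsed data on every path, which is why the
structure has to be zeroed before the first early exit.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parsed mount data. */
struct parsed_args {
	unsigned int flags;
	char *hostname;		/* assumed to be freed by the caller on every path */
};

/*
 * Zero *args before any early return, so that the caller's cleanup
 * never sees stack garbage, even when options is NULL.
 */
static int validate_mount_data(const void *options, struct parsed_args *args)
{
	memset(args, 0, sizeof(*args));

	if (options == NULL)
		return -EINVAL;

	args->flags = 1;	/* defaults are filled in only for real data */
	return 0;
}
```

With the memset() after the NULL check instead, the error path would hand
the caller whatever happened to be on the stack.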

Cheers
  Trond

--- Begin Message ---
While testing a kernel based upon ecd744eec3aa8bbc949ec04ed3fbf7ecb2958a0e
(with wrong boot arguments), I got the following bad page state entry while
NFS was trying to mount it's rootfs:

IP-Config: Complete:
  device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
 host=192.168.1.101, domain=, nis-domain=(none),
 bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 13/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 15/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x0400 mapping: mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[] (dump_stack+0x0/0x14) from [] (bad_page+0x70/0xac)
[] (bad_page+0x0/0xac) from [] 
(free_hot_cold_page+0x80/0x178)
[] (free_hot_cold_page+0x0/0x178) from [] 
(free_hot_page+0x14/0x18)
[] (free_hot_page+0x0/0x18) from [] (put_page+0xf8/0x154)
[] (put_page+0x0/0x154) from [] (kfree+0xc8/0xd0)
[] (kfree+0x0/0xd0) from [] (nfs_get_sb+0x230/0x710)
[] (nfs_get_sb+0x0/0x710) from [] 
(vfs_kern_mount+0x58/0xac)[] (vfs_kern_mount+0x0/0xac) from 
[] (do_kern_mount+0x38/0xf4)
[] (do_kern_mount+0x0/0xf4) from [] (do_mount+0x1e8/0x614)
...

This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data().  Ensure that the
parsed mount data is always initialised.

Signed-off-by: Russell King <[EMAIL PROTECTED]>
 (Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/super.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 046d1ac..8d95d7d 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1078,10 +1078,11 @@ static int 

Re: freeze vs freezer

2007-11-23 Thread Rafael J. Wysocki
On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote:
> It seems that a process blocked in a write to an xfs filesystem due to
> xfs_freeze cannot be frozen by the freezer.

The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how
to make it handle them without at least partially defeating its purpose.

> I see this if I suspend my laptop while doing something xfs-filesystem
> intensive, like a kernel build.  My suspend scripts freeze the XFS
> filesystem (as Dave said I should), which presumably blocks some writer,
> and then the freezer times out and fails to complete.
> 
> Here's part of the process dump the freezer does when it times out:
> 
> cc1   D  0 18138  18137
>dd5f1e24 00200082 0002  ecdeeb00 ecdeec64 c200f280 
> 0001 
>009c09a0 dd5f1e0c dd5f1e0c 000f    
> dd5f1e74 
>c7beb480 dd5f1e88 dd5f1ea8 c0228d97 e8889540 dd5f1e38 c015b75d 
> dd5f1e44 
> Call Trace:
>  [] xfs_write+0xf4/0x6d9
>  [] xfs_file_aio_write+0x53/0x5b
>  [] do_sync_write+0xae/0xec
>  [] vfs_write+0xa4/0x120
>  [] sys_write+0x3b/0x60
>  [] sysenter_past_esp+0x6b/0xa1
>  ===
> 
> 
> I haven't looked at how to fix this yet.  I only just worked out why I
> was getting suspend failures.

Well, you can add freezer_do_not_count()/freezer_count() annotations to
xfs_write() (and whatever else is blocked as a result of the XFS being frozen).

Generally, however, that would be risky without freezing XFS, because it
might let filesystem data leak out to a storage device after the hibernation
image has been created, which would result in filesystem corruption after the
resume.

Still, if you only suspend to RAM, that should be safe.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: OT: Re: System reboot triggered by just reading a device file....!?

2007-11-23 Thread devzero
Hi Clemens, 

> 
> Hi, Roland!
> 
> Please don't top-post.

sorry!

>  > > [was: it would be easy to disable the kernel watchdog]
>  > thanks, but i know i could do this.
> 
> Good. I was also curious and just checked again. The watchdog subsystem
> is by default _disabled_ in the kernel configuration. If you use some
> distro's kernel, where they turned it on, complain to them!
> If you turned it on yourself, you are really on your own...
> the Kconfig help there is IMO sufficient and very clear and,
> "If unsure, say N". Hmm... sorry?!

whoops - sorry for that. i should have checked that, but i think i just didn`t 
expect some distro vendor to change that default.
sure i will complain to suse now. stopping getting on your nerves here, now.


>  > this thread is not meant to protect myself from this curiousity but it is 
> meant
>  > to protect others. it`s a trap.
> 
> I guess I understand your position. But I don't see no way to improve
> the kernel in that point.
> Complain to the guys who enabled the watchdog / setup this trap for
> any reason.

sure. you`re completely right.


>  > i stepped into that.
>  > now i know that trap, so i can easily sidestep.
>  > it maybe very seldom that someone steps into this.
>  > but it may happen and then someone will have trouble and spend time on 
> this.
>  > i think every admin can tell you about weird random reboots of his systems
>  > which he cannot explain what was the reason for it.
> 
> That's one possible way of "learning by doing suicide (tm);"

:)

>  > this maybe some of those reasons and this one could be avoided.
>  > i`m thinking of something simple like echo "now you`re armed" > 
> /dev/watchdog
> 
> Read some details about watchdogs to get more background and why the
> watchdog is triggered so easily and why it's good this way.
> i.e: http://www.ganssle.com/watchdogs.pdf

thanks for your help and for that very useful link. that`s the very best stuff 
i ever read about watchdogs!

regards
Roland



Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Dmitri Vorobiev
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
> 
> Before I do so, any comments on the following?

From the viewpoint of yours truly (and I am a teacher of operating system
classes), this is a long-expected document, which is going to be very useful
especially for newbies. My students often make alignment mistakes in their
code, and your article will definitely make my job much easier.

Thank you, Daniel, for your work.

Dmitri

> 
> Thanks,
> Daniel
> 
> 
> 
> 
> UNALIGNED MEMORY ACCESSES
> =========================
> 
> Linux runs on a wide variety of architectures which have varying behaviour
> when it comes to memory access. This document presents some details about
> unaligned accesses, why you need to write code that doesn't cause them,
> and how to write such code!
> 
> 
> What's the definition of an unaligned access?
> =============================================
> 
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x1004 is fine, but
> reading 4 bytes of data from address 0x1005 would be an unaligned memory
> access.
> 
> 
> Why unaligned access is bad
> ===========================
> 
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.
> 
> Some architectures have an exception handler implemented in the kernel which
> corrects the memory access, but this is very expensive and is not true for
> all architectures. You cannot rely on the exception handler to correct your
> memory accesses.
> 
> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.
> 
> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.
> 
> 
> Natural alignment
> =================
> 
> The rule we mentioned earlier forms what we refer to as natural alignment:
> When accessing N bytes of memory, the base memory address must be evenly
> divisible by N, i.e. addr % N == 0
> 
> When writing code, assume the target architecture has natural alignment
> requirements.
> 
> Sidenote: in reality, only a few architectures require natural alignment
> on all sizes of memory access. However, again we must consider ALL supported
> architectures; natural alignment is the only way to achieve full portability.
> 
> 
> Code that doesn't cause unaligned access
> ========================================
> 
> At first, the concepts above may seem a little hard to relate to actual
> coding practice. After all, you don't have a great deal of control over
> memory addresses of certain variables, etc.
> 
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
> 
>   struct foo {
>   u16 field1;
>   u32 field2;
>   u8 field3;
>   };
> 
> Let us assume that an instance of the above structure resides in memory
> starting at address 0x1000. With a basic level of understanding, it would
> not be unreasonable to expect that accessing field2 would cause an unaligned
> access. You'd be expecting field2 to be located at offset 2 bytes into the
> structure, i.e. address 0x1002, but that address is not evenly divisible
> by 4 (remember, we're reading a 4 byte value here).
> 
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).
> 
> Similarly, you can also rely on the compiler to align variables and function
> parameters to a naturally aligned scheme, based on the size of the type of
> the variable.
> 
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit inbetween field1 and field2 (where the padding is
> inserted) would shrink the overall structure by 1 byte:
> 
>   struct foo {
>   u16 field1;
>   u8 field3;
>   u32 field2;
>   };
> 
> Sidenote: it should be obvious by now, but in case it is not, accessing a
> single 

Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> ...
> > Above this acquire/release sequence is the following comment:
> > #ifdef CONFIG_LOCKDEP
> > /*
> >  * It is permissible to free the struct work_struct
> >  * from inside the function that is called from it,
> >  * this we need to take into account for lockdep too.
> >  * To avoid bogus "held lock freed" warnings as well
> >  * as problems when looking into work->lockdep_map,
> >  * make a copy and use that here.
> >  */
> > struct lockdep_map lockdep_map = work->lockdep_map;
> > #endif
> >
> > Did something trigger this anyway?
> >
> > Anything I could try, apart from more boots with slub_debug=F?
>
> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)
>
> (btw these patches prepare dm-crypt for next patchset introducing
> async cryptoapi, so there should be no functional changes yet.)

I looked at all of these agk-*-patches, as the error is not
bisectable, because it triggers unreliably.
The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch

This one does make a functional change, as there now is an additional ref
on io->pending. Instead of only increasing io->pending if there really
is more than one clone-bio, it will now take an additional ref in
crypt_write_io_process().

I certainly agree with the cleanup, but this introduces the following change:

Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio().
Now there is an additional call to crypt_dec_pending() to balance the
additional ref placed into crypt_write_io_process(). And that one is
not called from whatever context/thread cleans up after
generic_make_request, but directly in the context/thread of the caller
of crypt_write_io_process(), and that is kcryptd.

So now it is possible (if all requests finish before
crypt_write_io_process() returns) that kcryptd itself will release the
bio, but the workqueue infrastructure still seems to have a lock on
that.
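
To make the ordering concrete, here is a small single-threaded model of the
ref counting described above. It is only a sketch: the struct, the function
names and the "endio"/"kcryptd" labels are hypothetical and merely mimic
the sequence of get/put calls, not the real dm-crypt code.

```c
#include <stddef.h>

/* Hypothetical model of an io with a pending count. */
struct model_io {
	int pending;
	const char *released_by;	/* NULL until the last reference drops */
};

static void model_get(struct model_io *io)
{
	io->pending++;
}

/* Drop one reference; remember which context dropped the last one. */
static void model_dec_pending(struct model_io *io, const char *context)
{
	if (--io->pending == 0)
		io->released_by = context;
}

/*
 * The submitter holds its own reference while it submits nr_clones
 * clones. If complete_early is set, every clone completes before the
 * submitter drops that reference. Returns the context that released
 * the io.
 */
static const char *model_write(int nr_clones, int complete_early)
{
	struct model_io io = { 0, NULL };
	int i;

	model_get(&io);				/* the additional ref */
	for (i = 0; i < nr_clones; i++)
		model_get(&io);			/* one ref per clone */

	if (complete_early)
		for (i = 0; i < nr_clones; i++)
			model_dec_pending(&io, "endio");

	model_dec_pending(&io, "kcryptd");	/* balance the additional ref */

	if (!complete_early)
		for (i = 0; i < nr_clones; i++)
			model_dec_pending(&io, "endio");

	return io.released_by;
}
```

In this model model_write(2, 1) reports "kcryptd" as the releasing context,
while model_write(2, 0) reports "endio", which is the only case that could
happen before the cleanup.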

But as the comment in run_workqueue says, this should be legal, and I
can't figure out what would make the lockdep copy mechanism fail.
Especially if the trigger was really a WRITE request, as with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted this
should never use the kcrypt_io-workqueue and so there should be not
even the problem with using INIT_WORK twice on the same work_struct.

... or I just don't see the bug.

Torsten


Re: radeonfb i2c regression post-2.6.18.

2007-11-23 Thread Jean Delvare
On Fri, 23 Nov 2007 17:00:52 +0100, Michael Buesch wrote:
> This patch fixes my crash problem.

Out of curiosity, what kind of crash was it? I admit that I can't see
how the code could crash.

-- 
Jean Delvare


Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver

2007-11-23 Thread Robin Getz
On Fri 23 Nov 2007 16:52, Arjan van de Ven pondered:
> On Fri, 23 Nov 2007 22:25:29 +0800
> "Bryan Wu" <[EMAIL PROTECTED]> wrote:
> 
> > On Nov 23, 2007 6:19 PM, David Woodhouse <[EMAIL PROTECTED]> wrote:
> > >
> > > On Fri, 2007-11-23 at 18:14 +0800, Bryan Wu wrote:
> > > >
> > > > +#ifdef CONFIG_BF54x
> > > > /* Setup DMAC1 channel mux for NFC which shared with SDH
> > > > */ val = bfin_read_DMAC1_PERIMUX();
> > > > val &= 0xFFFE;
> > > > bfin_write_DMAC1_PERIMUX(val);
> > > > SSYNC();
> > > > -
> > > > +#endif
> > >
> > > You can't build a multiplatform kernel which runs on BF52x and
> > > BF54x?
> > 
> > There are some hardware difference between BF52x and BF54x. We have
> > to do this.
> > 
> 
> well does it need to be an #ifdef, or can it be a runtime if() ?

It could be a runtime if() but we don't currently have the is_mach() all set 
up properly today.

This is because on most systems that Blackfin ships on, memory is the
dominant cost of the system, and end users don't want to take either the
storage (flash) hit of having code they don't use, or the run time (DRAM)
overhead. They are fine with compiling 2 kernels for two platforms if it
means things are cheaper. :)

That being said, we still need to go back and add things properly, and just
let gcc optimise things away if they are not used. C code is more maintainable
than all the ifdefs we have today.

This is the goal - it will just take a little bit to get there.
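
A rough sketch of what that could look like, in plain C so it can be tried
outside the kernel. machine_is_bf54x() is a hypothetical name standing in
for the is_mach() helpers that do not exist yet, and fake_dmac1_perimux is
a stand-in variable for the real register, for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical compile-time predicate. When it expands to a constant 0
 * the compiler can drop the guarded block entirely, so the generated
 * code can match the #ifdef version while every build still
 * type-checks both branches.
 */
#ifdef CONFIG_BF54x
#define machine_is_bf54x()	1
#else
#define machine_is_bf54x()	0
#endif

/* Stand-in for the DMAC1_PERIMUX register, for illustration only. */
static uint16_t fake_dmac1_perimux = 0xFFFF;

static void nfc_dma_mux_setup(void)
{
	if (machine_is_bf54x()) {
		/* Route the shared DMAC1 channel to the NFC (BF54x only). */
		fake_dmac1_perimux &= 0xFFFE;
	}
}
```

Turning the predicate into a real runtime check later would then only mean
changing the macro, not every call site.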

-Robin


Re: [RFC] Documentation about unaligned memory access

2007-11-23 Thread Vadim Lobanov
On Thursday 22 November 2007 04:15:53 pm Daniel Drake wrote:
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
>   struct foo {
>   u16 field1;
>   u32 field2;
>   u8 field3;
>   };
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).

It would also insert 3 bytes of padding after field3, in order to satisfy 
alignment constraints for arrays of these structures.

> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For
> example, moving field3 to sit inbetween field1 and field2 (where the
> padding is inserted) would shrink the overall structure by 1 byte:
>
>   struct foo {
>   u16 field1;
>   u8 field3;
>   u32 field2;
>   };

It will actually shrink it by 4 bytes, for the very same reason.
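
This is easy to check with sizeof and offsetof. The sketch below uses the
<stdint.h> types in place of the kernel's u16/u32/u8 and assumes an ABI
where a 32-bit integer needs 4-byte alignment (true on i386 and x86_64,
not guaranteed everywhere); foo_reordered and is_naturally_aligned() are
names made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the draft: 2 bytes of padding after field1, 3 after field3. */
struct foo {
	uint16_t field1;
	uint32_t field2;
	uint8_t field3;
};

/* Reordered: field3 uses part of the gap in front of field2. */
struct foo_reordered {
	uint16_t field1;
	uint8_t field3;
	uint32_t field2;
};

/* Non-zero if addr is naturally aligned for an access of n bytes (n > 0). */
static int is_naturally_aligned(uintptr_t addr, size_t n)
{
	return addr % n == 0;
}
```

Under that assumption sizeof(struct foo) is 12 and
sizeof(struct foo_reordered) is 8, with field2 at offset 4 in both.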

-- Vadim Lobanov




Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver

2007-11-23 Thread Arjan van de Ven
On Fri, 23 Nov 2007 22:25:29 +0800
"Bryan Wu" <[EMAIL PROTECTED]> wrote:

> On Nov 23, 2007 6:19 PM, David Woodhouse <[EMAIL PROTECTED]> wrote:
> >
> > On Fri, 2007-11-23 at 18:14 +0800, Bryan Wu wrote:
> > >
> > > +#ifdef CONFIG_BF54x
> > > /* Setup DMAC1 channel mux for NFC which shared with SDH
> > > */ val = bfin_read_DMAC1_PERIMUX();
> > > val &= 0xFFFE;
> > > bfin_write_DMAC1_PERIMUX(val);
> > > SSYNC();
> > > -
> > > +#endif
> >
> > You can't build a multiplatform kernel which runs on BF52x and
> > BF54x?
> 
> There are some hardware difference between BF52x and BF54x. We have
> to do this.
> 

well does it need to be an #ifdef, or can it be a runtime if() ?

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: Where is the interrupt going?

2007-11-23 Thread Benjamin Herrenschmidt

On Wed, 2007-11-21 at 17:08 -0800, Al Niessner wrote:
> 
> p8620 = pci_get_device (APC8620_VENDOR_ID, APC8620_DEVICE_ID, p8620);
> <... fail if p8620 is 0 ...>
> apcsi[i].ret_val = register_chrdev (MAJOR_NUM,
> 
> DEVICE_NAME,
> 
> _ops);
> <... fail if ret_val < 0 ...>
> apcsi[i].board_irq = p8620->irq;
> status = request_irq (apcsi[i].board_irq,
>   apc8620_handler,
>   IRQF_DISABLED,
>   DEVICE_NAME,
>   (void*)[i]);

First, that's obviously not the proper way to do a PCI driver but I
suppose you know that :-)

Then, make sure you call pci_enable_device() at one point, don't some
platforms perform the actual IRQ routing that late ? (And don't sample
pdev->irq before the pci_enable_device(), sample it afterward).

Cheers,
Ben.




[patch 4/4] Timerfd v2 - un-break CONFIG_TIMERFD

2007-11-23 Thread Davide Libenzi
Remove the broken status from CONFIG_TIMERFD.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 init/Kconfig |1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6.mod/init/Kconfig
===
--- linux-2.6.mod.orig/init/Kconfig 2007-11-23 13:13:16.0 -0800
+++ linux-2.6.mod/init/Kconfig  2007-11-23 13:36:42.0 -0800
@@ -566,7 +566,6 @@
 config TIMERFD
bool "Enable timerfd() system call" if EMBEDDED
select ANON_INODES
-   depends on BROKEN
default y
help
  Enable the timerfd() system call that allows to receive timer



[patch 3/4] Timerfd v2 - wire the new timerfd API to the x86 family

2007-11-23 Thread Davide Libenzi
Wires up the new timerfd API to the x86 family.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 arch/x86/ia32/ia32entry.S  |4 +++-
 arch/x86/kernel/syscall_table_32.S |4 +++-
 include/asm-x86/unistd_32.h|6 --
 include/asm-x86/unistd_64.h|9 +++--
 4 files changed, 17 insertions(+), 6 deletions(-)

Index: linux-2.6.mod/include/asm-x86/unistd_32.h
===
--- linux-2.6.mod.orig/include/asm-x86/unistd_32.h  2007-11-23 
13:13:18.0 -0800
+++ linux-2.6.mod/include/asm-x86/unistd_32.h   2007-11-23 13:36:40.0 
-0800
@@ -327,13 +327,15 @@
 #define __NR_epoll_pwait   319
 #define __NR_utimensat 320
 #define __NR_signalfd  321
-#define __NR_timerfd   322
+#define __NR_timerfd_create322
 #define __NR_eventfd   323
 #define __NR_fallocate 324
+#define __NR_timerfd_settime   325
+#define __NR_timerfd_gettime   326
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 325
+#define NR_syscalls 327
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.mod/include/asm-x86/unistd_64.h
===
--- linux-2.6.mod.orig/include/asm-x86/unistd_64.h  2007-11-23 
13:13:18.0 -0800
+++ linux-2.6.mod/include/asm-x86/unistd_64.h   2007-11-23 13:36:40.0 
-0800
@@ -629,12 +629,17 @@
 __SYSCALL(__NR_epoll_pwait, sys_epoll_pwait)
 #define __NR_signalfd  282
 __SYSCALL(__NR_signalfd, sys_signalfd)
-#define __NR_timerfd   283
-__SYSCALL(__NR_timerfd, sys_timerfd)
+#define __NR_timerfd_create283
+__SYSCALL(__NR_timerfd_create, sys_timerfd_create)
 #define __NR_eventfd   284
 __SYSCALL(__NR_eventfd, sys_eventfd)
 #define __NR_fallocate 285
 __SYSCALL(__NR_fallocate, sys_fallocate)
+#define __NR_timerfd_settime   286
+__SYSCALL(__NR_timerfd_settime, sys_timerfd_settime)
+#define __NR_timerfd_gettime   287
+__SYSCALL(__NR_timerfd_gettime, sys_timerfd_gettime)
+
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.mod/arch/x86/kernel/syscall_table_32.S
===
--- linux-2.6.mod.orig/arch/x86/kernel/syscall_table_32.S   2007-11-23 
13:13:18.0 -0800
+++ linux-2.6.mod/arch/x86/kernel/syscall_table_32.S2007-11-23 
13:36:40.0 -0800
@@ -321,6 +321,8 @@
.long sys_epoll_pwait
.long sys_utimensat /* 320 */
.long sys_signalfd
-   .long sys_timerfd
+   .long sys_timerfd_create
.long sys_eventfd
.long sys_fallocate
+   .long sys_timerfd_settime   /* 325 */
+   .long sys_timerfd_gettime
Index: linux-2.6.mod/arch/x86/ia32/ia32entry.S
===
--- linux-2.6.mod.orig/arch/x86/ia32/ia32entry.S2007-11-23 
13:13:18.0 -0800
+++ linux-2.6.mod/arch/x86/ia32/ia32entry.S 2007-11-23 13:36:40.0 
-0800
@@ -723,7 +723,9 @@
.quad sys_epoll_pwait
.quad compat_sys_utimensat  /* 320 */
.quad compat_sys_signalfd
-   .quad compat_sys_timerfd
+   .quad sys_timerfd_create
.quad sys_eventfd
.quad sys32_fallocate
+   .quad compat_sys_timerfd_settime/* 325 */
+   .quad compat_sys_timerfd_gettime
 ia32_syscall_end:



[patch 1/4] Timerfd v2 - introduce a new hrtimer_forward_now() function

2007-11-23 Thread Davide Libenzi
I think that advancing the timer against the timer's current "now" can
be a pretty common usage, so, w/out exposing hrtimer's internals, we add
a new hrtimer_forward_now() function.
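
For readers unfamiliar with the forwarding arithmetic, here is a userspace
model of it. This is a sketch under the assumption that forwarding moves
the expiry ahead in whole intervals and returns the overrun count;
model_timer and model_forward() are hypothetical names, times are plain
nanosecond integers, and the clock resolution clamp is left out.

```c
#include <stdint.h>

/* Hypothetical userspace model of a timer: times are plain nanoseconds. */
struct model_timer {
	int64_t expires;
};

/*
 * Push timer->expires forward in steps of interval (which must be > 0)
 * until it lies after now, and return how many intervals were skipped
 * (the overrun count). Returns 0 without touching the timer if it has
 * not expired yet.
 */
static uint64_t model_forward(struct model_timer *timer, int64_t now,
			      int64_t interval)
{
	uint64_t orun = 1;
	int64_t delta = now - timer->expires;

	if (delta < 0)
		return 0;

	if (delta >= interval) {
		/* Skip all whole intervals that have already elapsed. */
		orun = (uint64_t)(delta / interval);
		timer->expires += interval * (int64_t)orun;
		if (timer->expires > now)
			return orun;
		orun++;
	}
	timer->expires += interval;
	return orun;
}
```

The "_now" variant is then just this call with the timer's own clock
reading passed as now.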



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 include/linux/hrtimer.h |7 +++
 1 file changed, 7 insertions(+)

Index: linux-2.6.mod/include/linux/hrtimer.h
===
--- linux-2.6.mod.orig/include/linux/hrtimer.h  2007-11-23 13:13:21.0 
-0800
+++ linux-2.6.mod/include/linux/hrtimer.h   2007-11-23 13:36:36.0 
-0800
@@ -298,6 +298,13 @@
 extern unsigned long
 hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval);
 
+/* Forward a hrtimer so it expires after the hrtimer's current now */
+static inline unsigned long hrtimer_forward_now(struct hrtimer *timer,
+   ktime_t interval)
+{
+   return hrtimer_forward(timer, timer->base->get_time(), interval);
+}
+
 /* Precise sleep: */
 extern long hrtimer_nanosleep(struct timespec *rqtp,
  struct timespec *rmtp,



[patch 2/4] Timerfd v2 - new timerfd API

2007-11-23 Thread Davide Libenzi
This is the new timerfd API as it is implemented by the following patch:

int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);

The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API applies new settings to the timerfd fd, optionally
retrieving the previous expiration time (if the "otmr" parameter is not NULL).
The time value specified in "utmr" is absolute if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are supported
(with the same interface).
Here's a simple test program I used to exercise the new timerfd APIs:

http://www.xmailserver.org/timerfd-test2.c



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 fs/compat.c  |   32 ++-
 fs/timerfd.c |  197 ++-
 include/linux/compat.h   |7 +
 include/linux/syscalls.h |7 +
 4 files changed, 164 insertions(+), 79 deletions(-)

Index: linux-2.6.mod/fs/timerfd.c
===
--- linux-2.6.mod.orig/fs/timerfd.c 2007-11-23 13:13:19.0 -0800
+++ linux-2.6.mod/fs/timerfd.c  2007-11-23 13:36:39.0 -0800
@@ -25,13 +25,15 @@
struct hrtimer tmr;
ktime_t tintv;
wait_queue_head_t wqh;
+   u64 ticks;
int expired;
+   int clockid;
 };
 
 /*
  * This gets called when the timer event triggers. We set the "expired"
  * flag, but we do not re-arm the timer (in case it's necessary,
- * tintv.tv64 != 0) until the timer is read.
+ * tintv.tv64 != 0) until the timer is accessed.
  */
 static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
 {
@@ -40,13 +42,14 @@
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
ctx->expired = 1;
+   ctx->ticks++;
wake_up_locked(&ctx->wqh);
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
return HRTIMER_NORESTART;
 }
 
-static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags,
+static void timerfd_setup(struct timerfd_ctx *ctx, int flags,
  const struct itimerspec *ktmr)
 {
enum hrtimer_mode htmode;
@@ -57,8 +60,9 @@
 
texp = timespec_to_ktime(ktmr->it_value);
ctx->expired = 0;
+   ctx->ticks = 0;
ctx->tintv = timespec_to_ktime(ktmr->it_interval);
-   hrtimer_init(&ctx->tmr, clockid, htmode);
+   hrtimer_init(&ctx->tmr, ctx->clockid, htmode);
ctx->tmr.expires = texp;
ctx->tmr.function = timerfd_tmrproc;
if (texp.tv64 != 0)
@@ -83,7 +87,7 @@
poll_wait(file, &ctx->wqh, wait);
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
-   if (ctx->expired)
+   if (ctx->ticks)
events |= POLLIN;
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
@@ -102,11 +106,11 @@
return -EINVAL;
spin_lock_irq(&ctx->wqh.lock);
res = -EAGAIN;
-   if (!ctx->expired && !(file->f_flags & O_NONBLOCK)) {
+   if (!ctx->ticks && !(file->f_flags & O_NONBLOCK)) {
__add_wait_queue(&ctx->wqh, &wait);
for (res = 0;;) {
set_current_state(TASK_INTERRUPTIBLE);
-   if (ctx->expired) {
+   if (ctx->ticks) {
res = 0;
break;
}
@@ -121,22 +125,21 @@
__remove_wait_queue(&ctx->wqh, &wait);
__set_current_state(TASK_RUNNING);
}
-   if (ctx->expired) {
-   ctx->expired = 0;
-   if (ctx->tintv.tv64 != 0) {
+   if (ctx->ticks) {
+   ticks = ctx->ticks;
+   if (ctx->expired && ctx->tintv.tv64) {
/*
 * If tintv.tv64 != 0, this is a periodic timer that
 * needs to be re-armed. We avoid doing it in the timer
 * callback to avoid DoS attacks specifying a very
 * short timer period.
 */
-   ticks = (u64)
-   hrtimer_forward(&ctx->tmr,
-   hrtimer_cb_get_time(&ctx->tmr),
-   ctx->tintv);
+   ticks += (u64) hrtimer_forward_now(&ctx->tmr,
+  ctx->tintv) - 1;
hrtimer_restart(&ctx->tmr);
-   } else
-   ticks = 1;
+   }
+   ctx->expired = 0;

Re: unionfs: several more problems

2007-11-23 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Hugh Dickins writes:
[...]
> > I deceived myself for a while that the danger of shmem_writepage
> > hitting its BUG_ON(entry->val) was dealt with too; but that's wrong,
> > I must go back to working out an escape from that one (despite never
> > seeing it).
> 
> Once I tried a more appropriate test (fsx while swapping) I hit that
> easily.  After some thought and testing, I'm happy with the mm/shmem.c
> +mm/swap_state.c fixes I've arrived at for that; but since it's not
> easy to reproduce in normal usage, and hasn't been holding you up,
> I'd prefer for the moment to hold on to that patch.  I need to make
> changes around the same pagecache<->swapcache area to solve some mem
> cgroup issues: there might turn out to be some interaction, so I'd
> rather finalize both patches in the same series if I can.
[...]

If you want, send me those patches and I'll run them w/ my tests, even if
they're not finalized; my testing can give you another useful point of
reference.

> But perhaps before fixing up the several LTP tests, you'll want
> to concentrate on a more directed test.  Please try this sequence:
> 
>   # Running with mem=512M, probably irrelevant
> swapoff -a# Merely to rule out one potential confusion
> mkfs -t ext2 /dev/sdb1
> mount -t ext2 /dev/sdb1 /mnt
> df /mnt   # I have 2280 Used out of 1517920 KB
> cp -a 2.6.24-rc3 /mnt # Copy a kernel source tree into ext2
> rm -rf /mnt/2.6.24-rc3# Delete the copy
> df /mnt   # Again 2280 Used, just as you'd expect
> mount -t unionfs -o dirs=/mnt unionfs /tmp
> cp -a 2.6.24-rc3 /tmp # Copy a kernel source tree into unionfs
> rm -rf /tmp/2.6.24-rc3# Generates 176 unionfs: filldir error messages
> df /mnt   # Now 68380 Used (df /tmp shows the same)
> ls -a /mnt# Shows .  ..  .wh.2.6.24-rc3  lost+found
> echo 1 >/proc/sys/vm/drop_caches  # to free pagecache
> df /mnt   # Still 68380 Used (df /tmp shows the same)
> echo 2 >/proc/sys/vm/drop_caches  # to free dentries and inodes
> df /mnt   # Now 2280 Used as it should be (df /tmp same)
> ls -a /mnt# But still shows that .wh.2.6.24-rc3
> umount /tmp   # Restore
> umount /mnt   # Restore
> swapon -a # Restore
> 
> Three different problems there:
> 
> 1. Whiteouts seem to get left behind (at this top level anyway):
> I'm getting an increasing number of .wh.run-crons.? files there.
> I'm not familiar with the correct behaviour for whiteouts (and it's
> not clear to me why whiteouts are needed at all in this degenerate
> case of a single directory in the union, but never mind).

I could spend a lot of time explaining the history of whiteouts in unioning
file systems, and all the different techniques and algorithms we've tried
ourselves over the years.  But suffice it to say that I'd be very happy the day
every Linux f/s has native whiteout support. :-)

Our current policy for when/where to create whiteouts has evolved after much
experience with users.  The most common use case for unionfs is one or more
read-only branches, plus a high-priority writable branch (for copyup).
Therefore, in the most common case we cannot remove the objects from the
readonly branches, and have to create a whiteout instead.

Using a single branch with unionfs is very uncommon among unionfs users, but
it serves nicely as a useful "null layer" test (a la BSD's Nullfs or my
fistgen's wrapfs).

Anyway, upon further thinking about this issue I realized that whiteouts in
the single-branch situation are just a generalization of a possibly more
common case -- when the object being unlink'ed (or rmdir'ed) is on the
rightmost, lowest priority branch in which it is known to exist.  In that
case, there's no need to create a whiteout there, b/c there's no chance that
a readonly file by the same name could exist below that branch.  The same is
true if you try to rmdir a directory anywhere in one of the union's
branches: if we know (thanks to ->lookup) that there is no dir by the same
name anywhere else, then we can safely skip creating a whiteout if the
least-priority dir is being rmdir'ed.

I've got a small patch that does just that.
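
The rule described above can be sketched in plain userspace C.  This is only
an illustration under stated assumptions: struct union_obj, needs_whiteout()
and the branch numbering (0 = leftmost, highest priority) are hypothetical
names chosen here, not the actual unionfs code or the patch mentioned.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a union lookup result.  Branch indices run from
 * 0 (leftmost, highest priority) upward to the rightmost, lowest
 * priority branch.
 */
struct union_obj {
	int lowest_found;	/* rightmost branch index where the name exists */
};

/*
 * Decide whether removing the object from branch 'victim' must leave a
 * whiteout behind.  A whiteout is only needed when a same-named object
 * still exists in some lower-priority branch (index > victim); otherwise
 * nothing could show through after the removal.
 */
static bool needs_whiteout(const struct union_obj *obj, int victim)
{
	return obj->lowest_found > victim;
}
```

In the single-branch case the victim is always the lowest branch where the
name exists, so this check never asks for a whiteout.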

> 2. Why does copying then deleting a tree leave blocks allocated,
> which remain allocated indefinitely, until memory pressure or
> drop_caches removes them?  Hmm, I should have done "df -i" instead,
> that would be more revealing.  This may well be the same as the LTP
> mkdir problem - inodes remaining half-allocated after they're unlinked.

Turns out we weren't releasing the ref's on the lower directory being
rmdir'ed as early as we could.  We'd have done it in delete/clear_inode,
upon memory pressure, or unmount -- so those resources wouldn't have stuck
around forever.

I now have a small patch that releases those resources on rmdir and the
space (df and df -i) is reclaimed right 

Re: Where is the new timerfd?

2007-11-23 Thread Davide Libenzi
On Fri, 23 Nov 2007, Ulrich Drepper wrote:

> On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> > Yes, it's disabled, and yes, I'll repost today ...
> 
> I haven't seen the patch and don't feel like searching.  So I say it
> here: please make sure you add a flags parameter to the system call
> itself (instead of adding it on as for eventfd and signalfd).  We need
> to be able to use O_CLOEXEC some way or another.

I'm more than OK with adding a flags parameter. If it were up to me, I'd add 
it even to eventfd and signalfd. I asked Linus if he was OK with adding 
the flags parameter to all of them. He didn't reply, and I read that as "no".
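
For context, here is a minimal sketch of what userspace has to do when the
creating system call takes no flags: set FD_CLOEXEC afterwards with fcntl().
The helper name set_cloexec() is made up for this illustration.  Between the
call that creates the descriptor and this helper there is a window in which
another thread can fork and exec with the descriptor still inheritable, which
is the gap a flags parameter on the system call itself would close.

```c
#include <fcntl.h>

/*
 * Mark an already-open descriptor close-on-exec.  Returns 0 on success
 * and -1 on error (errno is set by fcntl).  This is the two-step,
 * non-atomic approach: the descriptor exists without FD_CLOEXEC until
 * this function has run.
 */
static int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```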



- Davide




Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Mark Lord

Mark wrote:

Yeah, I kind of had your reports in mind when I asked that.  :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.


Morrison, Tom wrote:

Yes, I believe that - otherwise, this problem would have
been a crisis a LONG time ago...:-)
 
But I do have some more questions in relationship to how 
things are mapped in your environment. I have a flat memory 
map (i.e.: the full 0x0 -- 0x1__ is passed to the 32bit 
Linux kernel without any 'holes' and/or reserved areas).
 
Does your Intel memory map have this same type of flat memory 
model (and thus allow use of the FULL lower 4Gig) - or does it 
reserve areas of lower 4Gig for devices and such - if not - where 
are these reserved areas - and how do they relate to the I/O memory
map for the device?
 
In other words, I would be very interested in seeing the memory 
map & the PCI memory mapping to see if any overlap/correspond 
to reserve areas of lower 4 Gig (in a linux 32bit mode)...

...

I believe that only 2GB or so of the 4GB RAM appears below the 4GB boundary.
The rest is accessed above 4GB, using Intel's 36-bit PAE functionality.

I think what you want to see is /proc/mtrr, annotated below by me:

reg00: base=0x08000 (2048MB), size=2048MB: uncachable, count=1  I/O space
reg01: base=0x0 (   0MB), size=4096MB: write-back, count=1  first 2GB of RAM + I/O space
reg02: base=0x1 (4096MB), size=1024MB: write-back, count=1  third GB of RAM
reg03: base=0x14000 (5120MB), size= 512MB: write-back, count=1  portion of 4th GB of RAM
reg04: base=0x16000 (5632MB), size= 256MB: write-back, count=1  portion of 4th GB of RAM
reg05: base=0x17000 (5888MB), size= 128MB: write-back, count=1  portion of 4th GB of RAM
reg06: base=0x17800 (6016MB), size=  64MB: write-back, count=1  portion of 4th GB of RAM
reg07: base=0x0af80 (2808MB), size=   8MB: uncachable, count=1  (?) dunno


From that, the visible RAM should be 2048 + 1024 + 512 + 256 + 128 + 64 = 4032MB.

In /proc/meminfo, it reports MemTotal of 4067260kB, which divided by 1024 gives 3971MB.

The BIOS reports 4024MB.

But the MTRR values above do make it rather clear that nearly half the RAM
requires 33-bit physical addressing for access.
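
As a quick check of the arithmetic, here is a small C sketch that adds up the
RAM-backed ranges from the /proc/mtrr listing.  The array contents are taken
from the sizes shown there (reg01 minus the uncachable reg00 window, then
reg02 through reg06); the names ram_mb and sum_ram_mb() are made up for this
illustration.

```c
#include <stddef.h>

/*
 * Sizes in MB of the RAM-backed ranges in the /proc/mtrr listing:
 * reg01 minus the uncachable reg00 window, then reg02 through reg06.
 */
static const unsigned int ram_mb[] = { 4096 - 2048, 1024, 512, 256, 128, 64 };

/* Sum the entries of ram_mb from index 'first' to the end of the array. */
static unsigned int sum_ram_mb(size_t first)
{
	unsigned int total = 0;
	size_t i;

	for (i = first; i < sizeof(ram_mb) / sizeof(ram_mb[0]); i++)
		total += ram_mb[i];
	return total;
}
```

sum_ram_mb(0) gives 4032 (all RAM-backed ranges) and sum_ram_mb(1) gives 1984
(the ranges based at or above the 4096MB mark), so a bit under half of the RAM
sits above the 4GB boundary.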

Cheers


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall
On Fri, Nov 23, 2007 at 10:54:11PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
> wrote:
> > Here's another thought: move all this logic into the networking core,
> > unify it with current softirq zapper, then allow it to be called from
> > various other places (like atomic allocators). Then it'll all be in
> > central maintained place with more users.
> 
> This can be done quite easily - put a check into __kfree_skb() if
> netpoll is compiled-in and we are in hardirq context, then put skb
> into softirq freeing queue. Then zap_completion_queue() can free
> anything without ever knowing about nature of the packet, since this
> will be checked in __kfree_skb() anyway.

What I had in mind was moving the whole zap_completion_queue concept
into net/core/skbuff. So that netpoll (and, say, atomic kmalloc) can
simply call something like "clean_completion_queue".

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 9/9] Clean up open coded inode dirty checks

2007-11-23 Thread Jan Engelhardt

On Nov 23 2007 11:47, Joe Perches wrote:
>On Fri, 2007-11-23 at 19:16 +0100, Jan Engelhardt wrote:
>> static inline bool xfs_inode_clean(const struct xfs_inode *ip)
>> {
>>  if (ip->i_itemp == NULL)
>>  return true;
>>  if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) &&
>>  ip->i_update_core == NULL)
>>  return true;
>>  return false;
>> }
>
>Your code changed the test.

See - the previous cryptic constructs could not even be decoded ;-)

>xfs_inode.i_update_core is an unsigned char.
>
>I believe reordering the tests to avoid a possibly
>unnecessary dereference is better.
>
>   if (ip->i_update_core)
>   return false;
>   if (!ip->i_itemp)
>   return true;
>   return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL;

Yeah, something like that.

Note: the function SHOULD return bool for this, to quash the
ilf_fields & XFS_ILOG_ALL into 0/1.
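
The effect of the return type can be shown in userspace C.  A minimal sketch,
assuming a made-up mask DEMO_ILOG_ALL in place of XFS_ILOG_ALL and two
hypothetical helpers: with an int return type the masked bits come back
unchanged, while a bool return type converts any nonzero value to 1.

```c
#include <stdbool.h>

/* Hypothetical mask standing in for XFS_ILOG_ALL. */
#define DEMO_ILOG_ALL 0x0f00u

/* Returns the raw masked bits, e.g. 0x0300 for fields == 0x0300. */
static int logged_as_int(unsigned int fields)
{
	return (int)(fields & DEMO_ILOG_ALL);
}

/* Conversion to bool collapses any nonzero result to 1. */
static bool logged_as_bool(unsigned int fields)
{
	return fields & DEMO_ILOG_ALL;
}
```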


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 10:54:10PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
> wrote:
> > Here's another thought: move all this logic into the networking core,
> > unify it with current softirq zapper, then allow it to be called from
> > various other places (like atomic allocators). Then it'll all be in
> > central maintained place with more users.
> 
> This can be done quite easily - put a check into __kfree_skb() if
> netpoll is compiled-in and we are in hardirq context, then put skb
> into softirq freeing queue. Then zap_completion_queue() can free
> anything without ever knowing about nature of the packet, since this
> will be checked in __kfree_skb() anyway.

And let's add some mess...
But it should fix the case where netpoll code runs in interrupt context
and is about to free an skb that must not be freed there.

Frankly speaking, this looks like crap.

Crap-added-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..88f8ea9 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,10 +196,7 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist->next;
-   if (skb->destructor)
-   dev_kfree_skb_any(skb); /* put this one back */
-   else
-   __kfree_skb(skb);
+   __kfree_skb(skb);
}
}
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 27cfe5f..8642097 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -318,6 +318,26 @@ void kfree_skbmem(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+#if defined(CONFIG_NETPOLL) || defined(CONFIG_NETPOLL_TRAP)
+   if (in_irq() || irqs_disabled()) {
+   if (skb->destructor) {
+   dev_kfree_skb_irq(skb);
+   return;
+   }
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+   if (skb->nfct || skb->nfct_reasm) {
+   dev_kfree_skb_irq(skb);
+   return;
+   }
+#endif
+#ifdef CONFIG_XFRM
+   if (skb->sp) {
+   dev_kfree_skb_irq(skb);
+   return;
+   }
+#endif
+   }
+#endif
dst_release(skb->dst);
 #ifdef CONFIG_XFRM
secpath_put(skb->sp);

-- 
Evgeniy Polyakov


[PATCH -mm 1/2] wait_task_stopped: remove unneeded delay_group_leader check

2007-11-23 Thread Oleg Nesterov
wait_task_stopped() doesn't need the "delayed_group_leader" parameter. If the
child is not traced it must be a group leader. With or without subthreads,
->group_stop_count == 0 when the whole task is stopped.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- PT/kernel/exit.c~5_ck_group_stop 2007-11-22 19:08:43.0 +0300
+++ PT/kernel/exit.c 2007-11-23 20:31:21.0 +0300
@@ -1348,7 +1348,7 @@ static int wait_task_zombie(struct task_
  * the lock and this task is uninteresting.  If we return nonzero, we have
  * released the lock and the system call should return.
  */
-static int wait_task_stopped(struct task_struct *p, int delayed_group_leader,
+static int wait_task_stopped(struct task_struct *p,
 int noreap, struct siginfo __user *infop,
 int __user *stat_addr, struct rusage __user *ru)
 {
@@ -1362,8 +1362,7 @@ static int wait_task_stopped(struct task
if (unlikely(!is_task_stopped_or_traced(p)))
goto unlock_sig;
 
-   if (delayed_group_leader && !(p->ptrace & PT_PTRACED) &&
-   p->signal->group_stop_count > 0)
+   if (!(p->ptrace & PT_PTRACED) && p->signal->group_stop_count > 0)
/*
 * A group stop is in progress and this is the group leader.
 * We won't report until all threads have stopped.
@@ -1519,7 +1518,7 @@ repeat:
!(options & WUNTRACED))
continue;
 
-   retval = wait_task_stopped(p, ret == 2,
+   retval = wait_task_stopped(p,
(options & WNOWAIT), infop,
stat_addr, ru);
} else if (p->exit_state == EXIT_ZOMBIE) {



[PATCH -mm 2/2] do_wait: cleanup delay_group_leader() usage

2007-11-23 Thread Oleg Nesterov
eligible_child() == 2 means delay_group_leader(). With the previous patch this
only matters for EXIT_ZOMBIE tasks, so we can move that special check to the
only place it is really needed.

Also, with this patch we don't skip security_task_wait() for the group leaders
in a non-empty thread group. I don't really understand the exact semantics of
security_task_wait(), but imho this change is a bugfix.

Also rearrange the code a bit to kill an ugly "check_continued" backdoor.

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- PT/kernel/exit.c~6_delay_leader 2007-11-23 20:31:21.0 +0300
+++ PT/kernel/exit.c 2007-11-23 21:29:44.0 +0300
@@ -1137,12 +1137,6 @@ static int eligible_child(pid_t pid, int
if (((p->exit_signal != SIGCHLD) ^ ((options & __WCLONE) != 0))
&& !(options & __WALL))
return 0;
-   /*
-* Do not consider thread group leaders that are
-* in a non-empty thread group:
-*/
-   if (delay_group_leader(p))
-   return 2;
 
err = security_task_wait(p);
if (err)
@@ -1494,10 +1488,9 @@ repeat:
tsk = current;
do {
struct task_struct *p;
-   int ret;
 
list_for_each_entry(p, &tsk->children, sibling) {
-   ret = eligible_child(pid, options, p);
+   int ret = eligible_child(pid, options, p);
if (!ret)
continue;
 
@@ -1521,19 +1514,17 @@ repeat:
retval = wait_task_stopped(p,
(options & WNOWAIT), infop,
stat_addr, ru);
-   } else if (p->exit_state == EXIT_ZOMBIE) {
+   } else if (p->exit_state == EXIT_ZOMBIE &&
+   !delay_group_leader(p)) {
/*
-* Eligible but we cannot release it yet:
+* We don't reap group leaders with subthreads.
 */
-   if (ret == 2)
-   goto check_continued;
if (!likely(options & WEXITED))
continue;
retval = wait_task_zombie(p,
(options & WNOWAIT), infop,
stat_addr, ru);
} else if (p->exit_state != EXIT_DEAD) {
-check_continued:
/*
 * It's running now, so it might later
 * exit, stop, or stop and then continue.



Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
wrote:
> Here's another thought: move all this logic into the networking core,
> unify it with current softirq zapper, then allow it to be called from
> various other places (like atomic allocators). Then it'll all be in
> central maintained place with more users.

This can be done quite easily - put a check into __kfree_skb() if
netpoll is compiled-in and we are in hardirq context, then put skb
into softirq freeing queue. Then zap_completion_queue() can free
anything without ever knowing about nature of the packet, since this
will be checked in __kfree_skb() anyway.

Kind of this:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..88f8ea9 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,10 +196,7 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist->next;
-   if (skb->destructor)
-   dev_kfree_skb_any(skb); /* put this one back */
-   else
-   __kfree_skb(skb);
+   __kfree_skb(skb);
}
}
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 27cfe5f..f720685 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -318,6 +318,12 @@ void kfree_skbmem(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+#if defined(CONFIG_NETPOLL) || defined(CONFIG_NETPOLL_TRAP)
+   if (in_irq() || irqs_disabled()) {
+   dev_kfree_skb_irq(skb);
+   return;
+   }
+#endif
dst_release(skb->dst);
 #ifdef CONFIG_XFRM
secpath_put(skb->sp);

-- 
Evgeniy Polyakov


Re: Inotify fails to send IN_ATTRIB events

2007-11-23 Thread Morten Welinder
This looks bad, though:

include/linux/fsnotify.h:121: warning: passing argument 2 of
'audit_inode_child' from incompatible pointer type

Missing "->d_inode"?

M.


Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.

2007-11-23 Thread Andi Kleen
On Fri, Nov 23, 2007 at 02:35:05PM +1100, Rusty Russell wrote:
> On Friday 23 November 2007 12:36:22 Andi Kleen wrote:
> > On Friday 23 November 2007 01:25, Rusty Russell wrote:
> > > That's my point.  If there's a whole class of modules which can use a
> > > symbol, why are we ruling out external modules?
> >
> > The point is to get cleaner interfaces.
> 
> But this doesn't change interfaces at all.  It makes modules fail to load 
> unless they're on a permitted list, which now requires maintenance.

The modules wouldn't be using the internal interfaces in the first
place with name spaces in place. This serves as documentation
of what is considered internal. And if some obscure module (in or
out of tree) wants to use an internal interface, its author first has
to send the module maintainer a patch and get some review that way.

I believe that is fairly important in tree too because the 
kernel has become so big now that review cannot be the only
enforcement mechanism for this anymore.

Another, secondary reason is that there are too many exported interfaces
in general. Several distributions have policies that require keeping
the changes to these exported interfaces minimal, and that
is very hard with thousands of exported symbols.  With name spaces
the number of truly publicly exported symbols will hopefully
shrink to a much smaller, more manageable set.

-Andi


Re: [PATCH 9/9] Clean up open coded inode dirty checks

2007-11-23 Thread Joe Perches
On Fri, 2007-11-23 at 19:16 +0100, Jan Engelhardt wrote:
> static inline bool xfs_inode_clean(const struct xfs_inode *ip)
> {
>   if (ip->i_itemp == NULL)
>   return true;
>   if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) &&
>   ip->i_update_core == NULL)
>   return true;
>   return false;
> }

Your code changed the test.
xfs_inode.i_update_core is an unsigned char.

I believe reordering the tests to avoid a possibly
unnecessary dereference is better.

if (ip->i_update_core)
return false;
if (!ip->i_itemp)
return true;
return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL;




Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall
On Fri, Nov 23, 2007 at 10:32:22PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 01:11:20PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
> wrote:
> > On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote:
> > > On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > > > Stop, we are trying to free skb without destructor and catch connection
> > > > > tracking, so it is not a solution. To fix the problem we need to check
> > > > > if it is not netfilter related, kind of this (not tested), Simon please
> > > > > give it a try:
> > > > 
> > > > And to be really cool we need to bypass skbs with xfrm attached, since
> > > > its freeing also assumes BH context.
> > > 
> > > What about compile options?
> > 
> > What about my original suggestion that we mark skbs owned by netpoll
> > and free only those. Much safer, no? Untested:
> 
> This should work if there are netpoll's skbs, but if we are under memory
> pressure we want to free not only netpoll skbs, but at least one, and 
> what if there are no netpoll skbs in the queue?

Yeah, that's a concern (but note that we do have a private reserve and
we only really need the zap when our reserve is depleted). But I worry
that it's too fragile and if we add a new unsafe case, it won't be
noticed for a long time. This is the first report I've seen of this
particular problem, so this has been a latent bug for three or four
years now.

Here's another thought: move all this logic into the networking core,
unify it with current softirq zapper, then allow it to be called from
various other places (like atomic allocators). Then it'll all be in
central maintained place with more users.

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 01:11:20PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
wrote:
> On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote:
> > On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > > Stop, we are trying to free skb without destructor and catch connection
> > > > tracking, so it is not a solution. To fix the problem we need to check
> > > > if it is not netfilter related, kind of this (not tested), Simon please
> > > > give it a try:
> > > 
> > > And to be really cool we need to bypass skbs with xfrm attached, since
> > > its freeing also assumes BH context.
> > 
> > What about compile options?
> 
> What about my original suggestion that we mark skbs owned by netpoll
> and free only those. Much safer, no? Untested:

This should work if there are netpoll's skbs, but if we are under memory
pressure we want to free not only netpoll skbs, but at least one, and 
what if there are no netpoll skbs in the queue?

-- 
Evgeniy Polyakov


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall
On Fri, Nov 23, 2007 at 10:15:24PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 12:59:43PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
> wrote:
> > So I'd be surprised if that was a problem. But I can imagine having
> > problems for skbs without destructors which run into one of these in
> > __kfree_skb:
> > 
> > dst_release
> > secpath_put
> > nf_conntrack_put
> > nf_conntrack_put_reasm
> > nf_bridge_put
> > 
> > ..some or all of which assume a softirq context.
> 
> bridging is ok, others require softirq context.
> I've sent a patch (the last one should be ok) to guard against xfrm and
> connection tracking.
> 
> > > No matter if we are under memory pressure or whatever - it is not
> > > allowed - a lot of skbs are supposed to be freed in softirq context,
> > > that is why dev_kfree_skb_any() exists.
> > 
> > Some skbs we definitely -can- free in irq context. The only ones we
> > care about are the ones generated by netpoll. If there's a reason you
> > think netpoll's own skbs can't be freed, please describe it.
> 
> Only some and to distinguish them we can not use destructor - if it is
> set (even empty function) it will fire an alarm.

Yep, please look at the patch I just posted.

-- 
Mathematics is the supreme nostalgia of our time.


[PATCH] utsns: Restore proper namespace handling.

2007-11-23 Thread Eric W. Biederman

When CONFIG_UTS_NS was removed it seems that we also deleted
the code for handling sysctls in uts namespaces other than the
initial one.  This patch restores that code.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/utsname_sysctl.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c
index c76c064..71f58c3 100644
--- a/kernel/utsname_sysctl.c
+++ b/kernel/utsname_sysctl.c
@@ -18,6 +18,8 @@
 static void *get_uts(ctl_table *table, int write)
 {
char *which = table->data;
+   struct uts_namespace *uts_ns = current->nsproxy->uts_ns;
+   which = (which - (char *)&init_uts_ns) + (char *)uts_ns;
 
if (!write)
down_read(&uts_sem);
-- 
1.5.3.rc6.17.g1911
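
The pointer arithmetic in get_uts() can be tried out in userspace.  A minimal
sketch, assuming a made-up struct demo_ns in place of struct uts_namespace and
a hypothetical helper rebase_field(): the sysctl table points at a field
inside the initial namespace, and the same byte offset is applied to the
current task's namespace.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct uts_namespace; only the layout matters. */
struct demo_ns {
	char sysname[65];
	char nodename[65];
};

/*
 * Given a pointer to a field inside 'from', return a pointer to the
 * same field inside 'to' by preserving the byte offset of the field.
 */
static char *rebase_field(char *field, struct demo_ns *from,
			  struct demo_ns *to)
{
	return (field - (char *)from) + (char *)to;
}
```

This only works because both objects have the same type and therefore the
same field layout.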



Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall
On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > Stop, we are trying to free skb without destructor and catch connection
> > > tracking, so it is not a solution. To fix the problem we need to check
> > > if it is not netfilter related, kind of this (not tested), Simon please
> > > give it a try:
> > 
> > And to be really cool we need to bypass skbs with xfrm attached, since
> > its freeing also assumes BH context.
> 
> What about compile options?

What about my original suggestion that we mark skbs owned by netpoll
and free only those. Much safer, no? Untested:

diff -r c60016ba6237 net/core/netpoll.c
--- a/net/core/netpoll.c Tue Nov 13 09:09:36 2007 -0800
+++ b/net/core/netpoll.c Fri Nov 23 13:10:28 2007 -0600
@@ -203,6 +203,12 @@ static void refill_skbs(void)
spin_unlock_irqrestore(&skb_pool.lock, flags);
 }
 
+/* used to mark an skb as owned by netpoll */
+static void netpoll_skb_destroy(struct sk_buff *skb)
+{
+   return;
+}
+
 static void zap_completion_queue(void)
 {
unsigned long flags;
@@ -219,10 +225,12 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist->next;
-   if (skb->destructor)
+   if (skb->destructor == netpoll_skb_destroy) {
+   skb->destructor = NULL;
+   __kfree_skb(skb);
+   }
+   else
dev_kfree_skb_any(skb); /* put this one back */
-   else
-   __kfree_skb(skb);
}
}
 
@@ -252,6 +260,7 @@ repeat:
 
atomic_set(&skb->users, 1);
skb_reserve(skb, reserve);
+   skb->destructor = netpoll_skb_destroy;
return skb;
 }
 

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 12:59:43PM -0600, Matt Mackall ([EMAIL PROTECTED]) 
wrote:
> So I'd be surprised if that was a problem. But I can imagine having
> problems for skbs without destructors which run into one of these in
> __kfree_skb:
> 
> dst_release
> secpath_put
> nf_conntrack_put
> nf_conntrack_put_reasm
> nf_bridge_put
> 
> ..some or all of which assume a softirq context.

bridging is ok, others require softirq context.
I've sent a patch (the last one should be ok) to guard against xfrm and
connection tracking.

> > No matter if we are under memory pressure or whatever - it is not
> > allowed - a lot of skbs are supposed to be freed in softirq context,
> > that is why dev_kfree_skb_any() exists.
> 
> Some skbs we definitely -can- free in irq context. The only ones we
> care about are the ones generated by netpoll. If there's a reason you
> think netpoll's own skbs can't be freed, please describe it.

Only some of them, and we cannot use the destructor to distinguish them:
if it is set (even to an empty function), it will trigger the warning.
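
For reference, the ownership-marking idea from Matt's patch can be
modelled in userspace. This is only a sketch: struct mock_skb,
mock_kfree_skb() and zap_one() are hypothetical stand-ins, not kernel
code. It assumes, as the patch does, that clearing the destructor
pointer before the free keeps the free path from ever seeing a
destructor. Whether that is enough to avoid the warning in the real
__kfree_skb() is exactly what is being argued here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only the fields needed here. */
struct mock_skb {
	void (*destructor)(struct mock_skb *skb);
	int freed;
};

/* Never does any work: its address only tags a buffer as owned by netpoll. */
static void netpoll_skb_destroy(struct mock_skb *skb)
{
	(void)skb;
}

/* Stand-in for __kfree_skb(): returns 1 if a destructor was run. */
static int mock_kfree_skb(struct mock_skb *skb)
{
	int ran_destructor = 0;

	if (skb->destructor) {
		skb->destructor(skb);
		ran_destructor = 1;
	}
	skb->freed = 1;
	return ran_destructor;
}

/*
 * Free the buffer immediately only if it carries the netpoll tag.
 * Returns 1 if it was freed here, 0 if it must be left for softirq.
 */
static int zap_one(struct mock_skb *skb)
{
	if (skb->destructor == netpoll_skb_destroy) {
		skb->destructor = NULL;	/* drop the tag before freeing */
		mock_kfree_skb(skb);
		return 1;
	}
	return 0;
}
```

A buffer tagged with netpoll_skb_destroy is freed on the spot with its
destructor already cleared; anything else is left alone.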

> > I think we can drop skbs _without_ destructor from the queue though in
> > that conditions given that we actually need only one.
> 
> Huh?

Don't mind - friday...
I posted a patch (third one should be ok) to fix this issue.

-- 
Evgeniy Polyakov


Re: Where is the new timerfd?

2007-11-23 Thread Michael Kerrisk
On Nov 23, 2007 7:38 PM, Ulrich Drepper <[EMAIL PROTECTED]> wrote:
> On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> > Yes, it's disabled, and yes, I'll repost today ...
>
> I haven't seen the patch and don't feel like searching.  So I say it
> here: please make sure you add a flags parameter to the system call
> itself (instead of adding it on as for eventfd and signalfd).  We need
> to be able to use O_CLOEXEC some way or another.

Seems reasonable to add this for timerfd() (though it is unfortunate
that it is now too late to do the same for eventfd() and signalfd()).
Davide, what do you think?

Cheers,

Michael


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall
On Fri, Nov 23, 2007 at 08:57:57PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 11:07:56AM -0600, Matt Mackall ([EMAIL PROTECTED]) 
> wrote:
> > On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote:
> > > On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL 
> > > PROTECTED]) wrote:
> > > > > [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at 
> > > > > kernel/softirq.c:139 local_bh_enable()
> > > > > [2059664.620535]  [<80120364>] local_bh_enable+0x3c/0x97
> > > 
> > > > > [2059664.620657]  [<8011c205>] __call_console_drivers+0x61/0x6d
> > > > > [2059664.620669]  [<8011c3fc>] release_console_sem+0x164/0x1bf
> > > > > [2059664.620679]  [<8011c81f>] vprintk+0x27a/0x2ff
> > >  
> > > > > If that trace is to be believed we're doing netfilter stuff on packets 
> > > > > which
> > > > > were sent across netconsole.
> > > > 
> > > > This probably isn't anything the netfilter guys have thought about.  And
> > > > probably we don't want them to.  Is there some simple way in which we 
> > > > can
> > > > exempt netconsole from netfilter processing?
> > > 
> > > > This is not about netfilter, but about freeing skb in interrupt context, 
> > > > which is not allowed, and in interrupt skbs are queued to be freed in 
> > > > softirq,
> > > > but netconsole wants to flush softirq freeing queue. That is a question: 
> > > > why?
> > 
> > My memory here is hazy, but I think this exists to rescue netconsole
> > in low-memory situations. This bit originated with Ingo, so maybe he
> > can recall.
> > 
> > Netpoll can process an arbitrary number of skbs inside a single
> > interrupt. Think sysrq-t at one packet per line or kgdboe where the
> > entire trace session can happen inside one very long interrupt.
> > 
> > Perhaps we can refine this to mark netpoll's skbs (perhaps with
> > ->destructor?) and delete only skbs we own. As these are never passed
> > through any of the other route/xfrm/filter code, they should be safe
> > to delete even in irq context, yes?
> > 
> > > Removing zap_completion_queue() from find_skb() will fix the warning,
> > > but I'm not sure this is a correct fix. I've added Matt to the Cc list.
> > 
> > Care to try the sysrq-t or OOM message tests?
> 
> We basically can not free skbs there - if it is interrupt context and
> we are freeing some skb with destructor we will catch the warning anyway.

Perhaps I'm missing some context here. We don't free skbs with
destructors in irq context in zap_completion_queue. We reinsert them on the
completion list. We do this by calling dev_kfree_skb_any.

So I'd be surprised if that was a problem. But I can imagine having
problems for skbs without destructors which run into one of these in
__kfree_skb:

dst_release
secpath_put
nf_conntrack_put
nf_conntrack_put_reasm
nf_bridge_put

..some or all of which assume a softirq context.
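
To make the concern concrete, here is a userspace sketch of the check
this implies. struct mock_skb and mock_skb_needs_softirq_free() are
hypothetical names used only for illustration, and the field list simply
mirrors the release calls above. It is deliberately conservative: it
treats every attached object as requiring softirq context, which may be
stricter than necessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only the fields of interest. */
struct mock_skb {
	void *dst;		/* released by dst_release() */
	void *sp;		/* released by secpath_put() */
	void *nfct;		/* released by nf_conntrack_put() */
	void *nfct_reasm;	/* released by nf_conntrack_put_reasm() */
	void *nf_bridge;	/* released by nf_bridge_put() */
	void (*destructor)(struct mock_skb *skb);
};

/*
 * True if freeing this buffer would run a destructor or one of the
 * release paths listed above, so the free should be left to softirq.
 */
static bool mock_skb_needs_softirq_free(const struct mock_skb *skb)
{
	return skb->destructor || skb->dst || skb->sp ||
	       skb->nfct || skb->nfct_reasm || skb->nf_bridge;
}
```

A buffer with nothing attached could be freed directly; one carrying,
say, a conntrack reference would be handed back for deferred freeing.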

> No matter if we are under memory pressure or whatever - it is not
> allowed - a lot of skbs are supposed to be freed in softirq context,
> that is why dev_kfree_skb_any() exists.

Some skbs we definitely -can- free in irq context. The only ones we
care about are the ones generated by netpoll. If there's a reason you
think netpoll's own skbs can't be freed, please describe it.

> I think we can drop skbs _without_ destructor from the queue though in
> that conditions given that we actually need only one.

Huh?

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL 
> PROTECTED]) wrote:
> > Stop, we are trying to free skb without destructor and catch connection
> > tracking, so it is not a solution. To fix the problem we need to check
> > if it is not netfilter related, kind of this (not tested), Simon please
> > give it a try:
> 
> And to be really cool we need to bypass skbs with xfrm attached, since
> its freeing also assumes BH context.

What about compile options?

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..adb3c54 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,10 +196,25 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist->next;
-   if (skb->destructor)
+   if (skb->destructor) {
dev_kfree_skb_any(skb); /* put this one back */
-   else
-   __kfree_skb(skb);
+   continue;
+   }
+
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+   if (skb->nfct || skb->nfct_reasm) {
+   dev_kfree_skb_any(skb); /* put this one back */
+   continue;
+   }
+#endif
+
+#ifdef CONFIG_XFRM
+   if (skb->sp) {
+   dev_kfree_skb_any(skb); /* put this one back */
+   continue;
+   }
+#endif
+   __kfree_skb(skb);
}
}
 

-- 
Evgeniy Polyakov


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> Stop, we are trying to free skb without destructor and catch connection
> tracking, so it is not a solution. To fix the problem we need to check
> if it is not netfilter related, kind of this (not tested), Simon please
> give it a try:

And to be really cool we need to bypass skbs with xfrm attached, since
its freeing also assumes BH context.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..5f86e60 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,7 +196,8 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist->next;
-   if (skb->destructor)
+   if (skb->destructor || skb->nfct ||
+   skb->nfct_reasm || skb->sp)
dev_kfree_skb_any(skb); /* put this one back */
else
__kfree_skb(skb);


-- 
Evgeniy Polyakov


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 08:57:57PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> > My memory here is hazy, but I think this exists to rescue netconsole
> > in low-memory situations. This bit originated with Ingo, so maybe he
> > can recall.
> > 
> > Netpoll can process an arbitrary number of skbs inside a single
> > interrupt. Think sysrq-t at one packet per line or kgdboe where the
> > entire trace session can happen inside one very long interrupt.
> > 
> > Perhaps we can refine this to mark netpoll's skbs (perhaps with
> > ->destructor?) and delete only skbs we own. As these are never passed
> > through any of the other route/xfrm/filter code, they should be safe
> > to delete even in irq context, yes?
> > 
> > > Removing zap_completion_queue() from find_skb() will fix the warning,
> > > but I'm not sure this is a correct fix. I've added Matt to the Cc list.
> > 
> > Care to try the sysrq-t or OOM message tests?
> 
> We basically can not free skbs there - if it is interrupt context and
> we are freeing some skb with destructor we will catch the warning anyway.
> 
> No matter if we are under memory pressure or whatever - it is not
> allowed - a lot of skbs are supposed to be freed in softirq context,
> that is why dev_kfree_skb_any() exists.
> 
> I think we can drop skbs _without_ destructor from the queue though in
> that conditions given that we actually need only one.

Stop, we are trying to free skb without destructor and catch connection
tracking, so it is not a solution. To fix the problem we need to check
if it is not netfilter related, kind of this (not tested), Simon please
give it a try:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..855bb3f 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,7 +196,7 @@ static void zap_completion_queue(void)
while (clist != NULL) {
struct sk_buff *skb = clist;
clist = clist->next;
-   if (skb->destructor)
+   if (skb->destructor || skb->nfct || skb->nfct_reasm)
dev_kfree_skb_any(skb); /* put this one back */
else
__kfree_skb(skb);


-- 
Evgeniy Polyakov


RE: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Morrison, Tom
Yes, I believe that - otherwise, this problem would have
been a crisis a LONG time ago...:-)
 
But I do have some more questions in relationship to how 
things are mapped in your environment. I have a flat memory 
map (i.e.: the full 0x0 -- 0x1__ is passed to the 32bit 
Linux kernel without any 'holes' and/or reserved areas).
 
Does your Intel memory map have this same type of flat memory 
model (and thus allow use of the FULL lower 4Gig) - or does it 
reserve areas of lower 4Gig for devices and such - if not - where 
are these reserved areas - and how do they relate to the I/O memory
map for the device?
 
In other words, I would be very interested in seeing the memory 
map & the PCI memory mapping to see if any overlap/correspond 
to reserved areas of lower 4 Gig (in a linux 32bit mode)...
 
Tom



From: Mark Lord [mailto:[EMAIL PROTECTED]
Sent: Fri 11/23/2007 12:46 PM
To: Morrison, Tom
Cc: Robert Hancock; linux-kernel; ide; Jeff Garzik; Tejun Heo
Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)



Morrison, Tom wrote:
> I am hopeful that the sata_mv has this bug (I proved that the
> problem I was experiencing was due to the sata_mv driver
> with 3.75Gig or more of memory)...
> 
> I am on vacation for a week or more ...or I'd tell you today
> if it did have this bug!
..

Yeah, I kind of had your reports in mind when I asked that.  :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.

Cheers




Re: Where is the new timerfd?

2007-11-23 Thread Ulrich Drepper
On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> Yes, it's disabled, and yes, I'll repost today ...

I haven't seen the patch and don't feel like searching.  So I say it
here: please make sure you add a flags parameter to the system call
itself (instead of adding it on as for eventfd and signalfd).  We need
to be able to use O_CLOEXEC some way or another.
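
The reason the flag has to go into the call itself can be sketched with
open(2), which already accepts O_CLOEXEC. This is an illustration only:
the helper names are made up, and open() merely stands in for the
descriptor-creating system call under discussion, whose final interface
is not shown in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Two-step variant: between open() and fcntl() another thread may
 * fork() and exec(), and the new program inherits the descriptor.
 */
static int open_then_set_cloexec(const char *path)
{
	int fd = open(path, O_RDONLY);
	int fdflags;

	if (fd < 0)
		return -1;
	fdflags = fcntl(fd, F_GETFD);
	if (fdflags < 0 || fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Atomic variant: the descriptor is never visible without the flag. */
static int open_cloexec(const char *path)
{
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if close-on-exec is set on fd, 0 otherwise. */
static int has_cloexec(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);

	return fdflags >= 0 && (fdflags & FD_CLOEXEC) != 0;
}
```

Both helpers end up with the flag set, but only the second one closes
the window; a system call without a flags argument forces every caller
into the first pattern.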


Re: 2.6.23.1: Random hangs during boot with "tsc" clocksource

2007-11-23 Thread Jordan Russell
Chuck Ebbert wrote:
> On 11/20/2007 06:20 PM, Jordan Russell wrote:
>> Same problem with 2.6.23.8.
>>
>> Are there any specific (TSC related?) patches I should try reverting?
>>
>> Would it help if I captured the dmesg/SysRq output from one of the
>> hanging boots?
>>
>> Any other information that might be useful in getting to the bottom of this?
>>
> 
> Did you try this one? You are seeing problems with preemption disabled,
> but it's at least worth trying.
> 
> 
> 
> From: Marin Mitov <[EMAIL PROTECTED]>
> To: linux-kernel@vger.kernel.org
> Subject: [PATCH]new_TSC_based_delay_tsc()
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Date: Tue, 20 Nov 2007 21:32:27 +0200

Thanks for the response.

I backported that patch to 2.6.23.8, but it didn't make a difference.

I also went ahead and tested 2.6.24-rc3 (with no changes to the existing
config settings):

With the new CPU_IDLE option set to "n", it still hangs, but much less
frequently than on 2.6.23.x. In 25 tries, there were 3 hangs, all after
"input: AT Translated Set 2 keyboard as /class/input/input0".

With CPU_IDLE set to "y", it didn't hang at all in 25 tries. However,
CPU_IDLE=y produces these additional messages, which may explain why:

Marking TSC unstable due to: TSC halts in idle.
...
Time: acpi_pm clocksource has been installed.

Not sure what else to try at this point...

-- 
Jordan Russell


Re: [PATCH 9/9] Clean up open coded inode dirty checks

2007-11-23 Thread Jan Engelhardt

On Nov 23 2007 18:02, Christoph Hellwig wrote:
>
>> +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip)
>> +{
>> +return (((ip->i_itemp == NULL) ||
>> +!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
>> +(ip->i_update_core == 0));
>> +}
>
>Can we please get rid of this useless STATIC_INLINE junk?  It's really
>hurting my eyes.
>
>As does to a lesser extent the verbose style of this
>function.

I have to disagree, but whatever.

>static inline int xfs_inode_clean(struct xfs_inode *ip)
   ^   ^
could be bool - and const
>{
>   return (!ip->i_itemp ||
>   !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
>  !ip->i_update_core;
>}

Perhaps for greater readability:

static inline bool xfs_inode_clean(const struct xfs_inode *ip)
{
	if (ip->i_update_core != 0)
		return false;
	if (ip->i_itemp == NULL)
		return true;
	return !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL);
}
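
Since it is easy to change the meaning when unfolding a condition like
this, a small userspace check over all input combinations is cheap
insurance. The sketch below uses mock types; struct mock_inode,
struct mock_log_item and the MOCK_ILOG_ALL value are assumptions made
for the example, not the real XFS definitions. It compares the compact
expression against an early-return form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_ILOG_ALL 0x7u	/* hypothetical mask, stands in for XFS_ILOG_ALL */

struct mock_log_item {
	unsigned int fields;
};

struct mock_inode {
	struct mock_log_item *itemp;
	int update_core;
};

/* Compact form: one boolean expression. */
static bool clean_compact(const struct mock_inode *ip)
{
	return (!ip->itemp || !(ip->itemp->fields & MOCK_ILOG_ALL)) &&
	       !ip->update_core;
}

/* Early-return form: the core flag is tested first in every case. */
static bool clean_stepwise(const struct mock_inode *ip)
{
	if (ip->update_core != 0)
		return false;
	if (ip->itemp == NULL)
		return true;
	return !(ip->itemp->fields & MOCK_ILOG_ALL);
}

/* Count the input combinations on which the two forms disagree. */
static int count_mismatches(void)
{
	struct mock_log_item dirty = { MOCK_ILOG_ALL };
	struct mock_log_item quiet = { 0 };
	struct mock_log_item *items[] = { NULL, &quiet, &dirty };
	int mismatches = 0;

	for (size_t i = 0; i < 3; i++) {
		for (int core = 0; core <= 1; core++) {
			struct mock_inode ip = { items[i], core };

			if (clean_compact(&ip) != clean_stepwise(&ip))
				mismatches++;
		}
	}
	return mismatches;
}
```

The case worth watching is a NULL log item with the core flag set: the
inode is dirty there, so returning early on the NULL check alone would
be wrong.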


Re: [PATCH 9/9] Clean up open coded inode dirty checks

2007-11-23 Thread Christoph Hellwig
> +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip)
> +{
> + return (((ip->i_itemp == NULL) ||
> + !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
> + (ip->i_update_core == 0));
> +}

Can we please get rid of this useless STATIC_INLINE junk?  It's really
hurting my eyes.  As does to a lesser extent the verbose style of this
function.  This should be something like:

static inline int xfs_inode_clean(struct xfs_inode *ip)
{
return (!ip->i_itemp ||
!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
   !ip->i_update_core;
}


Re: [PATCH 6/9] Remove xfs_icluster

2007-11-23 Thread Christoph Hellwig
On Thu, Nov 22, 2007 at 11:39:52AM +1100, David Chinner wrote:
> Remove the xfs_icluster structure and replace with a radix tree lookup.
> 
> We don't need to keep a list of inodes in each cluster around anymore
> as we can look them up quickly when we need to. The only time we need
> to do this now is during inode writeback.
> 
> Factor the inode cluster writeback code out of xfs_iflush and convert
> it to use radix_tree_gang_lookup() instead of walking a list of
> inodes built when we first read in the inodes.
> 
> This remove 3 pointers from each xfs_inode structure and the xfs_icluster
> structure per inode cluster. Hence we reduce the cache footprint of the
> xfs_inodes by between 5-10% depending on cluster sparseness.
> 
> To be truly efficient we need a radix_tree_gang_lookup_range() call
> to stop searching once we are past the end of the cluster instead
> of trying to find a full cluster's worth of inodes.

Nice, I like this a lot.  I was wondering about something like this
already when you put in the radix-tree based inode cache.

> +STATIC int
> +xfs_iflush_cluster(
> + xfs_inode_t *ip,
> + xfs_buf_t   *bp)
> +{
> + xfs_mount_t *mp = ip->i_mount;
> + xfs_perag_t *pag = xfs_get_perag(mp, ip->i_ino);
> + unsigned long   first_index, mask;
> + int ilist_size;
> + xfs_inode_t *ilist;
> + xfs_inode_t *iq;
> + xfs_inode_log_item_t*iip;
> + int nr_found;
> + int clcount = 0;
> + int bufwasdelwri;
> +
> + ASSERT(pag->pagi_inodeok);
> + ASSERT(pag->pag_ici_init);
> +
> + ilist_size = XFS_INODE_CLUSTER_SIZE(mp) * sizeof(xfs_inode_t *);
> + ilist = kmem_alloc(ilist_size, KM_MAYFAIL);
> + if (!ilist)
> + return 0;

Now if you just used the linux native allocator this could be a kcalloc :)

> + if ((iq->i_update_core == 0) &&
> + ((iip == NULL) ||
> +  !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
> +   xfs_ipincount(iq) == 0) {
> + continue;
> + }

if (!iq->i_update_core &&
(!iip || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
!xfs_ipincount(iq))
continue;

> + /*
> +  * arriving here means that this inode can be flushed.  First
> +  * re-check that it's dirty before flushing.
> +  */
> + iip = iq->i_itemp;
> + if ((iq->i_update_core != 0) || ((iip != NULL) &&
> +  (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) {

	if (iq->i_update_core ||
	    (iip && (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) {

> + /*
> +  * Clean up the buffer.  If it was B_DELWRI, just release it --
> +  * brelse can handle it with no problems.  If not, shut down the
> +  * filesystem before releasing the buffer.
> +  */
> + bufwasdelwri = XFS_BUF_ISDELAYWRITE(bp);
> + if (bufwasdelwri)
> + xfs_buf_relse(bp);
> +
> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +
> + if (!bufwasdelwri) {
> + /*
> +  * Just like incore_relse: if we have b_iodone functions,
> +  * mark the buffer as an error and call them.  Otherwise
> +  * mark it as stale and brelse.
> +  */
> + if (XFS_BUF_IODONE_FUNC(bp)) {
> + XFS_BUF_CLR_BDSTRAT_FUNC(bp);
> + XFS_BUF_UNDONE(bp);
> + XFS_BUF_STALE(bp);
> + XFS_BUF_SHUT(bp);
> + XFS_BUF_ERROR(bp,EIO);
> + xfs_biodone(bp);
> + } else {
> + XFS_BUF_STALE(bp);
> + xfs_buf_relse(bp);
> + }
> + }

What's the point of all this if the filesystem is shut down anyway?



Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Evgeniy Polyakov
On Fri, Nov 23, 2007 at 11:07:56AM -0600, Matt Mackall ([EMAIL PROTECTED]) 
wrote:
> On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote:
> > On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL PROTECTED]) 
> > wrote:
> > > > [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at 
> > > > kernel/softirq.c:139 local_bh_enable()
> > > > [2059664.620535]  [<80120364>] local_bh_enable+0x3c/0x97
> > 
> > > > [2059664.620657]  [<8011c205>] __call_console_drivers+0x61/0x6d
> > > > [2059664.620669]  [<8011c3fc>] release_console_sem+0x164/0x1bf
> > > > [2059664.620679]  [<8011c81f>] vprintk+0x27a/0x2ff
> >  
> > > If that trace is to be believed we're doing netfilter stuff on packets which
> > > were sent across netconsole.
> > > 
> > > This probably isn't anything the netfilter guys have thought about.  And
> > > probably we don't want them to.  Is there some simple way in which we can
> > > exempt netconsole from netfilter processing?
> > 
> > This is not about netfilter, but about freeing skb in interrupt context, 
> > which is not allowed, and in interrupt skbs are queued to be freed in 
> > softirq,
> > but netconsole wants to flush softirq freeing queue. That is a question: why?
> 
> My memory here is hazy, but I think this exists to rescue netconsole
> in low-memory situations. This bit originated with Ingo, so maybe he
> can recall.
> 
> Netpoll can process an arbitrary number of skbs inside a single
> interrupt. Think sysrq-t at one packet per line or kgdboe where the
> entire trace session can happen inside one very long interrupt.
> 
> Perhaps we can refine this to mark netpoll's skbs (perhaps with
> ->destructor?) and delete only skbs we own. As these are never passed
> through any of the other route/xfrm/filter code, they should be safe
> to delete even in irq context, yes?
> 
> > Removing zap_completion_queue() from find_skb() will fix the warning,
> > but I'm not sure this is a correct fix. I've added Matt to the Cc list.
> 
> Care to try the sysrq-t or OOM message tests?

We basically can not free skbs there - if it is interrupt context and
we are freeing some skb with destructor we will catch the warning anyway.

No matter if we are under memory pressure or whatever - it is not
allowed - a lot of skbs are supposed to be freed in softirq context,
that is why dev_kfree_skb_any() exists.

I think we can drop skbs _without_ destructor from the queue though in
that conditions given that we actually need only one.

-- 
Evgeniy Polyakov


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-23 Thread Laurent Riffard
On 23.11.2007 12:38, Hannes Reinecke wrote:
> Hannes Reinecke wrote:
>> Laurent Riffard wrote:
>>> On 21.11.2007 23:41, Andrew Morton wrote:
>>>> On Wed, 21 Nov 2007 22:45:22 +0100
>>>> Laurent Riffard <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> On 21.11.2007 05:45, Andrew Morton wrote:
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
>>>>> Hello, 
>>>>>
>>>>> My system hangs shortly after I logged in Gnome desktop. SysRq-W shows
>>>>> that a bunch of task are blocked in "D" state, they seem to wait for
>>>>> some I/O completion. I can try to hand-copy some data if requested.
>>>>>
>>>>> I found these messages in dmesg:
>>>>>
>>>>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
>>>>> EXT3-fs: mounted filesystem with ordered data mode.
>>>>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sda, sector 16460
>>>>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
>>>>> ReiserFS: sda7: using ordered data mode
>>>>> --
>>>>> ReiserFS: sda7: Using r5 hash to sort names
>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sdb, sector 19632
>>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
>>>>> driverbyte=DRIVER_OK,SUGGEST_OK
>>>>> end_request: I/O error, dev sdb, sector 40037363
>>>>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 
>>>>> extents:1 across:1048568k
>>>>> lp0: using parport0 (interrupt-driven).
>>>>>
>>>>> These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible.
>>>>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
>>>>>
>>>>> Maybe something is broken in pata_via driver ?
>>>>>
>>>> Could be - 
>>>> libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
>>>> and 
>>>> pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
>>>> touch pata_via.c.
>>> None of the above...
>>>
>>> I did a bisection, it spotted git-scsi-misc.patch. 
>>> I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
>>>
>>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not 
>>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other 
>>> commits are touching documentation or drivers I don't use. I'll try 
>>> to revert only this one this evening.

I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 
does fix the problem.

>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error 
>> where
>> I shouldn't. Checking ...
>>
> Ok, found it. We are blocking even special commands (ie requests with PREEMPT 
> not set)
> when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.

Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O errors.

-- 
laurent



Re: [PATCH 5/9] Don't block pdflush when flushing inodes

2007-11-23 Thread Christoph Hellwig
> +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c  2007-11-22 10:33:51.037704348 +1100
> @@ -183,12 +183,20 @@ xfs_imap_to_bp(
>   int ni;
>   xfs_buf_t   *bp;
>  
> + if (buf_flags == 0)
> + buf_flags = XFS_BUF_LOCK;

There are just two callers and they never pass 0, so this is not needed.

> + error = xfs_itobp_flags(mp, NULL, ip, , , 0, 0,
> + (noblock) ? XFS_BUF_TRYLOCK : XFS_BUF_LOCK);

no need for the braces around noblock.

> +int  xfs_itobp_flags(struct xfs_mount *, struct xfs_trans *,
> xfs_inode_t *, struct xfs_dinode **, struct xfs_buf 
> **,
> -   xfs_daddr_t, uint);
> +   xfs_daddr_t, uint, uint);
> +#define xfs_itobp(mp, tp, ip, dipp, bpp, bno, iflags)\
> + xfs_itobp_flags(mp, tp, ip, dipp, bpp, bno, iflags, XFS_BUF_LOCK)

I'd say just convert xfs_itobp and all its users to take the additional
argument.



Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-23 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:
  

Anthony Liguori wrote:

Well please propose the virtio API first and then I'll adjust the PCI 
ABI.  I don't want to build things into the ABI that we never 
actually end up using in virtio :-)


  
  

Move ->kick() to virtio_driver.



Then on each kick, all queues have to be checked for processing?  What 
devices do you expect this would help?


  


Networking.

I believe Xen networking uses the same event channel for both rx and 
tx, so in effect they're using this model.  Long time since I looked 
though,



I would have to look, but since rx/tx are rather independent actions, 
I'm not sure that you would really save that much.  You still end up 
doing the same number of kicks unless I'm missing something.


  


rx and tx are closely related. You rarely have one without the other.

In fact, a tuned implementation should have zero kicks or interrupts 
for bulk transfers. The rx interrupt on the host will process new tx 
descriptors and fill the guest's rx queue; the guest's transmit function 
can also check the receive queue. I don't know if that's achievable for 
Linux guests currently, but we should aim to make it possible.


Another point is that virtio still has a lot of leading zeros in its 
mileage counter. We need to keep things flexible and learn from others 
as much as possible, especially when talking about the ABI.


I'm wary of introducing the notion of hypercalls to this device because 
it makes the device VMM specific.  Maybe we could have the device 
provide an option ROM that was treated as the device "BIOS" that we 
could use for kicking and interrupt acking?  Any idea of how that would 
map to Windows?  Are there real PCI devices that use the option ROM 
space to provide what's essentially firmware?  Unfortunately, I don't 
think an option ROM BIOS would map well to other architectures.


  


The BIOS wouldn't work even on x86 because it isn't mapped to the guest 
address space (at least not consistently), and doesn't know the guest's 
programming model (16, 32, or 64-bits? segmented or flat?)


Xen uses a hypercall page to abstract these details out. However, I'm 
not proposing that. Simply indicate that we support hypercalls, and use 
some layer below to actually send them. It is the responsibility of this 
layer to detect if hypercalls are present and how to call them.


Hey, I think the best place for it is in paravirt_ops. We can even patch 
the hypercall instruction inline, and the driver doesn't need to know 
about it.


None of the PCI devices currently work like that in QEMU.  It would 
be very hard to make a device that worked this way because since the 
order in which values are written matter a whole lot.  For instance, 
if you wrote the status register before the queue information, the 
driver could get into a funky state.
  
  

I assume you're talking about restore?  Isn't that atomic?



If you're doing restore by passing the PCI config blob to a registered 
routine, then sure, but that doesn't seem much better to me than just 
having the device generate that blob in the first place (which is what 
we have today).  I was assuming that you would want to use the existing 
PIO/MMIO handlers to do restore by rewriting the config as if the guest was.


  


Sure some complexity is unavoidable. But flat is simpler than indirect.


Not much of an argument, I know.


wrt. number of queues, 8 queues will consume 32 bytes of pci space 
if all you store is the ring pfn.


You also at least need a num argument which takes you to 48 or 64 
depending on whether you care about strange formatting.  8 queues may 
not be enough either.  Eric and I have discussed whether the 9p 
virtio device should support multiple mounts per-virtio device and if 
so, whether each one should have its own queue.  Any device that 
supports this sort of multiplexing will very quickly start using a 
lot of queues.
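
The 32/48/64 figures can be checked with a small sketch. It assumes a 32-bit ring pfn and a 16-bit num field per queue; those widths are an assumption chosen to match the numbers quoted above, not something the thread states. The packed attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stdint.h>

#define NR_QUEUES 8

/* Ring PFN only: 4 bytes per queue. */
struct q_pfn_only { uint32_t pfn; };

/* PFN plus a 16-bit ring size, packed: 6 bytes per queue. */
struct q_packed { uint32_t pfn; uint16_t num; } __attribute__((packed));

/* Same fields with natural alignment: padded to 8 bytes per queue. */
struct q_aligned { uint32_t pfn; uint16_t num; };

/* Config space consumed by NR_QUEUES entries of the given size. */
static unsigned long cfg_bytes(unsigned long per_queue)
{
	return per_queue * NR_QUEUES;
}
```

With these layouts the pfn-only table takes 32 bytes, the packed one 48 and the naturally aligned one 64, which is presumably the "strange formatting" trade-off mentioned above.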
  
  
Make it appear as a pci function?  (though my feeling is that multiple 
mounts should be different devices; we can then hotplug mountpoints).



We may run out of PCI slots though :-/
  


Then we can start selling virtio extension chassis.

--
Any sufficiently difficult bug is indistinguishable from a feature.



Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Mark Lord

Morrison, Tom wrote:

> I am hopeful that the sata_mv has this bug (I proved that the
> problem I was experiencing was due to the sata_mv driver
> with 3.75Gig or more of memory)...
>
> I am on vacation for a week or more ...or I'd tell you today
> if it did have this bug!

..

Yeah, I kind of had your reports in mind when I asked that.  :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.

Cheers


Re: [PATCH 4/9] Factor common inode cluster buffer lookup code

2007-11-23 Thread Christoph Hellwig
On Thu, Nov 22, 2007 at 11:36:42AM +1100, David Chinner wrote:
> +STATIC int
> +xfs_ino_to_imap(
> + xfs_mount_t *mp,
> + xfs_trans_t *tp,
> + xfs_ino_t   ino,
> + xfs_imap_t  *imap,
> + uint        imap_flags)
> +{
> + int error;
> +
> + error = xfs_imap(mp, tp, ino, imap, imap_flags);
> + if (error) {
> + cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap()  returned an "
> + "error %d on %s.  Returning error.",
> + error, mp->m_fsname);
> + return error;
> + }
> +
> + /*
> +  * If the inode number maps to a block outside the bounds
> +  * of the file system then return NULL rather than calling
> +  * read_buf and panicing when we get an error from the
> +  * driver.
> +  */
> + if ((imap->im_blkno + imap->im_len) >
> + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) {
> + xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: "
> + "(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > "
> + " XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)",
> + (unsigned long long) imap->im_blkno,
> + (unsigned long long) imap->im_len,
> + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
> + return XFS_ERROR(EINVAL);
> + }

What about just adding this verification to xfs_imap instead of creating
this wrapper for two of its three callers?
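
As a rough illustration of that suggestion, the sketch below folds the range check into the lookup itself so that every caller gets it. The types and the raw_lookup() mapping are made-up stand-ins, not the XFS structures or the real xfs_imap() logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for xfs_imap_t: a block range on disk. */
struct imap { uint64_t blkno; uint64_t len; };

/* Hypothetical mapping from inode number to block range. */
static void raw_lookup(uint64_t ino, struct imap *im)
{
	im->blkno = ino * 8;
	im->len = 8;
}

/* Lookup with the bounds check built in, so no wrapper is needed. */
static int lookup_checked(uint64_t ino, uint64_t fs_blocks, struct imap *im)
{
	raw_lookup(ino, im);
	if (im->blkno + im->len > fs_blocks)
		return EINVAL;  /* range runs past the end of the filesystem */
	return 0;
}
```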

Otherwise this patch looks fine to me.


Re: [PATCH 3/9] Use _META bio I/O types for metadata I/O

2007-11-23 Thread Christoph Hellwig
On Thu, Nov 22, 2007 at 11:35:12AM +1100, David Chinner wrote:
> Improve metadata I/O merging in the elevator
> 
> Change all async metadata buffers to use [READ|WRITE]_META I/O types
> so that the I/O doesn't get issued immediately. This allows merging
> of adjacent metadata requests but still prioritises them over bulk
> data. This shows a 10-15% improvement in sequential create speed of
> small files.
> 
> Don't include the log buffers in this classification - leave them
> as sync types so they are issued immediately.

Looks good, and just including the trivial fs.h addition here might be
okay as well.



RE: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Morrison, Tom
I am hopeful that the sata_mv has this bug (I proved that the
problem I was experiencing was due to the sata_mv driver 
with 3.75Gig or more of memory)...
 
I am on vacation for a week or more ...or I'd tell you today
if it did have this bug!



From: [EMAIL PROTECTED] on behalf of Mark Lord
Sent: Fri 11/23/2007 10:22 AM
To: Robert Hancock
Cc: linux-kernel; ide; Jeff Garzik; Tejun Heo
Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)



Robert Hancock wrote:
> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA
> mode on systems with memory located above 4GB. We need to delay setting the
> 64-bit DMA mask until the PRD table and padding buffer are allocated so that
> they don't get allocated above 4GB and break legacy mode (which is needed for
> ATAPI devices).
...

Mmm.. I wonder how many other libata drivers have this exact same bug,
whether noticed yet or not ?

Cheers


Re: Where is the new timerfd?

2007-11-23 Thread Davide Libenzi
On Fri, 23 Nov 2007, Andrew Morton wrote:

> > I suppose this means that timerfd will only go in for 2.6.25.  I don't
> > have a problem with that, but we better make sure that the existing
> > timerfd in 2.6.24 is still disabled.  (Andrew had a one liner for
> > that, but I haven't checked if it's in place.)
> > 
> 
> I have no timerfd patches here.

Yes, it's disabled, and yes, I'll repost today ...


- Davide




Re: Where is the new timerfd?

2007-11-23 Thread Andrew Morton
On Fri, 23 Nov 2007 13:39:55 +0100 "Michael Kerrisk" <[EMAIL PROTECTED]> wrote:

> On 11/23/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Thu, 22 Nov 2007 16:35:38 -0800 (PST) Davide Libenzi
> > <[EMAIL PROTECTED]> wrote:
> >
> > > On Thu, 22 Nov 2007, Andrew Morton wrote:
> [...]
> > > > Last I recall, we removed the API for 2.6.23 because we intended to do a
> > > > different interface for 2.6.24.
> > > >
> > > > But I don't recall seeing any timerfd patches in maybe a month.
> > >
> > > Was sent on Sep 23, Subject: new timerfd API
> >
> > Half of us weren't born then ;)
> >
> > > Do you want me to repost?
> >
> > yes please.
> 
> I suppose this means that timerfd will only go in for 2.6.25.  I don't
> have a problem with that, but we better make sure that the existing
> timerfd in 2.6.24 is still disabled.  (Andrew had a one liner for
> that, but I haven't checked if it's in place.)
> 

I have no timerfd patches here.


Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()

2007-11-23 Thread Matt Mackall
On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL PROTECTED]) wrote:
> > > [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at kernel/softirq.c:139 local_bh_enable()
> > > [2059664.620535]  [<80120364>] local_bh_enable+0x3c/0x97
> 
> > > [2059664.620657]  [<8011c205>] __call_console_drivers+0x61/0x6d
> > > [2059664.620669]  [<8011c3fc>] release_console_sem+0x164/0x1bf
> > > [2059664.620679]  [<8011c81f>] vprintk+0x27a/0x2ff
>  
> > If that trace is to be believed we're doing netfilter stuff on packets which
> > were sent across netconsole.
> > 
> > This probably isn't anything the netfilter guys have thought about.  And
> > probably we don't want them to.  Is there some simple way in which we can
> > exempt netconsole from netfilter processing?
> 
> This is not about netfilter, but about freeing an skb in interrupt context,
> which is not allowed. In interrupt context skbs are queued to be freed in
> softirq, but netconsole wants to flush the softirq freeing queue. The
> question is: why?

My memory here is hazy, but I think this exists to rescue netconsole
in low-memory situations. This bit originated with Ingo, so maybe he
can recall.

Netpoll can process an arbitrary number of skbs inside a single
interrupt. Think sysrq-t at one packet per line or kgdboe where the
entire trace session can happen inside one very long interrupt.

Perhaps we can refine this to mark netpoll's skbs (perhaps with
->destructor?) and delete only skbs we own. As these are never passed
through any of the other route/xfrm/filter code, they should be safe
to delete even in irq context, yes?
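
A toy userspace model of that idea, for discussion only: buffers are tagged through their destructor pointer, and the zap routine frees just the ones carrying the tag. The struct and function names here are invented for the sketch and are not netpoll or sk_buff code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct sk_buff: only the fields this sketch needs. */
struct buf {
	struct buf *next;
	void (*destructor)(struct buf *);
};

/* Marker destructor: buffers carrying it belong to "netpoll". */
static void netpoll_owned(struct buf *b) { (void)b; }

/* Free only the buffers we own; return how many were freed. */
static int zap_owned(struct buf **head)
{
	int freed = 0;
	struct buf **pp = head;

	while (*pp) {
		struct buf *b = *pp;
		if (b->destructor == netpoll_owned) {
			*pp = b->next;   /* unlink, then free */
			free(b);
			freed++;
		} else {
			pp = &b->next;   /* not ours: leave it queued */
		}
	}
	return freed;
}

/* Helper: push a new buffer on the front of the queue. */
static void push(struct buf **head, void (*d)(struct buf *))
{
	struct buf *b = malloc(sizeof(*b));
	if (!b)
		abort();
	b->destructor = d;
	b->next = *head;
	*head = b;
}
```

Buffers without the marker stay on the queue for the normal softirq path, which is the property the proposal relies on.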

> Removing zap_completion_queue() from find_skb() will fix the warning,
> but I'm not sure this is a correct fix. I've added Matt to the Cc list.

Care to try the sysrq-t or OOM message tests?

-- 
Mathematics is the supreme nostalgia of our time.


[PATCH 2.6.24-rc3-mm1] IPC: consolidate sem_exit_ns(), msg_exit_ns() and shm_exit_ns()

2007-11-23 Thread Pierre Peiffer

sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
ipc_namespace is released to free all ipcs of each type.
But in fact, they do the same thing: they loop around all ipcs to free them
individually by calling a specific routine.

This patch proposes to consolidate this by introducing a common function,
free_ipcs(), that does the job. The specific routine to call on each individual
ipc is passed as a parameter. For this, these ipc-specific 'free' routines are
reworked to take a generic 'struct kern_ipc_perm' as parameter.
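
For readers unfamiliar with the pattern, here is a standalone userspace sketch of it: a generic walker that only knows the embedded permission structure, plus a type-specific callback that recovers the outer object with container_of(). The struct and function names are simplified stand-ins, there is no locking, and the walk is bounded by the table size rather than by an in_use count.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct perm { int id; };                          /* stand-in for kern_ipc_perm */
struct msg_q { int nmsgs; struct perm q_perm; };  /* stand-in for msg_queue */

static int freed_msgs;                            /* what free_q() has released */

/* Type-specific callback: recover the outer object from the generic part. */
static void free_q(struct perm *p)
{
	struct msg_q *q = container_of(p, struct msg_q, q_perm);
	freed_msgs += q->nmsgs;
	q->nmsgs = 0;
}

/* Generic walker: knows nothing about msg_q, only about struct perm. */
static int free_all(struct perm **tbl, int n, void (*fn)(struct perm *))
{
	int total = 0;
	for (int i = 0; i < n; i++) {
		if (tbl[i] == NULL)
			continue;      /* empty slot, like idr_find() returning NULL */
		fn(tbl[i]);
		tbl[i] = NULL;
		total++;
	}
	return total;
}
```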

Signed-off-by: Pierre Peiffer <[EMAIL PROTECTED]>
---
 include/linux/ipc_namespace.h |5 -
 ipc/msg.c |   28 +---
 ipc/namespace.c   |   30 ++
 ipc/sem.c |   27 +--
 ipc/shm.c |   27 ++-
 5 files changed, 50 insertions(+), 67 deletions(-)

Index: b/ipc/msg.c
===
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -72,7 +72,7 @@ struct msg_sender {
 #define msg_unlock(msq)ipc_unlock(&(msq)->q_perm)
 #define msg_buildid(id, seq)   ipc_buildid(id, seq)
 
-static void freeque(struct ipc_namespace *, struct msg_queue *);
+static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
 static int newque(struct ipc_namespace *, struct ipc_params *);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_msg_proc_show(struct seq_file *s, void *it);
@@ -91,26 +91,7 @@ void msg_init_ns(struct ipc_namespace *n
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
-   struct msg_queue *msq;
-   struct kern_ipc_perm *perm;
-   int next_id;
-   int total, in_use;
-
-   down_write(&msg_ids(ns).rw_mutex);
-
-   in_use = msg_ids(ns).in_use;
-
-   for (total = 0, next_id = 0; total < in_use; next_id++) {
-   perm = idr_find(&msg_ids(ns).ipcs_idr, next_id);
-   if (perm == NULL)
-   continue;
-   ipc_lock_by_ptr(perm);
-   msq = container_of(perm, struct msg_queue, q_perm);
-   freeque(ns, msq);
-   total++;
-   }
-
-   up_write(&msg_ids(ns).rw_mutex);
+   free_ipcs(ns, &msg_ids(ns), freeque);
 }
 #endif
 
@@ -274,9 +255,10 @@ static void expunge_all(struct msg_queue
  * msg_ids.rw_mutex (writer) and the spinlock for this message queue are held
  * before freeque() is called. msg_ids.rw_mutex remains locked on exit.
  */
-static void freeque(struct ipc_namespace *ns, struct msg_queue *msq)
+static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 {
struct list_head *tmp;
+   struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);
 
expunge_all(msq, -EIDRM);
ss_wakeup(&msq->q_senders, 1);
@@ -582,7 +564,7 @@ asmlinkage long sys_msgctl(int msqid, in
break;
}
case IPC_RMID:
-   freeque(ns, msq);
+   freeque(ns, &msq->q_perm);
break;
}
err = 0;
Index: b/ipc/namespace.c
===
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -44,6 +44,36 @@ struct ipc_namespace *copy_ipcs(unsigned
return new_ns;
 }
 
+/*
+ * free_ipcs - free all ipcs of one type
+ * @ns:   the namespace to remove the ipcs from
+ * @ids:  the table of ipcs to free
+ * @free: the function called to free each individual ipc
+ *
+ * Called for each kind of ipc when an ipc_namespace exits.
+ */
+void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
+  void (*free)(struct ipc_namespace *, struct kern_ipc_perm *))
+{
+   struct kern_ipc_perm *perm;
+   int next_id;
+   int total, in_use;
+
+   down_write(&ids->rw_mutex);
+
+   in_use = ids->in_use;
+
+   for (total = 0, next_id = 0; total < in_use; next_id++) {
+   perm = idr_find(&ids->ipcs_idr, next_id);
+   if (perm == NULL)
+   continue;
+   ipc_lock_by_ptr(perm);
+   free(ns, perm);
+   total++;
+   }
+   up_write(&ids->rw_mutex);
+}
+
 void free_ipc_ns(struct kref *kref)
 {
struct ipc_namespace *ns;
Index: b/ipc/sem.c
===
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -94,7 +94,7 @@
 #define sem_buildid(id, seq)   ipc_buildid(id, seq)
 
 static int newary(struct ipc_namespace *, struct ipc_params *);
-static void freeary(struct ipc_namespace *, struct sem_array *);
+static void freeary(struct ipc_namespace *, struct kern_ipc_perm *);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 #endif
@@ -129,25 +129,7 @@ void sem_init_ns(struct ipc_namespace *n
 #ifdef CONFIG_IPC_NS
 void sem_exit_ns(struct ipc_namespace *ns)
 {
-   struct sem_array *sma;
-   struct kern_ipc_perm *perm;
-   int next_id;
-

Re: [kvm-devel] [PATCH 3/3] virtio PCI device

2007-11-23 Thread Anthony Liguori

Avi Kivity wrote:

Anthony Liguori wrote:
Well please propose the virtio API first and then I'll adjust the PCI 
ABI.  I don't want to build things into the ABI that we never 
actually end up using in virtio :-)


  


Move ->kick() to virtio_driver.


Then on each kick, all queues have to be checked for processing?  What 
devices do you expect this would help?


I believe Xen networking uses the same event channel for both rx and 
tx, so in effect they're using this model.  Long time since I looked 
though,


I would have to look, but since rx/tx are rather independent actions, 
I'm not sure that you would really save that much.  You still end up 
doing the same number of kicks unless I'm missing something.
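
The cost difference being discussed can be sketched with a toy model: a per-queue kick lets the host look at one queue, while a per-device kick forces a scan of all of them. The structures below are invented for illustration and do not reflect the virtio API.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_QUEUES 4

struct queue { bool pending; int serviced; };
struct device { struct queue q[NR_QUEUES]; };

/* Per-queue kick: the host is told which queue to look at. */
static int kick_queue(struct device *d, int idx)
{
	if (d->q[idx].pending) {
		d->q[idx].pending = false;
		d->q[idx].serviced++;
	}
	return 1;               /* queues examined */
}

/* Per-device kick: the host has to scan every queue for work. */
static int kick_device(struct device *d)
{
	int examined = 0;
	for (int i = 0; i < NR_QUEUES; i++)
		examined += kick_queue(d, i);
	return examined;
}
```

The number of kicks is the same in both cases; what changes is how many queues the host examines per kick.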


I was thinking more along the lines that a hypercall-based device 
would certainly be implemented in-kernel whereas the current device 
is naturally implemented in userspace.  We can simply use a different 
device for in-kernel drivers than for userspace drivers.  


Where the device is implemented is an implementation detail that 
should be hidden from the guest, isn't that one of the strengths of 
virtualization?  Two examples: a file-based block device implemented 
in qemu gives you fancy file formats with encryption and compression, 
while the same device implemented in the kernel gives you a 
low-overhead path directly to a zillion-disk SAN volume.  Or a 
user-level network device capable of running with the slirp stack and 
no permissions vs. the kernel device running copyless most of the time 
and using a dma engine for the rest but requiring you to be good 
friends with the admin.


The user should expect zero reconfigurations moving a VM from one 
model to the other.


I'm wary of introducing the notion of hypercalls to this device because 
it makes the device VMM specific.  Maybe we could have the device 
provide an option ROM that was treated as the device "BIOS" that we 
could use for kicking and interrupt acking?  Any idea of how that would 
map to Windows?  Are there real PCI devices that use the option ROM 
space to provide what's essentially firmware?  Unfortunately, I don't 
think an option ROM BIOS would map well to other architectures.


None of the PCI devices currently work like that in QEMU.  It would 
be very hard to make a device that worked this way because the 
order in which values are written matters a whole lot.  For instance, 
if you wrote the status register before the queue information, the 
driver could get into a funky state.
  


I assume you're talking about restore?  Isn't that atomic?


If you're doing restore by passing the PCI config blob to a registered 
routine, then sure, but that doesn't seem much better to me than just 
having the device generate that blob in the first place (which is what 
we have today).  I was assuming that you would want to use the existing 
PIO/MMIO handlers to do restore by rewriting the config as if the guest was.



Not much of an argument, I know.


wrt. number of queues, 8 queues will consume 32 bytes of pci space 
if all you store is the ring pfn.



You also at least need a num argument which takes you to 48 or 64 
depending on whether you care about strange formatting.  8 queues may 
not be enough either.  Eric and I have discussed whether the 9p 
virtio device should support multiple mounts per-virtio device and if 
so, whether each one should have its own queue.  Any device that 
supports this sort of multiplexing will very quickly start using a 
lot of queues.
  


Make it appear as a pci function?  (though my feeling is that multiple 
mounts should be different devices; we can then hotplug mountpoints).


We may run out of PCI slots though :-/

I think most types of hardware have some notion of a selector or 
mode.  Take a look at the LSI adapter or even VGA.


  


True.  They aren't fun to use, though.


I don't think they're really any worse :-)

Regards,

Anthony Liguori



Re: [patch/backport] CFS scheduler, -v24, for v2.6.24-rc3, v2.6.23.8, v2.6.22.13, v2.6.21.7

2007-11-23 Thread Durand
Ingo,

I downloaded the updated cfs-v24 patch and applied it to 2.6.22.13. It compiled
and ran fine. Suspend and hibernate are working on my nc6000 laptop now. Now I'm
off to compile and run 2.6.22.14. 

--
Thanks,

Durand
--- Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Durand <[EMAIL PROTECTED]> wrote:
> 
> > Ingo,
> > 
> > Just applied this patch to 2.6.22.13 and 2.6.22.14. Compiles and runs 
> > fine but on my laptop, it prevents suspending and hibernating with 
> > "one tasks refuses to freeze" load_balance_mo.
> 
> please re-download the v24 patch, it should have this bug fixed.
> 
>   Ingo
> 



  


