Re: [bug] xfrm_state_lock: possible circular locking dependency detected
* Herbert Xu <[EMAIL PROTECTED]> wrote:

> On Fri, Nov 23, 2007 at 04:38:51PM +0100, Ingo Molnar wrote:
> >
> > DaveJ's Fedora 8 rpm for 2.6.24 works pretty well, except for the
> > networking related lockdep assert attached below, which happened while
> > starting up ipsec. Let me know if you need any more info - it's a pretty
> > stock setup.
>
> Thanks for the report Ingo!
>
> This is indeed a regression caused by:
>
> commit 050f009e16f908932070313c1745d09dc69fd62b
> Author: Herbert Xu <[EMAIL PROTECTED]>
> Date: Tue Oct 9 13:31:47 2007 -0700
>
> [IPSEC]: Lock state when copying non-atomic fields to user-space
>
> For 2.6.24 I'm simply going to revert this change since that just puts
> us back to the same state we've been for the last few years.
>
> For 2.6.25 I'll do a proper fix by making sure that every xfrm state
> user obeys the rule that if x->lock is to be taken with
> xfrm_state_lock then it must be done from within.

ok, great. I cannot test the revert because i only run distro kernels on this box so i can only confirm that the bug is gone once your revert is upstream and DaveJ has built a new Fedora kernel for it (which is 1-2 days after the commit goes upstream). So consider it fixed once you do the revert and i'll re-report it if i see any similar assert on a kernel that has this commit reverted.

	Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
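The ordering rule Herbert states (x->lock may only be taken while xfrm_state_lock is already held, never the other way round) is the usual way to rule out the circular dependency that lockdep reports. A minimal userspace sketch with POSIX mutexes follows; `table_lock`, `struct state` and `copy_state_fields()` are hypothetical stand-ins for illustration, not the xfrm code.

```c
#include <pthread.h>

/* Hypothetical stand-ins: table_lock plays the role of xfrm_state_lock,
 * state.lock plays the role of x->lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

struct state {
	pthread_mutex_t lock;
	int value;
};

/* Every path takes the outer lock first and the per-state lock inside it,
 * so no thread can ever hold them in the opposite order. */
int copy_state_fields(struct state *x)
{
	int snapshot;

	pthread_mutex_lock(&table_lock);
	pthread_mutex_lock(&x->lock);
	snapshot = x->value;
	pthread_mutex_unlock(&x->lock);
	pthread_mutex_unlock(&table_lock);
	return snapshot;
}
```

If any other path took the per-state lock first and then the table lock, two threads could each hold one lock while waiting for the other, which is the pattern the lockdep assert warns about.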
Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets
Eric Dumazet wrote:
> 1) Can the fd passing with recvmsg() on AF_UNIX also gets O_CLOEXEC
> support ?

Already there, see MSG_CMSG_CLOEXEC.

> 2) Why this O_NONBLOCK ability is needed for sockets ? Is it a security
> issue, and if yes could you remind it to me ?

No security issue. But look at any correct network program, all need to set the mode to non-blocking. Adding this support to the syscall comes at almost no cost and it cuts the cost for every program down by one or two syscalls.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
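The syscall saving can be sketched by comparing the two ways of obtaining a non-blocking socket. This sketch assumes the `SOCK_NONBLOCK` spelling that later shipped in Linux 2.6.27 and glibc; the v5 patch under discussion may encode the flag differently. The function names are made up for the example.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Classic sequence: one syscall to create the socket, two more
 * (F_GETFL, F_SETFL) to switch it to non-blocking mode. */
int nonblocking_socket_classic(void)
{
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	int flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Single syscall: the flag rides along in the type argument. */
int nonblocking_socket_atomic(void)
{
	return socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);
}

/* Helper for checking the result: 1 if O_NONBLOCK is set, 0 if not,
 * -1 on error. */
int is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	return flags < 0 ? -1 : !!(flags & O_NONBLOCK);
}
```

Both functions return a descriptor in the same state; the second one gets there without the extra fcntl() round trips.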
Re: [PATCHv5 4/5] Allow setting O_NONBLOCK flag for new sockets
Ulrich Drepper wrote:
> This patch adds support for setting the O_NONBLOCK flag of the file
> descriptors returned by socket, socketpair, and accept.

Thanks Ulrich for this v5 series. I have two more questions.

1) Can the fd passing with recvmsg() on AF_UNIX also gets O_CLOEXEC support ?

(In my understanding, only accept(), socket(), socketcall() and socketpair() are handled, so it might work on i386 (because recvmsg() is multiplexed under socketcall), but not on x86_64.)

2) Why this O_NONBLOCK ability is needed for sockets ? Is it a security issue, and if yes could you remind it to me ?

Thanks
Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver
On Nov 24, 2007 2:43 PM, David Woodhouse <[EMAIL PROTECTED]> wrote:
>
> On Fri, 2007-11-23 at 17:04 -0500, Robin Getz wrote:
> > It could be a runtime if() but we don't currently have the is_mach() all set
> > up properly today.
> >
> > This is because on most systems that Blackfin ships on - memory is the
> > dominant cost of the system, and end users don't want to take either the
> > storage (flash) hit of having code they don't use, or the run time (DRAM)
> > overhead. They are fine with compiling 2 kernels for two platforms if it
> > means things are cheaper. :)
> >
> > That being said, we still need to go back, and add things properly - and just
> > let gcc optimise things away if it is not used - c code is more maintainable
> > than all the ifdefs we have today.
> >
> > This is the goal - it will just take a little bit to get there.
>
> For now I suspect you could at least define machine_is_bf52x() and
> machine_is_bf54x() which are hard-coded to either zero or one according
> to the configuration, and at least you wouldn't need to add ifdefs to
> drivers.
>

We have plans to do this, but maybe cpu_is_bf52x() and cpu_is_bf54x() are better.

Thanks.
-Bryan
Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver
On Fri, 2007-11-23 at 17:04 -0500, Robin Getz wrote:
> It could be a runtime if() but we don't currently have the is_mach() all set
> up properly today.
>
> This is because on most systems that Blackfin ships on - memory is the
> dominant cost of the system, and end users don't want to take either the
> storage (flash) hit of having code they don't use, or the run time (DRAM)
> overhead. They are fine with compiling 2 kernels for two platforms if it
> means things are cheaper. :)
>
> That being said, we still need to go back, and add things properly - and just
> let gcc optimise things away if it is not used - c code is more maintainable
> than all the ifdefs we have today.
>
> This is the goal - it will just take a little bit to get there.

For now I suspect you could at least define machine_is_bf52x() and machine_is_bf54x() which are hard-coded to either zero or one according to the configuration, and at least you wouldn't need to add ifdefs to drivers.

-- 
dwmw2
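The suggestion can be sketched as follows. The `CONFIG_BF52x`/`CONFIG_BF54x` switches and `nand_controller_name()` are hypothetical names chosen for the example (a real kernel would get the config symbols from Kconfig); only the `machine_is_*()` names come from the mail.

```c
/* Hypothetical config switch: default to the BF52x build unless the
 * compiler is told otherwise with -DCONFIG_BF54x. */
#ifndef CONFIG_BF54x
#define CONFIG_BF52x 1
#endif

#ifdef CONFIG_BF52x
#define machine_is_bf52x() 1
#else
#define machine_is_bf52x() 0
#endif

#ifdef CONFIG_BF54x
#define machine_is_bf54x() 1
#else
#define machine_is_bf54x() 0
#endif

/* Driver code stays free of #ifdef. Each condition is a compile-time
 * constant, so the compiler can discard the branch that is never taken. */
const char *nand_controller_name(void)
{
	if (machine_is_bf54x())
		return "bf54x-nfc";
	if (machine_is_bf52x())
		return "bf52x-nfc";
	return "unknown";
}
```

The flash and DRAM cost Robin worries about stays the same as with #ifdef, because the dead branch never reaches the object file, while the driver source reads as ordinary C.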
Re: 2.6.24-rc3-mm1: I/O error, system hangs
On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote: > Le 23.11.2007 12:38, Hannes Reinecke a écrit : > > Hannes Reinecke wrote: > >> Laurent Riffard wrote: > >>> Le 21.11.2007 23:41, Andrew Morton a écrit : > On Wed, 21 Nov 2007 22:45:22 +0100 > Laurent Riffard <[EMAIL PROTECTED]> wrote: > > > Le 21.11.2007 05:45, Andrew Morton a écrit : > >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/ > > Hello, > > > > My system hangs shortly after I logged in Gnome desktop. SysRq-W shows > > that a bunch of task are blocked in "D" state, they seem to wait for > > some I/O completion. I can try to hand-copy some data if requested. > > > > I found these messages in dmesg: > > > > ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 > > EXT3-fs: mounted filesystem with ordered data mode. > > sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT > > driverbyte=DRIVER_OK,SUGGEST_OK > > end_request: I/O error, dev sda, sector 16460 > > ReiserFS: sda7: found reiserfs format "3.6" with standard journal > > ReiserFS: sda7: using ordered data mode > > -- > > ReiserFS: sda7: Using r5 hash to sort names > > sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT > > driverbyte=DRIVER_OK,SUGGEST_OK > > end_request: I/O error, dev sdb, sector 19632 > > sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT > > driverbyte=DRIVER_OK,SUGGEST_OK > > end_request: I/O error, dev sdb, sector 40037363 > > Adding 1048568k swap on /dev/mapper/vglinux1-lvswap. Priority:-1 > > extents:1 across:1048568k > > lp0: using parport0 (interrupt-driven). > > > > These errors occur *only* with 2.6.24-rc3-mm1, they are 100% > > reproducible. > > 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine. > > > > Maybe something is broken in pata_via driver ? > > > Could be - > libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch > and > pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch > touch pata_via.c. > >>> None of the above... 
> >>>
> >>> I did a bisection, it spotted git-scsi-misc.patch.
> >>> I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
> >>>
> >>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not
> >>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other
> >>> commits are touching documentation or drivers I don't use. I'll try
> >>> to revert only this one this evening.
>
> I can confirm : reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
> does fix the problem.
>
> >> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error where
> >> I shouldn't. Checking ...
> >>
> > Ok, found it. We are blocking even special commands (ie requests with PREEMPT not set)
> > when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.
>
> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O
> errors.

I think the problem is the way we treat BLOCKED and QUIESCED (the latter is the state that the domain validation uses and which we cannot kill fastfail on).

It's definitely wrong to kill fastfail requests when the state is QUIESCE.

This patch (which is applied on top of Hannes original) separates the BLOCK and QUIESCE states correctly ... does this fix the problem?

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 13e7e09..a7cf23a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1279,18 +1279,21 @@ int scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 				    "rejecting I/O to dead device\n");
 			ret = BLKPREP_KILL;
 			break;
-		case SDEV_QUIESCE:
 		case SDEV_BLOCK:
 			/*
-			 * If the devices is blocked we defer normal commands.
-			 */
-			if (!(req->cmd_flags & REQ_PREEMPT))
-				ret = BLKPREP_DEFER;
-			/*
 			 * Return failfast requests immediately
 			 */
 			if (req->cmd_flags & REQ_FAILFAST)
 				ret = BLKPREP_KILL;
+
+			/* fall through */
+
+		case SDEV_QUIESCE:
+			/*
+			 * If the devices is blocked we defer normal commands.
+			 */
+			if (!(req->cmd_flags & REQ_PREEMPT))
+				ret = BLKPREP_DEFER;
 			break;
 		default:
 			/*
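To see what the two patched cases decide for each combination of flags, the switch can be modelled in plain C. This is only a sketch of the hunk as posted: the enums and flag values are hypothetical stand-ins for the kernel's constants, and the starting value `BLKPREP_OK` is an assumption, since the rest of scsi_prep_state_check() is not shown in the diff.

```c
/* Hypothetical stand-ins for the kernel's constants. */
enum sdev_state { SDEV_RUNNING, SDEV_BLOCK, SDEV_QUIESCE };
enum prep_result { BLKPREP_OK, BLKPREP_KILL, BLKPREP_DEFER };

#define REQ_FAILFAST 0x1u
#define REQ_PREEMPT  0x2u

/* Mirrors the two patched cases of the switch, statement for statement. */
enum prep_result prep_state_check(enum sdev_state state, unsigned int cmd_flags)
{
	enum prep_result ret = BLKPREP_OK;   /* assumed initial value */

	switch (state) {
	case SDEV_BLOCK:
		/* Return failfast requests immediately */
		if (cmd_flags & REQ_FAILFAST)
			ret = BLKPREP_KILL;
		/* fall through */
	case SDEV_QUIESCE:
		/* Defer normal (non-PREEMPT) commands */
		if (!(cmd_flags & REQ_PREEMPT))
			ret = BLKPREP_DEFER;
		break;
	default:
		break;
	}
	return ret;
}
```

In this model a failfast request on a quiesced device is deferred instead of killed, which is the stated goal of the patch. The model also shows that in the SDEV_BLOCK case the DEFER assignment runs after the KILL assignment, so a failfast request without REQ_PREEMPT ends up deferred as well; whether that is intended is worth checking against the real function.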
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> Also io->pending may need better protection - atomic, but missing memory
> barriers? (May be getting away without sometimes due to side-effects of
> other function calls, but needs doing properly.)

If it's using atomic_dec_and_test then that comes with an implicit memory barrier.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
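A userspace analogue of the pattern, using C11 `<stdatomic.h>` rather than the kernel's atomic_t API, is sketched below. `struct io_ctx` and `io_dec_pending()` are hypothetical names for illustration and are not the dm-crypt code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature of an io descriptor with a pending counter. */
struct io_ctx {
	atomic_int pending;   /* outstanding references */
	int result;           /* written by workers before they drop their ref */
};

/* Drop one reference; returns true only for the caller that dropped the
 * last one. atomic_fetch_sub() defaults to memory_order_seq_cst, so writes
 * made before the decrement are visible to whoever sees the count hit zero,
 * without a separate barrier call. */
bool io_dec_pending(struct io_ctx *io)
{
	return atomic_fetch_sub(&io->pending, 1) == 1;
}
```

The kernel's atomic_dec_and_test() plays the same role: it is a value-returning atomic operation, and those are documented as fully ordered, which is the implicit barrier Herbert refers to.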
Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree
I believe it was you who once eloquently observed that, at its heart, the POSIX (sic) capabilities model was all about providing a mechanism for overriding the prevailing security policy (be it MAC or DAC or whatever) in a defined way.

Casey Schaufler wrote:
> Now I know that there are lots of people who don't share my
> views on granularity, but I have lots of experience with this
> and the cases where it actually makes sense to break the MAC
> capabilities up are rare.
>
> That's my going in position, at any rate. I'm always open to
> finding out why I'm wrong.

It's not so much why you are wrong, as being clear that we're not using a generic name and inadvertently limiting ourselves to a SMACK-like model...

It feels to me as if a MAC "override capability" is, if true to its name, extra to the MAC model; any MAC model that needs an 'override' to function seems under-specified... SELinux clearly feels no need for one, and browsing through your SMACK patch, there are many instances where this capability is used as a convenience privileged override. However, in other situations, it appears as if the capability is required for basic SMACK operations to succeed.

My sense is that there is a case to be made for:

	CAP_MAC_ADMIN and CAP_MAC_OVERRIDE

here. The former being for cases where SMACK (or whatever MAC supports it) requires privilege to perform a privileged MAC operation, and the latter for saying "OK, I'm without a paddle but need one" (or words to that effect).

Cheers

Andrew
Re: [RFC/PATCH] SO_NO_CHECK for IPv6
David Schwartz <[EMAIL PROTECTED]> wrote:
>
>> Regardless of whatever verifications your application is doing
>> on the data, it is not checksumming the ports and that's what
>> the pseudo-header is helping with.
>
> So what? We are in the case where the data has already gotten to him. If it
> got to him in error, he'll reject it anyway. The receive checksum check will
> only reject packets that he would reject anyway. That makes it needless.

What if it goes to the wrong recipient who doesn't have the upper-level checksums? This is the whole point: IPv6, unlike IPv4, does not have IP header checksums, so the upper layer needs to protect it by checksumming the pseudo-header.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
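The pseudo-header argument can be illustrated with a small checksum routine. This is a sketch following the layout in the IPv6 specification (source address, destination address, 32-bit upper-layer length, three zero bytes, next header 17 for UDP); `udp6_checksum()` and `sum_bytes()` are names made up for the example, and the code is not taken from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte string to a running ones'-complement sum, reading it as
 * big-endian 16-bit words. An odd trailing byte is padded with zero. */
static uint64_t sum_bytes(uint64_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/* Checksum of a UDP datagram (header plus payload) over the IPv6
 * pseudo-header. Pass the datagram with its checksum field zeroed to
 * compute the value to transmit. */
uint16_t udp6_checksum(const uint8_t src[16], const uint8_t dst[16],
		       const uint8_t *udp, uint32_t udp_len)
{
	const uint8_t tail[8] = {
		(uint8_t)(udp_len >> 24), (uint8_t)(udp_len >> 16),
		(uint8_t)(udp_len >> 8), (uint8_t)udp_len,
		0, 0, 0, 17              /* three zero bytes, next header = UDP */
	};
	uint64_t sum = 0;

	sum = sum_bytes(sum, src, 16);
	sum = sum_bytes(sum, dst, 16);
	sum = sum_bytes(sum, tail, sizeof(tail));
	sum = sum_bytes(sum, udp, udp_len);

	while (sum >> 16)                /* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	uint16_t csum = (uint16_t)~sum;
	return csum ? csum : 0xffff;     /* a computed 0 is sent as 0xffff */
}
```

Because both addresses enter the sum, a datagram whose destination address was corrupted in transit fails verification at whichever host receives it, even though that host never sees the application-level integrity check. Running the same function over a received datagram with its checksum field in place yields 0xffff when the datagram is intact.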
Re: [PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)
Tejun Heo wrote:
> Robert Hancock wrote:
>> We need to run any DMA command with result taskfile requested in ADMA mode
>> when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
>> in ADMA mode which is not allowed. Enforce this with BUG_ON() since data
>> corruption could potentially result if this happened. Also WARN_ON() if we try
>> and send result taskfile commands while NCQ commands are still active, since the
>> hardware doesn't allow this.
>>
>> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>>
>> @@ -1425,9 +1427,17 @@
>> +	/* We can't handle result taskfile with NCQ commands active, since
>> +	   retrieving the taskfile switches us out of ADMA mode and would abort
>> +	   existing commands. */
>> +	WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
>> +		(qc->ap->qc_allocated & ~(1 << qc->tag)));
>
> I owe an apology here. ap->qc_allocated & ~(1 << qc->tag) test isn't
> correct. Sorry. qc deferring happens after qc is allocated so the
> condition can trigger (although it should be rare) even when nothing is
> going wrong, so I guess it should be WARN_ON((qc->flags &
> ATA_QCFLAG_RESULT_TF) && link->sactive).

Or, just make the assumption clear by not allowing NCQ w/ RESULT_TF at all.

	if (unlikely(qc->tf.protocol == ATA_PROT_NCQ &&
		     (qc->flags & ATA_QCFLAG_RESULT_TF))) {
		ata_dev_printk(qc->dev, KERN_ERR,
			       "NCQ w/ RESULT_TF not allowed\n");
		return AC_ERR_SYSTEM;
	}

Thanks.

-- 
tejun
Re: [PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)
Robert Hancock wrote:
> We need to run any DMA command with result taskfile requested in ADMA mode
> when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
> in ADMA mode which is not allowed. Enforce this with BUG_ON() since data
> corruption could potentially result if this happened. Also WARN_ON() if we try
> and send result taskfile commands while NCQ commands are still active, since the
> hardware doesn't allow this.
>
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>
> @@ -1425,9 +1427,17 @@
> +	/* We can't handle result taskfile with NCQ commands active, since
> +	   retrieving the taskfile switches us out of ADMA mode and would abort
> +	   existing commands. */
> +	WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
> +		(qc->ap->qc_allocated & ~(1 << qc->tag)));

I owe an apology here. ap->qc_allocated & ~(1 << qc->tag) test isn't correct. Sorry. qc deferring happens after qc is allocated so the condition can trigger (although it should be rare) even when nothing is going wrong, so I guess it should be WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) && link->sactive).

Sorry. :-)

-- 
tejun
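The difference between the two tests can be sketched with plain bitmaps. The helper names below are hypothetical; the expressions mirror `qc_allocated & ~(1 << qc->tag)` and `link->sactive` from the mail, with the assumption that both are 32-bit maps holding one bit per command tag.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if any command other than `tag` is marked in the allocation bitmap.
 * Mirrors (qc_allocated & ~(1 << tag)) from the original patch. */
bool other_tags_allocated(uint32_t qc_allocated, unsigned int tag)
{
	return (qc_allocated & ~(UINT32_C(1) << tag)) != 0;
}

/* True if any NCQ command is actually in flight.
 * Mirrors the proposed replacement test on sactive. */
bool ncq_commands_active(uint32_t sactive)
{
	return sactive != 0;
}
```

Consider the command under test holding tag 0 while a second command has been allocated tag 3 but is still deferred: `qc_allocated` is 0x9 and `sactive` is 0. The first test fires although nothing is in flight, while the second stays quiet, which is the false positive described above.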
Re: [patch 2/4] Timerfd v2 - new timerfd API
Hi Davide,

[...]
> +asmlinkage long sys_timerfd_create(int clockid, int flags)
> {
> -	int error;
> +	int error, ufd;
> 	struct timerfd_ctx *ctx;
> 	struct file *file;
> 	struct inode *inode;
> -	struct itimerspec ktmr;
> -
> -	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> -		return -EFAULT;
>
> 	if (clockid != CLOCK_MONOTONIC &&
> 	    clockid != CLOCK_REALTIME)
> 		return -EINVAL;

Could I suggest here, the following placeholder addition:

	if (flags != 0)
		return -EINVAL;

Later that can be replaced with something like:

	if (flags & ~(O_NONBLOCK | O_CLOEXEC))
		return -EINVAL;

Having the first of the checks above will allow userland to determine what is implemented, rather than having non-zero flags silently ignored.

Cheers,

Michael

> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	init_waitqueue_head(&ctx->wqh);
> +	ctx->clockid = clockid;
> +	hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS);
> +
> +	error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]",
> +				 &timerfd_fops, ctx);
> +	if (error) {
> +		kfree(ctx);
> +		return error;
> +	}
> +
> +	return ufd;
> +}
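The two stages of the check can be tried out in userspace. This is a minimal sketch; `check_flags_v1()` and `check_flags_v2()` are hypothetical names, and only the two conditions come from the mail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* Placeholder check: no flags are defined yet, so anything non-zero is
 * rejected and callers can detect that flags are unsupported. */
int check_flags_v1(int flags)
{
	return flags != 0 ? -EINVAL : 0;
}

/* Later revision: accept the known flags, still reject any unknown bit. */
int check_flags_v2(int flags)
{
	if (flags & ~(O_NONBLOCK | O_CLOEXEC))
		return -EINVAL;
	return 0;
}
```

A program built against the later interface can pass O_NONBLOCK and fall back to fcntl() when an older kernel answers -EINVAL. If the first kernel had silently ignored the flag, the program would have no way to tell that its descriptor was still blocking.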
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Robert Hancock wrote:
> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
> on systems with memory located above 4GB. We need to delay setting the 64-bit
> DMA mask until the PRD table and padding buffer are allocated so that they don't
> get allocated above 4GB and break legacy mode (which is needed for ATAPI
> devices).
>
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

Acked-by: Tejun Heo <[EMAIL PROTECTED]>

Thanks.

-- 
tejun
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 24, 2007 4:49 AM, Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> > ... or I just don't see the bug.
>
> See my earlier post in this thread: there's a race in the write loop
> where a work struct could be used twice on the same queue.
> (Needs data structure change to fix that, which nobody has attempted
> to do yet.)

As I wrote in an earlier post: I did see this lockdep message even with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, so the work struct is not used in the write loop.

> BTW To eliminate any internal lockdep concerns (and people say there
> should be no problem) temporarily add a second struct instead of reusing
> one on two queues.

I think this might really be a lockdep bug, but as I'm not fluent enough in C, please check whether my logic is correct:

The freed-locked-lock-test is the only function that uses this in lockdep.c:

static inline int in_range(const void *start, const void *addr, const void *end)
{
	return addr >= start && addr <= end;
}

This will return true if addr is in the range from start (inclusive) to end (inclusive).

But debug_check_no_locks_freed() seems to do:

	const void *mem_to = mem_from + mem_len
-> mem_to is the last byte of the freed range, that fits in_range
	lock_from = (void *)hlock->instance;
-> first byte of the lock
	lock_to = (void *)(hlock->instance + 1);
-> first byte of the next lock, not last byte of the lock that is being checked!

(Or am I reading this wrong?)

The test is:

	if (!in_range(mem_from, lock_from, mem_to) &&
	    !in_range(mem_from, lock_to, mem_to))
		continue;

So it tests if the first byte of the lock is in the range that is freed -> OK
And if the first byte of the *next* lock is in the range that is freed -> Not OK.

That would also explain the rather strange output:

=========================
[ BUG: held lock freed! ]
-------------------------
kcryptd/1022 is freeing memory 81011EBEFB00-81011EBEFB3F, with a lock still held there!
 (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
2 locks held by kcryptd/1022:
 #0: (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
 #1: (>work#2){--..}, at: [] run_workqueue+0x129/0x210

That claims that the lock of the *workqueue* struct, not the work struct, is getting freed! But I'm still happily using the dm-crypt device, even 19 hours after that message.

So my current best guess at the source of this message is that, with the change in the ref counting, it is now possible that the work struct is really getting freed before the workqueue function returns. But as the comment in run_workqueue() says, that is still legal. But now the first byte of the next lock is part of the freed memory and so the wrong "held lock freed" is triggered.

Torsten
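The boundary case in this analysis can be reproduced outside the kernel. The sketch below copies in_range() as quoted in the mail (with the pointer type narrowed to `const char *` so the comparisons are portable C) and wraps the described test in a hypothetical helper, `flagged_by_quoted_test()`. It follows the reading given in the mail and has not been checked against lockdep.c itself.

```c
#include <stddef.h>

/* The helper as quoted in the mail: both ends inclusive. */
static inline int in_range(const char *start, const char *addr, const char *end)
{
	return addr >= start && addr <= end;
}

/* The overlap test as described in the mail. Returns 1 if a lock of
 * lock_size bytes at `lock` would be reported as lying in the freed block. */
int flagged_by_quoted_test(const char *mem_from, size_t mem_len,
			   const char *lock, size_t lock_size)
{
	const char *mem_to = mem_from + mem_len;
	const char *lock_from = lock;
	const char *lock_to = lock + lock_size;   /* one past the lock */

	return in_range(mem_from, lock_from, mem_to) ||
	       in_range(mem_from, lock_to, mem_to);
}

/* Half-open comparison for reference: [lock, lock + size) against
 * [mem_from, mem_from + mem_len). */
int really_overlaps(const char *mem_from, size_t mem_len,
		    const char *lock, size_t lock_size)
{
	return lock < mem_from + mem_len && mem_from < lock + lock_size;
}
```

With a 64-byte lock placed directly in front of a 64-byte freed block, `lock_to` equals `mem_from`, so the quoted test flags the lock although no byte is shared, while the half-open comparison does not. That matches the kind of false report suspected above.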
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Saturday 24 November 2007 06:53:30 Andi Kleen wrote:
> This serves as a documentation
> on what is considered internal. And if some obscure module (in or
> out of tree) wants to use an internal interface they first have
> to send the module maintainer a patch and get some review this way.

So, you're saying that there's a problem with in-tree modules using symbols they shouldn't? Can you give an example?

> I believe that is fairly important in tree too because the
> kernel has become so big now that review cannot be the only
> enforcement mechanism for this anymore.

If people aren't reviewing, this won't make them review. I don't think the problem is that people are conniving to avoid review.

> Another secondary reason is that there are too many exported interfaces
> in general.

Probably, but this doesn't reduce it.

> Several distributions have policies that require to
> keep the changes to these exported interfaces minimal and that
> is very hard with thousands of exported symbols. With name spaces
> the number of truly publicly exported symbols will hopefully
> shrink to a much smaller, more manageable set.

*This* makes sense. But it's not clear that the burden should be placed on kernel coders. You can create a list yourself. How do I tell the difference between "truly publicly exported" symbols and others?

If a symbol has more than one in-tree user, it's hard to argue against an out-of-tree module using the symbol, unless you're arguing against *all* out-of-tree modules.

Sorry,
Rusty.
Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree
--- Andrew Morgan <[EMAIL PROTECTED]> wrote:

> Casey Schaufler wrote:
> > In the end we can call it CAP_LATE_FOR_DINNER if that's the only way
> > I can move forward. CAP_MAC_OVERRIDE is the obvious partner to
> > CAP_DAC_OVERRIDE, so that's still my preference. CAP_SMACK_OVERRIDE
> > unnecessarily ties it to one LSM, and in spite of what some people
> > still seem to think, I see more LSMs in the pipeline.
>
> I'd personally not like to see SMACK appear in a capability name. No
> offense Casey, but SMACK may be displaced with YAMAC (*) someday, and
> I'd hate to have wasted a capability on it.

No offense taken. Technology continues marching forward and all that.

> Using CAP_MAC_OVERRIDE makes
> sense to me - even if its not (yet/ever) honored by all MAC LSMs.

Thanks.

> I do have a question about whether one capability is sufficient in
> general for MAC. Looking at the:
>
> http://wt.xpilot.org/publications/posix.1e/download.html
>
> last draft, there are no less than 5 capabilities (p173) allocated for
> MAC. Presumably there was a good reason for 5 and not 1 back then -
> could you summarize what is different now?

There are to my mind two important differences. The first is that my experience with Trusted Irix (Trix from here on), which used (uses?) capabilities and MAC together, is that the granularity is lost on 99 44/100% of programs, programmers, evaluators, admins, and problems. You just don't get that many cases where it actually gets you anything to have less than all the MAC capabilities. Applications that care about MAC to the extent that they use the capabilities tend to use the lot, if not all the time, in certain circumstances. I'm afraid that I am not a major fan of fine grained privilege based on my experience with it.

The second and perhaps more important reason is that the POSIX draft assumed a Bell & LaPadula sensitivity model, or at least a model very much like it. What would CAP_MAC_DOWNGRADE mean on a Smack system configured:

    OneHand      OtherHand    r---
    OtherHand    GrippingHand r---
    GrippingHand OneHand      r---

What would CAP_MAC_UPGRADE mean, for that matter? It's even worse to consider that the relationships can change. CAP_MAC_READ and CAP_MAC_WRITE still make sense, as does CAP_MAC_RELABEL_SUBJ. But if you have CAP_MAC_WRITE you can do pretty much the same damage as if you have CAP_MAC_RELABEL_SUBJ, and the other way around, and if you're not going to use one of the other capabilities after you find out interesting things using CAP_MAC_READ it's hard to figure why you'd bother.

Now I know that there are lots of people who don't share my views on granularity, but I have lots of experience with this and the cases where it actually makes sense to break the MAC capabilities up are rare.

That's my going in position, at any rate. I'm always open to finding out why I'm wrong.

Thank you.

Casey Schaufler
[EMAIL PROTECTED]
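The three-rule configuration in the mail can be written out as a lookup table to show why "upgrade" and "downgrade" have no obvious meaning there. This is a toy sketch: `struct rule` and `may_read()` are hypothetical, only explicit rules are consulted, and whatever built-in defaults Smack applies (for example for identical labels) are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* One access rule: subject label, object label, access string. */
struct rule {
	const char *subject;
	const char *object;
	const char *access;
};

/* The three-rule configuration from the mail. */
static const struct rule rules[] = {
	{ "OneHand",      "OtherHand",    "r---" },
	{ "OtherHand",    "GrippingHand", "r---" },
	{ "GrippingHand", "OneHand",      "r---" },
};

/* 1 if an explicit rule grants `subject` read access to `object`. */
int may_read(const char *subject, const char *object)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (!strcmp(rules[i].subject, subject) &&
		    !strcmp(rules[i].object, object))
			return rules[i].access[0] == 'r';
	return 0;
}
```

Read access runs in a cycle (OneHand reads OtherHand, which reads GrippingHand, which reads OneHand) and is not transitive, so no label is "higher" or "lower" than another in the Bell & LaPadula sense.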
ipmi_watchdog can not reset the kernel panic machine
Build kernel-2.6.24-rc3. ipmi_watchdog cannot reset the machine after a kernel panic, and the watchdog never records the panic information in the IPMI SEL.

1. I disable auto reset on kernel panic with
   echo "0" > /proc/sys/kernel/panic
2. modprobe ipmi_watchdog timeout=120 action=reset
3. Load a driver that calls panic() when an ioctl calls into it.
4. Issue the ioctl to call into the driver and panic the system.

In wdog_panic_handler I added a printk, and it shows "ipmi_watchdog_state=WDOG_TIMEOUT_NONE", so the watchdog can never record the panic information in the IPMI SEL.

static int wdog_panic_handler(struct notifier_block *this,
			      unsigned long event,
			      void *unused)
{
	static int panic_event_handled = 0;

	/* On a panic, if we have a panic timeout, make sure to extend
	   the watchdog timer to a reasonable value to complete the
	   panic, if the watchdog timer is running. Plus the
	   pretimeout is meaningless at panic time. */
	if (watchdog_user && !panic_event_handled &&
	    ipmi_watchdog_state != WDOG_TIMEOUT_NONE) {
		/* Make sure we do this only once. */
		panic_event_handled = 1;

		timeout = 255;
		pretimeout = 0;
		panic_halt_ipmi_set_timeout();
	}

	return NOTIFY_OK;
}
Re: Laptop keyboard unusable when ACPI is active was Re: [2.6.22] i8042, ACPI, ipw2100 and issues reported by psmouse.c atkbd.c
On Sunday 21 October 2007 05:43, [EMAIL PROTECTED] wrote:

> I have emerged lm_sensors but can't get it running - it keeps saying "No
> sensors found!" and complaining about kernel drivers not properly setup.
> I have attached the output of sensors-detect, from which it seems that
> the kernel is OK.

In this case, getting sensors installed is the opposite of what you want to do. The idea is to simplify the system until it works, then figure out what simplification made it work.

i.e. disable sensors entirely by building a kernel with CONFIG_HWMON=n

If that makes things work, then it is a clue. If that was disabled already, then just keep it disabled.

cheers,
-Len
Re: Laptop keyboard unusable when ACPI is active
On Thursday 22 November 2007 02:24, [EMAIL PROTECTED] wrote:

> It is also important to note that this bug always comes with bug 8740
> http://bugzilla.kernel.org/show_bug.cgi?id=8740 (also confirmed and also
> an ACPI issue).

No, 8740 is not an ACPI issue.
http://bugzilla.kernel.org/show_bug.cgi?id=8740#c2

-Len
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> Before the cleanup *all* calls to crypt_dec_pending() was via crypt_endio().
> Now there is an additional call to crypt_dec_pending() to balance the
> additional ref placed into crypt_write_io_process(). And that one is
> not called from whatever context/thread cleans up after
> make_generic_request, but directly in the context/thread of the caller
> of crypt_write_io_process(), and that is kcryptd.

Please do look at the latest patches (always at http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html ) where you'll see I've already disentangled the mess of functions and given them more understandable names, so at least following the program flow is easier.

Read and write do the ref counting differently (but correctly AFAICT) - I want that changing, but held back from doing it without first checking whether the later patches (not yet reviewed) provide a reason to prefer one method over the other.

Alasdair
-- 
[EMAIL PROTECTED]
Re: Laptop keyboard unusable when ACPI is active
On Friday 23 November 2007 02:44, Mats Johannesson wrote:
> The bad interaction between ACPI controlled EC (embedded controller)
> and the i8042 interrupt handler is theorized about in detail at OLPCs
> http://dev.laptop.org/ticket/2401 - almost at the end of that page.
> Thanks to Daniele C for the link.

huh?

I believe that the OLPC XO1 does not run in ACPI mode and thus does not
use the ACPI EC driver to talk to the EC on their board. Presumably they
use some native embedded controller driver to talk to their platform
specific embedded controller.

I don't know why they call their interrupt an SCI. Per above, it can't be
an ACPI SCI. Presumably they call it that b/c their chipset documentation
calls it that too, on the (invalid) assumption that an ACPI-enabled OS and
firmware would be running on the hardware.

Please let me know if I'm wrong.

-Len
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Also io->pending may need better protection - atomic, but missing memory
barriers? (May be getting away without sometimes due to side-effects of
other function calls, but needs doing properly.)

[BTW Other device-mapper atomic_t usage also needs reviewing.]

Alasdair
--
[EMAIL PROTECTED]
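To make the ordering concern concrete, here is a minimal userspace sketch of a pending counter whose final decrement is ordered against earlier writes. It uses C11 <stdatomic.h>, not the kernel's atomic_t API, and struct io_ctx, io_get() and io_put() are hypothetical names chosen for illustration; nothing here is taken from dm-crypt.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an I/O context; not the dm-crypt structure. */
struct io_ctx {
	atomic_int pending;	/* outstanding references */
	int error;		/* written by a worker before it drops its ref */
};

/* Taking a reference needs atomicity only, so relaxed ordering is enough. */
static void io_get(struct io_ctx *io)
{
	atomic_fetch_add_explicit(&io->pending, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference. */
static bool io_put(struct io_ctx *io)
{
	/* Release: writes made before the drop (e.g. io->error) are
	 * published to whoever observes this decrement. */
	if (atomic_fetch_sub_explicit(&io->pending, 1,
				      memory_order_release) != 1)
		return false;

	/* Acquire: the thread doing the cleanup sees the writes of every
	 * other thread that dropped a reference before it. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

Only the caller for which io_put() returns true should read io->error and free the context; without the release/acquire pair that read could, in principle, miss a worker's update.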
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> ... or I just don't see the bug.

See my earlier post in this thread: there's a race in the write loop where
a work struct could be used twice on the same queue. (Needs data structure
change to fix that, which nobody has attempted to do yet.)

BTW To eliminate any internal lockdep concerns (and people say there
should be no problem) temporarily add a second struct instead of reusing
one on two queues.

Alasdair
--
[EMAIL PROTECTED]
Re: [PATCH] NET: dmfe: don't access configuration space in D3 state
On Saturday 24 November 2007 05:10:37 Jeff Garzik wrote:
> Maxim Levitsky wrote:
> > From 7e24227257f315e52fe0b494dc1253d2a0ce5dff Mon Sep 17 00:00:00 2001
> > From: Maxim Levitsky <[EMAIL PROTECTED]>
> > Date: Fri, 23 Nov 2007 01:15:36 +0200
> > Subject: [PATCH] NET: dmfe: don't access configuration space in D3 state
> > Accidentally I reversed the order of pci_save_state and
> > pci_set_power_state in .suspend()/.resume() callbacks
> >
> > Signed-off-by: Maxim Levitsky <[EMAIL PROTECTED]>
> > ---
> >  drivers/net/tulip/dmfe.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
>
> applied #upstream-fixes, after hand-editing patch changelog taken by
> git-am from email body

Hi,
Thanks,
Next time I will be more careful with changelogs.

Best regards,
Maxim Levitsky
Re: + smack-version-11c-simplified-mandatory-access-control-kernel.patch added to -mm tree
Casey Schaufler wrote:
> In the end we can call it CAP_LATE_FOR_DINNER if that's the only way
> I can move forward. CAP_MAC_OVERRIDE is the obvious partner to
> CAP_DAC_OVERRIDE, so that's still my preference. CAP_SMACK_OVERRIDE
> unnecessarily ties it to one LSM, and in spite of what some people
> still seem to think, I see more LSMs in the pipeline.

I'd personally not like to see SMACK appear in a capability name. No
offense Casey, but SMACK may be displaced with YAMAC (*) someday, and I'd
hate to have wasted a capability on it. Using CAP_MAC_OVERRIDE makes sense
to me - even if it's not (yet/ever) honored by all MAC LSMs.

I do have a question about whether one capability is sufficient in general
for MAC. Looking at the:

  http://wt.xpilot.org/publications/posix.1e/download.html

last draft, there are no less than 5 capabilities (p173) allocated for
MAC. Presumably there was a good reason for 5 and not 1 back then - could
you summarize what is different now?

Thanks

Andrew

(*) yet-another example of yet-another
Re: [PATCH 1/2] Blackfin SMC91x Driver: punt CONFIG_BFIN -- we already have CONFIG_BLACKFIN
Bryan Wu wrote:
From: Mike Frysinger <[EMAIL PROTECTED]>

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
---
 drivers/net/Kconfig  |    2 +-
 drivers/net/smc91x.h |    2 +-

applied 1-2 to #upstream-fixes
Re: [PATCH] NET: dmfe: don't access configuration space in D3 state
Maxim Levitsky wrote:
From 7e24227257f315e52fe0b494dc1253d2a0ce5dff Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <[EMAIL PROTECTED]>
Date: Fri, 23 Nov 2007 01:15:36 +0200
Subject: [PATCH] NET: dmfe: don't access configuration space in D3 state
Accidentally I reversed the order of pci_save_state and
pci_set_power_state in .suspend()/.resume() callbacks

Signed-off-by: Maxim Levitsky <[EMAIL PROTECTED]>
---
 drivers/net/tulip/dmfe.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

applied #upstream-fixes, after hand-editing patch changelog taken by
git-am from email body
Re: [PATCH 1/2][2.6.24] ehea: Improve tx packets counting
Thomas Klein wrote:
Using own tx_packets counter instead of firmware counters.

Signed-off-by: Thomas Klein <[EMAIL PROTECTED]>
---
 drivers/net/ehea/ehea.h      |    2 +-
 drivers/net/ehea/ehea_main.c |    9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

applied 1-2 to #upstream-fixes
Re: [PATCH 1/9] cxgb3 - fix MSI-X failure path
Divy Le Ray wrote:
From: Divy Le Ray <[EMAIL PROTECTED]>

Return error code when msi-x settings fail.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---
 drivers/net/cxgb3/cxgb3_main.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

applied 1-9 to #upstream, then trimmed all trailing whitespace
Re: [PATCH] e100: free IRQ to remove warning when rebooting
Ian Wienand wrote:
Hi,

When rebooting today I got

Will now restart.
ACPI: PCI interrupt for device :00:03.0 disabled
GSI 20 (level, low) -> CPU 1 (0x0100) vector 53 unregistered
Destroying IRQ53 without calling free_irq
WARNING: at /home/insecure/ianw/programs/git-kernel/linux-2.6/kernel/irq/chip.c:76 dynamic_irq_cleanup()

Call Trace:
 [] show_stack+0x40/0xa0
        sp=e0407c927b40 bsp=e0407c920eb8
 [] dump_stack+0x30/0x60
        sp=e0407c927d10 bsp=e0407c920ea0
 [] dynamic_irq_cleanup+0x160/0x1e0
        sp=e0407c927d10 bsp=e0407c920e70
 [] destroy_and_reserve_irq+0x30/0xc0
        sp=e0407c927d10 bsp=e0407c920e40
 [] iosapic_unregister_intr+0x5b0/0x5e0
        sp=e0407c927d10 bsp=e0407c920dd8
 [] acpi_unregister_gsi+0x30/0x60
        sp=e0407c927d10 bsp=e0407c920db8
 [] acpi_pci_irq_disable+0x140/0x160
        sp=e0407c927d10 bsp=e0407c920d88
 [] pcibios_disable_device+0xa0/0xc0
        sp=e0407c927d20 bsp=e0407c920d68
 [] pci_disable_device+0x130/0x160
        sp=e0407c927d20 bsp=e0407c920d38
 [] e100_shutdown+0x1c0/0x220
        sp=e0407c927d30 bsp=e0407c920d08
 [] pci_device_shutdown+0x80/0xc0
        sp=e0407c927d30 bsp=e0407c920ce8
 [] device_shutdown+0xf0/0x180
        sp=e0407c927d30 bsp=e0407c920cc8
 [] kernel_restart+0x60/0x120
        sp=e0407c927d30 bsp=e0407c920ca8
 [] sys_reboot+0x3b0/0x480
        sp=e0407c927d30 bsp=e0407c920c30
 [] ia64_ret_from_syscall+0x0/0x20
        sp=e0407c927e30 bsp=e0407c920c30
 [] ia64_ivt+0x00010620/0x400
        sp=e0407c928000 bsp=e0407c920c30
Restarting system.

I think the solution might be to free the IRQ before the
pci_device_shutdown

Signed-off-by: Ian Wienand <[EMAIL PROTECTED]>
---
 e100.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 3dbaec6..8ae5ac3 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2782,6 +2782,7 @@ static void e100_shutdown(struct pci_dev *pdev)
 		pci_enable_wake(pdev, PCI_D3cold, 0);
 	}
 
+	free_irq(pdev->irq, netdev);
 	pci_disable_device(pdev);
 	pci_set_power_state(pdev, PCI_D3hot);

agreed, though I think free_irq() should come after pci_disable_device()
like it does in e100_suspend().

auke?
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Jeff Garzik wrote:
Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask
unconditionally, but for non-ATA_PROT_DMA commands (which includes all
ATAPI), it just falls back to ata_qc_issue_prot which issues via the
legacy SFF interface and can only handle 32-bit addressing. So yes, it
appears to have a similar bug as sata_nv had.

sata_mv doesn't do ATAPI at all...
..

Not yet, anyway. Stay tuned..
Re: [bug] xfrm_state_lock: possible circular locking dependency detected
On Fri, Nov 23, 2007 at 04:38:51PM +0100, Ingo Molnar wrote:
>
> DaveJ's Fedora 8 rpm for 2.6.24 works pretty well, except for the
> networking related lockdep assert attached below, which happened while
> starting up ipsec. Let me know if you need any more info - it's a pretty
> stock setup.

Thanks for the report Ingo!

This is indeed a regression caused by:

commit 050f009e16f908932070313c1745d09dc69fd62b
Author: Herbert Xu <[EMAIL PROTECTED]>
Date:   Tue Oct 9 13:31:47 2007 -0700

    [IPSEC]: Lock state when copying non-atomic fields to user-space

For 2.6.24 I'm simply going to revert this change since that just puts
us back to the same state we've been for the last few years.

For 2.6.25 I'll do a proper fix by making sure that every xfrm state
user obeys the rule that if x->lock is to be taken with
xfrm_state_lock then it must be done from within.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 29/59] drivers/net/chelsio: Add missing "space"
Joe Perches wrote:
Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
---
 drivers/net/chelsio/cxgb2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

applied 29-36 to netdev#upstream
netdev-2.6 rebased
I pulled all the patches collected by DaveM in davem/netdev-2.6.git a few
days ago into jgarzik/netdev-2.6.git#upstream.

As of a few minutes ago, jgarzik/netdev-2.6.git was rebased to the latest
2.6.24-rc (torvalds/linux-2.6.git).

	Jeff
[PATCH] dmaengine: Correct invalid assumptions in the Kconfig text
From: Haavard Skinnemoen <[EMAIL PROTECTED]>

This patch corrects recently changed (and now invalid) Kconfig
descriptions for the DMA engine framework:

 - Non-Intel(R) hardware also has DMA engines;

 - DMA is used for more than memcpy and RAID offloading.

In fact, on most platforms memcpy and RAID aren't factors, and DMA
exists so that peripherals can transfer data to/from memory while
the CPU does other work.

Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
Signed-off-by: David Brownell <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---
This corrects a 'regression' of the Kconfig text that happened during the
2.6.24 merge window. Adrian was concerned that people were needlessly
turning this capability on without the requisite hardware, but the wording
is indeed misleading.

 drivers/dma/Kconfig |    8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6a7d25f..c46b7c2 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -3,11 +3,13 @@
 #
 
 menuconfig DMADEVICES
-	bool "DMA Offload Engine support"
+	bool "DMA Engine support"
 	depends on (PCI && X86) || ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX
 	help
-	  Intel(R) offload engines enable offloading memory copies in the
-	  network stack and RAID operations in the MD driver.
+	  DMA engines can do asynchronous data transfers without
+	  involving the host CPU. Currently, this framework can be
+	  used to offload memory copies in the network stack and
+	  RAID operations in the MD driver.
 
 if DMADEVICES
Re: radeonfb i2c regression post-2.6.18.
On Fri, 2007-11-23 at 23:29 +0100, Jean Delvare wrote:
> On Fri, 23 Nov 2007 17:00:52 +0100, Michael Buesch wrote:
> > This patch fixes my crash problem.
>
> Out of curiosity, what kind of crash was it? I admit that I can't see
> how the code could crash.

Really sneaky... apparently, keeping the i2c lines asserted on his laptop
model would drain enough current through the pullups (or the chip) that
the temperature would rise significantly, causing a thermal shutdown if
the machine was already warm.

A bit scary... looks to me that a pullup is a bit too weak somewhere on
the motherboard.

That also means that this fix should reduce power consumption on the
battery significantly on those machines, as it must take quite a bit of
power to increase the temperature that significantly (either that, or the
heating part sits just next to the sensor).

Cheers,
Ben.
Re: 2.6.24-rc3-mm1 - Kernel Panic on IO-APIC
On Tue, Nov 20, 2007 at 10:18:39PM -0800, Andrew Morton wrote:
> On Wed, 21 Nov 2007 11:41:23 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
> > Hi Andrew,
> >
> > Kernel panics across different architectures like powerpc, x86_64,
>
> powerpc complains about IO-APICs??
>
> > Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
> > Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
> > Mount-cache hash table entries: 256
> > SMP alternatives: switching to UP code
> > ACPI: Core revision 20070126

Hmm. same date here. It's Asus P5B-E motherboard

> > ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> > Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the
> > 'noapic' kernel parameter
>
> ACPI or x86 breakage, I guess.
>
> Did 'noapic' work?

No! The box freezes somewhere after "Freeing unused kernel memory"...

Bisection points to git-x86.patch, though.

git-bisect start
# good: [f05092637dc0d9a3f2249c9b283b973e6e96b7d2] Linux 2.6.24-rc3
git-bisect good f05092637dc0d9a3f2249c9b283b973e6e96b7d2
# bad: [46c8c396d2c87b786a7fac615c289f85a18e53ce] w1-build-fix
git-bisect bad 46c8c396d2c87b786a7fac615c289f85a18e53ce
# bad: [4e22f4852c48e1eddfe04299e78c0456164abe86] frv-move-dma-macros-to-scatterlisth-for-consistency
git-bisect bad 4e22f4852c48e1eddfe04299e78c0456164abe86
# bad: [4e22f4852c48e1eddfe04299e78c0456164abe86] frv-move-dma-macros-to-scatterlisth-for-consistency
git-bisect bad 4e22f4852c48e1eddfe04299e78c0456164abe86
# good: [d5135f31313af2be37d8ccb71e2a42f8e221d8c4] ide-mm-ide-disk-extend-timeout-for-pio-out-commands
git-bisect good d5135f31313af2be37d8ccb71e2a42f8e221d8c4
# good: [6be815e83f506f4c39a46cf59014e29a95c5e6c4] iommu-sg-merging-call-blk_queue_segment_boundary-in-__scsi_alloc_queue
git-bisect good 6be815e83f506f4c39a46cf59014e29a95c5e6c4
# good: [6be815e83f506f4c39a46cf59014e29a95c5e6c4] iommu-sg-merging-call-blk_queue_segment_boundary-in-__scsi_alloc_queue
git-bisect good 6be815e83f506f4c39a46cf59014e29a95c5e6c4
# bad: [c792db6d06114a85e33a27c89e9e979f11b951c4] slub-fix-coding-style-violations
git-bisect bad c792db6d06114a85e33a27c89e9e979f11b951c4
# bad: [c792db6d06114a85e33a27c89e9e979f11b951c4] slub-fix-coding-style-violations
git-bisect bad c792db6d06114a85e33a27c89e9e979f11b951c4
# bad: [76f3939b76ff557f73720b57a16716196f04e407] x86_64-make-sparsemem-vmemmap-the-default-memory-model-v2
git-bisect bad 76f3939b76ff557f73720b57a16716196f04e407
# good: [b8ba611566d8799a979b190d4bb14305ca64ee0e] sis-fb-driver-_ioctl32_conversion-functions-do-not-exist-in-recent-kernels
git-bisect good b8ba611566d8799a979b190d4bb14305ca64ee0e
# good: [e34995928859308d2abef1709332e2b12d36db2f] git-ipwireless_cs
git-bisect good e34995928859308d2abef1709332e2b12d36db2f
# bad: [f520abbbe11bc8253714bcd34aaaf19bdf82189e] git-x86-identify_cpu-fix
git-bisect bad f520abbbe11bc8253714bcd34aaaf19bdf82189e

I honestly tried fresh #mm from x86 tree -- the one which ends at commit
70be766db1105c0fc9aed8e954d0c343c1eda067 "x86: Add the RDC machine
specific reboot fixup". FWIW, commit "x86: validate against ACPI
motherboard resources" is innocent. After "x86: make stack size
configurable" the damn thing wouldn't build, and applying fixlets from -mm
doesn't help at 3AM.

Again, it's late here, I'll recheck today.
Re: [PATCH] pata_isapnp: Polled devices
Alan Cox wrote:
If a card has no IRQ then pass no interrupt handler but allow polled usage.

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c linux-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c
--- linux.vanilla-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c	2007-11-16 17:54:39.0 +
+++ linux-2.6.24-rc2-mm1/drivers/ata/pata_isapnp.c	2007-11-16 18:14:29.0 +
@@ -75,13 +75,16 @@
 	struct ata_host *host;
 	struct ata_port *ap;
 	void __iomem *cmd_addr, *ctl_addr;
+	int irq = 0;
+	irq_handler_t handler = NULL;
 
 	if (pnp_port_valid(idev, 0) == 0)

applied #upstream-fixes
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Jeff Garzik wrote:
Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask
unconditionally, but for non-ATA_PROT_DMA commands (which includes all
ATAPI), it just falls back to ata_qc_issue_prot which issues via the
legacy SFF interface and can only handle 32-bit addressing. So yes, it
appears to have a similar bug as sata_nv had.

sata_mv doesn't do ATAPI at all...

Right.. missed that ATA_FLAG_NO_ATAPI. So these issues Tom is reporting
are just with a normal SATA hard drive?

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask
unconditionally, but for non-ATA_PROT_DMA commands (which includes all
ATAPI), it just falls back to ata_qc_issue_prot which issues via the
legacy SFF interface and can only handle 32-bit addressing. So yes, it
appears to have a similar bug as sata_nv had.

sata_mv doesn't do ATAPI at all...

	Jeff
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Mark Lord wrote:
Morrison, Tom wrote:
I am hopeful that the sata_mv has this bug (I proved that the problem I
was experiencing was due to the sata_mv driver with 3.75Gig or more of
memory)...

I am on vacation for a week or more ...or I'd tell you today if it did
have this bug!
..

Yeah, I kind of had your reports in mind when I asked that. :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.

Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask
unconditionally, but for non-ATA_PROT_DMA commands (which includes all
ATAPI), it just falls back to ata_qc_issue_prot which issues via the
legacy SFF interface and can only handle 32-bit addressing. So yes, it
appears to have a similar bug as sata_nv had. Likely it needs a similar
slave_config trick to change bounce limit depending on the connected
device, unless there is really a way to issue ATAPI commands with this
EDMA interface, as the TODO list in sata_mv.c suggests may be possible..

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
No error when inotify_add_watch(/an/NFS/file)
Dear Experts,

NFS doesn't work with inotify (and it looks like it can't, certainly not
before NFS v4.1). However, if I give an NFS filename to
inotify_add_watch(), I don't get an error.

If it indicated an error in this case then I could easily fall back to
some sort of polling. Without an error, I need some other way to detect
NFS (and any other non-inotify-compatible filesystems).

Any thoughts?

Phil.

(If you Cc: me in any replies I'll see them sooner.)
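One possible userspace workaround, offered only as a sketch and not as anything proposed in this thread: ask statfs(2) for the filesystem type before deciding between inotify and polling. The helper names fs_magic_needs_polling() and path_needs_polling() are hypothetical, only NFS is checked here, and the magic number is the NFS_SUPER_MAGIC value listed in <linux/magic.h>; other filesystems that cannot deliver remote changes would need their own entries.

```c
#include <stdbool.h>
#include <sys/vfs.h>		/* statfs(2) on Linux */

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969	/* value from <linux/magic.h> */
#endif

/* Pure decision helper, so it can be checked without an NFS mount.
 * Hypothetical name; extend the test for other remote filesystems. */
static bool fs_magic_needs_polling(long f_type)
{
	return f_type == NFS_SUPER_MAGIC;
}

/* Returns 1 if the path lives on NFS, 0 if it does not,
 * and -1 if statfs() failed (errno is set by statfs). */
static int path_needs_polling(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;
	return fs_magic_needs_polling((long)sfs.f_type) ? 1 : 0;
}
```

A caller would run path_needs_polling() before inotify_add_watch() and fall back to periodic stat() calls when it returns 1. Keeping the magic-number comparison in its own function means the list of filesystems can grow without touching the statfs() call.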
Re: sata NCQ blacklist entry
On Friday 23 November 2007 08:21:09, Andrew Morton wrote:
> On Tue, 13 Nov 2007 21:55:15 +0100 Jan-Simon Möller <[EMAIL PROTECTED]>
> wrote:
> > Hi!
>
> You removed from cc the guys who are most likely to fix this. Please
> always do reply-to-all.

Sorry, will remember that.

> > Just using kernel 2.6.24-rc2 (325d22df7b19e0116aff3391d3a03f73d0634ded).
>
> So is this problem (which in another email you attributed to smartd)

Even without smartd in my default runlevel it happens at some point.

> also present in 2.6.23?

I compiled and tested 2.6.23.8. Smartd enabled, nothing noticed, dmesg is
really clean:

dmesg | grep ata
ACPI: SSDT 7F6D3C3F, 02DD (r1 SataRe SataAhci 1000 INTL 20060912)
PERCPU: Allocating 46888 bytes of per cpu data
Memory: 2042960k/2087744k available (2062k kernel code, 44396k reserved, 982k data, 324k init)
ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
libata version 2.21 loaded.
ata1: SATA max UDMA/133 cmd 0xc234e100 ctl 0x bmdma 0x irq 4347
ata2: SATA max UDMA/133 cmd 0xc234e180 ctl 0x bmdma 0x irq 4347
ata3: SATA max UDMA/133 cmd 0xc234e200 ctl 0x bmdma 0x irq 4347
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: WDC WD2500BEVS-22UST0, 01.01A01, max UDMA/133
ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata_piix :00:1f.1: version 2.12
scsi3 : ata_piix
scsi4 : ata_piix
ata4: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x00011810 irq 14
ata5: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x00011818 irq 15
ata4.00: ATAPI: HL-DT-ST DVDRAM GSA-T20N, WW01, max UDMA/33
ata4.00: configured for UDMA/33
EXT3-fs: mounted filesystem with ordered data mode.

> And is it still present in 2.6.24-rc3?

Went back to 2.6.24-rc3 ... Yes, but not at boot when smartd is started.

dmesg | grep ata
ACPI: SSDT 7F6D3C3F, 02DD (r1 SataRe SataAhci 1000 INTL 20060912)
PERCPU: Allocating 46968 bytes of per cpu data
Memory: 2048732k/2087744k available (2219k kernel code, 38624k reserved, 992k data, 344k init)
ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
libata version 3.00 loaded.
ata1: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfc404100 irq 4347
ata2: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfc404180 irq 4347
ata3: SATA max UDMA/133 abar [EMAIL PROTECTED] port 0xfc404200 irq 4347
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: WDC WD2500BEVS-22UST0, 01.01A01, max UDMA/133
ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata_piix :00:1f.1: version 2.12
scsi3 : ata_piix
scsi4 : ata_piix
ata4: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1810 irq 14
ata5: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1818 irq 15
ata4.00: ATAPI: HL-DT-ST DVDRAM GSA-T20N, WW01, max UDMA/33
ata4.00: configured for UDMA/33
EXT3-fs: mounted filesystem with ordered data mode.
ata1.00: exception Emask 0x2 SAct 0x73 SErr 0x0 action 0x2 frozen
ata1.00: spurious completions during NCQ issue=0x0 SAct=0x73 FIS=004040a1:0008
ata1.00: cmd 60/10:00:d4:82:31/00:00:07:00:00/40 tag 0 cdb 0x0 data 8192 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/08:08:9c:e5:cc/00:00:08:00:00/40 tag 1 cdb 0x0 data 4096 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/10:20:24:61:25/00:00:09:00:00/40 tag 4 cdb 0x0 data 8192 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/58:28:c4:65:25/00:00:09:00:00/40 tag 5 cdb 0x0 data 45056 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/20:30:7c:f6:a3/00:00:05:00:00/40 tag 6 cdb 0x0 data 16384 in
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x2 SAct 0x187 SErr 0x0 action 0x2 frozen
ata1.00: spurious completions during NCQ issue=0x0 SAct=0x187 FIS=004040a1:0040
ata1.00: cmd 60/08:00:ec:af:10/00:00:04:00:00/40 tag 0 cdb 0x0 data 4096 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/10:08:8c:e6:d8/00:00:04:00:00/40 tag 1 cdb 0x0 data 8192 in
ata1.00: status: { DRDY }
ata1.00: cmd 60/20:10:24:1a:da/00:00:04:00:00/40 tag 2 cdb 0x0 data 16384 in
ata1.00: status: { DRDY }
ata1.00: cmd 61/01:38:15:b3:30/00:00:07:00:00/40 tag 7 cdb 0x0 data 512 out
ata1.00: status: { DRDY }
ata1.00: cmd 61/10:40:1c:b3:30/00:00:07:00:00/40 tag 8 cdb 0x0 data 8192 out
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete

Thanks !

Best regards,
Jan-Simon
Re: nfs failure causes bad page state
On Fri, 2007-11-16 at 22:13 +, Russell King wrote: > While testing a kernel based upon > ecd744eec3aa8bbc949ec04ed3fbf7ecb2958a0e > (with wrong boot arguments), I got the following bad page state entry > while > NFS was trying to mount it's rootfs: > > IP-Config: Complete: > device=eth0, addr=192.168.1.101, mask=255.255.255.0, > gw=255.255.255.255, > host=192.168.1.101, domain=, nis-domain=(none), > bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath= > Looking up port of RPC 13/2 on 192.168.1.100 > rpcbind: server 192.168.1.100 not responding, timed out > Root-NFS: Unable to get nfsd port number from server, using default > Looking up port of RPC 15/1 on 192.168.1.100 > rpcbind: server 192.168.1.100 not responding, timed out > Root-NFS: Unable to get mountd port number from server, using default > mount: server 192.168.1.100 not responding, timed out > Root-NFS: Server returned error -5 while mounting /nfs/rootfs/ > VFS: Unable to mount root fs via NFS, trying floppy. > Bad page state in process 'swapper' > page:c02b1260 flags:0x0400 mapping: mapcount:0 count:0 > Trying to fix it up, but a reboot is needed > Backtrace: > [] (dump_stack+0x0/0x14) from [] (bad_page > +0x70/0xac) > [] (bad_page+0x0/0xac) from [] (free_hot_cold_page > +0x80/0x178) > [] (free_hot_cold_page+0x0/0x178) from [] > (free_hot_page+0x14/0x18) > [] (free_hot_page+0x0/0x18) from [] (put_page > +0xf8/0x154) > [] (put_page+0x0/0x154) from [] (kfree+0xc8/0xd0) > [] (kfree+0x0/0xd0) from [] (nfs_get_sb > +0x230/0x710) > [] (nfs_get_sb+0x0/0x710) from [] (vfs_kern_mount > +0x58/0xac)[] (vfs_kern_mount+0x0/0xac) from [] > (do_kern_mount+0x38/0xf4) > [] (do_kern_mount+0x0/0xf4) from [] (do_mount > +0x1e8/0x614) > ... > > This seems to be caused by use of an uninitialised structure due to > NULL > options being passed to nfs_validate_mount_data(). Ensure that the > parsed mount data is always initialised. 
> > Signed-off-by: Russell King <[EMAIL PROTECTED]> > > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > index fa517ae..0b1080c 100644 > --- a/fs/nfs/super.c > +++ b/fs/nfs/super.c > @@ -1054,10 +1054,11 @@ static int nfs_validate_mount_data(void > *options, > { > struct nfs_mount_data *data = (struct nfs_mount_data *)options; > > + memset(args, 0, sizeof(*args)); > + > if (data == NULL) > goto out_no_data; > > - memset(args, 0, sizeof(*args)); > args->flags = (NFS_MOUNT_VER3 | NFS_MOUNT_TCP); > args->rsize = NFS_MAX_FILE_IO_SIZE; > args->wsize = NFS_MAX_FILE_IO_SIZE; Thanks Russell, It looks as if the same bug exists in nfs4_validate_mount_data(), so I added the same fix. Cheers Trond --- Begin Message --- While testing a kernel based upon ecd744eec3aa8bbc949ec04ed3fbf7ecb2958a0e (with wrong boot arguments), I got the following bad page state entry while NFS was trying to mount it's rootfs: IP-Config: Complete: device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255, host=192.168.1.101, domain=, nis-domain=(none), bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath= Looking up port of RPC 13/2 on 192.168.1.100 rpcbind: server 192.168.1.100 not responding, timed out Root-NFS: Unable to get nfsd port number from server, using default Looking up port of RPC 15/1 on 192.168.1.100 rpcbind: server 192.168.1.100 not responding, timed out Root-NFS: Unable to get mountd port number from server, using default mount: server 192.168.1.100 not responding, timed out Root-NFS: Server returned error -5 while mounting /nfs/rootfs/ VFS: Unable to mount root fs via NFS, trying floppy. 
Bad page state in process 'swapper' page:c02b1260 flags:0x0400 mapping: mapcount:0 count:0 Trying to fix it up, but a reboot is needed Backtrace: [] (dump_stack+0x0/0x14) from [] (bad_page+0x70/0xac) [] (bad_page+0x0/0xac) from [] (free_hot_cold_page+0x80/0x178) [] (free_hot_cold_page+0x0/0x178) from [] (free_hot_page+0x14/0x18) [] (free_hot_page+0x0/0x18) from [] (put_page+0xf8/0x154) [] (put_page+0x0/0x154) from [] (kfree+0xc8/0xd0) [] (kfree+0x0/0xd0) from [] (nfs_get_sb+0x230/0x710) [] (nfs_get_sb+0x0/0x710) from [] (vfs_kern_mount+0x58/0xac)[] (vfs_kern_mount+0x0/0xac) from [] (do_kern_mount+0x38/0xf4) [] (do_kern_mount+0x0/0xf4) from [] (do_mount+0x1e8/0x614) ... This seems to be caused by use of an uninitialised structure due to NULL options being passed to nfs_validate_mount_data(). Ensure that the parsed mount data is always initialised. Signed-off-by: Russell King <[EMAIL PROTECTED]> (Trond: added fix for the same bug in nfs4_validate_mount_data()). Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]> --- fs/nfs/super.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 046d1ac..8d95d7d 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -1078,10 +1078,11 @@ static int
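The ordering problem the patch fixes can be shown in a small userspace sketch. Everything here is hypothetical (the struct, its fields, the function name and the default values are stand-ins, not the real NFS code); it only illustrates why the memset() has to come before the early exit when a caller may later free a pointer held in the structure, which is what the kfree() in the backtrace suggests.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parsed mount arguments. */
struct parsed_mount_args {
	unsigned int flags;
	unsigned int rsize;
	char *hostname;		/* assumed to be freed by the caller on every path */
};

/*
 * Hypothetical validator. Returns 0 on success and -1 when no options
 * were supplied. Because the memset() runs first, *args is fully
 * zeroed on both paths, so a later free of args->hostname is harmless.
 */
static int validate_mount_data(const void *options,
			       struct parsed_mount_args *args)
{
	memset(args, 0, sizeof(*args));

	if (options == NULL)
		goto out_no_data;

	args->flags = 0x1;	/* illustrative default, not a real NFS flag */
	args->rsize = 4096;	/* illustrative default */
	return 0;

out_no_data:
	return -1;
}
```

With the memset() placed after the NULL check, as in the old code, the out_no_data path would return with *args still holding whatever the caller's stack contained.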
Re: freeze vs freezer
On Thursday, 22 of November 2007, Jeremy Fitzhardinge wrote: > It seems that a process blocked in a write to an xfs filesystem due to > xfs_freeze cannot be frozen by the freezer. The freezer doesn't handle tasks in TASK_UNINTERRUPTIBLE and I don't know how to make it handle them without at least partially defeating its purpose. > I see this if I suspend my laptop while doing something xfs-filesystem > intensive, like a kernel build. My suspend scripts freeze the XFS > filesystem (as Dave said I should), which presumably blocks some writer, > and then the freezer times out and fails to complete. > > Here's part of the process dump the freezer does when it times out: > > cc1 D 0 18138 18137 >dd5f1e24 00200082 0002 ecdeeb00 ecdeec64 c200f280 > 0001 >009c09a0 dd5f1e0c dd5f1e0c 000f > dd5f1e74 >c7beb480 dd5f1e88 dd5f1ea8 c0228d97 e8889540 dd5f1e38 c015b75d > dd5f1e44 > Call Trace: > [] xfs_write+0xf4/0x6d9 > [] xfs_file_aio_write+0x53/0x5b > [] do_sync_write+0xae/0xec > [] vfs_write+0xa4/0x120 > [] sys_write+0x3b/0x60 > [] sysenter_past_esp+0x6b/0xa1 > === > > > I haven't looked at how to fix this yet. I only just worked out why I > was getting suspend failures. Well, you can add freezer_do_not_count()/freezer_count() annotations to xfs_write() (and whatever else is blocked as a result of the XFS being frozen). Generally, that would be risky without the freezing of XFS, however, because it might leak us filesystem data to a storage device after creating a hibernation image which would result in the filesystem corruption after the resume. Still, if you only suspend to RAM, that should be safe. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: OT: Re: System reboot triggered by just reading a device file....!?
Hi Clemens, > > Hi, Roland! > > Please don't top-post. sorry! > > > [was: it would be easy to disable the kernel watchdog] > > thanks, but i know i could do this. > > Good. I was also curious and just checked again. The watchdog subsystem > is by default _disabled_ in the kernel configuration. If you use some > distro's kernel, where they turned it on, complain to them! > If you turned it on yourself, you are really on your own... > the Kconfig help there is IMO sufficient and very clear and, > "If unsure, say N". Hmm... sorry?! whoops - sorry for that. i should have checked that, but i think i just didn`t expect some distro vendor to change that default. sure i will complain to suse now. stopping getting on your nerves here, now. > > this thread is not meant to protect myself from this curiousity but it is > meant > > to protect others. it`s a trap. > > I guess I understand your position. But I don't see no way to improve > the kernel in that point. > Complain to the guys who enabled the watchdog / setup this trap for > any reason. sure. you`re completely right. > > i stepped into that. > > now i know that trap, so i can easily sidestep. > > it maybe very seldom that someone steps into this. > > but it may happen and then someone will have trouble and spend time on > this. > > i think every admin can tell you about weird random reboots of his systems > > which he cannot explain what was the reason for it. > > That's one possible way of "learning by doing suicide (tm);" :) > > this maybe some of those reasons and this one could be avoided. > > i`m thinking of something simple like echo "now you`re armed" > > /dev/watchdog > > Read some details about watchdogs to get more background and why the > watchdog is triggered so easily and why it's good this way. > i.e: http://www.ganssle.com/watchdogs.pdf thanks for your help and for that very useful link. that`s the very best stuff i every read about watchdogs! 
regards Roland
Re: [RFC] Documentation about unaligned memory access
Daniel Drake wrote: > Being spoilt by the luxuries of i386/x86_64 I've never really had a good > grasp on unaligned memory access problems on other architectures and decided > it was time to figure it out. As a result I've written this documentation > which I plan to submit for inclusion as > Documentation/unaligned_memory_access.txt > > Before I do so, any comments on the following? From the viewpoint of yours truly (and I am a teacher of operating system classes), this is a long-awaited document, which is going to be very useful, especially for newbies. My students often make alignment mistakes in their code, and your article will definitely make my job much easier. Thank you, Daniel, for your work. Dmitri > > Thanks, > Daniel > > > > > UNALIGNED MEMORY ACCESSES > = > > Linux runs on a wide variety of architectures which have varying behaviour > when it comes to memory access. This document presents some details about > unaligned accesses, why you need to write code that doesn't cause them, > and how to write such code! > > > What's the definition of an unaligned access? > = > > Unaligned memory accesses occur when you try to read N bytes of data starting > from an address that is not evenly divisible by N (i.e. addr % N != 0). > For example, reading 4 bytes of data from address 0x1004 is fine, but > reading 4 bytes of data from address 0x1005 would be an unaligned memory > access. > > > Why unaligned access is bad > === > > Most architectures are unable to perform unaligned memory accesses. Any > unaligned access causes a processor exception. > > Some architectures have an exception handler implemented in the kernel which > corrects the memory access, but this is very expensive and is not true for > all architectures. You cannot rely on the exception handler to correct your > memory accesses. > > In summary: if your code causes unaligned memory accesses to happen, your code > will not work on some platforms, and will perform *very* badly on others. 
> > You may be wondering why you have never seen these problems on your own > architecture. Some architectures (such as i386 and x86_64) do not have this > limitation, but nevertheless it is important for you to write portable code > that works everywhere. > > > Natural alignment > = > > The rule we mentioned earlier forms what we refer to as natural alignment: > When accessing N bytes of memory, the base memory address must be evenly > divisible by N, i.e. addr % N == 0 > > When writing code, assume the target architecture has natural alignment > requirements. > > Sidenote: in reality, only a few architectures require natural alignment > on all sizes of memory access. However, again we must consider ALL supported > architectures; natural alignment is the only way to achieve full portability. > > > Code that doesn't cause unaligned access > > > At first, the concepts above may seem a little hard to relate to actual > coding practice. After all, you don't have a great deal of control over > memory addresses of certain variables, etc. > > Fortunately things are not too complex, as in most cases, the compiler > ensures that things will work for you. For example, take the following > structure: > > struct foo { > u16 field1; > u32 field2; > u8 field3; > }; > > Let us assume that an instance of the above structure resides in memory > starting at address 0x1000. With a basic level of understanding, it would > not be unreasonable to expect that accessing field2 would cause an unaligned > access. You'd be expecting field2 to be located at offset 2 bytes into the > structure, i.e. address 0x1002, but that address is not evenly divisible > by 4 (remember, we're reading a 4 byte value here). > > Fortunately, the compiler understands the alignment constraints, so in the > above case it would insert 2 bytes of padding inbetween field1 and field2. 
> Therefore, for standard structure types you can always rely on the compiler > to pad structures so that accesses to fields are suitably aligned (assuming > you do not cast the field to a type of different length). > > Similarly, you can also rely on the compiler to align variables and function > parameters to a naturally aligned scheme, based on the size of the type of > the variable. > > Sidenote: in the above example, you may wish to reorder the fields in the > above structure so that the overall structure uses less memory. For example, > moving field3 to sit inbetween field1 and field2 (where the padding is > inserted) would shrink the overall structure by 1 byte: > > struct foo { > u16 field1; > u8 field3; > u32 field2; > }; > > Sidenote: it should be obvious by now, but in case it is not, accessing a > single
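The natural alignment rule quoted above (addr % N == 0) is easy to express as a helper. This is only a minimal userspace sketch: the function name is made up for illustration, and it works on plain integer addresses rather than on real pointers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if an access of n bytes starting at addr is naturally
 * aligned (addr % n == 0), and 0 otherwise. n must be non-zero.
 */
static int is_naturally_aligned(uintptr_t addr, size_t n)
{
	return addr % n == 0;
}
```

Applied to the examples in the draft, a 4-byte read at 0x1004 passes the check, while a 4-byte read at 0x1005 or at 0x1002 (the offset field2 would have without padding) does not.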
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote: > Torsten Kaiser wrote: > > On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > >> * Torsten Kaiser <[EMAIL PROTECTED]> wrote: > ... > > Above this acquire/release sequence is the following comment: > > #ifdef CONFIG_LOCKDEP > > /* > > * It is permissible to free the struct work_struct > > * from inside the function that is called from it, > > * this we need to take into account for lockdep too. > > * To avoid bogus "held lock freed" warnings as well > > * as problems when looking into work->lockdep_map, > > * make a copy and use that here. > > */ > > struct lockdep_map lockdep_map = work->lockdep_map; > > #endif > > > > Did something trigger this anyway? > > > > Anything I could try, apart from more boots with slub_debug=F? > > Please could you try which patch from the dm-crypt series cause this ? > (agk-dm-dm-crypt* names.) > > I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because > there is one work struct used subsequently in two threads... > (io thread already started while crypt thread is processing lockdep_map > after calling f(work)...) > > (btw these patches prepare dm-crypt for next patchset introducing > async cryptoapi, so there should be no functional changes yet.) I looked at all of these agk-*-patches, as the error is not bisectable, because it triggers unreliably. The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch This one does a functional change, as there now is an additional ref on io->pending. Instead of only increasing io->pending if there really are more than one clone-bio, it will now take an additional ref in crypt_write_io_process(). I certainly agree with the cleanup, but this introduces the following change: Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio(). Now there is an additional call to crypt_dec_pending() to balance the additional ref placed into crypt_write_io_process(). 
And that one is not called from whatever context/thread cleans up after generic_make_request, but directly in the context/thread of the caller of crypt_write_io_process(), and that is kcryptd. So now it is possible (if all requests finish before crypt_write_io_process() returns) that kcryptd itself will release the bio, but the workqueue infrastructure still seems to have a lock on that. But as the comment in run_workqueue says, this should be legal, and I can't figure out what would make the lockdep copy mechanism fail. Especially if the trigger was really a WRITE request, as with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted this should never use the kcrypt_io-workqueue and so there should not even be the problem with using INIT_WORK twice on the same work_struct. ... or I just don't see the bug. Torsten
Re: radeonfb i2c regression post-2.6.18.
On Fri, 23 Nov 2007 17:00:52 +0100, Michael Buesch wrote: > This patch fixes my crash problem. Out of curiosity, what kind of crash was it? I admit that I can't see how the code could crash. -- Jean Delvare
Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver
On Fri 23 Nov 2007 16:52, Arjan van de Ven pondered: > On Fri, 23 Nov 2007 22:25:29 +0800 > "Bryan Wu" <[EMAIL PROTECTED]> wrote: > > > On Nov 23, 2007 6:19 PM, David Woodhouse <[EMAIL PROTECTED]> wrote: > > > > > > On Fri, 2007-11-23 at 18:14 +0800, Bryan Wu wrote: > > > > > > > > +#ifdef CONFIG_BF54x > > > > /* Setup DMAC1 channel mux for NFC which shared with SDH > > > > */ val = bfin_read_DMAC1_PERIMUX(); > > > > val &= 0xFFFE; > > > > bfin_write_DMAC1_PERIMUX(val); > > > > SSYNC(); > > > > - > > > > +#endif > > > > > > You can't build a multiplatform kernel which runs on BF52x and > > > BF54x? > > > > There are some hardware difference between BF52x and BF54x. We have > > to do this. > > > > well does it need to be an #ifdef, or can it be a runtime if() ? It could be a runtime if() but we don't currently have the is_mach() all set up properly today. This is because on most systems that Blackfin ships on - memory is the dominant cost of the system, and end users don't want to take either the storage (flash) hit of having code they don't use, or the run time (DRAM) overhead. They are fine with compiling 2 kernels for two platforms if it means things are cheaper. :) That being said, we still need to go back, and add things properly - and just let gcc optimise things away if it is not used - c code is more maintainable than all the ifdefs we have today. This is the goal - it will just take a little bit to get there. -Robin
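As a rough illustration of the "plain C if() that gcc can optimise away" approach Robin describes, here is a hedged userspace sketch. The machine_is_bf54x() macro and the nfc_perimux_value() helper are hypothetical names invented for this example, and the register access is modelled on a plain value instead of the real bfin_read/bfin_write accessors.

```c
#include <stdint.h>

/*
 * Hypothetical compile-time predicate: it expands to a constant 1 or 0,
 * so the compiler can drop the dead branch without an #ifdef inside
 * the function body.
 */
#ifdef CONFIG_BF54x
#define machine_is_bf54x() 1
#else
#define machine_is_bf54x() 0
#endif

/*
 * Models the DMAC1 peripheral mux update from the quoted hunk on a
 * plain value; the real driver reads and writes a hardware register.
 */
static uint16_t nfc_perimux_value(uint16_t val)
{
	if (machine_is_bf54x())
		val &= 0xFFFE;	/* clear bit 0, as in the quoted hunk */
	return val;
}
```

The #ifdef is confined to one definition, while the driver code itself stays ordinary C that is compiled (and type-checked) for every configuration. A true runtime check would replace the macro with a function that inspects the running hardware, at the flash and DRAM cost Robin mentions.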
Re: [RFC] Documentation about unaligned memory access
On Thursday 22 November 2007 04:15:53 pm Daniel Drake wrote: > Fortunately things are not too complex, as in most cases, the compiler > ensures that things will work for you. For example, take the following > structure: > > struct foo { > u16 field1; > u32 field2; > u8 field3; > }; > > Fortunately, the compiler understands the alignment constraints, so in the > above case it would insert 2 bytes of padding inbetween field1 and field2. > Therefore, for standard structure types you can always rely on the compiler > to pad structures so that accesses to fields are suitably aligned (assuming > you do not cast the field to a type of different length). It would also insert 3 bytes of padding after field3, in order to satisfy alignment constraints for arrays of these structures. > Sidenote: in the above example, you may wish to reorder the fields in the > above structure so that the overall structure uses less memory. For > example, moving field3 to sit inbetween field1 and field2 (where the > padding is inserted) would shrink the overall structure by 1 byte: > > struct foo { > u16 field1; > u8 field3; > u32 field2; > }; It will actually shrink it by 4 bytes, for the very same reason. -- Vadim Lobanov
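The padding figures above can be checked with sizeof and offsetof. The following is a small userspace sketch, assuming an ABI where a 32-bit integer is 4-byte aligned (x86 and x86_64, for instance); it uses the <stdint.h> types in place of the kernel's u16/u32/u8, and the foo_reordered and foo_tail_padding names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {			/* original field order */
	uint16_t field1;
	uint32_t field2;
	uint8_t field3;
};

struct foo_reordered {		/* field3 moved into the padding hole */
	uint16_t field1;
	uint8_t field3;
	uint32_t field2;
};

/* Bytes of padding the compiler appends after field3 in struct foo. */
static size_t foo_tail_padding(void)
{
	return sizeof(struct foo) -
	       (offsetof(struct foo, field3) + sizeof(uint8_t));
}
```

Under that assumption, field2 sits at offset 4, the tail padding is 3 bytes, and the two layouts occupy 12 and 8 bytes respectively, which matches the 4-byte saving Vadim points out. ABIs with weaker alignment rules may give different numbers.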
Re: [PATCH 1/1] [MTD/NAND]: Add Blackfin BF52x on-chip NAND Flash controller driver support in bf5xx_nand driver
On Fri, 23 Nov 2007 22:25:29 +0800 "Bryan Wu" <[EMAIL PROTECTED]> wrote: > On Nov 23, 2007 6:19 PM, David Woodhouse <[EMAIL PROTECTED]> wrote: > > > > On Fri, 2007-11-23 at 18:14 +0800, Bryan Wu wrote: > > > > > > +#ifdef CONFIG_BF54x > > > /* Setup DMAC1 channel mux for NFC which shared with SDH > > > */ val = bfin_read_DMAC1_PERIMUX(); > > > val &= 0xFFFE; > > > bfin_write_DMAC1_PERIMUX(val); > > > SSYNC(); > > > - > > > +#endif > > > > You can't build a multiplatform kernel which runs on BF52x and > > BF54x? > > There are some hardware difference between BF52x and BF54x. We have > to do this. > well does it need to be an #ifdef, or can it be a runtime if() ? -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org
Re: Where is the interrupt going?
On Wed, 2007-11-21 at 17:08 -0800, Al Niessner wrote: > > p8620 = pci_get_device (APC8620_VENDOR_ID, APC8620_DEVICE_ID, p8620); > <... fail if p8620 is 0 ...> > apcsi[i].ret_val = register_chrdev (MAJOR_NUM, > > DEVICE_NAME, > > _ops); > <... fail if ret_val < 0 ...> > apcsi[i].board_irq = p8620->irq; > status = request_irq (apcsi[i].board_irq, > apc8620_handler, > IRQF_DISABLED, > DEVICE_NAME, > (void*)[i]); First, that's obviously not the proper way to do a PCI driver but I suppose you know that :-) Then, make sure you call pci_enable_device() at one point, don't some platforms perform the actual IRQ routing that late ? (And don't sample pdev->irq before the pci_enable_device(), sample it afterward). Cheers, Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 4/4] Timerfd v2 - un-break CONFIG_TIMERFD
Remove the broken status from CONFIG_TIMERFD.

Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>

- Davide

---
 init/Kconfig |1 -
 1 file changed, 1 deletion(-)

Index: linux-2.6.mod/init/Kconfig
===
--- linux-2.6.mod.orig/init/Kconfig 2007-11-23 13:13:16.0 -0800
+++ linux-2.6.mod/init/Kconfig 2007-11-23 13:36:42.0 -0800
@@ -566,7 +566,6 @@
 config TIMERFD
 	bool "Enable timerfd() system call" if EMBEDDED
 	select ANON_INODES
-	depends on BROKEN
 	default y
 	help
 	  Enable the timerfd() system call that allows to receive timer
[patch 3/4] Timerfd v2 - wire the new timerfd API to the x86 family
Wires up the new timerfd API to the x86 family. Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]> - Davide --- arch/x86/ia32/ia32entry.S |4 +++- arch/x86/kernel/syscall_table_32.S |4 +++- include/asm-x86/unistd_32.h|6 -- include/asm-x86/unistd_64.h|9 +++-- 4 files changed, 17 insertions(+), 6 deletions(-) Index: linux-2.6.mod/include/asm-x86/unistd_32.h === --- linux-2.6.mod.orig/include/asm-x86/unistd_32.h 2007-11-23 13:13:18.0 -0800 +++ linux-2.6.mod/include/asm-x86/unistd_32.h 2007-11-23 13:36:40.0 -0800 @@ -327,13 +327,15 @@ #define __NR_epoll_pwait 319 #define __NR_utimensat 320 #define __NR_signalfd 321 -#define __NR_timerfd 322 +#define __NR_timerfd_create322 #define __NR_eventfd 323 #define __NR_fallocate 324 +#define __NR_timerfd_settime 325 +#define __NR_timerfd_gettime 326 #ifdef __KERNEL__ -#define NR_syscalls 325 +#define NR_syscalls 327 #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR Index: linux-2.6.mod/include/asm-x86/unistd_64.h === --- linux-2.6.mod.orig/include/asm-x86/unistd_64.h 2007-11-23 13:13:18.0 -0800 +++ linux-2.6.mod/include/asm-x86/unistd_64.h 2007-11-23 13:36:40.0 -0800 @@ -629,12 +629,17 @@ __SYSCALL(__NR_epoll_pwait, sys_epoll_pwait) #define __NR_signalfd 282 __SYSCALL(__NR_signalfd, sys_signalfd) -#define __NR_timerfd 283 -__SYSCALL(__NR_timerfd, sys_timerfd) +#define __NR_timerfd_create283 +__SYSCALL(__NR_timerfd_create, sys_timerfd_create) #define __NR_eventfd 284 __SYSCALL(__NR_eventfd, sys_eventfd) #define __NR_fallocate 285 __SYSCALL(__NR_fallocate, sys_fallocate) +#define __NR_timerfd_settime 286 +__SYSCALL(__NR_timerfd_settime, sys_timerfd_settime) +#define __NR_timerfd_gettime 287 +__SYSCALL(__NR_timerfd_gettime, sys_timerfd_gettime) + #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR Index: linux-2.6.mod/arch/x86/kernel/syscall_table_32.S === --- linux-2.6.mod.orig/arch/x86/kernel/syscall_table_32.S 2007-11-23 13:13:18.0 -0800 +++ linux-2.6.mod/arch/x86/kernel/syscall_table_32.S2007-11-23 13:36:40.0 
-0800 @@ -321,6 +321,8 @@ .long sys_epoll_pwait .long sys_utimensat /* 320 */ .long sys_signalfd - .long sys_timerfd + .long sys_timerfd_create .long sys_eventfd .long sys_fallocate + .long sys_timerfd_settime /* 325 */ + .long sys_timerfd_gettime Index: linux-2.6.mod/arch/x86/ia32/ia32entry.S === --- linux-2.6.mod.orig/arch/x86/ia32/ia32entry.S2007-11-23 13:13:18.0 -0800 +++ linux-2.6.mod/arch/x86/ia32/ia32entry.S 2007-11-23 13:36:40.0 -0800 @@ -723,7 +723,9 @@ .quad sys_epoll_pwait .quad compat_sys_utimensat /* 320 */ .quad compat_sys_signalfd - .quad compat_sys_timerfd + .quad sys_timerfd_create .quad sys_eventfd .quad sys32_fallocate + .quad compat_sys_timerfd_settime/* 325 */ + .quad compat_sys_timerfd_gettime ia32_syscall_end: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 1/4] Timerfd v2 - introduce a new hrtimer_forward_now() function
I think that advancing the timer against the timer's current "now" can be a pretty common usage, so, w/out exposing hrtimer's internals, we add a new hrtimer_forward_now() function. Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]> - Davide --- include/linux/hrtimer.h |7 +++ 1 file changed, 7 insertions(+) Index: linux-2.6.mod/include/linux/hrtimer.h === --- linux-2.6.mod.orig/include/linux/hrtimer.h 2007-11-23 13:13:21.0 -0800 +++ linux-2.6.mod/include/linux/hrtimer.h 2007-11-23 13:36:36.0 -0800 @@ -298,6 +298,13 @@ extern unsigned long hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval); +/* Forward a hrtimer so it expires after the hrtimer's current now */ +static inline unsigned long hrtimer_forward_now(struct hrtimer *timer, + ktime_t interval) +{ + return hrtimer_forward(timer, timer->base->get_time(), interval); +} + /* Precise sleep: */ extern long hrtimer_nanosleep(struct timespec *rqtp, struct timespec *rmtp, - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 2/4] Timerfd v2 - new timerfd API
This is the new timerfd API as it is implemented by the following patch:

int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags, const struct itimerspec *utmr, struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);

The timerfd_create() API creates an un-programmed timerfd fd. The "clockid" parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

The timerfd_settime() API applies new settings to the timerfd fd, optionally retrieving the previous expiration time (in case the "otmr" parameter is not NULL). The time value specified in "utmr" is absolute if the TFD_TIMER_ABSTIME bit is set in the "flags" parameter. Otherwise it's a relative time.

The timerfd_gettime() API returns the next expiration time of the timer, or {0, 0} if the timerfd has not been set yet.

Like the previous timerfd API implementation, read(2) and poll(2) are supported (with the same interface). Here's a simple test program I used to exercise the new timerfd APIs: http://www.xmailserver.org/timerfd-test2.c

Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>

- Davide

--- fs/compat.c | 32 ++- fs/timerfd.c | 197 ++- include/linux/compat.h |7 + include/linux/syscalls.h |7 + 4 files changed, 164 insertions(+), 79 deletions(-) Index: linux-2.6.mod/fs/timerfd.c === --- linux-2.6.mod.orig/fs/timerfd.c 2007-11-23 13:13:19.0 -0800 +++ linux-2.6.mod/fs/timerfd.c 2007-11-23 13:36:39.0 -0800 @@ -25,13 +25,15 @@ struct hrtimer tmr; ktime_t tintv; wait_queue_head_t wqh; + u64 ticks; int expired; + int clockid; }; /* * This gets called when the timer event triggers. We set the "expired" * flag, but we do not re-arm the timer (in case it's necessary, - * tintv.tv64 != 0) until the timer is accessed. 
*/ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr) { @@ -40,13 +42,14 @@ spin_lock_irqsave(>wqh.lock, flags); ctx->expired = 1; + ctx->ticks++; wake_up_locked(>wqh); spin_unlock_irqrestore(>wqh.lock, flags); return HRTIMER_NORESTART; } -static void timerfd_setup(struct timerfd_ctx *ctx, int clockid, int flags, +static void timerfd_setup(struct timerfd_ctx *ctx, int flags, const struct itimerspec *ktmr) { enum hrtimer_mode htmode; @@ -57,8 +60,9 @@ texp = timespec_to_ktime(ktmr->it_value); ctx->expired = 0; + ctx->ticks = 0; ctx->tintv = timespec_to_ktime(ktmr->it_interval); - hrtimer_init(>tmr, clockid, htmode); + hrtimer_init(>tmr, ctx->clockid, htmode); ctx->tmr.expires = texp; ctx->tmr.function = timerfd_tmrproc; if (texp.tv64 != 0) @@ -83,7 +87,7 @@ poll_wait(file, >wqh, wait); spin_lock_irqsave(>wqh.lock, flags); - if (ctx->expired) + if (ctx->ticks) events |= POLLIN; spin_unlock_irqrestore(>wqh.lock, flags); @@ -102,11 +106,11 @@ return -EINVAL; spin_lock_irq(>wqh.lock); res = -EAGAIN; - if (!ctx->expired && !(file->f_flags & O_NONBLOCK)) { + if (!ctx->ticks && !(file->f_flags & O_NONBLOCK)) { __add_wait_queue(>wqh, ); for (res = 0;;) { set_current_state(TASK_INTERRUPTIBLE); - if (ctx->expired) { + if (ctx->ticks) { res = 0; break; } @@ -121,22 +125,21 @@ __remove_wait_queue(>wqh, ); __set_current_state(TASK_RUNNING); } - if (ctx->expired) { - ctx->expired = 0; - if (ctx->tintv.tv64 != 0) { + if (ctx->ticks) { + ticks = ctx->ticks; + if (ctx->expired && ctx->tintv.tv64) { /* * If tintv.tv64 != 0, this is a periodic timer that * needs to be re-armed. We avoid doing it in the timer * callback to avoid DoS attacks specifying a very * short timer period. */ - ticks = (u64) - hrtimer_forward(>tmr, - hrtimer_cb_get_time(>tmr), - ctx->tintv); + ticks += (u64) hrtimer_forward_now(>tmr, + ctx->tintv) - 1; hrtimer_restart(>tmr); - } else - ticks = 1; + } + ctx->expired = 0;
Re: unionfs: several more problems
In message <[EMAIL PROTECTED]>, Hugh Dickins writes: [...] > > I deceived myself for a while that the danger of shmem_writepage > > hitting its BUG_ON(entry->val) was dealt with too; but that's wrong, > > I must go back to working out an escape from that one (despite never > > seeing it). > > Once I tried a more appropriate test (fsx while swapping) I hit that > easily. After some thought and testing, I'm happy with the mm/shmem.c > +mm/swap_state.c fixes I've arrived at for that; but since it's not > easy to reproduce in normal usage, and hasn't been holding you up, > I'd prefer for the moment to hold on to that patch. I need to make > changes around the same pagecache<->swapcache area to solve some mem > cgroup issues: there might turn out to be some interaction, so I'd > rather finalize both patches in the same series if I can. [...] If you want, send me those patches and I'll run them w/ my tests, even if they're not finalized; my testing can give you another useful point of reference. > But perhaps before fixing up the several LTP tests, you'll want > to concentrate on a more directed test. Please try this sequence: > > # Running with mem=512M, probably irrelevant > swapoff -a# Merely to rule out one potential confusion > mkfs -t ext2 /dev/sdb1 > mount -t ext2 /dev/sdb1 /mnt > df /mnt # I have 2280 Used out of 1517920 KB > cp -a 2.6.24-rc3 /mnt # Copy a kernel source tree into ext2 > rm -rf /mnt/2.6.24-rc3# Delete the copy > df /mnt # Again 2280 Used, just as you'd expect > mount -t unionfs -o dirs=/mnt unionfs /tmp > cp -a 2.6.24-rc3 /tmp # Copy a kernel source tree into unionfs > rm -rf /tmp/2.6.24-rc3# Generates 176 unionfs: filldir error messages > df /mnt # Now 68380 Used (df /tmp shows the same) > ls -a /mnt# Shows . .. 
.wh.2.6.24-rc3 lost+found > echo 1 >/proc/sys/vm/drop_caches # to free pagecache > df /mnt # Still 68380 Used (df /tmp shows the same) > echo 2 >/proc/sys/vm/drop_caches # to free dentries and inodes > df /mnt # Now 2280 Used as it should be (df /tmp same) > ls -a /mnt# But still shows that .wh.2.6.24-rc3 > umount /tmp # Restore > umount /mnt # Restore > swapon -a # Restore > > Three different problems there: > > 1. Whiteouts seem to get left behind (at this top level anyway): > I'm getting an increasing number of .wh.run-crons.? files there. > I'm not familiar with the correct behaviour for whiteouts (and it's > not clear to me why whiteouts are needed at all in this degenerate > case of a single directory in the union, but never mind). I could spend a lot of time explaining the history of whiteouts in unioning file systems, and all the different techniques and algorithms we've tried ourselves over the years. But suffice to say that I'd be very happy the day every Linux f/s has a native whiteout support. :-) Our current policy for when/where to create whiteouts has evolved after much experience with users. The most common use case for unionfs is one or more read-only branches, plus a high-priority writable branch (for copyup). Therefore, in the most common case we cannot remove the objects from the readonly branches, and have to create a whiteout instead. Using a single branch with unionfs is very uncommon among unionfs users, but it serves nicely as a useful "null layer" testing (ala BSD's Nullfs or my fistgen's wrapfs). Anyway, upon further thinking about this issue I realized that whiteouts in the single-branch situation are just a generalization of a possibly more common case -- when the object being unlink'ed (or rmdir'ed) is on the rightmost, lowest priority branch in which it is known to exist. In that case, there's no need to create a whiteout there, b/c there's no chance that a readonly file by the same name could exist below that branch. 
The same is true if you try to rmdir a directory anywhere in one of the union's branches: if we know (thanks to ->lookup) that there is no dir by the same name anywhere else, then we can safely skip creating a whiteout if the least-priority dir is being rmdir'ed. I've got a small patch that does just that. > 2. Why does copying then deleting a tree leave blocks allocated, > which remain allocated indefinitely, until memory pressure or > drop_caches removes them? Hmm, I should have done "df -i" instead, > that would be more revealing. This may well be the same as the LTP > mkdir problem - inodes remaining half-allocated after they're unlinked. Turns out we weren't releasing the ref's on the lower directory being rmdir'ed as early as we could. We'd have done it in delete/clear_inode, upon memory pressure, or unmount -- so those resources wouldn't have stuck around forever. I now have a small patch that releases those resources on rmdir and the space (df and df -i) is reclaimed right
Re: Where is the new timerfd?
On Fri, 23 Nov 2007, Ulrich Drepper wrote:

> On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> > Yes, it's disabled, and yes, I'll repost today ...
>
> I haven't seen the patch and don't feel like searching. So I say it
> here: please make sure you add a flags parameter to the system call
> itself (instead of adding it on as for eventfd and signalfd). We need
> to be able to use O_CLOEXEC some way or another.

I'm more than OK with adding a flags parameter. If it were up to me, I'd add
it even to eventfd and signalfd. I asked Linus if he was OK with adding the
flags parameter to all of them. He didn't reply, and I read that as "no".

- Davide
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Mark wrote:
Yeah, I kind of had your reports in mind when I asked that. :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.

Morrison, Tom wrote:
Yes, I believe that - otherwise, this problem would have been a crisis
a LONG time ago...:-)

But I do have some more questions about how things are mapped in your
environment. I have a flat memory map (i.e.: the full 0x0 -- 0x1__ is
passed to the 32bit Linux kernel without any 'holes' and/or reserved areas).

Does your Intel memory map have this same type of flat memory model (and
thus allow use of the FULL lower 4Gig) - or does it reserve areas of lower
4Gig for devices and such - if not - where are these reserved areas - and
how do they relate to the I/O memory map for the device?

In other words, I would be very interested in seeing the memory map & the
PCI memory mapping to see if any overlap/correspond to reserved areas of
lower 4 Gig (in a linux 32bit mode)...
...

I believe that only 2GB or so of the 4GB RAM appears below the 4GB boundary.
The rest is accessed above 4GB, using Intel's 36-bit PAE functionality.

I think what you want to see is /proc/mtrr, annotated below by me:

reg00: base=0x08000 (2048MB), size=2048MB: uncachable, count=1    I/O space
reg01: base=0x0 (   0MB), size=4096MB: write-back, count=1    first 2GB of RAM + I/O space
reg02: base=0x1 (4096MB), size=1024MB: write-back, count=1    third GB of RAM
reg03: base=0x14000 (5120MB), size= 512MB: write-back, count=1    portion of 4th GB of RAM
reg04: base=0x16000 (5632MB), size= 256MB: write-back, count=1    portion of 4th GB of RAM
reg05: base=0x17000 (5888MB), size= 128MB: write-back, count=1    portion of 4th GB of RAM
reg06: base=0x17800 (6016MB), size=  64MB: write-back, count=1    portion of 4th GB of RAM
reg07: base=0x0af80 (2808MB), size=   8MB: uncachable, count=1    (?) dunno

From that, the visible RAM should be 2048 + 1024 + 512 + 256 + 128 + 64 = 4032MB.
In /proc/meminfo, it reports MemTotal of 4067260kB, which divided by 1024 gives 3971MB.
The BIOS reports 4024MB.

But the MTRR values above do make it rather clear that nearly half the RAM
requires 33-bit physical addressing for access.

Cheers
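The arithmetic behind these figures can be checked with a short sketch. It assumes only the sizes and megabyte offsets shown in the annotated /proc/mtrr listing and the MemTotal value quoted above. The table, struct, and function names are hypothetical, and the first entry counts 2048MB because the upper half of reg01 is overlaid by the uncachable reg00 range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One range of RAM, in megabytes (values copied from the MTRR listing). */
struct ram_range {
    uint64_t base_mb;
    uint64_t size_mb;
};

/* Assumption: RAM is exactly the write-back space not covered by reg00. */
static const struct ram_range ram_ranges[] = {
    {    0, 2048 },  /* reg01 minus the I/O space in reg00 */
    { 4096, 1024 },  /* reg02 */
    { 5120,  512 },  /* reg03 */
    { 5632,  256 },  /* reg04 */
    { 5888,  128 },  /* reg05 */
    { 6016,   64 },  /* reg06 */
};

#define N_RANGES (sizeof(ram_ranges) / sizeof(ram_ranges[0]))

/* Total RAM covered by the ranges, in MB. */
static uint64_t total_ram_mb(void)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < N_RANGES; i++)
        sum += ram_ranges[i].size_mb;
    return sum;
}

/* RAM that starts at or above the 4GB boundary, in MB. */
static uint64_t ram_above_4g_mb(void)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < N_RANGES; i++)
        if (ram_ranges[i].base_mb >= 4096)
            sum += ram_ranges[i].size_mb;
    return sum;
}

/* Number of physical address bits needed to reach the last byte of RAM. */
static unsigned address_bits_needed(void)
{
    uint64_t top_mb = 0;
    unsigned bits = 0;

    for (size_t i = 0; i < N_RANGES; i++) {
        uint64_t end = ram_ranges[i].base_mb + ram_ranges[i].size_mb;
        if (end > top_mb)
            top_mb = end;
    }
    for (uint64_t last = top_mb * 1024 * 1024 - 1; last != 0; last >>= 1)
        bits++;
    return bits;
}
```

Under these assumptions the ranges add up to 4032MB, of which 1984MB (just under half) sits above 4GB, and the highest RAM address ends at 6080MB, which needs 33 address bits.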
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 10:54:11PM +0300, Evgeniy Polyakov wrote: > On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) > wrote: > > Here's another thought: move all this logic into the networking core, > > unify it with current softirq zapper, then allow it to be called from > > various other places (like atomic allocators). Then it'll all be in > > central maintained place with more users. > > This can be done quite easily - put a check into __kfree_skb() if > netpoll is compiled-in and we are in hardirq context, then put skb > into softirq freeing queue. Then zap_completion_queue() can free > anything without ever knowing about nature of the packet, since this > will be checked in __kfree_skb() anyway. What I had in mind was moving the whole zap_completion_queue concept into net/core/skbuff. So that netpoll (and, say, atomic kmalloc) can simply call something like "clean_completion_queue". -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 9/9] Clean up open coded inode dirty checks
On Nov 23 2007 11:47, Joe Perches wrote:
>On Fri, 2007-11-23 at 19:16 +0100, Jan Engelhardt wrote:
>> static inline bool xfs_inode_clean(const struct xfs_inode *ip)
>> {
>>     if (ip->i_itemp == NULL)
>>         return true;
>>     if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) &&
>>         ip->i_update_core == NULL)
>>         return true;
>>     return false;
>> }
>
>Your code changed the test.

See - the previous cryptic constructs could not even be decoded ;-)

>xfs_inode.i_update_core is an unsigned char.
>
>I believe reordering the tests to avoid a possibly
>unnecessary dereference is better.
>
>    if (ip->i_update_core)
>        return false;
>    if (!ip->i_itemp)
>        return true;
>    return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL;

Yeah, something like that. Note: the function SHOULD return bool for this,
to quash the ilf_fields & XFS_ILOG_ALL into 0/1.
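A self-contained sketch of the reordered check, for experimenting outside the kernel. The structures below are stand-ins that mirror only the fields named in the thread, and the XFS_ILOG_ALL value is a placeholder, not the real mask. One assumption is labelled in the code: the sketch negates the final mask test, on the reading that "clean" means no logged fields are set. The snippet quoted in the thread omits that negation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value; the real XFS_ILOG_ALL lives in the XFS headers. */
#define XFS_ILOG_ALL 0x0ffu

/* Minimal stand-ins for the kernel structures (hypothetical layout). */
struct xfs_inode_log_format { unsigned int ilf_fields; };
struct xfs_inode_log_item { struct xfs_inode_log_format ili_format; };
struct xfs_inode {
    struct xfs_inode_log_item *i_itemp;
    unsigned char i_update_core;
};

/* Cheap test first, pointer dereference last. */
static inline bool xfs_inode_clean(const struct xfs_inode *ip)
{
    if (ip->i_update_core)
        return false;
    if (!ip->i_itemp)
        return true;
    /* Assumption: clean means no logged fields, hence the negation. */
    return !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL);
}

/* A bool return type collapses any non-zero mask result to exactly 1. */
static inline bool has_logged_fields(const struct xfs_inode *ip)
{
    return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL;
}
```

The second helper shows the point about bool: with an int or unsigned char return type the caller would see the raw masked bits (or a truncated version of them) instead of 0 or 1.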
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 10:54:10PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) > wrote: > > Here's another thought: move all this logic into the networking core, > > unify it with current softirq zapper, then allow it to be called from > > various other places (like atomic allocators). Then it'll all be in > > central maintained place with more users. > > This can be done quite easily - put a check into __kfree_skb() if > netpoll is compiled-in and we are in hardirq context, then put skb > into softirq freeing queue. Then zap_completion_queue() can free > anything without ever knowing about nature of the packet, since this > will be checked in __kfree_skb() anyway. And let's add some mess... But should fix the case when netpoll code is being executed in interrupt context and is about to free skb, which should not be freed. Frankly saying this looks like crap. Crap-added-by: Evgeniy Polyakov <[EMAIL PROTECTED]> diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 758dafe..88f8ea9 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -196,10 +196,7 @@ static void zap_completion_queue(void) while (clist != NULL) { struct sk_buff *skb = clist; clist = clist->next; - if (skb->destructor) - dev_kfree_skb_any(skb); /* put this one back */ - else - __kfree_skb(skb); + __kfree_skb(skb); } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 27cfe5f..8642097 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -318,6 +318,26 @@ void kfree_skbmem(struct sk_buff *skb) void __kfree_skb(struct sk_buff *skb) { +#if defined(CONFIG_NETPOLL) || defined(CONFIG_NETPOLL_TRAP) + if (in_irq() || irqs_disabled()) { + if (skb->destructor) { + dev_kfree_skb_irq(skb); + return; + } +#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) + if (skb->nfct || skb->nfct_reasm) { + dev_kfree_skb_irq(skb); + return; + } +#endif +#ifdef CONFIG_XFRM + if (skb->sp) { + 
dev_kfree_skb_irq(skb); + return; + } +#endif + } +#endif dst_release(skb->dst); #ifdef CONFIG_XFRM secpath_put(skb->sp); -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm 1/2] wait_task_stopped: remove unneeded delay_group_leader check
wait_task_stopped() doesn't need the "delay_group_leader" parameter. If the child is not traced it must be a group leader. With or without subthreads ->group_stop_count == 0 when the whole task is stopped. Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]> --- PT/kernel/exit.c~5_ck_group_stop2007-11-22 19:08:43.0 +0300 +++ PT/kernel/exit.c2007-11-23 20:31:21.0 +0300 @@ -1348,7 +1348,7 @@ static int wait_task_zombie(struct task_ * the lock and this task is uninteresting. If we return nonzero, we have * released the lock and the system call should return. */ -static int wait_task_stopped(struct task_struct *p, int delayed_group_leader, +static int wait_task_stopped(struct task_struct *p, int noreap, struct siginfo __user *infop, int __user *stat_addr, struct rusage __user *ru) { @@ -1362,8 +1362,7 @@ static int wait_task_stopped(struct task if (unlikely(!is_task_stopped_or_traced(p))) goto unlock_sig; - if (delayed_group_leader && !(p->ptrace & PT_PTRACED) && - p->signal->group_stop_count > 0) + if (!(p->ptrace & PT_PTRACED) && p->signal->group_stop_count > 0) /* * A group stop is in progress and this is the group leader. * We won't report until all threads have stopped. @@ -1519,7 +1518,7 @@ repeat: !(options & WUNTRACED)) continue; - retval = wait_task_stopped(p, ret == 2, + retval = wait_task_stopped(p, (options & WNOWAIT), infop, stat_addr, ru); } else if (p->exit_state == EXIT_ZOMBIE) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm 2/2] do_wait: cleanup delay_group_leader() usage
eligible_child() == 2 means delay_group_leader(). With the previous patch this only matters for EXIT_ZOMBIE task, we can move that special check to the only place it is really needed. Also, with this patch we don't skip security_task_wait() for the group leaders in a non-empty thread group. I don't really understand the exact semantics of security_task_wait(), but imho this change is a bugfix. Also rearrange the code a bit to kill an ugly "check_continued" backdoor. Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]> --- PT/kernel/exit.c~6_delay_leader 2007-11-23 20:31:21.0 +0300 +++ PT/kernel/exit.c2007-11-23 21:29:44.0 +0300 @@ -1137,12 +1137,6 @@ static int eligible_child(pid_t pid, int if (((p->exit_signal != SIGCHLD) ^ ((options & __WCLONE) != 0)) && !(options & __WALL)) return 0; - /* -* Do not consider thread group leaders that are -* in a non-empty thread group: -*/ - if (delay_group_leader(p)) - return 2; err = security_task_wait(p); if (err) @@ -1494,10 +1488,9 @@ repeat: tsk = current; do { struct task_struct *p; - int ret; list_for_each_entry(p, >children, sibling) { - ret = eligible_child(pid, options, p); + int ret = eligible_child(pid, options, p); if (!ret) continue; @@ -1521,19 +1514,17 @@ repeat: retval = wait_task_stopped(p, (options & WNOWAIT), infop, stat_addr, ru); - } else if (p->exit_state == EXIT_ZOMBIE) { + } else if (p->exit_state == EXIT_ZOMBIE && + !delay_group_leader(p)) { /* -* Eligible but we cannot release it yet: +* We don't reap group leaders with subthreads. */ - if (ret == 2) - goto check_continued; if (!likely(options & WEXITED)) continue; retval = wait_task_zombie(p, (options & WNOWAIT), infop, stat_addr, ru); } else if (p->exit_state != EXIT_DEAD) { -check_continued: /* * It's running now, so it might later * exit, stop, or stop and then continue. 
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 01:41:39PM -0600, Matt Mackall ([EMAIL PROTECTED]) wrote:
> Here's another thought: move all this logic into the networking core,
> unify it with current softirq zapper, then allow it to be called from
> various other places (like atomic allocators). Then it'll all be in
> central maintained place with more users.

This can be done quite easily: put a check into __kfree_skb() so that, if
netpoll is compiled in and we are in hardirq context, the skb is put
into the softirq freeing queue. Then zap_completion_queue() can free
anything without ever knowing about the nature of the packet, since this
will be checked in __kfree_skb() anyway.

Kind of this:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..88f8ea9 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,10 +196,7 @@ static void zap_completion_queue(void)
 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
 			clist = clist->next;
-			if (skb->destructor)
-				dev_kfree_skb_any(skb); /* put this one back */
-			else
-				__kfree_skb(skb);
+			__kfree_skb(skb);
 		}
 	}
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 27cfe5f..f720685 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -318,6 +318,12 @@ void kfree_skbmem(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+#if defined(CONFIG_NETPOLL) || defined(CONFIG_NETPOLL_TRAP)
+	if (in_irq() || irqs_disabled()) {
+		dev_kfree_skb_irq(skb);
+		return;
+	}
+#endif
 	dst_release(skb->dst);
 #ifdef CONFIG_XFRM
 	secpath_put(skb->sp);

-- 
	Evgeniy Polyakov
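The idea of deferring the free whenever the caller is in an unsafe context can be modelled in user space. This is only a sketch under stated assumptions: the in_hardirq flag stands in for in_irq() || irqs_disabled(), the deferred list stands in for the softirq completion queue, and every name below is hypothetical rather than a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf {
    struct buf *next;
};

static bool in_hardirq;       /* stand-in for in_irq() || irqs_disabled() */
static struct buf *deferred;  /* stand-in for the softirq completion queue */
static int freed_count;       /* how many buffers were really released */

/* Free immediately in a safe context; otherwise queue the buffer. */
static void buf_free(struct buf *b)
{
    if (in_hardirq) {
        b->next = deferred;
        deferred = b;
        return;
    }
    free(b);
    freed_count++;
}

/* Detach the deferred list and try to free every buffer on it. */
static void drain_deferred(void)
{
    struct buf *list = deferred;

    deferred = NULL;
    while (list != NULL) {
        struct buf *b = list;

        list = list->next;
        buf_free(b);
    }
}

/* Count the buffers currently waiting on the deferred list. */
static int deferred_len(void)
{
    int n = 0;

    for (struct buf *b = deferred; b != NULL; b = b->next)
        n++;
    return n;
}
```

In this model, draining while in_hardirq is still set only re-queues every buffer, so no memory is released until the drain runs in the safe context. Whether that is acceptable for a zapper that exists to reclaim memory inside a long interrupt is a design question the model does not settle.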
Re: Inotify fails to send IN_ATTRIB events
This looks bad, though:

include/linux/fsnotify.h:121: warning: passing argument 2 of 'audit_inode_child' from incompatible pointer type

Missing "->d_inode"?

M.
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Fri, Nov 23, 2007 at 02:35:05PM +1100, Rusty Russell wrote:
> On Friday 23 November 2007 12:36:22 Andi Kleen wrote:
> > On Friday 23 November 2007 01:25, Rusty Russell wrote:
> > > That's my point. If there's a whole class of modules which can use a
> > > symbol, why are we ruling out external modules?
> >
> > The point is to get cleaner interfaces.
>
> But this doesn't change interfaces at all. It makes modules fail to load
> unless they're on a permitted list, which now requires maintenance.

The modules wouldn't be using the internal interfaces in the first place
with name spaces in place. This serves as documentation of what is
considered internal. And if some obscure module (in or out of tree) wants
to use an internal interface, its author first has to send the module
maintainer a patch and gets some review this way.

I believe that is fairly important in tree too, because the kernel has
become so big now that review cannot be the only enforcement mechanism
for this anymore.

Another, secondary reason is that there are too many exported interfaces
in general. Several distributions have policies that require keeping
changes to these exported interfaces minimal, and that is very hard with
thousands of exported symbols. With name spaces the number of truly
publicly exported symbols will hopefully shrink to a much smaller, more
manageable set.

-Andi
Re: [PATCH 9/9] Clean up open coded inode dirty checks
On Fri, 2007-11-23 at 19:16 +0100, Jan Engelhardt wrote:
> static inline bool xfs_inode_clean(const struct xfs_inode *ip)
> {
>     if (ip->i_itemp == NULL)
>         return true;
>     if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) &&
>         ip->i_update_core == NULL)
>         return true;
>     return false;
> }

Your code changed the test.
xfs_inode.i_update_core is an unsigned char.

I believe reordering the tests to avoid a possibly
unnecessary dereference is better.

    if (ip->i_update_core)
        return false;
    if (!ip->i_itemp)
        return true;
    return ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL;
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 10:32:22PM +0300, Evgeniy Polyakov wrote: > On Fri, Nov 23, 2007 at 01:11:20PM -0600, Matt Mackall ([EMAIL PROTECTED]) > wrote: > > On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote: > > > On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL > > > PROTECTED]) wrote: > > > > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL > > > > PROTECTED]) wrote: > > > > > Stop, we are trying to free skb without destructor and catch > > > > > connection > > > > > tracking, so it is not a solution. To fix the problem we need to check > > > > > if it is not netfilter related, kind of this (not tested), Simon > > > > > please > > > > > give it a try: > > > > > > > > And to be really cool we need to bypass skbs with xfrm attached, since > > > > its freeing also assumes BH context. > > > > > > What about compile options? > > > > What about my original suggestion that we mark skbs owned by netpoll > > and free only those. Much safer, no? Untested: > > This should work if there are netpoll's skbs, but if we are under memory > pressure we want to free not only netpoll skbs, but at least one, and > what if there are no netpoll skbs in the queue? Yeah, that's a concern (but note that we do have a private reserve and we only really need the zap when our reserve is depleted). But I worry that it's too fragile and if we add a new unsafe case, it won't be noticed for a long time. This is the first report I've seen of this particular problem, so this has been a latent bug for three or four years now. Here's another thought: move all this logic into the networking core, unify it with current softirq zapper, then allow it to be called from various other places (like atomic allocators). Then it'll all be in central maintained place with more users. -- Mathematics is the supreme nostalgia of our time. 
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 01:11:20PM -0600, Matt Mackall ([EMAIL PROTECTED]) wrote: > On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote: > > On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL > > PROTECTED]) wrote: > > > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL > > > PROTECTED]) wrote: > > > > Stop, we are trying to free skb without destructor and catch connection > > > > tracking, so it is not a solution. To fix the problem we need to check > > > > if it is not netfilter related, kind of this (not tested), Simon please > > > > give it a try: > > > > > > And to be really cool we need to bypass skbs with xfrm attached, since > > > its freeing also assumes BH context. > > > > What about compile options? > > What about my original suggestion that we mark skbs owned by netpoll > and free only those. Much safer, no? Untested: This should work if there are netpoll's skbs, but if we are under memory pressure we want to free not only netpoll skbs, but at least one, and what if there are no netpoll skbs in the queue? -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 10:15:24PM +0300, Evgeniy Polyakov wrote: > On Fri, Nov 23, 2007 at 12:59:43PM -0600, Matt Mackall ([EMAIL PROTECTED]) > wrote: > > So I'd be surprised if that was a problem. But I can imagine having > > problems for skbs without destructors which run into one of these in > > __kfree_skb: > > > > dst_release > > secpath_put > > nf_conntrack_put > > nf_conntrack_put_reasm > > nf_bridge_put > > > > ..some or all of which assume a softirq context. > > bridging is ok, others require softirq context. > I've sent a patch (the last one should be ok) to guard against xfrm and > connection tracking. > > > > No matter if we are under memory pressure or whatever - it is not > > > allowed - a lot of skbs are supposed to be freed in softirq context, > > > that is why dev_kfree_skb_any() exists. > > > > Some skbs we definitely -can- free in irq context. The only ones we > > care about are the ones generated by netpoll. If there's a reason you > > think netpoll's own skbs can't be freed, please describe it. > > Only some and to distinguish them we can not use destructor - if it is > set (even empty function) it will fire an alarm. Yep, please look at the patch I just posted. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] utsns: Restore proper namespace handling.
When CONFIG_UTS_NS was removed it seems that we also deleted
the code for handling sysctls in uts namespaces other than the
initial one.

This patch restores that code.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/utsname_sysctl.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c
index c76c064..71f58c3 100644
--- a/kernel/utsname_sysctl.c
+++ b/kernel/utsname_sysctl.c
@@ -18,6 +18,8 @@
 static void *get_uts(ctl_table *table, int write)
 {
 	char *which = table->data;
+	struct uts_namespace *uts_ns = current->nsproxy->uts_ns;
+	which = (which - (char *)&init_uts_ns) + (char *)uts_ns;
 
 	if (!write)
 		down_read(&uts_sem);
-- 
1.5.3.rc6.17.g1911
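The pointer arithmetic in the added line translates "a field inside the initial namespace" into "the same field inside the current namespace" by reusing the byte offset. A minimal user-space sketch of that technique follows. The struct uts_like type, the init_ns object, and rebase_field() are simplified, hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a per-namespace record. */
struct uts_like {
    char sysname[16];
    char nodename[16];
};

/* Plays the role of the initial namespace that the sysctl table points into. */
static struct uts_like init_ns = { "Linux", "host-init" };

/*
 * Given a pointer to some field of init_ns, return a pointer to the
 * same field of another instance: take the field's byte offset within
 * init_ns and apply it to the start of ns.
 */
static char *rebase_field(char *field_in_init, struct uts_like *ns)
{
    return (field_in_init - (char *)&init_ns) + (char *)ns;
}
```

The table can therefore keep a single static pointer per entry, and the lookup picks the right instance at call time instead of needing one table per namespace.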
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 09:59:06PM +0300, Evgeniy Polyakov wrote:
> On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > Stop, we are trying to free skb without destructor and catch connection
> > > tracking, so it is not a solution. To fix the problem we need to check
> > > if it is not netfilter related, kind of this (not tested), Simon please
> > > give it a try:
> >
> > And to be really cool we need to bypass skbs with xfrm attached, since
> > its freeing also assumes BH context.
>
> What about compile options?

What about my original suggestion that we mark skbs owned by netpoll
and free only those. Much safer, no? Untested:

diff -r c60016ba6237 net/core/netpoll.c
--- a/net/core/netpoll.c	Tue Nov 13 09:09:36 2007 -0800
+++ b/net/core/netpoll.c	Fri Nov 23 13:10:28 2007 -0600
@@ -203,6 +203,12 @@ static void refill_skbs(void)
 	spin_unlock_irqrestore(&skb_pool.lock, flags);
 }
 
+/* used to mark an skb as owned by netpoll */
+static void netpoll_skb_destroy(struct sk_buff *skb)
+{
+	return;
+}
+
 static void zap_completion_queue(void)
 {
 	unsigned long flags;
@@ -219,10 +225,12 @@ static void zap_completion_queue(void)
 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
 			clist = clist->next;
-			if (skb->destructor)
+			if (skb->destructor == netpoll_skb_destroy) {
+				skb->destructor = NULL;
+				__kfree_skb(skb);
+			}
+			else
 				dev_kfree_skb_any(skb); /* put this one back */
-			else
-				__kfree_skb(skb);
 		}
 	}
 
@@ -252,6 +260,7 @@ repeat:
 
 	atomic_set(&skb->users, 1);
 	skb_reserve(skb, reserve);
+	skb->destructor = netpoll_skb_destroy;
 	return skb;
 }
 
-- 
Mathematics is the supreme nostalgia of our time.
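The ownership-marking trick in this patch (using the address of an empty destructor as a tag, then freeing only tagged buffers) can be tried in isolation. The sketch below is a user-space model with hypothetical names (struct pkt, owned_pkt_destroy, zap_owned); it does not reproduce sk_buff or the real completion queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pkt {
    struct pkt *next;
    void (*destructor)(struct pkt *);
};

/* Empty function; only its address matters, as an ownership mark. */
static void owned_pkt_destroy(struct pkt *p)
{
    (void)p;
}

/* Stand-in for a destructor installed by some other subsystem. */
static void foreign_destroy(struct pkt *p)
{
    (void)p;
}

/* Allocate a packet and mark it as ours. */
static struct pkt *pkt_alloc_owned(void)
{
    struct pkt *p = calloc(1, sizeof(*p));

    if (p != NULL)
        p->destructor = owned_pkt_destroy;
    return p;
}

/*
 * Walk the list, free only packets carrying our mark, and return the
 * list of packets that were put back. *freed is incremented once per
 * packet actually released.
 */
static struct pkt *zap_owned(struct pkt *list, int *freed)
{
    struct pkt *kept = NULL;

    while (list != NULL) {
        struct pkt *p = list;

        list = list->next;
        if (p->destructor == owned_pkt_destroy) {
            p->destructor = NULL;  /* clear the mark before freeing */
            free(p);
            (*freed)++;
        } else {
            p->next = kept;        /* not ours: put this one back */
            kept = p;
        }
    }
    return kept;
}
```

Packets with a foreign destructor and packets with no destructor at all are both left alone, so the zapper never has to know which other subsystems attach state to a buffer. The cost is that nothing is reclaimed when the queue holds no marked packets.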
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 12:59:43PM -0600, Matt Mackall ([EMAIL PROTECTED]) wrote: > So I'd be surprised if that was a problem. But I can imagine having > problems for skbs without destructors which run into one of these in > __kfree_skb: > > dst_release > secpath_put > nf_conntrack_put > nf_conntrack_put_reasm > nf_bridge_put > > ..some or all of which assume a softirq context. bridging is ok, others require softirq context. I've sent a patch (the last one should be ok) to guard against xfrm and connection tracking. > > No matter if we are under memory pressure or whatever - it is not > > allowed - a lot of skbs are supposed to be freed in softirq context, > > that is why dev_kfree_skb_any() exists. > > Some skbs we definitely -can- free in irq context. The only ones we > care about are the ones generated by netpoll. If there's a reason you > think netpoll's own skbs can't be freed, please describe it. Only some and to distinguish them we can not use destructor - if it is set (even empty function) it will fire an alarm. > > I think we can drop skbs _without_ destructor from the queue though in > > that conditions given that we actually need only one. > > Huh? Don't mind - friday... I posted a patch (third one should be ok) to fix this issue. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Where is the new timerfd?
On Nov 23, 2007 7:38 PM, Ulrich Drepper <[EMAIL PROTECTED]> wrote:
> On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> > Yes, it's disabled, and yes, I'll repost today ...
>
> I haven't seen the patch and don't feel like searching. So I say it
> here: please make sure you add a flags parameter to the system call
> itself (instead of adding it on as for eventfd and signalfd). We need
> to be able to use O_CLOEXEC some way or another.

Seems reasonable to add this for timer_create() (though it is unfortunate
that it is now too late to do the same for eventfd() and signalfd()).

Davide, what do you think?

Cheers,

Michael
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 08:57:57PM +0300, Evgeniy Polyakov wrote: > On Fri, Nov 23, 2007 at 11:07:56AM -0600, Matt Mackall ([EMAIL PROTECTED]) > wrote: > > On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote: > > > On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL > > > PROTECTED]) wrote: > > > > > [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at > > > > > kernel/softirq.c:139 local_bh_enable() > > > > > [2059664.620535] [<80120364>] local_bh_enable+0x3c/0x97 > > > > > > > > [2059664.620657] [<8011c205>] __call_console_drivers+0x61/0x6d > > > > > [2059664.620669] [<8011c3fc>] release_console_sem+0x164/0x1bf > > > > > [2059664.620679] [<8011c81f>] vprintk+0x27a/0x2ff > > > > > > > If that trace is to be beieved we're doing nefilter stuff on packets > > > > which > > > > were sent across netconsole. > > > > > > > > This probably isn't anything the netfilter guys have thought about. And > > > > probably we don't want them to. Is there some simple way in which we > > > > can > > > > exempt netconsole from netfilter processing? > > > > > > This is not about netfilter, but about freeing skb in interrupt context, > > > which is not allowed, and in interrupt skbs are queued to be freed in > > > softirq, > > > but netcnsole wants to flush softirq freeing queue. That is a question: > > > why? > > > > My memory here is hazy, but I think this exists to rescue netconsole > > in low-memory situations. This bit originated with Ingo, so maybe he > > can recall. > > > > Netpoll can process an arbitrary number of skbs inside a single > > interrupt. Think sysrq-t at one packet per line or kgdboe where the > > entire trace session can happen inside one very long interrupt. > > > > Perhaps we can refine this to mark netpoll's skbs (perhaps with > > ->destructor?) and delete only skbs we own. As these are never passed > > through any of the other route/xfrm/filter code, they should be safe > > to delete even in irq context, yes? 
> > > > > Removing zap_completion_queue() from find_skb() will fix the warning, > > > but I'm not sure this is a correct fix. I've added Matt to the Cc list. > > > > Care to try the sysrq-t or OOM message tests? > > We basically can not free skbs there - if it is interrupt context and > we are freeing some skb with destructor we will catch the warning anyway. Perhaps I'm missing some context here. We don't free skbs with destructors in irq context in zap_completion_queue. We reinsert them on the completion list. We do this by calling dev_kfree_skb_any. So I'd be surprised if that was a problem. But I can imagine having problems for skbs without destructors which run into one of these in __kfree_skb: dst_release secpath_put nf_conntrack_put nf_conntrack_put_reasm nf_bridge_put ..some or all of which assume a softirq context. > No matter if we are under memory pressure or whatever - it is not > allowed - a lot of skbs are supposed to be freed in softirq context, > that is why dev_kfree_skb_any() exists. Some skbs we definitely -can- free in irq context. The only ones we care about are the ones generated by netpoll. If there's a reason you think netpoll's own skbs can't be freed, please describe it. > I think we can drop skbs _without_ destructor from the queue though in > that conditions given that we actually need only one. Huh? -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 09:51:01PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL > PROTECTED]) wrote: > > Stop, we are trying to free skb without destructor and catch connection > > tracking, so it is not a solution. To fix the problem we need to check > > if it is not netfilter related, kind of this (not tested), Simon please > > give it a try: > > And to be really cool we need to bypass skbs with xfrm attached, since > its freeing also assumes BH context. What about compile options? Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 758dafe..adb3c54 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -196,10 +196,25 @@ static void zap_completion_queue(void) while (clist != NULL) { struct sk_buff *skb = clist; clist = clist->next; - if (skb->destructor) + if (skb->destructor) { dev_kfree_skb_any(skb); /* put this one back */ - else - __kfree_skb(skb); + continue; + } + +#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) + if (skb->nfct || skb->nfct_reasm) { + dev_kfree_skb_any(skb); /* put this one back */ + continue; + } +#endif + +#ifdef CONFIG_XFRM + if (skb->sp) { + dev_kfree_skb_any(skb); /* put this one back */ + continue; + } +#endif + __kfree_skb(skb); } } -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 09:48:51PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> Stop, we are trying to free skb without destructor and catch connection
> tracking, so it is not a solution. To fix the problem we need to check
> if it is not netfilter related, kind of this (not tested), Simon please
> give it a try:

And to be really cool we need to bypass skbs with xfrm attached, since its freeing also assumes BH context.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..5f86e60 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,7 +196,8 @@ static void zap_completion_queue(void)
 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
 			clist = clist->next;
-			if (skb->destructor)
+			if (skb->destructor || skb->nfct ||
+			    skb->nfct_reasm || skb->sp)
 				dev_kfree_skb_any(skb); /* put this one back */
 			else
 				__kfree_skb(skb);

--
	Evgeniy Polyakov
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 08:57:57PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > > My memory here is hazy, but I think this exists to rescue netconsole > > in low-memory situations. This bit originated with Ingo, so maybe he > > can recall. > > > > Netpoll can process an arbitrary number of skbs inside a single > > interrupt. Think sysrq-t at one packet per line or kgdboe where the > > entire trace session can happen inside one very long interrupt. > > > > Perhaps we can refine this to mark netpoll's skbs (perhaps with > > ->destructor?) and delete only skbs we own. As these are never passed > > through any of the other route/xfrm/filter code, they should be safe > > to delete even in irq context, yes? > > > > > Removing zap_completion_queue() from find_skb() will fix the warning, > > > but I'm not sure this is a correct fix. I've added Matt to the Cc list. > > > > Care to try the sysrq-t or OOM message tests? > > We basically can not free skbs there - if it is interrupt context and > we are freeing some skb with destructor we will catch the warning anyway. > > No matter if we are under memory pressure or whatever - it is not > allowed - a lot of skbs are supposed to be freed in softirq context, > that is why dev_kfree_skb_any() exists. > > I think we can drop skbs _without_ destructor from the queue though in > that conditions given that we actually need only one. Stop, we are trying to free skb without destructor and catch connection tracking, so it is not a solution. 
To fix the problem we need to check if it is not netfilter related, kind of this (not tested), Simon please give it a try:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 758dafe..855bb3f 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -196,7 +196,7 @@ static void zap_completion_queue(void)
 		while (clist != NULL) {
 			struct sk_buff *skb = clist;
 			clist = clist->next;
-			if (skb->destructor)
+			if (skb->destructor || skb->nfct || skb->nfct_reasm)
 				dev_kfree_skb_any(skb); /* put this one back */
 			else
 				__kfree_skb(skb);

--
	Evgeniy Polyakov
RE: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Yes, I believe that - otherwise, this problem would have been a crisis a LONG time ago...:-)

But I do have some more questions about how things are mapped in your environment. I have a flat memory map (i.e.: the full 0x0 -- 0x1__ is passed to the 32bit Linux kernel without any 'holes' and/or reserved areas).

Does your Intel memory map have this same type of flat memory model (and thus allow use of the FULL lower 4Gig) - or does it reserve areas of lower 4Gig for devices and such - if so - where are these reserved areas - and how do they relate to the I/O memory map for the device?

In other words, I would be very interested in seeing the memory map & the PCI memory mapping to see if any overlap/correspond to reserved areas of lower 4 Gig (in a linux 32bit mode)...

Tom

From: Mark Lord [mailto:[EMAIL PROTECTED]
Sent: Fri 11/23/2007 12:46 PM
To: Morrison, Tom
Cc: Robert Hancock; linux-kernel; ide; Jeff Garzik; Tejun Heo
Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

Morrison, Tom wrote:
> I am hopeful that the sata_mv has this bug (I proved that the
> problem I was experiencing was due to the sata_mv driver
> with 3.75Gig or more of memory)...
>
> I am on vacation for a week or more ...or I'd tell you today
> if it did have this bug!
..

Yeah, I kind of had your reports in mind when I asked that. :)

On a related note, I now have lots of Marvell (sata_mv) hardware here, and an Intel CPU/chipset box with physical RAM above the 4GB boundary.

Cheers
Re: Where is the new timerfd?
On Nov 23, 2007 9:29 AM, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> Yes, it's disabled, and yes, I'll repost today ...

I haven't seen the patch and don't feel like searching. So I say it here: please make sure you add a flags parameter to the system call itself (instead of adding it on as for eventfd and signalfd). We need to be able to use O_CLOEXEC some way or another.
Re: 2.6.23.1: Random hangs during boot with "tsc" clocksource
Chuck Ebbert wrote: > On 11/20/2007 06:20 PM, Jordan Russell wrote: >> Same problem with 2.6.23.8. >> >> Are there any specific (TSC related?) patches I should try reverting? >> >> Would it help if I captured the dmesg/SysRq output from one of the >> hanging boots? >> >> Any other information that might be useful in getting to the bottom of this? >> > > Did you try this one? You are seeing problems with preemption disabled, > but it's at least worth trying. > > > > From: Marin Mitov <[EMAIL PROTECTED]> > To: linux-kernel@vger.kernel.org > Subject: [PATCH]new_TSC_based_delay_tsc() > Cc: Ingo Molnar <[EMAIL PROTECTED]> > Date: Tue, 20 Nov 2007 21:32:27 +0200 Thanks for the response. I backported that patch to 2.6.23.8, but it didn't make a difference. I also went ahead and tested 2.6.24-rc3 (with no changes to the existing config settings): With the new CPU_IDLE option set to "n", it still hangs, but much less frequently than on 2.6.23.x. In 25 tries, there were 3 hangs, all after "input: AT Translated Set 2 keyboard as /class/input/input0". With CPU_IDLE set to "y", it didn't hang at all in 25 tries. However, CPU_IDLE=y produces these additional messages, which may explain why: Marking TSC unstable due to: TSC halts in idle. ... Time: acpi_pm clocksource has been installed. Not sure what else to try at this point... -- Jordan Russell - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 9/9] Clean up open coded inode dirty checks
On Nov 23 2007 18:02, Christoph Hellwig wrote: > >> +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip) >> +{ >> +return (((ip->i_itemp == NULL) || >> +!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && >> +(ip->i_update_core == 0)); >> +} > >Can we please get rid of this useless STATIC_INLINE junk? It's really >hurting my eyes. > >As does to a lesser extent the verbose style of this >function. I have to disagree, but whatever. >static inline int xfs_inode_clean(struct xfs_inode *ip) ^ ^ could be bool - and const >{ > return (!ip->i_itemp || > !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && > !ip->i_update_core; >} Perhaps for greater readability: static inline bool xfs_inode_clean(const struct xfs_inode *ip) { if (ip->i_itemp == NULL) return true; if (!(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL) && ip->i_update_core == NULL) return true; return false; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
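One point worth checking when unrolling the expression into early returns: the original requires i_update_core to be zero in every case, whereas a version that returns true as soon as i_itemp is NULL never consults i_update_core on that path. The following user-space sketch keeps the original semantics. `struct fake_inode`, `struct fake_log_item` and `FAKE_ILOG_ALL` are hypothetical stand-ins for the XFS types and mask, and the mask value is made up.

```c
#include <stdbool.h>
#include <stddef.h>

#define FAKE_ILOG_ALL 0x7u /* hypothetical stand-in for XFS_ILOG_ALL */

struct fake_log_item {
	unsigned int fields; /* models ili_format.ilf_fields */
};

struct fake_inode {
	struct fake_log_item *itemp; /* NULL when no log item is attached */
	int update_core;             /* non-zero when the core needs writing */
};

/*
 * Early-return form of:
 *   (itemp == NULL || !(itemp->fields & ALL)) && update_core == 0
 */
static inline bool fake_inode_clean(const struct fake_inode *ip)
{
	if (ip->update_core != 0)
		return false;
	if (ip->itemp == NULL)
		return true;
	return !(ip->itemp->fields & FAKE_ILOG_ALL);
}
```

Testing update_core first means the remaining two checks map one to one onto the two sides of the original `||`.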
Re: [PATCH 9/9] Clean up open coded inode dirty checks
> +STATIC_INLINE int xfs_inode_clean(xfs_inode_t *ip) > +{ > + return (((ip->i_itemp == NULL) || > + !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && > + (ip->i_update_core == 0)); > +} Can we please get rid of this useless STATIC_INLINE junk? It's really hurting my eyes. As does to a lesser extent the verbose style of this function. This should be something like: static inline int xfs_inode_clean(struct xfs_inode *ip) { return (!ip->i_itemp || !(ip->i_itemp->ili_format.ilf_fields & XFS_ILOG_ALL)) && !ip->i_update_core; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/9] Remove xfs_icluster
On Thu, Nov 22, 2007 at 11:39:52AM +1100, David Chinner wrote: > Remove the xfs_icluster structure and replace with a radix tree lookup. > > We don't need to keep a list of inodes in each cluster around anymore > as we can look them up quickly when we need to. The only time we need > to do this now is during inode writeback. > > Factor the inode cluster writeback code out of xfs_iflush and convert > it to use radix_tree_gang_lookup() instead of walking a list of > inodes built when we first read in the inodes. > > This remove 3 pointers from each xfs_inode structure and the xfs_icluster > structure per inode cluster. Hence we reduce the cache footprint of the > xfs_inodes by between 5-10% depending on cluster sparseness. > > To be truly efficient we need a radix_tree_gang_lookup_range() call > to stop searching once we are past the end of the cluster instead > of trying to find a full cluster's worth of inodes. Nice, I like this a lot. I was wondering about something like this already when you put in the radix-tree based inode cache. 
> +STATIC int
> +xfs_iflush_cluster(
> +	xfs_inode_t	*ip,
> +	xfs_buf_t	*bp)
> +{
> +	xfs_mount_t		*mp = ip->i_mount;
> +	xfs_perag_t		*pag = xfs_get_perag(mp, ip->i_ino);
> +	unsigned long		first_index, mask;
> +	int			ilist_size;
> +	xfs_inode_t		*ilist;
> +	xfs_inode_t		*iq;
> +	xfs_inode_log_item_t	*iip;
> +	int			nr_found;
> +	int			clcount = 0;
> +	int			bufwasdelwri;
> +
> +	ASSERT(pag->pagi_inodeok);
> +	ASSERT(pag->pag_ici_init);
> +
> +	ilist_size = XFS_INODE_CLUSTER_SIZE(mp) * sizeof(xfs_inode_t *);
> +	ilist = kmem_alloc(ilist_size, KM_MAYFAIL);
> +	if (!ilist)
> +		return 0;

Now if you just used the linux native allocator this could be a kcalloc :)

> +		if ((iq->i_update_core == 0) &&
> +		    ((iip == NULL) ||
> +		     !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
> +		      xfs_ipincount(iq) == 0) {
> +			continue;
> +		}

		if (!iq->i_update_core &&
		    (!iip || !(iip->ili_format.ilf_fields & XFS_ILOG_ALL)) &&
		    !xfs_ipincount(iq))
			continue;

> +		/*
> +		 * arriving here means that this inode can be flushed. First
> +		 * re-check that it's dirty before flushing.
> +		 */
> +		iip = iq->i_itemp;
> +		if ((iq->i_update_core != 0) || ((iip != NULL) &&
> +		    (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) {

		if (iq->i_update_core ||
		    (iip && (iip->ili_format.ilf_fields & XFS_ILOG_ALL))) {

> +	/*
> +	 * Clean up the buffer. If it was B_DELWRI, just release it --
> +	 * brelse can handle it with no problems. If not, shut down the
> +	 * filesystem before releasing the buffer.
> +	 */
> +	bufwasdelwri = XFS_BUF_ISDELAYWRITE(bp);
> +	if (bufwasdelwri)
> +		xfs_buf_relse(bp);
> +
> +	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +
> +	if (!bufwasdelwri) {
> +		/*
> +		 * Just like incore_relse: if we have b_iodone functions,
> +		 * mark the buffer as an error and call them. Otherwise
> +		 * mark it as stale and brelse.
> + */ > + if (XFS_BUF_IODONE_FUNC(bp)) { > + XFS_BUF_CLR_BDSTRAT_FUNC(bp); > + XFS_BUF_UNDONE(bp); > + XFS_BUF_STALE(bp); > + XFS_BUF_SHUT(bp); > + XFS_BUF_ERROR(bp,EIO); > + xfs_biodone(bp); > + } else { > + XFS_BUF_STALE(bp); > + xfs_buf_relse(bp); > + } > + } What's the point of all this if the filesystem is shut down anyway? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 11:07:56AM -0600, Matt Mackall ([EMAIL PROTECTED]) wrote: > On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote: > > On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL PROTECTED]) > > wrote: > > > > [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at > > > > kernel/softirq.c:139 local_bh_enable() > > > > [2059664.620535] [<80120364>] local_bh_enable+0x3c/0x97 > > > > > > [2059664.620657] [<8011c205>] __call_console_drivers+0x61/0x6d > > > > [2059664.620669] [<8011c3fc>] release_console_sem+0x164/0x1bf > > > > [2059664.620679] [<8011c81f>] vprintk+0x27a/0x2ff > > > > > If that trace is to be beieved we're doing nefilter stuff on packets which > > > were sent across netconsole. > > > > > > This probably isn't anything the netfilter guys have thought about. And > > > probably we don't want them to. Is there some simple way in which we can > > > exempt netconsole from netfilter processing? > > > > This is not about netfilter, but about freeing skb in interrupt context, > > which is not allowed, and in interrupt skbs are queued to be freed in > > softirq, > > but netcnsole wants to flush softirq freeing queue. That is a question: why? > > My memory here is hazy, but I think this exists to rescue netconsole > in low-memory situations. This bit originated with Ingo, so maybe he > can recall. > > Netpoll can process an arbitrary number of skbs inside a single > interrupt. Think sysrq-t at one packet per line or kgdboe where the > entire trace session can happen inside one very long interrupt. > > Perhaps we can refine this to mark netpoll's skbs (perhaps with > ->destructor?) and delete only skbs we own. As these are never passed > through any of the other route/xfrm/filter code, they should be safe > to delete even in irq context, yes? > > > Removing zap_completion_queue() from find_skb() will fix the warning, > > but I'm not sure this is a correct fix. I've added Matt to the Cc list. 
> > Care to try the sysrq-t or OOM message tests? We basically can not free skbs there - if it is interrupt context and we are freeing some skb with destructor we will catch the warning anyway. No matter if we are under memory pressure or whatever - it is not allowed - a lot of skbs are supposed to be freed in softirq context, that is why dev_kfree_skb_any() exists. I think we can drop skbs _without_ destructor from the queue though in that conditions given that we actually need only one. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc3-mm1: I/O error, system hangs
Le 23.11.2007 12:38, Hannes Reinecke a écrit : > Hannes Reinecke wrote: >> Laurent Riffard wrote: >>> Le 21.11.2007 23:41, Andrew Morton a écrit : On Wed, 21 Nov 2007 22:45:22 +0100 Laurent Riffard <[EMAIL PROTECTED]> wrote: > Le 21.11.2007 05:45, Andrew Morton a écrit : >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/ > Hello, > > My system hangs shortly after I logged in Gnome desktop. SysRq-W shows > that a bunch of task are blocked in "D" state, they seem to wait for > some I/O completion. I can try to hand-copy some data if requested. > > I found these messages in dmesg: > > ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 > EXT3-fs: mounted filesystem with ordered data mode. > sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT > driverbyte=DRIVER_OK,SUGGEST_OK > end_request: I/O error, dev sda, sector 16460 > ReiserFS: sda7: found reiserfs format "3.6" with standard journal > ReiserFS: sda7: using ordered data mode > -- > ReiserFS: sda7: Using r5 hash to sort names > sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT > driverbyte=DRIVER_OK,SUGGEST_OK > end_request: I/O error, dev sdb, sector 19632 > sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT > driverbyte=DRIVER_OK,SUGGEST_OK > end_request: I/O error, dev sdb, sector 40037363 > Adding 1048568k swap on /dev/mapper/vglinux1-lvswap. Priority:-1 > extents:1 across:1048568k > lp0: using parport0 (interrupt-driven). > > These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible. > 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine. > > Maybe something is broken in pata_via driver ? > Could be - libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch and pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch touch pata_via.c. >>> None of the above... >>> >>> I did a bisection, it spotted git-scsi-misc.patch. >>> I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine. 
>>> >>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not >>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other >>> commits are touching documentation or drivers I don't use. I'll try >>> to revert only this one this evening. I can confirm : reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 does fix the problem. >> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error >> where >> I shouldn't. Checking ... >> > Ok, found it. We are blocking even special commands (ie requests with PREEMPT > not set) > when FAILFAST is set. Which is clearly wrong. The attached patch fixes this. Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O errors. -- laurent - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/9] Don't block pdflush when flushing inodes
> +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c	2007-11-22 10:33:51.037704348 +1100
> @@ -183,12 +183,20 @@ xfs_imap_to_bp(
> 	int		ni;
> 	xfs_buf_t	*bp;
>
> +	if (buf_flags == 0)
> +		buf_flags = XFS_BUF_LOCK;

There are just two callers and they never pass 0, so this is not needed.

> +	error = xfs_itobp_flags(mp, NULL, ip, , , 0, 0,
> +			(noblock) ? XFS_BUF_TRYLOCK : XFS_BUF_LOCK);

No need for the parentheses around noblock.

> +int	xfs_itobp_flags(struct xfs_mount *, struct xfs_trans *,
> 			xfs_inode_t *, struct xfs_dinode **, struct xfs_buf **,
> -			xfs_daddr_t, uint);
> +			xfs_daddr_t, uint, uint);
> +#define xfs_itobp(mp, tp, ip, dipp, bpp, bno, iflags)	\
> +	xfs_itobp_flags(mp, tp, ip, dipp, bpp, bno, iflags, XFS_BUF_LOCK)

I'd say just convert xfs_itobp and all its users to take the additional argument.
Re: [kvm-devel] [PATCH 3/3] virtio PCI device
Anthony Liguori wrote: Avi Kivity wrote: Anthony Liguori wrote: Well please propose the virtio API first and then I'll adjust the PCI ABI. I don't want to build things into the ABI that we never actually end up using in virtio :-)

Move ->kick() to virtio_driver.

Then on each kick, all queues have to be checked for processing? What devices do you expect this would help?

Networking.

I believe Xen networking uses the same event channel for both rx and tx, so in effect they're using this model. Long time since I looked though,

I would have to look, but since rx/tx are rather independent actions, I'm not sure that you would really save that much. You still end up doing the same number of kicks unless I'm missing something.

rx and tx are closely related. You rarely have one without the other. In fact, a tuned implementation should have zero kicks or interrupts for bulk transfers. The rx interrupt on the host will process new tx descriptors and fill the guest's rx queue; the guest's transmit function can also check the receive queue. I don't know if that's achievable for Linux guests currently, but we should aim to make it possible.

Another point is that virtio still has a lot of leading zeros in its mileage counter. We need to keep things flexible and learn from others as much as possible, especially when talking about the ABI.

I'm wary of introducing the notion of hypercalls to this device because it makes the device VMM specific. Maybe we could have the device provide an option ROM that was treated as the device "BIOS" that we could use for kicking and interrupt acking? Any idea of how that would map to Windows? Are there real PCI devices that use the option ROM space to provide what's essentially firmware? Unfortunately, I don't think an option ROM BIOS would map well to other architectures.
The BIOS wouldn't work even on x86 because it isn't mapped to the guest address space (at least not consistently), and doesn't know the guest's programming model (16, 32, or 64-bits? segmented or flat?) Xen uses a hypercall page to abstract these details out. However, I'm not proposing that. Simply indicate that we support hypercalls, and use some layer below to actually send them. It is the responsibility of this layer to detect if hypercalls are present and how to call them. Hey, I think the best place for it is in paravirt_ops. We can even patch the hypercall instruction inline, and the driver doesn't need to know about it. None of the PCI devices currently work like that in QEMU. It would be very hard to make a device that worked this way because since the order in which values are written matter a whole lot. For instance, if you wrote the status register before the queue information, the driver could get into a funky state. I assume you're talking about restore? Isn't that atomic? If you're doing restore by passing the PCI config blob to a registered routine, then sure, but that doesn't seem much better to me than just having the device generate that blob in the first place (which is what we have today). I was assuming that you would want to use the existing PIO/MMIO handlers to do restore by rewriting the config as if the guest was. Sure some complexity is unavoidable. But flat is simpler than indirect. Not much of an argument, I know. wrt. number of queues, 8 queues will consume 32 bytes of pci space if all you store is the ring pfn. You also at least need a num argument which takes you to 48 or 64 depending on whether you care about strange formatting. 8 queues may not be enough either. Eric and I have discussed whether the 9p virtio device should support multiple mounts per-virtio device and if so, whether each one should have it's own queue. Any devices that supports this sort of multiplexing will very quickly start using a lot of queues. 
Make it appear as a pci function? (though my feeling is that multiple mounts should be different devices; we can then hotplug mountpoints). We may run out of PCI slots though :-/ Then we can start selling virtio extension chassis. -- Any sufficiently difficult bug is indistinguishable from a feature. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
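The 32 / 48 / 64 byte figures quoted in the message above follow from per-queue record sizes. The layouts below are assumptions inferred only from that arithmetic (a 32-bit ring pfn and a 16-bit num per queue); the struct names are hypothetical and this is not the actual virtio PCI layout. The packed variant relies on the GCC/Clang `__attribute__((packed))` extension.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_QUEUES 8

/* Only the ring page frame number is stored per queue: 4 bytes. */
struct q_pfn_only {
	uint32_t pfn;
};

/* pfn plus a 16-bit queue size, packed with no padding: 6 bytes. */
struct q_packed {
	uint32_t pfn;
	uint16_t num;
} __attribute__((packed));

/* Same fields with natural alignment: padded to 8 bytes. */
struct q_padded {
	uint32_t pfn;
	uint16_t num;
};

static inline size_t pfn_only_bytes(void)
{
	return NUM_QUEUES * sizeof(struct q_pfn_only);
}

static inline size_t packed_bytes(void)
{
	return NUM_QUEUES * sizeof(struct q_packed);
}

static inline size_t padded_bytes(void)
{
	return NUM_QUEUES * sizeof(struct q_padded);
}
```

Under these assumptions 8 × 4 = 32, 8 × 6 = 48 and 8 × 8 = 64, which matches the "48 or 64 depending on whether you care about strange formatting" remark.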
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Morrison, Tom wrote: I am hopeful that the sata_mv has this bug (I proved that the problem I was experiencing was due to the sata_mv driver with 3.75Gig or more of memory)... I am on vacation for a week or more ...or I'd tell you today if it did have this bug! .. Yeah, I kind of had your reports in mind when I asked that. :) On a related note, I now have lots of Marvell (sata_mv) hardware here, and an Intel CPU/chipset box with physical RAM above the 4GB boundary. Cheers - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/9] Factor common inode cluster buffer lookup code
On Thu, Nov 22, 2007 at 11:36:42AM +1100, David Chinner wrote:
> +STATIC int
> +xfs_ino_to_imap(
> +	xfs_mount_t	*mp,
> +	xfs_trans_t	*tp,
> +	xfs_ino_t	ino,
> +	xfs_imap_t	*imap,
> +	uint		imap_flags)
> +{
> +	int		error;
> +
> +	error = xfs_imap(mp, tp, ino, imap, imap_flags);
> +	if (error) {
> +		cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap() returned an "
> +				"error %d on %s. Returning error.",
> +				error, mp->m_fsname);
> +		return error;
> +	}
> +
> +	/*
> +	 * If the inode number maps to a block outside the bounds
> +	 * of the file system then return NULL rather than calling
> +	 * read_buf and panicking when we get an error from the
> +	 * driver.
> +	 */
> +	if ((imap->im_blkno + imap->im_len) >
> +	    XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) {
> +		xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: "
> +			"(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > "
> +			" XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)",
> +			(unsigned long long) imap->im_blkno,
> +			(unsigned long long) imap->im_len,
> +			XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
> +		return XFS_ERROR(EINVAL);
> +	}

What about just adding this verification to xfs_imap instead of creating this wrapper for two of its three callers? Otherwise this patch looks fine to me.
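The bounds check in the quoted hunk is a plain range test: reject the mapping when its last block would fall past the end of the filesystem. A user-space sketch of the same idea follows. `check_extent_in_bounds()` is a hypothetical helper, not an XFS function, and it is written in a subtraction form so the sum cannot wrap, which is a variation on the `blkno + len > total` comparison used in the patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if the extent [blkno, blkno + len) fits inside a device of
 * total_blocks blocks, or EINVAL otherwise. Equivalent to testing
 * blkno + len > total_blocks, but safe against unsigned wraparound.
 */
static inline int check_extent_in_bounds(uint64_t blkno, uint64_t len,
					 uint64_t total_blocks)
{
	if (len > total_blocks || blkno > total_blocks - len)
		return EINVAL;
	return 0;
}
```

Placing such a check inside the lookup routine itself, as the reply suggests for xfs_imap, would give every caller the same validation without a wrapper.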
Re: [PATCH 3/9] Use _META bio I/O types for metadata I/O
On Thu, Nov 22, 2007 at 11:35:12AM +1100, David Chinner wrote:
> Improve metadata I/O merging in the elevator
>
> Change all async metadata buffers to use [READ|WRITE]_META I/O types
> so that the I/O doesn't get issued immediately. This allows merging
> of adjacent metadata requests but still prioritises them over bulk
> data. This shows a 10-15% improvement in sequential create speed of
> small files.
>
> Don't include the log buffers in this classification - leave them
> as sync types so they are issued immediately.

Looks good, and just including the trivial fs.h addition here might be okay as well.
RE: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
I am hopeful that the sata_mv has this bug (I proved that the problem I was experiencing was due to the sata_mv driver with 3.75Gig or more of memory)... I am on vacation for a week or more ...or I'd tell you today if it did have this bug! From: [EMAIL PROTECTED] on behalf of Mark Lord Sent: Fri 11/23/2007 10:22 AM To: Robert Hancock Cc: linux-kernel; ide; Jeff Garzik; Tejun Heo Subject: Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3) Robert Hancock wrote: > This fixes some problems with ATAPI devices on nForce4 controllers in ADMA > mode > on systems with memory located above 4GB. We need to delay setting the 64-bit > DMA mask until the PRD table and padding buffer are allocated so that they > don't > get allocated above 4GB and break legacy mode (which is needed for ATAPI > devices). ... Mmm.. I wonder how many other libata drivers have this exact same bug, whether noticed yet or not ? Cheers - To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Where is the new timerfd?
On Fri, 23 Nov 2007, Andrew Morton wrote: > > I suppose this means that timerfd will only go in for 2.6.25. I don't > > have a problem with that, but we better make sure that the existing > > timerfd in 2.6.24 is still disabled. (Andrew had a one liner for > > that, but I haven't checked if it's in place.) > > > > I have no timerfd patches here. Yes, it's disabled, and yes, I'll repost today ... - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Where is the new timerfd?
On Fri, 23 Nov 2007 13:39:55 +0100 "Michael Kerrisk" <[EMAIL PROTECTED]> wrote: > On 11/23/07, Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Thu, 22 Nov 2007 16:35:38 -0800 (PST) Davide Libenzi > > <[EMAIL PROTECTED]> wrote: > > > > > On Thu, 22 Nov 2007, Andrew Morton wrote: > [...] > > > > Last I recall, we removed the API for 2.6.23 because we intended to do a > > > > different interface for 2.6.24. > > > > > > > > But I don't recall seeing any timerfd patches in maybe a month. > > > > > > Was sent on Sep 23, Subject: new timerfd API > > > > Half of us weren't born then ;) > > > > > Do you want me to repost? > > > > yes please. > > I suppose this means that timerfd will only go in for 2.6.25. I don't > have a problem with that, but we better make sure that the existing > timerfd in 2.6.24 is still disabled. (Andrew had a one liner for > that, but I haven't checked if it's in place.) > I have no timerfd patches here. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23 WARNING: at kernel/softirq.c:139 local_bh_enable()
On Fri, Nov 23, 2007 at 01:55:19PM +0300, Evgeniy Polyakov wrote: > On Fri, Nov 23, 2007 at 12:21:57AM -0800, Andrew Morton ([EMAIL PROTECTED]) > wrote: > > > [2059664.615816] __iptables__: init4 IN=ppp0 OUT=ppp0 WARNING: at > > > kernel/softirq.c:139 local_bh_enable() > > > [2059664.620535] [<80120364>] local_bh_enable+0x3c/0x97 > > > > [2059664.620657] [<8011c205>] __call_console_drivers+0x61/0x6d > > > [2059664.620669] [<8011c3fc>] release_console_sem+0x164/0x1bf > > > [2059664.620679] [<8011c81f>] vprintk+0x27a/0x2ff > > > If that trace is to be beieved we're doing nefilter stuff on packets which > > were sent across netconsole. > > > > This probably isn't anything the netfilter guys have thought about. And > > probably we don't want them to. Is there some simple way in which we can > > exempt netconsole from netfilter processing? > > This is not about netfilter, but about freeing skb in interrupt context, > which is not allowed, and in interrupt skbs are queued to be freed in softirq, > but netcnsole wants to flush softirq freeing queue. That is a question: why? My memory here is hazy, but I think this exists to rescue netconsole in low-memory situations. This bit originated with Ingo, so maybe he can recall. Netpoll can process an arbitrary number of skbs inside a single interrupt. Think sysrq-t at one packet per line or kgdboe where the entire trace session can happen inside one very long interrupt. Perhaps we can refine this to mark netpoll's skbs (perhaps with ->destructor?) and delete only skbs we own. As these are never passed through any of the other route/xfrm/filter code, they should be safe to delete even in irq context, yes? > Removing zap_completion_queue() from find_skb() will fix the warning, > but I'm not sure this is a correct fix. I've added Matt to the Cc list. Care to try the sysrq-t or OOM message tests? -- Mathematics is the supreme nostalgia of our time. 
[PATCH 2.6.24-rc3-mm1] IPC: consolidate sem_exit_ns(), msg_exit_ns() and shm_exit_ns()
sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
ipc_namespace is released to free all ipcs of each type.
But in fact, they do the same thing: they loop over all ipcs to free them
individually by calling a specific routine.

This patch proposes to consolidate this by introducing a common function,
free_ipcs(), that does the job. The specific routine to call on each
individual ipc is passed as parameter. For this, these ipc-specific 'free'
routines are reworked to take a generic 'struct kern_ipc_perm' as parameter.

Signed-off-by: Pierre Peiffer <[EMAIL PROTECTED]>
---
 include/linux/ipc_namespace.h |    5 -
 ipc/msg.c                     |   28 +---
 ipc/namespace.c               |   30 ++
 ipc/sem.c                     |   27 +--
 ipc/shm.c                     |   27 ++-
 5 files changed, 50 insertions(+), 67 deletions(-)

Index: b/ipc/msg.c
===================================================================
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -72,7 +72,7 @@ struct msg_sender {
 #define msg_unlock(msq)		ipc_unlock(&(msq)->q_perm)
 #define msg_buildid(id, seq)	ipc_buildid(id, seq)

-static void freeque(struct ipc_namespace *, struct msg_queue *);
+static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
 static int newque(struct ipc_namespace *, struct ipc_params *);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_msg_proc_show(struct seq_file *s, void *it);
@@ -91,26 +91,7 @@ void msg_init_ns(struct ipc_namespace *n
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
-	struct msg_queue *msq;
-	struct kern_ipc_perm *perm;
-	int next_id;
-	int total, in_use;
-
-	down_write(&msg_ids(ns).rw_mutex);
-
-	in_use = msg_ids(ns).in_use;
-
-	for (total = 0, next_id = 0; total < in_use; next_id++) {
-		perm = idr_find(&msg_ids(ns).ipcs_idr, next_id);
-		if (perm == NULL)
-			continue;
-		ipc_lock_by_ptr(perm);
-		msq = container_of(perm, struct msg_queue, q_perm);
-		freeque(ns, msq);
-		total++;
-	}
-
-	up_write(&msg_ids(ns).rw_mutex);
+	free_ipcs(ns, &msg_ids(ns), freeque);
 }
 #endif

@@ -274,9 +255,10 @@ static void expunge_all(struct msg_queue
  * msg_ids.rw_mutex (writer) and the spinlock for this message queue are held
  * before freeque() is called. msg_ids.rw_mutex remains locked on exit.
  */
-static void freeque(struct ipc_namespace *ns, struct msg_queue *msq)
+static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 {
 	struct list_head *tmp;
+	struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);

 	expunge_all(msq, -EIDRM);
 	ss_wakeup(&msq->q_senders, 1);
@@ -582,7 +564,7 @@ asmlinkage long sys_msgctl(int msqid, in
 		break;
 	}
 	case IPC_RMID:
-		freeque(ns, msq);
+		freeque(ns, &msq->q_perm);
 		break;
 	}
 	err = 0;
Index: b/ipc/namespace.c
===================================================================
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -44,6 +44,36 @@ struct ipc_namespace *copy_ipcs(unsigned
 	return new_ns;
 }

+/*
+ * free_ipcs - free all ipcs of one type
+ * @ns:   the namespace to remove the ipcs from
+ * @ids:  the table of ipcs to free
+ * @free: the function called to free each individual ipc
+ *
+ * Called for each kind of ipc when an ipc_namespace exits.
+ */
+void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
+	       void (*free)(struct ipc_namespace *, struct kern_ipc_perm *))
+{
+	struct kern_ipc_perm *perm;
+	int next_id;
+	int total, in_use;
+
+	down_write(&ids->rw_mutex);
+
+	in_use = ids->in_use;
+
+	for (total = 0, next_id = 0; total < in_use; next_id++) {
+		perm = idr_find(&ids->ipcs_idr, next_id);
+		if (perm == NULL)
+			continue;
+		ipc_lock_by_ptr(perm);
+		free(ns, perm);
+		total++;
+	}
+	up_write(&ids->rw_mutex);
+}
+
 void free_ipc_ns(struct kref *kref)
 {
 	struct ipc_namespace *ns;
Index: b/ipc/sem.c
===================================================================
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -94,7 +94,7 @@
 #define sem_buildid(id, seq)	ipc_buildid(id, seq)

 static int newary(struct ipc_namespace *, struct ipc_params *);
-static void freeary(struct ipc_namespace *, struct sem_array *);
+static void freeary(struct ipc_namespace *, struct kern_ipc_perm *);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 #endif
@@ -129,25 +129,7 @@ void sem_init_ns(struct ipc_namespace *n
 #ifdef CONFIG_IPC_NS
 void sem_exit_ns(struct ipc_namespace *ns)
 {
-	struct sem_array *sma;
-	struct kern_ipc_perm *perm;
-	int next_id;
-
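As a rough userspace illustration of the consolidation this patch proposes, the sketch below walks a sparse id table once and hands each embedded permission object to a type-specific callback, which recovers its container with container_of(). The names `struct perm`, `struct ids` and `free_all()` are simplified, hypothetical stand-ins for `struct kern_ipc_perm`, `struct ipc_ids` and `free_ipcs()`; a fixed-size array replaces the idr, the namespace argument is dropped, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct kern_ipc_perm. */
struct perm {
	int id;
};

/* Simplified stand-ins for struct msg_queue and struct sem_array. */
struct msg_queue {
	struct perm q_perm;
	int q_qnum;
};

struct sem_array {
	int sem_nsems;
	struct perm sem_perm;	/* deliberately not the first member */
};

#define MAX_IDS 8

/* Simplified stand-in for struct ipc_ids: a sparse table instead of an idr. */
struct ids {
	struct perm *slot[MAX_IDS];
	int in_use;
};

static int freed_queues, freed_arrays;

/* Type-specific 'free' routines: both take the generic struct perm. */
static void freeque(struct perm *p)
{
	struct msg_queue *msq = container_of(p, struct msg_queue, q_perm);

	free(msq);
	freed_queues++;
}

static void freeary(struct perm *p)
{
	struct sem_array *sma = container_of(p, struct sem_array, sem_perm);

	free(sma);
	freed_arrays++;
}

/* Common walker: stop once in_use objects have been found and freed. */
static void free_all(struct ids *ids, void (*free_one)(struct perm *))
{
	int in_use = ids->in_use;
	int total, next_id;

	for (total = 0, next_id = 0;
	     total < in_use && next_id < MAX_IDS; next_id++) {
		struct perm *p = ids->slot[next_id];

		if (p == NULL)
			continue;	/* hole in the id space */
		ids->slot[next_id] = NULL;
		ids->in_use--;
		free_one(p);
		total++;
	}
}
```

With this shape each per-type exit routine shrinks to a single call such as `free_all(&msgs, freeque)`, which mirrors what the msg_exit_ns() hunk does with free_ipcs().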
Re: [kvm-devel] [PATCH 3/3] virtio PCI device
Avi Kivity wrote:
Anthony Liguori wrote:

Well please propose the virtio API first and then I'll adjust the PCI ABI. I don't want to build things into the ABI that we never actually end up using in virtio :-)

Move ->kick() to virtio_driver.

Then on each kick, all queues have to be checked for processing? What devices do you expect this would help?

I believe Xen networking uses the same event channel for both rx and tx, so in effect they're using this model. Long time since I looked though.

I would have to look, but since rx/tx are rather independent actions, I'm not sure that you would really save that much. You still end up doing the same number of kicks unless I'm missing something.

I was thinking more along the lines that a hypercall-based device would certainly be implemented in-kernel whereas the current device is naturally implemented in userspace. We can simply use a different device for in-kernel drivers than for userspace drivers.

Where the device is implemented is an implementation detail that should be hidden from the guest, isn't that one of the strengths of virtualization? Two examples: a file-based block device implemented in qemu gives you fancy file formats with encryption and compression, while the same device implemented in the kernel gives you a low-overhead path directly to a zillion-disk SAN volume. Or a user-level network device capable of running with the slirp stack and no permissions vs. the kernel device running copyless most of the time and using a dma engine for the rest but requiring you to be good friends with the admin. The user should expect zero reconfigurations moving a VM from one model to the other.

I'm wary of introducing the notion of hypercalls to this device because it makes the device VMM specific. Maybe we could have the device provide an option ROM that was treated as the device "BIOS" that we could use for kicking and interrupt acking? Any idea of how that would map to Windows? Are there real PCI devices that use the option ROM space to provide what's essentially firmware? Unfortunately, I don't think an option ROM BIOS would map well to other architectures.

None of the PCI devices currently work like that in QEMU. It would be very hard to make a device that worked this way because the order in which values are written matters a whole lot. For instance, if you wrote the status register before the queue information, the driver could get into a funky state.

I assume you're talking about restore? Isn't that atomic?

If you're doing restore by passing the PCI config blob to a registered routine, then sure, but that doesn't seem much better to me than just having the device generate that blob in the first place (which is what we have today). I was assuming that you would want to use the existing PIO/MMIO handlers to do restore by rewriting the config as if the guest was.

Not much of an argument, I know.

wrt. number of queues, 8 queues will consume 32 bytes of pci space if all you store is the ring pfn.

You also at least need a num argument which takes you to 48 or 64 depending on whether you care about strange formatting. 8 queues may not be enough either. Eric and I have discussed whether the 9p virtio device should support multiple mounts per-virtio device and if so, whether each one should have its own queue. Any device that supports this sort of multiplexing will very quickly start using a lot of queues.

Make it appear as a pci function? (though my feeling is that multiple mounts should be different devices; we can then hotplug mountpoints).

We may run out of PCI slots though :-/

I think most types of hardware have some notion of a selector or mode. Take a look at the LSI adapter or even VGA.

True. They aren't fun to use, though.

I don't think they're really any worse :-)

Regards,

Anthony Liguori
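The queue-space arithmetic and the selector idea discussed above can be made concrete with a small C sketch. It assumes a 32-bit ring pfn (consistent with 8 queues taking 32 bytes) and, as an assumption not stated in the thread, a 16-bit num field, which is what yields 48 bytes packed or 64 bytes when each entry is padded to 8. The register names in `struct selector_regs` and the `struct device_model` helper are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 8

/* Flat layout, variant 1: one 32-bit ring pfn per queue. */
static size_t flat_pfn_only(unsigned int nqueues)
{
	return nqueues * sizeof(uint32_t);
}

/* Flat layout, variant 2: pfn plus an assumed 16-bit num, tightly packed. */
static size_t flat_pfn_num_packed(unsigned int nqueues)
{
	return nqueues * (sizeof(uint32_t) + sizeof(uint16_t));
}

/* Flat layout, variant 3: pfn plus num, each entry padded to 8 bytes. */
static size_t flat_pfn_num_aligned(unsigned int nqueues)
{
	return nqueues * 2 * sizeof(uint32_t);
}

/*
 * Selector layout (hypothetical register names): a fixed-size window
 * whose contents depend on the queue index last written to queue_sel.
 */
struct selector_regs {
	uint16_t queue_sel;	/* written by the driver to pick a queue */
	uint16_t queue_num;	/* ring size of the selected queue */
	uint32_t queue_pfn;	/* ring pfn of the selected queue */
};

/* Toy device model: per-queue state lives behind the register window. */
struct device_model {
	struct selector_regs regs;
	uint16_t num[NUM_QUEUES];
	uint32_t pfn[NUM_QUEUES];
};

/* Driver writes the selector; the device exposes that queue in the window. */
static void select_queue(struct device_model *dev, uint16_t idx)
{
	dev->regs.queue_sel = idx;
	dev->regs.queue_num = dev->num[idx];
	dev->regs.queue_pfn = dev->pfn[idx];
}
```

The trade-off matches the exchange in the thread: the flat layouts grow linearly with the number of queues, while the selector window stays the same size however many queues exist, at the cost of an extra write per access and of order-dependent state.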
Re: [patch/backport] CFS scheduler, -v24, for v2.6.24-rc3, v2.6.23.8, v2.6.22.13, v2.6.21.7
Ingo,

I downloaded the updated cfs-v24 patch and applied to 2.6.22.13. Compiled and ran fine. Suspend and hibernate are working on my nc6000 laptop now. Now I'm off to compile and run 2.6.22.14.

--
Thanks,
Durand

--- Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Durand <[EMAIL PROTECTED]> wrote:
>
> > Ingo,
> >
> > Just applied this patch to 2.6.22.13 and 2.6.22.14. Compiles and runs
> > fine but on my laptop, it prevents suspending and hibernating with
> > "one tasks refuses to freeze" load_balance_mo.
>
> please re-download the v24 patch, it should have this bug fixed.
>
> Ingo
>