Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API
On Sat, 2007-03-10 at 14:13 +0100, Haavard Skinnemoen wrote:
> This is a very simple bitbanging i2c bus driver utilizing the new
> arch-neutral GPIO API. Useful for chips that don't have a built-in
> i2c controller, additional i2c busses, or testing purposes.

Sorry for missing this hot discussion. Your idea is exactly what I want. Many arch-specific GPIO-based I2C adapter implementations will benefit from this.

> To use, include something similar to the following in the
> board-specific setup code:
>
> #include <linux/i2c-gpio.h>
>
> static struct i2c_gpio_platform_data i2c_gpio_data = {
> 	.sda_pin	= GPIO_PIN_FOO,
> 	.scl_pin	= GPIO_PIN_BAR,
> };

Is this usage right? Three flags are added to this structure, as below:

struct i2c_gpio_platform_data {
	unsigned int	sda_pin;
	unsigned int	scl_pin;
	unsigned int	sda_is_open_drain:1;
	unsigned int	scl_is_open_drain:1;
	unsigned int	scl_is_output_only:1;
};

> static struct platform_device i2c_gpio_device = {
> 	.name		= "i2c-gpio",
> 	.id		= 0,
> 	.dev		= {
> 		.platform_data	= &i2c_gpio_data,
> 	},
> };
>
> Register this platform_device, set up the i2c pins as GPIO if
> required and you're ready to go.
>
> Signed-off-by: Haavard Skinnemoen [EMAIL PROTECTED]
> ---
> This patch is different from the first patch in the following ways:
>   * Handles pins set up as open drain (aka multidrive) by toggling
>     the output value instead of the direction
>   * Handles output-only SCL pins the same way, and also does not
>     install a getscl() callback for such pins
>   * Does not add anything to include/linux/i2c-ids.h
>   * Sets the output value explicitly after changing the direction
>     to output.
>   * Plugs a memory leak in remove() -- algo_data wasn't freed.
>   * Prints out the pin IDs in decimal, with an extra note when
>     clock stretching isn't supported
>
> This version has been compile-tested only. I'll give it a spin when
> I get back to work on monday.
>
> Dave, does this address your concerns?
>
> Haavard

Thanks a lot, I will drop our GPIO-based I2C driver and try this one on our platform.

> +	if (!pdata->scl_is_output_only)
> +		bit_data->getscl = i2c_gpio_getscl,
> +
> +	bit_data->getsda	= i2c_gpio_getsda,
> +	bit_data->udelay	= 5,		/* 100 kHz */
> +	bit_data->timeout	= HZ / 10,	/* 100 ms */

Can we add udelay and timeout to struct i2c_gpio_platform_data, and let users choose them according to their specific requirements? We use Kconfig to do this, but Jean and David don't like the idea :-(

Regards,
-Bryan Wu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
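For illustration, the suggestion of moving the timing values into the platform data could look roughly like the userspace sketch below. It is not part of the posted patch: the `udelay` and `timeout` members, the `bit_timing` stand-in, the helper `i2c_gpio_fill_timing()` and the `HZ` value are all assumptions made for the example. Only the defaults (5 us and HZ / 10) are taken from the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 100	/* assumed tick rate, for illustration only */

/*
 * Mirrors the platform data quoted above. The last two members are the
 * hypothetical additions; 0 means "use the driver default".
 */
struct i2c_gpio_platform_data {
	unsigned int	sda_pin;
	unsigned int	scl_pin;
	unsigned int	sda_is_open_drain:1;
	unsigned int	scl_is_open_drain:1;
	unsigned int	scl_is_output_only:1;
	int		udelay;		/* half clock period, in us */
	int		timeout;	/* clock stretching timeout, in jiffies */
};

/* Stand-in for the two timing members of the bit-banging algo data. */
struct bit_timing {
	int	udelay;
	int	timeout;
};

/* Hypothetical helper: take board-supplied values, else fall back. */
static void i2c_gpio_fill_timing(const struct i2c_gpio_platform_data *pdata,
				 struct bit_timing *t)
{
	assert(pdata != NULL && t != NULL);

	t->udelay = pdata->udelay > 0 ? pdata->udelay : 5;	/* 100 kHz */
	t->timeout = pdata->timeout > 0 ? pdata->timeout : HZ / 10;	/* 100 ms */
}
```

A board that leaves both members at zero keeps the values hard-coded in the patch, so existing users would not need to change.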
Re: i.MX/MX1 SDHC fix/workaround of SD card recognition problems
On Monday 12 March 2007 00:36, you wrote:
> Pavel Pisa wrote:
>> The SDHC controllers cannot process shorter transfers.
>> They have to be handled as longer ones, but in such a case a CRC error
>> is evaluated. There was still a case in the code where this
>> error was not ignored, as it has to be to process these transfers.
>>
>> Signed-off-by: Pavel Pisa [EMAIL PROTECTED]
>
> Thanks, applied. Is this something critical that should be in 2.6.21?
>
> Rgds

Hello Pierre,

this should go to 2.6.21. I have held this for some months and I have discussed it in the thread

  Re: CRC Errors with SD cards in 4bits mode (on i.MXl)

You have been CCed. This is not a solution for the observed data CRC problem, but it solves problems with recognition of cards, which has sometimes been timing sensitive.

I have sent it into Russell's patch queue with my other MX1 fixes that I intended to be included in 2.6.21. That was probably a mistake for this one, because it should go through your tree. If you send it to mainline yourself, I will discard the patch from the patch daemon.

We have spoken about MX1 SDHC maintainership. I am attaching my subscription. I am not sure about the mailing list field there. Do you suggest this one, ALKML or another?

Best wishes

Pavel Pisa

--

Subject: i.MX/MX1 SDHC maintainer

I am taking responsibility for i.MX MMC driver bugs and for coordinating the fight against the problems of this hardware beast.

Signed-off-by: Pavel Pisa [EMAIL PROTECTED]

 MAINTAINERS |    7 +++
 1 file changed, 7 insertions(+)

Index: linux-2.6.21-rc1/MAINTAINERS
===================================================================
--- linux-2.6.21-rc1.orig/MAINTAINERS
+++ linux-2.6.21-rc1/MAINTAINERS
@@ -1713,6 +1713,13 @@ M:	[EMAIL PROTECTED]
 L:	[EMAIL PROTECTED] (subscribers-only)
 S:	Maintained
 
+IMX MMC/SD HOST CONTROLLER INTERFACE DRIVER
+P:	Pavel Pisa
+M:	[EMAIL PROTECTED]
+L:	[EMAIL PROTECTED]
+W:	http://mmc.drzeus.cx/wiki/Controllers/Freescale/SDHC
+S:	Maintained
+
 INFINIBAND SUBSYSTEM
 P:	Roland Dreier
 M:	[EMAIL PROTECTED]
Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd
> This is a bug actually in the megaraid.

Aha, I'll track it.

> And this is a direct command submission path: it already passed both
> online check gates in this path *after* the device was offlined, so
> adding a third won't fix this.

Yeah, I have noticed that. However, according to the logs the device was already offline, so why could a command still be sent to it? Isn't the sequence of printks suspicious?

> single disk, so the I/O was definitely bound for sda?
>
> Secondly, can you reproduce with a modern (2.6.20) kernel. Your trace
> strongly suggests that the device came back online for some reason and
> then the megaraid driver died.

It's hard to update the kernel because the system is a production system, and we cannot debug on the box :(

I don't know if you have noticed, but the logs come from diskdump. Could this be caused by diskdump?

Thanks,
Joe
Re: [patch 2/3] fs: introduce perform_write aop
On Sat, Mar 10, 2007 at 09:25:41AM +, Christoph Hellwig wrote:
> On Fri, Mar 09, 2007 at 03:33:01PM -0800, Mark Fasheh wrote:
>> ->kernel_write() as opposed to genericizing ->perform_write() would be fine
>> with me. Just so long as we get rid of ->prepare_write and ->commit_write in
>> that other kernel code doesn't call them directly. That interface just
>> doesn't work for Ocfs2.
>
> It doesn't work for any filesystem that needs slightly fancy locking.
> That and the reason that's an interface that doesn't fit into our
> layering is why I want to get rid of it.
>
> Note that fops->kernel_write might in fact use ->perform_write with
> an actor as Nick suggested. I'm not quite sure how it'll look like -
> I'd rather take care of the buffered write path first and then handle
> this issue once the first changes have stabilized.
>
>> Right now I've got Ocfs2 implementing its own lowest-level buffered write
>> code - think generic_file_buffered_write() replacement for Ocfs2. With some
>> duplicated code above that layer. What's nice is that I can abstract away
>> the "copy data into some target pages" bits such that the majority of that
>> code is re-usable for ocfs2's splice write operation. I'm not sure we could
>> have that low a level of abstraction for anything above the individual file
>> system though, which also has to deal with non-kernel writes.
>> That's where a ->kernel_write() might come in handy.
>
> Why do you need your own low level buffered write functionality?
> As in past times when filesystems want to come up I'd like to have
> a very good explanation on why you think it's needed and whether
> we shouldn't improve the generic buffered write code instead.

Fair enough - I personally tried everything I could before coming to the conclusion that for the time being, Ocfs2 should have a separate write path.

As you know, I've been adding sparse file support for Ocfs2. Putting aside all the reasons to have real support for sparse files (as opposed to zeroing allocated regions), the tree code changes alone have gotten us 90% of the way to supporting unwritten extents (much like xfs does).

Ocfs2 supports atomic data allocation units ('clusters', to use an overloaded term) which can range in size from 4k to 1 meg. This means that for allocating writes on page size < cluster size file systems, we have to zero pages adjacent to the one being written so that a re-read doesn't return dirty data. This alone requires page locking which we can't realistically achieve via ->prepare_write() and ->commit_write(). I believe NTFS has a similar restriction, which has led to their own file write.

So, page locking was definitely the straw that broke the camel's back. Some other things which were awkward or slightly less critically broken than the page locking:

Since ocfs2 has a rather large (compared to a local file system) context to build up during an allocating write, it became uncomfortable to pass that around ->prepare_write() and ->commit_write() without putting that context on our struct inode and protecting it with a lock. And since the existing interfaces were so rigid, it actually required a lot more context to be passed around than in my current code.

There's also the cluster lock / page lock inversion which we have to deal with (it gets even worse if we fault in pages in the middle of the user copy for a write). Granted, we fixed a lot of that before merging, but allocating in write means taking even more cluster locks and I don't really feel comfortable nesting so many of those within the page locks.

Finally, we get to the optimization problem - writing stuff one page at a time. To be fair, my current stuff doesn't do a very good job of optimizing the amount of data written in a given pass, but the groundwork is there to easily write at least one cluster's worth of user data at a time. My priority has been mostly to stabilize it as opposed to performance tuning.

So, quite possibly, I overstated what Ocfs2 was doing earlier - we still make use of as much generic code as we can. The O_DIRECT path for instance wasn't touched. Ocfs2 still makes use of block_commit_write(), the standard jbd mechanisms for ordered data mode, and though we got rid of block_prepare_write() (for zeroing reasons), what we do is a much simpler version.

By the way, the code in question can be found in the sparse_files branch of ocfs2.git:

http://git.kernel.org/?p=linux/kernel/git/mfasheh/ocfs2.git;a=log;h=sparse_files

Your review has been extremely useful in the past, so I welcome any comments you might have. Though it's getting close to being put in ALL (for a spin in -mm), it's definitely a work-in-progress branch. There are 3 patches to generic code which I need to push out for review (it's pretty much just exporting symbols which we'd need in any case). Also, some of the bug fixes and feature adjustments need to get folded back into their respective patches.

> This codepath is so nasty that any duplication will be a maintenance horror.

All that
Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd
> The 2.6.9 base is very old in mainline terms. Are you sure the bug hasn't
> been fixed in mainline by other means?

I cannot confirm whether it has been fixed in the latest kernel. The server is a production system, so it's hard to debug it and try to reproduce the problem.
Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd
On Mon, 12 Mar 2007 10:52:22 +0800 Joe Jin [EMAIL PROTECTED] wrote:
>> The 2.6.9 base is very old in mainline terms. Are you sure the bug hasn't
>> been fixed in mainline by other means?
>
> I cannot confirm whether it has been fixed in the latest kernel. The server
> is a production system, so it's hard to debug it and try to reproduce the
> problem.

Well. That makes it hard to run tests, but perhaps it can be determined from code review..
RE: [git patches] libata fixes
On Sun, 11 Mar 2007, Paul Rolland wrote:
> My machine is having two problems: the one you are describing above,
> which is due to a SIL controller being connected to one port of the ICH7
> (at least, it seems to), and probing it goes timeout, but nothing is
> connected on it.

Ok, so that's just a message irritation, not actually bothersome otherwise?

> The second problem is a Jmicron363 controller that is failing to detect
> the DVD-RW that is connected, unless I use the irqpoll option as Tejun has
> suggested.

.. and this one has never worked without irqpoll?

> But, as you suggest it, I'm adding pci=nomsi to the command line and
> rebooting... no change for this part of the problem.
>
> OK, the /proc/interrupt for this config, and the dmesg attached.
>
> 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts
>            CPU0       CPU1
>   0:     297549          0   IO-APIC-edge      timer
>   1:          7          0   IO-APIC-edge      i8042
>   4:         13          0   IO-APIC-edge      serial
>   6:          5          0   IO-APIC-edge      floppy
>   8:          1          0   IO-APIC-edge      rtc
>   9:          0          0   IO-APIC-fasteoi   acpi
>  12:        126          0   IO-APIC-edge      i8042
>  14:       8313          0   IO-APIC-edge      libata
>  15:          0          0   IO-APIC-edge      libata
>  16:          0          0   IO-APIC-fasteoi   eth1, libata

So it's the irq16 one that is the Jmicron controller and just isn't getting any interrupts?

Since all the other interrupts work (and MSI worked for other controllers), I don't think it's interrupt-routing related. Especially as MSI shouldn't even care about things like that.

And since it all works when irqpoll is used, that implies that the *only* thing that is broken is literally irq delivery.

Is there possibly some jmicron-specific "enable interrupts" bit?

> PS : I'd like to try 2.6.21-rc3, but it seems that this is breaking my
> config: disk naming is no more the same, and I end up with a panic
> "Warning: unable to open an initial console" though I've been compiling
> with the same .config I was using for 2.6.21-rc2

Gaah. Can you get a log through serial console or netconsole to see what changed?

		Linus
Re: SATA resume slowness, e1000 MSI warning
> Quoting Eric W. Biederman [EMAIL PROTECTED]:
> Subject: Re: SATA resume slowness, e1000 MSI warning
>
> Michael S. Tsirkin [EMAIL PROTECTED] writes:
>
>> OK I guess. I gather we assume writing read-only registers has no side
>> effects? Are there rumors circulating wrt to these?
>
> I haven't heard anything about that, and if we are writing the same value
> back it should be pretty safe.
>
> I have heard it asserted that at least one version of the pci spec
> only required 32bit accesses to be supported by the hardware. One
> of these days I will have to look that up and see if it is true.

Maybe. But surely before the PCI-X days.

> I do know it can be weird for hardware developers to support multiple
> kinds of decode.

Is this the only place where Linux uses pci_read_config_word/pci_read_config_dword? I think such hardware will be pretty much DOA on all OS-es.

Why don't we wait and see whether someone reports a broken config?

> As I recall for pci and pci-x at the hardware level the only difference
> between 32bit transactions and smaller ones is the state of the
> byte-enable lines.

True, and the same holds for PCI-Express. So let's assume hardware implements RO correctly but ignores the BE bits - nothing bad happens then, right?

--
MST
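The byte-enable argument can be made concrete with a small userspace model. This is only a sketch under stated assumptions: `cfg_reg`, `cfg_write()` and the `honors_be` flag are invented for the illustration and do not correspond to any kernel or hardware interface, and byte enables are modelled as active-high bits for readability.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a 4-bit byte-enable value (bit n = byte lane n) to a lane mask. */
static uint32_t be_to_mask(unsigned int be)
{
	uint32_t mask = 0;

	for (int lane = 0; lane < 4; lane++)
		if (be & (1u << lane))
			mask |= (uint32_t)0xff << (8 * lane);
	return mask;
}

/* Byte enables for a 16-bit access at an even config space offset. */
static unsigned int be_for_word(unsigned int off)
{
	assert((off & 1) == 0);
	return 0x3u << (off & 2);
}

/* Hypothetical model of one 32-bit config register. */
struct cfg_reg {
	uint32_t value;
	uint32_t ro_mask;	/* bits set here are read-only */
};

/*
 * Apply a write transaction. A device that ignores byte enables
 * (honors_be == 0) treats every write as a full 32-bit write, but
 * read-only bits are protected either way.
 */
static void cfg_write(struct cfg_reg *r, uint32_t data, unsigned int be,
		      int honors_be)
{
	uint32_t lanes = honors_be ? be_to_mask(be) : 0xffffffffu;
	uint32_t writable = lanes & ~r->ro_mask;

	r->value = (r->value & ~writable) | (data & writable);
}
```

In this model a 16-bit write next to read-only bits gives the same result whether or not the device looks at the byte enables. A difference only shows up when the neighbouring lanes are writable and the data lanes do not carry the old value, which matches the "writing the same value back" caveat in the thread.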
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
Con Kolivas wrote:
> On Monday 12 March 2007 08:52, Con Kolivas wrote:
>> And thank you! I think I know what's going on now. I think each
>> rotation is followed by another rotation before the higher priority
>> task is getting a look in in schedule() to even get quota and add it
>> to the runqueue quota. I'll try a simple change to see if that helps.
>> Patch coming up shortly.
>
> Can you try the following patch and see if it helps. There's also one
> minor preemption logic fix in there that I'm planning on including.
> Thanks!

Applied on top of v0.28 mainline, and there is no difference.

What's it look like on your machine?

Thanks!

--
Al
Re: [PATCH 1/2] xfs: use xfs_get_buf_noaddr for iclogs
On Wed, Mar 07, 2007 at 11:13:14AM +0100, Christoph Hellwig wrote:
> xfs_buf_get_noaddr. There's a subtle change because
> xfs_buf_get_empty returns the buffer locked, but xfs_buf_get_noaddr
> returns it unlocked. From my auditing and testing nothing in the
> log I/O code cares about this distinction, but I'd be happy if someone
> could try to prove this independently.

Looks safe to me - we initialise all the fields in the xfs_buf_t when we allocate out of the slab, so it doesn't really matter what state the buffer is in when we free it.

OTOH, all other buffers are supposed to be locked when under I/O. This change makes a special case for the log buffers, and I'd prefer not to have to remember that this behaviour changed for log buffers at some point in time. I suggest that adding:

-		iclog->hic_data = (xlog_in_core_2_t *)
-			  kmem_zalloc(iclogsize, KM_SLEEP | KM_LARGE);
-
 		iclog->ic_prev = prev_iclog;
 		prev_iclog = iclog;
+
+		bp = xfs_buf_get_noaddr(log->l_iclog_size, mp->m_logdev_targp);
+		XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
+		XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
+		XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);

+		XFS_BUF_PSEMA(bp, PRIBIO);

+		iclog->ic_bp = bp;
+		iclog->hic_data = bp->b_addr;
+
 		log->l_iclog_bak[i] = (xfs_caddr_t)&(iclog->ic_header);
 
 		head = &iclog->ic_header;

to lock the buffer should be added here. That way we don't change any semantics of the code at all.

@@ -1216,11 +1221,6 @@
 		INT_SET(head->h_fmt, ARCH_CONVERT, XLOG_FMT);
 		memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t));
 
-		bp = xfs_buf_get_empty(log->l_iclog_size, mp->m_logdev_targp);
-		XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
-		XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
-		XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
-		iclog->ic_bp = bp;
 		iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
 		iclog->ic_state = XLOG_STATE_ACTIVE;
@@ -1229,7 +1229,6 @@
 		iclog->ic_datap = (char *)iclog->hic_data + log->l_iclog_hsize;
 
 		ASSERT(XFS_BUF_ISBUSY(iclog->ic_bp));
-		ASSERT(XFS_BUF_VALUSEMA(iclog->ic_bp) <= 0);

And this assert can then stay...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: Null pointer in autofs4 (_spin_lock) in 2.6.21-rc2
On Sun, 11 Mar 2007, Thomas Renninger wrote:
> On Thu, 2007-03-08 at 19:39 +0900, Ian Kent wrote:
>> On Thu, 2007-03-08 at 11:12 +0100, Thomas Renninger wrote:
>>> On Thu, 2007-03-08 at 01:28 -0800, Andrew Morton wrote:
>>>> On Thu, 08 Mar 2007 09:57:56 +0100 Thomas Renninger [EMAIL PROTECTED] wrote:
>>>>> I saw this happening several times on 2.6.21-rc2.
>>>>> Tell me how I can help...
>>>>>
>>>>> Some nfs partitions are mounted via nfs using autofs.
>>>>> It takes some hours to run into this:
>>>>>
>>>>> Unable to handle kernel NULL pointer dereference at 0008 RIP:
>>>>> [8025bada] _spin_lock+0x0/0xf
>>>>> PGD 1dde23067 PUD 1d3060067 PMD 0
>>>>> Oops: 0002 [1] SMP
>>>>> CPU 3
>>>>> Modules linked in: autofs4 nfs lockd nfs_acl sunrpc asus_acpi af_packet tg3 ipv6 button battery ac ext2 mbcache loop dm_mod floppy parport_pc lp parport reiserfs pata_amd edd fan thermal sg processor sata_sil libata amd74xx sd_mod scsi_mod ide_disk ide_core
>>>>> Pid: 11373, comm: touch Not tainted 2.6.21-rc2-default #6
>>>>> RIP: 0010:[8025bada] [8025bada] _spin_lock+0x0/0xf
>>>>> RSP: 0018:8101c50a5a50 EFLAGS: 00010202
>>>>> RAX: 8100eb8916f8 RBX: 81010007dcd8 RCX: 8100ea45b280
>>>>> RDX: 10e58c2e RSI: 810163bf9e50 RDI: 0008
>>>>> RBP: 810163bf9e50 R08: 8101c50a4000 R09: 8101c50a5ea8
>>>>> R10: 81010003fca8 R11: 802299ad R12:
>>>>> R13: 8100eb891680 R14: 0005 R15: 8101c50a5b48
>>>>> FS: 2b8ae744bf20() GS:81010016a7c0() knlGS:b7bd88d0
>>>>> CS: 0010 DS: ES: CR0: 8005003b
>>>>> CR2: 0008 CR3: 0001b925f000 CR4: 06e0
>>>>> Process touch (pid: 11373, threadinfo 8101c50a4000, task 8101b78bd100)
>>>>> Stack: 882d5f38 8101c50a5ea8 8100ec8df4b0 00d0
>>>>> 8100eb8916f8 810163bf9efc 10e58c2eea45b220 8100ea45b220
>>>>> 810163bf9e50 8100ea45b220 8100ec8df4b0 8100ec8df568
>>>>> Call Trace:
>>>>> [882d5f38] :autofs4:autofs4_lookup+0xcb/0x311
>>>>> [8020c0d8] do_lookup+0xc4/0x1ae
>>>>> [802097be] __link_path_walk+0x8ec/0xd9d
>>>>> [8824ca24] :sunrpc:rpcauth_lookup_credcache+0x12e/0x24a
>>>>> [8020da3e] link_path_walk+0x58/0xe0
>>>>> [80232d3f] __strncpy_from_user+0x17/0x41
>>>>> [8020949b] __link_path_walk+0x5c9/0xd9d
>>>>> [8020da3e] link_path_walk+0x58/0xe0
>>>>> [80232d3f] __strncpy_from_user+0x17/0x41
>>>>> [8020bea7] do_path_lookup+0x1b6/0x217
>>>>> [80221512] __path_lookup_intent_open+0x56/0x97
>>>>> [80218912] open_namei+0xa9/0x64c
>>>>> [8025dc33] do_page_fault+0x45e/0x7ad
>>>>> [802250eb] do_filp_open+0x1c/0x38
>>>>> [80232d3f] __strncpy_from_user+0x17/0x41
>>>>> [80217698] do_sys_open+0x44/0xc1
>>>>> [8025511e] system_call+0x7e/0x83
>>>>>
>>>>> Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 00
>>>>> RIP [8025bada] _spin_lock+0x0/0xf
>>>>> RSP 8101c50a5a50
>>>>> CR2: 0008
>>>>
>>>> I assume 2.6.20 is OK?
>>>
>>> Can't say for sure, I expect yes.
>>> Set up with 2.6.20 now and let it run for a day or two.
>>> Maybe someone has worked in that area and has an idea meanwhile...
>>
>> Do we have any idea on what was being opened here?
>> Might be useful to see the autofs maps if possible.
>
> I sent that stuff to Ian...
>
> However, I couldn't run into that with 2.6.20 and also not with
> *2.6.21-rc3* (yet).
> Maybe it already got fixed?
> Machine still running, I'll report back if this should happen again.

I suspect the problem is still present but maybe a bit hard to trigger.

I'm not convinced this is needed, but it is the only thing that looks at all suspicious, so if (when) you see this again could you give the patch below a try please.

Ian

---
--- linux-2.6.21-rc3/fs/autofs4/root.c.sbi-check	2007-03-12 13:29:42.0 +0900
+++ linux-2.6.21-rc3/fs/autofs4/root.c	2007-03-12 13:30:04.0 +0900
@@ -503,6 +503,9 @@ static struct dentry *autofs4_lookup_unh
 	const unsigned char *str = name->name;
 	struct list_head *p, *head;
 
+	if (!sbi)
+		return NULL;
+
 	spin_lock(&dcache_lock);
 	spin_lock(&sbi->rehash_lock);
 	head = &sbi->rehash_list;
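As a rough illustration of what the proposed check guards against, here is a userspace sketch. Everything in it is a stand-in: `sb_info_mock`, its layout and `lookup_unhashed()` are hypothetical and do not reproduce the real autofs4 structures. The point is only that taking a lock through a NULL base pointer dereferences a small address equal to the member offset, which may be consistent with the small fault address in the oops, and that an early return avoids it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct list_node {
	struct list_node *next;
	const char *name;
};

/* Hypothetical stand-in for the superblock info; not the real layout. */
struct sb_info_mock {
	void *other_state;		/* something ahead of the lock */
	int rehash_lock;		/* 1 = held, stands in for a spinlock */
	struct list_node *rehash_list;
};

/* Look up a name on the rehash list, refusing to touch a NULL sbi. */
static const struct list_node *lookup_unhashed(struct sb_info_mock *sbi,
						const char *name)
{
	const struct list_node *found = NULL;

	assert(name != NULL);
	if (!sbi)
		return NULL;	/* the guard added by the patch above */

	sbi->rehash_lock = 1;	/* would fault at offsetof(...) if sbi were NULL */
	for (const struct list_node *p = sbi->rehash_list; p; p = p->next) {
		if (strcmp(p->name, name) == 0) {
			found = p;
			break;
		}
	}
	sbi->rehash_lock = 0;
	return found;
}
```

Whether sbi can actually be NULL on this path is exactly what the thread leaves open, so the sketch shows the mechanism rather than a diagnosis.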
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
On Sun, Mar 11, 2007 at 06:49:08PM +0100, Rafael J. Wysocki wrote:
> On Saturday, 3 March 2007 18:32, Oleg Nesterov wrote:
>> On 03/02, Paul E. McKenney wrote:
>>> On Sat, Mar 03, 2007 at 02:33:37AM +0300, Oleg Nesterov wrote:
>>>> On 03/02, Paul E. McKenney wrote:
>>>>> One way to embed try_to_freeze() into kthread_should_stop() might be
>>>>> as follows:
>>>>>
>>>>> int kthread_should_stop(void)
>>>>> {
>>>>> 	if (kthread_stop_info.k == current)
>>>>> 		return 1;
>>>>> 	try_to_freeze();
>>>>> 	return 0;
>>>>> }
>>>>
>>>> I think this is dangerous. For example, worker_thread() will probably
>>>> need some special actions after return from refrigerator. Also, a kernel
>>>> thread may check kthread_should_stop() in the place where try_to_freeze()
>>>> is not safe.
>>>>
>>>> Perhaps we should introduce a new helper which does this.
>>>
>>> Good point -- the return value from try_to_freeze() is lost if one uses
>>> the above approach. About one third of the calls to try_to_freeze()
>>> in 2.6.20 pay attention to the return value.
>>>
>>> One approach would be to have a kthread_should_stop_nofreeze() for those
>>> cases, and let the default be to try to freeze.
>>
>> I personally think we should do the opposite, add kthread_should_stop_check_freeze()
>> or something. kthread_should_stop() is like signal_pending(), we can use
>> it under spin_lock (and it is probably used this way by some out-of-tree
>> driver). The new helper is obviously might_sleep().
>
> Something like this, perhaps:

Looks good to me! The other kthread_should_stop() calls in rcutorture.c should also become kthread_should_stop_check_freeze().

Acked-by: Paul E. McKenney [EMAIL PROTECTED]

>  include/linux/kthread.h |    1 +
>  kernel/kthread.c        |   16 ++++++++++++++++
>  kernel/rcutorture.c     |    5 ++---
>  3 files changed, 19 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.21-rc3-mm2/kernel/kthread.c
> ===================================================================
> --- linux-2.6.21-rc3-mm2.orig/kernel/kthread.c	2007-03-08 21:58:48.0 +0100
> +++ linux-2.6.21-rc3-mm2/kernel/kthread.c	2007-03-11 18:32:59.0 +0100
> @@ -13,6 +13,7 @@
>  #include <linux/file.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> +#include <linux/freezer.h>
>  #include <asm/semaphore.h>
>  
>  /*
> @@ -60,6 +61,21 @@ int kthread_should_stop(void)
>  }
>  EXPORT_SYMBOL(kthread_should_stop);
>  
> +/**
> + * kthread_should_stop_check_freeze - check if the thread should return now and
> + * if not, check if there is a freezing request pending for it.
> + */
> +int kthread_should_stop_check_freeze(void)
> +{
> +	might_sleep();
> +	if (kthread_stop_info.k == current)
> +		return 1;
> +
> +	try_to_freeze();
> +	return 0;
> +}
> +EXPORT_SYMBOL(kthread_should_stop_check_freeze);
> +
>  static void kthread_exit_files(void)
>  {
>  	struct fs_struct *fs;
> Index: linux-2.6.21-rc3-mm2/include/linux/kthread.h
> ===================================================================
> --- linux-2.6.21-rc3-mm2.orig/include/linux/kthread.h	2007-02-04 19:44:54.0 +0100
> +++ linux-2.6.21-rc3-mm2/include/linux/kthread.h	2007-03-11 18:37:10.0 +0100
> @@ -29,5 +29,6 @@ struct task_struct *kthread_create(int (
>  
>  void kthread_bind(struct task_struct *k, unsigned int cpu);
>  int kthread_stop(struct task_struct *k);
>  int kthread_should_stop(void);
> +int kthread_should_stop_check_freeze(void);
>  
>  #endif /* _LINUX_KTHREAD_H */
> Index: linux-2.6.21-rc3-mm2/kernel/rcutorture.c
> ===================================================================
> --- linux-2.6.21-rc3-mm2.orig/kernel/rcutorture.c	2007-03-11 11:39:06.0 +0100
> +++ linux-2.6.21-rc3-mm2/kernel/rcutorture.c	2007-03-11 18:45:00.0 +0100
> @@ -540,10 +540,9 @@ rcu_torture_writer(void *arg)
>  		}
>  		rcu_torture_current_version++;
>  		oldbatch = cur_ops->completed();
> -		try_to_freeze();
> -	} while (!kthread_should_stop() && !fullstop);
> +	} while (!kthread_should_stop_check_freeze() && !fullstop);
>  	VERBOSE_PRINTK_STRING("rcu_torture_writer task stopping");
> -	while (!kthread_should_stop())
> +	while (!kthread_should_stop_check_freeze())
>  		schedule_timeout_uninterruptible(1);
>  	return 0;
>  }
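The control flow of the proposed helper can be tried out in userspace with mocked state. This is a sketch only: the flags, `try_to_freeze_mock()` and `run_worker()` are invented for the example, and the real helper blocks in the refrigerator rather than bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool stop_requested;	/* stands in for kthread_stop_info.k == current */
static bool freeze_requested;	/* stands in for a pending freeze request */
static int freeze_count;	/* how often the mock "froze" */

/* Mock of try_to_freeze(): returns true if the thread was frozen. */
static bool try_to_freeze_mock(void)
{
	if (!freeze_requested)
		return false;
	freeze_requested = false;
	freeze_count++;
	return true;
}

/* Same shape as the helper in the patch above: stop wins, else try to freeze. */
static int kthread_should_stop_check_freeze_mock(void)
{
	if (stop_requested)
		return 1;
	try_to_freeze_mock();
	return 0;
}

/*
 * Drive a worker loop from a script: 'f' raises a freeze request and 's'
 * a stop request before that iteration's check; any other character means
 * no event. Returns the number of work iterations completed.
 */
static int run_worker(const char *events)
{
	int done = 0;

	assert(events != NULL);
	stop_requested = false;
	freeze_requested = false;
	freeze_count = 0;

	for (const char *e = events; *e; e++) {
		if (*e == 'f')
			freeze_requested = true;
		if (*e == 's')
			stop_requested = true;
		if (kthread_should_stop_check_freeze_mock())
			break;
		done++;		/* one unit of work */
	}
	return done;
}
```

Note that the return value of the freeze attempt is dropped here, just as in the patch, which is the trade-off discussed earlier in the thread for callers that need to act after thawing.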
Re: RSDL v0.30 cpu scheduler for ... 2.6.18.8 kernel
On Monday 12 March 2007 19:17, Vincent Fortier wrote:
>> There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
>> 2.6.21-rc3-mm2 to bring RSDL up to version 0.30 for download here:
>>
>> Full patches:
>>
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.30.patch
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2-rsdl-0.30.patch
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.30.patch
>>
>> incrementals:
>>
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.20/2.6.20.2-rsdl-0.29-0.30.patch
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2/2.6.20.2-rsdl-0.29-0.30.patch
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3/2.6.21-rc3-rsdl-0.29-0.30.patch
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/2.6.21-rc3-mm2-rsdl-0.29-0.30.patch
>
> And here are the backported RSDL 0.30 patches in case any of you would
> still be running an older 2.6.18.8 kernel ...

Thanks, your efforts are appreciated as it would take me quite a while to do a variety of backports that people are already requesting.

> Just for info, version 0.30 seems around 2 seconds faster than 0.26-0.29
> versions at boot time. I used to have around 2-3 seconds of difference
> between a vanilla and a rsdl patched kernel. Now it looks more like 5
> seconds faster! Wow.. nice work CK!
>
> 2.6.18.8 vanilla kernel:
> [ 68.514248] ACPI: Power Button (CM) [PWRB]
>
> 2.6.18.8-rsdl-0.30:
> [ 63.739337] ACPI: Power Button (CM) [PWRB]

Indeed there's almost 5 seconds difference there. To be honest, the boot time speedups are an unexpected bonus, but everyone seems to be reporting them on all flavours so perhaps all those timeout related driver setups are inadvertently benefiting.

> - vin

Thanks

--
-ck
Re: 2.6.21-rc3-mm1
On Sun, Mar 11, 2007 at 06:02:31PM +0100, Michal Piotrowski wrote:
> On 10/03/07, Paul E. McKenney [EMAIL PROTECTED] wrote:
>> On Fri, Mar 09, 2007 at 06:18:51PM -0800, Andrew Morton wrote:
>>> On Thu, 08 Mar 2007 21:50:29 +0100 Michal Piotrowski [EMAIL PROTECTED] wrote:
>>>> Andrew Morton wrote:
>>>>> Temporarily at
>>>>>
>>>>> http://userweb.kernel.org/~akpm/2.6.21-rc3-mm1/
>>>>>
>>>>> Will appear later at
>>>>>
>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm1/
>>>>
>>>> cpu_hotplug (AutoTest) hangs at this
>>>>
>>>> =============================================
>>>> [ INFO: possible recursive locking detected ]
>>>> 2.6.21-rc3-mm1 #2
>>>> ---------------------------------------------
>>>> sh/7213 is trying to acquire lock:
>>>>  (sched_hotcpu_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
>>>>
>>>> but task is already holding lock:
>>>>  (sched_hotcpu_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
>>>>
>>>> other info that might help us debug this:
>>>> 4 locks held by sh/7213:
>>>>  #0: (cpu_add_remove_lock){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
>>>>  #1: (sched_hotcpu_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
>>>>  #2: (cache_chain_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
>>>>  #3: (workqueue_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
>>>
>>> That's pretty useless, isn't it? We need to know the mutex_lock() caller here.
>>>
>>>> stack backtrace
>>>>  [c0105256] show_trace_log_lvl+0x1a/0x2f
>>>>  [c010597b] show_trace+0x12/0x14
>>>>  [c0105a3d] dump_stack+0x16/0x18
>>>>  [c013fc73] __lock_acquire+0x1aa/0xceb
>>>>  [c014082d] lock_acquire+0x79/0x93
>>>>  [c03385dc] __mutex_lock_slowpath+0x107/0x349
>>>>  [c033883a] mutex_lock+0x1c/0x1f
>>>>  [c011d924] sched_getaffinity+0x14/0x91
>>>>  [c015796d] __synchronize_sched+0x11/0x5f
>>>>  [c011d257] detach_destroy_domains+0x2c/0x30
>>>>  [c011fc1a] update_sched_domains+0x27/0x3a
>>>>  [c012fe7a] notifier_call_chain+0x2b/0x4a
>>>>  [c012fec6] __raw_notifier_call_chain+0x19/0x1e
>>>>  [c0145756] _cpu_down+0x70/0x282
>>>>  [c014598e] cpu_down+0x26/0x38
>>>>  [c0272714] store_online+0x27/0x5a
>>>>  [c026f610] sysdev_store+0x20/0x25
>>>>  [c01b7a8e] sysfs_write_file+0xc1/0xe9
>>>>  [c0180052] vfs_write+0xd1/0x15a
>>>>  [c0180682] sys_write+0x3d/0x72
>>>>  [c0104270] syscall_call+0x7/0xb
>>>>
>>>> l *0xc033883a
>>>> 0xc033883a is in mutex_lock (/mnt/md0/devel/linux-mm/kernel/mutex.c:92).
>>>> 87	/*
>>>> 88	 * The locking fastpath is the 1->0 transition from
>>>> 89	 * 'unlocked' into 'locked' state.
>>>> 90	 */
>>>> 91	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
>>>> 92 }
>>>> 93
>>>> 94 EXPORT_SYMBOL(mutex_lock);
>>>> 95
>>>> 96 static void fastcall noinline __sched
>>>>
>>>> I didn't test other -mm's with this test.
>>>>
>>>> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3-mm1/console.log
>>>> http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3-mm1/mm-config
>>>
>>> I can't immediately spot the bug. Probably it's caused by rcu-preempt's
>>> changes to synchronize_sched(): that function now does a heap more than it
>>> used to, including taking sched_hotcpu_mutex.
>>>
>>> So, what to do about this. Paul, I'm thinking that I should drop
>>> rcu-preempt for now - I don't think we ended up being able to identify any
>>> particular benefit which it brings to current mainline, and I suspect that
>>> things will become simpler if/when we start using the process freezer for
>>> CPU hotplug.
>>
>> It certainly makes sense for Michal to try backing out rcu-preempt using
>> your broken-out list of patches. If that makes the problem go away,
>
> Problem is caused by rcu-preempt.patch.

OK, clearly we need to fix this. You might be right about the freezer code having to go in first, Andrew -- will see!

						Thanx, Paul

>> then I would certainly have a hard time arguing with you. We are working
>> on getting measurements showing benefit of rcu-preempt, but aren't there
>> yet.
Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler
On Monday 12 March 2007 15:42, Al Boldi wrote: Con Kolivas wrote: On Monday 12 March 2007 08:52, Con Kolivas wrote: And thank you! I think I know what's going on now. I think each rotation is followed by another rotation before the higher priority task is getting a look in in schedule() to even get quota and add it to the runqueue quota. I'll try a simple change to see if that helps. Patch coming up shortly. Can you try the following patch and see if it helps. There's also one minor preemption logic fix in there that I'm planning on including. Thanks! Applied on top of v0.28 mainline, and there is no difference. What's it look like on your machine? The higher priority one always get 6-7ms whereas the lower priority one runs 6-7ms and then one larger perfectly bound expiration amount. Basically exactly as I'd expect. The higher priority task gets precisely RR_INTERVAL maximum latency whereas the lower priority task gets RR_INTERVAL min and full expiration (according to the virtual deadline) as a maximum. That's exactly how I intend it to work. Yes I realise that the max latency ends up being longer intermittently on the niced task but that's -in my opinion- perfectly fine as a compromise to ensure the nice 0 one always gets low latency. 
Eg: nice 0 vs nice 10

nice 0:
pid 6288, prio 0, out for 7 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms
pid 6288, prio 0, out for 6 ms

nice 10:
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 66 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms
pid 6290, prio 10, out for 6 ms

exactly as I'd expect. If you want fixed latencies _of niced tasks_ in the presence of less niced tasks you will not get them with this scheduler. What you will get, though, is a perfectly bound relationship knowing exactly what the maximum latency will ever be.

Thanks for the test case. It's interesting and nice that it confirms this scheduler works as I expect it to.

-- 
-ck
Re: Style Question
2007/3/12, Jan Engelhardt [EMAIL PROTECTED]:
> On Mar 11 2007 22:15, Cong WANG wrote:
>> I have a question about coding style in the Linux kernel. Documentation/CodingStyle says: Linux style for comments is the C89 /* ... */ style. Don't use C99-style // ... comments. _But_ I see a lot of '//' style comments in the current kernel code. Which is wrong: the documentation, the code, or neither? And why?
>
> The code. And because it's not always reviewed but silently pushed.
>
>> Another question is about NULL. AFAIK, in user space, using NULL is better than directly using 0 in C. In the kernel, I know it uses its own NULL, which may be defined as ((void*)0), but it's _still_ different from raw zero.
>
> In what way?

The following code is picked from drivers/kvm/kvm_main.c:

static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
{
	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];

	mutex_lock(&vcpu->mutex);
	if (unlikely(!vcpu->vmcs)) {
		mutex_unlock(&vcpu->mutex);
		return 0;
	}
	return kvm_arch_ops->vcpu_load(vcpu);
}

Obviously, it uses 0 rather than NULL when returning a pointer to indicate an error. Should we fix such issues?

>> So can I say using NULL is better than 0 in kernel?
>
> On what basis? Do you even know what NULL is defined as in (C, not C++) userspace? Think about it.

I think it's clearer to indicate we are using a pointer rather than an integer when we use NULL in the kernel. But in userspace, using NULL is for portability of the program, although most (*just* most, NOT all) definitions of NULL are ((void*)0). ;-)
Re: RSDL for 2.6.21-rc3- 0.29
On Sunday 11 March 2007, Con Kolivas wrote: On Sunday 11 March 2007 15:03, Matt Mackall wrote: On Sat, Mar 10, 2007 at 10:01:32PM -0600, Matt Mackall wrote: On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote: Ok I don't think there's any actual accounting problem here per se (although I did just recently post a bugfix for rsdl however I think that's unrelated). What I think is going on in the ccache testcase is that all the work is being offloaded to kernel threads reading/writing to/from the filesystem and the make is not getting any actual cpu time. I don't see significant system time while this is happening. Also, it's running pretty much entirely out of page cache so there wouldn't be a whole lot for kernel threads to do. Well I can't reproduce that behaviour here at all whether from disk or the pagecache with ccache, so I'm not entirely sure what's different at your end. However both you and the other person reporting bad behaviour were using ATI drivers. That's about the only commonality? I wonder if they do need to yield... somewhat instead of not at all. I hate to say it Con, but this one seems to have broken the amanda-tar symbiosis. I haven't tried a plain 21-rc3, so the problem may exist there, and in fact it did for 21-rc1, but I don't recall if it was true for -rc2. But I will have a plain 21-rc3 running by tomorrow nights amanda run to test. What happens is that when amanda tells tar to do a level 1 or 2, tar still thinks its doing a level 0. The net result is that the tape is filled completely and amanda does an EOT exit in about 10 of my 42 dle's. This is tar-1.15-1 for fedora core 6. -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) While it may be true that a watched pot never boils, the one you don't keep an eye on can make an awful mess of your stove. 
-- Edward Stevenson
Re: Style Question
On Mar 12 2007 13:37, Cong WANG wrote:
> The following code is picked from drivers/kvm/kvm_main.c:
>
> static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
> {
> 	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
>
> 	mutex_lock(&vcpu->mutex);
> 	if (unlikely(!vcpu->vmcs)) {
> 		mutex_unlock(&vcpu->mutex);
> 		return 0;
> 	}
> 	return kvm_arch_ops->vcpu_load(vcpu);
> }
>
> Obviously, it used 0 rather than NULL when returning a pointer to indicate an error. Should we fix such issues?

Indeed. If it were up to me, something like that should throw a compile error.

[...]
> I think it's clearer to indicate we are using a pointer rather than an integer when we use NULL in the kernel. But in userspace, using NULL is for portability of the program, although most (*just* most, NOT all) definitions of NULL are ((void*)0). ;-)

NULL has the same bit pattern as the number zero. (I'm not saying the bit pattern is all zeroes. And I am not even sure if NULL ought to have the same pattern as zero.) So C++ could use (void *)0, if it would let itself :p

Jan
-- 
Re: RSDL for 2.6.21-rc3- 0.29
Hi Gene.

On Monday 12 March 2007 16:38, Gene Heskett wrote:
> I hate to say it Con, but this one seems to have broken the amanda-tar symbiosis. I haven't tried a plain 21-rc3, so the problem may exist there, and in fact it did for 21-rc1, but I don't recall if it was true for -rc2. But I will have a plain 21-rc3 running by tomorrow night's amanda run to test.
>
> What happens is that when amanda tells tar to do a level 1 or 2, tar still thinks it's doing a level 0. The net result is that the tape is filled completely and amanda does an EOT exit in about 10 of my 42 dle's. This is tar-1.15-1 for fedora core 6.

I'm sorry but I have to say I have no idea what any of this means. I gather you're making an association between some application combination failing and the RSDL cpu scheduler. Unfortunately the details of what the problem is, or how the cpu scheduler is responsible, escape me :(

-- 
-ck
Re: Style Question
On Mon, 2007-03-12 at 06:40 +0100, Jan Engelhardt wrote:
> On Mar 12 2007 13:37, Cong WANG wrote:
>> The following code is picked from drivers/kvm/kvm_main.c:
>>
>> static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
>> {
>> 	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
>>
>> 	mutex_lock(&vcpu->mutex);
>> 	if (unlikely(!vcpu->vmcs)) {
>> 		mutex_unlock(&vcpu->mutex);
>> 		return 0;
>> 	}
>> 	return kvm_arch_ops->vcpu_load(vcpu);
>> }
>>
>> Obviously, it used 0 rather than NULL when returning a pointer to indicate an error. Should we fix such issues?
>
> Indeed. If it were up to me, something like that should throw a compile error.
>
> [...]
>> I think it's clearer to indicate we are using a pointer rather than an integer when we use NULL in the kernel. But in userspace, using NULL is for portability of the program, although most (*just* most, NOT all) definitions of NULL are ((void*)0). ;-)
>
> NULL has the same bit pattern as the number zero. (I'm not saying the bit pattern is all zeroes. And I am not even sure if NULL ought to have the same pattern as zero.) So C++ could use (void *)0, if it would let itself :p

Not necessarily. You can use 0 at the source level, but the compiler has to convert it to the actual NULL pointer bit pattern, whatever it may be.

In C++, NULL is typically defined to 0 (with no void* cast) by most compilers because 0 (and only 0) can be implicitly converted to a null pointer of any pointer type without a cast.

GCC introduced the __null extension so that NULL still works correctly in C++ when passed to a varargs function on 64-bit platforms.

(This just works in C because C makes NULL ((void*)0), which is thus the right size. In C++, the 0 ends up being an int instead of a pointer when passed to a varargs function, and things tend to blow up when they read the garbage high bits. Of course, nobody else does this, so you still have to use (void*)NULL to be portable.)
-- Nicholas Miell [EMAIL PROTECTED]
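The varargs point above can be shown in plain C. This is only a minimal sketch: count_args() and demo_count() are hypothetical helpers, not code from the thread. The sentinel is written as (void *)NULL so that a pointer-sized value is passed no matter how NULL happens to be defined.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: counts string arguments up to a null-pointer sentinel. */
static int count_args(const char *first, ...)
{
	va_list ap;
	const char *s;
	int n = 0;

	va_start(ap, first);
	/* Each va_arg() reads one pointer-sized slot; a bare int 0 passed as
	 * the sentinel would not be guaranteed to fill that slot. */
	for (s = first; s != NULL; s = va_arg(ap, const char *))
		n++;
	va_end(ap);
	return n;
}

/* Example call: the explicit cast keeps the sentinel pointer-sized. */
static int demo_count(void)
{
	return count_args("a", "b", "c", (void *)NULL);
}
```

With the cast in place the call is well defined in both C and C++, which is the portability argument made in the message above.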
Re: PROBLEM: Make menuconfig does not save parameters.
On 3/11/07, Sam Ravnborg [EMAIL PROTECTED] wrote:
[..snip..]
| To make the conversion we should consider renaming from
| current Load alternate to Open config file...
| and likewise Save alternate to Save config file as...
|
| Comments?
|
| Sam
[..snip...]

I think that is excellent. (Actually I can't test it now, but the idea is just perfect.)
Re: Style Question
On Mon, 12 Mar 2007, Jan Engelhardt wrote:
> On Mar 12 2007 13:37, Cong WANG wrote:
>> The following code is picked from drivers/kvm/kvm_main.c:
>>
>> static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
>> {
>> 	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
>>
>> 	mutex_lock(&vcpu->mutex);
>> 	if (unlikely(!vcpu->vmcs)) {
>> 		mutex_unlock(&vcpu->mutex);
>> 		return 0;
>> 	}
>> 	return kvm_arch_ops->vcpu_load(vcpu);
>> }
>>
>> Obviously, it used 0 rather than NULL when returning a pointer to indicate an error. Should we fix such issues?
>
> Indeed. If it were up to me, something like that should throw a compile error.

At least it does throw a sparse warning, and yes, it should be fixed.

> [...]
>> I think it's clearer to indicate we are using a pointer rather than an integer when we use NULL in the kernel. But in userspace, using NULL is for portability of the program, although most (*just* most, NOT all) definitions of NULL are ((void*)0). ;-)
>
> NULL has the same bit pattern as the number zero. (I'm not saying the bit pattern is all zeroes. And I am not even sure if NULL ought to have the same pattern as zero.) So C++ could use (void *)0, if it would let itself :p
>
> Jan

-- 
~Randy
Re: RSDL for 2.6.21-rc3- 0.29
On Monday 12 March 2007, Con Kolivas wrote: Hi Gene. On Monday 12 March 2007 16:38, Gene Heskett wrote: I hate to say it Con, but this one seems to have broken the amanda-tar symbiosis. I haven't tried a plain 21-rc3, so the problem may exist there, and in fact it did for 21-rc1, but I don't recall if it was true for -rc2. But I will have a plain 21-rc3 running by tomorrow nights amanda run to test. What happens is that when amanda tells tar to do a level 1 or 2, tar still thinks its doing a level 0. The net result is that the tape is filled completely and amanda does an EOT exit in about 10 of my 42 dle's. This is tar-1.15-1 for fedora core 6. I'm sorry but I have to say I have no idea what any of this means. I gather you're making an association between some application combination failing and RSDL cpu scheduler. Unfortunately the details of what the problem is, or how the cpu scheduler is responsible, escape me :( I have another backup running right now, after building a plain 2.6.21-rc3, and rebooting just now for the test. I don't think its the scheduler itself, but is something post 2.6.20 that is messing with tars mind and making it think the files it just read to do the estimate phase, are all new, so even a level 2 is in effect a level 0. I'll have an answer in about an hour, but its also 2:36am here and I'm headed for the rack to get some zzz's. So I'll report in the morning as to whether or not this backup ran as it was supposed to. I have a feeling its not going to though. -- Cheers, Gene There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order. -Ed Howdershelt (Author) When it comes to humility, I'm the greatest. -- Bullwinkle Moose - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH]Replace 0 with NULL when returning a pointer
Use NULL to indicate we are returning a pointer rather than an integer and to eliminate some sparse warnings.

Signed-off-by: Cong WANG [EMAIL PROTECTED]

---
--- drivers/kvm/kvm_main.c.orig 2007-03-11 21:41:23.0 +0800
+++ drivers/kvm/kvm_main.c 2007-03-12 14:26:17.0 +0800
@@ -205,7 +205,7 @@ static struct kvm_vcpu *vcpu_load(struct
 	mutex_lock(&vcpu->mutex);
 	if (unlikely(!vcpu->vmcs)) {
 		mutex_unlock(&vcpu->mutex);
-		return 0;
+		return NULL;
 	}
 	return kvm_arch_ops->vcpu_load(vcpu);
 }
@@ -799,7 +799,7 @@ struct kvm_memory_slot *gfn_to_memslot(s
 		    && gfn < memslot->base_gfn + memslot->npages)
 			return memslot;
 	}
-	return 0;
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(gfn_to_memslot);
[PATCH]Replace 0 with NULL when returning a pointer
Use NULL to indicate we are returning a pointer rather than an integer and to eliminate some sparse warnings.

Signed-off-by: Cong WANG [EMAIL PROTECTED]

---
--- drivers/kvm/vmx.c.orig 2007-03-11 21:41:03.0 +0800
+++ drivers/kvm/vmx.c 2007-03-12 14:25:11.0 +0800
@@ -98,7 +98,7 @@ static struct vmx_msr_entry *find_msr_en
 	for (i = 0; i < vcpu->nmsrs; ++i)
 		if (vcpu->guest_msrs[i].index == msr)
 			return &vcpu->guest_msrs[i];
-	return 0;
+	return NULL;
 }

 static void vmcs_clear(struct vmcs *vmcs)
Re: [patch 3/8] per backing_dev dirty and writeback page accounting
On Tue, Mar 06, 2007 at 07:04:46PM +0100, Miklos Szeredi wrote:
> From: Andrew Morton [EMAIL PROTECTED]
>
> [EMAIL PROTECTED]: bugfix]
>
> Miklos Szeredi [EMAIL PROTECTED]:
>
> Changes:
> - updated to apply after clear_page_dirty_for_io() race fix
>
> This is needed for
> - balance_dirty_pages() deadlock fix
> - fuse dirty page accounting
>
> I have no idea how serious the scalability problems with this are. If they are serious, different solutions can probably be found for the above, but this is certainly the simplest.

Atomic operations to a single per-backing device from all CPUs at once? That's a pretty serious scalability issue and it will cause a major performance regression for XFS.

I'd call this a showstopper right now - maybe you need to look at something like the ZVC code that Christoph Lameter wrote, perhaps?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
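For readers unfamiliar with the alternative hinted at above, here is a rough userspace sketch of the batching idea behind per-CPU differential counters. It is not the actual ZVC code: every name (demo_counter, DEMO_NR_CPUS, DEMO_THRESHOLD) is made up for illustration, and the simulation is single-threaded, so the atomic update and the per-CPU access rules a kernel would need are only marked in comments.

```c
#include <stdint.h>

#define DEMO_NR_CPUS   4	/* hypothetical CPU count for the simulation */
#define DEMO_THRESHOLD 8	/* hypothetical per-CPU fold threshold */

struct demo_counter {
	long global;			/* shared total, touched rarely */
	int8_t delta[DEMO_NR_CPUS];	/* per-CPU pending differences */
};

/* Add 'v' on behalf of 'cpu'. The shared total is only touched when the
 * local delta grows past the threshold. */
static void demo_counter_add(struct demo_counter *c, int cpu, int v)
{
	int d = c->delta[cpu] + v;

	if (d > DEMO_THRESHOLD || d < -DEMO_THRESHOLD) {
		c->global += d;	/* in a kernel: the one contended atomic op */
		d = 0;
	}
	c->delta[cpu] = (int8_t)d;
}

/* Cheap read: may lag the true value by up to
 * DEMO_NR_CPUS * DEMO_THRESHOLD. */
static long demo_counter_read_fast(const struct demo_counter *c)
{
	return c->global;
}

/* Exact read: also sums the pending per-CPU deltas. */
static long demo_counter_read_exact(const struct demo_counter *c)
{
	long sum = c->global;
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum;
}
```

The trade-off is that most updates stay CPU-local, at the cost of the cheap read being slightly stale; whether that staleness is acceptable for dirty and writeback accounting is exactly the kind of question the patch author would have to answer.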
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 3/11/07, Gene Heskett [EMAIL PROTECTED] wrote:
> On Sunday 11 March 2007, Mike Galbraith wrote:
>
> Just to comment, I've been running one of the patches between 20-ck1 and this latest one, which is building as I type, but I also run gkrellm here, version 2.2.9.
>
> Since I have been running this middle of this series patch, something is killing gkrellm about once a day, and there is nothing in the logs to indicate a problem. I see a blink out of the corner of my eye, and its gone. And it always starts right back up from a kmenu click.
>
> No idea if anyone else is experiencing this or not.
>
> -- Cheers, Gene

I've had such an issue with 0.20 or something. Sometimes, the xfce4-panel would disappear (die) when I displayed its menu. Very rare issue. Doesn't happen with 0.28 anyway. :-) Which looks really good, though I'll update to 0.30.
Re: [PATCH] two more device ids for dm9601 usbnet driver
>>>>> "Jon" == Jon Dowland [EMAIL PROTECTED] writes:

Hi,

Jon> This patch for the linux-usb-devel tree adds two more
Jon> product ids to the dm9601 driver. These ids were found on
Jon> rebadged dm9601 devices in the wild.

Jon> Signed-off-by: Jon Dowland [EMAIL PROTECTED]

Acked-by: Peter Korsgaard [EMAIL PROTECTED]

-- 
Bye, Peter Korsgaard
Re: 2.6.21-rc2-mm2: drivers/net/wireless/libertas/debugfs.c addr bogosity
On Fri, Mar 09, 2007 at 09:14:29AM -0800, Randy Dunlap wrote:
> Good to use FIELD_SIZEOF(),

Thanks.

> but in general, we prefer to use it directly, not in yet another wrapper.

I left the item_{size,addr} in place as it seemed to make the item[] more compact. I'm not certain using the FIELD_SIZEOF() macro directly is a win.

From: Tony Breeds [EMAIL PROTECTED]

Cleanup drivers/net/wireless/libertas/debugfs.c to use standard kernel macros and functions.

Signed-off-by: Tony Breeds [EMAIL PROTECTED]
---
only compile tested on x86

 drivers/net/wireless/libertas/debugfs.c | 56 +++
 1 files changed, 12 insertions(+), 44 deletions(-)

diff --git a/drivers/net/wireless/libertas/debugfs.c b/drivers/net/wireless/libertas/debugfs.c
index 3ad1e03..8b0e3ec 100644
--- a/drivers/net/wireless/libertas/debugfs.c
+++ b/drivers/net/wireless/libertas/debugfs.c
@@ -1771,58 +1771,26 @@ void libertas_debugfs_remove_one(wlan_private *priv)
 }

 /* debug entry */
-
-#define item_size(n) (sizeof ((wlan_adapter *)0)->n)
-#define item_addr(n) ((u32) &((wlan_adapter *)0)->n)
-
 struct debug_data {
 	char name[32];
 	u32 size;
 	u32 addr;
 };

-/* To debug any member of wlan_adapter, simply add one line here.
- */
+/* To debug any member of wlan_adapter, simply add a record here.
  */
 static struct debug_data items[] = {
-	{"intcounter", item_size(intcounter), item_addr(intcounter)},
-	{"psmode", item_size(psmode), item_addr(psmode)},
-	{"psstate", item_size(psstate), item_addr(psstate)},
+	{ .name = "intcounter",
+	  .size = FIELD_SIZEOF(wlan_adapter, intcounter),
+	  .addr = offsetof(wlan_adapter, intcounter) },
+	{ .name = "psmode",
+	  .size = FIELD_SIZEOF(wlan_adapter, psmode),
+	  .addr = offsetof(wlan_adapter, psmode) },
+	{ .name = "psstate",
+	  .size = FIELD_SIZEOF(wlan_adapter, psstate),
+	  .addr = offsetof(wlan_adapter, psstate) },
 };

-static int num_of_items = sizeof(items) / sizeof(items[0]);
-
-/**
- * @brief convert string to number
- *
- * @param s pointer to numbered string
- * @return converted number from string s
- */
-static int string_to_number(char *s)
-{
-	int r = 0;
-	int base = 0;
-
-	if ((strncmp(s, "0x", 2) == 0) || (strncmp(s, "0X", 2) == 0))
-		base = 16;
-	else
-		base = 10;
-
-	if (base == 16)
-		s += 2;
-
-	for (s = s; *s != 0; s++) {
-		if ((*s >= 48) && (*s <= 57))
-			r = (r * base) + (*s - 48);
-		else if ((*s >= 65) && (*s <= 70))
-			r = (r * base) + (*s - 55);
-		else if ((*s >= 97) && (*s <= 102))
-			r = (r * base) + (*s - 87);
-		else
-			break;
-	}
-
-	return r;
-}
+static int num_of_items = ARRAY_SIZE(items);

 /**
  * @brief proc read function
@@ -1912,7 +1880,7 @@ static int wlan_debugfs_write(struct file *f, const char __user *buf,
 		if (!p2)
 			break;
 		p2++;
-		r = string_to_number(p2);
+		r = simple_strtoul(p2, NULL, 0);
 		if (d[i].size == 1)
 			*((u8 *) d[i].addr) = (u8) r;
 		else if (d[i].size == 2)

Yours Tony

linux.conf.au http://linux.conf.au/ || http://lca2008.linux.org.au/
Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!
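To try the same pattern outside the kernel, a userspace approximation might look like the sketch below. The macro definitions are local stand-ins for the kernel's FIELD_SIZEOF() and ARRAY_SIZE(), struct demo_adapter is a hypothetical substitute for wlan_adapter (its field types are assumptions), and strtoul() stands in for simple_strtoul().

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel macros named in the patch. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))
#define ARRAY_SIZE(a)      (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for wlan_adapter; field types are assumptions. */
struct demo_adapter {
	unsigned int intcounter;
	unsigned short psmode;
	unsigned char psstate;
};

struct demo_item {
	const char *name;
	size_t size;	/* size of the field in bytes */
	size_t addr;	/* offset of the field inside struct demo_adapter */
};

static const struct demo_item demo_items[] = {
	{ .name = "intcounter",
	  .size = FIELD_SIZEOF(struct demo_adapter, intcounter),
	  .addr = offsetof(struct demo_adapter, intcounter) },
	{ .name = "psmode",
	  .size = FIELD_SIZEOF(struct demo_adapter, psmode),
	  .addr = offsetof(struct demo_adapter, psmode) },
	{ .name = "psstate",
	  .size = FIELD_SIZEOF(struct demo_adapter, psstate),
	  .addr = offsetof(struct demo_adapter, psstate) },
};

static const size_t demo_num_items = ARRAY_SIZE(demo_items);

/* Base 0 auto-detects decimal, 0x-prefixed hex and 0-prefixed octal. */
static unsigned long demo_parse(const char *s)
{
	return strtoul(s, NULL, 0);
}
```

One behavioural difference worth keeping in mind: with base 0 a leading "0" selects octal, which the removed string_to_number() never did, so an input such as "010" now parses as 8 rather than 10.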
Re: [git patches] libata fixes
Hello, Linus. Linus Torvalds wrote: On Sun, 11 Mar 2007, Paul Rolland wrote: My machine is having two problems : the one you are describing above, which is due to a SIL controler being connected to one port of the ICH7 (at least, it seems to), and probing it goes timeout, but nothing is connected on it. Ok, so that's just a message irritation, not actually bothersome otherwise? It involves a long timeout, so it's bothersome. This is caused by Silicon Image 4726/3726 storage processor (SATA Port Multiplier with extra features) attached to one of the ICH ports. If the first downstream port in the PMP is empty and it gets reset in non-PMP way, it identifies itself as Config Disk of quite small size. It's probably used to configure the extra features using standard ATA RW commands. Anyways, this Config Disk is a bit peculiar and doesn't work very well with the current ATA reset sequence and gets identified only after a few failures thus causing long timeout. I keep forgetting about this. I'll ask SIMG how to deal with this. For the time being, connecting a device to the PMP port should remove the timeouts. The second problem is a Jmicron363 controler that is failing to detect the DVD-RW that is connected, unless I use the irqpoll option as Tejun has suggested. .. and this one has never worked without irqpoll? But, as you suggest it, I'm adding pci=nomsi to the command line rebooting... no change for this part of the problem. OK, the /proc/interrupt for this config, and the dmesg attached. 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts CPU0 CPU1 0: 297549 0 IO-APIC-edge timer 1: 7 0 IO-APIC-edge i8042 4: 13 0 IO-APIC-edge serial 6: 5 0 IO-APIC-edge floppy 8: 1 0 IO-APIC-edge rtc 9: 0 0 IO-APIC-fasteoi acpi 12:126 0 IO-APIC-edge i8042 14: 8313 0 IO-APIC-edge libata 15: 0 0 IO-APIC-edge libata 16: 0 0 IO-APIC-fasteoi eth1, libata So it's the irq16 one that is the Jmicron controller and just isn't getting any interrupts? 
Since all the other interrupts work (and MSI worked for other controllers), I don't think it's interrupt-routing related. Especially as MSI shouldn't even care about things like that. And since it all works when irqpoll is used, that implies that the *only* thing that is broken is literally irq delivery. Is there possibly some jmicron-specific enable interrupts bit? (cc'ing Justin of JMicron. Hello, please correct me if I'm wrong.) Not that I know of. The PATA portion of JMB controllers is bog standard PCI BMDMA ATA device where ATA_NIEN is the way to turn IRQ on and off. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git patches] libata fixes
Of course I forgot to CC. :-) Quoting whole message for Justin. Tejun Heo wrote: Hello, Linus. Linus Torvalds wrote: On Sun, 11 Mar 2007, Paul Rolland wrote: My machine is having two problems : the one you are describing above, which is due to a SIL controler being connected to one port of the ICH7 (at least, it seems to), and probing it goes timeout, but nothing is connected on it. Ok, so that's just a message irritation, not actually bothersome otherwise? It involves a long timeout, so it's bothersome. This is caused by Silicon Image 4726/3726 storage processor (SATA Port Multiplier with extra features) attached to one of the ICH ports. If the first downstream port in the PMP is empty and it gets reset in non-PMP way, it identifies itself as Config Disk of quite small size. It's probably used to configure the extra features using standard ATA RW commands. Anyways, this Config Disk is a bit peculiar and doesn't work very well with the current ATA reset sequence and gets identified only after a few failures thus causing long timeout. I keep forgetting about this. I'll ask SIMG how to deal with this. For the time being, connecting a device to the PMP port should remove the timeouts. The second problem is a Jmicron363 controler that is failing to detect the DVD-RW that is connected, unless I use the irqpoll option as Tejun has suggested. .. and this one has never worked without irqpoll? But, as you suggest it, I'm adding pci=nomsi to the command line rebooting... no change for this part of the problem. OK, the /proc/interrupt for this config, and the dmesg attached. 
3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts CPU0 CPU1 0: 297549 0 IO-APIC-edge timer 1: 7 0 IO-APIC-edge i8042 4: 13 0 IO-APIC-edge serial 6: 5 0 IO-APIC-edge floppy 8: 1 0 IO-APIC-edge rtc 9: 0 0 IO-APIC-fasteoi acpi 12:126 0 IO-APIC-edge i8042 14: 8313 0 IO-APIC-edge libata 15: 0 0 IO-APIC-edge libata 16: 0 0 IO-APIC-fasteoi eth1, libata So it's the irq16 one that is the Jmicron controller and just isn't getting any interrupts? Since all the other interrupts work (and MSI worked for other controllers), I don't think it's interrupt-routing related. Especially as MSI shouldn't even care about things like that. And since it all works when irqpoll is used, that implies that the *only* thing that is broken is literally irq delivery. Is there possibly some jmicron-specific enable interrupts bit? (cc'ing Justin of JMicron. Hello, please correct me if I'm wrong.) Not that I know of. The PATA portion of JMB controllers is bog standard PCI BMDMA ATA device where ATA_NIEN is the way to turn IRQ on and off. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: libata extension
Vitaliyi wrote:
> Good Day
>
> Say I want to implement an extended set of ATA commands available to userspace for building diagnostic tools. I need 0x40 -- read verify and 0x32 -- write long with error handling, for example.
>
> I tried the ide driver through ioctls, but it seems to lack functionality and is full of gotchas. Furthermore it oopses sometimes.
>
> Is it possible to use libata for such a purpose, or do I need to write a separate IDE driver? By the way, I'm sure it should be done in kernel space since I'm going to deal with some hdd manufacturer commands.
>
> P.S. I was looking through the libata and ide sources and documentation but still don't have the broad picture.

I believe you should be able to do this by sending ATA pass-through SCSI commands into the device using SG_IO, without any kernel changes. It's really the mechanism that's meant for this.

-- 
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
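As a rough illustration of the pass-through route suggested above, the sketch below only builds the 16-byte ATA PASS-THROUGH (16) command block for READ VERIFY SECTOR(S) (0x40); it does not talk to a device. The byte layout follows my reading of the SCSI/ATA Translation (SAT) layout and should be checked against the standard before use; demo_build_verify_cdb() and the DEMO_* names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ATA_16_OPCODE      0x85	/* SCSI ATA PASS-THROUGH (16) */
#define DEMO_PROTO_NON_DATA     3	/* SAT protocol field: non-data */
#define DEMO_CMD_READ_VERIFY    0x40	/* READ VERIFY SECTOR(S), 28-bit LBA */

/* Fill a 16-byte CDB asking the translation layer to issue READ VERIFY
 * for 'count' sectors starting at the 28-bit address 'lba'. */
static void demo_build_verify_cdb(uint8_t cdb[16], uint32_t lba, uint8_t count)
{
	memset(cdb, 0, 16);
	cdb[0]  = DEMO_ATA_16_OPCODE;
	cdb[1]  = DEMO_PROTO_NON_DATA << 1;	/* EXTEND bit clear: 28-bit command */
	cdb[2]  = 0x20;				/* CK_COND: ask for ATA registers back */
	cdb[6]  = count;			/* sector count (7:0) */
	cdb[8]  = lba & 0xff;			/* LBA (7:0) */
	cdb[10] = (lba >> 8) & 0xff;		/* LBA (15:8) */
	cdb[12] = (lba >> 16) & 0xff;		/* LBA (23:16) */
	cdb[13] = 0x40 | ((lba >> 24) & 0x0f);	/* device: LBA mode + LBA (27:24) */
	cdb[14] = DEMO_CMD_READ_VERIFY;
}
```

To actually send it, the CDB would be placed in the cmdp field of an sg_io_hdr_t (interface_id 'S', cmd_len 16, dxfer_direction SG_DXFER_NONE, plus a sense buffer and timeout) and submitted with ioctl(fd, SG_IO, &hdr) on the disk's device node. Vendor-specific commands and WRITE LONG would need different protocol and transfer settings, which this sketch does not cover.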
Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)
> On Sat, 10 Mar 2007 02:31:35 -0200 Mauro Carvalho Chehab <[EMAIL PROTECTED]> wrote:
> From: Trent Piepho <[EMAIL PROTECTED]>
>
> When a module uses symbol_get() to increase the ref count of another
> module, there is no record what module called symbol_get(). A module can
> show up as having other users, but there is no way to tell who those
> users are.
>
> This adds that ability to symbol_put() and symbol_get().

One day I'll write a script which unwordwraps patches and then you'll all need to find new ways of torturing me.

This patch needed rather a lot of help in the coding-style department.

Hopefully Rusty can comment on the content, because I'm all exhausted from cleaning it up.
Re: [2/6] 2.6.21-rc2: known regressions
* Pavel Machek <[EMAIL PROTECTED]> wrote:

> > Probably tweaking the webpage doesnt help because people dont get
> > there - as the results plainly show it. Maybe some more automation
> > would be useful too, a tool that detects failed resume and tries all
> > those options that makes sense on that box or something? It's not
> > like that
>
> Unfortunately, these tend to crash the box when you pass wrong
> options, and I do not see easy way to test "can user see whats on
> display" automatically.

you could perhaps try what X's modesetting utility does: display a dialog box that times out if it does not get clicked on, and reboot if it did not get clicked on. Likewise, detect upon the next bootup that a suspend-test was in progress (and didn't get back via normal resume), via some temporary file. That way both the 'did not resume and i had to power-cycle' and the 'resume did not restore my X' problems can be handled.

Finally, when the correct options have been established (worst-case with a small number of reboots and "yes, indeed the resume did not work fine" clicks done upon bootup by the user), automatically fill in a webform in firefox and ask the user to do a single click to submit that form.

techniques like that have more chance i think to get Linux suspend/resume anywhere near to working. The current 'rely on the developer' technique apparently does not work.

	Ingo
Re: [RFC][PATCH 6/7] Account for the number of tasks within container
Paul Menage wrote: > On 3/6/07, Pavel Emelianov <[EMAIL PROTECTED]> wrote: >> The idea is: >> >> Task may be "the entity that allocates the resources" and "the >> entity that is a resource allocated". >> >> When task is the first entity it may move across containers >> (that is implemented in your patches). When task is a resource >> it shouldn't move across containers like files or pages do. >> >> More generally - allocated resources hold reference to original >> container till they die. No resource migration is performed. >> >> Did I express my idea cleanly? > > Yes, but I disagree with the premise. The title of your patch is > "Account for the number of tasks within container", but that's not > what the subsystem does, it accounts for the number of forks within > the container that aren't directly accompanied by an exit. > > Ideally, resources like files and pages would be able to follow tasks > as well. The reason that files and pages aren't easily migrated from > one container to another is that there could be sharing involved; > figuring out the sharing can be expensive, and it's not clear what to > do if two users are in different containers. > > But in the case of a task count, there are no such issues with > sharing, so it seems to me to be more sensible (and more efficient) to > just limit the number of tasks in a container. > > i.e. when moving a task into a container or forking a task within a > container, increment the count; when moving a task out of a container > or when it exits, decrement the count. Sounds reasonable. I'll take this into account when I make the next iteration. Thanks. > With your approach, if you were to set the task limit of an empty > container A to 1, and then move a process P from B into A, P would be > able to fork a new child, since the "task count" would be 0 (as P was > being charged to B still). Surely the fact that there's 1 process in A > should prevent P from forking? 
> > Paul
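Paul's counting rule from the message above (increment on attach or fork, decrement on detach or exit) can be sketched in plain user-space C. This is only an illustration, not code from either patch set; `struct task_counter` and all helper names are hypothetical.

```c
#include <errno.h>

/* Hypothetical per-container task counter; not from the posted patches. */
struct task_counter {
	unsigned long count;	/* tasks currently in the container */
	unsigned long limit;	/* maximum number of tasks allowed */
};

/* Called when a task forks inside the container or is moved into it. */
static int task_counter_charge(struct task_counter *tc)
{
	if (tc->count >= tc->limit)
		return -EAGAIN;	/* at the limit: refuse the fork/attach */
	tc->count++;
	return 0;
}

/* Called when a task exits or is moved out of the container. */
static void task_counter_uncharge(struct task_counter *tc)
{
	if (tc->count > 0)
		tc->count--;
}

/* Moving a task between containers transfers its charge with it. */
static int task_counter_move(struct task_counter *from,
			     struct task_counter *to)
{
	int ret = task_counter_charge(to);

	if (ret)
		return ret;
	task_counter_uncharge(from);
	return 0;
}
```

With this scheme, Paul's example behaves as he expects: once P has been moved into a container whose limit is 1, a further charge for a forked child is refused.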
Re: [RFC][PATCH 1/7] Resource counters
Herbert Poetzl wrote: > On Wed, Mar 07, 2007 at 10:19:05AM +0300, Pavel Emelianov wrote: >> Balbir Singh wrote: >>> Pavel Emelianov wrote: Introduce generic structures and routines for resource accounting. Each resource accounting container is supposed to aggregate it, container_subsystem_state and its resource-specific members within. diff -upr linux-2.6.20.orig/include/linux/res_counter.h linux-2.6.20-0/include/linux/res_counter.h --- linux-2.6.20.orig/include/linux/res_counter.h2007-03-06 13:39:17.0 +0300 +++ linux-2.6.20-0/include/linux/res_counter.h2007-03-06 13:33:28.0 +0300 @@ -0,0 +1,83 @@ +#ifndef __RES_COUNTER_H__ +#define __RES_COUNTER_H__ +/* + * resource counters + * + * Copyright 2007 OpenVZ SWsoft Inc + * + * Author: Pavel Emelianov <[EMAIL PROTECTED]> + * + */ + +#include + +struct res_counter { +unsigned long usage; +unsigned long limit; +unsigned long failcnt; +spinlock_t lock; +}; + +enum { +RES_USAGE, +RES_LIMIT, +RES_FAILCNT, +}; + +ssize_t res_counter_read(struct res_counter *cnt, int member, +const char __user *buf, size_t nbytes, loff_t *pos); +ssize_t res_counter_write(struct res_counter *cnt, int member, +const char __user *buf, size_t nbytes, loff_t *pos); + +static inline void res_counter_init(struct res_counter *cnt) +{ +spin_lock_init(&cnt->lock); +cnt->limit = (unsigned long)LONG_MAX; +} + >>> Is there any way to indicate that there are no limits on this container. >> Yes - LONG_MAX is essentially a "no limit" value as no >> container will ever have such many files :) > > -1 or ~0 is a viable choice for userspace to > communicate 'infinite' or 'unlimited' OK, I'll make ULONG_MAX :) >>> LONG_MAX is quite huge, but still when the administrator wants to >>> configure a container to *un-limited usage*, it becomes hard for >>> the administrator.
>>> +static inline int res_counter_charge_locked(struct res_counter *cnt, +unsigned long val) +{ +if (cnt->usage <= cnt->limit - val) { +cnt->usage += val; +return 0; +} + +cnt->failcnt++; +return -ENOMEM; +} + +static inline int res_counter_charge(struct res_counter *cnt, +unsigned long val) +{ +int ret; +unsigned long flags; + +spin_lock_irqsave(&cnt->lock, flags); +ret = res_counter_charge_locked(cnt, val); +spin_unlock_irqrestore(&cnt->lock, flags); +return ret; +} + >>> Will atomic counters help here. >> I'm afraid no. We have to atomically check for limit and alter >> one of usage or failcnt depending on the checking result. Making >> this with atomic_xxx ops will require at least two ops. > > Linux-VServer does the accounting with atomic counters, > so that works quite fine, just do the checks at the > beginning of whatever resource allocation and the > accounting once the resource is acquired ... This works quite fine on non-preempted kernels. From the time you check for the resource until you actually account for it, the kernel may preempt and let another process pass through the vx_anything_avail() check. >> If we'll remove failcnt this would look like >> while (atomic_cmpxchg(...)) >> which is also not that good. >> >> Moreover - in RSS accounting patches I perform page list >> manipulations under this lock, so this also saves one atomic op. > > it still hasn't been shown that this kind of RSS limit > doesn't add big time overhead to normal operations > (inside and outside of such a resource container) > > note that the 'usual' memory accounting is much more > lightweight and serves similar purposes ... It OOM-kills current in case of a limit hit instead of reclaiming pages or killing the *memory eater* to free memory.
> best, > Herbert > +static inline void res_counter_uncharge_locked(struct res_counter *cnt, +unsigned long val) +{ +if (unlikely(cnt->usage < val)) { +WARN_ON(1); +val = cnt->usage; +} + +cnt->usage -= val; +} + +static inline void res_counter_uncharge(struct res_counter *cnt, +unsigned long val) +{ +unsigned long flags; + +spin_lock_irqsave(&cnt->lock, flags); +res_counter_uncharge_locked(cnt, val); +spin_unlock_irqrestore(&cnt->lock, flags); +} + +#endif diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig --- linux-2.6.20.orig/init/Kconfig2007-03-06 13:33:28.0 +0300 +++ linux-2.6.20-0/init/Kconfig2007-03-06 13:33:28.0
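The `while (atomic_cmpxchg(...))` alternative mentioned in the exchange above might look roughly like the following. It is a user-space sketch using C11 atomics, not kernel code and not part of the posted patch; `struct atomic_counter` and `atomic_counter_charge()` are hypothetical names. Note that `failcnt` is bumped in a separate atomic operation, which is the "at least two ops" cost raised in the thread.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock-free variant of the counter; names are illustrative. */
struct atomic_counter {
	atomic_ulong usage;
	atomic_ulong failcnt;
	unsigned long limit;
};

static int atomic_counter_charge(struct atomic_counter *cnt, unsigned long val)
{
	unsigned long old = atomic_load(&cnt->usage);

	do {
		/* Overflow-safe form of "old + val > limit". */
		if (val > cnt->limit || old > cnt->limit - val) {
			atomic_fetch_add(&cnt->failcnt, 1);
			return -ENOMEM;
		}
		/* If the exchange fails, 'old' is reloaded with the current usage. */
	} while (!atomic_compare_exchange_weak(&cnt->usage, &old, old + val));

	return 0;
}
```

The limit check and the usage update are retried together until no other thread changed `usage` in between, so the check-then-account window described above is closed without a lock. It does not help when other state, such as a page list, must be updated under the same protection.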
Re: [RFC][PATCH 2/7] RSS controller core
Herbert Poetzl wrote: > On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote: >> On Tue, 06 Mar 2007 17:55:29 +0300 >> Pavel Emelianov <[EMAIL PROTECTED]> wrote: >> >>> +struct rss_container { >>> + struct res_counter res; >>> + struct list_head page_list; >>> + struct container_subsys_state css; >>> +}; >>> + >>> +struct page_container { >>> + struct page *page; >>> + struct rss_container *cnt; >>> + struct list_head list; >>> +}; >> ah. This looks good. I'll find a hunk of time to go through this work >> and through Paul's patches. It'd be good to get both patchsets lined >> up in -mm within a couple of weeks. But.. > > doesn't look so good for me, mainly because of the > additional per page data and per page processing > > on 4GB memory, with 100 guests, 50% shared for each > guest, this basically means ~1mio pages, 500k shared > and 1500k x sizeof(page_container) entries, which > roughly boils down to ~25MB of wasted memory ... > > increase the amount of shared pages and it starts > getting worse, but maybe I'm missing something here You are. Each page has only one page_container associated with it, regardless of the number of containers it is shared between. >> We need to decide whether we want to do per-container memory >> limitation via these data structures, or whether we do it via a >> physical scan of some software zone, possibly based on Mel's patches. > > why not do simple page accounting (as done currently > in Linux) and use that for the limits, without > keeping the reference from container to page? As I already answered in my previous mail, simple limiting without per-container reclamation and a per-container OOM killer isn't good memory management. It doesn't allow resource shortages to be handled gracefully. This patchset provides a more graceful way to handle this. Full memory management includes accounting of VMA length as well (returning ENOMEM from the system call), but we've decided to start with RSS.
> best, > Herbert
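To put rough numbers on the overhead discussed above: if there is exactly one `page_container` per page, as Pavel states, the cost depends on the number of pages and not on how many containers share them. The sketch below assumes 4 KiB pages and a `page_container` made of four pointers (`page`, `cnt`, and the two pointers of `list`) with no padding; the helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Bytes of page_container metadata needed to track 'mem_bytes' of RAM,
 * assuming exactly one page_container per physical page.
 */
static uint64_t page_container_overhead(uint64_t mem_bytes,
					uint64_t page_size,
					uint64_t ptr_size)
{
	uint64_t nr_pages = mem_bytes / page_size;
	uint64_t entry_size = 4 * ptr_size;	/* page, cnt, list.next, list.prev */

	return nr_pages * entry_size;
}
```

Under these assumptions, 4 GiB of RAM needs 16 MiB of metadata with 4-byte pointers and 32 MiB with 8-byte pointers, whether the pages are shared by one guest or by a hundred.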
[BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote: > sched rsdl fix Doesn't change a thing. Always breaks at the same place (though depending on hardware timings? the trace is not always the same). Pretty sure nothing happens before this failure -- Nicolas Mailhot
Re: [BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote: > On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote: > > sched rsdl fix > > Doesn't change a thing. Always breaks at the same place (though > depending on hardware timings? the trace is not always the same). Pretty > sure nothing happens before this failure Bummer. The only other thing to try is v0.29 posted recently. I still haven't got a good way to reproduce this locally but I'll keep trying. Thanks for testing. -- -ck
Re: [BIG] Re: sched rsdl fix for 0.28
On Sunday 11 March 2007 20:21, Con Kolivas wrote: > On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote: > > On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote: > > > sched rsdl fix > > > > Doesn't change a thing. Always breaks at the same place (though > > depending on hardware timings? the trace is not always the same). Pretty > > sure nothing happens before this failure > > Bummer. The only other thing to try is v0.29 posted recently. I still > haven't got a good way to reproduce this locally but I'll keep trying. > Thanks for testing. Oh and if that oopses and you still have the time, could you please test 0.29 on 2.6.20.2 (available from same directory). -- -ck
Re: PROBLEM: "Make nenuconfig" does not save parameters.
[Sam Ravnborg - Sat, Mar 10, 2007 at 11:45:34PM +0100] | On Sat, Mar 10, 2007 at 10:34:41PM +0100, Jan Engelhardt wrote: | > | > On Mar 10 2007 22:27, Sam Ravnborg wrote: | > >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote: | > >> | > >> Whether the 'working config file path' should change when you do | > >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg | > >> if you want it changed :-) | > > | > >Current behaviour is not logical but on the other hand I do not | > >see a big need to make it so. | > >It seems that people very seldom uses "save alternate" anyway. | > > | > >But patches are welcome. | > | > ^_^ The patch has already been posted, has not it? | No. | Either we keep current behaviour or we change to the "normal" | behaviour with a "Save as..." as know from all other programs. | | Sam | Hi Sam, here is a patch for menuconfig that shows current configuration file. So I think menuconfig does its work well but the only thing we need is to show location of an _active_ configuration. Any comments are welcome (and you may swear at me too :) Cyrill diff --git a/scripts/kconfig/mconf.c b/scripts/kconfig/mconf.c index 3f9a132..cde6792 100644 --- a/scripts/kconfig/mconf.c +++ b/scripts/kconfig/mconf.c @@ -602,6 +602,12 @@ static void conf(struct menu *menu) item_set_tag('L'); item_make(_("Save an Alternate Configuration File")); item_set_tag('S'); + item_make("--- "); + item_set_tag(':'); + item_make(_("Current Configuration File: ")); + item_set_tag(':'); + item_add_str("%s", filename); + } dialog_clear(); res = dialog_menu(prompt ? 
prompt : _("Main Menu"), @@ -816,8 +822,11 @@ static void conf_load(void) case 0: if (!dialog_input_result[0]) return; - if (!conf_read(dialog_input_result)) + if (!conf_read(dialog_input_result)) { + memset(filename, 0x0, PATH_MAX+1); + strncpy(filename, dialog_input_result, PATH_MAX); return; + } show_textbox(NULL, _("File does not exist!"), 5, 38); break; case 1: @@ -840,8 +849,11 @@ static void conf_save(void) case 0: if (!dialog_input_result[0]) return; - if (!conf_write(dialog_input_result)) + if (!conf_write(dialog_input_result)) { + memset(filename, 0x0, PATH_MAX+1); + strncpy(filename, dialog_input_result, PATH_MAX); return; + } show_textbox(NULL, _("Can't create file! Probably a nonexistent directory."), 5, 60); break; case 1: @@ -903,7 +915,7 @@ int main(int ac, char **av) switch (res) { case 0: - if (conf_write(NULL)) { + if (conf_write(filename)) { fprintf(stderr, _("\n\n" "Error during writing of the kernel configuration.\n" "Your kernel configuration changes were NOT saved."
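The patch above records the active configuration file with `memset()` followed by `strncpy()`. A common alternative is to check the length first and copy the string together with its terminator, so a path that is too long is rejected instead of silently truncated. The sketch below is a user-space illustration only, not a proposed change to mconf.c; `CONFIG_NAME_MAX` stands in for `PATH_MAX` and `set_active_config()` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

#define CONFIG_NAME_MAX 4096	/* stand-in for PATH_MAX; an assumption */

static char filename[CONFIG_NAME_MAX + 1];

/*
 * Remember 'path' as the active configuration file.
 * Returns 0 on success, -1 if the path does not fit; in that case the
 * previously stored name is left untouched.
 */
static int set_active_config(const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(filename))
		return -1;
	memcpy(filename, path, len + 1);	/* includes the terminating NUL */
	return 0;
}
```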
Re: Use of absolute timeouts for oneshot timers
On Sat, 2007-03-10 at 16:42 -0800, Jeremy Fitzhardinge wrote: > Thomas Gleixner wrote: > > It's simply enforced in NO_HZ, HIGHRES mode as we operate in absolute > > time, which is read back from the clocksource, even if we use a relative > > value for real hardware clock event devices to program the next event. > > We calculate the delta between the absolute event and now. So we never > > get an accumulating error. > > > > What problem are you observing ? > > Actually, two things. There was the unexpected pauses during boot, > which is trivially fixable by not using the Xen periodic timer, and > using the single-shot fallback. > > But I'm making the more general observation that if you use an absolute > rather than relative time to set the single-shot timeout, then you have > to deal with a long-term cumulative drift between the kernel's monotonic > time and the hypervisor's monotonic time. This can happen even if your > clocksource is derived directly from the hypervisor monotonic time, > because running ntp will warp the kernel's time, and so it will drift > with respect to the hypervisor clock. You can only avoid this by 1) not > allowing adjtime, or 2) making those same adjtime warps to the > hypervisor time. Neither of these is a good general solution. Sigh, yes. Using a relative time for the next event is probably the least ugly solution tglx
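The "delta between the absolute event and now" that Thomas describes can be sketched as a small helper. This is an illustration in user-space C, not the kernel's clockevents code; the function name, the nanosecond units, and the min/max clamp parameters are assumptions.

```c
#include <stdint.h>

/*
 * Convert an absolute expiry time into the relative value handed to a
 * one-shot event device, clamped to what the device can program.
 * Both timestamps must come from the same clock.
 */
static int64_t oneshot_delta_ns(int64_t expires_ns, int64_t now_ns,
				int64_t min_delta_ns, int64_t max_delta_ns)
{
	int64_t delta = expires_ns - now_ns;

	if (delta < min_delta_ns)
		delta = min_delta_ns;	/* expired or too close: fire as soon as possible */
	if (delta > max_delta_ns)
		delta = max_delta_ns;	/* too far ahead: program the longest interval */
	return delta;
}
```

Because the delta is recomputed from the kernel's own clock on every reprogramming, only the error over a single interval is exposed to the device's (or hypervisor's) idea of time, which appears to be the property both mails are after.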
Re: [RFC][PATCH 5/7] Per-container OOM killer and page reclamation
Balbir Singh wrote: > Hi, Pavel, > > Please find my patch to add LRU behaviour to your latest RSS controller. Thanks for participation and additional testing :) I'll include this into next generation of patches. > Balbir Singh > Linux Technology Center > IBM, ISTL > > > > > Add LRU behaviour to the RSS controller patches posted by Pavel Emelianov > > http://lkml.org/lkml/2007/3/6/198 > > which was in turn similar to the RSS controller posted by me > > http://lkml.org/lkml/2007/2/26/8 > > Pavel's patches have a per container list of pages, which helps reduce > reclaim time of the RSS controller but the per container list of pages is > in FIFO order. I've implemented active and inactive lists per container to > help select the right set of pages to reclaim when the container is under > memory pressure. > > I've tested these patches on a ppc64 machine and they work fine for > the minimal testing I've done. > > Pavel would you please include these patches in your next iteration. > > Comments, suggestions and further improvements are as always welcome! 
> > Signed-off-by: <[EMAIL PROTECTED]> > --- > > include/linux/rss_container.h |1 > mm/rss_container.c| 47 > +++--- > mm/swap.c |5 > mm/vmscan.c |3 ++ > 4 files changed, 44 insertions(+), 12 deletions(-) > > diff -puN include/linux/rss_container.h~rss-container-lru2 > include/linux/rss_container.h > --- linux-2.6.20/include/linux/rss_container.h~rss-container-lru2 > 2007-03-09 22:52:56.0 +0530 > +++ linux-2.6.20-balbir/include/linux/rss_container.h 2007-03-10 > 00:39:59.0 +0530 > @@ -19,6 +19,7 @@ int container_rss_prepare(struct page *, > void container_rss_add(struct page_container *); > void container_rss_del(struct page_container *); > void container_rss_release(struct page_container *); > +void container_rss_move_lists(struct page *pg, bool active); > > int mm_init_container(struct mm_struct *mm, struct task_struct *tsk); > void mm_free_container(struct mm_struct *mm); > diff -puN mm/rss_container.c~rss-container-lru2 mm/rss_container.c > --- linux-2.6.20/mm/rss_container.c~rss-container-lru22007-03-09 > 22:52:56.0 +0530 > +++ linux-2.6.20-balbir/mm/rss_container.c2007-03-10 02:42:54.0 > +0530 > @@ -17,7 +17,8 @@ static struct container_subsys rss_subsy > > struct rss_container { > struct res_counter res; > - struct list_head page_list; > + struct list_head inactive_list; > + struct list_head active_list; > struct container_subsys_state css; > }; > > @@ -96,6 +97,26 @@ void container_rss_release(struct page_c > kfree(pc); > } > > +void container_rss_move_lists(struct page *pg, bool active) > +{ > + struct rss_container *rss; > + struct page_container *pc; > + > + if (!page_mapped(pg)) > + return; > + > + pc = page_container(pg); > + BUG_ON(!pc); > + rss = pc->cnt; > + > + spin_lock_irq(>res.lock); > + if (active) > + list_move(>list, >active_list); > + else > + list_move(>list, >inactive_list); > + spin_unlock_irq(>res.lock); > +} > + > void container_rss_add(struct page_container *pc) > { > struct page *pg; > @@ -105,7 +126,7 @@ void container_rss_add(struct 
page_conta > rss = pc->cnt; > > spin_lock(>res.lock); > - list_add(>list, >page_list); > + list_add(>list, >active_list); > spin_unlock(>res.lock); > > page_container(pg) = pc; > @@ -141,7 +162,10 @@ unsigned long container_isolate_pages(un > struct zone *z; > > spin_lock_irq(>res.lock); > - src = >page_list; > + if (active) > + src = >active_list; > + else > + src = >inactive_list; > > for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) { > pc = list_entry(src->prev, struct page_container, list); > @@ -152,13 +176,10 @@ unsigned long container_isolate_pages(un > > spin_lock(>lru_lock); > if (PageLRU(page)) { > - if ((active && PageActive(page)) || > - (!active && !PageActive(page))) { > - if (likely(get_page_unless_zero(page))) { > - ClearPageLRU(page); > - nr_taken++; > - list_move(>lru, dst); > - } > + if (likely(get_page_unless_zero(page))) { > + ClearPageLRU(page); > + nr_taken++; > + list_move(>lru, dst); > } > } > spin_unlock(>lru_lock); > @@ -212,7 +233,8 @@ static int rss_create(struct
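The active/inactive split that Balbir's patch adds can be illustrated with a tiny user-space list. This is a sketch of the idea only: `struct node` and the helpers below are hypothetical stand-ins for the kernel's `struct list_head` API, and all locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for struct list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Hypothetical container with one list per LRU state. */
struct lru_container {
	struct node active;
	struct node inactive;
};

/* Move a page's entry to the front of the list matching its new state. */
static void lru_move(struct lru_container *c, struct node *pg, bool active)
{
	list_del_node(pg);
	list_add_head(pg, active ? &c->active : &c->inactive);
}

/* Reclaim scans from the tail, i.e. the oldest entry on the chosen list. */
static struct node *lru_oldest(struct lru_container *c, bool active)
{
	struct node *head = active ? &c->active : &c->inactive;

	return head->prev == head ? NULL : head->prev;
}
```

Keeping two lists means a reclaim pass can look at inactive pages first, instead of walking one FIFO list and filtering each entry by its active flag.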
Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd
> On Fri, 9 Mar 2007 09:40:40 +0800 Joe Jin <[EMAIL PROTECTED]> wrote: > > What's the error you're trying to fix? scsi_dispatch_cmd() is only > > called from scsi_request_fn() which already has an equivalent of this > > check in it just prior to calling dispatch. > > Yeah, I have seen the check in scsi_request_fn(); recently we got the > following crash info on rhel4 2.6.9-42.0.2.ELsmp, The 2.6.9 base is very old in mainline terms. Are you sure the bug hasn't been fixed in mainline by other means?
[RFC][PATCH 0/3] swsusp: Stop using page flags
Hi, The following three patches make swsusp use its own data structures for memory management instead of special page flags. Thus the page flags used so far by swsusp (PG_nosave, PG_nosave_free) can be used for other purposes, and I believe there is some urgent need for them. :-) Last week I sent these patches to the linux-pm and linux-mm lists and there were no negative comments. Also I've been testing them on my x86_64 boxes for a few days and apparently they don't break anything. I think they can go into -mm for testing. Comments are welcome. Greetings, Rafael -- If you don't have the time to read, you don't have the time or the tools to write. - Stephen King
[RFC][PATCH 1/3] swsusp: Use inline functions for changing page flags
From: Rafael J. Wysocki <[EMAIL PROTECTED]> Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with calls to inline functions that can be changed in subsequent patches without modifying the code calling them. Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> --- include/linux/suspend.h | 33 + kernel/power/snapshot.c | 48 +--- mm/page_alloc.c |6 +++--- 3 files changed, 61 insertions(+), 26 deletions(-) Index: linux-2.6.21-rc2/include/linux/suspend.h === --- linux-2.6.21-rc2.orig/include/linux/suspend.h 2007-03-02 09:05:53.0 +0100 +++ linux-2.6.21-rc2/include/linux/suspend.h2007-03-02 09:24:02.0 +0100 @@ -8,6 +8,7 @@ #include #include #include +#include /* struct pbe is used for creating lists of pages that should be restored * atomically during the resume from disk, because the page frames they have @@ -49,6 +50,38 @@ void __save_processor_state(struct saved void __restore_processor_state(struct saved_context *ctxt); unsigned long get_safe_page(gfp_t gfp_mask); +/* Page management functions for the software suspend (swsusp) */ + +static inline void swsusp_set_page_forbidden(struct page *page) +{ + SetPageNosave(page); +} + +static inline int swsusp_page_is_forbidden(struct page *page) +{ + return PageNosave(page); +} + +static inline void swsusp_unset_page_forbidden(struct page *page) +{ + ClearPageNosave(page); +} + +static inline void swsusp_set_page_free(struct page *page) +{ + SetPageNosaveFree(page); +} + +static inline int swsusp_page_is_free(struct page *page) +{ + return PageNosaveFree(page); +} + +static inline void swsusp_unset_page_free(struct page *page) +{ + ClearPageNosaveFree(page); +} + /* * XXX: We try to keep some more pages free so that I/O operations succeed * without paging. Might this be more? 
Index: linux-2.6.21-rc2/kernel/power/snapshot.c === --- linux-2.6.21-rc2.orig/kernel/power/snapshot.c 2007-03-02 09:05:53.0 +0100 +++ linux-2.6.21-rc2/kernel/power/snapshot.c2007-03-02 09:27:06.0 +0100 @@ -67,15 +67,15 @@ static void *get_image_page(gfp_t gfp_ma res = (void *)get_zeroed_page(gfp_mask); if (safe_needed) - while (res && PageNosaveFree(virt_to_page(res))) { + while (res && swsusp_page_is_free(virt_to_page(res))) { /* The page is unsafe, mark it for swsusp_free() */ - SetPageNosave(virt_to_page(res)); + swsusp_set_page_forbidden(virt_to_page(res)); allocated_unsafe_pages++; res = (void *)get_zeroed_page(gfp_mask); } if (res) { - SetPageNosave(virt_to_page(res)); - SetPageNosaveFree(virt_to_page(res)); + swsusp_set_page_forbidden(virt_to_page(res)); + swsusp_set_page_free(virt_to_page(res)); } return res; } @@ -91,8 +91,8 @@ static struct page *alloc_image_page(gfp page = alloc_page(gfp_mask); if (page) { - SetPageNosave(page); - SetPageNosaveFree(page); + swsusp_set_page_forbidden(page); + swsusp_set_page_free(page); } return page; } @@ -110,9 +110,9 @@ static inline void free_image_page(void page = virt_to_page(addr); - ClearPageNosave(page); + swsusp_unset_page_forbidden(page); if (clear_nosave_free) - ClearPageNosaveFree(page); + swsusp_unset_page_free(page); __free_page(page); } @@ -615,7 +615,8 @@ static struct page *saveable_highmem_pag BUG_ON(!PageHighMem(page)); - if (PageNosave(page) || PageReserved(page) || PageNosaveFree(page)) + if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page) || + PageReserved(page)) return NULL; return page; @@ -681,7 +682,7 @@ static struct page *saveable_page(unsign BUG_ON(PageHighMem(page)); - if (PageNosave(page) || PageNosaveFree(page)) + if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page)) return NULL; if (PageReserved(page) && pfn_is_nosave(pfn)) @@ -821,9 +822,10 @@ void swsusp_free(void) if (pfn_valid(pfn)) { struct page *page = pfn_to_page(pfn); - if (PageNosave(page) && 
PageNosaveFree(page)) { - ClearPageNosave(page); - ClearPageNosaveFree(page); + if (swsusp_page_is_forbidden(page) && +
[RFC][PATCH 3/3] mm: Remove unused page flags
From: Rafael J. Wysocki <[EMAIL PROTECTED]> Remove the two page flags that were previously used by swsusp and are no longer needed. Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> --- include/linux/page-flags.h | 12 1 file changed, 12 deletions(-) Index: linux-2.6.21-rc3/include/linux/page-flags.h === --- linux-2.6.21-rc3.orig/include/linux/page-flags.h +++ linux-2.6.21-rc3/include/linux/page-flags.h @@ -82,13 +82,11 @@ #define PG_private 11 /* If pagecache, has fs-private data */ #define PG_writeback 12 /* Page is under writeback */ -#define PG_nosave 13 /* Used for system suspend/resume */ #define PG_compound 14 /* Part of a compound page */ #define PG_swapcache 15 /* Swap page: swp_entry_t in private */ #define PG_mappedtodisk 16 /* Has blocks allocated on-disk */ #define PG_reclaim 17 /* To be reclaimed asap */ -#define PG_nosave_free 18 /* Used for system suspend/resume */ #define PG_buddy 19 /* Page is free, on buddy lists */ /* PG_owner_priv_1 users should have descriptive aliases */ @@ -214,16 +212,6 @@ static inline void SetPageUptodate(struc ret;\ }) -#define PageNosave(page) test_bit(PG_nosave, &(page)->flags) -#define SetPageNosave(page) set_bit(PG_nosave, &(page)->flags) -#define TestSetPageNosave(page) test_and_set_bit(PG_nosave, &(page)->flags) -#define ClearPageNosave(page) clear_bit(PG_nosave, &(page)->flags) -#define TestClearPageNosave(page) test_and_clear_bit(PG_nosave, &(page)->flags) - -#define PageNosaveFree(page) test_bit(PG_nosave_free, &(page)->flags) -#define SetPageNosaveFree(page) set_bit(PG_nosave_free, &(page)->flags) -#define ClearPageNosaveFree(page) clear_bit(PG_nosave_free, &(page)->flags) - #define PageBuddy(page) test_bit(PG_buddy, &(page)->flags) #define __SetPageBuddy(page) __set_bit(PG_buddy, &(page)->flags) #define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags)
[RFC][PATCH 2/3] swsusp: Do not use page flags
From: Rafael J. Wysocki <[EMAIL PROTECTED]> Make swsusp use memory bitmaps instead of page flags for marking 'nosave' and free pages. This allows us to 'recycle' two page flags that can be used for other purposes. Also, the memory needed to store the bitmaps is allocated when necessary (ie. before the suspend) and freed after the resume which is more reasonable. The patch is designed to minimize the amount of changes and there are some nice simplifications and optimizations possible on top of it. I am going to implement them separately in the future. Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> --- arch/x86_64/kernel/e820.c | 26 +--- include/linux/suspend.h | 58 +++--- kernel/power/disk.c | 23 +++- kernel/power/power.h |2 kernel/power/snapshot.c | 250 +++--- kernel/power/user.c |4 6 files changed, 281 insertions(+), 82 deletions(-) Index: linux-2.6.21-rc3/include/linux/suspend.h === --- linux-2.6.21-rc3.orig/include/linux/suspend.h +++ linux-2.6.21-rc3/include/linux/suspend.h @@ -24,63 +24,41 @@ struct pbe { extern void drain_local_pages(void); extern void mark_free_pages(struct zone *zone); -#ifdef CONFIG_PM -/* kernel/power/swsusp.c */ -extern int software_suspend(void); - -#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) +#if defined(CONFIG_PM) && defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) extern int pm_prepare_console(void); extern void pm_restore_console(void); #else static inline int pm_prepare_console(void) { return 0; } static inline void pm_restore_console(void) {} -#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */ +#endif + +#if defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND) +/* kernel/power/swsusp.c */ +extern int software_suspend(void); +/* kernel/power/snapshot.c */ +extern void __init register_nosave_region(unsigned long, unsigned long); +extern int swsusp_page_is_forbidden(struct page *); +extern void swsusp_set_page_free(struct page *); +extern void swsusp_unset_page_free(struct page *); +extern unsigned long 
get_safe_page(gfp_t gfp_mask); #else static inline int software_suspend(void) { printk("Warning: fake suspend called\n"); return -ENOSYS; } -#endif /* CONFIG_PM */ + +static inline void register_nosave_region(unsigned long b, unsigned long e) {} +static inline int swsusp_page_is_forbidden(struct page *p) { return 0; } +static inline void swsusp_set_page_free(struct page *p) {} +static inline void swsusp_unset_page_free(struct page *p) {} +#endif /* defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND) */ void save_processor_state(void); void restore_processor_state(void); struct saved_context; void __save_processor_state(struct saved_context *ctxt); void __restore_processor_state(struct saved_context *ctxt); -unsigned long get_safe_page(gfp_t gfp_mask); - -/* Page management functions for the software suspend (swsusp) */ - -static inline void swsusp_set_page_forbidden(struct page *page) -{ - SetPageNosave(page); -} - -static inline int swsusp_page_is_forbidden(struct page *page) -{ - return PageNosave(page); -} - -static inline void swsusp_unset_page_forbidden(struct page *page) -{ - ClearPageNosave(page); -} - -static inline void swsusp_set_page_free(struct page *page) -{ - SetPageNosaveFree(page); -} - -static inline int swsusp_page_is_free(struct page *page) -{ - return PageNosaveFree(page); -} - -static inline void swsusp_unset_page_free(struct page *page) -{ - ClearPageNosaveFree(page); -} /* * XXX: We try to keep some more pages free so that I/O operations succeed Index: linux-2.6.21-rc3/kernel/power/snapshot.c === --- linux-2.6.21-rc3.orig/kernel/power/snapshot.c +++ linux-2.6.21-rc3/kernel/power/snapshot.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -34,6 +35,10 @@ #include "power.h" +static int swsusp_page_is_free(struct page *); +static void swsusp_set_page_forbidden(struct page *); +static void swsusp_unset_page_forbidden(struct page *); + /* List of PBEs needed for restoring the pages that were allocated before 
* the suspend and included in the suspend image, but have also been * allocated by the "resume" kernel, so their contents cannot be written @@ -224,11 +229,6 @@ static void chain_free(struct chain_allo * of type unsigned long each). It also contains the pfns that * correspond to the start and end of the represented memory area and * the number of bit chunks in the block. - * - * NOTE: Memory bitmaps are used for two types of operations only: - * "set a bit" and "find the next bit set". Moreover, the searching - * is always carried out after all of the "set a bit" operations - * on given bitmap. */ #define BM_END_OF_MAP (~0UL)
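The idea behind this patch, one bit per page frame kept in a separately allocated bitmap rather than in `struct page` flags, can be sketched in user-space C as follows. This is a flat, simplified illustration; the memory bitmaps in kernel/power/snapshot.c are organized in blocks and look different. `struct pfn_bitmap` and its helpers are hypothetical names.

```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical flat bitmap with one bit per page frame number (pfn). */
struct pfn_bitmap {
	unsigned long start_pfn;	/* first pfn covered */
	unsigned long nr_pfns;		/* number of pfns covered */
	unsigned long *bits;
};

/* Allocate the bitmap only when it is needed (e.g. before suspend). */
static int pfn_bitmap_alloc(struct pfn_bitmap *bm,
			    unsigned long start_pfn, unsigned long nr_pfns)
{
	size_t words = (nr_pfns + BITS_PER_WORD - 1) / BITS_PER_WORD;

	bm->bits = calloc(words ? words : 1, sizeof(unsigned long));
	if (!bm->bits)
		return -1;
	bm->start_pfn = start_pfn;
	bm->nr_pfns = nr_pfns;
	return 0;
}

/* Release it again once it is no longer needed (e.g. after resume). */
static void pfn_bitmap_free(struct pfn_bitmap *bm)
{
	free(bm->bits);
	bm->bits = NULL;
}

/* Callers must pass a pfn inside [start_pfn, start_pfn + nr_pfns). */
static void pfn_bitmap_set(struct pfn_bitmap *bm, unsigned long pfn)
{
	unsigned long i = pfn - bm->start_pfn;

	bm->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

static void pfn_bitmap_clear(struct pfn_bitmap *bm, unsigned long pfn)
{
	unsigned long i = pfn - bm->start_pfn;

	bm->bits[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

static int pfn_bitmap_test(const struct pfn_bitmap *bm, unsigned long pfn)
{
	unsigned long i = pfn - bm->start_pfn;

	return (bm->bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}
```

Wrappers such as `swsusp_set_page_forbidden()` could then consult a bitmap like this instead of a page flag, which is what lets the flags be freed without touching the callers.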
[PATCH] drivers/isdn/hardware/eicon/: remove unused header files
Hi all, as pointed out by Robert P. J. Day, here is a patch to remove unused header files from Eicon/Dialogic ISDN driver. Signed-off-by: Armin Schindler <[EMAIL PROTECTED]> --- diff -Nur linux-2.6.20.1.orig/drivers/isdn/hardware/eicon/dbgioctl.h linux-2.6.20.1/drivers/isdn/hardware/eicon/dbgioctl.h --- linux-2.6.20.1.orig/drivers/isdn/hardware/eicon/dbgioctl.h 2007-03-10 11:21:15.0 +0100 +++ linux-2.6.20.1/drivers/isdn/hardware/eicon/dbgioctl.h 1970-01-01 01:00:00.0 +0100 @@ -1,198 +0,0 @@ - -/* - * - Copyright (c) Eicon Technology Corporation, 2000. - * - This source file is supplied for the use with Eicon - Technology Corporation's range of DIVA Server Adapters. - * - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2, or (at your option) - any later version. - * - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY OF ANY KIND WHATSOEVER INCLUDING ANY - implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. - * - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - * - */ -/*--*/ -/* file: dbgioctl.h */ -/*--*/ - -#if !defined(__DBGIOCTL_H__) - -#define __DBGIOCTL_H__ - -#ifdef NOT_YET_NEEDED -/* - * The requested operation is passed in arg0 of DbgIoctlArgs, - * additional arguments (if any) in arg1, arg2 and arg3. 
- */ - -typedef struct -{ ULONG arg0 ; - ULONG arg1 ; - ULONG arg2 ; - ULONG arg3 ; -} DbgIoctlArgs ; - -#defineDBG_COPY_LOGS 0 /* copy debugs to user until buffer full*/ - /* arg1: size threshold */ - /* arg2: timeout in milliseconds*/ - -#define DBG_FLUSH_LOGS 1 /* flush pending debugs to user buffer */ - /* arg1: internal driver id */ - -#define DBG_LIST_DRVS 2 /* return the list of registered drivers */ - -#defineDBG_GET_MASK3 /* get current debug mask of driver */ - /* arg1: internal driver id */ - -#defineDBG_SET_MASK4 /* set/change debug mask of driver */ - /* arg1: internal driver id */ - /* arg2: new debug mask */ - -#defineDBG_GET_BUFSIZE 5 /* get current buffer size of driver */ - /* arg1: internal driver id */ - /* arg2: new debug mask */ - -#defineDBG_SET_BUFSIZE 6 /* set new buffer size of driver */ - /* arg1: new buffer size*/ - -/* - * common internal debug message structure - */ - -typedef struct -{ unsigned short id ; /* virtual driver id */ - unsigned short type ; /* special message type */ - unsigned long seq ;/* sequence number of message */ - unsigned long size ; /* size of message in bytes */ - unsigned long next ; /* offset to next buffered message*/ - LARGE_INTEGER NTtime ; /* 100 ns since 1.1.1601 */ - unsigned char data[4] ;/* message data */ -} OldDbgMessage ; - -typedef struct -{ LARGE_INTEGER NTtime ; /* 100 ns since 1.1.1601 */ - unsigned short size ; /* size of message in bytes */ - unsigned short ; /* always 0x to indicate new msg */ - unsigned short id ; /* virtual driver id */ - unsigned short type ; /* special message type */ - unsigned long seq ;/* sequence number of message */ - unsigned char data[4] ;/* message data */ -} DbgMessage ; - -#endif - -#define DRV_ID_UNKNOWN 0x0C/* for messages via
Re: [RFC][PATCH 0/3] swsusp: Stop using page flags
On Sun, 2007-03-11 at 11:17 +0100, Rafael J. Wysocki wrote:
> Hi,
>
> The following three patches make swsusp use its own data structures for memory
> management instead of special page flags. Thus the page flags used so far by
> swsusp (PG_nosave, PG_nosave_free) can be used for other purposes and I believe
> there are some urgent needs of them. :-)
>
> Last week I sent these patches to the linux-pm and linux-mm lists and there
> were no negative comments. Also I've been testing them on my x86_64 boxes for
> a few days and apparently they don't break anything. I think they can go into
> -mm for testing.
>
> Comments are welcome.

These patches have my blessing, they look good to me, but I'm not much
involved with the swsusp code, so I won't ACK them.

Again, thanks a bunch for freeing up 2 page flags :-)

Peter

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: SATA resume slowness, e1000 MSI warning
"Michael S. Tsirkin" <[EMAIL PROTECTED]> writes: >> The only case I can see which might trigger this is if we saved >> pci-X state and then didn't restore it because we could not find >> the capability on restore. > > Hmm. pci_save_pcix_state/pci_restore_pcix_state seem to only handle > regular devices and seem to ignore the fact that for bridge PCI-X > capability has a different structure. > > Is this intentional? Probably not a such. I don't think we have any drivers for bridge devices so I don't think it matters. It likely wouldn't hurt to fix it just in case though. Do any of the mellanox cards do anything with the bridge on the card? > If not, here's a patch to fix this. Warning: completely untested. If you fix the offsets and diff this against my last fix (to never free the buffer) I think your patch makes sense. > PCI: restore bridge PCI-X capability registers after PM event > > Restore PCI-X bridge up/downstream capability registers > after PM event. This includes maxumum split transaction > commitment limit which might be vital for PCI X. > > Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index df49530..4b788ef 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -597,14 +597,19 @@ static int pci_save_pcix_state(struct pci_dev *dev) > if (pos <= 0) > return 0; > > - save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL); > + save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 2, GFP_KERNEL); > if (!save_state) { > - dev_err(>dev, "Out of memory in pci_save_pcie_state\n"); > + dev_err(>dev, "Out of memory in pci_save_pcix_state\n"); > return -ENOMEM; > } > cap = (u16 *)_state->data[0]; > > - pci_read_config_word(dev, pos + PCI_X_CMD, [i++]); > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { This appears to be the proper test. 
> + pci_read_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, [i++]); > + pci_read_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, [i++]); > + } else > + pci_read_config_word(dev, pos + PCI_X_CMD, [i++]); > + > pci_add_saved_cap(dev, save_state); > return 0; > } > @@ -621,7 +626,11 @@ static void pci_restore_pcix_state(struct pci_dev *dev) > return; > cap = (u16 *)_state->data[0]; > > - pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]); > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > + pci_write_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, cap[i++]); > + pci_write_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, cap[i++]); These look like the proper two registers to save. > + } else > + pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]); > pci_remove_saved_cap(save_state); > kfree(save_state); > } > diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h > index f09cce2..fb7eefd 100644 > --- a/include/linux/pci_regs.h > +++ b/include/linux/pci_regs.h > @@ -332,6 +332,8 @@ > #define PCI_X_STATUS_SPL_ERR 0x2000 /* Rcvd Split Completion Error Msg */ > #define PCI_X_STATUS_266MHZ 0x4000 /* 266 MHz capable */ > #define PCI_X_STATUS_533MHZ 0x8000 /* 533 MHz capable */ > +#define PCI_X_BRIDGE_UP_SPL_CTL 10 /* PCI-X upstream split transaction limit > */ > +#define PCI_X_BRIDGE_DN_SPL_CTL 14 /* PCI-X downstream split transaction > limit */ Unless I am completely misreading the spec. While you have picked the right register to save the offsets should be 0x08 and 0x0c or 8 and 12 Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Linux 2.6.16.44-rc1
Security fixes since 2.6.16.43: - CVE-2007-0005: Fix buffer overflow in Omnikey CardMan 4040 driver - CVE-2007-1000: [IPV6]: Handle np->opt being NULL in ipv6_getsockopt_sticky(). Location: ftp://ftp.kernel.org/pub/linux/kernel/people/bunk/linux-2.6.16.y/testing/ git tree: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git Changes since 2.6.16.43: Adrian Bunk (1): Linux 2.6.16.44-rc1 Ang Way Chuang (1): dvb-core: fix bug in CRC-32 checking on 64-bit systems Arnaldo Carvalho de Melo (1): [TCP]: Fix minisock tcp_create_openreq_child() typo. Arthur Kepner (1): IB/mthca: Use mmiowb after doorbell ring Chris Wright (1): [IPV6] fix ipv6_getsockopt_sticky copy_to_user leak Dan Yeisley (1): init_reap_node() initialization fix David Moore (1): Missing critical phys_to_virt in lib/swiotlb.c David S. Miller (4): video/aty/mach64_ct.c: fix bogus delay loop [SPARC64] bbc_i2c: Fix kenvctrld eating %100 cpu. [IPV6]: Handle np->opt being NULL in ipv6_getsockopt_sticky(). (CVE-2007-1000) SPARC64: Fix memory corruption in pci_4u_free_consistent() David Stevens (1): [IPV6]: /proc/net/anycast6 unbalanced inet6_dev refcnt Eli Cohen (1): IPoIB: Rejoin all multicast groups after a port event Eric Dumazet (1): [INET]: twcal_jiffie should be unsigned long, not int Herbert Xu (1): [UDP]: Reread uh pointer after pskb_trim Hugh Dickins (1): make ppc64 current preempt-safe Jin-Bong lee (1): DVB: cxusb: fix firmware patch for big endian systems Komuro (1): modify 3c589_cs to be SMP safe Marcel Holtmann (1): Fix buffer overflow in Omnikey CardMan 4040 driver (CVE-2007-0005) Michael S. 
Tsirkin (1): IB/mthca: Fix off-by-one in FMR handling on memfree Michal Wrobel (1): [IPV6]: anycast refcnt fix Olaf Kirch (1): [IPV6]: Fix for ipv6_setsockopt NULL dereference Sergey Vlasov (1): Input: psmouse - fix attribute access on 64-bit systems Makefile|2 +- arch/sparc64/kernel/pci_iommu.c |2 +- drivers/char/pcmcia/cm4040_cs.c |3 ++- drivers/infiniband/hw/mthca/mthca_cq.c |7 +++ drivers/infiniband/hw/mthca/mthca_memfree.c |2 +- drivers/infiniband/hw/mthca/mthca_qp.c | 19 +++ drivers/infiniband/hw/mthca/mthca_srq.c |8 drivers/infiniband/ulp/ipoib/ipoib_ib.c |4 +++- drivers/input/mouse/psmouse-base.c |8 +--- drivers/media/dvb/dvb-core/dvb_net.c|4 ++-- drivers/media/dvb/dvb-usb/cxusb.c |4 ++-- drivers/net/pcmcia/3c589_cs.c |7 +-- drivers/sbus/char/bbc_i2c.c | 17 + drivers/video/aty/mach64_ct.c |4 ++-- include/asm-powerpc/current.h | 12 +++- include/net/inet_timewait_sock.h|2 +- lib/swiotlb.c |2 +- mm/slab.c |2 +- net/ipv4/tcp_minisocks.c|2 +- net/ipv4/udp.c |1 + net/ipv6/addrconf.c |2 ++ net/ipv6/anycast.c |1 + net/ipv6/ipv6_sockglue.c| 14 +- 23 files changed, 95 insertions(+), 34 deletions(-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] revoke: core code
On Fri, 2007-03-09 at 10:15 +0200, Pekka J Enberg wrote:
> > +      again:
> > +        restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr,
> > +                                      details);
> > +
> > +        need_break = need_resched() || need_lockbreak(details->i_mmap_lock);
> > +        if (need_break)
> > +                goto out_need_break;
> > +
> > +        if (restart_addr < end_addr) {
> > +                start_addr = restart_addr;
> > +                goto again;
> > +        }
> > +        return 0;
> > +
> > +      out_need_break:
> > +        spin_unlock(details->i_mmap_lock);
> > +        cond_resched();
> > +        spin_lock(details->i_mmap_lock);
> > +        return -EINTR;

On Fri, 2007-03-09 at 13:30 +0100, Peter Zijlstra wrote:
> I'm not sure this scheme works, given a sufficiently loaded machine,
> this might never complete.

Hmm, so what's the alternative? It's better to fail revoke than lock up
the box.

On Fri, 2007-03-09 at 13:30 +0100, Peter Zijlstra wrote:
> I'm never sure of operator precedence and prefer:
>
>  (vma->vm_flags & VM_SHARED) && ...
>
> which leaves no room for error.

Thanks, fixed.
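The lock-break loop quoted in this message, and the "fail rather than lock up" trade-off, can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the caller is assumed to keep its progress across retries and to give up after a bounded number of attempts. struct zap_state, zap_chunk(), need_break() and the retry bound are hypothetical stand-ins, not code from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model state: how much one call zaps, and after how many
 * calls the "scheduler" asks us to drop the lock. */
struct zap_state {
	unsigned long chunk;	/* bytes handled per zap_chunk() call */
	int break_after;	/* request a break after this many calls; 0 = never */
	int calls;		/* calls since the lock was last taken */
};

/* Stand-in for zap_page_range(): handle one chunk, return the restart address. */
static unsigned long zap_chunk(struct zap_state *st, unsigned long start,
			       unsigned long end)
{
	unsigned long next = start + st->chunk;

	st->calls++;
	return next < end ? next : end;
}

/* Stand-in for need_resched() || need_lockbreak(). */
static int need_break(const struct zap_state *st)
{
	return st->break_after > 0 && st->calls >= st->break_after;
}

/* Same order as the quoted loop: zap, check for a break, then check progress.
 * Progress is written back through *start so a retry can resume. */
static int zap_range(struct zap_state *st, unsigned long *start,
		     unsigned long end)
{
	assert(st->chunk > 0);
	while (*start < end) {
		*start = zap_chunk(st, *start, end);
		if (need_break(st))
			return -EINTR;
	}
	return 0;
}

/* Bounded retry: report failure to the caller instead of looping forever. */
static int zap_range_bounded(struct zap_state *st, unsigned long start,
			     unsigned long end, int max_tries)
{
	int tries;

	for (tries = 0; tries < max_tries; tries++) {
		int err = zap_range(st, &start, end);

		if (err != -EINTR)
			return err;
		st->calls = 0;	/* models unlock, cond_resched(), relock */
	}
	return -EINTR;
}
```

With a bound in place, a heavily loaded machine produces an -EINTR failure for the caller to handle rather than an unbounded loop; whether the real code keeps progress between retries is not stated in the thread.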
[PATCH 1/5] revoke: special mmap handling
From: Pekka Enberg <[EMAIL PROTECTED]> This adds special handling for revoked memory mappings. We want to raise SIGBUS when accessing revoked mappings and return ENODEV when trying to remap with mmap(2). Acked-by: Alan Cox <[EMAIL PROTECTED]> Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- include/linux/mm.h |1 + mm/memory.c|3 +++ mm/mmap.c | 12 3 files changed, 12 insertions(+), 4 deletions(-) Index: uml-2.6/include/linux/mm.h === --- uml-2.6.orig/include/linux/mm.h 2007-03-11 13:07:57.0 +0200 +++ uml-2.6/include/linux/mm.h 2007-03-11 13:09:19.0 +0200 @@ -169,6 +169,7 @@ #define VM_NONLINEAR0x0080 /* Is no #define VM_MAPPED_COPY 0x0100 /* T if mapped copy of data (nommu mmap) */ #define VM_INSERTPAGE 0x0200 /* The vma has had "vm_insert_page()" done on it */ #define VM_ALWAYSDUMP 0x0400 /* Always include in core dumps */ +#define VM_REVOKED 0x0800 /* Mapping has been revoked */ #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS Index: uml-2.6/mm/memory.c === --- uml-2.6.orig/mm/memory.c2007-03-11 13:07:57.0 +0200 +++ uml-2.6/mm/memory.c 2007-03-11 13:09:19.0 +0200 @@ -2504,6 +2504,9 @@ int __handle_mm_fault(struct mm_struct * if (unlikely(is_vm_hugetlb_page(vma))) return hugetlb_fault(mm, vma, address, write_access); + if (unlikely(vma->vm_flags & VM_REVOKED)) + return VM_FAULT_SIGBUS; + pgd = pgd_offset(mm, address); pud = pud_alloc(mm, pgd, address); if (!pud) Index: uml-2.6/mm/mmap.c === --- uml-2.6.orig/mm/mmap.c 2007-03-11 13:07:57.0 +0200 +++ uml-2.6/mm/mmap.c 2007-03-11 13:09:19.0 +0200 @@ -1030,10 +1030,14 @@ accountable = 0; error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, , _link, _parent); - if (vma && vma->vm_start < addr + len) { - if (do_munmap(mm, addr, len)) - return -ENOMEM; - goto munmap_back; + if (vma) { + if (unlikely(vma->vm_flags & VM_REVOKED)) + return -ENODEV; + if (vma->vm_start < addr + len) { + if (do_munmap(mm, addr, len)) + return -ENOMEM; + goto 
munmap_back; + } } /* Check against address space limit. */ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
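The two checks this patch adds can be modelled in userspace as below. It is a minimal sketch, not kernel code: struct vma_model, MODEL_VM_REVOKED and both functions are hypothetical names, and the flag value is chosen for the model only. The second function mirrors the order of the tests in the mm/mmap.c hunk, where the revoked check comes before the overlap check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bit for this model only; not the value from the patch. */
#define MODEL_VM_REVOKED 0x1UL

struct vma_model {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Fault path: touching a revoked mapping yields SIGBUS. */
static enum fault_result model_fault(const struct vma_model *vma)
{
	assert(vma != NULL);
	if (vma->flags & MODEL_VM_REVOKED)
		return FAULT_SIGBUS;
	return FAULT_OK;
}

/*
 * mmap path, given the first vma found at or above addr (may be NULL).
 * Returns -ENODEV for a revoked vma, 1 if the caller must unmap the
 * overlapping range and retry, 0 if the new mapping can proceed.
 */
static int model_mmap_check(const struct vma_model *vma, unsigned long addr,
			    unsigned long len)
{
	if (!vma)
		return 0;
	if (vma->flags & MODEL_VM_REVOKED)
		return -ENODEV;
	if (vma->start < addr + len)
		return 1;
	return 0;
}
```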
[PATCH 2/5] revoke: core code
From: Pekka Enberg <[EMAIL PROTECTED]> The revokeat(2) and frevoke(2) system calls invalidate open file descriptors and shared mappings of an inode. After an successful revocation, operations on file descriptors fail with the EBADF or ENXIO error code for regular and device files, respectively. Attempting to read from or write to a revoked mapping causes SIGBUS. The actual operation is done in two passes: 1. Revoke all file descriptors that point to the given inode. We do this under tasklist_lock so that after this pass, we don't need to worry about racing with close(2) or dup(2). 2. Take down shared memory mappings of the inode and close all file pointers. The file descriptors and memory mapping ranges are preserved until the owning task does close(2) and munmap(2), respectively. Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]> --- fs/Makefile |2 fs/revoke.c | 588 +++ fs/revoked_inode.c | 378 +++ include/linux/fs.h |4 include/linux/revoked_fs_i.h | 20 + include/linux/syscalls.h |3 6 files changed, 994 insertions(+), 1 deletion(-) Index: uml-2.6/fs/Makefile === --- uml-2.6.orig/fs/Makefile2007-03-11 13:07:57.0 +0200 +++ uml-2.6/fs/Makefile 2007-03-11 13:09:20.0 +0200 @@ -11,7 +11,7 @@ obj-y := open.o read_write.o file_table. 
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ pnode.o drop_caches.o splice.o sync.o utimes.o \ - stack.o + stack.o revoke.o revoked_inode.o ifeq ($(CONFIG_BLOCK),y) obj-y += buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o Index: uml-2.6/include/linux/syscalls.h === --- uml-2.6.orig/include/linux/syscalls.h 2007-03-11 13:07:57.0 +0200 +++ uml-2.6/include/linux/syscalls.h2007-03-11 13:09:20.0 +0200 @@ -605,4 +605,7 @@ asmlinkage long sys_getcpu(unsigned __us int kernel_execve(const char *filename, char *const argv[], char *const envp[]); +asmlinkage int sys_revokeat(int dfd, const char __user *filename); +asmlinkage int sys_frevoke(unsigned int fd); + #endif Index: uml-2.6/include/linux/fs.h === --- uml-2.6.orig/include/linux/fs.h 2007-03-11 13:07:57.0 +0200 +++ uml-2.6/include/linux/fs.h 2007-03-11 13:09:20.0 +0200 @@ -1100,6 +1100,7 @@ struct file_operations { int (*flock) (struct file *, int, struct file_lock *); ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); + int (*revoke)(struct file *); }; struct inode_operations { @@ -1739,6 +1740,9 @@ extern ssize_t generic_splice_sendpage(s extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, size_t len, unsigned int flags); +/* fs/revoke.c */ +extern int generic_file_revoke(struct file *); + extern void file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping); extern loff_t no_llseek(struct file *file, loff_t offset, int origin); Index: uml-2.6/fs/revoke.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ uml-2.6/fs/revoke.c 2007-03-11 13:14:42.0 +0200 @@ -0,0 +1,588 @@ +/* + * fs/revoke.c - Invalidate all current open file descriptors of an inode. + * + * Copyright (C) 2006-2007 Pekka Enberg + * + * This file is released under the GPLv2. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * This is used for pre-allocating an array of file pointers so that we don't + * have to do memory allocation under tasklist_lock. + */ +struct revoke_table { + struct file **files; + unsigned long size; + unsigned long end; + unsigned long restore_start; +}; + +struct kmem_cache *revokefs_inode_cache; + +/* + * Revoked file descriptors point to inodes in the revokefs filesystem. + */ +static struct vfsmount *revokefs_mnt; + +static struct file *get_revoked_file(void) +{ + struct dentry *dentry; + struct inode *inode; + struct file *filp; + struct qstr name; + + filp = get_empty_filp(); + if (!filp) + goto err; + + inode = new_inode(revokefs_mnt->mnt_sb); + if (!inode) + goto err_inode; + + name.name = "revoked_file"; + name.len = strlen(name.name); + dentry = d_alloc(revokefs_mnt->mnt_sb->s_root, ); + if (!dentry) + goto err_dentry; + +
[PATCH 3/5] revoke: support for ext2 and ext3
From: Pekka Enberg <[EMAIL PROTECTED]>

Add revoke support to ext2 and ext3 by wiring f_ops->revoke with
generic_file_revoke.

Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 fs/ext2/file.c |    1 +
 fs/ext3/file.c |    1 +
 2 files changed, 2 insertions(+)

Index: uml-2.6/fs/ext2/file.c
===
--- uml-2.6.orig/fs/ext2/file.c 2007-03-11 13:05:33.0 +0200
+++ uml-2.6/fs/ext2/file.c 2007-03-11 13:09:21.0 +0200
@@ -56,6 +56,7 @@ const struct file_operations ext2_file_o
         .sendfile       = generic_file_sendfile,
         .splice_read    = generic_file_splice_read,
         .splice_write   = generic_file_splice_write,
+        .revoke         = generic_file_revoke,
 };
 
 #ifdef CONFIG_EXT2_FS_XIP
Index: uml-2.6/fs/ext3/file.c
===
--- uml-2.6.orig/fs/ext3/file.c 2007-03-11 13:05:33.0 +0200
+++ uml-2.6/fs/ext3/file.c 2007-03-11 13:09:21.0 +0200
@@ -123,6 +123,7 @@ const struct file_operations ext3_file_o
         .sendfile       = generic_file_sendfile,
         .splice_read    = generic_file_splice_read,
         .splice_write   = generic_file_splice_write,
+        .revoke         = generic_file_revoke,
 };
 
 const struct inode_operations ext3_file_inode_operations = {
[PATCH 4/5] revoke: add documentation
From: Pekka Enberg <[EMAIL PROTECTED]>

This documents revoke file operation in Documentation/filesystems/vfs.txt.

Acked-by: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 Documentation/filesystems/vfs.txt |    5 +
 1 file changed, 5 insertions(+)

Index: uml-2.6/Documentation/filesystems/vfs.txt
===
--- uml-2.6.orig/Documentation/filesystems/vfs.txt 2007-03-11 13:05:33.0 +0200
+++ uml-2.6/Documentation/filesystems/vfs.txt 2007-03-11 13:09:22.0 +0200
@@ -732,6 +732,7 @@ struct file_operations {
                         int);
         ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned int);
+        int (*revoke)(struct file *);
 };
 
 Again, all methods are called without any locks being held, unless
@@ -805,6 +806,10 @@ otherwise noted.
   splice_read: called by the VFS to splice data from file to a pipe. This
                method is used by the splice(2) system call
 
+  revoke: called by revokeat(2) and frevoke(2) system calls to revoke access
+        to an open file. This method must ensure that all currently blocked
+        writes are flushed and reads will fail.
+
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides. When opening a device node
 (character or block special) most filesystems will call special
[PATCH 5/5] revoke: wire up i386 system calls
From: Pekka Enberg <[EMAIL PROTECTED]>

Make revokeat and frevoke system calls available to user-space on i386.

Acked-by: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 arch/i386/kernel/syscall_table.S |    3 +++
 include/asm-i386/unistd.h        |    4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

Index: uml-2.6/arch/i386/kernel/syscall_table.S
===
--- uml-2.6.orig/arch/i386/kernel/syscall_table.S 2007-03-11 13:05:32.0 +0200
+++ uml-2.6/arch/i386/kernel/syscall_table.S 2007-03-11 13:09:23.0 +0200
@@ -319,3 +319,6 @@ .long sys_unshare /* 310 */
         .long sys_move_pages
         .long sys_getcpu
         .long sys_epoll_pwait
+        .long sys_revokeat              /* 320 */
+        .long sys_frevoke
+
Index: uml-2.6/include/asm-i386/unistd.h
===
--- uml-2.6.orig/include/asm-i386/unistd.h 2007-03-11 13:05:33.0 +0200
+++ uml-2.6/include/asm-i386/unistd.h 2007-03-11 13:09:23.0 +0200
@@ -325,10 +325,12 @@ #define __NR_unshare 310
 #define __NR_move_pages         317
 #define __NR_getcpu             318
 #define __NR_epoll_pwait        319
+#define __NR_revokeat           320
+#define __NR_frevoke            321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Re: SATA resume slowness, e1000 MSI warning
> Quoting Eric W. Biederman <[EMAIL PROTECTED]>:
> Subject: Re: SATA resume slowness, e1000 MSI warning
>
> "Michael S. Tsirkin" <[EMAIL PROTECTED]> writes:
>
> >> The only case I can see which might trigger this is if we saved
> >> pci-X state and then didn't restore it because we could not find
> >> the capability on restore.
> >
> > Hmm. pci_save_pcix_state/pci_restore_pcix_state seem to only handle
> > regular devices and seem to ignore the fact that for bridge PCI-X
> > capability has a different structure.
> >
> > Is this intentional?
>
> Probably not as such. I don't think we have any drivers for bridge
> devices so I don't think it matters. It likely wouldn't hurt to fix
> it just in case though.
>
> Do any of the mellanox cards do anything with the bridge on the card?

Yes but they do their own thing wrt saving/restoring registers.
Look at drivers/infiniband/hw/mthca/mthca_reset.c

> > If not, here's a patch to fix this. Warning: completely untested.
>
> If you fix the offsets and diff this against my last fix (to never
> free the buffer) I think your patch makes sense.

Let's agree what the correct offsets are.

> > PCI: restore bridge PCI-X capability registers after PM event
> >
> > Restore PCI-X bridge up/downstream capability registers
> > after PM event. This includes maximum split transaction
> > commitment limit which might be vital for PCI X.
> >
> > Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index df49530..4b788ef 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -597,14 +597,19 @@ static int pci_save_pcix_state(struct pci_dev *dev)
> >          if (pos <= 0)
> >                  return 0;
> >
> > -        save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL);
> > +        save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 2, GFP_KERNEL);
> >          if (!save_state) {
> > -                dev_err(&dev->dev, "Out of memory in pci_save_pcie_state\n");
> > +                dev_err(&dev->dev, "Out of memory in pci_save_pcix_state\n");
> >                  return -ENOMEM;
> >          }
> >          cap = (u16 *)&save_state->data[0];
> >
> > -        pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
> > +        if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>
> This appears to be the proper test.
>
> > +                pci_read_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, &cap[i++]);
> > +                pci_read_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, &cap[i++]);
> > +        } else
> > +                pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
> > +
> >          pci_add_saved_cap(dev, save_state);
> >          return 0;
> >  }
> > @@ -621,7 +626,11 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
> >                  return;
> >          cap = (u16 *)&save_state->data[0];
> >
> > -        pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> > +        if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> > +                pci_write_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, cap[i++]);
> > +                pci_write_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, cap[i++]);
>
> These look like the proper two registers to save.
>
> > +        } else
> > +                pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> >          pci_remove_saved_cap(save_state);
> >          kfree(save_state);
> >  }
> > diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
> > index f09cce2..fb7eefd 100644
> > --- a/include/linux/pci_regs.h
> > +++ b/include/linux/pci_regs.h
> > @@ -332,6 +332,8 @@
> >  #define PCI_X_STATUS_SPL_ERR 0x2000 /* Rcvd Split Completion Error Msg */
> >  #define PCI_X_STATUS_266MHZ 0x4000 /* 266 MHz capable */
> >  #define PCI_X_STATUS_533MHZ 0x8000 /* 533 MHz capable */
> > +#define PCI_X_BRIDGE_UP_SPL_CTL 10 /* PCI-X upstream split transaction limit */
> > +#define PCI_X_BRIDGE_DN_SPL_CTL 14 /* PCI-X downstream split transaction limit */
>
> Unless I am completely misreading the spec. While you have picked the
> right register to save the offsets should be 0x08 and 0x0c or 8 and 12

No, the spec is written in terms of dwords (32 bit), we are storing
words (16 bits). The data at offsets 8 and 12 is read-only split
transaction capacity. Split transaction limit starts at bit 16 so you
need to add 2 to byte offset. Right?

--
MST
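The offset arithmetic in the reply above can be sketched as follows. This is a minimal model, assuming (as the message states) a little-endian 32-bit register whose low 16 bits hold the read-only capacity and whose bits 31:16 hold the limit. The names here are hypothetical helpers working on a plain byte array; they are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets of the two 32-bit split-transaction registers relative to the
 * capability, as given in the thread (8 and 12). Names are hypothetical. */
enum { UP_SPL_DWORD = 8, DN_SPL_DWORD = 12 };

/* Byte offset of the 16-bit word that holds bits 31:16 of a little-endian
 * dword located at dword_offset. */
static unsigned int upper_word_offset(unsigned int dword_offset)
{
	assert(dword_offset % 4 == 0);
	return dword_offset + 2;
}

/* Store a 32-bit value little-endian into a byte image of config space. */
static void cfg_write_dword(uint8_t *cfg, unsigned int off, uint32_t val)
{
	cfg[off]     = (uint8_t)(val & 0xff);
	cfg[off + 1] = (uint8_t)((val >> 8) & 0xff);
	cfg[off + 2] = (uint8_t)((val >> 16) & 0xff);
	cfg[off + 3] = (uint8_t)((val >> 24) & 0xff);
}

/* Read a little-endian 16-bit word from the same byte image. */
static uint16_t cfg_read_word(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}
```

Under these assumptions, a word read at offset 8 returns the capacity half of the upstream register, while a word read at offset 10 returns the limit half, which matches the values 10 and 14 used in the patch.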
Re: [PATCH] CIRRUS: Delete unused header file.
On Sat, 10 Mar 2007, Andrew Morton wrote:

> > On Sat, 10 Mar 2007 17:27:44 -0500 (EST) "Robert P. J. Day" <[EMAIL PROTECTED]> wrote:
> >
> > Delete apparently unused header file
> > sound/pci/cs46xx/imgs/cwcemb80.h.
> >
>
> That patch series was rather a mess
>
> - Multiple patches with the same Subject: (I might have lost some as a result)

yes, that was a bad decision on my part, sorry.

> - Several patches which tried to remove the same header file

*that* shouldn't have happened, those patches were designed to be
independent of one another and, AFAIK, i submitted them only once. i
have no idea how the above might have happened.

> - Several patches which simply didn't apply

hm ... they were created against the latest git tree, i don't know why
they wouldn't apply.

...

> - Useless indenting in changelog text which I have to edit away.

ah, i'll remember to not indent the changelog text next time, sorry.

rday
--
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
Hi Con,

On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote:
> What follows this email is a patch series for the latest version of the RSDL
> cpu scheduler (ie v0.29). I have addressed all bugs that I am able to
> reproduce in this version so if some people would be kind enough to test if
> there are any hidden bugs or oops lurking, it would be nice to know in
> anticipation of putting this back in -mm. Thanks.
>
> Full patch for 2.6.21-rc3-mm2:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch

I'm seeing a cpu distribution problem running this on my P4 box.

Scenario:
listening to music collection (mp3) via Amarok. Enable Amarok
visualization gforce, and size such that X and gforce each use ~50% cpu.
Start rip/encode of new CD with grip/lame encoder. Lame is set to use
both cpus, at nice 5. Once the encoders start, they receive
considerably more cpu than nice 0 X/Gforce, taking ~120% and leaving the
remaining 80% for X/Gforce and Amarok (when it updates its ~12k entry
database) to squabble over.

With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and
the encoders (100% cpu bound) get what's left when Amarok isn't eating it.

I plunked the above patch into plain 2.6.21-rc3 and retested to
eliminate other mm tree differences, and it's repeatable. The nice 5
cpu hogs always receive considerably more than the nice 0 sleepers.

	-Mike
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Sunday 11 March 2007 22:39, Mike Galbraith wrote: > Hi Con, > > On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote: > > What follows this email is a patch series for the latest version of the > > RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am able > > to reproduce in this version so if some people would be kind enough to > > test if there are any hidden bugs or oops lurking, it would be nice to > > know in anticipation of putting this back in -mm. Thanks. > > > > Full patch for 2.6.21-rc3-mm2: > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29 > >.patch > > I'm seeing a cpu distribution problem running this on my P4 box. > > Scenario: > listening to music collection (mp3) via Amarok. Enable Amarok > visualization gforce, and size such that X and gforce each use ~50% cpu. > Start rip/encode of new CD with grip/lame encoder. Lame is set to use > both cpus, at nice 5. Once the encoders start, they receive > considerable more cpu than nice 0 X/Gforce, taking ~120% and leaving the > remaining 80% for X/Gforce and Amarok (when it updates it's ~12k entry > database) to squabble over. > > With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and > the encoders (100%cpu bound) get whats left when Amarok isn't eating it. > > I plunked the above patch into plain 2.6.21-rc3 and retested to > eliminate other mm tree differences, and it's repeatable. The nice 5 > cpu hogs always receive considerably more that the nice 0 sleepers. Thanks for the report. I'm assuming you're describing a single hyperthread P4 here in SMP mode so 2 logical cores. Can you elaborate on whether there is any difference as to which cpu things are bound to as well? Can you also see what happens with lame not niced to +5 (ie at 0) and with lame at nice +19. Thanks. 
-- -ck
Re: [patch 2/9] signalfd/timerfd - signalfd core ...
On 03/10, Davide Libenzi wrote: > > +static void signalfd_put_sighand(struct signalfd_ctx *ctx, > + struct sighand_struct *sighand, > + unsigned long *flags) > +{ > + unlock_task_sighand(ctx->tsk, flags); > +} Note that signalfd_put_sighand() doesn't need "sighand" parameter, please see below. > +int signalfd_deliver(struct sighand_struct *sighand, int sig, > + struct siginfo *info) > +{ > + int nsig = 0; > + struct signalfd_ctx *ctx, *tmp; > + > + list_for_each_entry_safe(ctx, tmp, >sfdlist, lnk) { > + /* > + * We use a negative signal value as a way to broadcast that the > + * sighand has been orphaned, so that we can notify all the > + * listeners about this. Remeber the ctx->sigmask is inverted, > + * so if the user is interested in a signal, that corresponding > + * bit will be zero. > + */ > + if (sig < 0) > + list_del_init(>lnk); I'm afraid this is not right. This should be per-thread. Suppose we have threads T1 and T2 from the same thread group. sighand->sfdlist contains ctx1 and ctx2 "linked" to T1 and T2. Now, T1 exits, __exit_signal() does signalfd_notify(sighand, -1), and "unlinks" all threads, not just T1. IOW, we should do if (ctx->tsk == current) { list_del_init(>lnk); wake_up(>wqh); } Perhaps it makes sense to not re-use signalfd_deliver(), but introduce a new signalfd_xxx(sighand, tsk) helper for de_thread/exit_signal. Btw, signalfd_deliver() doesn't use "info" parameter. > + if (sig < 0 || !sigismember(>sigmask, sig)) { > + wake_up(>wqh); Minor nit. Perhaps it makes sense to do void signalfd_deliver(struct task_struct *tsk, int sig, struct sigpending *pending) { struct sighand_struct *sighand = tsk->sighand; int private = (tsk->pending == pending); list_for_each_entry_safe(ctx, tmp, >sfdlist, lnk) { if (private && ctx->tsk != tsk) continue; if (!sigismember(>sigmask, sig)) wake_up(>wqh); } } Even better: signalfd_deliver(struct task_struct *tsk, int sig, int private). This way specific_send_sig_info/send_sigqueue won't do a "false" wakeup. 
> +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t > sizemask) > +{ > ... > + if ((sighand = signalfd_get_sighand(ctx, )) != NULL) { > + ctx->sigmask = sigmask; > + signalfd_put_sighand(ctx, sighand, ); > + } This looks like unneeded complication to me, I'd suggest if (signalfd_get_sighand(ctx, )) { ctx->sigmask = sigmask; signalfd_put_sighand(ctx, flags); } unlock_task_sighand() (and thus signalfd_put_sighand) doesn't need "sighand" parameter. signalfd_get_sighand() is in fact boolean. It makes sense to return sighand, it may be useful, but this patch only needs != NULL. Every usage of signalfd_get_sighand() could be simplified accordingly. > --- linux-2.6.20.ep2.orig/fs/exec.c 2007-03-10 15:57:00.0 -0800 > +++ linux-2.6.20.ep2/fs/exec.c2007-03-10 15:57:51.0 -0800 > @@ -50,6 +50,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -583,6 +584,17 @@ > int count; > > /* > + * Tell all the sighand listeners that this sighand has > + * been detached. Needs to be called with the sighand lock > + * held. > + */ > + if (unlikely(!list_empty(>sfdlist))) { > + spin_lock_irq(>siglock); > + signalfd_notify(oldsighand, -1, NULL); > + spin_unlock_irq(>siglock); > + } Very minor nit. I'd suggest to make a new helper and put it in signalfd.h (like signalfd_notify()). This will help CONFIG_SIGNALFD. I still think that we should do this only for suid-exec. If application passes a signalfd to another process with unix socket, it should know what it does. But yes, I agree, we can change this later if needed. (in that case the caller of the above helper should be flush_old_exec). Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
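Oleg's point about per-thread unlinking can be illustrated outside the kernel. What follows is a minimal userspace sketch, not code from the patch: `struct ctx`, `unlink_owner()` and the integer `owner` field are hypothetical stand-ins for `struct signalfd_ctx`, the suggested new helper, and `ctx->tsk`. It walks a list and detaches only the entries owned by the exiting thread, so the other thread's context stays linked.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct signalfd_ctx. */
struct ctx {
	int owner;		/* stands in for ctx->tsk */
	int woken;		/* stands in for waking the ctx wait queue */
	struct ctx *next;
};

/*
 * Detach every context owned by 'owner' from the list at *head and
 * mark it woken. Contexts belonging to other owners stay linked.
 * Returns the number of contexts detached.
 */
static int unlink_owner(struct ctx **head, int owner)
{
	struct ctx **pp = head;
	int n = 0;

	while (*pp) {
		struct ctx *c = *pp;

		if (c->owner == owner) {
			*pp = c->next;	/* unlink without losing our place */
			c->next = NULL;
			c->woken = 1;
			n++;
		} else {
			pp = &c->next;
		}
	}
	return n;
}
```

With contexts for T1 and T2 on the same list, `unlink_owner(&head, 1)` removes only T1's context, which is the behaviour the review asks for when T1 exits.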
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Sun, 2007-03-11 at 22:48 +1100, Con Kolivas wrote: > > Thanks for the report. I'm assuming you're describing a single hyperthread P4 > here in SMP mode so 2 logical cores. Can you elaborate on whether there is > any difference as to which cpu things are bound to as well? Can you also see > what happens with lame not niced to +5 (ie at 0) and with lame at nice +19. Yes, one P4/HT/SMP. No change at nice 0, but setting the encoders to nice 19 did put X/gforce ~back where they were with 2.6.21-rc3. Tasks don't seem to be bound to any particular cpu, relies on load balancing (which appears to be working). -Mike
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
* Mike Galbraith <[EMAIL PROTECTED]> wrote: > > Full patch for 2.6.21-rc3-mm2: > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch > > I'm seeing a cpu distribution problem running this on my P4 box. > With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and > the encoders (100%cpu bound) get whats left when Amarok isn't eating > it. > > I plunked the above patch into plain 2.6.21-rc3 and retested to > eliminate other mm tree differences, and it's repeatable. The nice 5 > cpu hogs always receive considerably more that the nice 0 sleepers. hm. Do you get the same problem on UP too? (i.e. let's eliminate any SMP/HT artifacts) Ingo
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Sun, 2007-03-11 at 13:10 +0100, Ingo Molnar wrote: > * Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > > Full patch for 2.6.21-rc3-mm2: > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch > > > > I'm seeing a cpu distribution problem running this on my P4 box. > > > With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and > > the encoders (100%cpu bound) get whats left when Amarok isn't eating > > it. > > > > I plunked the above patch into plain 2.6.21-rc3 and retested to > > eliminate other mm tree differences, and it's repeatable. The nice 5 > > cpu hogs always receive considerably more that the nice 0 sleepers. > > hm. Do you get the same same problem on UP too? (i.e. lets eliminate any > SMP/HT artifacts) I'll boot up nosmp and report back (but now it's time to take Opa to the Gasthaus for his Sunday afternoon brewskies;) -Mike
RE: [git patches] libata fixes
Hello,

> It seems like IRQ is not getting through. The first IRQ
> driven command is failing for you. H
> Extract is :
> ata7: PATA max UDMA/100 cmd 0x00019c00 ctl 0x00019882 bmdma
> 0x00019400 irq 16
> ata8: PATA max UDMA/100 cmd 0x00019800 ctl 0x00019482 bmdma
> 0x00019408 irq 16

IRQ 16 is IO-APIC-fasteoi for libata, and is not shared... but all the other libata IRQs are IO-APIC-edge.

> * Does giving 'acpi=off' or 'irqpoll' make any difference?
>
> * Can you connect a harddisk to the channel and see whether
> that works?

Tried that. The disk is identified as ATA-7: Maxtor 6Y080L0, YAR41BW0, max UDMA/13 and then it times out again...

Tried then with acpi=off: same result (identify is OK, but then timeout). With irqpoll it was OK.

Let's then go back to my DVD-RW and test irqpoll... and... Yes, got it! It is identified, it can be mounted, and read as /dev/sr1!

/proc/interrupts shows a count of 0 for IRQ 16, so yes, it goes somewhere else... Diffing copies of /proc/interrupts while accessing the DVD gives two possibilities: IRQ14 or IRQ18, but both are also counting when not accessing the DVD...

Question: does running with irqpoll affect performance?

Paul
Re: libata extension
> I believe you should be able to do this by sending ATA pass-through SCSI > commands into the device using SG_IO, without any kernel changes. It's > really the mechanism that's meant for this.. It should work, but Mark Lord reported some problems with READ_LONG on PIIX/ICH intel chipsets. I don't know if he ever resolved them but if not I have a patch that ought to. Alan
[PATCH] driver core: fix device_add error path
Dmitriy Monakhov <[EMAIL PROTECTED]> writes: > Greg Kroah-Hartman <[EMAIL PROTECTED]> writes: > >> From: James Simmons <[EMAIL PROTECTED]> >> >> When a device fails to register the class symlinks where not cleaned up. >> This left a symlink in the /sys/class/"device"/ directory that pointed >> to no where. This caused the sysfs_follow_link Oops I reported earlier. >> This patch cleanups up the symlink. Please apply. Thank you. >> >> Signed-Off: James Simmons <[EMAIL PROTECTED]> >> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> >> --- >> drivers/base/core.c | 31 ++- >> 1 files changed, 30 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/base/core.c b/drivers/base/core.c >> index d04fd33..cf2a398 100644 >> --- a/drivers/base/core.c >> +++ b/drivers/base/core.c >> @@ -637,12 +637,41 @@ int device_add(struct device *dev) >> BUS_NOTIFY_DEL_DEVICE, dev); >> device_remove_groups(dev); >> GroupError: >> -device_remove_attrs(dev); >> +device_remove_attrs(dev); >> AttrsError: >> if (dev->devt_attr) { >> device_remove_file(dev, dev->devt_attr); >> kfree(dev->devt_attr); >> } >> + >> +if (dev->class) { >> +sysfs_remove_link(>kobj, "subsystem"); >> +/* If this is not a "fake" compatible device, remove the >> + * symlink from the class to the device. 
*/ >> +if (dev->kobj.parent != >class->subsys.kset.kobj) >> +sysfs_remove_link(>class->subsys.kset.kobj, >> + dev->bus_id); >> +#ifdef CONFIG_SYSFS_DEPRECATED >> +if (parent) { >> +char *class_name = make_class_name(dev->class->name, >> + >kobj); >> +if (class_name) >> +sysfs_remove_link(>parent->kobj, >> + class_name); >> +kfree(class_name); >> +sysfs_remove_link(>kobj, "device"); >> +} >> +#endif >> + > < block begin >> +down(>class->sem); >> +/* notify any interfaces that the device is now gone */ >> +list_for_each_entry(class_intf, >class->interfaces, node) >> +if (class_intf->remove_dev) >> +class_intf->remove_dev(dev, class_intf); >> +/* remove the device from the class list */ >> +list_del_init(>node); >> +up(>class->sem); > << block end > May be i've missed something, but i'm confuesd a litle bit. > For example if error happens while device_pm_add() we jump to label "PMError" > and code from block above will be executed (device will be remove from list), > but this device wasn't added to this list yet! I've check it one more time, code it really broken!, and i think i understand how this can happen it look like full code chunck was copy-pasted from device_del(), but in case of device_add() error path, device was't added to dev->class->devices list yet. Folowing patch fix this copy-paste error: [PATCH] driver core: fix device_add error path - At the moment we jump here device was't added to dev->class->devices list yet. 
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/base/core.c |    9 ---------
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 142c222..7d2459b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -684,15 +684,6 @@ int device_add(struct device *dev)
 #endif
 			sysfs_remove_link(&dev->kobj, "device");
 		}
-
-		down(&dev->class->sem);
-		/* notify any interfaces that the device is now gone */
-		list_for_each_entry(class_intf, &dev->class->interfaces, node)
-			if (class_intf->remove_dev)
-				class_intf->remove_dev(dev, class_intf);
-		/* remove the device from the class list */
-		list_del_init(&dev->node);
-		up(&dev->class->sem);
 	}
  ueventattrError:
 	device_remove_file(dev, &dev->uevent_attr);
-- 
1.5.0.1
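The bug removed by this patch is a general hazard of goto-based error paths: each label may undo only what had already succeeded when the jump was taken. Here is a minimal userspace sketch of that rule. The names `struct state`, `setup()` and the steps a/b/c are purely illustrative and are not driver-core functions.

```c
#include <stdbool.h>

/* Records which steps are currently "done"; all names are illustrative. */
struct state {
	bool a, b, c;
};

/*
 * Perform steps a, b, c in order. 'fail_at' (1..3) forces that step to
 * fail; 0 means every step succeeds. On failure, each label undoes only
 * the steps that completed before the jump, in reverse order.
 */
static int setup(struct state *s, int fail_at)
{
	if (fail_at == 1)
		goto err_a;
	s->a = true;

	if (fail_at == 2)
		goto err_b;
	s->b = true;

	if (fail_at == 3)
		goto err_c;
	s->c = true;
	return 0;

err_c:
	s->b = false;	/* undo b; c never happened, so it is not undone */
err_b:
	s->a = false;	/* undo a */
err_a:
	return -1;
}
```

In these terms, deleting the device from the class list inside the device_add() error path amounted to undoing step c from a label that is reached before c has run.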
Re: 2.6.21-rc3-mm1 RSDL results
|> See:
|> http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37=markup

OK.

Mesa is in git, now, but that still applies. The gitweb url is:

http://gitweb.freedesktop.org/?p=mesa/mesa.git

and for the version of the above file in the master branch:

http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=blob;f=src/mesa/drivers/dri/r200/r200_ioctl.c

The recursive grep(1) on mesa shows:

,[grep -r sched_yield mesa]
| mesa/mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchpool.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchbuffer.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include/* for sched_yield() */
| mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include/* for sched_yield() */
| mesa/mesa/src/mesa/drivers/dri/common/vblank.h: sched_yield(); \
| mesa/mesa/src/mesa/drivers/dri/unichrome/via_ioctl.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/i915/intel_ioctl.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/r200/r200_ioctl.c: sched_yield();
`

Thanks for the heads up. I must've grep(1)ed the xorg subdir rather than the parent dir, and so missed mesa.

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6
Re: 2.6.21-rc3-mm1 RSDL results
On Sunday 11 March 2007 23:38, James Cloos wrote: > |> See: > |> http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_i > |>octl.c?revision=1.37=markup > > OK. > > Mesa is in git, now, but that still applies. The gitweb url is: > > http://gitweb.freedesktop.org/?p=mesa/mesa.git > > and for the version of the above file in the master branch: > > http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=blob;f=src/mesa/drivers/dr >i/r200/r200_ioctl.c > > The recursive grep(1) on mesa shows: > > ,[grep -r sched_yield mesa] > > | mesa/mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c: sched_yield(); > | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchpool.c: > | sched_yield(); > | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchbuffer.c: > | sched_yield(); mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include > |/* for sched_yield() */ > | mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include/* > | for sched_yield() */ mesa/mesa/src/mesa/drivers/dri/common/vblank.h: > | sched_yield(); \ > | mesa/mesa/src/mesa/drivers/dri/unichrome/via_ioctl.c: sched_yield(); > | mesa/mesa/src/mesa/drivers/dri/i915/intel_ioctl.c: sched_yield(); > | mesa/mesa/src/mesa/drivers/dri/r200/r200_ioctl.c: sched_yield(); > > ` > > Thanks for the heads up. I must've grep(1)ed the xorg subdir rather > than the parent dir, and so missed mesa. I just wonder what the heck all these will do to testing when using any of these drivers. Whether or not we do no yield, mild yield or full blown expiration yield, somehow or other I can't get over the feeling that if the code relies on yield() we can't really trust them to be meaningful cpu scheduler tests. This means most 3d apps out there that aren't using binary drivers, whether they be (fscking) glxgears, audio app visualisations or what... 
-- -ck
Locking interrupt handler in L1 cache
Hi,

I have an MPC 8548 Linux 2.6.x based firewall which spends about 80% of its time doing packet processing, so most of the time it will RX and TX packets through the gianfar ethernet driver. I want to lock the interrupt handler of this driver in the L1 cache.

1. Is there any kernel API for locking functions and data in the L1/L2 cache?
2. How can I use "icbtls" (Instruction Cache Block Touch and Lock Set) for locking my interrupt handler?
3. Is "icbtls" the correct instruction for what I am looking for?
4. How do I find the end address of the interrupt handler function, and how do we pass it to the cache locking instructions? (The interrupt handler may be larger than a cache line, may not be aligned, etc.)
5. Can we enhance the request_irq() function to take an additional parameter to lock the interrupt handler in the cache?

I understand that if my interrupt handler is going to be called most of the time then it is very unlikely that it will be flushed from the cache, but there is no guarantee of that.

Regards, Parav Pandit
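Question 4 is partly plain address arithmetic, independent of which locking instruction ends up being used. The following is a small sketch of that arithmetic only. It issues no cache instructions, and `lines_spanned()` is a hypothetical helper name, not a kernel API. It assumes the start address and size of the handler are already known (for example from the symbol table) and that the cache line size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of cache lines touched by the byte range [start, start + size).
 * 'line' is the cache line size in bytes and must be a power of two.
 * A size of 0 touches no lines.
 */
static size_t lines_spanned(uintptr_t start, size_t size, size_t line)
{
	uintptr_t mask = ~(uintptr_t)(line - 1);
	uintptr_t first, last;

	if (size == 0)
		return 0;
	first = start & mask;			/* line holding the first byte */
	last = (start + size - 1) & mask;	/* line holding the last byte */
	return (size_t)((last - first) / line) + 1;
}
```

A handler that is not line-aligned can span one more line than its size alone suggests: with a 32-byte line (an assumed value, to be checked against the core's reference manual), 32 bytes starting at offset 0x10 within a line touch two lines. A locking loop would then step from the aligned-down start address, one line at a time, for that many lines.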
Re: libata extension
Hi, On Sunday 11 March 2007, Vitaliyi wrote: > Good Day > > Say i want to implement extended set of ATA commands available to > userspace for building diagnostic tools. > I need 0x40 -- read verify and 0x32 -- write long with error handling, Mark Lord is working on READ/WRITE_LONG support for libata, he has posted draft patch recently on linux-ide mailing list. [ Please consider reading/joining linux-ide@vger.kernel.org ML, it is where Linux ATA discussion happens... ] > for example. I was trying ide driver through ioctl's, but seems it > lack of functionality and full of gotchas. Furthermore it oopses > sometimes. READ/WRITE_LONG is unsupported and as you've already noticed TASKFILE ioctls are full of gotchas... > Is it possible to use libata for such purpose or i need to write > separate IDE driver ? It should be possible using ATA pass-through, some libata changes may be required but it is the right way to go IMO. Bart
Re: [PATCH] lpfc: avoid double-free during PCI error failure
ACK... Looks good... -- james s Linas Vepstas wrote: Bino, James, Please review, sign-off and forward upstream. --linas If a PCI error is detected that cannot be recovered from, there will be a double call of lpfc_pci_remove_one(), with the second call resulting in a null-pointer dereference. The first call occurs in lpfc_io_error_detected(), and the second call during pci device remove. This patch eliminates the first call; its un-needed. Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> drivers/scsi/lpfc/lpfc_init.c |5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Index: linux-2.6.20-git16/drivers/scsi/lpfc/lpfc_init.c === --- linux-2.6.20-git16.orig/drivers/scsi/lpfc/lpfc_init.c 2007-03-08 15:57:40.0 -0600 +++ linux-2.6.20-git16/drivers/scsi/lpfc/lpfc_init.c2007-03-08 16:03:18.0 -0600 @@ -1817,10 +1817,9 @@ static pci_ers_result_t lpfc_io_error_de struct lpfc_sli *psli = >sli; struct lpfc_sli_ring *pring; - if (state == pci_channel_io_perm_failure) { - lpfc_pci_remove_one(pdev); + if (state == pci_channel_io_perm_failure) return PCI_ERS_RESULT_DISCONNECT; - } + pci_disable_device(pdev); /* * There may be I/Os dropped by the firmware. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Style Question
Hi, list! I have a question about coding style in the linux kernel. In Documentation/CodingStyle, it is said that "Linux style for comments is the C89 "/* ... */" style. Don't use C99-style "// ..." comments." _But_ I see a lot of '//' style comments in current kernel code. Which is wrong? The documentation or the code, or neither? And why? Another question is about NULL. AFAIK, in user space, using NULL is better than directly using 0 in C. In the kernel, I know it uses its own NULL, which may be defined as ((void*)0), but it's _still_ different from raw zero. So can I say using NULL is better than 0 in the kernel? Any reply is welcome. Thanks and have a nice day!
Re: Style Question
On Sun, 2007-03-11 at 22:15 +0800, Cong WANG wrote:
[...]
> Another question is about NULL. AFAIK, in user space, using NULL is
> better than directly using 0 in C. In kernel, I know it used its own
> NULL, which may be defined as ((void*)0),

Userspace usually has the same definition.

> but it's _still_ different
> from raw zero.

It is different in that "0" as such has the type "int". But this int is automatically converted to a null pointer when it is assigned to or compared with a pointer.

> So can I say using NULL is better than 0 in kernel?

Yes, because it is immediately clear that a pointer is (or should be) there (and not an int). And the same holds for userspace since this is a pure C question.

Bernd
-- 
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
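One place where the type of "0" matters in practice is a variadic call, where the compiler has no pointer type to convert to. The sketch below is plain userspace C written for illustration. `count_strings()` is a hypothetical function that counts its string arguments up to a null-pointer terminator.

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Count the string arguments up to a terminating null pointer.
 * The terminator must be passed as a pointer: in a variadic call a bare 0
 * is passed as an int, which need not have the size or representation of
 * a null pointer.
 */
static int count_strings(const char *first, ...)
{
	va_list ap;
	const char *s = first;
	int n = 0;

	va_start(ap, first);
	while (s != NULL) {
		n++;
		s = va_arg(ap, char *);	/* read the next argument as a pointer */
	}
	va_end(ap);
	return n;
}
```

A call such as `count_strings("a", "b", (char *)NULL)` is well defined, whereas ending the list with a plain `0` passes an int where the function reads a pointer. This is the same reason execl(3) callers are told to cast the final argument.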
Re: [RFC][PATCH 2/7] RSS controller core
On Sun, Mar 11, 2007 at 12:08:16PM +0300, Pavel Emelianov wrote: > Herbert Poetzl wrote: >> On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote: >>> On Tue, 06 Mar 2007 17:55:29 +0300 >>> Pavel Emelianov <[EMAIL PROTECTED]> wrote: >>> +struct rss_container { + struct res_counter res; + struct list_head page_list; + struct container_subsys_state css; +}; + +struct page_container { + struct page *page; + struct rss_container *cnt; + struct list_head list; +}; >>> ah. This looks good. I'll find a hunk of time to go through this >>> work and through Paul's patches. It'd be good to get both patchsets >>> lined up in -mm within a couple of weeks. But.. >> >> doesn't look so good for me, mainly becaus of the >> additional per page data and per page processing >> >> on 4GB memory, with 100 guests, 50% shared for each >> guest, this basically means ~1mio pages, 500k shared >> and 1500k x sizeof(page_container) entries, which >> roughly boils down to ~25MB of wasted memory ... >> >> increase the amount of shared pages and it starts >> getting worse, but maybe I'm missing something here > > You are. Each page has only one page_container associated > with it despite the number of containers it is shared > between. > >>> We need to decide whether we want to do per-container memory >>> limitation via these data structures, or whether we do it via >>> a physical scan of some software zone, possibly based on Mel's >>> patches. >> >> why not do simple page accounting (as done currently >> in Linux) and use that for the limits, without >> keeping the reference from container to page? > > As I've already answered in my previous letter simple > limiting w/o per-container reclamation and per-container > oom killer isn't a good memory management. It doesn't allow > to handle resource shortage gracefully. 
per container OOM killer does not require any container page reference, you know _what_ tasks belong to the container, and you know their _badness_ from the normal OOM calculations, so doing them for a container is really straight forward without having any page 'tagging' for the reclamation part, please elaborate how that will differ in a (shared memory) guest from what the kernel currently does ... TIA, Herbert > This patchset provides more grace way to handle this, but > full memory management includes accounting of VMA-length > as well (returning ENOMEM from system call) but we've decided > to start with RSS. > >> best, >> Herbert >> >>> ___ >>> Containers mailing list >>> [EMAIL PROTECTED] >>> https://lists.osdl.org/mailman/listinfo/containers >> - >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to [EMAIL PROTECTED] >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
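The memory estimate in this exchange depends mostly on how many `page_container` entries exist. A rough sketch of the arithmetic follows, under stated assumptions: 4 KiB pages, no padding in the structure, and a userspace mirror of the quoted layout (the `_sketch` types and `tracking_bytes()` are illustrative names, not part of the patch).

```c
#include <stdint.h>

/* Userspace mirror of the quoted layout: four pointer-sized fields. */
struct list_head_sketch {
	void *next, *prev;
};

struct page_container_sketch {
	void *page;			/* struct page *page */
	void *cnt;			/* struct rss_container *cnt */
	struct list_head_sketch list;	/* struct list_head list */
};

/* Bytes of tracking data if every page has exactly one tracking entry. */
static uint64_t tracking_bytes(uint64_t mem_bytes, uint64_t page_size,
			       uint64_t entry_size)
{
	return (mem_bytes / page_size) * entry_size;
}
```

With one entry per page, as Pavel describes, 4 GiB of 4 KiB pages gives 1,048,576 entries: 16 MiB at 16 bytes per entry (32-bit pointers) or 32 MiB at 32 bytes (64-bit pointers). Herbert's figure of 1500k entries at 16 bytes each comes to about 24 MB, which matches his "~25MB".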
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Sunday 11 March 2007, Mike Galbraith wrote: >Hi Con, > >On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote: >> What follows this email is a patch series for the latest version of >> the RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am >> able to reproduce in this version so if some people would be kind >> enough to test if there are any hidden bugs or oops lurking, it would >> be nice to know in anticipation of putting this back in -mm. Thanks. >> >> Full patch for 2.6.21-rc3-mm2: >> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0 >>.29.patch > >I'm seeing a cpu distribution problem running this on my P4 box. > >Scenario: >listening to music collection (mp3) via Amarok. Enable Amarok >visualization gforce, and size such that X and gforce each use ~50% cpu. >Start rip/encode of new CD with grip/lame encoder. Lame is set to use >both cpus, at nice 5. Once the encoders start, they receive >considerable more cpu than nice 0 X/Gforce, taking ~120% and leaving the >remaining 80% for X/Gforce and Amarok (when it updates it's ~12k entry >database) to squabble over. > >With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and >the encoders (100%cpu bound) get whats left when Amarok isn't eating it. > >I plunked the above patch into plain 2.6.21-rc3 and retested to >eliminate other mm tree differences, and it's repeatable. The nice 5 >cpu hogs always receive considerably more that the nice 0 sleepers. > > -Mike Just to comment, I've been running one of the patches between 20-ck1 and this latest one, which is building as I type, but I also run gkrellm here, version 2.2.9. Since I have been running this middle of this series patch, something is killing gkrellm about once a day, and there is nothing in the logs to indicate a problem. I see a blink out of the corner of my eye, and its gone. And it always starts right back up from a kmenu click. No idea if anyone else is experiencing this or not. 
-- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) You scratch my tape, and I'll scratch yours.
Re: [RFC][PATCH 2/7] RSS controller core
Herbert Poetzl wrote: > On Sun, Mar 11, 2007 at 12:08:16PM +0300, Pavel Emelianov wrote: >> Herbert Poetzl wrote: >>> On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote: On Tue, 06 Mar 2007 17:55:29 +0300 Pavel Emelianov <[EMAIL PROTECTED]> wrote: > +struct rss_container { > + struct res_counter res; > + struct list_head page_list; > + struct container_subsys_state css; > +}; > + > +struct page_container { > + struct page *page; > + struct rss_container *cnt; > + struct list_head list; > +}; ah. This looks good. I'll find a hunk of time to go through this work and through Paul's patches. It'd be good to get both patchsets lined up in -mm within a couple of weeks. But.. >>> doesn't look so good for me, mainly becaus of the >>> additional per page data and per page processing >>> >>> on 4GB memory, with 100 guests, 50% shared for each >>> guest, this basically means ~1mio pages, 500k shared >>> and 1500k x sizeof(page_container) entries, which >>> roughly boils down to ~25MB of wasted memory ... >>> >>> increase the amount of shared pages and it starts >>> getting worse, but maybe I'm missing something here >> You are. Each page has only one page_container associated >> with it despite the number of containers it is shared >> between. >> We need to decide whether we want to do per-container memory limitation via these data structures, or whether we do it via a physical scan of some software zone, possibly based on Mel's patches. >>> why not do simple page accounting (as done currently >>> in Linux) and use that for the limits, without >>> keeping the reference from container to page? >> As I've already answered in my previous letter simple >> limiting w/o per-container reclamation and per-container >> oom killer isn't a good memory management. It doesn't allow >> to handle resource shortage gracefully. 
> > per container OOM killer does not require any container > page reference, you know _what_ tasks belong to the > container, and you know their _badness_ from the normal > OOM calculations, so doing them for a container is really > straight forward without having any page 'tagging' That's true. If you look at the patches you'll find out that no code in oom killer uses page 'tag'. > for the reclamation part, please elaborate how that will > differ in a (shared memory) guest from what the kernel > currently does ... This is all described in the code and in the discussions we had before. > TIA, > Herbert > >> This patchset provides more grace way to handle this, but >> full memory management includes accounting of VMA-length >> as well (returning ENOMEM from system call) but we've decided >> to start with RSS. >> >>> best, >>> Herbert >>> ___ Containers mailing list [EMAIL PROTECTED] https://lists.osdl.org/mailman/listinfo/containers >>> - >>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >>> the body of a message to [EMAIL PROTECTED] >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> Please read the FAQ at http://www.tux.org/lkml/ >>> > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 6/9] signalfd/timerfd - timerfd core ...
Davide,

On Sat, 2007-03-10 at 18:22 -0800, Davide Libenzi wrote:

Some remarks:

> +
> +asmlinkage long sys_timerfd(int ufd, int clockid, int tmrtype,
> +			    const struct timespec __user *utmr)
> +{
> +	int error;
> +	struct timerfd_ctx *ctx;
> +	struct file *file;
> +	struct inode *inode;
> +	ktime_t tval, tnow;
> +	struct timespec ktmr, tmrnow;
> +
> +	error = -EFAULT;
> +	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> +		goto err_exit;

Please do not use goto for a simple return -EFAULT;

Please validate the timespec before converting it.

	if (!timespec_valid(&ktmr))
		return -EINVAL;

> +	tval = timespec_to_ktime(ktmr);
> +	error = -EINVAL;
> +	if (clockid != CLOCK_MONOTONIC &&
> +	    clockid != CLOCK_REALTIME)
> +		goto err_exit;
> +	switch (tmrtype) {
> +	case TFD_TIMER_REL:
> +	case TFD_TIMER_SEQ:
> +		break;
> +	case TFD_TIMER_ABS:
> +		getnstimeofday(&tmrnow);
> +		tnow = timespec_to_ktime(tmrnow);

	tnow = ktime_get();

> +		if (ktime_to_ns(tval) <= ktime_to_ns(tnow))
> +			goto err_exit;
> +		tval = ktime_sub(tval, tnow);

Why do you want to do that? hrtimers handle relative and absolute expiry times. You break down everything to relative time and lose the accuracy for absolute timers.

> +		break;
> +	default:
> +		goto err_exit;
> +	}
> +
> +	if (ufd == -1) {
> +		error = -ENOMEM;
> +		ctx = kmem_cache_alloc(timerfd_ctx_cachep, GFP_KERNEL);
> +		if (!ctx)
> +			goto err_exit;
> +
> +		init_waitqueue_head(&ctx->wqh);
> +		spin_lock_init(&ctx->lock);
> +		ctx->ticks = 0;
> +		ctx->tmrtype = tmrtype;
> +		ctx->clockid = clockid;
> +		ctx->tval = tval;
> +		hrtimer_init(&ctx->tmr, ctx->clockid, HRTIMER_REL);
> +		ctx->tmr.expires = ctx->tval;
> +		ctx->tmr.function = timerfd_tmrproc;
> +
> +		hrtimer_start(&ctx->tmr, ctx->tval, HRTIMER_REL);
> +
> +		/*
> +		 * When we call this, the initialization must be complete, since
> +		 * aino_getfd() will install the fd.
> +		 */
> +		error = aino_getfd(, , , "[timerfd]",
> +_fops, ctx);
> +		if (error)
> +			goto err_fdalloc;

Why is the timer started before we have everything in place?

Also, if you turn it around, then the (re)programming part of the timer can be shared.

> +	} else {
> +		error = -EBADF;
> +		file = fget(ufd);
> +		if (!file)
> +			goto err_exit;
> +		ctx = file->private_data;
> +		error = -EINVAL;
> +		if (file->f_op != _fops) {
> +			fput(file);
> +			goto err_exit;
> +		}
> +
> +		/*
> +		 * We need to stop the exiting timer before. We call
> +		 * hrtimer_cancel() w/out holding our lock.
> +		 */
> +		spin_lock_irq(&ctx->lock);
> +		while (hrtimer_active(&ctx->tmr)) {
> +			spin_unlock_irq(&ctx->lock);
> +			hrtimer_cancel(&ctx->tmr);
> +			spin_lock_irq(&ctx->lock);
> +		}

Please use hrtimer_try_to_cancel()

retry:
	spin_lock_irq(&ctx->lock);
	if (hrtimer_try_to_cancel(&ctx->tmr) < 0) {
		spin_unlock_irq(&ctx->lock);
		cpu_relax();
		goto retry;
	}

> +
> +static unsigned int timerfd_poll(struct file *file, poll_table *wait)
> +{
> +	struct timerfd_ctx *ctx = file->private_data;
> +
> +	poll_wait(file, &ctx->wqh, wait);
> +
> +	return ctx->ticks ? POLLIN: 0;

This is racy:

	timer is set up (non periodic)
	timer expires
	poll

now poll is stuck forever!

	tglx

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
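The timespec check requested in the review above can be illustrated in userspace. This is only a sketch under assumptions, not the kernel's implementation: `ts_valid()` and `SKETCH_NSEC_PER_SEC` are hypothetical names, and the rule assumed here is "seconds must not be negative, nanoseconds must lie in [0, 10^9)".

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical constant for this sketch: nanoseconds per second. */
#define SKETCH_NSEC_PER_SEC 1000000000L

/*
 * Userspace stand-in for a timespec sanity check: reject negative
 * seconds and nanosecond values outside [0, 1e9).
 */
static bool ts_valid(const struct timespec *ts)
{
	if (ts->tv_sec < 0)
		return false;
	if (ts->tv_nsec < 0 || ts->tv_nsec >= SKETCH_NSEC_PER_SEC)
		return false;
	return true;
}
```

Running such a check before the conversion means a malformed value from userspace is rejected with -EINVAL instead of being turned into a bogus expiry time.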
Re: Style Question
Cong WANG wrote:
> Hi, list!
>
> I have a question about coding style in linux kernel. In
> Documentation/CodingStyle, it is said that "Linux style for comments is the
> C89 "/* ... */" style. Don't use C99-style "// ..." comments." _But_ I see a
> lot of '//' style comments in current kernel code.
>
> Which is wrong? The documentation or the code, or neither? And why?

The code. As with a lot of coding style issues, it's likely just that nobody saw it and bothered to complain when it went in.

> Another question is about NULL. AFAIK, in user space, using NULL is better
> than directly using 0 in C. In kernel, I know it used its own NULL, which
> may be defined as ((void*)0), but it's _still_ different from raw zero. So
> can I say using NULL is better than 0 in kernel?

It's the preferred style. Sparse will complain about using 0 for a null pointer, for example.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
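The two conventions from this exchange can be shown in a few lines. This is a small illustrative sketch, not kernel code; `struct node` and `list_length()` are hypothetical names chosen for the example.

```c
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

/*
 * Count the entries of a singly linked list. Comments use the C89
 * style, and the end of the list is tested against NULL rather than
 * a bare 0.
 */
static size_t list_length(const struct node *head)
{
	size_t n = 0;

	while (head != NULL) {
		n++;
		head = head->next;
	}
	return n;
}
```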
[PATCH] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram
PAGE_MASK is an unsigned long, so using it to mask physical addresses on i386 (which are 64-bit wide) leads to truncation. This can result in page->private of unrelated memory pages being modified, with disastrous results.

Fix by not using PAGE_MASK for physical addresses; instead calculate the correct value directly from PAGE_SIZE. Also fix a similar BUG_ON().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 2cb4893..e85b4c7 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -131,7 +131,7 @@ static int dbg = 1;
 	(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & PAGE_MASK)
+#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
 #define PT64_DIR_BASE_ADDR_MASK \
 	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 
@@ -406,8 +406,8 @@ static void rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 			spte = desc->shadow_ptes[0];
 		}
 		BUG_ON(!spte);
-		BUG_ON((*spte & PT64_BASE_ADDR_MASK) !=
-		       page_to_pfn(page) << PAGE_SHIFT);
+		BUG_ON((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT
+		       != page_to_pfn(page));
 		BUG_ON(!(*spte & PT_PRESENT_MASK));
 		BUG_ON(!(*spte & PT_WRITABLE_MASK));
 		rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
-- 
1.5.0.2
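The truncation described in this patch can be reproduced in userspace. The following is a sketch under assumptions rather than the kernel code: `uint32_t` stands in for the 32-bit `unsigned long` of i386, and all `SK_`-prefixed macros and both helper functions are hypothetical names.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT 12
/* Stand-in for i386, where PAGE_SIZE is a 32-bit unsigned long. */
#define SK_PAGE_SIZE ((uint32_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))	/* 0xfffff000, still 32 bits */

/* Old form: the 32-bit mask is zero-extended, so bits 32..51 are lost. */
#define SK_BASE_MASK_OLD (((1ULL << 52) - 1) & SK_PAGE_MASK)
/* New form: widen to 64 bits first, then invert. */
#define SK_BASE_MASK_NEW (((1ULL << 52) - 1) & ~(uint64_t)(SK_PAGE_SIZE - 1))

/* Extract the physical base address from a 64-bit page table entry. */
static uint64_t base_addr_old(uint64_t pte)
{
	return pte & SK_BASE_MASK_OLD;
}

static uint64_t base_addr_new(uint64_t pte)
{
	return pte & SK_BASE_MASK_NEW;
}
```

For an entry that points above 4GB, the old mask silently yields an address below 4GB, which is how an unrelated page ends up being touched.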
[PATCH] KVM: MMU: Fix guest writes to nonpae pde
KVM shadow page tables are always in pae mode, regardless of the guest setting. This means that a guest pde (mapping 4MB of memory) is mapped to two shadow pdes (mapping 2MB each). When the guest writes to a pte or pde, we intercept the write and emulate it. We also remove any shadowed mappings corresponding to the write. Since the mmu did not account for the doubling in the number of pdes, it removed the wrong entry, resulting in a mismatch between shadow page tables and guest page tables, followed shortly by guest memory corruption. This patch fixes the problem by detecting the special case of writing to a non-pae pde and adjusting the address and number of shadow pdes zapped accordingly. Signed-off-by: Avi Kivity <[EMAIL PROTECTED]> --- drivers/kvm/mmu.c | 46 ++ 1 files changed, 34 insertions(+), 12 deletions(-) diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c index a1a9336..2cb4893 100644 --- a/drivers/kvm/mmu.c +++ b/drivers/kvm/mmu.c @@ -1093,22 +1093,40 @@ out: return r; } +static void mmu_pre_write_zap_pte(struct kvm_vcpu *vcpu, + struct kvm_mmu_page *page, + u64 *spte) +{ + u64 pte; + struct kvm_mmu_page *child; + + pte = *spte; + if (is_present_pte(pte)) { + if (page->role.level == PT_PAGE_TABLE_LEVEL) + rmap_remove(vcpu, spte); + else { + child = page_header(pte & PT64_BASE_ADDR_MASK); + mmu_page_remove_parent_pte(vcpu, child, spte); + } + } + *spte = 0; +} + void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes) { gfn_t gfn = gpa >> PAGE_SHIFT; struct kvm_mmu_page *page; - struct kvm_mmu_page *child; struct hlist_node *node, *n; struct hlist_head *bucket; unsigned index; u64 *spte; - u64 pte; unsigned offset = offset_in_page(gpa); unsigned pte_size; unsigned page_offset; unsigned misaligned; int level; int flooded = 0; + int npte; pgprintk("%s: gpa %llx bytes %d\n", __FUNCTION__, gpa, bytes); if (gfn == vcpu->last_pt_write_gfn) { @@ -1144,22 +1162,26 @@ void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes) } page_offset = 
offset; level = page->role.level; + npte = 1; if (page->role.glevels == PT32_ROOT_LEVEL) { - page_offset <<= 1; /* 32->64 */ + page_offset <<= 1; /* 32->64 */ + /* +* A 32-bit pde maps 4MB while the shadow pdes map +* only 2MB. So we need to double the offset again +* and zap two pdes instead of one. +*/ + if (level == PT32_ROOT_LEVEL) { + page_offset <<= 1; + npte = 2; + } page_offset &= ~PAGE_MASK; } spte = __va(page->page_hpa); spte += page_offset / sizeof(*spte); - pte = *spte; - if (is_present_pte(pte)) { - if (level == PT_PAGE_TABLE_LEVEL) - rmap_remove(vcpu, spte); - else { - child = page_header(pte & PT64_BASE_ADDR_MASK); - mmu_page_remove_parent_pte(vcpu, child, spte); - } + while (npte--) { + mmu_pre_write_zap_pte(vcpu, page, spte); + ++spte; } - *spte = 0; } } -- 1.5.0.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
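The offset arithmetic in this patch can be followed with a small userspace sketch. It is an illustration under assumptions, not the kvm code: it assumes 4KB pages, 4-byte guest entries and 8-byte shadow entries, and `struct zap_range`, `shadow_zap_range()` and the `ZAP_` macros are hypothetical names.

```c
#define ZAP_PAGE_SIZE 4096u
#define ZAP_SPTE_SIZE 8u	/* shadow (pae) entries are 64 bits wide */

struct zap_range {
	unsigned first;	/* index of the first shadow entry to zap */
	unsigned count;	/* number of consecutive shadow entries to zap */
};

/*
 * Map a write at byte offset guest_offset inside a non-pae guest page
 * table page to the shadow entries that have to be removed.
 */
static struct zap_range shadow_zap_range(unsigned guest_offset, int guest_is_pde)
{
	struct zap_range r;
	unsigned page_offset = guest_offset << 1;	/* 4-byte -> 8-byte entries */

	r.count = 1;
	if (guest_is_pde) {
		/* A 4MB guest pde is shadowed by two 2MB pdes. */
		page_offset <<= 1;
		r.count = 2;
	}
	page_offset &= ZAP_PAGE_SIZE - 1;	/* wrap into one shadow page */
	r.first = page_offset / ZAP_SPTE_SIZE;
	return r;
}
```

For example, guest pde 3 covers 12MB to 16MB, which corresponds to shadow pdes 6 and 7; before the fix only a single entry at the undoubled offset was removed.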
[PATCH 0/2] KVM: More fixes for 2.6.21-rc3
This patchset contains fixes I plan to submit pre 2.6.21: a fix for large memory 32-bit hosts, and a fix for non-pae 32-bit guests.
Re: [kvm-devel] [PATCH] KVM: MMU: Fix guest writes to nonpae pde
* Avi Kivity <[EMAIL PROTECTED]> wrote:

> KVM shadow page tables are always in pae mode, regardless of the guest
> setting. This means that a guest pde (mapping 4MB of memory) is
> mapped to two shadow pdes (mapping 2MB each).
>
> When the guest writes to a pte or pde, we intercept the write and
> emulate it. We also remove any shadowed mappings corresponding to the
> write. Since the mmu did not account for the doubling in the number
> of pdes, it removed the wrong entry, resulting in a mismatch between
> shadow page tables and guest page tables, followed shortly by guest
> memory corruption.
>
> This patch fixes the problem by detecting the special case of writing
> to a non-pae pde and adjusting the address and number of shadow pdes
> zapped accordingly.
>
> Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>

tested this with both PAE and non-PAE Linux host and guest - works fine.

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

	Ingo
Re: [kvm-devel] [PATCH] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram
* Avi Kivity <[EMAIL PROTECTED]> wrote:

> PAGE_MASK is an unsigned long, so using it to mask physical addresses
> on i386 (which are 64-bit wide) leads to truncation. This can result
> in page->private of unrelated memory pages being modified, with
> disastrous results.
>
> Fix by not using PAGE_MASK for physical addresses; instead calculate
> the correct value directly from PAGE_SIZE. Also fix a similar
> BUG_ON().
>
> Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>

i have tested this, albeit with less than 4GB RAM.

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

	Ingo
[patch] KVM: always reload segment selectors
Subject: [patch] KVM: always reload segment selectors
From: Ingo Molnar <[EMAIL PROTECTED]>

failed VM entry on VMX might still change %fs or %gs, thus make sure that KVM always reloads the segment selectors. This is crucial on both x86 and x86_64: x86 has __KERNEL_PDA in %fs on which things like 'current' depends and x86_64 has 0 there and needs MSR_GS_BASE to work.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

Index: linux/drivers/kvm/vmx.c
===
--- linux.orig/drivers/kvm/vmx.c
+++ linux/drivers/kvm/vmx.c
@@ -1896,6 +1896,27 @@ again:
 		[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
 	      : "cc", "memory" );
 
+	/*
+	 * Reload segment selectors ASAP. (it's needed for a functional
+	 * kernel: x86 relies on having __KERNEL_PDA in %fs and x86_64
+	 * relies on having 0 in %gs for the CPU PDA to work.)
+	 */
+	if (fs_gs_ldt_reload_needed) {
+		load_ldt(ldt_sel);
+		load_fs(fs_sel);
+		/*
+		 * If we have to reload gs, we must take care to
+		 * preserve our gs base.
+		 */
+		local_irq_disable();
+		load_gs(gs_sel);
+#ifdef CONFIG_X86_64
+		wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
+#endif
+		local_irq_enable();
+
+		reload_tss();
+	}
 	++kvm_stat.exits;
 
 	save_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
@@ -1913,22 +1934,6 @@ again:
 		kvm_run->exit_reason = vmcs_read32(VM_INSTRUCTION_ERROR);
 		r = 0;
 	} else {
-		if (fs_gs_ldt_reload_needed) {
-			load_ldt(ldt_sel);
-			load_fs(fs_sel);
-			/*
-			 * If we have to reload gs, we must take care to
-			 * preserve our gs base.
-			 */
-			local_irq_disable();
-			load_gs(gs_sel);
-#ifdef CONFIG_X86_64
-			wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
-#endif
-			local_irq_enable();
-
-			reload_tss();
-		}
 		/*
 		 * Profile KVM exit RIPs:
 		 */
Re: [patch] KVM: always reload segment selectors
Ingo Molnar wrote:
> Subject: [patch] KVM: always reload segment selectors
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> failed VM entry on VMX might still change %fs or %gs, thus make sure
> that KVM always reloads the segment selectors. This is crucial on both
> x86 and x86_64: x86 has __KERNEL_PDA in %fs on which things like
> 'current' depends and x86_64 has 0 there and needs MSR_GS_BASE to work.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  drivers/kvm/vmx.c | 37 +
>  1 file changed, 21 insertions(+), 16 deletions(-)
>
> Index: linux/drivers/kvm/vmx.c
> ===
> --- linux.orig/drivers/kvm/vmx.c
> +++ linux/drivers/kvm/vmx.c
> @@ -1896,6 +1896,27 @@ again:
>  		[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
>  	      : "cc", "memory" );
>
> +	/*
> +	 * Reload segment selectors ASAP. (it's needed for a functional
> +	 * kernel: x86 relies on having __KERNEL_PDA in %fs and x86_64
> +	 * relies on having 0 in %gs for the CPU PDA to work.)
> +	 */
> +	if (fs_gs_ldt_reload_needed) {
> +		load_ldt(ldt_sel);
> +		load_fs(fs_sel);
> +		/*
> +		 * If we have to reload gs, we must take care to
> +		 * preserve our gs base.
> +		 */
> +		local_irq_disable();
> +		load_gs(gs_sel);
> +#ifdef CONFIG_X86_64
> +		wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
> +#endif
> +		local_irq_enable();
> +
> +		reload_tss();
> +	}
>  	++kvm_stat.exits;
>
>  	save_msrs(vcpu->guest_msrs, NR_BAD_MSRS);

btw, looking at the code, we could just remove fs from the fs_gs_reload_needed and make it unconditional. VT knows how to reload segments, except if they're user segments (groan). In the case of fs, if it's used for the pda, it's obviously a kernel segment. gs is different: since only the segment base is loaded (via swapgs), the selector part could well be a userspace selector, and thus the irq-protected reload is needed.

Anyway, I'm applying the patch as the above discourse is irrelevant to the fix.

-- 
error compiling committee.c: too many arguments to function
[PATCH 0/15] KVM userspace interface updates
This patchset updates the kvm userspace interface to what I hope will be the long-term stable interface. Provisions are included for extending the interface later. The patches address performance and cleanliness concerns.

One patch is missing -- I'd like the string pio transfers not to include guest virtual addresses. To date all my attempts to write the patch ended with me losing consciousness. Hopefully I'll manage it soon.

I'd like to submit the patchset post 2.6.21. Comments are welcome.
[PATCH 03/15] KVM: Initialize PIO I/O count
This allows userspace to ignore the io.rep field. Not a big deal, but friendly.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |1 +
 drivers/kvm/vmx.c |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b176f5a..c35b8c8 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1037,6 +1037,7 @@ static int io_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	kvm_run->io.size = ((io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT);
 	kvm_run->io.string = (io_info & SVM_IOIO_STR_MASK) != 0;
 	kvm_run->io.rep = (io_info & SVM_IOIO_REP_MASK) != 0;
+	kvm_run->io.count = 1;
 
 	if (kvm_run->io.string) {
 		unsigned addr_mask;
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 7fd572a..d4c9f33 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1459,6 +1459,7 @@ static int handle_io(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		= (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_DF) != 0;
 	kvm_run->io.rep = (exit_qualification & 32) != 0;
 	kvm_run->io.port = exit_qualification >> 16;
+	kvm_run->io.count = 1;
 	if (kvm_run->io.string) {
 		if (!get_io_count(vcpu, &kvm_run->io.count))
 			return 1;
-- 
1.5.0.2
[PATCH 04/15] KVM: Handle cpuid in the kernel instead of punting to userspace
KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: Avi Kivity <[EMAIL PROTECTED]> --- drivers/kvm/kvm.h |5 +++ drivers/kvm/kvm_main.c | 69 drivers/kvm/svm.c |4 +- drivers/kvm/vmx.c |4 +- include/linux/kvm.h| 18 - 5 files changed, 95 insertions(+), 5 deletions(-) diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h index 59cbc5b..be3a0e7 100644 --- a/drivers/kvm/kvm.h +++ b/drivers/kvm/kvm.h @@ -55,6 +55,7 @@ #define KVM_NUM_MMU_PAGES 256 #define KVM_MIN_FREE_MMU_PAGES 5 #define KVM_REFILL_PAGES 25 +#define KVM_MAX_CPUID_ENTRIES 40 #define FX_IMAGE_SIZE 512 #define FX_IMAGE_ALIGN 16 @@ -286,6 +287,9 @@ struct kvm_vcpu { u32 ar; } tr, es, ds, fs, gs; } rmode; + + int cpuid_nent; + struct kvm_cpuid_entry cpuid_entries[KVM_MAX_CPUID_ENTRIES]; }; struct kvm_memory_slot { @@ -446,6 +450,7 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value, struct x86_emulate_ctxt; +void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); int emulator_get_dr(struct x86_emulate_ctxt* ctxt, int dr, diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c index 8a4984d..347467e 100644 --- a/drivers/kvm/kvm_main.c +++ b/drivers/kvm/kvm_main.c @@ -1504,6 +1504,43 @@ void save_msrs(struct vmx_msr_entry *e, int n) } EXPORT_SYMBOL_GPL(save_msrs); +void kvm_emulate_cpuid(struct kvm_vcpu *vcpu) +{ + int i; + u32 function; + struct kvm_cpuid_entry *e, *best; + + kvm_arch_ops->cache_regs(vcpu); + function = 
vcpu->regs[VCPU_REGS_RAX]; + vcpu->regs[VCPU_REGS_RAX] = 0; + vcpu->regs[VCPU_REGS_RBX] = 0; + vcpu->regs[VCPU_REGS_RCX] = 0; + vcpu->regs[VCPU_REGS_RDX] = 0; + best = NULL; + for (i = 0; i < vcpu->cpuid_nent; ++i) { + e = >cpuid_entries[i]; + if (e->function == function) { + best = e; + break; + } + /* +* Both basic or both extended? +*/ + if (((e->function ^ function) & 0x8000) == 0) + if (!best || e->function > best->function) + best = e; + } + if (best) { + vcpu->regs[VCPU_REGS_RAX] = best->eax; + vcpu->regs[VCPU_REGS_RBX] = best->ebx; + vcpu->regs[VCPU_REGS_RCX] = best->ecx; + vcpu->regs[VCPU_REGS_RDX] = best->edx; + } + kvm_arch_ops->decache_regs(vcpu); + kvm_arch_ops->skip_emulated_instruction(vcpu); +} +EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); + static void complete_pio(struct kvm_vcpu *vcpu) { struct kvm_io *io = >run->io; @@ -2075,6 +2112,26 @@ out: return r; } +static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, + struct kvm_cpuid *cpuid, + struct kvm_cpuid_entry __user *entries) +{ + int r; + + r = -E2BIG; + if (cpuid->nent > KVM_MAX_CPUID_ENTRIES) + goto out; + r = -EFAULT; + if (copy_from_user(>cpuid_entries, entries, + cpuid->nent * sizeof(struct kvm_cpuid_entry))) + goto out; + vcpu->cpuid_nent = cpuid->nent; + return 0; + +out: + return r; +} + static long kvm_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -2181,6 +2238,18 @@ static long kvm_vcpu_ioctl(struct file *filp, case KVM_SET_MSRS: r = msr_io(vcpu, argp, do_set_msr, 0); break; + case KVM_SET_CPUID: { + struct kvm_cpuid __user *cpuid_arg = argp; + struct kvm_cpuid cpuid; + + r = -EFAULT; + if (copy_from_user(, cpuid_arg, sizeof cpuid)) + goto out; + r = kvm_vcpu_ioctl_set_cpuid(vcpu, , cpuid_arg->entries); + if (r) + goto out; + break; + } default: ; } diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c index c35b8c8..d4b2936 100644 --- a/drivers/kvm/svm.c +++ b/drivers/kvm/svm.c @@ -1101,8 +1101,8 @@ static int task_switch_interception(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_r
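The table lookup performed by kvm_emulate_cpuid() in the patch above can be tried out in userspace. The following is a sketch of the same matching rule, not the kvm code: `struct cpuid_entry`, `cpuid_lookup()` and `CPUID_EXT_BIT` are hypothetical names, and the value 0x80000000 for the bit that separates basic from extended leaves is an assumption made here (the archived patch text shows 0x8000).

```c
#include <stddef.h>
#include <stdint.h>

struct cpuid_entry {
	uint32_t function;
	uint32_t eax, ebx, ecx, edx;
};

/* Assumed: bit 31 distinguishes extended leaves from basic ones. */
#define CPUID_EXT_BIT 0x80000000u

/*
 * Return the entry for the requested function. An exact match wins;
 * otherwise fall back to the highest-numbered entry in the same range
 * (basic or extended). Returns NULL if the range has no entries.
 */
static const struct cpuid_entry *
cpuid_lookup(const struct cpuid_entry *tbl, size_t nent, uint32_t function)
{
	const struct cpuid_entry *best = NULL;
	size_t i;

	for (i = 0; i < nent; i++) {
		const struct cpuid_entry *e = &tbl[i];

		if (e->function == function)
			return e;
		/* Both basic or both extended? */
		if (((e->function ^ function) & CPUID_EXT_BIT) == 0)
			if (!best || e->function > best->function)
				best = e;
	}
	return best;
}
```

When the lookup returns NULL, the caller would leave all four registers at zero, which matches the way the patch clears them before searching.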
[PATCH 01/15] KVM: Use a shared page for kernel/user communication when running a vcpu
Instead of passing a 'struct kvm_run' back and forth between the kernel and userspace, allocate a page and allow the user to mmap() it. This reduces needless copying and makes the interface expandable by providing lots of free space. Signed-off-by: Avi Kivity <[EMAIL PROTECTED]> --- drivers/kvm/kvm.h |1 + drivers/kvm/kvm_main.c | 54 +++ include/linux/kvm.h|6 ++-- 3 files changed, 44 insertions(+), 17 deletions(-) mode change 100755 => 100644 drivers/kvm/kvm_main.c diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h index 0d122bf..901b8d9 100644 --- a/drivers/kvm/kvm.h +++ b/drivers/kvm/kvm.h @@ -228,6 +228,7 @@ struct kvm_vcpu { struct mutex mutex; int cpu; int launched; + struct kvm_run *run; int interrupt_window_open; unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */ #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long) diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c old mode 100755 new mode 100644 index 946ed86..42be8a8 --- a/drivers/kvm/kvm_main.c +++ b/drivers/kvm/kvm_main.c @@ -355,6 +355,8 @@ static void kvm_free_vcpu(struct kvm_vcpu *vcpu) kvm_mmu_destroy(vcpu); vcpu_put(vcpu); kvm_arch_ops->vcpu_free(vcpu); + free_page((unsigned long)vcpu->run); + vcpu->run = NULL; } static void kvm_free_vcpus(struct kvm *kvm) @@ -1887,6 +1889,33 @@ static int kvm_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu, return r; } +static struct page *kvm_vcpu_nopage(struct vm_area_struct *vma, + unsigned long address, + int *type) +{ + struct kvm_vcpu *vcpu = vma->vm_file->private_data; + unsigned long pgoff; + struct page *page; + + *type = VM_FAULT_MINOR; + pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; + if (pgoff != 0) + return NOPAGE_SIGBUS; + page = virt_to_page(vcpu->run); + get_page(page); + return page; +} + +static struct vm_operations_struct kvm_vcpu_vm_ops = { + .nopage = kvm_vcpu_nopage, +}; + +static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma) +{ + vma->vm_ops = _vcpu_vm_ops; + return 0; +} + 
static int kvm_vcpu_release(struct inode *inode, struct file *filp) { struct kvm_vcpu *vcpu = filp->private_data; @@ -1899,6 +1928,7 @@ static struct file_operations kvm_vcpu_fops = { .release= kvm_vcpu_release, .unlocked_ioctl = kvm_vcpu_ioctl, .compat_ioctl = kvm_vcpu_ioctl, + .mmap = kvm_vcpu_mmap, }; /* @@ -1947,6 +1977,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n) { int r; struct kvm_vcpu *vcpu; + struct page *page; r = -EINVAL; if (!valid_vcpu(n)) @@ -1961,6 +1992,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n) return -EEXIST; } + page = alloc_page(GFP_KERNEL | __GFP_ZERO); + r = -ENOMEM; + if (!page) + goto out_unlock; + vcpu->run = page_address(page); + vcpu->host_fx_image = (char*)ALIGN((hva_t)vcpu->fx_buf, FX_IMAGE_ALIGN); vcpu->guest_fx_image = vcpu->host_fx_image + FX_IMAGE_SIZE; @@ -1990,6 +2027,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n) out_free_vcpus: kvm_free_vcpu(vcpu); +out_unlock: mutex_unlock(>mutex); out: return r; @@ -2003,21 +2041,9 @@ static long kvm_vcpu_ioctl(struct file *filp, int r = -EINVAL; switch (ioctl) { - case KVM_RUN: { - struct kvm_run kvm_run; - - r = -EFAULT; - if (copy_from_user(_run, argp, sizeof kvm_run)) - goto out; - r = kvm_vcpu_ioctl_run(vcpu, _run); - if (r < 0 && r != -EINTR) - goto out; - if (copy_to_user(argp, _run, sizeof kvm_run)) { - r = -EFAULT; - goto out; - } + case KVM_RUN: + r = kvm_vcpu_ioctl_run(vcpu, vcpu->run); break; - } case KVM_GET_REGS: { struct kvm_regs kvm_regs; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 275354f..d88e750 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -11,7 +11,7 @@ #include #include -#define KVM_API_VERSION 4 +#define KVM_API_VERSION 5 /* * Architectural interrupt line count, and the size of the bitmap needed @@ -49,7 +49,7 @@ enum kvm_exit_reason { KVM_EXIT_SHUTDOWN = 8, }; -/* for KVM_RUN */ +/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ struct kvm_run { /* in */ __u32 emulated; /* 
skip current instruction */ @@ -233,7 +233,7 @@ struct kvm_dirty_log { /* * ioctls
[PATCH 12/15] KVM: Initialize the apic_base msr on svm too
Older userspace didn't care, but newer userspace (with the cpuid changes) does.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 0311665..2396ada 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -582,6 +582,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	init_vmcb(vcpu->svm->vmcb);
 
 	fx_init(vcpu);
+	vcpu->apic_base = 0xfee00000 |
+			/*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
+			MSR_IA32_APICBASE_ENABLE;
 
 	return 0;
 
-- 
1.5.0.2
[PATCH 07/15] KVM: Renumber ioctls
The recent changes have left the ioctl numbers in complete disarray. Signed-off-by: Avi Kivity <[EMAIL PROTECTED]> --- include/linux/kvm.h | 34 +- 1 files changed, 17 insertions(+), 17 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index d89189a..93472da 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -229,34 +229,34 @@ struct kvm_cpuid { /* * ioctls for /dev/kvm fds: */ -#define KVM_GET_API_VERSION _IO(KVMIO, 1) -#define KVM_CREATE_VM _IO(KVMIO, 2) /* returns a VM fd */ -#define KVM_GET_MSR_INDEX_LIST_IOWR(KVMIO, 15, struct kvm_msr_list) +#define KVM_GET_API_VERSION _IO(KVMIO, 0x00) +#define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */ +#define KVM_GET_MSR_INDEX_LIST_IOWR(KVMIO, 0x02, struct kvm_msr_list) /* * ioctls for VM fds */ -#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 10, struct kvm_memory_region) +#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region) /* * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns * a vcpu fd. 
*/ -#define KVM_CREATE_VCPU _IO(KVMIO, 11) -#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 12, struct kvm_dirty_log) +#define KVM_CREATE_VCPU _IO(KVMIO, 0x41) +#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log) /* * ioctls for vcpu fds */ -#define KVM_RUN _IO(KVMIO, 16) -#define KVM_GET_REGS _IOR(KVMIO, 3, struct kvm_regs) -#define KVM_SET_REGS _IOW(KVMIO, 4, struct kvm_regs) -#define KVM_GET_SREGS _IOR(KVMIO, 5, struct kvm_sregs) -#define KVM_SET_SREGS _IOW(KVMIO, 6, struct kvm_sregs) -#define KVM_TRANSLATE _IOWR(KVMIO, 7, struct kvm_translation) -#define KVM_INTERRUPT _IOW(KVMIO, 8, struct kvm_interrupt) -#define KVM_DEBUG_GUEST _IOW(KVMIO, 9, struct kvm_debug_guest) -#define KVM_GET_MSRS _IOWR(KVMIO, 13, struct kvm_msrs) -#define KVM_SET_MSRS _IOW(KVMIO, 14, struct kvm_msrs) -#define KVM_SET_CPUID _IOW(KVMIO, 17, struct kvm_cpuid) +#define KVM_RUN _IO(KVMIO, 0x80) +#define KVM_GET_REGS _IOR(KVMIO, 0x81, struct kvm_regs) +#define KVM_SET_REGS _IOW(KVMIO, 0x82, struct kvm_regs) +#define KVM_GET_SREGS _IOR(KVMIO, 0x83, struct kvm_sregs) +#define KVM_SET_SREGS _IOW(KVMIO, 0x84, struct kvm_sregs) +#define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation) +#define KVM_INTERRUPT _IOW(KVMIO, 0x86, struct kvm_interrupt) +#define KVM_DEBUG_GUEST _IOW(KVMIO, 0x87, struct kvm_debug_guest) +#define KVM_GET_MSRS _IOWR(KVMIO, 0x88, struct kvm_msrs) +#define KVM_SET_MSRS _IOW(KVMIO, 0x89, struct kvm_msrs) +#define KVM_SET_CPUID _IOW(KVMIO, 0x8a, struct kvm_cpuid) #endif -- 1.5.0.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 06/15] KVM: Remove minor wart from KVM_CREATE_VCPU ioctl
That ioctl does not transfer any data, so it should be an _IO rather than an _IOW.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 include/linux/kvm.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c6dd4a7..d89189a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -241,7 +241,7 @@ struct kvm_cpuid {
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
  */
-#define KVM_CREATE_VCPU _IOW(KVMIO, 11, int)
+#define KVM_CREATE_VCPU _IO(KVMIO, 11)
 #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 12, struct kvm_dirty_log)
 
 /*
-- 
1.5.0.2
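The difference between the two encodings can be inspected from userspace with the decoding macros in <linux/ioctl.h>. This is a Linux-only sketch under assumptions: the magic value `SK_KVMIO` (0xAE) is assumed here because the patch does not show it, and the `SK_`-prefixed macros and the three helpers are hypothetical names.

```c
#include <linux/ioctl.h>

/* Assumed ioctl magic for this sketch; not shown in the patch. */
#define SK_KVMIO 0xAE

#define SK_CREATE_VCPU_OLD _IOW(SK_KVMIO, 11, int)	/* claims an int is copied in */
#define SK_CREATE_VCPU_NEW _IO(SK_KVMIO, 11)		/* no data transfer encoded */

/* Direction bits encoded in an ioctl command number. */
static unsigned ioctl_dir(unsigned cmd)
{
	return _IOC_DIR(cmd);
}

/* Argument size encoded in an ioctl command number. */
static unsigned ioctl_size(unsigned cmd)
{
	return _IOC_SIZE(cmd);
}

/* Sequence number encoded in an ioctl command number. */
static unsigned ioctl_nr(unsigned cmd)
{
	return _IOC_NR(cmd);
}
```

Both definitions share the same sequence number, but the direction and size fields differ, so the resulting command values are not equal and userspace built against the old header would need to be rebuilt.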
[PATCH 08/15] KVM: Add method to check for backwards-compatible API extensions
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]> --- drivers/kvm/kvm_main.c |6 ++ include/linux/kvm.h|5 + 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c index 747966e..376538c 100644 --- a/drivers/kvm/kvm_main.c +++ b/drivers/kvm/kvm_main.c @@ -2416,6 +2416,12 @@ static long kvm_dev_ioctl(struct file *filp, r = 0; break; } + case KVM_CHECK_EXTENSION: + /* +* No extensions defined at present. +*/ + r = 0; + break; default: ; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 93472da..c93cf53 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -232,6 +232,11 @@ struct kvm_cpuid { #define KVM_GET_API_VERSION _IO(KVMIO, 0x00) #define KVM_CREATE_VM _IO(KVMIO, 0x01) /* returns a VM fd */ #define KVM_GET_MSR_INDEX_LIST_IOWR(KVMIO, 0x02, struct kvm_msr_list) +/* + * Check if a kvm extension is available. Argument is extension number, + * return is 1 (yes) or 0 (no, sorry). + */ +#define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03) /* * ioctls for VM fds -- 1.5.0.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 14/15] KVM: Allow kernel to select size of mmap() buffer
This allows us to store offsets in the kernel/user kvm_run area, and be sure that userspace has them mapped. As offsets can be outside the kvm_run struct, userspace has no way of knowing how much to mmap. Signed-off-by: Avi Kivity <[EMAIL PROTECTED]> --- drivers/kvm/kvm_main.c |8 +++- include/linux/kvm.h|4 2 files changed, 11 insertions(+), 1 deletions(-) diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c index ed95c9b..b81f007 100644 --- a/drivers/kvm/kvm_main.c +++ b/drivers/kvm/kvm_main.c @@ -2436,7 +2436,7 @@ static long kvm_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { void __user *argp = (void __user *)arg; - int r = -EINVAL; + long r = -EINVAL; switch (ioctl) { case KVM_GET_API_VERSION: @@ -2478,6 +2478,12 @@ static long kvm_dev_ioctl(struct file *filp, */ r = 0; break; + case KVM_GET_VCPU_MMAP_SIZE: + r = -EINVAL; + if (arg) + goto out; + r = PAGE_SIZE; + break; default: ; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index c0d10cd..dad9081 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -253,6 +253,10 @@ struct kvm_signal_mask { * return is 1 (yes) or 0 (no, sorry). */ #define KVM_CHECK_EXTENSION _IO(KVMIO, 0x03) +/* + * Get size for mmap(vcpu_fd) + */ +#define KVM_GET_VCPU_MMAP_SIZE_IO(KVMIO, 0x04) /* in bytes */ /* * ioctls for VM fds -- 1.5.0.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 13/15] KVM: Add guest mode signal mask
Allow a special signal mask to be used while executing in guest mode.  This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive.  Userspace still
receives -EINTR and can get the signal via sigwait().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h      |    3 +++
 drivers/kvm/kvm_main.c |   41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/kvm.h    |    7 +++++++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index be3a0e7..1c4a581 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -277,6 +277,9 @@ struct kvm_vcpu {
 	gpa_t mmio_phys_addr;
 	int pio_pending;
 
+	int sigset_active;
+	sigset_t sigset;
+
 	struct {
 		int active;
 		u8 save_iopl;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0e28f58..ed95c9b 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1591,9 +1591,13 @@ static void complete_pio(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
 	int r;
+	sigset_t sigsaved;
 
 	vcpu_load(vcpu);
 
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
 	/* re-sync apic's tpr */
 	vcpu->cr8 = kvm_run->cr8;
 
@@ -1616,6 +1620,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	r = kvm_arch_ops->run(vcpu, kvm_run);
 
+	if (vcpu->sigset_active)
+		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+
 	vcpu_put(vcpu);
 	return r;
 }
@@ -2142,6 +2149,17 @@ out:
 	return r;
 }
 
+static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
+{
+	if (sigset) {
+		sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP));
+		vcpu->sigset_active = 1;
+		vcpu->sigset = *sigset;
+	} else
+		vcpu->sigset_active = 0;
+	return 0;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -2260,6 +2278,29 @@ static long kvm_vcpu_ioctl(struct file *filp,
 			goto out;
 		break;
 	}
+	case KVM_SET_SIGNAL_MASK: {
+		struct kvm_signal_mask __user *sigmask_arg = argp;
+		struct kvm_signal_mask kvm_sigmask;
+		sigset_t sigset, *p;
+
+		p = NULL;
+		if (argp) {
+			r = -EFAULT;
+			if (copy_from_user(&kvm_sigmask, argp,
+					   sizeof kvm_sigmask))
+				goto out;
+			r = -EINVAL;
+			if (kvm_sigmask.len != sizeof sigset)
+				goto out;
+			r = -EFAULT;
+			if (copy_from_user(&sigset, sigmask_arg->sigset,
+					   sizeof sigset))
+				goto out;
+			p = &sigset;
+		}
+		r = kvm_vcpu_ioctl_set_sigmask(vcpu, &sigset);
+		break;
+	}
 	default:
 		;
 	}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b3af92e..c0d10cd 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -234,6 +234,12 @@ struct kvm_cpuid {
 	struct kvm_cpuid_entry entries[0];
 };
 
+/* for KVM_SET_SIGNAL_MASK */
+struct kvm_signal_mask {
+	__u32 len;
+	__u8  sigset[0];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -273,5 +279,6 @@ struct kvm_cpuid {
 #define KVM_GET_MSRS              _IOWR(KVMIO, 0x88, struct kvm_msrs)
 #define KVM_SET_MSRS              _IOW(KVMIO,  0x89, struct kvm_msrs)
 #define KVM_SET_CPUID             _IOW(KVMIO,  0x8a, struct kvm_cpuid)
+#define KVM_SET_SIGNAL_MASK       _IOW(KVMIO,  0x8b, struct kvm_signal_mask)
 
 #endif
-- 
1.5.0.2
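Since struct kvm_signal_mask ends in a zero-length array, userspace has to allocate the header and the mask bytes as one buffer before passing it to KVM_SET_SIGNAL_MASK. A rough sketch of that step is below. The names `demo_signal_mask` and `demo_signal_mask_new` are hypothetical, and the struct is a local mirror of the one in the patch, not the kernel header itself. Note that the handler rejects any len other than the kernel's sizeof(sigset_t), which need not match the C library's sigset_t, so the caller is assumed to know the right length.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct kvm_signal_mask from the patch. */
struct demo_signal_mask {
	uint32_t len;      /* number of bytes in sigset[] */
	uint8_t  sigset[]; /* raw signal mask, kernel layout */
};

/*
 * Hypothetical helper: allocate header plus mask in one buffer and
 * copy the mask bytes in.  The caller releases it with free().
 * Returns NULL if the allocation fails.
 */
static struct demo_signal_mask *demo_signal_mask_new(const void *mask,
						     uint32_t len)
{
	struct demo_signal_mask *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->len = len;
	memcpy(m->sigset, mask, len);
	return m;
}
```

The resulting pointer would be the ioctl argument; the mask itself would typically leave unblocked only the signal used to kick the vcpu.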
[PATCH 05/15] KVM: Remove the 'emulated' field from the userspace interface
We no longer emulate single instructions in userspace.  Instead, we service
mmio or pio requests.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm_main.c |    5 -----
 include/linux/kvm.h    |    3 +--
 2 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 347467e..747966e 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1588,11 +1588,6 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	/* re-sync apic's tpr */
 	vcpu->cr8 = kvm_run->cr8;
 
-	if (kvm_run->emulated) {
-		kvm_arch_ops->skip_emulated_instruction(vcpu);
-		kvm_run->emulated = 0;
-	}
-
 	if (kvm_run->io_completed) {
 		if (vcpu->pio_pending)
 			complete_pio(vcpu);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 15e23bc..c6dd4a7 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -51,10 +51,9 @@ enum kvm_exit_reason {
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
 	/* in */
-	__u32 emulated;  /* skip current instruction */
 	__u32 io_completed; /* mmio/pio request completed */
 	__u8 request_interrupt_window;
-	__u8 padding1[7];
+	__u8 padding1[3];
 
 	/* out */
 	__u32 exit_type;
-- 
1.5.0.2
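The padding change from 7 to 3 bytes looks like it is there to keep the "out" section on an 8-byte boundary once the 4-byte emulated field is gone. The arithmetic can be checked with a small sketch. The two structs below are hypothetical mirrors containing only the fields visible in the hunk (the real struct kvm_run continues past exit_type), so this illustrates the offsets and nothing more.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch, fields as shown in the removed lines. */
struct demo_run_old {
	uint32_t emulated;
	uint32_t io_completed;
	uint8_t  request_interrupt_window;
	uint8_t  padding1[7];
	uint32_t exit_type;
};

/* Layout after the patch. */
struct demo_run_new {
	uint32_t io_completed;
	uint8_t  request_interrupt_window;
	uint8_t  padding1[3];
	uint32_t exit_type;
};

/* "in" section: 4 + 4 + 1 + 7 = 16 bytes before, 4 + 1 + 3 = 8 after. */
_Static_assert(offsetof(struct demo_run_old, exit_type) == 16,
	       "old layout: out section starts at byte 16");
_Static_assert(offsetof(struct demo_run_new, exit_type) == 8,
	       "new layout: out section starts at byte 8");
```

Either way every later field moves by 8 bytes, so this is a userspace-visible layout change for anything that shares the kvm_run area.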
[PATCH 11/15] KVM: Add a special exit reason when exiting due to an interrupt
This is redundant, as we also return -EINTR from the ioctl, but it allows
us to examine the exit_reason field on resume without seeing old data.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c   |    2 ++
 drivers/kvm/vmx.c   |    2 ++
 include/linux/kvm.h |    3 ++-
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b09928f..0311665 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1619,12 +1619,14 @@ again:
 	if (signal_pending(current)) {
 		++kvm_stat.signal_exits;
 		post_kvm_run_save(vcpu, kvm_run);
+		kvm_run->exit_reason = KVM_EXIT_INTR;
 		return -EINTR;
 	}
 
 	if (dm_request_for_irq_injection(vcpu, kvm_run)) {
 		++kvm_stat.request_irq_exits;
 		post_kvm_run_save(vcpu, kvm_run);
+		kvm_run->exit_reason = KVM_EXIT_INTR;
 		return -EINTR;
 	}
 	kvm_resched(vcpu);
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index ba7a98b..0d1c8cf 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1936,12 +1936,14 @@ again:
 	if (signal_pending(current)) {
 		++kvm_stat.signal_exits;
 		post_kvm_run_save(vcpu, kvm_run);
+		kvm_run->exit_reason = KVM_EXIT_INTR;
 		return -EINTR;
 	}
 
 	if (dm_request_for_irq_injection(vcpu, kvm_run)) {
 		++kvm_stat.request_irq_exits;
 		post_kvm_run_save(vcpu, kvm_run);
+		kvm_run->exit_reason = KVM_EXIT_INTR;
 		return -EINTR;
 	}
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 57f47ef..b3af92e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include <asm/types.h>
 #include <linux/ioctl.h>
 
-#define KVM_API_VERSION 8
+#define KVM_API_VERSION 9
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -45,6 +45,7 @@ enum kvm_exit_reason {
 	KVM_EXIT_IRQ_WINDOW_OPEN  = 7,
 	KVM_EXIT_SHUTDOWN         = 8,
 	KVM_EXIT_FAIL_ENTRY       = 9,
+	KVM_EXIT_INTR             = 10,
 };
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
-- 
1.5.0.2
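From the userspace side, a run loop could use the new exit reason as a second guard next to the ioctl's errno, so that a stale exit_reason is never dispatched. The sketch below is one possible shape, not something taken from the patch: `demo_classify_run` and the `DEMO_` names are hypothetical, the value 10 is copied from the enum hunk above, and it assumes KVM_RUN returns a negative value with errno set on failure and a non-negative value otherwise.

```c
#include <errno.h>
#include <stdint.h>

/* Value copied from the enum kvm_exit_reason hunk. */
#define DEMO_KVM_EXIT_INTR 10

enum demo_run_action {
	DEMO_DISPATCH, /* handle the exit described by exit_reason */
	DEMO_RETRY,    /* interrupted: service signals, then re-enter */
	DEMO_FAIL      /* any other error */
};

/*
 * Hypothetical helper: decide what the run loop does next, given the
 * ioctl return value, the errno it left behind, and exit_reason.
 */
static enum demo_run_action demo_classify_run(int ret, int err,
					      uint32_t exit_reason)
{
	if (ret < 0)
		return err == EINTR ? DEMO_RETRY : DEMO_FAIL;

	/* Second guard: never dispatch on an interrupted run. */
	if (exit_reason == DEMO_KVM_EXIT_INTR)
		return DEMO_RETRY;

	return DEMO_DISPATCH;
}
```

Code that only inspects exit_reason later on (for example after a resume) gets the same answer without having kept the errno around, which seems to be the point of the redundancy.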