Re: Two identical entries for "rtc" in /proc/devices
On Saturday 15 September 2007, Andrew Morton wrote:
> On Sat, 15 Sep 2007 11:50:21 -0700 David Brownell <[EMAIL PROTECTED]> wrote:
>
> > > On Thu, 06 Sep 2007 18:23:22 -0400 Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> > >
> > > > # ls -li
> > > > total 0
> > > > 4026532007 -r--r--r-- 1 root root 0 Sep 6 18:18 nvram
> > > > 4026532067 -r--r--r-- 1 root root 0 Sep 6 18:18 rtc
> > > > 4026532067 -r--r--r-- 1 root root 0 Sep 6 18:18 rtc
> > > > 4026532056 -rw-r--r-- 1 root root 0 Sep 6 18:18 snd-page-alloc
> > >
> > > ...
> >
> > Seems pretty clear that this must be procfs itself...
> > when a filesystem sees a name in a directory, it should
> > refuse to make another file with the same name. And it
> > should *never* reuse inode numbers...
> >
> > ...
>
> procfs can reject the attempt to create the file, but the bottom line
> is that two different callsites are trying to create the same file. One
> of those callsites needs fixing?

Both of those call sites have code to handle procfs rejecting
the file creation; nothing to fix. And anyway, there's no way
this is a *caller* bug!

The missing step seems to be that proc_register() doesn't bother
to check whether there's already an entry for that file. Which
is what the appended *UNTESTED* patch does (it compiles though).

- Dave

--- g26.orig/fs/proc/generic.c	2007-09-18 22:08:44.0 -0700
+++ g26/fs/proc/generic.c	2007-09-18 22:14:07.0 -0700
@@ -521,10 +521,11 @@ static const struct inode_operations pro
 	.setattr= proc_notify_change,
 };
 
-static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp)
+static int proc_register(struct proc_dir_entry *dir, struct proc_dir_entry *dp)
 {
 	unsigned int i;
-	
+	struct proc_dir_entry *de;
+
 	i = get_inode_number();
 	if (i == 0)
 		return -EAGAIN;
@@ -547,6 +548,16 @@ static int proc_register(struct proc_dir
 	}
 
 	spin_lock(&proc_subdir_lock);
+
+	for (de = dir->subdir; de ; de = de->next) {
+		if (de->namelen != dp->namelen)
+			continue;
+		if (!memcmp(de->name, dp->name, de->namelen)) {
+			spin_unlock(&proc_subdir_lock);
+			return -EEXIST;
+		}
+	}
+
 	dp->next = dir->subdir;
 	dp->parent = dir;
 	dir->subdir = dp;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
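For anyone who wants to poke at the duplicate check outside the kernel, here is a minimal userspace sketch of the same idea: walk the parent's child list and refuse to link an entry whose name is already present. The struct entry type and entry_register() are hypothetical stand-ins, not the real struct proc_dir_entry or proc_register(), and locking is left out (the patch does the scan and the insertion under one spinlock).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct proc_dir_entry: only what the check needs. */
struct entry {
	const char *name;
	size_t namelen;
	struct entry *next;	/* next sibling in the parent's list */
	struct entry *subdir;	/* head of the child list (directories only) */
};

/*
 * Link dp into dir's child list unless a child with the same name exists.
 * Returns 0 on success, -EEXIST on a duplicate name.
 */
static int entry_register(struct entry *dir, struct entry *dp)
{
	struct entry *de;

	assert(dir != NULL && dp != NULL);

	for (de = dir->subdir; de != NULL; de = de->next) {
		/* cheap length test first, then compare the bytes */
		if (de->namelen != dp->namelen)
			continue;
		if (memcmp(de->name, dp->name, de->namelen) == 0)
			return -EEXIST;
	}

	/* no duplicate found: push dp onto the head of the list */
	dp->next = dir->subdir;
	dir->subdir = dp;
	return 0;
}
```

Registering two entries named "rtc" under the same parent would make the second call fail with -EEXIST, which is the case the callers are said to handle already.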
[PATCH] cafe_ccic: default to allocating DMA buffers at probe time
By default, we allocate DMA buffers when actually reading from the video
capture device. On a system with 128MB or 256MB of ram, it's very easy
for that memory to quickly become fragmented. We've had users report
having 30+MB of memory free, but the cafe_ccic driver is still unable to
allocate DMA buffers.

Our workaround has been to make use of the 'alloc_bufs_at_load' parameter
to allocate DMA buffers during device probing.

This patch makes DMA buffer allocation happen during device probe by
default, and changes the parameter to 'alloc_bufs_at_read'. The camera
hardware is there, if the cafe_ccic driver is enabled/loaded it should
do its best to ensure that the camera is actually usable; delaying DMA
buffer allocation saves an insignificant amount of memory, and causes the
driver to be much less useful.
---
 drivers/media/video/cafe_ccic.c | 18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/media/video/cafe_ccic.c b/drivers/media/video/cafe_ccic.c
index ef53618..3588a59 100644
--- a/drivers/media/video/cafe_ccic.c
+++ b/drivers/media/video/cafe_ccic.c
@@ -63,13 +63,13 @@ MODULE_SUPPORTED_DEVICE("Video");
  */
 #define MAX_DMA_BUFS 3
 
-static int alloc_bufs_at_load = 0;
-module_param(alloc_bufs_at_load, bool, 0444);
-MODULE_PARM_DESC(alloc_bufs_at_load,
-		"Non-zero value causes DMA buffers to be allocated at module "
-		"load time. This increases the chances of successfully getting "
-		"those buffers, but at the cost of nailing down the memory from "
-		"the outset.");
+static int alloc_bufs_at_read = 0;
+module_param(alloc_bufs_at_read, bool, 0444);
+MODULE_PARM_DESC(alloc_bufs_at_read,
+		"Non-zero value causes DMA buffers to be allocated when the "
+		"video capture device is read, rather than at module load "
+		"time. This saves memory, but decreases the chances of "
+		"successfully getting those buffers.");
 
 static int n_dma_bufs = 3;
 module_param(n_dma_bufs, uint, 0644);
@@ -1503,7 +1503,7 @@ static int cafe_v4l_release(struct inode *inode, struct file *filp)
 	}
 	if (cam->users == 0) {
 		cafe_ctlr_power_down(cam);
-		if (! alloc_bufs_at_load)
+		if (alloc_bufs_at_read)
 			cafe_free_dma_bufs(cam);
 	}
 	mutex_unlock(&cam->s_mutex);
@@ -2162,7 +2162,7 @@ static int cafe_pci_probe(struct pci_dev *pdev,
 	/*
 	 * If so requested, try to get our DMA buffers now.
 	 */
-	if (alloc_bufs_at_load) {
+	if (!alloc_bufs_at_read) {
 		if (cafe_alloc_dma_bufs(cam, 1))
 			cam_warn(cam, "Unable to alloc DMA buffers at load"
 					" will try again later.");
[IA64] Kexec: Remove vector from ia64_machine_kexec()
The use of vector in ia64_machine_kexec() seems spurious,
and removing it simplifies the code slightly.

As suggested by Alex Williamson <[EMAIL PROTECTED]>

Cc: Alex Williamson <[EMAIL PROTECTED]>
Signed-off-by: Simon Horman <[EMAIL PROTECTED]>

Index: linux-2.6/arch/ia64/kernel/machine_kexec.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/machine_kexec.c	2007-09-19 13:43:42.0 +0900
+++ linux-2.6/arch/ia64/kernel/machine_kexec.c	2007-09-19 13:44:11.0 +0900
@@ -79,7 +79,6 @@ static void ia64_machine_kexec(struct un
 	relocate_new_kernel_t rnk;
 	void *pal_addr = efi_get_pal_addr();
 	unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
-	unsigned long vector;
 	int ii;
 
 	BUG_ON(!image);
@@ -107,11 +106,8 @@ static void ia64_machine_kexec(struct un
 	/* unmask TPR and clear any pending interrupts */
 	ia64_setreg(_IA64_REG_CR_TPR, 0);
 	ia64_srlz_d();
-	vector = ia64_get_ivr();
-	while (vector != IA64_SPURIOUS_INT_VECTOR) {
+	while (ia64_get_ivr() != IA64_SPURIOUS_INT_VECTOR)
 		ia64_eoi();
-		vector = ia64_get_ivr();
-	}
 	platform_kernel_launch_event();
 	rnk = (relocate_new_kernel_t)&code_addr;
 	(*rnk)(image->head, image->start, ia64_boot_param,
2.6.23-rc6-mm1 and acpi
Hi,

while trying to compile 2.6.23-rc6-mm1 I came across the following build
error:

[EMAIL PROTECTED]:/usr/src/linux-2.6.23-rc6-mm1> make modules
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
<stdin>:1389:2: warning: #warning syscall revokeat not implemented
<stdin>:1393:2: warning: #warning syscall frevoke not implemented
  CC [M]  drivers/acpi/sbs.o
drivers/acpi/sbs.c: In function ‘acpi_battery_alarm_show’:
drivers/acpi/sbs.c:457: error: implicit declaration of function ‘acpi_battery_get_alarm’
drivers/acpi/sbs.c: In function ‘acpi_battery_alarm_store’:
drivers/acpi/sbs.c:472: error: implicit declaration of function ‘acpi_battery_set_alarm’
drivers/acpi/sbs.c: In function ‘acpi_battery_add’:
drivers/acpi/sbs.c:829: warning: ignoring return value of ‘device_create_file’, declared with attribute warn_unused_result
make[2]: *** [drivers/acpi/sbs.o] Error 1
make[1]: *** [drivers/acpi] Error 2
make: *** [drivers] Error 2

Not sure who to CC, which is why I send it to the list alone.

Best,
Michael
-- 
 Vote against SPAM - see http://www.politik-digital.de/spam/
 Michael Gerdau email: [EMAIL PROTECTED]
 GPG-keys available on request or at public keyserver
Re: [PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol
On Tue, 2007-09-18 at 22:30 -0700, H. Peter Anvin wrote:
> Huang, Ying wrote:
> > Known issues:
> >
> > - The hd0_info and hd1_info are deleted from the zero page. Additional
> >   work should be done for this? Or this is unnecessary (because no new
> >   fields will be added to zero page)?
> >
>
> For backwards compatibility, they should be marked as there for the
> short-medium term so we don't reuse them for whatever reason.

OK, I will add them back.

Best Regards,
Huang Ying
Re: [PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol
Huang, Ying wrote:
> Known issues:
>
> - The hd0_info and hd1_info are deleted from the zero page. Additional
>   work should be done for this? Or this is unnecessary (because no new
>   fields will be added to zero page)?
>

For backwards compatibility, they should be marked as there for the
short-medium term so we don't reuse them for whatever reason.

	-hpa
Re: NFS4 authentification / fsuid
On Sep 18, 2007, at 19:44:59, Satyam Sharma wrote:
> On Thu, 6 Sep 2007, Kyle Moffett wrote:
>> On Sep 06, 2007, at 19:35:14, Trond Myklebust wrote:
>>> On Thu, 2007-09-06 at 19:30 -0400, Kyle Moffett wrote:
>>>> On Sep 06, 2007, at 11:06:16, J. Bruce Fields wrote:
>>>>> The question of how to protect against someone with *physical*
>>>>> access certainly is more difficult, but surely that's a separate
>>>>> problem.
>>>>
>>>> Actually, that's a fairly simple problem (barring disassembling
>>>> the system and attaching a hardware debugger). You encrypt the
>>>> root filesystem and require a password to boot (See: LUKS).
>>>> Debian has built-in support for installing onto
>>>> fs-on-LVM-on-crypt-on-RAID, and it works quite well on all the
>>>> laptops I use regularly. It's not even much of a speed penalty;
>>>> once you take the overhead of hitting a 5400RPM laptop drive you
>>>> can chew thousands of cycles of CPU without anybody noticing
>>>> (much). Then all you have to do is burn a copy of your /boot
>>>> with bootloader onto some read-only media (like a finalized
>>>> CDROM/DVDROM) and you're set to go.
>>>
>>> Disconnect battery, and watch boot password go 'poof!'.
>>
>> Umm, I did say "encrypt the root filesystem", didn't I? Booting my
>> laptops
>
> The whole *point* here is to secure against physical access -- then
> how can you assume "barring disassembling the system"? If you're not
> considering attacks such as those, then how _are_ you solving the
> physical access problem in the first place? :-)

Security is about fractional reduction of risk, and anybody who tells
you otherwise is either ignorant or lying through their teeth. There
are *multiple* aspects of "physical access"; one of those is access
while the box is off and no data resident in volatile memory, which is
the easy case. Basically there you can just encrypt the non-volatile
storage.

If the system is *on* and has unencrypted data in memory (such as
suspend-to-RAM for example) then you *HAVE* to ensure that it can't be
easily disassembled and a hardware debugger attached; there is no way
around that very fundamental limitation. Basically if the key is
resident and unencrypted as is necessary to *USE* the system, then no
amount of hardware is going to *prevent* a dedicated attacker from
getting at it unless you make it so unportable that you don't have to
worry about somebody carrying it off in the first place. Typical
mechanisms to increase the time and effort to break into a device
include wiring the entire enclosure with extremely thin filament wires
and automatically wiping the system upon detecting any variation in a
small flow of current through said filament.

>> this way follows this procedure:
>> 1) Enter BIOS boot menu
>> 2) Insert /boot CDROM
>> 3) Select the "CDROM" entry
>> 4) Wait for kernel to start and run through initramfs
>> 5) Type password into the initramfs prompt so that it can DECRYPT
>>    THE ROOT FILESYSTEM
>> 6) Continue to boot the system.
>>
>> Under this setup, tinkering with my BIOS does virtually nothing;
>> the only avenues of attack are strictly of the "Install a hardware
>> keylogger" variety.
>
> Doesn't flashing/replacing your BIOS firmware/chip count as
> tinkering? Then I don't really need a "hardware keylogger", do I ...

Ok, so you are saying your plan of attack on this system would be:
  1) Steal the laptop such that I don't notice it has been stolen
  2) Open it up
  3) Replace the very-vendor-specific BIOS chip with a reflashed one
     with sufficient storage to do all the things the old BIOS could
     *AND* have enough storage for an entire replacement kernel binary
     with a built-in keylogger, as well as some storage for the logged
     password
  4) Return the laptop, again such that I don't notice it has been
     missing
  5) Wait for me to boot and type my password
  6) Somehow recover the laptop *yet* *again* to get the password back
     off of it and decrypt the disk

Yes it "can be done", but so can dumping the firmware for an iPod out
through the built-in piezo clicker[1]. USE SOME COMMON SENSE HERE
PEOPLE!!! The only "unbreakable" computer is one always disconnected
and off under armed guard in a bank vault, and even then it's only as
secure as the bank in which it is stored (which get broken into on
occasion).

I am assuming that if the laptop has sufficiently important data on it
to warrant the above steps then I am also clueful enough to:
  (A) Not carry the laptop around unsecured areas,
  (B) Keep a close enough eye on it and be aware that it's gone by the
      time they get to step 2, OR
  (C) Pay somebody to build me a better physical chassis for my laptop

We are talking about *STANDARD* laptop systems with reasonably alert
users. If the user doesn't know how to properly protect the stuff on
the laptop then they probably don't know how to properly protect the
other copy in their heads, either. Besides, if some government wanted
the data on your laptop that bad they'd just pick you up in the
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Tue, Sep 18, 2007 at 06:06:52PM -0700, Linus Torvalds wrote:
> > especially as the Linux
> > kernel limitations in this area are well known. There's no "16K mess"
> > that SGI is trying to clean up here (and SGI have offered both IA64 and
> > x86_64 systems for some time now, so not sure how you came up with that
> > whacko theory).
>
> Well, if that is the case, then I vote that we drop the whole patch-series
> entirely. It clearly has no reason for existing at all.
>
> There is *no* valid reason for 16kB blocksizes unless you have legacy
> issues.

Ok, let's step back for a moment and look at a basic, fundamental
constraint of disks - seek capacity. A decade ago, a terabyte of
filesystem had 30 disks behind it - a seek capacity of about
6000 seeks/s. Nowadays, that's a single disk with a seek
capacity of about 200/s. We're going *rapidly* backwards in
terms of seek capacity per terabyte of storage.

Now fill that terabyte of storage and index it in the most efficient
way - let's say btrees are used because lots of filesystems use
them. Hence the depth of the tree is roughly O((log n)/m) where m is
a factor of the btree block size. Effectively, btree depth = seek
count on lookup of any object.

When the filesystem had a capacity of 6,000 seeks/s, we didn't
really care if the indexes used 4k blocks or not - the storage
subsystem had an excess of seek capacity to deal with
less-than-optimal indexing. Now we have over an order of magnitude
fewer seeks to expend in index operations *for the same amount of
data* so we are really starting to care about minimising the
number of seeks in our indexing mechanisms and allocations.

We can play tricks in index compaction to reduce the number of
interior nodes of the tree (like hashed indexing in the XFS and ext3
htree directories) but that still only gets us so far in reducing
seeks and doesn't help at all for tree traversals. That leaves us
with the btree block size as the only factor we can further vary to
reduce the depth of the tree. i.e. "m".

So we want to increase the filesystem block size to improve the
efficiency of our indexing. That improvement in efficiency
translates directly into better performance on seek constrained
storage subsystems.

The problem is this: to alter the fundamental block size of the
filesystem we also need to alter the data block size and that is
exactly the piece that linux does not support right now. So while
we have the capability to use large block sizes in certain
filesystems, we can't use that capability until the data path
supports it.

To summarise, large block size support in the filesystem is not
about "legacy" issues. It's about trying to cope with the rapid
expansion of storage capabilities of modern hardware where we have
to index much, much more data with a corresponding decrease in
the seek capability of the hardware.

> So get your stories straight, people.

Ok, so let's set the record straight. There were 3 justifications
for using *large pages* to *support* large filesystem block sizes.
The justifications for the variable order page cache with large
pages were:

	1. little code change needed in the filesystems
		-> still true

	2. Increased I/O sizes on 4k page machines (the "SCSI
	   controller problem")
		-> redundant thanks to Jens Axboe's quick work

	3. avoiding the need for vmap() as it has great
	   overhead and does not scale
		-> Nick is starting to work on that and has
		   already had good results.

Everyone seems to be focussing on #2 as the entire justification for
large block sizes in filesystems and that this is an "SGI" problem.
Nothing could be further from the truth - the truth is that large
pages solved multiple problems in one go. We now have a different,
better solution to #2, so please, please stop using that as some
justification for claiming filesystems don't need large block sizes.

However, all this doesn't change the fact that we have a major storage
scalability crunch coming in the next few years. Disk capacity is
likely to continue to double every 12 months for the next 3 or 4
years. Large block size support is only one mechanism we need to
help cope with this trend.

The variable order page cache with large pages was a means to an end
- it's not the only solution to this problem and I'm extremely happy
to see that there is progress on multiple fronts. That's the
strength of the Linux community showing through. In the end, I
really don't care how we end up supporting large filesystem block
sizes in the page cache - all I care about is that we end up
supporting it as efficiently and generically as we possibly can.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
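To put rough numbers on the depth argument, here is a small sketch that counts the levels a tree with a given fanout needs to index a given number of items. It reads the "roughly O((log n)/m)" remark as depth being about log n / log f for a per-block fanout f, which is an interpretation rather than something stated in the mail. The 16-byte index entry and the item count used in the note below are made-up illustration values, not figures from any real filesystem.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Levels needed for a tree with the given fanout to address nr_items
 * leaf entries: the smallest depth d such that fanout^d >= nr_items.
 * With one seek per level, this is also the seek count per lookup.
 */
static unsigned int btree_depth(uint64_t nr_items, uint64_t fanout)
{
	unsigned int depth = 0;
	uint64_t reach = 1;	/* items addressable with 'depth' levels */

	assert(fanout >= 2);

	while (reach < nr_items) {
		/* saturate rather than overflow for very deep trees */
		if (reach > UINT64_MAX / fanout)
			reach = UINT64_MAX;
		else
			reach *= fanout;
		depth++;
	}
	return depth;
}

/* Entries per index block, assuming fixed-size entries (hypothetical layout). */
static uint64_t block_fanout(uint64_t block_size, uint64_t entry_size)
{
	assert(entry_size > 0);
	return block_size / entry_size;
}
```

Under those assumptions, indexing 2^28 items (a terabyte in 4k units) takes 4 levels with 4k index blocks (fanout 256) and 3 levels with 64k index blocks (fanout 4096). One level fewer per lookup matters a lot more at 200 seeks/s than it did at 6000 seeks/s.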
[PATCH -mm] sound/hda: fix help text
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix hda help text typo.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 sound/pci/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-rc6-mm1.orig/sound/pci/Kconfig
+++ linux-2.6.23-rc6-mm1/sound/pci/Kconfig
@@ -506,7 +506,7 @@ config SND_HDA_HWDEP
 	select SND_HWDEP
 	help
 	  Say Y here to build a hwdep interface for HD-audio driver.
-	  This interface can be used for out-of-bound communication
+	  This interface can be used for out-of-band communication
 	  with codecs for debugging purposes.
 
 config SND_HDA_CODEC_REALTEK
[PATCH -mm] kgdb: fix help text
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix kgdb help text typos, grammar, config symbol names, and indentation.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 lib/Kconfig.kgdb | 42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

--- linux-2.6.23-rc6-mm1.orig/lib/Kconfig.kgdb
+++ linux-2.6.23-rc6-mm1/lib/Kconfig.kgdb
@@ -27,15 +27,15 @@ config KGDB_ARCH_HAS_SHADOW_INFO
 config KGDB_CONSOLE
 	bool "KGDB: Console messages through gdb"
 	depends on KGDB
-	 help
-	   If you say Y here, console messages will appear through gdb.
-	   Other consoles such as tty or ttyS will continue to work as usual.
-	   Note, that if you use this in conjunction with KGDB_ETH, if the
-	   ethernet driver runs into an error condition during use with KGDB
-	   it is possible to hit an infinite recusrion, causing the kernel
-	   to crash, and typically reboot. For this reason, it is preferable
-	   to use NETCONSOLE in conjunction with KGDB_ETH instead of
-	   KGDB_CONSOLE.
+	help
+	  If you say Y here, console messages will appear through gdb.
+	  Other consoles such as tty or ttyS will continue to work as usual.
+	  Note that if you use this in conjunction with KGDBOE, if the
+	  ethernet driver runs into an error condition during use with KGDB,
+	  it is possible to hit an infinite recursion, causing the kernel
+	  to crash, and typically reboot. For this reason, it is preferable
+	  to use NETCONSOLE in conjunction with KGDBOE instead of
+	  KGDB_CONSOLE.
 
 choice
 	prompt "Method for KGDB communication"
@@ -106,7 +106,7 @@ config KGDB_TXX9
 	bool "KGDB: On TX49xx serial port"
 	depends on MIPS && CPU_TX49XX
 	help
-	  Uses TX49xx serial port to communicate with the host KGDB.
+	  Uses TX49xx serial port to communicate with the host GDB.
 
 config KGDB_SH_SCI
 	bool "KGDB: On SH SCI(F) serial port"
@@ -251,20 +251,18 @@ config KGDB_8250_CONF_STRING
 	depends on KGDB_8250_NOMODULE && !KGDB_SIMPLE_SERIAL
 	default "io,2f8,115200,3" if X86
 	help
-	  The format of this string should be ,,,. For example, to use the
-	  serial port on an i386 box located at 0x2f8 and 115200 baud
-	  on IRQ 3 at use:
-	  io,2f8,115200,3
+	  The format of this string should be ,
+	  ,,. For example, on an i386 box,
+	  to use the serial port located at 0x2f8, IRQ 3, at 115200 baud
+	  use: io,2f8,115200,3
 
 config KGDB_ATTACH_WAIT
 	bool "KGDB: Wait for debugger to attach on an unknown exception"
 	default y if KGDB_8250_NOMODULE
 	default n if !KGDB_8250_NOMODULE
-	 help
-	   If a panic occurs, or any kind of exception the kgdb will
-	   stop and wait for a debugger to attach. This sets the
-	   default behavior for waiting for the debugger to attach. This
-	   value can also be changed at runtime through
-	   /sys/module/kgdb/paramaters/attachwait
-
+	help
+	  If a panic occurs, or any kind of exception, the kgdb will
+	  stop and wait for a debugger to attach. This sets the
+	  default behavior for waiting for the debugger to attach. This
+	  value can also be changed at runtime through
+	  /sys/module/kgdb/parameters/attachwait
[PATCH -mm] watchdog: fix help text
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix typos in uniform watchdog driver help text.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 drivers/watchdog/core/Kconfig |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- linux-2.6.23-rc6-mm1.orig/drivers/watchdog/core/Kconfig
+++ linux-2.6.23-rc6-mm1/drivers/watchdog/core/Kconfig
@@ -9,12 +9,12 @@ config WATCHDOG_CORE
 	depends on EXPERIMENTAL
 	default m
 	---help---
-	  Say Y here is you want to use the new uniform watchdog device
+	  Say Y here if you want to use the new uniform watchdog device
 	  driver. This driver provides a framework for all watchdog
 	  device drivers and gives them the /dev/watchdog interface (and
-	  later also the sysfs interface)
+	  later also the sysfs interface).
 
-	  At this moment only the iTCO_wdt driver uses this new frame-work.
+	  At this moment only the iTCO_wdt driver uses this new framework.
 
 	  To compile this driver as a module, choose M here: the module will
 	  be called watchdog_core.
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On 09/19/2007 06:33 AM, Linus Torvalds wrote:

> On Wed, 19 Sep 2007, Rene Herman wrote:

>> I do feel larger blocksizes continue to make sense in general though.
>> Packet writing on CD/DVD is a problem already today since the hardware
>> needs 32K or 64K blocks and I'd expect to see more of these and similar
>> situations when flash gets (even) more popular which it sort of
>> inevitably is going to be.
>
> .. that's what scatter-gather exists for.
>
> What's so hard with just realizing that physical memory isn't contiguous?
>
> It's why we have MMU's. It's why we have scatter-gather.

So if I understood that right, you'd suggest to deal with devices with
larger physical blocksizes at some level above the current blocklayer.

Not familiar enough with either block or fs to be able to argue that
effectively...

Rene.
[Patch 2/2] Relay reset consumed
This patch allows relay channels to be reset i.e. unconsumed.
Basically allows a 'rewind' function for flight-recorder tracing.

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/filesystems/relay.txt | 11 ++
 include/linux/relay.h |1 +
 kernel/relay.c | 58 ---
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
index 18d23f9..d31113a 100644
--- a/Documentation/filesystems/relay.txt
+++ b/Documentation/filesystems/relay.txt
@@ -161,6 +161,7 @@ TBD(curr. line MT:/API/)
     relay_close(chan)
     relay_flush(chan)
     relay_reset(chan)
+    relay_reset_consumed(chan)
 
   channel management typically called on instigation of userspace:
 
@@ -452,6 +453,16 @@ state without reallocating channel buffer memory or destroying
 existing mappings. It should however only be called when it's safe to
 do so, i.e. when the channel isn't currently being written to.
 
+The read(2) implementation always 'consumes' the bytes read,
+i.e. those bytes won't be available again to subsequent reads.
+Certain applications may nonetheless wish to allow the 'consumed' data
+to be re-read; relay_reset_consumed() is provided for that purpose -
+it resets the internal consumed counters for all buffers in the
+channel. For example, if a first set of reads 'drains' the channel,
+and then relay_reset_consumed() is called, a second set of reads will
+get the exact same data (assuming no new data was written between the
+first set of reads and the second).
+
 Finally, there are a couple of utility callbacks that can be used for
 different purposes. buf_mapped() is called whenever a channel buffer
 is mmapped from user space and buf_unmapped() is called when it's
diff --git a/include/linux/relay.h b/include/linux/relay.h
index 6cd8c44..aca45fa 100644
--- a/include/linux/relay.h
+++ b/include/linux/relay.h
@@ -175,6 +175,7 @@ extern void relay_subbufs_consumed(struct rchan *chan,
 				   unsigned int cpu,
 				   size_t consumed);
 extern void relay_reset(struct rchan *chan);
+extern void relay_reset_consumed(struct rchan *chan);
 extern int relay_buf_full(struct rchan_buf *buf);
 
 extern size_t relay_switch_subbuf(struct rchan_buf *buf,
diff --git a/kernel/relay.c b/kernel/relay.c
index 61134eb..6b55eaa 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -383,6 +383,57 @@ void relay_reset(struct rchan *chan)
 }
 EXPORT_SYMBOL_GPL(relay_reset);
 
+/**
+ *	__relay_reset_consumed - reset a channel buffer's consumed count
+ *	@buf: the channel buffer
+ *
+ *	See relay_reset_consumed for description of effect.
+ */
+static inline void __relay_reset_consumed(struct rchan_buf *buf)
+{
+	size_t n_subbufs = buf->chan->n_subbufs;
+	size_t produced = buf->subbufs_produced;
+	size_t consumed = buf->subbufs_consumed;
+
+	if (produced < n_subbufs)
+		buf->subbufs_consumed = 0;
+	else {
+		consumed = produced - n_subbufs;
+		if (buf->offset)
+			consumed++;
+		buf->subbufs_consumed = consumed;
+	}
+	buf->bytes_consumed = 0;
+}
+
+/**
+ *	relay_reset_consumed - reset the channel's consumed counts
+ *	@chan: the channel
+ *
+ *	This has the effect of making all data previously read (and
+ *	not overwritten by subsequent writes) from a channel available
+ *	for reading again.
+ *
+ *	NOTE: Care should be taken that the channel isn't actually
+ *	being used by anything when this call is made.
+ */
+void relay_reset_consumed(struct rchan *chan)
+{
+	unsigned int i;
+	struct rchan_buf *prev = NULL;
+
+	if (!chan)
+		return;
+
+	for (i = 0; i < NR_CPUS; i++) {
+		if (!chan->buf[i] || chan->buf[i] == prev)
+			break;
+		__relay_reset_consumed(chan->buf[i]);
+		prev = chan->buf[i];
+	}
+}
+EXPORT_SYMBOL_GPL(relay_reset_consumed);
+
 /*
  *	relay_open_buf - create a new relay channel buffer
  *
@@ -845,11 +896,8 @@ static int relay_file_read_avail(struct rchan_buf *buf, size_t read_pos)
 		return 1;
 	}
 
-	if (unlikely(produced - consumed >= n_subbufs)) {
-		consumed = produced - n_subbufs + 1;
-		buf->subbufs_consumed = consumed;
-		buf->bytes_consumed = 0;
-	}
+	if (unlikely(produced - consumed >= n_subbufs))
+		__relay_reset_consumed(buf);
 
 	produced = (produced % n_subbufs) * subbuf_size + buf->offset;
 	consumed = (consumed % n_subbufs) * subbuf_size + buf->bytes_consumed;
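The counter arithmetic in __relay_reset_consumed() is easy to try in userspace. What follows is a minimal model of it, not the kernel code: struct buf_model and model_reset_consumed() are hypothetical names, and the struct carries only the counters the helper reads or writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the per-buffer counters the patch touches. */
struct buf_model {
	size_t n_subbufs;		/* sub-buffers in the ring */
	size_t subbufs_produced;	/* sub-buffers completed by the writer */
	size_t subbufs_consumed;	/* sub-buffers handed to readers */
	size_t offset;			/* write offset in the current sub-buffer */
	size_t bytes_consumed;		/* read offset in the current sub-buffer */
};

/* Rewind the read position to the oldest data still present in the ring. */
static void model_reset_consumed(struct buf_model *buf)
{
	assert(buf != NULL && buf->n_subbufs > 0);

	if (buf->subbufs_produced < buf->n_subbufs) {
		/* ring has not wrapped yet: everything written is still there */
		buf->subbufs_consumed = 0;
	} else {
		/* ring has wrapped: oldest candidate is n_subbufs behind the writer */
		size_t oldest = buf->subbufs_produced - buf->n_subbufs;

		/* a partly written sub-buffer has already clobbered that slot */
		if (buf->offset)
			oldest++;
		buf->subbufs_consumed = oldest;
	}
	buf->bytes_consumed = 0;
}
```

With 4 sub-buffers and 10 produced, the rewind lands on sub-buffer 6 when the writer sits exactly on a boundary and on 7 when it is partway into the next one; before the ring wraps it lands on 0.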
[Patch 1/2] Trace code and documentation (updated)
Trace - Provides tracing primitives

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: Martin Hunt <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/trace/src/Makefile |7 +
 Documentation/trace/src/README | 18 +
 Documentation/trace/src/fork_trace.c | 103 ++
 Documentation/trace/trace.txt| 164 ++
 include/linux/trace.h| 99 ++
 lib/Kconfig |9 +
 lib/Makefile |2 +
 lib/trace.c | 563 +++ +++
 8 files changed, 965 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/src/Makefile b/Documentation/trace/src/Makefile
new file mode 100644
index 000..9ee4c72
--- /dev/null
+++ b/Documentation/trace/src/Makefile
@@ -0,0 +1,7 @@
+obj-m := fork_trace.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+	rm -f *.mod.c *.ko *.o
diff --git a/Documentation/trace/src/README b/Documentation/trace/src/README
new file mode 100644
index 000..f538491
--- /dev/null
+++ b/Documentation/trace/src/README
@@ -0,0 +1,18 @@
+This small sample module creates a trace channel. It places a kprobe
+on the function do_fork(). The value of current->pid is written to
+the trace channel each time the kprobe is hit..
+
+How to run the example:
+$ mount -t debugfs /debug
+$ make
+$ insmod fork_trace.ko
+
+To view the data produced by the module:
+$ cat /debug/trace_example/do_fork/trace0
+
+Remove the module.
+$ rmmod fork_trace
+
+The function trace_cleanup() is called when the module
+is removed. This will cause the TRACE channel to be destroyed and the
+corresponding files to disappear from the debug file system.
diff --git a/Documentation/trace/src/fork_trace.c b/Documentation/trace/src/fork_trace.c
new file mode 100644
index 000..7dad4cc
--- /dev/null
+++ b/Documentation/trace/src/fork_trace.c
@@ -0,0 +1,103 @@
+/* fork_trace.c - An example of using trace in a kprobes module */
+#include
+#include
+#include
+#include
+
+#define USE_GLOBAL_BUFFERS 1
+#define USE_FLIGHT 1
+
+#define PROBE_POINT "do_fork"
+
+static struct kprobe kp;
+static struct trace_info *kprobes_trace;
+
+#ifdef USE_GLOBAL_BUFFERS
+static DEFINE_SPINLOCK(trace_lock);
+#endif
+
+#define TRACE_PRINTF_TMPBUF_SIZE (1024)
+static char trace_tmpbuf[NR_CPUS][TRACE_PRINTF_TMPBUF_SIZE];
+
+static void trace_printf(struct trace_info *trace, const char *format, ...)
+{
+	va_list args;
+	void *buf;
+	char *record;
+	int len = 0;
+
+	if (!trace)
+		return;
+
+	buf = trace_tmpbuf[smp_processor_id()];
+
+#ifdef USE_GLOBAL_BUFFERS
+	spin_lock(&trace_lock);
+#endif
+
+	rcu_read_lock();
+	if (trace_running(trace)) {
+		va_start(args, format);
+		len = vscnprintf(buf, TRACE_PRINTF_TMPBUF_SIZE,
+				 format, args);
+		va_end(args);
+		record = relay_reserve(trace->rchan, len);
+		if (record)
+			memcpy(record, buf, len);
+	}
+	rcu_read_unlock();
+
+#ifdef USE_GLOBAL_BUFFERS
+	spin_unlock(&trace_lock);
+#endif
+}
+
+
+static int handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+	trace_printf(kprobes_trace, "%d\n", current->pid);
+	return 0;
+}
+
+
+int init_module(void)
+{
+	int ret;
+	u32 flags = 0;
+
+#ifdef USE_GLOBAL_BUFFERS
+	flags |= TRACE_GLOBAL_CHANNEL;
+#endif
+
+#ifdef USE_FLIGHT
+	flags |= TRACE_FLIGHT_CHANNEL;
+#endif
+
+	/* setup the trace */
+	kprobes_trace = trace_setup("trace_example", PROBE_POINT,
+				    1024, 8, flags);
+	if (IS_ERR(kprobes_trace))
+		return PTR_ERR(kprobes_trace);
+
+	trace_start(kprobes_trace);
+
+	/* setup the kprobe */
+	kp.pre_handler = handler_pre;
+	kp.post_handler = NULL;
+	kp.fault_handler = NULL;
+	kp.symbol_name = PROBE_POINT;
+	ret = register_kprobe(&kp);
+	if (ret) {
+		printk(KERN_ERR "fork_trace: register_kprobe failed\n");
+		return ret;
+	}
+	return 0;
+}
+
+void cleanup_module(void)
+{
+	unregister_kprobe(&kp);
+	trace_stop(kprobes_trace);
+	trace_cleanup(kprobes_trace);
+}
+MODULE_LICENSE("GPL");
diff --git a/Documentation/trace/trace.txt b/Documentation/trace/trace.txt
new file mode 100644
index 000..d88cb8f
--- /dev/null
+++ b/Documentation/trace/trace.txt
@@ -0,0 +1,164 @@
+Trace Setup and Control
+=======================
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space. The
+trace interface builds on the relay interface. For a complete
+description of the relay interface, please see:
+Documentation/filesystems/relay.txt.
+
+The trace
[Patch 0/2] A Kernel Tracing Interface (updated)
These patches provide a kernel tracing interface called "trace".

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that utilize the high
  performance and low overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ASCII trace data.
- Incorporate features from the systemtap runtime that are useful to
  others.

History:
Versions of this code have been submitted for review under a couple of
different names. The original submission was called UTT; it was later
re-submitted as GTSC. Christoph Hellwig commented "The code looks fine
...but the name is just dumb". Following Christoph's advice, I changed
the name to simply "Trace".

This patch addresses review comments made by Christoph Hellwig and
Mathieu Desnoyers. Changes include the addition of a mutex and
synchronization protecting trace state changes (using RCU) and the
reduction of the number of exports.

Patch updated Sep. 18, 2007:
Addressed further review comments by Andrew Morton, Randy Dunlap, and
Sam Ravnborg.

Patches are against 2.6.23-rc6-mm1.

Required patches:
1/2 Trace code and documentation
2/2 Relay Reset Consumed patch (required for trace's "rewind" feature)

Signed-off-by: David Wilder <[EMAIL PROTECTED]>
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Wed, 19 Sep 2007, Rene Herman wrote:
>
> I do feel larger blocksizes continue to make sense in general though. Packet
> writing on CD/DVD is a problem already today since the hardware needs 32K or
> 64K blocks and I'd expect to see more of these and similar situations when
> flash gets (even) more popular which it sort of inevitably is going to be.

.. that's what scatter-gather exists for.

What's so hard with just realizing that physical memory isn't contiguous?

It's why we have MMUs. It's why we have scatter-gather.

		Linus
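
To make the scatter-gather point concrete, here is a minimal userspace sketch, not kernel code: eight separately allocated 4 KiB buffers are written as one 32 KiB unit with a single POSIX writev() call, so the data never has to be contiguous in memory. The names write_block_scattered, SEG_SIZE and NSEGS are made up for this illustration, and the 32 KiB figure is simply borrowed from the packet-writing example quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SEG_SIZE 4096	/* size of each non-contiguous piece (assumed) */
#define NSEGS    8	/* 8 x 4 KiB = one 32 KiB logical block */

/*
 * Fill NSEGS separately allocated buffers and hand them to the kernel
 * in one writev() call. Returns the number of bytes written, or -1.
 */
static ssize_t write_block_scattered(int fd)
{
	struct iovec iov[NSEGS];
	ssize_t written = -1;
	int i, n = 0;

	for (i = 0; i < NSEGS; i++) {
		iov[i].iov_base = malloc(SEG_SIZE);
		if (!iov[i].iov_base)
			goto out;
		n++;
		/* Segment i is filled with 'a' + i so the order is visible. */
		memset(iov[i].iov_base, 'a' + i, SEG_SIZE);
		iov[i].iov_len = SEG_SIZE;
	}

	/* One system call, eight unrelated buffers, one logical block. */
	written = writev(fd, iov, NSEGS);
out:
	for (i = 0; i < n; i++)
		free(iov[i].iov_base);
	return written;
}
```

The caller only sees a single 32 KiB write; where the individual pieces live in memory is irrelevant to it, which is the property the reply above relies on.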
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On 09/19/2007 05:50 AM, Linus Torvalds wrote:

> On Wed, 19 Sep 2007, Rene Herman wrote:
>> Well, not so sure about that. What if one of your expected uses for example is
>> video data storage -- lots of data, especially for multiple streams, and needs
>> still relatively fast machinery. Why would you care for the overhead of
>> _small_ blocks?
>
> .. so work with an extent-based filesystem instead.
>
> 16k blocks are total idiocy. If this wasn't about a "support legacy
> customers", I think the whole patch-series has been a total waste of time.

Admittedly, extent-based might not be a particularly bad answer at least to
the I/O side of the equation...

I do feel larger blocksizes continue to make sense in general though. Packet
writing on CD/DVD is a problem already today since the hardware needs 32K or
64K blocks and I'd expect to see more of these and similar situations when
flash gets (even) more popular which it sort of inevitably is going to be.

Rene.
Re: [14/17] Allow bit_waitqueue to wait on a bit in a vmalloc area
Christoph Lameter wrote:
>
> +       if (is_vmalloc_addr(word))
> +               page = vmalloc_to_page(word)
                                              ^^
Missing ' ; '

> +       else
> +               page = virt_to_page(word);
> +
> +       zone = page_zone(page);
>         return &zone->wait_table[hash_long(val, zone->wait_table_bits)];
>  }
>  EXPORT_SYMBOL(bit_waitqueue);
>

Regards,
Gabriel
Re: [PATCH 3/3] Time to make CONFIG_PARAVIRT non-experimental.
Andi Kleen wrote:
> At least the Xen port seems to have specific requirements
> and essentially only work on xen-unstable (?) [or at least
> some very new Xen version] which probably very few
> people use.
>

Only on 64-bit hosts, because of bugs in the 64-bit compat layer.
32-on-32 and 64-on-64 (when it's done) should work fine.

BTW, what does "xm info" say on your system that fails? I'll try to put
a more graceful failure in there.

    J
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Wed, 19 Sep 2007, Rene Herman wrote:
>
> Well, not so sure about that. What if one of your expected uses for example is
> video data storage -- lots of data, especially for multiple streams, and needs
> still relatively fast machinery. Why would you care for the overhead of
> _small_ blocks?

.. so work with an extent-based filesystem instead.

16k blocks are total idiocy. If this wasn't about a "support legacy
customers", I think the whole patch-series has been a total waste of time.

		Linus
Re: [git] CFS-devel, group scheduler, fixes
On Tue, Sep 18, 2007 at 10:22:43PM +0200, Ingo Molnar wrote:
> (I have not tested the group scheduling bits but perhaps Srivatsa would
> like to do that?)

Ingo,
	I plan to test it today and send you any updates that may be
required.

-- 
Regards,
vatsa
[05/17] vunmap: return page array
Make vunmap return the page array that was used at vmap. This is useful if one has no structures to track the page array but simply stores the virtual address somewhere. The disposition of the page array can be decided upon after vunmap. vfree() may now also be used instead of vunmap which will release the page array after vunmap'ping it. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/vmalloc.h |2 +- mm/vmalloc.c| 26 -- 2 files changed, 17 insertions(+), 11 deletions(-) Index: linux-2.6/include/linux/vmalloc.h === --- linux-2.6.orig/include/linux/vmalloc.h 2007-09-18 13:22:56.0 -0700 +++ linux-2.6/include/linux/vmalloc.h 2007-09-18 13:22:57.0 -0700 @@ -49,7 +49,7 @@ extern void vfree(const void *addr); extern void *vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot); -extern void vunmap(const void *addr); +extern struct page **vunmap(const void *addr); extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff); Index: linux-2.6/mm/vmalloc.c === --- linux-2.6.orig/mm/vmalloc.c 2007-09-18 13:22:56.0 -0700 +++ linux-2.6/mm/vmalloc.c 2007-09-18 13:22:57.0 -0700 @@ -356,17 +356,18 @@ struct vm_struct *remove_vm_area(const v return v; } -static void __vunmap(const void *addr, int deallocate_pages) +static struct page **__vunmap(const void *addr, int deallocate_pages) { struct vm_struct *area; + struct page **pages; if (!addr) - return; + return NULL; if ((PAGE_SIZE-1) & (unsigned long)addr) { printk(KERN_ERR "Trying to vfree() bad address (%p)\n", addr); WARN_ON(1); - return; + return NULL; } area = remove_vm_area(addr); @@ -374,29 +375,30 @@ static void __vunmap(const void *addr, i printk(KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n", addr); WARN_ON(1); - return; + return NULL; } + pages = area->pages; debug_check_no_locks_freed(addr, area->size); if (deallocate_pages) { int i; for (i = 0; i < area->nr_pages; i++) { - struct page *page = area->pages[i]; + struct page *page = 
pages[i]; BUG_ON(!page); __free_page(page); } if (area->flags & VM_VPAGES) - vfree(area->pages); + vfree(pages); else - kfree(area->pages); + kfree(pages); } kfree(area); - return; + return pages; } /** @@ -424,11 +426,13 @@ EXPORT_SYMBOL(vfree); * which was created from the page array passed to vmap(). * * Must not be called in interrupt context. + * + * Returns a pointer to the array of pointers to page structs */ -void vunmap(const void *addr) +struct page **vunmap(const void *addr) { BUG_ON(in_interrupt()); - __vunmap(addr, 0); + return __vunmap(addr, 0); } EXPORT_SYMBOL(vunmap); @@ -453,6 +457,8 @@ void *vmap(struct page **pages, unsigned area = get_vm_area((count << PAGE_SHIFT), flags); if (!area) return NULL; + area->pages = pages; + area->nr_pages = count; if (map_vm_area(area, prot, )) { vunmap(area->addr); return NULL; -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On 09/18/2007 09:44 PM, Linus Torvalds wrote:

> Nobody sane would *ever* argue for 16kB+ blocksizes in general.

Well, not so sure about that. What if one of your expected uses for example
is video data storage -- lots of data, especially for multiple streams, and
needs still relatively fast machinery. Why would you care for the overhead
of _small_ blocks?

Okay, maybe that's covered in the "in general" but it's not extremely
oddball either...

Rene.
[07/17] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings
This adds a new gfp flag __GFP_VFALLBACK If specified during a higher order allocation then the system will fall back to vmap and attempt to create a virtually contiguous area instead of a physically contiguous area. In many cases the virtually contiguous area can stand in for the physically contiguous area (with some loss of performance). The pages used for VFALLBACK are marked with a new flag PageVcompound(page). The mark is necessary since we have to know upon free if we have to destroy a virtual mapping. No additional flag is consumed through the use of PG_swapcache together with PG_compound (similar to PageHead() and PageTail()). Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/gfp.h|5 + include/linux/page-flags.h | 18 +++ mm/page_alloc.c| 113 ++--- 3 files changed, 130 insertions(+), 6 deletions(-) Index: linux-2.6/mm/page_alloc.c === --- linux-2.6.orig/mm/page_alloc.c 2007-09-18 17:03:54.0 -0700 +++ linux-2.6/mm/page_alloc.c 2007-09-18 18:25:46.0 -0700 @@ -1230,6 +1230,86 @@ try_next_zone: } /* + * Virtual Compound Page support. + * + * Virtual Compound Pages are used to fall back to order 0 allocations if large + * linear mappings are not available and __GFP_VFALLBACK is set. They are + * formatted according to compound page conventions. I.e. following + * page->first_page if PageTail(page) is set can be used to determine the + * head page. 
+ */ +struct page *vcompound_alloc(gfp_t gfp_mask, int order, + struct zonelist *zonelist, unsigned long alloc_flags) +{ + void *addr; + struct page *page; + int i; + int nr_pages = 1 << order; + struct page **pages = kzalloc((nr_pages + 1) * sizeof(struct page *), + gfp_mask & GFP_LEVEL_MASK); + + if (!pages) + return NULL; + + for (i = 0; i < nr_pages; i++) { + page = get_page_from_freelist(gfp_mask & ~__GFP_VFALLBACK, + 0, zonelist, alloc_flags); + if (!page) + goto abort; + + /* Sets PageCompound which makes PageHead(page) true */ + __SetPageVcompound(page); + if (i) { + page->first_page = pages[0]; + __SetPageTail(page); + } + pages[i] = page; + } + + addr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); + if (!addr) + goto abort; + + return pages[0]; + +abort: + for (i = 0; i < nr_pages; i++) { + page = pages[i]; + if (!page) + continue; + __ClearPageTail(page); + __ClearPageHead(page); + __ClearPageVcompound(page); + __free_page(page); + } + kfree(pages); + return NULL; +} + +static void vcompound_free(void *addr) +{ + struct page **pages = vunmap(addr); + int i; + + /* +* First page will have zero refcount since it maintains state +* for the compound and was decremented before we got here. +*/ + __ClearPageHead(pages[0]); + __ClearPageVcompound(pages[0]); + free_hot_page(pages[0]); + + for (i = 1; pages[i]; i++) { + struct page *page = pages[i]; + + __ClearPageTail(page); + __ClearPageVcompound(page); + __free_page(page); + } + kfree(pages); +} + +/* * This is the 'heart' of the zoned buddy allocator. 
*/ struct page * fastcall @@ -1324,12 +1404,12 @@ nofail_alloc: goto nofail_alloc; } } - goto nopage; + goto try_vcompound; } /* Atomic allocations - we can't balance anything */ if (!wait) - goto nopage; + goto try_vcompound; cond_resched(); @@ -1360,6 +1440,11 @@ nofail_alloc: */ page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET); + + if (!page && order && (gfp_mask & __GFP_VFALLBACK)) + page = vcompound_alloc(gfp_mask, order, + zonelist, alloc_flags); + if (page) goto got_pg; @@ -1391,6 +1476,14 @@ nofail_alloc: goto rebalance; } +try_vcompound: + /* Last chance before failing the allocation */ + if (order && (gfp_mask & __GFP_VFALLBACK)) { + page = vcompound_alloc(gfp_mask, order, + zonelist, alloc_flags); + if (page) + goto got_pg; +
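
The bookkeeping that vcompound_alloc() and vcompound_free() rely on can be shown in isolation. The patch allocates room for nr_pages + 1 pointers with kzalloc(), so the page array is always NULL-terminated and the free path can walk it without knowing the order. Below is a rough userspace sketch of only that convention, with malloc() standing in for order-0 page allocation; pages_alloc, pages_free and PAGE_BYTES are hypothetical names, and none of the page-flag or vmap() handling is modelled.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_BYTES 4096	/* stand-in for PAGE_SIZE (assumed) */

/*
 * Allocate 2^order separate "pages" and track them in a zeroed array
 * with one extra slot, so the array is always NULL-terminated.
 */
static void **pages_alloc(int order)
{
	size_t nr = (size_t)1 << order;
	void **pages = calloc(nr + 1, sizeof(*pages));
	size_t i;

	if (!pages)
		return NULL;

	for (i = 0; i < nr; i++) {
		pages[i] = malloc(PAGE_BYTES);
		if (!pages[i])
			goto abort;
	}
	return pages;

abort:
	/* Unfilled slots are still NULL, and free(NULL) is a no-op. */
	for (i = 0; i < nr; i++)
		free(pages[i]);
	free(pages);
	return NULL;
}

/*
 * Release every page up to the NULL terminator, then the array itself.
 * Returns the number of pages released.
 */
static size_t pages_free(void **pages)
{
	size_t i;

	for (i = 0; pages[i]; i++)
		free(pages[i]);
	free(pages);
	return i;
}
```

The extra slot costs one pointer per allocation and saves the free path from having to recover the order from somewhere else.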
[06/17] vmalloc_address(): Determine vmalloc address from page struct
Sometimes we need to figure out which vmalloc address is in use for a certain page struct. There is no easy way to figure out the vmalloc address from the page struct. So simply search through the kernel page table to find the address. This is a fairly expensive process. Use sparingly (or provide a better implementation). Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/vmalloc.h |3 + mm/vmalloc.c| 77 2 files changed, 80 insertions(+) Index: linux-2.6/mm/vmalloc.c === --- linux-2.6.orig/mm/vmalloc.c 2007-09-18 18:35:13.0 -0700 +++ linux-2.6/mm/vmalloc.c 2007-09-18 18:35:18.0 -0700 @@ -196,6 +196,83 @@ struct page *vmalloc_to_page(const void EXPORT_SYMBOL(vmalloc_to_page); /* + * Determine vmalloc address from a page struct. + * + * Linear search through all ptes of the vmalloc area. + */ +static unsigned long vaddr_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, unsigned long pfn) +{ + pte_t *pte; + + pte = pte_offset_kernel(pmd, addr); + do { + pte_t ptent = *pte; + if (pte_present(ptent) && pte_pfn(ptent) == pfn) + return addr; + } while (pte++, addr += PAGE_SIZE, addr != end); + return 0; +} + +static inline unsigned long vaddr_pmd_range(pud_t *pud, unsigned long addr, + unsigned long end, unsigned long pfn) +{ + pmd_t *pmd; + unsigned long next; + unsigned long n; + + pmd = pmd_offset(pud, addr); + do { + next = pmd_addr_end(addr, end); + if (pmd_none_or_clear_bad(pmd)) + continue; + n = vaddr_pte_range(pmd, addr, next, pfn); + if (n) + return n; + } while (pmd++, addr = next, addr != end); + return 0; +} + +static inline unsigned long vaddr_pud_range(pgd_t *pgd, unsigned long addr, + unsigned long end, unsigned long pfn) +{ + pud_t *pud; + unsigned long next; + unsigned long n; + + pud = pud_offset(pgd, addr); + do { + next = pud_addr_end(addr, end); + if (pud_none_or_clear_bad(pud)) + continue; + n = vaddr_pmd_range(pud, addr, next, pfn); + if (n) + return n; + } while (pud++, addr = next, addr != end); + return 0; +} + 
+void *vmalloc_address(struct page *page) +{ + pgd_t *pgd; + unsigned long next, n; + unsigned long addr = VMALLOC_START; + unsigned long pfn = page_to_pfn(page); + + pgd = pgd_offset_k(VMALLOC_START); + do { + next = pgd_addr_end(addr, VMALLOC_END); + if (pgd_none_or_clear_bad(pgd)) + continue; + n = vaddr_pud_range(pgd, addr, next, pfn); + if (n) + return (void *)n; + } while (pgd++, addr = next, addr < VMALLOC_END); + return NULL; +} +EXPORT_SYMBOL(vmalloc_address); + +/* * Map a vmalloc()-space virtual address to the physical page frame number. */ unsigned long vmalloc_to_pfn(const void *vmalloc_addr) Index: linux-2.6/include/linux/vmalloc.h === --- linux-2.6.orig/include/linux/vmalloc.h 2007-09-18 18:35:13.0 -0700 +++ linux-2.6/include/linux/vmalloc.h 2007-09-18 18:35:48.0 -0700 @@ -85,6 +85,9 @@ extern void free_vm_area(struct vm_struc struct page *vmalloc_to_page(const void *addr); unsigned long vmalloc_to_pfn(const void *addr); +/* Determine address from page struct pointer */ +void *vmalloc_address(struct page *); + /* * Internals. Dont't use.. */ -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
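
Why the commit message calls this lookup expensive is easiest to see on a flattened model. The sketch below is a userspace illustration under simplifying assumptions, not the kernel implementation: the multi-level page table is replaced by a plain array with one entry per slot, and slot_address, SLOT_SIZE and NO_PFN are invented names. The cost is the same in spirit: every mapped slot may have to be inspected before the matching frame is found.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 4096UL	/* bytes covered by one table entry (assumed) */
#define NO_PFN    0UL		/* marks an unmapped slot in this model */

/*
 * Return the virtual address of the first slot that maps 'pfn',
 * or 0 if no slot maps it. Linear in the size of the mapped area.
 */
static unsigned long slot_address(const unsigned long *slot_pfn,
				  size_t nslots, unsigned long base,
				  unsigned long pfn)
{
	size_t i;

	for (i = 0; i < nslots; i++) {
		if (slot_pfn[i] != NO_PFN && slot_pfn[i] == pfn)
			return base + i * SLOT_SIZE;
	}
	return 0;
}
```

A caller that needs the address often would do better to remember it at mapping time, which is the motivation behind the "use sparingly" remark in the commit message.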
[15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
SLAB_VFALLBACK can be specified for selected slab caches. If fallback is available then the conservative settings for higher order allocations are overridden. We then request an order that can accomodate at mininum 100 objects. The size of an individual slab allocation is allowed to reach up to 256k (order 6 on i386, order 4 on IA64). Implementing fallback requires special handling of virtual mappings in the free path. However, the impact is minimal since we already check the address if its NULL or ZERO_SIZE_PTR. No additional cachelines are touched if we do not fall back. However, if we need to handle a virtual compound page then walk the kernel page table in the free paths to determine the page struct. We also need special handling in the allocation paths since the virtual addresses cannot be obtained via page_address(). SLUB exploits that page->private is set to the vmalloc address to avoid a costly vmalloc_address(). However, for diagnostics there is still the need to determine the vmalloc address from the page struct. There we must use the costly vmalloc_address(). Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slab.h |1 include/linux/slub_def.h |1 mm/slub.c| 83 --- 3 files changed, 60 insertions(+), 25 deletions(-) Index: linux-2.6/include/linux/slab.h === --- linux-2.6.orig/include/linux/slab.h 2007-09-18 17:03:30.0 -0700 +++ linux-2.6/include/linux/slab.h 2007-09-18 17:07:39.0 -0700 @@ -19,6 +19,7 @@ * The ones marked DEBUG are only valid if CONFIG_SLAB_DEBUG is set. 
*/ #define SLAB_DEBUG_FREE0x0100UL/* DEBUG: Perform (expensive) checks on free */ +#define SLAB_VFALLBACK 0x0200UL/* May fall back to vmalloc */ #define SLAB_RED_ZONE 0x0400UL/* DEBUG: Red zone objs in a cache */ #define SLAB_POISON0x0800UL/* DEBUG: Poison objects */ #define SLAB_HWCACHE_ALIGN 0x2000UL/* Align objs on cache lines */ Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-09-18 17:03:30.0 -0700 +++ linux-2.6/mm/slub.c 2007-09-18 18:13:38.0 -0700 @@ -20,6 +20,7 @@ #include #include #include +#include /* * Lock order: @@ -277,6 +278,26 @@ static inline struct kmem_cache_node *ge #endif } +static inline void *slab_address(struct page *page) +{ + if (unlikely(PageVcompound(page))) + return vmalloc_address(page); + else + return page_address(page); +} + +static inline struct page *virt_to_slab(const void *addr) +{ + struct page *page; + + if (unlikely(is_vmalloc_addr(addr))) + page = vmalloc_to_page(addr); + else + page = virt_to_page(addr); + + return compound_head(page); +} + static inline int check_valid_pointer(struct kmem_cache *s, struct page *page, const void *object) { @@ -285,7 +306,7 @@ static inline int check_valid_pointer(st if (!object) return 1; - base = page_address(page); + base = slab_address(page); if (object < base || object >= base + s->objects * s->size || (object - base) % s->size) { return 0; @@ -470,7 +491,7 @@ static void slab_fix(struct kmem_cache * static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p) { unsigned int off; /* Offset of last byte */ - u8 *addr = page_address(page); + u8 *addr = slab_address(page); print_tracking(s, p); @@ -648,7 +669,7 @@ static int slab_pad_check(struct kmem_ca if (!(s->flags & SLAB_POISON)) return 1; - start = page_address(page); + start = slab_address(page); end = start + (PAGE_SIZE << s->order); length = s->objects * s->size; remainder = end - (start + length); @@ -1040,11 +1061,7 @@ static struct page *allocate_slab(struct struct page * page; int pages = 1 << 
s->order; - if (s->order) - flags |= __GFP_COMP; - - if (s->flags & SLAB_CACHE_DMA) - flags |= SLUB_DMA; + flags |= s->gfpflags; if (node == -1) page = alloc_pages(flags, s->order); @@ -1098,7 +1115,11 @@ static struct page *new_slab(struct kmem SLAB_STORE_USER | SLAB_TRACE)) SetSlabDebug(page); - start = page_address(page); + if (!PageVcompound(page)) + start = slab_address(page); + else + start = (void *)page->private; + end = start + s->objects * s->size; if (unlikely(s->flags & SLAB_POISON)) @@ -1130,7 +1151,7 @@ static void __free_slab(struct kmem_cach void *p;
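
The sizing rule from the commit message (at least 100 objects per slab, slab size capped at 256k) can be checked with a few lines of arithmetic. This is only a sketch of the stated rule and not SLUB's actual order calculation; order_for and its parameters are hypothetical, and the 16 KiB page size used for IA64 in the usage note is an assumption about the configuration.

```c
#include <assert.h>

/*
 * Smallest order whose slab holds at least 'min_objects' objects,
 * without letting the slab grow beyond 'max_slab_bytes'.
 */
static int order_for(unsigned long page_size, unsigned long obj_size,
		     unsigned long min_objects, unsigned long max_slab_bytes)
{
	int order = 0;

	while ((page_size << order) / obj_size < min_objects &&
	       (page_size << (order + 1)) <= max_slab_bytes)
		order++;

	return order;
}
```

With 4 KiB pages, 256-byte objects reach 128 objects per slab at order 3. An 8 KiB object can never reach 100 objects within the cap, so the order stops at the 256k limit: order 6 with 4 KiB pages and order 4 with 16 KiB pages, matching the figures quoted in the message.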
[10/17] Use GFP_VFALLBACK for sparsemem.
Sparsemem currently attempts first to do a physically contiguous mapping
and then falls back to vmalloc. The same thing can now be accomplished
using GFP_VFALLBACK.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/sparse.c |   23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

Index: linux-2.6/mm/sparse.c
===
--- linux-2.6.orig/mm/sparse.c	2007-09-18 13:21:44.0 -0700
+++ linux-2.6/mm/sparse.c	2007-09-18 13:28:43.0 -0700
@@ -269,32 +269,15 @@ void __init sparse_init(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 {
-	struct page *page, *ret;
 	unsigned long memmap_size = sizeof(struct page) * nr_pages;
 
-	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
-	if (page)
-		goto got_map_page;
-
-	ret = vmalloc(memmap_size);
-	if (ret)
-		goto got_map_ptr;
-
-	return NULL;
-got_map_page:
-	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
-got_map_ptr:
-	memset(ret, 0, memmap_size);
-
-	return ret;
+	return (struct page *)alloc_page(GFP_VFALLBACK|__GFP_ZERO,
+			get_order(memmap_size));
 }
 
 static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
 {
-	if (is_vmalloc_addr(memmap))
-		vfree(memmap);
-	else
-		free_pages((unsigned long)memmap,
+	free_pages((unsigned long)memmap,
 			get_order(sizeof(struct page) * nr_pages));
 }
 
-- 
[09/17] VFALLBACK: Debugging aid
Virtual fallbacks are rare and thus subtle bugs may creep in if we do not
test the fallbacks. CONFIG_VFALLBACK_ALWAYS makes all GFP_VFALLBACK
allocations fall back to virtual mapping.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 lib/Kconfig.debug |   11 +++++++++++
 mm/page_alloc.c   |    9 +++++++++
 2 files changed, 20 insertions(+)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c	2007-09-18 19:19:34.0 -0700
+++ linux-2.6/mm/page_alloc.c	2007-09-18 20:16:26.0 -0700
@@ -1205,7 +1205,16 @@ zonelist_scan:
 				goto this_zone_full;
 			}
 		}
 
+#ifdef CONFIG_VFALLBACK_ALWAYS
+		if ((gfp_mask & __GFP_VFALLBACK) &&
+				system_state == SYSTEM_RUNNING) {
+			struct page *vcompound_alloc(gfp_t, int,
+					struct zonelist *, unsigned long);
+			page = vcompound_alloc(gfp_mask, order,
+					zonelist, alloc_flags);
+		} else
+#endif
 		page = buffered_rmqueue(zonelist, zone, order, gfp_mask);
 		if (page)
 			break;
Index: linux-2.6/lib/Kconfig.debug
===
--- linux-2.6.orig/lib/Kconfig.debug	2007-09-18 19:19:28.0 -0700
+++ linux-2.6/lib/Kconfig.debug	2007-09-18 19:19:34.0 -0700
@@ -105,6 +105,17 @@ config DETECT_SOFTLOCKUP
 	  can be detected via the NMI-watchdog, on platforms that
 	  support it.)
 
+config VFALLBACK_ALWAYS
+	bool "Always fall back to Virtual Compound pages"
+	default y
+	help
+	  Virtual compound pages are only allocated if there is no linear
+	  memory available. They are a fallback and errors created by the
+	  use of virtual mappings instead of linear ones may not surface
+	  because of their infrequent use. This option makes every
+	  allocation that allows a fallback to a virtual mapping use
+	  the virtual mapping. May have a significant performance impact.
+
 config SCHED_DEBUG
 	bool "Collect scheduler debugging info"
 	depends on DEBUG_KERNEL && PROC_FS

-- 
[17/17] Allow virtual fallback for dentries
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/dcache.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/dcache.c
===
--- linux-2.6.orig/fs/dcache.c	2007-09-18 18:42:19.0 -0700
+++ linux-2.6/fs/dcache.c	2007-09-18 18:42:55.0 -0700
@@ -2118,7 +2118,8 @@ static void __init dcache_init(unsigned
 	 * of the dcache.
 	 */
 	dentry_cache = KMEM_CACHE(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
+		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|
+		SLAB_VFALLBACK);
 
 	register_shrinker(_shrinker);
 

-- 
[13/17] Virtual compound page freeing in interrupt context
If we are in an interrupt context then simply defer the free via a workqueue.

In an interrupt context it is not possible to use vmalloc_address() to
determine the vmalloc address. So add a variant that does that too.

Removing a virtual mapping *must* be done with interrupts enabled
since tlb_xx functions are called that rely on interrupts for
processor to processor communications.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c	2007-09-18 20:10:55.0 -0700
+++ linux-2.6/mm/page_alloc.c	2007-09-18 20:11:40.0 -0700
@@ -1297,7 +1297,12 @@ abort:
 	return NULL;
 }
 
-static void vcompound_free(void *addr)
+/*
+ * Virtual Compound freeing functions. This is complicated by the vmalloc
+ * layer not being able to free virtual allocations when interrupts are
+ * disabled. So we defer the frees via a workqueue if necessary.
+ */
+static void __vcompound_free(void *addr)
 {
 	struct page **pages = vunmap(addr);
 	int i;
@@ -1320,6 +1325,22 @@ static void vcompound_free(void *addr)
 	kfree(pages);
 }
 
+static void vcompound_free_work(struct work_struct *w)
+{
+	__vcompound_free((void *)w);
+}
+
+static void vcompound_free(void *addr)
+{
+	if (in_interrupt()) {
+		struct work_struct *w = addr;
+
+		INIT_WORK(w, vcompound_free_work);
+		schedule_work(w);
+	} else
+		__vcompound_free(addr);
+}
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */

-- 
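
The trick in vcompound_free() is that the memory being freed doubles as the bookkeeping for its own deferred free: the work_struct is placed at the start of the area that is about to go away. A single-threaded userspace sketch of the same idea follows, with a plain linked list standing in for the workqueue. All names here (deferred_free, drain_deferred, freed_count) are invented for the illustration, and the in_atomic flag is passed explicitly because userspace has no in_interrupt().

```c
#include <assert.h>
#include <stdlib.h>

/* Queue node stored inside the block that is waiting to be freed. */
struct deferred_node {
	struct deferred_node *next;
};

static struct deferred_node *deferred_head;
static int freed_count;		/* only here so the behaviour is observable */

static void really_free(void *addr)
{
	free(addr);
	freed_count++;
}

/*
 * Free 'addr' now if the context allows it, otherwise queue it.
 * The block must be at least sizeof(struct deferred_node) bytes,
 * because its first bytes are reused as the queue link.
 */
static void deferred_free(void *addr, int in_atomic)
{
	if (in_atomic) {
		struct deferred_node *n = addr;

		n->next = deferred_head;
		deferred_head = n;
	} else {
		really_free(addr);
	}
}

/* Called later from a context where freeing is safe. */
static int drain_deferred(void)
{
	int n = 0;

	while (deferred_head) {
		struct deferred_node *node = deferred_head;

		deferred_head = node->next;
		really_free(node);
		n++;
	}
	return n;
}
```

No extra allocation is needed on the deferral path, which matters because that path runs exactly when allocating would be awkward. The price is that the old contents of the block are overwritten as soon as the free is queued.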
[14/17] Allow bit_waitqueue to wait on a bit in a vmalloc area
If bit waitqueue is passed a virtual address then it must use
vmalloc_to_page instead of virt_to_page to get to the page struct.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 kernel/wait.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/wait.c
===
--- linux-2.6.orig/kernel/wait.c	2007-09-18 19:19:27.0 -0700
+++ linux-2.6/kernel/wait.c	2007-09-18 20:10:39.0 -0700
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 
 void init_waitqueue_head(wait_queue_head_t *q)
 {
@@ -245,9 +246,16 @@ EXPORT_SYMBOL(wake_up_bit);
 fastcall wait_queue_head_t *bit_waitqueue(void *word, int bit)
 {
 	const int shift = BITS_PER_LONG == 32 ? 5 : 6;
-	const struct zone *zone = page_zone(virt_to_page(word));
 	unsigned long val = (unsigned long)word << shift | bit;
+	struct page *page;
+	struct zone *zone;
 
+	if (is_vmalloc_addr(word))
+		page = vmalloc_to_page(word)
+	else
+		page = virt_to_page(word);
+
+	zone = page_zone(page);
 	return &zone->wait_table[hash_long(val, zone->wait_table_bits)];
 }
 EXPORT_SYMBOL(bit_waitqueue);

-- 
[16/17] Allow virtual fallback for buffer_heads
This is particularly useful for large I/Os because it will allow > 100
allocs from the SLUB fast path without having to go to the page allocator.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/buffer.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c	2007-09-18 15:44:37.0 -0700
+++ linux-2.6/fs/buffer.c	2007-09-18 15:44:51.0 -0700
@@ -3008,7 +3008,8 @@ void __init buffer_init(void)
 	int nrpages;
 
 	bh_cachep = KMEM_CACHE(buffer_head,
-			SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
+			SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|
+			SLAB_VFALLBACK);
 
 	/*
 	 * Limit the bh occupancy to 10% of ZONE_NORMAL

-- 
[11/17] GFP_VFALLBACK for zone wait table.
Currently we have to use vmalloc for the zone wait table, possibly
generating the need to create lots of TLB entries to access the tables.
We can now use GFP_VFALLBACK to attempt the use of a physically contiguous
page that can then use the large kernel TLBs.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c	2007-09-18 14:29:05.0 -0700
+++ linux-2.6/mm/page_alloc.c	2007-09-18 14:29:10.0 -0700
@@ -2572,7 +2572,9 @@ int zone_wait_table_init(struct zone *zo
 		 * To use this new node's memory, further consideration will be
 		 * necessary.
 		 */
-		zone->wait_table = (wait_queue_head_t *)vmalloc(alloc_size);
+		zone->wait_table = (wait_queue_head_t *)
+			__get_free_pages(GFP_VFALLBACK,
+					get_order(alloc_size));
 	}
 	if (!zone->wait_table)
 		return -ENOMEM;

-- 
[12/17] Virtual Compound page allocation from interrupt context.
In an interrupt context we cannot wait for the vmlist_lock in __get_vm_area_node(). So use a trylock instead. If the trylock fails then the atomic allocation will fail and subsequently be retried.

This only works because the flush_cache_vunmap in use for allocation never performs any IPIs, in contrast to flush_tlb_... in use for freeing.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/vmalloc.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c	2007-09-18 10:52:11.0 -0700
+++ linux-2.6/mm/vmalloc.c	2007-09-18 10:54:21.0 -0700
@@ -289,7 +289,6 @@ static struct vm_struct *__get_vm_area_n
 	unsigned long align = 1;
 	unsigned long addr;
 
-	BUG_ON(in_interrupt());
 	if (flags & VM_IOREMAP) {
 		int bit = fls(size);
 
@@ -314,7 +313,14 @@ static struct vm_struct *__get_vm_area_n
 	 */
 	size += PAGE_SIZE;
 
-	write_lock(&vmlist_lock);
+	if (gfp_mask & __GFP_WAIT)
+		write_lock(&vmlist_lock);
+	else {
+		if (!write_trylock(&vmlist_lock)) {
+			kfree(area);
+			return NULL;
+		}
+	}
 	for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
 		if ((unsigned long)tmp->addr < addr) {
 			if((unsigned long)tmp->addr + tmp->size >= addr)
[03/17] is_vmalloc_addr(): Check if an address is within the vmalloc boundaries
This test is used in a couple of places. Add a version to vmalloc.h and replace the other checks. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- drivers/net/cxgb3/cxgb3_offload.c |4 +--- fs/ntfs/malloc.h |3 +-- fs/proc/kcore.c |2 +- fs/xfs/linux-2.6/kmem.c |3 +-- fs/xfs/linux-2.6/xfs_buf.c|3 +-- include/linux/mm.h|8 mm/sparse.c | 10 +- 7 files changed, 14 insertions(+), 19 deletions(-) Index: linux-2.6/include/linux/mm.h === --- linux-2.6.orig/include/linux/mm.h 2007-09-17 21:46:06.0 -0700 +++ linux-2.6/include/linux/mm.h2007-09-17 23:56:54.0 -0700 @@ -1158,6 +1158,14 @@ static inline unsigned long vma_pages(st return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; } +/* Determine if an address is within the vmalloc range */ +static inline int is_vmalloc_addr(const void *x) +{ + unsigned long addr = (unsigned long)x; + + return addr >= VMALLOC_START && addr < VMALLOC_END; +} + pgprot_t vm_get_page_prot(unsigned long vm_flags); struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, Index: linux-2.6/mm/sparse.c === --- linux-2.6.orig/mm/sparse.c 2007-09-17 21:45:24.0 -0700 +++ linux-2.6/mm/sparse.c 2007-09-17 23:56:26.0 -0700 @@ -289,17 +289,9 @@ got_map_ptr: return ret; } -static int vaddr_in_vmalloc_area(void *addr) -{ - if (addr >= (void *)VMALLOC_START && - addr < (void *)VMALLOC_END) - return 1; - return 0; -} - static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) { - if (vaddr_in_vmalloc_area(memmap)) + if (is_vmalloc_addr(memmap)) vfree(memmap); else free_pages((unsigned long)memmap, Index: linux-2.6/drivers/net/cxgb3/cxgb3_offload.c === --- linux-2.6.orig/drivers/net/cxgb3/cxgb3_offload.c2007-09-17 21:45:24.0 -0700 +++ linux-2.6/drivers/net/cxgb3/cxgb3_offload.c 2007-09-17 21:46:06.0 -0700 @@ -1035,9 +1035,7 @@ void *cxgb_alloc_mem(unsigned long size) */ void cxgb_free_mem(void *addr) { - unsigned long p = (unsigned long)addr; - - if (p 
>= VMALLOC_START && p < VMALLOC_END) + if (is_vmalloc_addr(addr)) vfree(addr); else kfree(addr); Index: linux-2.6/fs/ntfs/malloc.h === --- linux-2.6.orig/fs/ntfs/malloc.h 2007-09-17 21:45:24.0 -0700 +++ linux-2.6/fs/ntfs/malloc.h 2007-09-17 21:46:06.0 -0700 @@ -85,8 +85,7 @@ static inline void *ntfs_malloc_nofs_nof static inline void ntfs_free(void *addr) { - if (likely(((unsigned long)addr < VMALLOC_START) || - ((unsigned long)addr >= VMALLOC_END ))) { + if (!is_vmalloc_addr(addr)) { kfree(addr); /* free_page((unsigned long)addr); */ return; Index: linux-2.6/fs/proc/kcore.c === --- linux-2.6.orig/fs/proc/kcore.c 2007-09-17 21:45:24.0 -0700 +++ linux-2.6/fs/proc/kcore.c 2007-09-17 21:46:06.0 -0700 @@ -325,7 +325,7 @@ read_kcore(struct file *file, char __use if (m == NULL) { if (clear_user(buffer, tsz)) return -EFAULT; - } else if ((start >= VMALLOC_START) && (start < VMALLOC_END)) { + } else if (is_vmalloc_addr((void *)start)) { char * elf_buf; struct vm_struct *m; unsigned long curstart = start; Index: linux-2.6/fs/xfs/linux-2.6/kmem.c === --- linux-2.6.orig/fs/xfs/linux-2.6/kmem.c 2007-09-17 21:45:24.0 -0700 +++ linux-2.6/fs/xfs/linux-2.6/kmem.c 2007-09-17 21:46:06.0 -0700 @@ -92,8 +92,7 @@ kmem_zalloc_greedy(size_t *size, size_t void kmem_free(void *ptr, size_t size) { - if (((unsigned long)ptr < VMALLOC_START) || - ((unsigned long)ptr >= VMALLOC_END)) { + if (!is_vmalloc_addr(ptr)) { kfree(ptr); } else { vfree(ptr); Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c 2007-09-17 21:45:24.0 -0700 +++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c2007-09-17 21:46:06.0 -0700 @@ -696,8 +696,7 @@ static inline struct page * mem_to_page( void
[08/17] Pass vmalloc address in page->private
Avoid expensive lookups of virtual addresses from page structs by storing the vmalloc address in page->private. We can then avoid the vmalloc_address() in the get__page() functions and simply return page->private.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c	2007-09-18 18:35:55.0 -0700
+++ linux-2.6/mm/page_alloc.c	2007-09-18 18:36:01.0 -0700
@@ -1276,6 +1276,11 @@ struct page *vcompound_alloc(gfp_t gfp_m
 	if (!addr)
 		goto abort;
 
+	/*
+	 * Give the caller a chance to avoid an expensive vmalloc_addr()
+	 * call.
+	 */
+	pages[0]->private = (unsigned long)addr;
 	return pages[0];
 
 abort:
@@ -1534,6 +1539,8 @@ fastcall unsigned long __get_free_pages(
 	page = alloc_pages(gfp_mask, order);
 	if (!page)
 		return 0;
+	if (unlikely(PageVcompound(page)))
+		return page->private;
 	return (unsigned long) page_address(page);
 }
 
@@ -1550,9 +1557,11 @@ fastcall unsigned long get_zeroed_page(g
 	VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
 
 	page = alloc_pages(gfp_mask | __GFP_ZERO, 0);
-	if (page)
-		return (unsigned long) page_address(page);
-	return 0;
+	if (!page)
+		return 0;
+	if (unlikely(PageVcompound(page)))
+		return page->private;
+	return (unsigned long) page_address(page);
 }
 EXPORT_SYMBOL(get_zeroed_page);
 
[01/17] Vmalloc: Move vmalloc_to_page to mm/vmalloc.
We already have page table manipulation for vmalloc in vmalloc.c. Move the vmalloc_to_page() function there as well. Also move the related definitions from include/linux/mm.h. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/mm.h |2 -- include/linux/vmalloc.h |4 mm/memory.c | 40 mm/vmalloc.c| 38 ++ 4 files changed, 42 insertions(+), 42 deletions(-) Index: linux-2.6/mm/memory.c === --- linux-2.6.orig/mm/memory.c 2007-09-18 18:33:56.0 -0700 +++ linux-2.6/mm/memory.c 2007-09-18 18:34:06.0 -0700 @@ -2727,46 +2727,6 @@ int make_pages_present(unsigned long add return ret == len ? 0 : -1; } -/* - * Map a vmalloc()-space virtual address to the physical page. - */ -struct page * vmalloc_to_page(void * vmalloc_addr) -{ - unsigned long addr = (unsigned long) vmalloc_addr; - struct page *page = NULL; - pgd_t *pgd = pgd_offset_k(addr); - pud_t *pud; - pmd_t *pmd; - pte_t *ptep, pte; - - if (!pgd_none(*pgd)) { - pud = pud_offset(pgd, addr); - if (!pud_none(*pud)) { - pmd = pmd_offset(pud, addr); - if (!pmd_none(*pmd)) { - ptep = pte_offset_map(pmd, addr); - pte = *ptep; - if (pte_present(pte)) - page = pte_page(pte); - pte_unmap(ptep); - } - } - } - return page; -} - -EXPORT_SYMBOL(vmalloc_to_page); - -/* - * Map a vmalloc()-space virtual address to the physical page frame number. - */ -unsigned long vmalloc_to_pfn(void * vmalloc_addr) -{ - return page_to_pfn(vmalloc_to_page(vmalloc_addr)); -} - -EXPORT_SYMBOL(vmalloc_to_pfn); - #if !defined(__HAVE_ARCH_GATE_AREA) #if defined(AT_SYSINFO_EHDR) Index: linux-2.6/mm/vmalloc.c === --- linux-2.6.orig/mm/vmalloc.c 2007-09-18 18:33:56.0 -0700 +++ linux-2.6/mm/vmalloc.c 2007-09-18 18:34:06.0 -0700 @@ -166,6 +166,44 @@ int map_vm_area(struct vm_struct *area, } EXPORT_SYMBOL_GPL(map_vm_area); +/* + * Map a vmalloc()-space virtual address to the physical page. 
+ */ +struct page *vmalloc_to_page(void *vmalloc_addr) +{ + unsigned long addr = (unsigned long) vmalloc_addr; + struct page *page = NULL; + pgd_t *pgd = pgd_offset_k(addr); + pud_t *pud; + pmd_t *pmd; + pte_t *ptep, pte; + + if (!pgd_none(*pgd)) { + pud = pud_offset(pgd, addr); + if (!pud_none(*pud)) { + pmd = pmd_offset(pud, addr); + if (!pmd_none(*pmd)) { + ptep = pte_offset_map(pmd, addr); + pte = *ptep; + if (pte_present(pte)) + page = pte_page(pte); + pte_unmap(ptep); + } + } + } + return page; +} +EXPORT_SYMBOL(vmalloc_to_page); + +/* + * Map a vmalloc()-space virtual address to the physical page frame number. + */ +unsigned long vmalloc_to_pfn(void *vmalloc_addr) +{ + return page_to_pfn(vmalloc_to_page(vmalloc_addr)); +} +EXPORT_SYMBOL(vmalloc_to_pfn); + static struct vm_struct *__get_vm_area_node(unsigned long size, unsigned long flags, unsigned long start, unsigned long end, int node, gfp_t gfp_mask) Index: linux-2.6/include/linux/mm.h === --- linux-2.6.orig/include/linux/mm.h 2007-09-18 18:33:56.0 -0700 +++ linux-2.6/include/linux/mm.h2007-09-18 18:34:06.0 -0700 @@ -1160,8 +1160,6 @@ static inline unsigned long vma_pages(st pgprot_t vm_get_page_prot(unsigned long vm_flags); struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); -struct page *vmalloc_to_page(void *addr); -unsigned long vmalloc_to_pfn(void *addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t); int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *); Index: linux-2.6/include/linux/vmalloc.h === --- linux-2.6.orig/include/linux/vmalloc.h 2007-09-18 18:33:57.0 -0700 +++ linux-2.6/include/linux/vmalloc.h 2007-09-18 18:34:24.0 -0700 @@ -81,6 +81,10 @@ extern void unmap_kernel_range(unsigned extern struct vm_struct *alloc_vm_area(size_t size); extern void
[00/17] [RFC] Virtual Compound Page Support
Currently there is a strong tendency to avoid larger page allocations in the kernel because of past fragmentation issues, and the current defragmentation methods are still evolving. It is not clear to what extent they can provide reliable allocations for higher order pages (plus the definition of "reliable" seems to be in the eye of the beholder).

Currently we use vmalloc allocations in many locations to provide a safe way to allocate larger arrays. That is due to the danger of higher order allocations failing. Virtual Compound pages allow the use of regular page allocator allocations that will fall back only if there is an actual problem with acquiring a higher order page.

This patch set provides a way for a higher order page allocation to fall back. Instead of a physically contiguous page a virtually contiguous page is provided. The functionality of the vmalloc layer is used to provide the necessary page tables and control structures to establish a virtually contiguous area.

Advantages:

- If higher order allocations are failing then virtual compound pages consisting of a series of order-0 pages can stand in for those allocations.
- "Reliability" as long as the vmalloc layer can provide virtual mappings.
- Ability to reduce the use of the vmalloc layer significantly by using physically contiguous memory instead of virtually contiguous memory. Most uses of vmalloc() can be converted to page allocator calls.
- The use of physically contiguous memory instead of vmalloc may allow the use of larger TLB entries, thus reducing TLB pressure. It also reduces the need for page table walks.

Disadvantages:

- In order to use fall back, the logic accessing the memory must be aware that the memory could be backed by a virtual mapping and take precautions. virt_to_page() and page_address() may not work, and vmalloc_to_page() and vmalloc_address() (introduced through this patch set) may have to be called.
- Virtual mappings are less efficient than physical mappings. Performance will drop once virtual fall back occurs.
- Virtual mappings have more memory overhead. vm_area control structures, page tables, page arrays etc. need to be allocated and managed to provide virtual mappings.

The patchset provides this functionality in stages.

Stage 1 introduces the basic fall back mechanism necessary to replace vmalloc allocations with alloc_page(GFP_VFALLBACK, order, ) which signifies to the page allocator that a higher order is to be found but a virtual mapping may stand in if there is an issue with fragmentation. Stage 1 functionality does not allow allocation and freeing of virtual mappings from interrupt contexts. The stage 1 series ends with the conversion of a few key uses of vmalloc in the VM to alloc_pages() for the allocation of sparsemem's memmap table and the wait table in each zone. Other uses of vmalloc could be converted in the same way.

Stage 2 functionality enhances the fallback even more, allowing allocation and frees in interrupt context. SLUB is then modified to use the virtual mappings for slab caches that are marked with SLAB_VFALLBACK. If a slab cache is marked this way then we drop all the restraints regarding page order and allocate good large memory areas that fit lots of objects so that we rarely have to use the slow paths. Two slab caches--the dentry cache and the buffer_heads--are then flagged that way. Others could be converted in the same way.

The patch set also provides a debugging aid through setting CONFIG_VFALLBACK_ALWAYS. If set, then all GFP_VFALLBACK allocations fall back to the virtual mappings. This is useful for verification tests. The test of this patch set was done by enabling that option and compiling a kernel.

Stage 3 functionality could be the adding of support for the large buffer size patchset. Not done yet and not sure if it would be useful to do.

Much of this patchset may only be needed for special cases in which the existing defragmentation methods fail for some reason. It may be better to have the system operate without such a safety net and make sure that the page allocator can return large orders in a reliable way.

The initial idea for this patchset came from Nick Piggin's fsblock and from his arguments about reliability and guarantees. Since his fsblock uses the virtual mappings I think it is legitimate to generalize the use of virtual mappings to support higher order allocations in this way. The application of these ideas to the large block size patchset etc. is straightforward. If wanted I can base the next rev of the largebuffer patchset on this one and implement fallback.

Contrary to Nick, I still doubt that any of this provides a "guarantee". Having said that, I have to deal with various failure scenarios in the VM daily and I'd certainly like to see it work in a more reliable manner.

IMHO getting rid of the various workarounds to deal with the small 4k pages and avoiding additional layers that group these pages in subsystem
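The fallback idea in the cover letter can be pictured with a small userspace sketch: try the contiguous allocator first, and only use the virtual path when that fails. This is an illustration under stated assumptions, not the patch set's code. All demo_ names, the function-pointer hooks, and the 4k page size are hypothetical, and malloc() merely stands in for both allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u	/* assumption: 4k pages */

typedef void *(*demo_alloc_fn)(size_t bytes);

struct demo_region {
	void *addr;
	int is_virtual;	/* 1 if the fallback path produced the memory */
};

/* Stand-in for a higher order allocation that fails due to fragmentation. */
static void *demo_contig_fail(size_t bytes)
{
	(void)bytes;
	return NULL;
}

/* Stand-in for an allocator that succeeds. */
static void *demo_alloc_ok(size_t bytes)
{
	return malloc(bytes);
}

/* Try the contiguous allocator first; fall back to the virtual one on failure. */
static struct demo_region demo_alloc_vfallback(unsigned int order,
					       demo_alloc_fn contig,
					       demo_alloc_fn virt)
{
	struct demo_region r = { NULL, 0 };
	size_t bytes = (size_t)DEMO_PAGE_SIZE << order;

	assert(contig != NULL && virt != NULL);
	r.addr = contig(bytes);
	if (r.addr)
		return r;
	r.addr = virt(bytes);
	r.is_virtual = (r.addr != NULL);
	return r;
}
```

The is_virtual flag mirrors the first disadvantage listed above: the caller has to know which kind of memory it received before choosing how to translate or free it.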
[02/17] Vmalloc: add const
Make vmalloc functions work the same way as kfree() and friends that take a const void * argument. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/vmalloc.h | 10 +- mm/vmalloc.c| 16 2 files changed, 13 insertions(+), 13 deletions(-) Index: linux-2.6/mm/vmalloc.c === --- linux-2.6.orig/mm/vmalloc.c 2007-09-18 18:34:06.0 -0700 +++ linux-2.6/mm/vmalloc.c 2007-09-18 18:34:33.0 -0700 @@ -169,7 +169,7 @@ EXPORT_SYMBOL_GPL(map_vm_area); /* * Map a vmalloc()-space virtual address to the physical page. */ -struct page *vmalloc_to_page(void *vmalloc_addr) +struct page *vmalloc_to_page(const void *vmalloc_addr) { unsigned long addr = (unsigned long) vmalloc_addr; struct page *page = NULL; @@ -198,7 +198,7 @@ EXPORT_SYMBOL(vmalloc_to_page); /* * Map a vmalloc()-space virtual address to the physical page frame number. */ -unsigned long vmalloc_to_pfn(void *vmalloc_addr) +unsigned long vmalloc_to_pfn(const void *vmalloc_addr) { return page_to_pfn(vmalloc_to_page(vmalloc_addr)); } @@ -305,7 +305,7 @@ struct vm_struct *get_vm_area_node(unsig } /* Caller must hold vmlist_lock */ -static struct vm_struct *__find_vm_area(void *addr) +static struct vm_struct *__find_vm_area(const void *addr) { struct vm_struct *tmp; @@ -318,7 +318,7 @@ static struct vm_struct *__find_vm_area( } /* Caller must hold vmlist_lock */ -static struct vm_struct *__remove_vm_area(void *addr) +static struct vm_struct *__remove_vm_area(const void *addr) { struct vm_struct **p, *tmp; @@ -347,7 +347,7 @@ found: * This function returns the found VM area, but using it is NOT safe * on SMP machines, except for its size or flags. 
*/ -struct vm_struct *remove_vm_area(void *addr) +struct vm_struct *remove_vm_area(const void *addr) { struct vm_struct *v; write_lock(_lock); @@ -356,7 +356,7 @@ struct vm_struct *remove_vm_area(void *a return v; } -static void __vunmap(void *addr, int deallocate_pages) +static void __vunmap(const void *addr, int deallocate_pages) { struct vm_struct *area; @@ -407,7 +407,7 @@ static void __vunmap(void *addr, int dea * * Must not be called in interrupt context. */ -void vfree(void *addr) +void vfree(const void *addr) { BUG_ON(in_interrupt()); __vunmap(addr, 1); @@ -423,7 +423,7 @@ EXPORT_SYMBOL(vfree); * * Must not be called in interrupt context. */ -void vunmap(void *addr) +void vunmap(const void *addr) { BUG_ON(in_interrupt()); __vunmap(addr, 0); Index: linux-2.6/include/linux/vmalloc.h === --- linux-2.6.orig/include/linux/vmalloc.h 2007-09-18 18:34:24.0 -0700 +++ linux-2.6/include/linux/vmalloc.h 2007-09-18 18:35:03.0 -0700 @@ -45,11 +45,11 @@ extern void *vmalloc_32_user(unsigned lo extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot); extern void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot); -extern void vfree(void *addr); +extern void vfree(const void *addr); extern void *vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot); -extern void vunmap(void *addr); +extern void vunmap(const void *addr); extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff); @@ -71,7 +71,7 @@ extern struct vm_struct *__get_vm_area(u extern struct vm_struct *get_vm_area_node(unsigned long size, unsigned long flags, int node, gfp_t gfp_mask); -extern struct vm_struct *remove_vm_area(void *addr); +extern struct vm_struct *remove_vm_area(const void *addr); extern int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages); @@ -82,8 +82,8 @@ extern struct vm_struct *alloc_vm_area(s extern void free_vm_area(struct vm_struct *area); /* Determine page struct from 
address */ -struct page *vmalloc_to_page(void *addr); -unsigned long vmalloc_to_pfn(void *addr); +struct page *vmalloc_to_page(const void *addr); +unsigned long vmalloc_to_pfn(const void *addr); /* * Internals. Dont't use.. -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[04/17] vmalloc: clean up page array indexing
The page array is repeatedly indexed both in vunmap and vmalloc_area_node(). Add a temporary variable to make it easier to read (and easier to patch later).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/vmalloc.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c	2007-09-18 13:22:16.0 -0700
+++ linux-2.6/mm/vmalloc.c	2007-09-18 13:22:17.0 -0700
@@ -383,8 +383,10 @@ static void __vunmap(const void *addr, i
 		int i;
 
 		for (i = 0; i < area->nr_pages; i++) {
-			BUG_ON(!area->pages[i]);
-			__free_page(area->pages[i]);
+			struct page *page = area->pages[i];
+
+			BUG_ON(!page);
+			__free_page(page);
 		}
 
 		if (area->flags & VM_VPAGES)
@@ -488,15 +490,19 @@ void *__vmalloc_area_node(struct vm_stru
 	}
 
 	for (i = 0; i < area->nr_pages; i++) {
+		struct page *page;
+
 		if (node < 0)
-			area->pages[i] = alloc_page(gfp_mask);
+			page = alloc_page(gfp_mask);
 		else
-			area->pages[i] = alloc_pages_node(node, gfp_mask, 0);
-		if (unlikely(!area->pages[i])) {
+			page = alloc_pages_node(node, gfp_mask, 0);
+
+		if (unlikely(!page)) {
 			/* Successfully allocated i pages, free them in __vunmap() */
 			area->nr_pages = i;
 			goto fail;
 		}
+		area->pages[i] = page;
 	}
 
 	if (map_vm_area(area, prot, &pages))
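The allocation loop in this patch relies on trimming nr_pages to the number of pages that were actually obtained, so that the free loop only touches valid entries. A userspace sketch of that pattern follows; the demo_ names and the budget-based failing allocator are hypothetical and exist only to make the failure path reachable.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_area {
	void **pages;
	unsigned int nr_pages;
};

static unsigned int demo_budget;	/* how many allocations may still succeed */

/* Hypothetical page allocator that fails once the budget is used up. */
static void *demo_alloc_page(void)
{
	if (demo_budget == 0)
		return NULL;
	demo_budget--;
	return malloc(64);
}

/* Free exactly nr_pages entries, in the style of the loop in __vunmap(). */
static void demo_free_pages(struct demo_area *area)
{
	unsigned int i;

	for (i = 0; i < area->nr_pages; i++) {
		void *page = area->pages[i];

		assert(page != NULL);
		free(page);
	}
	area->nr_pages = 0;
}

/* Returns 0 on success, -1 on failure after trimming nr_pages to what exists. */
static int demo_fill_pages(struct demo_area *area)
{
	unsigned int i;

	for (i = 0; i < area->nr_pages; i++) {
		void *page = demo_alloc_page();

		if (!page) {
			area->nr_pages = i;	/* only i pages were allocated */
			return -1;
		}
		area->pages[i] = page;
	}
	return 0;
}
```

Storing the result in a local first and assigning to the array only on success keeps the failure check and the array update separate, which is the readability gain the patch description mentions.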
[Patch] Some proc entries are missing in sched_domain sysctl debug code.
The cache_nice_tries and flags entries do not appear in the procfs sched_domain directory, because ctl_table entries are skipped. This patch fixes the issue.

Signed-off-by: Zou Nan hai <[EMAIL PROTECTED]>

--- linux-2.6.23-rc6/kernel/sched.c	2007-09-18 23:47:07.0 -0400
+++ b/kernel/sched.c	2007-09-18 23:47:20.0 -0400
@@ -5304,7 +5304,7 @@ set_table_entry(struct ctl_table *entry,
 static struct ctl_table *
 sd_alloc_ctl_domain_table(struct sched_domain *sd)
 {
-	struct ctl_table *table = sd_alloc_ctl_entry(14);
+	struct ctl_table *table = sd_alloc_ctl_entry(12);
 
 	set_table_entry(&table[0], "min_interval", &sd->min_interval,
 		sizeof(long), 0644, proc_doulongvec_minmax);
@@ -5324,10 +5324,10 @@ sd_alloc_ctl_domain_table(struct sched_d
 		sizeof(int), 0644, proc_dointvec_minmax);
 	set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
 		sizeof(int), 0644, proc_dointvec_minmax);
-	set_table_entry(&table[10], "cache_nice_tries",
+	set_table_entry(&table[9], "cache_nice_tries",
 		&sd->cache_nice_tries,
 		sizeof(int), 0644, proc_dointvec_minmax);
-	set_table_entry(&table[12], "flags", &sd->flags,
+	set_table_entry(&table[10], "flags", &sd->flags,
 		sizeof(int), 0644, proc_dointvec_minmax);
 
 	return table;
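Why a skipped index hides the later entries can be shown with a sentinel-terminated table. The sketch below assumes the table is walked until the first entry with a NULL name, and that the table starts out zeroed; struct demo_entry and the demo_ functions are hypothetical stand-ins, not the kernel's struct ctl_table or sysctl code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table: only the name matters here. */
struct demo_entry {
	const char *procname;	/* NULL marks the end of the table */
};

/* Count entries the way a sentinel-terminated walk would see them. */
static size_t demo_visible_entries(const struct demo_entry *table)
{
	size_t n = 0;

	assert(table != NULL);
	while (table[n].procname != NULL)
		n++;
	return n;
}

/* Fill slots 0..8, then place the last two entries at the given indices. */
static size_t demo_count_with_layout(size_t nice_idx, size_t flags_idx)
{
	struct demo_entry table[14] = { { NULL } };	/* zeroed table */
	size_t i;

	assert(nice_idx < 14 && flags_idx < 14);
	for (i = 0; i <= 8; i++)
		table[i].procname = "entry";
	table[nice_idx].procname = "cache_nice_tries";
	table[flags_idx].procname = "flags";
	return demo_visible_entries(table);
}
```

With the old indices (10 and 12) the walk stops at the empty slot 9 and sees nine entries; with the patched indices (9 and 10) it sees all eleven, and slot 11 serves as the terminator in a table of 12.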
[PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol
This patch defines a 32-bit boot protocol and adds corresponding document. It is based on the proposal of Peter Anvin. Known issues: - The hd0_info and hd1_info are deleted from the zero page. Additional work should be done for this? Or this is unnecessary (because no new fields will be added to zero page)? - The fields in zero page are fairly complex (such as struct edd_info). Is it necessary to document every field inside the first level fields, until the primary data type? Or is it sufficient to provide the C struct name only? ChangeLog: -- v2 -- - Revise zero page description according to the source code and move them to zero-page.txt. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- boot.txt | 70 +++ zero-page.txt | 127 -- 2 files changed, 97 insertions(+), 100 deletions(-) Index: linux-2.6.23-rc6/Documentation/i386/boot.txt === --- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt 2007-09-11 10:50:29.0 +0800 +++ linux-2.6.23-rc6/Documentation/i386/boot.txt2007-09-19 10:00:18.0 +0800 @@ -2,7 +2,7 @@ H. Peter Anvin <[EMAIL PROTECTED]> - Last update 2007-05-23 + Last update 2007-09-18 On the i386 platform, the Linux kernel uses a rather complicated boot convention. This has evolved partially due to historical aspects, as @@ -42,6 +42,9 @@ Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of the boot command line +Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical + pointer to single linked list of struct setup_data. + Added 32-bit boot protocol. MEMORY LAYOUT @@ -168,6 +171,9 @@ 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not 0235/3 N/A pad2Unused 0238/4 2.06+ cmdline_sizeMaximum size of the kernel command line +023c/4 N/A pad3Unused +0240/8 2.07+ setup_data 64-bit physical pointer to linked list + of struct setup_data (1) For backwards compatibility, if the setup_sects field contains 0, the real value is 4. @@ -480,6 +486,36 @@ cmdline_size characters. 
With protocol version 2.05 and earlier, the maximum size was 255. +Field name:setup_data +Type: write (obligatory) +Offset/size: 0x240/8 +Protocol: 2.07+ + + The 64-bit physical pointer to NULL terminated single linked list of + struct setup_data. This is used to define a more extensible boot + parameters passing mechanism. The definition of struct setup_data is + as follow: + + struct setup_data { + u64 next; + u32 type; + u32 len; + u8 data[0]; + } __attribute__((packed)); + + Where, the next is a 64-bit physical pointer to the next node of + linked list, the next field of the last node is 0; the type is used + to identify the contents of data; the len is the length of data + field; the data holds the real payload. + + With this field, to add a new boot parameter written by bootloader, + it is not needed to add a new field to real mode header, just add a + new setup_data type is sufficient. But to add a new boot parameter + read by bootloader, it is still needed to add a new field. + + TODO: Where is the safe place to place the linked list of struct + setup_data? + THE KERNEL COMMAND LINE @@ -753,3 +789,35 @@ After completing your hook, you should jump to the address that was in this field before your boot loader overwrote it (relocated, if appropriate.) + + + SETUP DATA TYPES + + + 32-bit BOOT PROTOCOL + +For machine with some new BIOS other than legacy BIOS, such as EFI, +LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in kernel +based on legacy BIOS can not be used, so a 32-bit boot protocol need +to be defined. + +In 32-bit boot protocol, the first step in loading a Linux kernel +should still be to load the real-mode code and then examine the kernel +header at offset 0x01f1. But, it is not necessary to load all +real-mode code, just first 4K bytes traditionally known as "zero page" +is needed. 
+ +In addition to read/modify/write kernel header of the zero page as +that of 16-bit boot protocol, the boot loader should also fill the +additional fields of the zero page as that described in zero-page.txt. + +After loading and setuping the zero page, the boot loader can load the +32/64-bit kernel in the same way as that of 16-bit boot protocol. + +In 32-bit boot protocol, the kernel is started by jumping to the +32-bit kernel entry point, which is the start address of loaded +32/64-bit kernel. + +At entry, the CPU must be in 32-bit protected mode with paging +disabled; the CS and DS must be 4G flat
[PATCH -mm -v2 1/2] i386/x86_64 boot: setup data
This patch add a field of 64-bit physical pointer to NULL terminated single linked list of struct setup_data to real-mode kernel header. This is used as a more extensible boot parameters passing mechanism. This patch has been tested against 2.6.23-rc6-mm1 kernel on x86_64. It is based on the proposal of Peter Anvin. Known Issues: 1. Where is safe to place the linked list of setup_data? Because the length of the linked list of setup_data is variable, it can not be copied into BSS segment of kernel as that of "zero page". We must find a safe place for it, where it will not be overwritten by kernel during booting up. The i386 kernel will overwrite some pages after _end. The x86_64 kernel will overwrite some pages from 0x1000 on. ChangeLog: -- v2 -- - Increase the boot protocol version number. - Check version number before parsing setup_data. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/i386/Kconfig|3 --- arch/i386/boot/header.S |8 +++- arch/i386/kernel/setup.c | 22 ++ arch/x86_64/kernel/setup.c | 21 + include/asm-i386/bootparam.h | 15 +++ include/asm-i386/io.h|7 +++ 6 files changed, 72 insertions(+), 4 deletions(-) Index: linux-2.6.23-rc6/include/asm-i386/bootparam.h === --- linux-2.6.23-rc6.orig/include/asm-i386/bootparam.h 2007-09-19 10:00:06.0 +0800 +++ linux-2.6.23-rc6/include/asm-i386/bootparam.h 2007-09-19 10:00:08.0 +0800 @@ -9,6 +9,17 @@ #include #include +/* setup data types */ +#define SETUP_NONE 0 + +/* extensible setup data list node */ +struct setup_data { + u64 next; + u32 type; + u32 len; + u8 data[0]; +} __attribute__((packed)); + struct setup_header { u8 setup_sects; u16 root_flags; @@ -41,6 +52,10 @@ u32 initrd_addr_max; u32 kernel_alignment; u8 relocatable_kernel; + u8 _pad2[3]; + u32 cmdline_size; + u32 _pad3; + u64 setup_data; } __attribute__((packed)); struct sys_desc_table { Index: linux-2.6.23-rc6/arch/i386/boot/header.S === --- linux-2.6.23-rc6.orig/arch/i386/boot/header.S 2007-09-11 10:50:29.0 +0800 +++ 
linux-2.6.23-rc6/arch/i386/boot/header.S2007-09-19 10:00:09.0 +0800 @@ -119,7 +119,7 @@ # Part 2 of the header, from the old setup.S .ascii "HdrS" # header signature - .word 0x0206 # header version number (>= 0x0105) + .word 0x0207 # header version number (>= 0x0105) # or else old loadlin-1.5 will fail) .globl realmode_swtch realmode_swtch:.word 0, 0# default_switch, SETUPSEG @@ -214,6 +214,12 @@ #added with boot protocol #version 2.06 +pad4: .long 0 + +setup_data:.quad 0 # 64-bit physical pointer to + # single linked list of + # struct setup_data + # End of setup header # .section ".inittext", "ax" Index: linux-2.6.23-rc6/arch/x86_64/kernel/setup.c === --- linux-2.6.23-rc6.orig/arch/x86_64/kernel/setup.c2007-09-19 10:00:00.0 +0800 +++ linux-2.6.23-rc6/arch/x86_64/kernel/setup.c 2007-09-19 10:00:09.0 +0800 @@ -221,6 +221,25 @@ ebda_size = 64*1024; } +void __init parse_setup_data(void) +{ + struct setup_data *setup_data; + unsigned long pa_setup_data; + + if (boot_params.hdr.version < 0x0207) + return; + pa_setup_data = boot_params.hdr.setup_data; + while (pa_setup_data) { + setup_data = early_ioremap(pa_setup_data, PAGE_SIZE); + switch (setup_data->type) { + default: + break; + } + pa_setup_data = setup_data->next; + early_iounmap(setup_data, PAGE_SIZE); + } +} + void __init setup_arch(char **cmdline_p) { printk(KERN_INFO "Command line: %s\n", boot_command_line); @@ -256,6 +275,8 @@ strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE); *cmdline_p = command_line; + parse_setup_data(); + parse_early_param(); finish_e820_parsing(); Index: linux-2.6.23-rc6/arch/i386/kernel/setup.c === --- linux-2.6.23-rc6.orig/arch/i386/kernel/setup.c 2007-09-19 09:59:59.0 +0800 +++
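The list walk that parse_setup_data() performs can be sketched in userspace as follows. This is a simplified illustration, not the patch's code: ordinary pointers stand in for physical addresses, so demo_map() replaces the early_ioremap()/early_iounmap() pair, and all demo_ names are hypothetical. The packed attribute is a GCC/Clang extension, as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct setup_data from the patch. */
struct demo_setup_data {
	uint64_t next;	/* address of the next node, 0 terminates the list */
	uint32_t type;	/* identifies the contents of data */
	uint32_t len;	/* length of data in bytes */
	uint8_t data[];
} __attribute__((packed));

/* In this sketch an address is just a pointer value; the kernel would map it. */
static const struct demo_setup_data *demo_map(uint64_t pa)
{
	return (const struct demo_setup_data *)(uintptr_t)pa;
}

/* Walk the list; return the number of nodes, or 0 if the protocol is too old. */
static size_t demo_count_setup_data(uint16_t version, uint64_t pa)
{
	size_t n = 0;

	if (version < 0x0207)	/* field only exists from protocol 2.07 on */
		return 0;
	while (pa) {
		const struct demo_setup_data *sd = demo_map(pa);

		assert(sd != NULL);
		pa = sd->next;	/* read next before the node would be unmapped */
		n++;
	}
	return n;
}
```

A boot loader following this scheme would chain nodes by storing each node's address in the previous node's next field and leave the last next at 0.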
Re: [PATCH] Ext4: Uninitialized Block Groups
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur <[EMAIL PROTECTED]> wrote:

> +
> +__u16 crc16(__u16 crc, __u8 const *buffer, size_t len)

And if we really really have to do this, then the ext4-private crc16() should have static scope.
Re: [PATCH] Ext4: Uninitialized Block Groups
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur <[EMAIL PROTECTED]> wrote:

> +#if !defined(CONFIG_CRC16)
> +/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
> +__u16 const crc16_table[256] = {
> +	0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
> +	0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
> +	0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
> +	0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
> +	0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
> +	0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,

That's rather sad. A plain old "depends on" would be better.
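For reference, the quoted table entries follow directly from the polynomial, which is part of why a private copy adds little. The sketch below is a bitwise userspace version using the bit-reflected form (0xA001) of polynomial 0x8005. It is not the lib/crc16.c implementation, which is table driven, and demo_crc16 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 using 0xA001, the bit-reflected form of polynomial 0x8005. */
static uint16_t demo_crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	size_t i;
	int bit;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++) {
			if (crc & 1)
				crc = (uint16_t)((crc >> 1) ^ 0xA001);
			else
				crc = (uint16_t)(crc >> 1);
		}
	}
	return crc;
}
```

Feeding the single byte 0x01 with an initial value of 0 yields 0xC0C1, which is the second entry of the quoted table.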
Re: [PATCH] atyfb: force 29MHz xtal on G3 PowerBooks
On Sat, 2007-08-25 at 11:13 +0200, Olaf Hering wrote:
> The atyfb does not work on my 233MHz PowerBook with Mach64 LP, when the
> kernel is booted from firmware. aty_ld_pll_ct() returns 0x22 and xtal
> remains at 14.31818. When booted from MacOS, aty_ld_pll_ct() returns 0x3c
> and xtal is changed to 29.498928.
> Google indicates that all 4 PowerBook models need the higher value.

Seems to break it on my wallstreet first gen (M64 LG).

So NAK for now until we find out a better way.

Ben.

> Signed-off-by: Olaf Hering <[EMAIL PROTECTED]>
>
> --- a/drivers/video/aty/atyfb_base.c
> +++ b/drivers/video/aty/atyfb_base.c
> @@ -2411,7 +2411,7 @@ static int __devinit aty_init(struct fb_
>  			diff1 = -diff1;
>  		if (diff2 < 0)
>  			diff2 = -diff2;
> -		if (diff2 < diff1) {
> +		if (diff2 < diff1 || (M64_HAS(G3_PB_1024x768))) {
>  			par->ref_clk_per = 1000000000000ULL / 29498928;
>  			xtal = "29.498928";
>  		}
Re: drivers/usb/misc/emi*.c have the biggest data objects in the whole tree
On Fri, 14 Sep 2007 11:35:34 BST, Denys Vlasenko said:
> Hi Tapio,
>
> You are the author of these files. Are you still maintaining them?
> If not, do you know who is the current maintainer?
> These two object files hold the biggest data objects in the whole Linux kernel
> after lockdep:
>
>    text    data     bss     dec     hex filename
>    1258  160516       0  161774   277ee ./drivers/usb/misc/emi26.o
>    1504  209296       0  210800   33770 ./drivers/usb/misc/emi62.o
>
> Basically, these are big arrays of the following structures:
>
> typedef struct _INTEL_HEX_RECORD
> {
>	__u32	length;
>	__u32	address;
>	__u32	type;
>	__u8	data[MAX_INTEL_HEX_RECORD_LENGTH];
> } INTEL_HEX_RECORD;
>
> I suggest the following optimizations:
>
> Change structure to

I suggest moving those out of the kernel entirely and using the firmware loader support to bring them in from userspace like all the *other* firmware blobs.

'INTEL_HEX_RECORD' just *screams* 'microcode' ;)
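As background on what such records look like outside the kernel, the following is a rough userspace sketch that parses one line of the textual Intel HEX format (":LLAAAATT<data>CC") into a small record. The struct layout, the 16-byte payload limit and all function names are assumptions made for this example; they are not taken from the emi26/emi62 drivers.

```c
#include <stddef.h>
#include <stdint.h>

#define HEX_MAX_DATA 16	/* assumed payload limit, for illustration only */

/* Hypothetical compact record; not the driver's INTEL_HEX_RECORD. */
struct hex_record {
	uint8_t  length;
	uint16_t address;
	uint8_t  type;
	uint8_t  data[HEX_MAX_DATA];
};

static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Read two hex digits at s; returns 0..255, or -1 on a bad digit. */
static int hex_byte(const char *s)
{
	int hi = hex_nibble(s[0]);
	int lo;

	if (hi < 0)
		return -1;
	lo = hex_nibble(s[1]);
	if (lo < 0)
		return -1;
	return hi * 16 + lo;
}

/* Parse one record line. Returns 0 on success, -1 on any error. */
static int parse_hex_record(const char *line, struct hex_record *rec)
{
	int len, addr_hi, addr_lo, type, value, i;
	unsigned int sum;

	if (line[0] != ':')
		return -1;
	if ((len = hex_byte(line + 1)) < 0 || len > HEX_MAX_DATA)
		return -1;
	if ((addr_hi = hex_byte(line + 3)) < 0)
		return -1;
	if ((addr_lo = hex_byte(line + 5)) < 0)
		return -1;
	if ((type = hex_byte(line + 7)) < 0)
		return -1;

	sum = (unsigned int)(len + addr_hi + addr_lo + type);
	for (i = 0; i < len; i++) {
		value = hex_byte(line + 9 + 2 * i);
		if (value < 0)
			return -1;
		rec->data[i] = (uint8_t)value;
		sum += (unsigned int)value;
	}

	/* The trailing checksum byte makes the sum of all bytes 0 mod 256. */
	value = hex_byte(line + 9 + 2 * len);
	if (value < 0 || ((sum + (unsigned int)value) & 0xffu) != 0)
		return -1;

	rec->length = (uint8_t)len;
	rec->address = (uint16_t)((addr_hi << 8) | addr_lo);
	rec->type = (uint8_t)type;
	return 0;
}
```

With a parser like this living in userspace tooling, the kernel side would only need to receive the resulting binary image, which is the direction the reply above argues for.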
Re: iso9660 vs udf
On Wed, 19 Sep 2007 08:05:32 +0530 (IST) Satyam Sharma wrote: > Hi Andries, > > > On Wed, 19 Sep 2007, Andries E. Brouwer wrote: > > > > On Wed, Sep 19, 2007 at 05:48:28AM +0530, Satyam Sharma wrote: > > > > > > > On the other hand, this filesystem announces itself as UDF > > > > > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"), > > > > > perhaps the kernel code should be more robust. > > > > > > Could you send the complete dmesg log, and what you mean with filesystem/ > > > kernel (incorrectly?) announcing it as UDF here ... I agree with Jan, > > > this sounds like an issue with mount(8) to me. > > > > You already got the relevant part of the dmesg log. Slightly more below. > > > Failed mount: > > UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', > > timestamp 2006/03/07 16:26 (1078) > > udf: udf_read_inode(ino 547) failed !bh > > UDF-fs: Error in udf_iget, block=1, partition=1 > > Ok, like said, this comes from udf_fill_super(), but which shouldn't > have been called for this CD in the first place -- i.e. mount(8) shouldn't > have tried to mount a non-UDF filesystem as UDF (unless explicitly asked > as such). I was actually asking for the logs explaining why you thought > the _kernel_ incorrectly "announced" it as an UDF filesystem. > > Hmm ... those "CD-RTOS", "CD-BRIDGE" and "CDUDF File System - Adaptec Inc" > bits are not dmesg output, are they? Looks like "hwinfo --cdrom" or > "isoinfo" or some such. > > > I think the filesystem can be treated both as iso9660 and as udf, > > at least that is what I seem to recall CD-BRIDGE means. Thus, > > if the kernel cannot mount it as udf, I think it is a kernel flaw. > > Given that kernel flaw, and the fact that mounting as iso9660 works, > > mount(8) could work around the kernel problem by guessing iso9660. > > But maybe we should first try to fix the kernel. 
>
> I don't think that is what CD-BRIDGE means -- so no kernel flaw :-)
> What happened here is simply that in the absence of a "-t" option,
> mount(8) defaulted (probably due to incorrect heuristics?) to UDF for
> some reason, thereby obviously failing. I don't know who maintains
> mount(8) / util-linux package, or do distributions have their own
> maintainers these days (?)

Hi,

Adrian took over util-linux, but hasn't made any releases lately, so one of the RHAT developers is maintaining util-linux-ng:
http://userweb.kernel.org/~kzak/util-linux-ng/

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Tue, 2007-09-18 at 18:06 -0700, Linus Torvalds wrote:
> There is *no* valid reason for 16kB blocksizes unless you have legacy
> issues.

That's not correct.

> The performance issues have nothing to do with the block-size, and

We must be thinking of different performance issues.

> should be solvable by just making sure that your stupid "state of the
> art" crap SCSI controller gets contiguous physical memory, which is best
> done in the read-ahead code.

SCSI controllers have nothing to do with improving ondisk layout, which is the performance issue I've been referring to.

cheers.

--
Nathan
Re: [PATCH] fix memory hot remove not configured case.
Sorry... I sent old version...it returns -ENOSYS. Andrew-san, please replace. Goto-san, please confirm and ack. == Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess. This patch cleans up them. - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(), which returns -EINVAL. - removed remove_pages() only used in powerpc. - removed no-op remove_memory() in i386, sh, sparc64, x86_64. - only powerpc returns -ENOSYS at memory hot remove. changes it to return -EINVAL. Note: Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other archs if there are requirements and testers. Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> --- arch/i386/mm/init.c|5 arch/ia64/mm/init.c|3 +- arch/powerpc/mm/mem.c | 45 - arch/sh/mm/init.c |6 - arch/sparc64/mm/init.c |5 arch/x86_64/mm/init.c |6 - include/linux/memory_hotplug.h | 12 +- mm/memory_hotplug.c|6 + 8 files changed, 10 insertions(+), 78 deletions(-) Index: linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c === --- linux-2.6.23-rc6-mm1.orig/arch/ia64/mm/init.c +++ linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c @@ -719,7 +719,7 @@ int arch_add_memory(int nid, u64 start, return ret; } - +#ifdef CONFIG_MEMORY_HOTREMOVE int remove_memory(u64 start, u64 size) { unsigned long start_pfn, end_pfn; @@ -735,4 +735,5 @@ out: return ret; } EXPORT_SYMBOL_GPL(remove_memory); +#endif /* CONFIG_MEMORY_HOTREMOVE */ #endif Index: linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c === --- linux-2.6.23-rc6-mm1.orig/arch/powerpc/mm/mem.c +++ linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c @@ -129,51 +129,6 @@ int __devinit arch_add_memory(int nid, u return __add_pages(zone, start_pfn, nr_pages); } -/* - * First pass at this code will check to determine if the remove - * request is within the RMO. Do not allow removal within the RMO. 
- */ -int __devinit remove_memory(u64 start, u64 size) -{ - struct zone *zone; - unsigned long start_pfn, end_pfn, nr_pages; - - start_pfn = start >> PAGE_SHIFT; - nr_pages = size >> PAGE_SHIFT; - end_pfn = start_pfn + nr_pages; - - printk("%s(): Attempting to remove memoy in range " - "%lx to %lx\n", __func__, start, start+size); - /* -* check for range within RMO -*/ - zone = page_zone(pfn_to_page(start_pfn)); - - printk("%s(): memory will be removed from " - "the %s zone\n", __func__, zone->name); - - /* -* not handling removing memory ranges that -* overlap multiple zones yet -*/ - if (end_pfn > (zone->zone_start_pfn + zone->spanned_pages)) - goto overlap; - - /* make sure it is NOT in RMO */ - if ((start < lmb.rmo_size) || ((start+size) < lmb.rmo_size)) { - printk("%s(): range to be removed must NOT be in RMO!\n", - __func__); - goto in_rmo; - } - - return __remove_pages(zone, start_pfn, nr_pages); - -overlap: - printk("%s(): memory range to be removed overlaps " - "multiple zones!!!\n", __func__); -in_rmo: - return -1; -} #endif /* CONFIG_MEMORY_HOTPLUG */ void show_mem(void) Index: linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c === --- linux-2.6.23-rc6-mm1.orig/arch/x86_64/mm/init.c +++ linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c @@ -474,12 +474,6 @@ error: } EXPORT_SYMBOL_GPL(arch_add_memory); -int remove_memory(u64 start, u64 size) -{ - return -EINVAL; -} -EXPORT_SYMBOL_GPL(remove_memory); - #if !defined(CONFIG_ACPI_NUMA) && defined(CONFIG_NUMA) int memory_add_physaddr_to_nid(u64 start) { Index: linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h === --- linux-2.6.23-rc6-mm1.orig/include/linux/memory_hotplug.h +++ linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h @@ -58,10 +58,9 @@ extern int add_one_highpage(struct page extern void online_page(struct page *page); /* VM interface that may be used by firmware interface */ extern int online_pages(unsigned long, unsigned long); -#ifdef CONFIG_MEMORY_HOTREMOVE -extern int offline_pages(unsigned long, unsigned 
long, unsigned long); extern void __offline_isolated_pages(unsigned long, unsigned long); -#endif +extern int offline_pages(unsigned long, unsigned long, unsigned long); + /* reasonably generic interface to expand the physical pages in a zone */ extern int __add_pages(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages); @@ -171,13 +170,6 @@ static inline int mhp_notimplemented(con } #endif /* !
Re: iso9660 vs udf
Hi Andries, On Wed, 19 Sep 2007, Andries E. Brouwer wrote: > > On Wed, Sep 19, 2007 at 05:48:28AM +0530, Satyam Sharma wrote: > > > > > On the other hand, this filesystem announces itself as UDF > > > > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"), > > > > perhaps the kernel code should be more robust. > > > > Could you send the complete dmesg log, and what you mean with filesystem/ > > kernel (incorrectly?) announcing it as UDF here ... I agree with Jan, > > this sounds like an issue with mount(8) to me. > > You already got the relevant part of the dmesg log. Slightly more below. > Failed mount: > UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', timestamp > 2006/03/07 16:26 (1078) > udf: udf_read_inode(ino 547) failed !bh > UDF-fs: Error in udf_iget, block=1, partition=1 Ok, like said, this comes from udf_fill_super(), but which shouldn't have been called for this CD in the first place -- i.e. mount(8) shouldn't have tried to mount a non-UDF filesystem as UDF (unless explicitly asked as such). I was actually asking for the logs explaining why you thought the _kernel_ incorrectly "announced" it as an UDF filesystem. Hmm ... those "CD-RTOS", "CD-BRIDGE" and "CDUDF File System - Adaptec Inc" bits are not dmesg output, are they? Looks like "hwinfo --cdrom" or "isoinfo" or some such. > I think the filesystem can be treated both as iso9660 and as udf, > at least that is what I seem to recall CD-BRIDGE means. Thus, > if the kernel cannot mount it as udf, I think it is a kernel flaw. > Given that kernel flaw, and the fact that mounting as iso9660 works, > mount(8) could work around the kernel problem by guessing iso9660. > But maybe we should first try to fix the kernel. I don't think that is what CD-BRIDGE means -- so no kernel flaw :-) What happened here is simply that in the absence of a "-t" option, mount(8) defaulted (probably due to incorrect heuristics?) to UDF for some reason, thereby obviously failing. 
I don't know who maintains mount(8) / util-linux package, or do distributions have their own maintainers these days (?)

Satyam
Re: [PATCH] JBD slab cleanups
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:

> JBD: Replace slab allocations with page cache allocations
>
> JBD allocate memory for committed_data and frozen_data from slab. However
> JBD should not pass slab pages down to the block layer. Use page allocator
> pages instead. This will also prepare JBD for the large blocksize patchset.
>
>
> Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

__GFP_NOFAIL should only be used when we have no way of recovering from failure. The allocation in journal_init_common() (at least) _can_ recover and hence really shouldn't be using __GFP_NOFAIL.

(Actually, nothing in the kernel should be using __GFP_NOFAIL. It is there as a marker which says "we really shouldn't be doing this but we don't know how to fix it").

So sometime it'd be good if you could review all the __GFP_NOFAILs in there and see if we can remove some, thanks.
[PATCH] fix memory hot remove not configured case.
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess. This patch cleans up them. This is against 2.6.23-rc6-mm1. - fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case. - For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(), which returns -EINVAL. - removed remove_pages() only used in powerpc. - removed no-op remove_memory() in i386, sh, sparc64, x86_64. - only powerpc returns -ENOSYS at memory hot remove(no-op). changes it to return -EINVAL. Note: Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other archs if there are requirements and testers. Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> --- arch/i386/mm/init.c|5 arch/ia64/mm/init.c|3 +- arch/powerpc/mm/mem.c | 45 - arch/sh/mm/init.c |6 - arch/sparc64/mm/init.c |5 arch/x86_64/mm/init.c |6 - include/linux/memory_hotplug.h | 12 +- mm/memory_hotplug.c|6 + 8 files changed, 10 insertions(+), 78 deletions(-) Index: linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c === --- linux-2.6.23-rc6-mm1.orig/arch/ia64/mm/init.c +++ linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c @@ -719,7 +719,7 @@ int arch_add_memory(int nid, u64 start, return ret; } - +#ifdef CONFIG_MEMORY_HOTREMOVE int remove_memory(u64 start, u64 size) { unsigned long start_pfn, end_pfn; @@ -735,4 +735,5 @@ out: return ret; } EXPORT_SYMBOL_GPL(remove_memory); +#endif /* CONFIG_MEMORY_HOTREMOVE */ #endif Index: linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c === --- linux-2.6.23-rc6-mm1.orig/arch/powerpc/mm/mem.c +++ linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c @@ -129,51 +129,6 @@ int __devinit arch_add_memory(int nid, u return __add_pages(zone, start_pfn, nr_pages); } -/* - * First pass at this code will check to determine if the remove - * request is within the RMO. Do not allow removal within the RMO. 
- */ -int __devinit remove_memory(u64 start, u64 size) -{ - struct zone *zone; - unsigned long start_pfn, end_pfn, nr_pages; - - start_pfn = start >> PAGE_SHIFT; - nr_pages = size >> PAGE_SHIFT; - end_pfn = start_pfn + nr_pages; - - printk("%s(): Attempting to remove memoy in range " - "%lx to %lx\n", __func__, start, start+size); - /* -* check for range within RMO -*/ - zone = page_zone(pfn_to_page(start_pfn)); - - printk("%s(): memory will be removed from " - "the %s zone\n", __func__, zone->name); - - /* -* not handling removing memory ranges that -* overlap multiple zones yet -*/ - if (end_pfn > (zone->zone_start_pfn + zone->spanned_pages)) - goto overlap; - - /* make sure it is NOT in RMO */ - if ((start < lmb.rmo_size) || ((start+size) < lmb.rmo_size)) { - printk("%s(): range to be removed must NOT be in RMO!\n", - __func__); - goto in_rmo; - } - - return __remove_pages(zone, start_pfn, nr_pages); - -overlap: - printk("%s(): memory range to be removed overlaps " - "multiple zones!!!\n", __func__); -in_rmo: - return -1; -} #endif /* CONFIG_MEMORY_HOTPLUG */ void show_mem(void) Index: linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c === --- linux-2.6.23-rc6-mm1.orig/arch/x86_64/mm/init.c +++ linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c @@ -474,12 +474,6 @@ error: } EXPORT_SYMBOL_GPL(arch_add_memory); -int remove_memory(u64 start, u64 size) -{ - return -EINVAL; -} -EXPORT_SYMBOL_GPL(remove_memory); - #if !defined(CONFIG_ACPI_NUMA) && defined(CONFIG_NUMA) int memory_add_physaddr_to_nid(u64 start) { Index: linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h === --- linux-2.6.23-rc6-mm1.orig/include/linux/memory_hotplug.h +++ linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h @@ -58,10 +58,9 @@ extern int add_one_highpage(struct page extern void online_page(struct page *page); /* VM interface that may be used by firmware interface */ extern int online_pages(unsigned long, unsigned long); -#ifdef CONFIG_MEMORY_HOTREMOVE -extern int offline_pages(unsigned long, unsigned 
long, unsigned long); extern void __offline_isolated_pages(unsigned long, unsigned long); -#endif +extern int offline_pages(unsigned long, unsigned long, unsigned long); + /* reasonably generic interface to expand the physical pages in a zone */ extern int __add_pages(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages); @@ -171,13 +170,6 @@ static inline int mhp_notimplemented(con }
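The "generic no-op remove_memory() which returns -EINVAL" described in the patch summary follows a common header pattern. The sketch below shows that pattern in plain userspace C; DEMO_MEMORY_HOTREMOVE and demo_remove_memory() are hypothetical stand-ins for the config symbol and the kernel function, and the exact placement in the real patch may differ.

```c
#include <errno.h>
#include <stdint.h>

/*
 * DEMO_MEMORY_HOTREMOVE is a hypothetical stand-in for
 * CONFIG_MEMORY_HOTREMOVE. When it is defined, an architecture is
 * expected to supply the real function; otherwise every caller gets
 * the same generic stub.
 */
#ifdef DEMO_MEMORY_HOTREMOVE
int demo_remove_memory(uint64_t start, uint64_t size);
#else
static inline int demo_remove_memory(uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	return -EINVAL;	/* hot remove is not configured */
}
#endif
```

The point of the pattern is that callers never need their own #ifdef, and no architecture has to carry a private copy of the stub.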
Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
On Fri, Sep 14, 2007 at 12:51:34PM -0700, Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Siddha, Suresh B wrote:
> > We are trying to get the latest data with 2.6.23-rc4-mm1 with and without
> > slub. Is this good enough?
>
> Good enough. If you are concerned about the page allocator pass through
> then you may want to test the page allocator pass through patchset
> separately. The fastpath of the page allocator is currently not
> competitive if you always free and allocate a single page. If contiguous
> pages are allocated then the pass through is superior.

We are having all sorts of stability issues with -mm kernels, let alone perf testing :(

For now, we are trying to do slab vs. slub comparisons for the mainline kernels. Let's see how that goes.

Meanwhile, any chance that you can point us at relevant recent patches/fixes that are in -mm and perhaps that can be applied to mainline kernel?

thanks,
suresh
Re: [PATCH 1/6] cpuset write dirty map
On Tue, 18 Sep 2007 17:51:49 -0700 Ethan Solomita <[EMAIL PROTECTED]> wrote: > > > >> +void cpuset_update_dirty_nodes(struct address_space *mapping, > >> + struct page *page) > >> +{ > >> + nodemask_t *nodes = mapping->dirty_nodes; > >> + int node = page_to_nid(page); > >> + > >> + if (!nodes) { > >> + nodes = kmalloc(sizeof(nodemask_t), GFP_ATOMIC); > > > > Does it have to be atomic? atomic is weak and can fail. > > > > If some callers can do GFP_KERNEL and some can only do GFP_ATOMIC then we > > should at least pass the gfp_t into this function so it can do the stronger > > allocation when possible. > > I was going to say that sanity would be improved by just allocing the > nodemask at inode alloc time. A failure here could be a problem because > below cpuset_intersects_dirty_nodes() assumes that a NULL nodemask > pointer means that there are no dirty nodes, thus preventing dirty pages > from getting written to disk. i.e. This must never fail. > > Given that we allocate it always at the beginning, I'm leaning towards > just allocating it within mapping no matter its size. It will make the > code much much simpler, and save me writing all the comments we've been > discussing. 8-) > > How disastrous would this be? Is the need to support a 1024 node system > with 1,000,000 open mostly-read-only files thus needing to spend 120MB > of extra memory on my nodemasks a real scenario and a showstopper? None of this is very nice. Yes, it would be good to save all that memory and yes, I_DIRTY_PAGES inodes are very much the uncommon case. But if a failed GFP_ATOMIC allocation results in data loss then that's a showstopper. How hard would it be to handle the allocation failure in a more friendly manner? Say, if the allocation failed then point mapping->dirty_nodes at some global all-ones nodemask, and then special-case that nodemask in the freeing code? 
> >
> >> +		if (!nodes)
> >> +			return;
> >> +
> >> +		*nodes = NODE_MASK_NONE;
> >> +		mapping->dirty_nodes = nodes;
> >> +	}
> >> +
> >> +	if (!node_isset(node, *nodes))
> >> +		node_set(node, *nodes);
> >> +}
> >> +
> >> +void cpuset_clear_dirty_nodes(struct address_space *mapping)
> >> +{
> >> +	nodemask_t *nodes = mapping->dirty_nodes;
> >> +
> >> +	if (nodes) {
> >> +		mapping->dirty_nodes = NULL;
> >> +		kfree(nodes);
> >> +	}
> >> +}
> >
> > Can this race with cpuset_update_dirty_nodes()? And with itself? If not,
> > a comment which describes the locking requirements would be good.
>
> I'll add a comment. Such a race should not be possible. It is called
> only from clear_inode() which is used when the inode is being freed
> "with extreme prejudice" (from its comments). I can add a check that
> i_state I_FREEING is set. Would that do?

Sounds sane.
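The fallback suggested in this reply (point the mapping at a global all-ones nodemask when the allocation fails, and special-case that mask when freeing) can be sketched in userspace roughly as follows. Everything here is a simplified assumption: the demo_ types, the fixed four-word mask and the injectable allocator exist only for the illustration, and locking is ignored entirely.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MASK_WORDS 4	/* arbitrary size for the illustration */
#define DEMO_BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))

struct demo_nodemask {
	unsigned long bits[DEMO_MASK_WORDS];
};

/* Shared fallback: every node counts as dirty. Never written, never freed. */
static struct demo_nodemask demo_all_nodes = {
	{ ~0UL, ~0UL, ~0UL, ~0UL }
};

struct demo_mapping {
	struct demo_nodemask *dirty_nodes;
};

typedef void *(*demo_alloc_fn)(size_t size);

/* Allocator that always fails, to exercise the fallback path. */
static void *demo_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Mark a node dirty; the caller keeps node below the mask's bit count. */
static void demo_update_dirty_nodes(struct demo_mapping *m, unsigned int node,
				    demo_alloc_fn alloc)
{
	struct demo_nodemask *nodes = m->dirty_nodes;

	if (!nodes) {
		nodes = alloc(sizeof(*nodes));
		if (!nodes) {
			/* Allocation failed: be pessimistic, lose nothing. */
			m->dirty_nodes = &demo_all_nodes;
			return;
		}
		memset(nodes, 0, sizeof(*nodes));
		m->dirty_nodes = nodes;
	}
	if (nodes == &demo_all_nodes)
		return;	/* already covers every node */
	nodes->bits[node / DEMO_BITS_PER_WORD] |=
		1UL << (node % DEMO_BITS_PER_WORD);
}

static int demo_node_is_dirty(const struct demo_mapping *m, unsigned int node)
{
	if (!m->dirty_nodes)
		return 0;
	return (int)((m->dirty_nodes->bits[node / DEMO_BITS_PER_WORD] >>
		      (node % DEMO_BITS_PER_WORD)) & 1UL);
}

static void demo_clear_dirty_nodes(struct demo_mapping *m)
{
	struct demo_nodemask *nodes = m->dirty_nodes;

	m->dirty_nodes = NULL;
	if (nodes && nodes != &demo_all_nodes)
		free(nodes);	/* the shared sentinel is never freed */
}
```

The trade-off is that a failed allocation makes writeback consider the inode dirty on every node, which costs some extra work but cannot skip dirty pages the way a NULL mask would.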
Re: 2.6.23-rc6-mm1 panic (memory controller issue ?)
Badari Pulavarty wrote:
> On Tue, 2007-09-18 at 15:21 -0700, Badari Pulavarty wrote:
>> Hi Balbir,
>>
>> I get following panic from SLUB, while doing simple fsx tests.
>> I haven't used any container/memory controller stuff except
>> that I configured them in :(
>>
>> Looks like slub doesn't like one of the flags passed in ?
>>
>> Known issue ? Ideas ?
>>
>
> I think, I found the issue. I am still running tests to
> verify. Does this sound correct ?
>
> Thanks,
> Badari
>
> Need to strip __GFP_HIGHMEM flag while passing to
> mem_container_cache_charge().
>
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
>  mm/filemap.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.23-rc6/mm/filemap.c
> ===
> --- linux-2.6.23-rc6.orig/mm/filemap.c	2007-09-18 12:43:54.0 -0700
> +++ linux-2.6.23-rc6/mm/filemap.c	2007-09-18 19:14:44.0 -0700
> @@ -441,7 +441,8 @@ int filemap_write_and_wait_range(struct
>  int add_to_page_cache(struct page *page, struct address_space *mapping,
>  		pgoff_t offset, gfp_t gfp_mask)
>  {
> -	int error = mem_container_cache_charge(page, current->mm, gfp_mask);
> +	int error = mem_container_cache_charge(page, current->mm,
> +					gfp_mask & ~__GFP_HIGHMEM);
>  	if (error)
>  		goto out;
>
>

Hi, Badari,

The fix looks correct, radix_tree_preload() does the same thing in add_to_page_cache().

Thanks for identifying the fix

--
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
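The one-line fix works by clearing a single bit from the flag word before passing it on. A tiny userspace sketch of that idiom is shown here; the demo_ type and the flag values are invented for the example and do not correspond to the kernel's real gfp_t bits.

```c
/* Hypothetical flag word and bit values, for illustration only. */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_WAIT    0x10u

/* Return the mask with the highmem bit cleared and all others untouched. */
static demo_gfp_t demo_strip_highmem(demo_gfp_t mask)
{
	return mask & ~DEMO_GFP_HIGHMEM;
}
```

Masking at the call site leaves the caller's own gfp_mask unchanged, so the page itself can still come from highmem while the bookkeeping allocation cannot.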
Re: [PATCH 3/3] Time to make CONFIG_PARAVIRT non-experimental.
On Tue, 2007-09-18 at 23:52 +0200, Andi Kleen wrote: > On Tuesday 18 September 2007 23:34, Rusty Russell wrote: > > How about a "select" based on Xen, lguest or VMI? There's no other > > reason to enable it, after all. > > I did an patch to do that recently because the current setup > is indeed unobvious. > > But I had to drop it again because > it ended up with Kconfig warnings. about undefined symbols > on x86-64. The problem is that lguest > is visible in Kconfig for all architectures and it warns > if you select something that doesn't exist on all architectures. I think that's fixed as a side-effect of this cleanup. At least, it works for me on x86-64. Patch below: if you agree, I'll re-xmit all three. > > > Also I would still consider it experimental. > > > > After 9 months in mainline and three kernel versions, > > Well it changed a lot each release. Well, the biggest change was the patching code getting enhanced in 2.6.22 (to cover all calls, not just 5). The 22 -> 23 changes were fairly trivial. So I think 2.6.24 is a reasonable time to remove EXPERIMENTAL. > > I'd hope not. > > It's been pretty damn stable (ok, you broke it once, but maybe that's > > because you consider it experimental). > > Is there a significant user base? It's enabled in Ubuntu Feisty (2.6.20). > At least the Xen port seems to have specific requirements > and essentially only work on xen-unstable (?) [or at least > some very new Xen version] which probably very few > people use. Sure, and that might well still be experimental (Jeremy?). But that's not CONFIG_PARAVIRT. Hope that helps, Rusty. == Andi points out that PARAVIRT is an option best selected when needed. We introduce PARAVIRT_GUEST for the menu itself, and select PARAVIRT if they ask for anything which needs it. This also makes PARAVIRT non-experimental. 
Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> diff -r 8efa5fdb22d8 arch/i386/Kconfig --- a/arch/i386/Kconfig Wed Sep 19 11:23:18 2007 +1000 +++ b/arch/i386/Kconfig Wed Sep 19 11:33:59 2007 +1000 @@ -214,24 +214,30 @@ config X86_ES7000 endchoice -menuconfig PARAVIRT +config PARAVIRT + bool + depends on !(X86_VISWS || X86_VOYAGER) + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. + +menuconfig PARAVIRT_GUEST - bool "Paravirtualized guest support (EXPERIMENTAL)" - depends on EXPERIMENTAL + bool "Paravirtualized guest support" - depends on !(X86_VISWS || X86_VOYAGER) - help - Paravirtualization is a way of running multiple instances of - Linux on the same machine, under a hypervisor. This option - changes the kernel so it can modify itself when it is run - under a hypervisor, improving performance significantly. - However, when run without a hypervisor the kernel is - theoretically slower. If in doubt, say N. - -if PARAVIRT + help + Say Y here to get to see options related to running Linux under + various hypervisors. This option alone does not add any kernel code. + + If you say N, all options in this submenu will be skipped and disabled. + +if PARAVIRT_GUEST source "arch/i386/xen/Kconfig" config VMI bool "VMI Guest support" + select PARAVIRT help VMI provides a paravirtualized interface to the VMware ESX server (it could be used by other hypervisors in theory too, but is not @@ -239,6 +246,7 @@ config VMI config LGUEST_GUEST bool "Lguest guest support" + select PARAVIRT depends on !X86_PAE help Lguest is a tiny in-kernel hypervisor. 
Selecting this will diff -r 8efa5fdb22d8 arch/i386/xen/Kconfig --- a/arch/i386/xen/Kconfig Wed Sep 19 11:23:18 2007 +1000 +++ b/arch/i386/xen/Kconfig Wed Sep 19 11:25:07 2007 +1000 @@ -4,6 +4,7 @@ config XEN bool "Xen guest support" + select PARAVIRT depends on X86_CMPXCHG && X86_TSC && !NEED_MULTIPLE_NODES help This is the Linux Xen port. Enabling this will allow the - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: UML dead with current -git?
On Tue, Sep 18, 2007 at 07:55:13PM +0200, Sam Ravnborg wrote:
> Sounds to me like a known issue by you. Can you give a few more details
> so we maybe can get it fixed?

I believe what happened here is an x86_64 build followed by a UML/x86_64 build with no intervening mrproper. I've always considered this to be a "don't do that" sort of thing.

However, maybe we could stick the arch of the current build somewhere in the tree, check that before any serious part of a subsequent build, and error out if $ARCH is different.

				Jeff

--
Work email - jdike at linux dot intel dot com
[PATCH 2/2] UML - refix ELF_CORE_COPY_REGS
The former uml-fix-x86_64-core-dump-crash.patch expressed ELF_CORE_COPY_REGS in terms of the pt_regs struct currently in -mm. I fast-tracked this to mainline, where it was wrong because the pt_regs struct there hadn't been changed. Fixing that then made the patch wrong for -mm when it was rebased on -rc6. This patch changes things back again to be right for -mm. This should go to mainline after uml-rename-pt_regs-general-purpose-register-file.patch Signed-off-by: Jeff Dike <[EMAIL PROTECTED]> --- include/asm-um/elf-x86_64.h | 42 +- 1 file changed, 21 insertions(+), 21 deletions(-) Index: linux-2.6.20/include/asm-um/elf-x86_64.h === --- linux-2.6.20.orig/include/asm-um/elf-x86_64.h 2007-09-18 13:28:30.0 -0400 +++ linux-2.6.20/include/asm-um/elf-x86_64.h2007-09-18 20:50:47.0 -0400 @@ -68,27 +68,27 @@ typedef struct user_i387_struct elf_fpre } while (0) #define ELF_CORE_COPY_REGS(pr_reg, regs) \ - (pr_reg)[0] = (regs)->regs.skas.regs[0];\ - (pr_reg)[1] = (regs)->regs.skas.regs[1];\ - (pr_reg)[2] = (regs)->regs.skas.regs[2];\ - (pr_reg)[3] = (regs)->regs.skas.regs[3];\ - (pr_reg)[4] = (regs)->regs.skas.regs[4];\ - (pr_reg)[5] = (regs)->regs.skas.regs[5];\ - (pr_reg)[6] = (regs)->regs.skas.regs[6];\ - (pr_reg)[7] = (regs)->regs.skas.regs[7];\ - (pr_reg)[8] = (regs)->regs.skas.regs[8];\ - (pr_reg)[9] = (regs)->regs.skas.regs[9];\ - (pr_reg)[10] = (regs)->regs.skas.regs[10]; \ - (pr_reg)[11] = (regs)->regs.skas.regs[11]; \ - (pr_reg)[12] = (regs)->regs.skas.regs[12]; \ - (pr_reg)[13] = (regs)->regs.skas.regs[13]; \ - (pr_reg)[14] = (regs)->regs.skas.regs[14]; \ - (pr_reg)[15] = (regs)->regs.skas.regs[15]; \ - (pr_reg)[16] = (regs)->regs.skas.regs[16]; \ - (pr_reg)[17] = (regs)->regs.skas.regs[17]; \ - (pr_reg)[18] = (regs)->regs.skas.regs[18]; \ - (pr_reg)[19] = (regs)->regs.skas.regs[19]; \ - (pr_reg)[20] = (regs)->regs.skas.regs[20]; \ + (pr_reg)[0] = (regs)->regs.gp[0]; \ + (pr_reg)[1] = (regs)->regs.gp[1]; \ + (pr_reg)[2] = (regs)->regs.gp[2]; \ + (pr_reg)[3] = 
(regs)->regs.gp[3]; \ + (pr_reg)[4] = (regs)->regs.gp[4]; \ + (pr_reg)[5] = (regs)->regs.gp[5]; \ + (pr_reg)[6] = (regs)->regs.gp[6]; \ + (pr_reg)[7] = (regs)->regs.gp[7]; \ + (pr_reg)[8] = (regs)->regs.gp[8]; \ + (pr_reg)[9] = (regs)->regs.gp[9]; \ + (pr_reg)[10] = (regs)->regs.gp[10]; \ + (pr_reg)[11] = (regs)->regs.gp[11]; \ + (pr_reg)[12] = (regs)->regs.gp[12]; \ + (pr_reg)[13] = (regs)->regs.gp[13]; \ + (pr_reg)[14] = (regs)->regs.gp[14]; \ + (pr_reg)[15] = (regs)->regs.gp[15]; \ + (pr_reg)[16] = (regs)->regs.gp[16]; \ + (pr_reg)[17] = (regs)->regs.gp[17]; \ + (pr_reg)[18] = (regs)->regs.gp[18]; \ + (pr_reg)[19] = (regs)->regs.gp[19]; \ + (pr_reg)[20] = (regs)->regs.gp[20]; \ (pr_reg)[21] = current->thread.arch.fs; \ (pr_reg)[22] = 0; \ (pr_reg)[23] = 0; \ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] UML - Fix registers.c build
uml-stop-saving-process-fp-state.patch broke the UML/x86_64 build.

On x86_64, sys/ptrace.h has to be included before asm/ptrace.h.
Otherwise, the defines in asm/ptrace.h will ruin the parse of
sys/ptrace.h -
	asm/ptrace.h:
		#define PTRACE_GETREGS 12
	sys/ptrace.h:
		enum __ptrace_request {
			...
			PTRACE_GETREGS = 12,
			...
		}

Also, errno.h was missing.

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
---
 arch/um/os-Linux/sys-x86_64/registers.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.20/arch/um/os-Linux/sys-x86_64/registers.c
===
--- linux-2.6.20.orig/arch/um/os-Linux/sys-x86_64/registers.c	2007-09-18 20:51:35.0 -0400
+++ linux-2.6.20/arch/um/os-Linux/sys-x86_64/registers.c	2007-09-18 20:52:15.0 -0400
@@ -3,9 +3,10 @@
  * Licensed under the GPL
  */
 
+#include
+#include
 #define __FRAME_OFFSETS
 #include
-#include
 #include "longjmp.h"
 #include "user.h"
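A minimal sketch of the ordering problem described above, using hypothetical `DEMO_` names rather than the real ptrace headers. With the enum first and the macro second, both definitions coexist. If the `#define` were moved above the enum, the preprocessor would turn the enumerator into `12 = 12` and the declaration would stop parsing, which is the failure the patch avoids by reordering the includes.

```c
#include <assert.h>

/* Stand-in for sys/ptrace.h: the request code is an enumerator. */
enum demo_ptrace_request {
	DEMO_GETREGS = 12,
};

/* Capture the enumerator's value while the name still refers to it. */
enum { DEMO_GETREGS_FROM_ENUM = DEMO_GETREGS };

/*
 * Stand-in for asm/ptrace.h: the same name as an object-like macro.
 * Harmless here because the enum has already been parsed.
 */
#define DEMO_GETREGS 12

static_assert(DEMO_GETREGS_FROM_ENUM == DEMO_GETREGS,
	      "macro and enumerator must agree");

/* From this point on the name expands to the macro's value. */
static int demo_getregs_code(void)
{
	return DEMO_GETREGS;
}
```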
[PATCH 0/2] UML - Two x86_64 build fixes
These two patches fix UML build breakages on x86_64.

They are -mm-specific, so don't need to go to mainline until 2.6.24.

				Jeff

-- 
Work email - jdike at linux dot intel dot com
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Tue, 2007-09-18 at 12:44 -0700, Linus Torvalds wrote:
> This is not about performance. Never has been. It's about SGI wanting a
> way out of their current 16kB mess.

Pass the crack pipe, Linus?

> The way to fix performance is to move to x86-64, and use 4kB pages and be
> happy. However, the SGI people want a 16kB (and possibly bigger)
> crap-option for their people who are (often _already_) running some
> special case situation that nobody else cares about.

FWIW (and I hate to let reality get in the way of a good conspiracy) -
all SGI systems have always defaulted to using 4K blocksize filesystems;
there's very few customers who would use larger, especially as the Linux
kernel limitations in this area are well known. There's no "16K mess"
that SGI is trying to clean up here (and SGI have offered both IA64 and
x86_64 systems for some time now, so not sure how you came up with that
whacko theory).

> It's not about "performance". If it was, they would never have used ia64

For SGI it really is about optimising ondisk layouts for some workloads
and large filesystems, and has nothing to do with IA64. Read the paper
Dave sent out earlier, it's quite interesting.

For other people, like AntonA, who has also been asking for this
functionality literally for years (and ended up trying to do his own
thing inside NTFS IIRC) it's to be able to access existing filesystems
from other operating systems. Here's a more recent discussion, I know
Anton had discussed it several times on fsdevel before this 2005 post
too: http://oss.sgi.com/archives/xfs/2005-01/msg00126.html

Although I'm sure others exist, I've never worked on any platform other
than Linux that doesn't support filesystem block sizes larger than the
pagesize. It's one thing to stick your head in the sand about the need
for this feature, it's another thing entirely to try to pass it off as an
"SGI mess", sorry.

I do entirely support the sentiment to stop this pissing match and get
on with fixing the problem though.

cheers.

-- 
Nathan
Re: [00/41] Large Blocksize Support V7 (adds memmap support)
On Wed, 19 Sep 2007, Nathan Scott wrote:
>
> FWIW (and I hate to let reality get in the way of a good conspiracy) -
> all SGI systems have always defaulted to using 4K blocksize filesystems;

Yes. And I've been told that:

> there's very few customers who would use larger

.. who apparently would like to move to x86-64. That was what people
implied at the kernel summit.

> especially as the Linux
> kernel limitations in this area are well known. There's no "16K mess"
> that SGI is trying to clean up here (and SGI have offered both IA64 and
> x86_64 systems for some time now, so not sure how you came up with that
> whacko theory).

Well, if that is the case, then I vote that we drop the whole patch-series
entirely. It clearly has no reason for existing at all.

There is *no* valid reason for 16kB blocksizes unless you have legacy
issues. The performance issues have nothing to do with the block-size, and
should be solvable by just making sure that your stupid "state of the art"
crap SCSI controller gets contiguous physical memory, which is best done
in the read-ahead code.

So get your stories straight, people.

		Linus
Re: [PATCH] Combine instrumentation menus in kernel/Kconfig.instrumentation
* [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> On Tue, 18 Sep 2007 17:12:59 EDT, Mathieu Desnoyers said:
>
> > +++ linux-2.6-lttng/kernel/Kconfig.instrumentation 2007-09-18 13:18:17.000
> 00 -0400
> > @@ -0,0 +1,40 @@
> > +menuconfig INSTRUMENTATION
> > +	bool "Instrumentation Support"
> > +	default y
> > +	---help---
> > +	  Say Y here to get to see options related to performance measurement,
> > +	  debugging, and testing. This option alone does not add any kernel
> > code.
> > +
> > +	  If you say N, all options in this submenu will be skipped and
> > disabled.
>
> OK, I'll bite - given the mention of 'debugging' there, do we want to go for
> broke and *also* suck in the 'Kernel Hacking' menu as well?

Instrumentation primarily aims at debugging user-space applications by
giving the ability to extract information across execution layers, hence
being a feature useful to users, not only kernel hackers. Therefore I
strongly doubt that it belongs to the kernel hacking submenu.

In today's world, where we face complex user-space problems involving
multithreaded, multiprocess applications, the kernel and hypervisors,
running on many cores, this kind of tool has proven useful to many, not
only kernel developers. Please have a look at the papers (especially the
OLS2007 paper) linked on http://ltt.polymtl.ca as a starting point if
you are interested in the question.

But yes, it can also be useful to kernel debugging, amongst other
things.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Re: [v4l-dvb-maintainer] 2.6.23-rc6-mm1 -- "dvb_dmx_swfilter" [drivers/media/video/video-buf-dvb.ko] undefined!
On 9/18/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Miles Lane wrote:
> > ERROR: "dvb_dmx_swfilter" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_net_init" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmxdev_init" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmx_init" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_register_frontend" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_register_adapter" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_unregister_adapter" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_frontend_detach" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_unregister_frontend" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmx_release" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmxdev_release" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_net_release" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "mt2131_attach" [drivers/media/video/cx23885/cx23885.ko] undefined!
> > ERROR: "s5h1409_attach" [drivers/media/video/cx23885/cx23885.ko] undefined!
> >
>
> The attached fix should fix the problem.

Thanks! Looks good here.

	Miles
Re: Git tree for old kernels from before the current tree
* Mon, 23 Jul 2007 11:02:39 -0700 (PDT)
>
> On Mon, 23 Jul 2007, Nicolas Pitre wrote:
>>
>> I started this once.
>>
>> I have (sort of) a GIT tree with all Linux revisions that I could find
>> from v0.01 up to v1.0.9. But the most interesting information and also
>> what is the most time consuming is the retrieval of announcement
>> messages for those releases in old mailing list or newsgroup archives to
>> serve as commit log data. It seems to be even harder to find for post
>> v1.0 releases.
>
> Yes, I agree. Google finds some of them, but (a) I was never very good
> about announcements anyway and (b) there's nothing really good to search
> for, so it's very hit-and-miss.
>
> Some of the really early release notes are easy to find, just because I
> made them available with the sources, but mostly I'd just have posted to
> the newsgroup/mailing lists.

Maybe this can be useful somehow:

ftp://ftp.shout.net/pub/users/mec/kcs
Re: cpuset trouble after hibernate
On 9/9/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> One of the cpus was unplugged during suspend... perhaps some
> save/restore is needed during hotplug/unplug?

Or else keep track separately in cpusets of

- cpus that the cpuset can run on
- cpus that the admin has specified for the cpuset to run on

hotplug/hotunplug events would only affect the former; userspace would
only see/modify the latter. Then when hibernate is over and the CPUs
are hotplugged back in, things would be back as before.

Paul
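The two-mask idea above can be sketched in a few lines. This is only a model under stated assumptions: one bit per CPU in a 64-bit word, and the names `demo_cpuset`, `demo_admin_write()` and `demo_hotplug_event()` are hypothetical, not part of the real cpuset code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
struct demo_cpuset {
	uint64_t requested;	/* what the admin asked for; only userspace changes it */
	uint64_t effective;	/* requested CPUs that are online right now */
};

/* Hotplug and hot-unplug only recompute the effective set. */
static void demo_hotplug_event(struct demo_cpuset *cs, uint64_t online)
{
	assert(cs);
	cs->effective = cs->requested & online;
}

/* A userspace write updates the request and derives the effective set. */
static void demo_admin_write(struct demo_cpuset *cs, uint64_t requested,
			     uint64_t online)
{
	assert(cs);
	cs->requested = requested;
	cs->effective = requested & online;
}
```

Because the requested mask is never narrowed by an unplug, bringing the CPUs back after hibernate restores the original placement without any explicit save and restore step.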
Re: [PATCH] JBD slab cleanups
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
> On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > > Here is the incremental small cleanup patch.
> > > >
> > > > Remove kmalloc usages in jbd/jbd2 and consistently use
> > > > jbd_kmalloc/jbd2_malloc.
> > >
> > > Shouldn't we kill jbd_kmalloc instead?
> > >
> >
> > It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
> > places to handle memory (de)allocation(
> in the future if we need to change memory allocation in jbd(e.g. not
> > using kmalloc or using different flag), we don't need to touch every
> > place in the jbd code calling jbd_kmalloc.
>
> I disagree. Why would jbd need to globally change the way it allocates
> memory? It currently uses kmalloc (and jbd_kmalloc) for allocating a
> variety of structures. Having to change one particular instance won't
> necessarily mean we want to change all of them. Adding unnecessary
> wrappers only obfuscates the code making it harder to understand. You
> wouldn't want every subsystem to have its own *_kmalloc() that took
> different arguments. Besides, there aren't that many calls to kmalloc
> and kfree in the jbd code, so there wouldn't be much pain in changing
> GFP flags or whatever, if it ever needed to be done.
>
> Shaggy

Okay, points taken. Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd totally. This patch is intended to
replace the patch in the mm tree. Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD: Replace slab allocations with page cache allocations

JBD allocates memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator
pages instead.
This will also prepare JBD for the large blocksize patchset. Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/jbd/commit.c |6 +-- fs/jbd/journal.c | 99 ++ fs/jbd/transaction.c | 12 +++--- fs/jbd2/commit.c |6 +-- fs/jbd2/journal.c | 99 ++ fs/jbd2/transaction.c | 18 - include/linux/jbd.h | 18 + include/linux/jbd2.h | 21 +- 8 files changed, 52 insertions(+), 227 deletions(-) Index: linux-2.6.23-rc6/fs/jbd/journal.c === --- linux-2.6.23-rc6.orig/fs/jbd/journal.c 2007-09-18 17:19:01.0 -0700 +++ linux-2.6.23-rc6/fs/jbd/journal.c 2007-09-18 17:51:21.0 -0700 @@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit); static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *); static void __journal_abort_soft (journal_t *journal, int errno); -static int journal_create_jbd_slab(size_t slab_size); /* * Helper function used to manage commit timeouts @@ -334,10 +333,10 @@ repeat: char *tmp; jbd_unlock_bh_state(bh_in); - tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS); + tmp = jbd_alloc(bh_in->b_size, GFP_NOFS); jbd_lock_bh_state(bh_in); if (jh_in->b_frozen_data) { - jbd_slab_free(tmp, bh_in->b_size); + jbd_free(tmp, bh_in->b_size); goto repeat; } @@ -654,7 +653,7 @@ static journal_t * journal_init_common ( journal_t *journal; int err; - journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL); + journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL); if (!journal) goto fail; memset(journal, 0, sizeof(*journal)); @@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal) } } - /* -* Create a slab for this blocksize -*/ - err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize)); - if (err) - return err; - /* Let the recovery code check whether it needs to recover any * data from the journal. 
*/ if (journal_recover(journal)) @@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode } /* - * Simple support for retrying memory allocations. Introduced to help to - * debug different VM deadlock avoidance strategies. - */ -void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry) -{ - return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0)); -} - -/* - * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed - * and allocate frozen and commit buffers from these slabs. - * - * Reason for doing this is to
Re: [PATCH] Reduce __print_symbol/sprint_symbol stack usage.
Hi Gilboa, On Sat, 15 Sep 2007, Gilboa Davara wrote: > > This is my second stab at solving the "stack over flow due to > dump_strace when close to stack-overflow is detected by do_IRQ" problem. > (Hopefully) this patch is creates less noise then the previous one. > > [snip] > > I'll try and create an option 2 (static allocation, minimal locking) > > patch and post ASAP. > > Hopefully it'll fare better. (While keeping the current interface intact > > and reducing the damage/noise) > > - Gilboa > > --- linux-2.6/kernel/kallsyms.orig2007-09-15 11:46:54.0 +0300 > +++ linux-2.6/kernel/kallsyms.c 2007-09-15 21:06:55.0 +0300 > @@ -306,13 +306,14 @@ int lookup_symbol_attrs(unsigned long ad > return lookup_module_symbol_attrs(addr, size, offset, modname, name); > } > > -/* Look up a kernel symbol and return it in a text buffer. */ > -int sprint_symbol(char *buffer, unsigned long address) > +/* Internal version: > + Look up a kernel symbol and module name and return them to the > + caller's buffer/namebuf buffers. */ /* * ... * ... */ is the general coding style here ... > +int __sprint_symbol(char *buffer, char *namebuf, unsigned long address) > { > - char *modname; > - const char *name; > unsigned long offset, size; > - char namebuf[KSYM_NAME_LEN]; > + const char *name; > + char *modname; > > name = kallsyms_lookup(address, , , , namebuf); > if (!name) > @@ -325,14 +326,35 @@ int sprint_symbol(char *buffer, unsigned > return sprintf(buffer, "%s+%#lx/%#lx", name, offset, size); > } > > +/* Exported version: > + Look up a kernel symbol and return it in a text buffer. */ ditto. > +int sprint_symbol(char *buffer, unsigned long address) > +{ > + char namebuf[KSYM_NAME_LEN]; Hmm, don't we intend to push this array out of the stack too? + static char namebuf[KSYM_NAME_LEN]; + static DEFINE_SPINLOCK(namebuf_lock); here ? > + > + return __sprint_symbol(buffer, namebuf, address); And you'd need to wrap spin_lock_irqsave()/spin_unlock_irqrestore() around this call. 
> +} > +static DEFINE_SPINLOCK(symbol_lock); Try to keep the declarations of a lock, and the data that it protects, close together. Since this lock is being used to protect "buffer", it makes sense to ... > /* Look up a kernel symbol and print it to the kernel messages. */ > void __print_symbol(const char *fmt, unsigned long address) > { > - char buffer[KSYM_SYMBOL_LEN]; > + /* Use static buffers instead of char array to reduce > + stack footprint in i386/4KSTACKS. > + Buffers must be protected against re-entry. */ > + static char namebuf[KSYM_NAME_LEN]; > + static char buffer[KSYM_SYMBOL_LEN]; ... have it: + static DEFINE_SPINLOCK(buffer_lock); here (note the name that exactly describes what the lock protects). And the namebuf array isn't required here, it's already there in sprint_symbol(), which you can call from ... > + unsigned long flags; > + > > - sprint_symbol(buffer, address); > + spin_lock_irqsave(_lock, flags); > + > + __sprint_symbol(buffer, namebuf, address); here ... sprint_symbol() ? > printk(fmt, buffer); > + > + spin_unlock_irqrestore(_lock, flags); But I still don't much like this :-( More importantly, if a panic occurs *below* this callchain (and let's say we ended up in this callchain because somebody put in a dump_stack() somewhere for debugging purposes), then we'd have a deadlock on our hands, and nothing gets printed for that panic. I don't know who maintains this part of kernel code, but you can try resubmitting (with the changes suggested above) to someone appropriate ... Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
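The split discussed in this review, an internal helper that takes its scratch buffer from the caller plus thin wrappers that decide where that buffer lives, can be sketched in userspace C. Everything here is an assumption made for illustration: the `demo_` names, the 32-byte buffer and the fake "one symbol every 16 bytes" lookup stand in for the real kallsyms code.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_NAME_LEN 32	/* hypothetical stand-in for KSYM_NAME_LEN */

/*
 * Internal helper: all scratch space comes from the caller, so this
 * function puts no large array on its own stack frame.
 */
static int demo_format_symbol(char *out, size_t out_len,
			      char *namebuf, size_t name_len,
			      unsigned long address)
{
	unsigned long base = address & ~0xfUL;	/* pretend symbols start every 16 bytes */
	unsigned long offset = address - base;

	assert(out && namebuf);
	snprintf(namebuf, name_len, "sym_%lx", base);
	return snprintf(out, out_len, "%s+%#lx", namebuf, offset);
}

/* Wrapper for ordinary callers: the scratch buffer lives on the stack. */
static int demo_sprint_symbol(char *out, size_t out_len, unsigned long address)
{
	char namebuf[DEMO_NAME_LEN];

	return demo_format_symbol(out, out_len, namebuf, sizeof(namebuf), address);
}

/*
 * Wrapper for stack-constrained callers: the scratch buffer is static.
 * This variant is not reentrant, so concurrent callers would need a
 * lock around it, and that lock is where the deadlock worry comes from.
 */
static int demo_sprint_symbol_static(char *out, size_t out_len,
				     unsigned long address)
{
	static char namebuf[DEMO_NAME_LEN];

	return demo_format_symbol(out, out_len, namebuf, sizeof(namebuf), address);
}
```

The static variant trades stack space for a shared buffer. That trade only pays off if the lock protecting the buffer can never be needed again further down the same call chain.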
Re: [PATCH 1/6] cpuset write dirty map
Andrew Morton wrote: > On Tue, 11 Sep 2007 18:36:34 -0700 > Ethan Solomita <[EMAIL PROTECTED]> wrote: > >> Add a dirty map to struct address_space > > I get a tremendous number of rejects trying to wedge this stuff on top of > Peter's mm-dirty-balancing-for-tasks changes. More rejects than I am > prepared to partially-fix so that I can usefully look at these changes in > tkdiff, so this is all based on a quick peek at the diff itself.. This isn't surprising. We're both changing the calculation of dirty limits. If his code is already into your workspace, then I'll have to do the merging after you release it. >> +#if MAX_NUMNODES <= BITS_PER_LONG > > The patch is sprinkled full of this conditional. > > I don't understand why this is being done. afaict it isn't described > in a code comment (it should be) nor even in the changelogs? I can add comments. > Given its overall complexity and its likelihood to change in the > future, I'd suggest that this conditional be centralised in a single > place. Something like > > /* >* nice comment goes here >*/ > #if MAX_NUMNODES <= BITS_PER_LONG > #define CPUSET_DIRTY_LIMITS 1 > #else > #define CPUSET_DIRTY_LIMITS 0 > #endif > > Then use #if CPUSET_DIRTY_LIMITS everywhere else. > > (This is better than #ifdef CPUSET_DIRTY_LIMITS because we'll et a > warning if someone typos '#if CPUSET_DITRY_LIMITS') I can add something like this. 
Probably something like: CPUSET_DIRTY_LIMITS_USEPTR >> --- 0/include/linux/fs.h 2007-09-11 14:35:58.0 -0700 >> +++ 1/include/linux/fs.h 2007-09-11 14:36:24.0 -0700 >> @@ -516,6 +516,13 @@ struct address_space { >> spinlock_t private_lock; /* for use by the address_space >> */ >> struct list_headprivate_list; /* ditto */ >> struct address_space*assoc_mapping; /* ditto */ >> +#ifdef CONFIG_CPUSETS >> +#if MAX_NUMNODES <= BITS_PER_LONG >> +nodemask_t dirty_nodes;/* nodes with dirty pages */ >> +#else >> +nodemask_t *dirty_nodes; /* pointer to map if dirty */ >> +#endif >> +#endif > > afacit there is no code comment and no changelog text which explains the > above design decision? There should be, please. OK. > > There is talk of making cpusets available with CONFIG_SMP=n. Will this new > feature be available in that case? (it should be). I'm not sure how useful it would be in that scenario, but for consistency we should still be able to specify varying dirty ratios (from patch 6/6). The above code wouldn't mean anything SMP=n since there's only the one node. We'd just be indicating whether the inode has any dirty pages, which we already know. > >> } __attribute__((aligned(sizeof(long; >> /* >> * On most architectures that alignment is already the case; but >> diff -uprN -X 0/Documentation/dontdiff 0/include/linux/writeback.h >> 1/include/linux/writeback.h >> --- 0/include/linux/writeback.h 2007-09-11 14:35:58.0 -0700 >> +++ 1/include/linux/writeback.h 2007-09-11 14:37:46.0 -0700 >> @@ -62,6 +62,7 @@ struct writeback_control { >> unsigned for_writepages:1; /* This is a writepages() call */ >> unsigned range_cyclic:1;/* range_start is cyclic */ >> void *fs_private; /* For use by ->writepages() */ >> +nodemask_t *nodes; /* Set of nodes of interest */ >> }; > > That comment is a bit terse. It's always good to be lavish when commenting > data structures, for understanding those is key to understanding a design. 
> OK >> /* >> diff -uprN -X 0/Documentation/dontdiff 0/kernel/cpuset.c 1/kernel/cpuset.c >> --- 0/kernel/cpuset.c2007-09-11 14:35:58.0 -0700 >> +++ 1/kernel/cpuset.c2007-09-11 14:36:24.0 -0700 >> @@ -4,7 +4,7 @@ >> * Processor and Memory placement constraints for sets of tasks. >> * >> * Copyright (C) 2003 BULL SA. >> - * Copyright (C) 2004-2006 Silicon Graphics, Inc. >> + * Copyright (C) 2004-2007 Silicon Graphics, Inc. >> * Copyright (C) 2006 Google, Inc >> * >> * Portions derived from Patrick Mochel's sysfs code. >> @@ -14,6 +14,7 @@ >> * 2003-10-22 Updates by Stephen Hemminger. >> * 2004 May-July Rework by Paul Jackson. >> * 2006 Rework by Paul Menage to use generic containers >> + * 2007 Cpuset writeback by Christoph Lameter. >> * >> * This file is subject to the terms and conditions of the GNU General >> Public >> * License. See the file COPYING in the main directory of the Linux >> @@ -1754,6 +1755,63 @@ int cpuset_mem_spread_node(void) >> } >> EXPORT_SYMBOL_GPL(cpuset_mem_spread_node); >> >> +#if MAX_NUMNODES > BITS_PER_LONG > > waah. In other places we do "MAX_NUMNODES <= BITS_PER_LONG" Your sanity is important to me. Will fix. > >> + >> +/* >> + * Special functions for NUMA systems with a large number of nodes. >> + * The nodemask
Re: iso9660 vs udf
On Wed, Sep 19, 2007 at 05:48:28AM +0530, Satyam Sharma wrote:

> > > On the other hand, this filesystem announces itself as UDF
> > > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"),
> > > perhaps the kernel code should be more robust.
>
> Could you send the complete dmesg log, and what you mean with filesystem/
> kernel (incorrectly?) announcing it as UDF here ... I agree with Jan,
> this sounds like an issue with mount(8) to me.

You already got the relevant part of the dmesg log. Slightly more below.

I think the filesystem can be treated both as iso9660 and as udf,
at least that is what I seem to recall CD-BRIDGE means.
Thus, if the kernel cannot mount it as udf, I think it is a kernel flaw.

Given that kernel flaw, and the fact that mounting as iso9660 works,
mount(8) could work around the kernel problem by guessing iso9660.
But maybe we should first try to fix the kernel.

Andries


Failed mount:
UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', timestamp 2006/03/07 16:26 (1078)
udf: udf_read_inode(ino 547) failed !bh
UDF-fs: Error in udf_iget, block=1, partition=1

Success:
ISO 9660 Extensions: Microsoft Joliet Level 3
ISOFS: changing to secondary root
Re: 2.6.23-rc6-mm1 panic (memory controller issue ?)
On Tue, 2007-09-18 at 15:21 -0700, Badari Pulavarty wrote:
> Hi Balbir,
>
> I get following panic from SLUB, while doing simple fsx tests.
> I haven't used any container/memory controller stuff except
> that I configured them in :(
>
> Looks like slub doesn't like one of the flags passed in ?
>
> Known issue ? Ideas ?
>

I think, I found the issue. I am still running tests to
verify. Does this sound correct ?

Thanks,
Badari

Need to strip __GFP_HIGHMEM flag while passing to
mem_container_cache_charge().

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>

 mm/filemap.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc6/mm/filemap.c
===
--- linux-2.6.23-rc6.orig/mm/filemap.c	2007-09-18 12:43:54.0 -0700
+++ linux-2.6.23-rc6/mm/filemap.c	2007-09-18 19:14:44.0 -0700
@@ -441,7 +441,8 @@ int filemap_write_and_wait_range(struct
 int add_to_page_cache(struct page *page, struct address_space *mapping,
 		pgoff_t offset, gfp_t gfp_mask)
 {
-	int error = mem_container_cache_charge(page, current->mm, gfp_mask);
+	int error = mem_container_cache_charge(page, current->mm,
+					gfp_mask & ~__GFP_HIGHMEM);
 
 	if (error)
 		goto out;
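The fix above is an instance of a common pattern: clear one bit from a flags word before handing it to a callee that must not see it, and leave every other bit alone. A minimal sketch follows, with made-up flag values; the real `gfp_t` bit assignments differ, and `demo_strip_highmem()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the real gfp_t bits differ. */
#define DEMO_GFP_HIGHMEM	0x02u
#define DEMO_GFP_WAIT		0x10u

static_assert((DEMO_GFP_HIGHMEM & DEMO_GFP_WAIT) == 0,
	      "flags must occupy distinct bits");

/* Return the mask with the highmem bit cleared and all other bits kept. */
static unsigned int demo_strip_highmem(unsigned int gfp_mask)
{
	return gfp_mask & ~DEMO_GFP_HIGHMEM;
}
```

Masking at the call site, as the patch does, keeps the caller's own `gfp_mask` unchanged for whatever it does with it afterwards.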
Re: [PATCH 6/6] cpuset dirty limits
Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Andrew Morton wrote:
>
>>> +	mutex_lock(&callback_mutex);
>>> +	*cs_int = val;
>>> +	mutex_unlock(&callback_mutex);
>> I don't think this locking does anything?
>
> Locking is wrong here. The lock needs to be taken before the cs pointer
> is dereferenced from the caller.

	I think we can just remove the callback_mutex lock. Since the change is
coming from an update to a cpuset filesystem file, the cpuset is not
going anywhere since the inode is open. And I don't see that any code
really cares whether the dirty ratios change out from under them.

>
>>> +	return 0;
>>> +}
>>> +
>>>  /*
>>>   * Frequency meter - How fast is some event occurring?
>>>   *
>>> ...
>>> +void cpuset_get_current_ratios(int *background_ratio, int *throttle_ratio)
>>> +{
>>> +	int background = -1;
>>> +	int throttle = -1;
>>> +	struct task_struct *tsk = current;
>>> +
>>> +	task_lock(tsk);
>>> +	background = task_cs(tsk)->background_dirty_ratio;
>>> +	throttle = task_cs(tsk)->throttle_dirty_ratio;
>>> +	task_unlock(tsk);
>> ditto?
>
> It is required to take the task lock while dereferencing the tasks cpuset
> pointer.

	Agreed.
	-- Ethan
Re: Wasting our Freedom
> sorry, but calling attribution claims of any sort "petty" is nothing
> short of dangerous ignorance.

Says a man who has a .sig of

"SDF Public Access UNIX System - http://sdf.lonestar.org"

Well sdf.lonestar.org claims to be NetBSD so might I suggest your
dangerous ignorance starts at the Unix trademark.

And please take this where it belongs which is the relevant wireless list.
Better yet leave the dispute to those it actually involves, which is not
most of the OpenBSD community, nor the Linux kernel team, but a small
group of developers in the OpenBSD wireless world and a few people in the
ath5k GPL project.
Re: iso9660 vs udf
Hi,


On Tue, 18 Sep 2007, Jan Kara wrote:
>
> > Today I got a CD. MacOS does not mount it and Linux does not
> > mount it without an explicit filesystemtype option.
> > That is,
> > # mount /dev/hdc /dir -t iso9660
> > works fine, but
> > # mount /dev/hdc /dir
> > mount: you didn't specify a filesystem type for /dev/hdc
> >        I will try type udf
> > mount: wrong fs type, bad option, bad superblock on /dev/hdc,
> >        missing codepage or other error
> >        In some cases useful info is found in syslog - try
> >        dmesg | tail or so
> > # dmesg | tail
> > UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82',
> > timestamp 2006/03/07 16:26 (1078)
> > udf: udf_read_inode(ino 547) failed !bh
> > UDF-fs: Error in udf_iget, block=1, partition=1

That comes from udf_fill_super(), which shouldn't have been called
in the first place ...

> > Google gave me half a dozen other people that mentioned the same
> > problem (with the same inode 547). Clearly some CD mastering software
> > produces a format that Linux and MacOS do not handle easily.
> >
> > One result of this letter will be that people with the same problem
> > learn via Google that using the "-t iso9660" option may help.
> >
> > What goes wrong on the mount side is that when it hesitates between
> > iso9660 and udf it decides for udf when seeing "NSR02".
> > Maybe the heuristics in mount should be tuned.
>   Yes, this seems like a mount problem but you should contact mount
> maintainer for that... I guess hardly anyone will help you with this on
> this list.
>
> > On the other hand, this filesystem announces itself as UDF
> > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"),
> > perhaps the kernel code should be more robust.

Could you send the complete dmesg log, and what you mean with filesystem/
kernel (incorrectly?) announcing it as UDF here ... I agree with Jan,
this sounds like an issue with mount(8) to me.

> > If anybody feels responsible for mount and/or this kernel area
> > we might discuss.
>   I'm kind of taking care about UDF in kernel. What do you find
> inappropriate on the kernel reaction? You mean we should produce some
> better error message into the log?
Re: [PATCH] UML - Fix irqstack crash
On Tue, 18 Sep 2007 19:33:36 -0400 Jeff Dike <[EMAIL PROTECTED]> wrote:

> ===
> --- linux-2.6.17.orig/arch/um/os-Linux/signal.c	2007-09-09 11:15:37.0 -0400
> +++ linux-2.6.17/arch/um/os-Linux/signal.c	2007-09-18 12:32:40.0 -0400
> @@ -119,7 +119,7 @@ void (*handlers[_NSIG])(int sig, struct
>  
>  void handle_signal(int sig, struct sigcontext *sc)
>  {
> -	unsigned long pending = 0;
> +	unsigned long pending = 1 << sig;

You want 1UL there.
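The reason for the `1UL` remark: in `1 << sig` the literal `1` has type `int`, so the shift is done in `int` width and only converted to `unsigned long` afterwards. Where `int` is 32 bits, shifting into or past bit 31 is undefined even if the destination is 64 bits wide. A minimal sketch, with `demo_sig_mask()` as a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>

/* Build a one-bit mask for signal number sig in an unsigned long. */
static unsigned long demo_sig_mask(int sig)
{
	/* The shift count must be smaller than the width of the left operand. */
	assert(sig >= 0 && (unsigned int)sig < sizeof(unsigned long) * CHAR_BIT);

	/* 1UL makes the shift happen at unsigned long width, not int width. */
	return 1UL << sig;
}
```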
Re: NFS4 authentification / fsuid
On Fri, 7 Sep 2007, Kyle Moffett wrote:
>
> So you can't draw any relationships between "Protect the end-user" with
> "Protect the device FROM the end-user", the former can be done very reliably
                               ^^^ *attacker*
> to whatever level of risk-reduction you need and the latter can't practically
> be done at all.

Well, you're the one who called solving the physical access problem
"easy" here ... :-)
Re: NFS4 authentification / fsuid
On Thu, 6 Sep 2007, Kyle Moffett wrote: > > On Sep 06, 2007, at 19:35:14, Trond Myklebust wrote: > > > > On Thu, 2007-09-06 at 19:30 -0400, Kyle Moffett wrote: > > > > > > On Sep 06, 2007, at 11:06:16, J. Bruce Fields wrote: > > > > The question of how to protect against someone with *physical* ^^^ > > > > access certainly is more difficult, but surely that's a separate ^^ > > > > problem. > > > > > > Actually, that's a fairly simple problem (barring disassembling the system > > > and attaching a hardware debugger). You encrypt the root filesystem and > > > require a password to boot (See: LUKS). Debian has built-in support for > > > installing onto fs-on-LVM-on-crypt-on-RAID, and it works quite well on all > > > the laptops I use regularly. It's not even much of a speed penalty; once > > > you take the overhead of hitting a 5400RPM laptop drive you can chew > > > thousands of cycles of CPU without anybody noticing (much). Then all you > > > have to do is burn a copy of your /boot with bootloader onto some > > > read-only media (like a finalized CDROM/DVDROM) and you're set to go. > > > > Disconnect battery, and watch boot password go 'poof!'. > > Umm, I did say "encrypt the root filesystem", didn't I? Booting my laptops ^^^ The whole *point* here is to secure against physical access -- then how can you assume "barring disassembling the system"? If you're not considering attacks such as those, then how _are_ you solving the physical access problem in the first place? :-) > this way follows this procedure: > 1) Enter BIOS boot menu > 2) Insert /boot CDROM > 3) Select the "CDROM" entry > 4) Wait for kernel to start and run through initramfs > 5) Type password into the initramfs prompt so that it can DECRYPT THE ROOT > FILESYSTEM > 6) Continue to boot the system. > > Under this setup, tinkering with my BIOS does virtually nothing; the only > avenues of attack are strictly of the "Install a hardware keylogger" variety. 
Doesn't flashing/replacing your BIOS firmware/chip count as tinkering? Then I don't really need a "hardware keylogger", do I ...
Re: Wasting our Freedom
On Tue, Sep 18, 2007 at 08:56:47AM -0400, Theodore Tso wrote:
> all of the megabytes and megabytes of flamewar is over these two
> lines:
>
> > * Copyright (c) 2006-2007 Nick Kossifidis <[EMAIL PROTECTED]>
> > * Copyright (c) 2007 Jiri Slaby <[EMAIL PROTECTED]>
>
> Petty, isn't it? Let's just say it's b.s. like this which is why, 16
> years ago, I decided to work with Linux instead of BSD.

copyright assertion == claim of ownership, or possession. possession is 9/10 of the law. was it petty of UCB to claim copyrights over code USL claimed ownership of? was it also petty of Novell to claim that they, and not SCO, owned the copyright to UNIX?

sorry, but calling attribution claims of any sort "petty" is nothing short of dangerous ignorance.

--
[EMAIL PROTECTED]
SDF Public Access UNIX System - http://sdf.lonestar.org
[PATCH] UML - Fix irqstack crash
This patch fixes a crash caused by an interrupt coming in when an IRQ stack is being torn down. When this happens, handle_signal will loop, setting up the IRQ stack again because the tearing down had finished, and handling whatever signals had come in. However, to_irq_stack returns a mask of pending signals to be handled, plus bit zero is set if the IRQ stack was already active, and thus shouldn't be torn down. This causes a problem because when handle_signal goes around the loop, sig will be zero, and to_irq_stack will duly set bit zero in the returned mask, faking handle_signal into believing that it shouldn't tear down the IRQ stack and return thread_info pointers back to their original values. This will eventually cause a crash, as the IRQ stack thread_info will continue pointing to the original task_struct and an interrupt will look into it after it has been freed. The fix is to stop passing a signal number into to_irq_stack. Rather, the pending signals mask is initialized beforehand with the bit for sig already set. References to sig in to_irq_stack can be replaced with references to the mask. 
Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
---
 arch/um/include/kern_util.h |    2 +-
 arch/um/kernel/irq.c        |    7 ++++---
 arch/um/os-Linux/signal.c   |    4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6.17/arch/um/include/kern_util.h
===================================================================
--- linux-2.6.17.orig/arch/um/include/kern_util.h	2007-09-11 10:12:26.0 -0400
+++ linux-2.6.17/arch/um/include/kern_util.h	2007-09-18 12:31:28.0 -0400
@@ -117,7 +117,7 @@ extern void sigio_handler(int sig, union
 
 extern void copy_sc(union uml_pt_regs *regs, void *from);
 
-unsigned long to_irq_stack(int sig, unsigned long *mask_out);
+extern unsigned long to_irq_stack(unsigned long *mask_out);
 unsigned long from_irq_stack(int nested);
 
 #endif
Index: linux-2.6.17/arch/um/kernel/irq.c
===================================================================
--- linux-2.6.17.orig/arch/um/kernel/irq.c	2007-09-11 10:14:09.0 -0400
+++ linux-2.6.17/arch/um/kernel/irq.c	2007-09-18 12:32:08.0 -0400
@@ -518,13 +518,13 @@ int init_aio_irq(int irq, char *name, ir
 
 static unsigned long pending_mask;
 
-unsigned long to_irq_stack(int sig, unsigned long *mask_out)
+unsigned long to_irq_stack(unsigned long *mask_out)
 {
 	struct thread_info *ti;
 	unsigned long mask, old;
 	int nested;
 
-	mask = xchg(&pending_mask, 1 << sig);
+	mask = xchg(&pending_mask, *mask_out);
 	if(mask != 0){
 		/* If any interrupts come in at this point, we want to
 		 * make sure that their bits aren't lost by our
@@ -534,7 +534,7 @@ unsigned long to_irq_stack(int sig, unsi
 		 * and pending_mask contains a bit for each interrupt
 		 * that came in.
 		 */
-		old = 1 << sig;
+		old = *mask_out;
 		do {
 			old |= mask;
 			mask = xchg(&pending_mask, old);
@@ -550,6 +550,7 @@ unsigned long to_irq_stack(int sig, unsi
 
 		task = cpu_tasks[ti->cpu].task;
 		tti = task_thread_info(task);
+
 		*ti = *tti;
 		ti->real_thread = tti;
 		task->stack = ti;
Index: linux-2.6.17/arch/um/os-Linux/signal.c
===================================================================
--- linux-2.6.17.orig/arch/um/os-Linux/signal.c	2007-09-09 11:15:37.0 -0400
+++ linux-2.6.17/arch/um/os-Linux/signal.c	2007-09-18 12:32:40.0 -0400
@@ -119,7 +119,7 @@ void (*handlers[_NSIG])(int sig, struct
 
 void handle_signal(int sig, struct sigcontext *sc)
 {
-	unsigned long pending = 0;
+	unsigned long pending = 1 << sig;
 
 	do {
 		int nested, bail;
@@ -134,7 +134,7 @@ void handle_signal(int sig, struct sigco
 		 * have to return, and the upper handler will deal
 		 * with this interrupt.
 		 */
-		bail = to_irq_stack(sig, &pending);
+		bail = to_irq_stack(&pending);
 		if(bail)
 			return;
 
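The pending-mask handoff that the patch touches can be modeled in userspace. The following is only a sketch under stated assumptions: it uses C11 atomic_exchange in place of the kernel's xchg(), the names pending_mask_model and claim_pending are hypothetical, and the thread_info switching done by the real to_irq_stack() is left out entirely.

```c
#include <stdatomic.h>

/* Userspace stand-in for the kernel's pending_mask (hypothetical name). */
static atomic_ulong pending_mask_model;

/* Publish new_bits as pending work.
 * Returns 0 if nothing was pending before (the caller becomes the handler),
 * or 1 if a handler was already active; in that case new_bits have been
 * merged into the mask and the caller should bail out. */
static int claim_pending(unsigned long new_bits)
{
	unsigned long mask, old;

	mask = atomic_exchange(&pending_mask_model, new_bits);
	if (mask == 0)
		return 0;

	/* Someone else is active. Keep OR-ing what we read back into what
	 * we write until the value read equals the value written, so bits
	 * set concurrently by other signals are not lost. */
	old = new_bits;
	do {
		old |= mask;
		mask = atomic_exchange(&pending_mask_model, old);
	} while (mask != old);
	return 1;
}
```

The point of passing the mask rather than a signal number is visible here: the caller decides which bits to publish, so a second trip around the handle_signal loop does not invent a bit for "signal zero".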
Re: [PATCH] Clarify pci_iomap() usage for MMIO-only devices
On Tue, 18 Sep 2007, Jeff Garzik wrote:
>
> Easy enough... 'pcimap' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

This is wrong. You must not put it in lib/iomap.c, since that file is only compiled for architectures that use CONFIG_GENERIC_IOMAP.

So you need to put it in some *generic* PCI place, like drivers/pci/pci.c or similar.

Linus
Re: BUG? Suspend during active sound playback kills sound
On Tue, 18 Sep 2007 16:06:21 -0700 Shentino <[EMAIL PROTECTED]> wrote:

> Run any program that opens the ALSA sound (and probably the dsp
> legacy), and then suspend to disk during playback.
>
> On the next and each subsequent thawout, the sound is dead even if you
> close and reopen the sound. Only a "cold boot" can fix it.

Which kernel version, which sound driver and what type of sound card?

Thanks.
Re: NFS4 authentification / fsuid
On Fri, 7 Sep 2007, J. Bruce Fields wrote:
>
> On Fri, Sep 07, 2007 at 01:32:52AM +0200, Trond Myklebust wrote:
> > Sorry. Of course, you have to copy the entire /lib, etc. onto the tmpfs,
> > but you get the gist
> >
> > The point is that it is easy to subvert userspace if you have enough
> > privileges. In the above example it may not be entirely undetectable,
> > but who here is running a script on every login to check that / is
> > indeed uncompromised?
>
> I suppose this is the motivation for things like the "secure attention
> key"?
>
> But I'm most curious actually about to what degree the kernel itself is
> vulnerable to root (without a reboot). Is disabling /dev/kmem and
> module-loading in theory enough?

No, not in theory, not in practice. But yeah, restricting an attacker's ability to hack hardware (by controlling physical access) does take out a whole class of attack vectors.

But, seriously, such discussion has the tendency to quickly get too theoretical (thus losing practical significance). For example, would we not also need to prevent the (userspace) superuser from being able to run arbitrary executables that can modify firmware? Okay, let's say we have a kernelspace infrastructure of verifying cryptographic signatures on binaries before executing them ... but how practical/usable is this? How practically/universally applicable is a system whose security derives from keeping machines behind locked doors and protected by incorruptible, armed guards?

Overall, I tend to be unenthusiastic about most schemes that claim to have solved the user-kernel security problem (with no loss of usability/practicality).
Re: [patch 3/4] Linux Kernel Markers - Documentation
On Tue, 18 Sep 2007 17:13:27 -0400 Mathieu Desnoyers wrote:

> Here is some documentation explaining what is/how to use the Linux
> Kernel Markers.
>
> ---
>
> Documentation/markers/markers.txt | 93 +++
> Documentation/markers/src/Makefile |7 ++
> Documentation/markers/src/marker-example.c | 55
> Documentation/markers/src/probe-example.c | 98 +
> 4 files changed, 253 insertions(+)
>
> Index: linux-2.6-lttng/Documentation/markers/markers.txt
> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-2.6-lttng/Documentation/markers/markers.txt 2007-09-07 09:17:45.0 -0400
> @@ -0,0 +1,93 @@

> +The marker mechanism supports inserting multiple instances of the same marker.
> +Markers can be put in inline functions, inlined static functions, and
> +unrolled loops.

as well as regular functions ?

> +* Probe / marker example
> +
> +See the example provided in Documentation/markers/markers/src

drop one of the "markers/" path components

> +Run, as root :
> +
> +make
> +insmod marker-example.ko (insmod order is not important)
> +insmod probe-example.ko
> +cat /proc/marker-example (returns an expected error)
> +rmmod marker-example probe-example
> +dmesg

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: NFS4 authentification / fsuid
On Thu, 6 Sep 2007, J. Bruce Fields wrote: > > On Thu, Sep 06, 2007 at 01:59:50PM +0530, Satyam Sharma wrote: > > Oh and btw, note that we're talking of the (lack of) security of a > > "running kernel" here -- because across reboots, there is /really/ > > *absolutely* no such thing as "kernelspace security" because the superuser > > will simply switch the vmlinuz itself ... > > Well, the machine could be booting from cdrom, and could live in a > locked machine room. And how is this different from the "trusted tamperproof hardware" solution I proposed earlier? From an attack vector p.o.v. they are both precisely the same -- both of them are designed to prevent the attacker from gaining unfettered access to system hardware, hmm? Oh, actually, if past history is anything to go by, then your scheme is provably weaker. Security systems are invariably always broken at their weakest link, which is invariably always the human/social element, and your scheme derives its security by relying on *social* element. To elaborate my point, what prevents me from bribing / torturing / blackmailing whoever owns the key to that locked server room and ... The attack is "non-technical", but hey, so was your security :-) > Or people with root on a virtual host don't > necessarily have the ability to replace the kernel for that host. Again, you're restricting physical access ... but okay, this is a slightly more plausible solution (but one that applies to only a *specific* kind of situation). Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Clarify pci_iomap() usage for MMIO-only devices
Benjamin Herrenschmidt wrote: On Tue, 2007-09-18 at 16:21 -0400, Jeff Garzik wrote: A new pci_mmio_map() helper, to be used with 100% MMIO hardware, might help eliminate confusion. Maybe not the best name in theory but at least would show that it relates to existing ioremap would be pci_ioremap() Easy enough... 'pcimap' branch of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git Jeff commit 6e09c71822f76c618353682bf295fc7588284521 Author: Jeff Garzik <[EMAIL PROTECTED]> Date: Tue Sep 18 19:06:08 2007 -0400 Add pci_ioremap() to generic iomap lib. (arches that don't wish to use lib/iomap.c's version may fill in their own) Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]> include/asm-generic/iomap.h |1 + lib/iomap.c | 34 ++ 2 files changed, 35 insertions(+) 6e09c71822f76c618353682bf295fc7588284521 diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h index cde592f..611e6cf 100644 --- a/include/asm-generic/iomap.h +++ b/include/asm-generic/iomap.h @@ -63,6 +63,7 @@ extern void ioport_unmap(void __iomem *); /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */ struct pci_dev; extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max); +extern void __iomem *pci_ioremap(struct pci_dev *dev, int bar, unsigned long max); extern void pci_iounmap(struct pci_dev *dev, void __iomem *); #endif diff --git a/lib/iomap.c b/lib/iomap.c index 864f2ec..0338da0 100644 --- a/lib/iomap.c +++ b/lib/iomap.c @@ -275,9 +275,43 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen) return NULL; } +/** + * pci_ioremap - create a virtual mapping cookie for a memory-based PCI BAR + * @dev: PCI device that owns the BAR + * @bar: BAR number + * @maxlen: length of MMIO memory to map + * + * Using this function you will get a __iomem address to your device BAR. + * You can access it using read*() and write*(). + * + * @maxlen specifies the maximum length to map. 
If you want to get access to + * the complete BAR without checking for its length first, pass %0 here. + * */ +void __iomem *pci_ioremap(struct pci_dev *dev, int bar, unsigned long maxlen) +{ + unsigned long start = pci_resource_start(dev, bar); + unsigned long len = pci_resource_len(dev, bar); + unsigned long flags = pci_resource_flags(dev, bar); + + if (!len || !start) + return NULL; + if (maxlen && len > maxlen) + len = maxlen; + if (flags & IORESOURCE_IO) + return NULL; + if (flags & IORESOURCE_MEM) { + if (flags & IORESOURCE_CACHEABLE) + return ioremap(start, len); + return ioremap_nocache(start, len); + } + /* What? */ + return NULL; +} + void pci_iounmap(struct pci_dev *dev, void __iomem * addr) { IO_COND(addr, /* nothing */, iounmap(addr)); } EXPORT_SYMBOL(pci_iomap); +EXPORT_SYMBOL(pci_ioremap); EXPORT_SYMBOL(pci_iounmap);
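The decision logic of the proposed pci_ioremap() can be exercised outside the kernel with a small model. This is a sketch only: the MODEL_* flag values are placeholders (the real IORESOURCE_* constants are defined by the kernel), the function and enum names are hypothetical, and the model reports which kind of mapping would be requested instead of calling ioremap() or ioremap_nocache().

```c
/* Placeholder flag bits for this userspace model; they are NOT the
 * kernel's IORESOURCE_* values. */
#define MODEL_RES_IO        0x1UL
#define MODEL_RES_MEM       0x2UL
#define MODEL_RES_CACHEABLE 0x4UL

enum model_map_kind { MODEL_MAP_NONE, MODEL_MAP_CACHED, MODEL_MAP_UNCACHED };

/* Mirror the checks in the patch: reject empty or unassigned BARs,
 * clamp the length to maxlen (0 means "whole BAR"), refuse I/O port
 * BARs, and pick a cached or uncached mapping for memory BARs. */
static enum model_map_kind model_pci_ioremap(unsigned long start,
					     unsigned long len,
					     unsigned long flags,
					     unsigned long maxlen,
					     unsigned long *mapped_len)
{
	*mapped_len = 0;
	if (!len || !start)
		return MODEL_MAP_NONE;
	if (maxlen && len > maxlen)
		len = maxlen;
	if (flags & MODEL_RES_IO)
		return MODEL_MAP_NONE;
	if (flags & MODEL_RES_MEM) {
		*mapped_len = len;
		return (flags & MODEL_RES_CACHEABLE) ? MODEL_MAP_CACHED
						     : MODEL_MAP_UNCACHED;
	}
	return MODEL_MAP_NONE;
}
```

The difference from pci_iomap() shows up in the I/O branch: a port BAR yields no mapping at all, which is what lets an MMIO-only driver use read*() and write*() on the result.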
BUG? Suspend during active sound playback kills sound
Run any program that opens the ALSA sound device (and probably the legacy dsp device as well), and then suspend to disk during playback.

On the next and each subsequent thaw, the sound is dead even if you close and reopen the sound device. Only a "cold boot" can fix it.
Re: CFS: some bad numbers with Java/database threading [FIXED]
On 09/18/2007 06:46 PM, Ingo Molnar wrote: >>> We need a (tested) >>> solution for 2.6.23 and the CFS-devel patches are not for 2.6.23. I've >>> attached below the latest version of the -rc6 yield patch - the switch >>> is not dependent on SCHED_DEBUG anymore but always available. >>> >> Is this going to be merged? And will you be making the default == 1 or >> just leaving it at 0, which forces people who want the older behavior >> to modify the default? > > not at the moment - Antoine suggested that the workload is probably fine > and the patch against -rc6 would have no clear effect anyway so we have > nothing to merge right now. (Note that there's no "older behavior" > possible, unless we want to emulate all of the O(1) scheduler's > behavior.) But ... we could still merge something like that patch, but a > clearer testcase is needed. The JVM's i have access to work fine. I just got a bug report today: https://bugzilla.redhat.com/show_bug.cgi?id=295071 == Description of problem: The CFS scheduler does not seem to implement sched_yield correctly. If one program loops with a sched_yield and another program prints out timing information in a loop. You will see that if both are taskset to the same core that the timing stats will be twice as long as when they are on different cores. This problem was not in 2.6.21-1.3194 but showed up in 2.6.22.4-65 and continues in the newest released kernel 2.6.22.5-76. 
Version-Release number of selected component (if applicable):
2.6.22.4-65 through 2.6.22.5-76

How reproducible:
Very

Steps to Reproduce:
compile task1

#include <sched.h>

int main() {
	while (1) {
		sched_yield();
	}
	return 0;
}

and compile task2

#include <stdio.h>
#include <sys/time.h>

int main() {
	while (1) {
		int i;
		struct timeval t0,t1;
		double usec;

		gettimeofday(&t0, 0);
		for (i = 0; i < 1; ++i)
			;
		gettimeofday(&t1, 0);
		usec = (t1.tv_sec * 1e6 + t1.tv_usec) - (t0.tv_sec * 1e6 + t0.tv_usec);
		printf ("%8.0f\n", usec);
	}
	return 0;
}

Then run:
"taskset -c 0 ./task1"
"taskset -c 0 ./task2"

You will see that both tasks use 50% of the CPU.
Then kill task2 and run:
"taskset -c 1 ./task2"

Now task2 will run twice as fast verifying that it is not some anomaly
with the way top calculates CPU usage with sched_yield.

Actual results:
Tasks with sched_yield do not yield like they are supposed to.

Expected results:
The sched_yield task's CPU usage should go to near 0% when another
task is on the same CPU.
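For anyone who wants to repeat the measurement from a single test harness, the two programs in the report can be folded into two helper functions. This is a sketch under assumptions: the function names are hypothetical, the iteration count is a parameter because the loop bound in the report as archived is not reliable, and the volatile accumulator is added only so the compiler does not delete the busy loop.

```c
#include <sched.h>
#include <sys/time.h>

/* Equivalent of task1, but bounded: yield the CPU n times and return
 * how many sched_yield() calls reported success. */
static long yield_n_times(long n)
{
	long ok = 0;

	for (long i = 0; i < n; ++i)
		if (sched_yield() == 0)
			++ok;
	return ok;
}

/* Equivalent of one pass of task2: time a busy loop of `iterations`
 * steps and return the elapsed wall-clock time in microseconds. */
static double busy_loop_usec(long iterations)
{
	struct timeval t0, t1;
	volatile long sink = 0;	/* keeps the loop from being optimized out */

	gettimeofday(&t0, 0);
	for (long i = 0; i < iterations; ++i)
		sink += i;
	gettimeofday(&t1, 0);
	return (t1.tv_sec * 1e6 + t1.tv_usec) - (t0.tv_sec * 1e6 + t0.tv_usec);
}
```

Calling busy_loop_usec() repeatedly under "taskset -c 0", with and without a yielding process pinned to the same CPU, should reproduce the comparison described in the report.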
Re: [linux-dvb] [PATCH] Userspace tuner
On Tue, Sep 18, 2007 at 07:56:05PM -0300, Mauro Carvalho Chehab wrote:
> proprietary format. This way, a userspace app may use the userspace
> library as a "fallback method" for unknown FOURCC formats. The result
> will probably be far from optimal in some cases (since it
> probably means double buffering), but this will at least allow userspace
> apps to work. As performance becomes an issue, the userspace app
> developer may use the GPL code in the userspace API as a reference to write
> a proper optimized format driver for its apps.

You can dynamically load libraries based on constructed path names, which means you can write a simple library for media conversions which in turn will try to open libv4l-format-ABCD.so for any format it doesn't know, and thus is extensible.
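The "constructed path name" step could look roughly like the sketch below. Everything here is an assumption made for illustration: the function name format_lib_name is hypothetical, the libv4l-format-ABCD.so pattern is taken from the message above rather than from any existing library, and the FOURCC is assumed to be packed with its first character in the low byte. The resulting string is what a conversion library would then hand to dlopen().

```c
#include <stdio.h>
#include <stdint.h>

/* Build "libv4l-format-ABCD.so" from a packed FOURCC code.
 * Returns 0 on success, -1 if buf is too small for the name. */
static int format_lib_name(uint32_t fourcc, char *buf, size_t buflen)
{
	int n = snprintf(buf, buflen, "libv4l-format-%c%c%c%c.so",
			 (int)(fourcc & 0xff),
			 (int)((fourcc >> 8) & 0xff),
			 (int)((fourcc >> 16) & 0xff),
			 (int)((fourcc >> 24) & 0xff));

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

A real loader would presumably also check that the four bytes are printable and contain no path separators before passing the name to dlopen().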
Re: [linux-dvb] [PATCH] Userspace tuner
> The reason why there is no single 'format conversion library' that
> everybody uses is because of the large differences between requirements
> for such a thing. The line between 'format conversion' and things such
> as a video codec, or image processing is very vague.

Agreed. What I think should happen is that the userspace library should focus on the "weird" codecs, e.g. those which use some sort of proprietary format. This way, a userspace app may use the userspace library as a "fallback method" for unknown FOURCC formats. The result will probably be far from optimal in some cases (since it probably means double buffering), but this will at least allow userspace apps to work. As performance becomes an issue, the userspace app developer may use the GPL code in the userspace API as a reference to write a proper optimized format driver for its apps.

Just my 2 cents.

Cheers,
Mauro
Re: 2.6.23-rc6-mm1
Gabriel C wrote: > Sam Ravnborg wrote: >> On Tue, Sep 18, 2007 at 03:42:58PM -0400, Miles Lane wrote: >>> On 9/18/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote: Hi Miles. On Tue, Sep 18, 2007 at 11:27:23AM -0400, Miles Lane wrote: > Selecting Help for "Subarchitecture Type" causes "make menuconfig" to > crash, and the bash display settings have to be reset. Not reproduceable here. But I noticed that we pass a null pointer to a vsprintf function which in the cases you pointed out printed a (null) at my system. Could you plase try if attached patch fix your system. >>> Sorry, it still crashes. I am running Ubuntu pre-6.10 (Gutsy -- the >>> development version of the distro). Maybe I should try "make >>> mrproper" first? >> make mrproper should not do any difference here. >> I rather think you hit some ncurses bug. >> >> If you could add '-g' to HOSTCFLAGS in top-level Makefile >> and then do: >> rm scripts/kconfig/mconf.o scripts/kconfig/mconf >> make menuconfig >> >> (to build mconf and to check that the error is still reproduceable). >> And then run it in a debugger like this: >> gdb scripts/kconfig/mconf >> run arch/x86_64/Kconfig >> ^^ replace with your actual arch >> >> Provoke the error and get a back-trace with 'bt'. > > Hi Sam, > > I can reproduce this bug on Frugalware Linux. > > Here the bt: > > Program received signal SIGSEGV, Segmentation fault. 
> > 0xb7dc4143 in strlen () from /lib/libc.so.6 > > > (gdb) bt > > > #0 0xb7dc4143 in strlen () from /lib/libc.so.6 > > > #1 0x0804fd60 in str_append (gs=0xbfe4f6e8, s=0x0) at > scripts/kconfig/util.c:87 > > #2 0x0804e0cb in expr_print (e=0x8e22df8, fn=0x804fda0 > , data=0xbfe4f6e8, prevtoken=0) at > scripts/kconfig/expr.c:1037 > #3 0x0804e1e7 in expr_gstr_print (e=0x8e22df8, gs=0xbfe4f6e8) at > scripts/kconfig/expr.c:1099 > > #4 0x0804a07e in get_symbol_str (r=0xbfe4f6e8, sym=0x8b54ee8) at > scripts/kconfig/mconf.c:334 > > #5 0x0804a363 in show_help (menu=0x8b54f88) at scripts/kconfig/mconf.c:738 > > > #6 0x0804acec in conf (menu=0x8b69480) at scripts/kconfig/mconf.c:781 > > > #7 0x0804a971 in conf (menu=0x8063c40) at scripts/kconfig/mconf.c:703 > > > #8 0x0804af8a in main (ac=Cannot access memory at address 0x0 > > > ) at scripts/kconfig/mconf.c:917 > > > Looks somewhat strange -> http://194.231.229.228/menuconfig.png > > PS: Is without the patch you posted , I'll try with in a bit The crash is still there but the (null)'s are all fixed by this patch. Gabriel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CFS: some bad numbers with Java/database threading [FIXED]
* Chuck Ebbert <[EMAIL PROTECTED]> wrote: > On 09/14/2007 11:32 AM, Ingo Molnar wrote: > > * Antoine Martin <[EMAIL PROTECTED]> wrote: > > > have an impact) Keep CONFIG_SCHED_DEBUG=y to be able to twiddle the > sysctl. > >> It looks good now! Updated results here: > >> http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests5-10msYield-noload.png > >> http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests5-10msYield.png > >> Compared with more kernels here - a bit more cluttered: > >> http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests4-10msYield-noload.png > >> > >> Thanks Ingo! > >> Does this mean that I'll have to keep doing: > >> echo 1 > /proc/sys/kernel/sched_yield_bug_workaround > >> Or are you planning on finding a more elegant solution? > > > > just to make sure - can you get it to work fast with the > > -rc6+yield-patch solution too? (i.e. not CFS-devel) We need a (tested) > > solution for 2.6.23 and the CFS-devel patches are not for 2.6.23. I've > > attached below the latest version of the -rc6 yield patch - the switch > > is not dependent on SCHED_DEBUG anymore but always available. > > > > Is this going to be merged? And will you be making the default == 1 or > just leaving it at 0, which forces people who want the older behavior > to modify the default? not at the moment - Antoine suggested that the workload is probably fine and the patch against -rc6 would have no clear effect anyway so we have nothing to merge right now. (Note that there's no "older behavior" possible, unless we want to emulate all of the O(1) scheduler's behavior.) But ... we could still merge something like that patch, but a clearer testcase is needed. The JVM's i have access to work fine. 
Ingo
Re: Wasting our Freedom
On Tue, 2007-09-18 at 11:55 -0700, Can E. Acar wrote: > Theodore Tso wrote: > > On Mon, Sep 17, 2007 at 03:06:37PM -0700, Can E. Acar wrote: > >> The only remaining issue is whether Nick & Jiri have enough > >> original contributions to the code to be added to the Copyright. > >> > >> I believe this needs to be resolved between Reyk and Nick and Jiri. > >> > >> The main reason of Theo's message, linked earlier, was the > >> lack of response on this issue. It seems that the SFLC is > >> dismissing this issue, stalling its resolution by the > >> developers. > > > > OK, so all of this flaming, and digging up of "licenses ripped off", > > and chaff thrown up in the air, and moaning and bewailing about > > "theft", is now down to these two lines regarding Nick and Jiri: > > Yes, quite an improvement, considering how it all started, dont you think? > Pity it took so much pushing and dragging to get people to do the right > thing. > There is just one little step to go. It is can not be that hard, can it? > Apparently. > >> * Copyright (c) 2004-2007 Reyk Floeter <[EMAIL PROTECTED]> > >> * Copyright (c) 2006-2007 Nick Kossifidis <[EMAIL PROTECTED]> > >> * Copyright (c) 2007 Jiri Slaby <[EMAIL PROTECTED]> > >> [snip rest of BSD license] > > > > It's under a BSD license; what material difference does those two > > lines make, for goodness sake? It's under a BSD license, so it's not > > like anything won't be "given back". > > As a programmer, you sure would know what difference any "two lines" > would make on your program. When it comes to law, you seem to lose > that intuition. > > > > Whether or not they have made > > enough for changes is really a question for the lawyers, and may > > differ from one jurisdiction to another > > --- but whether or not they have now, or maybe will not make until later --- > > Well, they can add their names *anywhere* in the whole file, *except* > these two lines.
See, these lines have a whole different meaning > when it comes to laws. When they make sufficient contribution, they > sure can add their names. What is so difficult to understand here? >

So, here is the actual commit of the code in Linville's wireless networking development tree:

http://git.kernel.org/?p=linux/kernel/git/linville/wireless-dev.git;a=commitdiff;h=fb32e1730a91e39adcf06ed5254bfc5a65d17a9b

If I am not mistaken, it was Sunday afternoon, so probably 5/6 or more of this thread consisting of more than 110 messages (according to my inbox) to LKML was after this time. As this already had the BSD license ...

Anyway, as for the changes, I am not going to check the original, but from the first commit up to now is here:

http://git.kernel.org/?p=linux/kernel/git/linville/wireless-dev.git;a=blobdiff;f=drivers/net/wireless/ath5k_hw.c;h=e4cc307e9590a71bcc8542c45dbd2caf3f9e8fe5;hp=f273c42d4004b81597e7cfc5f7eec757a7c52910;hb=everything;hpb=fb32e1730a91e39adcf06ed5254bfc5a65d17a9b

Running a diffstat shows:

 ath5k_hw.c | 344 +
 1 file changed, 165 insertions(+), 179 deletions(-)

But not having the original version, and as the other two lines are already present, I am not going to look closer at the changes.

However, the question I wanted to ask was this: Can all those that still feel that there is a problem, please go and look at the original, compare it to the current, and then determine (i.e., go ask a lawyer or some other appropriate person if need be) if the changes are enough of a contribution *BEFORE* posting again? Pretty please with sugar on top?

Thanks,
M
Re: [patch 4/7] Immediate Values - i386 Optimization
On Tue, Sep 18, 2007 at 03:29:50PM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> >
> >> It's a pity that gas seems to generate plain 0x90 nops rather than
> >> long-nop forms here. I thought it could do that.
> >>
> >
> > .p2align does it
>
> Just .p2align? Not align, balign, org or skip? Seems... strange.

The problem is that you cannot always safely jump into the middle of the longer form nops. So I suppose they didn't risk breakage on older code relying on this.

-Andi
Re: 2.6.23-rc6-mm1
Sam Ravnborg wrote:
> On Tue, Sep 18, 2007 at 03:42:58PM -0400, Miles Lane wrote:
>> On 9/18/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote:
>>> Hi Miles.
>>> On Tue, Sep 18, 2007 at 11:27:23AM -0400, Miles Lane wrote:
>>>> Selecting Help for "Subarchitecture Type" causes "make menuconfig" to
>>>> crash, and the bash display settings have to be reset.
>>> Not reproducible here.
>>> But I noticed that we pass a null pointer to a vsprintf function which
>>> in the cases you pointed out printed a (null) on my system.
>>> Could you please try whether the attached patch fixes your system.
>> Sorry, it still crashes. I am running Ubuntu pre-6.10 (Gutsy -- the
>> development version of the distro). Maybe I should try "make
>> mrproper" first?
>
> make mrproper should not make any difference here.
> I rather think you hit some ncurses bug.
>
> If you could add '-g' to HOSTCFLAGS in top-level Makefile
> and then do:
> rm scripts/kconfig/mconf.o scripts/kconfig/mconf
> make menuconfig
>
> (to build mconf and to check that the error is still reproducible).
> And then run it in a debugger like this:
> gdb scripts/kconfig/mconf
> run arch/x86_64/Kconfig
>          ^^ replace with your actual arch
>
> Provoke the error and get a back-trace with 'bt'.

Hi Sam,

I can reproduce this bug on Frugalware Linux. Here is the bt:

Program received signal SIGSEGV, Segmentation fault.
0xb7dc4143 in strlen () from /lib/libc.so.6
(gdb) bt
#0 0xb7dc4143 in strlen () from /lib/libc.so.6
#1 0x0804fd60 in str_append (gs=0xbfe4f6e8, s=0x0) at scripts/kconfig/util.c:87
#2 0x0804e0cb in expr_print (e=0x8e22df8, fn=0x804fda0, data=0xbfe4f6e8, prevtoken=0) at scripts/kconfig/expr.c:1037
#3 0x0804e1e7 in expr_gstr_print (e=0x8e22df8, gs=0xbfe4f6e8) at scripts/kconfig/expr.c:1099
#4 0x0804a07e in get_symbol_str (r=0xbfe4f6e8, sym=0x8b54ee8) at scripts/kconfig/mconf.c:334
#5 0x0804a363 in show_help (menu=0x8b54f88) at scripts/kconfig/mconf.c:738
#6 0x0804acec in conf (menu=0x8b69480) at scripts/kconfig/mconf.c:781
#7 0x0804a971 in conf (menu=0x8063c40) at scripts/kconfig/mconf.c:703
#8 0x0804af8a in main (ac=Cannot access memory at address 0x0 ) at scripts/kconfig/mconf.c:917

Looks somewhat strange -> http://194.231.229.228/menuconfig.png

PS: This is without the patch you posted; I'll try with it in a bit.

>
> Thanks,
> Sam

Gabriel
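The backtrace above ends in strlen() being reached from str_append() with s=0x0, that is, a NULL string. A minimal sketch of a guard follows. It assumes a simplified growable-string type: struct gstr and str_new() here are hypothetical stand-ins written for illustration, not copies of what is in scripts/kconfig/util.c, and this is not the patch Sam posted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; a stand-in for illustration only. */
struct gstr {
	size_t len;	/* allocated size of s, in bytes */
	char *s;	/* NUL-terminated contents */
};

/* Create an empty growable string with a small initial buffer. */
struct gstr str_new(void)
{
	struct gstr gs;

	gs.len = 64;
	gs.s = malloc(gs.len);
	assert(gs.s != NULL);
	gs.s[0] = '\0';
	return gs;
}

/* Append s to gs; a NULL s is treated as "nothing to append". */
void str_append(struct gstr *gs, const char *s)
{
	size_t need;

	assert(gs != NULL && gs->s != NULL);
	if (s == NULL)		/* the case that faulted in strlen() */
		return;

	need = strlen(gs->s) + strlen(s) + 1;
	if (need > gs->len) {
		char *p = realloc(gs->s, need);

		assert(p != NULL);
		gs->s = p;
		gs->len = need;
	}
	strcat(gs->s, s);
}
```

With such a guard, str_append(&gs, NULL) leaves gs unchanged instead of faulting. Whether expr_print() should be handing over a NULL string in the first place is a separate question that the sketch does not answer.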
Re: [PATCH] revert ath5k ioread32()/iowrite32() usage - use readl()/writel(), we're MMIO-only
Benjamin Herrenschmidt wrote:
> To be more precise, a platform has every right to return some kind of
> "token" from ioport_map/pci_iomap that encodes the type of address, and
> that is -different- from what a normal ioremap does. In which case, you
> will -not- be able to use readb/writeb & co. on such a token.
>
> The fact that current implementations seem to return something for MMIO
> that is equivalent to what ioremap returns is an accident and cannot be
> relied upon.

Fair enough. It's easy enough to change ath5k to use ioremap (or
pci_ioremap).

Jeff
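Benjamin's point about tokens can be modelled in userspace. The sketch below is not kernel code: every fake_* name, the PIO_BASE/PIO_LIMIT window and the simulated port array are assumptions invented for illustration. It shows a cookie that encodes whether the target is port I/O or MMIO, and why only the decoding accessor (the analogue of ioread32) can be used on every cookie, while a plain dereference (the analogue of readl) is only valid for MMIO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: cookies inside this window denote I/O ports.
 * Assumes no real object lives at such a low address. */
#define PIO_BASE	0x10000UL
#define PIO_LIMIT	0x20000UL
#define FAKE_NPORTS	256

uint32_t fake_ports[FAKE_NPORTS];	/* simulated port space */

/* Analogue of ioport_map(): returns a token, not a usable pointer. */
void *fake_ioport_map(unsigned long port)
{
	assert(port < FAKE_NPORTS);
	return (void *)(uintptr_t)(PIO_BASE + port);
}

/* Analogue of ioremap(): here simply the address of the register. */
void *fake_ioremap(uint32_t *reg)
{
	return reg;
}

/* Analogue of ioread32(): decodes the token before touching anything. */
uint32_t fake_ioread32(void *cookie)
{
	uintptr_t v = (uintptr_t)cookie;

	if (v >= PIO_BASE && v < PIO_LIMIT) {
		assert(v - PIO_BASE < FAKE_NPORTS);
		return fake_ports[v - PIO_BASE];	/* port access */
	}
	return *(volatile uint32_t *)cookie;		/* MMIO access */
}

/* Analogue of readl(): assumes MMIO, so it must never see a PIO token. */
uint32_t fake_readl(void *addr)
{
	return *(volatile uint32_t *)addr;
}
```

In this model fake_readl(fake_ioremap(&reg)) works, and so does fake_ioread32() on either kind of cookie, but fake_readl(fake_ioport_map(4)) would dereference a bogus address. That is the mismatch the thread is about: the accessor family has to match the mapping call that produced the cookie.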
Re: [PATCH] Clarify pci_iomap() usage for MMIO-only devices
On Wed, 19 Sep 2007, Benjamin Herrenschmidt wrote:
>
> Also, I've been told that modern x86 chipsets have the ability to remap
> IO space in the CPU physical address space. Is that true ? That would
> allow even x86 to get rid of the condition and just use some magic
> offset at map time.

I've not seen that, but I wouldn't be entirely surprised if IO
virtualization eventually causes something like this to happen:
virtualizing PIO is just damn painful right now, due to the lack of any
way to remap it.

I *think* you may be confusing this with the PCI config cycles, where the
new MMIO configuration was introduced (for similar virtualization
reasons). But it's also possible that this is one of those undocumented
areas and CPUs actually do have some IO remapping facility.

Linus
Re: [Lguest] [PATCH] Introduce "used_vectors" bitmap which can be used to reserve vectors.
On 9/13/07, Rusty Russell <[EMAIL PROTECTED]> wrote:
> Hi Andi and everyone,
>
> Wanted to get your thoughts on this patch. lguest now supports plan9
> guests which use 0x40 for system calls. We want to let the guests use
> that vector if available, but have no way to stop io_apic from
> clobbering it. This does that, and also simplifies the current code a
> little.

I can confirm that this patch supports Plan 9 perfectly as a guest.

thanks

ron
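The idea Rusty describes, a bitmap of vectors that the allocator has to skip, can be sketched in plain C. This is an illustration only, not the patch itself: NR_VECTORS, FIRST_DEVICE_VECTOR, vector_used(), reserve_vector() and alloc_vector() are names and values assumed here, and the real io_apic allocation logic is more involved.

```c
#include <assert.h>
#include <limits.h>

#define NR_VECTORS		256
#define FIRST_DEVICE_VECTOR	0x31	/* assumed start of allocatable range */
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS		((NR_VECTORS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per interrupt vector; a set bit means "do not hand out". */
unsigned long used_vectors[BITMAP_WORDS];

/* Return 1 if vector v is already reserved or allocated, else 0. */
int vector_used(unsigned int v)
{
	assert(v < NR_VECTORS);
	return (int)((used_vectors[v / BITS_PER_WORD] >> (v % BITS_PER_WORD)) & 1UL);
}

/* Mark vector v as taken, e.g. 0x40 for a Plan 9 guest's system calls. */
void reserve_vector(unsigned int v)
{
	assert(v < NR_VECTORS);
	used_vectors[v / BITS_PER_WORD] |= 1UL << (v % BITS_PER_WORD);
}

/* Hand out the lowest free device vector, or -1 if none is left. */
int alloc_vector(void)
{
	unsigned int v;

	for (v = FIRST_DEVICE_VECTOR; v < NR_VECTORS; v++) {
		if (!vector_used(v)) {
			reserve_vector(v);
			return (int)v;
		}
	}
	return -1;
}
```

If reserve_vector(0x40) is called before any device vectors are assigned, alloc_vector() never returns 0x40, so the guest's system-call vector is left alone.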
Re: [patch 4/7] Immediate Values - i386 Optimization
Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>>
>>> It's a pity that gas seems to generate plain 0x90 nops rather than
>>> long-nop forms here. I thought it could do that.
>>>
>> .p2align does it
>
> Just .p2align? Not align, balign, org or skip? Seems... strange.
>

Probably it works for .align and .balign too, but not .org.

-hpa
Re: [patch 4/7] Immediate Values - i386 Optimization
Andi Kleen wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>
>> It's a pity that gas seems to generate plain 0x90 nops rather than
>> long-nop forms here. I thought it could do that.
>>
>
> .p2align does it

Just .p2align? Not align, balign, org or skip? Seems... strange.

J