Re: Two identical entries for "rtc" in /proc/devices

2007-09-18 Thread David Brownell
On Saturday 15 September 2007, Andrew Morton wrote:
> On Sat, 15 Sep 2007 11:50:21 -0700 David Brownell <[EMAIL PROTECTED]> wrote:
> 
> > > On Thu, 06 Sep 2007 18:23:22 -0400 Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> > >
> > > > # ls -li
> > > > total 0
> > > > 4026532007 -r--r--r-- 1 root root 0 Sep  6 18:18 nvram
> > > > 4026532067 -r--r--r-- 1 root root 0 Sep  6 18:18 rtc
> > > > 4026532067 -r--r--r-- 1 root root 0 Sep  6 18:18 rtc
> > > > 4026532056 -rw-r--r-- 1 root root 0 Sep  6 18:18 snd-page-alloc
> > >
> > > ...
> > 
> > Seems pretty clear that this must be procfs itself...
> > when a filesystem sees a name in a directory, it should
> > refuse to make another file with the same name.  And it
> > should *never* reuse inode numbers...
>
> ...
>
> procfs can reject the attempt to create the file, but the bottom line
> is that two different callsites are trying to create the same file.  One
> of those callsites needs fixing?

Both of those call sites have code to handle procfs rejecting
the file creation; nothing to fix.  And anyway, there's no way
this is a *caller* bug!

The missing step seems to be that proc_register() doesn't bother
to check whether there's already an entry for that file.  Which
is what the appended *UNTESTED* patch does (it compiles though).

- Dave

--- g26.orig/fs/proc/generic.c  2007-09-18 22:08:44.0 -0700
+++ g26/fs/proc/generic.c   2007-09-18 22:14:07.0 -0700
@@ -521,10 +521,11 @@ static const struct inode_operations pro
 	.setattr	= proc_notify_change,
 };
 
-static int proc_register(struct proc_dir_entry * dir, struct proc_dir_entry * dp)
+static int proc_register(struct proc_dir_entry *dir, struct proc_dir_entry *dp)
 {
 	unsigned int i;
-	
+	struct proc_dir_entry *de;
+
 	i = get_inode_number();
 	if (i == 0)
 		return -EAGAIN;
@@ -547,6 +548,16 @@ static int proc_register(struct proc_dir
 	}
 
 	spin_lock(&proc_subdir_lock);
+
+	for (de = dir->subdir; de ; de = de->next) {
+		if (de->namelen != dp->namelen)
+			continue;
+		if (!memcmp(de->name, dp->name, de->namelen)) {
+			spin_unlock(&proc_subdir_lock);
+			return -EEXIST;
+		}
+	}
+
 	dp->next = dir->subdir;
 	dp->parent = dir;
 	dir->subdir = dp;
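The duplicate-name check that the patch adds to proc_register() can be sketched in userspace as below. This is a simplified model, not procfs code: struct entry and register_entry() are hypothetical stand-ins for struct proc_dir_entry and proc_register(), and locking is left out (the patch holds proc_subdir_lock around the scan). As in the patch, the name lengths are compared first so memcmp() only runs on same-length names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a procfs directory entry. */
struct entry {
	const char *name;
	size_t namelen;
	struct entry *next;   /* next sibling in the parent's list */
	struct entry *subdir; /* first child, if this is a directory */
};

/*
 * Link @dp at the head of @dir's child list unless a sibling with the
 * same name already exists.  Returns 0 on success, -EEXIST on a duplicate.
 * Locking is omitted in this sketch.
 */
static int register_entry(struct entry *dir, struct entry *dp)
{
	struct entry *de;

	assert(dir != NULL && dp != NULL);
	for (de = dir->subdir; de; de = de->next) {
		/* Cheap length check first, then compare the bytes. */
		if (de->namelen == dp->namelen &&
		    memcmp(de->name, dp->name, dp->namelen) == 0)
			return -EEXIST;
	}
	dp->next = dir->subdir;
	dir->subdir = dp;
	return 0;
}
```

With this check, registering a second "rtc" entry in the same directory fails with -EEXIST instead of producing two identical directory entries.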

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cafe_ccic: default to allocating DMA buffers at probe time

2007-09-18 Thread Andres Salomon
By default, we allocate DMA buffers when actually reading from the video
capture device.  On a system with 128MB or 256MB of ram, it's very easy
for that memory to quickly become fragmented.  We've had users report
having 30+MB of memory free, but the cafe_ccic driver is still unable to
allocate DMA buffers.

Our workaround has been to make use of the 'alloc_bufs_at_load' parameter
to allocate DMA buffers during device probing.  This patch makes DMA
buffer allocation happen during device probe by default, and changes
the parameter to 'alloc_bufs_at_read'.  The camera hardware is there;
if the cafe_ccic driver is enabled/loaded, it should do its best to ensure
that the camera is actually usable.  Delaying DMA buffer allocation
saves an insignificant amount of memory and makes the driver much
less useful.
---

 drivers/media/video/cafe_ccic.c |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/media/video/cafe_ccic.c b/drivers/media/video/cafe_ccic.c
index ef53618..3588a59 100644
--- a/drivers/media/video/cafe_ccic.c
+++ b/drivers/media/video/cafe_ccic.c
@@ -63,13 +63,13 @@ MODULE_SUPPORTED_DEVICE("Video");
  */
 
 #define MAX_DMA_BUFS 3
-static int alloc_bufs_at_load = 0;
-module_param(alloc_bufs_at_load, bool, 0444);
-MODULE_PARM_DESC(alloc_bufs_at_load,
-   "Non-zero value causes DMA buffers to be allocated at module "
-   "load time.  This increases the chances of successfully getting "
-   "those buffers, but at the cost of nailing down the memory from "
-   "the outset.");
+static int alloc_bufs_at_read = 0;
+module_param(alloc_bufs_at_read, bool, 0444);
+MODULE_PARM_DESC(alloc_bufs_at_read,
+   "Non-zero value causes DMA buffers to be allocated when the "
+   "video capture device is read, rather than at module load "
+   "time.  This saves memory, but decreases the chances of "
+   "successfully getting those buffers.");
 
 static int n_dma_bufs = 3;
 module_param(n_dma_bufs, uint, 0644);
@@ -1503,7 +1503,7 @@ static int cafe_v4l_release(struct inode *inode, struct file *filp)
}
if (cam->users == 0) {
cafe_ctlr_power_down(cam);
-   if (! alloc_bufs_at_load)
+   if (alloc_bufs_at_read)
cafe_free_dma_bufs(cam);
}
mutex_unlock(&cam->s_mutex);
@@ -2162,7 +2162,7 @@ static int cafe_pci_probe(struct pci_dev *pdev,
/*
 * If so requested, try to get our DMA buffers now.
 */
-   if (alloc_bufs_at_load) {
+   if (!alloc_bufs_at_read) {
if (cafe_alloc_dma_bufs(cam, 1))
cam_warn(cam, "Unable to alloc DMA buffers at load"
" will try again later.");
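The inverted parameter changes when the DMA buffers are allocated and freed. A hypothetical, hardware-free model of that lifetime is sketched below (struct cam_model and the model_* functions are illustrative only, not driver code). It assumes, as the commit message implies, that the read path allocates buffers on demand when none are present. With alloc_bufs_at_read unset, buffers are taken at probe and kept across the last release; with it set, they are taken on first read and dropped when the last user closes the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the DMA-buffer lifetime; not cafe_ccic code. */
struct cam_model {
	bool alloc_bufs_at_read; /* mirrors the new module parameter */
	bool have_dma_bufs;      /* whether the driver currently holds buffers */
	int users;               /* open file handles */
};

/* Probe: allocate up front unless lazy allocation was requested. */
static void model_probe(struct cam_model *cam)
{
	if (!cam->alloc_bufs_at_read)
		cam->have_dma_bufs = true;
}

static void model_open(struct cam_model *cam)
{
	cam->users++;
}

/* Read: allocate on demand if no buffers exist yet (assumed behavior). */
static void model_read(struct cam_model *cam)
{
	assert(cam->users > 0);
	if (!cam->have_dma_bufs)
		cam->have_dma_bufs = true;
}

/* Release: free on last close only in allocate-at-read mode. */
static void model_release(struct cam_model *cam)
{
	assert(cam->users > 0);
	if (--cam->users == 0 && cam->alloc_bufs_at_read)
		cam->have_dma_bufs = false;
}
```

The model ignores allocation failure; in the patch, a failed probe-time allocation only logs a warning, leaving the read path to try again later.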


[IA64] Kexec: Remove vector from ia64_machine_kexec()

2007-09-18 Thread Simon Horman
The use of vector in ia64_machine_kexec() seems spurious,
and removing it simplifies the code slightly.

As suggested by Alex Williamson <[EMAIL PROTECTED]>

Cc: Alex Williamson <[EMAIL PROTECTED]>
Signed-off-by: Simon Horman <[EMAIL PROTECTED]>

Index: linux-2.6/arch/ia64/kernel/machine_kexec.c
===
--- linux-2.6.orig/arch/ia64/kernel/machine_kexec.c 2007-09-19 13:43:42.0 +0900
+++ linux-2.6/arch/ia64/kernel/machine_kexec.c  2007-09-19 13:44:11.0 +0900
@@ -79,7 +79,6 @@ static void ia64_machine_kexec(struct un
relocate_new_kernel_t rnk;
void *pal_addr = efi_get_pal_addr();
unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
-   unsigned long vector;
int ii;
 
BUG_ON(!image);
@@ -107,11 +106,8 @@ static void ia64_machine_kexec(struct un
/* unmask TPR and clear any pending interrupts */
ia64_setreg(_IA64_REG_CR_TPR, 0);
ia64_srlz_d();
-   vector = ia64_get_ivr();
-   while (vector != IA64_SPURIOUS_INT_VECTOR) {
+   while (ia64_get_ivr() != IA64_SPURIOUS_INT_VECTOR)
ia64_eoi();
-   vector = ia64_get_ivr();
-   }
platform_kernel_launch_event();
rnk = (relocate_new_kernel_t)&code_addr;
(*rnk)(image->head, image->start, ia64_boot_param,
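The simplification preserves behavior because both loops issue the same sequence of calls: read the IVR, EOI, read the IVR, and so on, ending with the read that returns the spurious vector. Since reading the IVR acknowledges an interrupt on real hardware, keeping that call order identical is what matters. A userspace sketch with a mocked interrupt controller (mock_get_ivr(), mock_eoi() and SPURIOUS_VECTOR are hypothetical stand-ins for ia64_get_ivr(), ia64_eoi() and IA64_SPURIOUS_INT_VECTOR) compares the old and new forms:

```c
#include <assert.h>

/* Hypothetical stand-in for IA64_SPURIOUS_INT_VECTOR. */
#define SPURIOUS_VECTOR 15

/* Mocked interrupt controller state: a stack of pending vectors. */
static int pending[8];
static int npending;
static int ivr_reads;
static int eoi_count;

/* Mock of ia64_get_ivr(): highest pending vector, or spurious if none. */
static int mock_get_ivr(void)
{
	ivr_reads++;
	return npending > 0 ? pending[npending - 1] : SPURIOUS_VECTOR;
}

/* Mock of ia64_eoi(): retire the vector most recently returned. */
static void mock_eoi(void)
{
	assert(npending > 0);
	npending--;
	eoi_count++;
}

/* Old form: keep the vector in a temporary. */
static void drain_old(void)
{
	int vector = mock_get_ivr();

	while (vector != SPURIOUS_VECTOR) {
		mock_eoi();
		vector = mock_get_ivr();
	}
}

/* New form from the patch: test the IVR read directly. */
static void drain_new(void)
{
	while (mock_get_ivr() != SPURIOUS_VECTOR)
		mock_eoi();
}
```

With two vectors pending, both forms perform three IVR reads and two EOIs.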


2.6.23-rc6-mm1 and acpi

2007-09-18 Thread Michael Gerdau
Hi,

while trying to compile 2.6.23-rc6-mm1 I came across the following
build error:

[EMAIL PROTECTED]:/usr/src/linux-2.6.23-rc6-mm1> make modules
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
:1389:2: warning: #warning syscall revokeat not implemented
:1393:2: warning: #warning syscall frevoke not implemented
  CC [M]  drivers/acpi/sbs.o
drivers/acpi/sbs.c: In function ‘acpi_battery_alarm_show’:
drivers/acpi/sbs.c:457: error: implicit declaration of function ‘acpi_battery_get_alarm’
drivers/acpi/sbs.c: In function ‘acpi_battery_alarm_store’:
drivers/acpi/sbs.c:472: error: implicit declaration of function ‘acpi_battery_set_alarm’
drivers/acpi/sbs.c: In function ‘acpi_battery_add’:
drivers/acpi/sbs.c:829: warning: ignoring return value of ‘device_create_file’, declared with attribute warn_unused_result
make[2]: *** [drivers/acpi/sbs.o] Error 1
make[1]: *** [drivers/acpi] Error 2
make: *** [drivers] Error 2

Not sure who to CC, which is why I'm sending it to the list only.

Best,
Michael
-- 
 Vote against SPAM - see http://www.politik-digital.de/spam/
 Michael Gerdau   email: [EMAIL PROTECTED]
 GPG-keys available on request or at public keyserver




Re: [PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-18 Thread Huang, Ying
On Tue, 2007-09-18 at 22:30 -0700, H. Peter Anvin wrote:
> Huang, Ying wrote:
> > Known issues:
> > 
> > - The hd0_info and hd1_info are deleted from the zero page. Is
> >   additional work needed for this, or is it unnecessary (because no
> >   new fields will be added to the zero page)?
> > 
> 
> For backwards compatibility, they should be marked as there for the
> short-medium term so we don't reuse them for whatever reason.

OK, I will add them back.

Best Regards,
Huang Ying


Re: [PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-18 Thread H. Peter Anvin
Huang, Ying wrote:
> Known issues:
> 
> - The hd0_info and hd1_info are deleted from the zero page. Is
>   additional work needed for this, or is it unnecessary (because no
>   new fields will be added to the zero page)?
> 

For backwards compatibility, they should be marked as there for the
short-medium term so we don't reuse them for whatever reason.
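Keeping obsolete fields "marked as there" amounts to leaving their bytes in the structure definition under a name that says they must not be reused, so no new field can accidentally land on the same offset. A sketch with a hypothetical struct follows; only the names hd0_info and hd1_info come from the thread, while the 0x080/0x090 offsets, the placeholder fields, and the compile-time checks are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical fixed-layout structure shared with a boot loader.  The
 * obsolete fields stay in place so later fields keep their offsets.
 */
struct zero_page_sketch {
	uint8_t  earlier_fields[0x80]; /* 0x000: placeholder for fields before */
	uint8_t  hd0_info[16];         /* 0x080: obsolete, do not reuse */
	uint8_t  hd1_info[16];         /* 0x090: obsolete, do not reuse */
	uint32_t next_field;           /* 0x0a0: hypothetical later field */
};

/* Fail the build if anyone moves or drops the reserved bytes. */
static_assert(offsetof(struct zero_page_sketch, hd0_info) == 0x080,
	      "hd0_info moved");
static_assert(offsetof(struct zero_page_sketch, hd1_info) == 0x090,
	      "hd1_info moved");
static_assert(offsetof(struct zero_page_sketch, next_field) == 0x0a0,
	      "layout changed");
```

The static_assert() checks turn an accidental reuse or removal of the reserved bytes into a build failure rather than a silent boot-protocol break.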

-hpa


Re: NFS4 authentification / fsuid

2007-09-18 Thread Kyle Moffett

On Sep 18, 2007, at 19:44:59, Satyam Sharma wrote:

On Thu, 6 Sep 2007, Kyle Moffett wrote:

On Sep 06, 2007, at 19:35:14, Trond Myklebust wrote:

On Thu, 2007-09-06 at 19:30 -0400, Kyle Moffett wrote:

On Sep 06, 2007, at 11:06:16, J. Bruce Fields wrote:
The question of how to protect against someone with *physical*  
access certainly is more difficult, but surely that's a  
separate problem.


Actually, that's a fairly simple problem (barring disassembling
the system and attaching a hardware debugger).  You encrypt the
root filesystem and require a password to boot (see: LUKS).
Debian has built-in support for installing onto fs-on-LVM-on-crypt-on-RAID,
and it works quite well on all the laptops I use regularly.  It's not
even much of a speed penalty; given the overhead of hitting a 5400RPM
laptop drive, you can chew thousands of CPU cycles without anybody
noticing (much).  Then all you have to do is burn a copy of your /boot
with the bootloader onto some read-only media (like a finalized
CDROM/DVDROM) and you're set to go.


Disconnect battery, and watch boot password go 'poof!'.


Umm, I did say "encrypt the root filesystem", didn't I?  Booting  
my laptops


The whole *point* here is to secure against physical access -- then  
how can you assume "barring disassembling the system"? If you're  
not considering attacks such as those, then how _are_ you solving  
the physical access problem in the first place? :-)


Security is about fractional reduction of risk, and anybody who tells  
you otherwise is either ignorant or lying through their teeth.  There  
are *multiple* aspects of "physical access"; one of those is access
while the box is off, with no data resident in volatile memory, which
is the easy case.  There you can simply encrypt the
non-volatile storage.  If the system is *on* and has unencrypted data in
memory (such as suspend-to-RAM for example) then you *HAVE* to ensure  
that it can't be easily disassembled and a hardware debugger  
attached; there is no way around that very fundamental limitation.


Basically if the key is resident and unencrypted as is necessary to  
*USE* the system, then no amount of hardware is going to *prevent* a  
dedicated attacker from getting at it unless you make it so  
unportable that you don't have to worry about somebody carrying it  
off in the first place.  Typical mechanisms to increase the time and  
effort needed to break into a device include wiring the entire enclosure
with extremely thin filament wires, detecting any variation in a small
current flowing through them, and automatically wiping the system when
one occurs.



this way follows this procedure:
 1) Enter BIOS boot menu
 2) Insert /boot CDROM
 3) Select the "CDROM" entry
 4) Wait for kernel to start and run through initramfs
 5) Type password into the initramfs prompt so that it can DECRYPT
    THE ROOT FILESYSTEM
 6) Continue to boot the system.

Under this setup, tinkering with my BIOS does virtually nothing;  
the only avenues of attack are strictly of the "Install a hardware  
keylogger" variety.


Doesn't flashing/replacing your BIOS firmware/chip count as  
tinkering?  Then I don't really need a "hardware keylogger", do I ...


Ok, so you are saying your plan of attack on this system would be:
  1)  Steal the laptop such that I don't notice it has been stolen
  2)  Open it up
  3)  Replace the very-vendor-specific BIOS chip with a reflashed
      one with sufficient storage to do all the things the old BIOS
      could *AND* have enough storage for an entire replacement kernel
      binary with a built-in keylogger, as well as some storage for
      the logged password
  4)  Return the laptop, again such that I don't notice it has been
      missing
  5)  Wait for me to boot and type my password
  6)  Somehow recover the laptop *yet* *again* to get the password
      back off of it and decrypt the disk


Yes it "can be done", but so can dumping the firmware for an iPod out  
through the built-in piezo clicker[1].  USE SOME COMMON SENSE HERE  
PEOPLE!!!  The only "unbreakable" computer is one always disconnected  
and off under armed guard in a bank vault, and even then it's only as
secure as the bank in which it is stored (and banks do get broken into
on occasion).


I am assuming that if the laptop has sufficiently important data on  
it to warrant the above steps then I am also clueful enough to:

  (A)  Not carry the laptop through unsecured areas,
  (B)  Keep a close enough eye on it to be aware that it's gone by
       the time they get to step 2, OR
  (C)  Pay somebody to build me a better physical chassis for my laptop

We are talking about *STANDARD* laptop systems with reasonably alert  
users.  If the user doesn't know how to properly protect the stuff on  
the laptop then they probably don't know how to properly protect the  
other copy in their heads, either.  Besides, if some government
wanted the data on your laptop that badly they'd just pick you up in
the 

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread David Chinner
On Tue, Sep 18, 2007 at 06:06:52PM -0700, Linus Torvalds wrote:
> >  especially as the Linux
> > kernel limitations in this area are well known.  There's no "16K mess"
> > that SGI is trying to clean up here (and SGI have offered both IA64 and
> > x86_64 systems for some time now, so not sure how you came up with that
> > whacko theory).
> 
> Well, if that is the case, then I vote that we drop the whole patch-series 
> entirely. It clearly has no reason for existing at all.
> 
> There is *no* valid reason for 16kB blocksizes unless you have legacy 
> issues.

Ok, let's step back for a moment and look at a basic, fundamental
constraint of disks - seek capacity. A decade ago, a terabyte of
filesystem had 30 disks behind it - a seek capacity of about
6000 seeks/s. Nowadays, that's a single disk with a seek
capacity of about 200/s. We're going *rapidly* backwards in
terms of seek capacity per terabyte of storage.

Now fill that terabyte of storage and index it in the most efficient
way - let's say btrees are used because lots of filesystems use
them. The depth of the tree is then roughly O(log n / log m), where m
is the fanout of each btree node, which grows with the btree block
size.  Effectively, btree depth = seek count on lookup of any object.
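To make the depth argument concrete: if each node holds up to m entries, reaching any of n keys takes about log_m(n) node reads, so growing the block (and hence m) removes whole levels. A small helper to compute this, assuming a hypothetical fixed index entry size so that fanout = block size / entry size:

```c
#include <assert.h>

/*
 * Smallest depth d such that fanout^d >= nobjs, i.e. the number of node
 * reads (roughly, seeks) needed to reach any one of nobjs keys.  Inputs
 * are assumed small enough that capacity does not overflow.
 */
static unsigned int btree_depth(unsigned long long nobjs,
				unsigned long long fanout)
{
	unsigned long long capacity = 1;
	unsigned int depth = 0;

	assert(fanout >= 2); /* a fanout of 1 would never reach nobjs */
	while (capacity < nobjs) {
		capacity *= fanout;
		depth++;
	}
	return depth;
}

/* Hypothetical fanout model: block size divided by a fixed entry size. */
static unsigned long long fanout_for(unsigned long long block_size,
				     unsigned long long entry_size)
{
	assert(entry_size > 0 && block_size >= entry_size);
	return block_size / entry_size;
}
```

For example, with 2^28 keys (roughly a terabyte of 4 KiB objects) and hypothetical 16-byte entries, a 4 KiB block gives fanout 256 and depth 4, while 16 KiB and 64 KiB blocks give fanouts of 1024 and 4096 and depth 3, i.e. one seek fewer per lookup.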

When the filesystem had a capacity of 6,000 seeks/s, we didn't
really care if the indexes used 4k blocks or not - the storage
subsystem had an excess of seek capacity to deal with
less-than-optimal indexing. Now we have over an order of magnitude
less seeks to expend in index operations *for the same amount of
data* so we are really starting to care about minimising the
number of seeks in our indexing mechanisms and allocations.

We can play tricks in index compaction to reduce the number of
interior nodes of the tree (like hashed indexing in XFS directories
and ext3 htree directories) but that still only gets us so far in
reducing seeks and doesn't help at all for tree traversals. That
leaves us with the btree block size as the only factor we can further
vary to reduce the depth of the tree, i.e. "m".

So we want to increase the filesystem block size to improve the
efficiency of our indexing. That improvement in efficiency
translates directly into better performance on seek constrained
storage subsystems.

The problem is this: to alter the fundamental block size of the
filesystem we also need to alter the data block size and that is
exactly the piece that Linux does not support right now.  So while
we have the capability to use large block sizes in certain
filesystems, we can't use that capability until the data path
supports it.

To summarise, large block size support in the filesystem is not
about "legacy" issues. It's about trying to cope with the rapid
expansion of storage capabilities of modern hardware where we have
to index much, much more data with a corresponding decrease in
the seek capability of the hardware.

> So get your stories straight, people.

Ok, so let's set the record straight. There were 3 justifications
for using *large pages* to *support* large filesystem block sizes,
i.e. for the variable order page cache with large pages:

1. little code change needed in the filesystems
-> still true

2. Increased I/O sizes on 4k page machines (the "SCSI
   controller problem")
-> redundant thanks to Jens Axboe's quick work

3. avoiding the need for vmap() as it has great
   overhead and does not scale
-> Nick is starting to work on that and has
   already had good results.

Everyone seems to be focussing on #2 as the entire justification for
large block sizes in filesystems and that this is an "SGI" problem.
Nothing could be further from the truth - the truth is that large
pages solved multiple problems in one go. We now have a different,
better solution for #2, so please, please stop using that as some
justification for claiming filesystems don't need large block sizes.

However, all this doesn't change the fact that we have a major storage
scalability crunch coming in the next few years. Disk capacity is
likely to continue to double every 12 months for the next 3 or 4
years. Large block size support is only one mechanism we need to
help cope with this trend.

The variable order page cache with large pages was a means to an end
- it's not the only solution to this problem and I'm extremely happy
to see that there is progress on multiple fronts.  That's the
strength of the Linux community showing through.  In the end, I
really don't care how we end up supporting large filesystem block
sizes in the page cache - all I care about is that we end up
supporting it as efficiently and generically as we possibly can.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

[PATCH -mm] sound/hda: fix help text

2007-09-18 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix hda help text typo.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 sound/pci/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-rc6-mm1.orig/sound/pci/Kconfig
+++ linux-2.6.23-rc6-mm1/sound/pci/Kconfig
@@ -506,7 +506,7 @@ config SND_HDA_HWDEP
select SND_HWDEP
help
  Say Y here to build a hwdep interface for HD-audio driver.
- This interface can be used for out-of-bound communication
+ This interface can be used for out-of-band communication
  with codecs for debugging purposes.
 
 config SND_HDA_CODEC_REALTEK


[PATCH -mm] kgdb: fix help text

2007-09-18 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix kgdb help text typos, grammar, config symbol names, and indentation.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 lib/Kconfig.kgdb |   42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

--- linux-2.6.23-rc6-mm1.orig/lib/Kconfig.kgdb
+++ linux-2.6.23-rc6-mm1/lib/Kconfig.kgdb
@@ -27,15 +27,15 @@ config KGDB_ARCH_HAS_SHADOW_INFO
 config KGDB_CONSOLE
bool "KGDB: Console messages through gdb"
depends on KGDB
- help
-   If you say Y here, console messages will appear through gdb.
-   Other consoles such as tty or ttyS will continue to work as usual.
-   Note, that if you use this in conjunction with KGDB_ETH, if the
-   ethernet driver runs into an error condition during use with KGDB
-   it is possible to hit an infinite recusrion, causing the kernel
-   to crash, and typically reboot.  For this reason, it is preferable
-   to use NETCONSOLE in conjunction with KGDB_ETH instead of
-   KGDB_CONSOLE.
+   help
+ If you say Y here, console messages will appear through gdb.
+ Other consoles such as tty or ttyS will continue to work as usual.
+ Note that if you use this in conjunction with KGDBOE, if the
+ ethernet driver runs into an error condition during use with KGDB,
+ it is possible to hit an infinite recursion, causing the kernel
+ to crash, and typically reboot.  For this reason, it is preferable
+ to use NETCONSOLE in conjunction with KGDBOE instead of
+ KGDB_CONSOLE.
 
 choice
prompt "Method for KGDB communication"
@@ -106,7 +106,7 @@ config KGDB_TXX9
bool "KGDB: On TX49xx serial port"
depends on MIPS && CPU_TX49XX
help
- Uses TX49xx serial port to communicate with the host KGDB.
+ Uses TX49xx serial port to communicate with the host GDB.
 
 config KGDB_SH_SCI
bool "KGDB: On SH SCI(F) serial port"
@@ -251,20 +251,18 @@ config KGDB_8250_CONF_STRING
depends on KGDB_8250_NOMODULE && !KGDB_SIMPLE_SERIAL
default "io,2f8,115200,3" if X86
help
- The format of this string should be ,,,.  For example, to use the
- serial port on an i386 box located at 0x2f8 and 115200 baud
- on IRQ 3 at use:
- io,2f8,115200,3
+ The format of this string should be ,
+ ,,.  For example, on an i386 box,
+ to use the serial port located at 0x2f8, IRQ 3, at 115200 baud
+ use:  io,2f8,115200,3
 
 config KGDB_ATTACH_WAIT
bool "KGDB: Wait for debugger to attach on an unknown exception"
default y if KGDB_8250_NOMODULE
default n if !KGDB_8250_NOMODULE
- help
-   If a panic occurs, or any kind of exception the kgdb will
-   stop and wait for a debugger to attach.  This sets the
-   default behavior for waiting for the debugger to attach.  This
-   value can also be changed at runtime through
-   /sys/module/kgdb/paramaters/attachwait
-
+   help
+ If a panic occurs, or any kind of exception, the kgdb will
+ stop and wait for a debugger to attach.  This sets the
+ default behavior for waiting for the debugger to attach.  This
+ value can also be changed at runtime through
+ /sys/module/kgdb/parameters/attachwait
-


[PATCH -mm] watchdog: fix help text

2007-09-18 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix typos in uniform watchdog driver help text.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 drivers/watchdog/core/Kconfig |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- linux-2.6.23-rc6-mm1.orig/drivers/watchdog/core/Kconfig
+++ linux-2.6.23-rc6-mm1/drivers/watchdog/core/Kconfig
@@ -9,12 +9,12 @@ config WATCHDOG_CORE
depends on EXPERIMENTAL
default m
---help---
- Say Y here is you want to use the new uniform watchdog device
+ Say Y here if you want to use the new uniform watchdog device
  driver. This driver provides a framework for all watchdog
  device drivers and gives them the /dev/watchdog interface (and
- later also the sysfs interface)
+ later also the sysfs interface).
 
- At this moment only the iTCO_wdt driver uses this new frame-work.
+ At this moment only the iTCO_wdt driver uses this new framework.
 
  To compile this driver as a module, choose M here: the module will
  be called watchdog_core.


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman

On 09/19/2007 06:33 AM, Linus Torvalds wrote:


On Wed, 19 Sep 2007, Rene Herman wrote:



I do feel larger blocksizes continue to make sense in general though. Packet
writing on CD/DVD is a problem already today since the hardware needs 32K or
64K blocks and I'd expect to see more of these and similar situations when
flash gets (even) more popular which it sort of inevitably is going to be.


.. that's what scatter-gather exists for.

What's so hard with just realizing that physical memory isn't contiguous?

It's why we have MMU's. It's why we have scatter-gather. 
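Linus's point is that a device wanting, say, a 64K unit of I/O does not need 64K of physically contiguous memory; the data can be described as a list of physically contiguous runs. A minimal sketch of building such a list from individual 4 KiB pages and merging neighbors that happen to be adjacent (struct sg_seg, build_sg() and PAGE_SZ are hypothetical; this is not the kernel's struct scatterlist API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* hypothetical page size for the sketch */

/* One scatter-gather segment: a physically contiguous run. */
struct sg_seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Describe one logical block made of npages pages (physical addresses in
 * pages[]) as a list of segments, merging physically adjacent pages.
 * segs must have room for npages entries; returns the segment count.
 */
static size_t build_sg(const uint64_t *pages, size_t npages,
		       struct sg_seg *segs)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++) {
		assert(pages[i] % PAGE_SZ == 0);
		if (n > 0 && segs[n - 1].addr + segs[n - 1].len == pages[i]) {
			segs[n - 1].len += PAGE_SZ; /* extend the current run */
		} else {
			segs[n].addr = pages[i];    /* start a new run */
			segs[n].len = PAGE_SZ;
			n++;
		}
	}
	return n;
}
```

A device that accepts such a list can consume one large logical block even when the underlying pages are scattered.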


So if I understood that right, you'd suggest dealing with devices that
have larger physical block sizes at some level above the current block layer.


Not familiar enough with either block or fs to be able to argue that 
effectively...


Rene.



[Patch 2/2] Relay reset consumed

2007-09-18 Thread David J. Wilder
This patch allows relay channels to be reset, i.e. unconsumed.
This provides a 'rewind' function for flight-recorder tracing.

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/filesystems/relay.txt |   11 ++
 include/linux/relay.h   |1 +
 kernel/relay.c  |   58 
---
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
index 18d23f9..d31113a 100644
--- a/Documentation/filesystems/relay.txt
+++ b/Documentation/filesystems/relay.txt
@@ -161,6 +161,7 @@ TBD(curr. line MT:/API/)
 relay_close(chan)
 relay_flush(chan)
 relay_reset(chan)
+relay_reset_consumed(chan)
 
   channel management typically called on instigation of userspace:
 
@@ -452,6 +453,16 @@ state without reallocating channel buffer memory or destroying
 existing mappings.  It should however only be called when it's safe to
 do so, i.e. when the channel isn't currently being written to.
 
+The read(2) implementation always 'consumes' the bytes read,
+i.e. those bytes won't be available again to subsequent reads.
+Certain applications may nonetheless wish to allow the 'consumed' data
+to be re-read; relay_reset_consumed() is provided for that purpose -
+it resets the internal consumed counters for all buffers in the
+channel.  For example, if a first set of reads 'drains' the channel,
+and then relay_reset_consumed() is called, a second set of reads will
+get the exact same data (assuming no new data was written between the
+first set of reads and the second).
+
 Finally, there are a couple of utility callbacks that can be used for
 different purposes.  buf_mapped() is called whenever a channel buffer
 is mmapped from user space and buf_unmapped() is called when it's
diff --git a/include/linux/relay.h b/include/linux/relay.h
index 6cd8c44..aca45fa 100644
--- a/include/linux/relay.h
+++ b/include/linux/relay.h
@@ -175,6 +175,7 @@ extern void relay_subbufs_consumed(struct rchan *chan,
   unsigned int cpu,
   size_t consumed);
 extern void relay_reset(struct rchan *chan);
+extern void relay_reset_consumed(struct rchan *chan);
 extern int relay_buf_full(struct rchan_buf *buf);
 
 extern size_t relay_switch_subbuf(struct rchan_buf *buf,
diff --git a/kernel/relay.c b/kernel/relay.c
index 61134eb..6b55eaa 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -383,6 +383,57 @@ void relay_reset(struct rchan *chan)
 }
 EXPORT_SYMBOL_GPL(relay_reset);
 
+/**
+ * __relay_reset_consumed - reset a channel buffer's consumed count
+ * @buf: the channel buffer
+ *
+ * See relay_reset_consumed for description of effect.
+ */
+static inline void __relay_reset_consumed(struct rchan_buf *buf)
+{
+   size_t n_subbufs = buf->chan->n_subbufs;
+   size_t produced = buf->subbufs_produced;
+   size_t consumed = buf->subbufs_consumed;
+
+   if (produced < n_subbufs)
+   buf->subbufs_consumed = 0;
+   else {
+   consumed = produced - n_subbufs;
+   if (buf->offset)
+   consumed++;
+   buf->subbufs_consumed = consumed;
+   }
+   buf->bytes_consumed = 0;
+}
+
+/**
+ * relay_reset_consumed - reset the channel's consumed counts
+ * @chan: the channel
+ *
+ * This has the effect of making all data previously read (and
+ * not overwritten by subsequent writes) from a channel available
+ * for reading again.
+ *
+ * NOTE: Care should be taken that the channel isn't actually
+ * being used by anything when this call is made.
+ */
+void relay_reset_consumed(struct rchan *chan)
+{
+   unsigned int i;
+   struct rchan_buf *prev = NULL;
+
+   if (!chan)
+   return;
+
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!chan->buf[i] || chan->buf[i] == prev)
+   break;
+   __relay_reset_consumed(chan->buf[i]);
+   prev = chan->buf[i];
+   }
+}
+EXPORT_SYMBOL_GPL(relay_reset_consumed);
+
 /*
  * relay_open_buf - create a new relay channel buffer
  *
@@ -845,11 +896,8 @@ static int relay_file_read_avail(struct rchan_buf *buf, size_t read_pos)
return 1;
}
 
-   if (unlikely(produced - consumed >= n_subbufs)) {
-   consumed = produced - n_subbufs + 1;
-   buf->subbufs_consumed = consumed;
-   buf->bytes_consumed = 0;
-   }
+   if (unlikely(produced - consumed >= n_subbufs))
+   __relay_reset_consumed(buf);
 
produced = (produced % n_subbufs) * subbuf_size + buf->offset;
consumed = (consumed % n_subbufs) * subbuf_size + buf->bytes_consumed;
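The arithmetic in __relay_reset_consumed() is the heart of the rewind. Before the ring wraps, every sub-buffer from 0 onward is still intact; after it wraps, the oldest surviving sub-buffer is n_subbufs behind the producer, or one later if the writer has already begun overwriting it (non-zero offset). A userspace model of just that calculation, using a hypothetical struct ring_counters whose fields mirror the rchan_buf counters the patch touches:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down model of a relay channel buffer's counters. */
struct ring_counters {
	size_t n_subbufs;        /* sub-buffers in the ring */
	size_t subbufs_produced; /* sub-buffers completed by the writer */
	size_t subbufs_consumed; /* sub-buffers the reader has finished */
	size_t offset;           /* bytes written into the current sub-buffer */
	size_t bytes_consumed;   /* bytes read from the current read sub-buffer */
};

/* Rewind the reader to the oldest sub-buffer that still holds valid data. */
static void reset_consumed(struct ring_counters *b)
{
	size_t produced = b->subbufs_produced;

	assert(b->n_subbufs > 0);
	if (produced < b->n_subbufs) {
		/* Not wrapped yet: everything from sub-buffer 0 is intact. */
		b->subbufs_consumed = 0;
	} else {
		/* Wrapped: the oldest survivor is n_subbufs back... */
		size_t consumed = produced - b->n_subbufs;

		/* ...unless the writer has started overwriting it. */
		if (b->offset)
			consumed++;
		b->subbufs_consumed = consumed;
	}
	b->bytes_consumed = 0;
}
```

The patch applies this per buffer and loops over all buffers of the channel in relay_reset_consumed().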



[Patch 1/2] Trace code and documentation (updated)

2007-09-18 Thread David J. Wilder
Trace - Provides tracing primitives

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: Martin Hunt <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/trace/src/Makefile |7 +
 Documentation/trace/src/README   |   18 +
 Documentation/trace/src/fork_trace.c |  103 ++
 Documentation/trace/trace.txt|  164 ++
 include/linux/trace.h|   99 ++
 lib/Kconfig  |9 +
 lib/Makefile |2 +
 lib/trace.c  |  563 ++++++
 8 files changed, 965 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/src/Makefile b/Documentation/trace/src/Makefile
new file mode 100644
index 000..9ee4c72
--- /dev/null
+++ b/Documentation/trace/src/Makefile
@@ -0,0 +1,7 @@
+obj-m := fork_trace.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+   $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+   rm -f *.mod.c *.ko *.o
diff --git a/Documentation/trace/src/README b/Documentation/trace/src/README
new file mode 100644
index 000..f538491
--- /dev/null
+++ b/Documentation/trace/src/README
@@ -0,0 +1,18 @@
+This small sample module creates a trace channel. It places a kprobe
+on the function do_fork(). The value of current->pid is written to
+the trace channel each time the kprobe is hit.
+
+How to run the example:
+$ mount -t debugfs debugfs /debug
+$ make
+$ insmod fork_trace.ko
+
+To view the data produced by the module:
+$ cat /debug/trace_example/do_fork/trace0
+
+To remove the module:
+$ rmmod fork_trace
+
+The function trace_cleanup() is called when the module
+is removed.  This will cause the trace channel to be destroyed and the
+corresponding files to disappear from the debug file system.
diff --git a/Documentation/trace/src/fork_trace.c b/Documentation/trace/src/fork_trace.c
new file mode 100644
index 000..7dad4cc
--- /dev/null
+++ b/Documentation/trace/src/fork_trace.c
@@ -0,0 +1,103 @@
+/* fork_trace.c - An example of using trace in a kprobes module */
+#include 
+#include 
+#include 
+#include 
+
+#define USE_GLOBAL_BUFFERS 1
+#define USE_FLIGHT 1
+
+#define PROBE_POINT "do_fork"
+
+static struct kprobe kp;
+static struct trace_info *kprobes_trace;
+
+#ifdef USE_GLOBAL_BUFFERS
+static DEFINE_SPINLOCK(trace_lock);
+#endif
+
+#define TRACE_PRINTF_TMPBUF_SIZE (1024)
+static char trace_tmpbuf[NR_CPUS][TRACE_PRINTF_TMPBUF_SIZE];
+
+static void trace_printf(struct trace_info *trace, const char *format, ...)
+{
+   va_list args;
+   void *buf;
+   char *record;
+   int len = 0;
+
+   if (!trace)
+   return;
+
+   buf = trace_tmpbuf[smp_processor_id()];
+
+#ifdef USE_GLOBAL_BUFFERS
+   spin_lock(&trace_lock);
+#endif
+
+   rcu_read_lock();
+   if (trace_running(trace)) {
+   va_start(args, format);
+   len = vscnprintf(buf, TRACE_PRINTF_TMPBUF_SIZE,
+format, args);
+   va_end(args);
+   record = relay_reserve(trace->rchan, len);
+   if (record)
+   memcpy(record, buf, len);
+   }
+   rcu_read_unlock();
+
+#ifdef USE_GLOBAL_BUFFERS
+   spin_unlock(&trace_lock);
+#endif
+}
+
+
+static int handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+   trace_printf(kprobes_trace, "%d\n", current->pid);
+   return 0;
+}
+
+
+int init_module(void)
+{
+   int ret;
+   u32 flags = 0;
+
+#ifdef USE_GLOBAL_BUFFERS
+   flags |= TRACE_GLOBAL_CHANNEL;
+#endif
+
+#ifdef USE_FLIGHT
+   flags |= TRACE_FLIGHT_CHANNEL;
+#endif
+
+   /* setup the trace */
+   kprobes_trace = trace_setup("trace_example", PROBE_POINT,
+1024, 8, flags);
+   if (IS_ERR(kprobes_trace))
+   return PTR_ERR(kprobes_trace);
+
+   trace_start(kprobes_trace);
+
+   /* setup the kprobe */
+   kp.pre_handler = handler_pre;
+   kp.post_handler = NULL;
+   kp.fault_handler = NULL;
+   kp.symbol_name = PROBE_POINT;
+   ret = register_kprobe(&kp);
+   if (ret) {
+   printk(KERN_ERR "fork_trace: register_kprobe failed\n");
+   return ret;
+   }
+   return 0;
+}
+
+void cleanup_module(void)
+{
+   unregister_kprobe(&kp);
+   trace_stop(kprobes_trace);
+   trace_cleanup(kprobes_trace);
+}
+MODULE_LICENSE("GPL");
diff --git a/Documentation/trace/trace.txt b/Documentation/trace/trace.txt
new file mode 100644
index 000..d88cb8f
--- /dev/null
+++ b/Documentation/trace/trace.txt
@@ -0,0 +1,164 @@
+Trace Setup and Control
+=======================
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space.  The
+trace interface builds on the relay interface.  For a complete
+description of the relay interface, please see:
+Documentation/filesystems/relay.txt.
+
+The trace 

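The trace_printf() helper in fork_trace.c formats into a per-CPU scratch buffer first and only then reserves exactly len bytes in the channel, so the relay reservation never has to guess the record size. A minimal userspace sketch of that pattern follows. model_chan and model_reserve() are hypothetical stand-ins for the relay channel and relay_reserve(), and unlike relay's flight (overwrite) mode this model simply drops a record that does not fit. Userspace vsnprintf() returns the untruncated length, so the sketch clamps it to what the kernel's vscnprintf() would report.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MODEL_TMPBUF_SIZE 1024

/* Hypothetical fixed-size stand-in for a relay sub-buffer. */
struct model_chan {
	char data[64];
	size_t used;
};

/* Stand-in for relay_reserve(): hand out len bytes, or NULL if full. */
static void *model_reserve(struct model_chan *chan, size_t len)
{
	if (len > sizeof(chan->data) - chan->used)
		return NULL;
	void *p = chan->data + chan->used;
	chan->used += len;
	return p;
}

/* Format into scratch space, then reserve exactly the formatted length. */
static int model_trace_printf(struct model_chan *chan, const char *fmt, ...)
{
	char tmp[MODEL_TMPBUF_SIZE];
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(tmp, sizeof(tmp), fmt, args);
	va_end(args);
	if (len < 0)
		return -1;
	/* Clamp to the characters actually stored, like vscnprintf(). */
	if ((size_t)len >= sizeof(tmp))
		len = sizeof(tmp) - 1;

	char *record = model_reserve(chan, (size_t)len);
	if (!record)
		return -1;      /* record dropped in this model */
	memcpy(record, tmp, (size_t)len);
	return len;
}
```
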
[Patch 0/2] A Kernel Tracing Interface (updated)

2007-09-18 Thread David J. Wilder
These patches provide a kernel tracing interface called "trace".

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that utilize the high-performance,
  low-overhead relayfs for passing trace data from kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ASCII trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

History: Versions of this code have been submitted for review under
a couple of different names.  The original submission was called UTT;
it was later re-submitted as GTSC.  Christoph Hellwig commented "The
code looks fine ...but the name is just dumb".  Following Christoph's
advice, I changed the name to simply "Trace".

This patch addresses review comments made by Christoph Hellwig and
Mathieu Desnoyers.  Changes include the addition of a mutex and
synchronization protecting trace state changes (using RCU) and the
reduction of the number of exports.

Patch updated Sep. 18, 2007:
Addressed further review comments by Andrew Morton, Randy Dunlap,
and Sam Ravnborg.

Patches are against 2.6.23-rc6-mm1

Required patches:
1/2 Trace code and documentation
2/2 Relay Reset Consumed patch (required for trace's "rewind" feature)

Signed-off-by: David Wilder <[EMAIL PROTECTED]>




Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds


On Wed, 19 Sep 2007, Rene Herman wrote:
> 
> I do feel larger blocksizes continue to make sense in general though. Packet
> writing on CD/DVD is a problem already today since the hardware needs 32K or
> 64K blocks and I'd expect to see more of these and similar situations when
> flash gets (even) more popular which it sort of inevitably is going to be.

.. that's what scatter-gather exists for.

What's so hard with just realizing that physical memory isn't contiguous?

It's why we have MMU's. It's why we have scatter-gather. 

Linus
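
As a userspace analogy of the scatter-gather point (POSIX writev(), not the kernel's block-layer scatterlist API), one logically contiguous 32K block can be written from eight separately allocated 4K buffers in a single call. The sizes and the write_block_sg() name are illustrative assumptions.

```c
#include <sys/types.h>
#include <sys/uio.h>

#define SG_PAGE_SIZE 4096
#define SG_NR_PAGES  8          /* 8 x 4K = one 32K "block" */

/* Write one 32K block whose pieces live in unrelated 4K buffers;
 * the iovec array plays the role of a scatter-gather list. */
static ssize_t write_block_sg(int fd, char *pages[SG_NR_PAGES])
{
	struct iovec iov[SG_NR_PAGES];

	for (int i = 0; i < SG_NR_PAGES; i++) {
		iov[i].iov_base = pages[i];
		iov[i].iov_len = SG_PAGE_SIZE;
	}
	return writev(fd, iov, SG_NR_PAGES);
}
```
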


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman

On 09/19/2007 05:50 AM, Linus Torvalds wrote:

> On Wed, 19 Sep 2007, Rene Herman wrote:
>
>> Well, not so sure about that. What if one of your expected uses for example is
>> video data storage -- lots of data, especially for multiple streams, and needs
>> still relatively fast machinery. Why would you care for the overhead of
>> _small_ blocks?
>
> .. so work with an extent-based filesystem instead.
>
> 16k blocks are total idiocy. If this wasn't about a "support legacy
> customers", I think the whole patch-series has been a total waste of time.

Admittedly, extent-based might not be a particularly bad answer at least to
the I/O side of the equation...

I do feel larger blocksizes continue to make sense in general though. Packet
writing on CD/DVD is a problem already today since the hardware needs 32K or
64K blocks and I'd expect to see more of these and similar situations when
flash gets (even) more popular which it sort of inevitably is going to be.

Rene.


Re: [14/17] Allow bit_waitqueue to wait on a bit in a vmalloc area

2007-09-18 Thread Gabriel C
Christoph Lameter wrote:

>  
> + if (is_vmalloc_addr(word))
> + page = vmalloc_to_page(word)
^^
Missing ' ; '

> + else
> + page = virt_to_page(word);
> +
> + zone = page_zone(page);
>   return &zone->wait_table[hash_long(val, zone->wait_table_bits)];
>  }
>  EXPORT_SYMBOL(bit_waitqueue);
> 

Regards,

Gabriel


Re: [PATCH 3/3] Time to make CONFIG_PARAVIRT non-experimental.

2007-09-18 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> At least the Xen port seems to have specific requirements
> and essentially only work on xen-unstable (?) [or at least
> some very new Xen version] which probably very few
> people use.
>   

Only on 64-bit hosts, because of bugs in the 64-bit compat layer. 
32-on-32 and 64-on-64 (when it's done) should work fine.

BTW, what does "xm info" say on your system that fails?  I'll try to put
a more graceful failure in there.

J


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds


On Wed, 19 Sep 2007, Rene Herman wrote:
> 
> Well, not so sure about that. What if one of your expected uses for example is
> video data storage -- lots of data, especially for multiple streams, and needs
> still relatively fast machinery. Why would you care for the overhead of
> _small_ blocks?

.. so work with an extent-based filesystem instead.

16k blocks are total idiocy. If this wasn't about a "support legacy 
customers", I think the whole patch-series has been a total waste of time.

Linus
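
To make the extent-based suggestion concrete: an extent records a whole run of blocks, so a large video file laid out contiguously costs a handful of records instead of one entry per small block. The sketch below is a generic, hypothetical extent lookup (struct extent and extent_map() are not taken from any particular filesystem) that maps a logical block to a physical one by binary search over extents sorted by logical start.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: len logical blocks starting at lstart live at pstart. */
struct extent {
	uint64_t lstart;
	uint64_t pstart;
	uint32_t len;
};

/* Binary search over extents sorted by lstart.
 * Returns 0 and sets *pblock on success, -1 if lblock falls in a hole. */
static int extent_map(const struct extent *ext, size_t n,
		      uint64_t lblock, uint64_t *pblock)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (lblock < ext[mid].lstart)
			hi = mid;
		else if (lblock >= ext[mid].lstart + ext[mid].len)
			lo = mid + 1;
		else {
			*pblock = ext[mid].pstart + (lblock - ext[mid].lstart);
			return 0;
		}
	}
	return -1;
}
```
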


Re: [git] CFS-devel, group scheduler, fixes

2007-09-18 Thread Srivatsa Vaddagiri
On Tue, Sep 18, 2007 at 10:22:43PM +0200, Ingo Molnar wrote:
> (I have not tested the group scheduling bits but perhaps Srivatsa would 
> like to do that?)

Ingo,
I plan to test it today and send you any updates that may be
required.

-- 
Regards,
vatsa


[05/17] vunmap: return page array

2007-09-18 Thread Christoph Lameter
Make vunmap() return the page array that was passed to vmap(). This is
useful if one has no structures to track the page array but simply stores
the virtual address somewhere. The disposition of the page array can be
decided upon after vunmap(). vfree() may now also be used instead of
vunmap(); it will release the page array after unmapping it.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/vmalloc.h |2 +-
 mm/vmalloc.c|   26 --
 2 files changed, 17 insertions(+), 11 deletions(-)

Index: linux-2.6/include/linux/vmalloc.h
===
--- linux-2.6.orig/include/linux/vmalloc.h  2007-09-18 13:22:56.0 -0700
+++ linux-2.6/include/linux/vmalloc.h   2007-09-18 13:22:57.0 -0700
@@ -49,7 +49,7 @@ extern void vfree(const void *addr);
 
 extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
-extern void vunmap(const void *addr);
+extern struct page **vunmap(const void *addr);
 
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c 2007-09-18 13:22:56.0 -0700
+++ linux-2.6/mm/vmalloc.c  2007-09-18 13:22:57.0 -0700
@@ -356,17 +356,18 @@ struct vm_struct *remove_vm_area(const v
return v;
 }
 
-static void __vunmap(const void *addr, int deallocate_pages)
+static struct page **__vunmap(const void *addr, int deallocate_pages)
 {
struct vm_struct *area;
+   struct page **pages;
 
if (!addr)
-   return;
+   return NULL;
 
if ((PAGE_SIZE-1) & (unsigned long)addr) {
printk(KERN_ERR "Trying to vfree() bad address (%p)\n", addr);
WARN_ON(1);
-   return;
+   return NULL;
}
 
area = remove_vm_area(addr);
@@ -374,29 +375,30 @@ static void __vunmap(const void *addr, i
printk(KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
addr);
WARN_ON(1);
-   return;
+   return NULL;
}
 
+   pages = area->pages;
debug_check_no_locks_freed(addr, area->size);
 
if (deallocate_pages) {
int i;
 
for (i = 0; i < area->nr_pages; i++) {
-   struct page *page = area->pages[i];
+   struct page *page = pages[i];
 
BUG_ON(!page);
__free_page(page);
}
 
if (area->flags & VM_VPAGES)
-   vfree(area->pages);
+   vfree(pages);
else
-   kfree(area->pages);
+   kfree(pages);
}
 
kfree(area);
-   return;
+   return pages;
 }
 
 /**
@@ -424,11 +426,13 @@ EXPORT_SYMBOL(vfree);
  * which was created from the page array passed to vmap().
  *
  * Must not be called in interrupt context.
+ *
+ * Returns a pointer to the array of pointers to page structs
  */
-void vunmap(const void *addr)
+struct page **vunmap(const void *addr)
 {
BUG_ON(in_interrupt());
-   __vunmap(addr, 0);
+   return __vunmap(addr, 0);
 }
 EXPORT_SYMBOL(vunmap);
 
@@ -453,6 +457,8 @@ void *vmap(struct page **pages, unsigned
area = get_vm_area((count << PAGE_SHIFT), flags);
if (!area)
return NULL;
+   area->pages = pages;
+   area->nr_pages = count;
if (map_vm_area(area, prot, &pages)) {
vunmap(area->addr);
return NULL;



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Rene Herman

On 09/18/2007 09:44 PM, Linus Torvalds wrote:

> Nobody sane would *ever* argue for 16kB+ blocksizes in general.

Well, not so sure about that. What if one of your expected uses for example
is video data storage -- lots of data, especially for multiple streams, and
needs still relatively fast machinery. Why would you care for the overhead
of _small_ blocks?

Okay, maybe that's covered in the "in general" but it's not extremely oddball
either...

Rene.



[07/17] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings

2007-09-18 Thread Christoph Lameter
This adds a new gfp flag

__GFP_VFALLBACK

If specified during a higher order allocation then the system will fall
back to vmap and attempt to create a virtually contiguous area instead of
a physically contiguous area. In many cases the virtually contiguous area
can stand in for the physically contiguous area (with some loss of
performance).

The pages used for VFALLBACK are marked with a new flag
PageVcompound(page). The mark is necessary since we have to know upon
free if we have to destroy a virtual mapping. No additional flag is
consumed through the use of PG_swapcache together with PG_compound
(similar to PageHead() and PageTail()).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/gfp.h|5 +
 include/linux/page-flags.h |   18 +++
 mm/page_alloc.c|  113 ++---
 3 files changed, 130 insertions(+), 6 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2007-09-18 17:03:54.0 -0700
+++ linux-2.6/mm/page_alloc.c   2007-09-18 18:25:46.0 -0700
@@ -1230,6 +1230,86 @@ try_next_zone:
 }
 
 /*
+ * Virtual Compound Page support.
+ *
+ * Virtual Compound Pages are used to fall back to order 0 allocations if large
+ * linear mappings are not available and __GFP_VFALLBACK is set. They are
+ * formatted according to compound page conventions. I.e. following
+ * page->first_page if PageTail(page) is set can be used to determine the
+ * head page.
+ */
+struct page *vcompound_alloc(gfp_t gfp_mask, int order,
+   struct zonelist *zonelist, unsigned long alloc_flags)
+{
+   void *addr;
+   struct page *page;
+   int i;
+   int nr_pages = 1 << order;
+   struct page **pages = kzalloc((nr_pages + 1) * sizeof(struct page *),
+   gfp_mask & GFP_LEVEL_MASK);
+
+   if (!pages)
+   return NULL;
+
+   for (i = 0; i < nr_pages; i++) {
+   page = get_page_from_freelist(gfp_mask & ~__GFP_VFALLBACK,
+   0, zonelist, alloc_flags);
+   if (!page)
+   goto abort;
+
+   /* Sets PageCompound which makes PageHead(page) true */
+   __SetPageVcompound(page);
+   if (i) {
+   page->first_page = pages[0];
+   __SetPageTail(page);
+   }
+   pages[i] = page;
+   }
+
+   addr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+   if (!addr)
+   goto abort;
+
+   return pages[0];
+
+abort:
+   for (i = 0; i < nr_pages; i++) {
+   page = pages[i];
+   if (!page)
+   continue;
+   __ClearPageTail(page);
+   __ClearPageHead(page);
+   __ClearPageVcompound(page);
+   __free_page(page);
+   }
+   kfree(pages);
+   return NULL;
+}
+
+static void vcompound_free(void *addr)
+{
+   struct page **pages = vunmap(addr);
+   int i;
+
+   /*
+* First page will have zero refcount since it maintains state
+* for the compound and was decremented before we got here.
+*/
+   __ClearPageHead(pages[0]);
+   __ClearPageVcompound(pages[0]);
+   free_hot_page(pages[0]);
+
+   for (i = 1; pages[i]; i++) {
+   struct page *page = pages[i];
+
+   __ClearPageTail(page);
+   __ClearPageVcompound(page);
+   __free_page(page);
+   }
+   kfree(pages);
+}
+
+/*
  * This is the 'heart' of the zoned buddy allocator.
  */
 struct page * fastcall
@@ -1324,12 +1404,12 @@ nofail_alloc:
goto nofail_alloc;
}
}
-   goto nopage;
+   goto try_vcompound;
}
 
/* Atomic allocations - we can't balance anything */
if (!wait)
-   goto nopage;
+   goto try_vcompound;
 
cond_resched();
 
@@ -1360,6 +1440,11 @@ nofail_alloc:
 */
page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET);
+
+   if (!page && order && (gfp_mask & __GFP_VFALLBACK))
+   page = vcompound_alloc(gfp_mask, order,
+   zonelist, alloc_flags);
+
if (page)
goto got_pg;
 
@@ -1391,6 +1476,14 @@ nofail_alloc:
goto rebalance;
}
 
+try_vcompound:
+   /* Last chance before failing the allocation */
+   if (order && (gfp_mask & __GFP_VFALLBACK)) {
+   page = vcompound_alloc(gfp_mask, order,
+   zonelist, alloc_flags);
+   if (page)
+   goto got_pg;
+

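The kzalloc((nr_pages + 1) * sizeof(struct page *)) in vcompound_alloc() leaves one zeroed slot past the last page. That terminator is what lets vcompound_free() loop with for (i = 1; pages[i]; i++) without knowing the order, while tail pages point back at the head through page->first_page. The userspace model below shows just that bookkeeping. model_page and the model_* functions are hypothetical, and the vmap() step is omitted.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct page: only the compound linkage. */
struct model_page {
	int tail;                       /* like PageTail() */
	struct model_page *first_page;  /* head page, valid when tail != 0 */
};

/* Allocate nr_pages independent pages, link tails to the head, and return
 * a zero-filled array with one extra NULL slot as terminator. */
static struct model_page **model_vcompound_alloc(int nr_pages)
{
	struct model_page **pages = calloc((size_t)nr_pages + 1, sizeof(*pages));
	if (!pages)
		return NULL;

	for (int i = 0; i < nr_pages; i++) {
		struct model_page *page = calloc(1, sizeof(*page));
		if (!page)
			goto fail;
		if (i) {
			page->first_page = pages[0];
			page->tail = 1;
		}
		pages[i] = page;
	}
	return pages;

fail:
	/* Slots are filled in order, so the first NULL ends the used part. */
	for (int i = 0; pages[i]; i++)
		free(pages[i]);
	free(pages);
	return NULL;
}

/* Like compound_head(): a tail page resolves to its head page. */
static struct model_page *model_compound_head(struct model_page *page)
{
	return page->tail ? page->first_page : page;
}

/* Walk to the NULL terminator, as vcompound_free() does; returns pages freed. */
static int model_vcompound_free(struct model_page **pages)
{
	int freed = 0;

	for (int i = 0; pages[i]; i++, freed++)
		free(pages[i]);
	free(pages);
	return freed;
}
```
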
[06/17] vmalloc_address(): Determine vmalloc address from page struct

2007-09-18 Thread Christoph Lameter
Sometimes we need to figure out which vmalloc address is in use
for a certain page struct. There is no easy way to figure out
the vmalloc address from the page struct. So simply search through
the kernel page table to find the address. This is a fairly expensive
process. Use sparingly (or provide a better implementation).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/vmalloc.h |3 +
 mm/vmalloc.c|   77 
 2 files changed, 80 insertions(+)

Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c 2007-09-18 18:35:13.0 -0700
+++ linux-2.6/mm/vmalloc.c  2007-09-18 18:35:18.0 -0700
@@ -196,6 +196,83 @@ struct page *vmalloc_to_page(const void 
 EXPORT_SYMBOL(vmalloc_to_page);
 
 /*
+ * Determine vmalloc address from a page struct.
+ *
+ * Linear search through all ptes of the vmalloc area.
+ */
+static unsigned long vaddr_pte_range(pmd_t *pmd, unsigned long addr,
+   unsigned long end, unsigned long pfn)
+{
+   pte_t *pte;
+
+   pte = pte_offset_kernel(pmd, addr);
+   do {
+   pte_t ptent = *pte;
+   if (pte_present(ptent) && pte_pfn(ptent) == pfn)
+   return addr;
+   } while (pte++, addr += PAGE_SIZE, addr != end);
+   return 0;
+}
+
+static inline unsigned long vaddr_pmd_range(pud_t *pud, unsigned long addr,
+   unsigned long end, unsigned long pfn)
+{
+   pmd_t *pmd;
+   unsigned long next;
+   unsigned long n;
+
+   pmd = pmd_offset(pud, addr);
+   do {
+   next = pmd_addr_end(addr, end);
+   if (pmd_none_or_clear_bad(pmd))
+   continue;
+   n = vaddr_pte_range(pmd, addr, next, pfn);
+   if (n)
+   return n;
+   } while (pmd++, addr = next, addr != end);
+   return 0;
+}
+
+static inline unsigned long vaddr_pud_range(pgd_t *pgd, unsigned long addr,
+   unsigned long end, unsigned long pfn)
+{
+   pud_t *pud;
+   unsigned long next;
+   unsigned long n;
+
+   pud = pud_offset(pgd, addr);
+   do {
+   next = pud_addr_end(addr, end);
+   if (pud_none_or_clear_bad(pud))
+   continue;
+   n = vaddr_pmd_range(pud, addr, next, pfn);
+   if (n)
+   return n;
+   } while (pud++, addr = next, addr != end);
+   return 0;
+}
+
+void *vmalloc_address(struct page *page)
+{
+   pgd_t *pgd;
+   unsigned long next, n;
+   unsigned long addr = VMALLOC_START;
+   unsigned long pfn = page_to_pfn(page);
+
+   pgd = pgd_offset_k(VMALLOC_START);
+   do {
+   next = pgd_addr_end(addr, VMALLOC_END);
+   if (pgd_none_or_clear_bad(pgd))
+   continue;
+   n = vaddr_pud_range(pgd, addr, next, pfn);
+   if (n)
+   return (void *)n;
+   } while (pgd++, addr = next, addr < VMALLOC_END);
+   return NULL;
+}
+EXPORT_SYMBOL(vmalloc_address);
+
+/*
  * Map a vmalloc()-space virtual address to the physical page frame number.
  */
 unsigned long vmalloc_to_pfn(const void *vmalloc_addr)
Index: linux-2.6/include/linux/vmalloc.h
===
--- linux-2.6.orig/include/linux/vmalloc.h  2007-09-18 18:35:13.0 -0700
+++ linux-2.6/include/linux/vmalloc.h   2007-09-18 18:35:48.0 -0700
@@ -85,6 +85,9 @@ extern void free_vm_area(struct vm_struc
 struct page *vmalloc_to_page(const void *addr);
 unsigned long vmalloc_to_pfn(const void *addr);
 
+/* Determine address from page struct pointer */
+void *vmalloc_address(struct page *);
+
 /*
  * Internals.  Dont't use..
  */
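
vmalloc_address() has to visit every present PTE in the vmalloc range until it finds the wanted pfn, skipping whole ranges whose upper-level entry is empty, which is why the changelog calls it expensive. A two-level userspace model of that search follows. The table geometry (MODEL_PTRS, MODEL_PAGE_SHIFT) and all types are hypothetical simplifications of the pgd/pud/pmd/pte walk.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PTRS       4                 /* entries per leaf table */
#define MODEL_PMD_SPAN   ((uint64_t)MODEL_PTRS << MODEL_PAGE_SHIFT)

/* Hypothetical leaf entry: present bit plus a pfn. */
struct model_pte {
	int present;
	uint64_t pfn;
};

/* Hypothetical upper-level entry: NULL plays the role of pmd_none(). */
struct model_pmd {
	struct model_pte *ptes;            /* MODEL_PTRS entries, or NULL */
};

/* Linear search of [start, start + nr_pmds * MODEL_PMD_SPAN) for the
 * virtual address mapping pfn; returns 0 if it is not mapped. */
static uint64_t model_vaddr_for_pfn(const struct model_pmd *pmds,
				    size_t nr_pmds, uint64_t start,
				    uint64_t pfn)
{
	uint64_t addr = start;

	for (size_t i = 0; i < nr_pmds; i++, addr += MODEL_PMD_SPAN) {
		if (!pmds[i].ptes)
			continue;               /* empty entry: skip its whole span */
		for (size_t j = 0; j < MODEL_PTRS; j++) {
			const struct model_pte *pte = &pmds[i].ptes[j];

			if (pte->present && pte->pfn == pfn)
				return addr + ((uint64_t)j << MODEL_PAGE_SHIFT);
		}
	}
	return 0;
}
```
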



[15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-18 Thread Christoph Lameter
SLAB_VFALLBACK can be specified for selected slab caches. If fallback is
available then the conservative settings for higher order allocations are
overridden. We then request an order that can accommodate at minimum
100 objects. The size of an individual slab allocation is allowed to reach
up to 256k (order 6 on i386, order 4 on IA64).
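
Working the numbers through: with 4K pages, 256k is 64 pages (order 6), and with 16K pages it is 16 pages (order 4), matching the figures above. The sketch below assumes the policy is "smallest order holding at least 100 objects, capped at 256k". model_vfallback_order() is a hypothetical helper, and the real SLUB order calculation may weigh other factors such as wasted space.

```c
/* Smallest order whose slab holds at least min_objects objects of the
 * given size, without letting the slab grow beyond max_bytes. */
static int model_vfallback_order(unsigned long page_size, unsigned long size,
				 unsigned long min_objects,
				 unsigned long max_bytes)
{
	int order = 0;

	while ((page_size << order) / size < min_objects &&
	       (page_size << (order + 1)) <= max_bytes)
		order++;
	return order;
}
```

For example, 192-byte objects on 4K pages need 19200 bytes for 100 objects, so order 3 (32K) is chosen.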

Implementing fallback requires special handling of virtual mappings in
the free path. However, the impact is minimal since we already check
whether the address is NULL or ZERO_SIZE_PTR. No additional cachelines are
touched if we do not fall back. However, if we need to handle a virtual
compound page then we walk the kernel page table in the free paths to
determine the page struct.

We also need special handling in the allocation paths since the virtual
addresses cannot be obtained via page_address(). SLUB exploits that
page->private is set to the vmalloc address to avoid a costly
vmalloc_address().

However, for diagnostics there is still the need to determine the
vmalloc address from the page struct. There we must use the costly
vmalloc_address().

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slab.h |1 
 include/linux/slub_def.h |1 
 mm/slub.c|   83 ---
 3 files changed, 60 insertions(+), 25 deletions(-)

Index: linux-2.6/include/linux/slab.h
===
--- linux-2.6.orig/include/linux/slab.h 2007-09-18 17:03:30.0 -0700
+++ linux-2.6/include/linux/slab.h  2007-09-18 17:07:39.0 -0700
@@ -19,6 +19,7 @@
  * The ones marked DEBUG are only valid if CONFIG_SLAB_DEBUG is set.
  */
 #define SLAB_DEBUG_FREE	0x0100UL	/* DEBUG: Perform (expensive) checks on free */
+#define SLAB_VFALLBACK		0x0200UL	/* May fall back to vmalloc */
 #define SLAB_RED_ZONE		0x0400UL	/* DEBUG: Red zone objs in a cache */
 #define SLAB_POISON		0x0800UL	/* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN	0x2000UL	/* Align objs on cache lines */
Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-09-18 17:03:30.0 -0700
+++ linux-2.6/mm/slub.c 2007-09-18 18:13:38.0 -0700
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Lock order:
@@ -277,6 +278,26 @@ static inline struct kmem_cache_node *ge
 #endif
 }
 
+static inline void *slab_address(struct page *page)
+{
+   if (unlikely(PageVcompound(page)))
+   return vmalloc_address(page);
+   else
+   return page_address(page);
+}
+
+static inline struct page *virt_to_slab(const void *addr)
+{
+   struct page *page;
+
+   if (unlikely(is_vmalloc_addr(addr)))
+   page = vmalloc_to_page(addr);
+   else
+   page = virt_to_page(addr);
+
+   return compound_head(page);
+}
+
 static inline int check_valid_pointer(struct kmem_cache *s,
struct page *page, const void *object)
 {
@@ -285,7 +306,7 @@ static inline int check_valid_pointer(st
if (!object)
return 1;
 
-   base = page_address(page);
+   base = slab_address(page);
if (object < base || object >= base + s->objects * s->size ||
(object - base) % s->size) {
return 0;
@@ -470,7 +491,7 @@ static void slab_fix(struct kmem_cache *
 static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 {
unsigned int off;   /* Offset of last byte */
-   u8 *addr = page_address(page);
+   u8 *addr = slab_address(page);
 
print_tracking(s, p);
 
@@ -648,7 +669,7 @@ static int slab_pad_check(struct kmem_ca
if (!(s->flags & SLAB_POISON))
return 1;
 
-   start = page_address(page);
+   start = slab_address(page);
end = start + (PAGE_SIZE << s->order);
length = s->objects * s->size;
remainder = end - (start + length);
@@ -1040,11 +1061,7 @@ static struct page *allocate_slab(struct
struct page * page;
int pages = 1 << s->order;
 
-   if (s->order)
-   flags |= __GFP_COMP;
-
-   if (s->flags & SLAB_CACHE_DMA)
-   flags |= SLUB_DMA;
+   flags |= s->gfpflags;
 
if (node == -1)
page = alloc_pages(flags, s->order);
@@ -1098,7 +1115,11 @@ static struct page *new_slab(struct kmem
SLAB_STORE_USER | SLAB_TRACE))
SetSlabDebug(page);
 
-   start = page_address(page);
+   if (!PageVcompound(page))
+   start = slab_address(page);
+   else
+   start = (void *)page->private;
+
end = start + s->objects * s->size;
 
if (unlikely(s->flags & SLAB_POISON))
@@ -1130,7 +1151,7 @@ static void __free_slab(struct kmem_cach
void *p;
 

[10/17] Use GFP_VFALLBACK for sparsemem.

2007-09-18 Thread Christoph Lameter
Sparsemem currently attempts first to do a physically contiguous mapping
and then falls back to vmalloc. The same thing can now be accomplished
using GFP_VFALLBACK.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/sparse.c |   23 +++
 1 file changed, 3 insertions(+), 20 deletions(-)

Index: linux-2.6/mm/sparse.c
===
--- linux-2.6.orig/mm/sparse.c  2007-09-18 13:21:44.0 -0700
+++ linux-2.6/mm/sparse.c   2007-09-18 13:28:43.0 -0700
@@ -269,32 +269,15 @@ void __init sparse_init(void)
 #ifdef CONFIG_MEMORY_HOTPLUG
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 {
-   struct page *page, *ret;
unsigned long memmap_size = sizeof(struct page) * nr_pages;
 
-   page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
-   if (page)
-   goto got_map_page;
-
-   ret = vmalloc(memmap_size);
-   if (ret)
-   goto got_map_ptr;
-
-   return NULL;
-got_map_page:
-   ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
-got_map_ptr:
-   memset(ret, 0, memmap_size);
-
-   return ret;
+   return (struct page *)alloc_page(GFP_VFALLBACK|__GFP_ZERO,
+   get_order(memmap_size));
 }
 
 static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
 {
-   if (is_vmalloc_addr(memmap))
-   vfree(memmap);
-   else
-   free_pages((unsigned long)memmap,
+   free_pages((unsigned long)memmap,
   get_order(sizeof(struct page) * nr_pages));
 }
 



[09/17] VFALLBACK: Debugging aid

2007-09-18 Thread Christoph Lameter
Virtual fallbacks are rare and thus subtle bugs may creep in if we do not
test the fallbacks. CONFIG_VFALLBACK_ALWAYS makes all GFP_VFALLBACK
allocations fall back to virtual mapping.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 lib/Kconfig.debug |   11 +++
 mm/page_alloc.c   |9 +
 2 files changed, 20 insertions(+)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2007-09-18 19:19:34.0 -0700
+++ linux-2.6/mm/page_alloc.c   2007-09-18 20:16:26.0 -0700
@@ -1205,7 +1205,16 @@ zonelist_scan:
goto this_zone_full;
}
}
+#ifdef CONFIG_VFALLBACK_ALWAYS
+   if ((gfp_mask & __GFP_VFALLBACK) &&
+   system_state == SYSTEM_RUNNING)  {
+   struct page *vcompound_alloc(gfp_t, int,
+   struct zonelist *, unsigned long);
 
+   page = vcompound_alloc(gfp_mask, order,
+   zonelist, alloc_flags);
+   } else
+#endif
page = buffered_rmqueue(zonelist, zone, order, gfp_mask);
if (page)
break;
Index: linux-2.6/lib/Kconfig.debug
===
--- linux-2.6.orig/lib/Kconfig.debug2007-09-18 19:19:28.0 -0700
+++ linux-2.6/lib/Kconfig.debug 2007-09-18 19:19:34.0 -0700
@@ -105,6 +105,17 @@ config DETECT_SOFTLOCKUP
   can be detected via the NMI-watchdog, on platforms that
   support it.)
 
+config VFALLBACK_ALWAYS
+   bool "Always fall back to Virtual Compound pages"
+   default y
+   help
+ Virtual compound pages are only allocated if there is no linear
+ memory available. They are a fallback and errors created by the
+ use of virtual mappings instead of linear ones may not surface
+ because of their infrequent use. This option makes every
+ allocation that allows a fallback to a virtual mapping use
+ the virtual mapping. May have a significant performance impact.
+
 config SCHED_DEBUG
bool "Collect scheduler debugging info"
depends on DEBUG_KERNEL && PROC_FS



[17/17] Allow virtual fallback for dentries

2007-09-18 Thread Christoph Lameter
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/dcache.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/dcache.c
===
--- linux-2.6.orig/fs/dcache.c  2007-09-18 18:42:19.0 -0700
+++ linux-2.6/fs/dcache.c   2007-09-18 18:42:55.0 -0700
@@ -2118,7 +2118,8 @@ static void __init dcache_init(unsigned 
 * of the dcache. 
 */
dentry_cache = KMEM_CACHE(dentry,
-   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
+   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|
+   SLAB_VFALLBACK);

register_shrinker(&dcache_shrinker);
 



[13/17] Virtual compound page freeing in interrupt context

2007-09-18 Thread Christoph Lameter
If we are in an interrupt context then simply defer the free via a workqueue.

In an interrupt context it is not possible to use vmalloc_addr() to determine
the vmalloc address. So add a variant that does that too.

Removing a virtual mapping *must* be done with interrupts enabled
since tlb_xx functions are called that rely on interrupts for
processor-to-processor communication.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |   23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2007-09-18 20:10:55.0 -0700
+++ linux-2.6/mm/page_alloc.c   2007-09-18 20:11:40.0 -0700
@@ -1297,7 +1297,12 @@ abort:
return NULL;
 }
 
-static void vcompound_free(void *addr)
+/*
+ * Virtual Compound freeing functions. This is complicated by the vmalloc
+ * layer not being able to free virtual allocations when interrupts are
+ * disabled. So we defer the frees via a workqueue if necessary.
+ */
+static void __vcompound_free(void *addr)
 {
struct page **pages = vunmap(addr);
int i;
@@ -1320,6 +1325,22 @@ static void vcompound_free(void *addr)
kfree(pages);
 }
 
+static void vcompound_free_work(struct work_struct *w)
+{
+   __vcompound_free((void *)w);
+}
+
+static void vcompound_free(void *addr)
+{
+   if (in_interrupt()) {
+   struct work_struct *w = addr;
+
+   INIT_WORK(w, vcompound_free_work);
+   schedule_work(w);
+   } else
+   __vcompound_free(addr);
+}
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */



[14/17] Allow bit_waitqueue to wait on a bit in a vmalloc area

2007-09-18 Thread Christoph Lameter
If bit_waitqueue() is passed a vmalloc address, it must use
vmalloc_to_page() instead of virt_to_page() to get to the page struct.
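The page lookup matters because the wait queue is selected from the hash table of the page's zone. The sketch below is a simplified user-space version of the index computation. `hash32()` and its multiplier are assumptions standing in for the kernel's `hash_long()`, whose 64-bit variant differs. `table_bits` must be between 1 and 31.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for hash_long(): a 32-bit multiplicative
 * (golden-ratio style) hash keeping the top `bits` bits. */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	return (val * 0x9e370001u) >> (32 - bits);
}

/* Combine the word address and bit number into one key, as
 * bit_waitqueue() does with shift = log2(bits per long), then hash it
 * into a table of 2^table_bits wait queues. */
static unsigned int wait_table_index(uintptr_t word, int bit,
				     unsigned int table_bits)
{
	const int shift = sizeof(long) * 8 == 32 ? 5 : 6;
	uint32_t key = (uint32_t)((word << shift) | (uintptr_t)bit);

	return hash32(key, table_bits);
}
```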

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 kernel/wait.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/wait.c
===
--- linux-2.6.orig/kernel/wait.c  2007-09-18 19:19:27.0 -0700
+++ linux-2.6/kernel/wait.c 2007-09-18 20:10:39.0 -0700
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void init_waitqueue_head(wait_queue_head_t *q)
 {
@@ -245,9 +246,16 @@ EXPORT_SYMBOL(wake_up_bit);
 fastcall wait_queue_head_t *bit_waitqueue(void *word, int bit)
 {
const int shift = BITS_PER_LONG == 32 ? 5 : 6;
-   const struct zone *zone = page_zone(virt_to_page(word));
unsigned long val = (unsigned long)word << shift | bit;
+   struct page *page;
+   struct zone *zone;
 
+   if (is_vmalloc_addr(word))
+   page = vmalloc_to_page(word);
+   else
+   page = virt_to_page(word);
+
+   zone = page_zone(page);
return &zone->wait_table[hash_long(val, zone->wait_table_bits)];
 }
 EXPORT_SYMBOL(bit_waitqueue);



[16/17] Allow virtual fallback for buffer_heads

2007-09-18 Thread Christoph Lameter
This is particularly useful for large I/Os because it allows more than 100
allocations from the SLUB fast path without having to go to the page allocator.
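The "more than 100" figure depends on how many objects fit in one slab. Roughly speaking, each slab-full of objects can be handed out before SLUB has to fetch a new slab. The back-of-the-envelope sketch below assumes 4 KiB pages and a hypothetical 104-byte object; this is not a measured `struct buffer_head` size. It also ignores alignment.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/* Objects that fit in one slab of 2^order pages, ignoring alignment and
 * any per-slab metadata (a simplification). */
static unsigned long objects_per_slab(unsigned int order, size_t obj_size)
{
	return (SKETCH_PAGE_SIZE << order) / obj_size;
}
```

With these assumptions, an order-0 slab holds 39 such objects and an order-2 slab holds 157.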

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 fs/buffer.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c  2007-09-18 15:44:37.0 -0700
+++ linux-2.6/fs/buffer.c   2007-09-18 15:44:51.0 -0700
@@ -3008,7 +3008,8 @@ void __init buffer_init(void)
int nrpages;
 
bh_cachep = KMEM_CACHE(buffer_head,
-   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
+   SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|
+   SLAB_VFALLBACK);
 
/*
 * Limit the bh occupancy to 10% of ZONE_NORMAL



[11/17] GFP_VFALLBACK for zone wait table.

2007-09-18 Thread Christoph Lameter
Currently we have to use vmalloc for the zone wait table, possibly creating
the need for lots of TLB entries to access the table. We can now use
GFP_VFALLBACK to attempt to get a physically contiguous higher order page,
which can then be covered by the kernel's large TLB entries.
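Switching from `vmalloc(alloc_size)` to `__get_free_pages(..., get_order(alloc_size))` rounds the request up to a power-of-two number of pages. The small sketch below shows that rounding, assuming 4 KiB pages. `sketch_get_order()` is an illustrative re-implementation, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Smallest order such that (page size << order) >= size, which is what
 * get_order() computes; size must be > 0. */
static int sketch_get_order(size_t size)
{
	int order = 0;

	while (((size_t)1 << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}
```

For example, a 4097-byte table needs order 1 (8 KiB), so up to almost half of a rounded-up allocation can go unused.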

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2007-09-18 14:29:05.0 -0700
+++ linux-2.6/mm/page_alloc.c   2007-09-18 14:29:10.0 -0700
@@ -2572,7 +2572,9 @@ int zone_wait_table_init(struct zone *zo
 * To use this new node's memory, further consideration will be
 * necessary.
 */
-   zone->wait_table = (wait_queue_head_t *)vmalloc(alloc_size);
+   zone->wait_table = (wait_queue_head_t *)
+   __get_free_pages(GFP_VFALLBACK,
+   get_order(alloc_size));
}
if (!zone->wait_table)
return -ENOMEM;



[12/17] Virtual Compound page allocation from interrupt context.

2007-09-18 Thread Christoph Lameter
In an interrupt context we cannot wait for the vmlist_lock in
__get_vm_area_node(). So use a trylock instead. If the trylock fails
then the atomic allocation will fail and subsequently be retried.

This only works because the flush_cache_vunmap() used during allocation
never performs any IPIs, in contrast to the flush_tlb_... used when
freeing.
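The allocation-side rule is simple: block if the caller may wait, otherwise try the lock once and fail. The user-space sketch below uses a C11 `atomic_flag` spinlock as a stand-in for `vmlist_lock`. `get_area()` and the lock helpers are hypothetical names, and `can_wait` plays the role of `gfp_mask & __GFP_WAIT`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for vmlist_lock: a minimal C11 test-and-set spinlock. */
static atomic_flag area_lock = ATOMIC_FLAG_INIT;

static void area_lock_wait(void)
{
	while (atomic_flag_test_and_set_explicit(&area_lock, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static bool area_lock_try(void)
{
	return !atomic_flag_test_and_set_explicit(&area_lock, memory_order_acquire);
}

static void area_unlock(void)
{
	atomic_flag_clear_explicit(&area_lock, memory_order_release);
}

/* Returns a new area, or NULL when the caller may not wait and the lock is
 * busy; the caller is then expected to retry, like a failed atomic alloc. */
static void *get_area(size_t size, bool can_wait)
{
	void *area = malloc(size);

	if (!area)
		return NULL;
	if (can_wait) {
		area_lock_wait();
	} else if (!area_lock_try()) {
		free(area);	/* mirrors kfree(area); return NULL; */
		return NULL;
	}
	/* (inserting the area into the shared list is omitted here) */
	area_unlock();
	return area;
}
```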

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/vmalloc.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c 2007-09-18 10:52:11.0 -0700
+++ linux-2.6/mm/vmalloc.c  2007-09-18 10:54:21.0 -0700
@@ -289,7 +289,6 @@ static struct vm_struct *__get_vm_area_n
unsigned long align = 1;
unsigned long addr;
 
-   BUG_ON(in_interrupt());
if (flags & VM_IOREMAP) {
int bit = fls(size);
 
@@ -314,7 +313,14 @@ static struct vm_struct *__get_vm_area_n
 */
size += PAGE_SIZE;
 
-   write_lock(&vmlist_lock);
+   if (gfp_mask & __GFP_WAIT)
+   write_lock(&vmlist_lock);
+   else {
+   if (!write_trylock(&vmlist_lock)) {
+   kfree(area);
+   return NULL;
+   }
+   }
for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
if ((unsigned long)tmp->addr < addr) {
if((unsigned long)tmp->addr + tmp->size >= addr)



[03/17] is_vmalloc_addr(): Check if an address is within the vmalloc boundaries

2007-09-18 Thread Christoph Lameter
This test is used in a couple of places. Add a version to mm.h
and replace the open-coded checks.
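The conversions rely on De Morgan's law: `!(a >= S && a < E)` is the same as `a < S || a >= E`. This is why the ntfs and xfs callers keep their behaviour when switched to `!is_vmalloc_addr()`. The small sketch below checks that equivalence at the range boundaries. The bounds are made-up values, and the helpers take an integer rather than a `const void *` for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounds; the real VMALLOC_START/VMALLOC_END are per-arch. */
#define SKETCH_VMALLOC_START 0xf8000000UL
#define SKETCH_VMALLOC_END   0xff000000UL

/* Half-open range check, same shape as the new is_vmalloc_addr(). */
static bool sketch_is_vmalloc_addr(unsigned long addr)
{
	return addr >= SKETCH_VMALLOC_START && addr < SKETCH_VMALLOC_END;
}

/* Old open-coded "not vmalloc" test, as removed from ntfs_free()/kmem_free(). */
static bool old_not_vmalloc(unsigned long addr)
{
	return addr < SKETCH_VMALLOC_START || addr >= SKETCH_VMALLOC_END;
}
```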

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 drivers/net/cxgb3/cxgb3_offload.c |4 +---
 fs/ntfs/malloc.h  |3 +--
 fs/proc/kcore.c   |2 +-
 fs/xfs/linux-2.6/kmem.c   |3 +--
 fs/xfs/linux-2.6/xfs_buf.c|3 +--
 include/linux/mm.h|8 
 mm/sparse.c   |   10 +-
 7 files changed, 14 insertions(+), 19 deletions(-)

Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h   2007-09-17 21:46:06.0 -0700
+++ linux-2.6/include/linux/mm.h    2007-09-17 23:56:54.0 -0700
@@ -1158,6 +1158,14 @@ static inline unsigned long vma_pages(st
return (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 }
 
+/* Determine if an address is within the vmalloc range */
+static inline int is_vmalloc_addr(const void *x)
+{
+   unsigned long addr = (unsigned long)x;
+
+   return addr >= VMALLOC_START && addr < VMALLOC_END;
+}
+
 pgprot_t vm_get_page_prot(unsigned long vm_flags);
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
Index: linux-2.6/mm/sparse.c
===
--- linux-2.6.orig/mm/sparse.c  2007-09-17 21:45:24.0 -0700
+++ linux-2.6/mm/sparse.c   2007-09-17 23:56:26.0 -0700
@@ -289,17 +289,9 @@ got_map_ptr:
return ret;
 }
 
-static int vaddr_in_vmalloc_area(void *addr)
-{
-   if (addr >= (void *)VMALLOC_START &&
-   addr < (void *)VMALLOC_END)
-   return 1;
-   return 0;
-}
-
 static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
 {
-   if (vaddr_in_vmalloc_area(memmap))
+   if (is_vmalloc_addr(memmap))
vfree(memmap);
else
free_pages((unsigned long)memmap,
Index: linux-2.6/drivers/net/cxgb3/cxgb3_offload.c
===
--- linux-2.6.orig/drivers/net/cxgb3/cxgb3_offload.c  2007-09-17 21:45:24.0 -0700
+++ linux-2.6/drivers/net/cxgb3/cxgb3_offload.c 2007-09-17 21:46:06.0 -0700
@@ -1035,9 +1035,7 @@ void *cxgb_alloc_mem(unsigned long size)
  */
 void cxgb_free_mem(void *addr)
 {
-   unsigned long p = (unsigned long)addr;
-
-   if (p >= VMALLOC_START && p < VMALLOC_END)
+   if (is_vmalloc_addr(addr))
vfree(addr);
else
kfree(addr);
Index: linux-2.6/fs/ntfs/malloc.h
===
--- linux-2.6.orig/fs/ntfs/malloc.h 2007-09-17 21:45:24.0 -0700
+++ linux-2.6/fs/ntfs/malloc.h  2007-09-17 21:46:06.0 -0700
@@ -85,8 +85,7 @@ static inline void *ntfs_malloc_nofs_nof
 
 static inline void ntfs_free(void *addr)
 {
-   if (likely(((unsigned long)addr < VMALLOC_START) ||
-   ((unsigned long)addr >= VMALLOC_END ))) {
+   if (!is_vmalloc_addr(addr)) {
kfree(addr);
/* free_page((unsigned long)addr); */
return;
Index: linux-2.6/fs/proc/kcore.c
===
--- linux-2.6.orig/fs/proc/kcore.c  2007-09-17 21:45:24.0 -0700
+++ linux-2.6/fs/proc/kcore.c   2007-09-17 21:46:06.0 -0700
@@ -325,7 +325,7 @@ read_kcore(struct file *file, char __use
if (m == NULL) {
if (clear_user(buffer, tsz))
return -EFAULT;
-   } else if ((start >= VMALLOC_START) && (start < VMALLOC_END)) {
+   } else if (is_vmalloc_addr((void *)start)) {
char * elf_buf;
struct vm_struct *m;
unsigned long curstart = start;
Index: linux-2.6/fs/xfs/linux-2.6/kmem.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/kmem.c  2007-09-17 21:45:24.0 -0700
+++ linux-2.6/fs/xfs/linux-2.6/kmem.c   2007-09-17 21:46:06.0 -0700
@@ -92,8 +92,7 @@ kmem_zalloc_greedy(size_t *size, size_t 
 void
 kmem_free(void *ptr, size_t size)
 {
-   if (((unsigned long)ptr < VMALLOC_START) ||
-   ((unsigned long)ptr >= VMALLOC_END)) {
+   if (!is_vmalloc_addr(ptr)) {
kfree(ptr);
} else {
vfree(ptr);
Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c   2007-09-17 21:45:24.0 -0700
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c    2007-09-17 21:46:06.0 -0700
@@ -696,8 +696,7 @@ static inline struct page *
 mem_to_page(
void   

[08/17] Pass vmalloc address in page->private

2007-09-18 Thread Christoph Lameter
Avoid expensive lookups of virtual addresses from page structs by
storing the vmalloc address in page->private. We can then avoid
the vmalloc_address() call in the get__page() functions and
simply return page->private.
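The idea is to record the answer once, at allocation time, instead of recomputing it later. The toy model below illustrates this. `struct toy_page` and its helpers are hypothetical: `vcompound` stands in for `PageVcompound()`, and `linear_addr` stands in for what `page_address()` would return for an ordinary page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy page descriptor: `vcompound` stands in for PageVcompound() and
 * `private` for page->private; `linear_addr` is what page_address()
 * would compute for an ordinary page. */
struct toy_page {
	bool vcompound;
	uintptr_t private;
	uintptr_t linear_addr;
};

/* At allocation time the virtual address is known, so record it once. */
static void toy_mark_vcompound(struct toy_page *head, uintptr_t vaddr)
{
	head->vcompound = true;
	head->private = vaddr;
}

/* Mirrors the patched __get_free_pages(): a constant-time answer for
 * both kinds of page, with no reverse lookup through the vmalloc layer. */
static uintptr_t toy_page_to_addr(const struct toy_page *page)
{
	if (page->vcompound)
		return page->private;
	return page->linear_addr;
}
```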

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/page_alloc.c |   15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===
--- linux-2.6.orig/mm/page_alloc.c  2007-09-18 18:35:55.0 -0700
+++ linux-2.6/mm/page_alloc.c   2007-09-18 18:36:01.0 -0700
@@ -1276,6 +1276,11 @@ struct page *vcompound_alloc(gfp_t gfp_m
if (!addr)
goto abort;
 
+   /*
+* Give the caller a chance to avoid an expensive vmalloc_addr()
+* call.
+*/
+   pages[0]->private = (unsigned long)addr;
return pages[0];
 
 abort:
@@ -1534,6 +1539,8 @@ fastcall unsigned long __get_free_pages(
page = alloc_pages(gfp_mask, order);
if (!page)
return 0;
+   if (unlikely(PageVcompound(page)))
+   return page->private;
return (unsigned long) page_address(page);
 }
 
@@ -1550,9 +1557,11 @@ fastcall unsigned long get_zeroed_page(g
VM_BUG_ON((gfp_mask & __GFP_HIGHMEM) != 0);
 
page = alloc_pages(gfp_mask | __GFP_ZERO, 0);
-   if (page)
-   return (unsigned long) page_address(page);
-   return 0;
+   if (!page)
+   return 0;
+   if (unlikely(PageVcompound(page)))
+   return page->private;
+   return (unsigned long) page_address(page);
 }
 
 EXPORT_SYMBOL(get_zeroed_page);



[01/17] Vmalloc: Move vmalloc_to_page to mm/vmalloc.

2007-09-18 Thread Christoph Lameter
We already have page table manipulation for vmalloc in vmalloc.c. Move the
vmalloc_to_page() function there as well. Also move the related definitions
from include/linux/mm.h.
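The function being moved is a top-down page table walk that gives up as soon as a level is empty. The self-contained toy model below assumes a 4-level layout with 9 index bits per level and 4 KiB pages, roughly like x86-64. `struct toy_table`, `toy_walk()` and `toy_map()` are illustrative. The model does not cover `pte_offset_map()`/`pte_unmap()` or highmem.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy 4-level table, loosely modelled on x86-64: 9 index bits per level
 * above a 12-bit page offset (pgd -> pud -> pmd -> pte). */
#define TOY_LEVELS     4
#define TOY_IDX_BITS   9
#define TOY_ENTRIES    (1u << TOY_IDX_BITS)
#define TOY_PAGE_SHIFT 12

struct toy_table {
	void *slot[TOY_ENTRIES];
};

/* Level 0 is the top (pgd); level TOY_LEVELS - 1 is the leaf (pte). */
static unsigned int toy_index(uint64_t addr, int level)
{
	int shift = TOY_PAGE_SHIFT + TOY_IDX_BITS * (TOY_LEVELS - 1 - level);

	return (unsigned int)(addr >> shift) & (TOY_ENTRIES - 1);
}

/* Walk top-down like vmalloc_to_page(): return NULL at the first empty
 * level, or the leaf entry (NULL if nothing is mapped there). */
static void *toy_walk(const struct toy_table *top, uint64_t addr)
{
	const struct toy_table *t = top;

	for (int level = 0; level < TOY_LEVELS - 1; level++) {
		t = t->slot[toy_index(addr, level)];
		if (!t)
			return NULL;	/* like pgd_none()/pud_none()/pmd_none() */
	}
	return t->slot[toy_index(addr, TOY_LEVELS - 1)];
}

/* Install a leaf mapping, allocating intermediate tables on demand.
 * Tables are never freed in this sketch. */
static int toy_map(struct toy_table *top, uint64_t addr, void *page)
{
	struct toy_table *t = top;

	for (int level = 0; level < TOY_LEVELS - 1; level++) {
		void **slot = &t->slot[toy_index(addr, level)];

		if (!*slot && !(*slot = calloc(1, sizeof(struct toy_table))))
			return -1;
		t = *slot;
	}
	t->slot[toy_index(addr, TOY_LEVELS - 1)] = page;
	return 0;
}
```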

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/mm.h  |2 --
 include/linux/vmalloc.h |4 
 mm/memory.c |   40 
 mm/vmalloc.c|   38 ++
 4 files changed, 42 insertions(+), 42 deletions(-)

Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c  2007-09-18 18:33:56.0 -0700
+++ linux-2.6/mm/memory.c   2007-09-18 18:34:06.0 -0700
@@ -2727,46 +2727,6 @@ int make_pages_present(unsigned long add
return ret == len ? 0 : -1;
 }
 
-/* 
- * Map a vmalloc()-space virtual address to the physical page.
- */
-struct page * vmalloc_to_page(void * vmalloc_addr)
-{
-   unsigned long addr = (unsigned long) vmalloc_addr;
-   struct page *page = NULL;
-   pgd_t *pgd = pgd_offset_k(addr);
-   pud_t *pud;
-   pmd_t *pmd;
-   pte_t *ptep, pte;
-  
-   if (!pgd_none(*pgd)) {
-   pud = pud_offset(pgd, addr);
-   if (!pud_none(*pud)) {
-   pmd = pmd_offset(pud, addr);
-   if (!pmd_none(*pmd)) {
-   ptep = pte_offset_map(pmd, addr);
-   pte = *ptep;
-   if (pte_present(pte))
-   page = pte_page(pte);
-   pte_unmap(ptep);
-   }
-   }
-   }
-   return page;
-}
-
-EXPORT_SYMBOL(vmalloc_to_page);
-
-/*
- * Map a vmalloc()-space virtual address to the physical page frame number.
- */
-unsigned long vmalloc_to_pfn(void * vmalloc_addr)
-{
-   return page_to_pfn(vmalloc_to_page(vmalloc_addr));
-}
-
-EXPORT_SYMBOL(vmalloc_to_pfn);
-
 #if !defined(__HAVE_ARCH_GATE_AREA)
 
 #if defined(AT_SYSINFO_EHDR)
Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c 2007-09-18 18:33:56.0 -0700
+++ linux-2.6/mm/vmalloc.c  2007-09-18 18:34:06.0 -0700
@@ -166,6 +166,44 @@ int map_vm_area(struct vm_struct *area, 
 }
 EXPORT_SYMBOL_GPL(map_vm_area);
 
+/*
+ * Map a vmalloc()-space virtual address to the physical page.
+ */
+struct page *vmalloc_to_page(void *vmalloc_addr)
+{
+   unsigned long addr = (unsigned long) vmalloc_addr;
+   struct page *page = NULL;
+   pgd_t *pgd = pgd_offset_k(addr);
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *ptep, pte;
+
+   if (!pgd_none(*pgd)) {
+   pud = pud_offset(pgd, addr);
+   if (!pud_none(*pud)) {
+   pmd = pmd_offset(pud, addr);
+   if (!pmd_none(*pmd)) {
+   ptep = pte_offset_map(pmd, addr);
+   pte = *ptep;
+   if (pte_present(pte))
+   page = pte_page(pte);
+   pte_unmap(ptep);
+   }
+   }
+   }
+   return page;
+}
+EXPORT_SYMBOL(vmalloc_to_page);
+
+/*
+ * Map a vmalloc()-space virtual address to the physical page frame number.
+ */
+unsigned long vmalloc_to_pfn(void *vmalloc_addr)
+{
+   return page_to_pfn(vmalloc_to_page(vmalloc_addr));
+}
+EXPORT_SYMBOL(vmalloc_to_pfn);
+
 static struct vm_struct *__get_vm_area_node(unsigned long size, unsigned long flags,
unsigned long start, unsigned long end,
int node, gfp_t gfp_mask)
Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h   2007-09-18 18:33:56.0 -0700
+++ linux-2.6/include/linux/mm.h    2007-09-18 18:34:06.0 -0700
@@ -1160,8 +1160,6 @@ static inline unsigned long vma_pages(st
 
 pgprot_t vm_get_page_prot(unsigned long vm_flags);
 struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
-struct page *vmalloc_to_page(void *addr);
-unsigned long vmalloc_to_pfn(void *addr);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
Index: linux-2.6/include/linux/vmalloc.h
===
--- linux-2.6.orig/include/linux/vmalloc.h  2007-09-18 18:33:57.0 -0700
+++ linux-2.6/include/linux/vmalloc.h   2007-09-18 18:34:24.0 -0700
@@ -81,6 +81,10 @@ extern void unmap_kernel_range(unsigned 
 extern struct vm_struct *alloc_vm_area(size_t size);
 extern void 

[00/17] [RFC] Virtual Compound Page Support

2007-09-18 Thread Christoph Lameter
Currently there is a strong tendency to avoid larger page allocations in
the kernel because of past fragmentation issues, and the current
defragmentation methods are still evolving. It is not clear to what extent
they can provide reliable allocations for higher order pages (plus the
definition of "reliable" seems to be in the eye of the beholder).

Currently we use vmalloc allocations in many locations to provide a safe
way to allocate larger arrays. That is due to the danger of higher order
allocations failing. Virtual Compound pages allow the use of regular
page allocator allocations that will fall back only if there is an actual
problem with acquiring a higher order page.

This patch set provides a way for a higher order page allocation to fall
back. Instead of a physically contiguous page, a virtually contiguous page
is provided. The functionality of the vmalloc layer is used to provide
the necessary page tables and control structures to establish a virtually
contiguous area.

Advantages:

- If higher order allocations are failing then virtual compound pages
  consisting of a series of order-0 pages can stand in for those
  allocations.

- "Reliability" as long as the vmalloc layer can provide virtual mappings.

- Ability to reduce the use of the vmalloc layer significantly by using
  physically contiguous memory instead of virtually contiguous memory.
  Most uses of vmalloc() can be converted to page allocator calls.

- The use of physically contiguous memory instead of vmalloc may allow the
  use of larger TLB entries, thus reducing TLB pressure. It also reduces
  the need for page table walks.

Disadvantages:

- To use the fallback, the logic accessing the memory must be aware
  that the memory could be backed by a virtual mapping and take
  precautions. virt_to_page() and page_address() may not work, and
  vmalloc_to_page() and vmalloc_address() (introduced through this
  patch set) may have to be called.

- Virtual mappings are less efficient than physical mappings.
  Performance will drop once virtual fall back occurs.

- Virtual mappings have more memory overhead. vm_area control structures,
  page tables, page arrays, etc. need to be allocated and managed to provide
  virtual mappings.

The patchset provides this functionality in stages. Stage 1 introduces
the basic fallback mechanism necessary to replace vmalloc allocations
with

alloc_page(GFP_VFALLBACK, order, )

which tells the page allocator that a higher order page is wanted, but
that a virtual mapping may stand in if there is an issue with fragmentation.
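The fallback decision can be modelled in user space as "try one contiguous block, else collect order-0 pages". The minimal sketch below rests on a few assumptions. `malloc()` stands in for the page allocator. `simulate_fragmentation` plays the role of a failing higher-order request, much like `CONFIG_VFALLBACK_ALWAYS`. The kernel step that maps the page array with `vmap()` is omitted. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096UL

/* Either one contiguous block, or an array of order-0 pages (which the
 * kernel would vmap() into one virtual range; not modelled here). */
struct toy_alloc {
	bool virtual_fallback;
	void *contig;		/* valid when !virtual_fallback */
	void **pages;		/* valid when virtual_fallback */
	unsigned long nr_pages;
};

static bool simulate_fragmentation;	/* force contiguous attempts to fail */

static int toy_alloc_pages(struct toy_alloc *a, unsigned int order)
{
	unsigned long n = 1UL << order;

	a->virtual_fallback = false;
	a->contig = simulate_fragmentation ? NULL : malloc(n * TOY_PAGE_SIZE);
	if (a->contig)
		return 0;

	/* Fallback: gather 2^order single pages and remember them. */
	a->pages = calloc(n, sizeof(*a->pages));
	if (!a->pages)
		return -1;
	for (unsigned long i = 0; i < n; i++) {
		a->pages[i] = malloc(TOY_PAGE_SIZE);
		if (!a->pages[i]) {
			while (i--)
				free(a->pages[i]);
			free(a->pages);
			return -1;
		}
	}
	a->nr_pages = n;
	a->virtual_fallback = true;
	return 0;
}

/* Freeing must know which path the allocation took. */
static void toy_free_pages(struct toy_alloc *a)
{
	if (!a->virtual_fallback) {
		free(a->contig);
		return;
	}
	for (unsigned long i = 0; i < a->nr_pages; i++)
		free(a->pages[i]);
	free(a->pages);
}
```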

Stage 1 functionality does not allow allocation and freeing of virtual
mappings from interrupt contexts.

The stage 1 series ends with the conversion of a few key uses of vmalloc
in the VM to alloc_pages(), for the allocation of sparsemem's memmap table
and the wait table in each zone. Other uses of vmalloc could be converted
in the same way.


Stage 2 enhances the fallback further, allowing allocations
and frees in interrupt context.

SLUB is then modified to use the virtual mappings for slab caches
that are marked with SLAB_VFALLBACK. If a slab cache is marked this way,
we drop all the restrictions regarding page order and allocate
large memory areas that fit lots of objects, so that we rarely
have to use the slow paths.

Two slab caches--the dentry cache and the buffer_heads--are then flagged
that way. Others could be converted in the same way.

The patch set also provides a debugging aid through setting

CONFIG_VFALLBACK_ALWAYS

If set, all GFP_VFALLBACK allocations fall back to the virtual
mappings. This is useful for verification tests. This patch set was
tested by enabling that option and compiling a kernel.


Stage 3 could add support for the large buffer size patchset. This is
not done yet, and I am not sure it would be useful to do.

Much of this patchset may only be needed for special cases in which the
existing defragmentation methods fail for some reason. It may be better to
have the system operate without such a safety net and make sure that the
page allocator can return large orders in a reliable way.

The initial idea for this patchset came from Nick Piggin's fsblock
and from his arguments about reliability and guarantees. Since his
fsblock uses virtual mappings, I think it is legitimate to
generalize the use of virtual mappings to support higher order
allocations in this way. Applying these ideas to the large
block size patchset etc. is straightforward. If wanted, I can base
the next rev of the largebuffer patchset on this one and implement
fallback.

Contrary to Nick, I still doubt that any of this provides a "guarantee".
Having said that, I have to deal with various failure scenarios in the VM
daily, and I'd certainly like to see it work in a more reliable manner.

IMHO getting rid of the various workarounds to deal with the small 4k
pages and avoiding additional layers that group these pages in subsystem

[02/17] Vmalloc: add const

2007-09-18 Thread Christoph Lameter
Make vmalloc functions work the same way as kfree() and friends that
take a const void * argument.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/vmalloc.h |   10 +-
 mm/vmalloc.c|   16 
 2 files changed, 13 insertions(+), 13 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c 2007-09-18 18:34:06.0 -0700
+++ linux-2.6/mm/vmalloc.c  2007-09-18 18:34:33.0 -0700
@@ -169,7 +169,7 @@ EXPORT_SYMBOL_GPL(map_vm_area);
 /*
  * Map a vmalloc()-space virtual address to the physical page.
  */
-struct page *vmalloc_to_page(void *vmalloc_addr)
+struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
@@ -198,7 +198,7 @@ EXPORT_SYMBOL(vmalloc_to_page);
 /*
  * Map a vmalloc()-space virtual address to the physical page frame number.
  */
-unsigned long vmalloc_to_pfn(void *vmalloc_addr)
+unsigned long vmalloc_to_pfn(const void *vmalloc_addr)
 {
return page_to_pfn(vmalloc_to_page(vmalloc_addr));
 }
@@ -305,7 +305,7 @@ struct vm_struct *get_vm_area_node(unsig
 }
 
 /* Caller must hold vmlist_lock */
-static struct vm_struct *__find_vm_area(void *addr)
+static struct vm_struct *__find_vm_area(const void *addr)
 {
struct vm_struct *tmp;
 
@@ -318,7 +318,7 @@ static struct vm_struct *__find_vm_area(
 }
 
 /* Caller must hold vmlist_lock */
-static struct vm_struct *__remove_vm_area(void *addr)
+static struct vm_struct *__remove_vm_area(const void *addr)
 {
struct vm_struct **p, *tmp;
 
@@ -347,7 +347,7 @@ found:
  * This function returns the found VM area, but using it is NOT safe
  * on SMP machines, except for its size or flags.
  */
-struct vm_struct *remove_vm_area(void *addr)
+struct vm_struct *remove_vm_area(const void *addr)
 {
struct vm_struct *v;
write_lock(&vmlist_lock);
@@ -356,7 +356,7 @@ struct vm_struct *remove_vm_area(void *a
return v;
 }
 
-static void __vunmap(void *addr, int deallocate_pages)
+static void __vunmap(const void *addr, int deallocate_pages)
 {
struct vm_struct *area;
 
@@ -407,7 +407,7 @@ static void __vunmap(void *addr, int dea
  *
  * Must not be called in interrupt context.
  */
-void vfree(void *addr)
+void vfree(const void *addr)
 {
BUG_ON(in_interrupt());
__vunmap(addr, 1);
@@ -423,7 +423,7 @@ EXPORT_SYMBOL(vfree);
  *
  * Must not be called in interrupt context.
  */
-void vunmap(void *addr)
+void vunmap(const void *addr)
 {
BUG_ON(in_interrupt());
__vunmap(addr, 0);
Index: linux-2.6/include/linux/vmalloc.h
===
--- linux-2.6.orig/include/linux/vmalloc.h  2007-09-18 18:34:24.0 -0700
+++ linux-2.6/include/linux/vmalloc.h   2007-09-18 18:35:03.0 -0700
@@ -45,11 +45,11 @@ extern void *vmalloc_32_user(unsigned lo
 extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot);
 extern void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask,
pgprot_t prot);
-extern void vfree(void *addr);
+extern void vfree(const void *addr);
 
 extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
-extern void vunmap(void *addr);
+extern void vunmap(const void *addr);
 
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
@@ -71,7 +71,7 @@ extern struct vm_struct *__get_vm_area(u
 extern struct vm_struct *get_vm_area_node(unsigned long size,
  unsigned long flags, int node,
  gfp_t gfp_mask);
-extern struct vm_struct *remove_vm_area(void *addr);
+extern struct vm_struct *remove_vm_area(const void *addr);
 
 extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
struct page ***pages);
@@ -82,8 +82,8 @@ extern struct vm_struct *alloc_vm_area(s
 extern void free_vm_area(struct vm_struct *area);
 
 /* Determine page struct from address */
-struct page *vmalloc_to_page(void *addr);
-unsigned long vmalloc_to_pfn(void *addr);
+struct page *vmalloc_to_page(const void *addr);
+unsigned long vmalloc_to_pfn(const void *addr);
 
 /*
  * Internals.  Dont't use..



[04/17] vmalloc: clean up page array indexing

2007-09-18 Thread Christoph Lameter
The page array is repeatedly indexed both in vunmap and vmalloc_area_node().
Add a temporary variable to make it easier to read (and easier to patch
later).
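Besides readability, the temporary `page` variable means `area->pages[i]` is only written after the allocation succeeds, while the error path records how many pages were actually obtained. The user-space sketch below shows that pattern. It uses a hypothetical `fail_at` hook to inject a failure, and `malloc()` stands in for `alloc_page()`.

```c
#include <assert.h>
#include <stdlib.h>

static int fail_at = -1;	/* test hook: allocation number fail_at fails */

static void *toy_alloc_page(int i)
{
	return i == fail_at ? NULL : malloc(4096);
}

/* Fill pages[0..nr-1]. On failure, record how many pages were obtained
 * and free exactly those, as __vmalloc_area_node() does by setting
 * area->nr_pages = i before its error path frees them via __vunmap(). */
static int fill_pages(void **pages, int nr, int *nr_allocated)
{
	for (int i = 0; i < nr; i++) {
		void *page = toy_alloc_page(i);	/* temporary, as in the patch */

		if (!page) {
			*nr_allocated = i;
			goto fail;
		}
		pages[i] = page;	/* store only after success */
	}
	*nr_allocated = nr;
	return 0;

fail:
	for (int i = 0; i < *nr_allocated; i++)
		free(pages[i]);
	return -1;
}
```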

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/vmalloc.c |   16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===
--- linux-2.6.orig/mm/vmalloc.c 2007-09-18 13:22:16.0 -0700
+++ linux-2.6/mm/vmalloc.c  2007-09-18 13:22:17.0 -0700
@@ -383,8 +383,10 @@ static void __vunmap(const void *addr, i
int i;
 
for (i = 0; i < area->nr_pages; i++) {
-   BUG_ON(!area->pages[i]);
-   __free_page(area->pages[i]);
+   struct page *page = area->pages[i];
+
+   BUG_ON(!page);
+   __free_page(page);
}
 
if (area->flags & VM_VPAGES)
@@ -488,15 +490,19 @@ void *__vmalloc_area_node(struct vm_stru
}
 
for (i = 0; i < area->nr_pages; i++) {
+   struct page *page;
+
if (node < 0)
-   area->pages[i] = alloc_page(gfp_mask);
+   page = alloc_page(gfp_mask);
else
-   area->pages[i] = alloc_pages_node(node, gfp_mask, 0);
-   if (unlikely(!area->pages[i])) {
+   page = alloc_pages_node(node, gfp_mask, 0);
+
+   if (unlikely(!page)) {
/* Successfully allocated i pages, free them in __vunmap() */
area->nr_pages = i;
goto fail;
}
+   area->pages[i] = page;
}
 
if (map_vm_area(area, prot, &pages))



[Patch]some proc entries are missed in sched_domain sys_ctl debug code.

2007-09-18 Thread Zou Nan hai
The cache_nice_tries and flags entries do not appear in the procfs
sched_domain directory, because the ctl_table entries at indices 9 and
11 are skipped, leaving an empty entry that terminates the table early.

This patch fixes the issue.
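The holes hide the last two entries because sysctl tables end at the first all-zero entry. Assuming the table from sd_alloc_ctl_entry() is zero-filled, nothing is written at index 9, so the walk never reaches indices 10 and 12. The toy model below shows that termination rule. The struct and helper are hypothetical, not the kernel's `struct ctl_table`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sysctl entry: the table ends at the first entry whose procname is
 * NULL, mirroring the all-zero terminator of a real ctl_table array. */
struct toy_ctl {
	const char *procname;
};

/* Count the entries a table walker would see. */
static int count_visible(const struct toy_ctl *table)
{
	int n = 0;

	for (; table->procname != NULL; table++)
		n++;	/* a hole stops the walk, hiding later entries */
	return n;
}
```

This also explains the size change from 14 to 12: the fixed table has 11 packed entries plus one zero terminator.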

Signed-off-by: Zou Nan hai <[EMAIL PROTECTED]>

--- linux-2.6.23-rc6/kernel/sched.c 2007-09-18 23:47:07.0 -0400
+++ b/kernel/sched.c    2007-09-18 23:47:20.0 -0400
@@ -5304,7 +5304,7 @@ set_table_entry(struct ctl_table *entry,
 static struct ctl_table *
 sd_alloc_ctl_domain_table(struct sched_domain *sd)
 {
-   struct ctl_table *table = sd_alloc_ctl_entry(14);
+   struct ctl_table *table = sd_alloc_ctl_entry(12);
 
set_table_entry(&table[0], "min_interval", &sd->min_interval,
sizeof(long), 0644, proc_doulongvec_minmax);
@@ -5324,10 +5324,10 @@ sd_alloc_ctl_domain_table(struct sched_d
sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
sizeof(int), 0644, proc_dointvec_minmax);
-   set_table_entry(&table[10], "cache_nice_tries",
+   set_table_entry(&table[9], "cache_nice_tries",
&sd->cache_nice_tries,
sizeof(int), 0644, proc_dointvec_minmax);
-   set_table_entry(&table[12], "flags", &sd->flags,
+   set_table_entry(&table[10], "flags", &sd->flags,
sizeof(int), 0644, proc_dointvec_minmax);
 
return table;


[PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-18 Thread Huang, Ying
This patch defines a 32-bit boot protocol and adds the corresponding
documentation. It is based on a proposal by Peter Anvin.


Known issues:

- The hd0_info and hd1_info fields are deleted from the zero page. Is
  additional work needed for this, or is it unnecessary (because no new
  fields will be added to the zero page)?

- The fields in the zero page are fairly complex (such as struct
  edd_info). Is it necessary to document every field nested inside the
  top-level fields, down to the primitive data types, or is it
  sufficient to provide only the C struct name?


ChangeLog:

-- v2 --

- Revise the zero page description according to the source code and
  move it to zero-page.txt.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 boot.txt  |   70 +++
 zero-page.txt |  127 --
 2 files changed, 97 insertions(+), 100 deletions(-)

Index: linux-2.6.23-rc6/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt   2007-09-11 10:50:29.0 +0800
+++ linux-2.6.23-rc6/Documentation/i386/boot.txt    2007-09-19 10:00:18.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin <[EMAIL PROTECTED]>
-   Last update 2007-05-23
+   Last update 2007-09-18
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field containing a 64-bit physical
+   pointer to a singly linked list of struct setup_data.
+   Added the 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
 0235/3 N/A pad2Unused
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
+023c/4 N/A pad3Unused
+0240/8 2.07+   setup_data  64-bit physical pointer to linked list
+   of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:setup_data
+Type:  write (obligatory)
+Offset/size:   0x240/8
+Protocol:  2.07+
+
+  A 64-bit physical pointer to a NULL-terminated singly linked list of
+  struct setup_data. This provides a more extensible mechanism for
+  passing boot parameters. struct setup_data is defined as follows:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0), type identifies
+  the contents of data, len is the length of the data field, and data
+  holds the actual payload.
+
+  With this field, adding a new boot parameter written by the boot
+  loader no longer requires a new field in the real-mode header; adding
+  a new setup_data type is sufficient. A new boot parameter read by the
+  boot loader, however, still requires a new field.
+
+  TODO: Where is a safe place to put the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,35 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with firmware other than the legacy BIOS, such as EFI or
+LinuxBIOS, and for kexec, the kernel's 16-bit real-mode setup code,
+which depends on the legacy BIOS, cannot be used, so a 32-bit boot
+protocol needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. However, it is not necessary to load all of
+the real-mode code; only the first 4K bytes, traditionally known as
+the "zero page", are needed.
+
+In addition to reading, modifying and writing the kernel header of the
+zero page as in the 16-bit boot protocol, the boot loader should also
+fill in the additional fields of the zero page described in
+zero-page.txt.
+
+After loading and setuping the zero page, the boot loader can load the
+32/64-bit kernel in the same way as that of 16-bit boot protocol.
+
+In the 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of the loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; the CS and DS must be 4G flat 

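The setup_data chain described in the boot.txt hunk above can be modelled in plain C. This is an illustrative sketch only, not boot loader or kernel code: "physical memory" is a byte array, a node's address is its offset into that array (offset 0 is reserved so that a next value of 0 can end the list), and the type values 1 and 2 used below are hypothetical, since the patch itself only defines SETUP_NONE. Where real nodes can safely live is exactly the open TODO in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct setup_data proposed in the patch. */
struct setup_data {
	uint64_t next;   /* address of the next node; 0 ends the list */
	uint32_t type;   /* identifies the contents of data[] */
	uint32_t len;    /* length of data[] in bytes */
	uint8_t  data[];
} __attribute__((packed));

/* Model of physical memory: an address is an offset into this array.
 * Address 0 is never used for a node, so next == 0 can end the list. */
static uint8_t phys_mem[4096];

/* Write a node at 'addr' and link it after the node at 'tail'
 * (tail == 0 means this is the first node). Returns addr, or 0 if the
 * node would not fit. */
static uint64_t add_setup_data(uint64_t tail, uint64_t addr, uint32_t type,
			       const void *payload, uint32_t len)
{
	struct setup_data *sd;

	if (addr == 0 || addr + sizeof(*sd) + len > sizeof(phys_mem))
		return 0;
	sd = (struct setup_data *)&phys_mem[addr];
	sd->next = 0;
	sd->type = type;
	sd->len = len;
	memcpy(sd->data, payload, len);
	if (tail)
		((struct setup_data *)&phys_mem[tail])->next = addr;
	return addr;
}

/* Walk the list the way a consumer would: follow next until it is 0.
 * Returns the node count and stores the summed payload length. */
static unsigned walk_setup_data(uint64_t head, uint32_t *total_len)
{
	unsigned nodes = 0;
	uint64_t pa = head;

	*total_len = 0;
	while (pa && pa + sizeof(struct setup_data) <= sizeof(phys_mem)) {
		const struct setup_data *sd =
			(const struct setup_data *)&phys_mem[pa];

		*total_len += sd->len;
		nodes++;
		pa = sd->next;
	}
	return nodes;
}
```

Because the header of each node is a fixed 16 bytes, a consumer can skip node types it does not understand simply by following next, which is what makes the mechanism extensible.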
[PATCH -mm -v2 1/2] i386/x86_64 boot: setup data

2007-09-18 Thread Huang, Ying
This patch adds a field holding a 64-bit physical pointer to a
NULL-terminated singly linked list of struct setup_data to the real-mode
kernel header. This is used as a more extensible boot parameter passing
mechanism.

This patch has been tested against 2.6.23-rc6-mm1 kernel on x86_64. It
is based on the proposal of Peter Anvin.


Known Issues:

1. Where is it safe to place the linked list of setup_data?
Because the length of the linked list of setup_data is variable, it
cannot be copied into the kernel's BSS segment the way the "zero page"
is. We must find a safe place for it, where it will not be overwritten
by the kernel during boot. The i386 kernel will overwrite some pages
after _end. The x86_64 kernel will overwrite some pages from 0x1000 on.


ChangeLog:

-- v2 --

- Increase the boot protocol version number.
- Check version number before parsing setup_data.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---

 arch/i386/Kconfig            |    3 ---
 arch/i386/boot/header.S      |    8 +++-
 arch/i386/kernel/setup.c     |   22 ++
 arch/x86_64/kernel/setup.c   |   21 +
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h        |    7 +++
 6 files changed, 72 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc6/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc6.orig/include/asm-i386/bootparam.h  2007-09-19 10:00:06.0 +0800
+++ linux-2.6.23-rc6/include/asm-i386/bootparam.h   2007-09-19 10:00:08.0 +0800
@@ -9,6 +9,17 @@
 #include 
 #include 
 
+/* setup data types */
+#define SETUP_NONE 0
+
+/* extensible setup data list node */
+struct setup_data {
+   u64 next;
+   u32 type;
+   u32 len;
+   u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
u8  setup_sects;
u16 root_flags;
@@ -41,6 +52,10 @@
u32 initrd_addr_max;
u32 kernel_alignment;
u8  relocatable_kernel;
+   u8  _pad2[3];
+   u32 cmdline_size;
+   u32 _pad3;
+   u64 setup_data;
 } __attribute__((packed));
 
 struct sys_desc_table {
Index: linux-2.6.23-rc6/arch/i386/boot/header.S
===
--- linux-2.6.23-rc6.orig/arch/i386/boot/header.S   2007-09-11 10:50:29.0 +0800
+++ linux-2.6.23-rc6/arch/i386/boot/header.S        2007-09-19 10:00:09.0 +0800
@@ -119,7 +119,7 @@
# Part 2 of the header, from the old setup.S
 
.ascii  "HdrS"  # header signature
-   .word   0x0206  # header version number (>= 0x0105)
+   .word   0x0207  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -214,6 +214,12 @@
 #added with boot protocol
 #version 2.06
 
+pad4:  .long 0
+
+setup_data:.quad 0 # 64-bit physical pointer to
+   # single linked list of
+   # struct setup_data
+
 # End of setup header #
 
.section ".inittext", "ax"
Index: linux-2.6.23-rc6/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/x86_64/kernel/setup.c    2007-09-19 10:00:00.0 +0800
+++ linux-2.6.23-rc6/arch/x86_64/kernel/setup.c 2007-09-19 10:00:09.0 +0800
@@ -221,6 +221,25 @@
ebda_size = 64*1024;
 }
 
+void __init parse_setup_data(void)
+{
+   struct setup_data *setup_data;
+   unsigned long pa_setup_data;
+
+   if (boot_params.hdr.version < 0x0207)
+   return;
+   pa_setup_data = boot_params.hdr.setup_data;
+   while (pa_setup_data) {
+   setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+   switch (setup_data->type) {
+   default:
+   break;
+   }
+   pa_setup_data = setup_data->next;
+   early_iounmap(setup_data, PAGE_SIZE);
+   }
+}
+
 void __init setup_arch(char **cmdline_p)
 {
printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -256,6 +275,8 @@
strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
 
+   parse_setup_data();
+
parse_early_param();
 
finish_e820_parsing();
Index: linux-2.6.23-rc6/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/setup.c  2007-09-19 09:59:59.0 +0800
+++ 

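The v2 changelog's "check version number before parsing setup_data" has a boot-loader-side counterpart: only write the new field when the kernel advertises protocol 2.07 or later. A minimal sketch, assuming the zero page is already in a buffer; offsets 0x202 ("HdrS") and 0x206 (version) are the standard setup-header locations, while 0x240 is the setup_data offset proposed in this patch series, not a settled ABI.

```c
#include <stdint.h>
#include <string.h>

#define HDR_SIGNATURE_OFF  0x202  /* "HdrS" */
#define HDR_VERSION_OFF    0x206  /* boot protocol version, e.g. 0x0207 */
#define HDR_SETUP_DATA_OFF 0x240  /* setup_data, as proposed in this patch */

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Store 'pa' (physical address of the first setup_data node) into the
 * zero page if the kernel understands the field. Returns 1 if written,
 * 0 if the kernel is too old and the field must be left alone. */
static int set_setup_data(uint8_t *zero_page, uint64_t pa)
{
	if (memcmp(zero_page + HDR_SIGNATURE_OFF, "HdrS", 4) != 0)
		return 0;	/* no 2.00+ setup header at all */
	if (get_le16(zero_page + HDR_VERSION_OFF) < 0x0207)
		return 0;	/* protocol predates setup_data */
	for (int i = 0; i < 8; i++)	/* little-endian 64-bit store */
		zero_page[HDR_SETUP_DATA_OFF + i] = (uint8_t)(pa >> (8 * i));
	return 1;
}
```

Checking the version before writing matters because, on an older kernel, the same offset may hold something else or lie past the end of the header.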
Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur <[EMAIL PROTECTED]> wrote:

> +
> +__u16 crc16(__u16 crc, __u8 const *buffer, size_t len)

And if we really really have to do this, then the ext4-private crc16()
should have static scope.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Ext4: Uninitialized Block Groups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 17:25:31 -0700 Avantika Mathur <[EMAIL PROTECTED]> wrote:

> +#if !defined(CONFIG_CRC16)
> +/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
> +__u16 const crc16_table[256] = {
> + 0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
> + 0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
> + 0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
> + 0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
> + 0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
> + 0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,

That's rather sad.  A plain old "depends on" would be better.

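For reference, the quoted table is the bit-reflected form of the 0x8005 polynomial (0xA001 when reflected), the same table the kernel's own CRC16 library carries, which is why a Kconfig dependency would avoid the copy. If a private copy were kept anyway, it could at least be generated and given static scope, as suggested in the previous message. A sketch; generating the table at run time rather than pasting it is a choice made here for brevity.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc16_table[256];

/* Reflected CRC-16: process bits LSB first, XOR with 0xA001 (0x8005
 * bit-reversed) whenever a 1 is shifted out. */
static void crc16_init(void)
{
	for (unsigned i = 0; i < 256; i++) {
		uint16_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
		crc16_table[i] = crc;
	}
}

/* static, so it cannot clash with a global crc16() elsewhere */
static uint16_t crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--)
		crc = (crc >> 8) ^ crc16_table[(crc ^ *buf++) & 0xff];
	return crc;
}
```

With an initial value of 0 this is the common CRC-16/ARC variant, whose standard check value for the string "123456789" is 0xBB3D.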

Re: [PATCH] atyfb: force 29MHz xtal on G3 PowerBooks

2007-09-18 Thread Benjamin Herrenschmidt
On Sat, 2007-08-25 at 11:13 +0200, Olaf Hering wrote:
> The atyfb does not work on my 233MHz PowerBook with Mach64 LP, when the
> kernel is booted from firmware. aty_ld_pll_ct() returns 0x22 and xtal
> remains at 14.31818. When booted from MacOS, aty_ld_pll_ct() returns 0x3c
> and xtal is changed to 29.498928.
> Google indicates that all 4 PowerBook models need the higher value.

Seems to break it on my wallstreet first gen (M64 LG)

So NAK for now until we find out a better way.

Ben.

> Signed-off-by: Olaf Hering <[EMAIL PROTECTED]>
> 
> --- a/drivers/video/aty/atyfb_base.c
> +++ b/drivers/video/aty/atyfb_base.c
> @@ -2411,7 +2411,7 @@ static int __devinit aty_init(struct fb_
>   diff1 = -diff1;
>   if (diff2 < 0)
>   diff2 = -diff2;
> - if (diff2 < diff1) {
> + if (diff2 < diff1 || (M64_HAS(G3_PB_1024x768))) {
>   par->ref_clk_per = 1000000000000ULL / 29498928;
>   xtal = "29.498928";
>   }
> ___
> Linuxppc-dev mailing list
> [EMAIL PROTECTED]
> https://ozlabs.org/mailman/listinfo/linuxppc-dev


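The quirk amounts to "choose the nearer of two candidate crystals, unless the machine is on a known list". A sketch of that decision, assuming (as the surrounding atyfb code appears to) that ref_clk_per holds the crystal period in picoseconds, i.e. 10^12 divided by the frequency in Hz. measured_hz and force_high are hypothetical stand-ins for however atyfb derives diff1/diff2 and for the M64_HAS(G3_PB_1024x768) test; this is not the driver's code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return the reference clock period in ps for whichever candidate
 * crystal is closer to measured_hz; force_high models the proposed
 * per-machine override. */
static uint64_t pick_ref_clk_per(long measured_hz, int force_high)
{
	const long xtal_lo = 14318180;	/* 14.31818 MHz */
	const long xtal_hi = 29498928;	/* 29.498928 MHz */
	long diff1 = labs(measured_hz - xtal_lo);
	long diff2 = labs(measured_hz - xtal_hi);

	if (diff2 < diff1 || force_high)
		return 1000000000000ULL / xtal_hi;
	return 1000000000000ULL / xtal_lo;
}
```

Ben's report is the risk with an override like force_high: it replaces the measurement outright, so any model on the list that actually has the lower-frequency crystal gets the wrong clock.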

Re: drivers/usb/misc/emi*.c have the biggest data objects in the whole tree

2007-09-18 Thread Valdis . Kletnieks
On Fri, 14 Sep 2007 11:35:34 BST, Denys Vlasenko said:
> Hi Tapio,
> 
> You are the author of these files. Are you still maintaining them?
> If not, do you know who is the current maintainer?

> These two object files hold the biggest data objects in the whole Linux kernel
> after lockdep:
> 
>    text    data     bss     dec     hex filename
>    1258  160516       0  161774   277ee ./drivers/usb/misc/emi26.o
>    1504  209296       0  210800   33770 ./drivers/usb/misc/emi62.o
> 
> Basically, these are big arrays of the following structures:
> 
> typedef struct _INTEL_HEX_RECORD
> {
> __u32   length;
> __u32   address;
> __u32   type;
> __u8    data[MAX_INTEL_HEX_RECORD_LENGTH];
> } INTEL_HEX_RECORD;
> 
> I suggest the following optimizations:
> 
> Change structure to

I suggest moving those out of the kernel entirely and use the firmware loader
support to bring it in from userspace like all the *other* firmware blobs.

'INTEL_HEX_RECORD' just *screams* 'microcode' ;)



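INTEL_HEX_RECORD mirrors the record structure of the Intel HEX format. As an illustration of what those records encode, assuming the firmware were kept outside the kernel as standard Intel HEX text (":LLAAAATT<data>CC", where the checksum byte makes all record bytes sum to zero modulo 256), here is a minimal line decoder. struct ihex_rec is hypothetical and deliberately narrower than INTEL_HEX_RECORD, whose length, address and type fields are all 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded record: field widths follow the Intel HEX format. */
struct ihex_rec {
	uint8_t  length;
	uint16_t address;
	uint8_t  type;		/* 0 = data, 1 = end of file, ... */
	uint8_t  data[255];
};

static int hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	return -1;
}

/* Decode the byte spelled by s[0..1]; -1 on a bad digit. */
static int hexbyte(const char *s)
{
	int hi = hexval(s[0]), lo = hexval(s[1]);

	return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parse one record line (no trailing newline). Returns 0 on success,
 * -1 on malformed input or checksum mismatch. */
static int ihex_parse(const char *line, struct ihex_rec *rec)
{
	size_t n = strlen(line);
	uint8_t sum = 0;

	if (n < 11 || line[0] != ':')
		return -1;
	int len = hexbyte(line + 1);
	if (len < 0 || n != 11 + 2 * (size_t)len)
		return -1;
	/* Bytes: length, addr hi, addr lo, type, data[len], checksum */
	for (size_t i = 0; i < (size_t)len + 5; i++) {
		int b = hexbyte(line + 1 + 2 * i);

		if (b < 0)
			return -1;
		sum += (uint8_t)b;
		if (i >= 4 && i < (size_t)len + 4)
			rec->data[i - 4] = (uint8_t)b;
	}
	if (sum != 0)
		return -1;
	rec->length = (uint8_t)len;
	rec->address = (uint16_t)((hexbyte(line + 3) << 8) | hexbyte(line + 5));
	rec->type = (uint8_t)hexbyte(line + 7);
	return 0;
}
```

For example, ":0B0010006164647265737320676170A7" decodes to 11 data bytes ("address gap") at address 0x0010, and ":00000001FF" is the end-of-file record.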

Re: iso9660 vs udf

2007-09-18 Thread Randy Dunlap
On Wed, 19 Sep 2007 08:05:32 +0530 (IST) Satyam Sharma wrote:

> Hi Andries,
> 
> 
> On Wed, 19 Sep 2007, Andries E. Brouwer wrote:
> > 
> > On Wed, Sep 19, 2007 at 05:48:28AM +0530, Satyam Sharma wrote:
> > 
> > > > > On the other hand, this filesystem announces itself as UDF
> > > > > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"),
> > > > > perhaps the kernel code should be more robust.
> > > 
> > > Could you send the complete dmesg log, and what you mean with filesystem/
> > > kernel (incorrectly?) announcing it as UDF here ... I agree with Jan,
> > > this sounds like an issue with mount(8) to me.
> > 
> > You already got the relevant part of the dmesg log. Slightly more below.
> 
> > Failed mount:
> > UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', timestamp 2006/03/07 16:26 (1078)
> > udf: udf_read_inode(ino 547) failed !bh
> > UDF-fs: Error in udf_iget, block=1, partition=1
> 
> Ok, like said, this comes from udf_fill_super(), but which shouldn't
> have been called for this CD in the first place -- i.e. mount(8) shouldn't
> have tried to mount a non-UDF filesystem as UDF (unless explicitly asked
> as such). I was actually asking for the logs explaining why you thought
> the _kernel_ incorrectly "announced" it as an UDF filesystem.
> 
> Hmm ... those "CD-RTOS", "CD-BRIDGE" and "CDUDF File System - Adaptec Inc"
> bits are not dmesg output, are they? Looks like "hwinfo --cdrom" or
> "isoinfo" or some such.
> 
> > I think the filesystem can be treated both as iso9660 and as udf,
> > at least that is what I seem to recall CD-BRIDGE means.  Thus,
> > if the kernel cannot mount it as udf, I think it is a kernel flaw.
> > Given that kernel flaw, and the fact that mounting as iso9660 works,
> > mount(8) could work around the kernel problem by guessing iso9660.
> > But maybe we should first try to fix the kernel.
> 
> I don't think that is what CD-BRIDGE means -- so no kernel flaw :-)
> What happened here is simply that in the absence of a "-t" option,
> mount(8) defaulted (probably due to incorrect heuristics?) to UDF for
> some reason, thereby obviously failing. I don't know who maintains
> mount(8) / util-linux package, or do distributions have their own
> maintainers these days (?)

Hi,

Adrian took over util-linux, but hasn't made any releases lately,
so one of the RHAT developers is maintaining util-linux-ng:
  http://userweb.kernel.org/~kzak/util-linux-ng/


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nathan Scott
On Tue, 2007-09-18 at 18:06 -0700, Linus Torvalds wrote:
> There is *no* valid reason for 16kB blocksizes unless you have legacy 
> issues.

That's not correct.

> The performance issues have nothing to do with the block-size, and 

We must be thinking of different performance issues.

> should be solvable by just making sure that your stupid "state of the art"
> crap SCSI controller gets contiguous physical memory, which is best done
> in the read-ahead code.

SCSI controllers have nothing to do with improving ondisk layout, which
is the performance issue I've been referring to.

cheers.

--
Nathan



Re: [PATCH] fix memory hot remove not configured case.

2007-09-18 Thread KAMEZAWA Hiroyuki
Sorry... I sent the old version... it returns -ENOSYS.

Andrew-san, please replace.
Goto-san, please confirm and ack.
==
Now, arch-dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans it up.

 - For !CONFIG_MEMORY_HOTREMOVE, add a generic no-op remove_memory(),
   which returns -EINVAL.
 - Remove remove_pages(), which is only used in powerpc.
 - Remove the no-op remove_memory() in i386, sh, sparc64 and x86_64.

 - Only powerpc returns -ENOSYS for memory hot remove; change it
   to return -EINVAL.

Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 arch/i386/mm/init.c            |    5
 arch/ia64/mm/init.c            |    3 +-
 arch/powerpc/mm/mem.c          |   45 -
 arch/sh/mm/init.c              |    6 -
 arch/sparc64/mm/init.c         |    5
 arch/x86_64/mm/init.c          |    6 -
 include/linux/memory_hotplug.h |   12 +-
 mm/memory_hotplug.c            |    6 +
 8 files changed, 10 insertions(+), 78 deletions(-)

Index: linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c
===
--- linux-2.6.23-rc6-mm1.orig/arch/ia64/mm/init.c
+++ linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c
@@ -719,7 +719,7 @@ int arch_add_memory(int nid, u64 start, 
 
return ret;
 }
-
+#ifdef CONFIG_MEMORY_HOTREMOVE
 int remove_memory(u64 start, u64 size)
 {
unsigned long start_pfn, end_pfn;
@@ -735,4 +735,5 @@ out:
return ret;
 }
 EXPORT_SYMBOL_GPL(remove_memory);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif
Index: linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c
===
--- linux-2.6.23-rc6-mm1.orig/arch/powerpc/mm/mem.c
+++ linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c
@@ -129,51 +129,6 @@ int __devinit arch_add_memory(int nid, u
return __add_pages(zone, start_pfn, nr_pages);
 }
 
-/*
- * First pass at this code will check to determine if the remove
- * request is within the RMO.  Do not allow removal within the RMO.
- */
-int __devinit remove_memory(u64 start, u64 size)
-{
-   struct zone *zone;
-   unsigned long start_pfn, end_pfn, nr_pages;
-
-   start_pfn = start >> PAGE_SHIFT;
-   nr_pages = size >> PAGE_SHIFT;
-   end_pfn = start_pfn + nr_pages;
-
-   printk("%s(): Attempting to remove memoy in range "
-   "%lx to %lx\n", __func__, start, start+size);
-   /*
-* check for range within RMO
-*/
-   zone = page_zone(pfn_to_page(start_pfn));
-
-   printk("%s(): memory will be removed from "
-   "the %s zone\n", __func__, zone->name);
-
-   /*
-* not handling removing memory ranges that
-* overlap multiple zones yet
-*/
-   if (end_pfn > (zone->zone_start_pfn + zone->spanned_pages))
-   goto overlap;
-
-   /* make sure it is NOT in RMO */
-   if ((start < lmb.rmo_size) || ((start+size) < lmb.rmo_size)) {
-   printk("%s(): range to be removed must NOT be in RMO!\n",
-   __func__);
-   goto in_rmo;
-   }
-
-   return __remove_pages(zone, start_pfn, nr_pages);
-
-overlap:
-   printk("%s(): memory range to be removed overlaps "
-   "multiple zones!!!\n", __func__);
-in_rmo:
-   return -1;
-}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 void show_mem(void)
Index: linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c
===
--- linux-2.6.23-rc6-mm1.orig/arch/x86_64/mm/init.c
+++ linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c
@@ -474,12 +474,6 @@ error:
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
-int remove_memory(u64 start, u64 size)
-{
-   return -EINVAL;
-}
-EXPORT_SYMBOL_GPL(remove_memory);
-
 #if !defined(CONFIG_ACPI_NUMA) && defined(CONFIG_NUMA)
 int memory_add_physaddr_to_nid(u64 start)
 {
Index: linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h
===
--- linux-2.6.23-rc6-mm1.orig/include/linux/memory_hotplug.h
+++ linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h
@@ -58,10 +58,9 @@ extern int add_one_highpage(struct page 
 extern void online_page(struct page *page);
 /* VM interface that may be used by firmware interface */
 extern int online_pages(unsigned long, unsigned long);
-#ifdef CONFIG_MEMORY_HOTREMOVE
-extern int offline_pages(unsigned long, unsigned long, unsigned long);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
-#endif
+extern int offline_pages(unsigned long, unsigned long, unsigned long);
+
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
@@ -171,13 +170,6 @@ static inline int mhp_notimplemented(con
 }
 
 #endif /* ! 

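The core of the cleanup is a single generic fallback selected by the config option, so architectures no longer carry their own no-op copies. A minimal sketch of that shape in plain C; u64 is typedef'd here, and the patch's real stub (judging by the diffstat, probably in mm/memory_hotplug.c) may differ in detail, for instance by being an out-of-line function rather than a static inline.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t u64;

/* Build without -DCONFIG_MEMORY_HOTREMOVE to get the generic stub. */
#ifdef CONFIG_MEMORY_HOTREMOVE
extern int remove_memory(u64 start, u64 size);	/* arch code, e.g. ia64 */
#else
/* One shared fallback instead of a no-op copy in each architecture;
 * every arch now reports "not supported" the same way. */
static inline int remove_memory(u64 start, u64 size)
{
	(void)start;
	(void)size;
	return -EINVAL;
}
#endif
```

Callers then need no #ifdef of their own; they just see -EINVAL when hot remove is not configured.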
Re: iso9660 vs udf

2007-09-18 Thread Satyam Sharma
Hi Andries,


On Wed, 19 Sep 2007, Andries E. Brouwer wrote:
> 
> On Wed, Sep 19, 2007 at 05:48:28AM +0530, Satyam Sharma wrote:
> 
> > > > On the other hand, this filesystem announces itself as UDF
> > > > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"),
> > > > perhaps the kernel code should be more robust.
> > 
> > Could you send the complete dmesg log, and what you mean with filesystem/
> > kernel (incorrectly?) announcing it as UDF here ... I agree with Jan,
> > this sounds like an issue with mount(8) to me.
> 
> You already got the relevant part of the dmesg log. Slightly more below.

> Failed mount:
> UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', timestamp 2006/03/07 16:26 (1078)
> udf: udf_read_inode(ino 547) failed !bh
> UDF-fs: Error in udf_iget, block=1, partition=1

OK, as I said, this comes from udf_fill_super(), which shouldn't
have been called for this CD in the first place -- i.e. mount(8) shouldn't
have tried to mount a non-UDF filesystem as UDF (unless explicitly asked
to). I was actually asking for the logs explaining why you thought
the _kernel_ incorrectly "announced" it as a UDF filesystem.

Hmm ... those "CD-RTOS", "CD-BRIDGE" and "CDUDF File System - Adaptec Inc"
bits are not dmesg output, are they? Looks like "hwinfo --cdrom" or
"isoinfo" or some such.

> I think the filesystem can be treated both as iso9660 and as udf,
> at least that is what I seem to recall CD-BRIDGE means.  Thus,
> if the kernel cannot mount it as udf, I think it is a kernel flaw.
> Given that kernel flaw, and the fact that mounting as iso9660 works,
> mount(8) could work around the kernel problem by guessing iso9660.
> But maybe we should first try to fix the kernel.

I don't think that is what CD-BRIDGE means -- so no kernel flaw :-)
What happened here is simply that in the absence of a "-t" option,
mount(8) defaulted (probably due to incorrect heuristics?) to UDF for
some reason, thereby obviously failing. I don't know who maintains
mount(8) / the util-linux package these days, or whether distributions
have their own maintainers (?)


Satyam

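Which filesystem mount(8) tries first depends on what it finds in the volume recognition area, which starts at sector 16 of a 2048-byte-sector disc: ISO 9660 volume descriptors carry the identifier "CD001" at byte offset 1, while UDF adds an extended area bracketed by "BEA01" and "TEA01" that contains "NSR02" or "NSR03". A disc carrying both kinds of structure matches both, so the probing order is a policy choice. A simplified sketch, ignoring other valid identifiers (such as BOOT2 and CDW02) and everything a real probe also checks:

```c
#include <stddef.h>
#include <string.h>

#define SECTOR 2048
enum { HAS_ISO9660 = 1, HAS_UDF = 2 };

/* Scan descriptors from sector 16 of an in-memory image and report
 * which standard identifiers appear before the sequence ends. */
static int probe_volume(const unsigned char *img, size_t img_len)
{
	int found = 0;

	for (size_t s = 16; (s + 1) * SECTOR <= img_len; s++) {
		const unsigned char *id = img + s * SECTOR + 1;	/* 5 bytes */

		if (!memcmp(id, "CD001", 5))
			found |= HAS_ISO9660;
		else if (!memcmp(id, "NSR02", 5) || !memcmp(id, "NSR03", 5))
			found |= HAS_UDF;
		else if (!memcmp(id, "BEA01", 5) || !memcmp(id, "TEA01", 5))
			continue;	/* UDF extended area markers */
		else
			break;		/* not a descriptor: end of the area */
	}
	return found;
}
```

A result of HAS_ISO9660 | HAS_UDF says only that both structures are present; it says nothing about whether the UDF side is usable, which is the situation in this thread.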

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:

> JBD: Replace slab allocations with page cache allocations
> 
> JBD allocate memory for committed_data and frozen_data from slab. However
> JBD should not pass slab pages down to the block layer. Use page allocator 
> pages instead. This will also prepare JBD for the large blocksize patchset.
> 
> 
> Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

__GFP_NOFAIL should only be used when we have no way of recovering
from failure.  The allocation in journal_init_common() (at least)
_can_ recover and hence really shouldn't be using __GFP_NOFAIL.

(Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
there as a marker which says "we really shouldn't be doing this but
we don't know how to fix it").

So sometime it'd be good if you could review all the __GFP_NOFAILs in
there and see if we can remove some, thanks.


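Andrew's point is that an init path such as journal_init_common() can unwind and return an error rather than demand an allocation that never fails. As a userspace sketch of that shape (struct journal, its fields and journal_init() are hypothetical, not the JBD code), each allocation is checked and a failure releases what was already obtained:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for journal_t; field names are invented. */
struct journal {
	void *wbuf;	/* e.g. a buffer array sized from the journal */
	void *revoke;	/* a second allocation made during init */
};

/* Every allocation failure is reported to the caller instead of being
 * retried forever, the userspace analogue of dropping __GFP_NOFAIL. */
static struct journal *journal_init(size_t wbuf_size, int *err)
{
	struct journal *j = calloc(1, sizeof(*j));

	if (!j)
		goto fail;
	j->wbuf = malloc(wbuf_size);
	if (!j->wbuf)
		goto fail_journal;
	j->revoke = malloc(64);
	if (!j->revoke)
		goto fail_wbuf;
	*err = 0;
	return j;

fail_wbuf:
	free(j->wbuf);
fail_journal:
	free(j);
fail:
	*err = -ENOMEM;
	return NULL;
}

static void journal_destroy(struct journal *j)
{
	free(j->revoke);
	free(j->wbuf);
	free(j);
}
```

The goto ladder is the usual kernel idiom for this: each label undoes exactly the steps that succeeded before the failing one.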
[PATCH] fix memory hot remove not configured case.

2007-09-18 Thread KAMEZAWA Hiroyuki
Now, arch-dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans it up. This is against 2.6.23-rc6-mm1.

 - Fix compile failure on ia64 in the CONFIG_MEMORY_HOTPLUG &&
   !CONFIG_MEMORY_HOTREMOVE case.
 - For !CONFIG_MEMORY_HOTREMOVE, add a generic no-op remove_memory(),
   which returns -EINVAL.
 - Remove remove_pages(), which is only used in powerpc.
 - Remove the no-op remove_memory() in i386, sh, sparc64 and x86_64.

 - Only powerpc returns -ENOSYS for memory hot remove (no-op); change it
   to return -EINVAL.

Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 arch/i386/mm/init.c            |    5
 arch/ia64/mm/init.c            |    3 +-
 arch/powerpc/mm/mem.c          |   45 -
 arch/sh/mm/init.c              |    6 -
 arch/sparc64/mm/init.c         |    5
 arch/x86_64/mm/init.c          |    6 -
 include/linux/memory_hotplug.h |   12 +-
 mm/memory_hotplug.c            |    6 +
 8 files changed, 10 insertions(+), 78 deletions(-)

Index: linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c
===
--- linux-2.6.23-rc6-mm1.orig/arch/ia64/mm/init.c
+++ linux-2.6.23-rc6-mm1/arch/ia64/mm/init.c
@@ -719,7 +719,7 @@ int arch_add_memory(int nid, u64 start, 
 
return ret;
 }
-
+#ifdef CONFIG_MEMORY_HOTREMOVE
 int remove_memory(u64 start, u64 size)
 {
unsigned long start_pfn, end_pfn;
@@ -735,4 +735,5 @@ out:
return ret;
 }
 EXPORT_SYMBOL_GPL(remove_memory);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
 #endif
Index: linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c
===
--- linux-2.6.23-rc6-mm1.orig/arch/powerpc/mm/mem.c
+++ linux-2.6.23-rc6-mm1/arch/powerpc/mm/mem.c
@@ -129,51 +129,6 @@ int __devinit arch_add_memory(int nid, u
return __add_pages(zone, start_pfn, nr_pages);
 }
 
-/*
- * First pass at this code will check to determine if the remove
- * request is within the RMO.  Do not allow removal within the RMO.
- */
-int __devinit remove_memory(u64 start, u64 size)
-{
-   struct zone *zone;
-   unsigned long start_pfn, end_pfn, nr_pages;
-
-   start_pfn = start >> PAGE_SHIFT;
-   nr_pages = size >> PAGE_SHIFT;
-   end_pfn = start_pfn + nr_pages;
-
-   printk("%s(): Attempting to remove memoy in range "
-   "%lx to %lx\n", __func__, start, start+size);
-   /*
-* check for range within RMO
-*/
-   zone = page_zone(pfn_to_page(start_pfn));
-
-   printk("%s(): memory will be removed from "
-   "the %s zone\n", __func__, zone->name);
-
-   /*
-* not handling removing memory ranges that
-* overlap multiple zones yet
-*/
-   if (end_pfn > (zone->zone_start_pfn + zone->spanned_pages))
-   goto overlap;
-
-   /* make sure it is NOT in RMO */
-   if ((start < lmb.rmo_size) || ((start+size) < lmb.rmo_size)) {
-   printk("%s(): range to be removed must NOT be in RMO!\n",
-   __func__);
-   goto in_rmo;
-   }
-
-   return __remove_pages(zone, start_pfn, nr_pages);
-
-overlap:
-   printk("%s(): memory range to be removed overlaps "
-   "multiple zones!!!\n", __func__);
-in_rmo:
-   return -1;
-}
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 void show_mem(void)
Index: linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c
===
--- linux-2.6.23-rc6-mm1.orig/arch/x86_64/mm/init.c
+++ linux-2.6.23-rc6-mm1/arch/x86_64/mm/init.c
@@ -474,12 +474,6 @@ error:
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
-int remove_memory(u64 start, u64 size)
-{
-   return -EINVAL;
-}
-EXPORT_SYMBOL_GPL(remove_memory);
-
 #if !defined(CONFIG_ACPI_NUMA) && defined(CONFIG_NUMA)
 int memory_add_physaddr_to_nid(u64 start)
 {
Index: linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h
===
--- linux-2.6.23-rc6-mm1.orig/include/linux/memory_hotplug.h
+++ linux-2.6.23-rc6-mm1/include/linux/memory_hotplug.h
@@ -58,10 +58,9 @@ extern int add_one_highpage(struct page 
 extern void online_page(struct page *page);
 /* VM interface that may be used by firmware interface */
 extern int online_pages(unsigned long, unsigned long);
-#ifdef CONFIG_MEMORY_HOTREMOVE
-extern int offline_pages(unsigned long, unsigned long, unsigned long);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
-#endif
+extern int offline_pages(unsigned long, unsigned long, unsigned long);
+
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
@@ -171,13 +170,6 @@ static inline int mhp_notimplemented(con
 }
 
 

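The powerpc remove_memory() deleted by this patch validated a request in two steps: the range must stay within one zone, and it must not touch the RMO. A sketch of those checks as a pure function (PAGE_SHIFT of 12 and all parameter names are assumptions). For a half-open range, start >= rmo_size already implies start + size >= rmo_size unless the addition overflows, so the removed code's second condition, (start+size) < lmb.rmo_size, could never be the deciding one; the sketch tests start alone.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Nonzero if [start, start + size) lies inside the zone spanning pfns
 * [zone_start_pfn, zone_start_pfn + spanned_pages) and does not touch
 * the first rmo_size bytes of memory. */
static int removable_range(uint64_t start, uint64_t size,
			   unsigned long zone_start_pfn,
			   unsigned long spanned_pages, uint64_t rmo_size)
{
	unsigned long start_pfn = (unsigned long)(start >> PAGE_SHIFT);
	unsigned long end_pfn = start_pfn + (unsigned long)(size >> PAGE_SHIFT);

	if (start_pfn < zone_start_pfn ||
	    end_pfn > zone_start_pfn + spanned_pages)
		return 0;	/* would span more than one zone */
	if (start < rmo_size)
		return 0;	/* overlaps the RMO */
	return 1;
}
```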
Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-18 Thread Siddha, Suresh B
On Fri, Sep 14, 2007 at 12:51:34PM -0700, Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Siddha, Suresh B wrote:
> > We are trying to get the latest data with 2.6.23-rc4-mm1 with and without
> > slub. Is this good enough?
>
> Good enough. If you are concerned about the page allocator pass through
> then you may want to test the page allocator pass through patchset
> separately. The fastpath of the page allocator is currently not
> competitive if you always free and allocate a single page. If contiguous
> pages are allocated then the pass through is superior.

We are having all sorts of stability issues with -mm kernels, let alone
perf testing :(

For now, we are trying to do slab Vs slub comparisons for the mainline kernels.
Let's see how that goes.

Meanwhile, any chance that you can point us at relevant recent patches/fixes
that are in -mm and perhaps that can be applied to mainline kernel?

thanks,
suresh


Re: [PATCH 1/6] cpuset write dirty map

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 17:51:49 -0700 Ethan Solomita <[EMAIL PROTECTED]> wrote:

> > 
> >> +void cpuset_update_dirty_nodes(struct address_space *mapping,
> >> +  struct page *page)
> >> +{
> >> +  nodemask_t *nodes = mapping->dirty_nodes;
> >> +  int node = page_to_nid(page);
> >> +
> >> +  if (!nodes) {
> >> +  nodes = kmalloc(sizeof(nodemask_t), GFP_ATOMIC);
> > 
> > Does it have to be atomic?  atomic is weak and can fail.
> > 
> > If some callers can do GFP_KERNEL and some can only do GFP_ATOMIC then we
> > should at least pass the gfp_t into this function so it can do the stronger
> > allocation when possible.
> 
>   I was going to say that sanity would be improved by just allocing the
> nodemask at inode alloc time. A failure here could be a problem because
> below cpuset_intersects_dirty_nodes() assumes that a NULL nodemask
> pointer means that there are no dirty nodes, thus preventing dirty pages
> from getting written to disk. i.e. This must never fail.
> 
>   Given that we allocate it always at the beginning, I'm leaning towards
> just allocating it within mapping no matter its size. It will make the
> code much much simpler, and save me writing all the comments we've been
> discussing. 8-)
> 
>   How disastrous would this be? Is the need to support a 1024 node system
> with 1,000,000 open mostly-read-only files thus needing to spend 120MB
> of extra memory on my nodemasks a real scenario and a showstopper?

None of this is very nice.  Yes, it would be good to save all that memory
and yes, I_DIRTY_PAGES inodes are very much the uncommon case.

But if a failed GFP_ATOMIC allocation results in data loss then that's a
showstopper.

How hard would it be to handle the allocation failure in a more friendly
manner?  Say, if the allocation failed then point mapping->dirty_nodes at
some global all-ones nodemask, and then special-case that nodemask in the
freeing code?

> > 
> > 
> >> +  if (!nodes)
> >> +  return;
> >> +
> >> +  *nodes = NODE_MASK_NONE;
> >> +  mapping->dirty_nodes = nodes;
> >> +  }
> >> +
> >> +  if (!node_isset(node, *nodes))
> >> +  node_set(node, *nodes);
> >> +}
> >> +
> >> +void cpuset_clear_dirty_nodes(struct address_space *mapping)
> >> +{
> >> +  nodemask_t *nodes = mapping->dirty_nodes;
> >> +
> >> +  if (nodes) {
> >> +  mapping->dirty_nodes = NULL;
> >> +  kfree(nodes);
> >> +  }
> >> +}
> > 
> > Can this race with cpuset_update_dirty_nodes()?  And with itself?  If not,
> > a comment which describes the locking requirements would be good.
> 
>   I'll add a comment. Such a race should not be possible. It is called
> only from clear_inode() which is used when the inode is being freed
> "with extreme prejudice" (from its comments). I can add a check that
> i_state I_FREEING is set. Would that do?

Sounds sane.




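Andrew's suggestion for a failed GFP_ATOMIC allocation can be modelled in userspace: point the mapping at a shared all-ones mask (every node counts as dirty, so writeback never skips the inode) and make the free path recognise that mask. Here nodemask_t is simplified to a 64-bit word and every name is hypothetical; real code would use the kernel's nodemask helpers and GFP_ATOMIC.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t nodemask_t;	/* simplification: at most 64 nodes */

/* Shared fallback meaning "any node may hold dirty pages". */
static nodemask_t all_nodes_dirty = ~(nodemask_t)0;

struct mapping {
	nodemask_t *dirty_nodes;	/* NULL: no dirty pages tracked */
};

static void update_dirty_nodes(struct mapping *m, int node)
{
	nodemask_t *nodes = m->dirty_nodes;

	if (!nodes) {
		nodes = malloc(sizeof(*nodes));
		if (!nodes) {
			/* Over-approximate instead of losing track:
			 * treating the inode as clean would skip writeback. */
			m->dirty_nodes = &all_nodes_dirty;
			return;
		}
		*nodes = 0;
		m->dirty_nodes = nodes;
	}
	if (nodes != &all_nodes_dirty)	/* never modify the shared mask */
		*nodes |= (nodemask_t)1 << node;
}

static void clear_dirty_nodes(struct mapping *m)
{
	nodemask_t *nodes = m->dirty_nodes;

	m->dirty_nodes = NULL;
	if (nodes && nodes != &all_nodes_dirty)	/* never free the shared mask */
		free(nodes);
}

static int intersects_dirty_nodes(const struct mapping *m, nodemask_t allowed)
{
	return m->dirty_nodes && (*m->dirty_nodes & allowed);
}
```

The cost of the fallback is only precision: a mapping stuck on the shared mask is considered for writeback from every cpuset.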
Re: 2.6.23-rc6-mm1 panic (memory controller issue ?)

2007-09-18 Thread Balbir Singh
Badari Pulavarty wrote:
> On Tue, 2007-09-18 at 15:21 -0700, Badari Pulavarty wrote:
>> Hi Balbir,
>>
>> I get following panic from SLUB, while doing simple fsx tests.
>> I haven't used any container/memory controller stuff except 
>> that I configured them in :(
>>
>> Looks like slub doesn't like one of the flags passed in ?
>>
>> Known issue ? Ideas ?
>>
> 
> I think, I found the issue. I am still running tests to
> verify. Does this sound correct ?
> 
> Thanks,
> Badari
> 
> Need to strip __GFP_HIGHMEM flag while passing to 
> mem_container_cache_charge().
> 
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
>  mm/filemap.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.23-rc6/mm/filemap.c
> ===
> --- linux-2.6.23-rc6.orig/mm/filemap.c    2007-09-18 12:43:54.0 -0700
> +++ linux-2.6.23-rc6/mm/filemap.c 2007-09-18 19:14:44.0 -0700
> @@ -441,7 +441,8 @@ int filemap_write_and_wait_range(struct 
>  int add_to_page_cache(struct page *page, struct address_space *mapping,
>   pgoff_t offset, gfp_t gfp_mask)
>  {
> - int error = mem_container_cache_charge(page, current->mm, gfp_mask);
> + int error = mem_container_cache_charge(page, current->mm,
> + gfp_mask & ~__GFP_HIGHMEM);
>   if (error)
>   goto out;
> 
> 
> 

Hi, Badari,

The fix looks correct; radix_tree_preload() does the same thing in
add_to_page_cache(). Thanks for identifying the fix.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


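Badari's fix clears one zone modifier while keeping the reclaim-behaviour bits of the caller's mask. Presumably the charge path ends in a slab allocation (hence the SLUB panic), and slab objects cannot be placed in highmem. A sketch of the masking; the bit values are illustrative assumptions (check include/linux/gfp.h for the real definitions), and GFP_HIGHUSER_LIKE is a made-up composite resembling a page-cache mask.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values only. */
#define __GFP_HIGHMEM	0x02u	/* zone modifier: may use highmem */
#define __GFP_WAIT	0x10u	/* may sleep */
#define __GFP_IO	0x40u	/* may start I/O */
#define __GFP_FS	0x80u	/* may call into the filesystem */

#define GFP_HIGHUSER_LIKE (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM)

/* Same reclaim behaviour as the caller asked for, minus the highmem
 * placement request that a slab allocation cannot honour. */
static gfp_t strip_highmem(gfp_t gfp_mask)
{
	return gfp_mask & ~__GFP_HIGHMEM;
}
```

Masking rather than substituting a fixed GFP_KERNEL keeps a caller's restrictions intact: a mask without __GFP_FS, for instance, stays without it.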
Re: [PATCH 3/3] Time to make CONFIG_PARAVIRT non-experimental.

2007-09-18 Thread Rusty Russell
On Tue, 2007-09-18 at 23:52 +0200, Andi Kleen wrote:
> On Tuesday 18 September 2007 23:34, Rusty Russell wrote:
> > How about a "select" based on Xen, lguest or VMI?  There's no other
> > reason to enable it, after all.
> 
> I did an patch to do that recently because  the current setup
> is indeed unobvious.
> 
> But I had to drop it again because 
> it ended up with Kconfig warnings. about undefined symbols
> on x86-64. The problem is that lguest
> is visible in Kconfig for all architectures and it warns
> if you select something that doesn't exist on all architectures.

I think that's fixed as a side-effect of this cleanup.  At least, it
works for me on x86-64.  Patch below: if you agree, I'll re-xmit all
three.

> > > Also I would still consider it experimental.
> >
> > After 9 months in mainline and three kernel versions, 
> 
> Well it changed a lot each release.

Well, the biggest change was the patching code getting enhanced in
2.6.22 (to cover all calls, not just 5).  The 22 -> 23 changes were
fairly trivial.

So I think 2.6.24 is a reasonable time to remove EXPERIMENTAL.

> > I'd hope not. 
> > It's been pretty damn stable (ok, you broke it once, but maybe that's
> > because you consider it experimental).
> 
> Is there a significant user base? 

It's enabled in Ubuntu Feisty (2.6.20).

> At least the Xen port seems to have specific requirements
> and essentially only work on xen-unstable (?) [or at least
> some very new Xen version] which probably very few
> people use.

Sure, and that might well still be experimental (Jeremy?).  But that's
not CONFIG_PARAVIRT.

Hope that helps,
Rusty.
==
Andi points out that PARAVIRT is an option best selected when needed.

We introduce PARAVIRT_GUEST for the menu itself, and select PARAVIRT
if they ask for anything which needs it.  This also makes PARAVIRT
non-experimental.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 8efa5fdb22d8 arch/i386/Kconfig
--- a/arch/i386/Kconfig Wed Sep 19 11:23:18 2007 +1000
+++ b/arch/i386/Kconfig Wed Sep 19 11:33:59 2007 +1000
@@ -214,24 +214,30 @@ config X86_ES7000
 
 endchoice
 
-menuconfig PARAVIRT
+config PARAVIRT
+   bool
+   depends on !(X86_VISWS || X86_VOYAGER)
+   help
+ This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+ the kernel is theoretically slower and slightly larger.
+
+menuconfig PARAVIRT_GUEST
-   bool "Paravirtualized guest support (EXPERIMENTAL)"
-   depends on EXPERIMENTAL
+   bool "Paravirtualized guest support"
-   depends on !(X86_VISWS || X86_VOYAGER)
-   help
- Paravirtualization is a way of running multiple instances of
- Linux on the same machine, under a hypervisor.  This option
- changes the kernel so it can modify itself when it is run
- under a hypervisor, improving performance significantly.
- However, when run without a hypervisor the kernel is
- theoretically slower.  If in doubt, say N.
-
-if PARAVIRT
+   help
+ Say Y here to get to see options related to running Linux under
+ various hypervisors.  This option alone does not add any kernel code.
+
+ If you say N, all options in this submenu will be skipped and disabled.
+
+if PARAVIRT_GUEST
 
 source "arch/i386/xen/Kconfig"
 
 config VMI
bool "VMI Guest support"
+   select PARAVIRT
help
  VMI provides a paravirtualized interface to the VMware ESX server
  (it could be used by other hypervisors in theory too, but is not
@@ -239,6 +246,7 @@ config VMI
 
 config LGUEST_GUEST
bool "Lguest guest support"
+   select PARAVIRT
depends on !X86_PAE
help
  Lguest is a tiny in-kernel hypervisor.  Selecting this will
diff -r 8efa5fdb22d8 arch/i386/xen/Kconfig
--- a/arch/i386/xen/Kconfig Wed Sep 19 11:23:18 2007 +1000
+++ b/arch/i386/xen/Kconfig Wed Sep 19 11:25:07 2007 +1000
@@ -4,6 +4,7 @@
 
 config XEN
bool "Xen guest support"
+   select PARAVIRT
depends on X86_CMPXCHG && X86_TSC && !NEED_MULTIPLE_NODES
help
  This is the Linux Xen port.  Enabling this will allow the




Re: UML dead with current -git?

2007-09-18 Thread Jeff Dike
On Tue, Sep 18, 2007 at 07:55:13PM +0200, Sam Ravnborg wrote:
> Sounds to me like a known issue by you. Can you give a few more details
> so we maybe can get it fixed?

I believe what happened here is an x86_64 build followed by a
UML/x86_64 build with no intervening mrproper.

I've always considered this to be a "don't do that" sort of thing.
However, maybe we could stick the arch of the current build somewhere
in the tree, check that before any serious part of a subsequent
build, and error out if $ARCH is different.
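Jeff's suggestion of recording the architecture of the current build and refusing a mismatched follow-up build could look roughly like the sketch below. Nothing like this exists in the thread: the stamp-file approach, the helper name demo_check_arch_stamp() and the file path used with it are hypothetical, and a real version would more likely live in the kbuild Makefiles than in C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: record the ARCH of the first build in a stamp
 * file and refuse a later build for a different ARCH.
 * Returns 0 if the build may proceed, -1 on a mismatch or I/O error.
 */
static int demo_check_arch_stamp(const char *stamp_path, const char *arch)
{
	char recorded[64];
	FILE *f = fopen(stamp_path, "r");

	if (!f) {
		/* No readable stamp: treat this as a clean tree and record ARCH. */
		f = fopen(stamp_path, "w");
		if (!f)
			return -1;
		fprintf(f, "%s\n", arch);
		fclose(f);
		return 0;
	}

	if (!fgets(recorded, sizeof(recorded), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	recorded[strcspn(recorded, "\n")] = '\0';

	if (strcmp(recorded, arch) != 0) {
		fprintf(stderr, "tree was built for ARCH=%s, not %s; run 'make mrproper'\n",
			recorded, arch);
		return -1;
	}
	return 0;
}
```

The check is cheap enough to run before any real compilation starts, which is the point Jeff makes about catching the mismatch early.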

Jeff

-- 
Work email - jdike at linux dot intel dot com


[PATCH 2/2] UML - refix ELF_CORE_COPY_REGS

2007-09-18 Thread Jeff Dike
The former uml-fix-x86_64-core-dump-crash.patch expressed
ELF_CORE_COPY_REGS in terms of the pt_regs struct currently in -mm.  I
fast-tracked this to mainline, where it was wrong because the pt_regs
struct there hadn't been changed.  Fixing that then made the patch
wrong for -mm when it was rebased on -rc6.

This patch changes things back again to be right for -mm.  This should
go to mainline after uml-rename-pt_regs-general-purpose-register-file.patch
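ELF_CORE_COPY_REGS is a flat, slot-by-slot copy: the first 21 entries of the core-dump register array come from the general-purpose register file (spelled regs.gp[] in -mm rather than regs.skas.regs[]), slot 21 takes the thread's fs value, and the following slots are zeroed. A loop-based sketch of the same mapping follows; struct demo_regs, the function name and the total of 27 slots are assumptions for illustration, since the hunk only shows slots up to 23.

```c
#include <assert.h>

#define DEMO_NUM_GP     21	/* general-purpose slots copied from the register file */
#define DEMO_NUM_PRREG  27	/* assumed total size of the core-dump register array */
#define DEMO_FS_SLOT    21	/* slot that receives the thread's fs value */

/* Hypothetical stand-in for the -mm register file layout (regs.gp[]). */
struct demo_regs {
	unsigned long gp[DEMO_NUM_GP];
};

/* Same mapping as the unrolled macro: gp[0..20], then fs, then zeroed slots. */
static void demo_core_copy_regs(unsigned long *pr_reg,
				const struct demo_regs *regs,
				unsigned long fs)
{
	int i;

	for (i = 0; i < DEMO_NUM_GP; i++)
		pr_reg[i] = regs->gp[i];
	pr_reg[DEMO_FS_SLOT] = fs;
	/* Slots 22 and 23 are zeroed in the visible hunk; the rest is assumed. */
	for (i = DEMO_FS_SLOT + 1; i < DEMO_NUM_PRREG; i++)
		pr_reg[i] = 0;
}
```

Written as a loop, only one line names the register file, which is the part that changed twice in this back-and-forth between -mm and mainline.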

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
---
 include/asm-um/elf-x86_64.h |   42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

Index: linux-2.6.20/include/asm-um/elf-x86_64.h
===
--- linux-2.6.20.orig/include/asm-um/elf-x86_64.h   2007-09-18 13:28:30.0 -0400
+++ linux-2.6.20/include/asm-um/elf-x86_64.h   2007-09-18 20:50:47.0 -0400
@@ -68,27 +68,27 @@ typedef struct user_i387_struct elf_fpre
 } while (0)
 
 #define ELF_CORE_COPY_REGS(pr_reg, regs)   \
-   (pr_reg)[0] = (regs)->regs.skas.regs[0];\
-   (pr_reg)[1] = (regs)->regs.skas.regs[1];\
-   (pr_reg)[2] = (regs)->regs.skas.regs[2];\
-   (pr_reg)[3] = (regs)->regs.skas.regs[3];\
-   (pr_reg)[4] = (regs)->regs.skas.regs[4];\
-   (pr_reg)[5] = (regs)->regs.skas.regs[5];\
-   (pr_reg)[6] = (regs)->regs.skas.regs[6];\
-   (pr_reg)[7] = (regs)->regs.skas.regs[7];\
-   (pr_reg)[8] = (regs)->regs.skas.regs[8];\
-   (pr_reg)[9] = (regs)->regs.skas.regs[9];\
-   (pr_reg)[10] = (regs)->regs.skas.regs[10];  \
-   (pr_reg)[11] = (regs)->regs.skas.regs[11];  \
-   (pr_reg)[12] = (regs)->regs.skas.regs[12];  \
-   (pr_reg)[13] = (regs)->regs.skas.regs[13];  \
-   (pr_reg)[14] = (regs)->regs.skas.regs[14];  \
-   (pr_reg)[15] = (regs)->regs.skas.regs[15];  \
-   (pr_reg)[16] = (regs)->regs.skas.regs[16];  \
-   (pr_reg)[17] = (regs)->regs.skas.regs[17];  \
-   (pr_reg)[18] = (regs)->regs.skas.regs[18];  \
-   (pr_reg)[19] = (regs)->regs.skas.regs[19];  \
-   (pr_reg)[20] = (regs)->regs.skas.regs[20];  \
+   (pr_reg)[0] = (regs)->regs.gp[0];   \
+   (pr_reg)[1] = (regs)->regs.gp[1];   \
+   (pr_reg)[2] = (regs)->regs.gp[2];   \
+   (pr_reg)[3] = (regs)->regs.gp[3];   \
+   (pr_reg)[4] = (regs)->regs.gp[4];   \
+   (pr_reg)[5] = (regs)->regs.gp[5];   \
+   (pr_reg)[6] = (regs)->regs.gp[6];   \
+   (pr_reg)[7] = (regs)->regs.gp[7];   \
+   (pr_reg)[8] = (regs)->regs.gp[8];   \
+   (pr_reg)[9] = (regs)->regs.gp[9];   \
+   (pr_reg)[10] = (regs)->regs.gp[10]; \
+   (pr_reg)[11] = (regs)->regs.gp[11]; \
+   (pr_reg)[12] = (regs)->regs.gp[12]; \
+   (pr_reg)[13] = (regs)->regs.gp[13]; \
+   (pr_reg)[14] = (regs)->regs.gp[14]; \
+   (pr_reg)[15] = (regs)->regs.gp[15]; \
+   (pr_reg)[16] = (regs)->regs.gp[16]; \
+   (pr_reg)[17] = (regs)->regs.gp[17]; \
+   (pr_reg)[18] = (regs)->regs.gp[18]; \
+   (pr_reg)[19] = (regs)->regs.gp[19]; \
+   (pr_reg)[20] = (regs)->regs.gp[20]; \
(pr_reg)[21] = current->thread.arch.fs; \
(pr_reg)[22] = 0;   \
(pr_reg)[23] = 0;   \


[PATCH 1/2] UML - Fix registers.c build

2007-09-18 Thread Jeff Dike
uml-stop-saving-process-fp-state.patch broke the UML/x86_64 build.

On x86_64, sys/ptrace.h has to be included before asm/ptrace.h.
Otherwise, the defines in asm/ptrace.h will ruin the parse of
sys/ptrace.h - 
asm/ptrace.h:
#define PTRACE_GETREGS 12

sys/ptrace.h:
enum __ptrace_request
{
...
   PTRACE_GETREGS = 12,
...
}

Also, errno.h was missing.
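The ordering problem can be reproduced without any system headers. In the sketch below, the enum plays the part of sys/ptrace.h and the later #define plays the part of asm/ptrace.h; the DEMO_ names are hypothetical stand-ins chosen so they cannot collide with the real headers.

```c
#include <assert.h>

/*
 * Plays the part of sys/ptrace.h: the request is an enumerator.
 * If the macro below were defined *before* this enum, the line
 * "DEMO_GETREGS = 12," would be preprocessed into "12 = 12,"
 * and the enum would no longer compile.
 */
enum demo_ptrace_request {
	DEMO_TRACEME = 0,
	DEMO_GETREGS = 12,
};

/* Plays the part of asm/ptrace.h, included second: harmless here. */
#define DEMO_GETREGS 12
```

Swapping the two definitions is enough to see the build failure, which is why sys/ptrace.h has to come first.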

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
---
 arch/um/os-Linux/sys-x86_64/registers.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.20/arch/um/os-Linux/sys-x86_64/registers.c
===
--- linux-2.6.20.orig/arch/um/os-Linux/sys-x86_64/registers.c   2007-09-18 20:51:35.0 -0400
+++ linux-2.6.20/arch/um/os-Linux/sys-x86_64/registers.c   2007-09-18 20:52:15.0 -0400
@@ -3,9 +3,10 @@
  * Licensed under the GPL
  */
 
+#include 
+#include 
 #define __FRAME_OFFSETS
 #include 
-#include 
 #include "longjmp.h"
 #include "user.h"


[PATCH 0/2] UML - Two x86_64 build fixes

2007-09-18 Thread Jeff Dike
These two patches fix UML build breakages on x86_64.

They are -mm-specific, so don't need to go to mainline until 2.6.24.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nathan Scott
On Tue, 2007-09-18 at 12:44 -0700, Linus Torvalds wrote:
> This is not about performance. Never has been. It's about SGI wanting a 
> way out of their current 16kB mess.

Pass the crack pipe, Linus?

> The way to fix performance is to move to x86-64, and use 4kB pages and be 
> happy. However, the SGI people want a 16kB (and possibly bigger) 
> crap-option for their people who are (often _already_) running some 
> special case situation that nobody else cares about.

FWIW (and I hate to let reality get in the way of a good conspiracy) -
all SGI systems have always defaulted to using 4K blocksize filesystems;
there's very few customers who would use larger, especially as the Linux
kernel limitations in this area are well known.  There's no "16K mess"
that SGI is trying to clean up here (and SGI have offered both IA64 and
x86_64 systems for some time now, so not sure how you came up with that
whacko theory).

> It's not about "performance". If it was, they would never have used ia64

For SGI it really is about optimising ondisk layouts for some workloads
and large filesystems, and has nothing to do with IA64.  Read the paper
Dave sent out earlier, it's quite interesting.

For other people, like AntonA, who has also been asking for this
functionality literally for years (and ended up trying to do his own
thing inside NTFS IIRC) it's to be able to access existing filesystems
from other operating systems.  Here's a more recent discussion, I know
Anton had discussed it several times on fsdevel before this 2005 post
too:   http://oss.sgi.com/archives/xfs/2005-01/msg00126.html

Although I'm sure others exist, I've never worked on any platform other
than Linux that doesn't support filesystem block sizes larger than the
pagesize.  It's one thing to stick your head in the sand about the need
for this feature, it's another thing entirely to try to pass it off as an
"SGI mess", sorry.

I do entirely support the sentiment to stop this pissing match and get
on with fixing the problem though.

cheers.

--
Nathan



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Linus Torvalds


On Wed, 19 Sep 2007, Nathan Scott wrote:
> 
> FWIW (and I hate to let reality get in the way of a good conspiracy) -
> all SGI systems have always defaulted to using 4K blocksize filesystems;

Yes. And I've been told that:

> there's very few customers who would use larger

.. who apparently would like to move to x86-64. That was what people 
implied at the kernel summit.

> especially as the Linux
> kernel limitations in this area are well known.  There's no "16K mess"
> that SGI is trying to clean up here (and SGI have offered both IA64 and
> x86_64 systems for some time now, so not sure how you came up with that
> whacko theory).

Well, if that is the case, then I vote that we drop the whole patch-series 
entirely. It clearly has no reason for existing at all.

There is *no* valid reason for 16kB blocksizes unless you have legacy 
issues. The performance issues have nothing to do with the block-size, and 
should be solvable by just making sure that your stupid "state of the art" 
crap SCSI controller gets contiguous physical memory, which is best done 
in the read-ahead code.

So get your stories straight, people.

Linus


Re: [PATCH] Combine instrumentation menus in kernel/Kconfig.instrumentation

2007-09-18 Thread Mathieu Desnoyers
* [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> On Tue, 18 Sep 2007 17:12:59 EDT, Mathieu Desnoyers said:
> 
> > +++ linux-2.6-lttng/kernel/Kconfig.instrumentation  2007-09-18 13:18:17.00000 -0400
> > @@ -0,0 +1,40 @@
> > +menuconfig INSTRUMENTATION
> > +   bool "Instrumentation Support"
> > +   default y
> > +   ---help---
> > + Say Y here to get to see options related to performance measurement,
> > + debugging, and testing. This option alone does not add any kernel code.
> > +
> > + If you say N, all options in this submenu will be skipped and disabled.
> 
> OK, I'll bite - given the mention of 'debugging' there, do we want to go for
> broke and *also* suck in the 'Kernel Hacking' menu as well?

Instrumentation primarily aims at debugging user-space applications by
giving the ability to extract information across execution layers, hence
being a feature useful to users, not only kernel hackers. Therefore I
strongly doubt that it belongs in the kernel hacking submenu. In today's
world, where we face complex user-space problems involving
multithreaded, multiprocess applications, the kernel and
hypervisors, running on many cores, this kind of tool has proven useful
to many, not only kernel developers. Please have a look at the
papers (especially the OLS2007 paper) linked on http://ltt.polymtl.ca as a
starting point if you are interested in the question.

But yes, it can also be useful to kernel debugging, amongst other
things.

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68




Re: [v4l-dvb-maintainer] 2.6.23-rc6-mm1 -- "dvb_dmx_swfilter" [drivers/media/video/video-buf-dvb.ko] undefined!

2007-09-18 Thread Miles Lane
On 9/18/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Miles Lane wrote:
> > ERROR: "dvb_dmx_swfilter" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_net_init" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmxdev_init" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmx_init" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_register_frontend" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_register_adapter" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_unregister_adapter" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_frontend_detach" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_unregister_frontend" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmx_release" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_dmxdev_release" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "dvb_net_release" [drivers/media/video/video-buf-dvb.ko] undefined!
> > ERROR: "mt2131_attach" [drivers/media/video/cx23885/cx23885.ko] undefined!
> > ERROR: "s5h1409_attach" [drivers/media/video/cx23885/cx23885.ko] undefined!
> >
>
> The attached fix should fix the problem.

Thanks!  Looks good here.

 Miles


Re: Git tree for old kernels from before the current tree

2007-09-18 Thread Oleg Verych
* Mon, 23 Jul 2007 11:02:39 -0700 (PDT)
>
> On Mon, 23 Jul 2007, Nicolas Pitre wrote:
>> 
>> I started this once.
>> 
>> I have (sort of) a GIT tree with all Linux revisions that I could find 
>> from v0.01 up to v1.0.9.  But the most interesting information and also 
>> what is the most time consuming is the retrieval of announcement 
>> messages for those releases in old mailing list or newsgroup archives to 
>> serve as commit log data.  It seems to be even harder to find for post 
>> v1.0 releases.
>
> Yes, I agree. Google finds some of them, but (a) I was never very good 
> about announcements anyway and (b) there's nothing really good to search 
> for, so it's very hit-and-miss.
>
> Some of the really early release notes are easy to find, just because I 
> made them available with the sources, but mostly I'd just have posted to 
> the newsgroup/mailing lists.

Maybe this can be useful somehow: ftp://ftp.shout.net/pub/users/mec/kcs



Re: cpuset trouble after hibernate

2007-09-18 Thread Paul Menage
On 9/9/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> One of the cpus was unplugged during suspend... perhaps some
> save/restore is needed during hotplug/unplug?

Or else keep track separately in cpusets of

- cpus that the cpuset can run on
- cpus that the admin has specified for the cpuset to run on

hotplug/hotunplug events would only affect the former; userspace would
only see/modify the latter. Then when hibernate is over and the CPUs
are hotplugged back in, things would be back as before.
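Paul's two-mask idea can be modelled with plain bitmasks, one bit per CPU. In the sketch below (struct and function names are hypothetical, not the cpuset code), userspace only ever reads or writes cpus_requested, while hotplug events only recompute cpus_effective, so a CPU that is unplugged for hibernation and plugged back in reappears without any save/restore step.

```c
#include <assert.h>

/* Hypothetical per-cpuset state, one bit per CPU. */
struct demo_cpuset {
	unsigned long cpus_requested;	/* what the admin wrote; shown to userspace */
	unsigned long cpus_effective;	/* what tasks may actually run on */
};

/* Called on every hotplug/unplug event: the admin's request is never touched. */
static void demo_update_effective(struct demo_cpuset *cs, unsigned long online)
{
	cs->cpus_effective = cs->cpus_requested & online;
}
```

The sketch ignores the case where the intersection becomes empty; a real implementation would still need a fallback, such as borrowing the parent cpuset's CPUs, so tasks always have somewhere to run.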

Paul


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
> On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > > Here is the incremental small cleanup patch. 
> > > > 
> > > > Remove kmalloc usages in jbd/jbd2 and consistently use 
> > > > jbd_kmalloc/jbd2_malloc.
> > > 
> > > Shouldn't we kill jbd_kmalloc instead?
> > > 
> > 
> > It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
> > places to handle memory (de)allocation( > in the future if we need to change memory allocation in jbd(e.g. not
> > using kmalloc or using different flag), we don't need to touch every
> > place in the jbd code calling jbd_kmalloc.
> 
> I disagree.  Why would jbd need to globally change the way it allocates
> memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
> variety of structures.  Having to change one particular instance won't
> necessarily mean we want to change all of them.  Adding unnecessary
> wrappers only obfuscates the code making it harder to understand.  You
> wouldn't want every subsystem to have its own *_kmalloc() that took
> different arguments.  Besides, there aren't that many calls to kmalloc
> and kfree in the jbd code, so there wouldn't be much pain in changing
> GFP flags or whatever, if it ever needed to be done.
> 
> Shaggy

Okay, points taken. Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd entirely. This patch is intended to
replace the patch in the mm tree. Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD: Replace slab allocations with page cache allocations

JBD allocates memory for committed_data and frozen_data from slab. However,
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.

Also, this patch cleans up jbd_kmalloc and replaces it with kmalloc directly.
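The central change is that frozen and committed buffers now come from the page allocator rather than from per-blocksize slabs. The page allocator hands out power-of-two runs of pages, so each b_size request has to be rounded up to an allocation order. The hunk that defines the new jbd_alloc() is not in this excerpt; passing get_order(size) to __get_free_pages() would be the natural way to write it, but treat that as an assumption. The user-space sketch below models only the order calculation for non-zero sizes, assuming 4 KiB pages; demo_get_order() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on i386 and x86-64. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/*
 * Smallest order such that (DEMO_PAGE_SIZE << order) >= size, i.e. the
 * number of page doublings needed to hold a buffer of 'size' bytes.
 */
static int demo_get_order(size_t size)
{
	int order = 0;
	size_t span = DEMO_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

One consequence of this design is that block sizes below the page size are rounded up: a 1 KiB journal buffer occupies a whole 4 KiB page, the price paid for never handing slab memory to the block layer.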

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to 

Re: [PATCH] Reduce __print_symbol/sprint_symbol stack usage.

2007-09-18 Thread Satyam Sharma
Hi Gilboa,


On Sat, 15 Sep 2007, Gilboa Davara wrote:
> 
> This is my second stab at solving the "stack overflow due to
> dump_strace when close to stack-overflow is detected by do_IRQ" problem.
> (Hopefully) this patch creates less noise than the previous one.
> 
> [snip]
> > I'll try and create an option 2 (static allocation, minimal locking)
> > patch and post ASAP.
> > Hopefully it'll fare better. (While keeping the current interface intact
> > and reducing the damage/noise)
> 
> - Gilboa
> 
> --- linux-2.6/kernel/kallsyms.orig2007-09-15 11:46:54.0 +0300
> +++ linux-2.6/kernel/kallsyms.c   2007-09-15 21:06:55.0 +0300
> @@ -306,13 +306,14 @@ int lookup_symbol_attrs(unsigned long ad
>   return lookup_module_symbol_attrs(addr, size, offset, modname, name);
>  }
>  
> -/* Look up a kernel symbol and return it in a text buffer. */
> -int sprint_symbol(char *buffer, unsigned long address)
> +/* Internal version:
> +   Look up a kernel symbol and module name and return them to the
> + caller's buffer/namebuf buffers. */

/*
 * ...
 * ...
 */

is the general coding style here ...

> +int __sprint_symbol(char *buffer, char *namebuf, unsigned long address)
>  {
> - char *modname;
> - const char *name;
>   unsigned long offset, size;
> - char namebuf[KSYM_NAME_LEN];
> + const char *name;
> + char *modname;
>  
>   name = kallsyms_lookup(address, , , , namebuf);
>   if (!name)
> @@ -325,14 +326,35 @@ int sprint_symbol(char *buffer, unsigned
>   return sprintf(buffer, "%s+%#lx/%#lx", name, offset, size);
>  }
>  
> +/* Exported version:
> +   Look up a kernel symbol and return it in a text buffer. */

ditto.

> +int sprint_symbol(char *buffer, unsigned long address)
> +{
> + char namebuf[KSYM_NAME_LEN];

Hmm, don't we intend to push this array out of the stack too?

+   static char namebuf[KSYM_NAME_LEN];
+   static DEFINE_SPINLOCK(namebuf_lock);

here ?

> +
> + return __sprint_symbol(buffer, namebuf, address);

And you'd need to wrap spin_lock_irqsave()/spin_unlock_irqrestore()
around this call.

> +}


> +static DEFINE_SPINLOCK(symbol_lock);

Try to keep the declarations of a lock, and the data that it protects,
close together. Since this lock is being used to protect "buffer", it
makes sense to ...


>  /* Look up a kernel symbol and print it to the kernel messages. */
>  void __print_symbol(const char *fmt, unsigned long address)
>  {
> - char buffer[KSYM_SYMBOL_LEN];
> + /* Use static buffers instead of char array to reduce
> +  stack footprint in i386/4KSTACKS.
> +  Buffers must be protected against re-entry. */
> + static char namebuf[KSYM_NAME_LEN];
> + static char buffer[KSYM_SYMBOL_LEN];

... have it:

+   static DEFINE_SPINLOCK(buffer_lock);

here (note the name that exactly describes what the lock protects).

And the namebuf array isn't required here, it's already there in
sprint_symbol(), which you can call from ...

> + unsigned long flags;
> +
>  
> - sprint_symbol(buffer, address);
> + spin_lock_irqsave(_lock, flags);
> +
> + __sprint_symbol(buffer, namebuf, address);

here ... sprint_symbol() ?

>   printk(fmt, buffer);
> +
> + spin_unlock_irqrestore(_lock, flags);

But I still don't much like this :-(

More importantly, if a panic occurs *below* this callchain (and let's
say we ended up in this callchain because somebody put in a dump_stack()
somewhere for debugging purposes), then we'd have a deadlock on our hands,
and nothing gets printed for that panic.

I don't know who maintains this part of kernel code, but you can try
resubmitting (with the changes suggested above) to someone appropriate ...


Satyam


Re: [PATCH 1/6] cpuset write dirty map

2007-09-18 Thread Ethan Solomita
Andrew Morton wrote:
> On Tue, 11 Sep 2007 18:36:34 -0700
> Ethan Solomita <[EMAIL PROTECTED]> wrote:
> 
>> Add a dirty map to struct address_space
> 
> I get a tremendous number of rejects trying to wedge this stuff on top of
> Peter's mm-dirty-balancing-for-tasks changes.  More rejects than I am
> prepared to partially-fix so that I can usefully look at these changes in
> tkdiff, so this is all based on a quick peek at the diff itself..

This isn't surprising. We're both changing the calculation of dirty
limits. If his code is already into your workspace, then I'll have to do
the merging after you release it.

>> +#if MAX_NUMNODES <= BITS_PER_LONG
> 
> The patch is sprinkled full of this conditional.
> 
>   I don't understand why this is being done.  afaict it isn't described
>   in a code comment (it should be) nor even in the changelogs?

I can add comments.

>   Given its overall complexity and its likelihood to change in the
>   future, I'd suggest that this conditional be centralised in a single
>   place.  Something like
> 
>   /*
>* nice comment goes here
>*/
>   #if MAX_NUMNODES <= BITS_PER_LONG
>   #define CPUSET_DIRTY_LIMITS 1
>   #else
>   #define CPUSET_DIRTY_LIMITS 0
>   #endif
> 
>   Then use #if CPUSET_DIRTY_LIMITS everywhere else.
> 
>   (This is better than #ifdef CPUSET_DIRTY_LIMITS because we'll get a
>   warning if someone typos '#if CPUSET_DITRY_LIMITS')
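Andrew's centralised conditional could be laid out as below. DEMO_MAX_NUMNODES and DEMO_BITS_PER_LONG are hypothetical stand-ins for the real values derived from the architecture and configuration, and demo_nodemask_t and struct demo_mapping are simplified models of nodemask_t and the address_space field from the patch. Following Andrew's naming, CPUSET_DIRTY_LIMITS is 1 when the mask fits in a word and is embedded, and 0 when only a pointer is stored.

```c
#include <assert.h>

/* Hypothetical configuration values standing in for the kernel's. */
#define DEMO_MAX_NUMNODES  64
#define DEMO_BITS_PER_LONG 64

/*
 * One place decides whether the per-mapping dirty-node mask fits in a
 * single word (embed it) or not (store a pointer, allocate on demand).
 */
#if DEMO_MAX_NUMNODES <= DEMO_BITS_PER_LONG
#define CPUSET_DIRTY_LIMITS 1
#else
#define CPUSET_DIRTY_LIMITS 0
#endif

/* Simplified nodemask: enough longs to hold one bit per node. */
typedef struct {
	unsigned long bits[(DEMO_MAX_NUMNODES + DEMO_BITS_PER_LONG - 1) /
			   DEMO_BITS_PER_LONG];
} demo_nodemask_t;

struct demo_mapping {
#if CPUSET_DIRTY_LIMITS
	demo_nodemask_t dirty_nodes;	/* small systems: mask stored inline */
#else
	demo_nodemask_t *dirty_nodes;	/* large systems: allocated when first dirtied */
#endif
};
```

Using #if rather than #ifdef also lets GCC's -Wundef flag a misspelt name such as CPUSET_DITRY_LIMITS, which #ifdef would silently treat as undefined.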

I can add something like this. Probably something like:

CPUSET_DIRTY_LIMITS_USEPTR

>> --- 0/include/linux/fs.h 2007-09-11 14:35:58.0 -0700
>> +++ 1/include/linux/fs.h 2007-09-11 14:36:24.0 -0700
>> @@ -516,6 +516,13 @@ struct address_space {
>>  spinlock_t  private_lock;   /* for use by the address_space */
>>  struct list_headprivate_list;   /* ditto */
>>  struct address_space*assoc_mapping; /* ditto */
>> +#ifdef CONFIG_CPUSETS
>> +#if MAX_NUMNODES <= BITS_PER_LONG
>> +nodemask_t  dirty_nodes;/* nodes with dirty pages */
>> +#else
>> +nodemask_t  *dirty_nodes;   /* pointer to map if dirty */
>> +#endif
>> +#endif
> 
> afacit there is no code comment and no changelog text which explains the
> above design decision?  There should be, please.

OK.
> 
> There is talk of making cpusets available with CONFIG_SMP=n.  Will this new
> feature be available in that case?  (it should be).

I'm not sure how useful it would be in that scenario, but for
consistency we should still be able to specify varying dirty ratios
(from patch 6/6). The above code wouldn't mean anything with SMP=n since
there's only the one node. We'd just be indicating whether the inode has
any dirty pages, which we already know.

> 
>>  } __attribute__((aligned(sizeof(long))));
>>  /*
>>   * On most architectures that alignment is already the case; but
>> diff -uprN -X 0/Documentation/dontdiff 0/include/linux/writeback.h 1/include/linux/writeback.h
>> --- 0/include/linux/writeback.h  2007-09-11 14:35:58.0 -0700
>> +++ 1/include/linux/writeback.h  2007-09-11 14:37:46.0 -0700
>> @@ -62,6 +62,7 @@ struct writeback_control {
>>  unsigned for_writepages:1;  /* This is a writepages() call */
>>  unsigned range_cyclic:1;/* range_start is cyclic */
>>  void *fs_private;   /* For use by ->writepages() */
>> +nodemask_t *nodes;  /* Set of nodes of interest */
>>  };
> 
> That comment is a bit terse.  It's always good to be lavish when commenting
> data structures, for understanding those is key to understanding a design.
> 
OK

>>  /*
>> diff -uprN -X 0/Documentation/dontdiff 0/kernel/cpuset.c 1/kernel/cpuset.c
>> --- 0/kernel/cpuset.c2007-09-11 14:35:58.0 -0700
>> +++ 1/kernel/cpuset.c2007-09-11 14:36:24.0 -0700
>> @@ -4,7 +4,7 @@
>>   *  Processor and Memory placement constraints for sets of tasks.
>>   *
>>   *  Copyright (C) 2003 BULL SA.
>> - *  Copyright (C) 2004-2006 Silicon Graphics, Inc.
>> + *  Copyright (C) 2004-2007 Silicon Graphics, Inc.
>>   *  Copyright (C) 2006 Google, Inc
>>   *
>>   *  Portions derived from Patrick Mochel's sysfs code.
>> @@ -14,6 +14,7 @@
>>   *  2003-10-22 Updates by Stephen Hemminger.
>>   *  2004 May-July Rework by Paul Jackson.
>>   *  2006 Rework by Paul Menage to use generic containers
>> + *  2007 Cpuset writeback by Christoph Lameter.
>>   *
>>   *  This file is subject to the terms and conditions of the GNU General Public
>>   *  License.  See the file COPYING in the main directory of the Linux
>> @@ -1754,6 +1755,63 @@ int cpuset_mem_spread_node(void)
>>  }
>>  EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
>>  
>> +#if MAX_NUMNODES > BITS_PER_LONG
> 
> waah.  In other places we do "MAX_NUMNODES <= BITS_PER_LONG"

Your sanity is important to me. Will fix.
> 
>> +
>> +/*
>> + * Special functions for NUMA systems with a large number of nodes.
>> + * The nodemask 

Re: iso9660 vs udf

2007-09-18 Thread Andries E. Brouwer
On Wed, Sep 19, 2007 at 05:48:28AM +0530, Satyam Sharma wrote:

> > > On the other hand, this filesystem announces itself as UDF
> > > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"),
> > > perhaps the kernel code should be more robust.
> 
> Could you send the complete dmesg log, and explain what you mean by the filesystem/
> kernel (incorrectly?) announcing it as UDF here ... I agree with Jan,
> this sounds like an issue with mount(8) to me.

You already got the relevant part of the dmesg log. Slightly more below.

I think the filesystem can be treated both as iso9660 and as udf,
at least that is what I seem to recall CD-BRIDGE means.  Thus,
if the kernel cannot mount it as udf, I think it is a kernel flaw.
Given that kernel flaw, and the fact that mounting as iso9660 works,
mount(8) could work around the kernel problem by guessing iso9660.
But maybe we should first try to fix the kernel.
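To make the "both signatures" point concrete, here is a minimal probe sketch, assuming the standard on-disc layout: 2048-byte sectors, volume descriptors starting at sector 16, and a 5-byte identifier at offset 1 of each descriptor ("CD001" for ISO 9660; "BEA01", "NSR02"/"NSR03", "TEA01" for the UDF recognition sequence). `probe_vrs()` and the FS_* flags are hypothetical names, not code taken from mount(8) or the kernel.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE      2048
#define VRS_FIRST_SECTOR 16

/* Bit flags for the signatures found while scanning (hypothetical). */
enum { FS_ISO9660 = 1, FS_UDF = 2 };

/*
 * Scan the volume recognition area of an in-memory disc image and
 * return a combination of FS_ISO9660 and FS_UDF. A bridge disc is
 * expected to report both.
 */
static int probe_vrs(const unsigned char *img, size_t img_len)
{
	int found = 0;
	size_t sector;

	for (sector = VRS_FIRST_SECTOR;
	     (sector + 1) * SECTOR_SIZE <= img_len; sector++) {
		/* Standard identifier lives in bytes 1..5 of the descriptor. */
		const unsigned char *id = img + sector * SECTOR_SIZE + 1;

		if (memcmp(id, "CD001", 5) == 0)
			found |= FS_ISO9660;
		else if (memcmp(id, "NSR02", 5) == 0 ||
			 memcmp(id, "NSR03", 5) == 0)
			found |= FS_UDF;
		else if (memcmp(id, "BEA01", 5) != 0 &&
			 memcmp(id, "TEA01", 5) != 0)
			break;	/* unrecognised descriptor: end of the sequence */
	}
	return found;
}
```

A prober built on this could try udf when FS_UDF is set and, if that mount fails while FS_ISO9660 is also set, retry as iso9660 rather than giving up.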

Andries

Failed mount:
UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', timestamp 2006/03/07 16:26 (1078)
udf: udf_read_inode(ino 547) failed !bh
UDF-fs: Error in udf_iget, block=1, partition=1

Success:
ISO 9660 Extensions: Microsoft Joliet Level 3
ISOFS: changing to secondary root

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23-rc6-mm1 panic (memory controller issue ?)

2007-09-18 Thread Badari Pulavarty
On Tue, 2007-09-18 at 15:21 -0700, Badari Pulavarty wrote:
> Hi Balbir,
> 
> I get the following panic from SLUB, while doing simple fsx tests.
> I haven't used any container/memory controller stuff except 
> that I configured them in :(
> 
> Looks like slub doesn't like one of the flags passed in ?
> 
> Known issue ? Ideas ?
> 

I think I found the issue. I am still running tests to
verify it. Does this sound correct?

Thanks,
Badari

Strip the __GFP_HIGHMEM flag from gfp_mask before passing it to mem_container_cache_charge().
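A minimal sketch of what the one-line fix does to the mask, using hypothetical flag values (these are not the kernel's real __GFP_* bits): `mask & ~FLAG` clears only that one bit and passes every other allocation flag through unchanged.

```c
/* Hypothetical stand-ins for gfp flag bits (NOT the kernel's values). */
typedef unsigned int gfp_model_t;

#define MODEL_GFP_HIGHMEM 0x02u
#define MODEL_GFP_WAIT    0x10u
#define MODEL_GFP_IO      0x40u

/* Flags to forward to the charge path: drop only the highmem bit. */
static gfp_model_t strip_highmem(gfp_model_t gfp_mask)
{
	return gfp_mask & ~MODEL_GFP_HIGHMEM;
}
```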

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
 mm/filemap.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc6/mm/filemap.c
===
--- linux-2.6.23-rc6.orig/mm/filemap.c  2007-09-18 12:43:54.0 -0700
+++ linux-2.6.23-rc6/mm/filemap.c   2007-09-18 19:14:44.0 -0700
@@ -441,7 +441,8 @@ int filemap_write_and_wait_range(struct 
 int add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask)
 {
-   int error = mem_container_cache_charge(page, current->mm, gfp_mask);
+   int error = mem_container_cache_charge(page, current->mm,
+   gfp_mask & ~__GFP_HIGHMEM);
if (error)
goto out;
 




Re: [PATCH 6/6] cpuset dirty limits

2007-09-18 Thread Ethan Solomita
Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Andrew Morton wrote:
> 
>>> +   mutex_lock(&callback_mutex);
>>> +   *cs_int = val;
>>> +   mutex_unlock(&callback_mutex);
>> I don't think this locking does anything?
> 
> Locking is wrong here. The lock needs to be taken before the cs pointer 
> is dereferenced from the caller.

I think we can just remove the callback_mutex lock. Since the change is
coming from an update to a cpuset filesystem file, the cpuset is not
going anywhere since the inode is open. And I don't see that any code
really cares whether the dirty ratios change out from under them.

> 
>>> +   return 0;
>>> +}
>>> +
>>>  /*
>>>   * Frequency meter - How fast is some event occurring?
>>>   *
>>> ...
>>> +void cpuset_get_current_ratios(int *background_ratio, int *throttle_ratio)
>>> +{
>>> +   int background = -1;
>>> +   int throttle = -1;
>>> +   struct task_struct *tsk = current;
>>> +
>>> +   task_lock(tsk);
>>> +   background = task_cs(tsk)->background_dirty_ratio;
>>> +   throttle = task_cs(tsk)->throttle_dirty_ratio;
>>> +   task_unlock(tsk);
>> ditto?
> 
> It is required to take the task lock while dereferencing the task's cpuset 
> pointer.

Agreed.
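A userspace sketch of the rule Christoph states, assuming a pthread mutex as a stand-in for task_lock() and made-up `model_*` types in place of task_struct and struct cpuset: the task's cpuset pointer is only followed while the lock is held, and both ratios are read under that same lock so the caller sees a consistent pair.

```c
#include <pthread.h>

struct model_cpuset {
	int background_dirty_ratio;
	int throttle_dirty_ratio;
};

struct model_task {
	pthread_mutex_t alloc_lock;	/* stands in for task_lock() */
	struct model_cpuset *cs;	/* may be switched by another thread */
};

static void model_get_ratios(struct model_task *tsk,
			     int *background_ratio, int *throttle_ratio)
{
	pthread_mutex_lock(&tsk->alloc_lock);
	/* Dereference tsk->cs only while holding the lock. */
	*background_ratio = tsk->cs->background_dirty_ratio;
	*throttle_ratio = tsk->cs->throttle_dirty_ratio;
	pthread_mutex_unlock(&tsk->alloc_lock);
}
```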
-- Ethan


Re: Wasting our Freedom

2007-09-18 Thread Alan Cox
> sorry, but calling attribution claims of any sort "petty" is nothing
> short of dangerous ignorance.

Says a man who has a .sig of "SDF Public Access UNIX System -
http://sdf.lonestar.org"

Well sdf.lonestar.org claims to be NetBSD so might I suggest your
dangerous ignorance starts at the Unix trademark.


And please take this where it belongs which is the relevant wireless
list. Better yet leave the dispute to those it actually involves, which
is not most of the OpenBSD community, nor the Linux kernel team, but a
small group of developers in the OpenBSD wireless world and a few people
in the ath5k GPL project.



Re: iso9660 vs udf

2007-09-18 Thread Satyam Sharma
Hi,

On Tue, 18 Sep 2007, Jan Kara wrote:
> 
> > Today I got a CD. MacOS does not mount it and Linux does not
> > mount it without an explicit filesystem type option.
> > That is,
> > # mount /dev/hdc /dir -t iso9660
> > works fine, but
> > # mount /dev/hdc /dir
> > mount: you didn't specify a filesystem type for /dev/hdc
> >I will try type udf
> > mount: wrong fs type, bad option, bad superblock on /dev/hdc,
> >missing codepage or other error
> >In some cases useful info is found in syslog - try
> >dmesg | tail  or so
> > # dmesg | tail
> > UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Wisk1956-82', timestamp 2006/03/07 16:26 (1078)
> > udf: udf_read_inode(ino 547) failed !bh
> > UDF-fs: Error in udf_iget, block=1, partition=1

That comes from udf_fill_super(), which shouldn't have been called
in the first place ...

> > Google gave me half a dozen other people that mentioned the same
> > problem (with the same inode 547). Clearly some CD mastering software
> > produces a format that Linux and MacOS do not handle easily.
> > 
> > One result of this letter will be that people with the same problem
> > learn via Google that using the "-t iso9660" option may help.
> > 
> > What goes wrong on the mount side is that when it hesitates between
> > iso9660 and udf it decides for udf when seeing "NSR02".
> > Maybe the heuristics in mount should be tuned.
>   Yes, this seems like a mount problem but you should contact the mount
> maintainer for that... I guess hardly anyone will help you with this on
> this list.
> 
> > On the other hand, this filesystem announces itself as UDF
> > ("CD-RTOS" "CD-BRIDGE" "CDUDF File System - Adaptec Inc"),
> > perhaps the kernel code should be more robust.

Could you send the complete dmesg log, and explain what you mean by the filesystem/
kernel (incorrectly?) announcing it as UDF here ... I agree with Jan,
this sounds like an issue with mount(8) to me.

> > If anybody feels responsible for mount and/or this kernel area
> > we might discuss.
>   I'm kind of taking care of UDF in the kernel. What do you find
> inappropriate about the kernel's reaction? You mean we should produce some
> better error message in the log?


Re: [PATCH] UML - Fix irqstack crash

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 19:33:36 -0400
Jeff Dike <[EMAIL PROTECTED]> wrote:

> ===
> --- linux-2.6.17.orig/arch/um/os-Linux/signal.c   2007-09-09 11:15:37.0 -0400
> +++ linux-2.6.17/arch/um/os-Linux/signal.c  2007-09-18 12:32:40.0 -0400
> @@ -119,7 +119,7 @@ void (*handlers[_NSIG])(int sig, struct 
>  
>  void handle_signal(int sig, struct sigcontext *sc)
>  {
> - unsigned long pending = 0;
> + unsigned long pending = 1 << sig;

You want 1UL there.
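To spell out the 1UL point: `1 << sig` is computed in `int`, so a shift into or past the sign bit is undefined, whereas `1UL << sig` is computed in `unsigned long`. A minimal helper sketch (the name `sig_bit()` is hypothetical) that also refuses shifts wider than `unsigned long`:

```c
#include <limits.h>

/* Bit for signal number sig in an unsigned long mask. Out-of-range
 * signal numbers yield 0 instead of an undefined shift. */
static unsigned long sig_bit(int sig)
{
	if (sig < 0 || (unsigned int)sig >= sizeof(unsigned long) * CHAR_BIT)
		return 0;
	return 1UL << sig;
}
```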


Re: NFS4 authentification / fsuid

2007-09-18 Thread Satyam Sharma


On Fri, 7 Sep 2007, Kyle Moffett wrote:
> 
> So you can't draw any relationships between "Protect the end-user" with
> "Protect the device FROM the end-user", the former can be done very reliably
   ^^^ *attacker*

> to whatever level of risk-reduction you need and the latter can't practically
> be done at all.

Well, you're the one who called solving the physical access problem
"easy" here ... :-)


Re: NFS4 authentification / fsuid

2007-09-18 Thread Satyam Sharma


On Thu, 6 Sep 2007, Kyle Moffett wrote:
> 
> On Sep 06, 2007, at 19:35:14, Trond Myklebust wrote:
> > 
> > On Thu, 2007-09-06 at 19:30 -0400, Kyle Moffett wrote:
> > > 
> > > On Sep 06, 2007, at 11:06:16, J. Bruce Fields wrote:
> > > > The question of how to protect against someone with *physical*
   ^^^
> > > > access certainly is more difficult, but surely that's a separate
^^

> > > > problem.
> > > 
> > > Actually, that's a fairly simple problem (barring disassembling the system


> > > and attaching a hardware debugger).  You encrypt the root filesystem and
> > > require a password to boot (See: LUKS).  Debian has built-in support for
> > > installing onto fs-on-LVM-on-crypt-on-RAID, and it works quite well on all
> > > the laptops I use regularly.  It's not even much of a speed penalty; once
> > > you take the overhead of hitting a 5400RPM laptop drive you can chew
> > > thousands of cycles of CPU without anybody noticing (much).  Then all you
> > > have to do is burn a copy of your /boot with bootloader onto some
> > > read-only media (like a finalized CDROM/DVDROM) and you're set to go.
> > 
> > Disconnect battery, and watch boot password go 'poof!'.
> 
> Umm, I did say "encrypt the root filesystem", didn't I?  Booting my laptops
  ^^^

The whole *point* here is to secure against physical access -- then how
can you assume "barring disassembling the system"? If you're not
considering attacks such as those, then how _are_ you solving the
physical access problem in the first place? :-)


> this way follows this procedure:
>  1) Enter BIOS boot menu
>  2) Insert /boot CDROM
>  3) Select the "CDROM" entry
>  4) Wait for kernel to start and run through initramfs
>  5) Type password into the initramfs prompt so that it can DECRYPT THE ROOT
> FILESYSTEM
>  6) Continue to boot the system.
> 
> Under this setup, tinkering with my BIOS does virtually nothing; the only
> avenues of attack are strictly of the "Install a hardware keylogger" variety.

Doesn't flashing/replacing your BIOS firmware/chip count as tinkering?
Then I don't really need a "hardware keylogger", do I ...


Re: Wasting our Freedom

2007-09-18 Thread Jacob Meuser
On Tue, Sep 18, 2007 at 08:56:47AM -0400, Theodore Tso wrote:

> all of the megabytes and megabytes of flamewar is over these two
> lines:
> 
> > * Copyright (c) 2006-2007 Nick Kossifidis <[EMAIL PROTECTED]>
> > * Copyright (c) 2007 Jiri Slaby <[EMAIL PROTECTED]>
> 
> Petty, isn't it?  Let's just say it's b.s. like this which is why, 16
> years ago, I decided to work with Linux instead of BSD.

copyright assertion == claim of ownership, or possession.

possession is 9/10 of the law.

was it petty of UCB to claim copyrights over code USL claimed ownership
of?

was it also petty of Novell to claim that they, and not SCO, owned
the copyright to UNIX?

sorry, but calling attribution claims of any sort "petty" is nothing
short of dangerous ignorance.

-- 
[EMAIL PROTECTED]
SDF Public Access UNIX System - http://sdf.lonestar.org


[PATCH] UML - Fix irqstack crash

2007-09-18 Thread Jeff Dike
This patch fixes a crash caused by an interrupt coming in when an IRQ
stack is being torn down.  When this happens, handle_signal will loop,
setting up the IRQ stack again because the tearing down had finished,
and handling whatever signals had come in.

However, to_irq_stack returns a mask of pending signals to be handled,
plus bit zero is set if the IRQ stack was already active, and thus
shouldn't be torn down.  This causes a problem because when
handle_signal goes around the loop, sig will be zero, and to_irq_stack
will duly set bit zero in the returned mask, faking handle_signal into
believing that it shouldn't tear down the IRQ stack and return
thread_info pointers back to their original values.

This will eventually cause a crash, as the IRQ stack thread_info will
continue pointing to the original task_struct and an interrupt will
look into it after it has been freed.

The fix is to stop passing a signal number into to_irq_stack.  Rather,
the pending signals mask is initialized beforehand with the bit for
sig already set.  References to sig in to_irq_stack can be replaced
with references to the mask.
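A userspace model of the bit-accumulation loop in the new to_irq_stack(), assuming C11 atomics: `atomic_exchange()` plays the role of the kernel's xchg() and `model_pending_mask` stands in for pending_mask. Stack switching and the nested/bit-zero bookkeeping are left out; the sketch only shows the caller seeding `*mask_out` with its own bits instead of passing a signal number.

```c
#include <stdatomic.h>

/* Stand-in for the kernel's pending_mask word. */
static _Atomic unsigned long model_pending_mask;

/*
 * Returns 1 if the word was already non-zero (someone else is on the
 * IRQ stack); our bits are merged in for them to handle. Returns 0 if
 * we were first, where the kernel would go on to switch stacks.
 */
static int model_to_irq_stack(unsigned long *mask_out)
{
	unsigned long mask, old;

	mask = atomic_exchange(&model_pending_mask, *mask_out);
	if (mask != 0) {
		/*
		 * Accumulate bits until the exchange hands back exactly
		 * what we stored: then no new interrupt raced in, and the
		 * word holds a bit for every interrupt seen so far.
		 */
		old = *mask_out;
		do {
			old |= mask;
			mask = atomic_exchange(&model_pending_mask, old);
		} while (mask != old);
		return 1;
	}
	return 0;
}
```

For example, with 1UL << 2 already pending, a second caller passing 1UL << 1 gets 1 back and leaves both bits set in the word.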

Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
---
 arch/um/include/kern_util.h |    2 +-
 arch/um/kernel/irq.c        |    7 ---
 arch/um/os-Linux/signal.c   |    4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6.17/arch/um/include/kern_util.h
===
--- linux-2.6.17.orig/arch/um/include/kern_util.h   2007-09-11 10:12:26.0 -0400
+++ linux-2.6.17/arch/um/include/kern_util.h    2007-09-18 12:31:28.0 -0400
@@ -117,7 +117,7 @@ extern void sigio_handler(int sig, union
 
 extern void copy_sc(union uml_pt_regs *regs, void *from);
 
-unsigned long to_irq_stack(int sig, unsigned long *mask_out);
+extern unsigned long to_irq_stack(unsigned long *mask_out);
 unsigned long from_irq_stack(int nested);
 
 #endif
Index: linux-2.6.17/arch/um/kernel/irq.c
===
--- linux-2.6.17.orig/arch/um/kernel/irq.c  2007-09-11 10:14:09.0 -0400
+++ linux-2.6.17/arch/um/kernel/irq.c   2007-09-18 12:32:08.0 -0400
@@ -518,13 +518,13 @@ int init_aio_irq(int irq, char *name, ir
 
 static unsigned long pending_mask;
 
-unsigned long to_irq_stack(int sig, unsigned long *mask_out)
+unsigned long to_irq_stack(unsigned long *mask_out)
 {
struct thread_info *ti;
unsigned long mask, old;
int nested;
 
-   mask = xchg(&pending_mask, 1 << sig);
+   mask = xchg(&pending_mask, *mask_out);
if(mask != 0){
/* If any interrupts come in at this point, we want to
 * make sure that their bits aren't lost by our
@@ -534,7 +534,7 @@ unsigned long to_irq_stack(int sig, unsi
 * and pending_mask contains a bit for each interrupt
 * that came in.
 */
-   old = 1 << sig;
+   old = *mask_out;
do {
old |= mask;
mask = xchg(&pending_mask, old);
@@ -550,6 +550,7 @@ unsigned long to_irq_stack(int sig, unsi
 
task = cpu_tasks[ti->cpu].task;
tti = task_thread_info(task);
+
*ti = *tti;
ti->real_thread = tti;
task->stack = ti;
Index: linux-2.6.17/arch/um/os-Linux/signal.c
===
--- linux-2.6.17.orig/arch/um/os-Linux/signal.c 2007-09-09 11:15:37.0 -0400
+++ linux-2.6.17/arch/um/os-Linux/signal.c  2007-09-18 12:32:40.0 -0400
@@ -119,7 +119,7 @@ void (*handlers[_NSIG])(int sig, struct 
 
 void handle_signal(int sig, struct sigcontext *sc)
 {
-   unsigned long pending = 0;
+   unsigned long pending = 1 << sig;
 
do {
int nested, bail;
@@ -134,7 +134,7 @@ void handle_signal(int sig, struct sigco
 * have to return, and the upper handler will deal
 * with this interrupt.
 */
-   bail = to_irq_stack(sig, &pending);
+   bail = to_irq_stack(&pending);
if(bail)
return;
 


Re: [PATCH] Clarify pci_iomap() usage for MMIO-only devices

2007-09-18 Thread Linus Torvalds


On Tue, 18 Sep 2007, Jeff Garzik wrote:
> 
> Easy enough... 'pcimap' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

This is wrong.

You must not put it in lib/iomap.c, since that file is only compiled for 
architectures that use CONFIG_GENERIC_IOMAP.

So you need to put it in some *generic* PCI place, like drivers/pci/pci.c 
or similar.

Linus


Re: BUG? Suspend during active sound playback kills sound

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 16:06:21 -0700
Shentino <[EMAIL PROTECTED]> wrote:

> Run any program that opens the ALSA sound (and probably the dsp
> legacy), and then suspend to disk during playback.
> 
> On the next and each subsequent thawout, the sound is dead even if you
> close and reopen the sound.  Only a "cold boot" can fix it.

Which kernel version, which sound driver and what type of sound card?

Thanks.


Re: NFS4 authentification / fsuid

2007-09-18 Thread Satyam Sharma


On Fri, 7 Sep 2007, J. Bruce Fields wrote:
> 
> On Fri, Sep 07, 2007 at 01:32:52AM +0200, Trond Myklebust wrote:
> > Sorry. Of course, you have to copy the entire /lib, etc. onto the tmpfs,
> > but you get the gist
> > 
> > The point is that it is easy to subvert userspace if you have enough
> > privileges. In the above example it may not be entirely undetectable,
> > but who here is running a script on every login to check that / is
> > indeed uncompromised?
> 
> I suppose this is the motivation for things like the "secure attention
> key"?
> 
> But I'm most curious actually about to what degree the kernel itself is
> vulnerable to root (without a reboot).  Is disabling /dev/kmem and
> module-loading in theory enough?

No, not in theory, not in practice. But yeah, restricting an attacker's
ability to hack hardware (by controlling physical access) does take out a
whole class of attack vectors.

But, seriously, such discussion has the tendency to quickly get too
theoretical (thus losing practical significance). For example, would we
not also need to prevent the (userspace) superuser from being able to run
arbitrary executables that can modify firmware? Okay, let's say we have
a kernelspace infrastructure for verifying cryptographic signatures on
binaries before executing them ... but how practical/usable is this?
How practically/universally applicable is a system whose security derives
from keeping machines behind locked doors protected by incorruptible,
armed guards?

Overall, I tend to be unenthusiastic about most schemes that claim to
have solved the user-kernel security problem (with no loss of usability/
practicality).


Re: [patch 3/4] Linux Kernel Markers - Documentation

2007-09-18 Thread Randy Dunlap
On Tue, 18 Sep 2007 17:13:27 -0400 Mathieu Desnoyers wrote:

> Here is some documentation explaining what is/how to use the Linux
> Kernel Markers.
> 
> ---
> 
>  Documentation/markers/markers.txt  |   93 +++
>  Documentation/markers/src/Makefile |7 ++
>  Documentation/markers/src/marker-example.c |   55 
>  Documentation/markers/src/probe-example.c  |   98 +
>  4 files changed, 253 insertions(+)
> 
> Index: linux-2.6-lttng/Documentation/markers/markers.txt
> ===
> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-2.6-lttng/Documentation/markers/markers.txt 2007-09-07 09:17:45.0 -0400
> @@ -0,0 +1,93 @@

> +The marker mechanism supports inserting multiple instances of the same marker.
> +Markers can be put in inline functions, inlined static functions, and
> +unrolled loops.

as well as regular functions ?

> +* Probe / marker example
> +
> +See the example provided in Documentation/markers/markers/src

   drop one of ^^^ "markers/"

> +Run, as root :
> +
> +make
> +insmod marker-example.ko (insmod order is not important)
> +insmod probe-example.ko
> +cat /proc/marker-example (returns an expected error)
> +rmmod marker-example probe-example
> +dmesg


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: NFS4 authentification / fsuid

2007-09-18 Thread Satyam Sharma


On Thu, 6 Sep 2007, J. Bruce Fields wrote:
> 
> On Thu, Sep 06, 2007 at 01:59:50PM +0530, Satyam Sharma wrote:
> > Oh and btw, note that we're talking of the (lack of) security of a
> > "running kernel" here -- because across reboots, there is /really/
> > *absolutely* no such thing as "kernelspace security" because the superuser
> > will simply switch the vmlinuz itself ...
> 
> Well, the machine could be booting from cdrom, and could live in a
> locked machine room.

And how is this different from the "trusted tamperproof hardware"
solution I proposed earlier? From an attack vector p.o.v. they are both
precisely the same -- both of them are designed to prevent the attacker
from gaining unfettered access to system hardware, hmm?

Oh, actually, if past history is anything to go by, then your scheme
is provably weaker. Security systems are invariably broken at
their weakest link, which is almost always the human/social element,
and your scheme derives its security by relying on the *social* element.

To elaborate my point, what prevents me from bribing / torturing /
blackmailing whoever owns the key to that locked server room and ...

The attack is "non-technical", but hey, so was your security :-)


> Or people with root on a virtual host don't
> necessarily have the ability to replace the kernel for that host.

Again, you're restricting physical access ... but okay, this is a slightly
more plausible solution (but one that applies to only a *specific* kind of
situation).


Satyam


Re: [PATCH] Clarify pci_iomap() usage for MMIO-only devices

2007-09-18 Thread Jeff Garzik

Benjamin Herrenschmidt wrote:
> On Tue, 2007-09-18 at 16:21 -0400, Jeff Garzik wrote:
>> A new pci_mmio_map() helper, to be used with 100% MMIO hardware, might
>> help eliminate confusion.
>
> Maybe not the best name in theory but at least would show that it
> relates to existing ioremap would be pci_ioremap()

Easy enough... 'pcimap' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

Jeff


commit 6e09c71822f76c618353682bf295fc7588284521
Author: Jeff Garzik <[EMAIL PROTECTED]>
Date:   Tue Sep 18 19:06:08 2007 -0400

Add pci_ioremap() to generic iomap lib.

(arches that don't wish to use lib/iomap.c's version may fill in their own)

Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

 include/asm-generic/iomap.h |    1 +
 lib/iomap.c                 |   34 ++
 2 files changed, 35 insertions(+)

6e09c71822f76c618353682bf295fc7588284521
diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index cde592f..611e6cf 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -63,6 +63,7 @@ extern void ioport_unmap(void __iomem *);
 /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
 struct pci_dev;
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_ioremap(struct pci_dev *dev, int bar, unsigned long max);
 extern void pci_iounmap(struct pci_dev *dev, void __iomem *);
 
 #endif
diff --git a/lib/iomap.c b/lib/iomap.c
index 864f2ec..0338da0 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -275,9 +275,43 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
return NULL;
 }
 
+/**
+ * pci_ioremap - create a virtual mapping cookie for a memory-based PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @maxlen: length of MMIO memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using read*() and write*().
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR without checking for its length first, pass %0 here.
+ */
+void __iomem *pci_ioremap(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+   unsigned long start = pci_resource_start(dev, bar);
+   unsigned long len = pci_resource_len(dev, bar);
+   unsigned long flags = pci_resource_flags(dev, bar);
+
+   if (!len || !start)
+   return NULL;
+   if (maxlen && len > maxlen)
+   len = maxlen;
+   if (flags & IORESOURCE_IO)
+   return NULL;
+   if (flags & IORESOURCE_MEM) {
+   if (flags & IORESOURCE_CACHEABLE)
+   return ioremap(start, len);
+   return ioremap_nocache(start, len);
+   }
+   /* What? */
+   return NULL;
+}
+
 void pci_iounmap(struct pci_dev *dev, void __iomem * addr)
 {
IO_COND(addr, /* nothing */, iounmap(addr));
 }
 EXPORT_SYMBOL(pci_iomap);
+EXPORT_SYMBOL(pci_ioremap);
 EXPORT_SYMBOL(pci_iounmap);


BUG? Suspend during active sound playback kills sound

2007-09-18 Thread Shentino
Run any program that opens the ALSA sound device (and probably the legacy
OSS dsp device too), and then suspend to disk during playback.

On the next and each subsequent resume, the sound is dead even if you
close and reopen the sound device.  Only a "cold boot" can fix it.


Re: CFS: some bad numbers with Java/database threading [FIXED]

2007-09-18 Thread Chuck Ebbert
On 09/18/2007 06:46 PM, Ingo Molnar wrote:
>>> We need a (tested) 
>>> solution for 2.6.23 and the CFS-devel patches are not for 2.6.23. I've 
>>> attached below the latest version of the -rc6 yield patch - the switch 
>>> is not dependent on SCHED_DEBUG anymore but always available.
>>>
>> Is this going to be merged? And will you be making the default == 1 or 
>> just leaving it at 0, which forces people who want the older behavior 
>> to modify the default?
> 
> not at the moment - Antoine suggested that the workload is probably fine 
> and the patch against -rc6 would have no clear effect anyway so we have 
> nothing to merge right now. (Note that there's no "older behavior" 
> possible, unless we want to emulate all of the O(1) scheduler's 
> behavior.) But ... we could still merge something like that patch, but a 
> clearer testcase is needed. The JVMs I have access to work fine.

I just got a bug report today:

https://bugzilla.redhat.com/show_bug.cgi?id=295071

==

Description of problem:

The CFS scheduler does not seem to implement sched_yield correctly. If one
program loops calling sched_yield() and another program prints timing
information in a loop, and both are taskset to the same core, the timing
stats will be twice as long as when they are on different cores.
This problem was not in 2.6.21-1.3194 but showed up in 2.6.22.4-65 and continues
in the newest released kernel 2.6.22.5-76. 

Version-Release number of selected component (if applicable):

2.6.22.4-65 through 2.6.22.5-76

How reproducible:

Very

Steps to Reproduce:
compile task1
#include <sched.h>

int main() {
while (1) {
sched_yield();
}
return 0;
}

and compile task2

#include <stdio.h>
#include <sys/time.h>
int main() {
while (1) {
int i;
struct timeval t0,t1;
double usec;

gettimeofday(&t0, 0);
for (i = 0; i < 1; ++i)
;
gettimeofday(&t1, 0);

usec = (t1.tv_sec * 1e6 + t1.tv_usec) - (t0.tv_sec * 1e6 + t0.tv_usec);
printf ("%8.0f\n", usec);
}
return 0;
}

Then run:
"taskset -c 0 ./task1"
"taskset -c 0 ./task2"

You will see that both tasks use 50% of the CPU. 
Then kill task2 and run:
"taskset -c 1 ./task2"

Now task2 will run twice as fast verifying that it is not some anomaly with the
way top calculates CPU usage with sched_yield.
  
Actual results:
Tasks with sched_yield do not yield like they are supposed to.

Expected results:
The sched_yield task's CPU usage should go to near 0% when another task is on
the same CPU.


Re: [linux-dvb] [PATCH] Userspace tuner

2007-09-18 Thread Alan Cox
On Tue, Sep 18, 2007 at 07:56:05PM -0300, Mauro Carvalho Chehab wrote:
> proprietary format. This way, a userspace app may use the userspace
> library as a "fallback method" for unknown FOURCC formats. The result
> will probably be far from optimal in some cases (since it
> probably means double buffering), but this will at least allow userspace
> apps to work. When performance becomes an issue, the userspace app
> developer may use the GPL code in the userspace library as a reference to write
> a properly optimized format driver for their app.

You can dynamically load libraries based on constructed path names which
means you can write a simple library for media conversions which in turn
will try and open libv4l-format-ABCD.so for any format it doesn't know, and
is thus extensible.
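A minimal sketch of that idea, assuming the usual V4L2 FOURCC packing (first character in the low byte) and the libv4l-format-ABCD.so naming used above. The exported symbol name `convert_frame` and its signature are hypothetical; this is not an existing libv4l interface. On older glibc, link with -ldl.

```c
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry point each format plugin would export. */
typedef int (*convert_fn)(const void *src, size_t src_len,
			  void *dst, size_t dst_len);

/* Build "libv4l-format-ABCD.so" from a FOURCC packed a | b<<8 | c<<16 | d<<24. */
static int plugin_name(uint32_t fourcc, char *buf, size_t len)
{
	int n = snprintf(buf, len, "libv4l-format-%c%c%c%c.so",
			 (char)(fourcc & 0xff), (char)((fourcc >> 8) & 0xff),
			 (char)((fourcc >> 16) & 0xff), (char)((fourcc >> 24) & 0xff));

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Try to load a converter for a format the core library doesn't know. */
static convert_fn load_converter(uint32_t fourcc)
{
	char name[64];
	void *handle;
	convert_fn fn;

	if (plugin_name(fourcc, name, sizeof(name)) < 0)
		return NULL;
	handle = dlopen(name, RTLD_NOW | RTLD_LOCAL);
	if (!handle)
		return NULL;	/* no plugin installed for this format */
	/* POSIX-recommended way to convert dlsym()'s void * to a function pointer. */
	*(void **)(&fn) = dlsym(handle, "convert_frame");
	if (!fn)
		dlclose(handle);
	return fn;
}
```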



Re: [linux-dvb] [PATCH] Userspace tuner

2007-09-18 Thread Mauro Carvalho Chehab
> The reason why there is no single 'format conversion library' that
> everybody uses is because of the large differences between requirements
> for such a thing. The line between 'format conversion' and things such
> as a video codec, or image processing is very vague.

Agreed. What I think should happen is that the userspace library
should focus on the "weird" codecs, e.g. those which use some sort of
proprietary format. This way, a userspace app may use the userspace
library as a "fallback method" for unknown FOURCC formats. The result
will probably be far from optimal in some cases (since it
probably means double buffering), but this will at least allow userspace
apps to work. When performance becomes an issue, the userspace app
developer may use the GPL code in the userspace library as a reference to write
a properly optimized format driver for their app.
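One way to picture the "fallback method", as a sketch under assumptions: the application keeps a table of formats it converts itself and hands everything else to a generic library routine. All names and the single YUYV table entry are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*convert_fn)(const uint8_t *src, uint8_t *dst, size_t len);

/* Fast path the application implements itself (here: a plain copy). */
static int native_copy(const uint8_t *src, uint8_t *dst, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Generic library converter for unknown formats (stubbed in this sketch). */
static int library_convert(const uint8_t *src, uint8_t *dst, size_t len)
{
	(void)src; (void)dst; (void)len;
	return -1;	/* a real library would decode the format here */
}

struct format_entry {
	uint32_t fourcc;
	convert_fn fn;
};

static const struct format_entry native_formats[] = {
	{ 0x56595559u /* "YUYV" */, native_copy },
};

static convert_fn pick_converter(uint32_t fourcc)
{
	for (size_t i = 0; i < sizeof(native_formats) / sizeof(native_formats[0]); i++)
		if (native_formats[i].fourcc == fourcc)
			return native_formats[i].fn;
	return library_convert;	/* fallback for formats the app doesn't know */
}
```

Keeping the fallback behind the same function-pointer type means an app can later swap a slow library path for its own optimized converter by adding one table entry.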

Just my 2 cents.

Cheers,
Mauro



Re: 2.6.23-rc6-mm1

2007-09-18 Thread Gabriel C
Gabriel C wrote:
> Sam Ravnborg wrote:
>> On Tue, Sep 18, 2007 at 03:42:58PM -0400, Miles Lane wrote:
>>> On 9/18/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote:
>>>> Hi Miles.
>>>> On Tue, Sep 18, 2007 at 11:27:23AM -0400, Miles Lane wrote:
>>>>> Selecting Help for "Subarchitecture Type" causes "make menuconfig" to
>>>>> crash, and the bash display settings have to be reset.
>>>> Not reproducible here.
>>>> But I noticed that we pass a null pointer to a vsprintf function which
>>>> in the cases you pointed out printed a (null) at my system.
>>>> Could you please try whether the attached patch fixes your system?
>>> Sorry, it still crashes.  I am running Ubuntu pre-6.10 (Gutsy -- the
>>> development version of the distro).  Maybe I should try "make
>>> mrproper" first?
>> make mrproper should not do any difference here.
>> I rather think you hit some ncurses bug.
>>
>> If you could add '-g' to HOSTCFLAGS in top-level Makefile
>> and then do:
>> rm scripts/kconfig/mconf.o scripts/kconfig/mconf
>> make menuconfig
>>
>> (to build mconf and to check that the error is still reproducible).
>> And then run it in a debugger like this:
>> gdb scripts/kconfig/mconf
>> run arch/x86_64/Kconfig
>>  ^^ replace with your actual arch
>>
>> Provoke the error and get a back-trace with 'bt'.
> 
> Hi Sam,
> 
> I can reproduce this bug on Frugalware Linux. 
> 
> Here the bt:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0xb7dc4143 in strlen () from /lib/libc.so.6
> (gdb) bt
> #0  0xb7dc4143 in strlen () from /lib/libc.so.6
> #1  0x0804fd60 in str_append (gs=0xbfe4f6e8, s=0x0) at scripts/kconfig/util.c:87
> #2  0x0804e0cb in expr_print (e=0x8e22df8, fn=0x804fda0 , data=0xbfe4f6e8, prevtoken=0) at scripts/kconfig/expr.c:1037
> #3  0x0804e1e7 in expr_gstr_print (e=0x8e22df8, gs=0xbfe4f6e8) at scripts/kconfig/expr.c:1099
> #4  0x0804a07e in get_symbol_str (r=0xbfe4f6e8, sym=0x8b54ee8) at scripts/kconfig/mconf.c:334
> #5  0x0804a363 in show_help (menu=0x8b54f88) at scripts/kconfig/mconf.c:738
> #6  0x0804acec in conf (menu=0x8b69480) at scripts/kconfig/mconf.c:781
> #7  0x0804a971 in conf (menu=0x8063c40) at scripts/kconfig/mconf.c:703
> #8  0x0804af8a in main (ac=Cannot access memory at address 0x0
> ) at scripts/kconfig/mconf.c:917
> 
> 
> Looks somewhat strange -> http://194.231.229.228/menuconfig.png
> 
> PS: This is without the patch you posted; I'll try with it in a bit.

The crash is still there, but the (null)s are all fixed by this patch.
 
Gabriel 
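Frame #1 of the backtrace shows str_append() being called with s=0x0. Frame #0 is strlen() dereferencing that NULL. One defensive option is to treat a NULL argument as an empty string at the append site. The sketch below does that for a hypothetical growable-string type. It is not kconfig's code or the patch from this thread, and the better fix may well be to stop expr_print() from passing NULL in the first place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, loosely echoing the "gs" seen in the backtrace. */
struct gstr {
    size_t len;   /* bytes allocated for s */
    char *s;      /* NUL-terminated contents */
};

static int gstr_init(struct gstr *gs)
{
    gs->s = malloc(1);
    if (!gs->s)
        return -1;
    gs->s[0] = '\0';
    gs->len = 1;
    return 0;
}

static int gstr_append(struct gstr *gs, const char *s)
{
    if (!s)          /* strlen(NULL) is undefined behaviour: the crash in frame #0 */
        return 0;    /* treat a missing string as empty instead */
    size_t cur = strlen(gs->s);
    size_t add = strlen(s);
    size_t need = cur + add + 1;
    if (need > gs->len) {
        char *p = realloc(gs->s, need);
        if (!p)
            return -1;
        gs->s = p;
        gs->len = need;
    }
    memcpy(gs->s + cur, s, add + 1);  /* copies the terminating NUL too */
    return 0;
}
```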


Re: CFS: some bad numbers with Java/database threading [FIXED]

2007-09-18 Thread Ingo Molnar

* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 09/14/2007 11:32 AM, Ingo Molnar wrote:
> > * Antoine Martin <[EMAIL PROTECTED]> wrote:
> > 
> >>> have an impact) Keep CONFIG_SCHED_DEBUG=y to be able to twiddle the
> >>> sysctl.
> >> It looks good now! Updated results here:
> >> http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests5-10msYield-noload.png
> >> http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests5-10msYield.png
> >> Compared with more kernels here - a bit more cluttered:
> >> http://devloop.org.uk/documentation/database-performance/Linux-Kernels/Kernels-ManyThreads-CombinedTests4-10msYield-noload.png
> >>
> >> Thanks Ingo!
> >> Does this mean that I'll have to keep doing:
> >> echo 1 > /proc/sys/kernel/sched_yield_bug_workaround
> >> Or are you planning on finding a more elegant solution?
> > 
> > just to make sure - can you get it to work fast with the 
> > -rc6+yield-patch solution too? (i.e. not CFS-devel) We need a (tested) 
> > solution for 2.6.23 and the CFS-devel patches are not for 2.6.23. I've 
> > attached below the latest version of the -rc6 yield patch - the switch 
> > is not dependent on SCHED_DEBUG anymore but always available.
> > 
> 
> Is this going to be merged? And will you be making the default == 1 or 
> just leaving it at 0, which forces people who want the older behavior 
> to modify the default?

Not at the moment. Antoine suggested that the workload is probably fine
and that the patch against -rc6 would have no clear effect anyway, so we
have nothing to merge right now. (Note that there's no "older behavior"
possible, unless we want to emulate all of the O(1) scheduler's
behavior.) We could still merge something like that patch, but a
clearer testcase is needed: the JVMs I have access to work fine.

Ingo
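For reference, the C equivalent of the `echo 1 > /proc/sys/kernel/sched_yield_bug_workaround` step Antoine mentions just opens the file and writes the value. That sysctl only exists on kernels carrying the yield patch discussed here. The helper name below is hypothetical, and the helper works on any path.

```c
#include <stdio.h>

/* Write an integer to a sysctl-style file, like "echo VALUE > PATH".
 * Returns 0 on success, -1 on failure. */
static int write_sysctl_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                    /* missing file or no permission */
    int ok = fprintf(f, "%d\n", value) > 0;
    /* stdio buffers the data, so the actual write(), and any error the
     * kernel reports for it, happens when fclose() flushes the stream. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```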


Re: Wasting our Freedom

2007-09-18 Thread Martin Schlemmer
On Tue, 2007-09-18 at 11:55 -0700, Can E. Acar wrote:
> Theodore Tso wrote:
> > On Mon, Sep 17, 2007 at 03:06:37PM -0700, Can E. Acar wrote:
> >> The only remaining issue is whether Nick & Jiri have enough
> >> original contributions to the code to be added to the Copyright.
> >>
> >> I believe this needs to be resolved between Reyk and Nick and Jiri.
> >>
> >> The main reason of Theo's message, linked earlier, was the
> >> lack of response on this issue. It seems that the SFLC is
> >> dismissing this issue, stalling its resolution by the
> >> developers.
> > 
> > OK, so all of this flaming, and digging up of "licenses ripped off",
> > and chaff thrown up in the air, and moaning and bewailing about
> > "theft", is now down to these two lines regarding Nick and Jiri:
> 
> Yes, quite an improvement, considering how it all started, don't you think?
> Pity it took so much pushing and dragging to get people to do the right
> thing.
> There is just one little step to go. It cannot be that hard, can it?
> 

Apparently.


> >> * Copyright (c) 2004-2007 Reyk Floeter <[EMAIL PROTECTED]>
> >> * Copyright (c) 2006-2007 Nick Kossifidis <[EMAIL PROTECTED]>
> >> * Copyright (c) 2007 Jiri Slaby <[EMAIL PROTECTED]>
> >> [snip rest of BSD license]
> > 
> > It's under a BSD license; what material difference does those two
> > lines make, for goodness sake?  It's under a BSD license, so it's not
> > like anything won't be "given back".
> 
> As a programmer, you sure would know what difference any "two lines"
> would make on your program. When it comes to law, you seem to lose
> that intuition.
> 
> 
> > Whether or not they have made
> > enough changes is really a question for the lawyers, and may
> > differ from one jurisdiction to another
> > --- but whether or not they have now, or maybe will not make until later ---
> 
> Well, they can add their names *anywhere* in the whole file, *except*
> these two lines. See, these lines have a whole different meaning
> when it comes to laws.  When they make sufficient contribution, they
> sure can add their names. What is so difficult to understand here?
> 

So, here is the actual commit of the code in Linville's wireless
networking development tree:

http://git.kernel.org/?p=linux/kernel/git/linville/wireless-dev.git;a=commitdiff;h=fb32e1730a91e39adcf06ed5254bfc5a65d17a9b

If I am not mistaken, it was Sunday afternoon, so probably 5/6 or more
of this thread consisting of more than 110 messages (according to my
inbox) to LKML was after this time.

As this already had the BSD license ...

Anyway, as for the changes, I am not going to check the original, but
the diff from the first commit up to now is here:

http://git.kernel.org/?p=linux/kernel/git/linville/wireless-dev.git;a=blobdiff;f=drivers/net/wireless/ath5k_hw.c;h=e4cc307e9590a71bcc8542c45dbd2caf3f9e8fe5;hp=f273c42d4004b81597e7cfc5f7eec757a7c52910;hb=everything;hpb=fb32e1730a91e39adcf06ed5254bfc5a65d17a9b

Running a diffstat shows:

 ath5k_hw.c |  344 +
 1 file changed, 165 insertions(+), 179 deletions(-)

But not having the original version, and as the other two lines are
already present, I am not going to look closer at the changes.




However, the question I wanted to ask, was this:

Can all those who still feel that there is a problem please go and
look at the original, compare it to the current version, and then determine (i.e.,
go ask a lawyer or some other appropriate person if need be) whether the
changes are enough of a contribution *BEFORE* posting again?

Pretty please with sugar on top?


Thanks,

M




Re: [patch 4/7] Immediate Values - i386 Optimization

2007-09-18 Thread Andi Kleen
On Tue, Sep 18, 2007 at 03:29:50PM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
> >   
> >> It's a pity that gas seems to generate plain 0x90 nops rather than
> >> long-nop forms here.  I thought it could do that.
> >> 
> >
> > .p2align does it
> 
> Just .p2align?  Not align, balign, org or skip?  Seems... strange.

The problem is that you cannot always safely jump into the middle of the
longer-form nops. So I suppose they didn't want to risk breaking older
code that relies on this.

-Andi
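Andi's point can be made concrete. With single-byte 0x90 padding, every offset is the start of an instruction, so a jump into the padding always lands on a valid instruction. A long NOP covers several bytes with one instruction, and offsets inside it are not instruction boundaries. The sketch below fills padding either way, using the multi-byte NOP encodings recommended in Intel's manual, and records which offsets are instruction starts. It illustrates the trade-off only and is not how gas chooses its padding.

```c
#include <stddef.h>
#include <string.h>

/* Recommended multi-byte NOP encodings, indexed by length 1..8. */
static const unsigned char long_nops[9][8] = {
    [1] = {0x90},
    [2] = {0x66, 0x90},
    [3] = {0x0f, 0x1f, 0x00},
    [4] = {0x0f, 0x1f, 0x40, 0x00},
    [5] = {0x0f, 0x1f, 0x44, 0x00, 0x00},
    [6] = {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},
    [7] = {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00},
    [8] = {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Fill n bytes of padding, either with 0x90 only or with the longest NOPs
 * that fit. boundary[i] is set to 1 when an instruction starts at offset i,
 * i.e. when jumping to buf + i lands on a real instruction. */
static void fill_padding(unsigned char *buf, unsigned char *boundary,
                         size_t n, int use_long)
{
    size_t off = 0;
    memset(boundary, 0, n);
    while (off < n) {
        size_t len = use_long ? (n - off > 8 ? 8 : n - off) : 1;
        memcpy(buf + off, long_nops[len], len);
        boundary[off] = 1;
        off += len;
    }
}
```

With 7 bytes of padding, the 0x90 variant yields seven instruction starts. The long-NOP variant yields a single one at offset 0.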


Re: 2.6.23-rc6-mm1

2007-09-18 Thread Gabriel C
Sam Ravnborg wrote:
> On Tue, Sep 18, 2007 at 03:42:58PM -0400, Miles Lane wrote:
>> On 9/18/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote:
>>> Hi Miles.
>>> On Tue, Sep 18, 2007 at 11:27:23AM -0400, Miles Lane wrote:
>>>> Selecting Help for "Subarchitecture Type" causes "make menuconfig" to
>>>> crash, and the bash display settings have to be reset.
>>> Not reproducible here.
>>> But I noticed that we pass a null pointer to a vsprintf function which
>>> in the cases you pointed out printed a (null) on my system.
>>> Could you please try whether the attached patch fixes your system?
>> Sorry, it still crashes.  I am running Ubuntu pre-6.10 (Gutsy -- the
>> development version of the distro).  Maybe I should try "make
>> mrproper" first?
> 
> make mrproper should not make any difference here.
> I rather think you hit some ncurses bug.
> 
> If you could add '-g' to HOSTCFLAGS in top-level Makefile
> and then do:
> rm scripts/kconfig/mconf.o scripts/kconfig/mconf
> make menuconfig
> 
> (to build mconf and to check that the error is still reproducible).
> And then run it in a debugger like this:
> gdb scripts/kconfig/mconf
> run arch/x86_64/Kconfig
>  ^^ replace with your actual arch
> 
> Provoke the error and get a back-trace with 'bt'.

Hi Sam,

I can reproduce this bug on Frugalware Linux. 

Here the bt:

Program received signal SIGSEGV, Segmentation fault.
0xb7dc4143 in strlen () from /lib/libc.so.6
(gdb) bt
#0  0xb7dc4143 in strlen () from /lib/libc.so.6
#1  0x0804fd60 in str_append (gs=0xbfe4f6e8, s=0x0) at scripts/kconfig/util.c:87
#2  0x0804e0cb in expr_print (e=0x8e22df8, fn=0x804fda0 , data=0xbfe4f6e8, prevtoken=0) at scripts/kconfig/expr.c:1037
#3  0x0804e1e7 in expr_gstr_print (e=0x8e22df8, gs=0xbfe4f6e8) at scripts/kconfig/expr.c:1099
#4  0x0804a07e in get_symbol_str (r=0xbfe4f6e8, sym=0x8b54ee8) at scripts/kconfig/mconf.c:334
#5  0x0804a363 in show_help (menu=0x8b54f88) at scripts/kconfig/mconf.c:738
#6  0x0804acec in conf (menu=0x8b69480) at scripts/kconfig/mconf.c:781
#7  0x0804a971 in conf (menu=0x8063c40) at scripts/kconfig/mconf.c:703
#8  0x0804af8a in main (ac=Cannot access memory at address 0x0
) at scripts/kconfig/mconf.c:917


Looks somewhat strange -> http://194.231.229.228/menuconfig.png

PS: This is without the patch you posted; I'll try with it in a bit.


> 
> Thanks,
>   Sam

Gabriel


Re: [PATCH] revert ath5k ioread32()/iowrite32() usage - use readl()/writel(), we're MMIO-only

2007-09-18 Thread Jeff Garzik

Benjamin Herrenschmidt wrote:

> To be more precise, a platform has every right to return some kind of
> "token" from ioport_map/pci_iomap that encodes the type of address, and
> that is -different- from what a normal ioremap does. In which case, you
> will -not- be able to use readb/writeb & co. on such a token.
>
> The fact that current implementations seem to return something for MMIO
> that is equivalent to what ioremap returns is an accident and cannot be
> relied upon.



Fair enough.  It's easy enough to change ath5k to use ioremap (or
pci_ioremap).


Jeff
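The sketch below is a toy model of what Benjamin describes. It loosely follows the generic lib/iomap.c approach of encoding port numbers as small cookie values. The constant values and function names are illustrative only, and a real platform is free to use a completely different encoding. Passing the PIO cookie below to readl() would treat 0x103f8 as a virtual address. Only an accessor that decodes the cookie first, in the way ioread32() is expected to, handles both spaces.

```c
#include <stdint.h>

/* Illustrative constants in the spirit of lib/iomap.c. */
#define PIO_OFFSET   0x10000UL   /* cookies in [PIO_OFFSET, PIO_RESERVED) are ports */
#define PIO_MASK     0x0ffffUL
#define PIO_RESERVED 0x40000UL   /* anything at or above this is treated as MMIO */

enum space { SPACE_PIO, SPACE_MMIO, SPACE_BAD };

/* What a platform might return from ioport_map(): a token, not a pointer. */
static uintptr_t model_ioport_map(unsigned long port)
{
    return (uintptr_t)(port & PIO_MASK) + PIO_OFFSET;
}

/* The decoding step an ioread32()-style accessor performs before touching
 * hardware; a plain readl() skips this and assumes an MMIO address. */
static enum space model_decode(uintptr_t cookie, unsigned long *where)
{
    if (cookie >= PIO_RESERVED) {
        *where = (unsigned long)cookie;              /* MMIO virtual address */
        return SPACE_MMIO;
    }
    if (cookie >= PIO_OFFSET) {
        *where = (unsigned long)(cookie & PIO_MASK); /* I/O port number */
        return SPACE_PIO;
    }
    return SPACE_BAD;
}
```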




Re: [PATCH] Clarify pci_iomap() usage for MMIO-only devices

2007-09-18 Thread Linus Torvalds


On Wed, 19 Sep 2007, Benjamin Herrenschmidt wrote:
> 
> Also, I've been told that modern x86 chipsets have the ability to remap
> IO space in the CPU physical address space. Is that true? That would
> allow even x86 to get rid of the condition and just use some magic
> offset at map time.

I've not seen that, but I wouldn't be entirely surprised if IO 
virtualization eventually causes something like this to happen: 
virtualizing PIO is just damn painful right now, due to the lack of any 
way to remap it.

I *think* you may be confusing this with the PCI config cycles, where the new
MMIO configuration was introduced (for similar virtualization reasons).
But it's also possible that this is one of those undocumented areas and
CPUs actually do have some IO remapping facility.

Linus


Re: [Lguest] [PATCH] Introduce "used_vectors" bitmap which can be used to reserve vectors.

2007-09-18 Thread ron minnich
On 9/13/07, Rusty Russell <[EMAIL PROTECTED]> wrote:
> Hi Andi and everyone,
>
> Wanted to get your thoughts on this patch.  lguest now supports plan9
> guests which use 0x40 for system calls.  We want to let the guests use
> that vector if available, but have no way to stop io_apic from
> clobbering it.  This does that, and also simplifies the current code a
> little.

I can confirm that this patch supports Plan 9 perfectly as a guest.

thanks

ron
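The sketch below works through the idea behind the patch in userspace. It keeps one bit per interrupt vector, marks vectors such as 0x40 as reserved, and has the allocator skip them. The kernel would presumably use its own bitmap helpers such as set_bit()/test_bit(). Apart from the used_vectors name taken from the patch title, the functions and the simple linear allocator are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_VECTORS    256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS         ((NR_VECTORS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per vector; a set bit means "not available for allocation". */
static unsigned long used_vectors[WORDS];

static void reserve_vector(unsigned v)
{
    used_vectors[v / BITS_PER_WORD] |= 1UL << (v % BITS_PER_WORD);
}

static bool vector_used(unsigned v)
{
    return (used_vectors[v / BITS_PER_WORD] >> (v % BITS_PER_WORD)) & 1UL;
}

/* Hand out the next free vector at or after 'start', skipping anything
 * reserved up front (e.g. 0x40 for Plan 9 guest system calls).
 * Returns -1 when no vector is left. */
static int alloc_vector(unsigned start)
{
    for (unsigned v = start; v < NR_VECTORS; v++) {
        if (!vector_used(v)) {
            reserve_vector(v);
            return (int)v;
        }
    }
    return -1;
}
```

Reserving 0x40 before any device interrupts are set up is what keeps an io_apic-style allocator from clobbering it.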


Re: [patch 4/7] Immediate Values - i386 Optimization

2007-09-18 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>>   
>>> It's a pity that gas seems to generate plain 0x90 nops rather than
>>> long-nop forms here.  I thought it could do that.
>>> 
>> .p2align does it
> 
> Just .p2align?  Not align, balign, org or skip?  Seems... strange.
> 

Probably it works for .align and .balign too, but not .org.

-hpa


Re: [patch 4/7] Immediate Values - i386 Optimization

2007-09-18 Thread Jeremy Fitzhardinge
Andi Kleen wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>   
>> It's a pity that gas seems to generate plain 0x90 nops rather than
>> long-nop forms here.  I thought it could do that.
>> 
>
> .p2align does it

Just .p2align?  Not align, balign, org or skip?  Seems... strange.

J

