Re: [PATCH] fs/ocfs2/: removed unneeded initial value and function's return value

2007-09-26 Thread Mark Fasheh
On Thu, Sep 27, 2007 at 02:10:04AM +0800, Denis Cheng wrote:
> Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>

Committed, thanks.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] Avoid taking waitqueue lock in dmapool

2007-09-26 Thread David Miller
From: Matthew Wilcox <[EMAIL PROTECTED]>
Date: Wed, 26 Sep 2007 15:01:16 -0400

> With one trivial change (taking the lock slightly earlier on wakeup
> from schedule), all uses of the waitq are under the pool lock, so we
> can use the locked (or __) versions of the wait queue functions, and
> avoid the extra spinlock.
> 
> Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>

This one looks good to me:

Acked-by: David S. Miller <[EMAIL PROTECTED]>


Re: [PATCH 01/24] CRED: Introduce a COW credentials record

2007-09-26 Thread David Howells
Al Viro <[EMAIL PROTECTED]> wrote:

> Umm...  Perhaps a better primitive would be "make sure that our cred is
> not shared with anybody, creating a copy and redirecting reference to
> it if needed".

I wanted to make the point that once a cred record was made live - i.e. exposed
to the rest of the system - it should not be changed.  I'll think about
rewording that.  Also "making sure that our cred is not shared" does not work
for cachefiles where we actually want to create a new set of creds.
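
The primitive Al describes ("make sure that our cred is not shared with
anybody, creating a copy and redirecting reference to it if needed") can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: struct cred_sketch, its usage counter, and cred_unshare() are
hypothetical names, not anything from the patch series, and the counter is
not atomic, so the sketch is single-threaded only. It also does not cover the
cachefiles case David mentions, where a brand-new set of creds is wanted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a credentials record; not the kernel's struct. */
struct cred_sketch {
	int usage;		/* number of references to this record */
	unsigned int fsuid;
	unsigned int fsgid;
};

/*
 * Ensure the record that *slot points at is private to the caller.
 * If it is shared, copy it, drop one reference on the old record and
 * redirect *slot to the copy.  Returns the writable record, or NULL if
 * the allocation fails (in which case *slot is left untouched).
 */
static struct cred_sketch *cred_unshare(struct cred_sketch **slot)
{
	struct cred_sketch *old, *copy;

	assert(slot && *slot && (*slot)->usage > 0);
	old = *slot;

	if (old->usage == 1)
		return old;	/* already private: safe to modify in place */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;

	*copy = *old;		/* the live, shared record is never modified */
	copy->usage = 1;
	old->usage--;
	*slot = copy;
	return copy;
}
```

A caller would run cred_unshare() on its own reference before changing fsuid
or fsgid; other holders of the old record keep seeing the old values.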

Al Viro <[EMAIL PROTECTED]> wrote:

> > In addition, the default setting of i_uid and i_gid to fsuid and fsgid has
> > been moved from the callers of new_inode() into new_inode() itself.
> 
> I don't think it's safe; better do something trivial like
>   own_inode(inode)
> that would set these (and that's a good split-up candidate, to go in front
> of the series).

I think you're probably right.  I commented on this at the bottom of the cover
note.  One thing I could do is provide a variant on own_inode() that takes a
parent dir inode pointer and does the sticky GID thing - something that several
filesystems do.

> FWIW, the main weakness here is the need of update_current_cred() splattered
> all over the entry points.

Yeah.  I'm not keen on that, but I'm even less keen on sticking something in
everywhere that the cred struct is consulted.  I don't like the idea of making
it implicit in the dereference of current->cred either, and neither does Linus.

> Two problems:
>   a) it's a bug source (somebody adds a syscall and forgets to
> add that call / somebody modifies syscall guts and doesn't notice that
> it needs to be added).

It's simpler to check for its existence at the beginning of a syscall.

>   b) it's almost always doing nothing, so being lazier would be
> better (event numbers checked in the inlined part, perhaps?)

Linus is against having an inlined part:-/

> The former would be more robust if it had been closer to the places where
> we get to passing current->cred to functions.

You can't do it there because there may be an override in effect.  Or, rather,
if you do do it there, you have to not do it if there's an override set.

> The latter...  When do we actually step into this kind of situation (somebody
> changing keys on us)

There are four cases:

 (1) The request_key() upcall forces us to create a thread keyring.

 (2) The request_key() upcall forces us to create a process keyring.

 (3) A sibling thread instantiates our common process keyring.

 (4) A sibling thread replaces our common session keyring.

The first three could be trivially avoided by creating the thread and process
keyrings in advance, (1) and (2) at request_key() time, (3) at clone time.  It
eats extra resources, but it's easy.

The fourth is more tricky.  A sibling thread can replace our common session
keyring on us at any time.  I suppose we could decree that you can't replace
your session keyring if you've got multiple threads.  That ought to be simple
enough, and I suspect it won't have much impact.

The alternatives are (b) not to include the keyrings in the cred stuff, though
they are relevant; and (c) to make it possible for sibling threads to change
each other's creds.  I'm really not keen on (c) as that means you can't just
dereference your own creds directly without taking locks and stuff.

> and what's the right semantics here?  E.g. if it happens
> in the middle of long read(), do we want to keep using the original keys?

If you're in the middle of a long read(), you should be using the cred struct
attached to file->f_cred, not current->cred, and so that problem should not
arise.

As for long ops that aren't I/O operations on file descriptors, I think it's
reasonable for you to do the entire op with the creds you started off doing it
with.

Don't forget that there's also the cap_effective stuff, which it appears can
be changed by someone other than the target process.

David


Re: [PATCH 2/4] dmapool: Validate parameters to dma_pool_create

2007-09-26 Thread roel
Matthew Wilcox wrote:
> On Wed, Sep 26, 2007 at 09:47:41PM +0200, roel wrote:
>> The brackets in the first if/else are not required, and you could combine 
>> the two statements:
> 
> You mean braces, not brackets.  And I find this little fetish of yours
> highly disturbing.  I prefer to use braces, and will continue to do so,
> regardless of your nitpicking.

Well, as you say, you like the braces, so it appears to be your fetish. Of
course you don't have to make any changes; I am just reporting them because
they aren't needed. No need for the offensive tone either.

Roel


Re: [PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-26 Thread Mingming Cao
On Wed, 2007-09-26 at 12:54 -0700, Andrew Morton wrote:
> On Fri, 21 Sep 2007 16:13:56 -0700
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > Convert kmalloc to kzalloc() and get rid of the memset().
> 
> I split this into separate ext3/jbd and ext4/jbd2 patches.  It's generally
> better to raise separate patches, please - the ext3 patches I'll merge
> directly but the ext4 patches should go through (and be against) the ext4
> devel tree.
> 
Sure. The patches (including ext3/jbd and ext4/jbd2) were already merged into
the ext4 devel tree; I will remove the ext3/jbd part from the ext4 devel
tree.

> I fixed lots of rejects against the already-pending changes to these
> filesystems.
> 
> You forgot to remove the memsets in both start_this_handle()s.
> 
Thanks for catching this.

Mingming



Re: why network devices don't do reference counting? (Re: [PATCH] Module use count must be updated as bridges are created/destroyed)

2007-09-26 Thread Stephen Hemminger
On Wed, 26 Sep 2007 23:06:53 +0200
Oleg Verych <[EMAIL PROTECTED]> wrote:

> * Wed, 26 Sep 2007 08:37:05 -0700
> * Organization: Linux Foundation
> >
> > On Wed, 26 Sep 2007 08:53:27 +0100
> > "Jan Beulich" <[EMAIL PROTECTED]> wrote:
> >
> >> Otherwise 'modprobe -r' on a module having a dependency on bridge will
> >> implicitly unload bridge, bringing down all connectivity that was
> >> using bridges.
> >> 
> >> Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
> >>
> >
> > No, network devices don't do reference counting.
> 
> Could you explain why, please?
> 
> After `udevd` on boot loads lots of unused crap, I surrendered, and use
> $(rmmod `lsmod | just first column`). Networking bravely wipes away. OK,
> there are lots of configs: udev, hotplug, modprobe, that somebody might
> like to fix. But it came to the end with me. I just don't care. So,
> please answer :)
> 

For hotplug and other reasons, the network developers decided that being
able to remove a network module at any time was a good thing. It works.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


State of the Linux PCI Subsystem for 2.6.23-rc8

2007-09-26 Thread Greg KH
Here's a summary of the current state of the Linux PCI subsystem, as of
2.6.23-rc8.

If the information in here is incorrect, or anyone knows of any
outstanding issues not listed here, please let me know.

List of outstanding regressions from 2.6.22:
- none known.

List of outstanding regressions from older kernel versions:
- none known.


If interested, the list of all currently open PCI bugs can be seen at:
http://bugzilla.kernel.org/showdependencytree.cgi?id=5829&hide_resolved=1


Future patches that are currently in my quilt tree (as found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
) for the PCI subsystem are as follows.  All of these will be submitted
for inclusion into 2.6.24, except as noted.  The diffstat of these
patches is included at the bottom of this message for those that are
interested.

- various pci quirks for different devices
- pci hotplug driver kthread conversion
- pci hotplug bugfixes
- pci express hotplug tweaks and fixes
- minor bugfixes and cleanups
- MSI documentation update

- pci bridge device rework.  Note, this was reported to break
  Andrew's x86-64 box, but he lost the picture he took of the
  oops.  Others reported that it worked just fine on their
  boxes.  I'm still a little hesitant about sending this to 2.6.24
  until I can track down what happened here.

There are no PCI api changes scheduled for 2.6.24.


thanks,

greg k-h



Diffstat of the current pci-2.6 quilt queue:

 Documentation/DMA-API.txt |3 
 Documentation/MSI-HOWTO.txt   |   69 
 arch/i386/kernel/pci-dma.c|3 
 arch/i386/kernel/reboot_fixups.c  |6 +
 arch/i386/pci/common.c|   10 +
 arch/i386/pci/fixup.c |   47 
 arch/i386/pci/irq.c   |   39 +++
 arch/x86_64/kernel/pci-dma.c  |1 
 drivers/pci/bus.c |   17 ++-
 drivers/pci/hotplug/cpqphp_core.c |2 
 drivers/pci/hotplug/cpqphp_ctrl.c |   74 -
 drivers/pci/hotplug/ibmphp_hpc.c  |   57 ++
 drivers/pci/hotplug/pciehp_core.c |   24 +---
 drivers/pci/hotplug/pciehp_ctrl.c |   20 +--
 drivers/pci/hotplug/pciehp_hpc.c  |  209 ++
 drivers/pci/hotplug/pciehp_pci.c  |   24 ++--
 drivers/pci/pci-driver.c  |3 
 drivers/pci/pci.c |5 
 drivers/pci/pci.h |2 
 drivers/pci/pcie/Kconfig  |9 -
 drivers/pci/probe.c   |   82 +++---
 drivers/pci/quirks.c  |   43 ---
 drivers/pci/remove.c  |6 -
 include/linux/pci.h   |4 
 include/linux/pci_ids.h   |3 
 include/linux/pci_regs.h  |6 -
 lib/swiotlb.c |1 
 27 files changed, 316 insertions(+), 453 deletions(-)


Re: [patch 1/7] Extended crashkernel command line

2007-09-26 Thread Bernhard Walle
* Oleg Verych <[EMAIL PROTECTED]> [2007-09-26 20:18]:
> > 
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1172,33 +1172,50 @@ static int __init parse_crashkernel_mem(
> > do {
> > unsigned long long start = 0, end = ULLONG_MAX;
> > unsigned long long size = -1;
> 
> no need to assign values here, unless you plan to use them in case
> of `return -EINVAL', but i can not see that,

What about this (and yes, I tested with some wrong strings with Qemu):



This patch improves error handling in parse_crashkernel_mem() by comparing
the return pointer of memparse() with the input pointer and also replaces
all printk(KERN_WARNING msg) with pr_warning(msg).


Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>

---
 kernel/kexec.c |   54 +++---
 1 file changed, 39 insertions(+), 15 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1167,44 +1167,59 @@ static int __init parse_crashkernel_mem(
unsigned long long  *crash_size,
unsigned long long  *crash_base)
 {
-   char *cur = cmdline;
+   char *cur = cmdline, *tmp;
 
/* for each entry of the comma-separated list */
do {
-   unsigned long long start = 0, end = ULLONG_MAX;
-   unsigned long long size = -1;
+   unsigned long long start, end = ULLONG_MAX, size;
 
/* get the start of the range */
-   start = memparse(cur, &cur);
+   start = memparse(cur, &tmp);
+   if (cur == tmp) {
+   pr_warning("crashkernel: Memory value expected\n");
+   return -EINVAL;
+   }
+   cur = tmp;
if (*cur != '-') {
-   printk(KERN_WARNING "crashkernel: '-' expected\n");
+   pr_warning("crashkernel: '-' expected\n");
return -EINVAL;
}
cur++;
 
/* if no ':' is here, than we read the end */
if (*cur != ':') {
-   end = memparse(cur, &cur);
+   end = memparse(cur, &tmp);
+   if (cur == tmp) {
+   pr_warning("crashkernel: Memory "
+   "value expected\n");
+   return -EINVAL;
+   }
+   cur = tmp;
if (end <= start) {
-   printk(KERN_WARNING "crashkernel: end <= start\n");
+   pr_warning("crashkernel: end <= start\n");
return -EINVAL;
}
}
 
if (*cur != ':') {
-   printk(KERN_WARNING "crashkernel: ':' expected\n");
+   pr_warning("crashkernel: ':' expected\n");
return -EINVAL;
}
cur++;
 
-   size = memparse(cur, &cur);
-   if (size < 0) {
-   printk(KERN_WARNING "crashkernel: invalid size\n");
+   size = memparse(cur, &tmp);
+   if (cur == tmp) {
+   pr_warning("Memory value expected\n");
+   return -EINVAL;
+   }
+   cur = tmp;
+   if (size >= system_ram) {
+   pr_warning("crashkernel: invalid size\n");
return -EINVAL;
}
 
/* match ? */
-   if (system_ram >= start  && system_ram <= end) {
+   if (system_ram >= start && system_ram <= end) {
*crash_size = size;
break;
}
@@ -1213,8 +1228,15 @@ static int __init parse_crashkernel_mem(
if (*crash_size > 0) {
while (*cur != ' ' && *cur != '@')
cur++;
-   if (*cur == '@')
-   *crash_base = memparse(cur+1, &cur);
+   if (*cur == '@') {
+   cur++;
+   *crash_base = memparse(cur, &tmp);
+   if (cur == tmp) {
+   pr_warning("Memory value expected "
+   "after '@'\n");
+   return -EINVAL;
+   }
+   }
}
 
return 0;
@@ -1234,8 +1256,10 @@ static int __init parse_crashkernel_simp
char *cur = cmdline;
 
 	*crash_size = memparse(cmdline, &cur);
-   if (cmdline == cur)
+   if (cmdline == cur) {
+   pr_warning("crashkernel: memory value expected\n");
return -EINVAL;
+   }
 
if (*cur == '@')
 		*crash_base = memparse(cur+1, &cur);
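
The check the patch description relies on (a parser that returns its end
pointer unchanged has consumed nothing) can be tried outside the kernel. The
following is a minimal userspace sketch, not the kernel code: parse_mem() is
a hypothetical stand-in for memparse() built on strtoull(), and
parse_range_entry() handles a single "start-[end]:size" entry only, without
the system_ram comparison or the '@' offset.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Stand-in for memparse(): a number with an optional K/M/G suffix.
 * On "nothing parsed", *end is left equal to s, as with strtoull(). */
static unsigned long long parse_mem(const char *s, char **end)
{
	unsigned long long v;
	unsigned int shift;

	if (!isdigit((unsigned char)*s)) {	/* refuse signs and whitespace */
		*end = (char *)s;
		return 0;
	}
	v = strtoull(s, end, 0);
	switch (**end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: return v;
	}
	(*end)++;
	return v << shift;
}

/* Parse one "start-[end]:size" entry; returns 0 or -EINVAL. */
static int parse_range_entry(const char *entry, unsigned long long *start,
			     unsigned long long *end, unsigned long long *size)
{
	const char *cur = entry;
	char *tmp;

	assert(entry && start && end && size);

	*start = parse_mem(cur, &tmp);
	if (tmp == cur)			/* no memory value where one is required */
		return -EINVAL;
	cur = tmp;
	if (*cur != '-')
		return -EINVAL;
	cur++;

	*end = ULLONG_MAX;		/* open-ended range unless an end follows */
	if (*cur != ':') {
		*end = parse_mem(cur, &tmp);
		if (tmp == cur)
			return -EINVAL;
		cur = tmp;
		if (*end <= *start)
			return -EINVAL;
	}
	if (*cur != ':')
		return -EINVAL;
	cur++;

	*size = parse_mem(cur, &tmp);
	if (tmp == cur)
		return -EINVAL;
	return 0;
}
```

With this sketch, "512M-2G:64M" parses successfully, while "-2G:64M" and
"512M-2G:" are both rejected because a memory value is missing.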

State of the Linux USB Subsystem for 2.6.23-rc8

2007-09-26 Thread Greg KH
Here's a summary of the current state of the Linux USB subsystem, as of
2.6.23-rc8.

If the information in here is incorrect, or anyone knows of any
outstanding issues not listed here, please let me know.

List of outstanding regressions from 2.6.22:
- none known.

List of outstanding regressions from older kernel versions:
- none known.

If interested, the list of all currently open USB bugs can be seen at:
http://bugzilla.kernel.org/showdependencytree.cgi?id=5089&hide_resolved=1

Yeah, there are way too many there, I've been really slack in trying to
work through them.  If anyone wants to help out, feel free :)


Future patches that are currently in my quilt tree (as found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
) for the USB subsystem are as follows.  All of these will be submitted
for inclusion into 2.6.24, except as noted.  The diffstat of these
patches is included at the bottom of this message for those that are
interested.
- usbmon fixups and cleanups and documentation update
- usblp cleanups and tweaks
- sisusb2vga lindenting and other janitorial cleanups
- urb->status reworks in the host controller drivers to make the
  removal of that field in the future much easier.
- usb gadget driver cleanups
- new drivers/usb/serial/ch341.c driver
- kobil_sct driver reworking
- ueagle driver updates
- new device ids added
- new unusual devs storage quirks added
- removal of all of the USB_QUIRK_NO_AUTOSUSPEND entries as they
  no longer make any sense.
- USB authorization changes which allow userspace to prevent a
  USB device from being used by the kernel, if it so desires.
  This is part of the slow merge of the USB Wireless work.
- lots of small tweaks and bugfixes and reworks in the
  usb-serial drivers due to some tty reworks and auditing.
- the usbserial port is now shown in sysfs for the individual
  usb-serial bus devices.
- lots of rework of the internal apis for the USB host
  controllers to make things simpler and easier (hopefully.)
- suspend tweaks and reworks to make USB devices behave better
- other minor fixes in the USB core and drivers.


New USB driver api changes for 2.6.24:
usb_urb_dir_in() and usb_urb_dir_out() have been added to test
the direction of an urb.

Note, there are over 100 patches in the USB queue, so I might have
missed a few things in reviewing them by hand right now.  If I failed to
describe your patch that is already in the queue, and you feel it is
important for everyone to know about, please feel free to add to the
above list.  I did not purposefully mean to exclude anything, merely try
to summarize things.

thanks,

greg k-h



Diffstat of the current usb-2.6 quilt queue:

 Documentation/usb/authorization.txt|   92 +
 Documentation/usb/usb-serial.txt   |   11 
 Documentation/usb/usbmon.txt   |9 
 drivers/usb/Makefile   |   24 
 drivers/usb/atm/cxacru.c   |   43 
 drivers/usb/atm/speedtch.c |3 
 drivers/usb/atm/ueagle-atm.c   | 1398 +++--
 drivers/usb/class/usblp.c  |  120 +-
 drivers/usb/core/config.c  |   24 
 drivers/usb/core/devio.c   |   77 -
 drivers/usb/core/driver.c  |   31 
 drivers/usb/core/endpoint.c|1 
 drivers/usb/core/generic.c |   30 
 drivers/usb/core/hcd.c |  908 ++
 drivers/usb/core/hcd.h |   38 
 drivers/usb/core/hub.c |  273 -
 drivers/usb/core/message.c |   52 -
 drivers/usb/core/quirks.c  |   81 -
 drivers/usb/core/sysfs.c   |   39 
 drivers/usb/core/urb.c |  114 +-
 drivers/usb/core/usb.c |   40 
 drivers/usb/core/usb.h |7 
 drivers/usb/gadget/amd5536udc.c|7 
 drivers/usb/gadget/dummy_hcd.c |   97 --
 drivers/usb/gadget/ether.c |  147 +--
 drivers/usb/gadget/file_storage.c  |  249 ++---
 drivers/usb/gadget/fsl_usb2_udc.c  |9 
 drivers/usb/gadget/gmidi.c |   80 -
 drivers/usb/gadget/inode.c |   44 
 drivers/usb/gadget/omap_udc.c  |   10 
 drivers/usb/gadget/serial.c|  166 +--
 drivers/usb/gadget/zero.c  |  239 ++--
 drivers/usb/host/ehci-hcd.c|   14 
 drivers/usb/host/ehci-pci.c|5 
 drivers/usb/host/ehci-ps3.c|2 
 drivers/usb/host/ehci-q.c  |  115 +-
 drivers/usb/host/ehci-sched.c  |   47 
 

State of the Linux Driver Core Subsystem for 2.6.23-rc8

2007-09-26 Thread Greg KH
Here's a summary of the current state of the Linux Driver core
subsystem, as of 2.6.23-rc8.

If the information in here is incorrect, or anyone knows of any
outstanding issues not listed here, please let me know.

List of outstanding regressions from 2.6.22:
- none known.

List of outstanding regressions from older kernel versions:
- none known.


There are no currently open Driver core or sysfs bugs in bugzilla.


Future patches that are currently in my quilt tree (as found at
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
) for the Driver core subsystem are as follows.  All of these will be
submitted for inclusion into 2.6.24, except as noted.  The diffstat of
these patches is included at the bottom of this message for those that
are interested.

- HOWTO ja_JP updates
- metric boatload of changes and tweaks to sysfs from Tejun.
  These clean up the internal usage and implementation of sysfs,
  and split sysfs from the kobject model a lot, fixing a lot of
  problems along the way.  I can't thank him enough for this
  work.
- cleaned up the usage of struct subsystem some more by removing
  some functions that are not needed, or only used by the driver
  core.  Details on the api changes are below
- cleaned up the usage of the kobject->name field and removed it
  entirely.  This saves us some size on every kobject which
  really adds up on those s390 31 bit boxes with 20,000
  different block devices.
- uevent environment variable handling has been made simpler,
  and now hopefully, almost impossible to get wrong.  Previously
  we were forcing every subsystem to open-code a lot of this
  logic.  Thanks to Kay for this work.
- global list of sysdev_drivers is removed, as no one was using
  it at all.
- the number of legacy ptys can now be dynamically specified on
  the kernel command line, allowing people who refuse to fix
  their old applications to be able to still run on newer
  distros that want to limit this number at build time.
- dmi sysfs code cleanups and fixes
- block devices are moved from /sys/block to /sys/class/block.
  This patch has been hanging around for almost a year now, and
  hopefully we have worked out all of the kinks and userspace
  boot breakages.  If anyone has any problems with this, please
  let me and Kay know about it.  I'm hesitant to include it in
  2.6.24 as it has recently changed and needs more testing.
  Maybe it will go into 2.6.25.  Oh, udev has been able to
  handle this for over a year, so there should not be any
  problems with distros that are still supported.  It can also
  be turned off with the CONFIG_SYSFS_DEPRECATED build option
  for older distros.
- new debugfs functions for people who like hex numbers
- platform devices now have the "platform:" alias added to their
  modalias to fix the recursive modprobe loops that Red Hat,
  OLPC, SuSE, and Debian have reported in the current code.
- some struct class_device to struct device conversions were
  done for video and some various char drivers.  More of these
  are on the way.
- cdev kobject name cleanups.  The kobject name of a cdev does
  not do anything, so it should not be set.
- firmware collision fix between i2c and i2c-dev.
- uevent files for bus and driver have been added.  SuSE has
  shipped this for quite some time now.
- if CONFIG_HOTPLUG is not enabled, some more space is saved in
  the uevent code.
- the path to the uevent helper program (traditionally
  /sbin/hotplug) can be specified in the kernel config.  This
  keeps the kernel from having to call a program that is not
  present 300+ times before init starts up, speeding up boot
  time in theory.

New driver core api changes for 2.6.24:
- debugfs_create_x8(), debugfs_create_x16(), and
  debugfs_create_x32() have been added
- struct kobject.name field has been removed.  Use
  kobject_set_name() and kobject_name() to set and get the name.
- uevent callbacks in the driver core and for kobjects have been
  changed from:
	int (*uevent)(struct device *dev, char **envp, int num_envp, char *buffer, int buffer_size);
  to:
int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
  and struct kobj_uevent_env has been created to make it simpler
  to create environment variables for uevents.
- struct platform_device.id has changed from an u32 to an int.
- the following functions have been removed either entirely or
  from the global namespace:

Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
>> Please try the following debug patch to let us know what is going on.
>>
>>  -hpa
> 
>> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
>> index 1a2e62d..a0ccf29 100644
>> --- a/arch/i386/boot/memory.c
>> +++ b/arch/i386/boot/memory.c
>> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
>>"=m" (*desc)
>>  : "D" (desc), "a" (0xe820));
>>  
>> +printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
>> +   err, id, next,
>> +   (unsigned int)desc->addr,
>> +   (unsigned int)desc->size,
>> +   desc->type);
>> +
>>  if (err || id != SMAP)
>>  break;
> 
> Okay, we have clarity.   Here is the output
> 
> e820: err 0 id 0x534d4150 next 15476 :0009fc00 1
> e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2
> e820: err 0 id 0x534d4150 next 15516 000e:0002 2
> e820: err 0 id 0x0e7b next 11536 0010:0e6b 1
> 
> In the last entry,  id is obviously wrong (it should be 'SMAP' or
> 0x534d4150).  This is the BIOS bug.
> 
> Here's the reason why this bothers us now.  In the old assembly code,
> if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
> code.  In the new code in memory.c, if id != SMAP, we break out of the
> int15 loop, and return boot_params.e820_entries, which in our case is
> 3.  detect_memory() considers this to be successful, and no attempt to
> parse e801 is made.
> 
> So that's where the problem is - in the old code with the buggy BIOS, we
> punted to reading the e801 information, and that was enough to keep us 
> going.   In the new code, we allow a partial table to be used, and we
> blow up.
> 
> Attached is a patch to fix this - it returns -1 on error, and only sets
> boot_params.e820_entries to be non-zero if we have something useful
> in it.  This punts the detection to the e801 code, which is then
> successful.
> 
> This fixes the problem with the DB800, and so it probably should
> with the other Geode platforms affected by this.
> 
> Many thanks to hpa for the guiding hand.
> 

This patch is obviously wrong.  There are a lot of e820 BIOSen out there
that terminate with CF=1, and that is a legitimate termination condition
for e820.  Now, as far as what to do when id != SMAP, it probably is
still the right thing to do; since the BIOS vendor couldn't get something
that elementary correct, we shouldn't trust the data.

I'll write up a corrected patch.

-hpa
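
The two termination conditions discussed above can be told apart with a
small userspace model. This is a sketch of one reading of the thread, not the
corrected patch (which is not shown here): struct fake_e820_reply and
count_e820_entries() are hypothetical names, the BIOS calls are replaced by a
canned array, and discarding the whole table on a bad signature is an
assumption drawn from "we shouldn't trust the data".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMAP 0x534d4150u	/* "SMAP" signature expected back from the BIOS */

/* One canned reply standing in for a single int 0x15, ax=0xe820 call. */
struct fake_e820_reply {
	int err;		/* models the carry flag: non-zero means CF=1 */
	uint32_t id;		/* signature the BIOS returned */
	uint32_t next;		/* continuation value; 0 means "last entry" */
};

/*
 * Walk the replies the way a detection loop might.
 * Returns the number of usable entries, or -1 if the table must not be
 * trusted, so that the caller can fall back to another method (e801).
 */
static int count_e820_entries(const struct fake_e820_reply *r, size_t n)
{
	int count = 0;
	size_t i;

	assert(r != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (r[i].err)
			break;		/* CF=1: a legitimate end of the list */
		if (r[i].id != SMAP)
			return -1;	/* bad signature: distrust everything */
		count++;
		if (r[i].next == 0)
			break;		/* BIOS reported the final entry */
	}
	return count;
}
```

Fed the four replies from the debug output above, the function returns -1
because the fourth id is not SMAP; a list that simply ends with CF=1 keeps
the entries collected so far.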


Re: [PATCH] bw-qcam: use data_reverse instead of manually poking the control register

2007-09-26 Thread Randy Dunlap
On Wed, 26 Sep 2007 12:12:50 -0700 Brett Warden wrote:

> Appeases the warning "parport0 (bw-qcam): use data_reverse for this!"
> 
> Signed-off-by: Brett T. Warden <[EMAIL PROTECTED]>
> 

Where does the warning come from?  (what software produces it?)


> ---
> 
> It seems to work fine with my Quickcam under 2.6.22.
> 
> diff --git a/drivers/media/video/bw-qcam.c b/drivers/media/video/bw-qcam.c
> index 7d47cbe..01e47ed 100644
> --- a/drivers/media/video/bw-qcam.c
> +++ b/drivers/media/video/bw-qcam.c
> @@ -107,6 +107,11 @@ static inline void write_lpcontrol(struct
> qcam_device *q, int d)
> parport_write_control(q->pport, d);
>  }
> 
> +static inline void reverse_port(struct qcam_device *q)
> +{
> +   parport_data_reverse(q->pport);
> +}
> +
>  static int qc_waithand(struct qcam_device *q, int val);
>  static int qc_command(struct qcam_device *q, int command);
>  static int qc_readparam(struct qcam_device *q);
> @@ -369,7 +374,11 @@ static void qc_reset(struct qcam_device *q)
> break;
> 
> case QC_ANY:
> -   write_lpcontrol(q, 0x20);
> +   /*
> +* Replaced with reverse_port
> +* write_lpcontrol(q, 0x20);
> +*/
> +   reverse_port(q);
> write_lpdata(q, 0x75);
> 
> if (read_lpdata(q) != 0x75) {
> @@ -512,10 +521,12 @@ static inline int qc_readbytes(struct
> qcam_device *q, char buffer[])
> switch (q->port_mode & QC_MODE_MASK)
> {
> case QC_BIDIR:  /* Bi-directional Port */
> -   write_lpcontrol(q, 0x26);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0x6);
> lo = (qc_waithand2(q, 1) >> 1);
> hi = (read_lpstatus(q) >> 3) & 0x1f;
> -   write_lpcontrol(q, 0x2e);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0xe);
> lo2 = (qc_waithand2(q, 0) >> 1);
> hi2 = (read_lpstatus(q) >> 3) & 0x1f;
> switch (q->bpp)
> @@ -613,10 +624,13 @@ static long qc_capture(struct qcam_device * q,
> char __user *buf, unsigned long l
> 
> if ((q->port_mode & QC_MODE_MASK) == QC_BIDIR)
> {
> -   write_lpcontrol(q, 0x2e);   /* turn port around */
> -   write_lpcontrol(q, 0x26);
> +   reverse_port(q);/* turn port around */
> +   write_lpcontrol(q, 0xe);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0x6);
> (void) qc_waithand(q, 1);
> -   write_lpcontrol(q, 0x2e);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0xe);
> (void) qc_waithand(q, 0);
> }
> 
> 
> 
> -- 

---
~Randy
Phaedrus says that Quality is about caring.


Re: [PATCH] bw-qcam: use data_reverse instead of manually poking the control register

2007-09-26 Thread Ray Lee
On 9/26/07, Brett Warden <[EMAIL PROTECTED]> wrote:
> On 9/26/07, Ray Lee <[EMAIL PROTECTED]> wrote:
>
> > Just as an aside, if you've tested this and it works, then there's no
> > point to keep the write_lpcontrol even as a comment. Kill those four
> > lines, and if someone's interested in what happened they'll just look
> > at the file history.
>
> Point taken, thanks for the feedback.
>
> ---
>
> diff --git a/drivers/media/video/bw-qcam.c b/drivers/media/video/bw-qcam.c
> index 7d47cbe..0ba92e3 100644
> --- a/drivers/media/video/bw-qcam.c
> +++ b/drivers/media/video/bw-qcam.c
> @@ -107,6 +107,11 @@ static inline void write_lpcontrol(struct
> qcam_device *q, int d)
> parport_write_control(q->pport, d);
>  }
>
> +static inline void reverse_port(struct qcam_device *q)
> +{
> +   parport_data_reverse(q->pport);
> +}
> +
>  static int qc_waithand(struct qcam_device *q, int val);
>  static int qc_command(struct qcam_device *q, int command);
>  static int qc_readparam(struct qcam_device *q);
> @@ -369,7 +374,7 @@ static void qc_reset(struct qcam_device *q)
> break;
>
> case QC_ANY:
> -   write_lpcontrol(q, 0x20);
> +   reverse_port(q);
> write_lpdata(q, 0x75);
>
> if (read_lpdata(q) != 0x75) {
> @@ -512,10 +517,12 @@ static inline int qc_readbytes(struct
> qcam_device *q, char buffer[])
> switch (q->port_mode & QC_MODE_MASK)
> {
> case QC_BIDIR:  /* Bi-directional Port */
> -   write_lpcontrol(q, 0x26);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0x6);
> lo = (qc_waithand2(q, 1) >> 1);
> hi = (read_lpstatus(q) >> 3) & 0x1f;
> -   write_lpcontrol(q, 0x2e);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0xe);
> lo2 = (qc_waithand2(q, 0) >> 1);
> hi2 = (read_lpstatus(q) >> 3) & 0x1f;
> switch (q->bpp)
> @@ -613,10 +620,13 @@ static long qc_capture(struct qcam_device * q,
> char __user *buf, unsigned long l
>
> if ((q->port_mode & QC_MODE_MASK) == QC_BIDIR)
> {
> -   write_lpcontrol(q, 0x2e);   /* turn port around */
> -   write_lpcontrol(q, 0x26);
> +   reverse_port(q);/* turn port around */
> +   write_lpcontrol(q, 0xe);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0x6);
> (void) qc_waithand(q, 1);
> -   write_lpcontrol(q, 0x2e);
> +   reverse_port(q);
> +   write_lpcontrol(q, 0xe);
> (void) qc_waithand(q, 0);
> }

Better, and do you have time for two (possibly stupid) questions? In
each of the last cases it looks like the transformation is from a
write_lpcontrol -> reverse_port and a write_lpcontrol (old address -
0x20). Except the first one, which merely has the reverse_port. One
would think that there should be a write_lpcontrol(q, 0x0); after that
one.

Also, is the reverse port sticky, or does it only apply to the next
write? If it's only the next, then maybe a different name would be
better. If it's sticky, then I think the code is wrong...
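
The arithmetic behind that observation can be sketched in a few lines. This is only an illustration: it assumes, as the patch implies, that 0x20 is the direction bit now handled by reverse_port(), and the macro and helper names are made up here rather than taken from the driver.

```c
#include <assert.h>

/* Assumption: 0x20 is the direction bit that the patch moves into
 * reverse_port(); QC_DIR_BIT is a hypothetical name. */
#define QC_DIR_BIT 0x20

/* Control bits that remain once the direction bit is split off. */
static int control_without_direction(int old_value)
{
    return old_value & ~QC_DIR_BIT;
}

/* Non-zero if the old value asked for the port to be turned around. */
static int wants_reverse(int old_value)
{
    return (old_value & QC_DIR_BIT) != 0;
}
```

For 0x26 and 0x2e this yields 0x6 and 0xe, matching the later hunks; for 0x20 it yields 0x0, which is the write that the first hunk leaves out.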

Ray


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread Jordan Crouse
On 26/09/07 12:14 -0700, H. Peter Anvin wrote:
> Please try the following debug patch to let us know what is going on.
> 
>   -hpa

> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index 1a2e62d..a0ccf29 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -33,6 +33,12 @@ static int detect_memory_e820(void)
> "=m" (*desc)
>   : "D" (desc), "a" (0xe820));
>  
> + printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
> +err, id, next,
> +(unsigned int)desc->addr,
> +(unsigned int)desc->size,
> +desc->type);
> +
>   if (err || id != SMAP)
>   break;

Okay, we have clarity.   Here is the output

e820: err 0 id 0x534d4150 next 15476 :0009fc00 1
e820: err 0 id 0x534d4150 next 15496 0009fc00:0400 2
e820: err 0 id 0x534d4150 next 15516 000e:0002 2
e820: err 0 id 0x0e7b next 11536 0010:0e6b 1

In the last entry,  id is obviously wrong (it should be 'SMAP' or
0x534d4150).  This is the BIOS bug.

Here's the reason why this bothers us now.  In the old assembly code,
if the returned ID wasn't equal to 'SMAP', we jumped straight to the e801
code.  In the new code in memory.c, if id != SMAP, we break out of the
int15 loop, and return boot_params.e820_entries, which in our case is
3.  detect_memory() considers this to be successful, and no attempt to
parse e801 is made.

So that's where the problem is - in the old code with the buggy BIOS, we
punted to reading the e801 information, and that was enough to keep us 
going.   In the new code, we allow a partial table to be used, and we
blow up.

Attached is a patch to fix this - it returns -1 on error, and only sets
boot_params.e820_entries to be non-zero if we have something useful
in it.  This punts the detection to the e801 code, which is then
successful.
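
The difference between the two behaviours can be modelled in userspace roughly as follows. This is a sketch, not the boot code: struct fake_reply and both function names are hypothetical stand-ins for the int 0x15 call, E820MAX is an illustrative value, and the loops are bounded on the local count, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMAP    0x534d4150u  /* 'SMAP' signature the BIOS must echo back */
#define E820MAX 128          /* illustrative table size, an assumption */

/* One simulated BIOS reply; hypothetical, stands in for int 0x15. */
struct fake_reply {
    uint32_t id;    /* signature returned by the BIOS */
    uint32_t next;  /* continuation value, 0 on the last entry */
    uint8_t  err;   /* carry flag */
};

/* Unpatched memory.c behaviour: stop at the first bad reply, but keep
 * and report whatever was collected so far. */
static int detect_partial(const struct fake_reply *r, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n && count < E820MAX; i++) {
        if (r[i].err || r[i].id != SMAP)
            break;
        count++;
        if (!r[i].next)
            break;
    }
    return count;
}

/* Patched behaviour: any bad reply invalidates the whole map, so the
 * caller can fall back to e801. */
static int detect_strict(const struct fake_reply *r, size_t n)
{
    int count = 0;

    for (size_t i = 0; i < n && count < E820MAX; i++) {
        if (r[i].err || r[i].id != SMAP)
            return -1;
        count++;
        if (!r[i].next)
            break;
    }
    return count;
}
```

Fed the four replies from the output above (three with a good signature, then one with a bad id), the first function reports 3 entries and the second reports -1.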

This fixes the problem with the DB800, so it should probably fix the
other Geode platforms affected by this as well.

Many thanks to hpa for the guiding hand.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.
[i386]: Return an error if the e820 detection goes bad

From: Jordan Crouse <[EMAIL PROTECTED]>

Change the e820 code to always return an error if something
bad happens while reading the e820 map.  This matches the
old code behavior, and allows brain-dead e820 implementations
to still work.

Signed-off-by: Jordan Crouse <[EMAIL PROTECTED]>
---

 arch/i386/boot/memory.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..4c7f0f6 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -22,7 +22,7 @@ static int detect_memory_e820(void)
 {
u32 next = 0;
u32 size, id;
-   u8 err;
+   u8 err, count = 0;
struct e820entry *desc = boot_params.e820_map;
 
do {
@@ -34,13 +34,14 @@ static int detect_memory_e820(void)
: "D" (desc), "a" (0xe820));
 
if (err || id != SMAP)
-   break;
+   return -1;
 
-   boot_params.e820_entries++;
+   count++;
desc++;
} while (next && boot_params.e820_entries < E820MAX);
 
-   return boot_params.e820_entries;
+   boot_params.e820_entries = count;
+   return count;
 }
 
 static int detect_memory_e801(void)


2.6.23-rc8-rt1

2007-09-26 Thread Steven Rostedt
We are pleased to announce the 2.6.23-rc8-rt1 tree, which can be
downloaded from the new location:

 http://www.kernel.org/pub/linux/kernel/projects/rt/

Changes since 2.6.23-rc4-rt1

  - update to -rc8

  - A bunch of PowerPC stuff                     (Tony Breeds)
  - rearrange thread flags
  - count_active_rt_tasks fix
  - match __rw_yield declaration
  - unsigned long flags
  - removed flush_tlb_pending

  - alternate node alloc fix                     (Andi Kleen)

  - fix compiling of timer code in !PREEMPT_RT  (Andi Kleen)

  - convert i_alloc_sem to compat_rw_semaphore   (Steven Rostedt)

  - don't let RT rw_semaphores do non_owner     (Steven Rostedt)

  - kill the union in s_files                    (Peter Zijlstra)

  - Sched sum less than zero prevention          (Luis Claudio)

  - nmi_watchdog hpet                            (David Bahi)

  - call_rcu_bh rename                           (Steven Rostedt)

Work in progress (to be included soon):

  - CFS-scheduler updates
  - high res timers updates
  - PowerPC high res timers

To build a 2.6.23-rc8-rt1 tree, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.tar.bz2
  http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.23-rc8.bz2
  http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.23-rc8-rt1.bz2

The broken out patches are also available.

-- Steve




Re: [PATCH 7/8] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code

2007-09-26 Thread Guillaume Chazarain
Le Wed, 26 Sep 2007 22:47:54 +0200,
roel <[EMAIL PROTECTED]> a écrit :

> > +   if (thread_group_leader(tsk) && ((tsk->flags & PF_FORKNOEXEC)))
> 
>   if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC))

Yeah, right, good catch.

> > +   group_exit_code = tg_stats ? tsk->signal->group_exit_code : 0;
> > +   stats->ac_exitcode = group_exit_code ? : tsk->exit_code;
> 
> Isn't this just confusing? why not
> 
>   if (tg_stats) {
>   group_exit_code = tsk->signal->group_exit_code;
>   stats->ac_exitcode = group_exit_code;

Because in this case if group_exit_code is null, we want
tsk->exit_code, not 0.

>   
>   } else {
>   group_exit_code = 0;
>   stats->ac_exitcode = tsk->exit_code;
>   }
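
The two variants differ only when tg_stats is set and the group exit code is zero. A minimal userspace sketch of both follows; the function names are invented for the comparison, and the kernel's "a ?: b" (a GNU C extension) is spelled out in portable form.

```c
#include <assert.h>
#include <stdbool.h>

/* Patch as posted: fall back to the task's own exit code whenever the
 * group exit code is zero. Equivalent to "code ?: exit_code". */
static int exitcode_patch(bool tg_stats, int group_exit_code, int exit_code)
{
    int code = tg_stats ? group_exit_code : 0;

    return code ? code : exit_code;
}

/* Suggested if/else rewrite: a zero group exit code is reported as is. */
static int exitcode_ifelse(bool tg_stats, int group_exit_code, int exit_code)
{
    if (tg_stats)
        return group_exit_code;
    return exit_code;
}
```

With tg_stats set, a group exit code of 0 and a task exit code of 256, the first returns 256 and the second returns 0; in every other case the two agree.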

Andrew is not interested at the moment in this series (that replaces
all my previous patches on taskstats, for info), but thank you for the
review.


-- 
Guillaume


Re: No linux/module.h

2007-09-26 Thread Jiri Slaby
On 09/26/2007 10:25 PM, Kristof Provost wrote:
> On 2007-09-26 11:29:33 (+0100), mahamuni ashish <[EMAIL PROTECTED]> wrote:
>> I am writing simple kernel module.
>> I have included linux/module.h
>> compiler gives me error that no such file, I also
>> searched it on my machine.
>> It really doesn't exist. I am using fedora 6.
>> How do I install required libraries.
> I suspect you either have an incorrect makefile or you don't have the
> kernel source code on your system. The header file can be found in the
> kernel source: include/linux/module.h

He needs kernel-devel, I guess.

-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: [PATCH 1/1] Kernel compile bug in 2.6.22.6/7 {maybe more} ARM/StrongARM

2007-09-26 Thread Dave Jones
On Tue, Sep 25, 2007 at 10:36:51AM -0400, Dave Jones wrote:
 > On Tue, Sep 25, 2007 at 08:31:32AM +0100, Russell King wrote:
 >  > On Mon, Sep 24, 2007 at 05:53:57PM -0500, [EMAIL PROTECTED] wrote:
 >  > > I was building a kernel for an iPaq {SA1110} and ran into this.
 >  > > 
 >  > > linux-2.6.22.7/arch/arm/mach-sa1100/generic.c:
 >  > > Has a: #include 
 >  > > Then afterwards there is a: #if defined(CONFIG_CPU_FREQ_SA1100) ||
 >  > > defined(CONFIG_CPU_FREQ_SA1110)
 >  > > whose else section redefines the cpufreq_get function inherited from
 >  > > the header
 >  > > 
 >  > > I'm guessing no one ever ended up in the "else" section until now, and
 >  > > that the header was added some time ago and no one caught this.
 >  > > This patch worked for me to get rid of the compile time problems.  I'm
 >  > > having issues with the kernel, but as far as I can tell they are from
 >  > > the Frame buffer and not because of this.  If this assessment is correct
 >  > > {the not needing this code anymore} then please pass this along so it
 >  > > makes it into an upcoming release.
 >  > > 
 >  > > --- linux-2.6.22.7/arch/arm/mach-sa1100/generic.c.orig  2007-09-24
 >  > > 17:36:21.0 -0500
 >  > > +++ linux-2.6.22.7/arch/arm/mach-sa1100/generic.c   2007-09-24
 >  > > 17:40:02.0 -0500
 >  > > @@ -107,15 +107,6 @@ unsigned int sa11x0_getspeed(unsigned in
 >  > > return cclk_frequency_100khz[PPCR & 0xf] * 100;
 >  > >  }
 >  > > 
 >  > > -#else
 >  > > -/*
 >  > > - * We still need to provide this so building without cpufreq works.
 >  > > - */
 >  > > -unsigned int cpufreq_get(unsigned int cpu)
 >  > > -{
 >  > > -   return cclk_frequency_100khz[PPCR & 0xf] * 100;
 >  > > -}
 >  > > -EXPORT_SYMBOL(cpufreq_get);
 >  > >  #endif
 >  > > 
 >  > >  /*
 >  > 
 >  > No.  That code is required - the StrongARM 1100 framebuffer driver
 >  > *needs* to know what the CPU frequency is so it can set the pixel
 >  > clock divisor.
 >  > 
 >  > The real problem is the silly people who added this to cpufreq.h:
 >  > 
 >  > #ifdef CONFIG_CPU_FREQ
 >  > unsigned int cpufreq_quick_get(unsigned int cpu);
 >  > unsigned int cpufreq_get(unsigned int cpu);
 >  > #else
 >  > static inline unsigned int cpufreq_quick_get(unsigned int cpu)
 >  > {
 >  > return 0;
 >  > }
 >  > static inline unsigned int cpufreq_get(unsigned int cpu)
 >  > {
 >  > return 0;
 >  > }
 >  > #endif
 >  > 
 >  > which is utterly bogus.
 > 
 > Which came from ...
 > 
 > commit 184c44d2049c4db7ef6ec65794546954da2c6a0e
 > Author: Andrew Morton <[EMAIL PROTECTED]>
 > Date:   Wed May 2 19:27:08 2007 +0200
 > 
 > [PATCH] x86-64: fix x86_64-mm-sched-clock-share
 > 
 > Fix for the following patch. Provide dummy cpufreq functions when
 > CPUFREQ is not compiled in.
 > 
 > Cc: Andi Kleen <[EMAIL PROTECTED]>
 > Cc: Dave Jones <[EMAIL PROTECTED]>
 > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
 > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
 > 

Following up on this from yesterday, Linus please revert the above cset.
It doesn't seem to be necessary (it was added to fix a miscompile in
'make allnoconfig' which doesn't seem to be repeatable with it reverted)
and actively breaks the ARM SA1100 framebuffer driver.

(If you'd prefer a patch reverting it, I'll send one, but I'm
 hoping that git-revert will just dtrt).

Thanks,

Dave

-- 
http://www.codemonkey.org.uk


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-26 Thread Nigel Cunningham
Hi.

On Thursday 27 September 2007 06:30:36 Joseph Fannin wrote:
> On Fri, Sep 21, 2007 at 11:45:12AM +0200, Pavel Machek wrote:
> > Hi!
> > > >
> > > > Sounds doable, as long as you can cope with long command lines (which
> > > > shouldn't be a biggie). (If you've got a swapfile or parts of a swap
> > > > partition already in use, it can be quite fragmented).
> > >
> > > Hmm.  This is an interesting problem.  Sharing a swap file or a swap
> > > partition with the actual swap of user space pages does seem to be
> > > a limitation of this approach.
> > >
> > > Although the fact that it is simple to write to a separate file may
> > > be a reasonable compensation.
> >
> > I'm not sure how you'd write it to a separate file. Notice that kjump
> > kernel may not mount journalling filesystems, not even
> > read-only. (Ext3 replays journal in that case). You could pass block
> > numbers from the original kernel...
> 
> The ext3 thing is a bug, the case for which I don't think has been
> adequately explained to the ext[34] folks.  There should be at least a
> no_replay mount flag available, or something.  It has ramifications
> for more than just hibernation.
> 
> And yeah, I'm gonna bring up the swap files thing again.  If you
> can hibernate to a swap file, you can hibernate to a dedicated
> hibernation file, and vice versa.
> 
> If you can't hibernate to a swap file, then swap files are
> effectively unsupported for any system you might want to hibernate.
>  I wonder what embedded folks would think about that.
> 
> But, in my ignorance, I'm not sure even fixing the ext3 bug will
> guarantee you consistent metadata so that you can handle a
> swap/hibernate file.  You can do a sync(), but how do you make that
> not race against running processes without the freezer, or blkdev
> snapshots?
> 
> I guess uswsusp and the-patch-previously-known-as-suspend2 handle
> this somehow, though.
> 
>(It's that same ignorance that has me waiting for someone with
> established credit with kernel people to make that argument for the
> ext3 bug, so I can hang my own reasons for thinking that it's bad off
> of theirs).

I haven't looked at swsusp support, but TuxOnIce handles all storage (swap 
partitions, swap files and ordinary files) by first allocating swap (if we're 
using swap), then bmapping the storage we're going to use. After that, we can 
freeze filesystems and processes with impunity. The allocated storage is then 
viewed as just a collection of bdevs, each with an ordered chain of extents 
defining which blocks we're going to read/write - a series of tapes if you 
like. In the image header, we store dev_ts and the block chains, together 
with the configuration information. As long as the same bdevs are configured 
at boot time prior to the echo > /sys/power/resume, we're in business. 
Filesystems don't need to be mounted because we don't use filesystem code 
anyway. (LVM etc does though in so far as it's needed to make the dev_t match 
the device again).
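
The "series of tapes" picture can be sketched as a linked chain of extents with a cursor that hands out block numbers in order. This is an illustration under assumptions, not TuxOnIce's actual code: the struct and function names are hypothetical, and each extent is assumed to have start <= end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a run of consecutive blocks on one device. */
struct extent {
    unsigned long start;   /* first block of the run */
    unsigned long end;     /* last block of the run, inclusive */
    struct extent *next;   /* next run in write order */
};

/* Cursor over an ordered chain, read like a tape. */
struct extent_cursor {
    const struct extent *cur;
    unsigned long pos;     /* next block to hand out */
};

static void cursor_init(struct extent_cursor *c, const struct extent *head)
{
    c->cur = head;
    c->pos = head ? head->start : 0;
}

/* Store the next block in *block and return 1, or return 0 at the end. */
static int cursor_next(struct extent_cursor *c, unsigned long *block)
{
    if (!c->cur)
        return 0;
    *block = c->pos;
    if (c->pos == c->cur->end) {
        /* Current run exhausted: move to the next extent, if any. */
        c->cur = c->cur->next;
        if (c->cur)
            c->pos = c->cur->start;
    } else {
        c->pos++;
    }
    return 1;
}
```

A chain holding blocks 10-12 followed by block 100 is walked as 10, 11, 12, 100, which is all the I/O path needs to know once the chain has been built from the bmap results.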

This matches with what you said above about hibernating to swap files and 
dedicated hibernation files - TuxOnIce uses exactly the same code to do the 
i/o to both; the variation is in the code to recognise the image header and 
allocate/free/bmap storage.

 Personally, I don't think ext[34] is broken. If 
there's data being left in the journal that will need replaying, then 
mounting without replaying the journal sounds wrong. Perhaps you should 
instead be arguing that nothing should be left in the journal after a 
filesystem freeze. But, of course, current code isn't doing a filesystem 
freeze (just a process freeze) and the kexec guys want to take even that 
away. 

In short, I agree. AFAICS, you need both the process freezer and filesystem 
freezing to make this thing fly properly.

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: [Re: NMI error and Intel S5000PSL Motherboards]

2007-09-26 Thread Randy Dunlap
On Wed, 26 Sep 2007 15:07:14 -0400 samson yeung wrote:

> Hello,
> 
> I'm working with AndrewL733 on this issue. I'm doing the git bisect right now.
> 
> scanpci -f -1 causes the problem, scanpci -f -2 and scanpci -O do not.

Does the problem always happen when scanpci is making an ioperm
syscall (as in the strace output below)?


> The driver does not even need to be loaded to have the problem
> (e1000). I have not tried the 2.6.18 driver with 2.6.20, but I have
> tried both the in-kernel driver as well as the newer driver from Intel
> with the same result.
> 
> The drive is a Seagate Barracuda 7200.9 80 Gbytes with firmware 3.AAE
> I can include hdparm -i output if it will help.
> 
> The problem is only happening on 64-bit. As noted above, I'm running
> git-bisect to test a stock kernel.org kernel. 32-bit Ubuntu does not
> exhibit the problem, I have not tested a kernel.org 32-bit kernel.
> 
-
> strace: I don't know what syscall_273 does. I trimmed the output to
> include syscall 273 and the lines surrounding it. I can include the
> entirety of the strace if it will help.

Does this include trace info all the way to the end of the trace
output file?  If not, please send that part also.


> arch_prctl(ARCH_SET_FS, 0x2aca24060f50) = 0
> mprotect(0x2aca23e3b000, 12288, PROT_READ) = 0
> munmap(0x2aca238e2000, 36649)   = 0
> set_tid_address(0x2aca24060fe0) = 10319
> syscall_273(0x2aca24060ff0, 0x18, 0x7fff87790188, 0x2aca233193c0,
> 0x2aca24060f50, 0x2aca233352b8, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
> 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1,
> 0x1, 0x1, 0x1, 0x1, 0x1) = 0
> rt_sigaction(SIGRTMIN, {0x2aca23e4a3a0, [], SA_RESTORER|SA_SIGINFO,
> 0x2aca23e53200}, NULL, 8) = 0
> rt_sigaction(SIGRT_1, {0x2aca23e4a2f0, [],
> SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2aca23e53200}, NULL, 8) = 0
> rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
> ioperm(0, 0x400, 0x1)   = 0


---
~Randy
Phaedrus says that Quality is about caring.


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-26 Thread Joseph Fannin
FWIW, on all the hardware I have, Windows is able to deal with:

(1) hibernate Windows
(2) run $(OTHER_OS)
(3) resume Windows

... which seems to me to say that Linux is doing it wrong if it can't
handle other ACPI users between hibernate and resume.  But maybe
that's just my hardware.

--
Joseph Fannin
[EMAIL PROTECTED]


why network devices don't do reference counting? (Re: [PATCH] Module use count must be updated as bridges are created/destroyed)

2007-09-26 Thread Oleg Verych
* Wed, 26 Sep 2007 08:37:05 -0700
* Organization: Linux Foundation
>
> On Wed, 26 Sep 2007 08:53:27 +0100
> "Jan Beulich" <[EMAIL PROTECTED]> wrote:
>
>> Otherwise 'modprobe -r' on a module having a dependency on bridge will
>> implicitly unload bridge, bringing down all connectivity that was
>> using bridges.
>> 
>> Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
>>
>
> No, network devices don't do reference counting.

Could you explain why, please?

After `udevd` loads lots of unused crap on boot, I gave up and now use
$(rmmod `lsmod | just first column`). Networking gets bravely wiped away
too. OK, there are lots of configs (udev, hotplug, modprobe) that somebody
might like to fix, but I have run out of patience with them. I just don't
care. So, please answer :)



RE: [PATCH4/4] [POWERPC] Fix cpm_uart driver

2007-09-26 Thread Rune Torgersen
> From: Scott Wood
> Maybe that's how it was, but the current code initializes it (more or
> less) directly with IMAP_ADDR, which also gets fed into ioremap.
> 
> One of the two has got to be wrong.

arch/ppc maps the immr area 1:1 into kernel memory, so ioremap and
physical are the same.
See arch/ppc/syslib/m8260_setup.c, line 208 (function m8260_map_io)

Here quoted:
arch/ppc/syslib/m8260_setup.c
196 /* Map the IMMR, plus anything else we can cover
197  * in that upper space according to the memory controller
198  * chip select mapping.  Grab another bunch of space
199  * below that for stuff we can't cover in the upper.
200  */
201 static void __init
202 m8260_map_io(void)
203 {
204 uint addr;
205
206 /* Map IMMR region to a 256MB BAT */
207 addr = (cpm2_immr != NULL) ? (uint)cpm2_immr : CPM_MAP_ADDR;
208 io_block_mapping(addr, addr, 0x10000000, _PAGE_IO);
209
210 /* Map I/O region to a 256MB BAT */
211 io_block_mapping(IO_VIRT_ADDR, IO_PHYS_ADDR, 0x10000000, _PAGE_IO);
212 }


Re: [PATCH 7/8] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code

2007-09-26 Thread roel
Guillaume Chazarain wrote:

[...]

> @@ -65,13 +65,15 @@ void bacct_add_tsk(struct taskstats *stats, struct 
> task_struct *tsk)
>  void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *tsk,
>   bool tg_stats)
>  {
> + int group_exit_code;
> +
>   fill_wall_times(stats, tsk);
>  
> - if (thread_group_leader(tsk)) {
> - stats->ac_exitcode = tsk->exit_code;
> - if (tsk->flags & PF_FORKNOEXEC)
> - stats->ac_flag |= AFORK;
> - }
> + if (thread_group_leader(tsk) && ((tsk->flags & PF_FORKNOEXEC)))

if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC))

> + stats->ac_flag |= AFORK;
> +
> + group_exit_code = tg_stats ? tsk->signal->group_exit_code : 0;
> + stats->ac_exitcode = group_exit_code ? : tsk->exit_code;

Isn't this just confusing? why not

if (tg_stats) {
group_exit_code = tsk->signal->group_exit_code;
stats->ac_exitcode = group_exit_code;

} else {
group_exit_code = 0;
stats->ac_exitcode = tsk->exit_code;
}


Re: [PATCH 09/12] mm: remove throttle_vm_writeback

2007-09-26 Thread Peter Zijlstra

On Thu, 2007-04-05 at 15:44 -0700, Andrew Morton wrote:
> On Thu, 05 Apr 2007 19:42:18 +0200
> [EMAIL PROTECTED] wrote:
> 
> > rely on accurate dirty page accounting to provide enough push back
> 
> I think we'd like to see a bit more justification than that, please.

it should read like this:

        for ( ; ; ) {
                get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);

                /*
                 * Boost the allowable dirty threshold a bit for page
                 * allocators so they don't get DoS'ed by heavy writers
                 */
                dirty_thresh += dirty_thresh / 10;  /* wh... */

                if (global_page_state(NR_FILE_DIRTY) +
                    global_page_state(NR_UNSTABLE_NFS) +
                    global_page_state(NR_WRITEBACK) <= dirty_thresh)
                        break;

                congestion_wait(WRITE, HZ/10);
        }

[ note the extra NR_FILE_DIRTY ]

now, balance_dirty_pages() is there to ensure:

  nr_dirty + nr_unstable + nr_writeback < dirty_thresh  (1)

reclaim will (with the introduction of dirty page tracking) never
generate dirty pages, so the only disturbance of that equation is an
increase in nr_writeback.

[ pageout() sets wbc.for_reclaim=1, so NFS traffic will not generate
  unstable pages ]

So, what throttle_vm_writeout() does is limit the number of added
writeback pages to 10% of the total limit.

pageout() seems to avoid stuffing pages down a congested bdi
(TODO: has details). Together with the much smaller io-queues, the initial
purpose of this function - which was to avoid all memory getting stuck
in io-queues - seems to be handled.

Now the problems...

Trouble is that it currently does not take nr_dirty into account which
in the worst case limits it to 110% of the limit.

Also, I'm seeing (2.6.23-rc8-mm1) live-locks in throttle_vm_writeout()
where nr_dirty + nr_unstable > thresh - which according to (1) should
not happen, and will not change without explicit action.

Hmm maybe the 10% is < nr_cpus * ratelimit_pages.

2 cpus, mem=128M -> ratelimit_pages ~ 512
threshold ~ 1500

so indeed: 150 < 1024.

Still not conclusive but at least getting somewhere.
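
The back-of-the-envelope comparison can be written down as a small sketch. The helper names are made up for illustration, and the slack formula (every CPU may run up to ratelimit_pages ahead) is the assumption behind the estimate above, not something verified against the code.

```c
#include <assert.h>

/* Headroom added on top of the dirty threshold by the 10% boost,
 * as in "dirty_thresh / 10". */
static unsigned long writeout_headroom(unsigned long dirty_thresh)
{
    return dirty_thresh / 10;
}

/* Worst-case pages dirtied before per-CPU rate limiting notices:
 * each CPU may be up to ratelimit_pages ahead (assumed model). */
static unsigned long ratelimit_slack(unsigned int nr_cpus,
                                     unsigned long ratelimit_pages)
{
    return (unsigned long)nr_cpus * ratelimit_pages;
}
```

With a threshold of about 1500 pages, 2 CPUs and ratelimit_pages of about 512, the headroom is 150 pages while the slack is 1024 pages, so the slack alone can exceed the 10% margin.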



[REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Rafael J. Wysocki
On Wednesday, 26 September 2007 21:49, Rafael J. Wysocki wrote:
> On Wednesday, 26 September 2007 20:51, Thomas Gleixner wrote:
> > On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> > > There still are some oddities.
> > > 
> > > First, with the "x86-64: Disable local APIC timer use on AMD systems with 
> > > C1E"
> > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > (the suspend patches don't even touch the boot code, so they should be
> > > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch 
> > > (adjusted
> > > for 2.6.23-rc8) is applied in addition.  Is this expected?
> > 
> > No. That's odd. It is nothing else than adding "noapictimer" to the
> > kernel command line.
> 
> Seems to be reproducible, though.  I'll investigate further.

So far, the results are the following:

1) current Linus' tree doesn't boot with any command line (regression)

[  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0

   x86-64: Disable local APIC timer use on AMD systems with C1E

   It's not necessary for 2.6.23 and actually kills the box that it's supposed 
to fix. ]

2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with 
C1E"
   patch applied behaves like the current -git

3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_
   "apicmaintimer"

4) 2.6.22 behaves like 2.6.23-rc8

5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
   "noapictimer"

6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
   "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
   without any extra command line options

Tested a couple of times with each kernel; the results seem to be
reproducible 100% of the time.

Greetings,
Rafael


Re: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008

2007-09-26 Thread Mel Gorman
On (26/09/07 21:40), D-Tick didst pronounce:
> Hi,
> I described it in a little more detail in
> http://lkml.org/lkml/2007/9/25/184 2 months ago. 

Are you sure about that link? It looks like my own posting.

> The kernel often oopses when there is (heavy) disk access, but not
> always, that's the point: sometimes it runs 4 weeks, sometimes only a
> few days. With older kernels the software raid was sometimes out of sync
> and one disk was gone; with the "new" kernel nothing like this happened.
> 

Is there any chance you have dodgy/overclocked RAM? i.e. Have you tried
running memtest for 48 hours? I ask because a NULL reference of 00000008
looks suspiciously like a single bit flip. Similarly, is there any chance you
have bad cables connecting your disks? I ask because of the long uptime
before it corrupts, combined with the fact that you say this problem
has existed for a number of kernels. For a kernel bug that old, I would have
expected a number of similar reports, particularly if ext3 is involved.
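
The bit-flip suspicion rests on 00000008 having exactly one bit set, which is easy to check. A minimal sketch (the function name is invented here); note that it only tests the bit pattern and does not prove that a flip actually happened.

```c
#include <assert.h>
#include <stdint.h>

/* True when exactly one bit is set, i.e. the value is what a single
 * flipped bit would turn a zero (NULL) word into. */
static int is_single_bit(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

The address from this oops, 0x00000008, passes the check; the address from the older oops, 0x01190024, does not.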

Thanks

> Hendrik - D-Tick
> 
> On Tue, Sep 25, 2007 at 02:27:28PM +0100, Mel Gorman wrote:
> > On (25/09/07 13:53), Hendrik P. didst pronounce:
> > > Maybe you know what brings this box down:
> > > 
> > > regards,
> > > Hendrik P.
> > > 
> > 
> > Nick, this is a random stab in the dark but you were around 
> > mark_buffer_dirty()
> > a few months ago. Does this error look familiar?
> > 
> > Hendrik, what is going on when you trigger this? Is it easily
> > reproducible? Does it only occur with DEBUG_PAGEALLOC?
> > 
> > Thanks
> > 
> > > [263322.356816] BUG: unable to handle kernel NULL pointer dereference at
> > > virtual address 0008
> > > [263322.459908]  printing eip:
> > > [263322.493267] c014e09c
> > > [263322.520391] *pde = 
> > > [263322.554795] Oops:  [#1]
> > > [263322.589188] DEBUG_PAGEALLOC
> > > [263322.623796] Modules linked in: lirc_dev jfs xfs reiserfs ntfs vfat
> > > fat isofs udf
> > > [263322.714319] CPU:0
> > > [263322.714322] EIP:0060:[]Not tainted VLI
> > > [263322.714327] EFLAGS: 00010046   (2.6.21.5_VIA_EPIA_MII_12000 #1)
> > > [263322.878840] EIP is at __set_page_dirty_nobuffers+0x6c/0x110
> > > [263322.946480] eax:    ebx: f6fe9ea0   ecx: f6fe9ea0   edx:
> > > c101cba0
> > > [263323.028685] esi: c330e3c8   edi:    ebp: f71f1e98   esp:
> > > f71f1e94
> > > [263323.110886] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
> > > [263323.181661] Process kjournald (pid: 1362, ti=f71f task=f7c87350
> > > task.ti=f71f)
> > > [263323.274252] Stack: c5bbe504 f71f1ea0 c018391f f71f1ec4 c01c7d0f
> > > c01cd4d7 f7c87350  
> > > [263323.376097]f71f1ed8 c01cd7d6 e14a7c9c c330e3c8 f71f1ed0
> > > c01c802b e14a7c9c f71f1ef8 
> > > [263323.477943]c01c809d f71f1ef8 c01cd87e c01cf4d6 1000
> > > 2ff5 c5bbe4e0 e14a7c9c 
> > > [263323.579789] Call Trace:
> > > [263323.612215]  [] show_trace_log_lvl+0x1a/0x30
> > > [263323.674774]  [] show_stack_log_lvl+0xa9/0xd0
> > > [263323.737337]  [] show_registers+0x1e9/0x2f0
> > > [263323.797821]  [] die+0x11b/0x230
> > > [263323.846872]  [] do_page_fault+0x2c6/0x5d0
> > > [263323.906318]  [] error_code+0x74/0x80
> > > [263323.960566]  [] mark_buffer_dirty+0x1f/0x30
> > > [263324.022089]  [] __journal_temp_unlink_buffer+0x5f/0x160
> > > [263324.096081]  [] __journal_unfile_buffer+0xb/0x20
> > > [263324.162801]  [] __journal_refile_buffer+0x5d/0xa0
> > > [263324.230558]  [] journal_commit_transaction+0xb10/0x1220
> > > [263324.304553]  [] kjournald+0x12c/0x340
> > > [263324.359841]  [] kthread+0xa3/0xd0
> > > [263324.410971]  [] kernel_thread_helper+0x7/0x10
> > > [263324.474573]  ===
> > > [263324.518323] Code: 00 00 00 90 8b 02 8b 4a 10 25 00 80 00 00 66 85 c0
> > > 0f 85 a7 00 00 00 f6 c1 01 75 2e 85 c9 74 2a 39 cb 0f 85 81 00 00 00 8b
> > > 43 38  40 08 01 74 3d 8b 02 25 00 80 00 00 66 85 c0 75 72 8b 52 14 
> > > [263324.751737] EIP: [] __set_page_dirty_nobuffers+0x6c/0x110
> > > SS:ESP 0068:f71f1e94
> > > 
> > > 
> > > config can be found at http://zankt.net/~d-tick/kernel-2.6.21.5-config
> > > lspci, cmdline, mount at http://zankt.net/~d-tick/voltkraft-sysinfo
> > > 
> > > 
> > > older oops:
> > > 
> > > [190861.885741] BUG: unable to handle kernel paging request at virtual
> > > address 01190024
> > > [190861.978686]  printing eip:
> > > [190862.012138] c0176026
> > > [190862.039274] *pde = 
> > > [190862.073778] Oops:  [#1]
> > > [190862.108250] DEBUG_PAGEALLOC
> > > [190862.142857] Modules linked in: ntfs vfat fat isofs udf
> > > [190862.205945] CPU:0
> > > [190862.205948] EIP:0060:[]Not tainted VLI
> > > [190862.205953] EFLAGS: 00010206   (2.6.21.5_VIA_EPIA_MII_12000 #1)
> > > [190862.370464] EIP is at __d_lookup+0x66/0xf0
> > > [190862.420439] eax: d309   ebx: 01190024   ecx: 42ff6ccb   edx:
> > > c0e5da0c
> > > [190862.502644] esi: 01190024   edi: eb49bef4   ebp: eb49bdc4   esp:
> > > eb49bd98
> > > [190862.584847] ds: 007b 

Re: [PATCH4/4] [POWERPC] Fix cpm_uart driver

2007-09-26 Thread Scott Wood
On Wed, Sep 26, 2007 at 03:32:29PM -0500, Rune Torgersen wrote:
> > From: Scott Wood
> > Maybe that's how it was, but the current code initializes it (more or
> > less) directly with IMAP_ADDR, which also gets fed into ioremap.
> > 
> > One of the two has got to be wrong.
> 
> arch/ppc maps the immr area 1:1 into kernel memory, so ioremap and
> physical are the same.
> See arch/ppc/syslib/m8260_setup.c, line 208 (function m8260_map_io)

We were talking about 8xx, not 82xx -- is it always identity mapped there?

If so, then why bother with the ioremap in immr_map_size() in
arch/ppc/8xx_io/commproc.c?  And why compare the result from ioremap() with
a raw identity-mapped address?

-Scott


Re: + uninline-find_task_by_xxx-set-of-functions.patch added to -mm tree

2007-09-26 Thread Ingo Molnar

* [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> --
> Subject: Uninline find_task_by_xxx set of functions
> From: Pavel Emelyanov <[EMAIL PROTECTED]>
> 
> find_task_by_something is a set of macros used to find a task by pid,
> depending on what kind of pid is proposed - global or virtual one.  All of
> them are wrappers above the most generic one - find_task_by_pid_type_ns() -
> and just substitute some args for it.
> 
> It turned out that dereferencing the current->nsproxy->pid_ns construction
> and pushing one more argument on the stack inline causes kernel text size to
> grow.
> 
> This patch moves all this stuff out-of-line into kernel/pid.c.  Together
> with the next patch it saves a bit less than 400 bytes from the .text
> section.
> 
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> Cc: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
> Cc: Oleg Nesterov <[EMAIL PROTECTED]>
> Cc: Paul Menage <[EMAIL PROTECTED]>
> Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Ingo


Re: No linux/module.h

2007-09-26 Thread Kristof Provost
On 2007-09-26 11:29:33 (+0100), mahamuni ashish <[EMAIL PROTECTED]> wrote:
> I am writing a simple kernel module.
> I have included linux/module.h, but the
> compiler gives me an error that there is no such file. I also
> searched for it on my machine.
> It really doesn't exist. I am using Fedora 6.
> How do I install the required libraries?
I suspect you either have an incorrect makefile or you don't have the
kernel source code on your system. The header file can be found in the
kernel source: include/linux/module.h

Regards, 
Kristof 



Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-26 Thread Joseph Fannin
On Fri, Sep 21, 2007 at 11:45:12AM +0200, Pavel Machek wrote:
> Hi!
> > >
> > > Sounds doable, as long as you can cope with long command lines (which
> > > shouldn't be a biggie). (If you've got a swapfile or parts of a swap
> > > partition already in use, it can be quite fragmented).
> >
> > Hmm.  This is an interesting problem.  Sharing a swap file or a swap
> > partition with the actual swap of user space pages does seem to be
> > a limitation of this approach.
> >
> > Although the fact that it is simple to write to a separate file may
> > be a reasonable compensation.
>
> I'm not sure how you'd write it to a separate file. Notice that kjump
> kernel may not mount journalling filesystems, not even
> read-only. (Ext3 replays journal in that case). You could pass block
> numbers from the original kernel...

The ext3 thing is a bug, the case for which I don't think has been
adequately explained to the ext[34] folks.  There should be at least a
no_replay mount flag available, or something.  It has ramifications
for more than just hibernation.

And yeah, I'm gonna bring up the swap files thing again.  If you
can hibernate to a swap file, you can hibernate to a dedicated
hibernation file, and vice versa.

If you can't hibernate to a swap file, then swap files are
effectively unsupported for any system you might want to hibernate.
I wonder what embedded folks would think about that.

But, in my ignorance, I'm not sure even fixing the ext3 bug will
guarantee you consistent metadata so that you can handle a
swap/hibernate file.  You can do a sync(), but how do you make that
not race against running processes without the freezer, or blkdev
snapshots?

I guess uswsusp and the-patch-previously-known-as-suspend2 handle
this somehow, though.

   (It's that same ignorance that has me waiting for someone with
established credit with kernel people to make that argument for the
ext3 bug, so I can hang my own reasons for thinking that it's bad off
of theirs).

--
Joseph Fannin
[EMAIL PROTECTED]



sata_sil24 broken since 2.6.23-rc4-mm1

2007-09-26 Thread Torsten Kaiser
As reported in the "2.6.23-rc4-mm1" thread and the "What's in
linux-2.6-block.git for 2.6.24" thread, I'm having trouble: sometimes
on bootup one drive on the SiI-3132 throws errors and becomes
inaccessible.

The latest kernel on which I have seen this error is 2.6.23-rc7-mm1.
In 2 of 7 boots the following happened:

Sep 25 07:42:11 treogen [   33.81] md1: bitmap initialized from
disk: read 10/10 pages, set 0 bits
Sep 25 07:42:11 treogen [   33.81] created bitmap (145 pages) for device md1
Sep 25 07:42:11 treogen [   63.91] ata1.00: exception Emask 0x0
SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 25 07:42:11 treogen [   63.91] ata1.00: cmd
61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 25 07:42:11 treogen [   63.91]  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 25 07:42:11 treogen [   63.91] ata1.00: status: {DRDY }
Sep 25 07:42:11 treogen [   63.91] ata1: hard resetting link
Sep 25 07:42:11 treogen [   66.21] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [   66.21] ata1: reset failed (errno=-5),
retrying in 8 secs
Sep 25 07:42:11 treogen [   73.91] ata1: hard resetting link
Sep 25 07:42:11 treogen [   76.21] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [   76.21] ata1: reset failed (errno=-5),
retrying in 8 secs
Sep 25 07:42:11 treogen [   83.91] ata1: hard resetting link
Sep 25 07:42:11 treogen [   86.21] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [   86.21] ata1: reset failed (errno=-5),
retrying in 33 secs
Sep 25 07:42:11 treogen [  118.91] ata1: limiting SATA link speed
to 1.5 Gbps
Sep 25 07:42:11 treogen [  118.91] ata1: hard resetting link
Sep 25 07:42:11 treogen [  121.21] ata1: softreset failed (port not ready)
Sep 25 07:42:11 treogen [  121.21] ata1: reset failed, giving up
Sep 25 07:42:11 treogen [  121.21] ata1.00: disabled
Sep 25 07:42:11 treogen [  121.21] ata1: EH complete
Sep 25 07:42:11 treogen [  121.21] sd 0:0:0:0: [sda] Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 25 07:42:11 treogen [  121.21] end_request: I/O error, dev
sda, sector 625137161
Sep 25 07:42:11 treogen [  121.21] md: super_written gets
error=-5, uptodate=0
Sep 25 07:42:11 treogen [  121.21] raid5: Disk failure on sda2,
disabling device. Operation continuing on 2 devices

Comparing the drivers/ata directory from rc3-mm1 and rc4-mm1, the
following change looked the most suspicious to me:
http://git.kernel.org/?p=linux/kernel/git/jgarzik/libata-dev.git;a=blobdiff;f=drivers/ata/sata_sil24.c;h=3dcb223117be9739ee04d70b6bfc776a4b839a3f;hp=e0cd31aa8002350add53ba6ff07493e503275244;hb=020bc1bd8d369a77bd9379cd9763ac0057651753;hpb=8d4bdf8087e682df98bdb856f6ad451bf6d597e7

The fact that sata_sil24.c did not change any more after rc4-mm1 also
matches the occurrence of the error.

To confirm my theory I replaced the sata_sil24.c from rc8-mm1 with
the version from rc3-mm1.
I was able to boot the resulting kernel successfully 5 times, without
the error happening again.

Torsten


Re: commit 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2 kills r8169 send performance

2007-09-26 Thread Timo Jantunen
On Wed, 26 Sep 2007, Willy Tarreau wrote:

> On Wed, Sep 26, 2007 at 09:52:02PM +0300, Timo Jantunen wrote:
> > On Wed, 26 Sep 2007, Francois Romieu wrote:
> > > The patch below is scheduled for inclusion before 2.6.23. Please try it 
> > > and
> > > see if it makes a difference on top of 2.6.23-rc8 (full dmesg will be 
> > > welcome
> > > too).
> > Thanks for the quick reply and fix. Unfortunately the fix didn't help in my 
> > case.
> 
> In another thread on LKML today, there has been some discussion about a
> similar problem, which is caused by a locking bug in iperf which makes
> it spin at 100% CPU. Ingo has posted a fix for this, please check the list.

I noticed the problem originally with another program (playback of 720p 
video to remote X using mplayer). And in my case the CPU usage is 2-3% 
systime, <1% userspace so it doesn't seem to be anything to do with the 
scheduler (I did try that "echo 1 > /proc/sys/kernel/sched_compat_yield" 
workaround, too.)


//T

> Regards,
> Willy


Re: [PATCH 4/4] dmapool: Improve memory usage for devices which can't cross boundaries

2007-09-26 Thread roel
Matthew Wilcox wrote:

[...]

> @@ -142,14 +144,13 @@ struct dma_pool *dma_pool_create(const char *name, 
> struct device *dev,
>   if ((size % align) != 0)
>   size = ALIGN(size, align);
>  
> - if (allocation == 0) {
> - if (PAGE_SIZE < size)
> - allocation = size;
> - else
> - allocation = PAGE_SIZE;
> - // FIXME: round up for less fragmentation
> - } else if (allocation < size)
> + allocation = max_t(size_t, size, PAGE_SIZE);
> +
> + if (!boundary) {
> + boundary = allocation;
> + } else if ((boundary < size) || (boundary & (boundary - 1))) {
>   return NULL;
> + }

	if (!boundary)
		boundary = allocation;
	else if (boundary < size || boundary & (boundary - 1))
		return NULL;

[...]

> @@ -190,11 +192,14 @@ struct dma_pool *dma_pool_create(const char *name, 
> struct device *dev,
>  static void pool_initialise_page(struct dma_pool *pool, struct dma_page 
> *page)
>  {
>   unsigned int offset = 0;
> + unsigned int next_boundary = pool->boundary;
>  
>   do {
>   unsigned int next = offset + pool->size;
> - if (unlikely((next + pool->size) >= pool->allocation))
> - next = pool->allocation;
> + if (unlikely((next + pool->size) >= next_boundary)) {

if (unlikely(next + pool->size >= next_boundary)) {

[...]


Re: commit 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2 kills r8169 send performance

2007-09-26 Thread Willy Tarreau
On Wed, Sep 26, 2007 at 09:52:02PM +0300, Timo Jantunen wrote:
> On Wed, 26 Sep 2007, Francois Romieu wrote:
> 
> > The patch below is scheduled for inclusion before 2.6.23. Please try it and
> > see if it makes a difference on top of 2.6.23-rc8 (full dmesg will be 
> > welcome
> > too).
> 
> Thanks for the quick reply and fix. Unfortunately the fix didn't help in my 
> case.

In another thread on LKML today, there has been some discussion about a
similar problem, which is caused by a locking bug in iperf which makes
it spin at 100% CPU. Ingo has posted a fix for this, please check the list.

Regards,
Willy



Re: [PATCH] bw-qcam: use data_reverse instead of manually poking the control register

2007-09-26 Thread Brett Warden
On 9/26/07, Ray Lee <[EMAIL PROTECTED]> wrote:

> Just as an aside, if you've tested this and it works, then there's no
> point to keep the write_lpcontrol even as a comment. Kill those four
> lines, and if someone's interested in what happened they'll just look
> at the file history.

Point taken, thanks for the feedback.

---

diff --git a/drivers/media/video/bw-qcam.c b/drivers/media/video/bw-qcam.c
index 7d47cbe..0ba92e3 100644
--- a/drivers/media/video/bw-qcam.c
+++ b/drivers/media/video/bw-qcam.c
@@ -107,6 +107,11 @@ static inline void write_lpcontrol(struct
qcam_device *q, int d)
parport_write_control(q->pport, d);
 }

+static inline void reverse_port(struct qcam_device *q)
+{
+   parport_data_reverse(q->pport);
+}
+
 static int qc_waithand(struct qcam_device *q, int val);
 static int qc_command(struct qcam_device *q, int command);
 static int qc_readparam(struct qcam_device *q);
@@ -369,7 +374,7 @@ static void qc_reset(struct qcam_device *q)
break;

case QC_ANY:
-   write_lpcontrol(q, 0x20);
+   reverse_port(q);
write_lpdata(q, 0x75);

if (read_lpdata(q) != 0x75) {
@@ -512,10 +517,12 @@ static inline int qc_readbytes(struct
qcam_device *q, char buffer[])
switch (q->port_mode & QC_MODE_MASK)
{
case QC_BIDIR:  /* Bi-directional Port */
-   write_lpcontrol(q, 0x26);
+   reverse_port(q);
+   write_lpcontrol(q, 0x6);
lo = (qc_waithand2(q, 1) >> 1);
hi = (read_lpstatus(q) >> 3) & 0x1f;
-   write_lpcontrol(q, 0x2e);
+   reverse_port(q);
+   write_lpcontrol(q, 0xe);
lo2 = (qc_waithand2(q, 0) >> 1);
hi2 = (read_lpstatus(q) >> 3) & 0x1f;
switch (q->bpp)
@@ -613,10 +620,13 @@ static long qc_capture(struct qcam_device * q,
char __user *buf, unsigned long l

if ((q->port_mode & QC_MODE_MASK) == QC_BIDIR)
{
-   write_lpcontrol(q, 0x2e);   /* turn port around */
-   write_lpcontrol(q, 0x26);
+   reverse_port(q);/* turn port around */
+   write_lpcontrol(q, 0xe);
+   reverse_port(q);
+   write_lpcontrol(q, 0x6);
(void) qc_waithand(q, 1);
-   write_lpcontrol(q, 0x2e);
+   reverse_port(q);
+   write_lpcontrol(q, 0xe);
(void) qc_waithand(q, 0);
}
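
The substitution in the patch above follows one pattern: every control value that had bit 0x20 set becomes a reverse_port() call followed by a write of the remaining bits. A minimal sketch of that split, assuming 0x20 is the data-direction bit of the control register (the name CTRL_DIRECTION and the helper split_control() are made up for illustration and are not taken from the driver):

```c
#include <stdint.h>

/* Assumed name for the direction bit; the patch only shows the value 0x20. */
#define CTRL_DIRECTION 0x20u

struct ctrl_split {
	int reverse;        /* nonzero: ask for the data lines to be reversed */
	uint8_t remaining;  /* bits still written to the control register */
};

/* Separate the direction request from the other control bits. */
static struct ctrl_split split_control(uint8_t value)
{
	struct ctrl_split s;

	s.reverse = (value & CTRL_DIRECTION) != 0;
	s.remaining = (uint8_t)(value & ~CTRL_DIRECTION);
	return s;
}
```

Under that assumption, 0x26 splits into a reverse request plus 0x6, and 0x2e into a reverse request plus 0xe, which matches the values written in the diff.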




-- 
Brett Warden


Re: [PATCH 3/4] Change dmapool free block management

2007-09-26 Thread roel
Matthew Wilcox wrote:

[...]

> @@ -113,9 +133,12 @@ struct dma_pool *dma_pool_create(const char *name, 
> struct device *dev,
>   return NULL;
>   }
>  
> - if (size == 0)
> + if (size == 0) {
>   return NULL;
> -
> + } else if (size < 4) {
> + size = 4;
> + }

you could do without brackets

[...]

> @@ -263,34 +288,21 @@ void dma_pool_destroy(struct dma_pool *pool)
>   *
>   * This returns the kernel virtual address of a currently unused block,
>   * and reports its dma address through the handle.
> - * If such a memory block can't be allocated, null is returned.
> + * If such a memory block can't be allocated, %NULL is returned.
>   */
>  void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>dma_addr_t * handle)
>  {
>   unsigned long flags;
>   struct dma_page *page;
> - int map, block;
>   size_t offset;
>   void *retval;
>  
>   spin_lock_irqsave(&pool->lock, flags);
>   restart:
>   list_for_each_entry(page, &pool->page_list, page_list) {
> - int i;
> - /* only cachable accesses here ... */
> - for (map = 0, i = 0;
> -  i < pool->blocks_per_page; i += BITS_PER_LONG, map++) {
> - if (page->bitmap[map] == 0)
> - continue;
> - block = ffz(~page->bitmap[map]);
> - if ((i + block) < pool->blocks_per_page) {
> - clear_bit(block, &page->bitmap[map]);
> - offset = (BITS_PER_LONG * map) + block;
> - offset *= pool->size;
> - goto ready;
> - }
> - }
> + if (page->offset < pool->allocation)
> + goto ready;
>   }
>   if (!(page = pool_alloc_page(pool, GFP_ATOMIC))) {

	page = pool_alloc_page(pool, GFP_ATOMIC);
	if (!page) {

[...]

> @@ -355,7 +366,7 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, 
> dma_addr_t dma)
>  {
>   struct dma_page *page;
>   unsigned long flags;
> - int map, block;
> + unsigned int offset;
>  
>   if ((page = pool_find_page(pool, dma)) == 0) {

page = pool_find_page(pool, dma);
if (page == 0) {

>   if (pool->dev)
> @@ -368,13 +379,9 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, 
> dma_addr_t dma)
>   return;
>   }
>  
> - block = dma - page->dma;
> - block /= pool->size;
> - map = block / BITS_PER_LONG;
> - block %= BITS_PER_LONG;
> -
> + offset = vaddr - page->vaddr;
>  #ifdef   CONFIG_DEBUG_SLAB
> - if (((dma - page->dma) + (void *)page->vaddr) != vaddr) {
> + if ((dma - page->dma) != offset) {

if (dma - page->dma != offset) {
[...]
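
The replacement for the bitmap can be sketched as a free list threaded through the free blocks themselves. This is only an illustration of the general technique, not the patch's code: the names demo_page, demo_alloc and demo_free and both constants are invented here, and the real patch may store its links differently. Such a layout would also explain why sizes below 4 are raised to 4, since each free block has to hold the offset of the next one.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 64u   /* illustrative, stands in for pool->allocation */
#define BLOCK_SIZE 16u   /* illustrative, stands in for pool->size */

struct demo_page {
	unsigned char mem[PAGE_BYTES];
	unsigned int offset;   /* first free block; PAGE_BYTES when full */
};

/* Chain every block to the next one; the last link marks "no more". */
static void demo_page_init(struct demo_page *p)
{
	unsigned int off;

	for (off = 0; off < PAGE_BYTES; off += BLOCK_SIZE) {
		unsigned int next = off + BLOCK_SIZE;

		memcpy(p->mem + off, &next, sizeof(next));
	}
	p->offset = 0;
}

/* Pop the first free block, or return NULL when the page is full. */
static void *demo_alloc(struct demo_page *p)
{
	unsigned int off = p->offset;

	if (off >= PAGE_BYTES)
		return NULL;
	memcpy(&p->offset, p->mem + off, sizeof(p->offset));
	return p->mem + off;
}

/* Push a block back: it stores the old head and becomes the new head. */
static void demo_free(struct demo_page *p, void *vaddr)
{
	unsigned int off = (unsigned int)((unsigned char *)vaddr - p->mem);

	memcpy(p->mem + off, &p->offset, sizeof(p->offset));
	p->offset = off;
}
```

With this layout, "is there a free block?" is the single comparison page->offset < pool->allocation seen in the quoted hunk, instead of a scan over bitmap words.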


Re: [PATCH 1/8] taskstats: fix indentation of long argument lists

2007-09-26 Thread Andrew Morton
On Wed, 26 Sep 2007 19:08:18 +0200
Guillaume Chazarain <[EMAIL PROTECTED]> wrote:

> Align with the opening parenthesis.
> 
> Changelog since V1 (http://lkml.org/lkml/2007/9/21/527):
> - renamed fill_threadgroup() and add_tsk() to respectively
> fill_threadgroup_stats() and add_tsk_stats() as suggested by Balbir Singh.
> - added braces around do/while.
> - added patch to unbreak binary compatibility between taskstats v5/v6.
> - split further by preparing the bacct/xacct before the main patch.
> - some indentation fixes.

It is unclear to me whether this patch series replaces, augments or
conflicts with your earlier patch series starting with "taskstats: separate
PID/TGID stats producers to complete the TGID ones".

But it doesn't matter much - I think I'll duck both patch series for now. 
The gap between mainline and -mm is so large that it is becoming
unmanageable so let's concentrate on the review and stabilisation work for
now please, rather than adding more stuff.

Also, I don't think we're seeing enough review and test from people on
these patch series - I don't have time to do it all.  (Well, apparently I
do, but I don't think it's a good situation).


No linux/module.h

2007-09-26 Thread mahamuni ashish
I am writing a simple kernel module.
I have included linux/module.h, but the
compiler gives me an error that there is no such file. I also
searched for it on my machine.
It really doesn't exist. I am using Fedora 6.
How do I install the required libraries?




Re: [PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-26 Thread Andrew Morton
On Fri, 21 Sep 2007 16:13:56 -0700
Mingming Cao <[EMAIL PROTECTED]> wrote:

> Convert kmalloc to kzalloc() and get rid of the memset().

I split this into separate ext3/jbd and ext4/jbd2 patches.  It's generally
better to raise separate patches, please - the ext3 patches I'll merge
directly but the ext4 patches should go through (and be against) the ext4
devel tree.

I fixed lots of rejects against the already-pending changes to these
filesystems.

You forgot to remove the memsets in both start_this_handle()s.
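
For reference, the redundancy being pointed out can be shown with a userspace analogue, where calloc() stands in for kzalloc() and malloc() for kmalloc(). The struct and function names below are illustrative only and do not come from the JBD code:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for a handle structure. */
struct handle_demo {
	int refs;
	void *txn;
	char tag[8];
};

/* Before: allocate, then clear in a second step. */
static struct handle_demo *new_handle_two_step(void)
{
	struct handle_demo *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	memset(h, 0, sizeof(*h));
	return h;
}

/* After: one call that already returns zeroed memory. */
static struct handle_demo *new_handle_zeroed(void)
{
	return calloc(1, sizeof(struct handle_demo));
}
```

Both functions return identical contents, so a memset() left behind after switching to the zeroing allocator only clears the memory a second time.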



Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread David Wilder

Randy Dunlap wrote:
> On Wed, 26 Sep 2007 11:22:29 -0700 David J. Wilder wrote:
>
> > These patches provide a kernel tracing interface called "trace".
> >
> > (update) Moved the sample code to the new samples\ subdir
> >
> > The motivation for "trace" is to:
> > - Provide a simple set of tracing primitives that will utilize the high-
> >   performance and low-overhead of relayfs for passing trace data from
> >   kernel to user space.
> > - Provide a common user interface for managing kernel traces.
> > - Allow for binary as well as ascii trace data.
> > - Incorporate features from the systemtap runtime that are
> >   useful to others.
> >
> > Patches are against 2.6.23-rc6-mm1
> >
> > Summary of patches:
> > [patch 1/3]  Trace code and documentation
> > [patch 2/3]  Relay Reset Consumed
> > [patch 3/3]  Trace sample
> >
> > Note: Patches 1/3 and 2/3 must be applied together.
>
> Patch 2 provides an interface that patch 1 needs, correct?

Yes.

> So yes, patches 1 & 2 need to be applied together (merged),
> or their order could be reversed, yes?

2/3 should be applied at the same time as 1/3, or 2/3 can be applied
standalone.  The order in which they are applied makes no difference, but
trace will not build if the relay patch is not applied.

> Can't the Relay patch be merged standalone without breaking anything?

Yes, the relay patch can be applied standalone.

> > Note: The following patches must be applied with 3/3.
> > [patch 3/5] Add samples subdir
> > http://lkml.org/lkml/2007/9/25/157
> > [patch 4/5] Linux Kernel Markers - Samples
> > http://lkml.org/lkml/2007/9/25/166
>
> ---
> ~Randy
> Phaedrus says that Quality is about caring.




Re: [PATCH 2/4] dmapool: Validate parameters to dma_pool_create

2007-09-26 Thread roel
Matthew Wilcox wrote:
> Check that 'align' is a power of two, like the API specifies.
> Align 'size' to 'align' correctly -- the current code has an off-by-one.
> The ALIGN macro in kernel.h doesn't.
> 
> Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
> ---
>  mm/dmapool.c |   15 ---
>  1 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index a359b5e..f5d12a7 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -107,17 +107,18 @@ struct dma_pool *dma_pool_create(const char *name, 
> struct device *dev,
>  {
>   struct dma_pool *retval;
>  
> - if (align == 0)
> + if (align == 0) {
>   align = 1;
> - if (size == 0)
> + } else if (align & (align - 1)) {
>   return NULL;
> - else if (size < align)
> - size = align;
> - else if ((size % align) != 0) {
> - size += align + 1;
> - size &= ~(align - 1);
>   }
>  
> + if (size == 0)
> + return NULL;

The brackets in the first if/else are not required, and you could combine the 
two statements:

	if (align == 0)
		align = 1;
	else if (align & (align - 1) || size == 0)
		return NULL;
> +
> + if ((size % align) != 0)
> + size = ALIGN(size, align);
> +
>   if (allocation == 0) {
>   if (PAGE_SIZE < size)
>   allocation = size;
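
The checks discussed in this patch can be tried out in userspace. The sketch below is an assumption-laden illustration rather than the kernel code: ALIGN_UP is a local macro written to round the same way the patch description says ALIGN() does, and old_round() and checked_block_size() are hypothetical helpers that mirror the removed and added lines of the hunk.

```c
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Nonzero when x has more than one bit set, i.e. is not a power of two. */
static int not_power_of_two(size_t x)
{
	return (x & (x - 1)) != 0;
}

/* The rounding the patch removes: it adds align + 1 instead of align - 1. */
static size_t old_round(size_t size, size_t align)
{
	size += align + 1;
	size &= ~(align - 1);
	return size;
}

/*
 * Hypothetical helper mirroring the patched checks: returns the
 * adjusted block size, or 0 where dma_pool_create() would return NULL.
 */
static size_t checked_block_size(size_t size, size_t align)
{
	if (align == 0)
		align = 1;
	else if (not_power_of_two(align))
		return 0;

	if (size == 0)
		return 0;

	return ALIGN_UP(size, align);
}
```

With these definitions, checked_block_size(7, 4) gives 8 while old_round(7, 4) gives 12, which is the off-by-one the patch description mentions. Note also that folding the size == 0 test into the else-if branch, as suggested above, would skip that test whenever align is 0.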



Re: [PATCH] bw-qcam: use data_reverse instead of manually poking the control register

2007-09-26 Thread Ray Lee
On 9/26/07, Brett Warden <[EMAIL PROTECTED]> wrote:
> Appeases the warning "parport0 (bw-qcam): use data_reverse for this!"
>
> Signed-off-by: Brett T. Warden <[EMAIL PROTECTED]>
>
> ---
>
> It seems to work fine with my Quickcam under 2.6.22.

> @@ -369,7 +374,11 @@ static void qc_reset(struct qcam_device *q)
> break;
>
> case QC_ANY:
> -   write_lpcontrol(q, 0x20);
> +   /*
> +* Replaced with reverse_port
> +* write_lpcontrol(q, 0x20);
> +*/
> +   reverse_port(q);
> write_lpdata(q, 0x75);
>
> if (read_lpdata(q) != 0x75) {

Just as an aside, if you've tested this and it works, then there's no
point to keep the write_lpcontrol even as a comment. Kill those four
lines, and if someone's interested in what happened they'll just look
at the file history.

Ray


What's in linux1394-2.6.git

2007-09-26 Thread Stefan Richter
I heard that project tree maintainers are encouraged to post merge plans
in time, so I'm going to do so now too.  Although this time there isn't
much in store for drivers/{ieee1394,firewire} for the merge window
because we all got sidetracked lately...  It's basically bugfixes which
I felt were too late for 2.6.23.


To be merged:

Jean Delvare (1):
  ieee1394: pcilynx: I2C cleanups

Satyam Sharma (1):
  ieee1394: Fix kthread stopping in nodemgr_host_thread

Stefan Richter (18):
  ieee1394: eth1394: superfluous local variable
  ieee1394: eth1394: fix lock imbalance
  ieee1394: pcilynx: superfluous local variables
  ieee1394: sbp2: fix unsafe iteration over list of devices
  firewire: optimize fw_core_add_address_handler
  firewire: fw-core: local variable shadows a global one
  firewire: fw-sbp2: always enable IRQs before calling command ORB callback
  firewire: fw-sbp2: add support for multiple logical units per target
  firewire: fw-sbp2: expose module parameter for workarounds
  firewire: fw-sbp2: use an own workqueue (fix system responsiveness)
  firewire: fw-ohci: enforce read order for selfID generation
  firewire: fw-ohci: fix includes
  firewire: fw-ohci: reorder includes
  firewire: fw-ohci: log posted write errors
  firewire: fw-ohci: missing dma_unmap_single
  firewire: fw-ohci: check for misconfigured bus (phyID == 63)
  ieee1394: nodemgr: fix leak of struct csr1212_keyval
  ieee1394: csr1212: proper refcounting


On hold for debugging:

Kristian Høgsberg (1):
  firewire: Fix pci resume to not pass in a __be32 config rom.

-- 
Stefan Richter
-=-=-=== =--= ==-=-
http://arcgraph.de/sr/



Re: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008

2007-09-26 Thread D-Tick
Hi,
I described it in a little more detail in
http://lkml.org/lkml/2007/9/25/184 2 months ago. 
The kernel oopses often when there is (heavy) disk access, but not
always; that's the point. Sometimes it runs 4 weeks, sometimes only a
few days. With older kernels the software raid was sometimes out of sync
and one disk was gone; with the "new" kernel nothing like this has happened.

Hendrik - D-Tick

On Tue, Sep 25, 2007 at 02:27:28PM +0100, Mel Gorman wrote:
> On (25/09/07 13:53), Hendrik P. didst pronounce:
> > Maybe you know what brings this box down:
> > 
> > regards,
> > Hendrik P.
> > 
> 
> Nick, this is a random stab in the dark but you were around 
> mark_buffer_dirty()
> a few months ago. Does this error look familiar?
> 
> Hendrik, what is going on when you trigger this? Is it easily
> reproducible? Does it only occur with DEBUG_PAGEALLOC?
> 
> Thanks
> 
> > [263322.356816] BUG: unable to handle kernel NULL pointer dereference at
> > virtual address 0008
> > [263322.459908]  printing eip:
> > [263322.493267] c014e09c
> > [263322.520391] *pde = 
> > [263322.554795] Oops:  [#1]
> > [263322.589188] DEBUG_PAGEALLOC
> > [263322.623796] Modules linked in: lirc_dev jfs xfs reiserfs ntfs vfat
> > fat isofs udf
> > [263322.714319] CPU:0
> > [263322.714322] EIP:0060:[]Not tainted VLI
> > [263322.714327] EFLAGS: 00010046   (2.6.21.5_VIA_EPIA_MII_12000 #1)
> > [263322.878840] EIP is at __set_page_dirty_nobuffers+0x6c/0x110
> > [263322.946480] eax:    ebx: f6fe9ea0   ecx: f6fe9ea0   edx:
> > c101cba0
> > [263323.028685] esi: c330e3c8   edi:    ebp: f71f1e98   esp:
> > f71f1e94
> > [263323.110886] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
> > [263323.181661] Process kjournald (pid: 1362, ti=f71f task=f7c87350
> > task.ti=f71f)
> > [263323.274252] Stack: c5bbe504 f71f1ea0 c018391f f71f1ec4 c01c7d0f
> > c01cd4d7 f7c87350  
> > [263323.376097]f71f1ed8 c01cd7d6 e14a7c9c c330e3c8 f71f1ed0
> > c01c802b e14a7c9c f71f1ef8 
> > [263323.477943]c01c809d f71f1ef8 c01cd87e c01cf4d6 1000
> > 2ff5 c5bbe4e0 e14a7c9c 
> > [263323.579789] Call Trace:
> > [263323.612215]  [] show_trace_log_lvl+0x1a/0x30
> > [263323.674774]  [] show_stack_log_lvl+0xa9/0xd0
> > [263323.737337]  [] show_registers+0x1e9/0x2f0
> > [263323.797821]  [] die+0x11b/0x230
> > [263323.846872]  [] do_page_fault+0x2c6/0x5d0
> > [263323.906318]  [] error_code+0x74/0x80
> > [263323.960566]  [] mark_buffer_dirty+0x1f/0x30
> > [263324.022089]  [] __journal_temp_unlink_buffer+0x5f/0x160
> > [263324.096081]  [] __journal_unfile_buffer+0xb/0x20
> > [263324.162801]  [] __journal_refile_buffer+0x5d/0xa0
> > [263324.230558]  [] journal_commit_transaction+0xb10/0x1220
> > [263324.304553]  [] kjournald+0x12c/0x340
> > [263324.359841]  [] kthread+0xa3/0xd0
> > [263324.410971]  [] kernel_thread_helper+0x7/0x10
> > [263324.474573]  ===
> > [263324.518323] Code: 00 00 00 90 8b 02 8b 4a 10 25 00 80 00 00 66 85 c0
> > 0f 85 a7 00 00 00 f6 c1 01 75 2e 85 c9 74 2a 39 cb 0f 85 81 00 00 00 8b
> > 43 38  40 08 01 74 3d 8b 02 25 00 80 00 00 66 85 c0 75 72 8b 52 14 
> > [263324.751737] EIP: [] __set_page_dirty_nobuffers+0x6c/0x110
> > SS:ESP 0068:f71f1e94
> > 
> > 
> > config can be found at http://zankt.net/~d-tick/kernel-2.6.21.5-config
> > lspci, cmdline, mount at http://zankt.net/~d-tick/voltkraft-sysinfo
> > 
> > 
> > older oops:
> > 
> > [190861.885741] BUG: unable to handle kernel paging request at virtual
> > address 01190024
> > [190861.978686]  printing eip:
> > [190862.012138] c0176026
> > [190862.039274] *pde = 
> > [190862.073778] Oops:  [#1]
> > [190862.108250] DEBUG_PAGEALLOC
> > [190862.142857] Modules linked in: ntfs vfat fat isofs udf
> > [190862.205945] CPU:0
> > [190862.205948] EIP:0060:[]Not tainted VLI
> > [190862.205953] EFLAGS: 00010206   (2.6.21.5_VIA_EPIA_MII_12000 #1)
> > [190862.370464] EIP is at __d_lookup+0x66/0xf0
> > [190862.420439] eax: d309   ebx: 01190024   ecx: 42ff6ccb   edx:
> > c0e5da0c
> > [190862.502644] esi: 01190024   edi: eb49bef4   ebp: eb49bdc4   esp:
> > eb49bd98
> > [190862.584847] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> > [190862.655619] Process du (pid: 18106, ti=eb49a000 task=e2b33770
> > task.ti=eb49a000)
> > [190862.741975] Stack: c0e5da0c ea407c60 eb49be20 e19f9134 c0121e7b
> > 0019 42ff6ccb df8a9000
> > [190862.843822]df8a9019 eb49be20 eb49bef4 eb49bdec c016cb34
> > eb49be2c eb49be20 c1894120
> > [190862.945665]c016cddd df8a9000 df8a9019 eb49be20 eb49bef4
> > eb49be40 c016e62d eb49bef4
> > [190863.047511] Call Trace:
> > [190863.079939]  [] show_trace_log_lvl+0x1a/0x30
> > [190863.142498]  [] show_stack_log_lvl+0xa9/0xd0
> > [190863.205060]  [] show_registers+0x1e9/0x2f0
> > [190863.265545]  [] die+0x11b/0x230
> > [190863.314596]  [] do_page_fault+0x2c6/0x5d0
> > [190863.374040]  [] error_code+0x74/0x80
> > [190863.428289]  [] do_lookup+0x24/0x170
> > [190863.482538] 

Re: [PATCH] Add iSCSI iBFT support.

2007-09-26 Thread roel
Konrad Rzeszutek wrote:

[...]

> +static ssize_t
> +ibft_read_binary(struct kobject *kobj, struct bin_attribute *attr, char *buf,
> +  loff_t off, size_t count)
> +{
> +
> + struct ibft_device *ibft = container_of(kobj, struct ibft_device, kobj);
> + ssize_t len = ibft->hdr->length;
> +
> + if (off > len)
> + return 0;
> +
> + if (off + count > len)
> + count = len - off;

maybe you want to use:

count = min(count, len - off)

> +
> + memcpy(buf, ibft->hdr + off, count);
> +
> + return count;
> +}
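
A note on the clamp above, as a userspace sketch rather than the driver code. The name table_read() and its parameters are made up for illustration. It clamps with plain size_t arithmetic (the kernel's min() macro insists on matching types, so mixing size_t and ssize_t there would probably need min_t()), and it does the offset arithmetic on a char pointer, since "hdr + off" would step in units of the header struct if hdr is a struct pointer:

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: copy at most `count` bytes, starting at byte
 * offset `off` of a `len`-byte table, into `buf`.  Returns the number
 * of bytes copied, or 0 when the offset is at or past the end.
 */
ssize_t table_read(const void *table, size_t len,
                   char *buf, off_t off, size_t count)
{
    size_t avail;

    /* Nothing to read at or past the end, or at a negative offset. */
    if (off < 0 || (size_t)off >= len)
        return 0;

    /* Clamp the request to what is left; both operands are size_t. */
    avail = len - (size_t)off;
    if (count > avail)
        count = avail;

    /* The offset is in bytes, so index through a char pointer. */
    memcpy(buf, (const char *)table + off, count);
    return (ssize_t)count;
}
```

A request for 10 bytes at offset 6 of an 8-byte table comes back as 2 bytes, and a read at offset 8 returns 0.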

[...]

> +static struct ibft_device *ibft_idev;
> +/*
> + * ibft_init() - creates  sysfs tree entry for ibft data
> + */
> +static int __init ibft_init(void)
> +{
> + int rc = 0;
> +
> + printk(KERN_INFO "BIOS iBFT facility v%s %s\n", ISCSI_IBFT_VERSION,
> +ISCSI_IBFT_DATE);
> +
> + if (!ibft_phys)
> + find_ibft();
> +
> + /* What if the ibft_subsys is underneath another struct? */
> + rc = firmware_register(&ibft_subsys);
> + if (rc)
> + return rc;
> +
> + if (ibft_phys) {
> + printk(KERN_INFO "iBFT detected at 0x%lx.\n",
> +(unsigned long)ibft_phys);
> + ibft_idev = kzalloc(sizeof(*ibft_idev), GFP_KERNEL);
> + if (!ibft_idev)
> + return -ENOMEM;
> +
> + rc = ibft_device_register(ibft_idev);
> + if (rc) {
> + kfree(ibft_idev);
> + return rc;
> + }

you could do without this return statement (and the brackets) since rc is 
returned anyway...

> + } else {
> + printk(KERN_INFO "No iBFT detected.\n");
> + }

these brackets are not required either

> + return rc;

... here

> +}

[...]

Roel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-26 Thread Rafael J. Wysocki
On Wednesday, 26 September 2007 20:51, Thomas Gleixner wrote:
> On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> > There still are some oddities.
> > 
> > First, with the "x86-64: Disable local APIC timer use on AMD systems with 
> > C1E"
> > patch and my collection of suspend patches applied, the box doesn't boot
> > (the suspend patches don't even touch the boot code, so they should be
> > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch 
> > (adjusted
> > for 2.6.23-rc8) is applied in addition.  Is this expected?
> 
> No. That's odd. It is nothing else than adding "noapictimer" to the
> kernel command line.

Seems to be reproducible, though.  I'll investigate further.

> > Next, on 2.6.23-rc8 with the patches from:
> > 
> > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/
> > 
> > plus the "x86-64: Disable local APIC timer use on AMD systems with C1E" 
> > patch
> > and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation 
> > doesn't
> > work correctly.  Although the box hibernates and restores, there is a 
> > temporary
> > "hang" during the "resume hardware" sequence, after which the "lock" led 
> > starts
> > to blink (and remains in this state) and something like this appears in 
> > dmesg:
> > 
> > Extended CMOS year: 2000
> > Enabling non-boot CPUs ...
> > SMP alternatives: switching to SMP code
> > Booting processor 1/2 APIC 0x1
> > Initializing CPU#1
> > Calibrating delay using timer specific routine.. 3990.36 BogoMIPS 
> > (lpj=7980735)
> > CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> > CPU: L2 Cache: 512K (64 bytes/line)
> > Unable to handle kernel paging request at 806c64d4 RIP: 
> >  [] identify_cpu+0x2ac/0x5a1
> 
> Hmm. That's really early in the CPU bring up. The only change in this
> area is the C1E patch. Can you decode the exact source line, where it is
> failing ?

Yes, I can, but I'll first see what's wrong with the boot.

Greetings,
Rafael


Re: sys_chroot+sys_fchdir Fix

2007-09-26 Thread Christer Weinigel
On Wed, 26 Sep 2007 20:04:14 +0930
David Newall <[EMAIL PROTECTED]> wrote:

> Al Viro wrote:
> > Oh, for fsck sake...  Folks, it's standard-required behaviour.
> > Ability to chroot() implies the ability to break out of it.  Could
> > we please add that (along with reference to SuS) to l-k FAQ and be
> > done with that nonsense?
> 
> I'm pretty confident that it's only standard behavior for Linux.
> Every other unix says it's not allowed.

So how about reading up on the subject instead?  

*spends five minutes with Google*

From the OpenBSD FAQ (an operating system mostly known for being really,
really focused on security):

http://www.openbsd.org/faq/faq10.html

Any application which has to assume root privileges to operate is
pointless to attempt to chroot(2), as root can generally escape a
chroot(2).

Solaris:

http://www.softpanorama.org/Solaris/Security/solaris_privilege_sets.shtml

You must be root to make the chroot() call, and you should quickly
change to non-root (a root user can escape a chroot environment,
so if it's to be effective, you need to drop that privilege).

A chroot FAQ:

http://www.unixwiz.net/techtips/chroot-practices.html

There are well-known techniques used to escape from jail, but the
most common one requires root privileges inside the jail.

Another chroot FAQ, linked to from the previous one:

http://www.bpfh.net/simes/computing/chroot-break.html

This page details how the chroot() system call can be used to
provide an additional layer of security when running untrusted
programs. It also details how this additional layer of security
can be circumvented.

Whilst chroot() is reasonably secure, a program can escape from
its trap.

Yet Another FAQ, this time about secure Unix Programming:

http://www.faqs.org/faqs/unix-faq/programmer/secure-programming/

chroot() only limits the file system scope and nothing else.

[further descriptions of how to break out of chroot, with and
without root privileges]

Convinced?

  /Christer

-- 
"Just how much can I get away with and still go to heaven?"

Christer Weinigel <[EMAIL PROTECTED]>  http://www.weinigel.se


Re: [RFC][PATCH] page->mapping clarification [1/3] base functions

2007-09-26 Thread Hugh Dickins
On Sat, 22 Sep 2007, KAMEZAWA Hiroyuki wrote:
> On Fri, 21 Sep 2007 18:02:47 +0100 (BST)
> Hugh Dickins <[EMAIL PROTECTED]> wrote:
> 
> > Or should I now leave PG_swapcache as is,
> > given your designs on page->mapping?
> > 
>  will conflict with my idea ?
> ==
> http://marc.info/?l=linux-mm&m=118956492926821&w=2
> ==

I asked because I had thought it would be a serious conflict: obviously
the patches as such would conflict quite a bit, but that's not serious,
one or the other just gets fixed up.

But now I don't see it - we both want to grab a further bit from the
low bits of the page->mapping pointer, you PAGE_MAPPING_INFO and me
PAGE_MAPPING_SWAP; but that's okay, so long as whoever is left using
bit (1<<2) is careful about the 32-bit case and remembers to put
__attribute__((aligned(sizeof(long long))))
on the declarations of struct address_space and struct anon_vma
and your struct page_mapping_info.

Would that waste a little memory?  I think not with SLUB,
but perhaps with SLOB, which packs a little tighter.
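
To make the alignment point concrete, here is a small userspace sketch, not kernel code. The struct and the MAPPING_TAG_* names are invented for illustration. It assumes sizeof(long long) is 8, which holds on the usual 32-bit and 64-bit ABIs, so the attribute keeps bits 0..2 of the object's address clear:

```c
#include <stdint.h>

/* Hypothetical tag bits stored in the low bits of a pointer. */
#define MAPPING_TAG_MASK ((uintptr_t)7)        /* bits 0..2 */
#define MAPPING_TAG_BIT2 ((uintptr_t)1 << 2)   /* the bit that needs 8-byte alignment */

/*
 * Without the attribute, a struct holding only pointers and longs is
 * 4-byte aligned on a 32-bit machine, so bit 2 of its address may be set.
 */
struct tagged_target {
    int payload;
} __attribute__((aligned(sizeof(long long))));

/* Fold tag bits into the pointer value. */
uintptr_t mapping_tag(struct tagged_target *p, uintptr_t tag)
{
    return (uintptr_t)p | (tag & MAPPING_TAG_MASK);
}

/* Recover the real pointer by masking the tag bits off. */
struct tagged_target *mapping_ptr(uintptr_t v)
{
    return (struct tagged_target *)(v & ~MAPPING_TAG_MASK);
}

/* Recover the tag bits. */
uintptr_t mapping_tags(uintptr_t v)
{
    return v & MAPPING_TAG_MASK;
}
```

The cost is the padding the attribute may add to each object, which is the memory question raised above.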

Hugh


Re: [PATCH 0/4] allow drivers to flush in-flight DMA

2007-09-26 Thread Roland Dreier
 > Can we define this API to provide the same semantics as the memory
 > that dma_alloc_coherent() returns?

No, definitely not.  The property of the mapping here is all about
ordering with respect to other DMAs (from the same device) and nothing
to do with coherency between the CPU's and device's view of the memory.

 > Sorry - this feels like a "color of the shed" argument, but isn't
 > this about DMA ordering attribute?
 > "dmaflush" is an action and not an attribute to me.

I guess I don't love the "dmaflush" name, but the property of these
mappings is exactly that DMA into one of these mappings also performs
the action of flushing other in-flight DMAs.  However I guess your
point is a good one: the effect really desired is that DMAs to these
mappings become visible strictly after earlier DMAs, and we don't care
exactly how the effect is obtained.

No good idea of a better name though.

 - R.


Re: Chroot bug (was: sys_chroot+sys_fchdir Fix)

2007-09-26 Thread Bodo Eggert
On Wed, 26 Sep 2007, David Newall wrote:

> Miloslav Semler pointed out that a root process can chdir("..") out of 
> its chroot.  Although this is documented in the man page, it conflicts 
> with the essential function, which is to change the root directory of 
> the process.

The root directory, '/' is changed, and if the process is capable of using
chroot, it may change the root directory again. Works as defined.

>  In addition to any creative uses, for example Philipp 
> Marek's loading dynamic libraries, it seems clear that the prime purpose 
> of chroot is to aid security.

As long as root has more than a safe subset of capabilities, root can escape 
a chroot.

Besides that, fchdir on open-at-chroot fds does not decrease the security, 
since the attacker needs help from the outside root, who is not restricted 
by chroot.

I'm more concerned about abstract unix sockets, they could be used to 
send a file descriptor to compromised daemons and extend exploits to
the outside of a chroot and across namespaces - at least I suspect it.
The whole f* family of syscalls would be affected. This can be cured by
e.g. not allowing to receive fds if the root+namespace do not match.

>  Being able to cd your way out is handy 
> for the bad guys, but the good guys don't need it; there are a thousand 
> better, safer solutions.

The good guys don't cd out, they open the installer archive, chroot to the 
new system root and extract it there. Then they chroot back using the saved 
cwd.
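
Roughly, that sequence looks like the sketch below. run_in_chroot() is a made-up name, the caller needs CAP_SYS_CHROOT, and it assumes the process starts with its current directory at the outer root, because the final chroot(".") makes whatever directory was saved the root again:

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: run fn() inside new_root, then return to the
 * directory that was current on entry and make it the root again.
 * Returns fn()'s result (0 if fn is NULL), or -1 on any failure.
 */
int run_in_chroot(const char *new_root, int (*fn)(void))
{
    int saved, rc = -1;

    /* Keep a handle on the current directory of the outer root. */
    saved = open(".", O_RDONLY);
    if (saved < 0)
        return -1;

    if (chroot(new_root) != 0) {
        close(saved);
        return -1;
    }

    /* chroot() does not change the cwd, so enter the new root explicitly. */
    if (chdir("/") == 0)
        rc = fn ? fn() : 0;

    /* Step back out through the saved fd, then re-root at that directory. */
    if (fchdir(saved) != 0 || chroot(".") != 0)
        rc = -1;

    close(saved);
    return rc;
}
```

The fchdir() on the saved descriptor is exactly the step that a "fix" forbidding it would break.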

> If there truly is a need to be able to pop in and out of a chroot, then 
> the solution should be obvious, such as with real versus effective user 
> and group ids.  An important quality of a solution would be a way to fix 
> that essential function: to set the root in such a way that you can no 
> longer pop out.  But that is a separate question.

As in jail()?

As far as I know, the new virtualisation features sneaking into the kernel  
will allow implementing a jail, too, in a more secure way than any hacking 
on chroot can give.

> The question: is chroot buggy?  I'm pleased to turn to SCO for an 
> independent definition for chroot, from which I get the following:
> 
> http://osr600doc.sco.com/en/man/html.S/chroot.S.html:
> >
> > The *..* entry in the root directory is interpreted to mean the root 
> > directory itself. Thus, *..* cannot be used to access files outside 
> > the subtree rooted at the root directory.
> >
> 
> I argue chroot is buggy.  Miloslav's patch might not be the right 
> solution, but he has the right idea (i.e. fix it.)

There are implementations of chroot which imply chdir(), and not having f* 
functions, they can not _directly_ access files outside the chroot. But as 
long as they can e.g. mknod /dev/mem or strace, they can do anything.

So let's not put a fingerprint sensor on that chinese paper door.
-- 
You know you're in trouble when packet floods are competing to flood you.
-- grc.com


Re: [PATCH 3/4] Change dmapool free block management

2007-09-26 Thread Roland Dreier
 > Also add documentation for how dma pools work, move the header above the
 > includes, add my copyright, add the original author's copyright, add a
 > GPL v2 licence to the file and fix the includes.

The fact that you have all these other changes mixed in makes the main
change very difficult to review.  Also I assume the main change is
"Change dmapool free block management" -- but I'm left wondering
exactly how and why you're changing it.

 - R.


Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread Mathieu Desnoyers
* David J. Wilder ([EMAIL PROTECTED]) wrote:
> These patches provide a kernel tracing interface called "trace".
> 
> (update) Moved the sample code to the new samples\ subdir
> 
> The motivation for "trace" is to:
> - Provide a simple set of tracing primitives that will utilize the high-
>   performance and low-overhead of relayfs for passing traces data from
>   kernel to user space.
> - Provide a common user interface for managing kernel traces.
> - Allow for binary as well as ascii trace data.
> - Incorporate features from the systemtap runtime that are
>   useful to others.
> 
> Patches are against 2.6.23-rc6-mm1
> 
> Summary of patches:
> [patch 1/3]  Trace code and documentation
> [patch 2/3]  Relay Reset Consumed
> [patch 3/3]  Trace sample
> 
> Note: Patches 1/3 and 2/3 must be applied together.
> 
> Note: The following patches must be applied with 3/3.
> [patch 3/5] Add samples subdir
>   http://lkml.org/lkml/2007/9/25/157

I guess you mean:
[patch 3/5] Add samples subdir (updated)
http://lkml.org/lkml/2007/9/25/366

(please try it with this new version, it should work as is..)

Mathieu

> [patch 4/5] Linux Kernel Markers - Samples
>   http://lkml.org/lkml/2007/9/25/166
> 
> Signed-off-by: David Wilder <[EMAIL PROTECTED]>
> 
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: Regression in 2.6.23-pre Was: Problems with 2.6.23-rc6 on AMD Geode LX800

2007-09-26 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> It's the latter - max_pfn as read by find_max_pfn() in arch/i386/e820.c
> is being set to 9F (640k) in the broken case, this due to
> the e820 map looking something like this:
> 
> Address   Size  Type
>   0009FC00  1
> 0009FC00  0400  2
> 000E  2000  2
> 
> (Yep, that's it - that's the list.  e820.nr_map is indeed 3). 
> 
> Long story short, bdata->node_low_pfn gets set to 9F, and when we 
> try to allocate the bootmem bitmap (at _pa_symbol(_text), which is 
> page 0x100), then the system gets appropriately angry.
> 
> As background, I'm using syslinux 3.36 as my loader here - I've used this
> exact same version for a very long time, so I don't blame it in the least.
> Something is getting confused in the early kernel, and whatever that
> something is, a still unknown change in a newer version of the BIOS
> fixed it.  The search goes on.
> 
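
For anyone following the arithmetic, a userspace sketch of how the highest RAM page frame falls out of an e820-style map. It is modelled only loosely on what find_max_pfn() does. The struct layout, the constants and the name max_ram_pfn() are assumptions made for this example:

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM        1   /* usable memory */
#define E820_RESERVED   2   /* reserved, never counted */
#define SKETCH_PAGE_SHIFT 12 /* 4 KiB pages */

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Return one past the highest page frame fully covered by a RAM entry. */
unsigned long max_ram_pfn(const struct e820_entry *map, size_t n)
{
    unsigned long max_pfn = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t start, end;

        if (map[i].type != E820_RAM)
            continue;

        /* Round the start up and the end down to whole pages. */
        start = (map[i].addr + (1ULL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
        end = (map[i].addr + map[i].size) >> SKETCH_PAGE_SHIFT;
        if (start >= end)
            continue;
        if (end > max_pfn)
            max_pfn = (unsigned long)end;
    }
    return max_pfn;
}
```

With a single RAM entry of 0x9FC00 bytes at address 0 and only reserved entries after it, this gives 0x9F, which is below page 0x100 where the bootmem bitmap is being placed.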

Please try the following debug patch to let us know what is going on.

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index 1a2e62d..a0ccf29 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -33,6 +33,12 @@ static int detect_memory_e820(void)
  "=m" (*desc)
: "D" (desc), "a" (0xe820));
 
+   printf("e820: err %d id 0x%08x next %u %08x:%08x %u\n",
+  err, id, next,
+  (unsigned int)desc->addr,
+  (unsigned int)desc->size,
+  desc->type);
+
if (err || id != SMAP)
break;
 


Re: __kernel_vsyscall () hangs in SIGCHLD handler

2007-09-26 Thread Mikael Pettersson
Ulrich Drepper writes:
 > On 9/26/07, John Z. Bohach <[EMAIL PROTECTED]> wrote:
 > > Is there some reason that syslog() sleeps in __kernel_vsyscall() when
 > > invoked from a signal handler?
 > 
 > Only very few functions are allowed to be called from signal handlers.
 >  This is clearly spelled out in the POSIX spec.  Section XSH 2.4.3
 > lists the allowed functions.  syslog of course is not on it.

The Linux kernel itself imposes no restrictions on what you can
do in user-space signal handlers.

However, user-space is a different story. The interrupted thread
may have held a lock, in which case calling code from the signal
handler that tries to take the same lock may result in a deadlock.
Or the thread may have been in the process of updating some private
data, in which case calling code from the signal handler that tries
to access that data may result in data corruption. And then there's
libc which may wrap your signal handler with code doing unspecified
libc magic. And so on, the possibilities for failure are endless :-)


[PATCH] bw-qcam: use data_reverse instead of manually poking the control register

2007-09-26 Thread Brett Warden
Appeases the warning "parport0 (bw-qcam): use data_reverse for this!"

Signed-off-by: Brett T. Warden <[EMAIL PROTECTED]>

---

It seems to work fine with my Quickcam under 2.6.22.

diff --git a/drivers/media/video/bw-qcam.c b/drivers/media/video/bw-qcam.c
index 7d47cbe..01e47ed 100644
--- a/drivers/media/video/bw-qcam.c
+++ b/drivers/media/video/bw-qcam.c
@@ -107,6 +107,11 @@ static inline void write_lpcontrol(struct
qcam_device *q, int d)
parport_write_control(q->pport, d);
 }

+static inline void reverse_port(struct qcam_device *q)
+{
+   parport_data_reverse(q->pport);
+}
+
 static int qc_waithand(struct qcam_device *q, int val);
 static int qc_command(struct qcam_device *q, int command);
 static int qc_readparam(struct qcam_device *q);
@@ -369,7 +374,11 @@ static void qc_reset(struct qcam_device *q)
break;

case QC_ANY:
-   write_lpcontrol(q, 0x20);
+   /*
+* Replaced with reverse_port
+* write_lpcontrol(q, 0x20);
+*/
+   reverse_port(q);
write_lpdata(q, 0x75);

if (read_lpdata(q) != 0x75) {
@@ -512,10 +521,12 @@ static inline int qc_readbytes(struct
qcam_device *q, char buffer[])
switch (q->port_mode & QC_MODE_MASK)
{
case QC_BIDIR:  /* Bi-directional Port */
-   write_lpcontrol(q, 0x26);
+   reverse_port(q);
+   write_lpcontrol(q, 0x6);
lo = (qc_waithand2(q, 1) >> 1);
hi = (read_lpstatus(q) >> 3) & 0x1f;
-   write_lpcontrol(q, 0x2e);
+   reverse_port(q);
+   write_lpcontrol(q, 0xe);
lo2 = (qc_waithand2(q, 0) >> 1);
hi2 = (read_lpstatus(q) >> 3) & 0x1f;
switch (q->bpp)
@@ -613,10 +624,13 @@ static long qc_capture(struct qcam_device * q,
char __user *buf, unsigned long l

if ((q->port_mode & QC_MODE_MASK) == QC_BIDIR)
{
-   write_lpcontrol(q, 0x2e);   /* turn port around */
-   write_lpcontrol(q, 0x26);
+   reverse_port(q);/* turn port around */
+   write_lpcontrol(q, 0xe);
+   reverse_port(q);
+   write_lpcontrol(q, 0x6);
(void) qc_waithand(q, 1);
-   write_lpcontrol(q, 0x2e);
+   reverse_port(q);
+   write_lpcontrol(q, 0xe);
(void) qc_waithand(q, 0);
}



-- 
Brett Warden


[Re: NMI error and Intel S5000PSL Motherboards]

2007-09-26 Thread samson yeung
Hello,

I'm working with AndrewL733 on this issue. I'm doing the git bisect right now.

scanpci -f -1 causes the problem, scanpci -f -2 and scanpci -O do not.

The systems have two 1-Gig sticks in the D1 and C1 slots of the
motherboard. I ran memtest86 overnight and got no errors. (Samsung 1GB
PC2-5300F-555-11-B0)

Neither pci=nomsi nor pci=nommconf changes the situation on
Ubuntu's custom kernel. I can try them on a stock kernel.org kernel
after I finish doing the git bisect.

The driver does not even need to be loaded to have the problem
(e1000). I have not tried the 2.6.18 driver with 2.6.20, but I have
tried both the in-kernel driver as well as the newer driver from Intel
with the same result.

The drive is a Seagate Barracuda 7200.9 80 Gbytes with firmware 3.AAE
I can include hdparm -i output if it will help.

The problem is only happening on 64-bit. As noted above, I'm running
git-bisect to test a stock kernel.org kernel. 32-bit Ubuntu does not
exhibit the problem, I have not tested a kernel.org 32-bit kernel.

Extended command output follows:

cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5160  @ 3.00GHz
stepping: 6
cpu MHz : 1998.000
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr dca lahf_lm
bogomips: 5990.11
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU5160  @ 3.00GHz
stepping: 6
cpu MHz : 1998.000
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr dca lahf_lm
bogomips: 5984.99
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

--
lspci -v:
00:00.0 Host bridge: Intel Corporation Server Memory Controller Hub (rev b1)
Subsystem: Intel Corporation Unknown device 3476
Flags: bus master, fast devsel, latency 0
Capabilities: 

00:02.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 2-3
(rev b1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=05, sec-latency=0
I/O behind bridge: 4000-4fff
Memory behind bridge: b800-b89f
Capabilities: 

00:03.0 PCI bridge: Intel Corporation Server PCI Express x4 Port 3
(rev b1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
Capabilities: 

00:04.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 4-5
(rev b1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=07, subordinate=07, sec-latency=0
Capabilities: 

00:05.0 PCI bridge: Intel Corporation Server PCI Express x4 Port 5
(rev b1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=08, subordinate=08, sec-latency=0
Capabilities: 

00:06.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 6-7
(rev b1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=09, subordinate=0c, sec-latency=0
I/O behind bridge: 2000-3fff
Memory behind bridge: b8b0-b8cf
Prefetchable memory behind bridge: b8e0-b8f0
Capabilities: 

00:07.0 PCI bridge: Intel Corporation Server PCI Express x4 Port 7
(rev b1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=0d, subordinate=0d, sec-latency=0
Capabilities: 

00:08.0 System peripheral: Intel Corporation Server DMA Engine (rev b1)
Subsystem: Intel Corporation Unknown device 3476
Flags: bus master, fast devsel, latency 0, IRQ 1
Memory at fe70 (64-bit, non-prefetchable) [size=1K]
Capabilities: 

00:10.0 Host bridge: Intel Corporation Server Error Reporting 

Re: [ofa-general] [PATCH v3] iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.

2007-09-26 Thread Steve Wise

Roland/Sean,

What do you all think?

Steve.


Steve Wise wrote:

iw_cxgb3: Support "iwarp-only" interfaces to avoid 4-tuple conflicts.

Version 3:

- don't use list_del_init() where list_del() is sufficient.

Version 2:

- added a per-device mutex for the address and listening endpoints lists.

- wait for all replies if sending multiple passive_open requests to rnic.

- log warning if no addresses are available when a listen is issued.

- tested

---

Design:

The sysadmin creates "for iwarp use only" alias interfaces of the form
"devname:iw*" where devname is the native interface name (eg eth0) for the
iwarp netdev device.  The alias label can be anything starting with "iw".
The "iw" immediately after the ':' is the key used by the iw_cxgb3 driver.

EG:
ifconfig eth0 192.168.70.123 up
ifconfig eth0:iw1 192.168.71.123 up
ifconfig eth0:iw2 192.168.72.123 up

In the above example, 192.168.70/24 is for TCP traffic, while
192.168.71/24 and 192.168.72/24 are for iWARP/RDMA use.

The rdma-only interface must be on its own IP subnet. This allows routing
all rdma traffic onto this interface.

The iWARP driver must translate all listens on address 0.0.0.0 to the
set of rdma-only ip addresses for the device in question.  This prevents
incoming connect requests to the TCP ipaddresses from going up the
rdma stack.

Implementation Details:

- The iw_cxgb3 driver registers for inetaddr events via
register_inetaddr_notifier().  This allows tracking the iwarp-only
addresses/subnets as they get added and deleted.  The iwarp driver
maintains a list of the current iwarp-only addresses.

- The iw_cxgb3 driver builds the list of iwarp-only addresses for its
devices at module insert time.  This is needed because the inetaddr
notifier callbacks don't "replay" address-add events when someone
registers.  So the driver must build the initial list at module load time.

- When a listen is done on address 0.0.0.0, then the iw_cxgb3 driver
must translate that into a set of listens on the iwarp-only addresses.
This is implemented by maintaining a list of stid/addr entries per
listening endpoint.

- When a new iwarp-only address is added or removed, the iw_cxgb3 driver
must traverse the set of listening endpoints and update them accordingly.
This allows an application to bind to 0.0.0.0 prior to the iwarp-only
interfaces being configured.  It also allows changing the iwarp-only set
of addresses and getting the expected behavior for apps already bound
to 0.0.0.0.  This is done by maintaining a list of listening endpoints
off the device struct.

- The address list, the listening endpoint list, and each list of
stid/addrs in use per listening endpoint are all protected via a mutex
per iw_cxgb3 device.

Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
---

 drivers/infiniband/hw/cxgb3/iwch.c|  125 
 drivers/infiniband/hw/cxgb3/iwch.h|   11 +
 drivers/infiniband/hw/cxgb3/iwch_cm.c |  259 +++--
 drivers/infiniband/hw/cxgb3/iwch_cm.h |   15 ++
 4 files changed, 360 insertions(+), 50 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c 
b/drivers/infiniband/hw/cxgb3/iwch.c
index 0315c9d..d81d46e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -63,6 +63,123 @@ struct cxgb3_client t3c_client = {
 static LIST_HEAD(dev_list);
 static DEFINE_MUTEX(dev_mutex);
 
+static void insert_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)

+{
+   struct iwch_addrlist *addr;
+
+   addr = kmalloc(sizeof *addr, GFP_KERNEL);
+   if (!addr) {
+   printk(KERN_ERR MOD "%s - failed to alloc memory!\n",
+  __FUNCTION__);
+   return;
+   }
+   addr->ifa = ifa;
+   mutex_lock(&rnicp->mutex);
+   list_add_tail(&addr->entry, &rnicp->addrlist);
+   mutex_unlock(&rnicp->mutex);
+}
+
+static void remove_ifa(struct iwch_dev *rnicp, struct in_ifaddr *ifa)
+{
+   struct iwch_addrlist *addr, *tmp;
+
+   mutex_lock(&rnicp->mutex);
+   list_for_each_entry_safe(addr, tmp, &rnicp->addrlist, entry) {
+   if (addr->ifa == ifa) {
+   list_del(&addr->entry);
+   kfree(addr);
+   goto out;
+   }
+   }
+out:
+   mutex_unlock(&rnicp->mutex);
+}
+
+static int netdev_is_ours(struct iwch_dev *rnicp, struct net_device *netdev)
+{
+   int i;
+
+   for (i = 0; i < rnicp->rdev.port_info.nports; i++)
+   if (netdev == rnicp->rdev.port_info.lldevs[i])
+   return 1;
+   return 0;
+}
+
+static inline int is_iwarp_label(char *label)
+{
+   char *colon;
+
+   colon = strchr(label, ':');
+   if (colon && !strncmp(colon+1, "iw", 2))
+   return 1;
+   return 0;
+}
+
+static int nb_callback(struct notifier_block *self, unsigned long event,
+  void *ctx)
+{
+   struct in_ifaddr *ifa = ctx;
+   struct iwch_dev *rnicp = container_of(self, struct iwch_dev, nb);

[PATCH 4/4] dmapool: Improve memory usage for devices which can't cross boundaries

2007-09-26 Thread Matthew Wilcox
The previous implementation simply refused to allocate more than a
boundary's worth of data from an entire page.  Some users didn't know
this, so specified things like SMP_CACHE_BYTES, not realising the
horrible waste of memory that this was.  It's fairly easy to correct
this problem, just by ensuring we don't cross a boundary within a page.
This even helps drivers like EHCI (which can't cross a 4k boundary)
on machines with larger page sizes.

Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 mm/dmapool.c |   29 +
 1 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/mm/dmapool.c b/mm/dmapool.c
index 4418e4d..cc43d20 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -43,6 +43,7 @@ struct dma_pool { /* the pool */
size_t size;
struct device *dev;
size_t allocation;
+   size_t boundary;
char name[32];
wait_queue_head_t waitq;
struct list_head pools;
@@ -107,7 +108,7 @@ static DEVICE_ATTR(pools, S_IRUGO, show_pools, NULL);
  * @dev: device that will be doing the DMA
  * @size: size of the blocks in this pool.
  * @align: alignment requirement for blocks; must be a power of two
- * @allocation: returned blocks won't cross this boundary (or zero)
+ * @boundary: returned blocks won't cross this power of two boundary
  * Context: !in_interrupt()
  *
  * Returns a dma allocation pool with the requested characteristics, or
@@ -117,15 +118,16 @@ static DEVICE_ATTR(pools, S_IRUGO, show_pools, NULL);
  * cache flushing primitives.  The actual size of blocks allocated may be
  * larger than requested because of alignment.
  *
- * If allocation is nonzero, objects returned from dma_pool_alloc() won't
+ * If @boundary is nonzero, objects returned from dma_pool_alloc() won't
  * cross that size boundary.  This is useful for devices which have
  * addressing restrictions on individual DMA transfers, such as not crossing
  * boundaries of 4KBytes.
  */
 struct dma_pool *dma_pool_create(const char *name, struct device *dev,
-size_t size, size_t align, size_t allocation)
+size_t size, size_t align, size_t boundary)
 {
struct dma_pool *retval;
+   size_t allocation;
 
if (align == 0) {
align = 1;
@@ -142,14 +144,13 @@ struct dma_pool *dma_pool_create(const char *name, struct 
device *dev,
if ((size % align) != 0)
size = ALIGN(size, align);
 
-   if (allocation == 0) {
-   if (PAGE_SIZE < size)
-   allocation = size;
-   else
-   allocation = PAGE_SIZE;
-   // FIXME: round up for less fragmentation
-   } else if (allocation < size)
+   allocation = max_t(size_t, size, PAGE_SIZE);
+
+   if (!boundary) {
+   boundary = allocation;
+   } else if ((boundary < size) || (boundary & (boundary - 1))) {
return NULL;
+   }
 
retval = kmalloc_node(sizeof *retval, GFP_KERNEL, dev_to_node(dev));
if (!retval)
@@ -162,6 +163,7 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
INIT_LIST_HEAD(&retval->page_list);
spin_lock_init(&retval->lock);
retval->size = size;
+   retval->boundary = boundary;
retval->allocation = allocation;
init_waitqueue_head(&retval->waitq);
 
@@ -190,11 +192,14 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
 {
unsigned int offset = 0;
+   unsigned int next_boundary = pool->boundary;
 
do {
unsigned int next = offset + pool->size;
-   if (unlikely((next + pool->size) >= pool->allocation))
-   next = pool->allocation;
+   if (unlikely((next + pool->size) >= next_boundary)) {
+   next = next_boundary;
+   next_boundary += pool->boundary;
+   }
*(int *)(page->vaddr + offset) = next;
offset = next;
} while (offset < pool->allocation);
-- 
1.5.3.1
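
The boundary handling in pool_initialise_page() can be tried out in
userspace.  What follows is only a sketch under stated assumptions, not
the patch's code: struct pool_geom, init_free_list() and
block_within_boundary() are names made up for illustration, and a plain
byte buffer stands in for the page returned by dma_alloc_coherent().
The loop keeps the same ">=" comparison as the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the struct dma_pool fields used here. */
struct pool_geom {
	size_t size;       /* block size, already rounded up to the alignment */
	size_t allocation; /* bytes per backing page */
	size_t boundary;   /* power of two that no block may cross */
};

/*
 * Thread the free list through the page: the first bytes of each free
 * block hold the offset of the next free block.  Mirrors the loop in the
 * patch, including its ">=" comparison.  Returns the number of blocks
 * that were linked.
 */
static unsigned int init_free_list(const struct pool_geom *g,
				   unsigned char *page)
{
	unsigned int offset = 0;
	unsigned int next_boundary = (unsigned int)g->boundary;
	unsigned int blocks = 0;

	do {
		unsigned int next = offset + (unsigned int)g->size;

		/* The following block would reach the boundary: skip to it. */
		if (next + g->size >= next_boundary) {
			next = next_boundary;
			next_boundary += (unsigned int)g->boundary;
		}
		memcpy(page + offset, &next, sizeof(next));
		offset = next;
		blocks++;
	} while (offset < g->allocation);

	return blocks;
}

/* Returns 1 if [offset, offset + size) stays inside one boundary window. */
static int block_within_boundary(size_t offset, size_t size, size_t boundary)
{
	return (offset / boundary) == ((offset + size - 1) / boundary);
}
```

With size 24, boundary 64 and a 256-byte page, the list visits offsets
0, 24, 64, 88, 128, 152, 192 and 216, so every block stays inside its
64-byte window.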



[PATCH 3/4] Change dmapool free block management

2007-09-26 Thread Matthew Wilcox
Also add documentation for how dma pools work, move the header above the
includes, add my copyright, add the original author's copyright, add a
GPL v2 licence to the file and fix the includes.

Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 mm/dmapool.c |  161 +++--
 1 files changed, 88 insertions(+), 73 deletions(-)

diff --git a/mm/dmapool.c b/mm/dmapool.c
index f5d12a7..4418e4d 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -1,25 +1,45 @@
+/*
+ * DMA Pool allocator
+ *
+ * Copyright 2001 David Brownell
+ * Copyright 2007 Intel Corporation
+ *   Author: Matthew Wilcox <[EMAIL PROTECTED]>
+ *
+ * This software may be redistributed and/or modified under the terms of
+ * the GNU General Public License ("GPL") version 2 as published by the
+ * Free Software Foundation.
+ *
+ * This allocator returns small blocks of a given size which are DMA-able by
+ * the given device.  It uses the dma_alloc_coherent page allocator to get
+ * new pages, then splits them up into blocks of the required size.
+ * Many older drivers still have their own code to do this.
+ *
+ * The current design of this allocator is fairly simple.  The pool is
+ * represented by the 'struct dma_pool' which keeps a doubly-linked list of
+ * allocated pages.  Each page in the page_list is split into blocks of at
+ * least 'size' bytes.  Free blocks are tracked in an unsorted singly-linked
+ * list of free blocks within the page.  Used blocks aren't tracked, but we
+ * keep a count of how many are currently allocated from each page.
+ */
 
 #include 
-#include 
-#include /* Needed for i386 to build */
-#include/* Needed for i386 to build */
 #include 
 #include 
-#include 
+#include 
+#include 
 #include 
+#include 
 #include 
 #include 
-
-/*
- * Pool allocator ... wraps the dma_alloc_coherent page allocator, so
- * small blocks are easily used by drivers for bus mastering controllers.
- * This should probably be sharing the guts of the slab allocator.
- */
+#include 
+#include 
+#include 
+#include 
+#include 
 
 struct dma_pool {  /* the pool */
struct list_head page_list;
spinlock_t lock;
-   size_t blocks_per_page;
size_t size;
struct device *dev;
size_t allocation;
@@ -32,8 +52,8 @@ struct dma_page { /* cacheable header for 'allocation' bytes */
struct list_head page_list;
void *vaddr;
dma_addr_t dma;
-   unsigned in_use;
-   unsigned long bitmap[0];
+   unsigned int in_use;
+   unsigned int offset;
 };
 
 #definePOOL_TIMEOUT_JIFFIES((100 /* msec */ * HZ) / 1000)
@@ -68,8 +88,8 @@ show_pools(struct device *dev, struct device_attribute *attr, char *buf)
 
/* per-pool info, no real statistics yet */
temp = scnprintf(next, size, "%-16s %4u %4Zu %4Zu %2u\n",
-pool->name,
-blocks, pages * pool->blocks_per_page,
+pool->name, blocks,
+pages * (pool->allocation / pool->size),
 pool->size, pages);
size -= temp;
next += temp;
@@ -113,9 +133,12 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
return NULL;
}
 
-   if (size == 0)
+   if (size == 0) {
return NULL;
-
+   } else if (size < 4) {
+   size = 4;
+   }
+   
if ((size % align) != 0)
size = ALIGN(size, align);
 
@@ -140,7 +163,6 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
spin_lock_init(&retval->lock);
retval->size = size;
retval->allocation = allocation;
-   retval->blocks_per_page = allocation / size;
init_waitqueue_head(&retval->waitq);
 
if (dev) {
@@ -165,28 +187,36 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
return retval;
 }
 
+static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
+{
+   unsigned int offset = 0;
+
+   do {
+   unsigned int next = offset + pool->size;
+   if (unlikely((next + pool->size) >= pool->allocation))
+   next = pool->allocation;
+   *(int *)(page->vaddr + offset) = next;
+   offset = next;
+   } while (offset < pool->allocation);
+}
+
 static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags)
 {
struct dma_page *page;
-   int mapsize;
-
-   mapsize = pool->blocks_per_page;
-   mapsize = (mapsize + BITS_PER_LONG - 1) / BITS_PER_LONG;
-   mapsize *= sizeof(long);
 
-   page = kmalloc(mapsize + sizeof *page, mem_flags);
+   page = kmalloc(sizeof(*page), mem_flags);
if (!page)
return NULL;
-   page->vaddr = dma_alloc_coherent(pool->dev,
-
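
The free-list scheme described in the new header comment (a singly-linked
list of free blocks kept inside the page itself, plus an in_use count) can
be sketched in userspace as below.  This is an illustration under stated
assumptions, not the code from this patch: struct sketch_page and the
sketch_*() helpers are hypothetical names, a fixed 256-byte array stands
in for the DMA page, and the end-of-page test uses ">" rather than the
patch's ">=".

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_BYTES 256u

/* Hypothetical userspace analogue of struct dma_page after this patch. */
struct sketch_page {
	unsigned char mem[SKETCH_PAGE_BYTES];
	unsigned int in_use;	/* blocks currently handed out */
	unsigned int offset;	/* offset of the first free block */
};

/* 'size' must be at least sizeof(unsigned int) and at most the page size. */
static void sketch_page_init(struct sketch_page *p, unsigned int size)
{
	unsigned int offset = 0;

	do {
		unsigned int next = offset + size;

		/* No room for another whole block: terminate the list. */
		if (next + size > SKETCH_PAGE_BYTES)
			next = SKETCH_PAGE_BYTES;
		memcpy(p->mem + offset, &next, sizeof(next));
		offset = next;
	} while (offset < SKETCH_PAGE_BYTES);

	p->in_use = 0;
	p->offset = 0;
}

/* Pop the first free block, or return NULL if the page is full. */
static void *sketch_alloc(struct sketch_page *p)
{
	unsigned int offset = p->offset;

	if (offset >= SKETCH_PAGE_BYTES)
		return NULL;
	memcpy(&p->offset, p->mem + offset, sizeof(p->offset));
	p->in_use++;
	return p->mem + offset;
}

/* Push a block back on the front of the free list. */
static void sketch_free(struct sketch_page *p, void *block)
{
	unsigned int offset = (unsigned int)((unsigned char *)block - p->mem);

	memcpy(block, &p->offset, sizeof(p->offset));
	p->offset = offset;
	p->in_use--;
}
```

Because freed blocks go back on the front of the list, the list is
unsorted and the most recently freed block is handed out first.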

Re: missing mnt_drop_write() on open error

2007-09-26 Thread Miklos Szeredi
> > Btw, may_open() doesn't do mnt_want_write() around the truncation if
> > file is opened with O_TRUNC | O_RDONLY.
> 
> What's the path to may_open() in that case?  open_namei() should wrap
> all callers other than nfs, and it does:
> 
>   /* O_TRUNC implies we need access checks for write permissions */
> if (flag & O_TRUNC)
> acc_mode |= MAY_WRITE;
> 
> Which should trigger the may_open() code.  

Ah, I missed that.

Thanks,
Miklos


[PATCH 1/4] Avoid taking waitqueue lock in dmapool

2007-09-26 Thread Matthew Wilcox
With one trivial change (taking the lock slightly earlier on wakeup
from schedule), all uses of the waitq are under the pool lock, so we
can use the locked (or __) versions of the wait queue functions, and
avoid the extra spinlock.

Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 mm/dmapool.c |9 +
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/dmapool.c b/mm/dmapool.c
index 6201371..a359b5e 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -273,8 +273,8 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
size_t offset;
void *retval;
 
- restart:
spin_lock_irqsave(&pool->lock, flags);
+ restart:
list_for_each_entry(page, &pool->page_list, page_list) {
int i;
/* only cachable accesses here ... */
@@ -296,12 +296,13 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
DECLARE_WAITQUEUE(wait, current);
 
current->state = TASK_INTERRUPTIBLE;
-   add_wait_queue(&pool->waitq, &wait);
+   __add_wait_queue(&pool->waitq, &wait);
spin_unlock_irqrestore(&pool->lock, flags);
 
schedule_timeout(POOL_TIMEOUT_JIFFIES);
 
-   remove_wait_queue(&pool->waitq, &wait);
+   spin_lock_irqsave(&pool->lock, flags);
+   __remove_wait_queue(&pool->waitq, &wait);
goto restart;
}
retval = NULL;
@@ -401,7 +402,7 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
page->in_use--;
set_bit(block, &page->bitmap[map]);
if (waitqueue_active(&pool->waitq))
-   wake_up(&pool->waitq);
+   wake_up_locked(&pool->waitq);
/*
 * Resist a temptation to do
 *if (!is_page_busy(bpp, page->bitmap)) pool_free_page(pool, page);
-- 
1.5.3.1



[PATCH 2/4] dmapool: Validate parameters to dma_pool_create

2007-09-26 Thread Matthew Wilcox
Check that 'align' is a power of two, like the API specifies.
Align 'size' to 'align' correctly -- the current code has an off-by-one.
The ALIGN macro in kernel.h doesn't.

Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 mm/dmapool.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/dmapool.c b/mm/dmapool.c
index a359b5e..f5d12a7 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -107,17 +107,18 @@ struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 {
struct dma_pool *retval;
 
-   if (align == 0)
+   if (align == 0) {
align = 1;
-   if (size == 0)
+   } else if (align & (align - 1)) {
return NULL;
-   else if (size < align)
-   size = align;
-   else if ((size % align) != 0) {
-   size += align + 1;
-   size &= ~(align - 1);
}
 
+   if (size == 0)
+   return NULL;
+
+   if ((size % align) != 0)
+   size = ALIGN(size, align);
+
if (allocation == 0) {
if (PAGE_SIZE < size)
allocation = size;
-- 
1.5.3.1
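
The two checks in this patch are easy to verify in userspace.  The
sketch below is an illustration, not kernel code: is_power_of_two(),
align_up() and old_round() are names made up here.  align_up() uses the
same arithmetic as the kernel's ALIGN() macro for a power-of-two
alignment, and old_round() reproduces the rounding that the patch
removes so the two can be compared.

```c
#include <stddef.h>

/* Nonzero power of two?  (The patch turns align == 0 into 1 beforehand.) */
static int is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Round x up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* The rounding removed by the patch, kept only for comparison. */
static size_t old_round(size_t size, size_t align)
{
	size += align + 1;
	size &= ~(align - 1);
	return size;
}
```

For size 7 and align 4, align_up() gives 8 while old_round() gives 12,
which is the off-by-one the changelog refers to.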



dmapool

2007-09-26 Thread Matthew Wilcox

I have a series of patches to dmapool that I'd like opinions on.  I
don't have any performance numbers yet, but some of the patches are a
good idea, with or without performance numbers.

One of the problems with dmapool is that it doesn't have a maintainer
listed.  I've spent enough time looking at it over the past couple of
weeks that I think I'd be comfortable in that role, so unless someone
objects, I'll submit a patch to MAINTAINERS to add myself.

I shan't post the first two patches.  The first simply Lindents the code
and the second moves it to mm/ (a better fit than drivers/base, IMO).

Four patches to follow.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


[PATCH] Add iSCSI iBFT support.

2007-09-26 Thread Konrad Rzeszutek
This patch adds a /sysfs/firmware/ibft/table binary blob which exports
the iSCSI Boot Firmware Table (iBFT) structure. 

What is iSCSI Boot Firmware Table? It is a mechanism for the iSCSI
tools to extract from the machine NICs the iSCSI connection information
so that they can automagically mount the iSCSI share/target. Currently
the iSCSI information is hard-coded in the initrd.

The full details of the structure are located at:
ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf

Signed-off-by: Konrad Rzeszutek <[EMAIL PROTECTED]>
Signed-off-by: Peter Jones <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c
index d474cd6..11d700f 100644
--- a/arch/i386/kernel/setup.c
+++ b/arch/i386/kernel/setup.c
@@ -46,7 +46,7 @@ #include 
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 #include 
@@ -150,6 +150,9 @@ static inline void copy_edd(void)
 }
 #endif
 
+void *ibft_phys;
+EXPORT_SYMBOL(ibft_phys);
+
 int __initdata user_defined_memmap = 0;
 
 /*
@@ -456,6 +459,15 @@ #ifdef CONFIG_KEXEC
reserve_bootmem(crashk_res.start,
crashk_res.end - crashk_res.start + 1);
 #endif
+
+   /* Scan for an iBFT (iSCSI Boot Firmware Table) */
+   {
+   unsigned int ibft_len = find_ibft();
+   if (ibft_len)
+   /* The spec says to scan for the table between 512k and 1MB.
+  We reserve it in case it is in the e820 RAM section. */
+   reserve_bootmem(ibft_phys, PAGE_ALIGN(ibft_len));
+   }
 }
 
 /*
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index af838f6..0d12775 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -44,6 +44,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -196,6 +197,9 @@ static inline void copy_edd(void)
 }
 #endif
 
+void *ibft_phys;
+EXPORT_SYMBOL(ibft_phys);
+
 #define EBDA_ADDR_POINTER 0x40E
 
 unsigned __initdata ebda_addr;
@@ -365,6 +369,15 @@ #ifdef CONFIG_KEXEC
crashk_res.end - crashk_res.start + 1);
}
 #endif
+   /* Scan for an iBFT (iSCSI Boot Firmware Table) */
+   {
+   unsigned int ibft_len = find_ibft();
+   if (ibft_len)
+   /* The spec says to scan for the table between 512k and 1MB.
+  We reserve it in case it is in the e820 RAM section. */
+   reserve_bootmem_generic((unsigned long)ibft_phys,
+   PAGE_ALIGN(ibft_len));
+   }
 
paging_init();
 
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 05f02a3..2d9f01a 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -93,4 +93,14 @@ config DMIID
  information from userspace through /sys/class/dmi/id/ or if you want
  DMI-based module auto-loading.
 
+config ISCSI_IBFT
+   tristate "iSCSI Boot Firmware Table Attributes"
+   depends on X86
+   default n
+   help
+ This option enables support for detection of an iSCSI
+ Boot Firmware Table (iBFT).  If you wish to detect iSCSI boot
+ parameters dynamically during system boot, say Y.
+ Otherwise, say N.
+
 endmenu
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 8d4ebc8..b6319f7 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_EFI_PCDP)  += pcdp.o
 obj-$(CONFIG_DELL_RBU)  += dell_rbu.o
 obj-$(CONFIG_DCDBAS)   += dcdbas.o
 obj-$(CONFIG_DMIID)+= dmi-id.o
+obj-$(CONFIG_ISCSI_IBFT)   += iscsi_ibft.o
diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
new file mode 100644
index 000..b3767fe
--- /dev/null
+++ b/drivers/firmware/iscsi_ibft.c
@@ -0,0 +1,201 @@
+/*
+ * drivers/firmware/iscsi_ibft.c
+ *  Copyright 2007 Red Hat, Inc.
+ *  by Peter Jones <[EMAIL PROTECTED]>
+ *  Copyright 2007 IBM
+ *  by Konrad Rzeszutek <[EMAIL PROTECTED]>
+ *
+ * This code exposes the iSCSI Boot Firmware Table to userland via sysfs.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License v2.0 as published by
+ * the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define ISCSI_IBFT_VERSION  "0.2"
+#define ISCSI_IBFT_DATE "2007-Aug-29"
+
+MODULE_AUTHOR
+("Peter Jones <[EMAIL PROTECTED]> and Konrad Rzeszutek <[EMAIL PROTECTED]>");
+MODULE_DESCRIPTION("sysfs interface to BIOS iBFT information");
+MODULE_LICENSE("GPL");
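
The find_ibft() helper called from the setup.c hunks is not visible in
this part of the patch.  As a rough idea of what scanning the 512k to
1MB region for the table could look like, here is a userspace sketch.
Everything in it is an assumption made for illustration: the function
name sketch_find_ibft(), the 4-byte "iBFT" signature and the 16-byte
scan step are not taken from the patch, and a plain buffer stands in
for the mapped physical memory.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_IBFT_SIGN "iBFT"	/* assumed 4-byte table signature */
#define SKETCH_IBFT_STEP 16	/* assumed alignment of the table */

/*
 * Scan 'len' bytes at 'base' for the signature on SKETCH_IBFT_STEP
 * boundaries.  Returns the offset of the first match, or -1 if none.
 */
static long sketch_find_ibft(const unsigned char *base, size_t len)
{
	size_t off;

	for (off = 0; off + 4 <= len; off += SKETCH_IBFT_STEP) {
		if (memcmp(base + off, SKETCH_IBFT_SIGN, 4) == 0)
			return (long)off;
	}
	return -1;
}
```

A real implementation would also have to validate the table length and
checksum before trusting the match.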

Re: commit 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2 kills r8169 send performance

2007-09-26 Thread Timo Jantunen
On Wed, 26 Sep 2007, Francois Romieu wrote:

> The patch below is scheduled for inclusion before 2.6.23. Please try it and
> see if it makes a difference on top of 2.6.23-rc8 (full dmesg will be welcome
> too).

Thanks for the quick reply and fix. Unfortunately the fix didn't help in my 
case.


Iperf readings (send+receive) from

2.6.23-git (just before 6dccd... commit)
[  5]  0.0-10.0 sec    830 MBytes    694 Mbits/sec
[  4]  0.0-10.0 sec    842 MBytes    706 Mbits/sec

2.6.23-rc8 (clean)
[  4]  0.0-10.0 sec    323 MBytes    270 Mbits/sec
[  5]  0.0-10.0 sec    961 MBytes    802 Mbits/sec

2.6.23-rc8 and your patch
[  5]  0.0-10.1 sec    326 MBytes    270 Mbits/sec
[  4]  0.0-10.0 sec    958 MBytes    802 Mbits/sec

2.6.23-rc8 and 6dccd... reverted
[  5]  0.0-10.1 sec    830 MBytes    692 Mbits/sec
[  4]  0.0-10.0 sec    803 MBytes    673 Mbits/sec


The full dmesg from 2.6.23-rc8+your patch is below. Dmesg from the other
versions differed only by random noise caused by minor variations in
bogomips scores, async initializations etc. I can send them to you too,
if you think they are useful.

===cut
Linux version 2.6.23-rc8-fix ([EMAIL PROTECTED]) (gcc version 4.2.0 (Gentoo 4.2.0 p1.4)) #2 SMP Wed Sep 26 21:24:43 EEST 2007
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009e400 (usable)
 BIOS-e820: 0009e400 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 7fee (usable)
 BIOS-e820: 7fee - 7fee3000 (ACPI NVS)
 BIOS-e820: 7fee3000 - 7fef (ACPI data)
 BIOS-e820: 7fef - 7ff0 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
1150MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f3640
NX (Execute Disable) protection: active
Entering add_active_range(0, 0, 524000) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   229376
  HighMem229376 ->   524000
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0:0 ->   524000
On node 0 totalpages: 524000
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  HighMem zone: 2301 pages used for memmap
  HighMem zone: 292323 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
DMI 2.5 present.
ACPI: RSDP 000F77F0, 0014 (r0 IntelR)
ACPI: RSDT 7FEE3000, 0034 (r1 IntelR AWRDACPI 42302E31 AWRD0)
ACPI: FACP 7FEE3080, 0074 (r1 IntelR AWRDACPI 42302E31 AWRD0)
ACPI: DSDT 7FEE3100, 4D1C (r1 INTELR AWRDACPI 1000 MSFT  300)
ACPI: FACS 7FEE, 0040
ACPI: MCFG 7FEE7F00, 003C (r1 IntelR AWRDACPI 42302E31 AWRD0)
ACPI: APIC 7FEE7E40, 0084 (r1 IntelR AWRDACPI 42302E31 AWRD0)
ACPI: SSDT 7FEE8860, 0380 (r1  PmRefCpuPm 3000 INTL 20041203)
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x03] enabled)
Processor #3 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 6:15 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 4, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 8000 (gap: 7ff0:6010)
Built 1 zonelists in Zone order.  Total pages: 519907
Kernel command line: ro root=/dev/sdb2 [EMAIL PROTECTED]/eth0,[EMAIL PROTECTED]/ video=vesafb:ypan,mtrr:3 vga=0x0376
netconsole: local port 2001
netconsole: local IP 10.0.0.1
netconsole: interface eth0
netconsole: remote port 2001
netconsole: remote IP 10.0.0.254
netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
mapped APIC to b000 (fee0)
mapped IOAPIC to a000 (fec0)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 2403.193 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 

Re: 2.6.23-rc8-mm1: ata3: soft resetting link after STR

2007-09-26 Thread Alexey Dobriyan
On Wed, Sep 26, 2007 at 05:00:20PM +0100, Alan Cox wrote:
> On Wed, 26 Sep 2007 19:39:01 +0400
> Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> 
> > Frequently get these messages after resume from STR,
> > (subjectively, first STR is always OK).
> 
> Does this occur if you have the acpi support enabled
> (libata.noacpi=0)

I resumed ~20 times and there weren't any.

What did happen 3 times is a hang on resume right after this
message:

Radeon 9600 256MB 1002 pa711eab.iee 817d33
 _
 ^ cursor here

with the HDD LED on. I never saw this on mainline.
(This is probably a separate bug.)


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-26 Thread Thomas Gleixner
On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> There still are some oddities.
> 
> First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> patch and my collection of suspend patches applied, the box doesn't boot
> (the suspend patches don't even touch the boot code, so they should be
> irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> for 2.6.23-rc8) is applied in addition.  Is this expected?

No. That's odd. It is nothing else than adding "noapictimer" to the
kernel command line.

> Next, on 2.6.23-rc8 with the patches from:
> 
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/
> 
> plus the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
> and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation doesn't
> work correctly.  Although the box hibernates and restores, there is a temporary
> "hang" during the "resume hardware" sequence, after which the "lock" led starts
> to blink (and remains in this state) and something like this appears in dmesg:
> 
> Extended CMOS year: 2000
> Enabling non-boot CPUs ...
> SMP alternatives: switching to SMP code
> Booting processor 1/2 APIC 0x1
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 512K (64 bytes/line)
> Unable to handle kernel paging request at 806c64d4 RIP: 
>  [] identify_cpu+0x2ac/0x5a1

Hmm. That's really early in the CPU bring up. The only change in this
area is the C1E patch. Can you decode the exact source line, where it is
failing ?

tglx





Re: __kernel_vsyscall () hangs in SIGCHLD handler

2007-09-26 Thread John Z. Bohach
On Wednesday 26 September 2007 10:03:33 Ulrich Drepper wrote:
> On 9/26/07, John Z. Bohach <[EMAIL PROTECTED]> wrote:
> > Is there some reason that syslog() sleeps in __kernel_vsyscall()
> > when invoked from a signal handler?
>
> Only very few functions are allowed to be called from signal
> handlers. This is clearly spelled out in the POSIX spec.  Section XSH
> 2.4.3 lists the allowed functions.  syslog of course is not on it.

Thank you for this information...also thanks to Giacomo who answered me 
offline...

--john
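
For reference, the usual way to stay within the list of allowed
functions is to do nothing in the handler except set a flag, and to call
syslog() later from normal program flow.  A minimal sketch, assuming a
single-threaded program; the function names are made up for
illustration, and signal() is used only to keep the example short
(sigaction() gives better-defined behaviour):

```c
#include <signal.h>
#include <syslog.h>

/* Set from the handler; writes to a sig_atomic_t are safe there. */
static volatile sig_atomic_t sigchld_pending;

static void on_sigchld(int sig)
{
	(void)sig;
	sigchld_pending = 1;	/* no syslog(), printf() or malloc() here */
}

/* Install the handler.  Returns 0 on success, -1 on failure. */
static int install_sigchld_handler(void)
{
	return signal(SIGCHLD, on_sigchld) == SIG_ERR ? -1 : 0;
}

/*
 * Called from normal program flow (for example the main loop), where
 * syslog() is allowed.  Returns 1 if a signal was pending, else 0.
 */
static int log_pending_sigchld(void)
{
	if (!sigchld_pending)
		return 0;
	sigchld_pending = 0;
	syslog(LOG_INFO, "child state changed");
	return 1;
}
```

The main loop would call log_pending_sigchld() after each blocking call
returns, so the log message is written outside signal context.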


Re: [PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread Randy Dunlap
On Wed, 26 Sep 2007 11:22:29 -0700 David J. Wilder wrote:

> These patches provide a kernel tracing interface called "trace".
> 
> (update) Moved the sample code to the new samples/ subdir
> 
> The motivation for "trace" is to:
> - Provide a simple set of tracing primitives that use the high
>   performance and low overhead of relayfs to pass trace data from
>   kernel to user space.
> - Provide a common user interface for managing kernel traces.
> - Allow for binary as well as ascii trace data.
> - Incorporate features from the systemtap runtime that are
>   useful to others.
> 
> Patches are against 2.6.23-rc6-mm1
> 
> Summary of patches:
> [patch 1/3]  Trace code and documentation
> [patch 2/3]  Relay Reset Consumed
> [patch 3/3]  Trace sample
> 
> Note: Patches 1/3 and 2/3 must be applied together.

Patch 2 provides an interface that patch 1 needs, correct?
So yes, patches 1 & 2 need to be applied together (merged),
or their order could be reversed, yes?  Can't the Relay patch
be merged standalone without breaking anything?


> Note: The following patches must be applied with 3/3.
> [patch 3/5] Add samples subdir
>   http://lkml.org/lkml/2007/9/25/157
> [patch 4/5] Linux Kernel Markers - Samples
>   http://lkml.org/lkml/2007/9/25/166


---
~Randy
Phaedrus says that Quality is about caring.


[PATCH 1/1] nmi_watchdog: x86_64 count timer and hpet like i386

2007-09-26 Thread David Bahi
This modifies nmi_watchdog_tick behavior for x86_64 arch to 
consider both timer and pit/hpet IRQs just as the i386 arch does.

Without this fix a kernel crash occurs very early in the boot 
process if nmi_watchdog is on.

Signed-off-by: David Bahi <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/nmi.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c
===
--- linux-2.6.22.8-rt9_1267.orig/arch/x86_64/kernel/nmi.c
+++ linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c
@@ -369,8 +369,6 @@ int notrace __kprobes nmi_watchdog_tick(
touched = 1;
}
 
-   sum = read_pda(apic_timer_irqs);
-
if (__get_cpu_var(nmi_touch)) {
__get_cpu_var(nmi_touch) = 0;
touched = 1;
@@ -386,6 +384,12 @@ int notrace __kprobes nmi_watchdog_tick(
cpu_clear(cpu, backtrace_mask);
}
 
+   /*
+* Take the local apic timer and PIT/HPET into account. We don't
+* know which one is active, when we have highres/dyntick on
+*/
+   sum = read_pda(apic_timer_irqs) + kstat_cpu(cpu).irqs[0];
+
 #ifdef CONFIG_X86_MCE
/* Could check oops_in_progress here too, but it's safer
   not too */
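
The effect of summing both counters can be shown with a small userspace
model of the watchdog check.  This is a sketch under stated assumptions,
not the kernel's implementation: struct wd_state, wd_tick() and the
threshold parameter are hypothetical, and the real code keeps this state
in per-CPU variables.

```c
/* Hypothetical per-CPU watchdog state. */
struct wd_state {
	unsigned int last_irq_sum;
	unsigned int alert_counter;
};

/*
 * One watchdog tick.  'apic_timer_irqs' and 'irq0_count' are the two
 * counters the patch adds together.  Returns 1 once the sum has not
 * moved for 'threshold' consecutive ticks, otherwise 0.
 */
static int wd_tick(struct wd_state *wd, unsigned int apic_timer_irqs,
		   unsigned int irq0_count, unsigned int threshold)
{
	unsigned int sum = apic_timer_irqs + irq0_count;

	if (sum == wd->last_irq_sum) {
		if (++wd->alert_counter >= threshold)
			return 1;	/* no timer progress: looks stuck */
	} else {
		wd->last_irq_sum = sum;
		wd->alert_counter = 0;
	}
	return 0;
}
```

If only the local APIC timer count were used, a CPU whose ticks arrive
through IRQ 0 would show a constant sum and be reported as stuck even
though it is making progress.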






Re: [patch 3/3] Trace sample

2007-09-26 Thread Randy Dunlap
On Wed, 26 Sep 2007 11:22:43 -0700 David J. Wilder wrote:

> Trace example - Adds the trace example to samples/
> 
> Signed-off-by: David Wilder <[EMAIL PROTECTED]>

Acked-by: Randy Dunlap <[EMAIL PROTECTED]>


> ---
>  samples/Kconfig|6 ++
>  samples/Makefile   |1 +
>  samples/trace/Makefile |4 +
>  samples/trace/fork_trace.c |  132 
> 
>  4 files changed, 143 insertions(+), 0 deletions(-)
> 
> diff --git a/samples/Kconfig b/samples/Kconfig
> index 57bb223..e11c806 100644
> --- a/samples/Kconfig
> +++ b/samples/Kconfig
> @@ -13,4 +13,10 @@ config SAMPLE_MARKERS
>   help
> This build markers example modules.
>  
> +config SAMPLE_TRACE
> + tristate "Build trace example -- loadable modules only"
> + depends on TRACE && m
> + help
> +   This builds a trace example module.
> +
>  endif # SAMPLES
> diff --git a/samples/Makefile b/samples/Makefile
> index 5a4f0b6..8f6d05b 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -1,3 +1,4 @@
>  # Makefile for Linux samples code
>  
>  obj-$(CONFIG_SAMPLES)+= markers/
> +obj-$(CONFIG_SAMPLES)+= trace/
> diff --git a/samples/trace/Makefile b/samples/trace/Makefile
> new file mode 100644
> index 000..a2da8af
> --- /dev/null
> +++ b/samples/trace/Makefile
> @@ -0,0 +1,4 @@
> +# builds the trace example kernel modules;
> +# then to use (as root):  insmod 
> +
> +obj-$(CONFIG_SAMPLE_TRACE) := fork_trace.o
> diff --git a/samples/trace/fork_trace.c b/samples/trace/fork_trace.c
> new file mode 100644
> index 000..71c04c7
> --- /dev/null
> +++ b/samples/trace/fork_trace.c
> @@ -0,0 +1,132 @@
> +/*
> + * An example of using trace in a kprobes module
> + *
> + * Copyright (C) 2007 IBM Inc.
> + *
> + * David Wilder <[EMAIL PROTECTED]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> + *
> + * ---
> + * This module creates a trace channel and places a kprobe
> + * on the function do_fork(). The value of current->pid is written to
> + * the trace channel each time the kprobe is hit.
> + *
> + * How to run the example:
> + * $ mount -t debugfs /debug
> + * $ insmod fork_trace.ko
> + *
> + * To view the data produced by the module:
> + * $ cat /debug/trace_example/do_fork/trace0
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define USE_GLOBAL_BUFFERS 1
> +#define USE_FLIGHT 1
> +
> +#define PROBE_POINT "do_fork"
> +
> +static struct kprobe kp;
> +static struct trace_info *kprobes_trace;
> +
> +#ifdef USE_GLOBAL_BUFFERS
> +static DEFINE_SPINLOCK(trace_lock);
> +#endif
> +
> +/*
> + * Send formatted trace data to trace channel.
> + * @note Preemption must be disabled to use this.
> + */
> +static void trace_printf(struct trace_info *trace, const char *format, ...)
> +{
> + va_list ap, aq;
> + char *record;
> + unsigned long flags;
> + int len;
> +
> + if (!trace)
> + return;
> +
> +#ifdef USE_GLOBAL_BUFFERS
> + spin_lock_irqsave(&trace_lock, flags);
> +#endif
> + if (trace_running(trace)) {
> + va_start(ap, format);
> + va_copy(aq, ap);
> + len = vsnprintf(NULL, 0, format, aq);
> + va_end(aq);
> + record = relay_reserve(trace->rchan, ++len);
> + if (record)
> + vsnprintf(record, len, format, ap);
> + va_end(ap);
> + }
> +#ifdef USE_GLOBAL_BUFFERS
> + spin_unlock_irqrestore(&trace_lock, flags);
> +#endif
> +}
> +
> +static int handler_pre(struct kprobe *p, struct pt_regs *regs)
> +{
> + rcu_read_lock();
> + trace_printf(kprobes_trace, "%d\n", current->pid);
> + rcu_read_unlock();
> + return 0;
> +}
> +
> +int init_module(void)
> +{
> + int ret;
> + u32 flags = 0;
> +
> +#ifdef USE_GLOBAL_BUFFERS
> + flags |= TRACE_GLOBAL_CHANNEL;
> +#endif
> +
> +#ifdef USE_FLIGHT
> + flags |= TRACE_FLIGHT_CHANNEL;
> +#endif
> +
> + /* setup the trace */
> + kprobes_trace = trace_setup("trace_example", PROBE_POINT,
> +  1024, 8, flags);
> + if (IS_ERR(kprobes_trace))
> + return PTR_ERR(kprobes_trace);
> +
> + trace_start(kprobes_trace);
> +
> + /* setup the kprobe */
> + kp.pre_handler = handler_pre;
> + kp.post_handler = NULL;
> + kp.fault_handler = 
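
The trace_printf() helper in the sample formats each record in two
passes: one vsnprintf() call with a NULL buffer to measure the length,
then a second call into a buffer of exactly that size.  A userspace
sketch of the same pattern follows.  It is an illustration only:
format_record() is a made-up name and malloc() stands in for
relay_reserve().

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into an exactly sized buffer.  malloc() replaces the relay
 * buffer reservation, which is an assumption of this sketch.
 * Returns NULL on failure; the caller frees the result.
 */
static char *format_record(const char *format, ...)
{
	va_list ap, aq;
	char *record;
	int len;

	va_start(ap, format);
	va_copy(aq, ap);	/* the argument list is consumed twice */
	len = vsnprintf(NULL, 0, format, aq);	/* pass 1: measure */
	va_end(aq);

	if (len < 0) {
		va_end(ap);
		return NULL;
	}
	len++;			/* room for the trailing NUL */
	record = malloc((size_t)len);
	if (record)
		vsnprintf(record, (size_t)len, format, ap);	/* pass 2 */
	va_end(ap);
	return record;
}
```

The va_copy() is needed because a va_list may not be reused after
vsnprintf() has walked it.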

Re: [PATCH 2/6] LBS: fix uninitialized swapper_space

2007-09-26 Thread Christoph Lameter
On Wed, 26 Sep 2007, Hugh Dickins wrote:

> Probably better, yes.  In -mm Peter is doing an #ifdef CONFIG_SWAP
> bdi_init() on swapper_space.  Would make sense to do both together,
> perhaps move them to a swapper_space_init() in swap_state.c, saving
> his #ifdef too.  I suggest leave such cleanups until one or the
> other is mainlined.

Ok. I have updated the largeblock git tree with your patches and a new 
revision of the mmap patches. Still working on it. Fallback in the block 
layer is not yet working. I probably need to look at Nick's patches a bit.



Re: [PATCH] Patches for tiny 386 kernels, again. Linux kernel 2.6.22.7

2007-09-26 Thread Jonathan Campbell

Here is the DMI patch again, written against linux-2.6.23-rc8,
with some of the #ifdef CONFIG_DMI's removed and moved
to include/linux/dmi.h. Putting them there in the way I've done
ensures that you don't have to put #ifdef CONFIG_DMI
around each dmi_check_machine() and that you don't
have to apply little patches to so many device drivers.
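
The header-stub approach described above can be sketched as follows. This
is a minimal user-space illustration under assumed names, not the contents
of the include/linux/dmi.h change itself: FEATURE_QUIRKS, struct quirk_id,
quirk_check() and apply_quirks() are hypothetical stand-ins for CONFIG_DMI
and the DMI lookup helpers.

```c
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_DMI; leave undefined to build the stub. */
/* #define FEATURE_QUIRKS 1 */

struct quirk_id {
	const char *ident;	/* NULL terminates the table */
};

#ifdef FEATURE_QUIRKS
/* Real lookup, provided elsewhere when the feature is configured in. */
int quirk_check(const struct quirk_id *table);
#else
/* Stub: callers compile unchanged and always see "no match". */
static inline int quirk_check(const struct quirk_id *table)
{
	(void)table;
	return 0;
}
#endif

static const struct quirk_id example_table[] = {
	{ "Example Board" },
	{ NULL }
};

/* A caller needs no #ifdef of its own. */
static int apply_quirks(void)
{
	return quirk_check(example_table) ? 1 : 0;
}
```

With the feature left out, the stub returns 0, so apply_quirks() reports
that no quirk was applied. Only the header carries the conditional.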

diff -u -r linux-2.6.23-rc8-old/arch/i386/Kconfig 
linux-2.6.23-rc8/arch/i386/Kconfig
--- linux-2.6.23-rc8-old/arch/i386/Kconfig2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/Kconfig2007-09-26 00:01:48.0 
+

@@ -83,10 +83,6 @@
bool
default y

-config DMI
-bool
-default y
-
source "init/Kconfig"

menu "Processor type and features"
diff -u -r linux-2.6.23-rc8-old/arch/i386/kernel/acpi/boot.c 
linux-2.6.23-rc8/arch/i386/kernel/acpi/boot.c
--- linux-2.6.23-rc8-old/arch/i386/kernel/acpi/boot.c2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/kernel/acpi/boot.c2007-09-26 
00:40:42.0 +

@@ -869,7 +869,7 @@
return;
}

-#ifdef __i386__
+#if defined(__i386__) && defined(CONFIG_DMI)

static int __init disable_acpi_irq(struct dmi_system_id *d)
{
@@ -1097,8 +1097,7 @@
 },
{}
};
-
-#endif/* __i386__ */
+#endif /* CONFIG_DMI && __i386__ */

/*
 * acpi_boot_table_init() and acpi_boot_init()
diff -u -r linux-2.6.23-rc8-old/arch/i386/kernel/acpi/sleep.c 
linux-2.6.23-rc8/arch/i386/kernel/acpi/sleep.c
--- linux-2.6.23-rc8-old/arch/i386/kernel/acpi/sleep.c2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/kernel/acpi/sleep.c2007-09-25 
23:59:14.0 +

@@ -86,6 +86,7 @@
return 0;
}

+#ifdef CONFIG_DMI
static __initdata struct dmi_system_id acpisleep_dmi_table[] = {
{/* Reset video mode after returning from ACPI S3 sleep */
 .callback = reset_videomode_after_s3,
@@ -104,3 +105,5 @@
}

core_initcall(acpisleep_dmi_init);
+#endif
+
diff -u -r 
linux-2.6.23-rc8-old/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c 
linux-2.6.23-rc8/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
--- linux-2.6.23-rc8-old/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
2007-09-21 22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
2007-09-26 00:59:26.0 +

@@ -535,7 +535,7 @@
return 0;
}

-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && defined(CONFIG_DMI)
/*
 * Some BIOSes do SW_ANY coordination internally, either set it up in hw
 * or do it in BIOS firmware and won't inform about it to OS. If not
@@ -562,7 +562,9 @@
},
{ }
};
-#endif
+#else
+#  define bios_with_sw_any_bug 0
+#endif /* CONFIG_SMP && CONFIG_DMI */

static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
diff -u -r linux-2.6.23-rc8-old/arch/i386/kernel/reboot.c 
linux-2.6.23-rc8/arch/i386/kernel/reboot.c
--- linux-2.6.23-rc8-old/arch/i386/kernel/reboot.c2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/kernel/reboot.c2007-09-26 
01:00:08.0 +

@@ -71,6 +71,7 @@

__setup("reboot=", reboot_setup);

+#ifdef CONFIG_DMI
/*
 * Reboot options and system auto-detection code provided by
 * Dell Inc. so their systems "just work". :-)
@@ -131,6 +132,7 @@
},
{ }
};
+#endif /* CONFIG_DMI */

static int __init reboot_init(void)
{
diff -u -r linux-2.6.23-rc8-old/arch/i386/kernel/tsc.c 
linux-2.6.23-rc8/arch/i386/kernel/tsc.c
--- linux-2.6.23-rc8-old/arch/i386/kernel/tsc.c2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/kernel/tsc.c2007-09-26 
00:39:52.0 +

@@ -290,6 +290,7 @@
}
EXPORT_SYMBOL_GPL(mark_tsc_unstable);

+#ifdef CONFIG_DMI
static int __init dmi_mark_tsc_unstable(struct dmi_system_id *d)
{
printk(KERN_NOTICE "%s detected: marking TSC unstable.\n",
@@ -310,6 +311,7 @@
 },
 {}
};
+#endif

/*
 * Make an educated guess if the TSC is trustworthy and synchronized
diff -u -r linux-2.6.23-rc8-old/arch/i386/mach-generic/bigsmp.c 
linux-2.6.23-rc8/arch/i386/mach-generic/bigsmp.c
--- linux-2.6.23-rc8-old/arch/i386/mach-generic/bigsmp.c2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/mach-generic/bigsmp.c2007-09-26 
00:57:13.0 +

@@ -21,6 +21,7 @@

static int dmi_bigsmp; /* can be set by dmi scanners */

+#ifdef CONFIG_DMI
static int hp_ht_bigsmp(struct dmi_system_id *d)
{
#ifdef CONFIG_X86_GENERICARCH
@@ -30,7 +31,6 @@
return 0;
}

-
static struct dmi_system_id bigsmp_dmi_table[] = {
{ hp_ht_bigsmp, "HP ProLiant DL760 G2", {
DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
@@ -43,7 +43,7 @@
 }},
 { }
};
-
+#endif /* CONFIG_DMI */

static int probe_bigsmp(void)
{
diff -u -r linux-2.6.23-rc8-old/arch/i386/pci/fixup.c 
linux-2.6.23-rc8/arch/i386/pci/fixup.c
--- linux-2.6.23-rc8-old/arch/i386/pci/fixup.c2007-09-21 
22:38:23.0 +
+++ linux-2.6.23-rc8/arch/i386/pci/fixup.c2007-09-26 
01:17:23.0 +

@@ -367,6 +367,7 @@
 */
static u16 toshiba_line_size;


Re: [patch 1/3] Trace code and documentation

2007-09-26 Thread Randy Dunlap
On Wed, 26 Sep 2007 11:22:35 -0700 David J. Wilder wrote:

> diff --git a/Documentation/trace.txt b/Documentation/trace.txt
> new file mode 100644
> index 000..0e42fb8
> --- /dev/null
> +++ b/Documentation/trace.txt
> @@ -0,0 +1,160 @@
> +Trace User Interface
> +===
> +When a trace channel is created and started, the following
> +directories and files are created in the root of the mounted debugfs.
> +
> +/debug (root of the debugfs)
> + /
> + /
> + trace0...traceN-1  Per-CPU trace data, one
> +file per CPU.
> +
> + state  Start or stop tracing by
> +writing the strings
> +"start" or "stop" to this
> +file. Read the file to get the
> +current state.
> +
> + droppedThe number of records dropped
> +due to a full-buffer condition,
> +for non-TRACE_FLIGHT_CHANNELs
> +only.
> +
> + rewind Trigger a rewind by writing
> +to this file.  i.e. start
> +next read at the beginning
> +again. Only available for
> +TRACE_FLIGHT_CHANNELS.
> +
> +
> + nr_sub Number of sub-buffers
> +in the channel.
> +
> + sub_size   Size of sub-buffers in
> +the channel.
> +
> +Trace data is gathered from the trace[0...N] files using one of the
> +available interfaces provided by relay.

For your next patchset (whenever that is), above should be
   trace[0...N-1] files

> +When using the read(2) interface, as data is read it is marked as
> +consumed by the relay subsystem.  Therefore, subsequent reads will
> +only return unconsumed data.


---
~Randy
Phaedrus says that Quality is about caring.


Re: sys_chroot+sys_fchdir Fix

2007-09-26 Thread Al Viro
On Wed, Sep 26, 2007 at 08:04:14PM +0930, David Newall wrote:
> Al Viro wrote:
> >Oh, for fsck sake...  Folks, it's standard-required behaviour.  Ability
> >to chroot() implies the ability to break out of it.  Could we please
> >add that (along with reference to SuS) to l-k FAQ and be done with that
> >nonsense?
> 
> I'm pretty confident that it's only standard behavior for Linux.  Every 
> other unix says it's not allowed.

OK, the possibilities are
* you've discovered a bug in all Unices (BTW, even FreeBSD *does*
allow to break out of some chroots in that fashion; RTFS and you'll see -
just pay attention to setting fdp->fd_jdir logics in kern/vfs_syscalls.c:
change_root(); it sets jail boundary on _first_ chroot and if you've got
nested chroots, you can leave them just fine by use of SCM_RIGHTS to hold
directory descriptor).  All hail David, nevermind that this behaviour had
been described in Unix FAQs since _way_ back.
* you've misunderstood the purpose of chroot(), the fact that
behaviour in question is at the very least extremely common on Unix and
the fact that any code relying on root-proof chroot(2) is broken and needs
to be fixed, simply because chroot is _not_ root-proof on (at least) almost
all systems.

Note that the last statement applies in both cases; it's simply reality.
Insisting that behaviour known for decades is a bug since it contradicts
your rather convoluted reading of the standards...  Looks rather silly,
IMO, but that has zero practical consequences anyway.  Userland code can't
rely on root-proof chroot(2), period.


Re: [PATCH 10/25] Unionfs: add un/likely conditionals on copyup ops

2007-09-26 Thread Adrian Bunk
On Wed, Sep 26, 2007 at 09:40:20AM -0400, Erez Zadok wrote:
>...
> Also, Auke, if indeed compilers are [sic] likely to do better than
> programmers adding un/likely wrappers, then why do we still support that in
> the kernel?  (Working for a company that produces high-quality compilers, you
> may know the answer better.)
> 
> Personally I'm not too fond of what those wrappers do to the code: they make it
> a bit harder to read the code (yet another extra set of parentheses); and
> they cause the code to be indented further to the right, which you sometimes
> have to split to multiple lines to avoid going over 80 chars.

There are some places in generic code where it makes sense, e.g.:
  #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while(0)

If you run into a BUG() it's anyway game over.

And there are some rare hotpaths in the kernel where it might make 
sense, and many other places where the likely/unlikely usage that might 
be present doesn't make sense.

Unless you know you need it you simply shouldn't use likely/unlikely.
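
For readers unfamiliar with the wrappers, here is a minimal user-space
sketch. The two macros have the same shape as the kernel's definitions and
rely on the GCC/Clang builtin __builtin_expect(); sum_bytes() is a
hypothetical helper invented for the illustration, with the error check
marked as the cold path in the same spirit as BUG_ON().

```c
#include <stddef.h>

/* Same shape as the kernel's definitions; requires GCC or Clang. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical helper: sum a buffer, treating a NULL pointer as the rare case. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
	long sum = 0;
	size_t i;

	if (unlikely(buf == NULL))	/* error path, expected to be cold */
		return -1;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}
```

The hint changes only how the compiler lays out the branches, never the
result, which is why a wrong annotation is easy to miss.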

> Cheers,
> Erez.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH 2/6] LBS: fix uninitialized swapper_space

2007-09-26 Thread Hugh Dickins
On Mon, 24 Sep 2007, Christoph Lameter wrote:
> On Fri, 21 Sep 2007, Hugh Dickins wrote:
> 
> > Swapping crashed immediately: must initialize new fields of swapper_space.
> 
> Thanks for finding that. It may be better though to use the new
> mapping_setup() function instead? That way there is no #ifdef.

Probably better, yes.  In -mm Peter is doing an #ifdef CONFIG_SWAP
bdi_init() on swapper_space.  Would make sense to do both together,
perhaps move them to a swapper_space_init() in swap_state.c, saving
his #ifdef too.  I suggest leave such cleanups until one or the
other is mainlined.

Hugh


forcedeth question

2007-09-26 Thread roel
in file ./drivers/net/forcedeth.c
line 2142 of current git I have a
for (i=0;i<=np->register_size;i+= 32) {
   ^
shouldn't this be a '<'

In the same file on line 4015:
for (i = 0;i <= np->register_size/sizeof(u32); i++)

shouldn't the "<=" be a '<'?

Roel
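
A small user-space sketch of the difference the two bounds make, assuming
register_size is a size in bytes and the loop indexes 32-bit words. The
function names are hypothetical, and this does not by itself establish
what the driver intends.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words visited with an inclusive bound (i <= n). */
static size_t words_visited_inclusive(size_t register_size)
{
	size_t i, count = 0;

	for (i = 0; i <= register_size / sizeof(uint32_t); i++)
		count++;
	return count;
}

/* Number of 32-bit words visited with an exclusive bound (i < n). */
static size_t words_visited_exclusive(size_t register_size)
{
	size_t i, count = 0;

	for (i = 0; i < register_size / sizeof(uint32_t); i++)
		count++;
	return count;
}
```

For a 16-byte register block the inclusive loop visits five words and the
exclusive loop four, so the inclusive form touches one word past the end
if register_size is the exact size of the block.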


Re: [PATCH] Since we have counters in __u64 format we have to print them with %llu macros.

2007-09-26 Thread H. Peter Anvin
Balbir Singh wrote:
> Andreas Schwab wrote:
>> Maxim Uvarov <[EMAIL PROTECTED]> writes:
>>
>>> diff --git a/Documentation/accounting/getdelays.c 
>>> b/Documentation/accounting/getdelays.c
>>> index cbee3a2..73924df 100644
>>> --- a/Documentation/accounting/getdelays.c
>>> +++ b/Documentation/accounting/getdelays.c
>>> @@ -208,7 +208,7 @@ void print_delayacct(struct taskstats *t)
>>>  void task_context_switch_counts(struct taskstats *t)
>>>  {
>>> printf("\n\nTask   %15s%15s\n"
>>> -  "   %15lu%15lu\n",
>>> +  "   %15llu%15llu\n",
>>>"voluntary", "nonvoluntary",
>>>t->nvcsw, t->nivcsw);
>> __u64 is not always long long.
> 
> What is the maximum size of long long across all architectures?
> How does one format __u64 for printing?
> 

In user space, use the macro PRIu64 (or PRIx64 etc) from <inttypes.h>.

-hpa
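
A minimal sketch of that suggestion, assuming a C99 user-space build.
PRIu64 comes from <inttypes.h> and matches uint64_t, so the arguments are
cast to uint64_t in case the source type is a different 64-bit typedef.
format_counts() is a hypothetical helper, not part of getdelays.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format two 64-bit counters in 15-character columns.
 * The casts make the arguments match PRIu64 regardless of whether the
 * caller's 64-bit type is unsigned long or unsigned long long.
 */
static int format_counts(char *buf, size_t size,
			 unsigned long long nvcsw, unsigned long long nivcsw)
{
	return snprintf(buf, size, "%15" PRIu64 "%15" PRIu64 "\n",
			(uint64_t)nvcsw, (uint64_t)nivcsw);
}
```

The format string is assembled by string-literal concatenation, so the
same source compiles cleanly on both 32-bit and 64-bit targets.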


Proposed 2.6 Patch for AMD MIPS Alchemy au1550 I2C interface

2007-09-26 Thread Chris David
Hello, 

Please CC me on replies. 


I have made a trivial patch to fix a problem on the AMD MIPS Alchemy au1550
I2C interface.  The PSC (programmable serial controller) seems to 'hang' when
I send only an 'address' byte on the I2C bus.  The patch essentially uses
the PSC_SMBSTAT register's TE (transmit FIFO empty) bit to check when the 
transmit FIFO is empty, instead of using the PSC_SMBEVNT register's TU  
(transmit underflow) bit.  Using the TE bit fixed the hang problem. 


I tested this on kernel 2.6.16, and confirmed the patch updates the 2.6.22
kernel correctly.  If someone else can test this, that would be great. 

Dan Malek is the author of the file in question.  I would be more than happy 
to provide any additional information about this patch to Dan or anyone 
else.  Please email me privately. 

I am a newbie, but I did read part of the FAQ, and used my best judgement.  
Kindly let me know if my communication could be improved.  And please CC me 
on replies. 

Thank you, 

-Chris David 



diff -Naur linux-2.6.16-orig/drivers/i2c/busses/i2c-au1550.c 
linux-2.6.16/drivers/i2c/busses/i2c-au1550.c
--- linux-2.6.16-orig/drivers/i2c/busses/i2c-au1550.c   2007-09-26 
08:38:45.0 -0700
+++ linux-2.6.16/drivers/i2c/busses/i2c-au1550.c2007-09-26 
08:43:43.0 -0700
@@ -61,17 +61,14 @@
 
sp = (volatile psc_smb_t *)(adap->psc_base);
 
-   /* Wait for Tx FIFO Underflow.
+   /* Wait for Tx Buffer Empty
*/
for (i = 0; i < adap->xfer_timeout; i++) {
-   stat = sp->psc_smbevnt;
+   stat = sp->psc_smbstat;
au_sync();
-   if ((stat & PSC_SMBEVNT_TU) != 0) {
-   /* Clear it.  */
-   sp->psc_smbevnt = PSC_SMBEVNT_TU;
-   au_sync();
+   if ((stat & PSC_SMBSTAT_TE) != 0)
return 0;
-   }
+
udelay(1);
}
 


Re: [PATCH 3/3] CRED: Move the effective capabilities into the cred struct

2007-09-26 Thread Al Viro
On Wed, Sep 19, 2007 at 09:11:26PM -0700, Andrew Morgan wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> David Howells wrote:
> > Move the effective capabilities mask from the task struct into the 
> > credentials
> > record.
> > 
> > Note that the effective capabilities mask in the cred struct shadows that in
> > the task_struct because a thread can have its capabilities masks changed by
> > another thread.  The shadowing is performed by update_current_cred() which 
> > is
> > invoked on entry to any system call that might need it.
> 
> OOC If we were to simply drop support for one process changing the
> capabilities of another, would we need this patch?

Umm...  It would become simpler (which is a damn good thing - less PITA
with update_current_cred), but it would be still needed.

FWIW, dropping that support would be a Good Thing(tm), as far as I'm
concerned.  _Why_ do we want that, anyway, and how much userland code
is able to cope with that in sane way?


nmi_watchdog fix for x86_64 to be more like i386

2007-09-26 Thread David Bahi
Thanks to tglx and ghaskins for all the help in tracking down a very
early nmi_watchdog crash on certain x86_64 machines.

This modifies nmi_watchdog_tick behavior for 
x86_64 arch to consider both timer and hpet IRQs
just as the i386 arch does.

Signed-off-by: David Bahi <[EMAIL PROTECTED]>

---
 arch/x86_64/kernel/nmi.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c
===
--- linux-2.6.22.8-rt9_1267.orig/arch/x86_64/kernel/nmi.c
+++ linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c
@@ -369,8 +369,6 @@ int notrace __kprobes nmi_watchdog_tick(
touched = 1;
}
 
-   sum = read_pda(apic_timer_irqs);
-
if (__get_cpu_var(nmi_touch)) {
__get_cpu_var(nmi_touch) = 0;
touched = 1;
@@ -386,6 +384,12 @@ int notrace __kprobes nmi_watchdog_tick(
cpu_clear(cpu, backtrace_mask);
}
 
+   /*
+* Take the local apic timer and PIT/HPET into account. We don't
+* know which one is active, when we have highres/dyntick on
+*/
+   sum = read_pda(apic_timer_irqs) + kstat_cpu(cpu).irqs[0];
+
 #ifdef CONFIG_X86_MCE
/* Could check oops_in_progress here too, but it's safer
   not too */




[patch 3/3] Trace sample

2007-09-26 Thread David J. Wilder
Trace example - Adds the trace example to samples/

Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 samples/Kconfig|6 ++
 samples/Makefile   |1 +
 samples/trace/Makefile |4 +
 samples/trace/fork_trace.c |  132 
 4 files changed, 143 insertions(+), 0 deletions(-)

diff --git a/samples/Kconfig b/samples/Kconfig
index 57bb223..e11c806 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -13,4 +13,10 @@ config SAMPLE_MARKERS
help
  This build markers example modules.
 
+config SAMPLE_TRACE
+   tristate "Build trace example -- loadable modules only"
+   depends on TRACE && m
+   help
+ This builds a trace example module.
+
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 5a4f0b6..8f6d05b 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -1,3 +1,4 @@
 # Makefile for Linux samples code
 
 obj-$(CONFIG_SAMPLES)  += markers/
+obj-$(CONFIG_SAMPLES)  += trace/
diff --git a/samples/trace/Makefile b/samples/trace/Makefile
new file mode 100644
index 000..a2da8af
--- /dev/null
+++ b/samples/trace/Makefile
@@ -0,0 +1,4 @@
+# builds the trace example kernel modules;
+# then to use (as root):  insmod 
+
+obj-$(CONFIG_SAMPLE_TRACE) := fork_trace.o
diff --git a/samples/trace/fork_trace.c b/samples/trace/fork_trace.c
new file mode 100644
index 000..71c04c7
--- /dev/null
+++ b/samples/trace/fork_trace.c
@@ -0,0 +1,132 @@
+/*
+ * An example of using trace in a kprobes module
+ *
+ * Copyright (C) 2007 IBM Inc.
+ *
+ * David Wilder <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ * ---
+ * This module creates a trace channel and places a kprobe
+ * on the function do_fork(). The value of current->pid is written to
+ * the trace channel each time the kprobe is hit.
+ *
+ * How to run the example:
+ * $ mount -t debugfs /debug
+ * $ insmod fork_trace.ko
+ *
+ * To view the data produced by the module:
+ * $ cat /debug/trace_example/do_fork/trace0
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#define USE_GLOBAL_BUFFERS 1
+#define USE_FLIGHT 1
+
+#define PROBE_POINT "do_fork"
+
+static struct kprobe kp;
+static struct trace_info *kprobes_trace;
+
+#ifdef USE_GLOBAL_BUFFERS
+static DEFINE_SPINLOCK(trace_lock);
+#endif
+
+/*
+ * Send formatted trace data to trace channel.
+ * @note Preemption must be disabled to use this.
+ */
+static void trace_printf(struct trace_info *trace, const char *format, ...)
+{
+   va_list ap, aq;
+   char *record;
+   unsigned long flags;
+   int len;
+
+   if (!trace)
+   return;
+
+#ifdef USE_GLOBAL_BUFFERS
+   spin_lock_irqsave(&trace_lock, flags);
+#endif
+   if (trace_running(trace)) {
+   va_start(ap, format);
+   va_copy(aq, ap);
+   len = vsnprintf(NULL, 0, format, aq);
+   va_end(aq);
+   record = relay_reserve(trace->rchan, ++len);
+   if (record)
+   vsnprintf(record, len, format, ap);
+   va_end(ap);
+   }
+#ifdef USE_GLOBAL_BUFFERS
+   spin_unlock_irqrestore(&trace_lock, flags);
+#endif
+}
+
+static int handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+   rcu_read_lock();
+   trace_printf(kprobes_trace, "%d\n", current->pid);
+   rcu_read_unlock();
+   return 0;
+}
+
+int init_module(void)
+{
+   int ret;
+   u32 flags = 0;
+
+#ifdef USE_GLOBAL_BUFFERS
+   flags |= TRACE_GLOBAL_CHANNEL;
+#endif
+
+#ifdef USE_FLIGHT
+   flags |= TRACE_FLIGHT_CHANNEL;
+#endif
+
+   /* setup the trace */
+   kprobes_trace = trace_setup("trace_example", PROBE_POINT,
+1024, 8, flags);
+   if (IS_ERR(kprobes_trace))
+   return PTR_ERR(kprobes_trace);
+
+   trace_start(kprobes_trace);
+
+   /* setup the kprobe */
+   kp.pre_handler = handler_pre;
+   kp.post_handler = NULL;
+   kp.fault_handler = NULL;
+   kp.symbol_name = PROBE_POINT;
+   ret = register_kprobe(&kp);
+   if (ret) {
+   printk(KERN_ERR "fork_trace: register_kprobe failed\n");
+   return ret;
+   }
+   return 0;
+}
+
+void cleanup_module(void)
+{
+   unregister_kprobe(&kp);
+   trace_stop(kprobes_trace);
+   trace_cleanup(kprobes_trace);
+}
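
The two-pass formatting used by trace_printf() in the sample above
(measure with vsnprintf(NULL, 0, ...), reserve, then format for real) can
be tried in user space. In this sketch malloc() stands in for
relay_reserve(), and format_record() is a hypothetical name; it is an
illustration of the technique, not the module's code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of trace_printf(): measure the formatted
 * length first, reserve exactly that much space, then format into it.
 * malloc() stands in for relay_reserve().  The caller frees the result.
 */
static char *format_record(const char *format, ...)
{
	va_list ap, aq;
	char *record;
	int len;

	va_start(ap, format);
	va_copy(aq, ap);			/* first pass consumes aq, not ap */
	len = vsnprintf(NULL, 0, format, aq);	/* length without the NUL */
	va_end(aq);

	if (len < 0) {
		va_end(ap);
		return NULL;
	}

	record = malloc((size_t)len + 1);	/* +1 for the terminating NUL */
	if (record)
		vsnprintf(record, (size_t)len + 1, format, ap);
	va_end(ap);
	return record;
}
```

The va_copy() matters: a va_list may not be reused after vsnprintf() has
walked it, so the measuring pass gets its own copy.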

[patch 2/3] Relay Reset Consumed

2007-09-26 Thread David J. Wilder
This patch allows relay channels to be reset i.e. unconsumed.
Basically allows a 'rewind' function for flight-recorder tracing.

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/filesystems/relay.txt |   11 ++
 include/linux/relay.h   |1 +
 kernel/relay.c  |   58 ---
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/relay.txt 
b/Documentation/filesystems/relay.txt
index 18d23f9..d31113a 100644
--- a/Documentation/filesystems/relay.txt
+++ b/Documentation/filesystems/relay.txt
@@ -161,6 +161,7 @@ TBD(curr. line MT:/API/)
 relay_close(chan)
 relay_flush(chan)
 relay_reset(chan)
+relay_reset_consumed(chan)
 
   channel management typically called on instigation of userspace:
 
@@ -452,6 +453,16 @@ state without reallocating channel buffer memory or 
destroying
 existing mappings.  It should however only be called when it's safe to
 do so, i.e. when the channel isn't currently being written to.
 
+The read(2) implementation always 'consumes' the bytes read,
+i.e. those bytes won't be available again to subsequent reads.
+Certain applications may nonetheless wish to allow the 'consumed' data
+to be re-read; relay_reset_consumed() is provided for that purpose -
+it resets the internal consumed counters for all buffers in the
+channel.  For example, if a first set of reads 'drains' the channel,
+and then relay_reset_consumed() is called, a second set of reads will
+get the exact same data (assuming no new data was written between the
+first set of reads and the second).
+
 Finally, there are a couple of utility callbacks that can be used for
 different purposes.  buf_mapped() is called whenever a channel buffer
 is mmapped from user space and buf_unmapped() is called when it's
diff --git a/include/linux/relay.h b/include/linux/relay.h
index 6cd8c44..aca45fa 100644
--- a/include/linux/relay.h
+++ b/include/linux/relay.h
@@ -175,6 +175,7 @@ extern void relay_subbufs_consumed(struct rchan *chan,
   unsigned int cpu,
   size_t consumed);
 extern void relay_reset(struct rchan *chan);
+extern void relay_reset_consumed(struct rchan *chan);
 extern int relay_buf_full(struct rchan_buf *buf);
 
 extern size_t relay_switch_subbuf(struct rchan_buf *buf,
diff --git a/kernel/relay.c b/kernel/relay.c
index 61134eb..6b55eaa 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -383,6 +383,57 @@ void relay_reset(struct rchan *chan)
 }
 EXPORT_SYMBOL_GPL(relay_reset);
 
+/**
+ * __relay_reset_consumed - reset a channel buffer's consumed count
+ * @buf: the channel buffer
+ *
+ * See relay_reset_consumed for description of effect.
+ */
+static inline void __relay_reset_consumed(struct rchan_buf *buf)
+{
+   size_t n_subbufs = buf->chan->n_subbufs;
+   size_t produced = buf->subbufs_produced;
+   size_t consumed = buf->subbufs_consumed;
+
+   if (produced < n_subbufs)
+   buf->subbufs_consumed = 0;
+   else {
+   consumed = produced - n_subbufs;
+   if (buf->offset)
+   consumed++;
+   buf->subbufs_consumed = consumed;
+   }
+   buf->bytes_consumed = 0;
+}
+
+/**
+ * relay_reset_consumed - reset the channel's consumed counts
+ * @chan: the channel
+ *
+ * This has the effect of making all data previously read (and
+ * not overwritten by subsequent writes) from a channel available
+ * for reading again.
+ *
+ * NOTE: Care should be taken that the channel isn't actually
+ * being used by anything when this call is made.
+ */
+void relay_reset_consumed(struct rchan *chan)
+{
+   unsigned int i;
+   struct rchan_buf *prev = NULL;
+
+   if (!chan)
+   return;
+
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!chan->buf[i] || chan->buf[i] == prev)
+   break;
+   __relay_reset_consumed(chan->buf[i]);
+   prev = chan->buf[i];
+   }
+}
+EXPORT_SYMBOL_GPL(relay_reset_consumed);
+
 /*
  * relay_open_buf - create a new relay channel buffer
  *
@@ -845,11 +896,8 @@ static int relay_file_read_avail(struct rchan_buf *buf, 
size_t read_pos)
return 1;
}
 
-   if (unlikely(produced - consumed >= n_subbufs)) {
-   consumed = produced - n_subbufs + 1;
-   buf->subbufs_consumed = consumed;
-   buf->bytes_consumed = 0;
-   }
+   if (unlikely(produced - consumed >= n_subbufs))
+   __relay_reset_consumed(buf);
 
produced = (produced % n_subbufs) * subbuf_size + buf->offset;
consumed = (consumed % n_subbufs) * subbuf_size + buf->bytes_consumed;
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
old mode 100644
new mode 100755
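
The counter arithmetic in __relay_reset_consumed() above can be modelled
in user space to see what a rewind does. struct buf_model and
model_reset_consumed() are hypothetical names; the fields mirror only the
ones the patch reads or writes, and nothing here touches real relay
buffers.

```c
#include <stddef.h>

/* Hypothetical user-space model of the counters the patch touches. */
struct buf_model {
	size_t n_subbufs;		/* sub-buffers in the channel */
	size_t subbufs_produced;	/* sub-buffers filled so far */
	size_t subbufs_consumed;	/* sub-buffers already read */
	size_t bytes_consumed;		/* bytes read in the current sub-buffer */
	size_t offset;			/* write offset in the current sub-buffer */
};

/* Same arithmetic as the patch: rewind to the oldest data still present. */
static void model_reset_consumed(struct buf_model *buf)
{
	size_t n = buf->n_subbufs;
	size_t produced = buf->subbufs_produced;

	if (produced < n) {
		/* Nothing has been overwritten yet: rewind to the start. */
		buf->subbufs_consumed = 0;
	} else {
		size_t consumed = produced - n;

		/* A partly written sub-buffer has overwritten the oldest one. */
		if (buf->offset)
			consumed++;
		buf->subbufs_consumed = consumed;
	}
	buf->bytes_consumed = 0;
}
```

With 4 sub-buffers and 10 produced, the rewind lands on sub-buffer 6, or
7 if the writer has already started on the next one.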



[PATCH 0/3] A kernel tracing interface - (updated)

2007-09-26 Thread David J. Wilder
These patches provide a kernel tracing interface called "trace".

(update) Moved the sample code to the new samples/ subdir

The motivation for "trace" is to:
- Provide a simple set of tracing primitives that will utilize the high-
  performance and low-overhead of relayfs for passing trace data from
  kernel to user space.
- Provide a common user interface for managing kernel traces.
- Allow for binary as well as ascii trace data.
- Incorporate features from the systemtap runtime that are
  useful to others.

Patches are against 2.6.23-rc6-mm1

Summary of patches:
[patch 1/3]  Trace code and documentation
[patch 2/3]  Relay Reset Consumed
[patch 3/3]  Trace sample

Note: Patches 1/3 and 2/3 must be applied together.

Note: The following patches must be applied with 3/3.
[patch 3/5] Add samples subdir
http://lkml.org/lkml/2007/9/25/157
[patch 4/5] Linux Kernel Markers - Samples
http://lkml.org/lkml/2007/9/25/166

Signed-off-by: David Wilder <[EMAIL PROTECTED]>




[patch 1/3] Trace code and documentation

2007-09-26 Thread David J. Wilder
Trace - Provides tracing primitives

Signed-off-by: Tom Zanussi <[EMAIL PROTECTED]>
Signed-off-by: Martin Hunt <[EMAIL PROTECTED]>
Signed-off-by: David Wilder <[EMAIL PROTECTED]>
---
 Documentation/trace.txt |  160 ++
 include/linux/trace.h   |   99 +
 lib/Kconfig |9 +
 lib/Makefile|2 +
 lib/trace.c |  563 +++
 5 files changed, 833 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace.txt b/Documentation/trace.txt
new file mode 100644
index 000..0e42fb8
--- /dev/null
+++ b/Documentation/trace.txt
@@ -0,0 +1,160 @@
+Trace Setup and Control
+===
+In the kernel, the trace interface provides a simple mechanism for
+starting and managing data channels (traces) to user space.  The
+trace interface builds on the relay interface.  For a complete
+description of the relay interface, please see:
+Documentation/filesystems/relay.txt.
+
+The trace interface provides a single layer in a complete tracing
+application.  Trace provides a kernel API that can be used for the setup
+and control of tracing channels.  Users of trace must provide a data layer
+responsible for formatting and writing data into the trace channels.
+
+A layered approach to tracing
+=
+A complete kernel tracing application consists of a data provider and
+a data consumer.  Both provider and consumer contain three layers; each
+layer works in tandem with the corresponding layer in the opposite side.
+The layers are represented in the following diagram.
+
+Provider Data layer
+   Formats raw trace data and provides data-related service.
+   For example, adding timestamps used by consumer to sort data.
+
+Provider Control layer
+   Provided by the trace interface, this layer creates trace channels
+   and informs the data layer and consumer of the current state
+   of the trace channels.
+
+Provider Buffering layer
+   Provided by relay. This layer buffers data in the
+   kernel for consumption by the consumer's buffer
+   layer.
+
+Provider (in-kernel facility)
+-
+Consumer (user application)
+
+
+Consumer Buffer layer
+   Reads/consumes data from the provider's data buffers.
+
+Consumer Control layer
+   Communicates to the provider's control layer to control the state
+   of the trace channels.
+
+Consumer Data layer
+   Sorts and formats data as provided by the provider's data layer.
+
+The provider is coded as a kernel facility.  The consumer is coded as
+a user application.
+
+
+Trace - Features
+
+Trace exploits services and features provided by relay.  These features
+are:
+- The creation and destruction of relay channels.
+- Buffer management.  Overwrite or non-overwrite modes can be selected
+  as well as global or per-CPU buffering.
+
+Overwrite mode can be called "flight recorder mode".  Flight recorder
+mode is selected by setting the TRACE_FLIGHT_CHANNEL flag when
+creating trace channels.  In flight mode when a tracing buffer is
+full, the oldest records in the buffer will be discarded to make room
+as new records arrive. In the default non-overwrite mode, new records
+may be written only if the buffer has room.  In either case, to
+prevent data loss, a user space reader must keep the buffers
+drained. Trace provides a means to detect the number of records that
+have been dropped due to a buffer-full condition (non-overwrite mode
+only).
+
+When per-CPU buffers are used, relay creates one debugfs file for each
+running CPU.  The user-space consumer of the data is responsible for
+reading the per-CPU buffers and collating the records presumably using
+a time stamp or sequence number included in the trace records.  The
+use of global buffers eliminates this extra work of sequencing
+records; however the provider's data layer must hold a lock when
+writing records.  The lock prevents writers running on different CPUs
+from overwriting each other's data.  However, buffering may be slower
+because writes to the buffer are serialized. Global buffering is
+selected by setting the TRACE_GLOBAL_CHANNEL flag when creating trace
+channels.
+
+Trace User Interface
+===
+When a trace channel is created and started, the following
+directories and files are created in the root of the mounted debugfs.
+
+/debug (root of the debugfs)
+   /
+   /
+   trace0...traceN-1  Per-CPU trace data, one
+  file per CPU.
+
+   state  Start or stop tracing
+  by writing the strings
+  "start" or "stop" to this
+  file. Read the file to get the
+  current state.
+
+   
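
The overwrite ("flight recorder") and non-overwrite modes described in the
Trace - Features text above can be modeled in userspace. The following is
only a minimal sketch under stated assumptions: every name in it
(model_chan, model_write, model_read, MODEL_SLOTS) is hypothetical and is
not part of relay or of the trace patch, and it ignores per-CPU buffers
and locking entirely.

```c
#include <stddef.h>

#define MODEL_SLOTS 4   /* hypothetical, tiny buffer for illustration */

/* Hypothetical model of one trace buffer; not the relay data structure. */
struct model_chan {
	int slots[MODEL_SLOTS];
	size_t head;            /* next position to write */
	size_t count;           /* records currently stored */
	unsigned long dropped;  /* records rejected (non-overwrite mode only) */
	int flight;             /* nonzero: overwrite the oldest record when full */
};

/* Returns 0 if the record was stored, -1 if it was dropped. */
static int model_write(struct model_chan *c, int rec)
{
	if (c->count == MODEL_SLOTS) {
		if (!c->flight) {
			c->dropped++;   /* default mode: no room, record is lost */
			return -1;
		}
		c->count--;             /* flight mode: give up the oldest record */
	}
	c->slots[c->head] = rec;
	c->head = (c->head + 1) % MODEL_SLOTS;
	c->count++;
	return 0;
}

/* Returns 0 and stores the oldest record in *rec, or -1 if empty. */
static int model_read(struct model_chan *c, int *rec)
{
	size_t tail;

	if (c->count == 0)
		return -1;
	tail = (c->head + MODEL_SLOTS - c->count) % MODEL_SLOTS;
	*rec = c->slots[tail];
	c->count--;
	return 0;
}
```

Writing six records into a four-slot buffer without draining it leaves
records 1..4 and a drop count of 2 in the default mode, and records 3..6
with no drop count in flight mode, which matches the note above that
dropped-record detection applies to non-overwrite mode only.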

[PATCH] just rename call_rcu_bh instead of making it a macro

2007-09-26 Thread Steven Rostedt
It seems I found a box whose config passes call_rcu_bh as a function
pointer (see net/sctp/sm_make_chunk.c), so declaring call_rcu_bh as a
function-like macro isn't good enough.

This patch makes it just another name for call_rcu under rcupreempt.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

Index: linux-2.6.23-rc8-rt1/include/linux/rcupreempt.h
===
--- linux-2.6.23-rc8-rt1.orig/include/linux/rcupreempt.h
+++ linux-2.6.23-rc8-rt1/include/linux/rcupreempt.h
@@ -42,9 +42,14 @@
 #include 
 #include 
 
-#define rcu_qsctr_inc(cpu)
-#define rcu_bh_qsctr_inc(cpu)
-#define call_rcu_bh(head, rcu) call_rcu(head, rcu)
+#define rcu_qsctr_inc(cpu) do { } while (0)
+#define rcu_bh_qsctr_inc(cpu)  do { } while (0)
+/*
+ * Someone might want to pass call_rcu_bh as a function pointer.
+ * So this needs to just be a rename and not a macro function.
+ *  (no parentheses)
+ */
+#define call_rcu_bh call_rcu
 
 extern void __rcu_read_lock(void);
 extern void __rcu_read_unlock(void);
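
For readers wondering why dropping the parentheses matters, here is a
standalone sketch. It assumes a simplified one-argument stand-in
(model_call_rcu) rather than the real call_rcu() signature, and all
model_* names are hypothetical. An object-like macro is expanded wherever
the name appears, so it can be used as a function pointer; a function-like
macro is only expanded when followed by '(', so the bare name would be an
undeclared identifier.

```c
/* Hypothetical stand-in for call_rcu(); the real one takes an rcu_head
 * and a callback. */
static int model_pending;

static void model_call_rcu(int *head)
{
	*head += 1;
	model_pending++;
}

/* Object-like macro: a plain rename, usable as a call and as a pointer. */
#define model_call_rcu_bh model_call_rcu

/*
 * With the function-like form
 *     #define model_call_rcu_bh(head) model_call_rcu(head)
 * the return statement below would not compile, because the bare name
 * model_call_rcu_bh is never expanded.
 */
typedef void (*model_rcu_fn)(int *head);

static model_rcu_fn model_pick_callback(void)
{
	return model_call_rcu_bh;   /* expands to model_call_rcu */
}
```

Both the direct call model_call_rcu_bh(&head) and the pointer returned by
model_pick_callback() end up in the same function.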


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/7] Extended crashkernel command line

2007-09-26 Thread Bernhard Walle
* Oleg Verych <[EMAIL PROTECTED]> [2007-09-26 20:18]:
> 
> I was thinking about errors in YaST or typos in bootloader config, that
> may appear sometimes. And kernel must tolerate this kind of userspace
> input to be more reliable. But you know better, i just am waving hands.

Of course the kernel must be able to handle them, printing an error
message that the user can read and not crashing. But I don't expect
the kernel to then reserve crash memory by guessing values.



Thanks,
   Bernhard


Re: Linux 2.6.22.9

2007-09-26 Thread Greg Kroah-Hartman

diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware
index 4820366..6cb3080 100644
--- a/Documentation/dvb/get_dvb_firmware
+++ b/Documentation/dvb/get_dvb_firmware
@@ -56,7 +56,7 @@ syntax();
 
 sub sp8870 {
 my $sourcefile = "tt_Premium_217g.zip";
-my $url = "http://www.technotrend.de/new/217g/$sourcefile";
+my $url = "http://www.softwarepatch.pl/ccd06a4813cb827dbb0005071c71/$sourcefile";
 my $hash = "53970ec17a538945a6d8cb608a7b3899";
 my $outfile = "dvb-fe-sp8870.fw";
 my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
@@ -110,21 +110,21 @@ sub tda10045 {
 }
 
 sub tda10046 {
-my $sourcefile = "tt_budget_217g.zip";
-my $url = "http://www.technotrend.de/new/217g/$sourcefile";
-my $hash = "6a7e1e2f2644b162ff0502367553c72d";
-my $outfile = "dvb-fe-tda10046.fw";
-my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
+   my $sourcefile = "TT_PCI_2.19h_28_11_2006.zip";
+   my $url = "http://technotrend-online.com/download/software/219/$sourcefile";
+   my $hash = "6a7e1e2f2644b162ff0502367553c72d";
+   my $outfile = "dvb-fe-tda10046.fw";
+   my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
 
-checkstandard();
+   checkstandard();
 
-wgetfile($sourcefile, $url);
-unzip($sourcefile, $tmpdir);
-extract("$tmpdir/software/OEM/PCI/App/ttlcdacc.dll", 0x3f731, 24478, "$tmpdir/fwtmp");
-verify("$tmpdir/fwtmp", $hash);
-copy("$tmpdir/fwtmp", $outfile);
+   wgetfile($sourcefile, $url);
+   unzip($sourcefile, $tmpdir);
+   extract("$tmpdir/TT_PCI_2.19h_28_11_2006/software/OEM/PCI/App/ttlcdacc.dll", 0x65389, 24478, "$tmpdir/fwtmp");
+   verify("$tmpdir/fwtmp", $hash);
+   copy("$tmpdir/fwtmp", $outfile);
 
-$outfile;
+   $outfile;
 }
 
 sub tda10046lifeview {
diff --git a/Makefile b/Makefile
index dc7a45d..6f8adbb 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 22
-EXTRAVERSION = .8
+EXTRAVERSION = .9
 NAME = Holy Dancing Manatees, Batman!
 
 # *DOCUMENTATION*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6e2f035..87c474d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -83,7 +83,7 @@ void flush_fp_to_thread(struct task_struct *tsk)
 */
BUG_ON(tsk != current);
 #endif
-   giveup_fpu(current);
+   giveup_fpu(tsk);
}
preempt_enable();
}
@@ -143,7 +143,7 @@ void flush_altivec_to_thread(struct task_struct *tsk)
 #ifdef CONFIG_SMP
BUG_ON(tsk != current);
 #endif
-   giveup_altivec(current);
+   giveup_altivec(tsk);
}
preempt_enable();
}
@@ -182,7 +182,7 @@ void flush_spe_to_thread(struct task_struct *tsk)
 #ifdef CONFIG_SMP
BUG_ON(tsk != current);
 #endif
-   giveup_spe(current);
+   giveup_spe(tsk);
}
preempt_enable();
}
diff --git a/arch/sparc64/kernel/pci.c b/arch/sparc64/kernel/pci.c
index 3bc136a..154f10e 100644
--- a/arch/sparc64/kernel/pci.c
+++ b/arch/sparc64/kernel/pci.c
@@ -751,7 +751,7 @@ static void __devinit pci_of_scan_bus(struct pci_pbm_info *pbm,
 {
struct device_node *child;
const u32 *reg;
-   int reglen, devfn;
+   int reglen, devfn, prev_devfn;
struct pci_dev *dev;
 
if (ofpci_verbose)
@@ -759,14 +759,25 @@ static void __devinit pci_of_scan_bus(struct pci_pbm_info *pbm,
   node->full_name, bus->number);
 
child = NULL;
+   prev_devfn = -1;
while ((child = of_get_next_child(node, child)) != NULL) {
if (ofpci_verbose)
printk("  * %s\n", child->full_name);
reg = of_get_property(child, "reg", );
if (reg == NULL || reglen < 20)
continue;
+
devfn = (reg[0] >> 8) & 0xff;
 
+   /* This is a workaround for some device trees
+* which list PCI devices twice.  On the V100
+* for example, device number 3 is listed twice.
+* Once as "pm" and once again as "lomp".
+*/
+   if (devfn == prev_devfn)
+   continue;
+   prev_devfn = devfn;
+
/* create a new pci_dev for this device */
dev = of_create_pci_dev(pbm, child, bus, devfn, 0);
if (!dev)
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index c99b463..4369ff2 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1081,12 +1081,6 @@ void blk_queue_end_tag(request_queue_t *q, struct request *rq)
 */
return;
 
-   if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
-

Linux 2.6.22.9

2007-09-26 Thread Greg Kroah-Hartman
We (the -stable team) are announcing the release of the 2.6.22.9 kernel.
It fixes a number of reported bugs, and any user of the 2.6.22 series is
encouraged to upgrade.

I'll also be replying to this message with a copy of the patch between
2.6.22.8 and 2.6.22.9

The updated 2.6.22.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.22.y.git
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.22.y.git;a=summary

thanks,

greg k-h



 Documentation/dvb/get_dvb_firmware   |   26 -
 Makefile |2 
 arch/powerpc/kernel/process.c|6 +-
 arch/sparc64/kernel/pci.c|   13 
 block/ll_rw_blk.c|   13 ++--
 crypto/blkcipher.c   |   11 ++--
 drivers/acpi/tables/tbutils.c|   71 ++
 drivers/block/DAC960.c   |   18 --
 drivers/block/DAC960.h   |7 --
 drivers/firewire/fw-ohci.c   |   10 +--
 drivers/hwmon/lm78.c |2 
 drivers/hwmon/w83781d.c  |2 
 drivers/ieee1394/ieee1394_core.c |2 
 drivers/ieee1394/ohci1394.c  |4 -
 drivers/media/dvb/b2c2/flexcop-i2c.c |7 ++
 drivers/media/video/cx88/cx88-mpeg.c |2 
 drivers/media/video/ivtv/ivtv-ioctl.c|1 
 drivers/media/video/pwc/pwc-if.c |2 
 drivers/mtd/Makefile |2 
 drivers/mtd/mtdpart.c|4 -
 drivers/mtd/mtdsuper.c   |2 
 drivers/net/forcedeth.c  |2 
 drivers/net/wireless/bcm43xx/bcm43xx_main.c  |   28 +++---
 drivers/net/wireless/bcm43xx/bcm43xx_main.h  |2 
 drivers/net/wireless/bcm43xx/bcm43xx_sysfs.c |2 
 drivers/scsi/3w-9xxx.c   |   18 --
 drivers/usb/core/driver.c|2 
 fs/afs/mntpt.c   |2 
 fs/ext3/namei.c  |   73 ---
 fs/ext4/namei.c  |   73 ---
 fs/jffs2/fs.c|2 
 fs/locks.c   |2 
 fs/nfs/super.c   |2 
 fs/splice.c  |4 -
 include/linux/Kbuild |1 
 init/Kconfig |1 
 kernel/futex_compat.c|4 -
 kernel/signal.c  |   19 +++
 kernel/sys.c |3 -
 net/8021q/vlan_dev.c |   12 
 net/bridge/br_netfilter.c|   12 ++--
 net/core/datagram.c  |3 +
 net/core/pktgen.c|   10 +++
 net/decnet/dn_dev.c  |2 
 net/ipv4/ah4.c   |2 
 net/ipv4/devinet.c   |2 
 net/ipv4/inet_diag.c |4 +
 net/ipv4/tcp_input.c |   21 +--
 net/ipv6/addrconf.c  |2 
 net/ipv6/ip6_output.c|5 +
 net/ipv6/ndisc.c |2 
 net/ipv6/raw.c   |3 -
 net/sunrpc/svcsock.c |6 +-
 scripts/kconfig/conf.c   |   21 ---
 54 files changed, 405 insertions(+), 149 deletions(-)

Summary of changes from v2.6.22.8 to v2.6.22.9
==

Adam Radford (1):
  3w-9xxx: Fix dma mask setting

Adit Ranadive (1):
  Fix pktgen src_mac handling.

Alexey Dobriyan (1):
  nfs: fix oops re sysctls and V4 support

Andreas Arens (1):
  DVB: get_dvb_firmware: update script for new location of tda10046 firmware

Andreas Gruenbacher (1):
  afs: mntput called before dput

Andrew Morton (2):
  disable sys_timerfd()
  Fix "Fix DAC960 driver on machines which don't support 64-bit DMA"

Arnd Bergmann (1):
  futex_compat: fix list traversal bugs

David Howells (1):
  MTD: Initialise s_flags in get_sb_mtd_aux()

David Miller (1):
  Fix sparc64 v100 platform booting.

Denis V. Lunev (1):
  Fix IPV6 DAD handling

Eric Sandeen (2):
  ext34: ensure do_split leaves enough free space in both blocks
  dir_index: error out instead of BUG on corrupt dx dirs

Evgeniy Polyakov (1):
  Fix oops in vlan and bridging code

Greg Kroah-Hartman (1):
  Linux 2.6.22.9

Hans Verkuil (1):
  V4L: ivtv: fix VIDIOC_S_FBUF: new OSD values were never set

Herbert Xu (2):
  crypto: blkcipher_get_spot() handling of buffer at end of page
  Fix datagram recvmsg NULL iov handling 

[PATCH] fs/ocfs2/: removed unneeded initial value and function's return value

2007-09-26 Thread Denis Cheng
Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 fs/ocfs2/super.c |   17 -
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index c034b51..b98ec12 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -105,7 +105,7 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait);
 
 static int ocfs2_init_global_system_inodes(struct ocfs2_super *osb);
 static int ocfs2_init_local_system_inodes(struct ocfs2_super *osb);
-static int ocfs2_release_system_inodes(struct ocfs2_super *osb);
+static void ocfs2_release_system_inodes(struct ocfs2_super *osb);
 static int ocfs2_fill_local_node_info(struct ocfs2_super *osb);
 static int ocfs2_check_volume(struct ocfs2_super *osb);
 static int ocfs2_verify_volume(struct ocfs2_dinode *di,
@@ -177,7 +177,7 @@ static void ocfs2_write_super(struct super_block *sb)
 
 static int ocfs2_sync_fs(struct super_block *sb, int wait)
 {
-   int status = 0;
+   int status;
tid_t target;
struct ocfs2_super *osb = OCFS2_SB(sb);
 
@@ -275,9 +275,9 @@ bail:
return status;
 }
 
-static int ocfs2_release_system_inodes(struct ocfs2_super *osb)
+static void ocfs2_release_system_inodes(struct ocfs2_super *osb)
 {
-   int status = 0, i;
+   int i;
struct inode *inode;
 
mlog_entry_void();
@@ -302,8 +302,7 @@ static int ocfs2_release_system_inodes(struct ocfs2_super *osb)
osb->root_inode = NULL;
}
 
-   mlog_exit(status);
-   return status;
+   mlog_exit(0);
 }
 
 /* We're allocating fs objects, use GFP_NOFS */
@@ -453,7 +452,7 @@ static int ocfs2_sb_probe(struct super_block *sb,
  struct buffer_head **bh,
  int *sector_size)
 {
-   int status = 0, tmpstat;
+   int status, tmpstat;
struct ocfs1_vol_disk_hdr *hdr;
struct ocfs2_dinode *di;
int blksize;
@@ -1275,7 +1274,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
  struct buffer_head *bh,
  int sector_size)
 {
-   int status = 0;
+   int status;
int i, cbits, bbits;
struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;
struct inode *inode = NULL;
@@ -1596,7 +1595,7 @@ static int ocfs2_verify_volume(struct ocfs2_dinode *di,
 
 static int ocfs2_check_volume(struct ocfs2_super *osb)
 {
-   int status = 0;
+   int status;
int dirty;
int local;
struct ocfs2_dinode *local_alloc = NULL; /* only used if we
-- 
1.5.3.2



Re: missing mnt_drop_write() on open error

2007-09-26 Thread Dave Hansen
On Wed, 2007-09-26 at 19:50 +0200, Miklos Szeredi wrote:
> Maybe.  Can we do the mnt_want_write() from __dentry_open(), instead
> of may_open()?  That would be a lot cleaner.

I'll explore that.  It may make very good sense.

> Btw, may_open() doesn't do mnt_want_write() around the truncation if
> file is opened with O_TRUNC | O_RDONLY.

What's the path to may_open() in that case?  open_namei() should wrap
all callers other than nfs, and it does:

/* O_TRUNC implies we need access checks for write permissions */
if (flag & O_TRUNC)
acc_mode |= MAY_WRITE;

Which should trigger the may_open() code.  

later in open_namei():
...
ok:
error = may_open(nd, acc_mode, flag);
if (error)
goto exit;
return 0;


-- Dave



Re: [PATCH 01/24] CRED: Introduce a COW credentials record

2007-09-26 Thread Al Viro
On Wed, Sep 26, 2007 at 03:21:05PM +0100, David Howells wrote:
> To alter the credentials record, a copy must be made.  This copy may then be
> altered and then the pointer in the task_struct redirected to it.  From that
> point on the new record should be considered immutable.

Umm...  Perhaps a better primitive would be "make sure that our cred is
not shared with anybody, creating a copy and redirecting reference to
it if needed".
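
To make the suggested primitive concrete, here is a userspace sketch of
"unshare before modify". It is only an illustration under stated
assumptions: model_cred and model_cred_unshare are hypothetical names, the
struct carries just two of the fields discussed in this thread, and the
plain int reference count is not safe against concurrent users, which a
real kernel version would have to handle.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced credentials record. */
struct model_cred {
	int usage;                 /* number of references to this record */
	unsigned int fsuid, fsgid;
};

/*
 * Make sure *ref points at a record nobody else shares. If the record is
 * shared, copy it, drop our reference on the old one and redirect *ref to
 * the copy. Returns the private record, or NULL if allocation fails.
 */
static struct model_cred *model_cred_unshare(struct model_cred **ref)
{
	struct model_cred *old = *ref;
	struct model_cred *copy;

	if (old->usage == 1)
		return old;        /* already private, nothing to do */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;

	*copy = *old;
	copy->usage = 1;
	old->usage--;              /* other holders keep the old record */
	*ref = copy;
	return copy;
}
```

A caller would unshare first and only then change fsuid or fsgid, so the
record seen by other holders never changes underneath them.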

 
> In addition, the default setting of i_uid and i_gid to fsuid and fsgid has been
> moved from the callers of new_inode() into new_inode() itself.

I don't think it's safe; better do something trivial like
own_inode(inode)
that would set these (and that's a good split-up candidate, to go in front
of the series).


FWIW, the main weakness here is the need of update_current_cred()
splattered all over the entry points.  Two problems:
a) it's a bug source (somebody adds a syscall and forgets to
add that call / somebody modifies syscall guts and doesn't notice that
it needs to be added).
b) it's almost always doing nothing, so being lazier would be
better (event numbers checked in the inlined part, perhaps?)


The former would be more robust if it had been closer to the places where
we get to passing current->cred to functions.  The latter...  When do
we actually step into this kind of situation (somebody changing keys on
us) and what's the right semantics here?  E.g. if it happens in the middle
of long read(), do we want to keep using the original keys?

Comments?


Re: Man page for revised timerfd API

2007-09-26 Thread Davide Libenzi

Michael, SCB ...


On Wed, 26 Sep 2007, Michael Kerrisk wrote:

> .TH TIMERFD_CREATE 2 2007-09-26 Linux "Linux Programmer's Manual"
> .SH NAME
> timerfd_create, timerfd_settime, timerfd_gettime \-
> timers that notify via file descriptors
> .SH SYNOPSIS
> .\" FIXME . This header file may well change
> .\" FIXME . Probably _GNU_SOURCE will be required
> .\" FIXME . May require: Link with \fI\-lrt\f
> .nf
> .B #include 
> .sp
> .BI "int timerfd_create(int " clockid );
> .sp
> .BI "int timerfd_settime(int " fd ", int " flags ,
> .BI "const struct itimerspec *" new_value ,
> .BI "struct itimerspec *" curr_value );
> .sp
> .BI "int timerfd_gettime(int " fd ", struct itimerspec *" curr_value );
> .fi
> .SH DESCRIPTION
> These system calls create and operate on a timer
> that delivers timer expiration notifications via a file descriptor.
> They provide an alternative to the use of
> .BR setitimer (2)
> or
> .BR timer_create (3),
> with the advantage that the file descriptor may be monitored by
> .BR poll (2)
> and
> .BR select (2).

epoll, no?




> The use of these three system calls is analogous to the use of
> .BR timer_create (2),
> .BR timer_settime (2),
> and
> .BR timer_gettime (2).
> .\"
> .SS timerfd_create()
> .BR timerfd_create ()
> creates a new timer object,
> and returns a file descriptor that refers to that timer.
> The
> .I clockid
> argument specifies the clock that is used to mark the progress
> of the timer, and must be either
> .B CLOCK_REALTIME
> or
> .BR CLOCK_MONOTONIC .
> .B CLOCK_REALTIME
> is a settable system-wide clock.
> .B CLOCK_MONOTONIC
> is a non-settable clock that is not affected
> by discontinuous changes in the system clock
> (e.g., manual changes to system time).
> The current value of each of these clocks can be retrieved using
> .BR clock_gettime (3).
> .\"
> .SS timerfd_settime()
> .BR timerfd_settime ()
> arms (starts) or disarms (stops)
> the timer referred to by the file descriptor
> .IR fd .
> 
> The
> .I new_value
> argument specifies the initial expiration and interval for the timer.
> The
> .I itimerspec
> structure used for this argument contains two fields,
> each of which is in turn a structure of type
> .IR timespec :
> .in +0.25i
> .nf
> 
> struct timespec {
> time_t tv_sec;/* Seconds */
> long   tv_nsec;   /* Nanoseconds */
> };
> 
> struct itimerspec {
> struct timespec it_interval;  /* Interval for periodic timer */
> struct timespec it_value; /* Initial expiration */
> };
> .fi
> .in
> .PP
> .I new_value.it_value
> specifies the initial expiration of the timer,
> in seconds and nanoseconds.
> Setting either field of
> .I new_value.it_value
> to a non-zero value arms the timer.
> Setting both fields of
> .I new_value.it_value
> to zero disarms the timer.
> 
> Setting one or both fields of
> .I new_value.it_interval
> to non-zero values specifies the period, in seconds and nanoseconds,
> for repeated timer expirations after the initial expiration.
> If both fields of
> .I new_value.it_interval
> are zero, the timer expires just once, at the time specified by
> .IR new_value.it_value .
> 
> The
> .I flags
> argument is either 0, to start a relative timer
> .RI ( new_value.it_value
> specifies a time relative to the current value of the clock specified by
> .IR clockid ),
> or
> .BR TFD_TIMER_ABSTIME ,
> to start an absolute timer
> .RI ( new_value.it_value
> specifies an absolute time for the clock specified by
> .IR clockid ;
> that is, the timer will expire when the value of that
> clock reaches the value specified in
> .IR new_value.it_value ).
> 
> The
> .I curr_value
> argument returns a structure containing the setting of the timer that
> was current at the time of the call; see the description of
> .BR timerfd_gettime ()
> following.
> .\"
> .SS timerfd_gettime()
> .BR timerfd_gettime ()
> returns, in
> .IR curr_value ,
> an
> .IR itimerspec
> that contains the current setting of the timer
> referred to by the file descriptor
> .IR fd .
> 
> The
> .I it_value
> field returns the amount of time
> until the timer will next expire.
> If both fields of this structure are zero,
> then the timer is currently disarmed.
> This field always contains a relative value, regardless of whether the
> .BR TFD_TIMER_ABSTIME
> flag was specified when setting the timer.
> 
> The
> .I it_interval
> field returns the interval of the timer.
> If both fields of this structure are zero,
> then the timer is set to expire just once, at the time specified by
> .IR curr_value.it_value .
> .SS Operating on a timer file descriptor
> The file descriptor returned by
> .BR timerfd_create (2)
> supports the following operations:
> .TP
> .BR read (2)
> If the timer has already expired one or more times since it was created,
> or since the last
> .BR read (2),
> then the buffer given to
> .BR read (2)
> returns an unsigned 8-byte integer
> .RI ( uint64_t )
> containing the number of expirations 

Re: [patch 1/7] Extended crashkernel command line

2007-09-26 Thread Oleg Verych
Wed, Sep 26, 2007 at 06:16:02PM +0200, Bernhard Walle (part two, see bottom):
> > memparse(), as a wrapper for somple_strtoll(), always have a return value
> > (zero by default).
> > 

Sorry for my typos, I should have written `simple_strtoull()'. This function
(ULL from str) always returns a value greater than or equal to zero.

Thus,

> 
> Signed-off-by: Bernhard Walle <[EMAIL PROTECTED]>
> 
> ---
>  kernel/kexec.c |   31 ---
>  1 file changed, 24 insertions(+), 7 deletions(-)
> 
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1172,33 +1172,50 @@ static int __init parse_crashkernel_mem(
>   do {
>   unsigned long long start = 0, end = ULLONG_MAX;
>   unsigned long long size = -1;

There is no need to assign values here unless you plan to use them in the
`return -EINVAL' case, but I cannot see that.

> + char *tmp;
>  
>   /* get the start of the range */
> - start = memparse(cur, );
> + start = memparse(cur, );
> + if (cur == tmp) {
> + pr_warning("crashkernel: Memory value expected\n");
> + return -EINVAL;
> + }
> + cur = tmp;
>   if (*cur != '-') {
> - printk(KERN_WARNING "crashkernel: '-' expected\n");
> + pr_warning("crashkernel: '-' expected\n");
>   return -EINVAL;
>   }
>   cur++;
>  
>   /* if no ':' is here, than we read the end */
>   if (*cur != ':') {
> - end = memparse(cur, );
> + end = memparse(cur, );
> + if (cur == tmp) {
> + pr_warning("crashkernel: Memory "
> + "value expected\n");
> + return -EINVAL;
> + }
> + cur = tmp;
>   if (end <= start) {
> - printk(KERN_WARNING "crashkernel: end <= start\n");
> + pr_warning("crashkernel: end <= start\n");
>   return -EINVAL;
>   }
>   }
>  
>   if (*cur != ':') {
> - printk(KERN_WARNING "crashkernel: ':' expected\n");
> + pr_warning("crashkernel: ':' expected\n");
>   return -EINVAL;
>   }
>   cur++;
>  
> - size = memparse(cur, );
> + size = memparse(cur, );
> + if (cur == tmp) {
> + pr_warning("Memory value expected\n");
> + return -EINVAL;
> + }
> + cur = tmp;
>   if (size < 0) {

`size' cannot be less than zero here (I wonder whether it would be worth having
a userspace model of this parser and actually feeding it garbage input).

> - printk(KERN_WARNING "crashkernel: invalid size\n");
> + pr_warning("crashkernel: invalid size\n");
>   return -EINVAL;
>   }
>  
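
On the idea of a userspace model of this parser: below is a rough sketch
for one "start-end:size" range, so garbage input can be fed to it outside
the kernel. The assumptions are significant. model_memparse is a
hypothetical stand-in built on strtoull(), not the kernel's memparse(),
and it only approximates it (digits, optional K/M/G suffix). The stripped
second arguments in the quoted patch are not reconstructed here; the model
simply uses its own tmp pointer for the "nothing consumed" check.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical approximation of memparse(): number plus K/M/G suffix.
 * On "no digits", *end is left equal to s and 0 is returned. */
static unsigned long long model_memparse(const char *s, char **end)
{
	unsigned long long v;

	if (!isdigit((unsigned char)*s)) {
		*end = (char *)s;
		return 0;
	}
	v = strtoull(s, end, 0);
	switch (**end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	default: break;
	}
	return v;
}

/* Parse one "start-[end]:size" range. Returns 0 on success, -1 on error. */
static int model_parse_range(const char *cur, unsigned long long *start,
			     unsigned long long *end, unsigned long long *size)
{
	char *tmp;

	*end = ULLONG_MAX;              /* open-ended range by default */

	*start = model_memparse(cur, &tmp);
	if (tmp == cur)
		return -1;              /* memory value expected */
	cur = tmp;
	if (*cur != '-')
		return -1;              /* '-' expected */
	cur++;

	if (*cur != ':') {              /* an explicit end was given */
		*end = model_memparse(cur, &tmp);
		if (tmp == cur)
			return -1;
		cur = tmp;
		if (*end <= *start)
			return -1;      /* end <= start */
	}
	if (*cur != ':')
		return -1;              /* ':' expected */
	cur++;

	*size = model_memparse(cur, &tmp);
	if (tmp == cur)
		return -1;              /* memory value expected */
	return 0;
}
```

Since every value is an unsigned long long, the only way to notice a
missing number is the pointer comparison; a check such as "size < 0" can
never fire.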


Wed, Sep 26, 2007 at 06:16:02PM +0200, Bernhard Walle (part one):
> Ok, that's fixed now, see patch below.
> 
> > And why not to make overall result reliable? This is kernel after all.
> > 
> > I.e. if there's valid `crashkernel=' option, but some parsing errors, why
> > not to apply some heuristics with warning in syslog, if user have some
> > conf, bootloader (random) errors, but feature still works?
> 
> I'm against guessing. The user has to specify a parameter which is
> right according to syntax.
> 
> However, I plan to make a patch that the kernel can detect a sensible
> offset automatically for i386 and x86_64 as it's done in ia64. Since
> both architectures have a relocatable kernel now, that makes perfectly
> sense. But that's another patch.

I was thinking about errors in YaST or typos in bootloader config, that
may appear sometimes. And kernel must tolerate this kind of userspace
input to be more reliable. But you know better, i just am waving hands.

("Mail-Followup-To:" respected)


Re: missing mnt_drop_write() on open error

2007-09-26 Thread Miklos Szeredi
> On Wed, 2007-09-26 at 10:38 +0200, Miklos Szeredi wrote:
> > In __dentry_open() there's still a few places where fput() won't be
> > called, notably when ->open fails, which is what I'm triggering I
> > think.
> > 
> > Also even more horrible things can happen because of the
> > nd->intent.open.file thing.  For example if the lookup routine calls
> > lookup_instantiate_filp(), and after this, but before may_open() some
> > error happens, then release_open_intent() will call fput() on the
> > file, which will cause mnt_drop_write() to be called, even though a
> > matching mnt_want_write() hasn't yet been called.  Ugly, eh? 
> 
> I'm not sure it is _that_ horrible. ;)
> 
> Do you see any reason we can't just shadow the
> get/put_write_access(inode) calls with mnt_want/drop_write() calls?  I
> think they're always matched.

Maybe.  Can we do the mnt_want_write() from __dentry_open(), instead
of may_open()?  That would be a lot cleaner.

Btw, may_open() doesn't do mnt_want_write() around the truncation if
file is opened with O_TRUNC | O_RDONLY.

Miklos


Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 10:28 -0700, Keshavamurthy, Anil S wrote:
> On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote:
> > On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote:
> > > Hello,
> > > I am trying to use kprobes to measure the latency of a function by
> > > instrumenting its call site.  Basically, I find the call instruction,
> > > and insert a kprobe with a pre-handler and post-handler at that point.
> > > The pre-handler measures the latency (reads the TSC counter).  The
> > > post-handler measures the latency again, and subtracts the value that
> > > was read in the pre-handler to compute the total latency of the called
> > > function.
> > 
> > This sounds ok...
> 
> So what you are really measuring is the latency of just that single 
> instruction where you have inserted the probe i.e. because your
> pre-handler is called just before the probed instruction is executed and
> your post-handler is called right after your probed instruction is
> single-stepped.

Exactly - I want to profile that single instance of the call.

> > 
> > > So to measure the latency of foo(), I basically want kprobes to do this:
> > > pre_handler();
> > > foo();
> When you insert a probe, you are inserting  probe on an instruction boundary
> and not at function level.
> > > post_handler();
> 
> Hence the above looks like
> 
> pre-handler()
> Probed-instruction; // most likely the first instruction in the foo();
> post-handler()
> rest-of-foo()

I see.

> > > 
> > > The problem is that the latencies that I am getting are consistently low
> > > (~10,000 cycles).  When I manually instrument the functions, the latency
> > > is about 20,000,000 cycles.  Clearly something is not right here.
> As I mentioned above what you are seeing is the latency of just the
> probed instruction and hence it is very very low compared to
> the latency of the function foo().
> 
> > You could try a couple of approaches for starters.
> I agree with Ananth, you can try the below approaches
> for your measurements.
> 
> > 
> > a. As you mention above, a kprobe on the function invocation and the
> > other on the instruction following the call; both need just pre_handlers. 
> > 
> > b.
> > - Insert a kprobe and a kretprobe on foo()
> > - The kprobe needs to have only a pre_handler that'll measure the latency
> > - A similar handler for the kretprobe handler can measure the latency
> > again and their difference will give you foo()'s latency.
> > 
> >  though will require you to do some housekeeping in case foo() is
> > reentrant to track which return instance corresponds to which call.
> > 
> > Ananth
> > 
> > PS: There was a thought of providing a facility to run a handler at
> > function entry even when just a kretprobe is used. Maybe we need to
> > relook at that; it'd have been useful in this case.

Thanks for the clarifications!

Avishay



Re: KPROBES: Instrumenting a function's call site

2007-09-26 Thread Avishay Traeger
On Wed, 2007-09-26 at 22:57 +0530, Ananth N Mavinakayanahalli wrote:
> On Wed, Sep 26, 2007 at 12:09:35PM -0400, Avishay Traeger wrote:
> > On Wed, 2007-09-26 at 14:33 +0530, Ananth N Mavinakayanahalli wrote:
> > > What happens when the "call" is singlestepped is that the instruction
> > > pointer is moved to the call target. That explains the lower latency you
> > > are seeing. You'll need to do something along the lines I suggested in
> > > the earlier mail.
> > 
> > Can you please explain what you mean by this more clearly?  I'm not a
> > kprobes expert yet.  Specifically, using kprobes the way that I did,
> > what will the resulting code look like?  Also, what do you mean by
> > "singlestepped"?
> 
> If you single-step (regs->eflags | TF_MASK in i386) on a call instruction,
> you'll end up at the call target; ie., after the post_kprobe_handler()
> returns, the instruction pointer will point to the first instruction
> of foo().
> 
> Try printk()ing the instruction pointer(regs) after resume_execution()
> in the post_kprobe_handler() in your arch//kernel/kprobes.c, you'll
> see what I mean.
> 
> And when I say singlestepped, I mean executing one instruction under the
> architecture specific single step enable flag - the "trap" flag for i386,
> the MSR_SE for powerpc, etc. Evidently, this'll mean single-stepping a
> single instruction.
> 
> Ananth

I see - thanks for all your prompt and helpful advice!

Avishay



Re: [PATCH 1/3] taskstats: separate PID/TGID stats producers to complete the TGID ones

2007-09-26 Thread Guillaume Chazarain
On Sat, 22 Sep 2007 23:36:29 +0530,
Balbir Singh <[EMAIL PROTECTED]> wrote:

[reordered]

> How about calling this one fill_threadgroup_stats()?
> How about we call function add_tsk_stats()?
> I still prefer braces around do <--> while, I think the code is easier
> to read with them.
> Could we further split the tsacct and delayacct/taskstats patches.

Hi Balbir,

Thanks for your review; hopefully I addressed all your items in this
series.

> So we always call fill_threadgroup later, is there a reason for
> that. Can the order of calls be arbitrary or is there a dependency?

Yes, as you saw, there is this dependency. It is needed only for
[PATCH 7/8] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code
but I found it simpler to put the calls in the right order from the
beginning.

Thanks again, and sorry for the double mail bomb.

-- 
Guillaume

