Re: [PATCH v2] Bitbanging i2c bus driver using the GPIO API

2007-03-11 Thread Wu, Bryan
On Sat, 2007-03-10 at 14:13 +0100, Haavard Skinnemoen wrote:
 This is a very simple bitbanging i2c bus driver utilizing the new
 arch-neutral GPIO API. Useful for chips that don't have a built-in
 i2c controller, additional i2c busses, or testing purposes.
 

Sorry for missing this hot discussion. Your idea is exactly what I want.
Many arch-specific GPIO-based I2C adapter implementations will benefit
from this.

 To use, include something similar to the following in the
 board-specific setup code:
 
   #include <linux/i2c-gpio.h>
 
   static struct i2c_gpio_platform_data i2c_gpio_data = {
           .sda_pin        = GPIO_PIN_FOO,
           .scl_pin        = GPIO_PIN_BAR,
   };

Is this usage still right, given that three flags have been added to this
structure, as shown below?

struct i2c_gpio_platform_data {
        unsigned int    sda_pin;
        unsigned int    scl_pin;
        unsigned int    sda_is_open_drain:1;
        unsigned int    scl_is_open_drain:1;
        unsigned int    scl_is_output_only:1;
};

   static struct platform_device i2c_gpio_device = {
           .name           = "i2c-gpio",
           .id             = 0,
           .dev            = {
                   .platform_data  = &i2c_gpio_data,
           },
   };
 
 Register this platform_device, set up the i2c pins as GPIO if
 required and you're ready to go.
 
 Signed-off-by: Haavard Skinnemoen [EMAIL PROTECTED]
 ---
 This patch is different from the first patch in the following ways:
   * Handles pins set up as open drain (aka multidrive) by toggling
 the output value instead of the direction
   * Handles output-only SCL pins the same way, and also does not
 install a getscl() callback for such pins
   * Does not add anything to include/linux/i2c-ids.h
   * Sets the output value explicitly after changing the direction to
 output.
   * Plugs a memory leak in remove() -- algo_data wasn't freed.
   * Prints out the pin IDs in decimal, with an extra note when clock
 stretching isn't supported
 
 This version has been compile-tested only. I'll give it a spin when I
 get back to work on monday.
 
 Dave, does this address your concerns?
 
 Haavard   

Thanks a lot,  I will drop our GPIO based I2C driver and try this one on
our platform.

 + if (!pdata->scl_is_output_only)
 +         bit_data->getscl = i2c_gpio_getscl,
 +
 + bit_data->getsda  = i2c_gpio_getsda,
 + bit_data->udelay  = 5,        /* 100 kHz */
 + bit_data->timeout = HZ / 10,  /* 100 ms */

Can we add udelay and timeout to struct i2c_gpio_platform_data, and let
users choose them according to their specific requirements? We
use Kconfig to do this, but Jean and David don't like the idea :-(
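
To make the request concrete, here is a minimal user-space sketch of how board code could override the timing while keeping the current values as defaults. The udelay and timeout members, the _sketch type names, and the helper are assumptions for illustration; none of them are part of the posted patch, and HZ is defined locally only to stand in for the kernel's tick rate.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 100	/* assumption: stands in for the kernel's HZ */

/* Hypothetical extension of i2c_gpio_platform_data: udelay and timeout
 * are NOT in the posted patch. */
struct i2c_gpio_platform_data_sketch {
	unsigned int sda_pin;
	unsigned int scl_pin;
	int udelay;	/* half clock period in us, 0 = use the default */
	int timeout;	/* clock-stretching timeout in jiffies, 0 = default */
	unsigned int sda_is_open_drain:1;
	unsigned int scl_is_open_drain:1;
	unsigned int scl_is_output_only:1;
};

/* Stand-in for the two timing fields of the bit-banging algorithm data. */
struct bit_timing_sketch {
	int udelay;
	int timeout;
};

/* Copy board-supplied timing, falling back to the values hard-coded today. */
static void i2c_gpio_apply_timing(const struct i2c_gpio_platform_data_sketch *pdata,
				  struct bit_timing_sketch *bit)
{
	assert(pdata != NULL && bit != NULL);
	bit->udelay  = pdata->udelay  ? pdata->udelay  : 5;		/* 100 kHz */
	bit->timeout = pdata->timeout ? pdata->timeout : HZ / 10;	/* 100 ms */
}
```

Boards that leave both fields at zero would keep today's behaviour, so existing platform data would not need to change.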

Regards,
-Bryan Wu
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: i.MX/MX1 SDHC fix/workaround of SD card recognition problems

2007-03-11 Thread Pavel Pisa
On Monday 12 March 2007 00:36, you wrote:
 Pavel Pisa wrote:
  The SDHC controllers cannot process shorter transfers.
  They have to be handled as longer ones, but in such a case a CRC
  error is reported. There was still a case in the code
  where this error was not ignored, as it has to be in order to
  process these transfers.
 
  Signed-off-by: Pavel Pisa [EMAIL PROTECTED]

 Thanks, applied. Is this something critical that should be in 2.6.21?

 Rgds

Hello Pierre,

this should go into 2.6.21. I have held it for some
months and have discussed it in the thread
"Re: CRC Errors with SD cards in 4bits mode (on i.MXl)".
You have been CCed. This is not a solution for the observed data CRC
problem, but it solves problems with card recognition,
which has sometimes been timing sensitive.

I have sent it to Russell's patch queue together with my other
MX1 fixes that I intended to be included in 2.6.21.
That was probably a mistake for this one, because it should
go through your tree. If you send it to mainline
yourself, I will discard the patch from the patch daemon.

We have spoken about MX1 SDHC maintainership.
I am attaching my subscription.
I am not sure about mailing list field there.
Do you suggest this one, ALKML or other?

Best wishes

  Pavel Pisa

--
Subject: i.MX/MX1 SDHC maintainer

I am taking responsibility for i.MX MMC driver
bugs and for coordinating the fight against the problems
of this hardware beast.

Signed-off-by: Pavel Pisa [EMAIL PROTECTED]

 MAINTAINERS |7 +++
 1 file changed, 7 insertions(+)

Index: linux-2.6.21-rc1/MAINTAINERS
===
--- linux-2.6.21-rc1.orig/MAINTAINERS
+++ linux-2.6.21-rc1/MAINTAINERS
@@ -1713,6 +1713,13 @@ M:   [EMAIL PROTECTED]
 L: [EMAIL PROTECTED] (subscribers-only)
 S: Maintained
 
+IMX MMC/SD HOST CONTROLLER INTERFACE DRIVER
+P: Pavel Pisa
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://mmc.drzeus.cx/wiki/Controllers/Freescale/SDHC
+S: Maintained
+
 INFINIBAND SUBSYSTEM
 P: Roland Dreier
 M: [EMAIL PROTECTED]


Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Joe Jin
 
 This is a bug actually in the megaraid.

Aha, I'll track it.

 
 And this is a direct command submission path:  it already passed both
 online check gates in this path *after* the device was offlined, so
 adding a third won't fix this. 

Yeah, I have noticed that. However, according to the logs the device was
already offline, so why could commands still be sent to it? Isn't the
sequence of printk messages suspicious?

 single disk, so the I/O was definitely bound for sda?  Secondly, can you
 reproduce with a modern (2.6.20) kernel.  Your trace strongly suggests
 that the device came back online for some reason and then the megaraid
 driver died.

It's hard to update the kernel because the system is a production system,
and we cannot debug on the box :(

I don't know if you have noticed, but the logs come from diskdump. Could
this be caused by diskdump?

Thanks,
Joe


Re: [patch 2/3] fs: introduce perform_write aop

2007-03-11 Thread Mark Fasheh
On Sat, Mar 10, 2007 at 09:25:41AM +, Christoph Hellwig wrote:
 On Fri, Mar 09, 2007 at 03:33:01PM -0800, Mark Fasheh wrote:
  ->kernel_write() as opposed to genericizing ->perform_write() would be fine
  with me. Just so long as we get rid of ->prepare_write and ->commit_write in
  that other kernel code doesn't call them directly. That interface just
  doesn't work for Ocfs2.
 
 It doesn't work for any filesystem that needs slightly fancy locking.
 That and the reason that's an interface that doesn't fit into our
 layering is why I want to get rid of it.  Note that fops->kernel_write
 might in fact use ->perform_write with an actor as Nick suggested.
 I'm not quite sure how it'll look like - I'd rather take care of the
 buffered write path first and then handle this issue once the first
 changes have stabilized.
 
  Right now I've got Ocfs2 implementing its own lowest-level buffered write
  code - think generic_file_buffered_write() replacement for Ocfs2. With some
  duplicated code above that layer. What's nice is that I can abstract away
  the "copy data into some target pages" bits such that the majority of that
  code is re-usable for ocfs2's splice write operation. I'm not sure we could
  have that low a level of abstraction for anything above the individual file
  system, though, which also has to deal with non-kernel writes. That's
  where a ->kernel_write() might come in handy.
 
 Why do you need your own low level buffered write functionality?  As in
 past times when filesystems want to come up I'd like to have a very
 good explanation on why you think it's needed and whether we shouldn't
 improve the generic buffered write code instead.

Fair enough - I personally tried everything I could before coming to the
conclusion that for the time being, Ocfs2 should have a separate write path.

As you know, I've been adding sparse file support for Ocfs2. Putting aside
all the reasons to have real support for sparse files (as opposed to zeroing
allocated regions), the tree code changes alone have gotten us 90% of the way
to supporting unwritten extents (much like xfs does).

Ocfs2 supports atomic data allocation units ('clusters', to use an
overloaded term) which can range in size from 4k to 1 meg. This means that
for allocating writes on page size < cluster size file systems, we have to
zero pages adjacent to the one being written so that a re-read doesn't
return dirty data. This alone requires page locking which we can't
realistically achieve via ->prepare_write() and ->commit_write(). I believe
NTFS has a similar restriction, which has led to their own file write.
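
To illustrate the scale of the problem, here is a small user-space sketch of the arithmetic (not the Ocfs2 code; the function and type names are made up for this example). It computes which pages belong to the clusters touched by a write and which pages the write itself covers; every page in the first range but outside the second would need zeroing on an allocating write.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of page indices. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/* Pages covered by every cluster that the write [pos, pos + len) touches. */
static struct page_range cluster_pages(uint64_t pos, uint64_t len,
				       uint64_t cluster_size, uint64_t page_size)
{
	assert(len > 0 && page_size > 0 && cluster_size >= page_size);
	assert(cluster_size % page_size == 0);

	uint64_t start = (pos / cluster_size) * cluster_size;
	uint64_t end = ((pos + len - 1) / cluster_size + 1) * cluster_size;
	struct page_range r = { start / page_size, end / page_size - 1 };
	return r;
}

/* Pages that the write itself touches. */
static struct page_range written_pages(uint64_t pos, uint64_t len,
				       uint64_t page_size)
{
	assert(len > 0 && page_size > 0);

	struct page_range r = { pos / page_size, (pos + len - 1) / page_size };
	return r;
}
```

With 4k pages and a 64k cluster, a 100-byte write at offset 20000 touches only page 4, but the cluster spans pages 0 to 15, so fifteen other pages have to be locked and zeroed around it.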

So, page locking was definitely the straw that broke the camel's back. Some
other things which were awkward or slightly less critically broken than the
page locking:

Since ocfs2 has a rather large (compared to a local file system) context to
build up during an allocating write, it became uncomfortable to pass that
around ->prepare_write() and ->commit_write() without putting that context
on our struct inode and protecting it with a lock. And since the existing
interfaces were so rigid, it actually required a lot more context to be
passed around than in my current code.

There's also the cluster lock / page lock inversion which we have to deal
with (it gets even worse if we fault in pages in the middle of the user copy
for a write). Granted, we fixed a lot of that before merging, but allocating
in write means taking even more cluster locks and I don't really feel
comfortable nesting so many of those within the page locks.

Finally, we get to the optimization problem - writing stuff one page at a
time. To be fair, my current stuff doesn't do a very good job of optimizing
the amount of data written in a given pass, but the groundwork is there to
easily write at least one cluster's worth of user data at a time. My priority
has been mostly to stabilize it as opposed to performance tuning.

So, quite possibly, I overstated what Ocfs2 was doing earlier - we still
make use of as much generic code as we can. The O_DIRECT path for instance
wasn't touched. Ocfs2 still makes use of block_commit_write(), the standard
jbd mechanisms for ordered data mode, and though we got rid of
block_prepare_write() (for zeroing reasons), what we do is a much simpler
version.

By the way, the code in question can be found in the sparse_files branch of
ocfs2.git:

http://git.kernel.org/?p=linux/kernel/git/mfasheh/ocfs2.git;a=log;h=sparse_files

Your review has been extremely useful in the past, so I welcome any comments
you might have.

Though it's getting close to being put in ALL (for a spin in -mm), it's
definitely a work in progress branch. There's 3 patches to generic code
which I need to push out for review (it's pretty much just exporting symbols
which we'd need in any case). Also, some of the bug fixes and feature
adjustments need to get folded back into their respective patches.

 This codepath is so nasty that any duplication will be a maintenance
 horror.

All that 

Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Joe Jin
 The 2.6.9 base is very old in mainline terms.  Are you sure the bug hasn't
 been fixed in mainline by other means?

I cannot confirm whether it has been fixed in the latest kernel. The server
is a production system, so it's hard to debug it and try to reproduce.



Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Andrew Morton
 On Mon, 12 Mar 2007 10:52:22 +0800 Joe Jin [EMAIL PROTECTED] wrote:
  The 2.6.9 base is very old in mainline terms.  Are you sure the bug hasn't
  been fixed in mainline by other means?
 
 I cannot confirm whether it has been fixed in the latest kernel. The server
 is a production system, so it's hard to debug it and try to reproduce.

Well.  That makes it hard to run tests, but perhaps it can be determined
from code review..


RE: [git patches] libata fixes

2007-03-11 Thread Linus Torvalds


On Sun, 11 Mar 2007, Paul Rolland wrote:

 My machine is having two problems: the one you are describing above,
 which is due to a SIL controller being connected to one port of the ICH7
 (at least, it seems to be), and probing it times out, but nothing is
 connected to it.

Ok, so that's just a message irritation, not actually bothersome 
otherwise?

 The second problem is a Jmicron363 controller that is failing to detect
 the DVD-RW that is connected, unless I use the irqpoll option as Tejun has
 suggested.

.. and this one has never worked without irqpoll?

 But, as you suggest, I'm adding pci=nomsi to the command line and
 rebooting... no change for this part of the problem.
 
 OK, the /proc/interrupt for this config, and the dmesg attached.
 
 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts 
            CPU0       CPU1
   0:     297549          0   IO-APIC-edge      timer
   1:          7          0   IO-APIC-edge      i8042
   4:         13          0   IO-APIC-edge      serial
   6:          5          0   IO-APIC-edge      floppy
   8:          1          0   IO-APIC-edge      rtc
   9:          0          0   IO-APIC-fasteoi   acpi
  12:        126          0   IO-APIC-edge      i8042
  14:       8313          0   IO-APIC-edge      libata
  15:          0          0   IO-APIC-edge      libata
  16:          0          0   IO-APIC-fasteoi   eth1, libata

So it's the irq16 one that is the Jmicron controller and just isn't 
getting any interrupts?

Since all the other interrupts work (and MSI worked for other 
controllers), I don't think it's interrupt-routing related. Especially as 
MSI shouldn't even care about things like that.

And since it all works when irqpoll is used, that implies that the 
*only* thing that is broken is literally irq delivery.

Is there possibly some jmicron-specific enable interrupts bit? 

 PS : I'd like to try 2.6.21-rc3, but it seems that this is breaking my
 config : disk naming is no more the same, and I end up with a panic
 Warning: unable to open an initial console
 though i've been compiling with the same .config I was using for 2.6.21-rc2

Gaah. Can you get a log through serial console or netconsole to see what 
changed?

Linus


Re: SATA resume slowness, e1000 MSI warning

2007-03-11 Thread Michael S. Tsirkin
 Quoting Eric W. Biederman [EMAIL PROTECTED]:
 Subject: Re: SATA resume slowness, e1000 MSI warning
 
 Michael S. Tsirkin [EMAIL PROTECTED] writes:
 
  OK I guess. I gather we assume writing read-only registers has no side 
  effects?
  Are there rumors circulating wrt to these?
 
 I haven't heard anything about that, and if we are writing the same value back
 it should be pretty safe.
 
 I have heard it asserted that at least one version of the pci spec
 only required 32bit accesses to be supported by the hardware.  One of
 these days I will have to look that up and see if it is true.

Maybe. But surely before the PCI-X days.

 I do know
 it can be weird for hardware developers to support multiple kinds of
 decode.

Is this the only place where Linux uses 
pci_read_config_word/pci_read_config_dword?
I think such hardware will be pretty much DOA on all OS-es.  Why don't we wait
and see whether someone reports a broken config?

 As I recall for pci and pci-x at the hardware level the only
 difference in between 32bit transactions and smaller ones is the state
 of the byte-enable lines.

True, and same holds for PCI-Express.

So let's assume hardware implements RO correctly but ignores the BE bits -
nothing bad happens then, right?
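
A toy model of that argument, for what it's worth. This is a user-space sketch, not kernel or hardware code; the struct, the function, and the honor_be switch are invented for illustration. A write is applied per byte lane, a device that ignores the byte enables writes all four lanes, and read-only bits are masked out either way.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one 32-bit configuration dword. */
struct cfg_dword {
	uint32_t value;
	uint32_t ro_mask;	/* bits set here are read-only */
};

/*
 * Apply a write. byte_enables has bit n set when byte lane n is enabled.
 * honor_be == 0 models a device that ignores the byte enables and always
 * writes all four lanes.
 */
static void cfg_write(struct cfg_dword *reg, uint32_t data,
		      unsigned int byte_enables, int honor_be)
{
	uint32_t lane_mask = 0;
	unsigned int lane;

	assert(reg != 0);
	for (lane = 0; lane < 4; lane++)
		if (!honor_be || (byte_enables & (1u << lane)))
			lane_mask |= 0xffu << (8 * lane);

	uint32_t writable = lane_mask & ~reg->ro_mask;
	reg->value = (reg->value & ~writable) | (data & writable);
}
```

In this model a 16-bit write to the low half leaves a read-only high half untouched whether or not the byte enables are honoured; only writable neighbours get clobbered. The sketch covers strictly read-only bits; write-one-to-clear bits would behave differently and are outside it.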

-- 
MST


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-11 Thread Al Boldi
Con Kolivas wrote:
 On Monday 12 March 2007 08:52, Con Kolivas wrote:
  And thank you! I think I know what's going on now. I think each rotation
  is followed by another rotation before the higher priority task is
  getting a look in in schedule() to even get quota and add it to the
  runqueue quota. I'll try a simple change to see if that helps. Patch
  coming up shortly.

 Can you try the following patch and see if it helps. There's also one
 minor preemption logic fix in there that I'm planning on including.
 Thanks!

Applied on top of v0.28 mainline, and there is no difference.

What's it look like on your machine?


Thanks!

--
Al



Re: [PATCH 1/2] xfs: use xfs_get_buf_noaddr for iclogs

2007-03-11 Thread David Chinner
On Wed, Mar 07, 2007 at 11:13:14AM +0100, Christoph Hellwig wrote:
 xfs_buf_get_noaddr.  There's a subtle change because
 xfs_buf_get_empty returns the buffer locked, but xfs_buf_get_noaddr
 returns it unlocked.  From my auditing and testing nothing in the
 log I/O code cares about this distinction, but I'd be happy if
 someone could try to prove this independently.

Looks safe to me - we initialise all the fields in the xfs_buf_t
when we allocate out of the slab, so it doesn't really matter what
state the buffer is in when we free it.

OTOH, all other buffers are supposed to be locked when under I/O.
This change makes a special case for the log buffers, and I'd prefer
not to have to remember that this behaviour changed for log buffers
at some point in time.

I suggest that adding:

 - iclog->hic_data = (xlog_in_core_2_t *)
 -   kmem_zalloc(iclogsize, KM_SLEEP | KM_LARGE);
 -
   iclog->ic_prev = prev_iclog;
   prev_iclog = iclog;
 +
 + bp = xfs_buf_get_noaddr(log->l_iclog_size, mp->m_logdev_targp);
 + XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
 + XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
 + XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);

+   XFS_BUF_PSEMA(bp, PRIBIO);

 + iclog->ic_bp = bp;
 + iclog->hic_data = bp->b_addr;
 +
   log->l_iclog_bak[i] = (xfs_caddr_t)&(iclog->ic_header);
  
   head = &iclog->ic_header;

To lock the buffer should be added here. That way we don't change
any semantics of the code at all.

 @@ -1216,11 +1221,6 @@
   INT_SET(head->h_fmt, ARCH_CONVERT, XLOG_FMT);
   memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t));
  
 - bp = xfs_buf_get_empty(log->l_iclog_size, mp->m_logdev_targp);
 - XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
 - XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
 - XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
 - iclog->ic_bp = bp;
  
   iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
   iclog->ic_state = XLOG_STATE_ACTIVE;
 @@ -1229,7 +1229,6 @@
   iclog->ic_datap = (char *)iclog->hic_data + log->l_iclog_hsize;
  
   ASSERT(XFS_BUF_ISBUSY(iclog->ic_bp));
 - ASSERT(XFS_BUF_VALUSEMA(iclog->ic_bp) <= 0);

And this assert can then stay...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: Null pointer in autofs4 (_spin_lock) in 2.6.21-rc2

2007-03-11 Thread Ian Kent
On Sun, 11 Mar 2007, Thomas Renninger wrote:

 On Thu, 2007-03-08 at 19:39 +0900, Ian Kent wrote:
  On Thu, 2007-03-08 at 11:12 +0100, Thomas Renninger wrote:
   On Thu, 2007-03-08 at 01:28 -0800, Andrew Morton wrote:
 On Thu, 08 Mar 2007 09:57:56 +0100 Thomas Renninger [EMAIL 
 PROTECTED] wrote:
 I saw this happening several times on 2.6.21-rc2.
 Tell me how I can help...
 Some nfs partitions are mounted via nfs using autofs.
 It takes some hours to run into this:
 
 Unable to handle kernel NULL pointer dereference at 0008
 RIP:
  [8025bada] _spin_lock+0x0/0xf
 PGD 1dde23067 PUD 1d3060067 PMD 0
 Oops: 0002 [1] SMP
 CPU 3
 Modules linked in: autofs4 nfs lockd nfs_acl sunrpc asus_acpi 
 af_packet
 tg3 ipv6 button battery ac ext2 mbcache loop dm_mod floppy parport_pc 
 lp
 parport reiserfs pata_amd edd fan thermal sg processor sata_sil libata
 amd74xx sd_mod scsi_mod ide_disk ide_core
 Pid: 11373, comm: touch Not tainted 2.6.21-rc2-default #6
 RIP: 0010:[8025bada]  [8025bada] 
 _spin_lock+0x0/0xf
 RSP: 0018:8101c50a5a50  EFLAGS: 00010202
 RAX: 8100eb8916f8 RBX: 81010007dcd8 RCX: 8100ea45b280
 RDX: 10e58c2e RSI: 810163bf9e50 RDI: 0008
 RBP: 810163bf9e50 R08: 8101c50a4000 R09: 8101c50a5ea8
 R10: 81010003fca8 R11: 802299ad R12: 
 R13: 8100eb891680 R14: 0005 R15: 8101c50a5b48
 FS:  2b8ae744bf20() GS:81010016a7c0()
 knlGS:b7bd88d0
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 0008 CR3: 0001b925f000 CR4: 06e0
 Process touch (pid: 11373, threadinfo 8101c50a4000, task
 8101b78bd100)
 Stack:  882d5f38 8101c50a5ea8 8100ec8df4b0
 00d0
  8100eb8916f8 810163bf9efc 10e58c2eea45b220 8100ea45b220
  810163bf9e50 8100ea45b220 8100ec8df4b0 8100ec8df568
 Call Trace:
  [882d5f38] :autofs4:autofs4_lookup+0xcb/0x311
  [8020c0d8] do_lookup+0xc4/0x1ae
  [802097be] __link_path_walk+0x8ec/0xd9d
  [8824ca24] :sunrpc:rpcauth_lookup_credcache+0x12e/0x24a
  [8020da3e] link_path_walk+0x58/0xe0
  [80232d3f] __strncpy_from_user+0x17/0x41
  [8020949b] __link_path_walk+0x5c9/0xd9d
  [8020da3e] link_path_walk+0x58/0xe0
  [80232d3f] __strncpy_from_user+0x17/0x41
  [8020bea7] do_path_lookup+0x1b6/0x217
  [80221512] __path_lookup_intent_open+0x56/0x97
  [80218912] open_namei+0xa9/0x64c
  [8025dc33] do_page_fault+0x45e/0x7ad
  [802250eb] do_filp_open+0x1c/0x38
  [80232d3f] __strncpy_from_user+0x17/0x41
  [80217698] do_sys_open+0x44/0xc1
  [8025511e] system_call+0x7e/0x83
 
 
 Code: f0 ff 0f 79 09 f3 90 83 3f 00 7e f9 eb f2 c3 f0 81 2f 00 00
 RIP  [8025bada] _spin_lock+0x0/0xf
  RSP 8101c50a5a50
 CR2: 0008

I assume 2.6.20 is OK?
   Can't say for sure, I expect yes.
   Set up with 2.6.20 now and let it run for a day or two.
   Maybe someone has worked in that area and has an idea meanwhile...
  
  Do we have any idea on what was being opened here?
  Might be useful to see the autofs maps if possible.
 I sent that stuff to Ian...
 
 However, I couldn't run into that with 2.6.20 and also not with
 *2.6.21-rc3* (yet). Maybe it already got fixed?
 Machine still running, I'll report back if this should happen again.

I suspect the problem is still present but maybe a bit hard to trigger.
I'm not convinced this is needed but it is the only thing that looks at 
all suspicious so if (when) you see this again could you give the patch 
below a try please.
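
For reference, the oops shows RDI and CR2 both equal to 0008 on entry to _spin_lock, which is what one would see when taking a lock embedded 8 bytes into a structure reached through a NULL pointer. The user-space sketch below only illustrates that arithmetic and the guard. The struct layout is hypothetical (chosen so the lock lands at offset 8) and is not the real struct autofs_sb_info, and whether sbi is really the NULL pointer here is still only a suspicion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct autofs_sb_info. */
struct sbi_sketch {
	uint64_t magic;		/* 8 bytes in front of the lock */
	int rehash_lock;	/* stand-in for spinlock_t */
};

/* Address the CPU would touch when locking a member of *base. */
static uintptr_t fault_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}

/* The guard: bail out before touching any member of a NULL sbi. */
static int lookup_guarded(const struct sbi_sketch *sbi)
{
	if (!sbi)
		return -1;
	return sbi->rehash_lock;
}
```

With a NULL base, the address of the lock member is just its offset within the structure, which matches a fault address of 8 in this layout.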

Ian

---

--- linux-2.6.21-rc3/fs/autofs4/root.c.sbi-check	2007-03-12 13:29:42.0 +0900
+++ linux-2.6.21-rc3/fs/autofs4/root.c	2007-03-12 13:30:04.0 +0900
@@ -503,6 +503,9 @@ static struct dentry *autofs4_lookup_unh
 	const unsigned char *str = name->name;
 	struct list_head *p, *head;
 
+	if (!sbi)
+		return NULL;
+
 	spin_lock(&dcache_lock);
 	spin_lock(&sbi->rehash_lock);
 	head = &sbi->rehash_list;


Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)

2007-03-11 Thread Paul E. McKenney
On Sun, Mar 11, 2007 at 06:49:08PM +0100, Rafael J. Wysocki wrote:
 On Saturday, 3 March 2007 18:32, Oleg Nesterov wrote:
  On 03/02, Paul E. McKenney wrote:
  
   On Sat, Mar 03, 2007 at 02:33:37AM +0300, Oleg Nesterov wrote:
On 03/02, Paul E. McKenney wrote:

 One way to embed try_to_freeze() into kthread_should_stop() might be
 as follows:
 
   int kthread_should_stop(void)
   {
   if (kthread_stop_info.k == current)
   return 1;
   try_to_freeze();
   return 0;
   }

I think this is dangerous. For example, worker_thread() will probably
need some special actions after return from refrigerator. Also, a kernel
thread may check kthread_should_stop() in the place where 
try_to_freeze()
is not safe.

Perhaps we should introduce a new helper which does this.
   
   Good point -- the return value from try_to_freeze() is lost if one uses
   the above approach.  About one third of the calls to try_to_freeze()
   in 2.6.20 pay attention to the return value.
   
   One approach would be to have a kthread_should_stop_nofreeze() for those
   cases, and let the default be to try to freeze.
  
  I personally think we should do the opposite, add 
  kthread_should_stop_check_freeze()
  or something. kthread_should_stop() is like signal_pending(), we can use
  it under spin_lock (and it is probably used this way by some out-of-tree
  driver). The new helper is obviously might_sleep().
 
 Something like this, perhaps:

Looks good to me!  The other kthread_should_stop() calls in
rcutorture.c should also become kthread_should_stop_check_freeze().
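
For anyone following along, here is a user-space sketch of the loop shape this gives a kernel thread. The flags and every _sketch name are stand-ins invented for illustration; nothing here is the kernel implementation, and the "freeze" merely counts instead of sleeping in the refrigerator.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-thread state the kernel tracks elsewhere. */
struct kthread_sketch {
	bool stop_requested;
	bool freeze_requested;
	int times_frozen;
};

/* Returns true if the thread was frozen (and has since been thawed). */
static bool try_to_freeze_sketch(struct kthread_sketch *t)
{
	if (!t->freeze_requested)
		return false;
	t->times_frozen++;		/* real code would sleep here */
	t->freeze_requested = false;	/* pretend we were thawed */
	return true;
}

/* Combined check: a stop request wins, otherwise honour a freeze request. */
static int should_stop_check_freeze_sketch(struct kthread_sketch *t)
{
	if (t->stop_requested)
		return 1;
	try_to_freeze_sketch(t);
	return 0;
}

/* Worker loop; the requests are injected at fixed iterations for the demo. */
static int run_worker(struct kthread_sketch *t, int stop_after, int freeze_at)
{
	int iterations = 0;

	while (!should_stop_check_freeze_sketch(t)) {
		iterations++;
		if (iterations == freeze_at)
			t->freeze_requested = true;
		if (iterations == stop_after)
			t->stop_requested = true;
	}
	return iterations;
}
```

The sketch drops the return value of the freeze attempt, just as the proposed helper does, so callers that need to act after thawing would still have to call try_to_freeze() themselves.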

Acked-by: Paul E. McKenney [EMAIL PROTECTED]

  include/linux/kthread.h |1 +
  kernel/kthread.c|   16 
  kernel/rcutorture.c |5 ++---
  3 files changed, 19 insertions(+), 3 deletions(-)
 
 Index: linux-2.6.21-rc3-mm2/kernel/kthread.c
 ===
 --- linux-2.6.21-rc3-mm2.orig/kernel/kthread.c2007-03-08 
 21:58:48.0 +0100
 +++ linux-2.6.21-rc3-mm2/kernel/kthread.c 2007-03-11 18:32:59.0 
 +0100
 @@ -13,6 +13,7 @@
  #include <linux/file.h>
  #include <linux/module.h>
  #include <linux/mutex.h>
 +#include <linux/freezer.h>
  #include <asm/semaphore.h>
 
  /*
 @@ -60,6 +61,21 @@ int kthread_should_stop(void)
  }
  EXPORT_SYMBOL(kthread_should_stop);
 
 +/**
 + * kthread_should_stop_check_freeze - check if the thread should return now 
 and
 + * if not, check if there is a freezing request pending for it.
 + */
 +int kthread_should_stop_check_freeze(void)
 +{
 + might_sleep();
 + if (kthread_stop_info.k == current)
 + return 1;
 +
 + try_to_freeze();
 + return 0;
 +}
 +EXPORT_SYMBOL(kthread_should_stop_check_freeze);
 +
  static void kthread_exit_files(void)
  {
   struct fs_struct *fs;
 Index: linux-2.6.21-rc3-mm2/include/linux/kthread.h
 ===
 --- linux-2.6.21-rc3-mm2.orig/include/linux/kthread.h 2007-02-04 
 19:44:54.0 +0100
 +++ linux-2.6.21-rc3-mm2/include/linux/kthread.h  2007-03-11 
 18:37:10.0 +0100
 @@ -29,5 +29,6 @@ struct task_struct *kthread_create(int (
  void kthread_bind(struct task_struct *k, unsigned int cpu);
  int kthread_stop(struct task_struct *k);
  int kthread_should_stop(void);
 +int kthread_should_stop_check_freeze(void);
 
  #endif /* _LINUX_KTHREAD_H */
 Index: linux-2.6.21-rc3-mm2/kernel/rcutorture.c
 ===
 --- linux-2.6.21-rc3-mm2.orig/kernel/rcutorture.c 2007-03-11 
 11:39:06.0 +0100
 +++ linux-2.6.21-rc3-mm2/kernel/rcutorture.c  2007-03-11 18:45:00.0 
 +0100
 @@ -540,10 +540,9 @@ rcu_torture_writer(void *arg)
   }
   rcu_torture_current_version++;
   oldbatch = cur_ops->completed();
 - try_to_freeze();
 - } while (!kthread_should_stop() && !fullstop);
 + } while (!kthread_should_stop_check_freeze() && !fullstop);
   VERBOSE_PRINTK_STRING("rcu_torture_writer task stopping");
 - while (!kthread_should_stop())
 + while (!kthread_should_stop_check_freeze())
   schedule_timeout_uninterruptible(1);
   return 0;
  }


Re: RSDL v0.30 cpu scheduler for ... 2.6.18.8 kernel

2007-03-11 Thread Con Kolivas
On Monday 12 March 2007 19:17, Vincent Fortier wrote:
  There are updated patches for 2.6.20, 2.6.20.2, 2.6.21-rc3 and
  2.6.21-rc3-mm2 to bring RSDL up to version 0.30 for download here:
 
  Full patches:
 
  http://ck.kolivas.org/patches/staircase-deadline/2.6.20-sched-rsdl-0.30.patch
  http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2-rsdl-0.30.patch
  http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
  http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.30.patch
 
  incrementals:
 
  http://ck.kolivas.org/patches/staircase-deadline/2.6.20/2.6.20.2-rsdl-0.29-0.30.patch
  http://ck.kolivas.org/patches/staircase-deadline/2.6.20.2/2.6.20.2-rsdl-0.29-0.30.patch
  http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3/2.6.21-rc3-rsdl-0.29-0.30.patch
  http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2/2.6.21-rc3-mm2-rsdl-0.29-0.30.patch

 And here are the backported RSDL 0.30 patches in case any of you would
 still be running an older 2.6.18.8 kernel ...

Thanks, your efforts are appreciated as it would take me quite a while to do a 
variety of backports that people are already requesting.

 Just for info, version 0.30 seems around 2 seconds faster than 0.26-0.29
 versions at boot time.  I used to have around 2-3 seconds of difference
 between a vanilla and a rsdl patched kernel.  Now it looks more like 5
 seconds faster!  Wow.. nice work CK!

 2.6.18.8 vanilla kernel:
 [   68.514248] ACPI: Power Button (CM) [PWRB]

 2.6.18.8-rsdl-0.30:
 [   63.739337] ACPI: Power Button (CM) [PWRB]

Indeed there's almost 5 seconds difference there. To be honest, the boot time 
speedups are an unexpected bonus, but everyone seems to be reporting them on 
all flavours so perhaps all those timeout related driver setups are 
inadvertently benefiting.

 - vin

Thanks

-- 
-ck


Re: 2.6.21-rc3-mm1

2007-03-11 Thread Paul E. McKenney
On Sun, Mar 11, 2007 at 06:02:31PM +0100, Michal Piotrowski wrote:
 On 10/03/07, Paul E. McKenney [EMAIL PROTECTED] wrote:
 On Fri, Mar 09, 2007 at 06:18:51PM -0800, Andrew Morton wrote:
   On Thu, 08 Mar 2007 21:50:29 +0100 Michal Piotrowski 
 [EMAIL PROTECTED] wrote:
   Andrew Morton wrote:
Temporarily at
   
  http://userweb.kernel.org/~akpm/2.6.21-rc3-mm1/
   
Will appear later at
   
  
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc3/2.6.21-rc3-mm1/
   
  
   cpu_hotplug (AutoTest) hangs at this
  
   =
   [ INFO: possible recursive locking detected ]
   2.6.21-rc3-mm1 #2
   -
   sh/7213 is trying to acquire lock:
(sched_hotcpu_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
  
   but task is already holding lock:
(sched_hotcpu_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
  
   other info that might help us debug this:
   4 locks held by sh/7213:
#0:  (cpu_add_remove_lock){--..}, at: [c033883a] 
 mutex_lock+0x1c/0x1f
#1:  (sched_hotcpu_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
#2:  (cache_chain_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
#3:  (workqueue_mutex){--..}, at: [c033883a] mutex_lock+0x1c/0x1f
 
  That's pretty useless, isn't it?  We need to know the mutex_lock() caller
  here.
 
   stack backtrace
    [<c0105256>] show_trace_log_lvl+0x1a/0x2f
    [<c010597b>] show_trace+0x12/0x14
    [<c0105a3d>] dump_stack+0x16/0x18
    [<c013fc73>] __lock_acquire+0x1aa/0xceb
    [<c014082d>] lock_acquire+0x79/0x93
    [<c03385dc>] __mutex_lock_slowpath+0x107/0x349
    [<c033883a>] mutex_lock+0x1c/0x1f
    [<c011d924>] sched_getaffinity+0x14/0x91
    [<c015796d>] __synchronize_sched+0x11/0x5f
    [<c011d257>] detach_destroy_domains+0x2c/0x30
    [<c011fc1a>] update_sched_domains+0x27/0x3a
    [<c012fe7a>] notifier_call_chain+0x2b/0x4a
    [<c012fec6>] __raw_notifier_call_chain+0x19/0x1e
    [<c0145756>] _cpu_down+0x70/0x282
    [<c014598e>] cpu_down+0x26/0x38
    [<c0272714>] store_online+0x27/0x5a
    [<c026f610>] sysdev_store+0x20/0x25
    [<c01b7a8e>] sysfs_write_file+0xc1/0xe9
    [<c0180052>] vfs_write+0xd1/0x15a
    [<c0180682>] sys_write+0x3d/0x72
    [<c0104270>] syscall_call+0x7/0xb
  
   l *0xc033883a
   0xc033883a is in mutex_lock (/mnt/md0/devel/linux-mm/kernel/mutex.c:92).
   87  /*
   88   * The locking fastpath is the 1->0 transition from
   89   * 'unlocked' into 'locked' state.
   90   */
   91  __mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
   92  }
   93
   94  EXPORT_SYMBOL(mutex_lock);
   95
   96  static void fastcall noinline __sched
  
   I didn't test other -mm's with this test.
  
   
 http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3-mm1/console.log
   
 http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3-mm1/mm-config
 
  I can't immediately spot the bug.  Probably it's caused by rcu-preempt's
  changes to synchronize_sched(): that function now does a heap more than it
  used to, including taking sched_hotcpu_mutex.
 
  So, what to do about this.  Paul, I'm thinking that I should drop
  rcu-preempt for now - I don't think we ended up being able to identify any
  particular benefit which it brings to current mainline, and I suspect that
  things will become simpler if/when we start using the process freezer for
  CPU hotplug.
 
 It certainly makes sense for Michal to try backing out rcu-preempt using
 your broken-out list of patches.  If that makes the problem go away,
 
 Problem is caused by rcu-preempt.patch.

OK, clearly we need to fix this.  You might be right about the freezer
code having to go in first, Andrew -- will see!

Thanx, Paul

 then I would certainly have a hard time arguing with you.  We are working
 on getting measurements showing benefit of rcu-preempt, but aren't there
 yet.
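
The report above boils down to one task taking a lock class it already holds: the hotplug path holds sched_hotcpu_mutex, and sched_getaffinity(), reached via __synchronize_sched(), tries to take it again. The following is a minimal userspace sketch of that kind of check, not the kernel's lockdep code; the names `task_locks`, `toy_mutex_lock` and `toy_mutex_unlock` are made up for illustration, and the model assumes locks are released in LIFO order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HELD 8

/* Per-task record of the lock classes currently held, in acquisition order. */
struct task_locks {
	const char *held[MAX_HELD];
	size_t depth;
};

/* Returns true if taking `name` would re-acquire a class this task holds. */
static bool would_recurse(const struct task_locks *t, const char *name)
{
	for (size_t i = 0; i < t->depth; i++)
		if (strcmp(t->held[i], name) == 0)
			return true;
	return false;
}

/*
 * Try to take `name`. Returns 0 on success, or -1 if the task already
 * holds that class (a non-recursive mutex would deadlock here) or if
 * the toy table is full.
 */
static int toy_mutex_lock(struct task_locks *t, const char *name)
{
	if (would_recurse(t, name) || t->depth == MAX_HELD)
		return -1;
	t->held[t->depth++] = name;
	return 0;
}

/* Release the most recently taken lock (toy model: strictly LIFO). */
static void toy_mutex_unlock(struct task_locks *t)
{
	if (t->depth > 0)
		t->depth--;
}
```

Taking "cpu_add_remove_lock" and then "sched_hotcpu_mutex" succeeds in this model; a second attempt on "sched_hotcpu_mutex" is refused, which is the situation the "possible recursive locking detected" banner describes.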




Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-11 Thread Con Kolivas
On Monday 12 March 2007 15:42, Al Boldi wrote:
 Con Kolivas wrote:
  On Monday 12 March 2007 08:52, Con Kolivas wrote:
   And thank you! I think I know what's going on now. I think each
   rotation is followed by another rotation before the higher priority
   task is getting a look in in schedule() to even get quota and add it to
   the runqueue quota. I'll try a simple change to see if that helps.
   Patch coming up shortly.
 
  Can you try the following patch and see if it helps. There's also one
  minor preemption logic fix in there that I'm planning on including.
  Thanks!

 Applied on top of v0.28 mainline, and there is no difference.

 What's it look like on your machine?

The higher priority one always gets 6-7ms whereas the lower priority one runs 
6-7ms and then one larger, perfectly bounded expiration amount. Basically 
exactly as I'd expect. The higher priority task gets precisely RR_INTERVAL 
maximum latency whereas the lower priority task gets RR_INTERVAL minimum and a 
full expiration (according to the virtual deadline) as a maximum. That's exactly 
how I intend it to work. Yes, I realise that the max latency ends up being 
longer intermittently on the niced task, but that is, in my opinion, perfectly 
fine as a compromise to ensure the nice 0 one always gets low latency.

Eg:
nice 0 vs nice 10

nice 0:
pid 6288, prio   0, out for    7 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms
pid 6288, prio   0, out for    6 ms

nice 10:
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for   66 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms
pid 6290, prio  10, out for    6 ms

Exactly as I'd expect. If you want fixed latencies _of niced tasks_ in the 
presence of less niced tasks you will not get them with this scheduler. What 
you will get, though, is a perfectly bounded relationship: you know exactly 
what the maximum latency will ever be.

Thanks for the test case. It's interesting and nice that it confirms this 
scheduler works as I expect it to.

-- 
-ck


Re: Style Question

2007-03-11 Thread Cong WANG

2007/3/12, Jan Engelhardt [EMAIL PROTECTED]:


On Mar 11 2007 22:15, Cong WANG wrote:

 I have a question about coding style in the Linux kernel. In
 Documentation/CodingStyle, it is said that Linux style for comments is
 the C89 /* ... */ style. Don't use C99-style // ... comments.
 _But_ I see a lot of '//' style comments in current kernel code.

 Which is wrong? The documentation or the code, or neither? And why?

The code. And because it's not always reviewed but silently pushed.

 Another question is about NULL. AFAIK, in user space, using NULL is
 better than directly using 0 in C. In kernel, I know it used its own
 NULL, which may be defined as ((void*)0), but it's _still_ different
 from raw zero.

In what way?


The following code is picked from drivers/kvm/kvm_main.c:

static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
{
	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];

	mutex_lock(&vcpu->mutex);
	if (unlikely(!vcpu->vmcs)) {
		mutex_unlock(&vcpu->mutex);
		return 0;
	}
	return kvm_arch_ops->vcpu_load(vcpu);
}

Obviously, it uses 0 rather than NULL when returning a pointer to
indicate an error. Should we fix such issues?



So can I say using NULL is better than 0 in kernel?

On what basis? Do you even know what NULL is defined as in
(C, not C++) userspace? Think about it.



I think it's clearer to indicate we are using a pointer rather than
an integer when we use NULL in kernel. But in userspace, using NULL is
for portability of the program, although most (*just* most, NOT all)
definitions of NULL are ((void*)0). ;-)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RSDL for 2.6.21-rc3- 0.29

2007-03-11 Thread Gene Heskett
On Sunday 11 March 2007, Con Kolivas wrote:
On Sunday 11 March 2007 15:03, Matt Mackall wrote:
 On Sat, Mar 10, 2007 at 10:01:32PM -0600, Matt Mackall wrote:
  On Sun, Mar 11, 2007 at 01:28:22PM +1100, Con Kolivas wrote:
   Ok I don't think there's any actual accounting problem here per se
   (although I did just recently post a bugfix for rsdl however I
   think that's unrelated). What I think is going on in the ccache
   testcase is that all the work is being offloaded to kernel threads
   reading/writing to/from the filesystem and the make is not getting
   any actual cpu time.
 
  I don't see significant system time while this is happening.

 Also, it's running pretty much entirely out of page cache so there
 wouldn't be a whole lot for kernel threads to do.

Well I can't reproduce that behaviour here at all whether from disk or
 the pagecache with ccache, so I'm not entirely sure what's different at
 your end. However both you and the other person reporting bad behaviour
 were using ATI drivers. That's about the only commonality? I wonder if
 they do need to yield... somewhat instead of not at all.

I hate to say it Con, but this one seems to have broken the amanda-tar 
symbiosis.

I haven't tried a plain 21-rc3, so the problem may exist there, and in 
fact it did for 21-rc1, but I don't recall if it was true for -rc2.  But 
I will have a plain 21-rc3 running by tomorrow night's amanda run to test.

What happens is that when amanda tells tar to do a level 1 or 2, tar still 
thinks it's doing a level 0.  The net result is that the tape is filled 
completely and amanda does an EOT exit in about 10 of my 42 dle's.  This 
is tar-1.15-1 for fedora core 6.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
While it may be true that a watched pot never boils, the one you don't
keep an eye on can make an awful mess of your stove.
-- Edward Stevenson


Re: Style Question

2007-03-11 Thread Jan Engelhardt

On Mar 12 2007 13:37, Cong WANG wrote:

 The following code is picked from drivers/kvm/kvm_main.c:

 static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
 {
 	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];

 	mutex_lock(&vcpu->mutex);
 	if (unlikely(!vcpu->vmcs)) {
 		mutex_unlock(&vcpu->mutex);
 		return 0;
 	}
 	return kvm_arch_ops->vcpu_load(vcpu);
 }

 Obviously, it uses 0 rather than NULL when returning a pointer to
 indicate an error. Should we fix such issues?

Indeed. If it were up to me, something like that should throw a compile error.

[...]
 I think it's clearer to indicate we are using a pointer rather than
 an integer when we use NULL in kernel. But in userspace, using NULL is
 for portability of the program, although most (*just* most, NOT all)
 definitions of NULL are ((void*)0). ;-)

NULL has the same bit pattern as the number zero. (I'm not saying the bit
pattern is all zeroes. And I am not even sure if NULL ought to have the same
pattern as zero.) So C++ could use (void *)0, if it would let itself :p





Jan
-- 


Re: RSDL for 2.6.21-rc3- 0.29

2007-03-11 Thread Con Kolivas
Hi Gene.

On Monday 12 March 2007 16:38, Gene Heskett wrote:
 I hate to say it Con, but this one seems to have broken the amanda-tar
 symbiosis.

 I haven't tried a plain 21-rc3, so the problem may exist there, and in
 fact it did for 21-rc1, but I don't recall if it was true for -rc2.  But
 I will have a plain 21-rc3 running by tomorrow night's amanda run to test.

 What happens is that when amanda tells tar to do a level 1 or 2, tar still
 thinks it's doing a level 0.  The net result is that the tape is filled
 completely and amanda does an EOT exit in about 10 of my 42 dle's.  This
 is tar-1.15-1 for fedora core 6.

I'm sorry but I have to say I have no idea what any of this means. I gather 
you're making an association between some application combination failing and 
RSDL cpu scheduler. Unfortunately the details of what the problem is, or how 
the cpu scheduler is responsible, escape me :(

-- 
-ck


Re: Style Question

2007-03-11 Thread Nicholas Miell
On Mon, 2007-03-12 at 06:40 +0100, Jan Engelhardt wrote:
 On Mar 12 2007 13:37, Cong WANG wrote:
 
  The following code is picked from drivers/kvm/kvm_main.c:
 
  static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
  {
  	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
 
  	mutex_lock(&vcpu->mutex);
  	if (unlikely(!vcpu->vmcs)) {
  		mutex_unlock(&vcpu->mutex);
  		return 0;
  	}
  	return kvm_arch_ops->vcpu_load(vcpu);
  }
 
  Obviously, it uses 0 rather than NULL when returning a pointer to
  indicate an error. Should we fix such issues?
 
 Indeed. If it were up to me, something like that should throw a compile error.
 
 [...]
  I think it's clearer to indicate we are using a pointer rather than
  an integer when we use NULL in kernel. But in userspace, using NULL is
  for portability of the program, although most (*just* most, NOT all)
  definitions of NULL are ((void*)0). ;-)
 
 NULL has the same bit pattern as the number zero. (I'm not saying the bit
 pattern is all zeroes. And I am not even sure if NULL ought to have the same
 pattern as zero.) So C++ could use (void *)0, if it would let itself :p

Not necessarily. You can use 0 at the source level, but the compiler has
to convert it to the actual NULL pointer bit pattern, whatever it may
be.

In C++, NULL is typically defined as 0 (with no void* cast) by most
compilers because 0 (and only 0) can be implicitly converted to a null
pointer of any pointer type without a cast.

GCC introduced the __null extension so that NULL still works correctly
in C++ when passed to a varargs function on 64-bit platforms.

(This just works in C because C makes NULL ((void*)0), which is thus the
right size. In C++, the 0 ends up being an int instead of a pointer when
passed to a varargs function, and things tend to blow up when they read
the garbage high bits. Of course, nobody else does this, so you still
have to use (void*)NULL to be portable.)
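
The varargs point can be illustrated with a small C sketch. It is not taken from the thread; `count_strings` is a hypothetical helper that walks its arguments until it reads a null pointer, which is why the sentinel has to reach it with pointer width rather than as a plain int 0.

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Count the strings passed before a null-pointer sentinel.
 * Each variadic argument is read back as a char pointer, so the
 * caller must pass the terminator as a pointer, not as the int 0.
 */
static int count_strings(const char *first, ...)
{
	va_list ap;
	int n = 0;

	va_start(ap, first);
	for (const char *s = first; s != NULL; s = va_arg(ap, char *))
		n++;
	va_end(ap);
	return n;
}
```

A call such as count_strings("a", "b", (void *)NULL) is well defined because the sentinel is passed as a pointer. With a bare 0 in the variadic position, an int would be passed where the callee reads a pointer, which is the failure mode described above for 64-bit platforms.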

-- 
Nicholas Miell [EMAIL PROTECTED]



Re: PROBLEM: Make nenuconfig does not save parameters.

2007-03-11 Thread Cyrill Gorcunov

On 3/11/07, Sam Ravnborg [EMAIL PROTECTED] wrote:
[..snip..]
|  To make the conversion we should consider renaming from
|  current Load alternate to Open config file...
|  and likewise Save alternate to Save config file as...
| 
|  Comments?
| 
| Sam
[..snip...]

I think that is excellent. (Actually I can't test it now but the idea
is just perfect)


Re: Style Question

2007-03-11 Thread Randy.Dunlap
On Mon, 12 Mar 2007, Jan Engelhardt wrote:


 On Mar 12 2007 13:37, Cong WANG wrote:
 
  The following code is picked from drivers/kvm/kvm_main.c:
 
  static struct kvm_vcpu *vcpu_load(struct kvm *kvm, int vcpu_slot)
  {
  	struct kvm_vcpu *vcpu = &kvm->vcpus[vcpu_slot];
 
  	mutex_lock(&vcpu->mutex);
  	if (unlikely(!vcpu->vmcs)) {
  		mutex_unlock(&vcpu->mutex);
  		return 0;
  	}
  	return kvm_arch_ops->vcpu_load(vcpu);
  }
 
  Obviously, it uses 0 rather than NULL when returning a pointer to
  indicate an error. Should we fix such issues?

 Indeed. If it were up to me, something like that should throw a compile error.

At least it does throw a sparse warning, and yes, it should
be fixed.

 [...]
  I think it's clearer to indicate we are using a pointer rather than
  an integer when we use NULL in kernel. But in userspace, using NULL is
  for portability of the program, although most (*just* most, NOT all)
  definitions of NULL are ((void*)0). ;-)

 NULL has the same bit pattern as the number zero. (I'm not saying the bit
 pattern is all zeroes. And I am not even sure if NULL ought to have the same
 pattern as zero.) So C++ could use (void *)0, if it would let itself :p


 
 

 Jan


-- 
~Randy


Re: RSDL for 2.6.21-rc3- 0.29

2007-03-11 Thread Gene Heskett
On Monday 12 March 2007, Con Kolivas wrote:
Hi Gene.

On Monday 12 March 2007 16:38, Gene Heskett wrote:
 I hate to say it Con, but this one seems to have broken the amanda-tar
 symbiosis.

 I haven't tried a plain 21-rc3, so the problem may exist there, and in
 fact it did for 21-rc1, but I don't recall if it was true for -rc2. 
 But I will have a plain 21-rc3 running by tomorrow night's amanda run
 to test.

 What happens is that when amanda tells tar to do a level 1 or 2, tar
 still thinks it's doing a level 0.  The net result is that the tape is
 filled completely and amanda does an EOT exit in about 10 of my 42
 dle's.  This is tar-1.15-1 for fedora core 6.

I'm sorry but I have to say I have no idea what any of this means. I
 gather you're making an association between some application
 combination failing and RSDL cpu scheduler. Unfortunately the details
 of what the problem is, or how the cpu scheduler is responsible, escape
 me :(

I have another backup running right now, after building a plain 
2.6.21-rc3, and rebooting just now for the test.  I don't think it's the 
scheduler itself, but something post 2.6.20 that is messing with tar's 
mind and making it think the files it just read to do the estimate phase 
are all new, so even a level 2 is in effect a level 0.  I'll have an 
answer in about an hour, but it's also 2:36am here and I'm headed for the 
rack to get some zzz's.  So I'll report in the morning as to whether or 
not this backup ran as it was supposed to.  I have a feeling it's not 
going to though.


-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
When it comes to humility, I'm the greatest.
-- Bullwinkle Moose



[PATCH]Replace 0 with NULL when returning a pointer

2007-03-11 Thread Cong WANG

Use NULL to indicate we are returning a pointer rather than an integer
and to eliminate some sparse warnings.

Signed-off-by: Cong WANG [EMAIL PROTECTED]

---
--- drivers/kvm/kvm_main.c.orig 2007-03-11 21:41:23.0 +0800
+++ drivers/kvm/kvm_main.c  2007-03-12 14:26:17.0 +0800
@@ -205,7 +205,7 @@ static struct kvm_vcpu *vcpu_load(struct
 	mutex_lock(&vcpu->mutex);
 	if (unlikely(!vcpu->vmcs)) {
 		mutex_unlock(&vcpu->mutex);
-		return 0;
+		return NULL;
 	}
 	return kvm_arch_ops->vcpu_load(vcpu);
 }
@@ -799,7 +799,7 @@ struct kvm_memory_slot *gfn_to_memslot(s
 		    && gfn < memslot->base_gfn + memslot->npages)
 			return memslot;
 	}
-	return 0;
+	return NULL;
 }
 EXPORT_SYMBOL_GPL(gfn_to_memslot);


[PATCH]Replace 0 with NULL when returning a pointer

2007-03-11 Thread Cong WANG

Use NULL to indicate we are returning a pointer rather than an integer
and to eliminate some sparse warnings.

Signed-off-by: Cong WANG [EMAIL PROTECTED]
---
--- drivers/kvm/vmx.c.orig  2007-03-11 21:41:03.0 +0800
+++ drivers/kvm/vmx.c   2007-03-12 14:25:11.0 +0800
@@ -98,7 +98,7 @@ static struct vmx_msr_entry *find_msr_en
 	for (i = 0; i < vcpu->nmsrs; ++i)
 		if (vcpu->guest_msrs[i].index == msr)
 			return &vcpu->guest_msrs[i];
-	return 0;
+	return NULL;
 }
 
 static void vmcs_clear(struct vmcs *vmcs)


Re: [patch 3/8] per backing_dev dirty and writeback page accounting

2007-03-11 Thread David Chinner
On Tue, Mar 06, 2007 at 07:04:46PM +0100, Miklos Szeredi wrote:
 From: Andrew Morton [EMAIL PROTECTED]
 
 [EMAIL PROTECTED]: bugfix]
 
 Miklos Szeredi [EMAIL PROTECTED]:
 
 Changes:
  - updated to apply after clear_page_dirty_for_io() race fix
 
 This is needed for
 
  - balance_dirty_pages() deadlock fix
  - fuse dirty page accounting
 
 I have no idea how serious the scalability problems with this are.  If
 they are serious, different solutions can probably be found for the
 above, but this is certainly the simplest.

Atomic operations on a single per-backing-device counter from all CPUs at once?
That's a pretty serious scalability issue and it will cause a major
performance regression for XFS.

I'd call this a showstopper right now - maybe you need to look at
something like the ZVC code that Christoph Lameter wrote, perhaps?
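
One common way to avoid every CPU hammering a single shared counter is to keep small per-CPU deltas and fold them into the shared total only when they cross a threshold. The sketch below illustrates that general idea only; it is not the ZVC implementation, it is single-threaded, and `NR_CPUS`, `THRESHOLD` and the `batched_counter` names are assumptions chosen for the example.

```c
#include <stddef.h>

#define NR_CPUS   4
#define THRESHOLD 32	/* hypothetical batch size */

struct batched_counter {
	long global;		/* shared total: the contended cache line */
	long delta[NR_CPUS];	/* per-CPU pending updates, touched locally */
};

/* Add `n` on behalf of `cpu`; fold into the global only past the threshold. */
static void counter_add(struct batched_counter *c, int cpu, long n)
{
	long d = c->delta[cpu] + n;

	if (d >= THRESHOLD || d <= -THRESHOLD) {
		c->global += d;	/* in real code this would be one atomic add */
		d = 0;
	}
	c->delta[cpu] = d;
}

/* Cheap read that may lag behind by up to NR_CPUS * THRESHOLD. */
static long counter_read_approx(const struct batched_counter *c)
{
	return c->global;
}

/* Exact read: global plus all outstanding per-CPU deltas. */
static long counter_read_exact(const struct batched_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum;
}
```

The trade-off is that the shared value is only approximately current, so a user such as a dirty-page limit check has to tolerate that slack or pay for the exact sum.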

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Radoslaw Szkodzinski

On 3/11/07, Gene Heskett [EMAIL PROTECTED] wrote:

On Sunday 11 March 2007, Mike Galbraith wrote:

Just to comment, I've been running one of the patches between 20-ck1 and
this latest one, which is building as I type, but I also run gkrellm
here, version 2.2.9.

Since I have been running this mid-series patch, something has been
killing gkrellm about once a day, and there is nothing in the logs to
indicate a problem.  I see a blink out of the corner of my eye, and it's
gone.  And it always starts right back up from a kmenu click.

No idea if anyone else is experiencing this or not.

--
Cheers, Gene


I've had such an issue with 0.20 or something. Sometimes, the
xfce4-panel would disappear (die) when I displayed its menu.
Very rare issue.

Doesn't happen with 0.28 anyway. :-) Which looks really good, though
I'll update to 0.30.


Re: [PATCH] two more device ids for dm9601 usbnet driver

2007-03-11 Thread Peter Korsgaard
 Jon == Jon Dowland [EMAIL PROTECTED] writes:

Hi,

 Jon This patch for the linux-usb-devel tree adds two more
 Jon product ids to the dm9601 driver. These ids were found on
 Jon rebadged dm9601 devices in the wild.

 Jon Signed-off-by: Jon Dowland [EMAIL PROTECTED]

Acked-by: Peter Korsgaard [EMAIL PROTECTED]

-- 
Bye, Peter Korsgaard


Re: 2.6.21-rc2-mm2: drivers/net/wireless/libertas/debugfs.c addr bogosity

2007-03-11 Thread Tony Breeds
On Fri, Mar 09, 2007 at 09:14:29AM -0800, Randy Dunlap wrote:
 
 Good to use FIELD_SIZEOF(),

Thanks.

 but in general, we prefer to use it
 directly, not in yet another wrapper.

I left the item_{size,addr} in place as it seemed to make the item[]
more compact.

I'm not certain using the FIELD_SIZEOF() macro directly is a win.

From: Tony Breeds [EMAIL PROTECTED]

Clean up drivers/net/wireless/libertas/debugfs.c to use standard kernel macros 
and functions.

Signed-off-by: Tony Breeds [EMAIL PROTECTED]

---
only compile tested on x86

 drivers/net/wireless/libertas/debugfs.c |   56 +++
 1 files changed, 12 insertions(+), 44 deletions(-)

diff --git a/drivers/net/wireless/libertas/debugfs.c b/drivers/net/wireless/libertas/debugfs.c
index 3ad1e03..8b0e3ec 100644
--- a/drivers/net/wireless/libertas/debugfs.c
+++ b/drivers/net/wireless/libertas/debugfs.c
@@ -1771,58 +1771,26 @@ void libertas_debugfs_remove_one(wlan_private *priv)
 }
 
 /* debug entry */
-
-#define item_size(n) (sizeof ((wlan_adapter *)0)->n)
-#define item_addr(n) ((u32) &((wlan_adapter *)0)->n)
-
 struct debug_data {
 	char name[32];
 	u32 size;
 	u32 addr;
 };
 
-/* To debug any member of wlan_adapter, simply add one line here.
- */
+/* To debug any member of wlan_adapter, simply add a record here. */
 static struct debug_data items[] = {
-	{"intcounter", item_size(intcounter), item_addr(intcounter)},
-	{"psmode", item_size(psmode), item_addr(psmode)},
-	{"psstate", item_size(psstate), item_addr(psstate)},
+	{ .name = "intcounter",
+	  .size = FIELD_SIZEOF(wlan_adapter, intcounter),
+	  .addr = offsetof(wlan_adapter, intcounter) },
+	{ .name = "psmode",
+	  .size = FIELD_SIZEOF(wlan_adapter, psmode),
+	  .addr = offsetof(wlan_adapter, psmode) },
+	{ .name = "psstate",
+	  .size = FIELD_SIZEOF(wlan_adapter, psstate),
+	  .addr = offsetof(wlan_adapter, psstate) },
 };
 
-static int num_of_items = sizeof(items) / sizeof(items[0]);
-
-/**
- *  @brief convert string to number
- *
- *  @param s  pointer to numbered string
- *  @return   converted number from string s
- */
-static int string_to_number(char *s)
-{
-	int r = 0;
-	int base = 0;
-
-	if ((strncmp(s, "0x", 2) == 0) || (strncmp(s, "0X", 2) == 0))
-		base = 16;
-	else
-		base = 10;
-
-	if (base == 16)
-		s += 2;
-
-	for (s = s; *s != 0; s++) {
-		if ((*s >= 48) && (*s <= 57))
-			r = (r * base) + (*s - 48);
-		else if ((*s >= 65) && (*s <= 70))
-			r = (r * base) + (*s - 55);
-		else if ((*s >= 97) && (*s <= 102))
-			r = (r * base) + (*s - 87);
-		else
-			break;
-	}
-
-	return r;
-}
+static int num_of_items = ARRAY_SIZE(items);
 
 /**
  *  @brief proc read function
@@ -1912,7 +1880,7 @@ static int wlan_debugfs_write(struct file *f, const char __user *buf,
 		if (!p2)
 			break;
 		p2++;
-		r = string_to_number(p2);
+		r = simple_strtoul(p2, NULL, 0);
 		if (d[i].size == 1)
 			*((u8 *) d[i].addr) = (u8) r;
 		else if (d[i].size == 2)

Yours Tony
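
For readers without the kernel tree at hand, the table-driven idea in the patch can be tried in userspace. This is a rough sketch under assumptions: `struct adapter` and its field types are invented stand-ins for wlan_adapter, `FIELD_SIZEOF` and `ARRAY_SIZE` are redefined locally to mirror the kernel macros, `set_field` is a hypothetical helper, and `strtoul` with base 0 plays the role of simple_strtoul.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel macros of the same names. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))
#define ARRAY_SIZE(a)      (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for wlan_adapter; the field types are assumptions. */
struct adapter {
	uint32_t intcounter;
	uint16_t psmode;
	uint8_t  psstate;
};

struct debug_data {
	const char *name;
	size_t size;	/* size of the member in bytes */
	size_t offset;	/* byte offset of the member in struct adapter */
};

/* To expose another member, add one record here. */
static const struct debug_data items[] = {
	{ "intcounter", FIELD_SIZEOF(struct adapter, intcounter),
	  offsetof(struct adapter, intcounter) },
	{ "psmode", FIELD_SIZEOF(struct adapter, psmode),
	  offsetof(struct adapter, psmode) },
	{ "psstate", FIELD_SIZEOF(struct adapter, psstate),
	  offsetof(struct adapter, psstate) },
};

/*
 * Parse `value` (decimal, 0x-prefixed hex, or leading-0 octal) and store
 * it in the named member. Returns 0 on success, -1 for an unknown name.
 */
static int set_field(struct adapter *a, const char *name, const char *value)
{
	unsigned long r = strtoul(value, NULL, 0);

	for (size_t i = 0; i < ARRAY_SIZE(items); i++) {
		if (strcmp(items[i].name, name) != 0)
			continue;
		unsigned char *p = (unsigned char *)a + items[i].offset;
		if (items[i].size == 1) {
			uint8_t v = (uint8_t)r;
			memcpy(p, &v, sizeof(v));
		} else if (items[i].size == 2) {
			uint16_t v = (uint16_t)r;
			memcpy(p, &v, sizeof(v));
		} else {
			uint32_t v = (uint32_t)r;
			memcpy(p, &v, sizeof(v));
		}
		return 0;
	}
	return -1;
}
```

One behavioural difference worth noting: base 0 also treats a leading "0" as octal, which the removed string_to_number() did not do.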

  linux.conf.au        http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!



Re: [git patches] libata fixes

2007-03-11 Thread Tejun Heo
Hello, Linus.

Linus Torvalds wrote:
 On Sun, 11 Mar 2007, Paul Rolland wrote:
 My machine is having two problems: the one you are describing above,
 which is due to a SIL controller being connected to one port of the ICH7
 (at least, it seems to), and probing it times out, but nothing is
 connected to it.
 
 Ok, so that's just a message irritation, not actually bothersome 
 otherwise?

It involves a long timeout, so it's bothersome.  This is caused by
Silicon Image 4726/3726 storage processor (SATA Port Multiplier with
extra features) attached to one of the ICH ports.

If the first downstream port in the PMP is empty and it gets reset in a
non-PMP way, it identifies itself as a Config Disk of quite small size.
It's probably used to configure the extra features using standard ATA
RW commands.  Anyways, this Config Disk is a bit peculiar and doesn't
work very well with the current ATA reset sequence and gets identified
only after a few failures, thus causing a long timeout.

I keep forgetting about this.  I'll ask SIMG how to deal with this.  For
the time being, connecting a device to the PMP port should remove the
timeouts.

 The second problem is a Jmicron363 controller that is failing to detect
 the DVD-RW that is connected, unless I use the irqpoll option as Tejun has
 suggested.
 
 .. and this one has never worked without irqpoll?
 
 But, as you suggest it, I'm adding pci=nomsi to the command line
 rebooting... no change for this part of the problem.

 OK, the /proc/interrupt for this config, and the dmesg attached.

 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts 
             CPU0       CPU1
    0:     297549          0   IO-APIC-edge      timer
    1:          7          0   IO-APIC-edge      i8042
    4:         13          0   IO-APIC-edge      serial
    6:          5          0   IO-APIC-edge      floppy
    8:          1          0   IO-APIC-edge      rtc
    9:          0          0   IO-APIC-fasteoi   acpi
   12:        126          0   IO-APIC-edge      i8042
   14:       8313          0   IO-APIC-edge      libata
   15:          0          0   IO-APIC-edge      libata
   16:          0          0   IO-APIC-fasteoi   eth1, libata
 
 So it's the irq16 one that is the Jmicron controller and just isn't 
 getting any interrupts?
 
 Since all the other interrupts work (and MSI worked for other 
 controllers), I don't think it's interrupt-routing related. Especially as 
 MSI shouldn't even care about things like that.
 
 And since it all works when irqpoll is used, that implies that the 
 *only* thing that is broken is literally irq delivery.
 
 Is there possibly some jmicron-specific enable interrupts bit? 

(cc'ing Justin of JMicron.  Hello, please correct me if I'm wrong.)

Not that I know of.  The PATA portion of JMB controllers is bog standard
PCI BMDMA ATA device where ATA_NIEN is the way to turn IRQ on and off.

Thanks.

-- 
tejun


Re: [git patches] libata fixes

2007-03-11 Thread Tejun Heo
Of course I forgot to CC.  :-)  Quoting whole message for Justin.

Tejun Heo wrote:
 Hello, Linus.
 
 Linus Torvalds wrote:
 On Sun, 11 Mar 2007, Paul Rolland wrote:
 My machine is having two problems: the one you are describing above,
 which is due to a SIL controller being connected to one port of the ICH7
 (at least, it seems to), and probing it times out, but nothing is
 connected to it.
 Ok, so that's just a message irritation, not actually bothersome 
 otherwise?
 
 It involves a long timeout, so it's bothersome.  This is caused by
 Silicon Image 4726/3726 storage processor (SATA Port Multiplier with
 extra features) attached to one of the ICH ports.
 
 If the first downstream port in the PMP is empty and it gets reset in a
 non-PMP way, it identifies itself as a Config Disk of quite small size.
 It's probably used to configure the extra features using standard ATA
 RW commands.  Anyways, this Config Disk is a bit peculiar and doesn't
 work very well with the current ATA reset sequence and gets identified
 only after a few failures, thus causing a long timeout.
 
 I keep forgetting about this.  I'll ask SIMG how to deal with this.  For
 the time being, connecting a device to the PMP port should remove the
 timeouts.
 
 The second problem is a Jmicron363 controller that is failing to detect
 the DVD-RW that is connected, unless I use the irqpoll option as Tejun has
 suggested.
 .. and this one has never worked without irqpoll?

 But, as you suggest it, I'm adding pci=nomsi to the command line
 rebooting... no change for this part of the problem.

 OK, the /proc/interrupt for this config, and the dmesg attached.

 3 [23:22] [EMAIL PROTECTED]:~ cat /proc/interrupts 
             CPU0       CPU1
    0:     297549          0   IO-APIC-edge      timer
    1:          7          0   IO-APIC-edge      i8042
    4:         13          0   IO-APIC-edge      serial
    6:          5          0   IO-APIC-edge      floppy
    8:          1          0   IO-APIC-edge      rtc
    9:          0          0   IO-APIC-fasteoi   acpi
   12:        126          0   IO-APIC-edge      i8042
   14:       8313          0   IO-APIC-edge      libata
   15:          0          0   IO-APIC-edge      libata
   16:          0          0   IO-APIC-fasteoi   eth1, libata
 So it's the irq16 one that is the Jmicron controller and just isn't 
 getting any interrupts?

 Since all the other interrupts work (and MSI worked for other 
 controllers), I don't think it's interrupt-routing related. Especially as 
 MSI shouldn't even care about things like that.

 And since it all works when irqpoll is used, that implies that the 
 *only* thing that is broken is literally irq delivery.

 Is there possibly some jmicron-specific enable interrupts bit? 
 
 (cc'ing Justin of JMicron.  Hello, please correct me if I'm wrong.)
 
 Not that I know of.  The PATA portion of JMB controllers is bog standard
 PCI BMDMA ATA device where ATA_NIEN is the way to turn IRQ on and off.
 
 Thanks.
 


-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata extension

2007-03-11 Thread Robert Hancock

Vitaliyi wrote:

Good Day

Say I want to implement an extended set of ATA commands available to
userspace for building diagnostic tools.
I need 0x40 (read verify) and 0x32 (write long) with error handling,
for example. I tried the IDE driver through ioctls, but it seems to
lack functionality and is full of gotchas. Furthermore, it sometimes
oopses.

Is it possible to use libata for this purpose, or do I need to write
a separate IDE driver?
By the way, I'm sure it should be done in kernel space, since I'm going
to deal with some HDD manufacturer commands.

P.S. I was looking through the libata and IDE sources and documentation
but still don't have the broad picture.


I believe you should be able to do this by sending ATA pass-through SCSI 
commands into the device using SG_IO, without any kernel changes. It's 
really the mechanism that's meant for this..
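
A minimal sketch of that approach, assuming a Linux host with the sg headers installed: it builds a SCSI ATA PASS-THROUGH (16) CDB for the 28-bit READ VERIFY SECTOR(S) command (0x40) and hands it to the SG_IO ioctl. The function names build_read_verify_cdb() and send_read_verify() are illustrative, not taken from this thread, and the CDB byte layout follows my reading of the SAT specification; it should be checked against the spec before being used on real hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define ATA_PT16_OPCODE   0x85  /* SCSI ATA PASS-THROUGH (16) */
#define ATA_PROTO_NONDATA 3     /* protocol field: non-data command */
#define ATA_CMD_VERIFY    0x40  /* READ VERIFY SECTOR(S), 28-bit LBA */

/* Fill a 16-byte CDB that asks the device to verify `count` sectors at `lba`. */
void build_read_verify_cdb(unsigned char cdb[16], uint32_t lba, uint8_t count)
{
    assert((lba >> 28) == 0);            /* 28-bit command: LBA must fit */

    memset(cdb, 0, 16);
    cdb[0]  = ATA_PT16_OPCODE;
    cdb[1]  = ATA_PROTO_NONDATA << 1;    /* protocol in bits 4:1, extend = 0 */
    cdb[2]  = 0x20;                      /* CK_COND: return ATA registers in sense data */
    cdb[6]  = count;                     /* sector count (7:0) */
    cdb[8]  = lba & 0xff;                /* LBA low */
    cdb[10] = (lba >> 8) & 0xff;         /* LBA mid */
    cdb[12] = (lba >> 16) & 0xff;        /* LBA high */
    cdb[13] = 0x40 | ((lba >> 24) & 0x0f); /* device: LBA mode + LBA bits 27:24 */
    cdb[14] = ATA_CMD_VERIFY;
}

/* Send the command through SG_IO; returns the ioctl() result (0 on success). */
int send_read_verify(int fd, uint32_t lba, uint8_t count,
                     unsigned char *sense, unsigned char sense_len)
{
    unsigned char cdb[16];
    sg_io_hdr_t hdr;

    build_read_verify_cdb(cdb, lba, count);

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id    = 'S';
    hdr.dxfer_direction = SG_DXFER_NONE; /* non-data command: no buffer */
    hdr.cmdp            = cdb;
    hdr.cmd_len         = sizeof(cdb);
    hdr.sbp             = sense;
    hdr.mx_sb_len       = sense_len;
    hdr.timeout         = 10000;         /* milliseconds */

    return ioctl(fd, SG_IO, &hdr);
}
```

A caller would open the disk node read-write, pass the descriptor to send_read_verify(), and decode the ATA status from the returned sense data. Vendor-specific commands would need their own protocol and direction fields.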


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)

2007-03-11 Thread Andrew Morton
> On Sat, 10 Mar 2007 02:31:35 -0200 Mauro Carvalho Chehab <[EMAIL PROTECTED]> 
> wrote:
> From: Trent Piepho <[EMAIL PROTECTED]>
> 
> When a module uses symbol_get() to increase the ref count of another
> module, there is no record what module called symbol_get().  A module can
> show up as having other users, but there is no way to tell who those
> users are.
> 
> This adds that ability to symbol_put() and symbol_get().

One day I'll write a script which unwordwraps patches and then you'll all
need to find new ways of torturing me.

This patch needed rather a lot of help in the coding-style department. 
Hopefully Rusty can comment on the content, because I'm all exhausted from
cleaning it up.



Re: [2/6] 2.6.21-rc2: known regressions

2007-03-11 Thread Ingo Molnar

* Pavel Machek <[EMAIL PROTECTED]> wrote:

> > Probably tweaking the webpage doesnt help because people dont get 
> > there - as the results plainly show it. Maybe some more automation 
> > would be useful too, a tool that detects failed resume and tries all 
> > those options that makes sense on that box or something? It's not 
> > like that
> 
> Unfortunately, these tend to crash the box when you pass wrong 
> options, and I do not see easy way to test "can user see whats on 
> display" automatically.

you could perhaps try what X's modesetting utility does: display a 
dialog box that times out if it does not get clicked on, and reboot if 
it did not get clicked on. Likewise, detect upon the next bootup that a 
suspend-test was in progress (and didnt get back via normal resume), via 
some temporary file. That way both the 'did not resume and i had to 
power-cycle' and the 'resume did not restore my X' problems can be 
handled.

Finally, when the correct options have been established (worst case with 
a small number of reboots and "yes, indeed the resume did not work fine" 
clicks done upon bootup by the user), automatically fill in a webform in 
firefox and ask the user to do a single click to submit that form.

techniques like that have more chance i think to get Linux 
suspend/resume anywhere near to working. The current 'rely on the 
developer' technique apparently does not work.

Ingo


Re: [RFC][PATCH 6/7] Account for the number of tasks within container

2007-03-11 Thread Pavel Emelianov
Paul Menage wrote:
> On 3/6/07, Pavel Emelianov <[EMAIL PROTECTED]> wrote:
>> The idea is:
>>
>> Task may be "the entity that allocates the resources" and "the
>> entity that is a resource allocated".
>>
>> When task is the first entity it may move across containers
>> (that is implemented in your patches). When task is a resource
>> it shouldn't move across containers like files or pages do.
>>
>> More generally - allocated resources hold reference to original
>> container till they die. No resource migration is performed.
>>
>> Did I express my idea clearly?
> 
> Yes, but I disagree with the premise. The title of your patch is
> "Account for the number of tasks within container", but that's not
> what the subsystem does, it accounts for the number of forks within
> the container that aren't directly accompanied by an exit.
> 
> Ideally, resources like files and pages would be able to follow tasks
> as well. The reason that files and pages aren't easily migrated from
> one container to another is that there could be sharing involved;
> figuring out the sharing can be expensive, and it's not clear what to
> do if two users are in different containers.
> 
> But in the case of a task count, there are no such issues with
> sharing, so it seems to me to be more sensible (and more efficient) to
> just limit the number of tasks in a container.
> 
> i.e. when moving a task into a container or forking a task within a
> container, increment the count; when moving a task out of a container
> or when it exits, decrement the count.

Sounds reasonable.
I'll take this into account when I make the next iteration.
Thanks.
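
The counting rule agreed on above can be sketched as follows. This is a user-space illustration under assumed names (struct task_counter, task_counter_charge(), task_counter_move()), not code from either patch set, and it leaves out the locking a kernel implementation would need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-container task counter. */
struct task_counter {
    unsigned long usage;   /* tasks currently in the container */
    unsigned long limit;   /* maximum number of tasks allowed */
};

/* Called on fork inside the container, or when a task moves in. */
int task_counter_charge(struct task_counter *c)
{
    assert(c != NULL);
    if (c->usage >= c->limit)
        return -EAGAIN;
    c->usage++;
    return 0;
}

/* Called on exit, or when a task moves out. */
void task_counter_uncharge(struct task_counter *c)
{
    assert(c != NULL);
    if (c->usage > 0)
        c->usage--;
}

/* Move one task: charge the destination first, then release the source. */
int task_counter_move(struct task_counter *from, struct task_counter *to)
{
    int ret = task_counter_charge(to);

    if (ret)
        return ret;
    task_counter_uncharge(from);
    return 0;
}
```

With this rule the charge follows the task, so in the scenario quoted below (container A limited to one task, process P moved in from B) a subsequent fork by P is refused.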

> With your approach, if you were to set the task limit of an empty
> container A to 1, and then move a process P from B into A, P would be
> able to fork a new child, since the "task count" would be 0 (as P was
> being charged to B still). Surely the fact that there's 1 process in A
> should prevent P from forking?
> 
> Paul
> 



Re: [RFC][PATCH 1/7] Resource counters

2007-03-11 Thread Pavel Emelianov
Herbert Poetzl wrote:
> On Wed, Mar 07, 2007 at 10:19:05AM +0300, Pavel Emelianov wrote:
>> Balbir Singh wrote:
>>> Pavel Emelianov wrote:
 Introduce generic structures and routines for
 resource accounting.

 Each resource accounting container is supposed to
 aggregate it, container_subsystem_state and its
 resource-specific members within.


 

 diff -upr linux-2.6.20.orig/include/linux/res_counter.h
 linux-2.6.20-0/include/linux/res_counter.h
 --- linux-2.6.20.orig/include/linux/res_counter.h2007-03-06
 13:39:17.0 +0300
 +++ linux-2.6.20-0/include/linux/res_counter.h2007-03-06
 13:33:28.0 +0300
 @@ -0,0 +1,83 @@
 +#ifndef __RES_COUNTER_H__
 +#define __RES_COUNTER_H__
 +/*
 + * resource counters
 + *
 + * Copyright 2007 OpenVZ SWsoft Inc
 + *
 + * Author: Pavel Emelianov <[EMAIL PROTECTED]>
 + *
 + */
 +
 +#include 
 +
 +struct res_counter {
 +unsigned long usage;
 +unsigned long limit;
 +unsigned long failcnt;
 +spinlock_t lock;
 +};
 +
 +enum {
 +RES_USAGE,
 +RES_LIMIT,
 +RES_FAILCNT,
 +};
 +
 +ssize_t res_counter_read(struct res_counter *cnt, int member,
 +const char __user *buf, size_t nbytes, loff_t *pos);
 +ssize_t res_counter_write(struct res_counter *cnt, int member,
 +const char __user *buf, size_t nbytes, loff_t *pos);
 +
 +static inline void res_counter_init(struct res_counter *cnt)
 +{
 +spin_lock_init(&cnt->lock);
 +cnt->limit = (unsigned long)LONG_MAX;
 +}
 +
>>> Is there any way to indicate that there are no limits on this container.
>> Yes - LONG_MAX is essentially a "no limit" value as no
>> container will ever have so many files :)
> 
> -1 or ~0 is a viable choice for userspace to
> communicate 'infinite' or 'unlimited'

OK, I'll make ULONG_MAX :)

>>> LONG_MAX is quite huge, but still when the administrator wants to
>>> configure a container to *un-limited usage*, it becomes hard for
>>> the administrator.
>>>
 +static inline int res_counter_charge_locked(struct res_counter *cnt,
 +unsigned long val)
 +{
 +if (cnt->usage <= cnt->limit - val) {
 +cnt->usage += val;
 +return 0;
 +}
 +
 +cnt->failcnt++;
 +return -ENOMEM;
 +}
 +
 +static inline int res_counter_charge(struct res_counter *cnt,
 +unsigned long val)
 +{
 +int ret;
 +unsigned long flags;
 +
 +spin_lock_irqsave(&cnt->lock, flags);
 +ret = res_counter_charge_locked(cnt, val);
 +spin_unlock_irqrestore(&cnt->lock, flags);
 +return ret;
 +}
 +
>>> Will atomic counters help here.
>> I'm afraid no. We have to atomically check for limit and alter
>> one of usage or failcnt depending on the checking result. Making
>> this with atomic_xxx ops will require at least two ops.
> 
> Linux-VServer does the accounting with atomic counters,
> so that works quite fine, just do the checks at the
> beginning of whatever resource allocation and the
> accounting once the resource is acquired ...

This works quite fine on non-preempted kernels.
From the time you check for a resource until you actually
account for it, the kernel may preempt you and let another process
pass through the vx_anything_avail() check.
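
For comparison, here is a sketch of a lock-free charge that closes the window between the check and the accounting with a compare-and-swap loop. It uses C11 <stdatomic.h> rather than the kernel's atomic_cmpxchg(), the name try_charge() is hypothetical, and it does not maintain failcnt, which would need a second atomic operation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Atomically add `val` to *usage unless that would exceed `limit`.
 * Returns 0 on success and -ENOMEM if the limit would be exceeded.
 */
int try_charge(atomic_ulong *usage, unsigned long limit, unsigned long val)
{
    unsigned long old;

    assert(usage != NULL);
    old = atomic_load(usage);
    do {
        /* Written to avoid unsigned wrap-around when val > limit. */
        if (val > limit || old > limit - val)
            return -ENOMEM;
        /* On failure, `old` is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(usage, &old, old + val));

    return 0;
}
```

The check and the update succeed or fail together, so a task preempted between them simply retries with the fresh value instead of overshooting the limit.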

>> If we'll remove failcnt this would look like
>>while (atomic_cmpxchg(...))
>> which is also not that good.
>>
>> Moreover - in RSS accounting patches I perform page list
>> manipulations under this lock, so this also saves one atomic op.
> 
> it still hasn't been shown that this kind of RSS limit
> doesn't add big time overhead to normal operations
> (inside and outside of such a resource container)
> 
> note that the 'usual' memory accounting is much more
> lightweight and serves similar purposes ...

It OOM-kills current in case of a limit hit instead of
reclaiming pages or killing the *memory eater* to free memory.

> best,
> Herbert
> 
 +static inline void res_counter_uncharge_locked(struct res_counter *cnt,
 +unsigned long val)
 +{
 +if (unlikely(cnt->usage < val)) {
 +WARN_ON(1);
 +val = cnt->usage;
 +}
 +
 +cnt->usage -= val;
 +}
 +
 +static inline void res_counter_uncharge(struct res_counter *cnt,
 +unsigned long val)
 +{
 +unsigned long flags;
 +
 +spin_lock_irqsave(&cnt->lock, flags);
 +res_counter_uncharge_locked(cnt, val);
 +spin_unlock_irqrestore(&cnt->lock, flags);
 +}
 +
 +#endif
 diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig
 --- linux-2.6.20.orig/init/Kconfig2007-03-06 13:33:28.0 +0300
 +++ linux-2.6.20-0/init/Kconfig2007-03-06 13:33:28.0 

Re: [RFC][PATCH 2/7] RSS controller core

2007-03-11 Thread Pavel Emelianov
Herbert Poetzl wrote:
> On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote:
>> On Tue, 06 Mar 2007 17:55:29 +0300
>> Pavel Emelianov <[EMAIL PROTECTED]> wrote:
>>
>>> +struct rss_container {
>>> +   struct res_counter res;
>>> +   struct list_head page_list;
>>> +   struct container_subsys_state css;
>>> +};
>>> +
>>> +struct page_container {
>>> +   struct page *page;
>>> +   struct rss_container *cnt;
>>> +   struct list_head list;
>>> +};
>> ah. This looks good. I'll find a hunk of time to go through this work
>> and through Paul's patches. It'd be good to get both patchsets lined
>> up in -mm within a couple of weeks. But..
> 
> doesn't look so good for me, mainly because of the 
> additional per page data and per page processing
> 
> on 4GB memory, with 100 guests, 50% shared for each
> guest, this basically means ~1mio pages, 500k shared
> and 1500k x sizeof(page_container) entries, which
> roughly boils down to ~25MB of wasted memory ...
> 
> increase the amount of shared pages and it starts
> getting worse, but maybe I'm missing something here

You are. Each page has only one page_container associated
with it despite the number of containers it is shared
between.
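
To put numbers on this, here is a rough sketch of the overhead with one page_container per page. The struct below is a stand-in that mirrors the three fields quoted above (two pointers plus a two-pointer list_head); the 4 KiB page size and the helper name are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct list_head: two pointers. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

/* Stand-in mirroring the quoted struct page_container. */
struct page_container_sketch {
    void *page;                     /* struct page * in the patch */
    void *cnt;                      /* struct rss_container * in the patch */
    struct list_head_sketch list;
};

static_assert(sizeof(struct page_container_sketch) == 4 * sizeof(void *),
              "four pointers, no padding expected");

/* Bytes of page_container metadata if every page of memory is tracked once. */
unsigned long long page_container_overhead(unsigned long long mem_bytes,
                                           unsigned long page_size)
{
    return (mem_bytes / page_size) * sizeof(struct page_container_sketch);
}
```

With 4 GiB of memory and 4 KiB pages this gives 1,048,576 entries, i.e. 16 MiB on a 32-bit build and 32 MiB on a 64-bit one, independent of how many containers share a page.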

>> We need to decide whether we want to do per-container memory
>> limitation via these data structures, or whether we do it via a
>> physical scan of some software zone, possibly based on Mel's patches.
> 
> why not do simple page accounting (as done currently
> in Linux) and use that for the limits, without
> keeping the reference from container to page?

As I've already answered in my previous letter, simple
limiting without per-container reclamation and a per-container
OOM killer isn't good memory management. It doesn't allow
resource shortage to be handled gracefully.

This patchset provides a more graceful way to handle this. Full
memory management would also include accounting of VMA length
(returning ENOMEM from the system call), but we've decided
to start with RSS.

> best,
> Herbert
> 
>> ___
>> Containers mailing list
>> [EMAIL PROTECTED]
>> https://lists.osdl.org/mailman/listinfo/containers



[BIG] Re: sched rsdl fix for 0.28

2007-03-11 Thread Nicolas Mailhot
On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote:
> sched rsdl fix

Doesn't change a thing. Always breaks at the same place (though
depending on hardware timings? the trace is not always the same). Pretty
sure nothing happens before this failure

-- 
Nicolas Mailhot




Re: [BIG] Re: sched rsdl fix for 0.28

2007-03-11 Thread Con Kolivas
On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote:
> On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote:
> > sched rsdl fix
>
> Doesn't change a thing. Always breaks at the same place (though
> depending on hardware timings? the trace is not always the same). Pretty
> sure nothing happens before this failure

Bummer. The only other thing to try is v0.29 posted recently. I still haven't 
got a good way to reproduce this locally but I'll keep trying. Thanks for 
testing.

-- 
-ck


Re: [BIG] Re: sched rsdl fix for 0.28

2007-03-11 Thread Con Kolivas
On Sunday 11 March 2007 20:21, Con Kolivas wrote:
> On Sunday 11 March 2007 20:10, Nicolas Mailhot wrote:
> > On Sunday 11 March 2007 at 11:07 +1100, Con Kolivas wrote:
> > > sched rsdl fix
> >
> > Doesn't change a thing. Always breaks at the same place (though
> > depending on hardware timings? the trace is not always the same). Pretty
> > sure nothing happens before this failure
>
> Bummer. The only other thing to try is v0.29 posted recently. I still
> haven't got a good way to reproduce this locally but I'll keep trying.
> Thanks for testing.

Oh and if that oopses and you still have the time, could you please test 0.29 
on 2.6.20.2 (available from same directory).

-- 
-ck


Re: PROBLEM: "Make nenuconfig" does not save parameters.

2007-03-11 Thread Cyrill Gorcunov
[Sam Ravnborg - Sat, Mar 10, 2007 at 11:45:34PM +0100]
| On Sat, Mar 10, 2007 at 10:34:41PM +0100, Jan Engelhardt wrote:
| > 
| > On Mar 10 2007 22:27, Sam Ravnborg wrote:
| > >On Sat, Mar 10, 2007 at 07:23:41PM +0100, Jan Engelhardt wrote:
| > >> 
| > >> Whether the 'working config file path' should change when you do
| > >> 'Save as Alternate' or not, is a menuconfig axiom. Ask Sam Ravnborg
| > >> if you want it changed :-)
| > >
| > >Current behaviour is not logical but on the other hand I do not
| > >see a big need to make it so.
| > >It seems that people very seldom uses "save alternate" anyway.
| > >
| > >But patches are welcome.
| > 
| > ^_^ The patch has already been posted, has not it?
| No.
| Either we keep current behaviour or we change to the "normal"
| behaviour with a "Save as..." as known from all other programs.
| 
|   Sam
| 

Hi Sam,

here is a patch for menuconfig that shows the current configuration
file. I think menuconfig does its job well; the only thing
we need is to show the location of the _active_ configuration.

Any comments are welcome (and you may swear at me too :)

Cyrill

diff --git a/scripts/kconfig/mconf.c b/scripts/kconfig/mconf.c
index 3f9a132..cde6792 100644
--- a/scripts/kconfig/mconf.c
+++ b/scripts/kconfig/mconf.c
@@ -602,6 +602,12 @@ static void conf(struct menu *menu)
item_set_tag('L');
item_make(_("Save an Alternate Configuration File"));
item_set_tag('S');
+   item_make("--- ");
+   item_set_tag(':');
+   item_make(_("Current Configuration File: "));
+   item_set_tag(':');
+   item_add_str("%s", filename);
+
}
dialog_clear();
res = dialog_menu(prompt ? prompt : _("Main Menu"),
@@ -816,8 +822,11 @@ static void conf_load(void)
case 0:
if (!dialog_input_result[0])
return;
-   if (!conf_read(dialog_input_result))
+   if (!conf_read(dialog_input_result)) {
+   memset(filename, 0x0, PATH_MAX+1);
+   strncpy(filename, dialog_input_result, PATH_MAX);
return;
+   }
show_textbox(NULL, _("File does not exist!"), 5, 38);
break;
case 1:
@@ -840,8 +849,11 @@ static void conf_save(void)
case 0:
if (!dialog_input_result[0])
return;
-   if (!conf_write(dialog_input_result))
+   if (!conf_write(dialog_input_result)) {
+   memset(filename, 0x0, PATH_MAX+1);
+   strncpy(filename, dialog_input_result, PATH_MAX);
return;
+   }
show_textbox(NULL, _("Can't create file!  Probably a nonexistent directory."), 5, 60);
break;
case 1:
@@ -903,7 +915,7 @@ int main(int ac, char **av)
 
switch (res) {
case 0:
-   if (conf_write(NULL)) {
+   if (conf_write(filename)) {
fprintf(stderr, _("\n\n"
"Error during writing of the kernel 
configuration.\n"
"Your kernel configuration changes were NOT 
saved."


Re: Use of absolute timeouts for oneshot timers

2007-03-11 Thread Thomas Gleixner
On Sat, 2007-03-10 at 16:42 -0800, Jeremy Fitzhardinge wrote:
> Thomas Gleixner wrote:
> > It's simply enforced in NO_HZ, HIGHRES mode as we operate in absolute
> > time, which is read back from the clocksource, even if we use a relative
> > value for real hardware clock event devices to program the next event.
> > We calculate the delta between the absolute event and now. So we never
> > get an accumulating error.
> >
> > What problem are you observing ?
> 
> Actually, two things.  There was the unexpected pauses during boot,
> which is trivially fixable by not using the Xen periodic timer, and
> using the single-shot fallback.
> 
> But I'm making the more general observation that if you use an absolute
> rather than relative time to set the single-shot timeout, then you have
> to deal with a long-term cumulative drift between the kernel's monotonic
> time and the hypervisor's monotonic time.  This can happen even if your
> clocksource is derived directly from the hypervisor monotonic time,
> because running ntp will warp the kernel's time, and so it will drift
> with respect to the hypervisor clock.  You can only avoid this by 1) not
> allowing adjtime, or 2) making those same adjtime warps to the
> hypervisor time.  Neither of these is a good general solution.

Sigh, yes. Using a relative time for the next event is probably the
least ugly solution
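
A sketch of the relative-programming idea under discussion: the kernel keeps the absolute expiry in its own monotonic time base and hands only the difference to the event device or hypervisor. The function name and the minimum-delta clamp are assumptions for illustration, not the clockevents API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an absolute expiry (kernel monotonic time, ns) into a relative
 * timeout to program into a one-shot timer. Because the delta is recomputed
 * from "now" on every programming, drift between the kernel clock and the
 * timer's own clock cannot accumulate across events.
 */
int64_t next_event_delta_ns(int64_t expires_ns, int64_t now_ns,
                            int64_t min_delta_ns)
{
    int64_t delta;

    assert(min_delta_ns >= 0);
    delta = expires_ns - now_ns;

    /* An expiry in the past (or too close) fires as soon as the device allows. */
    return delta < min_delta_ns ? min_delta_ns : delta;
}
```

Any error in a single event is bounded by how far the two clocks diverge over one timeout interval, rather than over the whole uptime.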

tglx





Re: [RFC][PATCH 5/7] Per-container OOM killer and page reclamation

2007-03-11 Thread Pavel Emelianov
Balbir Singh wrote:
> Hi, Pavel,
> 
> Please find my patch to add LRU behaviour to your latest RSS controller.

Thanks for participation and additional testing :)
I'll include this into next generation of patches.

> Balbir Singh
> Linux Technology Center
> IBM, ISTL
> 
> 
> 
> 
> Add LRU behaviour to the RSS controller patches posted by Pavel Emelianov
> 
>   http://lkml.org/lkml/2007/3/6/198
> 
> which was in turn similar to the RSS controller posted by me
> 
>   http://lkml.org/lkml/2007/2/26/8
> 
> Pavel's patches have a per container list of pages, which helps reduce
> reclaim time of the RSS controller but the per container list of pages is
> in FIFO order. I've implemented active and inactive lists per container to
> help select the right set of pages to reclaim when the container is under
> memory pressure.
> 
> I've tested these patches on a ppc64 machine and they work fine for
> the minimal testing I've done.
> 
> Pavel would you please include these patches in your next iteration.
> 
> Comments, suggestions and further improvements are as always welcome!
> 
> Signed-off-by: <[EMAIL PROTECTED]>
> ---
> 
>  include/linux/rss_container.h |1 
>  mm/rss_container.c|   47 
> +++---
>  mm/swap.c |5 
>  mm/vmscan.c   |3 ++
>  4 files changed, 44 insertions(+), 12 deletions(-)
> 
> diff -puN include/linux/rss_container.h~rss-container-lru2 
> include/linux/rss_container.h
> --- linux-2.6.20/include/linux/rss_container.h~rss-container-lru2 
> 2007-03-09 22:52:56.0 +0530
> +++ linux-2.6.20-balbir/include/linux/rss_container.h 2007-03-10 
> 00:39:59.0 +0530
> @@ -19,6 +19,7 @@ int container_rss_prepare(struct page *,
>  void container_rss_add(struct page_container *);
>  void container_rss_del(struct page_container *);
>  void container_rss_release(struct page_container *);
> +void container_rss_move_lists(struct page *pg, bool active);
>  
>  int mm_init_container(struct mm_struct *mm, struct task_struct *tsk);
>  void mm_free_container(struct mm_struct *mm);
> diff -puN mm/rss_container.c~rss-container-lru2 mm/rss_container.c
> --- linux-2.6.20/mm/rss_container.c~rss-container-lru22007-03-09 
> 22:52:56.0 +0530
> +++ linux-2.6.20-balbir/mm/rss_container.c2007-03-10 02:42:54.0 
> +0530
> @@ -17,7 +17,8 @@ static struct container_subsys rss_subsy
>  
>  struct rss_container {
>   struct res_counter res;
> - struct list_head page_list;
> + struct list_head inactive_list;
> + struct list_head active_list;
>   struct container_subsys_state css;
>  };
>  
> @@ -96,6 +97,26 @@ void container_rss_release(struct page_c
>   kfree(pc);
>  }
>  
> +void container_rss_move_lists(struct page *pg, bool active)
> +{
> + struct rss_container *rss;
> + struct page_container *pc;
> +
> + if (!page_mapped(pg))
> + return;
> +
> + pc = page_container(pg);
> + BUG_ON(!pc);
> + rss = pc->cnt;
> +
> + spin_lock_irq(&rss->res.lock);
> + if (active)
> + list_move(&pc->list, &rss->active_list);
> + else
> + list_move(&pc->list, &rss->inactive_list);
> + spin_unlock_irq(&rss->res.lock);
> +}
> +
>  void container_rss_add(struct page_container *pc)
>  {
>   struct page *pg;
> @@ -105,7 +126,7 @@ void container_rss_add(struct page_conta
>   rss = pc->cnt;
>  
>   spin_lock(&rss->res.lock);
> - list_add(&pc->list, &rss->page_list);
> + list_add(&pc->list, &rss->active_list);
>   spin_unlock(&rss->res.lock);
>  
>   page_container(pg) = pc;
> @@ -141,7 +162,10 @@ unsigned long container_isolate_pages(un
>   struct zone *z;
>  
>   spin_lock_irq(&rss->res.lock);
> - src = &rss->page_list;
> + if (active)
> + src = &rss->active_list;
> + else
> + src = &rss->inactive_list;
>  
>   for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
>   pc = list_entry(src->prev, struct page_container, list);
> @@ -152,13 +176,10 @@ unsigned long container_isolate_pages(un
>  
>   spin_lock(&z->lru_lock);
>   if (PageLRU(page)) {
> - if ((active && PageActive(page)) ||
> - (!active && !PageActive(page))) {
> - if (likely(get_page_unless_zero(page))) {
> - ClearPageLRU(page);
> - nr_taken++;
> - list_move(&page->lru, dst);
> - }
> + if (likely(get_page_unless_zero(page))) {
> + ClearPageLRU(page);
> + nr_taken++;
> + list_move(&page->lru, dst);
>   }
>   }
>   spin_unlock(&z->lru_lock);
> @@ -212,7 +233,8 @@ static int rss_create(struct 

Re: [PATCH] [scsi]: Add offline state checking while dispatch a scsi cmd

2007-03-11 Thread Andrew Morton
> On Fri, 9 Mar 2007 09:40:40 +0800 Joe Jin <[EMAIL PROTECTED]> wrote:
> > What's the error you're trying to fix?  scsi_dispatch_cmd() is only
> > called from scsi_request_fn() which already has an equivalent of this
> > check in it just prior to calling dispatch.
> 
> Yeah, I have seen the check in scsi_request_fn(). Recently we got the
> following crash info on RHEL4 2.6.9-42.0.2.ELsmp,

The 2.6.9 base is very old in mainline terms.  Are you sure the bug hasn't
been fixed in mainline by other means?



[RFC][PATCH 0/3] swsusp: Stop using page flags

2007-03-11 Thread Rafael J. Wysocki
Hi,

The following three patches make swsusp use its own data structures for memory
management instead of special page flags.  Thus the page flags used so far by
swsusp (PG_nosave, PG_nosave_free) can be used for other purposes and I believe
there is some urgent need for them. :-)

Last week I sent these patches to the linux-pm and linux-mm lists and there
were no negative comments.  Also I've been testing them on my x86_64 boxes for
a few days and apparently they don't break anything.  I think they can go into
-mm for testing.

Comments are welcome.

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King



[RFC][PATCH 1/3] swsusp: Use inline functions for changing page flags

2007-03-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with
calls to inline functions that can be changed in subsequent patches without
modifying the code calling them.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 include/linux/suspend.h |   33 +
 kernel/power/snapshot.c |   48 +---
 mm/page_alloc.c |6 +++---
 3 files changed, 61 insertions(+), 26 deletions(-)

Index: linux-2.6.21-rc2/include/linux/suspend.h
===
--- linux-2.6.21-rc2.orig/include/linux/suspend.h   2007-03-02 
09:05:53.0 +0100
+++ linux-2.6.21-rc2/include/linux/suspend.h2007-03-02 09:24:02.0 
+0100
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* struct pbe is used for creating lists of pages that should be restored
  * atomically during the resume from disk, because the page frames they have
@@ -49,6 +50,38 @@ void __save_processor_state(struct saved
 void __restore_processor_state(struct saved_context *ctxt);
 unsigned long get_safe_page(gfp_t gfp_mask);
 
+/* Page management functions for the software suspend (swsusp) */
+
+static inline void swsusp_set_page_forbidden(struct page *page)
+{
+   SetPageNosave(page);
+}
+
+static inline int swsusp_page_is_forbidden(struct page *page)
+{
+   return PageNosave(page);
+}
+
+static inline void swsusp_unset_page_forbidden(struct page *page)
+{
+   ClearPageNosave(page);
+}
+
+static inline void swsusp_set_page_free(struct page *page)
+{
+   SetPageNosaveFree(page);
+}
+
+static inline int swsusp_page_is_free(struct page *page)
+{
+   return PageNosaveFree(page);
+}
+
+static inline void swsusp_unset_page_free(struct page *page)
+{
+   ClearPageNosaveFree(page);
+}
+
 /*
  * XXX: We try to keep some more pages free so that I/O operations succeed
  * without paging. Might this be more?
Index: linux-2.6.21-rc2/kernel/power/snapshot.c
===
--- linux-2.6.21-rc2.orig/kernel/power/snapshot.c   2007-03-02 
09:05:53.0 +0100
+++ linux-2.6.21-rc2/kernel/power/snapshot.c2007-03-02 09:27:06.0 
+0100
@@ -67,15 +67,15 @@ static void *get_image_page(gfp_t gfp_ma
 
res = (void *)get_zeroed_page(gfp_mask);
if (safe_needed)
-   while (res && PageNosaveFree(virt_to_page(res))) {
+   while (res && swsusp_page_is_free(virt_to_page(res))) {
/* The page is unsafe, mark it for swsusp_free() */
-   SetPageNosave(virt_to_page(res));
+   swsusp_set_page_forbidden(virt_to_page(res));
allocated_unsafe_pages++;
res = (void *)get_zeroed_page(gfp_mask);
}
if (res) {
-   SetPageNosave(virt_to_page(res));
-   SetPageNosaveFree(virt_to_page(res));
+   swsusp_set_page_forbidden(virt_to_page(res));
+   swsusp_set_page_free(virt_to_page(res));
}
return res;
 }
@@ -91,8 +91,8 @@ static struct page *alloc_image_page(gfp
 
page = alloc_page(gfp_mask);
if (page) {
-   SetPageNosave(page);
-   SetPageNosaveFree(page);
+   swsusp_set_page_forbidden(page);
+   swsusp_set_page_free(page);
}
return page;
 }
@@ -110,9 +110,9 @@ static inline void free_image_page(void 
 
page = virt_to_page(addr);
 
-   ClearPageNosave(page);
+   swsusp_unset_page_forbidden(page);
if (clear_nosave_free)
-   ClearPageNosaveFree(page);
+   swsusp_unset_page_free(page);
 
__free_page(page);
 }
@@ -615,7 +615,8 @@ static struct page *saveable_highmem_pag
 
BUG_ON(!PageHighMem(page));
 
-   if (PageNosave(page) || PageReserved(page) || PageNosaveFree(page))
+   if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page) ||
+   PageReserved(page))
return NULL;
 
return page;
@@ -681,7 +682,7 @@ static struct page *saveable_page(unsign
 
BUG_ON(PageHighMem(page));
 
-   if (PageNosave(page) || PageNosaveFree(page))
+   if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
return NULL;
 
if (PageReserved(page) && pfn_is_nosave(pfn))
@@ -821,9 +822,10 @@ void swsusp_free(void)
if (pfn_valid(pfn)) {
struct page *page = pfn_to_page(pfn);
 
-   if (PageNosave(page) && PageNosaveFree(page)) {
-   ClearPageNosave(page);
-   ClearPageNosaveFree(page);
+   if (swsusp_page_is_forbidden(page) &&
+   

[RFC][PATCH 3/3] mm: Remove unused page flags

2007-03-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Remove the two page flags that were previously used by swsusp and are no longer
needed.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 include/linux/page-flags.h |   12 
 1 file changed, 12 deletions(-)

Index: linux-2.6.21-rc3/include/linux/page-flags.h
===
--- linux-2.6.21-rc3.orig/include/linux/page-flags.h
+++ linux-2.6.21-rc3/include/linux/page-flags.h
@@ -82,13 +82,11 @@
 #define PG_private 11  /* If pagecache, has fs-private data */
 
 #define PG_writeback   12  /* Page is under writeback */
-#define PG_nosave  13  /* Used for system suspend/resume */
 #define PG_compound14  /* Part of a compound page */
 #define PG_swapcache   15  /* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk16  /* Has blocks allocated on-disk 
*/
 #define PG_reclaim 17  /* To be reclaimed asap */
-#define PG_nosave_free 18  /* Used for system suspend/resume */
 #define PG_buddy   19  /* Page is free, on buddy lists */
 
 /* PG_owner_priv_1 users should have descriptive aliases */
@@ -214,16 +212,6 @@ static inline void SetPageUptodate(struc
ret;\
})
 
-#define PageNosave(page)   test_bit(PG_nosave, &(page)->flags)
-#define SetPageNosave(page)    set_bit(PG_nosave, &(page)->flags)
-#define TestSetPageNosave(page)    test_and_set_bit(PG_nosave, &(page)->flags)
-#define ClearPageNosave(page)  clear_bit(PG_nosave, &(page)->flags)
-#define TestClearPageNosave(page)  test_and_clear_bit(PG_nosave, &(page)->flags)
-
-#define PageNosaveFree(page)   test_bit(PG_nosave_free, &(page)->flags)
-#define SetPageNosaveFree(page)    set_bit(PG_nosave_free, &(page)->flags)
-#define ClearPageNosaveFree(page)  clear_bit(PG_nosave_free, &(page)->flags)
-
 #define PageBuddy(page)    test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page)   __set_bit(PG_buddy, &(page)->flags)
 #define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 2/3] swsusp: Do not use page flags

2007-03-11 Thread Rafael J. Wysocki
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Make swsusp use memory bitmaps instead of page flags for marking 'nosave' and
free pages.  This allows us to 'recycle' two page flags that can be used for
other purposes.  Also, the memory needed to store the bitmaps is allocated when
necessary (i.e. before the suspend) and freed after the resume, which is more
reasonable.

The patch is designed to minimize the number of changes, and there are some
nice simplifications and optimizations possible on top of it.  I am going to
implement them separately in the future.
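
A minimal userspace sketch of the idea described above, not the data structure
the patch actually uses: per-page state is kept in a separately allocated
bitmap indexed by page frame number, so no bit in struct page is needed.  All
names here (pfn_bitmap and its helpers) are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a bitmap covering pfns [0, nr_pfns). */
struct pfn_bitmap {
	unsigned long *words;
	unsigned long nr_pfns;
};

/* Allocate a zeroed bitmap; returns 0 on success, -1 on allocation failure. */
static int pfn_bitmap_alloc(struct pfn_bitmap *bm, unsigned long nr_pfns)
{
	size_t nr_words = (nr_pfns + BITS_PER_WORD - 1) / BITS_PER_WORD;

	bm->words = calloc(nr_words ? nr_words : 1, sizeof(unsigned long));
	if (!bm->words)
		return -1;
	bm->nr_pfns = nr_pfns;
	return 0;
}

/* Release the bitmap once it is no longer needed (e.g. after resume). */
static void pfn_bitmap_free(struct pfn_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
	bm->nr_pfns = 0;
}

/* Mark the page with the given pfn. */
static void pfn_bitmap_set(struct pfn_bitmap *bm, unsigned long pfn)
{
	bm->words[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Unmark the page with the given pfn. */
static void pfn_bitmap_clear(struct pfn_bitmap *bm, unsigned long pfn)
{
	bm->words[pfn / BITS_PER_WORD] &= ~(1UL << (pfn % BITS_PER_WORD));
}

/* Return nonzero if the page with the given pfn is marked. */
static int pfn_bitmap_test(const struct pfn_bitmap *bm, unsigned long pfn)
{
	return (bm->words[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}
```

In this model the "forbidden" and "free" markers would each be one such bitmap,
allocated before the suspend and freed after the resume.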

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/e820.c |   26 +---
 include/linux/suspend.h   |   58 +++---
 kernel/power/disk.c   |   23 +++-
 kernel/power/power.h  |2 
 kernel/power/snapshot.c   |  250 +++---
 kernel/power/user.c   |4 
 6 files changed, 281 insertions(+), 82 deletions(-)

Index: linux-2.6.21-rc3/include/linux/suspend.h
===
--- linux-2.6.21-rc3.orig/include/linux/suspend.h
+++ linux-2.6.21-rc3/include/linux/suspend.h
@@ -24,63 +24,41 @@ struct pbe {
 extern void drain_local_pages(void);
 extern void mark_free_pages(struct zone *zone);
 
-#ifdef CONFIG_PM
-/* kernel/power/swsusp.c */
-extern int software_suspend(void);
-
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
+#if defined(CONFIG_PM) && defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
 extern int pm_prepare_console(void);
 extern void pm_restore_console(void);
 #else
 static inline int pm_prepare_console(void) { return 0; }
 static inline void pm_restore_console(void) {}
-#endif /* defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE) */
+#endif
+
+#if defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND)
+/* kernel/power/swsusp.c */
+extern int software_suspend(void);
+/* kernel/power/snapshot.c */
+extern void __init register_nosave_region(unsigned long, unsigned long);
+extern int swsusp_page_is_forbidden(struct page *);
+extern void swsusp_set_page_free(struct page *);
+extern void swsusp_unset_page_free(struct page *);
+extern unsigned long get_safe_page(gfp_t gfp_mask);
 #else
 static inline int software_suspend(void)
 {
printk("Warning: fake suspend called\n");
return -ENOSYS;
 }
-#endif /* CONFIG_PM */
+
+static inline void register_nosave_region(unsigned long b, unsigned long e) {}
+static inline int swsusp_page_is_forbidden(struct page *p) { return 0; }
+static inline void swsusp_set_page_free(struct page *p) {}
+static inline void swsusp_unset_page_free(struct page *p) {}
+#endif /* defined(CONFIG_PM) && defined(CONFIG_SOFTWARE_SUSPEND) */
 
 void save_processor_state(void);
 void restore_processor_state(void);
 struct saved_context;
 void __save_processor_state(struct saved_context *ctxt);
 void __restore_processor_state(struct saved_context *ctxt);
-unsigned long get_safe_page(gfp_t gfp_mask);
-
-/* Page management functions for the software suspend (swsusp) */
-
-static inline void swsusp_set_page_forbidden(struct page *page)
-{
-   SetPageNosave(page);
-}
-
-static inline int swsusp_page_is_forbidden(struct page *page)
-{
-   return PageNosave(page);
-}
-
-static inline void swsusp_unset_page_forbidden(struct page *page)
-{
-   ClearPageNosave(page);
-}
-
-static inline void swsusp_set_page_free(struct page *page)
-{
-   SetPageNosaveFree(page);
-}
-
-static inline int swsusp_page_is_free(struct page *page)
-{
-   return PageNosaveFree(page);
-}
-
-static inline void swsusp_unset_page_free(struct page *page)
-{
-   ClearPageNosaveFree(page);
-}
 
 /*
  * XXX: We try to keep some more pages free so that I/O operations succeed
Index: linux-2.6.21-rc3/kernel/power/snapshot.c
===
--- linux-2.6.21-rc3.orig/kernel/power/snapshot.c
+++ linux-2.6.21-rc3/kernel/power/snapshot.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +35,10 @@
 
 #include "power.h"
 
+static int swsusp_page_is_free(struct page *);
+static void swsusp_set_page_forbidden(struct page *);
+static void swsusp_unset_page_forbidden(struct page *);
+
 /* List of PBEs needed for restoring the pages that were allocated before
  * the suspend and included in the suspend image, but have also been
  * allocated by the "resume" kernel, so their contents cannot be written
@@ -224,11 +229,6 @@ static void chain_free(struct chain_allo
  * of type unsigned long each).  It also contains the pfns that
  * correspond to the start and end of the represented memory area and
  * the number of bit chunks in the block.
- *
- * NOTE: Memory bitmaps are used for two types of operations only:
- * "set a bit" and "find the next bit set".  Moreover, the searching
- * is always carried out after all of the "set a bit" operations
- * on given bitmap.
  */
 
 #define BM_END_OF_MAP  (~0UL)

[PATCH] drivers/isdn/hardware/eicon/: remove unused header files

2007-03-11 Thread Armin Schindler
Hi all,

as pointed out by Robert P. J. Day, here is a patch to remove unused header
files from the Eicon/Dialogic ISDN driver.


Signed-off-by: Armin Schindler <[EMAIL PROTECTED]>

---

diff -Nur linux-2.6.20.1.orig/drivers/isdn/hardware/eicon/dbgioctl.h linux-2.6.20.1/drivers/isdn/hardware/eicon/dbgioctl.h
--- linux-2.6.20.1.orig/drivers/isdn/hardware/eicon/dbgioctl.h  2007-03-10 11:21:15.0 +0100
+++ linux-2.6.20.1/drivers/isdn/hardware/eicon/dbgioctl.h   1970-01-01 01:00:00.0 +0100
@@ -1,198 +0,0 @@
-
-/*
- *
-  Copyright (c) Eicon Technology Corporation, 2000.
- *
-  This source file is supplied for the use with Eicon
-  Technology Corporation's range of DIVA Server Adapters.
- *
-  This program is free software; you can redistribute it and/or modify
-  it under the terms of the GNU General Public License as published by
-  the Free Software Foundation; either version 2, or (at your option)
-  any later version.
- *
-  This program is distributed in the hope that it will be useful,
-  but WITHOUT ANY WARRANTY OF ANY KIND WHATSOEVER INCLUDING ANY
-  implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-  See the GNU General Public License for more details.
- *
-  You should have received a copy of the GNU General Public License
-  along with this program; if not, write to the Free Software
-  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- */
-/*--*/
-/* file: dbgioctl.h */
-/*--*/
-
-#if !defined(__DBGIOCTL_H__)
-
-#define __DBGIOCTL_H__
-
-#ifdef NOT_YET_NEEDED
-/*
- * The requested operation is passed in arg0 of DbgIoctlArgs,
- * additional arguments (if any) in arg1, arg2 and arg3.
- */
-
-typedef struct
-{  ULONG   arg0 ;
-   ULONG   arg1 ;
-   ULONG   arg2 ;
-   ULONG   arg3 ;
-} DbgIoctlArgs ;
-
-#define DBG_COPY_LOGS   0   /* copy debugs to user until buffer full */
-                            /* arg1: size threshold                  */
-                            /* arg2: timeout in milliseconds         */
-
-#define DBG_FLUSH_LOGS  1   /* flush pending debugs to user buffer   */
-                            /* arg1: internal driver id              */
-
-#define DBG_LIST_DRVS   2   /* return the list of registered drivers */
-
-#define DBG_GET_MASK    3   /* get current debug mask of driver      */
-                            /* arg1: internal driver id              */
-
-#define DBG_SET_MASK    4   /* set/change debug mask of driver       */
-                            /* arg1: internal driver id              */
-                            /* arg2: new debug mask                  */
-
-#define DBG_GET_BUFSIZE 5   /* get current buffer size of driver     */
-                            /* arg1: internal driver id              */
-                            /* arg2: new debug mask                  */
-
-#define DBG_SET_BUFSIZE 6   /* set new buffer size of driver         */
-                            /* arg1: new buffer size                 */
-
-/*
- * common internal debug message structure
- */
-
-typedef struct
-{  unsigned short id ; /* virtual driver id  */
-   unsigned short type ;   /* special message type   */
-   unsigned long  seq ;/* sequence number of message */
-   unsigned long  size ;   /* size of message in bytes   */
-   unsigned long  next ;   /* offset to next buffered message*/
-   LARGE_INTEGER  NTtime ; /* 100 ns  since 1.1.1601 */
-   unsigned char  data[4] ;/* message data   */
-} OldDbgMessage ;
-
-typedef struct
-{  LARGE_INTEGER  NTtime ; /* 100 ns  since 1.1.1601 */
-   unsigned short size ;   /* size of message in bytes   */
-   unsigned short  ;   /* always 0x to indicate new msg  */
-   unsigned short id ; /* virtual driver id  */
-   unsigned short type ;   /* special message type   */
-   unsigned long  seq ;/* sequence number of message */
-   unsigned char  data[4] ;/* message data   */
-} DbgMessage ;
-
-#endif
-
-#define DRV_ID_UNKNOWN 0x0C/* for messages via 

Re: [RFC][PATCH 0/3] swsusp: Stop using page flags

2007-03-11 Thread Peter Zijlstra
On Sun, 2007-03-11 at 11:17 +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> The following three patches make swsusp use its own data structures for memory
> management instead of special page flags.  Thus the page flags used so far by
> swsusp (PG_nosave, PG_nosave_free) can be used for other purposes and I
> believe there are some urgent needs for them. :-)
> 
> Last week I sent these patches to the linux-pm and linux-mm lists and there
> were no negative comments.  Also I've been testing them on my x86_64 boxes for
> a few days and apparently they don't break anything.  I think they can go into
> -mm for testing.
> 
> Comments are welcome.

These patches have my blessing, they look good to me, but I'm not much
involved with the swsusp code, so I won't ACK them.

Again, thanks a bunch for freeing up 2 page flags :-)

Peter



Re: SATA resume slowness, e1000 MSI warning

2007-03-11 Thread Eric W. Biederman
"Michael S. Tsirkin" <[EMAIL PROTECTED]> writes:

>> The only case I can see which might trigger this is if we saved
>> pci-X state and then didn't restore it because we could not find
>> the capability on restore.
>
> Hmm. pci_save_pcix_state/pci_restore_pcix_state seem to only handle
> regular devices and seem to ignore the fact that for bridge PCI-X
> capability has a different structure.
>
> Is this intentional? 

Probably not as such.  I don't think we have any drivers for bridge
devices so I don't think it matters.  It likely wouldn't hurt to fix
it just in case though.

Do any of the mellanox cards do anything with the bridge on the card?

> If not, here's a patch to fix this. Warning: completely untested.

If you fix the offsets and diff this against my last fix (to never
free the buffer) I think your patch makes sense.

> PCI: restore bridge PCI-X capability registers after PM event
>
> Restore PCI-X bridge up/downstream capability registers
> after PM event.  This includes maximum split transaction
> commitment limit which might be vital for PCI-X.
>
> Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index df49530..4b788ef 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -597,14 +597,19 @@ static int pci_save_pcix_state(struct pci_dev *dev)
>   if (pos <= 0)
>   return 0;
>  
> - save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL);
> + save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 2, GFP_KERNEL);
>   if (!save_state) {
> - dev_err(&dev->dev, "Out of memory in pci_save_pcie_state\n");
> + dev_err(&dev->dev, "Out of memory in pci_save_pcix_state\n");
>   return -ENOMEM;
>   }
>   cap = (u16 *)&save_state->data[0];
>  
> - pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
> + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {

This appears to be the proper test.

> + pci_read_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, &cap[i++]);
> + pci_read_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, &cap[i++]);
> + } else
> + pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
> +
>   pci_add_saved_cap(dev, save_state);
>   return 0;
>  }
> @@ -621,7 +626,11 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
>   return;
>   cap = (u16 *)&save_state->data[0];
>  
> - pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> + pci_write_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, cap[i++]);
> + pci_write_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, cap[i++]);

These look like the proper two registers to save.

> + } else
> + pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
>   pci_remove_saved_cap(save_state);
>   kfree(save_state);
>  }
> diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
> index f09cce2..fb7eefd 100644
> --- a/include/linux/pci_regs.h
> +++ b/include/linux/pci_regs.h
> @@ -332,6 +332,8 @@
>  #define PCI_X_STATUS_SPL_ERR 0x2000 /* Rcvd Split Completion Error Msg */
>  #define  PCI_X_STATUS_266MHZ 0x4000  /* 266 MHz capable */
>  #define  PCI_X_STATUS_533MHZ 0x8000  /* 533 MHz capable */
> +#define PCI_X_BRIDGE_UP_SPL_CTL 10 /* PCI-X upstream split transaction limit */
> +#define PCI_X_BRIDGE_DN_SPL_CTL 14 /* PCI-X downstream split transaction limit */

Unless I am completely misreading the spec, while you have picked the
right registers to save, the offsets should be 0x08 and 0x0c (8 and 12).

Eric
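
For readers following the review, here is a small userspace model of the
save/restore pattern under discussion.  It is only a sketch: the config space
is a plain byte array, the demo_* names are hypothetical, and the register
offsets are passed in by the caller precisely because the correct values are
the open question in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a device: a header-type flag plus raw config space. */
struct demo_dev {
	int is_bridge;
	uint8_t config[256];
};

/* Offsets relative to the capability position; supplied by the caller. */
struct demo_pcix_layout {
	int cmd;	/* command register of a non-bridge function */
	int up_spl;	/* bridge upstream split transaction register */
	int dn_spl;	/* bridge downstream split transaction register */
};

/* Read a little-endian 16-bit word from the modelled config space. */
static uint16_t demo_read_word(const struct demo_dev *dev, int off)
{
	return (uint16_t)(dev->config[off] | (dev->config[off + 1] << 8));
}

/* Write a little-endian 16-bit word to the modelled config space. */
static void demo_write_word(struct demo_dev *dev, int off, uint16_t val)
{
	dev->config[off] = (uint8_t)(val & 0xff);
	dev->config[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Save either one word (regular function) or two words (bridge) into cap[].
 * Returns the number of words stored, so the buffer needs room for two.
 */
static int demo_pcix_save(const struct demo_dev *dev, int pos,
			  const struct demo_pcix_layout *lay, uint16_t cap[2])
{
	int i = 0;

	if (dev->is_bridge) {
		cap[i++] = demo_read_word(dev, pos + lay->up_spl);
		cap[i++] = demo_read_word(dev, pos + lay->dn_spl);
	} else {
		cap[i++] = demo_read_word(dev, pos + lay->cmd);
	}
	return i;
}

/* Write the saved words back in the same order they were saved. */
static void demo_pcix_restore(struct demo_dev *dev, int pos,
			      const struct demo_pcix_layout *lay,
			      const uint16_t cap[2])
{
	int i = 0;

	if (dev->is_bridge) {
		demo_write_word(dev, pos + lay->up_spl, cap[i++]);
		demo_write_word(dev, pos + lay->dn_spl, cap[i++]);
	} else {
		demo_write_word(dev, pos + lay->cmd, cap[i++]);
	}
}
```

The point the model makes is that the save buffer must hold two 16-bit words
once the bridge case is handled, and that save and restore have to agree on
both the order and the offsets.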


Linux 2.6.16.44-rc1

2007-03-11 Thread Adrian Bunk
Security fixes since 2.6.16.43:
- CVE-2007-0005: Fix buffer overflow in Omnikey CardMan 4040 driver
- CVE-2007-1000: [IPV6]: Handle np->opt being NULL in ipv6_getsockopt_sticky().


Location:
ftp://ftp.kernel.org/pub/linux/kernel/people/bunk/linux-2.6.16.y/testing/

git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git


Changes since 2.6.16.43:

Adrian Bunk (1):
  Linux 2.6.16.44-rc1

Ang Way Chuang (1):
  dvb-core: fix bug in CRC-32 checking on 64-bit systems

Arnaldo Carvalho de Melo (1):
  [TCP]: Fix minisock tcp_create_openreq_child() typo.

Arthur Kepner (1):
  IB/mthca: Use mmiowb after doorbell ring

Chris Wright (1):
  [IPV6] fix ipv6_getsockopt_sticky copy_to_user leak

Dan Yeisley (1):
  init_reap_node() initialization fix

David Moore (1):
  Missing critical phys_to_virt in lib/swiotlb.c

David S. Miller (4):
  video/aty/mach64_ct.c: fix bogus delay loop
  [SPARC64] bbc_i2c: Fix kenvctrld eating %100 cpu.
      [IPV6]: Handle np->opt being NULL in ipv6_getsockopt_sticky(). (CVE-2007-1000)
  SPARC64: Fix memory corruption in pci_4u_free_consistent()

David Stevens (1):
  [IPV6]: /proc/net/anycast6 unbalanced inet6_dev refcnt

Eli Cohen (1):
  IPoIB: Rejoin all multicast groups after a port event

Eric Dumazet (1):
  [INET]: twcal_jiffie should be unsigned long, not int

Herbert Xu (1):
  [UDP]: Reread uh pointer after pskb_trim

Hugh Dickins (1):
  make ppc64 current preempt-safe

Jin-Bong lee (1):
  DVB: cxusb: fix firmware patch for big endian systems

Komuro (1):
  modify 3c589_cs to be SMP safe

Marcel Holtmann (1):
  Fix buffer overflow in Omnikey CardMan 4040 driver (CVE-2007-0005)

Michael S. Tsirkin (1):
  IB/mthca: Fix off-by-one in FMR handling on memfree

Michal Wrobel (1):
  [IPV6]: anycast refcnt fix

Olaf Kirch (1):
  [IPV6]: Fix for ipv6_setsockopt NULL dereference

Sergey Vlasov (1):
  Input: psmouse - fix attribute access on 64-bit systems


 Makefile|2 +-
 arch/sparc64/kernel/pci_iommu.c |2 +-
 drivers/char/pcmcia/cm4040_cs.c |3 ++-
 drivers/infiniband/hw/mthca/mthca_cq.c  |7 +++
 drivers/infiniband/hw/mthca/mthca_memfree.c |2 +-
 drivers/infiniband/hw/mthca/mthca_qp.c  |   19 +++
 drivers/infiniband/hw/mthca/mthca_srq.c |8 
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |4 +++-
 drivers/input/mouse/psmouse-base.c  |8 +---
 drivers/media/dvb/dvb-core/dvb_net.c|4 ++--
 drivers/media/dvb/dvb-usb/cxusb.c   |4 ++--
 drivers/net/pcmcia/3c589_cs.c   |7 +--
 drivers/sbus/char/bbc_i2c.c |   17 +
 drivers/video/aty/mach64_ct.c   |4 ++--
 include/asm-powerpc/current.h   |   12 +++-
 include/net/inet_timewait_sock.h|2 +-
 lib/swiotlb.c   |2 +-
 mm/slab.c   |2 +-
 net/ipv4/tcp_minisocks.c|2 +-
 net/ipv4/udp.c  |1 +
 net/ipv6/addrconf.c |2 ++
 net/ipv6/anycast.c  |1 +
 net/ipv6/ipv6_sockglue.c|   14 +-
 23 files changed, 95 insertions(+), 34 deletions(-)




Re: [PATCH 3/7] revoke: core code

2007-03-11 Thread Pekka Enberg
On Fri, 2007-03-09 at 10:15 +0200, Pekka J Enberg wrote:
> > +  again:
> > +   restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr,
> > + details);
> > +
> > +   need_break = need_resched() || need_lockbreak(details->i_mmap_lock);
> > +   if (need_break)
> > +   goto out_need_break;
> > +
> > +   if (restart_addr < end_addr) {
> > +   start_addr = restart_addr;
> > +   goto again;
> > +   }
> > +   return 0;
> > +
> > +  out_need_break:
> > +   spin_unlock(details->i_mmap_lock);
> > +   cond_resched();
> > +   spin_lock(details->i_mmap_lock);
> > +   return -EINTR;

On Fri, 2007-03-09 at 13:30 +0100, Peter Zijlstra wrote:
> I'm not sure this scheme works, given a sufficiently loaded machine,
> this might never complete.

Hmm, so what's the alternative? It's better to fail revoke than lock up
the box.

On Fri, 2007-03-09 at 13:30 +0100, Peter Zijlstra wrote:
> I'm never sure of operator precedence and prefer:
> 
>  (vma->vm_flags & VM_SHARED) && ...
> 
> which leaves no room for error.

Thanks, fixed.
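
For reference, a tiny sketch of the precedence point.  In C, binary & binds
more tightly than &&, so the two forms below evaluate identically; the
parentheses only remove the need to remember that.  DEMO_VM_SHARED is a
hypothetical stand-in for the real flag value.

```c
#include <assert.h>

#define DEMO_VM_SHARED 0x0008UL	/* hypothetical flag bit for illustration */

/* Relies on & having higher precedence than &&. */
static int shared_and_other_implicit(unsigned long flags, int other)
{
	return flags & DEMO_VM_SHARED && other;
}

/* Same test with the grouping spelled out, as suggested in the review. */
static int shared_and_other_explicit(unsigned long flags, int other)
{
	return (flags & DEMO_VM_SHARED) && other;
}
```

Note that the same is not true for comparisons: == binds more tightly than &,
which is the usual reason people prefer to always write the parentheses.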



[PATCH 1/5] revoke: special mmap handling

2007-03-11 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

This adds special handling for revoked memory mappings.  We want to
raise SIGBUS when accessing revoked mappings and return ENODEV when
trying to remap with mmap(2).
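
The intended behaviour can be modelled in a few lines of userspace C.  This is
a sketch only: the demo_* types and the DEMO_VM_REVOKED value are hypothetical,
and the real checks live in the fault and mmap paths touched by the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_VM_REVOKED 0x1UL	/* hypothetical flag bit for illustration */

enum demo_fault { DEMO_FAULT_OK, DEMO_FAULT_SIGBUS };

/* Hypothetical, much reduced stand-in for a memory mapping. */
struct demo_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Any access to a revoked mapping is answered with SIGBUS. */
static enum demo_fault demo_handle_fault(const struct demo_vma *vma)
{
	if (vma->flags & DEMO_VM_REVOKED)
		return DEMO_FAULT_SIGBUS;
	return DEMO_FAULT_OK;
}

/*
 * Decide what a new mapping at [addr, addr + len) does with the vma found
 * for that address: -ENODEV if it was revoked, 1 if it overlaps and would
 * have to be unmapped first, 0 if there is nothing in the way.
 */
static int demo_check_remap(const struct demo_vma *vma,
			    unsigned long addr, unsigned long len)
{
	if (vma) {
		if (vma->flags & DEMO_VM_REVOKED)
			return -ENODEV;
		if (vma->start < addr + len)
			return 1;
	}
	return 0;
}
```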

Acked-by: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 include/linux/mm.h |1 +
 mm/memory.c|3 +++
 mm/mmap.c  |   12 
 3 files changed, 12 insertions(+), 4 deletions(-)

Index: uml-2.6/include/linux/mm.h
===
--- uml-2.6.orig/include/linux/mm.h 2007-03-11 13:07:57.0 +0200
+++ uml-2.6/include/linux/mm.h  2007-03-11 13:09:19.0 +0200
@@ -169,6 +169,7 @@ #define VM_NONLINEAR0x0080  /* Is no
 #define VM_MAPPED_COPY 0x0100  /* T if mapped copy of data (nommu mmap) */
 #define VM_INSERTPAGE  0x0200  /* The vma has had "vm_insert_page()" done on it */
 #define VM_ALWAYSDUMP  0x0400  /* Always include in core dumps */
+#define VM_REVOKED 0x0800  /* Mapping has been revoked */
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
Index: uml-2.6/mm/memory.c
===
--- uml-2.6.orig/mm/memory.c2007-03-11 13:07:57.0 +0200
+++ uml-2.6/mm/memory.c 2007-03-11 13:09:19.0 +0200
@@ -2504,6 +2504,9 @@ int __handle_mm_fault(struct mm_struct *
if (unlikely(is_vm_hugetlb_page(vma)))
return hugetlb_fault(mm, vma, address, write_access);
 
+   if (unlikely(vma->vm_flags & VM_REVOKED))
+   return VM_FAULT_SIGBUS;
+
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
Index: uml-2.6/mm/mmap.c
===
--- uml-2.6.orig/mm/mmap.c  2007-03-11 13:07:57.0 +0200
+++ uml-2.6/mm/mmap.c   2007-03-11 13:09:19.0 +0200
@@ -1030,10 +1030,14 @@ accountable = 0;
error = -ENOMEM;
 munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
-   if (vma && vma->vm_start < addr + len) {
-   if (do_munmap(mm, addr, len))
-   return -ENOMEM;
-   goto munmap_back;
+   if (vma) {
+   if (unlikely(vma->vm_flags & VM_REVOKED))
+   return -ENODEV;
+   if (vma->vm_start < addr + len) {
+   if (do_munmap(mm, addr, len))
+   return -ENOMEM;
+   goto munmap_back;
+   }
}
 
/* Check against address space limit. */


[PATCH 2/5] revoke: core code

2007-03-11 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

The revokeat(2) and frevoke(2) system calls invalidate open file
descriptors and shared mappings of an inode. After a successful
revocation, operations on file descriptors fail with the EBADF or
ENXIO error code for regular and device files,
respectively. Attempting to read from or write to a revoked mapping
causes SIGBUS.

The actual operation is done in two passes:

 1. Revoke all file descriptors that point to the given inode. We do
this under tasklist_lock so that after this pass, we don't need
to worry about racing with close(2) or dup(2).
   
 2. Take down shared memory mappings of the inode and close all file
pointers.

The file descriptors and memory mapping ranges are preserved until the
owning task does close(2) and munmap(2), respectively.

Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 fs/Makefile  |2 
 fs/revoke.c  |  588 +++
 fs/revoked_inode.c   |  378 +++
 include/linux/fs.h   |4 
 include/linux/revoked_fs_i.h |   20 +
 include/linux/syscalls.h |3 
 6 files changed, 994 insertions(+), 1 deletion(-)

Index: uml-2.6/fs/Makefile
===
--- uml-2.6.orig/fs/Makefile2007-03-11 13:07:57.0 +0200
+++ uml-2.6/fs/Makefile 2007-03-11 13:09:20.0 +0200
@@ -11,7 +11,7 @@ obj-y :=  open.o read_write.o file_table.
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
pnode.o drop_caches.o splice.o sync.o utimes.o \
-   stack.o
+   stack.o revoke.o revoked_inode.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
Index: uml-2.6/include/linux/syscalls.h
===
--- uml-2.6.orig/include/linux/syscalls.h   2007-03-11 13:07:57.0 +0200
+++ uml-2.6/include/linux/syscalls.h    2007-03-11 13:09:20.0 +0200
@@ -605,4 +605,7 @@ asmlinkage long sys_getcpu(unsigned __us
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
+asmlinkage int sys_revokeat(int dfd, const char __user *filename);
+asmlinkage int sys_frevoke(unsigned int fd);
+
 #endif
Index: uml-2.6/include/linux/fs.h
===
--- uml-2.6.orig/include/linux/fs.h 2007-03-11 13:07:57.0 +0200
+++ uml-2.6/include/linux/fs.h  2007-03-11 13:09:20.0 +0200
@@ -1100,6 +1100,7 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+   int (*revoke)(struct file *);
 };
 
 struct inode_operations {
@@ -1739,6 +1740,9 @@ extern ssize_t generic_splice_sendpage(s
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
size_t len, unsigned int flags);
 
+/* fs/revoke.c */
+extern int generic_file_revoke(struct file *);
+
 extern void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
 extern loff_t no_llseek(struct file *file, loff_t offset, int origin);
Index: uml-2.6/fs/revoke.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ uml-2.6/fs/revoke.c 2007-03-11 13:14:42.0 +0200
@@ -0,0 +1,588 @@
+/*
+ * fs/revoke.c - Invalidate all current open file descriptors of an inode.
+ *
+ * Copyright (C) 2006-2007  Pekka Enberg
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * This is used for pre-allocating an array of file pointers so that we don't
+ * have to do memory allocation under tasklist_lock.
+ */
+struct revoke_table {
+   struct file **files;
+   unsigned long size;
+   unsigned long end;
+   unsigned long restore_start;
+};
+
+struct kmem_cache *revokefs_inode_cache;
+
+/*
+ * Revoked file descriptors point to inodes in the revokefs filesystem.
+ */
+static struct vfsmount *revokefs_mnt;
+
+static struct file *get_revoked_file(void)
+{
+   struct dentry *dentry;
+   struct inode *inode;
+   struct file *filp;
+   struct qstr name;
+
+   filp = get_empty_filp();
+   if (!filp)
+   goto err;
+
+   inode = new_inode(revokefs_mnt->mnt_sb);
+   if (!inode)
+   goto err_inode;
+
+   name.name = "revoked_file";
+   name.len = strlen(name.name);
+   dentry = d_alloc(revokefs_mnt->mnt_sb->s_root, &name);
+   if (!dentry)
+   goto err_dentry;
+
+   

[PATCH 3/5] revoke: support for ext2 and ext3

2007-03-11 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

Add revoke support to ext2 and ext3 by wiring f_ops->revoke with
generic_file_revoke.

Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 fs/ext2/file.c |1 +
 fs/ext3/file.c |1 +
 2 files changed, 2 insertions(+)

Index: uml-2.6/fs/ext2/file.c
===
--- uml-2.6.orig/fs/ext2/file.c 2007-03-11 13:05:33.0 +0200
+++ uml-2.6/fs/ext2/file.c  2007-03-11 13:09:21.0 +0200
@@ -56,6 +56,7 @@ const struct file_operations ext2_file_o
.sendfile   = generic_file_sendfile,
.splice_read= generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 #ifdef CONFIG_EXT2_FS_XIP
Index: uml-2.6/fs/ext3/file.c
===
--- uml-2.6.orig/fs/ext3/file.c 2007-03-11 13:05:33.0 +0200
+++ uml-2.6/fs/ext3/file.c  2007-03-11 13:09:21.0 +0200
@@ -123,6 +123,7 @@ const struct file_operations ext3_file_o
.sendfile   = generic_file_sendfile,
.splice_read= generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 const struct inode_operations ext3_file_inode_operations = {


[PATCH 4/5] revoke: add documentation

2007-03-11 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

This documents the revoke file operation in Documentation/filesystems/vfs.txt.

Acked-by: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 Documentation/filesystems/vfs.txt |5 +
 1 file changed, 5 insertions(+)

Index: uml-2.6/Documentation/filesystems/vfs.txt
===
--- uml-2.6.orig/Documentation/filesystems/vfs.txt  2007-03-11 13:05:33.0 +0200
+++ uml-2.6/Documentation/filesystems/vfs.txt   2007-03-11 13:09:22.0 +0200
@@ -732,6 +732,7 @@ struct file_operations {
 int);
ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, 
unsigned  
 int);
+   int (*revoke)(struct file *);
 };
 
 Again, all methods are called without any locks being held, unless
@@ -805,6 +806,10 @@ otherwise noted.
   splice_read: called by the VFS to splice data from file to a pipe. This
   method is used by the splice(2) system call
 
+  revoke: called by revokeat(2) and frevoke(2) system calls to revoke access
+ to an open file. This method must ensure that all currently blocked
+ writes are flushed and reads will fail.
+
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides. When opening a device node
 (character or block special) most filesystems will call special


[PATCH 5/5] revoke: wire up i386 system calls

2007-03-11 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

Make the revokeat and frevoke system calls available to user-space on i386.

Acked-by: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 arch/i386/kernel/syscall_table.S |3 +++
 include/asm-i386/unistd.h|4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

Index: uml-2.6/arch/i386/kernel/syscall_table.S
===
--- uml-2.6.orig/arch/i386/kernel/syscall_table.S   2007-03-11 13:05:32.0 +0200
+++ uml-2.6/arch/i386/kernel/syscall_table.S    2007-03-11 13:09:23.0 +0200
@@ -319,3 +319,6 @@ .long sys_unshare   /* 310 */
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_revokeat  /* 320 */
+   .long sys_frevoke
+
Index: uml-2.6/include/asm-i386/unistd.h
===
--- uml-2.6.orig/include/asm-i386/unistd.h  2007-03-11 13:05:33.0 +0200
+++ uml-2.6/include/asm-i386/unistd.h   2007-03-11 13:09:23.0 +0200
@@ -325,10 +325,12 @@ #define __NR_unshare  310
 #define __NR_move_pages    317
 #define __NR_getcpu        318
 #define __NR_epoll_pwait   319
+#define __NR_revokeat      320
+#define __NR_frevoke       321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR


Re: SATA resume slowness, e1000 MSI warning

2007-03-11 Thread Michael S. Tsirkin
> Quoting Eric W. Biederman <[EMAIL PROTECTED]>:
> Subject: Re: SATA resume slowness, e1000 MSI warning
> 
> "Michael S. Tsirkin" <[EMAIL PROTECTED]> writes:
> 
> >> The only case I can see which might trigger this is if we saved
> >> pci-X state and then didn't restore it because we could not find
> >> the capability on restore.
> >
> > Hmm. pci_save_pcix_state/pci_restore_pcix_state seem to only handle
> > regular devices and seem to ignore the fact that for bridge PCI-X
> > capability has a different structure.
> >
> > Is this intentional? 
> 
> Probably not as such.  I don't think we have any drivers for bridge
> devices so I don't think it matters.  It likely wouldn't hurt to fix
> it just in case though.
> 
> Do any of the mellanox cards do anything with the bridge on the card?

Yes but they do their own thing wrt saving/restoring registers.
Look at drivers/infiniband/hw/mthca/mthca_reset.c

> > If not, here's a patch to fix this. Warning: completely untested.
> 
> If you fix the offsets and diff this against my last fix (to never
> free the buffer) I think your patch makes sense.

Let's agree on what the correct offsets are.

> > PCI: restore bridge PCI-X capability registers after PM event
> >
> > Restore PCI-X bridge up/downstream capability registers
> > after PM event.  This includes maximum split transaction
> > commitment limit which might be vital for PCI-X.
> >
> > Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index df49530..4b788ef 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -597,14 +597,19 @@ static int pci_save_pcix_state(struct pci_dev *dev)
> > if (pos <= 0)
> > return 0;
> >  
> > -   save_state = kzalloc(sizeof(*save_state) + sizeof(u16), GFP_KERNEL);
> > + save_state = kzalloc(sizeof(*save_state) + sizeof(u16) * 2, GFP_KERNEL);
> > if (!save_state) {
> > -   dev_err(&dev->dev, "Out of memory in pci_save_pcie_state\n");
> > +   dev_err(&dev->dev, "Out of memory in pci_save_pcix_state\n");
> > return -ENOMEM;
> > }
> > cap = (u16 *)&save_state->data[0];
> >  
> > -   pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
> > +   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> 
> This appears to be the proper test.
> 
> > + pci_read_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, &cap[i++]);
> > + pci_read_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, &cap[i++]);
> > +   } else
> > +   pci_read_config_word(dev, pos + PCI_X_CMD, &cap[i++]);
> > +
> > pci_add_saved_cap(dev, save_state);
> > return 0;
> >  }
> > @@ -621,7 +626,11 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
> > return;
> > cap = (u16 *)&save_state->data[0];
> >  
> > -   pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> > +   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> > + pci_write_config_word(dev, pos + PCI_X_BRIDGE_UP_SPL_CTL, cap[i++]);
> > + pci_write_config_word(dev, pos + PCI_X_BRIDGE_DN_SPL_CTL, cap[i++]);
> 
> These look like the proper two registers to save.
> 
> > +   } else
> > +   pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> > pci_remove_saved_cap(save_state);
> > kfree(save_state);
> >  }
> > diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
> > index f09cce2..fb7eefd 100644
> > --- a/include/linux/pci_regs.h
> > +++ b/include/linux/pci_regs.h
> > @@ -332,6 +332,8 @@
> >  #define PCI_X_STATUS_SPL_ERR 0x2000 /* Rcvd Split Completion Error Msg */
> >  #define  PCI_X_STATUS_266MHZ   0x4000  /* 266 MHz capable */
> >  #define  PCI_X_STATUS_533MHZ   0x8000  /* 533 MHz capable */
> > +#define PCI_X_BRIDGE_UP_SPL_CTL 10 /* PCI-X upstream split transaction limit */
> > +#define PCI_X_BRIDGE_DN_SPL_CTL 14 /* PCI-X downstream split transaction limit */
> 
> Unless I am completely misreading the spec, while you have picked the
> right registers to save, the offsets should be 0x08 and 0x0c, i.e. 8 and 12.

No, the spec is written in terms of dwords (32 bits), but we are storing words
(16 bits).
The data at offsets 8 and 12 is the read-only split transaction capacity.
The split transaction limit starts at bit 16, so you need to add 2 to the byte
offset.

Right?


-- 
MST


Re: [PATCH] CIRRUS: Delete unused header file.

2007-03-11 Thread Robert P. J. Day
On Sat, 10 Mar 2007, Andrew Morton wrote:

> > On Sat, 10 Mar 2007 17:27:44 -0500 (EST) "Robert P. J. Day" <[EMAIL 
> > PROTECTED]> wrote:
> >
> >   Delete apparently unused header file
> > sound/pci/cs46xx/imgs/cwcemb80.h.
> >
>
> That patch series was rather a mess
>
> - Multiple patches with the same Subject: (I might have lost some as a result)

yes, that was a bad decision on my part, sorry.

> - Several patches which tried to remove the same header file

*that* shouldn't have happened, those patches were designed to be
independent of one another and, AFAIK, i submitted them only once.  i
have no idea how the above might have happened.

> - Several patches which simply didn't apply

hm ... they were created against the latest git tree, i don't know
why they wouldn't apply.

...

> - Useless indenting in changelog text which I have to edit away.

ah, i'll remember to not indent the changelog text next time, sorry.

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Mike Galbraith
Hi Con,

On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote:
> What follows this email is a patch series for the latest version of the RSDL 
> cpu scheduler (ie v0.29). I have addressed all bugs that I am able to 
> reproduce in this version so if some people would be kind enough to test if 
> there are any hidden bugs or oops lurking, it would be nice to know in 
> anticipation of putting this back in -mm. Thanks.
> 
> Full patch for 2.6.21-rc3-mm2:
> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch

I'm seeing a cpu distribution problem running this on my P4 box.

Scenario:
listening to music collection (mp3) via Amarok.  Enable Amarok
visualization gforce, and size such that X and gforce each use ~50% cpu.
Start rip/encode of new CD with grip/lame encoder.  Lame is set to use
both cpus, at nice 5.  Once the encoders start, they receive
considerably more cpu than nice 0 X/Gforce, taking ~120% and leaving the
remaining 80% for X/Gforce and Amarok (when it updates its ~12k entry
database) to squabble over.

With 2.6.21-rc3,  X/Gforce maintain their ~50% cpu (remain smooth), and
the encoders (100% cpu bound) get what's left when Amarok isn't eating it.

I plunked the above patch into plain 2.6.21-rc3 and retested to
eliminate other mm tree differences, and it's repeatable.  The nice 5
cpu hogs always receive considerably more than the nice 0 sleepers.

-Mike



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Con Kolivas
On Sunday 11 March 2007 22:39, Mike Galbraith wrote:
> Hi Con,
>
> On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote:
> > What follows this email is a patch series for the latest version of the
> > RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am able
> > to reproduce in this version so if some people would be kind enough to
> > test if there are any hidden bugs or oops lurking, it would be nice to
> > know in anticipation of putting this back in -mm. Thanks.
> >
> > Full patch for 2.6.21-rc3-mm2:
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch
>
> I'm seeing a cpu distribution problem running this on my P4 box.
>
> Scenario:
> listening to music collection (mp3) via Amarok.  Enable Amarok
> visualization gforce, and size such that X and gforce each use ~50% cpu.
> Start rip/encode of new CD with grip/lame encoder.  Lame is set to use
> both cpus, at nice 5.  Once the encoders start, they receive
> considerably more cpu than nice 0 X/Gforce, taking ~120% and leaving the
> remaining 80% for X/Gforce and Amarok (when it updates its ~12k entry
> database) to squabble over.
>
> With 2.6.21-rc3,  X/Gforce maintain their ~50% cpu (remain smooth), and
> the encoders (100% cpu bound) get what's left when Amarok isn't eating it.
>
> I plunked the above patch into plain 2.6.21-rc3 and retested to
> eliminate other mm tree differences, and it's repeatable.  The nice 5
> cpu hogs always receive considerably more than the nice 0 sleepers.

Thanks for the report. I'm assuming you're describing a single hyperthread P4 
here in SMP mode so 2 logical cores. Can you elaborate on whether there is 
any difference as to which cpu things are bound to as well? Can you also see 
what happens with lame not niced to +5 (ie at 0) and with lame at nice +19.

Thanks.

-- 
-ck


Re: [patch 2/9] signalfd/timerfd - signalfd core ...

2007-03-11 Thread Oleg Nesterov
On 03/10, Davide Libenzi wrote:
>
> +static void signalfd_put_sighand(struct signalfd_ctx *ctx,
> +  struct sighand_struct *sighand,
> +  unsigned long *flags)
> +{
> + unlock_task_sighand(ctx->tsk, flags);
> +}

Note that signalfd_put_sighand() doesn't need "sighand" parameter, please
see below.

> +int signalfd_deliver(struct sighand_struct *sighand, int sig,
> +  struct siginfo *info)
> +{
> + int nsig = 0;
> + struct signalfd_ctx *ctx, *tmp;
> +
> + list_for_each_entry_safe(ctx, tmp, &sighand->sfdlist, lnk) {
> + /*
> +  * We use a negative signal value as a way to broadcast that the
> +  * sighand has been orphaned, so that we can notify all the
> +  * listeners about this. Remember the ctx->sigmask is inverted,
> +  * so if the user is interested in a signal, that corresponding
> +  * bit will be zero.
> +  */
> + if (sig < 0)
> + list_del_init(&ctx->lnk);

I'm afraid this is not right. This should be per-thread.

Suppose we have threads T1 and T2 from the same thread group. sighand->sfdlist
contains ctx1 and ctx2 "linked" to T1 and T2. Now, T1 exits, __exit_signal()
does signalfd_notify(sighand, -1), and "unlinks" all threads, not just T1.

IOW, we should do

	if (ctx->tsk == current) {
		list_del_init(&ctx->lnk);
		wake_up(&ctx->wqh);
	}

Perhaps it makes sense to not re-use signalfd_deliver(), but introduce
a new signalfd_xxx(sighand, tsk) helper for de_thread/exit_signal.

Btw, signalfd_deliver() doesn't use "info" parameter.

> + if (sig < 0 || !sigismember(&ctx->sigmask, sig)) {
> + wake_up(&ctx->wqh);

Minor nit. Perhaps it makes sense to do

	void signalfd_deliver(struct task_struct *tsk, int sig,
			      struct sigpending *pending)
	{
		struct sighand_struct *sighand = tsk->sighand;
		int private = (tsk->pending == pending);

		list_for_each_entry_safe(ctx, tmp, &sighand->sfdlist, lnk) {
			if (private && ctx->tsk != tsk)
				continue;
			if (!sigismember(&ctx->sigmask, sig))
				wake_up(&ctx->wqh);
		}
	}

Even better: signalfd_deliver(struct task_struct *tsk, int sig, int private).
This way specific_send_sig_info/send_sigqueue won't do a "false" wakeup.

> +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask)
> +{
> ...
> + if ((sighand = signalfd_get_sighand(ctx, &flags)) != NULL) {
> + ctx->sigmask = sigmask;
> + signalfd_put_sighand(ctx, sighand, &flags);
> + }

This looks like unneeded complication to me, I'd suggest

	if (signalfd_get_sighand(ctx, &flags)) {
		ctx->sigmask = sigmask;
		signalfd_put_sighand(ctx, flags);
	}

unlock_task_sighand() (and thus signalfd_put_sighand) doesn't need "sighand"
parameter. signalfd_get_sighand() is in fact boolean. It makes sense to return
sighand, it may be useful, but this patch only needs != NULL.

Every usage of signalfd_get_sighand() could be simplified accordingly.

> --- linux-2.6.20.ep2.orig/fs/exec.c   2007-03-10 15:57:00.0 -0800
> +++ linux-2.6.20.ep2/fs/exec.c2007-03-10 15:57:51.0 -0800
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -583,6 +584,17 @@
>   int count;
>  
>   /*
> +  * Tell all the sighand listeners that this sighand has
> +  * been detached. Needs to be called with the sighand lock
> +  * held.
> +  */
> + if (unlikely(!list_empty(&oldsighand->sfdlist))) {
> + spin_lock_irq(&oldsighand->siglock);
> + signalfd_notify(oldsighand, -1, NULL);
> + spin_unlock_irq(&oldsighand->siglock);
> + }

Very minor nit. I'd suggest to make a new helper and put it in signalfd.h
(like signalfd_notify()). This will help CONFIG_SIGNALFD.

I still think that we should do this only for suid-exec. If application
passes a signalfd to another process with unix socket, it should know
what it does. But yes, I agree, we can change this later if needed.
(in that case the caller of the above helper should be flush_old_exec).

Oleg.



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Mike Galbraith
On Sun, 2007-03-11 at 22:48 +1100, Con Kolivas wrote:
> 
> Thanks for the report. I'm assuming you're describing a single hyperthread P4 
> here in SMP mode so 2 logical cores. Can you elaborate on whether there is 
> any difference as to which cpu things are bound to as well? Can you also see 
> what happens with lame not niced to +5 (ie at 0) and with lame at nice +19.

Yes, one P4/HT/SMP. No change at nice 0, but setting the encoders to
nice 19 did put X/gforce ~back where they were with 2.6.21-rc3.  Tasks
don't seem to be bound to any particular cpu, relies on load balancing
(which appears to be working).

-Mike



Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Ingo Molnar

* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> > Full patch for 2.6.21-rc3-mm2: 
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch
> 
> I'm seeing a cpu distribution problem running this on my P4 box.

> With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and 
> the encoders (100% cpu bound) get what's left when Amarok isn't eating 
> it.
> 
> I plunked the above patch into plain 2.6.21-rc3 and retested to 
> eliminate other mm tree differences, and it's repeatable.  The nice 5 
> cpu hogs always receive considerably more than the nice 0 sleepers.

hm. Do you get the same problem on UP too? (i.e. let's eliminate any 
SMP/HT artifacts)

Ingo


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Mike Galbraith
On Sun, 2007-03-11 at 13:10 +0100, Ingo Molnar wrote:
> * Mike Galbraith <[EMAIL PROTECTED]> wrote:
> 
> > > Full patch for 2.6.21-rc3-mm2: 
> > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0.29.patch
> > 
> > I'm seeing a cpu distribution problem running this on my P4 box.
> 
> > With 2.6.21-rc3, X/Gforce maintain their ~50% cpu (remain smooth), and 
> > the encoders (100% cpu bound) get what's left when Amarok isn't eating 
> > it.
> > 
> > I plunked the above patch into plain 2.6.21-rc3 and retested to 
> > eliminate other mm tree differences, and it's repeatable.  The nice 5 
> > cpu hogs always receive considerably more than the nice 0 sleepers.
> 
> hm. Do you get the same problem on UP too? (i.e. let's eliminate any 
> SMP/HT artifacts)

I'll boot up nosmp and report back (but now it's time to take Opa to the
Gasthaus for his Sunday afternoon brewskies;)

-Mike



RE: [git patches] libata fixes

2007-03-11 Thread Paul Rolland
Hello,

> It seems like IRQ is not getting through.  The first IRQ 
> driven command is failing for you.

H 
> Extract is :
> ata7: PATA max UDMA/100 cmd 0x00019c00 ctl 0x00019882 bmdma
> 0x00019400 irq 16
> ata8: PATA max UDMA/100 cmd 0x00019800 ctl 0x00019482 bmdma
> 0x00019408 irq 16

IRQ 16 is IO-APIC-fasteoi for libata, and is not shared... but all the
other libata IRQs are IO-APIC-edge.

> * Does giving 'acpi=off' or 'irqpoll' make any difference?
> 
> * Can you connect a harddisk to the channel and see whether 
> that works?
Tried that.. Disk is identified as ATA-7: Maxtor 6Y080L0, YAR41BW0, max
UDMA/13
and then timeout again...

Tried then with acpi=off, same result (identify is OK, but then timeout),
and irqpoll and then it was OK 

Let's then go back to my DVD-RW and test irqpoll...
and ... Yes Got it !
It is identified, it can be mounted, and read as /dev/sr1 !

/proc/interrupts show a count of 0 for IRQ 16, so yes, it goes somewhere
else...

Doing some diffs on copy of /proc/interrupts while accessing the DVD
gives two possibilities : IRQ14 or IRQ18, but both are also counting
when not accessing the DVD...

Question: does running with irqpoll affect performance?

Paul
 



Re: libata extension

2007-03-11 Thread Alan Cox
> I believe you should be able to do this by sending ATA pass-through SCSI 
> commands into the device using SG_IO, without any kernel changes. It's 
> really the mechanism that's meant for this..

It should work, but Mark Lord reported some problems with READ_LONG on
PIIX/ICH intel chipsets. I don't know if he ever resolved them but if not
I have a patch that ought to.

Alan


[PATCH] driver core: fix device_add error path

2007-03-11 Thread Dmitriy Monakhov
Dmitriy Monakhov <[EMAIL PROTECTED]> writes:

> Greg Kroah-Hartman <[EMAIL PROTECTED]> writes:
>
>> From: James Simmons <[EMAIL PROTECTED]>
>>
>> When a device fails to register the class symlinks were not cleaned up.
>> This left a symlink in the /sys/class/"device"/ directory that pointed
>> to nowhere. This caused the sysfs_follow_link Oops I reported earlier.
>> This patch cleans up the symlink. Please apply. Thank you.
>>
>> Signed-Off: James Simmons <[EMAIL PROTECTED]>
>> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>> ---
>>  drivers/base/core.c |   31 ++-
>>  1 files changed, 30 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index d04fd33..cf2a398 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -637,12 +637,41 @@ int device_add(struct device *dev)
>>   BUS_NOTIFY_DEL_DEVICE, dev);
>>  device_remove_groups(dev);
>>   GroupError:
>> -device_remove_attrs(dev);
>> +device_remove_attrs(dev);
>>   AttrsError:
>>  if (dev->devt_attr) {
>>  device_remove_file(dev, dev->devt_attr);
>>  kfree(dev->devt_attr);
>>  }
>> +
>> +if (dev->class) {
>> +sysfs_remove_link(&dev->kobj, "subsystem");
>> +/* If this is not a "fake" compatible device, remove the
>> + * symlink from the class to the device. */
>> +if (dev->kobj.parent != &dev->class->subsys.kset.kobj)
>> +sysfs_remove_link(&dev->class->subsys.kset.kobj,
>> +  dev->bus_id);
>> +#ifdef CONFIG_SYSFS_DEPRECATED
>> +if (parent) {
>> +char *class_name = make_class_name(dev->class->name,
>> +   &dev->kobj);
>> +if (class_name)
>> +sysfs_remove_link(&dev->parent->kobj,
>> +  class_name);
>> +kfree(class_name);
>> +sysfs_remove_link(&dev->kobj, "device");
>> +}
>> +#endif
>> +
> < block begin
>> +down(&dev->class->sem);
>> +/* notify any interfaces that the device is now gone */
>> +list_for_each_entry(class_intf, &dev->class->interfaces, node)
>> +if (class_intf->remove_dev)
>> +class_intf->remove_dev(dev, class_intf);
>> +/* remove the device from the class list */
>> +list_del_init(&dev->node);
>> +up(&dev->class->sem);
> << block end 
> Maybe I've missed something, but I'm a little bit confused.
> For example, if an error happens in device_pm_add() we jump to label "PMError"
> and the code from the block above will be executed (the device will be removed
> from the list), but this device wasn't added to this list yet!
I've checked it one more time and the code really is broken, and I think I
understand how this happened: it looks like the full code chunk was copy-pasted
from device_del(), but in the device_add() error path the device hasn't been
added to the dev->class->devices list yet.
The following patch fixes this copy-paste error:

 [PATCH] driver core: fix device_add error path

 - At the moment we jump here the device hasn't been added to the
   dev->class->devices list yet.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/base/core.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 142c222..7d2459b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -684,15 +684,6 @@ int device_add(struct device *dev)
 #endif
sysfs_remove_link(&dev->kobj, "device");
}
-
-   down(&dev->class->sem);
-   /* notify any interfaces that the device is now gone */
-   list_for_each_entry(class_intf, &dev->class->interfaces, node)
-   if (class_intf->remove_dev)
-   class_intf->remove_dev(dev, class_intf);
-   /* remove the device from the class list */
-   list_del_init(&dev->node);
-   up(&dev->class->sem);
}
  ueventattrError:
device_remove_file(dev, &dev->uevent_attr);
-- 
1.5.0.1




Re: 2.6.21-rc3-mm1 RSDL results

2007-03-11 Thread James Cloos
|> See:
|> 
http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37=markup

OK.

Mesa is in git, now, but that still applies.  The gitweb url is:

http://gitweb.freedesktop.org/?p=mesa/mesa.git

and for the version of the above file in the master branch:

http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=blob;f=src/mesa/drivers/dri/r200/r200_ioctl.c

The recursive grep(1) on mesa shows:

,[grep -r sched_yield mesa]
| mesa/mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c:   sched_yield();
| mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchpool.c:  sched_yield();
| mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchbuffer.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include <sched.h>   /* for sched_yield() */
| mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include <sched.h>   /* for sched_yield() */
| mesa/mesa/src/mesa/drivers/dri/common/vblank.h:  sched_yield();   \
| mesa/mesa/src/mesa/drivers/dri/unichrome/via_ioctl.c:  sched_yield();
| mesa/mesa/src/mesa/drivers/dri/i915/intel_ioctl.c: sched_yield();
| mesa/mesa/src/mesa/drivers/dri/r200/r200_ioctl.c:   sched_yield();
`

Thanks for the heads up.  I must've grep(1)ed the xorg subdir rather
than the parent dir, and so missed mesa.

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6


Re: 2.6.21-rc3-mm1 RSDL results

2007-03-11 Thread Con Kolivas
On Sunday 11 March 2007 23:38, James Cloos wrote:
> |> See:
> |> http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37=markup
>
> OK.
>
> Mesa is in git, now, but that still applies.  The gitweb url is:
>
> http://gitweb.freedesktop.org/?p=mesa/mesa.git
>
> and for the version of the above file in the master branch:
>
> http://gitweb.freedesktop.org/?p=mesa/mesa.git;a=blob;f=src/mesa/drivers/dri/r200/r200_ioctl.c
>
> The recursive grep(1) on mesa shows:
>
> ,[grep -r sched_yield mesa]
>
> | mesa/mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchpool.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/i915tex/intel_batchbuffer.c: sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include <sched.h> /* for sched_yield() */
> | mesa/mesa/src/mesa/drivers/dri/common/vblank.h:#include <sched.h> /* for sched_yield() */
> | mesa/mesa/src/mesa/drivers/dri/common/vblank.h: sched_yield();  \
> | mesa/mesa/src/mesa/drivers/dri/unichrome/via_ioctl.c:  sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/i915/intel_ioctl.c:   sched_yield();
> | mesa/mesa/src/mesa/drivers/dri/r200/r200_ioctl.c:   sched_yield();
>
> `
>
> Thanks for the heads up.  I must've grep(1)ed the xorg subdir rather
> than the parent dir, and so missed mesa.

I just wonder what the heck all these will do to testing when using any of 
these drivers. Whether or not we do no yield, mild yield or full blown 
expiration yield, somehow or other I can't get over the feeling that if the 
code relies on yield() we can't really trust them to be meaningful cpu 
scheduler tests. This means most 3d apps out there that aren't using binary 
drivers, whether they be (fscking) glxgears, audio app visualisations or 
what...

-- 
-ck


Locking interrupt handler in L1 cache

2007-03-11 Thread Parav Pandit
Hi,

I have an MPC 8548 based firewall running Linux 2.6.x which
will spend about 80% of its time on packet processing.
So obviously most of the time it will RX and TX
packets through the gianfar ethernet driver.

I want to lock my interrupt handler of this driver in
the L1 cache.

1. Is there any kernel API for locking function and
data to lock them in the L1/L2 cache?

2. How can I use "icbtls" (Instruction Cache Block
Touch and Lock Set) for locking my interrupt handler?

3. Is "icbtls" the correct instruction for what I
am looking for?

4. How do I find end address of the interrupt handler
function and how do we pass it to cache locking
instructions? (Because it can happen that interrupt
handler size is more than a cache line, not aligned
etc)?

5. Can we enhance request_irq() function to take an
additional parameter to lock the interrupt handler in
the cache?

I understand that if my interrupt handler is going to
be called most of the time then it is very likely to
happen that OS will flush the same, but there is no
guarantee for it.

Regards,
Parav Pandit



 



Re: libata extension

2007-03-11 Thread Bartlomiej Zolnierkiewicz

Hi,

On Sunday 11 March 2007, Vitaliyi wrote:
> Good Day
> 
> Say i want to implement extended set of ATA commands available to
> userspace for building diagnostic tools.
> I need 0x40 -- read verify and 0x32 -- write long with error handling,

Mark Lord is working on READ/WRITE_LONG support for libata,
he has posted draft patch recently on linux-ide mailing list.

[ Please consider reading/joining linux-ide@vger.kernel.org ML,
  it is where Linux ATA discussion happens... ]

> for example. I was trying ide driver through ioctl's, but seems it
> lack of functionality and full of gotchas. Furthermore it oopses
> sometimes.

READ/WRITE_LONG is unsupported and as you've already noticed
TASKFILE ioctls are full of gotchas...

> Is it possible to use libata for such purpose or i need to write
> separate IDE driver ?

It should be possible using ATA pass-through, some libata changes
may be required but it is the right way to go IMO.

Bart


Re: [PATCH] lpfc: avoid double-free during PCI error failure

2007-03-11 Thread James Smart

ACK...  Looks good...

-- james s


Linas Vepstas wrote:

Bino, James,
Please review, sign-off and forward upstream.

--linas


If a PCI error is detected that cannot be recovered from, there
will be a double call of lpfc_pci_remove_one(), with the second call
resulting in a null-pointer dereference. The first call occurs in 
lpfc_io_error_detected(), and the second call during pci device 
remove. This patch eliminates the first call; it's unneeded.


Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/scsi/lpfc/lpfc_init.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-2.6.20-git16/drivers/scsi/lpfc/lpfc_init.c
===
--- linux-2.6.20-git16.orig/drivers/scsi/lpfc/lpfc_init.c	2007-03-08 15:57:40.0 -0600
+++ linux-2.6.20-git16/drivers/scsi/lpfc/lpfc_init.c	2007-03-08 16:03:18.0 -0600
@@ -1817,10 +1817,9 @@ static pci_ers_result_t lpfc_io_error_de
struct lpfc_sli *psli = &phba->sli;
struct lpfc_sli_ring  *pring;
 
-	if (state == pci_channel_io_perm_failure) {
-   lpfc_pci_remove_one(pdev);
+   if (state == pci_channel_io_perm_failure)
return PCI_ERS_RESULT_DISCONNECT;
-   }
+
pci_disable_device(pdev);
/*
 * There may be I/Os dropped by the firmware.




Style Question

2007-03-11 Thread Cong WANG

Hi, list!

I have a question about coding style in the Linux kernel. In
Documentation/CodingStyle, it is said that "Linux style for comments is
the C89 "/* ... */" style. Don't use C99-style "// ..." comments."
_But_ I see a lot of '//' style comments in current kernel code.

Which is wrong? The documentation or the code, or neither? And why?

Another question is about NULL. AFAIK, in user space, using NULL is
better than directly using 0 in C. In kernel, I know it used its own
NULL, which may be defined as ((void*)0), but it's _still_ different
from raw zero. So can I say using NULL is better than 0 in kernel?

Any reply is welcome. Thanks and have a nice day!


Re: Style Question

2007-03-11 Thread Bernd Petrovitsch
On Sun, 2007-03-11 at 22:15 +0800, Cong WANG wrote:
[...]
> Another question is about NULL. AFAIK, in user space, using NULL is
> better than directly using 0 in C. In kernel, I know it used its own
> NULL, which may be defined as ((void*)0),

Userspace usually has the same definition.

>   but it's _still_ different
> from raw zero.

It is different in that "0" as such has the type "int". But this int is
automatically converted to a null pointer.

>So can I say using NULL is better than 0 in kernel?

Yes, because it is immediately clear that a pointer is (or should be)
there (and not an int).
And the same holds for userspace since this is a pure C question.

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services



Re: [RFC][PATCH 2/7] RSS controller core

2007-03-11 Thread Herbert Poetzl
On Sun, Mar 11, 2007 at 12:08:16PM +0300, Pavel Emelianov wrote:
> Herbert Poetzl wrote:
>> On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote:
>>> On Tue, 06 Mar 2007 17:55:29 +0300
>>> Pavel Emelianov <[EMAIL PROTECTED]> wrote:
>>>
>>>> +struct rss_container {
>>>> +  struct res_counter res;
>>>> +  struct list_head page_list;
>>>> +  struct container_subsys_state css;
>>>> +};
>>>> +
>>>> +struct page_container {
>>>> +  struct page *page;
>>>> +  struct rss_container *cnt;
>>>> +  struct list_head list;
>>>> +};
>>> ah. This looks good. I'll find a hunk of time to go through this
>>> work and through Paul's patches. It'd be good to get both patchsets
>>> lined up in -mm within a couple of weeks. But..
>> 
>> doesn't look so good for me, mainly because of the
>> additional per page data and per page processing
>> 
>> on 4GB memory, with 100 guests, 50% shared for each
>> guest, this basically means ~1mio pages, 500k shared
>> and 1500k x sizeof(page_container) entries, which
>> roughly boils down to ~25MB of wasted memory ...
>> 
>> increase the amount of shared pages and it starts
>> getting worse, but maybe I'm missing something here
> 
> You are. Each page has only one page_container associated
> with it despite the number of containers it is shared
> between.
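
For a rough sense of the numbers in this exchange, here is a small
sketch of the arithmetic. It assumes a 32-bit kernel, where struct
page_container would be two 4-byte pointers plus a struct list_head
(itself two pointers), and 4KB pages; neither assumption is stated in
the thread. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed size on a 32-bit kernel: the page and cnt pointers (4 bytes
 * each) plus a struct list_head, which holds two pointers.
 */
enum { PAGE_CONTAINER_SIZE_32 = 4 + 4 + 2 * 4 };

static_assert(PAGE_CONTAINER_SIZE_32 == 16, "expected 16 bytes per entry");

/* Bytes of tracking data for a given number of page_container entries. */
uint64_t tracking_overhead_bytes(uint64_t entries, uint64_t entry_size)
{
	return entries * entry_size;
}
```

Under these assumptions, 1500k entries come to 24,000,000 bytes, close
to the ~25MB estimate, while one entry per page on 4GB (1,048,576
pages) comes to 16MB.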
> 
>>> We need to decide whether we want to do per-container memory
>>> limitation via these data structures, or whether we do it via
>>> a physical scan of some software zone, possibly based on Mel's
>>> patches.
>> 
>> why not do simple page accounting (as done currently
>> in Linux) and use that for the limits, without
>> keeping the reference from container to page?
> 
> As I've already answered in my previous letter, simple
> limiting w/o per-container reclamation and a per-container
> oom killer isn't good memory management. It doesn't allow
> handling resource shortage gracefully.

per container OOM killer does not require any container
page reference, you know _what_ tasks belong to the 
container, and you know their _badness_ from the normal
OOM calculations, so doing them for a container is really
straight forward without having any page 'tagging'
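
As a sketch of that idea only: struct task_info and
select_container_victim() below are hypothetical stand-ins, not kernel
code, and the badness value is assumed to come from the usual OOM
heuristics. The selection needs nothing but task membership and score.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task; not task_struct. */
struct task_info {
	int container_id;	/* container the task belongs to */
	unsigned long badness;	/* score from the normal OOM calculation */
};

/*
 * Return the task with the highest badness among those belonging to
 * container_id, or NULL if that container has no tasks.
 */
const struct task_info *select_container_victim(const struct task_info *tasks,
						size_t n, int container_id)
{
	const struct task_info *victim = NULL;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].container_id != container_id)
			continue;
		if (!victim || tasks[i].badness > victim->badness)
			victim = &tasks[i];
	}
	return victim;
}
```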

for the reclamation part, please elaborate how that will
differ in a (shared memory) guest from what the kernel
currently does ...

TIA,
Herbert

> This patchset provides a more graceful way to handle this, but
> full memory management includes accounting of VMA-length
> as well (returning ENOMEM from system call) but we've decided
> to start with RSS.
> 
>> best,
>> Herbert
>> 
>>> ___
>>> Containers mailing list
>>> [EMAIL PROTECTED]
>>> https://lists.osdl.org/mailman/listinfo/containers


Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-11 Thread Gene Heskett
On Sunday 11 March 2007, Mike Galbraith wrote:
>Hi Con,
>
>On Sun, 2007-03-11 at 14:57 +1100, Con Kolivas wrote:
>> What follows this email is a patch series for the latest version of
>> the RSDL cpu scheduler (ie v0.29). I have addressed all bugs that I am
>> able to reproduce in this version so if some people would be kind
>> enough to test if there are any hidden bugs or oops lurking, it would
>> be nice to know in anticipation of putting this back in -mm. Thanks.
>>
>> Full patch for 2.6.21-rc3-mm2:
>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-mm2-rsdl-0
>>.29.patch
>
>I'm seeing a cpu distribution problem running this on my P4 box.
>
>Scenario:
>listening to music collection (mp3) via Amarok.  Enable Amarok
>visualization gforce, and size such that X and gforce each use ~50% cpu.
>Start rip/encode of new CD with grip/lame encoder.  Lame is set to use
>both cpus, at nice 5.  Once the encoders start, they receive
>considerably more cpu than nice 0 X/Gforce, taking ~120% and leaving the
>remaining 80% for X/Gforce and Amarok (when it updates its ~12k entry
>database) to squabble over.
>
>With 2.6.21-rc3,  X/Gforce maintain their ~50% cpu (remain smooth), and
>the encoders (100%cpu bound) get what's left when Amarok isn't eating it.
>
>I plunked the above patch into plain 2.6.21-rc3 and retested to
>eliminate other mm tree differences, and it's repeatable.  The nice 5
>cpu hogs always receive considerably more than the nice 0 sleepers.
>
>   -Mike

Just to comment, I've been running one of the patches between 20-ck1 and 
this latest one, which is building as I type, but I also run gkrellm 
here, version 2.2.9.

Since I have been running this mid-series patch, something has been
killing gkrellm about once a day, and there is nothing in the logs to
indicate a problem.  I see a blink out of the corner of my eye, and it's
gone.  And it always starts right back up from a kmenu click.

No idea if anyone else is experiencing this or not.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
You scratch my tape, and I'll scratch yours.


Re: [RFC][PATCH 2/7] RSS controller core

2007-03-11 Thread Pavel Emelianov
Herbert Poetzl wrote:
> On Sun, Mar 11, 2007 at 12:08:16PM +0300, Pavel Emelianov wrote:
>> Herbert Poetzl wrote:
>>> On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote:
>>>> On Tue, 06 Mar 2007 17:55:29 +0300
>>>> Pavel Emelianov <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> +struct rss_container {
>>>>> + struct res_counter res;
>>>>> + struct list_head page_list;
>>>>> + struct container_subsys_state css;
>>>>> +};
>>>>> +
>>>>> +struct page_container {
>>>>> + struct page *page;
>>>>> + struct rss_container *cnt;
>>>>> + struct list_head list;
>>>>> +};
>>>> ah. This looks good. I'll find a hunk of time to go through this
>>>> work and through Paul's patches. It'd be good to get both patchsets
>>>> lined up in -mm within a couple of weeks. But..
>>> doesn't look so good for me, mainly because of the
>>> additional per page data and per page processing
>>>
>>> on 4GB memory, with 100 guests, 50% shared for each
>>> guest, this basically means ~1mio pages, 500k shared
>>> and 1500k x sizeof(page_container) entries, which
>>> roughly boils down to ~25MB of wasted memory ...
>>>
>>> increase the amount of shared pages and it starts
>>> getting worse, but maybe I'm missing something here
>> You are. Each page has only one page_container associated
>> with it despite the number of containers it is shared
>> between.
>>
>>>> We need to decide whether we want to do per-container memory
>>>> limitation via these data structures, or whether we do it via
>>>> a physical scan of some software zone, possibly based on Mel's
>>>> patches.
>>> why not do simple page accounting (as done currently
>>> in Linux) and use that for the limits, without
>>> keeping the reference from container to page?
>> As I've already answered in my previous letter, simple
>> limiting w/o per-container reclamation and a per-container
>> oom killer isn't good memory management. It doesn't allow
>> handling resource shortage gracefully.
> 
> per container OOM killer does not require any container
> page reference, you know _what_ tasks belong to the 
> container, and you know their _badness_ from the normal
> OOM calculations, so doing them for a container is really
> straight forward without having any page 'tagging'

That's true. If you look at the patches you'll
find that no code in the oom killer uses the page 'tag'.

> for the reclamation part, please elaborate how that will
> differ in a (shared memory) guest from what the kernel
> currently does ...

This is all described in the code and in the
discussions we had before.

> TIA,
> Herbert
> 
>> This patchset provides a more graceful way to handle this, but
>> full memory management includes accounting of VMA-length
>> as well (returning ENOMEM from system call) but we've decided
>> to start with RSS.
>>
>>> best,
>>> Herbert
>>>
>>>> ___
>>>> Containers mailing list
>>>> [EMAIL PROTECTED]
>>>> https://lists.osdl.org/mailman/listinfo/containers


Re: [patch 6/9] signalfd/timerfd - timerfd core ...

2007-03-11 Thread Thomas Gleixner
Davide,

On Sat, 2007-03-10 at 18:22 -0800, Davide Libenzi wrote:

Some remarks:

> +
> +asmlinkage long sys_timerfd(int ufd, int clockid, int tmrtype,
> + const struct timespec __user *utmr)
> +{
> + int error;
> + struct timerfd_ctx *ctx;
> + struct file *file;
> + struct inode *inode;
> + ktime_t tval, tnow;
> + struct timespec ktmr, tmrnow;
> +
> + error = -EFAULT;
> + if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> + goto err_exit;

Please do not use goto for a simple
return -EFAULT;

Please validate the timespec before converting it.

if (!timespec_valid(&ktmr))
return -EINVAL;
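
For readers unfamiliar with the check being requested, here is a
user-space approximation. It is a sketch, not the kernel's
timespec_valid() itself, and the name timespec_is_valid() is made up.
The idea is to reject negative seconds and out-of-range nanoseconds
before any conversion takes place.

```c
#include <assert.h>
#include <time.h>

#define EX_NSEC_PER_SEC 1000000000L

/* Returns 1 if *ts is a normalized, non-negative time, else 0. */
int timespec_is_valid(const struct timespec *ts)
{
	assert(ts != NULL);
	if (ts->tv_sec < 0)
		return 0;
	if (ts->tv_nsec < 0 || ts->tv_nsec >= EX_NSEC_PER_SEC)
		return 0;
	return 1;
}
```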


> + tval = timespec_to_ktime(ktmr);
> + error = -EINVAL;
> + if (clockid != CLOCK_MONOTONIC &&
> + clockid != CLOCK_REALTIME)
> + goto err_exit;
> + switch (tmrtype) {
> + case TFD_TIMER_REL:
> + case TFD_TIMER_SEQ:
> + break;
> + case TFD_TIMER_ABS:
> + getnstimeofday(&tmrnow);
> + tnow = timespec_to_ktime(tmrnow);

tnow = ktime_get();

> + if (ktime_to_ns(tval) <= ktime_to_ns(tnow))
> + goto err_exit;
> + tval = ktime_sub(tval, tnow);

Why do you want to do that ? hrtimers handle relative and absolute
expiry times. You break down everything to relative time and lose the
accuracy for absolute timers. 

> + break;
> + default:
> + goto err_exit;
> + }
> +
> + if (ufd == -1) {
> + error = -ENOMEM;
> + ctx = kmem_cache_alloc(timerfd_ctx_cachep, GFP_KERNEL);
> + if (!ctx)
> + goto err_exit;
> +
> + init_waitqueue_head(&ctx->wqh);
> + spin_lock_init(&ctx->lock);
> + ctx->ticks = 0;
> + ctx->tmrtype = tmrtype;
> + ctx->clockid = clockid;
> + ctx->tval = tval;
> + hrtimer_init(&ctx->tmr, ctx->clockid, HRTIMER_REL);
> + ctx->tmr.expires = ctx->tval;
> + ctx->tmr.function = timerfd_tmrproc;
> +
> + hrtimer_start(&ctx->tmr, ctx->tval, HRTIMER_REL);
> +
> + /*
> +  * When we call this, the initialization must be complete, since
> +  * aino_getfd() will install the fd.
> +  */
> + error = aino_getfd(, , , "[timerfd]",
> +_fops, ctx);
> + if (error)
> + goto err_fdalloc;

Why is the timer started before we have everything in place ? 

Also if you turn it around then the (re)programming part of the timer
can be shared.

> + } else {
> + error = -EBADF;
> + file = fget(ufd);
> + if (!file)
> + goto err_exit;
> + ctx = file->private_data;
> + error = -EINVAL;
> + if (file->f_op != _fops) {
> + fput(file);
> + goto err_exit;
> + }
> +
> + /*
> +  * We need to stop the exiting timer before. We call
> +  * hrtimer_cancel() w/out holding our lock.
> +  */
> + spin_lock_irq(&ctx->lock);
> + while (hrtimer_active(&ctx->tmr)) {
> + spin_unlock_irq(&ctx->lock);
> + hrtimer_cancel(&ctx->tmr);
> + spin_lock_irq(&ctx->lock);
> + }

Please use hrtimer_try_to_cancel()

retry:
	spin_lock_irq(&ctx->lock);
	if (hrtimer_try_to_cancel(&ctx->tmr) < 0) {
		spin_unlock_irq(&ctx->lock);
		cpu_relax();
		goto retry;
	}

> +
> +static unsigned int timerfd_poll(struct file *file, poll_table *wait)
> +{
> + struct timerfd_ctx *ctx = file->private_data;
> +
> + poll_wait(file, &ctx->wqh, wait);
> +
> + return ctx->ticks ? POLLIN: 0;

This is racy:

timer is set up (non periodic)
timer expires
poll 

now poll is stuck for ever !


tglx




Re: Style Question

2007-03-11 Thread Robert Hancock

Cong WANG wrote:

> Hi, list!
>
> I have a question about coding style in linux kernel. In
> Documentation/CodingStyle, it is said that "Linux style for comments is
> the C89 "/* ... */" style. Don't use C99-style "// ..." comments."
> _But_ I see a lot of '//' style comments in current kernel code.
>
> Which is wrong? The documentation or the code, or neither? And why?


The code. As with a lot of coding style issues, it's likely just that
nobody saw it and bothered to complain when it went in.



> Another question is about NULL. AFAIK, in user space, using NULL is
> better than directly using 0 in C. In kernel, I know it used its own
> NULL, which may be defined as ((void*)0), but it's _still_ different
> from raw zero. So can I say using NULL is better than 0 in kernel?


It's the preferred style; Sparse will complain about using 0 for a null
pointer, for example.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



[PATCH] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram

2007-03-11 Thread Avi Kivity
PAGE_MASK is an unsigned long, so using it to mask physical addresses on
i386 (which are 64-bit wide) leads to truncation.  This can result in
page->private of unrelated memory pages being modified, with disastrous
results.

Fix by not using PAGE_MASK for physical addresses; instead calculate
the correct value directly from PAGE_SIZE.  Also fix a similar BUG_ON().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index 2cb4893..e85b4c7 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -131,7 +131,7 @@ static int dbg = 1;
(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
 
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & PAGE_MASK)
+#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
 #define PT64_DIR_BASE_ADDR_MASK \
(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 
@@ -406,8 +406,8 @@ static void rmap_write_protect(struct kvm_vcpu *vcpu, u64 
gfn)
spte = desc->shadow_ptes[0];
}
BUG_ON(!spte);
-   BUG_ON((*spte & PT64_BASE_ADDR_MASK) !=
-  page_to_pfn(page) << PAGE_SHIFT);
+   BUG_ON((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT
+  != page_to_pfn(page));
BUG_ON(!(*spte & PT_PRESENT_MASK));
BUG_ON(!(*spte & PT_WRITABLE_MASK));
rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
-- 
1.5.0.2
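
The truncation described in the commit message above can be reproduced
in user space. This is a sketch under stated assumptions: uint32_t
stands in for i386's 32-bit unsigned long, and the EX_ macros and the
two function names are invented for the example rather than taken from
the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)

/* Stand-in for i386's 32-bit unsigned long; an assumption of this sketch. */
typedef uint32_t ulong32_t;

static_assert(sizeof(ulong32_t) == 4, "sketch models a 32-bit unsigned long");

/* 32-bit mask: widens to 0x00000000fffff000 when used with a u64. */
#define EX_PAGE_MASK_32 ((ulong32_t)~(EX_PAGE_SIZE - 1))

/* Mask computed in 64 bits, in the style of ~(u64)(PAGE_SIZE-1). */
#define EX_PAGE_MASK_64 (~(uint64_t)(EX_PAGE_SIZE - 1))

/* Old behaviour: address bits 32 and above are silently cleared. */
uint64_t base_addr_truncated(uint64_t pte)
{
	return pte & (((1ULL << 52) - 1) & EX_PAGE_MASK_32);
}

/* Fixed behaviour: all 52 physical address bits survive. */
uint64_t base_addr_correct(uint64_t pte)
{
	return pte & (((1ULL << 52) - 1) & EX_PAGE_MASK_64);
}
```

With a pte of 0x123456067 (a page above 4GB plus flag bits), the first
function yields 0x23456000, which is the address of a different page,
while the second yields 0x123456000.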



[PATCH] KVM: MMU: Fix guest writes to nonpae pde

2007-03-11 Thread Avi Kivity
KVM shadow page tables are always in pae mode, regardless of the guest
setting.  This means that a guest pde (mapping 4MB of memory) is mapped
to two shadow pdes (mapping 2MB each).

When the guest writes to a pte or pde, we intercept the write and emulate it.
We also remove any shadowed mappings corresponding to the write.  Since the
mmu did not account for the doubling in the number of pdes, it removed the
wrong entry, resulting in a mismatch between shadow page tables and guest
page tables, followed shortly by guest memory corruption.

This patch fixes the problem by detecting the special case of writing to
a non-pae pde and adjusting the address and number of shadow pdes zapped
accordingly.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/mmu.c |   46 ++
 1 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/kvm/mmu.c b/drivers/kvm/mmu.c
index a1a9336..2cb4893 100644
--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -1093,22 +1093,40 @@ out:
return r;
 }
 
+static void mmu_pre_write_zap_pte(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *page,
+ u64 *spte)
+{
+   u64 pte;
+   struct kvm_mmu_page *child;
+
+   pte = *spte;
+   if (is_present_pte(pte)) {
+   if (page->role.level == PT_PAGE_TABLE_LEVEL)
+   rmap_remove(vcpu, spte);
+   else {
+   child = page_header(pte & PT64_BASE_ADDR_MASK);
+   mmu_page_remove_parent_pte(vcpu, child, spte);
+   }
+   }
+   *spte = 0;
+}
+
 void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes)
 {
gfn_t gfn = gpa >> PAGE_SHIFT;
struct kvm_mmu_page *page;
-   struct kvm_mmu_page *child;
struct hlist_node *node, *n;
struct hlist_head *bucket;
unsigned index;
u64 *spte;
-   u64 pte;
unsigned offset = offset_in_page(gpa);
unsigned pte_size;
unsigned page_offset;
unsigned misaligned;
int level;
int flooded = 0;
+   int npte;
 
pgprintk("%s: gpa %llx bytes %d\n", __FUNCTION__, gpa, bytes);
if (gfn == vcpu->last_pt_write_gfn) {
@@ -1144,22 +1162,26 @@ void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t 
gpa, int bytes)
}
page_offset = offset;
level = page->role.level;
+   npte = 1;
if (page->role.glevels == PT32_ROOT_LEVEL) {
-   page_offset <<= 1;  /* 32->64 */
+   page_offset <<= 1;  /* 32->64 */
+   /*
+* A 32-bit pde maps 4MB while the shadow pdes map
+* only 2MB.  So we need to double the offset again
+* and zap two pdes instead of one.
+*/
+   if (level == PT32_ROOT_LEVEL) {
+   page_offset <<= 1;
+   npte = 2;
+   }
page_offset &= ~PAGE_MASK;
}
spte = __va(page->page_hpa);
spte += page_offset / sizeof(*spte);
-   pte = *spte;
-   if (is_present_pte(pte)) {
-   if (level == PT_PAGE_TABLE_LEVEL)
-   rmap_remove(vcpu, spte);
-   else {
-   child = page_header(pte & PT64_BASE_ADDR_MASK);
-   mmu_page_remove_parent_pte(vcpu, child, spte);
-   }
+   while (npte--) {
+   mmu_pre_write_zap_pte(vcpu, page, spte);
+   ++spte;
}
-   *spte = 0;
}
 }
 
-- 
1.5.0.2
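
The offset adjustment in the patch above can be sketched in isolation.
The struct and function names here are hypothetical, and the sketch
assumes 4KB pages, 4-byte guest entries, 8-byte shadow entries, and
that level 2 is the non-PAE page directory level (PT32_ROOT_LEVEL).

```c
#include <assert.h>

#define EX_PAGE_SIZE       4096u
#define EX_PT32_ROOT_LEVEL 2	/* assumed level of a non-PAE page directory */

struct zap_range {
	unsigned offset;	/* byte offset of the first shadow entry */
	int npte;		/* number of shadow entries to zap */
};

/* Map a byte offset in a 32-bit guest table to the shadow entries to zap. */
struct zap_range shadow_zap_range(unsigned guest_offset, int level)
{
	struct zap_range r = { guest_offset, 1 };

	assert(guest_offset < EX_PAGE_SIZE);
	r.offset <<= 1;			/* 4-byte guest -> 8-byte shadow entries */
	if (level == EX_PT32_ROOT_LEVEL) {
		r.offset <<= 1;		/* one 4MB pde -> two 2MB shadow pdes */
		r.npte = 2;
	}
	r.offset &= EX_PAGE_SIZE - 1;	/* stay within one shadow page */
	return r;
}
```

For example, a write to guest pde 1 (byte offset 4) at the directory
level maps to shadow byte offset 16, that is shadow pdes 2 and 3.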



[PATCH 0/2] KVM: More fixes for 2.6.21-rc3

2007-03-11 Thread Avi Kivity
This patchset contains fixes I plan to submit pre 2.6.21: a fix for
large memory 32-bit hosts, and a fix for non-pae 32-bit guests.




Re: [kvm-devel] [PATCH] KVM: MMU: Fix guest writes to nonpae pde

2007-03-11 Thread Ingo Molnar

* Avi Kivity <[EMAIL PROTECTED]> wrote:

> KVM shadow page tables are always in pae mode, regardless of the guest 
> setting.  This means that a guest pde (mapping 4MB of memory) is 
> mapped to two shadow pdes (mapping 2MB each).
> 
> When the guest writes to a pte or pde, we intercept the write and 
> emulate it. We also remove any shadowed mappings corresponding to the 
> write.  Since the mmu did not account for the doubling in the number 
> of pdes, it removed the wrong entry, resulting in a mismatch between 
> shadow page tables and guest page tables, followed shortly by guest 
> memory corruption.
> 
> This patch fixes the problem by detecting the special case of writing 
> to a non-pae pde and adjusting the address and number of shadow pdes 
> zapped accordingly.
> 
> Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>

tested this with both PAE and non-PAE Linux host and guest - works fine.

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Ingo


Re: [kvm-devel] [PATCH] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram

2007-03-11 Thread Ingo Molnar

* Avi Kivity <[EMAIL PROTECTED]> wrote:

> PAGE_MASK is an unsigned long, so using it to mask physical addresses 
> on i386 (which are 64-bit wide) leads to truncation.  This can result 
> in page->private of unrelated memory pages being modified, with 
> disastrous results.
> 
> Fix by not using PAGE_MASK for physical addresses; instead calculate 
> the correct value directly from PAGE_SIZE.  Also fix a similar 
> BUG_ON().
> 
> Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>

i have tested this, albeit with less than 4GB RAM.

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Ingo


[patch] KVM: always reload segment selectors

2007-03-11 Thread Ingo Molnar
Subject: [patch] KVM: always reload segment selectors
From: Ingo Molnar <[EMAIL PROTECTED]>

failed VM entry on VMX might still change %fs or %gs, thus make sure 
that KVM always reloads the segment selectors. This is crucial on both
x86 and x86_64: x86 has __KERNEL_PDA in %fs on which things like 
'current' depends and x86_64 has 0 there and needs MSR_GS_BASE to work.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c |   37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

Index: linux/drivers/kvm/vmx.c
===
--- linux.orig/drivers/kvm/vmx.c
+++ linux/drivers/kvm/vmx.c
@@ -1896,6 +1896,27 @@ again:
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
  : "cc", "memory" );
 
+   /*
+* Reload segment selectors ASAP. (it's needed for a functional
+* kernel: x86 relies on having __KERNEL_PDA in %fs and x86_64
+* relies on having 0 in %gs for the CPU PDA to work.)
+*/
+   if (fs_gs_ldt_reload_needed) {
+   load_ldt(ldt_sel);
+   load_fs(fs_sel);
+   /*
+* If we have to reload gs, we must take care to
+* preserve our gs base.
+*/
+   local_irq_disable();
+   load_gs(gs_sel);
+#ifdef CONFIG_X86_64
+   wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
+#endif
+   local_irq_enable();
+
+   reload_tss();
+   }
++kvm_stat.exits;
 
save_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
@@ -1913,22 +1934,6 @@ again:
kvm_run->exit_reason = vmcs_read32(VM_INSTRUCTION_ERROR);
r = 0;
} else {
-   if (fs_gs_ldt_reload_needed) {
-   load_ldt(ldt_sel);
-   load_fs(fs_sel);
-   /*
-* If we have to reload gs, we must take care to
-* preserve our gs base.
-*/
-   local_irq_disable();
-   load_gs(gs_sel);
-#ifdef CONFIG_X86_64
-   wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
-#endif
-   local_irq_enable();
-
-   reload_tss();
-   }
/*
 * Profile KVM exit RIPs:
 */


Re: [patch] KVM: always reload segment selectors

2007-03-11 Thread Avi Kivity

Ingo Molnar wrote:

Subject: [patch] KVM: always reload segment selectors
From: Ingo Molnar <[EMAIL PROTECTED]>

failed VM entry on VMX might still change %fs or %gs, thus make sure 
that KVM always reloads the segment selectors. This is crucial on both
x86 and x86_64: x86 has __KERNEL_PDA in %fs on which things like 
'current' depends and x86_64 has 0 there and needs MSR_GS_BASE to work.


Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 drivers/kvm/vmx.c |   37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

Index: linux/drivers/kvm/vmx.c
===
--- linux.orig/drivers/kvm/vmx.c
+++ linux/drivers/kvm/vmx.c
@@ -1896,6 +1896,27 @@ again:
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
  : "cc", "memory" );
 
+	/*

+* Reload segment selectors ASAP. (it's needed for a functional
+* kernel: x86 relies on having __KERNEL_PDA in %fs and x86_64
+* relies on having 0 in %gs for the CPU PDA to work.)
+*/
+   if (fs_gs_ldt_reload_needed) {
+   load_ldt(ldt_sel);
+   load_fs(fs_sel);
+   /*
+* If we have to reload gs, we must take care to
+* preserve our gs base.
+*/
+   local_irq_disable();
+   load_gs(gs_sel);
+#ifdef CONFIG_X86_64
+   wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
+#endif
+   local_irq_enable();
+
+   reload_tss();
+   }
++kvm_stat.exits;
 
 	save_msrs(vcpu->guest_msrs, NR_BAD_MSRS);


btw, looking at the code, we could just remove fs from the
fs_gs_ldt_reload_needed check and make it unconditional.  VT knows how to reload
segments, except if they're user segments (groan).  In the case of fs, 
if it's used for the pda, it's obviously a kernel segment.


gs is different: since only the segment base is loaded (via swapgs), the 
selector part could well be a userspace selector, and thus the 
irq-protected reload is needed.


Anyway, I'm applying the patch as the above discourse is irrelevant to 
the fix.



--
error compiling committee.c: too many arguments to function



[PATCH 0/15] KVM userspace interface updates

2007-03-11 Thread Avi Kivity
This patchset updates the kvm userspace interface to what I hope will
be the long-term stable interface.  Provisions are included for extending
the interface later.  The patches address performance and cleanliness
concerns.

One patch is missing -- I'd like the string pio transfers not to include
guest virtual addresses.  To date all my attempts to write the patch ended
with me losing consciousness.  Hopefully I'll manage it soon.

I'd like to submit the patchset post 2.6.21.  Comments are welcome.


[PATCH 03/15] KVM: Initialize PIO I/O count

2007-03-11 Thread Avi Kivity
This allows userspace to ignore the io.rep field.  Not a big deal, but
friendly.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |1 +
 drivers/kvm/vmx.c |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b176f5a..c35b8c8 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1037,6 +1037,7 @@ static int io_interception(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
kvm_run->io.size = ((io_info & SVM_IOIO_SIZE_MASK) >> 
SVM_IOIO_SIZE_SHIFT);
kvm_run->io.string = (io_info & SVM_IOIO_STR_MASK) != 0;
kvm_run->io.rep = (io_info & SVM_IOIO_REP_MASK) != 0;
+   kvm_run->io.count = 1;
 
if (kvm_run->io.string) {
unsigned addr_mask;
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index 7fd572a..d4c9f33 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1459,6 +1459,7 @@ static int handle_io(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
= (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_DF) != 0;
kvm_run->io.rep = (exit_qualification & 32) != 0;
kvm_run->io.port = exit_qualification >> 16;
+   kvm_run->io.count = 1;
if (kvm_run->io.string) {
if (!get_io_count(vcpu, &kvm_run->io.count))
return 1;
-- 
1.5.0.2



[PATCH 04/15] KVM: Handle cpuid in the kernel instead of punting to userspace

2007-03-11 Thread Avi Kivity
KVM used to handle cpuid by letting userspace decide what values to
return to the guest.  We now handle cpuid completely in the kernel.  We
still let userspace decide which values the guest will see by having
userspace set up the value table beforehand (this is necessary to allow
management software to set the cpu features to the least common denominator,
so that live migration can work).

The motivation for the change is that kvm kernel code can be impacted by
cpuid features, for example the x86 emulator.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |5 +++
 drivers/kvm/kvm_main.c |   69 
 drivers/kvm/svm.c  |4 +-
 drivers/kvm/vmx.c  |4 +-
 include/linux/kvm.h|   18 -
 5 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 59cbc5b..be3a0e7 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -55,6 +55,7 @@
 #define KVM_NUM_MMU_PAGES 256
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
+#define KVM_MAX_CPUID_ENTRIES 40
 
 #define FX_IMAGE_SIZE 512
 #define FX_IMAGE_ALIGN 16
@@ -286,6 +287,9 @@ struct kvm_vcpu {
u32 ar;
} tr, es, ds, fs, gs;
} rmode;
+
+   int cpuid_nent;
+   struct kvm_cpuid_entry cpuid_entries[KVM_MAX_CPUID_ENTRIES];
 };
 
 struct kvm_memory_slot {
@@ -446,6 +450,7 @@ void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, 
unsigned long value,
 
 struct x86_emulate_ctxt;
 
+void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int emulate_clts(struct kvm_vcpu *vcpu);
 int emulator_get_dr(struct x86_emulate_ctxt* ctxt, int dr,
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 8a4984d..347467e 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1504,6 +1504,43 @@ void save_msrs(struct vmx_msr_entry *e, int n)
 }
 EXPORT_SYMBOL_GPL(save_msrs);
 
+void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
+{
+   int i;
+   u32 function;
+   struct kvm_cpuid_entry *e, *best;
+
+   kvm_arch_ops->cache_regs(vcpu);
+   function = vcpu->regs[VCPU_REGS_RAX];
+   vcpu->regs[VCPU_REGS_RAX] = 0;
+   vcpu->regs[VCPU_REGS_RBX] = 0;
+   vcpu->regs[VCPU_REGS_RCX] = 0;
+   vcpu->regs[VCPU_REGS_RDX] = 0;
+   best = NULL;
+   for (i = 0; i < vcpu->cpuid_nent; ++i) {
+   e = &vcpu->cpuid_entries[i];
+   if (e->function == function) {
+   best = e;
+   break;
+   }
+   /*
+* Both basic or both extended?
+*/
+   if (((e->function ^ function) & 0x8000) == 0)
+   if (!best || e->function > best->function)
+   best = e;
+   }
+   if (best) {
+   vcpu->regs[VCPU_REGS_RAX] = best->eax;
+   vcpu->regs[VCPU_REGS_RBX] = best->ebx;
+   vcpu->regs[VCPU_REGS_RCX] = best->ecx;
+   vcpu->regs[VCPU_REGS_RDX] = best->edx;
+   }
+   kvm_arch_ops->decache_regs(vcpu);
+   kvm_arch_ops->skip_emulated_instruction(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
+
 static void complete_pio(struct kvm_vcpu *vcpu)
 {
struct kvm_io *io = &vcpu->run->io;
@@ -2075,6 +2112,26 @@ out:
return r;
 }
 
+static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
+   struct kvm_cpuid *cpuid,
+   struct kvm_cpuid_entry __user *entries)
+{
+   int r;
+
+   r = -E2BIG;
+   if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
+   goto out;
+   r = -EFAULT;
+   if (copy_from_user(&vcpu->cpuid_entries, entries,
+  cpuid->nent * sizeof(struct kvm_cpuid_entry)))
+   goto out;
+   vcpu->cpuid_nent = cpuid->nent;
+   return 0;
+
+out:
+   return r;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -2181,6 +2238,18 @@ static long kvm_vcpu_ioctl(struct file *filp,
case KVM_SET_MSRS:
r = msr_io(vcpu, argp, do_set_msr, 0);
break;
+   case KVM_SET_CPUID: {
+   struct kvm_cpuid __user *cpuid_arg = argp;
+   struct kvm_cpuid cpuid;
+
+   r = -EFAULT;
+   if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
+   goto out;
+   r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries);
+   if (r)
+   goto out;
+   break;
+   }
default:
;
}
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index c35b8c8..d4b2936 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1101,8 +1101,8 @@ static int task_switch_interception(struct kvm_vcpu 
*vcpu, struct kvm_run *kvm_r
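
The table lookup performed by kvm_emulate_cpuid() in the patch above
can be sketched on its own. The struct below is a trimmed, hypothetical
stand-in for the patch's entry type, and the sketch assumes that basic
and extended functions are told apart by bit 31 of the function number
(0x80000000, where CPUID's extended range starts). An exact match wins;
otherwise the highest function in the same range is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down entry; the patch's struct has more fields. */
struct cpuid_entry {
	uint32_t function;
	uint32_t eax;
};

#define EX_CPUID_EXTENDED_BIT 0x80000000u	/* assumed range marker */

/* Return the entry to report for 'function', or NULL if none applies. */
const struct cpuid_entry *cpuid_lookup(const struct cpuid_entry *table,
				       size_t n, uint32_t function)
{
	const struct cpuid_entry *best = NULL;
	size_t i;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct cpuid_entry *e = &table[i];

		if (e->function == function)
			return e;	/* exact match wins */
		/* Same range (both basic or both extended)? Keep the highest. */
		if (((e->function ^ function) & EX_CPUID_EXTENDED_BIT) == 0)
			if (!best || e->function > best->function)
				best = e;
	}
	return best;
}
```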
 

[PATCH 01/15] KVM: Use a shared page for kernel/user communication when running a vcpu

2007-03-11 Thread Avi Kivity
Instead of passing a 'struct kvm_run' back and forth between the kernel and
userspace, allocate a page and allow the user to mmap() it.  This reduces
needless copying and makes the interface expandable by providing lots of
free space.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
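A minimal userspace sketch of the mapping step, for illustration only. It
is not part of the patch: map_run_page() is a hypothetical helper name, and
the one-page size is an assumption taken from the alloc_page() call in the
diff. Any descriptor that supports mmap() at offset 0 can stand in for the
vcpu fd when trying it out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): map the single shared page
 * that a vcpu fd exports at offset 0.  Returns NULL on failure.
 */
void *map_run_page(int vcpu_fd)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *p;

	if (page_size <= 0)
		return NULL;

	/* MAP_SHARED so both sides look at the same memory, no copying. */
	p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		 MAP_SHARED, vcpu_fd, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

With a real vcpu fd the returned pointer would be cast to struct kvm_run *
and read after each KVM_RUN, which now takes no argument.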
 drivers/kvm/kvm.h  |1 +
 drivers/kvm/kvm_main.c |   54 +++
 include/linux/kvm.h|6 ++--
 3 files changed, 44 insertions(+), 17 deletions(-)
 mode change 100755 => 100644 drivers/kvm/kvm_main.c

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 0d122bf..901b8d9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -228,6 +228,7 @@ struct kvm_vcpu {
struct mutex mutex;
int   cpu;
int   launched;
+   struct kvm_run *run;
int interrupt_window_open;
unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
 #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
old mode 100755
new mode 100644
index 946ed86..42be8a8
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -355,6 +355,8 @@ static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
kvm_mmu_destroy(vcpu);
vcpu_put(vcpu);
kvm_arch_ops->vcpu_free(vcpu);
+   free_page((unsigned long)vcpu->run);
+   vcpu->run = NULL;
 }
 
 static void kvm_free_vcpus(struct kvm *kvm)
@@ -1887,6 +1889,33 @@ static int kvm_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
return r;
 }
 
+static struct page *kvm_vcpu_nopage(struct vm_area_struct *vma,
+   unsigned long address,
+   int *type)
+{
+   struct kvm_vcpu *vcpu = vma->vm_file->private_data;
+   unsigned long pgoff;
+   struct page *page;
+
+   *type = VM_FAULT_MINOR;
+   pgoff = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+   if (pgoff != 0)
+   return NOPAGE_SIGBUS;
+   page = virt_to_page(vcpu->run);
+   get_page(page);
+   return page;
+}
+
+static struct vm_operations_struct kvm_vcpu_vm_ops = {
+   .nopage = kvm_vcpu_nopage,
+};
+
+static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   vma->vm_ops = &kvm_vcpu_vm_ops;
+   return 0;
+}
+
 static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 {
struct kvm_vcpu *vcpu = filp->private_data;
@@ -1899,6 +1928,7 @@ static struct file_operations kvm_vcpu_fops = {
.release= kvm_vcpu_release,
.unlocked_ioctl = kvm_vcpu_ioctl,
.compat_ioctl   = kvm_vcpu_ioctl,
+   .mmap   = kvm_vcpu_mmap,
 };
 
 /*
@@ -1947,6 +1977,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 {
int r;
struct kvm_vcpu *vcpu;
+   struct page *page;
 
r = -EINVAL;
if (!valid_vcpu(n))
@@ -1961,6 +1992,12 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
return -EEXIST;
}
 
+   page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   r = -ENOMEM;
+   if (!page)
+   goto out_unlock;
+   vcpu->run = page_address(page);
+
vcpu->host_fx_image = (char*)ALIGN((hva_t)vcpu->fx_buf,
   FX_IMAGE_ALIGN);
vcpu->guest_fx_image = vcpu->host_fx_image + FX_IMAGE_SIZE;
@@ -1990,6 +2027,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, int n)
 
 out_free_vcpus:
kvm_free_vcpu(vcpu);
+out_unlock:
mutex_unlock(&vcpu->mutex);
 out:
return r;
@@ -2003,21 +2041,9 @@ static long kvm_vcpu_ioctl(struct file *filp,
int r = -EINVAL;
 
switch (ioctl) {
-   case KVM_RUN: {
-   struct kvm_run kvm_run;
-
-   r = -EFAULT;
-   if (copy_from_user(&kvm_run, argp, sizeof kvm_run))
-   goto out;
-   r = kvm_vcpu_ioctl_run(vcpu, &kvm_run);
-   if (r < 0 &&  r != -EINTR)
-   goto out;
-   if (copy_to_user(argp, &kvm_run, sizeof kvm_run)) {
-   r = -EFAULT;
-   goto out;
-   }
+   case KVM_RUN:
+   r = kvm_vcpu_ioctl_run(vcpu, vcpu->run);
break;
-   }
case KVM_GET_REGS: {
struct kvm_regs kvm_regs;
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 275354f..d88e750 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_API_VERSION 4
+#define KVM_API_VERSION 5
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -49,7 +49,7 @@ enum kvm_exit_reason {
KVM_EXIT_SHUTDOWN = 8,
 };
 
-/* for KVM_RUN */
+/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
/* in */
__u32 emulated;  /* skip current instruction */
@@ -233,7 +233,7 @@ struct kvm_dirty_log {
 /*
  * ioctls 

[PATCH 12/15] KVM: Initialize the apic_base msr on svm too

2007-03-11 Thread Avi Kivity
Older userspace didn't care, but newer userspace (with the cpuid changes)
does.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/svm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 0311665..2396ada 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -582,6 +582,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
init_vmcb(vcpu->svm->vmcb);
 
fx_init(vcpu);
+   vcpu->apic_base = 0xfee00000 |
+   /*for vcpu 0*/ MSR_IA32_APICBASE_BSP |
+   MSR_IA32_APICBASE_ENABLE;
 
return 0;
 
-- 
1.5.0.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 07/15] KVM: Renumber ioctls

2007-03-11 Thread Avi Kivity
The recent changes have left the ioctl numbers in complete disarray.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
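The new layout appears to reserve one number range per fd type. The ranges
below (0x00 and up for /dev/kvm, 0x40 and up for VM fds, 0x80 and up for
vcpu fds) are read off the diff rather than stated anywhere in the patch,
and kvm_ioctl_target() is a hypothetical helper, so treat this as a sketch
of the apparent convention and nothing more.

```c
#include <linux/ioctl.h>

/* Which kind of file descriptor an ioctl number is meant for. */
enum kvm_fd_kind { KVM_FD_DEV, KVM_FD_VM, KVM_FD_VCPU };

/*
 * Hypothetical helper: classify a KVM ioctl by the sequence number
 * encoded in it.  Range boundaries are an assumption taken from the
 * renumbered definitions in this patch.
 */
enum kvm_fd_kind kvm_ioctl_target(unsigned int cmd)
{
	unsigned int nr = _IOC_NR(cmd);

	if (nr >= 0x80)
		return KVM_FD_VCPU;
	if (nr >= 0x40)
		return KVM_FD_VM;
	return KVM_FD_DEV;
}
```

Grouping by range leaves room to add ioctls to one fd type without
disturbing the numbers of the others.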
 include/linux/kvm.h |   34 +-
 1 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index d89189a..93472da 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -229,34 +229,34 @@ struct kvm_cpuid {
 /*
  * ioctls for /dev/kvm fds:
  */
-#define KVM_GET_API_VERSION   _IO(KVMIO, 1)
-#define KVM_CREATE_VM _IO(KVMIO, 2) /* returns a VM fd */
-#define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 15, struct kvm_msr_list)
+#define KVM_GET_API_VERSION   _IO(KVMIO,   0x00)
+#define KVM_CREATE_VM _IO(KVMIO,   0x01) /* returns a VM fd */
+#define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list)
 
 /*
  * ioctls for VM fds
  */
-#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 10, struct kvm_memory_region)
+#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
  */
-#define KVM_CREATE_VCPU   _IO(KVMIO, 11)
-#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 12, struct kvm_dirty_log)
+#define KVM_CREATE_VCPU   _IO(KVMIO,  0x41)
+#define KVM_GET_DIRTY_LOG _IOW(KVMIO, 0x42, struct kvm_dirty_log)
 
 /*
  * ioctls for vcpu fds
  */
-#define KVM_RUN   _IO(KVMIO, 16)
-#define KVM_GET_REGS  _IOR(KVMIO, 3, struct kvm_regs)
-#define KVM_SET_REGS  _IOW(KVMIO, 4, struct kvm_regs)
-#define KVM_GET_SREGS _IOR(KVMIO, 5, struct kvm_sregs)
-#define KVM_SET_SREGS _IOW(KVMIO, 6, struct kvm_sregs)
-#define KVM_TRANSLATE _IOWR(KVMIO, 7, struct kvm_translation)
-#define KVM_INTERRUPT _IOW(KVMIO, 8, struct kvm_interrupt)
-#define KVM_DEBUG_GUEST   _IOW(KVMIO, 9, struct kvm_debug_guest)
-#define KVM_GET_MSRS  _IOWR(KVMIO, 13, struct kvm_msrs)
-#define KVM_SET_MSRS  _IOW(KVMIO, 14, struct kvm_msrs)
-#define KVM_SET_CPUID _IOW(KVMIO, 17, struct kvm_cpuid)
+#define KVM_RUN   _IO(KVMIO,   0x80)
+#define KVM_GET_REGS  _IOR(KVMIO,  0x81, struct kvm_regs)
+#define KVM_SET_REGS  _IOW(KVMIO,  0x82, struct kvm_regs)
+#define KVM_GET_SREGS _IOR(KVMIO,  0x83, struct kvm_sregs)
+#define KVM_SET_SREGS _IOW(KVMIO,  0x84, struct kvm_sregs)
+#define KVM_TRANSLATE _IOWR(KVMIO, 0x85, struct kvm_translation)
+#define KVM_INTERRUPT _IOW(KVMIO,  0x86, struct kvm_interrupt)
+#define KVM_DEBUG_GUEST   _IOW(KVMIO,  0x87, struct kvm_debug_guest)
+#define KVM_GET_MSRS  _IOWR(KVMIO, 0x88, struct kvm_msrs)
+#define KVM_SET_MSRS  _IOW(KVMIO,  0x89, struct kvm_msrs)
+#define KVM_SET_CPUID _IOW(KVMIO,  0x8a, struct kvm_cpuid)
 
 #endif
-- 
1.5.0.2



[PATCH 06/15] KVM: Remove minor wart from KVM_CREATE_VCPU ioctl

2007-03-11 Thread Avi Kivity
That ioctl does not transfer any data, so it should be an _IO rather than an
_IOW.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
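To make the difference concrete, the sketch below decodes both encodings
with the standard _IOC_DIR() and _IOC_SIZE() macros. OLD_KVM_CREATE_VCPU,
NEW_KVM_CREATE_VCPU, ioctl_dir() and ioctl_size() are names invented for
this illustration; the KVMIO value is the one defined in include/linux/kvm.h
in this series.

```c
#include <linux/ioctl.h>

#define KVMIO 0xAE	/* as defined in include/linux/kvm.h */

/* The two encodings of KVM_CREATE_VCPU, before and after this patch. */
#define OLD_KVM_CREATE_VCPU	_IOW(KVMIO, 11, int)
#define NEW_KVM_CREATE_VCPU	_IO(KVMIO, 11)

/* Direction bits encoded in an ioctl number (_IOC_NONE, _IOC_WRITE, ...). */
unsigned int ioctl_dir(unsigned int cmd)
{
	return _IOC_DIR(cmd);
}

/* Size of the argument structure encoded in an ioctl number. */
unsigned int ioctl_size(unsigned int cmd)
{
	return _IOC_SIZE(cmd);
}
```

The old number advertises a sizeof(int) buffer copied from userspace, while
the vcpu slot is in fact passed by value in the ioctl argument itself.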
 include/linux/kvm.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c6dd4a7..d89189a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -241,7 +241,7 @@ struct kvm_cpuid {
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
  */
-#define KVM_CREATE_VCPU   _IOW(KVMIO, 11, int)
+#define KVM_CREATE_VCPU   _IO(KVMIO, 11)
 #define KVM_GET_DIRTY_LOG _IOW(KVMIO, 12, struct kvm_dirty_log)
 
 /*
-- 
1.5.0.2



[PATCH 08/15] KVM: Add method to check for backwards-compatible API extensions

2007-03-11 Thread Avi Kivity
Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
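A possible userspace wrapper, sketched under assumptions: the ioctl number
is copied from the definition added in this patch, and kvm_has_extension()
is a hypothetical name. Any error (for example a kernel that predates this
ioctl, or a bad fd) is treated the same as "extension not available".

```c
#include <sys/ioctl.h>

#define KVMIO 0xAE	/* as defined in include/linux/kvm.h */
#define KVM_CHECK_EXTENSION	_IO(KVMIO, 0x03)

/*
 * Hypothetical helper: returns 1 if the kernel reports the extension,
 * 0 if it does not or if the ioctl fails.
 */
int kvm_has_extension(int kvm_fd, unsigned long ext)
{
	/* The extension number is passed by value, not by pointer. */
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, ext) > 0;
}
```

Folding errors into "no" lets callers probe for a feature without first
checking KVM_GET_API_VERSION.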
 drivers/kvm/kvm_main.c |6 ++
 include/linux/kvm.h|5 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 747966e..376538c 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2416,6 +2416,12 @@ static long kvm_dev_ioctl(struct file *filp,
r = 0;
break;
}
+   case KVM_CHECK_EXTENSION:
+   /*
+* No extensions defined at present.
+*/
+   r = 0;
+   break;
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 93472da..c93cf53 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -232,6 +232,11 @@ struct kvm_cpuid {
 #define KVM_GET_API_VERSION   _IO(KVMIO,   0x00)
 #define KVM_CREATE_VM _IO(KVMIO,   0x01) /* returns a VM fd */
 #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list)
+/*
+ * Check if a kvm extension is available.  Argument is extension number,
+ * return is 1 (yes) or 0 (no, sorry).
+ */
+#define KVM_CHECK_EXTENSION   _IO(KVMIO,   0x03)
 
 /*
  * ioctls for VM fds
-- 
1.5.0.2



[PATCH 14/15] KVM: Allow kernel to select size of mmap() buffer

2007-03-11 Thread Avi Kivity
This allows us to store offsets in the kernel/user kvm_run area, and be
sure that userspace has them mapped.  As offsets can be outside the
kvm_run struct, userspace has no way of knowing how much to mmap.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
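On the userspace side, the size query might be wrapped as sketched below.
kvm_vcpu_mmap_size() is a hypothetical helper; the ioctl number is copied
from the definition added in this patch. The argument must be zero, since
the handler returns -EINVAL for anything else.

```c
#include <sys/ioctl.h>

#define KVMIO 0xAE	/* as defined in include/linux/kvm.h */
#define KVM_GET_VCPU_MMAP_SIZE	_IO(KVMIO, 0x04)

/*
 * Hypothetical helper: ask /dev/kvm how many bytes to pass to
 * mmap(vcpu_fd).  Returns the size in bytes, or -1 on failure.
 */
long kvm_vcpu_mmap_size(int kvm_fd)
{
	int r = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0UL);

	return r < 0 ? -1 : r;
}
```

The returned value would then replace any hard-coded page size in the
mmap() call on the vcpu fd.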
 drivers/kvm/kvm_main.c |8 +++-
 include/linux/kvm.h|4 
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index ed95c9b..b81f007 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -2436,7 +2436,7 @@ static long kvm_dev_ioctl(struct file *filp,
  unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
-   int r = -EINVAL;
+   long r = -EINVAL;
 
switch (ioctl) {
case KVM_GET_API_VERSION:
@@ -2478,6 +2478,12 @@ static long kvm_dev_ioctl(struct file *filp,
 */
r = 0;
break;
+   case KVM_GET_VCPU_MMAP_SIZE:
+   r = -EINVAL;
+   if (arg)
+   goto out;
+   r = PAGE_SIZE;
+   break;
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c0d10cd..dad9081 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -253,6 +253,10 @@ struct kvm_signal_mask {
  * return is 1 (yes) or 0 (no, sorry).
  */
 #define KVM_CHECK_EXTENSION   _IO(KVMIO,   0x03)
+/*
+ * Get size for mmap(vcpu_fd)
+ */
+#define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO,   0x04) /* in bytes */
 
 /*
  * ioctls for VM fds
-- 
1.5.0.2



[PATCH 13/15] KVM: Add guest mode signal mask

2007-03-11 Thread Avi Kivity
Allow a special signal mask to be used while executing in guest mode.  This
allows signals to be used to interrupt a vcpu without requiring signal
delivery to a userspace handler, which is quite expensive.  Userspace still
receives -EINTR and can get the signal via sigwait().

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
 drivers/kvm/kvm.h  |3 +++
 drivers/kvm/kvm_main.c |   41 +
 include/linux/kvm.h|7 +++
 3 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index be3a0e7..1c4a581 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -277,6 +277,9 @@ struct kvm_vcpu {
gpa_t mmio_phys_addr;
int pio_pending;
 
+   int sigset_active;
+   sigset_t sigset;
+
struct {
int active;
u8 save_iopl;
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0e28f58..ed95c9b 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1591,9 +1591,13 @@ static void complete_pio(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
int r;
+   sigset_t sigsaved;
 
vcpu_load(vcpu);
 
+   if (vcpu->sigset_active)
+   sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
+
/* re-sync apic's tpr */
vcpu->cr8 = kvm_run->cr8;
 
@@ -1616,6 +1620,9 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
r = kvm_arch_ops->run(vcpu, kvm_run);
 
+   if (vcpu->sigset_active)
+   sigprocmask(SIG_SETMASK, &sigsaved, NULL);
+
vcpu_put(vcpu);
return r;
 }
@@ -2142,6 +2149,17 @@ out:
return r;
 }
 
+static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
+{
+   if (sigset) {
+   sigdelsetmask(sigset, sigmask(SIGKILL)|sigmask(SIGSTOP));
+   vcpu->sigset_active = 1;
+   vcpu->sigset = *sigset;
+   } else
+   vcpu->sigset_active = 0;
+   return 0;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -2260,6 +2278,29 @@ static long kvm_vcpu_ioctl(struct file *filp,
goto out;
break;
}
+   case KVM_SET_SIGNAL_MASK: {
+   struct kvm_signal_mask __user *sigmask_arg = argp;
+   struct kvm_signal_mask kvm_sigmask;
+   sigset_t sigset, *p;
+
+   p = NULL;
+   if (argp) {
+   r = -EFAULT;
+   if (copy_from_user(&kvm_sigmask, argp,
+  sizeof kvm_sigmask))
+   goto out;
+   r = -EINVAL;
+   if (kvm_sigmask.len != sizeof sigset)
+   goto out;
+   r = -EFAULT;
+   if (copy_from_user(&sigset, sigmask_arg->sigset,
+  sizeof sigset))
+   goto out;
+   p = &sigset;
+   }
+   r = kvm_vcpu_ioctl_set_sigmask(vcpu, p);
+   break;
+   }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b3af92e..c0d10cd 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -234,6 +234,12 @@ struct kvm_cpuid {
struct kvm_cpuid_entry entries[0];
 };
 
+/* for KVM_SET_SIGNAL_MASK */
+struct kvm_signal_mask {
+   __u32 len;
+   __u8  sigset[0];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -273,5 +279,6 @@ struct kvm_cpuid {
 #define KVM_GET_MSRS  _IOWR(KVMIO, 0x88, struct kvm_msrs)
 #define KVM_SET_MSRS  _IOW(KVMIO,  0x89, struct kvm_msrs)
 #define KVM_SET_CPUID _IOW(KVMIO,  0x8a, struct kvm_cpuid)
+#define KVM_SET_SIGNAL_MASK   _IOW(KVMIO,  0x8b, struct kvm_signal_mask)
 
 #endif
-- 
1.5.0.2



[PATCH 05/15] KVM: Remove the 'emulated' field from the userspace interface

2007-03-11 Thread Avi Kivity
We no longer emulate single instructions in userspace.  Instead, we service
mmio or pio requests.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
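The padding change from 7 to 3 bytes follows from the field sizes. The
sketch below checks the arithmetic with offsetof() on two stand-in structs
that contain only the fields visible in this diff; the struct and function
names are invented for the illustration, and uint32_t/uint8_t stand in for
__u32/__u8.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading fields of struct kvm_run before this patch (prefix only). */
struct kvm_run_old_prefix {
	uint32_t emulated;
	uint32_t io_completed;
	uint8_t  request_interrupt_window;
	uint8_t  padding1[7];
	uint32_t exit_type;	/* first "out" field */
};

/* The same prefix after this patch. */
struct kvm_run_new_prefix {
	uint32_t io_completed;
	uint8_t  request_interrupt_window;
	uint8_t  padding1[3];
	uint32_t exit_type;	/* first "out" field */
};

/* Offset of the first "out" field in the old layout. */
size_t old_out_offset(void)
{
	return offsetof(struct kvm_run_old_prefix, exit_type);
}

/* Offset of the first "out" field in the new layout. */
size_t new_out_offset(void)
{
	return offsetof(struct kvm_run_new_prefix, exit_type);
}
```

Dropping the 4-byte field and 4 bytes of padding moves the "out" section
from offset 16 to offset 8, so it stays 8-byte aligned.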
 drivers/kvm/kvm_main.c |5 -
 include/linux/kvm.h|3 +--
 2 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 347467e..747966e 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1588,11 +1588,6 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
/* re-sync apic's tpr */
vcpu->cr8 = kvm_run->cr8;
 
-   if (kvm_run->emulated) {
-   kvm_arch_ops->skip_emulated_instruction(vcpu);
-   kvm_run->emulated = 0;
-   }
-
if (kvm_run->io_completed) {
if (vcpu->pio_pending)
complete_pio(vcpu);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 15e23bc..c6dd4a7 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -51,10 +51,9 @@ enum kvm_exit_reason {
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
/* in */
-   __u32 emulated;  /* skip current instruction */
__u32 io_completed; /* mmio/pio request completed */
__u8 request_interrupt_window;
-   __u8 padding1[7];
+   __u8 padding1[3];
 
/* out */
__u32 exit_type;
-- 
1.5.0.2



[PATCH 11/15] KVM: Add a special exit reason when exiting due to an interrupt

2007-03-11 Thread Avi Kivity
This is redundant, as we also return -EINTR from the ioctl, but it
allows us to examine the exit_reason field on resume without seeing
old data.

Signed-off-by: Avi Kivity <[EMAIL PROTECTED]>
---
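One way userspace could use the new exit reason is sketched below. The
helper name run_was_interrupted() and its parameters are hypothetical; only
the KVM_EXIT_INTR value is taken from this patch. The caller is assumed to
pass the KVM_RUN return value, the errno saved right after the call, and
the exit_reason read from the shared kvm_run area.

```c
#include <errno.h>

#define KVM_EXIT_INTR 10	/* value added by this patch */

/*
 * Hypothetical helper: returns 1 if the last KVM_RUN ended because of a
 * pending signal or an interrupt-injection request, 0 otherwise.
 */
int run_was_interrupted(int ioctl_ret, int saved_errno,
			unsigned int exit_reason)
{
	/* The ioctl result alone is enough when it is available... */
	if (ioctl_ret < 0 && saved_errno == EINTR)
		return 1;
	/* ...but exit_reason also works when only the shared area is seen. */
	return exit_reason == KVM_EXIT_INTR;
}
```

Code that looks only at the shared area on resume can thus tell an
interrupted run apart from a stale exit reason left by an earlier exit.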
 drivers/kvm/svm.c   |2 ++
 drivers/kvm/vmx.c   |2 ++
 include/linux/kvm.h |3 ++-
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index b09928f..0311665 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1619,12 +1619,14 @@ again:
if (signal_pending(current)) {
++kvm_stat.signal_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
 
if (dm_request_for_irq_injection(vcpu, kvm_run)) {
++kvm_stat.request_irq_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
kvm_resched(vcpu);
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index ba7a98b..0d1c8cf 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1936,12 +1936,14 @@ again:
if (signal_pending(current)) {
++kvm_stat.signal_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
 
if (dm_request_for_irq_injection(vcpu, kvm_run)) {
++kvm_stat.request_irq_exits;
post_kvm_run_save(vcpu, kvm_run);
+   kvm_run->exit_reason = KVM_EXIT_INTR;
return -EINTR;
}
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 57f47ef..b3af92e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_API_VERSION 8
+#define KVM_API_VERSION 9
 
 /*
  * Architectural interrupt line count, and the size of the bitmap needed
@@ -45,6 +45,7 @@ enum kvm_exit_reason {
KVM_EXIT_IRQ_WINDOW_OPEN  = 7,
KVM_EXIT_SHUTDOWN = 8,
KVM_EXIT_FAIL_ENTRY   = 9,
+   KVM_EXIT_INTR = 10,
 };
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
-- 
1.5.0.2


