Re: [Qemu-devel] Nested KVM is weird?

2014-06-03 Thread Muli Ben-Yehuda
On Sun, Jun 01, 2014 at 11:30:02PM +0700, Jun Koi wrote:

> (1) do you think this VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL is the
> reason why ESXi falls back to binary translation?

It might be, its been a while since I got ESXi to use VMX on KVM. Take
a look at the VMware log file for the L2, it should have a lot more
information.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Qemu-devel] Nested KVM is weird?

2014-06-03 Thread Muli Ben-Yehuda
On Sun, Jun 01, 2014 at 11:30:02PM +0700, Jun Koi wrote:

 (1) do you think this VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL is the
 reason why ESXi falls back to binary translation?

It might be, its been a while since I got ESXi to use VMX on KVM. Take
a look at the VMware log file for the L2, it should have a lot more
information.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Qemu-devel] Nested KVM is weird?

2014-06-01 Thread Muli Ben-Yehuda
On Sun, Jun 01, 2014 at 05:54:25PM +0700, Jun Koi wrote:

> So this means ESXi never uses VMResume/VMLaunch? How is this
> possible, because it uses VMX for its implementation?

ESXi will fall back to binary translation if it decides that it cannot
use VMX for some reason. Look at the L2's log file inside ESXi and you
will likely find some errors related to running nested, MSRs KVM does
not emulate properly, or something else that causes ESXi to use binary
translation for its L2.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Qemu-devel] Nested KVM is weird?

2014-06-01 Thread Muli Ben-Yehuda
On Sun, Jun 01, 2014 at 05:54:25PM +0700, Jun Koi wrote:

 So this means ESXi never uses VMResume/VMLaunch? How is this
 possible, because it uses VMX for its implementation?

ESXi will fall back to binary translation if it decides that it cannot
use VMX for some reason. Look at the L2's log file inside ESXi and you
will likely find some errors related to running nested, MSRs KVM does
not emulate properly, or something else that causes ESXi to use binary
translation for its L2.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, calgary: use 8M TCE table size by default

2014-03-10 Thread Muli Ben-Yehuda
Patch looks good to me.

Acked-by: Muli Ben-Yehuda 

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] x86, calgary: use 8M TCE table size by default

2014-03-10 Thread Muli Ben-Yehuda
Patch looks good to me.

Acked-by: Muli Ben-Yehuda mu...@mulix.org

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, calgary: use 8M TCE table size by default

2014-03-08 Thread Muli Ben-Yehuda
On Fri, Mar 07, 2014 at 11:09:06AM -0500, Vivek Goyal wrote:

> I would say it is not very complicated to maintain backward
> compatibility in this case. So let us keep saved_max_pfn for some
> time and make kexec-tools changes. Some time down the line, one can
> get rid of saved_max_pfn completely.

If this is not too painful then this would be my preference too.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How could we get rid of saved_max_pfn for calgary iommu?

2014-03-08 Thread Muli Ben-Yehuda
On Thu, Mar 06, 2014 at 07:46:44AM -0700, Jon Mason wrote:

> > I don't know of anyone still using it, but it's not
> > impossible. Calgary and CalIOC2 machines would now be ~5-8 years
> > old.
> 
> It is getting a bit crufty in arch/x86.  Would it be better to move
> it to drivers/iommu?

Not sure I see the potential benefit... I think for Calgary it's
either leave it in arch/x86 or rip it out.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How could we get rid of saved_max_pfn for calgary iommu?

2014-03-08 Thread Muli Ben-Yehuda
On Thu, Mar 06, 2014 at 07:46:44AM -0700, Jon Mason wrote:

  I don't know of anyone still using it, but it's not
  impossible. Calgary and CalIOC2 machines would now be ~5-8 years
  old.
 
 It is getting a bit crufty in arch/x86.  Would it be better to move
 it to drivers/iommu?

Not sure I see the potential benefit... I think for Calgary it's
either leave it in arch/x86 or rip it out.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, calgary: use 8M TCE table size by default

2014-03-08 Thread Muli Ben-Yehuda
On Fri, Mar 07, 2014 at 11:09:06AM -0500, Vivek Goyal wrote:

 I would say it is not very complicated to maintain backward
 compatibility in this case. So let us keep saved_max_pfn for some
 time and make kexec-tools changes. Some time down the line, one can
 get rid of saved_max_pfn completely.

If this is not too painful then this would be my preference too.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How could we get rid of saved_max_pfn for calgary iommu?

2014-03-05 Thread Muli Ben-Yehuda
On Wed, Mar 05, 2014 at 10:50:41PM -0800, H. Peter Anvin wrote:

> OK, second question... is it time to axe Calgary?

I don't know of anyone still using it, but it's not
impossible. Calgary and CalIOC2 machines would now be ~5-8 years old.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How could we get rid of saved_max_pfn for calgary iommu?

2014-03-05 Thread Muli Ben-Yehuda
On Wed, Mar 05, 2014 at 01:36:17PM +0800, WANG Chao wrote:

> Hi, Muli
> 
> saved_max_pfn is becoming a setback for kexec-tools. Ideally calgary
> could get rid of saved_max_pfn at all. But If this can't work, how
> about exporting a calgary tce table size to user space, so that
> kexec-tools can simply pass calgary=xxx cmdline to 2nd kernel.

As Jon noted, this code is used to so that the TCE table remains
consistent between the original and the kexec'd kernel. I see two
options: either we hard code the TCE table size to the max so that
this bit of code becomes redundant, or we explicitly pass the original
table size (or the original max_pfn) to the kexec'd kernel. The first
option is more appealing, because I don't think anyone is actually
using the TCE table size -- we mostly added it for debugging the IOMMU
TCE code at the time -- but since I don't have a Calgary machine
anymore, I don't have any way to test it. The second option is uglier
but would be fully backward-compatible and less likely to break
things. Given that very few people are likely running the latest
upstream kernel on Calgary/CalIOC2 machines, I'm inclined towards the
first option.

> BTW MAINTAINERS file still uses your old email, please update
> accordingly.

I think you are the first person to actually look up the Calgary
maintainers in the last few years :-)

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How could we get rid of saved_max_pfn for calgary iommu?

2014-03-05 Thread Muli Ben-Yehuda
On Wed, Mar 05, 2014 at 01:36:17PM +0800, WANG Chao wrote:

 Hi, Muli
 
 saved_max_pfn is becoming a setback for kexec-tools. Ideally calgary
 could get rid of saved_max_pfn at all. But If this can't work, how
 about exporting a calgary tce table size to user space, so that
 kexec-tools can simply pass calgary=xxx cmdline to 2nd kernel.

As Jon noted, this code is used to so that the TCE table remains
consistent between the original and the kexec'd kernel. I see two
options: either we hard code the TCE table size to the max so that
this bit of code becomes redundant, or we explicitly pass the original
table size (or the original max_pfn) to the kexec'd kernel. The first
option is more appealing, because I don't think anyone is actually
using the TCE table size -- we mostly added it for debugging the IOMMU
TCE code at the time -- but since I don't have a Calgary machine
anymore, I don't have any way to test it. The second option is uglier
but would be fully backward-compatible and less likely to break
things. Given that very few people are likely running the latest
upstream kernel on Calgary/CalIOC2 machines, I'm inclined towards the
first option.

 BTW MAINTAINERS file still uses your old email, please update
 accordingly.

I think you are the first person to actually look up the Calgary
maintainers in the last few years :-)

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: How could we get rid of saved_max_pfn for calgary iommu?

2014-03-05 Thread Muli Ben-Yehuda
On Wed, Mar 05, 2014 at 10:50:41PM -0800, H. Peter Anvin wrote:

 OK, second question... is it time to axe Calgary?

I don't know of anyone still using it, but it's not
impossible. Calgary and CalIOC2 machines would now be ~5-8 years old.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-16 Thread Muli Ben-Yehuda
On Fri, Feb 15, 2008 at 10:28:22AM -0800, Greg KH wrote:

> That would be nice.  Muli, want to make a patch for this?

Sure, I'll put something together.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-16 Thread Muli Ben-Yehuda
On Fri, Feb 15, 2008 at 10:28:22AM -0800, Greg KH wrote:

 That would be nice.  Muli, want to make a patch for this?

Sure, I'll put something together.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-14 Thread Muli Ben-Yehuda
On Thu, Feb 14, 2008 at 11:17:03PM -0800, Greg KH wrote:

> Hm, that's wierd.  I thought I got something, until I realized that
> you are doing a lot of logic before you ever even determine that
> your hardware is present in the system.  Why are you calling
> calgary_locate_bbars() and doing all of that work?  Or am I mising
> something in the code flow here?

Ok, it's all starting to come back to me. See below.

> Also, it looks like you use the pci_get_device() to find the pci
> device, then you do somethings, and then drop the device, never to
> use it again.
> 
> So, a traditional "probe" might not work as well, but it could be
> used if you want to.

We have a two stage initialization process. First, we need to be
initialized very early in order to be able to allocate relatively
large contiguous chunks of RAM. That requires detecting whether we're
on a Calgary/CalIOC2 system very early (detect_calgary(), called from
pci_iommu_alloc()). Then we have "regular" PCI initialization at a
later stage (calgary_iommu_init(), called from pci_iommu_init()),
which also calls calgary_locate_bbars(). Unlike a regular PCI device
which has its BARs in the configuration space, we need to probe BIOS
provided tables to find where our BARs are before we do anything else
with the "device representation" of the IOMMU. Note that the same
two-stage initialization scheme is used by all other x86 IOMMUs, for
similar reasons.

Now, Calgary is an isolation-capable IOMMU which has per-root-bus (as
opposed to global or per-BDF) IO address space. The reason we want to
access the pci_dev of each Calgary bus is to hang off of it the IOMMU
tables for all devices under that bus so that we end up using the
right translation table when a device asks for a DMA mapping. Hence
the loop in calgary_init, which loops over all pci_dev's of Calgary
busses, raises the ref count for each pci_dev to protect against
hot-unplug, and hooks up the IOMMU table for everything under that bus
to the pci_dev for the bus. Unlike a regular driver, our main entry
points are via the DMA-API, which takes a device's pci_dev, not
Calgary's. We then walk up the bus hierarchy and find the Calgary
pci_dev that is an ancestor of this device, and that gives us the
right IOMMU table to us. Note that this has to occur *before* PCI
device initialization, since we want to turn translation on before any
device will try to DMA.

In conclusion, our usage doesn't seem lika a good fit for the probe
approach, although it could probably converted provided we got the
ordering right with regards to regular PCI device
initialization. Doesn't seem to be worth the effort.

Cheers,
Muli

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-14 Thread Muli Ben-Yehuda
On Thu, Feb 14, 2008 at 11:17:03PM -0800, Greg KH wrote:

 Hm, that's wierd.  I thought I got something, until I realized that
 you are doing a lot of logic before you ever even determine that
 your hardware is present in the system.  Why are you calling
 calgary_locate_bbars() and doing all of that work?  Or am I mising
 something in the code flow here?

Ok, it's all starting to come back to me. See below.

 Also, it looks like you use the pci_get_device() to find the pci
 device, then you do somethings, and then drop the device, never to
 use it again.
 
 So, a traditional probe might not work as well, but it could be
 used if you want to.

We have a two stage initialization process. First, we need to be
initialized very early in order to be able to allocate relatively
large contiguous chunks of RAM. That requires detecting whether we're
on a Calgary/CalIOC2 system very early (detect_calgary(), called from
pci_iommu_alloc()). Then we have regular PCI initialization at a
later stage (calgary_iommu_init(), called from pci_iommu_init()),
which also calls calgary_locate_bbars(). Unlike a regular PCI device
which has its BARs in the configuration space, we need to probe BIOS
provided tables to find where our BARs are before we do anything else
with the device representation of the IOMMU. Note that the same
two-stage initialization scheme is used by all other x86 IOMMUs, for
similar reasons.

Now, Calgary is an isolation-capable IOMMU which has per-root-bus (as
opposed to global or per-BDF) IO address space. The reason we want to
access the pci_dev of each Calgary bus is to hang off of it the IOMMU
tables for all devices under that bus so that we end up using the
right translation table when a device asks for a DMA mapping. Hence
the loop in calgary_init, which loops over all pci_dev's of Calgary
busses, raises the ref count for each pci_dev to protect against
hot-unplug, and hooks up the IOMMU table for everything under that bus
to the pci_dev for the bus. Unlike a regular driver, our main entry
points are via the DMA-API, which takes a device's pci_dev, not
Calgary's. We then walk up the bus hierarchy and find the Calgary
pci_dev that is an ancestor of this device, and that gives us the
right IOMMU table to us. Note that this has to occur *before* PCI
device initialization, since we want to turn translation on before any
device will try to DMA.

In conclusion, our usage doesn't seem lika a good fit for the probe
approach, although it could probably converted provided we got the
ordering right with regards to regular PCI device
initialization. Doesn't seem to be worth the effort.

Cheers,
Muli

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-13 Thread Muli Ben-Yehuda
On Wed, Feb 13, 2008 at 09:32:03AM -0800, Greg KH wrote:

> Is there some reason you aren't using the "real" PCI driver api here
> and registering a pci driver for these devices?  That would take the
> whole "loop over all pci devices" logic out of the code entirely.

I recall we had a reason, but I no longer recall what it was. Some
reason the "real" PCI driver API didn't fit at the time. If someone
were to whip up a working patch, I'd happily apply it.

> > Feel free to nuke it and walk the list forward.
> 
> So something like the patch below would be fine?

Yep, looks good. 

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-13 Thread Muli Ben-Yehuda
On Tue, Feb 12, 2008 at 04:16:38PM -0800, Greg KH wrote:

> Why does the calgary driver need this?  Can we just use
> pci_get_device() instead?  Why do you need to walk the device list
> backwards?  Do you get false positives going forward?

It's not strictly needed, we used it for symmetry. Feel free to nuke
it and walk the list forward.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-13 Thread Muli Ben-Yehuda
On Tue, Feb 12, 2008 at 04:16:38PM -0800, Greg KH wrote:

 Why does the calgary driver need this?  Can we just use
 pci_get_device() instead?  Why do you need to walk the device list
 backwards?  Do you get false positives going forward?

It's not strictly needed, we used it for symmetry. Feel free to nuke
it and walk the list forward.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pci_get_device_reverse(), why does Calgary need this?

2008-02-13 Thread Muli Ben-Yehuda
On Wed, Feb 13, 2008 at 09:32:03AM -0800, Greg KH wrote:

 Is there some reason you aren't using the real PCI driver api here
 and registering a pci driver for these devices?  That would take the
 whole loop over all pci devices logic out of the code entirely.

I recall we had a reason, but I no longer recall what it was. Some
reason the real PCI driver API didn't fit at the time. If someone
were to whip up a working patch, I'd happily apply it.

  Feel free to nuke it and walk the list forward.
 
 So something like the patch below would be fine?

Yep, looks good. 

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]intel-iommu batched iotlb flushes

2008-02-12 Thread Muli Ben-Yehuda
On Tue, Feb 12, 2008 at 01:00:06AM -0800, David Miller wrote:
> From: Muli Ben-Yehuda <[EMAIL PROTECTED]>
> Date: Tue, 12 Feb 2008 10:52:56 +0200
> 
> > The streaming DMA-API was designed to conserve IOMMU mappings for
> > machines where IOMMU mappings are a scarce resource, and is a poor
> > fit for a modern IOMMU such as VT-d with a 64-bit IO address space
> > (or even an IOMMU with a 32-bit address space such as Calgary)
> > where there are plenty of IOMMU mappings available.
> 
> For the 64-bit case what you are suggesting eventually amounts to
> mapping all available RAM in the IOMMU.
> 
> Although an extreme version of your suggestion, it would be the most
> efficient as it would require zero IOMMU flush operations.
>
> But we'd lose things like protection and other benefits.

For the extreme case you are correct. There's an inherent trade-off
between IOMMU performance and protection guarantees, where one end of
the spectrum is represented by the streaming DMA-API and the other end
is represented by simply mapping all available memory. It's an open
question what is the right point in between. I think that an optimal
strategy might be "keep the mapping around for as long as it is safe",
i.e., keep a mapping to a frame for as long as the frame is owned by
whoever requested the mapping in the first place. Once ownership of
the frame is passed to another entity (e.g., from the driver to the
stack), revoke the original mapping. This implies a way for the kernel
to track frame ownership and communicate frame ownership changes to
the DMA-API layer, which we don't currently have.

Cheers,
Muli

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]intel-iommu batched iotlb flushes

2008-02-12 Thread Muli Ben-Yehuda
On Mon, Feb 11, 2008 at 02:41:05PM -0800, mark gross wrote:

> The intel-iommu hardware requires a polling operation to flush IOTLB
> PTE's after an unmap operation.  Through some TSC instrumentation of
> a netperf UDP stream with small packets test case it was seen that
> the flush operations where sucking up to 16% of the CPU time doing
> iommu_flush_iotlb's
> 
> The following patch batches the IOTLB flushes removing most of the
> overhead in flushing the IOTLB's.  It works by building a list of to
> be released IOVA's that is iterated over when a timer goes off or
> when a high water mark is reached.
> 
> The wrinkle this has is that the memory protection and page fault
> warnings from errant DMA operations is somewhat reduced, hence a kernel
> parameter is added to revert back to the "strict" page flush / unmap
> behavior. 
> 
> The hole is the following scenarios: 
> do many map_signal operations, do some unmap_signals, reuse a recently
> unmapped page,  memory>
> 
> Or: you have rouge hardware using DMA's to look at pages: do many
> map_signal's, do many unmap_singles, reuse some unmapped pages : 
> 
> 
> Note : these holes are very hard to get too, as the IOTLB is small
> and only the PTE's still in the IOTLB can be accessed through this
> mechanism.
> 
> Its recommended that strict is used when developing drivers that do
> DMA operations to catch bugs early.  For production code where
> performance is desired running with the batched IOTLB flushing is a
> good way to go.

While I don't disagree with this patch in principle (Calgary does the
same thing due to expensive IOTLB flushes) the right way to fix it
IMHO is to fix the drivers to batch mapping and unmapping operations
or map up-front and unmap when done. The streaming DMA-API was
designed to conserve IOMMU mappings for machines where IOMMU mappings
are a scarce resource, and is a poor fit for a modern IOMMU such as
VT-d with a 64-bit IO address space (or even an IOMMU with a 32-bit
address space such as Calgary) where there are plenty of IOMMU
mappings available.

Cheers,
Muli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]intel-iommu batched iotlb flushes

2008-02-12 Thread Muli Ben-Yehuda
On Mon, Feb 11, 2008 at 02:41:05PM -0800, mark gross wrote:

 The intel-iommu hardware requires a polling operation to flush IOTLB
 PTE's after an unmap operation.  Through some TSC instrumentation of
 a netperf UDP stream with small packets test case it was seen that
 the flush operations where sucking up to 16% of the CPU time doing
 iommu_flush_iotlb's
 
 The following patch batches the IOTLB flushes removing most of the
 overhead in flushing the IOTLB's.  It works by building a list of to
 be released IOVA's that is iterated over when a timer goes off or
 when a high water mark is reached.
 
 The wrinkle this has is that the memory protection and page fault
 warnings from errant DMA operations is somewhat reduced, hence a kernel
 parameter is added to revert back to the strict page flush / unmap
 behavior. 
 
 The hole is the following scenarios: 
 do many map_signal operations, do some unmap_signals, reuse a recently
 unmapped page, errant DMA hardware sneaks through and steps on reused
 memory
 
 Or: you have rouge hardware using DMA's to look at pages: do many
 map_signal's, do many unmap_singles, reuse some unmapped pages : 
 rouge hardware looks at reused page
 
 Note : these holes are very hard to get too, as the IOTLB is small
 and only the PTE's still in the IOTLB can be accessed through this
 mechanism.
 
 Its recommended that strict is used when developing drivers that do
 DMA operations to catch bugs early.  For production code where
 performance is desired running with the batched IOTLB flushing is a
 good way to go.

While I don't disagree with this patch in principle (Calgary does the
same thing due to expensive IOTLB flushes) the right way to fix it
IMHO is to fix the drivers to batch mapping and unmapping operations
or map up-front and unmap when done. The streaming DMA-API was
designed to conserve IOMMU mappings for machines where IOMMU mappings
are a scarce resource, and is a poor fit for a modern IOMMU such as
VT-d with a 64-bit IO address space (or even an IOMMU with a 32-bit
address space such as Calgary) where there are plenty of IOMMU
mappings available.

Cheers,
Muli
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]intel-iommu batched iotlb flushes

2008-02-12 Thread Muli Ben-Yehuda
On Tue, Feb 12, 2008 at 01:00:06AM -0800, David Miller wrote:
 From: Muli Ben-Yehuda [EMAIL PROTECTED]
 Date: Tue, 12 Feb 2008 10:52:56 +0200
 
  The streaming DMA-API was designed to conserve IOMMU mappings for
  machines where IOMMU mappings are a scarce resource, and is a poor
  fit for a modern IOMMU such as VT-d with a 64-bit IO address space
  (or even an IOMMU with a 32-bit address space such as Calgary)
  where there are plenty of IOMMU mappings available.
 
 For the 64-bit case what you are suggesting eventually amounts to
 mapping all available RAM in the IOMMU.
 
 Although an extreme version of your suggestion, it would be the most
 efficient as it would require zero IOMMU flush operations.

 But we'd lose things like protection and other benefits.

For the extreme case you are correct. There's an inherent trade-off
between IOMMU performance and protection guarantees, where one end of
the spectrum is represented by the streaming DMA-API and the other end
is represented by simply mapping all available memory. It's an open
question what is the right point in between. I think that an optimal
strategy might be keep the mapping around for as long as it is safe,
i.e., keep a mapping to a frame for as long as the frame is owned by
whoever requested the mapping in the first place. Once ownership of
the frame is passed to another entity (e.g., from the driver to the
stack), revoke the original mapping. This implies a way for the kernel
to track frame ownership and communicate frame ownership changes to
the DMA-API layer, which we don't currently have.

Cheers,
Muli

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


CFP: Operating Systems Review special issue on the Linux kernel

2008-01-22 Thread Muli Ben-Yehuda
SIGOPS Operating Systems Review Special Issue on

  Research and Developments in the Linux Kernel

The Linux kernel, since its inception in 1991, has captured the
interest of many thousands of developers and millions of users. It
recently celebrated its 16th anniversary, includes many millions of
lines of code, and is used in production systems around the world. It
is also advancing at an increasingly rapid pace, undergoing many
changes every single day.

For this OSR special issue, we welcome technical papers covering the
latest advances that have been or will soon be merged into the Linux
kernel, as well as wild idea papers discussing promising experimental
work. In recognition of the current chasm that we wish to bridge, we
encourage papers from both the Linux kernel community and the research
community. The OSR issue aims to:

a) expose members of the Linux kernel community to exploratory
   research work that is going on which might influence Linux's
   evolution, and

b) expose members of the systems research community to the latest
   happenings in a mature, production kernel that is widely used and
   advancing rapidly.

Please submit papers related to all aspects of the Linux kernel. In
particular, papers are solicited for the the following areas:

* Virtualization
* I/O, networking and interconnects
* Support for multi-core and heterogeneous CPUs
* Co-existence with other operating systems
* filesystems, clustering, SSI
* Profiling, performance tuning, debugging
* Scaling up (e.g., supercomputers) and down (e.g., embedded devices)
* Experience reports
* Research directions

Submissions should be between 5 and 10 pages (in a font no smaller
than 10 pt). Standard SIGOPS formatting rules apply (see
http://www.sigops.org/osr.html). Papers should report on significant
new results or directions and include at least some material that has
not been published before. Papers will be reviewed by the guest
editors and the review committee. Accepted papers will be published in
the July 2008 issue of Operating Systems Review.

Please upload your submissions (in PDF format only) to the conference
website, at http://www.easychair.org/conferences/?conf=osrlk2008. If
you have any questions or comments, please do not hesitate to contact
the guest editors.

Important dates:

Submission deadline: March 14th, 2008
Author notification: April 18th, 2008
Camera-ready submission deadline: May 13th, 2008

OSR guest editors:

 * Muli Ben-Yehuda, IBM Haifa Research Lab <[EMAIL PROTECTED]>
 * Eric Van Hensbergen, IBM Austin Research Lab <[EMAIL PROTECTED]>
 * Marc E. Fiuczynski, Princeton University <[EMAIL PROTECTED]>

Review committee:

 * Patrick Bridges (University of New Mexico)
 * Angela Demke Brown (University of Toronto)
 * Hubertus Franke (IBM Research)
 * Oren Laadan (Columbia University)
 * Paul McKenney (IBM Linux Technology Center)
 * Chris Mason (Oracle)
 * Ron Minnich (Los Alamos National Laboratory)
 * Stephen C. Tweedie (Red Hat)
 * Chris Wright (Red Hat)
 * Pete Wyckoff (Ohio Supercomputer Center)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


CFP: Operating Systems Review special issue on the Linux kernel

2008-01-22 Thread Muli Ben-Yehuda
SIGOPS Operating Systems Review Special Issue on

  Research and Developments in the Linux Kernel

The Linux kernel, since its inception in 1991, has captured the
interest of many thousands of developers and millions of users. It
recently celebrated its 16th anniversary, includes many millions of
lines of code, and is used in production systems around the world. It
is also advancing at an increasingly rapid pace, undergoing many
changes every single day.

For this OSR special issue, we welcome technical papers covering the
latest advances that have been or will soon be merged into the Linux
kernel, as well as wild idea papers discussing promising experimental
work. In recognition of the current chasm that we wish to bridge, we
encourage papers from both the Linux kernel community and the research
community. The OSR issue aims to:

a) expose members of the Linux kernel community to exploratory
   research work that is going on which might influence Linux's
   evolution, and

b) expose members of the systems research community to the latest
   happenings in a mature, production kernel that is widely used and
   advancing rapidly.

Please submit papers related to all aspects of the Linux kernel. In
particular, papers are solicited for the the following areas:

* Virtualization
* I/O, networking and interconnects
* Support for multi-core and heterogeneous CPUs
* Co-existence with other operating systems
* filesystems, clustering, SSI
* Profiling, performance tuning, debugging
* Scaling up (e.g., supercomputers) and down (e.g., embedded devices)
* Experience reports
* Research directions

Submissions should be between 5 and 10 pages (in a font no smaller
than 10 pt). Standard SIGOPS formatting rules apply (see
http://www.sigops.org/osr.html). Papers should report on significant
new results or directions and include at least some material that has
not been published before. Papers will be reviewed by the guest
editors and the review committee. Accepted papers will be published in
the July 2008 issue of Operating Systems Review.

Please upload your submissions (in PDF format only) to the conference
website, at http://www.easychair.org/conferences/?conf=osrlk2008. If
you have any questions or comments, please do not hesitate to contact
the guest editors.

Important dates:

Submission deadline: March 14th, 2008
Author notification: April 18th, 2008
Camera-ready submission deadline: May 13th, 2008

OSR guest editors:

 * Muli Ben-Yehuda, IBM Haifa Research Lab [EMAIL PROTECTED]
 * Eric Van Hensbergen, IBM Austin Research Lab [EMAIL PROTECTED]
 * Marc E. Fiuczynski, Princeton University [EMAIL PROTECTED]

Review committee:

 * Patrick Bridges (University of New Mexico)
 * Angela Demke Brown (University of Toronto)
 * Hubertus Franke (IBM Research)
 * Oren Laadan (Columbia University)
 * Paul McKenney (IBM Linux Technology Center)
 * Chris Mason (Oracle)
 * Ron Minnich (Los Alamos National Laboratory)
 * Stephen C. Tweedie (Red Hat)
 * Chris Wright (Red Hat)
 * Pete Wyckoff (Ohio Supercomputer Center)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]intel-iommu-PMEN support

2007-11-17 Thread Muli Ben-Yehuda
On Fri, Nov 16, 2007 at 02:39:57PM -0800, mark gross wrote:

> The following patch adds support for protected memory enable bits by
> clearing them if they are set at startup time.  Some future boot
> loaders or firmware could have this bit set after it loads the
> kernel, and it needs to be cleared if DMA's are going to happen
> effectively.
> 
> please apply
> 
> --mgross
> 
> Signed-off-by: mark gross <[EMAIL PROTECTED]>

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Mark, please try in the future to not mix unrelated changes such as
these. This patch should've been two patches, one to disable protected
memory enable and the other to do the cleanups.

> -#define MAX_FAULT_REASON_IDX ARRAY_SIZE(fault_reason_strings) - 1
> +#define MAX_FAULT_REASON_IDX (ARRAY_SIZE(fault_reason_strings) - 1)

> -static inline u64 dmar_readq(void *addr)
> +static inline u64 dmar_readq(void __iomem *addr)

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]intel-iommu-PMEN support

2007-11-17 Thread Muli Ben-Yehuda
On Fri, Nov 16, 2007 at 02:39:57PM -0800, mark gross wrote:

 The following patch adds support for protected memory enable bits by
 clearing them if they are set at startup time.  Some future boot
 loaders or firmware could have this bit set after it loads the
 kernel, and it needs to be cleared if DMA's are going to happen
 effectively.
 
 please apply
 
 --mgross
 
 Signed-off-by: mark gross [EMAIL PROTECTED]

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Mark, please try in the future to not mix unrelated changes such as
these. This patch should've been two patches, one to disable protected
memory enable and the other to do the cleanups.

 -#define MAX_FAULT_REASON_IDX ARRAY_SIZE(fault_reason_strings) - 1
 +#define MAX_FAULT_REASON_IDX (ARRAY_SIZE(fault_reason_strings) - 1)

 -static inline u64 dmar_readq(void *addr)
 +static inline u64 dmar_readq(void __iomem *addr)

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

2007-11-12 Thread Muli Ben-Yehuda
On Mon, Nov 12, 2007 at 07:25:27PM +0530, Amit Shah wrote:

> Selectively? What happens in the case when some iommu doesn't want
> to invoke the prev_op, but the mapping depends on it being called
> (eg, the hypercalling op is embedded somewhere in the prev_op chain)

Bad things :-)
There needs to be a hierarchy of dma-ops, e.g., nommu/swiotlb, then a
hardware iommu, then pvdma. Not sure where IB fits in here. The
calling order would be the reverse of the initialization order, so
pvdma->hardare->nommu/swiotlb.

> Hmm, also, a hypercall should be the last operation to be called in
> a few cases, but also the first (and the last) to be called in
> several other cases. For example, in a guest, you can go register
> any number of iotlbs, but you don't actually want to do anything
> there -- you just want to do a hypercall and get the mapping from
> the host.
> 
> But in any case, what ensures that the hypercall op always gets
> called and also that it's the last one?

If it gets called first it can ensure that it runs either first or
last, or both, since it controls when to run the other hooks, before
or after it does what it needs to do.
 
> Also, I'm thinking of implementations where let's say sg_map_free is
> not defined for a particular iotlb, but it was defined in the
> previously registered one. How to handle this?

Good point, this will require all dma ops implementation to provide
stubs that just return prev_ops->op if it's set.

> It seems a small dispatcher which takes care of this seems the
> likely choice here, but avoiding it (or at least caching the
> decisions) is something that needs more thought.

Yeah, I'm not too enthusiastic about it, but we do need such a generic
mechanism or we will each end up implementing our own versions of
it...

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

2007-11-12 Thread Muli Ben-Yehuda
On Mon, Nov 12, 2007 at 05:26:24PM +0530, Amit Shah wrote:
> On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote:
> > On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
> > > We make the dma_mapping_ops structure to point to our structure so
> > > that every DMA access goes through us. (This is the reason this only
> > > works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops
> > > struct.)
> >
> > I need the same facility for Calgary for falling back to swiotlb if a
> > translation is disabled on some slot, and IB needs the same facility
> > for some IB adapters (e.g., ipath). Perhaps it's time to consider
> > stackable dma-ops (unless someone has a better idea...).
> 
> That would make great sense and simplify implementations.
> 
> How do you propose such an implementation? An array of function
> pointers for each possible call?

I was thinking of simply a chain of dma_ops (via dma_ops->prev_ops) ,
where it's the responsibility of each dma_ops implementation to call
or not call the corresponding entry in chain (prev_ops->op()). This
works well for Calgary (which will only use prev_ops selectively, and
I think it will work well for the IB folks. Will it work for you?

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware

2007-11-12 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 04:21:06PM +0200, Amit Shah wrote:

> Of all the DMA calls, only dma_alloc_coherent might not actually
> call dma_ops->alloc_coherent. We make sure that gets called if the
> device that's being worked on is a PV device

I always thougt that's a mess... the reason it's done this way is that
Andi Kleen preferred it for some reason at the time. How about trying
to fix it cleanly so that dma_alloc_coherent always gets called rather
than adding a hack to work around a hack? It will require auditing all
of the different x86 dma-ops but I can help with that.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation

2007-11-12 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 04:21:05PM +0200, Amit Shah wrote:

> A guest can call dma_ops->is_pv_device() to find out if a device is
> a passthrough'ed device (device passed on to a guest by the
> host). If this is true, a hypercall will be made to translate DMA
> mapping operations.

Doesn't really belong in the DMA mapping API. Instead what I think we
should do is to cache this (per device) value in the pci_device struct
(or device struct?) and in the dma-ops implementation inspect it to
decide exactly how to do DMA mapping for this device.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

2007-11-12 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:

> We make the dma_mapping_ops structure to point to our structure so
> that every DMA access goes through us. (This is the reason this only
> works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops
> struct.)

I need the same facility for Calgary for falling back to swiotlb if a
translation is disabled on some slot, and IB needs the same facility
for some IB adapters (e.g., ipath). Perhaps it's time to consider
stackable dma-ops (unless someone has a better idea...).

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

2007-11-12 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:

 We make the dma_mapping_ops structure to point to our structure so
 that every DMA access goes through us. (This is the reason this only
 works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops
 struct.)

I need the same facility for Calgary for falling back to swiotlb if a
translation is disabled on some slot, and IB needs the same facility
for some IB adapters (e.g., ipath). Perhaps it's time to consider
stackable dma-ops (unless someone has a better idea...).

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation

2007-11-12 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 04:21:05PM +0200, Amit Shah wrote:

 A guest can call dma_ops-is_pv_device() to find out if a device is
 a passthrough'ed device (device passed on to a guest by the
 host). If this is true, a hypercall will be made to translate DMA
 mapping operations.

Doesn't really belong in the DMA mapping API. Instead what I think we
should do is to cache this (per device) value in the pci_device struct
(or device struct?) and in the dma-ops implementation inspect it to
decide exactly how to do DMA mapping for this device.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware

2007-11-12 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 04:21:06PM +0200, Amit Shah wrote:

 Of all the DMA calls, only dma_alloc_coherent might not actually
 call dma_ops-alloc_coherent. We make sure that gets called if the
 device that's being worked on is a PV device

I always thougt that's a mess... the reason it's done this way is that
Andi Kleen preferred it for some reason at the time. How about trying
to fix it cleanly so that dma_alloc_coherent always gets called rather
than adding a hack to work around a hack? It will require auditing all
of the different x86 dma-ops but I can help with that.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

2007-11-12 Thread Muli Ben-Yehuda
On Mon, Nov 12, 2007 at 05:26:24PM +0530, Amit Shah wrote:
 On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote:
  On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
   We make the dma_mapping_ops structure to point to our structure so
   that every DMA access goes through us. (This is the reason this only
   works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops
   struct.)
 
  I need the same facility for Calgary for falling back to swiotlb if a
  translation is disabled on some slot, and IB needs the same facility
  for some IB adapters (e.g., ipath). Perhaps it's time to consider
  stackable dma-ops (unless someone has a better idea...).
 
 That would make great sense and simplify implementations.
 
 How do you propose such an implementation? An array of function
 pointers for each possible call?

I was thinking of simply a chain of dma_ops (via dma_ops-prev_ops) ,
where it's the responsibility of each dma_ops implementation to call
or not call the corresponding entry in chain (prev_ops-op()). This
works well for Calgary (which will only use prev_ops selectively, and
I think it will work well for the IB folks. Will it work for you?

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

2007-11-12 Thread Muli Ben-Yehuda
On Mon, Nov 12, 2007 at 07:25:27PM +0530, Amit Shah wrote:

 Selectively? What happens in the case when some iommu doesn't want
 to invoke the prev_op, but the mapping depends on it being called
 (eg, the hypercalling op is embedded somewhere in the prev_op chain)

Bad things :-)
There needs to be a hierarchy of dma-ops, e.g., nommu/swiotlb, then a
hardware iommu, then pvdma. Not sure where IB fits in here. The
calling order would be the reverse of the initialization order, so
pvdma-hardare-nommu/swiotlb.

 Hmm, also, a hypercall should be the last operation to be called in
 a few cases, but also the first (and the last) to be called in
 several other cases. For example, in a guest, you can go register
 any number of iotlbs, but you don't actually want to do anything
 there -- you just want to do a hypercall and get the mapping from
 the host.
 
 But in any case, what ensures that the hypercall op always gets
 called and also that it's the last one?

If it gets called first it can ensure that it runs either first or
last, or both, since it controls when to run the other hooks, before
or after it does what it needs to do.
 
 Also, I'm thinking of implementations where let's say sg_map_free is
 not defined for a particular iotlb, but it was defined in the
 previously registered one. How to handle this?

Good point, this will require all dma ops implementation to provide
stubs that just return prev_ops-op if it's set.

 It seems a small dispatcher which takes care of this seems the
 likely choice here, but avoiding it (or at least caching the
 decisions) is something that needs more thought.

Yeah, I'm not too enthusiastic about it, but we do need such a generic
mechanism or we will each end up implementing our own versions of
it...

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] x86 pci-calgary_64.c: make a variable static

2007-11-11 Thread Muli Ben-Yehuda
On Fri, Nov 09, 2007 at 07:03:30AM +0100, Adrian Bunk wrote:
> "debugging" is a horrible name for a global variable - thankfully it can 
> become static.
> 
> Also put it out of __read_mostly so that gcc no longer has to emit it
> at all.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Thanks, applied. Will push with my next batch of Calgary updates.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] x86 pci-calgary_64.c: make a variable static

2007-11-11 Thread Muli Ben-Yehuda
On Fri, Nov 09, 2007 at 07:03:30AM +0100, Adrian Bunk wrote:
 debugging is a horrible name for a global variable - thankfully it can 
 become static.
 
 Also put it out of __read_mostly so that gcc no longer has to emit it
 at all.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Thanks, applied. Will push with my next batch of Calgary updates.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix incorrect test in trident_ac97_set(); sound/oss/trident.c

2007-11-07 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 11:04:41AM -0800, Ray Lee wrote:

> On Nov 7, 2007 10:50 AM, Roel Kluin <[EMAIL PROTECTED]> wrote:

> > If count reaches zero, the loop ends, but the postfix decrement
> > still subtracts: testing for 'count == 0' will not work.
> >
> > Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
> > ---
> > diff --git a/sound/oss/trident.c b/sound/oss/trident.c
> > index 96adc47..6959ee1 100644
> > --- a/sound/oss/trident.c
> > +++ b/sound/oss/trident.c
> > @@ -2935,7 +2935,7 @@ trident_ac97_set(struct ac97_codec *codec, u8 reg, 
> > u16 val)
> > do {
> > if ((inw(TRID_REG(card, address)) & busy) == 0)
> > break;
> > -   } while (count--);
> > +   } while (--count);
> >
> > data |= (mask | (reg & AC97_REG_ADDR));
> >
> > @@ -2996,7 +2996,7 @@ trident_ac97_get(struct ac97_codec *codec, u8 reg)
> > data = inl(TRID_REG(card, address));
> > if ((data & busy) == 0)
> > break;
> > -   } while (count--);
> > +   } while (--count);
> >
> > spin_unlock_irqrestore(>lock, flags);
> >
> > if (count == 0) {
> >
> 
> Thanks, much better. In the future, please also CC: the appropriate
> maintainers, or Andrew Morton if you're at a loss...

Indeed.

> Reviewed-by: Ray Lee <[EMAIL PROTECTED]>

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Andrew, can you please push to Linus?

Thanks,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] fix incorrect test in trident_ac97_set(); sound/oss/trident.c

2007-11-07 Thread Muli Ben-Yehuda
On Wed, Nov 07, 2007 at 11:04:41AM -0800, Ray Lee wrote:

 On Nov 7, 2007 10:50 AM, Roel Kluin [EMAIL PROTECTED] wrote:

  If count reaches zero, the loop ends, but the postfix decrement
  still subtracts: testing for 'count == 0' will not work.
 
  Signed-off-by: Roel Kluin [EMAIL PROTECTED]
  ---
  diff --git a/sound/oss/trident.c b/sound/oss/trident.c
  index 96adc47..6959ee1 100644
  --- a/sound/oss/trident.c
  +++ b/sound/oss/trident.c
  @@ -2935,7 +2935,7 @@ trident_ac97_set(struct ac97_codec *codec, u8 reg, 
  u16 val)
  do {
  if ((inw(TRID_REG(card, address))  busy) == 0)
  break;
  -   } while (count--);
  +   } while (--count);
 
  data |= (mask | (reg  AC97_REG_ADDR));
 
  @@ -2996,7 +2996,7 @@ trident_ac97_get(struct ac97_codec *codec, u8 reg)
  data = inl(TRID_REG(card, address));
  if ((data  busy) == 0)
  break;
  -   } while (count--);
  +   } while (--count);
 
  spin_unlock_irqrestore(card-lock, flags);
 
  if (count == 0) {
 
 
 Thanks, much better. In the future, please also CC: the appropriate
 maintainers, or Andrew Morton if you're at a loss...

Indeed.

 Reviewed-by: Ray Lee [EMAIL PROTECTED]

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Andrew, can you please push to Linus?

Thanks,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 0/3] convert IOMMUs to use iova

2007-11-02 Thread Muli Ben-Yehuda
On Sat, Nov 03, 2007 at 02:05:39AM +0900, FUJITA Tomonori wrote:

> This patchset convert the PPC64 IOMMU to use the iova code for free
> area management.
> 
> The IOMMUs ignores low level drivers' restrictions, the maximum
> segment size and segment boundary.
> 
> I fixed the former:
> 
> http://thread.gmane.org/gmane.linux.scsi/35602
> 
> The latter makes the free area management complicated. I'd like to
> convert IOMMUs to use the iova code (that intel-iommu introduced)
> for free area management and enable iova to handle segment boundary
> restrictions, rather than fixing all the IOMMUs' free area
> management,

In general it sounds like a great idea, but have you looked at what
impact this has on the performance of the IO path?

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 0/3] convert IOMMUs to use iova

2007-11-02 Thread Muli Ben-Yehuda
On Sat, Nov 03, 2007 at 02:05:39AM +0900, FUJITA Tomonori wrote:

 This patchset convert the PPC64 IOMMU to use the iova code for free
 area management.
 
 The IOMMUs ignores low level drivers' restrictions, the maximum
 segment size and segment boundary.
 
 I fixed the former:
 
 http://thread.gmane.org/gmane.linux.scsi/35602
 
 The latter makes the free area management complicated. I'd like to
 convert IOMMUs to use the iova code (that intel-iommu introduced)
 for free area management and enable iova to handle segment boundary
 restrictions, rather than fixing all the IOMMUs' free area
 management,

In general it sounds like a great idea, but have you looked at what
impact this has on the performance of the IO path?

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] iommu-PMEN_REG boot up support

2007-10-30 Thread Muli Ben-Yehuda
On Mon, Oct 29, 2007 at 01:31:30PM -0700, Mark Gross wrote:

> > What's the value of adding a command line option for this? Under
> > what circumstances would we want to not clear this bit?
> 
> umm, /me asks around and finds out that at this time there isn't
> much point to having a command line to preserve any protected areas
> set up at boot time.

The society for controlled proliferation of command line options
thanks you.

> Signed-off-by: mark gross <[EMAIL PROTECTED]>

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] iommu-PMEN_REG boot up support

2007-10-30 Thread Muli Ben-Yehuda
On Mon, Oct 29, 2007 at 01:31:30PM -0700, Mark Gross wrote:

  What's the value of adding a command line option for this? Under
  what circumstances would we want to not clear this bit?
 
 umm, /me asks around and finds out that at this time there isn't
 much point to having a command line to preserve any protected areas
 set up at boot time.

The society for controlled proliferation of command line options
thanks you.

 Signed-off-by: mark gross [EMAIL PROTECTED]

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] iommu-PMEN_REG boot up support

2007-10-26 Thread Muli Ben-Yehuda
On Fri, Oct 26, 2007 at 11:18:49AM -0700, Mark Gross wrote:

> The following patch clears the portect memory region enable bit at
> boot time by default.  It also provides a kernel parrameter for
> disabling this behavior and leave the PMEN_REG untouched if so
> wanted.
> 
> If the boot loader or platform has protected memory regions enabled
> at boot time it could prevent DMA's from happening as drivers are
> loaded and used.

What's the value of adding a command line option for this? Under what
circumstances would we want to not clear this bit?

> + } else if (!strncmp(str, "no_epm_clear", 12)) {
> + printk(KERN_INFO
> + "Intel-IOMMU: subress clearing of Enable "
> + "Protected Memory bit\n");
> + clear_pmen_epm = 0;

`suppress', I assume. Rest looks fine, if the configuration option is
really needed.

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] iommu-PMEN_REG boot up support

2007-10-26 Thread Muli Ben-Yehuda
On Fri, Oct 26, 2007 at 11:18:49AM -0700, Mark Gross wrote:

 The following patch clears the portect memory region enable bit at
 boot time by default.  It also provides a kernel parrameter for
 disabling this behavior and leave the PMEN_REG untouched if so
 wanted.
 
 If the boot loader or platform has protected memory regions enabled
 at boot time it could prevent DMA's from happening as drivers are
 loaded and used.

What's the value of adding a command line option for this? Under what
circumstances would we want to not clear this bit?

 + } else if (!strncmp(str, no_epm_clear, 12)) {
 + printk(KERN_INFO
 + Intel-IOMMU: subress clearing of Enable 
 + Protected Memory bit\n);
 + clear_pmen_epm = 0;

`suppress', I assume. Rest looks fine, if the configuration option is
really needed.

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86 gart: rename symbols only used for the GART implementation

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 08:10:54PM +0200, Andi Kleen wrote:

> I think his goal was to get an prefix that describes the module
> uniquely.  gart_* clearly does not fulfill that criteria.
> 
> So basically he's replacing an
> ambigious-with-other-IOMMU-implementations prefix with an
> ambigious-with-AGP prefix. Seems like a rather pointless change.

Not at all - the present situation where GART specific code
masquerades as generic IOMMU code is confusing. I've heard from more
than one person who was confused by "how come I can enable an IOMMU
while CONFIG_IOMMU is off?" and "how come some iommu_xxx options
(which were really gart options...) don't actually affect the
(Calgary) IOMMU"?

If the variable names clash the correct solution is to s/gart/agpgart/
in the AGP code and then continue renaming gart specific bits 'gart'
rather than 'iommu'.

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] x86 gart: make some variables and functions static

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 07:41:32PM +0200, Joerg Roedel wrote:
> This patch makes some functions and variables static in pci-gart_64.c which 
> are
> not used somewhere else.
> 
> Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] x86 gart: rename CONFIG_IOMMU to CONFIG_GART_IOMMU

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 07:41:31PM +0200, Joerg Roedel wrote:
> This patch renames the IOMMU config option to GART_IOMMU because in fact it
> means the GART and not general support for an IOMMU on x86.
> 
> Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86 gart: rename iommu.h to gart.h

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 07:41:30PM +0200, Joerg Roedel wrote:

> This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
> clear to which IOMMU implementation it belongs. The patch also adds "GART" to
> the Kconfig line.
> 
> Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] x86 gart: rename iommu.h to gart.h

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 07:41:30PM +0200, Joerg Roedel wrote:

 This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
 clear to which IOMMU implementation it belongs. The patch also adds GART to
 the Kconfig line.
 
 Signed-off-by: Joerg Roedel [EMAIL PROTECTED]

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] x86 gart: make some variables and functions static

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 07:41:32PM +0200, Joerg Roedel wrote:
 This patch makes some functions and variables static in pci-gart_64.c which 
 are
 not used somewhere else.
 
 Signed-off-by: Joerg Roedel [EMAIL PROTECTED]

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] x86 gart: rename CONFIG_IOMMU to CONFIG_GART_IOMMU

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 07:41:31PM +0200, Joerg Roedel wrote:
 This patch renames the IOMMU config option to GART_IOMMU because in fact it
 means the GART and not general support for an IOMMU on x86.
 
 Signed-off-by: Joerg Roedel [EMAIL PROTECTED]

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] x86 gart: rename symbols only used for the GART implementation

2007-10-23 Thread Muli Ben-Yehuda
On Tue, Oct 23, 2007 at 08:10:54PM +0200, Andi Kleen wrote:

 I think his goal was to get an prefix that describes the module
 uniquely.  gart_* clearly does not fulfill that criteria.
 
 So basically he's replacing an
 ambigious-with-other-IOMMU-implementations prefix with an
 ambigious-with-AGP prefix. Seems like a rather pointless change.

Not at all - the present situation where GART specific code
masquerades as generic IOMMU code is confusing. I've heard from more
than one person who was confused by how come I can enable an IOMMU
while CONFIG_IOMMU is off? and how come some iommu_xxx options
(which were really gart options...) don't actually affect the
(Calgary) IOMMU?

If the variable names clash the correct solution is to s/gart/agpgart/
in the AGP code and then continue renaming gart specific bits 'gart'
rather than 'iommu'.

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: rename iommu.h to gart.h

2007-10-19 Thread Muli Ben-Yehuda
On Fri, Oct 19, 2007 at 02:38:11PM +0200, Joerg Roedel wrote:

> This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
> clear to which IOMMU implementation it belongs. The patch also adds "GART" to
> the Kconfig line.

Long overdue IMHO. How about also renaming the config option to
CONFIG_GART_IOMMU while you're at it?

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: rename iommu.h to gart.h

2007-10-19 Thread Muli Ben-Yehuda
On Fri, Oct 19, 2007 at 02:38:11PM +0200, Joerg Roedel wrote:

 This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
 clear to which IOMMU implementation it belongs. The patch also adds GART to
 the Kconfig line.

Long overdue IMHO. How about also renaming the config option to
CONFIG_GART_IOMMU while you're at it?

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump

2007-10-13 Thread Muli Ben-Yehuda
On Wed, Oct 10, 2007 at 11:00:58AM +0530, Vivek Goyal wrote:
> On Tue, Oct 09, 2007 at 11:06:23PM +0200, Muli Ben-Yehuda wrote:
> > Hi Chandru,
> > 
> > Thanks for the patch. Comments on the patch below, but first a general
> > question for my education: the main problem here that aacraid
> > continues DMA'ing when it shouldn't. Why can't we shut it down
> > cleanly? Even without the presence of an IOMMU, it seems dangerous to
> > let an adapter continue DMA'ing to and from memory when the kernel is
> > an inconsistent state.
> > 
> 
> Hi Muli,
> 
> After the kernel crash, we can't rely on the crashed kernel's data
> structures or methods any more. We can't call the device shutdown
> methods of all the device drivers as we might be invoking a driver
> which actually might have caused the crash. Hence we don't perform
> any device shutdown in the case of kdump. Instead after the crash,
> we try to take the shortest route to second kernel (Execute minimum
> code in crashed kernel).
> 
> Whatever special handling is required to bring up the second kernel on 
> a potentially unknown state hardware, is taken in second kernel.

That makes sense, but does it really make more sense to set-up proper
IOMMU translation tables for DMA's which have been triggered from the
first kernel than to either quiesce the device ASAP (this means before
enabling IOMMU translation...) or to trap such DMA's and continue,
rather than letting them go through?

> We will not be too concerned about ongoing DMA's as long as there is no
> corruption of tce tables. That would mean DMA is happening in first
> kernel's memory buffer and second kernel is not impacted. But if TCE
> tables themselves are corrupted, then it can potentially interfere with
> second kernel's operation. Don't know how it can be addressed.

I worry about two cases: the first is TCE table corruption, the second
is DMA's to areas which were fine in the first kernel but are wrong
for the second kernel.

> > The patch below looks reasonable *if* that is the least worst way
> > of doing it - let's see if we can come up with something cleaner
> > that doesn't rely in the new kernel on data (which may or may not
> > be corrupted...) from the old kernel.
> 
> I think the issue here is that some DMA was going on when first
> kernel crashed. After the crash, second kernel booted and created
> new TCE tables and switched to it. This resulted in ongoing DMA
> failure and hardware raised an alarm.
> 
> In this case, probably it would make sense to re-use the TCE tables
> of previous kernel (until and unless we have a way to tell hardware
> not to flag a DMA error if TCE mapping is changed while DMA is going
> on ?)

We don't have a way to do that, but we should be able to trap the DMA,
figure out that we're in a kdump kernel, and then just ignore it and
continue.

> I think, we also need to reserve some TCE table entries (in first
> kernel), which can be used by second kernel for saving kernel core
> file to disk. There might be a case where first kernel has used up
> all TCE entries and second kernel can't allocate more. I think ppc64
> has taken the approach of freeing some entries in second kernel but
> that will have the problem as you might be clearing an entry which
> is being used by ongoing DMA.

That's a good point - how do PPC (or other architectures with an
isolation-capable IOMMU) handle this whole issue?

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump

2007-10-13 Thread Muli Ben-Yehuda
On Wed, Oct 10, 2007 at 11:00:58AM +0530, Vivek Goyal wrote:
 On Tue, Oct 09, 2007 at 11:06:23PM +0200, Muli Ben-Yehuda wrote:
  Hi Chandru,
  
  Thanks for the patch. Comments on the patch below, but first a general
  question for my education: the main problem here that aacraid
  continues DMA'ing when it shouldn't. Why can't we shut it down
  cleanly? Even without the presence of an IOMMU, it seems dangerous to
  let an adapter continue DMA'ing to and from memory when the kernel is
  an inconsistent state.
  
 
 Hi Muli,
 
 After the kernel crash, we can't rely on the crashed kernel's data
 structures or methods any more. We can't call the device shutdown
 methods of all the device drivers as we might be invoking a driver
 which actually might have caused the crash. Hence we don't perform
 any device shutdown in the case of kdump. Instead after the crash,
 we try to take the shortest route to second kernel (Execute minimum
 code in crashed kernel).
 
 Whatever special handling is required to bring up the second kernel on 
 a potentially unknown state hardware, is taken in second kernel.

That makes sense, but does it really make more sense to set-up proper
IOMMU translation tables for DMA's which have been triggered from the
first kernel than to either quiesce the device ASAP (this means before
enabling IOMMU translation...) or to trap such DMA's and continue,
rather than letting them go through?

 We will not be too concerned about ongoing DMA's as long as there is no
 corruption of tce tables. That would mean DMA is happening in first
 kernel's memory buffer and second kernel is not impacted. But if TCE
 tables themselves are corrupted, then it can potentially interfere with
 second kernel's operation. Don't know how it can be addressed.

I worry about two cases: the first is TCE table corruption, the second
is DMA's to areas which were fine in the first kernel but are wrong
for the second kernel.

  The patch below looks reasonable *if* that is the least worst way
  of doing it - let's see if we can come up with something cleaner
  that doesn't rely in the new kernel on data (which may or may not
  be corrupted...) from the old kernel.
 
 I think the issue here is that some DMA was going on when first
 kernel crashed. After the crash, second kernel booted and created
 new TCE tables and switched to it. This resulted in ongoing DMA
 failure and hardware raised an alarm.
 
 In this case, probably it would make sense to re-use the TCE tables
 of previous kernel (until and unless we have a way to tell hardware
 not to flag a DMA error if TCE mapping is changed while DMA is going
 on ?)

We don't have a way to do that, but we should be able to trap the DMA,
figure out that we're in a kdump kernel, and then just ignore it and
continue.

 I think, we also need to reserve some TCE table entries (in first
 kernel), which can be used by second kernel for saving kernel core
 file to disk. There might be a case where first kernel has used up
 all TCE entries and second kernel can't allocate more. I think ppc64
 has taken the approach of freeing some entries in second kernel but
 that will have the problem as you might be clearing an entry which
 is being used by ongoing DMA.

That's a good point - how do PPC (or other architectures with an
isolation-capable IOMMU) handle this whole issue?

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump

2007-10-09 Thread Muli Ben-Yehuda
Hi Chandru,

Thanks for the patch. Comments on the patch below, but first a general
question for my education: the main problem here that aacraid
continues DMA'ing when it shouldn't. Why can't we shut it down
cleanly? Even without the presence of an IOMMU, it seems dangerous to
let an adapter continue DMA'ing to and from memory when the kernel is
an inconsistent state.

The patch below looks reasonable *if* that is the least worst way of
doing it - let's see if we can come up with something cleaner that
doesn't rely in the new kernel on data (which may or may not be
corrupted...) from the old kernel.

Cheers,
Muli

On Wed, Oct 10, 2007 at 02:10:13AM +0530, chandru wrote:

> kdump kernel fails to boot with calgary iommu and aacraid driver on
> a x366 box.  The ongoing dma's of aacraid from the first kernel
> continue to exist until the driver is loaded in the kdump
> kernel. Calgary is initialized prior to aacraid and creation of new
> tce tables causes wrong dma's to occur.
> 
> Here we try to grab the tce tables of the first kernel in kdump
> kernel and use them. While in the kdump kernel we do not allocate
> new tce tables but rather read the base address register contents of
> calgary iommu and use the tables that the registers point to. With
> these changes the kdump kernel and hence aacraid now boots normally.

> Another point that came when talking with Vivek was to reserve part
> of the tce table space in first kernel for use in the kdump kernel.
> 
> Signed-off-by: Chandru S <[EMAIL PROTECTED]>
> ---
> 
> --- linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c.orig
> 2007-10-09 23:39:22.0 +0530
> +++ linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c   2007-10-10
> 01:25:53.0 +0530
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -165,6 +166,7 @@ static void calgary_dump_error_regs(stru
>  static void calioc2_handle_quirks(struct iommu_table *tbl, struct
> pci_dev *dev);
>  static void calioc2_tce_cache_blast(struct iommu_table *tbl);
>  static void calioc2_dump_error_regs(struct iommu_table *tbl);
> +static int  calgary_bus_has_devices(int , unsigned short ) __init;

Please add parameter names (int bus, etc).

>  static struct cal_chipset_ops calgary_chip_ops = {
> .handle_quirks = calgary_handle_quirks,
> @@ -819,7 +821,23 @@ static int __init calgary_setup_tar(stru
> 
> tbl = pci_iommu(dev->bus);
> tbl->it_base = (unsigned
> long)bus_info[dev->bus->number].tce_space;
> -   tce_free(tbl, 0, tbl->it_size);
> +#ifdef CONFIG_CRASH_DUMP
> +if (is_kdump_kernel()){
> +u64 *tp;
> +unsigned int index;
> +tp = ((u64*)tbl->it_base);
> +for(index=0;index < tbl->it_size; index++ ){
> +if ( *tp != 0x0 )
> +set_bit(index,tbl->it_map);
> +
> +tp++;
> +}
> +}
> +else
> +#endif
> +   {
> +   tce_free(tbl, 0, tbl->it_size);
> +   }

Please no #ifdefs in here. Do it like this:

if (is_kdump_kernel())
   something()
else
   something_else()

Where is_kdump_kernel() is defined to 0 if CONFIG_CRASH_DUMP is not
set.

Also, please move the part where we scan the table into a suitably
named function, e.g., calgary_init_bitmap_from_tce_table().

Also, coding style - please run the patch through checkpatch.pl.

> +#ifdef CONFIG_CRASH_DUMP
> +   /*
> +* If this is a kdump kernel, then try grabbing the tce tables
> +* from first kernel by reading the contents of the base
> +* address register of calgary iommu
> +*/
> +   if(is_kdump_kernel()){

Same comments as above.

> +   for (bus = 0; bus < MAX_PHB_BUS_NUM; bus++) {
> +   struct calgary_bus_info *info = _info[bus];
> +   unsigned short pci_device;
> +   unsigned long tce_space;
> +   u32 val;
> +
> +   val = read_pci_config(bus, 0, 0, 0);
> +   pci_device = (val & 0x) >> 16;
> +
> +   if (!is_cal_pci_dev(pci_device))
> +   continue;
> +
> +   if (info->translation_disabled)
> +   continue;
> +
> +   if (calgary_bus_has_devices(bus, pci_device) ||
> +   translate_empty_slots ){
> +
> +   target = calgary_reg(bus_info[bus].bbar,
> tar_offset(bus));
> +   tce_space = be64_to_cpu(readq(target));
> +   tce_space = tce_space & TAR_SW_BITS;
> +
> +   BUG_ON(specified_table_size >
> TCE_TABLE_SIZE_8M);
> +
> +   tce_space = tce_space & ( 
> ~specified_table_size);
> +   info->tce_space = 

Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump

2007-10-09 Thread Muli Ben-Yehuda
Hi Chandru,

Thanks for the patch. Comments on the patch below, but first a general
question for my education: the main problem here that aacraid
continues DMA'ing when it shouldn't. Why can't we shut it down
cleanly? Even without the presence of an IOMMU, it seems dangerous to
let an adapter continue DMA'ing to and from memory when the kernel is
an inconsistent state.

The patch below looks reasonable *if* that is the least worst way of
doing it - let's see if we can come up with something cleaner that
doesn't rely in the new kernel on data (which may or may not be
corrupted...) from the old kernel.

Cheers,
Muli

On Wed, Oct 10, 2007 at 02:10:13AM +0530, chandru wrote:

 kdump kernel fails to boot with calgary iommu and aacraid driver on
 a x366 box.  The ongoing dma's of aacraid from the first kernel
 continue to exist until the driver is loaded in the kdump
 kernel. Calgary is initialized prior to aacraid and creation of new
 tce tables causes wrong dma's to occur.
 
 Here we try to grab the tce tables of the first kernel in kdump
 kernel and use them. While in the kdump kernel we do not allocate
 new tce tables but rather read the base address register contents of
 calgary iommu and use the tables that the registers point to. With
 these changes the kdump kernel and hence aacraid now boots normally.

 Another point that came when talking with Vivek was to reserve part
 of the tce table space in first kernel for use in the kdump kernel.
 
 Signed-off-by: Chandru S [EMAIL PROTECTED]
 ---
 
 --- linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c.orig
 2007-10-09 23:39:22.0 +0530
 +++ linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c   2007-10-10
 01:25:53.0 +0530
 @@ -35,6 +35,7 @@
  #include linux/pci_ids.h
  #include linux/pci.h
  #include linux/delay.h
 +#include linux/bootmem.h
  #include asm/iommu.h
  #include asm/calgary.h
  #include asm/tce.h
 @@ -165,6 +166,7 @@ static void calgary_dump_error_regs(stru
  static void calioc2_handle_quirks(struct iommu_table *tbl, struct
 pci_dev *dev);
  static void calioc2_tce_cache_blast(struct iommu_table *tbl);
  static void calioc2_dump_error_regs(struct iommu_table *tbl);
 +static int  calgary_bus_has_devices(int , unsigned short ) __init;

Please add parameter names (int bus, etc).

  static struct cal_chipset_ops calgary_chip_ops = {
 .handle_quirks = calgary_handle_quirks,
 @@ -819,7 +821,23 @@ static int __init calgary_setup_tar(stru
 
 tbl = pci_iommu(dev-bus);
 tbl-it_base = (unsigned
 long)bus_info[dev-bus-number].tce_space;
 -   tce_free(tbl, 0, tbl-it_size);
 +#ifdef CONFIG_CRASH_DUMP
 +if (is_kdump_kernel()){
 +u64 *tp;
 +unsigned int index;
 +tp = ((u64*)tbl-it_base);
 +for(index=0;index  tbl-it_size; index++ ){
 +if ( *tp != 0x0 )
 +set_bit(index,tbl-it_map);
 +
 +tp++;
 +}
 +}
 +else
 +#endif
 +   {
 +   tce_free(tbl, 0, tbl-it_size);
 +   }

Please no #ifdefs in here. Do it like this:

if (is_kdump_kernel())
   something()
else
   something_else()

Where is_kdump_kernel() is defined to 0 if CONFIG_CRASH_DUMP is not
set.

Also, please move the part where we scan the table into a suitably
named function, e.g., calgary_init_bitmap_from_tce_table().

Also, coding style - please run the patch through checkpatch.pl.

 +#ifdef CONFIG_CRASH_DUMP
 +   /*
 +* If this is a kdump kernel, then try grabbing the tce tables
 +* from first kernel by reading the contents of the base
 +* address register of calgary iommu
 +*/
 +   if(is_kdump_kernel()){

Same comments as above.

 +   for (bus = 0; bus  MAX_PHB_BUS_NUM; bus++) {
 +   struct calgary_bus_info *info = bus_info[bus];
 +   unsigned short pci_device;
 +   unsigned long tce_space;
 +   u32 val;
 +
 +   val = read_pci_config(bus, 0, 0, 0);
 +   pci_device = (val  0x)  16;
 +
 +   if (!is_cal_pci_dev(pci_device))
 +   continue;
 +
 +   if (info-translation_disabled)
 +   continue;
 +
 +   if (calgary_bus_has_devices(bus, pci_device) ||
 +   translate_empty_slots ){
 +
 +   target = calgary_reg(bus_info[bus].bbar,
 tar_offset(bus));
 +   tce_space = be64_to_cpu(readq(target));
 +   tce_space = tce_space  TAR_SW_BITS;
 +
 +   BUG_ON(specified_table_size 
 TCE_TABLE_SIZE_8M);
 +
 +   tce_space = tce_space  ( 
 ~specified_table_size);
 +   info-tce_space = (u64
 *)__va(tce_space);
 +  

[PATCH 2/2] x86-64: Calgary - get rid of translate_phb

2007-09-21 Thread Muli Ben-Yehuda
Now that we check for translation enabled/disabled based on the
presence of the IOMMU translation table, we can get rid of
translate_phb.

Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>
---
 pci-calgary.c |   31 +++
 1 files changed, 15 insertions(+), 16 deletions(-)

diff -r c9308d0538d9 -r a7833c71f4df arch/x86_64/kernel/pci-calgary.c
--- a/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:43:49 2007 +0300
+++ b/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:45:29 2007 +0300
@@ -226,12 +226,6 @@ static inline int translation_enabled(st
return (tbl != NULL);
 }
 
-static inline int translate_phb(struct pci_dev* dev)
-{
-   int disabled = bus_info[dev->bus->number].translation_disabled;
-   return !disabled;
-}
-
 static void iommu_range_reserve(struct iommu_table *tbl,
unsigned long start_addr, unsigned int npages)
 {
@@ -1197,7 +1191,7 @@ static int __init calgary_init(void)
 {
int ret;
struct pci_dev *dev = NULL;
-   void *tce_space;
+   struct calgary_bus_info *info;
 
ret = calgary_locate_bbars();
if (ret)
@@ -1209,12 +1203,14 @@ static int __init calgary_init(void)
break;
if (!is_cal_pci_dev(dev->device))
continue;
-   if (!translate_phb(dev)) {
+
+   info = _info[dev->bus->number];
+   if (info->translation_disabled) {
calgary_init_one_nontraslated(dev);
continue;
}
-   tce_space = bus_info[dev->bus->number].tce_space;
-   if (!tce_space && !translate_empty_slots)
+
+   if (!info->tce_space && !translate_empty_slots)
continue;
 
ret = calgary_init_one(dev);
@@ -1232,11 +1228,13 @@ error:
break;
if (!is_cal_pci_dev(dev->device))
continue;
-   if (!translate_phb(dev)) {
+
+   info = _info[dev->bus->number];
+   if (info->translation_disabled) {
pci_dev_put(dev);
continue;
}
-   if (!bus_info[dev->bus->number].tce_space && 
!translate_empty_slots)
+   if (!info->tce_space && !translate_empty_slots)
continue;
 
calgary_disable_translation(dev);
@@ -1549,7 +1547,7 @@ static int __init calgary_fixup_tce_spac
 static int __init calgary_fixup_tce_spaces(void)
 {
struct pci_dev *dev = NULL;
-   void *tce_space;
+   struct calgary_bus_info *info;
 
if (no_iommu || swiotlb || !calgary_detected)
return -ENODEV;
@@ -1562,11 +1560,12 @@ static int __init calgary_fixup_tce_spac
break;
if (!is_cal_pci_dev(dev->device))
continue;
-   if (!translate_phb(dev))
+
+   info = _info[dev->bus->number];
+   if (info->translation_disabled)
continue;
 
-   tce_space = bus_info[dev->bus->number].tce_space;
-   if (!tce_space)
+   if (!info->tce_space)
continue;
 
calgary_fixup_one_tce_space(dev);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] x86-64: Calgary - fix calgary=disable= for CalIOC2

2007-09-21 Thread Muli Ben-Yehuda
The old check we used based on dev->bus->number is wrong for devices
on CalIOC2. Instead look whether we have an IOMMU table for that bus -
if not, translation is disabled.

Thanks to Murillo Fernandes Bernardes <[EMAIL PROTECTED]> for
spotting, suggesting a fix and testing.

Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>
Acked-by: Murillo Fernandes Bernardes <[EMAIL PROTECTED]>
---
Andi, This is Calgary 2.6.24 material - please apply.

 pci-calgary.c |   16 +++-
 1 files changed, 11 insertions(+), 5 deletions(-)

diff -r ca20f7bdb869 -r c9308d0538d9 arch/x86_64/kernel/pci-calgary.c
--- a/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:38:08 2007 +0300
+++ b/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:43:49 2007 +0300
@@ -220,6 +220,12 @@ static inline unsigned int num_dma_pages
return npages;
 }
 
+static inline int translation_enabled(struct iommu_table *tbl)
+{
+   /* only PHBs with translation enabled have an IOMMU table */
+   return (tbl != NULL);
+}
+
 static inline int translate_phb(struct pci_dev* dev)
 {
int disabled = bus_info[dev->bus->number].translation_disabled;
@@ -384,7 +390,7 @@ static void calgary_unmap_sg(struct devi
 {
struct iommu_table *tbl = find_iommu_table(dev);
 
-   if (!translate_phb(to_pci_dev(dev)))
+   if (!translation_enabled(tbl))
return;
 
while (nelems--) {
@@ -424,7 +430,7 @@ static int calgary_map_sg(struct device 
unsigned long entry;
int i;
 
-   if (!translate_phb(to_pci_dev(dev)))
+   if (!translation_enabled(tbl))
return calgary_nontranslate_map_sg(dev, sg, nelems, direction);
 
for (i = 0; i < nelems; i++ ) {
@@ -471,7 +477,7 @@ static dma_addr_t calgary_map_single(str
uaddr = (unsigned long)vaddr;
npages = num_dma_pages(uaddr, size);
 
-   if (translate_phb(to_pci_dev(dev)))
+   if (translation_enabled(tbl))
dma_handle = iommu_alloc(tbl, vaddr, npages, direction);
else
dma_handle = virt_to_bus(vaddr);
@@ -485,7 +491,7 @@ static void calgary_unmap_single(struct 
struct iommu_table *tbl = find_iommu_table(dev);
unsigned int npages;
 
-   if (!translate_phb(to_pci_dev(dev)))
+   if (!translation_enabled(tbl))
return;
 
npages = num_dma_pages(dma_handle, size);
@@ -510,7 +516,7 @@ static void* calgary_alloc_coherent(stru
goto error;
memset(ret, 0, size);
 
-   if (translate_phb(to_pci_dev(dev))) {
+   if (translation_enabled(tbl)) {
/* set up tces to cover the allocated range */
mapping = iommu_alloc(tbl, ret, npages, DMA_BIDIRECTIONAL);
if (mapping == bad_dma_address)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] x86-64: Calgary - fix calgary=disable=busnum for CalIOC2

2007-09-21 Thread Muli Ben-Yehuda
The old check we used based on dev-bus-number is wrong for devices
on CalIOC2. Instead look whether we have an IOMMU table for that bus -
if not, translation is disabled.

Thanks to Murillo Fernandes Bernardes [EMAIL PROTECTED] for
spotting, suggesting a fix and testing.

Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED]
Acked-by: Murillo Fernandes Bernardes [EMAIL PROTECTED]
---
Andi, This is Calgary 2.6.24 material - please apply.

 pci-calgary.c |   16 +++-
 1 files changed, 11 insertions(+), 5 deletions(-)

diff -r ca20f7bdb869 -r c9308d0538d9 arch/x86_64/kernel/pci-calgary.c
--- a/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:38:08 2007 +0300
+++ b/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:43:49 2007 +0300
@@ -220,6 +220,12 @@ static inline unsigned int num_dma_pages
return npages;
 }
 
+static inline int translation_enabled(struct iommu_table *tbl)
+{
+   /* only PHBs with translation enabled have an IOMMU table */
+   return (tbl != NULL);
+}
+
 static inline int translate_phb(struct pci_dev* dev)
 {
int disabled = bus_info[dev-bus-number].translation_disabled;
@@ -384,7 +390,7 @@ static void calgary_unmap_sg(struct devi
 {
struct iommu_table *tbl = find_iommu_table(dev);
 
-   if (!translate_phb(to_pci_dev(dev)))
+   if (!translation_enabled(tbl))
return;
 
while (nelems--) {
@@ -424,7 +430,7 @@ static int calgary_map_sg(struct device 
unsigned long entry;
int i;
 
-   if (!translate_phb(to_pci_dev(dev)))
+   if (!translation_enabled(tbl))
return calgary_nontranslate_map_sg(dev, sg, nelems, direction);
 
for (i = 0; i  nelems; i++ ) {
@@ -471,7 +477,7 @@ static dma_addr_t calgary_map_single(str
uaddr = (unsigned long)vaddr;
npages = num_dma_pages(uaddr, size);
 
-   if (translate_phb(to_pci_dev(dev)))
+   if (translation_enabled(tbl))
dma_handle = iommu_alloc(tbl, vaddr, npages, direction);
else
dma_handle = virt_to_bus(vaddr);
@@ -485,7 +491,7 @@ static void calgary_unmap_single(struct 
struct iommu_table *tbl = find_iommu_table(dev);
unsigned int npages;
 
-   if (!translate_phb(to_pci_dev(dev)))
+   if (!translation_enabled(tbl))
return;
 
npages = num_dma_pages(dma_handle, size);
@@ -510,7 +516,7 @@ static void* calgary_alloc_coherent(stru
goto error;
memset(ret, 0, size);
 
-   if (translate_phb(to_pci_dev(dev))) {
+   if (translation_enabled(tbl)) {
/* set up tces to cover the allocated range */
mapping = iommu_alloc(tbl, ret, npages, DMA_BIDIRECTIONAL);
if (mapping == bad_dma_address)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] x86-64: Calgary - get rid of translate_phb

2007-09-21 Thread Muli Ben-Yehuda
Now that we check for translation enabled/disabled based on the
presence of the IOMMU translation table, we can get rid of
translate_phb.

Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED]
---
 pci-calgary.c |   31 +++
 1 files changed, 15 insertions(+), 16 deletions(-)

diff -r c9308d0538d9 -r a7833c71f4df arch/x86_64/kernel/pci-calgary.c
--- a/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:43:49 2007 +0300
+++ b/arch/x86_64/kernel/pci-calgary.c  Sun Aug 12 11:45:29 2007 +0300
@@ -226,12 +226,6 @@ static inline int translation_enabled(st
return (tbl != NULL);
 }
 
-static inline int translate_phb(struct pci_dev* dev)
-{
-   int disabled = bus_info[dev-bus-number].translation_disabled;
-   return !disabled;
-}
-
 static void iommu_range_reserve(struct iommu_table *tbl,
unsigned long start_addr, unsigned int npages)
 {
@@ -1197,7 +1191,7 @@ static int __init calgary_init(void)
 {
int ret;
struct pci_dev *dev = NULL;
-   void *tce_space;
+   struct calgary_bus_info *info;
 
ret = calgary_locate_bbars();
if (ret)
@@ -1209,12 +1203,14 @@ static int __init calgary_init(void)
break;
if (!is_cal_pci_dev(dev-device))
continue;
-   if (!translate_phb(dev)) {
+
+   info = bus_info[dev-bus-number];
+   if (info-translation_disabled) {
calgary_init_one_nontraslated(dev);
continue;
}
-   tce_space = bus_info[dev-bus-number].tce_space;
-   if (!tce_space  !translate_empty_slots)
+
+   if (!info-tce_space  !translate_empty_slots)
continue;
 
ret = calgary_init_one(dev);
@@ -1232,11 +1228,13 @@ error:
break;
if (!is_cal_pci_dev(dev-device))
continue;
-   if (!translate_phb(dev)) {
+
+   info = bus_info[dev-bus-number];
+   if (info-translation_disabled) {
pci_dev_put(dev);
continue;
}
-   if (!bus_info[dev-bus-number].tce_space  
!translate_empty_slots)
+   if (!info-tce_space  !translate_empty_slots)
continue;
 
calgary_disable_translation(dev);
@@ -1549,7 +1547,7 @@ static int __init calgary_fixup_tce_spac
 static int __init calgary_fixup_tce_spaces(void)
 {
struct pci_dev *dev = NULL;
-   void *tce_space;
+   struct calgary_bus_info *info;
 
if (no_iommu || swiotlb || !calgary_detected)
return -ENODEV;
@@ -1562,11 +1560,12 @@ static int __init calgary_fixup_tce_spac
break;
if (!is_cal_pci_dev(dev-device))
continue;
-   if (!translate_phb(dev))
+
+   info = bus_info[dev-bus-number];
+   if (info-translation_disabled)
continue;
 
-   tce_space = bus_info[dev-bus-number].tce_space;
-   if (!tce_space)
+   if (!info-tce_space)
continue;
 
calgary_fixup_one_tce_space(dev);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] unify DMA_..BIT_MASK definitions: v1

2007-09-17 Thread Muli Ben-Yehuda
On Tue, Sep 18, 2007 at 06:29:19AM +0200, Borislav Petkov wrote:
> These patches remove redundant DMA_..BIT_MASK definitions across two drivers.
> In this version of the patches, the computation of the bitmasks is done by
> the compiler.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> 
> --
> Index: 23-rc6/include/linux/dma-mapping.h
> ===
> --- 23-rc6/include/linux/dma-mapping.h.orig   2007-09-17 17:48:20.0 
> +0200
> +++ 23-rc6/include/linux/dma-mapping.h2007-09-18 06:12:33.0 
> +0200
> @@ -13,16 +13,19 @@
>   DMA_NONE = 3,
>  };
>  
> -#define DMA_64BIT_MASK   0xULL
> -#define DMA_48BIT_MASK   0xULL
> -#define DMA_40BIT_MASK   0x00ffULL
> -#define DMA_39BIT_MASK   0x007fULL
> -#define DMA_32BIT_MASK   0xULL
> -#define DMA_31BIT_MASK   0x7fffULL
> -#define DMA_30BIT_MASK   0x3fffULL
> -#define DMA_29BIT_MASK   0x1fffULL
> -#define DMA_28BIT_MASK   0x0fffULL
> -#define DMA_24BIT_MASK   0x00ffULL
> +#define DMA_BIT_MASK(n)  ((1ULL<<(n))-1)
> +
> +#define DMA_64BIT_MASK   DMA_BIT_MASK(64)

This one does not do what you mean. You need an explicit mask or a
~0ULL here.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] unify DMA_..BIT_MASK definitions: v1

2007-09-17 Thread Muli Ben-Yehuda
On Tue, Sep 18, 2007 at 06:29:19AM +0200, Borislav Petkov wrote:
 These patches remove redundant DMA_..BIT_MASK definitions across two drivers.
 In this version of the patches, the computation of the bitmasks is done by
 the compiler.
 
 Signed-off-by: Borislav Petkov [EMAIL PROTECTED]
 Cc: Jeremy Fitzhardinge [EMAIL PROTECTED]
 
 --
 Index: 23-rc6/include/linux/dma-mapping.h
 ===
 --- 23-rc6/include/linux/dma-mapping.h.orig   2007-09-17 17:48:20.0 
 +0200
 +++ 23-rc6/include/linux/dma-mapping.h2007-09-18 06:12:33.0 
 +0200
 @@ -13,16 +13,19 @@
   DMA_NONE = 3,
  };
  
 -#define DMA_64BIT_MASK   0xULL
 -#define DMA_48BIT_MASK   0xULL
 -#define DMA_40BIT_MASK   0x00ffULL
 -#define DMA_39BIT_MASK   0x007fULL
 -#define DMA_32BIT_MASK   0xULL
 -#define DMA_31BIT_MASK   0x7fffULL
 -#define DMA_30BIT_MASK   0x3fffULL
 -#define DMA_29BIT_MASK   0x1fffULL
 -#define DMA_28BIT_MASK   0x0fffULL
 -#define DMA_24BIT_MASK   0x00ffULL
 +#define DMA_BIT_MASK(n)  ((1ULL(n))-1)
 +
 +#define DMA_64BIT_MASK   DMA_BIT_MASK(64)

This one does not do what you mean. You need an explicit mask or a
~0ULL here.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Muli Ben-Yehuda
On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote:

> Yes, I agree that pci_dev->sysdata can;t be removed. Even we (IOMMU)
> were dependent on this field but somehow this field is being
> overwritten to point to pci_bus's->sysdata and hence IOMMU was
> failing. Earlier it was overwritten to NULL and hence we were not
> failing but now it is overwritten to non-NULL and hence we fail.

Do you know which commit caused that change?

> My therory is that we don;t need to copy pci_bus's->sysdata to 
> pci_dev's->sysdata. Below patch solves my problem.
> Any objection to below patch?

I will give it a spin to verify it works for me, but in general I am
wary of making such changes unless we can verify (read: audit) that
they have no adverse side effects *on all architectures*.

Cheers,
Muli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-10 Thread Muli Ben-Yehuda
On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote:

 Yes, I agree that pci_dev-sysdata can;t be removed. Even we (IOMMU)
 were dependent on this field but somehow this field is being
 overwritten to point to pci_bus's-sysdata and hence IOMMU was
 failing. Earlier it was overwritten to NULL and hence we were not
 failing but now it is overwritten to non-NULL and hence we fail.

Do you know which commit caused that change?

 My therory is that we don;t need to copy pci_bus's-sysdata to 
 pci_dev's-sysdata. Below patch solves my problem.
 Any objection to below patch?

I will give it a spin to verify it works for me, but in general I am
wary of making such changes unless we can verify (read: audit) that
they have no adverse side effects *on all architectures*.

Cheers,
Muli

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-09 Thread Muli Ben-Yehuda
On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote:
> On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote:
> > On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:
> > 
> > > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
> > 
> > This patch feels like a huge hack. See below.
>
> You seem to be jumping to conclusion without going in detail. The
> pci_dev struct contains pointer to sysdata, which in turn points to
> the copy of its parent's bus sysdata.  So technically speaking we
> can eliminate sysdata pointer from pci_dev struct which is what one
> portion of this patch does.

... provided nothing relies on this relationship or the existence of
the pci_dev's sysdata. Have you audited every architecture's use of
the sysdata pointers?

> > > This patch removes sysdata from pci_dev struct and creates a new
> > > field called sys_data which is exclusively used by IOMMU driver to
> > > keep its per device context pointer.
> > 
> > Hmpf, why is this needed? with the pci_sysdata work that recently went
> > into mainline we have a void *iommu member in pci_sysdata which should
> > be all that's needed. Please elaborate if it's not enough for your
> > needs.

> I looked at your patch and it was not suitable because I need to
> store iommu private pointer in pci_dev

Could you elaborate on why you need this? I'm assuming it's for the
per-device IOMMU page tables?

> and not in the pci_bus. So I have added a new member sys_data in the
> pci_dev struct. I can change the name from sys_dev to iomu_priv to
> clear the confusion. Do let me know.

Well, you should be able to just use the pci_dev's ->sysdata (that's
what it's there for after all!) but you might need to make it point to
a structure if it's shared, the same way we did with the bus's
->sysdata. I agree that just having it point to the bus's ->sysdata is
not very useful *but* there may be code in the kernel that relies on
it (Calgary did until very recently...) so it would have to be audited
first.

Cheers,
Muli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-09 Thread Muli Ben-Yehuda
On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:

> Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash

This patch feels like a huge hack. See below.

> This patch removes sysdata from pci_dev struct and creates a new
> field called sys_data which is exclusively used by IOMMU driver to
> keep its per device context pointer.

Hmpf, why is this needed? with the pci_sysdata work that recently went
into mainline we have a void *iommu member in pci_sysdata which should
be all that's needed. Please elaborate if it's not enough for your
needs.

Thanks,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-09 Thread Muli Ben-Yehuda
On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:

 Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash

This patch feels like a huge hack. See below.

 This patch removes sysdata from pci_dev struct and creates a new
 field called sys_data which is exclusively used by IOMMU driver to
 keep its per device context pointer.

Hmpf, why is this needed? with the pci_sysdata work that recently went
into mainline we have a void *iommu member in pci_sysdata which should
be all that's needed. Please elaborate if it's not enough for your
needs.

Thanks,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-09 Thread Muli Ben-Yehuda
On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote:
 On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote:
  On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote:
  
   Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash
  
  This patch feels like a huge hack. See below.

 You seem to be jumping to conclusion without going in detail. The
 pci_dev struct contains pointer to sysdata, which in turn points to
 the copy of its parent's bus sysdata.  So technically speaking we
 can eliminate sysdata pointer from pci_dev struct which is what one
 portion of this patch does.

... provided nothing relies on this relationship or the existence of
the pci_dev's sysdata. Have you audited every architecture's use of
the sysdata pointers?

   This patch removes sysdata from pci_dev struct and creates a new
   field called sys_data which is exclusively used by IOMMU driver to
   keep its per device context pointer.
  
  Hmpf, why is this needed? with the pci_sysdata work that recently went
  into mainline we have a void *iommu member in pci_sysdata which should
  be all that's needed. Please elaborate if it's not enough for your
  needs.

 I looked at your patch and it was not suitable because I need to
 store iommu private pointer in pci_dev

Could you elaborate on why you need this? I'm assuming it's for the
per-device IOMMU page tables?

 and not in the pci_bus. So I have added a new member sys_data in the
 pci_dev struct. I can change the name from sys_dev to iomu_priv to
 clear the confusion. Do let me know.

Well, you should be able to just use the pci_dev's -sysdata (that's
what it's there for after all!) but you might need to make it point to
a structure if it's shared, the same way we did with the bus's
-sysdata. I agree that just having it point to the bus's -sysdata is
not very useful *but* there may be code in the kernel that relies on
it (Calgary did until very recently...) so it would have to be audited
first.

Cheers,
Muli

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/x86-64 PCI domain support

2007-09-02 Thread Muli Ben-Yehuda
On Sat, Sep 01, 2007 at 10:32:23AM -0400, Jeff Garzik wrote:
> 
> Now that the dust has settled and the prep work is upstream, adding
> PCI edomain support to x86 is a lot more straightforward.
> 
> Targetted for 2.6.24.

Looks good to me.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/x86-64 PCI domain support

2007-09-02 Thread Muli Ben-Yehuda
On Sat, Sep 01, 2007 at 10:32:23AM -0400, Jeff Garzik wrote:
 
 Now that the dust has settled and the prep work is upstream, adding
 PCI edomain support to x86 is a lot more straightforward.
 
 Targetted for 2.6.24.

Looks good to me.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] Unified x86 architecture, arch/x86 - V2

2007-08-23 Thread Muli Ben-Yehuda
On Wed, Aug 22, 2007 at 11:56:22PM +0200, Thomas Gleixner wrote:

> We are pleased to announce v2 of the unified arch/x86 project we are
> working on.

After having recently inadvertently broken i386 while making x86-64
changes, thumbs up from me!

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Announce] Unified x86 architecture, arch/x86 - V2

2007-08-23 Thread Muli Ben-Yehuda
On Wed, Aug 22, 2007 at 11:56:22PM +0200, Thomas Gleixner wrote:

 We are pleased to announce v2 of the unified arch/x86 project we are
 working on.

After having recently inadvertently broken i386 while making x86-64
changes, thumbs up from me!

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE

2007-08-13 Thread Muli Ben-Yehuda
On Mon, Aug 13, 2007 at 10:25:02AM -0700, Joe Perches wrote:
> On Mon, 2007-08-13 at 15:18 +0300, Muli Ben-Yehuda wrote:
> > Nope, this entry is for sound/oss/trident*.
> 
> TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE
> P:    Muli Ben-Yehuda
> M:[EMAIL PROTECTED]
> L:linux-kernel@vger.kernel.org
> S:Maintained
> F:sound/oss/trident*

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE

2007-08-13 Thread Muli Ben-Yehuda
On Sun, Aug 12, 2007 at 11:37:25PM -0700, [EMAIL PROTECTED] wrote:

> Add file pattern to MAINTAINER entry
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7927653..fbd699e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4563,6 +4563,7 @@ P:  Muli Ben-Yehuda
>  M:   [EMAIL PROTECTED]
>  L:   linux-kernel@vger.kernel.org
>  S:   Maintained
> +F:   sound/pci/trident/

Nope, this entry is for sound/oss/trident*.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [2/2many] - FInd the maintainer(s) for a patch - MAINTAINERS

2007-08-13 Thread Muli Ben-Yehuda
On Sun, Aug 12, 2007 at 10:53:19PM -0700, Joe Perches wrote:
> Describe the new F: pattern

Patch looks reversed?

> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0d7f856..d3a0684 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -83,13 +83,6 @@ S: Status, one of the following:
>   it has been replaced by a better system and you
>   should be using that.
>  
> -F: Files and directories with wildcard patterns.
> -   A trailing slash includes all files and subdirectory files.
> - F:  drivers/net/all files in and below drivers/net
> - F:  drivers/net/*   all files in drivers/net, but not below
> - F:  */net/* all files in "any top level directory"/net
> -   One pattern per line.  Multiple F: lines acceptable.
> -
>  3C359 NETWORK DRIVER
>  P:   Mike Phillips
>  M:   [EMAIL PROTECTED]

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [118/2many] MAINTAINERS - CALGARY x86-64 IOMMU

2007-08-13 Thread Muli Ben-Yehuda
On Sun, Aug 12, 2007 at 11:25:06PM -0700, [EMAIL PROTECTED] wrote:
> Add file pattern to MAINTAINER entry
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 482d255..9c26e7b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1148,6 +1148,11 @@ M: [EMAIL PROTECTED]
>  L:   linux-kernel@vger.kernel.org
>  L:   [EMAIL PROTECTED]
>  S:   Maintained
> +F:   arch/x86_64/kernel/pci-calgary.c
> +F:   arch/x86_64/kernel/pci-dma.c

pci-dma.c is generic x86-64 code maintained by Andi Kleen.

> +F:   arch/x86_64/kernel/tce.c
> +F:   include/asm-x86_6/calgary.h
> +F:   include/asm-x86_6/pci.h

Likewise pci.h.

You may also want to include 

F:  include/asm-x86_6/tce.h
F:  include/asm-x86_6/rio.h

Rest looks good:

Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [118/2many] MAINTAINERS - CALGARY x86-64 IOMMU

2007-08-13 Thread Muli Ben-Yehuda
On Sun, Aug 12, 2007 at 11:25:06PM -0700, [EMAIL PROTECTED] wrote:
 Add file pattern to MAINTAINER entry
 
 Signed-off-by: Joe Perches [EMAIL PROTECTED]
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 482d255..9c26e7b 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -1148,6 +1148,11 @@ M: [EMAIL PROTECTED]
  L:   linux-kernel@vger.kernel.org
  L:   [EMAIL PROTECTED]
  S:   Maintained
 +F:   arch/x86_64/kernel/pci-calgary.c
 +F:   arch/x86_64/kernel/pci-dma.c

pci-dma.c is generic x86-64 code maintained by Andi Kleen.

 +F:   arch/x86_64/kernel/tce.c
 +F:   include/asm-x86_6/calgary.h
 +F:   include/asm-x86_6/pci.h

Likewise pci.h.

You may also want to include 

F:  include/asm-x86_6/tce.h
F:  include/asm-x86_6/rio.h

Rest looks good:

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [2/2many] - FInd the maintainer(s) for a patch - MAINTAINERS

2007-08-13 Thread Muli Ben-Yehuda
On Sun, Aug 12, 2007 at 10:53:19PM -0700, Joe Perches wrote:
 Describe the new F: pattern

Patch looks reversed?

 
 Signed-off-by: Joe Perches [EMAIL PROTECTED]
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 0d7f856..d3a0684 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -83,13 +83,6 @@ S: Status, one of the following:
   it has been replaced by a better system and you
   should be using that.
  
 -F: Files and directories with wildcard patterns.
 -   A trailing slash includes all files and subdirectory files.
 - F:  drivers/net/all files in and below drivers/net
 - F:  drivers/net/*   all files in drivers/net, but not below
 - F:  */net/* all files in any top level directory/net
 -   One pattern per line.  Multiple F: lines acceptable.
 -
  3C359 NETWORK DRIVER
  P:   Mike Phillips
  M:   [EMAIL PROTECTED]

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE

2007-08-13 Thread Muli Ben-Yehuda
On Sun, Aug 12, 2007 at 11:37:25PM -0700, [EMAIL PROTECTED] wrote:

 Add file pattern to MAINTAINER entry
 
 Signed-off-by: Joe Perches [EMAIL PROTECTED]
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 7927653..fbd699e 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -4563,6 +4563,7 @@ P:  Muli Ben-Yehuda
  M:   [EMAIL PROTECTED]
  L:   linux-kernel@vger.kernel.org
  S:   Maintained
 +F:   sound/pci/trident/

Nope, this entry is for sound/oss/trident*.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE

2007-08-13 Thread Muli Ben-Yehuda
On Mon, Aug 13, 2007 at 10:25:02AM -0700, Joe Perches wrote:
 On Mon, 2007-08-13 at 15:18 +0300, Muli Ben-Yehuda wrote:
  Nope, this entry is for sound/oss/trident*.
 
 TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE
 P:Muli Ben-Yehuda
 M:[EMAIL PROTECTED]
 L:linux-kernel@vger.kernel.org
 S:Maintained
 F:sound/oss/trident*

Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED]

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patches] [PATCH] [4/12] x86_64: Disable CLFLUSH support again

2007-08-09 Thread Muli Ben-Yehuda
On Thu, Aug 09, 2007 at 02:29:58PM +0100, Jan Beulich wrote:

> The issue here is not with clflush by itself, but with what pages it
> is being applied to. Here, it gets used on page table pages when
> flushing really is needed on what one or more page table entries in
> that page table page point(ed) to.

I see, thanks.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [4/12] x86_64: Disable CLFLUSH support again

2007-08-09 Thread Muli Ben-Yehuda
On Thu, Aug 09, 2007 at 02:41:31PM +0200, Andi Kleen wrote:
> 
> It turns out CLFLUSH support is still not complete; we
> flush the wrong pages.  Again disable it for the release.
> Noticed by Jan Beulich.
> 
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Aside from the bug Jan pointed out with the patch, we're also using
cflflush in other places (arch/i386/kernel/alternative.c and
arch/x86_64/kernel/tce.c). Is the issue with clflush support specific
to pageattr.c or should it be disabled every else too? what
specifically is the issue?

Cheers,
Muli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [0/12] x86 Late merge bug fixes for 2.6.23

2007-08-09 Thread Muli Ben-Yehuda
On Thu, Aug 09, 2007 at 02:41:27PM +0200, Andi Kleen wrote:
> 
> Various simple bug fixes, no new features. Please review.  I plan to
> send them to Linus later.
> 
> I don't have anything else critical pending. If you believe a x86
> patch is critical and really needs to make .23 please send email.

We also need this fix for the current sysdata regression:

Subject: finish i386 and x86-64 sysdata conversion

This patch finishes the i386 and x86-64 ->sysdata conversion and
hopefully also fixes Riku's and Andy's observed bugs. It is based on
Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes:

- introduce pci_scan_bus_with_sysdata() and use it instead of
  pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
  allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
  sysdata work is to make it easy to do root-bus specific things
  (e.g., support PCI domains and IOMMU's). I dislike using a default
  struct pci_sysdata in some places and a dynamically allocated
  pci_sysdata elsewhere - the potential for someone indavertantly
  changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
  always initialized to -1. Patches to do the right thing with regards
  to the NUMA node can build on top of this (either add a 'node'
  parameter to pci_scan_bus_with_sysdata() or just update the node
  when it becomes known).

The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64. Unfortunately none of
my machines exhibited the bugs so caveat emptor.

Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix "pci=noacpi" on your laptop?

Comments appreciated. If this looks ok it should go into 2.6.23.

Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>
Cc: Yinghai Lu <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>,
Cc: [EMAIL PROTECTED]
Cc: Andy Whitcroft <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
---
 arch/i386/pci/common.c   |   23 +++
 arch/i386/pci/fixup.c|6 +++---
 arch/i386/pci/irq.c  |5 +++--
 arch/i386/pci/legacy.c   |2 +-
 arch/i386/pci/numa.c |   15 +--
 arch/i386/pci/visws.c|4 ++--
 include/asm-i386/pci.h   |3 +++
 include/asm-x86_64/pci.h |2 ++
 8 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 85503de..ebc6f3c 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev)
if (!dev->msi_enabled && pcibios_disable_irq)
pcibios_disable_irq(dev);
 }
+
+struct pci_bus *pci_scan_bus_with_sysdata(int busno)
+{
+   struct pci_bus *bus = NULL;
+   struct pci_sysdata *sd;
+
+   /*
+* Allocate per-root-bus (not per bus) arch-specific data.
+* TODO: leak; this memory is never freed.
+* It's arguable whether it's worth the trouble to care.
+*/
+   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+   if (!sd) {
+   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busno);
+   return NULL;
+   }
+   sd->node = -1;
+   bus = pci_scan_bus(busno, _root_ops, sd);
+   if (!bus)
+   kfree(sd);
+
+   return bus;
+}
diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c
index e7306db..c82cbf4 100644
--- a/arch/i386/pci/fixup.c
+++ b/arch/i386/pci/fixup.c
@@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
pci_read_config_byte(d, reg++, );
DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
if (busno)
-   pci_scan_bus(busno, _root_ops, NULL);   /* Bus 
A */
+   pci_scan_bus_with_sysdata(busno);   /* Bus A */
if (suba < subb)
-   pci_scan_bus(suba+1, _root_ops, NULL);  /* Bus 
B */
+   pci_scan_bus_with_sysdata(suba+1);  /* Bus B */
}
pcibios_last_bus = -1;
 }
@@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d)
u8 busno;
pci_read_config_byte(d, 0x4a, );
printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", 
pci_name(d), busno);
-   pci_scan_bus(busno, _root_ops, NULL);
+   pci_scan_bus_with_sysdata(busno);
pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, 
pci_fixup_i450gx);
diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index f2cb942..665db06 100644
--- a/arch/i386/pci/ir

Re: [PATCH] [4/12] x86_64: Disable CLFLUSH support again

2007-08-09 Thread Muli Ben-Yehuda
On Thu, Aug 09, 2007 at 02:41:31PM +0200, Andi Kleen wrote:
 
 It turns out CLFLUSH support is still not complete; we
 flush the wrong pages.  Again disable it for the release.
 Noticed by Jan Beulich.
 
 Signed-off-by: Andi Kleen [EMAIL PROTECTED]

Aside from the bug Jan pointed out with the patch, we're also using
cflflush in other places (arch/i386/kernel/alternative.c and
arch/x86_64/kernel/tce.c). Is the issue with clflush support specific
to pageattr.c or should it be disabled every else too? what
specifically is the issue?

Cheers,
Muli

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [0/12] x86 Late merge bug fixes for 2.6.23

2007-08-09 Thread Muli Ben-Yehuda
On Thu, Aug 09, 2007 at 02:41:27PM +0200, Andi Kleen wrote:
 
 Various simple bug fixes, no new features. Please review.  I plan to
 send them to Linus later.
 
 I don't have anything else critical pending. If you believe a x86
 patch is critical and really needs to make .23 please send email.

We also need this fix for the current sysdata regression:

Subject: finish i386 and x86-64 sysdata conversion

This patch finishes the i386 and x86-64 -sysdata conversion and
hopefully also fixes Riku's and Andy's observed bugs. It is based on
Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes:

- introduce pci_scan_bus_with_sysdata() and use it instead of
  pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
  allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
  sysdata work is to make it easy to do root-bus specific things
  (e.g., support PCI domains and IOMMU's). I dislike using a default
  struct pci_sysdata in some places and a dynamically allocated
  pci_sysdata elsewhere - the potential for someone indavertantly
  changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
  always initialized to -1. Patches to do the right thing with regards
  to the NUMA node can build on top of this (either add a 'node'
  parameter to pci_scan_bus_with_sysdata() or just update the node
  when it becomes known).

The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64. Unfortunately none of
my machines exhibited the bugs so caveat emptor.

Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix pci=noacpi on your laptop?

Comments appreciated. If this looks ok it should go into 2.6.23.

Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED]
Cc: Yinghai Lu [EMAIL PROTECTED]
Cc: Andi Kleen [EMAIL PROTECTED]
Cc: Andrew Morton [EMAIL PROTECTED]
Cc: Chuck Ebbert [EMAIL PROTECTED],
Cc: [EMAIL PROTECTED]
Cc: Andy Whitcroft [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
---
 arch/i386/pci/common.c   |   23 +++
 arch/i386/pci/fixup.c|6 +++---
 arch/i386/pci/irq.c  |5 +++--
 arch/i386/pci/legacy.c   |2 +-
 arch/i386/pci/numa.c |   15 +--
 arch/i386/pci/visws.c|4 ++--
 include/asm-i386/pci.h   |3 +++
 include/asm-x86_64/pci.h |2 ++
 8 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 85503de..ebc6f3c 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev)
if (!dev-msi_enabled  pcibios_disable_irq)
pcibios_disable_irq(dev);
 }
+
+struct pci_bus *pci_scan_bus_with_sysdata(int busno)
+{
+   struct pci_bus *bus = NULL;
+   struct pci_sysdata *sd;
+
+   /*
+* Allocate per-root-bus (not per bus) arch-specific data.
+* TODO: leak; this memory is never freed.
+* It's arguable whether it's worth the trouble to care.
+*/
+   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+   if (!sd) {
+   printk(KERN_ERR PCI: OOM, skipping PCI bus %02x\n, busno);
+   return NULL;
+   }
+   sd-node = -1;
+   bus = pci_scan_bus(busno, pci_root_ops, sd);
+   if (!bus)
+   kfree(sd);
+
+   return bus;
+}
diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c
index e7306db..c82cbf4 100644
--- a/arch/i386/pci/fixup.c
+++ b/arch/i386/pci/fixup.c
@@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
pci_read_config_byte(d, reg++, subb);
DBG(i450NX PXB %d: %02x/%02x/%02x\n, pxb, busno, suba, subb);
if (busno)
-   pci_scan_bus(busno, pci_root_ops, NULL);   /* Bus 
A */
+   pci_scan_bus_with_sysdata(busno);   /* Bus A */
if (suba  subb)
-   pci_scan_bus(suba+1, pci_root_ops, NULL);  /* Bus 
B */
+   pci_scan_bus_with_sysdata(suba+1);  /* Bus B */
}
pcibios_last_bus = -1;
 }
@@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d)
u8 busno;
pci_read_config_byte(d, 0x4a, busno);
printk(KERN_INFO PCI: i440KX/GX host bridge %s: secondary bus %02x\n, 
pci_name(d), busno);
-   pci_scan_bus(busno, pci_root_ops, NULL);
+   pci_scan_bus_with_sysdata(busno);
pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, 
pci_fixup_i450gx);
diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index f2cb942..665db06 100644
--- a/arch/i386/pci/irq.c
+++ b/arch/i386/pci/irq.c
@@ -138,8 +138,9 @@ static void __init pirq_peer_trick(void)
for(i = 1; i  256; i

Re: [patches] [PATCH] [4/12] x86_64: Disable CLFLUSH support again

2007-08-09 Thread Muli Ben-Yehuda
On Thu, Aug 09, 2007 at 02:29:58PM +0100, Jan Beulich wrote:

 The issue here is not with clflush by itself, but with what pages it
 is being applied to. Here, it gets used on page table pages when
 flushing really is needed on what one or more page table entries in
 that page table page point(ed) to.

I see, thanks.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/5] x86_64: get mp_bus_to_node as early v3

2007-08-08 Thread Muli Ben-Yehuda
On Tue, Aug 07, 2007 at 06:20:33PM -0700, Yinghai Lu wrote:

> please check this one against pci_scan_bus_with_sysdata
> 
> [PATCH 1/5] x86_64: get mp_bus_to_node as early v3
> 
> current on amd k8 system with multi ht chain, the numa_node of pci
> devices under r/sys/devices/pci:80/* always 0, even that chain is
> on node 1 or 2 or 3.
> 
> workaround: pcibus_to_node(bus) is used when we want to get node
> that pci_device is on.
> 
> In struct device, we already have numa_node member. and we could use
> dev_to_node() /set_dev_node() to get and set numa_node in the
> device.  set_dev_node is called in pci_device_add() with
> pcibus_to_node(bus). and pcibus_to_node use bus->sysdata for nodeid.
> 
> the problem is when pci_add_device is called, bus->sysdata is not
> assigned correct nodeid yet. the result will be numa_node always is
> 0.
> 
> pcibios_scan_root and pci_scan_root could take sysdata. So we need
> to get mp_bus_to_node mapping before these two are called. and
> get_mp_bus_to_node could get correct node for sysdata in root bus.
> 
> in scanning of root bus, all child bus will take parent bus sysdata. So all
> pci_device->dev.numa_node will be assigned correctly automatically.
> 
> later we could use dev_to_node(_dev->dev) to get numa_node, and
> we could also could make other bus specific device get the correct
> numa_node too.
> 
> this is one update version to pci_sysdata ...  also reverse
> pci_acpi_scan_root to use pcibios_scan_root again.

Can you explain why this patch (and your other patches in this area)
are needed? is this a performance issue? the patches seem complex, is
there a good argument for that complexity?

Cheers,
Muli


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/5] x86_64: get mp_bus_to_node as early v3

2007-08-08 Thread Muli Ben-Yehuda
On Tue, Aug 07, 2007 at 06:20:33PM -0700, Yinghai Lu wrote:

 please check this one against pci_scan_bus_with_sysdata
 
 [PATCH 1/5] x86_64: get mp_bus_to_node as early v3
 
 current on amd k8 system with multi ht chain, the numa_node of pci
 devices under r/sys/devices/pci:80/* always 0, even that chain is
 on node 1 or 2 or 3.
 
 workaround: pcibus_to_node(bus) is used when we want to get node
 that pci_device is on.
 
 In struct device, we already have numa_node member. and we could use
 dev_to_node() /set_dev_node() to get and set numa_node in the
 device.  set_dev_node is called in pci_device_add() with
 pcibus_to_node(bus). and pcibus_to_node use bus-sysdata for nodeid.
 
 the problem is when pci_add_device is called, bus-sysdata is not
 assigned correct nodeid yet. the result will be numa_node always is
 0.
 
 pcibios_scan_root and pci_scan_root could take sysdata. So we need
 to get mp_bus_to_node mapping before these two are called. and
 get_mp_bus_to_node could get correct node for sysdata in root bus.
 
 in scanning of root bus, all child bus will take parent bus sysdata. So all
 pci_device-dev.numa_node will be assigned correctly automatically.
 
 later we could use dev_to_node(pci_dev-dev) to get numa_node, and
 we could also could make other bus specific device get the correct
 numa_node too.
 
 this is one update version to pci_sysdata ...  also reverse
 pci_acpi_scan_root to use pcibios_scan_root again.

Can you explain why this patch (and your other patches in this area)
are needed? is this a performance issue? the patches seem complex, is
there a good argument for that complexity?

Cheers,
Muli


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion

2007-08-07 Thread Muli Ben-Yehuda
On Tue, Aug 07, 2007 at 03:49:11PM -0700, Andrew Morton wrote:

> I am sooo tired of this thing.  Andi, someone, can we for heaven's
> sake please just get it all sorted out?

With regards to the sysdata conversion: Riku says he cannot test new
kernel. I haven't heard anything from Andy Whitcroft. It passes all of
my tests and what we have now is obviously broken... I think we should
put the fix in.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion

2007-08-07 Thread Muli Ben-Yehuda
On Tue, Aug 07, 2007 at 03:49:11PM -0700, Andrew Morton wrote:

 I am sooo tired of this thing.  Andi, someone, can we for heaven's
 sake please just get it all sorted out?

With regards to the sysdata conversion: Riku says he cannot test new
kernel. I haven't heard anything from Andy Whitcroft. It passes all of
my tests and what we have now is obviously broken... I think we should
put the fix in.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion

2007-08-05 Thread Muli Ben-Yehuda
On Sun, Aug 05, 2007 at 01:49:57AM -0700, Yinghai Lu wrote:

> Can you change
> pci_scan_bus_with_sysdata(int busno)
> to
> pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)?

Do you anticipate passing in a different pci_ops or node? 
In any case please remember I am aiming for the minimal "obviously
correc" change for 2.6.23...

> pci_scan_bus_with_sysdata(int busno) make me feel that i need feed one
> sysdata as on param for it.

Yeah, lousy name, but the best I came up with. Runner up was
'x86_pci_scan_bus', which is I think worse?

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH/RFT] finish i386 and x86-64 sysdata conversion

2007-08-05 Thread Muli Ben-Yehuda
This patch finishes the i386 and x86-64 ->sysdata conversion and
hopefully also fixes Riku's and Andy's observed bugs. It is based on
Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes:

- introduce pci_scan_bus_with_sysdata() and use it instead of
  pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
  allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
  sysdata work is to make it easy to do root-bus specific things
  (e.g., support PCI domains and IOMMU's). I dislike using a default
  struct pci_sysdata in some places and a dynamically allocated
  pci_sysdata elsewhere - the potential for someone indavertantly
  changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
  always initialized to -1. Patches to do the right thing with regards
  to the NUMA node can build on top of this (either add a 'node'
  parameter to pci_scan_bus_with_sysdata() or just update the node
  when it becomes known).

The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64. Unfortunately none of
my machines exhibited the bugs so caveat emptor.

Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix "pci=noacpi" on your laptop?

Comments appreciated. If this looks ok it should go into 2.6.23.

Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]>
Cc: Yinghai Lu <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Cc: Chuck Ebbert <[EMAIL PROTECTED]>,
Cc: [EMAIL PROTECTED]
Cc: Andy Whitcroft <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
---
 arch/i386/pci/common.c   |   23 +++
 arch/i386/pci/fixup.c|6 +++---
 arch/i386/pci/irq.c  |5 +++--
 arch/i386/pci/legacy.c   |2 +-
 arch/i386/pci/numa.c |   15 +--
 arch/i386/pci/visws.c|4 ++--
 include/asm-i386/pci.h   |3 +++
 include/asm-x86_64/pci.h |2 ++
 8 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c
index 85503de..ebc6f3c 100644
--- a/arch/i386/pci/common.c
+++ b/arch/i386/pci/common.c
@@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev)
if (!dev->msi_enabled && pcibios_disable_irq)
pcibios_disable_irq(dev);
 }
+
+struct pci_bus *pci_scan_bus_with_sysdata(int busno)
+{
+   struct pci_bus *bus = NULL;
+   struct pci_sysdata *sd;
+
+   /*
+* Allocate per-root-bus (not per bus) arch-specific data.
+* TODO: leak; this memory is never freed.
+* It's arguable whether it's worth the trouble to care.
+*/
+   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
+   if (!sd) {
+   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busno);
+   return NULL;
+   }
+   sd->node = -1;
+   bus = pci_scan_bus(busno, _root_ops, sd);
+   if (!bus)
+   kfree(sd);
+
+   return bus;
+}
diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c
index e7306db..c82cbf4 100644
--- a/arch/i386/pci/fixup.c
+++ b/arch/i386/pci/fixup.c
@@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d)
pci_read_config_byte(d, reg++, );
DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb);
if (busno)
-   pci_scan_bus(busno, _root_ops, NULL);   /* Bus 
A */
+   pci_scan_bus_with_sysdata(busno);   /* Bus A */
if (suba < subb)
-   pci_scan_bus(suba+1, _root_ops, NULL);  /* Bus 
B */
+   pci_scan_bus_with_sysdata(suba+1);  /* Bus B */
}
pcibios_last_bus = -1;
 }
@@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d)
u8 busno;
pci_read_config_byte(d, 0x4a, );
printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", 
pci_name(d), busno);
-   pci_scan_bus(busno, _root_ops, NULL);
+   pci_scan_bus_with_sysdata(busno);
pcibios_last_bus = -1;
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, 
pci_fixup_i450gx);
diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index f2cb942..665db06 100644
--- a/arch/i386/pci/irq.c
+++ b/arch/i386/pci/irq.c
@@ -138,8 +138,9 @@ static void __init pirq_peer_trick(void)
for(i = 1; i < 256; i++) {
if (!busmap[i] || pci_find_bus(0, i))
continue;
-   if (pci_scan_bus(i, _root_ops, NULL))
-   printk(KERN_INFO "PCI: Discovered primary peer bus %02x 
[IRQ]\n", i);
+   if (pci_scan_bus_with_sysdata(i))
+ 

  1   2   3   4   >