Re: [Qemu-devel] Nested KVM is weird?
On Sun, Jun 01, 2014 at 11:30:02PM +0700, Jun Koi wrote: > (1) do you think this VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL is the > reason why ESXi falls back to binary translation? It might be, its been a while since I got ESXi to use VMX on KVM. Take a look at the VMware log file for the L2, it should have a lot more information. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Qemu-devel] Nested KVM is weird?
On Sun, Jun 01, 2014 at 11:30:02PM +0700, Jun Koi wrote: (1) do you think this VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL is the reason why ESXi falls back to binary translation? It might be, its been a while since I got ESXi to use VMX on KVM. Take a look at the VMware log file for the L2, it should have a lot more information. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Qemu-devel] Nested KVM is weird?
On Sun, Jun 01, 2014 at 05:54:25PM +0700, Jun Koi wrote: > So this means ESXi never uses VMResume/VMLaunch? How is this > possible, because it uses VMX for its implementation? ESXi will fall back to binary translation if it decides that it cannot use VMX for some reason. Look at the L2's log file inside ESXi and you will likely find some errors related to running nested, MSRs KVM does not emulate properly, or something else that causes ESXi to use binary translation for its L2. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Qemu-devel] Nested KVM is weird?
On Sun, Jun 01, 2014 at 05:54:25PM +0700, Jun Koi wrote: So this means ESXi never uses VMResume/VMLaunch? How is this possible, because it uses VMX for its implementation? ESXi will fall back to binary translation if it decides that it cannot use VMX for some reason. Look at the L2's log file inside ESXi and you will likely find some errors related to running nested, MSRs KVM does not emulate properly, or something else that causes ESXi to use binary translation for its L2. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86, calgary: use 8M TCE table size by default
Patch looks good to me. Acked-by: Muli Ben-Yehuda Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86, calgary: use 8M TCE table size by default
Patch looks good to me. Acked-by: Muli Ben-Yehuda mu...@mulix.org Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, calgary: use 8M TCE table size by default
On Fri, Mar 07, 2014 at 11:09:06AM -0500, Vivek Goyal wrote: > I would say it is not very complicated to maintain backward > compatibility in this case. So let us keep saved_max_pfn for some > time and make kexec-tools changes. Some time down the line, one can > get rid of saved_max_pfn completely. If this is not too painful then this would be my preference too. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How could we get rid of saved_max_pfn for calgary iommu?
On Thu, Mar 06, 2014 at 07:46:44AM -0700, Jon Mason wrote: > > I don't know of anyone still using it, but it's not > > impossible. Calgary and CalIOC2 machines would now be ~5-8 years > > old. > > It is getting a bit crufty in arch/x86. Would it be better to move > it to drivers/iommu? Not sure I see the potential benefit... I think for Calgary it's either leave it in arch/x86 or rip it out. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How could we get rid of saved_max_pfn for calgary iommu?
On Thu, Mar 06, 2014 at 07:46:44AM -0700, Jon Mason wrote: I don't know of anyone still using it, but it's not impossible. Calgary and CalIOC2 machines would now be ~5-8 years old. It is getting a bit crufty in arch/x86. Would it be better to move it to drivers/iommu? Not sure I see the potential benefit... I think for Calgary it's either leave it in arch/x86 or rip it out. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, calgary: use 8M TCE table size by default
On Fri, Mar 07, 2014 at 11:09:06AM -0500, Vivek Goyal wrote: I would say it is not very complicated to maintain backward compatibility in this case. So let us keep saved_max_pfn for some time and make kexec-tools changes. Some time down the line, one can get rid of saved_max_pfn completely. If this is not too painful then this would be my preference too. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How could we get rid of saved_max_pfn for calgary iommu?
On Wed, Mar 05, 2014 at 10:50:41PM -0800, H. Peter Anvin wrote: > OK, second question... is it time to axe Calgary? I don't know of anyone still using it, but it's not impossible. Calgary and CalIOC2 machines would now be ~5-8 years old. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How could we get rid of saved_max_pfn for calgary iommu?
On Wed, Mar 05, 2014 at 01:36:17PM +0800, WANG Chao wrote: > Hi, Muli > > saved_max_pfn is becoming a setback for kexec-tools. Ideally calgary > could get rid of saved_max_pfn at all. But If this can't work, how > about exporting a calgary tce table size to user space, so that > kexec-tools can simply pass calgary=xxx cmdline to 2nd kernel. As Jon noted, this code is used to so that the TCE table remains consistent between the original and the kexec'd kernel. I see two options: either we hard code the TCE table size to the max so that this bit of code becomes redundant, or we explicitly pass the original table size (or the original max_pfn) to the kexec'd kernel. The first option is more appealing, because I don't think anyone is actually using the TCE table size -- we mostly added it for debugging the IOMMU TCE code at the time -- but since I don't have a Calgary machine anymore, I don't have any way to test it. The second option is uglier but would be fully backward-compatible and less likely to break things. Given that very few people are likely running the latest upstream kernel on Calgary/CalIOC2 machines, I'm inclined towards the first option. > BTW MAINTAINERS file still uses your old email, please update > accordingly. I think you are the first person to actually look up the Calgary maintainers in the last few years :-) Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How could we get rid of saved_max_pfn for calgary iommu?
On Wed, Mar 05, 2014 at 01:36:17PM +0800, WANG Chao wrote: Hi, Muli saved_max_pfn is becoming a setback for kexec-tools. Ideally calgary could get rid of saved_max_pfn at all. But If this can't work, how about exporting a calgary tce table size to user space, so that kexec-tools can simply pass calgary=xxx cmdline to 2nd kernel. As Jon noted, this code is used to so that the TCE table remains consistent between the original and the kexec'd kernel. I see two options: either we hard code the TCE table size to the max so that this bit of code becomes redundant, or we explicitly pass the original table size (or the original max_pfn) to the kexec'd kernel. The first option is more appealing, because I don't think anyone is actually using the TCE table size -- we mostly added it for debugging the IOMMU TCE code at the time -- but since I don't have a Calgary machine anymore, I don't have any way to test it. The second option is uglier but would be fully backward-compatible and less likely to break things. Given that very few people are likely running the latest upstream kernel on Calgary/CalIOC2 machines, I'm inclined towards the first option. BTW MAINTAINERS file still uses your old email, please update accordingly. I think you are the first person to actually look up the Calgary maintainers in the last few years :-) Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: How could we get rid of saved_max_pfn for calgary iommu?
On Wed, Mar 05, 2014 at 10:50:41PM -0800, H. Peter Anvin wrote: OK, second question... is it time to axe Calgary? I don't know of anyone still using it, but it's not impossible. Calgary and CalIOC2 machines would now be ~5-8 years old. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Fri, Feb 15, 2008 at 10:28:22AM -0800, Greg KH wrote: > That would be nice. Muli, want to make a patch for this? Sure, I'll put something together. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Fri, Feb 15, 2008 at 10:28:22AM -0800, Greg KH wrote: That would be nice. Muli, want to make a patch for this? Sure, I'll put something together. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Thu, Feb 14, 2008 at 11:17:03PM -0800, Greg KH wrote: > Hm, that's wierd. I thought I got something, until I realized that > you are doing a lot of logic before you ever even determine that > your hardware is present in the system. Why are you calling > calgary_locate_bbars() and doing all of that work? Or am I mising > something in the code flow here? Ok, it's all starting to come back to me. See below. > Also, it looks like you use the pci_get_device() to find the pci > device, then you do somethings, and then drop the device, never to > use it again. > > So, a traditional "probe" might not work as well, but it could be > used if you want to. We have a two stage initialization process. First, we need to be initialized very early in order to be able to allocate relatively large contiguous chunks of RAM. That requires detecting whether we're on a Calgary/CalIOC2 system very early (detect_calgary(), called from pci_iommu_alloc()). Then we have "regular" PCI initialization at a later stage (calgary_iommu_init(), called from pci_iommu_init()), which also calls calgary_locate_bbars(). Unlike a regular PCI device which has its BARs in the configuration space, we need to probe BIOS provided tables to find where our BARs are before we do anything else with the "device representation" of the IOMMU. Note that the same two-stage initialization scheme is used by all other x86 IOMMUs, for similar reasons. Now, Calgary is an isolation-capable IOMMU which has per-root-bus (as opposed to global or per-BDF) IO address space. The reason we want to access the pci_dev of each Calgary bus is to hang off of it the IOMMU tables for all devices under that bus so that we end up using the right translation table when a device asks for a DMA mapping. Hence the loop in calgary_init, which loops over all pci_dev's of Calgary busses, raises the ref count for each pci_dev to protect against hot-unplug, and hooks up the IOMMU table for everything under that bus to the pci_dev for the bus. Unlike a regular driver, our main entry points are via the DMA-API, which takes a device's pci_dev, not Calgary's. We then walk up the bus hierarchy and find the Calgary pci_dev that is an ancestor of this device, and that gives us the right IOMMU table to us. Note that this has to occur *before* PCI device initialization, since we want to turn translation on before any device will try to DMA. In conclusion, our usage doesn't seem lika a good fit for the probe approach, although it could probably converted provided we got the ordering right with regards to regular PCI device initialization. Doesn't seem to be worth the effort. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Thu, Feb 14, 2008 at 11:17:03PM -0800, Greg KH wrote: Hm, that's wierd. I thought I got something, until I realized that you are doing a lot of logic before you ever even determine that your hardware is present in the system. Why are you calling calgary_locate_bbars() and doing all of that work? Or am I mising something in the code flow here? Ok, it's all starting to come back to me. See below. Also, it looks like you use the pci_get_device() to find the pci device, then you do somethings, and then drop the device, never to use it again. So, a traditional probe might not work as well, but it could be used if you want to. We have a two stage initialization process. First, we need to be initialized very early in order to be able to allocate relatively large contiguous chunks of RAM. That requires detecting whether we're on a Calgary/CalIOC2 system very early (detect_calgary(), called from pci_iommu_alloc()). Then we have regular PCI initialization at a later stage (calgary_iommu_init(), called from pci_iommu_init()), which also calls calgary_locate_bbars(). Unlike a regular PCI device which has its BARs in the configuration space, we need to probe BIOS provided tables to find where our BARs are before we do anything else with the device representation of the IOMMU. Note that the same two-stage initialization scheme is used by all other x86 IOMMUs, for similar reasons. Now, Calgary is an isolation-capable IOMMU which has per-root-bus (as opposed to global or per-BDF) IO address space. The reason we want to access the pci_dev of each Calgary bus is to hang off of it the IOMMU tables for all devices under that bus so that we end up using the right translation table when a device asks for a DMA mapping. Hence the loop in calgary_init, which loops over all pci_dev's of Calgary busses, raises the ref count for each pci_dev to protect against hot-unplug, and hooks up the IOMMU table for everything under that bus to the pci_dev for the bus. Unlike a regular driver, our main entry points are via the DMA-API, which takes a device's pci_dev, not Calgary's. We then walk up the bus hierarchy and find the Calgary pci_dev that is an ancestor of this device, and that gives us the right IOMMU table to us. Note that this has to occur *before* PCI device initialization, since we want to turn translation on before any device will try to DMA. In conclusion, our usage doesn't seem lika a good fit for the probe approach, although it could probably converted provided we got the ordering right with regards to regular PCI device initialization. Doesn't seem to be worth the effort. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Wed, Feb 13, 2008 at 09:32:03AM -0800, Greg KH wrote: > Is there some reason you aren't using the "real" PCI driver api here > and registering a pci driver for these devices? That would take the > whole "loop over all pci devices" logic out of the code entirely. I recall we had a reason, but I no longer recall what it was. Some reason the "real" PCI driver API didn't fit at the time. If someone were to whip up a working patch, I'd happily apply it. > > Feel free to nuke it and walk the list forward. > > So something like the patch below would be fine? Yep, looks good. Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Tue, Feb 12, 2008 at 04:16:38PM -0800, Greg KH wrote: > Why does the calgary driver need this? Can we just use > pci_get_device() instead? Why do you need to walk the device list > backwards? Do you get false positives going forward? It's not strictly needed, we used it for symmetry. Feel free to nuke it and walk the list forward. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Tue, Feb 12, 2008 at 04:16:38PM -0800, Greg KH wrote: Why does the calgary driver need this? Can we just use pci_get_device() instead? Why do you need to walk the device list backwards? Do you get false positives going forward? It's not strictly needed, we used it for symmetry. Feel free to nuke it and walk the list forward. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pci_get_device_reverse(), why does Calgary need this?
On Wed, Feb 13, 2008 at 09:32:03AM -0800, Greg KH wrote: Is there some reason you aren't using the real PCI driver api here and registering a pci driver for these devices? That would take the whole loop over all pci devices logic out of the code entirely. I recall we had a reason, but I no longer recall what it was. Some reason the real PCI driver API didn't fit at the time. If someone were to whip up a working patch, I'd happily apply it. Feel free to nuke it and walk the list forward. So something like the patch below would be fine? Yep, looks good. Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]intel-iommu batched iotlb flushes
On Tue, Feb 12, 2008 at 01:00:06AM -0800, David Miller wrote: > From: Muli Ben-Yehuda <[EMAIL PROTECTED]> > Date: Tue, 12 Feb 2008 10:52:56 +0200 > > > The streaming DMA-API was designed to conserve IOMMU mappings for > > machines where IOMMU mappings are a scarce resource, and is a poor > > fit for a modern IOMMU such as VT-d with a 64-bit IO address space > > (or even an IOMMU with a 32-bit address space such as Calgary) > > where there are plenty of IOMMU mappings available. > > For the 64-bit case what you are suggesting eventually amounts to > mapping all available RAM in the IOMMU. > > Although an extreme version of your suggestion, it would be the most > efficient as it would require zero IOMMU flush operations. > > But we'd lose things like protection and other benefits. For the extreme case you are correct. There's an inherent trade-off between IOMMU performance and protection guarantees, where one end of the spectrum is represented by the streaming DMA-API and the other end is represented by simply mapping all available memory. It's an open question what is the right point in between. I think that an optimal strategy might be "keep the mapping around for as long as it is safe", i.e., keep a mapping to a frame for as long as the frame is owned by whoever requested the mapping in the first place. Once ownership of the frame is passed to another entity (e.g., from the driver to the stack), revoke the original mapping. This implies a way for the kernel to track frame ownership and communicate frame ownership changes to the DMA-API layer, which we don't currently have. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]intel-iommu batched iotlb flushes
On Mon, Feb 11, 2008 at 02:41:05PM -0800, mark gross wrote: > The intel-iommu hardware requires a polling operation to flush IOTLB > PTE's after an unmap operation. Through some TSC instrumentation of > a netperf UDP stream with small packets test case it was seen that > the flush operations where sucking up to 16% of the CPU time doing > iommu_flush_iotlb's > > The following patch batches the IOTLB flushes removing most of the > overhead in flushing the IOTLB's. It works by building a list of to > be released IOVA's that is iterated over when a timer goes off or > when a high water mark is reached. > > The wrinkle this has is that the memory protection and page fault > warnings from errant DMA operations is somewhat reduced, hence a kernel > parameter is added to revert back to the "strict" page flush / unmap > behavior. > > The hole is the following scenarios: > do many map_signal operations, do some unmap_signals, reuse a recently > unmapped page, memory> > > Or: you have rouge hardware using DMA's to look at pages: do many > map_signal's, do many unmap_singles, reuse some unmapped pages : > > > Note : these holes are very hard to get too, as the IOTLB is small > and only the PTE's still in the IOTLB can be accessed through this > mechanism. > > Its recommended that strict is used when developing drivers that do > DMA operations to catch bugs early. For production code where > performance is desired running with the batched IOTLB flushing is a > good way to go. While I don't disagree with this patch in principle (Calgary does the same thing due to expensive IOTLB flushes) the right way to fix it IMHO is to fix the drivers to batch mapping and unmapping operations or map up-front and unmap when done. The streaming DMA-API was designed to conserve IOMMU mappings for machines where IOMMU mappings are a scarce resource, and is a poor fit for a modern IOMMU such as VT-d with a 64-bit IO address space (or even an IOMMU with a 32-bit address space such as Calgary) where there are plenty of IOMMU mappings available. Cheers, Muli -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]intel-iommu batched iotlb flushes
On Mon, Feb 11, 2008 at 02:41:05PM -0800, mark gross wrote: The intel-iommu hardware requires a polling operation to flush IOTLB PTE's after an unmap operation. Through some TSC instrumentation of a netperf UDP stream with small packets test case it was seen that the flush operations where sucking up to 16% of the CPU time doing iommu_flush_iotlb's The following patch batches the IOTLB flushes removing most of the overhead in flushing the IOTLB's. It works by building a list of to be released IOVA's that is iterated over when a timer goes off or when a high water mark is reached. The wrinkle this has is that the memory protection and page fault warnings from errant DMA operations is somewhat reduced, hence a kernel parameter is added to revert back to the strict page flush / unmap behavior. The hole is the following scenarios: do many map_signal operations, do some unmap_signals, reuse a recently unmapped page, errant DMA hardware sneaks through and steps on reused memory Or: you have rouge hardware using DMA's to look at pages: do many map_signal's, do many unmap_singles, reuse some unmapped pages : rouge hardware looks at reused page Note : these holes are very hard to get too, as the IOTLB is small and only the PTE's still in the IOTLB can be accessed through this mechanism. Its recommended that strict is used when developing drivers that do DMA operations to catch bugs early. For production code where performance is desired running with the batched IOTLB flushing is a good way to go. While I don't disagree with this patch in principle (Calgary does the same thing due to expensive IOTLB flushes) the right way to fix it IMHO is to fix the drivers to batch mapping and unmapping operations or map up-front and unmap when done. The streaming DMA-API was designed to conserve IOMMU mappings for machines where IOMMU mappings are a scarce resource, and is a poor fit for a modern IOMMU such as VT-d with a 64-bit IO address space (or even an IOMMU with a 32-bit address space such as Calgary) where there are plenty of IOMMU mappings available. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]intel-iommu batched iotlb flushes
On Tue, Feb 12, 2008 at 01:00:06AM -0800, David Miller wrote: From: Muli Ben-Yehuda [EMAIL PROTECTED] Date: Tue, 12 Feb 2008 10:52:56 +0200 The streaming DMA-API was designed to conserve IOMMU mappings for machines where IOMMU mappings are a scarce resource, and is a poor fit for a modern IOMMU such as VT-d with a 64-bit IO address space (or even an IOMMU with a 32-bit address space such as Calgary) where there are plenty of IOMMU mappings available. For the 64-bit case what you are suggesting eventually amounts to mapping all available RAM in the IOMMU. Although an extreme version of your suggestion, it would be the most efficient as it would require zero IOMMU flush operations. But we'd lose things like protection and other benefits. For the extreme case you are correct. There's an inherent trade-off between IOMMU performance and protection guarantees, where one end of the spectrum is represented by the streaming DMA-API and the other end is represented by simply mapping all available memory. It's an open question what is the right point in between. I think that an optimal strategy might be keep the mapping around for as long as it is safe, i.e., keep a mapping to a frame for as long as the frame is owned by whoever requested the mapping in the first place. Once ownership of the frame is passed to another entity (e.g., from the driver to the stack), revoke the original mapping. This implies a way for the kernel to track frame ownership and communicate frame ownership changes to the DMA-API layer, which we don't currently have. Cheers, Muli -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
CFP: Operating Systems Review special issue on the Linux kernel
SIGOPS Operating Systems Review Special Issue on Research and Developments in the Linux Kernel The Linux kernel, since its inception in 1991, has captured the interest of many thousands of developers and millions of users. It recently celebrated its 16th anniversary, includes many millions of lines of code, and is used in production systems around the world. It is also advancing at an increasingly rapid pace, undergoing many changes every single day. For this OSR special issue, we welcome technical papers covering the latest advances that have been or will soon be merged into the Linux kernel, as well as wild idea papers discussing promising experimental work. In recognition of the current chasm that we wish to bridge, we encourage papers from both the Linux kernel community and the research community. The OSR issue aims to: a) expose members of the Linux kernel community to exploratory research work that is going on which might influence Linux's evolution, and b) expose members of the systems research community to the latest happenings in a mature, production kernel that is widely used and advancing rapidly. Please submit papers related to all aspects of the Linux kernel. In particular, papers are solicited for the the following areas: * Virtualization * I/O, networking and interconnects * Support for multi-core and heterogeneous CPUs * Co-existence with other operating systems * filesystems, clustering, SSI * Profiling, performance tuning, debugging * Scaling up (e.g., supercomputers) and down (e.g., embedded devices) * Experience reports * Research directions Submissions should be between 5 and 10 pages (in a font no smaller than 10 pt). Standard SIGOPS formatting rules apply (see http://www.sigops.org/osr.html). Papers should report on significant new results or directions and include at least some material that has not been published before. Papers will be reviewed by the guest editors and the review committee. Accepted papers will be published in the July 2008 issue of Operating Systems Review. Please upload your submissions (in PDF format only) to the conference website, at http://www.easychair.org/conferences/?conf=osrlk2008. If you have any questions or comments, please do not hesitate to contact the guest editors. Important dates: Submission deadline: March 14th, 2008 Author notification: April 18th, 2008 Camera-ready submission deadline: May 13th, 2008 OSR guest editors: * Muli Ben-Yehuda, IBM Haifa Research Lab <[EMAIL PROTECTED]> * Eric Van Hensbergen, IBM Austin Research Lab <[EMAIL PROTECTED]> * Marc E. Fiuczynski, Princeton University <[EMAIL PROTECTED]> Review committee: * Patrick Bridges (University of New Mexico) * Angela Demke Brown (University of Toronto) * Hubertus Franke (IBM Research) * Oren Laadan (Columbia University) * Paul McKenney (IBM Linux Technology Center) * Chris Mason (Oracle) * Ron Minnich (Los Alamos National Laboratory) * Stephen C. Tweedie (Red Hat) * Chris Wright (Red Hat) * Pete Wyckoff (Ohio Supercomputer Center) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
CFP: Operating Systems Review special issue on the Linux kernel
SIGOPS Operating Systems Review Special Issue on Research and Developments in the Linux Kernel The Linux kernel, since its inception in 1991, has captured the interest of many thousands of developers and millions of users. It recently celebrated its 16th anniversary, includes many millions of lines of code, and is used in production systems around the world. It is also advancing at an increasingly rapid pace, undergoing many changes every single day. For this OSR special issue, we welcome technical papers covering the latest advances that have been or will soon be merged into the Linux kernel, as well as wild idea papers discussing promising experimental work. In recognition of the current chasm that we wish to bridge, we encourage papers from both the Linux kernel community and the research community. The OSR issue aims to: a) expose members of the Linux kernel community to exploratory research work that is going on which might influence Linux's evolution, and b) expose members of the systems research community to the latest happenings in a mature, production kernel that is widely used and advancing rapidly. Please submit papers related to all aspects of the Linux kernel. In particular, papers are solicited for the the following areas: * Virtualization * I/O, networking and interconnects * Support for multi-core and heterogeneous CPUs * Co-existence with other operating systems * filesystems, clustering, SSI * Profiling, performance tuning, debugging * Scaling up (e.g., supercomputers) and down (e.g., embedded devices) * Experience reports * Research directions Submissions should be between 5 and 10 pages (in a font no smaller than 10 pt). Standard SIGOPS formatting rules apply (see http://www.sigops.org/osr.html). Papers should report on significant new results or directions and include at least some material that has not been published before. Papers will be reviewed by the guest editors and the review committee. Accepted papers will be published in the July 2008 issue of Operating Systems Review. Please upload your submissions (in PDF format only) to the conference website, at http://www.easychair.org/conferences/?conf=osrlk2008. If you have any questions or comments, please do not hesitate to contact the guest editors. Important dates: Submission deadline: March 14th, 2008 Author notification: April 18th, 2008 Camera-ready submission deadline: May 13th, 2008 OSR guest editors: * Muli Ben-Yehuda, IBM Haifa Research Lab [EMAIL PROTECTED] * Eric Van Hensbergen, IBM Austin Research Lab [EMAIL PROTECTED] * Marc E. Fiuczynski, Princeton University [EMAIL PROTECTED] Review committee: * Patrick Bridges (University of New Mexico) * Angela Demke Brown (University of Toronto) * Hubertus Franke (IBM Research) * Oren Laadan (Columbia University) * Paul McKenney (IBM Linux Technology Center) * Chris Mason (Oracle) * Ron Minnich (Los Alamos National Laboratory) * Stephen C. Tweedie (Red Hat) * Chris Wright (Red Hat) * Pete Wyckoff (Ohio Supercomputer Center) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]intel-iommu-PMEN support
On Fri, Nov 16, 2007 at 02:39:57PM -0800, mark gross wrote: > The following patch adds support for protected memory enable bits by > clearing them if they are set at startup time. Some future boot > loaders or firmware could have this bit set after it loads the > kernel, and it needs to be cleared if DMA's are going to happen > effectively. > > please apply > > --mgross > > Signed-off-by: mark gross <[EMAIL PROTECTED]> Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Mark, please try in the future to not mix unrelated changes such as these. This patch should've been two patches, one to disable protected memory enable and the other to do the cleanups. > -#define MAX_FAULT_REASON_IDX ARRAY_SIZE(fault_reason_strings) - 1 > +#define MAX_FAULT_REASON_IDX (ARRAY_SIZE(fault_reason_strings) - 1) > -static inline u64 dmar_readq(void *addr) > +static inline u64 dmar_readq(void __iomem *addr) Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]intel-iommu-PMEN support
On Fri, Nov 16, 2007 at 02:39:57PM -0800, mark gross wrote: The following patch adds support for protected memory enable bits by clearing them if they are set at startup time. Some future boot loaders or firmware could have this bit set after it loads the kernel, and it needs to be cleared if DMA's are going to happen effectively. please apply --mgross Signed-off-by: mark gross [EMAIL PROTECTED] Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Mark, please try in the future to not mix unrelated changes such as these. This patch should've been two patches, one to disable protected memory enable and the other to do the cleanups. -#define MAX_FAULT_REASON_IDX ARRAY_SIZE(fault_reason_strings) - 1 +#define MAX_FAULT_REASON_IDX (ARRAY_SIZE(fault_reason_strings) - 1) -static inline u64 dmar_readq(void *addr) +static inline u64 dmar_readq(void __iomem *addr) Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
On Mon, Nov 12, 2007 at 07:25:27PM +0530, Amit Shah wrote: > Selectively? What happens in the case when some iommu doesn't want > to invoke the prev_op, but the mapping depends on it being called > (eg, the hypercalling op is embedded somewhere in the prev_op chain) Bad things :-) There needs to be a hierarchy of dma-ops, e.g., nommu/swiotlb, then a hardware iommu, then pvdma. Not sure where IB fits in here. The calling order would be the reverse of the initialization order, so pvdma->hardare->nommu/swiotlb. > Hmm, also, a hypercall should be the last operation to be called in > a few cases, but also the first (and the last) to be called in > several other cases. For example, in a guest, you can go register > any number of iotlbs, but you don't actually want to do anything > there -- you just want to do a hypercall and get the mapping from > the host. > > But in any case, what ensures that the hypercall op always gets > called and also that it's the last one? If it gets called first it can ensure that it runs either first or last, or both, since it controls when to run the other hooks, before or after it does what it needs to do. > Also, I'm thinking of implementations where let's say sg_map_free is > not defined for a particular iotlb, but it was defined in the > previously registered one. How to handle this? Good point, this will require all dma ops implementation to provide stubs that just return prev_ops->op if it's set. > It seems a small dispatcher which takes care of this seems the > likely choice here, but avoiding it (or at least caching the > decisions) is something that needs more thought. Yeah, I'm not too enthusiastic about it, but we do need such a generic mechanism or we will each end up implementing our own versions of it... Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
On Mon, Nov 12, 2007 at 05:26:24PM +0530, Amit Shah wrote: > On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote: > > On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote: > > > We make the dma_mapping_ops structure to point to our structure so > > > that every DMA access goes through us. (This is the reason this only > > > works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops > > > struct.) > > > > I need the same facility for Calgary for falling back to swiotlb if a > > translation is disabled on some slot, and IB needs the same facility > > for some IB adapters (e.g., ipath). Perhaps it's time to consider > > stackable dma-ops (unless someone has a better idea...). > > That would make great sense and simplify implementations. > > How do you propose such an implementation? An array of function > pointers for each possible call? I was thinking of simply a chain of dma_ops (via dma_ops->prev_ops) , where it's the responsibility of each dma_ops implementation to call or not call the corresponding entry in chain (prev_ops->op()). This works well for Calgary (which will only use prev_ops selectively, and I think it will work well for the IB folks. Will it work for you? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware
On Wed, Nov 07, 2007 at 04:21:06PM +0200, Amit Shah wrote: > Of all the DMA calls, only dma_alloc_coherent might not actually > call dma_ops->alloc_coherent. We make sure that gets called if the > device that's being worked on is a PV device I always thougt that's a mess... the reason it's done this way is that Andi Kleen preferred it for some reason at the time. How about trying to fix it cleanly so that dma_alloc_coherent always gets called rather than adding a hack to work around a hack? It will require auditing all of the different x86 dma-ops but I can help with that. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation
On Wed, Nov 07, 2007 at 04:21:05PM +0200, Amit Shah wrote: > A guest can call dma_ops->is_pv_device() to find out if a device is > a passthrough'ed device (device passed on to a guest by the > host). If this is true, a hypercall will be made to translate DMA > mapping operations. Doesn't really belong in the DMA mapping API. Instead what I think we should do is to cache this (per device) value in the pci_device struct (or device struct?) and in the dma-ops implementation inspect it to decide exactly how to do DMA mapping for this device. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote: > We make the dma_mapping_ops structure to point to our structure so > that every DMA access goes through us. (This is the reason this only > works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops > struct.) I need the same facility for Calgary for falling back to swiotlb if a translation is disabled on some slot, and IB needs the same facility for some IB adapters (e.g., ipath). Perhaps it's time to consider stackable dma-ops (unless someone has a better idea...). Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote: We make the dma_mapping_ops structure to point to our structure so that every DMA access goes through us. (This is the reason this only works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops struct.) I need the same facility for Calgary for falling back to swiotlb if a translation is disabled on some slot, and IB needs the same facility for some IB adapters (e.g., ipath). Perhaps it's time to consider stackable dma-ops (unless someone has a better idea...). Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation
On Wed, Nov 07, 2007 at 04:21:05PM +0200, Amit Shah wrote: A guest can call dma_ops-is_pv_device() to find out if a device is a passthrough'ed device (device passed on to a guest by the host). If this is true, a hypercall will be made to translate DMA mapping operations. Doesn't really belong in the DMA mapping API. Instead what I think we should do is to cache this (per device) value in the pci_device struct (or device struct?) and in the dma-ops implementation inspect it to decide exactly how to do DMA mapping for this device. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware
On Wed, Nov 07, 2007 at 04:21:06PM +0200, Amit Shah wrote: Of all the DMA calls, only dma_alloc_coherent might not actually call dma_ops-alloc_coherent. We make sure that gets called if the device that's being worked on is a PV device I always thougt that's a mess... the reason it's done this way is that Andi Kleen preferred it for some reason at the time. How about trying to fix it cleanly so that dma_alloc_coherent always gets called rather than adding a hack to work around a hack? It will require auditing all of the different x86 dma-ops but I can help with that. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
On Mon, Nov 12, 2007 at 05:26:24PM +0530, Amit Shah wrote: On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote: On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote: We make the dma_mapping_ops structure to point to our structure so that every DMA access goes through us. (This is the reason this only works for 64-bit guest. 32-bit guest doesn't yet have a dma_ops struct.) I need the same facility for Calgary for falling back to swiotlb if a translation is disabled on some slot, and IB needs the same facility for some IB adapters (e.g., ipath). Perhaps it's time to consider stackable dma-ops (unless someone has a better idea...). That would make great sense and simplify implementations. How do you propose such an implementation? An array of function pointers for each possible call? I was thinking of simply a chain of dma_ops (via dma_ops-prev_ops) , where it's the responsibility of each dma_ops implementation to call or not call the corresponding entry in chain (prev_ops-op()). This works well for Calgary (which will only use prev_ops selectively, and I think it will work well for the IB folks. Will it work for you? Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [kvm-devel] [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
On Mon, Nov 12, 2007 at 07:25:27PM +0530, Amit Shah wrote: Selectively? What happens in the case when some iommu doesn't want to invoke the prev_op, but the mapping depends on it being called (eg, the hypercalling op is embedded somewhere in the prev_op chain) Bad things :-) There needs to be a hierarchy of dma-ops, e.g., nommu/swiotlb, then a hardware iommu, then pvdma. Not sure where IB fits in here. The calling order would be the reverse of the initialization order, so pvdma-hardare-nommu/swiotlb. Hmm, also, a hypercall should be the last operation to be called in a few cases, but also the first (and the last) to be called in several other cases. For example, in a guest, you can go register any number of iotlbs, but you don't actually want to do anything there -- you just want to do a hypercall and get the mapping from the host. But in any case, what ensures that the hypercall op always gets called and also that it's the last one? If it gets called first it can ensure that it runs either first or last, or both, since it controls when to run the other hooks, before or after it does what it needs to do. Also, I'm thinking of implementations where let's say sg_map_free is not defined for a particular iotlb, but it was defined in the previously registered one. How to handle this? Good point, this will require all dma ops implementation to provide stubs that just return prev_ops-op if it's set. It seems a small dispatcher which takes care of this seems the likely choice here, but avoiding it (or at least caching the decisions) is something that needs more thought. Yeah, I'm not too enthusiastic about it, but we do need such a generic mechanism or we will each end up implementing our own versions of it... Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] x86 pci-calgary_64.c: make a variable static
On Fri, Nov 09, 2007 at 07:03:30AM +0100, Adrian Bunk wrote: > "debugging" is a horrible name for a global variable - thankfully it can > become static. > > Also put it out of __read_mostly so that gcc no longer has to emit it > at all. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Thanks, applied. Will push with my next batch of Calgary updates. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6 patch] x86 pci-calgary_64.c: make a variable static
On Fri, Nov 09, 2007 at 07:03:30AM +0100, Adrian Bunk wrote: debugging is a horrible name for a global variable - thankfully it can become static. Also put it out of __read_mostly so that gcc no longer has to emit it at all. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Thanks, applied. Will push with my next batch of Calgary updates. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fix incorrect test in trident_ac97_set(); sound/oss/trident.c
On Wed, Nov 07, 2007 at 11:04:41AM -0800, Ray Lee wrote: > On Nov 7, 2007 10:50 AM, Roel Kluin <[EMAIL PROTECTED]> wrote: > > If count reaches zero, the loop ends, but the postfix decrement > > still subtracts: testing for 'count == 0' will not work. > > > > Signed-off-by: Roel Kluin <[EMAIL PROTECTED]> > > --- > > diff --git a/sound/oss/trident.c b/sound/oss/trident.c > > index 96adc47..6959ee1 100644 > > --- a/sound/oss/trident.c > > +++ b/sound/oss/trident.c > > @@ -2935,7 +2935,7 @@ trident_ac97_set(struct ac97_codec *codec, u8 reg, > > u16 val) > > do { > > if ((inw(TRID_REG(card, address)) & busy) == 0) > > break; > > - } while (count--); > > + } while (--count); > > > > data |= (mask | (reg & AC97_REG_ADDR)); > > > > @@ -2996,7 +2996,7 @@ trident_ac97_get(struct ac97_codec *codec, u8 reg) > > data = inl(TRID_REG(card, address)); > > if ((data & busy) == 0) > > break; > > - } while (count--); > > + } while (--count); > > > > spin_unlock_irqrestore(>lock, flags); > > > > if (count == 0) { > > > > Thanks, much better. In the future, please also CC: the appropriate > maintainers, or Andrew Morton if you're at a loss... Indeed. > Reviewed-by: Ray Lee <[EMAIL PROTECTED]> Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Andrew, can you please push to Linus? Thanks, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fix incorrect test in trident_ac97_set(); sound/oss/trident.c
On Wed, Nov 07, 2007 at 11:04:41AM -0800, Ray Lee wrote: On Nov 7, 2007 10:50 AM, Roel Kluin [EMAIL PROTECTED] wrote: If count reaches zero, the loop ends, but the postfix decrement still subtracts: testing for 'count == 0' will not work. Signed-off-by: Roel Kluin [EMAIL PROTECTED] --- diff --git a/sound/oss/trident.c b/sound/oss/trident.c index 96adc47..6959ee1 100644 --- a/sound/oss/trident.c +++ b/sound/oss/trident.c @@ -2935,7 +2935,7 @@ trident_ac97_set(struct ac97_codec *codec, u8 reg, u16 val) do { if ((inw(TRID_REG(card, address)) busy) == 0) break; - } while (count--); + } while (--count); data |= (mask | (reg AC97_REG_ADDR)); @@ -2996,7 +2996,7 @@ trident_ac97_get(struct ac97_codec *codec, u8 reg) data = inl(TRID_REG(card, address)); if ((data busy) == 0) break; - } while (count--); + } while (--count); spin_unlock_irqrestore(card-lock, flags); if (count == 0) { Thanks, much better. In the future, please also CC: the appropriate maintainers, or Andrew Morton if you're at a loss... Indeed. Reviewed-by: Ray Lee [EMAIL PROTECTED] Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Andrew, can you please push to Linus? Thanks, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 0/3] convert IOMMUs to use iova
On Sat, Nov 03, 2007 at 02:05:39AM +0900, FUJITA Tomonori wrote: > This patchset convert the PPC64 IOMMU to use the iova code for free > area management. > > The IOMMUs ignores low level drivers' restrictions, the maximum > segment size and segment boundary. > > I fixed the former: > > http://thread.gmane.org/gmane.linux.scsi/35602 > > The latter makes the free area management complicated. I'd like to > convert IOMMUs to use the iova code (that intel-iommu introduced) > for free area management and enable iova to handle segment boundary > restrictions, rather than fixing all the IOMMUs' free area > management, In general it sounds like a great idea, but have you looked at what impact this has on the performance of the IO path? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 0/3] convert IOMMUs to use iova
On Sat, Nov 03, 2007 at 02:05:39AM +0900, FUJITA Tomonori wrote: This patchset convert the PPC64 IOMMU to use the iova code for free area management. The IOMMUs ignores low level drivers' restrictions, the maximum segment size and segment boundary. I fixed the former: http://thread.gmane.org/gmane.linux.scsi/35602 The latter makes the free area management complicated. I'd like to convert IOMMUs to use the iova code (that intel-iommu introduced) for free area management and enable iova to handle segment boundary restrictions, rather than fixing all the IOMMUs' free area management, In general it sounds like a great idea, but have you looked at what impact this has on the performance of the IO path? Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu-PMEN_REG boot up support
On Mon, Oct 29, 2007 at 01:31:30PM -0700, Mark Gross wrote: > > What's the value of adding a command line option for this? Under > > what circumstances would we want to not clear this bit? > > umm, /me asks around and finds out that at this time there isn't > much point to having a command line to preserve any protected areas > set up at boot time. The society for controlled proliferation of command line options thanks you. > Signed-off-by: mark gross <[EMAIL PROTECTED]> Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu-PMEN_REG boot up support
On Mon, Oct 29, 2007 at 01:31:30PM -0700, Mark Gross wrote: What's the value of adding a command line option for this? Under what circumstances would we want to not clear this bit? umm, /me asks around and finds out that at this time there isn't much point to having a command line to preserve any protected areas set up at boot time. The society for controlled proliferation of command line options thanks you. Signed-off-by: mark gross [EMAIL PROTECTED] Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu-PMEN_REG boot up support
On Fri, Oct 26, 2007 at 11:18:49AM -0700, Mark Gross wrote: > The following patch clears the portect memory region enable bit at > boot time by default. It also provides a kernel parrameter for > disabling this behavior and leave the PMEN_REG untouched if so > wanted. > > If the boot loader or platform has protected memory regions enabled > at boot time it could prevent DMA's from happening as drivers are > loaded and used. What's the value of adding a command line option for this? Under what circumstances would we want to not clear this bit? > + } else if (!strncmp(str, "no_epm_clear", 12)) { > + printk(KERN_INFO > + "Intel-IOMMU: subress clearing of Enable " > + "Protected Memory bit\n"); > + clear_pmen_epm = 0; `suppress', I assume. Rest looks fine, if the configuration option is really needed. Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu-PMEN_REG boot up support
On Fri, Oct 26, 2007 at 11:18:49AM -0700, Mark Gross wrote: The following patch clears the portect memory region enable bit at boot time by default. It also provides a kernel parrameter for disabling this behavior and leave the PMEN_REG untouched if so wanted. If the boot loader or platform has protected memory regions enabled at boot time it could prevent DMA's from happening as drivers are loaded and used. What's the value of adding a command line option for this? Under what circumstances would we want to not clear this bit? + } else if (!strncmp(str, no_epm_clear, 12)) { + printk(KERN_INFO + Intel-IOMMU: subress clearing of Enable + Protected Memory bit\n); + clear_pmen_epm = 0; `suppress', I assume. Rest looks fine, if the configuration option is really needed. Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86 gart: rename symbols only used for the GART implementation
On Tue, Oct 23, 2007 at 08:10:54PM +0200, Andi Kleen wrote: > I think his goal was to get an prefix that describes the module > uniquely. gart_* clearly does not fulfill that criteria. > > So basically he's replacing an > ambigious-with-other-IOMMU-implementations prefix with an > ambigious-with-AGP prefix. Seems like a rather pointless change. Not at all - the present situation where GART specific code masquerades as generic IOMMU code is confusing. I've heard from more than one person who was confused by "how come I can enable an IOMMU while CONFIG_IOMMU is off?" and "how come some iommu_xxx options (which were really gart options...) don't actually affect the (Calgary) IOMMU"? If the variable names clash the correct solution is to s/gart/agpgart/ in the AGP code and then continue renaming gart specific bits 'gart' rather than 'iommu'. Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/4] x86 gart: make some variables and functions static
On Tue, Oct 23, 2007 at 07:41:32PM +0200, Joerg Roedel wrote: > This patch makes some functions and variables static in pci-gart_64.c which > are > not used somewhere else. > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]> Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] x86 gart: rename CONFIG_IOMMU to CONFIG_GART_IOMMU
On Tue, Oct 23, 2007 at 07:41:31PM +0200, Joerg Roedel wrote: > This patch renames the IOMMU config option to GART_IOMMU because in fact it > means the GART and not general support for an IOMMU on x86. > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]> Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] x86 gart: rename iommu.h to gart.h
On Tue, Oct 23, 2007 at 07:41:30PM +0200, Joerg Roedel wrote: > This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make > clear to which IOMMU implementation it belongs. The patch also adds "GART" to > the Kconfig line. > > Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]> Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] x86 gart: rename iommu.h to gart.h
On Tue, Oct 23, 2007 at 07:41:30PM +0200, Joerg Roedel wrote: This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make clear to which IOMMU implementation it belongs. The patch also adds GART to the Kconfig line. Signed-off-by: Joerg Roedel [EMAIL PROTECTED] Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/4] x86 gart: make some variables and functions static
On Tue, Oct 23, 2007 at 07:41:32PM +0200, Joerg Roedel wrote: This patch makes some functions and variables static in pci-gart_64.c which are not used somewhere else. Signed-off-by: Joerg Roedel [EMAIL PROTECTED] Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] x86 gart: rename CONFIG_IOMMU to CONFIG_GART_IOMMU
On Tue, Oct 23, 2007 at 07:41:31PM +0200, Joerg Roedel wrote: This patch renames the IOMMU config option to GART_IOMMU because in fact it means the GART and not general support for an IOMMU on x86. Signed-off-by: Joerg Roedel [EMAIL PROTECTED] Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] x86 gart: rename symbols only used for the GART implementation
On Tue, Oct 23, 2007 at 08:10:54PM +0200, Andi Kleen wrote: I think his goal was to get an prefix that describes the module uniquely. gart_* clearly does not fulfill that criteria. So basically he's replacing an ambigious-with-other-IOMMU-implementations prefix with an ambigious-with-AGP prefix. Seems like a rather pointless change. Not at all - the present situation where GART specific code masquerades as generic IOMMU code is confusing. I've heard from more than one person who was confused by how come I can enable an IOMMU while CONFIG_IOMMU is off? and how come some iommu_xxx options (which were really gart options...) don't actually affect the (Calgary) IOMMU? If the variable names clash the correct solution is to s/gart/agpgart/ in the AGP code and then continue renaming gart specific bits 'gart' rather than 'iommu'. Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: rename iommu.h to gart.h
On Fri, Oct 19, 2007 at 02:38:11PM +0200, Joerg Roedel wrote: > This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make > clear to which IOMMU implementation it belongs. The patch also adds "GART" to > the Kconfig line. Long overdue IMHO. How about also renaming the config option to CONFIG_GART_IOMMU while you're at it? Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: rename iommu.h to gart.h
On Fri, Oct 19, 2007 at 02:38:11PM +0200, Joerg Roedel wrote: This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make clear to which IOMMU implementation it belongs. The patch also adds GART to the Kconfig line. Long overdue IMHO. How about also renaming the config option to CONFIG_GART_IOMMU while you're at it? Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump
On Wed, Oct 10, 2007 at 11:00:58AM +0530, Vivek Goyal wrote: > On Tue, Oct 09, 2007 at 11:06:23PM +0200, Muli Ben-Yehuda wrote: > > Hi Chandru, > > > > Thanks for the patch. Comments on the patch below, but first a general > > question for my education: the main problem here that aacraid > > continues DMA'ing when it shouldn't. Why can't we shut it down > > cleanly? Even without the presence of an IOMMU, it seems dangerous to > > let an adapter continue DMA'ing to and from memory when the kernel is > > an inconsistent state. > > > > Hi Muli, > > After the kernel crash, we can't rely on the crashed kernel's data > structures or methods any more. We can't call the device shutdown > methods of all the device drivers as we might be invoking a driver > which actually might have caused the crash. Hence we don't perform > any device shutdown in the case of kdump. Instead after the crash, > we try to take the shortest route to second kernel (Execute minimum > code in crashed kernel). > > Whatever special handling is required to bring up the second kernel on > a potentially unknown state hardware, is taken in second kernel. That makes sense, but does it really make more sense to set-up proper IOMMU translation tables for DMA's which have been triggered from the first kernel than to either quiesce the device ASAP (this means before enabling IOMMU translation...) or to trap such DMA's and continue, rather than letting them go through? > We will not be too concerned about ongoing DMA's as long as there is no > corruption of tce tables. That would mean DMA is happening in first > kernel's memory buffer and second kernel is not impacted. But if TCE > tables themselves are corrupted, then it can potentially interfere with > second kernel's operation. Don't know how it can be addressed. I worry about two cases: the first is TCE table corruption, the second is DMA's to areas which were fine in the first kernel but are wrong for the second kernel. > > The patch below looks reasonable *if* that is the least worst way > > of doing it - let's see if we can come up with something cleaner > > that doesn't rely in the new kernel on data (which may or may not > > be corrupted...) from the old kernel. > > I think the issue here is that some DMA was going on when first > kernel crashed. After the crash, second kernel booted and created > new TCE tables and switched to it. This resulted in ongoing DMA > failure and hardware raised an alarm. > > In this case, probably it would make sense to re-use the TCE tables > of previous kernel (until and unless we have a way to tell hardware > not to flag a DMA error if TCE mapping is changed while DMA is going > on ?) We don't have a way to do that, but we should be able to trap the DMA, figure out that we're in a kdump kernel, and then just ignore it and continue. > I think, we also need to reserve some TCE table entries (in first > kernel), which can be used by second kernel for saving kernel core > file to disk. There might be a case where first kernel has used up > all TCE entries and second kernel can't allocate more. I think ppc64 > has taken the approach of freeing some entries in second kernel but > that will have the problem as you might be clearing an entry which > is being used by ongoing DMA. That's a good point - how do PPC (or other architectures with an isolation-capable IOMMU) handle this whole issue? Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump
On Wed, Oct 10, 2007 at 11:00:58AM +0530, Vivek Goyal wrote: On Tue, Oct 09, 2007 at 11:06:23PM +0200, Muli Ben-Yehuda wrote: Hi Chandru, Thanks for the patch. Comments on the patch below, but first a general question for my education: the main problem here that aacraid continues DMA'ing when it shouldn't. Why can't we shut it down cleanly? Even without the presence of an IOMMU, it seems dangerous to let an adapter continue DMA'ing to and from memory when the kernel is an inconsistent state. Hi Muli, After the kernel crash, we can't rely on the crashed kernel's data structures or methods any more. We can't call the device shutdown methods of all the device drivers as we might be invoking a driver which actually might have caused the crash. Hence we don't perform any device shutdown in the case of kdump. Instead after the crash, we try to take the shortest route to second kernel (Execute minimum code in crashed kernel). Whatever special handling is required to bring up the second kernel on a potentially unknown state hardware, is taken in second kernel. That makes sense, but does it really make more sense to set-up proper IOMMU translation tables for DMA's which have been triggered from the first kernel than to either quiesce the device ASAP (this means before enabling IOMMU translation...) or to trap such DMA's and continue, rather than letting them go through? We will not be too concerned about ongoing DMA's as long as there is no corruption of tce tables. That would mean DMA is happening in first kernel's memory buffer and second kernel is not impacted. But if TCE tables themselves are corrupted, then it can potentially interfere with second kernel's operation. Don't know how it can be addressed. I worry about two cases: the first is TCE table corruption, the second is DMA's to areas which were fine in the first kernel but are wrong for the second kernel. The patch below looks reasonable *if* that is the least worst way of doing it - let's see if we can come up with something cleaner that doesn't rely in the new kernel on data (which may or may not be corrupted...) from the old kernel. I think the issue here is that some DMA was going on when first kernel crashed. After the crash, second kernel booted and created new TCE tables and switched to it. This resulted in ongoing DMA failure and hardware raised an alarm. In this case, probably it would make sense to re-use the TCE tables of previous kernel (until and unless we have a way to tell hardware not to flag a DMA error if TCE mapping is changed while DMA is going on ?) We don't have a way to do that, but we should be able to trap the DMA, figure out that we're in a kdump kernel, and then just ignore it and continue. I think, we also need to reserve some TCE table entries (in first kernel), which can be used by second kernel for saving kernel core file to disk. There might be a case where first kernel has used up all TCE entries and second kernel can't allocate more. I think ppc64 has taken the approach of freeing some entries in second kernel but that will have the problem as you might be clearing an entry which is being used by ongoing DMA. That's a good point - how do PPC (or other architectures with an isolation-capable IOMMU) handle this whole issue? Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump
Hi Chandru, Thanks for the patch. Comments on the patch below, but first a general question for my education: the main problem here that aacraid continues DMA'ing when it shouldn't. Why can't we shut it down cleanly? Even without the presence of an IOMMU, it seems dangerous to let an adapter continue DMA'ing to and from memory when the kernel is an inconsistent state. The patch below looks reasonable *if* that is the least worst way of doing it - let's see if we can come up with something cleaner that doesn't rely in the new kernel on data (which may or may not be corrupted...) from the old kernel. Cheers, Muli On Wed, Oct 10, 2007 at 02:10:13AM +0530, chandru wrote: > kdump kernel fails to boot with calgary iommu and aacraid driver on > a x366 box. The ongoing dma's of aacraid from the first kernel > continue to exist until the driver is loaded in the kdump > kernel. Calgary is initialized prior to aacraid and creation of new > tce tables causes wrong dma's to occur. > > Here we try to grab the tce tables of the first kernel in kdump > kernel and use them. While in the kdump kernel we do not allocate > new tce tables but rather read the base address register contents of > calgary iommu and use the tables that the registers point to. With > these changes the kdump kernel and hence aacraid now boots normally. > Another point that came when talking with Vivek was to reserve part > of the tce table space in first kernel for use in the kdump kernel. > > Signed-off-by: Chandru S <[EMAIL PROTECTED]> > --- > > --- linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c.orig > 2007-10-09 23:39:22.0 +0530 > +++ linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c 2007-10-10 > 01:25:53.0 +0530 > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -165,6 +166,7 @@ static void calgary_dump_error_regs(stru > static void calioc2_handle_quirks(struct iommu_table *tbl, struct > pci_dev *dev); > static void calioc2_tce_cache_blast(struct iommu_table *tbl); > static void calioc2_dump_error_regs(struct iommu_table *tbl); > +static int calgary_bus_has_devices(int , unsigned short ) __init; Please add parameter names (int bus, etc). > static struct cal_chipset_ops calgary_chip_ops = { > .handle_quirks = calgary_handle_quirks, > @@ -819,7 +821,23 @@ static int __init calgary_setup_tar(stru > > tbl = pci_iommu(dev->bus); > tbl->it_base = (unsigned > long)bus_info[dev->bus->number].tce_space; > - tce_free(tbl, 0, tbl->it_size); > +#ifdef CONFIG_CRASH_DUMP > +if (is_kdump_kernel()){ > +u64 *tp; > +unsigned int index; > +tp = ((u64*)tbl->it_base); > +for(index=0;index < tbl->it_size; index++ ){ > +if ( *tp != 0x0 ) > +set_bit(index,tbl->it_map); > + > +tp++; > +} > +} > +else > +#endif > + { > + tce_free(tbl, 0, tbl->it_size); > + } Please no #ifdefs in here. Do it like this: if (is_kdump_kernel()) something() else something_else() Where is_kdump_kernel() is defined to 0 if CONFIG_CRASH_DUMP is not set. Also, please move the part where we scan the table into a suitably named function, e.g., calgary_init_bitmap_from_tce_table(). Also, coding style - please run the patch through checkpatch.pl. > +#ifdef CONFIG_CRASH_DUMP > + /* > +* If this is a kdump kernel, then try grabbing the tce tables > +* from first kernel by reading the contents of the base > +* address register of calgary iommu > +*/ > + if(is_kdump_kernel()){ Same comments as above. > + for (bus = 0; bus < MAX_PHB_BUS_NUM; bus++) { > + struct calgary_bus_info *info = _info[bus]; > + unsigned short pci_device; > + unsigned long tce_space; > + u32 val; > + > + val = read_pci_config(bus, 0, 0, 0); > + pci_device = (val & 0x) >> 16; > + > + if (!is_cal_pci_dev(pci_device)) > + continue; > + > + if (info->translation_disabled) > + continue; > + > + if (calgary_bus_has_devices(bus, pci_device) || > + translate_empty_slots ){ > + > + target = calgary_reg(bus_info[bus].bbar, > tar_offset(bus)); > + tce_space = be64_to_cpu(readq(target)); > + tce_space = tce_space & TAR_SW_BITS; > + > + BUG_ON(specified_table_size > > TCE_TABLE_SIZE_8M); > + > + tce_space = tce_space & ( > ~specified_table_size); > + info->tce_space =
Re: [RFC] [Patch] calgary iommu: Use the first kernel's tce tables in kdump
Hi Chandru, Thanks for the patch. Comments on the patch below, but first a general question for my education: the main problem here that aacraid continues DMA'ing when it shouldn't. Why can't we shut it down cleanly? Even without the presence of an IOMMU, it seems dangerous to let an adapter continue DMA'ing to and from memory when the kernel is an inconsistent state. The patch below looks reasonable *if* that is the least worst way of doing it - let's see if we can come up with something cleaner that doesn't rely in the new kernel on data (which may or may not be corrupted...) from the old kernel. Cheers, Muli On Wed, Oct 10, 2007 at 02:10:13AM +0530, chandru wrote: kdump kernel fails to boot with calgary iommu and aacraid driver on a x366 box. The ongoing dma's of aacraid from the first kernel continue to exist until the driver is loaded in the kdump kernel. Calgary is initialized prior to aacraid and creation of new tce tables causes wrong dma's to occur. Here we try to grab the tce tables of the first kernel in kdump kernel and use them. While in the kdump kernel we do not allocate new tce tables but rather read the base address register contents of calgary iommu and use the tables that the registers point to. With these changes the kdump kernel and hence aacraid now boots normally. Another point that came when talking with Vivek was to reserve part of the tce table space in first kernel for use in the kdump kernel. Signed-off-by: Chandru S [EMAIL PROTECTED] --- --- linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c.orig 2007-10-09 23:39:22.0 +0530 +++ linux-2.6.23-rc9/arch/x86_64/kernel/pci-calgary.c 2007-10-10 01:25:53.0 +0530 @@ -35,6 +35,7 @@ #include linux/pci_ids.h #include linux/pci.h #include linux/delay.h +#include linux/bootmem.h #include asm/iommu.h #include asm/calgary.h #include asm/tce.h @@ -165,6 +166,7 @@ static void calgary_dump_error_regs(stru static void calioc2_handle_quirks(struct iommu_table *tbl, struct pci_dev *dev); static void calioc2_tce_cache_blast(struct iommu_table *tbl); static void calioc2_dump_error_regs(struct iommu_table *tbl); +static int calgary_bus_has_devices(int , unsigned short ) __init; Please add parameter names (int bus, etc). static struct cal_chipset_ops calgary_chip_ops = { .handle_quirks = calgary_handle_quirks, @@ -819,7 +821,23 @@ static int __init calgary_setup_tar(stru tbl = pci_iommu(dev-bus); tbl-it_base = (unsigned long)bus_info[dev-bus-number].tce_space; - tce_free(tbl, 0, tbl-it_size); +#ifdef CONFIG_CRASH_DUMP +if (is_kdump_kernel()){ +u64 *tp; +unsigned int index; +tp = ((u64*)tbl-it_base); +for(index=0;index tbl-it_size; index++ ){ +if ( *tp != 0x0 ) +set_bit(index,tbl-it_map); + +tp++; +} +} +else +#endif + { + tce_free(tbl, 0, tbl-it_size); + } Please no #ifdefs in here. Do it like this: if (is_kdump_kernel()) something() else something_else() Where is_kdump_kernel() is defined to 0 if CONFIG_CRASH_DUMP is not set. Also, please move the part where we scan the table into a suitably named function, e.g., calgary_init_bitmap_from_tce_table(). Also, coding style - please run the patch through checkpatch.pl. +#ifdef CONFIG_CRASH_DUMP + /* +* If this is a kdump kernel, then try grabbing the tce tables +* from first kernel by reading the contents of the base +* address register of calgary iommu +*/ + if(is_kdump_kernel()){ Same comments as above. + for (bus = 0; bus MAX_PHB_BUS_NUM; bus++) { + struct calgary_bus_info *info = bus_info[bus]; + unsigned short pci_device; + unsigned long tce_space; + u32 val; + + val = read_pci_config(bus, 0, 0, 0); + pci_device = (val 0x) 16; + + if (!is_cal_pci_dev(pci_device)) + continue; + + if (info-translation_disabled) + continue; + + if (calgary_bus_has_devices(bus, pci_device) || + translate_empty_slots ){ + + target = calgary_reg(bus_info[bus].bbar, tar_offset(bus)); + tce_space = be64_to_cpu(readq(target)); + tce_space = tce_space TAR_SW_BITS; + + BUG_ON(specified_table_size TCE_TABLE_SIZE_8M); + + tce_space = tce_space ( ~specified_table_size); + info-tce_space = (u64 *)__va(tce_space); +
[PATCH 2/2] x86-64: Calgary - get rid of translate_phb
Now that we check for translation enabled/disabled based on the presence of the IOMMU translation table, we can get rid of translate_phb. Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> --- pci-calgary.c | 31 +++ 1 files changed, 15 insertions(+), 16 deletions(-) diff -r c9308d0538d9 -r a7833c71f4df arch/x86_64/kernel/pci-calgary.c --- a/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:43:49 2007 +0300 +++ b/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:45:29 2007 +0300 @@ -226,12 +226,6 @@ static inline int translation_enabled(st return (tbl != NULL); } -static inline int translate_phb(struct pci_dev* dev) -{ - int disabled = bus_info[dev->bus->number].translation_disabled; - return !disabled; -} - static void iommu_range_reserve(struct iommu_table *tbl, unsigned long start_addr, unsigned int npages) { @@ -1197,7 +1191,7 @@ static int __init calgary_init(void) { int ret; struct pci_dev *dev = NULL; - void *tce_space; + struct calgary_bus_info *info; ret = calgary_locate_bbars(); if (ret) @@ -1209,12 +1203,14 @@ static int __init calgary_init(void) break; if (!is_cal_pci_dev(dev->device)) continue; - if (!translate_phb(dev)) { + + info = _info[dev->bus->number]; + if (info->translation_disabled) { calgary_init_one_nontraslated(dev); continue; } - tce_space = bus_info[dev->bus->number].tce_space; - if (!tce_space && !translate_empty_slots) + + if (!info->tce_space && !translate_empty_slots) continue; ret = calgary_init_one(dev); @@ -1232,11 +1228,13 @@ error: break; if (!is_cal_pci_dev(dev->device)) continue; - if (!translate_phb(dev)) { + + info = _info[dev->bus->number]; + if (info->translation_disabled) { pci_dev_put(dev); continue; } - if (!bus_info[dev->bus->number].tce_space && !translate_empty_slots) + if (!info->tce_space && !translate_empty_slots) continue; calgary_disable_translation(dev); @@ -1549,7 +1547,7 @@ static int __init calgary_fixup_tce_spac static int __init calgary_fixup_tce_spaces(void) { struct pci_dev *dev = NULL; - void *tce_space; + struct calgary_bus_info *info; if (no_iommu || swiotlb || !calgary_detected) return -ENODEV; @@ -1562,11 +1560,12 @@ static int __init calgary_fixup_tce_spac break; if (!is_cal_pci_dev(dev->device)) continue; - if (!translate_phb(dev)) + + info = _info[dev->bus->number]; + if (info->translation_disabled) continue; - tce_space = bus_info[dev->bus->number].tce_space; - if (!tce_space) + if (!info->tce_space) continue; calgary_fixup_one_tce_space(dev); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] x86-64: Calgary - fix calgary=disable= for CalIOC2
The old check we used based on dev->bus->number is wrong for devices on CalIOC2. Instead look whether we have an IOMMU table for that bus - if not, translation is disabled. Thanks to Murillo Fernandes Bernardes <[EMAIL PROTECTED]> for spotting, suggesting a fix and testing. Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Acked-by: Murillo Fernandes Bernardes <[EMAIL PROTECTED]> --- Andi, This is Calgary 2.6.24 material - please apply. pci-calgary.c | 16 +++- 1 files changed, 11 insertions(+), 5 deletions(-) diff -r ca20f7bdb869 -r c9308d0538d9 arch/x86_64/kernel/pci-calgary.c --- a/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:38:08 2007 +0300 +++ b/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:43:49 2007 +0300 @@ -220,6 +220,12 @@ static inline unsigned int num_dma_pages return npages; } +static inline int translation_enabled(struct iommu_table *tbl) +{ + /* only PHBs with translation enabled have an IOMMU table */ + return (tbl != NULL); +} + static inline int translate_phb(struct pci_dev* dev) { int disabled = bus_info[dev->bus->number].translation_disabled; @@ -384,7 +390,7 @@ static void calgary_unmap_sg(struct devi { struct iommu_table *tbl = find_iommu_table(dev); - if (!translate_phb(to_pci_dev(dev))) + if (!translation_enabled(tbl)) return; while (nelems--) { @@ -424,7 +430,7 @@ static int calgary_map_sg(struct device unsigned long entry; int i; - if (!translate_phb(to_pci_dev(dev))) + if (!translation_enabled(tbl)) return calgary_nontranslate_map_sg(dev, sg, nelems, direction); for (i = 0; i < nelems; i++ ) { @@ -471,7 +477,7 @@ static dma_addr_t calgary_map_single(str uaddr = (unsigned long)vaddr; npages = num_dma_pages(uaddr, size); - if (translate_phb(to_pci_dev(dev))) + if (translation_enabled(tbl)) dma_handle = iommu_alloc(tbl, vaddr, npages, direction); else dma_handle = virt_to_bus(vaddr); @@ -485,7 +491,7 @@ static void calgary_unmap_single(struct struct iommu_table *tbl = find_iommu_table(dev); unsigned int npages; - if (!translate_phb(to_pci_dev(dev))) + if (!translation_enabled(tbl)) return; npages = num_dma_pages(dma_handle, size); @@ -510,7 +516,7 @@ static void* calgary_alloc_coherent(stru goto error; memset(ret, 0, size); - if (translate_phb(to_pci_dev(dev))) { + if (translation_enabled(tbl)) { /* set up tces to cover the allocated range */ mapping = iommu_alloc(tbl, ret, npages, DMA_BIDIRECTIONAL); if (mapping == bad_dma_address) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] x86-64: Calgary - fix calgary=disable=busnum for CalIOC2
The old check we used based on dev-bus-number is wrong for devices on CalIOC2. Instead look whether we have an IOMMU table for that bus - if not, translation is disabled. Thanks to Murillo Fernandes Bernardes [EMAIL PROTECTED] for spotting, suggesting a fix and testing. Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED] Acked-by: Murillo Fernandes Bernardes [EMAIL PROTECTED] --- Andi, This is Calgary 2.6.24 material - please apply. pci-calgary.c | 16 +++- 1 files changed, 11 insertions(+), 5 deletions(-) diff -r ca20f7bdb869 -r c9308d0538d9 arch/x86_64/kernel/pci-calgary.c --- a/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:38:08 2007 +0300 +++ b/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:43:49 2007 +0300 @@ -220,6 +220,12 @@ static inline unsigned int num_dma_pages return npages; } +static inline int translation_enabled(struct iommu_table *tbl) +{ + /* only PHBs with translation enabled have an IOMMU table */ + return (tbl != NULL); +} + static inline int translate_phb(struct pci_dev* dev) { int disabled = bus_info[dev-bus-number].translation_disabled; @@ -384,7 +390,7 @@ static void calgary_unmap_sg(struct devi { struct iommu_table *tbl = find_iommu_table(dev); - if (!translate_phb(to_pci_dev(dev))) + if (!translation_enabled(tbl)) return; while (nelems--) { @@ -424,7 +430,7 @@ static int calgary_map_sg(struct device unsigned long entry; int i; - if (!translate_phb(to_pci_dev(dev))) + if (!translation_enabled(tbl)) return calgary_nontranslate_map_sg(dev, sg, nelems, direction); for (i = 0; i nelems; i++ ) { @@ -471,7 +477,7 @@ static dma_addr_t calgary_map_single(str uaddr = (unsigned long)vaddr; npages = num_dma_pages(uaddr, size); - if (translate_phb(to_pci_dev(dev))) + if (translation_enabled(tbl)) dma_handle = iommu_alloc(tbl, vaddr, npages, direction); else dma_handle = virt_to_bus(vaddr); @@ -485,7 +491,7 @@ static void calgary_unmap_single(struct struct iommu_table *tbl = find_iommu_table(dev); unsigned int npages; - if (!translate_phb(to_pci_dev(dev))) + if (!translation_enabled(tbl)) return; npages = num_dma_pages(dma_handle, size); @@ -510,7 +516,7 @@ static void* calgary_alloc_coherent(stru goto error; memset(ret, 0, size); - if (translate_phb(to_pci_dev(dev))) { + if (translation_enabled(tbl)) { /* set up tces to cover the allocated range */ mapping = iommu_alloc(tbl, ret, npages, DMA_BIDIRECTIONAL); if (mapping == bad_dma_address) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] x86-64: Calgary - get rid of translate_phb
Now that we check for translation enabled/disabled based on the presence of the IOMMU translation table, we can get rid of translate_phb. Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED] --- pci-calgary.c | 31 +++ 1 files changed, 15 insertions(+), 16 deletions(-) diff -r c9308d0538d9 -r a7833c71f4df arch/x86_64/kernel/pci-calgary.c --- a/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:43:49 2007 +0300 +++ b/arch/x86_64/kernel/pci-calgary.c Sun Aug 12 11:45:29 2007 +0300 @@ -226,12 +226,6 @@ static inline int translation_enabled(st return (tbl != NULL); } -static inline int translate_phb(struct pci_dev* dev) -{ - int disabled = bus_info[dev-bus-number].translation_disabled; - return !disabled; -} - static void iommu_range_reserve(struct iommu_table *tbl, unsigned long start_addr, unsigned int npages) { @@ -1197,7 +1191,7 @@ static int __init calgary_init(void) { int ret; struct pci_dev *dev = NULL; - void *tce_space; + struct calgary_bus_info *info; ret = calgary_locate_bbars(); if (ret) @@ -1209,12 +1203,14 @@ static int __init calgary_init(void) break; if (!is_cal_pci_dev(dev-device)) continue; - if (!translate_phb(dev)) { + + info = bus_info[dev-bus-number]; + if (info-translation_disabled) { calgary_init_one_nontraslated(dev); continue; } - tce_space = bus_info[dev-bus-number].tce_space; - if (!tce_space !translate_empty_slots) + + if (!info-tce_space !translate_empty_slots) continue; ret = calgary_init_one(dev); @@ -1232,11 +1228,13 @@ error: break; if (!is_cal_pci_dev(dev-device)) continue; - if (!translate_phb(dev)) { + + info = bus_info[dev-bus-number]; + if (info-translation_disabled) { pci_dev_put(dev); continue; } - if (!bus_info[dev-bus-number].tce_space !translate_empty_slots) + if (!info-tce_space !translate_empty_slots) continue; calgary_disable_translation(dev); @@ -1549,7 +1547,7 @@ static int __init calgary_fixup_tce_spac static int __init calgary_fixup_tce_spaces(void) { struct pci_dev *dev = NULL; - void *tce_space; + struct calgary_bus_info *info; if (no_iommu || swiotlb || !calgary_detected) return -ENODEV; @@ -1562,11 +1560,12 @@ static int __init calgary_fixup_tce_spac break; if (!is_cal_pci_dev(dev-device)) continue; - if (!translate_phb(dev)) + + info = bus_info[dev-bus-number]; + if (info-translation_disabled) continue; - tce_space = bus_info[dev-bus-number].tce_space; - if (!tce_space) + if (!info-tce_space) continue; calgary_fixup_one_tce_space(dev); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] unify DMA_..BIT_MASK definitions: v1
On Tue, Sep 18, 2007 at 06:29:19AM +0200, Borislav Petkov wrote: > These patches remove redundant DMA_..BIT_MASK definitions across two drivers. > In this version of the patches, the computation of the bitmasks is done by > the compiler. > > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]> > Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > > -- > Index: 23-rc6/include/linux/dma-mapping.h > === > --- 23-rc6/include/linux/dma-mapping.h.orig 2007-09-17 17:48:20.0 > +0200 > +++ 23-rc6/include/linux/dma-mapping.h2007-09-18 06:12:33.0 > +0200 > @@ -13,16 +13,19 @@ > DMA_NONE = 3, > }; > > -#define DMA_64BIT_MASK 0xULL > -#define DMA_48BIT_MASK 0xULL > -#define DMA_40BIT_MASK 0x00ffULL > -#define DMA_39BIT_MASK 0x007fULL > -#define DMA_32BIT_MASK 0xULL > -#define DMA_31BIT_MASK 0x7fffULL > -#define DMA_30BIT_MASK 0x3fffULL > -#define DMA_29BIT_MASK 0x1fffULL > -#define DMA_28BIT_MASK 0x0fffULL > -#define DMA_24BIT_MASK 0x00ffULL > +#define DMA_BIT_MASK(n) ((1ULL<<(n))-1) > + > +#define DMA_64BIT_MASK DMA_BIT_MASK(64) This one does not do what you mean. You need an explicit mask or a ~0ULL here. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] unify DMA_..BIT_MASK definitions: v1
On Tue, Sep 18, 2007 at 06:29:19AM +0200, Borislav Petkov wrote: These patches remove redundant DMA_..BIT_MASK definitions across two drivers. In this version of the patches, the computation of the bitmasks is done by the compiler. Signed-off-by: Borislav Petkov [EMAIL PROTECTED] Cc: Jeremy Fitzhardinge [EMAIL PROTECTED] -- Index: 23-rc6/include/linux/dma-mapping.h === --- 23-rc6/include/linux/dma-mapping.h.orig 2007-09-17 17:48:20.0 +0200 +++ 23-rc6/include/linux/dma-mapping.h2007-09-18 06:12:33.0 +0200 @@ -13,16 +13,19 @@ DMA_NONE = 3, }; -#define DMA_64BIT_MASK 0xULL -#define DMA_48BIT_MASK 0xULL -#define DMA_40BIT_MASK 0x00ffULL -#define DMA_39BIT_MASK 0x007fULL -#define DMA_32BIT_MASK 0xULL -#define DMA_31BIT_MASK 0x7fffULL -#define DMA_30BIT_MASK 0x3fffULL -#define DMA_29BIT_MASK 0x1fffULL -#define DMA_28BIT_MASK 0x0fffULL -#define DMA_24BIT_MASK 0x00ffULL +#define DMA_BIT_MASK(n) ((1ULL(n))-1) + +#define DMA_64BIT_MASK DMA_BIT_MASK(64) This one does not do what you mean. You need an explicit mask or a ~0ULL here. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote: > Yes, I agree that pci_dev->sysdata can;t be removed. Even we (IOMMU) > were dependent on this field but somehow this field is being > overwritten to point to pci_bus's->sysdata and hence IOMMU was > failing. Earlier it was overwritten to NULL and hence we were not > failing but now it is overwritten to non-NULL and hence we fail. Do you know which commit caused that change? > My therory is that we don;t need to copy pci_bus's->sysdata to > pci_dev's->sysdata. Below patch solves my problem. > Any objection to below patch? I will give it a spin to verify it works for me, but in general I am wary of making such changes unless we can verify (read: audit) that they have no adverse side effects *on all architectures*. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote: Yes, I agree that pci_dev-sysdata can;t be removed. Even we (IOMMU) were dependent on this field but somehow this field is being overwritten to point to pci_bus's-sysdata and hence IOMMU was failing. Earlier it was overwritten to NULL and hence we were not failing but now it is overwritten to non-NULL and hence we fail. Do you know which commit caused that change? My therory is that we don;t need to copy pci_bus's-sysdata to pci_dev's-sysdata. Below patch solves my problem. Any objection to below patch? I will give it a spin to verify it works for me, but in general I am wary of making such changes unless we can verify (read: audit) that they have no adverse side effects *on all architectures*. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote: > On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote: > > On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: > > > > > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash > > > > This patch feels like a huge hack. See below. > > You seem to be jumping to conclusion without going in detail. The > pci_dev struct contains pointer to sysdata, which in turn points to > the copy of its parent's bus sysdata. So technically speaking we > can eliminate sysdata pointer from pci_dev struct which is what one > portion of this patch does. ... provided nothing relies on this relationship or the existence of the pci_dev's sysdata. Have you audited every architecture's use of the sysdata pointers? > > > This patch removes sysdata from pci_dev struct and creates a new > > > field called sys_data which is exclusively used by IOMMU driver to > > > keep its per device context pointer. > > > > Hmpf, why is this needed? with the pci_sysdata work that recently went > > into mainline we have a void *iommu member in pci_sysdata which should > > be all that's needed. Please elaborate if it's not enough for your > > needs. > I looked at your patch and it was not suitable because I need to > store iommu private pointer in pci_dev Could you elaborate on why you need this? I'm assuming it's for the per-device IOMMU page tables? > and not in the pci_bus. So I have added a new member sys_data in the > pci_dev struct. I can change the name from sys_dev to iomu_priv to > clear the confusion. Do let me know. Well, you should be able to just use the pci_dev's ->sysdata (that's what it's there for after all!) but you might need to make it point to a structure if it's shared, the same way we did with the bus's ->sysdata. I agree that just having it point to the bus's ->sysdata is not very useful *but* there may be code in the kernel that relies on it (Calgary did until very recently...) so it would have to be audited first. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash This patch feels like a huge hack. See below. > This patch removes sysdata from pci_dev struct and creates a new > field called sys_data which is exclusively used by IOMMU driver to > keep its per device context pointer. Hmpf, why is this needed? with the pci_sysdata work that recently went into mainline we have a void *iommu member in pci_sysdata which should be all that's needed. Please elaborate if it's not enough for your needs. Thanks, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash This patch feels like a huge hack. See below. This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. Hmpf, why is this needed? with the pci_sysdata work that recently went into mainline we have a void *iommu member in pci_sysdata which should be all that's needed. Please elaborate if it's not enough for your needs. Thanks, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote: On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote: On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash This patch feels like a huge hack. See below. You seem to be jumping to conclusion without going in detail. The pci_dev struct contains pointer to sysdata, which in turn points to the copy of its parent's bus sysdata. So technically speaking we can eliminate sysdata pointer from pci_dev struct which is what one portion of this patch does. ... provided nothing relies on this relationship or the existence of the pci_dev's sysdata. Have you audited every architecture's use of the sysdata pointers? This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. Hmpf, why is this needed? with the pci_sysdata work that recently went into mainline we have a void *iommu member in pci_sysdata which should be all that's needed. Please elaborate if it's not enough for your needs. I looked at your patch and it was not suitable because I need to store iommu private pointer in pci_dev Could you elaborate on why you need this? I'm assuming it's for the per-device IOMMU page tables? and not in the pci_bus. So I have added a new member sys_data in the pci_dev struct. I can change the name from sys_dev to iomu_priv to clear the confusion. Do let me know. Well, you should be able to just use the pci_dev's -sysdata (that's what it's there for after all!) but you might need to make it point to a structure if it's shared, the same way we did with the bus's -sysdata. I agree that just having it point to the bus's -sysdata is not very useful *but* there may be code in the kernel that relies on it (Calgary did until very recently...) so it would have to be audited first. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86/x86-64 PCI domain support
On Sat, Sep 01, 2007 at 10:32:23AM -0400, Jeff Garzik wrote: > > Now that the dust has settled and the prep work is upstream, adding > PCI edomain support to x86 is a lot more straightforward. > > Targetted for 2.6.24. Looks good to me. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86/x86-64 PCI domain support
On Sat, Sep 01, 2007 at 10:32:23AM -0400, Jeff Garzik wrote: Now that the dust has settled and the prep work is upstream, adding PCI edomain support to x86 is a lot more straightforward. Targetted for 2.6.24. Looks good to me. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Unified x86 architecture, arch/x86 - V2
On Wed, Aug 22, 2007 at 11:56:22PM +0200, Thomas Gleixner wrote: > We are pleased to announce v2 of the unified arch/x86 project we are > working on. After having recently inadvertently broken i386 while making x86-64 changes, thumbs up from me! Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Announce] Unified x86 architecture, arch/x86 - V2
On Wed, Aug 22, 2007 at 11:56:22PM +0200, Thomas Gleixner wrote: We are pleased to announce v2 of the unified arch/x86 project we are working on. After having recently inadvertently broken i386 while making x86-64 changes, thumbs up from me! Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE
On Mon, Aug 13, 2007 at 10:25:02AM -0700, Joe Perches wrote: > On Mon, 2007-08-13 at 15:18 +0300, Muli Ben-Yehuda wrote: > > Nope, this entry is for sound/oss/trident*. > > TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE > P: Muli Ben-Yehuda > M:[EMAIL PROTECTED] > L:linux-kernel@vger.kernel.org > S:Maintained > F:sound/oss/trident* Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE
On Sun, Aug 12, 2007 at 11:37:25PM -0700, [EMAIL PROTECTED] wrote: > Add file pattern to MAINTAINER entry > > Signed-off-by: Joe Perches <[EMAIL PROTECTED]> > > diff --git a/MAINTAINERS b/MAINTAINERS > index 7927653..fbd699e 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -4563,6 +4563,7 @@ P: Muli Ben-Yehuda > M: [EMAIL PROTECTED] > L: linux-kernel@vger.kernel.org > S: Maintained > +F: sound/pci/trident/ Nope, this entry is for sound/oss/trident*. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [2/2many] - FInd the maintainer(s) for a patch - MAINTAINERS
On Sun, Aug 12, 2007 at 10:53:19PM -0700, Joe Perches wrote: > Describe the new F: pattern Patch looks reversed? > > Signed-off-by: Joe Perches <[EMAIL PROTECTED]> > > diff --git a/MAINTAINERS b/MAINTAINERS > index 0d7f856..d3a0684 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -83,13 +83,6 @@ S: Status, one of the following: > it has been replaced by a better system and you > should be using that. > > -F: Files and directories with wildcard patterns. > - A trailing slash includes all files and subdirectory files. > - F: drivers/net/all files in and below drivers/net > - F: drivers/net/* all files in drivers/net, but not below > - F: */net/* all files in "any top level directory"/net > - One pattern per line. Multiple F: lines acceptable. > - > 3C359 NETWORK DRIVER > P: Mike Phillips > M: [EMAIL PROTECTED] Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [118/2many] MAINTAINERS - CALGARY x86-64 IOMMU
On Sun, Aug 12, 2007 at 11:25:06PM -0700, [EMAIL PROTECTED] wrote: > Add file pattern to MAINTAINER entry > > Signed-off-by: Joe Perches <[EMAIL PROTECTED]> > > diff --git a/MAINTAINERS b/MAINTAINERS > index 482d255..9c26e7b 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1148,6 +1148,11 @@ M: [EMAIL PROTECTED] > L: linux-kernel@vger.kernel.org > L: [EMAIL PROTECTED] > S: Maintained > +F: arch/x86_64/kernel/pci-calgary.c > +F: arch/x86_64/kernel/pci-dma.c pci-dma.c is generic x86-64 code maintained by Andi Kleen. > +F: arch/x86_64/kernel/tce.c > +F: include/asm-x86_6/calgary.h > +F: include/asm-x86_6/pci.h Likewise pci.h. You may also want to include F: include/asm-x86_6/tce.h F: include/asm-x86_6/rio.h Rest looks good: Acked-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [118/2many] MAINTAINERS - CALGARY x86-64 IOMMU
On Sun, Aug 12, 2007 at 11:25:06PM -0700, [EMAIL PROTECTED] wrote: Add file pattern to MAINTAINER entry Signed-off-by: Joe Perches [EMAIL PROTECTED] diff --git a/MAINTAINERS b/MAINTAINERS index 482d255..9c26e7b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1148,6 +1148,11 @@ M: [EMAIL PROTECTED] L: linux-kernel@vger.kernel.org L: [EMAIL PROTECTED] S: Maintained +F: arch/x86_64/kernel/pci-calgary.c +F: arch/x86_64/kernel/pci-dma.c pci-dma.c is generic x86-64 code maintained by Andi Kleen. +F: arch/x86_64/kernel/tce.c +F: include/asm-x86_6/calgary.h +F: include/asm-x86_6/pci.h Likewise pci.h. You may also want to include F: include/asm-x86_6/tce.h F: include/asm-x86_6/rio.h Rest looks good: Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [2/2many] - FInd the maintainer(s) for a patch - MAINTAINERS
On Sun, Aug 12, 2007 at 10:53:19PM -0700, Joe Perches wrote: Describe the new F: pattern Patch looks reversed? Signed-off-by: Joe Perches [EMAIL PROTECTED] diff --git a/MAINTAINERS b/MAINTAINERS index 0d7f856..d3a0684 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -83,13 +83,6 @@ S: Status, one of the following: it has been replaced by a better system and you should be using that. -F: Files and directories with wildcard patterns. - A trailing slash includes all files and subdirectory files. - F: drivers/net/all files in and below drivers/net - F: drivers/net/* all files in drivers/net, but not below - F: */net/* all files in any top level directory/net - One pattern per line. Multiple F: lines acceptable. - 3C359 NETWORK DRIVER P: Mike Phillips M: [EMAIL PROTECTED] Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE
On Sun, Aug 12, 2007 at 11:37:25PM -0700, [EMAIL PROTECTED] wrote: Add file pattern to MAINTAINER entry Signed-off-by: Joe Perches [EMAIL PROTECTED] diff --git a/MAINTAINERS b/MAINTAINERS index 7927653..fbd699e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4563,6 +4563,7 @@ P: Muli Ben-Yehuda M: [EMAIL PROTECTED] L: linux-kernel@vger.kernel.org S: Maintained +F: sound/pci/trident/ Nope, this entry is for sound/oss/trident*. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [481/2many] MAINTAINERS - TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE
On Mon, Aug 13, 2007 at 10:25:02AM -0700, Joe Perches wrote: On Mon, 2007-08-13 at 15:18 +0300, Muli Ben-Yehuda wrote: Nope, this entry is for sound/oss/trident*. TRIDENT 4DWAVE/SIS 7018 PCI AUDIO CORE P:Muli Ben-Yehuda M:[EMAIL PROTECTED] L:linux-kernel@vger.kernel.org S:Maintained F:sound/oss/trident* Acked-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patches] [PATCH] [4/12] x86_64: Disable CLFLUSH support again
On Thu, Aug 09, 2007 at 02:29:58PM +0100, Jan Beulich wrote: > The issue here is not with clflush by itself, but with what pages it > is being applied to. Here, it gets used on page table pages when > flushing really is needed on what one or more page table entries in > that page table page point(ed) to. I see, thanks. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [4/12] x86_64: Disable CLFLUSH support again
On Thu, Aug 09, 2007 at 02:41:31PM +0200, Andi Kleen wrote: > > It turns out CLFLUSH support is still not complete; we > flush the wrong pages. Again disable it for the release. > Noticed by Jan Beulich. > > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> Aside from the bug Jan pointed out with the patch, we're also using cflflush in other places (arch/i386/kernel/alternative.c and arch/x86_64/kernel/tce.c). Is the issue with clflush support specific to pageattr.c or should it be disabled every else too? what specifically is the issue? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [0/12] x86 Late merge bug fixes for 2.6.23
On Thu, Aug 09, 2007 at 02:41:27PM +0200, Andi Kleen wrote: > > Various simple bug fixes, no new features. Please review. I plan to > send them to Linus later. > > I don't have anything else critical pending. If you believe a x86 > patch is critical and really needs to make .23 please send email. We also need this fix for the current sysdata regression: Subject: finish i386 and x86-64 sysdata conversion This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes: - introduce pci_scan_bus_with_sysdata() and use it instead of pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will allocate the sysdata structure and then call pci_scan_bus(). - always allocate pci_sysdata dynamically. The whole point of this sysdata work is to make it easy to do root-bus specific things (e.g., support PCI domains and IOMMU's). I dislike using a default struct pci_sysdata in some places and a dynamically allocated pci_sysdata elsewhere - the potential for someone indavertantly changing the default structure is too high. - this patch only makes the minimal changes necessary, i.e., the NUMA node is always initialized to -1. Patches to do the right thing with regards to the NUMA node can build on top of this (either add a 'node' parameter to pci_scan_bus_with_sysdata() or just update the node when it becomes known). The patch was compile tested with various configurations (e.g., NUMAQ, VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my machines exhibited the bugs so caveat emptor. Andy, could you please see if this fixes the NUMA issues you've seen? Riku, does this fix "pci=noacpi" on your laptop? Comments appreciated. If this looks ok it should go into 2.6.23. Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cc: Yinghai Lu <[EMAIL PROTECTED]> Cc: Andi Kleen <[EMAIL PROTECTED]> Cc: Andrew Morton <[EMAIL PROTECTED]> Cc: Chuck Ebbert <[EMAIL PROTECTED]>, Cc: [EMAIL PROTECTED] Cc: Andy Whitcroft <[EMAIL PROTECTED]> Cc: Jeff Garzik <[EMAIL PROTECTED]> --- arch/i386/pci/common.c | 23 +++ arch/i386/pci/fixup.c|6 +++--- arch/i386/pci/irq.c |5 +++-- arch/i386/pci/legacy.c |2 +- arch/i386/pci/numa.c | 15 +-- arch/i386/pci/visws.c|4 ++-- include/asm-i386/pci.h |3 +++ include/asm-x86_64/pci.h |2 ++ 8 files changed, 46 insertions(+), 14 deletions(-) diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c index 85503de..ebc6f3c 100644 --- a/arch/i386/pci/common.c +++ b/arch/i386/pci/common.c @@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev) if (!dev->msi_enabled && pcibios_disable_irq) pcibios_disable_irq(dev); } + +struct pci_bus *pci_scan_bus_with_sysdata(int busno) +{ + struct pci_bus *bus = NULL; + struct pci_sysdata *sd; + + /* +* Allocate per-root-bus (not per bus) arch-specific data. +* TODO: leak; this memory is never freed. +* It's arguable whether it's worth the trouble to care. +*/ + sd = kzalloc(sizeof(*sd), GFP_KERNEL); + if (!sd) { + printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busno); + return NULL; + } + sd->node = -1; + bus = pci_scan_bus(busno, _root_ops, sd); + if (!bus) + kfree(sd); + + return bus; +} diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c index e7306db..c82cbf4 100644 --- a/arch/i386/pci/fixup.c +++ b/arch/i386/pci/fixup.c @@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d) pci_read_config_byte(d, reg++, ); DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb); if (busno) - pci_scan_bus(busno, _root_ops, NULL); /* Bus A */ + pci_scan_bus_with_sysdata(busno); /* Bus A */ if (suba < subb) - pci_scan_bus(suba+1, _root_ops, NULL); /* Bus B */ + pci_scan_bus_with_sysdata(suba+1); /* Bus B */ } pcibios_last_bus = -1; } @@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d) u8 busno; pci_read_config_byte(d, 0x4a, ); printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno); - pci_scan_bus(busno, _root_ops, NULL); + pci_scan_bus_with_sysdata(busno); pcibios_last_bus = -1; } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx); diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c index f2cb942..665db06 100644 --- a/arch/i386/pci/ir
Re: [PATCH] [4/12] x86_64: Disable CLFLUSH support again
On Thu, Aug 09, 2007 at 02:41:31PM +0200, Andi Kleen wrote: It turns out CLFLUSH support is still not complete; we flush the wrong pages. Again disable it for the release. Noticed by Jan Beulich. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Aside from the bug Jan pointed out with the patch, we're also using cflflush in other places (arch/i386/kernel/alternative.c and arch/x86_64/kernel/tce.c). Is the issue with clflush support specific to pageattr.c or should it be disabled every else too? what specifically is the issue? Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [0/12] x86 Late merge bug fixes for 2.6.23
On Thu, Aug 09, 2007 at 02:41:27PM +0200, Andi Kleen wrote: Various simple bug fixes, no new features. Please review. I plan to send them to Linus later. I don't have anything else critical pending. If you believe a x86 patch is critical and really needs to make .23 please send email. We also need this fix for the current sysdata regression: Subject: finish i386 and x86-64 sysdata conversion This patch finishes the i386 and x86-64 -sysdata conversion and hopefully also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes: - introduce pci_scan_bus_with_sysdata() and use it instead of pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will allocate the sysdata structure and then call pci_scan_bus(). - always allocate pci_sysdata dynamically. The whole point of this sysdata work is to make it easy to do root-bus specific things (e.g., support PCI domains and IOMMU's). I dislike using a default struct pci_sysdata in some places and a dynamically allocated pci_sysdata elsewhere - the potential for someone indavertantly changing the default structure is too high. - this patch only makes the minimal changes necessary, i.e., the NUMA node is always initialized to -1. Patches to do the right thing with regards to the NUMA node can build on top of this (either add a 'node' parameter to pci_scan_bus_with_sysdata() or just update the node when it becomes known). The patch was compile tested with various configurations (e.g., NUMAQ, VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my machines exhibited the bugs so caveat emptor. Andy, could you please see if this fixes the NUMA issues you've seen? Riku, does this fix pci=noacpi on your laptop? Comments appreciated. If this looks ok it should go into 2.6.23. Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED] Cc: Yinghai Lu [EMAIL PROTECTED] Cc: Andi Kleen [EMAIL PROTECTED] Cc: Andrew Morton [EMAIL PROTECTED] Cc: Chuck Ebbert [EMAIL PROTECTED], Cc: [EMAIL PROTECTED] Cc: Andy Whitcroft [EMAIL PROTECTED] Cc: Jeff Garzik [EMAIL PROTECTED] --- arch/i386/pci/common.c | 23 +++ arch/i386/pci/fixup.c|6 +++--- arch/i386/pci/irq.c |5 +++-- arch/i386/pci/legacy.c |2 +- arch/i386/pci/numa.c | 15 +-- arch/i386/pci/visws.c|4 ++-- include/asm-i386/pci.h |3 +++ include/asm-x86_64/pci.h |2 ++ 8 files changed, 46 insertions(+), 14 deletions(-) diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c index 85503de..ebc6f3c 100644 --- a/arch/i386/pci/common.c +++ b/arch/i386/pci/common.c @@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev) if (!dev-msi_enabled pcibios_disable_irq) pcibios_disable_irq(dev); } + +struct pci_bus *pci_scan_bus_with_sysdata(int busno) +{ + struct pci_bus *bus = NULL; + struct pci_sysdata *sd; + + /* +* Allocate per-root-bus (not per bus) arch-specific data. +* TODO: leak; this memory is never freed. +* It's arguable whether it's worth the trouble to care. +*/ + sd = kzalloc(sizeof(*sd), GFP_KERNEL); + if (!sd) { + printk(KERN_ERR PCI: OOM, skipping PCI bus %02x\n, busno); + return NULL; + } + sd-node = -1; + bus = pci_scan_bus(busno, pci_root_ops, sd); + if (!bus) + kfree(sd); + + return bus; +} diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c index e7306db..c82cbf4 100644 --- a/arch/i386/pci/fixup.c +++ b/arch/i386/pci/fixup.c @@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d) pci_read_config_byte(d, reg++, subb); DBG(i450NX PXB %d: %02x/%02x/%02x\n, pxb, busno, suba, subb); if (busno) - pci_scan_bus(busno, pci_root_ops, NULL); /* Bus A */ + pci_scan_bus_with_sysdata(busno); /* Bus A */ if (suba subb) - pci_scan_bus(suba+1, pci_root_ops, NULL); /* Bus B */ + pci_scan_bus_with_sysdata(suba+1); /* Bus B */ } pcibios_last_bus = -1; } @@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d) u8 busno; pci_read_config_byte(d, 0x4a, busno); printk(KERN_INFO PCI: i440KX/GX host bridge %s: secondary bus %02x\n, pci_name(d), busno); - pci_scan_bus(busno, pci_root_ops, NULL); + pci_scan_bus_with_sysdata(busno); pcibios_last_bus = -1; } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx); diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c index f2cb942..665db06 100644 --- a/arch/i386/pci/irq.c +++ b/arch/i386/pci/irq.c @@ -138,8 +138,9 @@ static void __init pirq_peer_trick(void) for(i = 1; i 256; i
Re: [patches] [PATCH] [4/12] x86_64: Disable CLFLUSH support again
On Thu, Aug 09, 2007 at 02:29:58PM +0100, Jan Beulich wrote: The issue here is not with clflush by itself, but with what pages it is being applied to. Here, it gets used on page table pages when flushing really is needed on what one or more page table entries in that page table page point(ed) to. I see, thanks. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/5] x86_64: get mp_bus_to_node as early v3
On Tue, Aug 07, 2007 at 06:20:33PM -0700, Yinghai Lu wrote: > please check this one against pci_scan_bus_with_sysdata > > [PATCH 1/5] x86_64: get mp_bus_to_node as early v3 > > current on amd k8 system with multi ht chain, the numa_node of pci > devices under r/sys/devices/pci:80/* always 0, even that chain is > on node 1 or 2 or 3. > > workaround: pcibus_to_node(bus) is used when we want to get node > that pci_device is on. > > In struct device, we already have numa_node member. and we could use > dev_to_node() /set_dev_node() to get and set numa_node in the > device. set_dev_node is called in pci_device_add() with > pcibus_to_node(bus). and pcibus_to_node use bus->sysdata for nodeid. > > the problem is when pci_add_device is called, bus->sysdata is not > assigned correct nodeid yet. the result will be numa_node always is > 0. > > pcibios_scan_root and pci_scan_root could take sysdata. So we need > to get mp_bus_to_node mapping before these two are called. and > get_mp_bus_to_node could get correct node for sysdata in root bus. > > in scanning of root bus, all child bus will take parent bus sysdata. So all > pci_device->dev.numa_node will be assigned correctly automatically. > > later we could use dev_to_node(_dev->dev) to get numa_node, and > we could also could make other bus specific device get the correct > numa_node too. > > this is one update version to pci_sysdata ... also reverse > pci_acpi_scan_root to use pcibios_scan_root again. Can you explain why this patch (and your other patches in this area) are needed? is this a performance issue? the patches seem complex, is there a good argument for that complexity? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/5] x86_64: get mp_bus_to_node as early v3
On Tue, Aug 07, 2007 at 06:20:33PM -0700, Yinghai Lu wrote: please check this one against pci_scan_bus_with_sysdata [PATCH 1/5] x86_64: get mp_bus_to_node as early v3 current on amd k8 system with multi ht chain, the numa_node of pci devices under r/sys/devices/pci:80/* always 0, even that chain is on node 1 or 2 or 3. workaround: pcibus_to_node(bus) is used when we want to get node that pci_device is on. In struct device, we already have numa_node member. and we could use dev_to_node() /set_dev_node() to get and set numa_node in the device. set_dev_node is called in pci_device_add() with pcibus_to_node(bus). and pcibus_to_node use bus-sysdata for nodeid. the problem is when pci_add_device is called, bus-sysdata is not assigned correct nodeid yet. the result will be numa_node always is 0. pcibios_scan_root and pci_scan_root could take sysdata. So we need to get mp_bus_to_node mapping before these two are called. and get_mp_bus_to_node could get correct node for sysdata in root bus. in scanning of root bus, all child bus will take parent bus sysdata. So all pci_device-dev.numa_node will be assigned correctly automatically. later we could use dev_to_node(pci_dev-dev) to get numa_node, and we could also could make other bus specific device get the correct numa_node too. this is one update version to pci_sysdata ... also reverse pci_acpi_scan_root to use pcibios_scan_root again. Can you explain why this patch (and your other patches in this area) are needed? is this a performance issue? the patches seem complex, is there a good argument for that complexity? Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
On Tue, Aug 07, 2007 at 03:49:11PM -0700, Andrew Morton wrote: > I am sooo tired of this thing. Andi, someone, can we for heaven's > sake please just get it all sorted out? With regards to the sysdata conversion: Riku says he cannot test new kernel. I haven't heard anything from Andy Whitcroft. It passes all of my tests and what we have now is obviously broken... I think we should put the fix in. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
On Tue, Aug 07, 2007 at 03:49:11PM -0700, Andrew Morton wrote: I am sooo tired of this thing. Andi, someone, can we for heaven's sake please just get it all sorted out? With regards to the sysdata conversion: Riku says he cannot test new kernel. I haven't heard anything from Andy Whitcroft. It passes all of my tests and what we have now is obviously broken... I think we should put the fix in. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion
On Sun, Aug 05, 2007 at 01:49:57AM -0700, Yinghai Lu wrote: > Can you change > pci_scan_bus_with_sysdata(int busno) > to > pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)? Do you anticipate passing in a different pci_ops or node? In any case please remember I am aiming for the minimal "obviously correc" change for 2.6.23... > pci_scan_bus_with_sysdata(int busno) make me feel that i need feed one > sysdata as on param for it. Yeah, lousy name, but the best I came up with. Runner up was 'x86_pci_scan_bus', which is I think worse? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH/RFT] finish i386 and x86-64 sysdata conversion
This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes: - introduce pci_scan_bus_with_sysdata() and use it instead of pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will allocate the sysdata structure and then call pci_scan_bus(). - always allocate pci_sysdata dynamically. The whole point of this sysdata work is to make it easy to do root-bus specific things (e.g., support PCI domains and IOMMU's). I dislike using a default struct pci_sysdata in some places and a dynamically allocated pci_sysdata elsewhere - the potential for someone indavertantly changing the default structure is too high. - this patch only makes the minimal changes necessary, i.e., the NUMA node is always initialized to -1. Patches to do the right thing with regards to the NUMA node can build on top of this (either add a 'node' parameter to pci_scan_bus_with_sysdata() or just update the node when it becomes known). The patch was compile tested with various configurations (e.g., NUMAQ, VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my machines exhibited the bugs so caveat emptor. Andy, could you please see if this fixes the NUMA issues you've seen? Riku, does this fix "pci=noacpi" on your laptop? Comments appreciated. If this looks ok it should go into 2.6.23. Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> Cc: Yinghai Lu <[EMAIL PROTECTED]> Cc: Andi Kleen <[EMAIL PROTECTED]> Cc: Andrew Morton <[EMAIL PROTECTED]> Cc: Chuck Ebbert <[EMAIL PROTECTED]>, Cc: [EMAIL PROTECTED] Cc: Andy Whitcroft <[EMAIL PROTECTED]> Cc: Jeff Garzik <[EMAIL PROTECTED]> --- arch/i386/pci/common.c | 23 +++ arch/i386/pci/fixup.c|6 +++--- arch/i386/pci/irq.c |5 +++-- arch/i386/pci/legacy.c |2 +- arch/i386/pci/numa.c | 15 +-- arch/i386/pci/visws.c|4 ++-- include/asm-i386/pci.h |3 +++ include/asm-x86_64/pci.h |2 ++ 8 files changed, 46 insertions(+), 14 deletions(-) diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c index 85503de..ebc6f3c 100644 --- a/arch/i386/pci/common.c +++ b/arch/i386/pci/common.c @@ -455,3 +455,26 @@ void pcibios_disable_device (struct pci_dev *dev) if (!dev->msi_enabled && pcibios_disable_irq) pcibios_disable_irq(dev); } + +struct pci_bus *pci_scan_bus_with_sysdata(int busno) +{ + struct pci_bus *bus = NULL; + struct pci_sysdata *sd; + + /* +* Allocate per-root-bus (not per bus) arch-specific data. +* TODO: leak; this memory is never freed. +* It's arguable whether it's worth the trouble to care. +*/ + sd = kzalloc(sizeof(*sd), GFP_KERNEL); + if (!sd) { + printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busno); + return NULL; + } + sd->node = -1; + bus = pci_scan_bus(busno, _root_ops, sd); + if (!bus) + kfree(sd); + + return bus; +} diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c index e7306db..c82cbf4 100644 --- a/arch/i386/pci/fixup.c +++ b/arch/i386/pci/fixup.c @@ -25,9 +25,9 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d) pci_read_config_byte(d, reg++, ); DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb); if (busno) - pci_scan_bus(busno, _root_ops, NULL); /* Bus A */ + pci_scan_bus_with_sysdata(busno); /* Bus A */ if (suba < subb) - pci_scan_bus(suba+1, _root_ops, NULL); /* Bus B */ + pci_scan_bus_with_sysdata(suba+1); /* Bus B */ } pcibios_last_bus = -1; } @@ -42,7 +42,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d) u8 busno; pci_read_config_byte(d, 0x4a, ); printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno); - pci_scan_bus(busno, _root_ops, NULL); + pci_scan_bus_with_sysdata(busno); pcibios_last_bus = -1; } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx); diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c index f2cb942..665db06 100644 --- a/arch/i386/pci/irq.c +++ b/arch/i386/pci/irq.c @@ -138,8 +138,9 @@ static void __init pirq_peer_trick(void) for(i = 1; i < 256; i++) { if (!busmap[i] || pci_find_bus(0, i)) continue; - if (pci_scan_bus(i, _root_ops, NULL)) - printk(KERN_INFO "PCI: Discovered primary peer bus %02x [IRQ]\n", i); + if (pci_scan_bus_with_sysdata(i)) +