Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline

2008-02-11 Thread Andi Kleen
On Mon, Feb 11, 2008 at 09:33:14AM -0800, Linus Torvalds wrote: On Mon, 11 Feb 2008, Andi Kleen wrote: Without this patch an Opteron test system here oopses at boot with current git. Calling to_pci_dev() on a NULL pointer gives a negative value so the following NULL
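A minimal sketch of the failure mode described in the snippet above, assuming a simplified container_of-style conversion. The struct layouts and the helper names (to_pci_dev_addr, to_pci_dev_safe, null_check_is_bypassed) are hypothetical and are not the kernel's definitions. The point is only that subtracting a nonzero member offset from a NULL pointer yields a non-NULL address (a very large unsigned value, negative if read as signed), so a later NULL test no longer fires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct device {
    int id;
};

struct pci_dev {
    int vendor;          /* a member before 'dev' makes its offset nonzero */
    struct device dev;   /* embedded generic device */
};

/*
 * Simplified container_of arithmetic, done on integers so the sketch
 * itself does not perform pointer arithmetic on NULL.
 */
static uintptr_t to_pci_dev_addr(const struct device *d)
{
    return (uintptr_t)d - offsetof(struct pci_dev, dev);
}

/* Returns 1 when the converted NULL pointer no longer compares equal to 0. */
static int null_check_is_bypassed(void)
{
    uintptr_t addr = to_pci_dev_addr(NULL);
    return addr != 0;
}

/* One possible guard (an assumption, not the posted patch): test first. */
static struct pci_dev *to_pci_dev_safe(struct device *d)
{
    if (d == NULL)
        return NULL;
    return (struct pci_dev *)((char *)d - offsetof(struct pci_dev, dev));
}
```

With this layout, null_check_is_bypassed() reports that the naive conversion of NULL is not NULL, while to_pci_dev_safe(NULL) stays NULL and to_pci_dev_safe(&p.dev) recovers &p for a real struct pci_dev p.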

Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline

2008-02-11 Thread Andi Kleen
On Mon, Feb 11, 2008 at 09:37:18AM -0800, Linus Torvalds wrote: On Mon, 11 Feb 2008, Linus Torvalds wrote: So we should probably make pcibus_to_node() be an inline function for that case Or, we could just do the ugliest patch ever, namely -#define pcibus_to_node(node)

[PATCH] Prevent IDE boot ops on NUMA system in mainline

2008-02-10 Thread Andi Kleen
instead. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux/include/linux/ide.h === --- linux.orig/include/linux/ide.h +++ linux/include/linux/ide.h @@ -1294,7 +1294,7 @@ static inline void ide_dump_identify(u8 static inline

Re: DMA mapping on SCSI device?

2008-01-28 Thread Andi Kleen
The ideal solution would be to do mapping against a different struct device for each port, so that we could maintain the proper DMA mask for each of them at all times. However I'm not sure if that's possible. I cannot imagine why it should be that difficult. The PCI subsystem could offer a

libata exception handling messages at boot on qemu

2008-01-08 Thread Andi Kleen
Is there a workaround for the long ugly boot messages one sees with libata and qemu (0.9.0 CVS 070719)? It boots eventually, but it looks quite ugly. I suppose that's a qemu device model bug or could it be a Linux problem? -Andi Driver 'sd' needs updating - please use bus_type methods Driver

Re: libata exception handling messages at boot on qemu

2008-01-08 Thread Andi Kleen
On Tue, Jan 08, 2008 at 09:04:28PM +, Alan Cox wrote: On Tue, 8 Jan 2008 20:23:52 +0100 Andi Kleen [EMAIL PROTECTED] wrote: Is there a workaround for the long ugly boot messages one sees with libata and qemu (0.9.0 CVS 070719)? It boots eventually, but it looks quite ugly. I

Re: libata exception handling messages at boot on qemu

2008-01-08 Thread Andi Kleen
On Tue, Jan 08, 2008 at 09:19:31PM +, Alan Cox wrote: Since I assume that qemu code base is wide spread and if a workaround is not too ugly I think it would be nice if the kernel handled that. Qemu behaves exactly the same way as a broken device in a situation where data corruption may

Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl

2008-01-08 Thread Andi Kleen
On Tue, Jan 08, 2008 at 07:50:47PM -0400, Kevin Winchester wrote: Andi Kleen wrote: Here's a proposal for some useful code transformations the kernel janitors could do as opposed to running checkpatch.pl. snip I notice that every driver in drivers/ata uses a .ioctl that points

Re: [JANITOR PROPOSAL] Switch ioctl functions to ->unlocked_ioctl

2008-01-08 Thread Andi Kleen
Sorry about the noise here - I now notice that not all .ioctl function pointers have the option of changing to .unlocked_ioctl. In this case, the ioctl is in the struct scsi_host_template, rather than struct file_operations. I'll try to be a little more careful about the git grepping in

Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM II

2007-11-20 Thread Andi Kleen
Which in turn enables the iommu_merge functionality in gart_map_sg(). for_each_sg(sg, s, nents, i) { Hmm, another thought. Maybe this code just has trouble with the new linked SG lists and it's not really a SB600 problem? I did a quick test on two ATI machines with older chipset and

Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-20 Thread Andi Kleen
On Tuesday 20 November 2007 19:29:56 Thomas Gleixner wrote: On Tue, 20 Nov 2007, Andi Kleen wrote: This probably requires working 64bit DMA, which is not possible with the SB600 controller. It should not, no. The remapping is done into the GART which is 4GB

Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-14 Thread Andi Kleen
On Wednesday 14 November 2007 12:55, Srihari Vijayaraghavan wrote: [Sorry to reply to my own email thread] Srihari Vijayaraghavan [EMAIL PROTECTED] wrote: ... No problems. Here's the log of the non-working kernel with IOMMU turned on. Basically it goes on resetting the SATA ports throwing many

Re: 2.6.24-rc SB600 AHCI no go on >=4GB of RAM

2007-11-14 Thread Andi Kleen
The AHCI code falls back to 32bit DMA in that case. Which in turn causes the problem seen by Srihari. There is not much printk sticking necessary, the code is simply not handling this. What code is not handling what? IOMMU merging should be always safe. If it is not the driver should not

Re: [PATCH] Add a global ide=off switch for drivers/ide

2007-10-25 Thread Andi Kleen
On Thursday 25 October 2007 23:07:23 Bartlomiej Zolnierkiewicz wrote: On Monday 15 October 2007, Andi Kleen wrote: Had a situation where drivers/ide was compiled in, but I wanted to turn it off to let the drivers/ata drivers take over. I ended up using ide*=noprobe

ata_exec_internal crash at boot in -git22

2007-10-22 Thread Andi Kleen
One of the systems tested in autoboot crashes at boot with -git22. This is an AMD 2 socket Opteron NUMA system. The tester was a little flakey and happened to hit the x86-merge-broke-compilation window, so the last good data point I have is 2.6.23-rc9. -Andi megasas: 00.00.03.05 Mon Oct

Re: ata_exec_internal crash at boot in -git22

2007-10-22 Thread Andi Kleen
On Monday 22 October 2007 20:26:45 Jens Axboe wrote: On Mon, Oct 22 2007, Andi Kleen wrote: One of the systems tested in autoboot crashes at boot with -git22. This is an AMD 2 socket Opteron NUMA system. The tester was a little flakey and happened to hit the x86-merge-broke

[PATCH] Add a global ide=off switch for drivers/ide

2007-10-15 Thread Andi Kleen
this situation better. The patch is a little bigger because I tried to cover all modules. I'm also not 100% sure ENODEV is the right error return for this case, but I didn't come up with a better one. The ARM/MIPS part is uncompiled. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux-2.6.23

Re: [PATCH] drivers/firmware: const-ify DMI API and internals

2007-09-01 Thread Andi Kleen
And if we're really lucky, this might enable some additional optimizations on the part of the compiler. Only if the kernel was compiled as C++. C compilers generally ignore constness for optimization purposes because it can be so easily cast away. -Andi - To unsubscribe from this list:
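A short sketch of why a C compiler cannot rely on const for this kind of optimization. The function names (bump, read_around_call) are made up for illustration. Writing through a pointer whose const qualifier was cast away is well defined as long as the underlying object was not itself defined const, so the compiler has to reload the value after the call instead of reusing the earlier read.

```c
/*
 * 'p' is declared pointer-to-const, but the callee casts that away.
 * This is well defined only because the caller's object is not const.
 */
static void bump(const int *p)
{
    *(int *)p += 1;
}

/*
 * Reads *p before and after the call. The compiler may not assume
 * the two reads return the same value, despite the const qualifier.
 */
static int read_around_call(const int *p)
{
    int before = *p;
    bump(p);
    return before + *p;
}
```

For a plain int x = 1, read_around_call(&x) returns 3 (1 before the call plus 2 after it), and x itself ends up as 2.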

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs) II

2007-08-10 Thread Andi Kleen
Alan Cox [EMAIL PROTECTED] writes: BTW unless I'm misreading the i386 code it'll not fail here, but allocate memory. Surely that will cause failures later if you rely on it failing? If you don't rely on it then changing x86-64 will also not help you. Eww that'll do strange things.

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs) II

2007-08-10 Thread Andi Kleen
Surely we don't need to wait until then? This is the correct fix, isn't it? (Obviously I'll split it into a generic and a pcmcia specific piece if it looks OK to everyone). It sets the PCMCIA dma_mask up correctly and introduces a DMA_MASK_NONE (I prefer that to DMA_0BIT_MASK but I can

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)

2007-08-09 Thread Andi Kleen
On Thu, Aug 09, 2007 at 02:53:36PM +0100, Alan Cox wrote: http://bugzilla.kernel.org/show_bug.cgi?id=8424 - patch review This one is on Alan. I think not - something horrible is happening in dma_alloc_coherent when called from dmam_* with a non PCI device Seems to be some kind of

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)

2007-08-09 Thread Andi Kleen
On Thu, Aug 09, 2007 at 06:21:01PM +0100, Alan Cox wrote: Seems to be some kind of AMD64 specific DMA mapping bug ? I think it's dev->dma_mask == NULL. Clearly you're passing a non DMA able device to dma_alloc_coherent(). Which seems like a caller bug. Ok - other archs seem to just
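A rough sketch of the kind of check under discussion in the snippet above, assuming a device whose DMA mask pointer may be NULL. The struct fake_device type and the dma_addr_allowed helper are hypothetical and do not reflect the kernel's struct device or its DMA API; they only show a missing mask being rejected up front instead of being dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified device; the real structure differs. */
struct fake_device {
    const uint64_t *dma_mask;   /* NULL: device was never set up for DMA */
};

/*
 * Returns -1 if the device has no DMA mask at all, 1 if 'addr' fits
 * inside the mask, and 0 if it does not. 'dev' itself must be non-NULL.
 */
static int dma_addr_allowed(const struct fake_device *dev, uint64_t addr)
{
    if (dev->dma_mask == NULL)
        return -1;              /* fail cleanly instead of crashing */
    return (addr & ~*dev->dma_mask) == 0;
}
```

With a 32-bit mask, address 0x1000 is accepted and 0x100000000 is rejected, while a device with a NULL mask gets -1 for any address.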

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)

2007-08-09 Thread Andi Kleen
On Thu, Aug 09, 2007 at 06:53:10PM +0100, Alan Cox wrote: Or perhaps you got the wrong device here? For ISA devices we traditionally used NULL. Or if you set up your own ISA devices (which I can't see a reason for but there might be one I'm missing) at least give them a dma mask. Then it

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)

2007-08-09 Thread Andi Kleen
On Thu, Aug 09, 2007 at 11:34:58PM +0100, Alan Cox wrote: Where does the device come from? What device is it? At the higher level someone passed us a device and some mappings and function methods and said this is an IDE controller Ok you're stabbing in the dark. I guess more debugging is

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)

2007-08-09 Thread Andi Kleen
I'll submit a patch to check this in my next batch. But as James pointed out you'll likely need similar patches on other architectures. -Andi

Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs) II

2007-08-09 Thread Andi Kleen
BTW unless I'm misreading the i386 code it'll not fail here, but allocate memory. Surely that will cause failures later if you rely on it failing? If you don't rely on it then changing x86-64 will also not help you. -Andi

Re: sata_sil, writing bug with multiple cards?

2007-07-04 Thread Andi Kleen
On Wednesday 04 July 2007 10:17:34 [EMAIL PROTECTED] wrote: Most likely it is some sort of hardware bug that we might not be able to do much about. Have you tried contacting SIL or VIA? No, I haven't. Like I mentioned above, the OpenBSD drivers seemed to work, or at least did with

Re: Libata PATA status

2007-07-04 Thread Andi Kleen
Alan Cox [EMAIL PROTECTED] writes: Post SRST What is SRST? My personal wish list feature would be a little forwarder driver to forward /dev/hd* to /dev/sd* for this; then old IDE could be disabled without risking breaking old root file systems. -Andi

Re: Libata PATA status

2007-07-04 Thread Andi Kleen
You could probably reliably map hda/b/c/d initially with some kind of forwarder providing nobody hot plugged them. Just not sure I see the PATA hotplug? SATA systems typically already use /dev/sda*, it just applies to PATA. point of doing it kernel side. The point would be that old user

Re: Silent corruption on AMD64

2007-04-01 Thread Andi Kleen
Aaron Lehmann [EMAIL PROTECTED] writes: [adding netdev] [meta-comment: I wish people wouldn't use such unnecessarily broad subjects -- how is it the x86-64 port's or AMD's fault when you have broken hardware? Would anybody write Silent corruption on i386 or Silent corruption on Intel or Silent

Re: [PATCH v2] Add suspend/resume for HPET

2007-03-29 Thread Andi Kleen
Ingo Molnar [EMAIL PROTECTED] writes: there's no fundamental reason. x86_64 COW-ed hpet_timer.c and time_hpet.c years ago and drifted off into different areas. Not quite -- x86-64 did HPET long before i386; the only stuff cowed was the character driver support code. But the core HPET code

Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-15 Thread Andi Kleen
Do you mean between disabling IRQ mechanisms and enabling PCI device in pcim_prepare_device()? Yes. -Andi

Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access

2007-03-14 Thread Andi Kleen
Tejun Heo [EMAIL PROTECTED] writes: Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. This seems to be also

Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-07 Thread Andi Kleen
On Thu, Jul 07, 2005 at 09:21:55AM -0700, Christoph Lameter wrote: On Wed, 6 Jul 2005, Andi Kleen wrote: Without this patch a dual Xeon EM64T machine would oops on boot because the hwif pointer here was NULL. I also added a check for pci_dev because it's doubtful that all IDE devices have

Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-07 Thread Andi Kleen
On Thu, Jul 07, 2005 at 09:32:51AM -0700, Christoph Lameter wrote: On Thu, 7 Jul 2005, Andi Kleen wrote: On Thu, Jul 07, 2005 at 09:21:55AM -0700, Christoph Lameter wrote: On Wed, 6 Jul 2005, Andi Kleen wrote: Without this patch a dual Xeon EM64T machine would oops on boot

Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-07 Thread Andi Kleen
On Thu, Jul 07, 2005 at 12:09:00PM -0700, Christoph Lameter wrote: On Thu, 7 Jul 2005, Linus Torvalds wrote: Yes. Except that if hwif is NULL, we'll have other oopses since we access that in other places. Why _is_ hwif NULL anyway? That's another, unrelated thing, and should

Re: [PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-06 Thread Andi Kleen
drive->hwif check is redundant, please remove it It's not. My first version didn't have it but it still crashed. It's what actually prevents the crash. I also don't know why, but it's true. The machine had four IDE controllers BTW (on board and an external Promise card) -Andi

Re: [PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-06 Thread Andi Kleen
On Wed, Jul 06, 2005 at 09:34:28AM -0700, Christoph Lameter wrote: On Wed, 6 Jul 2005, Andi Kleen wrote: - q = blk_init_queue_node(do_ide_request, ide_lock, - pcibus_to_node(drive->hwif->pci_dev->bus)); + int node = 0; /* Should be -1 */ Why is this not -1

Re: [PATCH] Fix crash on boot in kmalloc_node IDE changes

2005-07-06 Thread Andi Kleen
On Wed, 6 Jul 2005 16:35:11 +0200 Bartlomiej Zolnierkiewicz [EMAIL PROTECTED] wrote: On 7/6/05, Andi Kleen [EMAIL PROTECTED] wrote: drive->hwif check is redundant, please remove it It's not. My first version didn't have it but it still crashed. It's what actually prevents the crash. I