Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline
On Mon, Feb 11, 2008 at 09:33:14AM -0800, Linus Torvalds wrote: On Mon, 11 Feb 2008, Andi Kleen wrote: Without this patch a Opteron test system here oopses at boot with currentg git. Calling to_pci_dev() on a NULL pointer gives a negative value so the following NULL pointer check never triggers and then an illegal address is referenced. Check the unadjusted original device pointer for NULL instead. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux/include/linux/ide.h === --- linux.orig/include/linux/ide.h +++ linux/include/linux/ide.h @@ -1294,7 +1294,7 @@ static inline void ide_dump_identify(u8 static inline int hwif_to_node(ide_hwif_t *hwif) { struct pci_dev *dev = to_pci_dev(hwif-dev); - return dev ? pcibus_to_node(dev-bus) : -1; + return hwif-dev ? pcibus_to_node(dev-bus) : -1; } Ok, I applied this, but it causes a *lot* of noise about unused variable 'dev' because pcibus_to_node() is defined to be -1 when you don't do any strange NUMA thing (so that dev-bus usage thing is never even seen by the compiler. So we should probably make pcibus_to_node() be an inline function for that case, or just make that thing be return hwif-dev ? pcibus_to_node(to_pci_dev(hwif-dev)-bus) : -1; or something. Because now things are really noisy. Yes sorry; I only checked NUMA builds before sending it off. The easiest way is probably just the traditional (void) cast I fixed most of the topology macros while I was at it. -Andi --- Make topology fallback macros reference their arguments. This avoids warnings with unreferenced variables in the !NUMA case. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux/include/asm-generic/topology.h === --- linux.orig/include/asm-generic/topology.h +++ linux/include/asm-generic/topology.h @@ -30,19 +30,19 @@ /* Other architectures wishing to use this simple topology API should fill in the below functions as appropriate in their own asm/topology.h file. */ #ifndef cpu_to_node -#define cpu_to_node(cpu) (0) +#define cpu_to_node(cpu) ((void)(cpu),0) #endif #ifndef parent_node -#define parent_node(node) (0) +#define parent_node(node) ((void)(node),0) #endif #ifndef node_to_cpumask -#define node_to_cpumask(node) (cpu_online_map) +#define node_to_cpumask(node) ((void)node, cpu_online_map) #endif #ifndef node_to_first_cpu -#define node_to_first_cpu(node)(0) +#define node_to_first_cpu(node)((void)(node),0) #endif #ifndef pcibus_to_node -#define pcibus_to_node(node) (-1) +#define pcibus_to_node(bus)((void)(bus), -1) #endif #ifndef pcibus_to_cpumask - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Prevent IDE boot ops on NUMA system in mainline
On Mon, Feb 11, 2008 at 09:37:18AM -0800, Linus Torvalds wrote: On Mon, 11 Feb 2008, Linus Torvalds wrote: So we should probably make pcibus_to_node() be an inline function for that case Or, we could just do the ugliest patch ever, namely -#define pcibus_to_node(node) (-1) +#define pcibus_to_node(node) ((int)(long)(node),-1) Wow. It's so ugly it's almost wraps around and comes out the other sides and looks pretty. (void)arg, ... is better. Trick originally from Jan Beulich I think (his code is always a good source for useful new C tricks) -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Prevent IDE boot ops on NUMA system in mainline
Without this patch a Opteron test system here oopses at boot with currentg git. Calling to_pci_dev() on a NULL pointer gives a negative value so the following NULL pointer check never triggers and then an illegal address is referenced. Check the unadjusted original device pointer for NULL instead. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux/include/linux/ide.h === --- linux.orig/include/linux/ide.h +++ linux/include/linux/ide.h @@ -1294,7 +1294,7 @@ static inline void ide_dump_identify(u8 static inline int hwif_to_node(ide_hwif_t *hwif) { struct pci_dev *dev = to_pci_dev(hwif-dev); - return dev ? pcibus_to_node(dev-bus) : -1; + return hwif-dev ? pcibus_to_node(dev-bus) : -1; } static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive) - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: DMA mapping on SCSI device?
The ideal solution would be to do mapping against a different struct device for each port, so that we could maintain the proper DMA mask for each of them at all times. However I'm not sure if that's possible. I cannot imagine why it should be that difficult. The PCI subsystem could over a pci_clone_device() or similar function. For all complicated purposes (sysfs etc) the original device could be used, so it would be hopefully not that difficult. The alternative would be to add a new family of PCI mapping functions that take an explicit mask. Disadvantage would be changing all architectures, but on the other hand the interface could be phase in one by one (and nF4 primarily only works on x86 anyways) I suspect the later would be a little cleaner, although they don't make much difference. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
libata exception handling messages at boot on qemu
Is there a workaround for the long ugly boot messages on sees with libata and qemu (0.9.0 CVS 070719)? It boots eventually, but it looks quite ugly. I suppose that's a qemu device model bug or could it be a Linux problem? -Andi Driver 'sd' needs updating - please use bus_type methods Driver 'sr' needs updating - please use bus_type methods scsi0 : ata_piix scsi1 : ata_piix ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14 ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15 ata1.00: ATA-7: QEMU HARDDISK, 0.9.0, max UDMA/100 ata1.00: 14062608 sectors, multi 16: LBA48 ata1.00: configured for MWDMA2 ata2.00: ATAPI: QEMU CD-ROM, 0.9.0, max UDMA/100 ata2.00: configured for MWDMA2 scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK0.9. PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 14062608 512-byte hardware sectors (7200 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DP O or FUA sd 0:0:0:0: [sda] 14062608 512-byte hardware sectors (7200 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DP O or FUA sda: sda1 sd 0:0:0:0: [sda] Attached SCSI disk sd 0:0:0:0: Attached scsi generic sg0 type 0 scsi 1:0:0:0: CD-ROMQEMU QEMU CD-ROM 0.9. PQ: 0 ANSI: 5 ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH complete ata2: DRQ=1 with device error, dev_stat 0x49 ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res 41/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation) ata2.00: status: { DRDY ERR } ata2: soft resetting link ata2.00: configured for MWDMA2 ata2: EH
Re: libata exception handling messages at boot on qemu
On Tue, Jan 08, 2008 at 09:04:28PM +, Alan Cox wrote: On Tue, 8 Jan 2008 20:23:52 +0100 Andi Kleen [EMAIL PROTECTED] wrote: Is there a workaround for the long ugly boot messages on sees with libata and qemu (0.9.0 CVS 070719)? It boots eventually, but it looks quite ugly. I suppose that's a qemu device model bug or could it be a Linux problem? libata actually bothers to check things like data directions and device state. This as far as I can tell is a qemu bug. I did look at the relevant code but it was so vomitously horrible I didn't investigate in detail. I believe the Xen qemu fork is ok ? It's fixed in qemu CVS as Jim pointed out. Still it's a little annoying to always have to update qemu from rpms to CVS. Since I assume that qemu code base is wide spread and if a workaround is not too ugly I think it would be nice if the kernel handled that. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libata exception handling messages at boot on qemu
On Tue, Jan 08, 2008 at 09:19:31PM +, Alan Cox wrote: Since I assume that qemu code base is wide spread and if a workaround is not too ugly I think it would be nice if the kernel handled that. Qemu behaves exactly the same way as a broken device in a situation where data corruption may occur. It would be extremely bad to remove sanity checking in storage subsystems just to deal with a silly bug in an emulator. It could just a quirk e.g. matching on the ID string? -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [JANITOR PROPOSAL] Switch ioctl functions to -unlocked_ioctl
On Tue, Jan 08, 2008 at 07:50:47PM -0400, Kevin Winchester wrote: Andi Kleen wrote: Here's a proposal for some useful code transformations the kernel janitors could do as opposed to running checkpatch.pl. snip I notice that every driver in drivers/ata uses a .ioctl that points to ata_scsi_ioctl(). I could add the BKL to that function, and then change This might be a little more complicated. These are funnelled through the block/SCSI layers which might not have separate unlocked ioctl callbacks yet. Would be probably not very difficult to add though. all of the drivers to .unlocked_ioctl, but I assume this would be a candidate to actually clean up by determining why the lock is needed and removing it if necessary. Does anyone know off-hand the reason for needing the lock (I assume someone does or it wouldn't have survived this long)? If the lock is absolutely required, then I can write the patch to add lock_kernel() and unlock_kernel(). Just sending the patch to add lock/unlock_kernel() is probably a good idea anyways -- Jeff will then feel bad over it and eventually remove it when he figures out it is safe ;-) -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [JANITOR PROPOSAL] Switch ioctl functions to -unlocked_ioctl
Sorry about the noise here - I now notice that not all .ioctl function pointers have the option of changing to .unlocked_ioctl. In this case, the ioctl is in the struct scsi_host_template, rather than struct file_operations. I'll try to be a little more careful about the git grepping in the future. Well it just points to another area that needs to be improved. Clearly scsi_host_template should have a unlocked_ioctl too. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc SB600 AHCI no go on =4GB of RAM II
Which in turn enables the iommu_merge functionality in gart_map_sg(). for_each_sg(sg, s, nents, i) { Hmm, another thought. Maybe this code just has trouble with the new linked SG lists and it's not really a SB600 problem? I did a quick test on two ATI machines with older chipset and iommu=force,merge and it didn't show a problem though. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc SB600 AHCI no go on =4GB of RAM
On Tuesday 20 November 2007 19:29:56 Thomas Gleixner wrote: On Tue, 20 Nov 2007, Andi Kleen wrote: This requires propably working 64bit DMA, which is not possible with the SB600 controller. It should not no. The remapping is done into the GART which is 4GB and that is the address the SB600 sees. Hmm, I just checked the boot logs of the failing 4GB kernel: BIOS-e820: 0001 - 00012000 (usable) ... CPU 0: aperture @ c00 size 32 MB Aperture too small (32 MB) No AGP bridge found Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Mapping aperture over 65536 KB of RAM @ c00 The aperture is mapped at c00 and c00 + 64MB 4GB Memory: 4055984k/4718592k available (2146k kernel code, 136780k reserved, 1273k data, 296k init) 4718592k * 1024 == 0x12000 So now we have addresses 4G and I suspect that this is somehow related to the problem. Yes of course -- without 4GB the PCI-GART would not be used at all (unless you force it) and then no merging. Also is the aperture size of 32MB somehow related to this ? This just means the BIOS didn't initialize it properly (a lot of BIOS don't do anymore these days because they assume it's a AGP only feature) -- that is why the kernel allocated its own over memory. I think we really have to find out which request freezes it. Can you perhaps just apply this patch and post the output? Index: linux-2.6.24-rc1-hack/arch/x86/kernel/pci-gart_64.c === --- linux-2.6.24-rc1-hack.orig/arch/x86/kernel/pci-gart_64.c +++ linux-2.6.24-rc1-hack/arch/x86/kernel/pci-gart_64.c @@ -385,13 +385,19 @@ static int gart_map_sg(struct device *de unsigned long pages = 0; int need = 0, nextneed; struct scatterlist *s, *ps, *start_sg, *sgmap; - + if (nents == 0) return 0; if (!dev) dev = fallback_dev; + if (*dev-dma_mask = 0x) { + for_each_sg(sg, s, nents, i) { + printk(%d: map %lx len %u dir %d\n, i, sg_phys(s), s-length, dir); + } + } + out = 0; start = 0; start_sg = sgmap = sg; Tejun can probably figure out from that output where it comes from in libata :) -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc SB600 AHCI no go on =4GB of RAM
On Wednesday 14 November 2007 12:55, Srihari Vijayaraghavan wrote: [Sorry to reply to my own email thread] Srihari Vijayaraghavan [EMAIL PROTECTED] wrote: ... No problems. Here's the log of unworking kernel with IOMMU turned on. Basically it goes on reseting the SATA ports throwing many errors (none are present in 2.6.23 or on 2.6.24-rc with mem=3500M) for many minutes at which point I do a power reset :-(. Also the log of the working kernel with IOMMU but with mem=3500M is also attached for the record. It's basically the same above kernel just with the added parameter. Gentlemen, This changeset has introduced a regression in 2.6.24-rc, such that my machine boots no more: Hmm, you got an AHCI controller that does not do 64bit DMA masks? Or do you have CONFIG_IOMMU_DEBUG enabled? Anyways, not being able to deal with merged SG lists must be some driver or hardware bug. I would stick some printks into gart_map_sg() and try to find out where the failing DMA is initiatiated and then split it into multiple IO submissions at the caller level. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.24-rc SB600 AHCI no go on =4GB of RAM
The AHCI code falls back to 32bit DMA in that case. Which in turn causes the problem seen by Srihari. There is not much printk sticking necessary, the code is simply not handling this. What code is not handling what? IOMMU merging should be always safe. If it is not the driver should not submit things in a single SG list. So the main option right now seems to revert the iommu_merge patch. I don't think that is the correct fix. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add a global ide=off switch for drivers/ide
On Thursday 25 October 2007 23:07:23 Bartlomiej Zolnierkiewicz wrote: On Monday 15 October 2007, Andi Kleen wrote: Had a situation where drivers/ide was compiled in, but I wanted to turn it off to let the drivers/ata drivers take over. I ended up using ide*=noprobe, but that was somewhat clumpsy because I wasn't sure how many IDE interfaces the machine really had. Add a global ide=off switch to handle this situation better. Overall looks OK but I think we should limit it to IDE built-in case (when IDE is modular it is all up to the user-space anyway). Disagree. It's useful for the modular case too e.g. if you have the ide modules in your initrd and you want to not load them for some reason (e.g. debugging) Besides adding so many ifdefs would be ugly in my opinion. The patch is a little bigger because I tried to cover all modules. A few still needs to be covered: - drivers/scsi/ide-scsi.c (other directory) - drivers/ide/legacy/ide_platform.c (new driver) - drivers/ide/legacy/ide-cs.c (late_initcall) - drivers/ide/pci/sgiioc4.c (ditto, not a SFF-PCI driver) Thanks I'll fix those. I'm also not 100% sure ENODEV is the right error return for this case, but I didn't come up with a better one. -EPERM? IMO it would be more appropriate (and easy to distinguish from the real -ENODEV). Ok. +int ide_off; +EXPORT_SYMBOL(ide_off); + _GPL? Please cover it with #ifdef/#endif CONFIG_BLK_DEV_IDE as it should be valid only when IDE is built-in. This way we don't pollute device/host drivers with CONFIG_BLK_DEV_IDE #ifdefs. What CONFIG_BLK_DEV_IDE ifdefs? I added the check only to code that is already conditional to this I believe and there were no additional ifdefs at all. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
ata_exec_internal crash at boot in -git22
One of the systems tested in autoboot crashes at boot with with -git22. This is a AMD 2 socket Opteron NUMA system. The tester was a little flakey and happened to hit the x86-merge-broke- compilation window, so the last good data point I have is 2.6.23-rc9. -Andi megasas: 00.00.03.05 Mon Oct 02 11:21:32 PDT 2006 ACPI: PCI Interrupt :03:05.0[A] - GSI 17 (level, low) - IRQ 17 ata1: SATA max UDMA/100 cmd 0xC2006C80 ctl 0xC2006C8A bmdma 0xC2006C00 irq 17 ata2: SATA max UDMA/100 cmd 0xC2006CC0 ctl 0xC2006CCA bmdma 0xC2006C08 irq 17 ata3: SATA max UDMA/100 cmd 0xC2006E80 ctl 0xC2006E8A bmdma 0xC2006E00 irq 17 ata4: SATA max UDMA/100 cmd 0xC2006EC0 ctl 0xC2006ECA bmdma 0xC2006E08 irq 17 scsi0 : sata_sil ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata1.00: ATA-7, max UDMA/133, 234441648 sectors: LBA48 NCQ (depth 0/32) ata1.00: ata1: dev 0 multi count 16 Unable to handle kernel paging request at 2277 RIP: [80259b03] pfn_to_page+0x1f/0x32 PGD 0 Oops: [1] SMP CPU 2 Modules linked in: Pid: 915, comm: scsi_eh_0 Not tainted 2.6.19-git22 #24 RIP: 0010:[80259b03] [80259b03] pfn_to_page+0x1f/0x32 RSP: :810004f8 EFLAGS: 00010216 RAX: 0040 RBX: RCX: 0020 RDX: 000f RSI: 810004fbbc40 RDI: 0007f000 RBP: R08: R09: R10: 0001 R11: 80352a9b R12: 0003 R13: R14: 810004fbbc40 R15: 8100f7c2c708 FS: () GS:8101000eabc0() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 2277 CR3: 00201000 CR4: 06e0 Process scsi_eh_0 (pid: 915, threadinfo 810004fba000, task 8100f7a54870) Stack: 8043f1d1 8100f7c2c508 8100f7c2c708 8100f7c2c708 8100f7c2c508 Call Trace: [8043f1d1] ata_exec_internal+0x5a/0x96 [8043fab3] ata_set_mode+0x415/0x564 [80445b2f] ata_do_eh+0x11af/0x1493 [804464a2] ata_scsi_error+0x26f/0x509 [80403a67] scsi_error_handler+0x73/0x67b [80242f04] kthread+0xd1/0x101 [8020a318] child_rip+0xa/0x12 Code: 48 2b ba 68 22 00 00 48 6b c7 38 48 03 82 58 22 00 00 c3 48 RIP [80259b03] pfn_to_page+0x1f/0x32 RSP 810004f8 CR2: 2277 - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ata_exec_internal crash at boot in -git22
On Monday 22 October 2007 20:26:45 Jens Axboe wrote: On Mon, Oct 22 2007, Andi Kleen wrote: One of the systems tested in autoboot crashes at boot with with -git22. This is a AMD 2 socket Opteron NUMA system. The tester was a little flakey and happened to hit the x86-merge-broke- compilation window, so the last good data point I have is 2.6.23-rc9. Andi, can you test with this patch applied? http://brick.kernel.dk/sg-git.patch Sorry was a mistake (cue: there is no 2.6.23-git22 yet). It turned out to be an old broken kernel. Current -git seems to boot. Sorry for the noise. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add a global ide=off switch for drivers/ide
Had a situation where drivers/ide was compiled in, but I wanted to turn it off to let the drivers/ata drivers take over. I ended up using ide*=noprobe, but that was somewhat clumpsy because I wasn't sure how many IDE interfaces the machine really had. Add a global ide=off switch to handle this situation better. The patch is a little bigger because I tried to cover all modules. I'm also not 100% sure ENODEV is the right error return for this case, but I didn't come up with a better one. The ARM/MIPS part is uncompiled. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux-2.6.23-rc8-misc/drivers/ide/ide.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/ide.c +++ linux-2.6.23-rc8-misc/drivers/ide/ide.c @@ -91,6 +91,9 @@ static const u8 ide_hwif_to_major[] = { static int idebus_parameter; /* holds the idebus= parameter */ static int system_bus_speed; /* holds what we think is VESA/PCI bus speed */ +int ide_off; +EXPORT_SYMBOL(ide_off); + DEFINE_MUTEX(ide_cfg_mtx); __cacheline_aligned_in_smp DEFINE_SPINLOCK(ide_lock); @@ -1249,6 +1252,13 @@ static int __init ide_setup(char *s) return 0; printk(KERN_INFO ide_setup: %s, s); + + if (!strcmp(s, ide=off)) { + printk( : IDE disabled\n); + ide_off = 1; + return 1; + } + init_ide_data (); #ifdef CONFIG_BLK_DEV_IDEDOUBLER @@ -1717,6 +1727,9 @@ static int __init ide_init(void) { int ret; + if (ide_off) + return -ENODEV; + printk(KERN_INFO Uniform Multi-Platform E-IDE driver REVISION \n); system_bus_speed = ide_system_bus_speed(); Index: linux-2.6.23-rc8-misc/drivers/ide/setup-pci.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/setup-pci.c +++ linux-2.6.23-rc8-misc/drivers/ide/setup-pci.c @@ -793,6 +793,8 @@ static LIST_HEAD(ide_pci_drivers); int __ide_pci_register_driver(struct pci_driver *driver, struct module *module, const char *mod_name) { + if (ide_off) + return -ENODEV; if(!pre_init) return __pci_register_driver(driver, module, mod_name); driver-driver.owner = module; Index: linux-2.6.23-rc8-misc/include/linux/ide.h === --- linux-2.6.23-rc8-misc.orig/include/linux/ide.h +++ linux-2.6.23-rc8-misc/include/linux/ide.h @@ -871,6 +871,8 @@ typedef struct ide_driver_s ide_driver_t extern struct mutex ide_setting_mtx; +extern int ide_off; + int set_io_32bit(ide_drive_t *, int); int set_pio_mode(ide_drive_t *, int); int set_using_dma(ide_drive_t *, int); Index: linux-2.6.23-rc8-misc/drivers/ide/arm/bast-ide.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/arm/bast-ide.c +++ linux-2.6.23-rc8-misc/drivers/ide/arm/bast-ide.c @@ -52,6 +52,9 @@ bastide_register(unsigned int base, unsi static int __init bastide_init(void) { + if (ide_off) + return -ENODEV; + /* we can treat the VR1000 and the BAST the same */ if (!(machine_is_bast() || machine_is_vr1000())) Index: linux-2.6.23-rc8-misc/drivers/ide/arm/icside.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/arm/icside.c +++ linux-2.6.23-rc8-misc/drivers/ide/arm/icside.c @@ -824,6 +824,8 @@ static struct ecard_driver icside_driver static int __init icside_init(void) { + if (ide_off) + return -ENODEV; return ecard_register_driver(icside_driver); } Index: linux-2.6.23-rc8-misc/drivers/ide/arm/rapide.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/arm/rapide.c +++ linux-2.6.23-rc8-misc/drivers/ide/arm/rapide.c @@ -113,6 +113,8 @@ static struct ecard_driver rapide_driver static int __init rapide_init(void) { + if (ide_off) + return -ENODEV; return ecard_register_driver(rapide_driver); } Index: linux-2.6.23-rc8-misc/drivers/ide/ide-cd.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/ide-cd.c +++ linux-2.6.23-rc8-misc/drivers/ide/ide-cd.c @@ -3568,6 +3568,8 @@ static void __exit ide_cdrom_exit(void) static int __init ide_cdrom_init(void) { + if (ide_off) + return -ENODEV; return driver_register(ide_cdrom_driver.gen_driver); } Index: linux-2.6.23-rc8-misc/drivers/ide/ide-disk.c === --- linux-2.6.23-rc8-misc.orig/drivers/ide/ide-disk.c +++ linux-2.6.23-rc8-misc/drivers/ide/ide-disk.c @@ -1329,6 +1329,8 @@ static void __exit idedisk_exit (void) static int __init idedisk_init(void) { + if (ide_off) + return
Re: [PATCH] drivers/firmware: const-ify DMI API and internals
And if we're really lucky, this might enable some additional optimizations on the part of the compiler. Only if the kernel was compiled C++. C compilers generally ignore constness for optimization purposes because it can be so easily casted away. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs) II
Alan Cox [EMAIL PROTECTED] writes: BTW unless I'm misreading the i386 code it'll not fail here, but allocate memory. Surely that will cause failures later if you rely on it failing? If you don't rely on it then changing x86-64 will also not help you. Eww that'll do strange things. In theory I could change i386 too to return NULL in this case (dma_mask == NULL). But I wonder how how many existing PCMCIA drivers I would break this way? Probably needs more testing at least. My current patch is in ftp://ftp.firstfloor.org/pub/ak/x86_64/late-merge/patches/dma-alloc-mask Ok so we do in fact need some kind of proper way to ask if a device is DMA capable ? Right now it seems to be (dev-dma_mask != NULL) -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs) II
Surely we don't need to wait until then? This is the correct fix, isn't it? (Obviously I'll split it into a generic and a pcmcia specific piece if it looks OK to everyone). It sets the PCMCIA dma_mask up correctly and introduces a DMA_MASK_NONE (I prefer that to DMA_0BIT_MASK but I can add that too if people want) and gives Alan his is_device_dma_capable() API. Patch looks good to me. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)
On Thu, Aug 09, 2007 at 02:53:36PM +0100, Alan Cox wrote: http://bugzilla.kernel.org/show_bug.cgi?id=8424 - patch review This one is on Alan. I think not - something horrible is happening in dma_alloc_coherent when called from dmam_* with a non PCI device Seems to be some kind of AMD64 specific DMA mapping bug ? I think it's dev-dma_mask == NULL. Clearly you're passing a non DMA able device to dma_alloc_coherent(). Which seems like a caller bug. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)
On Thu, Aug 09, 2007 at 06:21:01PM +0100, Alan Cox wrote: Seems to be some kind of AMD64 specific DMA mapping bug ? I think it's dev-dma_mask == NULL. Clearly you're passing a non DMA able device to dma_alloc_coherent(). Which seems like a caller bug. Ok - other archs seem to just return NULL in this case. If its your view Ok then the initialization would fail. I could do that, but clearly there must be some other bug in the process that tries to do DMA on such random devices in the first place. that dma_alloc_coherent(dev) should explode not return NULL then we can wrap dma_alloc_coherent but I'm not sure the caller library has any business knowing whether a specific class of device is DMA capable on a specific platform ? Someone must ask that caller library to DMA to/from that device in the first place. Whoever it is it is wrong. Or perhaps you got the wrong device here? For ISA devices we traditionally used NULL. Or if you set up your own ISA devices (which I can't see a reason for but there might be one I'm missing) at least give them a dma mask. Then it should probably work on x86-64. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)
On Thu, Aug 09, 2007 at 06:53:10PM +0100, Alan Cox wrote: Or perhaps you got the wrong device here? For ISA devices we traditionally used NULL. Or if you set up your own ISA devices (which I can't see a reason for but there might be one I'm missing) at least give them a dma mask. Then it should probably work on x86-64. The libata code currently (and this seems to work for all but x86-64) does the following if it is setting up a potentially DMA capable device Where does the device come from? What device is it? Is that known already? Thus I think dma_alloc_coherent() for an ISA, PCMCIA or other class (platform particularly) shouldn't explode on AMD64 but simply return NULL. Its a sane request to make when you don't in your library know what dev is. But at least on x86-64 the device is likely DMA capable. At least PCMCIA usually is. Near all devices are DMA capable, except perhaps SMBus and some real oddballs. So the device should not have a NULL dma_mask in the first place. Or it is the wrong device. e.g. we have platform devices for things like timers etc. which are really not DMA capable because it doesn't make much sense to DMA your timer; but there should be really no reason to use such a device in a libata driver. If it's some kind of PCMCIA device like implied by pata_pcmcia then it's likely DMA capable. So if I return NULL then either a really DMA capable device will not work. Or you got a real oddball device that you shouldn't have gotten in the first place. Perhaps the PCMCIA code sets up some of its devices wrong? -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)
On Thu, Aug 09, 2007 at 11:34:58PM +0100, Alan Cox wrote: Where does the device come from? What device is it? At the higher level someone passed us a device and some mappings and function methods and said this is an IDE controller Ok you're stabbing in the dark. I guess more debugging is needed. But at least on x86-64 the device is likely DMA capable. At least PCMCIA usually is. Near all devices are DMA capable, except perhaps PCMCIA usally isn't. Hmm? I don't want to know about arch internals. I don't want to know about platform specific DMA rules. I just want to be able to ask for DMA and get told yes or no. Oops is not a useful error return. I'm not blaming libata here, just whoever gave that bogus device to it. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs)
I'll submit a patch to check this in my next batch. But as James pointed out you'll likely need similar patches on other architectures. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AMD64 dma_alloc_coherent crashes on non PCI device (was SATA open bugs) II
BTW unless I'm misreading the i386 code it'll not fail here, but allocate memory. Surely that will cause failures later if you rely on it failing? If you don't rely on it then changing x86-64 will also not help you. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sata_sil, writing bug with multiple cards?
On Wednesday 04 July 2007 10:17:34 [EMAIL PROTECTED] wrote: Most likely it is some sort of hardware bug that we might not be able to do much about. Have you tried contacting SIL or VIA? No, I haven't. Like I mentioned above, the OpenBSD drivers seemed to work, or at least did with similar tests. I would need to run the more extensive checks to be positive, but those take a lot of time, obviously. And downtime for the box, a lot of which isn't really manageable, at the moment. Perhaps the OpenBSD drivers program the SIL chip in a different way that avoids this problem. e.g. if you have some other system with a different chipset it might be useful to test the SIL controllers in those. The previous motherboard was an AMD 760 chipset, and it had the same problem. Ok this means it's likely a SIL issue, not a chipset issue. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Libata PATA status
Alan Cox [EMAIL PROTECTED] writes: Post SRST What is SRST? My personal wish list feature would be a little forwarder driver to forward /dev/hd* to /dev/sd* for this; then old IDE could be disabled without risking breaking old root file systems. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Libata PATA status
You could probably reliably map hda/b/c/d initially with some kind of forwarder providing nobody hot plugged them. Just not sure I see the PATA hotplug? SATA systems typically already use /dev/sda*, it just applies to PATA. point of doing it kernel side. The point would be that old user land just works without any changes. I know that is a virtue that has gone out of fashion recently with udev and sysfs, but older distributions that weren't that sysfs-layout-of-the-week dependent used to be fairly kernel version independent. I always found that very useful. The hda-sda move would be the only radical change that prevents booting for a long time (the last one before that was the 2.5 modutils transition) -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Silent corruption on AMD64
Aaron Lehmann [EMAIL PROTECTED] writes: [adding netdev] [meta-comment: I wish people wouldn't use such unnecessarily broad subjects -- how is it the x86-64 port's or AMD's fault when you have broken hardware? Would anybody write Silent corruption on i386 or Silent corruption on Intel or Silent corruption on Linux?] On Sat, Mar 31, 2007 at 08:03:16PM -0700, Jim Paris wrote: Since it shows up under heavy load that includes unrelated devices, I think ruling out hardware problems is important. Some suggestions: I've been able to narrow it down to the Realtek Ethernet card. I can't reproduce the problem using onboard Ethernet, whereas the Realtek card causes trouble in any slot. However, I still don't know whether it's a hardware or software issue, or whether it's caused directly or indirectly by the Realtek card. You could disable the hardware checksumming support in the card with the appended patch. Then hopefully Linux will catch most corruptions (but perhaps not all because TCP checksums are not very strong) You can watch failed checksums then with netstat -s -Andi Index: linux-2.6.21-rc3-net/drivers/net/r8169.c === --- linux-2.6.21-rc3-net.orig/drivers/net/r8169.c +++ linux-2.6.21-rc3-net/drivers/net/r8169.c @@ -2477,6 +2477,7 @@ static inline int rtl8169_fragmented_fra static inline void rtl8169_rx_csum(struct sk_buff *skb, struct RxDesc *desc) { +#if 0 u32 opts1 = le32_to_cpu(desc-opts1); u32 status = opts1 RxProtoMask; @@ -2485,6 +2486,7 @@ static inline void rtl8169_rx_csum(struc ((status == RxProtoIP) !(opts1 IPFail))) skb-ip_summed = CHECKSUM_UNNECESSARY; else +#endif skb-ip_summed = CHECKSUM_NONE; } - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Add suspend/resume for HPET
Ingo Molnar [EMAIL PROTECTED] writes: there's no fundamental reason. x86_64 COW-ed hpet_timer.c and time_hpet.c years ago and drifted off into different areas. Not quite -- x86-64 did HPET long before i386; the only stuff cowed was the character driver support code. But the core HPET code was always totally different code streams. We never did the complicated pluggable clock code i386 did though -- i never quite saw the point of that because there aren't that many timers there. Of course it is already obsolete with clocksources now. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Do you mean between disabling IRQ mechanisms and enabling PCI device in pcim_prepare_device()? Yes. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] PCI prepare/activate instead of enable to avoid IRQ storm and rogue DMA access
Tejun Heo [EMAIL PROTECTED] writes: Let's assume there's a device which shares its INTX IRQ line with another device and the other one is already initialized. During boot, due to BIOS's fault, bad hardware design or sheer bad luck, the device has got a pending IRQ. This seems to be also common after kexec during kexec crashdumps where the device just continues doing what it did before the crash. This patch expands the pci_set_master() approach. Instead of enabling the device in one go, it's done in two steps - prepare and activate. 'prepare' enables access to PCI configuration, I hope there aren't any new erratas triggered by this. Perhaps it would make sense to add some paranoia sleeps at least before touching other state? -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes
On Thu, Jul 07, 2005 at 09:21:55AM -0700, Christoph Lameter wrote: On Wed, 6 Jul 2005, Andi Kleen wrote: Without this patch a dual Xeon EM64T machine would oops on boot because the hwif pointer here was NULL. I also added a check for pci_dev because it's doubtful that all IDE devices have pci_devs. Here is IMHO the right way to fix this. Test for the hwif != NULL and test for pci_dev != NULL before determining the node number of the pci bus that the device is connected to. Maybe we need a hwif_to_node for ide drivers that is also able to determine the locality of other hardware? Hmm? Where is the difference? This is 100% equivalent to my code except that you compressed it all into a single expression. The former one was more readable. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes
On Thu, Jul 07, 2005 at 09:32:51AM -0700, Christoph Lameter wrote: On Thu, 7 Jul 2005, Andi Kleen wrote: On Thu, Jul 07, 2005 at 09:21:55AM -0700, Christoph Lameter wrote: On Wed, 6 Jul 2005, Andi Kleen wrote: Without this patch a dual Xeon EM64T machine would oops on boot because the hwif pointer here was NULL. I also added a check for pci_dev because it's doubtful that all IDE devices have pci_devs. Here is IMHO the right way to fix this. Test for the hwif != NULL and test for pci_dev != NULL before determining the node number of the pci bus that the device is connected to. Maybe we need a hwif_to_node for ide drivers that is also able to determine the locality of other hardware? Hmm? Where is the difference? node = -1 if the node cannot be determined. But that will crash right now. Trading one crash for another doesn't seem to be particularly useful. This is 100% equivalent to my code except that you compressed it all into a single expression. My patch consistently checks for hwif != NULL and pci_dev != NULL. There was someother stuff in your patch. This patch does not add any additional variables and is more readable. No it was exactly the same except for a working node number. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [another PATCH] Fix crash on boot in kmalloc_node IDE changes
On Thu, Jul 07, 2005 at 12:09:00PM -0700, Christoph Lameter wrote: On Thu, 7 Jul 2005, Linus Torvalds wrote: Yes. Except that if hwif is NULL, we'll have other oopses since we access that in other places. Why _is_ hwif NULL anyway? That's another, unrelated thing, and should probably have a separate check and an early return. I was wondering about that one as well. Andi brought it up. I don't know why hwif was NULL, but my kernel definitely crashed. hwif was NULL in the first function (I first misread the oops and thought it was pci_dev NULL, but it wasn't). For the second I didn't verify it was hwif or pci_dev NULL, but one of them was too. The setup was a Intel board with 1 PATA/4 SATA onboard and only a CD-ROM and a external Promise PATA controller with two PATA disks. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix crash on boot in kmalloc_node IDE changes
drive-hwif check is redundant, please remove it It's not. My first version didn't have it but it still crashed. It's what actually prevents the crash. I also don't know why, but it's true. The machine had four IDE controllers BTW (on board an an external Promise card) -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix crash on boot in kmalloc_node IDE changes
On Wed, Jul 06, 2005 at 09:34:28AM -0700, Christoph Lameter wrote: On Wed, 6 Jul 2005, Andi Kleen wrote: - q = blk_init_queue_node(do_ide_request, ide_lock, - pcibus_to_node(drive-hwif-pci_dev-bus)); + int node = 0; /* Should be -1 */ Why is this not -1? Because there is no code in rc3 that handles -1 in kmalloc_node. If you add a patch that handles it then feel free to change. But fixing the bootup has the highest priority. + int node = 0; + if (hwif-drives[0].hwif) { Also needs to be -1. Then it would crash again. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix crash on boot in kmalloc_node IDE changes
On Wed, 6 Jul 2005 16:35:11 +0200 Bartlomiej Zolnierkiewicz [EMAIL PROTECTED] wrote: On 7/6/05, Andi Kleen [EMAIL PROTECTED] wrote: drive-hwif check is redundant, please remove it It's not. My first version didn't have it but it still crashed. It's what actually prevents the crash. I also don't know why, but it's true. very weird as HWIF(drive) == drive-hwif: ide_hwif_t *hwif = HWIF(drive); ... q = blk_init_queue_node(do_ide_request, ide_lock, pcibus_to_node(drive-hwif-pci_dev-bus)); ^^ I oops on that one. Maybe it takes the early return. if (!q) return 1; ... if (!hwif-rqsize) { you should OOPS here I would appreciate if some form of this patch is merged asap because it breaks booting on my test machines. -Andi - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html