Re: Question regarding mutex locking
Larry Finger wrote:
> If a particular routine needs to lock a mutex, but it may be entered with that mutex already locked, would the following code be SMP safe?
>
> hold_lock = mutex_trylock()
> ..
> if (hold_lock)
>         mutex_unlock()

Not if another task could be acquiring that lock at the same time, which is probably the case, otherwise you wouldn't need the mutex. In other words, if you're going to do this, you might as well toss the mutex entirely as it's about the same effect.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
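A failed trylock only says that somebody holds the mutex, not that the caller does, which is why the pattern above can end up running unprotected. One common way around it is to split the routine into a variant that takes the lock and a variant that expects the caller to hold it already. What follows is only a userspace sketch using pthreads rather than kernel mutexes; `counter_lock`, `counter_add()` and `counter_add_locked()` are hypothetical names for illustration, not anything from the code being discussed.

```c
#include <pthread.h>

/* Hypothetical shared state; counter_lock protects counter. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Variant for callers that already hold counter_lock. */
static int counter_add_locked(int n)
{
	counter += n;
	return counter;
}

/* Variant for callers that do not hold counter_lock. */
static int counter_add(int n)
{
	int result;

	pthread_mutex_lock(&counter_lock);
	result = counter_add_locked(n);
	pthread_mutex_unlock(&counter_lock);
	return result;
}
```

Each call site then picks the variant that matches what it knows about the lock state, so nothing has to guess at runtime who the owner is.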
Re: Dynticks Causing High Context Switch Rate in ksoftirqd
[EMAIL PROTECTED] wrote:
> Hello Robert,
>
> I've attached additional detail on the config of the misbehaving system including output from oprofile and PowerTop. PowerTop output leads me to believe that maybe this is an interaction between my bridged ethernet setup and dynticks?

Hmmm... Don't know about that, your top wakeups are from br_stp_enable_bridge, but that is only 26 a second - that doesn't explain a context switch rate of 150,000 a second.

-- 
Robert Hancock      Saskatoon, SK, Canada
Re: Dynticks Causing High Context Switch Rate in ksoftirqd
[EMAIL PROTECTED] wrote:
> Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle system? The problem occurs if I config the kernel with tickless support (i.e. CONFIG_TICK_ONESHOT=y). (Thanks to "oprofile" for putting me onto this.)
>
> I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and 2.6.23.9
>
> *** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
> *** This is on an idle machine! ***
>
> procs ---memory-- ---swap-- -io --system-- cpu
>  r  b swpd    free buff  cache si so  bi bo   in     cs us sy id wa
>  0  0    0 1925556 4768 116104  0  0 124 26   7538      1  2 96  1
>  0  0    0 1925556 4768 116104  0  0   0  0    2 147329  0  1 99  0

What did oprofile show? It should be able to narrow down what function(s) are responsible for the CPU usage.

-- 
Robert Hancock      Saskatoon, SK, Canada
[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v3)
We need to run any DMA command with result taskfile requested in ADMA mode when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since data corruption could potentially result if this happened.

Also, fail any attempt to try and issue NCQ commands with result taskfile requested, since the hardware doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2	2007-11-25 16:28:58.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c	2007-11-25 16:31:09.0 -0600
@@ -792,11 +792,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-	/* Since commands where a result TF is requested are not
-	   executed in ADMA mode, the only time this function will be called
-	   in ADMA mode will be if a command fails. In this case we
-	   don't care about going into register mode with ADMA commands
-	   pending, as the commands will all shortly be aborted anyway. */
+	/* Other than when internal or pass-through commands are executed,
+	   the only time this function will be called in ADMA mode will be
+	   if a command fails. In the failure case we don't care about going
+	   into register mode with ADMA commands pending, as the commands will
+	   all shortly be aborted anyway. We assume that NCQ commands are not
+	   issued via passthrough, which is the only way that switching into
+	   ADMA mode could abort outstanding commands. */
 	nv_adma_register_mode(ap);
 
 	ata_tf_read(ap, tf);
@@ -1379,11 +1381,9 @@
 	struct nv_adma_port_priv *pp = qc->ap->private_data;
 
 	/* ADMA engine can only be used for non-ATAPI DMA commands,
-	   or interrupt-driven no-data commands, where a result taskfile
-	   is not required. */
+	   or interrupt-driven no-data commands. */
 	if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-	   (qc->tf.flags & ATA_TFLAG_POLLING) ||
-	   (qc->flags & ATA_QCFLAG_RESULT_TF))
+	   (qc->tf.flags & ATA_TFLAG_POLLING))
 		return 1;
 
 	if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1401,6 +1401,8 @@
 		       NV_CPB_CTL_IEN;
 
 	if (nv_adma_use_reg_mode(qc)) {
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+			(qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		ata_qc_prep(qc);
 		return;
@@ -1445,9 +1447,21 @@
 
 	VPRINTK("ENTER\n");
 
+	/* We can't handle result taskfile with NCQ commands, since
+	   retrieving the taskfile switches us out of ADMA mode and would abort
+	   existing commands. */
+	if (unlikely(qc->tf.protocol == ATA_PROT_NCQ &&
+		     (qc->flags & ATA_QCFLAG_RESULT_TF))) {
+		ata_dev_printk(qc->dev, KERN_ERR,
+			"NCQ w/ RESULT_TF not allowed\n");
+		return AC_ERR_SYSTEM;
+	}
+
 	if (nv_adma_use_reg_mode(qc)) {
 		/* use ATA register mode */
 		VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+			(qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		return ata_qc_issue_prot(qc);
 	} else
Re: 2.6.24: Serial disabled in BIOS but serial modules still loaded (probably PnP related)
Andrey Borzenkov wrote:
> I have no COM port on notebook (without port replicator which I do not have) so COM is disabled in BIOS. No ttyS* is detected during boot (and no device created) but I just noticed that serial modules are still loaded. Well, this partially defeats the purpose of disabling COM port - the intention was to free resources by *not* loading unneeded modules ...
>
> This may have something to do with (ACPI) PnP which apparently believes COM is alive. Notebook is Toshiba Portege 4000.

Probably a BIOS bug. It still lists the port in PnP data even though the hardware is disabled, so the kernel still tries to load the serial driver for it, which finds there's no port there.

-- 
Robert Hancock      Saskatoon, SK, Canada
Re: forcedeth ethernet driver & Low power state
Jeroen wrote:
> Hi,
>
> I'm migrating my server from Windows 2003 Server to Ubuntu, but I am stumbling over the "Low Power State Link Speed" option for my NIC (forcedeth). I need to disable this option in my Windows driver, otherwise the throughput is horrible because the link fluctuates constantly between 100 and 1000.
>
> Anyway, my question is where and how can I turn off this feature for the forcedeth driver? I've looked in the source and as far as I can tell there is no boot option for this. There are some references noted in the code, but AFAIK no setting.
>
> Any ideas? Thanks in advance!

Are you sure forcedeth even supports that feature? I haven't seen any code for it, and certainly it should never be enabled by default.

-- 
Robert Hancock      Saskatoon, SK, Canada
Re: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform
peerchen wrote:
> According to the HyperTransport spec, 'En' indicates if the MSI Mapping is active. So it should be set when enabling MSI. The patch is based on kernel 2.6.24-rc3.
>
> Signed-off-by: Andy Currid <[EMAIL PROTECTED]>
> Signed-off-by: Peer Chen <[EMAIL PROTECTED]>

Isn't there a way we can make this work for any upstream HT bridge, rather than only for specific NVIDIA chipsets?

-- 
Robert Hancock      Saskatoon, SK, Canada
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Jeff Garzik wrote:
> Robert Hancock wrote:
>> Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask unconditionally, but for non-ATA_PROT_DMA commands (which includes all ATAPI), it just falls back to ata_qc_issue_prot which issues via the legacy SFF interface and can only handle 32-bit addressing. So yes, it appears to have a similar bug as sata_nv had.
>
> sata_mv doesn't do ATAPI at all...

Right.. missed that ATA_FLAG_NO_ATAPI. So these issues Tom is reporting are just with a normal SATA hard drive?

-- 
Robert Hancock      Saskatoon, SK, Canada
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
Mark Lord wrote:
> Morrison, Tom wrote:
>> I am hopeful that the sata_mv has this bug (I proved that the problem I was experiencing was due to the sata_mv driver with 3.75Gig or more of memory)...
>>
>> I am on vacation for a week or more ...or I'd tell you today if it did have this bug!
> ..
>
> Yeah, I kind of had your reports in mind when I asked that. :)
>
> On a related note, I now have lots of Marvell (sata_mv) hardware here, and an Intel CPU/chipset box with physical RAM above the 4GB boundary.

Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask unconditionally, but for non-ATA_PROT_DMA commands (which includes all ATAPI), it just falls back to ata_qc_issue_prot which issues via the legacy SFF interface and can only handle 32-bit addressing. So yes, it appears to have a similar bug as sata_nv had.

Likely it needs a similar slave_config trick to change bounce limit depending on the connected device, unless there is really a way to issue ATAPI commands with this EDMA interface, as the TODO list in sata_mv.c suggests may be possible.

-- 
Robert Hancock      Saskatoon, SK, Canada
[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)
We need to run any DMA command with result taskfile requested in ADMA mode when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since data corruption could potentially result if this happened.

Also WARN_ON() if we try and send result taskfile commands while NCQ commands are still active, since the hardware doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1/drivers/ata/sata_nv.c	2007-11-20 17:40:09.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c	2007-11-22 19:40:58.0 -0600
@@ -791,11 +791,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-	/* Since commands where a result TF is requested are not
-	   executed in ADMA mode, the only time this function will be called
-	   in ADMA mode will be if a command fails. In this case we
-	   don't care about going into register mode with ADMA commands
-	   pending, as the commands will all shortly be aborted anyway. */
+	/* Other than when internal or pass-through commands are executed,
+	   the only time this function will be called in ADMA mode will be
+	   if a command fails. In the failure case we don't care about going
+	   into register mode with ADMA commands pending, as the commands will
+	   all shortly be aborted anyway. We assume that NCQ commands are not
+	   issued via passthrough, which is the only way that switching into
+	   ADMA mode could abort outstanding commands. */
 	nv_adma_register_mode(ap);
 
 	ata_tf_read(ap, tf);
@@ -1359,11 +1361,9 @@
 	struct nv_adma_port_priv *pp = qc->ap->private_data;
 
 	/* ADMA engine can only be used for non-ATAPI DMA commands,
-	   or interrupt-driven no-data commands, where a result taskfile
-	   is not required. */
+	   or interrupt-driven no-data commands. */
 	if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-	   (qc->tf.flags & ATA_TFLAG_POLLING) ||
-	   (qc->flags & ATA_QCFLAG_RESULT_TF))
+	   (qc->tf.flags & ATA_TFLAG_POLLING))
 		return 1;
 
 	if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1381,6 +1381,8 @@
 		       NV_CPB_CTL_IEN;
 
 	if (nv_adma_use_reg_mode(qc)) {
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+			(qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		ata_qc_prep(qc);
 		return;
@@ -1425,9 +1427,17 @@
 
 	VPRINTK("ENTER\n");
 
+	/* We can't handle result taskfile with NCQ commands active, since
+	   retrieving the taskfile switches us out of ADMA mode and would abort
+	   existing commands. */
+	WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
+		(qc->ap->qc_allocated & ~(1 << qc->tag)));
+
 	if (nv_adma_use_reg_mode(qc)) {
 		/* use ATA register mode */
 		VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+			(qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		return ata_qc_issue_prot(qc);
 	} else
[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2	2007-11-22 19:42:28.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c	2007-11-22 19:48:25.0 -0600
@@ -247,6 +247,7 @@
 	void __iomem		*ctl_block;
 	void __iomem		*gen_block;
 	void __iomem		*notifier_clear_block;
+	u64			adma_dma_mask;
 	u8			flags;
 	int			last_issue_ncq;
 };
@@ -748,7 +749,7 @@
 		adma_enable = 0;
 		nv_adma_register_mode(ap);
 	} else {
-		bounce_limit = *ap->dev->dma_mask;
+		bounce_limit = pp->adma_dma_mask;
 		segment_boundary = NV_ADMA_DMA_BOUNDARY;
 		sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
 		adma_enable = 1;
@@ -1134,10 +1135,20 @@
 	void *mem;
 	dma_addr_t mem_dma;
 	void __iomem *mmio;
+	struct pci_dev *pdev = to_pci_dev(dev);
 	u16 tmp;
 
 	VPRINTK("ENTER\n");
 
+	/* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+	   pad buffers */
+	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+	if (rc)
+		return rc;
+	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+	if (rc)
+		return rc;
+
 	rc = ata_port_start(ap);
 	if (rc)
 		return rc;
@@ -1153,6 +1164,15 @@
 	pp->notifier_clear_block = pp->gen_block +
 	       NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+	/* Now that the legacy PRD and padding buffer are allocated we can
+	   safely raise the DMA mask to allocate the CPB/APRD table.
+	   These are allowed to fail since we store the value that ends up
+	   being used to set as the bounce limit in slave_config later if
+	   needed. */
+	pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	pp->adma_dma_mask = *dev->dma_mask;
+
 	mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
 				  &mem_dma, GFP_KERNEL);
 	if (!mem)
@@ -2414,12 +2434,6 @@
 	hpriv->type = type;
 	host->private_data = hpriv;
 
-	/* set 64bit dma masks, may fail */
-	if (type == ADMA) {
-		if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-			pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-	}
-
 	/* request and iomap NV_MMIO_BAR */
 	rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
 	if (rc)
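To make the 4GB constraint a bit more concrete, here is a rough userspace sketch of the check a DMA mask implies: a buffer is only usable by an engine if its whole address range fits under the mask. `SKETCH_DMA_BIT_MASK()` is a local stand-in for the kernel's `DMA_BIT_MASK()` and `dma_range_fits()` is a hypothetical helper; neither is taken from sata_nv or the DMA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's DMA_BIT_MASK(), so this builds in userspace. */
#define SKETCH_DMA_BIT_MASK(n) \
	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if every byte of [addr, addr + len) is
   addressable by a device limited to 'mask'. */
static bool dma_range_fits(uint64_t addr, uint64_t len, uint64_t mask)
{
	uint64_t last;

	if (len == 0)
		return addr <= mask;

	last = addr + len - 1;
	if (last < addr)	/* range wraps past the end of the address space */
		return false;

	return last <= mask;
}
```

With a 32-bit mask, a page ending at 0xFFFFFFFF fits while one starting at 0x100000000 does not, which is the situation the legacy PRD table and padding buffer must avoid.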
Re: [RFC] Documentation about unaligned memory access
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good grasp on unaligned memory access problems on other architectures and decided it was time to figure it out. As a result I've written this documentation which I plan to submit for inclusion as Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?
> ...
> You may be wondering why you have never seen these problems on your own architecture. Some architectures (such as i386 and x86_64) do not have this limitation, but nevertheless it is important for you to write portable code that works everywhere.

Also, x86 doesn't prohibit unaligned accesses, but I believe they have a significant performance cost and are best avoided where possible.

-- 
Robert Hancock      Saskatoon, SK, Canada
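For what it's worth, the usual portable way to avoid an unaligned load is to copy the bytes out rather than cast the pointer. The sketch below is plain userspace C and `read_le32_unaligned()` is a hypothetical name chosen for illustration; inside the kernel one would presumably reach for the existing `get_unaligned()` helpers instead.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from an address that may not be
   4-byte aligned. memcpy() makes no alignment assumptions, unlike
   dereferencing a cast (uint32_t *) pointer. */
static uint32_t read_le32_unaligned(const void *p)
{
	uint8_t b[4];

	memcpy(b, p, sizeof(b));
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

Assembling the value byte by byte also makes the result independent of host endianness, at the cost of a few extra instructions on machines that could have done the unaligned load directly.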
Re: Where is the interrupt going?
[EMAIL PROTECTED] wrote:
> I tried the hammer and the problem persists.
>
> [EMAIL PROTECTED]:~$ cat /proc/cmdline
> root=UUID=8b3c3666-22c3-4c04-b399-ece266f2ef30 ro noapic quiet splash
>
> However, I reserve the right to try the hammer again in the future.
>
> When I look at /proc/interrupts without the APIC:
>
> [EMAIL PROTECTED]:~$ cat /proc/interrupts
>            CPU0
>   0:        144    XT-PIC-XT        timer
>   1:         10    XT-PIC-XT        i8042
>   2:          0    XT-PIC-XT        cascade
>   5:         10    XT-PIC-XT        ohci_hcd:usb5, mxser
>   6:          5    XT-PIC-XT        floppy
>   7:          1    XT-PIC-XT        parport0
>   8:          3    XT-PIC-XT        rtc
>   9:          1    XT-PIC-XT        acpi, uhci_hcd:usb2
>  10:         10    XT-PIC-XT        ohci_hcd:usb4, ehci_hcd:usb6, [EMAIL PROTECTED]::01:00.0
>  11:       2231    XT-PIC-XT        uhci_hcd:usb1, ohci_hcd:usb3, eth0
>  12:        130    XT-PIC-XT        i8042
>  14:       4362    XT-PIC-XT        libata
>  15:      15315    XT-PIC-XT        libata
> NMI:          0
> LOC:     130125
> ERR:          0
> MIS:          0
>
> I do not even see the device that I registered unless it is that r128... line. However the code printed out in /var/log/messages:
>
> Nov 22 16:05:27 bbb kernel: [ 104.712473] apc8620: VID = 0x10B5
> Nov 22 16:05:27 bbb kernel: [ 104.712486] apc8620: mapped addr = e0bd4000
> Nov 22 16:05:27 bbb kernel: [ 104.713022] apc8620: registered carrier 0
> Nov 22 16:05:27 bbb kernel: [ 104.713028] apc8620: interrupt data (0xe1083e40) on irq (10) and status (0x10)
>
> which indicates it successfully registered without being shared. When I have more time, I will change the code to be a shared IRQ and try the noapic again.

You're not calling pci_enable_device anywhere. Unless you do this before requesting the IRQ, the IRQ routing may not be set up properly for your device and it may not even give you the right IRQ number. You should see a line like this somewhere in dmesg for the IRQ your card is on:

ACPI: PCI Interrupt :00:1f.2[D] -> GSI 19 (level, low) -> IRQ 17

I think this behavior changed in the somewhat recent past.

-- 
Robert Hancock      Saskatoon, SK, Canada
Re: Use of mutex in interrupt context flawed/impossible, need advice.
Leon Woestenberg wrote:
> Hello,
>
> I'm converting an out-of-tree (*1) driver from binary semaphore to mutex.
>
> Userspace updates a look-up-table using write(). The driver tries to write this LUT to the FPGA in the (video frame) interrupt handler. It is important that the LUT is consistent and thus changed atomically. Note that it is not important that the LUT is updated each interrupt.
>
> The current approach is to try-down()ing a binary semaphore in interrupt context, and write the LUT to the FPGA if the semaphore was down()ed, do nothing else. The write() down()s the semaphore as well before updating the in-driver-copy of the LUT, then up()s it again.
>
> I understand this design is not clean (*2), and not even possible with mutexes, as mutex_trylock() is not interrupt safe.
>
> My current approach would be to have userspace write into a shadow copy, and use a spinlock to update the live copy. The interrupt then would try a spinlock.

Unless this update into the FPGA takes a significant amount of time, I wouldn't bother with that complexity - just do spin_lock_irq/irqsave on that spinlock. Using a trylock for this rather sucks since the behavior is entirely non-deterministic. It could take a really long time in some cases for the trylock to ever succeed.

> My feeling is that we have a valid use of mutex_trylock() in interrupt context; "i.e. update LUT if we can do so consistently and in time, or not at all". I would like to know why this is not so, and if someone has a cleaner proposal than the "try spinlock" approach?

--
Robert Hancock Saskatoon, SK, Canada
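A minimal userspace sketch of the "one lock around both the update and the flush" idea follows. It is only an analogy: it uses a C11 atomic_flag as a stand-in spinlock, and all names (lut_live, fpga_lut, lut_update, lut_flush_to_fpga, LUT_SIZE) are hypothetical. In a driver the write() path would take the lock with spin_lock_irqsave() so that the interrupt handler cannot run on the same CPU while the lock is held; this sketch does not model interrupts at all.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define LUT_SIZE 256

static atomic_flag lut_lock = ATOMIC_FLAG_INIT;
static uint16_t lut_live[LUT_SIZE]; /* in-driver copy of the LUT */
static uint16_t fpga_lut[LUT_SIZE]; /* stand-in for the FPGA's LUT memory */

static void lut_lock_acquire(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&lut_lock, memory_order_acquire))
		;
}

static void lut_lock_release(void)
{
	atomic_flag_clear_explicit(&lut_lock, memory_order_release);
}

/* write() path: replace the whole table while holding the lock. */
static void lut_update(const uint16_t *src)
{
	lut_lock_acquire();
	memcpy(lut_live, src, sizeof(lut_live));
	lut_lock_release();
}

/* Frame-interrupt path: copy a consistent table to the "FPGA". */
static void lut_flush_to_fpga(void)
{
	lut_lock_acquire();
	memcpy(fpga_lut, lut_live, sizeof(fpga_lut));
	lut_lock_release();
}
```

Because both paths always take the lock, the flush never sees a half-written table and never silently skips an update, which is the determinism argument made in the reply.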
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)
Tejun Heo wrote:
> Hello, Robert.
>
> Robert Hancock wrote:
>> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the legacy buffers since setting the DMA mask affects both ports and we need to ensure the second port's buffers are allocated properly (fixes a problem with the previous version of this patch).
>>
>> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>>
>> +	/* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
>> +	   pad buffers */
>> +	pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
>> +	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
> [--snip--]
>> +	pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
>> +	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
>
> I'm probably being paranoid here but please add error checks. Just checking return value and returning error suffices.

In the 32-bit case, I'm pretty sure those are guaranteed not to fail because 32-bit is the default. For the 64-bit ones, we don't care if they fail, because then we'll just use whatever mask ends up being set (we store the actual set DMA mask in adma_dma_mask for use when we need to reconfigure the bounce limit). We definitely don't want to fail initialization if the 64-bit set doesn't succeed.
[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)
We need to run any DMA command with result taskfile requested in ADMA mode when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since data corruption could potentially result if this happened. Also WARN_ON() if we try and send result taskfile commands while NCQ commands are still active, since the hardware doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1/drivers/ata/sata_nv.c	2007-11-20 17:40:09.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c	2007-11-22 19:40:58.0 -0600
@@ -791,11 +791,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-	/* Since commands where a result TF is requested are not
-	   executed in ADMA mode, the only time this function will be called
-	   in ADMA mode will be if a command fails. In this case we
-	   don't care about going into register mode with ADMA commands
-	   pending, as the commands will all shortly be aborted anyway. */
+	/* Other than when internal or pass-through commands are executed,
+	   the only time this function will be called in ADMA mode will be
+	   if a command fails. In the failure case we don't care about going
+	   into register mode with ADMA commands pending, as the commands will
+	   all shortly be aborted anyway. We assume that NCQ commands are not
+	   issued via passthrough, which is the only way that switching into
+	   ADMA mode could abort outstanding commands. */
 	nv_adma_register_mode(ap);
 
 	ata_tf_read(ap, tf);
@@ -1359,11 +1361,9 @@
 	struct nv_adma_port_priv *pp = qc->ap->private_data;
 
 	/* ADMA engine can only be used for non-ATAPI DMA commands,
-	   or interrupt-driven no-data commands, where a result taskfile
-	   is not required. */
+	   or interrupt-driven no-data commands. */
 	if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-	    (qc->tf.flags & ATA_TFLAG_POLLING) ||
-	    (qc->flags & ATA_QCFLAG_RESULT_TF))
+	    (qc->tf.flags & ATA_TFLAG_POLLING))
 		return 1;
 
 	if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1381,6 +1381,8 @@
 			  NV_CPB_CTL_IEN;
 
 	if (nv_adma_use_reg_mode(qc)) {
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+			(qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		ata_qc_prep(qc);
 		return;
@@ -1425,9 +1427,17 @@
 
 	VPRINTK("ENTER\n");
 
+	/* We can't handle result taskfile with NCQ commands active, since
+	   retrieving the taskfile switches us out of ADMA mode and would abort
+	   existing commands. */
+	WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
+		(qc->ap->qc_allocated & ~(1 << qc->tag)));
+
 	if (nv_adma_use_reg_mode(qc)) {
 		/* use ATA register mode */
 		VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+			(qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		return ata_qc_issue_prot(qc);
 	} else
[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2	2007-11-22 19:42:28.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c	2007-11-22 19:48:25.0 -0600
@@ -247,6 +247,7 @@
 	void __iomem		*ctl_block;
 	void __iomem		*gen_block;
 	void __iomem		*notifier_clear_block;
+	u64			adma_dma_mask;
 	u8			flags;
 	int			last_issue_ncq;
 };
@@ -748,7 +749,7 @@
 		adma_enable = 0;
 		nv_adma_register_mode(ap);
 	} else {
-		bounce_limit = *ap->dev->dma_mask;
+		bounce_limit = pp->adma_dma_mask;
 		segment_boundary = NV_ADMA_DMA_BOUNDARY;
 		sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
 		adma_enable = 1;
@@ -1134,10 +1135,20 @@
 	void *mem;
 	dma_addr_t mem_dma;
 	void __iomem *mmio;
+	struct pci_dev *pdev = to_pci_dev(dev);
 	u16 tmp;
 
 	VPRINTK("ENTER\n");
 
+	/* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+	   pad buffers */
+	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+	if (rc)
+		return rc;
+	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+	if (rc)
+		return rc;
+
 	rc = ata_port_start(ap);
 	if (rc)
 		return rc;
@@ -1153,6 +1164,15 @@
 	pp->notifier_clear_block = pp->gen_block +
 	       NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+	/* Now that the legacy PRD and padding buffer are allocated we can
+	   safely raise the DMA mask to allocate the CPB/APRD table.
+	   These are allowed to fail since we store the value that ends up
+	   being used to set as the bounce limit in slave_config later if
+	   needed. */
+	pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	pp->adma_dma_mask = *dev->dma_mask;
+
 	mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
 				  &mem_dma, GFP_KERNEL);
 	if (!mem)
@@ -2414,12 +2434,6 @@
 	hpriv->type = type;
 	host->private_data = hpriv;
 
-	/* set 64bit dma masks, may fail */
-	if (type == ADMA) {
-		if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-			pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-	}
-
 	/* request and iomap NV_MMIO_BAR */
 	rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
 	if (rc)
Re: [RFC] Documentation about unaligned memory access
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good grasp on unaligned memory access problems on other architectures and decided it was time to figure it out. As a result I've written this documentation which I plan to submit for inclusion as Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

...

> You may be wondering why you have never seen these problems on your own architecture. Some architectures (such as i386 and x86_64) do not have this limitation, but nevertheless it is important for you to write portable code that works everywhere.

Also, x86 doesn't prohibit unaligned accesses, but I believe they have a significant performance cost and are best avoided where possible.

--
Robert Hancock Saskatoon, SK, Canada
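As a small illustration of writing the portable form, the sketch below reads a 32-bit value from an address with no alignment guarantee by copying it byte-wise into an aligned local. This is plain userspace C and the helper name load_u32_unaligned is made up for this example; inside the kernel the get_unaligned() helper serves the same purpose.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned.
 * memcpy() has no alignment requirement, so this is safe on architectures
 * where a direct *(uint32_t *)p load would trap or misbehave. */
static uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Compilers typically turn this into a single load on architectures that tolerate unaligned access, so the portable form usually costs nothing there.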
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)
Vincent Fortier wrote:
> On Tuesday, 20 November 2007 at 18:56 -0600, Robert Hancock wrote:
>> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the legacy buffers since setting the DMA mask affects both ports and we need to ensure the second port's buffers are allocated properly (fixes a problem with the previous version of this patch).
>>
>> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>
> Would this be worth sending to stable team for 2.6.22 & 2.6.23 ?

Likely (after it gets merged), those versions would have the same bug.
Re: 2.6.24-rc3: find complains about /proc/net
Eric W. Biederman wrote:
> Could you elaborate a bit on how the semantics of returning the wrong information are more useful?
>
> In particular if a thread does the logical equivalent of: grep Pid: /proc/self/status. It always gets the tgid despite having a different process id.

The POSIX-defined userspace concept of a PID requires that all threads appear to have the same PID. This is something that Linux didn't comply with under the old LinuxThreads implementation and was finally fixed with NPTL. This isn't a POSIX-defined interface, but I assume it's trying to be consistent with getpid(), etc.

> How can that possibly be useful or correct?
>
> From the kernel side I really think the current semantics of /proc/self in the context of threads is a bug and confusing. All of the kernel developers' first reaction when this was pointed out was that this is a regression.
>
> If it is truly useful to user space we can preserve this API design bug forever. I just want to make certain we are not being bug compatible without a good reason.
>
> Currently we have several kernel side bugs with threaded programs because /proc/self does not do the intuitive thing. Unless something has changed recently selinux will cause accesses by a non-leader thread to fail when accessing files through /proc/self.
>
> So far the more I look at the current /proc/self behavior the more I am convinced it is broken, and useless. Please help me see where it is useful, so we can justify keeping it.

--
Robert Hancock Saskatoon, SK, Canada
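For reference, the "grep Pid: /proc/self/status" check mentioned above can be sketched in C as below. The helper name proc_self_status_field is hypothetical, and the sketch assumes a Linux system with /proc mounted. It only reads the fields; it makes no claim about what a non-leader thread would see, which is the point under debate in this thread.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the numeric value of a "Key:<whitespace><number>" line from
 * /proc/self/status, or -1 if the file or the key cannot be read.
 * The key must include the trailing colon, e.g. "Pid:" or "Tgid:". */
static long proc_self_status_field(const char *key)
{
	char line[256];
	long value = -1;
	size_t klen = strlen(key);
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* Match only lines that start with the key ("PPid:" does
		 * not start with "Pid:", so it is not confused with it). */
		if (strncmp(line, key, klen) == 0) {
			if (sscanf(line + klen, "%ld", &value) != 1)
				value = -1;
			break;
		}
	}
	fclose(f);
	return value;
}
```

In a single-threaded process, proc_self_status_field("Pid:") and proc_self_status_field("Tgid:") are both expected to match getpid().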
Re: [PATCH 1/2] sata_nv: don't use legacy DMA in ADMA mode
Tejun Heo wrote:
> Tejun Heo wrote:
>> If so, can you please add that switching into register mode is okay as long as there's no other ADMA commands in flight and add WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) && link->sactive)?
>
> More accurately, link->sactive test can be substituted with (ap->qc_allocated & ~(1 << qc->tag)).

Unfortunately we only get the ata_port and ata_taskfile in the tf_read callback, so I'm not sure if we can do the equivalent of the qc->flags & ATA_QCFLAG_RESULT_TF test (i.e. distinguishing between the error-handling case where we don't care if we abort outstanding commands and the normal case with a RESULT_TF command where we do).

--
Robert Hancock Saskatoon, SK, Canada
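The bitmask test suggested above reads as "is any command other than this one still allocated?". A standalone sketch of just that arithmetic, with a hypothetical helper name and plain integers in place of the libata structures:

```c
/* qc_allocated is a bitmap with one bit per allocated command tag.
 * Clearing the bit for our own tag and checking for a non-zero result
 * tells us whether any other command is outstanding. */
static int other_commands_active(unsigned int qc_allocated, unsigned int tag)
{
	return (qc_allocated & ~(1u << tag)) != 0;
}
```

For example, with only tag 3 allocated the result is 0, while with tags 3 and 5 allocated the check for tag 3 returns 1.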
[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the legacy buffers since setting the DMA mask affects both ports and we need to ensure the second port's buffers are allocated properly (fixes a problem with the previous version of this patch).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2	2007-11-20 17:47:46.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c	2007-11-20 17:50:30.0 -0600
@@ -247,6 +247,7 @@
 	void __iomem		*ctl_block;
 	void __iomem		*gen_block;
 	void __iomem		*notifier_clear_block;
+	u64			adma_dma_mask;
 	u8			flags;
 	int			last_issue_ncq;
 };
@@ -748,7 +749,7 @@
 		adma_enable = 0;
 		nv_adma_register_mode(ap);
 	} else {
-		bounce_limit = *ap->dev->dma_mask;
+		bounce_limit = pp->adma_dma_mask;
 		segment_boundary = NV_ADMA_DMA_BOUNDARY;
 		sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
 		adma_enable = 1;
@@ -1134,10 +1135,16 @@
 	void *mem;
 	dma_addr_t mem_dma;
 	void __iomem *mmio;
+	struct pci_dev *pdev = to_pci_dev(dev);
 	u16 tmp;
 
 	VPRINTK("ENTER\n");
 
+	/* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+	   pad buffers */
+	pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+
 	rc = ata_port_start(ap);
 	if (rc)
 		return rc;
@@ -1153,6 +1160,14 @@
 	pp->notifier_clear_block = pp->gen_block +
 	       NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+	/* Now that the legacy PRD and padding buffer are allocated we can
+	   safely raise the DMA mask to allocate the CPB/APRD table. */
+	pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	/* Store the mask that was actually used so we can restore it as
+	   the bounce limit later if needed */
+	pp->adma_dma_mask = *dev->dma_mask;
+
 	mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
 				  &mem_dma, GFP_KERNEL);
 	if (!mem)
@@ -2408,12 +2423,6 @@
 	hpriv->type = type;
 	host->private_data = hpriv;
 
-	/* set 64bit dma masks, may fail */
-	if (type == ADMA) {
-		if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-			pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-	}
-
 	/* request and iomap NV_MMIO_BAR */
 	rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
 	if (rc)
Re: System reboot triggered by just reading a device file....!?
[EMAIL PROTECTED] wrote:
> good evening,
>
> i stumbled over some funny issue when trying windirstat (like KDirStat) with wine. after running that tool for a while my system rebooted. i could reproduce this with every run.
>
> after some deep investigation (i thought i had stability issues with my system and spent more than an hour on this) i found out, that the reboot is being triggered by iTCO_wdt ( /dev/watchdog )
>
> this is how to reproduce:
> - be root
> - cat /dev/watchdog or dd if=/dev/watchdog of=/dev/zero bs=1 count=1 or .
> - wait one minute
>
> *reboot*!
>
> i have heard 2 opinions for now (contacted the author and also discussed on wine-devel ) that this should be expected behaviour.

Yes, it is. It's a watchdog device, it's meant to reboot the machine if whatever task is poking the watchdog dies.

> being sysadmin quite a while, i cannot believe that (accidentally) reading a device file (being root or not - what does that matter) triggers a system reboot.
>
> ok - when i`m root , i shouldn`t do stupid things and be careful, but i thought reading/crawling through a filesystem (r/o, btw.) with some tool which is built to do exactly this wasn`t so stupid - even from within wine.

I would say that running a Windows tool that opens up and reads random files, on the /dev directory tree, as root, probably does qualify as "stupid". I'd say running pretty much anything through Wine as root is not a good idea, a Windows app could hose the system without even meaning to through exactly such things.

> think of an admin writing a quick&dirty script for intrusion detection (find / -exec md5sum {} \; >/tmp/need-no-tripwire) and forgetting to exclude /dev, /sys or /proc appropriately..
>
> think of someone exporting "/" via samba (readonly) and then navigating through the /dev directory
>
> stupid? i don`t think so. i have seen worse things.. :)
>
> should someone get punished by an accidental system reboot and should he need to spend his time on this to investigate why this happens?
>
> i`d wish there would be some fence around this or iTCO_wdt /dev/watchdog not being active after a default desktop installation.

There is.. it's called "root privileges".

> i`d be interested if i`m the only one who thinks this is strange/dangerous behaviour.
>
> regards
> roland

--
Robert Hancock Saskatoon, SK, Canada
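Some background on why a plain read triggers this: with the Linux watchdog interface, opening /dev/watchdog is what starts the timer, and simply closing the file again does not stop it. Drivers that support the "magic close" feature stop the timer only if the character 'V' was written just before close. A sketch of that close sequence follows; the helper name is made up, and whether it actually disarms anything depends on the driver supporting magic close and not being configured as "nowayout". It operates on whatever descriptor it is handed, so it can be tried on an ordinary pipe or file rather than the real device.

```c
#include <unistd.h>

/* Write the magic character 'V' and then close the descriptor.
 * On a watchdog driver with magic close enabled this asks the driver
 * to stop the timer instead of letting it expire.
 * Returns 0 on success, -1 if the write or the close failed. */
static int watchdog_disarm_and_close(int fd)
{
	ssize_t n = write(fd, "V", 1);
	int rc = close(fd);

	return (n == 1 && rc == 0) ? 0 : -1;
}
```

A tool such as cat never writes the magic character, so once it has opened the device the timer keeps running after it exits, which matches the reboot after about a minute described in the report.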
Re: System reboot triggered by just reading a device file....!?
[EMAIL PROTECTED] wrote: good evening, i stumbled over some funny issue when trying windirstat (like KDirStat) with wine. after running that tool for a while my system rebooted. i could reproduce this with every run. after some deep investigation (i thought i had stability issues with my system and spent more than an hour on this) i found out, that the reboot is being triggered by iTCO_wdt ( /dev/watchdog ) this is how to reproduce: - be root - cat /dev/watchdog or dd if=/dev/watchdog of=/dev/zero bs=1 count=1 or . - wait one minute *reboot*! i have heard 2 opinions for now (contacted the author and also discussed on wine-devel ) that this should be expected behaviour. Yes, it is. It's a watchdog device, it's meant to reboot the machine if whatever task is poking the watchdog dies. being sysadmin quite a while, i cannot believe that (accidentally) reading a device file (being root or not - what does that matter) triggers a system reboot. ok - when i`m root , i shouldn`t do stupid things and be careful, but i thought reading/crawling trough a filesystem (r/o, btw.) with some tool which is built to do exactly this wasn`t so stupid - even from within wine. I would say that running a Windows tool that opens up and reads random files, on the /dev directory tree, as root, probably does qualify as stupid. I'd say running pretty much anything through Wine as root is not a good idea, a Windows app could hose the system without even meaning to through exactly such things. think of an admin writing a quickdirty script for intrusion detection (find / -exec md5sum {} \; /tmp/need-no-tripwire) and forgetting to exclude /dev, /sys or /proc appropriately.. think of someone exporting / via samba (readonly) and then navigating trough the /dev directory stupid? i don`t think so.i have seen worse things.. :) should someone get punished by an accidental system reboot and should he need to spend his time on this to investigate why this happens? 
i`d wish there would be some fence around this or iTCO_wdt /dev/watchdog not being active after a default desktop installation. There is.. it's called root privileges. i`d be interested if i`m the only one who thinks this is strange/dangerous behaviour. regards roland -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the legacy buffers since setting the DMA mask affects both ports and we need to ensure the second port's buffers are allocated properly (fixes a problem with the previous version of this patch). Signed-off-by: Robert Hancock [EMAIL PROTECTED] --- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-20 17:47:46.0 -0600 +++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-20 17:50:30.0 -0600 @@ -247,6 +247,7 @@ void __iomem*ctl_block; void __iomem*gen_block; void __iomem*notifier_clear_block; + u64 adma_dma_mask; u8 flags; int last_issue_ncq; }; @@ -748,7 +749,7 @@ adma_enable = 0; nv_adma_register_mode(ap); } else { - bounce_limit = *ap-dev-dma_mask; + bounce_limit = pp-adma_dma_mask; segment_boundary = NV_ADMA_DMA_BOUNDARY; sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN; adma_enable = 1; @@ -1134,10 +1135,16 @@ void *mem; dma_addr_t mem_dma; void __iomem *mmio; + struct pci_dev *pdev = to_pci_dev(dev); u16 tmp; VPRINTK(ENTER\n); + /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and + pad buffers */ + pci_set_dma_mask(pdev, DMA_BIT_MASK(32)); + pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)); + rc = ata_port_start(ap); if (rc) return rc; @@ -1153,6 +1160,14 @@ pp-notifier_clear_block = pp-gen_block + NV_ADMA_NOTIFIER_CLEAR + (4 * ap-port_no); + /* Now that the legacy PRD and padding buffer are allocated we can + safely raise the DMA mask to allocate the CPB/APRD table. 
*/ + pci_set_dma_mask(pdev, DMA_BIT_MASK(64)); + pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); + /* Store the mask that was actually used so we can restore it as + the bounce limit later if needed */ + pp-adma_dma_mask = *dev-dma_mask; + mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ, mem_dma, GFP_KERNEL); if (!mem) @@ -2408,12 +2423,6 @@ hpriv-type = type; host-private_data = hpriv; - /* set 64bit dma masks, may fail */ - if (type == ADMA) { - if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0) - pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK); - } - /* request and iomap NV_MMIO_BAR */ rc = pcim_iomap_regions(pdev, 1 NV_MMIO_BAR, DRV_NAME); if (rc) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] sata_nv: don't use legacy DMA in ADMA mode
Tejun Heo wrote:
> Tejun Heo wrote:
>> If so, can you please add that switching into register mode is okay as long as there's no other ADMA commands in flight and add WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) && link->sactive)?
>
> More accurately, link->sactive test can be substituted with (ap->qc_allocated & ~(1 << qc->tag)).

Unfortunately we only get the ata_port and ata_taskfile in the tf_read callback, so I'm not sure if we can do the equivalent of the qc->flags & ATA_QCFLAG_RESULT_TF test (i.e. distinguishing between the error-handling case, where we don't care if we abort outstanding commands, and the normal case with a RESULT_TF command, where we do).

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
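The bitmap test Tejun suggests can be tried out in plain C outside the kernel. What follows is only a minimal sketch, assuming qc_allocated is a bitmap with one bit per allocated command tag; the helper name other_cmds_in_flight is hypothetical and is not part of libata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a libata function): returns true if any
 * command other than the one with the given tag is still allocated.
 * qc_allocated is modelled as a 32-bit bitmap, one bit per tag. */
bool other_cmds_in_flight(uint32_t qc_allocated, unsigned int tag)
{
	assert(tag < 32);	/* shifting by 32 or more would be undefined */

	/* Mask out our own tag; anything left belongs to another command. */
	return (qc_allocated & ~(UINT32_C(1) << tag)) != 0;
}
```

With only tag 2 allocated the helper reports nothing else in flight; with tags 0 and 2 allocated it reports that switching modes would disturb another command.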
Re: 2.6.24-rc3: find complains about /proc/net
Eric W. Biederman wrote:
> Could you elaborate a bit on how the semantics of returning the wrong information are more useful? In particular if a thread does the logical equivalent of: grep Pid: /proc/self/status. It always gets the tgid despite having a different process id.

The POSIX-defined userspace concept of a PID requires that all threads appear to have the same PID. This is something that Linux didn't comply with under the old LinuxThreads implementation and was finally fixed with NPTL. This isn't a POSIX-defined interface, but I assume it's trying to be consistent with getpid(), etc.

> How can that possibly be useful or correct? From the kernel side I really think the current semantics of /proc/self in the context of threads is a bug and confusing. All of the kernel developers' first reaction when this was pointed out was that this is a regression. If it is truly useful to user space we can preserve this API design bug forever. I just want to make certain we are not being bug compatible without a good reason. Currently we have several kernel side bugs with threaded programs because /proc/self does not do the intuitive thing. Unless something has changed recently selinux will cause accesses by a non-leader thread to fail when accessing files through /proc/self. So far the more I look at the current /proc/self behavior the more I am convinced it is broken, and useless. Please help me see where it is useful, so we can justify keeping it.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
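The distinction being argued over, one PID shared by all threads versus a separate kernel thread id, can be observed from userspace. This is a rough sketch, assuming Linux with NPTL; struct ids and ids_in_new_thread are names made up for the illustration, and the thread id is fetched through the raw gettid system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative container for what a thread sees (hypothetical name). */
struct ids {
	pid_t pid;	/* getpid(): the thread group id under NPTL */
	pid_t tid;	/* gettid: the kernel's per-thread id */
};

static void *record_ids(void *arg)
{
	struct ids *out = arg;

	assert(out != NULL);
	out->pid = getpid();
	out->tid = (pid_t)syscall(SYS_gettid);
	return NULL;
}

/* Runs record_ids() in a new, non-leader thread. Returns 0 on success. */
int ids_in_new_thread(struct ids *out)
{
	pthread_t t;

	if (pthread_create(&t, NULL, record_ids, out) != 0)
		return -1;
	return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

In the new thread getpid() matches the value seen by the main thread, while the thread id differs from it, which is the per-thread identity Eric is referring to.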
Re: [PATCH 2/2] sata_nv: fix ATAPI issues with memory over 4GB (v3)
Tejun Heo wrote:
> Robert Hancock wrote:
>> Tejun Heo wrote:
>>> Robert Hancock wrote:
>>>> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices).
>>>>
>>>> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>>>
>>> applied to #tj-upstream-fixes.
>>
>> I have a report that these patches crashed but the previous patch worked: https://bugzilla.redhat.com/show_bug.cgi?id=351451 So there may still be a problem here.
>
> Any progress?

It looks like the problem is that even though we set the DMA mask after we allocate the PRD and pad buffers, when the other port is set up, the DMA mask is already set to 64-bit and so it allocates its buffers over 4GB and fails. I think we just need to explicitly set it to 32-bit first; I'm getting the reporter to try that one now.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
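The two-port failure described above can be reproduced with a small userspace model. This is a toy sketch and not driver code: struct toy_dev, toy_alloc and toy_port_start are invented for illustration, and the allocator pessimistically hands out the highest address the current mask permits so that any over-wide mask shows up immediately.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_32BIT UINT64_C(0xFFFFFFFF)
#define MASK_64BIT UINT64_MAX

/* Toy device: a single DMA mask shared by both ports (hypothetical). */
struct toy_dev {
	uint64_t dma_mask;
};

/* Toy allocator: returns the highest page-aligned address the current
 * mask allows, modelling memory that lands above 4GB whenever it may. */
uint64_t toy_alloc(const struct toy_dev *dev)
{
	return dev->dma_mask & ~UINT64_C(0xFFF);
}

/* Models starting one port; returns the legacy PRD buffer address. */
uint64_t toy_port_start(struct toy_dev *dev, int reset_to_32bit_first)
{
	uint64_t legacy_prd;

	assert(dev);

	if (reset_to_32bit_first)
		dev->dma_mask = MASK_32BIT;	/* the explicit 32-bit reset */

	legacy_prd = toy_alloc(dev);		/* must stay below 4GB */

	dev->dma_mask = MASK_64BIT;		/* raise for the ADMA tables */
	(void)toy_alloc(dev);			/* CPB/APRD may go anywhere */

	return legacy_prd;
}
```

Without the reset, the first port gets a usable legacy buffer but the second inherits the raised mask; with the reset, both ports stay below 4GB regardless of start order.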
Re: [REQUEST] Option for skipping unreadable blocks on Video DVD
Tobias wrote:
> If you are accessing a scratched Video DVD and the device cannot read it, the process ends. What about a more tolerant way to handle unreadable blocks? Especially on Video DVDs, single blocks are not as important as on data DVDs.

If the DVD player process ends from this, I'd say that's the fault of the player software not handling errors properly. I think that if they are using the normal block layer accesses on the DVD device, there may be some retries that occur which are likely undesirable in this case since they will just stall playback. If they are using SG_IO to feed raw requests into the drive (which I imagine they need to do for CSS authentication, etc. anyway), then all error handling is passed up to the user application.

> So is there a way that the kernel tells the device to skip these bad blocks?

We don't know they're bad until we try and read them. How long the drive will stall trying to read that sector before giving up and returning an error is up to the drive. I'm not sure if the MMC command set allows any way to tell the drive to give up more quickly or not.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
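To make the SG_IO point concrete, here is a hedged sketch of how a player might read a single 2048-byte sector through the sg interface and treat a failure as "skip this sector" rather than a fatal error. The function names build_read10_cdb and read_sector_sg are hypothetical. The timeout field only bounds how long the kernel waits for the command; as noted above, how long the drive itself retries is up to the drive.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Fill a 10-byte READ(10) CDB for nblocks blocks starting at lba. */
void build_read10_cdb(unsigned char cdb[10], uint32_t lba, uint16_t nblocks)
{
	memset(cdb, 0, 10);
	cdb[0] = 0x28;				/* READ(10) opcode */
	cdb[2] = (lba >> 24) & 0xFF;		/* LBA, big-endian */
	cdb[3] = (lba >> 16) & 0xFF;
	cdb[4] = (lba >> 8) & 0xFF;
	cdb[5] = lba & 0xFF;
	cdb[7] = (nblocks >> 8) & 0xFF;		/* transfer length, big-endian */
	cdb[8] = nblocks & 0xFF;
}

/* Read one 2048-byte sector via SG_IO. Returns 0 on success, -1 if the
 * ioctl failed or the drive reported an error. On -1 the caller decides
 * what to do, e.g. skip the sector and carry on with playback. */
int read_sector_sg(int fd, uint32_t lba, unsigned char buf[2048],
		   unsigned int timeout_ms)
{
	unsigned char cdb[10];
	unsigned char sense[32];
	struct sg_io_hdr hdr;

	assert(buf != NULL);
	build_read10_cdb(cdb, lba, 1);

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';
	hdr.dxfer_direction = SG_DXFER_FROM_DEV;
	hdr.cmd_len = sizeof(cdb);
	hdr.cmdp = cdb;
	hdr.dxfer_len = 2048;
	hdr.dxferp = buf;
	hdr.mx_sb_len = sizeof(sense);
	hdr.sbp = sense;
	hdr.timeout = timeout_ms;		/* kernel-side wait, in ms */

	if (ioctl(fd, SG_IO, &hdr) < 0)
		return -1;

	/* Any non-zero status means the read did not complete cleanly. */
	if (hdr.status != 0 || hdr.host_status != 0 || hdr.driver_status != 0)
		return -1;

	return 0;
}
```

A caller would loop over the sectors it wants and, on a -1 return, substitute filler data or jump ahead instead of exiting.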
Re: [PATCH 2/2] sata_nv: fix ATAPI issues with memory over 4GB (v3)
Tejun Heo wrote:
> Robert Hancock wrote:
>> This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices).
>>
>> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
>
> applied to #tj-upstream-fixes.

I have a report that these patches crashed but the previous patch worked: https://bugzilla.redhat.com/show_bug.cgi?id=351451 So there may still be a problem here.
[PATCH 2/2] sata_nv: fix ATAPI issues with memory over 4GB (v3)
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to delay setting the 64-bit DMA mask until the PRD table and padding buffer are allocated so that they don't get allocated above 4GB and break legacy mode (which is needed for ATAPI devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c.before 2007-11-13 19:04:18.0 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c 2007-11-13 19:02:34.0 -0600
@@ -247,6 +247,7 @@
 	void __iomem *ctl_block;
 	void __iomem *gen_block;
 	void __iomem *notifier_clear_block;
+	u64 adma_dma_mask;
 	u8 flags;
 	int last_issue_ncq;
 };
@@ -748,7 +749,7 @@
 		adma_enable = 0;
 		nv_adma_register_mode(ap);
 	} else {
-		bounce_limit = *ap->dev->dma_mask;
+		bounce_limit = pp->adma_dma_mask;
 		segment_boundary = NV_ADMA_DMA_BOUNDARY;
 		sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
 		adma_enable = 1;
@@ -763,6 +764,11 @@
 		config_mask = NV_MCP_SATA_CFG_20_PORT0_EN |
 			      NV_MCP_SATA_CFG_20_PORT0_PWB_EN;
 
+	/* Set appropriate DMA mask. */
+	rc = pci_set_dma_mask(pdev, bounce_limit);
+	if (rc)
+		return rc;
+
 	if (adma_enable) {
 		new_reg = current_reg | config_mask;
 		pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
@@ -1134,6 +1140,7 @@
 	void *mem;
 	dma_addr_t mem_dma;
 	void __iomem *mmio;
+	struct pci_dev *pdev = to_pci_dev(dev);
 	u16 tmp;
 
 	VPRINTK("ENTER\n");
@@ -1153,6 +1160,14 @@
 	pp->notifier_clear_block = pp->gen_block +
 	       NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+	/* Now that the legacy PRD and padding buffer are allocated we can
+	   safely raise the DMA mask to allocate the CPB/APRD table. */
+	pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	/* Store the mask that was actually used so we can restore it later
+	   if needed */
+	pp->adma_dma_mask = *dev->dma_mask;
+
 	mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
 				  &mem_dma, GFP_KERNEL);
 	if (!mem)
@@ -2408,12 +2423,6 @@
 	hpriv->type = type;
 	host->private_data = hpriv;
 
-	/* set 64bit dma masks, may fail */
-	if (type == ADMA) {
-		if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-			pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-	}
-
 	/* request and iomap NV_MMIO_BAR */
 	rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
 	if (rc)
[PATCH 1/2] sata_nv: don't use legacy DMA in ADMA mode
We need to run any DMA command with result taskfile requested in ADMA mode when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine in ADMA mode which is not allowed. Enforce this with BUG_ON() since data corruption could potentially result if this happened.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc1-git10/drivers/ata/sata_nv.c 2007-11-01 20:01:32.0 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c 2007-11-13 19:01:09.0 -0600
@@ -791,11 +797,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-	/* Since commands where a result TF is requested are not
-	   executed in ADMA mode, the only time this function will be called
-	   in ADMA mode will be if a command fails. In this case we
-	   don't care about going into register mode with ADMA commands
-	   pending, as the commands will all shortly be aborted anyway. */
+	/* Other than when internal or pass-through commands are executed,
+	   the only time this function will be called in ADMA mode will be
+	   if a command fails. In the failure case we don't care about going
+	   into register mode with ADMA commands pending, as the commands will
+	   all shortly be aborted anyway. We assume that NCQ commands are not
+	   issued via passthrough and so this will not abort any commands in
+	   that case. */
 	nv_adma_register_mode(ap);
 
 	ata_tf_read(ap, tf);
@@ -1359,11 +1376,9 @@
 	struct nv_adma_port_priv *pp = qc->ap->private_data;
 
 	/* ADMA engine can only be used for non-ATAPI DMA commands,
-	   or interrupt-driven no-data commands, where a result taskfile
-	   is not required. */
+	   or interrupt-driven no-data commands. */
 	if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-	    (qc->tf.flags & ATA_TFLAG_POLLING) ||
-	    (qc->flags & ATA_QCFLAG_RESULT_TF))
+	    (qc->tf.flags & ATA_TFLAG_POLLING))
 		return 1;
 
 	if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1381,6 +1396,8 @@
 		       NV_CPB_CTL_IEN;
 
 	if (nv_adma_use_reg_mode(qc)) {
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+		       (qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		ata_qc_prep(qc);
 		return;
@@ -1428,6 +1445,8 @@
 	if (nv_adma_use_reg_mode(qc)) {
 		/* use ATA register mode */
 		VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+		BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+		       (qc->flags & ATA_QCFLAG_DMAMAP));
 		nv_adma_register_mode(qc->ap);
 		return ata_qc_issue_prot(qc);
 	} else
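The interaction between the mode decision and the new BUG_ON() is easier to see in a standalone model. This is a sketch only: struct toy_cmd and both functions are hypothetical stand-ins for the driver state, and the no_data field is an assumption about the part of the second condition that the hunk above cuts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver state; not the kernel's types. */
struct toy_cmd {
	bool atapi_setup_complete;	/* port is set up for legacy/ATAPI use */
	bool polling;			/* models ATA_TFLAG_POLLING */
	bool dma_mapped;		/* models ATA_QCFLAG_DMAMAP */
	bool no_data;			/* assumed: a no-data protocol command */
};

/* Mirrors the patched decision: register mode for ATAPI setup or polling,
 * ADMA for DMA or no-data commands, register mode for everything else. */
bool toy_use_reg_mode(const struct toy_cmd *c)
{
	assert(c);

	if (c->atapi_setup_complete || c->polling)
		return true;
	if (c->dma_mapped || c->no_data)
		return false;
	return true;
}

/* The condition the BUG_ON() guards: a DMA-mapped command must not take
 * the register-mode path while the port is still in ADMA mode. */
bool toy_invariant_holds(const struct toy_cmd *c)
{
	return !(toy_use_reg_mode(c) &&
		 !c->atapi_setup_complete && c->dma_mapped);
}
```

In this model an ordinary DMA command on an ADMA port goes through ADMA and satisfies the invariant, while a polled DMA command on an ADMA port is the combination that would trip the check.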
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB
Tejun Heo wrote:
>> Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in that way (there are some dual-socket Opteron boxes with this controller, forcing an allocation below 4GB for this could force a non-optimal node allocation I think..) To do this I'd have to raise the mask for the APRD allocation, drop it again, then raise it again in ADMA mode, which is kind of ugly.
>
> I don't think it really matters. The table isn't too big and it's not like access to the table has any processor locality. Maybe it's better to allocate to the same node as the irq but raising DMA mask doesn't help at all.

It's quite possible that restricting the DMA mask will also restrict what node that can get allocated on. I'm not so much thinking of the CPU access to the table but the controller's banging on the thing several times for each command..

> I think performance impact is nil either way but even in highly unlikely case it has any impact, allocating PRDs under 4G should be better as it avoids DAC cycles on the bus. But again, this is just irrelevant. I'd say just allocate everything under 4G.

The DAC issue shouldn't matter as these controllers are integrated into the chipset so it will be using all HT bus transactions, not PCI. We can do it without all that mess in slave_config though, just by delaying raising the DMA mask until after the PRD/pad buffers are allocated.

>> Also, I'd rather not allocate the legacy PRD at all if we're in ADMA mode. That way, if some bug causes us to try and do legacy DMA in ADMA mode, we'll crash from null pointer dereference instead of potentially transferring incorrect data (as we had in this case) and corrupting things.
>
> Yeap, I can agree with this. But can you add BUG_ON()/WARN_ON() at places instead? I know blanking pointers feel safer but I think it's best to keep resource allocation / release in ->port_start/stop().

Yeah, I've got rid of that stuff now and added some BUG_ONs for this. Will submit the patches shortly.
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB
Tejun Heo wrote:
> How about always initialize DMA mask to ATA_DMA_MASK regardless of ADMA mode such that PRD and PAD buffers are always accessible by register mode and just raising PCI dma mask and queue bounce limit if ADMA mode is active?

Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in that way (there are some dual-socket Opteron boxes with this controller, forcing an allocation below 4GB for this could force a non-optimal node allocation I think..) To do this I'd have to raise the mask for the APRD allocation, drop it again, then raise it again in ADMA mode, which is kind of ugly.

Also, I'd rather not allocate the legacy PRD at all if we're in ADMA mode. That way, if some bug causes us to try and do legacy DMA in ADMA mode, we'll crash from null pointer dereference instead of potentially transferring incorrect data (as we had in this case) and corrupting things.

>> +	/* Set appropriate DMA mask. */
>> +	pci_set_dma_mask(pdev, bounce_limit);
>> +	pci_set_consistent_dma_mask(pdev, bounce_limit);
>
> These can fail.

Yes, it should likely do something with these return values. Though theoretically it shouldn't fail, since the DMA mask is either 32-bit, which shouldn't fail, or one that was successfully set before. Also I don't think the SCSI layer actually checks the slave_config return value.. sigh.

> Also, please separate out the result TF handling to a separate patch. I know it's a small change but as both introduce important behavior changes, I think it would be nice to have a bisection point in between.

Could do. That change would have to come first though, as the change to not allocate the PRD except when necessary would make some cases blow up that previously might have worked.

> Thanks.
[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode on systems with memory located above 4GB. We need to make sure that the legacy PRD table and padding buffer are appropriately allocated according to the DMA mask requirements of the current operating mode (ADMA or legacy). Also, we should run any DMA command with result taskfile requested in ADMA mode when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine in ADMA mode which is not allowed.

Fixes Red Hat Bugzilla #351451: https://bugzilla.redhat.com/show_bug.cgi?id=351451

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc1-git10/drivers/ata/sata_nv.c 2007-11-01 20:01:32.0 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c 2007-11-10 19:57:47.0 -0600
@@ -247,6 +247,7 @@
 	void __iomem *ctl_block;
 	void __iomem *gen_block;
 	void __iomem *notifier_clear_block;
+	u64 adma_dma_mask;
 	u8 flags;
 	int last_issue_ncq;
 };
@@ -747,11 +748,29 @@
 		   on the port. */
 		adma_enable = 0;
 		nv_adma_register_mode(ap);
+		if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+			/* Transitioning to legacy mode. Free the pad buffer. */
+			ata_pad_free(ap, ap->host->dev);
+			ap->pad = NULL;
+			ap->pad_dma = 0;
+		}
 	} else {
-		bounce_limit = *ap->dev->dma_mask;
+		bounce_limit = pp->adma_dma_mask;
 		segment_boundary = NV_ADMA_DMA_BOUNDARY;
 		sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
 		adma_enable = 1;
+
+		if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
+			/* Transitioning to ADMA mode. Free legacy PRD table
+			   and the pad buffer. */
+			ata_pad_free(ap, ap->host->dev);
+			ap->pad = NULL;
+			ap->pad_dma = 0;
+			dmam_free_coherent(ap->host->dev, ATA_PRD_TBL_SZ,
+					   ap->prd, ap->prd_dma);
+			ap->prd = NULL;
+			ap->prd_dma = 0;
+		}
 	}
 
 	pci_read_config_dword(pdev, NV_MCP_SATA_CFG_20, &current_reg);
@@ -763,23 +782,45 @@
 		config_mask = NV_MCP_SATA_CFG_20_PORT0_EN |
 			      NV_MCP_SATA_CFG_20_PORT0_PWB_EN;
 
+	/* Set appropriate DMA mask. */
+	pci_set_dma_mask(pdev, bounce_limit);
+	pci_set_consistent_dma_mask(pdev, bounce_limit);
+
+	blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
+	blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
+	blk_queue_max_hw_segments(sdev->request_queue, sg_tablesize);
+	ata_port_printk(ap, KERN_INFO,
+		"bounce limit 0x%llX, segment boundary 0x%lX, hw segs %hu\n",
+		(unsigned long long)bounce_limit, segment_boundary,
+		sg_tablesize);
+
 	if (adma_enable) {
 		new_reg = current_reg | config_mask;
-		pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
+		if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
+			/* Transition to ADMA mode.
+			   Reallocate the pad buffer. */
+			rc = ata_pad_alloc(ap, ap->host->dev);
+			pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
+		}
 	} else {
 		new_reg = current_reg & ~config_mask;
-		pp->flags |= NV_ADMA_ATAPI_SETUP_COMPLETE;
+		if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+			/* Transition to legacy mode.
+			   Reallocate the legacy PRD and pad buffer. */
+			ap->prd = dmam_alloc_coherent(ap->host->dev,
+					ATA_PRD_TBL_SZ, &ap->prd_dma, GFP_KERNEL);
+			if (!ap->prd)
+				rc = -ENOMEM;
+			else
+				rc = ata_pad_alloc(ap, ap->host->dev);
+
+			pp->flags |= NV_ADMA_ATAPI_SETUP_COMPLETE;
+		}
 	}
 
 	if (current_reg != new_reg)
 		pci_write_config_dword(pdev, NV_MCP_SATA_CFG_20, new_reg);
 
-	blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
-	blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
-	blk_queue_max_hw_segments(sdev->request_queue, sg_tablesize);
-	ata_port_printk(ap, KERN_INFO,
-		"bounce limit 0x%llX, segment boundary 0x%lX, hw segs %hu\n"
Re: x86_64 SATA DVD drive + libata trouble
Bernd Strieder wrote:
> I managed to get it running with the SuSE kernel when passing adma=0 to sata_nv module, and I managed to get it running when passing mem=2000M to the SuSE kernel. Thanks to Robert for those hints. The vanilla kernels I tried 2.6.23.1 and 2.6.24-rc1-git10 (with patch to sata_nv.c from Robert Hancock see https://bugzilla.redhat.com/show_bug.cgi?id=351451) seem to be very sensitive in this area. Whenever I got them to oops, I did not have much time to get anything read on the screen. I managed under the patched 2.6.24-rc1-git10 to manually load sata_nv and sr_mod, and then I got an OOps like Unable to handle ... NULL pointer dref at RIP ff880edf6a . libata:ata_qc_prep + 0xe2/0x15b . srmod:sr_probe

Which patch is this using, the original one from Nov. 2 or the updated one from Nov. 10? The original one has a bug.

> I have attached 3 dmesg outputs with the openSuSE 10.3 kernel and extracts of /var/log/messages, especially some Oopses. The oopses from the vanilla kernels seem to be so bad that they do never end up in a file. I will do some more tests as soon as possible. I have attached the files as I created them, you will have to diff the single files, anyway, to get the important information out, I cannot select for you.
Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB
Tejun Heo wrote:
> How about always initializing the DMA mask to ATA_DMA_MASK regardless
> of ADMA mode, such that the PRD and PAD buffers are always accessible
> by register mode, and just raising the PCI DMA mask and queue bounce
> limit if ADMA mode is active?

Could be done.. but I don't want to constrain the ADMA APRD/CPB area in
that way (there are some dual-socket Opteron boxes with this
controller; forcing an allocation below 4GB for this could force a
non-optimal node allocation, I think..). To do this I'd have to raise
the mask for the APRD allocation, drop it again, then raise it again in
ADMA mode, which is kind of ugly.

Also, I'd rather not allocate the legacy PRD at all if we're in ADMA
mode. That way, if some bug causes us to try to do legacy DMA in ADMA
mode, we'll crash from a null pointer dereference instead of
potentially transferring incorrect data (as we had in this case) and
corrupting things.

>> +	/* Set appropriate DMA mask. */
>> +	pci_set_dma_mask(pdev, bounce_limit);
>> +	pci_set_consistent_dma_mask(pdev, bounce_limit);
>
> These can fail.

Yes, it should likely do something with these return values. Though
theoretically it shouldn't fail, since the DMA mask is either 32-bit,
which shouldn't fail, or one that was successfully set before. Also, I
don't think the SCSI layer actually checks the slave_config return
value.. sigh.

> Also, please separate out the result TF handling to a separate patch.
> I know it's a small change, but as both introduce important behavior
> changes, I think it would be nice to have a bisection point in
> between.

Could do. That change would have to come first though, as the change to
not allocate the PRD except when necessary would cause some cases there
to blow up which might have worked before.

> Thanks.
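The constraint being discussed (legacy PRD and pad buffers must be reachable through a 32-bit mask, while ADMA structures may sit higher) can be illustrated outside the kernel. What follows is only a user-space sketch, not driver code: `dma_bit_mask` and `fits_dma_mask` are hypothetical helper names, and the 40-bit width used in the usage note is an arbitrary illustrative value, not a documented property of the controller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a mask with the low `bits` bits set. */
static uint64_t dma_bit_mask(unsigned int bits)
{
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True if the whole buffer [addr, addr + len) is addressable under `mask`. */
static bool fits_dma_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
    if (len == 0 || addr > mask)
        return false;
    /* The last byte is addr + len - 1; compare without overflowing. */
    return len - 1 <= mask - addr;
}
```

With these helpers, a 4KB buffer at 0x100000000 (just above 4GB) does not fit `dma_bit_mask(32)` but does fit `dma_bit_mask(40)`, which is the kind of difference between legacy-mode and ADMA-mode allocations that the patch has to respect.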
Re: x86_64 SATA DVD drive + libata trouble
Bernd Strieder wrote:
> Hello,
>
> please CC me, I'm not subscribed.
>
> If any kernel developer is interested in more specific information,
> please mail me. I can build kernels and I can apply patches, though I
> have not done it regularly. I'd like to get the DVD drive working
> somehow. I have googled a lot and did not find any more ideas about
> what to do. Some good keywords to find a solution would suffice at
> that end.
>
> Rough problem description:
>
> I have a Tyan mainboard with the NVIDIA CK804 chipset. The only
> SATA/IDE device is a SATA DVD combo drive; the hard disks are on a
> RAID controller from 3ware. The hard disks are fine.
>
> The openSuSE 10.3 boot DVDs fail after booting from the BIOS: the
> installation kernel cannot use the DVD drive. That kernel uses libata
> with sata_nv and pata_amd as drivers. The drive is recognized but it
> cannot be used. This is probably the situation during install from
> DVD, and it persists now in the running system after a network
> install.
>
> Reading from the DVD device /dev/sr0 with dd stops after at most
> 119kb of rubbish read. Mounting fails with "superblock not found".
> When trying to remove the pata_amd module I get an oops. I tried to
> remove the modules to have a chance to reload them with other options
> (atapi_enable), but that did not help, even after rebooting.
>
> A vanilla 2.6.23.1 kernel behaves even less friendly: the dd on
> /dev/sr0 causes a hard reset.
>
> So there are clearly some problems with libata in this system. I have
> failed to switch away from libata and get the drive recognized at
> all.

There is a known problem with ATAPI devices on CK804 chipsets in
systems that have memory above the 4GB mark, being debugged here:

https://bugzilla.redhat.com/show_bug.cgi?id=351451

If you are running into that one, you can work around it for now by
passing the adma=0 parameter to the sata_nv module (not sure how this
would be done on SuSE's setup), or by passing sata_nv.adma=0 on the
kernel command line if sata_nv is built into the kernel.

If that does help, I could ask you to test patches :-)

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: [PATCH] sata_nv,ahci: add the ahci legacy mode support to sata_nv
Jeff Garzik wrote:
> Jeff Garzik wrote:
>> The proposed sata_nv patch does the opposite -- guarantees we must
>> support the continually problematic legacy IDE interface ad
>> infinitum. Such patches are OK for the test lab, but in this
>> specific case users /suffer/ when not running AHCI mode.
>
> Just to reinforce... sata_nv support and bug fixes are primarily done
> right now through the valiant efforts of Robert Hancock (with assists
> from Alan, Tejun, and others). Robert's job is difficult, because he
> has no hardware documentation[1], and NVIDIA does not seem to be
> helping out much with driver bug reports on the lists or in
> bugzillas.

Right, I don't have anything. Unless the original incomplete ADMA
driver release from NVIDIA counts as documentation, lol. And yes, I've
CC'ed NVIDIA people about a few ADMA-related issues and been met with
silence. It would be nice if they were as responsive about ADMA issues
as I must say Kuan and Peer have been on the SWNCQ side of things..

> As far as I know, I am the only one in the universe outside of NVIDIA
> with any SATA docs at all, and those docs _only_ cover ADMA registers
> and DMA structures, no PCI config info, no errata, nothing on SWNCQ
> or legacy IDE (well, half a page).
>
> NVIDIA has indeed become more engaged in sata_nv in recent times, and
> that's a positive sign. You, Kuon and Ayaz have all been noticeably
> more responsive in email. Thanks. Users have definitely benefited,
> particularly from your help addressing a couple SWNCQ issues.
>
> But at this point in time, being asked to choose between sata_nv and
> ahci is no choice at all. One has public documentation, wide industry
> support and little-or-no bugs. The other has several open issues, no
> documentation, and support obstacles.

They're not even equivalent interfaces in this case: in the proposed
AHCI legacy mode patch these controllers are supported in the default
SFF mode only, no ADMA or SWNCQ, so you don't get any NCQ support..

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: How do I debug PCI resource allocation problems
> 00 00 00 00 00 00
> a0: 11 11 00 00 00 00 06 03 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 01 00 22 00 00 00 00 00 00 00 00 00 00 01 02 00
> e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
> f0: 12 00 03 00 00 00 00 00 90 0f 03 00 d1 cd 5d df
>
> Question: Memory region 2 at 120000000? That is beyond the 4GB
> boundary, and the BIOS guys I know told me that every PCI IOMEM
> region should reside within the first 4 GBs!
>
> When running the machine with 2 GB only, lspci output looks like this
> for the VGA device:

64-bit capable PCI devices can indeed have BARs which can be located
above 4GB. However, I can't see why lspci is detecting that from this
configuration space: the BAR contents for region 2 are 20000008, which
means prefetchable memory at 0x20000000 which can be located anywhere
within 32-bit memory space. That doesn't make any sense though, since
that's in the middle of RAM!

Quite likely this bogus resource setting of the graphics controller is
a large part of your problem. Question is who's doing this..

> 00:02.0 Class 0300: 8086:29b2 (rev 02) (prog-if 00 [VGA])
>   Subsystem: 1734:10fc
>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
>   Latency: 0
>   Interrupt: pin A routed to IRQ 11
>   Region 0: Memory at f0100000 (32-bit, non-prefetchable) [size=512K]
>   Region 1: I/O ports at 1c70 [size=8]
>   Region 2: Memory at e0000000 (32-bit, prefetchable) [size=256M]
>   Region 3: Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
>   Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
>     Address: Data:
>   Capabilities: [d0] Power Management version 2
>     Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>     Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>
> That means now we get the region 2 at e0000000 and everything is
> fine.

This looks more reasonable, though the BAR is still mapped over top of
a region that's reserved in the E820 memory map, which is wrong..

> I tried to have a look at the PCI config space with a DOS tool from
> the year 1999; the hexdump looked pretty much similar, but byte 1A
> changed from "20" to "e0". And the region was declared as e0000008,
> which also looked strange to me.

That looks more reasonable (e0000000, not 20000000). The last 4 bits
are used to encode the prefetchable flag and memory space. Question is
how that got set in the bogus fashion in Linux..

> This is where I am now. I also tried with an Intel reference
> mainboard (same chipset, different BIOS) and this one didn't show the
> problem. That makes me think that the problem is somewhere in the
> BIOS, but where? I have access to the BIOS developers, but they don't
> know much about Linux, and since the other operating systems from
> Redmond are running without problems they are hard to convince that
> they made a mistake. :-)
>
> Another very bad side effect of the problem is that when the machine
> runs on 32-bit Linux the graphics card seems to work, but people
> report corrupted file systems after a while. I guess that is related
> to my problem on 64 bit, only that in the 32-bit case filesystem
> buffers got overwritten by the video RAM, and when they are written
> back to disk... ouch!
>
> I also tried to track the problem down with the CD from
> LinuxFirmwareKit.org, but the resource allocation errors are the same
> and unfortunately the lack of verbosity as well.
>
> Ok, that's it. Any help is much appreciated.
>
> Thanks
> Rainer

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
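The remark that the low 4 bits of a memory BAR carry flags rather than address bits can be made concrete with a small decoder. This is only a sketch following the standard PCI BAR layout; `struct bar_info` and `decode_bar32` are hypothetical names, and it assumes the full 32-bit register values in question were 0x20000008 (the bogus setting) and 0xe0000008 (the sane one).

```c
#include <stdbool.h>
#include <stdint.h>

struct bar_info {
    bool is_io;         /* bit 0: 1 = I/O space, 0 = memory space */
    bool is_64bit;      /* memory BARs only: type field (bits 2:1) == 2 */
    bool prefetchable;  /* memory BARs only: bit 3 */
    uint32_t base;      /* address with the flag bits masked off
                           (low 32 bits only for a 64-bit BAR) */
};

/* Decode the raw 32-bit contents of one Base Address Register. */
static struct bar_info decode_bar32(uint32_t raw)
{
    struct bar_info info = { false, false, false, 0 };

    if (raw & 0x1) {
        /* I/O BAR: bits 1:0 are flags, the rest is the port base. */
        info.is_io = true;
        info.base = raw & ~UINT32_C(0x3);
    } else {
        /* Memory BAR: bits 3:0 are flags, the rest is the base. */
        info.is_64bit = ((raw >> 1) & 0x3) == 0x2;
        info.prefetchable = (raw & 0x8) != 0;
        info.base = raw & ~UINT32_C(0xF);
    }
    return info;
}
```

Under that assumption, `decode_bar32(0x20000008)` reports a 32-bit prefetchable memory BAR at 0x20000000, which is why a trailing "8" in the raw value is not by itself suspicious; only the base address in the middle of RAM is.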
Re: sata NCQ blacklist entry
Luca Tettamanti wrote:
> On Nov 7, 2007 1:55 PM, Tejun Heo <[EMAIL PROTECTED]> wrote:
>> Florian La Roche wrote:
>>> Hello all,
>>>
>>> I've taken the email addresses from the last NCQ blacklist changes
>>> going into the kernel.
>>>
>>> This Fujitsu drive also gives me spurious command completions.
>>> Detailed output is also available at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=366181.
>>> Let me know if you need more info or anything else.
>>>
>>> --- drivers/ata/libata-core.c
>>> +++ drivers/ata/libata-core.c
>>> @@ -4222,6 +4222,7 @@
>>>   { "WDC WD740ADFD-00NLR1", NULL, ATA_HORKAGE_NONCQ, },
>>>   { "WDC WD3200AAJS-00RYA0", "12.01B01", ATA_HORKAGE_NONCQ, },
>>>   { "FUJITSU MHV2080BH", "00840028", ATA_HORKAGE_NONCQ, },
>>> + { "FUJITSU MHW2160BJ G2", NULL, ATA_HORKAGE_NONCQ },
>>>   { "ST9120822AS", "3.CLF", ATA_HORKAGE_NONCQ, },
>>>   { "ST9160821AS", "3.CLF", ATA_HORKAGE_NONCQ, },
>>>   { "ST9160821AS", "3.ALD", ATA_HORKAGE_NONCQ, },
>>
>> Thanks. We're currently trying to find out what's actually going on
>> with all these drives. At first, the drives which got blacklisted
>> weren't many and made sense (had other problems with NCQ, etc..),
>> but with new generation drives from many vendors showing the same
>> symptom, we aren't too sure now.
>
> Is there a way to tell whether Windows is using NCQ or not? I checked
> the system log (or whatever it's called) on my notebook and it is
> clean, but I'm not sure it's using NCQ (I don't even know if it'd log
> spurious completions somewhere).

Which driver is installed for the SATA controller in Windows, the
chipset-manufacturer-provided AHCI driver or the default Microsoft
driver? You'd need the AHCI driver installed for NCQ to be used.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10
Denys wrote:
> Finally I got the full dmesg with the 1GB card till the end. It seems
> not readable too.
> ..
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: soft resetting link
> ata1.00: configured for MWDMA1
> sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
> sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
>         00 00 00 00
> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> end_request: I/O error, dev sda, sector 0
> Buffer I/O error on device sda, logical block 0
> ata1: EH complete

I'm guessing that your CF-to-IDE adapter doesn't have the correct lines
wired up for DMA to work properly, and the card indicates DMA support,
which libata tries to use but which doesn't work.

It looks like it never tried falling back to PIO after DMA failed.
Seems like a deficiency in the speed-down logic?

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
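That the failing command really was a DMA transfer can be read off the `cmd` line of the log. The sketch below parses that field; it assumes the libata error-handler layout of command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and `struct ata_cmd_fields`, `parse_ata_cmd` and `ata_cmd_is_dma` are hypothetical names. The opcode list is a deliberately short subset.

```c
#include <stdio.h>

struct ata_cmd_fields {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned int device;
};

/* Parse e.g. "c8/00:08:00:00:00/00:00:00:00:00/e0"; returns 0 on success. */
static int parse_ata_cmd(const char *s, struct ata_cmd_fields *f)
{
    int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &f->command, &f->feature, &f->nsect,
                   &f->lbal, &f->lbam, &f->lbah,
                   &f->hob_feature, &f->hob_nsect,
                   &f->hob_lbal, &f->hob_lbam, &f->hob_lbah,
                   &f->device);
    return n == 12 ? 0 : -1;
}

/* Non-zero for a few well-known DMA opcodes (subset, not exhaustive):
 * 0x25 READ DMA EXT, 0x35 WRITE DMA EXT, 0xc8 READ DMA, 0xca WRITE DMA. */
static int ata_cmd_is_dma(unsigned int command)
{
    return command == 0x25 || command == 0x35 ||
           command == 0xc8 || command == 0xca;
}
```

For the line in the log above, the command byte is 0xc8 (READ DMA) with a sector count of 8, which matches the reported 4096-byte transfer at 512 bytes per sector.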
Re: SATA eating my disk, port reset, destroying unrelated data
Norbert Preining wrote:
> Dear all!
>
> (please Cc me for answers)
>
> For about 5 days I have been having serious problems with my SATA
> drive:
>   kernel 2.6.22 (from Debian/sid)
>   hardware nv
>
> Sometimes at boot time, often/always during disk I/O intensive stuff:
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2

SError 0x400000 means a handshake error. Usually SError indications
are due to a hardware problem (bad SATA cable, power or drive problem).

> ata1.00: (BMDMA stat 0x25)
> ata1.00: cmd 35/00:00:2a:6f:c0/00:04:0c:00:00/e0 tag 0 cdb 0x0 data 524288 out
>          res 51/84:10:1a:72:c0/84:01:0c:00:00/e0 Emask 0x10 (ATA bus error)
> ata1: soft resetting port
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
>
> Even worse, sometimes the reset does not work
> ...
> ata1: device not ready (errno=-16), forcing hardreset
> ata1: hard resetting port
> ata1 SRST failed (errno=-19)
> ata1: reset failed (errno=-19), retrying in 10 secs
> ..
> (typed from a digital photo, nothing remains in the logs)
>
> After this I need to do a cold boot, otherwise the drive is really in
> a bad state and not even the BIOS gets it right.

If even the BIOS cannot reset properly then that also really points to
a hardware problem..

> Interestingly the whole stuff DID work for a long time until I did
> too many things at the same time: 2 x svn up, copying 40G from the
> SATA drive to a USB drive, aptitude upgrade. Before, I regularly did
> the same stuff (like svn up etc.), but this time it was too much, it
> seems.
>
> Apropos data hosing: after the first incident some data on my Windows
> partition (/dev/sda1) was hosed, programs missing, chkdsk necessary
> etc.
>
> I attach dmesg (from the current boot with a succeeding soft reset; I
> interrupted the svn process before the SATA drive went into hard
> reset failures), .config, and lspci -v output.
>
> Are there any chances that using 2.6.23 will improve/fix this? Any
> other suggestions?
>
> I would consider it a hardware problem, but since it started at one
> big I/O thingy and has been persistent since then I am a bit
> sceptical.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
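The handshake-error reading of the SError value can be checked against the diagnostic (upper 16-bit) half of the SATA SError register. The following is a minimal sketch, assuming the full value in the log was 0x400000, i.e. bit 22; `struct serr_flag`, `serr_flags` and `serror_describe` are hypothetical names, and only the DIAG bits are listed.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct serr_flag {
    uint32_t mask;
    const char *name;
};

/* DIAG field of SError, bits 16..26. */
static const struct serr_flag serr_flags[] = {
    { UINT32_C(1) << 16, "PHYRDY changed" },
    { UINT32_C(1) << 17, "PHY internal error" },
    { UINT32_C(1) << 18, "COMWAKE detected" },
    { UINT32_C(1) << 19, "10b to 8b decode error" },
    { UINT32_C(1) << 20, "disparity error" },
    { UINT32_C(1) << 21, "CRC error" },
    { UINT32_C(1) << 22, "handshake error" },
    { UINT32_C(1) << 23, "link sequence error" },
    { UINT32_C(1) << 24, "transport state transition error" },
    { UINT32_C(1) << 25, "unrecognized FIS type" },
    { UINT32_C(1) << 26, "device exchanged" },
};

/* Write a comma-separated list of the DIAG flags set in `serror` to `buf`. */
static void serror_describe(uint32_t serror, char *buf, size_t len)
{
    size_t i, used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof(serr_flags) / sizeof(serr_flags[0]); i++) {
        if (!(serror & serr_flags[i].mask))
            continue;
        if (used + 1 >= len)
            break;  /* no room left for further names */
        snprintf(buf + used, len - used, "%s%s",
                 used ? ", " : "", serr_flags[i].name);
        used += strlen(buf + used);
    }
}
```

All of these bits describe link-level events between controller and drive, which is consistent with the advice to suspect cabling, power or the drive itself before the driver.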
Re: sata NCQ blacklist entry
Tejun Heo wrote:
> Florian La Roche wrote:
>> Hello all,
>>
>> I've taken the email addresses from the last NCQ blacklist changes
>> going into the kernel.
>>
>> This Fujitsu drive also gives me spurious command completions.
>> Detailed output is also available at
>> https://bugzilla.redhat.com/show_bug.cgi?id=366181.
>> Let me know if you need more info or anything else.
>>
>> --- drivers/ata/libata-core.c
>> +++ drivers/ata/libata-core.c
>> @@ -4222,6 +4222,7 @@
>>   { "WDC WD740ADFD-00NLR1", NULL, ATA_HORKAGE_NONCQ, },
>>   { "WDC WD3200AAJS-00RYA0", "12.01B01", ATA_HORKAGE_NONCQ, },
>>   { "FUJITSU MHV2080BH", "00840028", ATA_HORKAGE_NONCQ, },
>> + { "FUJITSU MHW2160BJ G2", NULL, ATA_HORKAGE_NONCQ },
>>   { "ST9120822AS", "3.CLF", ATA_HORKAGE_NONCQ, },
>>   { "ST9160821AS", "3.CLF", ATA_HORKAGE_NONCQ, },
>>   { "ST9160821AS", "3.ALD", ATA_HORKAGE_NONCQ, },
>
> Thanks. We're currently trying to find out what's actually going on
> with all these drives. At first, the drives which got blacklisted
> weren't many and made sense (had other problems with NCQ, etc..), but
> with new generation drives from many vendors showing the same
> symptom, we aren't too sure now. I'll keep your email in my todo list
> and add the drive to the blacklist once the problem is verified.

I agree that something seems fishy with this. It seems unlikely that
this many drives from multiple vendors would have the exact same,
relatively obscure problem..

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
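For readers unfamiliar with the table in the quoted patch, each entry pairs a model string with a firmware revision, and a NULL revision is meant to cover every firmware of that model. The sketch below is a simplified illustration only: `struct blacklist_entry`, `ncq_blacklist` and `ncq_blacklisted` are hypothetical names, it uses exact string comparison, and it omits the horkage flags, so it should not be read as libata's actual lookup code.

```c
#include <stddef.h>
#include <string.h>

struct blacklist_entry {
    const char *model;     /* model string reported by the drive */
    const char *firmware;  /* firmware revision, or NULL for any revision */
};

/* A few entries taken from the quoted patch. */
static const struct blacklist_entry ncq_blacklist[] = {
    { "WDC WD3200AAJS-00RYA0", "12.01B01" },
    { "FUJITSU MHV2080BH",     "00840028" },
    { "FUJITSU MHW2160BJ G2",  NULL },
    { "ST9160821AS",           "3.CLF" },
    { "ST9160821AS",           "3.ALD" },
};

/* Returns 1 if NCQ would be disabled for this model/firmware pair. */
static int ncq_blacklisted(const char *model, const char *firmware)
{
    size_t i;

    for (i = 0; i < sizeof(ncq_blacklist) / sizeof(ncq_blacklist[0]); i++) {
        const struct blacklist_entry *e = &ncq_blacklist[i];

        if (strcmp(e->model, model) != 0)
            continue;
        if (e->firmware == NULL || strcmp(e->firmware, firmware) == 0)
            return 1;
    }
    return 0;
}
```

The proposed Fujitsu entry uses NULL, so it would switch off NCQ for every firmware revision of that model, whereas the Seagate entries only affect the listed revisions. That breadth is part of why verifying the problem first matters.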
Re: SATA eating my disk, port reset, destroying unrelated data
Norbert Preining wrote:
> Dear all! (please Cc me for answers)
>
> Since about 5 days I am having serious problems with my SATA drive:
> kernel 2.6.22 (from Debian/sid)
> hardware nv
>
> Sometimes at boot time, often/always at disk io intense stuff:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2

Serror 0x40 means a handshake error. Usually Serror indications are due to a hardware problem (bad SATA cable, power or drive problem).

> ata1.00: (BMDMA stat 0x25)
> ata1.00: cmd 35/00:00:2a:6f:c0/00:04:0c:00:00/e0 tag 0 cdb 0x0 data 524288 out
>          res 51/84:10:1a:72:c0/84:01:0c:00:00/e0 Emask 0x10 (ATA bus error)
> ata1: soft resetting port
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata1.00: configured for UDMA/133
> ata1: EH complete
>
> Even worse, sometimes the reset does not work ...
>
> ata1: device not ready (errno=-16), forcing hardreset
> ata1: hard resetting port
> ata1 SRST failed (errno=-19)
> ata1: reset failed (errno=-19), retrying in 10 secs
> ..
> (typed from a digital photo, nothing remains in the logs)
>
> After this I need to do a cold boot otherwise the drive is really in a bad state and not even the bios gets it right.

If even the BIOS cannot reset properly then that also really points to a hardware problem..

> Interestingly the whole stuff DID work for a long time until I did too many things at the same time: 2 x svn up, copying 40G from the SATA drive to a USB drive, aptitude upgrade. Before I did regularly the same stuff (like svn up etc), but this time it was too much, it seems.
>
> Apropos data hosing: After the first incident some data on my windows partitions (/dev/sda1) was hosed, programs missing, chkdisk necessary etc.
>
> I attach dmesg (from the current boot with a succeeding soft reset, I interrupted the svn process before the SATA drive goes into hard reset failures), .config, lspci -v output.
>
> Are there any chances that using 2.6.23 will improve/fix this? Any other suggestions?
>
> I would consider it a hardware problem, but since it started at one big io thingy and is persistent since then I am a bit sceptical.

-- Robert Hancock Saskatoon, SK, Canada
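As a side note on reading the log above: in libata's "res" line the first two bytes are the ATA status and error registers (0x51 and 0x84 in this report). The sketch below decodes them using the standard ATA bit assignments. The helper names (decode_status, decode_error) and the abbreviated bit tables are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bit_name {
    unsigned char bit;
    const char *name;
};

/* Standard ATA status register bits. */
static const struct bit_name status_bits[] = {
    { 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" }, { 0x10, "DSC" },
    { 0x08, "DRQ" }, { 0x04, "CORR" }, { 0x01, "ERR" },
};

/* Standard ATA error register bits (only the commonly seen ones). */
static const struct bit_name error_bits[] = {
    { 0x80, "ICRC" }, { 0x40, "UNC" }, { 0x10, "IDNF" }, { 0x04, "ABRT" },
};

/* Append one name to a space-separated list, skipping it if it will not fit. */
static void append_name(char *out, size_t outlen, const char *name)
{
    size_t used = strlen(out);
    size_t need = strlen(name) + (used ? 1 : 0);

    if (used + need + 1 > outlen)
        return;
    if (used)
        strcat(out, " ");
    strcat(out, name);
}

static void decode_bits(unsigned char value, const struct bit_name *table,
                        size_t count, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++)
        if (value & table[i].bit)
            append_name(out, outlen, table[i].name);
}

void decode_status(unsigned char status, char *out, size_t outlen)
{
    decode_bits(status, status_bits,
                sizeof(status_bits) / sizeof(status_bits[0]), out, outlen);
}

void decode_error(unsigned char error, char *out, size_t outlen)
{
    decode_bits(error, error_bits,
                sizeof(error_bits) / sizeof(error_bits[0]), out, outlen);
}
```

For the values in the report, the status decodes to DRDY, DSC and ERR, and the error register to ICRC and ABRT. ICRC is an interface CRC error, which would be consistent with the cabling or hardware suspicion voiced in the reply, though it does not prove it.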
Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10
Denys wrote:
> Finally I got full dmesg with 1GB card till end. Seems not readable too.
> ..
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1.00: status: { DRDY }
> ata1: soft resetting link
> ata1.00: configured for MWDMA1
> sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
> sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
> Descriptor sense data with sense descriptors (in hex):
>         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
>         00 00 00 00
> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> end_request: I/O error, dev sda, sector 0
> Buffer I/O error on device sda, logical block 0
> ata1: EH complete

I'm guessing that your CF-to-IDE adapter doesn't have the correct lines wired up for DMA to work properly, and the card indicates DMA support, which libata tries to use but which doesn't work. It looks like it never tried falling back to PIO after DMA failed. Seems like a deficiency in the speed-down logic?

-- Robert Hancock Saskatoon, SK, Canada
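To make the "falling back to PIO" idea concrete, here is a deliberately simplified sketch of a speed-down ladder. It is not libata's actual error-handling logic: the link_state structure, the record_timeout() helper and the threshold of three timeouts are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Transfer classes from fastest to slowest. */
enum xfer_class { XFER_UDMA, XFER_MWDMA, XFER_PIO };

struct link_state {
    enum xfer_class cls;    /* transfer class currently configured */
    unsigned int failures;  /* timeouts seen since the last change */
};

/* Arbitrary threshold, chosen only for illustration. */
#define FAILURES_BEFORE_FALLBACK 3u

/*
 * Record one command timeout. Returns 1 if the transfer class was
 * lowered as a result, 0 otherwise.
 */
int record_timeout(struct link_state *st)
{
    assert(st != NULL);

    if (++st->failures < FAILURES_BEFORE_FALLBACK)
        return 0;

    st->failures = 0;
    if (st->cls == XFER_PIO)
        return 0;  /* already at the slowest class */

    st->cls = (st->cls == XFER_UDMA) ? XFER_MWDMA : XFER_PIO;
    return 1;
}
```

In the log above the port is reconfigured for MWDMA1 after the timeout; under a policy like this sketch, repeated timeouts at MWDMA would eventually drop the device to PIO, which is the behaviour the reply suggests was missing.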
Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)
Frank van Maarseveen wrote:
> For quite some time I'm seeing occasional lockups spread over 50 different machines I'm maintaining. Symptom: a page allocation failure with order:1, GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free pages, almost no swap used) followed by a lockup (everything dead). I've collected all (12) crash cases which occurred the last 10 weeks on 50 machines total (i.e. 1 crash every 41 weeks on average). The kernel messages are summarized to show the interesting part (IMO) they have in common. Over the years this has become the crash cause #1 for stable kernels for me (fglrx doesn't count ;).
>
> One note: I suspect that reporting a GFP_ATOMIC allocation failure in a network driver via that same driver (netconsole) may not be the smartest thing to do and this could be responsible for the lockup itself. However, the initial page allocation failure remains and I'm not sure how to address that problem.
>
> I still think the issue is memory fragmentation but if so, it looks a bit extreme to me: One system with 2GB of ram crashed after a day, merely running a couple of TCP server programs. All systems have either 1 or 2GB ram and at least 1G of (merely unused) swap.

These are all order-1 allocations for received network packets that need to be allocated out of low memory (assuming you're using a 32-bit kernel), so it's quite possible for them to fail on occasion. (Are you using jumbo frames?)

That should not be causing a lockup though.. the received packet should just get dropped.

-- Robert Hancock Saskatoon, SK, Canada
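A small illustration of why an order-1 allocation can fail while plenty of pages are free: an order-n request needs 2^n physically contiguous, naturally aligned pages. The sketch below is a toy model, not the kernel's buddy allocator. It assumes 4 KiB pages, and the free_map array and the can_allocate() helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on 32-bit x86 */

/* Size in bytes of an allocation of the given order. */
size_t order_to_bytes(unsigned int order)
{
    return (size_t)PAGE_SIZE << order;
}

/*
 * free_map[i] is true when page frame i is free. Returns true if some
 * naturally aligned run of 2^order free pages exists.
 */
bool can_allocate(const bool *free_map, size_t npages, unsigned int order)
{
    size_t block = (size_t)1 << order;

    assert(free_map != NULL);
    for (size_t start = 0; start + block <= npages; start += block) {
        bool all_free = true;

        for (size_t i = 0; i < block; i++) {
            if (!free_map[start + i]) {
                all_free = false;
                break;
            }
        }
        if (all_free)
            return true;
    }
    return false;
}
```

With the map {free, used, used, free, free, used, used, free}, half of the pages are free, yet no aligned pair is, so an order-1 (8 KiB) request fails while order-0 requests still succeed. A GFP_ATOMIC caller cannot sleep to wait for reclaim to fix that, which is why such failures are expected to happen occasionally.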
Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch
Linus Torvalds wrote:
> On Tue, 30 Oct 2007, Robert Hancock wrote:
>>> You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for startup.
>> If it does, it's not by necessity. As soon as you read the table location out of the ACPI tables you can start using it, and that shouldn't require any config space accesses.
>
> Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to use conf1 (which we can trust) to verify it and/or fix things up.

My point was, it's not inherently necessary in order to use MMCONFIG. I'm not saying the checks (unreachable_devices and pci_mmcfg_check_hostbridge) aren't useful or needed with many real machines. However, in the event that type1 access isn't available we just skip all those checks because we have no other option. It would indeed be a pretty broken spec if there was no way to bootstrap with it even under ideal conditions..

> Also, there are several devices that don't show up in the MMCFG things, or just otherwise get it wrong.
>
> So just take a look at arch/x86/pci/mmconfig-shared.c and look for "conf1". Really.
>
> Damn, I'm nervous taking any MMCFG patches that has you as an author, if you aren't even aware of these kinds of fundamental issues. You probably read the standards about how things are "supposed" to work, and then just believed them?
>
> Rule #1 in kernel programming: don't *ever* think that things actually work the way they are documented to work. The documentation is a starting point, nothing else.
>
> And please be defensive in programming. We *know* conf1 cycles work. The hardware has been extensively tested, and there are no firmware interactions. There is *zero* reasons to use MMCONF cycles for normal devices.
>
> Ergo: switching over to MMCONF when not needed is stupid and fragile.

I can't really disagree that MMCONFIG doesn't have great advantages for most devices (though it likely is faster on a lot of platforms, which may be significant if the device does lots of config space accesses). So for the moment, avoiding using it except where necessary will likely work out (except if some system does indeed puke on mixing type1 and MMCONFIG).

However, what Microsoft is doing with Vista may eventually make a difference in the future. Many hardware vendors seem to use the testing strategy of "test with latest Windows version. Works OK? Ship it." If Vista decides that MMCONFIG is good to use all the time, then type1 access support is likely going to a) end up less tested and b) probably deleted entirely in time.

We've seen it before - it used to be that not using ACPI was the safe option on most hardware with Linux. Now you pretty much have to use it because the manufacturers only test with it enabled. I've seen at least one board where the interrupt routing was completely broken with ACPI off, because they obviously only tested in Windows..
Re: sata_nv and dynamically changing DMA mask?
Alan Cox wrote:
> On Mon, 29 Oct 2007 22:17:40 -0600 Robert Hancock <[EMAIL PROTECTED]> wrote:
>> In the sata_nv driver, when running in ADMA mode, we can do 64-bit DMA. However, when an ATAPI device like a DVD drive is connected, we can't use ADMA mode, and so we have to abide by the restrictions of a normal SFF ATA controller and can only do 32-bit DMA. We detect this and try to set the blk_queue_bounce_limit, blk_queue_segment_boundary and blk_queue_max_hw_segments to the values corresponding to a normal SFF controller.
>
> What about the DMA padding buffer from nv_adma_port_start and internal buffers for commands like request sense that don't come via the request queue directly.

Indeed we do call ata_port_start from nv_adma_port_start, which calls dmam_alloc_coherent to allocate the SFF PRD table. Since the DMA mask is 64-bit, this could indeed be allocated above 4GB which would be bad.

I suppose what we could do is just not call ata_port_start there, but move it into nv_adma_slave_config and call it when going into non-ADMA mode. We'd have to drop the DMA mask down to 32-bit first as well as setting blk_queue_bounce_limit though, which is one of my questions, is this OK to do?

> Also it seems nv_adma_use_reg_mode() can decide to send other commands via the non ADMA interface even for ATA devices. Are we 100% certain it never decides to let through a command with DMA via the register interface in this case - what do you see if you instrument the function ?

The only cases where that could happen are for polling DMA commands (which I presume we never do) or where result taskfile is requested. The latter could be a problem for ATA passthrough commands using DMA, I suppose.. Question is what we can do about it.. We have to switch out of ADMA mode to read a result taskfile. I guess that's not really a problem unless somebody starts issuing NCQ commands via ATA pass-through. Do we allow that?
Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch
Linus Torvalds wrote:
> On Tue, 30 Oct 2007, Arjan van de Ven wrote:
>> the problem is... you're not supposed to mix both types of accesses.
>
> You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for startup.

If it does, it's not by necessity. As soon as you read the table location out of the ACPI tables you can start using it, and that shouldn't require any config space accesses.

> Also, there's reason to believe that mixing things up _has_ to work anyway, and if the issue is between "works in practice" and "theory says that you shouldn't mix", I'll take practice every time.
>
> Especially since we *know* that the theory is broken. Right now MMCONFIG is effectively disabled very aggressively because it's simply unusably flaky.
>
> So the choice is between:
>  - don't use MMCONFIG at all, because it has so many problems
>  - use MMCONFIG sparingly enough to hide the problems

Fact is, we don't really know how many of these systems with supposedly "broken" MMCONFIG were really just suffering from the overlapping PCI/MMCONFIG address space problem, which is entirely the fault of the way we do PCI probing. I would bet quite a few of them.

> and what "you're supposed to do" is simply trumped by Real Life(tm). Because Intel screwed up so badly when they designed that piece of shit.
>
> (Where "screwed up badly" is the usual "left it to firmware people" thing, of course. Dammit, Intel *could* have just made it a real PCI BAR in the Northbridge, and specified it as such, and we wouldn't have these problems! But no, it had to be another idiotic "firmware tells where it is" thing)

This wouldn't have helped anything with the problem in question.

> The fact is, CONF1 style accesses are just safer, and *work*.
>
>> I would suggest a slight twist then: use CONF1 *until* you're using something above 256, and then and only then switch to MMCONFIG from then on for all accesses.
>
> No. Maybe if you do it per-device, and only *after* probing (ie we have seen multiple, and successful, accesses), but globally, absolutely not. That would be useless. The bugs we have had in this area have been exactly the kinds of things like "we don't know the real size of the MMCONFIG areas" etc. I could easily see device driver writers probing to see if something works, and I absolutely don't think we should just automatically enable MMCONFIG from then on.

Why per device? It's not like the MSI case where both the platform and the device are potentially busted. Whether or not MMCONFIG works has nothing to do with the device, all that matters is whether it works on the platform. It shouldn't be the driver's responsibility to know this.

> But maybe we could have a per-device flag that a driver *can* set. Ie have the logic be:
>  - use MMCONFIG if we have to (reg >= 256)
>    OR
>  - use MMCONFIG if the driver specifically asked us to
> and then drivers that absolutely need it, and know they do, can set that flag. Preferably after they actually verified that it works.

How will they verify that it works? If it works, then verifying it works is all well and good. If it doesn't work, trying to verify if it does could very well blow up the machine. I've made the point before that if we're going to allow using it at all, we'd better find out if it works or not early on, not after we've been running and somebody decides it's a good idea to try using it and causing a lockup or something.

> That way you _can_ get the "this is how you're supposed to do it" behaviour, but you get it when there is a reasonable chance that it actually works.
>
> And quite frankly, if you're not supposed to mix these things even across devices, then I think we are better off just doing what we effectively do now: mostly ignore the damn thing because it's too broken to use.
>
> Maybe somebody inside Intel could just clarify the documentation, and change it from "you're not supposed to mix" to "mix all you want".

Intel could say what they want on the subject.. but that doesn't necessarily reflect what happens with anyone else's chipset implementations.
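The selection rule proposed in the quoted text (MMCONFIG only for registers at or above 256, or when a driver opts in) can be sketched as below, together with the two address encodings being discussed. This is a user-space illustration only: the cfg_method enum, choose_method() and the driver_wants_mmconfig flag are hypothetical names, and no such per-device flag is claimed to exist in the kernel. The address layouts follow the standard conf1 (port 0xCF8) and MMCONFIG conventions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cfg_method { CFG_CONF1, CFG_MMCONFIG };

/*
 * Pick an access method for one config register. The opt-in flag is a
 * hypothetical per-device setting, as floated in the discussion.
 */
enum cfg_method choose_method(unsigned int reg, bool driver_wants_mmconfig)
{
    assert(reg < 4096);  /* extended config space is 4 KiB per function */

    if (reg >= 256 || driver_wants_mmconfig)
        return CFG_MMCONFIG;
    return CFG_CONF1;
}

/* Value written to I/O port 0xCF8 to select a register for a conf1 access. */
uint32_t conf1_address(unsigned int bus, unsigned int dev,
                       unsigned int fn, unsigned int reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFCu);
}

/* Byte offset of a register from the MMCONFIG base address. */
uint32_t mmconfig_offset(unsigned int bus, unsigned int dev,
                         unsigned int fn, unsigned int reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return (bus << 20) | (dev << 15) | (fn << 12) | reg;
}
```

The conf1 encoding has only 8 bits for the register number, which is why registers at offset 256 and above cannot be reached without MMCONFIG in the first place.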
sata_nv and dynamically changing DMA mask?
In the sata_nv driver, when running in ADMA mode, we can do 64-bit DMA. However, when an ATAPI device like a DVD drive is connected, we can't use ADMA mode, and so we have to abide by the restrictions of a normal SFF ATA controller and can only do 32-bit DMA. We detect this and try to set the blk_queue_bounce_limit, blk_queue_segment_boundary and blk_queue_max_hw_segments to the values corresponding to a normal SFF controller.

However, we have this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=351451

that their DVD drive doesn't work properly on a computer with 4GB of RAM unless they either disable ADMA (thus resulting in the DMA parameters being initialized to the SFF ones from the start) or pass mem=3000M to the kernel to keep the memory above the 4GB mark from being used. Thus I suspect that what we're trying to do with the DMA parameters is not taking.

Question is: is setting blk_queue_bounce_limit enough to prevent addresses outside that mask from showing up, or does the device DMA mask also need to be updated? Is there anything wrong with just changing the DMA mask at runtime? Keep in mind, ATAPI and non-ATAPI devices can potentially be switched out on the port, so the mask might need to be updated at runtime..

-- Robert Hancock Saskatoon, SK, Canada
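To illustrate what a 32-bit restriction means for a buffer address, here is a small user-space sketch of the check a bounce decision boils down to. It is not the block layer's or the DMA API's code: dma_bit_mask(), dma_reachable() and needs_bounce() are hypothetical helpers, written in the spirit of the kernel's DMA mask constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low 'bits' bits set. */
uint64_t dma_bit_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
}

/* True if every byte of [addr, addr + len) lies at or below the mask. */
bool dma_reachable(uint64_t addr, uint64_t len, uint64_t mask)
{
    uint64_t last;

    assert(len > 0);
    last = addr + len - 1;
    if (last < addr)
        return false;  /* address range wrapped around */
    return last <= mask;
}

/* True if the buffer would have to be copied to lower memory first. */
bool needs_bounce(uint64_t addr, uint64_t len, unsigned int mask_bits)
{
    return !dma_reachable(addr, len, dma_bit_mask(mask_bits));
}
```

A page at 0x100000000 (just above the 4GB mark) needs bouncing under a 32-bit mask but not under a 64-bit one. Limiting memory with mem=3000M keeps every page below that mark, which would explain why that workaround hides the problem in the bug report.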
Re: Page-Out of RO data
vicky wrote:
> Hi,
> Can the read-only (RO) section/data of the kernel ever be paged out of memory?
> -Vicky

All kernel code and data is non-swappable in Linux..

-- Robert Hancock Saskatoon, SK, Canada
Re: Strange freezes (seems like SATA related)
Max Krasnyansky wrote:
> A couple of HP xw9300 machines (dual Opterons) started freezing up. We're running 2.6.22.1 on them.
> The freezes are somewhat weird. The VGA console is alive (I can switch VTs, etc.) but everything else is dead (network, etc.). Unfortunately SysRq was not enabled, so I could not get backtraces and stuff. I hooked up a serial console and the only error that shows up is this:
>
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Descriptor sense data with sense descriptors (in hex):
> end_request: I/O error, dev sda, sector 8388695
> Buffer I/O error on device sda1, logical block 1048579
> lost page write due to I/O error on sda1
> sd 0:0:0:0: [sda] Write Protect is off
>
> I see a bunch of those and then the box just sits there spewing this periodically:
>
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>
> SMART selftest on the drive passed without errors.
>
> Here is what this machine looks like:
>
> 00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
> 00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
> 00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
> 00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
> 00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
> 00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
> 00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
> 00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 05:04.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
> 05:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
> 0a:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
> 40:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 40:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
> 40:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 40:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
> 61:04.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
> 61:06.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 61:06.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 61:09.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
> 62:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
> 63:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
> 80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 81:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
>
> As I mentioned: dual Opteron, NUMA. Nothing fancy in the kernel config.
> Any ideas what the problem might be?

Can you post the full dmesg output? What kind of drive is this?

--
Robert Hancock      Saskatoon, SK, Canada
Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch
Greg KH wrote:
> On Fri, Oct 26, 2007 at 09:59:45AM -0700, Jesse Barnes wrote:
>> On Thursday, October 25, 2007 7:54 pm Greg KH wrote:
>>> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
>>>> I think Greg doesn't like it, even though we don't have an alternative at this point...
>>>
>>> Yes, I didn't like it, Ivan didn't like it, and I got reports that it wasn't even needed at all once you upgraded your BIOS to the latest version.
>>>
>>> So, is this still needed? And if so, can you try to implement what Ivan suggested to do here instead?
>>
>> Yes, it's still needed. Auke rescinded his "BIOS upgrade makes it work" message, so something like this is still necessary.
>
> He did? Ugh, I can't keep these all straight, sorry.
>
> Can someone just send what they think is still needed, and explain why Ivan will not object to it? :)

Here's a recap of the whole issue just for people's information:

Right now we disable MMCONFIG on machines where the MCFG area is not reserved in the E820 memory map since we figure it's not valid. This is a broken heuristic because the PCI Express firmware spec doesn't require that it be so reserved, it only needs to be reserved as an ACPI motherboard resource, and so many times it's not reserved in E820 despite being completely valid and working. The mmconfig-validate-against-acpi-motherboard-resources.patch changes this to validate against the ACPI motherboard resources instead.

The second problem is that on some machines, when we are doing BAR sizing on PCI devices, and write all ones to a BAR in order to determine how many bits "stick", the BAR ends up overlapping with the MCFG area. On some chipsets, this causes writes to the MCFG area (like, to restore the original BAR contents) to get decoded by the device instead of by the MCFG mechanism, which means the BAR stays disabled and configuration access stops working, wreaking havoc. Usually on these machines the MMCONFIG is located near the top of 32-bit memory and the PCI device causing problems is a PCI Express graphics card.

pci-disable-decode-of-io-memory-during-bar-sizing.patch, and its successors, switch off the device's decoding during sizing so that it won't absorb the accesses to the MCFG table. The concern raised was that this might affect some devices negatively. We do avoid disabling decode on host bridges, as it's known that some of them disable RAM access when you turn decode off, stupidly. I've yet to hear of any other conclusive case where disabling the decode is harmful. In general, if disabling the decode causes issues, the mere fact of doing the BAR sizing could cause the same issues, and that is unavoidable.

The other possible workaround would be to avoid using MMCONFIG until the BAR sizing is done. However, this seems like a poor solution. First of all, in the future there may come machines where MMCONFIG is the only config mechanism (or, perhaps more likely, it becomes the only tested one, so the old methods get broken). Secondly, what happens with hot-plug devices that need to be sized after MMCONFIG gets turned on?

The only way these two patches are related is that the E820 check happens to wrongly disable MMCONFIG on some of the machines where the memory areas could overlap during sizing, so removing that check alone without fixing the overlap issue could cause breakage on some machines. However, this is purely by chance, and it doesn't prevent the breakage on many other machines - as well as the one mentioned in the earlier thread, there's this one:

https://bugzilla.redhat.com/show_bug.cgi?id=251493

--
Robert Hancock      Saskatoon, SK, Canada
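To make the overlap described in the recap concrete, here is a minimal sketch of the arithmetic only, not the kernel's actual sizing code. The helper names `bar_size_from_readback` and `sizing_window_overlaps` are made up for illustration, and it assumes a plain 32-bit memory BAR whose low four bits are flag bits. It shows how the size falls out of the bits that "stick", and why a large BAR temporarily decodes the top of 32-bit address space while all ones are written to it.

```c
#include <assert.h>
#include <stdint.h>

/* Low four bits of a 32-bit memory BAR are type/prefetch flags, not address bits. */
#define BAR_MEM_FLAG_MASK 0xFu

/*
 * Hypothetical helper: given the value read back after writing 0xFFFFFFFF
 * to a 32-bit memory BAR, return the size of the region it decodes
 * (0 if no address bits stuck, i.e. the BAR is unimplemented).
 */
static uint32_t bar_size_from_readback(uint32_t readback)
{
    uint32_t addr_bits = readback & ~BAR_MEM_FLAG_MASK;

    if (addr_bits == 0)
        return 0;
    /* The size is the two's complement of the address bits that stuck. */
    return ~addr_bits + 1u;
}

/*
 * Hypothetical helper: while all ones are written, a BAR of `size` bytes
 * with decode still enabled claims the top `size` bytes below 4 GiB.
 * Report whether that window overlaps the region [mcfg_start, mcfg_end].
 */
static int sizing_window_overlaps(uint32_t size, uint64_t mcfg_start,
                                  uint64_t mcfg_end)
{
    assert(size != 0 && mcfg_start <= mcfg_end);

    uint64_t win_start = 0x100000000ull - size;
    uint64_t win_end = 0xFFFFFFFFull;

    return win_start <= mcfg_end && mcfg_start <= win_end;
}
```

With made-up addresses: a 256 MB graphics BAR reads back as 0xF0000008, giving a size of 0x10000000, so during sizing it would claim 0xF0000000-0xFFFFFFFF and overlap an MMCONFIG window placed at 0xF0000000. A 4 KB BAR would only claim the last page below 4 GiB and would not.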
sata_nv and dynamically changing DMA mask?
In the sata_nv driver, when running in ADMA mode, we can do 64-bit DMA. However, when an ATAPI device like a DVD drive is connected, we can't use ADMA mode, and so we have to abide by the restrictions of a normal SFF ATA controller and can only do 32-bit DMA. We detect this and try to set the blk_queue_bounce_limit, blk_queue_segment_boundary and blk_queue_max_hw_segments to the values corresponding to a normal SFF controller.

However, we have this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=351451

in which the reporter's DVD drive doesn't work properly on a computer with 4GB of RAM unless they either disable ADMA (so that the DMA parameters are initialized to the SFF ones from the start) or pass mem=3000M to the kernel to keep the memory above the 4GB mark from being used. Thus I suspect that what we're trying to do with the DMA parameters is not taking effect.

The question is: is setting blk_queue_bounce_limit enough to prevent addresses outside that mask from showing up, or does the device DMA mask also need to be updated? Is there anything wrong with just changing the DMA mask at runtime? Keep in mind that ATAPI and non-ATAPI devices can potentially be switched out on the port, so the mask might need to be updated at runtime.

--
Robert Hancock      Saskatoon, SK, Canada
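As a rough illustration of the constraint being asked about, the sketch below checks whether a buffer can be reached under a given DMA mask. It is not the block layer or DMA API: `dma_mask_for_bits` and `dma_reachable` are hypothetical names, and the model only captures the idea that a 32-bit-limited controller cannot be handed a buffer above the 4GB mark without bouncing it through low memory first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask of bus addresses reachable with `bits` address bits. */
static uint64_t dma_mask_for_bits(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper: true if the buffer [addr, addr + len) lies entirely
 * at or below `mask`, i.e. the controller could DMA to it directly.
 */
static int dma_reachable(uint64_t addr, uint64_t len, uint64_t mask)
{
    assert(len != 0);

    uint64_t last = addr + len - 1;

    if (last < addr)        /* address arithmetic wrapped around */
        return 0;
    return last <= mask;
}
```

Under this model, a 512-byte buffer at 0x100000000 is reachable with a 64-bit mask (ADMA mode) but not with a 32-bit mask (SFF mode), which matches the symptom that the problem goes away when memory above 4GB is not used.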
Re: Major SATA / EXT3 Issue?
Chris Holvenstot wrote:
> I am curious if anyone else has had major problems with SATA drives on the current series of kernels.
>
> I have (or rather had) two SATA drives on my system - the first was a Maxtor MaxLine 500 and the second was a Maxtor MaxLine 250. Both of these drives were set to the 1.5 Gbit/s mode.
>
> My SATA controller is integrated on my MSI motherboard and sports four ports. It is implemented using the Nvidia CK804 chipset. My processor is an AMD64 X2 4600+ running the 32-bit version of Linux.
>
> I have had these drives up and running for about six months.
>
> The first drive "failed" about 10 days ago - and unfortunately I focused on hardware error and after several attempts to get the drive back online I physically pulled it from the system. This drive was used for backups and thus was not critical to day-to-day operations.
>
> However, tonight I "lost" a second SATA drive, this one I use on a daily basis for my kernel build and test processes. It failed in the same manner as the first, which makes me a little suspicious.
>
> The first drive “failed” while I was running a modified Ubuntu 7.04 system. Because I focused on hardware as the reason for the failure I did not collect specific information about the version of the kernel being used, but it was likely to be 2.6.24-git8.
>
> The second drive “failed” tonight on what is, except for the kernel, a fairly standard Ubuntu 7.10 system (the same hardware - I upgraded my OS this past week) - the kernel in use tonight at the time of the second failure was 2.6.24-rc1-git1.
>
> In each case the failure mode appears to have been the same - the system appears to lock up. When rebooted I get a long string of messages like:
>
> Oct 26 20:07:37 localhost kernel: [ 101.581091] ata2: timeout waiting for ADMA IDLE, stat=0x440
> Oct 26 20:07:37 localhost kernel: [ 101.581096] sd 1:0:0:0: [sda] Write Protect is off
> Oct 26 20:07:37 localhost kernel: [ 101.581174] res 71/04:08:00:00:00/04:00:1d:00:00/e0 Emask 0x1 (device error)
> Oct 26 20:07:37 localhost kernel: [ 101.644992] ata2.00: configured for UDMA/33
> Oct 26 20:07:37 localhost kernel: [ 101.644994] ata2: EH complete
> Oct 26 20:07:37 localhost kernel: [ 101.645006] sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

You should try and get some output from dmesg and not from the messages log, as the log daemon seems to have a nasty habit of discarding critical output from these errors. In this case the failing command is missing and the message ordering even seems off.

> The hardware appears to be correctly identified by the BIOS during the power up sequence. Not much is seen in the dmesg log except for:
>
> [ 43.649673] scsi0 : sata_nv
> [ 43.649722] scsi1 : sata_nv
> [ 43.649776] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xcc00 irq 19
> [ 43.649778] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xcc08 irq 19

There should be more than this at the very least. As above, please try to get output from dmesg itself.

> When I try to run a file system check on these devices I get:
>
> e2fsck 1.40.2 (12-Jul-2007)
> fsck.ext2: No such file or directory while trying to open /dev/sdb1
>
> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
>
> I have a gut feeling that when the system appears to lock up what is really going on is that the contents of the drive are being trashed. But I have no proof of that.

I don't think that is the case; more likely the drives have not been detected at all. If this happens after a reboot when they were working before, that most likely points to some kind of hardware issue.

> When I try to do a parted to see what the system thinks is on the drive I get the error message:
>
> Error: Error opening /dev/sdb: No medium found
>
> I am not having any problems with my EXT3 file systems located on “standard” IDE / PATA drives.
>
> My config file, which has not changed in months beyond taking the defaults during make oldconfig, looks like:

--
Robert Hancock      Saskatoon, SK, Canada
Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree
Greg KH wrote:
> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
>> I think Greg doesn't like it, even though we don't have an alternative at this point...
>
> Yes, I didn't like it, Ivan didn't like it, and I got reports that it wasn't even needed at all once you upgraded your BIOS to the latest version.
>
> So, is this still needed? And if so, can you try to implement what Ivan suggested to do here instead?

Aren't you guys referring to pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another one entirely, though related.
Re: Is gcc thread-unsafe?
Arjan van de Ven wrote:
> On Wed, 24 Oct 2007 21:29:56 -0700 "David Schwartz" <[EMAIL PROTECTED]> wrote:
>>> Well that's exactly right. For threaded programs (and maybe even real-world non-threaded ones in general), you don't want to be even _reading_ global variables if you don't need to. Cache misses and cacheline bouncing could easily cause performance to completely tank in some cases while only gaining a cycle or two in microbenchmarks for doing these funny x86 predication things.
>>
>> For some CPUs, replacing a conditional branch with a conditional move is a *huge* win because it cannot be mispredicted.
>
> please name one...
> Hint: It's not one made by either Intel or AMD in the last 4 years...

It is a win if the branch cannot be effectively predicted, i.e. if the outcome is essentially random, as may occur with data-dependent conditionals. I've seen a doubling of performance on one workload using a predicated instruction instead of a branch on newer Xeons in such a case. I suspect that if branch prediction fails often, the data dependency created by the cmov etc. is less expensive than the pipeline flush required by mispredicts.

--
Robert Hancock      Saskatoon, SK, Canada
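For readers unfamiliar with the trade-off, here is a small sketch of the two code shapes being compared. The function names are invented for this example, and whether a compiler actually emits a conditional move for either form depends on the compiler, flags and target, so treat it as an illustration of a data-dependent conditional rather than a benchmark. Note that both versions only read their input; neither introduces the extra writes to shared data that started this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching form: whether the `if` is taken depends on the data itself. */
static int64_t sum_above_branch(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);

    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Branch-free form: the comparison becomes a mask of all ones or all zeros. */
static int64_t sum_above_select(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);

    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] > threshold); /* -1 if true, 0 if false */
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}
```

On random input the first loop's branch is hard to predict, which is the case described above where replacing it with a data dependency can pay off. On sorted or otherwise predictable input the branching form may well be faster.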
Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree
Where did this patch go? I didn't get notified that anyone dropped it, but I don't see it in current -git.

[EMAIL PROTECTED] wrote:
> The patch titled
>      MMCONFIG: validate against ACPI motherboard resources
> has been removed from the -mm tree. Its filename was
>      mmconfig-validate-against-acpi-motherboard-resources.patch
>
> This patch was dropped because it was merged into mainline or a subsystem tree
>
> --
> Subject: MMCONFIG: validate against ACPI motherboard resources
> From: Robert Hancock <[EMAIL PROTECTED]>
>
> This patch adds validation of the MMCONFIG table against the ACPI reserved
> motherboard resources. If the MMCONFIG table is found to be reserved in
> ACPI, we don't bother checking the E820 table. The PCI Express firmware
> spec apparently tells BIOS developers that reservation in ACPI is required
> and E820 reservation is optional, so checking against ACPI first makes
> sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though
> it is perfectly functional; the existing check needlessly disables MMCONFIG
> in these cases.
>
> In order to do this, MMCONFIG setup has been split into two phases. If PCI
> configuration type 1 is not available then MMCONFIG is enabled early as
> before. Otherwise, it is enabled later after the ACPI interpreter is
> enabled, since we need to be able to execute control methods in order to
> check the ACPI reserved resources. Presently this is just triggered off
> the end of ACPI interpreter initialization.
>
> There are a few other behavioral changes here:
>
> - Validate all MMCONFIG configurations provided, not just the first one.
>
> - Validate that the entire required length of each configuration, according
>   to the provided ending bus number, is reserved, not just the minimum
>   required allocation.
>
> - Validate that the area is reserved even if we read it from the chipset
>   directly and not from the MCFG table. This catches the case where the
>   BIOS didn't set the location properly in the chipset and has mapped it
>   over other things it shouldn't have.
>
> This also cleans up the MMCONFIG initialization functions so that they
> simply do nothing if MMCONFIG is not compiled in.
>
> Based on an original patch by Rajesh Shah from Intel.
>
> [EMAIL PROTECTED]: many fixes and cleanups]
> Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
> Cc: Rajesh Shah <[EMAIL PROTECTED]>
> Cc: Jesse Barnes <[EMAIL PROTECTED]>
> Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Greg KH <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  arch/i386/pci/init.c            |    4
>  arch/i386/pci/mmconfig-shared.c |  151 ++
>  arch/i386/pci/pci.h             |    1
>  drivers/acpi/bus.c              |    2
>  include/linux/pci.h             |    8 +
>  5 files changed, 144 insertions(+), 22 deletions(-)
>
> diff -puN arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/init.c
> --- a/arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/arch/i386/pci/init.c
> @@ -11,9 +11,7 @@ static __init int pci_access_init(void)
>  #ifdef CONFIG_PCI_DIRECT
>  	type = pci_direct_probe();
>  #endif
> -#ifdef CONFIG_PCI_MMCONFIG
> -	pci_mmcfg_init(type);
> -#endif
> +	pci_mmcfg_early_init(type);
>  	if (raw_pci_ops)
>  		return 0;
>  #ifdef CONFIG_PCI_BIOS
> diff -puN arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/mmconfig-shared.c
> --- a/arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-motherboard-resources
> +++ a/arch/i386/pci/mmconfig-shared.c
> @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
>  	pci_mmcfg_resources_inserted = 1;
>  }
>
> -static void __init pci_mmcfg_reject_broken(int type)
> +static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
> +					      void *data)
> +{
> +	struct resource *mcfg_res = data;
> +	struct acpi_resource_address64 address;
> +	acpi_status status;
> +
> +	if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
> +		struct acpi_resource_fixed_memory32 *fixmem32 =
> +			&res->data.fixed_memory32;
> +		if (!fixmem32)
> +			return AE_OK;
> +		if ((mcfg_res->start >= fixmem32->address) &&
> +		    (mcfg_res->end < (fixmem32->address +
> +				      fixmem32->address_length))) {
> +			mcfg_res->flags = 1;
> +			return AE_CTRL_TERMINATE;
> +		}
> +	}
> +	if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
> +	    (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
> +		return AE_OK;
> +
> +	status = acpi_resource_to_address64(res, &address);
> +	if (ACPI_FAILURE(status) ||
> +	    (address.address_length <= 0) ||
> +	    (address.resource_type != ACPI_MEMORY_RANGE))
> +		return AE_OK;
> +
> +	if ((mcfg_res->start
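The "entire required length" rule from the changelog and the containment comparison in check_mcfg_resource() can be illustrated outside the kernel. The following is only a sketch under stated assumptions: struct reserved_range, mmcfg_required_end() and range_is_reserved() are hypothetical names invented for this illustration, the base address is assumed to correspond to bus 0, and each bus is assumed to need 1 MiB of configuration space (32 devices x 8 functions x 4 KiB). It is not the code that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved range reported by firmware. */
struct reserved_range {
	uint64_t start;   /* first byte of the range */
	uint64_t length;  /* size of the range in bytes */
};

/* Assumed: 1 MiB of MMCONFIG space per bus (32 devs x 8 funcs x 4 KiB). */
#define MMCFG_BYTES_PER_BUS (1ULL << 20)

/*
 * Last byte needed to cover buses 0..end_bus, assuming "base" is the
 * address of bus 0.
 */
static uint64_t mmcfg_required_end(uint64_t base, unsigned int end_bus)
{
	return base + ((uint64_t)end_bus + 1) * MMCFG_BYTES_PER_BUS - 1;
}

/*
 * True if [start, end] lies entirely inside a single reserved range.
 * This uses the same comparison shape as the quoted patch:
 * start >= range start, and end < range start + range length.
 */
static bool range_is_reserved(uint64_t start, uint64_t end,
			      const struct reserved_range *r, size_t n)
{
	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		if (start >= r[i].start && end < r[i].start + r[i].length)
			return true;
	}
	return false;
}
```

With a base of 0xE0000000 and an ending bus number of 255, the required window is 256 MiB, so a firmware reservation of only 64 MiB at the same base would fail the check even though it covers the minimum allocation.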
Re: Is gcc thread-unsafe?
Arjan van de Ven wrote:
> On Wed, 24 Oct 2007 21:29:56 -0700 David Schwartz [EMAIL PROTECTED] wrote:
>>> Well that's exactly right. For threaded programs (and maybe even
>>> real-world non-threaded ones in general), you don't want to be even
>>> _reading_ global variables if you don't need to. Cache misses and
>>> cacheline bouncing could easily cause performance to completely tank
>>> in some cases while only gaining a cycle or two in microbenchmarks
>>> for doing these funny x86 predication things.
>>
>> For some CPUs, replacing a conditional branch with a conditional move
>> is a *huge* win because it cannot be mispredicted.
>
> please name one...
> Hint: It's not one made by either Intel or AMD in the last 4 years...

It is a win if the branch cannot be effectively predicted, i.e. if the outcome is essentially random, as may occur with data-dependent conditionals. I've seen a doubling of performance on one workload using a predicated instruction instead of a branch on newer Xeons in such a case. I suspect that if branch prediction fails often, the data dependency created by the cmov, etc. is less expensive than the pipeline flush required by mispredicts.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
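To make the trade-off concrete, here is a small sketch of a data-dependent conditional written two ways. The function names are made up for this illustration, and nothing here is taken from the workload mentioned above. Whether a compiler emits a conditional jump, a cmov, or something else for either version depends on the compiler, flags and target, so this only shows the two source-level shapes and is not a benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the condition depends on the data, so with random
 * input the branch outcome is essentially unpredictable. */
static int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold)
{
	int64_t sum = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (v[i] > threshold)
			sum += v[i];
	}
	return sum;
}

/* Branch-free form: turn the comparison into an all-ones or all-zeros
 * mask and always perform the add. This trades a possible mispredict
 * for a data dependency on the comparison result. */
static int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold)
{
	int64_t sum = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int64_t mask = -(int64_t)(v[i] > threshold); /* 0 or -1 */
		sum += (int64_t)v[i] & mask;
	}
	return sum;
}
```

Both functions return the same value for any input; only the control-flow shape differs, which is the property the thread is arguing about.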
Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree
Greg KH wrote:
> On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
>> I think Greg doesn't like it, even though we don't have an alternative
>> at this point...
>
> Yes, I didn't like it, Ivan didn't like it, and I got reports that it
> wasn't even needed at all once you upgraded your BIOS to the latest
> version.
>
> So, is this still needed? And if so, can you try to implement what Ivan
> suggested to do here instead?

Aren't you guys referring to pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another one entirely, though related.
Re: HIGHMEM64G Kernel (2.6.23.1) makes system crawl
Rajkumar S wrote:
> On 10/24/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
>> Rajkumar S wrote:
>>> Hello,
>>>
>>> I am using a Core 2 Duo E6750 CPU on an Intel DG33FB motherboard
>>> with 4GB RAM, running Debian Lenny. Since the box has 4 GB RAM I
>>> compiled a big mem kernel, but the machine is very slow while running
>>> the big mem kernel. It takes about 37 minutes to compile the Intel
>>> e1000 driver (e1000-7.6.5.tar.gz) from the Intel site. But it's
>>> performing normally when using a non big mem kernel. The diff of the
>>> .config between working and non working is as follows.
>>
>> Post your contents of /proc/mtrr. Likely a BIOS bug which has been seen
>> on a number of Intel boards, which doesn't mark all of RAM as cachable.
>
> I have upgraded the BIOS to the latest (v. 0293 October 02, 2007).
>
> Previously the /proc/mtrr was:
>
> ravanan:~# cat /proc/mtrr
> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
> reg03: base=0xcf800000 (3320MB), size=   8MB: uncachable, count=1
> reg04: base=0xcf600000 (3318MB), size=   2MB: uncachable, count=1
> reg05: base=0xcf500000 (3317MB), size=   1MB: uncachable, count=1
> reg06: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
> reg07: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
>
> Now after upgrading the BIOS it's:
>
> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
> reg03: base=0xcf800000 (3320MB), size=   8MB: uncachable, count=1
> reg04: base=0xcf400000 (3316MB), size=   4MB: uncachable, count=1
> reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
> reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1

Yup, it's a BIOS bug. Your BIOS only marks RAM up to a physical address of 4736MB as cacheable, while the actual RAM reported by the BIOS goes up to physical address 4800MB. I think we had a patch in -mm to detect this case and disable the extra memory (64MB in this case) to keep the kernel from using it.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
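The 4736MB figure follows from the last two write-back entries (4096MB + 512MB, then 4608MB + 128MB). As a rough sketch of that arithmetic only, the helper below walks a table of MTRR entries upward from a starting address and reports where contiguous write-back coverage ends. struct mtrr_entry and writeback_top_mb() are hypothetical names for this illustration, sizes are kept in MB for readability, and this is not the detection patch referred to above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one /proc/mtrr line (units: MB). */
struct mtrr_entry {
	uint64_t base_mb;
	uint64_t size_mb;
	int write_back;   /* 1 = write-back, 0 = uncachable */
};

/*
 * Starting at start_mb, follow write-back entries that cover the
 * current position and return the first address (in MB) that no
 * write-back entry covers.
 */
static uint64_t writeback_top_mb(const struct mtrr_entry *e, size_t n,
				 uint64_t start_mb)
{
	uint64_t top = start_mb;
	int extended = 1;

	assert(e != NULL || n == 0);
	while (extended) {
		extended = 0;
		for (size_t i = 0; i < n; i++) {
			uint64_t end = e[i].base_mb + e[i].size_mb;

			if (e[i].write_back && e[i].base_mb <= top && end > top) {
				top = end;
				extended = 1;
			}
		}
	}
	return top;
}
```

Fed the post-upgrade table and a start of 4096MB, this returns 4736; subtracting that from the 4800MB top of RAM gives the 64MB that is left uncached.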
Re: test_and_set_bit and friends ?
Mark Hounschell wrote:
> Mark Hounschell wrote:
>> These calls apparently are gone. Can someone tell me why and what are
>> the replacements.
>>
>> Thanks in advance
>> Mark
>
> I got no response from the glibc people on this and the kernel-newbies
> list appears dead so I thought I'd try here since these calls are/were
> actually kernel based to begin with.
>
> Why are they no longer available in user space and what is one supposed
> to use now?

In general, none of the kernel synchronization-type functions should have been used in userspace since they often depend on infrastructure which is only in the kernel in order to work properly. The new headers installation system strips out any code intended for kernel use only from the userspace-visible headers.

(Not to mention the licensing issues - the kernel is GPL, not LGPL, so only GPL programs could have legally done so in the first place.)

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
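The thread does not name a replacement. One possible userspace approach, offered here purely as a sketch and not as what anyone in the thread recommended, is to build the operation on the C11 <stdatomic.h> interfaces, which postdate this discussion and need a C11-capable compiler. The helper names user_test_and_set_bit() and user_test_and_clear_bit() are hypothetical and operate on a single word, unlike the kernel functions, which index into a bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in *word; return whether it was already set. */
static bool user_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < BITS_PER_WORD);
	mask = 1UL << nr;
	/* atomic_fetch_or returns the value held before the OR. */
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Atomically clear bit nr in *word; return whether it was set before. */
static bool user_test_and_clear_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < BITS_PER_WORD);
	mask = 1UL << nr;
	/* atomic_fetch_and returns the value held before the AND. */
	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}
```

The first call to user_test_and_set_bit() on a clear bit returns false and later calls return true, so exactly one caller among several racing threads sees false.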
Re: HIGHMEM64G Kernel (2.6.23.1) makes system crawl
Rajkumar S wrote:
> Hello,
>
> I am using a Core 2 Duo E6750 CPU on an Intel DG33FB motherboard with
> 4GB RAM, running Debian Lenny. Since the box has 4 GB RAM I compiled a
> big mem kernel, but the machine is very slow while running the big mem
> kernel. It takes about 37 minutes to compile the Intel e1000 driver
> (e1000-7.6.5.tar.gz) from the Intel site. But it's performing normally
> when using a non big mem kernel. The diff of the .config between working
> and non working is as follows.

Post your contents of /proc/mtrr. Likely a BIOS bug which has been seen on a number of Intel boards, which doesn't mark all of RAM as cachable. When the top memory starts being used with the bigmem kernel it causes a major slowdown. Check for a BIOS update from Intel, first.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/