Re: Question regarding mutex locking

2007-11-27 Thread Robert Hancock

Larry Finger wrote:

If a particular routine needs to lock a mutex, but it may be entered with that
mutex already locked, would the following code be SMP safe?

hold_lock = mutex_trylock()

...

if (hold_lock)
        mutex_unlock()


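Not if another task could be acquiring that lock at the same time, which 
is probably the case; otherwise you wouldn't need the mutex. When the 
trylock fails you cannot tell whether your own caller or some other task 
holds the lock, so you would carry on unprotected in the second case. In 
other words, if you're going to do this, you might as well drop the mutex 
entirely, as the effect is about the same.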
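
The usual way around this is to split the routine into a variant that expects the lock to be held and a wrapper that takes it. A rough userspace sketch of that pattern follows, using pthreads rather than the kernel mutex API; the counter and all function names are made up for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical shared state guarded by counter_lock. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Variant for callers that already hold counter_lock. */
void counter_add_locked(int n)
{
	/* Debug check only: EBUSY says the lock is held, not who holds it. */
	assert(pthread_mutex_trylock(&counter_lock) == EBUSY);
	counter += n;
}

/* Variant for callers that do not hold the lock. */
void counter_add(int n)
{
	pthread_mutex_lock(&counter_lock);
	counter_add_locked(n);
	pthread_mutex_unlock(&counter_lock);
}

/* A larger operation that keeps the lock across several updates. */
int counter_add_twice_and_read(int n)
{
	int value;

	pthread_mutex_lock(&counter_lock);
	counter_add_locked(n);
	counter_add_locked(n);
	value = counter;
	pthread_mutex_unlock(&counter_lock);
	return value;
}
```

Each caller knows statically whether it holds the lock, so it picks the matching variant and nothing has to guess at run time.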


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Dynticks Causing High Context Switch Rate in ksoftirqd

2007-11-27 Thread Robert Hancock

[EMAIL PROTECTED] wrote:

Hello Robert,

I've attached additional detail on the config of the misbehaving system
including output from oprofile and PowerTop. PowerTop output leads me to
believe that maybe this is an interaction between my bridged ethernet
setup and dynticks? Hmmm...


I don't know about that. Your top wakeups are from br_stp_enable_bridge, 
but that is only 26 a second, which doesn't explain a context switch 
rate of 150,000 a second.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Dynticks Causing High Context Switch Rate in ksoftirqd

2007-11-26 Thread Robert Hancock

[EMAIL PROTECTED] wrote:

Question: Why is ksoftirqd eating about 5 to 10 percent of my CPU on an idle
system? The problem occurs if I config the kernel with tickless
support (i.e. CONFIG_TICK_ONESHOT=y).  (Thanks to "oprofile" for putting me
onto this.)

I have noted this same problem on kernel versions: 2.6.23.1, 2.6.23.8 and
2.6.23.9

**
*** Output from "vmstat -n 1 10" -- Note very high context switch rate ***
*** This is on an idle machine! ***
**

procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa
 0  0  0 1925556   4768 11610400   124 26  7538  1  2 96  1
 0  0  0 1925556   4768 11610400 0 02 147329  0  1 99  0


What did oprofile show? It should be able to narrow down which 
function(s) are responsible for the CPU usage.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v3)

2007-11-25 Thread Robert Hancock
We need to run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode; otherwise the driver may try to use the legacy
DMA engine in ADMA mode, which is not allowed. Enforce this with BUG_ON() since
data corruption could potentially result if this happened. Also, fail any
attempt to issue NCQ commands with result taskfile requested, since the
hardware doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-25 16:28:58.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-25 16:31:09.0 -0600
@@ -792,11 +792,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-   /* Since commands where a result TF is requested are not
-  executed in ADMA mode, the only time this function will be called
-  in ADMA mode will be if a command fails. In this case we
-  don't care about going into register mode with ADMA commands
-  pending, as the commands will all shortly be aborted anyway. */
+   /* Other than when internal or pass-through commands are executed,
+  the only time this function will be called in ADMA mode will be
+  if a command fails. In the failure case we don't care about going
+  into register mode with ADMA commands pending, as the commands will
+  all shortly be aborted anyway. We assume that NCQ commands are not
+  issued via passthrough, which is the only way that switching into
+  ADMA mode could abort outstanding commands. */
nv_adma_register_mode(ap);
 
ata_tf_read(ap, tf);
@@ -1379,11 +1381,9 @@
struct nv_adma_port_priv *pp = qc->ap->private_data;
 
/* ADMA engine can only be used for non-ATAPI DMA commands,
-  or interrupt-driven no-data commands, where a result taskfile
-  is not required. */
+  or interrupt-driven no-data commands. */
if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-  (qc->tf.flags & ATA_TFLAG_POLLING) ||
-  (qc->flags & ATA_QCFLAG_RESULT_TF))
+  (qc->tf.flags & ATA_TFLAG_POLLING))
return 1;
 
if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1401,6 +1401,8 @@
   NV_CPB_CTL_IEN;
 
if (nv_adma_use_reg_mode(qc)) {
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
ata_qc_prep(qc);
return;
@@ -1445,9 +1447,21 @@
 
VPRINTK("ENTER\n");
 
+   /* We can't handle result taskfile with NCQ commands, since
+  retrieving the taskfile switches us out of ADMA mode and would abort
+  existing commands. */
+   if (unlikely(qc->tf.protocol == ATA_PROT_NCQ &&
+(qc->flags & ATA_QCFLAG_RESULT_TF))) {
+   ata_dev_printk(qc->dev, KERN_ERR,
+   "NCQ w/ RESULT_TF not allowed\n");
+   return AC_ERR_SYSTEM;
+   }
+
if (nv_adma_use_reg_mode(qc)) {
/* use ATA register mode */
VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
return ata_qc_issue_prot(qc);
} else
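
The issue-path decision this patch ends up with can be modelled outside the kernel roughly as below. This is only a sketch: the flag names, the enum and the struct are hypothetical stand-ins for the libata ones, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's flags; values are arbitrary. */
enum {
	CMD_DMA_MAPPED = 1 << 0,	/* models ATA_QCFLAG_DMAMAP */
	CMD_RESULT_TF  = 1 << 1,	/* models ATA_QCFLAG_RESULT_TF */
	CMD_POLLING    = 1 << 2,	/* models ATA_TFLAG_POLLING */
	CMD_NCQ        = 1 << 3,	/* models tf.protocol == ATA_PROT_NCQ */
	CMD_NODATA     = 1 << 4,	/* models an interrupt-driven no-data command */
};

enum issue_path { ISSUE_ADMA, ISSUE_REGISTER, ISSUE_REJECT };

struct model_cmd {
	unsigned int flags;
};

enum issue_path choose_issue_path(bool port_is_atapi,
				  const struct model_cmd *cmd)
{
	bool use_reg;

	/* Reading back the taskfile leaves ADMA mode, which NCQ cannot survive. */
	if ((cmd->flags & CMD_NCQ) && (cmd->flags & CMD_RESULT_TF))
		return ISSUE_REJECT;

	/* A requested result taskfile no longer forces register mode. */
	use_reg = port_is_atapi || (cmd->flags & CMD_POLLING) ||
		  !(cmd->flags & (CMD_DMA_MAPPED | CMD_NODATA));
	if (!use_reg)
		return ISSUE_ADMA;

	/* Models the BUG_ON(): no legacy DMA while the port is in ADMA mode. */
	assert(port_is_atapi || !(cmd->flags & CMD_DMA_MAPPED));
	return ISSUE_REGISTER;
}
```

In this model a DMA command with a result taskfile requested on a non-ATAPI port goes through ADMA, while the same command with NCQ is rejected up front.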


Re: 2.6.24: Serial disabled in BIOS but serial modules still loaded (probably PnP related)

2007-11-25 Thread Robert Hancock

Andrey Borzenkov wrote:

I have no COM port on notebook (without port replicator which I do not have)
so COM is disabled in BIOS. No ttyS* is detected during boot (and no device
created) but I just noticed that serial modules are still loaded. Well, this
partially defeats the purpose of disabling COM port - the intention was to
free resources by *not* loading unneeded modules ...

This may have something to do with (ACPI) PnP which apparently believes COM is 
alive.
Notebook is Toshiba Portege 4000.


Probably a BIOS bug. It still lists the port in PnP data even though the 
hardware is disabled, so the kernel still tries to load the serial 
driver for it, which finds there's no port there.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: forcedeth ethernet driver & Low power state

2007-11-25 Thread Robert Hancock

Jeroen wrote:

Hi,

I'm migrating my server from Windows 2003 Server to Ubuntu, but I am
stumbling over the "Low Power State Link Speed" option for my NIC
(forcedeth).

I need to disable this option in my Windows driver, otherwise the throughput is
horrible because the link fluctuates constantly between 100 and 1000.

Anyway, my question is where and how can I turn off this feature for the
forcedeth driver? I've looked in the source and as far as I can tell there is no
boot option for this. There are some references noted in the code, but AFAIK
no setting.

Any ideas? Thanks in advance!


Are you sure forcedeth even supports that feature? I haven't seen any 
code for it, and certainly it should never be enabled by default.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

2007-11-25 Thread Robert Hancock

peerchen wrote:

According to the HyperTransport spec, the 'En' bit indicates whether the MSI 
Mapping is active, so it should be set when MSI is enabled.

The patch is based on kernel 2.6.24-rc3.

Signed-off-by: Andy Currid <[EMAIL PROTECTED]>
Signed-off-by: Peer Chen <[EMAIL PROTECTED]>


Isn't there a way we can make this work for any upstream HT bridge, 
rather than only for specific NVIDIA chipsets?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Robert Hancock

Jeff Garzik wrote:

Robert Hancock wrote:
Based on a quick look at sata_mv it appears it sets a 64-bit DMA mask 
unconditionally, but for non-ATA_PROT_DMA commands (which includes all 
ATAPI), it just falls back to ata_qc_issue_prot which issues via the 
legacy SFF interface and can only handle 32-bit addressing. So yes, it 
appears to have a similar bug as sata_nv had.



sata_mv doesn't do ATAPI at all...


Right, I missed that ATA_FLAG_NO_ATAPI. So these issues Tom is reporting 
are just with a normal SATA hard drive?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-23 Thread Robert Hancock

Mark Lord wrote:

Morrison, Tom wrote:

I am hopeful that the sata_mv has this bug (I proved that the
problem I was experiencing was due to the sata_mv driver with 3.75Gig 
or more of memory)...

I am on vacation for a week or more ...or I'd tell you today
if it did have this bug!

..

Yeah, I kind of had your reports in mind when I asked that.  :)

On a related note, I now have lots of Marvell (sata_mv) hardware here,
and an Intel CPU/chipset box with physical RAM above the 4GB boundary.


Based on a quick look at sata_mv, it appears to set a 64-bit DMA mask 
unconditionally, but for non-ATA_PROT_DMA commands (which includes all 
ATAPI) it just falls back to ata_qc_issue_prot, which issues via the 
legacy SFF interface and can only handle 32-bit addressing. So yes, it 
appears to have a bug similar to the one sata_nv had.


It likely needs a similar slave_config trick to change the bounce limit 
depending on the connected device, unless there really is a way to issue 
ATAPI commands with this EDMA interface, as the TODO list in sata_mv.c 
suggests may be possible.
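
To make the bounce limit idea concrete, here is a small standalone model of the choice. It is only a sketch: the function names and the mask macro are hypothetical, and a real driver would apply the limit to the request queue in its slave_config hook rather than compute it like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the 32-bit addressing limit of the legacy path. */
#define MODEL_DMA_MASK_32 UINT64_C(0xffffffff)

/* Devices served through the legacy interface must stay below 4GB. */
uint64_t pick_bounce_limit(bool device_is_atapi, uint64_t controller_dma_mask)
{
	if (device_is_atapi && controller_dma_mask > MODEL_DMA_MASK_32)
		return MODEL_DMA_MASK_32;
	return controller_dma_mask;
}

/* True if any byte of the buffer lies above the limit and must be bounced. */
bool needs_bounce(uint64_t buf_addr, uint64_t len, uint64_t bounce_limit)
{
	assert(len > 0);
	return buf_addr + len - 1 > bounce_limit;
}
```

With a 64-bit controller mask, a buffer at the 4GB boundary is bounced for an ATAPI device but passed straight through for a hard drive.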


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)

2007-11-22 Thread Robert Hancock
We need to run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode; otherwise the driver may try to use the legacy
DMA engine in ADMA mode, which is not allowed. Enforce this with BUG_ON() since
data corruption could potentially result if this happened. Also, WARN_ON() if
we try to send result taskfile commands while NCQ commands are still active,
since the hardware doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1/drivers/ata/sata_nv.c 2007-11-20 17:40:09.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-22 19:40:58.0 -0600
@@ -791,11 +791,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-   /* Since commands where a result TF is requested are not
-  executed in ADMA mode, the only time this function will be called
-  in ADMA mode will be if a command fails. In this case we
-  don't care about going into register mode with ADMA commands
-  pending, as the commands will all shortly be aborted anyway. */
+   /* Other than when internal or pass-through commands are executed,
+  the only time this function will be called in ADMA mode will be
+  if a command fails. In the failure case we don't care about going
+  into register mode with ADMA commands pending, as the commands will
+  all shortly be aborted anyway. We assume that NCQ commands are not
+  issued via passthrough, which is the only way that switching into
+  ADMA mode could abort outstanding commands. */
nv_adma_register_mode(ap);
 
ata_tf_read(ap, tf);
@@ -1359,11 +1361,9 @@
struct nv_adma_port_priv *pp = qc->ap->private_data;
 
/* ADMA engine can only be used for non-ATAPI DMA commands,
-  or interrupt-driven no-data commands, where a result taskfile
-  is not required. */
+  or interrupt-driven no-data commands. */
if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-  (qc->tf.flags & ATA_TFLAG_POLLING) ||
-  (qc->flags & ATA_QCFLAG_RESULT_TF))
+  (qc->tf.flags & ATA_TFLAG_POLLING))
return 1;
 
if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1381,6 +1381,8 @@
   NV_CPB_CTL_IEN;
 
if (nv_adma_use_reg_mode(qc)) {
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
ata_qc_prep(qc);
return;
@@ -1425,9 +1427,17 @@
 
VPRINTK("ENTER\n");
 
+   /* We can't handle result taskfile with NCQ commands active, since
+  retrieving the taskfile switches us out of ADMA mode and would abort
+  existing commands. */
+   WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
+   (qc->ap->qc_allocated & ~(1 << qc->tag)));
+
if (nv_adma_use_reg_mode(qc)) {
/* use ATA register mode */
VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
return ata_qc_issue_prot(qc);
} else


[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-22 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-22 19:42:28.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-22 19:48:25.0 -0600
@@ -247,6 +247,7 @@
void __iomem*ctl_block;
void __iomem*gen_block;
void __iomem*notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -748,7 +749,7 @@
adma_enable = 0;
nv_adma_register_mode(ap);
} else {
-   bounce_limit = *ap->dev->dma_mask;
+   bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
@@ -1134,10 +1135,20 @@
void *mem;
dma_addr_t mem_dma;
void __iomem *mmio;
+   struct pci_dev *pdev = to_pci_dev(dev);
u16 tmp;
 
VPRINTK("ENTER\n");
 
+   /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+  pad buffers */
+   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc)
+   return rc;
+   rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc)
+   return rc;
+
rc = ata_port_start(ap);
if (rc)
return rc;
@@ -1153,6 +1164,15 @@
pp->notifier_clear_block = pp->gen_block +
   NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+   /* Now that the legacy PRD and padding buffer are allocated we can
+  safely raise the DMA mask to allocate the CPB/APRD table.
+  These are allowed to fail since we store the value that ends up
+  being used to set as the bounce limit in slave_config later if
+  needed. */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+   pp->adma_dma_mask = *dev->dma_mask;
+
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
  &mem_dma, GFP_KERNEL);
if (!mem)
@@ -2414,12 +2434,6 @@
hpriv->type = type;
host->private_data = hpriv;
 
-   /* set 64bit dma masks, may fail */
-   if (type == ADMA) {
-   if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-   pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-   }
-
/* request and iomap NV_MMIO_BAR */
rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
if (rc)
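
The ordering this patch relies on can be illustrated with a toy model. Everything below is hypothetical: the allocator deliberately returns the highest address the current mask allows (the worst case), and raising the mask is assumed to always succeed, which the real driver does not assume.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MASK_32 UINT64_C(0xffffffff)
#define MODEL_MASK_64 UINT64_MAX

struct model_dev {
	uint64_t dma_mask;
};

struct model_port {
	uint64_t legacy_prd;	/* must stay below 4GB for legacy mode */
	uint64_t adma_cpb;	/* may live anywhere the ADMA engine can reach */
	uint64_t adma_dma_mask;	/* remembered for the bounce limit later */
};

/* Worst-case toy allocator: highest block that still fits under the mask. */
uint64_t model_alloc(const struct model_dev *dev, uint64_t size)
{
	assert(size > 0 && size - 1 <= dev->dma_mask);
	return dev->dma_mask - size + 1;
}

void model_port_start(struct model_dev *dev, struct model_port *pp)
{
	/* 32-bit mask first, so the legacy buffers land below 4GB. */
	dev->dma_mask = MODEL_MASK_32;
	pp->legacy_prd = model_alloc(dev, 4096);

	/* Only now raise the mask for the ADMA-only structures. */
	dev->dma_mask = MODEL_MASK_64;
	pp->adma_dma_mask = dev->dma_mask;
	pp->adma_cpb = model_alloc(dev, 4096);
}
```

Swapping the two halves of model_port_start() would put the legacy buffer above 4GB in this model, which is the failure the patch is avoiding.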


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread Robert Hancock

Daniel Drake wrote:

Being spoilt by the luxuries of i386/x86_64 I've never really had a good
grasp on unaligned memory access problems on other architectures and decided
it was time to figure it out. As a result I've written this documentation
which I plan to submit for inclusion as
Documentation/unaligned_memory_access.txt

Before I do so, any comments on the following?


...


You may be wondering why you have never seen these problems on your own
architecture. Some architectures (such as i386 and x86_64) do not have this
limitation, but nevertheless it is important for you to write portable code
that works everywhere.


Also, x86 doesn't prohibit unaligned accesses, but I believe they have a 
significant performance cost and are best avoided where possible.
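
For the document itself, a short portable example might help. The sketch below is plain userspace C with made-up function names; in kernel code the get_unaligned() and put_unaligned() helpers serve the same purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a host-endian 32-bit value from an address of any alignment. */
uint32_t read_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	/* memcpy makes no alignment assumption; compilers usually inline it. */
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Read a little-endian 32-bit field byte by byte, on any host. */
uint32_t read_le32_unaligned(const unsigned char *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Casting the pointer to uint32_t * and dereferencing it is the pattern to avoid, since that is what traps or misbehaves on architectures with strict alignment.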


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Where is the interrupt going?

2007-11-22 Thread Robert Hancock

[EMAIL PROTECTED] wrote:


I tried the hammer and the problem persists.
[EMAIL PROTECTED]:~$ cat /proc/cmdline
root=UUID=8b3c3666-22c3-4c04-b399-ece266f2ef30 ro noapic quiet splash

However, I reserve the right to try the hammer again in the future. When 
I look at /proc/interrupts without the APIC:

[EMAIL PROTECTED]:~$ cat /proc/interrupts
           CPU0
  0:        144    XT-PIC-XT        timer
  1:         10    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  5:         10    XT-PIC-XT        ohci_hcd:usb5, mxser
  6:          5    XT-PIC-XT        floppy
  7:          1    XT-PIC-XT        parport0
  8:          3    XT-PIC-XT        rtc
  9:          1    XT-PIC-XT        acpi, uhci_hcd:usb2
 10:         10    XT-PIC-XT        ohci_hcd:usb4, ehci_hcd:usb6, [EMAIL PROTECTED]::01:00.0
 11:       2231    XT-PIC-XT        uhci_hcd:usb1, ohci_hcd:usb3, eth0
 12:        130    XT-PIC-XT        i8042
 14:       4362    XT-PIC-XT        libata
 15:      15315    XT-PIC-XT        libata
NMI:          0
LOC:     130125
ERR:          0
MIS:          0

I do not even see the device that I registered unless it is that r128... 
line. However the code printed out in /var/log/messages:

Nov 22 16:05:27 bbb kernel: [  104.712473] apc8620: VID = 0x10B5
Nov 22 16:05:27 bbb kernel: [  104.712486] apc8620: mapped addr = e0bd4000
Nov 22 16:05:27 bbb kernel: [  104.713022] apc8620: registered carrier 0
Nov 22 16:05:27 bbb kernel: [  104.713028] apc8620: interrupt data (0xe1083e40) on irq (10) and status (0x10)


which indicates it successfully registered without being shared. When I 
have more time, I will change the code to use a shared IRQ and try 
noapic again.


You're not calling pci_enable_device anywhere. Unless you do this before 
requesting the IRQ, the IRQ routing may not be set up properly for your 
device and it may not even give you the right IRQ number. You should see 
a line like this somewhere in dmesg for the IRQ your card is on:


ACPI: PCI Interrupt :00:1f.2[D] -> GSI 19 (level, low) -> IRQ 17

I think this behavior changed in the somewhat recent past..

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Use of mutex in interrupt context flawed/impossible, need advice.

2007-11-22 Thread Robert Hancock

Leon Woestenberg wrote:

Hello,


I'm converting an out-of-tree (*1) driver from binary semaphore to mutex.

Userspace updates a look-up-table using write(). The driver tries to
write this LUT to the FPGA in the (video frame) interrupt handler. It
is important that the LUT is consistent and thus changed atomically.
Note that it is not important that the LUT is updated each interrupt.

The current approach is to try-down()ing a binary semaphore in
interrupt context, and write the LUT to the FPGA if the semaphore was
down()ed, do nothing else.
The write() down()s the semaphore as well before updating the
in-driver-copy of the LUT, then up()s it again.

I understand this design is not clean (*2), and not even possible with
mutexes, as mutex_trylock() is not interrupt safe.

My current approach would be to have userspace write into a shadow
copy, and use a spinlock to update the live copy. The interrupt then
would try a spinlock.


Unless this update into the FPGA takes a significant amount of time, I 
wouldn't bother with that complexity - just do spin_lock_irq/irqsave on 
that spinlock.


Using a trylock for this rather sucks since the behavior is entirely 
non-deterministic. It could take a really long time in some cases for 
the trylock to ever succeed.
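
To make the "just take the lock" suggestion concrete, here is a user-space analogy only: a C11 atomic_flag stands in for the kernel spinlock, and a plain array stands in for the FPGA registers. All names and the table size are invented for the sketch. In a real driver the write() path would presumably use spin_lock_irqsave()/spin_unlock_irqrestore() so the interrupt handler cannot spin against a lock holder on the same CPU; that part cannot be modelled here.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define LUT_ENTRIES 256			/* hypothetical table size */

atomic_flag lut_lock = ATOMIC_FLAG_INIT;	/* stand-in for a spinlock_t */
uint16_t lut[LUT_ENTRIES];		/* in-driver copy of the LUT */
uint16_t fpga_lut[LUT_ENTRIES];		/* stand-in for the FPGA registers */

static void lut_lock_acquire(void)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&lut_lock,
						 memory_order_acquire))
		;
}

static void lut_lock_release(void)
{
	atomic_flag_clear_explicit(&lut_lock, memory_order_release);
}

/* write() path: replace the whole table while holding the lock. */
void lut_update(const uint16_t *new_lut)
{
	lut_lock_acquire();
	memcpy(lut, new_lut, sizeof(lut));
	lut_lock_release();
}

/* Interrupt path: always takes the lock rather than trying it, so each
 * frame is written from a complete, consistent table. */
void lut_flush_to_fpga(void)
{
	lut_lock_acquire();
	memcpy(fpga_lut, lut, sizeof(fpga_lut));
	lut_lock_release();
}
```

The lock is only held for one memcpy() on each side, which is the case where an unconditional lock is simpler and more predictable than a trylock.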




My feeling is that we have a  valid use of mutex_trylock() in
interrupt context; "i.e. update LUT if we can do so consistently and
in time, or not at all".

I would like to know why this is not so, and if someone has a cleaner
proposal than the "try spinlock" approach?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)

2007-11-22 Thread Robert Hancock

Tejun Heo wrote:

Hello, Robert.

Robert Hancock wrote:

This fixes some problems with ATAPI devices on nForce4 controllers in ADMA
mode on systems with memory located above 4GB. We need to delay setting the
64-bit DMA mask until the PRD table and padding buffer are allocated so that
they don't get allocated above 4GB and break legacy mode (which is needed for
ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the
legacy buffers since setting the DMA mask affects both ports and we need to
ensure the second port's buffers are allocated properly (fixes a problem
with the previous version of this patch).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

+   /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+  pad buffers */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));

[--snip--]

+   pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));


I'm probably being paranoid here but please add error checks.  Just
checking return value and returning error suffices.


In the 32-bit case, I'm pretty sure those are guaranteed not to fail 
because 32-bit is the default. For the 64-bit ones, we don't care if 
they fail, because then we'll just use whatever mask ends up being set 
(we store the actual set DMA mask in adma_dma_mask for use when we need 
to reconfigure the bounce limit). We definitely don't want to fail 
initialization if the 64-bit set doesn't succeed..
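
For readers unfamiliar with the masks being discussed, the following stand-alone sketch shows what the 32-bit and 64-bit values work out to and why a buffer above 4GB is a problem for the legacy engine. The macro is written the way the kernel defines DMA_BIT_MASK(); dma_addr_reachable() is a hypothetical helper for illustration and is not part of sata_nv or the DMA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Written like the kernel's DMA_BIT_MASK(): n == 64 is special-cased
 * because shifting a 64-bit value by 64 is undefined in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can a device limited to 'mask' address every byte
 * of the buffer [addr, addr + len)? 'len' must be non-zero. */
bool dma_addr_reachable(uint64_t addr, uint64_t len, uint64_t mask)
{
	uint64_t last = addr + len - 1;

	/* last < addr means the range wrapped around. */
	return last >= addr && last <= mask;
}
```

With a 32-bit mask the last reachable byte is 0xffffffff, so a PRD table or pad buffer placed at 0x100000000 or above could not be used in legacy mode; that is why the patch allocates those buffers before raising the mask.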



[PATCH] sata_nv: don't use legacy DMA in ADMA mode (v2)

2007-11-22 Thread Robert Hancock
We need to run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
in ADMA mode which is not allowed. Enforce this with BUG_ON() since data
corruption could potentially result if this happened. Also WARN_ON() if we try
and send result taskfile commands while NCQ commands are still active, since the
hardware doesn't allow this.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1/drivers/ata/sata_nv.c 2007-11-20 17:40:09.0 
-0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-22 
19:40:58.0 -0600
@@ -791,11 +791,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-   /* Since commands where a result TF is requested are not
-  executed in ADMA mode, the only time this function will be called
-  in ADMA mode will be if a command fails. In this case we
-  don't care about going into register mode with ADMA commands
-  pending, as the commands will all shortly be aborted anyway. */
+   /* Other than when internal or pass-through commands are executed,
+  the only time this function will be called in ADMA mode will be
+  if a command fails. In the failure case we don't care about going
+  into register mode with ADMA commands pending, as the commands will
+  all shortly be aborted anyway. We assume that NCQ commands are not
+  issued via passthrough, which is the only way that switching into
+  ADMA mode could abort outstanding commands. */
nv_adma_register_mode(ap);
 
ata_tf_read(ap, tf);
@@ -1359,11 +1361,9 @@
struct nv_adma_port_priv *pp = qc->ap->private_data;
 
/* ADMA engine can only be used for non-ATAPI DMA commands,
-  or interrupt-driven no-data commands, where a result taskfile
-  is not required. */
+  or interrupt-driven no-data commands. */
if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-  (qc->tf.flags & ATA_TFLAG_POLLING) ||
-  (qc->flags & ATA_QCFLAG_RESULT_TF))
+  (qc->tf.flags & ATA_TFLAG_POLLING))
return 1;
 
if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1381,6 +1381,8 @@
   NV_CPB_CTL_IEN;
 
if (nv_adma_use_reg_mode(qc)) {
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
ata_qc_prep(qc);
return;
@@ -1425,9 +1427,17 @@
 
VPRINTK("ENTER\n");
 
+   /* We can't handle result taskfile with NCQ commands active, since
+  retrieving the taskfile switches us out of ADMA mode and would abort
+  existing commands. */
+   WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) &&
+   (qc->ap->qc_allocated & ~(1 << qc->tag)));
+
if (nv_adma_use_reg_mode(qc)) {
/* use ATA register mode */
VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
return ata_qc_issue_prot(qc);
} else



[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v3)

2007-11-22 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-22 
19:42:28.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-22 
19:48:25.0 -0600
@@ -247,6 +247,7 @@
void __iomem*ctl_block;
void __iomem*gen_block;
void __iomem*notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -748,7 +749,7 @@
adma_enable = 0;
nv_adma_register_mode(ap);
} else {
-   bounce_limit = *ap->dev->dma_mask;
+   bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
@@ -1134,10 +1135,20 @@
void *mem;
dma_addr_t mem_dma;
void __iomem *mmio;
+   struct pci_dev *pdev = to_pci_dev(dev);
u16 tmp;
 
VPRINTK("ENTER\n");
 
+   /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+  pad buffers */
+   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc)
+   return rc;
+   rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc)
+   return rc;
+
rc = ata_port_start(ap);
if (rc)
return rc;
@@ -1153,6 +1164,15 @@
pp->notifier_clear_block = pp->gen_block +
   NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+   /* Now that the legacy PRD and padding buffer are allocated we can
+  safely raise the DMA mask to allocate the CPB/APRD table.
+  These are allowed to fail since we store the value that ends up
+  being used to set as the bounce limit in slave_config later if
+  needed. */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+   pp->adma_dma_mask = *dev->dma_mask;
+
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
  &mem_dma, GFP_KERNEL);
if (!mem)
@@ -2414,12 +2434,6 @@
hpriv->type = type;
host->private_data = hpriv;
 
-   /* set 64bit dma masks, may fail */
-   if (type == ADMA) {
-   if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-   pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-   }
-
/* request and iomap NV_MMIO_BAR */
rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
if (rc)



Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)

2007-11-21 Thread Robert Hancock

Vincent Fortier wrote:

Le mardi 20 novembre 2007 à 18:56 -0600, Robert Hancock a écrit :

This fixes some problems with ATAPI devices on nForce4 controllers in ADMA
mode on systems with memory located above 4GB. We need to delay setting the
64-bit DMA mask until the PRD table and padding buffer are allocated so that
they don't get allocated above 4GB and break legacy mode (which is needed for
ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the
legacy buffers since setting the DMA mask affects both ports and we need to
ensure the second port's buffers are allocated properly (fixes a problem
with the previous version of this patch).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>



Would this be worth sending to stable team for 2.6.22 & 2.6.23 ?


Likely (after it gets merged), those versions would have the same bug..


Re: 2.6.24-rc3: find complains about /proc/net

2007-11-20 Thread Robert Hancock

Eric W. Biederman wrote:

Could you elaborate a bit on how the semantics of returning the
wrong information are more useful?

In particular, if a thread does the logical equivalent of:
grep Pid: /proc/self/status, it always gets the tgid despite
having a different process id.


The POSIX-defined userspace concept of a PID requires that all threads 
appear to have the same PID. This is something that Linux didn't comply 
with under the old LinuxThreads implementation and was finally fixed 
with NPTL. This isn't a POSIX-defined interface, but I assume it's 
trying to be consistent with getpid(), etc.



How can that possibly be useful or correct?

From the kernel side I really think the current semantics of /proc/self
in the context of threads are a bug and confusing.  All of the kernel
developers' first reaction when this was pointed out was that this
is a regression.

If it is truly useful to user space we can preserve this API design
bug forever.  I just want to make certain we are not being bug
compatible without a good reason.

Currently we have several kernel side bugs with threaded
programs because /proc/self does not do the intuitive thing.  Unless
something has changed recently selinux will cause accesses by a
non-leader thread to fail when accessing files through /proc/self.

So far the more I look at the current /proc/self behavior the
more I am convinced it is broken, and useless.  Please help me see
where it is useful, so we can justify keeping it.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH 1/2] sata_nv: don't use legacy DMA in ADMA mode

2007-11-20 Thread Robert Hancock

Tejun Heo wrote:

Tejun Heo wrote:

If so, can you please add that switching into register mode is okay as
long as there's no other ADMA commands in flight and add
WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) && link->sactive)?


More accurately, link->sactive test can be substituted with
(ap->qc_allocated & ~(1 << qc->tag)).


Unfortunately we only get the ata_port and ata_taskfile in the tf_read 
callback, so I'm not sure if we can do the equivalent of the qc->flags & 
ATA_QCFLAG_RESULT_TF test (i.e. distinguishing between the 
error-handling case where we don't care if we abort outstanding commands 
and the normal case with a RESULT_TF command where we do)..
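
The mask expression Tejun suggests can be looked at in isolation. The sketch below is a stand-alone illustration rather than libata code: it assumes the allocation bitmap is a 32-bit value with one bit per command tag, and the function name is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if any tag other than 'tag' is marked in 'qc_allocated'.
 * Mirrors the expression (ap->qc_allocated & ~(1 << qc->tag)).
 * 'tag' must be below 32. (Hypothetical helper name.) */
bool other_cmds_allocated(uint32_t qc_allocated, unsigned int tag)
{
	return (qc_allocated & ~(UINT32_C(1) << tag)) != 0;
}
```

Clearing the current command's own bit before testing is what lets a lone RESULT_TF command pass the check while any second outstanding command trips it.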


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)

2007-11-20 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA
mode on systems with memory located above 4GB. We need to delay setting the
64-bit DMA mask until the PRD table and padding buffer are allocated so that
they don't get allocated above 4GB and break legacy mode (which is needed for
ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the
legacy buffers since setting the DMA mask affects both ports and we need to
ensure the second port's buffers are allocated properly (fixes a problem
with the previous version of this patch).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-20 
17:47:46.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-20 
17:50:30.0 -0600
@@ -247,6 +247,7 @@
void __iomem*ctl_block;
void __iomem*gen_block;
void __iomem*notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -748,7 +749,7 @@
adma_enable = 0;
nv_adma_register_mode(ap);
} else {
-   bounce_limit = *ap->dev->dma_mask;
+   bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
@@ -1134,10 +1135,16 @@
void *mem;
dma_addr_t mem_dma;
void __iomem *mmio;
+   struct pci_dev *pdev = to_pci_dev(dev);
u16 tmp;
 
VPRINTK("ENTER\n");
 
+   /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+  pad buffers */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+
rc = ata_port_start(ap);
if (rc)
return rc;
@@ -1153,6 +1160,14 @@
pp->notifier_clear_block = pp->gen_block +
   NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+   /* Now that the legacy PRD and padding buffer are allocated we can
+  safely raise the DMA mask to allocate the CPB/APRD table. */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+   /* Store the mask that was actually used so we can restore it as
+  the bounce limit later if needed */
+   pp->adma_dma_mask = *dev->dma_mask;
+
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
  &mem_dma, GFP_KERNEL);
if (!mem)
@@ -2408,12 +2423,6 @@
hpriv->type = type;
host->private_data = hpriv;
 
-   /* set 64bit dma masks, may fail */
-   if (type == ADMA) {
-   if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-   pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-   }
-
/* request and iomap NV_MMIO_BAR */
rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
if (rc)



Re: System reboot triggered by just reading a device file....!?

2007-11-20 Thread Robert Hancock

[EMAIL PROTECTED] wrote:
good evening, 


i stumbled over some funny issue when trying windirstat (like KDirStat) with 
wine.

after running that tool for a while my system rebooted. i could reproduce this 
with every run.

after some deep investigation (i thought i had stability issues with my system 
and spent more than an hour on this) i found out, that the reboot is being 
triggered by iTCO_wdt ( /dev/watchdog )

this is how to reproduce:

- be root
-  cat /dev/watchdog or dd if=/dev/watchdog of=/dev/zero bs=1 count=1 or .
-  wait one minute

*reboot*!

i have heard 2 opinions for now (contacted the author and also discussed on 
wine-devel ) that this should be expected behaviour.


Yes, it is. It's a watchdog device, it's meant to reboot the machine if 
whatever task is poking the watchdog dies.
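
The following toy model may help show why a bare cat is enough to cause the reboot. It is not the iTCO_wdt driver; it loosely follows the "magic close" convention from the kernel's watchdog API documentation (writing the character 'V' before closing disarms the timer), and whether iTCO_wdt behaves exactly this way is an assumption. All names are invented and times are plain seconds.

```c
#include <stdbool.h>

/* Toy model of a watchdog timer; all times are in seconds. */
struct toy_watchdog {
	bool running;		/* is the timer armed? */
	bool expect_release;	/* has the magic close character been seen? */
	unsigned int timeout;	/* seconds allowed between pings */
	unsigned int last_ping;	/* time of the most recent ping */
};

/* Opening the device arms the timer. */
void toy_wd_open(struct toy_watchdog *wd, unsigned int now,
		 unsigned int timeout)
{
	wd->running = true;
	wd->expect_release = false;
	wd->timeout = timeout;
	wd->last_ping = now;
}

/* Any write pings the timer; a 'V' also announces a clean close. */
void toy_wd_write(struct toy_watchdog *wd, unsigned int now, char c)
{
	wd->last_ping = now;
	if (c == 'V')
		wd->expect_release = true;
}

/* Closing disarms the timer only if the magic character was written. */
void toy_wd_close(struct toy_watchdog *wd)
{
	if (wd->expect_release)
		wd->running = false;
}

/* Would the hardware have reset the machine by time 'now'? */
bool toy_wd_expired(const struct toy_watchdog *wd, unsigned int now)
{
	return wd->running && now - wd->last_ping >= wd->timeout;
}
```

In this model a reader that opens and closes the device without ever writing to it leaves the timer armed and unserviced, which matches the "wait one minute, reboot" behaviour reported above.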



being sysadmin quite a while, i cannot believe that (accidentally) reading a 
device file (being root or not - what does that matter) triggers a system 
reboot.

ok - when i`m root , i shouldn`t do stupid things and be careful, but i thought 
reading/crawling through a filesystem (r/o, btw.) with some tool which is built 
to do exactly this wasn`t so stupid - even from within wine.


I would say that running a Windows tool that opens up and reads random 
files, on the /dev directory tree, as root, probably does qualify as 
"stupid". I'd say running pretty much anything through Wine as root is 
not a good idea, a Windows app could hose the system without even 
meaning to through exactly such things.




think of an admin writing a quick script for intrusion detection (find / 
-exec md5sum {} \; >/tmp/need-no-tripwire) and forgetting to exclude /dev, /sys or 
/proc appropriately..
think of someone exporting "/" via samba (readonly) and then navigating through 
the /dev directory

stupid?
i don`t think so.i have seen worse things.. :)

should someone get punished  by an accidental system reboot and should he need 
to spend his time on this to investigate why this happens?

i`d wish there would be some fence around this or iTCO_wdt /dev/watchdog not 
being active after a default desktop installation.


There is.. it's called "root privileges".



i`d be interested if i`m the only one who thinks this is strange/dangerous 
behaviour.

regards
roland



--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: System reboot triggered by just reading a device file....!?

2007-11-20 Thread Robert Hancock

[EMAIL PROTECTED] wrote:
good evening, 


i stumbled over some funny issue when trying windirstat (like KDirStat) with 
wine.

after running that tool for a while my system rebooted. i could reproduce this 
with every run.

after some deep investigation (i thought i had stability issues with my system 
and spent more than an hour on this) i found out, that the reboot is being 
triggered by iTCO_wdt ( /dev/watchdog )

this is how to reproduce:

- be root
-  cat /dev/watchdog or dd if=/dev/watchdog of=/dev/zero bs=1 count=1 or .
-  wait one minute

*reboot*!

i have heard 2 opinions for now (contacted the author and also discussed on 
wine-devel ) that this should be expected behaviour.


Yes, it is. It's a watchdog device, it's meant to reboot the machine if 
whatever task is poking the watchdog dies.



being sysadmin quite a while, i cannot believe that (accidentally) reading a 
device file (being root or not - what does that matter) triggers a system 
reboot.

ok - when i`m root , i shouldn`t do stupid things and be careful, but i thought 
reading/crawling trough a filesystem (r/o, btw.) with some tool which is built 
to do exactly this wasn`t so stupid - even from within wine.


I would say that running a Windows tool that opens up and reads random 
files, on the /dev directory tree, as root, probably does qualify as 
stupid. I'd say running pretty much anything through Wine as root is 
not a good idea, a Windows app could hose the system without even 
meaning to through exactly such things.




think of an admin writing a quickdirty script for intrusion detection (find / 
-exec md5sum {} \; /tmp/need-no-tripwire) and forgetting to exclude /dev, /sys or 
/proc appropriately..
think of someone exporting / via samba (readonly) and then navigating trough 
the /dev directory

stupid?
i don`t think so.i have seen worse things.. :)

should someone get punished  by an accidental system reboot and should he need 
to spend his time on this to investigate why this happens?

i`d wish there would be some fence around this or iTCO_wdt /dev/watchdog not 
being active after a default desktop installation.


There is.. it's called root privileges.



i`d be interested if i`m the only one who thinks this is strange/dangerous 
behaviour.

regards
roland





[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB (v2)

2007-11-20 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA
mode on systems with memory located above 4GB. We need to delay setting the
64-bit DMA mask until the PRD table and padding buffer are allocated so that
they don't get allocated above 4GB and break legacy mode (which is needed for
ATAPI devices). Also, explicitly set a 32-bit DMA mask before allocating the
legacy buffers since setting the DMA mask affects both ports and we need to
ensure the second port's buffers are allocated properly (fixes a problem
with the previous version of this patch).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c.before2 2007-11-20 
17:47:46.0 -0600
+++ linux-2.6.24-rc3-git1edit/drivers/ata/sata_nv.c 2007-11-20 
17:50:30.0 -0600
@@ -247,6 +247,7 @@
void __iomem*ctl_block;
void __iomem*gen_block;
void __iomem*notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -748,7 +749,7 @@
adma_enable = 0;
nv_adma_register_mode(ap);
} else {
-   bounce_limit = *ap->dev->dma_mask;
+   bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
@@ -1134,10 +1135,16 @@
void *mem;
dma_addr_t mem_dma;
void __iomem *mmio;
+   struct pci_dev *pdev = to_pci_dev(dev);
u16 tmp;
 
VPRINTK("ENTER\n");
 
+   /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+  pad buffers */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+
rc = ata_port_start(ap);
if (rc)
return rc;
@@ -1153,6 +1160,14 @@
pp->notifier_clear_block = pp->gen_block +
   NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+   /* Now that the legacy PRD and padding buffer are allocated we can
+  safely raise the DMA mask to allocate the CPB/APRD table. */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+   /* Store the mask that was actually used so we can restore it as
+  the bounce limit later if needed */
+   pp->adma_dma_mask = *dev->dma_mask;
+
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
  &mem_dma, GFP_KERNEL);
if (!mem)
@@ -2408,12 +2423,6 @@
hpriv->type = type;
host->private_data = hpriv;
 
-   /* set 64bit dma masks, may fail */
-   if (type == ADMA) {
-   if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-   pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-   }
-
/* request and iomap NV_MMIO_BAR */
rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
if (rc)



Re: [PATCH 1/2] sata_nv: don't use legacy DMA in ADMA mode

2007-11-20 Thread Robert Hancock

Tejun Heo wrote:

Tejun Heo wrote:

If so, can you please add that switching into register mode is okay as
long as there's no other ADMA commands in flight and add
WARN_ON((qc->flags & ATA_QCFLAG_RESULT_TF) && link->sactive)?


More accurately, link->sactive test can be substituted with
(ap->qc_allocated & ~(1 << qc->tag)).


Unfortunately we only get the ata_port and ata_taskfile in the tf_read 
callback, so I'm not sure if we can do the equivalent of the qc->flags & 
ATA_QCFLAG_RESULT_TF test (i.e. distinguishing between the 
error-handling case where we don't care if we abort outstanding commands 
and the normal case with a RESULT_TF command where we do)..
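
For reference, the substitute test suggested above, (ap->qc_allocated & ~(1 << qc->tag)), is just a "does any other tag have its bit set" check. A standalone sketch, where other_cmds_in_flight is a made-up helper name and a plain 32-bit word stands in for ap->qc_allocated (one bit per tag assumed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if any command OTHER than `tag` is still allocated.
   `qc_allocated` stands in for ap->qc_allocated: bit N set means
   tag N is in use. */
bool other_cmds_in_flight(uint32_t qc_allocated, unsigned int tag)
{
    assert(tag < 32);
    /* Mask out our own bit, then see whether anything is left. */
    return (qc_allocated & ~(UINT32_C(1) << tag)) != 0;
}
```

The catch described above still applies: tf_read only receives the port and the taskfile, so it has no qc->tag to mask out.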




Re: 2.6.24-rc3: find complains about /proc/net

2007-11-20 Thread Robert Hancock

Eric W. Biederman wrote:

Could you elaborate a bit on how the semantics of returning the
wrong information are more useful?

In particular, if a thread does the logical equivalent of:
grep Pid: /proc/self/status
it always gets the tgid despite having a different process id.


The POSIX-defined userspace concept of a PID requires that all threads 
appear to have the same PID. This is something that Linux didn't comply 
with under the old LinuxThreads implementation and was finally fixed 
with NPTL. This isn't a POSIX-defined interface, but I assume it's 
trying to be consistent with getpid(), etc.
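
To make the distinction concrete, a small userspace sketch, assuming Linux with NPTL: getpid() returns the same value (the tgid) in every thread, while the kernel's per-thread id is only visible through the gettid system call. The struct and function names here are invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;   /* what getpid() reports inside the thread */
    pid_t tid;   /* the kernel's per-thread id */
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;

    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Run record_ids() in a freshly created thread; 0 on success. */
int ids_in_new_thread(struct ids *out)
{
    pthread_t t;

    assert(out != NULL);
    if (pthread_create(&t, NULL, record_ids, out) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

In a non-leader thread, pid matches the main thread's getpid() and tid does not, which is the same split that shows up in /proc/self.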



How can that possibly be useful or correct?

From the kernel side I really think the current semantics of /proc/self
in the context of threads is a bug and confusing.  All of the kernel
developers' first reaction when this was pointed out was that this
is a regression.

If it is truly useful to user space we can preserve this API design
bug forever.  I just want to make certain we are not being bug
compatible without a good reason.

Currently we have several kernel side bugs with threaded
programs because /proc/self does not do the intuitive thing.  Unless
something has changed recently selinux will cause accesses by a
non-leader thread to fail when accessing files through /proc/self.

So far the more I look at the current /proc/self behavior the
more I am convinced it is broken, and useless.  Please help me see
where it is useful, so we can justify keeping it.




Re: [PATCH 2/2] sata_nv: fix ATAPI issues with memory over 4GB (v3)

2007-11-19 Thread Robert Hancock

Tejun Heo wrote:

Robert Hancock wrote:

Tejun Heo wrote:

Robert Hancock wrote:

This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

applied to #tj-upstream-fixes.


I have a report that these patches crashed but the previous patch worked:

https://bugzilla.redhat.com/show_bug.cgi?id=351451

So there may still be a problem here.


Any progress?


It looks like the problem is that even though we raise the DMA mask only 
after we allocate the PRD and pad buffers, by the time the other port is 
set up the DMA mask has already been raised to 64-bit, so that port 
allocates its buffers above 4GB and fails. I think we just need to 
explicitly set the mask to 32-bit first; I'm getting the reporter to try 
that one now.
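
A toy model of the ordering problem, for illustration only. It assumes one mask shared by both ports of the device and a deliberately pessimistic allocator that always returns the highest address the current mask allows; all toy_* names are made up and nothing here is real driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MASK_32 UINT64_C(0xffffffff)
#define TOY_MASK_64 UINT64_MAX

struct toy_dev  { uint64_t dma_mask; };            /* shared by both ports */
struct toy_port { uint64_t legacy_buf, adma_buf; };

/* Pessimistic allocator: hands out the highest address the mask permits. */
static uint64_t toy_alloc(const struct toy_dev *dev)
{
    return dev->dma_mask;
}

void toy_port_start(struct toy_dev *dev, struct toy_port *pp,
                    bool reset_to_32bit)
{
    assert(dev != NULL && pp != NULL);
    if (reset_to_32bit)
        dev->dma_mask = TOY_MASK_32;   /* the explicit 32-bit step */
    pp->legacy_buf = toy_alloc(dev);   /* PRD table / pad buffer */
    dev->dma_mask = TOY_MASK_64;       /* raise the mask for ADMA */
    pp->adma_buf = toy_alloc(dev);     /* CPB/APRD table */
}

/* Legacy mode can only reach buffers below 4GB. */
bool toy_legacy_ok(const struct toy_port *pp)
{
    return pp->legacy_buf <= TOY_MASK_32;
}
```

Without the reset, the first port comes out fine and the second one inherits the 64-bit mask the first port left behind; with the reset, both ports get legacy buffers below 4GB.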




Re: [REQUEST] Option for skipping unreadable blocks on Video DVD

2007-11-17 Thread Robert Hancock

Tobias wrote:
If you are accessing a scratched Video DVD and the device cannot read it, the 
process ends. 
What about a more tolerant way to handle unreadable blocks?
Especially on Video DVDs, single blocks are not as important as on data 
DVDs.


If the DVD player process ends from this, I'd say that's the fault of 
the player software not handling errors properly.


I think that if they are using the normal block layer accesses on the 
DVD device, there may be some retries that occur which are likely 
undesirable in this case since they will just stall playback. If they 
are using SG_IO to feed raw requests into the drive (which I imagine 
they need to do for CSS authentication, etc. anyway), then all error 
handling is passed up to the user application.
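
A rough sketch of what the SG_IO route looks like from userspace, assuming the standard struct sg_io_hdr interface from <scsi/sg.h> and a plain READ(10) command. The helper names are made up, and what to do after a failed read (skip ahead, retry, give up) is left entirely to the caller, which is the point being made above.

```c
#include <assert.h>
#include <scsi/sg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Fill a 10-byte READ(10) CDB (opcode 0x28): big-endian LBA in bytes
   2..5, big-endian block count in bytes 7..8. */
void build_read10(unsigned char cdb[10], uint32_t lba, uint16_t nblocks)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);
    cdb[0] = 0x28;
    cdb[2] = (unsigned char)(lba >> 24);
    cdb[3] = (unsigned char)(lba >> 16);
    cdb[4] = (unsigned char)(lba >> 8);
    cdb[5] = (unsigned char)lba;
    cdb[7] = (unsigned char)(nblocks >> 8);
    cdb[8] = (unsigned char)nblocks;
}

/* Describe the transfer to the kernel. timeout_ms is the kernel-side
   command timeout; it does not change how long the drive itself retries. */
void prep_read_hdr(struct sg_io_hdr *hdr, unsigned char *cdb,
                   void *buf, unsigned int len,
                   unsigned char *sense, unsigned char sense_len,
                   unsigned int timeout_ms)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmdp = cdb;
    hdr->cmd_len = 10;
    hdr->dxferp = buf;
    hdr->dxfer_len = len;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = timeout_ms;
}

/* Issue the command: -1 if the ioctl itself failed, 1 if the drive or
   transport reported an error (sense data is in hdr->sbp), 0 if OK. */
int read_blocks_raw(int fd, struct sg_io_hdr *hdr)
{
    if (ioctl(fd, SG_IO, hdr) < 0)
        return -1;
    return (hdr->status || hdr->host_status || hdr->driver_status) ? 1 : 0;
}
```

A player using something like this sees the error as soon as the drive returns it and can decide for itself to jump to the next block.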




So is there a way that the kernel tells the device to skip these bad blocks?


We don't know they're bad until we try and read them. How long the drive 
will stall trying to read that sector before giving up and returning an 
error is up to the drive. I'm not sure if the MMC command set allows any 
way to tell the drive to give up more quickly or not..




Re: [PATCH 2/2] sata_nv: fix ATAPI issues with memory over 4GB (v3)

2007-11-14 Thread Robert Hancock

Tejun Heo wrote:

Robert Hancock wrote:

This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>


applied to #tj-upstream-fixes.



I have a report that these patches crashed but the previous patch worked:

https://bugzilla.redhat.com/show_bug.cgi?id=351451

So there may still be a problem here.


[PATCH 2/2] sata_nv: fix ATAPI issues with memory over 4GB (v3)

2007-11-13 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to delay setting the 64-bit
DMA mask until the PRD table and padding buffer are allocated so that they don't
get allocated above 4GB and break legacy mode (which is needed for ATAPI
devices).

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c.before 2007-11-13 
19:04:18.0 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c2007-11-13 
19:02:34.0 -0600
@@ -247,6 +247,7 @@
void __iomem*ctl_block;
void __iomem*gen_block;
void __iomem*notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -748,7 +749,7 @@
adma_enable = 0;
nv_adma_register_mode(ap);
} else {
-   bounce_limit = *ap->dev->dma_mask;
+   bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
@@ -763,6 +764,11 @@
config_mask = NV_MCP_SATA_CFG_20_PORT0_EN |
  NV_MCP_SATA_CFG_20_PORT0_PWB_EN;
 
+   /* Set appropriate DMA mask. */
+   rc = pci_set_dma_mask(pdev, bounce_limit);
+   if (rc)
+   return rc;
+
if (adma_enable) {
new_reg = current_reg | config_mask;
pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
@@ -1134,6 +1140,7 @@
void *mem;
dma_addr_t mem_dma;
void __iomem *mmio;
+   struct pci_dev *pdev = to_pci_dev(dev);
u16 tmp;
 
VPRINTK("ENTER\n");
@@ -1153,6 +1160,14 @@
pp->notifier_clear_block = pp->gen_block +
   NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
 
+   /* Now that the legacy PRD and padding buffer are allocated we can
+  safely raise the DMA mask to allocate the CPB/APRD table. */
+   pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+   pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+   /* Store the mask that was actually used so we can restore it later
+  if needed */
+   pp->adma_dma_mask = *dev->dma_mask;
+
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
  &mem_dma, GFP_KERNEL);
if (!mem)
@@ -2408,12 +2423,6 @@
hpriv->type = type;
host->private_data = hpriv;
 
-   /* set 64bit dma masks, may fail */
-   if (type == ADMA) {
-   if (pci_set_dma_mask(pdev, DMA_64BIT_MASK) == 0)
-   pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
-   }
-
/* request and iomap NV_MMIO_BAR */
rc = pcim_iomap_regions(pdev, 1 << NV_MMIO_BAR, DRV_NAME);
if (rc)



[PATCH 1/2] sata_nv: don't use legacy DMA in ADMA mode

2007-11-13 Thread Robert Hancock
When the port is in ADMA mode, any DMA command must run in ADMA mode as well,
even if a result taskfile is requested; otherwise the command may try to use the
legacy DMA engine in ADMA mode, which is not allowed. Enforce this with BUG_ON()
since data corruption could potentially result if this happened.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc1-git10/drivers/ata/sata_nv.c2007-11-01 
20:01:32.0 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c2007-11-13 
19:01:09.0 -0600
@@ -791,11 +797,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
 {
-   /* Since commands where a result TF is requested are not
-  executed in ADMA mode, the only time this function will be called
-  in ADMA mode will be if a command fails. In this case we
-  don't care about going into register mode with ADMA commands
-  pending, as the commands will all shortly be aborted anyway. */
+   /* Other than when internal or pass-through commands are executed,
+  the only time this function will be called in ADMA mode will be
+  if a command fails. In the failure case we don't care about going
+  into register mode with ADMA commands pending, as the commands will
+  all shortly be aborted anyway. We assume that NCQ commands are not
+  issued via passthrough and so this will not abort any commands in
+  that case. */
nv_adma_register_mode(ap);
 
ata_tf_read(ap, tf);
@@ -1359,11 +1376,9 @@
struct nv_adma_port_priv *pp = qc->ap->private_data;
 
/* ADMA engine can only be used for non-ATAPI DMA commands,
-  or interrupt-driven no-data commands, where a result taskfile
-  is not required. */
+  or interrupt-driven no-data commands. */
if ((pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
-  (qc->tf.flags & ATA_TFLAG_POLLING) ||
-  (qc->flags & ATA_QCFLAG_RESULT_TF))
+  (qc->tf.flags & ATA_TFLAG_POLLING))
return 1;
 
if ((qc->flags & ATA_QCFLAG_DMAMAP) ||
@@ -1381,6 +1396,8 @@
   NV_CPB_CTL_IEN;
 
if (nv_adma_use_reg_mode(qc)) {
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
ata_qc_prep(qc);
return;
@@ -1428,6 +1445,8 @@
if (nv_adma_use_reg_mode(qc)) {
/* use ATA register mode */
VPRINTK("using ATA register mode: 0x%lx\n", qc->flags);
+   BUG_ON(!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) &&
+   (qc->flags & ATA_QCFLAG_DMAMAP));
nv_adma_register_mode(qc->ap);
return ata_qc_issue_prot(qc);
} else




Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

2007-11-13 Thread Robert Hancock

Tejun Heo wrote:

Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in
that way (there are some dual-socket Opteron boxes with this controller,
forcing an allocation below 4GB for this could force a non-optimal node
allocation I think..) To do this I'd have to raise the mask for the APRD
allocation, drop it again, then raise it again in ADMA mode, which is
kind of ugly.


I don't think it really matters.  The table isn't too big and it's not
like access to the table has any processor locality.  Maybe it's better
to allocate to the same node as the irq but raising DMA mask doesn't
help at all.


It's quite possible that restricting the DMA mask will also restrict 
which node the table can be allocated on. I'm not so much thinking of the 
CPU's access to the table as of the controller banging on the thing 
several times for each command..




I think performance impact is nil either way but even in highly unlikely
case it has any impact, allocating PRDs under 4G should be better as it
avoids DAC cycles on the bus.  But again, this is just irrelevant.

I'd say just allocate everything under 4G.


The DAC issue shouldn't matter as these controllers are integrated into 
the chipset so it will be using all HT bus transactions, not PCI.


We can do it without all that mess in slave_config though, just by 
delaying raising the DMA mask until after the PRD/pad buffers are allocated.





Also, I'd rather not allocate the legacy PRD at all if we're in ADMA
mode. That way, if some bug causes us to try and do legacy DMA in ADMA
mode, we'll crash from null pointer dereference instead of potentially
transferring incorrect data (as we had in this case) and corrupting things.


Yeap, I can agree with this.  But can you add BUG_ON()/WARN_ON() at
places instead?  I know blanking pointers feel safer but I think it's
best to keep resource allocation / release in ->port_start/stop().


Yeah, I've got rid of that stuff now and added some BUG_ONs for this. 
Will submit the patches shortly.
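
Roughly, the invariant those BUG_ONs are meant to enforce looks like the sketch below. The flag values and toy_* names are invented for illustration, the mode-selection function is a simplification of the driver's check (only the polling and DMA-mapped cases are modelled), and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only, not the real driver/libata definitions. */
enum {
    TOY_ATAPI_SETUP_COMPLETE = 1 << 0,  /* port is set up for ATAPI (legacy) */
    TOY_TFLAG_POLLING        = 1 << 1,  /* command is to be polled */
    TOY_QCFLAG_DMAMAP        = 1 << 2   /* command has DMA memory mapped */
};

/* Simplified decision: should this command go through register mode? */
bool toy_use_reg_mode(unsigned int port_flags, unsigned int cmd_flags)
{
    if ((port_flags & TOY_ATAPI_SETUP_COMPLETE) ||
        (cmd_flags & TOY_TFLAG_POLLING))
        return true;
    if (cmd_flags & TOY_QCFLAG_DMAMAP)
        return false;             /* DMA command: use the ADMA engine */
    return true;
}

/* The condition the BUG_ON() guards against: a DMA-mapped command about
   to be issued in register mode on a port that is still in ADMA mode. */
bool toy_bug_condition(unsigned int port_flags, unsigned int cmd_flags)
{
    return toy_use_reg_mode(port_flags, cmd_flags) &&
           !(port_flags & TOY_ATAPI_SETUP_COMPLETE) &&
           (cmd_flags & TOY_QCFLAG_DMAMAP);
}

/* assert() plays the role of BUG_ON() in this sketch. */
void toy_issue(unsigned int port_flags, unsigned int cmd_flags)
{
    assert(!toy_bug_condition(port_flags, cmd_flags));
    /* ... issue via register mode or ADMA as chosen above ... */
}
```

The idea is to stop loudly at the point of the mistake rather than rely on a blanked pointer to crash later.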



Re: [PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

2007-11-12 Thread Robert Hancock

Tejun Heo wrote:

How about always initialize DMA mask to ATA_DMA_MASK regardless of ADMA
mode such that PRD and PAD buffers are always accessible by register
mode and just raising PCI dma mask and queue bounce limit if ADMA mode
is active?


Could be done.. but, I don't want to constrain the ADMA APRD/CPB area in 
that way (there are some dual-socket Opteron boxes with this controller, 
forcing an allocation below 4GB for this could force a non-optimal node 
allocation I think..) To do this I'd have to raise the mask for the APRD 
allocation, drop it again, then raise it again in ADMA mode, which is 
kind of ugly.


Also, I'd rather not allocate the legacy PRD at all if we're in ADMA 
mode. That way, if some bug causes us to try and do legacy DMA in ADMA 
mode, we'll crash from null pointer dereference instead of potentially 
transferring incorrect data (as we had in this case) and corrupting things.





+   /* Set appropriate DMA mask. */
+   pci_set_dma_mask(pdev, bounce_limit);
+   pci_set_consistent_dma_mask(pdev, bounce_limit);


These can fail.


Yes, it should likely do something with these return values. Though 
theoretically it shouldn't fail, since the DMA mask is either 32-bit, 
which shouldn't fail, or one that was successfully set before. Also I 
don't think the SCSI layer actually checks the slave_config return 
value.. sigh.




Also, please separate out the result TF handling to a separate patch.  I
know it's a small change but as both introduce important behavior
changes, I think it would be nice to have a bisection point in between.


Could do. That change would have to come first though, as the change to 
not allocate the PRD except when necessary would cause some cases that 
might have worked before to blow up.




Thanks.




[PATCH] sata_nv: fix ADMA ATAPI issues with memory over 4GB

2007-11-12 Thread Robert Hancock
This fixes some problems with ATAPI devices on nForce4 controllers in ADMA mode
on systems with memory located above 4GB. We need to make sure that the legacy
PRD table and padding buffer are appropriately allocated according to the
DMA mask requirements of the current operating mode (ADMA or legacy).

Also, we should run any DMA command with result taskfile requested in ADMA mode
when the port is in ADMA mode, otherwise it may try to use the legacy DMA engine
in ADMA mode which is not allowed.
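
As a rough illustration of the rule described above, the following userspace sketch picks the DMA mask a buffer has to satisfy in each mode and checks whether a buffer fits under it. The names (port_mode, dma_mask_for_mode, buffer_fits_mask) are hypothetical and are not part of sata_nv; the 32-bit legacy limit is assumed from the discussion of memory above 4GB, and any wider ADMA mask is just an example value supplied by the caller.

```c
#include <stdint.h>

/* Assumed 32-bit addressing limit of the legacy DMA engine. */
#define LEGACY_DMA_MASK 0xffffffffULL

/* Hypothetical mode enum for this sketch only. */
enum port_mode { PORT_MODE_LEGACY, PORT_MODE_ADMA };

/* Pick the mask that buffers must satisfy in the given operating mode. */
static uint64_t dma_mask_for_mode(enum port_mode mode, uint64_t adma_dma_mask)
{
	return mode == PORT_MODE_ADMA ? adma_dma_mask : LEGACY_DMA_MASK;
}

/* A buffer is usable only if its whole extent is addressable under the mask. */
static int buffer_fits_mask(uint64_t bus_addr, uint64_t len, uint64_t mask)
{
	return len != 0 && bus_addr <= mask && len - 1 <= mask - bus_addr;
}
```

A PRD table or pad buffer that was allocated under the wider mask may fail this check once the port drops back to legacy mode, which is why the buffers are freed and reallocated on each mode transition.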

Fixes Red Hat Bugzilla #351451: 
https://bugzilla.redhat.com/show_bug.cgi?id=351451

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.24-rc1-git10/drivers/ata/sata_nv.c	2007-11-01 20:01:32.000000000 -0600
+++ linux-2.6.24-rc1-git10edit/drivers/ata/sata_nv.c	2007-11-10 19:57:47.000000000 -0600
@@ -247,6 +247,7 @@
void __iomem *ctl_block;
void __iomem *gen_block;
void __iomem *notifier_clear_block;
+   u64 adma_dma_mask;
u8  flags;
int last_issue_ncq;
 };
@@ -747,11 +748,29 @@
   on the port. */
adma_enable = 0;
nv_adma_register_mode(ap);
+   if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+   /* Transitioning to legacy mode. Free the pad buffer. */
+   ata_pad_free(ap, ap->host->dev);
+   ap->pad = NULL;
+   ap->pad_dma = 0;
+   }
} else {
-   bounce_limit = *ap->dev->dma_mask;
+   bounce_limit = pp->adma_dma_mask;
segment_boundary = NV_ADMA_DMA_BOUNDARY;
sg_tablesize = NV_ADMA_SGTBL_TOTAL_LEN;
adma_enable = 1;
+
+   if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
+   /* Transitioning to ADMA mode. Free legacy PRD table
+  and the pad buffer. */
+   ata_pad_free(ap, ap->host->dev);
+   ap->pad = NULL;
+   ap->pad_dma = 0;
+   dmam_free_coherent(ap->host->dev, ATA_PRD_TBL_SZ,
+   ap->prd, ap->prd_dma);
+   ap->prd = NULL;
+   ap->prd_dma = 0;
+   }
}
 
pci_read_config_dword(pdev, NV_MCP_SATA_CFG_20, &current_reg);
@@ -763,23 +782,45 @@
config_mask = NV_MCP_SATA_CFG_20_PORT0_EN |
  NV_MCP_SATA_CFG_20_PORT0_PWB_EN;
 
+   /* Set appropriate DMA mask. */
+   pci_set_dma_mask(pdev, bounce_limit);
+   pci_set_consistent_dma_mask(pdev, bounce_limit);
+
+   blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
+   blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
+   blk_queue_max_hw_segments(sdev->request_queue, sg_tablesize);
+   ata_port_printk(ap, KERN_INFO,
+   "bounce limit 0x%llX, segment boundary 0x%lX, hw segs %hu\n",
+   (unsigned long long)bounce_limit, segment_boundary,
+   sg_tablesize);
+
if (adma_enable) {
new_reg = current_reg | config_mask;
-   pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
+   if (pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) {
+   /* Transition to ADMA mode.
+  Reallocate the pad buffer. */
+   rc = ata_pad_alloc(ap, ap->host->dev);
+   pp->flags &= ~NV_ADMA_ATAPI_SETUP_COMPLETE;
+   }
} else {
new_reg = current_reg & ~config_mask;
-   pp->flags |= NV_ADMA_ATAPI_SETUP_COMPLETE;
+   if (!(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
+   /* Transition to legacy mode.
+  Reallocate the legacy PRD and pad buffer. */
+   ap->prd = dmam_alloc_coherent(ap->host->dev,
+   ATA_PRD_TBL_SZ, &ap->prd_dma, GFP_KERNEL);
+   if (!ap->prd)
+   rc = -ENOMEM;
+   else
+   rc = ata_pad_alloc(ap, ap->host->dev);
+
+   pp->flags |= NV_ADMA_ATAPI_SETUP_COMPLETE;
+   }
}
 
if (current_reg != new_reg)
pci_write_config_dword(pdev, NV_MCP_SATA_CFG_20, new_reg);
 
-   blk_queue_bounce_limit(sdev->request_queue, bounce_limit);
-   blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
-   blk_queue_max_hw_segments(sdev->request_queue, sg_tablesize);
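-   ata_port_printk(ap, KERN_INFO,
-   "bounce limit 0x%llX, segment boundary 0x%lX, hw segs %hu\n",
-   (unsigned long long)bounce_limit, segment_boundary, sg_tablesize);
return rc;
 }
 
@@ -791,11 +832,13 @@
 
 static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf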

Re: x86_64 SATA DVD drive + libata trouble

2007-11-12 Thread Robert Hancock

Bernd Strieder wrote:
I managed to get it running with the SuSE kernel when passing 
adma=0 to sata_nv module, and I managed to get it running when 
passing mem=2000M to the SuSE kernel. Thanks to Robert for those 
hints.


The vanilla kernels I tried 2.6.23.1 and 2.6.24-rc1-git10 (with 
patch to sata_nv.c from Robert Hancock see 
https://bugzilla.redhat.com/show_bug.cgi?id=351451)  seem to be 
very sensitive in this area. Whenever I got them to oops, I did 
not have much time to get anything read on the screen. 

I managed under the patched 2.6.24-rc1-git10 to manually load 
sata_nv and sr_mod, and then I got an OOps like


Unable to handle ... NULL pointer dref at  RIP 
ff880edf6a

.
libata:ata_qc_prep + 0xe2/0x15b
.
srmod:sr_probe


Which patch is this using, the original one from Nov. 2 or the updated 
one from Nov. 10? The original one has a bug.


I have attached 3 dmesg outputs from the openSuSE 10.3 kernel and 
extracts of /var/log/messages, including some Oopses. The Oopses 
from the vanilla kernels seem to be so bad that they never end 
up in a file. 

I will do some more tests as soon as possible. I have attached the 
files as I created them, you will have to diff the single files, 
anyway, to get the important information out, I cannot select for 
you.





Re: x86_64 SATA DVD drive + libata trouble

2007-11-10 Thread Robert Hancock

Bernd Strieder wrote:

Hello,

please CC me, I'm not subscribed.

If any kernel developer is interested in more specific information 
please mail me, I can build kernels, I can apply patches, though 
have not done it regularly.


I'd like to get the DVD drive working somehow. I have googled a lot 
and did not find any more ideas what to do. Some good keywords to 
find a solution would suffice at that end.


Rough problem description:

I have a Tyan mainboard with NVIDIA chipset CK804. The only 
SATA/IDE device is a SATA DVD combo, the harddisks are on a RAID 
controller from 3ware. The harddisks are fine.


The openSuSE 10.3 boot DVDs fail after booting from the BIOS: the 
installation kernel cannot use the DVD drive. That kernel uses 
libata with sata_nv and pata_amd as drivers. The drive is recognized 
but it cannot be used. This is probably what happens during an 
install from DVD, and it persists in the running system after a 
network install.


Reading from the dvd device /dev/sr0 with dd stops after at most 
119kb of rubbish read. Mounting fails with superblock not found.
When trying to remove the pata_amd module I get an Oops. I tried to 
remove the modules to have a chance to reload them with other 
options (atapi_enable), but that did not help, even after 
rebooting.


A vanilla 2.6.23.1 kernel behaves even less friendly, the dd 
on /dev/sr0 causes a hard reset.


So there are clearly some problems with libata in this system.

I tried switching away from libata, but then I could not get the drive 
recognized at all.


There is a known problem with ATAPI devices on CK804 chipsets in systems 
which have memory above the 4GB mark, being debugged here:


https://bugzilla.redhat.com/show_bug.cgi?id=351451

If you are running into that one, you can work around it for now by 
passing the adma=0 parameter to the sata_nv module (not sure how this 
would be done on SUSE's setup), or by passing sata_nv.adma=0 on the kernel 
command line if sata_nv is built into the kernel. If that does help, I 
could ask you to test patches :-)


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH] sata_nv,ahci: add the ahci legacy mode support to sata_nv

2007-11-10 Thread Robert Hancock

Jeff Garzik wrote:

Jeff Garzik wrote:
The proposed sata_nv patch does the opposite -- guarantees we must 
support the continually problematic legacy IDE interface ad infinitum. 
Such patches are OK for the test lab, but in this specific case users 
/suffer/ when not running AHCI mode.


Just to reinforce...

sata_nv support and bug fixes are primarily done right now through the 
valiant efforts of Robert Hancock (with assists from Alan, Tejun, and 
others).


Robert's job is difficult, because he has no hardware documentation[1], 
and NVIDIA does not seem to be helping out much with driver bug reports 
on the lists or in bugzillas.


Right, I don't have anything. Unless the original incomplete ADMA driver 
release from NVIDIA counts as documentation, lol.


And yes, I've CC'ed NVIDIA people about a few ADMA-related issues and 
been met with silence. It would be nice if they were as responsive about 
ADMA issues as I must say Kuan and Peer have been on the SWNCQ side of 
things..




As far as I know, I am the only one in the universe outside of NVIDIA 
with any SATA docs at all, and those docs _only_ cover ADMA registers 
and DMA structures, no PCI config info, no errata, nothing on SWNCQ or 
legacy IDE (well, half a page).


NVIDIA has indeed become more engaged in sata_nv in recent times, and 
that's a positive sign.  You, Kuon and Ayaz have all been noticeably 
more responsive in email.  Thanks.  Users have definitely benefited, 
particularly from your help addressing a couple SWNCQ issues.


But at this point in time, being asked to choose between sata_nv and 
ahci is no choice at all.  One has public documentation, wide industry 
support and little-or-no bugs.  The other has several open issues, no 
documentation, and support obstacles.


They're not even equivalent interfaces in this case: in the proposed 
AHCI legacy mode patch these controllers are supported in the default 
SFF mode only, with no ADMA or SWNCQ, so you don't get any NCQ support.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: How do I debug PCI resource allocation problems

2007-11-08 Thread Robert Hancock
00 00 00 00 00 00
a0: 11 11 00 00 00 00 06 03 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 01 00 22 00 00 00 00 00 00 00 00 00 00 01 02 00
e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
f0: 12 00 03 00 00 00 00 00 90 0f 03 00 d1 cd 5d df

Question: memory region 2 at 120000000? That is beyond the 4GB boundary, and 
the BIOS guys I know told me that every PCI IOMEM region should reside within 
the first 4GB! When running the machine with only 2GB, the lspci output for 
the VGA device looks like this:


64-bit capable PCI devices can indeed have BARs which are located 
above 4GB. However, I can't see why lspci is detecting that from this 
configuration space: the BAR contents for region 2 are 20000008, which 
means prefetchable memory at 0x20000000, of a type that can be located 
anywhere within 32-bit memory space. That doesn't make any sense though, 
since that's in the middle of RAM! Quite likely this bogus resource 
setting of the graphics controller is a large part of your problem. The 
question is who's doing this..




00:02.0 Class 0300: 8086:29b2 (rev 02) (prog-if 00 [VGA])
	Subsystem: 1734:10fc
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- 
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at f0100000 (32-bit, non-prefetchable) [size=512K]
	Region 1: I/O ports at 1c70 [size=8]
	Region 2: Memory at e0000000 (32-bit, prefetchable) [size=256M]
	Region 3: Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
		Address: 00000000  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

That means we now get region 2 at e0000000 and everything is fine. 


This looks more reasonable, though the BAR is still mapped over top of 
a region that's reserved in the E820 memory map, which is wrong..




I tried to have a look at the PCI config space with a DOS tool from the year 
1999. The hexdump looked pretty much the same, but byte 1A changed from "20" 
to "e0", and the region was declared as e0000008, which also looked strange 
to me. 


That looks more reasonable (e0000000, not 20000000). The low 4 bits are 
used to encode the prefetchable flag and memory space. The question is how 
that got set in the bogus fashion in Linux..
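
For reference, the flag bits mentioned above can be decoded as in the sketch below. It follows the standard PCI base address register layout (bit 0 selects I/O or memory space, bits 2:1 give the memory type, bit 3 is the prefetchable flag). The struct and function names are made up for this example, and the raw values 20000008 and e0000008 used with it are an assumption about what the shortened hex numbers in the archived text originally were.

```c
#include <stdint.h>

/* Decoded view of one 32-bit base address register (names are illustrative). */
struct decoded_bar {
	int is_io;		/* bit 0: 1 = I/O space, 0 = memory space */
	int is_64bit;		/* memory BARs only: bits 2:1 == 10b */
	int prefetchable;	/* memory BARs only: bit 3 */
	uint32_t base;		/* address bits with the flag bits masked off */
};

static struct decoded_bar decode_bar(uint32_t raw)
{
	struct decoded_bar bar = { 0, 0, 0, 0 };

	bar.is_io = raw & 0x1;
	if (bar.is_io) {
		/* I/O BARs use only the low 2 bits for flags. */
		bar.base = raw & ~0x3u;
		return bar;
	}
	bar.is_64bit = ((raw >> 1) & 0x3) == 0x2;
	bar.prefetchable = (raw >> 3) & 0x1;
	/* Memory BARs use the low 4 bits for flags. */
	bar.base = raw & ~0xfu;
	return bar;
}
```

With these assumptions, decode_bar(0x20000008) describes a prefetchable 32-bit memory BAR based at 0x20000000, and decode_bar(0xe0000008) the same kind of BAR based at 0xe0000000.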




This is where I am now. I also tested with an Intel reference mainboard (same 
chipset, different BIOS) and this one didn't show the problem. That makes me 
think that the problem is somewhere in the BIOS, but where? I have access to 
the BIOS developers, but they don't know much about Linux and since the other 
operating systems from Redmond are running without problems they are hard to 
convince that they made a mistake. :-)


Another very bad side effect of the problem is that when the machine runs on 
32-bit Linux then the graphic card seems to work, but people report corrupted 
file systems after a while. I guess that is related to my problem on 64 bit, 
only that in the 32-bit case then filesystem buffers got overwritten by the 
video RAM and when they are written back to disk... ouch! 

I also tried to track the problem down with the CD from LinuxFirmwareKit.org, 
but the resource allocation errors are the same and unfortunately the lack of 
verbosity as well.


Ok, that's it. Any help is much appreciated. 
Thanks

Rainer



--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: sata NCQ blacklist entry

2007-11-08 Thread Robert Hancock

Luca Tettamanti wrote:

On Nov 7, 2007 1:55 PM, Tejun Heo <[EMAIL PROTECTED]> wrote:

Florian La Roche wrote:

Hello all,

I've taken the email addresses from the last NCQ blacklist changes going
into the kernel.
This Fujitsu drive also gives me spurious command completions. Detailed
output also available at https://bugzilla.redhat.com/show_bug.cgi?id=366181.

Let me know if you need more info or anything else.

--- drivers/ata/libata-core.c
+++ drivers/ata/libata-core.c
@@ -4222,6 +4222,7 @@
  { "WDC WD740ADFD-00NLR1", NULL, ATA_HORKAGE_NONCQ, },
  { "WDC WD3200AAJS-00RYA0", "12.01B01", ATA_HORKAGE_NONCQ, },
  { "FUJITSU MHV2080BH", "00840028", ATA_HORKAGE_NONCQ, },
+ { "FUJITSU MHW2160BJ G2", NULL, ATA_HORKAGE_NONCQ },
  { "ST9120822AS", "3.CLF", ATA_HORKAGE_NONCQ, },
  { "ST9160821AS", "3.CLF", ATA_HORKAGE_NONCQ, },
  { "ST9160821AS", "3.ALD", ATA_HORKAGE_NONCQ, },

Thanks.  We're currently trying to find out what's actually going on
with all these drives.  At first, the drives which got blacklisted were
few and the entries made sense (they had other problems with NCQ, etc.),
but with new-generation drives from many vendors showing the same
symptom, we aren't too sure now.
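
For readers unfamiliar with how such a table is consulted, here is a small standalone sketch of a lookup in which a NULL revision matches any firmware revision, as the NULL entries in the quoted patch suggest. It is an illustration only: the struct, the SKETCH_HORKAGE_NONCQ value and lookup_horkage() are hypothetical, it uses exact string comparison, and the matching that libata really performs may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value for this sketch; not the kernel's definition. */
#define SKETCH_HORKAGE_NONCQ (1u << 0)

struct blacklist_entry {
	const char *model_num;
	const char *model_rev;	/* NULL matches any firmware revision */
	unsigned int horkage;
};

/* A few entries taken from the quoted patch. */
static const struct blacklist_entry blacklist[] = {
	{ "FUJITSU MHV2080BH",    "00840028", SKETCH_HORKAGE_NONCQ },
	{ "FUJITSU MHW2160BJ G2", NULL,       SKETCH_HORKAGE_NONCQ },
	{ "ST9120822AS",          "3.CLF",    SKETCH_HORKAGE_NONCQ },
};

/* Return the flags of the first matching entry, or 0 if none matches. */
static unsigned int lookup_horkage(const char *model, const char *rev)
{
	size_t i;

	for (i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++) {
		const struct blacklist_entry *e = &blacklist[i];

		if (strcmp(e->model_num, model) != 0)
			continue;
		if (e->model_rev == NULL || strcmp(e->model_rev, rev) == 0)
			return e->horkage;
	}
	return 0;
}
```

In this model the new Fujitsu entry disables NCQ for every firmware revision of that drive, while the MHV2080BH entry applies only to revision 00840028.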


Is there a way to tell whether Windows is using NCQ or not? I checked
the system log (or whatever it's called) on my notebook and it is clean,
but I'm not sure it's using NCQ (I don't even know whether it would log
spurious completions anywhere).


Which driver is installed for the SATA controller in Windows, the 
chipset-manufacturer-provided AHCI driver or the default Microsoft 
driver? You'd need the AHCI driver installed for NCQ to be used.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: How do I debug PCI resource allocation problems

2007-11-08 Thread Robert Hancock
 space with a DOS tool from the year 
1999, the hexdump looked pretty much similar, but byte 1A changed from 20 
to e0.  And the region was declared as e008, which also looked strange 
to me. 


That looks more reasonable (e000, not 2000). The last 4 bits are 
used to encode the prefetchable flag and memory space. Question is how 
that got set in the bogus fashion in Linux..




This is where I am now. I also tested with an Intel reference mainboard (same
chipset, different BIOS) and this one didn't show the problem. That makes me
think that the problem is somewhere in the BIOS, but where? I have access to
the BIOS developers, but they don't know much about Linux, and since the other
operating systems from Redmond are running without problems they are hard to
convince that they made a mistake. :-)


Another very bad side effect of the problem is that when the machine runs
32-bit Linux the graphics card seems to work, but people report corrupted
file systems after a while. I guess that is related to my problem on 64-bit,
only that in the 32-bit case filesystem buffers get overwritten by the
video RAM, and when they are written back to disk... ouch!

I also tried to track the problem down with the CD from LinuxFirmwareKit.org, 
but the resource allocation errors are the same and unfortunately the lack of 
verbosity as well.


Ok, that's it. Any help is much appreciated. 
Thanks

Rainer



--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: SC1200 failure in 2.6.23 and 2.6.24-rc1-git10

2007-11-07 Thread Robert Hancock

Denys wrote:

I finally got the full dmesg with the 1GB card, right to the end. It seems to be unreadable too.



..



ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 in
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for MWDMA1
sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 00 00 00
sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
ata1: EH complete


I'm guessing that your CF-to-IDE adapter doesn't have the correct lines 
wired up for DMA to work properly, and the card indicates DMA support, 
which libata tries to use but which doesn't work. It looks like it never 
tried falling back to PIO after DMA failed. Seems like a deficiency in 
the speed-down logic?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: SATA eating my disk, port reset, destroying unrelated data

2007-11-07 Thread Robert Hancock

Norbert Preining wrote:

Dear all!

(please Cc me for answers)

For about 5 days I have been having serious problems with my SATA drive:

kernel 2.6.22 (from Debian/sid)
hardware nv

Sometimes at boot time, often/always during disk-I/O-intensive work:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40 action 0x2


SError 0x40 means a handshake error. SError indications are usually
due to a hardware problem (bad SATA cable, power or drive problem).



ata1.00: (BMDMA stat 0x25)
ata1.00: cmd 35/00:00:2a:6f:c0/00:04:0c:00:00/e0 tag 0 cdb 0x0 data 524288 out
 res 51/84:10:1a:72:c0/84:01:0c:00:00/e0 Emask 0x10 (ATA bus error)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete

Even worse, sometimes the reset does not work ...

ata1: device not ready (errno=-16), forcing hardreset
ata1: hard resetting port
ata1 SRST failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 10 secs
..

(typed from a digital photo, nothing remains in the logs)

After this I need to do a cold boot otherwise the drive is really in a
bad state and not even the bios gets it right.


If even the BIOS cannot reset properly then that also really points to a 
hardware problem..




Interestingly, the whole thing DID work for a long time until I did too
many things at the same time: 2 x svn up, copying 40G from the SATA
drive to a USB drive, aptitude upgrade. Before, I regularly did the same
stuff (like svn up etc), but this time it was too much, it seems.

Apropos data hosing: after the first incident some data on my Windows
partition (/dev/sda1) was hosed, programs missing, chkdsk necessary
etc.

I attach dmesg (from the current boot with a successful soft reset; I
interrupted the svn process before the SATA drive went into hard reset
failures), .config, lspci -v output.

Are there any chances that using 2.6.23 will improve/fix this? Any other
suggestions?

I would consider it a hardware problem, but since it started with one big
I/O job and has persisted since then, I am a bit sceptical.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: sata NCQ blacklist entry

2007-11-07 Thread Robert Hancock

Tejun Heo wrote:

Florian La Roche wrote:

Hello all,

I've taken the email addresses from the last NCQ blacklist changes going
into the kernel.
This Fujitsu drive also gives me spurious command completions. Detailed
output also available at https://bugzilla.redhat.com/show_bug.cgi?id=366181.

Let me know if you need more info or anything else.

--- drivers/ata/libata-core.c
+++ drivers/ata/libata-core.c
@@ -4222,6 +4222,7 @@
{ "WDC WD740ADFD-00NLR1", NULL,   ATA_HORKAGE_NONCQ, },
{ "WDC WD3200AAJS-00RYA0", "12.01B01",  ATA_HORKAGE_NONCQ, },
{ "FUJITSU MHV2080BH","00840028",   ATA_HORKAGE_NONCQ, },
+   { "FUJITSU MHW2160BJ G2",   NULL,   ATA_HORKAGE_NONCQ },
{ "ST9120822AS",  "3.CLF",  ATA_HORKAGE_NONCQ, },
{ "ST9160821AS",  "3.CLF",  ATA_HORKAGE_NONCQ, },
{ "ST9160821AS",  "3.ALD",  ATA_HORKAGE_NONCQ, },


Thanks.  We're currently trying to find out what's actually going on
with all these drives.  At first, the drives that got blacklisted weren't
many and made sense (they had other problems with NCQ, etc.), but with
new-generation drives from many vendors showing the same symptom, we
aren't so sure now.

I'll keep your email in my todo list and add the drive to the blacklist
once the problem is verified.


I agree that something seems fishy with this. It seems unlikely that 
this many drives from multiple vendors would have the exact same, 
relatively obscure problem..
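
For readers unfamiliar with the table in the patch above: each entry is a
model string, an optional firmware revision, and a flag. A rough sketch of
the lookup follows, on the assumption that a NULL revision means "any
firmware". The struct, function name and flag value are stand-ins for
illustration; the real matching in libata may differ (for example in how
strings are trimmed or pattern-matched).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HORKAGE_NONCQ 0x1u  /* stand-in for ATA_HORKAGE_NONCQ; value is arbitrary */

struct blacklist_entry {
    const char *model;      /* model string reported by the drive    */
    const char *firmware;   /* firmware revision, or NULL for "any"  */
    unsigned int horkage;   /* quirk flags to apply on a match       */
};

/* A few entries copied from the patch quoted above. */
static const struct blacklist_entry blacklist[] = {
    { "FUJITSU MHV2080BH",    "00840028", HORKAGE_NONCQ },
    { "FUJITSU MHW2160BJ G2", NULL,       HORKAGE_NONCQ },
    { "ST9160821AS",          "3.CLF",    HORKAGE_NONCQ },
    { "ST9160821AS",          "3.ALD",    HORKAGE_NONCQ },
};

/* Return the quirk flags for a drive, or 0 if it is not listed. */
static unsigned int lookup_horkage(const char *model, const char *firmware)
{
    assert(model != NULL && firmware != NULL);

    for (size_t i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++) {
        const struct blacklist_entry *e = &blacklist[i];

        if (strcmp(e->model, model) != 0)
            continue;
        /* NULL firmware in the table matches every revision of the model. */
        if (e->firmware == NULL || strcmp(e->firmware, firmware) == 0)
            return e->horkage;
    }
    return 0;
}
```

Under that assumption the proposed Fujitsu entry would disable NCQ for
every firmware revision of that model, while the Seagate entries only hit
the two listed revisions.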


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-06 Thread Robert Hancock

Frank van Maarseveen wrote:

For quite some time I've been seeing occasional lockups spread over the 50
machines I'm maintaining. Symptom: a page allocation failure with order:1,
GFP_ATOMIC, while there seems to be plenty of memory (lots of free
pages, almost no swap used), followed by a lockup (everything dead). I've
collected all (12) crash cases which occurred in the last 10 weeks on 50
machines total (i.e. 1 crash every 41 weeks per machine on average). The
kernel messages are summarized to show the interesting part (IMO) they have
in common. Over the years this has become crash cause #1 for stable
kernels for me (fglrx doesn't count ;).

One note: I suspect that reporting a GFP_ATOMIC allocation failure in a
network driver via that same driver (netconsole) may not be the smartest
thing to do and this could be responsible for the lockup itself. However,
the initial page allocation failure remains and I'm not sure how to
address that problem.

I still think the issue is memory fragmentation but if so, it looks
a bit extreme to me: One system with 2GB of ram crashed after a day,
merely running a couple of TCP server programs. All systems have either
1 or 2GB ram and at least 1G of (merely unused) swap.


These are all order-1 allocations for received network packets that need 
to be allocated out of low memory (assuming you're using a 32-bit 
kernel), so it's quite possible for them to fail on occasion. (Are you 
using jumbo frames?)


That should not be causing a lockup though.. the received packet should 
just get dropped.
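
As background on "order:1": the order is the base-2 logarithm of the
number of physically contiguous pages requested, so an order-1 allocation
needs two adjacent free pages. That can fail even with many free pages if
none of them happen to be neighbours. A small userspace sketch of the
arithmetic, assuming 4 KiB pages (the function name is made up and this is
not the kernel's own helper):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
static unsigned int alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_BYTES;

    assert(size > 0);
    while (span < size) {
        span <<= 1;   /* each order doubles the contiguous span */
        order++;
    }
    return order;
}
```

With these assumptions, anything from 4097 to 8192 bytes is order 1, and a
9000-byte buffer would already be order 2, which is presumably why the
jumbo frame question matters.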


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

2007-10-30 Thread Robert Hancock

Linus Torvalds wrote:


On Tue, 30 Oct 2007, Robert Hancock wrote:

You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
startup.

If it does, it's not by necessity. As soon as you read the table location out
of the ACPI tables you can start using it, and that shouldn't require any
config space accesses.


Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to 
use conf1 (which we can trust) to verify it and/or fix things up.


My point was, it's not inherently necessary in order to use MMCONFIG. 
I'm not saying the checks (unreachable_devices and 
pci_mmcfg_check_hostbridge) aren't useful or needed with many real 
machines. However, in the event that type1 access isn't available we 
just skip all those checks because we have no other option. It would 
indeed be a pretty broken spec if there was no way to bootstrap with it 
even under ideal conditions..
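
To illustrate why no config cycles are inherently needed to bootstrap:
once the base address has been read from the ACPI table, an MMCONFIG
access is just a memory access at a computed offset, whereas a type1
(conf1) access goes through an address written to I/O port 0xCF8. A
sketch of the two address computations follows. The function names are
invented for this example and the base address in the note is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Type 1 (conf1): the value written to I/O port 0xCF8. The data is then
 * transferred through port 0xCFC. Only registers 0..255 are reachable.
 */
static uint32_t conf1_address(unsigned bus, unsigned dev, unsigned fn,
                              unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) |
           (reg & 0xFCu);   /* dword-aligned register number */
}

/*
 * MMCONFIG: physical address of a config register, given the base taken
 * from the ACPI table. Each function gets 4 KiB, so registers 0..4095
 * are reachable.
 */
static uint64_t mmconfig_address(uint64_t base, unsigned bus, unsigned dev,
                                 unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return base + (((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | reg);
}
```

For a hypothetical base of 0xE0000000, register 0x100 of device 01:02.3
lands at 0xE0113100; the same register cannot be expressed as a conf1
address at all, since it is above 255.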




Also, there are several devices that don't show up in the MMCFG things, or 
just otherwise get it wrong.


So just take a look at arch/x86/pci/mmconfig-shared.c and look for 
"conf1".


Really. Damn, I'm nervous taking any MMCFG patches that have you as an
author, if you aren't even aware of these kinds of fundamental issues. You
probably read the standards about how things are "supposed" to work, and
then just believed them?


Rule #1 in kernel programming: don't *ever* think that things actually 
work the way they are documented to work. The documentation is a starting 
point, nothing else. 

And please be defensive in programming. We *know* conf1 cycles work. The 
hardware has been extensively tested, and there are no firmware 
interactions. There are *zero* reasons to use MMCONF cycles for normal
devices. Ergo: switching over to MMCONF when not needed is stupid and 
fragile.


I can't really disagree that MMCONFIG doesn't have great advantages for 
most devices (though it likely is faster on a lot of platforms, which 
may be significant if the device does lots of config space accesses). So 
for the moment, avoiding using it except where necessary will likely 
work out (except if some system does indeed puke on mixing type1 and 
MMCONFIG).


However, what Microsoft is doing with Vista may eventually make a
difference. Many hardware vendors seem to use the testing
strategy of "test with latest Windows version. Works OK? Ship it." If
Vista decides that MMCONFIG is good to use all the time, then type1
access support is likely going to a) end up less tested and b) probably
be deleted entirely in time. We've seen it before - it used to be that not
using ACPI was the safe option on most hardware with Linux. Now you 
pretty much have to use it because the manufacturers only test with it 
enabled. I've seen at least one board where the interrupt routing was 
completely broken with ACPI off, because they obviously only tested in 
Windows..




Re: sata_nv and dynamically changing DMA mask?

2007-10-30 Thread Robert Hancock

Alan Cox wrote:

On Mon, 29 Oct 2007 22:17:40 -0600
Robert Hancock <[EMAIL PROTECTED]> wrote:

In the sata_nv driver, when running in ADMA mode, we can do 64-bit DMA. 
However, when an ATAPI device like a DVD drive is connected, we can't 
use ADMA mode, and so we have to abide by the restrictions of a normal 
SFF ATA controller and can only do 32-bit DMA. We detect this and try to 
set the blk_queue_bounce_limit, blk_queue_segment_boundary and 
blk_queue_max_hw_segments to the values corresponding to a normal SFF 
controller.


What about the DMA padding buffer from nv_adma_port_start and internal
buffers for commands like request sense that don't come via the request
queue directly?


Indeed we do call ata_port_start from nv_adma_port_start, which calls 
dmam_alloc_coherent to allocate the SFF PRD table. Since the DMA mask is 
64-bit, this could indeed be allocated above 4GB which would be bad.


I suppose what we could do is just not call ata_port_start there, but
move it into nv_adma_slave_config and call it when going into non-ADMA
mode. We'd have to drop the DMA mask down to 32-bit first, as well as
setting blk_queue_bounce_limit, which raises one of my questions: is
this OK to do?
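
To spell out the constraint being discussed: in non-ADMA mode every
buffer the controller touches, including the PRD table itself, has to
satisfy the 32-bit limit, and data segments additionally must not cross
the segment boundary. A sketch of the two checks is below. The constants
are assumptions about what a normal SFF controller requires, and the
names are invented for this example rather than taken from libata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SFF_DMA_MASK      0xFFFFFFFFull  /* assumed: 32-bit addressing only      */
#define SFF_DMA_BOUNDARY  0xFFFFull      /* assumed: no crossing a 64 KiB window */

/* True if the whole buffer [addr, addr + len) is addressable under mask. */
static bool fits_dma_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
    assert(len > 0);
    return addr <= mask && (len - 1) <= mask - addr;
}

/* True if the buffer straddles a boundary (boundary is size - 1). */
static bool crosses_boundary(uint64_t addr, uint64_t len, uint64_t boundary)
{
    assert(len > 0);
    /* First and last byte share a window iff they agree above the mask. */
    return (addr | boundary) != ((addr + len - 1) | boundary);
}
```

A PRD table allocated at, say, 0x100000000 passes under a 64-bit mask but
fails fits_dma_mask() with SFF_DMA_MASK, which is the case described
above where the allocation lands above 4GB.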



Also it seems nv_adma_use_reg_mode() can decide to send other commands
via the non ADMA interface even for ATA devices. Are we 100% certain it
never decides to let through a command with DMA via the register
interface in this case? What do you see if you instrument the function?


The only cases where that could happen are for polling DMA commands
(which I presume we never do) or where a result taskfile is requested. The
latter could be a problem for ATA passthrough commands using DMA, I
suppose. The question is what we can do about it: we have to switch out of
ADMA mode to read a result taskfile. I guess that's not really a problem
unless somebody starts issuing NCQ commands via ATA pass-through. Do we
allow that?



Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

2007-10-30 Thread Robert Hancock

Linus Torvalds wrote:


On Tue, 30 Oct 2007, Arjan van de Ven wrote:

the problem is... you're not supposed to mix both types of accesses.


You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for 
startup.


If it does, it's not by necessity. As soon as you read the table 
location out of the ACPI tables you can start using it, and that 
shouldn't require any config space accesses.




Also, there's reason to believe that mixing things up _has_ to work 
anyway, and if the issue is between "works in practice" and "theory says 
that you shouldn't mix", I'll take practice every time.


Especially since we *know* that the theory is broken. Right now MMCONFIG 
is effectively disabled very aggressively because it's simply unusably 
flaky. So the choice is between:


 - don't use MMCONFIG at all, because it has so many problems
 - use MMCONFIG sparingly enough to hide the problems


Fact is, we don't really know how many of these systems with supposedly 
"broken" MMCONFIG were really just suffering from the overlapping 
PCI/MMCONFIG address space problem, which is entirely the fault of the 
way we do PCI probing. I would bet quite a few of them.




and what "you're supposed to do" is simply trumped by Real Life(tm). 
Because Intel screwed up so badly when they designed that piece of shit.


(Where "screwed up badly" is the usual "left it to firmware people" thing, 
of course. Dammit, Intel *could* have just made it a real PCI BAR in the 
Northbridge, and specified it as such, and we wouldn't have these 
problems! But no, it had to be another idiotic "firmware tells where it 
is" thing)


This wouldn't have helped anything with the problem in question.



The fact is, CONF1 style accesses are just safer, and *work*. 

I would suggest a slight twist then: use CONF1 *until* you're using
something above 256, and then and only then switch to MMCONFIG from
then on for all accesses.


No.

Maybe if you do it per-device, and only *after* probing (ie we have seen 
multiple, and successful, accesses), but globally, absolutely not. That 
would be useless. The bugs we have had in this area have been exactly the 
kinds of things like "we don't know the real size of the MMCONFIG areas" 
etc.


I could easily see device driver writers probing to see if something 
works, and I absolutely don't think we should just automatically enable 
MMCONFIG from then on.


Why per device? It's not like the MSI case where both the platform and 
the device are potentially busted. Whether or not MMCONFIG works has 
nothing to do with the device, all that matters is whether it works on 
the platform. It shouldn't be the driver's responsibility to know this.




But maybe we could have a per-device flag that a driver *can* set. Ie have 
the logic be:


 - use MMCONFIG if we have to (reg >= 256)

OR

 - use MMCONFIG if the driver specifically asked us to

and then drivers that absolutely need it, and know they do, can set that 
flag. Preferably after they actually verified that it works.


How will they verify that it works? If it works, then verifying it works 
is all well and good. If it doesn't work, trying to verify if it does 
could very well blow up the machine.


I've made the point before that if we're going to allow using it at all, 
we'd better find out if it works or not early on, not after we've been 
running and somebody decides it's a good idea to try using it and 
causing a lockup or something.
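
The selection rule proposed above, combined with the point that usability
should be decided once for the platform and early, could be sketched
roughly like this. Everything here is hypothetical: the enum, the
per-device flag and the mmconfig_usable parameter are invented for
illustration and do not correspond to existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cfg_method { CFG_CONF1, CFG_MMCONFIG, CFG_UNREACHABLE };

struct cfg_dev {
    bool driver_wants_mmconfig;  /* hypothetical per-device opt-in flag */
};

/*
 * mmconfig_usable is a platform-wide answer determined once at boot,
 * not something each driver probes for itself.
 */
static enum cfg_method choose_method(const struct cfg_dev *dev, unsigned reg,
                                     bool mmconfig_usable)
{
    assert(dev != NULL && reg < 4096);

    /* Extended config space cannot be reached with conf1 at all. */
    if (reg >= 256)
        return mmconfig_usable ? CFG_MMCONFIG : CFG_UNREACHABLE;

    /* Below 256, use MMCONFIG only if the driver explicitly asked. */
    if (dev->driver_wants_mmconfig && mmconfig_usable)
        return CFG_MMCONFIG;

    return CFG_CONF1;
}
```

In this form a driver setting the flag never has to find out the hard way
whether MMCONFIG works; if the platform check failed, it silently stays on
conf1 for the registers conf1 can reach.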




That way you _can_ get the "this is how you're supposed to do it" 
behaviour, but you get it when there is a reasonable chance that it 
actually works.


And quite frankly, if you're not supposed to mix these things even across 
devices, then I think we are better off just doing what we effectively do 
now: mostly ignore the damn thing because it's too broken to use.


Maybe somebody inside Intel could just clarify the documentation, and 
change it from "you're not supposed to mix" to "mix all you want". 


Intel could say what they want on the subject.. but that doesn't 
necessarily reflect what happens with anyone else's chipset implementations.



Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

2007-10-30 Thread Robert Hancock

Linus Torvalds wrote:


On Tue, 30 Oct 2007, Arjan van de Ven wrote:

the problem is... you're not supposed to mix both types of accesses.


You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for 
startup.


If it does, it's not by necessity. As soon as you read the table 
location out of the ACPI tables you can start using it, and that 
shouldn't require any config space accesses.




Also, there's reason to believe that mixing things up _has_ to work 
anyway, and if the issue is between works in practice and theory says 
that you shouldn't mix, I'll take practice every time.


Especially since we *know* that the theory is broken. Right now MMCONFIG 
is effectively disabled very aggressively because it's simply unusably 
flaky. So the choice is between:


 - don't use MMCONFIG at all, because it has so many problems
 - use MMCONFIG sparingly enough to hide the problems


Fact is, we don't really know how many of these systems with supposedly 
broken MMCONFIG were really just suffering from the overlapping 
PCI/MMCONFIG address space problem, which is entirely the fault of the 
way we do PCI probing. I would bet quite a few of them.




and what you're supposed to do is simply trumped by Real Life(tm). 
Because Intel screwed up so badly when they designed that piece of shit.


(Where screwed up badly is the usual left it to firmware people thing, 
of course. Dammit, Intel *could* have just made it a real PCI BAR in the 
Northbridge, and specified it as such, and we wouldn't have these 
problems! But no, it had to be another idiotic firmware tells where it 
is thing)


This wouldn't have helped anything with the problem in question.



The fact is, CONF1 style accesses are just safer, and *work*. 

I would suggest a slight twist then: use CONF1 *until* you're using
something above 256, and then and only then switch to MMCONFIG from
then on for all accesses.


No.

Maybe if you do it per-device, and only *after* probing (ie we have seen 
multiple, and successful, accesses), but globally, absolutely not. That 
would be useless. The bugs we have had in this area have been exactly the 
kinds of things like we don't know the real size of the MMCONFIG areas 
etc.


I could easily see device driver writers probing to see if something 
works, and I absolutely don't think we should just automatically enable 
MMCONFIG from then on.


Why per device? It's not like the MSI case where both the platform and 
the device are potentially busted. Whether or not MMCONFIG works has 
nothing to do with the device, all that matters is whether it works on 
the platform. It shouldn't be the driver's responsibility to know this.




But maybe we could have a per-device flag that a driver *can* set. Ie have 
the logic be:


 - use MMCONFIG if we have to (reg = 256)

OR

 - use MMCONFIG if the driver specifically asked us to

and then drivers that absolutely need it, and know they do, can set that 
flag. Preferably after they actually verified that it works.


How will they verify that it works? If it works, then verifying it works 
is all well and good. If it doesn't work, trying to verify if it does 
could very well blow up the machine.


I've made the point before that if we're going to allow using it at all, 
we'd better find out if it works or not early on, not after we've been 
running and somebody decides it's a good idea to try using it and 
causing a lockup or something.




That way you _can_ get the this is how you're supposed to do it 
behaviour, but you get it when there is a reasonable chance that it 
actually works.


And quite frankly, if you're not supposed to mix these things even across 
devices, then I think we are better off just doing what we effectively do 
now: mostly ignore the damn thing because it's too broken to use.


Maybe somebody inside Intel could just clarify the documentation, and 
change it from you're not supposed to mix to mix all you want. 


Intel could say what they want on the subject.. but that doesn't 
necessarily reflect what happens with anyone else's chipset implementations.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sata_nv and dynamically changing DMA mask?

2007-10-30 Thread Robert Hancock

Alan Cox wrote:

On Mon, 29 Oct 2007 22:17:40 -0600
Robert Hancock [EMAIL PROTECTED] wrote:

In the sata_nv driver, when running in ADMA mode, we can do 64-bit DMA. 
However, when an ATAPI device like a DVD drive is connected, we can't 
use ADMA mode, and so we have to abide by the restrictions of a normal 
SFF ATA controller and can only do 32-bit DMA. We detect this and try to 
set the blk_queue_bounce_limit, blk_queue_segment_boundary and 
blk_queue_max_hw_segments to the values corresponding to a normal SFF 
controller.


What about the DMA padding buffer from nv_adma_port_start and internal
buffers for commands like request sense that don't come via the request
queue directly.


Indeed we do call ata_port_start from nv_adma_port_start, which calls 
dmam_alloc_coherent to allocate the SFF PRD table. Since the DMA mask is 
64-bit, this could indeed be allocated above 4GB which would be bad.
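
To make the concern concrete, here is a minimal user-space sketch in Python (a model, not kernel code) of what it means for a buffer to fit a DMA mask. The helper name fits_dma_mask and the sample addresses are made up for illustration; only the 32-bit and 64-bit mask widths come from the discussion above.

```python
# Illustrative model only: checks whether a DMA buffer is addressable
# under a given DMA mask. Names and addresses are hypothetical.

DMA_MASK_32 = (1 << 32) - 1   # what an SFF-style engine can address
DMA_MASK_64 = (1 << 64) - 1   # what ADMA mode can address


def fits_dma_mask(bus_addr: int, length: int, mask: int) -> bool:
    """Return True if every byte of [bus_addr, bus_addr + length) is <= mask."""
    if length <= 0:
        raise ValueError("length must be positive")
    last_byte = bus_addr + length - 1
    return last_byte <= mask


# A table allocated just below 4GB is fine for a 32-bit engine...
low_table = 0xFFFF_0000
# ...but one placed above 4GB (possible with a 64-bit mask) is not.
high_table = 0x1_2000_0000

print(fits_dma_mask(low_table, 4096, DMA_MASK_32))
print(fits_dma_mask(high_table, 4096, DMA_MASK_32))
print(fits_dma_mask(high_table, 4096, DMA_MASK_64))
```

The point of the model is only that an allocation made while the mask is 64-bit carries no guarantee of being reachable once the port falls back to 32-bit operation.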


I suppose what we could do is just not call ata_port_start there, but 
move it into nv_adma_slave_config and call it when going into non-ADMA 
mode. We'd have to drop the DMA mask down to 32-bit first, as well as 
setting blk_queue_bounce_limit, though. That is one of my questions: is 
this OK to do?



Also it seems nv_adma_use_reg_mode() can decide to send other commands
via the non ADMA interface even for ATA devices. Are we 100% certain it
never decides to let through a command with DMA via the register
interface in this case - what do you see if you instrument the function ?


The only cases where that could happen are for polling DMA commands 
(which I presume we never do) or where a result taskfile is requested. The 
latter could be a problem for ATA passthrough commands using DMA, I 
suppose.. Question is what we can do about it.. We have to switch out of 
ADMA mode to read a result taskfile. I guess that's not really a problem 
unless somebody starts issuing NCQ commands via ATA pass-through. Do we 
allow that?



Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

2007-10-30 Thread Robert Hancock

Linus Torvalds wrote:


On Tue, 30 Oct 2007, Robert Hancock wrote:

You have to, anyway. Even now the MMCONFIG stuff uses CONF1 cycles for
startup.

If it does, it's not by necessity. As soon as you read the table location out
of the ACPI tables you can start using it, and that shouldn't require any
config space accesses.


Don't be silly. Exactly _BECAUSE_ we cannot trust the firmware, we have to 
use conf1 (which we can trust) to verify it and/or fix things up.


My point was, it's not inherently necessary in order to use MMCONFIG. 
I'm not saying the checks (unreachable_devices and 
pci_mmcfg_check_hostbridge) aren't useful or needed with many real 
machines. However, in the event that type1 access isn't available we 
just skip all those checks because we have no other option. It would 
indeed be a pretty broken spec if there was no way to bootstrap with it 
even under ideal conditions..
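
For reference, the two access methods differ only in how a bus/device/function/register tuple is turned into an address. The sketch below is plain Python, not kernel code; the function names and the example MCFG base of 0xe0000000 are assumptions for illustration. The bit layouts are the standard type 1 CONFIG_ADDRESS format and the PCI Express memory-mapped configuration layout.

```python
CONF1_ENABLE = 0x8000_0000   # enable bit of the 0xCF8 CONFIG_ADDRESS register


def _check_bdf(bus: int, dev: int, fn: int) -> None:
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8):
        raise ValueError("bus/device/function out of range")


def conf1_address(bus: int, dev: int, fn: int, reg: int) -> int:
    """Value written to port 0xCF8 for a type 1 access (data port is 0xCFC)."""
    _check_bdf(bus, dev, fn)
    if not 0 <= reg < 256:
        raise ValueError("standard type 1 cycles only reach the first 256 bytes")
    return CONF1_ENABLE | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


def mmconfig_address(base: int, bus: int, dev: int, fn: int, reg: int) -> int:
    """Physical address of a config register inside the MCFG window."""
    _check_bdf(bus, dev, fn)
    if not 0 <= reg < 4096:
        raise ValueError("config space is 4096 bytes per function")
    return base + ((bus << 20) | (dev << 15) | (fn << 12) | reg)


EXAMPLE_MCFG_BASE = 0xE000_0000  # hypothetical base taken from an MCFG table

print(hex(conf1_address(0, 7, 0, 0x10)))
print(hex(mmconfig_address(EXAMPLE_MCFG_BASE, 0, 7, 0, 0x10)))
```

Note that the memory-mapped form needs nothing but the base address, which is why it can in principle be used without any prior type 1 access.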




Also, there are several devices that don't show up in the MMCFG things, or 
just otherwise get it wrong.


So just take a look at arch/x86/pci/mmconfig-shared.c and look for 
conf1.


Really. Damn, I'm nervous taking any MMCFG patches that has you as an 
author, if you aren't even aware of these kinds of fundamental issues. You 
probably read the standards about how things are supposed to work, and 
then just believed them?


Rule #1 in kernel programming: don't *ever* think that things actually 
work the way they are documented to work. The documentation is a starting 
point, nothing else. 

And please be defensive in programming. We *know* conf1 cycles work. The 
hardware has been extensively tested, and there are no firmware 
interactions. There is *zero* reasons to use MMCONF cycles for normal 
devices. Ergo: switching over to MMCONF when not needed is stupid and 
fragile.


I can't really disagree that MMCONFIG doesn't have great advantages for 
most devices (though it likely is faster on a lot of platforms, which 
may be significant if the device does lots of config space accesses). So 
for the moment, avoiding using it except where necessary will likely 
work out (except if some system does indeed puke on mixing type1 and 
MMCONFIG).


However, what Microsoft is doing with Vista may eventually make a 
difference. Many hardware vendors seem to use the testing strategy of 
"test with latest Windows version. Works OK? Ship it." If Vista decides 
that MMCONFIG is good to use all the time, then type1 access support is 
likely going to a) end up less tested and b) probably be deleted 
entirely in time. We've seen it before - it used to be that not using 
ACPI was the safe option on most hardware with Linux. Now you pretty 
much have to use it because the manufacturers only test with it 
enabled. I've seen at least one board where the interrupt routing was 
completely broken with ACPI off, because they obviously only tested in 
Windows..




sata_nv and dynamically changing DMA mask?

2007-10-29 Thread Robert Hancock

In the sata_nv driver, when running in ADMA mode, we can do 64-bit DMA. 
However, when an ATAPI device like a DVD drive is connected, we can't 
use ADMA mode, and so we have to abide by the restrictions of a normal 
SFF ATA controller and can only do 32-bit DMA. We detect this and try to 
set the blk_queue_bounce_limit, blk_queue_segment_boundary and 
blk_queue_max_hw_segments to the values corresponding to a normal SFF 
controller.
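
As a rough illustration of what those three queue limits constrain, here is a small user-space model in Python (not kernel code). The QueueLimits type, the violations helper and the numeric limits (32-bit bounce limit, 64KB segment boundary, 128 segments) are assumptions chosen to resemble a legacy SFF controller, not values read out of sata_nv.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[int, int]  # (bus address, length in bytes)


@dataclass(frozen=True)
class QueueLimits:
    bounce_limit: int      # highest address the device can reach directly
    segment_boundary: int  # mask; a segment may not cross (boundary + 1)
    max_hw_segments: int   # longest scatter/gather list accepted


# Hypothetical limits resembling a legacy SFF controller.
SFF_LIMITS = QueueLimits(bounce_limit=0xFFFF_FFFF,
                         segment_boundary=0xFFFF,
                         max_hw_segments=128)


def violations(segments: List[Segment], limits: QueueLimits) -> List[str]:
    """List every way a scatter/gather list breaks the given limits."""
    problems = []
    if len(segments) > limits.max_hw_segments:
        problems.append("too many segments")
    for addr, length in segments:
        last = addr + length - 1
        if last > limits.bounce_limit:
            problems.append(f"segment at {addr:#x} is above the bounce limit")
        # First and last byte must fall in the same boundary-sized window.
        if (addr | limits.segment_boundary) != (last | limits.segment_boundary):
            problems.append(f"segment at {addr:#x} crosses a boundary")
    return problems


print(violations([(0x1FF00, 0x200)], SFF_LIMITS))
print(violations([(0x1_0000_0000, 4096)], SFF_LIMITS))
```

In this model a request that was built under looser limits and then handed to a more restricted engine shows up as a non-empty list.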


However, we have this bug report:

https://bugzilla.redhat.com/show_bug.cgi?id=351451

The reporter's DVD drive doesn't work properly on a computer with 4GB of 
RAM unless they either disable ADMA (thus resulting in the DMA parameters 
being initialized to the SFF ones from the start) or pass mem=3000M to 
the kernel to keep the memory above the 4GB mark from being used. Thus I 
suspect that what we're trying to do with the DMA parameters is not 
taking effect.


Question is: is setting blk_queue_bounce_limit enough to prevent 
addresses outside that mask from showing up, or does the device DMA mask 
also need to be updated? Is there anything wrong with just changing the 
DMA mask at runtime? Keep in mind, ATAPI and non-ATAPI devices can 
potentially be switched out on the port, so the mask might need to be 
updated at runtime..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Page-Out of RO data

2007-10-29 Thread Robert Hancock

vicky wrote:

Hi,
Can the read-only (RO) section/data of the kernel ever be paged out of memory?

-Vicky



All kernel code and data is non-swappable in Linux..

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Strange freezes (seems like SATA related)

2007-10-29 Thread Robert Hancock

Max Krasnyansky wrote:

A couple of HP xw9300 machines (dual Opterons) started freezing up.
We're running on 2.6.22.1 on them. Freezes are somewhat weird. VGA console is alive
(I can switch vts, etc) but everything else is dead (network, etc).
Unfortunately SYSRQ was not enabled and I could not get backtraces and stuff.

Hooked up serial console and the only error that shows up is this.

ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Descriptor sense data with sense descriptors (in hex):
end_request: I/O error, dev sda, sector 8388695
Buffer I/O error on device sda1, logical block 1048579
lost page write due to I/O error on sda1
sd 0:0:0:0: [sda] Write Protect is off

I see a bunch of those and then the box just sits there spewing this periodically

ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

SMART selftest on the drive passed without errors.

Here is how this machine looks like

00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
05:04.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
05:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
0a:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
40:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
40:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
40:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
40:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
61:04.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
61:06.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
61:06.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
61:09.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
62:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
63:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
81:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)

As I mentioned dual Opteron, NUMA. Nothing fancy in the kernel config. 


Any ideas what might the problem be ?


Can you post the full dmesg output? What kind of drive is this?
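
As an aside for anyone reading the quoted error, the cmd line can be decoded by hand. The sketch below is a small Python helper (the name decode_lba28 is made up here) that assumes the libata print layout command/feature:nsect:lbal:lbam:lbah/hob bytes/device and a 28-bit LBA command; 0xca is WRITE DMA. Under those assumptions the first quoted command decodes to the same sector that end_request reports, and 8 sectors of 512 bytes matches the "data 4096 out" field.

```python
def decode_lba28(cmd_field: str):
    """Decode e.g. 'ca/00:08:57:00:80/00:00:00:00:00/e0'.

    Returns (command opcode, sector count, starting LBA), assuming a
    28-bit LBA command where the low nibble of the device byte holds
    LBA bits 27:24.
    """
    command, low, _hob, device = cmd_field.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    dev = int(device, 16)
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    count = nsect or 256  # a count of 0 means 256 sectors for 28-bit commands
    return int(command, 16), count, lba


opcode, count, lba = decode_lba28("ca/00:08:57:00:80/00:00:00:00:00/e0")
print(hex(opcode), count, lba, count * 512)
```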

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: pci-disable-decode-of-io-memory-during-bar-sizing.patch

2007-10-29 Thread Robert Hancock

Greg KH wrote:

On Fri, Oct 26, 2007 at 09:59:45AM -0700, Jesse Barnes wrote:

On Thursday, October 25, 2007 7:54 pm Greg KH wrote:

On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:

I think Greg doesn't like it, even though we don't have an
alternative at this point...

Yes, I didn't like it, Ivan didn't like it, and I got reports that it
wasn't even needed at all once you upgraded your BIOS to the latest
version.

So, is this still needed?  And if so, can you try to implement what
Ivan suggested to do here instead?

Yes, it's still needed.  Auke rescinded his "BIOS upgrade makes it work" 
message, so something like this is still necessary.


He did?  Ugh, I can't keep these all straight, sorry.

Can someone just send what they think is still needed, and explain why
Ivan will not object to it?  :)


Here's a recap of the whole issue just for people's information:

Right now we disable MMCONFIG on machines where the MCFG area is not 
reserved in the E820 memory map since we figure it's not valid. This is 
a broken heuristic because the PCI Express firmware spec doesn't require 
that it be so reserved; it only needs to be reserved as an ACPI 
motherboard resource, so many times it's not reserved in E820 
despite being completely valid and working. The 
mmconfig-validate-against-acpi-motherboard-resources.patch changes this 
to validate against the ACPI motherboard resources instead.
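
The check being described boils down to asking whether the whole MCFG window lies inside some region the firmware has marked as reserved. Here is a minimal sketch of that idea in Python (a user-space model, not the patch itself); the helper names and both region lists are invented for illustration, and the only fixed number is the 1MB of config space per bus.

```python
MB = 1 << 20


def mcfg_window(base: int, start_bus: int, end_bus: int):
    """Half-open range [start, end) covered by MMCONFIG for the given buses."""
    return base + start_bus * MB, base + (end_bus + 1) * MB


def window_is_reserved(window, reserved_regions) -> bool:
    """True if a single reserved region contains the whole window.

    Simplification: adjacent regions are not merged before checking.
    """
    start, end = window
    return any(r_start <= start and end <= r_end
               for r_start, r_end in reserved_regions)


# Hypothetical machine: E820 does not mention the MCFG area at all,
# while the ACPI motherboard resources do.
e820_reserved = [(0xFEC0_0000, 0xFED0_0000)]
acpi_motherboard = [(0xE000_0000, 0xF000_0000), (0xFEC0_0000, 0xFED0_0000)]

window = mcfg_window(0xE000_0000, 0, 255)
print(window_is_reserved(window, e820_reserved))     # rejected by the old check
print(window_is_reserved(window, acpi_motherboard))  # accepted by the new one
```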


The second problem is that on some machines, when we are doing BAR 
sizing on PCI devices, and write all ones to a BAR in order to determine 
how many bits "stick", the BAR ends up overlapping with the MCFG area. 
On some chipsets, this causes writes to the MCFG area (like, to restore 
the original BAR contents) to get decoded by the device instead of by 
the MCFG mechanism, which means the BAR stays disabled and configuration 
access stops working, wreaking havoc. Usually on these machines the 
MMCONFIG is located near the top of 32-bit memory and the PCI device 
causing problems is a PCI Express graphics card. 
pci-disable-decode-of-io-memory-during-bar-sizing.patch, and its 
successors, switch off the device's decoding during sizing so that it 
won't absorb the accesses to the MCFG table.
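
To show why the overlap happens, here is a toy Python model of sizing a 32-bit memory BAR. It is a sketch under stated assumptions: the MemBar32 class, the 256MB BAR at 0xc0000000 and the MCFG address 0xf0000000 are all made up, and the restore step here is a direct write, whereas on the affected machines that write goes through MMCONFIG and is what gets swallowed.

```python
MASK32 = 0xFFFF_FFFF


class MemBar32:
    """Toy 32-bit memory BAR: address bits below the size are hard-wired to 0."""

    def __init__(self, size: int, base: int):
        if size < 16 or size & (size - 1):
            raise ValueError("size must be a power of two >= 16")
        self.size = size
        self.value = 0
        self.write(base)

    def write(self, v: int) -> None:
        self.value = v & ~(self.size - 1) & MASK32

    def read(self) -> int:
        return self.value  # flag bits always read as 0 in this toy

    def decodes(self, addr: int) -> bool:
        """Would the device claim an access to addr with decode enabled?"""
        return self.value <= addr < self.value + self.size


def size_bar(bar: MemBar32, mcfg_addr: int):
    """Return (size, overlapped): BAR size, and whether the BAR covered
    mcfg_addr while it temporarily held the all-ones probe value."""
    original = bar.read()
    bar.write(MASK32)                    # write all ones
    probed = bar.read()                  # see which bits "stick"
    overlapped = bar.decodes(mcfg_addr)  # the dangerous window
    bar.write(original)                  # restore
    size = (~(probed & 0xFFFF_FFF0) & MASK32) + 1
    return size, overlapped


graphics = MemBar32(size=256 << 20, base=0xC000_0000)
print(size_bar(graphics, mcfg_addr=0xF000_0000))
```

A large BAR reads back a probe value far below 4GB, so it briefly covers anything located near the top of 32-bit memory; a small BAR does not.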


The concern raised was that this might affect some devices negatively. 
We do avoid disabling decode on host bridges, as it's known that some of 
them disable RAM access when you turn decode off, stupidly. I've yet to 
hear of any other conclusive case where disabling the decode is harmful. 
In general, if disabling the decode causes issues, the mere fact of 
doing the BAR sizing could cause the same issues, and that is unavoidable.


The other possible workaround would be to avoid using MMCONFIG until the 
BAR sizing is done. However, this seems like a poor solution. First of 
all, in the future there may come machines where MMCONFIG is the only 
config mechanism (or, perhaps more likely, it becomes the only tested 
one, so the old methods get broken). Secondly, what happens with 
hot-plug devices that need to be sized after MMCONFIG gets turned on?


The only way these two patches are related is that the E820 check 
happens to wrongly disable MMCONFIG on some of the machines where the 
memory areas could overlap during sizing, so removing that check alone 
without fixing the overlap issue could cause breakage on some machines. 
However, this is purely by chance, and it doesn't prevent the breakage 
on many other machines - as well as the one mentioned in the earlier 
thread, there's this one:


https://bugzilla.redhat.com/show_bug.cgi?id=251493

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: Major SATA / EXT3 Issue?

2007-10-28 Thread Robert Hancock

Chris Holvenstot wrote:

I am curious if anyone else has had major problems with SATA drives on
the current series of kernels.  I have (or rather had) two SATA drives
on my system - the first was a Maxtor MaxLine 500 and the second was a
Maxtor MaxLine 250.

Both of these drives were set to the 1.5 Gigabit / second mode.

My SATA controller is integrated on my MSI motherboard and sports four
ports.  It is implemented using the Nvidia CK804 chipset.  My processor
is an AMD64 X2 4600+ running the 32 bit version of Linux.  


I have had these drives up and running for about six months.

The first drive "failed" about 10 days ago - and unfortunately I focused
on hardware error and after several attempts to get the drive back
online I physically pulled it from the system.  This drive was used for
backups and thus was not critical to day-to-day operations.  


However, tonight I "lost" a second SATA drive, this one I use on a daily
basis for my kernel build and test processes.  It failed in the same
manner as the first, which makes me a little suspicious.


The first drive “failed” while I was running a modified Ubuntu 7.04
system. Because I focused on hardware as the reason for the failure I
did not collect specific information about the version of the kernel
being used, but it was likely to be 2.6.24-git8.


The second drive “failed” tonight on what is, except for the kernel, a
fairly standard Ubuntu 7.10 system (the same hardware - I upgraded my OS
this past week) – the kernel in use tonight at the time of the second
failure was 2.6.24-rc1-git1


In each case the failure mode appears to have been the same – the system
appears to lock up. When rebooted I get a long string of messages like:


Oct 26 20:07:37 localhost kernel: [ 101.581091] ata2: timeout waiting for ADMA IDLE, stat=0x440
Oct 26 20:07:37 localhost kernel: [ 101.581096] sd 1:0:0:0: [sda] Write Protect is off
Oct 26 20:07:37 localhost kernel: [ 101.581174] res 71/04:08:00:00:00/04:00:1d:00:00/e0 Emask 0x1 (device error)
Oct 26 20:07:37 localhost kernel: [ 101.644992] ata2.00: configured for UDMA/33
Oct 26 20:07:37 localhost kernel: [ 101.644994] ata2: EH complete
Oct 26 20:07:37 localhost kernel: [ 101.645006] sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA


You should try and get some output from dmesg and not from the messages 
log, as the log daemon seems to have a nasty habit of discarding 
critical output from these errors. In this case the failing command is 
missing and the message ordering even seems off.





The hardware appears to be correctly identified by the BIOS during the
power up sequence. 



Not much is seen in the dmesg log except for:


[ 43.649673] scsi0 : sata_nv
[ 43.649722] scsi1 : sata_nv
[ 43.649776] ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xcc00 irq 19
[ 43.649778] ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xcc08 irq 19


There should be more than this at the very least.. As above, please try 
to get output from dmesg itself.





When I try to run a file system check on these devices I get:




e2fsck 1.40.2 (12-Jul-2007)
fsck.ext2: No such file or directory while trying to open /dev/sdb1

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate
superblock:
    e2fsck -b 8193 


I have a gut feeling that when the system appears to lock up what is
really going on is that the contents of the drive are being trashed. But
I have no proof of that.


I don't think that is the case; it looks more like the drives have not 
been detected at all. If this happens after a reboot when they were 
working before, that most likely points to some kind of hardware issue..





When I try to do a parted to see what the system thinks is on the drive
I get the error message:


Error: Error opening /dev/sdb: No medium found 



I am not having any problems with my EXT3 file systems located on
“standard” IDE / PATA drives.


My config file, which has not changed in months beyond taking the
defaults during make oldconfig looks like:


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

2007-10-25 Thread Robert Hancock

Greg KH wrote:

On Thu, Oct 25, 2007 at 04:22:35PM -0700, Jesse Barnes wrote:
I think Greg doesn't like it, even though we don't have an alternative 
at this point...


Yes, I didn't like it, Ivan didn't like it, and I got reports that it
wasn't even needed at all once you upgraded your BIOS to the latest
version.

So, is this still needed?  And if so, can you try to implement what Ivan
suggested to do here instead?


Aren't you guys referring to 
pci-disable-decode-of-io-memory-during-bar-sizing.patch? This is another 
one entirely, though related.



Re: Is gcc thread-unsafe?

2007-10-25 Thread Robert Hancock

Arjan van de Ven wrote:

On Wed, 24 Oct 2007 21:29:56 -0700
"David Schwartz" <[EMAIL PROTECTED]> wrote:


Well that's exactly right. For threaded programs (and maybe even
real-world non-threaded ones in general), you don't want to be
even _reading_ global variables if you don't need to. Cache misses
and cacheline bouncing could easily cause performance to completely
tank in some cases while only gaining a cycle or two in
microbenchmarks for doing these funny x86 predication things.

For some CPUs, replacing a conditional branch with a conditional
move is a *huge* win because it cannot be mispredicted.


please name one...
Hint: It's not one made by either Intel or AMD in the last 4 years...


It is a win if the branch cannot be effectively predicted, i.e. if the 
outcome is essentially random, as may occur with data-dependent 
conditionals. I've seen a doubling of performance on one workload using 
a predicated instruction instead of a branch on newer Xeons in such a case.


I suspect that if branch prediction fails often, the data dependency 
created by the cmov, etc. is less expensive than the pipeline flush 
required by mispredicts..
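
To make the comparison concrete, here is a minimal sketch of the two shapes of code being discussed. The function names and the threshold-sum workload are made up for illustration and are not from the workload mentioned above. Whether the compiler actually emits a branch for the first version or a cmov/mask sequence for the second depends on the compiler, flags and target, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: typically compiled to a conditional jump. Cheap when the
 * outcome is predictable, expensive when v[] is effectively random. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            sum += v[i];
    }
    return sum;
}

/* Branch-free form: the comparison result (0 or 1) is negated into an
 * all-zeros or all-ones mask, so there is no branch to mispredict. The
 * price is a data dependency on the comparison in every iteration. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] >= threshold);

        sum += (int64_t)v[i] & mask;
    }
    return sum;
}
```

Both functions return the same result for any input; only the timing differs, and the difference should only show up when the comparison outcome is hard to predict.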


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: - mmconfig-validate-against-acpi-motherboard-resources.patch removed from -mm tree

2007-10-25 Thread Robert Hancock
Where did this patch go? I didn't get notified that anyone dropped it, 
but I don't see it in current -git..


[EMAIL PROTECTED] wrote:

The patch titled
 MMCONFIG: validate against ACPI motherboard resources
has been removed from the -mm tree.  Its filename was
 mmconfig-validate-against-acpi-motherboard-resources.patch

This patch was dropped because it was merged into mainline or a subsystem tree

--
Subject: MMCONFIG: validate against ACPI motherboard resources
From: Robert Hancock <[EMAIL PROTECTED]>

This patch adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources.  If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table.  The PCI Express firmware
spec apparently tells BIOS developers that reservation in ACPI is required
and E820 reservation is optional, so checking against ACPI first makes
sense.  Many BIOSes don't reserve the MMCONFIG region in E820 even though
it is perfectly functional; the existing check needlessly disables MMCONFIG
in these cases.

In order to do this, MMCONFIG setup has been split into two phases.  If PCI
configuration type 1 is not available then MMCONFIG is enabled early as
before.  Otherwise, it is enabled later after the ACPI interpreter is
enabled, since we need to be able to execute control methods in order to
check the ACPI reserved resources.  Presently this is just triggered off
the end of ACPI interpreter initialization.

There are a few other behavioral changes here:

- Validate all MMCONFIG configurations provided, not just the first one.

- Validate that the entire required length of each configuration, as
  implied by the provided ending bus number, is reserved, not just the
  minimum required allocation.

- Validate that the area is reserved even if we read it from the chipset
  directly and not from the MCFG table.  This catches the case where the
  BIOS didn't set the location properly in the chipset and has mapped it
  over other things it shouldn't have.
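
The length and containment checks described in the list above can be sketched as follows. This is an illustration only, not the code from the patch: the helper names are hypothetical, and it assumes the usual MMCONFIG layout of 4 KiB of config space per function, 8 functions per device and 32 devices per bus (1 MiB per bus), with end_bus >= start_bus.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32 devices x 8 functions x 4 KiB of config space = 1 MiB per bus. */
#define MMCFG_BYTES_PER_BUS (1ULL << 20)

/* Bytes of address space needed to cover buses start_bus..end_bus
 * (hypothetical helper; assumes end_bus >= start_bus). */
uint64_t mmcfg_required_len(uint8_t start_bus, uint8_t end_bus)
{
    return ((uint64_t)end_bus - start_bus + 1) * MMCFG_BYTES_PER_BUS;
}

/* True if [start, start + len) lies entirely inside the reserved
 * resource [res_base, res_base + res_len). Same shape as the
 * start/end comparison in check_mcfg_resource(). */
bool range_is_reserved(uint64_t start, uint64_t len,
                       uint64_t res_base, uint64_t res_len)
{
    uint64_t end = start + len - 1;    /* inclusive last byte */

    return start >= res_base && end < res_base + res_len;
}
```

For example, a table covering buses 0 through 255 needs 256 MiB reserved, so a reservation that only covers the first 64 MiB of that window would fail the check.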

This also cleans up the MMCONFIG initialization functions so that they
simply do nothing if MMCONFIG is not compiled in.

Based on an original patch by Rajesh Shah from Intel.

[EMAIL PROTECTED]: many fixes and cleanups]
Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>
Cc: Rajesh Shah <[EMAIL PROTECTED]>
Cc: Jesse Barnes <[EMAIL PROTECTED]>
Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Greg KH <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/i386/pci/init.c            |    4 
 arch/i386/pci/mmconfig-shared.c |  151 ++
 arch/i386/pci/pci.h             |    1 
 drivers/acpi/bus.c              |    2 
 include/linux/pci.h             |    8 +

 5 files changed, 144 insertions(+), 22 deletions(-)

diff -puN arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/init.c
--- a/arch/i386/pci/init.c~mmconfig-validate-against-acpi-motherboard-resources
+++ a/arch/i386/pci/init.c
@@ -11,9 +11,7 @@ static __init int pci_access_init(void)
 #ifdef CONFIG_PCI_DIRECT
 	type = pci_direct_probe();
 #endif
-#ifdef CONFIG_PCI_MMCONFIG
-	pci_mmcfg_init(type);
-#endif
+	pci_mmcfg_early_init(type);
 	if (raw_pci_ops)
 		return 0;
 #ifdef CONFIG_PCI_BIOS
diff -puN arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-motherboard-resources arch/i386/pci/mmconfig-shared.c
--- a/arch/i386/pci/mmconfig-shared.c~mmconfig-validate-against-acpi-motherboard-resources
+++ a/arch/i386/pci/mmconfig-shared.c
@@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
 	pci_mmcfg_resources_inserted = 1;
 }
 
-static void __init pci_mmcfg_reject_broken(int type)
+static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
+					      void *data)
+{
+	struct resource *mcfg_res = data;
+	struct acpi_resource_address64 address;
+	acpi_status status;
+
+	if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
+		struct acpi_resource_fixed_memory32 *fixmem32 =
+			&res->data.fixed_memory32;
+		if (!fixmem32)
+			return AE_OK;
+		if ((mcfg_res->start >= fixmem32->address) &&
+		    (mcfg_res->end < (fixmem32->address +
+				      fixmem32->address_length))) {
+			mcfg_res->flags = 1;
+			return AE_CTRL_TERMINATE;
+		}
+	}
+	if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
+	    (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
+		return AE_OK;
+
+	status = acpi_resource_to_address64(res, &address);
+	if (ACPI_FAILURE(status) ||
+	   (address.address_length <= 0) ||
+	   (address.resource_type != ACPI_MEMORY_RANGE))
+		return AE_OK;
+
+	if ((mcfg_res->start

Re: HIGHMEM64G Kernel (2.6.23.1) makes system crawl

2007-10-24 Thread Robert Hancock

Rajkumar S wrote:

On 10/24/07, Robert Hancock <[EMAIL PROTECTED]> wrote:

Rajkumar S wrote:

Hello,

I am using a Core 2 Duo E6750 CPU on an intel DG33FB mother board with
4GB Ram, running Debian Lenny.

Since the box has 4 GB ram I compiled a big mem kernel, but the
machine is very slow while running big mem kernel. It takes about 37
minutes to compile the intel e1000 driver  (e1000-7.6.5.tar.gz) from
intel site. But it's performing normally when using a non big mem
kernel. The diff of the .config between working and non working is as
follows.

Post your contents of /proc/mtrr. Likely a BIOS bug which has been seen
on a number of Intel boards, which doesn't mark all of RAM as cachable.


I have upgraded the bios to latest  (v. 0293 October 02, 2007)
Previously the /proc/mtrr was:

ravanan:~# cat /proc/mtrr
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size=   8MB: uncachable, count=1
reg04: base=0xcf600000 (3318MB), size=   2MB: uncachable, count=1
reg05: base=0xcf500000 (3317MB), size=   1MB: uncachable, count=1
reg06: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg07: base=0x120000000 (4608MB), size= 128MB: write-back, count=1

Now after upgrading the bios it's

reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size=   8MB: uncachable, count=1
reg04: base=0xcf400000 (3316MB), size=   4MB: uncachable, count=1
reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1


Yup, it's a BIOS bug. Your BIOS only marks ram up to physical address of 
4736MB as cacheable, while the actual RAM reported by the BIOS goes up 
to physical address 4800MB.


I think we had a patch in -mm to detect this case and disable the extra 
memory (64MB in this case) to keep the kernel from using it.
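
For anyone who wants to redo the arithmetic, here is a small sketch that works out the numbers from the post-upgrade table. The struct and function names are made up for illustration, the table is transcribed from the /proc/mtrr output above, and the 4800MB top-of-RAM figure is the one quoted in this reply. It only looks at the highest write-back address and ignores any holes lower down, so it is a rough check rather than what the kernel does.

```c
#include <stddef.h>
#include <stdint.h>

struct mtrr_range {
    uint64_t base_mb;
    uint64_t size_mb;
    int write_back;    /* 1 = write-back, 0 = uncachable */
};

/* Post-upgrade /proc/mtrr contents from the report, in MB. */
static const struct mtrr_range post_upgrade_mtrr[] = {
    {    0, 2048, 1 },
    { 2048, 1024, 1 },
    { 3072,  256, 1 },
    { 3320,    8, 0 },
    { 3316,    4, 0 },
    { 4096,  512, 1 },
    { 4608,  128, 1 },
};

/* Highest address (in MB) covered by any write-back range. */
uint64_t top_of_write_back_mb(const struct mtrr_range *r, size_t n)
{
    uint64_t top = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = r[i].base_mb + r[i].size_mb;

        if (r[i].write_back && end > top)
            top = end;
    }
    return top;
}

/* RAM above the last write-back range, i.e. memory left uncached. */
uint64_t uncached_tail_mb(const struct mtrr_range *r, size_t n,
                          uint64_t ram_top_mb)
{
    uint64_t top = top_of_write_back_mb(r, n);

    return ram_top_mb > top ? ram_top_mb - top : 0;
}
```

With the table above the last write-back range ends at 4096 + 512 + 128 = 4736MB, and 4800 - 4736 leaves the 64MB that ends up uncached.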


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: test_and_set_bit and friends ?

2007-10-24 Thread Robert Hancock

Mark Hounschell wrote:

Mark Hounschell wrote:

These calls apparently are gone. Can someone tell me why and what are
the replacements.

Thanks in advance
Mark



I got no response from the glibc people on this and the kernel-newbies
list appears dead so I thought I'd try here since these calls are/were
actually kernel based to begin with. Why are they no longer available in
user space and what is one supposed to use now?


In general, none of the kernel synchronization-type functions should 
have been used in userspace since they often depend on infrastructure 
which is only in the kernel in order to work properly. The new headers 
installation system strips out any code intended for kernel use only 
from the userspace-visible headers.


(Not to mention the licensing issues - the kernel is GPL, not LGPL, so 
only GPL programs could have legally done so in the first place.)
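
As for what to use instead, one option (an assumption on my part, not something the kernel headers provide) is to build the same operations on the compiler's own atomics. The sketch below uses C11 <stdatomic.h>; the user_ prefixed names are made up, and on toolchains without C11 support GCC's __sync_fetch_and_or() and __sync_fetch_and_and() builtins can play the same role.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap and return its previous value.
 * The caller must make sure the bitmap is large enough for nr. */
bool user_test_and_set_bit(unsigned int nr, atomic_ulong *bitmap)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&bitmap[nr / BITS_PER_WORD], mask);

    return (old & mask) != 0;
}

/* Atomically clear bit nr in the bitmap and return its previous value. */
bool user_test_and_clear_bit(unsigned int nr, atomic_ulong *bitmap)
{
    unsigned long mask = 1UL << (nr % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&bitmap[nr / BITS_PER_WORD], ~mask);

    return (old & mask) != 0;
}
```

These use the default sequentially consistent ordering, which is the conservative choice. They only cover atomicity between threads or processes sharing the memory; anything that relied on kernel-side infrastructure still has no userspace equivalent.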


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: HIGHMEM64G Kernel (2.6.23.1) makes system crawl

2007-10-24 Thread Robert Hancock

Rajkumar S wrote:

Hello,

I am using a Core 2 Duo E6750 CPU on an intel DG33FB mother board with
4GB Ram, running Debian Lenny.

Since the box has 4 GB ram I compiled a big mem kernel, but the
machine is very slow while running big mem kernel. It takes about 37
minutes to compile the intel e1000 driver  (e1000-7.6.5.tar.gz) from
intel site. But it's performing normally when using a non big mem
kernel. The diff of the .config between working and non working is as
follows.


Post your contents of /proc/mtrr. Likely a BIOS bug which has been seen 
on a number of Intel boards, which doesn't mark all of RAM as cachable. 
When the top memory starts being used with the bigmem kernel it causes a 
major slowdown. Check for a BIOS update from Intel, first.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


