Re: sata_nv + ADMA + Samsung disk problem

2008-01-11 Thread Robert Hancock
Gabor Gombas wrote: On Mon, Jan 07, 2008 at 06:10:29PM -0600, Robert Hancock wrote: Gabor, I just noticed you said that it worked OK in 2.6.20, yet 2.6.22 fails. 2.6.20 had ADMA support as well, so I wonder what change started causing the problem. Would it be possible for you to do a git

Re: sata_nv + ADMA + Samsung disk problem

2008-01-11 Thread Gabor Gombas
On Mon, Jan 07, 2008 at 06:10:29PM -0600, Robert Hancock wrote: Gabor, I just noticed you said that it worked OK in 2.6.20, yet 2.6.22 fails. 2.6.20 had ADMA support as well, so I wonder what change started causing the problem. Would it be possible for you to do a git bisect (or at

Re: sata_nv + ADMA + Samsung disk problem

2008-01-07 Thread Robert Hancock
Allen Martin wrote: Dunno about the NVidia version. Theirs works rather differently - the GO bit is there, but there's another append register which is used to tell the controller that a new tag has been added to the CPB list. The only thing we currently use the GO bit for is to switch

Re: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Robert Hancock
Allen Martin wrote: Dunno about the NVidia version. Theirs works rather differently - the GO bit is there, but there's another append register which is used to tell the controller that a new tag has been added to the CPB list. The only thing we currently use the GO bit for is to switch

Re: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Robert Hancock
Benjamin Herrenschmidt wrote: Another thing about the PacDigi core: one has to be very careful to avoid sequential accesses to sequential PCI locations when programming the chip -- it cannot handle merged register writes. So for any group of sequentially laid out registers, the code has to

Re: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Mark Lord
Mark Lord wrote: Robert Hancock wrote: Mark Lord wrote: Robert Hancock wrote: .. From some of the traces I took previously (posted on LKML as "sata_nv ADMA controller lockup investigation" way back in Feb 07), what seems to occur is that when the second command is issued very rapidly

Re: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Mark Lord
Robert Hancock wrote: Mark Lord wrote: Robert Hancock wrote: .. From some of the traces I took previously (posted on LKML as "sata_nv ADMA controller lockup investigation" way back in Feb 07), what seems to occur is that when the second command is issued very rapidly (within less than 20

Re: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Benjamin Herrenschmidt
Another thing about the PacDigi core: one has to be very careful to avoid sequential accesses to sequential PCI locations when programming the chip -- it cannot handle merged register writes. So for any group of sequentially laid out registers, the code has to ensure it never writes two

RE: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Allen Martin
Dunno about the NVidia version. Theirs works rather differently - the GO bit is there, but there's another append register which is used to tell the controller that a new tag has been added to the CPB list. The only thing we currently use the GO bit for is to switch between ADMA

Re: sata_nv + ADMA + Samsung disk problem

2008-01-03 Thread Benjamin Herrenschmidt
On Thu, 2008-01-03 at 19:43 -0600, Robert Hancock wrote: Benjamin Herrenschmidt wrote: Another thing about the PacDigi core: one has to be very careful to avoid sequential accesses to sequential PCI locations when programming the chip -- it cannot handle merged register writes. So for

Re: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Robert Hancock
Mark Lord wrote: Robert Hancock wrote: .. From some of the traces I took previously (posted on LKML as "sata_nv ADMA controller lockup investigation" way back in Feb 07), what seems to occur is that when the second command is issued very rapidly (within less than 20 microseconds, or

Re: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Mark Lord
Robert Hancock wrote: .. From some of the traces I took previously (posted on LKML as "sata_nv ADMA controller lockup investigation" way back in Feb 07), what seems to occur is that when the second command is issued very rapidly (within less than 20 microseconds, or potentially longer) after

Re: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Mark Lord
Robert Hancock wrote: What we're doing to enter legacy mode is essentially:
- wait until ADMA status indicates IDLE bit set (max wait of 1 microsecond)
- clear GO bit in control register
- wait until status indicates LEGACY bit set (max wait of 1 microsecond)
and to enter ADMA mode:
- set GO bit
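The legacy-mode entry sequence quoted above can be sketched in C against simulated registers. This is only an illustration, not the sata_nv code: the bit positions, `struct fake_port`, `fake_hw_step()` and the use of a poll count in place of the 1 microsecond limit are all assumptions made up for the example. The sketch also assumes the driver goes on to clear GO even if the IDLE wait times out, which the quoted text does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only; they are not
 * taken from sata_nv or from NVIDIA documentation. */
#define CTL_GO      (1u << 0)
#define STAT_IDLE   (1u << 0)
#define STAT_LEGACY (1u << 1)

/* Simulated port registers standing in for real MMIO. */
struct fake_port {
    uint32_t ctl;
    uint32_t status;
};

/* Toy hardware model: LEGACY is reported whenever GO is clear. */
static void fake_hw_step(struct fake_port *p)
{
    if (!(p->ctl & CTL_GO))
        p->status |= STAT_LEGACY;
    else
        p->status &= ~STAT_LEGACY;
}

/* Poll until a status bit is set, giving up after max_polls reads
 * (a stand-in for the "max wait of 1 microsecond" bound). */
static bool wait_for_bit(struct fake_port *p, uint32_t mask, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        fake_hw_step(p);
        if (p->status & mask)
            return true;
    }
    return false;
}

/* Returns true only if both bounded waits saw their bit in time. */
static bool enter_legacy_mode(struct fake_port *p, int max_polls)
{
    bool idle = wait_for_bit(p, STAT_IDLE, max_polls);     /* wait for IDLE   */
    p->ctl &= ~CTL_GO;                                     /* clear GO        */
    bool legacy = wait_for_bit(p, STAT_LEGACY, max_polls); /* wait for LEGACY */
    return idle && legacy;
}
```

Calling `enter_legacy_mode()` on a port whose status never shows IDLE returns false even though GO ends up cleared, which is the kind of timed-out transition the thread is concerned about.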

Re: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Robert Hancock
Tejun Heo wrote: Robert Hancock wrote: Jeff Garzik wrote: Tejun Heo wrote: Thanks a lot for the detailed explanation. Nvidia ppl, any ideas? FLUSH is used regularly. We really need to fix this. I reiterate my opinion :) ... We should remove ADMA support from sata_nv. It's only in a

Re: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Robert Hancock
Allen Martin wrote: The software definitely provides that guarantee for all NCQ-capable controllers. Well if that's not it, it must be some problem entering ADMA legacy mode. Here's what the Windows driver does: ADMACtrl.aGO = 0 ADMACtrl.aEIEN = 0 poll { until ADMAStatus.aLGCY = 1 ||

Re: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Jeff Garzik
Allen Martin wrote: The question I had for NVIDIA regarding this that I never got answered was, is there any reason why we would need a delay when switching between NCQ and non-NCQ commands on ADMA, and if not, is there any known cause that could cause the controller to get into this

RE: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Allen Martin
The question I had for NVIDIA regarding this that I never got answered was, is there any reason why we would need a delay when switching between NCQ and non-NCQ commands on ADMA, and if not, is there any known cause that could cause the controller to get into this seemingly locked-up

RE: sata_nv + ADMA + Samsung disk problem

2008-01-02 Thread Allen Martin
> The software definitely provides that guarantee for all NCQ-capable controllers.
Well if that's not it, it must be some problem entering ADMA legacy mode. Here's what the Windows driver does:
ADMACtrl.aGO = 0
ADMACtrl.aEIEN = 0
poll { until ADMAStatus.aLGCY = 1 || timeout }
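For comparison, the Windows driver pseudocode above can be written out as a small C sketch. The field names `aGO`, `aEIEN` and `aLGCY` come from the message; everything else (the struct layout, `struct adma_port`, the simulated `read_lgcy()` helper, and a poll budget standing in for the unspecified timeout) is hypothetical and exists only to make the example self-contained.

```c
#include <stdbool.h>

/* Field names mirror the pseudocode; the layout is hypothetical. */
struct adma_ctrl   { bool aGO; bool aEIEN; };
struct adma_status { bool aLGCY; };

struct adma_port {
    struct adma_ctrl ctrl;
    struct adma_status status;
    int polls_until_legacy; /* simulation knob: reads needed after GO clears */
};

/* Simulated status read: the fake hardware raises aLGCY a few reads
 * after aGO has been cleared. */
static bool read_lgcy(struct adma_port *p)
{
    if (!p->ctrl.aGO) {
        if (p->polls_until_legacy > 0)
            p->polls_until_legacy--;
        else
            p->status.aLGCY = true;
    }
    return p->status.aLGCY;
}

/* Clear GO and EIEN, then poll for aLGCY.
 * Returns true if aLGCY was seen, false if the poll budget ran out. */
static bool adma_to_legacy(struct adma_port *p, int poll_budget)
{
    p->ctrl.aGO = false;
    p->ctrl.aEIEN = false;
    while (poll_budget-- > 0) {
        if (read_lgcy(p))
            return true;
    }
    return false;
}
```

Unlike the Linux sequence discussed elsewhere in the thread, this version does not wait for an IDLE indication before clearing GO, and it also clears EIEN; the caller has to decide what to do when the function reports a timeout.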

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Robert Hancock
Jeff Garzik wrote: Tejun Heo wrote: Thanks a lot for the detailed explanation. Nvidia ppl, any ideas? FLUSH is used regularly. We really need to fix this. I reiterate my opinion :) ... We should remove ADMA support from sata_nv. It's only in a few chips, it's not appearing in any new

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Jeff Garzik
Tejun Heo wrote: Thanks a lot for the detailed explanation. Nvidia ppl, any ideas? FLUSH is used regularly. We really need to fix this. I reiterate my opinion :) ... We should remove ADMA support from sata_nv. It's only in a few chips, it's not appearing in any new chips, and nasty

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Robert Hancock
Robert Hancock wrote: Tejun Heo wrote: [cc'ing Robert Hancock and NVidia people] Whole thread can be read from the following URL. http://thread.gmane.org/gmane.linux.ide/21710 In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out. I first suspected faulty disk (reallocation

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Robert Hancock
Tejun Heo wrote: [cc'ing Robert Hancock and NVidia people] Whole thread can be read from the following URL. http://thread.gmane.org/gmane.linux.ide/21710 In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out. I first suspected faulty disk (reallocation failure on flush) but

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Tejun Heo
[cc'ing Robert Hancock and NVidia people] Whole thread can be read from the following URL. http://thread.gmane.org/gmane.linux.ide/21710 In a nutshell, with ADMA enabled, FLUSH_EXT occasionally times out. I first suspected faulty disk (reallocation failure on flush) but SMART reports nothing

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Gabor Gombas
Hi, Just FYI I've tried to enable ADMA again (now running 2.6.24-rc6) but the bug is still present: Jan 1 16:11:21 host kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 next cpb count 0x0 next cpb idx 0x0 Jan 1 16:11:21 host kernel: ata7: CPB 0:

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Tejun Heo
Robert Hancock wrote: This is kind of a longstanding problem which has been partially worked around, but it seems not entirely. This is what I had diagnosed some time ago: recently, some issues cropped up with command timeouts when a cache flush command was immediately followed by an NCQ

Re: sata_nv + ADMA + Samsung disk problem

2008-01-01 Thread Tejun Heo
Robert Hancock wrote: Jeff Garzik wrote: Tejun Heo wrote: Thanks a lot for the detailed explanation. Nvidia ppl, any ideas? FLUSH is used regularly. We really need to fix this. I reiterate my opinion :) ... We should remove ADMA support from sata_nv. It's only in a few chips, it's

Re: sata_nv + ADMA + Samsung disk problem

2007-08-16 Thread Gabor Gombas
Hi, On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote: Hmmm... That's timeout on cache flush, indicative of failing disk. Please post the result of 'smartctl -a /dev/sdc'. Ok, so something is fishy in 2.6.22 wrt. SMART. First, booting back to 2.6.20.5 I confirmed that SMART works

Re: sata_nv + ADMA + Samsung disk problem

2007-08-16 Thread Jim Paris
Gabor Gombas wrote: On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote: Hmmm... That's timeout on cache flush, indicative of failing disk. Please post the result of 'smartctl -a /dev/sdc'. Ok, so something is fishy in 2.6.22 wrt. SMART. See http://lkml.org/lkml/2007/7/8/198 -jim -

Re: sata_nv + ADMA + Samsung disk problem

2007-08-14 Thread Tejun Heo
Gabor Gombas wrote: Hi, Since I have upgraded to 2.6.22.1 from 2.6.20 I have problems with Samsung disks. Sometimes the disks stall for about half a minute and then I have these messages in the logs: Aug 6 20:10:11 twister kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0

Re: sata_nv + ADMA + Samsung disk problem

2007-08-14 Thread Gabor Gombas
On Tue, Aug 14, 2007 at 06:30:28PM +0900, Tejun Heo wrote: Hmmm... That's timeout on cache flush, indicative of failing disk. Please post the result of 'smartctl -a /dev/sdc'. Will do when I get home. Note however that this only occurs in ADMA mode. It never occurred with 2.6.20 and it never