Re: Scary Intel SATA problem: "frozen"

2006-12-06 Thread Jonas Lundgren
ways
  -   18117
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       228
194 Temperature_Celsius     0x0022   117   108   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0


The "Reallocated_Sector_Ct" and "Reallocated_Event_Count" values worry me.
Should I be worried?
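
As a rough way to triage output like this, here is a minimal Python sketch that parses rows in the column order smartctl prints (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw) and lists the attributes that look suspicious. The sample rows are copied from the output above; the watch list and the "nonzero raw count" rule are assumptions made for illustration, not an official health criterion, and a flagged attribute is a hint to look closer rather than a verdict on the drive.

```python
# Rows copied from the SMART output above, one attribute per line.
# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
SAMPLE = """\
 10 Spin_Retry_Count        0x0013 100 100 051 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age  Always - 639
197 Current_Pending_Sector  0x0012 200 200 000 Old_age  Always - 0
198 Offline_Uncorrectable   0x0012 200 200 000 Old_age  Always - 0
"""

# Hypothetical watch list: attributes where any nonzero raw count is
# worth a second look. Which attributes belong here is a judgment call.
WATCH = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_rows(text):
    """Parse attribute rows, skipping anything that is not a full row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[9].isdigit():
            continue  # raw value is not a plain integer; ignore in this sketch
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),
        })
    return rows


def suspicious(rows):
    """Names of attributes at/below a nonzero threshold, or watched with raw > 0."""
    flagged = []
    for row in rows:
        below_threshold = row["thresh"] > 0 and row["value"] <= row["thresh"]
        watched_nonzero = row["name"] in WATCH and row["raw"] > 0
        if below_threshold or watched_nonzero:
            flagged.append(row["name"])
    return flagged


print(suspicious(parse_rows(SAMPLE)))
```

With the rows shown, only Reallocated_Event_Count is flagged: its threshold is 000, so the normalized value of 001 does not trip a threshold failure, but the raw count of 639 is far from zero.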

-- 
-Jonas

Name:   Jonas Lundgren
ICQ#:   52064961
Mail:   [EMAIL PROTECTED]
IRC:neon / neonman @ EFnet, Undernet, Quakenet, freenode
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Scary Intel SATA problem: "frozen"

2006-11-28 Thread Jonas Lundgren


Tejun Heo wrote:
> Jonas Lundgren wrote:
> [--snip--]
>> Also, it doesn't matter whether I enable AHCI in the BIOS. (With AHCI
>> enabled, the disks spin down/power down at boot, only to power up
>> again a few seconds later, and the boot process freezes until the
>> disks have spun up again. This happens when the kernel probes the SATA
>> controller ports at bootup: the disks all spin down at the same time,
>> but spin up one by one as they are probed.)
> 
> Likely fix is pending for this problem.
> 
>> I've tried changing the I/O scheduler; the only noticeable difference
>> is with "noop", where I get about 20mb/sec write instead of 4mb/sec.
>> I have no idea why this is :P
>>
>> Example of what I mean with crappy performance:
>> dd if=/dev/zero of=test232 bs=1M count=100; time sync
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 0.130424 s, 804 MB/s
>> real 0m21.104s
>> user 0m0.000s
>> sys 0m0.011s
>>
>> 21 seconds to do a seq write of 100mb.. And during this time ALL other
>> disk IO gets starved, I can't do anything that uses disk IO for the
>> duration.. (not even `ls`)
> 
> What does the kernel say during this writing?  Can you post the result
> of the following?
> 
> 1. reboot
> 2. dmesg -c
> 3. time dd if=/dev/zero.. blah
> 4. dmesg
> 
> Also, does 'mount -o remount,barrier=0 /' change anything?

I will post this info as soon as I can "reproduce" the error.
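
For reference, the effective write rate of the dd run quoted above can be worked out from the numbers already shown. dd itself reports 804 MB/s because the data only reaches the page cache; the real cost shows up in the 21-second sync. A small Python sketch of the arithmetic follows (the function name is made up for illustration, and adding the two elapsed times is a simplification):

```python
def effective_rate(nbytes, dd_seconds, sync_seconds):
    """Bytes per second once the time spent flushing to disk is included."""
    return nbytes / (dd_seconds + sync_seconds)


# Figures from the quoted run: 104857600 bytes, 0.130424 s for dd,
# and 21.104 s wall-clock ("real") for the sync that followed.
rate = effective_rate(104857600, 0.130424, 21.104)
print(f"{rate / 1e6:.2f} MB/s")
```

That comes out just under 5 MB/s, in line with the roughly 4mb/sec write speed mentioned above.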

> 
>> Yet, a hdparm shows a decent read
>> hdparm -tT /dev/md4
>> /dev/md4:
>> Timing cached reads: 8060 MB in 1.99 seconds = 4042.19 MB/sec
>> Timing buffered disk reads: 400 MB in 3.00 seconds = 133.28 MB/sec
>>
>> dd if=1GBzeroFile of=/dev/null bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 11.4335 s, 91.7 MB/s
>>
>> This is the cpu usage stats I get from top when running the dd write:
>> Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 99.0%wa, 0.5%hi, 0.5%si, 0.0%st
>> Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>
>> Pretty crappy read speeds compared to what I got on my previous mobo
>> (around 140mb/sec), but still a lot better than the 4mb/sec I get when
>> writing..
> 
> Which controller did you use on your previous mobo?  If you're using
> ata_piix and hook two hard drives as primary and secondary on the same
> channel, some level of performance degradation is expected.  ata_piix
> can issue a command to only one of the two drives at a time.  Is the
> read performance still bad in ahci mode?

At the moment I run the ICH8 SATA ports in AHCI mode with "IDE bus
master" turned off in the BIOS. (To be honest, I don't really know what
this option does; there is no info about it in the BIOS or the mobo
manual.) The drives are connected to ports 1, 3, 6 and 8 (raptor +
raptor on 1+3, and WD 250G + WD 250G, also a raid0, on ports 6+8).

> 
> [--snip--]
>> Dmesg output from the error(s): (sda and sdb are 2 * 74GB raptor SATA
>> drives in a Linux software raid0)
>>
>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> ata1.00: (BMDMA stat 0x20)
>> ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
> 
> This might be a missed interrupt.  It's a write.  The DMA engine has
> finished transferring all the data and the device is ready for the
> next command, but the interrupt never arrived.
> 
>> ata1: port is slow to respond, please be patient
>> ata1: port failed to respond (30 secs)
>> ata1: soft resetting port
>> ATA: abnormal status 0xD0 on port 0xFA07
>> ATA: abnormal status 0xD0 on port 0xFA07
>> ATA: abnormal status 0xD0 on port 0xFA07
>> ATA: abnormal status 0xD0 on port 0xFA07
>> ATA: abnormal status 0xD0 on port 0xFA07
>> ATA: abnormal status 0xD0 on port 0xFA07
>> ata1.00: qc timeout (cmd 0xec)
>> ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> ata1.00: revalidation failed (errno=-5)
>> ata1: failed to recover some devices, retrying in 5 secs
> 
> But this is weird.  If it were a missed interrupt, softreset should have
> recovered it instantly.  Something fishy is going on.
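
The hex fields in the quoted log lines can be decoded mechanically, which makes readings like "it's a write" and "device is ready" easier to follow. Below is a minimal Python sketch. The bit names follow the classic ATA status register layout and the AC_ERR_* names in libata as best I recall them, so check them against include/linux/libata.h and include/linux/ata.h for the kernel in use; the command table covers only the three opcodes that appear in this thread, and the function names are made up for illustration.

```python
import re

# ATA status register bits, highest bit first (classic ATA layout).
# Note: while BSY is set, the other bits are not guaranteed to be valid.
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# libata error-mask bits (AC_ERR_* with the prefix dropped); low byte only.
AC_ERR_BITS = [
    (0x01, "DEV"), (0x02, "HSM"), (0x04, "TIMEOUT"), (0x08, "MEDIA"),
    (0x10, "ATA_BUS"), (0x20, "HOST_BUS"), (0x40, "SYSTEM"), (0x80, "INVALID"),
]

# Only the opcodes seen in this thread.
ATA_COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA", 0xEC: "IDENTIFY DEVICE"}

LINE_RE = re.compile(
    r"cmd 0x([0-9a-fA-F]+) Emask 0x([0-9a-fA-F]+) stat 0x([0-9a-fA-F]+)"
)


def flags(value, table):
    """Names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table if value & bit]


def decode(line):
    """Decode the cmd/Emask/stat fields of a libata error line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    cmd, emask, stat = (int(group, 16) for group in match.groups())
    return {
        "cmd": ATA_COMMANDS.get(cmd, hex(cmd)),
        "emask": flags(emask, AC_ERR_BITS),
        "stat": flags(stat, ATA_STATUS_BITS),
    }


print(decode("ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)"))
print(flags(0xD0, ATA_STATUS_BITS))  # the "abnormal status 0xD0" lines
```

Read this way, the first error is a WRITE DMA that timed out while the drive reported only DRDY, which fits the missed-interrupt reading; status 0xD0 during the reset has BSY set, meaning the device still claimed to be busy.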
> 
> [--snip--]
>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> ata1.00: (BMDMA stat 0x21)
>> ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
> 
> Same thing for read.
> 
>> ata1: port is slow to respond, please be patient
>> ata1: port failed to respond (30 secs)
> 
> Again, pre-reset wait times out.  Weird.
> 
>> ata1: soft resetting port
>> ata1.00:

Re: Scary Intel SATA problem: "frozen"

2006-11-28 Thread Jonas Lundgren
 0xD0 on port 0xFA07
ATA: abnormal status 0xD0 on port 0xFA07
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x2)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: soft resetting port
ata1.00: configured for UDMA/100
ata1: EH complete
SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back


Most of the time when I get these errors the system recovers after
anything from 10 seconds to 10 minutes of unresponsiveness (no disk
I/O), but sometimes it hangs. If the system does recover, I start
getting the extremely low disk write speeds that I reported above, and
only a reboot gets the performance back to normal.

I don't know what causes it, but most of the times I've seen it my
system has been under heavy load (compiling, downloading torrents at
11mb/sec, etc). Please let me know if you want any additional info or
want me to try something out. My recent hardware upgrade for around
$1200 (to a core2duo system, i965 mobo) is going to waste because of
this problem. :/

I was so glad when I saw this problem posted on linux-ide; I've been
searching like crazy for another person with the same problem (and
possibly a solution) for the past 2-3 weeks or so.

-- 
-Jonas

Name:   Jonas Lundgren
ICQ#:   52064961
IRC:neon / neonman @ EFnet, Undernet, Quakenet, freenode