Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Dieter openbsd at sopwith.solgatos.com writes:

> Sigh. I could easily go on a major rant here, but it wouldn't do us any good. Anyone have information or ideas that could get us closer to a solution?

The event log counter can be written every once in a while, for example if S.M.A.R.T. automatic off-line data collection (e.g. every 4h) is enabled (it is by default and may include a list of last errors), temperatures, SER and others.

It appears to be theoretically possible to verify whether a particular drive is affected by the problem using S.M.A.R.T. information (i.e. attributes, logs, etc.), and that may also be usable as a workaround if there is a way to change the event counter to a safe value (i.e. not 320 or a multiple of 320 + x*256), but we need more details (e.g. the specific data pattern), which were released by Seagate under NDA to some partners/vendors.

It might be possible to find out by comparing the S.M.A.R.T. information (including some of the vendor logs like 0xa1) from affected and non-affected drives matching the basic requirements (7200.11/ES2.1/DiamondMax 22, both new and old firmware).
Log Directory Supported (this one is from an affected model)

SMART Log Directory Logging Version 1 [multi-sector log support]
Log at address 0x00 has 001 sectors [Log Directory]
Log at address 0x01 has 001 sectors [Summary SMART error log]
Log at address 0x02 has 005 sectors [Comprehensive SMART error log]
Log at address 0x03 has 005 sectors [Extended Comprehensive SMART error log]
Log at address 0x06 has 001 sectors [SMART self-test log]
Log at address 0x07 has 001 sectors [Extended self-test log]
Log at address 0x09 has 001 sectors [Selective self-test log]
Log at address 0x10 has 001 sectors [Reserved log]
Log at address 0x11 has 001 sectors [Reserved log]
Log at address 0x21 has 001 sectors [Write stream error log]
Log at address 0x22 has 001 sectors [Read stream error log]
Log at address 0x80 has 016 sectors [Host vendor specific log]
Log at address 0x81 has 016 sectors [Host vendor specific log]
Log at address 0x82 has 016 sectors [Host vendor specific log]
Log at address 0x83 has 016 sectors [Host vendor specific log]
Log at address 0x84 has 016 sectors [Host vendor specific log]
Log at address 0x85 has 016 sectors [Host vendor specific log]
Log at address 0x86 has 016 sectors [Host vendor specific log]
Log at address 0x87 has 016 sectors [Host vendor specific log]
Log at address 0x88 has 016 sectors [Host vendor specific log]
Log at address 0x89 has 016 sectors [Host vendor specific log]
Log at address 0x8a has 016 sectors [Host vendor specific log]
Log at address 0x8b has 016 sectors [Host vendor specific log]
Log at address 0x8c has 016 sectors [Host vendor specific log]
Log at address 0x8d has 016 sectors [Host vendor specific log]
Log at address 0x8e has 016 sectors [Host vendor specific log]
Log at address 0x8f has 016 sectors [Host vendor specific log]
Log at address 0x90 has 016 sectors [Host vendor specific log]
Log at address 0x91 has 016 sectors [Host vendor specific log]
Log at address 0x92 has 016 sectors [Host vendor specific log]
Log at address 0x93 has 016 sectors [Host vendor specific log]
Log at address 0x94 has 016 sectors [Host vendor specific log]
Log at address 0x95 has 016 sectors [Host vendor specific log]
Log at address 0x96 has 016 sectors [Host vendor specific log]
Log at address 0x97 has 016 sectors [Host vendor specific log]
Log at address 0x98 has 016 sectors [Host vendor specific log]
Log at address 0x99 has 016 sectors [Host vendor specific log]
Log at address 0x9a has 016 sectors [Host vendor specific log]
Log at address 0x9b has 016 sectors [Host vendor specific log]
Log at address 0x9c has 016 sectors [Host vendor specific log]
Log at address 0x9d has 016 sectors [Host vendor specific log]
Log at address 0x9e has 016 sectors [Host vendor specific log]
Log at address 0x9f has 016 sectors [Host vendor specific log]
Log at address 0xa1 has 020 sectors [Device vendor specific log]
Log at address 0xa8 has 020 sectors [Device vendor specific log]
Log at address 0xa9 has 001 sectors [Device vendor specific log]
Log at address 0xe0 has 001 sectors [Reserved log]
Log at address 0xe1 has 001 sectors [Reserved log]
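Comparing log directories from affected and non-affected drives, as suggested in this message, is easier once the listing is parsed into a table. The following is a minimal sketch in Python, assuming the text has the same "Log at address 0x.. has NNN sectors [..]" shape as the listing in this message. The function names are made up for illustration, the second listing in the example is invented, and nothing here talks to a drive.

```python
import re

# Matches entries such as
#   "Log at address 0xa1 has 020 sectors [Device vendor specific log]".
# \s+ between words lets entries that were wrapped across lines still match.
LOG_ENTRY = re.compile(
    r"Log\s+at\s+address\s+0x([0-9a-fA-F]{2})\s+has\s+(\d+)\s+sectors\s+\[([^\]]+)\]"
)


def parse_log_directory(text):
    """Return {log_address: (sector_count, description)} for a directory listing."""
    return {
        int(addr, 16): (int(sectors), desc.strip())
        for addr, sectors, desc in LOG_ENTRY.findall(text)
    }


def diff_log_directories(a, b):
    """Return the addresses whose presence or sector count differs between a and b."""
    return sorted(
        addr
        for addr in set(a) | set(b)
        if a.get(addr, (None,))[0] != b.get(addr, (None,))[0]
    )


if __name__ == "__main__":
    affected = parse_log_directory(
        "Log at address 0xa1 has 020 sectors [Device vendor specific log] "
        "Log at address 0xa9 has 001 sectors [Device vendor specific log]"
    )
    # Hypothetical second listing, invented only to show the comparison.
    other = parse_log_directory(
        "Log at address 0xa1 has 020 sectors [Device vendor specific log]"
    )
    print([hex(addr) for addr in diff_log_directories(affected, other)])
```

A difference in the directory alone would not prove anything about the bug; it would only point at which vendor logs are worth dumping and comparing byte by byte.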
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
please, this is way off topic. could you try and find a better list to chat about this on...
Re: Dealing with Seagate's problematic 7200.11 firmware.
Has anyone looked into disassembling the firmware?
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Hi, On Tue, 27.01.2009 at 21:37:28 +, Dieter open...@sopwith.solgatos.com wrote: Toni writes: positives and false negatives. After deciding that the results were far too unreliable, the page was pulled. That too. For one thing people were entering the serial numbers using lower case letters and getting false negatives. this is a joke, right? There is a reason I want to look into zeroing out the magic area as an alternative to risking updating the firmware. :-( Understood... I'm looking for a different vendor, too. :-| the power fails. So not a great workaround, but better than nothing, Right. As I understand it, updating the firmware on some mainboards IS risky. It may well be that some combinations don't work, but at some point, I'd say that this should fall into the category of you get what you pay for. IOW, I can't imagine that doing this kind of stuff right would cost more than, say, $1 for a drive, and $5 for a motherboard, and I think that everyone should be prepared to add, say, $50 to a small server to get these things, ie, (much) less broken designs, imho. But the bigger problem is that currently there appears to be no way to add $50, or even $500, to a server, to get these things right because there seems to be no vendor who offers such stuff. There is supposed to be some document that explains all this, with enough details to create a fix. If anyone finds this document I need a copy please. Me too! Sounds like you are on good terms with your dealer. Can your dealer get you a copy? LOL. I can ask him, but don't expect too much... Kind regards, --Toni++
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
positives and false negatives. After deciding that the results were far too unreliable, the page was pulled. That too. For one thing people were entering the serial numbers using lower case letters and getting false negatives. this is a joke, right? As far as I can tell it is not a joke. The people entering the serial numbers might have been wintel users and thus not too bright. Seagate's quality control dept is clearly missing in action lately. As I understand it, updating the firmware on some mainboards IS risky. It may well be that some combinations don't work, but at some point, I'd say that this should fall into the category of you get what you pay for. IOW, I can't imagine that doing this kind of stuff right would cost more than, say, $1 for a drive, and $5 for a motherboard, and I think that everyone should be prepared to add, say, $50 to a small server to get these things, ie, (much) less broken designs, imho. But the bigger problem is that currently there appears to be no way to add $50, or even $500, to a server, to get these things right because there seems to be no vendor who offers such stuff. The idiots in charge of most companies don't care about quality control. Sigh. I could easily go on a major rant here, but it wouldn't do us any good. Anyone have information or ideas that could get us closer to a solution?
Re: Dealing with Seagate's problematic 7200.11 firmware.
Hi,

On Mon, 26.01.2009 at 15:39:36 +0100, Raimo Niskanen raimo+open...@erix.ericsson.se wrote:
> How can I know if I have a suspicious drive?

you won't, imho, until Seagate delivers usable data on this issue. Their statements so far have been a long way from trust-inspiring, imho. My best bet currently is to wait for a definite statement from my dealer, who also carries the burden of providing warranty to me (so I hope he'll think twice before saying something he doesn't at least believe). In the meantime, I've opted not to power down or reboot any machine until I have definite answers, which turns out to be quite a nuisance!

--
Kind regards,
--Toni++
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Hi, On Mon, 26.01.2009 at 17:08:51 +, Dieter open...@sopwith.solgatos.com wrote: It is easy to set up a slashdot account. Or you can post as anonymous coward. yes, but I don't want to set up a /. account right now, and posting as AC wouldn't likely solve the problem. that he has another slashdot account that isn't anonymous. Problem I have is I can't find a way to send him a PM (private message). Most web This is exactly the point. forums have a facility for sending other users a PM. We can post a reply to the thread, but he would have to read the thread again to see it. Any slashdot wizards out there have an idea? Post to the thread and offer one's own email address (maybe time-limited or so), and hope for the best... not exactly a silver bullet, but maybe better than nothing. It isn't even just FLOSS. Any non-x86 machine is out of luck. Proprietary Unix is out of luck. Anything embedded is out of luck. Even Mac is probably out of luck. And if the reboot to run the firmware installer bricks the drive(s) even wintel is out of luck. Yes, and smartmontools claims to run on all platforms you mentioned (except MAC OS 9). Ie, they even run on Windows and/or together with Cygwin. Therefore, I think that this is a strategic point from where the problem could be solved for a really broad range of systems, and in one go. I don't understand the common corporate policy of keeping everything secret. All they are doing is hurting their previously loyal customers. It didn't used to be this way. Oh... over here, we have a saying: Sea gate, oder sie geht net. (meaning: it works, or it doesn't - it's a pun on the pronounciation of Seagate). Yes, many people, me included, thought they had reformed... Supposedly there was a broken test machine that didn't zero out some special area after writing a test pattern. So only drives that were tested on that machine are at risk. 
I'd like to not speculate about the cause of the problem any longer, but instead devise a plan to acquire the required knowledge to beef up smartmontools to solve the problem. I could only believe such claims about the causes, but presently, Seagate destroyed about as much trust as they possibly could, at least with me. So, except for the hard-core technical data, they're out of the loop as far as I'm concerned. If we can find out what area this is (I assume it isn't in the normal space used for user storage) and how to zero it (if not already zero) there is no need to update the firmware. I'd rather say that the (ring) buffer has some external counter, also stored somewhere, which needs to be adjusted. I'd not bet that simply zeroing the area(s) will do. Good question. Seagate has some web page that supposedly will tell you, but of course it is broken and doesn't work with all browsers. At some time, they had a page where you could enter your model and serial number, but reportedly this page delivered a lot of false positives and false negatives. After deciding that the results were far too unreliable, the page was pulled. Toni reports that ES and ES.2 may be affected. This I took from a Seagate web page. Stuart Henderson has posted the link, and I had the same link in my email which I received from Seagate, so, I'd say, the link is genuine (despite the contents of the page being almost worthless, imho). From what I've read it sounds like the counter must be exactly 320 AND some location must have a test pattern rather than zero when you init (power up or reboot) the drive. From Maxtorman's description, the log is circular, so it will eventually wrap around to 320 again. My dealer, who claimed that he also had information directly from Seagate, told me that the buffer was 256 entries long (makes a lot of sense, imho), but nevermind. We need hard facts, preferably in the form of photocopies of internal design papers or so, not speculations. 
So keeping the counter away from 320 is an okay short term workaround, This would require to periodically check the log position and eg. reset it to zero at shutdown, to be on the safe side. but long term we want to either zero out the magic location or update the firmware. We want to have updated firmware and the ability to update firmware for all drives, also from other manufacturers. Updating firmware for a drive shouldn't be any more complicated or risky than updating the BIOS on the motherboard. There is supposed to be some document that explains all this, with enough details to create a fix. If anyone finds this document I need a copy please. Me too! If you have one or more of the suspect drives, if it running, try to keep it running and don't reboot. If it is powered down leave it powered down if possible until this all gets sorted out. Yes... but that still doesn't help you in the face of a system's crash. What to do then? No need to answer this one... -- Kind regards, --Toni++
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Hi, On Mon, 26.01.2009 at 17:08:51 +, Dieter open...@sopwith.solgatos.com wrote: Your suggestion of smartmontools is helpful, thank you. thanks - I have just sent an email to them, esp. after seeing that there are people from big name companies involved, who could procure at least some of the required documentation inhouse. -- Kind regards, --Toni++
Re: Dealing with Seagate's problematic 7200.11 firmware.
Dieter openbsd at sopwith.solgatos.com writes:

> Recovering from Seagate's problematic 7200.11 firmware.
>
> Most of you have read about the problems with Seagate's 7200.11 disks. For those of you that haven't, the firmware on many of these drives is buggy, and can brick the drive when powering up or rebooting the system. Thus far, Seagate's response has been less than wonderful. We need a FLOSS solution.
>
> Goals:
>
> 1) Ability to read the number of log entries.
> 2) Ability to change the number of log entries.

As far as I know, the drive's internal event counter can only be accessed or changed at firmware level (i.e. serial/pc-3000). Maybe disabling the S.M.A.R.T. automatic off-line data collection (and/or the attribute autosave) with smartctl could somehow prevent the internal event log from reaching the magic value (320 or 320+x*256), because it does save data to the reserved drive area (in case of errors it even includes POH).

POH = Power On Hours

> 3) Ability to install new firmware from Unix.

Drive firmware flashing at the (S)ATA interface level could be done on UNIX, but doing so from a mounted file system (to avoid a reboot) and/or without a controller reset might have catastrophic results (I would risk saying it's even more critical than updating the system BIOS, because there are more variables, i.e. different controllers, RAID, etc.).

> We need for this to work with any flavor of Unix, on any CPU arch, without reboot or power cycle. We need for this to work on one drive without affecting other drives. I don't expect to be able to write FLOSS firmware for the drives, so this isn't listed as a goal. If you think you can, please feel free.

I also think the firmware should be open-source with a portable (any arch) update tool. This would allow many improvements and a much more reliable bug tracking/testing process (i.e. there are many firmware bugs, like the NCQ stuttering issue with some versions, self-test log holes, etc.). Writing FLOSS firmware would require some degree of cooperation from Seagate.
> The problem:
>
> IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report its information to the BIOS.
> Maxtorman, slashdot discussion [2]
>
> If Maxtorman is correct, then once the drive has been operating awhile, we have a 1 in 320 chance that the circular log is at entry 320. We want to be able to find out how many log entries the disk currently has, and we want to be able to change the number of log entries away from 320, while we wait for Seagate to get its act together and release firmware that works properly. Since Seagate's solution will require attaching the drive to an x86 system and booting a FreeDOS ISO from CD, if the log is at 320 that boot will brick the drive.
>
> There are other firmware problems with the 7200.11 series, but this is the biggie.
>
> Once Seagate releases working firmware, we want to be able to install it from Unix, on any CPU arch. Seagate's release can only install on x86 using FreeDOS.
>
> *ATA Commands that may be useful:
>
> command name                  command code in hex   page [1]   pdf page [1]
> Read Log Ext                  0x2F                  27         33
> S.M.A.R.T. Read Log Sector    0xB0 / 0xD5           28,34      34,40
> S.M.A.R.T. Write Log Sector   0xB0 / 0xD6           28,34      34,40
> Write Log Extended            0x3F                  28         34
> Download Microcode            0x92                  27         33
>
> Questions:
>
> Is Maxtorman correct about the 320 log entries?
> Are the commands listed above the ones we need?
> What is the difference between the Log Extended and the S.M.A.R.T. Log Sector?
> Is Microcode the same as firmware? (Seagate uses the term firmware elsewhere in the manual, but I don't find any sort of write firmware command.)
> Where can we get more detailed info about these commands and how to use them?

Maxtorman is right about the 320, but it's a bit more complicated. Here is the detailed description of the failure root cause (no NDA pets were hurt):

The firmware issue is that the end boundary of the event log circular buffer (320) was set incorrectly.
During Event Log initialization, the boundary condition that defines the end of the Event Log is off by one. During power up, if the Event Log counter is at entry 320, or a multiple of (320 + x*256), and if a particular data pattern (dependent on the type of tester used during the drive manufacturing test process) had been present in the reserved-area system tracks when the drive's reserved-area file system was created during manufacturing, firmware will increment the Event Log pointer past the end of the event log data structure. This error is detected and results in an Assert Failure, which causes the drive to hang as a
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Nenhum_de_Nos matheus at eternamente.info writes:

> > > where you read that from ? I have a couple of 750GB ES.2 and now I'm worried !
> >
> > http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931NewLang=en
>
> thanks, yet OT, but I also heard of new firmwares being worse than old ones, from seagate first try to fix things. anyone already updated some ES.2 and all went fine ?
>
> thanks,
> matheus

I updated some ST3500320NS to SN06C and everything went fine. If you have the SAS version of ES2.1, it's not affected (does not need an update).
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Toni writes: If we can find out what area this is (I assume it isn't in the normal space used for user storage) and how to zero it (if not already zero) there is no need to update the firmware. I'd rather say that the (ring) buffer has some external counter, also stored somewhere, which needs to be adjusted. I'd not bet that simply zeroing the area(s) will do. From what I've read, zeroing the area should make the disk safe. Depending on what it takes to zero the area, this might be either more or less safe than updating the firmware. Good question. Seagate has some web page that supposedly will tell you, but of course it is broken and doesn't work with all browsers. At some time, they had a page where you could enter your model and serial number, but reportedly this page delivered a lot of false positives and false negatives. After deciding that the results were far too unreliable, the page was pulled. That too. For one thing people were entering the serial numbers using lower case letters and getting false negatives. Does Seagate test anything? Their firmware is buggy, their test equipment is buggy, their web site doesn't work, their model serial number checker program is buggy. They released a firmware installer program that bricks drives. There is a reason I want to look into zeroing out the magic area as an alternative to risking updating the firmware. :-( So keeping the counter away from 320 is an okay short term workaround, This would require to periodically check the log position and eg. reset it to zero at shutdown, to be on the safe side. Yes. Depending on what events make the counter increment, it might be possible for the counter to go from 0 to 320 in a short time, and then the power fails. So not a great workaround, but better than nothing, and if we get info on the counter before getting info on a proper fix (either zeroing the magic are or updating the firmware) we could use it as a workaround until we get a proper fix. 
but long term we want to either zero out the magic location or update the firmware. We want to have updated firmware and the ability to update firmware for all drives, also from other manufacturers. Updating firmware for a drive shouldn't be any more complicated or risky than updating the BIOS on the motherboard. As I understand it, updating the firmware on some mainboards IS risky. Some have a fail safe that allows trying again, others have two areas that can be written, but some have neither of these and risk getting bricked. There is supposed to be some document that explains all this, with enough details to create a fix. If anyone finds this document I need a copy please. Me too! Sounds like you are on good terms with your dealer. Can your dealer get you a copy? -- Sieve-X writes: As far I know the drive internal event counter can only be accessed or changed from firmware level (ie. serial/pc-3000). By serial you mean using the TTL to RS-232 thing? For most of us that would require taking the drive out of whatever case it is mounted in to access the pins. What is pc-3000? Drive firmware flashing from (S)ATA interface level could be done on UNIX but doing so from a mounted file-system (to avoid a reboot) I was worried that device drivers might not have a facility to pass through whatever magic commands we need. But yeah, if the root partition is on an affected drive that might be a problem. It would be nice to know if rebooting is safe or not.
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Hi,

On Sun, 25.01.2009 at 16:27:14 +, Dieter open...@sopwith.solgatos.com wrote:
> I wrote: You wrote: Is Maxtorman correct about the 320 log entries? My dealer told me a similar story, but I don't know where he had it from.
>
> I guess the next step is to find out if Maxtorman is correct about this 320 log entries stuff, and if the SMART log entries as reported by smartmontools is the log to worry about, or if there is some other log.

I don't have an account on /., and also feel incapable of actually working on this problem, but someone who has and can could probably try to nag maxtorman about improving smartmontools to the point that they do the right thing, or try to get him to connect one to somebody else who can verify the issue and/or provide more technical details. If he can find a way to post almost-anonymously to /., he might be able to give some hints to the smartmontools guys, too. Then we only need them to integrate everything and make a new release.

Personally, I'd say that it'd be best if Seagate themselves grabbed the opportunity to partially make good on the issue, but I heavily doubt that they understand, or want to understand, what FLOSS is about.

Kind regards,
--Toni++
Re: Dealing with Seagate's problematic 7200.11 firmware.
Dieter wrote:
> Recovering from Seagate's problematic 7200.11 firmware.
>
> Most of you have read about the problems with Seagate's 7200.11 disks. For those of you that haven't, the firmware on many of these drives is buggy, and can brick the drive when powering up or rebooting the system. Thus far, Seagate's response has been less than wonderful. We need a FLOSS solution.
>
> Goals:
>
> 1) Ability to read the number of log entries.
> 2) Ability to change the number of log entries.
> 3) Ability to install new firmware from Unix.
>
> We need for this to work with any flavor of Unix, on any CPU arch, without reboot or power cycle. We need for this to work on one drive without affecting other drives. I don't expect to be able to write FLOSS firmware for the drives, so this isn't listed as a goal. If you think you can, please feel free.
>
> The problem:
>
> IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report its information to the BIOS.
> Maxtorman, slashdot discussion [2]

Just a hypothetical situation, since we do not have the source code of the firmware: isn't it possible that some kind of mathematical operation is occurring on the number of log entries, causing some kind of infinite loop or a division that leads to/by 0 that the software/hardware is unable to handle? That could mean this problem could also manifest itself on, for example, multiples of 320, so just putting the counter on 321 may just be delaying the inevitable. And what happens if the counter overflows and reaches 320 again?

Glenn

> If Maxtorman is correct, then once the drive has been operating awhile, we have a 1 in 320 chance that the circular log is at entry 320.
> We want to be able to find out how many log entries the disk currently has, and we want to be able to change the number of log entries away from 320, while we wait for Seagate to get its act together and release firmware that works properly. Since Seagate's solution will require attaching the drive to an x86 system and booting a FreeDOS ISO from CD, if the log is at 320 that boot will brick the drive.
>
> There are other firmware problems with the 7200.11 series, but this is the biggie.
>
> Once Seagate releases working firmware, we want to be able to install it from Unix, on any CPU arch. Seagate's release can only install on x86 using FreeDOS.
>
> *ATA Commands that may be useful:
>
> command name                  command code in hex   page [1]   pdf page [1]
> Read Log Ext                  0x2F                  27         33
> S.M.A.R.T. Read Log Sector    0xB0 / 0xD5           28,34      34,40
> S.M.A.R.T. Write Log Sector   0xB0 / 0xD6           28,34      34,40
> Write Log Extended            0x3F                  28         34
> Download Microcode            0x92                  27         33
>
> Questions:
>
> Is Maxtorman correct about the 320 log entries?
> Are the commands listed above the ones we need?
> What is the difference between the Log Extended and the S.M.A.R.T. Log Sector?
> Is Microcode the same as firmware? (Seagate uses the term firmware elsewhere in the manual, but I don't find any sort of write firmware command.)
> Where can we get more detailed info about these commands and how to use them?
>
> References:
>
> [1] Seagate Barracuda 7200.11 Serial ATA Product Manual rev C August 2008
> http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.11/100507013c.pdf
>
> [2] http://it.slashdot.org/article.pl?sid=09/01/21/0052236
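To make the command table in this message a little more concrete, here is a rough Python sketch of the register values a S.M.A.R.T. Read Log request would carry. The command and feature codes are the ones from the table. The 0x4F/0xC2 signature in the LBA mid/high registers is what the ATA specification requires for SMART commands and does not come from this thread. The `Taskfile` class and `smart_read_log` function are hypothetical names for illustration, and actually sending such a request to a drive needs an OS-specific pass-through mechanism that is not shown here.

```python
from dataclasses import dataclass

# Command codes from the table in the original post.
ATA_CMD_READ_LOG_EXT = 0x2F
ATA_CMD_WRITE_LOG_EXT = 0x3F
ATA_CMD_DOWNLOAD_MICROCODE = 0x92
ATA_CMD_SMART = 0xB0

# Feature register values that select the SMART sub-command.
SMART_FEATURE_READ_LOG = 0xD5
SMART_FEATURE_WRITE_LOG = 0xD6

# Signature that SMART commands expect in LBA mid / LBA high (per the ATA spec).
SMART_LBA_MID = 0x4F
SMART_LBA_HIGH = 0xC2


@dataclass(frozen=True)
class Taskfile:
    """Hypothetical container for the 28-bit ATA command registers."""
    command: int
    features: int
    sector_count: int
    lba_low: int
    lba_mid: int
    lba_high: int


def smart_read_log(log_address, sectors=1):
    """Build (but do not send) the registers for a SMART READ LOG request."""
    if not 0 <= log_address <= 0xFF:
        raise ValueError("log address must fit in one byte")
    if not 1 <= sectors <= 0xFF:
        raise ValueError("sector count must be between 1 and 255")
    return Taskfile(
        command=ATA_CMD_SMART,
        features=SMART_FEATURE_READ_LOG,
        sector_count=sectors,
        lba_low=log_address,      # the log address goes in the LBA low register
        lba_mid=SMART_LBA_MID,
        lba_high=SMART_LBA_HIGH,
    )


if __name__ == "__main__":
    # Vendor log 0xa1 is listed with 20 sectors in the log directory posted in this thread.
    print(smart_read_log(0xA1, sectors=20))
```

Whether reading any of these logs exposes the event counter in question is exactly the open question in this thread; the sketch only shows how the table's codes fit together.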
Re: Dealing with Seagate's problematic 7200.11 firmware.
On Fri, Jan 23, 2009 at 09:28:34PM +, Dieter wrote:
> Recovering from Seagate's problematic 7200.11 firmware.
>
> Most of you have read about the problems with Seagate's 7200.11 disks. For those of you that haven't, the firmware on many of these drives is buggy, and can brick the drive when powering up or rebooting the system. Thus far,

How can I know if I have a suspicious drive?

E.g.

# smartctl -i -d ata /dev/rwd1c
smartctl version 5.33 [i386-unknown-openbsd4.1] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3808110AS
Serial Number:    5LRA2E2J
Firmware Version: 3.AJJ
User Capacity:    80,026,361,856 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jan 26 15:31:45 2009 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Google for ST3808110AS gives me "Barracuda 7200.9 SATA 80-GB Hard Drive", so I guess this one is not suspicious, but I have more disks, in other servers. What if I find a 7200.10, 7200.11, ES or ES.2, is that enough for me to suspect it?

> Seagate's response has been less than wonderful. We need a FLOSS solution.
>
> Goals:
>
> 1) Ability to read the number of log entries.
> 2) Ability to change the number of log entries.
> 3) Ability to install new firmware from Unix.
>
> We need for this to work with any flavor of Unix, on any CPU arch, without reboot or power cycle. We need for this to work on one drive without affecting other drives. I don't expect to be able to write FLOSS firmware for the drives, so this isn't listed as a goal. If you think you can, please feel free.
>
> The problem:
>
> IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report its information to the BIOS.
> Maxtorman, slashdot discussion [2]
>
> If Maxtorman is correct, then once the drive has been operating awhile, we have a 1 in 320 chance that the circular log is at entry 320. We want to be able to find out how many log entries the disk currently has, and we want to be able to change the number of log entries away from 320, while we wait for Seagate to get its act together and release firmware that works properly. Since Seagate's solution will require attaching the drive to an x86 system and booting a FreeDOS ISO from CD, if the log is at 320 that boot will brick the drive.
>
> There are other firmware problems with the 7200.11 series, but this is the biggie.
>
> Once Seagate releases working firmware, we want to be able to install it from Unix, on any CPU arch. Seagate's release can only install on x86 using FreeDOS.
>
> *ATA Commands that may be useful:
>
> command name                  command code in hex   page [1]   pdf page [1]
> Read Log Ext                  0x2F                  27         33
> S.M.A.R.T. Read Log Sector    0xB0 / 0xD5           28,34      34,40
> S.M.A.R.T. Write Log Sector   0xB0 / 0xD6           28,34      34,40
> Write Log Extended            0x3F                  28         34
> Download Microcode            0x92                  27         33
>
> Questions:
>
> Is Maxtorman correct about the 320 log entries?
> Are the commands listed above the ones we need?
> What is the difference between the Log Extended and the S.M.A.R.T. Log Sector?
> Is Microcode the same as firmware? (Seagate uses the term firmware elsewhere in the manual, but I don't find any sort of write firmware command.)
> Where can we get more detailed info about these commands and how to use them?
>
> References:
>
> [1] Seagate Barracuda 7200.11 Serial ATA Product Manual rev C August 2008
> http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.11/100507013c.pdf
>
> [2] http://it.slashdot.org/article.pl?sid=09/01/21/0052236

--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
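For the question in this message about checking many disks across several servers, the model and firmware strings can be pulled out of saved `smartctl -i` output mechanically. The following is a small Python sketch, assuming output in the same "Key: value" layout as the smartctl 5.33 example in this message; `parse_smartctl_info` is a made-up helper name. It deliberately contains no list of affected models, since the thread has not established one; the extracted strings still have to be checked by hand against whatever the vendor publishes.

```python
def parse_smartctl_info(text):
    """Collect the 'Key: value' lines of the INFORMATION SECTION.

    Returns {key: [values]}; a list is used because some keys, such as
    'SMART support is', appear more than once in the output.
    """
    fields = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("==="):
            # Section banners switch parsing on or off.
            in_section = line.startswith("=== START OF INFORMATION SECTION")
            continue
        if not in_section or ":" not in line:
            continue
        # Split on the first colon only, so times like 15:31:45 stay intact.
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


if __name__ == "__main__":
    sample = """\
=== START OF INFORMATION SECTION ===
Device Model:     ST3808110AS
Firmware Version: 3.AJJ
Local Time is:    Mon Jan 26 15:31:45 2009 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
"""
    info = parse_smartctl_info(sample)
    print(info["Device Model"][0], info["Firmware Version"][0])
```

Running this over output collected from each server gives a quick inventory of model and firmware pairs to compare against a vendor statement.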
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
On Sun, January 25, 2009 16:01, Toni Mueller wrote: Hi, On Fri, 23.01.2009 at 21:28:34 +, Dieter open...@sopwith.solgatos.com wrote: Recovering from Seagate's problematic 7200.11 firmware. first off, several other product lines are affected, too. In particular, the popular ES and ES.2 server grade disks are also affected, to the best of my knowledge. Seagate only admits to problems with ES.2 drives, not ES drives, though. where you read that from ? I have a couple of 750GB ES.2 and now I'm worried ! matheus -- We will call you cygnus, The God of balance you shall be
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
On 2009-01-26, Nenhum_de_Nos math...@eternamente.info wrote: On Sun, January 25, 2009 16:01, Toni Mueller wrote: Hi, On Fri, 23.01.2009 at 21:28:34 +, Dieter open...@sopwith.solgatos.com wrote: Recovering from Seagate's problematic 7200.11 firmware. first off, several other product lines are affected, too. In particular, the popular ES and ES.2 server grade disks are also affected, to the best of my knowledge. Seagate only admits to problems with ES.2 drives, not ES drives, though. where you read that from ? I have a couple of 750GB ES.2 and now I'm worried ! http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931NewLang=en
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
On Mon, January 26, 2009 18:48, Stuart Henderson wrote:
> On 2009-01-26, Nenhum_de_Nos math...@eternamente.info wrote:
>> On Sun, January 25, 2009 16:01, Toni Mueller wrote:
>>> Hi,
>>>
>>> On Fri, 23.01.2009 at 21:28:34 +, Dieter open...@sopwith.solgatos.com wrote:
>>>> Recovering from Seagate's problematic 7200.11 firmware.
>>>
>>> first off, several other product lines are affected, too. In particular, the popular ES and ES.2 server grade disks are also affected, to the best of my knowledge. Seagate only admits to problems with ES.2 drives, not ES drives, though.
>>
>> Where did you read that? I have a couple of 750GB ES.2 drives and now I'm worried!
>
> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931&NewLang=en

Thanks. Still OT, but I also heard of new firmware being worse than the old, from Seagate's first try at fixing things. Has anyone already updated some ES.2 drives and had it all go fine?

thanks,

matheus

--
We will call you cygnus, The God of balance you shall be
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Disk families affected: Barracuda 7200.11, Barracuda ES.2 (SATA), DiamondMax 22, FreeAgent Desk, Maxtor OneTouch 4, Pipeline HD, Pipeline HD Pro, SV35.3, SV35.4

Barracuda ES.2 SAS drive is not affected

All drives with a date of manufacture January 12, 2009 and later are not affected by this issue

This condition was introduced by a firmware issue that sets the drive event log to an invalid location causing the drive to become inaccessible.

The firmware issue is that the end boundary of the event log circular buffer (320) was set incorrectly. During Event Log initialization, the boundary condition that defines the end of the Event Log is off by one. During power up, if the Event Log counter is at entry 320, or a multiple of (320 + x*256), and if a particular data pattern (dependent on the type of tester used during the drive manufacturing test process) had been present in the reserved-area system tracks when the drive's reserved-area file system was created during manufacturing, firmware will increment the Event Log pointer past the end of the event log data structure. This error is detected and results in an Assert Failure, which causes the drive to hang as a failsafe measure. When the drive enters failsafe, further updates to the counter become impossible and the condition will remain through subsequent power cycles. The problem only arises if a power cycle initialization occurs when the Event Log is at 320 or some multiple of 256 thereafter.

Seagate says only on power up, but I'm pretty sure I have seen stories of rebooting causing bricking. Might be unrelated, but to play it safe I will continue to avoid reboots.

So, we have confirmation of the number 320, and a formula for event counts past 320. We still need to find out if this Event Log counter is the error count reported by smartmontools, or some other counter.

Ideally, I would like to find out how to read this reserved-area system track, and how to set it to a safe value (I have seen zero, but this is not confirmed). If we can do this we don't need to update the firmware.

And we still want to find out how to update the firmware from Unix.
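To make the counter condition concrete, here is a minimal Python sketch of the arithmetic. It assumes that Seagate's wording ("320 or some multiple of 256 thereafter") means counter == 320 + x*256 for x >= 0; that reading is an interpretation, not something Seagate has confirmed. The names is_risky and events_until_risky are made up for illustration. It says nothing about the second condition (the test pattern in the reserved area), which we cannot check yet.

```python
# Sketch only: assumes the risky counter values are 320 + x*256 for x >= 0,
# which is one reading of Seagate's description, not a confirmed formula.
LOG_END = 320   # end boundary of the circular event log, per Seagate
STRIDE = 256    # spacing of the later risky values, per Seagate


def is_risky(counter: int) -> bool:
    """True if a power cycle at this counter value could trigger the bug."""
    return counter >= LOG_END and (counter - LOG_END) % STRIDE == 0


def events_until_risky(counter: int) -> int:
    """Number of further log events before the counter reaches a risky value."""
    if counter < LOG_END:
        return LOG_END - counter
    # Distance to the next value of the form 320 + x*256 (0 if already on one).
    return (-(counter - LOG_END)) % STRIDE


if __name__ == "__main__":
    for c in (0, 319, 320, 321, 576, 600):
        print(c, is_risky(c), events_until_risky(c))
```

If we ever learn how to read the real counter, a check like this would tell us whether it is safe to power cycle right now, and how much headroom is left.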
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Toni writes:

>> Is Maxtorman correct about the 320 log entries?
>
> My dealer told me a similar story, but I don't know where he had it from.
>
>> I guess the next step is to find out if Maxtorman is correct about this 320 log entries stuff, and if the SMART log entries as reported by smartmontools is the log to worry about, or if there is some other log.
>
> I don't have an account on /., and also feel incapable of actually working on this problem, but someone who has and can, could probably try to nag maxtorman about improving smartmontools to the point that they do the right thing, or try to get him to connect one to somebody else who can verify the issue and/or provide more technical details. If he can find a way to almost-anonymously post to /., he might be able to give some hints to the smartmontools guys, too. Then, we only need them to integrate everything and make a new release.

It is easy to set up a slashdot account. Or you can post as anonymous coward. He set up the Maxtorman account to post anonymously; he mentioned that he has another slashdot account that isn't anonymous. The problem I have is that I can't find a way to send him a PM (private message). Most web forums have a facility for sending other users a PM. We can post a reply to the thread, but he would have to read the thread again to see it. Any slashdot wizards out there have an idea?

Your suggestion of smartmontools is helpful, thank you.

> Personally, I'd say that it'd be best if Seagate themselves would grab the opportunity to partially make good on the issue, but I heavily doubt that they understand, or want to understand, what it's about with FLOSS.

It isn't even just FLOSS. Any non-x86 machine is out of luck. Proprietary Unix is out of luck. Anything embedded is out of luck. Even Mac is probably out of luck. And if the reboot to run the firmware installer bricks the drive(s), even wintel is out of luck.

I don't understand the common corporate policy of keeping everything secret. All they are doing is hurting their previously loyal customers. It didn't use to be this way.

Supposedly there was a broken test machine that didn't zero out some special area after writing a test pattern. So only drives that were tested on that machine are at risk. If we can find out what area this is (I assume it isn't in the normal space used for user storage) and how to zero it (if not already zero) there is no need to update the firmware.

--

Raimo writes:

> How can I know if I have a suspicious drive?

Good question. Seagate has some web page that supposedly will tell you, but of course it is broken and doesn't work with all browsers.

> Google for ST3808110AS gives me Barracuda 7200.9 SATA 80-GB Hard Drive, so I guess this one is not suspicious, but I have more disks, in other servers. What if I find a 7200.10, 7200.11, ES or ES.2, is that enough for me to suspect it?

I haven't read anything about problems with 7200.10 or earlier. Toni reports that ES and ES.2 may be affected.

--

Glenn writes:

> Just a hypothetical situation, since we do not have the source code of the firmware: isn't it possible some kind of mathematical operation is occurring on the number of log entries causing some kind of infinite loop to occur or a division that leads to/by 0 that the software/hardware is unable to handle? That could mean this problem could also manifest itself on for example multiples of 320, so just putting the counter on 321 may just be delaying the inevitable. And what happens if the counter overflows and reaches 320 again?

From what I've read it sounds like the counter must be exactly 320 AND some location must have a test pattern rather than zero when you init (power up or reboot) the drive. From Maxtorman's description, the log is circular, so it will eventually wrap around to 320 again. So keeping the counter away from 320 is an okay short term workaround, but long term we want to either zero out the magic location or update the firmware.

--

matheus writes:

> but I also heard of new firmware being worse than the old, from Seagate's first try at fixing things.

What I read is that the firmware itself was ok but the installer program would brick a previously working drive. But it didn't brick it as badly as the firmware bug; you can still update the firmware again once you get a proper update program.

===

There is supposed to be some document that explains all this, with enough details to create a fix. If anyone finds this document I need a copy please.

If you have one or more of the suspect drives and it is running, try to keep it running and don't reboot. If it is powered down, leave it powered down if possible until this all gets sorted out.
OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
Hi,

On Fri, 23.01.2009 at 21:28:34 +, Dieter open...@sopwith.solgatos.com wrote:
> Recovering from Seagate's problematic 7200.11 firmware.

first off, several other product lines are affected, too. In particular, the popular ES and ES.2 server grade disks are also affected, to the best of my knowledge. Seagate only admits to problems with ES.2 drives, not ES drives, though.

> Seagate's response has been less than wonderful. We need a FLOSS solution.

Right.

> We need for this to work with any flavor of Unix,

We need to do this from within a running system.

> We need for this to work on one drive without affecting other drives.

My first idea is that smartmontools probably provide much of the required framework already, and could possibly be extended to work with this situation, too.

> If Maxtorman is correct, then once the drive has been operating awhile,

Seagate sent me the following link

http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931

which imho contributes to the impression of a less-than-stellar response by stating "Based on the low risk as determined by an analysis of actual field return data, Seagate believes that the affected drives can be used as is." (current as of _now_).

> that works properly. Since Seagate's solution will require attaching the drive to an x86 system and booting a FreeDOS ISO from CD, if the log is at 320 that boot will brick the drive.

As far as I understood, the firmware has a sort of boot loader which reads the actual firmware from the drive, and also writes new firmware to the drive. This leads me to suspect that writing a modified boot loader firmware which does not contain such log entry reading or writing could bypass the 'brickedness' caused by the broken firmware which is actually on the platters (i.e., which is what the boot loader needs to load to begin with). So, if a modified boot loader would e.g. abstain from loading the firmware on the drive, the corruption of said firmware on the drive would not occur, thus not blocking the remainder of the hardware. However, if, and how, such a new boot loader could be placed into the ???ROMs of the drive, I really don't know.

> Once Seagate releases working firmware, we want to be able to install it from Unix, on any CPU arch. Seagate's release can only install on x86 using FreeDOS.

- smartmontools come to mind.

> Is Maxtorman correct about the 320 log entries?

My dealer told me a similar story, but I don't know where he had it from.

Kind regards,
--Toni++
Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)
>> Recovering from Seagate's problematic 7200.11 firmware.
>
> first off, several other product lines are affected, too. In particular, the popular ES and ES.2 server grade disks are also affected, to the best of my knowledge. Seagate only admits to problems with ES.2 drives, not ES drives, though.

Word is the Maxtor Diamond Max 21 line is also affected.

> We need to do this from within a running system.

Yes.

> My first idea is that smartmontools probably provide much of the required framework already, and could possibly be extended to work with this situation, too.

Thanks. I downloaded smartmontools, fixed a couple of ILP32 vs LP64 bugs, and it appears to provide the number of SMART log entries.

>> Is Maxtorman correct about the 320 log entries?
>
> My dealer told me a similar story, but I don't know where he had it from.

I guess the next step is to find out if Maxtorman is correct about this 320 log entries stuff, and if the SMART log entries as reported by smartmontools is the log to worry about, or if there is some other log. E.g. see the Read Log Ext and Write Log Extended commands I posted yesterday. I don't know if these use the same log as the SMART commands or if this is something different.
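For anyone who wants to keep an eye on the count smartmontools reports, here is a small Python sketch that pulls the number out of the text printed by `smartctl -l error`. Whether this count has anything to do with the Event Log counter in question is exactly what we don't know yet, so treat it as a monitoring aid only. The line formats it looks for ("ATA Error Count: N" and "No Errors Logged") are assumptions based on typical smartctl output and may differ between versions; the SAMPLE text is made up for illustration, and ata_error_count is a hypothetical helper name.

```python
import re
from typing import Optional


def ata_error_count(smartctl_output: str) -> Optional[int]:
    """Extract the error count from the text of `smartctl -l error`.

    Returns 0 if the output says no errors are logged, and None if
    neither of the assumed line formats is present.
    """
    match = re.search(r"^ATA Error Count:\s*(\d+)", smartctl_output, re.MULTILINE)
    if match:
        return int(match.group(1))
    if "No Errors Logged" in smartctl_output:
        return 0
    return None


# Made-up sample text, only meant to show the assumed line format.
SAMPLE = """\
SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
"""

if __name__ == "__main__":
    print(ata_error_count(SAMPLE))
```

In practice the input would be the captured output of smartctl for one drive, so the check touches only that drive and needs no reboot.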
Dealing with Seagate's problematic 7200.11 firmware.
Recovering from Seagate's problematic 7200.11 firmware.

Most of you have read about the problems with Seagate's 7200.11 disks. For those of you that haven't, the firmware on many of these drives is buggy, and can brick the drive when powering up or rebooting the system. Thus far, Seagate's response has been less than wonderful. We need a FLOSS solution.

Goals:

1) Ability to read the number of log entries.
2) Ability to change the number of log entries.
3) Ability to install new firmware from Unix.

We need for this to work with any flavor of Unix, on any CPU arch, without reboot or power cycle. We need for this to work on one drive without affecting other drives.

I don't expect to be able to write FLOSS firmware for the drives, so this isn't listed as a goal. If you think you can, please feel free.

The problem:

    "IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report its information to the BIOS."
        Maxtorman, slashdot discussion [2]

If Maxtorman is correct, then once the drive has been operating awhile, we have a 1 in 320 chance that the circular log is at entry 320. We want to be able to find out how many log entries the disk currently has, and we want to be able to change the number of log entries away from 320, while we wait for Seagate to get its act together and release firmware that works properly. Since Seagate's solution will require attaching the drive to an x86 system and booting a FreeDOS ISO from CD, if the log is at 320 that boot will brick the drive.

There are other firmware problems with the 7200.11 series, but this is the biggie.

Once Seagate releases working firmware, we want to be able to install it from Unix, on any CPU arch. Seagate's release can only install on x86 using FreeDOS.

*ATA Commands that may be useful:

command name                   command code in hex   page [1]   pdf page [1]
Read Log Ext                   0x2F                  27         33
S.M.A.R.T. Read Log Sector     0xB0 / 0xD5           28,34      34,40
S.M.A.R.T. Write Log Sector    0xB0 / 0xD6           28,34      34,40
Write Log Extended             0x3F                  28         34
Download Microcode             0x92                  27         33

Questions:

Is Maxtorman correct about the 320 log entries?

Are the commands listed above the ones we need?

What is the difference between the Log Extended and the S.M.A.R.T. Log Sector?

Is Microcode the same as firmware? (Seagate uses the term firmware elsewhere in the manual, but I don't find any sort of write firmware command.)

Where can we get more detailed info about these commands and how to use them?

References:

[1] Seagate Barracuda 7200.11 Serial ATA Product Manual rev C August 2008
    http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.11/100507013c.pdf
[2] http://it.slashdot.org/article.pl?sid=09/01/21/0052236
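The command list above can also be written down as data, which may be a handy starting point for anyone experimenting with a pass-through tool. This Python sketch only records the codes from the manual; it does not talk to a drive. It reads the "0xB0 / 0xD5" notation as command code 0xB0 with 0xD5 in the Features register (the usual ATA way of selecting a SMART subcommand), which is my reading of the table rather than something the manual excerpt spells out. A feature of None just means the table lists no subcommand. AtaCommand, COMMANDS and describe are names invented for this sketch.

```python
from typing import NamedTuple, Optional


class AtaCommand(NamedTuple):
    name: str
    opcode: int              # value for the ATA Command register
    feature: Optional[int]   # Features register subcommand, if the table lists one


# Codes copied from the table in the 7200.11 product manual [1].
COMMANDS = [
    AtaCommand("Read Log Ext", 0x2F, None),
    AtaCommand("S.M.A.R.T. Read Log Sector", 0xB0, 0xD5),
    AtaCommand("S.M.A.R.T. Write Log Sector", 0xB0, 0xD6),
    AtaCommand("Write Log Extended", 0x3F, None),
    AtaCommand("Download Microcode", 0x92, None),
]


def describe(cmd: AtaCommand) -> str:
    """Render one table row as 'name: command 0xNN[, feature 0xNN]'."""
    text = f"{cmd.name}: command 0x{cmd.opcode:02X}"
    if cmd.feature is not None:
        text += f", feature 0x{cmd.feature:02X}"
    return text


if __name__ == "__main__":
    for cmd in COMMANDS:
        print(describe(cmd))
```

Seen this way, the two S.M.A.R.T. log commands share one command code and differ only in the subcommand, while Read Log Ext and Write Log Extended are separate commands of their own. Whether they reach the same log is still an open question.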