Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-30 Thread Sieve-X
Dieter openbsd at sopwith.solgatos.com writes:

 
 Sigh.  I could easily go on a major rant here, but it wouldn't do us
 any good.  Anyone have information or ideas that could get us closer
 to a solution?
 

The event log counter can be written every once in a while, for example when
S.M.A.R.T automatic off-line data collection runs (e.g. every 4h; it is enabled
by default and may record a list of last errors, temperatures, SER and others).

It appears to be theoretically possible to verify whether a particular
drive is affected by the problem using S.M.A.R.T information (i.e. attributes,
logs, etc.), and that may also serve as a workaround if there is a way to change
the event counter to a safe value (i.e. not 320 or a multiple of 320 + x*256),
but we need more details (e.g. the specific data pattern), which were released
by Seagate under NDA to some partners/vendors. It might be possible to find out
by comparing the S.M.A.R.T information (including some of the
vendor logs like 0xa1) from affected and non-affected drives matching the
basic requirements (7200.11/ES2.1/DiamondMax 22, both new and old firmware).

Log Directory Supported (this one is from an affected model)

SMART Log Directory Logging Version 1 [multi-sector log support]
Log at address 0x00 has 001 sectors [Log Directory]
Log at address 0x01 has 001 sectors [Summary SMART error log]
Log at address 0x02 has 005 sectors [Comprehensive SMART error log]
Log at address 0x03 has 005 sectors [Extended Comprehensive SMART error log]
Log at address 0x06 has 001 sectors [SMART self-test log]
Log at address 0x07 has 001 sectors [Extended self-test log]
Log at address 0x09 has 001 sectors [Selective self-test log]
Log at address 0x10 has 001 sectors [Reserved log]
Log at address 0x11 has 001 sectors [Reserved log]
Log at address 0x21 has 001 sectors [Write stream error log]
Log at address 0x22 has 001 sectors [Read stream error log]
Log at address 0x80 has 016 sectors [Host vendor specific log]
Log at address 0x81 has 016 sectors [Host vendor specific log]
Log at address 0x82 has 016 sectors [Host vendor specific log]
Log at address 0x83 has 016 sectors [Host vendor specific log]
Log at address 0x84 has 016 sectors [Host vendor specific log]
Log at address 0x85 has 016 sectors [Host vendor specific log]
Log at address 0x86 has 016 sectors [Host vendor specific log]
Log at address 0x87 has 016 sectors [Host vendor specific log]
Log at address 0x88 has 016 sectors [Host vendor specific log]
Log at address 0x89 has 016 sectors [Host vendor specific log]
Log at address 0x8a has 016 sectors [Host vendor specific log]
Log at address 0x8b has 016 sectors [Host vendor specific log]
Log at address 0x8c has 016 sectors [Host vendor specific log]
Log at address 0x8d has 016 sectors [Host vendor specific log]
Log at address 0x8e has 016 sectors [Host vendor specific log]
Log at address 0x8f has 016 sectors [Host vendor specific log]
Log at address 0x90 has 016 sectors [Host vendor specific log]
Log at address 0x91 has 016 sectors [Host vendor specific log]
Log at address 0x92 has 016 sectors [Host vendor specific log]
Log at address 0x93 has 016 sectors [Host vendor specific log]
Log at address 0x94 has 016 sectors [Host vendor specific log]
Log at address 0x95 has 016 sectors [Host vendor specific log]
Log at address 0x96 has 016 sectors [Host vendor specific log]
Log at address 0x97 has 016 sectors [Host vendor specific log]
Log at address 0x98 has 016 sectors [Host vendor specific log]
Log at address 0x99 has 016 sectors [Host vendor specific log]
Log at address 0x9a has 016 sectors [Host vendor specific log]
Log at address 0x9b has 016 sectors [Host vendor specific log]
Log at address 0x9c has 016 sectors [Host vendor specific log]
Log at address 0x9d has 016 sectors [Host vendor specific log]
Log at address 0x9e has 016 sectors [Host vendor specific log]
Log at address 0x9f has 016 sectors [Host vendor specific log]
Log at address 0xa1 has 020 sectors [Device vendor specific log]
Log at address 0xa8 has 020 sectors [Device vendor specific log]
Log at address 0xa9 has 001 sectors [Device vendor specific log]
Log at address 0xe0 has 001 sectors [Reserved log]
Log at address 0xe1 has 001 sectors [Reserved log]
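
A minimal sketch of how listings like the one above could be compared between
an affected and a non-affected drive. It only parses the text format shown
here; the function names are made up for illustration, and nothing in it talks
to a drive.

```python
import re

# Matches lines such as:
#   Log at address 0xa1 has 020 sectors [Device vendor specific log]
LINE_RE = re.compile(r"Log at address (0x[0-9a-fA-F]+) has (\d+) sectors \[(.*)\]")


def parse_log_directory(text):
    """Return {address: (sector_count, label)} for every log line in text."""
    logs = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            logs[address] = (int(match.group(2)), match.group(3))
    return logs


def diff_directories(first, second):
    """Return the sorted addresses whose entries differ between two listings."""
    addresses = first.keys() | second.keys()
    return sorted(a for a in addresses if first.get(a) != second.get(a))
```

Feeding it the saved listings from two drives and printing
[hex(a) for a in diff_directories(a_logs, b_logs)] would show which logs
differ in size or presence. That says nothing about the log contents, which
would still have to be dumped and compared separately.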



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-30 Thread Stuart Henderson
please, this is way off topic. could you try and find a better list to
chat about this on...



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-28 Thread Toni Mueller
Hi,

On Tue, 27.01.2009 at 21:37:28 +, Dieter open...@sopwith.solgatos.com 
wrote:
 Toni writes:
  positives and false negatives. After deciding that the results were
  far too unreliable, the page was pulled.
 
 That too.  For one thing people were entering the serial numbers
 using lower case letters and getting false negatives.

this is a joke, right?

 There is a reason I want to look into zeroing out the magic area as
 an alternative to risking updating the firmware.  :-(

Understood... I'm looking for a different vendor, too. :-|

 the power fails.  So not a great workaround, but better than nothing,

Right.

 As I understand it, updating the firmware on some mainboards IS risky.

It may well be that some combinations don't work, but at some point,
I'd say that this should fall into the category of you get what you
pay for. IOW, I can't imagine that doing this kind of stuff right
would cost more than, say, $1 for a drive, and $5 for a motherboard,
and I think that everyone should be prepared to add, say, $50 to a
small server to get these things, ie, (much) less broken designs, imho.
But the bigger problem is that currently there appears to be no way to
add $50, or even $500, to a server, to get these things right because
there seems to be no vendor who offers such stuff.

   There is supposed to be some document that explains all this,
   with enough details to create a fix.  If anyone finds this
   document I need a copy please.
  
  Me too!
 
 Sounds like you are on good terms with your dealer.  Can your dealer get
 you a copy?

LOL. I can ask him, but don't expect too much...


Kind regards,
--Toni++



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-28 Thread Dieter
   positives and false negatives. After deciding that the results were
   far too unreliable, the page was pulled.
  
  That too.  For one thing people were entering the serial numbers
  using lower case letters and getting false negatives.
 
 this is a joke, right?

As far as I can tell it is not a joke.  The people entering the
serial numbers might have been wintel users and thus not too bright.
Seagate's quality control dept is clearly missing in action lately.

  As I understand it, updating the firmware on some mainboards IS risky.
 
 It may well be that some combinations don't work, but at some point,
 I'd say that this should fall into the category of you get what you
 pay for. IOW, I can't imagine that doing this kind of stuff right
 would cost more than, say, $1 for a drive, and $5 for a motherboard,
 and I think that everyone should be prepared to add, say, $50 to a
 small server to get these things, ie, (much) less broken designs, imho.
 But the bigger problem is that currently there appears to be no way to
 add $50, or even $500, to a server, to get these things right because
 there seems to be no vendor who offers such stuff.

The idiots in charge of most companies don't care about quality control.

Sigh.  I could easily go on a major rant here, but it wouldn't do us
any good.  Anyone have information or ideas that could get us closer
to a solution?



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-27 Thread Toni Mueller
Hi,

On Mon, 26.01.2009 at 17:08:51 +, Dieter open...@sopwith.solgatos.com 
wrote:
 It is easy to set up a slashdot account.  Or you can post as anonymous
 coward.

yes, but I don't want to set up a /. account right now, and posting as
AC wouldn't likely solve the problem.

 that he has another slashdot account that isn't anonymous.  Problem I
 have is I can't find a way to send him a PM (private message).  Most web

This is exactly the point.

 forums have a facility for sending other users a PM.  We can post a reply
 to the thread, but he would have to read the thread again to see it.
 Any slashdot wizards out there have an idea?

Post to the thread and offer one's own email address (maybe
time-limited or so), and hope for the best... not exactly a silver
bullet, but maybe better than nothing.

 It isn't even just FLOSS.  Any non-x86 machine is out of luck.
 Proprietary Unix is out of luck.  Anything embedded is out of luck.
 Even Mac is probably out of luck.  And if the reboot to run the
 firmware installer bricks the drive(s) even wintel is out of luck.

Yes, and smartmontools claims to run on all platforms you mentioned
(except MAC OS 9). Ie, they even run on Windows and/or together with
Cygwin. Therefore, I think that this is a strategic point from where
the problem could be solved for a really broad range of systems, and in
one go.

 I don't understand the common corporate policy of keeping everything
 secret.  All they are doing is hurting their previously loyal customers.
 It didn't used to be this way.

Oh... over here, we have a saying: Sea gate, oder sie geht net.
(meaning: it works, or it doesn't - it's a pun on the pronunciation
of Seagate). Yes, many people, me included, thought they had
reformed...

 Supposedly there was a broken test machine that didn't zero out some
 special area after writing a test pattern.  So only drives that were
 tested on that machine are at risk.

I'd like to not speculate about the cause of the problem any longer,
but instead devise a plan to acquire the required knowledge to beef up
smartmontools to solve the problem. I could only believe such claims
about the causes, but presently, Seagate destroyed about as much
trust as they possibly could, at least with me. So, except for the
hard-core technical data, they're out of the loop as far as I'm
concerned.

 If we can find out what area
 this is (I assume it isn't in the normal space used for user storage)
 and how to zero it (if not already zero) there is no need to update
 the firmware.

I'd rather say that the (ring) buffer has some external counter, also
stored somewhere, which needs to be adjusted. I'd not bet that simply
zeroing the area(s) will do.

 Good question.  Seagate has some web page that supposedly will tell you,
 but of course it is broken and doesn't work with all browsers.

At some time, they had a page where you could enter your model and
serial number, but reportedly this page delivered a lot of false
positives and false negatives. After deciding that the results were
far too unreliable, the page was pulled.

 Toni reports that ES and ES.2 may be affected.

This I took from a Seagate web page. Stuart Henderson has posted the
link, and I had the same link in my email which I received from
Seagate, so, I'd say, the link is genuine (despite the contents of
the page being almost worthless, imho).

 From what I've read it sounds like the counter must be exactly 320 AND some
 location must have a test pattern rather than zero when you init (power up
 or reboot) the drive.  From Maxtorman's description, the log is circular,
 so it will eventually wrap around to 320 again.

My dealer, who claimed that he also had information directly from
Seagate, told me that the buffer was 256 entries long (makes a lot of
sense, imho), but nevermind. We need hard facts, preferably in the
form of photocopies of internal design papers or so, not speculations.

 So keeping the counter away from 320 is an okay short term workaround,

This would require periodically checking the log position and, e.g., resetting
it to zero at shutdown, to be on the safe side.

 but long term we want to either zero out the magic location or update the
 firmware.

We want to have updated firmware and the ability to update firmware for
all drives, also from other manufacturers. Updating firmware for a
drive shouldn't be any more complicated or risky than updating the BIOS
on the motherboard.

 There is supposed to be some document that explains all this,
 with enough details to create a fix.  If anyone finds this
 document I need a copy please.

Me too!

 If you have one or more of the suspect drives, if it is running,
 try to keep it running and don't reboot.  If it is powered down
 leave it powered down if possible until this all gets sorted out.

Yes... but that still doesn't help you in the face of a system's crash.
What to do then? No need to answer this one...


-- 
Kind regards,
--Toni++



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-27 Thread Toni Mueller
Hi,

On Mon, 26.01.2009 at 17:08:51 +, Dieter open...@sopwith.solgatos.com 
wrote:
 Your suggestion of smartmontools is helpful, thank you.

thanks - I have just sent an email to them, esp. after seeing that
there are people from big name companies involved, who could procure
at least some of the required documentation inhouse.


-- 
Kind regards,
--Toni++



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-27 Thread Sieve-X
Nenhum_de_Nos matheus at eternamente.info writes:

  where you read that from ?
 
  I have a couple of 750GB ES.2 and now I'm worried !
 
 
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931&NewLang=en
 
 thanks, yet OT, but I also heard of new firmwares being worse than old
 ones, from seagate first try to fix things. anyone already updated some
 ES.2 and all went fine ?
 
 thanks,
 
 matheus
 

I updated some ST3500320NS to SN06C and everything went fine. If you
have the SAS version of ES2.1 it's not affected (does not need update). 



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-27 Thread Dieter
Toni writes:

  If we can find out what area
  this is (I assume it isn't in the normal space used for user storage)
  and how to zero it (if not already zero) there is no need to update
  the firmware.
 
 I'd rather say that the (ring) buffer has some external counter, also
 stored somewhere, which needs to be adjusted. I'd not bet that simply
 zeroing the area(s) will do.

From what I've read, zeroing the area should make the disk safe.
Depending on what it takes to zero the area, this might be either
more or less safe than updating the firmware.

  Good question.  Seagate has some web page that supposedly will tell you,
  but of course it is broken and doesn't work with all browsers.
 
 At some time, they had a page where you could enter your model and
 serial number, but reportedly this page delivered a lot of false
 positives and false negatives. After deciding that the results were
 far too unreliable, the page was pulled.

That too.  For one thing people were entering the serial numbers
using lower case letters and getting false negatives.

Does Seagate test anything?  Their firmware is buggy, their test
equipment is buggy, their web site doesn't work, their model & serial
number checker program is buggy.  They released a firmware installer
program that bricks drives.

There is a reason I want to look into zeroing out the magic area as
an alternative to risking updating the firmware.  :-(

  So keeping the counter away from 320 is an okay short term workaround,
 
 This would require periodically checking the log position and, e.g., resetting
 it to zero at shutdown, to be on the safe side.

Yes.  Depending on what events make the counter increment, it might be
possible for the counter to go from 0 to 320 in a short time, and then
the power fails.  So not a great workaround, but better than nothing,
and if we get info on the counter before getting info on a proper fix
(either zeroing the magic area or updating the firmware) we could use
it as a workaround until we get a proper fix.
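
To put a number on how much headroom such a workaround leaves, here is a small
sketch. It assumes the risky values are 320 + x*256 (x = 0, 1, 2, ...), as in
the Seagate description quoted elsewhere in this thread, and it assumes the
counter can be read at all, which nobody here has confirmed. The function name
is hypothetical.

```python
def events_until_risky(count, base=320, period=256):
    """Number of further log events before the counter hits a risky value.

    Risky values are assumed to be base + x*period for x >= 0.
    Returns 0 when the counter already sits on a risky value.
    """
    if count <= base:
        return base - count
    # Past the first risky value: distance to the next one, modulo the period.
    return (base - count) % period
```

For example, events_until_risky(0) gives 320 and events_until_risky(321)
gives 255. A periodic check could warn when the result drops below some
margin, but it cannot protect against a power failure in between checks.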

  but long term we want to either zero out the magic location or update the
  firmware.
 
 We want to have updated firmware and the ability to update firmware for
 all drives, also from other manufacturers. Updating firmware for a
 drive shouldn't be any more complicated or risky than updating the BIOS
 on the motherboard.

As I understand it, updating the firmware on some mainboards IS risky.
Some have a fail safe that allows trying again, others have two
areas that can be written, but some have neither of these and risk getting
bricked.

  There is supposed to be some document that explains all this,
  with enough details to create a fix.  If anyone finds this
  document I need a copy please.
 
 Me too!

Sounds like you are on good terms with your dealer.  Can your dealer get
you a copy?

--

Sieve-X writes:

 As far as I know the drive internal event counter can only be accessed
 or changed from firmware level (ie. serial/pc-3000).

By serial you mean using the TTL to RS-232 thing?  For most of us that
would require taking the drive out of whatever case it is mounted in
to access the pins.

What is pc-3000?

 Drive firmware flashing from (S)ATA interface level could be done 
 on UNIX but doing so from a mounted file-system (to avoid a reboot)

I was worried that device drivers might not have a facility to
pass through whatever magic commands we need.  But yeah, if the
root partition is on an affected drive that might be a problem.
It would be nice to know if rebooting is safe or not.



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-26 Thread Toni Mueller
Hi,

On Sun, 25.01.2009 at 16:27:14 +, Dieter open...@sopwith.solgatos.com 
wrote:
 I wrote:
  You wrote:
 Is Maxtorman correct about the 320 log entries?
  My dealer told me a similar story, but I don't know where he had it
  from.
 
 I guess the next step is to find out if Maxtorman is correct about this
 320 log entries stuff, and if the SMART log entries as reported by
 smartmontools is the log to worry about, or if there is some other log.

I don't have an account on /., and also feel incapable of actually
working on this problem, but someone who has and can, could probably
try to nag maxtorman about improving smartmontools to the point that
they do the right thing, or try to get him to connect one to somebody
else who can verify the issue and/or provide more technical details.

If he can find a way to almost-anonymously post to /., he might be able
to give some hints to the smartmontools guys, too. Then, we only need
them to integrate everything and make a new release.

Personally, I'd say that it'd be best if Seagate themselves would grab
the opportunity to partially make good on the issue, but I heavily
doubt that they understand, or want to understand, what's it about
with FLOSS.


Kind regards,
--Toni++



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-26 Thread Nenhum_de_Nos
On Sun, January 25, 2009 16:01, Toni Mueller wrote:
 Hi,

 On Fri, 23.01.2009 at 21:28:34 +, Dieter
 open...@sopwith.solgatos.com wrote:
 Recovering from Seagate's problematic 7200.11 firmware.


 first off, several other product lines are affected, too. In
 particular, the popular ES and ES.2 server grade disks are also
 affected, to the best of my knowledge. Seagate only admits to problems
 with ES.2 drives, not ES drives, though.

where you read that from ?

I have a couple of 750GB ES.2 and now I'm worried !

matheus

-- 
We will call you cygnus,
The God of balance you shall be



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-26 Thread Stuart Henderson
On 2009-01-26, Nenhum_de_Nos math...@eternamente.info wrote:
 On Sun, January 25, 2009 16:01, Toni Mueller wrote:
 Hi,

 On Fri, 23.01.2009 at 21:28:34 +, Dieter
 open...@sopwith.solgatos.com wrote:
 Recovering from Seagate's problematic 7200.11 firmware.


 first off, several other product lines are affected, too. In
 particular, the popular ES and ES.2 server grade disks are also
 affected, to the best of my knowledge. Seagate only admits to problems
 with ES.2 drives, not ES drives, though.

 where you read that from ?

 I have a couple of 750GB ES.2 and now I'm worried !

http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931&NewLang=en



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-26 Thread Nenhum_de_Nos
On Mon, January 26, 2009 18:48, Stuart Henderson wrote:
 On 2009-01-26, Nenhum_de_Nos math...@eternamente.info wrote:
 On Sun, January 25, 2009 16:01, Toni Mueller wrote:
 Hi,

 On Fri, 23.01.2009 at 21:28:34 +, Dieter
 open...@sopwith.solgatos.com wrote:
 Recovering from Seagate's problematic 7200.11 firmware.


 first off, several other product lines are affected, too. In
 particular, the popular ES and ES.2 server grade disks are also
 affected, to the best of my knowledge. Seagate only admits to problems
 with ES.2 drives, not ES drives, though.

 where you read that from ?

 I have a couple of 750GB ES.2 and now I'm worried !

 http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931&NewLang=en

thanks, yet OT, but I also heard of new firmwares being worse than old
ones, from seagate first try to fix things. anyone already updated some
ES.2 and all went fine ?

thanks,

matheus

-- 
We will call you cygnus,
The God of balance you shall be



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-26 Thread Dieter
Disk families affected:

Barracuda 7200.11, Barracuda ES.2 (SATA), DiamondMax 22, FreeAgent Desk,
Maxtor OneTouch 4, Pipeline HD, Pipeline HD Pro, SV35.3, SV35.4

Barracuda ES.2 SAS drive is not affected

All drives with a date of manufacture January 12, 2009 and later are
not affected by this issue
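
A rough screening sketch built only from the list above: family name,
interface, and date of manufacture. The function and constant names are
invented for illustration. A True result means only that the drive falls in
the listed population, not that it will actually fail, since the failure also
depends on the firmware and the tester data pattern.

```python
from datetime import date

# Families named in the list above.
AFFECTED_FAMILIES = {
    "Barracuda 7200.11", "Barracuda ES.2", "DiamondMax 22", "FreeAgent Desk",
    "Maxtor OneTouch 4", "Pipeline HD", "Pipeline HD Pro", "SV35.3", "SV35.4",
}

# Drives manufactured on or after this date are stated to be unaffected.
SAFE_FROM = date(2009, 1, 12)


def possibly_affected(family, interface, manufactured):
    """Return True if the drive falls in the listed at-risk population."""
    if family not in AFFECTED_FAMILIES:
        return False
    if family == "Barracuda ES.2" and interface.upper() == "SAS":
        return False  # the SAS variant of the ES.2 is listed as not affected
    return manufactured < SAFE_FROM
```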

This condition was introduced by a firmware issue that sets the drive event
log to an invalid location causing the drive to become inaccessible.

The firmware issue is that the end boundary of the event log circular
buffer (320) was set incorrectly. During Event Log initialization,
the boundary condition that defines the end of the Event Log is off
by one. During power up, if the Event Log counter is at entry 320,
or a multiple of (320 + x*256), and if a particular data pattern
(dependent on the type of tester used during the drive manufacturing
test process) had been present in the reserved-area system tracks
when the drive's reserved-area file system was created during
manufacturing, firmware will increment the Event Log pointer past
the end of the event log data structure. This error is detected and
results in an Assert Failure, which causes the drive to hang as a
failsafe measure. When the drive enters failsafe further updates to
the counter become impossible and the condition will remain through
subsequent power cycles. The problem only arises if a power cycle
initialization occurs when the Event Log is at 320 or some multiple
of 256 thereafter.
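
One reading of the condition above, written out as a sketch: the counter is
risky at power-up when it equals 320 + x*256 for some x >= 0. Whether values
below 320 can ever be risky after the circular log wraps is not stated, so
this treats them as safe. The function name is hypothetical, and the second
condition (the tester data pattern in the reserved area) is not modelled.

```python
def is_risky_count(count, base=320, period=256):
    """True if count equals base + x*period for some integer x >= 0."""
    return count >= base and (count - base) % period == 0
```

Under this reading 320, 576 and 832 are risky, while 321 or 64 are not.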



Seagate says only on power up, but I'm pretty sure I have seen stories
of rebooting causing bricking.  Might be unrelated, but to play it safe
I will continue to avoid reboots.

So, we have confirmation of the number 320, and a formula for event counts
past 320.  We still need to find out if this Event Log counter is the
error count reported by smartmontools, or some other counter.

Ideally, I would like to find out how to read this reserved-area system
track, and how to set it to a safe value (I have seen zero, but this is
not confirmed).  If we can do this we don't need to update the firmware.

And we still want to find out how to update the firmware from Unix.



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-26 Thread Dieter
Toni writes:

Is Maxtorman correct about the 320 log entries?
   My dealer told me a similar story, but I don't know where he had it
   from.
  
  I guess the next step is to find out if Maxtorman is correct about this
  320 log entries stuff, and if the SMART log entries as reported by
  smartmontools is the log to worry about, or if there is some other log.
 
 I don't have an account on /., and also feel incapable of actually
 working on this problem, but someone who has and can, could probably
 try to nag maxtorman about improving smartmontools to the point that
 they do the right thing, or try to get him to connect one to somebody
 else who can verify the issue and/or provide more technical details.
 
 If he can find a way to almost-anonymously post to /., he might be able
 to give some hints to the smartmontools guys, too. Then, we only need
 them to integrate everything and make a new release.

It is easy to set up a slashdot account.  Or you can post as anonymous
coward.  He set up the Maxtorman account to post anonymously, and he mentioned
that he has another slashdot account that isn't anonymous.  Problem I
have is I can't find a way to send him a PM (private message).  Most web
forums have a facility for sending other users a PM.  We can post a reply
to the thread, but he would have to read the thread again to see it.
Any slashdot wizards out there have an idea?

Your suggestion of smartmontools is helpful, thank you.

 Personally, I'd say that it'd be best if Seagate themselves would grab
 the opportunity to partially make good on the issue, but I heavily
 doubt that they understand, or want to understand, what's it about
 with FLOSS.

It isn't even just FLOSS.  Any non-x86 machine is out of luck.
Proprietary Unix is out of luck.  Anything embedded is out of luck.
Even Mac is probably out of luck.  And if the reboot to run the
firmware installer bricks the drive(s) even wintel is out of luck.

I don't understand the common corporate policy of keeping everything
secret.  All they are doing is hurting their previously loyal customers.
It didn't used to be this way.

Supposedly there was a broken test machine that didn't zero out some
special area after writing a test pattern.  So only drives that were
tested on that machine are at risk.  If we can find out what area
this is (I assume it isn't in the normal space used for user storage)
and how to zero it (if not already zero) there is no need to update
the firmware.

--
Raimo writes:

 How can I know if I have a suspicious drive?

Good question.  Seagate has some web page that supposedly will tell you,
but of course it is broken and doesn't work with all browsers.

 Google for ST3808110AS gives me Barracuda 7200.9 SATA 80-GB Hard Drive,
 so I guess this one is not suspicious, but I have more disks,
 in other servers. What if i find a 7200.10, 7200.11, ES or ES.2,
 is that enough for me to suspect it?

I haven't read anything about problems with 7200.10 or earlier.
Toni reports that ES and ES.2 may be affected.

--
Glenn writes:

 Just a hypothetical situation, since we do not have the sourcecode of
 the firmware: isn't it possible some kind of mathematical operation
 is occurring on the number of log entries causing some kind of infinite
 loop to occur or a division that leads to/by 0 that the software/hardware
 is unable to handle? That could mean this problem could also manifest
 itself on for example multiples of 320, so just putting the counter on
 321 may just be delaying the inevitable. And what happens if the counter
 overflows and reaches 320 again?

From what I've read it sounds like the counter must be exactly 320 AND some
location must have a test pattern rather than zero when you init (power up
or reboot) the drive.  From Maxtorman's description, the log is circular,
so it will eventually wrap around to 320 again.

So keeping the counter away from 320 is an okay short term workaround,
but long term we want to either zero out the magic location or update the
firmware.

--
matheus writes:

 but I also heard of new firmwares being worse than old
 ones, from seagate first try to fix things.

What I read is that the firmware itself was ok but the installer
program would brick a previously working drive.  But it didn't
brick it as badly as the firmware bug, you can still update the
firmware again once you get a proper update program.

===

There is supposed to be some document that explains all this,
with enough details to create a fix.  If anyone finds this
document I need a copy please.

If you have one or more of the suspect drives, if it is running,
try to keep it running and don't reboot.  If it is powered down
leave it powered down if possible until this all gets sorted out.



Re: OT: Hard Disk Problems (was: Re: Dealing with Seagate's problematic 7200.11 firmware.)

2009-01-25 Thread Dieter
  Recovering from Seagate's problematic 7200.11 firmware.
 
 
 first off, several other product lines are affected, too. In
 particular, the popular ES and ES.2 server grade disks are also
 affected, to the best of my knowledge. Seagate only admits to problems
 with ES.2 drives, not ES drives, though.

Word is the Maxtor Diamond Max 21 line is also affected.

 We need to do this from within a running system.

Yes.

 My first idea is that smartmontools probably provide much of the
 required framework already, and could possibly be extended to work with
 this situation, too.

Thanks.  I downloaded smartmontools, fixed a couple of ILP vs LP64 bugs,
and it appears to provide the number of SMART log entries.

  Is Maxtorman correct about the 320 log entries?
 
 My dealer told me a similar story, but I don't know where he had it
 from.

I guess the next step is to find out if Maxtorman is correct about this
320 log entries stuff, and if the SMART log entries as reported by
smartmontools is the log to worry about, or if there is some other log.
E.g. see the Read Log Ext and Write Log Extended commands I posted
yesterday.  I don't know if these use the same log as the SMART commands
or if this is something different.