Re: off topic - disk crash

2004-03-14 Thread Christoph P. Kukulies
On Fri, Mar 12, 2004 at 03:58:16PM +0100, Dag-Erling Smørgrav wrote:
 Clifton Royston [EMAIL PROTECTED] writes:
   Today an important (no backup of course) 46 GB IBM Deskstar
   IDE disk crashed.
  This specific line of drives is infamous for a failure rate that's at
  least a full order of magnitude above the industry average for ATA
  drives.  Google a bit for it.
 
 Not the entire DeskStar line, just the 75GXP series.  I still have
 several 16Gs and at least one 60GXP that have never given me any
 trouble, and they were fast and silent for their time, head and
 shoulders ahead of the competition.  These days I mostly buy WD...
 
   The disk boots into FreeBSD, but already at power-on the disk makes
   seek-retry or recalibration noises.
 
 Also known as the click of death...


Thanks for all the helpful tips so far. It is a DTLA-307045 (3.5-inch).
I don't know whether this is a 75GXP.

I'm getting either these:

ad2: TIMEOUT - READ_DMA retrying (2 retries left)  LBA=30583

Which don't stop the dd process.

And these,

ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=9156

leading to termination.

Also, the transfer rate is terribly slow (80 KB/s).

So far I have been able to save only 18 MB of the 46 GB.

Any other suggestions? 

Could I increase the retry count, or force dd to continue even on hard
errors, so that with a bit of luck I could later find the filesystem in
the dump and restore at least some of the files?


--
Chris Christoph P. U. Kukulies kuku_at_physik.rwth-aachen.de
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: off topic - disk crash

2004-03-14 Thread Christoph P. Kukulies
On Sun, Mar 14, 2004 at 12:25:02PM +0100, Søren Schmidt wrote:
 Christoph P. Kukulies wrote:
 
 Thanks for all the helpful tips so far. It is a DTLA-307045 (3.5-inch).
 I don't know whether this is a 75GXP.
 
 It is one of the dreaded models; experience shows that all models after
 this one have some kind of problem. No wonder they sold out :)
 
 I'm getting either these:
 
 ad2: TIMEOUT - READ_DMA retrying (2 retries left)  LBA=30583
 
 Which don't stop the dd process.
 
 And these,
 
 ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=9156
 
 leading to termination.
 
 Also, the transfer rate is terribly slow (80 KB/s).
 
 So far I have been able to save only 18 MB of the 46 GB.
 
 Any other suggestions? 
 
 Use the noerror and sync flags to dd; that will get past the errors and put
 in zero-filled sectors for those you can't read. However, it will take a
 looong time and will probably tear off the sorry remains of the magnetic
 coating on your platters :(

It is now dumping and has reached 2.7 GB. There have been no more errors
since the last one at LBA=67. Are these LBAs identical to the block numbers?

Maybe I'll give it another try (when this pass is through) and dump
from the beginning.

I'm about to get a second drive of the identical model; then maybe I can dd
the whole image, including the partition table, so that I won't have to
scan the disk for the start of the filesystems.

Some time ago I wrote a little program to scan a disk for the start of 
a FS. Unfortunately that program is also on the crashed disk :-O
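The lost scanning program is not shown. As a rough sketch only, assuming FreeBSD UFS written on a little-endian (i386) machine: per <ufs/ffs/fs.h>, the primary superblock sits 8192 bytes (SBLOCK_UFS1) or 65536 bytes (SBLOCK_UFS2) into the filesystem and carries FS_UFS1_MAGIC (0x011954) or FS_UFS2_MAGIC (0x19540119) in its fs_magic field. The byte offset 1372 used for fs_magic is an assumption to check against the struct fs of the system that wrote the disk. The hypothetical find_ufs_starts() checks every 512-byte sector of an in-memory image and reports where a filesystem would begin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Constants from FreeBSD's <ufs/ffs/fs.h>; kMagicOffset is an assumption.
constexpr std::size_t kSblockUfs1 = 8192;
constexpr std::size_t kSblockUfs2 = 65536;
constexpr std::size_t kMagicOffset = 1372;  // assumed offset of fs_magic in struct fs
constexpr std::uint32_t kUfs1Magic = 0x011954;
constexpr std::uint32_t kUfs2Magic = 0x19540119;
constexpr std::size_t kSector = 512;

struct FsCandidate {
    std::size_t start;  // byte offset where the filesystem would begin
    int ufs_version;    // 1 or 2
};

// Reads a little-endian 32-bit value (i386 byte order).
static std::uint32_t le32(const std::vector<unsigned char>& img, std::size_t off) {
    assert(off + 4 <= img.size());
    return static_cast<std::uint32_t>(img[off]) |
           static_cast<std::uint32_t>(img[off + 1]) << 8 |
           static_cast<std::uint32_t>(img[off + 2]) << 16 |
           static_cast<std::uint32_t>(img[off + 3]) << 24;
}

// Hypothetical helper: treats every sector as a possible superblock and
// turns each magic hit into the start offset of the enclosing filesystem.
std::vector<FsCandidate> find_ufs_starts(const std::vector<unsigned char>& img) {
    std::vector<FsCandidate> found;
    for (std::size_t sb = 0; sb + kMagicOffset + 4 <= img.size(); sb += kSector) {
        std::uint32_t magic = le32(img, sb + kMagicOffset);
        if (magic == kUfs1Magic && sb >= kSblockUfs1) {
            found.push_back({sb - kSblockUfs1, 1});
        } else if (magic == kUfs2Magic && sb >= kSblockUfs2) {
            found.push_back({sb - kSblockUfs2, 2});
        }
    }
    return found;
}
```

Backup superblocks in every cylinder group carry the same magic, so a real disk yields extra candidates; the lowest hit in each cluster is usually the one worth trying first.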

--
Chris Christoph P. U. Kukulies kuku_at_physik.rwth-aachen.de


Re: off topic - disk crash

2004-03-14 Thread Søren Schmidt
Christoph P. Kukulies wrote:

 Thanks for all the helpful tips so far. It is a DTLA-307045 (3.5-inch).
 I don't know whether this is a 75GXP.

It is one of the dreaded models; experience shows that all models after
this one have some kind of problem. No wonder they sold out :)

 I'm getting either these:

 ad2: TIMEOUT - READ_DMA retrying (2 retries left)  LBA=30583

 Which don't stop the dd process.

 And these,

 ad2: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=9156

 leading to termination.

 Also, the transfer rate is terribly slow (80 KB/s).

 So far I have been able to save only 18 MB of the 46 GB.

 Any other suggestions?

Use the noerror and sync flags to dd; that will get past the errors and put
in zero-filled sectors for those you can't read. However, it will take a
looong time and will probably tear off the sorry remains of the magnetic
coating on your platters :(
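On the command line this is dd's conv=noerror,sync, e.g. `dd if=/dev/ad2 of=disk.img bs=512 conv=noerror,sync` (disk.img is a placeholder). When a read fails, sync pads the whole input block with NULs, so a small bs such as 512 loses only the bad sector instead of a large block around it. The C++ below is a minimal sketch of that behaviour, not dd's code; BlockReader and copy_noerror_sync() are hypothetical names, and the reader callback stands in for read(2) on the raw device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for reading one block from the raw device.
// On success it must fill all of buf and return true; on an
// uncorrectable read error it returns false.
using BlockReader =
    std::function<bool(std::uint64_t block, std::vector<char>& buf)>;

// Copies nblocks blocks of block_size bytes into out, in the spirit of
// dd conv=noerror,sync: an unreadable block is replaced by NUL bytes and
// the copy carries on, so every later block keeps its original offset.
// Returns the number of blocks that could not be read.
std::uint64_t copy_noerror_sync(std::uint64_t nblocks, std::size_t block_size,
                                const BlockReader& read_block,
                                std::vector<char>& out) {
    assert(block_size > 0);
    std::vector<char> buf(block_size);
    std::uint64_t bad = 0;
    out.clear();
    for (std::uint64_t b = 0; b < nblocks; ++b) {
        if (!read_block(b, buf)) {
            std::fill(buf.begin(), buf.end(), '\0');  // "sync": pad with NULs
            ++bad;                                    // "noerror": keep going
        }
        out.insert(out.end(), buf.begin(), buf.end());
    }
    return bad;
}
```

Keeping offsets intact is what makes the padded image useful: filesystem structures still sit where fsck or a superblock scan expects them.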

-Søren


Re: off topic - disk crash

2004-03-14 Thread Søren Schmidt
Christoph P. Kukulies wrote:

 It is now dumping and has reached 2.7 GB. There have been no more errors
 since the last one at LBA=67. Are these LBAs identical to the block numbers?
Yes.
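The "yes" holds when dd reads the whole disk (/dev/ad2, not a slice such as /dev/ad2s1) with bs=512: the LBA in the driver messages is the 512-byte sector number counted from the start of the disk, so it equals dd's block number. With another block size you have to convert. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// ATA drives of this vintage use 512-byte sectors; one LBA = one sector.
constexpr std::uint64_t kSectorSize = 512;

// Byte offset of an LBA from the start of the whole-disk device (e.g. /dev/ad2).
constexpr std::uint64_t lba_to_offset(std::uint64_t lba) {
    return lba * kSectorSize;
}

// Index of the dd input block (bs bytes each) holding that LBA, assuming
// dd reads the whole-disk device from offset 0 with no skip= option.
std::uint64_t lba_to_dd_block(std::uint64_t lba, std::uint64_t bs) {
    assert(bs > 0);
    return lba_to_offset(lba) / bs;
}
```

For example, the uncorrectable sector at LBA=9156 sits at byte offset 4,687,872, which is block 71 of a dd run with bs=64k.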


 Maybe I'll give it another try (when this pass is through) and dump
 from the beginning.
 I'm about to get a second drive of the identical model; then maybe I can dd
 the whole image, including the partition table, so that I won't have to
 scan the disk for the start of the filesystems.

Don't get another DTLA/AVER IBM disk; you will just have the same problem
again sometime in the future. Stay away from IBM/Hitachi disks that are
based on these models (I don't know much about the newer disks from
Hitachi, and frankly I won't waste my money on them to find out).

--
-Søren


Re: off topic - disk crash

2004-03-14 Thread Christoph P. Kukulies
On Sun, Mar 14, 2004 at 01:42:18PM +0100, Søren Schmidt wrote:
 Christoph P. Kukulies wrote:
 
 the whole image including partition table so that I will not have to
 scan the disk for the start of the filesystems.
 
 Don't get another DTLA/AVER IBM disk; you will just have the same problem
 again sometime in the future. Stay away from IBM/Hitachi disks that are
 based on these models (I don't know much about the newer disks from
 Hitachi, and frankly I won't waste my money on them to find out).

Yes, I have abandoned that idea now, since things are turning out a bit better.

I have set up a recovery system on a new, large disk running FreeBSD 5.2.1
and hooked up the troubled disk as ad2.

I can mount -rf /dev/ad2s1g /mnt and find the old FS with all its
entries.

I have already copied over some very important files, and it seems it will
not be as catastrophic as I initially thought.

With certain directories or files I get READ_DMA timeouts, and the system
also hangs completely when a certain type of error occurs.

ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=24703729
ad2: WARNING - READ_DMA Interrupt was seen but timeout fired LBA=24703729
ad2: WARNING - READ_DMA Interrupt was seen but taskqueue stalled LBA=24703729
ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=9825063

What I find strange is that the failing drive on the secondary IDE channel
causes the primary channel also to fail.

I wonder if this has to happen or could be avoided. I can only reboot from
that point on.

For recovering data this is additionally painful, and it would be nice if I
could get this fixed somehow.

Another question is whether the read errors occur on the actual data or
only during the fstat or directory reads. Is it possible to mount a
filesystem using an alternate superblock, or do I have to run fsck
(writing back to the disk and risking making things worse)?

--
Chris Christoph P. U. Kukulies kuku_at_physik.rwth-aachen.de


Re: off topic - disk crash

2004-03-14 Thread soralx
 With certain directories or files I get READ_DMA timeouts and also the
 system hangs totally when a certain type of error occurs.

 ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=24703729
 ad2: WARNING - READ_DMA Interrupt was seen but timeout fired LBA=24703729
 ad2: WARNING - READ_DMA Interrupt was seen but taskqueue stalled LBA=24703729
 ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=9825063

 What I find strange is that the failing drive on the secondary IDE channel
 causes the primary channel also to fail.

 I wonder if this has to happen or could be avoided. I can only reboot from
 that point on.

I used a straightforward approach: copy files with midc, note which file the
system freezes on, reboot, and skip those files. Eventually I recovered
everything important. BTW, one of the few files that could not be read was
the Apache log: another reason to keep huge logs on separate drives (or
slices, at least). :)
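The "note which file froze the system" step can be automated. A minimal sketch under assumptions of our own (not from the thread): the copy loop appends a line to a journal on a healthy disk, and flushes it to disk, before and after each file, using a hypothetical "START path" / "DONE path" format. After a hang and reboot, every path with a START but no DONE is the one to skip; the hypothetical files_to_skip() computes that set.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical journal format, one entry per line, appended and flushed
// to a *different* (healthy) disk around each copy:
//   START <path>   just before copying <path>
//   DONE <path>    once the copy has finished
// Paths that were started but never finished are the ones that hung.
std::set<std::string> files_to_skip(const std::string& journal) {
    std::set<std::string> unfinished;
    std::istringstream in(journal);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("START ", 0) == 0) {         // line starts with "START "
            unfinished.insert(line.substr(6));
        } else if (line.rfind("DONE ", 0) == 0) {   // line starts with "DONE "
            unfinished.erase(line.substr(5));
        }
    }
    return unfinished;
}
```

Because the journal is only appended to, it can survive several hang-and-reboot cycles and accumulate every file to avoid.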

Timestamp: 0x4054BCAD
[SorAlx]  http://cydem.org.ua/


Mozilla sucking file descriptors

2004-03-14 Thread Matthew D. Fuller
Has anybody else seen Mozilla just start munching file descriptors the
longer it runs?  I've seen it with at least Phoen^WFirebird 0.6 and
the current Firebi^WFirefox.  It just keeps going 'till it maxes out
the system.  fstat(1) doesn't show much directly, but with -v it spits
a crapload of errors:

(ttyp4):{173}% fstat -v | grep -E 'unknown file type 5 for file [0-9]+ of pid 4697' | wc -l
3472

(that being, of course, my current firefox PID)
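For watching the leak grow over time, the same count can be done in a small program fed with captured fstat -v output. This is a sketch with a hypothetical function name; it matches the message text shown in the grep pattern above and adds a word boundary so that pid 4697 does not also match pid 46970.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Counts diagnostic lines reporting file type `ftype` as unknown for
// process `pid`: the equivalent of the grep -E ... | wc -l pipeline.
// The trailing \b stops pid 4697 from also matching pid 46970.
std::size_t count_unknown_type(std::istream& in, int ftype, int pid) {
    const std::regex re("unknown file type " + std::to_string(ftype) +
                        " for file [0-9]+ of pid " + std::to_string(pid) + "\\b");
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, re)) {
            ++count;
        }
    }
    return count;
}

// Convenience overload for output already captured in a string.
std::size_t count_unknown_type(const std::string& text, int ftype, int pid) {
    std::istringstream in(text);
    return count_unknown_type(in, ftype, pid);
}
```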

File type 5 is a kqueue (according to sys/file.h).  Why is Mozilla
eating an ever-increasing number of kqueue handles?  Is this our
problem or theirs?  Or is this something fixed since I last updated
(I'm on 5.1-RELEASE now)?

(In other news, thank heavens you can tweak kern.maxfiles on the fly!)


-- 
Matthew Fuller (MF4839)   |  [EMAIL PROTECTED]
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/

The only reason I'm burning my candle at both ends, is because I
  haven't figured out how to light the middle yet


GCC include files conundrum.

2004-03-14 Thread David Gilbert
I attempted to argue that audio/tclmidi wasn't broken... and the ports
maintainer fired back with

http://bento.freebsd.org/errorlogs/i386-5-latest/tclmidi-3.1.log

Now... I started investigating this and found that this was all due to
some differences in C++ over the years.

The error on bento comes down to bento not having strstream.h.  I have
that file as:

/usr/include/c++/3.3/backward/strstream.h
/usr/include/g++/backward/strstream.h

on my -CURRENT (as of a week or two ago) laptop.

bento does appear to have /usr/include/c++/3.3/backward/iostream.h
... but not strstream.h.  Why?

I realize that my source upgrading may have left around a few old
files, but I don't see a replacement strstream.h.

The C++ FAQ referred to by iostream (not iostream.h) seems to imply
that you should use iostream and sstream (no .h)... but including
those files imposes a very different standard that this port is not
ready to accept.  It appears that (among other things that I haven't
found yet) all 'istream' must be written 'std::istream' ... etc.

So what's the solution?

Dave.

-- 

|David Gilbert, Independent Contractor.   | Two things can only be |
|Mail:   [EMAIL PROTECTED]|  equal if and only if they |
|http://daveg.ca  |   are precisely opposite.  |
=GLO


Re: GCC include files conundrum.

2004-03-14 Thread Mike Silbersack

On Sun, 14 Mar 2004, David Gilbert wrote:

 The C++ FAQ referred to by iostream (not iostream.h) seems to imply
 that you should use iostream and sstream (no .h)... but including
 those files imposes a very different standard that this port is not
 ready to accept.  It appears that (among other things that I haven't
 found yet) all 'istream' must be written 'std::istream' ... etc.

 So what's the solution?

 Dave.

#include <blahblahblah>
using namespace std;

or something similar should restore the behavior the application is
expecting.

(Apparently including namespace std is evil, and this is why the FAQs
aren't helpful in telling you this.)
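A narrower alternative to the blanket directive (our suggestion, not something proposed in the thread) is a set of using-declarations that import only the names the legacy code actually uses. In this sketch, sum_ints() is a hypothetical stand-in for old code written without std:: qualifiers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Pull in only what the pre-standard code needs, instead of all of std.
using std::istringstream;
using std::string;

// Old-style code using unqualified names compiles unchanged.
int sum_ints(const string& text) {
    istringstream in(text);
    int total = 0;
    int x = 0;
    while (in >> x) {
        total += x;
    }
    return total;
}
```

Placing these declarations in a header the port already includes everywhere can keep the patch small while avoiding name clashes that `using namespace std;` can cause.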

Mike Silby Silbersack


Re: GCC include files conundrum.

2004-03-14 Thread Craig Rodrigues
On Sun, Mar 14, 2004 at 07:55:18PM -0500, David Gilbert wrote:
 I attempted to argue that audio/tclmidi wasn't broken... and the ports
 maintainer fired back with
 
 http://bento.freebsd.org/errorlogs/i386-5-latest/tclmidi-3.1.log
 
 Now... I started investigating this and found that this was all due to
 some differences in C++ over the years.
 
 So what's the solution?

Pick up a contemporary C++ book and learn about Standard C++ (which became
an ISO standard in 1998).  strstream is deprecated in Appendix D
of the standard.  I recommend a book such as The C++ Programming
Language, 3rd ed. by Bjarne Stroustrup.

gcc 3.x supports Standard C++ more aggressively than earlier
gcc versions, which can be painful.  The GCC developers (more specifically
libstdc++ developers) are more interested in supporting Standard
C++, and are not too interested in maintaining backwards compatibility
with deprecated headers such as strstream.h.  This is a bit of a problem
for software that depends on these older libraries.

You have a few options:

(1)  Learn enough C++ so that you can apply the necessary patches
 to fix audio/tclmidi so that it compiles with
 Standard C++ headers (such as sstream).

(2)  gcc 3.3 has /usr/include/c++/3.3/backward/strstream, so you may
 want to try #include <backward/strstream> and see if that works,
 but chances are if it doesn't work, you will be out of luck,
 since it is a deprecated header that the GCC developers
 are not too interested in supporting.

(3)  In the Makefile for the audio/tclmidi port, mark it as broken
 on FreeBSD 5.x:

# OSVERSION is only set after .include <bsd.port.pre.mk>
.if ${OSVERSION} >= 500000
BROKEN= Does not build on 5.x
.endif
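For option (1), a common mechanical change is replacing the char*-based ostrstream with std::ostringstream from <sstream>. The old pattern, kept in comments below, needed an explicit `ends` terminator and a `freeze(0)` so the stream would free its buffer; std::ostringstream owns its storage and returns a std::string. track_name() is a hypothetical example, not code taken from tclmidi.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Pre-standard version:
//   #include <strstream.h>
//   ostrstream os;
//   os << "track " << n << ends;   // explicit NUL terminator
//   char *s = os.str();            // freezes the buffer; caller owns it...
//   os.freeze(0);                  // ...until it is unfrozen again
// Standard C++ version:
std::string track_name(int n) {
    std::ostringstream os;
    os << "track " << n;   // no `ends` needed
    return os.str();       // returns a std::string copy
}
```

Reading code that used istrstream usually maps onto std::istringstream in the same way.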


-- 
Craig Rodrigues
http://crodrigues.org
[EMAIL PROTECTED]


Re: GCC include files conundrum.

2004-03-14 Thread David Gilbert
 Craig == Craig Rodrigues [EMAIL PROTECTED] writes:

Craig You have a few options:

Craig (1) Learn enough C++ so that you can apply the necessary
Craig patches to fix audio/tclmidi so that it compiles with Standard
Craig C++ headers (such as sstream).

Craig (2) gcc 3.3 has /usr/include/c++/3.3/backward/strstream, so you
Craig may want to try #include <backward/strstream> and see if that
Craig works, but chances are if it doesn't work, you will be out of
Craig luck, since it is a deprecated header that the GCC developers
Craig are not too interested in supporting.

I'll ignore the condescending tone for a moment.  It's worth noting
that everything works by simply having a copy of strstream.h in the
backward directory.  Maybe the right path to take here is to include
that file much as we include old versions of shared libraries.

Dave.

-- 

|David Gilbert, Independent Contractor.   | Two things can only be |
|Mail:   [EMAIL PROTECTED]|  equal if and only if they |
|http://daveg.ca  |   are precisely opposite.  |
=GLO


Re: Mozilla sucking file descriptors

2004-03-14 Thread Kris Kennaway
On Sun, Mar 14, 2004 at 07:32:15PM -0600, Matthew D. Fuller wrote:
 Has anybody else seen Mozilla just start munching file descriptors the
 longer it runs?  I've seen it with at least Phoen^WFirebird 0.6 and
 the current Firebi^WFirefox.  It just keeps going 'till it maxes out
 the system.  fstat(1) doesn't show much directly, but with -v it spits
 a crapload of errors:
 
 (ttyp4):{173}% fstat -v | grep -E 'unknown file type 5 for file [0-9]+ of pid 4697' | wc -l
 3472
 
 (that being, of course, my current firefox PID)
 
 File type 5 is a kqueue (according to sys/file.h).  Why is Mozilla
 eating an ever-increasing number of kqueue handles?  Is this our
 problem or theirs?  Or is this something fixed since I last updated
 (I'm on 5.1-RELEASE now)?
 
 (In other news, thank heavens you can tweak kern.maxfiles on the fly!)

This sounds like a DNS resolver bug that was fixed some time ago.

Kris




Re: a serious error in sched_ule.c?

2004-03-14 Thread Wes Peters
On Tue, 09 Mar 2004 21:29:54 +0100 [EMAIL PROTECTED] (Dag-Erling Smørgrav) alleged:

 Wes Peters [EMAIL PROTECTED] writes:
  One of the classic trade-offs in making a 'server' vs. 'workstation'
  operating system.  Workstations require a strong preference for
  interactive over background tasks so the interactive tasks will
  remain responsive, especially in terms of heavily event-driven tasks
  like graphical UIs.  For a true server, where interactive tasks are
  not the norm, this preference may be counter-productive.
 
 Umm, remember that "interactive" here means "performs I/O", even if
 that I/O is a database lookup or a TCP connection.

Sigh.  Nobody really does compute-bound tasks anymore, do they?  I really
miss scientific programming.
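The point about "interactive" meaning "performs I/O" can be seen with a toy score: a server thread blocked on database or network I/O sleeps far more than it runs, so it lands on the interactive side just like an editor would. The 0 to 100 scale, the formula, and the threshold of 30 below are assumptions for illustration, not the code in sched_ule.c.

```cpp
#include <cassert>
#include <cstdint>

// Toy interactivity score, 0..100: low = mostly sleeping (waiting on I/O),
// high = mostly running on the CPU. Illustration only.
int toy_interact_score(std::uint64_t sleep_ticks, std::uint64_t run_ticks) {
    if (sleep_ticks > run_ticks) {
        return static_cast<int>(50 * run_ticks / sleep_ticks);        // 0..49
    }
    if (run_ticks > sleep_ticks) {
        return 100 - static_cast<int>(50 * sleep_ticks / run_ticks);  // 51..100
    }
    return 50;  // equal sleep and run time (including no history at all)
}

// Hypothetical cutoff: a thread scoring below this is treated as interactive.
constexpr int kInteractThreshold = 30;

bool is_interactive(std::uint64_t sleep_ticks, std::uint64_t run_ticks) {
    return toy_interact_score(sleep_ticks, run_ticks) < kInteractThreshold;
}
```

A compute-bound job such as a long scientific run scores near 100 and is never favoured, which is exactly the server-versus-workstation trade-off under discussion.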

-- 

Where am I, and what am I doing in this handbasket?

Wes Peters   [EMAIL PROTECTED]


Re: a serious error in sched_ule.c?

2004-03-14 Thread Dag-Erling Smørgrav
Wes Peters [EMAIL PROTECTED] writes:
 Sigh.  Nobody really does compute-bound tasks anymore, do they?  I really
 miss scientific programming.

Actually, my wife is a molecular biologist and eats CPU hours with
milk and sugar for breakfast.  She expressed her satisfaction
yesterday at finding out that her latest program only takes four and a
half hours per data set.  "But honey," says I, "you have 30,000 data
sets!"  Quoth the love of my life, "That's OK, we've got *two*
computers."

DES
-- 
Dag-Erling Smørgrav - [EMAIL PROTECTED]


Re: a serious error in sched_ule.c?

2004-03-14 Thread Colin Percival
At 07:32 15/03/2004, Dag-Erling Smørgrav wrote:
Actually, my wife is a molecular biologist and eats CPU hours with
milk and sugar for breakfast.  She expressed her satisfaction
yesterday at finding out that her latest program only takes four and a
half hours per data set.  "But honey," says I, "you have 30,000 data
sets!"  Quoth the love of my life, "That's OK, we've got *two*
computers."
... and 8 years to waste, apparently.
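The quip checks out: 30,000 data sets at 4.5 hours each is 135,000 CPU-hours, or 67,500 wall-clock hours split across two machines, which is about 7.7 years. The back-of-the-envelope version, assuming perfect parallelism and both machines running around the clock:

```cpp
#include <cassert>

constexpr double kHoursPerDataSet = 4.5;
constexpr double kDataSets = 30000;
constexpr double kComputers = 2;

// Total CPU-hours, split evenly across machines, converted to years.
constexpr double wall_clock_years() {
    return kHoursPerDataSet * kDataSets / kComputers / 24.0 / 365.25;
}
```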

Colin Percival



Re: a serious error in sched_ule.c?

2004-03-14 Thread Wes Peters
On Mon, 15 Mar 2004 07:42:59 + Colin Percival
[EMAIL PROTECTED] alleged:

 At 07:32 15/03/2004, Dag-Erling Smørgrav wrote:
 Actually, my wife is a molecular biologist and eats CPU hours with
 milk and sugar for breakfast.  She expressed her satisfaction
 yesterday at finding out that her latest program only takes four and a
 half hours per data set.  "But honey," says I, "you have 30,000 data
 sets!"  Quoth the love of my life, "That's OK, we've got *two*
 computers."
 
 ... and 8 years to waste, apparently.

Wowsers.  Sounds like they need a cluster.  Introduce her to Dillon!  ;^)

-- 

Where am I, and what am I doing in this handbasket?

Wes Peters   [EMAIL PROTECTED]