Re: disk problems: which ATA?

2011-07-04 Thread Camaleón
On Sun, 03 Jul 2011 14:35:08 -0700, Ross Boylan wrote:

> On Sun, 2011-07-03 at 19:19 +, Camaleón wrote:
>> On Sun, 03 Jul 2011 11:25:18 -0700, Ross Boylan wrote:
>> 
>> > How can I tell which ata device is which hard drive?  It's come up
>> > several times for me, most recently with ata2.00: exception Emask 0x0
>> > SAct 0x0 SErr 0x0 action 0x6 frozen
>> 
>> (...)
>> 
>> You can:
>> 
>> - Run "smartctl -i /dev/sdb | grep -i model"

> Except the drive isn't responding to smartctl (see original message). I
> tried adding -T permissive, but all that gets me is Short INQUIRY
> response, skip product id (curiously, no error about command failed).

Yes, I already noticed, but does "-i" also fail? :-? It just gathers basic HDD
info, mmm... that does not sound very good.

Anyway, you can test with another hard disk utility such as hdparm or sdparm,
unless they are also failing.

>> - Then "dmesg | grep -i ata2"
>> - To finally compare by hdd model :-)

> That works for my current machine.  But on another machine I want to
> figure out which drive an error message goes with, and there are 2
> identical drives.  I suppose that even if I knew which sd device the ata
> went with, I still wouldn't be sure which physical drive that was...

Try with another tool?

>> As for the error itself, you can use the manufacturer's hard disk
>> diagnostic tools, which usually run from a LiveCD and will provide
>> accurate results about your HDD's health and status.
>> 
>> OTOH, I've also seen that kind of error coming from a bad SATA cable or a
>> badly seated connection to the motherboard/disk. You may also check this.

> I think I already tried reseating, but I suppose it's worth trying
> again.  I'm concerned if I power down I may not be able to get back up,
> since the failing hard disk is actually part of an LVM volume group.  I
> am also unable to get information on that VG right now.

What I would do is:

1/ Make a full copy of both hard disks onto another machine, just in
case.

2/ Download the manufacturer's disk utility and run a full scan of both disks.

> Most of the logical volumes in the group are backed by other hard drives,
> but I'm not quite sure what will happen if the disk is toast.  At the
> moment, I have access to most of the LVs, even though I can't get info
> on the PV that contains them (!).

I've never used LVM, so I don't know what would happen in such cases. I've
heard that it is advisable to combine LVM with RAID to avoid data loss, as
LVM does not protect against this situation unless it has been set up to
mirror the data (RAID 1).

> P.S. For the record, kernel logs need to be read carefully to figure out
> which drive is ata2.  

(...)

I think the important log entry was the first you provided that pointed 
to ata2.

Greetings,

-- 
Camaleón


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/pan.2011.07.04.13.13...@gmail.com



Re: message formatting and pseudo XML [was Re: disk problems: which ATA?]

2011-07-03 Thread Lisi
On Sunday 03 July 2011 23:56:30 you wrote:
> On Sun, 2011-07-03 at 23:40 +0100, Lisi wrote:
> > On Sunday 03 July 2011 22:35:08 Ross Boylan wrote:
> > > (see original message)
> >
> > Not a very productive instruction to those of us who have HTML display
> > turned off.
> >
> > Lisi
>
> That's interesting.  The messages are not in html.  However, I did use a
> little pseudo-XML (with angle brackets--I'm deliberately not repeating
> it here) to distinguish material I copied from the terminal.
>
> I haven't noticed problems in evolution, mutt, or thunderbird.
>
> Any opinions from anyone on whether pseudo-XML is inadvisable?
>
> Lisi, could this be a problem with your particular mail client?

Not that I know of - but that doesn't mean that it isn't so.  Does anyone 
know?   I am using Kmail 1.9.9. 

I shall have to look into it: Google, ask on another list, etc.  I'll get
back to you, I hope.  I have a memory that resembles a sieve more than
anything else. :-(

Lisi



-- 
Archive: http://lists.debian.org/201107040655.02725.lisi.re...@gmail.com



Re: message formatting and pseudo XML [was Re: disk problems: which ATA?]

2011-07-03 Thread David Jardine
On Sun, Jul 03, 2011 at 03:56:30PM -0700, Ross Boylan wrote:
> On Sun, 2011-07-03 at 23:40 +0100, Lisi wrote:
> > On Sunday 03 July 2011 22:35:08 Ross Boylan wrote:
> > > (see original message)
> > 
> > Not a very productive instruction to those of us who have HTML display turned off.
> > 
> > Lisi
> > 
> That's interesting.  The messages are not in html.  However, I did use a
> little pseudo-XML (with angle brackets--I'm deliberately not repeating
> it here) to distinguish material I copied from the terminal.
> 
> I haven't noticed problems in evolution, mutt, or thunderbird.
> 
> Any opinions from anyone on whether pseudo-XML is inadvisable?
> 

I don't even know what pseudo-XML is, but your message was clearly 
displayed for me (with indentations, without raw markup) by mutt with 
a more-or-less default configuration.

David


-- 
Archive: http://lists.debian.org/20110704002627.GA3833@gennes.augarten



Re: disk problems: which ATA?

2011-07-03 Thread Ross Boylan
On Sun, 2011-07-03 at 23:48 +0100, Brian wrote:
> On Sun 03 Jul 2011 at 11:25:18 -0700, Ross Boylan wrote:
> 
> > How can I tell which ata device is which hard drive?  It's come up
> > several times for me, most recently with
> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> 
> For ata1.00 on this machine:
> 
> ls -l /sys/class/scsi_disk/0:0:0:0/device/ | grep block:
Does ata1 always go with scsi 0:0:0:0?
and ata2 with scsi 1:0:0:0?
etc?

The information in the directory you mention gets me from the scsi
location to the symbolic drive (e.g., sda)--which is good to know.  But
I'm not sure how to go from ata -> scsi.
> 
> > I'd also welcome advice about the disk problems, but I was hoping to
> > raise the odds of an answer by keeping it simple :)
> 
> You thought devising an answer to your first question would be easy?
Neither I nor my real sysadmin could figure it out.  I was hoping it
would be easy for someone :)

> I've just spent the best part of two hours on it! 
Thank you very much!

> No time now to sort
> out your disk problem. :)
I have survived my reboot, and all the LVs not backed by the bad disk seem OK.

That startup was interesting: the filesystems backed by the bad disk
were mounted; an ext3 volume replayed the journal and a reiser volume
started an fsck.  The latter bombed partway through and various
hardware-looking errors popped up.  I got dropped into the startup
subshell with only / mounted.  I issued mount -a and proceeded with the
boot.

LVM tools (I tried lvs) still aren't getting anything but errors,
concluding with the message
  Couldn't find all physical volumes for volume group daisy.
  Volume group "daisy" not found
which is pretty odd, since I'm running on volumes from daisy.

Maybe if I pull the hard drive completely LVM will be happier.
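To check which logical volumes still depend on the failing disk, LVM can list the physical devices behind each LV with `lvs -o lv_name,devices`. The helper below is a minimal sketch that filters that report; the function name `lvs_on_pv` is hypothetical, and it assumes the devices column looks like `/dev/sdb1(0),/dev/sdc1(1024)`. When a physical volume is missing, LVM may print it as `unknown device`, and that string can be passed as the PV argument too.

```shell
#!/bin/sh
# lvs_on_pv PV: read `lvs --noheadings -o lv_name,devices` output on stdin
# and print every logical volume with at least one extent on PV.
# Hypothetical helper; assumes entries of the form "device(first_extent)"
# separated by commas.
lvs_on_pv() {
    awk -v pv="$1" '
        {
            lv = $1
            rest = $0
            # drop leading blanks and the LV name; keep the devices column
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", rest)
            n = split(rest, dev, ",")
            for (i = 1; i <= n; i++) {
                d = dev[i]
                sub(/\(.*$/, "", d)      # strip "(first_extent)"
                sub(/[ \t]+$/, "", d)    # strip trailing blanks
                if (d == pv && !(lv in seen)) {
                    seen[lv] = 1
                    print lv
                }
            }
        }'
}
```

Example, as root, with /dev/sdb1 standing in for whatever the PV on the bad disk actually is: `lvs --noheadings -o lv_name,devices daisy | lvs_on_pv /dev/sdb1`. If lvs refuses to report because a PV is missing, its `--partial` option may help; check `man lvs` for the installed version.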

Ross


-- 
Archive: http://lists.debian.org/1309738194.11383.10.ca...@corn.betterworld.us



Re: disk problems: which ATA?

2011-07-03 Thread Pablo Sánchez

Ross, maybe it's already been suggested, but by running smartctl -i on all
drives (/dev/sda, /dev/sdb, ...) you could get their serial numbers. If you
get all but one, open the case and identify the drives you already listed
with smartctl; the remaining one(s) are the problematic ones.

I had to do it some time ago.

Pablo Sánchez







--
Archive: http://lists.debian.org/4e1100a1.1010...@adinet.com.uy



Re: disk problems: which ATA?

2011-07-03 Thread Brian
On Sun 03 Jul 2011 at 11:25:18 -0700, Ross Boylan wrote:

> How can I tell which ata device is which hard drive?  It's come up
> several times for me, most recently with
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

For ata1.00 on this machine:

   ls -l /sys/class/scsi_disk/0:0:0:0/device/ | grep block:
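On Ross's follow-up question: ata1 is not guaranteed to be SCSI host 0, since host numbers are handed out in probe order (in Ken's log further down, ata1 errors are reported against sd 1:0:0:0). On kernels whose sysfs device paths include an `ataN` directory, the path behind `/sys/block/sdX` ties the port, the SCSI address and the disk name together. This is a sketch under that assumption; on older kernels, possibly including the 2.6.26 in this thread, the `ataN` part may be absent and the helper prints `unknown`. Both function names are hypothetical.

```shell
#!/bin/sh
# parse_sysfs_path PATH: given the resolved sysfs path of a block device,
# print "NAME ATA_PORT H:C:T:L". ATA_PORT is "unknown" when the path has
# no ataN component (assumed to be the case on older kernels).
parse_sysfs_path() {
    path=$1
    name=${path##*/}
    ata=$(printf '%s\n' "$path" | grep -o '/ata[0-9][0-9]*/' | head -n 1 | tr -d /)
    hctl=$(printf '%s\n' "$path" |
        grep -o '/[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*:[0-9][0-9]*/' |
        head -n 1 | tr -d /)
    printf '%s %s %s\n' "$name" "${ata:-unknown}" "${hctl:-unknown}"
}

# list_ata_map: run parse_sysfs_path on every sd* disk under /sys/block
list_ata_map() {
    for d in /sys/block/sd*; do
        [ -e "$d" ] || continue
        parse_sysfs_path "$(readlink -f "$d")"
    done
}
```

Running `list_ata_map` needs no root access, since sysfs is world-readable.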

> I'd also welcome advice about the disk problems, but I was hoping to
> raise the odds of an answer by keeping it simple :)

You thought devising an answer to your first question would be easy?
I've just spent the best part of two hours on it! No time now to sort
out your disk problem. :)


-- 
Archive: http://lists.debian.org/20110703224839.GV15615@desktop



message formatting and pseudo XML [was Re: disk problems: which ATA?]

2011-07-03 Thread Ross Boylan
On Sun, 2011-07-03 at 23:40 +0100, Lisi wrote:
> On Sunday 03 July 2011 22:35:08 Ross Boylan wrote:
> > (see original message)
> 
> Not a very productive instruction to those of us who have HTML display turned off.
> 
> Lisi
> 
That's interesting.  The messages are not in html.  However, I did use a
little pseudo-XML (with angle brackets--I'm deliberately not repeating
it here) to distinguish material I copied from the terminal.

I haven't noticed problems in evolution, mutt, or thunderbird.

Any opinions from anyone on whether pseudo-XML is inadvisable?

Lisi, could this be a problem with your particular mail client?

Ross

P.S. I'm about to reboot.  It may be a while before I'm back :(.


-- 
Archive: http://lists.debian.org/1309733790.10857.46.ca...@corn.betterworld.us



Re: disk problems: which ATA?

2011-07-03 Thread Lisi
On Sunday 03 July 2011 22:35:08 Ross Boylan wrote:
> (see original message)

Not a very productive instruction to those of us who have HTML display turned off.

Lisi


-- 
Archive: http://lists.debian.org/201107032340.05709.lisi.re...@gmail.com



Re: disk problems: which ATA?

2011-07-03 Thread Mike Castle
On Sun, Jul 3, 2011 at 11:25 AM, Ross Boylan wrote:
> How can I tell which ata device is which hard drive?  It's come up
> several times for me, most recently with
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Depending on how long since boot, you can often explore the output of
dmesg to figure out which drive is which.

Sometimes what I do is something like this (run as root):

for disk in /dev/sd?; do
  echo "$disk"
  smartctl -i "$disk" | grep -e Model -e Serial
done

And write down each working drive's model and serial number.

Reboot, and do it again.

In many cases, I've been lucky that the failing drive would work for a
little while (say, until a write).  So I could compare the two lists,
and figure out which model/serial is failing, and pull it.
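The before/after comparison can be scripted. This is a minimal sketch with hypothetical function names; it assumes smartctl prints `Device Model:` and `Serial Number:` lines (as it does for ATA drives) and records only model and serial, because device names such as sdb can change across a reboot.

```shell
#!/bin/sh
# disk_id: read `smartctl -i` output on stdin and print "MODEL SERIAL"
# (nothing if the drive did not report a serial number).
disk_id() {
    awk '
        /^Device Model:/     { sub(/^Device Model:[ \t]*/, "");  model = $0 }
        /^Serial [Nn]umber:/ { sub(/^Serial [Nn]umber:[ \t]*/, ""); serial = $0 }
        END { if (serial != "") print model, serial }'
}

# snapshot_disks: one "MODEL SERIAL" line per /dev/sd? disk that answers
# smartctl; disks that do not respond simply produce no line.
snapshot_disks() {
    for disk in /dev/sd?; do
        [ -e "$disk" ] || continue
        smartctl -i "$disk" 2>/dev/null | disk_id
    done
}

# only_in_second BEFORE AFTER: print lines of AFTER that are not in BEFORE,
# i.e. drives that answered in the second run but not in the first.
only_in_second() {
    grep -vxF -f "$1" "$2"
}
```

As root: `snapshot_disks > /root/disks.before`, reboot, `snapshot_disks > /root/disks.after`, then `only_in_second /root/disks.before /root/disks.after`. The file names are arbitrary; they sit under /root rather than /tmp because Debian may clean /tmp at boot.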

Failing that, I have a list of known good disks, and can go through
all of the disks in the machine until I find the failing one.  And at
that point, since I have the case open, reseat all cables and cards,
just in case that's the problem.

mrc


--
Archive: http://lists.debian.org/CA+t9iMwkN-Atum5kdYUdQmjA1Tmat+bz=6ka-zzaf4nttgs...@mail.gmail.com



Re: disk problems: which ATA?

2011-07-03 Thread Ross Boylan
On Sun, 2011-07-03 at 12:07 -0700, JD wrote:
> On 07/03/2011 11:25 AM, Ross Boylan wrote: 
> > How can I tell which ata device is which hard drive?  It's come up
> > several times for me, most recently with
> > ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> > 
> > It appears from other info that sdb is the problem:
> > 
> > # smartctl -H /dev/sdb
> > Sun Jul  3 10:26:29 PDT 2011
> > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> > Home page is http://smartmontools.sourceforge.net/
> > 
> > Short INQUIRY response, skip product id
> > A mandatory SMART command failed: exiting. To continue, add one or more
> > '-T permissive' options.
> > 
> > but I'd like to know how to do this in general.
> > 
> > I'd also welcome advice about the disk problems, but I was hoping to
> > raise the odds of an answer by keeping it simple :)
> > 
> > sdb has had hardware problems for a while; it wouldn't be surprising if
> > it's failed.
> > 
> > Running lenny with linux 2.6.26-2-686.
> > 
> > Thanks.
> > Ross
> > 
> > 
> 
> My basic advice in this situation is:
> 1. unmount the failing drive if it is mounted, and mount its
> partitions read-only.
It's a bit tricky, since my mounts are mostly on LVM logical volumes.
The main volume group I use includes the failing disk and some good
disks.  Unfortunately, LVM (specifically pvs or vgdisplay) won't give me
info on the VG, even though I seem to have continuing access to many of
its LVs.  I *think* I've already moved most LVs to be backed by other
disks.
> 2. back up its mounted partitions to a good drive using tar:
> Assuming the mount points are
> /sdb1  /sdb2  /sdb3
> cd /
> for i in 1 2 3; do
> tar cf - "sdb$i" | bzip2 -c -9 > "/somewhere/backup/sdb$i.tar.bz2"
> done
The latest problem emerged while doing my monthly (full) backups.  I
think the backups are good but the partition with the iso image failed.
> 3. If the drive is still under warranty, then run secure erase.
> read up on hdparm (man hdparm)
> If you go this route, I can send you a script
> for proper use of hdparm to securely erase a drive.
> After erasing, then get and RMA and send it away for
> replacement.
> 
It was when it started to fail.  I was hoping to do some kind of LVM
migration to get the info off, and so things have dragged on.
> 4. If you purchase a new drive, you will then partition it
> and tar back the backups to it.
Thanks for the advice.


-- 
Archive: http://lists.debian.org/1309729495.10857.40.ca...@corn.betterworld.us



Re: disk problems: which ATA?

2011-07-03 Thread Ross Boylan
On Sun, 2011-07-03 at 19:19 +, Camaleón wrote:
> On Sun, 03 Jul 2011 11:25:18 -0700, Ross Boylan wrote:
> 
> > How can I tell which ata device is which hard drive?  It's come up
> > several times for me, most recently with ata2.00: exception Emask 0x0
> > SAct 0x0 SErr 0x0 action 0x6 frozen
> 
> (...)
> 
> You can:
> 
> - Run "smartctl -i /dev/sdb | grep -i model"
Except the drive isn't responding to smartctl (see original message).
I tried adding -T permissive, but all that gets me is
Short INQUIRY response, skip product id
(curiously, no error about command failed).
> - Then "dmesg | grep -i ata2"
> - To finally compare by hdd model :-)
That works for my current machine.  But on another machine I want to
figure out which drive an error message goes with, and there are 2
identical drives.  I suppose that even if I knew which sd device the ata
went with, I still wouldn't be sure which physical drive that was...
> 
> As for the error itself, you can use the manufacturer's hard disk
> diagnostic tools, which usually run from a LiveCD and will provide
> accurate results about your HDD's health and status.
> 
> OTOH, I've also seen that kind of error coming from a bad SATA cable or a
> badly seated connection to the motherboard/disk. You may also check this.
I think I already tried reseating, but I suppose it's worth trying
again.  I'm concerned if I power down I may not be able to get back up,
since the failing hard disk is actually part of an LVM volume group.  I
am also unable to get information on that VG right now.  

Most of the logical volumes in the group are backed by other hard drives,
but I'm not quite sure what will happen if the disk is toast.  At the
moment, I have access to most of the LVs, even though I can't get info
on the PV that contains them (!).

Ross

P.S. For the record, kernel logs need to be read carefully to figure out
which drive is ata2.  Mine had
Jun 22 09:24:10 corn kernel: [7.767461] hdf: 26563824 sectors (13600 MB) w/256KiB Cache, CHS=26353/16/63
Jun 22 09:24:10 corn kernel: [7.798391] hdf: cache flushes not supported
Jun 22 09:24:10 corn kernel: [7.829859] usb 1-4: New USB device found, idVendor=04b8, idProduct=011e
Jun 22 09:24:10 corn kernel: [7.829859] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Jun 22 09:24:10 corn kernel: [7.829859] usb 1-4: Product: EPSON Scanner
Jun 22 09:24:10 corn kernel: [7.829859] usb 1-4: Manufacturer: EPSON
Jun 22 09:24:10 corn kernel: [7.967458]  hdf:<6>ata2.00: ATA-8: ST31000340AS, SD15, max UDMA/133
Jun 22 09:24:10 corn kernel: [8.135686]  hdf1 hdf2 hdf3
Jun 22 09:24:10 corn kernel: [8.129867] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
Jun 22 09:24:10 corn kernel: [8.241613] ata2.00: configured for UDMA/133
Jun 22 09:24:10 corn kernel: [8.129867] scsi 0:0:0:0: Direct-Access ATA  WDC WD2500JS-00M 02.0 PQ: 0 ANSI: 5
Jun 22 09:24:10 corn kernel: [8.290133] scsi 1:0:0:0: Direct-Access ATA  ST31000340AS SD15 PQ: 0 ANSI: 5

The key point is that although the message about ata2.00 (at 7.967458)
appears in the middle of the hdf information, the two are unrelated.
It seems the <6> (the kernel's KERN_INFO log-level prefix) marks an
asynchronous message that was dumped before the end of the hdf line.
Later comes
Jun 22 09:24:10 corn kernel: [9.520209] sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
so the sector count indicates this matches ata2.00, which is the Seagate
ST31000340AS at scsi 1:0:0:0.
I guess the first ata line (after the <6>) gives the model.


--
Archive: http://lists.debian.org/1309728908.10857.30.ca...@corn.betterworld.us



Re: disk problems: which ATA?

2011-07-03 Thread Camaleón
On Sun, 03 Jul 2011 11:25:18 -0700, Ross Boylan wrote:

> How can I tell which ata device is which hard drive?  It's come up
> several times for me, most recently with ata2.00: exception Emask 0x0
> SAct 0x0 SErr 0x0 action 0x6 frozen

(...)

You can:

- Run "smartctl -i /dev/sdb | grep -i model"
- Then "dmesg | grep -i ata2"
- To finally compare by hdd model :-)
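Those three steps can be wrapped in two small helpers. This is a minimal sketch with hypothetical function names; it assumes smartctl's `Device Model:` line for ATA drives and libata's identify message of the form `ataX.YY: ATA-N: MODEL, FIRMWARE, ...`, as in the kernel log Ross quotes in this thread. It only helps while the drive still answers `smartctl -i`, and two identical drives will both match.

```shell
#!/bin/sh
# model_of: read `smartctl -i` output on stdin and print the drive model.
model_of() {
    sed -n 's/^Device Model:[[:space:]]*//p' | head -n 1
}

# ata_for_model MODEL: read kernel messages on stdin and print each ataX.YY
# whose identify line ("ataX.YY: ATA-N: MODEL, FIRMWARE, ...") names MODEL.
ata_for_model() {
    grep -F ": $1," |
        sed -n 's/.*\(ata[0-9][0-9]*\.[0-9][0-9]\): ATA-[0-9][0-9]*: .*/\1/p' |
        sort -u
}
```

For example: `dmesg | ata_for_model "$(smartctl -i /dev/sdb | model_of)"`.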

As for the error itself, you can use the manufacturer's hard disk
diagnostic tools, which usually run from a LiveCD and will provide
accurate results about your HDD's health and status.

OTOH, I've also seen that kind of error coming from a bad SATA cable or a
badly seated connection to the motherboard/disk. You may also check this.

Greetings,

-- 
Camaleón


-- 
Archive: http://lists.debian.org/pan.2011.07.03.19.19...@gmail.com



Re: disk problems: which ATA?

2011-07-03 Thread JD


  
  
On 07/03/2011 11:25 AM, Ross Boylan wrote:
> How can I tell which ata device is which hard drive?  It's come up
> several times for me, most recently with
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>
> It appears from other info that sdb is the problem:
>
> # smartctl -H /dev/sdb
> Sun Jul  3 10:26:29 PDT 2011
> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> Short INQUIRY response, skip product id
> A mandatory SMART command failed: exiting. To continue, add one or more
> '-T permissive' options.
>
> but I'd like to know how to do this in general.
>
> I'd also welcome advice about the disk problems, but I was hoping to
> raise the odds of an answer by keeping it simple :)
>
> sdb has had hardware problems for a while; it wouldn't be surprising if
> it's failed.
>
> Running lenny with linux 2.6.26-2-686.
>
> Thanks.
> Ross





My basic advice in this situation is:
1. Unmount the failing drive if it is mounted, and mount its
    partitions read-only.
2. Back up its mounted partitions to a good drive using tar.
    Assuming the mount points are /sdb1, /sdb2 and /sdb3:
    cd /
    for i in 1 2 3; do
        tar cf - "sdb$i" | bzip2 -c -9 > "/somewhere/backup/sdb$i.tar.bz2"
    done
3. If the drive is still under warranty, run a secure erase.
    Read up on hdparm (man hdparm). If you go this route, I can send
    you a script for proper use of hdparm to securely erase a drive.
    After erasing, get an RMA and send it away for replacement.

4. If you purchase a new drive, you will then partition it
    and tar back the backups to it.

  
  



-- 
Archive: http://lists.debian.org/4e10bde7.4040...@gmail.com



disk problems: which ATA?

2011-07-03 Thread Ross Boylan
How can I tell which ata device is which hard drive?  It's come up
several times for me, most recently with
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

It appears from other info that sdb is the problem:

# smartctl -H /dev/sdb
Sun Jul  3 10:26:29 PDT 2011
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more
'-T permissive' options.

but I'd like to know how to do this in general.

I'd also welcome advice about the disk problems, but I was hoping to
raise the odds of an answer by keeping it simple :)

sdb has had hardware problems for a while; it wouldn't be surprising if
it's failed.

Running lenny with linux 2.6.26-2-686.

Thanks.
Ross


-- 
Archive: http://lists.debian.org/1309717518.10857.9.ca...@corn.betterworld.us



Re: ATA Disk problems.

2010-09-02 Thread Camaleón
On Wed, 01 Sep 2010 05:39:21 -0700, Account for Debian group mail wrote:

> One of my mail servers is having some disk problems. I see stuff like
> this in my log files:

(...)

Run a SMART test with "smartctl" and check the results. Just note that
some hardware RAID controllers do not allow running smartctl.
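For the SMART test suggested above, `smartctl -t long /dev/sda` starts the drive's extended self-test, `smartctl -l selftest /dev/sda` shows the self-test log once it has finished, and `smartctl -H` and `smartctl -A` print the overall verdict and the attribute table. The helpers below are a minimal sketch with hypothetical names that only parse that text, assuming the usual smartctl output for ATA drives. A PASSED verdict does not rule out bad sectors, so the raw counts of reallocated, pending and offline-uncorrectable sectors are worth a look too; the attribute names used are common ones, but vendors can differ.

```shell
#!/bin/sh
# smart_verdict: read `smartctl -H` output on stdin and print the overall
# health verdict (e.g. PASSED), or "unknown" if no verdict line was found.
smart_verdict() {
    v=$(sed -n 's/^SMART overall-health self-assessment test result:[[:space:]]*//p' | head -n 1)
    printf '%s\n' "${v:-unknown}"
}

# bad_sector_counts: read `smartctl -A` output on stdin and print the raw
# value (10th column) of attributes commonly tied to failing media.
bad_sector_counts() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2, $10
    }'
}
```

For example, as root: `smartctl -H /dev/sda | smart_verdict` and `smartctl -A /dev/sda | bad_sector_counts`.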
 
> This just started happening. Is it new disk time, or do I have something
> set wrong? Any help would be appreciated.

It could also be a bad cable or connection making "noise", not necessarily a
failing disk. You can replace the cable (if possible) and see whether
the error stops.

Anyway, time to update the backup for that disk, just in case :-)

Greetings,

-- 
Camaleón


-- 
Archive: http://lists.debian.org/pan.2010.09.02.10.03...@gmail.com



ATA Disk problems.

2010-09-01 Thread Account for Debian group mail


Hello,

One of my mail servers is having some disk problems. I see stuff like this 
in my log files:


Sep  1 05:14:24 mail kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Sep  1 05:14:24 mail kernel: ata1.00: BMDMA stat 0x4
Sep  1 05:14:24 mail kernel: ata1.00: cmd 25/00:f8:b7:79:ea/00:01:6d:00:00/e0 tag 0 dma 258048 in
Sep  1 05:14:24 mail kernel:  res 51/40:00:e8:79:ea/40:00:6d:00:00/00 Emask 0x9 (media error)
Sep  1 05:14:24 mail kernel: ata1.00: status: { DRDY ERR }
Sep  1 05:14:24 mail kernel: ata1.00: error: { UNC }
Sep  1 05:14:24 mail kernel: ata1.00: configured for UDMA/133
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Sep  1 05:14:24 mail kernel: Descriptor sense data with sense descriptors (in hex):
Sep  1 05:14:24 mail kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Sep  1 05:14:24 mail kernel: 6d ea 79 e8
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Sep  1 05:14:24 mail kernel: end_request: I/O error, dev sda, sector 1844083176
Sep  1 05:14:24 mail kernel: ata1: EH complete
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Write Protect is off
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Write Protect is off
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep  1 05:14:24 mail kernel: sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
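The numbers in this log agree with each other, which can be checked by hand. Assuming libata's usual layout for the `cmd` and `res` taskfiles (command or status / feature : count : LBA low : LBA mid : LBA high / the same five high-order registers / device), the failed READ DMA EXT (command 0x25) started at 0x6dea79b7 for 0x01f8 = 504 sectors, and 504 * 512 = 258048 bytes is the `dma 258048` in the log. The `res` registers put the error at 0x6dea79e8 = 1844083176, which is the sector in the `end_request` line and the `6d ea 79 e8` in the sense data. The helper below is a sketch of that decoding; its name is hypothetical and it only handles 48-bit (EXT) commands.

```shell
#!/bin/sh
# tf_lba FIELD: read libata error lines on stdin and print, in decimal, the
# 48-bit LBA held in the "cmd" or "res" taskfile (FIELD is cmd or res).
# Assumed layout:
#   XX/feat:nsect:lbal:lbam:lbah/hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah/dev
tf_lba() {
    h='[0-9a-f]\{2\}'
    sed -n "s#.*$1 $h/$h:$h:\($h\):\($h\):\($h\)/$h:$h:\($h\):\($h\):\($h\)/.*#\6 \5 \4 \3 \2 \1#p" |
        while read -r b5 b4 b3 b2 b1 b0; do
            # most significant byte first: hob_lbah hob_lbam hob_lbal lbah lbam lbal
            echo $(( 0x$b5$b4$b3$b2$b1$b0 ))
        done
}
```

For example `grep ' res ' /var/log/kern.log | tf_lba res` lists the failing LBAs, which can then be compared with the sectors in the `end_request` lines.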

This just started happening. Is it new disk time, or do I have something 
set wrong? Any help would be appreciated.


Thanks,

Ken



--
Archive: http://lists.debian.org/pine.lnx.4.64.1009010534220.1...@mail.pcez.com



[Resolved] Re: Disk problems or worse?

2010-06-18 Thread Ralph Katz
-- On 03 Jun 2010 19:22:48 -0400, Message-id: <4c083948.4090...@rcn.com>
I wrote --

On 06/03/2010 05:53 PM, Jochen Schulz wrote:
> Ralph Katz:
>> On 06/03/2010 01:45 PM, Jochen Schulz wrote:
>>> Which IDE controller? The controller I had problems with was:
>>>
>>> 00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)

>> You think those errors could come from the
>> controller?
>
> Yes and no. As far as I understand, it was the kernel having
> difficulties with that controller. But as I wrote, I don't know the
> specifics anymore. I just remember that I blamed the disk at first, but
> a replacement drive showed the same symptoms.
>
> You can easily rule this out by adding backports.org to your
> sources.list and trying their kernel.
>
> J.

ra...@spike ~$ lspci |grep IDE
00:1f.1 IDE interface: Intel Corporation 82801BA IDE U100 Controller (rev 12)

I had not used that command in maybe 5 years, heh.  You get complacent
with stable.  Thanks for the kernel suggestion.

Ralph
-- end of last post --

[Apologies for bad paste and maybe bad threading; having difficulty
migrating mail.]

Jochen, the kernel upgrade to 2.6.32-bpo.5-686 seems to have fixed the
problem!  No system hang, no constant disk error messages, only this:

> zgrep -i attrib /var/log/syslog* |grep -v Temp
> /var/log/syslog:Jun 18 08:30:31 spike smartd[1990]: Device: /dev/sda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 186 to 185
> /var/log/syslog:Jun 18 14:00:32 spike smartd[1990]: Device: /dev/sda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
> /var/log/syslog.1:Jun 17 21:08:35 spike smartd[1985]: Device: /dev/sda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
> /var/log/syslog.4.gz:Jun 14 15:27:14 spike smartd[1940]: Device: /dev/sda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
> 

Thank you again for your suggestion.

Ralph


-- 
Archive: http://lists.debian.org/4c1c1b40.50...@rcn.com



Re: Disk problems or worse?

2010-06-03 Thread Ralph Katz
On 06/03/2010 05:53 PM, Jochen Schulz wrote:
> Ralph Katz:
>> On 06/03/2010 01:45 PM, Jochen Schulz wrote:
>>> Which IDE controller? The controller I had problems with was:
>>>
>>> 00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
>> Where would I find it?
> 
> Just run lspci.
> 
>> You think those errors could come from the
>> controller?
> 
> Yes and no. As far as I understand, it was the kernel having
> difficulties with that controller. But as I wrote, I don't know the
> specifics anymore. I just remember that I blamed the disk at first, but
> a replacement drive showed the same symptoms.
> 
> You can easily rule this out by adding backports.org to your
> sources.list and trying their kernel.
> 
> J.

ra...@spike ~$ lspci |grep IDE
00:1f.1 IDE interface: Intel Corporation 82801BA IDE U100 Controller (rev 12)

I had not used that command in maybe 5 years, heh.  You get complacent
with stable.  Thanks for the kernel suggestion.

Ralph


-- 
Archive: http://lists.debian.org/4c083948.4090...@rcn.com



Re: Disk problems or worse?

2010-06-03 Thread Jochen Schulz
Ralph Katz:
> On 06/03/2010 01:45 PM, Jochen Schulz wrote:
>> 
>> Which IDE controller? The controller I had problems with was:
>> 
>> 00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
> 
> Where would I find it?

Just run lspci.

> You think those errors could come from the
> controller?

Yes and no. As far as I understand, it was the kernel having
difficulties with that controller. But as I wrote, I don't know the
specifics anymore. I just remember that I blamed the disk at first, but
a replacement drive showed the same symptoms.

You can easily rule this out by adding backports.org to your
sources.list and trying their kernel.

J.
-- 
When standing at the top of beachy head I find the rocks below very
attractive.




Re: Disk problems or worse?

2010-06-03 Thread Ralph Katz
On 06/03/2010 01:45 PM, Jochen Schulz wrote:
> Ralph Katz:
>> As mentioned in the original post, disk PASSED SMART tests, and computer
>> is a P4.
>> Intel(R) Pentium(R) 4 CPU 1.70GHz  single processor
>> hda: UDMA/100 mode selected
> 
> Which IDE controller? The controller I had problems with was:
> 
> 00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
> 
> J.

Where would I find it?  You think those errors could come from the
controller?
There is probing, but no results like you posted.  I only see this in
/var/log/*:

Jun  2 08:22:13 spike kernel: [2.887681] Uniform Multi-Platform E-IDE driver
Jun  2 08:22:13 spike kernel: [2.887695] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Jun  2 08:22:13 spike kernel: [2.912223] ICH2: IDE controller (0x8086:0x244b rev 0x12) at  PCI slot :00:1f.1
Jun  2 08:22:13 spike kernel: [2.912255] ICH2: not 100% native mode: will probe irqs later
Jun  2 08:22:13 spike kernel: [2.912277] ide0: BM-DMA at 0xffa0-0xffa7
Jun  2 08:22:13 spike kernel: [2.912295] ide1: BM-DMA at 0xffa8-0xffaf
Jun  2 08:22:13 spike kernel: [2.912304] Probing IDE interface ide0...
Jun  2 08:22:13 spike kernel: [3.200226] hda: WDC WD1200JB-00EVA0, ATA DISK drive
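Jochen's ICH4-M (82801DBM) and this ICH2 (82801BA) are different controllers even though both sit at 00:1f.1. When searching for reports about a particular controller, the numeric vendor:device pair is less ambiguous than the name. A small sketch, assuming the "(0xVVVV:0xDDDD rev ...)" format of the ICH2 line above; pci_id is a hypothetical helper:

```shell
# pci_id: hypothetical helper. Extracts "vvvv:dddd" from a kernel line such as
# "ICH2: IDE controller (0x8086:0x244b rev 0x12) at ..." read on stdin.
pci_id() {
    sed -n 's/.*(0x\([0-9a-fA-F]\{4\}\):0x\([0-9a-fA-F]\{4\}\).*/\1:\2/p'
}

# Usage: the same pair appears in square brackets in "lspci -nn" output.
#   grep 'IDE controller' /var/log/syslog | pci_id | sort -u
#   lspci -nn | grep -i ide
```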

Thanks,
Ralph


-- 
Archive: http://lists.debian.org/4c081118.5080...@rcn.com



Re: Disk problems or worse?

2010-06-03 Thread Jochen Schulz
Ralph Katz:
> 
> As mentioned in the original post, disk PASSED SMART tests, and computer
> is a P4.
> Intel(R) Pentium(R) 4 CPU 1.70GHz  single processor
> hda: UDMA/100 mode selected

Which IDE controller? The controller I had problems with was:

00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)

J.
-- 
I have never been happier than I am now; a fact which depresses me
immensely.




Re: Disk problems or worse?

2010-06-03 Thread Ralph Katz
On 06/03/2010 12:48 PM, Anand Sivaram wrote:
> 
> 
> On Thu, Jun 3, 2010 at 21:24, Daniel Barclay  > wrote:
> 
> Ralph,
> 
> Jochen Schulz wrote:
> 
> Ralph Katz:
> 
> Lenny install on newly acquired used Dell hangs and throws errors to
> syslog.  Do I have two bad disks or a more serious hardware problem?
> 
> 
> Another option: it might be a kernel problem. I don't remember the
> specifics anymore, but on one of my systems I had similar errors. After
> replacing the disk and still getting these errors, I found hints that
> the kernel might be at fault. I then installed a newer kernel from
> backports.org and the problems went away.
> 
> 
> What processor and chipset does your motherboard use?
> 
> Do you get
> 
> Does changing your IDE/ATA controllers from DMA mode to PIO
> mode stop the message?
> 
> 
> (I had similar problems (got similar log message) with a dual-processor
> AMD Athlon MP board.  Apparently, the AMD chipset had some
> bug, Linux didn't work around that particular bug, and the kernel's
> IDE DMA code (or maybe filesystem code) wasn't very robust--it didn't
> retry an operation that failed because of a detected DMA timeout,
> and it didn't even detect that the operation failed and stop (panic
> or something) before things (disk and filesystem state) became
> inconsistent.)
> 
> 
> Daniel
>


> 
> Install smartmontools for the hard disk.  It can tell you whether there
> are any real problems with your hard disk.
> smartctl -a /dev/hda (for IDE hard disk /dev/hda)
> smartctl -d ata -a /dev/sda (for SATA hard disk /dev/sda)
> Replace the device names with the ones on your system.


Thanks for the replies to date.

As mentioned in the original post, disk PASSED SMART tests, and computer
is a P4.
Intel(R) Pentium(R) 4 CPU 1.70GHz  single processor
hda: UDMA/100 mode selected

There are no sounds from the disk.

As requested, output from smartctl -a /dev/hda is below.  The following
errors from this morning were in syslog prior to the most recent SMART
long self-test:

Jun  3 08:09:52 spike kernel: [  417.872025] hda: dma_timer_expiry: dma status == 0x21
Jun  3 08:33:17 spike smartd[2357]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 191 to 190
Jun  3 09:03:14 spike smartd[2357]: Device: /dev/hda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
ra...@spike ~$ sudo smartctl -a /dev/hda
[sudo] password for ralph:
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD1200JB-00EVA0
Serial Number:    WD-WCAEK1690109
Firmware Version: 15.05R15
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jun  3 13:29:56 2010 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (3999) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:

Re: Disk problems or worse?

2010-06-03 Thread Anand Sivaram
On Thu, Jun 3, 2010 at 21:24, Daniel Barclay  wrote:

> Ralph,
>
> Jochen Schulz wrote:
>
>> Ralph Katz:
>>
>>> Lenny install on newly acquired used Dell hangs and throws errors to
>>> syslog.  Do I have two bad disks or a more serious hardware problem?
>>>
>>
>> Another option: it might be a kernel problem. I don't remember the
>> specifics anymore, but on one of my systems I had similar errors. After
>> replacing the disk and still getting these errors, I found hints that
>> the kernel might be at fault. I then installed a newer kernel from
>> backports.org and the problems went away.
>>
>
> What processor and chipset does your motherboard use?
>
> Do you get
>
> Does changing your IDE/ATA controllers from DMA mode to PIO
> mode stop the message?
>
>
> (I had similar problems (got similar log message) with a dual-processor
> AMD Athlon MP board.  Apparently, the AMD chipset had some
> bug, Linux didn't work around that particular bug, and the kernel's
> IDE DMA code (or maybe filesystem code) wasn't very robust--it didn't
> retry an operation that failed because of a detected DMA timeout,
> and it didn't even detect that the operation failed and stop (panic
> or something) before things (disk and filesystem state) became
> inconsistent.)
>
>
> Daniel
> --
> Archive: http://lists.debian.org/4c07d027.9020...@fgm.com
>
>
Install smartmontools for the hard disk.  It can tell you whether there
are any real problems with your hard disk.
smartctl -a /dev/hda (for IDE hard disk /dev/hda)
smartctl -d ata -a /dev/sda (for SATA hard disk /dev/sda)
Replace the device names with the ones on your system.


Re: Disk problems or worse?

2010-06-03 Thread Daniel Barclay

Ralph,

Jochen Schulz wrote:

Ralph Katz:

Lenny install on newly acquired used Dell hangs and throws errors to
syslog.  Do I have two bad disks or a more serious hardware problem?


Another option: it might be a kernel problem. I don't remember the
specifics anymore, but on one of my systems I had similar errors. After
replacing the disk and still getting these errors, I found hints that
the kernel might be at fault. I then installed a newer kernel from
backports.org and the problems went away.


What processor and chipset does your motherboard use?

Do you get

Does changing your IDE/ATA controllers from DMA mode to PIO
mode stop the message?
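For the DMA-versus-PIO test, on the old IDE driver (the one that names the disk hda, as in Ralph's logs) hdparm can query and toggle DMA. A sketch, assuming hdparm's usual " using_dma = 1 (on)" output line; dma_state is a hypothetical helper, and drives handled by the newer libata driver may refuse the -d setting:

```shell
# dma_state: hypothetical helper. Reads "hdparm -d" output on stdin and
# prints "on" or "off" from the "using_dma = 1 (on)" line.
dma_state() {
    awk -F= '/using_dma/ { gsub(/[^a-z]/, "", $2); print $2 }'
}

# Usage (as root):
#   hdparm -d /dev/hda | dma_state   # query
#   hdparm -d0 /dev/hda              # switch to PIO for a test (slow)
#   hdparm -d1 /dev/hda              # switch DMA back on
```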


(I had similar problems (got similar log message) with a dual-processor
AMD Athlon MP board.  Apparently, the AMD chipset had some
bug, Linux didn't work around that particular bug, and the kernel's
IDE DMA code (or maybe filesystem code) wasn't very robust--it didn't
retry an operation that failed because of a detected DMA timeout,
and it didn't even detect that the operation failed and stop (panic
or something) before things (disk and filesystem state) became
inconsistent.)


Daniel
--



Archive: http://lists.debian.org/4c07d027.9020...@fgm.com



Re: Disk problems or worse?

2010-06-03 Thread David Baron
I sometimes get this: the disks click-clack and I get those messages.

Usually rebooting after jiggling the cables fixes it. Maybe replace the
cables. Also check the power supply: is it working, and is it adequate?


-- 
Archive: http://lists.debian.org/201006031625.34181.d_ba...@012.net.il



Re: Disk problems or worse?

2010-06-02 Thread Jochen Schulz
Ralph Katz:
>
> Lenny install on newly acquired used Dell hangs and throws errors to
> syslog.  Do I have two bad disks or a more serious hardware problem?

Another option: it might be a kernel problem. I don't remember the
specifics anymore, but on one of my systems I had similar errors. After
replacing the disk and still getting these errors, I found hints that
the kernel might be at fault. I then installed a newer kernel from
backports.org and the problems went away.

> May 24 21:54:14 spike kernel: [ 5065.393331] Clocksource tsc unstable (delta = 4686898152 ns)

This line is irrelevant for the hard disk problem.

> /var/log/syslog:Jun  2 08:52:40 spike smartd[2346]: Device: /dev/hda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198
> /var/log/syslog.1:Jun  1 08:13:56 spike kernel: [  936.23] hda: dma_timer_expiry: dma status == 0x21
> /var/log/syslog.1:Jun  1 08:28:44 spike smartd[2357]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195

That's a real hard disk error, but unless it happens regularly, you
don't need to worry. These happen sometimes and the disk is usually able
to handle it.

> Meanwhile, SMART self-tests short and long passed.  No errors were
> reported by smartctl -a /dev/hda.

Well, at least the reallocation events should have been counted. It
doesn't hurt to post smartctl's output.
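In smartctl output, reallocations show up in the attribute table printed by smartctl -A (which -a includes). smartd's "changed from 196 to 195" messages refer to the normalized VALUE column, which counts down from a vendor-chosen start (200 in the logs here), not to the raw count. A sketch that pulls out the reallocation-related rows, assuming the usual ten-column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); realloc_rows is a hypothetical helper:

```shell
# realloc_rows: hypothetical helper. Reads "smartctl -A" output on stdin and
# prints ID, name, normalized VALUE, THRESH and RAW_VALUE for attributes whose
# names mention reallocation, pending sectors or offline-uncorrectable sectors.
realloc_rows() {
    awk '$2 ~ /Reallocated|Current_Pending|Offline_Uncorrectable/ {
        printf "%s %s value=%s thresh=%s raw=%s\n", $1, $2, $4, $6, $10
    }'
}

# Usage (as root):
#   smartctl -A /dev/hda | realloc_rows
```

A normalized VALUE at or below THRESH is what smartctl treats as a failed attribute.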

J.
-- 
In an ideal world I would cure poverty and go to the gym at least three
days a week.




Re: Disk problems or worse?

2010-06-02 Thread Mark
On Wed, Jun 2, 2010 at 6:21 PM, Ralph Katz  wrote:

> Lenny install on newly acquired used Dell hangs and throws errors to
> syslog.  Do I have two bad disks or a more serious hardware problem?
> Short of buying a new disk, how would I know?  What would you recommend?
>  Or do I have a simple BIOS setting problem?
>

[snip]

If you boot to an Ubuntu Live CD, it will automatically let you know of any
bad hard disk sectors via a pop up GUI upon booting to the desktop
environment.  I inherited a decommissioned hard drive from a server room and
used Ubuntu Live CD to confirm it had bad sectors, hence the reason for its
decommissioning.

Once you confirm it's not the hdd, then you can troubleshoot other
possibilities.

HTH.

Mark


Disk problems or worse?

2010-06-02 Thread Ralph Katz
Lenny install on newly acquired used Dell hangs and throws errors to
syslog.  Do I have two bad disks or a more serious hardware problem?
Short of buying a new disk, how would I know?  What would you recommend?
 Or do I have a simple BIOS setting problem?

(My last post to debian-user was in 2008.  Etch has continued to be rock
solid on two desktops.  Now I felt it was time to upgrade.)

First, an old Dell GX240 was obtained and Lenny/Xfce installed: P4, 1 GB
RAM, 120 GB WDC disk.

Syslog showed all kinds of errors, and the system would hang at times:

May 24 21:53:39 spike kernel: [ 5034.952013] hda: status timeout: status=0x80 { Busy }
May 24 21:53:39 spike kernel: [ 5034.952021] ide: failed opcode was: unknown
May 24 21:53:39 spike kernel: [ 5034.952030] hda: DMA disabled
May 24 21:53:39 spike kernel: [ 5034.952066] hda: drive not ready for command
May 24 21:54:14 spike kernel: [ 5064.952021] ide0: reset timed-out, status=0x80
May 24 21:54:14 spike kernel: [ 5065.393331] hda: status timeout: status=0x80 { Busy }
May 24 21:54:14 spike kernel: [ 5065.393331] ide: failed opcode was: unknown
May 24 21:54:14 spike kernel: [ 5065.393331] hda: drive not ready for command
May 24 21:54:14 spike kernel: [ 5065.393331] Clocksource tsc unstable (delta = 4686898152 ns)
May 24 21:54:44 spike kernel: [ 5099.964023] ide0: reset timed-out, status=0x80
May 24 21:54:44 spike kernel: [ 5099.964040] end_request: I/O error, dev hda, sector 10867375
May 24 21:54:44 spike kernel: [ 5099.964104] end_request: I/O error, dev hda, sector 13826839
May 24 21:54:44 spike kernel: [ 5099.964115] Buffer I/O error on device dm-2, logical block 360455

[snipped 20 Kb of I/O errors]

May 24 21:54:44 spike kernel: [ 5099.967007] end_request: I/O error, dev hda, sector 208223535
May 24 21:54:44 spike kernel: [ 5099.967024] EXT3-fs error (device dm-5): ext3_get_inode_loc: unable to read inode block - inode=5792911, block=23167050
May 24 21:54:44 spike kernel: [ 5099.967128] Aborting journal on device dm-5.
May 24 21:54:44 spike kernel: [ 5099.968575] ext3_abort called.
May 24 21:54:44 spike kernel: [ 5099.968587] EXT3-fs error (device dm-5): ext3_journal_start_sb: Detected aborted journal
May 24 21:54:44 spike kernel: [ 5099.968594] Remounting filesystem read-only
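With 20 KB of such lines, it helps to see which sectors actually failed and whether the same ones recur. A sketch, assuming unwrapped syslog lines in the "end_request: I/O error, dev hda, sector N" form shown above; io_error_sectors is a hypothetical helper:

```shell
# io_error_sectors: hypothetical helper. Reads kernel log text on stdin and
# prints "COUNT DEVICE SECTOR" for each distinct failing sector, sorted by
# device and then numerically by sector.
io_error_sectors() {
    sed -n 's|.*end_request: I/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*|\1 \2|p' |
        sort -k1,1 -k2,2n |
        uniq -c
}

# Usage:
#   zcat -f /var/log/syslog* | io_error_sectors | less
```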

I concluded the disk was dead (although the SMART tests PASSED), replaced
it with another used 120 GB WDC disk and re-installed Lenny, and soon the
system would again hang, typically at startup.

Syslog entries of note with the second disk installed:

/var/log/syslog:Jun  2 08:52:40 spike smartd[2346]: Device: /dev/hda, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198
/var/log/syslog.1:Jun  1 08:13:56 spike kernel: [  936.23] hda: dma_timer_expiry: dma status == 0x21
/var/log/syslog.1:Jun  1 08:28:44 spike smartd[2357]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195

May 31 09:54:09 spike kernel: [  620.084022] hda: dma_timer_expiry: dma status == 0x20
May 31 09:54:09 spike kernel: [  620.084031] hda: DMA timeout retry
May 31 09:54:09 spike kernel: [  620.084034] hda: timeout waiting for DMA
May 31 09:54:09 spike kernel: [  624.232267] Clocksource tsc unstable (delta = 4686697657 ns)
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Prefailure Attribute: 5 Reallocated_Sector_Ct changed from 200 to 199
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 200 to 196

Meanwhile, SMART self-tests short and long passed.  No errors were
reported by smartctl -a /dev/hda.

This morning I had to reboot a hung system with Alt-SysRq-B because X,
an ssh connection, VT1 and Ctrl-Alt-Del all failed.

Searching the net for "Clocksource tsc unstable" turned up suggestions to
disable ACPI in the BIOS.  Hey, I'm just a desktop user, and this is
beginning to get beyond my 7 years of understanding the magic.

Suggestions welcomed, thanks!

Ralph




-- 
Archive: http://lists.debian.org/4c070380@rcn.com



USB disk problems

2006-01-07 Thread Z F
Hello everybody,

I searched the web and saw that some people have a
similar problem, but I could not find a solution...

The problem is that I have a USB hard drive; when
it is plugged in, it is detected fine and works with small files.
If a large file is copied to the drive, something bad happens
(like a USB reset) and the USB drive partition is remounted read-only
due to errors. (Output of dmesg is below.)

Any help is highly appreciated. 

Thanks

Lazar
---
usb 6-6: new high speed USB device using ehci_hcd and address 3
scsi1 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning
  Vendor: IC35L060  Model: AVVA07-0  Rev: 0811
  Type:   Direct-Access  ANSI SCSI revision: 00
SCSI device sda: 120103200 512-byte hdwr sectors (61493 MB)
sda: assuming drive cache: write through
SCSI device sda: 120103200 512-byte hdwr sectors (61493 MB)
sda: assuming drive cache: write through
 sda: sda1 sda2
Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
usb-storage: device scan complete
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda2, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.


being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
[message repeats many times]

Aborting journal on device sda2.
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
[message repeats many times]

journal commit I/O error
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
[message repeats many times]

EXT3-fs error (device sda2) in ext3_ordered_writepage: IO failure
scsi0 (0:0): rejecting I/O to device being removed
ext3_abort called.
EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted
journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_committed_data
scsi0 (0:0): rejecting I/O to dead device
printk: 25611 messages suppressed.
Buffer I/O error on device sda2, logical block 522
lost page write due to I/O error on sda2








Re: Disk problems? (was: Re: dpkg fails)

2003-03-11 Thread Pigeon
On Sun, Mar 09, 2003 at 07:49:01PM -0800, Ron Farrer wrote:
> Second update: after doing some disk intensive work, these show up in
> the system log:
> 
> Mar  9 19:16:27 dmz kernel: scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Request Sense 00 00 00 10 00
> Mar  9 19:16:27 dmz kernel: Info fld=0x11730b, Current sd08:01: sense key Medium Error
> Mar  9 19:16:27 dmz kernel: Additional sense indicates Unrecovered read error
> Mar  9 19:16:27 dmz kernel: scsidisk I/O error: dev 08:01, sector 1143496
> Mar  9 19:16:29 dmz kernel: scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Request Sense 00 00 00 10 00
> Mar  9 19:16:29 dmz kernel: Info fld=0x11730b, Current sd08:01: sense key Medium Error
> Mar  9 19:16:29 dmz kernel: Additional sense indicates Unrecovered read error
> Mar  9 19:16:29 dmz kernel: scsidisk I/O error: dev 08:01, sector 1143496
> 
> It looks like maybe a bad sector on the disk? Any ideas?

I think that not only is it a bad sector, it's a bad sector that your
SCSI drive has failed to remap to a good one. Have a look at the grown
defects list (scsiinfo -d /dev/sda | less); if it's full, back up
your drive ASAP and get a new one.

How do you know if it's full if you don't know how big it can be? If
the number of entries is a suspicious number like (2^n)-1, it's
probably full.
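Pigeon's (2^n)-1 rule of thumb is easy to check mechanically: a positive number has that form exactly when all its binary digits are 1, i.e. when n & (n+1) is 0. A sketch; is_all_ones is a hypothetical helper, and the count itself would still come from the defect-list tool mentioned above:

```shell
# is_all_ones: hypothetical helper. Succeeds if $1 is a positive integer of
# the form 2^k - 1 (1, 3, 7, ..., 255, 1023, ...), i.e. all ones in binary.
is_all_ones() {
    n=$1
    [ "$n" -gt 0 ] && [ $(( n & (n + 1) )) -eq 0 ]
}

# Usage:
#   is_all_ones 255 && echo "suspicious count: defect list may be full"
```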

Pigeon





Disk problems? (was: Re: dpkg fails)

2003-03-09 Thread Ron Farrer
Ron Farrer ([EMAIL PROTECTED]) wrote:

> Update: it's not just ipmasq. I also tried to install wget and it fails
> in the same way.
> 
> TIA,
> Ron

Second update: after doing some disk intensive work, these show up in
the system log:

Mar  9 19:16:27 dmz kernel: scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Request Sense 00 00 00 10 00
Mar  9 19:16:27 dmz kernel: Info fld=0x11730b, Current sd08:01: sense key Medium Error
Mar  9 19:16:27 dmz kernel: Additional sense indicates Unrecovered read error
Mar  9 19:16:27 dmz kernel: scsidisk I/O error: dev 08:01, sector 1143496
Mar  9 19:16:29 dmz kernel: scsi0: MEDIUM ERROR on channel 0, id 0, lun 0, CDB: Request Sense 00 00 00 10 00
Mar  9 19:16:29 dmz kernel: Info fld=0x11730b, Current sd08:01: sense key Medium Error
Mar  9 19:16:29 dmz kernel: Additional sense indicates Unrecovered read error
Mar  9 19:16:29 dmz kernel: scsidisk I/O error: dev 08:01, sector 1143496

It looks like maybe a bad sector on the disk? Any ideas?

TIA,
Ron
-- 
Email: <[EMAIL PROTECTED]> or <[EMAIL PROTECTED]>




Re: Help: disk problems..

2000-05-12 Thread John Pearson
On Fri, May 12, 2000 at 04:49:21PM -0500, Gregory Guthrie wrote
> Help!!
> 
> I had a working system, and after several weeks up we were moving some 
> (Apache) files around and wanted to make sure that the system setup for 
> Apache was OK, so we re-booted.
> 
> At re-boot I now get:
> |--
> |  ...
> | .. checking root file system
> |  fsck.ext2: attempt to read block from filesystem resulted in short read 
> while trying to open /dev/hda1
> |  Could this be a zero length partition?
> |
> | fsck failed. please repair manually and re-boot. Please note that the 
> root file system is mounted Read-only,
> | 
> |give root password for maintenance.
> |---
> Ok, I go root, and look around. everything seems OK, all files there.
> 
> fdisk shows partitions OK:
> |--
> |  boot   device      format       size   start   end
> |   *     /dev/hda1   Extended     1580M     2    785
> |         hda2        DOS FAT-16     50M     2     27
> |   *     hda6        Linux ext2   1250M    28    662
> |         hda7        linux swap    250M   663    785
> |-
> The 'v' option to fdisk (check partition table) says: 8249 unallocated sectors
> 
> running fdisk from a rescue floppy on /dev/hda6 gives (immediately):
>   e2fsck
>   /dev/hda6: clean, 29621/641024 files, 542724/3560288 blocks.
> 
> It seems to report this Immediately, no time, no disk work.   (???)
> 
> I tried re-writing the boot block to the first partition (the one reported 
> troublesome), /dev/hda1, seems to work fine.
> 

Your root filesystem is on /dev/hda6, but rcS said:
> | .. checking root file system
> |  fsck.ext2: attempt to read block from filesystem resulted in short read
> while trying to open /dev/hda1
> |  Could this be a zero length partition?

It's trying to fsck your extended partition rather than your true root
partition.  My guess is that /etc/fstab is wrong, and that it lists
/dev/hda1 as root instead of /dev/hda6.  The fact that you get to
single-user mode means that the kernel and LILO are configured correctly.
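To check that guess from the maintenance shell, print what /etc/fstab says is mounted on /. A minimal sketch; fstab_root is a hypothetical helper, and the optional file argument is only there so it can be tried on a copy of the file:

```shell
# fstab_root: hypothetical helper. Prints the device field of every
# non-comment fstab entry whose mount point is "/". Defaults to /etc/fstab.
fstab_root() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1 }' "${1:-/etc/fstab}"
}

# Usage: with the partition layout above this should print /dev/hda6.
# If it prints /dev/hda1, remount / read-write before editing the file:
#   fstab_root
#   mount -n -o remount,rw /
```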


HTH,


John P.
-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.mdt.net.au/~john Debian Linux admin & support:technical services



Help: disk problems..

2000-05-12 Thread Gregory Guthrie

Help!!

I had a working system, and after several weeks up we were moving some 
(Apache) files around and wanted to make sure that the system setup for 
Apache was OK, so we re-booted.

At re-boot I now get:
|--
|  ...
| .. checking root file system
|  fsck.ext2: attempt to read block from filesystem resulted in short read while trying to open /dev/hda1
|  Could this be a zero length partition?
|
| fsck failed. please repair manually and re-boot. Please note that the root file system is mounted Read-only,
| 
|give root password for maintenance.
|---
Ok, I go root, and look around. everything seems OK, all files there.

fdisk shows partitions OK:
|--
|  boot   device      format       size   start   end
|   *     /dev/hda1   Extended     1580M     2    785
|         hda2        DOS FAT-16     50M     2     27
|   *     hda6        Linux ext2   1250M    28    662
|         hda7        linux swap    250M   663    785
|-
The 'v' option to fdisk (check partition table) says: 8249 unallocated sectors

running e2fsck from a rescue floppy on /dev/hda6 gives (immediately):
 e2fsck
 /dev/hda6: clean, 29621/641024 files, 542724/3560288 blocks.

It seems to report this Immediately, no time, no disk work.   (???)

I tried re-writing the boot block to the first partition (the one reported 
troublesome), /dev/hda1, seems to work fine.


Running df shows some things that look like problems:
   filesystem   1024-blocks   used     avail     capacity   mount
 /dev/hda6      2476090       458526   1889550   20%        /
 /proc          "             "        "         "          /proc
 none           "             "        "         "          /proc

---

So, what to do? I would really hate to lose all the work put into this new 
system.
I tried to do a tar backup over the network, but since the system never 
finished booting, it hasn't got networking up yet.


I can't figure out what happened, what's wrong, or what to do.

Thanks,
Gregory Guthrie
(please also reply by Email)



Gregory Guthrie
[EMAIL PROTECTED] (515)472-1125Fax: -1103
   Computer Science Department
   College of Science and Technology
   Maharishi University of Management
   http://www.mum.edu/csdept




Re: Backup of MBR (was Hard disk problems)

1999-06-30 Thread Dean


I'm not an expert, but it was my understanding that a low-level format
was only done at the factory, as it required special equipment, and that
a new hard drive was less expensive. Also, I thought LILO had a backup
utility for the MBR.

Dean
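On the MBR backup question: as far as I know, LILO saves the boot sector it replaces as /boot/boot.NNNN (for example /boot/boot.0300 for /dev/hda), and "lilo -u" copies it back. A more generic approach is to keep your own copy with dd. The MBR is the first 512 bytes of the disk: 446 bytes of boot code, the 64-byte partition table and a 2-byte signature, so writing back only the first 446 bytes restores the boot loader without touching the partition table. A sketch; backup_mbr and restore_boot_code are hypothetical helpers, and the paths in the comments are only examples (double-check the device name before writing to a disk with dd):

```shell
# backup_mbr: hypothetical helper. Copies the first 512-byte sector of a
# disk (or disk image) to a file, e.g. backup_mbr /dev/hda /root/hda-mbr.bin
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1
}

# restore_boot_code: hypothetical helper. Writes back only the first 446
# bytes (boot code) from a saved copy, leaving the partition table and the
# 0x55AA signature on the target as they are. conv=notrunc stops dd from
# truncating the target when it is a regular file (a disk image).
# e.g. restore_boot_code /root/hda-mbr.bin /dev/hda
restore_boot_code() {
    dd if="$1" of="$2" bs=446 count=1 conv=notrunc
}
```

Restoring all 512 bytes instead would also bring back the old partition table, which is only what you want if the partitions have not changed since the backup was made.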
> > Low-level format is *not* needed any more -- that is, as long as your hard
> > drive isn't fubared (as in fscked up beyond all recognition). It may not be
> > called a low-level format, either. I have a couple of Maxtor drives and
> > there is a Maxtor utility for them which will do the low-level format. In
> > the utility itself the operation is called a "Write test", but the docs
> > explain what it is. Basically, this will re-write the MBR and all the
> > sectors and "restore the drive to the condition in which it was shipped". I
> > had to use it once when I managed to screw up my MBR and its backup as
> > well. Thankfully, this happened when I was installing a drive so there was
> > no data on it.
> >
> Do you know a generic way to use the backup copy of the MBR, because I
> once accidentally hosed my MBR and had to reinstall.
> 
> Thanks,
> Pete.
> 
> --
> Unsubscribe?  mail -s unsubscribe [EMAIL PROTECTED] < /dev/null


Re: Backup of MBR (was Hard disk problems)

1999-06-30 Thread Peter Ross
On 30-Jun-1999, [ Kaa [EMAIL PROTECTED]@hotmail.com <[EMAIL PROTECTED]> wrote:
> >From: charles kaufman <[EMAIL PROTECTED]>
> 
> >I'm trying to avoid repartitioning. I will if nothing else works.
> >But I don't know what 'low level format' means. I remember doing that
> >for DOS before there was IDE, but thought it wasn't needed anymore.
> >Thanks for all the information.
> >Chuck Kaufman
> >
> 
> Low-level format is *not* needed any more -- that is, as long as your hard 
> drive isn't fubared (as in fscked up beyond all recognition). It may not be 
> called a low-level format, either. I have a couple of Maxtor drives and 
> there is a Maxtor utility for them which will do the low-level format. In 
> the utility itself the operation is called a "Write test", but the docs 
> explain what it is. Basically, this will re-write the MBR and all the 
> sectors and "restore the drive to the condition in which it was shipped". I 
> had to use it once when I managed to screw up my MBR and its backup as 
> well. Thankfully, this happened when I was installing a drive so there was 
> no data on it.
> 
Do you know a generic way to use the backup copy of the MBR, because I
once accidentally hosed my MBR and had to reinstall.

Thanks,
Pete.


Re: Hard disk problems

1999-06-30 Thread [ Kaa ]

From: charles kaufman <[EMAIL PROTECTED]>



I'm trying to avoid repartitioning. I will if nothing else works.
But I don't know what 'low level format' means. I remember doing that
for DOS before there was IDE, but thought it wasn't needed anymore.
Thanks for all the information.
Chuck Kaufman



Low-level format is *not* needed any more -- that is, as long as your hard 
drive isn't fubared (as in fscked up beyond all recognition). It may not be 
called a low-level format, either. I have a couple of Maxtor drives and 
there is a Maxtor utility for them which will do the low-level format. In 
the utility itself the operation is called a "Write test", but the docs 
explain what it is. Basically, this will re-write the MBR and all the 
sectors and "restore the drive to the condition in which it was shipped". I 
had to use it once when I managed to screw up my MBR and its backup as 
well. Thankfully, this happened when I was installing a drive so there was 
no data on it.





Kaa






Re: Hard disk problems

1999-06-30 Thread charles kaufman
Dear Kaa: 
Thanks for the suggestions. 

On Tue, 29 Jun 1999, [ Kaa [EMAIL PROTECTED]@hotmail.com wrote:

> Yes, but given that the kernel believes there is a FAT12 partition, it seems 
> that there is something wrong with the partition table or at least the 
> reading thereof.
> 
> Is the low-level reformatting/repartitioning an option?
I'm trying to avoid repartitioning. I will if nothing else works. 
But I don't know what 'low level format' means. I remember doing that
for DOS before there was IDE, but thought it wasn't needed anymore.
Thanks for all the information.
Chuck Kaufman


Re: Hard disk problems

1999-06-29 Thread [ Kaa ]





From: charles kaufman <[EMAIL PROTECTED]>

> Thanks for the hint. Of course I don't know whether it's a BIOS disk
> geometry problem. In fact fdisk says the disk has 1027 cylinders.
                                                    ^^^^

> But it reports hda1 (dos) is 1 to 64, hda2 (linux) is 65 to 192,
> and hda3 (linux swap) is 193 to 205. That's beyond 512 MB but
> well within 1024 cylinders.


Here is a quote from the Large-Disk HOWTO:

 Suppose you have a disk with more than 1024 cylinders.  Suppose
 moreover that you have an operating system that uses the old INT13
 BIOS interface to disk I/O.  Then you have a problem, because this
 interface uses a 10-bit field for the cylinder on which the I/O is
 done, so that cylinders 1024 and past are inaccessible.

 Fortunately, Linux does not use the BIOS, so there is no problem.

 Well, except for two things:

 (1) When you boot your system, Linux isn't running yet and cannot save
 you from BIOS problems.  This has some consequences for LILO and
 similar boot loaders.

 (2) It is necessary for all operating systems that use one disk to
 agree on where the partitions are.  In other words, if you use both
 Linux and, say, DOS on one disk, then both must interpret the
 partition table in the same way.  This has some consequences for the
 Linux kernel and for fdisk.
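The 10-bit cylinder limit in the quote can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the geometries it uses (16 heads and 63 sectors per track, or 255 heads) are common translated geometries, not values from Chuck's disk, whose geometry the thread does not give.

```python
SECTOR_BYTES = 512
INT13_CYLINDERS = 2 ** 10  # a 10-bit field can address cylinders 0..1023


def chs_to_lba(cyl: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear block number.
    Cylinders and heads count from 0, sectors count from 1."""
    if not (0 <= head < heads and 1 <= sector <= sectors_per_track):
        raise ValueError("head or sector out of range for this geometry")
    return (cyl * heads + head) * sectors_per_track + (sector - 1)


def int13_reachable(cyl: int) -> bool:
    """True if the cylinder number fits into the 10-bit INT13 field."""
    return 0 <= cyl < INT13_CYLINDERS


def int13_limit_bytes(heads: int, sectors_per_track: int) -> int:
    """Bytes reachable through INT13 for a given (translated) geometry."""
    return INT13_CYLINDERS * heads * sectors_per_track * SECTOR_BYTES


if __name__ == "__main__":
    print(int13_limit_bytes(16, 63))    # 528482304 bytes, about 504 MiB
    print(int13_limit_bytes(255, 63))   # 8422686720 bytes, about 8.4 GB
    print(int13_reachable(205))         # True
    print(int13_reachable(1026))        # False: the last of 1027 cylinders
```

With 16 heads and 63 sectors per track, 1024 cylinders cover 528,482,304 bytes, the classic "528 MB" barrier. Chuck's partitions end at cylinder 205, well below cylinder 1023, so by this arithmetic the 1024-cylinder limit only affects the last three of his 1027 cylinders, which no partition uses.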

... and another possibly relevant piece:


 Another point is that the boot loader and the BIOS must agree as to
 the disk geometry.  LILO asks the kernel for the geometry, but more
 and more authors of disk drivers follow the bad habit of deriving a
 geometry from the partition table, instead of telling LILO what the
 BIOS will use. Thus, often the geometry supplied by the kernel is
 worthless. In such cases it helps to give LILO the `linear' option.
 The effect of this is that LILO does not need geometry information at
 boot loader install time (it stores linear addresses in the maps) but
 does the conversion of linear addresses at boot time. Why is this not
 the default?  Well, there is one disadvantage: with the `linear'
 option, LILO no longer knows about cylinder numbers, and hence cannot
 warn you when part of the kernel was stored above the 1024 cylinder
 limit, and you may end up with a system that does not boot.





> However lilo works fine and the kernel boots fine. The trouble only
> comes late in the startup process, after the partition check (which gives
> the results it should), when it insists on trying to mount / on 03:03
> while lilo and rdev and fstab all (seem to) have been told that the
> root device is hda2, not hda3.


Yes, but given that the kernel believes there is FAT12 partition, it seems 
that there is something wrong with the partition table or at least the 
reading thereof.
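One way to test Kaa's suspicion is to decode the partition table directly. Each of the four primary entries in the MBR is 16 bytes long, the first starting at byte 446, and each carries a one-byte partition type (0x01 is FAT12, 0x06 FAT16, 0x82 Linux swap, 0x83 Linux). The Python sketch below parses such a sector, for example one saved with `dd if=/dev/hda of=mbr.bin bs=512 count=1`. Its type table is deliberately a small subset of the known IDs, and it does not follow logical partitions inside an extended partition.

```python
import struct

PARTITION_TYPES = {
    0x00: "empty",
    0x01: "FAT12",
    0x04: "FAT16 (<32M)",
    0x05: "extended",
    0x06: "FAT16",
    0x82: "Linux swap",
    0x83: "Linux",
}


def parse_partition_table(sector: bytes):
    """Return (number, bootable, type_id, type_name, start_lba, num_sectors)
    for each non-empty primary entry in a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        # status, CHS start, type, CHS end, LBA start, sector count (little-endian)
        status, _chs_start, ptype, _chs_end, lba, count = struct.unpack(
            "<B3sB3sII", raw)
        if ptype == 0:
            continue
        name = PARTITION_TYPES.get(ptype, f"unknown (0x{ptype:02x})")
        entries.append((i + 1, status == 0x80, ptype, name, lba, count))
    return entries
```

Calling `parse_partition_table(open("mbr.bin", "rb").read(512))` lists one tuple per primary partition. A FAT12 entry where a Linux or swap partition is expected, or start and size values that do not match the cylinder ranges Chuck quotes, would point at the table itself rather than at LILO, rdev or fstab.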


Is the low-level reformatting/repartitioning an option?


Kaa






Re: Disk problems

1998-11-03 Thread Michael B. Taylor

On Tue, Nov 03, 1998 at 07:32:19AM -0500, Biciunas, Paul John wrote:
> Hello, all.
> 
> I installed Debian 2.0 (2.0.34) Greenbush distribution.
> My disks are 2 IDE drives, a 540M and 2.5G slave.
> 
> The partitions are (df output)
> /dev/hda1 99029   ... /
> /dev/hda3   348873   ... /home
> /dev/hdb1   495714   ... /var
> /dev/hdb2 1926659   ... /usr
> 
> I had problems making a kernel, but finally managed to build a bzImage.
> The kernel booted, and upon testing it I realized that I needed to rebuild
> the kernel.
> That's when the fun started. The make failed when it couldn't process 
> some .c files in /usr/src/linux/lib/ - "file" said they were MPEG files.
> 
> When I fired up emacs, it complained about not being able to find
> /usr/local/share/emacs/... files, and sure enough, /usr/local/share was
> no longer a directory, but some .c file.
> 
> Running fsck was a nightmare.
> Instead, I rebooted from the installation cdrom, repartitioned the disks,
> checked for bad blocks (passed), and started dselect. On Install, I got
> multiple errors like this:
> 
> EXT_fs error (device 03:42): ext2_find_entry : bad entry in directory #8193 : rec_len is too small for name_len - offset 0, inode 538976288, rec_len=8224, name_len=8224
> 
> Is my disk toast? 

That would be my guess.  Unfortunately, it looks like the newer 2.5G
drive is the one going south.
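The numbers in Paul's error message can be decoded to see which drive they point at. The Python sketch below assumes, as 2.0-era kernels did, that the device is printed as `major:minor` in hexadecimal, and that major 3 is the first IDE interface with 64 minor numbers per drive (hda uses minors 0-63, hdb 64-127). Under those assumptions `03:42` is hdb2, which in Paul's earlier df listing was /usr on the 2.5G slave (the layout may have changed when he repartitioned). The second helper, also illustrative, checks that the inode number 538976288 and the value 8224 in the same message are 0x20202020 and 0x2020, that is, runs of ASCII spaces.

```python
def decode_kdev(text: str) -> str:
    """Decode a device string such as '03:42' (assumed hex major:minor)
    into an IDE name on the first interface, e.g. 'hdb2'."""
    major_s, minor_s = text.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != 3:
        raise ValueError(f"major {major} is not the first IDE interface")
    unit, partition = divmod(minor, 64)      # 64 minor numbers per IDE drive
    if unit > 1:
        raise ValueError("minor number out of range for hda/hdb")
    return "hd" + "ab"[unit] + (str(partition) if partition else "")


def spaces_pattern(value: int, width: int) -> bool:
    """True if every byte of the little-endian field is 0x20 (ASCII space)."""
    return value.to_bytes(width, "little") == b" " * width


print(decode_kdev("03:42"))                                  # hdb2
print(spaces_pattern(538976288, 4), spaces_pattern(8224, 2))  # True True
```

A directory entry full of spaces looks like text written over a directory block. That fits the corruption Mike describes, but by itself it does not say whether the drive, the cable or the controller put it there.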

To test this theory, unplug the data cable from the 2.5G and jumper the
540M as a solo master.  Reinstall from scratch, including repartitioning.
That way, you can be sure that you are not starting out with any corrupt
files.

I think, however, that IDE drives can fail in ways that induce errors
in drives mounted on the same controller, so it *could* be that the
540M is failing.  If the above does not clear up the problem, unplug the
540M, set up the 2.5G as solo master, and try again.

If the problem still persists, it is probably a controller problem, or 
something I missed.

If you are not using the kernel-package for your kernel compiles, I suggest
you check it out.  It makes things much easier.  The documentation will
be in /usr/doc.

Mike


Disk problems

1998-11-03 Thread Biciunas, Paul John
Hello, all.

I installed Debian 2.0 (2.0.34) Greenbush distribution.
My disks are 2 IDE drives, a 540M and 2.5G slave.

The partitions are (df output)
/dev/hda1 99029   ... /
/dev/hda3   348873   ... /home
/dev/hdb1   495714   ... /var
/dev/hdb2 1926659   ... /usr

I had problems making a kernel, but finally managed to build a bzImage.
The kernel booted, and upon testing it I realized that I needed to rebuild
the kernel.
That's when the fun started. The make failed when it couldn't process 
some .c files in /usr/src/linux/lib/ - "file" said they were MPEG files.

When I fired up emacs, it complained about not being able to find
/usr/local/share/emacs/... files, and sure enough, /usr/local/share was
no longer a directory, but some .c file.

Running fsck was a nightmare.
Instead, I rebooted from the installation cdrom, repartitioned the disks,
checked for bad blocks (passed), and started dselect. On Install, I got
multiple errors like this:

EXT_fs error (device 03:42): ext2_find_entry : bad entry in directory #8193 : rec_len is too small for name_len - offset 0, inode 538976288, rec_len=8224, name_len=8224

Is my disk toast? 
Any and all help will be appreciated.

-Paul Biciunas
[EMAIL PROTECTED]




Weird Rescue Disk problems

1998-08-05 Thread Matt Kopishke
Hello, I am trying to do a fresh install of Hamm (2.0 Beta CD from Cheap
Bytes). My hardware does not support CD booting, and I don't have DOS
drivers for my CD-ROM, so I need to use a rescue disk to get things going.
But when I make the disk from 1440.bin with Rawrite2, reboot, and watch it
load root.bin and then Linux, I get this error (I included the last line
that completed, for reference):

VFS: mounted root (minix filesystem)
Bug in Dynamic linker ld.so ../sysdeps/i386/dl-machines.h:
307:elf_machine_lazy_rel Assertion '!"unexpected PLT reloc type"' failed!

I installed Bo on this machine without a hitch, and had it upgraded to
Hamm using the same CD, but I wanted to do some repartitioning, and now I
can't install Hamm on my new partitions! This is a Cyrix 586. I have heard
of problems with Cyrixes and floppies; could this be it, or is there
something wrong here?
Thanks, 

-Matt-


[EMAIL PROTECTED]
http://www.midcoast.com/~kopishke
http://169.244.147.29   MSAD#40 Home Page
http://169.244.147.29/ss/MVHS Seed Savers Project
http://169.244.147.29/MVCUG/Medomak Valley Computer User Group
   --
  | *To see tomorrow's PC, Look at todays Macintosh* | 
  |*If it says "Windows 95 or better" install Linux!*|
   --


--  
Unsubscribe?  mail -s unsubscribe [EMAIL PROTECTED] < /dev/null


Re: Strange disk problems - file dates out of wack - solved

1997-10-26 Thread Colin R. Telmer
On 26 Oct 1997, Ben Pfaff wrote:

> Philippe Troin <[EMAIL PROTECTED]> writes:
> > > > Unmounted /home without any problems and ran e2fsck with the "check for
> > > > bad blocks" and "force" options. However, the disk seems to be fine.
> > > > Strange.
> 
> You might want to try the debugfs program.  Perhaps it can unlink the
> files.

I ran debugfs -w /dev/hda6, changed into the appropriate directory, rm'd
the files (some complaints, but it proceeded), quit, and the files were
gone :). I then ran an e2fsck -fvcy /dev/hda6 (unmounted) and it repaired a
few screwy inodes. Everything seems to be fine now. Thanks for all the
help. Cheers, Colin. 

--
Colin Telmer, Institute of Intergovernmental Relations
School of Policy Studies Building, Room 309, Queen's University
Kingston, Ontario, Canada, K7L-3N6 (613)545-6000x4219   




--
TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
[EMAIL PROTECTED] . 
Trouble?  e-mail to [EMAIL PROTECTED] .


RE: Strange disk problems - file dates out of wack

1997-10-26 Thread Ted Harding

On 26-Oct-97 Colin R. Telmer wrote:
> 
> I have done an e2fsck -fnv but it also did not reveal any problems (output
> below). However, it did reveal the existence of 27 block device files that
> I assume have no reason to be under /home. I'm at a loss - any other
> suggestions? Thanks, Colin.
> 
> frisch:/# e2fsck -fnv /dev/hda6
> e2fsck 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
> 
>24468 regular files
> 1128 directories
>   19 character device files
>   27 block device files
>3 fifos
>   20 links
>  292 symbolic links (292 fast symbolic links)
>2 sockets
> 
>25959 files  

This ties in with what you first posted -- only more so.

You then showed 1 block device (filetype letter "b"): fsck has found 27.
You showed 2 character devices (filetype letter "c"): fsck has found 19.
It would be most bizarre if any of these things were legitimately under
/home, and they are almost certainly all spurious.

You also showed 2 fifos (named pipes, filetype letter "p"): fsck has
found 3.
fsck has also found 2 sockets. Fifos and sockets are quite possible in
/home, depending on what users are doing, but both of the fifos you showed
had the names of .gif files (though they showed in the listing with size 0,
as fifos should). So at least these two are spurious; probably the sockets
are too.   
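Ted's comparison of file types suggests a quick way to list every device file, FIFO and socket under a tree so that entries like these stand out. The Python sketch below only reads metadata (it never opens or modifies the files) and does not follow symbolic links. The function name is illustrative, and on a filesystem this damaged it is safer to run it with the filesystem mounted read-only.

```python
import os
import stat


def find_special_files(root: str):
    """Walk root (without following symlinks) and yield
    (path, kind, major, minor) for device files, FIFOs and sockets."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError as exc:             # e.g. a damaged inode
                yield (path, f"unreadable ({exc.strerror})", None, None)
                continue
            mode = st.st_mode
            if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
                kind = "block device" if stat.S_ISBLK(mode) else "character device"
                yield (path, kind, os.major(st.st_rdev), os.minor(st.st_rdev))
            elif stat.S_ISFIFO(mode):
                yield (path, "fifo", None, None)
            elif stat.S_ISSOCK(mode):
                yield (path, "socket", None, None)
```

For example, `for entry in find_special_files("/home"): print(entry)` prints one tuple per suspect entry, including the major and minor numbers of any device files.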

This also ties in with the results you got with Philippe Troin's suggestion
of lsattr:

frisch:/home/reevesj/.netscape/cache# lsattr 07
lsattr 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
lsattr: No such device While reading flags on 07 

This, according to the directory listing, purports to be a block device with
major,minor = 73,60, and such a combination corresponds to no device type
that I know of.

frisch:/home/reevesj/.netscape/cache# lsattr 13
lsattr 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
lsattr: Invalid argument While reading flags on 13/cache340259B30125B9F.gif
lsattr: Invalid argument While reading flags on 13/cache343150330010C49.gif
lsattr: No such device While reading flags on 13/cache340259B30115B9F 

The first two files purported to be fifos (pipes) while their names suggest
they should be .gif files, and lsattr has found invalid flags.

The third file purported to be a character device with maj,min = 60,62
(again an unknown type), and lsattr again finds "No such device".

My impression is that so much corrupt info has been written to disk that it
is probably "fubar" (in the original military sense of that expression).
There is a program which allows direct editing of inodes, but it's a very
long shot at the best of times even for experts (which I'm not); and in any
case I reckon attempting to mend the disk by hand needs an expert sitting in
front of the machine. I'll back off now: I think it's time for any real
filesystem experts reading all this evidence to give a considered diagnosis
(and prognosis). I'm only speaking from memories of painful experience, and
general knowledge ...

Best wishes,
Ted.


E-Mail: Ted Harding <[EMAIL PROTECTED]>
Date: 26-Oct-97   Time: 19:26:52





Re: Strange disk problems - file dates out of wack

1997-10-26 Thread Ben Pfaff
Philippe Troin <[EMAIL PROTECTED]> writes:
> > > Unmounted /home without any problems and ran e2fsck with the "check for
> > > bad blocks" and "force" options. However, the disk seems to be fine.
> > > Strange.

You might want to try the debugfs program.  Perhaps it can unlink the
files.
-- 
Ben Pfaff <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
Senders of unsolicited commercial e-mail will receive free 32MB core files!




RE: Strange disk problems - file dates out of wack

1997-10-26 Thread Colin R. Telmer
On Sun, 26 Oct 1997, Ted Harding wrote:

> Try (non-destructively) e2fsck -fnv on the device with these files
> and stand back ... (at any rate pipe it through "less"). I predict
> several thousand lines of possibly alarming information. Depending on what
> you see, you may judge that it's worth taking the chance to give fsck a free
> rein to try to make the filesystem clean (though it may zap some stuff in
> so doing); or else raw-backup (dd to another device) the bytes on the device
> and then either do fsck, or reformat the filesystem, or replace the hard
> drive.

I have done an e2fsck -fnv but it also did not reveal any problems (output
below). However, it did reveal the existence of 27 block device files that
I assume have no reason to be under /home. I'm at a loss - any other
suggestions? Thanks, Colin.

frisch:/# e2fsck -fnv /dev/hda6
e2fsck 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

   25948 inodes used (4%)
 902 non-contiguous inodes (3.5%)
 # of inodes with ind/dind/tind blocks: 2230/249/41
 1169173 blocks used (54%)
   0 bad blocks

   24468 regular files
1128 directories
  19 character device files
  27 block device files
   3 fifos
  20 links
 292 symbolic links (292 fast symbolic links)
   2 sockets

   25959 files  



--
Colin Telmer, Institute of Intergovernmental Relations
School of Policy Studies Building, Room 309, Queen's University
Kingston, Ontario, Canada, K7L-3N6 (613)545-6000x4219   






Re: Strange disk problems - file dates out of wack

1997-10-26 Thread Colin R. Telmer
On Sun, 26 Oct 1997, Philippe Troin wrote:

> > Looking more closely at /reevesj/.netscape/cache, one finds:
> > 
> > br--r-srwx   1 2878729728 73,  60 May 21  2025 07
> > 
> > Notice the date and the permissions!  Whatever this is, I cannot remove
> > it, even using "rm -f", as root! I also cannot change the permissions.
> 
> What does lsattr say? Maybe it's an immutable file (chattr it).

frisch:/home/reevesj/.netscape/cache# lsattr 07
lsattr 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
lsattr: No such device While reading flags on 07 

frisch:/home/reevesj/.netscape/cache# lsattr 13
lsattr 1.10, 24-Apr-97 for EXT2 FS 0.5b, 95/08/09
lsattr: Invalid argument While reading flags on 13/cache340259B30125B9F.gif
lsattr: Invalid argument While reading flags on 13/cache343150330010C49.gif
lsattr: No such device While reading flags on 13/cache340259B30115B9F 

I also tried chattr -i 07 but got the same error message. Any other ideas?
Thanks for the suggestion. Cheers, Colin.

--
Colin Telmer, Institute of Intergovernmental Relations
School of Policy Studies Building, Room 309, Queen's University
Kingston, Ontario, Canada, K7L-3N6 (613)545-6000x4219   






Re: Strange disk problems - file dates out of wack

1997-10-26 Thread Philippe Troin

On Sun, 26 Oct 1997 09:42:59 EST "Colin R. Telmer" 
([EMAIL PROTECTED]) wrote:

> > Unmounted /home without any problems and ran e2fsck with the "check for
> > bad blocks" and "force" options. However, the disk seems to be fine.
> > Strange.
> 
> Here are the key parts of the original note:
> 
> There are several directories that are claimed (by du) to be absurdly big:
> 
>   501597058   ./reevesj/.netscape/cache/13
>   634965987   .
>   1017117464  ./reevesj/.netscape/cache
>   1017117572  ./reevesj/.netscape
>   1017168521  ./reevesj
> 
> Of course, those numbers are not correct!
> 
> Looking more closely at /reevesj/.netscape/cache, one finds:
> 
> br--r-srwx   1 2878729728 73,  60 May 21  2025 07
> 
> Notice the date and the permissions!  Whatever this is, I cannot remove
> it, even using "rm -f", as root! I also cannot change the permissions.

What does lsattr say? Maybe it's an immutable file (chattr it).

Phil.





RE: Strange disk problems - file dates out of wack

1997-10-26 Thread Ted Harding

On 26-Oct-97 Colin R. Telmer wrote:
> A server in my department has suddenly created (or altered) some files and
> I cannot figure out how to remove them. Below is part of the
> original note sent to me; I have tried various ways to remove the
> files as well, to no avail. I even went to the point of creating a user
> with uid 28757, but that did not help either. One thing that isn't
> mentioned below is that when I tried to remove the files the kernel
> stated "operation not permitted" rather than the usual permissions error.
> Any ideas how I can get rid of these files?


I think it's almost certain that the data on your hard disk has got
corrupted (see below). A possible cause is RAM corruption at a time when
data was being written back to disk during an update.


> -- Forwarded message --
> Date: Fri, 24 Oct 1997 11:29:40 -0400 (EDT)
> From: "James G. MacKinnon" <[EMAIL PROTECTED]>
> To: "Colin R. Telmer" <[EMAIL PROTECTED]>
> Cc: "James G. Mackinnon" <[EMAIL PROTECTED]>,
> [EMAIL PROTECTED]
> Subject: Re: frisch
> 
> On Fri, 24 Oct 1997, Colin R. Telmer wrote:
> 
>> Unmounted /home without any problems and ran e2fsck with the "check for
>> bad blocks" and "force" options. However, the disk seems to be fine.
>> Strange.
> 
> Here are the key parts of the original note:
> 
> There are several directories that are claimed (by du) to be absurdly big:
> 
>   501597058   ./reevesj/.netscape/cache/13
>   634965987   .
>   1017117464  ./reevesj/.netscape/cache
>   1017117572  ./reevesj/.netscape
>   1017168521  ./reevesj
> 
> Of course, those numbers are not correct!
> 
> Looking more closely at /reevesj/.netscape/cache, one finds:
> 
> br--r-srwx   1 2878729728 73,  60 May 21  2025 07
> 
> Notice the date and the permissions!  Whatever this is, I cannot remove
> it, even using "rm -f", as root! I also cannot change the permissions.
> 
> Then, within the directory /reevesj/.netscape/cache/13, one finds:
> 
> c---rwxr-t   1 2494228192 60,  62 Jan 25  2026 cache340259B30115B9F
> pr-s-wxr--   1 3155811396   0 Jan 13  1983 cache340259B30125B9F.gif
> p-ws-wx-wx   1 6019 23682   0 Jan 31  1940 cache343150330010C49.gif
> 
> Notice the dates! Again, it seems to be impossible to remove these or
> change the permissions.

Note also that the entry 07 in /reevesj/.netscape/cache (which should be an
ordinary directory: the first character in its directory listing should be
"d", not "b") now appears as a "block device" ("b") with major number "73"
and minor "60", which are not maj/min numbers known to me. Likewise,
cache340259B30115B9F appears not as a file but as a character device with
major,minor = 60,62, which again is an unknown type; the two .gifs appear
as named pipes ("p").
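The file types and odd permission bits Ted points out can be read straight off the first ten characters of each `ls -l` line. The Python sketch below decodes such a mode string using the usual ls type letters ("-", "d", "l", "b", "c", "p", "s"). The function name is illustrative, and it only interprets the text; it does not look at the disk.

```python
FILE_TYPES = {
    "-": "regular file",
    "d": "directory",
    "l": "symbolic link",
    "b": "block device",
    "c": "character device",
    "p": "named pipe (fifo)",
    "s": "socket",
}


def describe_ls_mode(mode: str) -> dict:
    """Split a 10-character `ls -l` mode string such as 'br--r-srwx'."""
    if len(mode) != 10:
        raise ValueError("expected a 10-character mode string")
    owner, group, other = mode[1:4], mode[4:7], mode[7:10]
    return {
        "type": FILE_TYPES.get(mode[0], f"unknown type {mode[0]!r}"),
        "owner": owner,
        "group": group,
        "other": other,
        # lowercase s/t: special bit plus execute; uppercase S/T: special bit only
        "setuid": owner[2] in "sS",
        "setgid": group[2] in "sS",
        "sticky": other[2] in "tT",
    }
```

For `br--r-srwx` it reports a block device with the setgid bit set, and for `pr-s-wxr--` a named pipe with setuid set. Both are odd combinations for entries in a Netscape cache, which is why these listings look like corrupt inodes rather than real files.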

Given that the very nature of the file types has changed, taken with the
zany dates and sizes etc, it is almost certain that parts of the hard disk
have been written with false data. At the same time, other less obvious
corruptions may have occurred which may make files inaccessible or only
partially accessible, or point to spurious data.

This is the sort of thing that fsck should notice; James MacK  says that
fsck /was/ run, apparently normally, which is puzzling; but apparently only
options "-c -f" were used which may not reveal serious trouble.

Try (non-destructively) e2fsck -fnv on the device with these files
and stand back ... (at any rate pipe it through "less"). I predict
several thousand lines of possibly alarming information. Depending on what
you see, you may judge that it's worth taking the chance to give fsck a free
rein to try to make the filesystem clean (though it may zap some stuff in
so doing); or else raw-backup (dd to another device) the bytes on the device
and then either do fsck, or reformat the filesystem, or replace the hard
drive.
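The raw backup can be done with dd, for instance `dd if=/dev/hda6 of=IMAGE conv=noerror,sync`, where IMAGE is a placeholder for a file on a different physical disk. `noerror` keeps going after read errors, and `sync` pads each short or failed input block with zeros so that later data stays at the right offset. The Python sketch below illustrates the same idea. The function name and the 4096-byte block size are assumptions, the source should be unmounted, and the sketch is not a replacement for dd or for dedicated recovery tools.

```python
import os


def raw_copy(src_path: str, dst_path: str, block_size: int = 4096) -> int:
    """Copy src_path to dst_path block by block, in the spirit of
    `dd conv=noerror,sync`: a block that cannot be read (or comes back short)
    is padded with zero bytes so that later blocks keep their offsets.
    Returns the number of blocks that had to be padded."""
    padded = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)   # on Linux this also works for block devices
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                data = os.pread(src.fileno(), length, offset)
            except OSError:               # e.g. EIO from an unreadable sector
                data = b""
            if len(data) < length:
                padded += 1
                data += b"\0" * (length - len(data))
            dst.write(data)
            offset += length
    return padded
```

A non-zero return value tells you how many blocks in the image are zero-filled stand-ins rather than real data, which is worth knowing before running fsck on the copy.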

In any case it looks pretty dire from here. Sorry.

Best wishes,
Ted.


E-Mail: Ted Harding <[EMAIL PROTECTED]>
Date: 26-Oct-97   Time: 17:00:08





Strange disk problems - file dates out of wack

1997-10-26 Thread Colin R. Telmer
A server in my department has suddenly created (or altered) some files and
I cannot figure out how to remove them. Below is part of the
original note sent to me; I have tried various ways to remove the
files as well, to no avail. I even went to the point of creating a user
with uid 28757, but that did not help either. One thing that isn't
mentioned below is that when I tried to remove the files the kernel
stated "operation not permitted" rather than the usual permissions error.
Any ideas how I can get rid of these files?

--
Colin Telmer, Institute of Intergovernmental Relations
School of Policy Studies Building, Room 309, Queen's University
Kingston, Ontario, Canada, K7L-3N6 (613)545-6000x4219   



-- Forwarded message --
Date: Fri, 24 Oct 1997 11:29:40 -0400 (EDT)
From: "James G. MacKinnon" <[EMAIL PROTECTED]>
To: "Colin R. Telmer" <[EMAIL PROTECTED]>
Cc: "James G. Mackinnon" <[EMAIL PROTECTED]>,
[EMAIL PROTECTED]
Subject: Re: frisch

On Fri, 24 Oct 1997, Colin R. Telmer wrote:

> Unmounted /home without any problems and ran e2fsck with the "check for
> bad blocks" and "force" options. However, the disk seems to be fine.
> Strange.

Here are the key parts of the original note:

There are several directories that are claimed (by du) to be absurdly big:

501597058   ./reevesj/.netscape/cache/13
634965987   .
1017117464  ./reevesj/.netscape/cache
1017117572  ./reevesj/.netscape
1017168521  ./reevesj

Of course, those numbers are not correct!

Looking more closely at /reevesj/.netscape/cache, one finds:

br--r-srwx   1 2878729728 73,  60 May 21  2025 07

Notice the date and the permissions!  Whatever this is, I cannot remove
it, even using "rm -f", as root! I also cannot change the permissions.

Then, within the directory /reevesj/.netscape/cache/13, one finds:

c---rwxr-t   1 2494228192 60,  62 Jan 25  2026 cache340259B30115B9F
pr-s-wxr--   1 3155811396   0 Jan 13  1983 cache340259B30125B9F.gif
p-ws-wx-wx   1 6019 23682   0 Jan 31  1940 cache343150330010C49.gif

Notice the dates! Again, it seems to be impossible to remove these or
change the permissions.

Cheers, James.

James G. MacKinnon   Department of Economics
phone: 613 545-2293  Queen's University
  Fax: 613 545-6668  Kingston, Ontario, Canada
Email: [EMAIL PROTECTED]   K7L 3N6




Re: disk problems

1997-04-08 Thread Mary Conner


On Sun, 6 Apr 1997, Matt Lawrence wrote:

> Ok, I've run out of places to look.  I'm getting occasional "hda:timeout"
> messages and my system is locking up with a disk error after anywhere from
> a few hours to a couple of days.  When it locks up, I can still change
> virtual consoles, but I can't run anything and ctrl-alt-del doesn't work.
> Since I hope to leave this system running unattended in Austin, crashes are
> a very bad thing.  Help???

I'm having this same problem with a 486 with a fairly new motherboard.
It seems to happen most often when there is a fair amount of disk activity
which makes me think it might be triggered by queueing writes to both
disks under certain circumstances.  One of these days when I get the
time I'm going to upgrade the flash BIOS on the motherboard to see if
that helps.  Can you get a new disk controller for your machine to see
if that helps?



HELP!! Boot Disk Problems

1997-04-07 Thread Adam Greene
I ATTEMPTED to install Debian 1.2.4 off a CheapBytes CD, but the boot
disks hung on the md driver and I could not go any further. I have a
working Slackware, so I compiled a ramdisk-enabled kernel, stuck it on the
disk and rebooted. That worked fine, but the kernel seemed to hang on
running rc; when I switched consoles I found an active prompt on console 2,
so I ran dselect and everything went OK. Is there any fix for this?

Also once I installed Abuse (video game) I couldn't get the mouse to 
work and it went quickly through the opening screens (the text didn't 
scroll it just faded the screen in and out again).  I tried turning 
the mouse off (unloading gpm).  But it just hung Abuse.

My mouse is a Microsoft/MouseSystems flip-switch type and it works 
fine with gpm.


disk problems

1997-04-06 Thread Matt Lawrence
Ok, I've run out of places to look.  I'm getting occasional "hda:timeout"
messages and my system is locking up with a disk error after anywhere from
a few hours to a couple of days.  When it locks up, I can still change
virtual consoles, but I can't run anything and ctrl-alt-del doesn't work.
Since I hope to leave this system running unattended in Austin, crashes are
a very bad thing.  Help???

My system config:
Old 40MHz 386
8 meg RAM - (8) 1 meg 30-pin SIMMs
old I/O controller with large DTC 2280 chip.
(2) Quantum 6.4 Gb Bigfoot drives
EGA card

-- Matt