Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-13 Thread Mark Fletcher
On Sun, Feb 12, 2017 at 09:36:16PM -0500, Bob Weber wrote:
> I use a program called ossec.  It watches logs of all my linux boxes so I get
> email messages about disk problems.  I also do periodic self tests on all my
> drives controlled by smartd from the  smartmontools package.  I also use a
> package called logwatch which summarizes my logs.   The messages from mdadm and
> smartd are seen by ossec.  When I mess with an array to make it larger and add a
> disk for backup I get the messages in my mailbox about a degraded array.  As I'm
> reading them I am startled until I remember ...Oh I did that!  I have a daily
> cron job that emails the output of "smartctl -a /dev/sdx" for each drive on each
> machine so I can keep a history of the parameters for each drive.
> 

$ apt-file search ossec

sagan-rules: /etc/sagan-rules/ossec.rules

It seems the only reference to ossec in jessie is this rules file in 
the sagan-rules package. Going by its description, sagan-rules seems 
to be along the right lines. But the sagan package itself does not 
appear to be in jessie: it's in wheezy and in stretch/sid, but not in 
jessie. Any idea what's up with that?

And was ossec packaged, or did you build it from source?

Cheers

Mark



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Shapiro

On 02/12/2017 06:36 PM, Bob Weber wrote:


> After writing this I wonder if I am overdoing this.  I just don't want to lose
> data from a failing drive.  I lived through 3.5 inch floppies which seemed to
> always fail.  And tape drives that were painfully slow.  Not to mention back in
> the mid 70s saving Z80 programs and data to audio cassette tapes at 1200 baud!
> I was so glad to get my first 8 inch floppies working.

...Bob

I, too, remember the cassette tapes for saving files and programs on my 
TRS-80 Model III.  I think I still have a few of those tapes (10-minute 
tapes for a single program) lying around.  The Radio Shack cassette 
player has long since died, however.



Marc



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Bob Weber
On 02/12/2017 01:59 PM, Marc Shapiro wrote:
> On 02/12/2017 08:30 AM, Marc Auslander wrote:
>> I do not use LVM over raid 1.  I think it can be made to work,
>> although IIRC booting from an LVM over RAID partition has caused issues.
> My boot partitions are separate.  They are not under LVM.
>> LVM is useful when space requirements are changing over time and the
>> ability to add additional disks and grow logical partitions is needed.
>> In my case, that isn't an issue.  I have only a small number of
>> partitions - 3 because of history but starting from scratch, I'd only
>> have two - root (including boot) and /home.
> I started using LVM when I had a much smaller disk (40GB).  With the current
> 1TB disk, even with three accounts on the box, and expanding several
> partitions when moving to the new disk, I have still partitioned less than
> half the disk and that is less than 1/3 used. So, no, LVM is probably not an
> issue any more.
>
> BTW, what is your third partition, and why would you not separate it now if
> starting from scratch?
>> I converted to mdadm raid as follows, IIRC.
>>
>> Install the second disk, and partition it the way I wanted.
>> Create a one disk raid 1 partition in each of the new partitions.
>> Take down my system, boot a live system from CD, and use a reliable
>> copy program like rsync to copy each of the partitions' contents to the
>> equivalent raid partition.
>> Run grub to set the new disk as bootable.  This is by far the
>> trickiest part.
>> Boot the new system and verify it's happy.
>> Repartition the now spare disk to match the new one if necessary.
>> You may need to zero the front of each partition with dd if=/dev/zero
>> to avoid mdadm error checks.
>> Add the partitions from that disk to the mdadm partitions and let mdadm
>> do its thing.
>>
> On 02/12/2017 07:08 AM, Bob Weber wrote:
>>
>> I use raid 1 also for the redundancy it provides.  If I need a backup I just
>> connect a disk, grow each array and add it to the array (I have 3 arrays for
>> /, /home and swap).  It syncs up in a couple hours (depending on size of the
>> array).  If you have grub install itself on the added disk you have a
>> bootable copy of your system (mdadm will complain about a degraded array).  I
>> then remove the drive and place it in another outbuilding in case of fire. 
>> You can even use an external USB disk housing for the drive to keep from
>> shutting down the system.  The sync is MUCH slower ... just come back the
>> next day and you will have your backup.  You then grow each array back to the
>> number of disks you had before and all is happy again.  Note that this single
>> disk backup will only work with raid 1.
>>
> So, how do you do a complete restore from backup?  Boot from just the single
> backup drive and add additional drives as Marc Auslander describes, above?

Yes, that is what you would do after a complete failure of your machine,
for example if you had to start over with a new motherboard and power supply.

>
>
> One other question.  If using raid, how do you know when a disk is starting to
> have trouble, as mine did?  Since the whole purpose of raid is to keep the
> system up and running I wouldn't expect errors to pop up like I was getting. 
> Do you have to keep an eye on log files?  Which ones?  Or is there some other
> way that mdadm provides notification of errors?  I've got to admit, even
> though I have been using Debian for 18 or 19 years (since Bo), log files have
> never been my favorite thing.  I generally only look at them when I have a
> problem and someone on this list tells me what to look for and where.
>
> Marc
>
>
I use a program called ossec.  It watches logs of all my linux boxes so I get
email messages about disk problems.  I also do periodic self tests on all my
drives controlled by smartd from the  smartmontools package.  I also use a
package called logwatch which summarizes my logs.   The messages from mdadm and
smartd are seen by ossec.  When I mess with an array to make it larger and add a
disk for backup I get the messages in my mailbox about a degraded array.  As I'm
reading them I am startled until I remember ...Oh I did that!  I have a daily
cron job that emails the output of "smartctl -a /dev/sdx" for each drive on each
machine so I can keep a history of the parameters for each drive.
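
A minimal sketch of the kind of daily cron job described above, for anyone who wants a starting point. The function names, the recipient, the device list, and the use of the "mail" command are assumptions for illustration and are not taken from the actual setup; only "smartctl -a" comes from the description.

```shell
#!/bin/sh
# Sketch of a daily cron job that mails "smartctl -a" output per drive.
# Assumptions (not from the thread): function names, the recipient, the
# drive list, and the "mail" command are placeholders.

# Build the subject line for one drive's report.
smart_subject() {
    # $1 = hostname, $2 = device, $3 = date (YYYY-MM-DD)
    printf 'SMART report %s %s %s\n' "$1" "$2" "$3"
}

# Mail the full SMART output of every drive given after the recipient.
mail_smart_reports() {
    recipient=$1
    shift
    for dev in "$@"; do
        smartctl -a "$dev" 2>&1 |
            mail -s "$(smart_subject "$(hostname)" "$dev" "$(date +%F)")" "$recipient"
    done
}

# Example call from root's crontab script (hypothetical devices):
# mail_smart_reports root /dev/sda /dev/sdb
```

Putting the date and device in the subject makes it easy to file the mails per drive and compare parameters over time.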

I also use backuppc on a dedicated server to back up all my boxes.  That way I
can get back files I deleted by mistake, or modified and then had to revert to a
previous version.  I now have all my machines on raid 1.  My wife just recently
gave up on Win 10 with all those updates that just took over her machine when
Windows wanted to!  So now she is running Debian/KDE.

After writing this I wonder if I am overdoing this.  I just don't want to lose
data from a failing drive.  I lived through 3.5 inch floppies which seemed to
always fail.  And tape drives that were painfully slow.  Not to mention back in
the mid 70s saving Z80 programs and data to audio cassette tapes at 1200 baud!
I was so glad to get my first 8 inch floppies working.

...Bob

Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Auslander
Marc Shapiro  writes:

> BTW, what is your third partition, and why would you not separate it
> now if starting from scratch?
My third partition is for backups, which I make to protect against
software or operator error.  At one point it was on a separate disk,
since disks were small and, without LVM, it had to be a different
partition/file system.
>
>
> One other question.  If using raid, how do you know when a disk is
> starting to have trouble, as mine did?  Since the whole purpose of
...
> Marc

Ok - I'm pretty paranoid about that.  SMART is checking the drives.
mdadm will notice if a disk is bad and turn
it off, so to speak.  Again, that shows up in the logs.
I run a cron job to check for SMART errors based on:

smartctl -l error -q errorsonly "device"
smartctl -H -q errorsonly "device"

But I've always checked all my disks once a week.  A root cron job
reads the whole disk with dd into /dev/null.  Any errors get logged, of
course.  Separately, a cron job scans syslog and syslog.1, grepping for
"IO Error", and informs me by email if any new errors are found.  This
catches errors in the dd check but also actual errors in operation.
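
The checks above could be sketched roughly as follows. This is only an illustration: the function names, the dd block size, and the log file paths are assumptions, and the grep pattern is deliberately loosened so that it matches both "IO Error" and the "I/O error" spelling, in any letter case. Tracking which errors are new since the last run is left out.

```shell
#!/bin/sh
# Sketch of the SMART, full-read, and log-scan checks.
# Function names, block size, and log paths are assumptions.

# Print SMART error-log entries and health failures; silent when clean.
check_smart() {
    smartctl -l error -q errorsonly "$1"
    smartctl -H -q errorsonly "$1"
}

# Read the whole device once; read errors end up in the system log.
read_whole_disk() {
    dd if="$1" of=/dev/null bs=1M
}

# Print lines that look like I/O errors from the given log files.
scan_logs() {
    grep -hiE 'I/?O error' "$@"
}

# Example use from cron (hypothetical device and log paths):
# check_smart /dev/sda
# read_whole_disk /dev/sda
# scan_logs /var/log/syslog /var/log/syslog.1
```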



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Shapiro

On 02/12/2017 08:30 AM, Marc Auslander wrote:

> I do not use LVM over raid 1.  I think it can be made to work,
> although IIRC booting from an LVM over RAID partition has caused issues.

My boot partitions are separate.  They are not under LVM.

> LVM is useful when space requirements are changing over time and the
> ability to add additional disks and grow logical partitions is needed.
> In my case, that isn't an issue.  I have only a small number of
> partitions - 3 because of history but starting from scratch, I'd only
> have two - root (including boot) and /home.
I started using LVM when I had a much smaller disk (40GB).  With the 
current 1TB disk, even with three accounts on the box, and expanding 
several partitions when moving to the new disk, I have still partitioned 
less than half the disk and that is less than 1/3 used. So, no, LVM is 
probably not an issue any more.


BTW, what is your third partition, and why would you not separate it now 
if starting from scratch?

> I converted to mdadm raid as follows, IIRC.
>
> Install the second disk, and partition it the way I wanted.
> Create a one disk raid 1 partition in each of the new partitions.
> Take down my system, boot a live system from CD, and use a reliable
> copy program like rsync to copy each of the partitions' contents to the
> equivalent raid partition.
> Run grub to set the new disk as bootable.  This is by far the
> trickiest part.
> Boot the new system and verify it's happy.
> Repartition the now spare disk to match the new one if necessary.
> You may need to zero the front of each partition with dd if=/dev/zero
> to avoid mdadm error checks.
> Add the partitions from that disk to the mdadm partitions and let mdadm
> do its thing.


On 02/12/2017 07:08 AM, Bob Weber wrote:


> I use raid 1 also for the redundancy it provides.  If I need a backup 
> I just connect a disk, grow each array and add it to the array (I have 
> 3 arrays for /, /home and swap).  It syncs up in a couple hours 
> (depending on size of the array).  If you have grub install itself on 
> the added disk you have a bootable copy of your system (mdadm will 
> complain about a degraded array).  I then remove the drive and place 
> it in another outbuilding in case of fire.  You can even use an 
> external USB disk housing for the drive to keep from shutting down the 
> system.  The sync is MUCH slower ... just come back the next day and 
> you will have your backup.  You then grow each array back to the 
> number of disks you had before and all is happy again.  Note that this 
> single disk backup will only work with raid 1.


So, how do you do a complete restore from backup?  Boot from just the 
single backup drive and add additional drives as Marc Auslander 
describes, above?



One other question.  If using raid, how do you know when a disk is 
starting to have trouble, as mine did?  Since the whole purpose of raid 
is to keep the system up and running I wouldn't expect errors to pop up 
like I was getting.  Do you have to keep an eye on log files?  Which 
ones?  Or is there some other way that mdadm provides notification of 
errors?  I've got to admit, even though I have been using Debian for 18 
or 19 years (since Bo), log files have never been my favorite thing.  I 
generally only look at them when I have a problem and someone on this 
list tells me what to look for and where.


Marc



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Auslander
Marc Shapiro  writes:

> the past couple of weeks.  AIUI you can use LVM over raid.  Is there
> any actual advantage to this?  I was trying to determine the
> advantages of using straight raid, straight LVM, or LVM over raid.  If
> I decide, later, to use raid, how difficult is it to add to a currently
> running system (with, or without LVM)?
>
>
> Marc
I do not use LVM over raid 1.  I think it can be made to work,
although IIRC booting from an LVM over RAID partition has caused issues.

LVM is useful when space requirements are changing over time and the
ability to add additional disks and grow logical partitions is needed.
In my case, that isn't an issue.  I have only a small number of
partitions - 3 because of history but starting from scratch, I'd only
have two - root (including boot) and /home.

I converted to mdadm raid as follows, IIRC.

Install the second disk, and partition it the way I wanted.
Create a one disk raid 1 partition in each of the new partitions.
Take down my system, boot a live system from CD, and use a reliable
copy program like rsync to copy each of the partitions' contents to the
equivalent raid partition.
Run grub to set the new disk as bootable.  This is by far the
trickiest part.
Boot the new system and verify it's happy.
Repartition the now spare disk to match the new one if necessary.
You may need to zero the front of each partition with dd if=/dev/zero
to avoid mdadm error checks.
Add the partitions from that disk to the mdadm partitions and let mdadm
do its thing.
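
For one partition, the steps above might translate into something like the following. This is a dry-run sketch that only prints commands. The function name, device names, and mount points are placeholders, and creating the array with the "missing" keyword, formatting it as ext4, and the rsync options are assumptions rather than a record of what was actually run.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for one partition, runs nothing.
# Device names and mount points are placeholders. Creating the array with
# the keyword "missing" and formatting it as ext4 are assumptions.
raid1_convert_plan() {
    md=$1        # new array, e.g. /dev/md0
    new=$2       # partition on the new disk
    old=$3       # matching partition on the old disk
    src=$4       # where the old filesystem is mounted
    dst=$5       # where the new array is mounted
    cat <<EOF
mdadm --create $md --level=1 --raid-devices=2 $new missing
mkfs.ext4 $md
mount $md $dst
rsync -aHAXx $src/ $dst/
# install grub on the new disk, reboot, and verify before going on
dd if=/dev/zero of=$old bs=1M count=10
mdadm $md --add $old
EOF
}

# Example (hypothetical names):
# raid1_convert_plan /dev/md0 /dev/sdb1 /dev/sda1 /mnt/old /mnt/new
```

Printing the plan first makes it possible to read every command before anything touches a disk. The dd line destroys the start of the old partition, so it belongs strictly after the new system has been verified.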



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Bob Weber
I use raid 1 also for the redundancy it provides.  If I need a backup I just
connect a disk, grow each array and add it to the array (I have 3 arrays for /,
/home and swap).  It syncs up in a couple hours (depending on size of the
array).  If you have grub install itself on the added disk you have a bootable
copy of your system (mdadm will complain about a degraded array).  I then remove
the drive and place it in another outbuilding in case of fire.  You can even use
an external USB disk housing for the drive to keep from shutting down the
system.  The sync is MUCH slower ... just come back the next day and you will
have your backup.  You then grow each array back to the number of disks you had
before and all is happy again.  Note that this single disk backup will only work
with raid 1.
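
For a single array, the grow/add/remove cycle could look like the commands printed below. This is a dry-run sketch: the function names and the array and partition names are placeholders, and the exact invocations (including waiting with "mdadm --wait") are assumptions about one way to do it.

```shell
#!/bin/sh
# Dry-run sketch: prints mdadm commands, runs nothing.
# Array and partition names are placeholders; $3 is the normal
# number of member disks in the array.

# Commands to grow the array by one slot, add the backup disk, and wait.
backup_attach_plan() {
    md=$1
    part=$2
    n=$3
    echo "mdadm --grow $md --raid-devices=$((n + 1))"
    echo "mdadm $md --add $part"
    echo "mdadm --wait $md"
}

# Commands to detach the synced disk and shrink the array again.
backup_detach_plan() {
    md=$1
    part=$2
    n=$3
    echo "mdadm $md --fail $part"
    echo "mdadm $md --remove $part"
    echo "mdadm --grow $md --raid-devices=$n"
}

# Example (hypothetical names, two-disk array):
# backup_attach_plan /dev/md0 /dev/sdc1 2
# backup_detach_plan /dev/md0 /dev/sdc1 2
```

The same pair of calls would be repeated for each array. Installing grub on the backup disk, if a bootable copy is wanted, happens before the detach step.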


*...Bob*
On 02/11/2017 10:42 PM, Marc Shapiro wrote:
> On 02/11/2017 05:22 PM, Marc Auslander wrote:
>> You didn't ask for advice so take it or ignore it.
>>
>> IMHO, in this day and age, there is no reason not to run raid 1.  Two
>> disks, identically partitioned, each partition set up as a raid 1
>> partition with two copies.
>>
>> When a disk dies, you remove it from all the raid partitions, pop in a
>> new disk, partition it,  add the new partitions back into the raid
>> partitions and raid rebuilds the copies.
>>
>> Except for taking the system down to replace the disk (assuming you
>> don't have a third installed as a spare) you just keep running as if
>> nothing has happened.
>>
> I had been considering using raid 1 and I have not yet ruled it out entirely. 
> I have never used raid and have been reading up on it over the past couple of
> weeks.  AIUI you can use LVM over raid.  Is there any actual advantage to
> this?  I was trying to determine the advantages of using straight raid,
> straight LVM, or LVM over raid.  If I decide, later, to use raid, how difficult
> is it to add to a currently running system (with, or without LVM)?
>
>
> Marc
>
>



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread Marc Shapiro

On 02/11/2017 05:22 PM, Marc Auslander wrote:

> You didn't ask for advice so take it or ignore it.
>
> IMHO, in this day and age, there is no reason not to run raid 1.  Two
> disks, identically partitioned, each partition set up as a raid 1
> partition with two copies.
>
> When a disk dies, you remove it from all the raid partitions, pop in a
> new disk, partition it,  add the new partitions back into the raid
> partitions and raid rebuilds the copies.
>
> Except for taking the system down to replace the disk (assuming you
> don't have a third installed as a spare) you just keep running as if
> nothing has happened.

I had been considering using raid 1 and I have not yet ruled it out 
entirely.  I have never used raid and have been reading up on it over 
the past couple of weeks.  AIUI you can use LVM over raid.  Is there any 
actual advantage to this?  I was trying to determine the advantages of 
using straight raid, straight LVM, or LVM over raid.  If I decide, 
later, to use raid, how difficult is it to add to a currently running 
system (with, or without LVM)?



Marc



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread Felix Miata

Marc Auslander composed on 2017-02-11 20:22 (UTC-0500):


IMHO, in this day and age, there is no reason not to run raid 1.

Are you sure? Laptops have been outselling desktops for years.
--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata  ***  http://fm.no-ip.com/



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread Marc Auslander
You didn't ask for advice so take it or ignore it.

IMHO, in this day and age, there is no reason not to run raid 1.  Two
disks, identically partitioned, each partition set up as a raid 1
partition with two copies.

When a disk dies, you remove it from all the raid partitions, pop in a
new disk, partition it,  add the new partitions back into the raid
partitions and raid rebuilds the copies.

Except for taking the system down to replace the disk (assuming you
don't have a third installed as a spare) you just keep running as if
nothing has happened.



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread David Christensen

On 02/10/17 23:39, Marc Shapiro wrote:

On 02/08/2017 05:32 PM, David Christensen wrote:

On 02/08/17 15:59, Marc Shapiro wrote:

So how do I lay down a low level format on [the new 1 TB] drive?

I would use the SeaTools bootable CD to fill the drive with zeroes:
On 02/03/17 23:13, David Christensen wrote:

Sometimes you get lucky and the tool is a live CD:

www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO

I didn't feel like burning a CD and it has been a long time since I had
a box with a 3.5" floppy (although I do have one or two drives in a box
somewhere and quite a few of the floppies themselves, as well)


3.5" floppy?  The link above is for a live CD.


> so I just used dd to write zeros to the disk. It took a while, but it
> did the job.

For a HDD, the effect should be the same.



I partitioned the new disk with 3 physical partitions of 2GB each for
root/boot partitions.  ...
The 4th partition was set up for LVM and was set as a Physical Volume
(PV) to be added to the volume group along with my old drive.


The problem with putting everything on one big disk is that it becomes 
impractical to clone the system image.  I'm still climbing the disk 
imaging learning curve, but it's a useful technique that has saved me 
countless hours.




In the end, I picked yet another method for moving to the new disk. ...


Congratulations on your success battling through it all, especially LVM.


David



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-10 Thread Marc Shapiro

On 02/08/2017 05:32 PM, David Christensen wrote:

On 02/08/17 15:59, Marc Shapiro wrote:

So how do I lay down a low level format on [the new 1 TB] drive?


I would use the SeaTools bootable CD to fill the drive with zeroes:

On 02/03/17 23:13, David Christensen wrote:
> Sometimes you get lucky and the tool is a live CD:
>
> 
www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO



David

I didn't feel like burning a CD and it has been a long time since I had 
a box with a 3.5" floppy (although I do have one or two drives in a box 
somewhere and quite a few of the floppies themselves, as well) so I 
just used dd to write zeros to the disk.  It took a while, but it did 
the job.  In the end, I picked yet another method for moving to the new 
disk.  As mentioned in my first post, I am using LVM and I have unused 
space in the VG. I was debating with myself whether I wanted to continue 
to use LVM, or just use raw disk partitions.  I almost went with raw 
disk partitions before I came across 'pvmove', which does exactly what I 
needed.  So...


I partitioned the new disk with 3 physical partitions of 2GB each for 
root/boot partitions.


The 4th partition was set up for LVM and was set as a Physical Volume 
(PV) to be added to the volume group along with my old drive.


Before adding the new disk, I created a new Logical Volume (LV) and 
manually copied my home partition (one user tree at a time) to the new 
partition.  This spat out errors whenever it hit an unreadable sector 
and I redirected those errors to a file for later use.


I then added the LVM partition from the new disk to the Volume Group 
(VG) and did a 'pvmove' for each LV from the old PV to the new PV.


I included the original LV for /home, along with the newly copied LV.  I 
expected it to spit out errors and fail, but it didn't.  I could hear it 
struggle a bit when it hit the bad spots, but then it kept going.  This 
was actually a good thing.  I had the list of affected files from when I 
did the manual copy of the /home partition, so I knew what to check 
after the move.  Several of the files were videos.  Using the original 
files before copying, Xine would play up to the first I/O Error and then 
freeze, even though it continued to read the file and advance the 
timeline until the file ended.  Using the manually copied file, which 
truncated at the first error, I also only got the beginning of the video 
and then it ended.  Using the file from the original LV which I moved to 
the new disk with pvmove, however, gave better results.  There is a bit 
of flicker when it hits a sector that had been unreadable before moving, 
but it continues on so the rest of the video can be viewed.  A few of 
the other files I did delete (a LibreOffice document did not survive 
well, but I have a PDF of that file if I ever need it again).


Then I just had to copy over the root/boot partitions which I did from a 
shell after booting my clonezilla CD (it came in handy after all) and 
run lilo on them to make the new disk bootable. Everything seems good, 
now.  I ran the full test from SeagateTools (st) again, today, just to 
verify that all was still good.  It was.  I now have an empty PV in my 
LVM volume group that I will need to remove before I add any new Logical 
Volumes (LVs), but I can do that any time.  Since there are no LVs on it 
nothing will attempt to read from it, or write to it.
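
For reference, the LVM part of the move can be sketched as the sequence printed below. This is a dry-run illustration only: the function name and the volume group, PV, and LV names are placeholders, and the final vgreduce/pvremove lines show one assumed way to drop the emptied PV later.

```shell
#!/bin/sh
# Dry-run sketch: prints LVM commands, runs nothing.
# Volume group, PV and LV names are placeholders.
pvmove_plan() {
    vg=$1
    old_pv=$2
    new_pv=$3
    shift 3
    echo "pvcreate $new_pv"
    echo "vgextend $vg $new_pv"
    # One pvmove per logical volume, as described above.
    for lv in "$@"; do
        echo "pvmove -n $lv $old_pv $new_pv"
    done
    # Once the old PV holds no extents it can leave the volume group.
    echo "vgreduce $vg $old_pv"
    echo "pvremove $old_pv"
}

# Example (hypothetical names):
# pvmove_plan vg0 /dev/sda4 /dev/sdb4 home usr var tmp
```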


I'll keep an eye on the disk for a while, but this should fix the 
problem.  If I ever have a failing disk again I hope that I will 
remember this method because the LVM pvmove command really did make 
moving to another disk easy.  The hard part was dealing with the 
root/boot partitions and getting the new disk bootable.


Hopefully this thread will help someone else who has a similar problem 
in the future.



Marc




Re: HELP! Re: How to fix I/O errors?

2017-02-10 Thread Ric Moore

On 02/09/2017 12:13 PM, Greg Wooledge wrote:


You shared your philosophy ("tear it all down and rebuild it from scratch
every two years")


I don't know where you got this. The OP was having one helluva time with 
a hard drive. I suggested that he create a partition to store his 
personal files "more safely" as /opt when he partitioned, formatted, 
and re-installed to the new drive. After that he could mount the failing 
drive and copy as many personal files as he could salvage to the new 
/opt/ install. Then, if the need arises, a re-install is 
relatively painless. I have never espoused a wipe and re-install every two 
years. That would be stupid. The decision to upgrade is purely a 
personal one, driven either by choice or necessity.



and I shared mine ("keep everything unchanged until
you are forced to change it").



A dying hard drive will drive change, don't you think??


Neither one is right, and neither one
is wrong.  I just wanted both viewpoints to be equally represented.



"Viewpoints", as in politics, do not remedy a failing drive nor the 
rescue of its contents. That was the reason the OP posted. Please keep 
his needs in mind. Ric



--
My father, Victor Moore (Vic) used to say:
"There are two Great Sins in the world...
..the Sin of Ignorance, and the Sin of Stupidity.
Only the former may be overcome." R.I.P. Dad.
http://linuxcounter.net/user/44256.html



Re: HELP! Re: How to fix I/O errors?

2017-02-09 Thread Greg Wooledge
On Thu, Feb 09, 2017 at 12:03:18PM -0500, Ric Moore wrote:
> How so?? Don't "many other operating systems" have different 
> configuration files in many other locations?? I wouldn't expect BSD 
> config files to migrate to Linux, or Windows to do anything useful.

When I shared my $HOME between OpenBSD and Debian for a time, I didn't
have many problems at all.  There are some shell functions that I only
created when $(uname -s) was Linux, but that's about it.

Most of the command-line tools that use dot-files in $HOME are the same.
Just stick with the older-common-denominator syntax in things like
~/.muttrc and ~/.ssh/config and you should be fine.  (Hint: when
mixing Debian with other non-legacy Unixes, usually it'll be Debian that
has the older version of the tool.)

You shared your philosophy ("tear it all down and rebuild it from scratch
every two years") and I shared mine ("keep everything unchanged until
you are forced to change it").  Neither one is right, and neither one
is wrong.  I just wanted both viewpoints to be equally represented.



Re: HELP! Re: How to fix I/O errors?

2017-02-09 Thread Ric Moore

On 02/09/2017 08:10 AM, Greg Wooledge wrote:

On Wed, Feb 08, 2017 at 06:06:34PM -0500, Ric Moore wrote:

Careful there, I would not copy any of the /home/username/dot-files or
dot directories over, except like .mozilla and .thunderbird, so you
don't carry over some old and crufty setting that might have been
problematic.


I have the exact opposite philosophy.  My home directory has survived
across many, many different operating systems and computers.


How so?? Don't "many other operating systems" have different 
configuration files in many other locations?? I wouldn't expect BSD 
config files to migrate to Linux, or Windows to do anything useful.




If a
new version of some app breaks compatibility with a dot file, which
is rare, then I'll handle that on a case by case basis.


...and that is you. I suspect that in this case that the OP doesn't wish 
anything to jump up and bite his behind. And, you seem to be able to 
deal with things on a case by case level, but just maybe the OP cannot. 
Ergo, some discretion is in order ...unless you are willing to provide 
life support in person.



Otherwise,
I get to keep all of my comfortable settings.


True, true. But, we're now talking about your comfort level, with 
successful builds, and not his. Some empathy is always a good thing, 
especially when it comes to tech support advice. :) Ric






Re: HELP! Re: How to fix I/O errors?

2017-02-09 Thread Greg Wooledge
On Wed, Feb 08, 2017 at 06:06:34PM -0500, Ric Moore wrote:
> Careful there, I would not copy any of the /home/username/dot-files or 
> dot directories over, except like .mozilla and .thunderbird, so you 
> don't carry over some old and crufty setting that might have been 
> problematic.

I have the exact opposite philosophy.  My home directory has survived
across many, many different operating systems and computers.  If a
new version of some app breaks compatibility with a dot file, which
is rare, then I'll handle that on a case by case basis.  Otherwise,
I get to keep all of my comfortable settings.



Re: HELP! Re: How to fix I/O errors?

2017-02-08 Thread rhkramer
On Wednesday, February 08, 2017 06:37:55 PM Marc Shapiro wrote:
> On 02/08/2017 03:06 PM, Ric Moore wrote:
> > On 02/08/2017 04:38 PM, Marc Shapiro wrote:
> > Careful there, I would not copy any of the /home/username/dot-files or
> > dot directories over, except like .mozilla and .thunderbird, so you
> > don't carry over some old and crufty setting that might have been
> > problematic. To spare you nightmares like this one, I use the /opt
> > directory on a separate partition for all of my personal data.
> > So, I use /opt/ric/Documents and in my brand-new /home/ric directory I
> > delete the newly created Documents directory and then link (ln -s
> > /opt/ric/Documents Documents) and do the same with the other familiar
> > home directories like Videos, Music, Downloads, everything except
> > Desktop. If something goes ape, system-wise, you can do a fresh
> > install of / (root) directory and leave /opt alone. I've done this
> > since the old Caldera days. Nary a burp in the barrel! Ric

Why not make your own top level directory, i.e. /ric (with Documents and 
such)--that's what I do.

> I don't usually go quite that far, but photos, videos, and virtual disks
> are all in /usr/local/  which I will also need to copy over.  

Same comment as above--why not make your own top level directory for that 
stuff?  (Reading the Filesystem Hierarchy Standard (FHS), I don't think that 
is quite the intent of /usr/local--and it could make some things inconvenient 
at one time or another...)

> You say to
> avoid copying   except .mozilla and .thunderbird.  I have 117 such
> dot-files and dot-directories.  Are you saying only to leave .mozilla
> and .thunderbird and have everything else rebuild when it is next used.
> Admittedly, that will get rid of some cruft, but how should I determine
> if there are others that I should keep?
> 
> 
> I tried to format the new drive using st (Seagate Tools).  It said that
> it would remove all data, which is expected, but nothing was removed!
> It also took less than a minute.  Should I be using /dev/sda in the
> command line instead of /dev/sg0 (which is how st -l lists the drive)?
> 
> 
> Marc



Re: HELP! Re: How to fix I/O errors?

2017-02-08 Thread David Christensen

On 02/08/17 15:59, Marc Shapiro wrote:

So how do I lay down a low level format on [the new 1 TB] drive?


I would use the SeaTools bootable CD to fill the drive with zeroes:

On 02/03/17 23:13, David Christensen wrote:
> Sometimes you get lucky and the tool is a live CD:
>
> 
www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO



David



Re: HELP! Re: How to fix I/O errors?

2017-02-08 Thread Marc Shapiro

On 02/08/2017 03:37 PM, Marc Shapiro wrote:

On 02/08/2017 03:06 PM, Ric Moore wrote:

On 02/08/2017 04:38 PM, Marc Shapiro wrote:

On 02/08/2017 01:26 PM, Ric Moore wrote:

On 02/08/2017 02:37 AM, Marc Shapiro wrote:
How it went is not well.  I tested the new drive with SeagateTools and
it was fine.  Then I made a clonezilla live CD and booted from it.  It
stopped on the first read error with a message saying to restart using
the rescue option.  I did that.  After 5 hours it finished without
mentioning any errors.

I tried to boot to the old disk (since it was still wired that way).  I
got dropped into a maintenance shell with fs errors in /dev/sda4 which is
the physical volume for all my LVM logical volumes -- /usr, /var, /home
and /tmp.  It says to run fsck manually.

I decided to try the new drive, so I changed the cables and re-booted.

Maintenance shell, again.

/ mounted clean

lvm started

/home fs has errors run fsck (at this point, I'm afraid to try it)

/var, /usr, and /tmp all say that the superblock can not be read, or is
invalid.  Try running

e2fsck -b 8193 
or
e2fsck -b 32768 

Which do I use?

How did trying to clone the disk make such a mess of BOTH disks?



You cloned a mess, you got a perfect copy. I'd do a clean install to
the new drive, after formatting the entire drive. Once you boot into
that drive, mount the old drive. It should show up in /media/
Then copy the directories of personal stuff you want to keep to a new
location on the new drive. I use cp -raf 
 and everything, including sub-directories, file
ownership and file permissions are preserved. If a file is clunky, it
won't copy it and should proceed.

Next, if you are in your office, observe if the window is open. If
yes, throw the old drive out of it. :) Ric



Ric,


As soon as I finished my last post (above) I realized that what you
suggest is exactly what I should have done in the first place. Why I
did not realize that earlier (and save myself a lot of headaches) I do
not know.  The system is now booting to the old drive, just as it did
before.  I think it just needed a good night's sleep.  I know that I 
did.


My next steps are:

Format new drive

Install fresh on new drive

Mount and copy /home from old drive to new drive


Careful there, I would not copy any of the /home/username/dot-files 
or dot directories over, except like .mozilla and .thunderbird, so 
you don't carry over some old and crufty setting that might have been 
problematic. To spare you nightmares like this one, I use the /opt 
directory on a separate partition for all of my personal data.
So, I use /opt/ric/Documents and in my brand-new /home/ric directory 
I delete the newly created Documents directory and then link (ln -s 
/opt/ric/Documents Documents) and do the same with the other familiar 
home directories like Videos, Music, Downloads, everything except 
Desktop. If something goes ape, system-wise, you can do a fresh 
install of / (root) directory and leave /opt alone. I've done this 
since the old Caldera days. Nary a burp in the barrel! Ric




I don't usually go quite that far, but photos, videos, and virtual 
disks are all in /usr/local/  which I will also need to copy over.  
You say to avoid copying   except .mozilla and .thunderbird.  I have 
117 such dot-files and dot-directories.  Are you saying only to leave 
.mozilla and .thunderbird and have everything else rebuild when it is 
next used.  Admittedly, that will get rid of some cruft, but how 
should I determine if there are others that I should keep?



I tried to format the new drive using st (Seagate Tools).  It said 
that it would remove all data, which is expected, but nothing was 
removed!  It also took less than a minute.  Should I be using /dev/sda 
in the command line instead of /dev/sg0 (which is how st -l lists the 
drive)?
I just tried this with 'st -i /dev/sda' (which should give drive info) 
and it does nothing, so that doesn't work.  So how do I lay down a low 
level format on this drive?



Marc



Marc





Re: HELP! Re: How to fix I/O errors?

2017-02-08 Thread Marc Shapiro

On 02/08/2017 03:06 PM, Ric Moore wrote:

On 02/08/2017 04:38 PM, Marc Shapiro wrote:

On 02/08/2017 01:26 PM, Ric Moore wrote:

On 02/08/2017 02:37 AM, Marc Shapiro wrote:

How it went is not well.  I tested the new drive with SeagateTools and
it was fine.  Then I made a clonezilla live CD and booted from it.  It
stopped on the first read error with a message saying to restart using
the rescue option.  I did that.  After 5 hours it finished without
mentioning any errors.

I tried to boot to the old disk (since it was still wired that way).  I
got dropped into a maintenance shell with fs errors in /dev/sda4 which is
the physical volume for all my LVM logical volumes -- /usr, /var, /home
and /temp.  It says to run fsck manually.

I decided to try the new drive, so I changed the cables and re-booted.

Maintenance shell, again.

/ mounted clean

lvm started

/home fs has errors run fsck (at this point, I'm afraid to try it)

/var, /usr, and /tmp all say that the superblock can not be read, or is
invalid.  Try running

e2fsck -b 8193 
or
e2fsck -b 32768 

Which do I use?

How did trying to clone the disk make such a mess of BOTH disks?



You cloned a mess, you got a perfect copy. I'd do a clean install to
the new drive, after formatting the entire drive. Once you boot into
that drive, mount the old drive. It should show up in /media/
Then copy the directories of personal stuff you want to keep to a new
location on the new drive. I use cp -raf 
 and everything, including sub-directories, file
ownership and file permissions are preserved. If a file is clunky, it
won't copy it and should proceed.

Next, if you are in your office, observe if the window is open. If
yes, throw the old drive out of it. :) Ric



Ric,


As soon as I finished my last post (above) I realized that what you
suggest is exactly what I should have done in the first place. Why I
did not realize that earlier (and save myself a lot of headaches) I do
not know.  The system is now booting to the old drive, just as it did
before.  I think it just needed a good night's sleep.  I know that I 
did.


My next steps are:

Format new drive

Install fresh on new drive

Mount and copy /home from old drive to new drive


Careful there, I would not copy any of the /home/username/dot-files or 
dot directories over, except like .mozilla and .thunderbird, so you 
don't carry over some old and crufty setting that might have been 
problematic. To spare you nightmares like this one, I use the /opt 
directory on a separate partition for all of my personal data.
So, I use /opt/ric/Documents and in my brand-new /home/ric directory I 
delete the newly created Documents directory and then link (ln -s 
/opt/ric/Documents Documents) and do the same with the other familiar 
home directories like Videos, Music, Downloads, everything except 
Desktop. If something goes ape, system-wise, you can do a fresh 
install of / (root) directory and leave /opt alone. I've done this 
since the old Caldera days. Nary a burp in the barrel! Ric




I don't usually go quite that far, but photos, videos, and virtual disks 
are all in /usr/local/  which I will also need to copy over.  You say to 
avoid copying   except .mozilla and .thunderbird.  I have 117 such 
dot-files and dot-directories.  Are you saying only to leave .mozilla 
and .thunderbird and have everything else rebuild when it is next used.  
Admittedly, that will get rid of some cruft, but how should I determine 
if there are others that I should keep?
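
One way to approach the dot-file question is to copy everything except the top-level dot entries, keep a short allow-list, and get back a list of what was skipped so it can be reviewed later. The following is only a sketch in Python: the function name, the allow-list constant, and the idea of returning the skipped names are assumptions for illustration, not something proposed in the thread. Unlike cp -a run as root, shutil preserves timestamps and permission bits but not file ownership.

```python
import shutil
from pathlib import Path

# Hypothetical allow-list: only the two directories named in the thread.
KEEP_DOT_ENTRIES = {".mozilla", ".thunderbird"}


def copy_home(old_home, new_home, keep=KEEP_DOT_ENTRIES):
    """Copy old_home into new_home, skipping top-level dot entries not in keep.

    Returns the sorted names of the skipped entries so they can be reviewed
    and copied by hand later if a program turns out to need them.
    """
    old_home, new_home = Path(old_home), Path(new_home)
    new_home.mkdir(parents=True, exist_ok=True)
    skipped = []
    for entry in sorted(old_home.iterdir()):
        if entry.name.startswith(".") and entry.name not in keep:
            skipped.append(entry.name)
            continue
        target = new_home / entry.name
        if entry.is_dir() and not entry.is_symlink():
            # Recurse into real directories, copying symlinks as symlinks.
            shutil.copytree(entry, target, symlinks=True, dirs_exist_ok=True)
        else:
            # Plain files and symlinks: copy with metadata, do not follow links.
            shutil.copy2(entry, target, follow_symlinks=False)
    return skipped
```

Anything in the returned list that is later missed (a mail client profile, an SSH key directory) can still be copied over individually from the old drive.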



I tried to format the new drive using st (Seagate Tools).  It said that 
it would remove all data, which is expected, but nothing was removed!  
It also took less than a minute.  Should I be using /dev/sda in the 
command line instead of /dev/sg0 (which is how st -l lists the drive)?



Marc



Marc



Re: HELP! Re: How to fix I/O errors?

2017-02-08 Thread Ric Moore

On 02/08/2017 04:38 PM, Marc Shapiro wrote:

On 02/08/2017 01:26 PM, Ric Moore wrote:

On 02/08/2017 02:37 AM, Marc Shapiro wrote:

How it went is not well.  I tested the new drive with SeagateTools and
it was fine.  Then I made a clonezilla live CD and booted from it.  It
stopped on the first read error with a message saying to restart using
the rescue option.  I did that.  After 5 hours it finished without
mentioning any errors.

I tried to boot to the old disk (since it was still wired that way).  I
got dropped into a maintenance shell with fs errors in /dev/sda4 which is
the physical volume for all my LVM logical volumes -- /usr, /var, /home
and /temp.  It says to run fsck manually.

I decided to try the new drive, so I changed the cables and re-booted.

Maintenance shell, again.

/ mounted clean

lvm started

/home fs has errors run fsck (at this point, I'm afraid to try it)

/var, /usr, and /tmp all say that the superblock can not be read, or is
invalid.  Try running

e2fsck -b 8193 
or
e2fsck -b 32768 

Which do I use?

How did trying to clone the disk make such a mess of BOTH disks?



You cloned a mess, you got a perfect copy. I'd do a clean install to
the new drive, after formatting the entire drive. Once you boot into
that drive, mount the old drive. It should show up in /media/
Then copy the directories of personal stuff you want to keep to a new
location on the new drive. I use cp -raf 
 and everything, including sub-directories, file
ownership and file permissions are preserved. If a file is clunky, it
won't copy it and should proceed.

Next, if you are in your office, observe if the window is open. If
yes, throw the old drive out of it. :) Ric



Ric,


As soon as I finished my last post (above) I realized that what you
suggest is exactly what I should have done in the first place.  Why I
did not realize that earlier (and save myself a lot of headaches) I do
not know.  The system is now booting to the old drive, just as it did
before.  I think it just needed a good night's sleep.  I know that I did.

My next steps are:

Format new drive

Install fresh on new drive

Mount and copy /home from old drive to new drive


Careful there, I would not copy any of the /home/username/dot-files or 
dot directories over, except like .mozilla and .thunderbird, so you 
don't carry over some old and crufty setting that might have been 
problematic. To spare you nightmares like this one, I use the /opt 
directory on a separate partition for all of my personal data.
So, I use /opt/ric/Documents and in my brand-new /home/ric directory I 
delete the newly created Documents directory and then link (ln -s 
/opt/ric/Documents Documents) and do the same with the other familiar 
home directories like Videos, Music, Downloads, everything except 
Desktop. If something goes ape, system-wise, you can do a fresh install 
of / (root) directory and leave /opt alone. I've done this since the old 
Caldera days. Nary a burp in the barrel! Ric
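
The layout described above (data under /opt/ric, symlinks in the home directory, Desktop left alone) could be scripted roughly as follows. This is a sketch under assumptions: the function name and the tuple of directory names are made up for illustration, and the directories in the home are expected to be empty, as they are right after a fresh install.

```python
from pathlib import Path

# Directory names taken from the message; Desktop is deliberately left out.
LINKED_DIRS = ("Documents", "Videos", "Music", "Downloads")


def link_data_dirs(home, data_root, names=LINKED_DIRS):
    """Replace empty directories in home with symlinks into data_root.

    Returns the names that were newly linked.
    """
    home, data_root = Path(home), Path(data_root)
    linked = []
    for name in names:
        target = data_root / name
        link = home / name
        target.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            continue  # already linked, leave it alone
        if link.is_dir():
            link.rmdir()  # raises OSError unless the directory is empty
        link.symlink_to(target)
        linked.append(name)
    return linked
```

Using rmdir rather than a recursive delete is a deliberate choice here: if a directory in the home already holds files, the call fails instead of silently discarding them.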




--
My father, Victor Moore (Vic) used to say:
"There are two Great Sins in the world...
..the Sin of Ignorance, and the Sin of Stupidity.
Only the former may be overcome." R.I.P. Dad.
http://linuxcounter.net/user/44256.html



Re: HELP! Re: How to fix I/O errors?

2017-02-08 Thread David Christensen

On 02/07/17 23:37, Marc Shapiro wrote:
> How it went is not well.

> David Christensen wrote:
>> Run memtest86+ for 24+ hours to verify that you don't have a memory
>> problem.

Did you test the memory?  If not, test it now just to be sure.


>> Use SeaTools to wipe the new 1 TB drive and run the short and long
>> tests.  Stop if anything fails.

I tested the new drive with SeagateTools and it
was fine.


Please confirm that you wiped the 1 TB recovery drive.



Then I made a clonezilla live CD and booted from it.  It stopped
on the first read error with a message saying to restart using the rescue
option.  I did that.  After 5 hours it finished without mentioning any
errors.

I tried to boot to the old disk (since it was still wired that way).  I got
dropped into a maintenance shell with fs errors in /dev/sda4 which is the
physical volume for all my LVM logical volumes -- /usr, /var, /home and
/temp.  It says to run fsck manually.

I decided to try the new drive, so I changed the cables and re-booted.

Maintenance shell, again.

/ mounted clean

lvm started

/home fs has errors run fsck (at this point, I'm afraid to try it)

/var, /usr, and /tmp all say that the superblock can not be read, or is
invalid.  Try running

e2fsck -b 8193 
or
e2fsck -b 32768 

Which do I use?

How did trying to clone the disk make such a mess of BOTH disks?


Don't blame Clonezilla.  Everything is decaying -- you, me, those hard 
drives, etc..  With that in mind, do the most precious operations first 
-- because in 1 second, 1 minute, 1 hour, 1 day, 1 month, 1 year, 1 
decade, 1 century, whatever, the data will be inaccessible without 
extraordinary means.



Forget about booting off the failing 1 TB disk.  Disconnect it for now.


Forget about booting off the 1 TB recovery disk.  It should now contain 
whatever blocks Clonezilla was able to recover.  It is now in a state 
analogous to Swiss cheese.  Disconnect it for now.




Any help getting a working system again will be greatly appreciated.


On the computer you use for e-mail, start an administration log folder 
for the machine in question.  Start a log.txt file and take notes.  Cut 
and paste what you can.  Photograph screens and transcribe what you 
can't.  Collect important files.  Put it all into a version control system.



>> I'd do a fresh install on a 16+ GB SSD (USB flash drives also
>> work).

Install SSH when you build the new system drive.


Use ssh(1) to log in from your e-mail computer.  Consider using 
script(1) to capture your console sessions, and scp(1) to copy out the 
files.  Read fsck(8) and consider your moves carefully.  Reconnect the 1 
TB recovery disk and see what fsck can recover.



David



HELP! Re: How to fix I/O errors?

2017-02-07 Thread Marc Shapiro
How it went is not well.  I tested the new drive with SeagateTools and it
was fine.  Then I made a clonezilla live CD and booted from it.  It stopped
on the first read error with a message saying to restart using the rescue
option.  I did that.  After 5 hours it finished without mentioning any
errors.

I tried to boot to the old disk (since it was still wired that way).  I got
dropped into a maintenance shell with fs errors in /dev/sda4 which is the
physical volume for all my LVM logical volumes -- /usr, /var, /home and
/temp.  It says to run fsck manually.

I decided to try the new drive, so I changed the cables and re-booted.

Maintenance shell, again.

/ mounted clean

lvm started

/home fs has errors run fsck (at this point, I'm afraid to try it)

/var, /usr, and /tmp all say that the superblock can not be read, or is
invalid.  Try running

e2fsck -b 8193 
or
e2fsck -b 32768 

Which do I use?
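
For what it's worth, the two numbers are not interchangeable alternatives. According to the e2fsck(8) man page, the first backup superblock sits at block 8193 on a filesystem with 1 KiB blocks, at 16384 with 2 KiB blocks, and at 32768 with 4 KiB blocks. A minimal Python sketch of that rule, assuming the default layout of 8 * block_size blocks per group (the function name is made up for illustration):

```python
def first_backup_superblock(block_size):
    """Return the block number of the first backup superblock on ext2/3/4.

    Assumes the default layout: each block group holds 8 * block_size
    blocks (one bitmap block's worth of bits). With 1 KiB blocks the first
    data block is block 1, otherwise block 0, and the first backup copy
    sits at the start of block group 1.
    """
    if block_size not in (1024, 2048, 4096):
        raise ValueError("unexpected ext2/3/4 block size: %r" % block_size)
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    return first_data_block + blocks_per_group
```

Since mke2fs normally picks 4 KiB blocks for all but small filesystems, 32768 is the likelier candidate for logical volumes on a 1 TB drive, but that remains a guess until the block size is actually known.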

How did trying to clone the disk make such a mess of BOTH disks?

Any help getting a working system again will be greatly appreciated.

Marc

On Feb 6, 2017 2:37 PM, "David Christensen" 
wrote:

On 02/06/17 13:15, Marc Shapiro wrote:

> I am pasting the result of smartctl -x /dev/sda below as I have no real
> clue what to do with the information, but I have a few questions first.
>
> 1) I have purchased a new, very similar, Seagate 1TB drive and I plan to
> install it and copy the whole system to the new drive.
>

It sounds like you don't have a backup of the failing 1 TB drive (?).


Do you have a file server with ~1 TB of free space?  RAID?


Run memtest86+ for 24+ hours to verify that you don't have a memory problem.


Use SeaTools to wipe the new 1 TB drive and run the short and long tests.
Stop if anything fails.



What is the best
> way to do this copy since I don't want to copy bad sectors?
>

I've done it with 'dd' in the past, but will use 'ddrescue' in the future.



2) Once I have verified that the new drive boots
>

I'd do a fresh install on a 16+ GB SSD (USB flash drives also work).  A
recovered system disk image is too uncertain.



and everything is running properly
>

As I understand it, the drive microcontroller calculates and stores a
checksum with every sector (block).  That's one way it knows that a block
is bad upon reading.  So, when you copy out whatever blocks you can get,
you probably won't have errors in those blocks.


But, files and directories are stored on one or more sectors.  Depending
upon your file system, fsck may or may not find the missing blocks.


When you're done, the destination disk is likely to be missing files and/or
directories.



I am hoping to reformat the old drive.  This should
> reallocate the bad sectors IIRC.  I then would like to set up a raid
> with both drives (keeping a close eye on the old drive).  The
> feasibility of this, I would guess, depends on what the posted smartctl
> information tells someone who knows what to look for.
>
> 3) As I understand it, the above mentioned raid should be safe since,
> even if the old drive deteriorates further, the system can run on just
> the new drive.  Is that correct?
>

Once you've copied out whatever blocks you can get, use SeaTools to wipe
the old 1 TB drive and run short and long tests.  If all three pass, I
might be tempted to re-use the drive.


If it fails to wipe and has plaintext, destroy it with a sledge hammer.
(Wear safety glasses!)


If it wipes but fails the short or long tests, recycle it.



Here is the smartctl output:
>
...

=== START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>

Interesting, given that the drive failed SeaTools (short test?).



General SMART Values:
> Offline data collection status:  (0x82)Offline data collection activity
> was completed without error.
> Auto Offline Data Collection: Enabled.
> Self-test execution status:  ( 121)The previous self-test
> completed having
> the read element of the test failed.
>

Matches SeaTools result.



Total time to complete Offline
> data collection: (  600) seconds.
>
...

SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   1 Raw_Read_Error_Rate     POSR--   117   095   006    -    165391146
>   3 Spin_Up_Time            PO----   095   093   000    -    0
>   4 Start_Stop_Count        -O--CK   100   100   020    -    406
>   5 Reallocated_Sector_Ct   PO--CK   072   072   036    -    1181
>   7 Seek_Error_Rate         POSR--   087   060   030    -    656506200
>   9 Power_On_Hours          -O--CK   048   048   000    -    46195
>  10 Spin_Retry_Count        PO--C-   100   100   097    -    0
>  12 Power_Cycle_Count       -O--CK   100   100   020    -    203
> 183 Runtime_Bad_Block       -O--CK   092   092   000    -    8
> 184 End-to-End_Error-O--CK   100   100  

Re: How to fix I/O errors?

2017-02-07 Thread Jonathan Dowland
On Fri, Feb 03, 2017 at 01:38:58PM -0800, Marc Shapiro wrote:
> I had been trying as root (see above).  I do not have smartmontools
> currently installed.  If I remember correctly, that is only going to be
> useful if it was already installed so the daemon could be capturing data
> when the problem occurred.  Is that correct, or am I thinking of a different
> package?

Different package maybe. The HDDs themselves maintain the logs, smartctl merely
prints them. If you had it installed already, you might have local logs from it
that would tell you when a problem was first noticed, but that's about it.

-- 
Jonathan Dowland
Please do not CC me, I am subscribed to the list.


signature.asc
Description: Digital signature


Re: How to fix I/O errors?

2017-02-06 Thread David Christensen

On 02/06/17 13:15, Marc Shapiro wrote:

I am pasting the result of smartctl -x /dev/sda below as I have no real
clue what to do with the information, but I have a few questions first.

1) I have purchased a new, very similar, Seagate 1TB drive and I plan to
install it and copy the whole system to the new drive.


It sounds like you don't have a backup of the failing 1 TB drive (?).


Do you have a file server with ~1 TB of free space?  RAID?


Run memtest86+ for 24+ hours to verify that you don't have a memory problem.


Use SeaTools to wipe the new 1 TB drive and run the short and long 
tests.  Stop if anything fails.




What is the best
way to do this copy since I don't want to copy bad sectors?


I've done it with 'dd' in the past, but will use 'ddrescue' in the future.
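
As a rough illustration of why ddrescue suits a failing disk better than dd: GNU ddrescue (packaged in Debian as gddrescue) keeps a map file of what has been read, so a quick first pass can grab the easy areas and a later pass can go back and retry only the bad ones. The Python sketch below just builds the two command lines and runs nothing. The device names /dev/sdX and /dev/sdY, the map file name, and the helper function are placeholders, not values from this thread.

```python
def ddrescue_passes(source, dest, mapfile, retries=3):
    """Return two ddrescue command lines as argv lists (nothing is executed).

    Pass 1: -f allows writing to a block device, -n skips the slow
            scraping phase so the readable areas are copied first.
    Pass 2: -d uses direct disc access for the input and -rN retries the
            bad areas recorded in the same map file up to N times.
    """
    first_pass = ["ddrescue", "-f", "-n", source, dest, mapfile]
    retry_pass = ["ddrescue", "-f", "-d", "-r%d" % retries,
                  source, dest, mapfile]
    return [first_pass, retry_pass]


if __name__ == "__main__":
    # Placeholder devices: double-check source and destination before
    # running anything like this for real.
    for argv in ddrescue_passes("/dev/sdX", "/dev/sdY", "rescue.map"):
        print(" ".join(argv))
```

Swapping source and destination would overwrite the failing disk with the empty one, so the argument order deserves a second look before either command is run.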



2) Once I have verified that the new drive boots


I'd do a fresh install on a 16+ GB SSD (USB flash drives also work).  A 
recovered system disk image is too uncertain.




and everything is running properly


As I understand it, the drive microcontroller calculates and stores a 
checksum with every sector (block).  That's one way it knows that a 
block is bad upon reading.  So, when you copy out whatever blocks you 
can get, you probably won't have errors in those blocks.
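
The idea can be illustrated with a toy model in Python. This is only an analogy: real drives use error-correcting codes in firmware rather than a plain CRC-32, and the function names and the tuple layout here are invented for the example.

```python
import zlib

SECTOR_SIZE = 512  # bytes; a common logical sector size


def write_sector(data):
    """Store a sector together with a CRC-32 of its contents."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("a sector must be exactly %d bytes" % SECTOR_SIZE)
    return (data, zlib.crc32(data))


def read_sector(stored):
    """Return the sector data, or raise if it no longer matches its checksum."""
    data, checksum = stored
    if zlib.crc32(data) != checksum:
        # The model's equivalent of the drive reporting a read error.
        raise IOError("unrecoverable read error")
    return data
```

In this model a damaged sector either fails loudly or is returned intact, which is why the blocks that do get copied can generally be trusted even though some blocks are missing.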



But, files and directories are stored on one or more sectors.  Depending 
upon your file system, fsck may or may not find the missing blocks.



When you're done, the destination disk is likely to be missing files 
and/or directories.




I am hoping to reformat the old drive.  This should
reallocate the bad sectors IIRC.  I then would like to set up a raid
with both drives (keeping a close eye on the old drive).  The
feasibility of this, I would guess, depends on what the posted smartctl
information tells someone who knows what to look for.

3) As I understand it, the above mentioned raid should be safe since,
even if the old drive deteriorates further, the system can run on just
the new drive.  Is that correct?


Once you've copied out whatever blocks you can get, use SeaTools to wipe 
the old 1 TB drive and run short and long tests.  If all three pass, I 
might be tempted to re-use the drive.



If it fails to wipe and has plaintext, destroy it with a sledge hammer. 
(Wear safety glasses!)



If it wipes but fails the short or long tests, recycle it.



Here is the smartctl output:

...

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


Interesting, given that the drive failed SeaTools (short test?).



General SMART Values:
Offline data collection status:  (0x82)Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:  ( 121)The previous self-test
completed having
the read element of the test failed.


Matches SeaTools result.



Total time to complete Offline
data collection: (  600) seconds.

...

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   117   095   006    -    165391146
  3 Spin_Up_Time            PO----   095   093   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    406
  5 Reallocated_Sector_Ct   PO--CK   072   072   036    -    1181
  7 Seek_Error_Rate         POSR--   087   060   030    -    656506200
  9 Power_On_Hours          -O--CK   048   048   000    -    46195
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    203
183 Runtime_Bad_Block       -O--CK   092   092   000    -    8
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   011   011   000    -    89
188 Command_Timeout         -O--CK   100   097   000    -    51540394008
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   070   049   045    -    30 (Min/Max 27/32)
194 Temperature_Celsius     -O---K   030   051   000    -    30 (0 20 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   034   003   000    -    165391146
197 Current_Pending_Sector  -O--C-   093   083   000    -    310
198 Offline_Uncorrectable   ----C-   093   083   000    -    310
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    26
240 Head_Flying_Hours       ------   100   253   000    -    46718 (49 76 0)
241 Total_LBAs_Written      ------   100   253   000    -    1725386978
242 Total_LBAs_Read         ------   100   253   000    -    265479204
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
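
For readers unsure what to look for in a table like the one above, a common rule of thumb is to treat any nonzero raw count on the sector-health attributes as a warning sign, whatever the overall PASSED verdict says. A small Python sketch of that check, using raw values copied from the table. The choice of watched attributes and the "nonzero means trouble" rule are assumptions made for illustration, not a smartmontools feature.

```python
# Raw values copied from the smartctl -x table above.
RAW_VALUES = {
    "Reallocated_Sector_Ct": 1181,
    "Spin_Retry_Count": 0,
    "End-to-End_Error": 0,
    "Reported_Uncorrect": 89,
    "Current_Pending_Sector": 310,
    "Offline_Uncorrectable": 310,
}

# Assumption: attributes where any nonzero raw count is treated as a warning.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Spin_Retry_Count",
    "End-to-End_Error",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def worrying_attributes(raw, watched=WATCHED):
    """Return the watched attributes whose raw count is above zero."""
    return {name: raw[name] for name in watched if raw.get(name, 0) > 0}


if __name__ == "__main__":
    for name, count in worrying_attributes(RAW_VALUES).items():
        print("%-24s %d" % (name, count))
```

Applied to this drive, the check flags the reallocated, pending, offline-uncorrectable, and reported-uncorrectable counts, which fits the failed read self-test noted earlier in the message.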


I have yet to find a good 

Re: How to fix I/O errors?

2017-02-06 Thread Felix Miata

Gene Heskett composed on 2017-02-06 12:28 (UTC-0500):


That cold spare will eventually develop stiction, seizing the parked head
to the surface of the disk solidly enough that the disk motor cannot
break it loose to spin the disk up.  Such is best treated by hooking up
the cables, but holding the drive in your hand so that you can turn on
the power, and within a couple seconds, give the drive a good sideways
blow on a corner with the ball of the wrist so the drive housing/casting
is caused to rotate a few degrees around the axis of the disk, breaking
the stiction so the spindle motor can spin it up. The theory is that the
drive frame rotates when you drive it by hitting the corner, but the
disk doesn't, breaking the stiction seal.


The other theory is the motor got too weak to start the platters in motion. It 
wouldn't surprise me that these things have separate electronics for startup and 
for maintain, and that something in startup simply expires to cause spinup failure.


The first HD I ever bought, a 3.5" "full-height" SCSI-I 80MB Seagate in 1990, 
acquired the won't spin up problem a month after its 12 month warranty expired. 
I avoided the problem a long time via a UPS, but eventually an extended power 
outage claimed it permanently.

--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata  ***  http://fm.no-ip.com/



Re: How to fix I/O errors?

2017-02-06 Thread David Christensen

On 02/06/17 09:28, Gene Heskett wrote:

That cold spare will eventually develop stiction, seizing the parked head
to the surface of the disk solidly enough that the disk motor cannot
break it loose to spin the disk up.  Such is best treated by hooking up
the cables, but holding the drive in your hand so that you can turn on
the power, and within a couple seconds, give the drive a good sideways
blow on a corner with the ball of the wrist so the drive housing/casting
is caused to rotate a few degrees around the axis of the disk, breaking
the stiction so the spindle motor can spin it up. The theory is that the
drive frame rotates when you drive it by hitting the corner, but the
disk doesn't, breaking the stiction seal.


Yowza!  I haven't experienced (realized?) a head-stuck-to-platter 
problem, but I'll keep your technique in mind if all else fails.



David




Re: How to fix I/O errors?

2017-02-06 Thread David Christensen

On 02/06/17 07:22, Joe Pfeiffer wrote:

David Christensen  writes:

I've found (and heard) that the worst thing I can do to a HDD is put
it on the shelf and let it rot.  I've had more than a few that failed
shortly after being put into a computer.


I hadn't heard this...  I've got a drive I've been keeping as a cold
spare.  Am I better off (in the sense of "is it more likely to actually
be useable when I need it") installing it and adding it to one of my
RAID1 arrays?  Can you point me to an article about it?


I don't know of a particular article.  I just have personal experience 
and advice from other people (including a relative who worked for a HDD 
manufacturer).



David



Re: How to fix I/O errors?

2017-02-06 Thread Marc Shapiro

On 02/03/2017 11:13 PM, David Christensen wrote:

On 02/03/17 13:47, Marc Shapiro wrote:

On 02/02/2017 10:23 PM, David Christensen wrote:


Have you downloaded and run the manufacturer diagnostic utilities for
all your drives?  What do they say?


I have now downloaded and run Seagate's tools and it does show
a disk error.  Since it stops on the first error I do not know if
this is an isolated error, or a more systematic problem.

Automatic Write Reallocation Enable (AWRE) is on by default, but
Automatic Read Reallocation Enable (ARRE) is off.  If I set ARRE on and
then run the long test (which reads all sectors sequentially), will that
reallocate any bad sectors and mark them as such?  Is this a safe thing
to do?


Beware that failures have a way of escalating faster than you expect.


Do you have a good backup of the drive?


If not and the data has high value, do not power up the drive. Pack it 
properly and send it to a professional recovery service.



>> http://www.seagate.com/support/downloads/seatools/
> I see download links for DOS and Windows, nothing for Linux

It is common for Wintel-centric tools to only run on Windows. It's 
good to keep an operational Windows system drive around. Mobile docks 
make swapping drives easy.



Sometimes you get lucky and the tool is a live CD:

www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO 




> If I remember correctly, [SMART] is
> only going to be useful if it was already installed so the daemon
> could be capturing data when the problem occurred.  Is that correct,
> or am I thinking of a different package?

https://en.wikipedia.org/wiki/S.M.A.R.T.

SMART is built-in to the firmware of the HDD/SSD; the drive 
microcontroller does most of the work.  Using the right tools, you can 
pull SMART information out of the microcontroller and/or adjust 
tunable SMART parameters.  As the drive is failing, you especially 
want those reports.  If you post them here, people can tell you all 
kinds of interesting things about your drive.



> I have now downloaded and run Seagate's tools and it does show
> a disk error.  Since it stops on the first error I do not know if
> this is an isolated error, or a more systematic problem.

Take a picture of the screen with a digital camera (or phone), and 
then type the exact screen contents into a reply.



David

I am pasting the result of smartctl -x /dev/sda below as I have no real 
clue what to do with the information, but I have a few questions first.


1) I have purchased a new, very similar, Seagate 1TB drive and I plan to 
install it and copy the whole system to the new drive. What is the best 
way to do this copy since I don't want to copy bad sectors?


2) Once I have verified that the new drive boots and everything is 
running properly I am hoping to reformat the old drive.  This should 
reallocate the bad sectors IIRC.  I then would like to set up a raid 
with both drives (keeping a close eye on the old drive).  The 
feasibility of this, I would guess, depends on what the posted smartctl 
information tells someone who knows what to look for.


3) As I understand it, the above mentioned raid should be safe since, 
even if the old drive deteriorates further, the system can run on just 
the new drive.  Is that correct?



Here is the smartctl output:


$ sudo smartctl -x /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    5VP9QSWJ
LU WWN Device Id: 5 000c50 03e5ccb5c
Firmware Version: CC3E
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Feb  6 12:57:05 2017 PST

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     0 (vendor specific), recommended: 254
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unknown

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:  ( 121)The previous self-test 
completed having


Re: How to fix I/O errors?

2017-02-06 Thread Gene Heskett
On Monday 06 February 2017 10:22:54 Joe Pfeiffer wrote:

> David Christensen  writes:
> > On 02/04/17 07:18, Ric Moore wrote:
> >> I'm looking at a Seagate 750 gig drive that went south on me with a
> >> pile of errors. Good luck getting Seagate to give a good gosh darn.
> >> In the past I have had mixed results replacing the drive
> >> motherboard. I saved two out of three. I doubt I will buy anything
> >> Seagate makes in the future.
> >
> > Everything electrical and mechanical fails.  It's just a question of
> > when, followed by whether or not you're prepared.
> >
> >
> > I've found (and heard) that the worst thing I can do to a HDD is put
> > it on the shelf and let it rot.  I've had more than a few that
> > failed shortly after being put into a computer.
>
> I hadn't heard this...  I've got a drive I've been keeping as a cold
> spare.  Am I better off (in the sense of "is it more likely to
> actually be useable when I need it") installing it and adding it to
> one of my RAID1 arrays?  Can you point me to an article about it?

That cold spare will eventually develop stiction, seizing the parked head 
to the surface of the disk solidly enough that the disk motor cannot 
break it loose to spin the disk up.  Such is best treated by hooking up 
the cables, but holding the drive in your hand so that you can turn on 
the power, and within a couple seconds, give the drive a good sideways 
blow on a corner with the ball of the wrist so the drive housing/casting 
is caused to rotate a few degrees around the axis of the disk, breaking 
the stiction so the spindle motor can spin it up. The theory is that the 
drive frame rotates when you drive it by hitting the corner, but the 
disk doesn't, breaking the stiction seal.

Based on my experience here, with terabyte drives, they seem to be better 
off spinning even when not in active service. I have one old terabyte 
drive that has had 25 bad, reallocated sectors since the first time I 
had smartctl do an extended self test on it several years ago. It still 
has 25 reallocated sectors to this day, with:
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       25
and
  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       61430

And that drive gets beat on every night, as it's my backup disk containing 
all the virtual tapes amanda uses.

In terms of spin time, that's roughly 7 years.  And that's a Seagate 
Barracuda drive, which has a horrible reputation according to these 
mailing lists.
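
For anyone who wants to check the spin-time arithmetic, here is a minimal Python sketch that converts the Power_On_Hours raw value quoted above (61430) into years. The function name is made up for illustration, and the 365.25-day year is an assumption, so the result can shift in the second decimal place depending on the year length used.

```python
# Assumption: an average year of 365.25 days (leap days included).
HOURS_PER_YEAR = 24 * 365.25


def power_on_years(power_on_hours):
    """Convert a SMART Power_On_Hours raw value to years of spin time."""
    return power_on_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    # 61430 is the raw value from the Power_On_Hours line quoted above.
    print(f"{power_on_years(61430):.2f} years")
```

With a 365.25-day year this comes out at just over 7 years.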

One secret though. When that drive was new to me, I went to the Seagate 
web site and downloaded a CD image for that model that updated its 
firmware. As I already had a Linux install on it, I applied it to that 
drive with a bit of trepidation. But worry wasn't needed, I didn't lose 
a byte of the install. A side benefit was that the drive's speed was 
nearly doubled.

If you can afford the time, I highly recommend putting the latest 
firmware in it before putting it in service. Who knows how long it's been 
on the dealer's warehouse shelf, but the dealer bought 10,000 when they 
were announced. And the initial shipment can be guaranteed to have been 
bagged with alpha-rated firmware in it.  Always update new drives is the 
message from this elderly (82) user.  You won't regret it.

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 



Re: How to fix I/O errors?

2017-02-06 Thread Joe Pfeiffer
David Christensen  writes:

> On 02/04/17 07:18, Ric Moore wrote:
>> I'm looking at a Seagate 750 gig drive that went south on me with a pile
>> of errors. Good luck getting Seagate to give a good gosh darn. In the
>> past I have had mixed results replacing the drive motherboard. I saved
>> two out of three. I doubt I will buy anything Seagate makes in the
>> future.
>
> Everything electrical and mechanical fails.  It's just a question of
> when, followed by whether or not you're prepared.
>
>
> I've found (and heard) that the worst thing I can do to a HDD is put
> it on the shelf and let it rot.  I've had more than a few that failed
> shortly after being put into a computer.

I hadn't heard this...  I've got a drive I've been keeping as a cold
spare.  Am I better off (in the sense of "is it more likely to actually
be useable when I need it") installing it and adding it to one of my
RAID1 arrays?  Can you point me to an article about it?



Re: How to fix I/O errors?

2017-02-04 Thread David Christensen

On 02/04/17 07:18, Ric Moore wrote:

I'm looking at a Seagate 750 gig drive that went south on me with a pile
of errors. Good luck getting Seagate to give a good gosh darn. In the
past I have had mixed results replacing the drive motherboard. I saved
two out of three. I doubt I will buy anything Seagate makes in the
future.


Everything electrical and mechanical fails.  It's just a question of 
when, followed by whether or not you're prepared.



I've found (and heard) that the worst thing I can do to a HDD is put it 
on the shelf and let it rot.  I've had more than a few that failed 
shortly after being put into a computer.



David



Re: How to fix I/O errors?

2017-02-04 Thread Ric Moore

On 02/03/2017 04:47 PM, Marc Shapiro wrote:

On 02/02/2017 10:23 PM, David Christensen wrote:


Have you downloaded and run the manufacturer diagnostic utilities for
all your drives?  What do they say?


I have now downloaded and run Seagate's tools and it does show a disk
error.  Since it stops on the first error I do not know if
this is an isolated error, or a more systematic problem.


I'm looking at a Seagate 750 gig drive that went south on me with a pile 
of errors. Good luck getting Seagate to give a good gosh darn. In the 
past I have had mixed results replacing the drive motherboard. I saved 
two out of three. I doubt I will buy anything Seagate makes in the 
future. Ric



--
My father, Victor Moore (Vic) used to say:
"There are two Great Sins in the world...
..the Sin of Ignorance, and the Sin of Stupidity.
Only the former may be overcome." R.I.P. Dad.
http://linuxcounter.net/user/44256.html



Re: How to fix I/O errors?

2017-02-03 Thread David Christensen

On 02/03/17 13:47, Marc Shapiro wrote:

On 02/02/2017 10:23 PM, David Christensen wrote:


Have you downloaded and run the manufacturer diagnostic utilities for
all your drives?  What do they say?


I have now downloaded and run Seagate's tools and it does show a disk
error.  Since it stops on the first error I do not know if
this is an isolated error, or a more systematic problem.

Automatic Write Reallocation Enable (AWRE) is on by default, but
Automatic Read Reallocation Enable (ARRE) is off.  If I set ARRE on and
then run the long test (which reads all sectors sequentially), will that
reallocate any bad sectors and mark them as such?  Is this a safe thing
to do?


Beware that failures have a way of escalating faster than you expect.


Do you have a good backup of the drive?


If not and the data has high value, do not power up the drive.  Pack it 
properly and send it to a professional recovery service.



>> http://www.seagate.com/support/downloads/seatools/
> I see download links for DOS and Windows, nothing for Linux

It is common for Wintel-centric tools to only run on Windows.  It's good 
to keep an operational Windows system drive around.  Mobile docks make 
swapping drives easy.



Sometimes you get lucky and the tool is a live CD:

www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO


> If I remember correctly, [SMART] is
> only going to be useful if it was already installed so the daemon
> could be capturing data when the problem occurred.  Is that correct,
> or am I thinking of a different package?

https://en.wikipedia.org/wiki/S.M.A.R.T.

SMART is built into the firmware of the HDD/SSD; the drive 
microcontroller does most of the work.  Using the right tools, you can 
pull SMART information out of the microcontroller and/or adjust tunable 
SMART parameters.  As the drive is failing, you especially want those 
reports.  If you post them here, people can tell you all kinds of 
interesting things about your drive.
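
As a rough illustration of pulling those reports with smartmontools, the Python sketch below runs `smartctl -A` (the vendor attribute table) and picks out each attribute's threshold and raw value. The helper names are hypothetical, the parser assumes the default ten-column attribute layout, and reading a real drive needs root. The sample rows use the numbers quoted elsewhere in this thread.

```python
import subprocess


def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict keyed by name."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have the columns:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x") or not fields[5].isdigit():
            continue
        attributes[fields[1]] = {
            "id": int(fields[0]),
            "thresh": int(fields[5]),
            # The raw value can contain spaces, so keep everything after column 9.
            "raw": " ".join(fields[9:]),
        }
    return attributes


def read_smart_attributes(device):
    """Run smartctl on a device such as /dev/sda (hypothetical helper, needs root)."""
    # smartctl's exit status is a bit mask and can be non-zero even when the
    # table printed fine, so it is deliberately not treated as fatal here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


if __name__ == "__main__":
    sample = (
        "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       25\n"
        "  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       61430\n"
    )
    for name, info in parse_smart_attributes(sample).items():
        print(name, info["raw"])
```

Keeping the parser separate from the `smartctl` call means it can be tried on saved output without touching a failing drive.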



> I have now downloaded and run Seagate's tools and it does show a disk
> error.  Since it stops on the first error I do not know if
> this is an isolated error, or a more systematic problem.

Take a picture of the screen with a digital camera (or phone), and then 
type the exact screen contents into a reply.



David



Re: How to fix I/O errors?

2017-02-03 Thread Marc Shapiro

On 02/02/2017 10:23 PM, David Christensen wrote:


Have you downloaded and run the manufacturer diagnostic utilities for 
all your drives?  What do they say?


I have now downloaded and run Seagate's tools and it does show a disk 
error.  Since it stops on the first error I do not know if 
this is an isolated error, or a more systematic problem.


Automatic Write Reallocation Enable (AWRE) is on by default, but 
Automatic Read Reallocation Enable (ARRE) is off.  If I set ARRE on and 
then run the long test (which reads all sectors sequentially), will that 
reallocate any bad sectors and mark them as such?  Is this a safe thing 
to do?


Marc



Re: How to fix I/O errors?

2017-02-03 Thread Marc Shapiro

On 02/03/2017 06:50 AM, Mark Fletcher wrote:

On Thu, Feb 02, 2017 at 11:34:03PM -0800, Marc Shapiro wrote:

Have you looked at the SMART reports?  Please paste the following command
into a root shell, run it once for each drive (replacing /dev/sdX with the
corresponding device name), and paste both the command and the output into
your reply:

# smartctl -x /dev/sdX

root:/var/log# smartctl -x /dev/sda

bash: smartctl: command not found

What package would this be in?


apt-file search smartctl

shows this is in package smartmontools (in Jessie, I assume same in
other flavours)

Also it is installed in /usr/sbin which a non-root user doesn't usually
have on their path, which implies it may have to be executed as root.
The # in the sample command line also implies that, but just in case it
wasn't obvious...

Mark

I had been trying as root (see above).  I do not have smartmontools 
currently installed.  If I remember correctly, that is only going to be 
useful if it was already installed so the daemon could be capturing data 
when the problem occurred.  Is that correct, or am I thinking of a 
different package?



Marc



Re: How to fix I/O errors?

2017-02-03 Thread Mark Fletcher
On Thu, Feb 02, 2017 at 11:34:03PM -0800, Marc Shapiro wrote:
> 
> >Have you looked at the SMART reports?  Please paste the following command
> >into a root shell, run it once for each drive (replacing /dev/sdX with the
> >corresponding device name), and paste both the command and the output into
> >your reply:
> >
> ># smartctl -x /dev/sdX
> root:/var/log# smartctl -x /dev/sda
> 
> bash: smartctl: command not found
> 
> What package would this be in?
> 

apt-file search smartctl 

shows this is in package smartmontools (in Jessie, I assume same in 
other flavours)

Also it is installed in /usr/sbin which a non-root user doesn't usually 
have on their path, which implies it may have to be executed as root. 
The # in the sample command line also implies that, but just in case it 
wasn't obvious...
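
A quick way to tell "not installed" apart from "not on my PATH" is to search the sbin directories explicitly. The Python sketch below does that with the standard library's `shutil.which`; the `find_tool` helper and its default directory list are illustrative assumptions, not anything smartmontools provides.

```python
import os
import shutil


def find_tool(name, extra_dirs=("/usr/sbin", "/sbin")):
    """Look for an executable on PATH plus directories that root usually has."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    # shutil.which returns the full path of the first match, or None.
    return shutil.which(name, path=os.pathsep.join(dirs))


if __name__ == "__main__":
    location = find_tool("smartctl")
    if location:
        print(f"smartctl found at {location}")
    else:
        print("smartctl not found; on Debian it is in the smartmontools package")
```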

Mark



Re: How to fix I/O errors?

2017-02-02 Thread Marc Shapiro

On 02/02/2017 10:23 PM, David Christensen wrote:

On 02/02/17 13:05, Marc Shapiro wrote:

I apologize for this being so long, but since the problem occurs
sporadically I wanted to get as much information in this post as
possible because I don't know when it will happen again.

...

What operating system are you running?  Please paste the following 
command into a root shell, run it, and then paste both the command and 
the output into your reply:


# cat /etc/debian_version; uname -a


root:/var/log# cat /etc/debian_version; uname -a
8.6
Linux quixote 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 
(2016-02-29) x86_64 GNU/Linux





Did you make any hardware, software, or configuration changes 
immediately prior to the initial event?

No.



What is the make and model of the disk drive(s) that are having problems?



   *-disk
 description: ATA Disk
 product: ST31000528AS
 vendor: Seagate
 physical id: 0.0.0
 bus info: scsi@0:0.0.0
 logical name: /dev/sda
 version: CC3E
 serial: 5VP9QSWJ
 size: 931GiB (1TB)
 capabilities: partitioned partitioned:dos
 configuration: ansiversion=5 logicalsectorsize=512 
sectorsize=512 signature=575f



Have you looked at the SMART reports?  Please paste the following 
command into a root shell, run it once for each drive (replacing 
/dev/sdX with the corresponding device name), and paste both the 
command and the output into your reply:


# smartctl -x /dev/sdX

root:/var/log# smartctl -x /dev/sda

bash: smartctl: command not found

What package would this be in?




Have you downloaded and run the manufacturer diagnostic utilities for 
all your drives?  What do they say?



David


p.s. here are the links for the Intel, Seagate, and Western Digital 
drive diagnostics.  If your drive(s) are another brand, please find 
their tool and reply with the URL:


http://www.intel.com/content/www/us/en/support/solid-state-drives/ssd-software/intel-ssd-toolbox.html 



http://www.seagate.com/support/downloads/seatools/

I see download links for DOS and Windows, nothing for Linux


http://support.wdc.com/downloads.aspx?DL





Re: How to fix I/O errors?

2017-02-02 Thread David Christensen

On 02/02/17 13:05, Marc Shapiro wrote:

I apologize for this being so long, but since the problem occurs
sporadically I wanted to get as much information in this post as
possible because I don't know when it will happen again.

...

What operating system are you running?  Please paste the following 
command into a root shell, run it, and then paste both the command and 
the output into your reply:


# cat /etc/debian_version; uname -a


Did you make any hardware, software, or configuration changes 
immediately prior to the initial event?



What is the make and model of the disk drive(s) that are having problems?


Have you looked at the SMART reports?  Please paste the following 
command into a root shell, run it once for each drive (replacing 
/dev/sdX with the corresponding device name), and paste both the command 
and the output into your reply:


# smartctl -x /dev/sdX


Have you downloaded and run the manufacturer diagnostic utilities for 
all your drives?  What do they say?



David


p.s. here are the links for the Intel, Seagate, and Western Digital 
drive diagnostics.  If your drive(s) are another brand, please find 
their tool and reply with the URL:


http://www.intel.com/content/www/us/en/support/solid-state-drives/ssd-software/intel-ssd-toolbox.html

http://www.seagate.com/support/downloads/seatools/

http://support.wdc.com/downloads.aspx?DL



Re: How to fix I/O errors?

2017-02-02 Thread Marc Shapiro

On 02/02/2017 04:20 PM, Marc Auslander wrote:

A few observations.

Are your filesystems journaled?  They say ext3, which IIRC does
support journaling.

The flashplayer should not be able to trash the file system.

/var/log/syslog is a place to look for I/O errors.  If you are having
them you likely have a failing disk and need to replace it ASAP.

Given the cost of disks, running RAID 1 with pairs of disks is really
a good idea.  When one fails you pull it, replace it, and rebuild, all
without data loss or loss of use of the system.


Yes, the filesystems are journalled.  I will take a look at /var/log/syslog.


Marc




Re: How to fix I/O errors?

2017-02-02 Thread Marc Auslander
A few observations.

Are your filesystems journaled?  They say ext3, which IIRC does
support journaling.

The flashplayer should not be able to trash the file system.

/var/log/syslog is a place to look for I/O errors.  If you are having
them you likely have a failing disk and need to replace it ASAP.

Given the cost of disks, running RAID 1 with pairs of disks is really
a good idea.  When one fails you pull it, replace it, and rebuild, all
without data loss or loss of use of the system.
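
To make the "look in /var/log/syslog" step concrete, here is a small Python sketch that pulls the device and sector out of kernel I/O error lines and counts them per device. The function names are made up for illustration, and the pattern is an assumption based on the message format quoted in this thread ("I/O error, dev sda, sector N"); other kernel versions may word the line differently.

```python
import re
from collections import Counter

# Matches the tail of lines such as:
#   end_request: I/O error, dev sda, sector 4111464
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def find_io_errors(lines):
    """Return (device, sector) pairs for every I/O error line found."""
    hits = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


def errors_per_device(hits):
    """Count how many error lines were seen for each device."""
    return Counter(device for device, _sector in hits)


def scan_log(path="/var/log/syslog"):
    """Scan a log file; reading it usually needs root or adm group membership."""
    with open(path, errors="replace") as log:
        return find_io_errors(log)


if __name__ == "__main__":
    sample = [
        "[] sd: 0:0:0:0: [sda] Unhandled error code",
        "[] end_request: I/O error, dev sda, sector 4111464",
    ]
    for device, count in errors_per_device(find_io_errors(sample)).items():
        print(f"{device}: {count} I/O error line(s)")
```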



Re: How to fix I/O errors?

2017-02-02 Thread Marc Shapiro

On 02/02/2017 01:40 PM, Marc Shapiro wrote:

On 02/02/2017 01:19 PM, to...@tuxteam.de wrote:

On Thu, Feb 02, 2017 at 01:05:47PM -0800, Marc Shapiro wrote:

I apologize for this being so long, but since the problem occurs
sporadically I wanted to get as much information in this post as
possible because I don't know when it will happen again.

If I were you, I'd take a backup ASAP and double-check whether one
of your disks is dying. Perhaps there's some hint in /var/log/messages.

It might just be a loose cable.

Proceed carefully. If at all possible don't mount your disks read/write
until you know more.

(Perhaps boot off an external medium, CDROM or USB stick).

Regards
- -- tomás

I was on the system last night until close to midnight with no 
problems.  There were only 3 lines for yesterday in /var/log/messages 
and one for just after midnight. libflashplayer seems to be 
segfaulting.  Nothing then until 11:41:30 this morning, which seems to 
be when I rebooted.



Feb  1 07:35:03 quixote rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1993" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Feb  1 13:22:19 quixote kernel: [102750.350970] plugin-containe[2161]: segfault at 15a5ed4cb3c4 ip 7fc269bf9412 sp 7ffc8f383c68 error 6 in libflashplayer.so[7fc269588000+107a000]
Feb  1 21:24:46 quixote kernel: [131744.689533] usblp0: removed
Feb  2 00:00:04 quixote kernel: [141077.771903] plugin-containe[4968]: segfault at 1ccfdd1123c4 ip 7f3b59df9412 sp 7ffc2fe526d8 error 6 in libflashplayer.so[7f3b59788000+107a000]
Feb  2 11:41:30 quixote rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2055" x-info="http://www.rsyslog.com"] start
Feb  2 11:41:30 quixote kernel: [0.00] Initializing cgroup subsys cpuset
Feb  2 11:41:30 quixote kernel: [0.00] Initializing cgroup subsys cpu
Feb  2 11:41:30 quixote kernel: [0.00] Initializing cgroup subsys cpuacct
Feb  2 11:41:30 quixote kernel: [0.00] Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29)
Feb  2 11:41:30 quixote kernel: [0.00] Command line: BOOT_IMAGE=Jessie ro root=UUID=1a16b577-6751-412e-ba89-ca0718922385
Feb  2 11:41:30 quixote kernel: [0.00] e820: BIOS-provided physical RAM map:



The previous time that I had to reboot was two days ago.  The lines in 
/var/log/messages just prior to that reboot also point to segfaults in 
libflashplayer:


Jan 30 11:59:06 quixote kernel: [134871.151137] traps: plugin-containe[5434] general protection ip:7fc9105543aa sp:7ffef51c4240 error:0 in libflashplayer.so[7fc90fe88000+107a000]
Jan 30 18:19:27 quixote kernel: [157729.969257] plugin-containe[32057]: segfault at 1a8 ip 7f2657a2bfd9 sp 7ffe71fb7d20 error 4 in libflashplayer.so[7f2657388000+107a000]
Jan 30 18:24:53 quixote kernel: [158056.527557] plugin-containe[352]: segfault at 1a8 ip 7fa9c7f2bfd9 sp 7ffca4961bc0 error 4 in libflashplayer.so[7fa9c7888000+107a000]
Jan 30 18:25:53 quixote kernel: [158116.346494] plugin-containe[723]: segfault at 237e0f7743c4 ip 7f3d694f9412 sp 7fffba01eed8 error 6 in libflashplayer.so[7f3d68e88000+107a000]
Jan 31 08:53:21 quixote rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1993" x-info="http://www.rsyslog.com"] start
Jan 31 08:53:21 quixote kernel: [0.00] Initializing cgroup subsys cpuset
Jan 31 08:53:21 quixote kernel: [0.00] Initializing cgroup subsys cpu
Jan 31 08:53:21 quixote kernel: [0.00] Initializing cgroup subsys cpuacct
Jan 31 08:53:21 quixote kernel: [0.00] Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29)
Jan 31 08:53:21 quixote kernel: [0.00] Command line: auto BOOT_IMAGE=Jessie ro root=UUID=1a16b577-6751-412e-ba89-ca0718922385
Jan 31 08:53:21 quixote kernel: [0.00] e820: BIOS-provided physical RAM map:



According to adobe.com I currently have version 24.0.0.186 installed and 
the latest version available is 24.0.0.194.  I don't know if the update 
to 24.0.0.186 coincides with the start of my problems, or not.



Marc






Re: How to fix I/O errors?

2017-02-02 Thread Marc Shapiro

On 02/02/2017 01:19 PM, to...@tuxteam.de wrote:

On Thu, Feb 02, 2017 at 01:05:47PM -0800, Marc Shapiro wrote:

I apologize for this being so long, but since the problem occurs
sporadically I wanted to get as much information in this post as
possible because I don't know when it will happen again.

If I were you, I'd take a backup ASAP and double-check whether one
of your disks is dying. Perhaps there's some hint in /var/log/messages.

It might just be a loose cable.

Proceed carefully. If at all possible don't mount your disks read/write
until you know more.

(Perhaps boot off an external medium, CDROM or USB stick).

Regards
- -- tomás

I was on the system last night until close to midnight with no 
problems.  There were only 3 lines for yesterday in /var/log/messages 
and one for just after midnight. libflashplayer seems to be 
segfaulting.  Nothing then until 11:41:30 this morning, which seems to 
be when I rebooted.



Feb  1 07:35:03 quixote rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1993" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Feb  1 13:22:19 quixote kernel: [102750.350970] plugin-containe[2161]: segfault at 15a5ed4cb3c4 ip 7fc269bf9412 sp 7ffc8f383c68 error 6 in libflashplayer.so[7fc269588000+107a000]
Feb  1 21:24:46 quixote kernel: [131744.689533] usblp0: removed
Feb  2 00:00:04 quixote kernel: [141077.771903] plugin-containe[4968]: segfault at 1ccfdd1123c4 ip 7f3b59df9412 sp 7ffc2fe526d8 error 6 in libflashplayer.so[7f3b59788000+107a000]
Feb  2 11:41:30 quixote rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2055" x-info="http://www.rsyslog.com"] start
Feb  2 11:41:30 quixote kernel: [0.00] Initializing cgroup subsys cpuset
Feb  2 11:41:30 quixote kernel: [0.00] Initializing cgroup subsys cpu
Feb  2 11:41:30 quixote kernel: [0.00] Initializing cgroup subsys cpuacct
Feb  2 11:41:30 quixote kernel: [0.00] Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29)
Feb  2 11:41:30 quixote kernel: [0.00] Command line: BOOT_IMAGE=Jessie ro root=UUID=1a16b577-6751-412e-ba89-ca0718922385
Feb  2 11:41:30 quixote kernel: [0.00] e820: BIOS-provided physical RAM map:





Re: How to fix I/O errors?

2017-02-02 Thread tomas
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Feb 02, 2017 at 01:05:47PM -0800, Marc Shapiro wrote:
> I apologize for this being so long, but since the problem occurs
> sporadically I wanted to get as much information in this post as
> possible because I don't know when it will happen again.

If I were you, I'd take a backup ASAP and double-check whether one
of your disks is dying. Perhaps there's some hint in /var/log/messages.

It might just be a loose cable.

Proceed carefully. If at all possible don't mount your disks read/write
until you know more.

(Perhaps boot off an external medium, CDROM or USB stick).

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAliTongACgkQBcgs9XrR2ka/BwCfXGGdH/hABiXZEG/nSFMR3QRJ
rhUAn1wC8V3ZJdqbdEQzV0McASFyNiZE
=vTyi
-----END PGP SIGNATURE-----



How to fix I/O errors?

2017-02-02 Thread Marc Shapiro
I apologize for this being so long, but since the problem occurs 
sporadically I wanted to get as much information in this post as 
possible because I don't know when it will happen again.


This problem started about two weeks ago.  I woke up to find a black 
screen and a kernel panic.  I rebooted and was presented with many fsck 
errors that could not be handled automatically, so I ran it manually, as 
directed.  I took all the defaults.  Any time that I was shown a file 
name it seemed to be a flash file in my daughter's /home directory or 
otherwise related to flash. Afterwards, the only partition where I found 
anything in lost+found was /home, and all of the files there were, 
indeed, showing my daughter as owner.  I shut down and rebooted to get 
everything clean and it seemed good for a while.  Since then, however, 
every day or two things just stop working properly.  Menus cease to do 
anything, pages don't load in the browser, etc.  If I exit from X and 
work at a console, some commands (like ls) seem to work fine; others do 
not, giving me I/O error messages.  I can't even do a typescript, or 
redirect the output to a file that I could attach here, since I just get 
errors.  I can't even do a Ctrl-Alt-Del to reboot, as I get an error saying:


INIT: cannot execute "/sbin/shutdown"


I have no choice but to power down with the power button, which I really 
don't like to do.


It happened again, today, and I manually copied down the errors so I 
hope that I got it all correct.  This is what I did before shutting down:


marc@quixote:~$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=3081484,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=2472496k,mode=755)
/dev/sda2 on / type ext3 (ro,relatime,errors=remount-ro,data=ordered)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
pstore on /sys/fs/pstore type pstore (rw,relatime)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=6622700k)
/dev/mapper/vg1-home on /home type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-tmp--jessie on /tmp type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-usr--jessie on /usr type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-usrlocal on /usr/local type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-photos on /usr/local/photos type ext3 (rw,relatime,data=ordered)
/dev/mapper/vg1-vDisks on /usr/local/vdisks type ext3 (rw,relatime,data=ordered)
/dev/mapper/vg1-var--jessie on /var type ext3 (ro,relatime,data=ordered)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k)
cgmfs on /run/cgmanager/fs type tmpfs (rw,relatime,size=100k,mode=755)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/x86_64-linux-gnu/systemd-shim-cgroup-release-agent,name=systemd)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=2472496k,mode=700,uid=1000,gid=1000)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)


Note that almost all of the real (ext3) filesystems are now mounted 
read-only; only /usr/local/photos and /usr/local/vdisks still show as rw.
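
Picking the read-only entries out of a long mount listing by eye is error-prone, so here is a minimal Python sketch that does it from the text of `mount` output. The function name and the default of looking only at ext3 are assumptions for illustration, and the simple pattern will not cope with mount points that contain spaces.

```python
import re

# One line of `mount` output: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def read_only_mounts(mount_output, fstypes=("ext3",)):
    """Return the mount points of the given filesystem types that are mounted ro."""
    result = []
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        _device, mount_point, fstype, options = match.groups()
        # Compare whole options, so errors=remount-ro is not mistaken for ro.
        if fstype in fstypes and "ro" in options.split(","):
            result.append(mount_point)
    return result


if __name__ == "__main__":
    sample = """\
/dev/sda2 on / type ext3 (ro,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg1-home on /home type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-photos on /usr/local/photos type ext3 (rw,relatime,data=ordered)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
"""
    print(read_only_mounts(sample))
```

A root filesystem mounted with errors=remount-ro, as here, is switched to read-only by the kernel when it hits filesystem errors, which fits the I/O errors shown below.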


I logged out and back in as root.  From /root I attempted to copy a text 
file to /usr/local/photos (which still shows as rw):



cp wheezy1.script /usr/local/photos

[] sd: 0:0:0:0: [sda] Unhandled error code

[] sd: 0:0:0:0: [sda]

[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

[] sd: 0:0:0:0: [sda] CDB:

[] Read(10): 28 00 00 3e bc 68 00 00 08 00

[] end_request: I/O error, dev sda, sector 4111464

[] sd: 0:0:0:0: [sda] Unhandled error code

[] sd: 0:0:0:0: [sda]

[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

[] sd: 0:0:0:0: [sda] CDB:

[] Read(10): 28 00 00 3e bc 68 00 00 08 00

[] end_request: I/O error, dev sda, sector 4111464

-bash /bin/cp: Input/output error


NOTE: all the empty brackets on the left actually had timestamps in 
them.  The same is true in all following cases, as well.
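
The CDB lines in that output are the raw SCSI commands the kernel sent, and they can be decoded by hand. In a 10-byte READ(10) or WRITE(10) command, byte 0 is the opcode (0x28 or 0x2a), bytes 2 to 5 are the starting logical block address and bytes 7 to 8 are the number of blocks, both big-endian. The Python sketch below does the decoding; the function name is made up for illustration and it deliberately handles only these two opcodes.

```python
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10_cdb(hex_bytes):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10:
        raise ValueError("a 10-byte CDB is expected")
    if cdb[0] not in OPCODES:
        raise ValueError(f"unsupported opcode 0x{cdb[0]:02x}")
    return {
        "op": OPCODES[cdb[0]],
        "lba": int.from_bytes(cdb[2:6], "big"),    # starting block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


if __name__ == "__main__":
    # The failing read copied from the console output above.
    print(decode_rw10_cdb("28 00 00 3e bc 68 00 00 08 00"))
```

For the failing read, the decoded address is 4111464, the same sector the kernel names in the end_request line, and the length is 8 blocks (4 KiB with the 512-byte sectors this drive reports).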



I then changed directory to /usr/local/photos and tried to create a new 
file with touch:



touch tempfile

[] Write(10): 2a 00 08 56 9e 0c 00 00 08 00

[] sd: 0:0:0:0: [sda] Unhandled error code

[] sd: 0:0:0:0: [sda]

[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

[] sd: 0:0:0:0: [sda] CDB:

[] Read(10): 28 00 08 56 05 1c 00 00 08 00

[] sd: 0:0:0:0: [sda] Unhandled error code

[] sd: 0:0:0:0: [sda]

[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

[] sd: 0:0:0:0: [sda] CDB:


Finally, I tried to unmount /home with the intention of remounting it to 
see if it would come back as rw:



umount /home

[] sd: 0:0:0:0: [sda]