Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-27 Thread Alan Brown
On Mon, 26 Feb 2007, Rex Wheeler wrote:

 Do you have the expertise to write something which can parse this
 information or would it need to go on the Bacula wishlist?

 I do have the expertise, but not sure about the time. It would be a
 reasonable undertaking. It would involve:

 1) Updating the storage daemon to send the appropriate SCSI commands to
 inquire about error counts at mount and unmount times (or possibly with
 each block read / write.) This shouldn't be too bad as the sg_logs
 utility is out there and can access these statistics. The code could
 either be pulled from that tool, or the storage daemon could just call
 that tool and parse the results.

Loading/unloading is handled by external scripts, so this could be quite 
modular.

 2) Updating the configuration syntax and parser for the storage daemon
 so the soft error recording logic could be enabled from the
 configuration file.

This is likely to be more complex.

 3) Changing the protocol between the storage daemon and the director to
 include the new soft error count information.

 4) Changing the protocol between the director and the catalog service to
 include the new soft error count information.

 5) Changing the schema in the configuration store to hold this new
 information.

 6) Other stuff I haven't thought of because I have only about an hour of
 looking at the bacula source code.

If this is done in an external program then things are somewhat easier. 
There is already some scripting logic in place to flag to the operator 
when cleaning is needed or other errors have been encountered, and it would 
be terrific if more detailed stats were kept on individual tapes.
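
As a rough illustration of keeping per-tape stats outside Bacula, here is a minimal sketch using a separate SQLite file. Everything in it is an assumption made for illustration: the table name tape_soft_errors, its columns, and the volume name Vol0001 are hypothetical and are not part of the Bacula catalog. It also assumes each row holds the counters for one mount (a delta), not a running total reported by the drive.

```python
import sqlite3

# Hypothetical side table; this is NOT part of the Bacula catalog schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tape_soft_errors (
    volume_name        TEXT    NOT NULL,
    recorded_at        TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    direction          TEXT    NOT NULL CHECK (direction IN ('read', 'write')),
    corrected_errors   INTEGER NOT NULL,
    uncorrected_errors INTEGER NOT NULL,
    bytes_processed    INTEGER NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the stats database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_sample(conn, volume, direction, corrected, uncorrected, nbytes):
    """Store the counters gathered for one mount of one volume."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tape_soft_errors "
            "(volume_name, direction, corrected_errors, "
            "uncorrected_errors, bytes_processed) VALUES (?, ?, ?, ?, ?)",
            (volume, direction, corrected, uncorrected, nbytes),
        )


def volume_totals(conn, volume):
    """Return (corrected, uncorrected, bytes) summed over all samples."""
    return conn.execute(
        "SELECT COALESCE(SUM(corrected_errors), 0), "
        "COALESCE(SUM(uncorrected_errors), 0), "
        "COALESCE(SUM(bytes_processed), 0) "
        "FROM tape_soft_errors WHERE volume_name = ?",
        (volume,),
    ).fetchone()


store = open_store()
record_sample(store, "Vol0001", "write", 7, 0, 763642480)
record_sample(store, "Vol0001", "read", 0, 0, 36291200)
print(volume_totals(store, "Vol0001"))
```

Keeping this in its own file means nothing in the catalog or the daemon protocols has to change while the idea is being tried out.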


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-26 Thread Alan Brown
On Sat, 24 Feb 2007, Rex Wheeler wrote:

 Does anyone know what kind of errors that the VolErrors column totals?

Primarily write errors and database vs tape file number mismatches

Do you have the expertise to write something which can parse this 
information or would it need to go on the Bacula wishlist?




Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-26 Thread Rex Wheeler


 From: Alan Brown 
 Sent: Monday, February 26, 2007 10:38 AM
 
 On Sat, 24 Feb 2007, Rex Wheeler wrote:
 
  Does anyone know what kind of errors that the VolErrors column totals?
 
 Primarily write errors and database vs tape file number mismatches
 
 Do you have the expertise to write something which can parse this
 information or would it need to go on the Bacula wishlist?

I do have the expertise, but not sure about the time. It would be a
reasonable undertaking. It would involve:

1) Updating the storage daemon to send the appropriate SCSI commands to
inquire about error counts at mount and unmount times (or possibly with
each block read / write.) This shouldn't be too bad as the sg_logs
utility is out there and can access these statistics. The code could
either be pulled from that tool, or the storage daemon could just call
that tool and parse the results.

2) Updating the configuration syntax and parser for the storage daemon
so the soft error recording logic could be enabled from the
configuration file. 

3) Changing the protocol between the storage daemon and the director to
include the new soft error count information.

4) Changing the protocol between the director and the catalog service to
include the new soft error count information.

5) Changing the schema in the configuration store to hold this new
information.

6) Other stuff I haven't thought of because I have only about an hour of
looking at the bacula source code.

My initial plan in my spare time is to mess around with the SCSI stuff
in the storage engine and just send console messages back to the
director. If I can get that to work I may play with the other stuff. I
don't, however, take playing with the other stuff lightly. It involves
protocol and database changes and I have no idea what the procedure and
culture around here is for such things. 

Rex



Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-26 Thread Kern Sibbald
On Monday 26 February 2007 20:20, Rex Wheeler wrote:
  From: Alan Brown
  Sent: Monday, February 26, 2007 10:38 AM
 
  On Sat, 24 Feb 2007, Rex Wheeler wrote:
   Does anyone know what kind of errors that the VolErrors column totals?

  Primarily write errors and database vs tape file number mismatches
 
  Do you have the expertise to write something which can parse this
  information or would it need to go on the Bacula wishlist?

 I do have the expertise, but not sure about the time. It would be a
 reasonable undertaking. It would involve:

 1) Updating the storage daemon to send the appropriate SCSI commands to
 inquire about error counts at mount and unmount times (or possibly with
 each block read / write.) 

I'm not ready to start putting SCSI commands into the Storage daemon.  Maybe 
some day much later.

 This shouldn't be too bad as the sg_logs 
 utility is out there and can access these statistics. The code could
 either be pulled from that tool, or the storage daemon could just call
 that tool and parse the results.

Using an external program poses no problems.  However, I suspect that the 
parsing should be done either in that program or in a script -- much the same 
way that mtx-changer is written.
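
A minimal sketch of that split, with the parsing kept entirely inside the external script, might look like the following. It rests on several assumptions: that the tool prints counters as "name = value" lines under a page heading (the sample text below is made up for illustration and is not captured from a real drive), and that the helper names parse_counters, collect and summary_line are hypothetical. Only the command sg_logs -a /dev/st0 comes from this thread.

```python
import re
import subprocess

# Assumed line shape: "  Some counter name = 123". Real output may differ.
COUNTER_LINE = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Group 'name = value' lines under the most recent heading line."""
    pages = {}
    heading = "unknown"
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters = pages.setdefault(heading, {})
            counters[match.group("name")] = int(match.group("value"))
        elif line.strip():
            # Any other non-blank line is treated as a new page heading.
            heading = line.strip()
    return pages


def collect(command):
    """Run the external tool and return its parsed counters."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_counters(result.stdout)


def summary_line(pages, heading, name):
    """Reduce the parsed data to one short line for the caller to log."""
    return f"{heading}: {name}={pages[heading][name]}"


# Illustrative text only; not output captured from a real device.
SAMPLE = """\
Write error counter page
  Total errors corrected = 7
  Total uncorrected errors = 0
Read error counter page
  Total errors corrected = 0
"""

parsed = parse_counters(SAMPLE)
print(summary_line(parsed, "Write error counter page", "Total errors corrected"))
# On a real system one might call: collect(["sg_logs", "-a", "/dev/st0"])
```

Grouping by heading matters because the same counter names can appear once for reads and once for writes; without it the second value would silently overwrite the first.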


 2) Updating the configuration syntax and parser for the storage daemon
 so the soft error recording logic could be enabled from the
 configuration file.

 3) Changing the protocol between the storage daemon and the director to
 include the new soft error count information.

 4) Changing the protocol between the director and the catalog service to
 include the new soft error count information.

 5) Changing the schema in the configuration store to hold this new
 information.

 6) Other stuff I haven't thought of because I have only about an hour of
 looking at the bacula source code.

 My initial plan in my spare time is to mess around with the SCSI stuff
 in the storage engine and just send console messages back to the
 director. If I can get that to work I may play with the other stuff. I
 don't, however, take playing with the other stuff lightly. It involves
 protocol and database changes and I have no idea what the procedure and
 culture around here is for such things.

The DIR-SD protocol and database changes are not very difficult.  Some of 
the important changes needed to maintain Device statistics are already being 
implemented by Eric.  I think the missing pieces could easily be added 
either by Eric or by me at the appropriate time.

 Rex



Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-24 Thread Dan Langille
resent, to bacula-users

On 24 Feb 2007 at 17:58, Rex Wheeler wrote:

 Does anyone know if bacula keeps track of tape soft errors? (Soft errors
 being correctable errors that were corrected by the tape drive
 hardware.)

Bacula does not keep track of this.

 I like to use the number of soft errors as an early warning indicator as
 to tape failure.
 
 I am currently using Veritas Backup Exec and it tracks total read and
 write errors both soft and hard per tape. I will typically toss a tape
 if it gets any hard errors and consider tossing it when the soft error
 count starts to get high. I don't like the linux support on Veritas and
 I am a couple of versions behind on it. I would like to switch to something
 that handles linux better and doesn't require me to fork out a bunch of
 cash.
 
 I took a look at the bacula table structure and there is a VolErrors
 column in the Media table. I glanced at the source code and it seems the
 column is related to higher level problems than tape soft errors.
 
 Does anyone know what kind of errors that the VolErrors column totals?

See http://www.bacula.org/developers/Catalog_Services.html

Number of errors during Job

 
 
 I am currently running version 1.36.3 as offered by the default ubuntu
 package repositories. I realize that the current version is 2.x, but I
 haven't found a repository that allows me to install a current version
 via a package manager. I will get around to building from source soon so
 I can be on the current version.

You might want to look at how I test tapes (when I get second hand 
tapes).

   http://www.freebsddiary.org/tape-testing.php

Of note is the script that pulls back corrected errors per GB.  That 
is at:

   http://www.freebsddiary.org/samples/dlt

The script is FreeBSD specific, but I know one person who has taken 
it and converted it for use by another OS.  Each OS may have its own 
method for querying the hardware.

FWIW, I've long wanted to start using this script for pulling such 
information from the drive and collecting stats.  For example, this 
is the current state.  I believe this is a new tape, previously 
unused, but with a used DLT drive.  I have no idea of the history of 
this drive, but it is in very good condition.  I consider anything 
under 20 corrected errors / GB to be good enough.  But I've tested 
and seen correct backups with up to 600 errors / GB.

 $ sudo ~/bin/dlt sa0
The tape is 'sa0'
Corrected errors with substantial delay: 0
Corrected errors with possible delay   : 0
Total errors   : 0
Total errors corrected : 0
Total times correction algorithm used  : 0
Total bytes processed  : 36291200
Total corrected errors / GB: 0
Total uncorrected errors   : 0
Read compression ratio : 600%
On tape Mbytes read: 0
On tape kbytes read residual   : 71265
WRITING
Corrected errors with substantial delay: 0
Corrected errors with possible delay   : 0
Total errors   : 7
Total errors corrected : 7
Total times correction algorithm used  : 0
Total bytes processed  : 763642480
Total corrected errors / GB: 9
Total uncorrected errors   : 0
Write compression ratio: 270%
Host requested Mbytes written  : 1555
Host requested kbytes written residual : 327680
On tape Mbytes written : 576
On tape kbytes written residual: 0
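
For anyone wanting to collect these numbers over time, here is a minimal sketch that parses output of the shape shown above and recomputes the corrected errors per GB. The function name parse_dlt_report and the "UNLABELED" section key are hypothetical. It also assumes that 1 GB means 10^9 bytes and that the result is truncated; the script's own rounding is not stated in this thread, though both choices reproduce the 9 shown above.

```python
def parse_dlt_report(text):
    """Split dlt-style output into sections of {label: value}."""
    sections = {}
    current = "UNLABELED"  # counters printed before any section header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" not in line:
            # A bare upper-case word such as WRITING starts a new section;
            # other lines without a colon (e.g. the banner) are skipped.
            if line.isalpha() and line.isupper():
                current = line
            continue
        label, _, value = line.rpartition(":")
        value = value.strip()
        sections.setdefault(current, {})[label.strip()] = (
            int(value) if value.isdigit() else value
        )
    return sections


# A subset of the report shown above.
SAMPLE = """\
The tape is 'sa0'
Total errors corrected : 0
Total bytes processed  : 36291200
WRITING
Total errors corrected : 7
Total bytes processed  : 763642480
Total corrected errors / GB: 9
Write compression ratio: 270%
"""

report = parse_dlt_report(SAMPLE)
writing = report["WRITING"]
# Assumption: GB = 10**9 bytes, truncated to a whole number.
per_gb = writing["Total errors corrected"] * 10**9 // writing["Total bytes processed"]
print(per_gb)
```

Values that are not plain integers, such as the compression ratios, are kept as strings so that nothing is lost in parsing.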

-- 
Dan Langille : Software Developer looking for work
my resume: http://www.freebsddiary.org/dan_langille.php
PGCon - The PostgreSQL Conference - http://www.pgcon.org/





Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-24 Thread Dan Langille
 On 25 Feb 2007 at 3:33, Florian Heigl wrote:

 Just a silent heads up, if I may...

Umm, how is this silent?  ;)

 I never noticed that bacula didn't track media errors, but this *is* a missing
 feature - the backup tool is expected to have error counters for tape devices
 and media. Both will fail a lot once things scale up, as they're just parts
 that wear out over time, and one needs an indicator for replacing them in time.
 
 Usually a tape should be blocked from being recycled after a certain point,
 and a tape drive should be disabled pending intervention.
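
A policy like this can be sketched in a few lines. The thresholds below are only the rules of thumb mentioned elsewhere in this thread (any uncorrected error retires the tape; under 20 corrected errors per GB is considered good enough); the names TapeCounters, corrected_per_gb and recycle_decision are hypothetical, and 1 GB is assumed to mean 10^9 bytes.

```python
from dataclasses import dataclass


@dataclass
class TapeCounters:
    corrected_errors: int    # soft errors fixed by the drive
    uncorrected_errors: int  # hard errors
    bytes_processed: int


def corrected_per_gb(counters):
    """Corrected errors per 10**9 bytes; 0.0 if nothing was processed."""
    if counters.bytes_processed == 0:
        return 0.0
    return counters.corrected_errors / (counters.bytes_processed / 1e9)


def recycle_decision(counters, soft_limit_per_gb=20.0):
    """Return 'retire', 'review' or 'ok' for a tape."""
    if counters.uncorrected_errors > 0:
        return "retire"  # any hard error: stop recycling this tape
    if corrected_per_gb(counters) >= soft_limit_per_gb:
        return "review"  # soft error rate is high: let the operator decide
    return "ok"


print(recycle_decision(TapeCounters(7, 0, 763642480)))
```

The soft limit is a parameter on purpose, since what counts as "a certain point" will vary between sites, drives and media types.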

Sounds like you want to get this onto the projects page, and ready for 
the next vote.

Or someone could just do the work.  If this is to be done, it must be 
modular: not every OS will collect the errors the same way.  It may 
even differ from device to device. 
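
One way to keep it modular is a small table mapping each OS to the external command that collects the counters. This is only a sketch: the COLLECTORS table and collector_command function are hypothetical names, the Linux entry uses the sg_logs -a invocation mentioned in this thread, and the FreeBSD entry assumes the dlt script is on the PATH.

```python
import platform

# Map platform.system() values to a function that builds the command line.
# Both entries are assumptions about what is installed on the host.
COLLECTORS = {
    "Linux": lambda device: ["sg_logs", "-a", device],
    "FreeBSD": lambda device: ["dlt", device],
}


def collector_command(device, system=None):
    """Return the command that should gather error counters for device."""
    system = system or platform.system()
    try:
        build = COLLECTORS[system]
    except KeyError:
        raise NotImplementedError(
            f"no soft-error collector registered for {system}"
        ) from None
    return build(device)


print(collector_command("/dev/st0", system="Linux"))
print(collector_command("sa0", system="FreeBSD"))
```

Supporting another OS, or a drive that needs a different tool, then means adding one entry instead of touching the code that calls it.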

-- 
Dan Langille : Software Developer looking for work
my resume: http://www.freebsddiary.org/dan_langille.php
PGCon - The PostgreSQL Conference - http://www.pgcon.org/





Re: [Bacula-users] Does bacula track and or store tape soft errors?

2007-02-24 Thread Rex Wheeler


On February 24, 2007 6:41 PM Dan Langille wrote:
 
 On 24 Feb 2007 at 17:58, Rex Wheeler wrote:
 
  Does anyone know if bacula keeps track of tape soft errors? (Soft errors
  being correctable errors that were corrected by the tape drive
  hardware.)
 
 Bacula does not keep track of this.

Not what I wanted to hear, but thanks for the response. 

  I took a look at the bacula table structure and there is a VolErrors
  column in the Media table. I glanced at the source code and it seems the
  column is related to higher level problems than tape soft errors.
 
  Does anyone know what kind of errors that the VolErrors column totals?
 
 See http://www.bacula.org/developers/Catalog_Services.html
 
 Number of errors during Job


I saw that page; I was actually wondering if there was a more formal
definition of what an error was. Specifically, are error counts here
considered related to the media (the media's fault) or errors that just
happened to occur to a backup job while the media was mounted?
 
 You might want to look at how I test tapes (when I get second hand
 tapes).
 
http://www.freebsddiary.org/tape-testing.php
 
 Of note is the script that pulls back corrected errors per GB.  That
 is at:
 
http://www.freebsddiary.org/samples/dlt
 
 The script is FreeBSD specific, but I know one person who has taken
 it and converted it for use by another OS.  Each OS may have its own
 method for querying the hardware.

It looks like your script uses a utility called camcontrol to send
SCSI commands. I poked around and found that the sg_logs utility (from
the sg3-utils package) can provide similar statistics on Linux with
something like sg_logs -a /dev/st0. Before I hack this out, has anyone
here already converted this script, or does anyone have a test utility
to determine the tape soft error rate?

Thanks,

Rex

