Re: [Bacula-users] Does bacula track and or store tape soft errors?
On Mon, 26 Feb 2007, Rex Wheeler wrote:

>> Do you have the expertise to write something which can parse this information or would it need to go on the Bacula wishlist?
>
> I do have the expertise, but not sure about the time. It would be a reasonable undertaking. It would involve:
>
> 1) Updating the storage daemon to send the appropriate SCSI commands to inquire about error counts at mount and unmount times (or possibly with each block read / write). This shouldn't be too bad as the sg_logs utility is out there and can access these statistics. The code could either be pulled from that tool, or the storage daemon could just call that tool and parse the results.

Loading/unloading is handled by external scripts, so this could be quite modular.

> 2) Updating the configuration syntax and parser for the storage daemon so the soft error recording logic could be enabled from the configuration file.

This is likely more complex.

> 3) Changing the protocol between the storage daemon and the director to include the new soft error count information.
>
> 4) Changing the protocol between the director and the catalog service to include the new soft error count information.
>
> 5) Changing the schema in the configuration store to hold this new information.
>
> 6) Other stuff I haven't thought of because I have only spent about an hour looking at the bacula source code.

If this is done in an external program then things are somewhat easier. There is already some scripting logic in place to flag to the operator when cleaning is needed or other errors have been encountered, and it would be terrific if more detailed stats were kept on individual tapes.

- Take Surveys. Earn Cash.
Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Does bacula track and or store tape soft errors?
On Sat, 24 Feb 2007, Rex Wheeler wrote:

> Does anyone know what kind of errors the VolErrors column totals?

Primarily write errors and database vs tape file number mismatches.

Do you have the expertise to write something which can parse this information or would it need to go on the Bacula wishlist?
Re: [Bacula-users] Does bacula track and or store tape soft errors?
> From: Alan Brown
> Sent: Monday, February 26, 2007 10:38 AM
>
> On Sat, 24 Feb 2007, Rex Wheeler wrote:
>
>> Does anyone know what kind of errors the VolErrors column totals?
>
> Primarily write errors and database vs tape file number mismatches.
>
> Do you have the expertise to write something which can parse this information or would it need to go on the Bacula wishlist?

I do have the expertise, but I'm not sure about the time. It would be a reasonable undertaking. It would involve:

1) Updating the storage daemon to send the appropriate SCSI commands to inquire about error counts at mount and unmount times (or possibly with each block read / write). This shouldn't be too bad as the sg_logs utility is out there and can access these statistics. The code could either be pulled from that tool, or the storage daemon could just call that tool and parse the results.

2) Updating the configuration syntax and parser for the storage daemon so the soft error recording logic could be enabled from the configuration file.

3) Changing the protocol between the storage daemon and the director to include the new soft error count information.

4) Changing the protocol between the director and the catalog service to include the new soft error count information.

5) Changing the schema in the configuration store to hold this new information.

6) Other stuff I haven't thought of because I have only spent about an hour looking at the bacula source code.

My initial plan in my spare time is to mess around with the SCSI stuff in the storage engine and just send console messages back to the director. If I can get that to work I may play with the other stuff. I don't, however, take playing with the other stuff lightly. It involves protocol and database changes and I have no idea what the procedure and culture around here is for such things.

Rex
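Step 5 in the list above (a schema change to hold the new counts) could look roughly like the sketch below. It is only an illustration: it uses an in-memory SQLite database as a stand-in for the catalog, the toy Media table has just enough columns to run, and the four new column names are hypothetical. Only the Media table and its VolErrors column are mentioned in this thread.

```python
import sqlite3

# Hypothetical column names; they are not part of any Bacula schema.
NEW_COLUMNS = (
    "SoftReadErrors",
    "SoftWriteErrors",
    "HardReadErrors",
    "HardWriteErrors",
)


def add_error_columns(conn: sqlite3.Connection) -> None:
    """Add per-volume error counters to Media, skipping ones that exist."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(Media)")}
    for name in NEW_COLUMNS:
        if name not in existing:
            conn.execute(
                f"ALTER TABLE Media ADD COLUMN {name} INTEGER NOT NULL DEFAULT 0"
            )


def record_soft_errors(conn, volume_name, soft_read, soft_write):
    """Accumulate soft error counts for one volume."""
    conn.execute(
        "UPDATE Media SET SoftReadErrors = SoftReadErrors + ?, "
        "SoftWriteErrors = SoftWriteErrors + ? WHERE VolumeName = ?",
        (soft_read, soft_write, volume_name),
    )


# Toy stand-in for the catalog, reduced to the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Media ("
    "MediaId INTEGER PRIMARY KEY, VolumeName TEXT, VolErrors INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO Media (VolumeName) VALUES ('TestVol-0001')")

add_error_columns(conn)
add_error_columns(conn)  # running the migration twice changes nothing
record_soft_errors(conn, "TestVol-0001", soft_read=0, soft_write=7)

row = conn.execute(
    "SELECT SoftReadErrors, SoftWriteErrors FROM Media WHERE VolumeName = ?",
    ("TestVol-0001",),
).fetchone()
print(row)
```

Keeping the new counters in separate columns leaves the existing VolErrors semantics untouched, which seems safer given the open question about what VolErrors actually counts.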
Re: [Bacula-users] Does bacula track and or store tape soft errors?
On Monday 26 February 2007 20:20, Rex Wheeler wrote:

>> From: Alan Brown
>> Sent: Monday, February 26, 2007 10:38 AM
>>
>> On Sat, 24 Feb 2007, Rex Wheeler wrote:
>>
>>> Does anyone know what kind of errors the VolErrors column totals?
>>
>> Primarily write errors and database vs tape file number mismatches.
>>
>> Do you have the expertise to write something which can parse this information or would it need to go on the Bacula wishlist?
>
> I do have the expertise, but I'm not sure about the time. It would be a reasonable undertaking. It would involve:
>
> 1) Updating the storage daemon to send the appropriate SCSI commands to inquire about error counts at mount and unmount times (or possibly with each block read / write).

I'm not ready to start putting SCSI commands into the Storage daemon. Maybe some day much later.

> This shouldn't be too bad as the sg_logs utility is out there and can access these statistics. The code could either be pulled from that tool, or the storage daemon could just call that tool and parse the results.

Using an external program poses no problems. However, I suspect that the parsing should be done either in that program or in a script, much the same way that mtx-changer is written.

> 2) Updating the configuration syntax and parser for the storage daemon so the soft error recording logic could be enabled from the configuration file.
>
> 3) Changing the protocol between the storage daemon and the director to include the new soft error count information.
>
> 4) Changing the protocol between the director and the catalog service to include the new soft error count information.
>
> 5) Changing the schema in the configuration store to hold this new information.
>
> 6) Other stuff I haven't thought of because I have only spent about an hour looking at the bacula source code.
>
> My initial plan in my spare time is to mess around with the SCSI stuff in the storage engine and just send console messages back to the director. If I can get that to work I may play with the other stuff. I don't, however, take playing with the other stuff lightly. It involves protocol and database changes and I have no idea what the procedure and culture around here is for such things.

The DIR-SD protocol and database changes are not very difficult. Some of the important changes to maintain Device statistics are already being implemented at the moment by Eric. I think the missing pieces could be easily added either by Eric or myself at the appropriate time.

> Rex
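If the counters are sampled at mount and unmount time by an external script, as discussed above, the per-mount figure is the difference between the two samples. The sketch below shows that arithmetic under two assumptions that may not hold for every drive: that the drive reports cumulative counters, and that a counter going backwards means the drive reset it in between. The ErrorCounters type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCounters:
    """One snapshot of drive counters (hypothetical field names)."""
    corrected: int
    uncorrected: int
    bytes_processed: int


def _diff(before: int, after: int) -> int:
    # If the counter went backwards, assume the drive reset it and
    # treat the later reading as the whole amount for this mount.
    return after - before if after >= before else after


def mount_delta(at_mount: ErrorCounters, at_unmount: ErrorCounters) -> ErrorCounters:
    """Errors and bytes accumulated between the mount and unmount snapshots."""
    return ErrorCounters(
        corrected=_diff(at_mount.corrected, at_unmount.corrected),
        uncorrected=_diff(at_mount.uncorrected, at_unmount.uncorrected),
        bytes_processed=_diff(at_mount.bytes_processed, at_unmount.bytes_processed),
    )


before = ErrorCounters(corrected=3, uncorrected=0, bytes_processed=1_000)
after = ErrorCounters(corrected=10, uncorrected=0, bytes_processed=5_000)
print(mount_delta(before, after))
```

Some drives may clear their counters on every load, in which case the unmount snapshot alone is the answer; the reset branch in `_diff` covers that case as well.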
Re: [Bacula-users] Does bacula track and or store tape soft errors?
resent, to bacula-users

On 24 Feb 2007 at 17:58, Rex Wheeler wrote:

> Does anyone know if bacula keeps track of tape soft errors? (Soft errors being correctable errors that were corrected by the tape drive hardware.)

Bacula does not keep track of this.

> I like to use the number of soft errors as an early warning indicator as to tape failure. I am currently using Veritas Backup Exec and it tracks total read and write errors both soft and hard per tape. I will typically toss a tape if it gets any hard errors and consider tossing it when the soft error count starts to get high. I don't like the linux support on Veritas and I am a couple of versions behind on it. I would like to switch to something that handles linux better and doesn't require me to fork out a bunch of cash.
>
> I took a look at the bacula table structure and there is a VolErrors column in the Media table. I glanced at the source code and it seems the column is related to higher level problems than tape soft errors. Does anyone know what kind of errors the VolErrors column totals?

See http://www.bacula.org/developers/Catalog_Services.html

    Number of errors during Job

> I am currently running version 1.36.3 as offered by the default ubuntu package repositories. I realize that the current version is 2.x, but I haven't found a repository that allows me to install a current version via a package manager. I will get around to building from source soon so I can be on the current version.

You might want to look at how I test tapes (when I get second hand tapes).

http://www.freebsddiary.org/tape-testing.php

Of note is the script that pulls back corrected errors per GB. That is at:

http://www.freebsddiary.org/samples/dlt

The script is FreeBSD specific, but I know one person who has taken it and converted it for use by another OS. Each OS may have its own method for querying the hardware.

FWIW, I've long wanted to start using this script for pulling such information from the drive and collecting stats.

For example, this is the current state. I believe this is a new tape, previously unused, but with a used DLT drive. I have no idea of the history of this drive, but it is in very good condition. I consider anything under 20 corrected errors / GB to be good enough. But I've tested and seen correct backups with up to 600 errors / GB.

    $ sudo ~/bin/dlt sa0
    The tape is 'sa0'
    Corrected errors with substantial delay : 0
    Corrected errors with possible delay    : 0
    Total errors                            : 0
    Total errors corrected                  : 0
    Total times correction algorithm used   : 0
    Total bytes processed                   : 36291200
    Total corrected errors / GB             : 0
    Total uncorrected errors                : 0
    Read compression ratio                  : 600%
    On tape Mbytes read                     : 0
    On tape kbytes read residual            : 71265

    WRITING
    Corrected errors with substantial delay : 0
    Corrected errors with possible delay    : 0
    Total errors                            : 7
    Total errors corrected                  : 7
    Total times correction algorithm used   : 0
    Total bytes processed                   : 763642480
    Total corrected errors / GB             : 9
    Total uncorrected errors                : 0
    Write compression ratio                 : 270%
    Host requested Mbytes written           : 1555
    Host requested kbytes written residual  : 327680
    On tape Mbytes written                  : 576
    On tape kbytes written residual         : 0

-- 
Dan Langille : Software Developer looking for work
my resume: http://www.freebsddiary.org/dan_langille.php
PGCon - The PostgreSQL Conference - http://www.pgcon.org/
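For anyone who wants to collect these numbers over time, the report above is easy to parse. The following is a minimal sketch, not the dlt script itself: it reads "label : value" lines, starts a new section at an all-caps line such as WRITING, and recomputes corrected errors per GB. Two things are assumptions: the first block is filed under "READING" (no such header appears in the output shown), and a GB is taken as 10^9 bytes with the result truncated, which happens to reproduce the 0 and 9 in the report but may not be how the script calculates it. SAMPLE is a trimmed copy of the report.

```python
import re

SAMPLE = """\
The tape is 'sa0'
Total errors corrected : 0
Total bytes processed : 36291200
Total corrected errors / GB: 0
Total uncorrected errors : 0
Read compression ratio : 600%
WRITING
Total errors corrected : 7
Total bytes processed : 763642480
Total corrected errors / GB: 9
Total uncorrected errors : 0
Write compression ratio: 270%
"""

# "label : 123" or "label: 123%"; the label is everything before the colon.
LINE = re.compile(r"^(?P<label>[^:]+?)\s*:\s*(?P<value>\d+)%?$")


def parse_report(text):
    """Return {section: {label: int}} for a dlt-style report."""
    sections = {"READING": {}}      # assumed name for the unlabeled first block
    current = sections["READING"]
    for raw in text.splitlines():
        line = raw.strip()
        if line.isupper() and ":" not in line:
            current = sections.setdefault(line, {})  # e.g. "WRITING"
            continue
        match = LINE.match(line)
        if match:
            current[match.group("label")] = int(match.group("value"))
    return sections


def errors_per_gb(section):
    """Corrected errors per 10**9 bytes, truncated; None if no bytes moved."""
    processed = section.get("Total bytes processed", 0)
    if processed == 0:
        return None
    return section.get("Total errors corrected", 0) * 10**9 // processed


report = parse_report(SAMPLE)
for name, counters in report.items():
    print(name, errors_per_gb(counters))
```

Lines that do not match the pattern, such as the "The tape is 'sa0'" banner, are ignored rather than treated as errors, so the parser tolerates extra chatter in the output.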
Re: [Bacula-users] Does bacula track and or store tape soft errors?
On 25 Feb 2007 at 3:33, Florian Heigl wrote:

> Just a silent heads up, if I may...

Umm, how is this silent? ;)

> I never noticed bacula didn't track media errors, but this *is* a missing feature - the backup tool is expected to have error counters for tape devices and media, both will fail lots once things scale up as they're just parts that wear off over time and one needs an indicator for replacing things on time.
>
> Usually a tape should be blocked from being recycled after a certain point and a tape drive should be disabled for intervention.

Sounds like you want to get this onto the projects page, and ready for the next vote. Or someone could just do the work.

If this is to be done, it must be modular: not every OS will collect the errors the same way. It may even differ from device to device.

-- 
Dan Langille : Software Developer looking for work
my resume: http://www.freebsddiary.org/dan_langille.php
PGCon - The PostgreSQL Conference - http://www.pgcon.org/
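The idea of blocking a tape from being recycled "after a certain point" could be expressed as a small policy function that sits apart from whatever OS-specific code collects the counts. The sketch below combines two rules of thumb stated elsewhere in this thread: discard a tape on any hard error, and treat anything under 20 corrected errors / GB as good enough. The function name, the three verdict strings, and the idea of a configurable limit are hypothetical.

```python
def tape_verdict(hard_errors: int, soft_errors_per_gb: float,
                 soft_limit: float = 20.0) -> str:
    """Classify a tape from its error counts.

    Returns "retire" on any hard (uncorrected) error, "review" when the
    corrected-error rate reaches soft_limit, and "ok" otherwise.
    """
    if hard_errors > 0:
        return "retire"
    if soft_errors_per_gb >= soft_limit:
        return "review"
    return "ok"


# A tape with 9 corrected errors / GB and no hard errors stays in service.
print(tape_verdict(hard_errors=0, soft_errors_per_gb=9))
```

Because the function takes plain numbers, the collection side can differ per OS or per device without affecting the decision.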
Re: [Bacula-users] Does bacula track and or store tape soft errors?
On February 24, 2007 6:41 PM Dan Langille wrote:

> On 24 Feb 2007 at 17:58, Rex Wheeler wrote:
>
>> Does anyone know if bacula keeps track of tape soft errors? (Soft errors being correctable errors that were corrected by the tape drive hardware.)
>
> Bacula does not keep track of this.

Not what I wanted to hear, but thanks for the response.

>> I took a look at the bacula table structure and there is a VolErrors column in the Media table. I glanced at the source code and it seems the column is related to higher level problems than tape soft errors. Does anyone know what kind of errors the VolErrors column totals?
>
> See http://www.bacula.org/developers/Catalog_Services.html
>
>     Number of errors during Job

I saw that page; I was actually wondering if there was a more formal definition of what an error was. Specifically, are the error counts here attributed to the media (the media's fault), or are they errors that just happened to occur in a backup job while the media was mounted?

> You might want to look at how I test tapes (when I get second hand tapes).
>
> http://www.freebsddiary.org/tape-testing.php
>
> Of note is the script that pulls back corrected errors per GB. That is at:
>
> http://www.freebsddiary.org/samples/dlt
>
> The script is FreeBSD specific, but I know one person who has taken it and converted it for use by another OS. Each OS may have its own method for querying the hardware.

It looks like your script uses a utility called camcontrol to send SCSI commands. I poked around and found that the sg_logs utility (from the sg3-utils package) can provide similar statistics on linux with something like sg_logs -a /dev/st0.

Before I hack this out, has anyone here already converted this script, or does anyone have a test utility to determine the tape soft error rate?

Thanks,
Rex
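A Linux counterpart to the FreeBSD script might start from something like the sketch below. It runs the sg_logs -a invocation mentioned above and collects "name = value" lines, grouped under the most recent line that has no "=" in it. The output layout is an assumption: ASSUMED_SAMPLE is an illustrative shape rather than captured output, real output differs between sg3-utils versions and drives, and the numbers are borrowed from the dlt report earlier in the thread.

```python
import re
import subprocess


def run_sg_logs(device: str) -> str:
    """Run `sg_logs -a DEVICE` and return its standard output."""
    result = subprocess.run(
        ["sg_logs", "-a", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Assumed line shape: "  Some counter name = 123"
PAIR = re.compile(r"^\s*(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$")


def parse_pages(text: str) -> dict:
    """Group numeric "name = value" lines under the preceding heading line."""
    pages = {}
    heading = "(no heading)"
    for line in text.splitlines():
        if not line.strip():
            continue
        match = PAIR.match(line)
        if match:
            pages.setdefault(heading, {})[match.group("name")] = int(match.group("value"))
        elif "=" not in line:
            heading = line.strip()  # any line without "=" starts a new group
    return pages


# Illustrative only; not real sg_logs output.
ASSUMED_SAMPLE = """\
Read error counter page
  Total errors corrected = 0
  Total bytes processed = 36291200
Write error counter page
  Total errors corrected = 7
  Total bytes processed = 763642480
  Total uncorrected errors = 0
"""

pages = parse_pages(ASSUMED_SAMPLE)
print(pages["Write error counter page"]["Total errors corrected"])
```

On a real system the call would be `parse_pages(run_sg_logs("/dev/st0"))`, which usually needs root privileges. Grouping by heading matters because read and write counters can share the same names.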