Re: [Bacula-users] Block checksum mismatch on file storage
On Sat, 28 Jun 2014 09:30:12 +0200 Kern Sibbald k...@sibbald.com wrote:
> It is unlikely that this is a Bacula problem, especially considering
> your remark that you have used it for years and never had any problems.

Hi List,

First of all, I have to say thanks for all the helpful replies. I checked every disk twice, USB and SATA, but I couldn't find anything. Reinstalled... IO tests... nothing. In the end I ran a memory test overnight and, guess what, the §/$($§ machine has corrupted memory (a handful of bits out of 6 GB).

I don't know when I last had problems with memory, but I guess it must be more than 10 years ago (or maybe I simply didn't notice :-) ).

So I assume it will work again this weekend. And I guess I have to do automated restore tests.

Thanks a lot again
G.

--
Open source business process management suite built on Java and Eclipse
Turn processes into business applications with Bonita BPM Community Edition
Quickly connect people, data, and systems into organized workflows
Winner of BOSSIE, CODIE, OW2 and Gartner awards
http://p.sf.net/sfu/Bonitasoft
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
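One way to automate the restore tests G. mentions is to restore into a scratch directory and compare every restored file with the live original by size and content hash, much as rsync's checksums would catch a corrupt file. The following is a minimal Python sketch, not a Bacula feature: `file_digest` and `compare_trees` are hypothetical helper names, and it assumes the originals have not changed since the backup ran (any file modified in between will be reported as a mismatch).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(original: Path, restored: Path) -> list[str]:
    """Compare a restored tree with the original and list every problem.

    Each regular file under `original` must exist at the same relative
    path under `restored`, with the same size and SHA-256 digest.
    Files that exist only under `restored` are ignored.
    """
    problems: list[str] = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
            continue
        src_size, dst_size = src.stat().st_size, dst.stat().st_size
        if src_size != dst_size:
            # A truncated restore (like the 2949120-byte xx.zip) shows up here.
            problems.append(f"size mismatch: {rel} original {src_size}, restored {dst_size}")
        elif file_digest(src) != file_digest(dst):
            # Same size but different bytes: silent corruption.
            problems.append(f"content mismatch: {rel}")
    return problems
```

Judging from the log in the original report, Bacula restored with a /tmp/restore prefix, so a check of that job would look something like `compare_trees(Path("/data/tmp"), Path("/tmp/restore/data/tmp"))`; an empty list means every file matched.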
Re: [Bacula-users] Block checksum mismatch on file storage
On 06/30/2014 10:35 PM, John Stoffel wrote:
> Kern> Yes, it is clear that one can do read-only tests that do not destroy
> Kern> data. However, in this case, it seems to me more useful to do
> Kern> read/write (it is actually write/read) tests, as it appears that the
> Kern> problem is more likely in the write ...
>
> Absolutely. And hopefully, this way you don't corrupt the existing data
> on the disk, but you do force the disk to do a low-level re-allocation
> of bad blocks and sectors. But if you are seeing bad blocks on the disk,
> then it's time to start thinking about retiring it.

Hmm. I learn something new every day: re-allocation of bad blocks. That sounds very interesting. Thanks for the information; it could be very useful in situations like this.

Best regards,
Kern

> Kern> I have never heard of a non-destructive read/write test, which I
> Kern> assume reads then rewrites the disk. Although that is clever and
> Kern> could be useful, in this case it sounds risky to me on a disk that
> Kern> seems to be failing.
>
> Kern> Best regards,
> Kern> Kern
>
> Kern> On 06/29/2014 09:04 PM, John Stoffel wrote:
> Kern> > Kern> 3. Run read/write disk tests on your USB disk (note: this
> Kern> > Kern> will destroy any existing data).
> Kern> >
> Kern> > This isn't quite right. You can run read-write tests on a
> Kern> > quiescent filesystem (i.e. unmounted) without problems:
> Kern> >
> Kern> >     badblocks -svn /dev/sd?
> Kern> >
> Kern> > will scan the entire disk using non-destructive read-write mode.
> Kern> > But as Kern said, check your logs as well.
> Kern> >
> Kern> > John
Re: [Bacula-users] Block checksum mismatch on file storage
Hello,

Yes, it is clear that one can do read-only tests that do not destroy data. However, in this case, it seems to me more useful to do read/write (it is actually write/read) tests, as it appears that the problem is more likely in the write ...

I have never heard of a non-destructive read/write test, which I assume reads then rewrites the disk. Although that is clever and could be useful, in this case it sounds risky to me on a disk that seems to be failing.

Best regards,
Kern

On 06/29/2014 09:04 PM, John Stoffel wrote:
> Kern> 3. Run read/write disk tests on your USB disk (note: this will
> Kern> destroy any existing data).
>
> This isn't quite right. You can run read-write tests on a quiescent
> filesystem (i.e. unmounted) without problems:
>
>     badblocks -svn /dev/sd?
>
> will scan the entire disk using non-destructive read-write mode. But as
> Kern said, check your logs as well.
>
> John
Re: [Bacula-users] Block checksum mismatch on file storage
I have seen this before with both disk and tape media, where a backup job that finished with no errors cannot later be restored due to I/O errors. The simple answer is that media can fail, even when offline, which is one of the reasons we make more than one backup. It is possible, if cumbersome and expensive, to write to RAID-1 storage, which would practically eliminate this issue. If restoring from a secondary backup is not acceptable for whatever reason, then more fault-tolerant hardware is the only answer.

The alternative I would recommend, where restoring from a secondary backup is acceptable, is to set a volume size limit for disk volumes. Disk media usually fail in a small area, so if there are multiple volumes on the disk, only one (or a few) are likely to be affected. Huge volumes are at greater risk. A smaller volume size does not eliminate the problem, but it mitigates the risk at the expense of a somewhat larger database.

On 6/28/2014 3:30 AM, Kern Sibbald wrote:
> It is unlikely that this is a Bacula problem, especially considering
> your remark that you have used it for years and never had any problems.
> My best guess is that you have bad media (or a bad medium) or a bad
> connector.
>
> When writing, unless the OS reports an error, Bacula assumes the write
> is good. That is, it does not re-read the data. If you want to verify,
> you must run a Bacula Verify job after the backup job.
>
> I suspect that there is no difference between Bacula and rsync except
> that rsync is writing to a part of the media that is good and Bacula is
> writing elsewhere.
>
> There are several solutions (this is not exhaustive):
>
> 1. Get new media.
> 2. Use a more reliable form of backup device (USB is relatively
>    unreliable compared to SATA, ...).
> 3. Run read/write disk tests on your USB disk (note: this will destroy
>    any existing data).
> 4. Check your OS logs. They may show low-level errors that are not
>    reported to Bacula. If you have such errors, you must eliminate them
>    to have reliable backups (or, said the other way around: reliable
>    backups *never* generate any OS device errors).
>
> Best regards,
> Kern
>
> On 06/27/2014 04:36 PM, advan...@posteo.de wrote:
>> Hi List,
>>
>> I have been using Bacula for years now and have had no trouble so far.
>> But now it has really hit me. Well, it worked smoothly... until the
>> restore. (On Ubuntu 12 LTS and Ubuntu 14, Bacula version 5.2.6.)
>>
>> The files were on a USB disk. To be on the safe side, I recreated
>> everything on a local SATA disk. Same result. I run tons of rsync jobs
>> on that disk with no problems, checked it with SMART, and upgraded the
>> system: no change. If I run bacula-sd with -p, the restore goes
>> through, but the files are really corrupted.
>>
>> Luckily I have another backup. But this is really bad. How can I rely
>> on Bacula backups now? (rsync, for example, tells me at once if a file
>> is corrupt.) Do I really have to do a test restore after every job now?
>>
>> Could you give me a hint as to what the problem might be?
>>
>> Thanks
>> G.
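The volume-size argument above can be made concrete with a little arithmetic. A minimal Python sketch, under a strong simplifying assumption: volumes sit back to back on the disk, so volume i occupies bytes [i * size, (i + 1) * size). Real filesystems make no such guarantee, the helper names are hypothetical, and the "bytes to restore elsewhere" figure is a pessimistic upper bound that treats every affected volume as full and wholly suspect. In Bacula itself the cap is normally set per pool, with the Pool resource's Maximum Volume Bytes directive.

```python
from __future__ import annotations

GiB = 1024 ** 3
MiB = 1024 ** 2


def affected_volumes(bad_start: int, bad_end: int, volume_size: int) -> list[int]:
    """Indices of volumes overlapping the damaged byte range [bad_start, bad_end).

    Assumes volume i occupies disk bytes [i * volume_size, (i + 1) * volume_size),
    i.e. volumes are contiguous and stored back to back.
    """
    if bad_end <= bad_start:
        return []
    first = bad_start // volume_size
    last = (bad_end - 1) // volume_size
    return list(range(first, last + 1))


def bytes_to_restore_elsewhere(bad_start: int, bad_end: int, volume_size: int) -> int:
    """Upper bound on backup data that may have to come from the secondary backup."""
    return len(affected_volumes(bad_start, bad_end, volume_size)) * volume_size


if __name__ == "__main__":
    # A hypothetical 1 MiB bad patch, 300 GiB into the disk.
    start, end = 300 * GiB, 300 * GiB + MiB
    for size in (5 * GiB, 50 * GiB, 1000 * GiB):
        vols = affected_volumes(start, end, size)
        suspect = bytes_to_restore_elsewhere(start, end, size) // GiB
        print(f"volume size {size // GiB} GiB: volumes {vols}, {suspect} GiB suspect")
```

The same 1 MiB of damage puts one 5 GiB volume in doubt when volumes are capped at 5 GiB, but the whole 1000 GiB when everything lives in a single huge volume, which is the trade-off described above against a larger catalog.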
Re: [Bacula-users] Block checksum mismatch on file storage
Kern> Yes, it is clear that one can do read-only tests that do not destroy
Kern> data. However, in this case, it seems to me more useful to do
Kern> read/write (it is actually write/read) tests, as it appears that the
Kern> problem is more likely in the write ...

Absolutely. And hopefully, this way you don't corrupt the existing data on the disk, but you do force the disk to do a low-level re-allocation of bad blocks and sectors. But if you are seeing bad blocks on the disk, then it's time to start thinking about retiring it.

Kern> I have never heard of a non-destructive read/write test, which I
Kern> assume reads then rewrites the disk. Although that is clever and
Kern> could be useful, in this case it sounds risky to me on a disk that
Kern> seems to be failing.

Kern> Best regards,
Kern> Kern

Kern> On 06/29/2014 09:04 PM, John Stoffel wrote:
Kern> > Kern> 3. Run read/write disk tests on your USB disk (note: this
Kern> > Kern> will destroy any existing data).
Kern> >
Kern> > This isn't quite right. You can run read-write tests on a
Kern> > quiescent filesystem (i.e. unmounted) without problems:
Kern> >
Kern> >     badblocks -svn /dev/sd?
Kern> >
Kern> > will scan the entire disk using non-destructive read-write mode.
Kern> > But as Kern said, check your logs as well.
Kern> >
Kern> > John
Re: [Bacula-users] Block checksum mismatch on file storage
Kern> 3. Run read/write disk tests on your USB disk (note: this will
Kern> destroy any existing data).

This isn't quite right. You can run read-write tests on a quiescent filesystem (i.e. unmounted) without problems:

    badblocks -svn /dev/sd?

will scan the entire disk using non-destructive read-write mode. But as Kern said, check your logs as well.

John
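In `badblocks -svn`, `-s` shows progress, `-v` is verbose and `-n` selects the non-destructive read-write mode. The idea behind such a mode is to save each block's contents, write test patterns over it, read them back, and then put the saved data back. The following Python sketch only illustrates that idea; it is not how badblocks is implemented, and the function name and pattern bytes are assumptions chosen for illustration. Point it only at a scratch file, or at an unmounted device whose data you have already copied elsewhere.

```python
from __future__ import annotations

import os

# Illustrative test patterns: alternating bits, all ones, all zeros.
PATTERNS = (b"\xaa", b"\x55", b"\xff", b"\x00")


def nondestructive_rw_test(path: str, block_size: int = 4096) -> list[int]:
    """Test each block of `path` in place and return the numbers of bad blocks.

    For every block: save its contents, write and read back each pattern,
    then write the saved contents back and check that they read back intact.
    """
    bad: list[int] = []
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # on Linux this also sizes a block device
        for block_no in range((size + block_size - 1) // block_size):
            offset = block_no * block_size
            original = os.pread(fd, block_size, offset)
            ok = True
            for byte in PATTERNS:
                pattern = byte * len(original)
                os.pwrite(fd, pattern, offset)
                os.fsync(fd)
                if os.pread(fd, len(original), offset) != pattern:
                    ok = False
            # Restore the saved data; a crash before this point loses the block.
            os.pwrite(fd, original, offset)
            os.fsync(fd)
            if os.pread(fd, len(original), offset) != original:
                ok = False
            if not ok:
                bad.append(block_no)
    finally:
        os.close(fd)
    return bad
```

Two caveats bear directly on Kern's worry about running this on a failing disk: if the run is interrupted between a pattern write and the restore, that block is left holding a test pattern instead of your data; and this sketch does not bypass the OS page cache, so a read-back may be served from RAM rather than from the disk. A real tool has to defeat that caching to be meaningful.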
Re: [Bacula-users] Block checksum mismatch on file storage
It is unlikely that this is a Bacula problem, especially considering your remark that you have used it for years and never had any problems. My best guess is that you have bad media (or a bad medium) or a bad connector.

When writing, unless the OS reports an error, Bacula assumes the write is good. That is, it does not re-read the data. If you want to verify, you must run a Bacula Verify job after the backup job.

I suspect that there is no difference between Bacula and rsync except that rsync is writing to a part of the media that is good and Bacula is writing elsewhere.

There are several solutions (this is not exhaustive):

1. Get new media.
2. Use a more reliable form of backup device (USB is relatively unreliable compared to SATA, ...).
3. Run read/write disk tests on your USB disk (note: this will destroy any existing data).
4. Check your OS logs. They may show low-level errors that are not reported to Bacula. If you have such errors, you must eliminate them to have reliable backups (or, said the other way around: reliable backups *never* generate any OS device errors).

Best regards,
Kern

On 06/27/2014 04:36 PM, advan...@posteo.de wrote:
> Hi List,
>
> I have been using Bacula for years now and have had no trouble so far.
> But now it has really hit me. Well, it worked smoothly... until the
> restore. (On Ubuntu 12 LTS and Ubuntu 14, Bacula version 5.2.6.)
>
> The files were on a USB disk. To be on the safe side, I recreated
> everything on a local SATA disk. Same result. I run tons of rsync jobs
> on that disk with no problems, checked it with SMART, and upgraded the
> system: no change. If I run bacula-sd with -p, the restore goes
> through, but the files are really corrupted.
>
> Luckily I have another backup. But this is really bad. How can I rely
> on Bacula backups now? (rsync, for example, tells me at once if a file
> is corrupt.) Do I really have to do a test restore after every job now?
>
> Could you give me a hint as to what the problem might be?
>
> Thanks
> G.
>
> 27-Jun 15:03 backup01-sd JobId 252: Ready to read from volume File010018 on device FileStorage (/data/bacula/FileStorage).
> 27-Jun 15:03 backup01-sd JobId 252: Forward spacing Volume File010018 to file:block 1:699862044.
> 27-Jun 15:03 backup01-sd JobId 252: Error: block.c:318 Volume data error at 1:930427898! Block checksum mismatch in block=3574 len=64512: calc=ea539ac7 blk=d1a3deba
> 27-Jun 15:03 backup01-fd JobId 252: Error: attribs.c:485 File size of restored file /tmp/restore/data/tmp/xx.zip not correct. Original 45958435, restored 2949120.
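Kern's point that Bacula does not re-read what it writes is easy to miss: a write that returns without error only means the OS accepted the data. As a rough illustration of the write-then-read-back idea, here is a minimal Python sketch assuming a POSIX system (ideally Linux); `write_and_verify` is a hypothetical helper, not part of Bacula, and it is no substitute for a Bacula Verify job, which re-reads the volumes Bacula actually wrote.

```python
import os


def write_and_verify(path: str, data: bytes) -> bool:
    """Write `data` to `path`, flush it to the device, then read it back.

    Returns True only if the bytes read back equal the bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # ask the OS to push the data to the device
        if hasattr(os, "posix_fadvise"):
            # Advisory only: ask the kernel to drop the cached pages so the
            # read-back below is more likely to come from the disk, not RAM.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read() == data
```

The `posix_fadvise` call matters: without it the read-back would very likely be served from the page cache and prove nothing about the medium. Even with it the kernel may keep the pages, so a passing check is evidence rather than proof. It also cannot catch data that was already corrupted in memory before it was written, which is relevant given that this thread's culprit turned out to be bad RAM.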
[Bacula-users] Block checksum mismatch on file storage
Hi List,

I have been using Bacula for years now and have had no trouble so far. But now it has really hit me. Well, it worked smoothly... until the restore. (On Ubuntu 12 LTS and Ubuntu 14, Bacula version 5.2.6.)

The files were on a USB disk. To be on the safe side, I recreated everything on a local SATA disk. Same result. I run tons of rsync jobs on that disk with no problems, checked it with SMART, and upgraded the system: no change. If I run bacula-sd with -p, the restore goes through, but the files are really corrupted.

Luckily I have another backup. But this is really bad. How can I rely on Bacula backups now? (rsync, for example, tells me at once if a file is corrupt.) Do I really have to do a test restore after every job now?

Could you give me a hint as to what the problem might be?

Thanks
G.

27-Jun 15:03 backup01-sd JobId 252: Ready to read from volume File010018 on device FileStorage (/data/bacula/FileStorage).
27-Jun 15:03 backup01-sd JobId 252: Forward spacing Volume File010018 to file:block 1:699862044.
27-Jun 15:03 backup01-sd JobId 252: Error: block.c:318 Volume data error at 1:930427898! Block checksum mismatch in block=3574 len=64512: calc=ea539ac7 blk=d1a3deba
27-Jun 15:03 backup01-fd JobId 252: Error: attribs.c:485 File size of restored file /tmp/restore/data/tmp/xx.zip not correct. Original 45958435, restored 2949120.
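The key line in this log is the block checksum mismatch: a checksum is stored with each block when it is written, and the storage daemon recomputes it when reading and compares the two (`calc=` versus `blk=`). The following Python sketch shows that idea, assuming a CRC-32 as computed by `zlib.crc32`; the one-field header and the helper names are hypothetical simplifications, not Bacula's on-disk block format. The block length matches the len=64512 from the log.

```python
from __future__ import annotations

import os
import struct
import zlib

# Hypothetical, simplified block layout: a 4-byte big-endian CRC-32
# followed by the payload. Bacula's real block header holds more fields.
HEADER = struct.Struct(">I")


def make_block(payload: bytes) -> bytes:
    """Prepend the CRC-32 of the payload, as a writer would."""
    return HEADER.pack(zlib.crc32(payload)) + payload


def check_block(block: bytes) -> tuple[bool, int, int]:
    """Recompute the checksum on read: return (ok, calculated, stored)."""
    (stored,) = HEADER.unpack_from(block, 0)
    calc = zlib.crc32(block[HEADER.size:])
    return calc == stored, calc, stored


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a single-bit error, e.g. from a flaky RAM cell."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


if __name__ == "__main__":
    block = make_block(os.urandom(64512 - HEADER.size))  # len=64512, as in the log
    print(check_block(block)[0])  # True: block intact
    ok, calc, stored = check_block(flip_bit(block, 8 * 40000 + 3))
    print(f"ok={ok} calc={calc:08x} blk={stored:08x}")
```

A CRC-32 detects every single-bit error, so one flipped bit anywhere in a block is enough to produce a mismatch like the one reported. The checksum only says that the bytes changed somewhere between computing it at write time and recomputing it at read time; it cannot say whether the disk, the cable, or, as it turned out in this thread, the RAM was at fault.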