Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon

2011-06-13 Thread Mike Seda
I forgot to mention that during my debugging, I did have Heartbeat 
Interval set to 10 on the Client, Storage, and Director resources. The 
same error still occurred... Very odd.



On 06/10/2011 07:02 PM, Blake Dunlap wrote:
$20 says you have the other Bacula comm channel failing because its
connection state times out on a forwarding device. Dropping the spool size
only helps because it increases the frequency of communication across that
path. You will likely see this problem solved completely by setting a
short-duration keepalive in your Bacula configs.
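At the socket level, a short-duration keepalive means having the kernel send TCP keepalive probes on an otherwise idle connection, so a stateful forwarding device keeps seeing traffic on it. The Python sketch below only illustrates those socket options; it is not Bacula code, and the 60 s idle / 10 s interval / 5 probe values are illustrative assumptions. In Bacula itself the relevant knob is the Heartbeat Interval directive. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not exposed on every platform, so the sketch only sets the ones Python provides.

```python
import socket


def enable_short_keepalive(sock: socket.socket,
                           idle_s: int = 60,
                           interval_s: int = 10,
                           probes: int = 5) -> None:
    """Turn on TCP keepalive with short timers (illustrative values only).

    SO_KEEPALIVE is portable; the per-socket timer options are
    platform-specific and are set only where Python exposes them.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds the connection may sit idle before the first probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between successive unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the kernel declares the peer dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_short_keepalive(s)
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The probes only keep state alive on a device that counts them as traffic; whether a given firewall does so is worth checking rather than assuming.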


-Blake

On Fri, Jun 10, 2011 at 20:48, Mike Seda <mas...@stanford.edu> wrote:


I just encountered a similar error on RHEL 6 using 5.0.3 (on both the server
and the client) with Data Spooling enabled:
10-Jun 02:06 srv084 JobId 43: Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:srv010.nowhere.us:9103: ERR=Broken pipe
10-Jun 02:06 srv084 JobId 43: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe

I made it go away by decreasing Maximum Spool Size from 200G to 2G; I also
got the same error at 100G and 50G. In the end I just disabled data
spooling on this box entirely, since such small spool sizes almost defeat
the point of spooling.
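A rough way to relate spool size to the timeout theory quoted above: assuming the client connection sits idle while the SD despools a full spool file to tape, that idle time is roughly the spool size divided by the drive's write rate. The sketch assumes decimal gigabytes and the LTO-4 native rate of 120 MB/s; real throughput varies, so treat the results as order-of-magnitude estimates only.

```python
# LTO-4 native (uncompressed) transfer rate in MB/s; an assumption, since
# real throughput depends on the drive, the data and the host.
LTO4_NATIVE_MB_PER_S = 120.0


def despool_seconds(spool_gb: float,
                    drive_mb_per_s: float = LTO4_NATIVE_MB_PER_S) -> float:
    """Approximate seconds needed to write one full spool file to tape.

    Uses decimal units (1 GB = 1000 MB) and assumes the drive streams at a
    constant rate, which real drives only approximate.
    """
    if drive_mb_per_s <= 0:
        raise ValueError("drive rate must be positive")
    return spool_gb * 1000.0 / drive_mb_per_s


if __name__ == "__main__":
    # The Maximum Spool Size values mentioned in this message.
    for size_gb in (200, 100, 50, 2):
        minutes = despool_seconds(size_gb) / 60
        print(f"{size_gb:>4} GB spool -> about {minutes:.1f} min of despooling")
```

On these assumptions a 200G spool means roughly 28 minutes per despool, 50G about 7 minutes, and 2G under 20 seconds. That is consistent with an idle-state timeout somewhere in that range, but it is not proof of one.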

I've also been seeing some sporadic tape drive errors recently, so that may
be part of the problem. I will be running the vendor-suggested diagnostics
on the library (Dell TL4000 with 2 x LTO-4 FC drives) in the next couple of
days.

Plus, this is a temporary SD instance that I will eventually migrate to new
hardware with large, fast SAN disk added for spooling. That explains the
small spool size settings: this box only has 2 x 300 GB 10K SAS drives in
RAID 1.

It'd be nice to see if anyone else has received this error on a similar
HW/SW configuration.

Mike


On 06/07/2011 09:48 AM, Yann Cézard wrote:
 On 07/06/2011 18:10, Josh Fisher wrote:
 Another problem I see with Windows 7 clients is overly aggressive power
 management turning off the Ethernet interface even though it is in use
 by bacula-fd. Apparently there is some Windows system call that a
 service (daemon) must make to tell Windows not to do power management
 while it is busy. I don't know which versions of Windows do that, other
 than 7 and Vista, but it is a potential problem.
 There is no power management on our servers :-D

 I just ran some tests this afternoon. I created a new Bacula server
 with Lenny / Bacula 2.4.4 and downgraded the client to 2.4.4, to
 be sure that all was fine with the same fileset, etc.
 The test was OK, no problem, the job ran fine.
 Then I tested again with our production server (5.0.3) and
 the 2.4.4 client => network error, failed job.
 I upgraded the test Bacula server to Squeeze / Bacula 5.0.2,
 and still the 2.4.4 FD on the client => no problem!

 So it seems clear that the problem is in the network hardware on the
 server side.

 We will do some more tests on the network side (change switch port,
 change cable, check whether a firmware update is available...),
 but I now really doubt that the problem is in Bacula, or that it can be
 resolved there.

 The strange thing is that the problems are only observed with win32
 clients. Perhaps the Windows (2003) TCP/IP stack is less fault tolerant
 than the Linux one in some very specific cases?

 Regards,



--
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
mailto:Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users




Re: [Bacula-users] Bacula for openSuSE11.3 rpms missing

2011-06-13 Thread Bruno Friedmann
On 06/12/2011 09:30 PM, Marc Chamberlin wrote:
 Any chance (psheaffer?) someone could contribute the RPMs for
 openSUSE 11.3 x86_64 and put them up on the Bacula website soon? I tried
 the 11.2 versions, but the Python libraries have been updated to a newer
 version, so I get a dependency error.
 
 Is it appropriate to ask here on the forum, or should I submit a bug
 report? Thanks in advance, Marc
 
 
 
Hi Marc

For openSUSE there are several ways to get Bacula.
If you trust the way the current packager has done things (but it is only built for MySQL),
you can get it from the repository
http://download.opensuse.org/repositories/Archiving:/Backup
(This base build makes heavy use of /etc/alternatives and is inspired by the Fedora build.)


Otherwise, dassit (a Bacula Network Partner) has built Bacula in their OBS
home project:

MySQL
https://build.opensuse.org/project/show?project=home%3Adassit%3Abacula%3Abacula-mysql
PostgreSQL
https://build.opensuse.org/project/show?project=home%3Adassit%3Abacula%3Abacula-postgres
SQLite
https://build.opensuse.org/project/show?project=home%3Adassit%3Abacula%3Abacula-sqlite

etc. If you want to see more, go to build.opensuse.org and search for dassit
(they also have single-dir builds, for example).


I was waiting for the 5.1/2x release before proposing a full build for the openSUSE
stack, but we need to wait for that one to stabilize first.

I've personally built 5.0.3+ packages for 11.3, 11.4 and 12.1 (Factory), with a full
rework of the package names (especially libs-) to avoid conflicts, etc.


-- 

Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch

openSUSE Member & Ambassador
GPG KEY : D5C9B751C4653227
irc: tigerfoot



[Bacula-users] purge sometimes resets to bad first write time

2011-06-13 Thread Marek Šimon
Hi,
Sometimes I have to purge a volume manually. Today I purged an old volume
for a job that had been waiting for a volume since yesterday, but after
the job finished, the volume got a first write time from yesterday
(actually, from the time the job started looking for a volume).
Other jobs that needed the same volume had to wait again, and I had to
purge another volume.
I don't like this behaviour, because the job actually used up two volumes.
My scheme is designed to use one volume per day.
Marek





Re: [Bacula-users] bacula IPv6 status (unofficially)

2011-06-13 Thread Gavin McCullagh
Hi James,

On Sat, 11 Jun 2011, James Harper wrote:

 Not really directly bacula related, but one of the concerns I have with
 switching to IPv6 for LAN scale traffic is the performance of the
 various offload features in the network adapters. Did you do any
 throughput testing?

I haven't yet had time to look at performance in detail -- except to say
that none of the backups that have run since seem noticeably slower.  That
being said, it's hard to tell with incremental backups, and some of our full
backups have other bottlenecks.

I'm guessing support for IP checksum offload, TSO et al. will be
driver-dependent, so it may be quite messy to work out which
cards/kernels/OSes do and don't make full use of these with IPv6.

The tool to use for testing this is probably iPerf.  I must do some
investigation myself.
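Until iPerf is set up, a crude single-stream throughput check can be sketched with plain sockets: one thread receives and discards data, the main thread sends a fixed number of bytes, and the rate is bytes over elapsed time. This is only an illustration, not a replacement for iPerf. The function name and defaults are hypothetical; passing an IPv6 address such as "::1" selects AF_INET6. Because the receiver runs on the same host, it measures the local stack rather than the NIC, so an offload comparison would need the receiver on a second machine (not shown).

```python
import socket
import threading
import time


def measure_throughput(host: str = "127.0.0.1",
                       total_bytes: int = 64 * 1024 * 1024,
                       chunk: int = 64 * 1024) -> float:
    """Send total_bytes over one TCP connection and return MB/s (decimal).

    The receiver is a thread on the same host, so only the local stack is
    exercised. Use host="::1" to test IPv6 loopback.
    """
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    server = socket.socket(family, socket.SOCK_STREAM)
    server.bind((host, 0))              # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = 0

    def drain() -> None:
        nonlocal received
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:            # sender closed the connection
                    break
                received += len(data)

    receiver = threading.Thread(target=drain)
    receiver.start()

    payload = b"\0" * chunk
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()                      # wait until every byte has been read
    elapsed = time.perf_counter() - start
    server.close()

    if received != total_bytes:
        raise RuntimeError(f"expected {total_bytes} bytes, got {received}")
    return total_bytes / elapsed / 1e6
```

For example, `print(f"{measure_throughput('::1'):.0f} MB/s")` compares IPv6 loopback with the IPv4 default, provided IPv6 is enabled on the host.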

Gavin





Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon

2011-06-13 Thread Josh Fisher

On 6/13/2011 2:15 AM, Mike Seda wrote:
I forgot to mention that during my debugging, I did have Heartbeat 
Interval set to 10 on the Client, Storage, and Director resources. 
The same error still occurred... Very odd.




I have encountered similar situations with clients. Everything but
Bacula would appear to work over the network, but Bacula would fail. In
one case it was a bad switch, and 2 or 3 other times it was a bad NIC in
the client. My conclusion is that Bacula is very sensitive to network
problems, and since it is network-heavy during a backup, it tends to
reveal network problems when nothing else does. If the client worked in
the past and then suddenly began failing jobs, the problem is probably
not the config. The procedure I now go through to diagnose client
problems is something like:


1) If it is a win32 client, disable OS power management (it can turn off
the NIC's PHY inappropriately)
2) Swap connections with an existing, known working client (if possible)
3) Replace the Ethernet patch cable
4) Connect the client to a different switch (if possible)
5) Replace the client's NIC
6) Try different plenum cabling, or bypass the plenum cabling if possible
7) Physically move the client and connect it directly to the switch the SD
is connected to


So far, this error has always turned out to be a hardware problem for me.



[Bacula-users] Any way to execute action after LTO got Full volstatus?

2011-06-13 Thread Krysztofiak
Hello,
I know it is possible to have the Bacula daemon send mail in this situation,
but is there any way to execute some action (run a script) after an LTO
volume's VolStatus changes from Append to Full?

I would appreciate any help. Thanks.
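One generic workaround, independent of any Bacula-specific hook, is to poll volume status periodically and act on changes between two snapshots. The sketch below only shows the transition-detection logic. How the snapshots are obtained is left as an assumption (for example, a catalog query along the lines of SELECT VolumeName, VolStatus FROM Media run from cron), and `on_full` is a hypothetical callback standing in for whatever script should run. The volume names in the example are made up.

```python
from typing import Callable, Dict, List


def newly_full(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return volumes whose status went from Append to Full between snapshots.

    Each snapshot maps volume name -> VolStatus string. Volumes that only
    appear in `current` are ignored, since their earlier status is unknown.
    """
    return sorted(
        name
        for name, status in current.items()
        if status == "Full" and previous.get(name) == "Append"
    )


def check_and_act(previous: Dict[str, str], current: Dict[str, str],
                  on_full: Callable[[str], None]) -> None:
    """Call the (hypothetical) on_full action once per newly full volume."""
    for name in newly_full(previous, current):
        on_full(name)


if __name__ == "__main__":
    # Made-up snapshots; a real poller would persist the previous one.
    before = {"Vol0001": "Full", "Vol0002": "Append", "Vol0003": "Append"}
    after = {"Vol0001": "Full", "Vol0002": "Full", "Vol0003": "Append"}
    check_and_act(before, after, lambda vol: print(f"would run script for {vol}"))
```

A poller only notices the change at its next run, so the action is delayed by up to one polling interval.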






Re: [Bacula-users] Long running restore canceled by tape error? Any way to continue?

2011-06-13 Thread Bob Hetzel
The problem is that your setup is stretching your hardware way beyond the 
limits as you've configured it.

In most cases I would say that you should configure your file sets such
that backups and restores take less than one day each.  If that means you
have to break up a 50TB filesystem into 25 or even 50 jobs, then so be it.
Given how much work you had it going through, it's quite possible the tape
drive needed cleaning in the middle.

https://www.maxell.co.jp/e/products/industrial/care_comtape/proper/proper3.html

For most environments I suspect cleaning every 50 hours is overkill, but
Bacula does not handle tape cleaning properly yet.  With a job that long
you're just asking for trouble.  I'm not saying with any certainty that the
cleaning issue is what caused your problem, but if you break up that data
area into multiple file sets you may find it far more manageable.  I
presume you've got an autochanger with more than one drive too, so if
you've got the I/O available on your storage array you may even be able to
back it up two parts at a time... (and likewise for restore).

Just curious... is an 8 day restore acceptable to management?  You've 
probably already spent a ton of money on hardware... time to start 
optimizing it.
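The "less than one day per job" rule can be checked against the figures in the JobId 731 report quoted below (Bytes Restored, Start time, End time). The sketch recomputes the reported rate, which matches when KB is taken as 1000 bytes, then estimates the whole ~50TB restore and the per-job time for 25 or 50 equal jobs. It assumes decimal terabytes, a constant rate, and an even split of the data, none of which is guaranteed in practice.

```python
from datetime import datetime

# Figures copied from the JobId 731 restore report quoted in this thread.
BYTES_RESTORED = 32_321_658_846_063
START = datetime(2011, 6, 5, 21, 26, 10)
END = datetime(2011, 6, 11, 8, 2, 25)


def rate_bytes_per_s(nbytes: int, start: datetime, end: datetime) -> float:
    """Average transfer rate over the job's wall-clock time."""
    return nbytes / (end - start).total_seconds()


def hours_for(nbytes: float, bytes_per_s: float) -> float:
    """Hours needed to move nbytes at a constant rate."""
    return nbytes / bytes_per_s / 3600


if __name__ == "__main__":
    rate = rate_bytes_per_s(BYTES_RESTORED, START, END)
    print(f"elapsed: {(END - START).total_seconds() / 3600:.1f} h")
    print(f"rate: {rate / 1000:.1f} KB/s")   # KB = 1000 bytes matches the report
    total = 50e12                             # ~50TB, decimal (assumption)
    print(f"whole restore: {hours_for(total, rate) / 24:.1f} days")
    for jobs in (25, 50):
        print(f"{jobs} jobs: {hours_for(total / jobs, rate):.1f} h each")
```

Run as a script, it reports about 130.6 hours elapsed and 68743.9 KB/s, matching the report's Rate line. At that rate ~50TB takes roughly 8.4 days, while 25 equal jobs would take about 8 hours each and 50 jobs about 4 hours.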

 From: Steve Costaras stev...@chaven.com

 I'm running Bacula 5.0.3 under Ubuntu 10.04 with LTO-4 tapes. I had a restore
 going that was ~8 days long (~50TB). On the second-to-last day it got an I/O
 error on one of the tapes/one of the files, and instead of continuing the
 restore/skipping that file, it canceled the entire restore process.

 Two issues:

 1) Is there a way to continue the restore from where it left off?

 2) What can I do to make sure that future restores do not cancel the
 job when an I/O error happens, but instead just log the file(s) that are
 in error?



 
 2011-06-11 07loki-sd JobId 731: Ready to read from volume AA0011 on device LTO4 (/dev/nst0).
 2011-06-11 07loki-sd JobId 731: Forward spacing Volume AA0011 to file:block 0:1.
 2011-06-11 08loki-sd JobId 731: Error: block.c:1002 Read error on fd=6 at file:blk 3:1185 on device LTO4 (/dev/nst0). ERR=Input/output error.
 2011-06-11 08loki-fd JobId 731: Error: attribs.c:423 File size of restored file /var/ftp/pub/Multimedia/DVD/Television/Stargate/Stargate SG1/Stargate SG1 (2000)-S04D5.iso not correct. Original 7763525632, restored 28570208.
 2011-06-11 08loki-sd JobId 731: Alert: cannot open SCSI device '*None*' - No such file or directory
 2011-06-11 08loki-sd JobId 731: Fatal error: fd_cmds.c:167 Command error with FD, hanging up. Wrong Volume mounted on device LTO4 (/dev/nst0): Wanted AA0011 have AA0010

 2011-06-11 08loki-dir JobId 731: Error: Bacula loki-dir 5.0.3 (04Aug10): 11-Jun-2011 08:02:25
  Build OS: x86_64-unknown-linux-gnu ubuntu 10.04
  JobId: 731
  Job: RestoreFiles.2011-06-05_21.26.08_04
  Restore Client: loki-fd
  Start time: 05-Jun-2011 21:26:10
  End time: 11-Jun-2011 08:02:25
  Files Expected: 3,146,656
  Files Restored: 48,369
  Bytes Restored: 32,321,658,846,063
  Rate: 68743.9 KB/s
  FD Errors: 1
  FD termination status: Error
  SD termination status: Error
  Termination: *** Restore Error ***

 2011-06-11 08loki-dir JobId 731: Begin pruning Jobs older than 1 year 25 days .
 2011-06-11 08loki-dir JobId 731: No Jobs found to prune.
 2011-06-11 08loki-dir JobId 731: Begin pruning Jobs.
 2011-06-11 08loki-dir JobId 731: No Files found to prune.
 2011-06-11 08loki-dir JobId 731: End auto prune.

 2011-06-11 08loki-dir JobId 732: shell command: run BeforeJob /opt/bacula/etc/make_catalog_backup bacula bacula
 2011-06-11 08loki-dir JobId 732: Start Backup JobId 732, Job=loki-BackupCatalog.2011-06-05_23.10.00_05
 2011-06-11 08loki-dir JobId 732: Using Device LTO4
 2011-06-11 08loki-sd JobId 732: Error: block.c:1002 Read error on fd=6 at file:blk 0:0 on device LTO4 (/dev/nst0). ERR=Input/output error.
 2011-06-11 08loki-sd JobId 732: Please mount Volume DD0004 or label a new one for:
  Job: loki-BackupCatalog.2011-06-05_23.10.00_05
  Storage: LTO4 (/dev/nst0)
  Pool: BackupSetDD
  Media type: LTO4
 2011-06-11 08loki-sd JobId 732: Error: block.c:1002 Read error on fd=6 at file:blk 0:0 on device LTO4 (/dev/nst0). ERR=Input/output error.
 ---

