Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-07 Thread perikillo

On 10/6/06, Kris Kennaway [EMAIL PROTECTED] wrote:

> On Fri, Oct 06, 2006 at 10:08:27AM -0700, perikillo wrote:
>
> > change the scheduler to the old SCHED_4BSD and maxuser from 10 to 32
> > like chuck told me.
>
> These are probably what fixed it.
>
> I guess you've learned a Lesson: when you choose to use code marked as
> experimental, a) don't be surprised when it goes wrong, and b) the
> first thing you should do to try and fix it is to stop using the
> experimental code :-)
>
> Kris

 Yes, this is the last time that I will use *experimental code*. It
looks like everything is back to normal.

 My local backups already finished without any problems; right now it
is bringing in the remote server backups, and they are running well.

  Thanks, people, for all your help.

Greetings!!!
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-06 Thread perikillo

On 10/4/06, backyard [EMAIL PROTECTED] wrote:

> --- Chuck Swiger [EMAIL PROTECTED] wrote:
>
> > On Oct 4, 2006, at 10:32 AM, perikillo wrote:
> > > My kernel file is this:
> > >
> > > machine  i386
> > > cpu   I686_CPU
> >
> > You should also list cpu  I586_CPU, otherwise you will not include
> > some optimizations intended for Pentium or higher processors.
>
> Are you sure about this? This statement seems to
> contradict the handbook, which says it is best to use
> only the CPU you have. I would think I686_CPU would
> cause the build to know it is higher than a Pentium and
> thus use those optimizations. But if this is true...
>
> -brian

Hi people.

  Today I received completed FULL backups from all my local clients,
without any message saying:

vr0: watchdog timeout

  I made some changes to the kernel, Bacula, and the machine:

Machine:
  Disabled the internal NIC (VIA) and installed a Linksys one, which uses
the same driver (vr0).

Kernel:

Change the scheduler to the old SCHED_4BSD and maxusers from 10 to 32,
like Chuck told me.

Disable AHC_ALLOW_MEMIO; this is the first time that I have used this option.

Enable IPFILTER to set up the firewall. I was thinking that maybe I
had been attacked or something like that; I must check this.

Remove some SCSI drivers.

Build the kernel, install it, and reboot.

Bacula:

I set the Heartbeat Interval directive in the client and the storage daemon
to 1 minute, because there is no formula to know which value is
best.

  Today my backups completed successfully, with no horror message. I
have been working with this server these past days, testing and changing
things here and there, until today. I don't know if it was the NIC or some
kernel option, and it is not very easy to test because it is a production
server.

  I checked my firewall logs, but there is nothing that gives any clue
that I have been attacked, good :-)

  I'm testing the backup right now. Today I will do another set of
FULL backups of all my local servers, and I will bring in the backups
from other servers that we have in another building, to see if the
system is stable.

  I will let you know, people. Thanks for your help.


Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-06 Thread Kris Kennaway
On Fri, Oct 06, 2006 at 10:08:27AM -0700, perikillo wrote:

> change the scheduler to the old SCHED_4BSD and maxuser from 10 to 32
> like chuck told me.

These are probably what fixed it.

I guess you've learned a Lesson: when you choose to use code marked as
experimental, a) don't be surprised when it goes wrong, and b) the
first thing you should do to try and fix it is to stop using the
experimental code :-)

Kris





Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-04 Thread perikillo

On 10/3/06, Christopher Swingler [EMAIL PROTECTED] wrote:

> On Oct 3, 2006, at 7:55 PM, perikillo wrote:
>
> > On 10/3/06, perikillo [EMAIL PROTECTED] wrote:
> >
> > >   Hi people i have read a some mails about this problem, [snip]
> > >
> > > Greetings.
> >
> > Wow
> >
> > [snip again]
>
> The only time I have ever had watchdog timeouts is when I've had a
> bad cable or a bad port.  If it's doing it on two entirely different
> NICs, then it is most CERTAINLY a bad cable or a bad port on the
> switch end.  You seem to have addressed the most expensive issue
> first (bad card), which is kind of backwards, but whatever.  You've
> switched ports, that's good too, now switch cables.


Hi people, thanks for your answers. Right now I am googling around to see how
to handle this problem.

 I'm home right now. Here I have one Intel NIC (fxp driver) with 2 ports on
it; this is my firewall machine, but tomorrow I will take it to work. I don't
have access to my server right now, but I will give you the info you
requested ASAP.

  I have another Linksys NIC; some posts say that those 2 NICs work really
well on FreeBSD.

 Another thing that I will try, though I don't know if it will work, is to
disable the drivers in the kernel and just use the modules, and see what
happens :-?

 But first I need to see how my backups finish, and I will let you know,
guys. Thanks for your time.


Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-04 Thread Bill Moran
In response to perikillo [EMAIL PROTECTED]:

[snip]

> Right now my first backup crashed again
>
> xl0: watchdog timeout
>
> Right now I am changing the cable from one port to another to see what happens.
>
> Guys, please, if anyone has something to tell me, this is critical for me.
>
> This is my second NIC.

Don't know if this is related or not, but it may be:
http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/028792.html

-- 
Bill Moran
Collaborative Fusion Inc.


Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-04 Thread perikillo

On 10/4/06, Bill Moran [EMAIL PROTECTED] wrote:

> In response to perikillo [EMAIL PROTECTED]:
>
> [snip]
>
> > Right now my first backup crashed again
> >
> > xl0: watchdog timeout
> >
> > Right now I am changing the cable from one port to another to see what
> > happens.
> >
> > Guys, please, if anyone has something to tell me, this is critical for me.
> >
> > This is my second NIC.
>
> Don't know if this is related or not, but it may be:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/028792.html
>
> --
> Bill Moran
> Collaborative Fusion Inc.


Hi people.

  Today my full backups completed successfully, but my NIC again showed me the
same failure:

Oct  3 23:36:54 bacula kernel: xl0: watchdog timeout
Oct  3 23:36:54 bacula kernel: xl0: no carrier - transceiver cable problem?
Oct  3 23:36:54 bacula kernel: xl0: link state changed to DOWN
Oct  3 23:36:56 bacula kernel: xl0: link state changed to UP
Oct  4 00:39:14 bacula kernel: xl0: watchdog timeout
Oct  4 00:39:14 bacula kernel: xl0: no carrier - transceiver cable problem?
Oct  4 00:39:14 bacula kernel: xl0: link state changed to DOWN
Oct  4 00:39:16 bacula kernel: xl0: link state changed to UP
Oct  4 01:41:39 bacula kernel: xl0: watchdog timeout
Oct  4 01:41:39 bacula kernel: xl0: no carrier - transceiver cable problem?
Oct  4 01:41:39 bacula kernel: xl0: link state changed to DOWN
Oct  4 01:41:42 bacula kernel: xl0: link state changed to UP
Oct  4 08:12:45 bacula login: ROOT LOGIN (root) ON ttyv0
Oct  4 08:15:50 bacula kernel: xl0: link state changed to DOWN
Oct  4 08:20:07 bacula login: ROOT LOGIN (root) ON ttyv1
Oct  4 08:27:34 bacula kernel: xl0: link state changed to UP
Oct  4 08:27:38 bacula kernel: xl0: link state changed to DOWN
Oct  4 08:27:40 bacula kernel: xl0: link state changed to UP
Oct  4 08:31:53 bacula su: ubacula to root on /dev/ttyp0
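
The watchdog entries in the log above can be compared by their spacing in time. What follows is a minimal sketch, not something from the thread: it assumes Python 3, an English/C locale for the month abbreviation, and the year 2006 (syslog timestamps carry no year). The function name `watchdog_gaps` and the inline sample are illustrative only.

```python
from datetime import datetime

# Sample lines copied from the log above; in practice the text would be
# read from the system log file instead.
LOG = """\
Oct  3 23:36:54 bacula kernel: xl0: watchdog timeout
Oct  3 23:36:56 bacula kernel: xl0: link state changed to UP
Oct  4 00:39:14 bacula kernel: xl0: watchdog timeout
Oct  4 01:41:39 bacula kernel: xl0: watchdog timeout
"""


def watchdog_gaps(text, year=2006):
    """Return the seconds between consecutive 'watchdog timeout' entries.

    The year is an assumption: classic syslog timestamps do not include one.
    """
    stamps = []
    for line in text.splitlines():
        if "watchdog timeout" not in line:
            continue
        # split() collapses the double space syslog uses for one-digit days.
        month, day, clock = line.split()[:3]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in watchdog_gaps(LOG):
        print(f"{gap // 60} min {gap % 60} s")
```

For the three timeouts logged above, the gaps come out at a little over 62 minutes each.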

I checked the switch and looked at the port where this server is connected,
but I don't see anything wrong there:

Received                            Transmitted
--------------------------------    --------------------------------
Packets:                53411791    Packets:                93628031
Multicasts:                    0    Multicasts:                37550
Broadcasts:                   19    Broadcasts:                36157
Total Octets:         3644260033    Total Octets:          737446293
Lost Packets:                  0    Lost Packets:                  0
Packets 64 bytes:       16678016    Packets 64 bytes:         959175
  65-127 bytes          36733094      65-127 bytes            384773
  128-255 bytes              384      128-255 bytes           114963
  256-511 bytes               70      256-511 bytes           304495
  512-1023 bytes              60      512-1023 bytes         2472655
  1024-1518 bytes            167      1024-1518 bytes       89391970
FCS Errors:                    0
Collisions:                    0
Undersized Packets:            0    Single Collisions:             0
Oversized Packets:             0    Multiple Collisions:           0
Filtered Packets:             83    Excessive Collisions:          0
Flooded Packets:               0    Deferred Packets:              0
Frame Errors:                  0    Late Collisions:               0


My kernel file is this:

machine         i386
cpu             I686_CPU
ident           BACULA
maxusers        10

# To statically compile in device wiring instead of /boot/device.hints
#hints          "GENERIC.hints"         # Default places to look for devices.

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         SCHED_ULE               # ULE scheduler
#options        SCHED_4BSD              # 4BSD scheduler
options         PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
#options        INET6                   # IPv6 communications protocols
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         MD_ROOT                 # MD is a potential root device
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_GPT                # GUID Partition Tables.
options         COMPAT_43               # Compatible with BSD 4.3 [KEEP THIS!]
#options        COMPAT_FREEBSD4         # Compatible with FreeBSD4
options         COMPAT_FREEBSD5         # Compatible with FreeBSD5
options         SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time

Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-04 Thread Chuck Swiger

On Oct 4, 2006, at 10:32 AM, perikillo wrote:

> My kernel file is this:
>
> machine  i386
> cpu   I686_CPU

You should also list cpu  I586_CPU, otherwise you will not include
some optimizations intended for Pentium or higher processors.

> ident BACULA
> maxusers   10

Unless you've got extremely low RAM in the machine, you should either
increase this to 32 or so, or let it autoconfigure itself.

> # To statically compile in device wiring instead of /boot/device.hints
> #hints "GENERIC.hints" # Default places to look for devices.
>
> makeoptions   DEBUG=-g # Build kernel with gdb(1) debug symbols
>
> options   SCHED_ULE # ULE scheduler
> #options  SCHED_4BSD   # 4BSD scheduler

And you should switch to using SCHED_4BSD instead of SCHED_ULE until  
the bugs are worked out of the ULE scheduler.
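
The two settings flagged in this message (a low maxusers value and the SCHED_ULE scheduler) could be checked mechanically before a kernel build. The following is only a sketch, assuming Python 3. The helper name `check_kernel_config`, the warning texts, and the threshold of 32 are illustrative choices taken from the advice above, not part of any FreeBSD tool. It treats `maxusers 0` as the autoconfigure setting.

```python
def check_kernel_config(text):
    """Return warnings for the two settings discussed in this thread.

    Only active (uncommented) lines are considered.
    """
    warnings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if fields[0] == "options" and "SCHED_ULE" in fields[1:]:
            warnings.append("SCHED_ULE is enabled; consider SCHED_4BSD")
        if fields[0] == "maxusers" and len(fields) > 1 and fields[1].isdigit():
            value = int(fields[1])
            # 0 is taken to mean "autoconfigure"; 32 follows the advice above.
            if 0 < value < 32:
                warnings.append(f"maxusers is {value}; consider 32 or 0 (auto)")
    return warnings


# Hypothetical excerpt modelled on the kernel file quoted in this thread.
SAMPLE = """\
ident     BACULA
maxusers  10
options   SCHED_ULE    # ULE scheduler
#options  SCHED_4BSD   # 4BSD scheduler
"""

if __name__ == "__main__":
    for warning in check_kernel_config(SAMPLE):
        print(warning)
```

Run against the excerpt, it reports both the maxusers value and the active SCHED_ULE line; the commented-out SCHED_4BSD line is ignored.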


--
-Chuck



Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-04 Thread backyard


--- Chuck Swiger [EMAIL PROTECTED] wrote:

> On Oct 4, 2006, at 10:32 AM, perikillo wrote:
> > My kernel file is this:
> >
> > machine  i386
> > cpu   I686_CPU
>
> You should also list cpu  I586_CPU, otherwise you will not include
> some optimizations intended for Pentium or higher processors.


Are you sure about this? This statement seems to
contradict the handbook, which says it is best to use
only the CPU you have. I would think I686_CPU would
cause the build to know it is higher than a Pentium and
thus use those optimizations. But if this is true...


-brian





Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

2006-10-03 Thread perikillo

On 10/3/06, perikillo [EMAIL PROTECTED] wrote:


  Hi people, I have read some mails about this problem; it looks like everyone
was running some 5.X branch. I have been using FreeBSD 6.1 for some months.
Yesterday I ran the buildworld process, and right now my box is running
FreeBSD 6.1-p10.

  This box runs the Bacula server with this NIC:

vr0: VIA VT6102 Rhine II 10/100BaseTX port 0xe400-0xe4ff mem 0xee022000-0xee0220ff at device 18.0 on pci0
vr0: Reserved 0x100 bytes for rid 0x10 type 4 at 0xe400
miibus0: MII bus on vr0
vr0: bpf attached
vr0: Ethernet address: 00:01:6c:2c:09:90
vr0: [MPSAFE]

  This NIC is integrated into the motherboard. I used this box with
FreeBSD 5.4-pX for almost 1 year running Bacula 1.38.5 without a problem.

  1 full backup takes almost 140 GB of data.

Last week I lost 1 Full Backup job from one of my biggest servers, running
RH9 with approx. 80 GB of data; Bacula backed up just 35 GB and marked the
job as Error:

26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: Network error with FD during Backup: ERR=Operation timed out
26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: No Job status returned from FD.
26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Error: Bacula 1.38.11(28Jun06): 26-Sep-2006 00:28:48

  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

  I have no problem with the client; it is running our ERP software, and
there is nothing to report there.

On my FreeBSD console this appears:

vr0: watchdog timeout

  I reset the server, and all the Differential backups have been working
well. I did the buildworld yesterday and left my Bacula server ready to do a
full backup of all my clients, and whoops...

I lost 2 client jobs:

Client 1:

02-Oct 18:30 bacula-dir: Start Backup JobId 176, Job=PDC.2006-10-02_18.30.00
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: Network error with FD during Backup: ERR=Operation timed out
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: No Job status returned from FD.
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 20:40:11
  JobId:                  176
  Job:                    PDC.2006-10-02_18.30.00
  Backup Level:           Full
  Client:                 PDC Windows NT 4.0,MVS,NT 4.0.1381
  FileSet:                PDC-FS 2006-08-21 18:04:12
  Pool:                   FullTape
  Storage:                LTO-1
  Scheduled time:         02-Oct-2006 18:30:00
  Start time:             02-Oct-2006 18:30:06
  End time:               02-Oct-2006 20:40:11
  Elapsed time:           2 hours 10 mins 5 secs
  Priority:               11
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):         FullTape-0004
  Volume Session Id:      2
  Volume Session Time:    1159832414
  Last Volume Bytes:      38,857,830,949 ( 38.85 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

Client 2

02-Oct 21:30 bacula-dir: Start Backup JobId 178, Job=MBXBDCB.2006-10-02_21.30.00
02-Oct 21:31 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:37 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:44 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:51 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 21:58 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 22:04 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
Retrying ...
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Fatal error: bnet.c:859 Unable to connect to File daemon on 192.168.2.9:9102 . ERR=Host is down
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 22:10:03
  JobId:                  178
  Job:                    MBXBDCB.2006-10-02_21.30.00
  Backup Level:           Full
  Client:                 MBXBDCB i686-pc-linux-gnu,redhat,9
  FileSet:                MBXBDCB-FS 2006-08-21 23:00:02
  Pool:                   FullTape
  Storage:                LTO-1
  Scheduled time:         02-Oct-2006 21:30:00
  Start time:             02-Oct-2006 21:30:02
  End time:               02-Oct-2006 22:10:03
  Elapsed time:           40 mins 1 sec
  Priority:               13
  FD Files Written:       0
  SD