i have a pair of openbsd boxes, each running a secondary sshd service on an
alternate port on its primary ip address. in addition to their primary
addresses, they also share a carp address.

the secondary sshd service listens on the primary address and the alternate
port (10022). the secondary sshd configuration, users, directories and such
are synchronized between the two boxes, so the secondary sshd services are
identical except for their listening addresses.

each of these boxes also runs relayd. the relayd configurations are
identical, synchronized the same way as the secondary sshd services. relayd
listens on the carp address and relays inbound connections to the two
primary addresses on the alternate port.

the overall goal is to provide a clustered sftp service that protects against
single points of failure and allows for growth and maintenance (e.g. adding
more cluster nodes and/or taking one down for patching).

overall this appears to be working, but (and here is why i'm posting) ...

relayd has been marking its backend hosts down at somewhat random
intervals, complaining of timeouts. we don't always notice (though the
events are definitely being logged) because relayd simply relays connections
to the boxes that remain "up" ... however, we've seen periods where both
backend services are marked down at the same time, and then our inbound
transmissions fail. at that point it comes to everybody's attention.

initially i thought we were having some kind of issue with our switching.
but then i realized that relayd is failing the service running on the same
host as well as the one on the other host. if relayd is listening on address
X on a carp interface and talking to address Y on the physical interface that
the carp interface is attached to, do these packets physically leave the box,
or does openbsd route them locally? if the former, we could still have
network switching issues. if the latter, i probably have a problem with my
openbsd configuration, so i've included my configuration below in hopes that
someone can point out something i'm doing incorrectly. i've also included
dmesg. we do have cron jobs running at 15-minute intervals, which very
roughly line up with at least some of the relayd messages shown below, so i
have to wonder about limits, memory, etc.
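one way to start answering the routing question is to ask the kernel which
interface it would use for each backend address. below is a minimal sketch of
a helper, `route_iface` (a hypothetical name), that prints the `interface:`
field from `route -n get` output; the exact layout of route(8)'s output is an
assumption here, so compare it with what your system actually prints.

```shell
#!/bin/sh
# hypothetical helper: read `route -n get <addr>` output on stdin and print
# the interface the kernel would use. assumes route(8) prints a line of the
# form "  interface: em0". exits non-zero if no such line is found.
route_iface() {
    awk '$1 == "interface:" { print $2; found = 1 } END { exit !found }'
}
```

usage would be something like `route -n get 10.11.12.102 | route_iface` and
`route -n get 10.11.12.103 | route_iface` on clusternode1, comparing the two
answers. running `tcpdump -n -i em0 port 10022` while relayd performs its
checks would show directly whether the check against the local address ever
appears on em0.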

i haven't seen anything listed in errata that leads me to believe it's a
problem with the code on the system, but i'm definitely not running the
latest code. at this point i'm more interested in vetting any configuration
issues or misconceptions on my part, and possibly ruling out these systems as
the cause of the problem.

help, as well as constructive criticism, will certainly be appreciated!


----

# w|grep load
11:47AM  up 24 days, 11:04, 2 users, load averages: 0.14, 0.27, 0.55

# uname -a
OpenBSD clusternode1 5.3 GENERIC.MP#58 i386

# vmstat
 procs    memory       page                    disks    traps          cpu
 r b w    avm     fre  flt  re  pi  po  fr  sr cd0 sd0  int   sys   cs us sy id
 1 7 0  24020 2403692  363   0   0   0   0   0   0   0   54  1186   79  1  1 98



========
/etc/hostname.em0
--------
inet 10.11.12.102 255.255.255.0
-inet6
--------



========
/etc/hostname.carp100
--------
inet 10.11.12.100 255.255.255.0 NONE vhid 100 pass mycarppass advbase 1 advskew 1 description sftpcluster
-inet6
--------
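since only the carp MASTER answers for 10.11.12.100, it helps to know which
node's relayd is actually taking the inbound connections. a minimal sketch,
assuming `ifconfig carp100` prints a line starting with `carp:` followed by
the state (as in `carp: MASTER carpdev em0 vhid 100 ...`); `carp_state` is a
hypothetical helper name.

```shell
#!/bin/sh
# hypothetical helper: read `ifconfig carp100` output on stdin and print the
# carp state (e.g. MASTER or BACKUP). assumes the state is the second field
# of the line beginning with "carp:". exits non-zero if no such line exists.
carp_state() {
    awk '$1 == "carp:" { print $2; found = 1; exit } END { exit !found }'
}
```

usage: `ifconfig carp100 | carp_state` on each node; normally exactly one
should report MASTER. the dmesg below shows clusternode1 making the
BACKUP -> MASTER transition.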



========
/usr/local/etc/sftp/run
--------
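#!/bin/sh
# merge stderr into stdout
exec 2>&1

PORT=10022
# current directory: the sftp service directory holding sftp.config and etc/
MYDIR=`pwd`
# build "-h keyfile" arguments for every host key in ${MYDIR}/etc
HOSTKEYS=
for K in ${MYDIR}/etc/*key ; do HOSTKEYS="${HOSTKEYS} -h ${K}" ; done
SSHD=`which sshd`
# ipv4 address of the interface in the "egress" group (expected to be em0)
MYIP=`ifconfig egress | awk '/inet / { print $2 }'`

# -D: don't detach, -e: log to stderr; listen on the primary ip only
exec ${SSHD} -f ${MYDIR}/sftp.config \
        ${HOSTKEYS} \
        -o ListenAddress=${MYIP}:${PORT} \
        -D -e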
--------
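the MYIP line assumes `ifconfig egress` reports exactly one ipv4 address. if
that ever stopped being true (no default route yet at boot, or a second
interface joining the egress group), sshd would be started with an empty or
mangled ListenAddress. a hedged variant, using a hypothetical `egress_ipv4`
helper, prints nothing and fails unless exactly one `inet` line is present.

```shell
#!/bin/sh
# hypothetical helper: read ifconfig(8) output on stdin and print the ipv4
# address only if exactly one "inet" line is present (inet6 lines are
# ignored because their first field is "inet6"). otherwise exit 1.
egress_ipv4() {
    awk '$1 == "inet" { n++; addr = $2 }
         END { if (n == 1) print addr; else exit 1 }'
}
```

in the run script this could be used as
`MYIP=$(ifconfig egress | egress_ipv4) || exit 1`, so the script exits
instead of launching sshd with a bad address.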



========
/usr/local/etc/sftp/sftp.config
--------
Protocol 2
SyslogFacility AUTH
LogLevel INFO
PermitRootLogin no
StrictModes yes
MaxAuthTries 6
##MaxSessions 10
AuthorizedKeysFile      /usr/local/etc/sftp/authorized_keys/%u.pub
PasswordAuthentication yes
PermitEmptyPasswords no
AllowAgentForwarding no
AllowTcpForwarding no
GatewayPorts no
X11Forwarding no
UsePrivilegeSeparation sandbox
PermitUserEnvironment no
Compression delayed
ClientAliveInterval 0
ClientAliveCountMax 3
PidFile /var/run/sftp.pid
##MaxStartups 10:30:100
PermitTunnel no
ChrootDirectory %h
#Port 10022
UseDNS no
Subsystem       sftp    internal-sftp
ForceCommand internal-sftp -l VERBOSE

--------



========
/usr/local/etc/relayd/run
--------
#!/bin/sh
# merge stderr into stdout
exec 2>&1

RELAYD=`which relayd`

sleep 1
# -d: stay in the foreground and log to stderr, -v: verbose logging
exec ${RELAYD} -d -v -f ./relayd.conf
--------



========
/usr/local/etc/relayd/relayd.conf
--------
table <sftpcluster> { 10.11.12.102 10.11.12.103 } # sftpnode1/sftpnode2
relay "sftpcluster" {
        listen on carp100 port 10022
        forward to <sftpcluster> port 10022 mode loadbalance check send 'foo' expect 'SSH-2*'
}
--------
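for reference, relayd.conf(5) describes the `expect` pattern as matched using
shell globbing rules, so `'SSH-2*'` should match any banner beginning with
`SSH-2`. the sketch below mimics that match with an sh `case` glob on the
first line a backend sends; it is an approximation for testing by hand, not
relayd's own code, and `banner_matches` is a hypothetical name.

```shell
#!/bin/sh
# hypothetical helper: read the first line a backend sends (on stdin) and
# succeed if it matches the same glob used in relayd.conf, 'SSH-2*'.
# a final line without a trailing newline is still examined.
banner_matches() {
    IFS= read -r line || [ -n "$line" ] || return 1
    case $line in
        SSH-2*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

usage, from the relayd box, would be something along the lines of
`nc -w 2 10.11.12.103 10022 </dev/null | banner_matches && echo match`,
optionally wrapped in time(1) to see how long the banner takes to arrive.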

output from relayd:
/var/log/authlog:Mar 18 00:00:02 clusternode1 relayd: host 10.10.10.103, check send expect (209ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 00:00:02 clusternode1 relayd: host 10.10.10.102, check send expect (210ms), state up -> down, availability 99.59%
/var/log/authlog:Mar 18 00:00:12 clusternode1 relayd: host 10.10.10.102, check send expect (109ms), state down -> up, availability 99.59%
/var/log/authlog:Mar 18 00:00:12 clusternode1 relayd: host 10.10.10.103, check send expect (117ms), state down -> up, availability 99.51%
/var/log/authlog:Mar 18 00:01:02 clusternode1 relayd: host 10.10.10.103, check send expect (209ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 00:01:02 clusternode1 relayd: host 10.10.10.102, check send expect (210ms), state up -> down, availability 99.59%
/var/log/authlog:Mar 18 00:01:12 clusternode1 relayd: host 10.10.10.102, check send expect (111ms), state down -> up, availability 99.59%
/var/log/authlog:Mar 18 00:01:12 clusternode1 relayd: host 10.10.10.103, check send expect (118ms), state down -> up, availability 99.51%
/var/log/authlog:Mar 18 00:15:03 clusternode1 relayd: host 10.10.10.102, check send expect (211ms), state up -> down, availability 99.58%
/var/log/authlog:Mar 18 00:15:03 clusternode1 relayd: host 10.10.10.103, check send expect (211ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 00:15:13 clusternode1 relayd: host 10.10.10.102, check send expect (109ms), state down -> up, availability 99.58%
/var/log/authlog:Mar 18 00:15:13 clusternode1 relayd: host 10.10.10.103, check send expect (123ms), state down -> up, availability 99.51%
/var/log/authlog:Mar 18 00:30:04 clusternode1 relayd: host 10.10.10.102, check send expect (305ms), state up -> down, availability 99.58%
/var/log/authlog:Mar 18 00:30:04 clusternode1 relayd: host 10.10.10.103, check send expect (306ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 00:30:14 clusternode1 relayd: host 10.10.10.103, check send expect (112ms), state down -> up, availability 99.51%
/var/log/authlog:Mar 18 00:30:14 clusternode1 relayd: host 10.10.10.102, check send expect (123ms), state down -> up, availability 99.58%
/var/log/authlog:Mar 18 00:45:05 clusternode1 relayd: host 10.10.10.102, check send expect (209ms), state up -> down, availability 99.58%
/var/log/authlog:Mar 18 00:45:14 clusternode1 relayd: host 10.10.10.102, check send expect (113ms), state down -> up, availability 99.58%
/var/log/authlog:Mar 18 01:00:06 clusternode1 relayd: host 10.10.10.103, check send expect (200ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 01:00:15 clusternode1 relayd: host 10.10.10.103, check send expect (128ms), state down -> up, availability 99.51%
/var/log/authlog:Mar 18 01:30:07 clusternode1 relayd: host 10.10.10.103, check send expect (202ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 01:30:17 clusternode1 relayd: host 10.10.10.103, check send expect (116ms), state down -> up, availability 99.51%
/var/log/authlog:Mar 18 02:30:01 clusternode1 relayd: host 10.10.10.103, check send expect (209ms), state up -> down, availability 99.51%
/var/log/authlog:Mar 18 02:30:11 clusternode1 relayd: host 10.10.10.103, check send expect (115ms), state down -> up, availability 99.51%
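to line these events up against the 15-minute cron jobs, the log is easier
to read reduced to one short row per check. two hypothetical helpers,
sketched under the assumption that every line looks like the ones above:
`relayd_events` extracts the time, host, the millisecond figure relayd prints
in parentheses, and the state transition; `quarter_hour_downs` counts how
many up -> down events land within two minutes after :00, :15, :30 or :45
(the two-minute window is an arbitrary choice).

```shell
#!/bin/sh
# hypothetical helpers for relayd host-check log lines; the line format is
# assumed to match the excerpt in this post.

# print "HH:MM:SS host NNNms transition" for each host check line.
# the date is dropped, so feed it one day of log at a time.
relayd_events() {
    sed -n 's/.* \([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\) .* host \([0-9.]*\), check [^(]*(\([0-9]*\)ms), state \([a-z]* -> [a-z]*\),.*/\1 \2 \3ms \4/p'
}

# count "up -> down" events within 2 minutes after a quarter hour.
# $3 is the HH:MM:SS syslog time field, with or without a grep filename
# prefix glued to the month.
quarter_hour_downs() {
    awk '/state up -> down/ {
        split($3, t, ":")
        total++
        if ((t[2] + 0) % 15 < 2) near++
    }
    END { printf "%d of %d down events near a quarter hour\n", near + 0, total + 0 }'
}
```

usage: `relayd_events < /var/log/authlog` for the condensed view, and
`quarter_hour_downs < /var/log/authlog` for the cron correlation count.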



# dmesg
OpenBSD 5.3 (GENERIC.MP) #58: Tue Mar 12 18:43:53 MDT 2013
    dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
cpu0: AMD Opteron(tm) Processor 6278  ("AuthenticAMD" 686-class, 2048KB L2 cache) 2.40 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,NXE,MMXX,FFXSR,LONG,SSE3,CX16,POPCNT,LAHF,EAPICSP,ABM,SSE4A,MASSE,3DNOWP,OSVW,ITSC
real mem  = 3220697088 (3071MB)
avail mem = 3157082112 (3010MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 07/30/13, BIOS32 rev. 0 @ 0xfd780, SMBIOS rev. 2.4 @ 0xe0010 (364 entries)
bios0: vendor Phoenix Technologies LTD version "6.00" date 07/30/2013
bios0: VMware, Inc. VMware Virtual Platform
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP BOOT APIC MCFG SRAT HPET WAET
acpi0: wakeup devices PCI0(S3) USB_(S1) P2P0(S3) S1F0(S3) S2F0(S3) S3F0(S3)
S4F0(S3) S5F0(S3) S6F0(S3) S7F0(S3) S8F0(S3) S9F0(S3) S10F(S3) S11F(S3)
S12F(S3) S13F(S3) S14F(S3) S15F(S3) S16F(S3) S17F(S3) S18F(S3) S19F(S3)
S20F(S3) S21F(S3) S22F(S3) S23F(S3) S24F(S3) S25F(S3) S26F(S3) S27F(S3)
S28F(S3) S29F(S3) S30F(S3) S31F(S3) S32F(S3) P2P1(S3) S1F0(S3) S2F0(S3)
S3F0(S3) S4F0(S3) S5F0(S3) S6F0(S3) S7F0(S3) S8F0(S3) S9F0(S3) S10F(S3)
S11F(S3) S12F(S3) S13F(S3) S14F(S3) S15F(S3) S16F(S3) S17F(S3) S18F(S3)
S19F(S3) S20F(S3) S21F(S3) S22F(S3) S23F(S3) S24F(S3) S25F(S3) S26F(S3)
S27F(S3) S28F(S3) S29F(S3) S30F(S3) S31F(S3) S32F(S3) P2P2(S3) S1F0(S3)
S2F0(S3) S3F0(S3) S4F0(S3) S5F0(S3) S6F0(S3) S7F0(S3) S8F0(S3) S9F0(S3)
S10F(S3) S11F(S3) S12F(S3) S13F(S3) S14F(S3) S15F(S3) S16F(S3) S17F(S3)
S18F(S3) S19F(S3) S20F(S3) S21F(S3) S22F(S3) S23F(S3) S24F(S3) S25F(S3)
S26F(S3) S27F(S3) S28F(S3) S29F(S3) S30F(S3) S31F(S3) S32F(S3) P2P3(S3)
S1F0(S3) S2F0(S3) S3F0(S3) S4F0(S3) S5F0(S3) S6F0(S3) S7F0(S3) S8F0(S3)
S9F0(S3) S10F(S3) S11F(S3) S12F(S3) S13F(S3) S14F(S3) S15F(S3) S16F(S3)
S17F(S3) S18F(S3) S19F(S3) S20F(S3) S21F(S3) S22F(S3) S23F(S3) S24F(S3)
S25F(S3) S26F(S3) S27F(S3) S28F(S3) S29F(S3) S30F(S3) S31F(S3) S32F(S3)
PE40(S3) S1F0(S3) PE50(S3) S1F0(S3) PE60(S3) S1F0(S3) PE70(S3) S1F0(S3)
PE80(S3) S1F0(S3) PE90(S3) S1F0(S3) PEA0(S3) S1F0(S3) PEB0(S3) S1F0(S3)
PEC0(S3) S1F0(S3) PED0(S3) S1F0(S3) PEE0(S3) S1F0(S3) PE41(S3) S1F0(S3)
PE42(S3) S1F0(S3) PE43(S3) S1F0(S3) PE44(S3) S1F0(S3) PE45(S3) S1F0(S3)
PE46(S3) S1F0(S3) PE47(S3) S1F0(S3) PE51(S3) S1F0(S3) PE52(S3) S1F0(S3)
PE53(S3) S1F0(S3) PE54(S3) S1F0(S3) PE55(S3) S1F0(S3) PE56(S3) S1F0(S3)
PE57(S3) S1F0(S3) PE61(S3) S1F0(S3) PE62(S3) S1F0(S3) PE63(S3) S1F0(S3)
PE64(S3) S1F0(S3) PE65(S3) S1F0(S3) PE66(S3) S1F0(S3) PE67(S3) S1F0(S3)
PE71(S3) S1F0(S3) PE72(S3) S1F0(S3) PE73(S3) S1F0(S3) PE74(S3) S1F0(S3)
PE75(S3) S1F0(S3) PE76(S3) S1F0(S3) PE77(S3) S1F0(S3) PE81(S3) S1F0(S3)
PE82(S3) S1F0(S3) PE83(S3) S1F0(S3) PE84(S3) S1F0(S3) PE85(S3) S1F0(S3)
PE86(S3) S1F0(S3) PE87(S3) S1F0(S3) PE91(S3) S1F0(S3) PE92(S3) S1F0(S3)
PE93(S3) S1F0(S3) PE94(S3) S1F0(S3) PE95(S3) S1F0(S3) PE96(S3) S1F0(S3)
PE97(S3) S1F0(S3) PEA1(S3) S1F0(S3) PEA2(S3) S1F0(S3) PEA3(S3) S1F0(S3)
PEA4(S3) S1F0(S3) PEA5(S3) S1F0(S3) PEA6(S3) S1F0(S3) PEA7(S3) S1F0(S3)
PEB1(S3) S1F0(S3) PEB2(S3) S1F0(S3) PEB3(S3) S1F0(S3) PEB4(S3) S1F0(S3)
PEB5(S3) S1F0(S3) PEB6(S3) S1F0(S3) PEB7(S3) S1F0(S3) SLPB(S4) LID_(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD erratum 721 detected and fixed
cpu0: apic clock running at 65MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Opteron(tm) Processor 6278  ("AuthenticAMD" 686-class, 2048KB L2 cache) 2.41 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,NXE,MMXX,FFXSR,LONG,SSE3,CX16,POPCNT,LAHF,EAPICSP,ABM,SSE4A,MASSE,3DNOWP,OSVW,ITSC
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD Opteron(tm) Processor 6278  ("AuthenticAMD" 686-class, 2048KB L2 cache) 2.41 GHz
cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,NXE,MMXX,FFXSR,LONG,SSE3,CX16,POPCNT,LAHF,EAPICSP,ABM,SSE4A,MASSE,3DNOWP,OSVW,ITSC
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD Opteron(tm) Processor 6278  ("AuthenticAMD" 686-class, 2048KB L2 cache) 2.41 GHz
cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,NXE,MMXX,FFXSR,LONG,SSE3,CX16,POPCNT,LAHF,EAPICSP,ABM,SSE4A,MASSE,3DNOWP,OSVW,ITSC
ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 11, 24 pins
acpimcfg0 at acpi0 addr 0xf0000000, bus 0-127
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0
acpicpu1 at acpi0
acpicpu2 at acpi0
acpicpu3 at acpi0
acpibat0 at acpi0: BAT1 not present
acpibat1 at acpi0: BAT2 not present
acpiac0 at acpi0: AC unit online
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: LID_
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x1e00! 0xca000/0x1000 0xdc000/0x4000! 0xe0000/0x8000!
vmt0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pci12 at ppb11 bus 12
ppb12 at pci0 dev 22 function 2 "VMware Virtual PCIE-PCIE" rev 0x01
pci13 at ppb12 bus 13
ppb13 at pci0 dev 22 function 3 "VMware Virtual PCIE-PCIE" rev 0x01
pci14 at ppb13 bus 14
ppb14 at pci0 dev 22 function 4 "VMware Virtual PCIE-PCIE" rev 0x01
pci15 at ppb14 bus 15
ppb15 at pci0 dev 22 function 5 "VMware Virtual PCIE-PCIE" rev 0x01
pci16 at ppb15 bus 16
ppb16 at pci0 dev 22 function 6 "VMware Virtual PCIE-PCIE" rev 0x01
pci17 at ppb16 bus 17
ppb17 at pci0 dev 22 function 7 "VMware Virtual PCIE-PCIE" rev 0x01
pci18 at ppb17 bus 18
ppb18 at pci0 dev 23 function 0 "VMware Virtual PCIE-PCIE" rev 0x01
pci19 at ppb18 bus 19
ppb19 at pci0 dev 23 function 1 "VMware Virtual PCIE-PCIE" rev 0x01
pci20 at ppb19 bus 20
ppb20 at pci0 dev 23 function 2 "VMware Virtual PCIE-PCIE" rev 0x01
pci21 at ppb20 bus 21
ppb21 at pci0 dev 23 function 3 "VMware Virtual PCIE-PCIE" rev 0x01
pci22 at ppb21 bus 22
ppb22 at pci0 dev 23 function 4 "VMware Virtual PCIE-PCIE" rev 0x01
pci23 at ppb22 bus 23
ppb23 at pci0 dev 23 function 5 "VMware Virtual PCIE-PCIE" rev 0x01
pci24 at ppb23 bus 24
ppb24 at pci0 dev 23 function 6 "VMware Virtual PCIE-PCIE" rev 0x01
pci25 at ppb24 bus 25
ppb25 at pci0 dev 23 function 7 "VMware Virtual PCIE-PCIE" rev 0x01
pci26 at ppb25 bus 26
ppb26 at pci0 dev 24 function 0 "VMware Virtual PCIE-PCIE" rev 0x01
pci27 at ppb26 bus 27
ppb27 at pci0 dev 24 function 1 "VMware Virtual PCIE-PCIE" rev 0x01
pci28 at ppb27 bus 28
ppb28 at pci0 dev 24 function 2 "VMware Virtual PCIE-PCIE" rev 0x01
pci29 at ppb28 bus 29
ppb29 at pci0 dev 24 function 3 "VMware Virtual PCIE-PCIE" rev 0x01
pci30 at ppb29 bus 30
ppb30 at pci0 dev 24 function 4 "VMware Virtual PCIE-PCIE" rev 0x01
pci31 at ppb30 bus 31
ppb31 at pci0 dev 24 function 5 "VMware Virtual PCIE-PCIE" rev 0x01
pci32 at ppb31 bus 32
ppb32 at pci0 dev 24 function 6 "VMware Virtual PCIE-PCIE" rev 0x01
pci33 at ppb32 bus 33
ppb33 at pci0 dev 24 function 7 "VMware Virtual PCIE-PCIE" rev 0x01
pci34 at ppb33 bus 34
isa0 at piixpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
mtrr: Pentium Pro MTRR support
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (116ad8073d3d50b2.a) swap on sd0b dump on sd0b
cpu1: AMD erratum 721 detected and fixed
cpu2: AMD erratum 721 detected and fixed
cpu3: AMD erratum 721 detected and fixed
wsdisplay0: screen 5 deleted
wsdisplay0: screen 5 added (80x50, vt100 emulation)
carp100: state transition: BACKUP -> MASTER
