RE: sshd / tcp packet corruption ? ZFS Samba?

2010-06-28 Thread Martin Minkus
Okay guys,

Just thought I'd post that a resolution has been found.

People suggested it could be hardware and to try memtest, which never
found anything.

It seems, though, that in the end the issue is the motherboard: possibly
the southbridge, or something to do with the PCI bus.

The SATA drives, which hang off a Marvell controller in a PCIe slot, were
unaffected. No amount of ZFS scrubs and rsync with checksumming found
anything wrong.

It was only network traffic on the Intel PRO (PCI card) or the onboard
NVIDIA nfe card that had issues. It was worst when using Samba on ZFS,
though god knows why that exposed the issue more.

I never had any kernel panics, just silent data corruption on the PCI
bus.

I moved the HDDs and cards to a different motherboard, and everything is
100% fine.

So a couple of weeks looking at this on and off (and slowly losing my
mind), and it was nothing more than flaky hardware.

Thanks for your help to those who took the time to reply.

Martin.

From: Martin Minkus 
Sent: Monday, 28 June 2010 09:22
To: freebsd-questions@freebsd.org
Subject: RE: sshd / tcp packet corruption ? ZFS  Samba?

 

Hey all,

It was suggested I do a memtest, but that checked out fine. (I wish it
was as simple as just the RAM!)

I've realised the issue manifests itself almost immediately when
accessing an underlying ZFS filesystem using Samba. But if it is UFS, it
is fine.

Does this mean anything to anyone?

I.e., md5'ing the same file over SMB, one copy on UFS (/tmp) and one on ZFS:

cd5d0011c28fb335d57a83b3751831e7 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
bb433ae7e4c3c70c49b3c8c1590e8aa5 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
8eeaf672f6742ae4f900b16ec3cb190a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
bc327dc715516b5ba2e8478036112bd2 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
0cde0cf7ec036cedc8f3294153209b4c *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
71e705470a4af5533eb019e00df3a946 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ba7041e4cad852d00c8da1a461e3b5f9 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
7ce9ea8b9a4d8858899da23472a24c76 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
8f0eff7cb6069ff39aa46e2affc27a4b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c23fceb0302fd59b49e22bce61eabe8d *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
46c9d538c99be3947b92f9ec47bb900a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
2a2a94c94a167a8e525e368aceb07875 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
d303861d09b0584f6c6621e9881e3f63 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ad8f8cef1829de206460b947687909f0 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9a866d9602a9df92b6acb6f1182b05ab *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
5552491a9e295890ad48064440d8d05b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ceee04c26b03132db48d67c076526c82 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
7aa666918d73e40a25ccdb1c104f8476 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
561aa772884c0b7ef139f556355adffb *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
30540ecb4bfb8533969f4a4137a77e79 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c0f315f00be76a4e15dec68de2bba49b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9de4864a97ed4ad9c495c221fe1b932f *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
47c8ad183dbe0d4637229af08cc2cd89 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c9bfe8c7073940acbcdb31430eb4a061 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
327605a6ddb89f7a3e2bd056c5f28b2a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022
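
A repeat-read check like the md5 run above can be scripted. What follows is only a minimal sketch in Python, not the procedure used in this thread: the function names (md5_of, digest_counts, is_stable) and the default of 25 rounds are assumptions made for illustration. Client-side caching can hide differences between reads, so a stable result is indicative rather than conclusive.

```python
import hashlib
from collections import Counter


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_counts(path, rounds=25):
    """Hash the same file `rounds` times and count each distinct digest."""
    return Counter(md5_of(path) for _ in range(rounds))


def is_stable(path, rounds=25):
    """True when every read of the file produced the same digest."""
    return len(digest_counts(path, rounds)) == 1
```

Pointing is_stable() at the same file on each share would be expected to return True for a share that reads back consistently and False for one that behaves like the ZFS-backed share in the listing above.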

RE: sshd / tcp packet corruption ? ZFS Samba?

2010-06-27 Thread Martin Minkus
on usbus0
kbd2 at ukbd0
uhid0: CHICONY Compaq USB Keyboard, class 0/0, rev 1.10/1.05, addr 2 on usbus0
em0: link state changed to UP
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add vfs.zfs.prefetch_disable=0 to /boot/loader.conf.
ZFS filesystem version 3
ZFS storage pool version 14
kinetic:~#

I've since removed everything from /etc/sysctl.conf and /boot/loader.conf,
so no tuning is in use. I've also been fiddling with all sorts of
different settings in smb.conf.

It makes no difference.

I am at a complete loss as to what is going on here.

Should I just give up? Is there some obscure ZFS+Samba issue on FreeBSD?

Thanks,

Martin.


RE: sshd / tcp packet corruption ?

2010-06-23 Thread Martin Minkus
Thanks for the reply. I actually posted a response to this original 
message with more details showing just raw tcp data sent from one box to 
another box is getting corrupted.

The culprit is definitely kinetic.

Furthermore, I've determined that both NICs are doing it.

kinetic:~# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500 <Link#1>      00:0e:0c:6b:d6:d3       49     0     0   190062     0     0
em0    1500 10.64.10.0    kinetic             198516     -     -   189315     -     -
nfe0   1500 <Link#2>      00:24:1d:15:11:48    17932     0     0      219     0     0
nfe0   1500 10.64.11.0    10.64.11.253         12675     -     -      217     -     -
plip0  1500 <Link#3>                               0     0     0        0     0     0
lo0   16384 <Link#4>                             592     0     0      592     0     0
lo0   16384 fe80:4::1     fe80:4::1                0     -     -        0     -     -
lo0   16384 localhost     ::1                      0     -     -        0     -     -
lo0   16384 your-net      localhost              552     -     -      592     -     -
kinetic:~#

Perhaps it is RAM, though; good point. I'll run a memtest.

Martin.

-----Original Message-----
From: Lowell Gilbert [mailto:freebsd-questions-lo...@be-well.ilk.org] 
Sent: Thursday, 24 June 2010 09:41
To: Martin Minkus
Cc: freebsd-questions
Subject: Re: sshd / tcp packet corruption ?

Martin Minkus <martin.min...@punz.co.nz> writes:

> It seems this issue I reported below may actually be related to some
> kind of TCP packet corruption ?

Possible.  Or memory errors.  Hard to say much at this point, when you
don't even know which side is actually causing the errors.

> Still same box. I've noticed my SSH connections into the box will die
> randomly, with errors.
>
> Sshd logs the following on the box itself:
>
> Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from
> 10.64.10.251: 2: Invalid packet header.  This probably indicates a
> problem with key exchange or encryption.

You might find more useful information by getting verbose messages from
the other end.  

I don't have time to check this in detail, but if I recall correctly,
that message means that the other side closed the connection based on an
apparent invalid header type in a packet that 'kinetic' received.
Random corruption isn't likely in that case, because the error is always
in the same place in the packet.  Check the 'netstat -i' numbers to see
if the drivers are picking up any packet errors.

It's hard to debug network problems in ssh, though, because (obviously)
you can't tell in general whether packet data is corrupt.  If you can
set up a test case with, say, UDP echo, that would be easier to see the
damage to the packets if they are, in fact, being corrupted.  
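
A UDP echo test of the kind suggested above could look roughly like this. It is a minimal sketch in Python under stated assumptions: the helper names (diff_offsets, serve_echo, probe) and the 2 second timeout are hypothetical, and the echo side is assumed to run on the other machine. A timeout does not prove corruption; it can also mean the datagram was dropped, for instance because its UDP checksum no longer matched.

```python
import socket


def diff_offsets(sent, received):
    """Return the byte offsets at which two byte strings differ.

    A length mismatch is reported as a difference at every offset past
    the end of the shorter string.
    """
    common = min(len(sent), len(received))
    offsets = [i for i in range(common) if sent[i] != received[i]]
    offsets.extend(range(common, max(len(sent), len(received))))
    return offsets


def serve_echo(sock, count):
    """Echo `count` datagrams back to whoever sent them, then return."""
    for _ in range(count):
        data, peer = sock.recvfrom(65535)
        sock.sendto(data, peer)


def probe(addr, payload, timeout=2.0):
    """Send `payload` to a UDP echo service; return the damaged offsets."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(payload, addr)
        reply, _ = client.recvfrom(65535)
    return diff_offsets(payload, reply)
```

Sending a known pattern such as bytes(range(256)) * 4 makes the damage easy to read: an empty list means the datagram came back intact, and any offsets returned show where in the packet the bytes changed.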

Unfortunately, I'm so used to having sophisticated test equipment in the
lab to look at these kinds of problems that I'm probably missing what
would be obvious to someone who deals with problems in the field.
Hope I've been somewhat helpful anyway.


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


sshd / tcp packet corruption ?

2010-06-22 Thread Martin Minkus
It seems this issue I reported below may actually be related to some
kind of TCP packet corruption?

Still the same box. I've noticed my SSH connections into the box will die
randomly, with errors.

sshd logs the following on the box itself:

Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 18 11:15:41 kinetic sshd[15746]: Accepted publickey for martinm from 10.64.10.251 port 56469 ssh2
Jun 18 11:15:58 kinetic su: nss_ldap: could not get LDAP result - Can't contact LDAP server
Jun 18 11:15:58 kinetic su: martinm to root on /dev/pts/0
Jun 18 11:16:06 kinetic su: martinm to root on /dev/pts/1
Jun 18 11:16:29 kinetic sshd[15748]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:30 kinetic sshd[15746]: syslogin_perform_logout: logout() returned an error
Jun 18 11:16:34 kinetic sshd[16511]: Accepted publickey for martinm from 10.64.10.251 port 56470 ssh2
Jun 18 11:16:41 kinetic sshd[16513]: Received disconnect from 10.64.10.251: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:41 kinetic sshd[16511]: syslogin_perform_logout: logout() returned an error

Jun 23 15:52:59 kinetic sshd[56974]: Received disconnect from 10.64.10.209: 5: Message Authentication Code did not verify (packet #75658). Data integrity has been compromised.
Jun 23 15:53:12 kinetic sshd[57109]: Accepted publickey for martinm from 10.64.10.209 port 9494 ssh2
Jun 23 15:53:38 kinetic su: martinm to root on /dev/pts/3
Jun 23 15:56:36 kinetic sshd[57111]: Received disconnect from 10.64.10.209: 2: Invalid packet header.  This probably indicates a problem with key exchange or encryption.
Jun 23 15:56:44 kinetic sshd[57151]: Accepted publickey for martinm from 10.64.10.209 port 9534 ssh2

My Google-fu has failed me on this.

Any ideas what on earth this could be?

Ethernet card?

em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.1> port 0xcc00-0xcc3f mem 0xfdfe-0xfdff,0xfdfc-0xfdfd irq 17 at device 7.0 on pci1
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:6b:d6:d3

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:0e:0c:6b:d6:d3
        inet 10.64.10.10 netmask 0xff00 broadcast 10.64.10.255
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active

Thanks,

Martin.

 

 


RE: sshd / tcp packet corruption ?

2010-06-22 Thread Martin Minkus
So definitely some kind of packet corruption.

Using netcat to send a single megabyte of binary data to a box with no
known issues (from kinetic -> steel):

kinetic:/tmp$ dd if=/dev/urandom of=random.testfile bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.018347 secs (57152372 bytes/sec)

kinetic:/tmp$ md5 random.testfile
MD5 (random.testfile) = 9be700336ef81e8f89c60422fc795877

kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$

Whilst on steel (a stable Linux box that kinetic is MEANT to be replacing):

ff8a336e2be0c5c645e9f8a2dea67eea  random.testfile
fae5da747c7857d1d87870c05db1f152  random.testfile
a36c7166631ca10c460e323e39071094  random.testfile
50a8f005a772f9321243215d1ea1adb6  random.testfile
5da41b6f475f4655572df8c9bd81e181  random.testfile
3104dd30179bf870e8ec6ef91c34d78f  random.testfile
274a16890cf39c3089d8f0eda253f5fd  random.testfile
e8d0bae998340252c6c67529d520feb4  random.testfile
6d5377ca4545f98a55c017f518567092  random.testfile
6b464f810fe1c2902694a7817f881906  random.testfile
8912007161ececdb3e23a0018af36c36  random.testfile
3f4e17d5a939cd8dfd0941c898c5ac5f  random.testfile
9db926ba5f5f39dddcc0607983ed96f0  random.testfile
835de68b981bf6cb871ebb2ce81404e1  random.testfile
a211a3260d9c8ae595782d254798cacf  random.testfile
030e08f1d3d0fb761046f66c888fdea2  random.testfile

If I reboot kinetic and try one last time:

9be700336ef81e8f89c60422fc795877  random.testfile

Notice that this is now the CORRECT checksum on steel.
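
The same send-and-compare test could also be done without nc. Below is a rough sketch in Python rather than the commands actually run above: receive_digest, send_bytes and summarize are hypothetical names, and the listener is assumed to be a bound, listening TCP socket on the receiving box.

```python
import hashlib
import socket


def receive_digest(listener):
    """Accept one connection, read until EOF, return the MD5 of the bytes."""
    conn, _ = listener.accept()
    digest = hashlib.md5()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def send_bytes(addr, data):
    """Connect to `addr` (host, port), send all of `data`, then close."""
    with socket.create_connection(addr) as conn:
        conn.sendall(data)


def summarize(reference, digests):
    """Count how many received digests match the sender's reference digest."""
    matches = sum(1 for digest in digests if digest == reference)
    return {
        "transfers": len(digests),
        "matches": matches,
        "mismatches": len(digests) - matches,
    }
```

Fed the sender's digest and the list of digests seen on the receiver, summarize() gives a quick count of how many transfers arrived intact, which is easier to track over a few days than eyeballing checksum lists.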

 

Kinetic's Samba, sshd, etc. will play nice for a day or so before
returning to corrupting packets.

So, any ideas? Why would my packets start getting corrupted after a
couple of days' use?

This box just runs isc-dhcpd, openldap-server, samba34, and ZFS (the
real reason it's replacing the Linux box).

Thanks,

Martin.

 


FreeBSD+ZFS+Samba: open_socket_in: Protocol not supported - after a few days?

2010-06-13 Thread Martin Minkus
Samba 3.4 on FreeBSD 8-STABLE branch.

After a few days I start getting weird errors: Windows PCs can't
access the Samba share or have trouble accessing files, and Samba
becomes totally unusable.

Restarting Samba doesn't fix it; only a reboot does.

Accessing files on the ZFS pool locally is fine. Other services (like
dhcpd and the OpenLDAP server) on the box continue to work fine. Only Samba
dies, and by "dies" I mean it can no longer service clients and Windows
brings up bizarre errors. Windows can access our other Samba servers (on
Linux, etc.) just fine.

Kernel:

FreeBSD kinetic.pulse.local 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Wed May 26 18:09:14 NZST 2010 mart...@kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

 

Zpool status:

kinetic:~$ zpool status
  pool: pulse
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        pulse                                         ONLINE       0     0     0
          raidz1                                      ONLINE       0     0     0
            gptid/3baa4ef3-3ef8-0ac0-f110-f61ea23352  ONLINE       0     0     0
            gptid/0eaa8131-828e-6449-b9ba-89ac63729d  ONLINE       0     0     0
            gptid/77a8da7c-8e3c-184c-9893-e0b12b2c60  ONLINE       0     0     0
            gptid/dddb2b48-a498-c1cd-82f2-a2d2feea01  ONLINE       0     0     0

errors: No known data errors
kinetic:~$


log.smb:

[2010/06/10 17:22:39, 0] lib/util_sock.c:902(open_socket_in)
open_socket_in(): socket() call failed: Protocol not supported
[2010/06/10 17:22:39, 0] smbd/server.c:457(smbd_open_one_socket)
smbd_open_once_socket: open_socket_in: Protocol not supported
[2010/06/10 17:22:39, 2] smbd/server.c:676(smbd_parent_loop)
waiting for connections

log.ANYPC:

[2010/06/08 19:55:55, 0] lib/util_sock.c:1491(get_peer_addr_internal)
getpeername failed. Error was Socket is not connected
read_fd_with_timeout: client 0.0.0.0 read error = Socket is not connected.


The code in lib/util_sock.c, around line 902:

/****************************************************************************
 Open a socket of the specified type, port, and address for incoming data.
****************************************************************************/

int open_socket_in(int type,
                uint16_t port,
                int dlevel,
                const struct sockaddr_storage *psock,
                bool rebind)
{
    struct sockaddr_storage sock;
    int res;
    socklen_t slen = sizeof(struct sockaddr_in);

    sock = *psock;

#if defined(HAVE_IPV6)
    if (sock.ss_family == AF_INET6) {
        ((struct sockaddr_in6 *)&sock)->sin6_port = htons(port);
        slen = sizeof(struct sockaddr_in6);
    }
#endif
    if (sock.ss_family == AF_INET) {
        ((struct sockaddr_in *)&sock)->sin_port = htons(port);
    }

    res = socket(sock.ss_family, type, 0 );
    if( res == -1 ) {
        if( DEBUGLVL(0) ) {
            dbgtext( "open_socket_in(): socket() call failed: " );
            dbgtext( "%s\n", strerror( errno ) );
        }

In other words, it looks like something in the kernel is exhausted
(what?). I don't know whether tuning is required or whether this is some
kind of bug.
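
The message in log.smb is simply strerror(errno) after socket() fails, and "Protocol not supported" is the text for EPROTONOSUPPORT. Resource exhaustion usually shows up under other codes such as ENOBUFS, EMFILE or ENFILE. A small Python sketch for printing these on the local system for comparison; it has nothing to do with the Samba code itself, and describe() is a hypothetical helper name:

```python
import errno
import os


def describe(code):
    """Return the symbolic name and C-library message for an errno value."""
    return errno.errorcode[code], os.strerror(code)


# The code seen in log.smb, followed by codes that typically signal exhaustion.
for code in (errno.EPROTONOSUPPORT, errno.ENOBUFS, errno.EMFILE, errno.ENFILE):
    name, message = describe(code)
    print(f"{name}: {message}")
```

If smbd's socket() call really returns EPROTONOSUPPORT for a plain AF_INET or AF_INET6 socket, the address family or type passed in is worth checking before assuming a kernel limit has been hit.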

/boot/loader.conf:

mvs_load=YES
zfs_load=YES
vm.kmem_size=20G

#vfs.zfs.arc_min=512M
#vfs.zfs.arc_max=1536M

vfs.zfs.arc_min=512M
vfs.zfs.arc_max=3072M



I've played with a few sysctl settings (I found these recommendations
online, but they make no difference):


/etc/sysctl.conf:

kern.ipc.maxsockbuf=2097152

net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1452

net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535

net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

Any ideas on what could possibly be going wrong?

 

Any help would be greatly appreciated!

 

Thanks,

Martin




5.3-Stable network issue

2005-02-10 Thread Martin Minkus
I seem to have been having a rather strange networking issue in FreeBSD
5.3-Stable (it started happening immediately after 5.2.1 and has persisted
since. I keep "hoping" that the next time I cvsup it will be fixed, but no).

I downgraded back to 5.2.1-p13 and it is perfectly fine once again.


*** Some background information:

My FreeBSD box is my home NAT router, server, firewall, etc. It does DHCP,
MX for some of my domains, secondary DNS (I have primary elsewhere), Apache
for some web hosting, and so on. Nothing really special. It is a dual
PIII-500 with 512 MB of RAM and a couple of ATA HDDs. It had 3 Realtek
network interfaces, but is down to 2 now.

*** The problem:

Networking simply stops or locks up. Why, I don't know. I believe
initially it happened for all 3 network cards, so I thought TCP/IP processing
or something in the kernel got locked. It happens every 30 minutes to an
hour and lasts about 60 to 120 seconds. Unfortunately, that is long enough
to kill messenger (which my gf does not like), online gaming, etc.

Lately, I took one of the Realtek cards out (it was for a several-km-long
wireless link) and moved the server to my gf's place (where I am now
100% of the time). So now that I have the server locally and rely on it for
my internet connection, this has become a real PAIN.

I've noticed that I can remain ssh'd into diablo and do whatever I want while
this lockup occurs. So the LAN interface, rl0, is fine. The internet
interface, rl1 (which goes to the cable modem), locks up. (By the way, it's
not the cable modem: I am using my gf's now, and it did this at my place on
my cable modem too, which is a different brand. Nortel at my place, Motorola
at my gf's.)

*** Attempts:

I've tried switching out network cards, putting in 3 other Realtek cards:
different brands, all with different revisions (D instead of B, etc.).

No matter what I try, nothing fixes it. The machine seems perfectly
responsive, and I am still ssh'd in and can do whatever I want on it... but
the network card going to the cable modem has stopped responding?!

This never happened from 5.0-Current all the way through 5.2.1-STABLE, but
anywhere beyond 5.2.1 it craps itself.


*** Dmesg output:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p13 #2: Thu Feb 10 18:39:33 CST 2005
[EMAIL PROTECTED]:/junk/obj/junk/src/sys/DIABLO
Preloaded elf kernel /boot/kernel/kernel at 0xc076c000.
MPTable: OEM0 PROD
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Pentium III/Pentium III Xeon/Celeron (504.72-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x673  Stepping = 3
  
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 536870912 (512 MB)
avail memory = 516034560 (492 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Assuming intbase of 0
ioapic0 Version 1.1 irqs 0-23 on motherboard
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 7 entries at 0xc00fdcf0
pcib0: Intel 82443BX (440 BX) host to PCI bridge at pcibus 0 on
motherboard
pci0: PCI bus on pcib0
pci_cfgintr: 0:10 INTA BIOS irq 10
pci_cfgintr: 0:12 INTA BIOS irq 11
agp0: Intel 82443BX (440 BX) host to PCI bridge mem 0xd000-0xd3ff
at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port 0xf000-0xf00f at device 7.1 on
pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0xe000-0xe01f at
device 7.2 on pci0
pci_cfgintr: 0:7 INTD routed to irq 11
usb0: Intel 82371AB/EB (PIIX4) USB controller on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
piix0: PIIX Timecounter port 0x5000-0x500f at device 7.3 on pci0
Timecounter PIIX frequency 3579545 Hz quality 0
pci0: display, VGA at device 8.0 (no driver attached)
rl0: RealTek 8139 10/100BaseTX port 0xe400-0xe4ff mem
0xd700-0xd7ff irq 10 at device 10.0 on pci0
rl0: Ethernet address: 00:00:21:f2:a5:47
miibus0: MII bus on rl0
rlphy0: RealTek internal media interface on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: RealTek 8139 10/100BaseTX port 0xe800-0xe8ff mem
0xd7001000-0xd70010ff irq 11 at device 12.0 on pci0
rl1: Ethernet address: 00:40:f4:90:1c:4b
miibus1: MII bus on rl1
rlphy1: RealTek internal media interface on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0: Option ROMs at iomem 0xc8000-0xcbfff,0xc-0xc7fff