RE: sshd / tcp packet corruption ? ZFS Samba?
Okay guys, just thought I'd post that a resolution has been found.

People suggested it could be hardware and to try memtest, which never found anything. It seems, though, that in the end the issue is the motherboard: possibly the southbridge, or something to do with the PCI bus. The SATA drives, which hang off a Marvell controller in a PCIe slot, were unaffected: no amount of ZFS scrubs and rsync with checksumming found anything wrong. Only network traffic on the Intel PRO/1000 (a PCI card) or the onboard NVIDIA nfe interface had issues. It was worst when using Samba off ZFS, though god knows why that exposed the issue more. I never had any kernel panics, just silent data corruption on the PCI bus.

I moved the HDDs and cards to a different motherboard, and everything is 100% fine. So, a couple of weeks looking at this on and off (and slowly losing my mind), and it was nothing more than flaky hardware.

Thanks for your help to those who took the time to reply.

Martin.

From: Martin Minkus
Sent: Monday, 28 June 2010 09:22
To: freebsd-questions@freebsd.org
Subject: RE: sshd / tcp packet corruption ? ZFS Samba?

Hey all,

It was suggested I do a memtest, but that checked out fine. (I wish it was as simple as just the RAM!)

I've realised the issue manifests itself almost immediately when accessing an underlying ZFS filesystem using Samba. But if it is UFS, it is fine. Does this mean anything to anyone?
I.e., md5'ing the same file over SMB, one copy on UFS (/tmp) and one on ZFS:

cd5d0011c28fb335d57a83b3751831e7 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
bb433ae7e4c3c70c49b3c8c1590e8aa5 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
8eeaf672f6742ae4f900b16ec3cb190a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
bc327dc715516b5ba2e8478036112bd2 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
0cde0cf7ec036cedc8f3294153209b4c *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
71e705470a4af5533eb019e00df3a946 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ba7041e4cad852d00c8da1a461e3b5f9 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
7ce9ea8b9a4d8858899da23472a24c76 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
8f0eff7cb6069ff39aa46e2affc27a4b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c23fceb0302fd59b49e22bce61eabe8d *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
46c9d538c99be3947b92f9ec47bb900a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
2a2a94c94a167a8e525e368aceb07875 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
d303861d09b0584f6c6621e9881e3f63 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ad8f8cef1829de206460b947687909f0 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9a866d9602a9df92b6acb6f1182b05ab *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
5552491a9e295890ad48064440d8d05b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
ceee04c26b03132db48d67c076526c82 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
7aa666918d73e40a25ccdb1c104f8476 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
561aa772884c0b7ef139f556355adffb *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
30540ecb4bfb8533969f4a4137a77e79 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c0f315f00be76a4e15dec68de2bba49b *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
9de4864a97ed4ad9c495c221fe1b932f *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
47c8ad183dbe0d4637229af08cc2cd89 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
c9bfe8c7073940acbcdb31430eb4a061 *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022 *//kinetic/temp/Desktop.exe
327605a6ddb89f7a3e2bd056c5f28b2a *//kinetic/pulse/shares/cti/bin/Desktop.exe
2447bdb56c5fa8efa761ffa100908022
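The md5 listing above is in effect a repeated-read consistency test: the UFS copy hashes identically every time, while the ZFS-backed copy never repeats. A minimal C sketch of the same test follows, assuming only standard stdio. It substitutes 64-bit FNV-1a for MD5 purely because FNV-1a fits in a few lines (it is adequate for spotting accidental corruption but is not a cryptographic hash); fnv1a64_update, hash_file and count_mismatched_reads are illustrative names, not part of any existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a parameters (offset basis and prime). */
#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Fold len bytes into a running FNV-1a state h and return the new state. */
static uint64_t fnv1a64_update(uint64_t h, const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hash the whole file at path into *out. Returns 0 on success, -1 on error. */
static int hash_file(const char *path, uint64_t *out)
{
    unsigned char buf[65536];
    uint64_t h = FNV64_OFFSET;
    size_t n;
    int err;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        h = fnv1a64_update(h, buf, n);
    err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = h;
    return 0;
}

/* Read path `rounds` times and count the reads whose hash differs from the
 * first read. 0 means every read returned identical bytes; -1 means an I/O
 * error occurred. */
static int count_mismatched_reads(const char *path, int rounds)
{
    uint64_t first, h;
    int mismatches = 0;

    assert(rounds > 0);
    if (hash_file(path, &first) != 0)
        return -1;
    for (int i = 1; i < rounds; i++) {
        if (hash_file(path, &h) != 0)
            return -1;
        if (h != first)
            mismatches++;
    }
    return mismatches;
}
```

Pointed at a file reached through the share (for example from a Unix client that has the share mounted), a nonzero count would reproduce the symptom in the listing without needing md5 on the client; pointed at a local file on the server, it should return 0.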
RE: sshd / tcp packet corruption ? ZFS Samba?
on usbus0
kbd2 at ukbd0
uhid0: CHICONY Compaq USB Keyboard, class 0/0, rev 1.10/1.05, addr 2 on usbus0
em0: link state changed to UP
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add vfs.zfs.prefetch_disable=0 to /boot/loader.conf.
ZFS filesystem version 3
ZFS storage pool version 14
kinetic:~#

I've since removed everything from /etc/sysctl.conf and /boot/loader.conf, so no tuning is in use. I've also been fiddling with smb.conf and trying all sorts of different things there. None of it makes any difference.

I am at a complete loss as to what is going on here. Should I just give up? Is there some obscure ZFS+Samba issue on FreeBSD?

Thanks,
Martin.
RE: sshd / tcp packet corruption ?
Thanks for the reply.

I actually posted a response to this original message with more details, showing that plain raw TCP data sent from one box to another is getting corrupted. The culprit is definitely kinetic. Furthermore, I've determined that both NICs are doing it.

kinetic:~# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500 <Link#1>      00:0e:0c:6b:d6:d3       49     0     0   190062     0     0
em0    1500 10.64.10.0    kinetic             198516     -     -   189315     -     -
nfe0   1500 <Link#2>      00:24:1d:15:11:48    17932     0     0      219     0     0
nfe0   1500 10.64.11.0    10.64.11.253         12675     -     -      217     -     -
plip0  1500 <Link#3>                               0     0     0        0     0     0
lo0   16384 <Link#4>                             592     0     0      592     0     0
lo0   16384 fe80:4::1     fe80:4::1                0     -     -        0     -     -
lo0   16384 localhost     ::1                      0     -     -        0     -     -
lo0   16384 your-net      localhost              552     -     -      592     -     -
kinetic:~#

Perhaps it is RAM, though; good point. I'll do a memtest.

Martin.

-----Original Message-----
From: Lowell Gilbert [mailto:freebsd-questions-lo...@be-well.ilk.org]
Sent: Thursday, 24 June 2010 09:41
To: Martin Minkus
Cc: freebsd-questions
Subject: Re: sshd / tcp packet corruption ?

Martin Minkus <martin.min...@punz.co.nz> writes:

> It seems this issue I reported below may actually be related to some kind of TCP packet corruption ?

Possible. Or memory errors. Hard to say much at this point, when you don't even know which side is actually causing the errors.

> Still same box.
>
> I've noticed my SSH connections into the box will die randomly, with errors.
>
> Sshd logs the following on the box itself:
>
> Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.

You might find more useful information by getting verbose messages from the other end. I don't have time to check this in detail, but if I recall correctly, that message means that the other side closed the connection based on an apparent invalid header type in a packet that 'kinetic' received. Random corruption isn't likely in that case, because the error is always in the same place in the packet.

Check the 'netstat -i' numbers to see if the drivers are picking up any packet errors.

It's hard to debug network problems in ssh, though, because (obviously) you can't tell in general whether packet data is corrupt. If you can set up a test case with, say, UDP echo, that would make it easier to see the damage to the packets if they are, in fact, being corrupted.

Unfortunately, I'm so used to having sophisticated test equipment in the lab to look at these kinds of problems that I'm probably missing what would be obvious to someone who deals with problems in the field. Hope I've been somewhat helpful anyway.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
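Lowell's suggestion to check the 'netstat -i' counters is easy to automate. In Martin's output above every Ierrs and Oerrs value is 0, i.e. neither driver counted a bad frame. The C sketch below parses one line of 'netstat -i' and flags rows with nonzero error counters; it assumes the FreeBSD column order shown in that output, and parse_netstat_i_line, has_errors and struct if_counters are hypothetical names. On FreeBSD the same counters are also reachable programmatically through getifaddrs(3), whose AF_LINK entries point at a struct if_data; text parsing is used here only to keep the sketch portable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counters from one 'netstat -i' line, assuming the column order
 * Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll.
 * A value of -1 means the column held "-" (not reported for that row). */
struct if_counters {
    char name[16];
    long ipkts, ierrs, idrop, opkts, oerrs, coll;
};

/* Convert one counter token, mapping "-" to -1. */
static long parse_count(const char *tok)
{
    if (strcmp(tok, "-") == 0)
        return -1;
    return strtol(tok, NULL, 10);
}

/* Parse one data line (not the header). The Address column can be empty
 * (e.g. plip0), so the six counters are taken from the END of the line
 * rather than by fixed position. Returns 0 on success, -1 if too short. */
static int parse_netstat_i_line(const char *line, struct if_counters *out)
{
    char copy[256];
    char *tok[16];
    int n = 0;

    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    for (char *t = strtok(copy, " \t\n"); t != NULL && n < 16;
         t = strtok(NULL, " \t\n"))
        tok[n++] = t;
    if (n < 8)          /* name + mtu + the six counters, at minimum */
        return -1;

    strncpy(out->name, tok[0], sizeof out->name - 1);
    out->name[sizeof out->name - 1] = '\0';
    out->ipkts = parse_count(tok[n - 6]);
    out->ierrs = parse_count(tok[n - 5]);
    out->idrop = parse_count(tok[n - 4]);
    out->opkts = parse_count(tok[n - 3]);
    out->oerrs = parse_count(tok[n - 2]);
    out->coll  = parse_count(tok[n - 1]);
    return 0;
}

/* Nonzero if the driver reported any input or output errors on this row. */
static int has_errors(const struct if_counters *c)
{
    return c->ierrs > 0 || c->oerrs > 0;
}
```

Rows whose Network column is an address rather than <Link#n> report "-" for the error columns, which the sketch stores as -1 and does not treat as an error; only the <Link#n> rows carry the driver's counts.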
sshd / tcp packet corruption ?
It seems this issue I reported below may actually be related to some kind of TCP packet corruption?

Still same box.

I've noticed my SSH connections into the box will die randomly, with errors.

sshd logs the following on the box itself:

Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:15:41 kinetic sshd[15746]: Accepted publickey for martinm from 10.64.10.251 port 56469 ssh2
Jun 18 11:15:58 kinetic su: nss_ldap: could not get LDAP result - Can't contact LDAP server
Jun 18 11:15:58 kinetic su: martinm to root on /dev/pts/0
Jun 18 11:16:06 kinetic su: martinm to root on /dev/pts/1
Jun 18 11:16:29 kinetic sshd[15748]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:30 kinetic sshd[15746]: syslogin_perform_logout: logout() returned an error
Jun 18 11:16:34 kinetic sshd[16511]: Accepted publickey for martinm from 10.64.10.251 port 56470 ssh2
Jun 18 11:16:41 kinetic sshd[16513]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:41 kinetic sshd[16511]: syslogin_perform_logout: logout() returned an error
Jun 23 15:52:59 kinetic sshd[56974]: Received disconnect from 10.64.10.209: 5: Message Authentication Code did not verify (packet #75658). Data integrity has been compromised.
Jun 23 15:53:12 kinetic sshd[57109]: Accepted publickey for martinm from 10.64.10.209 port 9494 ssh2
Jun 23 15:53:38 kinetic su: martinm to root on /dev/pts/3
Jun 23 15:56:36 kinetic sshd[57111]: Received disconnect from 10.64.10.209: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 23 15:56:44 kinetic sshd[57151]: Accepted publickey for martinm from 10.64.10.209 port 9534 ssh2

My googlefu has failed me on this.
Any ideas what on earth this could be? Ethernet card?

em0: Intel(R) PRO/1000 Legacy Network Connection 1.0.1 port 0xcc00-0xcc3f mem 0xfdfe-0xfdff,0xfdfc-0xfdfd irq 17 at device 7.0 on pci1
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:6b:d6:d3

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:0e:0c:6b:d6:d3
        inet 10.64.10.10 netmask 0xff00 broadcast 10.64.10.255
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active

Thanks,
Martin.
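The em0 options line in the message above lists RXCSUM and TXCSUM, i.e. checksum offload: the card itself computes and verifies the IP/TCP/UDP checksums. For reference, that checksum is the 16-bit ones'-complement sum defined in RFC 1071, sketched in C below (inet_checksum is an illustrative name, not a FreeBSD kernel function). One plausible reading of this thread, not confirmed by anyone in it, is that with offload enabled, bytes damaged on a faulty PCI bus after the card has verified them on receive, or on the way to the card before it computes the checksum on transmit, are never checked by TCP at all. That would leave SSH's MAC ("Message Authentication Code did not verify") and the md5 comparisons as the first things to notice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): ones'-complement sum of 16-bit big-endian
 * words, an odd trailing byte padded with a zero low byte, then the
 * complement of the folded sum. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;   /* 64-bit accumulator: no overflow for any sane len */
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint64_t)data[i] << 8) | data[i + 1];
    if (i < len)                        /* odd length: pad with zero */
        sum += (uint64_t)data[i] << 8;
    while (sum >> 16)                   /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Because the sum is order-independent, it cannot detect two aligned 16-bit words trading places: RFC 1071's example bytes 00 01 f2 03 f4 f5 f6 f7 give 0x220d, and so do f2 03 00 01 f4 f5 f6 f7.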
RE: sshd / tcp packet corruption ?
So it's definitely some kind of packet corruption. Using netcat to send a single megabyte of binary data to a box with no known issues (from kinetic to steel):

kinetic:/tmp$ dd if=/dev/urandom of=random.testfile bs=1k count=1k
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.018347 secs (57152372 bytes/sec)
kinetic:/tmp$ md5 random.testfile
MD5 (random.testfile) = 9be700336ef81e8f89c60422fc795877
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$ nc steel 1234 -v -O 4096 < random.testfile
Connection to steel 1234 port [tcp/*] succeeded!
kinetic:/tmp$

Whilst on steel (a stable Linux box that kinetic is MEANT to be replacing):

ff8a336e2be0c5c645e9f8a2dea67eea random.testfile
fae5da747c7857d1d87870c05db1f152 random.testfile
a36c7166631ca10c460e323e39071094 random.testfile
50a8f005a772f9321243215d1ea1adb6 random.testfile
5da41b6f475f4655572df8c9bd81e181 random.testfile
3104dd30179bf870e8ec6ef91c34d78f random.testfile
274a16890cf39c3089d8f0eda253f5fd random.testfile
e8d0bae998340252c6c67529d520feb4 random.testfile
6d5377ca4545f98a55c017f518567092 random.testfile
6b464f810fe1c2902694a7817f881906 random.testfile
8912007161ececdb3e23a0018af36c36 random.testfile
3f4e17d5a939cd8dfd0941c898c5ac5f random.testfile
9db926ba5f5f39dddcc0607983ed96f0 random.testfile
835de68b981bf6cb871ebb2ce81404e1 random.testfile
a211a3260d9c8ae595782d254798cacf random.testfile
030e08f1d3d0fb761046f66c888fdea2 random.testfile

If I reboot kinetic and try one last time:

9be700336ef81e8f89c60422fc795877 random.testfile

Notice that is now the CORRECT checksum on steel.

Kinetic's Samba, sshd, etc. will play nice for a day or so before returning to corrupting packets.

So, any ideas? Why would my packets start getting corrupted after a couple of days' use? This box just runs isc-dhcpd, openldap-server, samba34, and ZFS (the real reason it's replacing the Linux box).

Thanks,
Martin.
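A variant of the netcat test above can show where the damage lands rather than only that the md5 changed: send a deterministic pseudo-random pattern instead of /dev/urandom data, so the receiving side can regenerate the expected bytes and report the offset of every mismatch. That also tests Lowell's remark that an error always in the same place in a packet argues against random corruption. The C sketch below assumes both ends agree on a nonzero 32-bit seed and uses Marsaglia's xorshift32 generator; fill_pattern and check_pattern are hypothetical helpers, and the socket (or nc) plumbing is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marsaglia xorshift32 step; the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Sender side: fill buf with the byte stream derived from seed. */
static void fill_pattern(uint8_t *buf, size_t len, uint32_t seed)
{
    uint32_t s = seed;

    assert(seed != 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)xorshift32(&s);
}

/* Receiver side: regenerate the stream for seed and compare. Returns the
 * number of corrupted bytes; *first_bad receives the offset of the first
 * one, or len if the buffer matched completely. */
static size_t check_pattern(const uint8_t *buf, size_t len, uint32_t seed,
                            size_t *first_bad)
{
    uint32_t s = seed;
    size_t bad = 0;

    assert(seed != 0);
    *first_bad = len;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)xorshift32(&s)) {
            if (bad == 0)
                *first_bad = i;
            bad++;
        }
    }
    return bad;
}
```

Logging the bad offsets across several transfers would show whether corrupted bytes cluster at a fixed position or are scattered, which a whole-file md5 cannot tell you.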
FreeBSD+ZFS+Samba: open_socket_in: Protocol not supported - after a few days?
Samba 3.4 on FreeBSD 8-STABLE branch.

After a few days I start getting weird errors: Windows PCs can't access the Samba share, have trouble accessing files, etc., and Samba becomes totally unusable. Restarting Samba doesn't fix it; only a reboot does.

Accessing files on the ZFS pool locally is fine. Other services on the box (like dhcpd and the openldap server) continue to work fine. Only Samba dies, and by dies I mean it can no longer service clients and Windows brings up bizarre errors. Windows can access our other Samba servers (on Linux, etc.) just fine.

Kernel:
FreeBSD kinetic.pulse.local 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Wed May 26 18:09:14 NZST 2010 mart...@kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

Zpool status:
kinetic:~$ zpool status
  pool: pulse
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        pulse                                         ONLINE       0     0     0
          raidz1                                      ONLINE       0     0     0
            gptid/3baa4ef3-3ef8-0ac0-f110-f61ea23352  ONLINE       0     0     0
            gptid/0eaa8131-828e-6449-b9ba-89ac63729d  ONLINE       0     0     0
            gptid/77a8da7c-8e3c-184c-9893-e0b12b2c60  ONLINE       0     0     0
            gptid/dddb2b48-a498-c1cd-82f2-a2d2feea01  ONLINE       0     0     0

errors: No known data errors
kinetic:~$

log.smb:
[2010/06/10 17:22:39, 0] lib/util_sock.c:902(open_socket_in)
  open_socket_in(): socket() call failed: Protocol not supported
[2010/06/10 17:22:39, 0] smbd/server.c:457(smbd_open_one_socket)
  smbd_open_once_socket: open_socket_in: Protocol not supported
[2010/06/10 17:22:39, 2] smbd/server.c:676(smbd_parent_loop)
  waiting for connections

log.ANYPC:
[2010/06/08 19:55:55, 0] lib/util_sock.c:1491(get_peer_addr_internal)
  getpeername failed. Error was Socket is not connected
  read_fd_with_timeout: client 0.0.0.0 read error = Socket is not connected.

The code in lib/util_sock.c, around line 902:

/****************************************************************************
 Open a socket of the specified type, port, and address for incoming data.
****************************************************************************/

int open_socket_in(int type,
        uint16_t port,
        int dlevel,
        const struct sockaddr_storage *psock,
        bool rebind)
{
    struct sockaddr_storage sock;
    int res;
    socklen_t slen = sizeof(struct sockaddr_in);

    sock = *psock;

#if defined(HAVE_IPV6)
    if (sock.ss_family == AF_INET6) {
        ((struct sockaddr_in6 *)&sock)->sin6_port = htons(port);
        slen = sizeof(struct sockaddr_in6);
    }
#endif
    if (sock.ss_family == AF_INET) {
        ((struct sockaddr_in *)&sock)->sin_port = htons(port);
    }

    res = socket(sock.ss_family, type, 0 );
    if( res == -1 ) {
        if( DEBUGLVL(0) ) {
            dbgtext( "open_socket_in(): socket() call failed: " );
            dbgtext( "%s\n", strerror( errno ) );
        }

In other words, it looks like something in the kernel is exhausted (what?). I don't know whether tuning is required, or whether this is some kind of bug.

/boot/loader.conf:
mvs_load=YES
zfs_load=YES
vm.kmem_size=20G
#vfs.zfs.arc_min=512M
#vfs.zfs.arc_max=1536M
vfs.zfs.arc_min=512M
vfs.zfs.arc_max=3072M

I've played with a few sysctl settings (I found these recommendations online, but they make no difference):

/etc/sysctl.conf:
kern.ipc.maxsockbuf=2097152
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1452
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

Any ideas on what could possibly be going wrong?

Any help would be greatly appreciated!

Thanks,
Martin
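Samba's log line is simply strerror(errno) after socket() fails, and "Protocol not supported" is the message text for EPROTONOSUPPORT. One way to tell whether the kernel itself has stopped handing out sockets, independently of smbd, is to repeat the call open_socket_in() makes, socket(family, type, 0), from a separate small program while Samba is failing. The C sketch below is a hypothetical standalone probe, not part of Samba; probe_socket and report are illustrative names, and the probe only creates and closes a socket (it does not bind or listen).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create and immediately close a socket the way open_socket_in() does:
 * socket(family, type, 0). Returns 0 on success, else socket()'s errno. */
static int probe_socket(int family, int type)
{
    int fd = socket(family, type, 0);

    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}

/* Print one line per probe, either "ok" or the strerror() text. */
static void report(const char *label, int family, int type)
{
    int err;

    assert(label != NULL);
    err = probe_socket(family, type);
    printf("%s: %s\n", label, err == 0 ? "ok" : strerror(err));
}
```

A call such as report("AF_INET6/SOCK_STREAM", AF_INET6, SOCK_STREAM) prints either "ok" or the error text, which can help separate a kernel-level failure from a Samba configuration problem.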
5.3-Stable network issue
I seem to have been having a rather strange networking issue in FreeBSD 5.3-STABLE (it started happening immediately after 5.2.1 and has persisted since; I keep "hoping" that next time I cvsup it will be fixed, but no). I downgraded back to 5.2.1-p13 and it is perfectly fine once again.

*** Some background information:

My FreeBSD box is my home NAT router, server, firewall, etc. It does DHCP, MX for some of my domains, secondary DNS (I have the primary elsewhere), Apache for some web hosting, blah blah blah. Nothing really special. It is a dual PIII-500 with 512 MB RAM and a couple of ATA HDDs. It had 3 RealTek network interfaces, but is down to 2 now.

*** The problem:

Networking simply stops or locks up. Why, I don't know. I believe initially it happened for all 3 network cards... I thought TCP/IP processing or something in the kernel got locked. It happens every 30 minutes to an hour, and lasts about 60 to 120 seconds. Unfortunately, 60 to 120 seconds is long enough to kill Messenger (my gf does not like that), online gaming, etc.

Lately, I took one of the RealTek cards out (it was for a wireless link several km long) and moved the server to my gf's place (where I am now 100% of the time). So now that I have the server locally and rely on it for my internet connection, this has become a real PAIN.

I've noticed that I can remain ssh'd into diablo and do whatever I want while this lockup occurs. So the LAN interface, rl0, is fine. The internet interface, rl1 (which goes to the cable modem), locks up. (BTW, it's not the cable modem: I am using my gf's now, and it did this at my place on my cable modem too, which is a different brand: Nortel at my place, Motorola at my gf's.)

*** Attempts:

I've tried switching out network cards, and put 3 other RealTek cards in: different brands, all with different revisions (D instead of B, etc.). No matter what I try, nothing fixes it.
The machine seems perfectly responsive, and I am still ssh'd in and can do whatever I want on it... but the network card going to the cable modem has stopped responding?! This never happened from 5.0-CURRENT all the way through 5.2.1-STABLE, but anything beyond 5.2.1 craps itself.

*** Dmesg output:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.2.1-RELEASE-p13 #2: Thu Feb 10 18:39:33 CST 2005
    [EMAIL PROTECTED]:/junk/obj/junk/src/sys/DIABLO
Preloaded elf kernel /boot/kernel/kernel at 0xc076c000.
MPTable: OEM0 PROD
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Pentium III/Pentium III Xeon/Celeron (504.72-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x673  Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
  CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 536870912 (512 MB)
avail memory = 516034560 (492 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Assuming intbase of 0
ioapic0 Version 1.1 irqs 0-23 on motherboard
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 7 entries at 0xc00fdcf0
pcib0: Intel 82443BX (440 BX) host to PCI bridge at pcibus 0 on motherboard
pci0: PCI bus on pcib0
pci_cfgintr: 0:10 INTA BIOS irq 10
pci_cfgintr: 0:12 INTA BIOS irq 11
agp0: Intel 82443BX (440 BX) host to PCI bridge mem 0xd000-0xd3ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0xe000-0xe01f at device 7.2 on pci0
pci_cfgintr: 0:7 INTD routed to irq 11
usb0: Intel 82371AB/EB (PIIX4) USB controller on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
piix0: PIIX Timecounter port 0x5000-0x500f at device 7.3 on pci0
Timecounter PIIX frequency 3579545 Hz quality 0
pci0: display, VGA at device 8.0 (no driver attached)
rl0: RealTek 8139 10/100BaseTX port 0xe400-0xe4ff mem 0xd700-0xd7ff irq 10 at device 10.0 on pci0
rl0: Ethernet address: 00:00:21:f2:a5:47
miibus0: MII bus on rl0
rlphy0: RealTek internal media interface on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: RealTek 8139 10/100BaseTX port 0xe800-0xe8ff mem 0xd7001000-0xd70010ff irq 11 at device 12.0 on pci0
rl1: Ethernet address: 00:40:f4:90:1c:4b
miibus1: MII bus on rl1
rlphy1: RealTek internal media interface on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0: Option ROMs at iomem 0xc8000-0xcbfff,0xc-0xc7fff
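To put numbers on "every 30 minutes to an hour, lasting 60 to 120 seconds", one option is to probe the upstream side (for example, a ping to the gateway beyond rl1) at a fixed interval, log the timestamps of successful replies, and look for gaps. The C function below is a minimal sketch of the gap-finding half only, assuming the timestamps are Unix seconds in ascending order; the probe itself is not shown, and find_outages and struct outage are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One detected stall, bounded by the surrounding successful probes. */
struct outage {
    long start;   /* last successful probe before the gap */
    long end;     /* first successful probe after the gap */
};

/* Scan ascending timestamps (seconds) of successful probes and record
 * every gap longer than `threshold` seconds. Returns the total number of
 * outages found; at most `max_out` of them are stored in `out`. */
static size_t find_outages(const long *ts, size_t n, long threshold,
                           struct outage *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 1; i < n; i++) {
        assert(ts[i] >= ts[i - 1]);      /* input must be sorted */
        if (ts[i] - ts[i - 1] > threshold) {
            if (found < max_out) {
                out[found].start = ts[i - 1];
                out[found].end = ts[i];
            }
            found++;
        }
    }
    return found;
}
```

With a probe every few seconds and a threshold of, say, 30 seconds, the 60 to 120 second stalls should stand out, and the spacing between successive start values gives the period. Probing through rl0 and rl1 separately would also confirm that only the cable-modem interface stalls.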