Re: [POSSIBLE BUG] 10-STABLE CARP erroneously becomes master on boot
On 21 August 2015 at 09:06, Eugene M. Zheganin e...@norma.perm.ru wrote:

Hi.

On 20.08.2015 14:51, Damien Fleuriot wrote:

Hello list,

We've managed to find the source of the bug, if it is indeed a bug.

It all comes down to the order in which the IP addresses are assigned to the interface from /etc/rc.conf.

When using the following syntax, the physical IP address is configured AFTER the CARPs on the interface, which results in the CARP advertisements being sourced from the CARP IP, triggering the double MASTER situation:

    ipv4_addrs_int="1.2.3.4/24"
    ifconfig_int_alias0="1.2.3.6/32 vhid 1 pass test advskew 20"

When using either of the following syntaxes, the physical IP address is configured BEFORE the CARPs, which results in the CARP advertisements being sourced from the physical IP and restores normal functionality:

    ifconfig_int="inet 1.2.3.4/24"
    ifconfig_int_alias0="1.2.3.6/32 vhid 1 pass test advskew 20"

OR

    ifconfig_int_alias0="1.2.3.4/24"
    ifconfig_int_alias1="1.2.3.6/32 vhid 1 pass test advskew 20"

It has been there since carp-ng was committed to 10-CURRENT 2 years ago. The thing is, carp-ng doesn't need a non-carp address on an interface anymore; both nodes can work fine using only the shared address. This isn't comfortable in lots of cases, but still. Thus, the kernel sends carp advertisements from the primary address on the interface (which is normal behavior for any known network stack), and for FreeBSD that primary address has always been the first address on an interface for a given AF. Thus, the cause of your split-brain carp situation definitely lies somewhere else. I've been running carp on FreeBSD for years, including the legacy one; if there is a bug, the situation you are describing probably isn't one.

Eugene, agree to disagree here.

I've also been using CARP for years, both legacy and carp-ng, and while I'm not an expert on its inner workings I do understand how it operates.

What you describe WRT the network stack and sourcing the advertisements is correct, I'll give you that.

The problem lies not with CARP but with how the IP addresses are assigned by rc.conf.

It is abnormal that the CARP addresses should be set up first when using the ipv4_addrs_int syntax. Therein lies the problem.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
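The ordering effect discussed in this thread can be modelled in a few lines. This is a toy model, not FreeBSD code: it assumes, as Eugene states, that the primary address is simply the first address configured on the interface for the address family and that advertisements are sourced from it. The function name and the list representation are hypothetical.

```python
def advertisement_source(addresses_in_config_order):
    """Return the address a toy stack would source CARP advertisements from.

    Assumption taken from the thread: the primary address is the first
    address configured on the interface for the address family.
    """
    if not addresses_in_config_order:
        raise ValueError("interface has no addresses")
    return addresses_in_config_order[0]


PHYSICAL = "1.2.3.4/24"
CARP_VIP = "1.2.3.6/32"

# Order reported for ipv4_addrs_int plus ifconfig_int_alias0: CARP first.
print(advertisement_source([CARP_VIP, PHYSICAL]))
# Order reported for ifconfig_int, or for alias0/alias1: physical first.
print(advertisement_source([PHYSICAL, CARP_VIP]))
```

Under that assumption the only thing that changes between the rc.conf variants is which entry ends up first in the list, which is the point Damien is making about assignment order.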
Re: Will 10.2 also ship with a very stale NTP?
On Fri, 2015-08-21 at 08:51 +0200, Harald Schmalzbauer wrote:

Regarding Ian Lepore's message of 21.08.2015 00:34 (localtime):

On Fri, 2015-07-24 at 15:19 +0200, Harald Schmalzbauer wrote:

Regarding Ian Lepore's message of 12.07.2015 17:41 (localtime):

And let's all just hope that a week or two of testing is enough when jumping a major piece of software forward several years in its independent evolution. …

I wonder how many other such things could be lurking in 4.2.8, waiting to be triggered by other peoples' non-stock configurations? We've …

I'd like to report one, most likely an upstream problem: 'restrict' definitions in ntp.conf(5) no longer work with unqualified DNS names. A line like

    restrict time1 nomodify nopeer noquery notrap

results in:

    ntpd[1913]: line 7 column 7 syntax error, unexpected T_Time1
    ntpd[1913]: syntax error in /etc/ntp.conf line 7, column 7

I've always been using unqualified hostnames with 'restrict', and since defining 'server' with an unqualified hostname still works, this seems to be a significant bug to me. People are forced to change 'restrict' definitions, but not to also change other unqualified definitions, which potentially leads to misconfigurations, since intentionally matching definitions can now differ easily.

Has anybody already noticed this problem? And any idea if upstream is aware?

I had a quick look at this today. It appears that the problem isn't unqualified names exactly, but rather that an unqualified name that exactly matches an ntp.conf keyword will be mistaken by the ntpd config parser for a misplaced keyword token. So most unqualified names should work, but there are about 200 words that won't, many of them very sensible names for ntp servers such as ntp and time1 and time2.

When I look at the ntp_parser.y grammar file it's not clear to me why 'server time1' works and 'restrict time1' doesn't. I couldn't find any way to trick it into taking a keyword as a hostname following restrict (like using quotes).

Thank you very much!

This is very interesting and exactly matches my tested host names. I wish I had better C skills to find such things myself. Out of curiosity: how long did it take to find the ntp_parser.y route? (And with what "IDE"? I'm stuck with vim.)

One additional observation was that the reserved-name collision only happens with CNAME records. I hope I'll find some time to actually look into the sources, which I didn't do at first because of my lousy C skills :-( But that's the place to find hints :-)

Thanks,

I started out pretty sure what I was going to discover, based on the error you reported: "syntax error, unexpected T_Time1". That 'T_Time1' just said to me "that's a yacc/bison token constant, this is going to be an error in their grammar (.y) file." The tricky part is that the .y file isn't in the base source code; I had to go find it in the vendor branch.

I don't think the CNAME part matters. I tried changing my 'ntp' CNAME to a regular A record and the error still happens if I use it as an unqualified name with restrict.

The IDE I use is SlickEdit, running on FreeBSD under the linuxulator. It's a commercial product worth every penny I've paid for various versions since the 90s. It gets the credit for a lot of my productivity.

-- Ian
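Until the parser behaviour is sorted out, a configuration could be checked for names that would hit this collision. The following is only a sketch under stated assumptions: it is not ntpd's scanner, the helper name is hypothetical, and KEYWORDS is a small illustrative subset (the thread mentions about 200 keywords and names ntp, time1 and time2 among them).

```python
# Hypothetical subset of ntp.conf keywords; the thread says there are
# about 200 and names ntp, time1 and time2 among them.
KEYWORDS = {
    "server", "restrict", "nomodify", "nopeer", "noquery", "notrap",
    "ntp", "time1", "time2",
}


def colliding_hosts(config_text, keywords=KEYWORDS):
    """Return (line_number, name) pairs where the address argument of a
    'restrict' line is spelled exactly like a keyword."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "restrict" and fields[1] in keywords:
            hits.append((lineno, fields[1]))
    return hits


sample = """\
server time1
restrict time1 nomodify nopeer noquery notrap
restrict timehost.example.org nomodify
"""
print(colliding_hosts(sample))
```

Only 'restrict' lines are checked because, per the reports above, 'server time1' still works; the fully qualified name in the last sample line is a made-up host used to show a non-colliding entry.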
Re: ix(intel) vs mlxen(mellanox) 10Gb performance
Yonghyeon PYUN wrote:
On Wed, Aug 19, 2015 at 09:00:35AM -0400, Rick Macklem wrote:
Hans Petter Selasky wrote:
On 08/19/15 09:42, Yonghyeon PYUN wrote:
On Wed, Aug 19, 2015 at 09:00:52AM +0200, Hans Petter Selasky wrote:
On 08/18/15 23:54, Rick Macklem wrote:

Ouch! Yes, I now see that the code that counts the # of mbufs is before the code that adds the tcp/ip header mbuf.

In my opinion, this should be fixed by setting if_hw_tsomaxsegcount to whatever the driver provides - 1. It is not the driver's responsibility to know if a tcp/ip header mbuf will be added, and this is a lot less confusing than expecting the driver author to know to subtract one. (I had mistakenly thought that tcp_output() had added the tcp/ip header mbuf before the loop that counts mbufs in the list. Btw, this tcp/ip header mbuf also has leading space for the MAC layer header.)

Hi Rick,

Your question is good. With the Mellanox hardware we have separate so-called inline data space for the TCP/IP headers, so if the TCP stack subtracts something, then we would need to add something to the limit, because then the scatter gather list is only used for the data part.

I think all drivers in tree don't subtract 1 for if_hw_tsomaxsegcount. Probably touching the Mellanox driver would be simpler than fixing all other drivers in tree.

Maybe it can be controlled by some kind of flag, if all the three TSO limits should include the TCP/IP/ethernet headers too. I'm pretty sure we want both versions.

Hmm, I'm afraid it's already complex. Drivers have to tell almost the same information to both bus_dma(9) and the network stack.

Don't forget that not all drivers in the tree set the TSO limits before if_attach(), so possibly the subtraction of one TSO fragment needs to go into ip_output().

Ok, I realized that some drivers may not know the answers before ether_ifattach(), due to the way they are configured/written (I saw the use of if_hw_tsomax_update() in the patch).

I was not able to find an interface that configures TSO parameters after the if_t conversion. I'm under the impression that if_hw_tsomax_update() is not designed to be used this way. Probably we need a better one? (CCed to Gleb.)

If it is subtracted as a part of the assignment to if_hw_tsomaxsegcount at line #791 of tcp_output(), like the following, I don't think it should matter if the values are set before ether_ifattach()?

    /*
     * Subtract 1 for the tcp/ip header mbuf that
     * will be prepended to the mbuf chain in this
     * function in the code below this block.
     */
    if_hw_tsomaxsegcount = tp->t_tsomaxsegcount - 1;

I don't have a good solution for the case where a driver doesn't plan on using the tcp/ip header provided by tcp_output(), except to say the driver can add one to the setting to compensate for that (and if they fail to do so, it still works, although somewhat suboptimally).

When I now read the comment in sys/net/if_var.h it is clear what it means, but for some reason I didn't read it that way before? (I think it was the part that said the driver didn't have to subtract for the headers that confused me?) In any case, we need to try and come up with a clear definition of what they need to be set to.

I can now think of two ways to deal with this:

1 - Leave tcp_output() as is, but provide a macro for the device driver authors to use that sets if_hw_tsomaxsegcount with a flag for "driver uses tcp/ip header mbuf", documenting that this flag should normally be true.

OR

2 - Change tcp_output() as above, noting that this is a workaround for confusion w.r.t. whether or not if_hw_tsomaxsegcount should include the tcp/ip header mbuf, and update the comment in if_var.h to reflect this. Then drivers that don't use the tcp/ip header mbuf can increase their value for if_hw_tsomaxsegcount by 1. (The comment should also mention that a value of 35 or greater is much preferred to 32 if the hardware will support that.)

Both work for me. My preference is 2, just because it is very common for drivers to use the tcp/ip header mbuf.

Thanks for this comment. I tend to agree, both for the reason you state and also because the patch is simple enough that it might qualify as an errata for 10.2.

I am hoping Daniel Braniss will be able to test the patch and let us know if it improves performance with TSO enabled?

rick
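The bookkeeping behind option 2 is simple arithmetic, sketched below. This is not kernel code: both function names are hypothetical, and the model only captures what the thread states, namely that tcp_output() would reserve one segment for the tcp/ip header mbuf and that a driver which does not pass that mbuf through its scatter/gather list would add one to compensate.

```python
def stack_data_segments(advertised_segcount):
    """Segments left for payload mbufs once tcp_output() reserves one
    for the tcp/ip header mbuf (option 2 in the thread)."""
    if advertised_segcount < 2:
        raise ValueError("need at least one header and one data segment")
    return advertised_segcount - 1


def driver_advertised_segcount(hw_limit, uses_header_mbuf=True):
    """Value a driver would advertise under option 2.

    A driver that sends the header mbuf through its scatter/gather list
    advertises the hardware limit unchanged; one that places the headers
    elsewhere (the inline-data case described for Mellanox) adds one.
    """
    return hw_limit if uses_header_mbuf else hw_limit + 1


for hw_limit, uses_header in ((32, True), (32, False), (35, True)):
    advertised = driver_advertised_segcount(hw_limit, uses_header)
    print(hw_limit, uses_header, advertised, stack_data_segments(advertised))
```

A driver that forgets the compensation simply gets one payload segment fewer than its hardware could take, which matches the remark that this case "still works, although somewhat suboptimally".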
Re: freebsd-update to 10.2-RELEASE broken ?
Hi,

On Mon, Aug 17, 2015 at 03:54:34PM +, Glen Barber wrote:

[...]

Secteam. I've cc'd them.

The issue persists even when forcing a single update server; update2.freebsd.org is very close to this server. The DNS (?) response of

    Looking up update2.freebsd.org mirrors... none found

is also still there.

I end up with files where name and hash don't match. It appears to be an issue with how the filename is generated from the hash, while the fact that the file can be unzipped from .gz format tells me it is not really corrupted. Or perhaps it is an issue with how the gzip compression gets handled on small files with certain content and padding.

For many files, the SHA256 over the ascii content after gunzip is equal to the filename. This is not the case for the files that are flagged as mismatched. I have not looked at the code, but I think it will exit after the first mismatch, even if there are more mismatched files/checksums.

This server is starting from 10.1-RELEASE-p18, fully updated. I removed all files in /var/db/freebsd-update/*, rebooted, then ran freebsd-update fetch again, and got the meta files.

I observe that when running the freebsd-update upgrade again after the first failure, I end up with fewer patches and fewer downloads, presumably because a large portion got patched in the previous round, but the hash issue then shows up on a different file.

I did a simple checksum verification on the 809 *.gz files after the second run

    # for f in `ls *gz`; do ls -la $f; echo $f; gunzip -c $f |sha256; done

and the output is deposited here:

https://files.naund.org/andreas/freebsd-update-SHA256-mismatch.txt

Eventually, in the third run, the upgrade completed.

First run:

[root@dev1 /usr/home/andreas]# freebsd-update -s update2.freebsd.org -r 10.2-RELEASE upgrade
Looking up update2.freebsd.org mirrors... none found.
Fetching metadata signature for 10.1-RELEASE from update2.freebsd.org... done.
Fetching metadata index... done.
Inspecting system... done.

The following components of FreeBSD seem to be installed:
kernel/generic src/src world/base world/doc

The following components of FreeBSD do not seem to be installed:
world/games

Does this look reasonable (y/n)? y

Fetching metadata signature for 10.2-RELEASE from update2.freebsd.org... done.
Fetching metadata index... done.
Fetching 1 metadata patches. done.
Applying metadata patches... done.
Fetching 1 metadata files... done.
Inspecting system... done.
Fetching files from 10.1-RELEASE for merging... done.
Preparing to download files... done.
Fetching 41142 patches.102030405060708090100 [... you all can count to 41030] 4104041050410604107041080410904110041110411204113041140. done.
Applying patches... done.
Fetching 5820 files... a36091931a81837106764f9afbf977c81c286f9bba476e9bfc77a3f962e84955 has incorrect hash.
[root@dev1 /usr/home/andreas]#
[root@dev1 /usr/home/andreas]# cd /var/db/freebsd-update/
[root@dev1 /var/db/freebsd-update]# ls -la a36091931a81837106764f9afbf977c81c286f9bba476e9bfc77a3f962e84955*
-rw-r--r-- 1 root wheel 151 Aug 21 05:38 a36091931a81837106764f9afbf977c81c286f9bba476e9bfc77a3f962e84955.gz
[root@dev1 /var/db/freebsd-update]# gunzip -c a36091931a81837106764f9afbf977c81c286f9bba476e9bfc77a3f962e84955.gz |sha256
a3649107fd11187af3797b596807f82cbab6f0ccae026b26a3eea3669a9223e5
[root@dev1 /var/db/freebsd-update]#
[root@dev1 /var/db/freebsd-update]# gunzip -c a36091931a81837106764f9afbf977c81c286f9bba476e9bfc77a3f962e84955.gz
.\" $FreeBSD: releng/10.2/tools/build/options/WITHOUT_FILE 279506 2015-03-01 22:07:54Z ngie $
Set to not build
.Xr file 1
and related programs.
[root@dev1 /var/db/freebsd-update]#

Second run:

[root@dev1 /var/db/freebsd-update]# date
Fri Aug 21 05:52:14 UTC 2015
[root@dev1 /var/db/freebsd-update]# freebsd-update -s update2.freebsd.org -r 10.2-RELEASE upgrade
Looking up update2.freebsd.org mirrors... none found.
Fetching metadata signature for 10.1-RELEASE from update2.freebsd.org... done.
Fetching metadata index... done.
Fetching 1 metadata patches. done.
Applying metadata patches... done.
Fetching 1 metadata files... done.
Inspecting system... done.

The following components of FreeBSD seem to be installed:
kernel/generic src/src world/base world/doc

The following components of FreeBSD do not seem to be installed:
world/games

Does this look reasonable (y/n)? y

Fetching metadata signature for 10.2-RELEASE from update2.freebsd.org... done.
Fetching metadata index... done.
Fetching 1 metadata patches. done.
Applying metadata patches... done.
Fetching 1 metadata files... done.
Inspecting system... done.
Fetching files from 10.1-RELEASE for merging... done.
Preparing to download files... done.
Fetching 354
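The one-line shell check quoted in this message prints every digest and leaves the comparison to the reader. A variant that reports only the mismatches could look like the sketch below. It assumes, as the message describes, that each file is named after the SHA-256 of its decompressed content plus a .gz suffix; the function name is hypothetical and this is not how freebsd-update itself performs the check.

```python
import gzip
import hashlib
from pathlib import Path


def find_mismatches(directory):
    """Return (file name, actual digest) for every *.gz file whose name
    stem differs from the SHA-256 of its decompressed content."""
    mismatches = []
    for path in sorted(Path(directory).glob("*.gz")):
        with gzip.open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        # path.stem is the file name without the trailing ".gz".
        if path.stem != digest:
            mismatches.append((path.name, digest))
    return mismatches
```

Calling find_mismatches("/var/db/freebsd-update") on the affected machine would list every flagged file in one pass instead of stopping at the first one.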
Re: [POSSIBLE BUG] 10-STABLE CARP erroneously becomes master on boot
Hi.

On 20.08.2015 14:51, Damien Fleuriot wrote:

Hello list,

We've managed to find the source of the bug, if it is indeed a bug.

It all comes down to the order in which the IP addresses are assigned to the interface from /etc/rc.conf.

When using the following syntax, the physical IP address is configured AFTER the CARPs on the interface, which results in the CARP advertisements being sourced from the CARP IP, triggering the double MASTER situation:

    ipv4_addrs_int="1.2.3.4/24"
    ifconfig_int_alias0="1.2.3.6/32 vhid 1 pass test advskew 20"

When using either of the following syntaxes, the physical IP address is configured BEFORE the CARPs, which results in the CARP advertisements being sourced from the physical IP and restores normal functionality:

    ifconfig_int="inet 1.2.3.4/24"
    ifconfig_int_alias0="1.2.3.6/32 vhid 1 pass test advskew 20"

OR

    ifconfig_int_alias0="1.2.3.4/24"
    ifconfig_int_alias1="1.2.3.6/32 vhid 1 pass test advskew 20"

It has been there since carp-ng was committed to 10-CURRENT 2 years ago. The thing is, carp-ng doesn't need a non-carp address on an interface anymore; both nodes can work fine using only the shared address. This isn't comfortable in lots of cases, but still. Thus, the kernel sends carp advertisements from the primary address on the interface (which is normal behavior for any known network stack), and for FreeBSD that primary address has always been the first address on an interface for a given AF. Thus, the cause of your split-brain carp situation definitely lies somewhere else. I've been running carp on FreeBSD for years, including the legacy one; if there is a bug, the situation you are describing probably isn't one.

Eugene.
Re: vmx and esxi vst mode vlan tagging
On Thu, 20 Aug 2015 13:23:53 +0200 Marko Cupać marko.cu...@mimar.rs wrote:

Hi,

I have just spent half an hour bashing my head against the wall trying to work out why my FreeBSD-10.2-RELEASE i386 machine with the kernel-included vmx driver and emulators/open-vm-tools installed won't ping machines on the other vlan on the same vSwitch.

It DOES ping other virtual machines on the same vSwitch and same vlan (freebsd servers), and it DOES ping physical machines in other vlans. It DOES NOT ping virtual machines on the same vSwitch and different vlan (windows servers).

To cut the long story short, it appears that the problem is on the FreeBSD side, as another machine (FreeBSD-10.1-RELEASE-p14 amd64) with the vmx3f0 driver and VMware's vmware-tools installed pings everything as it should.

Could it be that FreeBSD's vmx driver does not support all the functions of vmware's vmx3f driver? I deleted open-vm-tools and installed vmware-tools, but the problem remains.

I'm gonna test if 10.2-RELEASE amd64 has the same issue. That should tell me if the problem is related to a difference between i386 and amd64 in 10.2-RELEASE. If not, could this be a regression in 10.2? Should I report a bug or do some more testing?

Thank you in advance,
--
Marko Cupać
https://www.mimar.rs/
Re: vmx and esxi vst mode vlan tagging
On Fri, Aug 21, 2015 at 10:25:05AM +0200, Marko Cupać wrote:

On Thu, 20 Aug 2015 13:23:53 +0200 Marko Cupać marko.cu...@mimar.rs wrote:

Hi,

I have just spent half an hour bashing my head against the wall trying to work out why my FreeBSD-10.2-RELEASE i386 machine with the kernel-included vmx driver and emulators/open-vm-tools installed won't ping machines on the other vlan on the same vSwitch.

It DOES ping other virtual machines on the same vSwitch and same vlan (freebsd servers), and it DOES ping physical machines in other vlans. It DOES NOT ping virtual machines on the same vSwitch and different vlan (windows servers).

To cut the long story short, it appears that the problem is on the FreeBSD side, as another machine (FreeBSD-10.1-RELEASE-p14 amd64) with the vmx3f0 driver and VMware's vmware-tools installed pings everything as it should.

Could it be that FreeBSD's vmx driver does not support all the functions of vmware's vmx3f driver? I deleted open-vm-tools and installed vmware-tools, but the problem remains.

I'm gonna test if 10.2-RELEASE amd64 has the same issue. That should tell me if the problem is related to a difference between i386 and amd64 in 10.2-RELEASE. If not, could this be a regression in 10.2? Should I report a bug or do some more testing?

netstat -nr
arp -na
ifconfig -a
ping's commands and results.
Re: vmx and esxi vst mode vlan tagging
On Fri, 21 Aug 2015 12:41:02 +0300 Slawa Olhovchenkov s...@zxy.spb.ru wrote:

On Fri, Aug 21, 2015 at 10:25:05AM +0200, Marko Cupać wrote:

On Thu, 20 Aug 2015 13:23:53 +0200 Marko Cupać marko.cu...@mimar.rs wrote:

Hi,

I have just spent half an hour bashing my head against the wall trying to work out why my FreeBSD-10.2-RELEASE i386 machine with the kernel-included vmx driver and emulators/open-vm-tools installed won't ping machines on the other vlan on the same vSwitch.

It DOES ping other virtual machines on the same vSwitch and same vlan (freebsd servers), and it DOES ping physical machines in other vlans. It DOES NOT ping virtual machines on the same vSwitch and different vlan (windows servers).

To cut the long story short, it appears that the problem is on the FreeBSD side, as another machine (FreeBSD-10.1-RELEASE-p14 amd64) with the vmx3f0 driver and VMware's vmware-tools installed pings everything as it should.

Could it be that FreeBSD's vmx driver does not support all the functions of vmware's vmx3f driver? I deleted open-vm-tools and installed vmware-tools, but the problem remains.

I'm gonna test if 10.2-RELEASE amd64 has the same issue. That should tell me if the problem is related to a difference between i386 and amd64 in 10.2-RELEASE. If not, could this be a regression in 10.2? Should I report a bug or do some more testing?

netstat -nr
arp -na
ifconfig -a
ping's commands and results.

Sorry guys, please dismiss this issue entirely. I kept setting a /24 subnet mask, whereas the correct mask for the vlan is /27.

Please consider the fact that my long-needed vacation starts tomorrow :)

Regards,
--
Marko Cupać
https://www.mimar.rs/
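Why a /24 mask on a /27 vlan produces exactly this kind of selective failure can be shown with Python's ipaddress module. The addresses below are hypothetical (the thread gives none), and the explanation is an inference rather than something Marko states: with the wider mask, a host in a neighbouring /27 looks directly reachable, so the machine tries to resolve it on its own vlan instead of sending the packet to the gateway.

```python
import ipaddress


def is_on_link(local_interface, destination):
    """True if the host would treat destination as directly reachable
    rather than sending the packet to its gateway."""
    network = ipaddress.ip_interface(local_interface).network
    return ipaddress.ip_address(destination) in network


# Hypothetical addresses: a host in 192.0.2.0/27 and a destination in
# the neighbouring /27, both inside the same /24.
DEST_OTHER_VLAN = "192.0.2.40"

print(is_on_link("192.0.2.10/27", DEST_OTHER_VLAN))  # correct mask
print(is_on_link("192.0.2.10/24", DEST_OTHER_VLAN))  # mistaken mask
```

Destinations outside the /24 are routed through the gateway with either mask, which would be consistent with the physical machines in other vlans answering pings all along.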
Re: Will 10.2 also ship with a very stale NTP?
Regarding Ian Lepore's message of 21.08.2015 00:34 (localtime):

On Fri, 2015-07-24 at 15:19 +0200, Harald Schmalzbauer wrote:

Regarding Ian Lepore's message of 12.07.2015 17:41 (localtime):

And let's all just hope that a week or two of testing is enough when jumping a major piece of software forward several years in its independent evolution. …

I wonder how many other such things could be lurking in 4.2.8, waiting to be triggered by other peoples' non-stock configurations? We've …

I'd like to report one, most likely an upstream problem: 'restrict' definitions in ntp.conf(5) no longer work with unqualified DNS names. A line like

    restrict time1 nomodify nopeer noquery notrap

results in:

    ntpd[1913]: line 7 column 7 syntax error, unexpected T_Time1
    ntpd[1913]: syntax error in /etc/ntp.conf line 7, column 7

I've always been using unqualified hostnames with 'restrict', and since defining 'server' with an unqualified hostname still works, this seems to be a significant bug to me. People are forced to change 'restrict' definitions, but not to also change other unqualified definitions, which potentially leads to misconfigurations, since intentionally matching definitions can now differ easily.

Has anybody already noticed this problem? And any idea if upstream is aware?

I had a quick look at this today. It appears that the problem isn't unqualified names exactly, but rather that an unqualified name that exactly matches an ntp.conf keyword will be mistaken by the ntpd config parser for a misplaced keyword token. So most unqualified names should work, but there are about 200 words that won't, many of them very sensible names for ntp servers such as ntp and time1 and time2.

When I look at the ntp_parser.y grammar file it's not clear to me why 'server time1' works and 'restrict time1' doesn't. I couldn't find any way to trick it into taking a keyword as a hostname following restrict (like using quotes).

Thank you very much!

This is very interesting and exactly matches my tested host names. I wish I had better C skills to find such things myself. Out of curiosity: how long did it take to find the ntp_parser.y route? (And with what "IDE"? I'm stuck with vim.)

One additional observation was that the reserved-name collision only happens with CNAME records. I hope I'll find some time to actually look into the sources, which I didn't do at first because of my lousy C skills :-( But that's the place to find hints :-)

Thanks,

-Harry