Re: SLAAC not working
Greg Rivers wrote in <2045487.fzlpjxt...@flake.tharned.org>:

gc> > 2. What is shown by the command "ping6 ff02::1%lagg0" and "rtsol -dD lagg0"?
gc> >
gc> $ ping6 -c 2 ff02::1%lagg0
gc> PING6(56=40+8+8 bytes) fe80::ae16:2dff:fe1e:b880%lagg0 --> ff02::1%lagg0
gc> 16 bytes from fe80::ae16:2dff:fe1e:b880%lagg0, icmp_seq=0 hlim=64 time=0.181 ms
gc> 16 bytes from fe80::f415:63ff:fe2b:ea06%lagg0, icmp_seq=0 hlim=64 time=0.263 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=0.318 ms(DUP!)
gc> 16 bytes from fe80::8edc:d4ff:feaf:8938%lagg0, icmp_seq=0 hlim=64 time=0.369 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=0.803 ms(DUP!)
gc> 16 bytes from fe80::ae16:2dff:fe1e:e998%lagg0, icmp_seq=0 hlim=64 time=0.868 ms(DUP!)
gc> 16 bytes from fe80::ae16:2dff:fe1e:49f8%lagg0, icmp_seq=0 hlim=64 time=0.922 ms(DUP!)
gc> 16 bytes from fe80::226:55ff:fe2f:40a4%lagg0, icmp_seq=0 hlim=64 time=0.971 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:ea06%lagg0, icmp_seq=0 hlim=64 time=2.144 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=4.154 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=4.220 ms(DUP!)
gc> 16 bytes from fe80::ae16:2dff:fe1e:b880%lagg0, icmp_seq=1 hlim=64 time=0.222 ms

You should have got responses from 64:a0:e7:45:63:43 (router), namely
fe80::66a0:e7ff:fe45:6343%lagg0, but it seems it did not happen for
some reason. Was the router receiving the ICMPv6 ECHOes which came
from fe80::ae16:2dff:fe1e:b880?

gc> > 2. What is shown by the command "ping6 ff02::1%lagg0" and "rtsol -dD lagg0"?
(snip)
gc> # rtsol -dD lagg0
gc> checking if lagg0 is ready...
gc> lagg0 is ready
gc> set timer for lagg0 to 1s
gc> New timer is 1s
gc> timer expiration on lagg0, state = 1
gc> send RS on lagg0, whose state is 2
gc> set timer for lagg0 to 4s
gc> New timer is 4s
gc> timer expiration on lagg0, state = 2
gc> send RS on lagg0, whose state is 2
gc> set timer for lagg0 to 4s
gc> New timer is 4s
gc> timer expiration on lagg0, state = 2
gc> send RS on lagg0, whose state is 2
gc> set timer for lagg0 to 1s
gc> New timer is 1s
gc> timer expiration on lagg0, state = 2
gc> No answer after sending 3 RSs
gc> stop timer for lagg0
gc> there is no timer

This indicates that there was no RA as an answer from the router after
a RS message was sent. Probably there is a problem with the link
between lagg0 and the router, not specific to IPv6.

-- Hiroki
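
The check made by eye in the reply above (derive the router's link-local address from its MAC address, then look for it among the ping6 responders) can be sketched in sh. This is only an illustration: the function names are hypothetical, and it assumes the router uses a modified EUI-64 link-local address, as the reply implies; a router configured with a different link-local address would not be found this way.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of FreeBSD.

# Print the modified EUI-64 link-local address for a MAC address.
# The body runs in a subshell so that the IFS change does not leak.
mac_to_linklocal() (
    IFS=:
    set -- $1
    [ $# -eq 6 ] || exit 1
    # Invert the universal/local bit (0x02) of the first octet and
    # insert ff:fe between the third and fourth octets.
    printf 'fe80::%x:%x:%x:%x\n' \
        $(( ((0x$1 ^ 0x02) << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0xff )) \
        $(( 0xfe00 | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 ))
)

# Read ping6 output on stdin and list each distinct responder once.
unique_responders() {
    sed -n 's/^[0-9][0-9]* bytes from \([^,]*\),.*/\1/p' | sort -u
}

# Succeed if the router with MAC $1 answered on interface $2.
router_answered() {
    _want="$(mac_to_linklocal "$1")%$2" || return 2
    unique_responders | grep -qxF "$_want"
}

# mac_to_linklocal 64:a0:e7:45:63:43   prints fe80::66a0:e7ff:fe45:6343
```

A possible use would be `ping6 -c 2 ff02::1%lagg0 | router_answered 64:a0:e7:45:63:43 lagg0`, which with the output quoted above would fail, matching the observation that the router never answered.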
Re: SLAAC not working
Greg Rivers wrote in <1557648.bebeymq...@flake.tharned.org>:

gc> On Monday, August 07, 2017 15:57:04 Andrey V. Elsukov wrote:
gc> > So, set net.inet6.icmp6.nd6_debug=1 and show what you have in the
gc> > ndp -p
gc> > ndp -r
gc> > ndp -i lagg0
gc> >
gc> # sysctl net.inet6.icmp6.nd6_debug=1
gc> net.inet6.icmp6.nd6_debug: 0 -> 1
gc> # suspend
gc> [1] + Stopped (SIGSTOP) su -
gc> $ ndp -p
gc> fe80::%lagg0/64 if=lagg0
gc> flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=0
gc> No advertising router
gc> fe80::%lo0/64 if=lo0
gc> flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=0
gc> No advertising router
gc> $ ndp -r
gc> $ ndp -i lagg0
gc> linkmtu=0, maxmtu=0, curhlim=64, basereachable=30s0ms, reachable=31s,
gc> retrans=1s0ms
gc> Flags: nud accept_rtadv auto_linklocal
gc>
gc> Clearly there's no SLAAC action. I can't find any NDP debug messages
gc> in the kernel message log or in the syslog. Where might they be going?

The configuration looks correct to me, but two questions:

1. Does "sysctl net.inet6.ip6.forwarding" command show "0"?

2. What is shown by the command "ping6 ff02::1%lagg0" and "rtsol -dD lagg0"?

-- Hiroki
Re: IPv6 works on em0 () but not on em1 () - what's wrong?
Lev Serebryakov wrote in <58756dde.5000...@freebsd.org>:

le> I have MoBo (Supermicro X9SCL-F) with two 1G NICs, first one (em0) is
le> based on 82579LM, and second one (em1) is based on 82574L.
le>
le> When I'm using em0 with simple config:
le>
le> ifconfig_em0="inet 192.168.134.2 netmask 255.255.255.0 mtu 9000"
le> ifconfig_em0_ipv6="inet6 accept_rtadv"
le>
le> everything works fine - em0 get IPv6 prefix from rtadvd of my router
le> and "tcpdump -n -i em0 icmp6" shows some traffic, like router and prefix
le> announcements. So far so good.
le>
le> I want to use em1 (and don't use em0 at all), because 82579LM has some
le> known bugs according to SuperMicro support and sometimes hangs whole system.
le>
le> So, I change config to
le>
le> ifconfig_em1="inet 192.168.134.2 netmask 255.255.255.0 mtu 9000"
le> ifconfig_em1_ipv6="inet6 accept_rtadv"
le>
le> connect em1 instead of em0 to the switch and reboot. And after that
le> interface (em1) can not get IPv6 prefix, don't get global address (and
le> shows only link-local one) and "tcpdump -n -i em1 icmp6" shows nothing at
le> all! IPv4 works fine, though.
le>
le> What do I do wrong? Is it known issue of 82574L?
le>
le> I'm running 10-STABLE r311462.

What happens when you type the following command?

 % ping6 ff02::1%em1

-- Hiroki
Re: stf(4) on 10-stable
Daniel Bilik wrote in <20160205093713.1c1453f9b5d06a6b366c4...@neosystem.cz>:

dd> On Thu, 14 Jan 2016 10:49:37 +0100
dd> Daniel Bilik wrote:
dd>
dd> >> Should I create PR for this?
dd> > Created:
dd> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206231
dd>
dd> Seems that 10-stable has just entered beta1, so unless some effort is
dd> put into fixing this, 10.3-release is probably gonna ship with broken 6to4
dd> connectivity.

I am sorry for not taking care of this in a timely manner. I will do
it this weekend.

-- Hiroki
Re: ipv6_addrs_IF aliases in rc.conf(5)
Hiroki Sato h...@freebsd.org wrote in 20130718.123323.1730389945845032580@allbsd.org: hr Michael Grimm trash...@odo.in-berlin.de wrote hr in eb3c4472-02bf-4415-bb2d-b4929063d...@odo.in-berlin.de: hr hr tr On 12.07.2013, at 09:03, Hiroki Sato h...@freebsd.org wrote: hr tr hr tr Please let me know if the existing configurations and/or the new hr tr formats do not work. hr tr hr tr First of all: great work! It is that much easier to deal with aliases, now. hr tr hr tr There is only one minor issue, if at all: hr tr hr tr rc.conf: hr tr | ifconfig_em0_ipv6=inet6 dead:beef::::1 prefixlen 56 hr tr | ifconfig_em0_aliases=\ hr tr | inet6 dead:beef::::2-3 prefixlen 56 \ hr tr | inet6 dead:beef::::4 prefixlen 56 \ hr tr | inet6 dead:beef::::5-6/56 hr tr hr tr ifconfig: hr tr | inet6 dead:beef::::1 prefixlen 56 hr tr | inet6 dead:beef::::2 prefixlen 64 hr tr | inet6 dead:beef::::3 prefixlen 64 hr tr | inet6 dead:beef::::4 prefixlen 56 hr tr | inet6 dead:beef::::5 prefixlen 56 hr tr | inet6 dead:beef::::6 prefixlen 56 hr tr hr tr Any combination of a range definition (2-3) *and* prefixlen 56 is ignored hr tr whereas a range definition (5-6) *and* /56 is interpreted as wanted. hr tr hr tr Well, that combination of a range and prefix isn't documented, thus I am hr tr not sure if that's an issue or a feature? hr hr It seems a bug. Thank you for your report. I am investigating it now. Can you test the attached patch? The old version (in stable/9 now) does not support address range spec + options properly and ignore the options part. The attached patch accepts options and treats netmask for inet and prefixlen in inet6 in a reasonable way so that the specified options do not conflict with the default /NN values. 
-- Hiroki Index: etc/network.subr === --- etc/network.subr (revision 253489) +++ etc/network.subr (working copy) @@ -721,9 +721,14 @@ # ifalias_expand_addr() { + local _af _action - afexists $1 || return - ifalias_expand_addr_$1 $2 $3 + _af=$1 + _action=$2 + shift 2 + + afexists $_af || return + ifalias_expand_addr_$_af $_action $* } # ifalias_expand_addr_inet action addr @@ -731,19 +736,31 @@ # ifalias_expand_addr_inet() { - local _action _arg _cidr _cidr_addr + local _action _arg _cidr _cidr_addr _exargs local _ipaddr _plen _range _iphead _iptail _iplow _iphigh _ipcount local _retstr _c _action=$1 _arg=$2 + shift 2 + _exargs=$* _retstr= - case $_action:$_arg in + case $_action:$_arg:$_exargs in *:*--*) return ;; # invalid - tmp:*) echo $_arg return ;; # already expanded - tmp:*-*) _action=alias ;; # to be expanded - *:*-*) ;;# to be expanded - *:*) echo inet $_arg return ;; # already expanded + tmp:*:*netmask*) # already expanded w/ netmask option + echo ${_arg%/[0-9]*} $_exargs return + ;; + tmp:*:*) # already expanded w/o netmask option + echo $_arg $_exargs return + ;; + tmp:*[0-9]-[0-9]*:*) _action=alias ;; # to be expanded + *:*[0-9]-[0-9]*:*) ;; # to be expanded + *:*:*netmask*) # already expanded w/ netmask option + echo inet ${_arg%/[0-9]*} $_exargs return + ;; + *:*:*)# already expanded w/o netmask option + echo inet $_arg $_exargs return + ;; esac for _cidr in $_arg; do @@ -796,7 +813,7 @@ done for _c in $_retstr; do - ifalias_expand_addr_inet $_action $_c + ifalias_expand_addr_inet $_action $_c $_exargs done } @@ -805,20 +822,32 @@ # ifalias_expand_addr_inet6() { - local _action _arg _cidr _cidr_addr + local _action _arg _cidr _cidr_addr _exargs local _ipaddr _plen _ipleft _ipright _iplow _iphigh _ipcount local _ipv4part local _retstr _c _action=$1 _arg=$2 + shift 2 + _exargs=$* _retstr= - case $_action:$_arg in - *:*--*) return ;; # invalid - tmp:*) echo $_arg return ;; - tmp:*-*) _action=alias ;; - *:*-*) ;; - *:*) echo inet6 $_arg return ;; + case 
$_action:$_arg:$_exargs in + *:*--*:*) return ;; # invalid + tmp:*:*prefixlen*) # already expanded w/ prefixlen option + echo ${_arg%/[0-9]*} $_exargs return + ;; + tmp:*:*) # already expanded w/o prefixlen option + echo $_arg $_exargs return + ;; + tmp:*[0-9a-zA-Z]-[0-9a-zA-Z]*:*)_action=alias ;;# to be expanded + *:*[0-9a-zA-Z]-[0-9a-zA-Z]*:*) ;; # to be expanded + *:*:*prefixlen*) # already expanded w/ prefixlen option + echo inet6 ${_arg%/[0-9]*} $_exargs return + ;; + *:*:*) # already expanded w/o prefixlen option + echo inet6 $_arg $_exargs return + ;; esac for _cidr in $_arg; do @@ -872,7 +901,7 @@ fi for _c in $_retstr; do -ifalias_expand_addr_inet6 $_action $_c +ifalias_expand_addr_inet6 $_action $_c $_exargs done else # v4mapped/v4compat should handle as an IPv4 alias @@ -888,7 +917,7 @@ _retstr=`ifalias_expand_addr_inet \ tmp ${_ipv4part}${_plen:+/}${_plen
Re: ipv6_addrs_IF aliases in rc.conf(5)
Michael Grimm trash...@odo.in-berlin.de wrote in 5c2419e4-d5b7-4f1a-aed0-90ef73305...@odo.in-berlin.de: tr On 20.07.2013, at 16:46, Hiroki Sato h...@freebsd.org wrote: tr Hiroki Sato h...@freebsd.org wrote in 20130718.123323.1730389945845032580@allbsd.org: tr tr Can you test the attached patch? The old version (in stable/9 now) tr does not support address range spec + options properly and ignore tr the options part. tr tr The attached patch accepts options and treats netmask for inet and tr prefixlen in inet6 in a reasonable way so that the specified tr options do not conflict with the default /NN values. tr tr I can confirm that your patch is working for my examples used before. tr tr Now, a range definition and prefixlen 56 is recognized properly: Thank you. Committed as r253505 and will be merged to stable/9. -- Hiroki pgp2zlCMgHI0D.pgp Description: PGP signature
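
The behaviour confirmed above, where options that follow a range spec are applied to every expanded address instead of being dropped, can be sketched roughly as follows. This is a simplified illustration and not the code in network.subr: the function name is hypothetical, it only understands an IPv4 range in the last octet, and it does not attempt the /NN versus netmask/prefixlen handling that the patch describes.

```shell
#!/bin/sh
# Hypothetical, simplified sketch; the real logic lives in
# ifalias_expand_addr_inet() and ifalias_expand_addr_inet6().

# usage: expand_last_octet_range af addr-or-range [options...]
expand_last_octet_range() {
    _af=$1
    _spec=$2
    shift 2
    _opts=$*                      # extra options, kept for every address

    case $_spec in
    *[0-9]-[0-9]*) ;;             # a range: expand it below
    *)  echo "$_af $_spec${_opts:+ $_opts}"
        return ;;                 # a single address: pass it through
    esac

    _head=${_spec%.*}             # e.g. 10.0.0
    _range=${_spec##*.}           # e.g. 67-78
    _low=${_range%-*}
    _high=${_range#*-}

    _i=$_low
    while [ "$_i" -le "$_high" ]; do
        echo "$_af $_head.$_i${_opts:+ $_opts}"
        _i=$((_i + 1))
    done
}

expand_last_octet_range inet 10.0.0.67-69 netmask 0xfffffff0
```

The point of the sketch is the `_opts` variable: it plays the role that `_exargs` plays in the patch, carrying the trailing options along to each generated address.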
Re: ipv6_addrs_IF aliases in rc.conf(5)
Łukasz Wąsikowski luk...@wasikowski.net wrote
  in 51e53ac7.1040...@wasikowski.net:

lu hr # IPv4 address range spec. Now deprecated.
lu hr ipv4_addr_em0=10.2.1.1-10
lu
lu So I'm a little confused now :) If I'd use post r252015 system then
lu would this be better way?
lu
lu ifconfig_em0_aliases=inet 10.0.0.66/28 inet 10.0.0.67-78 inet6
lu fdda:5cc1:23:4::1/48 inet6 fdda:5cc1:23:4::2-f

Dewayne Geraghty dewayne.gerag...@heuristicsystems.com.au wrote
  in 14677223DB6D4CD48E880520725B3552@white:

de Sato-san,
de
de You have provided a very useful summary of ifconfig parameters for
de rc.conf. However, you are missing one example that would provide
de clearer understanding. Would you please advise if
de
de ipv4_addr_em0=10.2.1.1-10/32
de
de is deprecated, backward compatible or remains valid into the future?
de
de I particularly appreciate the succinctness of:
de ifconfig_em0_aliases=inet 10.3.3.201-204/24 inet6
de 2001:db8:210-213::1/64 inet 10.1.1.1/24

The recommended way is ifconfig_IF_aliasN or ifconfig_IF_aliases.
ipv4_addr_IF will not be removed in the near future, but please use
ifconfig_IF_alias{N,es} for newly-configured systems. Backward
compatibility will be maintained as far as possible so that existing
configurations do not break (even on the upcoming 10.0R and later).

The reason for this recommendation is that rc.conf has a lot of
variables with (almost) the same functionality and I want to simplify
them by merging them with each other, not that these variables are
better than the others. Variables with overlapping functionality have
made the rc.d scripts difficult to maintain and improve.

-- Hiroki
Re: ipv6_addrs_IF aliases in rc.conf(5)
Mark Felder f...@freebsd.org wrote
  in 1374062120.4532.140661256673649.36ed5...@webmail.messagingengine.com:

fe On Wed, Jul 17, 2013, at 4:36, Hiroki Sato wrote:
fe
fe The recommended way is ifconfig_IF_aliasN or ifconfig_IF_aliases.
fe ipv4_addr_IF will not be removed in the near future, but please use
fe ifconfig_IF_alias{N,es} for newly-configured systems. Backward
fe compatibility for not breaking the existing configurations will be
fe maintained as much as possible (even on the upcoming 10.0R and
fe later).
fe
fe
fe Almost everyone is familiar with ifconfig_IF_aliasN, but can you provide
fe example syntax for ifconfig_IF_aliases ? I've never seen that before and
fe can't find it documented.

I committed some descriptions about it to rc.conf(5) at the same
time. It is basically the same as ifconfig_IF_aliasN, but can hold
multiple address specifications. Both ifconfig_IF_aliasN and
ifconfig_IF_aliases now support range specification, so there is no
difference in functionality. The following two examples give the
same result:

 ifconfig_ed0_alias0=inet 127.0.0.251 netmask 0x
 ifconfig_ed0_alias1=inet 127.0.0.252 netmask 0x
 ifconfig_ed0_alias2=inet 127.0.0.253 netmask 0x
 ifconfig_ed0_alias3=inet 127.0.0.254 netmask 0x

 ifconfig_ed0_aliases=\
 inet 127.0.0.251 netmask 0x \
 inet 127.0.0.252 netmask 0x \
 inet 127.0.0.253 netmask 0x \
 inet 127.0.0.254 netmask 0x \

The implementation actually converts the values of
ifconfig_IF_aliasN, ipv6_ifconfig_IF_aliasN, and ipv4_addrs_IF into a
list in the consistent format used by ifconfig_IF_aliases (AF keyword
+ address spec + options), and then processes that list together with
ifconfig_IF_aliases itself.

ifconfig_IF_aliasN accepts an address spec without an address family
keyword for backward compatibility, but ifconfig_IF_aliases does not.
This is the difference between the two.

fe This thread isn't exactly the proper forum to debate the future of
fe network configuration on FreeBSD, but please take this into
fe consideration. And thank you for your work on the rc.d scripts --
fe they're the #1 reason many of us prefer working with FreeBSD.

Fair enough. Please do not hesitate to speak up on freebsd-rc@ about
this kind of topic.

-- Hiroki
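
The conversion described in this message, bringing every alias value into the common "AF keyword + address spec + options" form, might look something like the sketch below. The function name is hypothetical and so is the rule used to guess the address family: the sketch looks for a colon in the value, which is an assumption made for illustration and not necessarily how network.subr decides.

```shell
#!/bin/sh
# Hypothetical sketch; not the network.subr implementation.

# Prefix an alias value with its address family keyword if it lacks one.
normalize_alias() {
    case $1 in
    inet\ *|inet6\ *) echo "$1" ;;        # AF keyword already present
    *:*)              echo "inet6 $1" ;;  # contains ":", assumed to be IPv6
    *)                echo "inet $1" ;;   # otherwise assumed to be IPv4
    esac
}

normalize_alias "10.3.1.1-10/32"          # keyword-less IPv4 range
normalize_alias "2001:db8:f:1::1/64"      # keyword-less IPv6 address
normalize_alias "inet 10.2.2.1/24"        # already in the common form
```

Once every value is in this shape, a single code path can process the list, which is the simplification the message is aiming at.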
Re: ipv6_addrs_IF aliases in rc.conf(5)
Michael Grimm trash...@odo.in-berlin.de wrote in eb3c4472-02bf-4415-bb2d-b4929063d...@odo.in-berlin.de: tr On 12.07.2013, at 09:03, Hiroki Sato h...@freebsd.org wrote: tr tr Please let me know if the existing configurations and/or the new tr formats do not work. tr tr First of all: great work! It is that much easier to deal with aliases, now. tr tr There is only one minor issue, if at all: tr tr rc.conf: tr | ifconfig_em0_ipv6=inet6 dead:beef::::1 prefixlen 56 tr | ifconfig_em0_aliases=\ tr | inet6 dead:beef::::2-3 prefixlen 56 \ tr | inet6 dead:beef::::4 prefixlen 56 \ tr | inet6 dead:beef::::5-6/56 tr tr ifconfig: tr | inet6 dead:beef::::1 prefixlen 56 tr | inet6 dead:beef::::2 prefixlen 64 tr | inet6 dead:beef::::3 prefixlen 64 tr | inet6 dead:beef::::4 prefixlen 56 tr | inet6 dead:beef::::5 prefixlen 56 tr | inet6 dead:beef::::6 prefixlen 56 tr tr Any combination of a range definition (2-3) *and* prefixlen 56 is ignored tr whereas a range definition (5-6) *and* /56 is interpreted as wanted. tr tr Well, that combination of a range and prefix isn't documented, thus I am tr not sure if that's an issue or a feature? It seems a bug. Thank you for your report. I am investigating it now. -- Hiroki pgpeQ_UFShVrJ.pgp Description: PGP signature
Re: ipv6_addrs_IF aliases in rc.conf(5)
Michael Grimm trash...@odo.in-berlin.de wrote
  in 4c07217dc9200841dfd065a6d5284...@mx1.enfer-du-nord.net:

tr On 2013-07-12 6:56, Hiroki Sato wrote:
tr Kevin Oberman rkober...@gmail.com wrote
tr in can6yy1srswemj2_bjx_drzmxgk4tf50_ode8o8i2d6wtrgw...@mail.gmail.com:
tr rk On Wed, Jul 10, 2013 at 4:46 AM, Mark Felder f...@feld.me wrote:
tr rk
tr rk On Wed, 10 Jul 2013 06:44:12 -0500, Michael Grimm
tr rk trash...@odo.in-berlin.de wrote:
tr rk
tr rk Will that patch make it into 9.2? If I am not mistaken, that patch isn't
tr rk in stable yet.
tr rk
tr rk
tr rk I would also like to see this patch hit 9.x sooner than later. It's so
tr rk painful when someone forgets to fix the alias numbering on servers with
tr rk many, many IPv4 and IPv6 addresses...
tr rk
tr rk
tr rk Please, please, please, please, ...!
tr rk
tr rk Freeze is only two days away, so time for 9.2 is almost over and I can see
tr rk no good reason NOT to get this done.
tr r252015 was merged to stable/9 today.
tr
tr Thanks! This is highly appreciated. A first glance at network.subr tells me that
tr much more has been modified/simplified regarding alias definitions, great.

Please let me know if the existing configurations and/or the new
formats do not work. The following is a summary of the supported
rc.conf variables, FYI:

Hiroki Sato h...@freebsd.org wrote
  in 201306200229.r5k2tnfr085...@svn.freebsd.org:

hr A summary of the supported ifconfig_* variables is as follows:
hr
hr # IPv4 configuration.
hr ifconfig_em0=inet 192.168.0.1
hr # IPv6 configuration.
hr ifconfig_em0_ipv6=inet6 2001:db8::1/64
hr # IPv4 address range spec. Now deprecated.
hr ipv4_addr_em0=10.2.1.1-10
hr # IPv6 alias.
hr ifconfig_em0_alias0=inet6 2001:db8:5::1 prefixlen 70
hr # IPv4 alias.
hr ifconfig_em0_alias1=inet 10.2.2.1/24
hr # IPv4 alias with range spec w/o AF keyword (backward compat).
hr ifconfig_em0_alias2=10.3.1.1-10/32
hr # IPv6 alias with range spec.
hr ifconfig_em0_alias3=inet6 2001:db8:20-2f::1/64
hr # ifconfig_IF_aliases is just like ifconfig_IF_aliasN.
hr ifconfig_em0_aliases=inet 10.3.3.201-204/24 inet6 2001:db8:210-213::1/64 inet 10.1.1.1/24
hr # IPv6 alias (backward compat)
hr ipv6_ifconfig_em0_alias0=inet6 2001:db8:f::1/64
hr # IPv6 alias w/o AF keyword (backward compat)
hr ipv6_ifconfig_em0_alias1=2001:db8:f:1::1/64
hr # IPv6 prefix.
hr ipv6_prefix_em0=2001:db8::/64

-- Hiroki
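
Because rc.conf is read by sh(1), a value that contains spaces has to be quoted there. A few of the entries from the summary, written the way they would probably appear in an actual rc.conf, are sketched below; the selection and the comments are illustrative, and the addresses are the ones used in the summary.

```shell
# IPv4 and IPv6 addresses on em0.
ifconfig_em0="inet 192.168.0.1"
ifconfig_em0_ipv6="inet6 2001:db8::1/64"

# Numbered aliases.
ifconfig_em0_alias0="inet6 2001:db8:5::1 prefixlen 70"
ifconfig_em0_alias1="inet 10.2.2.1/24"

# Several address specs, including ranges, in a single variable.
ifconfig_em0_aliases="inet 10.3.3.201-204/24 inet6 2001:db8:210-213::1/64 inet 10.1.1.1/24"
```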
Re: ipv6_addrs_IF aliases in rc.conf(5)
Kevin Oberman rkober...@gmail.com wrote in can6yy1srswemj2_bjx_drzmxgk4tf50_ode8o8i2d6wtrgw...@mail.gmail.com: rk On Wed, Jul 10, 2013 at 4:46 AM, Mark Felder f...@feld.me wrote: rk rk On Wed, 10 Jul 2013 06:44:12 -0500, Michael Grimm rk trash...@odo.in-berlin.de wrote: rk rk Will that patch make it into 9.2? If I am not mistaken, that patch isn't rk in stable yet. rk rk rk I would also like to see this patch hit 9.x sooner than later. It's so rk painful when someone forgets to fix the alias numbering on servers with rk many, many IPv4 and IPv6 addresses... rk rk rk Please, please, please, please, ...! rk rk Freeze is only two days away, so time for 9.2 is almost over and I can see rk no good reason NOT to get this done. r252015 was merged to stable/9 today. -- Hiroki pgpwRUfj8rol1.pgp Description: PGP signature
request for your comments on release documentation
Hi,

I would like your comments on the release notes for each release.
Although I have been editing them for years, the workflow is still
not optimal, and delays in preparing them have sometimes become an
obstacle to the release process. I would like to improve it, but
before that I would like to know what people want the contents to be.

The Release Notes simply list the changes between two releases. They
include user-visible changes (bugfixes and/or UI changes), new
functionality, and performance improvements. Minor changes such as
ones in kernel internal structures are omitted. I always try to keep
this series of relnotes items correct and reasonably comprehensive,
but the lengthy list may be boring, and technically correct
descriptions can be cryptic for average users.

So, my questions are:

1. What do you think about the current granularity of the relnotes
   items? Too detailed, good, or too rough?

   Currently, the judgment of what is included is based on whether a
   change is user-visible, new functionality, or a performance
   improvement. Applicable changes are included as relnotes items
   even if the changes are small.

2. Do you want technical details?

   For example, just "disk access performance was improved by 50%",
   or "Feature A has been added. This changes the old behavior
   because ..., and as a result, it improves disk access performance
   by 50%".

3. Is there missing information which should be in the relnotes?

   Probably there are some missing items for each release, but this
   question is meant at a more abstract level: links to the commit
   log and diff, detailed descriptions of major incompatible changes,
   and so on.

Although the other release documents (Errata, Installation Notes,
ReadMe, and Hardware Notes) also need some improvements, please focus
on the Release Notes only. And you might think the quality of the
English writing is not good; please leave that alone for now.

-- Hiroki
Re: Possible 8.4 regression
Alexander Pyhalov a...@rsu.ru wrote
  in 4fffaaf8a6667175fca94ce32f25a...@sfedu.ru:

al Hello.
al
al Just wanted to share a notice.
al I had a 8.3 system with PostgreSQL running in a jail.
al rc.conf has the following lines:
al
al jail_enable=YES
al jail_sysvipc_allow=YES
al jail_mount_enable=YES
al jail_devfs_enable=YES
al
al jail_pgsql_rootdir=/jails/run/pgsql
al jail_pgsql_hostname=pgsql.freebsd
al jail_pgsql_ip=my.ip
al jail_pgsql_interface=em0
al
al It was running normally. However, after update to 8.4 I had to add the
al following parameter
al jail_pgsql_parameters=allow.sysvipc
al
al Without it shmget in jail didn't work.

Thank you for the report. This affects jail_set_hostname_allow and
jail_socket_unixiproute_only as well. I will add it to Errata.

-- Hiroki
Re: Apparent fxp regression in FreeBSD 8.4-RC3
YongHyeon PYUN pyu...@gmail.com wrote in 20130528023300.ga3...@michelle.cdnetworks.com: py I'll have access to the other box on Wednesday and will try the other test. py py Here is patch I'm testing and it seems to work with dhclient on py CURRENT. py Mike, could you try attached patch? On my box it worked without problem. Link status change of fxp0 was down-up only in the patched driver. -- Hiroki pgppsNZhJL23T.pgp Description: PGP signature
Re: Apparent fxp regression in FreeBSD 8.4-RC3
YongHyeon PYUN pyu...@gmail.com wrote in 20130524054720.ga1...@michelle.cdnetworks.com: py On Thu, May 23, 2013 at 09:49:19PM -0700, Jeremy Chadwick wrote: py On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote: py On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote: pyOn Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote: py If someone wants me to test DHCP via fxp(4) on the above system (I can py do so with both NICs), just let me know; it should only take me half an py hour or so. py py I'll politely wait for someone to say please do so else won't bother. py py pyFor the sake of completeness... py pyPlease do so. :) py py Issue reproduced 100% reliably, even within sysinstall. py py {snip} py py Forgot to add: py py This issue ONLY happens when using DHCP. py py Statically assigning the IP address works fine; fxp0 goes down once, py up once, then stays up indefinitely. py py I asked Mike to try backing out dhclient(8) change(r247336) but it py seems he missed that. Jeremy, could you try that? py py I guess dhclient(8) does not like flow-control negotiation of py fxp(4) after link establishment. Okay, I could reproduce this issue on my box. After invocation of dhclient(8), a link is up and then state_reboot() drops the link establishment. Removing the changes around RTM_IFINFO in r247336 makes it work with no problem. A workaround is specifying the following line in rc.conf: ifconfig_fxp0=DHCP media 100baseTX mediaopt full-duplex -- Hiroki pgplYTY7pdVsc.pgp Description: PGP signature
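
For reference, the workaround line from this message as it would probably be written in rc.conf, where the value needs quoting because it contains spaces. The interface name and media options are the ones given above; whether a fixed 100baseTX full-duplex setting suits a particular link is something to check against the switch port.

```shell
# Fix the media type so that dhclient(8) does not see a link flap
# caused by negotiation.
ifconfig_fxp0="DHCP media 100baseTX mediaopt full-duplex"
```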
Re: Apparent fxp regression in FreeBSD 8.4-RC3
Hiroki Sato h...@freebsd.org wrote in 20130524.162926.395058052118975996@allbsd.org: hr YongHyeon PYUN pyu...@gmail.com wrote hr in 20130524054720.ga1...@michelle.cdnetworks.com: hr hr A workaround is specifying the following line in rc.conf: hr hr ifconfig_fxp0=DHCP media 100baseTX mediaopt full-duplex Hmm, I guess this can happen on other NICs when the link negotiation causes a link-state flap. Is it true? -- Hiroki pgpsjeYWEzvsx.pgp Description: PGP signature
Re: Apparent fxp regression in FreeBSD 8.4-RC3
Jeremy Chadwick j...@koitsu.org wrote in 20130524044035.ga40...@icarus.home.lan: jd On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote: jd On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote: jd If someone wants me to test DHCP via fxp(4) on the above system (I can jd do so with both NICs), just let me know; it should only take me half an jd hour or so. jd jd I'll politely wait for someone to say please do so else won't bother. jd jd jd For the sake of completeness... jd jd Please do so. :) jd jd Issue reproduced 100% reliably, even within sysinstall. jd jd ISO image used: jd jd ftp://ftp4.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/8.4/FreeBSD-8.4-RC3-i386-disc1.iso jd jd I just chose to Configure the system, selected Networking, chose NO to jd the IPv6 configuration choice, and YES to the DHCP configuration choice, jd then hit Alt-F2 to watch relevant output. jd jd This was the result: jd jd http://imgbin.org/index.php?page=imageid=13718 jd jd ...with the fxp0 physif up/down messages continuing indefinitely. jd jd fxp0 on the system is the Intel 82559. Shot of console's dmesg: jd jd http://imgbin.org/index.php?page=imageid=13720 Hmm, I tried RC3 on one of my test machines which has fxp0: FreeBSD 8.4-RC3 #0 r250307: Tue May 7 04:40:16 UTC 2013 r...@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC i386 ... 
fxp0: Intel 82559 Pro/100 Ethernet port 0x2800-0x283f mem 0xc4ffe000-0xc4ffefff,0xc4e0-0xc4ef irq 10 at device 3.0 on pci0 miibus0: MII bus on fxp0 inphy0: i82555 10/100 media interface PHY 1 on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow fxp0: Ethernet address: 00:02:a5:eb:14:93 fxp0: [ITHREAD] fxp0@pci0:0:3:0:class=0x02 card=0xb1340e11 chip=0x12298086 rev=0x08 hdr=0x00 vendor = 'Intel Corporation' device = '82550/1/7/8/9 EtherExpress PRO/100(B) Ethernet Adapter' class = network subclass = ethernet dev.inphy.0.%desc: i82555 10/100 media interface dev.inphy.0.%driver: inphy dev.inphy.0.%location: phyno=1 dev.inphy.0.%pnpinfo: oui=0xaa00 model=0x15 rev=0x4 dev.inphy.0.%parent: miibus0 It worked well for a PXE boot at least. I will give dhclient a try later. -- Hiroki pgpedi2anyIFG.pgp Description: PGP signature
Re: NFS-exported ZFS instability
Hiroki Sato h...@freebsd.org wrote in 20130104.023244.472910818423317661@allbsd.org: hr Konstantin Belousov kostik...@gmail.com wrote hr in 20130102174044.gb82...@kib.kiev.ua: hr hr ko I might take a closer look this evening and see if I can spot anything hr ko in the log, rick hr ko ps: I hope Alan and Kostik don't mind being added to the cc list. hr ko hr ko What I see in the log is that the lock cascade rooted in the thread hr ko 100838, which owns system map mutex. I believe this prevents malloc(9) hr ko from making a progress in other threads, which e.g. own the ZFS vnode hr ko locks. As the result, the whole system wedged. hr ko hr ko Looking back at the thread 100838, we can see that it executes hr ko smp_tlb_shootdown(). It is impossible to tell from the static dump, hr ko is the appearance of the smp_tlb_shootdown() in the backtrace is hr ko transient, or the thread is spinning there, waiting for other CPUs to hr ko acknowledge the request. But, since the system wedged, most likely, hr ko smp_tlb_shootdown spins. hr ko hr ko Taking this hypothesis, the situation can occur, most likely, due to hr ko some other core running with the interrupts disabled. Inspection of the hr ko backtraces of the processes running on all cores does not show any which hr ko could legitimately own a spinlock or otherwise run with the interrupts hr ko disabled. hr ko hr ko One thing you could try to do is to enable WITNESS for the spinlocks, hr ko to try to catch the leaked spinlock. I very much doubt that this is hr ko the case. hr ko hr ko Another thing to try is to switch the CPU idle method to something hr ko else. Look at the machdep.idle* sysctls. It could be some CPU errata hr ko which blocks wakeup due the interrupt in some conditions in C1 ? hr hr Thank you. It can take 1-2 weeks to reproduce this, so I set hr debug.witness.skipspin=0 and keeping machdep.idle acpi abd will see hr how it goes for a while. I will report again if I can get another hr freeze. 
Hmm, I could reproduce the same freeze when debug.witness.skipspin=0, too. DDB and crash dump outputs are the following: http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt The value of machdep.idle was acpi. I have seen this symptom on two boxes with the following CPUs, so I am guessing it is not specific to a CPU model: CPU: Intel(R) Pentium(R) D CPU 3.40GHz (3391.52-MHz K8-class CPU) CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz (2666.82-MHz K8-class CPU) -- Hiroki pgpD2om1nCoqH.pgp Description: PGP signature
Re: sendmail vs ipv6 broken after upgrade to 9.1
Ulrich Spörlein u...@freebsd.org wrote in 20130109142111.gl35...@acme.spoerlein.net: uq On Wed, 2013-01-09 at 14:14:18 +0100, Michiel Boland wrote: uq On 01/08/2013 23:33, Hiroki Sato wrote: uq Ulrich Spörlein u...@freebsd.org wrote uq in 20130108184051.gi35...@acme.spoerlein.net: uq uq uq After setting this, it now looks like this: uq uq root@acme: ~# ip6addrctl uq uq Prefix Prec Label Use uq uq ::1/128 50 00 uq uq ::/0 40 10 uq uq 2002::/16 30 20 uq uq ::/96 20 30 uq uq :::0.0.0.0/96 10 40 uq uq uq uq And even sendmail is happily finding the sockets to bind to. Thanks for the hint! uq uq I think this just hides the problem. If gshapiro@'s explanation is uq correct, no :::0.0.0.0/96 address should be returned if the name uq resolution works fine... uq uq -- Hiroki uq uq uq getipnodebyname(xx, AF_INET6, AI_DEFAULT|AI_ALL) does this:- uq uq If a host has both IPv6 and IPv4 addresses, both are returned. uq The IPv4 address is presented as a mapped address. uq The order in which the addresses are returns depends on the uq address selection policy (_hpreorder in lib/libc/net/name6.c) uq uq Is this also supposed to work for selecting the source IP address for uq outgoing packets/sockets? And should it work for ping6? Yes. uq Using a tunnel for IPv6, I have this transfer net configured on my uq router, but for ACL purposes I would like to have all connections come uq from my real prefix, not the transfer net. So I wrote my own policy, yet uq ping6 seems to ignore it. uq As you can see, source prefix stays 2a02:2528:ff00, though I'd like it uq to be 2a02:2528:ff0d. This is because the prefix on the interface has the first priority. Why don't you use an fe80::/10 address to route packets to the other endpoint of tun0? -- Hiroki pgpFTwL8cirug.pgp Description: PGP signature
Re: sendmail vs ipv6 broken after upgrade to 9.1
Ben Morrow b...@morrow.me.uk wrote
  in 20130109154435.ga81...@anubis.morrow.me.uk:

be> So getipnodebyname is behaving correctly here: the host has both IPv4
be> and IPv6 addresses, and Sendmail is requesting both native and v4-mapped
be> addresses be returned in all cases. The v4-mapped addresses are then
be> sorted to the top of the list.
be>
be> On FreeBSD, where net.inet6.ip6.v6only is on by default, I believe this
be> is incorrect, and Sendmail should be passing 0 for the flags argument,
be> unless it's going to check or clear the IPV6_V6ONLY socket option. There
be> is no point binding a socket to a v4-mapped address if the kernel isn't
be> going to deliver IPv4 connections to it. Sendmail should also be binding
be> to all the addresses returned, if it isn't already, rather than just the
be> first: this would make the problem go away, since both v4-mapped and
be> native IPv6 sockets would be bound, and the v4-mapped ones would simply
be> never get any connections.

I reread RFC 2553 and realized your explanation is correct. gshapiro's explanation described the behavior for (AF_INET6, AI_DEFAULT), not (AF_INET6, AI_DEFAULT|AI_ALL).

I think sendmail should work regardless of net.inet6.ip6.v6only. Is just dropping AI_ALL enough for that? When an AAAA RR is found, no v4-mapped address will be returned in that case. Is this correct?

be> Fixing this by setting ipv6_prefer is not necessarily a good idea; this
be> will cause IPv6 addresses to be preferred across the whole system, and
be> unless your IPv6 connectivity is at least as good as your IPv4, that
be> probably isn't what you want.

Yes, I agree that ipv6_prefer is not the correct way to solve this specific issue.

be> Just curious, but is there any specific reason not to return an error
be> when Family=inet6 and no AAAA RR?
be>
be> In this case, Sendmail explicitly requested that v4-mapped addresses be
be> returned in all cases...

-- Hiroki
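To make the ordering discussed above concrete, here is a minimal Python sketch of a policy-table lookup. It is only an illustration of longest-prefix matching against the ip6addrctl table quoted in this thread (the ipv6_prefer variant); it is not the libc _hpreorder code, and the function names and sample addresses are hypothetical.

```python
import ipaddress

# Policy table copied from the ip6addrctl output quoted in this thread
# (prefix, precedence, label). The Use counter is omitted.
POLICY = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("2002::/16", 30, 2),
    ("::/96", 20, 3),
    ("::ffff:0.0.0.0/96", 10, 4),
]
TABLE = [(ipaddress.IPv6Network(p), prec, label) for p, prec, label in POLICY]


def precedence(addr):
    """Return the precedence of the longest matching prefix (hypothetical helper)."""
    ip = ipaddress.IPv6Address(addr)
    # ::/0 matches everything, so the list is never empty.
    matches = [(net.prefixlen, prec) for net, prec, _ in TABLE if ip in net]
    return max(matches)[1]


def order_by_policy(addrs):
    """Sort candidate addresses, highest precedence first (stable for ties)."""
    return sorted(addrs, key=precedence, reverse=True)


if __name__ == "__main__":
    # Documentation-range sample addresses, not taken from the thread.
    candidates = ["::ffff:192.0.2.1", "2001:db8::1"]
    print(order_by_policy(candidates))
```

With this table the native address outranks the v4-mapped one, which matches the observation that sendmail found a usable socket once ipv6_prefer was set. A table that gives ::ffff:0.0.0.0/96 a higher precedence than ::/0 would reverse the order.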
Re: sendmail vs ipv6 broken after upgrade to 9.1
Gregory Shapiro gshap...@freebsd.org wrote
  in 20130108180920.gj36...@rugsucker.smi.sendmail.com:

gs> How can I unstupid sendmail here?
gs>
gs> I don't think sendmail is being stupid here as it is doing what it has
gs> been doing under 8.x and 9.1 (the code is the same). I think
gs> something changed with the upgrade to 9.1. As far as tracking it
gs> down, the sendmail code does:
gs>
gs>   getipnodebyname(acme.spoerlein.net, AF_INET6, AI_DEFAULT|AI_ALL,
gs>                   err);
gs>
gs> This will only return an IPv4 mapped address if:
gs>
gs> 1. There are no IPv6 addresses configured on the interfaces. How are
gs>    your IPv6 addresses assigned? If auto-configured (DHCPv6, RTADV), is
gs>    it possible sendmail is being started before autoconfiguration has
gs>    completed? Restarting the MTA after boot and seeing if it still gets
gs>    the mapped address will say whether or not this is the cause.
gs>
gs> 2. The query for an AAAA record for acme.spoerlein.net failed. This
gs>    doesn't appear to be the case for dns based on your dig output
gs>    (assuming you ran that dig command on the same machine that is
gs>    exhibiting the problem). However, your nsswitch.conf lists hosts
gs>    before dns and there have been broken name resolution implementations
gs>    that, with 'hosts' listed first in nsswitch.conf have given back bad
gs>    info if the first hostname match didn't have the IPv6 address. You
gs>    could try switching the order in /etc/hosts to see if this helps.
gs>    (Note, the broken implementation was not FreeBSD.)

Just curious, but is there any specific reason not to return an error when Family=inet6 and no AAAA RR?

-- Hiroki
Re: sendmail vs ipv6 broken after upgrade to 9.1
Ulrich Spörlein u...@freebsd.org wrote
  in 20130108184051.gi35...@acme.spoerlein.net:

uq> After setting this, it now looks like this:
uq> root@acme: ~# ip6addrctl
uq> Prefix                Prec Label Use
uq> ::1/128                 50     0   0
uq> ::/0                    40     1   0
uq> 2002::/16               30     2   0
uq> ::/96                   20     3   0
uq> ::ffff:0.0.0.0/96       10     4   0
uq>
uq> And even sendmail is happily finding the sockets to bind to. Thanks for the hint!

I think this just hides the problem. If gshapiro@'s explanation is correct, no ::ffff:0.0.0.0/96 address should be returned if the name resolution works fine...

-- Hiroki
Re: NFS-exported ZFS instability
Rick Macklem rmack...@uoguelph.ca wrote
  in 1914428061.1617223.1357133079421.javamail.r...@erie.cs.uoguelph.ca:

rm> Hiroki Sato wrote:
rm> Hello,
rm>
rm> I have been having trouble with my NFS server for a long time. The
rm> symptom is that it stops working in one or two weeks after a boot. I
rm> could not track down the cause yet, but it is reproducible and only
rm> occurred under a very high I/O load.
rm>
rm> It did not panic, just stopped working---while it responded to ping,
rm> userland programs seemed not working. I could break it into DDB and
rm> get a kernel dump. The following URLs are a log of ps, trace, and
rm> etc.:
rm>
rm>   http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
rm>   http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
rm>
rm> Does anyone see how to debug this? I guess this is due to a deadlock
rm> somewhere. I have suffered from this problem for almost two years.
rm> The above log is from stable/9 as of Dec 19, but this has persisted
rm> since 8.X.
rm>
rm> Well, I took a quick glance at the log and there are a lot of processes
rm> sleeping on pfault (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
rm> vm guy, so I'm not sure when/why that will happen. The comment on the
rm> function suggests they are waiting for free pages.
rm>
rm> Maybe something as simple as running out of swap space or a problem
rm> talking to the disk(s) that has the swap partition(s) or ???
rm> (I'm talking through my hat here, because I'm not conversant with
rm> the vm side of things.)
rm>
rm> I might take a closer look this evening and see if I can spot anything
rm> in the log, rick
rm> ps: I hope Alan and Kostik don't mind being added to the cc list.

Thank you. This machine has 24GB RAM + 30GB swap. 16GB of the RAM is used for the ZFS ARC, and I see about 1.5GB of free memory on average. However, swapouts happen frequently and regularly even when the I/O load is low. Only 20-30MB of swap was in use regardless of the load.

Several times I recorded vm.stats and the output of vmstat -z/-m every 10 seconds until the freeze, but vm.stats.vm.v_free_count was around 300,000 pages (about 1GB) even just before the freeze.

-- Hiroki
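As a quick sanity check on the figure above, the page count can be converted to bytes. This is a small sketch that assumes a 4096-byte page size (typical for amd64, but not stated in the message); the helper name is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is usual on amd64


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a free-page count (vm.stats.vm.v_free_count) to bytes."""
    return pages * page_size


if __name__ == "__main__":
    free_bytes = pages_to_bytes(300_000)
    print(f"{free_bytes} bytes = {free_bytes / 2**30:.2f} GiB")
```

Under that assumption 300,000 free pages is roughly 1.2 GB, which agrees with the "about 1GB" estimate and suggests the machine was not simply out of free pages at the moment of the freeze.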
Re: NFS-exported ZFS instability
Konstantin Belousov kostik...@gmail.com wrote
  in 20130102174044.gb82...@kib.kiev.ua:

ko> I might take a closer look this evening and see if I can spot anything
ko> in the log, rick
ko> ps: I hope Alan and Kostik don't mind being added to the cc list.
ko>
ko> What I see in the log is that the lock cascade rooted in the thread
ko> 100838, which owns system map mutex. I believe this prevents malloc(9)
ko> from making a progress in other threads, which e.g. own the ZFS vnode
ko> locks. As the result, the whole system wedged.
ko>
ko> Looking back at the thread 100838, we can see that it executes
ko> smp_tlb_shootdown(). It is impossible to tell from the static dump,
ko> is the appearance of the smp_tlb_shootdown() in the backtrace is
ko> transient, or the thread is spinning there, waiting for other CPUs to
ko> acknowledge the request. But, since the system wedged, most likely,
ko> smp_tlb_shootdown spins.
ko>
ko> Taking this hypothesis, the situation can occur, most likely, due to
ko> some other core running with the interrupts disabled. Inspection of the
ko> backtraces of the processes running on all cores does not show any which
ko> could legitimately own a spinlock or otherwise run with the interrupts
ko> disabled.
ko>
ko> One thing you could try to do is to enable WITNESS for the spinlocks,
ko> to try to catch the leaked spinlock. I very much doubt that this is
ko> the case.
ko>
ko> Another thing to try is to switch the CPU idle method to something
ko> else. Look at the machdep.idle* sysctls. It could be some CPU errata
ko> which blocks wakeup due the interrupt in some conditions in C1 ?

Thank you. It can take 1-2 weeks to reproduce this, so I have set debug.witness.skipspin=0, kept machdep.idle at acpi, and will see how it goes for a while. I will report again if I can get another freeze.

-- Hiroki
NFS-exported ZFS instability
Hello,

I have been having trouble with my NFS server for a long time. The symptom is that it stops working one or two weeks after a boot. I have not been able to track down the cause yet, but it is reproducible and has only occurred under a very high I/O load.

It did not panic, just stopped working---while it responded to ping, userland programs did not seem to be running. I could break into DDB and get a kernel dump. The following URLs are logs of ps, trace, and so on:

  http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
  http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102

Does anyone see how to debug this? I guess this is due to a deadlock somewhere. I have suffered from this problem for almost two years. The above log is from stable/9 as of Dec 19, but this has persisted since 8.X.

-- Hiroki
Re: FreeBSD daily snapshot build in allbsd.org temporarily down
Hiroki Sato h...@freebsd.org wrote
  in 20121207.101917.103513550140980591@allbsd.org:

hr> Hi all,
hr>
hr> I received many emails asking why
hr> https://pub.allbsd.org/FreeBSD-snapshots/ has stopped working and when
hr> it will recover, so I just wanted to let you know that FreeBSD daily
hr> snapshot build in allbsd.org is temporarily down. The reason why it
hr> is down is some local network issue and CVS-SVN migration of the
hr> build system. The latter was solved already. However, the former
hr> was unexpected and needed more time than I originally thought.

The service has almost recovered. Snapshots for i386, amd64, and pc98/i386 are being rebuilt now, and ia64, sparc64, and powerpc will also be added to the build queue soon.

For stable/9 and later, the Subversion repository is used and the build results are sorted by revision number for each day. For 8.X the build still uses CVS via the make release target but will be switched to Subversion shortly.

Note that some local network performance issues still remain. They seem to be due to traffic congestion around the border router, which I do not control. The transfer rate can drop below 100KB/s, especially between 12:00 and 18:00 JST.

I am planning to add a custom build feature to this service that uses the source trees under the projects/ or user/ branches.

-- Hiroki
FreeBSD daily snapshot build in allbsd.org temporarily down
Hi all,

I received many emails asking why https://pub.allbsd.org/FreeBSD-snapshots/ has stopped working and when it will recover, so I just wanted to let you know that the FreeBSD daily snapshot build at allbsd.org is temporarily down. The reasons are a local network issue and the CVS-SVN migration of the build system. The latter has already been solved. However, the former was unexpected and needed more time than I originally thought.

The snapshot build will start again this weekend or early next week. Glen is offering similar snapshot ISO images and distfiles for amd64 and i386 at https://snapshots.glenbarber.us/Latest/, so please visit his page if you need the latest snapshot right now.

-- Hiroki
Re: FreeBSD 10-CURRENT and 9-STABLE snapshots
Jakub Lach jakub_l...@mailplus.pl wrote
  in 1349873186577-5750838.p...@n5.nabble.com:

ja> Any questions and suggestions are welcome. Contact h...@freebsd.org.
ja>
ja> But good catch, if your reasoning is indeed correct.
ja>
ja> And for the record, they are NOT official snapshots.

Migrating from CVS to SVN in the build infrastructure is in progress and the daily snapshot build will recover in a couple of days, JFYI.

-- Hiroki
Re: Broadcom NetXtreme bcm5720 in the 9.1 beta
Sean Bruno sean...@yahoo-inc.com wrote
  in 1343243969.2727.2.ca...@powernoodle.corp.yahoo.com:

se> On Tue, 2012-07-24 at 18:46 -0700, Hiroki Sato wrote:
se> Peter Feger magick...@gmail.com wrote
se>   in CAD_3y4wAPp+8ZSveB6mbOF7M1Ne-zAvz4Uf=vv9quohuu23...@mail.gmail.com:
se>
se> ma> I just got done installing FreeBSD-9.0 on a Dell R720. I can tell you
se> ma> that none of the broadcom products will work. There is no driver that
se> ma> I have been able to find. I wound up having to replace them with
se> ma> Intel nics. I used the i350 quad-port 1G and the x520 for 10G Fiber.
se>
se> I recently bought a Dell R420 which had BCM 5720 as the LOM. The
se> output of pciconf was the following:
se>
se> bge0@pci0:2:0:0: class=0x02 card=0x04f81028 chip=0x165f14e4 rev=0x00 hdr=0x00
se>     vendor     = 'Broadcom Corporation'
se>     device     = 'NetXtreme BCM5720 Gigabit Ethernet PCIe'
se>     class      = network
se>     subclass   = ethernet
se>
se> On 9.1-PRERELEASE as of Jul 23, it was recognized but did not work
se> properly first (the link-status went back and forth between up and
se> down). However, after setting dev.bge.0.msi=0 it worked. I am not
se> sure of whether it had decent communication speed or not, but I saw
se> it worked with 50MB/s or so at least.
se>
se> IPMI over LAN did not work even if hw.bge.allow_asf was set to 1.
se>
se> -- Hiroki
se>
se> For the r420/320 ... grab Pyun's latest updates and give it a whirl.
se> They seem to work for us at yahoo:
se>
se>   http://people.freebsd.org/~yongari/bge/

Thanks! I am testing his patches...

-- Hiroki
Re: Broadcom NetXtreme bcm5720 in the 9.1 beta
Peter Feger magick...@gmail.com wrote
  in CAD_3y4wAPp+8ZSveB6mbOF7M1Ne-zAvz4Uf=vv9quohuu23...@mail.gmail.com:

ma> I just got done installing FreeBSD-9.0 on a Dell R720. I can tell you
ma> that none of the broadcom products will work. There is no driver that
ma> I have been able to find. I wound up having to replace them with
ma> Intel nics. I used the i350 quad-port 1G and the x520 for 10G Fiber.

I recently bought a Dell R420 which has a BCM 5720 as the LOM. The output of pciconf was the following:

bge0@pci0:2:0:0: class=0x02 card=0x04f81028 chip=0x165f14e4 rev=0x00 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme BCM5720 Gigabit Ethernet PCIe'
    class      = network
    subclass   = ethernet

On 9.1-PRERELEASE as of Jul 23, it was recognized but did not work properly at first (the link status went back and forth between up and down). However, after setting dev.bge.0.msi=0 it worked. I am not sure whether it reached a decent transfer speed, but I saw it run at 50MB/s or so at least.

IPMI over LAN did not work even when hw.bge.allow_asf was set to 1.

-- Hiroki
Re: cvsup{, d} woes after upgrading to RELENG_9 on amd64 this weekend
Dimitry Andric d...@freebsd.org wrote
  in 4fcc80c7.8060...@freebsd.org:

di> That said, since the ezm3 software is essentially unmaintained, the
di> only practical solutions to your problem currently are:
di>
di> - Compile libz without SSE
di> - Compile libz with gcc
di> - Use csup instead of cvsup
di> - Fix ezm3 to respect the amd64 ABI
di> - Rewrite cvsupd in C (this is left as an exercise for the reader ;)

I have the same problem on my mirror server and am currently using a cvsup package for i386 on FreeBSD/amd64.

-- Hiroki
Re: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak
Rick Macklem rmack...@uoguelph.ca wrote
  in 1527622626.3418715.1335445225510.javamail.r...@erie.cs.uoguelph.ca:

rm> Steven Hartland wrote:
rm> Original Message -
rm> From: Rick Macklem rmack...@uoguelph.ca
rm> At a glance, it looks to me like 8.x is affected. Note that the
rm> bug only affects the new NFS server (the experimental one for 8.x)
rm> when exporting ZFS volumes. (UFS exported volumes don't leak)
rm>
rm> If you are running a server that might be affected, just:
rm> # vmstat -z | fgrep -i namei
rm> on the server and see if the 3rd number shown is increasing.
rm>
rm> Many thanks Rick wasnt aware we had anything experimental enabled
rm> but I think that would be a yes looking at these number:-
rm>
rm> vmstat -z | fgrep -i namei
rm> NAMEI: 1024, 0, 1, 1483, 25285086096, 0
rm> vmstat -z | fgrep -i namei
rm> NAMEI: 1024, 0, 0, 1484, 25285945725, 0
rm>
rm> ^
rm> I don't think so, since the 3rd number (USED) is 0 here.
rm> If that # is increasing over time, you have the leak. You are
rm> probably running the old (default in 8.x) NFS server.

Just a report: I confirmed that it affects 8.x servers running newnfs. I have been suffering from memory starvation symptoms on that server (24GB RAM) for a long time and have been watching vmstat -z periodically. The server stopped working about once a week. I went through the vmstat log again and found that the number of leaked NAMEI items was 11,543,956 (about 11GB!) just before the lockup. After applying the patch, the leak disappeared. Thank you for fixing it!

-- Hiroki
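The check described above can be sketched as a small Python parser for one `vmstat -z` line. It assumes the six-number layout seen in the quoted output; the thread only confirms that the third number is USED, so the other field names here are my assumption, and the function names are hypothetical.

```python
def parse_vmstat_z_line(line):
    """Parse a line such as 'NAMEI: 1024, 0, 1, 1483, 25285086096, 0'.

    Assumed column order: size, limit, used, free, requests, failures.
    Only 'used' (the 3rd number) is confirmed by the thread.
    """
    name, _, rest = line.partition(":")
    size, limit, used, free, requests, failures = (int(f) for f in rest.split(","))
    return {
        "item": name.strip(),
        "size": size,
        "limit": limit,
        "used": used,
        "free": free,
        "requests": requests,
        "failures": failures,
    }


def bytes_in_use(entry):
    """Memory held by allocated items: item size times USED count."""
    return entry["size"] * entry["used"]


if __name__ == "__main__":
    entry = parse_vmstat_z_line("NAMEI: 1024, 0, 1, 1483, 25285086096, 0")
    print(entry["item"], entry["used"], bytes_in_use(entry))

    # The leak reported above: 11,543,956 items of 1024 bytes each.
    leaked = 1024 * 11_543_956
    print(f"{leaked / 2**30:.1f} GiB")
```

Sampling this periodically and watching whether `used` keeps growing is the same test Rick describes; a healthy server shows a small USED value that does not climb over time.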
Re: another panic in 8.3-PRERELEASE
Konstantin Belousov kostik...@gmail.com wrote in 20120224150259.gv55...@deviant.kiev.zoral.com.ua: ko #19 0x000800abecfc in ?? () ko Previous frame inner to this frame (corrupt stack?) ko (kgdb) ko Can you, please, print out the content of *td, e.g. from the frame 16 ? ko ko And *req from the frame 11, please. Here: (kgdb) f 16 #16 0x80675e3a in __sysctl (td=0xff0396ec5460, uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491 1491error = userland_sysctl(td, name, uap-namelen, (kgdb) print *td $2 = {td_lock = 0x80d7f540, td_proc = 0xff03969bf470, td_plist = { tqe_next = 0x0, tqe_prev = 0xff03969bf480}, td_runq = {tqe_next = 0x0, tqe_prev = 0x80d7f788}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xff0396ebe800}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xff86c57b48a0}, td_cpuset = 0xff0005789dc8, td_sel = 0xff01b5dd0500, td_sleepqueue = 0xff0396ebe800, td_turnstile = 0xff01334cf600, td_umtxq = 0xff0396ec3a80, td_tid = 100763, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xff0396ec5500}, sq_proc = 0xff03969bf470, sq_flags = 1}, td_flags = 65540, td_inhibitors = 0, td_pflags = 0, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 4 '\004', td_oncpu = 4 '\004', td_owepreempt = 0 '\0', td_tsqueue = 255 'ÿ', td_locks = 4, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x80ecebf0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xff007d537b00, td_estcpu = 0, td_slptick = 0, td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = { tv_sec = 0, tv_usec = 0}, ru_maxrss = 1864, ru_ixrss = 66288, ru_idrss = 1347856, ru_isrss = 176768, ru_minflt = 263901, ru_majflt = 10, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14937, ru_nivcsw = 3286}, td_incruntime = 0, td_runtime = 15204044088, td_pticks = 15, td_sticks = 15, 
td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_generation = 18223, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = top, '\0' repeats 16 times, td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = { si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = { _fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', td_pri_class = 3 '\003', td_user_pri = 129 '\201', td_base_user_pri = 129 '\201', td_pcb = 0xff86c6389d10, td_state = TDS_RUNNING, td_retval = {0, 34375032832}, td_slpcallout = { c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xff800042ccd0}}, c_time = 51568077, c_arg = 0xff0396ec5460, c_func = 0x806a84c0 sleepq_timeout, c_lock = 0x0, c_flags = 18, c_cpu = 4}, td_frame = 0xff86c6389c50, td_kstack_obj = 0xff03410b20d8, td_kstack = 18446743553049124864, td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, td_unused3 = 0, td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70}, td_sched = 0xff0396ec5890, td_ar = 0x0, td_syscalls = 469926, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = { rux_runtime = 15204044088, rux_uticks = 226, rux_sticks = 1140, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, td_map_def_user = 0x0, td_dbg_forked = 0} (kgdb) f 11 #11 0x8065f6a6 in sysctl_out_proc_copyout 
(ki=0xff86c6389470, req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085 1085error = SYSCTL_OUT(req, ki, sizeof(struct kinfo_proc)); (kgdb) print *req $3 = {td = 0xff0396ec5460, lock = 2, oldptr = 0x800e96000, oldlen = 68217, oldidx = 1088, oldfunc = 0x80675e80 sysctl_old_user, newptr = 0x0, newlen = 0, newidx = 0, newfunc = 0x80675d10 sysctl_new_user, validlen = 68217, flags = 0} (kgdb) quit -- Hiroki pgpXBb7kwRDuX.pgp Description: PGP signature
Re: panic in 8.3-PRERELEASE
Rick Macklem rmack...@uoguelph.ca wrote
  in 476361430.1773817.1329954835308.javamail.r...@erie.cs.uoguelph.ca:

rm> John Baldwin wrote:
rm> On Wednesday, February 22, 2012 2:24:14 pm Konstantin Belousov wrote:
rm> On Wed, Feb 22, 2012 at 11:29:40AM -0500, Rick Macklem wrote:
rm> Hiroki Sato wrote:
rm> Hi,
rm>
rm> Just a report, but I got the following panic on an NFS server running
rm> 8.3-PRERELEASE:
rm>
rm> (from here)
rm> pool.allbsd.org dumped core - see /var/crash/vmcore.0
rm>
rm> Tue Feb 21 10:59:44 JST 2012
rm>
rm> FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #7: Thu
rm> Feb 16 19:29:19 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL
rm> amd64
rm>
rm> panic: Assertion lock == sq->sq_lock failed at
rm> /usr/src/sys/kern/subr_sleepqueue.c:335
rm>
rm> Oops, I didn't know that mixing msleep() and tsleep() calls on the same
rm> event wasn't allowed.
rm> There are two places in the code where it did a:
rm>   mtx_unlock();
rm>   tsleep();
rm> left over from the days when it was written for OpenBSD.
rm>
rm> This sequence allows the wakeup to be lost if it happens right after the
rm> cache unlock (together with clearing the RC_WANTED flag) but before
rm> the thread enters the sleep state. The tsleep has a timeout, so the
rm> thread should recover in 10 seconds, but still.
rm>
rm> Anyway, you should use consistent outer lock for the same wchan, i.e.
rm> no lock (tsleep) or mtx (msleep), but not mix them.
rm>
rm> Correct.
rm>
rm> I don't think the mix would actually break anything, except that the
rm> MPASS() assertion fails, but I've cc'd jhb@ since he seems to have been
rm> the author of the sleep() stuff.
rm>
rm> Anyhow, please try the attached patch which replaces the mtx_unlock();
rm> tsleep(); with msleep()s using PDROP. If the attachment gets lost, the
rm> patch is also here:
rm>   http://people.freebsd.org/~rmacklem/tsleep.patch
rm>
rm> Thanks for reporting this, rick
rm> ps: Is mtx_lock() now preferred over msleep()?
rm>
rm> What do you mean ?
rm>
rm> mtx_sleep() is preferred over msleep(), but I doubt I will remove msleep()
rm> anytime soon.
rm>
rm> Ok, I'll redo the patch with mtx_sleep() and get one of you guys to
rm> review it.

Thank you for the patch! I applied it and put the box under stress testing again.

-- Hiroki
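The lost-wakeup window discussed above (unlock, then sleep as two separate steps) can be illustrated with a userland analogy. The following Python sketch is not the kernel API: threading.Condition stands in for a mutex plus sleep channel, and the class and method names are hypothetical. The point it shows is that the wait releases the lock and blocks as a single step, and re-checks the flag after waking, so a wakeup cannot slip in between.

```python
import threading


class BusyFlag:
    """Toy stand-in for a 'locked/wanted' flag protected by a mutex."""

    def __init__(self):
        self.cond = threading.Condition()  # carries its own lock
        self.busy = True                   # resource starts out busy

    def wait_until_free(self, timeout=5.0):
        with self.cond:
            # wait_for() drops the lock and sleeps atomically, and
            # re-evaluates the predicate after every wakeup.
            return self.cond.wait_for(lambda: not self.busy, timeout)

    def release(self):
        with self.cond:
            # Flag change and wakeup happen under the same lock the
            # sleeper uses, so the sleeper cannot miss them.
            self.busy = False
            self.cond.notify_all()


def demo():
    flag = BusyFlag()
    result = []
    waiter = threading.Thread(target=lambda: result.append(flag.wait_until_free()))
    waiter.start()
    flag.release()
    waiter.join()
    return result[0]


if __name__ == "__main__":
    print(demo())
```

Whichever thread runs first, the waiter either sees the flag already cleared or is woken by the notification. Using one consistent lock for both the sleeper and the waker is the same requirement stated above for a given wchan.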
another panic in 8.3-PRERELEASE
Hi, This is another reproducible panic. This seems to happen only when top(1) is running for a long time (a sysctl() call for CTL_KERN.KERN_PROC.KERN_PROC_PROC MIB triggered it). pool.allbsd.org dumped core - see /var/crash/vmcore.0 Thu Feb 23 23:21:52 JST 2012 FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #8: Thu Feb 23 04:40:54 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL amd64 panic: GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as amd64-marcel-freebsd... Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 4; apic id = 04 fault virtual address = 0x800e96000 fault code = supervisor write data, protection violation instruction pointer = 0x20:0x809440cb stack pointer = 0x28:0xff86c63890b0 frame pointer = 0x28:0xff86c6389100 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 47211 (top) lock order reversal: (Giant after non-sleepable) 1st 0xff0244b85568 process lock (process lock) @ /usr/src/sys/kern/kern_proc.c:1211 2nd 0x80d74c80 Giant (Giant) @ /usr/src/sys/dev/usb/input/ukbd.c:2018 KDB: stack backtrace: Dumping 23903 out of 24550 MB:..1%..11%..21%..31% (CTRL-C to abort) (CTRL-C to abort) ..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done. done. Loaded symbols for /boot/kernel/geom_mirror.ko Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. done. 
Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. done. Loaded symbols for /boot/kernel/opensolaris.ko Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from /boot/kernel/ipfw.ko.symbols...done. done. Loaded symbols for /boot/kernel/ipfw.ko #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:263 263 if (textdump_pending) (kgdb) #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:263 #1 0x801f8cfc in db_fncall (dummy1=Variable dummy1 is not available. ) at /usr/src/sys/ddb/db_command.c:548 #2 0x801f9031 in db_command (last_cmdp=0x80d37f40, cmd_table=Variable cmd_table is not available. ) at /usr/src/sys/ddb/db_command.c:445 #3 0x801f9280 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498 #4 0x801fb369 in db_trap (type=Variable type is not available. ) at /usr/src/sys/ddb/db_main.c:229 #5 0x8069dff1 in kdb_trap (type=12, code=0, tf=0xff86c6389000) at /usr/src/sys/kern/subr_kdb.c:548 #6 0x809461ed in trap_fatal (frame=0xff86c6389000, eva=Variable eva is not available. ) at /usr/src/sys/amd64/amd64/trap.c:820 #7 0x809468b5 in trap (frame=0xff86c6389000) at /usr/src/sys/amd64/amd64/trap.c:326 #8 0x8092d2f4 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228 #9 0x809440cb in copyout () at /usr/src/sys/amd64/amd64/support.S:258 #10 0x80675f1f in sysctl_old_user (req=0xff86c63899c0, p=0xff86c6389470, l=1088) at /usr/src/sys/kern/kern_sysctl.c:1276 #11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470, req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085 #12 0x8065ff6c in sysctl_out_proc (p=0xff0244b85470, req=0xff86c63899c0, flags=Variable flags is not available. ) at /usr/src/sys/kern/kern_proc.c:1114 #13 0x8066245e in sysctl_kern_proc (oidp=Variable oidp is not available. ) at /usr/src/sys/kern/kern_proc.c:1302 #14 0x806756e8 in sysctl_root (oidp=Variable oidp is not available. 
) at /usr/src/sys/kern/kern_sysctl.c:1455 #15 0x8067598e in userland_sysctl (td=0x0, name=0xff86c6389a80, namelen=3, old=0x800e96000, oldlenp=Variable oldlenp is not available. ) at /usr/src/sys/kern/kern_sysctl.c:1565 #16 0x80675e3a in __sysctl (td=0xff0396ec5460, uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491 #17 0x80945809 in amd64_syscall (td=0xff0396ec5460, traced=0) at subr_syscall.c:114 #18 0x8092d5ec in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 #19 0x000800abecfc in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) db show alllocks Process 1169 (sshd) thread 0xff0022cfa460 (100715) exclusive sx so_rcv_sx
panic in 8.3-PRERELEASE
Hi,

 Just a report, but I got the following panic on an NFS server running
 8.3-PRERELEASE:

(from here)
pool.allbsd.org dumped core - see /var/crash/vmcore.0

Tue Feb 21 10:59:44 JST 2012

FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #7: Thu Feb 16 19:29:19 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL amd64

panic: Assertion lock == sq->sq_lock failed at /usr/src/sys/kern/subr_sleepqueue.c:335

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
263             if (textdump_pending)
(kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
#1  0x801f8cfc in db_fncall (dummy1=Variable "dummy1" is not available.) at /usr/src/sys/ddb/db_command.c:548
#2  0x801f9031 in db_command (last_cmdp=0x80d37f40, cmd_table=Variable "cmd_table" is not available.) at /usr/src/sys/ddb/db_command.c:445
#3  0x801f9280 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0x801fb369 in db_trap (type=Variable "type" is not available.) at /usr/src/sys/ddb/db_main.c:229
#5  0x8069e021 in kdb_trap (type=3, code=0, tf=0xff86c5f7e640) at /usr/src/sys/kern/subr_kdb.c:548
#6  0x80946766 in trap (frame=0xff86c5f7e640) at /usr/src/sys/amd64/amd64/trap.c:595
#7  0x8092d324 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#8  0x8069de7b in kdb_enter (why=0x80a891dd "panic", msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
#9  0x8066afc0 in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:597
#10 0x806a9360 in sleepq_add (wchan=0xff0073b97a00, lock=0x80d6af00, wmesg=0x80a7bb28 "nfsrc", flags=0, queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:335
#11 0x80673e4f in _sleep (ident=0xff0073b97a00, lock=0x80d6af00, priority=Variable "priority" is not available.) at /usr/src/sys/kern/kern_synch.c:218
#12 0x805fe01e in nfsrvd_updatecache (nd=0xff86c5f7e960, so=0xff002217c000) at /usr/src/sys/fs/nfsserver/nfs_nfsdcache.c:697
#13 0x805ea934 in nfssvc_program (rqst=0xff0476070800, xprt=0xff000edd0a00) at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:333
#14 0x8084c76b in svc_run_internal (pool=0xff000c876600, ismaster=0) at /usr/src/sys/rpc/svc.c:895
#15 0x8084cc8b in svc_thread_start (arg=Variable "arg" is not available.) at /usr/src/sys/rpc/svc.c:1200
#16 0x80640865 in fork_exit (callout=0x8084cc80 <svc_thread_start>, arg=0xff000c876600, frame=0xff86c5f7ec50) at /usr/src/sys/kern/kern_fork.c:876
#17 0x8092d86e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602
#18 0x0080 in ?? ()
#19 0x7fffe700 in ?? ()
#20 0x002e in ?? ()
#21 0x in ?? ()
#22 0xfef4 in ?? ()
#23 0xff000e1028c0 in ?? ()
#24 0x009b in ?? ()
#25 0x7fffe700 in ?? ()
#26 0x0006 in ?? ()
#27 0x0003 in ?? ()
#28 0x in ?? ()
#29 0x7fffe720 in ?? ()
#30 0x in ?? ()
#31 0x in ?? ()
#32 0x0001 in ?? ()
#33 0x001b0013000c in ?? ()
#34 0x7fe8 in ?? ()
#35 0x003b003b0001 in ?? ()
#36 0x0002 in ?? ()
#37 0x0008006a1dac in ?? ()
#38 0x0043 in ?? ()
#39 0x0202 in ?? ()
#40 0x7fffe6c8 in ?? ()
#41 0x003b in ?? ()
#42 0xff0022262470 in ?? ()
#43 0x in ?? ()
#44 0x80d80e40 in tdq_cpu ()
#45 0xff00057958c0 in ?? ()
#46 0xff86c5f7e930 in ?? ()
#47 0xff86c5f7e8d8 in ?? ()
#48 0xff002218c8c0 in ?? ()
#49 0x80691397 in sched_switch (td=0xff000c876600, newtd=0x8084cc80, flags=Variable "flags" is not available.) at

Re: New BSD Installer
Andriy Gapon <a...@freebsd.org> wrote
  in <4f3e3000.9000...@freebsd.org>:

av> on 17/02/2012 09:04 Hiroki Sato said the following:
av> > No, the issue is our gptloader assumes the backup header is always located
av> > at the (physical) last sector while this is not mandatory in the UEFI
av> > specification.
av>
av> Are you sure?

 Yes, sure.  In the gm0-md0+md1 case, the last LBA of the device is
 changed (grown in size), but they can still have a valid backup header
 at the last LBA - 1 before an attempt to grow the size of the volume,
 as the last paragraph of your excerpts says.  If we *choose* to grow
 the device size permanently, the backup header must be relocated to
 the new last LBA.  However, before the relocation happens, the
 specification says both the primary and secondary headers must be
 valid for the previous device size.  This is my understanding.

 This means software should assume the device size can grow and should
 not assume the backup header is always located at the last possible
 LBA on the device.  If AlternateLBA does not match the device size - 1,
 the software should first locate the backup header based on the
 information in the primary header.  gptboot does not do so currently.

 I have not actually tried it, but the attached patch shows what I
 mean.

-- Hiroki

Index: sys/boot/common/gpt.c
===================================================================
--- sys/boot/common/gpt.c	(revision 230616)
+++ sys/boot/common/gpt.c	(working copy)
@@ -333,24 +333,26 @@
 	    gptread_table("primary", uuid, dskp, &hdr_primary,
 	    table_primary) == 0) {
 		hdr_primary_lba = hdr_primary.hdr_lba_self;
+		/* Use AlternateLBA if valid.  If not, use LastUsableLBA+34. */
+		if (hdr_primary_lba < hdr_primary.hdr_lba_alt)
+			altlba = hdr_primary.hdr_lba_alt;
+		else if (hdr_primary.hdr_lba_end != 0)
+			altlba = hdr_primary.hdr_lba_end + 34;
 		gpthdr = &hdr_primary;
 		gpttable = table_primary;
 	}
 
-	altlba = drvsize(dskp);
-	if (altlba > 0)
-		altlba--;
-	else if (hdr_primary_lba > 0) {
-		/*
-		 * If we cannot obtain disk size, but primary header
-		 * is valid, we can get backup header location from
-		 * there.
-		 */
-		altlba = hdr_primary.hdr_lba_alt;
+	/*
+	 * Try to locate the backup header from the media size if no primary
+	 * header found.
+	 */
+	if (hdr_primary_lba == 0) {
+		altlba = drvsize(dskp);
+		if (altlba > 0)
+			altlba--;
 	}
-	if (altlba == 0)
-		printf("%s: unable to locate backup GPT header\n", BOOTPROG);
-	else if (gptread_hdr("backup", dskp, &hdr_backup, altlba) == 0 &&
+	if (altlba != 0 &&
+	    gptread_hdr("backup", dskp, &hdr_backup, altlba) == 0 &&
 	    gptread_table("backup", uuid, dskp, &hdr_backup,
 	    table_backup) == 0) {
 		hdr_backup_lba = hdr_backup.hdr_lba_self;
@@ -359,7 +361,8 @@
 			gpttable = table_backup;
 			printf("%s: using backup GPT\n", BOOTPROG);
 		}
-	}
+	} else
+		printf("%s: unable to locate backup GPT header\n", BOOTPROG);
 
 	/*
 	 * Convert all BOOTONCE without BOOTME flags into BOOTFAILED.

Re: New BSD Installer
Jeremy Chadwick <free...@jdc.parodius.com> wrote
  in <20120217030806.ga62...@icarus.home.lan>:

fr> On Thu, Feb 16, 2012 at 07:40:35PM -0700, Warren Block wrote:
fr> > Sorry, I may be misunderstanding your point.  GEOM classes don't
fr> > lie, they accurately represent the space.  The space provided by a
fr> > gmirror is one block less than the actual space occupied, to allow
fr> > for the metadata block at the end.  The problem is that GPT puts
fr> > backup partition tables at the end of the physical (not logical)
fr> > device.  Create a GEOM device on that drive, and the GEOM metadata
fr> > overwrites the backup GPT partition table.  Well, the last block of
fr> > it, anyway.
fr> >
fr> > But create the GEOM device inside a GPT partition that spans the
fr> > drive, and things are fine.  The GPT backup tables are safely
fr> > outside the GEOM metadata, which is safely outside of the data.
fr>
fr> I wasn't aware you could do that.  I was only aware that it was the
fr> other way around.  That (my) misconception seems to also be relayed
fr> by others such as Miroslav who said:
fr>
fr> GPT doesn't play nice with GEOM classes which store their metadata
fr> on last sector.  For example, you can't use gmirror of a whole drives
fr> and use GPT on top of this mirror. (and gmirror is not the only one)
fr>
fr> So if I read this correctly, it means that the erroneous behaviour is
fr> the result of someone doing things in the wrong order (for lack of
fr> better terminology).

 Well, does GPT really depend on the absolute last block?  The header
 has fields for both the first and the last LBAs, and they do not have
 to match the physical capacity.  Does creating a gmirror first, and
 then creating a GPT on it, not work?  I do not think that is true, and
 I suspect the description of gmirror in the handbook, which recommends
 kern.geom.debugflags=17, is the source of the problem.

 The partition layout in my mind is the following:

 (0)                                                         (last)
 |PMBR|GPT primary|                    |GPT secondary|gmirror meta|
 |----------------------------------------------------------------| ada0
 |---------------------------------------------------| mirror/gm0
                  |--------------------| mirror/gm0p{1,2,...}

 and the following commands will create an example of this
 configuration:

 # mdconfig -a -t vnode -s100m
 md0
 # mdconfig -a -t vnode -s100m
 md1
 # gmirror label gm0 /dev/md0 /dev/md1
 # gmirror dump /dev/md0 | grep size
 mediasize: 104857088
 sectorsize: 512
 provsize: 104857600
 # gpart create -s gpt mirror/gm0
 # gpart add -t freebsd-ufs mirror/gm0
 mirror/gm0p1 added

 =>    34  204732  mirror/gm0  GPT  (100M)
       34  204732           1  freebsd-ufs  (100M)

 # echo "(34 + 204732) * 512" | bc
 104840192

 The size of the GPT header + partition entries is 33 sectors.  So,

 # echo "(34 + 204732) * 512 + 33 * 512" | bc
 104857088

 is the size which the GPT recognizes.  This matches the size of
 mirror/gm0, not /dev/md0.  This means the gmirror metadata is located
 just after it.  I think this should work in most cases for mirroring
 the whole disk.

 Certainly, gpart reports [CORRUPT] if the underlying device capacity
 does not match the GPT header.  For example, deactivating mirror/gm0
 above will show the following:

 # gpart show
 =>    34  204732  mirror/gm0  GPT  (100M)
       34  204732           1  freebsd-ufs  (100M)

 # gmirror stop gm0
 # gpart show
 =>    34  204732  md1  GPT  (100M) [CORRUPT]
       34  204732    1  freebsd-ufs  (100M)

 =>    34  204732  md0  GPT  (100M) [CORRUPT]
       34  204732    1  freebsd-ufs  (100M)

 # gpart recover md0
 md0 recovered
 # gpart show
 =>    34  204732  md1  GPT  (100M) [CORRUPT]
       34  204732    1  freebsd-ufs  (100M)

 =>    34  204733  md0  GPT  (100M)
       34  204732    1  freebsd-ufs  (100M)
   204766       1       - free -  (512B)

 We can see that gpart recover extends the size to the last sector,
 where the gmirror metadata was placed, and clears the [CORRUPT] status
 as expected.  So, early boot stages which do not recognize mirror/gm0
 see a corrupted GPT.  However, I think they will simply follow the
 information in the GPT header.

-- Hiroki

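The size arithmetic in this message can be rechecked with a few lines of C. This is only a sketch: the function and macro names are made up for illustration and do not come from any FreeBSD source, and the numbers are the ones printed by gmirror dump and gpart show for the md0/md1 example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative constants, taken from the md0/md1 example in this message. */
#define SECTOR_SIZE      512u /* sectorsize reported by gmirror dump */
#define GPT_TAIL_SECTORS 33u  /* 32 sectors of backup entries + 1 backup header */

/*
 * Bytes covered by a GPT whose single partition starts at first_lba and
 * is nsectors long, including the backup table and header that follow it.
 */
uint64_t
gpt_extent_bytes(uint64_t first_lba, uint64_t nsectors, uint32_t sector_size)
{
	assert(sector_size > 0);
	return ((first_lba + nsectors + GPT_TAIL_SECTORS) * sector_size);
}

/* Hypothetical helper: print the comparison made in the message. */
void
print_example(void)
{
	uint64_t gpt_bytes = gpt_extent_bytes(34, 204732, SECTOR_SIZE);
	uint64_t provsize = 104857600;  /* raw /dev/md0 */
	uint64_t mediasize = 104857088; /* mirror/gm0 */

	printf("GPT extent: %ju bytes\n", (uintmax_t)gpt_bytes);
	printf("matches mirror/gm0: %s\n",
	    gpt_bytes == mediasize ? "yes" : "no");
	printf("sectors left for gmirror metadata: %ju\n",
	    (uintmax_t)((provsize - gpt_bytes) / SECTOR_SIZE));
}
```

Calling print_example() from a main() reproduces the comparison: the GPT extent equals the mediasize of mirror/gm0, and exactly one sector of /dev/md0 remains after it, which is where gmirror keeps its metadata.
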
Re: New BSD Installer
Freddie Cash <fjwc...@gmail.com> wrote
  in <caojfwz5ehgfr_vp0+trfxvgm6kzxv9qo3ufvdkura96z3ax...@mail.gmail.com>:

fj> On Thu, Feb 16, 2012 at 8:20 PM, Hiroki Sato <h...@freebsd.org> wrote:
fj> > Jeremy Chadwick <free...@jdc.parodius.com> wrote
fj> >   in <20120217030806.ga62...@icarus.home.lan>:
fj> >
fj> > fr> On Thu, Feb 16, 2012 at 07:40:35PM -0700, Warren Block wrote:
fj> > fr> > Sorry, I may be misunderstanding your point.  GEOM classes don't
fj> > fr> > lie, they accurately represent the space.  The space provided by a
fj> > fr> > gmirror is one block less than the actual space occupied, to allow
fj> > fr> > for the metadata block at the end.  The problem is that GPT puts
fj> > fr> > backup partition tables at the end of the physical (not logical)
fj> > fr> > device.  Create a GEOM device on that drive, and the GEOM metadata
fj> > fr> > overwrites the backup GPT partition table.  Well, the last block of
fj> > fr> > it, anyway.
fj> > fr> >
fj> > fr> > But create the GEOM device inside a GPT partition that spans the
fj> > fr> > drive, and things are fine.  The GPT backup tables are safely
fj> > fr> > outside the GEOM metadata, which is safely outside of the data.
fj> > fr>
fj> > fr> I wasn't aware you could do that.  I was only aware that it was the
fj> > fr> other way around.  That (my) misconception seems to also be relayed
fj> > fr> by others such as Miroslav who said:
fj> > fr>
fj> > fr> GPT doesn't play nice with GEOM classes which store their metadata
fj> > fr> on last sector.  For example, you can't use gmirror of a whole drives
fj> > fr> and use GPT on top of this mirror. (and gmirror is not the only one)
fj> > fr>
fj> > fr> So if I read this correctly, it means that the erroneous behaviour is
fj> > fr> the result of someone doing things in the wrong order (for lack of
fj> > fr> better terminology).
fj> >
fj> >  Well, does GPT really depend on the absolute last block?  The header
fj> >  has fields for both the first and the last LBAs and they do not have
fj> >  to be matched with the physical capacity.  Creating a gmirror first,
fj> >  and then creating a GPT on it does not work?  I do not think it is
fj> >  true, and I suspect a description on gmirror recommending
fj> >  kern.geom.debugflags=17 in the handbook is the source of the problem.
fj>
fj> It's not the partitioning that's the issue.  It's the order that GEOM
fj> providers and GPT partition tables are tasted.
fj>
fj> You can gmirror two disks, then GPT partition the gm0 device without
fj> any issues.  As you noted, the first/last sectors are 1 less than the
fj> physical disk (the size of the gmirror provider).
fj>
fj> When you boot, though, the gptboot loader only sees the GPT table, it
fj> doesn't know that it's part of a gmirror setup.  Thus it loads the
fj> GPT, notices that the size of the GPT is 1 less sector than the size
fj> of the disk, can't find the secondary GPT table as the last sector of
fj> the disk is gmirror metadata, and complains about corrupted GPT.
fj>
fj> Then the kernel loads, gmirror tastes the disk, finds the gmirror
fj> metadata, configures the gmirror provider, and now all the GPT stuff
fj> matches again.  And the system carries on correctly.
fj>
fj> The issue is that we don't have a GEOM-aware loader.  Or, at least,
fj> that the gpt*boot loaders read the GPT table(s) before configuring the
fj> GEOM providers.

 No, the issue is our gptloader assumes the backup header is always
 located at the (physical) last sector while this is not mandatory in
 the UEFI specification.  GEOM-based logical volumes suffer from this
 assumption at boot time.  It is not practical (and not necessary) to
 taste the volumes before loading a kernel.

 If the primary header is valid, using a lookup order of the
 hdr_lba_alt (AlternateLBA), the hdr_lba_end (LastUsableLBA), then
 drvsize() - 1 looks reasonable to me.  The current code uses
 drvsize() - 1 first and then looks up the AlternateLBA only when
 drvsize() failed.

-- Hiroki

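The lookup order proposed above (AlternateLBA, then LastUsableLBA, then drvsize() - 1) can be sketched as a standalone C function. This is an illustration under stated assumptions, not the gptboot code: struct gpt_hdr_sketch and backup_header_lba() are hypothetical names, and only the three field names are borrowed from the thread. The patch posted in this thread adds 34 to LastUsableLBA; the sketch instead assumes the usual layout of 32 sectors of backup entries followed by the header, which gives +33 and agrees with the md0/md1 numbers quoted earlier. Treat that constant as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Minimal stand-in for the primary GPT header fields discussed in the
 * thread.  Illustration only; this is not the real struct gpt_hdr.
 */
struct gpt_hdr_sketch {
	uint64_t hdr_lba_self; /* MyLBA: where this header lives (1 for primary) */
	uint64_t hdr_lba_alt;  /* AlternateLBA: claimed location of the backup */
	uint64_t hdr_lba_end;  /* LastUsableLBA */
};

/* Assumption: 32 sectors of backup partition entries precede the backup header. */
#define BACKUP_ENTRY_SECTORS 32u

/*
 * Pick the LBA at which to look for the backup header.
 * primary is NULL when no valid primary header was read;
 * media_sectors is the device size in sectors, or 0 if unknown.
 * Returns 0 when no candidate location can be determined.
 */
uint64_t
backup_header_lba(const struct gpt_hdr_sketch *primary, uint64_t media_sectors)
{
	/* A valid primary header never sits at LBA 0. */
	assert(primary == NULL || primary->hdr_lba_self > 0);

	if (primary != NULL) {
		/* 1. AlternateLBA, if it points past the primary header. */
		if (primary->hdr_lba_alt > primary->hdr_lba_self)
			return (primary->hdr_lba_alt);
		/* 2. Otherwise derive the location from LastUsableLBA. */
		if (primary->hdr_lba_end != 0)
			return (primary->hdr_lba_end + BACKUP_ENTRY_SECTORS + 1);
	}
	/* 3. Fall back to the last sector of the media. */
	if (media_sectors > 0)
		return (media_sectors - 1);
	return (0);
}
```

With the values from the gmirror example (AlternateLBA 204798, LastUsableLBA 204765, 204800 sectors on the raw disk), steps 1 and 2 both yield 204798, the last sector of mirror/gm0, while the media-size fallback alone would yield 204799, the sector holding the gmirror metadata.
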
Re: accepting rtadv broken on 9-STABLE, re driver?
Mark Felder <f...@feld.me> wrote
  in <op.v7tvkbkr34t2sn@tech304>:

fe> On Sat, 07 Jan 2012 14:23:46 -0600, Hiroki Sato <h...@freebsd.org>
fe> wrote:
fe>
fe> >  It is an unexpected behavior and the flag should be set on all
fe> >  interfaces.  Can you send me your /etc/rc.conf, /etc/sysctl.conf, and
fe> >  the result of ifconfig -a?
fe>
fe> Back at work so I have access to the machine again:
(snip)
fe> # ifconfig -a
fe>
fe> 11:43:29 tech304:~ ifconfig -a
fe> re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
fe>      options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
fe>      ether d0:67:e5:17:e1:32
fe>      inet6 fe80::d267:e5ff:fe17:e132%re0 prefixlen 64 scopeid 0x2
fe>      inet 192.168.93.23 netmask 0xff00 broadcast 192.168.93.255
fe>      inet6 2607:f4e0:100:104:d267:e5ff:fe17:e132 prefixlen 64 autoconf
fe>      nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
fe>      media: Ethernet autoselect (100baseTX <full-duplex>)
fe>      status: active

 re0 seems to have ACCEPT_RTADV.  What is the problem?

fe> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
fe>      options=3<RXCSUM,TXCSUM>
fe>      inet6 ::1 prefixlen 128
fe>      inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
fe>      inet 127.0.0.1 netmask 0xff00
fe>      nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
fe> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
fe>      ether 0a:00:27:00:00:00
fe>      nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

-- Hiroki

Re: accepting rtadv broken on 9-STABLE, re driver?
Mark Felder <f...@feld.me> wrote
  in <op.v7t4xpuh34t2sn@tech304>:

fe> On Mon, 09 Jan 2012 13:02:24 -0600, Hiroki Sato <h...@freebsd.org>
fe> wrote:
fe>
fe> >  re0 seems to have ACCEPT_RTADV.  What is the problem?
fe>
fe> That's because I haven't rebooted
fe>
fe> Let's start fresh.
fe>
fe> The normal ipv6 configuration anyone would use:
fe>
fe> -ipv6_activate_all_interfaces=YES in rc.conf
fe>
fe> -NO mention of net.inet6.ip6.accept_rtadv in sysctl.conf
fe>
fe> I boot up, re0 *does not* have ACCEPT_RTADV.

 This is expected behavior.  ACCEPT_RTADV is disabled by default on
 9.X.

fe> I try forcing via the sysctl: net.inet6.ip6.accept_rtadv=1
fe>
fe> Still doesn't work!

 This needs a reboot.  Did you reboot the box?

fe> Why? What makes this machine different? All the other machines I run
fe> do not require this to get ACCEPT_RTADV. Is it the re driver? My other
fe> machines have em and ath interfaces.

 Putting the following line

   net.inet6.ip6.accept_rtadv=1

 into /etc/sysctl.conf, and then removing the following line

   ipv6_ifconfig_re0="inet6 accept_rtadv"

 should work, I think.  (Of course, a reboot is needed after that.)

-- Hiroki

Re: accepting rtadv broken on 9-STABLE, re driver?
Mark Felder <f...@feld.me> wrote
  in <op.v7ogp01w34t2sn@tech304>:

fe> I figured I would end up putting that in rc.conf as a temporary fix,
fe> but maybe that's just the long term solution. It seems so odd to me
fe> that the sysctl change doesn't automatically cause the ACCEPT_RTADV
fe> option to show up for re0, but it does for vboxnet0. Perhaps there
fe> should be a cleaner way to do this in rc.conf like how we do
fe> ifconfig_re0=DHCP ?

 Is it correct that the ACCEPT_RTADV option was enabled on vboxnet0
 and not on re0, even after setting net.inet6.ip6.accept_rtadv to 1 at
 boot time and ipv6_activate_all_interfaces=YES?

-- Hiroki

Re: accepting rtadv broken on 9-STABLE, re driver?
Mark Felder <f...@feld.me> wrote
  in <891fe25c-1560-479f-b855-1713c1c7a...@email.android.com>:

fe> Hiroki Sato <h...@freebsd.org> wrote:
fe>
fe> >  Is it correct that ACCEPT_RTADV option was enabled on the vboxnet0
fe> >  and not on re0, even after setting net.inet6.ip6.accept_rtadv to 1 at
fe> >  boot time and ipv6_activate_all_interfaces=YES?
fe> >
fe> > -- Hiroki
fe>
fe> Yes, that is the behavior I witnessed.

 That is unexpected behavior; the flag should be set on all
 interfaces.  Can you send me your /etc/rc.conf, /etc/sysctl.conf, and
 the result of ifconfig -a?

-- Hiroki

Re: ZFS panic on a RELENG_8 NFS server
Hiroki Sato <h...@freebsd.org> wrote
  in <20110911.054601.1424617155148336027@allbsd.org>:

hr> Hiroki Sato <h...@freebsd.org> wrote
hr>   in <20110910.044841.232160047547388224@allbsd.org>:
hr>
hr> hr> Hiroki Sato <h...@freebsd.org> wrote
hr> hr>   in <20110907.094717.2272609566853905102@allbsd.org>:
hr> hr>
hr> hr> hr>  During this investigation a disk had to be replaced and resilvering
hr> hr> hr>  is now in progress.  A deadlock and a forced reboot after that
hr> hr> hr>  make recovering of the zfs datasets take a long time (for committing
hr> hr> hr>  logs, I think), so I will try to reproduce the deadlock and get a
hr> hr> hr>  core dump after it finished.
hr> hr>
hr> hr>  I think I could reproduce the symptoms.  I have no idea about if
hr> hr>  these are exactly the same as occurred on my box before because the
hr> hr>  kernel was replaced with one with some debugging options, but these
hr> hr>  are reproducible at least.
hr> hr>
hr> hr>  There are two symptoms.  One is a panic.  A DDB output when the panic
hr> hr>  occurred is the following:
hr>
hr>  I am trying vfs.lookup_shared=0 and seeing how it goes.  It seems the
hr>  box can endure a high load which quickly caused these symptoms.

 The knob made no difference.  The same panic or unresponsiveness
 still occurs after about 24-32 hours.

-- Hiroki

Re: ZFS panic on a RELENG_8 NFS server
Hiroki Sato <h...@freebsd.org> wrote
  in <20110910.044841.232160047547388224@allbsd.org>:

hr> Hiroki Sato <h...@freebsd.org> wrote
hr>   in <20110907.094717.2272609566853905102@allbsd.org>:
hr>
hr> hr>  During this investigation a disk had to be replaced and resilvering
hr> hr>  is now in progress.  A deadlock and a forced reboot after that
hr> hr>  make recovering of the zfs datasets take a long time (for committing
hr> hr>  logs, I think), so I will try to reproduce the deadlock and get a
hr> hr>  core dump after it finished.
hr>
hr>  I think I could reproduce the symptoms.  I have no idea about if
hr>  these are exactly the same as occurred on my box before because the
hr>  kernel was replaced with one with some debugging options, but these
hr>  are reproducible at least.
hr>
hr>  There are two symptoms.  One is a panic.  A DDB output when the panic
hr>  occurred is the following:

 I am trying vfs.lookup_shared=0 to see how it goes.  With it, the box
 seems to endure the high load which previously triggered these
 symptoms quickly.

-- Hiroki

ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too long (RELENG_8 from today))
Hiroki Sato <h...@freebsd.org> wrote
  in <20110907.094717.2272609566853905102@allbsd.org>:

hr>  During this investigation a disk had to be replaced and resilvering
hr>  is now in progress.  A deadlock and a forced reboot after that
hr>  make recovering of the zfs datasets take a long time (for committing
hr>  logs, I think), so I will try to reproduce the deadlock and get a
hr>  core dump after it finished.

 I think I could reproduce the symptoms.  I do not know whether these
 are exactly the same as what occurred on my box before, because the
 kernel was replaced with one built with some debugging options, but
 they are reproducible at least.

 There are two symptoms.  One is a panic.  The DDB output when the
 panic occurred is the following:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10040
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x8065b926
stack pointer           = 0x28:0xff8257b94d70
frame pointer           = 0x28:0xff8257b94e10
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at      witness_checkorder+0x246:       movl    0x40(%r13),%ebx
db> bt
Tracing pid 992 tid 100586 td 0xff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffe6c8, rbp = 0x6 ---

 The complete output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt

 The other is getting stuck on ZFS access.  The kernel keeps running
 with no panic, but any access to ZFS datasets makes the accessing
 program unresponsive.  The DDB output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt

 The trigger for both was some access to a ZFS dataset from the NFS
 clients.  Because the access pattern was complex I could not narrow
 down the culprit, but it seems timing-dependent, and simply doing
 rm -rf locally on the server can sometimes trigger them.

 The crash dump and the kernel can be found at the following URLs:

  panic:
    http://people.allbsd.org/~hrs/zfs_panic_20110909_1/

  no panic but unresponsive:
    http://people.allbsd.org/~hrs/zfs_panic_20110909_2/

  kernel:
    http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/

-- Hiroki
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: panic: spin lock held too long (RELENG_8 from today)
Attilio Rao <atti...@freebsd.org> wrote
  in <CAJ-FndAChGndC=lkzni7i6mot+spw3-ofto9rh0+5wnnvwz...@mail.gmail.com>:

at> This should be enough for someone NFS-aware to look into it.
at>
at> Were you also able to get a core?

 Yes.  But as kib@ pointed out, it seems to be a deadlock in ZFS.  Some
 experiments I did showed that this deadlock can be triggered at least
 by doing rm -rf against a local directory that has a large number of
 files/sub-directories.

 Then, I updated the kernel to the latest 8-STABLE + WITNESS option,
 because a fix for the LOR of the spa_config lock was committed and
 tracking locks without WITNESS was hard.  The deadlock can still be
 triggered after that.

 During this investigation a disk had to be replaced and resilvering
 is now in progress.  A deadlock and a forced reboot after that
 make recovering of the zfs datasets take a long time (for committing
 logs, I think), so I will try to reproduce the deadlock and get a
 core dump after it has finished.

 If the old kernel and core of the deadlock I reported on Saturday are
 still useful for debugging, I can put them somewhere you can access.

-- Hiroki

Re: panic: spin lock held too long (RELENG_8 from today)
Attilio Rao <atti...@freebsd.org> wrote
  in <CAJ-FndDHmwa+=lnggu+5mk2xmtj8kwhb10jsoytkmgetvgn...@mail.gmail.com>:

at> If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours.

 Running fine for 45 hours so far.  Please go ahead!

-- Hiroki

Re: panic: spin lock held too long (RELENG_8 from today)
Chip Camden <sterl...@camdensoftware.com> wrote
  in <20110818025550.ga1...@libertas.local.camdensoftware.com>:

st> Quoth Attilio Rao on Thursday, 18 August 2011:
st> > In callout_cpu_switch() if a low priority thread is migrating the
st> > callout and gets preempted after the outcoming cpu queue lock is left
st> > (and scheduled much later) we get this problem.
st> >
st> > In order to fix this bug it could be enough to use a critical section,
st> > but I think this should be really interrupt safe, thus I'd wrap them
st> > up with spinlock_enter()/spinlock_exit(). Fortunately
st> > callout_cpu_switch() should be called rarely and also we already do
st> > expensive locking operations in callout, thus we should not have
st> > problem performance-wise.
st> >
st> > Can the guys I also CC'ed here try the following patch, with all the
st> > initial kernel options that were leading you to the deadlock? (thus
st> > revert any debugging patch/option you added for the moment):
st> > http://www.freebsd.org/~attilio/callout-fixup.diff
st> >
st> > Please note that this patch is for STABLE_8, if you can confirm the
st> > good result I'll commit to -CURRENT and then backmerge as soon as
st> > possible.
st> >
st> > Thanks,
st> > Attilio
st>
st> Thanks, Attilio.  I've applied the patch and removed the extra debug
st> options I had added (though keeping debug symbols).  I'll let you know if
st> I experience any more panics.

 No panic for 20 hours at this moment, FYI.  For my NFS server, I
 think another 24 hours would be sufficient to confirm the stability.
 I will see how it works...

-- Hiroki

Re: panic: spin lock held too long (RELENG_8 from today)
Hi,

Mike Tancsa <m...@sentex.net> wrote
  in <4e15a08c.6090...@sentex.net>:

mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
mi> > >
mi> > > BTW, we had a similar panic, spinlock held too long, the spinlock
mi> > > is the sched lock N, on busy 8-core box recently upgraded to the
mi> > > stable/8. Unfortunately, machine hung dumping core, so the stack trace
mi> > > for the owner thread was not available.
mi> > >
mi> > > I was unable to make any conclusion from the data that was present.
mi> > > If the situation is reproducible, you could try to revert r221937. This
mi> > > is pure speculation, though.
mi>
mi> Another crash just now after 5hrs uptime. I will try and revert r221937
mi> unless there is any extra debugging you want me to add to the kernel
mi> instead ?

 I am also suffering from a reproducible panic on an 8-STABLE box, an
 NFS server with heavy I/O load.  I could not get a kernel dump
 because the panic locked up the machine just after it occurred, but
 according to the stack trace it was the same as the one posted.
 Switching to an 8.2R kernel prevents this panic.

 Any progress on the investigation?

----
spin lock 0x80cb46c0 (sched lock 0) held by 0xff01900458c0 (tid 100489) too long
panic: spin lock held too long
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
_mtx_lock_spin() at _mtx_lock_spin+0x9e
sched_add() at sched_add+0x117
setrunnable() at setrunnable+0x78
sleepq_signal() at sleepq_signal+0x7a
cv_signal() at cv_signal+0x3b
xprt_active() at xprt_active+0xe3
svc_vc_soupcall() at svc_vc_soupcall+0xc
sowakeup() at sowakeup+0x69
tcp_do_segment() at tcp_do_segment+0x25e7
tcp_input() at tcp_input+0xcdd
ip_input() at ip_input+0xac
netisr_dispatch_src() at netisr_dispatch_src+0x7e
ether_demux() at ether_demux+0x14d
ether_input() at ether_input+0x17d
em_rxeof() at em_rxeof+0x1ca
em_handle_que() at em_handle_que+0x5b
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
----

-- Hiroki

Re: panic: spin lock held too long (RELENG_8 from today)
Attilio Rao <atti...@freebsd.org> wrote
  in <caj-fndcdow0_b2mv0lzeo-tpea9+7oanj7ihvkqsm4j4b0d...@mail.gmail.com>:

at> 2011/8/17 Hiroki Sato <h...@freebsd.org>:
at> > Hi,
at> >
at> > Mike Tancsa <m...@sentex.net> wrote
at> >   in <4e15a08c.6090...@sentex.net>:
at> >
at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
at> > mi> > >
at> > mi> > > BTW, we had a similar panic, spinlock held too long, the spinlock
at> > mi> > > is the sched lock N, on busy 8-core box recently upgraded to the
at> > mi> > > stable/8. Unfortunately, machine hung dumping core, so the stack trace
at> > mi> > > for the owner thread was not available.
at> > mi> > >
at> > mi> > > I was unable to make any conclusion from the data that was present.
at> > mi> > > If the situation is reproducible, you could try to revert r221937. This
at> > mi> > > is pure speculation, though.
at> > mi>
at> > mi> Another crash just now after 5hrs uptime. I will try and revert r221937
at> > mi> unless there is any extra debugging you want me to add to the kernel
at> > mi> instead ?
at> >
at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
at> >  NFS server with heavy I/O load.  I could not get a kernel dump
at> >  because this panic locked up the machine just after it occurred, but
at> >  according to the stack trace it was the same as posted one.
at> >  Switching to an 8.2R kernel can prevent this panic.
at> >
at> >  Any progress on the investigation?
at>
at> Hiroki,
at> how easily can you reproduce it?

 It takes 5-10 hours.  I installed another kernel for debugging just
 now, so I think I will be able to collect more detailed information
 in a couple of days.

at> It would be important to have a DDB textdump with this information:
at> - bt
at> - ps
at> - show allpcpu
at> - alltrace
at>
at> Alternatively, a coredump which has the stop cpu patch which Andriy can provide.

 Okay, I will post them once I can get another panic.  Thanks!

-- Hiroki

Re: panic: spin lock held too long (RELENG_8 from today)
Hiroki Sato <h...@freebsd.org> wrote
  in <20110818.043332.27079545013461535@allbsd.org>:

hr> Attilio Rao <atti...@freebsd.org> wrote
hr>   in <caj-fndcdow0_b2mv0lzeo-tpea9+7oanj7ihvkqsm4j4b0d...@mail.gmail.com>:
hr>
hr> at> 2011/8/17 Hiroki Sato <h...@freebsd.org>:
hr> at> > Hi,
hr> at> >
hr> at> > Mike Tancsa <m...@sentex.net> wrote
hr> at> >   in <4e15a08c.6090...@sentex.net>:
hr> at> >
hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
hr> at> > mi> > >
hr> at> > mi> > > BTW, we had a similar panic, spinlock held too long, the spinlock
hr> at> > mi> > > is the sched lock N, on busy 8-core box recently upgraded to the
hr> at> > mi> > > stable/8. Unfortunately, machine hung dumping core, so the stack trace
hr> at> > mi> > > for the owner thread was not available.
hr> at> > mi> > >
hr> at> > mi> > > I was unable to make any conclusion from the data that was present.
hr> at> > mi> > > If the situation is reproducible, you could try to revert r221937. This
hr> at> > mi> > > is pure speculation, though.
hr> at> > mi>
hr> at> > mi> Another crash just now after 5hrs uptime. I will try and revert r221937
hr> at> > mi> unless there is any extra debugging you want me to add to the kernel
hr> at> > mi> instead ?
hr> at> >
hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
hr> at> >  because this panic locked up the machine just after it occurred, but
hr> at> >  according to the stack trace it was the same as posted one.
hr> at> >  Switching to an 8.2R kernel can prevent this panic.
hr> at> >
hr> at> >  Any progress on the investigation?
hr> at>
hr> at> Hiroki,
hr> at> how easily can you reproduce it?
hr>
hr>  It takes 5-10 hours.  I installed another kernel for debugging just
hr>  now, so I think I will be able to collect more detail information in
hr>  a couple of days.
hr>
hr> at> It would be important to have a DDB textdump with this information:
hr> at> - bt
hr> at> - ps
hr> at> - show allpcpu
hr> at> - alltrace
hr> at>
hr> at> Alternatively, a coredump which has the stop cpu patch which Andriy can provide.
hr>
hr>  Okay, I will post them once I can get another panic.  Thanks!

 I got the panic with a crash dump this time.  The output of bt, ps,
 show allpcpu, and alltrace can be found at the following URL:

  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

-- Hiroki

Re: 8.1 Pre-release gpart, isn't setting type correctly
Phil <p...@amdg.etowns.org> wrote
  in <580ca5b8f8654fc782cc113761458...@hs>:

ph> Performing the following gpart commands on either a hard disk or
ph> usb memory stick doesn't correctly store the gpart type information.
ph>
ph> What we're doing, using FreeBSD 8.1-PRERELEASE, csuped as at
ph> 30-May-2010 23:59 UTC (*default date=2010.05.30.23.59.59)
ph>
ph> # gpart create -s GPT da1
ph> # gpart add -s 1G -t freebsd-ufs da1
ph> # gpart show da1
ph>
ph> =>      34  7827325  da1  GPT  (3.7G)
ph>         34  2097152    1  !----  (1.0G)
ph>    2097186  5730173       - free -  (2.7G)

 This is probably the same issue reported at

  http://www.freebsd.org/cgi/query-pr.cgi?pr=kern%2F142174

 and already fixed on CURRENT.  I guess the fix will be merged to 8.X
 soon.

-- Hiroki

Re: em interface slow down on 8.0R
Hiroki Sato <h...@freebsd.org> wrote
  in <20091220.053757.230970486@allbsd.org>:

hr> Jack Vogel <jfvo...@gmail.com> wrote
hr>   in <2a41acea0912052327t7830f85aw5b4b581ab3f09...@mail.gmail.com>:
hr>
hr> jf> The 82573, when onboard (LOM) is usually special, it is used by system
hr> jf> management
hr> jf> firmware. Go to the system BIOS and turn off management, see if that
hr> jf> eliminates the
hr> jf> periodic hang.
hr>
hr>  Well, I am using them without enabling such a BIOS feature on the two
hr>  boxes.
hr>
hr>  I was monitoring for 1 week after replacing the kernel of 8.0-STABLE
hr>  with 8.0R.  Frequency of the symptom was reduced, but occurred once
hr>  in 2-3 days.  So it is reproducible on 8.0R, too.

 JFYI, when I tried 8-STABLE as of May 15 the periodic hang-ups
 disappeared.  The chip IDs are 0x109a8086 and 0x108c8086 (pciconf
 reported them as 82573L and 82573E, added to PCI slots on the box).
 The hang-ups could be reproduced on 8.0-RELEASE.

 I have not tried the other boxes which had another symptom (an
 abnormally long interval between packets), but I will give it a try
 and report on that, too.

 I have no idea what the cause was, though, because there have been a
 lot of changes since the release.

-- Hiroki

Re: em interface slow down on 8.0R
Jack Vogel jfvo...@gmail.com wrote
  in 2a41acea0912052327t7830f85aw5b4b581ab3f09...@mail.gmail.com:

jf> The 82573, when onboard (LOM) is usually special, it is used by
jf> system management firmware. Go to the system BIOS and turn off
jf> management, see if that eliminates the periodic hang.

Well, I am using them without enabling such a BIOS feature on the two
boxes.

I was monitoring for 1 week after replacing the kernel of 8.0-STABLE
with 8.0R. The frequency of the symptom was reduced, but it still
occurred once every 2-3 days. So it is reproducible on 8.0R, too.

Just after the symptom occurred, dev.em.[01].debug showed the
following:

 Dec 17 16:50:03 pool kernel: em0: Std mbuf failed = 0
 Dec 17 16:50:03 pool kernel: em0: Std mbuf cluster failed = 9612
 Dec 17 16:50:12 pool kernel: em1: Std mbuf failed = 0
 Dec 17 16:50:12 pool kernel: em1: Std mbuf cluster failed = 15183

The other numbers look normal to me. dev.em.[01].stats reported that
almost all of the counters other than Good Packets were zero. Doing
ifconfig down/up could make it work again, but after sending/receiving
10 packets or so it stopped.

-- Hiroki
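When comparing such debug dumps across boxes, it can help to pull out only the counters that are not zero. The following is a rough sketch, assuming the syslog line layout shown in the message above ("kernel: emN: <counter> = <value>"); the regular expression and the function name are illustrative and are not part of any FreeBSD tool.

```python
import re

# Log lines as they appear in the message above.
LOG = """\
Dec 17 16:50:03 pool kernel: em0: Std mbuf failed = 0
Dec 17 16:50:03 pool kernel: em0: Std mbuf cluster failed = 9612
Dec 17 16:50:12 pool kernel: em1: Std mbuf failed = 0
Dec 17 16:50:12 pool kernel: em1: Std mbuf cluster failed = 15183
"""

# Assumed layout: "... kernel: emN: <counter name> = <decimal value>"
LINE_RE = re.compile(
    r"kernel: (?P<ifname>em\d+): (?P<counter>.+?) = (?P<value>\d+)$"
)


def nonzero_counters(log):
    """Return {(interface, counter name): value} for non-zero counters."""
    found = {}
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match and int(match["value"]) != 0:
            found[(match["ifname"], match["counter"])] = int(match["value"])
    return found


for (ifname, counter), value in sorted(nonzero_counters(LOG).items()):
    print(f"{ifname}: {counter} = {value}")
```

For the lines above, this leaves only the "Std mbuf cluster failed" counters of em0 and em1.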
Re: em interface slow down on 8.0R
Hiroki Sato h...@freebsd.org wrote
  in 20091203.182931.129751456@allbsd.org:

hr> And another thing, I noticed a box with 82573E and 82573L sometimes
hr> got stuck after upgrading to 8.0-STABLE. It has moderate network
hr> load (average 5-10Mbps) on both NICs. It worked for a day or two and
hr> then got stuck suddenly. Rebooting the box solved the situation, but
hr> it got stuck again after a day or so. After it happens, the
hr> interface does not respond. The other functionalities of FreeBSD
hr> seemed working. Doing an up/down cycle for the NICs seemed to send
hr> some packets, but it did not recover completely; rebooting was needed
hr> for recovery. This box does not have the RTT problem. I am still
hr> not sure what is the trigger, there seems something wrong.

What I have found out about this symptom so far:

 - This occurs around once per 1-2 days.

 - Once it occurs, all communication including ARP and IPv4 stops.

 - ifconfig em0 down/up can recover the interface. However, on doing
   up after down the following message was displayed:

   # ifconfig em0 up
   em0: Could not setup receive structures

   After trying it several times it worked.

   Then, the interface seemed back to normal for a couple of minutes,
   but it stopped again.

I guess there is a kind of deadlock somewhere, but I am not sure it is
really related to the em(4) driver. I will continue to investigate
anyway.

-- Hiroki
Re: loader(8) readin failed on 7.2R and later including 8.0R
John Baldwin j...@freebsd.org wrote
  in 200912041734.24016@freebsd.org:

jh> On Friday 04 December 2009 10:35:59 am John Baldwin wrote:
jh> > So memtop_copyin would start off as 0xf0 but would end up as 0xc0,
jh> > and since the kernel starts at 4MB, I think that only leaves about
jh> > 8MB for the kernel. Probably the loader needs to be more
jh> > intelligent about using high memory for malloc by using the
jh> > largest region 1MB but 4GB for malloc() instead of stealing memory
jh> > from bios_extmem in the SMAP case. Try the attached patch which
jh> > tries to make the loader use better smarts when picking a memory
jh> > region for the heap (warning, I haven't tested it myself yet).
jh>
jh> Use the updated patch (actually tested in qemu) instead.

Thanks! I applied your patch and tried loading an 8.0R kernel
(without LOADER_NO_GPT_SUPPORT=yes). The "elf32_loadimage: read
failed" error message disappeared:

 OK load /boot/kernel.N/kernel
 /boot/kernel.N/kernel text=0x8db9a4 data=0xdd134+0xa5e84 syms=[0x4+0x99390+0x4+0xd2201
 elf32_loadimage: could not read symbols - skipped!
 OK

A summary so far is:

 1)  a 8MB 7.1R kernel + stock 8.0R loader
 2a) a 8MB 8.0R kernel + stock 8.0R loader
 2b) a 8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
 2c) a 8MB 8.0R kernel + loader with your patch
 3a) a 8MB 8.0R kernel + stock 8.0R loader
 3b) a 8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
 3c) a 8MB 8.0R kernel + loader with your patch

      loading text    loading syms    boot
 1)   OK              OK              OK
 2a)  readin failed   -               -
 2b)  OK              skipped!        NG
 2c)  OK              skipped!        NG
 3a)  not tried yet
 3b)  OK              OK              NG
 3c)  OK              OK              NG

Loading the syms sections still fails for the large kernel. The
boot=NG means it got stuck after l_exec() in boot.c and before
cninit() in i386/machdep.c, as far as I can check by inserting
printf(). So the cause of that is something in the kernel, I guess.

Hm. One special thing about that box is that it has four quad-hme PCI
cards. I will try removing them and see if it changes something or
not.

-- Hiroki
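To see how the kernel loaded above compares with the "about 8MB" estimate quoted from John, the section sizes printed by the loader can be summed. This is only a sketch under stated assumptions: the 8MB figure is taken directly from the quoted estimate rather than from the loader source, the exact meaning of the two data= figures is not examined, and the helper name is made up for illustration.

```python
import re

MIB = 1024 * 1024
ASSUMED_BUDGET = 8 * MIB  # assumption: the "about 8MB" estimate quoted above

# Line printed by the loader for the 8.0R kernel in this message.
LOADER_LINE = (
    "/boot/kernel.N/kernel text=0x8db9a4 data=0xdd134+0xa5e84 "
    "syms=[0x4+0x99390+0x4+0xd2201"
)


def loaded_size(loader_line):
    """Sum the text= figure and both data= figures of a loader line."""
    text = re.search(r"text=(0x[0-9a-fA-F]+)", loader_line)
    data = re.search(r"data=(0x[0-9a-fA-F]+)\+(0x[0-9a-fA-F]+)", loader_line)
    if text is None or data is None:
        raise ValueError("no text=/data= fields found")
    return (
        int(text.group(1), 16)
        + int(data.group(1), 16)
        + int(data.group(2), 16)
    )


size = loaded_size(LOADER_LINE)
print(f"text+data = {size} bytes ({size / MIB:.2f} MiB)")
print("exceeds assumed budget:", size > ASSUMED_BUDGET)
```

The symbol tables are deliberately left out of the sum, since they are loaded after text and data and are the part the loader reports as skipped.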
Re: loader(8) readin failed on 7.2R and later including 8.0R
Hiroki Sato h...@freebsd.org wrote
  in 20091205.184250.201700943@allbsd.org:

hr> A summary so far is:
hr>
hr>  1)  a 8MB 7.1R kernel + stock 8.0R loader
hr>  2a) a 8MB 8.0R kernel + stock 8.0R loader
hr>  2b) a 8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
hr>  2c) a 8MB 8.0R kernel + loader with your patch
hr>  3a) a 8MB 8.0R kernel + stock 8.0R loader
hr>  3b) a 8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
hr>  3c) a 8MB 8.0R kernel + loader with your patch

Grr, I double-checked how it got stuck, and found that the console
redirect was disabled because of an old device.hints. The revised
summary is:

      loading text    loading syms    boot
 1)   OK              OK              OK
 2a)  readin failed   -               -
 2b)  OK              skipped!        OK
 2c)  OK              skipped!        OK
 3a)  OK              OK              OK
 3b)  OK              OK              OK
 3c)  OK              OK              OK

So, case 2c shows that your patch solves the problem seen in case 2a.
Thank you! :) Loading 8MB kernel works now, but loading the syms
sections still fails even in case 2c.

-- Hiroki
Re: em interface slow down on 8.0R
John Nielsen j...@jnielsen.net wrote
  in 1e3c66ea-a6d3-44d7-b28e-bf068fff1...@jnielsen.net:

jo> On Dec 5, 2009, at 4:40 AM, Hiroki Sato h...@freebsd.org wrote:
jo>
jo> > Hiroki Sato h...@freebsd.org wrote
jo> >   in 20091203.182931.129751456@allbsd.org:
jo> >
jo> > hr> And another thing, I noticed a box with 82573E and 82573L
jo> > hr> sometimes got stuck after upgrading to 8.0-STABLE. It has
jo> > hr> moderate network load (average 5-10Mbps) on both NICs. It
jo> > hr> worked for a day or two and then got stuck suddenly.
jo> > hr> Rebooting the box solved the situation, but it got stuck
jo> > hr> again after a day or so. After it happens, the interface
jo> > hr> does not respond. The other functionalities of FreeBSD
jo> > hr> seemed working. Doing an up/down cycle for the NICs seemed
jo> > hr> to send some packets, but it did not recover completely;
jo> > hr> rebooting was needed for recovery. This box does not have
jo> > hr> the RTT problem. I am still not sure what is the trigger,
jo> > hr> there seems something wrong.
jo> >
jo> > Things turned out for this symptom so far are:
jo> >
jo> > - This occurs around once per 1-2 days.
jo> >
jo> > - Once it occurs, all of communications including ARP and IPv4
jo> >   stop.
jo> >
jo> > - ifconfig em0 down/up can recover the interface. However, on
jo> >   doing up after down the following message was displayed:
jo> >
jo> >   # ifconfig em0 up
jo> >   em0: Could not setup receive structures
jo> >
jo> >   After trying it several times it worked.
jo> >
jo> >   Then, the interface seemed back to normal for a couple of
jo> >   minutes, but it stopped again.
jo> >
jo> > I guess there is a kind of deadlock somewhere but not sure it is
jo> > really related to the em(4) driver. I will continue to
jo> > investigate anyway.
jo>
jo> I'm curious, what speed/duplex is your interface using and is it
jo> statically set or using autoselect?

No manual configuration. The two em interfaces are set as follows:

| media: Ethernet autoselect (1000baseT full-duplex)

The box is mainly used as an NFS server. The actual communication
speed was around 700Mbps at peak.

-- Hiroki
Re: em interface slow down on 8.0R
Hi Jack,

Jack Vogel jfvo...@gmail.com wrote
  in 2a41acea0912021514r2d44dd33n4c364518d7fe1...@mail.gmail.com:

jf> Update: the claim to be unable to install was hasty, I went in and
jf> looked into myself and was able to get an install. Here's what I've
jf> found so far:
jf>
jf> First, the 82547EI will fail due to Invalid Mac Address, so I guess
jf> you hacked around this problem yourself? I had someone here test all
jf> legacy adapters for this problem and I was told nothing else was
jf> exhibiting it besides the 82542, obviously this is false :) In any
jf> case I will be making an official patch to fix that problem soon.
jf>
jf> Second, once I had the device working I do indeed see substandard
jf> performance, I am continuing to debug, but wanted you to know that I
jf> have reproduced this.

Thank you! I have investigated some more details.

First, I was wrong about the affected FreeBSD version; the one I tried
was 8.0-STABLE, not 8.0-RELEASE. So I started to try 8.0R. A summary
of the chips and releases I have tried so far is now the following:

                                        7.2R  8.0R  8.0-STABLE
 82540EM  (chip=0x100e8086, rev=0x02)   OK    OK    too slow[1]
 82541PI  (chip=0x107c8086, rev=0x05)   OK    ?     OK
 82545ep  (chip=0x10268086, rev=0x04)   OK    ?     OK
 82547EI  (chip=0x10198086, rev=0x00)   OK    OK    too slow[1]
 82562V-2 (chip=0x10c08086, rev=0x02)   OK    ?     OK
 82573E   (chip=0x108c8086, rev=0x03)   OK    ?     work but sometimes freeze[2]
 82573L   (chip=0x109a8086, rev=0x00)   OK    ?     work but sometimes freeze[2]

8.0-STABLE is as of Dec 1. The [1] means the odd RTT I described in
the previous email. The [2] means it worked fine but sometimes
stopped working, as described later.

The long RTT symptom is reproducible on an Intel D865BGP motherboard.
When I inserted another PCI card with an 82545ep into it, the card
worked fine as em1. The em0 still had the problem after adding the
em1 card. I did not manually set a MAC address on it, and there was
no error related to it.

The above box is used for some network services, so I prepared
another box based on a D865BGP motherboard. This box has two NICs,
82547EI and 82540EM. The former is on-board and the latter is a PCI
card. 8.0R worked fine with the two. On 8.0-STABLE both NICs have
the RTT problem. The following difference was found by comparing the
outputs of dev.em.[01].debug with each other:

 -em0: Adapter hardware address = 0xc42e1424
 +em0: Adapter hardware address = 0xc42e0424
 -em1: Adapter hardware address = 0xc4364424
 +em1: Adapter hardware address = 0xc435e424

The - lines are on 8.0-STABLE, and the + ones are on 8.0-RELEASE.

Although I have not yet tried 8.0R on the other boxes which work fine
on 8.0-STABLE, it is certain that the RTT problem did not occur on
that box + 8.0R, at least. The difference in em(4) between
8.0-RELEASE and 8.0-STABLE is quite small, so perhaps it is due to
some other changes... If there is something else I should try,
please let me know.

And another thing, I noticed a box with 82573E and 82573L sometimes
got stuck after upgrading to 8.0-STABLE. It has moderate network
load (average 5-10Mbps) on both NICs. It worked for a day or two and
then got stuck suddenly. Rebooting the box solved the situation, but
it got stuck again after a day or so. After it happens, the
interface does not respond. The other functionalities of FreeBSD
seemed to be working. Doing an up/down cycle for the NICs seemed to
send some packets, but it did not recover completely; rebooting was
needed for recovery. This box does not have the RTT problem. I am
still not sure what the trigger is, but something seems to be wrong.

-- Hiroki
Re: loader(8) readin failed on 7.2R and later including 8.0R
John Baldwin j...@freebsd.org wrote
  in 200912020948.05698@freebsd.org:

jh> On Tuesday 01 December 2009 12:13:39 pm Hiroki Sato wrote:
jh> > While the load command seemed to finish, the box got stuck just
jh> > after entering boot command.
jh> >
jh> > Curious to say, I have got this symptom only on a specific box in
jh> > more than ten different boxes I upgraded so far; it is based on an
jh> > old motherboard Supermicro P4DPE[*].
jh> >
jh> > [*] http://www.supermicro.com/products/motherboard/Xeon/E7500/P4DPE.cfm
jh> >
jh> > Any workaround? Booting from release CDROMs (7.2R and 8.0R) also
jh> > fail. On the box 7.1R or 7.1R's loader + 7.2R kernel worked
jh> > fine. It is possible something in changes of loader(8) between 7.1R
jh> > and 7.2R is the cause, but I am still not sure what it is...
jh>
jh> It may be related to the loader switching to using memory 1MB for its
jh> malloc(). Maybe try building the loader with
jh> 'LOADER_NO_GPT_SUPPORT=yes' in /etc/src.conf?

Thanks, a loader recompiled with LOADER_NO_GPT_SUPPORT=yes displayed
"elf32_loadimage: could not read symbols - skipped!" for the 8.0R
kernel. This is the same as the 7.1R loader + 8.0R kernel case.

-- Hiroki
Re: loader(8) readin failed on 7.2R and later including 8.0R
John Baldwin j...@freebsd.org wrote
  in 200912030803.29797@freebsd.org:

jh> On Thursday 03 December 2009 5:29:13 am Hiroki Sato wrote:
jh> > John Baldwin j...@freebsd.org wrote
jh> >   in 200912020948.05698@freebsd.org:
jh> >
jh> > jh> On Tuesday 01 December 2009 12:13:39 pm Hiroki Sato wrote:
jh> > jh> > While the load command seemed to finish, the box got stuck
jh> > jh> > just after entering boot command.
jh> > jh> >
jh> > jh> > Curious to say, I have got this symptom only on a specific
jh> > jh> > box in more than ten different boxes I upgraded so far; it
jh> > jh> > is based on an old motherboard Supermicro P4DPE[*].
jh> > jh> >
jh> > jh> > [*] http://www.supermicro.com/products/motherboard/Xeon/E7500/P4DPE.cfm
jh> > jh> >
jh> > jh> > Any workaround? Booting from release CDROMs (7.2R and 8.0R)
jh> > jh> > also fail. On the box 7.1R or 7.1R's loader + 7.2R kernel
jh> > jh> > worked fine. It is possible something in changes of
jh> > jh> > loader(8) between 7.1R and 7.2R is the cause, but I am still
jh> > jh> > not sure what it is...
jh> > jh>
jh> > jh> It may be related to the loader switching to using memory 1MB
jh> > jh> for its malloc(). Maybe try building the loader with
jh> > jh> 'LOADER_NO_GPT_SUPPORT=yes' in /etc/src.conf?
jh> >
jh> > Thanks, a recompiled loader with LOADER_NO_GPT_SUPPORT=yes'
jh> > displayed elf32_loadimage: could not read symbols - skipped! for
jh> > 8.0R kernel. This is the same as 7.1R's loader + 8.0R kernel case.
jh>
jh> Can you get the output of 'smap' from the loader? Is the 8.0 kernel
jh> bigger than the 7.x kernel? If so, can you try trimming the 8.0
jh> kernel a bit to see if that changes things?

Sure. The output of smap on an 8.0R loader with
LOADER_NO_GPT_SUPPORT=yes was:

| OK smap
| SMAP type=01 base= len=0009f400
| SMAP type=02 base=0009f400 len=0c00
| SMAP type=02 base=000dc000 len=00024000
| SMAP type=01 base=0010 len=00e0
| SMAP type=02 base=00f0 len=0010
| SMAP type=01 base=0100 len=beef
| SMAP type=03 base=bfef len=c000
| SMAP type=04 base=bfefc000 len=4000
| SMAP type=01 base=bff0 len=0008
| SMAP type=02 base=bff8 len=0008
| SMAP type=02 base=fec0 len=0001
| SMAP type=02 base=fee0 len=1000
| SMAP type=02 base=ff80 len=0040
| SMAP type=02 base=fff0 len=0010
| OK

The size difference between the two kernels was:

| -r-xr-xr-x 1 root wheel 9708240 Dec 1 16:22 kernel.7/kernel
| -r-xr-xr-x 1 root wheel 11492703 Nov 21 15:48 kernel.8/kernel

Then I rebuilt a smaller 8.0 kernel by removing some entries from the
kernel configuration file. The size is now smaller than the 7.1R
kernel:

| -r-xr-xr-x 1 root wheel 7710491 Dec 3 21:10 /boot/kernel.8X/kernel

Loading the new kernel seemed to work fine with the recompiled 8.0R
loader, but it got stuck just after entering boot:

| OK load /boot/kernel.8X/kernel
| /boot/kernel.8X/kernel text=0x5a7664 data=0x88d74+0x82f04 syms=[0x4+0x6d290+0x4+0x987e3]
| OK boot
| /

Loading the 7.1R kernel by using the recompiled 8.0R loader had no
problem.

-- Hiroki
loader(8) readin failed on 7.2R and later including 8.0R
Hi,

This may be a rare case, but I post this with the hope for ideas from
people here.

I have experienced a strange loader(8) error. After upgrading one of
my boxes from 7.1R to 7.2R, an error appeared on the boot command of
loader(8) like this:

| FreeBSD/i386 bootstrap loader, Revision 1.1
| (h...@cmaster.allbsd.org, Mon Nov 30 04:01:24 JST 2009)
| Loading /boot/defaults/loader.conf
| /boot/kernel/kernel text=0x8b6c04
| readin failed
|
| elf32_loadimage: read failed
| /boot/kernel/kernel text=0x8b6c04
| readin failed
|
| elf32_loadimage: read failed
| Unable to load a kernel!

(Actually the above error message was displayed when I upgraded it to
8.0R. The message was the same when I tried 7.2R.)

After replacing /boot/loader with the one from 7.1R, the 7.2R kernel
worked fine. Next, I tried to upgrade it to 8.0R. As I explained
earlier, the 8.0R loader did not work either, so I replaced it with
the 7.1R one again. However, 7.1R loader(8) + 8.0R kernel displayed
the following error and did not work:

| OK load /boot/kernel/kernel
| /boot/kernel/kernel text=0x8db9a4 data=0xdd134+0xa5e84 syms=[0x4+0x99390+0x4+0xd2201
| elf32_loadimage: could not read symbols - skipped!

While the load command seemed to finish, the box got stuck just
after entering the boot command.

Curiously, I have got this symptom only on one specific box out of
more than ten different boxes I upgraded so far; it is based on an
old motherboard, Supermicro P4DPE[*].

[*] http://www.supermicro.com/products/motherboard/Xeon/E7500/P4DPE.cfm

Any workaround? Booting from release CDROMs (7.2R and 8.0R) also
fails. On the box, 7.1R or 7.1R's loader + 7.2R kernel worked
fine. It is possible that something in the changes of loader(8)
between 7.1R and 7.2R is the cause, but I am still not sure what it
is...

-- Hiroki
em interface slow down on 8.0R
Hi,

I noticed that the network connection of one of my boxes got
significantly slow just after upgrading it to 8.0R. The box has an
em0 (82547EI) and worked fine with 7.2R. The symptoms are:

 - A ping to a host on the same LAN takes 990ms RTT, it reduces
   gradually to around 1ms, and then it returns to around 1s. The
   rate was about 2ms/ping.

 - The response is quite slow, but there is no packet loss and network
   services on the box seem to work fine as far as I can check.

There does not seem to be an interrupt storm according to vmstat -i.
No error message such as watchdog timeout appears. Any ideas to
narrow down the cause? It may be a linkup problem with a specific
model of hub, like a full-duplex/half-duplex mismatch, but the link is
1000baseT full-duplex and setting it manually did not solve it. I
think it is certain that upgrading to 8.0R triggered it, at least.

Another box with an em interface works fine after upgrading to 8.0R.
It has a different chip (82573E).

Details of the em interface and vmstat -i are the following:

 e...@pci0:1:1:0: class=0x02 card=0x302c8086 chip=0x10198086 rev=0x00 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Gigabit Ethernet Controller (LOM) (82547EI)'
     class      = network
     subclass   = ethernet

 Adapter hardware address = 0xc42e1424
 em0: CTRL = 0x183c0241 RCTL = 0x8002
 em0: Packet buffer = Tx=10k Rx=30k
 em0: Flow control watermarks high = 28672 low = 27172
 em0: tx_int_delay = 66, tx_abs_int_delay = 66
 em0: rx_int_delay = 0, rx_abs_int_delay = 66
 em0: fifo workaround = 0, fifo_reset_count = 0
 em0: hw tdh = 49, hw tdt = 49
 em0: hw rdh = 238, hw rdt = 187
 em0: Num Tx descriptors avail = 250
 em0: Tx Descriptors not avail1 = 0
 em0: Tx Descriptors not avail2 = 0
 em0: Std mbuf failed = 0
 em0: Std mbuf cluster failed = 0
 em0: Driver dropped packets = 0
 em0: Driver tx dma failure in encap = 0

 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.14
 dev.em.0.%driver: em
 dev.em.0.%location: slot=1 function=0 handle=\_SB_.PCI0.P0P2.TANA
 dev.em.0.%pnpinfo: vendor=0x8086 device=0x1019 subvendor=0x8086 subdevice=0x302c class=0x02
 dev.em.0.%parent: pci1
 dev.em.0.debug: -1
 dev.em.0.stats: -1
 dev.em.0.rx_int_delay: 0
 dev.em.0.tx_int_delay: 66
 dev.em.0.rx_abs_int_delay: 66
 dev.em.0.tx_abs_int_delay: 66
 dev.em.0.rx_processing_limit: 100
 dev.em.0.wake: 0

 % vmstat -i
 interrupt                          total       rate
 irq4: uart0                         3585          3
 irq14: ata0                         1811          1
 irq15: ata1                          112          0
 irq16: uhci0 uhci3                    15          0
 irq18: em0 uhci2+                  92457         99
 irq19: uhci1                           1          0
 irq23: ehci0                           2          0
 cpu0: timer                      1849981       1997
 cpu1: timer                      1849961       1997
 Total                            3797925       4101

-- Hiroki
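The chip= value in the pciconf line and the vendor=/device= pair in dev.em.0.%pnpinfo above describe the same IDs: the low 16 bits of chip= match the vendor ID and the high 16 bits match the device ID. A minimal sketch of that split, with a hypothetical helper name:

```python
def split_chip_id(chip):
    """Return (vendor, device) from a 32-bit pciconf chip= value.

    The vendor ID sits in the low 16 bits and the device ID in the
    high 16 bits, as the pciconf and %pnpinfo output above show.
    """
    return chip & 0xFFFF, (chip >> 16) & 0xFFFF


vendor, device = split_chip_id(0x10198086)
print(f"vendor={vendor:#06x} device={device:#06x}")
```

The same split applies to the other chip= values mentioned in this thread, for example 0x108c8086 and 0x109a8086 for the two 82573 variants.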
Re: em interface slow down on 8.0R
Jack Vogel jfvo...@gmail.com wrote
  in 2a41acea0911301119j1449be58y183f2fe1d1112...@mail.gmail.com:

jf> I will look into this Hiroki, as time goes the older hardware does
jf> not always get test cycles like one might wish.

Thanks! Please let me know if you need more information.

-- Hiroki
possible loader regression on RELENG_7_2_0_RELEASE
While upgrading boxes in allbsd.org to RELENG_7_2_0_RELEASE, I found
that one of them could not boot at the loader stage. The error
messages issued by the loader after make installkernel + make
installworld + reboot were the following:

|Loading /boot/defaults/loader.conf
|/boot/kernel/kernel text=0x7cbd7c data=0xcece0+0x67940
|readin failed
|
|elf32_loadimage: read failed
|/boot/kernel/kernel text=0x7cbd7c data=0xcece0+0x67940
|readin failed
|
|elf32_loadimage: read failed
|Unable to load a kernel!

The normal loader prompt was displayed after that and I could enter
commands, but neither the new kernel nor some old kernels which I had
confirmed to work fine could be loaded. Then I tried a livefs CDROM,
but the same error occurred at the loader stage.

So I tried a 7.1R CDROM instead, mounted the root file system on the
hard drive, and copied a loader binary from 7.1R. It worked with no
problem with the RELENG_7_2_0_RELEASE kernel.

The motherboard was a Supermicro P4DPE (Xeon 2.4GHz x 2, 3GB RAM).
The installed version was FreeBSD/i386. I have not narrowed down the
cause yet because my time was limited, but it was reproducible and
probably hardware-dependent. Replacing the loader binary with the
old one worked as a workaround, so I guess there may be a regression
around the boot loader. Just a report.

-- 
| Hiroki SATO
IPv6 routing on 7.1R
Hi,

I noticed an odd behavior regarding IPv6 after upgrading my 7.0R box
to 7.1R. The situation and symptom are the following:

 1. The box has two NICs. One has an address 2001:0db8:1::1/64
    (NIC A), and another has 2001:0db8:2::1/64 (NIC B). These
    addresses are assigned manually ($ipv6_ifconfig in rc.conf).

 2. RA is periodically sent to the network 2001:0db8:1::1/64 (NIC A)
    by a router on the subnet. The RA includes a source link-layer
    address option only.

When setting net.inet6.ip6.accept_rtadv=1 in this configuration, I
expected the box to assign an autoconf IPv6 address (prefix
2001:0db8:1::/64 + EUI64) to NIC A and a default route based on the
source link-layer address in the RA packet. Actually, these two were
done as expected.

However, after the addresses are assigned, routes for NIC B
disappeared from the routing table. More specifically, a cloning
route 2001:0db8:2::1/64 -> link#2 was removed for some reason.

Is this an expected behavior? IIRC, 7.0R does not remove the route
and I think it is strange. It works fine if a box has a single NIC,
though.

-- 
| Hiroki SATO
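The "prefix + EUI64" autoconf address mentioned above is formed by deriving a modified EUI-64 interface identifier from the NIC's MAC address and appending it to the /64 prefix. The following is a small sketch of that derivation; the MAC address 00:11:22:33:44:55 is an arbitrary example, not one from this report, and the function name is hypothetical.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Build prefix + modified EUI-64 interface ID from a MAC address."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("modified EUI-64 needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC address.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return net.network_address + int.from_bytes(iid, "big")


# Example MAC address (assumption, for illustration only).
print(slaac_address("2001:0db8:1::/64", "00:11:22:33:44:55"))
```

Only NIC A should receive such an address in the setup described, since the RA is seen on that subnet alone.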
Re: IPv6 over gif(4) broken in 6.2-RELEASE?
Bruce A. Mah [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

bm> I'm observing a problem with IPv6 over gif(4) tunnels on 6.2-RELEASE
bm> and recent 6-STABLE, namely that I can't seem to be able to pass
bm> traffic over them.
bm>
bm> Essentially, when I configure a gif interface like this:
bm>
bm> # ifconfig gif0 inet6 :::::1 :::::2 prefixlen 128
bm>
bm> the interface should add a host route to :::::2
bm> through gif0. This is necessary to be able to pass traffic over the
bm> tunnel, particularly since the source and destination addresses of the
bm> link don't need to have any relationship to each other.
bm>
bm> However, this route doesn't get installed on recent 6-STABLE.
bm> Therefore there is no way to get an IPv6 packet to the other end of
bm> the tunnel because there's no route for the destination. The most
bm> obvious symptom is that I try to ping the other tunnel endpoint and
bm> get:
bm>
bm> ping6: UDP connect: No route to host
bm>
bm> I know this worked on RELENG_6 as of June 2006; my home firewall has
bm> been running this code for months without a hitch. It doesn't work in
bm> 6.2-RC2 or 6.2-RELEASE (fresh CD installs on i386, GENERIC kernels),
bm> or this week's RELENG_6 (nanobsd on i386).
bm>
bm> I somewhat suspect revs. 1.48.2.15 and 1.48.2.14 to
bm> src/sys/netinet/nd6.c. If I locally revert these two changes (see
bm> diff below), IPv6 over gif(4) works again.
bm>
bm> There's another workaround for people stuck in this situation and who
bm> aren't in a position to try this diff. That is to manually install
bm> the host route like this:
bm>
bm> # route add -host -inet6 :::::2 -interface gif0 -nostatic -llinfo
bm>
bm> Comments?

I remember Dimitry Andric reported the same problem on -stable on 30
Dec, and after he reverted rev.1.48.2.16 it worked fine again. Do
you have the symptom even on 6.2-RELEASE? Since RELENG_6_2_0_RELEASE
did not have the change, I thought there was no problem.

I will try to reproduce it on my box anyway...

-- 
| Hiroki SATO
Re: strange behavior of ioapic on PDSME motherboard
John Baldwin [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

jh> On Sunday 03 December 2006 22:55, Hiroki Sato wrote:
jh> > Hiroki Sato [EMAIL PROTECTED] wrote
jh> >   in [EMAIL PROTECTED]:
jh> >
jh> > John, are there any big changes of ioapic support between RELENG_6
jh> > and CURRENT? I would like your comments to narrow down the cause.
jh> > The 7.0-CURRENT November snapshot could probe the mpt (as a very
jh> > slow device, though), and both em and mpt worked with acpi/ioapic
jh> > enabled. I had a look at the changes in sys/i386/i386, but I am
jh> > not sure if which is likely (or not)...
jh>
jh> There aren't any non-cosmetic changes in the apic code between 6.x
jh> and HEAD.

Okay, thanks. So these symptoms are not directly related to mpt(4)
and the apic code, at least. I will continue to investigate what the
cause is, anyway.

-- 
| Hiroki SATO
panic in nfsd on 6.2-RC1
Hi,

One of my heavily loaded NFS servers running 6.2-RC1 has been
panicking repeatedly these days. I am not sure precisely which
upgrade the panic started after, but the server had been running for
almost one year (6.0R and 6.1R) with no problem, at least. A core
file is available.

(from here)
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc069d890
stack pointer           = 0x28:0xed0ae920
frame pointer           = 0x28:0xed0ae928
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 653 (nfsd)
trap number             = 12
panic: page fault
Uptime: 46m22s
Dumping 1021 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1022MB (261423 pages) 1006 990 974 958 942 926 910 894 878
  862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606
  590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334
  318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46
  30 14

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc067c512 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc067c7d8 in panic (fmt=0xc08d0c8e %s)
    at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc0892122 in trap_fatal (frame=0xed0ae8e0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:837
#4  0xc0891866 in trap (frame=
      {tf_fs = -992346104, tf_es = 40, tf_ds = 268107816, tf_edi = 72,
       tf_esi = 0, tf_ebp = -318052056, tf_isp = -318052084,
       tf_ebx = -993986688, tf_edx = -993986688, tf_ecx = 4, tf_eax = 4,
       tf_trapno = 12, tf_err = 0, tf_eip = -1066805104, tf_cs = 32,
       tf_eflags = 589831, tf_esp = 0, tf_ss = -1063278752})
    at /usr/src/sys/i386/i386/trap.c:270
#5  0xc088012a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#6  0xc069d890 in turnstile_broadcast (ts=0x0)
    at /usr/src/sys/kern/subr_turnstile.c:726
#7  0xc06739d7 in _mtx_unlock_sleep (m=0xc09fa760, opts=0, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:690
#8  0xc077e00b in nfs_rephead (siz=0, nd=0xc5023c00, err=72, mbp=0x4,
    bposp=0x4) at /usr/src/sys/nfsserver/nfs_srvsock.c:152
#9  0xc07779f3 in nfsrv_symlink (nfsd=0xc5023c00, slp=0xc4f8ae80,
    td=0xc4c0f780, mrq=0xed0aec98)
    at /usr/src/sys/nfsserver/nfs_serv.c:2844
#10 0xc07819b1 in nfssvc_nfsd (td=0x4)
    at /usr/src/sys/nfsserver/nfs_syscalls.c:474
#11 0xc0781194 in nfssvc (td=0xc4c0f780, uap=0xed0aed04)
    at /usr/src/sys/nfsserver/nfs_syscalls.c:181
#12 0xc0892437 in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 1, tf_esi = 0,
       tf_ebp = -1077941464, tf_isp = -318050972, tf_ebx = 12,
       tf_edx = 672449048, tf_ecx = 26, tf_eax = 155, tf_trapno = 12,
       tf_err = 2, tf_eip = 671863223, tf_cs = 51, tf_eflags = 662,
       tf_esp = -1077941492, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:983
#13 0xc088017f in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:200
#14 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)
(to here)

-- 
| Hiroki SATO
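kgdb prints the trapframe fields in frame #4 as signed decimal integers, which makes them hard to match against the hexadecimal addresses in the trap message. A short sketch of the conversion, assuming 32-bit registers (this is FreeBSD/i386); the helper name is made up for illustration:

```python
def as_u32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from the trapframe in frame #4 above.
frame = {"tf_eip": -1066805104, "tf_ebp": -318052056}

for name, value in frame.items():
    print(f"{name} = {as_u32(value):#010x}")
```

Converted this way, tf_eip and tf_ebp line up with the instruction pointer (0xc069d890, inside turnstile_broadcast in frame #6) and the frame pointer (0xed0ae928) reported in the trap message.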
Re: panic in nfsd on 6.2-RC1
Kostik Belousov [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

ko> What version of sys/nfsserver/nfs_serv.c do you use ? If it is older
ko> than 1.156.2.7, please, update the system.

Thanks, I updated it just now and will see how it works.

-- 
| Hiroki SATO
strange behavior of ioapic on PDSME motherboard (was: LSI 53C1030/mpt(4) problem)
Hiroki Sato [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

hr> Recently I bought Intel Pentium D 945 (3.45GHz), Supermicro PDSME
hr> (Intel E7320), and LSI21320RB (PCI-X SCSI HBA using LSI 53C1030). I
hr> installed 6.2-RC1 to an old PATA HDD and attached it to the
hr> motherboard, and it worked fine. However, I installed 21320RB and
hr> made several SCSI HDDs attached, some strange problems occurred.

It worked when I turned off ioapic and/or acpi. When acpi was
disabled, mpt seemed to work but em did not work due to the UP/DOWN
storm (vmstat -i did not display an irq for em0 at that time). When
ioapic was disabled, all devices worked with shared irqs. So, this
is probably an ioapic issue, not an mpt one, and specific to the
PDSME, I guess. Sorry for the false alarm.

John, are there any big changes of ioapic support between RELENG_6
and CURRENT? I would like your comments to narrow down the cause.
The 7.0-CURRENT November snapshot could probe the mpt (as a very slow
device, though), and both em and mpt worked with acpi/ioapic enabled.
I had a look at the changes in sys/i386/i386, but I am not sure
which of them is likely to be related (or not)...

Scott Long [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

sc> Hiroki Sato wrote:
sc> > Any suggestions for what I should do for this problem? I can send
sc> > more detail information from boot -v and/or dev.mpt.0.debug=5, but
sc> > not sure which message is important for diagnosing.
sc>
sc> Just for comparison, could you go back to FreeBSD 6.0 and see if the
sc> problems remain?

There was no difference when I tried, but it does not seem to be an
mpt problem, as I wrote above. Thanks for the suggestion, anyway.

Matthew Jacob [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

ly> > - 2006 Nov 7-CURRENT snapshot probes the two HDDs case, but the
ly> >   HDDs are recognized as very slow devices such as 6MB/s, and
ly> >   accessing it makes the box freeze, too.
ly>
ly> The 6MB/s thing I'm working on now. I have no clue about the other
ly> issues at this time.

I see. BTW, I confirmed that mpt worked on the November snapshot
except that the data transfer rate was 6.6MB/s. Is it worth trying
the latest current?

-- 
| Hiroki SATO
LSI 53C1030/mpt(4) problem
Hi,

Recently I bought an Intel Pentium D 945 (3.45GHz), a Supermicro PDSME (Intel E7320), and an LSI21320RB (a PCI-X SCSI HBA using the LSI 53C1030). I installed 6.2-RC1 on an old PATA HDD attached to the motherboard, and it worked fine. However, after I installed the 21320RB and attached several SCSI HDDs, some strange problems occurred.

First, the 21320RB was recognized by the mpt(4) driver. With no HDD attached it was recognized properly, so I turned off the box, connected one HDD to it, and rebooted. mpt(4) then recognized the HDD and it worked without problems. I thought it was okay, and connected more HDDs to the SCSI HBA. More specifically, the 21320RB has two channels, so I connected a hardware RAID box to each channel; each box actually contains five HDDs and is seen as one large HDD. When I rebooted the box after that, device probing at boot time stopped just before

  Waiting 5 seconds for SCSI devices to settle

Nothing, including the keyboard, worked at that point, so I turned off the box and disconnected the RAID boxes.

After several trials, I found that the 21320RB's behavior was somewhat strange:

- with no HDD: Basically works fine, but once two or more HDDs have been recognized, it freezes during device probing (just before the "Waiting..." message) even after the HDDs are removed. Resetting the card's configuration to the factory defaults via the BIOS setting seems to recover the state.

- with one HDD: Works fine after it is recognized.

- with two HDDs: Does not work if two HDDs are connected to each channel. The BIOS message from the HBA is normal, but FreeBSD device probing keeps failing in one of the following two ways:

  a) Freeze just before the "Waiting..." message.
  b) Freeze after the "Waiting..." message.

  In b), mpt(4) seems to reset the buses and wait for the responses, but with "boot -v" I saw it freeze after displaying "unretryable error".

I tried booting the box with no SCSI HDD, connecting the HDDs after boot, and running "camcontrol rescan all". It recognizes the connected HDDs successfully, and they can be accessed fine even if there is more than one. However, simultaneous access causes a solid freeze again.

Then I tried a RAID box which has one ID and several LUNs corresponding to the HDDs. It was recognized normally as multiple HDDs at boot time, and they can be accessed. Simultaneous access works, too.

After that, I tried daisy-chaining two RAID boxes and connected the two to one channel of the SCSI HBA. These RAID boxes have ID=0 and ID=1. FreeBSD freezes after the "Waiting..." message this time.

In short, I could make this configuration work only when a single RAID box (or SCSI HDD) is connected to the HBA, or when the connected HDDs share the same ID and differ only in LUN.

I investigated the following:

- 6.1R sometimes probes successfully in the two-HDD case, but accessing the HDDs makes the box freeze.

- The November 2006 7-CURRENT snapshot probes successfully in the two-HDD case, but the HDDs are recognized as very slow devices (about 6MB/s), and accessing them makes the box freeze, too.

- When the box freezes just before the "Waiting..." message, "boot -v" does not display any detailed messages there. In the case where it freezes after the "Waiting..." message, several messages are displayed by mpt(4).

- No panic in either case. In all cases, it silently freezes and does not respond to Ctrl-Alt-ESC.

- When I use an Intel D865GBF (a motherboard with the Intel 865 chipset) with the same HBA and the same RAID boxes, they work fine on 6.1-RC1. There the HBA is connected to a 33MHz PCI bus, not PCI-X, so that may make some difference.

Any suggestions for what I should do about this problem? I can send more detailed information from "boot -v" and/or dev.mpt.0.debug=5, but I am not sure which messages are important for diagnosis.

--
| Hiroki SATO
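For anyone who wants to repeat the hot-attach test from the message above (boot without SCSI HDDs, attach them, rescan, then access them simultaneously), the steps could be sketched roughly as follows. This is only a sketch under assumptions: the device names da0 and da1, the dd block size and count, and the helper names run and repro are hypothetical and not taken from the report. By default the script only prints the commands, because the real run is expected to freeze the box.

```shell
#!/bin/sh
# Sketch of the hot-attach test: rescan the buses, list devices, then read
# from every disk at the same time. Assumptions (not from the report):
# the device names in DISKS and the dd sizes.

DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands (default)
DISKS=${DISKS:-"da0 da1"}    # hypothetical device names

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    run sysctl dev.mpt.0.debug=5     # debug level mentioned in the report
    run camcontrol rescan all        # probe the newly attached HDDs
    run camcontrol devlist           # confirm that they were recognized
    for d in $DISKS; do
        # Start one reader per disk in the background (simultaneous access).
        run dd if=/dev/$d of=/dev/null bs=65536 count=16384 &
    done
    wait                             # wait for all readers to finish
}

repro
```

Running it as "DRY_RUN=0 sh script.sh" as root would actually execute the commands; doing so from a serial console makes it more likely that the last mpt(4) messages before the freeze are captured.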
Re: getopt_long and POSIXLY_CORRECT
Mikhail Teterin [EMAIL PROTECTED] wrote in [EMAIL PROTECTED]:

mi> Could a committer with interest in -stable, please, see to it, that Andrey's
mi> recent change to getopt_long makes it into 6.2-RELEASE?
mi>
mi> The change makes our implementation of getopt_long closer to GNULIB's and will
mi> make it easier to avoid code-duplication in some ports.

Approved. Thanks.

--
| Hiroki SATO
Re: cvs commit: www/en/releases/6.1R todo.sgml
Gleb Smirnoff [EMAIL PROTECTED] wrote in [EMAIL PROTECTED]:

gl> Is it possible place kern/87208 into TODO list for 6.1-RELEASE?
gl> The problem appeared to be a bad regression in 6.0-RELEASE,
gl> that hurted many users. The PR contains several test cases,
gl> description and patch for the problem.

Thanks, added just now. Will this description do?

--
| Hiroki SATO
tester needed: problems in 5.3R errata solved?
Hi all,

Before 5.4R is released, I would like to confirm which of the problems described in the 5.3R errata[*] are solved and which are not. If you had a problem on 5.3R and can confirm that it is solved (or still persists) on the 5.4-RC series, could you please let us know?

Any comments are welcome. Thanks in advance.

[*] http://www.FreeBSD.org/releases/5.3R/errata.html

--
| Hiroki SATO