Re: SLAAC not working

2017-08-09 Thread Hiroki Sato
Greg Rivers  wrote
  in <2045487.fzlpjxt...@flake.tharned.org>:

gc> >  2. What is shown by the command "ping6 ff02::1%lagg0" and "rtsol -dD lagg0"?
gc> >
gc> $ ping6 -c 2 ff02::1%lagg0
gc> PING6(56=40+8+8 bytes) fe80::ae16:2dff:fe1e:b880%lagg0 --> ff02::1%lagg0
gc> 16 bytes from fe80::ae16:2dff:fe1e:b880%lagg0, icmp_seq=0 hlim=64 time=0.181 ms
gc> 16 bytes from fe80::f415:63ff:fe2b:ea06%lagg0, icmp_seq=0 hlim=64 time=0.263 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=0.318 ms(DUP!)
gc> 16 bytes from fe80::8edc:d4ff:feaf:8938%lagg0, icmp_seq=0 hlim=64 time=0.369 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=0.803 ms(DUP!)
gc> 16 bytes from fe80::ae16:2dff:fe1e:e998%lagg0, icmp_seq=0 hlim=64 time=0.868 ms(DUP!)
gc> 16 bytes from fe80::ae16:2dff:fe1e:49f8%lagg0, icmp_seq=0 hlim=64 time=0.922 ms(DUP!)
gc> 16 bytes from fe80::226:55ff:fe2f:40a4%lagg0, icmp_seq=0 hlim=64 time=0.971 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:ea06%lagg0, icmp_seq=0 hlim=64 time=2.144 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=4.154 ms(DUP!)
gc> 16 bytes from fe80::f415:63ff:fe2b:e806%lagg0, icmp_seq=0 hlim=64 time=4.220 ms(DUP!)
gc> 16 bytes from fe80::ae16:2dff:fe1e:b880%lagg0, icmp_seq=1 hlim=64 time=0.222 ms

 You should have received a response from the router
 (64:a0:e7:45:63:43), namely fe80::66a0:e7ff:fe45:6343%lagg0, but for
 some reason that did not happen.  Did the router receive the ICMPv6
 echo requests sent from fe80::ae16:2dff:fe1e:b880?
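
 For reference, the link-local address expected from a given MAC
 address follows the modified EUI-64 rule: flip the universal/local
 bit of the first octet and insert ff:fe in the middle.  A minimal
 sketch in sh, assuming the router uses an EUI-64 derived address;
 mac_to_linklocal is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: derive the modified EUI-64 link-local address
# from a MAC address given as aa:bb:cc:dd:ee:ff.
mac_to_linklocal() (
	# Split the MAC on ":" into $1..$6 (runs in a subshell, so the
	# IFS change does not leak to the caller).
	IFS=:
	set -- $1
	unset IFS
	# Flip the universal/local bit (0x02) of the first octet.
	first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
	# Insert ff:fe in the middle and print four 16-bit groups;
	# %x drops leading zeros, as ifconfig and ping6 do.
	printf 'fe80::%x:%x:%x:%x\n' \
	    $(( 0x${first}$2 )) $(( 0x${3}ff )) $(( 0xfe$4 )) $(( 0x$5$6 ))
)

mac_to_linklocal 64:a0:e7:45:63:43	# prints fe80::66a0:e7ff:fe45:6343
```

 The result can then be compared with the source addresses in the
 ping6 output above.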

gc> >  2. What is shown by the command "ping6 ff02::1%lagg0" and "rtsol -dD lagg0"?
(snip)
gc> # rtsol -dD lagg0
gc> checking if lagg0 is ready...
gc> lagg0 is ready
gc> set timer for lagg0 to 1s
gc> New timer is 1s
gc> timer expiration on lagg0, state = 1
gc> send RS on lagg0, whose state is 2
gc> set timer for lagg0 to 4s
gc> New timer is 4s
gc> timer expiration on lagg0, state = 2
gc> send RS on lagg0, whose state is 2
gc> set timer for lagg0 to 4s
gc> New timer is 4s
gc> timer expiration on lagg0, state = 2
gc> send RS on lagg0, whose state is 2
gc> set timer for lagg0 to 1s
gc> New timer is 1s
gc> timer expiration on lagg0, state = 2
gc> No answer after sending 3 RSs
gc> stop timer for lagg0
gc> there is no timer

 This indicates that no RA came back from the router in answer to
 any of the RS messages sent.  Probably there is a problem with the
 link between lagg0 and the router that is not specific to IPv6.

-- Hiroki


pgpL4q7DN7PCO.pgp
Description: PGP signature


Re: SLAAC not working

2017-08-09 Thread Hiroki Sato
Greg Rivers  wrote
  in <1557648.bebeymq...@flake.tharned.org>:

gc> On Monday, August 07, 2017 15:57:04 Andrey V. Elsukov wrote:
gc> > So, set net.inet6.icmp6.nd6_debug=1 and show what you have in the
gc> >  ndp -p
gc> >  ndp -r
gc> >  ndp -i lagg0
gc> >
gc> # sysctl net.inet6.icmp6.nd6_debug=1
gc> net.inet6.icmp6.nd6_debug: 0 -> 1
gc> # suspend
gc> [1] + Stopped (SIGSTOP)su -
gc> $ ndp -p
gc> fe80::%lagg0/64 if=lagg0
gc> flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=0
gc>   No advertising router
gc> fe80::%lo0/64 if=lo0
gc> flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=0
gc>   No advertising router
gc> $ ndp -r
gc> $ ndp -i lagg0
gc> linkmtu=0, maxmtu=0, curhlim=64, basereachable=30s0ms, reachable=31s,
gc> retrans=1s0ms
gc> Flags: nud accept_rtadv auto_linklocal
gc>
gc> Clearly there's no SLAAC action. I can't find any NDP debug messages
gc> in the kernel message log or in the syslog. Where might they be going?

 The configuration looks correct to me, but two questions:

 1. Does "sysctl net.inet6.ip6.forwarding" command show "0"?

 2. What is shown by the command "ping6 ff02::1%lagg0" and "rtsol -dD lagg0"?
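
 The first question matters because router advertisements are not
 meant to be accepted when the host itself is forwarding.  Both
 preconditions can be checked from a script; the following is only a
 sketch, and has_accept_rtadv is a hypothetical helper that reads
 "ndp -i IF" output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin (the output of
# "ndp -i IF") lists the accept_rtadv flag.
has_accept_rtadv() {
	grep -qw accept_rtadv
}

# On a live FreeBSD host one would run, for example:
#   ndp -i lagg0 | has_accept_rtadv && echo "lagg0 accepts RAs"
#   sysctl -n net.inet6.ip6.forwarding    # a SLAAC host should show 0

# Self-contained demonstration using the Flags line quoted above:
printf 'Flags: nud accept_rtadv auto_linklocal\n' | has_accept_rtadv &&
    echo "accept_rtadv is set"
```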

-- Hiroki




Re: IPv6 works on em0 () but not on em1 () - what's wrong?

2017-01-10 Thread Hiroki Sato
Lev Serebryakov  wrote
  in <58756dde.5000...@freebsd.org>:

le>
le>  I have MoBo (Supermicro X9SCL-F) with two 1G NICs, first one (em0) is
le> based on 82579LM, and second one (em1) is based on 82574L.
le>
le>  When I'm using em0 with simple config:
le>
le> ifconfig_em0="inet 192.168.134.2 netmask 255.255.255.0 mtu 9000"
le> ifconfig_em0_ipv6="inet6 accept_rtadv"
le>
le>  everything works fine - em0 gets an IPv6 prefix from rtadvd on my router
le> and "tcpdump -n -i em0 icmp6" shows some traffic, like router and prefix
le> announcements. So far so good.
le>
le>  I want to use em1 (and not use em0 at all), because the 82579LM has some
le> known bugs according to SuperMicro support and sometimes hangs the whole system.
le>
le>   So, I change config to
le>
le> ifconfig_em1="inet 192.168.134.2 netmask 255.255.255.0 mtu 9000"
le> ifconfig_em1_ipv6="inet6 accept_rtadv"
le>
le>  connect em1 instead of em0 to the switch and reboot. After that, the
le> interface (em1) cannot get an IPv6 prefix, does not get a global address
le> (it shows only the link-local one), and "tcpdump -n -i em1 icmp6" shows
le> nothing at all! IPv4 works fine, though.
le>
le>  What am I doing wrong? Is it a known issue with the 82574L?
le>
le>  I'm running 10-STABLE r311462.

 What happens when you run the following command?

 % ping6 ff02::1%em1

-- Hiroki




Re: stf(4) on 10-stable

2016-02-05 Thread Hiroki Sato
Daniel Bilik  wrote
  in <20160205093713.1c1453f9b5d06a6b366c4...@neosystem.cz>:

dd> On Thu, 14 Jan 2016 10:49:37 +0100
dd> Daniel Bilik  wrote:
dd>
dd> >> Should I create PR for this?
dd> > Created:
dd> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206231
dd>
dd> Seems that 10-stable has just entered beta1, so unless some effort is
dd> put into fixing this, 10.3-release is probably gonna ship with broken 6to4
dd> connectivity.

 I am sorry for not taking care of this in a timely manner.  I will
 work on it this weekend.

-- Hiroki




Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-20 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20130718.123323.1730389945845032580@allbsd.org:

hr Michael Grimm trash...@odo.in-berlin.de wrote
hr   in eb3c4472-02bf-4415-bb2d-b4929063d...@odo.in-berlin.de:
hr
hr tr On 12.07.2013, at 09:03, Hiroki Sato h...@freebsd.org wrote:
hr tr
hr tr  Please let me know if the existing configurations and/or the new
hr tr  formats do not work.
hr tr
hr tr First of all: great work! It is that much easier to deal with aliases, now.
hr tr
hr tr There is only one minor issue, if at all:
hr tr
hr tr rc.conf:
hr tr | ifconfig_em0_ipv6=inet6 dead:beef::::1 prefixlen 56
hr tr | ifconfig_em0_aliases=\
hr tr | inet6 dead:beef::::2-3 prefixlen 56 \
hr tr | inet6 dead:beef::::4 prefixlen 56 \
hr tr | inet6 dead:beef::::5-6/56
hr tr
hr tr ifconfig:
hr tr |   inet6 dead:beef::::1 prefixlen 56
hr tr |   inet6 dead:beef::::2 prefixlen 64
hr tr |   inet6 dead:beef::::3 prefixlen 64
hr tr |   inet6 dead:beef::::4 prefixlen 56
hr tr |   inet6 dead:beef::::5 prefixlen 56
hr tr |   inet6 dead:beef::::6 prefixlen 56
hr tr
hr tr Any combination of a range definition (2-3) *and* prefixlen 56 is ignored
hr tr whereas a range definition (5-6) *and* /56 is interpreted as wanted.
hr tr
hr tr Well, that combination of a range and prefix isn't documented, thus I am
hr tr not sure if that's an issue or a feature?
hr
hr  It seems a bug.  Thank you for your report.  I am investigating it now.

 Can you test the attached patch?  The old version (in stable/9 now)
 does not properly support an address range spec combined with
 options, and ignores the options part.

 The attached patch accepts options and treats netmask (for inet) and
 prefixlen (for inet6) in a reasonable way, so that the specified
 options do not conflict with the default /NN values.
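
 The idea can be illustrated outside network.subr.  The following is
 a simplified sketch, not the patched code itself: emit_inet6_alias is
 a hypothetical function that drops the /NN suffix only when an
 explicit prefixlen option is present, using the same ${var%/[0-9]*}
 expansion as the patch.

```shell
#!/bin/sh
# Hypothetical illustration of the "options win over /NN" rule.
# usage: emit_inet6_alias addr[/plen] [options...]
emit_inet6_alias() (
	addr=$1
	shift
	opts=$*
	case $opts in
	*prefixlen*)
		# An explicit prefixlen option is present: strip the
		# /NN suffix so that the two cannot conflict.
		echo "inet6 ${addr%/[0-9]*} $opts"
		;;
	*)
		# No prefixlen option: keep the address spec untouched.
		echo "inet6 $addr${opts:+ $opts}"
		;;
	esac
)

emit_inet6_alias 2001:db8::2/64 prefixlen 56	# prints inet6 2001:db8::2 prefixlen 56
emit_inet6_alias 2001:db8::5/56			# prints inet6 2001:db8::5/56
```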

-- Hiroki
Index: etc/network.subr
===
--- etc/network.subr	(revision 253489)
+++ etc/network.subr	(working copy)
@@ -721,9 +721,14 @@
 #
 ifalias_expand_addr()
 {
+	local _af _action

-	afexists $1 || return
-	ifalias_expand_addr_$1 $2 $3
+	_af=$1
+	_action=$2
+	shift 2
+
+	afexists $_af || return
+	ifalias_expand_addr_$_af $_action $*
 }

 # ifalias_expand_addr_inet action addr
@@ -731,19 +736,31 @@
 #
 ifalias_expand_addr_inet()
 {
-	local _action _arg _cidr _cidr_addr
+	local _action _arg _cidr _cidr_addr _exargs
 	local _ipaddr _plen _range _iphead _iptail _iplow _iphigh _ipcount
 	local _retstr _c
 	_action=$1
 	_arg=$2
+	shift 2
+	_exargs=$*
 	_retstr=

-	case $_action:$_arg in
+	case $_action:$_arg:$_exargs in
 	*:*--*)		return ;;			# invalid
-	tmp:*)		echo $_arg && return ;;		# already expanded
-	tmp:*-*)	_action=alias	;;		# to be expanded
-	*:*-*)		;;				# to be expanded
-	*:*)		echo inet $_arg && return ;;	# already expanded
+	tmp:*:*netmask*)		# already expanded w/ netmask option
+		echo ${_arg%/[0-9]*} $_exargs && return
+	;;
+	tmp:*:*)			# already expanded w/o netmask option
+		echo $_arg $_exargs && return
+	;;
+	tmp:*[0-9]-[0-9]*:*)	_action=alias	;;	# to be expanded
+	*:*[0-9]-[0-9]*:*)	;;			# to be expanded
+	*:*:*netmask*)			# already expanded w/ netmask option
+		echo inet ${_arg%/[0-9]*} $_exargs && return
+	;;
+	*:*:*)				# already expanded w/o netmask option
+		echo inet $_arg $_exargs && return
+	;;
 	esac

 	for _cidr in $_arg; do
@@ -796,7 +813,7 @@
 	done

 	for _c in $_retstr; do
-		ifalias_expand_addr_inet $_action $_c
+		ifalias_expand_addr_inet $_action $_c $_exargs
 	done
 }

@@ -805,20 +822,32 @@
 #
 ifalias_expand_addr_inet6()
 {
-	local _action _arg _cidr _cidr_addr
+	local _action _arg _cidr _cidr_addr _exargs
 	local _ipaddr _plen _ipleft _ipright _iplow _iphigh _ipcount
 	local _ipv4part
 	local _retstr _c
 	_action=$1
 	_arg=$2
+	shift 2
+	_exargs=$*
 	_retstr=

-	case $_action:$_arg in
-	*:*--*)		return ;;			# invalid
-	tmp:*)		echo $_arg && return ;;
-	tmp:*-*)	_action=alias	;;
-	*:*-*)		;;
-	*:*)		echo inet6 $_arg && return ;;
+	case $_action:$_arg:$_exargs in
+	*:*--*:*)	return ;;			# invalid
+	tmp:*:*prefixlen*)	# already expanded w/ prefixlen option
+		echo ${_arg%/[0-9]*} $_exargs && return
+	;;
+	tmp:*:*)		# already expanded w/o prefixlen option
+		echo $_arg $_exargs && return
+	;;
+	tmp:*[0-9a-zA-Z]-[0-9a-zA-Z]*:*)	_action=alias	;;	# to be expanded
+	*:*[0-9a-zA-Z]-[0-9a-zA-Z]*:*)	;;		# to be expanded
+	*:*:*prefixlen*)	# already expanded w/ prefixlen option
+		echo inet6 ${_arg%/[0-9]*} $_exargs && return
+	;;
+	*:*:*)			# already expanded w/o prefixlen option
+		echo inet6 $_arg $_exargs && return
+	;;
 	esac

 	for _cidr in $_arg; do
@@ -872,7 +901,7 @@
 			fi

 			for _c in $_retstr; do
-				ifalias_expand_addr_inet6 $_action $_c
+				ifalias_expand_addr_inet6 $_action $_c $_exargs
 			done
 		else
 			# v4mapped/v4compat should handle as an IPv4 alias
@@ -888,7 +917,7 @@
 			_retstr=`ifalias_expand_addr_inet \
 			tmp ${_ipv4part}${_plen:+/}${_plen

Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-20 Thread Hiroki Sato
Michael Grimm trash...@odo.in-berlin.de wrote
  in 5c2419e4-d5b7-4f1a-aed0-90ef73305...@odo.in-berlin.de:

tr On 20.07.2013, at 16:46, Hiroki Sato h...@freebsd.org wrote:
tr  Hiroki Sato h...@freebsd.org wrote in 20130718.123323.1730389945845032580@allbsd.org:
tr 
tr  Can you test the attached patch?  The old version (in stable/9 now)
tr  does not support address range spec + options properly and ignore
tr  the options part.
tr 
tr  The attached patch accepts options and treats netmask for inet and
tr  prefixlen in inet6 in a reasonable way so that the specified
tr  options do not conflict with the default /NN values.
tr
tr I can confirm that your patch is working for my examples used before.
tr
tr Now, a range definition and prefixlen 56 is recognized properly:

 Thank you.  Committed as r253505 and will be merged to stable/9.

-- Hiroki




Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-17 Thread Hiroki Sato
Łukasz Wąsikowski luk...@wasikowski.net wrote
  in 51e53ac7.1040...@wasikowski.net:

lu hr# IPv4 address range spec.  Now deprecated.
lu hripv4_addr_em0=10.2.1.1-10
lu 
lu So I'm a little confused now :) If I'd use post r252015 system then
lu would this be better way?
lu 
lu ifconfig_em0_aliases=inet 10.0.0.66/28 inet 10.0.0.67-78 inet6
lu fdda:5cc1:23:4::1/48 inet6 fdda:5cc1:23:4::2-f

Dewayne Geraghty dewayne.gerag...@heuristicsystems.com.au wrote
  in 14677223DB6D4CD48E880520725B3552@white:

de Sato-san,
de 
de You have provided a very useful summary of ifconfig parameters for
de rc.conf. However, you are missing one example that would provide
de clearer understanding.  Would you please advise if 
de  
de ipv4_addr_em0=10.2.1.1-10/32
de 
de is deprecated, backward compatible or remains valid into the future?
de 
de I particularly appreciate the succinctness of:
de ifconfig_em0_aliases=inet 10.3.3.201-204/24 inet6
de 2001:db8:210-213::1/64 inet 10.1.1.1/24

 The recommended way is ifconfig_IF_aliasN or ifconfig_IF_aliases.
 ipv4_addrs_IF will not be removed in the near future, but please use
 ifconfig_IF_alias{N,es} for newly configured systems.  Backward
 compatibility will be maintained as much as possible so that
 existing configurations do not break (even on the upcoming 10.0R and
 later).

 This is because rc.conf has a lot of variables with (almost) the
 same functionality and I want to simplify them by merging them with
 each other, not because these are better than the others.
 Variables with overlapping functionality have made it difficult to
 maintain and improve the rc.d scripts.

-- Hiroki




Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-17 Thread Hiroki Sato
Mark Felder f...@freebsd.org wrote
  in 1374062120.4532.140661256673649.36ed5...@webmail.messagingengine.com:

fe On Wed, Jul 17, 2013, at 4:36, Hiroki Sato wrote:
fe 
fe   The recommended way is ifconfig_IF_aliasN or ifconfig_IF_aliases.
fe   ipv4_addr_IF will not be removed in the near future, but please use
fe   ifconfig_IF_alias{N,es} for newly-configured systems.  Backward
fe   compatibility for not breaking the existing configurations will be
fe   maintained as much as possible (even on the upcoming 10.0R and
fe   later).
fe 
fe
fe Almost everyone is familiar with ifconfig_IF_aliasN, but can you provide
fe example syntax for ifconfig_IF_aliases ? I've never seen that before and
fe can't find it documented.

 I committed some descriptions about it to rc.conf(5) at the same
 time.  It is basically the same as ifconfig_IF_aliasN, but can hold
 multiple address specifications.  Both ifconfig_IF_aliasN and
 ifconfig_IF_aliases now support range specifications, so there is no
 difference in functionality.  The following two examples give the
 same result:

   ifconfig_ed0_alias0="inet 127.0.0.251 netmask 0x"
   ifconfig_ed0_alias1="inet 127.0.0.252 netmask 0x"
   ifconfig_ed0_alias2="inet 127.0.0.253 netmask 0x"
   ifconfig_ed0_alias3="inet 127.0.0.254 netmask 0x"

   ifconfig_ed0_aliases="\
 inet 127.0.0.251 netmask 0x \
 inet 127.0.0.252 netmask 0x \
 inet 127.0.0.253 netmask 0x \
 inet 127.0.0.254 netmask 0x \
   "

 The implementation actually converts the values of
 ifconfig_IF_aliasN, ipv6_ifconfig_IF_aliasN, and ipv4_addrs_IF into a
 list in the consistent format used by ifconfig_IF_aliases (AF keyword
 + address spec + options), and then processes that list together with
 ifconfig_IF_aliases itself.

 ifconfig_IF_aliasN accepts address spec without address family
 keyword for backward compatibility, but ifconfig_IF_aliases does not.
 This is the difference between the two.
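
 As a rough illustration of that normalization step (this is not the
 network.subr code; normalize_alias is a hypothetical name, and
 guessing the address family from a ":" in the address is my
 simplifying assumption):

```shell
#!/bin/sh
# Hypothetical sketch: turn an ifconfig_IF_aliasN style value into the
# "AF-keyword + address spec + options" form used by
# ifconfig_IF_aliases.
normalize_alias() (
	case $1 in
	inet|inet6)
		# AF keyword already present: pass through unchanged.
		echo "$*"
		;;
	*:*)
		# No AF keyword and the address contains ":";
		# assume IPv6.
		echo "inet6 $*"
		;;
	*)
		# No AF keyword otherwise; assume IPv4.
		echo "inet $*"
		;;
	esac
)

normalize_alias 10.3.1.1-10/32			# prints inet 10.3.1.1-10/32
normalize_alias 2001:db8:f:1::1/64		# prints inet6 2001:db8:f:1::1/64
normalize_alias inet6 2001:db8:5::1 prefixlen 70	# printed unchanged
```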

fe This thread isn't exactly the proper forum to debate the future of
fe network configuration on FreeBSD, but please take this into
fe consideration. And thank you for your work on the rc.d scripts --
fe they're the #1 reason many of us prefer working with FreeBSD.

 Fair enough.  Please do not hesitate to speak up on freebsd-rc@
 about this kind of topic.

-- Hiroki




Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-17 Thread Hiroki Sato
Michael Grimm trash...@odo.in-berlin.de wrote
  in eb3c4472-02bf-4415-bb2d-b4929063d...@odo.in-berlin.de:

tr On 12.07.2013, at 09:03, Hiroki Sato h...@freebsd.org wrote:
tr
tr  Please let me know if the existing configurations and/or the new
tr  formats do not work.
tr
tr First of all: great work! It is that much easier to deal with aliases, now.
tr
tr There is only one minor issue, if at all:
tr
tr rc.conf:
tr | ifconfig_em0_ipv6=inet6 dead:beef::::1 prefixlen 56
tr | ifconfig_em0_aliases=\
tr | inet6 dead:beef::::2-3 prefixlen 56 \
tr | inet6 dead:beef::::4 prefixlen 56 \
tr | inet6 dead:beef::::5-6/56
tr
tr ifconfig:
tr |   inet6 dead:beef::::1 prefixlen 56
tr |   inet6 dead:beef::::2 prefixlen 64
tr |   inet6 dead:beef::::3 prefixlen 64
tr |   inet6 dead:beef::::4 prefixlen 56
tr |   inet6 dead:beef::::5 prefixlen 56
tr |   inet6 dead:beef::::6 prefixlen 56
tr
tr Any combination of a range definition (2-3) *and* prefixlen 56 is ignored
tr whereas a range definition (5-6) *and* /56 is interpreted as wanted.
tr
tr Well, that combination of a range and prefix isn't documented, thus I am
tr not sure if that's an issue or a feature?

 It seems a bug.  Thank you for your report.  I am investigating it now.

-- Hiroki




Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-12 Thread Hiroki Sato
Michael Grimm trash...@odo.in-berlin.de wrote
  in 4c07217dc9200841dfd065a6d5284...@mx1.enfer-du-nord.net:

tr On 2013-07-12 6:56, Hiroki Sato wrote:
tr  Kevin Oberman rkober...@gmail.com wrote
trin can6yy1srswemj2_bjx_drzmxgk4tf50_ode8o8i2d6wtrgw...@mail.gmail.com:
tr  rk On Wed, Jul 10, 2013 at 4:46 AM, Mark Felder f...@feld.me wrote:
tr  rk
tr  rk  On Wed, 10 Jul 2013 06:44:12 -0500, Michael Grimm 
tr  rk  trash...@odo.in-berlin.de wrote:
tr  rk 
tr  rk   Will that patch make it into 9.2? If I am not mistaken, that patch isn't
tr  rk  in stable yet.
tr  rk 
tr  rk 
tr  rk  I would also like to see this patch hit 9.x sooner than later. It's so
tr  rk  painful when someone forgets to fix the alias numbering on servers with
tr  rk  many, many IPv4 and IPv6 addresses...
tr  rk 
tr  rk
tr  rk Please, please, please, please, ...!
tr  rk
tr  rk Freeze is only two days away, so time for 9.2 is almost over and I can see
tr  rk no good reason NOT to get this done.
tr   r252015 was merged to stable/9 today.
tr
tr Thanks! This is highly appreciated. A first glance at network.subr tells me that
tr much more has been modified/simplified regarding alias definitions, great.

 Please let me know if the existing configurations and/or the new
 formats do not work.  The following is a summary of the supported
 rc.conf variables, FYI:

Hiroki Sato h...@freebsd.org wrote
  in 201306200229.r5k2tnfr085...@svn.freebsd.org:

hr   A summary of the supported ifconfig_* variables is as follows:
hr
hr    # IPv4 configuration.
hr    ifconfig_em0="inet 192.168.0.1"
hr    # IPv6 configuration.
hr    ifconfig_em0_ipv6="inet6 2001:db8::1/64"
hr    # IPv4 address range spec.  Now deprecated.
hr    ipv4_addrs_em0="10.2.1.1-10"
hr    # IPv6 alias.
hr    ifconfig_em0_alias0="inet6 2001:db8:5::1 prefixlen 70"
hr    # IPv4 alias.
hr    ifconfig_em0_alias1="inet 10.2.2.1/24"
hr    # IPv4 alias with range spec w/o AF keyword (backward compat).
hr    ifconfig_em0_alias2="10.3.1.1-10/32"
hr    # IPv6 alias with range spec.
hr    ifconfig_em0_alias3="inet6 2001:db8:20-2f::1/64"
hr    # ifconfig_IF_aliases is just like ifconfig_IF_aliasN.
hr    ifconfig_em0_aliases="inet 10.3.3.201-204/24 inet6 2001:db8:210-213::1/64 inet 10.1.1.1/24"
hr    # IPv6 alias (backward compat)
hr    ipv6_ifconfig_em0_alias0="inet6 2001:db8:f::1/64"
hr    # IPv6 alias w/o AF keyword (backward compat)
hr    ipv6_ifconfig_em0_alias1="2001:db8:f:1::1/64"
hr    # IPv6 prefix.
hr    ipv6_prefix_em0="2001:db8::/64"
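
 To make the range notation concrete, here is a small sketch of how a
 spec such as 10.3.3.201-204/24 can be expanded.  It is an
 illustration only: expand_inet_range is a hypothetical function, it
 handles a range in the last octet only, and it applies the same /NN
 to every address, which is a simplification and may differ from what
 network.subr actually assigns.

```shell
#!/bin/sh
# Hypothetical sketch: expand an IPv4 range in the last octet,
# e.g. 10.3.3.201-204/24, into one "inet ADDR/NN" line per address.
expand_inet_range() (
	spec=$1
	# Separate the optional /NN suffix from the address part.
	case $spec in
	*/*)	plen=/${spec#*/}; addr=${spec%/*} ;;
	*)	plen=; addr=$spec ;;
	esac
	head=${addr%.*}		# first three octets, e.g. 10.3.3
	last=${addr##*.}	# last octet or range, e.g. 201-204
	low=${last%-*}
	high=${last#*-}		# equals $low when there is no "-"
	i=$low
	while [ "$i" -le "$high" ]; do
		echo "inet $head.$i$plen"
		i=$((i + 1))
	done
)

expand_inet_range 10.3.3.201-204/24
# prints:
#   inet 10.3.3.201/24
#   inet 10.3.3.202/24
#   inet 10.3.3.203/24
#   inet 10.3.3.204/24
```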

-- Hiroki




Re: ipv6_addrs_IF aliases in rc.conf(5)

2013-07-11 Thread Hiroki Sato
Kevin Oberman rkober...@gmail.com wrote
  in can6yy1srswemj2_bjx_drzmxgk4tf50_ode8o8i2d6wtrgw...@mail.gmail.com:

rk On Wed, Jul 10, 2013 at 4:46 AM, Mark Felder f...@feld.me wrote:
rk
rk  On Wed, 10 Jul 2013 06:44:12 -0500, Michael Grimm 
rk  trash...@odo.in-berlin.de wrote:
rk 
rk   Will that patch make it into 9.2? If I am not mistaken, that patch isn't
rk  in stable yet.
rk 
rk 
rk  I would also like to see this patch hit 9.x sooner than later. It's so
rk  painful when someone forgets to fix the alias numbering on servers with
rk  many, many IPv4 and IPv6 addresses...
rk 
rk
rk Please, please, please, please, ...!
rk
rk Freeze is only two days away, so time for 9.2 is almost over and I can see
rk no good reason NOT to get this done.

 r252015 was merged to stable/9 today.

-- Hiroki




request for your comments on release documentation

2013-06-12 Thread Hiroki Sato
Hi,

 I would like your comments on the release notes for each release.
 Although I have been editing them for years, the workflow is still
 not optimal, and delays in their preparation have sometimes become an
 obstacle to the release process.  I would like to improve it, but
 first I would like to know what people want from the contents.

 The Release Notes simply list the changes between two releases.
 They include user-visible changes (bug fixes and/or UI changes), new
 functionality, and performance improvements.  Minor changes, such as
 ones in kernel internal structures, are omitted.  I always try to
 keep the relnotes items correct and reasonably comprehensive, but
 such a lengthy list may be boring, and technically correct
 descriptions can be cryptic for average users.

 So, my questions are:

 1. What do you think about the current granularity of the relnotes
    items?  Too detailed, good, or too rough?  Currently, the judgment
    of what is included is based on whether a change is user-visible,
    new functionality, or a performance improvement.  Applicable
    changes are included as relnotes items even if they are small.

 2. Do you want technical details?  For example, just "disk access
    performance was improved by 50%", or "Feature A has been added.
    This changes the old behavior because ..., and as a result, it
    improves disk access performance by 50%".

 3. Is there missing information which should be in the relnotes?
    Probably there are some missing items for each release, but this
    question is about a more abstract level: links to commit logs and
    diffs, detailed descriptions of major incompatible changes, and so
    on.

 Although the other release documents (Errata, Installation Notes,
 ReadMe, and Hardware Notes) also need some improvements, please
 focus on the Release Notes only.  And although you might think the
 quality of the English writing is not good, please leave that alone
 for now.

-- Hiroki




Re: Possible 8.4 regression

2013-06-08 Thread Hiroki Sato
Alexander Pyhalov a...@rsu.ru wrote
  in 4fffaaf8a6667175fca94ce32f25a...@sfedu.ru:

al Hello.
al
al Just wanted to share a notice.
al I had a 8.3 system with PostgreSQL running in a jail.
al rc.conf has the following lines:
al
al jail_enable="YES"
al jail_sysvipc_allow="YES"
al jail_mount_enable="YES"
al jail_devfs_enable="YES"
al
al jail_pgsql_rootdir="/jails/run/pgsql"
al jail_pgsql_hostname="pgsql.freebsd"
al jail_pgsql_ip="my.ip"
al jail_pgsql_interface="em0"
al
al It was running normally. However, after the update to 8.4 I had to add the
al following parameter:
al jail_pgsql_parameters="allow.sysvipc"
al
al Without it shmget in jail didn't work.

 Thank you for the report.  This affects jail_set_hostname_allow and
 jail_socket_unixiproute_only as well.  I will add it to Errata.

-- Hiroki




Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-28 Thread Hiroki Sato
YongHyeon PYUN pyu...@gmail.com wrote
  in 20130528023300.ga3...@michelle.cdnetworks.com:

py  I'll have access to the other box on Wednesday and will try the other test.
py
py Here is patch I'm testing and it seems to work with dhclient on
py CURRENT.
py Mike, could you try attached patch?

 On my box it worked without problems.  With the patched driver, the
 link status of fxp0 changed only once (down, then up).

-- Hiroki




Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-24 Thread Hiroki Sato
YongHyeon PYUN pyu...@gmail.com wrote
  in 20130524054720.ga1...@michelle.cdnetworks.com:

py On Thu, May 23, 2013 at 09:49:19PM -0700, Jeremy Chadwick wrote:
py  On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote:
py   On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
pyOn Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
py If someone wants me to test DHCP via fxp(4) on the above system (I can
py do so with both NICs), just let me know; it should only take me half an
py hour or so.
py
py I'll politely wait for someone to say please do so else won't bother.
py
py   
pyFor the sake of completeness...
py   
pyPlease do so.  :)
py  
py   Issue reproduced 100% reliably, even within sysinstall.
py  
py   {snip}
py 
py  Forgot to add:
py 
py  This issue ONLY happens when using DHCP.
py 
py  Statically assigning the IP address works fine; fxp0 goes down once,
py  up once, then stays up indefinitely.
py
py I asked Mike to try backing out dhclient(8) change(r247336) but it
py seems he missed that. Jeremy, could you try that?
py
py I guess dhclient(8) does not like flow-control negotiation of
py fxp(4) after link establishment.

 Okay, I could reproduce this issue on my box.  After dhclient(8) is
 invoked, the link comes up and then state_reboot() drops the
 established link.  Removing the changes around RTM_IFINFO in r247336
 makes it work with no problem.

 A workaround is specifying the following line in rc.conf:

 ifconfig_fxp0="DHCP media 100baseTX mediaopt full-duplex"

-- Hiroki




Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-24 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20130524.162926.395058052118975996@allbsd.org:

hr YongHyeon PYUN pyu...@gmail.com wrote
hr   in 20130524054720.ga1...@michelle.cdnetworks.com:
hr
hr  A workaround is specifying the following line in rc.conf:
hr
hr  ifconfig_fxp0=DHCP media 100baseTX mediaopt full-duplex

 Hmm, I guess this can happen on other NICs as well when the link
 negotiation causes a link-state flap.  Is that correct?

-- Hiroki




Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Hiroki Sato
Jeremy Chadwick j...@koitsu.org wrote
  in 20130524044035.ga40...@icarus.home.lan:

jd On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
jd  On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
jd   If someone wants me to test DHCP via fxp(4) on the above system (I can
jd   do so with both NICs), just let me know; it should only take me half an
jd   hour or so.
jd  
jd   I'll politely wait for someone to say please do so else won't bother.
jd  
jd 
jd  For the sake of completeness...
jd 
jd  Please do so.  :)
jd
jd Issue reproduced 100% reliably, even within sysinstall.
jd
jd ISO image used:
jd
jd ftp://ftp4.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/8.4/FreeBSD-8.4-RC3-i386-disc1.iso
jd
jd I just chose to Configure the system, selected Networking, chose NO to
jd the IPv6 configuration choice, and YES to the DHCP configuration choice,
jd then hit Alt-F2 to watch relevant output.
jd
jd This was the result:
jd
jd http://imgbin.org/index.php?page=imageid=13718
jd
jd ...with the fxp0 physif up/down messages continuing indefinitely.
jd
jd fxp0 on the system is the Intel 82559.  Shot of console's dmesg:
jd
jd http://imgbin.org/index.php?page=imageid=13720

 Hmm, I tried RC3 on one of my test machines which has fxp0:


 FreeBSD 8.4-RC3 #0 r250307: Tue May  7 04:40:16 UTC 2013
r...@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC i386
 ...
 fxp0: Intel 82559 Pro/100 Ethernet port 0x2800-0x283f mem 0xc4ffe000-0xc4ffefff,0xc4e0-0xc4ef irq 10 at device 3.0 on pci0
 miibus0: MII bus on fxp0
 inphy0: i82555 10/100 media interface PHY 1 on miibus0
 inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
 fxp0: Ethernet address: 00:02:a5:eb:14:93
 fxp0: [ITHREAD]

 fxp0@pci0:0:3:0:class=0x02 card=0xb1340e11 chip=0x12298086 rev=0x08 hdr=0x00
vendor = 'Intel Corporation'
device = '82550/1/7/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
class  = network
subclass   = ethernet

 dev.inphy.0.%desc: i82555 10/100 media interface
 dev.inphy.0.%driver: inphy
 dev.inphy.0.%location: phyno=1
 dev.inphy.0.%pnpinfo: oui=0xaa00 model=0x15 rev=0x4
 dev.inphy.0.%parent: miibus0


 It worked well for a PXE boot at least.  I will give dhclient a try
 later.

-- Hiroki




Re: NFS-exported ZFS instability

2013-01-29 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20130104.023244.472910818423317661@allbsd.org:

hr Konstantin Belousov kostik...@gmail.com wrote
hr   in 20130102174044.gb82...@kib.kiev.ua:
hr
hr ko  I might take a closer look this evening and see if I can spot anything
hr ko  in the log, rick
hr ko  ps: I hope Alan and Kostik don't mind being added to the cc list.
hr ko
hr ko What I see in the log is that the lock cascade rooted in the thread
hr ko 100838, which owns system map mutex. I believe this prevents malloc(9)
hr ko from making a progress in other threads, which e.g. own the ZFS vnode
hr ko locks. As the result, the whole system wedged.
hr ko
hr ko Looking back at the thread 100838, we can see that it executes
hr ko smp_tlb_shootdown(). It is impossible to tell from the static dump,
hr ko is the appearance of the smp_tlb_shootdown() in the backtrace is
hr ko transient, or the thread is spinning there, waiting for other CPUs to
hr ko acknowledge the request. But, since the system wedged, most likely,
hr ko smp_tlb_shootdown spins.
hr ko
hr ko Taking this hypothesis, the situation can occur, most likely, due to
hr ko some other core running with the interrupts disabled. Inspection of the
hr ko backtraces of the processes running on all cores does not show any which
hr ko could legitimately own a spinlock or otherwise run with the interrupts
hr ko disabled.
hr ko
hr ko One thing you could try to do is to enable WITNESS for the spinlocks,
hr ko to try to catch the leaked spinlock. I very much doubt that this is
hr ko the case.
hr ko
hr ko Another thing to try is to switch the CPU idle method to something
hr ko else. Look at the machdep.idle* sysctls. It could be some CPU errata
hr ko which blocks wakeup due the interrupt in some conditions in C1 ?
hr
hr  Thank you.  It can take 1-2 weeks to reproduce this, so I have set
hr  debug.witness.skipspin=0, am keeping machdep.idle at acpi, and will
hr  see how it goes for a while.  I will report again if I get another
hr  freeze.

 Hmm, I could reproduce the same freeze when debug.witness.skipspin=0,
 too.  DDB and crash dump outputs are the following:

  http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
  http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt

 The value of machdep.idle was acpi.  I have seen this symptom on two
 boxes with the following CPUs, so I am guessing it is not specific to
 a CPU model:

  CPU: Intel(R) Pentium(R) D CPU 3.40GHz (3391.52-MHz K8-class CPU)
  CPU: Intel(R) Xeon(R) CPU X5650  @ 2.67GHz (2666.82-MHz K8-class CPU)

-- Hiroki




Re: sendmail vs ipv6 broken after upgrade to 9.1

2013-01-09 Thread Hiroki Sato
Ulrich Spörlein u...@freebsd.org wrote
  in 20130109142111.gl35...@acme.spoerlein.net:

uq On Wed, 2013-01-09 at 14:14:18 +0100, Michiel Boland wrote:
uq  On 01/08/2013 23:33, Hiroki Sato wrote:
uq   Ulrich Spörlein u...@freebsd.org wrote
uq  in 20130108184051.gi35...@acme.spoerlein.net:
uq  
uq   uq After setting this, it now looks like this:
uq   uq root@acme: ~# ip6addrctl
uq   uq Prefix            Prec Label  Use
uq   uq ::1/128             50     0    0
uq   uq ::/0                40     1    0
uq   uq 2002::/16           30     2    0
uq   uq ::/96               20     3    0
uq   uq :::0.0.0.0/96       10     4    0
uq   uq
uq   uq And even sendmail is happily finding the sockets to bind to. Thanks for the hint!
uq  
uq I think this just hides the problem.  If gshapiro@'s explanation is
uq correct, no :::0.0.0.0/96 address should be returned if the name
uq resolution works fine...
uq  
uq   -- Hiroki
uq  
uq  
uq  getipnodebyname(xx, AF_INET6, AI_DEFAULT|AI_ALL) does this:-
uq  
uq  If a host has both IPv6 and IPv4 addresses, both are returned.
uq  The IPv4 address is presented as a mapped address.
uq  The order in which the addresses are returned depends on the
uq  address selection policy (_hpreorder in lib/libc/net/name6.c)
uq 
uq Is this also supposed to work for selecting the source IP address for
uq outgoing packets/sockets? And should it work for ping6?

 Yes.

uq Using a tunnel for IPv6, I have this transfer net configured on my
uq router, but for ACL purposes I would like to have all connections come
uq from my real prefix, not the transfer net. So I wrote my own policy, yet
uq ping6 seems to ignore it.

uq As you can see, source prefix stays 2a02:2528:ff00, though I'd like it
uq to be 2a02:2528:ff0d.

 This is because the prefix on the interface has the first priority.
 Why don't you use an fe80::/10 address to route packets to the other
 endpoint of tun0?

-- Hiroki




Re: sendmail vs ipv6 broken after upgrade to 9.1

2013-01-09 Thread Hiroki Sato
Ben Morrow b...@morrow.me.uk wrote
  in 20130109154435.ga81...@anubis.morrow.me.uk:

be So getipnodebyname is behaving correctly here: the host has both IPv4
be and IPv6 addresses, and Sendmail is requesting both native and v4-mapped
be addresses be returned in all cases. The v4-mapped addresses are then
be sorted to the top of the list.
be
be On FreeBSD, where net.inet6.ip6.v6only is on by default, I believe this
be is incorrect, and Sendmail should be passing 0 for the flags argument,
be unless it's going to check or clear the IPV6_V6ONLY socket option. There
be is no point binding a socket to a v4-mapped address if the kernel isn't
be going to deliver IPv4 connections to it. Sendmail should also be binding
be to all the addresses returned, if it isn't already, rather than just the
be first: this would make the problem go away, since both v4-mapped and
be native IPv6 sockets would be bound, and the v4-mapped ones would simply
be never get any connections.

 I reread RFC 2553 and realized that your explanation is correct.
 gshapiro's explanation described the behavior in the case of
 (AF_INET6, AI_DEFAULT), not (AF_INET6, AI_DEFAULT|AI_ALL).

 I think sendmail should work regardless of net.inet6.ip6.v6only.  Is
 just dropping AI_ALL enough for that?  When an AAAA RR is found, no
 v4-mapped address will be returned in that case.  Is this correct?
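
 For readers unfamiliar with the term: a v4-mapped address is an IPv4
 address embedded in the ::ffff:0:0/96 prefix.  The following is only a
 minimal Python sketch, not sendmail or libc code, of how a caller
 could tell mapped and native addresses apart and drop the mapped ones
 when the socket is v6only.  The helper names and the sample addresses
 (taken from the documentation prefixes) are made up for illustration.

```python
import ipaddress


def is_v4_mapped(addr: str) -> bool:
    """Return True if addr is an IPv4-mapped IPv6 address (::ffff:a.b.c.d)."""
    ip = ipaddress.ip_address(addr)
    return isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None


def drop_v4_mapped(addrs):
    """Keep only the addresses a v6only socket could usefully bind to."""
    return [a for a in addrs if not is_v4_mapped(a)]


# Hypothetical resolver result with AI_ALL: mapped and native entries mixed.
returned = ["::ffff:192.0.2.25", "2001:db8::25"]
print(drop_v4_mapped(returned))
```

 Filtering like this would be an alternative to changing the flags
 passed to the resolver; which of the two is appropriate for sendmail
 is exactly the open question above.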

be Fixing this by setting ipv6_prefer is not necessarily a good idea; this
be will cause IPv6 addresses to be preferred across the whole system, and
be unless your IPv6 connectivity is at least as good as your IPv4, that
be probably isn't what you want.

 Yes, I agree that ipv6_prefer is not a correct way to solve this
 specific issue.

be   Just curious, but is there any specific reason not to return an error
be   when Family=inet6 and no AAAA RR?
be
be In this case, Sendmail explicitly requested that v4-mapped addresses be
be returned in all cases...

-- Hiroki




Re: sendmail vs ipv6 broken after upgrade to 9.1

2013-01-08 Thread Hiroki Sato
Gregory Shapiro gshap...@freebsd.org wrote
  in 20130108180920.gj36...@rugsucker.smi.sendmail.com:

gs  How can I unstupid sendmail here?
gs
gs I don't think sendmail is being stupid here as it is doing what it has
gs been doing under 8.x and 9.1 (the code is the same).  I think
gs something changed with the upgrade to 9.1.  As far as tracking it
gs down, the sendmail code does:
gs
gs getipnodebyname(acme.spoerlein.net, AF_INET6, AI_DEFAULT|AI_ALL,
gs err);
gs
gs This will only return an IPv4 mapped address if:
gs
gs 1. There are no IPv6 addresses configured on the interfaces.  How are
gs your IPv6 addresses assigned?  If auto-configured (DHCPv6, RTADV), is
gs it possible sendmail is being started before autoconfiguration has
gs completed?  Restarting the MTA after boot and seeing if it still gets
gs the mapped address will say whether or not this is the cause.
gs
gs 2. The query for an AAAA record for acme.spoerlein.net failed.  This
gs doesn't appear to be the case for dns based on your dig output
gs (assuming you ran that dig command on the same machine that is
gs exhibiting the problem).  However, your nsswitch.conf lists hosts
gs before dns and there have been broken name resolution implementations
gs that, with 'hosts' listed first in nsswitch.conf have given back bad
gs info if the first hostname match didn't have the IPv6 address.  You
gs could try switching the order in /etc/hosts to see if this helps.
gs (Note, the broken implementation was not FreeBSD.)

 Just curious, but is there any specific reason not to return an error
 when Family=inet6 and no AAAA RR?

-- Hiroki




Re: sendmail vs ipv6 broken after upgrade to 9.1

2013-01-08 Thread Hiroki Sato
Ulrich Spörlein u...@freebsd.org wrote
  in 20130108184051.gi35...@acme.spoerlein.net:

uq After setting this, it now looks like this:
uq root@acme: ~# ip6addrctl
uq Prefix             Prec Label Use
uq ::1/128              50     0   0
uq ::/0                 40     1   0
uq 2002::/16            30     2   0
uq ::/96                20     3   0
uq ::ffff:0.0.0.0/96    10     4   0
uq
uq And even sendmail is happily finding the sockets to bind to. Thanks for
uq the hint!

 I think this just hides the problem.  If gshapiro@'s explanation is
 correct, no ::ffff:0.0.0.0/96 address should be returned if the name
 resolution works fine...
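
 To illustrate why the table quoted above changes the ordering, here is
 a rough Python sketch of the precedence lookup only: each candidate
 address is matched against the policy by longest prefix, and
 candidates are then ordered by precedence.  This is an assumption-laden
 simplification; the real address selection in libc applies more rules
 than precedence, and the function names and sample addresses are
 hypothetical.

```python
import ipaddress

# (prefix, precedence, label) rows copied from the ip6addrctl output above.
# "::ffff:0:0/96" is the same prefix as "::ffff:0.0.0.0/96".
POLICY = [
    (ipaddress.ip_network("::1/128"), 50, 0),
    (ipaddress.ip_network("::/0"), 40, 1),
    (ipaddress.ip_network("2002::/16"), 30, 2),
    (ipaddress.ip_network("::/96"), 20, 3),
    (ipaddress.ip_network("::ffff:0:0/96"), 10, 4),
]


def precedence(addr: str) -> int:
    """Longest-prefix match of addr against POLICY; return its precedence."""
    ip = ipaddress.IPv6Address(addr)
    matches = [(net.prefixlen, prec) for net, prec, _label in POLICY if ip in net]
    return max(matches)[1]


def sort_by_precedence(addrs):
    """Order candidate addresses from highest to lowest precedence."""
    return sorted(addrs, key=precedence, reverse=True)


print(sort_by_precedence(["::ffff:192.0.2.25", "2001:db8::25"]))
```

 With this table a native address (precedence 40) sorts ahead of a
 v4-mapped one (precedence 10), which would explain why sendmail
 happens to pick a bindable address without the underlying question
 being answered.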

-- Hiroki




Re: NFS-exported ZFS instability

2013-01-03 Thread Hiroki Sato
Rick Macklem rmack...@uoguelph.ca wrote
  in 1914428061.1617223.1357133079421.javamail.r...@erie.cs.uoguelph.ca:

rm Hiroki Sato wrote:
rm  Hello,
rm 
rm  I have been having trouble with my NFS server for a long time. The
rm  symptom is that it stops working within one or two weeks after boot.
rm  I have not been able to track down the cause yet, but it is
rm  reproducible and has only occurred under very high I/O load.
rm 
rm  It did not panic, just stopped working---while it responded to ping,
rm  userland programs did not seem to be working. I could break into DDB
rm  and get a kernel dump. The following URLs contain logs of ps, trace,
rm  etc.:
rm 
rm  http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
rm  http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
rm 
rm  Does anyone see how to debug this? I guess this is due to a deadlock
rm  somewhere. I have suffered from this problem for almost two years.
rm  The above log is from stable/9 as of Dec 19, but this has persisted
rm  since 8.X.
rm 
rm Well, I took a quick glance at the log and there are a lot of processes
rm sleeping on pfault (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
rm vm guy, so I'm not sure when/why that will happen. The comment on the
rm function suggests they are waiting for free pages.
rm
rm Maybe something as simple as running out of swap space or a problem
rm talking to the disk(s) that has the swap partition(s) or ???
rm (I'm talking through my hat here, because I'm not conversant with
rm  the vm side of things.)
rm
rm I might take a closer look this evening and see if I can spot anything
rm in the log, rick
rm ps: I hope Alan and Kostik don't mind being added to the cc list.

 Thank you.  This machine has 24GB RAM + 30GB swap.  16GB of the RAM is
 used for the ZFS ARC, and I see about 1.5GB of free memory on average.
 However, frequent swapouts happen on a regular basis even when the
 I/O load is low.  Swap usage was only 20-30MB regardless of the load.

 On several occasions I recorded vm.stats and the output of
 vmstat -z/-m every 10 seconds until the freeze, but
 vm.stats.vm.v_free_count was around 300,000 (about 1GB) even just
 before the freeze.
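
 As a quick check of the figure above, v_free_count is a page count, so
 converting it needs the page size.  The sketch below assumes the usual
 4096-byte base page on amd64; the function name is made up for
 illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed amd64 base page size


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count such as vm.stats.vm.v_free_count to GiB."""
    return pages * page_size / 2**30


print(f"{pages_to_gib(300_000):.2f} GiB")
```

 Under that assumption 300,000 free pages is a little over 1GB, so the
 machine did not look short of free pages right before the freeze.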

-- Hiroki




Re: NFS-exported ZFS instability

2013-01-03 Thread Hiroki Sato
Konstantin Belousov kostik...@gmail.com wrote
  in 20130102174044.gb82...@kib.kiev.ua:

ko  I might take a closer look this evening and see if I can spot anything
ko  in the log, rick
ko  ps: I hope Alan and Kostik don't mind being added to the cc list.
ko
ko What I see in the log is a lock cascade rooted in thread 100838,
ko which owns the system map mutex. I believe this prevents malloc(9)
ko from making progress in other threads, which e.g. own the ZFS vnode
ko locks. As a result, the whole system wedged.
ko
ko Looking back at thread 100838, we can see that it executes
ko smp_tlb_shootdown(). It is impossible to tell from the static dump
ko whether the appearance of smp_tlb_shootdown() in the backtrace is
ko transient or the thread is spinning there, waiting for other CPUs to
ko acknowledge the request. But, since the system wedged, most likely
ko smp_tlb_shootdown() spins.
ko
ko Taking this hypothesis, the situation can occur, most likely, due to
ko some other core running with the interrupts disabled. Inspection of the
ko backtraces of the processes running on all cores does not show any which
ko could legitimately own a spinlock or otherwise run with the interrupts
ko disabled.
ko
ko One thing you could try to do is to enable WITNESS for the spinlocks,
ko to try to catch the leaked spinlock. I very much doubt that this is
ko the case.
ko
ko Another thing to try is to switch the CPU idle method to something
ko else. Look at the machdep.idle* sysctls. It could be some CPU erratum
ko which blocks wakeup due to the interrupt in some conditions in C1?

 Thank you.  It can take 1-2 weeks to reproduce this, so I set
 debug.witness.skipspin=0, am keeping machdep.idle at acpi, and will
 see how it goes for a while.  I will report again if I can get
 another freeze.

-- Hiroki




NFS-exported ZFS instability

2013-01-01 Thread Hiroki Sato
Hello,

 I have been having trouble with my NFS server for a long time.  The
 symptom is that it stops working within one or two weeks after boot.
 I have not been able to track down the cause yet, but it is
 reproducible and has only occurred under very high I/O load.

 It did not panic, just stopped working---while it responded to ping,
 userland programs did not seem to be working.  I could break into DDB
 and get a kernel dump.  The following URLs contain logs of ps, trace,
 etc.:

  http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
  http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102

 Does anyone see how to debug this?  I guess this is due to a deadlock
 somewhere.  I have suffered from this problem for almost two years.
 The above log is from stable/9 as of Dec 19, but this has persisted
 since 8.X.

-- Hiroki




Re: FreeBSD daily snapshot build in allbsd.org temporarily down

2012-12-19 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20121207.101917.103513550140980591@allbsd.org:

hr Hi all,
hr
hr  I received many emails asking why
hr  https://pub.allbsd.org/FreeBSD-snapshots/ has stopped working and when
hr  it will recover, so I just wanted to let you know that the FreeBSD
hr  daily snapshot build in allbsd.org is temporarily down.  The reasons
hr  are a local network issue and the CVS-SVN migration of the build
hr  system.  The latter has already been solved.  However, the former
hr  was unexpected and needed more time than I originally thought.

 The service has almost recovered.  Snapshots for i386, amd64, and
 pc98/i386 are being rebuilt now, and then ia64, sparc64, and powerpc
 will also be connected to the build queue soon.

 For stable/9 and later, the Subversion repository is used and the
 build results are sorted by the revision numbers on each day.  For 8.X
 it still uses CVS via the make release target but will be switched to
 Subversion shortly.

 Note that some local network performance issues still remain.  They
 seem to be due to traffic congestion around the border router, which I
 do not have control of.  The transfer rate can drop below 100KB/s,
 especially between 12:00 and 18:00 JST.

 I am planning to add a custom build function to this service that
 uses the source trees under the projects/ or user/ branches.

-- Hiroki




FreeBSD daily snapshot build in allbsd.org temporarily down

2012-12-06 Thread Hiroki Sato
Hi all,

 I received many emails asking why
 https://pub.allbsd.org/FreeBSD-snapshots/ has stopped working and when
 it will recover, so I just wanted to let you know that the FreeBSD
 daily snapshot build in allbsd.org is temporarily down.  The reasons
 are a local network issue and the CVS-SVN migration of the build
 system.  The latter has already been solved.  However, the former
 was unexpected and needed more time than I originally thought.

 The snapshot build will start again this weekend or early next week.
 Glen is offering similar snapshot ISO images and distfiles for amd64
 and i386 at https://snapshots.glenbarber.us/Latest/, so please visit
 his page if you need the latest snapshot right now.

-- Hiroki




Re: FreeBSD 10-CURRENT and 9-STABLE snapshots

2012-10-10 Thread Hiroki Sato
Jakub Lach jakub_l...@mailplus.pl wrote
  in 1349873186577-5750838.p...@n5.nabble.com:

ja Any questions and suggestions are welcome. Contact h...@freebsd.org.
ja
ja But good catch, if your reasoning is indeed correct.
ja
ja And for the record, they are NOT official snapshots.

 Migrating from CVS to SVN in the build infrastructure is in progress
 and the daily snapshot build will recover in a couple of days, JFYI.

-- Hiroki




Re: Broadcom NetXtreme bcm5720 in the 9.1 beta

2012-08-01 Thread Hiroki Sato
Sean Bruno sean...@yahoo-inc.com wrote
  in 1343243969.2727.2.ca...@powernoodle.corp.yahoo.com:

se On Tue, 2012-07-24 at 18:46 -0700, Hiroki Sato wrote:
se  Peter Feger magick...@gmail.com wrote
sein CAD_3y4wAPp+8ZSveB6mbOF7M1Ne-zAvz4Uf=vv9quohuu23...@mail.gmail.com:
se 
se  ma I just got done installing FreeBSD-9.0 on a Dell R720.  I can tell you
se  ma that none of the broadcom products will work.  There is no driver that
se  ma I have been able to find.  I wound up having to replace them with
se  ma Intel nics.  I used the i350 quad-port 1G  and the x520 for 10G Fiber.
se 
se   I recently bought a Dell R420 which had BCM 5720 as the LOM.  The
se   output of pciconf was the following:
se 
se  bge0@pci0:2:0:0: class=0x02 card=0x04f81028 chip=0x165f14e4 rev=0x00 hdr=0x00
se      vendor     = 'Broadcom Corporation'
se      device     = 'NetXtreme BCM5720 Gigabit Ethernet PCIe'
se      class      = network
se      subclass   = ethernet
se 
se   On 9.1-PRERELEASE as of Jul 23, it was recognized but did not work
se   properly at first (the link status went back and forth between up
se   and down).  However, after setting dev.bge.0.msi=0 it worked.  I am
se   not sure whether it reaches a decent communication speed, but I saw
se   it work at 50MB/s or so at least.
se 
se   IPMI over LAN did not work even if hw.bge.allow_asf was set to 1.
se 
se  -- Hiroki
se
se
se
se For the r420/320 ... grab Pyun's latest updates and give it a whirl.
se They seem to work for us at yahoo:
se
se http://people.freebsd.org/~yongari/bge/

 Thanks!  I am testing his patches...

-- Hiroki




Re: Broadcom NetXtreme bcm5720 in the 9.1 beta

2012-07-24 Thread Hiroki Sato
Peter Feger magick...@gmail.com wrote
  in CAD_3y4wAPp+8ZSveB6mbOF7M1Ne-zAvz4Uf=vv9quohuu23...@mail.gmail.com:

ma I just got done installing FreeBSD-9.0 on a Dell R720.  I can tell you
ma that none of the broadcom products will work.  There is no driver that
ma I have been able to find.  I wound up having to replace them with
ma Intel nics.  I used the i350 quad-port 1G  and the x520 for 10G Fiber.

 I recently bought a Dell R420 which had BCM 5720 as the LOM.  The
 output of pciconf was the following:

bge0@pci0:2:0:0: class=0x02 card=0x04f81028 chip=0x165f14e4 rev=0x00 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme BCM5720 Gigabit Ethernet PCIe'
    class      = network
    subclass   = ethernet

 On 9.1-PRERELEASE as of Jul 23, it was recognized but did not work
 properly at first (the link status went back and forth between up
 and down).  However, after setting dev.bge.0.msi=0 it worked.  I am
 not sure whether it reaches a decent communication speed, but I saw
 it work at 50MB/s or so at least.

 IPMI over LAN did not work even if hw.bge.allow_asf was set to 1.

-- Hiroki




Re: cvsup{, d} woes after upgrading to RELENG_9 on amd64 this weekend

2012-06-04 Thread Hiroki Sato
Dimitry Andric d...@freebsd.org wrote
  in 4fcc80c7.8060...@freebsd.org:

di That said, since the ezm3 software is essentially unmaintained, the
di only practical solutions to your problem currently are:
di
di - Compile libz without SSE
di - Compile libz with gcc
di - Use csup instead of cvsup
di - Fix ezm3 to respect the amd64 ABI
di - Rewrite cvsupd in C (this is left as an exercise for the reader ;)

 I have the same problem on my mirror server and am currently using a
 cvsup package for i386 on FreeBSD/amd64.

-- Hiroki




Re: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak

2012-04-26 Thread Hiroki Sato
Rick Macklem rmack...@uoguelph.ca wrote
  in 1527622626.3418715.1335445225510.javamail.r...@erie.cs.uoguelph.ca:

rm Steven Hartland wrote:
rm   Original Message -
rm  From: Rick Macklem rmack...@uoguelph.ca
rm   At a glance, it looks to me like 8.x is affected. Note that the
rm   bug only affects the new NFS server (the experimental one for 8.x)
rm   when exporting ZFS volumes. (UFS exported volumes don't leak)
rm  
rm   If you are running a server that might be affected, just:
rm   # vmstat -z | fgrep -i namei
rm   on the server and see if the 3rd number shown is increasing.
rm 
rm  Many thanks Rick wasnt aware we had anything experimental enabled
rm  but I think that would be a yes looking at these number:-
rm 
rm  vmstat -z | fgrep -i namei
rm  NAMEI: 1024, 0, 1, 1483, 25285086096, 0
rm  vmstat -z | fgrep -i namei
rm  NAMEI: 1024, 0, 0, 1484, 25285945725, 0
rm 
rm   ^
rm I don't think so, since the 3rd number (USED) is 0 here.
rm If that # is increasing over time, you have the leak. You are
rm probably running the old (default in 8.x) NFS server.

 Just a report, I confirmed it affected 8.x servers running newnfs.

 Actually, I have been suffering from memory starvation symptoms on
 that server (24GB RAM) for a long time and have been watching
 vmstat -z periodically.  It stopped working once a week.  I
 investigated the vmstat log again and found that the NAMEI leak had
 reached 11,543,956 (about 11GB!) just before the lock-up.  After
 applying the patch, the leak disappeared.  Thank you for fixing it!
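
 The check Rick describes (watch whether the 3rd number on the NAMEI
 line keeps increasing) can be sketched in Python roughly as follows.
 This is only an illustration working on already-captured lines of
 vmstat -z output; the function names are made up, and the size
 estimate assumes that 11,543,956 is the USED count and that each item
 is 1024 bytes, as the first field of the quoted samples suggests.

```python
def parse_zone_line(line: str):
    """Split one 'vmstat -z' line into (name, [numeric fields])."""
    name, _, rest = line.partition(":")
    return name.strip(), [int(field) for field in rest.split(",")]


def used_is_growing(lines) -> bool:
    """True if the 3rd number (USED) strictly increases across samples."""
    used = [parse_zone_line(line)[1][2] for line in lines]
    return len(used) > 1 and all(a < b for a, b in zip(used, used[1:]))


# The two samples quoted above: USED goes from 1 to 0, so no leak shows.
samples = [
    "NAMEI: 1024, 0, 1, 1483, 25285086096, 0",
    "NAMEI: 1024, 0, 0, 1484, 25285945725, 0",
]
print(used_is_growing(samples))

# Estimated size of the leak reported above (see the assumptions stated
# in the lead-in): item count times item size, expressed in GiB.
print(11_543_956 * 1024 / 2**30)
```

 Two samples are of course too few to conclude anything on a real
 server; a periodic log over days, as described above, is what makes
 the trend visible.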

-- Hiroki




Re: another panic in 8.3-PRERELEASE

2012-02-24 Thread Hiroki Sato
Konstantin Belousov kostik...@gmail.com wrote
  in 20120224150259.gv55...@deviant.kiev.zoral.com.ua:

ko   #19 0x000800abecfc in ?? ()
ko   Previous frame inner to this frame (corrupt stack?)
ko   (kgdb)
ko  Can you, please, print out the content of *td, e.g. from the frame 16 ?
ko 
ko And *req from the frame 11, please.

 Here:

(kgdb) f 16
#16 0x80675e3a in __sysctl (td=0xff0396ec5460,
    uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491
1491            error = userland_sysctl(td, name, uap->namelen,
(kgdb) print *td
$2 = {td_lock = 0x80d7f540, td_proc = 0xff03969bf470, td_plist = {
tqe_next = 0x0, tqe_prev = 0xff03969bf480}, td_runq = {tqe_next = 0x0, 
tqe_prev = 0x80d7f788}, td_slpq = {tqe_next = 0x0, 
tqe_prev = 0xff0396ebe800}, td_lockq = {tqe_next = 0x0, 
tqe_prev = 0xff86c57b48a0}, td_cpuset = 0xff0005789dc8, 
  td_sel = 0xff01b5dd0500, td_sleepqueue = 0xff0396ebe800, 
  td_turnstile = 0xff01334cf600, td_umtxq = 0xff0396ec3a80, 
  td_tid = 100763, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, 
sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, 
  tqh_last = 0xff0396ec5500}, sq_proc = 0xff03969bf470, 
sq_flags = 1}, td_flags = 65540, td_inhibitors = 0, td_pflags = 0, 
  td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, 
  td_lastcpu = 4 '\004', td_oncpu = 4 '\004', td_owepreempt = 0 '\0', 
  td_tsqueue = 255 'ÿ', td_locks = 4, td_rw_rlocks = 0, td_lk_slocks = 0, 
  td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, 
  td_sleeplocks = 0x80ecebf0, td_intr_nesting_level = 0, 
  td_pinned = 0, td_ucred = 0xff007d537b00, td_estcpu = 0, td_slptick = 0, 
  td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
  tv_sec = 0, tv_usec = 0}, ru_maxrss = 1864, ru_ixrss = 66288, 
ru_idrss = 1347856, ru_isrss = 176768, ru_minflt = 263901, ru_majflt = 10, 
ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, 
ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14937, ru_nivcsw = 3286}, 
  td_incruntime = 0, td_runtime = 15204044088, td_pticks = 15, td_sticks = 15, 
  td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 
  0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_generation = 18223, 
  td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, 
  td_profil_addr = 0, td_profil_ticks = 0, 
  td_name = top, '\0' repeats 16 times, td_fpop = 0x0, td_dbgflags = 0, 
  td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {
  si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, 
  si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, 
sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {
_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, 
_mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, 
  __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 0, 
ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, 
osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, 
  td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', 
  td_pri_class = 3 '\003', td_user_pri = 129 '\201', 
  td_base_user_pri = 129 '\201', td_pcb = 0xff86c6389d10, 
  td_state = TDS_RUNNING, td_retval = {0, 34375032832}, td_slpcallout = {
c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, 
tqe_prev = 0xff800042ccd0}}, c_time = 51568077, 
c_arg = 0xff0396ec5460, c_func = 0x806a84c0 sleepq_timeout, 
c_lock = 0x0, c_flags = 18, c_cpu = 4}, td_frame = 0xff86c6389c50, 
  td_kstack_obj = 0xff03410b20d8, td_kstack = 18446743553049124864, 
  td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, td_unused3 = 0, 
  td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70}, 
  td_sched = 0xff0396ec5890, td_ar = 0x0, td_syscalls = 469926, 
  td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, 
  td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = {
rux_runtime = 15204044088, rux_uticks = 226, rux_sticks = 1140, 
rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, 
  td_map_def_user = 0x0, td_dbg_forked = 0}
(kgdb) f 11
#11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470,
    req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085
1085            error = SYSCTL_OUT(req, ki, sizeof(struct kinfo_proc));
(kgdb) print *req
$3 = {td = 0xff0396ec5460, lock = 2, oldptr = 0x800e96000, oldlen = 68217, 
  oldidx = 1088, oldfunc = 0x80675e80 sysctl_old_user, newptr = 0x0, 
  newlen = 0, newidx = 0, newfunc = 0x80675d10 sysctl_new_user, 
  validlen = 68217, flags = 0}
(kgdb) quit

-- Hiroki




Re: panic in 8.3-PRERELEASE

2012-02-23 Thread Hiroki Sato
Rick Macklem rmack...@uoguelph.ca wrote
  in 476361430.1773817.1329954835308.javamail.r...@erie.cs.uoguelph.ca:

rm John Baldwin wrote:
rm  On Wednesday, February 22, 2012 2:24:14 pm Konstantin Belousov wrote:
rm   On Wed, Feb 22, 2012 at 11:29:40AM -0500, Rick Macklem wrote:
rmHiroki Sato wrote:
rm Hi,
rm
rm Just a report, but I got the following panic on an NFS server
rm running
rm 8.3-PRERELEASE:
rm
rm (from here)
rm pool.allbsd.org dumped core - see /var/crash/vmcore.0
rm
rm Tue Feb 21 10:59:44 JST 2012
rm
rm FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE
rm #7: Thu
rm Feb 16 19:29:19 JST 2012
rm h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL
rm amd64
rm
rm panic: Assertion lock == sq->sq_lock failed at
rm /usr/src/sys/kern/subr_sleepqueue.c:335
rm
rmOops, I didn't know that mixing msleep() and tsleep() calls on the
rmsame
rmevent wasn't allowed.
rmThere are two places in the code where it did a:
rm  mtx_unlock();
rm  tsleep();
rmleft over from the days when it was written for OpenBSD.
rm   This sequence allows the wakeup to be lost if it happens right after
rm   the cache unlock (together with clearing the RC_WANTED flag) but
rm   before the thread enters the sleep state. The tsleep has a timeout,
rm   so the thread should recover in 10 seconds, but still.
rm  
rm   Anyway, you should use a consistent outer lock for the same wchan,
rm   i.e. no lock (tsleep) or mtx (msleep), but not mix them.
rm 
rm  Correct.
rm 
rmI don't think the mix would actually break anything, except that
rmthe
rmMPASS() assertion fails, but I've cc'd jhb@ since he seems to have
rmbeen
rmthe author of the sleep() stuff.
rm   
rmAnyhow, please try the attached patch which replaces the
rmmtx_unlock();
rm  tsleep(); with
rmmsleep()s using PDROP. If the attachment gets lost, the patch is
rmalso
rm  here:
rm  http://people.freebsd.org/~rmacklem/tsleep.patch
rm   
rmThanks for reporting this, rick
rmps: Is mtx_lock() now preferred over msleep()?
rm   What do you mean ?
rm 
rm  mtx_sleep() is preferred over msleep(), but I doubt I will remove
rm  msleep()
rm  anytime soon.
rm 
rm Ok, I'll redo the patch with mtx_sleep() and get one of you guys to
rm review it.

 Thank you for the patch!  I applied it and put the box under stress
 testing again.
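
 The lost-wakeup window discussed in the quoted text (unlock the mutex,
 then sleep as a separate step) has a familiar userland analogue.  The
 Python sketch below is not the NFS server code or the patch; it only
 shows the general pattern of testing the flag under the mutex and
 letting the wait primitive drop the mutex and sleep as one step, which
 is the role msleep()/mtx_sleep() with PDROP play in the kernel.  The
 class and method names are hypothetical.

```python
import threading


class CacheEntry:
    """Hypothetical stand-in for a locked entry that other threads wait on."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self.locked = True

    def wait_unlocked(self, timeout: float = 10.0) -> bool:
        # The flag is tested while holding the mutex, and wait_for() drops
        # the mutex and sleeps as one step, so a wakeup cannot slip in
        # between the test and the sleep.
        with self._cond:
            return self._cond.wait_for(lambda: not self.locked, timeout)

    def unlock(self) -> None:
        # Clear the flag and wake all waiters while holding the mutex.
        with self._cond:
            self.locked = False
            self._cond.notify_all()


entry = CacheEntry()
waker = threading.Thread(target=entry.unlock)
waker.start()
print(entry.wait_unlocked())
waker.join()
```

 Whether the waker runs before or after the waiter starts waiting, the
 waiter sees the cleared flag without depending on the timeout.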

-- Hiroki




another panic in 8.3-PRERELEASE

2012-02-23 Thread Hiroki Sato
Hi,

 This is another reproducible panic.  This seems to happen only when
 top(1) is running for a long time (a sysctl() call for
 CTL_KERN.KERN_PROC.KERN_PROC_PROC MIB triggered it).


pool.allbsd.org dumped core - see /var/crash/vmcore.0

Thu Feb 23 23:21:52 JST 2012

FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #8: Thu Feb 23 
04:40:54 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL  amd64

panic:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x800e96000
fault code  = supervisor write data, protection violation
instruction pointer = 0x20:0x809440cb
stack pointer   = 0x28:0xff86c63890b0
frame pointer   = 0x28:0xff86c6389100
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 47211 (top)
lock order reversal: (Giant after non-sleepable)
 1st 0xff0244b85568 process lock (process lock) @ 
/usr/src/sys/kern/kern_proc.c:1211
 2nd 0x80d74c80 Giant (Giant) @ /usr/src/sys/dev/usb/input/ukbd.c:2018
KDB: stack backtrace:
Dumping 23903 out of 24550 MB:..1%..11%..21%..31% (CTRL-C to abort)  (CTRL-C to 
abort) ..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from 
/boot/kernel/geom_mirror.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
/boot/kernel/ipfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
263 if (textdump_pending)
(kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
#1  0x801f8cfc in db_fncall (dummy1=Variable dummy1 is not available.
)
at /usr/src/sys/ddb/db_command.c:548
#2  0x801f9031 in db_command (last_cmdp=0x80d37f40, 
cmd_table=Variable cmd_table is not available.

) at /usr/src/sys/ddb/db_command.c:445
#3  0x801f9280 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:498
#4  0x801fb369 in db_trap (type=Variable type is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0x8069dff1 in kdb_trap (type=12, code=0, tf=0xff86c6389000)
at /usr/src/sys/kern/subr_kdb.c:548
#6  0x809461ed in trap_fatal (frame=0xff86c6389000, eva=Variable 
eva is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:820
#7  0x809468b5 in trap (frame=0xff86c6389000)
at /usr/src/sys/amd64/amd64/trap.c:326
#8  0x8092d2f4 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:228
#9  0x809440cb in copyout () at /usr/src/sys/amd64/amd64/support.S:258
#10 0x80675f1f in sysctl_old_user (req=0xff86c63899c0,
p=0xff86c6389470, l=1088) at /usr/src/sys/kern/kern_sysctl.c:1276
#11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470,
req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085
#12 0x8065ff6c in sysctl_out_proc (p=0xff0244b85470,
req=0xff86c63899c0, flags=Variable flags is not available.
) at /usr/src/sys/kern/kern_proc.c:1114
#13 0x8066245e in sysctl_kern_proc (oidp=Variable oidp is not 
available.
)
at /usr/src/sys/kern/kern_proc.c:1302
#14 0x806756e8 in sysctl_root (oidp=Variable oidp is not available.
)
at /usr/src/sys/kern/kern_sysctl.c:1455
#15 0x8067598e in userland_sysctl (td=0x0, name=0xff86c6389a80,
namelen=3, old=0x800e96000, oldlenp=Variable oldlenp is not available.
)
at /usr/src/sys/kern/kern_sysctl.c:1565
#16 0x80675e3a in __sysctl (td=0xff0396ec5460,
uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491
#17 0x80945809 in amd64_syscall (td=0xff0396ec5460, traced=0)
at subr_syscall.c:114
#18 0x8092d5ec in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:387
#19 0x000800abecfc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)


db> show alllocks
Process 1169 (sshd) thread 0xff0022cfa460 (100715)
exclusive sx so_rcv_sx 

panic in 8.3-PRERELEASE

2012-02-22 Thread Hiroki Sato
Hi,

 Just a report, but I got the following panic on an NFS server running
 8.3-PRERELEASE:

(from here)
pool.allbsd.org dumped core - see /var/crash/vmcore.0

Tue Feb 21 10:59:44 JST 2012

FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #7: Thu Feb 16 
19:29:19 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL  amd64

panic: Assertion lock == sq->sq_lock failed at 
/usr/src/sys/kern/subr_sleepqueue.c:335

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:


Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from 
/boot/kernel/geom_mirror.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
/boot/kernel/ipfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
263 if (textdump_pending)
(kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
#1  0x801f8cfc in db_fncall (dummy1=Variable dummy1 is not available.
)
at /usr/src/sys/ddb/db_command.c:548
#2  0x801f9031 in db_command (last_cmdp=0x80d37f40, 
cmd_table=Variable cmd_table is not available.
)
at /usr/src/sys/ddb/db_command.c:445
#3  0x801f9280 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:498
#4  0x801fb369 in db_trap (type=Variable type is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0x8069e021 in kdb_trap (type=3, code=0, tf=0xff86c5f7e640)
at /usr/src/sys/kern/subr_kdb.c:548
#6  0x80946766 in trap (frame=0xff86c5f7e640)
at /usr/src/sys/amd64/amd64/trap.c:595
#7  0x8092d324 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:228
#8  0x8069de7b in kdb_enter (why=0x80a891dd panic, 
msg=0xa Address 0xa out of bounds) at cpufunc.h:63
#9  0x8066afc0 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:597
#10 0x806a9360 in sleepq_add (wchan=0xff0073b97a00, 
lock=0x80d6af00, wmesg=0x80a7bb28 nfsrc, flags=0, 
queue=0) at /usr/src/sys/kern/subr_sleepqueue.c:335
#11 0x80673e4f in _sleep (ident=0xff0073b97a00, 
lock=0x80d6af00, priority=Variable priority is not available.
) at /usr/src/sys/kern/kern_synch.c:218
#12 0x805fe01e in nfsrvd_updatecache (nd=0xff86c5f7e960, 
so=0xff002217c000) at /usr/src/sys/fs/nfsserver/nfs_nfsdcache.c:697
#13 0x805ea934 in nfssvc_program (rqst=0xff0476070800, 
xprt=0xff000edd0a00) at /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:333
#14 0x8084c76b in svc_run_internal (pool=0xff000c876600, 
ismaster=0) at /usr/src/sys/rpc/svc.c:895
#15 0x8084cc8b in svc_thread_start (arg=Variable arg is not available.
)
at /usr/src/sys/rpc/svc.c:1200
#16 0x80640865 in fork_exit (
callout=0x8084cc80 svc_thread_start, arg=0xff000c876600, 
frame=0xff86c5f7ec50) at /usr/src/sys/kern/kern_fork.c:876
#17 0x8092d86e in fork_trampoline ()
at /usr/src/sys/amd64/amd64/exception.S:602
#18 0x0080 in ?? ()
#19 0x7fffe700 in ?? ()
#20 0x002e in ?? ()
#21 0x in ?? ()
#22 0xfef4 in ?? ()
#23 0xff000e1028c0 in ?? ()
#24 0x009b in ?? ()
#25 0x7fffe700 in ?? ()
#26 0x0006 in ?? ()
#27 0x0003 in ?? ()
#28 0x in ?? ()
#29 0x7fffe720 in ?? ()
#30 0x in ?? ()
#31 0x in ?? ()
#32 0x0001 in ?? ()
#33 0x001b0013000c in ?? ()
#34 0x7fe8 in ?? ()
#35 0x003b003b0001 in ?? ()
#36 0x0002 in ?? ()
#37 0x0008006a1dac in ?? ()
#38 0x0043 in ?? ()
#39 0x0202 in ?? ()
#40 0x7fffe6c8 in ?? ()
#41 0x003b in ?? ()
#42 0xff0022262470 in ?? ()
#43 0x in ?? ()
#44 0x80d80e40 in tdq_cpu ()
#45 0xff00057958c0 in ?? ()
#46 0xff86c5f7e930 in ?? ()
#47 0xff86c5f7e8d8 in ?? ()
#48 0xff002218c8c0 in ?? ()
#49 0x80691397 in sched_switch (td=0xff000c876600, 
newtd=0x8084cc80, flags=Variable flags is not available.
) at 

Re: New BSD Installer

2012-02-17 Thread Hiroki Sato
Andriy Gapon a...@freebsd.org wrote
  in 4f3e3000.9000...@freebsd.org:

av on 17/02/2012 09:04 Hiroki Sato said the following:
av  No, the issue is our gptloader assumes the backup header is always located
av  at the (physical) last sector while this is not mandatory in the UEFI
av  specification.
av
av Are you sure?

 Yes, sure.  In the gm0->md0+md1 case, the last LBA of the device has
 changed (it grew in size), but the device can still have a valid
 backup header at the last LBA - 1 until an attempt is made to grow
 the size of the volume, as the last paragraph of your excerpt says.
 If we *choose* to grow the device size permanently, the backup header
 must be relocated to the new last LBA.  However, before the relocation
 happens, the specification says both the primary and secondary headers
 must be valid for the previous device size.  This is my understanding.

 This means software should assume the device size can grow and should
 not assume the backup header is always located at the last possible
 LBA on the device.  If AlternateLBA does not match the device size -
 1, the software should first locate the backup header based on the
 information in the primary header.  gptboot does not currently do so.
 I have not actually tested it, but the attached patch shows what I
 mean.

-- Hiroki
Index: sys/boot/common/gpt.c
===
--- sys/boot/common/gpt.c	(revision 230616)
+++ sys/boot/common/gpt.c	(working copy)
@@ -333,24 +333,26 @@
 	    gptread_table("primary", uuid, dskp, &hdr_primary,
 	    table_primary) == 0) {
 		hdr_primary_lba = hdr_primary.hdr_lba_self;
+		/* Use AlternateLBA if valid.  If not, use LastUsableLBA+34. */
+		if (hdr_primary_lba < hdr_primary.hdr_lba_alt)
+			altlba = hdr_primary.hdr_lba_alt;
+		else if (hdr_primary.hdr_lba_end != 0)
+			altlba = hdr_primary.hdr_lba_end + 34;
 		gpthdr = &hdr_primary;
 		gpttable = table_primary;
 	}

-	altlba = drvsize(dskp);
-	if (altlba > 0)
-		altlba--;
-	else if (hdr_primary_lba > 0) {
-		/*
-		 * If we cannot obtain disk size, but primary header
-		 * is valid, we can get backup header location from
-		 * there.
-		 */
-		altlba = hdr_primary.hdr_lba_alt;
+	/*
+	 * Try to locate the backup header from the media size if no primary
+	 * header found.
+	 */
+	if (hdr_primary_lba == 0) {
+		altlba = drvsize(dskp);
+		if (altlba > 0)
+			altlba--;
 	}
-	if (altlba == 0)
-		printf("%s: unable to locate backup GPT header\n", BOOTPROG);
-	else if (gptread_hdr("backup", dskp, &hdr_backup, altlba) == 0 &&
+	if (altlba != 0 &&
+	    gptread_hdr("backup", dskp, &hdr_backup, altlba) == 0 &&
 	    gptread_table("backup", uuid, dskp, &hdr_backup,
 	    table_backup) == 0) {
 		hdr_backup_lba = hdr_backup.hdr_lba_self;
@@ -359,7 +361,8 @@
 			gpttable = table_backup;
 			printf("%s: using backup GPT\n", BOOTPROG);
 		}
-	}
+	} else
+		printf("%s: unable to locate backup GPT header\n", BOOTPROG);

 	/*
 	 * Convert all BOOTONCE without BOOTME flags into BOOTFAILED.




Re: New BSD Installer

2012-02-16 Thread Hiroki Sato
Jeremy Chadwick free...@jdc.parodius.com wrote
  in 20120217030806.ga62...@icarus.home.lan:

fr On Thu, Feb 16, 2012 at 07:40:35PM -0700, Warren Block wrote:
fr  Sorry, I may be misunderstanding your point.  GEOM classes don't
fr  lie, they accurately represent the space.  The space provided by a
fr  gmirror is one block less than the actual space occupied, to allow
fr  for the metadata block at the end.  The problem is that GPT puts
fr  backup partition tables at the end of the physical (not logical)
fr  device. Create a GEOM device on that drive, and the GEOM metadata
fr  overwrites the backup GPT partition table.  Well, the last block of
fr  it, anyway.
fr 
fr  But create the GEOM device inside a GPT partition that spans the
fr  drive, and things are fine.  The GPT backup tables are safely
fr  outside the GEOM metadata, which is safely outside of the data.
fr
fr I wasn't aware you could do that.  I was only aware that it was the
fr other way around.  That (my) misconception seems to also be relayed
fr by others such as Miroslav who said:
fr
fr GPT doesn't play nice with GEOM classes which store their metadata
fr on last sector.  For example, you can't use gmirror of a whole drives
fr and use GPT on top of this mirror. (and gmirror is not the only one)
fr
fr So if I read this correctly, it means that the erroneous behaviour is
fr the result of someone doing things in the wrong order (for lack of
fr better terminology).

 Well, does GPT really depend on the absolute last block?  The header
 has fields for both the first and the last LBAs, and they do not have
 to match the physical capacity.  Does creating a gmirror first and
 then creating a GPT on it really not work?  I do not think that is
 true, and I suspect that a description of gmirror in the handbook
 recommending kern.geom.debugflags=17 is the source of the problem.

 The partition layout in my mind is the following:

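 (0)                                                         (last)
 |PMBR|GPT primary|                 |GPT secondary|gmirror meta|
 |-------------------------------------------------------------| ada0
 |------------------------------------------------|            | mirror/gm0
 |                |-----------------|                          | mirror/gm0p{1,2,...}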

 and the following commands will create an example of this
 configuration:

 # mdconfig -a -t vnode -s100m
 md0
 # mdconfig -a -t vnode -s100m
 md1
 # gmirror label gm0 /dev/md0 /dev/md1
 # gmirror dump /dev/md0 | grep size
  mediasize: 104857088
 sectorsize: 512
  provsize: 104857600
 # gpart create -s gpt mirror/gm0
 # gpart add -t freebsd-ufs mirror/gm0
 mirror/gm0p1 added
 =>    34  204732  mirror/gm0  GPT  (100M)
       34  204732           1  freebsd-ufs  (100M)
 # echo "(34 + 204732) * 512" | bc
 104840192

 The size of GPT header + partition entries is 33 sectors.  So,

 # echo "(34 + 204732) * 512 + 33 * 512" | bc
 104857088

 is the size which the GPT recognizes.  This matches the size of
 mirror/gm0, not /dev/md0.  This means the gmirror metadata is located
 just after it.  I think this should work in most cases for mirroring
 the whole disk.
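
 As a rough cross-check of the arithmetic above, here is a small Python
 sketch.  It is not part of the original session; the constant and
 function names are made up for illustration, and the numbers are
 copied from the gmirror dump and gpart output shown above.

```python
# Values copied from the transcript above.
SECTOR = 512              # sectorsize reported by "gmirror dump"
PROVSIZE = 104857600      # provsize: size of /dev/md0 in bytes
MEDIASIZE = 104857088     # mediasize: size of mirror/gm0 in bytes

FIRST_USABLE = 34         # first sector of the range printed by gpart
USABLE_SECTORS = 204732   # length of that range in sectors
BACKUP_GPT_SECTORS = 33   # backup partition entries (32) + backup header (1)


def gpt_span_bytes(first_usable, usable_sectors, backup_sectors, sector=SECTOR):
    """Bytes covered from LBA 0 through the end of the backup GPT header."""
    return (first_usable + usable_sectors + backup_sectors) * sector


span = gpt_span_bytes(FIRST_USABLE, USABLE_SECTORS, BACKUP_GPT_SECTORS)
print(span)               # size the GPT recognizes
print(span == MEDIASIZE)  # does it match mirror/gm0?
print(PROVSIZE - span)    # bytes left at the end of md0 (gmirror metadata)
```

 The remainder on md0 is exactly one sector, which is consistent with
 the gmirror metadata sitting just after the backup GPT header.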

 Certainly, gpart reports [CORRUPT] if the underlying device capacity
 does not match the GPT header.  For example, deactivating mirror/gm0
 above will show the following:

 # gpart show
 =>    34  204732  mirror/gm0  GPT  (100M)
       34  204732           1  freebsd-ufs  (100M)
 # gmirror stop gm0
 # gpart show
 =>    34  204732  md1  GPT  (100M) [CORRUPT]
       34  204732    1  freebsd-ufs  (100M)

 =>    34  204732  md0  GPT  (100M) [CORRUPT]
       34  204732    1  freebsd-ufs  (100M)
 # gpart recover md0
 md0 recovered
 # gpart show
 =>    34  204732  md1  GPT  (100M) [CORRUPT]
       34  204732    1  freebsd-ufs  (100M)

 =>    34  204733  md0  GPT  (100M)
       34  204732    1  freebsd-ufs  (100M)
   204766       1       - free -  (512B)

 We can see that gpart recover extends the size to the last sector,
 where the gmirror metadata was placed, and clears the [CORRUPT]
 status as expected.
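
 The one-sector difference in the gpart show output above can be
 reproduced with a short Python sketch.  This is only an illustration
 under an assumed standard layout (PMBR at LBA 0, primary header at
 LBA 1, 32 sectors of partition entries, and the same 33 sectors
 mirrored at the end of the device); gpt_usable_range is a hypothetical
 helper, not a gpart interface.

```python
def gpt_usable_range(total_sectors, entry_sectors=32):
    """Return (first_usable, last_usable, count) for a GPT spanning the device.

    Assumed layout: LBA 0 is the PMBR, LBA 1 the primary header, followed by
    entry_sectors of partition entries.  The backup entries and the backup
    header occupy the last entry_sectors + 1 sectors of the device.
    """
    first = 2 + entry_sectors
    last = total_sectors - 1 - (entry_sectors + 1)
    return first, last, last - first + 1


# mirror/gm0: 104857088 / 512 = 204799 sectors (one less than md0)
print(gpt_usable_range(104857088 // 512))
# md0 after "gpart recover": 104857600 / 512 = 204800 sectors
print(gpt_usable_range(104857600 // 512))
```

 The second range is one sector longer, and that extra sector (LBA
 204766) is the 512-byte "- free -" entry reported for md0.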

 So, some early boot stages which do not recognize mirror/gm0 see the
 corrupted GPT.  However, I think they will simply follow the
 information in the GPT header.

-- Hiroki




Re: New BSD Installer

2012-02-16 Thread Hiroki Sato
Freddie Cash fjwc...@gmail.com wrote
  in caojfwz5ehgfr_vp0+trfxvgm6kzxv9qo3ufvdkura96z3ax...@mail.gmail.com:

fj On Thu, Feb 16, 2012 at 8:20 PM, Hiroki Sato h...@freebsd.org wrote:
fj  Jeremy Chadwick free...@jdc.parodius.com wrote
fj   in 20120217030806.ga62...@icarus.home.lan:
fj 
fj  fr On Thu, Feb 16, 2012 at 07:40:35PM -0700, Warren Block wrote:
fj  fr  Sorry, I may be misunderstanding your point.  GEOM classes don't
fj  fr  lie, they accurately represent the space.  The space provided by a
fj  fr  gmirror is one block less than the actual space occupied, to allow
fj  fr  for the metadata block at the end.  The problem is that GPT puts
fj  fr  backup partition tables at the end of the physical (not logical)
fj  fr  device. Create a GEOM device on that drive, and the GEOM metadata
fj  fr  overwrites the backup GPT partition table.  Well, the last block of
fj  fr  it, anyway.
fj  fr 
fj  fr  But create the GEOM device inside a GPT partition that spans the
fj  fr  drive, and things are fine.  The GPT backup tables are safely
fj  fr  outside the GEOM metadata, which is safely outside of the data.
fj  fr
fj  fr I wasn't aware you could do that.  I was only aware that it was the
fj  fr other way around.  That (my) misconception seems to also be relayed
fj  fr by others such as Miroslav who said:
fj  fr
fj  fr GPT doesn't play nice with GEOM classes which store their metadata
fj  fr on last sector.  For example, you can't use gmirror of a whole drives
fj  fr and use GPT on top of this mirror. (and gmirror is not the only one)
fj  fr
fj  fr So if I read this correctly, it means that the erroneous behaviour is
fj  fr the result of someone doing things in the wrong order (for lack of
fj  fr better terminology).
fj 
fj   Well, does GPT really depend on the absolute last block?  The header
fj   has fields for both the first and the last LBAs and they do not have
fj   to be matched with the physical capacity.  Creating a gmirror first,
fj   and then creating a GPT on it does not work?  I do not think it is
fj   true, and I suspect a description on gmirror recommending
fj   kern.geom.debugflags=17 in the handbook is the source of the problem.
fj 
fj It's not the partitioning that's the issue.  It's the order that GEOM
fj providers and GPT partition tables are tasted.
fj 
fj You can gmirror two disks, then GPT partition the gm0 device without
fj any issues.  As you noted, the first/last sectors are 1 less than the
fj physical disk (the size of the gmirror provider).
fj 
fj When you boot, though, the gptboot loader only sees the GPT table, it
fj doesn't know that it's part of a gmirror setup.  Thus it loads the
fj GPT, notices that the size of the GPT is 1 less sector than the size
fj of the disk, can't find the secondary GPT table as the last sector of
fj the disk is gmirror metadata, and complains about corrupted GPT.
fj 
fj Then the kernel loads, gmirror tastes the disk, finds the gmirror
fj metadata, configures the gmirror provider, and now all the GPT stuff
fj matches again.  And the system carries on correctly.
fj 
fj The issue is that we don't have a GEOM-aware loader.  Or, at least,
fj that the gpt*boot loaders read the GPT table(s) before configuring the
fj GEOM providers.

 No, the issue is our gptloader assumes the backup header is always
 located at the (physical) last sector while this is not mandatory in
 the UEFI specification.  GEOM-based logical volumes suffer from this
 assumption at boot time.  It is not practical (and not necessary) to
 taste the volumes before loading a kernel.

 If the primary header is valid, a lookup order of hdr_lba_alt
 (AlternateLBA), then hdr_lba_end (LastUsableLBA), and then
 drvsize() - 1 looks reasonable to me.  The current code uses
 drvsize() - 1 first and looks up the AlternateLBA only when
 drvsize() fails.
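
 The proposed lookup order can be sketched in Python as follows.  This
 is a minimal illustration, not the gptboot code: the function name and
 parameters are hypothetical, and it assumes that the backup header
 sits one sector after a 32-sector backup partition entry array that
 starts right after LastUsableLBA.

```python
def backup_header_candidates(alt_lba, last_usable_lba, drive_sectors,
                             primary_lba=1, entry_sectors=32):
    """Return candidate LBAs for the backup GPT header, in lookup order.

    alt_lba         -- AlternateLBA from a valid primary header (0 if unknown)
    last_usable_lba -- LastUsableLBA from the primary header (0 if unknown)
    drive_sectors   -- device size in sectors (0 if it cannot be obtained)
    """
    candidates = []
    # 1. Trust AlternateLBA when it points past the primary header.
    if alt_lba > primary_lba:
        candidates.append(alt_lba)
    # 2. Derive the location from LastUsableLBA (assumed standard layout).
    if last_usable_lba > 0:
        candidates.append(last_usable_lba + entry_sectors + 1)
    # 3. Fall back to the physical last sector of the device.
    if drive_sectors > 0:
        candidates.append(drive_sectors - 1)
    # Drop duplicates while keeping the lookup order.
    return list(dict.fromkeys(candidates))


# GPT created on mirror/gm0 (204799 sectors) but read from md0 (204800 sectors)
print(backup_header_candidates(204798, 204765, 204800))
```

 With the numbers from the md0/gm0 example, the header information
 points at LBA 204798, and the physical last sector (204799, which
 holds the gmirror metadata) is tried only as a last resort.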

-- Hiroki




Re: accepting rtadv broken on 9-STABLE, re driver?

2012-01-09 Thread Hiroki Sato
Mark Felder f...@feld.me wrote
  in op.v7tvkbkr34t2sn@tech304:

fe On Sat, 07 Jan 2012 14:23:46 -0600, Hiroki Sato h...@freebsd.org
fe wrote:
fe
fe   It is an unexpected behavior and the flag should be set on all
fe   interfaces.  Can you send me your /etc/rc.conf, /etc/sysctl.conf, and
fe   the result of ifconfig -a?
fe
fe Back at work so I have access to the machine again:
(snip)
fe # ifconfig -a
fe
fe 11:43:29 tech304:~  ifconfig -a
fe re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
fe options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
fe ether d0:67:e5:17:e1:32
fe inet6 fe80::d267:e5ff:fe17:e132%re0 prefixlen 64 scopeid 0x2
fe inet 192.168.93.23 netmask 0xff00 broadcast 192.168.93.255
fe inet6 2607:f4e0:100:104:d267:e5ff:fe17:e132 prefixlen 64 autoconf
fe nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
fe media: Ethernet autoselect (100baseTX <full-duplex>)
fe status: active

 re0 seems to have ACCEPT_RTADV.  What is the problem?

fe lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
fe options=3<RXCSUM,TXCSUM>
fe inet6 ::1 prefixlen 128
fe inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
fe inet 127.0.0.1 netmask 0xff00
fe nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
fe vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
fe ether 0a:00:27:00:00:00
fe nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>


-- Hiroki




Re: accepting rtadv broken on 9-STABLE, re driver?

2012-01-09 Thread Hiroki Sato
Mark Felder f...@feld.me wrote
  in op.v7t4xpuh34t2sn@tech304:

fe On Mon, 09 Jan 2012 13:02:24 -0600, Hiroki Sato h...@freebsd.org
fe wrote:
fe
fe  re0 seems to have ACCEPT_RTADV.  What is the problem?
fe
fe That's because I haven't rebooted
fe
fe Let's start fresh.
fe
fe The normal ipv6 configuration anyone would use:
fe
fe -ipv6_activate_all_interfaces=YES in rc.conf
fe
fe -NO mention of net.inet6.ip6.accept_rtadv in sysctl.conf
fe
fe I boot up, re0 *does not* have ACCEPT_RTADV.

 This is expected behavior.  ACCEPT_RTADV is disabled by default on
 9.X.

fe I try forcing via the sysctl: net.inet6.ip6.accept_rtadv=1
fe
fe Still doesn't work!

 This needs a reboot.  Did you reboot the box?

fe Why? What makes this machine different? All the other machines I run
fe do not require this to get ACCEPT_RTADV. Is it the re driver? My other
fe machines have em and ath interfaces.

 Putting the following line

  net.inet6.ip6.accept_rtadv=1

 into /etc/sysctl.conf, and then removing the following line

  ipv6_ifconfig_re0="inet6 accept_rtadv"

 from /etc/rc.conf should work, I think.  (Of course, a reboot is
 needed after that.)

-- Hiroki




Re: accepting rtadv broken on 9-STABLE, re driver?

2012-01-07 Thread Hiroki Sato
Mark Felder f...@feld.me wrote
  in op.v7ogp01w34t2sn@tech304:

fe I figured I would end up putting that in rc.conf as a temporary fix,
fe but maybe that's just the long term solution. It seems so odd to me
fe that the sysctl change doesn't automatically cause the ACCEPT_RTADV
fe option to show up for re0, but it does for vboxnet0. Perhaps there
fe should be a cleaner way to do this in rc.conf like how we do
fe ifconfig_re0=DHCP ?

 Is it correct that the ACCEPT_RTADV option was enabled on vboxnet0
 and not on re0, even after setting net.inet6.ip6.accept_rtadv to 1 at
 boot time together with ipv6_activate_all_interfaces=YES?

-- Hiroki




Re: accepting rtadv broken on 9-STABLE, re driver?

2012-01-07 Thread Hiroki Sato
Mark Felder f...@feld.me wrote
  in 891fe25c-1560-479f-b855-1713c1c7a...@email.android.com:

fe Hiroki Sato h...@freebsd.org wrote:
fe 
fe  Is it correct that ACCEPT_RTADV option was enabled on the vboxnet0
fe  and not on re0, even after setting net.inet6.ip6.accept_rtadv to 1 at
fe  boot time and ipv6_activate_all_interfaces=YES?
fe 
fe -- Hiroki
fe
fe Yes, that is the behavior I witnessed.

 That is unexpected behavior; the flag should be set on all
 interfaces.  Can you send me your /etc/rc.conf, /etc/sysctl.conf, and
 the output of ifconfig -a?

-- Hiroki




Re: ZFS panic on a RELENG_8 NFS server

2011-09-19 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110911.054601.1424617155148336027@allbsd.org:

hr Hiroki Sato h...@freebsd.org wrote
hr   in 20110910.044841.232160047547388224@allbsd.org:
hr
hr hr Hiroki Sato h...@freebsd.org wrote
hr hr   in 20110907.094717.2272609566853905102@allbsd.org:
hr hr
hr hr hr  During this investigation an disk has to be replaced and resilvering
hr hr hr  it is now in progress.  A deadlock and a forced reboot after that
hr hr hr  make recovering of the zfs datasets take a long time (for committing
hr hr hr  logs, I think), so I will try to reproduce the deadlock and get a
hr hr hr  core dump after it finished.
hr hr
hr hr  I think I could reproduce the symptoms.  I have no idea about if
hr hr  these are exactly the same as occurred on my box before because the
hr hr  kernel was replaced with one with some debugging options, but these
hr hr  are reproducible at least.
hr hr
hr hr  There are two symptoms.  One is a panic.  A DDB output when the panic
hr hr  occurred is the following:
hr
hr  I am trying vfs.lookup_shared=0 and seeing how it goes.  It seems the
hr  box can endure a high load which quickly caused these symptoms.

 The knob made no difference.  The same panic or unresponsiveness
 still occurs after about 24-32 hours.

-- Hiroki




Re: ZFS panic on a RELENG_8 NFS server

2011-09-10 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110910.044841.232160047547388224@allbsd.org:

hr Hiroki Sato h...@freebsd.org wrote
hr   in 20110907.094717.2272609566853905102@allbsd.org:
hr
hr hr  During this investigation an disk has to be replaced and resilvering
hr hr  it is now in progress.  A deadlock and a forced reboot after that
hr hr  make recovering of the zfs datasets take a long time (for committing
hr hr  logs, I think), so I will try to reproduce the deadlock and get a
hr hr  core dump after it finished.
hr
hr  I think I could reproduce the symptoms.  I have no idea about if
hr  these are exactly the same as occurred on my box before because the
hr  kernel was replaced with one with some debugging options, but these
hr  are reproducible at least.
hr
hr  There are two symptoms.  One is a panic.  A DDB output when the panic
hr  occurred is the following:

 I am trying vfs.lookup_shared=0 to see how it goes.  So far the box
 seems to endure the kind of high load that quickly caused these
 symptoms before.

-- Hiroki




ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too long (RELENG_8 from today))

2011-09-09 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110907.094717.2272609566853905102@allbsd.org:

hr  During this investigation an disk has to be replaced and resilvering
hr  it is now in progress.  A deadlock and a forced reboot after that
hr  make recovering of the zfs datasets take a long time (for committing
hr  logs, I think), so I will try to reproduce the deadlock and get a
hr  core dump after it finished.

 I think I was able to reproduce the symptoms.  I do not know whether
 these are exactly the same as what occurred on my box before, because
 the kernel was replaced with one built with some debugging options,
 but they are at least reproducible.

 There are two symptoms.  One is a panic.  The DDB output when the
 panic occurred is the following:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10040
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x8065b926
stack pointer   = 0x28:0xff8257b94d70
frame pointer   = 0x28:0xff8257b94e10
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at  witness_checkorder+0x246:   movl    0x40(%r13),%ebx

db> bt
Tracing pid 992 tid 100586 td 0xff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffe6c8, rbp = 0x6 ---


 The complete output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt

 The other is getting stuck on ZFS access.  The kernel keeps running
 with no panic, but any access to a ZFS dataset makes the accessing
 program unresponsive.  The DDB output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt

 The trigger for both was some access to a ZFS dataset from the NFS
 clients.  Because the access pattern was complex, I could not narrow
 down the culprit, but it seems timing-dependent, and simply doing
 rm -rf locally on the server can sometimes trigger them.

 The crash dump and the kernel can be found at the following URLs:

  panic:
http://people.allbsd.org/~hrs/zfs_panic_20110909_1/

  no panic but unresponsive:
http://people.allbsd.org/~hrs/zfs_panic_20110909_2/

  kernel:
http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/

-- Hiroki
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: panic: spin lock held too long (RELENG_8 from today)

2011-09-06 Thread Hiroki Sato
Attilio Rao atti...@freebsd.org wrote
  in CAJ-FndAChGndC=lkzni7i6mot+spw3-ofto9rh0+5wnnvwz...@mail.gmail.com:

at This should be enough for someone NFS-aware to look into it.
at
at Were you also able to get a core?

 Yes.  But as kib@ pointed out, it seems to be a deadlock in ZFS.  Some
 experiments I did showed that this deadlock can be triggered at least
 by doing rm -rf against a local directory that has a large number
 of files/sub-directories.

 Then I updated the kernel to the latest 8-STABLE with the WITNESS
 option, because a fix for a LOR of the spa_config lock was committed
 and tracking locks without WITNESS was hard.  The deadlock can still
 be triggered after that.

 During this investigation a disk had to be replaced, and resilvering
 is now in progress.  A deadlock and a forced reboot during that would
 make recovery of the zfs datasets take a long time (for committing
 logs, I think), so I will try to reproduce the deadlock and get a
 core dump after the resilvering has finished.

 If the old kernel and core of the deadlock I reported on Saturday are
 still useful for debugging, I can put them somewhere you can access.

-- Hiroki




Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-19 Thread Hiroki Sato
Attilio Rao atti...@freebsd.org wrote
  in CAJ-FndDHmwa+=lnggu+5mk2xmtj8kwhb10jsoytkmgetvgn...@mail.gmail.com:

at If nobody complains about it earlier, I'll propose the patch to re@ in 8 hours.

 Running fine for 45 hours so far.  Please go ahead!

-- Hiroki




Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-18 Thread Hiroki Sato
Chip Camden sterl...@camdensoftware.com wrote
  in 20110818025550.ga1...@libertas.local.camdensoftware.com:

st Quoth Attilio Rao on Thursday, 18 August 2011:
st  In callout_cpu_switch() if a low priority thread is migrating the
st  callout and gets preempted after the outcoming cpu queue lock is left
st  (and scheduled much later) we get this problem.
st 
st  In order to fix this bug it could be enough to use a critical section,
st  but I think this should be really interrupt safe, thus I'd wrap them
st  up with spinlock_enter()/spinlock_exit(). Fortunately
st  callout_cpu_switch() should be called rarely and also we already do
st  expensive locking operations in callout, thus we should not have
st  problem performance-wise.
st 
st  Can the guys I also CC'ed here try the following patch, with all the
st  initial kernel options that were leading you to the deadlock? (thus
st  revert any debugging patch/option you added for the moment):
st  http://www.freebsd.org/~attilio/callout-fixup.diff
st 
st  Please note that this patch is for STABLE_8, if you can confirm the
st  good result I'll commit to -CURRENT and then backmarge as soon as
st  possible.
st 
st  Thanks,
st  Attilio
st 
st
st Thanks, Attilio.  I've applied the patch and removed the extra debug
st options I had added (though keeping debug symbols).  I'll let you know if
st I experience any more panics.

 No panic for 20 hours at this moment, FYI.  For my NFS server, I
 think another 24 hours would be sufficient to confirm the stability.
 I will see how it works...

-- Hiroki




Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-17 Thread Hiroki Sato
Hi,

Mike Tancsa m...@sentex.net wrote
  in 4e15a08c.6090...@sentex.net:

mi On 7/7/2011 7:32 AM, Mike Tancsa wrote:
mi  On 7/7/2011 4:20 AM, Kostik Belousov wrote:
mi 
mi  BTW, we had a similar panic, spinlock held too long, the spinlock
mi  is the sched lock N, on busy 8-core box recently upgraded to the
mi  stable/8. Unfortunately, machine hung dumping core, so the stack trace
mi  for the owner thread was not available.
mi 
mi  I was unable to make any conclusion from the data that was present.
mi  If the situation is reproducable, you coulld try to revert r221937. This
mi  is pure speculation, though.
mi 
mi  Another crash just now after 5hrs uptime. I will try and revert r221937
mi  unless there is any extra debugging you want me to add to the kernel
mi  instead  ?

 I am also suffering from a reproducible panic on an 8-STABLE box, an
 NFS server with heavy I/O load.  I could not get a kernel dump
 because the panic locked up the machine just after it occurred, but
 according to the stack trace it was the same as the one posted.
 Switching to an 8.2R kernel prevents this panic.

 Any progress on the investigation?

--
spin lock 0x80cb46c0 (sched lock 0) held by 0xff01900458c0 (tid 100489) too long
panic: spin lock held too long
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
_mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
_mtx_lock_spin() at _mtx_lock_spin+0x9e
sched_add() at sched_add+0x117
setrunnable() at setrunnable+0x78
sleepq_signal() at sleepq_signal+0x7a
cv_signal() at cv_signal+0x3b
xprt_active() at xprt_active+0xe3
svc_vc_soupcall() at svc_vc_soupcall+0xc
sowakeup() at sowakeup+0x69
tcp_do_segment() at tcp_do_segment+0x25e7
tcp_input() at tcp_input+0xcdd
ip_input() at ip_input+0xac
netisr_dispatch_src() at netisr_dispatch_src+0x7e
ether_demux() at ether_demux+0x14d
ether_input() at ether_input+0x17d
em_rxeof() at em_rxeof+0x1ca
em_handle_que() at em_handle_que+0x5b
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--

-- Hiroki




Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-17 Thread Hiroki Sato
Attilio Rao atti...@freebsd.org wrote
  in caj-fndcdow0_b2mv0lzeo-tpea9+7oanj7ihvkqsm4j4b0d...@mail.gmail.com:

at 2011/8/17 Hiroki Sato h...@freebsd.org:
at  Hi,
at 
at  Mike Tancsa m...@sentex.net wrote
at   in 4e15a08c.6090...@sentex.net:
at 
at  mi On 7/7/2011 7:32 AM, Mike Tancsa wrote:
at  mi  On 7/7/2011 4:20 AM, Kostik Belousov wrote:
at  mi 
at  mi  BTW, we had a similar panic, spinlock held too long, the spinlock
at  mi  is the sched lock N, on busy 8-core box recently upgraded to the
at  mi  stable/8. Unfortunately, machine hung dumping core, so the stack trace
at  mi  for the owner thread was not available.
at  mi 
at  mi  I was unable to make any conclusion from the data that was present.
at  mi  If the situation is reproducable, you coulld try to revert r221937. This
at  mi  is pure speculation, though.
at  mi 
at  mi  Another crash just now after 5hrs uptime. I will try and revert r221937
at  mi  unless there is any extra debugging you want me to add to the kernel
at  mi  instead  ?
at 
at   I am also suffering from a reproducible panic on an 8-STABLE box, an
at   NFS server with heavy I/O load.  I could not get a kernel dump
at   because this panic locked up the machine just after it occurred, but
at   according to the stack trace it was the same as posted one.
at   Switching to an 8.2R kernel can prevent this panic.
at 
at   Any progress on the investigation?
at 
at Hiroki,
at how easilly can you reproduce it?

 It takes 5-10 hours.  I installed another kernel for debugging just
 now, so I think I will be able to collect more detailed information
 in a couple of days.

at It would be important to have a DDB textdump with these informations:
at - bt
at - ps
at - show allpcpu
at - alltrace
at 
at Alternatively, a coredump which has the stop cpu patch which Andryi can provide.

 Okay, I will post them once I can get another panic.  Thanks!
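
 For reference, the four commands requested above could be collected
 automatically at panic time through ddb(8) scripting.  The following
 is only a minimal sketch, assuming a kernel built with DDB and
 textdump support; the helper name ddb_panic_script is hypothetical,
 and the resulting line should be checked against ddb(8) and
 textdump(4) on the target release before it is used:

```python
def ddb_panic_script(commands, event="kdb.enter.panic"):
    """Build a ddb(8) 'script' command line for the given DDB commands.

    Assumption: the script enables textdump mode, turns output capture on,
    runs the requested commands, then dumps and resets the machine.
    """
    steps = ["textdump set", "capture on", *commands, "call doadump", "reset"]
    return 'ddb script {}="{}"'.format(event, "; ".join(steps))


# The commands asked for in this thread.
print(ddb_panic_script(["bt", "ps", "show allpcpu", "alltrace"]))
```

 The printed line is meant to be run once as root after boot; whether
 the box survives long enough to write the textdump is a separate
 question, as this thread shows.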

-- Hiroki




Re: panic: spin lock held too long (RELENG_8 from today)

2011-08-17 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110818.043332.27079545013461535@allbsd.org:

hr Attilio Rao atti...@freebsd.org wrote
hr   in caj-fndcdow0_b2mv0lzeo-tpea9+7oanj7ihvkqsm4j4b0d...@mail.gmail.com:
hr 
hr at 2011/8/17 Hiroki Sato h...@freebsd.org:
hr at  Hi,
hr at 
hr at  Mike Tancsa m...@sentex.net wrote
hr at   in 4e15a08c.6090...@sentex.net:
hr at 
hr at  mi On 7/7/2011 7:32 AM, Mike Tancsa wrote:
hr at  mi  On 7/7/2011 4:20 AM, Kostik Belousov wrote:
hr at  mi 
hr at  mi  BTW, we had a similar panic, spinlock held too long, the
hr at  mi  spinlock is the sched lock N, on busy 8-core box recently
hr at  mi  upgraded to the stable/8. Unfortunately, machine hung dumping
hr at  mi  core, so the stack trace for the owner thread was not available.
hr at  mi 
hr at  mi  I was unable to make any conclusion from the data that was
hr at  mi  present.  If the situation is reproducible, you could try to
hr at  mi  revert r221937. This is pure speculation, though.
hr at  mi 
hr at  mi  Another crash just now after 5hrs uptime. I will try and revert
hr at  mi  r221937 unless there is any extra debugging you want me to add
hr at  mi  to the kernel instead?
hr at 
hr at   I am also suffering from a reproducible panic on an 8-STABLE box, an
hr at   NFS server with heavy I/O load.  I could not get a kernel dump
hr at   because this panic locked up the machine just after it occurred, but
hr at   according to the stack trace it was the same as posted one.
hr at   Switching to an 8.2R kernel can prevent this panic.
hr at 
hr at   Any progress on the investigation?
hr at 
hr at Hiroki,
hr at how easilly can you reproduce it?
hr 
hr  It takes 5-10 hours.  I installed another kernel for debugging just
hr  now, so I think I will be able to collect more detailed information
hr  in a couple of days.
hr 
hr at It would be important to have a DDB textdump with these informations:
hr at - bt
hr at - ps
hr at - show allpcpu
hr at - alltrace
hr at 
hr at Alternatively, a coredump which has the stop cpu patch which Andryi
hr at can provide.
hr 
hr  Okay, I will post them once I can get another panic.  Thanks!

 I got the panic with a crash dump this time.  The result of bt, ps,
 allpcpu, and traces can be found at the following URL:

  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

-- Hiroki
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.1 Pre-release gpart, isn't setting type correctly

2010-05-31 Thread Hiroki Sato
Phil p...@amdg.etowns.org wrote
  in 580ca5b8f8654fc782cc113761458...@hs:

ph Performing the following gpart commands on either a hard disk or
ph usb memory stick doesn't correctly store the gpart type information.
ph
ph What we're doing, using FreeBSD 8.1-PRERELEASE, csuped as at
ph 30-May-2010 23:59 UTC (*default date=2010.05.30.23.59.59)
ph
ph # gpart create -s GPT da1
ph # gpart add -s 1G -t freebsd-ufs da1
ph # gpart show da1
ph
ph =>     34  7827325  da1  GPT  (3.7G)
ph        34  2097152    1  !----  (1.0G)
ph   2097186  5730173       - free -  (2.7G)

 This is probably the same issue reported at
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern%2F142174 and already
 fixed on CURRENT.  I guess the fix will be merged to 8.X soon.

-- Hiroki




Re: em interface slow down on 8.0R

2010-05-24 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20091220.053757.230970486@allbsd.org:

hr Jack Vogel jfvo...@gmail.com wrote
hr   in 2a41acea0912052327t7830f85aw5b4b581ab3f09...@mail.gmail.com:
hr
hr jf The 82573, when onboard (LOM) is usually special, it is used by system
hr jf management firmware.  Go to the system BIOS and turn off management,
hr jf see if that eliminates the periodic hang.
hr
hr  Well, I am using them without enabling such a BIOS feature on the two
hr  boxes.
hr
hr  I was monitoring for 1 week after replacing the kernel of 8.0-STABLE
hr  with 8.0R.  Frequency of the symptom was reduced, but occurred once
hr  in 2-3 days.  So it is reproducible on 8.0R, too.

 JFYI, when I tried 8-STABLE as of May 15 the periodic hang-ups
 disappeared.  The chip ids are 0x109a8086 and 0x108c8086 (pciconf
 reported them as 82573L and 82573E, added to PCI slots on the box).
 The hang-ups were able to be reproduced on 8.0-RELEASE.
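
 As a side note on reading these chip ids: the pciconf and sysctl
 output quoted elsewhere in this thread (chip=0x10198086 next to
 vendor=0x8086 device=0x1019) suggests that the low 16 bits are the
 vendor id and the high 16 bits are the device id.  A small sketch
 under that assumption; split_chip_id is a hypothetical helper name:

```python
def split_chip_id(chip):
    """Split a pciconf-style chip id into (vendor, device).

    Assumes the layout chip = (device << 16) | vendor.
    """
    vendor = chip & 0xFFFF           # low 16 bits
    device = (chip >> 16) & 0xFFFF   # high 16 bits
    return vendor, device


for chip in (0x109a8086, 0x108c8086):
    vendor, device = split_chip_id(chip)
    print("chip=0x{:08x} vendor=0x{:04x} device=0x{:04x}".format(chip, vendor, device))
```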

 I have not tried the other boxes which had another symptom (abnormally
 long interval between each packet), but I will give it a try and
 report it, too.  I have no idea what the cause was, though, because
 there were a lot of changes since the release.

-- Hiroki




Re: em interface slow down on 8.0R

2009-12-19 Thread Hiroki Sato
Jack Vogel jfvo...@gmail.com wrote
  in 2a41acea0912052327t7830f85aw5b4b581ab3f09...@mail.gmail.com:

jf The 82573, when onboard (LOM) is usually special, it is used by system
jf management firmware.  Go to the system BIOS and turn off management,
jf see if that eliminates the periodic hang.

 Well, I am using them without enabling such a BIOS feature on the two
 boxes.

 I was monitoring for 1 week after replacing the kernel of 8.0-STABLE
 with 8.0R.  Frequency of the symptom was reduced, but occurred once
 in 2-3 days.  So it is reproducible on 8.0R, too.

 Just after the symptom occurred, dev.em.[01].debug showed the
 following:

Dec 17 16:50:03 pool kernel: em0: Std mbuf failed = 0
Dec 17 16:50:03 pool kernel: em0: Std mbuf cluster failed = 9612

Dec 17 16:50:12 pool kernel: em1: Std mbuf failed = 0
Dec 17 16:50:12 pool kernel: em1: Std mbuf cluster failed = 15183

 The other numbers look normal to me.  dev.em.[01].stats reported
 almost all of the counters other than Good Packets are zero.
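
 The nonzero counters can be picked out of such log lines
 mechanically.  A minimal sketch, assuming the syslog format shown
 above ("<iface>: <counter name> = <number>" at the end of each line);
 the function names are hypothetical:

```python
import re

LOG = """\
Dec 17 16:50:03 pool kernel: em0: Std mbuf failed = 0
Dec 17 16:50:03 pool kernel: em0: Std mbuf cluster failed = 9612
Dec 17 16:50:12 pool kernel: em1: Std mbuf failed = 0
Dec 17 16:50:12 pool kernel: em1: Std mbuf cluster failed = 15183
"""

# Matches "<iface>: <counter name> = <integer>" at the end of a line.
COUNTER_RE = re.compile(r"(em\d+): (.+?) = (\d+)\s*$")


def parse_counters(text):
    """Return {iface: {counter name: value}} for every matching line."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            iface, name, value = match.groups()
            counters.setdefault(iface, {})[name] = int(value)
    return counters


def nonzero(counters):
    """Keep only the counters whose value is not zero."""
    return {(iface, name): value
            for iface, names in counters.items()
            for name, value in names.items() if value}


for (iface, name), value in sorted(nonzero(parse_counters(LOG)).items()):
    print(iface, name, value)
```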

 Doing ifconfig down/up could make it work again, but after
 sending/receiving 10 packets or so it stopped.

-- Hiroki




Re: em interface slow down on 8.0R

2009-12-05 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20091203.182931.129751456@allbsd.org:

hr  And another thing, I noticed a box with 82573E and 82573L sometimes
hr  got stuck after upgrading to 8.0-STABLE.  It has moderate network
hr  load (average 5-10Mbps) on both NICs.  It worked for a day or two and
hr  then got stuck suddenly.  Rebooting the box solved the situation, but
hr  it got stuck again after a day or so.  After it happens, the
hr  interface does not respond.  The other functionalities of FreeBSD
hr  seemed working.  Doing an up/down cycle for the NICs seemed to send
hr  some packets, but it did not recover completely; rebooting was needed
hr  for recovery.  This box does not have the RTT problem.  I am still
hr  not sure what is the trigger, there seems something wrong.

 What I have found out about this symptom so far:

 - This occurs around once per 1-2 days.

 - Once it occurs, all communication including ARP and IPv4 stops.

 - ifconfig em0 down/up can recover the interface. However, when doing
   up after down the following message was displayed:

   # ifconfig em0 up
   em0: Could not setup receive structures

   After trying it several times it worked.

   Then, the interface seemed back to normal for a couple of minutes,
   but it stopped again.

 I guess there is a kind of deadlock somewhere but not sure it is
 really related to the em(4) driver.  I will continue to investigate
 anyway.

-- Hiroki




Re: loader(8) readin failed on 7.2R and later including 8.0R

2009-12-05 Thread Hiroki Sato
John Baldwin j...@freebsd.org wrote
  in 200912041734.24016@freebsd.org:

jh On Friday 04 December 2009 10:35:59 am John Baldwin wrote:
jh  So memtop_copyin would start off as 0xf0 but would end up as 0xc0,
jh  and since the kernel starts at 4MB, I think that only leaves about 8MB for
jh  the kernel.  Probably the loader needs to be more intelligent about using
jh  high memory for malloc by using the largest region > 1MB but < 4GB for
jh  malloc() instead of stealing memory from bios_extmem in the SMAP case.
jh  Try the attached patch which tries to make the loader use better smarts
jh  when picking a memory region for the heap (warning, I haven't tested it
jh  myself yet).
jh
jh Use the updated patch (actually tested in qemu) instead.

 Thanks!  I applied your patch and tried loading an 8.0R kernel
 (without LOADER_NO_GPT_SUPPORT=yes).  The elf32_loadimage: read
 failed error message disappeared:

 OK load /boot/kernel.N/kernel
 /boot/kernel.N/kernel text=0x8db9a4 data=0xdd134+0xa5e84 syms=[0x4+0x99390+0x4+0xd2201
 elf32_loadimage: could not read symbols - skipped!
 OK

 A summary so far is:

 1)  a 8MB 7.1R kernel + stock 8.0R loader
 2a) a >8MB 8.0R kernel + stock 8.0R loader
 2b) a >8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
 2c) a >8MB 8.0R kernel + loader with your patch
 3a) a <8MB 8.0R kernel + stock 8.0R loader
 3b) a <8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
 3c) a <8MB 8.0R kernel + loader with your patch

      loading text    loading syms   boot
 1)   OK              OK             OK
 2a)  readin failed   -              -
 2b)  OK              skipped!       NG
 2c)  OK              skipped!       NG
 3a)  not tried yet
 3b)  OK              OK             NG
 3c)  OK              OK             NG

 Loading syms sections still fails for the large kernel.  The
 boot=NG means it got stuck after l_exec() in boot.c and before
 cninit() in i386/machdep.c as far as I can check by inserting
 printf().  So the cause of that is something in the kernel,
 I guess.  Hm.

 One special thing about that box is that it has four quad-hme
 PCI cards.  I will try removing them and see if it changes anything.

-- Hiroki




Re: loader(8) readin failed on 7.2R and later including 8.0R

2009-12-05 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20091205.184250.201700943@allbsd.org:

hr  A summary so far is:
hr
hr  1)  a 8MB 7.1R kernel + stock 8.0R loader
hr  2a) a 8MB 8.0R kernel + stock 8.0R loader
hr  2b) a 8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
hr  2c) a 8MB 8.0R kernel + loader with your patch
hr  3a) a 8MB 8.0R kernel + stock 8.0R loader
hr  3b) a 8MB 8.0R kernel + 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
hr  3c) a 8MB 8.0R kernel + loader with your patch

 Grr, I double-checked how it got stuck, then I found the console
 redirect was disabled because of an old device.hints.  The revised
 summary is:

       loading text    loading syms   boot
  1)   OK              OK             OK
  2a)  readin failed   -              -
  2b)  OK              skipped!       OK
  2c)  OK              skipped!       OK
  3a)  OK              OK             OK
  3b)  OK              OK             OK
  3c)  OK              OK             OK

 So, the case 2c shows that your patch solves the problem in the case
 2a.  Thank you! :)

 Loading a >8MB kernel works now, but loading the syms sections still
 fails even in the case 2c.

-- Hiroki




Re: em interface slow down on 8.0R

2009-12-05 Thread Hiroki Sato
John Nielsen j...@jnielsen.net wrote
  in 1e3c66ea-a6d3-44d7-b28e-bf068fff1...@jnielsen.net:

jo On Dec 5, 2009, at 4:40 AM, Hiroki Sato h...@freebsd.org wrote:
jo
jo  Hiroki Sato h...@freebsd.org wrote
jo   in 20091203.182931.129751456@allbsd.org:
jo 
jo  hr  And another thing, I noticed a box with 82573E and 82573L sometimes
jo  hr  got stuck after upgrading to 8.0-STABLE.  It has moderate network
jo  hr  load (average 5-10Mbps) on both NICs.  It worked for a day or two and
jo  hr  then got stuck suddenly.  Rebooting the box solved the situation, but
jo  hr  it got stuck again after a day or so.  After it happens, the
jo  hr  interface does not respond.  The other functionalities of FreeBSD
jo  hr  seemed working.  Doing an up/down cycle for the NICs seemed to send
jo  hr  some packets, but it did not recover completely; rebooting was needed
jo  hr  for recovery.  This box does not have the RTT problem.  I am still
jo  hr  not sure what is the trigger, there seems something wrong.
jo 
jo  Things turned out for this symptom so far are:
jo 
jo  - This occurs around once per 1-2 days.
jo 
jo  - Once it occurs, all of communications including ARP and IPv4 stop.
jo 
jo  - ifconfig em0 down/up can recover the interface. However, on doing
jo    up after down the following message was displayed:
jo 
jo    # ifconfig em0 up
jo    em0: Could not setup receive structures
jo 
jo    After trying it several times it worked.
jo 
jo    Then, the interface seemed back to normal for a couple of minutes,
jo    but it stopped again.
jo 
jo  I guess there is a kind of deadlock somewhere but not sure it is
jo  really related to the em(4) driver.  I will continue to investigate
jo  anyway.
jo
jo I'm curious, what speed/duplex is your interface using and is it
jo statically set or using autoselect?

 No manual configuration.  Two em's are set as the following:

 | media: Ethernet autoselect (1000baseT full-duplex)

 It is mainly used for NFS server.  The actual communication speed was
 around 700Mbps at peak.

-- Hiroki




Re: em interface slow down on 8.0R

2009-12-03 Thread Hiroki Sato
Hi Jack,

Jack Vogel jfvo...@gmail.com wrote
  in 2a41acea0912021514r2d44dd33n4c364518d7fe1...@mail.gmail.com:

jf Update: the claim to be unable to install was hasty, I went in and looked
jf into myself and was able to get an install. Here's what I've found so far:
jf
jf First, the 82547EI will fail due to Invalid Mac Address, so I guess you
jf hacked around this problem yourself?  I had someone here test all
jf legacy adapters for this problem and I was told nothing else was exhibiting
jf it besides the 82542, obviously this is false :)  In any case I will be
jf making an official patch to fix that problem soon.
jf
jf Second, once I had the device working I do indeed see substandard
jf performance, I am continuing to debug, but wanted you to know that I
jf have reproduced this.

 Thank you!  I have investigated some more details.  First, I got
 something wrong with the affected FreeBSD versions; one I tried was
 8.0-STABLE, not 8.0-RELEASE.  So I started to try 8.0R.  A summary of
 chips and releases I tried so far is now the following:

                                        7.2R  8.0R  8.0-STABLE
 82540EM (chip=0x100e8086, rev=0x02)    OK    OK    too slow[1]
 82541PI (chip=0x107c8086, rev=0x05)    OK    ?     OK
 82545ep (chip=0x10268086, rev=0x04)    OK    ?     OK
 82547EI (chip=0x10198086, rev=0x00)    OK    OK    too slow[1]
 82562V-2(chip=0x10c08086, rev=0x02)    OK    ?     OK
 82573E  (chip=0x108c8086, rev=0x03)    OK    ?     work but sometimes freeze[2]
 82573L  (chip=0x109a8086, rev=0x00)    OK    ?     work but sometimes freeze[2]

 8.0-STABLE is as of Dec 1. The [1] means the odd RTT I described in
 the previous email.  The [2] means it worked fine but sometimes it
 stopped working, as described later.

 The long RTT symptom is reproducible on Intel D865BGP motherboard.
 When I inserted another PCI card with an 82545ep onto it, it worked
 fine as em1.  The em0 still had the problem after adding the em1
 card.  I did not manually set MAC address on it, and there was no
 error related to it.

 The above box is used for some network services, so I prepared
 another box based on D865BGP motherboard.  This box has two NICs,
 82547EI and 82540EM.  The former is on-board and the latter is a PCI
 card.  The 8.0R worked fine with the two.  On the 8.0-STABLE both
 NICs have the RTT problem.  The following difference was found by
 comparing the outputs dev.em.[01].debug with each other:

-em0: Adapter hardware address = 0xc42e1424
+em0: Adapter hardware address = 0xc42e0424
-em1: Adapter hardware address = 0xc4364424
+em1: Adapter hardware address = 0xc435e424

 The - lines are on 8.0-STABLE, and the + ones are on 8.0-RELEASE.

 Although I have not yet tried 8.0R on the other boxes which work fine
 on 8.0-STABLE, it is certain that the RTT problem did not occur on
 that box + 8.0R, at least.  Difference of em(4) between 8.0-RELEASE
 and 8.0-STABLE is quite small, so perhaps it is due to some other
 changes...  If there is something else I should try, please let me
 know.

 And another thing, I noticed a box with 82573E and 82573L sometimes
 got stuck after upgrading to 8.0-STABLE.  It has moderate network
 load (average 5-10Mbps) on both NICs.  It worked for a day or two and
 then got stuck suddenly.  Rebooting the box solved the situation, but
 it got stuck again after a day or so.  After it happens, the
 interface does not respond.  The other functionalities of FreeBSD
 seemed working.  Doing an up/down cycle for the NICs seemed to send
 some packets, but it did not recover completely; rebooting was needed
 for recovery.  This box does not have the RTT problem.  I am still
 not sure what the trigger is, but something seems to be wrong.

-- Hiroki




Re: loader(8) readin failed on 7.2R and later including 8.0R

2009-12-03 Thread Hiroki Sato
John Baldwin j...@freebsd.org wrote
  in 200912020948.05698@freebsd.org:

jh On Tuesday 01 December 2009 12:13:39 pm Hiroki Sato wrote:
jh   While the load command seemed to finish, the box got stuck just
jh   after entering boot command.
jh 
jh   Curious to say, I have got this symptom only on a specific box in
jh   more than ten different boxes I upgraded so far; it is based on an
jh   old motherboard Supermicro P4DPE[*].
jh 
jh   [*] http://www.supermicro.com/products/motherboard/Xeon/E7500/P4DPE.cfm
jh 
jh   Any workaround?  Booting from release CDROMs (7.2R and 8.0R) also
jh   fail.  On the box 7.1R or 7.1R's loader + 7.2R kernel worked
jh   fine.  It is possible something in changes of loader(8) between 7.1R
jh   and 7.2R is the cause, but I am still not sure what it is...
jh
jh It may be related to the loader switching to using memory > 1MB for its
jh malloc().  Maybe try building the loader with 'LOADER_NO_GPT_SUPPORT=yes' in
jh /etc/src.conf?

 Thanks, a loader recompiled with LOADER_NO_GPT_SUPPORT=yes displayed
 "elf32_loadimage: could not read symbols - skipped!" for the 8.0R
 kernel.  This is the same as the 7.1R loader + 8.0R kernel case.

-- Hiroki




Re: loader(8) readin failed on 7.2R and later including 8.0R

2009-12-03 Thread Hiroki Sato
John Baldwin j...@freebsd.org wrote
  in 200912030803.29797@freebsd.org:

jh On Thursday 03 December 2009 5:29:13 am Hiroki Sato wrote:
jh  John Baldwin j...@freebsd.org wrote
jhin 200912020948.05698@freebsd.org:
jh 
jh  jh On Tuesday 01 December 2009 12:13:39 pm Hiroki Sato wrote:
jh  jh   While the load command seemed to finish, the box got stuck just
jh  jh   after entering boot command.
jh  jh 
jh  jh   Curious to say, I have got this symptom only on a specific box in
jh  jh   more than ten different boxes I upgraded so far; it is based on an
jh  jh   old motherboard Supermicro P4DPE[*].
jh  jh 
jh  jh   [*] http://www.supermicro.com/products/motherboard/Xeon/E7500/P4DPE.cfm
jh  jh 
jh  jh   Any workaround?  Booting from release CDROMs (7.2R and 8.0R) also
jh  jh   fail.  On the box 7.1R or 7.1R's loader + 7.2R kernel worked
jh  jh   fine.  It is possible something in changes of loader(8) between
jh  jh   7.1R and 7.2R is the cause, but I am still not sure what it is...
jh  jh
jh  jh It may be related to the loader switching to using memory > 1MB
jh  jh for its malloc().  Maybe try building the loader with
jh  jh 'LOADER_NO_GPT_SUPPORT=yes' in /etc/src.conf?
jh 
jh   Thanks, a recompiled loader with LOADER_NO_GPT_SUPPORT=yes' displayed
jh   elf32_loadimage: could not read symbols - skipped! for 8.0R kernel.
jh   This is the same as 7.1R's loader + 8.0R kernel case.
jh
jh Can you get the output of 'smap' from the loader?  Is the 8.0 kernel bigger
jh than the 7.x kernel?  If so, can you try trimming the 8.0 kernel a bit
jh to see if that changes things?

 Sure.  Output of smap on an 8.0R loader with LOADER_NO_GPT_SUPPORT=yes
 was:

| OK smap
| SMAP type=01 base= len=0009f400
| SMAP type=02 base=0009f400 len=0c00
| SMAP type=02 base=000dc000 len=00024000
| SMAP type=01 base=0010 len=00e0
| SMAP type=02 base=00f0 len=0010
| SMAP type=01 base=0100 len=beef
| SMAP type=03 base=bfef len=c000
| SMAP type=04 base=bfefc000 len=4000
| SMAP type=01 base=bff0 len=0008
| SMAP type=02 base=bff8 len=0008
| SMAP type=02 base=fec0 len=0001
| SMAP type=02 base=fee0 len=1000
| SMAP type=02 base=ff80 len=0040
| SMAP type=02 base=fff0 len=0010
| OK

 Size difference between the two kernels was:

| -r-xr-xr-x  1 root  wheel   9708240 Dec  1 16:22 kernel.7/kernel
| -r-xr-xr-x  1 root  wheel  11492703 Nov 21 15:48 kernel.8/kernel

 Then I rebuilt a smaller 8.0 kernel by removing some entries from the
 kernel configuration file.  The size is now smaller than 7.1R kernel:

| -r-xr-xr-x  1 root  wheel  7710491 Dec  3 21:10 /boot/kernel.8X/kernel

 Loading the new kernel seemed to work fine with the recompiled 8.0R
 loader, but it got stuck just after entering boot:

| OK load /boot/kernel.8X/kernel
| /boot/kernel.8X/kernel text=0x5a7664 data=0x88d74+0x82f04 syms=[0x4+0x6d290+0x4+0x987e3]
| OK boot
| /

 Loading 7.1R kernel by using the recompiled 8.0R loader had no
 problem.
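
 To compare kernels, the sizes printed on the load line can be summed
 per section.  This is only a sketch, assuming the text=/data=/syms=
 format shown in the loader output above, where each value is one or
 more hex terms joined by "+"; parse_load_line is a hypothetical
 helper and it does not interpret what the individual terms mean:

```python
import re

LINE = ("/boot/kernel.8X/kernel text=0x5a7664 data=0x88d74+0x82f04 "
        "syms=[0x4+0x6d290+0x4+0x987e3]")


def parse_load_line(line):
    """Return {'text': n, 'data': n, 'syms': n} in bytes.

    Each key=value pair may be wrapped in [...] and may hold several
    hex terms joined by '+'; the terms are simply added together.
    """
    sizes = {}
    for key, value in re.findall(r"(\w+)=\[?([0-9a-fx+]+)\]?", line):
        sizes[key] = sum(int(term, 16) for term in value.split("+"))
    return sizes


sizes = parse_load_line(LINE)
for key in ("text", "data", "syms"):
    print("{:5s} {:>9d} bytes".format(key, sizes[key]))
print("text below 8MB:", sizes["text"] < 8 * 1024 * 1024)
```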

-- Hiroki




loader(8) readin failed on 7.2R and later including 8.0R

2009-12-01 Thread Hiroki Sato
Hi,

 This may be a rare case, but I post this with the hope for ideas from
 people here.

 I have experienced a strange loader(8) error.  After upgrading one of
 my boxes from 7.1R to 7.2R, an error appeared on boot command of
 loader(8) like this:

 | FreeBSD/i386 bootstrap loader, Revision 1.1
 | (h...@cmaster.allbsd.org, Mon Nov 30 04:01:24 JST 2009)
 | Loading /boot/defaults/loader.conf
 | /boot/kernel/kernel text=0x8b6c04
 | readin failed
 |
 | elf32_loadimage: read failed
 | /boot/kernel/kernel text=0x8b6c04
 | readin failed
 |
 | elf32_loadimage: read failed
 | Unable to load a kernel!

 (Actually the above error message was displayed when I upgraded it to
 8.0R.  The message was the same when I tried 7.2R.)

 Replacing the /boot/loader with 7.1R's one, 7.2R's kernel worked
 fine.

 Next, I tried to upgrade it to 8.0R.  As I explained earlier, the
 8.0R's loader did not work either, so I replaced it with 7.1R again.
 However, 7.1R loader(8) + 8.0R kernel displayed the following error
 and did not work:

 | OK load /boot/kernel/kernel
 | /boot/kernel/kernel text=0x8db9a4 data=0xdd134+0xa5e84 syms=[0x4+0x99390+0x4+0xd2201
 | elf32_loadimage: could not read symbols - skipped!

 While the load command seemed to finish, the box got stuck just
 after entering boot command.

 Curious to say, I have got this symptom only on a specific box in
 more than ten different boxes I upgraded so far; it is based on an
 old motherboard Supermicro P4DPE[*].

 [*] http://www.supermicro.com/products/motherboard/Xeon/E7500/P4DPE.cfm

 Any workaround?  Booting from release CDROMs (7.2R and 8.0R) also
 fail.  On the box 7.1R or 7.1R's loader + 7.2R kernel worked
 fine.  It is possible something in changes of loader(8) between 7.1R
 and 7.2R is the cause, but I am still not sure what it is...

-- Hiroki




em interface slow down on 8.0R

2009-11-30 Thread Hiroki Sato
Hi,

 I noticed that network connection of one of my boxes got
 significantly slow just after upgrading it to 8.0R.  The box has an
 em0 (82547EI) and worked fine with 7.2R.

 The symptoms are:

 - A ping to a host on the same LAN takes 990ms RTT, it reduces
   gradually to around 1ms, and then it returns to around 1s.  The
   rate was about 2ms/ping.

 - The response is quite slow, but there is no packet loss and network
   services on the box seem to work fine as far as I can check.  There
   does not seem to be an interrupt storm according to vmstat -i.  No
   error message such as watchdog timeout appears.

 Any ideas to narrow down the cause?  It may be a linkup problem with a
 specific model of hub like full-duplex/half-duplex mismatch, but the
 link is 1000baseT full-duplex and setting it manually did not
 solve it.  I think it is certain that upgrading to 8.0R triggered it,
 at least.

 Another box with an em interface works fine after upgrading to 8.0R.
 It has a different chip (82573E).

 Details of the em interface and vmstat -i are the following:

 e...@pci0:1:1:0: class=0x02 card=0x302c8086 chip=0x10198086 rev=0x00 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Gigabit Ethernet Controller (LOM) (82547EI)'
     class      = network
     subclass   = ethernet

 Adapter hardware address = 0xc42e1424
 em0: CTRL = 0x183c0241 RCTL = 0x8002
 em0: Packet buffer = Tx=10k Rx=30k
 em0: Flow control watermarks high = 28672 low = 27172
 em0: tx_int_delay = 66, tx_abs_int_delay = 66
 em0: rx_int_delay = 0, rx_abs_int_delay = 66
 em0: fifo workaround = 0, fifo_reset_count = 0
 em0: hw tdh = 49, hw tdt = 49
 em0: hw rdh = 238, hw rdt = 187
 em0: Num Tx descriptors avail = 250
 em0: Tx Descriptors not avail1 = 0
 em0: Tx Descriptors not avail2 = 0
 em0: Std mbuf failed = 0
 em0: Std mbuf cluster failed = 0
 em0: Driver dropped packets = 0
 em0: Driver tx dma failure in encap = 0

 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.14
 dev.em.0.%driver: em
 dev.em.0.%location: slot=1 function=0 handle=\_SB_.PCI0.P0P2.TANA
 dev.em.0.%pnpinfo: vendor=0x8086 device=0x1019 subvendor=0x8086 subdevice=0x302c class=0x02
 dev.em.0.%parent: pci1
 dev.em.0.debug: -1
 dev.em.0.stats: -1
 dev.em.0.rx_int_delay: 0
 dev.em.0.tx_int_delay: 66
 dev.em.0.rx_abs_int_delay: 66
 dev.em.0.tx_abs_int_delay: 66
 dev.em.0.rx_processing_limit: 100
 dev.em.0.wake: 0

 % vmstat -i
 interrupt                          total       rate
 irq4: uart0                         3585          3
 irq14: ata0                         1811          1
 irq15: ata1                          112          0
 irq16: uhci0 uhci3                    15          0
 irq18: em0 uhci2+                  92457         99
 irq19: uhci1                           1          0
 irq23: ehci0                           2          0
 cpu0: timer                      1849981       1997
 cpu1: timer                      1849961       1997
 Total                            3797925       4101
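
 A quick way to sanity-check such a table is to parse it and confirm
 that the per-source totals add up to the Total row.  A minimal
 sketch; the column spacing in the sample is an assumption (the
 archive may have collapsed it), and parse_vmstat_i is a hypothetical
 helper name:

```python
VMSTAT = """\
interrupt                          total       rate
irq4: uart0                         3585          3
irq14: ata0                         1811          1
irq15: ata1                          112          0
irq16: uhci0 uhci3                    15          0
irq18: em0 uhci2+                  92457         99
irq19: uhci1                           1          0
irq23: ehci0                           2          0
cpu0: timer                      1849981       1997
cpu1: timer                      1849961       1997
Total                            3797925       4101
"""


def parse_vmstat_i(text):
    """Return {source name: (total, rate)}, including the 'Total' row."""
    rows = {}
    for line in text.splitlines()[1:]:          # skip the header row
        # The last two whitespace-separated fields are total and rate;
        # everything before them is the interrupt source name.
        name, total, rate = line.rsplit(None, 2)
        rows[name.strip()] = (int(total), int(rate))
    return rows


rows = parse_vmstat_i(VMSTAT)
reported = rows["Total"][0]
summed = sum(total for name, (total, _) in rows.items() if name != "Total")
print("sum of sources:", summed, "reported total:", reported)
print("em0 share of all interrupts: {:.1%}".format(rows["irq18: em0 uhci2+"][0] / reported))
```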

-- Hiroki




Re: em interface slow down on 8.0R

2009-11-30 Thread Hiroki Sato
Jack Vogel jfvo...@gmail.com wrote
  in 2a41acea0911301119j1449be58y183f2fe1d1112...@mail.gmail.com:

jf I will look into this Hiroki, as time goes the older hardware does not
jf always get test cycles like one might wish.

 Thanks!  Please let me know if you need more information.

-- Hiroki




possible loader regression on RELENG_7_2_0_RELEASE

2009-05-03 Thread Hiroki Sato
During upgrading boxes in allbsd.org to RELENG_7_2_0_RELEASE I found
one of them could not boot at the loader stage.  The error messages
issued by the loader after make installkernel + make installworld +
reboot were the following:

|Loading /boot/defaults/loader.conf
|/boot/kernel/kernel text=0x7cbd7c data=0xcece0+0x67940
|readin failed
|
|elf32_loadimage: read failed
|/boot/kernel/kernel text=0x7cbd7c data=0xcece0+0x67940
|readin failed
|
|elf32_loadimage: read failed
|Unable to load a kernel!

The normal loader prompt was displayed after that and I could enter
commands, but neither that kernel nor some older kernels, which I had
confirmed to work fine before, could be loaded.

Then I tried a livefs CDROM, but the same error occurred at the loader
stage.  So I tried 7.1R CDROM instead, mounted the root file system on
the hard drive, and copied a loader binary from 7.1R.  It worked with
no problem with the RELENG_7_2_0_RELEASE kernel.

The motherboard was Supermicro P4DPE (Xeon 2.4GHz x 2, 3GB RAM).  The
installed version was FreeBSD/i386.

I have not narrowed down the cause yet because time was limited, but
it was reproducible and probably hardware-dependent.  Replacing the
loader binary with the old one worked as a workaround, so I guess
there may be a regression around the boot loader.  Just a report.

--
| Hiroki SATO




IPv6 routing on 7.1R

2009-01-11 Thread Hiroki Sato
Hi,

 I noticed an odd behavior regarding IPv6 after upgrading my 7.0R box
 to 7.1R.  The situation and symptom are the following:

 1. The box has two NICs.  One has an address 2001:0db8:1::1/64 (NIC
    A), and another has 2001:0db8:2::1/64 (NIC B).  These addresses
    are assigned manually ($ipv6_ifconfig in rc.conf).

 2. RA is periodically sent to the network 2001:0db8:1::1/64 (NIC A)
    by a router on the subnet.  The RA includes a source link-layer
    address option only.

 With net.inet6.ip6.accept_rtadv=1 set in this configuration, I
 expected the box to assign an autoconf IPv6 address (prefix
 2001:0db8:1::/64 + EUI64) to NIC A and to install a default route
 based on the source link-layer address in the RA packet.  Both of
 these happened as expected.  However, after the addresses were
 assigned, the routes for NIC B disappeared from the routing table.
 More specifically, the cloning route 2001:0db8:2::1/64 -> link#2 was
 removed for some reason.
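
 For readers unfamiliar with the "prefix + EUI64" construction, the
 expected autoconf address can be computed as follows.  This is a
 sketch of the usual modified EUI-64 rule (flip the universal/local
 bit of the MAC address and insert ff:fe in the middle); the MAC
 address used here is only a sample value, and slaac_address is a
 hypothetical helper name:

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with the modified EUI-64 interface ID of mac."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 based autoconfiguration needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                       # flip the universal/local bit
    # Insert ff:fe between the upper and lower three octets.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(
        int(net.network_address) + int.from_bytes(iid, "big"))


# Sample MAC address, for illustration only.
print(slaac_address("2001:db8:1::/64", "64:a0:e7:45:63:43"))
```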

 Is this an expected behavior?  IIRC, 7.0R does not remove the route
 and I think it is strange.  It works fine if a box has a single NIC,
 though.

--
| Hiroki SATO




Re: IPv6 over gif(4) broken in 6.2-RELEASE?

2007-01-20 Thread Hiroki Sato
Bruce A. Mah [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

bm I'm observing a problem with IPv6 over gif(4) tunnels on 6.2-RELEASE
bm and recent 6-STABLE, namely that I can't seem to be able to pass
bm traffic over them.
bm
bm Essentially, when I configure a gif interface like this:
bm
bm # ifconfig gif0 inet6 :::::1 :::::2 prefixlen 128
bm
bm the interface should add a host route to :::::2
bm through gif0.  This is necessary to be able to pass traffic over the
bm tunnel, particularly since the source and destination addresses of the
bm link don't need to have any relationship to each other.
bm
bm However, this route doesn't get installed on recent 6-STABLE.
bm Therefore there is no way to get an IPv6 packet to the other end of
bm the tunnel because there's no route for the destination.  The most
bm obvious symptom is that I try to ping the other tunnel endpoint and
bm get:
bm
bm ping6: UDP connect: No route to host
bm
bm I know this worked on RELENG_6 as of June 2006; my home firewall has
bm been running this code for months without a hitch.  It doesn't work in
bm 6.2-RC2 or 6.2-RELEASE (fresh CD installs on i386, GENERIC kernels),
bm or this week's RELENG_6 (nanobsd on i386).
bm
bm I somewhat suspect revs. 1.48.2.15 and 1.48.2.14 to
bm src/sys/netinet/nd6.c.  If I locally revert these two changes (see
bm diff below), IPv6 over gif(4) works again.
bm
bm There's another workaround for people stuck in this situation and who
bm aren't in a position to try this diff.  That is to manually install
bm the host route like this:
bm
bm # route add -host -inet6 :::::2 -interface gif0 -nostatic -llinfo
bm
bm Comments?

 I remember Dimitry Andric reported the same problem on -stable on 30
 Dec, and after he reverted rev.1.48.2.16 it worked fine again.  Do
 you have the symptom even on 6.2-RELEASE?  Since RELENG_6_2_0_RELEASE
 did not have the change, I thought there was no problem.

 I will try to reproduce it on my box anyway...

--
| Hiroki SATO




Re: strange behavior of ioapic on PDSME motherboard

2006-12-05 Thread Hiroki Sato
John Baldwin [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

jh On Sunday 03 December 2006 22:55, Hiroki Sato wrote:
jh  Hiroki Sato [EMAIL PROTECTED] wrote
jhin [EMAIL PROTECTED]:
jh   John, are there any big changes of ioapic support between RELENG_6
jh   and CURRENT?  I would like your comments to narrow down the cause.
jh   The 7.0-CURRENT November snapshot could probe the mpt (as a very slow
jh   device, though), and both em and mpt worked with acpi/ioapic enabled.
jh   I had a look at the changes in sys/i386/i386, but I am not sure
jh   if which is likely (or not)...
jh
jh There aren't any non-cosmetic changes in the apic code between 6.x and HEAD.

 Okay, thanks.  So these symptoms are at least not directly related to
 mpt(4) or the apic code.  I will continue to investigate the cause
 anyway.

--
| Hiroki SATO





panic in nfsd on 6.2-RC1

2006-12-04 Thread Hiroki Sato
Hi,

 One of my NFS servers, a heavily loaded box running 6.2-RC1, has been
 panicking repeatedly these days.  I am not sure precisely which
 upgrade the panics started after, but the box ran for almost one year
 (6.0R and 6.1R) with no problem at least.  A core file is available.

(from here)
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc069d890
stack pointer           = 0x28:0xed0ae920
frame pointer           = 0x28:0xed0ae928
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 653 (nfsd)
trap number             = 12
panic: page fault
Uptime: 46m22s
Dumping 1021 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1022MB (261423 pages) 1006 990 974 958 942 926 910 894 878 862 846 
830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 
510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 
190 174 158 142 126 110 94 78 62 46 30 14

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc067c512 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc067c7d8 in panic (fmt=0xc08d0c8e %s)
    at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc0892122 in trap_fatal (frame=0xed0ae8e0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:837
#4  0xc0891866 in trap (frame=
      {tf_fs = -992346104, tf_es = 40, tf_ds = 268107816, tf_edi = 72,
       tf_esi = 0, tf_ebp = -318052056, tf_isp = -318052084,
       tf_ebx = -993986688, tf_edx = -993986688, tf_ecx = 4, tf_eax = 4,
       tf_trapno = 12, tf_err = 0, tf_eip = -1066805104, tf_cs = 32,
       tf_eflags = 589831, tf_esp = 0, tf_ss = -1063278752})
    at /usr/src/sys/i386/i386/trap.c:270
#5  0xc088012a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#6  0xc069d890 in turnstile_broadcast (ts=0x0)
    at /usr/src/sys/kern/subr_turnstile.c:726
#7  0xc06739d7 in _mtx_unlock_sleep (m=0xc09fa760, opts=0, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:690
#8  0xc077e00b in nfs_rephead (siz=0, nd=0xc5023c00, err=72, mbp=0x4,
    bposp=0x4) at /usr/src/sys/nfsserver/nfs_srvsock.c:152
#9  0xc07779f3 in nfsrv_symlink (nfsd=0xc5023c00, slp=0xc4f8ae80,
    td=0xc4c0f780, mrq=0xed0aec98) at /usr/src/sys/nfsserver/nfs_serv.c:2844
#10 0xc07819b1 in nfssvc_nfsd (td=0x4)
    at /usr/src/sys/nfsserver/nfs_syscalls.c:474
#11 0xc0781194 in nfssvc (td=0xc4c0f780, uap=0xed0aed04)
    at /usr/src/sys/nfsserver/nfs_syscalls.c:181
#12 0xc0892437 in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 1, tf_esi = 0,
       tf_ebp = -1077941464, tf_isp = -318050972, tf_ebx = 12,
       tf_edx = 672449048, tf_ecx = 26, tf_eax = 155, tf_trapno = 12,
       tf_err = 2, tf_eip = 671863223, tf_cs = 51, tf_eflags = 662,
       tf_esp = -1077941492, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:983
#13 0xc088017f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#14 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)
(to here)
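
 A note on reading the trace: kgdb prints the trap frame registers as
 signed 32-bit decimals, so they do not look like the addresses in the
 panic message at first sight.  The following is a minimal sh sketch of
 the conversion.  The helper name to_hex32 is made up for illustration,
 and it assumes a shell whose arithmetic is wider than 32 bits.

```sh
# to_hex32 is a hypothetical helper, not part of kgdb or FreeBSD.
# It masks a signed decimal value to 32 bits and prints it as hex.
to_hex32() {
    printf '0x%08x\n' "$(( $1 & 0xffffffff ))"
}

to_hex32 -1066805104    # tf_eip from the trap frame in frame #4
to_hex32 -318052056     # tf_ebp from the trap frame in frame #4
```

 The two calls print 0xc069d890 and 0xed0ae928, which are the
 instruction pointer and frame pointer reported in the panic message.
 The instruction pointer is also the address shown for
 turnstile_broadcast in frame #6, where ts=0x0 matches the fault
 virtual address 0x0.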

--
| Hiroki SATO




Re: panic in nfsd on 6.2-RC1

2006-12-04 Thread Hiroki Sato
Kostik Belousov [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

ko What version of sys/nfsserver/nfs_serv.c do you use ? If it is older than
ko 1.156.2.7, please, update the system.

 Thanks, I updated it just now and will see how it works.
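
 For anyone who wants to do the same check, here is a rough sh sketch
 of the "older than 1.156.2.7" comparison.  The helper name rev_older
 and the revision 1.156.2.6 are made up for illustration, and comparing
 the dotted fields as numbers is only meaningful for two revisions on
 the same branch.

```sh
# rev_older is a hypothetical helper: it succeeds when CVS revision $1
# sorts strictly before revision $2, comparing each dotted field as a number.
rev_older() {
    [ "$1" != "$2" ] || return 1
    first=$(printf '%s\n%s\n' "$1" "$2" |
        sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n | head -n 1)
    [ "$first" = "$1" ]
}

# The revision in use can be read from the $FreeBSD$ line of the source:
#   grep '\$FreeBSD' /usr/src/sys/nfsserver/nfs_serv.c
# 1.156.2.6 below is an example value, not taken from this thread.
if rev_older 1.156.2.6 1.156.2.7; then
    echo "nfs_serv.c is older than 1.156.2.7, update needed"
fi
```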

--
| Hiroki SATO




strange behavior of ioapic on PDSME motherboard (was: LSI 53C1030/mpt(4) problem)

2006-12-03 Thread Hiroki Sato
Hiroki Sato [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

hr  Recently I bought an Intel Pentium D 945 (3.45GHz), a Supermicro PDSME
hr  (Intel E7320), and an LSI21320RB (PCI-X SCSI HBA using the LSI
hr  53C1030).  I installed 6.2-RC1 on an old PATA HDD attached to the
hr  motherboard, and it worked fine.  However, after I installed the
hr  21320RB and attached several SCSI HDDs, some strange problems occurred.

 It worked when I turned off ioapic and/or acpi.  When acpi was
 disabled, mpt seemed to work but em did not work due to an UP/DOWN
 storm (vmstat -i did not display an irq for em0 at that time).
 When ioapic was disabled, all devices worked with shared irqs.

 So this is probably an ioapic issue rather than an mpt one, and I
 guess it is specific to the PDSME.  Sorry for the false alarm.
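
 The message does not say how ioapic and acpi were turned off.  One
 common way on FreeBSD, assumed here, is a device hint in
 /boot/loader.conf.  The sketch below only prints the hint line; the
 helper name disable_hint is made up for illustration.

```sh
# disable_hint is a hypothetical helper that prints the loader hint
# for disabling the apic or acpi driver at boot time.
disable_hint() {
    case "$1" in
        apic|acpi) printf 'hint.%s.0.disabled="1"\n' "$1" ;;
        *) echo "usage: disable_hint apic|acpi" >&2; return 1 ;;
    esac
}

disable_hint apic
# To apply it (as root), append the line and reboot, then check the irqs:
#   disable_hint apic >> /boot/loader.conf
#   vmstat -i
```

 Printing the line instead of editing the file directly keeps the
 sketch harmless to run on any machine.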

 John, are there any big changes of ioapic support between RELENG_6
 and CURRENT?  I would like your comments to narrow down the cause.
 The 7.0-CURRENT November snapshot could probe the mpt (as a very slow
 device, though), and both em and mpt worked with acpi/ioapic enabled.
 I had a look at the changes in sys/i386/i386, but I am not sure
 which of them is a likely cause (or not)...

Scott Long [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

sc Hiroki Sato wrote:
sc   Any suggestions for what I should do about this problem?  I can send
sc   more detailed information from boot -v and/or dev.mpt.0.debug=5, but
sc   I am not sure which messages are important for diagnosis.
sc 
sc Just for comparison, could you go back to FreeBSD 6.0 and see if the
sc problems remain?

 There was no difference when I tried, but it does not seem to be an
 mpt problem, as I wrote above.  Thanks for the suggestion, anyway.

Matthew Jacob [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

ly   - The November 2006 7-CURRENT snapshot probes the two-HDD case, but
ly the HDDs are recognized as very slow devices (around 6MB/s), and
ly accessing them makes the box freeze, too.
ly
ly The 6MB/s thing I'm working on now. I have no clue about the other
ly issues at this time.

 I see.  BTW, I confirmed that mpt worked on the November snapshot,
 except that the data transfer rate was 6.6MB/s.  Is it worth trying
 the latest -CURRENT?

--
| Hiroki SATO




LSI 53C1030/mpt(4) problem

2006-12-02 Thread Hiroki Sato
Hi,

 Recently I bought an Intel Pentium D 945 (3.45GHz), a Supermicro PDSME
 (Intel E7320), and an LSI21320RB (PCI-X SCSI HBA using the LSI
 53C1030).  I installed 6.2-RC1 on an old PATA HDD attached to the
 motherboard, and it worked fine.  However, after I installed the
 21320RB and attached several SCSI HDDs, some strange problems occurred.

 First, the 21320RB was recognized by the mpt(4) driver.  When I tried
 it with no HDD it was recognized properly, so I turned off the box,
 connected an HDD to it, and rebooted.  Then mpt(4) recognized the HDD
 and it worked without problems.  I thought it was okay, and connected
 more HDDs to the SCSI HBA.  More specifically, the 21320RB has two
 channels, so I connected two hardware RAID boxes, which actually
 contain five HDDs each and are each seen as one large HDD, to each
 channel.

 When I rebooted the box after that, device probing at boot time
 stopped just before the "Waiting 5 seconds for SCSI devices to
 settle" message.  Nothing, including the keyboard, worked at that
 point, so I turned off the box and disconnected the RAID boxes.
 After several trials, I found that the 21320RB's behavior was
 somewhat strange:

 - with no HDD:

   Basically works fine, but once two or more HDDs have been
   recognized, it freezes during device probing (just before the
   "Waiting..." message) even if the HDDs are removed.  Resetting the
   card's configuration to the factory default via the BIOS setup
   seems to recover the state.

 - with one HDD:

   Works fine after it is recognized.

 - with two HDDs:

   Does not work if two HDDs are connected to each channel.  The BIOS
   message from the HBA is normal, but FreeBSD device probing keeps
   failing in one of the following two ways:

   a) Freeze just before the "Waiting..." message.
   b) Freeze after the "Waiting..." message.

   In b), mpt(4) seems to reset the buses and wait for the responses,
   but with boot -v I saw it freeze after displaying "unretryable
   error".

   I tried booting the box with no SCSI HDD, connecting the HDDs after
   the boot, and running "camcontrol rescan all".  It recognizes the
   connected HDDs successfully, and they can be accessed fine even if
   there is more than one.  However, simultaneous access causes a
   solid freeze again.

   Then I tried a RAID box which has one ID and several LUNs
   corresponding to the HDDs.  It was recognized as normal, multiple
   HDDs at boot time, and could be accessed.  Simultaneous access
   worked, too.

   After that, I tried daisy-chaining two RAID boxes and connected the
   two to one channel of the SCSI HBA.  These RAID boxes have ID=0 and
   ID=1.  FreeBSD froze after the "Waiting..." message this time.

 In short, I could make this configuration work fine only when a
 single RAID box (or SCSI HDD) is connected to the HBA, or when
 multiple HDDs that share the same ID but have different LUNs are
 connected.
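
 For reference, the "simultaneous access" case can be exercised with
 something like the sh sketch below.  The helper name parallel_read and
 the device names da0 and da1 are assumptions for illustration; the
 actual commands used in these tests are not recorded here.

```sh
# parallel_read is a hypothetical helper: it reads the start of every
# given device (or file) at the same time and fails if any read fails.
parallel_read() {
    pids=""
    for dev in "$@"; do
        dd if="$dev" of=/dev/null bs=64k count=1024 2>/dev/null &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Example on a FreeBSD box (device names are assumptions):
#   camcontrol rescan all
#   camcontrol devlist
#   parallel_read /dev/da0 /dev/da1
```

 On the failing configuration one would expect the box to freeze
 during the reads rather than the helper returning an error.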

 I investigated the following:

 - 6.1R sometimes probes the two-HDD case successfully, but accessing
   the HDDs makes the box freeze.

 - The November 2006 7-CURRENT snapshot probes the two-HDD case, but
   the HDDs are recognized as very slow devices (around 6MB/s), and
   accessing them makes the box freeze, too.

 - When the box freezes just before the "Waiting..." message, boot -v
   does not display any detailed messages there.  In the case of a
   freeze after the "Waiting..." message, several messages from mpt(4)
   are displayed.

 - No panic in either case.  In all cases, it silently freezes and
   does not respond to Ctrl-Alt-ESC.

 - When I use an Intel D865GBF (a motherboard with the Intel 865
   chipset) with the same HBA and the same RAID boxes, they work fine
   on 6.1-RC1.  There the HBA is connected to a 33MHz PCI bus, not
   PCI-X, so that may make some difference.

 Any suggestions for what I should do about this problem?  I can send
 more detailed information from boot -v and/or dev.mpt.0.debug=5, but
 I am not sure which messages are important for diagnosis.

--
| Hiroki SATO





Re: getopt_long and POSIXLY_CORRECT

2006-09-24 Thread Hiroki Sato
Mikhail Teterin [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

mi Could a committer with interest in -stable, please, see to it, that Andrey's
mi recent change to getopt_long makes it into 6.2-RELEASE?
mi
mi The change makes our implementation of getopt_long closer to GNULIB's and will
mi make it easier to avoid code-duplication in some ports.

 Approved.  Thanks.

--
| Hiroki SATO




Re: cvs commit: www/en/releases/6.1R todo.sgml

2006-03-06 Thread Hiroki Sato
Gleb Smirnoff [EMAIL PROTECTED] wrote
  in [EMAIL PROTECTED]:

gl Is it possible to place kern/87208 into the TODO list for 6.1-RELEASE?
gl The problem appeared to be a bad regression in 6.0-RELEASE
gl that hurt many users. The PR contains several test cases, a
gl description, and a patch for the problem.

 Thanks, added just now.  Will this description do?

--
| Hiroki SATO




tester needed: problems in 5.3R errata solved?

2005-04-20 Thread Hiroki Sato
Hi all,

 Before 5.4R is released, I would like to confirm which problems
 described in the 5.3R errata[*] are solved and which are not.
 If you had a problem on 5.3R and can confirm that it is solved (or
 still persists) on the 5.4-RC series, could you please let us know?
 Any comments are welcome.  Thanks in advance.

 [*] http://www.FreeBSD.org/releases/5.3R/errata.html

-- 
| Hiroki SATO

