from:"Jeremy Chadwick"

Re: [HEADSUP] Re: Is IPV6 option still necessary?

2019-10-10 Thread Jeremy Chadwick via freebsd-stable

On Wed, Oct 09, 2019 at 08:13:39PM -0700, Jeremy Chadwick wrote:
> > Now we can get back on the ipv6 option.
> > 
> > so if we want to proceed further in removing the option to build with or 
> > without
> > ipv6 for the ports side. Please speak up in reply to this email, if you are
> > building without ipv6, why are you doing so, what are the real benefit for 
> > it.
> > How bad it will impact you if we do remove that option?
> 
> Whenever I use ports over FreeBSD-provided packages (or to use ports to
> build my own packages), I often disable IPV6 support.  The lengthy
> response below should explain why.
> {brevity snip}

This was sent to the wrong mailing list; was intended for -ports.
Sorry for the noise.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator  PGP 0x2A389531 |
| Making life hard for others since 1977.|

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: [HEADSUP] Re: Is IPV6 option still necessary?

2019-10-10 Thread Jeremy Chadwick via freebsd-stable

> Now we can get back on the ipv6 option.
> 
> so if we want to proceed further in removing the option to build with or 
> without
> ipv6 for the ports side. Please speak up in reply to this email, if you are
> building without ipv6, why are you doing so, what are the real benefit for it.
> How bad it will impact you if we do remove that option?

Whenever I use ports over FreeBSD-provided packages (or to use ports to
build my own packages), I often disable IPV6 support.  The lengthy
response below should explain why.

In short: the IPV6 option is useful and important.  Please keep it.

In length: I think anyone operating in the Real World knows quite well
that IPv6 is still treated as a third-class citizen when it comes to
both general connectivity/reliability* and general use cases
code-wise**.  It's still very much in utero; or a toddler, if you will.

When you encounter IPv6 vs. IPv4 prioritisation issues, they are painful
and annoying.  No user or administrator is going to sit for hours
fiddling with it all to restore things to a working state when simply
removing IPv6 relieves the problem permanently.  Time and time again I
see companies advertising AAAA records and webservers listening on IPv6
yet IPv6 transit fails but their A/IPv4 endpoint works fine.  It's the
dual-stack nature that makes a lot of this worse than it should be.  (I
do think this subject should be re-visited once the world as a whole
starts to seriously decommission IPv4, though.  Yes I'm serious.)
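
A minimal sketch of how one might check for the failure mode described above (a host that publishes both A and AAAA records, but where only the IPv4 path actually connects). The function names and the default port 443 are assumptions made for illustration; only Python's standard socket module is used.

```python
import socket


def split_by_family(addrinfo):
    """Group getaddrinfo() results into unique IPv4 and IPv6 addresses."""
    result = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canon, sockaddr in addrinfo:
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue  # ignore any other address family
        if sockaddr[0] not in result[key]:
            result[key].append(sockaddr[0])
    return result


def can_connect(family, address, port, timeout=5.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect((address, port))
        return True
    except OSError:
        return False


def probe(host, port=443):
    """Resolve host, then try a TCP connect to every address per family."""
    info = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addrs = split_by_family(info)
    families = (("ipv4", socket.AF_INET), ("ipv6", socket.AF_INET6))
    return {
        key: {addr: can_connect(family, addr, port) for addr in addrs[key]}
        for key, family in families
    }
```

Calling probe("www.example.org") returns a per-family mapping of address to True/False. A host whose "ipv6" entries are all False while its "ipv4" entries are True matches the situation described above. A failed connect can also mean the local machine has no IPv6 route at all, so the result says nothing about where along the path the failure lies.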

I've worked for several companies that are IPv4-only, where the belief
(and one I share) is that IPv6-only clients have some 6-to-4-ish
gateway/NAT somewhere upstream, otherwise they wouldn't be able to reach
most of the Internet.  IPv4 NAT still works for the majority of use
cases as of 2019.

Furthermore, faux-political statements like "IPv6 is more widely used
than 2012" should be ignored and facts reiterated: IPv6 adoption is
around 25% as of mid-2019.  And it's taken over 10 years to reach that.

IPv4 is also well-understood, and not, as Dave Horsfall accurately
described, "a horse designed by a committee"; people are still trying to
wrap their head around IPv6 NDP/RA, SLAAC, and a myriad of other things
(dare I mention syntax?).  It's this which explains the sluggish
adoption rate.

And yes, I am well-aware of how important IPv6 is in other regions,
particularly Asia.  I am not belittling that need at all.  But not
everyone globally has the same needs.

What should really be asked for is the opposite: for the FreeBSD ports
folks to justify its removal.

How is this hurting you on a daily basis?  Is there a large percentage
of Mk/ framework bits causing you pain?  Are the bulk of per-port
patches inducing maintainer grief?  At what scale is this impacting you?
In 7 years (since the OP picked 2012), how much time has been spent by
maintainers ensuring IPV6=true works for their port(s)?  Are you truly
OK throwing away the integration work done by many, many people (not
just Project members!) over the past N years (see: per-port patches),
and forcing people who still need the option to make their own ports
tree to retain it?

Here's some harsh advice for the FreeBSD Project: quit changing shit for
sake of change, often masked by lies like "XXX is stagnant/old" or
similarly fallacious and loaded statements.  The project (both src and
ports, but especially ports) has lost many very good people in the past
10+ years (and I'm not talking about me) *because* of that change for
sake of change mindset -- the same mindset driving this request!  It's
changes like this that drive people away from FreeBSD.  Really.  It's
the same mindset that provoked people to stop using Linux distros due
to systemd integration.

I will not be replying to this thread past this point.  I have said all
that I care to say / spent enough time on it.  Just please stop hurting
administrators and end users with proposals/actions like this.

* - Real-world IPv6 failures impacting end users tend to be higher
than IPv4; this is anecdotal on my part, but I have a myriad of peers
who have had to disable IPv6 for similar reasons.  The IPv4 fallback in
software (both userland apps and network stacks) does not always work
"correctly".  Just go see how often IPv6 failures/issues are reported on
both NANOG and the outages@ mailing list.  And yes I am quite aware that
a good portion of the Internet backbone at this point is IPv6 (that's
nice, and not what we're talking about here).

** - I still continue to see open-source software committing major fixes
to AF_INET6 related code bits.  Major pieces of software include curl,
wget, Busybox, DNS servers (pick one!), and ntp... just for starters.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator  PGP 0x2A389531 |
| Making life ha

Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto

2019-09-20 Thread Jeremy Chadwick via freebsd-stable

> I've committed a fix to head and will MFC it in a few days.  Thanks
> for tracking this down!

Did HEAD r351557 get backported/MFC'd into stable/11 and stable/12?  Can
test stable/11 if needed.

Thanks!

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator  PGP 0x2A389531 |
| Making life hard for others since 1977.|


Re: Buildworld times (was Re: svn commit: r350256 - in stable/12: . contrib/compiler-rt/lib/sanitizer_common contrib/libunwind/src contrib/llvm/lib/DebugInfo/DWARF contrib/llvm/lib/MC contrib/llvm/lib

2019-07-29 Thread Jeremy Chadwick via freebsd-stable

On Mon, Jul 29, 2019 at 01:44:01PM -0400, mike tancsa wrote:
> On 7/26/2019 10:38 PM, Jeremy Chadwick via freebsd-stable wrote:
> > (Please retain CCs, I am not subscribed to the list)
> >
> > Below is hard evidence of 3 things on stable/11 (not 12) after r350259:
> >
> > 1. r350259 adds *substantial* time to buildworld.
> 
> Are you sure this is not the same as the issue in RELENG12 ? ie. the new
> version of clang is built as part of world since it differs from whats
> installed.  I had a RELENG11 box sitting around from July 4th

By "on stable/11 (not 12)" I meant: I do not run stable/12, thus I
cannot speak on its behalf.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator  PGP 0x2A389531 |
| Making life hard for others since 1977.|


Re: Buildworld times (was Re: svn commit: r350256 - in stable/12: . contrib/compiler-rt/lib/sanitizer_common contrib/libunwind/src contrib/llvm/lib/DebugInfo/DWARF contrib/llvm/lib/MC contrib/llvm/lib

2019-07-26 Thread Jeremy Chadwick via freebsd-stable

(Please retain CCs, I am not subscribed to the list)

Below is hard evidence of 3 things on stable/11 (not 12) after r350259:

1. r350259 adds *substantial* time to buildworld.

2. WITHOUT_CLANG_EXTRAS+WITHOUT_CLANG_FULL+WITHOUT_LLDB can help improve
the situation after r350259, but it is still nowhere near as fast as
pre-r350259.

3. Kernel build times are fine; issue is with world.

TL;DR for lazy folks:

stable/11 r350330 world + minimal clang = 1:29:34
stable/11 r350330 world + full clang    = 1:46:31
stable/11 r350252 world + minimal clang =   56:52
stable/11 r350252 world + full clang    = 1:14:30
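
To put those four wall-clock times in relative terms, here is a small sketch that converts them to seconds and computes the increase from r350252 to r350330. The helper names are made up for illustration; the times are the ones listed above.

```python
def to_seconds(hms):
    """Convert 'H:MM:SS' or 'MM:SS' into a number of seconds."""
    seconds = 0
    for part in hms.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def slowdown_percent(before, after):
    """Percentage increase in build time going from 'before' to 'after'."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100.0


# (r350252 time, r350330 time) for each clang configuration
timings = {
    "minimal clang": ("56:52", "1:29:34"),
    "full clang": ("1:14:30", "1:46:31"),
}

for label, (old, new) in timings.items():
    print(f"{label}: {slowdown_percent(old, new):.1f}% longer")
```

With the numbers above this works out to roughly 57% longer for the minimal clang build and roughly 43% longer for the full clang build.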

I cannot even begin to tell you how big of an impact this has on my
low-end dual-core VPS box (world takes hours upon hours).

We've been down this road before, many many times, since the
introduction of clang/LLVM.  Here are just a few threads that went
nowhere.  I couldn't find the more useful one that had some concrete
numbers in it, dating back to pre-2016 (maybe sometime in 2014 or 2015?):

https://lists.freebsd.org/pipermail/freebsd-current/2017-January/thread.html#64431
https://lists.freebsd.org/pipermail/freebsd-stable/2017-January/thread.html#86646
https://lists.freebsd.org/pipermail/freebsd-questions/2016-November/thread.html#274684

Does anyone have a good/recent write-up on how to switch to gcc?  :-)


System
======
* Intel Core 2 Quad Q9550 @ 2.83GHz
* 8GB ECC RAM
* Samsung SSD 840 EVO 250GB filesystem (UFS2 + SU (not SUJ) + TRIM) + 32GB swap
* Running stable/11 r349226
* Misc notes
  - r350330 happened to be what was "master" at the time of my test
  - r350252 was the commit on stable/11 immediately before r350259
  - Switching to r350252 accomplished via: cd /usr/src && svnlite up -r350252
  - System uses kern.maxvnodes=856944, last tuned 2018/06/07


Test #1, building r350330 minimal clang
=======================================
# cat /etc/src.conf
WITHOUT_ATM=true
WITHOUT_BLUETOOTH=true
WITHOUT_DEBUG_FILES=true
WITHOUT_FLOPPY=true
WITHOUT_FREEBSD_UPDATE=true
WITHOUT_IPFILTER=true
WITHOUT_IPX=true
WITHOUT_LIB32=true
WITHOUT_NDIS=true
WITHOUT_NETGRAPH=true
WITHOUT_PPP=true
WITHOUT_SENDMAIL=true
WITHOUT_TESTS=true
WITHOUT_WIRELESS=true
WITH_OPENSSH_NONE_CIPHER=true
WITHOUT_CLANG_EXTRAS=true
WITHOUT_CLANG_FULL=true
WITHOUT_LLDB=true
WITHOUT_LLVM_TARGET_AARCH64=true
WITHOUT_LLVM_TARGET_ARM=true
WITHOUT_LLVM_TARGET_MIPS=true
WITHOUT_LLVM_TARGET_POWERPC=true
WITHOUT_LLVM_TARGET_SPARC=true
WITHOUT_REPRODUCIBLE_BUILD=true
# cat /etc/make.conf
KERNCONF=X7SBA_RELENG_11_amd64
CPUTYPE?=core2
SVN_UPDATE=yes
STRIP=
CFLAGS+=-fno-omit-frame-pointer

Result:
# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
19906.874u 1280.928s 1:29:33.51 394.3%  57966+778k 23504+14200io 13867pf+0w
# time make -j4 buildkernel
1592.460u 196.047s 7:36.61 391.6%   48704+614k 6627+18158io 7361pf+0w


Test #2, building r350330 full clang
====================================

"full clang" means same as Test #1 but with these 3 src.conf lines
commented out, i.e. CLANG_EXTRAS, CLANG_FULL, and LLDB are ENABLED:

WITHOUT_CLANG_EXTRAS=true
WITHOUT_CLANG_FULL=true
WITHOUT_LLDB=true

Result:
# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
23779.674u 1463.156s 1:46:30.75 394.9%  57621+783k 20093+15423io 7283pf+0w
# time make -j4 buildkernel
1594.079u 194.345s 7:36.48 391.7%   48707+614k 5301+18013io 5342pf+0w


Test #3, building r350252 minimal clang
=======================================
Same configs as Test #1

Result:
# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
12582.693u 882.543s 56:52.35 394.6% 62698+760k 21432+9694io 6923pf+0w
# time make -j4 buildkernel
1649.559u 184.934s 7:48.01 391.9%   57053+622k 7566+18291io 5402pf+0w


Test #4, building r350252 full clang
====================================

Same configs as Test #2

Result:
# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
16600.975u 1068.754s 1:14:29.53 395.3%  63271+774k 8683+10876io 4707pf+0w
# time make -j4 buildkernel
1650.654u 183.966s 7:47.47 392.4%   57117+623k 2829+17951io 1926pf+0w
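
The result lines in these tests look like the default output of the tcsh built-in time command (user seconds, system seconds, elapsed time, CPU percentage, then memory and I/O counters). As a rough sketch under that assumption, the following parses the first four fields and recomputes the CPU percentage as (user + system) / elapsed, which is a quick sanity check that all four cores were kept busy with -j4. The function and field names are illustrative only.

```python
import re

# user seconds, system seconds, elapsed ([H:]MM:SS.ss), CPU percentage
TIME_RE = re.compile(
    r"^(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+"
    r"(?P<elapsed>[\d:.]+)\s+(?P<cpu>[\d.]+)%"
)


def elapsed_seconds(text):
    """Convert '1:29:33.51' or '56:52.35' into seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def parse_time_line(line):
    """Parse one time line and recompute the CPU percentage."""
    match = TIME_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    user = float(match["user"])
    system = float(match["sys"])
    elapsed = elapsed_seconds(match["elapsed"])
    return {
        "user": user,
        "sys": system,
        "elapsed": elapsed,
        "reported_cpu": float(match["cpu"]),
        "computed_cpu": (user + system) / elapsed * 100.0,
    }


line = "19906.874u 1280.928s 1:29:33.51 394.3%  57966+778k 23504+14200io 13867pf+0w"
result = parse_time_line(line)
print(f"elapsed {result['elapsed']:.2f}s, CPU {result['computed_cpu']:.1f}%")
```

For the Test #1 buildworld line, the recomputed figure agrees with the reported 394.3%, i.e. close to the 400% ceiling of a four-core machine, which suggests the extra time is CPU work rather than waiting on disk.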

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator  PGP 0x2A389531 |
| Making life hard for others since 1977.|


Re: /dev/crypto not being used in 12-STABLE

2018-12-07 Thread Jeremy Chadwick

On Fri, Dec 07, 2018 at 06:38:04PM -0500, Jung-uk Kim wrote:
> On 18. 12. 6., Jeremy Chadwick wrote:
> > I'm not subscribed to -stable.
> > 
> > This is in response to jkim@'s messages here:
> > 
> > https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html
> > https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html
> > 
> > Based on what I can tell, OpenSSL 1.1.1 or thereabouts removed the
> > cryptodev OpenSSL engine, which was a tie-in to BSD's cryptodev(4),
> > which is accessed via /dev/crypto and related crypto(4) ioctls.
> > 
> > Instead, they offered a replacement engine called devcrypto (what an
> > awful name), with the primary focus being against something from Linux
> > called cryptodev-linux, then was made to work on FreeBSD 8.4.  This code
> > was as of June 2017; 8.4 was EOL'd August 2015.  Interesting.
> > 
> > https://github.com/openssl/openssl/commit/4f79aff is not "add support
> > for BSD" at all.  It's "tweak further stuff for BSD", probably to get it
> > to work on newer FreeBSD; they seem to care about crypto/cryptodev.h
> > details.  I asked myself: why do they care about that if they're doing
> > it all themselves?  Looking at the code sheds light on that.  The actual
> > devcrypto engine commits that added BSD support are here:
> > 
> > https://github.com/openssl/openssl/pull/3744
> > https://github.com/openssl/openssl/pull/3744/files
> > 
> > The commits indicate that the devcrypto is enabled by default on
> > FreeBSD.  But we can tell from Herbert's post and jkim@'s patch that's
> > not true at all, i.e. FreeBSD disables it.  Why?  And is that a good
> > default?
> 
> Why do you think it is enabled by default?
> 
> https://github.com/openssl/openssl/blob/619eb33/Configure#L428

Because of this commit to OpenSSL's CHANGES file, which is part of what
I linked above; last sentence:

https://github.com/openssl/openssl/pull/3744/files#diff-e4eb329834da3d36278b1b7d943b3bc9

  *) Add devcrypto engine.  This has been implemented against cryptodev-linux,
     then adjusted to work on FreeBSD 8.4 as well.
     Enable by configuring with 'enable-devcryptoeng'.  This is done by default
     on BSD implementations, as cryptodev.h is assumed to exist on all of them.
     [Richard Levitte]

Is this message incorrect/false?  While I can read the perl code that is
the Configure script just fine, the CHANGES entry makes me think there
may be "other pieces" that affect the value of the key in that hash
(e.g. some script that uses uname detection and calls Configure with
that argument).  Are there?

> Note crypto(4) was imported from OpenBSD.  Since OpenBSD 4.9, it was
> disabled by default.
> 
> https://www.openbsd.org/plus49.html
> 
> Then, they killed it in 5.7.
> 
> https://www.openbsd.org/plus57.html
> 
> o Unlinked the crypto(4) pseudo device (disabled by default for about 4
> years).
> 
> Now FreeBSD is the only major BSD with /dev/crypto.  That's why new
> engine was not thoroughly tested.

Thanks for the information.

So this implies there is a desire to get rid of cryptodev(4) (which is
the /dev/crypto endpoint), at least on OpenBSD.

Apologies if this is off-topic, but: is "device cryptodev" something
that should be removed from one's kernel config (due to what sounds like
desired deprecation), while keeping "device crypto" (to ensure userland
applications that use libcrypto/crypto(4) functions can still get at
crypto(9))?

> > Here's why I ask:
> >
> > The new devcrypto engine most definitely utilises /dev/crypto (thus
> > cryptodev(4) and crypto(4)).  cipher_init(), prepare_cipher_methods(),
> > digest_init(), and prepare_digest_methods() all utilise that interface:
> > 
> > https://github.com/openssl/openssl/pull/3744/files#diff-027f92eb0a10c0986aec873d9fd1ab66
> > 
> > So while OpenSSL now uses more of its own native C and assembly code
> > (e.g. for AES-NI support), and that's certainly faster than all the
> > overhead that cryptodev(4) brings with it (see jhb@'s post), I wonder:
> > 
> > 1. What happens to people using crypto hardware accelerators, ex.
> > hifn(4), padlock(4), ubsec(4), and safe(4)?  How exactly would OpenSSL
> > utilise these H/W accelerators if the devcrypto engine is disabled?
> 
> padlock has a dynamic engine, i.e., /usr/lib/engines/padlock.so.  I
> believe glxsb, hifn(4), safe(4), and ubsec(4) users are very rare
> nowadays.  If we have significant number of users and they show
> reasonable performance, then I will reconsider my decision.

Consider me surpri

Re: /dev/crypto not being used in 12-STABLE

2018-12-06 Thread Jeremy Chadwick

I'm not subscribed to -stable.

This is in response to jkim@'s messages here:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html

Based on what I can tell, OpenSSL 1.1.1 or thereabouts removed the
cryptodev OpenSSL engine, which was a tie-in to BSD's cryptodev(4),
which is accessed via /dev/crypto and related crypto(4) ioctls.

Instead, they offered a replacement engine called devcrypto (what an
awful name), with the primary focus being against something from Linux
called cryptodev-linux, then was made to work on FreeBSD 8.4.  This code
was as of June 2017; 8.4 was EOL'd August 2015.  Interesting.

https://github.com/openssl/openssl/commit/4f79aff is not "add support
for BSD" at all.  It's "tweak further stuff for BSD", probably to get it
to work on newer FreeBSD; they seem to care about crypto/cryptodev.h
details.  I asked myself: why do they care about that if they're doing
it all themselves?  Looking at the code sheds light on that.  The actual
devcrypto engine commits that added BSD support are here:

https://github.com/openssl/openssl/pull/3744
https://github.com/openssl/openssl/pull/3744/files

The commits indicate that the devcrypto is enabled by default on
FreeBSD.  But we can tell from Herbert's post and jkim@'s patch that's
not true at all, i.e. FreeBSD disables it.  Why?  And is that a good
default?  Here's why I ask:

The new devcrypto engine most definitely utilises /dev/crypto (thus
cryptodev(4) and crypto(4)).  cipher_init(), prepare_cipher_methods(),
digest_init(), and prepare_digest_methods() all utilise that interface:

https://github.com/openssl/openssl/pull/3744/files#diff-027f92eb0a10c0986aec873d9fd1ab66

So while OpenSSL now uses more of its own native C and assembly code
(e.g. for AES-NI support), and that's certainly faster than all the
overhead that cryptodev(4) brings with it (see jhb@'s post), I wonder:

1. What happens to people using crypto hardware accelerators, ex.
hifn(4), padlock(4), ubsec(4), and safe(4)?  How exactly would OpenSSL
utilise these H/W accelerators if the devcrypto engine is disabled?

2. If the devcrypto engine is *enabled*, and people have aesni(4)
loaded alongside cryptodev(4), which gets priority: OpenSSL's native
AES-NI code or cryptodev(4)/aesni(4)?

Likewise: if the devcrypto engine is to remain disabled as a default:
this needs to be made crystal clear in Release Notes, so that folks
using H/W accelerators know they'll no longer benefit from those cards
unless they use a patch (third-party so/module won't work, AFAICT, as
OpenSSL's dynamic engine loading is unavailable per openssl engine -t).
Might I suggest making devcrypto something that can be enabled via
src.conf, ex. WITH_OPENSSL_ENGINE_DEVCRYPTO=true?
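
For anyone wanting to check their own system, here is a hedged sketch that parses the output of openssl engine -t into a per-engine availability map. The sample text is illustrative, written from memory of the OpenSSL 1.1.x output layout (an "(id) name" line followed by an "[ available ]" or "[ unavailable ]" line); it is not captured from a real FreeBSD box, and the function name is made up.

```python
import re

ENGINE_RE = re.compile(r"^\((?P<id>[^)]+)\)\s*(?P<name>.*)$")
STATUS_RE = re.compile(r"^\s*\[\s*(?P<status>available|unavailable)\s*\]\s*$")


def parse_engine_list(text):
    """Map engine id -> {'name': ..., 'available': True/False/None}."""
    engines = {}
    current = None
    for line in text.splitlines():
        header = ENGINE_RE.match(line)
        if header:
            current = header["id"]
            engines[current] = {"name": header["name"], "available": None}
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            engines[current]["available"] = status["status"] == "available"
    return engines


# Illustrative sample only (assumed layout, not real captured output).
SAMPLE = """\
(dynamic) Dynamic engine loading support
     [ unavailable ]
(devcrypto) /dev/crypto engine
     [ available ]
"""

for engine_id, info in parse_engine_list(SAMPLE).items():
    print(engine_id, "->", info["available"])
```

On a live system the text could come from something like subprocess.run(["openssl", "engine", "-t"], capture_output=True, text=True).stdout. An engine that is not compiled in would simply be absent from the map, which is a different case from "listed but unavailable".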

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator  PGP 0x2A389531 |
| Making life hard for others since 1977.|


Re: lightly loaded system eats swap space

2018-06-19 Thread Jeremy Chadwick

(I am not subscribed to -stable, so please CC me, though I doubt I can
help in any way/shape/form past this Email)

Not the first time this has come up -- and every time it has, all that's
heard is crickets in the threads.  Recent proof:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-April/088727.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-April/088728.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-June/089094.html

I sent private mail to Peter Jeremy about his issue.  I will not
disclose that Email here.  However, I will disclose the commits I
included in said Email that have touched ZFS ARC-related code:

http://www.freshbsd.org/commit/freebsd/r332785
http://www.freshbsd.org/commit/freebsd/r332552
http://www.freshbsd.org/commit/freebsd/r332540 (may help give insights)
http://www.freshbsd.org/commit/freebsd/r330061
http://www.freshbsd.org/commit/freebsd/r328235
http://www.freshbsd.org/commit/freebsd/r327491
http://www.freshbsd.org/commit/freebsd/r326619
http://www.freshbsd.org/commit/freebsd/r326427 (quota-related, maybe irrelevant)
http://www.freshbsd.org/commit/freebsd/r323667

In short (and nebulous as hell; sorry, I cannot be more specific given
the nature of the problem): there have been changes to ZFS's memory
allocation/releasing decision-making scheme compared to ZFS on "older"
FreeBSD (i.e. earlier 11.x, and definitely 10.x and 9.x).

Recommendations like "limit your ARC" are nothing new in FreeBSD, but
are still ridiculous kludges: tech-lists' system clearly has 105GB MRU
(MRU = most recently used) in ARC, meaning there is memory that can be
released back to the rest of the OS for general use (re: memory
contention/pressure situation), but the OS is choosing to use swap
instead, eventually exhausting it.  That logic sounds broken, IMO.  (And
yes, I did notice the size of the bhyve process.)

ZFS-related kernel folks need to be involved in this conversation.  For
whatever reason, in the past several years, related committers are no
longer participating in these types of discussions.  The opposite was
true back in the 7.x to 9.x days.  The answers have to come from them.
I don't know, today, a) how they prefer these problems get reported to
them, or b) what exact information they want that can help narrow it
down (tech-lists' provided data is, IMO, good and par for the course).

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: kern.maxswzone causing serious problems

2018-03-29 Thread Jeremy Chadwick

I am not subscribed to -stable, so please keep me CC'd.

I mailed -stable about this problem, or a variation of it, earlier this
month:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088467.html

What isn't publicly visible is the list of individuals I CC'd on that
mail who had touched this code in recent days: k...@freebsd.org,
d...@freebsd.org, pluk...@freebsd.org, ead...@freebsd.org

I received no response from them on this matter.  At least two, however,
have been extremely busy commit-wise, so I imagine folks are just
swamped right now + have higher priorities.

I did not read or review your {naiveanalysis} section or your patch, as
tinkering with VM design/internals is *way* outside my comfort zone.

I will say that printing the sizes in a unit other than pages would be
generally helpful; I did try to figure out what value to use for
kern.maxswzone as a workaround by digging through kernel code but gave
up, as I wasn't able to truly determine what "pages" actually
represented (size-wise) in this specific context.

I hope someone with src commit bit will comment, as code slush for
11.2-RELEASE begins on April 20th:

https://www.freebsd.org/releases/11.2R/schedule.html

Else a separate PR can be opened if requested.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Stability of 11.1S

2018-03-20 Thread Jeremy Chadwick

(Please keep me CC'd as I am not subscribed to -stable)

I haven't seen any issues, but that means very little.  Details:

Two boxes -- one bare metal, one VPS (QEMU):

$ uname -a
FreeBSD XXX 11.1-STABLE FreeBSD 11.1-STABLE #0 r330529: Tue Mar  6 11:36:04 PST 2018 root@XXX:/usr/obj/usr/src/sys/X7SBA_RELENG_11_amd64  amd64
$ uptime
10:33a.m.  up 13 days, 18:10, 2 users, load averages: 0.15, 0.19, 0.16

$ uname -a
FreeBSD  11.1-STABLE FreeBSD 11.1-STABLE #0 r330753: Sat Mar 10 21:34:20 PST 2018 root@:/usr/obj/usr/src/sys/_RELENG_11_amd64  amd64
$ uptime
10:33a.m.  up 9 days, 10:46, 1 user, load averages: 0.31, 0.35, 0.31

Systems were updated recently because I wanted to test Meltdown/Spectre
mitigation (more on that below).  Prior to that, bare metal was running
9.x with 200+ day uptimes, VPS was running 10.x with 80-90 day uptimes
(VPS providers' HV crashed, i.e. not FreeBSD issues).

Since load averages on FreeBSD 10.x onward cannot be trusted[1][2], I
have to explain the general system specs and loads:

Bare metal box is an Intel Core 2 Quad Q9550, 8GB RAM, doing very little
other than running Apache + lots of cron jobs for systems stuff + ZFS
with several disks (but not OS disk; that's a dedicated SSD w/ UFS + SU
(not SUJ)).  The cron jobs tend to stress the network and disk I/O a bit;
ZFS gets used every day, but only "heavily" during LAN file copies
to/from it (Samba is involved), and during nightly backups with rsync.

VPS box is some form of QEMU-based Intel Haswell CPU, 1GB RAM, doing
general things like Apache + postfix + SpamAssassin + some other
daemons, and a lot of Perl.  Swap is used heavily on this machine.
Disks are all vtblk, and I use multiple to get capacity for the needed
space for /usr/src and /usr/obj.  Everything is UFS + SU (not SUJ).

Things off the top of my head that might be relevant to you:

1. r329462 added Meltdown/Spectre mitigation[3][4].

Bare metal box has the below in /boot/loader.conf, since this is a
machine that does not need either given its environment:

# Disable PTI (Meltdown mitigation) and IBRS (Spectre mitigation); these
# are not relevant on this bare-metal system given its environment and
# use case.  Details of these tunables is here:
# https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088526.html
#
vm.pmap.pti="0"
hw.ibrs_disable="1"

VPS box has no tunings of this sort, and ends up with the below, because
the hosting provider has not done BIOS + QEMU updates to add IBRS
support (they're very aware of it + have attempted it twice but
apparently it didn't go well):

vm.pmap.pti: 1
hw.ibrs_disable: 1
hw.ibrs_active: 0

2. If your CPU is an AMD Ryzen, there is a VERY long discussion on
-stable about problems with Ryzen manifesting itself in a very
uncomfortable way, leading to system lock-ups[5].  There are unofficial
patches you can try.  I would recommend chiming in there and not here,
if relevant to your systems.

And yes, the massive number of MFCs that eadler@ is doing make tracking
down exact things more tedious than normal, especially when you have
sweeping commits like this one[6][7] (which, AFAIK, was acting as a
major blocker for several other MFCs and causing general merge
problems).

However, I commend his efforts; it's a massive undertaking (I would say
full-time job).  We stable users must accept that we are running
stable/11 for a reason -- not only to get fixes faster, but to act as a
form of "guinea pig" for those who don't want the risks of HEAD/CURRENT.
The more people using stable/11, the better the overall feedback devs
get on bugs/issues before they make it into the next -RELEASE.  This is exactly
why, for those of you who have known me over the years, I actually
"track" or "follow" commits as they come across.  I do this by using the
FreshBSD site[8] alongside manual review of svnlite update output.  I
generally know what files/bits are relevant to my interests.

Hope this gives you some things to think about.  Good luck!

[1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541#c8
[2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541#c22
[3]: https://lists.freebsd.org/pipermail/freebsd-stable/2018-February/088396.html
[4]: https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088526.html
[5]: https://lists.freebsd.org/pipermail/freebsd-stable/2018-January/thread.html#88174
[6]: http://www.freshbsd.org/commit/freebsd/r330897
[7]: https://svnweb.freebsd.org/base?view=revision&revision=330897
[8]: http://www.freshbsd.org/?branch=RELENG_11&project=freebsd

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


total configured swap pages exceeds maximum recommended amount

2018-03-02 Thread Jeremy Chadwick

I am not subscribed to -stable, so please keep me CC'd.  I am CC'ing
folks who have touched this code or dealt with it recently or in the
past.

Something has changed regarding how FreeBSD determines when to emit this
message.  I do not know if this is a regression.  The message below
comes from a stable/11 r330260 amd64 box w/ 8GB RAM and 32GB swap during
boot:

warning: total configured swap (8358563 pages) exceeds maximum recommended amount (8141112 pages).
warning: increase kern.maxswzone or reduce amount of swap.
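
To make the page counts in that warning easier to read, here is a small sketch converting them to bytes. It assumes the "pages" are ordinary 4096-byte amd64 pages; that is an assumption, not something confirmed from the kernel source.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base page size on amd64


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


configured = 8358563   # "total configured swap" from the warning
recommended = 8141112  # "maximum recommended amount" from the warning
excess = configured - recommended

print(f"configured : {pages_to_gib(configured):.2f} GiB")
print(f"recommended: {pages_to_gib(recommended):.2f} GiB")
print(f"excess     : {excess} pages = {excess * PAGE_SIZE / 2**20:.1f} MiB")
```

Under that assumption the configured swap is about 31.89 GiB against a recommended maximum of about 31.06 GiB, so the box is over the threshold by 217451 pages, or roughly 849 MiB.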

In stable/9, the message could be squelched via kern.maxswzone="0" in
loader.conf.  Confirmation is here (see Dag-Erling's responses):
https://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069301.html

In stable/11, this no longer appears to work (the default value is 0).

The reason this box has 32GB swap (4x more than existing RAM) has to do
with planning ahead.  The system can support up to 32GB RAM, but does
not have all the DIMM slots populated at this time.  Swap on this
machine is a physical partition on its main disk, thus "shrinking swap"
is not possible without a full format/reinstall.

This code has been touched/tweaked semi-recently in PR 221356:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221356

Code references:

stable/9:  https://svnweb.freebsd.org/base/stable/9/sys/vm/swap_pager.c?annotate=284100#l2132
stable/10: https://svnweb.freebsd.org/base/stable/10/sys/vm/swap_pager.c?annotate=320557#l2156
stable/11: https://svnweb.freebsd.org/base/stable/11/sys/vm/swap_pager.c?annotate=329591#l2126

My questions: how does one squelch this warning message on such systems
running stable/11?  If it involves setting the tunable to a more useful
value, how does one reliably calculate that value?

Thank you.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


stable/11 r329462 - Meltdown/Spectre MFC questions

2018-02-17 Thread Jeremy Chadwick

Reference: https://svnweb.freebsd.org/base?view=revision&revision=329462

Do the following new loader tunables and sysctls have documentation
anywhere?  I ask because I wish to know how to turn all of this off (yes
you heard me correctly), as not all systems necessarily require
mitigation of these flaws.

Best I can tell from skimming source:

vm.pmap.pti
  - Description: Page Table Isolation enabled
  - Loader tunable, visible in sysctl (read-only)
  - Integer
  - Default value: depends on CPU model and capabilities, see
function pti_get_default(); looks like AMD = 0, any CPU with
RDCL_NO capability enabled = 0, else 1

hw.ibrs_active
  - Description: Indirect Branch Restricted Speculation active
  - sysctl (read-only)
  - Integer
  - Real-time indicator of whether IBRS is currently on or off

hw.ibrs_disable 
  - Description: Disable Indirect Branch Restricted Speculation
  - Loader tunable and sysctl tunable (read-write)
  - Integer
  - Default value: unsure.  Variable declaration has 1 but
SYSCTL_PROC() macro has 0.
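
A minimal sketch of what "turning all of this off" might look like, based only on the descriptions listed above. The helper name mitigation_tunables is hypothetical, and the chosen values (vm.pmap.pti=0, hw.ibrs_disable=1) are an assumption drawn from the tunable descriptions, not confirmed documentation. It only prints candidate /boot/loader.conf lines; it changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper: emit /boot/loader.conf lines for the two tunables
# described above.  $1 = desired vm.pmap.pti value,
# $2 = desired hw.ibrs_disable value.
mitigation_tunables() {
    printf 'vm.pmap.pti="%s"\n' "$1"
    printf 'hw.ibrs_disable="%s"\n' "$2"
}

# Assumption: pti=0 turns Page Table Isolation off and ibrs_disable=1
# turns IBRS off.  Review the output before adding it to loader.conf.
mitigation_tunables 0 1
```

After a reboot, the resulting state could be checked with "sysctl vm.pmap.pti hw.ibrs_active hw.ibrs_disable", since all three names are visible via sysctl according to the list above.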

Thank you.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: svn commit: r296462 - in stable/9: crypto/openssl/crypto/bio crypto/openssl/crypto/bn crypto/openssl/doc/apps crypto/openssl/ssl secure/usr.bin/openssl/man

2016-03-09 Thread Jeremy Chadwick

[...] tried to send mail to myself locally, as postfix's smtp(8) links to
libcrypt/libssl/libcrypto.  Bzzt, nope:

pid 5046 (smtp), uid 125: exited on signal 11

Mar  9 04:49:38 icarus postfix/master[802]: daemon started -- version 3.1.0, configuration /usr/local/etc/postfix
Mar  9 04:54:38 icarus postfix/pickup[5043]: 1835D1AF150: uid=1000 from=
Mar  9 04:54:38 icarus postfix/cleanup[5044]: 1835D1AF150: message-id=<20160309125438.ga5...@icarus.home.lan>
Mar  9 04:54:38 icarus postfix/qmgr[804]: 1835D1AF150: from=, size=631, nrcpt=1 (queue active)
Mar  9 04:54:38 icarus postfix/qmgr[804]: warning: private/smtp socket: malformed response
Mar  9 04:54:38 icarus postfix/qmgr[804]: warning: transport smtp failure -- see a previous warning/fatal/panic logfile record for the problem description
Mar  9 04:54:38 icarus postfix/master[802]: warning: process /usr/local/libexec/postfix/smtp pid 5046 killed by signal 11
Mar  9 04:54:38 icarus postfix/master[802]: warning: /usr/local/libexec/postfix/smtp: bad command startup -- throttling
Mar  9 04:54:38 icarus postfix/error[5048]: 1835D1AF150: to=, relay=none, delay=0.5, delays=0.05/0.44/0/0.01, dsn=4.3.0, status=deferred (unknown mail transport error)

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: stable/10: high load average when box is idle

2015-10-29 Thread Jeremy Chadwick

On Thu, Oct 29, 2015 at 11:00:32AM +0100, Miroslav Lachman wrote:
> Jeremy Chadwick wrote on 10/27/2015 06:05:
> >(I am not subscribed to the mailing list, please keep me CC'd)
> >
> >Issue: a stable/10 system that has an abnormally high load average (e.g.
> >0.15, but may be higher depending on other variables which I can't
> >account for) when the machine is definitely idle (i.e. cannot be traced
> >to high interrupt usage per vmstat -i, cannot be traced to a userland
> >process or kernel thread, etc.).
> >
> >This problem has been discussed many times on the FreeBSD mailing lists
> >and the FreeBSD forum (including some folks seeing it on 9.x, but my
> >complaint here is focused on 10.x so please focus there).
> >
> >I'd politely like to request that anyone experiencing this, or who has
> >experienced it (and if you know when it stopped or why, including what
> >you may have done, include that), to chime in on this ticket from 2012
> >(made for 9.x but style of issue still applies; c#5 is quite valid):
> >
> >https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
> >
> >For those still experiencing it, I'd suggest reading c#8 and seeing if
> >sysctl kern.eventtimer.periodic=1 relieves the problem for you.  (At
> >this time I would not suggest leaving that set indefinitely, as it does
> >seem to increase the interrupt rate in cpuX:timer in vmstat -i.  But for
> >me kern.eventtimer.periodic=1 "fixes" the issue)
> 
> Is it on real HW server or in some kind of virtualization? I am seeing load
> 0.5 - 1.2 on three virtual machines in VMware. The machines are without any
> traffic. Just fresh instalation of FreeBSD 10.1 and some services without
> any public content.

I've seen it on both bare-metal and VMs.  Please see c#8 in the ticket;
there's an itemised list of where I've seen it, but I'm sure it's not
limited to just those.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


stable/10: high load average when box is idle

2015-10-26 Thread Jeremy Chadwick

(I am not subscribed to the mailing list, please keep me CC'd)

Issue: a stable/10 system that has an abnormally high load average (e.g.
0.15, but may be higher depending on other variables which I can't
account for) when the machine is definitely idle (i.e. cannot be traced
to high interrupt usage per vmstat -i, cannot be traced to a userland
process or kernel thread, etc.).

This problem has been discussed many times on the FreeBSD mailing lists
and the FreeBSD forum (including some folks seeing it on 9.x, but my
complaint here is focused on 10.x so please focus there).

I'd politely like to request that anyone experiencing this, or who has
experienced it (and if you know when it stopped or why, including what
you may have done, include that), to chime in on this ticket from 2012
(made for 9.x but style of issue still applies; c#5 is quite valid):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541

For those still experiencing it, I'd suggest reading c#8 and seeing if
sysctl kern.eventtimer.periodic=1 relieves the problem for you.  (At
this time I would not suggest leaving that set indefinitely, as it does
seem to increase the interrupt rate in cpuX:timer in vmstat -i.  But for
me kern.eventtimer.periodic=1 "fixes" the issue)
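
To compare the timer interrupt rate before and after flipping kern.eventtimer.periodic, something like the following could be used. It is a minimal sketch: the function name timer_rate_total is hypothetical, and it assumes "vmstat -i" prints one cpuN:timer line per CPU with the rate as the last column.

```shell
#!/bin/sh
# Hypothetical helper: sum the last column (rate) of every cpuN:timer
# line read from standard input.  Assumes vmstat -i style output.
timer_rate_total() {
    awk '/cpu[0-9]+:timer/ { sum += $NF } END { print sum + 0 }'
}

# Intended use on the affected machine (not executed here):
#   vmstat -i | timer_rate_total
#   sysctl kern.eventtimer.periodic=1
#   sleep 300; uptime; vmstat -i | timer_rate_total
```

Recording the load average from uptime alongside both totals would show whether the lower load is being paid for with a higher timer interrupt rate.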

Thanks.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: Stable/9 from today mpssas_scsiio timeouts

2013-07-09 Thread Jeremy Chadwick

On Tue, Jul 09, 2013 at 11:46:24AM -0400, Outback Dingo wrote:
> On Tue, Jul 9, 2013 at 11:30 AM, Jeremy Chadwick  wrote:
> 
> > On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote:
> > > On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick  wrote:
> > >
> > > > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote:
> > > > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo <
> > outbackdi...@gmail.com
> > > > >wrote:
> > > > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick 
> > > > wrote:
> > > > > >
> > > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote:
> > > > > >> > as of stable today im seeing alot of new mps time outs
> > > > > >> >
> > > > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul  8 16:34:28
> > UTC
> > > > 2013
> > > > > >> > root@:/usr/obj/nas/usr/src/sys/
> > > > > >> >
> > > > > >> > mps1@pci0:130:0:0:  class=0x010700 card=0x30201000
> > > > chip=0x00721000
> > > > > >> > rev=0x03 hdr=0x00
> > > > > >> > vendor = 'LSI Logic / Symbios Logic'
> > > > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
> > > > > >> > class  = mass storage
> > > > > >> > subclass   = SAS
> > > > > >> >
> > > > > >> >
> > > > > >> > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm
> > > > > >> > 0xff80021a6b78
> > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > > > SMID
> > > > > >> 983
> > > > > >> > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800
> > > > > >> > mps0: mpssas_alloc_tm freezing simq
> > > > > >> > mps0: timedout cm 0xff80021a6b78 allocated tm
> > 0xff80021587b0
> > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > > > SMID
> > > > > >> 983
> > > > > >> > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800
> > > > during
> > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0
> > > > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a
> > code
> > > > 0x0
> > > > > >> count
> > > > > >> > 1
> > > > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting
> > > > TaskMID
> > > > > >> 983
> > > > > >> > mps0: mpssas_free_tm releasing simq
> > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00
> > > > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout
> > > > > >> > (probe40:mps0:0:40:0): Retrying command
> > > > > >> > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm
> > > > > >> > 0xff80023e5b78
> > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length
> > 36
> > > > SMID
> > > > > >> 983
> > > > > >> > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800
> > > > > >> > mps1: mpssas_alloc_tm freezing simq
> > > > > >> > mps1: timedout cm 0xff80023e5b78 allocated tm
> > 0xff80023977b0
> > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length
> > 36
> > > > SMID
> > > > > >> 983
> > > > > >> > completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800
> > > > during
> > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0
> > > > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a
> > code
> > > > 0x0
> > > > > >> count
> > > > > >> > 1
> > > > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting
> > > > TaskMID
> > > > > >> 983
> > > > > >> > mps1: mpssas_free_tm releasing simq
> > > > > >> > (probe292:mps1:0:37:

Re: Stable/9 from today mpssas_scsiio timeouts

2013-07-09 Thread Jeremy Chadwick

On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote:
> On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick  wrote:
> 
> > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote:
> > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo  > >wrote:
> > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick 
> > wrote:
> > > >
> > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote:
> > > >> > as of stable today im seeing alot of new mps time outs
> > > >> >
> > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul  8 16:34:28 UTC
> > 2013
> > > >> > root@:/usr/obj/nas/usr/src/sys/
> > > >> >
> > > >> > mps1@pci0:130:0:0:  class=0x010700 card=0x30201000
> > chip=0x00721000
> > > >> > rev=0x03 hdr=0x00
> > > >> > vendor = 'LSI Logic / Symbios Logic'
> > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
> > > >> > class  = mass storage
> > > >> > subclass   = SAS
> > > >> >
> > > >> >
> > > >> > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm
> > > >> > 0xff80021a6b78
> > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800
> > > >> > mps0: mpssas_alloc_tm freezing simq
> > > >> > mps0: timedout cm 0xff80021a6b78 allocated tm 0xff80021587b0
> > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800
> > during
> > > >> > recovery ioc 8048 scsi 0 state c xfer 0
> > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code
> > 0x0
> > > >> count
> > > >> > 1
> > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting
> > TaskMID
> > > >> 983
> > > >> > mps0: mpssas_free_tm releasing simq
> > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00
> > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout
> > > >> > (probe40:mps0:0:40:0): Retrying command
> > > >> > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm
> > > >> > 0xff80023e5b78
> > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800
> > > >> > mps1: mpssas_alloc_tm freezing simq
> > > >> > mps1: timedout cm 0xff80023e5b78 allocated tm 0xff80023977b0
> > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36
> > SMID
> > > >> 983
> > > >> > completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800
> > during
> > > >> > recovery ioc 8048 scsi 0 state c xfer 0
> > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code
> > 0x0
> > > >> count
> > > >> > 1
> > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting
> > TaskMID
> > > >> 983
> > > >> > mps1: mpssas_free_tm releasing simq
> > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00
> > > >> > (probe292:mps1:0:37:0): CAM status: Command timeout
> > > >> > (probe292:mps1:0:37:0): Retrying command
> > > >>
> > > >> 1. What revision were you running before (i.e. what were you on prior
> > to
> > > >> the upgrade)?
> > > >>
> > > >
> > > >
> > > > Sorry I was on 252595 from July 3
> >
> > And does rolling back to r252595 resolve the problem for you?
> >
> > Because the only commit I see between r253035 and r252595 that might
> > account for some kind of behavioural change, unless I missed one while
> > skimming the commit history, is the following:
> >
> > r252730 -- http://www.freshbsd.org/commit/freebsd/r252730
> >
> > If at all possible, please try updating to r253037 or newer to see
> > if tha

Re: Stable/9 from today mpssas_scsiio timeouts

2013-07-09 Thread Jeremy Chadwick

On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote:
> On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo wrote:
> > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick  wrote:
> >
> >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote:
> >> > as of stable today im seeing alot of new mps time outs
> >> >
> >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul  8 16:34:28 UTC 2013
> >> > root@:/usr/obj/nas/usr/src/sys/
> >> >
> >> > mps1@pci0:130:0:0:  class=0x010700 card=0x30201000 chip=0x00721000
> >> > rev=0x03 hdr=0x00
> >> > vendor = 'LSI Logic / Symbios Logic'
> >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
> >> > class  = mass storage
> >> > subclass   = SAS
> >> >
> >> >
> >> > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm
> >> > 0xff80021a6b78
> >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID
> >> 983
> >> > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800
> >> > mps0: mpssas_alloc_tm freezing simq
> >> > mps0: timedout cm 0xff80021a6b78 allocated tm 0xff80021587b0
> >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID
> >> 983
> >> > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800 during
> >> > recovery ioc 8048 scsi 0 state c xfer 0
> >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0
> >> count
> >> > 1
> >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting TaskMID
> >> 983
> >> > mps0: mpssas_free_tm releasing simq
> >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00
> >> > (probe40:mps0:0:40:0): CAM status: Command timeout
> >> > (probe40:mps0:0:40:0): Retrying command
> >> > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm
> >> > 0xff80023e5b78
> >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID
> >> 983
> >> > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800
> >> > mps1: mpssas_alloc_tm freezing simq
> >> > mps1: timedout cm 0xff80023e5b78 allocated tm 0xff80023977b0
> >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID
> >> 983
> >> > completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800 during
> >> > recovery ioc 8048 scsi 0 state c xfer 0
> >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0
> >> count
> >> > 1
> >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting TaskMID
> >> 983
> >> > mps1: mpssas_free_tm releasing simq
> >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00
> >> > (probe292:mps1:0:37:0): CAM status: Command timeout
> >> > (probe292:mps1:0:37:0): Retrying command
> >>
> >> 1. What revision were you running before (i.e. what were you on prior to
> >> the upgrade)?
> >>
> >
> >
> > Sorry I was on 252595 from July 3

And does rolling back to r252595 resolve the problem for you?

Because the only commit I see between r253035 and r252595 that might
account for some kind of behavioural change, unless I missed one while
skimming the commit history, is the following:

r252730 -- http://www.freshbsd.org/commit/freebsd/r252730

If at all possible, please try updating to r253037 or newer to see
if that has some effect/improvement:

r253037 -- http://www.freshbsd.org/commit/freebsd/r253037

I mention that commit because the only mps(4) changes done in recent
days are:

http://svnweb.freebsd.org/base/stable/9/sys/dev/mps/mps_sas.c?view=log

r253037
r251899
r251874

Else I'd say what you're experiencing is legitimate/unrelated to kernel
changes.  I can only speculate.

The messages to me indicate that some part of the kernel is submitting a
SCSI INQUIRY request to the underlying device(s) which results in a CAM
timeout, i.e. the disk attached to the controller did not respond
promptly (while the controller seemed to be alive/well).

If these disks (which we do not know the type of -- no dmesg provided,
etc.) are SSDs then TRIM behaviour is possibly causing the drive to take
too long to perform its TRIM cleanup, or, the drives themselves are
doing some kind of garbage collection which is taking quite a long time.

Steven et al. may have a different (and almost certainly more accurate)
analysis.

It would really help if you could provide "dmesg" from the machine, as
well as any details about your setup (if ZFS, "zpool status", etc.), in
addition to (if SSDs) "sysctl -a | grep -i trim".  All this matters.
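
The requested details could be gathered into a single file along these lines. This is a sketch only: the helper run_section and the report path /tmp/mps-report.txt are hypothetical names, and commands that are missing on a given system are simply noted in the report.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): append one labelled section
# to a report file.  $1 = report path, remaining arguments = command.
run_section() {
    rs_file=$1
    shift
    printf '==== %s ====\n' "$*" >> "$rs_file"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$rs_file" 2>&1 || printf '(exit status %s)\n' "$?" >> "$rs_file"
    else
        printf '(%s not found on this system)\n' "$1" >> "$rs_file"
    fi
}

# Intended use on the affected machine (not executed here):
#   run_section /tmp/mps-report.txt dmesg
#   run_section /tmp/mps-report.txt zpool status
#   run_section /tmp/mps-report.txt sh -c 'sysctl -a | grep -i trim'
```

Keeping each command's output under its own header makes it easier to paste the relevant section into a reply.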

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: Stable/9 from today mpssas_scsiio timeouts

2013-07-09 Thread Jeremy Chadwick

On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote:
> as of stable today im seeing alot of new mps time outs
> 
> 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul  8 16:34:28 UTC 2013
> root@:/usr/obj/nas/usr/src/sys/
> 
> mps1@pci0:130:0:0:  class=0x010700 card=0x30201000 chip=0x00721000
> rev=0x03 hdr=0x00
> vendor = 'LSI Logic / Symbios Logic'
> device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
> class  = mass storage
> subclass   = SAS
> 
> 
> mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm
> 0xff80021a6b78
> (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983
> command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800
> mps0: mpssas_alloc_tm freezing simq
> mps0: timedout cm 0xff80021a6b78 allocated tm 0xff80021587b0
> (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983
> completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800 during
> recovery ioc 8048 scsi 0 state c xfer 0
> (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0 count
> 1
> (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting TaskMID 983
> mps0: mpssas_free_tm releasing simq
> (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00
> (probe40:mps0:0:40:0): CAM status: Command timeout
> (probe40:mps0:0:40:0): Retrying command
> mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm
> 0xff80023e5b78
> (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983
> command timeout cm 0xff80023e5b78 ccb 0xfe002be14800
> mps1: mpssas_alloc_tm freezing simq
> mps1: timedout cm 0xff80023e5b78 allocated tm 0xff80023977b0
> (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983
> completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800 during
> recovery ioc 8048 scsi 0 state c xfer 0
> (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0 count
> 1
> (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting TaskMID 983
> mps1: mpssas_free_tm releasing simq
> (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00
> (probe292:mps1:0:37:0): CAM status: Command timeout
> (probe292:mps1:0:37:0): Retrying command

1. What revision were you running before (i.e. what were you on prior to
the upgrade)?

2. Something in your /usr/src differs from stock r253035, hence the "M"
at the end.  What is it?

Answer to #1 will help me narrow down the commits; there have been CAM
and mps changes fairly recently.  Otherwise you can dig through the
commits yourself (you'll need to go through many, many pages, as there
was a recent massive influx of SCTP changes (50+ commits)):

http://www.freshbsd.org/?branch=RELENG_9&project=freebsd
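
Regarding the "M" suffix from question 2: a small sketch of how one might test a revision string for it. The function name rev_has_local_mods is hypothetical, and treating any "M" in the string as "locally modified" is a simplifying assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a revision string such as "r253035M:"
# carries the "M" (locally modified) flag.
rev_has_local_mods() {
    rev=${1#r}      # drop a leading "r"
    rev=${rev%:}    # drop a trailing ":" as printed in the version string
    case "$rev" in
        *M*) return 0 ;;
        *)   return 1 ;;
    esac
}

if rev_has_local_mods "r253035M:"; then
    echo "tree has local modifications"
fi

# On the machine itself (not executed here), the modified files can be
# listed with:
#   svn status /usr/src
```

The output of "svn status /usr/src" (or "svn diff /usr/src") is what actually answers the question of what differs from stock.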

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-07-07 Thread Jeremy Chadwick

k.c:988
> > > #12 0xc07ba904 in fork_trampoline () at /src/src-9/sys/i386/i386/exception.s:279
> > > (kgdb) up 10
> > > #10 0xc0738f94 in softdep_flush () at /src/src-9/sys/ufs/ffs/ffs_softdep.c:1414
> > > 1414            progress += softdep_process_worklist(mp, 0);
> > > 
> > >   -Andre
> > 
> > This looks unrelated, and exactly this panic is usually has one of two
> > causes:
> > - corrupted filesystem, run fsck to recheck it;
> 
> root@palveli:~>fsck /dev/stripe/p 
> ** /dev/stripe/p
> ** Last Mounted on /palveli
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 9895 files, 2039706 used, 15697693 free (5397 frags, 1961537 blocks, 0.0% fragmentation)
> 
> * FILE SYSTEM IS CLEAN *

Taken from your previous mail (showing only UFS stuff):

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073817.html

>>>> fstab:
>>>> --
>>>> /dev/da0s1a   /        ufs  noatime,rw                            0 1
>>>> /dev/da0s1d   /usr     ufs  noatime,rw                            0 2
>>>> /dev/da0s1e   /var     ufs  noatime,nosuid,rw                     0 2
>>>> /dev/da10p1   /share2  ufs  suiddir,groupquota,noatime,nosuid,rw  0 2
>>>> /dev/da10p2   /raid2   ufs  userquota,noatime,nosuid,rw           0 2

Where is gstripe(8) in that picture?  Are you **sure** this is the same
system?  Surely I'm missing something here...

Can you provide details of the stripe, specifically "gstripe list" so I
can see what the disks are and then ask you for "smartctl -a" output for
each of them (to try and rule out disk-level problems that may be
causing oddities at the layer underneath the filesystem (sometimes fsck
will not catch this))?
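
Once the stripe details are known, the per-disk SMART output could be collected roughly as follows. This is a sketch under assumptions: the function name stripe_consumers is hypothetical, and it assumes "gstripe list" prints a "Consumers:" section whose entries look like "1. Name: da1".

```shell
#!/bin/sh
# Hypothetical helper: print the consumer (member disk) names found in
# "gstripe list" style output read from standard input.
stripe_consumers() {
    awk '
        /^Consumers:/ { in_consumers = 1; next }
        /^Providers:/ { in_consumers = 0 }
        in_consumers && $2 == "Name:" { print $3 }
    '
}

# Intended use on the affected machine (not executed here):
#   gstripe list | stripe_consumers | while read -r disk; do
#       smartctl -a "/dev/$disk"
#   done
```

If the real output is laid out differently, the awk patterns would need adjusting; the point is only to run "smartctl -a" once per member disk.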

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: make buildworld is now 50% slower

2013-07-07 Thread Jeremy Chadwick

On Sun, Jul 07, 2013 at 05:47:31AM -0500, Matthew D. Fuller wrote:
> Apropos of nothing, but...
> 
> On Sun, Jul 07, 2013 at 03:17:14AM -0700 I heard the voice of
> Jeremy Chadwick, and lo! it spake thus:
> >
> > WITHOUT_LIB32=true
> 
> suggests you're running amd64, which I'm pretty sure means
> 
> > - I do increase kern.maxdsiz, kern.dfldsiz, and kern.maxssiz in
> > /boot/loader.conf to 2560M/2560M/256M respectively, but that was mainly
> > from the days when I ran MySQL and needed a huge userland processes.
> 
> are not necessarily _in_creases, and may well be mostly _de_creases.
> e.g., on a RELENG_9 box with 8 gig of physical RAM:
> 
> % sysctl kern.{max{d,s},dfld}siz
> kern.maxdsiz: 34359738368
> kern.maxssiz: 536870912
> kern.dfldsiz: 134217728
>
> while a -CURRENT box with 16 has dfldsiz blown all the way up too.  I
> don't recall doing anything to change them at all recently, and a
> glance over loader.conf, sysctl.conf, rc.local, and the kernel configs
> doesn't turn up anything.

Thanks!

The settings I mention are from "ancient times" -- specifically RELENG_6
on i386 (I know because I found an old mailing list post of mine
discussing the settings with a user).

The problem as I said was that mysqld would crap itself (crash and be
quite loud about it) if the process allocated too much memory/became too
large.  I am fairly certain the issue related to the data size, **not**
the stack size (but I didn't see the harm in increasing that either).

It's good to know I can remove these on amd64.  Yay, one less thing in
loader.conf I have to deal with...  :-)  Thanks again!
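
For reference, the byte values quoted above convert as follows. This is a small arithmetic sketch; the helper name bytes_to_mib is hypothetical, and the three numbers are the ones from the quoted sysctl output.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Values taken from the quoted RELENG_9 amd64 sysctl output.
for pair in kern.maxdsiz:34359738368 kern.maxssiz:536870912 kern.dfldsiz:134217728; do
    name=${pair%%:*}
    bytes=${pair#*:}
    printf '%s default: %s MiB\n' "$name" "$(bytes_to_mib "$bytes")"
done
```

That works out to 32768 MiB, 512 MiB, and 128 MiB respectively, so against my 2560M/2560M/256M settings only kern.dfldsiz was actually an increase; kern.maxdsiz and kern.maxssiz were decreases.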

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: USB ports on Lenovo T400 do not work after a suspend/resume

2013-07-07 Thread Jeremy Chadwick

On Sun, Jul 07, 2013 at 03:51:12PM +1000, Ian Smith wrote:
> On Sun, 30 Jun 2013 15:02:57 -0700, Adrian Chadd wrote:
>  > On 30 June 2013 07:22, Ian Smith  wrote:
> [..]
>  > > Nothing of note that I can see, if that usb hub-to-bus remapping is
>  > > normal.  As you said, 'CPU0: local APIC error 0x40' looks maybe sus.
>  > > Maybe someone who knows might comment on that?
> 
> Does noone know what that signifies?  Maybe it's not relevant to this.

It's too vague to know.  The error comes from lapic_handle_error(),
which is a generic/small routine which pulls the local APIC error status
register.  (Note I'm saying APIC, not ACPI -- two different things)

apic_vector.S sets this up/makes use of this function, and it's done as
an interrupt handler.

I think this is one of those situations where you have to know *what* is
being set up/done at that moment in time for the error code to mean
something.  Maybe booting verbose would give more information as to what
was being done that led up to the line.

I've CC'd John Baldwin who might have some ideas.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: make buildworld is now 50% slower

2013-07-07 Thread Jeremy Chadwick

On Sun, Jul 07, 2013 at 11:50:29AM +0300, Daniel Braniss wrote:
> > On Fri, Jul 05, 2013 at 02:39:00PM +0200, Dimitry Andric wrote:
> > > [redirecting to the correct mailing list, freebsd-stable@ ...]
> > > 
> > > On Jul 5, 2013, at 10:53, Daniel Braniss  wrote:
> > > > after today's update of 9.1-STABLE I noticed that make 
> > > > build[world|kernel] are
> > > > taking conciderable more time, is it because the upgrade of clang?
> > > > and if so, is the code produced any better?
> > > > 
> > > > before:
> > > > buildwordl:  26m4.52s real 2h28m32.12s user 36m6.27s sys
> > > > buildkernel: 7m29.42s real 23m22.22s user 4m26.26s sys
> > > > 
> > > > today:
> > > > buildwordl: 34m29.80s real 2h38m9.37s user 37m7.61s sys
> > > > buildkernel:15m31.52s real 22m59.40s user 4m33.06s sys
> > > 
> > > Ehm, your user and sys times are not that much different at all, they
> > > add up to about 5% slower for buildworld, and 1% faster for build kernel.
> > > Are you sure nothing else is running on that machine, eating up CPU time
> > > while you are building? :)
> > > 
> > > But yes, clang 3.3 is of course somewhat larger than 3.2.  You might
> > > especially notice that, if you are using gcc, which is very slow at
> > > compiling C++.
> > > 
> > > In any case, if you do not care about clang, just set WITHOUT_CLANG= in
> > > your /etc/src.conf, and you can shave off some build time.
> > 
> > I just built world/kernel (stable/9 r252769) 5 hours ago.  Results:
> > 
> > time make -j4 buildworld  = roughly 21 minutes on my hardware
> > time make -j4 buildkernel = roughly 8 minutes on my hardware
> > 
> 
> It's been a long time since I saw such numbers, maybe it's time
> to see where time is being spent, I will run it without clang to compare with
> your numbers.
> 
> > These numbers are about the norm for me, meaning I do not see a
> > substantial increase in build times.
> > 
> > Key point: I do not use/build/grok clang, i.e. WITHOUT_CLANG=true is in
> > my src.conf.  But I am aware of the big clang change in r252723.
> > 
> > If hardware details are wanted, ask, but I don't think it's relevant to
> > what the root cause is.
> > 
> 
> from what you are saying, I guess clang is not responsible.
> looking for my Sherlock Holmes hat.

Some points to those numbers I stated above:

- System is an Intel Q9550 with 8GB of RAM

- Single SSD (UFS2+SU+TRIM) is used for root, /usr, /var, /tmp, and swap

- /usr/src is on ZFS (raidz1 + 3 disks) -- however I got equally small
numbers when it was on the SSD

- /usr/src is using compression=lz4  (to folks from -fs: yeah, I'm
trying it out to see how much of an impact it has on interactivity.  I
can still tell when it kicks in, but it's way, way better than lzjb.
Rather not get into that here)

- Contents of /etc/src.conf (to give you some idea of what I disable):

WITHOUT_ATM=true
WITHOUT_BLUETOOTH=true
WITHOUT_CLANG=true
WITHOUT_FLOPPY=true
WITHOUT_FREEBSD_UPDATE=true
WITHOUT_INET6=true
WITHOUT_IPFILTER=true
WITHOUT_IPX=true
WITHOUT_KERBEROS=true
WITHOUT_LIB32=true
WITHOUT_LPR=true
WITHOUT_NDIS=true
WITHOUT_NETGRAPH=true
WITHOUT_PAM_SUPPORT=true
WITHOUT_PPP=true
WITHOUT_SENDMAIL=true
WITHOUT_WIRELESS=true
WITH_OPENSSH_NONE_CIPHER=true

It's WITHOUT_CLANG that cuts down the buildworld time by a *huge* amount
(I remember when it got introduced, my buildworld jumped up to something
like 40 minutes); the rest probably save a minute or two at most.

- /etc/make.conf doesn't contain much that's relevant, other than:

CPUTYPE?=core2

# For DTrace; also affects ports
STRIP=
CFLAGS+=-fno-omit-frame-pointer

- I do some tweaks in /etc/sysctl.conf (mainly vfs.read_min and
vfs.read_max), but I will admit I am not completely sure what those
do quite yet (I just saw the commit from scottl@ a while back talking
about how an increased vfs.read_min helps them at Netflix quite a
lot).  I also adjust kern.maxvnodes.

- Some ZFS ARC settings are adjusted in /boot/loader.conf (I'm playing
with some stuff I read in Andriy Gapon's ZFS PDF), but they definitely
do not have a major impact on the numbers I listed off.

- I do increase kern.maxdsiz, kern.dfldsiz, and kern.maxssiz in
/boot/loader.conf to 2560M/2560M/256M respectively, but that was mainly
from the days when I ran MySQL and needed huge userland processes.

All in all my numbers are low/small because of two things: the SSD, and
WITHOUT_CLANG.

Hope this gives you somewhere to start/stuff to ponder.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |


Re: When will subversion be ready for updating/upgrading src && ports?

2013-07-05 Thread Jeremy Chadwick

On Fri, Jul 05, 2013 at 08:38:07PM -0700, bsd-li...@hush.com wrote:
> Greetings,
>  Well after posting a couple of questions to the list regarding questions I
> had before migrating from (cv)sup to subversion, I took the leap:
> 
> mv /usr/src/ /usr/src.old/
> 
> mkdir /usr/src
> 
> mv /usr/ports/ /usr/ports.old/
> 
> mkdir /usr/ports
> 
> rm -fr /var/db/sup/*
> rm -fr /var/db/portsnap/*
> 
> svn checkout svn://svn.freebsd.org/base/stable/8 /usr/src
> 
> svn checkout svn://svn.freebsd.org/ports/head /usr/ports
> 
> I then performed a portmaster -a
> 
> which left me with a non-working X desktop.
> Turned out to be a problem with the Nvidia driver -- was 2.9.40, now 3.10.14.
> But loading it in loader.conf didn't create /dev/nvidia0, or /dev/nvidiactl
> To make a long story short, I attempted to update my src && ports, and try
> again;
> 
> svn update svn://svn.freebsd.org/ports/head /usr/ports
> FAILED! I don't have the exact output

Incorrect syntax -- should be one of the following (your choice):

  cd /usr/ports && svn update
  svn update /usr/ports

> So I tried:
> cd /usr/ports
> svn update
> Which replied:
> svn: E155036: Please see the 'svn upgrade' command
> svn: E155036: The working copy at '/usr/ports'
> is too old (format 29) to work with client version '1.8.0 (r1490375)'
> (expects format 31). You need to upgrade the working copy first.
> 
> So I guess subversion isn't (yet) designed for this sort of stuff, which 
> leaves me with a useless box. :(

Incorrect.  Please look very, VERY closely at what the command is that
it's telling you to use.  Read it 4 times over.  Pay close attention.

The explanation:

You installed subversion 1.7 or earlier when you originally started
(i.e. subversion-1.7 or 1.6 or something else was installed).  No
problem.

You then updated your ports tree.  No problem.

You then ran portmaster -a to upgrade/update all your ports (rebuild
them).  No problem.  However this updated subversion to the latest in
ports, which is 1.8.

The subversion metadata (stored in the .svn directories, ex.
/usr/src/.svn, /usr/ports/.svn, etc.) has changed as of 1.8.  This is
why you need to do "svn upgrade" in those directories.

This is a one-time thing you have to do.  That's all.
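
A minimal sketch of that one-time step, assuming the working copies live
in /usr/src and /usr/ports as in the original post.  The function name
and the SVN override variable are made up for illustration; the only
real command involved is "svn upgrade".

```sh
SVN=${SVN:-svn}   # hypothetical override, e.g. SVN=echo for a dry run

# Run "svn upgrade" once in each working copy that has a .svn directory.
upgrade_working_copies() {
    for wc in "$@"; do
        if [ -d "$wc/.svn" ]; then
            (cd "$wc" && "$SVN" upgrade) || return 1
        else
            echo "skipping $wc: no .svn directory" >&2
        fi
    done
}

upgrade_working_copies /usr/src /usr/ports
```

After this, a plain "svn update /usr/ports" should work again with the
1.8 client.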

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: make buildworld is now 50% slower

2013-07-05 Thread Jeremy Chadwick

On Fri, Jul 05, 2013 at 02:39:00PM +0200, Dimitry Andric wrote:
> [redirecting to the correct mailing list, freebsd-stable@ ...]
> 
> On Jul 5, 2013, at 10:53, Daniel Braniss  wrote:
> > after today's update of 9.1-STABLE I noticed that make build[world|kernel]
> > are taking considerably more time, is it because the upgrade of clang?
> > and if so, is the code produced any better?
> > 
> > before:
> > buildworld:  26m4.52s real 2h28m32.12s user 36m6.27s sys
> > buildkernel: 7m29.42s real 23m22.22s user 4m26.26s sys
> > 
> > today:
> > buildworld:  34m29.80s real 2h38m9.37s user 37m7.61s sys
> > buildkernel: 15m31.52s real 22m59.40s user 4m33.06s sys
> 
> Ehm, your user and sys times are not that much different at all, they
> add up to about 5% slower for buildworld, and 1% faster for build kernel.
> Are you sure nothing else is running on that machine, eating up CPU time
> while you are building? :)
> 
> But yes, clang 3.3 is of course somewhat larger than 3.2.  You might
> especially notice that, if you are using gcc, which is very slow at
> compiling C++.
> 
> In any case, if you do not care about clang, just set WITHOUT_CLANG= in
> your /etc/src.conf, and you can shave off some build time.

I just built world/kernel (stable/9 r252769) 5 hours ago.  Results:

time make -j4 buildworld  = roughly 21 minutes on my hardware
time make -j4 buildkernel = roughly 8 minutes on my hardware

These numbers are about the norm for me, meaning I do not see a
substantial increase in build times.

Key point: I do not use/build/grok clang, i.e. WITHOUT_CLANG=true is in
my src.conf.  But I am aware of the big clang change in r252723.

If hardware details are wanted, ask, but I don't think they're relevant to
what the root cause is.
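
For anyone wanting to check the user+sys comparison quoted above, here
is a small sketch that converts the time(1) figures from Daniel's post
to seconds and prints the change in CPU time.  The function names are
mine, not anything from the tree.  It comes out at about 5.8% more CPU
time for buildworld and about 1% less for buildkernel, which matches
Dimitry's estimate.

```sh
# Convert a time(1)-style duration such as "2h28m32.12s" or "36m6.27s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk '{
        t = $0; h = 0; m = 0
        if (match(t, /^[0-9]+h/)) { h = substr(t, 1, RLENGTH - 1); t = substr(t, RLENGTH + 1) }
        if (match(t, /^[0-9]+m/)) { m = substr(t, 1, RLENGTH - 1); t = substr(t, RLENGTH + 1) }
        sub(/s$/, "", t)
        printf "%.2f\n", h * 3600 + m * 60 + t
    }'
}

# Percentage change in user+sys time: args are user/sys before, then user/sys after.
cpu_change() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        -v c="$(to_seconds "$3")" -v d="$(to_seconds "$4")" \
        'BEGIN { printf "%+.1f%%\n", ((c + d) / (a + b) - 1) * 100 }'
}

cpu_change 2h28m32.12s 36m6.27s 2h38m9.37s 37m7.61s   # buildworld
cpu_change 23m22.22s 4m26.26s 22m59.40s 4m33.06s      # buildkernel
```

The wall-clock ("real") numbers moved far more than the CPU numbers did,
which is why the question about other load on the machine is a fair one.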

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: UFS Trim wont stay set

2013-07-04 Thread Jeremy Chadwick

On Thu, Jul 04, 2013 at 04:48:38PM -0400, Mike Jakubik wrote:
> On 07/04/13 16:33, Jeremy Chadwick wrote:
> >Yup, experienced this myself many times over. The reasons are
> >understood (it's not limited to just the TRIM bits, it's related
> >to anything adjusting the superblock -- it gets cached in memory
> >in certain situations and not flushed back to disk). Hint: are you
> >booting into single user and then issuing a "mount" command before
> >doing your tunefs stuff? If so, this is probably what's causing it
> >(at least it was in my case). Instead just boot into single-user,
> >do not mount anything, and use /sbin/tunefs (if available --
> >depends on your filesystem setup) or /rescue/tunefs.
> 
> I booted into single-user mode and the system mounted the only file
> system there, which is mounted at /. What I did now, however, was boot
> off a Live CD and run tunefs; this did the trick!

I talked with Andriy Gapon a couple years ago about this, actually.  I
had to dig up the thread.  Here are the relevant parts (read in order):

http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062921.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062922.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062923.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062924.html

Make sure you read Andriy's comments (2nd URL) in full.  My follow-up
(4th URL) confirms that the "mount -a" (which is what made / read-write
since /etc/fstab obviously has / as rw) was causing the issue.  He
explains the reason.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: UFS Trim wont stay set

2013-07-04 Thread Jeremy Chadwick

On Thu, Jul 04, 2013 at 03:37:28PM -0400, Mike Jakubik wrote:
> Hello,
> 
> I've just installed a stable snapshot on a new machine with an SSD
> drive. After installing, I booted into single-user mode and ran
> 
> # tunefs -t enable /dev/ada0p2
> tunefs: issue TRIM to the disk set
> 
> Great, back to multiuser mode, I check the partition
> 
> # tunefs -p /dev/ada0p2
> tunefs: POSIX.1e ACLs: (-a)                disabled
> tunefs: NFSv4 ACLs: (-N)                   disabled
> tunefs: MAC multilabel: (-l)               disabled
> tunefs: soft updates: (-n)                 enabled
> tunefs: soft update journaling: (-j)       enabled
> tunefs: gjournal: (-J)                     disabled
> tunefs: trim: (-t)                         disabled
> 
> What the heck.. did I miss something? Back to single user mode and
> 
> # tunefs -t enable /dev/ada0p2
> tunefs: issue TRIM to the disk remains unchanged as enabled
> 
> I check again in multiuser mode and it says disabled, any ideas what
> is going on here?

Yup, experienced this myself many times over.  The reasons are
understood (it's not limited to just the TRIM bits, it's related to
anything adjusting the superblock -- it gets cached in memory in certain
situations and not flushed back to disk).

Hint: are you booting into single user and then issuing a "mount"
command before doing your tunefs stuff?  If so, this is probably
what's causing it (at least it was in my case).

Instead just boot into single-user, do not mount anything, and use
/sbin/tunefs (if available -- depends on your filesystem setup) or
/rescue/tunefs.
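
A rough sketch of how I'd verify the flag afterwards, assuming the same
/dev/ada0p2 device as in your mail.  The trim_state helper is made up
for illustration; it just picks the TRIM line out of "tunefs -p" style
output.  As far as I recall tunefs writes that report to stderr, hence
the 2>&1 in the commented usage; treat that as an assumption to verify.

```sh
# Read "tunefs -p" style output on stdin and print the TRIM state
# (enabled/disabled); exits non-zero if no trim line was seen.
trim_state() {
    awk '/trim: \(-t\)/ { print $NF; found = 1 } END { exit !found }'
}

# From single-user mode, with nothing mounted read-write:
#   /rescue/tunefs -t enable /dev/ada0p2
#   /rescue/tunefs -p /dev/ada0p2 2>&1 | trim_state

# Demonstration on two lines of the output pasted in the original mail:
printf '%s\n' 'tunefs: soft updates: (-n) enabled' \
              'tunefs: trim: (-t) disabled' | trim_state
```

If it still prints "disabled" once you are back in multiuser mode, the
superblock change did not make it to disk.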

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: ZFS Panic after freebsd-update

2013-07-02 Thread Jeremy Chadwick

On Tue, Jul 02, 2013 at 08:59:56AM +0300, Andriy Gapon wrote:
> on 01/07/2013 21:50 Jeremy Chadwick said the following:
> > The issue is that ZFS on FreeBSD is still young compared to other
> > filesystems (specifically UFS).
> 
> That's a fact.
> 
> > Nothing is perfect, but FFS/UFS tends
> > to have a significantly larger number of bugs worked out of it to the
> > point where people can use it without losing sleep (barring the SUJ
> > stuff, don't get me started).
> 
> That's subjective.
> 
> > I have the same concerns over other
> > things, like ext2fs and fusefs for that matter -- but this thread is
> > about a ZFS-related crash, and that's why I'm "over-focused" on it.
> 
> I have an impression that you seem to state your (negative) opinion of ZFS in
> every other thread about ZFS problems.

The OP in question ended his post with the line "Thoughts?", and I have
given those thoughts.  My thoughts/opinions/experience may differ from
that of others.  Diversity of thoughts/opinions/experiences is good.
I'm not some kind of "authoritative ZFS guru" -- far from it.  If I
misunderstood what "Thoughts?" meant/implied, then draw and quarter me
for it; my actions/words = my responsibility.

I do not feel I have a "negative opinion" of ZFS.  I still use it today
on FreeBSD, donated money to Pawel when the project was originally
announced (because I wanted to see something new and useful thrive on
FreeBSD), and try my best to assist with issues pertaining to it where
applicable.  These are not the actions of someone with a negative
opinion, these are the actions of someone who is supportive while
simultaneously very cautious.

Is ZFS better today than it was when it was introduced?  By a long shot.
For example, on my stable/9 system here I don't tune /boot/loader.conf
any longer.  But that doesn't change my viewpoint when it comes to using
ZFS exclusively on a FreeBSD box.

> > A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
> > results in a system where an admin can upgrade + boot into single-user
> > and perform some tasks to test/troubleshoot; if the ZFS layer is
> > broken, it doesn't mean an essentially useless box.  That isn't FUD,
> > that's just the stage we're at right now.  I'm aware lots of people have
> > working ZFS-exclusive setups; like I said, "works great until it
> > doesn't".
> 
> Yeah, a heterogeneous setup can have its benefits, but it can have its
> drawbacks too.  This is true for heterogeneous vs monoculture in general.
> But the sword cuts both ways: what if something is broken in "UFS layer" or
> god forbid in VFS layer and you have only UFS?
> Besides, without mentioning specific classes of problems "ZFS layer is broken"
> is too vague.

The likelihood of something being broken in UFS is significantly lower
given its established history.  I have to go off of experience, both
personal and professional -- in my years of dealing with FreeBSD
(1997-present), I have only encountered issues with UFS a few times (I
can count them on one, maybe two hands), and I'm choosing to exclude
SU+J from the picture for what should be obvious reasons.  With ZFS,
well... just look at the mailing lists and PR count.  I don't want to be
a jerk about it, but you really have to look at the quantity.  It
doesn't mean ZFS is crap, it just means that for me, I don't think
we're quite "there" yet.

And I will gladly admit -- because you are the one who taught me this --
that every incident needs to be treated as unique.  But one can't deny that a
substantial percentage (I would say majority) of -fs and -stable posts
relate somehow to ZFS; I'm often thrilled when it turns out to be
something else.

Playing a strange devil's advocate, let me give you an interesting
example: softupdates.  When SU was introduced to FreeBSD back in the
late 90s, there were issues and concerns -- lots.  As such, SU was
chosen to be disabled by default on root filesystems given the
importance of that filesystem (re: "we do not want to risk losing as
much data in the case of a crash" -- see the official FAQ, section 8.3).
All other filesystems defaulted to SU enabled.  It's been like that up
until 9.x where it now defaults to enabled.  So that's what, 15 years?

You could say that my example could also apply to ZFS, i.e. the reports
are a part of its growth and maturity, and I'd agree.  But I don't feel
it's reached the point where I'm willing to risk going ZFS-only.  Down
the road, sure, but not now.  That's just my take on it.

Please make sure to also consider, politely, that a lot of people who
have issues wit

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick

On Mon, Jul 01, 2013 at 09:10:45PM +0300, Andriy Gapon wrote:
> on 01/07/2013 20:04 Jeremy Chadwick said the following:
> > People are operating with the belief that "ZFS just
> > works", when reality shows "it works until it doesn't"
> 
> That reality applies to everything that a man creates with a purpose to work.
> I am not sure why you are so over-focused on ZFS.
> Please stop spreading FUD.  Thank you.

The issue is that ZFS on FreeBSD is still young compared to other
filesystems (specifically UFS).  Nothing is perfect, but FFS/UFS tends
to have a significantly larger number of bugs worked out of it to the
point where people can use it without losing sleep (barring the SUJ
stuff, don't get me started).  I have the same concerns over other
things, like ext2fs and fusefs for that matter -- but this thread is
about a ZFS-related crash, and that's why I'm "over-focused" on it.

A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
results in a system where an admin can upgrade + boot into single-user
and perform some tasks to test/troubleshoot; if the ZFS layer is
broken, it doesn't mean an essentially useless box.  That isn't FUD,
that's just the stage we're at right now.  I'm aware lots of people have
working ZFS-exclusive setups; like I said, "works great until it
doesn't".

So, how do you kernel guys debug a problem in this environment:

- ZFS-only
- Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt
  with added debugging features, etc.)
- No swap configured
- No serial console

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick

On Mon, Jul 01, 2013 at 02:04:24PM -0400, Scott Sipe wrote:
> On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick  wrote:
> 
> > On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote:
> > > On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick  wrote:
> > >
> > > > Of course when I see lines like this:
> > > >
> > > >  Trying to mount root from zfs:zroot
> > > >
> > > >  ...this greatly diminishes any chances of "live debugging" on the
> > > >  system.  It amazes me how often I see this come up on the lists --
> > > >  people who have ZFS problems but use ZFS for their root/var/tmp/usr.
> > > >  I wish that behaviour would stop, as it makes debugging ZFS a serious
> > > >  PITA.  This comes up on the list almost constantly, sad panda.
> > >
> > >
> > > I'm not sure why it amazes you that people are making widespread use
> > > of ZFS.
> >
> > It's not widespread use of ZFS.  It's widespread use of ZFS as their
> > sole filesystem (specifically root/var/tmp/usr, or more specifically
> > just root/usr).  People are operating with the belief that "ZFS just
> > works", when reality shows "it works until it doesn't".  The mentality
> > seems to be "it's so rock solid it'll never break" along with "it can't
> > happen to me".  I tend to err on the side of caution, hence avoidance of
> > ZFS for critical things like the aforementioned.
> >
> > It's different if you have a UFS root/var/tmp/usr and ZFS for everything
> > else.  You then have a system you can boot/use without issue even if ZFS
> > is crapping the bed.
> >
> > ...
> >
> > 95% of FreeBSD users cannot debug kernel problems**.  To debug a kernel
> > problem, you need: a crash dump, a usable system with the exact
> > kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and
> > boot into 8.2 and reliably debug it using that), and (most important of
> > all) a developer who is familiar with kernel debugging *and* familiar
> > with the bits which are crashing.  Those who say what you're quoting are
> > often the latter.
> >
> > ...
> >
> > But the OP is running -RELEASE, and chooses to run that, along with use
> > of freebsd-update for binary updates.  Their choices are limited: stick
> > with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely.
> >
> 
> So I realize that neither 8.2-RELEASE nor 8.4-RELEASE are stable, but I
> ultimately wasn't sure where the right place to discuss 8.4 is?

For filesystem issues, freebsd-fs@ is usually the best choice, because
it discusses filesystem-related things (regardless of stable vs. release,
but knowing what version you have of course is mandatory).

freebsd-stable@ is mainly for stable/X related discussions.

Sorry to add pedanticism to an already difficult situation for you (and
I sympathise, particularly since the purpose of the lists is often
difficult to discern, even with their terse descriptions in mailman).

> Beyond the FS mailing list, was there a better place for my question? I'll
> provide the other requested information (zfs outputs, etc) to wherever
> would be best.

Nope, not as far as I know.  The only other place is send-pr(1), once
you have an issue that can be reproduced.

Keep in mind, however, that none of these options (mailing lists,
send-pr, etc.) mandate a response from anyone.  You/your business (see
below) should be aware that there is always the possibility no one can
help solve the actual problem; as such it's important that companies
have proper upgrade/migration paths, rollback plans, and so on.

> This is a production machine (has been since late 2010) and after tweaking
> some ZFS settings initially has been totally stable. I wasn't incredibly
> closely involved in the initial configuration, but I've done at least one
> binary freebsd-update previously.

Well regardless it sounds like moving from 8.2-RELEASE to 8.4-RELEASE
causes ZFS to break for you, so that would classify as a regression.
What the root cause is, however, is still unknown.

Point: 8.2-RELEASE came out in February 2011, and 8.4-RELEASE came out
in June 2013 -- that's almost 2.5 years of changes between versions.
The number of changes between these two is major -- hundreds, maybe
thousands.  ZFS got worked on heavily during this time as well.

I tend to tell anyone using ZFS that they should be running a stable/X
(particularly stable/9) branch.  I can expand on that justification if
needed, as it's well-founded for a lot of reasons.

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick

On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote:
> On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick  wrote:
> 
> > On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote:
> >> *** Sorry for partial first message! (gmail sent after multiple returns
> >> apparently?) ***
> >> 
> >> Hello,
> >> 
> >> I have not had much time to research this problem yet, so please let me
> >> know what further information I might be able to provide.
> >> [[...]]
> >> Any thoughts?
> > 
> > Thoughts:
> > 
> > [[..]]
> > Of course when I see lines like this:
> > 
> >  Trying to mount root from zfs:zroot
> > 
> >  ...this greatly diminishes any chances of "live debugging" on the
> >  system.  It amazes me how often I see this come up on the lists -- people
> >  who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
> >  that behaviour would stop, as it makes debugging ZFS a serious PITA.
> >  This comes up on the list almost constantly, sad panda.
> 
> 
> I'm not sure why it amazes you that people are making widespread use of ZFS.

It's not widespread use of ZFS.  It's widespread use of ZFS as their
sole filesystem (specifically root/var/tmp/usr, or more specifically
just root/usr).  People are operating with the belief that "ZFS just
works", when reality shows "it works until it doesn't".  The mentality
seems to be "it's so rock solid it'll never break" along with "it can't
happen to me".  I tend to err on the side of caution, hence avoidance of
ZFS for critical things like the aforementioned.

It's different if you have a UFS root/var/tmp/usr and ZFS for everything
else.  You then have a system you can boot/use without issue even if ZFS
is crapping the bed.

> You could make the same argument that people shouldn't use UFS2
> journaling on their file systems because bugs in the implementation
> might make debugging journaled UFS2 file systems "a serious PITA."

Yup, and I do make that argument, quite regularly at that.  There is
even some evidence at this point in time that softupdates are broken:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-June/017424.html

> The point is that there are VERY compelling reasons why people might
> want to use ZFS for root/var/tmp/usr/etc. (pooled storage; easy
> snapshots; etc.) and there should come a time when a given file system
> is "generally regarded as safe."

While there may be compelling reasons, those reasons quickly get shot
down when people realise they have a system they can't easily
troubleshoot when the issue is with ZFS.

> I'd say the time for ZFS came when they removed the big disclaimer
> from the boot messages.  If ZFS is dangerous, they should reinstate
> the "not ready for production" warning.  Until they do, I think it's
> unfair to castigate people for using ZFS universally.

The warning meant absolutely nothing at the time (it did not keep people
away from it), and would mean nothing now if brought back.  A single
kernel printf() is not the right choice of action.

Are we better off today than we were when ZFS was originally ported
over?  Yes, by far.  Lots of improvements, in many great/good ways.  No
argument there.  But there is no way I'd risk putting my root filesystem
(or other key filesystems) on it -- still too new, still too many bugs,
and users don't know about those problems until it's too late.

> Isn't it a recurring theme on freebsd-current and freebsd-stable that
> more people need to use features so they can be debugged in realistic
> environments?  If you're telling them, "don't use that because it
> makes debugging harder," how are they supposed to get debugged and
> hence improved? :-)

95% of FreeBSD users cannot debug kernel problems**.  To debug a kernel
problem, you need: a crash dump, a usable system with the exact
kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and
boot into 8.2 and reliably debug it using that), and (most important of
all) a developer who is familiar with kernel debugging *and* familiar
with the bits which are crashing.  Those who say what you're quoting are
often the latter.

Part of the "need people to try this" process you refer to is what
stable/X is about, *without* the extra chaos of head.  I'm one of those
who for the past 15 years has advocated stable/X usage for a lot of
reasons; I'll save the diatribe for some other time.

But the OP is running -RELEASE, and chooses to run that, along with use
of freebsd-update for binary updates.  Their choices are limited: stick
with 8.2, switch to stable/X, cease use of ZFS, or change OSes

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick

On Mon, Jul 01, 2013 at 08:49:25AM -0700, Jeremy Chadwick wrote:
> - Is there a reason you do not have dumpdev defined in /etc/rc.conf (or
>   alternately, no swap device defined in /etc/fstab (which will get
>   used/honoured by the dumpdev="auto" (the default)) ?

This should have read "or alternately, ***A*** swap device defined in
/etc/fstab ..."
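
To make that concrete, a minimal sketch of the two pieces.  The
partition name in the fstab comment is hypothetical, and the dumpdir
value is, as far as I know, simply the stock default spelled out.

```sh
# /etc/rc.conf fragment
dumpdev="AUTO"          # pick the dump device automatically from swap in /etc/fstab
dumpdir="/var/crash"    # where savecore(8) puts the dump on the next boot

# Matching /etc/fstab entry (device name is a placeholder, adjust to taste):
#   /dev/ada0p3   none   swap   sw   0   0
```

With a swap device in place, a panic should produce a crash dump that can
be looked at after reboot, rather than a photo of the console.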

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick

On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote:
> *** Sorry for partial first message! (gmail sent after multiple returns
> apparently?) ***
> 
> Hello,
> 
> I have not had much time to research this problem yet, so please let me
> know what further information I might be able to provide.
> 
> This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
> using freebsd-update. After I rebooted to test the new kernel, I got a
> panic. I had to take a picture of the screen. Here's a condensed version:
> 
> panic: page fault
> cpuid = 1
> KDB: stack backtrace:
> #0 kdb_backtrace
> #1 panic
> #2 trap_fatal
> #3 trap_pfault
> #4 trap
> #5 calltrap
> #6 vdev_mirror_child_select
> #7 vdev_mirror_io_start
> #8 zio_vdev_io_start
> #9 zio_execute
> #10 arc_read
> #11 dbuf_read
> #12 dbuf_findbp
> #13 dbuf_hold_impl
> #14 dbuf_hold
> #15 dnode_hold_impl
> #16 dmu_buf_hold
> #17 zap_lockdir
> Uptime: 5s
> Cannot dump. Device not defined or unavailable.
> Automatic reboot in 15 seconds - press a key on the console to abort
> 
> uname -a from before (and after) the reboot:
> 
> FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57
> UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>  amd64
> 
> dmesg is attached.
> 
> I was able to reboot to the old kernel and am up and running back on 8.2
> right now.
> 
> Any thoughts?

Thoughts:

- All I see is an amd64 system with 16GB RAM and 4 disks driven by an ICH10
  in AHCI mode.

- Output from: zpool status

- Output from: zpool get all

- Output from: zfs get all

- Output from: "gpart show -p" for every disk on the system

- Output from: cat /etc/sysctl.conf

- Output from: cat /boot/loader.conf

- Is there a reason you do not have dumpdev defined in /etc/rc.conf (or
  alternately, no swap device defined in /etc/fstab (which will get
  used/honoured by the dumpdev="auto" (the default)) ?  Taking photos of
  the console and manually typing backtraces in is borderline worthless.
  Of course when I see lines like this:

  Trying to mount root from zfs:zroot

  ...this greatly diminishes any chances of "live debugging" on the
  system.  It amazes me how often I see this come up on the lists -- people
  who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
  that behaviour would stop, as it makes debugging ZFS a serious PITA.
  This comes up on the list almost constantly, sad panda.

- Get yourself stable/9 and try that:
  https://pub.allbsd.org/FreeBSD-snapshots/

- freebsd-fs is a better place for this discussion, especially since
  you're running a -RELEASE build, not a -STABLE build.
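
If it helps, the outputs requested above can be gathered in one go with
something like the following.  This is only a sketch: the report
function is made up, and the command strings are exactly the ones listed
above (older ZFS versions may want a pool name after "zpool get all").

```sh
# Run each command string, printing a header before its output.
# A failing command is noted and the loop carries on.
report() {
    for cmd in "$@"; do
        printf '==> %s\n' "$cmd"
        sh -c "$cmd" 2>&1 || printf '(command failed: %s)\n' "$cmd"
        echo
    done
}

report 'zpool status' 'zpool get all' 'zfs get all' 'gpart show -p' \
       'cat /etc/sysctl.conf' 'cat /boot/loader.conf'
```

Redirect the output to a file and attach it, or paste it inline if the
list strips attachments.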

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: Subversion 1.8 / FreeBSD 8 x86 STABLE Symlinks

2013-06-30 Thread Jeremy Chadwick

On Sun, Jun 30, 2013 at 02:20:21PM -0400, Jason Hellenthal wrote:
> When using svn 1.8 I have come across a situation where, when it is pointed
> at a symlink that refers to a working directory, an update will either
> segfault or exit prematurely and leave a lock held on the working directory
> that the symlink points to.
> 
> This leaves you with no choice but to run cleanup on the actual referenced
> working directory, which was AFAIK never the case for any version below 1.8.
> 
> Not sure if this is a problem with svn or FreeBSD itself but thought I would 
> report the characteristics in case it's noticed elsewhere.
> 
> Details:
> Using UFS
> FreeBSD 8-STABLE i386 as of this date.
> 
> In the directory...
> cd /exports/usr
> ln -s src8 src
> svn up /exports/usr/src

Known bug/problem in Subversion, not FreeBSD:

http://svn.apache.org/viewvc?view=revision&revision=r1496007

Previous discussion:

http://lists.freebsd.org/pipermail/freebsd-questions/2013-June/251842.html

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: FREEBSD_INSTALL failed with error 19 during booting installer

2013-06-29 Thread Jeremy Chadwick

On Sun, Jun 30, 2013 at 02:09:36AM +1000, Ian Smith wrote:
> On Fri, 28 Jun 2013 11:26:15 -0700, Jeremy Chadwick wrote:
>  > On Fri, Jun 28, 2013 at 08:22:29PM +0200, Marek Salwerowicz wrote:
>  > > Hi list,
>  > > 
>  > > I am trying to install FreeBSD 9.1-Release amd64 on a Supermicro server:
>  > > 
>  > > SuperStorage Server 6027R-E1R12N
>  > > 
>  > > with Intel Xeon E5-2640 CPU and 32 GB (4 x 8 ) KVR16R11D4/8HC installed
>  > > 
>  > > Currently I have only 2 SSD Kingston  drives (working in mirror)
>  > > installed on that server.
>  > > 
>  > > during booting installer from the ISO CD (amd64),  the boot process
>  > > fails with message:
>  > > 
>  > > Mounting from cd9660:/dev/iso9660/FREEBSD_INSTALL failed with error 19.
>  > > 
>  > > As I found here: http://forums.freebsd.org/showthread.php?t=36579 ,
>  > > probably this could be issue with ACPI, but setting option in
>  > > loader:
>  > > 
>  > > # set debug.acpi.disabled ="hostres"
>  > > # boot
>  > > 
>  > > made nothing for me.
>  > > 
>  > > 
>  > > 
>  > > Any ideas?
>  > 
>  > Try using a USB flash drive + memstick image instead of CD-based media.
> 
> Last time I tried - 9.1-release i386 - the memstick boot gave no option 
> to drop to loader; I had to burn a disc1 CD so I could drop to loader to 
> turn cam.ctl off to succeed installing in 128MB.  Did I miss something?

I've used memstick images exclusively for years and have never seen
this.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: FREEBSD_INSTALL failed with error 19 during booting installer

2013-06-28 Thread Jeremy Chadwick

On Fri, Jun 28, 2013 at 08:22:29PM +0200, Marek Salwerowicz wrote:
> Hi list,
> 
> I am trying to install FreeBSD 9.1-Release amd64 on a Supermicro server:
> 
> SuperStorage Server 6027R-E1R12N
> 
> with Intel Xeon E5-2640 CPU and 32 GB (4 x 8 ) KVR16R11D4/8HC installed
> 
> Currently I have only 2 SSD Kingston  drives (working in mirror)
> installed on that server.
> 
> during booting installer from the ISO CD (amd64),  the boot process
> fails with message:
> 
> Mounting from cd9660:/dev/iso9660/FREEBSD_INSTALL failed with error 19.
> 
> As I found here: http://forums.freebsd.org/showthread.php?t=36579 ,
> probably this could be issue with ACPI, but setting option in
> loader:
> 
> # set debug.acpi.disabled ="hostres"
> # boot
> 
> made nothing for me.
> 
> 
> 
> Any ideas?

Try using a USB flash drive + memstick image instead of CD-based media.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: AHCI Patsburg SATA controller and slow transfer speed

2013-06-27 Thread Jeremy Chadwick

On Thu, Jun 27, 2013 at 06:38:27PM -0700, Jeremy Chadwick wrote:
> Next, this statement by ahci(4) then confuses the user:
> 
> > ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
> 
> You see, when AHCI was invented, the existing idea was that all ports
> would have the same speed (and that was the case at the time).  Only
> somewhat recently have some vendors begun to mix-match speeds on the
> same controller -- like this one.
> 
> The AHCI specification probably (I have not read it even recently) only
> provides a number indicating "the total number of ports" followed by a
> single number indicating "the speed".
> 
> There may be support somewhere within AHCI to provide an updated way to
> get more granular information, but I do not know if that's the case.
> 
> If there is, FreeBSD's ahci(4) driver does not support such at this
> time (see sys/dev/ahci/ahci.c around line 502 for the device_printf()
> call and what the arguments are (specifically AHCI_CAP_ISS and
> AHCI_CAP_NPMASK)).

Just a technical follow-up:

I spent some time this evening looking at AHCI specification 1.30.  I'll
try to explain the situation.

First, at the HBA level (meaning the entire AHCI controller):

Bits 23-20 of CAP (reg. offset 0x00): Interface Speed Support (ISS).
This indicates, quote, "the maximum speed the HBA can support on its
ports".

Next, on a per-port basis, there are two registers available relating to
speed: one indicates speed, the other controls/limits speed:

1) Bits 7-4 of PxSSTS (reg. offset 0x28): SPD: Port x Serial ATA Status
(SCR0: SStatus).  This indicates, quote, "the negotiated interface
speed".

2) Bits 7-4 of PxSCTL (reg. offset 0x2c): SPD: Port x Serial ATA Control
(SCR2: SControl).  The register controls, quote, "the highest allowable
speed of the interface".  The bit definitions indicate a way to limit
the speed of a port and do not indicate capability.

The actual 1.30 specification even has a section (10.5) on this whole
ordeal, which states clearly, quote:

10.5 Interface Speed Support

The HBA indicates the maximum speed it can support via the CAP.ISS
register. Software can further limit the speed of a port by manipulating
each port's PxSCTL.SPD field to a lower value.

AHCI spec "proposal" 1.31 also does not address/cover this (all that
adds is per-port sleep capabilities).

I will point out that SATA600 is not officially mentioned in any spec at
this time (that I can get my hands on), so what all the OSes run off of
are educated assumptions.  :-)  But theoretically, a newer AHCI spec
could support per-port maximum speed indication.
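
For anyone who wants to poke at these fields, here is a minimal Python
sketch of the bit extraction described above.  The field positions are
the ones from the AHCI 1.30 register layout (CAP.ISS in bits 23:20,
CAP.NP in bits 4:0 and zero-based, PxSSTS.SPD in bits 7:4).  Mapping
the value 3 to "6Gbps" follows what ahci(4) prints in dmesg.  The
sample register values are made up for illustration and were not read
from real hardware.

```python
# Speed encodings shared by CAP.ISS and PxSSTS.SPD.
GEN_SPEEDS = {1: "1.5Gbps", 2: "3Gbps", 3: "6Gbps"}


def decode_cap(cap):
    """Return (port_count, max_speed) from the HBA-wide CAP register."""
    iss = (cap >> 20) & 0xF          # bits 23:20, Interface Speed Support
    ports = (cap & 0x1F) + 1         # bits 4:0, NP is zero-based
    return ports, GEN_SPEEDS.get(iss, "unknown")


def decode_pxssts_spd(pxssts):
    """Return the negotiated speed of one port from its PxSSTS register."""
    spd = (pxssts >> 4) & 0xF        # bits 7:4, negotiated interface speed
    if spd == 0:
        return "no device / no link"
    return GEN_SPEEDS.get(spd, "unknown")


# Hypothetical values: a 6-port HBA that advertises 6Gbps, with one
# port linked at 6Gbps and another linked at 3Gbps.
sample_cap = (3 << 20) | 5
print(decode_cap(sample_cap))
print(decode_pxssts_spd(0x133))
print(decode_pxssts_spd(0x123))
```

The point of the sketch is that the "maximum" is a single HBA-wide
field, while the negotiated speed has to be read port by port.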

It's not easy to phrase all this tersely in a single device_printf(),
and there has already been opposition to adding printing of more lines
to the existing drivers/in dmesg (meaning, printing 6 lines, one for
each port, indicating active speed + maximum speed, would probably be
looked down upon outside of verbose booting).  The best I can come up
with is this:

ahci0: AHCI v1.30, 6 ports, maximum 6Gbps, Port Multiplier not supported

...which is better, but could still be interpreted as "6 ports
with a maximum of 6Gbps per port".
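
A rough Python sketch of the two reporting styles discussed above: the
proposed one-line summary, plus per-port lines that would only make
sense under verbose boot.  The function names and the per-port line
format are hypothetical; this is not what ahci(4) prints today.

```python
def ahci_summary(unit, version, ports, max_speed, port_multiplier):
    """Build the proposed one-line HBA summary."""
    pm = "supported" if port_multiplier else "not supported"
    return (f"ahci{unit}: AHCI v{version}, {ports} ports, "
            f"maximum {max_speed}, Port Multiplier {pm}")


def verbose_port_lines(negotiated, max_speed):
    """Build one line per channel (hypothetical verbose-boot format)."""
    return [f"ahcich{ch}: negotiated {speed}, HBA maximum {max_speed}"
            for ch, speed in enumerate(negotiated)]


print(ahci_summary(0, "1.30", 6, "6Gbps", False))
for line in verbose_port_lines(["6Gbps", "6Gbps", "3Gbps", "3Gbps"], "6Gbps"):
    print(line)
```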

Hope this sheds light in some way or another.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: AHCI Patsburg SATA controller and slow transfer speed

2013-06-27 Thread Jeremy Chadwick

On Thu, Jun 27, 2013 at 02:21:57PM -0700, Dave Hayes wrote:
> Greetings all. I'm on FreeBSD 9.1-STABLE #0 r251391M. I'm noticing
> two of my SATA disks are at half speed. Is this normal or is there
> some configuration I'm forgetting?
> 
> # dmesg | grep -C 4 ahc
> ...
> ahci0:  port
> 0x2070-0x2077,0x2060-0x2063,0x2050-0x2057,0x2040-0x2043,0x2020-0x203f
> mem 0xd0b0-0xd0b007ff irq 21 at device 31.2 on pci0
> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
> ahcich0:  at channel 0 on ahci0
> ahcich1:  at channel 1 on ahci0
> ahcich2:  at channel 2 on ahci0
> ahcich3:  at channel 3 on ahci0
> ahcich4:  at channel 4 on ahci0
> ahcich5:  at channel 5 on ahci0
> ...
> ada0:  ATA-8 SATA 3.x device
> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1:  ATA-8 SATA 3.x device
> ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada1: Previously was known as ad6
> ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada2:  ATA-8 SATA 3.x device
> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>   ^
> ada2: Command Queueing enabled
> ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada2: Previously was known as ad8
> ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
> ada3:  ATA-8 SATA 3.x device
> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>   ^^^
> ada3: Command Queueing enabled
> ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada3: Previously was known as ad10
> # pciconf -lcvb
> ahci0@pci0:0:31:2:  class=0x010601 card=0x35ae8086
> chip=0x1d028086 rev=0x06 hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Patsburg 6-Port SATA AHCI Controller'
> class  = mass storage
> subclass   = SATA
> bar   [10] = type I/O Port, range 32, base 0x2070, size  8, enabled
> bar   [14] = type I/O Port, range 32, base 0x2060, size  4, enabled
> bar   [18] = type I/O Port, range 32, base 0x2050, size  8, enabled
> bar   [1c] = type I/O Port, range 32, base 0x2040, size  4, enabled
> bar   [20] = type I/O Port, range 32, base 0x2020, size 32, enabled
> bar   [24] = type Memory, range 32, base 0xd0b0, size 2048, enabled
> cap 05[80] = MSI supports 1 message enabled with 1 message
> cap 01[70] = powerspec 3  supports D0 D3  current D0
> cap 12[a8] = SATA Index-Data Pair
> cap 13[b0] = PCI Advanced Features: FLR TP
> 
> Thanks for any insight provided.

Intel Patsburg is otherwise known as Intel X79.  The X79
chipset/southbridge offers 6 SATA ports, 2 of which are SATA600, and the
remaining 4 are SATA300:

http://en.wikipedia.org/wiki/Intel_X79

The intention of this was to offer 2 ports for people wanting to use
SSDs (which tend to throttle themselves based on negotiated PHY speed),
and a remaining 4 ports for MHDDs or ATAPI.  You can, of course, use
whatever ports for whatever you want.

More importantly (I think): your devices are MHDDs and will never be
able to reach SATA600 (or SATA300) speeds.  Pure MHDDs which use SATA600
PHYs are somewhat of a marketing gimmick (but my gut feeling is that the
MHDD vendors are choosing to narrow the number of on-disk SATA
controllers they use).  Hybrid HDDs may benefit from faster PHYs.

Next, this statement by ahci(4) then confuses the user:

> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported

You see, when AHCI was invented, the existing idea was that all ports
would have the same speed (and that was the case at the time).  Only
somewhat recently have some vendors begun to mix-match speeds on the
same controller -- like this one.

The AHCI specification probably (I have not read it recently) only
provides a number indicating "the total number of ports" followed by a
single number indicating "the speed".

There may be support somewhere within AHCI to provide an updated way to
get more granular information, but I do not know if that's the case.

If there is, FreeBSD's ahci(4) driver does not support such at this
time (see sys/dev/ahci/ahci.c around line 502 for the device_printf()
call and what the arguments are (specifically AHCI_CAP_ISS and
AHCI_CAP_NPMASK)).

TL;DR -- Your motherboard offers 6 ports, 2 of which are SATA600, 4 of
which are SATA300, and despite the line shown above by FreeBSD not
matching reality, everything is working as designed.
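
To see the per-device negotiated speeds at a glance rather than
trusting the HBA summary line, the "transfers" lines in dmesg can be
parsed.  A minimal Python sketch follows; the regular expression is
written against the line format in the output quoted above and may
need adjusting for other releases.

```python
import re

# Matches lines such as:
#   ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
TRANSFER_RE = re.compile(
    r"^(ada\d+): (\d+(?:\.\d+)?)MB/s transfers \(SATA (\d)\.x")


def negotiated_speeds(dmesg_text):
    """Map each adaN device to (MB/s, SATA generation) seen at attach."""
    result = {}
    for line in dmesg_text.splitlines():
        m = TRANSFER_RE.match(line.strip())
        if m:
            result[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return result


sample = """\
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
"""
for dev, (mbps, gen) in sorted(negotiated_speeds(sample).items()):
    print(dev, mbps, gen)
```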

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?

2013-06-26 Thread Jeremy Chadwick

On Wed, Jun 26, 2013 at 01:23:32PM -0700, Jeremy Chadwick wrote:
> On Wed, Jun 26, 2013 at 09:42:43AM -0700, Chris H wrote:
> {snipping}

Also, hoping the OP is subscribed to -stable -- you should probably deal
with this.  This is not the first time I've seen problems with mail
delivery to a 1command.com address.

: host male.ultimateDNS.NET[209.180.214.225] said: 550
5.0.0 SPAM and BULK mail REJECTED (in reply to MAIL FROM command)

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?

2013-06-26 Thread Jeremy Chadwick

On Wed, Jun 26, 2013 at 09:42:43AM -0700, Chris H wrote:
> Greetings,
>  I haven't upgraded my tree(s) for awhile. My last attempt to rebuild after 
> an updating
> src && ports, resulted in nearly installing the entire ports tree, which is 
> why I've
> waited so long. Try as I might, I've had great difficulty finding something 
> that will
> _only_ upgrade what I already have installed, _and_ respect the "options" 
> used during the
> original make && make install, or those options expressed in make.conf.
> As portupgrade(1) && portmaster(8) appear to be the most used in this 
> scenario,
> I'm soliciting opinions on which of these works best, or if there is 
> something else to
> better manage this situation. Is there such a thing as a FreeBSD upgrade 
> "easy button"?

Use portmaster, avoid portupgrade.  And no I will not expand on my
reasoning -- I urge anyone even mentioning the word portupgrade to spend
a few hours of their day reading the horror stories on the mailing lists
over the past 10 years or so (including recently).  Choose wisely.

And before going on any sort of "update crusade", I recommend you
re-examine your make.conf methodologies for options if you haven't
already.  The OPTIONS framework has been revamped and improved many
times over, so you will find things like this on a system whose admin
keeps up with the times (compare this to older ways/methods, which may
break or stop working):

OPTIONS_UNSET+= X11 IPV6 NLS

php5_SET+=      APACHE
php5_UNSET+=    CGI
postfix_SET+=   PCRE TLS SASL2
samba36_SET+=   AIO_SUPPORT
samba36_UNSET+= LDAP CUPS ACL_SUPPORT WINBIND POPT
wget_SET+=      OPENSSL
wget_UNSET+=    IDN

When rebuilding everything, I have always resorted to this:

rsync -avH /usr/local/ /usr/local.old/
pkg_delete -a -f
rm -fr /usr/local/*
rm -fr /var/db/ports/*
rm -fr /usr/ports/distfiles/*
cd /usr/ports/whatever
make install clean
{lather rinse repeat until done}

And add some pkg_add -r's in there for large-ish things I don't want to
rebuild from source (I think folks who use X probably do this quite a
bit; I remember hearing how Open/LibreOffice takes something like 3-4
hours to build on some systems).
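
Since the wipe-and-rebuild sequence above is destructive, it can help
to generate the command list first and read it before running
anything.  A small Python sketch, which only builds strings and
executes nothing; the function name and the port directories passed in
are examples, not a recommendation.

```python
def rebuild_plan(port_dirs, backup="/usr/local.old/"):
    """Return the shell commands from the procedure above, in order.

    Nothing is executed; the list is meant to be reviewed first.
    """
    plan = [
        f"rsync -avH /usr/local/ {backup}",  # keep a copy of the old tree
        "pkg_delete -a -f",                  # remove every installed package
        "rm -fr /usr/local/*",
        "rm -fr /var/db/ports/*",
        "rm -fr /usr/ports/distfiles/*",
    ]
    for port in port_dirs:
        plan.append(f"cd /usr/ports/{port} && make install clean")
    return plan


for command in rebuild_plan(["ftp/wget", "mail/postfix"]):
    print(command)
```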

But that's just how I do things.  My advice on using portmaster,
however, still stands.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Another bug in SSH in FreeBSD 8.4 (sftp cannot create relative symlinks)

2013-06-24 Thread Jeremy Chadwick

On Tue, Jun 25, 2013 at 03:03:04AM +0200, Miroslav Lachman wrote:
> Jeremy Chadwick wrote:
> >On Mon, Jun 24, 2013 at 03:36:24PM -0700, Xin Li wrote:
> >>-BEGIN PGP SIGNED MESSAGE-
> >>Hash: SHA512
> >>
> >>On 06/24/13 15:11, Miroslav Lachman wrote:
> >>[...]
> >>>The patch seems really simple and I know how to apply it, but I am
> >>>not able to compile and install only fixed sftp command instead of
> >>>the whole userland. Can you push me to the right direction?
> >>
> >>I think you can go to /usr/src/secure/usr.bin/sftp and do:
> >>
> >>make depend
> >>make
> >>
> >>Then, as root:
> >>
> >>make install
> 
> Thank you! I didn't know I must be in /usr/src/secure/usr.bin/sftp
> 
> I tried your patch and can confirm it works for me!
> 
> >>I usually do a full world build to make sure that this doesn't break
> >>something else but this change should only affect sftp(1).
> >
> >I'm going to make this real simple:
> >
> >Is the problem with symlinks in the client (sftp(1)), in the server
> >(sftp-server(8)), or both?  The impression I get from the original post
> >that started this thread is that it's in the server part.
> 
> No, it is the problem on the client side. The server side in all
> cases is good old OpenSSH 5.4 on FreeBSD 8.3. Only the newer sftp
> client is broken and this bug is really fixed by patch provided by
> Xin Li.
> 
> We tried OpenSSH 6.2 client side from Mac OS X and it is broken too.
> The same apply to openssh-portable from ports (openssh-portable-6.2.p2_3,1)
> 
> >So, I believe he'd want to poke about in src/secure/libexec/sftp-server.
> >However, that may not be enough, due to the fact that sftp-server(8)
> >depends (links to) libssh.so.X, libcrypt.so.X, and libcrypto.so.X.  I do
> >not know where the actual broken code lies.
> >
> >Someone on -security might know exactly what all needs to be built/what
> >commands need to be run, but I will tell you this up front:
> >
> >The official security announcements for SSL or SSH-related things have
> >historically told people to build world.  I went and read the mailing
> >list archives for -security-announcements and found proof/examples of
> >this fact when issues pertain to SSL or SSH.
> >
> >My recommendation is just to build world.  Don't risk it -- this is a
> >key piece of your system, all you're trying to do is save some time.
> >Don't.  Just build/install world and don't screw around.
> 
> I understand your concern and I will rebuild world if the patch
> changes anything in the server part, but this is realy just a fix in
> sftp client command and I want to try it quickly and to have a quick
> path to go back to original version of the sftp command.
> 
> This is on testing machine anyway, I will not do this on production
> machines.

Understood -- it was my misunderstanding of the issue (being on the
client side, not server side), so Xin's advice is sound.  Sorry for the
noise on my part.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Another bug in SSH in FreeBSD 8.4 (sftp cannot create relative symlinks)

2013-06-24 Thread Jeremy Chadwick

On Mon, Jun 24, 2013 at 03:36:24PM -0700, Xin Li wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> On 06/24/13 15:11, Miroslav Lachman wrote:
> [...]
> > The patch seems really simple and I know how to apply it, but I am
> > not able to compile and install only fixed sftp command instead of
> > the whole userland. Can you push me to the right direction?
> 
> I think you can go to /usr/src/secure/usr.bin/sftp and do:
> 
> make depend
> make
> 
> Then, as root:
> 
> make install
> 
> I usually do a full world build to make sure that this doesn't break
> something else but this change should only affect sftp(1).

I'm going to make this real simple:

Is the problem with symlinks in the client (sftp(1)), in the server
(sftp-server(8)), or both?  The impression I get from the original post
that started this thread is that it's in the server part.

So, I believe he'd want to poke about in src/secure/libexec/sftp-server.
However, that may not be enough, due to the fact that sftp-server(8)
depends (links to) libssh.so.X, libcrypt.so.X, and libcrypto.so.X.  I do
not know where the actual broken code lies.

Someone on -security might know exactly what all needs to be built/what
commands need to be run, but I will tell you this up front:

The official security announcements for SSL or SSH-related things have
historically told people to build world.  I went and read the mailing
list archives for -security-announcements and found proof/examples of
this fact when issues pertain to SSL or SSH.

My recommendation is just to build world.  Don't risk it -- this is a
key piece of your system, all you're trying to do is save some time.
Don't.  Just build/install world and don't screw around.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-22 Thread Jeremy Chadwick

On Sun, Jun 23, 2013 at 02:41:27AM +0200, Willem Jan Withagen wrote:
> On 19-6-2013 17:04, Jeremy Chadwick wrote:
> >- Adam runs 9.1-RELEASE because of business needs pertaining to
> >   freebsd-update and binary updates.  (I ask more about this for
> >   benefits of readers below, however -- because this situation comes
> >   up a lot and I want to know what real-world admins do)
> 
> The bug is very specifically available in 9.1-RELEASE because I got
> bit by it before the release of 9.1. But discussed it with avg@ and
> it did not make it into the release, but was submitted only like 2
> weeks later.
> 
> So in that case you can probably stop looking.
> 
> For just about any 9.1-STABLE after that should the fix be in the code.

I'm not sure why so many people (so far) seem to think that this problem
is always the same issue -- it isn't.  There are multiple things that
have historically (and/or presently) caused this issue.

Here's the list I composed only a few days ago, and it is far from
thorough:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073863.html

My point is that the "shutdown -r" issue might manifest itself in
the same fashion for everyone, but the **root cause** often differs.
I.e.  what fixed it for you may not fix it for Adam.  We must wait and
see (he's in the process of getting a system to try stable/9 on).

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: slow bootloader on Dell R320

2013-06-22 Thread Jeremy Chadwick

On Sat, Jun 22, 2013 at 09:37:37PM +0200, Loïc BLOT wrote:
> Hi all !
> Thanks for the very good support of Dell R320 hardware, perc H310 is
> well supported, BCM5720 seems to work correctly and performances are
> great.
> The only problem i have found is very strange. The FreeBSD bootloader
> take many times to load, 30sec-2minutes to boot the kernel and show the
> bootloader menu. After that, the system boots properly, at a normal
> speed.
> Is there any issue or optimization i can do ?
> The OpenBSD bootloader doesn't have this problem.

1. What FreeBSD version exactly?  (Please don't say "9.1", we need to
know the full version, e.g. 9.1-RELEASE, or if you built your own we
need uname -a output (you can hide the machine name))

2. How many disks are in the machine?

3. Are any of the disks used for ZFS?

There have been **many** improvements to the FreeBSD bootloader with
regards to things taking a long time on boot-up in semi-recent days, but
answers to the above questions will determine that.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 11:34:39AM -0500, Matthew D. Fuller wrote:
> On Wed, Jun 19, 2013 at 09:16:35AM -0700 I heard the voice of
> Jeremy Chadwick, and lo! it spake thus:
> > 
> > The above CDB + subcommand disables APM entirely.  There is a lot
> > more to APM than just parking heads (and in all honesty, APM should
> > have nothing to do with parking heads).  Disabling APM can actually
> > have drastic effects on drive temperature (meaning there are certain
> > chip and/or motor operations that said feature controls *in
> > addition* to head parking), and other firmware-level features that
> > aren't documented.
> 
> True enough, in concept.  With all the drives sitting behind
> ventilation perfectly capable of dealing with 15kRPM drives, I don't
> worry about what that might do to the 7200's though...

Justified in your environment, but not in mine -- where most of my
systems (at home) are extremely quiet (1000-1200rpm fans, lots of noise
dampening material, etc.).  A 10C increase *during idle* is enough to
make me wary.  I also have extremely sensitive hearing, so drives
clicking is something I can hear from quite a distance -- I guess
working with them for so long over the years has made me sensitive to
'em.

> > Furthermore, that CDB does not work for all drives.  There are
> > Seagate drives -- I know because I bought some and returned them
> > when the APM trick did not work -- that lack the LCC-disable tie-in
> > to APM.  The drive either rejected the CDB (ATA status code error
> > returned), while others accepted it but nothing in 0xec (IDENTIFY)
> > reported as got changed.
> 
> Well, I haven't seen it with these.  Several of
> ada0:  ATA-8 SATA 3.x device
> and some systems with CC4C too.

The drives I was testing were STx000DM001.  I don't remember if I had a
DM002.  I also don't remember the firmware version they had on them, but
I do remember there were no updates available from Seagate at that time.
On the other hand, their forum was *filled* with post after post about
the issue, including one fellow whose drive in something like 3 months
was almost reaching MTBF head park/reload count.
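
To put numbers on that kind of wear, here is a back-of-the-envelope
Python sketch that projects when SMART attribute 193 would hit a
drive's rated load/unload count.  The rated figure of 300,000 cycles
and the sample counts are assumptions for illustration only; check the
vendor's datasheet for the real rating of a given model.

```python
def days_until_rated(current_count, power_on_days, rated_count=300_000):
    """Project the days left before the load cycle count hits its rating.

    Assumes the parking rate seen so far stays constant.
    """
    if current_count <= 0 or power_on_days <= 0:
        return float("inf")          # no parking observed, nothing to project
    remaining = max(rated_count - current_count, 0)
    # remaining / (current_count / power_on_days), rearranged
    return remaining * power_on_days / current_count


# Hypothetical drive: 250,000 cycles after 90 days of use.
print(days_until_rated(250_000, 90))
```

A well-behaved drive with a few hundred cycles per year would give a
projection of many decades, which is the contrast being drawn here.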

But my point is this: 3.5" drives do not need this feature in 95% of
environments.  In desktop systems it's worthless -- in consumer desktops
it accomplishes nothing but noise and annoyance and impacts I/O, and in
business desktop environments it serves no purpose because most
places have their desktops go into sleep mode (so drive standby/sleep
gets used).  And in the server environment it's pure 100% worthless.

With 2.5" drives I can see it being more useful, but only if the drive
is used in a laptop.  There are NASes (and now servers too!) which use
2.5" drives, and I sure as hell wouldn't want that happening there.

So really it's just a bad feature all around that should be specific to
one environment demographic; the vendors should have made a 2.5" drive
"dedicated for laptops" that had this feature enabled, while disabled on
all other drives (2.5" and 3.5").  What we got was nearly opposite.

> > I will have -- and eat -- their souls.
> 
> The problem with that is that the undigestible bits of "soul" just get
> passed right back into the ecosystem, and in a more concentrated form.
> 
> Some might suggest that's already happened, and is got us here in the
> first place  8-}

If you had what I do (moderate-to-severe IBS), you'd know that it
definitely doesn't get passed back in a more concentrated form.  First
joke I've been able to make about my health condition, yeah!  Ha!  I
kill me! -- Alf

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 10:53:46AM -0500, Matthew D. Fuller wrote:
> On Wed, Jun 19, 2013 at 08:04:14AM -0700 I heard the voice of
> Jeremy Chadwick, and lo! it spake thus:
> > 
> > 
> > Readers: if any of you have a ST[123]000DM001 drive running the CC24
> > firmware, and can confirm high head parking counts (SMART attribute
> > 193), and are willing to upgrade your drive firmware to the latest then
> > see if the LCC increments stop (or at least settle down to normal
> > levels), I'd love to hear from you.  I have been socially boycotting
> > these models of drives because of that idiotic firmware design choice
> > for quite some time now (not to mention the parking on those drives
> > is audibly loud in a normal living room), and if the F/W actually
> > inhibits the excessive parking then I have some drives to consider
> > upgrading.  :-)
> > 
> 
> I dunno about firmware, but you can smack 'em with a big hammer...
> 
> /etc/rc.local:
> for i in 0 1; do
> /sbin/camcontrol cmd ada${i} -a "EF 85 00 00 00 00 00 00 00 00 00 00"
> done
> 
> x-ref:
> http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052997.html
> 
> 
> LCC was somewhere in the upper 400's (I wanna say 480-some?) a year
> and change ago when I dropped that in.  It's 506/493 now on the two
> drives.

The above CDB + subcommand disables APM entirely.  There is a lot more
to APM than just parking heads (and in all honesty, APM should have
nothing to do with parking heads).  Disabling APM can actually have
drastic effects on drive temperature (meaning there are certain chip
and/or motor operations that said feature controls *in addition* to head
parking), and other firmware-level features that aren't documented.
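
For reference, a minimal Python sketch of how that register string is
put together.  It assumes the 12-register order documented for
"camcontrol cmd -a" (command, features, lba_low, lba_mid, lba_high,
device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
sector_count, sector_count_exp) and the usual ATA SET FEATURES
subcommands (0x85 disables APM, 0x05 enables it with the level in the
sector count register).  The helper name is hypothetical; verify
against camcontrol(8) and the ATA spec before sending anything to a
drive.

```python
ATA_SET_FEATURES = 0xEF
SUB_ENABLE_APM = 0x05    # APM level goes in the sector count register
SUB_DISABLE_APM = 0x85


def camcontrol_ata_bytes(command, features, sector_count=0):
    """Return the 12-register hex string for `camcontrol cmd -a`."""
    regs = [0] * 12
    regs[0] = command & 0xFF
    regs[1] = features & 0xFF
    regs[10] = sector_count & 0xFF   # 11th register in the assumed order
    return " ".join(f"{b:02X}" for b in regs)


# The string quoted above: SET FEATURES, disable APM.
print(camcontrol_ata_bytes(ATA_SET_FEATURES, SUB_DISABLE_APM))
# A less drastic alternative: keep APM on at level 0xFE.
print(camcontrol_ata_bytes(ATA_SET_FEATURES, SUB_ENABLE_APM, 0xFE))
```

Setting a high APM level rather than disabling APM outright is only an
option on drives that honour the level; as noted, not all of them do.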

Furthermore, that CDB does not work for all drives.  There are Seagate
drives -- I know because I bought some and returned them when the APM
trick did not work -- that lack the LCC-disable tie-in to APM.  The
drives either rejected the CDB (ATA status code error returned), or
accepted it but nothing reported via 0xec (IDENTIFY) actually
changed.

The only model of drive I know that reliably works with this method is
the WD Green/-GP drive, and the drive temperatures do increase.  No idea
on the Blues.  (Another reason I recommend the Reds...)

What *should* have happened is that a new 0xef subcommand should have
been created for this.  Subs range from 0x00-0xff.  T13 spec shows
that a huge number of them (I'd say 30% or more) are marked "Reserved"
and an additional 30% or so are marked "Obsolete".  And finally,
0x56-0x5c, 0xd6-0xdc and 0xe0 are "Vendor Specific".
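
Taking the vendor-specific ranges quoted above at face value, a short
Python sketch shows how little of the 0xef subcommand space that
covers.  The ranges are copied from this mail, not re-checked against
the T13 documents.

```python
# Vendor-specific SET FEATURES subcommand ranges, inclusive.
VENDOR_SPECIFIC = [(0x56, 0x5C), (0xD6, 0xDC), (0xE0, 0xE0)]


def is_vendor_specific(sub):
    """Return True if subcommand `sub` falls in a vendor-specific range."""
    if not 0x00 <= sub <= 0xFF:
        raise ValueError("subcommand must fit in one byte")
    return any(lo <= sub <= hi for lo, hi in VENDOR_SPECIFIC)


count = sum(is_vendor_specific(s) for s in range(0x100))
print(count, "of 256 subcommands are vendor specific")
print(is_vendor_specific(0x85))   # the APM-disable subcommand
```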

But looking at this from a more general view, the real issue is that
these types of features should not have been introduced to begin with.
The vendors introduced this problem, and now are marketing drives with
said feature disabled, claiming "we fixed the problem that annoys so
many of you!" -- the same problem **they introduced without asking
anyone**.

I will have -- and eat -- their souls.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 05:02:20PM +0200, Dennis Kögel wrote:
> Am 19.06.2013 um 16:47 schrieb Steven Hartland:
> > I'm not familar with that model of the areca but have you tried
> > with the standard OS driver or does it not support that card?
> 
> The ARC1320 (non-raid) unfortunately isn't supported by the in-tree driver.

Which model of the ARC1320 are you using (there are 2)?  I'm having
trouble understanding their chart too:

http://www.areca.us/products/sasnoneraid6g.htm

The controllers claim to support up to 128 disks via break-out cables,
but I'm not sure that is right.

You aren't using any port multipliers, are you?

> > Also when you see hangs can you access the disk directly or not
> > e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
> 
> Interesting idea. The dd then hangs right until everything else resumes as 
> well.
> 
> ^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 
> 1632k

Is this **while** you have immense amounts of ZFS write I/O going to
those drives (your zpool iostat was showing ~250-300MB/sec to the pool)?

It's very important to note that the stats you showed were during
writes.

What we're trying to figure out here is where the blocking (waiting) is
happening:

a) the ZFS layer
b) the storage driver layer ('arcsas', the 3rd-party unofficial driver)
c) the CAM layer
d) the GEOM layer
e) something with the disk(s)
f) something with memory I/O going on (say between the storage driver
   and ZFS, for lack of better way to phrase it)
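
One way to narrow that list down is to time reads that bypass ZFS, in
the spirit of the dd test quoted above: if a raw read stalls at the
same moment the pool does, layers (b) through (e) become more
interesting than (a).  A minimal Python sketch, assuming a readable
device or file path; the demo below uses a scratch file so it can be
run without root, and the function name is hypothetical.

```python
import os
import tempfile
import time


def timed_read(path, block_size=1024 * 1024, count=10):
    """Read up to `count` blocks; return (bytes_read, slowest_block_seconds)."""
    total = 0
    slowest = 0.0
    with open(path, "rb", buffering=0) as f:
        for _ in range(count):
            start = time.monotonic()
            data = f.read(block_size)
            slowest = max(slowest, time.monotonic() - start)
            if not data:             # end of file or device
                break
            total += len(data)
    return total, slowest


# Demo against a 2 MiB scratch file instead of a real disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * 1024 * 1024))
try:
    nbytes, slowest = timed_read(tmp.name)
    print(nbytes, f"{slowest:.6f}s")
finally:
    os.unlink(tmp.name)
```

Pointing it at something like /dev/da0 during a hang would show a
slowest-block time of several seconds if the stall sits below ZFS.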

I have a very big Email written for you, but I wanted to let certain
answers to Ronald's questions come out first.

-rw---1 jdc   users 5576 Jun 19 06:49 dennis_kgel_response.txt

I need to re-word this and take into consideration some of the new stuff
said up to now, but I don't know if I'll have the time for this (you
should see my desktop right now, I have literally 4 IM messages to
answer and my Email box is non-stop).

The one I want to get out of the way right now is this:

Can you please try putting this in /boot/loader.conf + reboot and
see if the behaviour for you changes?

vfs.zfs.no_write_throttle="1"

Warning: this may actually make the problem worse, depending on
what the nature/root cause is.  Right now I'm of the opinion ZFS is
actually doing the Right Thing(tm) and that the issue may be in Areca's
driver, but that's hearsay until I have proof.  But the write throttling
stuff added semi-recently (by the Illumos folks, this is not a FreeBSD
feature) has had some reports of problems where disabling it helped
immensely.
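
Before rebooting, it is worth confirming the tunable really is in
/boot/loader.conf and spelled correctly.  A minimal Python sketch of
such a check; it handles only simple name="value" lines and full-line
or trailing "#" comments, which is an assumption about how the file is
written, and the function name is hypothetical.

```python
def loader_tunables(text):
    """Parse simple name="value" lines from loader.conf-style text."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip().strip('"')
    return tunables


sample = '''\
# ZFS write throttle test
zfs_load="YES"
vfs.zfs.no_write_throttle="1"
'''
settings = loader_tunables(sample)
print(settings.get("vfs.zfs.no_write_throttle"))
```

On a live system the same name can be queried with sysctl after boot
to confirm the value took effect.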

Important: 24 disks off a single controller is a lot of bandwidth.
That controller may be overwhelmed, in which case you would see
exactly this kind of behaviour as the controller is screaming "GOD HELP
ME, I'M TRYING TO DO ALL THIS STUFF AND YOU KEEP THROWING I/O AT ME".
:-)  This is also why I ask about port multiplier usage.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote:
> On 6/19/2013 20:35, Jeremy Chadwick wrote:

I've snipped out portions which aren't relevant at this point in the
convo.  I'm trying to be as terse as possible here (honest).

To recap for readers/mailing list:

- Adam sees the same behaviour on bare-metal systems as well as on
  FreeBSD guests running under the VMware ESXi 5.0 hypervisor.  However,
  as I stated on the list just yesterday about "lock-ups on shutdown",
  every situation may be different and there is a well-established
  history of this problem on FreeBSD where the root causes (bugs)
  were completely different from one another.

- The system we're discussing at this point in the thread is on
  bare metal -- specifically an Asus P8B-X motherboard, with BIOS
  version 6103, driven entirely by on-board Intel AHCI (not BIOS-level
  RAID).

- Adam runs 9.1-RELEASE because of business needs pertaining to
  freebsd-update and binary updates.  (I ask more about this below, for
  the benefit of readers -- this situation comes up a lot and I want to
  know what real-world admins do.)

> >Thanks.  I was mainly interested in the storage controller being used
> >(in this case ahci(4)) and the disks being used (notorious ST3000DM001,
> >known for excessively parking heads).
> 
> Yeah, was not my first choice but then again ... RAIDZ-2 :)  HD
> supply chain here (Thailand) is weird considering how many are made
> here (and can't buy).  Smartd screams about them possibly needing a
> firmware update (they don't according to Seagate).   Had no issues
> aside from a failure a month or so ago (it's an HD ... it
> happens).

Absolutely understood -- and FYI, in case you need backup, your thought
process/conclusion here is spot on (re: "it's a MHDD, failures happen").

Irrelevant to your shutdown problem: as for smartmontools bitching about
the firmware: no vendors disclose what actual changes go into their
drive firmware updates (vendors if you are reading this: I will have
your souls...), so I have to read a bunch of end-user forums where
nobody knows what they're talking about, and then of course find this
"highly educational" *cough* article from Adaptec:

http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives

The problem here is that there have been *so many* firmware bugs with
Seagate's drives in the past 2 years or so that it's impossible for me
to know which fixes what.  You buy what you buy because that's what you
buy, and that's cool -- but I avoid their stuff like the plague.

Readers: if any of you have a ST[123]000DM001 drive running the CC24
firmware, and can confirm high head parking counts (SMART attribute
193), and are willing to upgrade your drive firmware to the latest then
see if the LCC increments stop (or at least settle down to normal
levels), I'd love to hear from you.  I have been socially boycotting
these models of drives because of that idiotic firmware design choice
for quite some time now (not to mention the parking on those drives
is audibly loud in a normal living room), and if the F/W actually
inhibits the excessive parking then I have some drives to consider
upgrading.  :-)
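
For those willing to check, here is a rough sketch of how the load cycle
count can be pulled out of smartmontools output.  It assumes the usual
"smartctl -A" attribute table, where the raw value is the tenth column;
the function name and /dev/ada0 are only examples.

```shell
#!/bin/sh
# Print the raw value of SMART attribute 193 (Load_Cycle_Count).
# Reads "smartctl -A" output on stdin; assumes the standard table layout:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
load_cycle_count() {
    awk '$1 == 193 { print $10 }'
}

# Usage (device name is an example):
#   smartctl -A /dev/ada0 | load_cycle_count
```

Take one reading, wait an hour or so with the box mostly idle, and take
another.  A drive that is parking excessively will show the number
climbing steadily between the two.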

> >I can also see you're running your own kernel.  We'll get to that in a
> >moment.
> 
> It's GENERIC with the following added to the end:
> 
> # -- Add Support for nicer console
> #
> options VESA
> options SC_PIXEL_MODE

Can you try removing VESA and SC_PIXEL_MODE please?  I know that
sounds crazy ("what on earth would that have to do with it?"), but
please try it.  I can explain the justification if need be -- I'm being
extra paranoid of something that got discovered here on -stable only a
few days ago.  It's a stretch, but I can see potential relevance.  I can
provide details/links later.

> >>>4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
> >>
> >>Weirdly this allowed it to reboot on the first try (without needing
> >>to be reset), but not the second.
> >
> >I'm not surprised.  Please re-try with stable/9; Hans has been constantly
> >working on the USB stack and fixing major bugs.
> 
> Got it but probably not going to go this route as it means no more
> binary upgrades.  While I can reboot it, it is the office NAS here
> and so 'testing out' -STABLE I think probably isn't going to happen.

I understand.  I have a question relating to this below.

> >Place background_fsck="no" in /etc/rc.conf.  If the machine does not
> >have a clean filesystem on boot-up, you'll know because the system will
> >immediately begin fsck (in the foreground actively).  You'll recog

Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 07:53:19PM +0700, Adam Strohl wrote:
> On 6/19/2013 19:21, Jeremy Chadwick wrote:
> >On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
> >>Hello -STABLE@,
> >>
> >>So I've seen this situation seemingly randomly on a number of both
> >>physical 9.1 boxes as well as VMs for I would say 6-9 months at
> >>least.  I finally have a physical box here that reproduces it
> >>consistently that I can reboot easily (ie; not a production/client
> >>server).
> >>
> >>No matter what I do:
> >>
> >>reboot
> >>shutdown -p
> >>shutdown -r
> >>
> >>This specific server will stop at "All buffers synced" and not
> >>actually power down or reboot.  KB input seems to be ignored.  This
> >>server is a ZFS NAS (with GMIRROR for boot blocks) but the other
> >>boxes which show this are using GMIRRORs for root/swap/boot (no
> >>ZFS).
> >>
> >>Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
> >>
> >>When I reset the server it appears that disks were not dismounted
> >>cleanly ... on this ZFS box it comes back quick because ZFS is good
> >>like that but on the other servers with GMIRROR roots rebuilding the
> >>GMIRROR and fscking at the same time is murder on the
> >>disk/performance until it finishes.
> >
> >1. You mention "as well as VMs".  Anything under a "virtual machine" or
> >under a hypervisor is going to be very, very, **VERY** different than
> >bare metal.  So I hope the issues you're talking about above are on bare
> >metal -- I will assume so.
> 
> Nope, I see basically the same thing sometimes under ESXi 5.0
> Hypervisor (and yes it worries me the implications of something so
> broad).  Those units I just haven't been able to isolate on a
> server which isn't critical.  Lets focus on this server for now
> though per your suggestion below.

I'm sorry but I don't understand your first sentence -- the first part
of your sentence says "nope" (I have to assume in reply to my "on bare
metal" part), but then says "I see basically the same thing sometimes
under ESXi" which implies an alternate environment in comparison (i.e.
we *are* talking about bare metal).  Consider me confused.  :-)

> >2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
> >If you use stable/9 (RELENG_9) we need to see uname -a output (you can
> >hide the machine name if you want).
> 
> Sorry, this ZFS box is 9.1-R P4 (kernel built today):
> 
> FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun
> 19 15:31:12 ICT 2013
> root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS  amd64

I suggest trying stable/9 (and staying with it, for that matter).

> >3. Can we please have dmesg from this machine?  The controller and some
> >other hardware details matter.
> 
> Sure take a look at the full log here: http://pastebin.com/k55gVVuU
> 
> This includes a boot, then a reboot as I describe (you can see it
> logs the All Buffers Synced, etc) then powering back on.

Thanks.  I was mainly interested in the storage controller being used
(in this case ahci(4)) and the disks being used (notorious ST3000DM001,
known for excessively parking heads).  AFAIK this isn't one of the
controllers that was known for weird "quirky issues" pertaining to
flushing data to disk on shutdown.

I have to ask: is this FreeBSD box running under a HV?

If it *is not* running under a HV, could we please get exact motherboard
model and version (including BIOS version)?  Sometimes (not always) you
can get this from "kenv | grep smbios."
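
If the full "kenv | grep smbios" output is too noisy, a filter along
these lines narrows it down to the board and BIOS fields.  This is only
a sketch: the function name is made up, and it assumes the firmware
actually populates the smbios.planar.* and smbios.bios.* variables
(not all do).

```shell
#!/bin/sh
# Keep only the motherboard and BIOS identification lines from kenv output.
# Reads kenv-style NAME="value" lines on stdin.
board_info() {
    grep -E '^smbios\.(planar\.(maker|product|version)|bios\.(vendor|version|reldate))='
}

# Usage:
#   kenv | board_info
```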

I can also see you're running your own kernel.  We'll get to that in a
moment.

> >4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
> 
> Weirdly this allowed it to reboot on the first try (without needing
> to be reset), but not the second.

I'm not surprised.  Please re-try with stable/9; Hans has been constantly
working on the USB stack and fixing major bugs.

> The "Starting background file
> system checks in 60 seconds" message appeared ... that only happens
> when something is dirty, right?

No it does not.  That message is always printed when you use background
fsck, which is the default.
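
To see which setting is actually in effect on a given box, something
like the following can be used.  It is a sketch only: rc.conf is itself
a shell fragment, so it is sourced in a subshell here.  The function
name is made up, the "YES" default is what the stock
/etc/defaults/rc.conf ships as far as I know, and rc.conf.local and
friends are ignored.

```shell
#!/bin/sh
# Print the background_fsck value an rc.conf-style file would yield.
# Runs in a subshell so sourcing the file cannot touch the caller's variables.
background_fsck_setting() {
    conf=$1
    (
        background_fsck="YES"    # assumed system default
        if [ -r "$conf" ]; then
            . "$conf"
        fi
        printf '%s\n' "$background_fsck"
    )
}

# Usage:
#   background_fsck_setting /etc/rc.conf
```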

I do not advocate using background fsck, because it has been known (and
may still do this -- I do not care to find out, I do not have time for
unreliable filesystem nonsense) to not always fix all filesystem
problems.  Meaning: people using background fsck have been known to boot
into single-user and issue "fsck" manually and find iss

Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
> Hello -STABLE@,
> 
> So I've seen this situation seemingly randomly on a number of both
> physical 9.1 boxes as well as VMs for I would say 6-9 months at
> least.  I finally have a physical box here that reproduces it
> consistently that I can reboot easily (ie; not a production/client
> server).
> 
> No matter what I do:
> 
> reboot
> shutdown -p
> shutdown -r
> 
> This specific server will stop at "All buffers synced" and not
> actually power down or reboot.  KB input seems to be ignored.  This
> server is a ZFS NAS (with GMIRROR for boot blocks) but the other
> boxes which show this are using GMIRRORs for root/swap/boot (no
> ZFS).
> 
> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
> 
> When I reset the server it appears that disks were not dismounted
> cleanly ... on this ZFS box it comes back quick because ZFS is good
> like that but on the other servers with GMIRROR roots rebuilding the
> GMIRROR and fscking at the same time is murder on the
> disk/performance until it finishes.

1. You mention "as well as VMs".  Anything under a "virtual machine" or
under a hypervisor is going to be very, very, **VERY** different than
bare metal.  So I hope the issues you're talking about above are on bare
metal -- I will assume so.

2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
If you use stable/9 (RELENG_9) we need to see uname -a output (you can
hide the machine name if you want).

3. Can we please have dmesg from this machine?  The controller and some
other hardware details matter.

4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?

5. Does "sysctl hw.acpi.handle_reboot=1" help you?

6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?

7. If none of the above helps, can you please boot verbose mode and then
when the system "locks up" on "shutdown -r now" take a picture of the
VGA console?

8. Does the machine run moused(8) (check the process list please, do not
rely on rc.conf) ?
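
Items 4 through 6 and 8 can be checked in one go with a small script.
This is a sketch, not a polished tool: the function names are made up,
and the SYSCTL variable exists only so the command can be swapped out
when trying the script somewhere other than FreeBSD.

```shell
#!/bin/sh
# Command used to read sysctls; overridable for experimentation.
SYSCTL=${SYSCTL:-sysctl}

# Print the current value of each shutdown-related knob from the list above,
# or a note when the running kernel does not provide it.
show_shutdown_knobs() {
    for name in hw.usb.no_shutdown_wait hw.acpi.handle_reboot \
                hw.acpi.disable_on_reboot; do
        if value=$($SYSCTL -n "$name" 2>/dev/null); then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s=(not available)\n' "$name"
        fi
    done
}

# Succeed if a process named exactly "moused" is in the process list.
moused_running() {
    ps -ax -o comm= 2>/dev/null | grep -qx moused
}

# Usage:
#   show_shutdown_knobs
#   moused_running && echo "moused is running"
```

Set one knob at a time (e.g. "sysctl hw.acpi.handle_reboot=1"), retry
the shutdown, and note which one, if any, changes the behaviour.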

> Another interesting thing is that this particular server runs slapd
> (OpenLDAP) which, when it comes back up, has a "corrupted" DB
> (easily fixed with db_recover, but still).  This might be because FS
> commits aren't happening at the end.   I can even manually stop
> slapd (service slapd stop) then run sync(8) (I assume this does
> something for ZFS too) and it still comes back as hosed if I reboot
> shortly after.  If I start/stop slapd it's fine.  So I feel like
> there is an FS/dismount thing going on here.

sync(8) does not do what you think it does.  Please read (not skim) this
entire thread starting here:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

Your problem is related to unclean shutdown; fix that and your issues go
away.

> Additional information: I also have some boxes which will reboot
> (ie; they don't freeze like some do at the end) but they don't
> dismount cleanly either and have to rebuild both GMIRROR and fsck.
> This might be a different issue, too.

Every issue needs to be handled/treated separately.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG

2013-06-18 Thread Jeremy Chadwick

On Wed, Jun 19, 2013 at 01:41:10AM +0430, Javad Kouhi wrote:
> I've read the posts again. Although the issue looks the same as Michiel
> Boland's (first link), I'm not sure if the root of the issue is the same
> as Michiel's too (second link). Anyway, it should be discussed in
> another thread as you said.

Let me be more clear:

I have seen repeated reports from people complaining about "lockups when
shutting down" many times over the years.  The ones I remember:

- Certain oddities with SCSI/SATA storage drivers and disks (many of
  these have been fixed)
- ACPI-based reboot not working correctly on some motherboards
  (depends on hw.acpi.handle_reboot and sometimes
  hw.acpi.disable_on_reboot) -- not sure if this still pops up
- USB layer causing issues, or possibly some USB CAM integration
  problem (this is still an ongoing one)
- Now some sort of weird Intel graphics driver (and DRM?) quirk
  involving moused(8) and Vsync (the issue reported by Michiel)

And I'm certain I'm forgetting others.

What Kevin Oberman said also applies -- these are painful to debug
because the system is already in a "shutting down" state where usability
and accessibility are minimal, and you're kind of at your wits' end.

Booting verbose can help -- there are other messages printed to the VGA
(and/or serial) console during the shutdown phase when verbose.

All you can hope for is that the kernel is still alive and Ctrl-Alt-Esc
to force a drop to DDB (assuming all of this is enabled in your kernel)
works and that someone familiar with the FreeBSD kernel can help you
debug it (possibly it's just easier to do that, type "panic", then
issue "call doadump" to force a dump to swap at that point -- kib@
might have better recommendations).
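
Whether DDB is even available can be checked ahead of time, while the
system is still healthy.  A sketch, assuming the running kernel exposes
the debug.kdb.available sysctl (it lists the compiled-in debugger
backends, space-separated); the function name is made up and SYSCTL is
overridable purely for experimentation.

```shell
#!/bin/sh
# Command used to read sysctls; overridable for experimentation.
SYSCTL=${SYSCTL:-sysctl}

# Succeed if "ddb" is among the kernel debugger backends reported by
# debug.kdb.available; fail if it is missing or the sysctl does not exist.
ddb_available() {
    backends=$($SYSCTL -n debug.kdb.available 2>/dev/null) || return 1
    case " $backends " in
        *" ddb "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   ddb_available && echo "DDB is compiled in" || echo "no DDB in this kernel"
```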

Serial console can also greatly help, because quite often there are
pages upon pages of debugging information that are useful, otherwise you
have to hope the VGA console keyboard is functional (even more tricky
with USB) and that Scroll Lock + Page Up/Down function along with taking
photos of the screen; doing it this way is stressful and painful for
everyone involved.

I hope this sheds some light on why I said what I did.  :-)

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG

2013-06-18 Thread Jeremy Chadwick

On Tue, Jun 18, 2013 at 10:37:10PM +0430, Javad Kouhi wrote:
> On Tue, Jun 18, 2013 at 7:17 PM, Jeremy Chadwick  wrote:
> >
> > I do not use git, I use svn, so I cannot help you with git "crap".
> >
> > Please revert your sys/dev/drm2/i915/intel_fb.c and
> > sys/dev/syscons/scvgarndr.c back to r251934 (or newer) before following
> > what I tell you below.
> >
> > The problem is either that:
> >
> > - The patch you were given is probably for a different FreeBSD release,
> >   thus the code/line numbers/info in the code break the fuzzy logic
> >   matching,
> > - You copy-pasted the diff and because of tabs vs. spaces botched it,
> > - git apply/patch/whatever is weird,
> > - Multitudes of other possibilities I do not care to go into.
> >
> > The hack kib@ gave you is not hard to manually add yourself.  It's very
> > few lines of code.  I'm very surprised you didn't try to manually add it
> > yourself.  So I have done that for you.  First, the proof -- this is
> > against r251939, by the way, but that shouldn't matter as nobody has
> > touched this between r251934 and r251939:
> >
> > $ svn info
> > Path: .
> > Working Copy Root Path: /home/jdc/work/src
> > URL: svn://svn.freebsd.org/base/stable/9
> > Repository Root: svn://svn.freebsd.org/base
> > Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> > Revision: 251939
> > Node Kind: directory
> > Schedule: normal
> > Last Changed Author: marius
> > Last Changed Rev: 251939
> > Last Changed Date: 2013-06-18 07:20:14 -0700 (Tue, 18 Jun 2013)
> >
> > $ svn status
> > M   sys/dev/drm2/i915/intel_fb.c
> > M   sys/dev/syscons/scvgarndr.c
> >
> > The diff itself is available here:
> >
> > http://jdc.koitsu.org/freebsd/sysmouse_vsync.diff
> >
> > I've also attached it here in Email (assuming the mailing list doesn't
> > delete it).
> >
> > You should apply the patch using:
> >
> >   cd /usr/src  (or wherever your source is)
> >   patch -p0 < sysmouse_vsync.diff
> >
> > Assuming use of svn, you can revert this patch by doing:
> >
> >   cd /usr/src  (or wherever your source is)
> >   svn revert sys/dev/drm2/i915/intel_fb.c
> >   svn revert sys/dev/syscons/scvgarndr.c
> >   rm sys/dev/drm2/i915/intel_fb.c.orig
> >   rm sys/dev/syscons/scvgarndr.c.orig
> >
> > There is probably some other "magical" way to do all of this, but as
> > anyone here knows, I do things manually because in general I do not
> > trust VCSes or the "magic" they do under the hood; I prefer to do things
> > that I know work.
> >
> > Good luck -- I cannot help with any other aspect to the issue.
> >
> > --
> > | Jeremy Chadwick   j...@koitsu.org |
> > | UNIX Systems Administrator   http://jdc.koitsu.org/ |
> > | Making life hard for others since 1977. PGP 4BD6C0CB |
> >
> 
> Many thanks for the detailed answer. I've applied your patch and then
> rebuilt the world and kernel. To be honest, I tried to apply the patch
> manually but the syntax was too complex for me. Thanks for the help to
> apply the patch.
> 
> Unfortunately, the original issue still exists and shutdown(8)
> doesn't work properly. I'm a newbie and I don't know what information
> I should provide, but here is some basic information:
> 
> % uname -a
> FreeBSD minootux 9.1-STABLE FreeBSD 9.1-STABLE #0 r251946M: Tue Jun 18
> 21:16:56 IRDT 2013 root@minootux:/usr/obj/usr/src/sys/GIGABYTE
> amd64
> 
> % pkg_info -I -x xorg-server -x drm
> libdrm-2.4.44   Userspace interface to kernel Direct Rendering Module servi
> xorg-server-1.12.4,1 X.Org X server and related programs
> 
> The machine is a laptop and the following link contains the details
> about the hardware:
> http://www.gigabyte.com/products/product-page.aspx?pid=3793#sp
> 
> KMS and NEW_XORG are enabled in my /etc/make.conf.

First, what makes you think your issue is the same issue as reported by
Michiel Boland?  Let me point you to two of his posts (read them slowly
and in full please):

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073821.html

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073839.html

Second, the patch is not mine -- it's Konstantin's.  I did not write the
code/fix, nor do I understand it.  All I did was provide a version of
the same patch that applied cleanly on recent stable/9.  (I'm sorry for
needing to state this, but c

Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG

2013-06-18 Thread Jeremy Chadwick

On Tue, Jun 18, 2013 at 07:00:30PM +0430, Javad Kouhi wrote:
> Thanks for the reply, seems that our source trees are not same, I got this:
> 
> % patch -p1 < /path/to/patch
> Hmm...  Looks like a unified diff to me...
> The text leading up to this was:
> --
> |diff --git a/sys/dev/drm2/i915/intel_fb.c b/sys/dev/drm2/i915/intel_fb.c
> |index 3cb3b78..e41a49f 100644
> |--- a/sys/dev/drm2/i915/intel_fb.c
> |+++ b/sys/dev/drm2/i915/intel_fb.c
> --
> Patching file sys/dev/drm2/i915/intel_fb.c using Plan A...
> Hunk #1 succeeded at 207 with fuzz 1.
> Hunk #2 failed at 231.
> 1 out of 2 hunks failed--saving rejects to sys/dev/drm2/i915/intel_fb.c.rej
> Hmm...  The next patch looks like a unified diff to me...
> The text leading up to this was:
> --
> |diff --git a/sys/dev/syscons/scvgarndr.c b/sys/dev/syscons/scvgarndr.c
> |index 6e6663c..fc7f02f 100644
> |--- a/sys/dev/syscons/scvgarndr.c
> |+++ b/sys/dev/syscons/scvgarndr.c
> --
> Patching file sys/dev/syscons/scvgarndr.c using Plan A...
> Hunk #1 succeeded at 395.
> Hunk #2 failed at 447.
> 1 out of 2 hunks failed--saving rejects to sys/dev/syscons/scvgarndr.c.rej
> done
> 
> 
> And the git way:
> 
> % git apply /path/to/patch
> error: patch failed: sys/dev/drm2/i915/intel_fb.c:207
> error: sys/dev/drm2/i915/intel_fb.c: patch does not apply
> error: patch failed: sys/dev/syscons/scvgarndr.c:445
> error: sys/dev/syscons/scvgarndr.c: patch does not apply
> 
> 
> I have revision 251934 of -STABLE branch. (I updated my source tree
> about 3 hours ago using svn)

I do not use git, I use svn, so I cannot help you with git "crap".

Please revert your sys/dev/drm2/i915/intel_fb.c and
sys/dev/syscons/scvgarndr.c back to r251934 (or newer) before following
what I tell you below.

The problem is either that:

- The patch you were given is probably for a different FreeBSD release,
  thus the code/line numbers/info in the code break the fuzzy logic
  matching,
- You copy-pasted the diff and because of tabs vs. spaces botched it,
- git apply/patch/whatever is weird,
- Multitudes of other possibilities I do not care to go into.

The hack kib@ gave you is not hard to manually add yourself.  It's very
few lines of code.  I'm very surprised you didn't try to manually add it
yourself.  So I have done that for you.  First, the proof -- this is
against r251939, by the way, but that shouldn't matter as nobody has
touched this between r251934 and r251939:

$ svn info
Path: .
Working Copy Root Path: /home/jdc/work/src
URL: svn://svn.freebsd.org/base/stable/9
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 251939
Node Kind: directory
Schedule: normal
Last Changed Author: marius
Last Changed Rev: 251939
Last Changed Date: 2013-06-18 07:20:14 -0700 (Tue, 18 Jun 2013)

$ svn status
M   sys/dev/drm2/i915/intel_fb.c
M   sys/dev/syscons/scvgarndr.c

The diff itself is available here:

http://jdc.koitsu.org/freebsd/sysmouse_vsync.diff

I've also attached it here in Email (assuming the mailing list doesn't
delete it).

You should apply the patch using:

  cd /usr/src  (or wherever your source is)
  patch -p0 < sysmouse_vsync.diff

Assuming use of svn, you can revert this patch by doing:

  cd /usr/src  (or wherever your source is)
  svn revert sys/dev/drm2/i915/intel_fb.c
  svn revert sys/dev/syscons/scvgarndr.c
  rm sys/dev/drm2/i915/intel_fb.c.orig
  rm sys/dev/syscons/scvgarndr.c.orig

There is probably some other "magical" way to do all of this, but as
anyone here knows, I do things manually because in general I do not
trust VCSes or the "magic" they do under the hood; I prefer to do things
that I know work.
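
A side note on -p0 versus -p1, since both show up in this thread: the
number tells patch how many leading path components to strip from the
file names in the diff.  An svn diff names files relative to the tree
root ("sys/dev/..."), hence -p0; a git diff prefixes them with "a/" and
"b/", hence -p1.  The sketch below imitates that stripping in POSIX sh
purely as an illustration.  The function name is made up, and it ignores
details of the real patch(1) such as collapsing repeated slashes.

```shell
#!/bin/sh
# Strip N leading path components from PATHNAME, roughly the way
# patch -pN treats the file names found in a diff.
strip_components() {
    n=$1
    path=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # drop everything up to the first slash
            *) break ;;               # nothing left to strip
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}

# Usage:
#   strip_components 1 a/sys/dev/drm2/i915/intel_fb.c   # git-style name
#   strip_components 0 sys/dev/syscons/scvgarndr.c      # svn-style name
```

Using -p1 on an svn-style diff would strip the "sys/" component and
patch would then fail to find the files.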

Good luck -- I cannot help with any other aspect to the issue.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Index: sys/dev/drm2/i915/intel_fb.c
===================================================================
--- sys/dev/drm2/i915/intel_fb.c	(revision 251939)
+++ sys/dev/drm2/i915/intel_fb.c	(working copy)
@@ -207,6 +207,8 @@ static void intel_fbdev_destroy(struct drm_device
 	}
 }

+extern int sc_txtmouse_no_retrace_wait;
+
 int intel_fbdev_init(struct drm_device *dev)
 {
 	struct intel_fbdev *ifbdev;
@@ -229,6 +231,7 @@ int intel_fbdev_init(struct drm_device *dev)

 	drm_fb_helper_single_add_all_connectors(&ifbdev->helper);
 	drm_fb_helper_initial_config(&ifbdev->helper, 32);
+	sc_txtmouse_no_retrace_wait = 1;
 	return 0;
 }

Index: sys/dev/syscons/scvgarndr.c
===================================================================
--- sys/dev/

Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG

2013-06-16 Thread Jeremy Chadwick

On Sun, Jun 16, 2013 at 06:01:49PM +0200, Michiel Boland wrote:
> On 06/16/2013 17:55, Jeremy Chadwick wrote:
> [...]
> 
> >Are you running moused(8)?  Actually, I can see quite clearly that you
> >are in your core.txt:
> >
> >Starting ums0 moused.
> >
> >Try turning that off.  Don't ask me how, because devd(8) / devd.conf(5)
> >might be involved.
> >
> 
> The moused is started by devd - I don't see a quick way of turning that off.

Comment out the relevant crap in devd.conf(5).  Search for "ums"
and comment out the two "notify" sections.

> As a workaround I'm trying to run a kernel with
> 
>  options SC_NO_SYSMOUSE
> 
> to see if the hangs go away.

That's one way to do it, I guess.

Be aware that I do not use X; however, I have repeatedly seen mentioned
on these lists problems/complexities where people rely on moused(8)
to "drive their mouse" while inside of X (or possibly where X and
moused(8) are both simultaneously polling the mouse).  There's
apparently a very specific kind of X configuration you're supposed to
use to get proper mouse/keyboard/HAL/HID/whatever support, and tons of
people have it wrong.  Warren Block, I think, has some insights into
this, or could maybe help shed some light on what I'm remembering.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG

2013-06-16 Thread Jeremy Chadwick

On Sun, Jun 16, 2013 at 05:48:52PM +0200, Michiel Boland wrote:
> On 06/16/2013 17:37, Konstantin Belousov wrote:
> >On Sun, Jun 16, 2013 at 05:11:15PM +0200, Michiel Boland wrote:
> >>Hi. Recently I switched to WITH_NEW_XORG, primarily because the stock X
> >>server with Intel driver has some issues that make it unusable for me.
> >>
> >>The new X server and Intel driver work extremely well, so kudos to
> >>whoever made this possible.
> >>
> >>Unfortunately, I am now experiencing random hangs on shutdown. On
> >>shutdown the system randomly freezes after
> >>
> >>[...] syslogd: exiting on signal 15
> >>
> >>I would then expect to see "Waiting (max 60 seconds) for system process
> >>'XXX' to stop" messages, but these never arrive.
> >>
> >>I panicked the machine in ddb, so I have a crash dump if someone wants
> >>to look at it. The crashinfo is at http://barrytown.boland.org/core.txt
> >>(I would have pasted it here but it is a bit verbose.)
> >>
> >>Machine has an Intel G41 chipset, with a SAMSUNG SSD 830 Series HD, running
> >>9.1-STABLE r251803. Serial console. GENERIC kernel, except for options
> >>DDB and ALT_BREAK_TO_DEBUGGER.
> >>
> >>Who knows what's going on here?
> >
> >I do not see anything related to i915 in the core.txt you provided.
> >
> >Next time the machine hangs, start with the output of ps command from
> >ddb and 'show allpcpu', together with 'alltrace'.
> >
> 
> Ok.
> 
> I appended 'thread apply all bt' from kgdb to the core.txt, maybe
> there is something interesting in there.
> 
> I did notice the following
> 
> Thread 17 (Thread 17):
> #0  cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1392
> #1  0x80cbebbd in ipi_nmi_handler () at
> /usr/src/sys/amd64/amd64/mp_machdep.c:1374
> #2  0x80ccc159 in trap (frame=0x81424890) at
> /usr/src/sys/amd64/amd64/trap.c:211
> #3  0x80cb55af in nmi_calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:501
> #4  0x80d0c029 in vga_txtmouse (scp=0xfe0005586600,
> x=320, y=200, on=) at cpufunc.h:186
> Previous frame inner to this frame (corrupt stack?)
> 
> Maybe the hang is caused by the removal of the text mouse cursor?
> (Just guessing here.)

vga_txtmouse comes from syscons(4).

Are you making use of vidcontrol(1) in any way to set the system console
(outside of X) to something that uses the VGA framebuffer?  There are
probably some loader.conf or rc.conf variables that control this (I do
not know).

Are you running moused(8)?  Actually, I can see quite clearly that you
are in your core.txt:

Starting ums0 moused.

Try turning that off.  Don't ask me how, because devd(8) / devd.conf(5)
might be involved.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-06-16 Thread Jeremy Chadwick

On Sun, Jun 16, 2013 at 11:55:38AM +0200, Andre Albsmeier wrote:
> On Sun, 16-Jun-2013 at 10:49:37 +0200, Jeremy Chadwick wrote:
> > On Sun, Jun 16, 2013 at 10:02:39AM +0200, Andre Albsmeier wrote:
> > > On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote:
> > > > On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> > > > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > > > > This used to work perfectly under 7-STABLE for years but since
> > > > > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > > > > of all cases.
> > > > > > > 
> > > > > > > > After rebooting we find a new snapshot file which is a bit
> > > > > > > > smaller than the good ones and with different permissions.
> > > > > > > > It does not pass fsck. In this example it is the one
> > > > > > > > whose name begins with s3:
> > > > > > > 
> > > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > > > > > -r--------   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > > > > > -r--r-----   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > > > > 
> > > > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > > > > 
> > > > > > > > May 29 05:15:00  palveli kernel: lock order reversal:
> > > > > > > > May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > > > > > May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > > > > > May 29 05:15:04  palveli kernel: lock order reversal:
> > > > > > > > May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > > > > > May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > > > > 
> > > > > > > Unfortunatley no corefiles are being generated ;-(.
> > > > > > > 
> > > > > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > > > > from scratch. I have also seen this happen on an UFS2 on
> > > > > > > another machine and on a third one when running "dump -L"
> > > > > > > on a root fs.
> > > > > > > 
> > > > > > > Any hints of how to proceed?
> > > > > > 
> > > > > > Would it be possible to set up a serial console that is logged on
> > > > > > this machine to see if it is panic'ing but failing to write out a
> > > > > > crashdump?
> > > > > 
> > > > > I'll try to arrange that. It'll take a bit since this
> > > > > box is 200 km away... 
> > > > > 
> > > > > Maybe I'll find another one nearby to reproduce it...
> > > > 
> > > > SPECIFICALLY regarding "lack of crash dumps": I need to see the
> > > > following:
> > > > 
> > > > * cat /etc/rc.conf
> > > > * cat /etc/fstab
> > > > 
> > > > I may need output from other commands, but shall deal with that when I
> > > > see output from the above.  Thanks.
> > > 
> > > No problem, see below...
> > > 
> > > To make a long story short, the machine dumps core perfectly
> > > (tested that a while ago), but not when deali

Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-06-16 Thread Jeremy Chadwick

On Sun, Jun 16, 2013 at 10:02:39AM +0200, Andre Albsmeier wrote:
> On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote:
> > On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > > This used to work perfectly under 7-STABLE for years but since
> > > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > > of all cases.
> > > > > 
> > > > > After rebooting we find a new snapshot file which is a bit
> > > > > smaller than the good ones and has different permissions.
> > > > > It does not pass fsck. In this example it is the one
> > > > > whose name begins with s3:
> > > > > 
> > > > > -r--r-   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > > -r   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > > 
> > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > > 
> > > > > May 29 05:15:00  palveli kernel: lock order reversal:
> > > > > May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > > May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > > May 29 05:15:04  palveli kernel: lock order reversal:
> > > > > May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > > May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > > 
> > > > > Unfortunately no core files are being generated ;-(.
> > > > > 
> > > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > > from scratch. I have also seen this happen on a UFS2 on
> > > > > another machine and on a third one when running "dump -L"
> > > > > on a root fs.
> > > > > 
> > > > > Any hints of how to proceed?
> > > > 
> > > > Would it be possible to set up a serial console that is logged on
> > > > this machine to see if it is panic'ing but failing to write out a
> > > > crashdump?
> > > 
> > > I'll try to arrange that. It'll take a bit since this
> > > box is 200 km away... 
> > > 
> > > Maybe I'll find another one nearby to reproduce it...
> > 
> > SPECIFICALLY regarding "lack of crash dumps": I need to see the
> > following:
> > 
> > * cat /etc/rc.conf
> > * cat /etc/fstab
> > 
> > I may need output from other commands, but shall deal with that when I
> > see output from the above.  Thanks.
> 
> No problem, see below...
> 
> To make a long story short, the machine dumps core perfectly
> (tested that a while ago), but not when dealing with _this_
> issue...
> 
> I dump on da1s1b and savecore fetches it from there and puts
> it on /var (sitting on da0), that's faster.
> 
> rc.conf (beware, rc.conf.local exists):
> ---
> rcshutdown_timeout=180
> tmpmfs=YES
> tmpsize="$(( `/sbin/sysctl -n hw.usermem` / 300 ))m"
> tmpmfs_flags="$tmpmfs_flags -v 1 -n"
> 
> background_fsck=NO
> 
> nisdomainname=ofw.tld
> pflog_flags=-S
> 
> syslogd_flags=-svv
> inetd_enable=YES
> inetd_flags=-l
> named_flags="-S 1000"
> named_chrootdir=""
> rwhod_enable=YES
> sshd_enable=YES
> amd_enable=YES
> amd_flags="-F /etc/amd.conf"
> nfs_client_enable=YES
> nfs_access_cache=2
> mountd_flags=-n
> rpcbind_

Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found

2013-06-16 Thread Jeremy Chadwick

On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > Each day at 5:15 we are generating snapshots on various machines.
> > > This used to work perfectly under 7-STABLE for years but since
> > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > of all cases.
> > > 
> > > After rebooting we find a new snapshot file which is a bit
> > > smaller than the good ones and has different permissions.
> > > It does not pass fsck. In this example it is the one
> > > whose name begins with s3:
> > > 
> > > -r--r-   1 root  operator  snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > -r   1 root  operator  snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > -r--r-   1 root  operator  snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > 
> > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > 
> > > May 29 05:15:00  palveli kernel: lock order reversal:
> > > May 29 05:15:00  palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > May 29 05:15:00  palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > May 29 05:15:04  palveli kernel: lock order reversal:
> > > May 29 05:15:04  palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > May 29 05:15:04  palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > 
> > > Unfortunately no core files are being generated ;-(.
> > > 
> > > I have checked and even rebuilt the (UFS1) fs in question
> > > from scratch. I have also seen this happen on a UFS2 on
> > > another machine and on a third one when running "dump -L"
> > > on a root fs.
> > > 
> > > Any hints of how to proceed?
> > 
> > Would it be possible to set up a serial console that is logged on
> > this machine to see if it is panic'ing but failing to write out a
> > crashdump?
> 
> I'll try to arrange that. It'll take a bit since this
> box is 200 km away... 
> 
> Maybe I'll find another one nearby to reproduce it...

SPECIFICALLY regarding "lack of crash dumps": I need to see the
following:

* cat /etc/rc.conf
* cat /etc/fstab

I may need output from other commands, but shall deal with that when I
see output from the above.  Thanks.
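
For reference, a minimal sketch of the rc.conf settings I would expect to
find if crash dumps are enabled the stock way.  The values shown are
assumptions on my part, not what is on your box, and the fstab line in
the comment is purely illustrative.

```sh
# /etc/rc.conf fragment (assumed stock values, not taken from the affected machine)
# "AUTO" tells the rc scripts to use the first swap device listed in /etc/fstab,
# e.g. a line such as:  /dev/da1s1b  none  swap  sw  0  0   (illustrative only)
dumpdev="AUTO"
# Directory where savecore(8) writes the recovered dump on the next boot.
dumpdir="/var/crash"
```

If dumpdev ends up as NO once rc.conf and rc.conf.local are both read, a
panic will not leave a dump behind.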

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: ACPI Warning, then hang

2013-06-13 Thread Jeremy Chadwick

On Thu, Jun 13, 2013 at 05:32:21PM -0500, Bryce Edwards wrote:
> On Mon, Jun 10, 2013 at 9:32 PM, Jeremy Chadwick  wrote:
> > On Mon, Jun 10, 2013 at 09:18:47PM -0500, Bryce Edwards wrote:
> >> Verbose boot:
> >>
> >> https://www.dropbox.com/s/obm8rtavro68ea8/acpi-verbose.jpg
> >>
> >>
> >> On Mon, Jun 10, 2013 at 11:27 AM, Bryce Edwards  wrote:
> >> > On Mon, Jun 10, 2013 at 11:19 AM, John Baldwin  wrote:
> >> >> On Monday, June 10, 2013 10:35:07 am Jeremy Chadwick wrote:
> >> >>> On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote:
> >> >>> > I'm getting the following warning, and then the system locks:
> >> >>> >
> >> >>> > ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29,
> >> >>> > should be 0x48
> >> >>> >
> >> >>> > Here's a pic: http://db.tt/O6dxONzI
> >> >>> >
> >> >>> > System is on a SuperMicro C7X58 motherboard that I just upgraded to
> >> >>> > BIOS 2.0a, which I would like to stay on if possible.  I tried
> >> >>> > adjusting all the ACPI related BIOS settings without success.
> >> >>>
> >> >>> The message in question refers to hard-coded data in one of the many
> >> >>> ACPI tables (see acpidump(8) for the list -- there are many).  ACPI
> >> >>> tables are stored within the BIOS -- the motherboard/BIOS vendor has
> >> >>> full control over all of them and is fully 100% responsible for their
> >> >>> content.
> >> >>>
> >> >>> It looks to me like they severely botched their BIOS, or somehow it got
> >> >>> flashed wrong.
> >> >>>
> >> >>> You need to contact Supermicro Technical Support and tell them of the
> >> >>> problem.  They need to either fix their BIOS, or help figure out what's
> >> >>> become corrupted.  You can point them to this thread if you'd like.
> >> >>>
> >> >>> I should note that the corruption/issue is major enough that you are
> >> >>> missing very key/important lines from your dmesg (after "avail memory"
> >> >>> but before "kbdX at kbdmuxX"), which come from pure reliance upon ACPI.
> >> >>> Lines such as:
> >> >>>
> >> >>> Event timer "LAPIC" quality 400
> >> >>> ACPI APIC Table: 
> >> >>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> >> >>> FreeBSD/SMP: 1 package(s) x 4 core(s)
> >> >>>  cpu0 (BSP): APIC ID:  0
> >> >>>  cpu1 (AP): APIC ID:  1
> >> >>>  cpu2 (AP): APIC ID:  2
> >> >>>  cpu3 (AP): APIC ID:  3
> >> >>> ioapic0  irqs 0-23 on motherboard
> >> >>> ioapic1  irqs 24-47 on motherboard
> >> >>>
> >> >>> In the meantime, you can try booting without ACPI support (there should
> >> >>> be a boot-up menu option for that) and pray that works.  If it doesn't,
> >> >>> then your workaround is to roll back to an older BIOS version and/or put
> >> >>> pressure on Supermicro.  You will find their Technical Support folks are
> >> >>> quite helpful/responsive to technical issues.
> >> >>>
> >> >>> Good luck and keep us posted on what transpires.
> >> >>
> >> >> Actually, that message is mostly harmless.  All sorts of vendors ship
> >> >> tables with busted checksums that are in fact fine. :(  However, the table
> >> >> name looks very odd which is more worrying.  Booting without ACPI enabled
> >> >> would be a good first step.  Trying a verbose boot to capture the last
> >> >> message before the hang would also be useful.
> >> >>
> >> >> --
> >> >> John Baldwin
> >> >
> >> > Booting without ACPI did not work for me, although I might be able to
> >> > hack away at lots of BIOS settings to make it work.  It didn't assign
> >> > IRQs to things like the storage controller, etc., so I thought it was
> >> > probably not worth the effort.
> >> >
> >> > I did contact SuperMicro support

Re: ACPI Warning, then hang

2013-06-10 Thread Jeremy Chadwick

On Mon, Jun 10, 2013 at 09:18:47PM -0500, Bryce Edwards wrote:
> Verbose boot:
> 
> https://www.dropbox.com/s/obm8rtavro68ea8/acpi-verbose.jpg
> 
> 
> On Mon, Jun 10, 2013 at 11:27 AM, Bryce Edwards  wrote:
> > On Mon, Jun 10, 2013 at 11:19 AM, John Baldwin  wrote:
> >> On Monday, June 10, 2013 10:35:07 am Jeremy Chadwick wrote:
> >>> On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote:
> >>> > I'm getting the following warning, and then the system locks:
> >>> >
> >>> > ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29,
> >>> > should be 0x48
> >>> >
> >>> > Here's a pic: http://db.tt/O6dxONzI
> >>> >
> >>> > System is on a SuperMicro C7X58 motherboard that I just upgraded to
> >>> > BIOS 2.0a, which I would like to stay on if possible.  I tried
> >>> > adjusting all the ACPI related BIOS settings without success.
> >>>
> >>> The message in question refers to hard-coded data in one of the many
> >>> ACPI tables (see acpidump(8) for the list -- there are many).  ACPI
> >>> tables are stored within the BIOS -- the motherboard/BIOS vendor has
> >>> full control over all of them and is fully 100% responsible for their
> >>> content.
> >>>
> >>> It looks to me like they severely botched their BIOS, or somehow it got
> >>> flashed wrong.
> >>>
> >>> You need to contact Supermicro Technical Support and tell them of the
> >>> problem.  They need to either fix their BIOS, or help figure out what's
> >>> become corrupted.  You can point them to this thread if you'd like.
> >>>
> >>> I should note that the corruption/issue is major enough that you are
> >>> missing very key/important lines from your dmesg (after "avail memory"
> >>> but before "kbdX at kbdmuxX"), which come from pure reliance upon ACPI.
> >>> Lines such as:
> >>>
> >>> Event timer "LAPIC" quality 400
> >>> ACPI APIC Table: 
> >>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> >>> FreeBSD/SMP: 1 package(s) x 4 core(s)
> >>>  cpu0 (BSP): APIC ID:  0
> >>>  cpu1 (AP): APIC ID:  1
> >>>  cpu2 (AP): APIC ID:  2
> >>>  cpu3 (AP): APIC ID:  3
> >>> ioapic0  irqs 0-23 on motherboard
> >>> ioapic1  irqs 24-47 on motherboard
> >>>
> >>> In the meantime, you can try booting without ACPI support (there should
> >>> be a boot-up menu option for that) and pray that works.  If it doesn't,
> >>> then your workaround is to roll back to an older BIOS version and/or put
> >>> pressure on Supermicro.  You will find their Technical Support folks are
> >>> quite helpful/responsive to technical issues.
> >>>
> >>> Good luck and keep us posted on what transpires.
> >>
> >> Actually, that message is mostly harmless.  All sorts of vendors ship
> >> tables with busted checksums that are in fact fine. :(  However, the table
> >> name looks very odd which is more worrying.  Booting without ACPI enabled
> >> would be a good first step.  Trying a verbose boot to capture the last
> >> message before the hang would also be useful.
> >>
> >> --
> >> John Baldwin
> >
> > Booting without ACPI did not work for me, although I might be able to
> > hack away at lots of BIOS settings to make it work.  It didn't assign
> > IRQs to things like the storage controller, etc., so I thought it was
> > probably not worth the effort.
> >
> > I did contact SuperMicro support as well, so we'll see what they have to say.
> >
> > I'll get a verbose boot posted up in a bit.

A screenshot of a verbose boot is insufficient; as I'm sure you noticed
there are pages upon pages of information before the lock-up/crash.
Those pages are what folks are interested in.

Because the system is hung, I doubt hitting Scroll Lock + using
PageUp/PageDown to go through the kernel message scrollback will work.

You're going to need a serial-based console: hook something up to COM1
on the motherboard, and use a null modem cable to connect to another
system where you run a serial port/terminal emulator (e.g. PuTTY for
Windows) that has a scrollback buffer which you can copy-paste or
save.  Set your serial port for 9600 baud, 8 bits, no parity, and 1
stop bit (9600bps, 8N1).  You'll need to have physical access to both
systems simultaneously.

At the VGA console, boot FreeBSD then escape to the loader prompt
("ok") and issue the following commands:

set boot_multicons="YES"
set boot_serial="YES"
set console="comconsole,vidconsole"
boot

You should begin seeing output on the serial port, and the system will
eventually hang.  Then provide the captured output from the serial
port here.  :-)
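
If you want the same console settings to persist across reboots while
chasing this, they can go in /boot/loader.conf instead of being typed at
the loader prompt each time.  A minimal sketch, assuming COM1 at the
9600bps mentioned above (the comconsole_speed line is an addition of
mine, not one of the commands listed earlier):

```sh
# /boot/loader.conf fragment (sketch; mirrors the loader prompt commands)
boot_multicons="YES"                # send boot output to more than one console
boot_serial="YES"                   # use the serial console
console="comconsole,vidconsole"     # serial first, VGA second
comconsole_speed="9600"             # assumption: matches the 9600bps 8N1 terminal setup
```

Remove the lines again once the problem is solved if you don't want a
permanent serial console.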

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: ACPI Warning, then hang

2013-06-10 Thread Jeremy Chadwick

On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote:
> I'm getting the following warning, and then the system locks:
> 
> ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29,
> should be 0x48
> 
> Here's a pic: http://db.tt/O6dxONzI
> 
> System is on a SuperMicro C7X58 motherboard that I just upgraded to
> BIOS 2.0a, which I would like to stay on if possible.  I tried
> adjusting all the ACPI related BIOS settings without success.

The message in question refers to hard-coded data in one of the many
ACPI tables (see acpidump(8) for the list -- there are many).  ACPI
tables are stored within the BIOS -- the motherboard/BIOS vendor has
full control over all of them and is fully 100% responsible for their
content.
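
For background on what that warning is checking: per the ACPI
specification, all bytes of a table, including its checksum field, must
sum to zero modulo 256.  A small sketch of that arithmetic follows;
acpi_sum is a hypothetical helper name, and the three-byte file is
made-up demo data, not a real table.

```sh
# Sum every byte of a file modulo 256.  A valid ACPI table image sums to 0.
# acpi_sum is a hypothetical helper, not part of acpidump(8).
acpi_sum() {
    od -An -v -tu1 "$1" |
        awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s % 256 }'
}

# Demo with made-up bytes: 1 + 2 + 253 = 256, which is 0 modulo 256.
demo=$(mktemp)
printf '\001\002\375' > "$demo"
acpi_sum "$demo"
rm -f "$demo"
```

Any result other than 0 for a real table image is what leads to an
"incorrect checksum" style warning.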

It looks to me like they severely botched their BIOS, or somehow it got
flashed wrong.

You need to contact Supermicro Technical Support and tell them of the
problem.  They need to either fix their BIOS, or help figure out what's
become corrupted.  You can point them to this thread if you'd like.

I should note that the corruption/issue is major enough that you are
missing very key/important lines from your dmesg (after "avail memory"
but before "kbdX at kbdmuxX"), which come from pure reliance upon ACPI.
Lines such as:

Event timer "LAPIC" quality 400
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard

In the meantime, you can try booting without ACPI support (there should
be a boot-up menu option for that) and pray that works.  If it doesn't,
then your workaround is to roll back to an older BIOS version and/or put
pressure on Supermicro.  You will find their Technical Support folks are
quite helpful/responsive to technical issues.

Good luck and keep us posted on what transpires.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Error in make buildkernel `

2013-06-10 Thread Jeremy Chadwick

On Mon, Jun 10, 2013 at 02:04:59PM +0200, Willem Jan Withagen wrote:
> I'm trying to build a stable kernel on a freshly built 8.4-Stable i386
> system.
> 
> And I get:
> MAKE=make sh /usr/srcs/src9/src/sys/conf/newvers.sh GENERIC
> /usr/local/bin/svnversion
> cc -c -O -pipe  -std=c99 -g -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions
> -Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I.
> -I/usr/srcs/src9/src/sys -I/usr/srcs/src9/src/sys/contrib/altq -D_KERNEL
> -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common
> -finline-limit=8000 --param inline-unit-growth=100 --param
> large-function-growth=1000  -mno-align-long-strings
> -mpreferred-stack-boundary=2 -mno-mmx -mno-sse -msoft-float
> -ffreestanding -fstack-protector -Werror  vers.c
> ctfconvert -L VERSION -g vers.o
> linking kernel.debug
> ld:/usr/srcs/src9/src/sys/conf/ldscript.i386:66: syntax error
> *** Error code 1
> 
> Stop in /usr/obj/usr/srcs/src9/src/sys/GENERIC.
> *** Error code 1
> 
> Stop in /usr/srcs/src9/src.
> *** Error code 1
> 
> Line 66 is:   .eh_frame   : ONLY_IF_RO { KEEP (*(.eh_frame)) }
> The piece of "code" around line 66 looks like:
> 
>   PROVIDE (__etext = .);
>   PROVIDE (_etext = .);
>   PROVIDE (etext = .);
>   .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
>   .rodata1: { *(.rodata1) }
>   .eh_frame_hdr : { *(.eh_frame_hdr) }
>   .eh_frame   : ONLY_IF_RO { KEEP (*(.eh_frame)) }
>   .gcc_except_table   : ONLY_IF_RO { *(.gcc_except_table .gcc_except_table.*) }
>   /* Adjust the address for the data segment.  We want to adjust up to
>  the same address within the page on the next page up.  */
>   . = ALIGN (CONSTANT (MAXPAGESIZE)) - ((CONSTANT (MAXPAGESIZE) - .) & (CONSTANT (MAXPAGESIZE) - 1)); . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
>   /* Exception handling  */
> 
> Any suggestions on how to fix this??

I can't help with the actual syntax error, but from the path names
involved here, it looks like you:

1) are using an alternate location for src (/usr/srcs not /usr/src),

2) are trying to build FreeBSD 9.x on an 8.4-STABLE box
(/usr/obj/usr/srcs/src9)

Is that correct?  You might want to provide /etc/make.conf and
/etc/src.conf from this system or other details of the "build framework"
you might be using.  That might help/pertain to the situation.
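
A rough sketch of how to collect those details in one go.  The function
name show_build_env is hypothetical, and the default source path is
simply the one visible in your error output; adjust it if I have read
the paths wrong.

```sh
# Hypothetical helper: print the build-related details asked for above.
# The optional argument is the source tree; the default is the path
# that appears in the error output.
show_build_env() {
    src="${1:-/usr/srcs/src9/src}"
    uname -a
    for f in /etc/make.conf /etc/src.conf; do
        if [ -r "$f" ]; then
            echo "== $f"
            cat "$f"
        else
            echo "== $f (not present)"
        fi
    done
    # newvers.sh records which release/branch the source tree claims to be.
    if [ -r "$src/sys/conf/newvers.sh" ]; then
        grep -E '^(REVISION|BRANCH)=' "$src/sys/conf/newvers.sh"
    else
        echo "== $src/sys/conf/newvers.sh (not readable)"
    fi
}

show_build_env
```

Comparing the uname output against the REVISION/BRANCH lines should
confirm or rule out the 9.x-on-8.4 guess.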

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: 8.4 and EHCI - regression?

2013-06-09 Thread Jeremy Chadwick

le 802.11s draft support
> #lena device  wlan_wep  # 802.11 WEP support
> #lena device  wlan_ccmp   # 802.11 CCMP support
> #lena device  wlan_tkip   # 802.11 TKIP support
> #lena device  wlan_amrr   # AMRR transmit rate control algorithm
> #lena device  an  # Aironet 4500/4800 802.11 wireless NICs.
> #lena device  ath # Atheros pci/cardbus NIC's
> #lena device  ath_hal # pci/cardbus chip support
> #lena options AH_SUPPORT_AR5416   # enable AR5416 tx/rx descriptors
> #lena device  ath_rate_sample # SampleRate tx rate control for ath
> #lena device  ral # Ralink Technology RT2500 wireless NICs.
> #lena device  wi  # WaveLAN/Intersil/Symbol 802.11 wireless NICs.
> #lena #device wl  # Older non 802.11 Wavelan wireless NIC.
> 
> # Pseudo devices.
> device  loop  # Network loopback
> device  random  # Entropy device
> #lena options PADLOCK_RNG # VIA Padlock RNG
> options   RDRAND_RNG  # Intel Bull Mountain RNG
> device  ether   # Ethernet support
> #lena device  vlan  # 802.1Q VLAN support
> #lena device  tun # Packet tunnel.
> device  pty # BSD-style compatibility pseudo ttys
> #lena:load-as-module device   md  # Memory "disks"
> #lena device  gif # IPv6 and IPv4 tunneling
> #lena device  faith   # IPv6-to-IPv4 relaying (translation)
> #lena device  firmware  # firmware assist module
> 
> # The `bpf' device enables the Berkeley Packet Filter.
> # Be aware of the administrative consequences of enabling this!
> # Note that 'bpf' is required for DHCP.
> device  bpf # Berkeley packet filter
> 
> # USB support
> options   USB_DEBUG   # enable debug msgs
> device  uhci  # UHCI PCI->USB interface
> device  ohci  # OHCI PCI->USB interface
> device  ehci  # EHCI PCI->USB interface (USB 2.0)
> device  usb # USB Bus (required)
> #device   udbp  # USB Double Bulk Pipe devices
> device  uhid  # "Human Interface Devices"
> device  ukbd  # Keyboard
> #lena device  ulpt  # Printer
> device  umass   # Disks/Mass storage - Requires scbus and da
> #lena:load-as-module device   ums # Mouse
> #lena device  urio  # Diamond Rio 500 MP3 player
> # USB Serial devices
> #lena device  u3g # USB-based 3G modems (Option, Huawei, Sierra)
> #lena device  uark  # Technologies ARK3116 based serial adapters
> #lena device  ubsa  # Belkin F5U103 and compatible serial adapters
> #lena device  uftdi   # For FTDI usb serial adapters
> #lena device  uipaq   # Some WinCE based devices
> #lena device  uplcom  # Prolific PL-2303 serial adapters
> #lena device  uslcom  # SI Labs CP2101/CP2102 serial adapters
> #lena device  uvisor  # Visor and Palm devices
> #lena device  uvscom  # USB serial support for DDI pocket's PHS
> # USB Ethernet, requires miibus
> #lena device  aue # ADMtek USB Ethernet
> #lena device  axe # ASIX Electronics USB Ethernet
> #lena device  cdce  # Generic USB over Ethernet
> #lena device  cue # CATC USB Ethernet
> #lena device  kue # Kawasaki LSI USB Ethernet
> #lena device  rue # RealTek RTL8150 USB Ethernet
> #lena device  udav  # Davicom DM9601E USB
> # USB Wireless
> #lena device  rum # Ralink Technology RT2501USB wireless NICs
> #lena device  uath  # Atheros AR5523 wireless NICs
> #lena device  ural  # Ralink Technology RT2500USB wireless NICs
> #lena device  zyd # ZyDAS zd1211/zd1211b wireless NICs
> 
> # FireWire support
> #lena device  firewire  # FireWire bus code
> #device   sbp # SCSI over FireWire (Requires scbus and da)
> #lena device  fwe # Ethernet over FireWire (non-standard!)
> #lena device  fwip  # IP over FireWire (RFC 2734,3146)
> #lena device  dcons   # Dumb console driver
> #lena device  dcons_crom  # Configuration ROM for dcons
> 
> # VirtIO support
> device  virtio  # Generic VirtIO bus (required)
> device  virtio_pci  # VirtIO PCI device
> device  vtnet   # VirtIO Ethernet device
> device  virtio_blk  # VirtIO Block device
> device  virtio_scsi # VirtIO SCSI device
> device  virtio_balloon  # VirtIO Memory Balloon device
> 
> #lenab
> # from /sys/conf/NOTES:
> 
> # Optional character code conversion support with LIBICONV.
> # Each option requires their base file system and LIBICONV.
> 
> options MSDOSFS_ICONV
> 
> # Kernel side iconv library
> options LIBICONV
> 
> # Set the amount of time (in seconds) the system will wait before
> # rebooting automatically when a kernel panic occurs.  If set to (-1),
> # the system will wait indefinitely until a key is pressed on the
> # console.
> options PANIC_REBOOT_WAIT_TIME=60   #lena was 16
> 
> # from /sys/i386/conf/NOTES:
> 
> # Enable Linux ABI emulation
> options COMPAT_LINUX
> 
> #lenae

CC'ing freebsd-usb@, where Hans can probably help with this.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator  http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: fxp0 interface going up/down/up/down (dhclient related?)

2013-06-09 Thread Jeremy Chadwick

On Sun, Jun 09, 2013 at 02:48:29PM +0200, Alban Hertroys wrote:
> On Jun 9, 2013, at 12:44, Jeremy Chadwick  wrote:
> 
> > On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote:
> >> I'm having an issue where my fxp0 interface keeps looping between DOWN/UP, 
> >> with dhclient requesting a lease each time in between. I think it's caused 
> >> by dhclient:
> >> 
> >> solfertje # dhclient -d fxp0
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> send_packet: Network is down
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> fxp0 link state down -> up
> >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> >> DHCPACK from 109.72.40.1
> >> bound to 141.105.10.89 -- renewal in 7200 seconds.
> >> fxp0 link state up -> down
> >> ^C
> >> 
> >> In the above test I turned off devd (/etc/rc.d/devd stop) and background 
> >> dhclient (/etc/rc.d/dhclient stop fxp0), and I still got the above result. 
> >> There's practically no time spent between up/down cycles, this just keeps 
> >> going on and on.
> >> fxp0 is the only interface that runs on DHCP. The others have static IP's.
> >> 
> >> Initially I thought the issue might be caused by devd, because I have both 
> >> ethernet and 802.11 type NICs (2x ethernet, 1x wifi) in that system.
> >> 
> >> This is 9-STABLE from yesterday.
> >> 
> >> Before, I had 9-RELEASE running on this system with the same config, and 
> >> that worked well.
> > 
> > And so what I predicted begins...
> > 
> > The issue is described in the 8.4-RELEASE Errata Notes; the driver is
> > using the same driver version as in stable/9, hence you're experiencing
> > the same problem.  See Open Issues:
> > 
> > http://www.freebsd.org/releases/8.4R/errata.html
> > 
> > No fix for this has been committed.  It is still under discussion among
> > multiple kernel folks as to where the fix should be applied (dhclient or
> > the fxp(4) driver), because the changes made to dhclient (that tickle
> > this bug) may actually affect more drivers than just fxp(4).
> > 
> > You can start by reading the (extremely long but very informative)
> > thread here.  I do urge you to read all the posts, not skim them:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/thread.html#73440
> 
> Goodness, and here I was hoping it was just a silly mistake I made?
> 
> IIUC, the issue is a combination of:
> - dhclient now being aware of link state changes and
> - the fxp driver reinitializing on certain mode changes, such as assigning an 
> IP address
> 
> Which causes dhclient to think that the link state changed, fetch a "new" IP 
> address and assign it to the fxp adapter again, causing the same link state 
> change over and over again.
> 
> Is that about correct?

Someone else can answer this.

> > The only known workarounds at this time are:
> > 
> > a) Cease use of DHCP; set a static IP in rc.conf,
> > 
> > b) Try some of the patches mentioned within the above thread,
> > specifically this one:
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073581.html
> 
> Or c) Use DHCP with a static media setting:
> ifconfig_fxp0="DHCP media 100baseTX mediaopt full-duplex"

DO NOT DO THIS.  People who do this do not understand what this does.
This has bad effects on IEEE 802.3 and will not do/behave like you might
think.  The short version:

The ONLY TIME you should be hard-setting speed and duplex in ifconfig is
when you have a managed switch on the other end where you can set the
speed/duplex for that port as well.
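
If the goal is simply to sidestep the loop, workaround (a) looks roughly
like the sketch below.  The addresses are documentation placeholders
(192.0.2.0/24), not values from this thread; note that no media or
mediaopt keywords appear, so autonegotiation is left alone.

```sh
# /etc/rc.conf fragment -- placeholder addresses, adjust to your network
ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0"
defaultrouter="192.0.2.1"
```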

Re: fxp0 interface going up/down/up/down (dhclient related?)

2013-06-09 Thread Jeremy Chadwick

On Sun, Jun 09, 2013 at 01:21:53PM +0200, Łukasz Gruner wrote:
> On Sun, Jun 9, 2013, at 12:44, Jeremy Chadwick wrote:
> > On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote:
> > > I'm having an issue where my fxp0 interface keeps looping between 
> > > DOWN/UP, with dhclient requesting a lease each time in between. I think 
> > > it's caused by dhclient:
> > And so what I predicted begins...
> 
> I have been suffering this issue since forever (which for me began at
> freebsd 9.0). Currently I'm at stable9.

The problem we're talking about was a direct result of this PR:

http://www.freebsd.org/cgi/query-pr.cgi?pr=166656

The commit (MFC) was done to stable/8 and stable/9 in this revision and
at this date/time:

stable/9 commit: r247335 -- 2013/02/26
stable/8 commit: r247336 -- 2013/02/26

You can see the commit log/messages in the PR.

Now let's talk about versions:

FreeBSD 9.0-RELEASE came out 2012/01/12:

http://lists.freebsd.org/pipermail/freebsd-announce/2012-January/001406.html

FreeBSD 9.1-RELEASE came out 2012/12/30:

http://lists.freebsd.org/pipermail/freebsd-announce/2012-December/001448.html

So when you say "the issue" for you "began at FreeBSD 9.0", you need to
be more specific (uname -a output would be a good start), because
otherwise to me it sounds like you're experiencing a *completely*
different problem.
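
The date reasoning above can be written out as a small sketch.  The dates and revision numbers are the ones quoted in this mail; the function name date_before is made up for illustration.

```shell
# date_before A B: succeeds if date A (YYYY/MM/DD) is earlier than date B.
# Stripping the slashes turns each date into a number that sorts correctly.
date_before() {
    [ "$(printf '%s' "$1" | tr -d '/')" -lt "$(printf '%s' "$2" | tr -d '/')" ]
}

mfc_date="2013/02/26"   # r247335 (stable/9) and r247336 (stable/8)

for rel in "9.0-RELEASE 2012/01/12" "9.1-RELEASE 2012/12/30"; do
    set -- $rel         # $1 = release name, $2 = release date
    if date_before "$2" "$mfc_date"; then
        echo "$1 predates the MFC; a stock install cannot contain the change"
    fi
done
```

Both releases predate the MFC, which is why a problem that "began at 9.0" only fits this bug if the machine was later updated to a stable/9 build from after that date.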

> Much appreciated, shouldn't this be at wiki? 

What wiki?  How would people know to read it?  Using a web search engine
like Google?  That would return this mailing list thread, as well as
the ones I've referenced.

There is enough old/outdated/completely and absolutely WRONG crap on the
FreeBSD Wiki as is.  The Wiki is not the "official source/list of
problems" (there is no official source/list -- the mailing lists are,
and for a decade have been, as good as it gets).

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: fxp0 interface going up/down/up/down (dhclient related?)

2013-06-09 Thread Jeremy Chadwick

On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote:
> I'm having an issue where my fxp0 interface keeps looping between DOWN/UP, 
> with dhclient requesting a lease each time in between. I think it's caused by 
> dhclient:
> 
> solfertje # dhclient -d fxp0
> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> send_packet: Network is down
> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> DHCPACK from 109.72.40.1
> bound to 141.105.10.89 -- renewal in 7200 seconds.
> fxp0 link state up -> down
> fxp0 link state down -> up
> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> DHCPACK from 109.72.40.1
> bound to 141.105.10.89 -- renewal in 7200 seconds.
> fxp0 link state up -> down
> fxp0 link state down -> up
> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> DHCPACK from 109.72.40.1
> bound to 141.105.10.89 -- renewal in 7200 seconds.
> fxp0 link state up -> down
> fxp0 link state down -> up
> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> DHCPACK from 109.72.40.1
> bound to 141.105.10.89 -- renewal in 7200 seconds.
> fxp0 link state up -> down
> fxp0 link state down -> up
> DHCPREQUEST on fxp0 to 255.255.255.255 port 67
> DHCPACK from 109.72.40.1
> bound to 141.105.10.89 -- renewal in 7200 seconds.
> fxp0 link state up -> down
> ^C
> 
> In above test I turned off devd (/etc/rc.d/devd stop) and background dhclient 
> (/etc/rc.d/dhclient stop fxp0), and I still got the above result. There's 
> practically no time spent between up/down cycles, this just keeps going on 
> and on.
> fxp0 is the only interface that runs on DHCP. The others have static IP's.
> 
> Initially I thought the issue might be caused by devd, because I have both 
> ethernet and 802.11 type NICs (2x ethernet, 1x wifi) in that system.
> 
> This is 9-STABLE from yesterday.
> 
> Before, I had 9-RELEASE running on this system with the same config, and that 
> worked well.

And so what I predicted begins...

The issue is described in the 8.4-RELEASE Errata Notes; 8.4-RELEASE uses
the same driver version as stable/9, hence you're experiencing
the same problem.  See Open Issues:

http://www.freebsd.org/releases/8.4R/errata.html

No fix for this has been committed.  It is still under discussion by
multiple kernel folks as to where the fix should be applied (dhclient or
the fxp(4) driver), because the changes made to dhclient (that tickle
this bug) may actually affect more drivers than just fxp(4).
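
To confirm that a captured "dhclient -d" log really shows this loop, counting the up -> down transitions is usually enough.  This is a rough sketch; the function name count_flaps is made up, and the message text it matches is the one shown in the quoted output above.

```shell
# count_flaps: reads dhclient -d output on stdin and prints how many
# "link state up -> down" transitions it contains.
# Hypothetical helper; matches the message format quoted in this thread.
count_flaps() {
    grep -c 'link state up -> down'
}
```

For example, save the output with something like "dhclient -d fxp0 2>&1 | tee /tmp/dhclient.log" and then run "count_flaps < /tmp/dhclient.log".  A count that keeps climbing while the same lease is re-acquired each time matches the behaviour described here.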

You can start by reading the (extremely long but very informative)
thread here.  I do urge you to read all the posts, not skim them:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html
http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/thread.html#73440

The only known workarounds at this time are:

a) Cease use of DHCP; set a static IP in rc.conf,

b) Try some of the patches mentioned within the above thread,
specifically this one:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073581.html

The patch is for head (CURRENT) so it may not patch cleanly.  If not,
you can try to work the patch in yourself/by hand, or you can ask
Yong-Hyeon or others for help.
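
For workaround (a), a minimal rc.conf sketch follows.  The address, netmask and gateway are placeholders from the documentation range, not values taken from this report; use whatever your network actually assigns.

```shell
# /etc/rc.conf fragment (sketch): static addressing on fxp0 instead of DHCP.
# 192.0.2.x is a documentation-only range used here as a placeholder.
ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0"
defaultrouter="192.0.2.1"
# Without dhclient, nameservers must also be listed by hand in /etc/resolv.conf.
```

With no dhclient running on the interface, there is nothing left to react to the link state changes.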

> I'm not sure it's related, but on the wireless interface I get a lot of:
> Jun  9 12:08:11 solfertje kernel: ath0: stuck beacon; resetting (bmiss count 
> 4)

Absolutely 100% unrelated.  That issue has been around for years, and
the root cause varies tremendously.  I discussed it back in February
2011:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061700.html

If you want to know how I solved that problem, I can tell you, but I'm
certain you won't be happy to hear what I have to say.

If you're concerned about this problem, please start another thread
discussing it.  I'm sure Adrian Chadd can provide you lots of insights,
but most of them are already in his response to my above thread/post.

> {snipping other stuff}

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: TRIM support through ciss

2013-06-05 Thread Jeremy Chadwick

On Thu, Jun 06, 2013 at 02:00:36AM +0400, Dmitry Morozovsky wrote:
> Dear colleagues,
> 
> I have a DB server with ciss and a bunch of disks (8 SAS + 2 Intel SATA SSD).
> 
> However, this setup does not seem to support TRIM on SSDs:
> 
> kstat.zfs.misc.zio_trim.bytes: 0
> kstat.zfs.misc.zio_trim.success: 0
> kstat.zfs.misc.zio_trim.unsupported: 418
> kstat.zfs.misc.zio_trim.failed: 0
> 
> 
> Excerpt from dmesg about SSD:
> 
> da9 at ciss0 bus 0 scbus0 target 9 lun 0
> da9:  Fixed Direct Access SCSI-5 device
> da9: Serial Number PACCR9SZ7KJS
> da9: 135.168MB/s transfers
> da9: Command Queueing enabled
> da9: 114439MB (234371520 512 byte sectors: 255H 32S/T 28722C)
> da9: quirks=0x1
> da9: Delete methods: 
> 
> the last line bothers me...
> 
> Is there any tuning I missed?

I'm sure Steve will respond, but in the meantime...

I assume this is you running stable/9 with r251419 or newer (which just
got committed a few hours ago)?

I haven't looked at the code, but it is very, VERY important to remember
that you are *always* at the whim of 1) the controller driver (ciss(4)
in this case), and 2) the controller firmware, as to whether or not
certain pass-through commands are supported (in this case, since you
have a SAS controller, this would be accomplished via a SCSI command
that your controller does not support).

Oh, it looks like Steve just replied and said more or less what I did.
:-)

Bottom line as "we" (the royal we, I guess) have been saying for many
years now: any controller which operates in a RAID fashion and does not
support "true JBOD" (meaning the controller acts as a generic controller
with no concept of RAID), will almost always get in the way.  Instead,
stick with true non-RAID controllers -- and yes I am aware choices are
limited.
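
For reference, the kstat counters quoted at the top can be summarised with a small sketch like the one below.  The sysctl names are the ones from the quoted output; the function name trim_verdict and the wording of its messages are made up for illustration.

```shell
# trim_verdict: reads "sysctl kstat.zfs.misc.zio_trim" output on stdin and
# prints a one-line summary.  Hypothetical helper; on a pool with mixed
# devices, any successful TRIM wins over the "unsupported" counter.
trim_verdict() {
    awk -F': ' '
        $1 == "kstat.zfs.misc.zio_trim.success"     { ok = $2 }
        $1 == "kstat.zfs.misc.zio_trim.unsupported" { unsup = $2 }
        END {
            if (ok + 0 > 0)         print "TRIM requests are succeeding"
            else if (unsup + 0 > 0) print "TRIM requests are being rejected as unsupported"
            else                    print "no TRIM activity recorded yet"
        }
    '
}
```

Run as "sysctl kstat.zfs.misc.zio_trim | trim_verdict".  With the numbers quoted above (success 0, unsupported 418) it reports the "unsupported" case, which is consistent with the empty "Delete methods:" line.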

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Serial terminal issues

2013-06-05 Thread Jeremy Chadwick

On Wed, Jun 05, 2013 at 09:29:56PM +0200, Alban Hertroys wrote:
> {snipping stuff that is pending or has been acknowledged}
> On Jun 5, 2013, at 2:59, Jeremy Chadwick  wrote:
> > Serial port speed settings in a BIOS pertain to BIOS-level console
> > redirection -- that redirection is lost the instant anything (boot
> > loader, kernel, etc.) touches SMI and/or interrupts and starts
> > "fiddling" with the serial port.
> 
> That's the bit I wasn't entirely certain of - that there is no possible 
> interaction from having a BIOS console to the point where the OS takes over. 
> That's why I mentioned it.
> 
> I assumed that if the BIOS had set up the serial port to 19200 baud and the 
> OS didn't specify it, that it would be possible that the speed set up in the 
> BIOS would still be in effect and that the serial terminal just incidentally 
> worked for the last 10 years because of that. Far-fetched, I know.

Not far-fetched.  Some system BIOSes with BIOS-level serial console
redirection offer what you describe -- on Supermicro systems, you can
toggle this capability in the BIOS, it's called "Continue CR After POST"
(CR stands for Console Redirection).

This is hard to explain without getting into the technicalities, so bear
with me here.  Get coffee, etc.

What this BIOS feature does is "retain" the SMI/interrupt mapping stuff,
so that certain calls to interrupt 0x10 (the BIOS video interrupt) for things
like cursor movement, writing text/strings, etc. are done on the native
console (ex. VGA) *as well* as sent to the serial port (and converted
into escape sequences of your choice -- another BIOS option, "Console
Type", lets you pick between things like vt100, ANSI, ASCII, etc.).

This option is useful for things like option ROMs or HBAs (SCSI/SAS
controllers, etc.) which print stuff *after* POST.  I'm sure you've seen
this.  With "Continue CR After POST" disabled, those types of messages
are only seen on the VGA console.

However, regardless of the setting of "Continue CR After POST", the
instant any x86 code starts tinkering with the SMI/interrupt stuff, that
functionality is lost (and cannot be restored).  In FreeBSD, this
definitely happens when the kernel starts, but AFAIR not during the
bootstraps.

Instead, the bootstraps (that is: boot0, as well as boot2/loader) have
the ability to speak to the serial port *directly*, rather than relying
on interrupt 0x10.

The -S19200 parameter in /boot.config causes the bootstraps **very**
early on to set the serial port speed to 19200 baud.  This could cause
a problem if you have BIOS-level serial redirect set up and set to a
different speed (ex. 57600), so naturally you need to make sure
everything uses the same speed at all "stages".

The -Dh parameter in /boot.config causes the bootstraps **very** early
on to tell FreeBSD to write data to the VGA console/text console, in
addition to the serial port (directly, not via interrupt 0x10).  For how
all that works (meaning how -D vs. -Dh behaves and at what "stages"
of the FreeBSD boot process), please see the FreeBSD Handbook 
section 27.6.4.1:

http://www.freebsd.org/doc/en/books/handbook/serialconsole-setup.html

The handbook here is also outdated/wrong; it's talking about sio0 when
it means to refer to uart0.  flags for uart0 in this case will be
0x00010 (meaning uart0 is a potential serial console).

Finally, the important/key part: the -Dh capability when used in
/boot.config gets "passed on" to boot2/loader (so it knows to output
data to the serial port as a console), and boot2/loader **ALSO** passes
that information on to the kernel when it starts so that it knows to
print data to the serial port too.
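
Put together, the two parameters discussed above end up on a single line in /boot.config.  A minimal sketch follows; the function name write_boot_config is made up, and the second argument exists only so the function can be tried against a scratch file first.

```shell
# write_boot_config SPEED [FILE]: writes the dual-console (-D) and
# serial-preferred (-h) flags plus the port speed (-S<speed>) as one line.
# Hypothetical helper; FILE defaults to /boot.config and is overwritten.
write_boot_config() {
    printf '%s\n' "-Dh -S$1" > "${2:-/boot.config}"
}
```

Running "write_boot_config 19200" as root leaves /boot.config containing the single line "-Dh -S19200".  The speed has to match whatever the BIOS redirection and the terminal on the other end are set to.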

Make sense?  :-)

This is why I advocate using /boot.config (or you can use /boot/config
if you wish -- both in 9.1-RELEASE work (thanks des@ !)) rather than
mucking about with /boot/loader.conf -- the added advantage is that you
can actually get serial output at an earlier phase/stage, in case some
of your boot blocks don't work.  More specifically, with /boot.config
you can actually get this on the serial port (if you bang on Escape or
Enter repeatedly VERY early on in the boot process):

>> FreeBSD/i386 BOOT
Default: 0:ad(0,a)/boot/loader
boot:

But if you don't bang on keys, you won't ever see this.

Anyway, sorry for the long ramble there, but the above is how it works.
(I'm sure readers will go "My god, that is one of the best write-ups
I've seen of how the serial console/boot process stuff works, why isn't
this in the handbook!?", a question I will opt out of/not respond to).

> > What you're adjusting in FreeBSD is 1) the FreeBSD boot loader touching
> > the serial port, and 2) the FreeBSD kernel outputting to a serial port
> > (it also in

Re: Serial terminal issues

2013-06-04 Thread Jeremy Chadwick

=


This diagram should allow you to build your own cable if need be,
including a null-modem cable if you plan on doing a PC<-->PC (i.e. DB9
to DB9) connection.  I just happened to use RJ45 stuff; for DB9-to-DB9,
just follow the chart (signal/pin names) and go from there.

If you already have a cable and you aren't sure of its wiring (which is
very common -- sigh, stupid companies...), you will need to figure out
the wiring using a multimetre (continuity test is all that's needed).

> I didn't see any options in the BIOS to set the console speed (just
> address and IRQ, those are in the above). ISTR that my old mobo did
> allow to set that information, but then again, that board (Tyan Tiger)
> gave me access to the BIOS through the serial console.

This has absolutely no relevancy.

Serial port speed settings in a BIOS pertain to BIOS-level console
redirection -- that redirection is lost the instant anything (boot
loader, kernel, etc.) touches SMI and/or interrupts and starts
"fiddling" with the serial port.

What you're adjusting in FreeBSD is 1) the FreeBSD boot loader touching
the serial port, and 2) the FreeBSD kernel outputting to a serial port
(it also initialises/sets the serial port), and 3) getty et al spawning
a login prompt on the serial port.

I would point you to my "FreeBSD via serial console and PXE" document,
except there are one-offs specific to the PXE portions that are not
relevant to your situation.  The important part is that I've used
FreeBSD serial console for almost 16 years and have a very good
understanding of what works (versus what some developers say
"should" work; i.e. reality vs. pragmatism).

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: ZFS crashing while zfs recv in progress

2013-06-04 Thread Jeremy Chadwick

6GB RAM.  That's a bit shocking for something of
  this size.

Moving on.

Can you tell me what exact disk (e.g. daXX) in the above list you used
for swap, and what kind of both system and disk load were going on at
the time you saw the swap message?

I'm looking for a capture of "gstat -I500ms" output (you will need a
VERY long/big terminal window to capture this given how many disks you
have) while I/O is happening, as well as "top -s 1" in another window.
I would also like to see "zpool iostat -v 1" output while things are
going on, to help possibly narrow down if there is a single disk causing
the entire I/O subsystem for that controller to choke.

Next: are you using compression or dedup on any of your filesystems?
If not, have you ever in the past?

Next: could we have your loader.conf and sysctl.conf please?

My gut feeling is that if you're doing zfs {send,recv} for "tank" --
which you are -- multiple subsystems and busses are so incredibly
overwhelmed by all the I/O and interrupts and *everything* that it's
very hard for the swap I/O time slicer to get a decent share of time to
swap something out to swap (even worse if that controller is overwhelmed
with requests).  Worse, you're using raidz2, which means even more CPU
time + calculation overhead, which means less time for other tasks
(threads).  Everything on the system -- everything! -- is fighting for
time at multiple levels.

If you could put a swap disk on a dedicated controller (and no other
disks on it), that would be ideal.  Please do not use USB for this task
(the USB stack may introduce its own set of complexities pertaining to
interrupt usage).

If all this turns out to be an "overall system overwhelmed" situation,
my advice is to cut back on the usage.  I would STRONGLY suggest in that
case a 2nd system, and split the number of disks across both.

I'm really surprised, given how many disks/etc. you have, that you didn't
choose to get an actual filer (NetApp).  I sure as hell would have.  I
really do not know why people think ZFS is a full-blown replacement for
a NetApp of this scale -- it isn't.

Anyway take what I say with a grain of salt -- really.  I'm just
throwing out thoughts/ideas as I look over everything.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: 9.1-current disk throughput stalls ?

2013-06-03 Thread Jeremy Chadwick

On Mon, Jun 03, 2013 at 03:34:26PM -0700, Jeremy Chadwick wrote:
> 7. ZFS setup is a mirror (RAID-1-like),

Should have referenced [2].

> 12. Rolling back to 8.4-STABLE (date/build unknown) apparently fixes
> your issue (I would appreciate you running the system for 72 hours
> before making this statement, and doing the *exact same things* on it
> that cause the problem with 9.1-STABLE) [2]

I should have used the word "exacerbate" instead of "cause".

> v) I really wish you would not have rolled this system back to
> 8.4-STABLE.  For anyone to debug this, we need the system in a
> consistent state.  Changing kernels/etc. 

User error while using vim (I have an awful tendency to nuke entire
lines when switching between input mode vs. navigation mode); last line
should read "Changing kernels/etc. in the middle of troubleshooting a
problem you ask for assistance with makes things very difficult".  (And
I say that knowing that rolling back as a form of testing is good, since
it can help narrow things down to a specific version or release, i.e. a
software problem).

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: 9.1-current disk throughput stalls ?

2013-06-03 Thread Jeremy Chadwick

On Mon, Jun 03, 2013 at 03:48:30PM -0600, Ross Alexander wrote:
> On Mon, 3 Jun 2013, Jeremy Chadwick wrote:
> 
> >1. There is no such thing as 9.1-CURRENT.  Either you meant 9.1-STABLE
> >(what should be called stable/9) or -CURRENT (what should be called
> >head).
> 
> >I wrote:
> >>The oldest kernel I have that shows the syndrome is -
> >>
> >>FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498:
> >>Sat May 11 00:03:15 MDT 2013
> >>toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> See above.  You're right, I shouldn't post after a 07:00 dentist's
> appt while my spouse is worrying me about the ins adjustor's report
> on the car damage :(.  Hey, I'm very fallible.  I'll try harder.
> 
> >2. Is there some reason you excluded details of your ZFS setup?
> >"zpool status" would be a good start.
> 
> Thanks for the useful hint as to what info you need to diagnose.
> 
> One of the machines ran a 5 drive zraid-1 pool (Mnemosyne).
> 
> Another was a 2 drive gmirror, in the simplest possible gpart/gmirror setup.
> (Mnemosyne-sub-1.)
> 
> The third is a 2 drive ZFS raid-1, again in the simplest possible
> gpart/gmirror manner (Aukward).
> 
> The fourth is a conceptually identical 2 drive ZFS raid-1, swapping
> to a zvol (Griffon.)
> 
> If you look on the FreeBSD wiki, the pages that say "bootable zfs
> gptzfsboot" and "bootable mirror" -
> 
> https://wiki.freebsd.org/RootOnZFS
> http://www.freebsdwiki.net/index.php/RAID1,_Software,_How_to_setup
> 
> Well, I just followed those in cookbook style (modulo device and pool
> names).  Didn't see any reason to be creative; I build for
> reliability, not performance.
> 
> Aukward is gpart/zfs raid-1 box #1:
> 
> aukward:/u0/rwa > ls -l /dev/gpt
> total 0
> crw-r-----  1 root  operator  0x91 Jun  3 10:18 vol0
> crw-r-----  1 root  operator  0x8e Jun  3 10:18 vol1
> 
> aukward:/u0/rwa > zpool list -v
> NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> ult_root      111G   108G  2.53G    97%  1.00x  ONLINE  -
>   mirror      111G   108G  2.53G         -
>     gpt/vol0     -      -      -         -
>     gpt/vol1     -      -      -         -
> 
> aukward:/u0/rwa > zpool status
>   pool: ult_root
>  state: ONLINE
>   scan: scrub repaired 0 in 1h13m with 0 errors on Sun May  5 04:29:30 2013
> config:
> 
>         NAME          STATE     READ WRITE CKSUM
>         ult_root      ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             gpt/vol0  ONLINE       0     0     0
>             gpt/vol1  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> (Yes, that machine has no swap.  Has NEVER had swap, has 16 GB and
> uses maybe 10% at max load.  Has been running 9.x since prerelease
> days, FWTW.  The ARC is throttled to 2 GB; zfs-stats says I never get
> near using even that.  It's just the box that drives the radios,
> a ham radio hobby machine.)
> 
> Griffon is also gpart/zfs raid-1 -
> 
> griffon:/u0/rwa > uname -a
>   FreeBSD griffon.cs.athabascau.ca 9.1-STABLE FreeBSD 9.1-STABLE #25 
> r251062M:
>   Tue May 28 10:39:13 MDT 2013
>   t...@griffon.cs.athabascau.ca:/usr/obj/usr/src/sys/GENERIC
>   amd64
> 
> griffon:/u0/rwa > ls -l /dev/gpt
> total 0
> crw-r-----  1 root  operator  0x7b Jun  3 08:38 disk0
> crw-r-----  1 root  operator  0x80 Jun  3 08:38 disk1
> crw-r-----  1 root  operator  0x79 Jun  3 08:38 swap0
> crw-r-----  1 root  operator  0x7e Jun  3 08:38 swap1
> 
> and the pool is fat and happy -
> 
> griffon:/u0/rwa > zpool status -v
>   pool: pool0
>  state: ONLINE
>   scan: none requested
> config:
> 
>         NAME           STATE     READ WRITE CKSUM
>         pool0          ONLINE       0     0     0
>           mirror-0     ONLINE       0     0     0
>             gpt/disk0  ONLINE       0     0     0
>             gpt/disk1  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> Note that swap is through ZFS zvol;
> 
> griffon:/u0/rwa > cat /etc/fstab
> # Device              Mountpoint  FStype  Options  Dump  Pass#
> #
> #
> /dev/zvol/pool0/swap  none        swap    sw       0     0
> 
> pool0                 /           zfs     rw       0     0
> pool0/tmp             /tmp        zfs     rw       0     0
> pool0/var   /var

Re: 9.1-current disk throughput stalls ?

2013-06-03 Thread Jeremy Chadwick

world doesn't get finished.
> I am seeing very similar behaviour on three other 9.1-current
> machines, all of which are AHCI/SATA setups, using both Seagate and WD
> disks (of random sizes and ages).  All these boxes ran fine a month
> ago.
> 
> BTW, when I do the rattle-keyboard-to-get-disks-going trick, the NFS
> daemon reports that the system clock slews badly - machine time drops
> behind wall clock time.  Something is locking the clock update off.
> 
> (Hmmm, I see I'm running a pre-5000/feature flags ZFS pool, FWTW.
> I'll run zpool upgrade, my bad.)

1. There is no such thing as 9.1-CURRENT.  Either you meant 9.1-STABLE
(what should be called stable/9) or -CURRENT (what should be called
head).

2. Is there some reason you excluded details of your ZFS setup?  "zpool
status" would be a good start.

3. Do any of your filesystems/pools have ZFS compression enabled, or
have in the past?

4. Do any of your filesystems/pools have ZFS dedup enabled, or have in
the past?

5. Does the problem go away after a reboot?

6. Can you provide smartctl -x output for both ada0 and ada1?  You will
need to install ports/sysutils/smartmontools for this.  The reason I'm
asking for this is there may be one of your disks which is causing I/O
transactions to stall for the entire pool (i.e. "single point of
annoyance").

7. Can you remove ZFS from the picture entirely (use UFS only) and
re-test?  My guess is that this is ZFS behaviour, particularly the ARC
being flushed to disk, and your disks are old/slow.  (Meaning: you have
16GB RAM + 4 core CPU but with very old disks).

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout

2013-06-03 Thread Jeremy Chadwick

On Mon, Jun 03, 2013 at 03:06:53PM +0100, Mike Pumford wrote:
> Ian Lepore wrote:
> >On Wed, 2013-05-29 at 16:21 +0200, Oliver Fromme wrote:
> >>Steven Hartland wrote:
> >>  > Have you checked your sata cables and psu outputs?
> >>  >
> >>  > Both of these could be the underlying cause of poor signalling.
> >>
> >>I can't easily check that because it is a cheap rented
> >>server in a remote location.
> >>
> >>But I don't believe it is bad cabling or PSU anyway, or
> >>otherwise the problem would occur intermittently all the
> >>time if the load on the disks is sufficiently high.
> >>But it only occurs at tags=3 and above.  At tags=2 it does
> >>not occur at all, no matter how hard I hammer on the disks.
> >>
> >>At the moment I'm inclined to believe that it is either
> >>a bug in the HDD firmware or in the controller.  The disks
> >>aren't exactly new, they're 400 GB Samsung ones that are
> >>several years old.  I think it's not uncommon to have bugs
> >>in the NCQ implementation in such disks.
> >>
> >>The only thing that puzzles me is the fact that the problem
> >>also disappears completely when I reduce the SATA rev from
> >>II to I, even at tags=32.
> >>
> >
> >It seems to me that you dismiss signaling problems too quickly.
> >Consider the possibilities... A bad cable leads to intermittent errors
> >at higher speeds.  When NCQ is disabled or limited the software handles
> >these errors pretty much transparently.  When NCQ is not limited and
> >there are many outstanding requests, suddenly the error handling in the
> >software breaks down somehow and a minor recoverable problem becomes an
> >in-your-face error.
> >
> It could also be a software bug in the way CAM handles the failure
> of NCQ commands. When command queueing is used on a SCSI drive and a
> queued command fails only that command fails. A queued command
> failure on a SATA device fails ALL currently queued commands. I've
> not looked at the code but do the SATA CAM drivers do the right
> thing here?

Quoting T13/2015-D ATA8-ACS2 WD spec:

"If an error occurs while the device is processing an NCQ command, then
the device shall return command aborted for all NCQ commands that are in
the queue and shall return command aborted for any new commands, except
a READ LOG EXT command requesting log address 10h, until the device
completes a READ LOG EXT command requesting log address 10h (i.e.,
reading the NCQ Command Error log) without error."

While I can't easily provide an answer to your question, I can tell you
that sys/dev/ahci/ahci.c does execute READ LOG EXT (command 0x2f) for
certain scenarios (the code is in function ahci_issue_recovery()).

The one person who can answer this question is mav@, who is now CC'd.

> Less commands queued makes it less likely that multiple commands
> will be in progress when a failure occurs.  A lower link rate also
> makes you more immune to signal failures.

He isn't seeing SATA-level signal/link failure; the AHCI driver would
complain about that, and those messages aren't there.  Unless, of
course, those messages are only visible when verbose booting is enabled
(I hope not).

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Corrupt GPT header on disk from twa array - fixable?

2013-06-03 Thread Jeremy Chadwick

On Mon, Jun 03, 2013 at 09:14:41AM +0200, Alban Hertroys wrote:
> 
> On Jun 3, 2013, at 1:09, Warren Block  wrote:
> 
> > On Mon, 3 Jun 2013, Alban Hertroys wrote:
> >>> 
> >>> Really, the easiest way would be to temporarily install the old RAID 
> >>> controller and copy the data off the array.
> >> 
> >> Well, that would mean I'd have to assemble the old server again, as the 
> >> controller is not compatible with the hardware in the new one. And that 
> >> would probably be unnecessary as well, since I already did copy the data 
> >> off those disks.
> >> 
> >> I was just curious whether it would be possible to read that data off the 
> >> disks while I still have them (with their original contents) in the new 
> >> server in the eventuality that I _did_ forget to copy something over or 
> >> that something wasn't copied over correctly.
> >> 
> >> I copied the data over a 100MBit ethernet link, which was the fastest 
> >> option I had with the old server; it had USB1 and no native SATA. Hence 
> >> the RAID controller, but that was on a now deprecated PCI-X channel (those 
> >> 64-bit parallel things) and all 4 ports were in use. Not to mention that 
> >> the CPU was so old that it had a rather narrow margin for operating 
> >> temperatures and overheated several times during the copying process, 
> >> because rsync+sshd put a relatively high load on the CPU (An old Athlon XP 
> >> 2000+).
> > 
> > PCI-X cards will operate in PCI slots.  Or at least some will; I've done 
> > that with an Intel network card.  The motherboard can't have components 
> > that block the unused part of the edge connector, or the offending card 
> > edge could be removed with extreme prejudice.
> 
> Not this 3Ware card. I remember buying that particular motherboard because 
> the card wouldn't fit in the PCI slots on the board I had. There's a division 
> in those PCI-X slots opposite of where there's one in normal PCI slots and no 
> groove in the card to match the division in the PCI slot.

This is all beside the point, but to clarify: please see the following
diagram:

http://en.wikipedia.org/wiki/File:PCI_Keying.png

I recommend seeing the caption under the diagram, in addition to reading
the "Mixing of 32-bit and 64-bit PCI cards in different width slots"
section:

http://en.wikipedia.org/wiki/PCI-X

It sounds like your 3Ware card is 5V PCI-X (32-bit or 64-bit is
irrelevant), and your new motherboard only supports 3.3V PCI (which is
pretty much the norm on all motherboards today when it comes to classic
PCI).

The 5V stuff is generally shunned (both with regards to PCI and PCI-X)
and is uncommon at this point in time.
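
The keying rules in that diagram boil down to a small table, sketched below.  It covers only the voltage notches (whether the card physically seats), not bus width or whether the card then actually works; the function name card_fits is made up for illustration.

```shell
# card_fits CARD_KEY SLOT_KEY: exit status 0 if the card's voltage keying
# lets it seat in the slot.  Hypothetical helper, voltage keying only.
#   CARD_KEY: 5v | 3.3v | universal      SLOT_KEY: 5v | 3.3v
card_fits() {
    case "$1:$2" in
        universal:5v|universal:3.3v) return 0 ;;   # notched for both slots
        5v:5v|3.3v:3.3v)             return 0 ;;   # keying matches
        *)                           return 1 ;;   # slot key blocks the card
    esac
}
```

For example, "card_fits 5v 3.3v" fails, which is the situation described above: a 5V-keyed card and a board that only has 3.3V slots.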

You can find some server-class boards that offer this capability, such
as Supermicro's UIO slots, where you purchase the proper type of "riser"
(adapter) for the type of card you have (e.g. UIO->5V PCI-X 64-bit),
but you will not find this on consumer/desktop or even "enthusiast"
boards.  Example:

http://www.supermicro.com/support/resources/riser/riser.aspx

If you want to know what kind of card it is, ask 3Ware or see the user
manual.  Note that many vendors do not disclose all the relevant data in
the manual or on their site.  The relevant data: voltage (3.3V vs. 5V
vs. universal), bus width (32-bit vs. 64-bit), and, for a 64-bit card,
whether it will function in a 32-bit slot (some cards won't).
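
As a rough illustration of why those three data points matter, here is a
minimal Python sketch of the fit check.  The names (Card, Slot,
card_fits) are made up for this example, and the rule is simplified to
keying plus bus width; it says nothing about how any particular 3Ware
model actually behaves.

```python
from dataclasses import dataclass


@dataclass
class Card:
    keying: str           # "3.3V", "5V" or "universal" (hypothetical labels)
    bus_width: int        # 32 or 64
    works_in_32bit: bool  # only meaningful for 64-bit cards


@dataclass
class Slot:
    keying: str           # "3.3V" or "5V"
    bus_width: int        # 32 or 64
    overhang_clear: bool  # nothing on the board blocks the unused card edge


def card_fits(card: Card, slot: Slot) -> bool:
    """Simplified check: voltage keying first, then bus width."""
    # A universal card has both notches; anything else must match the
    # slot's key exactly.
    if card.keying != "universal" and card.keying != slot.keying:
        return False
    # A 64-bit card in a 32-bit slot needs physical clearance for the
    # overhanging edge, and the card itself must tolerate 32-bit operation.
    if card.bus_width == 64 and slot.bus_width == 32:
        return slot.overhang_clear and card.works_in_32bit
    return True


if __name__ == "__main__":
    five_volt_card = Card(keying="5V", bus_width=64, works_in_32bit=True)
    modern_slot = Slot(keying="3.3V", bus_width=32, overhang_clear=True)
    print(card_fits(five_volt_card, modern_slot))
```

The voltage test comes first on purpose: if the keying does not match,
bus width never enters into it.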

Educational footnote: AGP is another one of those standards that went
through the same nonsense (specifically 3.3V vs. 1.5V), except the
situation was worse when some card manufacturers began selling 1.5V
cards with incorrect notchings, resulting in smoke/fire when installed
in a 3.3V slot.  I have one such card, and keep it solely as a reminder
of manufacturer/vendor idiocy.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Corrupt GPT header on disk from twa array - fixable?

2013-06-02 Thread Jeremy Chadwick

 from the numbers would be 512 bytes in size).
> 
> > Finally, GPT and gmirror are combined.  That's a problematic combination 
> > because both want metadata in the last block of the drive. The new section 
> > in the Handbook about RAID1 (gmirror) describes that in the "Metadata 
> > Issues" section:
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/GEOM-mirror.html
> 
> I'm pretty sure the disks on the controller had nothing to do with gmirror 
> ever.
> 
> Gmirror is only applied to a pair of new disks that I put in the (new) server 
> to be able to copy my data over. I hadn't expected to be able to rely on 
> those original disks to be readable at all without the controller, so I 
> needed some place to store the data. I like the redundancy of a mirror, so I 
> used gmirror for (only) the new disks.

I think you're missing what Warren is telling you, because you have
multiple things going on/complexities to deal with simultaneously.

You haven't provided any details about your gmirror setup either.  All
we know at this point:

> >> GEOM_MIRROR: Device mirror/boot launched (2/2).
> >> GEOM_MIRROR: Device mirror/swap launched (2/2).
> >> GEOM_MIRROR: Device mirror/root launched (2/2).

My gut feeling is ada2 and ada3 make up the mirror, and the mirror is at
the disk level (ada2 and ada3).  I'm basing this on past evidence
presented in the thread, and having to make assumptions.  No "gmirror
status" output = we have to make assumptions.

Now, what Warren is telling you: gmirror + GPT do not play well
together.  This is a design flaw** on the part of gmirror.  If you want
to use gmirror with disks using GPT, your only solutions are to mirror
the partitions (adaXpX) and not the disk (adaX), which has its own set
of caveats, or to use the MBR scheme (and if these are 4K-sector disks,
or you plan on using those, you're even more screwed).  I will not bring
ZFS into this discussion since that also opens up a can of worms -- I'm
trying to stay focused.

The errors you see on ada4 and ada5 about the backup GPT header can be
dealt with in a different manner.

But for (again, assuming) ada2 and ada3, you will see GPT "backup header
corruption" messages indefinitely because of the above flaw.

** -- I will not get into a debate about terminology.  I am aware of the
history (which came first), and so on.  It's a flaw.  Linux md had the
same problem when GPT was introduced, and it has since been
fixed/addressed.
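
To make the overlap concrete, here is a small Python sketch of the sector
arithmetic.  It assumes 512-byte sectors and the standard 128-entry GPT
(so the backup table takes 32 sectors ahead of the backup header); the
disk and partition sizes are invented, and the function names are mine.

```python
def gpt_backup_header_lba(disk_sectors: int) -> int:
    """GPT keeps its backup header in the very last LBA of the disk."""
    return disk_sectors - 1


def gpt_last_usable_lba(disk_sectors: int) -> int:
    """Last LBA a partition may use: 1 header + 32 table sectors are reserved."""
    return disk_sectors - 34


def gmirror_metadata_lba(first_lba: int, sectors: int) -> int:
    """gmirror keeps its metadata in the last sector of whatever it mirrors."""
    return first_lba + sectors - 1


if __name__ == "__main__":
    disk = 1_000_000  # hypothetical disk size in sectors

    # Mirroring the whole disk (adaX): both want the same sector.
    print(gmirror_metadata_lba(0, disk), gpt_backup_header_lba(disk))

    # Mirroring a partition (adaXpX): the metadata lands inside the
    # partition, well clear of the GPT backup area.
    part_start, part_len = 2048, 500_000  # hypothetical partition
    print(gmirror_metadata_lba(part_start, part_len), gpt_last_usable_lba(disk))
```

With a disk-level mirror the two values in the first print are identical,
which is the whole problem.  With a partition-level mirror the metadata
sector can never be higher than the last usable LBA.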

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout

2013-05-29 Thread Jeremy Chadwick

0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Retrying command
> ..
> ..

It's worth pointing out that all of the events you provided are writes.
In my experience, historically, that has usually been the case.  If a
drive firmware screws around when handling an NCQ write, taking too long
to do something (think firmware bug), this can happen.  If that's the
case, the fact that it happens on 2 disks of the same type wouldn't
surprise me.

I've mentioned in the past that I know of a few situations where this
can happen, particularly with 4KByte sector drives, depending on how the
user set up the system.  In this case, the Samsung HD403LJ is supposedly
a 512-byte sector drive, but the drive probably complies with an older
ATA specification and thus only provides the logical sector size in ATA
IDENTIFY output, so the system must assume physical=logical
(camcontrol and smartmontools will both say something to the effect of
"512 bytes logical/physical").
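
For the curious, here is a hedged Python sketch of how the sector sizes
can be derived from the IDENTIFY data.  It follows my reading of ATA8-ACS
(word 106 for the logical/physical relationship, words 117-118 for a
non-default logical size); the function name and the sample values are
mine, not taken from camcontrol or smartmontools.

```python
from typing import Tuple


def sector_sizes(word106: int, words117_118: int = 0) -> Tuple[int, int]:
    """Return (logical, physical) sector size in bytes from IDENTIFY data."""
    logical = 512

    # Word 106 only carries information when bit 15 is clear and bit 14
    # is set.  Drives built to older specs leave it invalid, so the only
    # safe assumption is physical == logical.
    if (word106 & 0xC000) != 0x4000:
        return (logical, logical)

    # Bit 12: logical sector is longer than 256 words; the real size is
    # in words 117-118, counted in 16-bit words.
    if (word106 & 0x1000) and words117_118:
        logical = words117_118 * 2

    # Bit 13: several logical sectors per physical sector; bits 3:0 hold
    # the power-of-two exponent.
    physical = logical
    if word106 & 0x2000:
        physical = logical << (word106 & 0x000F)

    return (logical, physical)


if __name__ == "__main__":
    print(sector_sizes(0x0000))  # drive that does not report word 106
    print(sector_sizes(0x6003))  # 2^3 logical sectors per physical sector
```

The first branch is the case described above: no valid word 106 means
the tools can only report one size for both.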

I would appreciate the following:

1. smartctl -x {ada0,ada1} output using a recent version of
smartmontools (6.1 if possible please),

2. camcontrol identify {ada0,ada1} -v output (note the -v),

3. Whether or not you are running smartd(8),

4. pciconf -lvbc output.

Anecdotal story:

A lot of people forget the infamous nVidia nForce 4 vs. Maxtor NCQ issue
that circulated "PC enthusiast" sites during the mid-2000s.  Neither
company wanted to own up to the problem, blaming each other instead.
There was never any official statement made as to where the problem was,
only that nVidia updated their nForce 4 controller drivers with some
sort of workaround (details were not disclosed), and Maxtor also quietly
added a document to their website stating that you could get a firmware
from Technical Support that would address the problem as well.  I had
a combination of the two at the time, which is why I remember it.  Still to
this day nobody knows who was really responsible.  I won't get into the
whole political/societal aspects of why vendors always blame one another
rather than solve real problems.

There is no way at this time (in real-time or via loader.conf) to
disable NCQ within the AHCI driver.  It is possible to add an entry to
the AHCI quirks table for your controller that sets AHCI_Q_NONCQ, if you
want to try that.  I can give you a patch for that, but I need to see
the output from the above (4) commands first -- it may not be necessary
to try, depending on the results.

I have probably left out key/important information within this mail,
which is an indicator of how tired I have grown of seeing it come up.
:-(

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: System doesn't dump

2013-05-29 Thread Jeremy Chadwick

On Wed, May 29, 2013 at 08:41:38AM +0200, Dominic Fandrey wrote:
> I have a number of actions that reliably panic the system, such as
> performing shutdown -p (yes I'm booting into an inconsistent file
> system every time). Both with my notebook and my workstation.
> 
> However I cannot get the system to dump.
> 
> dumpdir=/var/crash
> and I've tried ada0s2b, /dev/ada0s2b, label/5swap, /dev/label/5swap and AUTO
> for dumpdev to no avail.
>
> The swap partition is 16g, the machines have 8g RAM and there's plenty
> of hard disk space available for /var/crash.
> 
> I'm looking for that secret, undocumented trigger, that makes the
> system dump if a panic occurs. Once upon a time dumping just worked
> if the swap partition was large enough. I miss those olden days.

Foremost: the fact you did not disclose your FreeBSD version (and SVN
rev if you have it) nor architecture is disappointing.  It matters more
than you think.  Please disclose it.

Onward ho...

If you have VGA console access, try dropping to db> and issuing the
command "call doadump" (possibly preceded by "panic").

If you have serial console access, there are ways to drop to ddb but it
depends on your kernel config (look for BREAK_TO_DEBUGGER and
ALT_BREAK_TO_DEBUGGER in /sys/conf/NOTES).  "Break" with serial, by the
way, means a serial-level break signal (often why I prefer
ALT_BREAK_TO_DEBUGGER).

After doing "call doadump" you should definitely see the kernel dumping
memory to swap (it gives a progress indicator of sorts).  Google for the
phrase "call doadump" and look at some of the results to get an idea of
what the output normally is during that phase, for comparison.

If you don't see such, I'm sure many of the kernel folks here can help
figure out why.

See sysctl debug.ddb.scripting.scripts for what should get automatically
done on a panic.  This may or may not be affected by ddb_enable="yes" in
rc.conf (which mandates DDB being enabled in your kernel) -- I can't
remember though, so someone else may want to comment.

If your issue is that the kernel actually *does* dump memory to swap but
that on boot-up savecore(8) doesn't recover the memory dump and populate
relevant files in /var/crash: that's a separate issue that has been
discussed for probably 10 years or longer with (to my knowledge) no
definitive explanation.  Theories presented (going off of memory here)
were that something ended up writing over parts of the "panic
metadata" on the swap disk/slice/etc. and thus savecore(8) finds
nothing.  This is why rc scripts/etc. have to make sure to look for the
swap "panic metadata" and run savecore(8) **before** issuing dumpon(8).

My opinion, others' may vary:

Stick with using dumpdev="auto" in rc.conf, assuming you have a
/etc/fstab entry of "swap" somewhere.  Swap should ideally be a
partition or slice, not something abstracted out by other layers (see
the above paragraph for why I advocate that).  My additional opinion is
that when it comes to getting a kernel dump and system configurations,
the KISS principle applies heavily.  If your system is crashing, the
last thing you want to deal with is why you can't get a kernel dump --
you could spend more time doing that than you do getting the panic info
+ debugging the actual crash.  But again, this is my own opinion and
there are legitimate other opinions as well -- I just follow what I do
because I know it works.

Likewise I always get wary of people's setups when I start seeing
labels mentioned.  *waves cane*  Screw all this newfandangled stuff.
:-)

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-28 Thread Jeremy Chadwick

On Tue, May 28, 2013 at 10:57:22AM +0300, Daniel Braniss wrote:
> 
> [...]
> > 1. r248226 in head was MFC'd to stable/9 as r248858.  Validation:
> > 
> > http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log
> > 
> > So the answer: whether or not you have that MFC in stable/9 depends on
> > what SVN rev your kernel is.
> 
> I do a svnsync then I convert to mercurial so from the svn logs I see that
> the highest rev number is 250960.
> 
> [...]
> > 
> > That "piggybacking" crap never should have been invented.  All it has
> > done is cause problems for every OS I know of (including Windows) since
> > its inception, and is also exactly why today almost all vendors I've
> > seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
> > It's an admission the "piggybacking" method doesn't work.  And may it rot
> > in hell for all I care, while simultaneously feeling very sorry for
> > those who have to suffer/deal with it.
> > 
> > This is just another reason why I've always been very picky about what
> > hardware I'd buy for server deployments.  Vendors never actually
> > disclose this crap until you've shelled out money for the hardware, by
> > which point it's too late and you're suffering.  Really great model --
> > for the pocketbook.  :/
> > 
> 
> I couldn't agree more!
> 
> [...]
> 
> in the case of the SunFire X2200, it has 4 bge ports, the
> 2nd, bge1, is only used by the ilo, it's not enabled (UP'ed),
> it doesn't have an interrupt assigned, it's, as far as I can tell,
> just annoying to have the DOWN/UP messages - unless something more sinister
> is lurking.

Does output from "ps -auxH | grep kernel/bge" show anything for
bge1?

What about "vmstat -i -a" (you might be surprised about the -a flag and
what shows up compared to just using -i).  Gut feeling says it will show
up there.  (See vmstat(8) for what -a does)

Possibly interrupt generation isn't what's "triggering" the bge(4)
device to see link going up/down; maybe this is done via some memory
mapped I/O, which would explain why "vmstat -i" shows nothing for bge1
(no interrupts ever generated).

That doesn't change the fact that the driver still is being told via
some means that link is going up/down.

Just a general FYI (probably not relevant here too much, but I often
have to point it out for younger SAs (not saying anyone here is one,
but the list is archived...)): there is a very distinct difference
between a link being physically up/down vs. administratively up/down.

With *IX ifconfig, the social assumption is that there's a 1:1
correlation between those (especially with Ethernet devices), when in
reality it depends on the device driver and all subsystems in between.
I remember quite clearly on some OSes (can't remember if BSD or Linux or
Solaris) where "ifconfig xxx down" on certain devices would still result
in packets being passed across xxx.  This used to shock me when I was
younger, but nowadays doesn't because I have a better understanding of
why.

ifconfig is just a generic tool that interfaces with a lot of things and
tries to do too much, in my opinion.  On BSD we tend to cram as much
crap into ifconfig as humanly possible, while on other OSes separate
per-device tools/utilities have been developed to segregate the
intended behaviours/desires.
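
The physical-vs-administrative distinction is visible in the ifconfig
output quoted elsewhere in this thread (flags=8802 together with
"status: active").  Below is a small Python sketch that splits the two
apart.  The flag values are the ones I know from FreeBSD's net/if.h; the
function name and the returned dictionary are invented for illustration.

```python
IFF_UP = 0x0001           # administratively up ("ifconfig xxx up")
IFF_DRV_RUNNING = 0x0040  # driver has allocated resources and is running


def link_state(flags_hex: str, media_status: str) -> dict:
    """Separate administrative state from what the PHY reports."""
    flags = int(flags_hex, 16)  # ifconfig prints the flags word in hex
    return {
        "admin_up": bool(flags & IFF_UP),
        "driver_running": bool(flags & IFF_DRV_RUNNING),
        # "status:" comes from the media layer, i.e. the physical link.
        "physical_link": media_status.strip() == "active",
    }


if __name__ == "__main__":
    # Values as shown for bge1 in this thread.
    print(link_state("8802", "active"))
```

For bge1 this yields an interface that is administratively down yet has
physical link, which is exactly the combination that surprises people.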

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-27 Thread Jeremy Chadwick

On Mon, May 27, 2013 at 11:49:31PM -0700, Jeremy Chadwick wrote:
> Other question: is there any correlation between the amount of time that
> goes by between events with, say, ARP/MAC address expiry in "arp -a"?  I
> mention this because I know some of the ASF methods have historically
> shown two MAC addresses on the same physif, and I can see how this might
> confuse some stacks.

Never mind -- I thought about this more, and it's irrelevant.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-27 Thread Jeremy Chadwick

On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire 
> > > > > X2200,
> > > > 
> > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
> > > > 
> > > 
> > > bge0:  > > 0x009003> mem 
> > > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6
> > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > miibus2:  on bge0
> > > brgphy0:  PHY 1 on miibus2
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > bge1:  > > 0x009003> mem 
> > > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6
> > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > miibus3:  on bge1
> > > brgphy1:  PHY 1 on miibus3
> > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > 
> > > sf-10> ifconfig bge1
> > > bge1: flags=8802 metric 0 mtu 1500
> > > 
> > > options=8009b > > TE>
> > > ether 00:1b:24:5d:5b:be
> > > nd6 options=21
> > > media: Ethernet autoselect (100baseTX )
> > > status: active
> > > 
> > 
> > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > Do you have some network script run by cron?
> 
> no scripts.
> this port is shared with the ILO/IPMI, and back in March you fixed a problem
> that it was hanging soon after it was initialized by the driver,
> (r248226 - but I'm not sure if it was ever MFC'ed).
> Initially I thought it could be caused by connections to it from other
> hosts (either via the web, or ssh) so I killed them, but it didn't help.
> without that patch the connection fails, and I don't see any DOWN/UP.

Two things:

1. r248226 in head was MFC'd to stable/9 as r248858.  Validation:

http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log

So the answer: whether or not you have that MFC in stable/9 depends on
what SVN rev your kernel is.

2. Is there some way to verify that the ASF/iLO/IPMI bits (i.e. the IPMI
firmware itself) are not shutting down bge1's PHY intentionally?  Unless
the IPMI module chooses to log something useful (e.g. "I'm doing this"),
I'm not sure how you'd figure that out.
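
Regarding point 1, a quick way to answer the question is to pull the SVN
revision out of "uname -v" and compare it against r248858.  Here is a
minimal Python sketch; it assumes the kernel was built from an SVN
checkout of stable/9 so the version string carries an "rNNNNNN" token
after the build number, and the sample string below is made up.

```python
import re
from typing import Optional

MFC_REV = 248858  # stable/9 revision of the bge(4) MFC mentioned above


def kernel_svn_rev(uname_v: str) -> Optional[int]:
    """Extract the SVN revision from a 'uname -v' string, if present."""
    match = re.search(r"#\d+\s+r(\d+)", uname_v)
    return int(match.group(1)) if match else None


def has_mfc(uname_v: str, mfc_rev: int = MFC_REV) -> Optional[bool]:
    """True/False if the revision is known, None if it cannot be determined."""
    rev = kernel_svn_rev(uname_v)
    if rev is None:
        return None
    # SVN revisions are repository-wide, so any stable/9 tree at or past
    # the MFC revision contains the change.
    return rev >= mfc_rev


if __name__ == "__main__":
    # Hypothetical string; only the revision number comes from this thread.
    sample = "FreeBSD 9.1-STABLE #0 r250960: Tue May 28 09:00:00 IDT 2013"
    print(kernel_svn_rev(sample), has_mfc(sample))
```

A None result simply means the revision is not in the string (for
example, a kernel built from a tree without SVN metadata), in which case
you would have to check the source tree itself.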

Other question: is there any correlation between the amount of time that
goes by between events with, say, ARP/MAC address expiry in "arp -a"?  I
mention this because I know some of the ASF methods have historically
shown two MAC addresses on the same physif, and I can see how this might
confuse some stacks.

That "piggybacking" crap never should have been invented.  All it has
done is cause problems for every OS I know of (including Windows) since
its inception, and is also exactly why today almost all vendors I've
seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
It's an admission the "piggybacking" method doesn't work.  And may it rot
in hell for all I care, while simultaneously feeling very sorry for
those who have to suffer/deal with it.

This is just another reason why I've always been very picky about what
hardware I'd buy for server deployments.  Vendors never actually
disclose this crap until you've shelled out money for the hardware, by
which point it's too late and you're suffering.  Really great model --
for the pocketbook.  :/

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Fri, May 24, 2013 at 02:47:20PM +0900, YongHyeon PYUN wrote:
> On Thu, May 23, 2013 at 09:49:19PM -0700, Jeremy Chadwick wrote:
> > On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote:
> > > On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
> > > > On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
> > > > > If someone wants me to test DHCP via fxp(4) on the above system (I can
> > > > > do so with both NICs), just let me know; it should only take me half 
> > > > > an
> > > > > hour or so.
> > > > > 
> > > > > I'll politely wait for someone to say "please do so" else won't 
> > > > > bother.
> > > > > 
> > > > 
> > > > For the sake of completeness...
> > > > 
> > > > "Please do so."  :)
> > > 
> > > Issue reproduced 100% reliably, even within sysinstall.
> > >
> > > {snip} 
> > 
> > Forgot to add:
> > 
> > This issue ONLY happens when using DHCP.
> > 
> > Statically assigning the IP address works fine; fxp0 goes down once,
> > up once, then stays up indefinitely.
> 
> I asked Mike to try backing out dhclient(8) change (r247336) but it
> seems he missed that. Jeremy, could you try that?
> 
> I guess dhclient(8) does not like flow-control negotiation of
> fxp(4) after link establishment.

I can't test anything without an ISO -- the system in question is truly
"bare-bones" (no hard disk, can't boot USB memsticks, etc.).  I'm not a
good test subject for changes on this one, I'm sorry to say.  :-(

If there's some way to disable flow-control negotiation in fxp(4) or
miibus(4) via loader, I can try that, but I don't know what the MIB
name would be.


If r247336 turns out to be the cause: ironic, as r247336 references PR
166656, which was tested against -- wait for it -- xl(4).

People in *this* thread are saying "screw legacy hardware" yet the PR is
for something as old as the 3C905B?  Maybe I should bow out of this
thread before I have an aneurysm.


-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Fri, May 24, 2013 at 01:24:24AM -0400, Glen Barber wrote:
> Speaking entirely on behalf of myself now...
> 
> On Thu, May 23, 2013 at 10:11:39PM -0700, Jeremy Chadwick wrote:
> > > I think this will likely be included in errata notes for the release.
> > 
> > I urge you to meet with others in Release Engineering and discuss this
> > fully.  This is major enough that, once fixed, it warrants an immediate
> > binary update (to the kernel + if_fxp.ko) pushed out via freebsd-update.
> > 
> 
> It can be solved with a -pN update after 8.4-RELEASE is out.

Not that I'm calling the shots or anything, but:

Let's go with that, combined with an included mention in the Errata
section of the Release Notes as you initially mentioned.

Sorry I can't be of more help; Charles' environment sounds like it would
be better-suited for testing, and I'm sure Michael can test out a patch
if/when someone gets around to poking at things.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Fri, May 24, 2013 at 12:56:20AM -0400, Glen Barber wrote:
> On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote:
> > [...]
> > So if someone wants to take a stab at this, they'll need to do so and
> > make me an ISO.  Sorry that I can't make things easier.  :-(
> > 
> > This definitely needs to get fixed before 8.4-RELEASE.
> > 
> 
> *sigh*
> 
> At this point, it is highly unlikely this will be fixed before
> 8.4-RELEASE.  We are _far_ too deep into the release cycle.  In fact, we
> are effectively done with the release, and waiting on release notes to
> be completed.
> 
> I think this will likely be included in errata notes for the release.

I urge you to meet with others in Release Engineering and discuss this
fully.  This is major enough that, once fixed, it warrants an immediate
binary update (to the kernel + if_fxp.ko) pushed out via freebsd-update.

fxp(4) is a commonly-used driver; it isn't something rare/uncommon.

Also remember at this stage we don't know if it's a specific PHY model
or specific NIC model (or series) which triggers it.  For all we know it
could affect everything that fxp(4) drives.

Please don't forget that FreeBSD has a very well-established history of
having rock-solid Intel NIC support.  Sure, mistakes happen, we're
human, bugs get introduced, but this does not bode well -- meaning I
would expect Slashdot et al to pick up on this.

> It is very unfortunate that this waited so long to be reported, as much
> time has passed since 8.4-BETA1...

This is what happens when people socially proliferate the belief that
"RELEASE is rock solid/stable, don't run stable/X" -- the number of
people who test what changes between RELEASE builds is vastly smaller
comparatively.  I've only been saying this for the past 15 years, so
it's even more unfortunate that people keep believing it.  :/

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote:
> On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
> > On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
> > > If someone wants me to test DHCP via fxp(4) on the above system (I can
> > > do so with both NICs), just let me know; it should only take me half an
> > > hour or so.
> > > 
> > > I'll politely wait for someone to say "please do so" else won't bother.
> > > 
> > 
> > For the sake of completeness...
> > 
> > "Please do so."  :)
> 
> Issue reproduced 100% reliably, even within sysinstall.
>
> {snip} 

Forgot to add:

This issue ONLY happens when using DHCP.

Statically assigning the IP address works fine; fxp0 goes down once,
up once, then stays up indefinitely.

I also tested network I/O in the statically-assigned scenario.  Pinging
the box from another machine on the LAN:

$ ping 192.168.1.192
PING 192.168.1.192 (192.168.1.192): 56 data bytes
64 bytes from 192.168.1.192: icmp_seq=0 ttl=64 time=0.180 ms
64 bytes from 192.168.1.192: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 192.168.1.192: icmp_seq=2 ttl=64 time=0.214 ms
64 bytes from 192.168.1.192: icmp_seq=3 ttl=64 time=0.165 ms
64 bytes from 192.168.1.192: icmp_seq=4 ttl=64 time=0.114 ms
^C
--- 192.168.1.192 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.114/0.162/0.214/0.034 ms

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
> On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
> > If someone wants me to test DHCP via fxp(4) on the above system (I can
> > do so with both NICs), just let me know; it should only take me half an
> > hour or so.
> > 
> > I'll politely wait for someone to say "please do so" else won't bother.
> > 
> 
> For the sake of completeness...
> 
> "Please do so."  :)

Issue reproduced 100% reliably, even within sysinstall.

ISO image used:

ftp://ftp4.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/8.4/FreeBSD-8.4-RC3-i386-disc1.iso

I just chose to Configure the system, selected Networking, chose NO to
the IPv6 configuration choice, and YES to the DHCP configuration choice,
then hit Alt-F2 to watch relevant output.

This was the result:

http://imgbin.org/index.php?page=image&id=13718

...with the fxp0 physif up/down messages continuing indefinitely.

fxp0 on the system is the Intel 82559.  Shot of console's dmesg:

http://imgbin.org/index.php?page=image&id=13720

Nothing is connected to fxp1.

Key points for those asking me to help debug:

- I only have VGA console on this box
- I do not have an IDE hard disk of any sort for temporary OS
  installation, setup, kernel testing, etc..
- The system cannot boot USB media of any sort, so memsticks are out
- The ATAPI drive is CD-only; there is no DVD support, so there's no
  easy way to get a "real" shell with full utilities (i.e. "Fixit")

So if someone wants to take a stab at this, they'll need to do so and
make me an ISO.  Sorry that I can't make things easier.  :-(

This definitely needs to get fixed before 8.4-RELEASE.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Thu, May 23, 2013 at 11:13:03PM -0400, Glen Barber wrote:
> On Thu, May 23, 2013 at 08:03:51PM -0700, Jeremy Chadwick wrote:
> > On Thu, May 23, 2013 at 09:21:17PM -0400, Glen Barber wrote:
> > > On Thu, May 23, 2013 at 06:09:43PM -0700, Jeremy Chadwick wrote:
> > > > On Thu, May 23, 2013 at 08:18:33PM -0400, Michael L. Squires wrote:
> > > > > I've just tested 8.4-RC3 using a different Supermicro 1U box with a 
> > > > > fresh
> > > > > installation of 8.4-RC3.  I had problems with the installation, 
> > > > > wouldn't
> > > > > boot until I used a Windows 98 FDISK to write a master boot record
> > > > > (no idea why; this system uses an Adaptec SATA 1.5 6-channel PCI-X
> > > > > board with two
> > > > > drives in RAID 1).
> > > > > 
> > > > > Using the em0 interface there are no problems with DHCP; when I
> > > > > switch to the fxp0 interface the interface starts going up/down in
> > > > > the same manner as reported.
> > > > > 
> > > > > The problem appears associated with "world", not with the kernel 
> > > > > (running
> > > > > the 8.4 kernel with the 8.3 world does not have this problem).
> > > > > 
> > > > > This motherboard is an X5DPL-iGM with 2 Xeon 2.8GHz CPUs and 4 GB of 
> > > > > RAM.
> > > > > The other unit (an earlier board) has a Serverworks chipset with a 
> > > > > single
> > > > > Xeon CPU but also with a 100Mbit Intel Pro100 Ethernet port and a 
> > > > > 1000Mbit
> > > > > Intel Pro1000 Ethernet port.
> > > > > 
> > > > > This unit isn't doing anything useful, so testing isn't a problem.
> > > > 
> > > > Mike, Yong-Hyeon asked you a very important question which you didn't
> > > > answer:
> > > > 
> > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073458.html
> > > > 
> > > > If you assign a static IP address, does fxp0 behave properly?
> > > > 
> > > > I'm also re-adding Yong-Hyeon to the CC list here.
> > > > 
> > > 
> > > At this point, I am not convinced we have a problem with what will turn
> > > out to be 8.4-RELEASE.
> > > 
> > > There have been several attempts to ensure the upgraded version is
> > > actually 8.4-RC3 (and again, 'uname -a' is not provided in this
> > > email...).
> > > 
> > > I find it very hard to believe that we have exactly one fxp(4) user
> > > upgrading to 8.4-*.
> > > 
> > > I'd really like to make sure that this is not an issue that will affect
> > > an uncountable number of users, but truthfully, at this point have to
> > > consider it a local configuration problem.
> > 
> > I have numerous Supermicro 1U boxes sitting in my garage from closing
> > down my hosting organisation back in August 2012.  I am certain one or
> > two of them have Intel NICs that use fxp(4) -- the problem is that I
> > don't know what exact NIC and PHY model they use.
> > 
> > From what I can tell, there are at least two systems Mike has which
> > experience this anomaly.  One of those systems' dmesg:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html
> > 
> > The relevant lines start at "fxp0:" and go all the way down to
> > "pci0:0:8:0: bad VPD cksum, remain 14".  I'm not sure if
> > the bad VPD checksum message is relevant to the fxp0 device or not.
> > 
> > The 2nd system is mentioned above/in this post:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073530.html
> > 
> > But there's no verbose dmesg etc. for the 2nd system so I don't know if
> > it has the same NIC/PHY.
> > 
> 
> My understanding from the start of this thread is that "both" machines
> are actually the same machine, but with different combinations of
> userland/kernel.  (No, not arguing anything - only one person can answer
> if my understanding is correct or not.)


> > The model of NIC and PHY matters greatly; most users don't seem to
> > realise how important this is, they think in terms of "Intel vs.
> > Broadcom vs. Realtek".
> > 
> > Output from "pciconf -lvbc", specifically the lines relevant to the fxp0
> > device, from both systems, would be highly beneficial.
> > 
> >

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Thu, May 23, 2013 at 09:21:17PM -0400, Glen Barber wrote:
> On Thu, May 23, 2013 at 06:09:43PM -0700, Jeremy Chadwick wrote:
> > On Thu, May 23, 2013 at 08:18:33PM -0400, Michael L. Squires wrote:
> > > I've just tested 8.4-RC3 using a different Supermicro 1U box with a fresh
> > > installation of 8.4-RC3.  I had problems with the installation, wouldn't
> > > boot until I used a Windows 98 FDISK to write a master boot record
> > > (no idea why; this system uses an Adaptec SATA 1.5 6-channel PCI-X
> > > board with two
> > > drives in RAID 1).
> > > 
> > > Using the em0 interface there are no problems with DHCP; when I
> > > switch to the fxp0 interface the interface starts going up/down in
> > > the same manner as reported.
> > > 
> > > The problem appears associated with "world", not with the kernel (running
> > > the 8.4 kernel with the 8.3 world does not have this problem).
> > > 
> > > This motherboard is an X5DPL-iGM with 2 Xeon 2.8GHz CPUs and 4 GB of RAM.
> > > The other unit (an earlier board) has a Serverworks chipset with a single
> > > Xeon CPU but also with a 100Mbit Intel Pro100 Ethernet port and a 1000Mbit
> > > Intel Pro1000 Ethernet port.
> > > 
> > > This unit isn't doing anything useful, so testing isn't a problem.
> > 
> > Mike, Yong-Hyeon asked you a very important question which you didn't
> > answer:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073458.html
> > 
> > If you assign a static IP address, does fxp0 behave properly?
> > 
> > I'm also re-adding Yong-Hyeon to the CC list here.
> > 
> 
> At this point, I am not convinced we have a problem with what will turn
> out to be 8.4-RELEASE.
> 
> There have been several attempts to ensure the upgraded version is
> actually 8.4-RC3 (and again, 'uname -a' is not provided in this
> email...).
> 
> I find it very hard to believe that we have exactly one fxp(4) user
> upgrading to 8.4-*.
> 
> I'd really like to make sure that this is not an issue that will affect
> an uncountable number of users, but truthfully, at this point have to
> consider it a local configuration problem.

I have numerous Supermicro 1U boxes sitting in my garage from closing
down my hosting organisation back in August 2012.  I am certain one or
two of them have Intel NICs that use fxp(4) -- the problem is that I
don't know what exact NIC and PHY model they use.

From what I can tell, there are at least two systems Mike has which
experience this anomaly.  One of those systems' dmesg:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html

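The relevant lines start at "fxp0:" and go all the way down to
"pci0:0:8:0: bad VPD cksum, remain 14".  I'm not sure if the bad VPD
checksum message is relevant to the fxp0 device or not.

The 2nd system is mentioned above/in this post:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073530.html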

But there's no verbose dmesg etc. for the 2nd system so I don't know if
it has the same NIC/PHY.

The model of NIC and PHY matters greatly; most users don't seem to
realise how important this is; they think in terms of "Intel vs.
Broadcom vs. Realtek".

Output from "pciconf -lvbc", specifically the lines relevant to the fxp0
device, from both systems, would be highly beneficial.
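
A minimal sketch of one way to cut out just those lines, assuming the
usual pciconf layout of one unindented header line per device followed
by indented detail lines.  The helper name pci_block is made up for
this example; only pciconf and awk are real tools here.

```sh
# Print only the pciconf(8) block for one device (for example fxp0).
# pci_block is a hypothetical helper name, not part of FreeBSD.
pci_block() {
    awk -v dev="$1" '
        # A non-indented line starts a new device block; remember
        # whether that block belongs to the device we want.
        /^[^ \t]/ { show = (index($0, dev "@") == 1) }
        # Print every line while we are inside the wanted block.
        show
    '
}

# Usage on the affected machine:
#   pciconf -lvbc | pci_block fxp0
```

Running it on each system and pasting the result into a reply would
cover the request above.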

In the meantime, I'll head down to my garage to see if I can find those
fxp(4) boxes and see if they're 82551s (I sure hope I haven't pulled the
CPUs/RAM from them).  If I find a match, I can try to reproduce this.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Apparent fxp regression in FreeBSD 8.4-RC3

2013-05-23 Thread Jeremy Chadwick

On Thu, May 23, 2013 at 08:18:33PM -0400, Michael L. Squires wrote:
> I've just tested 8.4-RC3 using a different Supermicro 1U box with a fresh
> installation of 8.4-RC3.  I had problems with the installation, wouldn't
> boot until I used a Windows 98 FDISK to write a master boot record
> (no idea why; this system uses an Adaptec SATA 1.5 6-channel PCI-X
> board with two
> drives in RAID 1).
> 
> Using the em0 interface there are no problems with DHCP; when I
> switch to the fxp0 interface the interface starts going up/down in
> the same manner as reported.
> 
> The problem appears associated with "world", not with the kernel (running
> the 8.4 kernel with the 8.3 world does not have this problem).
> 
> This motherboard is an X5DPL-iGM with 2 Xeon 2.8GHz CPUs and 4 GB of RAM.
> The other unit (an earlier board) has a Serverworks chipset with a single
> Xeon CPU but also with a 100Mbit Intel Pro100 Ethernet port and a 1000Mbit
> Intel Pro1000 Ethernet port.
> 
> This unit isn't doing anything useful, so testing isn't a problem.

Mike, Yong-Hyeon asked you a very important question which you didn't
answer:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073458.html

If you assign a static IP address, does fxp0 behave properly?

I'm also re-adding Yong-Hyeon to the CC list here.
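
For the static-address test asked about above, a minimal /etc/rc.conf
sketch follows.  The address, netmask and router are placeholders from
the RFC 5737 documentation range, not values from this thread; use
whatever is valid on the local network.

```sh
# /etc/rc.conf fragment: static address on fxp0 instead of DHCP.
# All addresses below are placeholders (assumptions), not real values.
#ifconfig_fxp0="DHCP"
ifconfig_fxp0="inet 192.0.2.10 netmask 255.255.255.0"
defaultrouter="192.0.2.1"
```

If the link stays up with this in place, that points towards the DHCP
client side rather than the driver.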

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Swap Warning Message?

2013-05-23 Thread Jeremy Chadwick

On Thu, May 23, 2013 at 07:55:20AM -0500, Michael Gass wrote:
> Updated 9.1 to 9 stable on an old PII with 256 MB of memory.
> (FreeBSD runs fine on this machine).  After updating have
> been getting the following warning on startup:
> 
> warning: total configured swap (524288 pages) exceeds maximum recommended
> amount (497056 pages).
> warning: increase kern.maxswzone or reduce amount of swap space.
> 
> I allocated 2.0 GB of swap when I installed.  This was not a problem
> in the past. 
> 
> Should I ignore this warning or do I need to do something?

Taken from my /boot/loader.conf:

# Set kern.maxswzone to 0 to squelch "total configured swap exceeds
# maximum recommended amount" warning, even with maxpages/2 fix.
# http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/thread.html#69301
#
kern.maxswzone="0"

Given the small amount of memory on your system, I would suggest using
the above /boot/loader.conf setting, since your system is very likely
to make use of lots of swap; decreasing swap space in your case
seems downright silly.
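
For scale, the two page counts in the warning can be converted to MiB.
This is only a sketch, assuming 4 KiB pages (the i386 page size); the
helper name pages_to_mib is made up for the example.

```sh
# Convert a page count to MiB, assuming 4 KiB pages.
# pages * 4 gives KiB; dividing by 1024 gives MiB (integer division).
pages_to_mib() {
    echo $(( $1 * 4 / 1024 ))
}

pages_to_mib 524288   # configured swap: prints 2048
pages_to_mib 497056   # maximum recommended amount: prints 1941
```

So the configured 2.0 GB of swap is roughly 100 MiB over the recommended
maximum, which is why the warning appears.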

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |


Re: OpenSSH in -STABLE

2013-05-21 Thread Jeremy Chadwick

On Tue, May 21, 2013 at 08:11:09PM -0700, Jeremy Chadwick wrote:
> ... 6.2p2 was imported to head/CURRENT on May 22nd ...

Typo on my part: this should have read May 17th, as is obvious from
svnweb.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: OpenSSH in -STABLE

2013-05-21 Thread Jeremy Chadwick

On Tue, May 21, 2013 at 11:02:27PM -0400, usa...@hushmail.com wrote:
> On Tue, 21 May 2013 22:20:08 -0400 "David Wolfskill" 
>  wrote:
> >On Tue, May 21, 2013 at 09:42:39PM -0400, usa...@hushmail.com 
> >wrote:
> >> Hi. Are there any plans to get OpenSSH 6.2 in 9-STABLE? I'd like 
> >to
> >> check out the new AES-GCM stuff without going to -CURRENT on 
> >this
> >> system. If there are no plans, is there a possibility? Thanks
> >> 
> >
> >Please refer to ports/security/openssh-portable; its Makefile says 
> >it's
> >6.2p2,1, last updated about 5 days ago.
> >
> 
> Thanks, but that wasn't what I asked about. I'm aware of the 
> version in ports.

Try freebsd-secur...@freebsd.org, I am certain you will get an answer
there.

Fact: OpenSSH 6.2p1 was imported to head/CURRENT on March 22nd, and
6.2p2 was imported to head/CURRENT on May 22nd:

http://svnweb.freebsd.org/base/head/crypto/openssh/ChangeLog

OpenSSH is such an important/key piece of software that, much like
OpenSSL, it is one that does not warrant haste when it comes to getting
MFC'd.  If you want something more recent on non-CURRENT, you will
usually be told to run the version from ports.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Unexpected reboot/crash on 8.2-RELEASE.

2013-05-18 Thread Jeremy Chadwick

On Sat, May 18, 2013 at 09:45:21PM -0400, kpn...@pobox.com wrote:
> I had an unexpected reboot of my Dell R610 today around 2:05-06pm today.
> I do not know if it crashed or if it was power cycled.
> 
> This machine is running:
> FreeBSD gunsight1.neutralgood.org 8.2-RELEASE FreeBSD 8.2-RELEASE #1: Thu Dec 
>  8 21:58:59 UTC 2011 root@:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> It's a stock 8.2-RELEASE kernel except I had to tweak it near the top of
> vfs_mountroot() to delay before attempting to mount the root filesystem.
> (Without my tweak it attempts to mount root before the USB drive is finished
> getting attached.)
> 
> The dmesg shows this at the reboot:
> mfi0: 24272 (422106527s/0x0020/info) - Patrol Read complete
> mfi0: 24273 (422172000s/0x0020/info) - Patrol Read started 
> mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete
> mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 
> 0060/1000/1f0c/1028)
> mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
> mfi0: 24321 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 
> 0060/1000/1f0c/1028)
> mfi0: 24322 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952
> 
> Does this mean the machine did not lose power? I ask because my datacenter
> had some sort of power incident and I'm not sure if the server lost power
> or not. But if the kernel message buffer from before the incident is still
> present then the machine never lost power, correct? The datacenter's power
> incident I'm told happened somewhere around the time of the reboot so I
> have to ask.
> 
> It looks like I didn't have dumps enabled. That's ... not helpful.
> 
> The machine has been stable for:
>  2:05PM  up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00
> 
> http://www.neutralgood.org/~kpn/dmesg.boot
> 
> Here's various stats I usually keep displayed. This is the last from
> before the reboot:
> http://www.neutralgood.org/~kpn/status.txt

Your system did not reboot nor did it crash.  If it had, your uptime
would not be showing 472 days.

Really, it's that simple. 

> I've got all the power savings features turned off in the BIOS and, like
> I said, the machine has been stable for all this time. However, one thing
> to note from a couple of days ago:
> 
> May 14 00:49:13 gunsight1 -- MARK --
> May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 35 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 65 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 95 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 125 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 155 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 185 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 215 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 245 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 275 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 305 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 335 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 365 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 395 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 425 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 455 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 485 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 515 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 545 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 575 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 605 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 635 SECONDS
> May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT 
> AFTER 665 SECONDS
> May 14 01:19:36 gunsight1 -- MARK --
> May 14 01:39:36 gunsight1 -- MARK --
> May 14 01:59:37 gunsight1 -- MARK --
> May 14 02:10:55 gunsight1 kernel: mfi0: 24089 (421826400s/0

Re: still mbuf leak in 9.0 / 9.1?

2013-05-18 Thread Jeremy Chadwick

On Sat, May 18, 2013 at 12:14:28PM +0200, Ronald Klop wrote:
> On Fri, 17 May 2013 19:31:01 +0200, Jeremy Chadwick  wrote:
> 
> >On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote:
> >>Hi List,
> >>I can confirm that it is the bug you mentioned steven.
> >>Here is how I found it.
> >>
> >>I recorded hourly zfskern and nfsd stats. like this.
> >>
> >>echo "PROCSTAT" >> $reportname
> >>pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname
> >>
> >>luckily it crashed this night and logged this.
> >>
> >> 1910 101508 nfsd nfsd: servicemi_switch+0x186
> >>sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1
> >>uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5
> >>arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c
> >>dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7
> >>dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67
> >>dmu_read_uio+0x3f zfs_freebsd_read+0x3e3
> >>
> >>Maybe it would be good to merge this fix into RELENG_9_1 and
> >>distribute a fix via freebsd-update what do you think?
> >>
> >>best,
> >>-dennis
> >>
> >>
> >>Am 16.05.2013 um 11:42 schrieb dennis berger:
> >>
> >>> This is indeed a ZFS+NFS system and I can see that istgt and
> >>nfs are stuck in some ZIO state. Maybe it's this.
> >>> Thank's for pointing out.
> >>>
> >>> Is it this ZFS+NFS deadlock?
> >>>
> >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> >>> @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int
> >>howto __unused)
> >>>   mutex_enter(&arc_reclaim_thr_lock);
> >>>   needfree = 1;
> >>>   cv_signal(&arc_reclaim_thr_cv);
> >>> - while (needfree)
> >>> -  msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> >>> +
> >>> + /*
> >>> +  * It is unsafe to block here in arbitrary threads, because
> >>we can come
> >>> +  * here from ARC itself and may hold ARC locks and thus risk
> >>a deadlock
> >>> +  * with ARC reclaim thread.
> >>> +  */
> >>> + if (curproc == pageproc) {
> >>> +  while (needfree)
> >>> +  msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> >>> + }
> >>>   mutex_exit(&arc_reclaim_thr_lock);
> >>>   mutex_exit(&arc_lowmem_lock);
> >>> }
> >>>
> >>> I'll try to crash our testsystem. I'll assume that stressing
> >>NFS backed with ZFS a lot might trigger this bug?
> >>>
> >>> -dennis
> >>>
> >>>
> >>> Am 16.05.2013 um 00:03 schrieb Steven Hartland:
> >>>
> >>>> - Original Message - From: "dennis berger" 
> >>>>> FreeBSD  9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec
> >>4 09:23:10 UTC 2012
> >>>>>
> >>>>>> 3. Regarding this:
> >>>>>>>> A clean shutdown isn't possible though. It hangs after vnode
> >>>>>>>> cleaning, normally you would see detaching of usb devices
> >>here, or
> >>>>>>>> other devices maybe?
> >>>>>> Please don't conflate this with your above issue.  This is almost
> >>>>>> certainly unrelated.  Please start a new thread about that
> >>if desired.
> >>>>>
> >>>>> Maybe this is a misunderstanding normally this system will
> >>shutdown cleanly, of course.
> >>>>> This hang only appears after the network problem above.
> >>>>
> >>>> If this is a ZFS system, its a known issue which is fixed in current,
> >>>> stable-9, stable-8 and the upcoming 8.4 release.
> >>>>
> >>>> If not and you have USB devices see if the following sysctl helps:
> >>>> hw.usb.no_shutdown_wait=1
> >
> >I'm sorry to say it won't happen.  The only updates that the -RELEASE
> >branches get are for security.  If you want fixes for other things, you
> >need to follow/run stables branches (i.e. stable/9), otherwise you will
> >need to wait until 9.2-RELEASE comes out.
> >
> 
> And errata notices?

Re: Command line not responding

2013-05-17 Thread Jeremy Chadwick

On Fri, May 17, 2013 at 09:49:20PM -0500, Michael Gass wrote:
> On Fri, May 17, 2013 at 11:55:13AM -0700, Jeremy Chadwick wrote:
> > On Fri, May 17, 2013 at 12:56:53PM -0500, Michael Gass wrote:
> > > Running 9.0-Stable on an i386.
> > > 
> > > Whenever I type a command at the prompt I get
> > > the output
> > > 
> > > /usr/local/lib/libintl.so.9: Undefined symbol "_ThreadRuneLocale"
> > > 
> > > and nothing else - the command will not run. Just the
> > > above output.  Commands like "ls" and "exit" work, but not much
> > > else.  This happens whether I am logged in as a user or as root.
> > > Cannot even halt the system from the command line.
> > > 
> > > Started to happen after trying to update the freetype2 port.
> > > Got an error msg while updating libXft-2.1.14.  From that point
> > > on I cannot use  the command line.
> > > 
> > > I have no idea what to try.  Any suggestions.
> > 
> 
> 
> > First provide the contents of /etc/make.conf and /etc/src.conf.
> > 
> 
> Thanks for getting back to me. Here are the contents of the two
> files.  I rebuilt the kernel last fall and have updated ports
> fairly regularly since. Things have worked fine until today when
> I tried to update ports.
> 
> # File:   make.conf
> # The ? in the below is for buildworld
> CPUTYPE?=pentium2
> # Uncomment the below for general builds.
> CFLAGS= -O -pipe
> # Uncomment the below for kernel builds.
> # COPTFLAGS= -O -pipe
> NO_PROFILE=true
> INSTALL_NODEBUG=true
> #WITHOUT_DILLO_IPV6=yes
> #WITH_DILLO_DLGUI=yes
> # added by use.perl 2013-05-17 11:04:30
> PERL_VERSION=5.12.4
> 
> # File:   src.conf
> WITHOUT_PROFILE=true
> WITHOUT_BLUETOOTH=true

These confs look generally good, meaning there isn't the "messing about"
that the other user had.  I did catch one thing, however.

Speaking strictly about CFLAGS:

This should be CFLAGS+= (plus-equals), not CFLAGS= (equals).  Otherwise
you're effectively overriding CFLAGS for everything, which could cause
issues (some portions of the build infrastructure may set or adjust the
optimiser flags to something other than -O, and you'd be forcing -O on
them anyway).  I obviously don't know if that could/would explain the
missing symbol issue, but it's still something that's erroneous and
major.  In general I recommend people *do not* tinker with CFLAGS at
all in make.conf -- it's not worth the hassle on i386/amd64 if something
goes wrong.

If you ever want to know which syntaxes to use (for example, your
CPUTYPE?= is correct, and your COPTFLAGS= is correct), review
/usr/share/examples/etc/make.conf or src/share/examples/etc/make.conf.
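
A short make.conf sketch of the three assignment operators discussed
above.  It only illustrates what each operator does in make; the flag
values are the ones from the quoted file, and whether to set CFLAGS at
all remains the judgment call described earlier.

```make
# /etc/make.conf sketch: the three assignment operators side by side.

# ?= assigns only if the variable is not already defined.
CPUTYPE?=pentium2

# += appends to whatever CFLAGS already holds.
CFLAGS+= -O -pipe

# = would replace any existing CFLAGS value outright.
#CFLAGS= -O -pipe
```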

Unrelated to all of this (just a useful comment in passing): NO_PROFILE
serves no purpose there, just keep WITHOUT_PROFILE=true in src.conf like
you have.  NO_PROFILE in make.conf would be from "old" FreeBSD days
(i.e. prior to src.conf existing).

Your src.conf looks fine.

Sorry I can't be of more help.  :-(

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: Command line not responding

2013-05-17 Thread Jeremy Chadwick

On Fri, May 17, 2013 at 12:56:53PM -0500, Michael Gass wrote:
> Running 9.0-Stable on an i386.
> 
> Whenever I type a command at the prompt I get
> the output
> 
> /usr/local/lib/libintl.so.9: Undefined symbol "_ThreadRuneLocale"
> 
> and nothing else - the command will not run. Just the
> above output.  Commands like "ls" and "exit" work, but not much
> else.  This happens whether I am logged in as a user or as root.
> Cannot even halt the system from the command line.
> 
> Started to happen after trying to update the freetype2 port.
> Got an error msg while updating libXft-2.1.14.  From that point
> on I cannot use  the command line.
> 
> I have no idea what to try.  Any suggestions.

First provide the contents of /etc/make.conf and /etc/src.conf.

The _ThreadRuneLocale thing has come up before, but on -CURRENT circa
early 2012.  It happened to a user when trying to build kernel (really)
and that user was tinkering about in make.conf and src.conf heavily,
messing with Clang.  I personally remove Clang from my systems entirely
for many reasons, by simply doing WITHOUT_CLANG=true in src.conf and
thus rely entirely on gcc.

My recommendation, and this isn't going to make you happy:

Boot into single-user, mount your filesystems, and try commands there,
in hopes that they work.  If they do:

pkg_delete -a -f
cp -pR /usr/local /usr/local.old
rm -fr /usr/local/*
reboot

Boot into multi-user, log in, and things should be fine.  Next:

rm -fr /var/db/ports/*
rm -fr /usr/ports/distfiles/*
find /usr/ports -type d -name "work" -prune -exec rm -fr {} \;

Now begin rebuilding your ports.  If you prefer to use packages, go
right ahead, given that this was just announced a few days ago:

http://lists.freebsd.org/pipermail/freebsd-announce/2013-May/001476.html

But I tend to build everything from source, barring large-ish packages
(things like cmake, python27, perl) which I pkg_add -r.

My attitude has always been when something catastrophic impacts a very
large number of commands (particularly a library with a missing symbol
that a very large number of programs link to), start fresh.  It's
not worth scrambling around with leftover cruft in place that could
appear months later and make you say "I thought I fixed that!", where
you then have to follow up to a thread months old and admit "actually
there is more breakage..."

Footnote: I am likely to get a large amount of backlash for proposing
the above, with claims that will equate it to fixing a minor cut by
amputating the entire limb.  My response to such: that's nice.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: still mbuf leak in 9.0 / 9.1?

2013-05-17 Thread Jeremy Chadwick

On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote:
> Hi List,
> I can confirm that it is the bug you mentioned steven.
> Here is how I found it.
> 
> I recorded hourly zfskern and nfsd stats. like this.
> 
> echo "PROCSTAT" >> $reportname
> pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname
> 
> luckily it crashed this night and logged this.
> 
>  1910 101508 nfsd nfsd: servicemi_switch+0x186 
> sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1 
> uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec 
> arc_read+0x93 dbuf_prefetch+0x12c dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 
> dbuf_read+0x4a7 dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67 
> dmu_read_uio+0x3f zfs_freebsd_read+0x3e3 
> 
> Maybe it would be good to merge this fix into RELENG_9_1 and distribute a fix 
> via freebsd-update what do you think?
> 
> best,
> -dennis
> 
> 
> Am 16.05.2013 um 11:42 schrieb dennis berger:
> 
> > This is indeed a ZFS+NFS system and I can see that istgt and nfs are stuck 
> > in some ZIO state. Maybe it's this. 
> > Thank's for pointing out. 
> > 
> > Is it this ZFS+NFS deadlock?
> > 
> > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c 
> > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c 
> > @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused) 
> > mutex_enter(&arc_reclaim_thr_lock); 
> > needfree = 1; 
> > cv_signal(&arc_reclaim_thr_cv); 
> > -   while (needfree) 
> > -msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); 
> > + 
> > +   /* 
> > +* It is unsafe to block here in arbitrary threads, because we can come 
> > +* here from ARC itself and may hold ARC locks and thus risk a deadlock 
> > +* with ARC reclaim thread. 
> > +*/ 
> > +   if (curproc == pageproc) { 
> > +while (needfree) 
> > +msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); 
> > +   } 
> > mutex_exit(&arc_reclaim_thr_lock); 
> > mutex_exit(&arc_lowmem_lock); 
> > }
> > 
> > I'll try to crash our testsystem. I'll assume that stressing NFS backed 
> > with ZFS a lot might trigger this bug?
> > 
> > -dennis
> > 
> > 
> > Am 16.05.2013 um 00:03 schrieb Steven Hartland:
> > 
> >> - Original Message - From: "dennis berger" 
> >>> FreeBSD  9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 
> >>> UTC 2012
> >>> 
> >>>> 3. Regarding this:
> >>>>>> A clean shutdown isn't possible though. It hangs after vnode
> >>>>>> cleaning, normally you would see detaching of usb devices here, or
> >>>>>> other devices maybe?
> >>>> Please don't conflate this with your above issue.  This is almost
> >>>> certainly unrelated.  Please start a new thread about that if desired.
> >>> 
> >>> Maybe this is a misunderstanding normally this system will shutdown 
> >>> cleanly, of course.
> >>> This hang only appears after the network problem above.
> >> 
> >> If this is a ZFS system, its a known issue which is fixed in current,
> >> stable-9, stable-8 and the upcoming 8.4 release.
> >> 
> >> If not and you have USB devices see if the following sysctl helps:
> >> hw.usb.no_shutdown_wait=1

I'm sorry to say it won't happen.  The only updates that the -RELEASE
branches get are for security.  If you want fixes for other things, you
need to follow/run a stable branch (i.e. stable/9), otherwise you will
need to wait until 9.2-RELEASE comes out.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Mountain View, CA, US|
| Making life hard for others since 1977. PGP 4BD6C0CB |

Re: revision higher than 250508 breaks webcam support

2013-05-16 Thread Jeremy Chadwick

On Thu, May 16, 2013 at 08:38:39PM -0700, Adrian Chadd wrote:
> Are you able to narrow down the specific commit along 9-stable that broke it?
> 
> Thanks!
> 
> 
> 
> Adrian
> 
> On 16 May 2013 18:00, Jože Zobec wrote:
> > Sorry, for waiting this long to post this problem, I thought it would be 
> > dealt with this week, but since it wasn't better to report it now. I hope 
> > this is the right mailing list for this particular problem.
> >
> > I am running FreeBSD 9.1-STABLE and using Logitech Webcam C525. It's not 
> > listed amongst the supported hardware, but it was working perfectly until 
> > the updates that came this Sunday, 2013-05-12.
> >
> > The problem I'm getting is this:
> >
> > I keep getting this error message from the kernel, if I'm using 9.1-STABLE 
> > r250707
> >
> > ...
> > pcm6: detached
> > ugen7.2:  at asbus7
> > uaudio0:  
> > on usbus7
> > uaudio0: No playback.
> > uaudio0: Record: 48000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer.
> > uaudio0: Record: 32000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer.
> > uaudio0: Record: 24000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer.
> > uaudio0: Record: 16000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer.
> > uaudio: No MIDI squencer.
> > pcm6:  on uaudio0
> > uaudio0: No HID volume keys found.
> > ugen7.2:  at usbus7 (disconnected)
> > uaudio0: at uhub7, port4, addr 2 (disconnected)
> > pcm6: detached
> > ...
> >
> > This message is displayed periodically "ad infinitum" or at least until I 
> > unplug the webcam. It stays this way, even if I use the GENERIC kernel. In 
> > a "healthy" case, revision 250508, kernel message upon plugging the webcam, 
> > is
> >
> > ...
> > ugen7.2:  at usbus7
> > uaudio0:  
> > on usbus7
> > uaudio: No playback.
> > uaudio: Record: 48000 Hz, 1 ch, 16 bit S-LE PCM format, 2x8ms buffer.
> > uaudio: No MIDI sequencer.
> > pcm6:  on uaudio0
> > uaudio0: No HID volume keys found.
> >
> > And there it stops, and the webcam works in Skype.

Note: I told Joe to mail freebsd-usb@ about this, since it looks like it
pertains to the USB stack, and Hans tends to respond to stuff there.

That said...

Looking at commits between r250508 and r250707, my gut says it's very
likely one of these (with the most probable being marked with arrows):

http://www.freshbsd.org/commit/freebsd/r250581
http://www.freshbsd.org/commit/freebsd/r250561 <---
http://www.freshbsd.org/commit/freebsd/r250560 <---
http://www.freshbsd.org/commit/freebsd/r250559

How I got that list was by manually reviewing the following:

http://www.freshbsd.org/?branch=RELENG_9&project=freebsd

So I would recommend rolling back to r250558 (the last stable/9 commit
to happen before r250559) and seeing whether things improve.  Again, my
gut feeling says that they will, and that r250561 or r250560 is
responsible.
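
If the rollback confirms that the breakage sits somewhere between a
known-good and a known-bad revision, narrowing it down is a plain
bisection over revision numbers.  A minimal sketch in C, assuming a
hypothetical callback that stands in for "build and boot this revision,
plug in the webcam, and see whether the detach loop shows up".  Neither
function is part of any FreeBSD tool, and the example predicate's choice
of r250560 is arbitrary, purely for illustration:

```c
/* Hypothetical callback: returns nonzero if the given revision misbehaves. */
typedef int (*rev_test_fn)(long rev);

/*
 * Return the first bad revision in (good, bad], assuming "good" is known
 * to work, "bad" is known to fail, and every revision after the first
 * bad one also fails.
 */
static long
first_bad_revision(long good, long bad, rev_test_fn is_bad)
{
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* failure is at mid or earlier */
		else
			good = mid;	/* failure is after mid */
	}
	return bad;
}

/* Illustration only: pretends that r250560 introduced the problem. */
static int
example_is_bad(long rev)
{
	return rev >= 250560;
}
```

Calling first_bad_revision(250508, 250707, example_is_bad) walks the
range in about eight steps instead of testing every commit.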

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: still mbuf leak in 9.0 / 9.1?

2013-05-15 Thread Jeremy Chadwick

> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > ix1: link state changed to DOWN
> > ix1: link state changed to UP
> > 
> > 
> > I should add that the servers that are directly connected to this freebsd 
> > server reboot every night. This is why you see ix0 UP/DOWN
> > messages in dmesg.
> > 
> > - END System information

1. You appear convinced that the issue is related to mbuf exhaustion,
but you haven't provided evidence that you're hitting the mbuf maximum
(in your case 262144).

What you *have* shown is your mbuf count gradually increasing (except
between 15-05-2013-13-09.txt and 15-05-2013-14-09.txt, where mbufs
almost double (!)), which could indicate a leak, but then again might
not.

If you reach mbuf maximum, then yes, network I/O can cease or stall
(possibly indefinitely).  However, broken/busted network I/O can also
happen due to other issues unrelated to mbufs, such as network stack
issues, firewall stack issues, or network driver bugs.  Are you using
pf, ipfw, or ipfilter on this system?
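
To make the distinction concrete: "the count keeps growing" and "the
count reached the limit" are separate questions.  A minimal sketch in
C, assuming the "mbufs in use" figure has been pulled out of each hourly
capture by hand.  The struct and function names are made up for
illustration and are not a FreeBSD API:

```c
#include <stddef.h>

/* Hypothetical summary of a series of mbuf-count samples. */
struct mbuf_summary {
	long peak;	/* highest count seen in the series */
	long growth;	/* last sample minus first sample */
	int hit_limit;	/* nonzero only if some sample reached the limit */
};

static struct mbuf_summary
summarize_mbuf_samples(const long *samples, size_t n, long limit)
{
	struct mbuf_summary s = { 0, 0, 0 };
	size_t i;

	if (n == 0)
		return s;
	s.peak = samples[0];
	for (i = 0; i < n; i++) {
		if (samples[i] > s.peak)
			s.peak = samples[i];
		if (samples[i] >= limit)
			s.hit_limit = 1;
	}
	s.growth = samples[n - 1] - samples[0];
	return s;
}
```

With invented samples such as { 20000, 24000, 47000 } and a limit of
262144, this reports positive growth but hit_limit == 0: possibly a
leak, but not exhaustion.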

2. I think we'd all appreciate it if you disclosed **exactly** what version
of FreeBSD you're using (Subject says "9.0 or 9.1" which is
insufficient).  Please provide "uname -a" output (you can XXX out the
hostname if you want) -- and if you're still using csup/cvsup and built
your own kernel/world, we'll need to know exactly what date your src
files were from when you rebuilt.

I'm wary of CC'ing folks who can help troubleshoot mbuf exhaustion
issues until answers to the above can be provided, as I don't want to
waste their time.

3. Regarding this:

> > A clean shutdown isn't possible though. It hangs after vnode
> > cleaning, normally you would see detaching of usb devices here, or
> > other devices maybe?

Please don't conflate this with your above issue.  This is almost
certainly unrelated.  Please start a new thread about that if desired.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Re: Build GENERIC with IPX support

2013-05-12 Thread Jeremy Chadwick

/*
 * read rc file as follows:
 * 1. read [server] section
 * 2. override with [server:user] section
 * Since absence of rcfile is not a bug, silently ignore that fact.
 * rcfile never closed to reduce number of open/close operations.
 */
int
ncp_li_readrc(struct ncp_conn_loginfo *li) {
	int i, val, error;
	char uname[NCP_BINDERY_NAME_LEN*2+1];
	char *sect = NULL, *p;

	/*
	 * if info from cmd line incomplete, try to find existing
	 * connection and fill server/user from it.
	 */
	if (li->server[0] == 0 || li->user == NULL) {
		int connHandle;
		struct ncp_conn_stat cs;

		if ((error = ncp_conn_scan(li, &connHandle)) != 0) {
			ncp_error("no default connection found", errno);
			return error;
		}

To me, this may indicate you have some kind of "ncp rc file" (I believe
this is ~/.nwfsrc according to the ncplist(1) man page) that may contain
something invalid, or maybe you lack such a file altogether (creating one
might work around the problem).

Back to the actual segfault itself: ncp_error() is pretty simple:

src/lib/libncp/ncpl_subr.c --

/*
 * Print a (descriptive) error message
 * error values:
 *         0 - no specific error code available;
 *  -999..-1 - NDS error
 *  1..32767 - system error
 *  the rest - requester error;
 */
void
ncp_error(const char *fmt, int error, ...) {
	va_list ap;

	fprintf(stderr, "%s: ", _getprogname());
	va_start(ap, error);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	if (error == -1)
		error = errno;
	if (error > -1000 && error < 0) {
		fprintf(stderr, ": dserr = %d\n", error);
	} else if (error & 0x8000) {
		fprintf(stderr, ": nwerr = %04x\n", error);
	} else if (error) {
		fprintf(stderr, ": syserr = %s\n", strerror(error));
	} else
		fprintf(stderr, "\n");
}
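
The branch order in ncp_error() decides which label gets printed.  A
sketch of just that classification in C, pulled out into a hypothetical
helper: classify_ncp_error() and the enum are not part of libncp, and
errno is passed in as a parameter so the logic can be exercised on its
own:

```c
/* Hypothetical labels for the four outcomes of ncp_error()'s branches. */
enum ncp_err_class {
	NCP_ERR_NONE,		/* prints only a newline */
	NCP_ERR_NDS,		/* prints "dserr = ..." */
	NCP_ERR_REQUESTER,	/* prints "nwerr = ..." */
	NCP_ERR_SYSTEM		/* prints "syserr = ..." */
};

static enum ncp_err_class
classify_ncp_error(int error, int saved_errno)
{
	/* Same substitution ncp_error() makes: -1 means "use errno". */
	if (error == -1)
		error = saved_errno;
	if (error > -1000 && error < 0)
		return NCP_ERR_NDS;
	if (error & 0x8000)
		return NCP_ERR_REQUESTER;
	if (error != 0)
		return NCP_ERR_SYSTEM;
	return NCP_ERR_NONE;
}
```

One consequence worth noting: if errno happens to be 0 at the time of
the call, ncp_error("no default connection found", errno) lands in the
last branch and prints the message with no error detail at all.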

What I don't understand from the calling stack is how gettimeofday() is
involved.  I have looked at the libc code, looked at the underlying
calling functions and so on (from fprintf() to vfprintf_l() and deeper),
and I don't see how or where gettimeofday() would be called.  The only
place I can think of is the locale-related code, but given what I've
looked at I doubt it (though I could still be wrong).

Have world/kernel on this system ever been rebuilt?  If they have,
were both kernel and world rebuilt together from the same source code
and not at different times?

If you're setting LANG, LC_CTYPE, LC_COLLATE, or other locale-oriented
settings in your environment (and my gut feeling is that you are), you
could try removing them and see if you get an actual useful error
message on stderr, but I'm not holding my breath.

I cannot help you with the remaining IPX-specific "stuff"; it's fairly
obvious though, as I said, that this code has been neglected.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
