Re: [HEADSUP] Re: Is IPV6 option still necessary?
On Wed, Oct 09, 2019 at 08:13:39PM -0700, Jeremy Chadwick wrote: > > Now we can get back on the ipv6 option. > > > > so if we want to proceed further in removing the option to build with or > > without > > ipv6 for the ports side. Please speak up in reply to this email, if you are > > building without ipv6, why are you doing so, what are the real benefit for > > it. > > How bad it will impact you if we do remove that option? > > Whenever I use ports over FreeBSD-provided packages (or to use ports to > build my own packages), I often disable IPV6 support. The lengthy > response below should explain why. > {brevity snip} This was sent to the wrong mailing list; was intended for -ports. Sorry for the noise. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator PGP 0x2A389531 | | Making life hard for others since 1977.| ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: [HEADSUP] Re: Is IPV6 option still necessary?
> Now we can get back on the ipv6 option.
>
> so if we want to proceed further in removing the option to build with or
> without ipv6 for the ports side. Please speak up in reply to this email, if you are
> building without ipv6, why are you doing so, what are the real benefit for it.
> How bad it will impact you if we do remove that option?

Whenever I use ports over FreeBSD-provided packages (or to use ports to build my own packages), I often disable IPV6 support. The lengthy response below should explain why.

In short: the IPV6 option is useful and important. Please keep it.

In length:

I think anyone operating in the Real World knows quite well that IPv6 is still treated as a third-class citizen when it comes to both general connectivity/reliability* and general use cases code-wise**. It's still very much in utero; or a toddler, if you will.

When you encounter IPv6 vs. IPv4 prioritisation issues, they are painful and annoying. No user or administrator is going to sit for hours fiddling with it all to restore things to a working state when simply removing IPv6 relieves the problem permanently. Time and time again I see companies advertising AAAA records and webservers listening on IPv6, yet IPv6 transit fails while their A/IPv4 endpoint works fine. It's the dual-stack nature that makes a lot of this worse than it should be.

(I do think this subject should be re-visited once the world as a whole starts to seriously decommission IPv4, though. Yes I'm serious.)

I've worked for several companies that are IPv4-only, where the belief (and one I share) is that IPv6-only clients have some 6-to-4-ish gateway/NAT somewhere upstream, otherwise they wouldn't be able to reach most of the Internet. IPv4 NAT still works for the majority of use cases as of 2019.

Furthermore, faux-political statements like "IPv6 is more widely used than 2012" should be ignored and facts reiterated: IPv6 adoption is around 25% as of mid-2019. And it's taken over 10 years to reach that. IPv4 is also well-understood, and not, as Dave Horsfall accurately described, "a horse designed by a committee"; people are still trying to wrap their heads around IPv6 NDP/RA, SLAAC, and a myriad of other things (dare I mention syntax?). It's this which explains the sluggish adoption rate.

And yes, I am well-aware of how important IPv6 is in other regions, particularly Asia. I am not belittling that need at all. But not everyone globally has the same needs.

What should really be asked for is the opposite: for the FreeBSD ports folks to justify its removal. How is this hurting you on a daily basis? Is there a large percentage of Mk/ framework bits causing you pain? Are the bulk of per-port patches inducing maintainer grief? At what scale is this impacting you? In 7 years (since the OP picked 2012), how much time has been spent by maintainers ensuring IPV6=true works for their port(s)? Are you truly OK throwing away the integration work done by many, many people (not just Project members!) over the past N years (see: per-port patches), and forcing people who still need the option to make their own ports tree to retain it?

Here's some harsh advice for the FreeBSD Project: quit changing shit for the sake of change, often masked by lies like "XXX is stagnant/old" or similarly fallacious and loaded statements. The project (both src and ports, but especially ports) has lost many very good people in the past 10+ years (and I'm not talking about me) *because* of that change-for-sake-of-change mindset -- the same mindset driving this request! It's changes like this that drive people away from FreeBSD. Really. It's the same mindset that provoked people to stop using Linux distros due to systemd integration.

I will not be replying to this thread past this point. I have said all that I care to say / spent enough time on it. Just please stop hurting administrators and end users with proposals/actions like this.

* - Real-world IPv6 failures impacting end users tend to be higher than IPv4; this is anecdotal on my part, but I have a myriad of peers who have had to disable IPv6 for similar reasons. The IPv4 fallback in software (both userland apps and network stacks) does not always work "correctly". Just go see how often IPv6 failures/issues are reported on both NANOG and the outages@ mailing list. And yes I am quite aware that a good portion of the Internet backbone at this point is IPv6 (that's nice, and not what we're talking about here).

** - I still continue to see open-source software committing major fixes to AF_INET6 related code bits. Major pieces of software include curl, wget, Busybox, DNS servers (pick one!), and ntp... just for starters.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator PGP 0x2A389531 |
| Making life ha
Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto
> I've committed a fix to head and will MFC it in a few days. Thanks
> for tracking this down!

Did HEAD r351557 get backported/MFC'd into stable/11 and stable/12? Can test stable/11 if needed. Thanks!

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator PGP 0x2A389531 |
| Making life hard for others since 1977. |
Re: Buildworld times (was Re: svn commit: r350256 - in stable/12: . contrib/compiler-rt/lib/sanitizer_common contrib/libunwind/src contrib/llvm/lib/DebugInfo/DWARF contrib/llvm/lib/MC contrib/llvm/lib
On Mon, Jul 29, 2019 at 01:44:01PM -0400, mike tancsa wrote:
> On 7/26/2019 10:38 PM, Jeremy Chadwick via freebsd-stable wrote:
> > (Please retain CCs, I am not subscribed to the list)
> >
> > Below is hard evidence of 3 things on stable/11 (not 12) after r350259:
> >
> > 1. r350259 adds *substantial* time to buildworld.
>
> Are you sure this is not the same as the issue in RELENG12 ? ie. the new
> version of clang is built as part of world since it differs from whats
> installed. I had a RELENG11 box sitting around from July 4th

By "on stable/11 (not 12)" I meant: I do not run stable/12, thus I cannot speak on its behalf.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator PGP 0x2A389531 |
| Making life hard for others since 1977. |
Re: Buildworld times (was Re: svn commit: r350256 - in stable/12: . contrib/compiler-rt/lib/sanitizer_common contrib/libunwind/src contrib/llvm/lib/DebugInfo/DWARF contrib/llvm/lib/MC contrib/llvm/lib
(Please retain CCs, I am not subscribed to the list)

Below is hard evidence of 3 things on stable/11 (not 12) after r350259:

1. r350259 adds *substantial* time to buildworld.

2. WITHOUT_CLANG_EXTRAS+WITHOUT_CLANG_FULL+WITHOUT_LLDB can help improve the situation after r350259, but it is still nowhere near as fast as pre-r350259.

3. Kernel build times are fine; issue is with world.

TL;DR for lazy folks:

  stable/11 r350330 world + minimal clang = 1:29:34
  stable/11 r350330 world + full clang    = 1:46:31
  stable/11 r350252 world + minimal clang =   56:52
  stable/11 r350252 world + full clang    = 1:14:30

I cannot even begin to tell you how big of an impact this has on my low-end dual-core VPS box (world takes hours upon hours).

We've been down this road before, many many times, since the introduction of clang/LLVM. Here are just a few threads that went nowhere. I couldn't find the more-useful one that had some concrete numbers in it, dating back to pre-2016 (maybe sometime in 2014 or 2015?):

https://lists.freebsd.org/pipermail/freebsd-current/2017-January/thread.html#64431
https://lists.freebsd.org/pipermail/freebsd-stable/2017-January/thread.html#86646
https://lists.freebsd.org/pipermail/freebsd-questions/2016-November/thread.html#274684

Does anyone have a good/recent write-up on how to switch to gcc? :-)

System
======
* Intel Core 2 Quad Q9550 @ 2.83GHz
* 8GB ECC RAM
* Samsung SSD 840 EVO 250GB filesystem (UFS2 + SU (not SUJ) + TRIM) + 32GB swap
* Running stable/11 r349226
* Misc notes
  - r350330 happened to be what was "master" at the time of my test
  - r350252 was the commit on stable/11 immediately before r350259
  - Switching to r350252 accomplished via: cd /usr/src && svnlite up -r350252
  - System uses kern.maxvnodes=856944, last tuned 2018/06/07

Test #1, building r350330 minimal clang
=======================================
# cat /etc/src.conf
WITHOUT_ATM=true
WITHOUT_BLUETOOTH=true
WITHOUT_DEBUG_FILES=true
WITHOUT_FLOPPY=true
WITHOUT_FREEBSD_UPDATE=true
WITHOUT_IPFILTER=true
WITHOUT_IPX=true
WITHOUT_LIB32=true
WITHOUT_NDIS=true
WITHOUT_NETGRAPH=true
WITHOUT_PPP=true
WITHOUT_SENDMAIL=true
WITHOUT_TESTS=true
WITHOUT_WIRELESS=true
WITH_OPENSSH_NONE_CIPHER=true
WITHOUT_CLANG_EXTRAS=true
WITHOUT_CLANG_FULL=true
WITHOUT_LLDB=true
WITHOUT_LLVM_TARGET_AARCH64=true
WITHOUT_LLVM_TARGET_ARM=true
WITHOUT_LLVM_TARGET_MIPS=true
WITHOUT_LLVM_TARGET_POWERPC=true
WITHOUT_LLVM_TARGET_SPARC=true
WITHOUT_REPRODUCIBLE_BUILD=true

# cat /etc/make.conf
KERNCONF=X7SBA_RELENG_11_amd64
CPUTYPE?=core2
SVN_UPDATE=yes
STRIP=
CFLAGS+=-fno-omit-frame-pointer

Result:

# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
19906.874u 1280.928s 1:29:33.51 394.3% 57966+778k 23504+14200io 13867pf+0w
# time make -j4 buildkernel
1592.460u 196.047s 7:36.61 391.6% 48704+614k 6627+18158io 7361pf+0w

Test #2, building r350330 full clang
====================================
"full clang" means same as Test #1 but with these 3 src.conf lines commented out, i.e. CLANG_EXTRAS, CLANG_FULL, and LLDB are ENABLED:

WITHOUT_CLANG_EXTRAS=true
WITHOUT_CLANG_FULL=true
WITHOUT_LLDB=true

Result:

# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
23779.674u 1463.156s 1:46:30.75 394.9% 57621+783k 20093+15423io 7283pf+0w
# time make -j4 buildkernel
1594.079u 194.345s 7:36.48 391.7% 48707+614k 5301+18013io 5342pf+0w

Test #3, building r350252 minimal clang
=======================================
Same configs as Test #1

Result:

# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
12582.693u 882.543s 56:52.35 394.6% 62698+760k 21432+9694io 6923pf+0w
# time make -j4 buildkernel
1649.559u 184.934s 7:48.01 391.9% 57053+622k 7566+18291io 5402pf+0w

Test #4, building r350252 full clang
====================================
Same configs as Test #2

Result:

# rm -fr /usr/obj/*
# cd /usr/src
# time make -j4 buildworld
16600.975u 1068.754s 1:14:29.53 395.3% 63271+774k 8683+10876io 4707pf+0w
# time make -j4 buildkernel
1650.654u 183.966s 7:47.47 392.4% 57117+623k 2829+17951io 1926pf+0w

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator PGP 0x2A389531 |
| Making life hard for others since 1977. |
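To put the buildworld numbers above in relative terms, here is a minimal sketch that converts the elapsed (wall-clock) field of the csh `time` output into seconds and computes the percentage increase from r350252 to r350330. The function names and the RESULTS table are illustrative only; the elapsed strings are copied from Tests #1 through #4.

```python
def parse_elapsed(text: str) -> float:
    """Convert an elapsed-time string such as '1:29:33.51' or '56:52.35' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        # Each colon-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def percent_increase(before: str, after: str) -> float:
    """Relative growth in wall-clock time, as a percentage of the 'before' value."""
    old, new = parse_elapsed(before), parse_elapsed(after)
    return (new - old) / old * 100.0


# (r350252 elapsed, r350330 elapsed) for buildworld, taken from the tests above.
RESULTS = {
    "minimal clang": ("56:52.35", "1:29:33.51"),
    "full clang": ("1:14:29.53", "1:46:30.75"),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: +{percent_increase(before, after):.1f}% buildworld wall-clock time")
```

By this arithmetic, buildworld on the test box grew by roughly 57% with the minimal clang configuration and roughly 43% with full clang. The kernel build times are not included because they are essentially unchanged between the two revisions.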
Re: /dev/crypto not being used in 12-STABLE
On Fri, Dec 07, 2018 at 06:38:04PM -0500, Jung-uk Kim wrote: > On 18. 12. 6., Jeremy Chadwick wrote: > > I'm not subscribed to -stable. > > > > This is in response to jkim@'s messages here: > > > > https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html > > https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html > > > > Based on what I can tell, OpenSSL 1.1.1 or thereabouts removed the > > cryptodev OpenSSL engine, which was a tie-in to BSD's cryptodev(4), > > which is accessed via /dev/crypto and related crypto(4) ioctls. > > > > Instead, they offered a replacement engine called devcrypto (what an > > awful name), with the primary focus being against something from Linux > > called cryptodev-linux, then was made to work on FreeBSD 8.4. This code > > was as of June 2017; 8.4 was EOL'd August 2015. Interesting. > > > > https://github.com/openssl/openssl/commit/4f79aff is not "add support > > for BSD" at all. It's "tweak further stuff for BSD", probably to get it > > to work on newer FreeBSD; they seem to care about crypto/cryptodev.h > > details. I asked myself: why do they care about that if they're doing > > it all themselves? Looking at the code sheds light on that. The actual > > devcrypto engine commits that added BSD support are here: > > > > https://github.com/openssl/openssl/pull/3744 > > https://github.com/openssl/openssl/pull/3744/files > > > > The commits indicate that the devcrypto is enabled by default on > > FreeBSD. But we can tell from Herbert's post and jkim@'s patch that's > > not true at all, i.e. FreeBSD disables it. Why? And is that a good > > default? > > Why do you think it is enabled by default? > > https://github.com/openssl/openssl/blob/619eb33/Configure#L428 Because of this commit to OpenSSL's CHANGES file, which is part of what I linked above; last sentence: https://github.com/openssl/openssl/pull/3744/files#diff-e4eb329834da3d36278b1b7d943b3bc9 *) Add devcrypto engine. 
This has been implemented against cryptodev-linux, then adjusted to work on FreeBSD 8.4 as well. Enable by configuring with 'enable-devcryptoeng'. This is done by default on BSD implementations, as cryptodev.h is assumed to exist on all of them. [Richard Levitte] Is this message incorrect/false? While I can read the perl code that is the Configure script just fine, the CHANGES entry makes me think there may be "other pieces" that affect the value of the key in that hash (e.g. some script that uses uname detection and calls Configure with argument). Are there? > Note crypto(4) was imported from OpenBSD. Since OpenBSD 4.9, it was > disabled by default. > > https://www.openbsd.org/plus49.html > > Then, they killed it in 5.7. > > https://www.openbsd.org/plus57.html > > o Unlinked the crypto(4) pseudo device (disabled by default for about 4 > years). > > Now FreeBSD is the only major BSD with /dev/crypto. That's why new > engine was not thoroughly tested. Thanks for the information. So this implies there is a desire to get rid of cryptodev(4) (which is the /dev/crypto endpoint), at least on OpenBSD. Apologies if this is off-topic, but: is "device cryptodev" something that should be removed from one's kernel config (due to what sounds like desired deprecation), while keeping "device crypto" (to ensure userland applications that use libcrypto/crypto(4) functions can still get at crypto(9))? > > Here's why I ask: > > > > The new devcrypto engine most definitely utilises /dev/crypto (thus > > cryptodev(4) and crypto(4)). cipher_init(), prepare_cipher_methods(), > > digest_init(), and prepare_digest_methods() all utilise that interface: > > > > https://github.com/openssl/openssl/pull/3744/files#diff-027f92eb0a10c0986aec873d9fd1ab66 > > > > So while OpenSSL now uses more of its own native C and assembly code > > (e.g. for AES-NI support), and that's certainly faster than all the > > overhead that cryptodev(4) brings with it (see jhb@'s post), I wonder: > > > > 1. 
What happens to people using crypto hardware accelerators, ex. > > hifn(4), padlock(4), ubsec(4), and safe(4)? How exactly would OpenSSL > > utilise these H/W accelerators if the devcrypto engine is disabled? > > padlock has a dynamic engine, i.e., /usr/lib/engines/padlock.so. I > believe glxsb, hifn(4), safe(4), and ubsec(4) users are very rare > nowadays. If we have significant number of users and they show > reasonable performance, then I will reconsider my decision. Consider me surpri
Re: /dev/crypto not being used in 12-STABLE
I'm not subscribed to -stable.

This is in response to jkim@'s messages here:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090202.html

Based on what I can tell, OpenSSL 1.1.1 or thereabouts removed the cryptodev OpenSSL engine, which was a tie-in to BSD's cryptodev(4), which is accessed via /dev/crypto and related crypto(4) ioctls.

Instead, they offered a replacement engine called devcrypto (what an awful name), with the primary focus being against something from Linux called cryptodev-linux, which was then made to work on FreeBSD 8.4. This code was as of June 2017; 8.4 was EOL'd August 2015. Interesting.

https://github.com/openssl/openssl/commit/4f79aff is not "add support for BSD" at all. It's "tweak further stuff for BSD", probably to get it to work on newer FreeBSD; they seem to care about crypto/cryptodev.h details. I asked myself: why do they care about that if they're doing it all themselves? Looking at the code sheds light on that. The actual devcrypto engine commits that added BSD support are here:

https://github.com/openssl/openssl/pull/3744
https://github.com/openssl/openssl/pull/3744/files

The commits indicate that the devcrypto engine is enabled by default on FreeBSD. But we can tell from Herbert's post and jkim@'s patch that's not true at all, i.e. FreeBSD disables it. Why? And is that a good default?

Here's why I ask:

The new devcrypto engine most definitely utilises /dev/crypto (thus cryptodev(4) and crypto(4)). cipher_init(), prepare_cipher_methods(), digest_init(), and prepare_digest_methods() all utilise that interface:

https://github.com/openssl/openssl/pull/3744/files#diff-027f92eb0a10c0986aec873d9fd1ab66

So while OpenSSL now uses more of its own native C and assembly code (e.g. for AES-NI support), and that's certainly faster than all the overhead that cryptodev(4) brings with it (see jhb@'s post), I wonder:

1. What happens to people using crypto hardware accelerators, e.g. hifn(4), padlock(4), ubsec(4), and safe(4)? How exactly would OpenSSL utilise these H/W accelerators if the devcrypto engine is disabled?

2. If the devcrypto engine is *enabled*, and people have aesni(4) loaded alongside cryptodev(4), which gets priority: OpenSSL's native AES-NI code or cryptodev(4)/aesni(4)?

Likewise: if the devcrypto engine is to remain disabled as a default, this needs to be made crystal clear in the Release Notes, so that folks using H/W accelerators know they'll no longer benefit from those cards unless they use a patch (third-party so/module won't work, AFAIT, as OpenSSL's dynamic engine loading is unavailable per openssl engine -t).

Might I suggest making devcrypto something that can be enabled via src.conf, e.g. WITH_OPENSSL_ENGINE_DEVCRYPTO=true?

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator PGP 0x2A389531 |
| Making life hard for others since 1977. |
Re: lightly loaded system eats swap space
(I am not subscribed to -stable, so please CC me, though I doubt I can help in any way/shape/form past this Email)

Not the first time this has come up -- and every time it has, all that's heard is crickets in the threads. Recent proof:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-April/088727.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-April/088728.html
https://lists.freebsd.org/pipermail/freebsd-stable/2018-June/089094.html

I sent private mail to Peter Jeremy about his issue. I will not disclose that Email here. However, I will disclose the commits I included in said Email that have touched ZFS ARC-related code:

http://www.freshbsd.org/commit/freebsd/r332785
http://www.freshbsd.org/commit/freebsd/r332552
http://www.freshbsd.org/commit/freebsd/r332540 (may help give insights)
http://www.freshbsd.org/commit/freebsd/r330061
http://www.freshbsd.org/commit/freebsd/r328235
http://www.freshbsd.org/commit/freebsd/r327491
http://www.freshbsd.org/commit/freebsd/r326619
http://www.freshbsd.org/commit/freebsd/r326427 (quota-related, maybe irrelevant)
http://www.freshbsd.org/commit/freebsd/r323667

In short (and nebulous as hell; sorry, I cannot be more specific given the nature of the problem): there have been changes to ZFS's memory allocation/releasing decision-making scheme compared to ZFS on "older" FreeBSD (i.e. earlier 11.x, and definitely 10.x and 9.x).

Recommendations like "limit your ARC" are nothing new in FreeBSD, but are still ridiculous kludges: tech-lists' system clearly has 105GB MRU (MRU = most recently used) in ARC, meaning there is memory that can be released back to the rest of the OS for general use (re: memory contention/pressure situation), but the OS is choosing to use swap instead, eventually exhausting it. That logic sounds broken, IMO. (And yes I did notice the size of the bhyve process)

ZFS-related kernel folks need to be involved in this conversation. For whatever reason, in the past several years, related committers are no longer participating in these types of discussions. The opposite was true back in the 7.x to 9.x days. The answers have to come from them. I don't know, today, a) how they prefer these problems get reported to them, or b) what exact information they want that can help narrow it down (tech-lists' provided data is, IMO, good and par for the course).

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: kern.maxswzone causing serious problems
I am not subscribed to -stable, so please keep me CC'd.

I mailed -stable about this problem, or a variation of it, earlier this month:

https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088467.html

What isn't publicly visible is the list of individuals I CC'd on that mail who had touched this code in recent days:

k...@freebsd.org, d...@freebsd.org, pluk...@freebsd.org, ead...@freebsd.org

I received no response from them on this matter. At least two, however, have been extremely busy commit-wise, so I imagine folks are just swamped right now + have higher priorities.

I did not read or review your {naiveanalysis} section or your patch, as tinkering with VM design/internals is *way* outside my comfort zone. I will say that printing the sizes in a unit other than pages would be generally helpful; I did try to figure out what value to use for kern.maxswzone as a workaround by digging through kernel code but gave up, as I wasn't able to truly determine what "pages" actually represented (size-wise) in this specific context.

I hope someone with src commit bit will comment, as code slush for 11.2-RELEASE begins on April 20th:

https://www.freebsd.org/releases/11.2R/schedule.html

Else a separate PR can be opened if requested.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Stability of 11.1S
(Please keep me CC'd as I am not subscribed to -stable)

I haven't seen any issues, but that means very little. Details:

Two boxes -- one bare metal, one VPS (QEMU):

$ uname -a
FreeBSD XXX 11.1-STABLE FreeBSD 11.1-STABLE #0 r330529: Tue Mar 6 11:36:04 PST 2018 root@XXX:/usr/obj/usr/src/sys/X7SBA_RELENG_11_amd64 amd64
$ uptime
10:33a.m. up 13 days, 18:10, 2 users, load averages: 0.15, 0.19, 0.16

$ uname -a
FreeBSD 11.1-STABLE FreeBSD 11.1-STABLE #0 r330753: Sat Mar 10 21:34:20 PST 2018 root@:/usr/obj/usr/src/sys/_RELENG_11_amd64 amd64
$ uptime
10:33a.m. up 9 days, 10:46, 1 user, load averages: 0.31, 0.35, 0.31

Systems were updated recently because I wanted to test Meltdown/Spectre mitigation (more on that below). Prior to that, bare metal was running 9.x with 200+ day uptimes, VPS was running 10.x with 80-90 day uptimes (VPS provider's HV crashed, i.e. not FreeBSD issues).

Since load averages on FreeBSD 10.x onward cannot be trusted[1][2], I have to explain the general system specs and loads:

Bare metal box is an Intel Core 2 Quad Q9550, 8GB RAM, doing very little other than running Apache + lots of cron jobs for systems stuff + ZFS with several disks (but not OS disk; that's a dedicated SSD w/ UFS + SU (not SUJ)). The cron jobs tend to stress the network and disk I/O a bit; ZFS gets used every day, but only "heavily" during LAN file copies to/from it (Samba is involved), and during nightly backups with rsync.

VPS box is some form of QEMU-based Intel Haswell CPU, 1GB RAM, doing general things like Apache + postfix + SpamAssassin + some other daemons, and a lot of Perl. Swap is used heavily on this machine. Disks are all vtblk, and I use multiple to get the capacity needed for /usr/src and /usr/obj. Everything is UFS + SU (not SUJ).

Things off the top of my head that might be relevant to you:

1. r329462 added Meltdown/Spectre mitigation[3][4]. Bare metal box has the below in /boot/loader.conf, since this is a machine that does not need either given its environment:

# Disable PTI (Meltdown mitigation) and IBRS (Spectre mitigation); these
# are not relevant on this bare-metal system given its environment and
# use case. Details of these tunables are here:
# https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088526.html
#
vm.pmap.pti="0"
hw.ibrs_disable="1"

VPS box has no tunings of this sort, and ends up with the below, because the hosting provider has not done BIOS + QEMU updates to add IBRS support (they're very aware of it + have attempted it twice but apparently it didn't go well):

vm.pmap.pti: 1
hw.ibrs_disable: 1
hw.ibrs_active: 0

2. If your CPU is an AMD Ryzen, there is a VERY long discussion on -stable about problems with Ryzen manifesting themselves in a very uncomfortable way, leading to system lock-ups[5]. There are unofficial patches you can try. I would recommend chiming in there and not here, if relevant to your systems.

And yes, the massive number of MFCs that eadler@ is doing makes tracking down exact things more tedious than normal, especially when you have sweeping commits like this one[6][7] (which, AFAIK, was acting as a major blocker for several other MFCs and causing general merge problems). However, I commend his efforts; it's a massive undertaking (I would say full-time job).

We stable users must accept that we are running stable/11 for a reason -- not only to get fixes faster, but to act as a form of "guinea pig", one that doesn't want the risks of HEAD/CURRENT. The more people using stable/11, the better the overall feedback devs can get on bugs/issues before they make it into the next -RELEASE.

This is exactly why, for those of you who have known me over the years, I actually "track" or "follow" commits as they come across. I do this by using the FreshBSD site[8] alongside manual review of svnlite update output. I generally know what files/bits are relevant to my interests.

Hope this gives you some things to think about. Good luck!

[1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541#c8
[2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541#c22
[3]: https://lists.freebsd.org/pipermail/freebsd-stable/2018-February/088396.html
[4]: https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088526.html
[5]: https://lists.freebsd.org/pipermail/freebsd-stable/2018-January/thread.html#88174
[6]: http://www.freshbsd.org/commit/freebsd/r330897
[7]: https://svnweb.freebsd.org/base?view=revision&revision=330897
[8]: http://www.freshbsd.org/?branch=RELENG_11&project=freebsd

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
total configured swap pages exceeds maximum recommended amount
I am not subscribed to -stable, so please keep me CC'd. I am CC'ing folks who have touched this code or dealt with it recently or in the past.

Something has changed regarding how FreeBSD determines when to emit this message. I do not know if this is a regression.

The message below comes from a stable/11 r330260 amd64 box w/ 8GB RAM and 32GB swap during boot:

warning: total configured swap (8358563 pages) exceeds maximum recommended amount (8141112 pages).
warning: increase kern.maxswzone or reduce amount of swap.

In stable/9, the message could be squelched via kern.maxswzone="0" in loader.conf. Confirmation is here (see Dag-Erling's responses):

https://lists.freebsd.org/pipermail/freebsd-stable/2012-August/069301.html

In stable/11, this no longer appears to work (the default value is 0).

The reason this box has 32GB swap (4x more than existing RAM) has to do with planning ahead. The system can support up to 32GB RAM, but does not have all the DIMM slots populated at this time. Swap on this machine is a physical partition on its main disk, thus "shrinking swap" is not possible without a full format/reinstall.

This code has been touched/tweaked semi-recently in PR 221356:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221356

Code references:

stable/9: https://svnweb.freebsd.org/base/stable/9/sys/vm/swap_pager.c?annotate=284100#l2132
stable/10: https://svnweb.freebsd.org/base/stable/10/sys/vm/swap_pager.c?annotate=320557#l2156
stable/11: https://svnweb.freebsd.org/base/stable/11/sys/vm/swap_pager.c?annotate=329591#l2126

My questions: how does one squelch this warning message on such systems running stable/11? If it involves setting the tunable to a more useful value, how does one reliably calculate that value?

Thank you.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
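For a sense of scale, the page counts in the warning can be converted to bytes. This is only a sketch under one loudly stated assumption: that a "page" here is the 4096-byte base page of amd64. The thread itself does not confirm what unit the kernel message uses, and this says nothing about how to derive a kern.maxswzone value. The constant and function names are illustrative.

```python
PAGE_SIZE = 4096  # ASSUMPTION: 4 KiB base page size on amd64; not confirmed by the thread

CONFIGURED_PAGES = 8358563   # "total configured swap" from the boot warning
RECOMMENDED_PAGES = 8141112  # "maximum recommended amount" from the boot warning


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to GiB (2**30 bytes)."""
    return pages * page_size / 2**30


excess_pages = CONFIGURED_PAGES - RECOMMENDED_PAGES
print(f"configured : {pages_to_gib(CONFIGURED_PAGES):.2f} GiB")
print(f"recommended: {pages_to_gib(RECOMMENDED_PAGES):.2f} GiB")
print(f"excess     : {excess_pages} pages, "
      f"{excess_pages * PAGE_SIZE / 2**20:.1f} MiB")
```

Under that assumption the configured figure lands just under 32 GiB, which matches the 32GB swap partition described above, and the box exceeds the recommended amount by well under 1 GiB.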
stable/11 r329462 - Meltdown/Spectre MFC questions
Reference: https://svnweb.freebsd.org/base?view=revision&revision=329462

Do the following new loader tunables and sysctls have documentation anywhere? I ask because I wish to know how to turn all of this off (yes you heard me correctly), as not all systems necessarily require mitigation of these flaws. Best I can tell from skimming source:

vm.pmap.pti
- Description: Page Table Isolation enabled
- Loader tunable, visible in sysctl (read-only)
- Integer
- Default value: depends on CPU model and capabilities, see function pti_get_default(); looks like AMD = 0, any CPU with RDCL_NO capability enabled = 0, else 1

hw.ibrs_active
- Description: Indirect Branch Restricted Speculation active
- sysctl (read-only)
- Integer
- Real-time indicator as to if IBRS is currently on or off

hw.ibrs_disable
- Description: Disable Indirect Branch Restricted Speculation
- Loader tunable and sysctl tunable (read-write)
- Integer
- Default value: unsure. Variable declaration has 1 but SYSCTL_PROC() macro has 0.

Thank you.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
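As a rough illustration of how the three values relate, here is a small sketch that parses `name: value` lines in the style printed by sysctl and restates them using the descriptions listed above. The helper names and the result keys are hypothetical, and the interpretation is only as good as the from-skimming-source descriptions in this message. The sample values are the ones reported for a VPS in another message in this archive.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'name: value' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'name: value' shape
            values[name.strip()] = int(value.strip())
    return values


def describe(values: dict) -> dict:
    """Restate the tunables as booleans, following the descriptions in the post."""
    return {
        # vm.pmap.pti: "Page Table Isolation enabled"
        "pti_enabled": values.get("vm.pmap.pti") == 1,
        # hw.ibrs_disable: "Disable Indirect Branch Restricted Speculation"
        "ibrs_disabled_by_tunable": values.get("hw.ibrs_disable") == 1,
        # hw.ibrs_active: real-time indicator of whether IBRS is on
        "ibrs_active": values.get("hw.ibrs_active") == 1,
    }


SAMPLE = """\
vm.pmap.pti: 1
hw.ibrs_disable: 1
hw.ibrs_active: 0
"""

print(describe(parse_sysctl(SAMPLE)))
```

For the sample, PTI is on while IBRS is both disabled by the tunable and inactive, which is consistent with a host that lacks IBRS support.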
Re: svn commit: r296462 - in stable/9: crypto/openssl/crypto/bio crypto/openssl/crypto/bn crypto/openssl/doc/apps crypto/openssl/ssl secure/usr.bin/openssl/man
ied to send mail to myself locally, as postfix's smtp(8) links to libcrypt/libssl/libcrypto. Bzzt, nope:

pid 5046 (smtp), uid 125: exited on signal 11

Mar 9 04:49:38 icarus postfix/master[802]: daemon started -- version 3.1.0, configuration /usr/local/etc/postfix
Mar 9 04:54:38 icarus postfix/pickup[5043]: 1835D1AF150: uid=1000 from=
Mar 9 04:54:38 icarus postfix/cleanup[5044]: 1835D1AF150: message-id=<20160309125438.ga5...@icarus.home.lan>
Mar 9 04:54:38 icarus postfix/qmgr[804]: 1835D1AF150: from=, size=631, nrcpt=1 (queue active)
Mar 9 04:54:38 icarus postfix/qmgr[804]: warning: private/smtp socket: malformed response
Mar 9 04:54:38 icarus postfix/qmgr[804]: warning: transport smtp failure -- see a previous warning/fatal/panic logfile record for the problem description
Mar 9 04:54:38 icarus postfix/master[802]: warning: process /usr/local/libexec/postfix/smtp pid 5046 killed by signal 11
Mar 9 04:54:38 icarus postfix/master[802]: warning: /usr/local/libexec/postfix/smtp: bad command startup -- throttling
Mar 9 04:54:38 icarus postfix/error[5048]: 1835D1AF150: to=, relay=none, delay=0.5, delays=0.05/0.44/0/0.01, dsn=4.3.0, status=deferred (unknown mail transport error)

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: stable/10: high load average when box is idle
On Thu, Oct 29, 2015 at 11:00:32AM +0100, Miroslav Lachman wrote:
> Jeremy Chadwick wrote on 10/27/2015 06:05:
> >(I am not subscribed to the mailing list, please keep me CC'd)
> >
> >Issue: a stable/10 system that has an abnormally high load average (e.g.
> >0.15, but may be higher depending on other variables which I can't
> >account for) when the machine is definitely idle (i.e. cannot be traced
> >to high interrupt usage per vmstat -i, cannot be traced to a userland
> >process or kernel thread, etc.).
> >
> >This problem has been discussed many times on the FreeBSD mailing lists
> >and the FreeBSD forum (including some folks seeing it on 9.x, but my
> >complaint here is focused on 10.x so please focus there).
> >
> >I'd politely like to request that anyone experiencing this, or who has
> >experienced it (and if you know when it stopped or why, including what
> >you may have done, include that), to chime in on this ticket from 2012
> >(made for 9.x but style of issue still applies; c#5 is quite valid):
> >
> >https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
> >
> >For those still experiencing it, I'd suggest reading c#8 and seeing if
> >sysctl kern.eventtimer.periodic=1 relieves the problem for you. (At
> >this time I would not suggest leaving that set indefinitely, as it does
> >seem to increase the interrupt rate in cpuX:timer in vmstat -i. But for
> >me kern.eventtimer.periodic=1 "fixes" the issue)
>
> Is it on real HW server or in some kind of virtualization? I am seeing load
> 0.5 - 1.2 on three virtual machines in VMware. The machines are without any
> traffic. Just fresh instalation of FreeBSD 10.1 and some services without
> any public content.

I've seen it on both bare-metal and VMs. Please see c#8 in the ticket; there's an itemised list of where I've seen it, but I'm sure it's not limited to just those.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
stable/10: high load average when box is idle
(I am not subscribed to the mailing list, please keep me CC'd)

Issue: a stable/10 system that has an abnormally high load average (e.g. 0.15, but may be higher depending on other variables which I can't account for) when the machine is definitely idle (i.e. cannot be traced to high interrupt usage per vmstat -i, cannot be traced to a userland process or kernel thread, etc.).

This problem has been discussed many times on the FreeBSD mailing lists and the FreeBSD forum (including some folks seeing it on 9.x, but my complaint here is focused on 10.x so please focus there).

I'd politely like to request that anyone experiencing this, or who has experienced it (and if you know when it stopped or why, including what you may have done, include that), chime in on this ticket from 2012 (made for 9.x but style of issue still applies; c#5 is quite valid):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541

For those still experiencing it, I'd suggest reading c#8 and seeing if sysctl kern.eventtimer.periodic=1 relieves the problem for you. (At this time I would not suggest leaving that set indefinitely, as it does seem to increase the interrupt rate in cpuX:timer in vmstat -i. But for me kern.eventtimer.periodic=1 "fixes" the issue.)

Thanks.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Stable/9 from today mpssas_scsiio timeouts
On Tue, Jul 09, 2013 at 11:46:24AM -0400, Outback Dingo wrote: > On Tue, Jul 9, 2013 at 11:30 AM, Jeremy Chadwick wrote: > > > On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote: > > > On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick wrote: > > > > > > > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote: > > > > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo < > > outbackdi...@gmail.com > > > > >wrote: > > > > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick > > > > wrote: > > > > > > > > > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > > > > > >> > as of stable today im seeing alot of new mps time outs > > > > > >> > > > > > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 > > UTC > > > > 2013 > > > > > >> > root@:/usr/obj/nas/usr/src/sys/ > > > > > >> > > > > > > >> > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 > > > > chip=0x00721000 > > > > > >> > rev=0x03 hdr=0x00 > > > > > >> > vendor = 'LSI Logic / Symbios Logic' > > > > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > > > > > >> > class = mass storage > > > > > >> > subclass = SAS > > > > > >> > > > > > > >> > > > > > > >> > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm > > > > > >> > 0xff80021a6b78 > > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > > > SMID > > > > > >> 983 > > > > > >> > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800 > > > > > >> > mps0: mpssas_alloc_tm freezing simq > > > > > >> > mps0: timedout cm 0xff80021a6b78 allocated tm > > 0xff80021587b0 > > > > > >> > (probe40:mps0:0:40:0): INQUIRY. 
CDB: 12 00 00 00 24 00 length 36 > > > > SMID > > > > > >> 983 > > > > > >> > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800 > > > > during > > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a > > code > > > > 0x0 > > > > > >> count > > > > > >> > 1 > > > > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting > > > > TaskMID > > > > > >> 983 > > > > > >> > mps0: mpssas_free_tm releasing simq > > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 > > > > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout > > > > > >> > (probe40:mps0:0:40:0): Retrying command > > > > > >> > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm > > > > > >> > 0xff80023e5b78 > > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length > > 36 > > > > SMID > > > > > >> 983 > > > > > >> > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800 > > > > > >> > mps1: mpssas_alloc_tm freezing simq > > > > > >> > mps1: timedout cm 0xff80023e5b78 allocated tm > > 0xff80023977b0 > > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length > > 36 > > > > SMID > > > > > >> 983 > > > > > >> > completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800 > > > > during > > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a > > code > > > > 0x0 > > > > > >> count > > > > > >> > 1 > > > > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting > > > > TaskMID > > > > > >> 983 > > > > > >> > mps1: mpssas_free_tm releasing simq > > > > > >> > (probe292:mps1:0:37:
Re: Stable/9 from today mpssas_scsiio timeouts
On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote: > On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick wrote: > > > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote: > > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo > >wrote: > > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick > > wrote: > > > > > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > > > >> > as of stable today im seeing alot of new mps time outs > > > >> > > > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 UTC > > 2013 > > > >> > root@:/usr/obj/nas/usr/src/sys/ > > > >> > > > > >> > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 > > chip=0x00721000 > > > >> > rev=0x03 hdr=0x00 > > > >> > vendor = 'LSI Logic / Symbios Logic' > > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > > > >> > class = mass storage > > > >> > subclass = SAS > > > >> > > > > >> > > > > >> > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm > > > >> > 0xff80021a6b78 > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > SMID > > > >> 983 > > > >> > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800 > > > >> > mps0: mpssas_alloc_tm freezing simq > > > >> > mps0: timedout cm 0xff80021a6b78 allocated tm 0xff80021587b0 > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > SMID > > > >> 983 > > > >> > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800 > > during > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code > > 0x0 > > > >> count > > > >> > 1 > > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting > > TaskMID > > > >> 983 > > > >> > mps0: mpssas_free_tm releasing simq > > > >> > (probe40:mps0:0:40:0): INQUIRY. 
CDB: 12 00 00 00 24 00 > > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout > > > >> > (probe40:mps0:0:40:0): Retrying command > > > >> > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm > > > >> > 0xff80023e5b78 > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > SMID > > > >> 983 > > > >> > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800 > > > >> > mps1: mpssas_alloc_tm freezing simq > > > >> > mps1: timedout cm 0xff80023e5b78 allocated tm 0xff80023977b0 > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > SMID > > > >> 983 > > > >> > completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800 > > during > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code > > 0x0 > > > >> count > > > >> > 1 > > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting > > TaskMID > > > >> 983 > > > >> > mps1: mpssas_free_tm releasing simq > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 > > > >> > (probe292:mps1:0:37:0): CAM status: Command timeout > > > >> > (probe292:mps1:0:37:0): Retrying command > > > >> > > > >> 1. What revision were you running before (i.e. what were you on prior > > to > > > >> the upgrade)? > > > >> > > > > > > > > > > > > Sorry I was on 252595 from July 3 > > > > And does rolling back to r252595 resolve the problem for you? > > > > Because the only commit I see between r253035 and r252595 that might > > account for some kind of behavioural change, unless I missed one while > > skimming the commit history, is the following: > > > > r252730 -- http://www.freshbsd.org/commit/freebsd/r252730 > > > > If at all possible, please try updating to r253037 or newer to see > > if tha
Re: Stable/9 from today mpssas_scsiio timeouts
On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote: > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo wrote: > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick wrote: > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > >> > as of stable today im seeing alot of new mps time outs > >> > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 UTC 2013 > >> > root@:/usr/obj/nas/usr/src/sys/ > >> > > >> > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 chip=0x00721000 > >> > rev=0x03 hdr=0x00 > >> > vendor = 'LSI Logic / Symbios Logic' > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > >> > class = mass storage > >> > subclass = SAS > >> > > >> > > >> > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm > >> > 0xff80021a6b78 > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID > >> 983 > >> > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800 > >> > mps0: mpssas_alloc_tm freezing simq > >> > mps0: timedout cm 0xff80021a6b78 allocated tm 0xff80021587b0 > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID > >> 983 > >> > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800 during > >> > recovery ioc 8048 scsi 0 state c xfer 0 > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0 > >> count > >> > 1 > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting TaskMID > >> 983 > >> > mps0: mpssas_free_tm releasing simq > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 > >> > (probe40:mps0:0:40:0): CAM status: Command timeout > >> > (probe40:mps0:0:40:0): Retrying command > >> > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm > >> > 0xff80023e5b78 > >> > (probe292:mps1:0:37:0): INQUIRY. 
CDB: 12 00 00 00 24 00 length 36 SMID > >> 983 > >> > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800 > >> > mps1: mpssas_alloc_tm freezing simq > >> > mps1: timedout cm 0xff80023e5b78 allocated tm 0xff80023977b0 > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID > >> 983 > >> > completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800 during > >> > recovery ioc 8048 scsi 0 state c xfer 0 > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0 > >> count > >> > 1 > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting TaskMID > >> 983 > >> > mps1: mpssas_free_tm releasing simq > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 > >> > (probe292:mps1:0:37:0): CAM status: Command timeout > >> > (probe292:mps1:0:37:0): Retrying command > >> > >> 1. What revision were you running before (i.e. what were you on prior to > >> the upgrade)? > >> > > > > > > Sorry I was on 252595 from July 3 And does rolling back to r252595 resolve the problem for you? Because the only commit I see between r253035 and r252595 that might account for some kind of behavioural change, unless I missed one while skimming the commit history, is the following: r252730 -- http://www.freshbsd.org/commit/freebsd/r252730 If at all possible, please try updating to r253037 or newer to see if that has some effect/improvement. Why I mention that commit: r253037 -- http://www.freshbsd.org/commit/freebsd/r253037 Because the only mps(4) changes done in recent days are: http://svnweb.freebsd.org/base/stable/9/sys/dev/mps/mps_sas.c?view=log r253037 r251899 r251874 Else I'd say what you're experiencing is legitimate/unrelated to kernel changes. I can only speculate. The messages to me indicate that some part of the kernel is submitting a SCSI INQUIRY request to the underlying device(s) which results in a CAM timeout, i.e. the disk attached to the controller did not respond promptly (while the controller seemed to be alive/well). 
If these disks (which we do not know the type of -- no dmesg provided, etc.) are SSDs then TRIM behaviour is possibly causing the drive to take too long to perform its TRIM cleanup, or, the drives themselves are doing some kind of garbage collection which is taking quite a long time.

Steven et al. may have a different (and almost certainly more accurate) analysis.

It would really help if you could provide "dmesg" from the machine, as well as any details about your setup (if ZFS, "zpool status", etc.), in addition to (if SSDs) "sysctl -a | grep -i trim". All this matters.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Stable/9 from today mpssas_scsiio timeouts
On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > as of stable today im seeing alot of new mps time outs > > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 UTC 2013 > root@:/usr/obj/nas/usr/src/sys/ > > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 chip=0x00721000 > rev=0x03 hdr=0x00 > vendor = 'LSI Logic / Symbios Logic' > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > class = mass storage > subclass = SAS > > > mps0: mpssas_scsiio_timeout checking sc 0xff8002145000 cm > 0xff80021a6b78 > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983 > command timeout cm 0xff80021a6b78 ccb 0xfe002bb5f800 > mps0: mpssas_alloc_tm freezing simq > mps0: timedout cm 0xff80021a6b78 allocated tm 0xff80021587b0 > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983 > completed timedout cm 0xff80021a6b78 ccb 0xfe002bb5f800 during > recovery ioc 8048 scsi 0 state c xfer 0 > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0 count > 1 > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting TaskMID 983 > mps0: mpssas_free_tm releasing simq > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 > (probe40:mps0:0:40:0): CAM status: Command timeout > (probe40:mps0:0:40:0): Retrying command > mps1: mpssas_scsiio_timeout checking sc 0xff8002384000 cm > 0xff80023e5b78 > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 983 > command timeout cm 0xff80023e5b78 ccb 0xfe002be14800 > mps1: mpssas_alloc_tm freezing simq > mps1: timedout cm 0xff80023e5b78 allocated tm 0xff80023977b0 > (probe292:mps1:0:37:0): INQUIRY. 
CDB: 12 00 00 00 24 00 length 36 SMID 983
> completed timedout cm 0xff80023e5b78 ccb 0xfe002be14800 during
> recovery ioc 8048 scsi 0 state c xfer 0
> (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code 0x0 count
> 1
> (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting TaskMID 983
> mps1: mpssas_free_tm releasing simq
> (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00
> (probe292:mps1:0:37:0): CAM status: Command timeout
> (probe292:mps1:0:37:0): Retrying command

1. What revision were you running before (i.e. what were you on prior to the upgrade)?

2. Something in your /usr/src differs from stock r253035, hence the "M" at the end. What is it?

Answer to #1 will help me narrow down the commits; there have been CAM and mps changes fairly recently. Otherwise you can dig through the commits yourself (you'll need to go through many, many pages, as there was a recent massive influx of SCTP changes (50+ commits)):

http://www.freshbsd.org/?branch=RELENG_9&project=freebsd

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
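Question 2 hinges on the trailing "M" in r253035M, which marks a source tree with local modifications. A minimal sketch of picking that apart in sh follows; the function names are illustrative only and appear nowhere in the thread.

```sh
#!/bin/sh
# Print the numeric part of a revision string such as "r253035M".
rev_number() {
    r=${1#r}        # drop the leading "r"
    echo "${r%M}"   # drop a trailing "M", if present
}

# Succeed if the revision string carries the "M" (locally modified) suffix.
is_modified() {
    case $1 in
        *M) return 0 ;;
        *)  return 1 ;;
    esac
}

rev=r253035M
echo "revision: $(rev_number "$rev")"
if is_modified "$rev"; then
    echo "tree has local modifications; 'svn status' or 'svn diff' in /usr/src would list them"
fi
```

Running svn status or svn diff in /usr/src is the direct way to answer "what is it?"; the parsing above only shows how the suffix is read.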
Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
k.c:988
> > > #12 0xc07ba904 in fork_trampoline () at
> > > /src/src-9/sys/i386/i386/exception.s:279
> > > (kgdb) up 10
> > > #10 0xc0738f94 in softdep_flush () at
> > > /src/src-9/sys/ufs/ffs/ffs_softdep.c:1414
> > > 1414 progress += softdep_process_worklist(mp,
> > > 0);
> > >
> > > -Andre
> >
> > This looks unrelated, and exactly this panic is usually has one of two
> > causes:
> > - corrupted filesystem, run fsck to recheck it;
>
> root@palveli:~>fsck /dev/stripe/p
> ** /dev/stripe/p
> ** Last Mounted on /palveli
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 9895 files, 2039706 used, 15697693 free (5397 frags, 1961537 blocks, 0.0%
> fragmentation)
>
> * FILE SYSTEM IS CLEAN *

Taken from your previous mail (showing only UFS stuff):

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073817.html

>>>> fstab:
>>>> --
>>>> /dev/da0s1a / ufs noatime,rw 0 1
>>>> /dev/da0s1d /usr ufs noatime,rw 0 2
>>>> /dev/da0s1e /var ufs noatime,nosuid,rw 0 2
>>>> /dev/da10p1 /share2 ufs suiddir,groupquota,noatime,nosuid,rw 0 2
>>>> /dev/da10p2 /raid2 ufs userquota,noatime,nosuid,rw 0 2

Where is gstripe(8) in that picture? Are you **sure** this is the same system? Surely I'm missing something here...

Can you provide details of the stripe, specifically "gstripe list" so I can see what the disks are and then ask you for "smartctl -a" output for each of them (to try and rule out disk-level problems that may be causing oddities at the layer underneath the filesystem (sometimes fsck will not catch this))?

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: make buildworld is now 50% slower
On Sun, Jul 07, 2013 at 05:47:31AM -0500, Matthew D. Fuller wrote:
> Apropos of nothing, but...
>
> On Sun, Jul 07, 2013 at 03:17:14AM -0700 I heard the voice of
> Jeremy Chadwick, and lo! it spake thus:
> >
> > WITHOUT_LIB32=true
>
> suggests you're running amd64, which I'm pretty sure means
>
> > - I do increase kern.maxdsiz, kern.dfldsiz, and kern.maxssiz in
> > /boot/loader.conf to 2560M/2560M/256M respectively, but that was mainly
> > from the days when I ran MySQL and needed a huge userland processes.
>
> are not necessarily _in_creases, and may well be mostly _de_creases.
> e.g., on a RELENG_9 box with 8 gig of physical RAM:
>
> % sysctl kern.{max{d,s},dfld}siz
> kern.maxdsiz: 34359738368
> kern.maxssiz: 536870912
> kern.dfldsiz: 134217728
>
> while a -CURRENT box with 16 has dfldsiz blown all the way up too. I
> don't recall doing anything to change them at all recently, and a
> glance over loader.conf, sysctl.conf, rc.local, and the kernel configs
> doesn't turn up anything.

Thanks! The settings I mention are from "ancient times" -- specifically RELENG_6 on i386 (I know because I found an old mailing list post of mine discussing the settings with a user).

The problem as I said was that mysqld would crap itself (crash and be quite loud about it) if the process allocated too much memory/became too large. I am fairly certain the issue related to the data size, **not** the stack size (but I didn't see the harm in increasing that either).

It's good to know I can remove these on amd64. Yay, one less thing in loader.conf I have to deal with... :-) Thanks again!

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
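The "mostly decreases" point can be checked with plain sh arithmetic. This is a sketch using only the numbers quoted in this message (2560M/2560M/256M against the RELENG_9 amd64 sysctl output); the helper names are made up, and only the "M" suffix is handled.

```sh
#!/bin/sh
# Convert a loader.conf-style size such as "2560M" to bytes (M suffix only).
m_to_bytes() {
    echo $(( ${1%M} * 1048576 ))
}

# Report whether the loader.conf value ($2) raises or lowers the
# value reported by sysctl ($3, in bytes) for tunable $1.
compare() {
    name=$1
    mine=$(m_to_bytes "$2")
    theirs=$3
    if [ "$mine" -gt "$theirs" ]; then
        echo "$name: increase"
    elif [ "$mine" -lt "$theirs" ]; then
        echo "$name: decrease"
    else
        echo "$name: same"
    fi
}

# Right-hand values are the RELENG_9 amd64 numbers quoted in the message.
compare kern.maxdsiz 2560M 34359738368
compare kern.dfldsiz 2560M 134217728
compare kern.maxssiz 256M 536870912
```

With those inputs kern.maxdsiz and kern.maxssiz come out as decreases and only kern.dfldsiz as an increase, which matches the remark in the quoted text.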
Re: USB ports on Lenovo T400 do not work after a suspend/resume
On Sun, Jul 07, 2013 at 03:51:12PM +1000, Ian Smith wrote:
> On Sun, 30 Jun 2013 15:02:57 -0700, Adrian Chadd wrote:
> > On 30 June 2013 07:22, Ian Smith wrote:
> [..]
> > > Nothing of note that I can see, if that usb hub-to-bus remapping is
> > > normal. As you said, 'CPU0: local APIC error 0x40' looks maybe sus.
> > > Maybe someone who knows might comment on that?
>
> Does noone know what that signifies? Maybe it's not relevant to this.

It's too vague to know. The error comes from lapic_handle_error(), which is a generic/small routine which pulls the local APIC error status register. (Note I'm saying APIC, not ACPI -- two different things.) apic_vector.S sets this up/makes use of this function, and it's done as an interrupt handler.

I think this is one of those situations where you have to know *what* is being set up/done at that moment in time for the error code to mean something. Maybe booting verbose would give more information as to what was being done that led up to the line.

I've CC'd John Baldwin who might have some ideas.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: make buildworld is now 50% slower
On Sun, Jul 07, 2013 at 11:50:29AM +0300, Daniel Braniss wrote: > > On Fri, Jul 05, 2013 at 02:39:00PM +0200, Dimitry Andric wrote: > > > [redirecting to the correct mailing list, freebsd-stable@ ...] > > > > > > On Jul 5, 2013, at 10:53, Daniel Braniss wrote: > > > > after today's update of 9.1-STABLE I noticed that make > > > > build[world|kernel] are > > > > taking conciderable more time, is it because the upgrade of clang? > > > > and if so, is the code produced any better? > > > > > > > > before: > > > > buildwordl: 26m4.52s real 2h28m32.12s user 36m6.27s sys > > > > buildkernel: 7m29.42s real 23m22.22s user 4m26.26s sys > > > > > > > > today: > > > > buildwordl: 34m29.80s real 2h38m9.37s user 37m7.61s sys > > > > buildkernel:15m31.52s real 22m59.40s user 4m33.06s sys > > > > > > Ehm, your user and sys times are not that much different at all, they > > > add up to about 5% slower for buildworld, and 1% faster for build kernel. > > > Are you sure nothing else is running on that machine, eating up CPU time > > > while you are building? :) > > > > > > But yes, clang 3.3 is of course somewhat larger than 3.2. You might > > > especially notice that, if you are using gcc, which is very slow at > > > compiling C++. > > > > > > In any case, if you do not care about clang, just set WITHOUT_CLANG= in > > > your /etc/src.conf, and you can shave off some build time. > > > > I just built world/kernel (stable/9 r252769) 5 hours ago. Results: > > > > time make -j4 buildworld = roughly 21 minutes on my hardware > > time make -j4 buildkernel = roughly 8 minutes on my hardware > > > > It's been a long time since I saw such numbers, maybe it's time > to see where time is being spent, I will run it without clang to compare with > your numbers. > > > These numbers are about the norm for me, meaning I do not see a > > substantial increase in build times. > > > > Key point: I do not use/build/grok clang, i.e. WITHOUT_CLANG=true is in > > my src.conf. 
But I am aware of the big clang change in r252723.
> >
> > If hardware details are wanted, ask, but I don't think it's relevant to
> > what the root cause is.
> >
>
> from what you are saying, I guess clang is not responsible.
> looking for my Sherlock Holmes hat.

Some points to those numbers I stated above:

- System is an Intel Q9550 with 8GB of RAM

- Single SSD (UFS2+SU+TRIM) is used for root, /usr, /var, /tmp, and swap

- /usr/src is on ZFS (raidz1 + 3 disks) -- however I got equally small numbers when it was on the SSD

- /usr/src is using compression=lz4 (to folks from -fs: yeah, I'm trying it out to see how much of an impact it has on interactivity. I can still tell when it kicks in, but it's way, way better than lzjb. Rather not get into that here)

- Contents of /etc/src.conf (to give you some idea of what I disable):

WITHOUT_ATM=true
WITHOUT_BLUETOOTH=true
WITHOUT_CLANG=true
WITHOUT_FLOPPY=true
WITHOUT_FREEBSD_UPDATE=true
WITHOUT_INET6=true
WITHOUT_IPFILTER=true
WITHOUT_IPX=true
WITHOUT_KERBEROS=true
WITHOUT_LIB32=true
WITHOUT_LPR=true
WITHOUT_NDIS=true
WITHOUT_NETGRAPH=true
WITHOUT_PAM_SUPPORT=true
WITHOUT_PPP=true
WITHOUT_SENDMAIL=true
WITHOUT_WIRELESS=true
WITH_OPENSSH_NONE_CIPHER=true

It's WITHOUT_CLANG that cuts down the buildworld time by a *huge* amount (I remember when it got introduced, my buildworld jumped up to something like 40 minutes); the rest probably save a minute or two at most.

- /etc/make.conf doesn't contain much that's relevant, other than:

CPUTYPE?=core2
# For DTrace; also affects ports
STRIP=
CFLAGS+=-fno-omit-frame-pointer

- I do some tweaks in /etc/sysctl.conf (mainly vfs.read_min and vfs.read_max), but I will admit I am not completely sure what those do quite yet (I just saw the commit from scottl@ a while back talking about how an increased vfs.read_min helps them at Netflix quite a lot). I also adjust kern.maxvnodes.

- Some ZFS ARC settings are adjusted in /boot/loader.conf (I'm playing with some stuff I read in Andriy Gapon's ZFS PDF), but they definitely do not have a major impact on the numbers I listed off.

- I do increase kern.maxdsiz, kern.dfldsiz, and kern.maxssiz in /boot/loader.conf to 2560M/2560M/256M respectively, but that was mainly from the days when I ran MySQL and needed huge userland processes.

All in all my numbers are low/small because of two things: the SSD, and WITHOUT_CLANG.

Hope this gives you somewhere to start/stuff to ponder.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
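To put the quoted buildworld timings side by side, the percentage change can be worked out with integer sh arithmetic. A rough sketch: fractional seconds are dropped, the helper names are made up, and the inputs are the before/today figures from the original report in this thread.

```sh
#!/bin/sh
# Convert hours, minutes, seconds to whole seconds (fractions dropped).
to_s() {
    echo $(( $1 * 3600 + $2 * 60 + $3 ))
}

# Integer percentage change from old ($1) to new ($2); positive means slower.
pct_change() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

# buildworld wall-clock time: 26m4s before, 34m29s today
old_real=$(to_s 0 26 4)
new_real=$(to_s 0 34 29)

# buildworld CPU time (user + sys): 2h28m32s + 36m6s before, 2h38m9s + 37m7s today
old_cpu=$(( $(to_s 2 28 32) + $(to_s 0 36 6) ))
new_cpu=$(( $(to_s 2 38 9) + $(to_s 0 37 7) ))

echo "buildworld real:     $(pct_change "$old_real" "$new_real")%"
echo "buildworld user+sys: $(pct_change "$old_cpu" "$new_cpu")%"
```

On those figures wall-clock time grew by about a third while user+sys grew by about 5%, in line with the "about 5% slower" estimate quoted earlier in the thread.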
Re: When will subversion be ready for updating/upgrading src && ports?
On Fri, Jul 05, 2013 at 08:38:07PM -0700, bsd-li...@hush.com wrote: > Greetings, > Well after posting a couple of questions to the list regarding questions I > had before > migrating from (cv)sup to subversion, I took the leap: > > mv /usr/src/ /usr/src.old/ > > mkdir /usr/src > > mv /usr/ports/ /usr/ports.old/ > > mkdir /usr/ports > > rm -fr /var/db/sup/* > rm -fr /var/db/portsnap/* > > svn checkout svn://svn.freebsd.org/base/stable/8 /usr/src > > svn checkout svn://svn.freebsd.org/ports/head /usr/ports > > I then performed a portmaster -a > > which left me with a non-working X desktop. > Turned out to be a problem with the Nvidia driver -- was 2.9.40, now 3.10.14. > But loading it in loader.conf didn't create /dev/nvidia0, or /dev/nvidiactl > To make a long story short, I attempted to update my src && ports, and try > agaiin; > > svn update svn://svn.freebsd.org/ports/head /usr/ports > FAILED! I don't have the exact output Incorrect syntax -- should be one of the following (your choice): cd /usr/ports && svn update svn update /usr/ports > So I tried: > cd /usr/ports > svn update > Which replied: > svn: E155036: Please see the 'svn upgrade' command > svn: E155036: The working copy at '/usr/ports' > is too old (format 29) to work with client version '1.8.0 (r1490375)' > (expects f > ormat 31). You need to upgrade the working copy first. > > So I guess subversion isn't (yet) designed for this sort of stuff, which > leaves me with a useless box. :( Incorrect. Please look very, VERY closely at what the command is that it's telling you to use. Read it 4 times over. Pay close attention. The explanation: You installed subversion 1.7 or earlier when you originally started (i.e. subversion-1.7 or 1.6 or something else was installed). No problem. You then updated your ports tree. No problem. You then ran portmaster -a to upgrade/update all your ports (rebuild them). No problem. However this updated subversion to the latest in ports, which is 1.8. 
The subversion metadata (stored in the .svn directories, e.g. /usr/src/.svn, /usr/ports/.svn, etc.) changed format as of 1.8. This is why you need to run "svn upgrade" in those directories. It is a one-time step. That's all.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
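A minimal sketch of that one-time fix, assuming the working copies live in /usr/src and /usr/ports as in the original post. The helper name upgrade_cmds is hypothetical; it only prints the commands (a dry run) so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print an "svn upgrade" command for every directory that looks like a
# Subversion working copy (i.e. contains a .svn directory).
# upgrade_cmds is a hypothetical helper name; it changes nothing on disk.
upgrade_cmds() {
    for wc in "$@"; do
        if [ -d "$wc/.svn" ]; then
            echo "svn upgrade $wc"
        else
            echo "# skipping $wc: no .svn directory found" >&2
        fi
    done
}

# Paths taken from the original post.
upgrade_cmds /usr/src /usr/ports
```

Once the working copies are upgraded, updating takes a path (or no argument at all from inside the directory), never the repository URL: "svn update /usr/ports" or "cd /usr/ports && svn update".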
Re: make buildworld is now 50% slower
On Fri, Jul 05, 2013 at 02:39:00PM +0200, Dimitry Andric wrote: > [redirecting to the correct mailing list, freebsd-stable@ ...] > > On Jul 5, 2013, at 10:53, Daniel Braniss wrote: > > after today's update of 9.1-STABLE I noticed that make build[world|kernel] > > are > > taking conciderable more time, is it because the upgrade of clang? > > and if so, is the code produced any better? > > > > before: > > buildwordl: 26m4.52s real 2h28m32.12s user 36m6.27s sys > > buildkernel: 7m29.42s real 23m22.22s user 4m26.26s sys > > > > today: > > buildwordl: 34m29.80s real 2h38m9.37s user 37m7.61s sys > > buildkernel:15m31.52s real 22m59.40s user 4m33.06s sys > > Ehm, your user and sys times are not that much different at all, they > add up to about 5% slower for buildworld, and 1% faster for build kernel. > Are you sure nothing else is running on that machine, eating up CPU time > while you are building? :) > > But yes, clang 3.3 is of course somewhat larger than 3.2. You might > especially notice that, if you are using gcc, which is very slow at > compiling C++. > > In any case, if you do not care about clang, just set WITHOUT_CLANG= in > your /etc/src.conf, and you can shave off some build time. I just built world/kernel (stable/9 r252769) 5 hours ago. Results: time make -j4 buildworld = roughly 21 minutes on my hardware time make -j4 buildkernel = roughly 8 minutes on my hardware These numbers are about the norm for me, meaning I do not see a substantial increase in build times. Key point: I do not use/build/grok clang, i.e. WITHOUT_CLANG=true is in my src.conf. But I am aware of the big clang change in r252723. If hardware details are wanted, ask, but I don't think it's relevant to what the root cause is. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. 
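The comparison Dimitry makes (CPU time versus wall-clock time) can be checked with a short script. This is only a sketch: the helper names to_seconds, add and pct_change are mine, and the durations are the ones Daniel posted for buildworld.

```shell
#!/bin/sh
# Convert a time(1)-style duration such as "2h28m32.12s" or "26m4.52s"
# to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk '{
        t = $0; total = 0
        if (match(t, /[0-9]+h/)) {
            total += substr(t, RSTART, RLENGTH - 1) * 3600
            t = substr(t, RSTART + RLENGTH)
        }
        if (match(t, /[0-9]+m/)) {
            total += substr(t, RSTART, RLENGTH - 1) * 60
            t = substr(t, RSTART + RLENGTH)
        }
        if (match(t, /[0-9.]+s/)) {
            total += substr(t, RSTART, RLENGTH - 1)
        }
        printf "%.2f\n", total
    }'
}

# Sum two values given in seconds.
add() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'
}

# Percentage change from $1 to $2 (both in seconds); positive means slower.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# buildworld CPU time (user + sys), before and after the update.
cpu_before=$(add "$(to_seconds 2h28m32.12s)" "$(to_seconds 36m6.27s)")
cpu_after=$(add "$(to_seconds 2h38m9.37s)" "$(to_seconds 37m7.61s)")
echo "buildworld CPU time change:   $(pct_change "$cpu_before" "$cpu_after")%"

# buildworld wall-clock (real) time, before and after the update.
echo "buildworld wall-clock change: $(pct_change "$(to_seconds 26m4.52s)" "$(to_seconds 34m29.80s)")%"
```

The CPU figure should land close to what Dimitry quotes, while the wall-clock figure rises considerably more, which is why he asks whether something else was competing for the machine during the build.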
Re: UFS Trim wont stay set
On Thu, Jul 04, 2013 at 04:48:38PM -0400, Mike Jakubik wrote: > On 07/04/13 16:33, Jeremy Chadwick wrote: > >Yup, experienced this myself many times over. The reasons are > >understood (it's not limited to just the TRIM bits, it's related > >to anything adjusting the superblock -- it gets cached in memory > >in certain situations and not flushed back to disk). Hint: are you > >booting into single user and then issuing a "mount" command before > >doing your tunefs stuff? If so, this is probably what's causing it > >(at least it was in my case). Instead just boot into single-user, > >do not mount anything, and use /sbin/tunefs (if available -- > >depends on your filesystem setup) or /rescue/tunefs. > > I booted in to single user mode and the system mounted the only file > system there, which is mounted at /. What i did now however is boot > off a Live CD and run tunefs, this did the trick! I talked with Andriy Gapon a couple years ago about this, actually. I had to dig up the thread. Here are the relevant parts (read in order): http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062921.html http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062922.html http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062923.html http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062924.html Make sure you read Andriy's comments (2nd URL) in full. My follow-up (4th URL) confirms that the "mount -a" (which is what made / read-write since /etc/fstab obviously has / as rw) was causing the issue. He explains the reason. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: UFS Trim wont stay set
On Thu, Jul 04, 2013 at 03:37:28PM -0400, Mike Jakubik wrote: > Hello, > > I've just installed a stable snapshot on a new machine with a SSD > drive, after installing i booted single user mode and ran > > # tunefs -t enable /dev/ada0p2 > tunefs: issue TRIM to the disk set > > Great, back to multiuser mode, i check the partition > > # tunefs -p /dev/ada0p2 > tunefs: POSIX.1e ACLs: (-a)disabled > tunefs: NFSv4 ACLs: (-N) disabled > tunefs: MAC multilabel: (-l) disabled > tunefs: soft updates: (-n) enabled > tunefs: soft update journaling: (-j) enabled > tunefs: gjournal: (-J) disabled > tunefs: trim: (-t) disabled > > What the heck.. did i miss something? Back to single user mode and > > # tunefs -t enable /dev/ada0p2 > tunefs: issue TRIM to the disk remains unchanged as enabled > > I check again in multiuser mode and it says disabled, any ideas what > is going on here? Yup, experienced this myself many times over. The reasons are understood (it's not limited to just the TRIM bits, it's related to anything adjusting the superblock -- it gets cached in memory in certain situations and not flushed back to disk). Hint: are you booting into single user and then issuing a "mount" command before doing your tunefs stuff? If so, this is probably what's causing it (at least it was in my case). Instead just boot into single-user, do not mount anything, and use /sbin/tunefs (if available -- depends on your filesystem setup) or /rescue/tunefs. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
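A small sketch for checking whether the flag actually stuck after returning to multi-user mode. The helper name trim_state is hypothetical, the device name is the one from Mike's report, and the 2>&1 is there on the assumption that tunefs reports through stderr.

```shell
#!/bin/sh
# Procedure described above (run by hand, in single-user mode, with
# nothing mounted read-write):
#   /rescue/tunefs -t enable /dev/ada0p2
#
# Read "tunefs -p" output on stdin and print the state of the TRIM flag
# ("enabled" or "disabled"). trim_state is a hypothetical helper name.
trim_state() {
    awk '/trim:/ { print $NF }'
}

# Sample line copied from the report above. On a real system, use:
#   tunefs -p /dev/ada0p2 2>&1 | trim_state
echo 'tunefs: trim: (-t) disabled' | trim_state
```

If this still prints "disabled" after a reboot, the superblock change was most likely overwritten by the cached copy, and booting from separate media (as Mike ended up doing with a Live CD) sidesteps the problem.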
Re: ZFS Panic after freebsd-update
On Tue, Jul 02, 2013 at 08:59:56AM +0300, Andriy Gapon wrote: > on 01/07/2013 21:50 Jeremy Chadwick said the following: > > The issue is that ZFS on FreeBSD is still young compared to other > > filesystems (specifically UFS). > > That's a fact. > > > Nothing is perfect, but FFS/UFS tends > > to have a significantly larger number of bugs worked out of it to the > > point where people can use it without losing sleep (barring the SUJ > > stuff, don't get me started). > > That's subjective. > > > I have the same concerns over other > > things, like ext2fs and fusefs for that matter -- but this thread is > > about a ZFS-related crash, and that's why I'm "over-focused" on it. > > I have an impression that you seem to state your (negative) opinion of ZFS in > every other thread about ZFS problems. The OP in question ended his post with the line "Thoughts?", and I have given those thoughts. My thoughts/opinions/experience may differ from that of others. Diversity of thoughts/opinions/experiences is good. I'm not some kind of "authoritative ZFS guru" -- far from it. If I misunderstood what "Thoughts?" meant/implied, then draw and quarter me for it; my actions/words = my responsibility. I do not feel I have a "negative opinion" of ZFS. I still use it today on FreeBSD, donated money to Pawel when the project was originally announced (because I wanted to see something new and useful thrive on FreeBSD), and try my best to assist with issues pertaining to it where applicable. These are not the actions of someone with a negative opinion, these are the actions of someone who is supportive while simultaneously very cautious. Is ZFS better today than it was when it was introduced? By a long shot. For example, on my stable/9 system here I don't tune /boot/loader.conf any longer. But that doesn't change my viewpoint when it comes to using ZFS exclusively on a FreeBSD box. 
> > A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only), > > results in a system where an admin can upgrade + boot into single-user > > and perform some tasks to test/troubleshoot; if the ZFS layer is > > broken, it doesn't mean an essentially useless box. That isn't FUD, > > that's just the stage we're at right now. I'm aware lots of people have > > working ZFS-exclusive setups; like I said, "works great until it > > doesn't". > > Yeah, a heterogeneous setup can have its benefits, but it can have its > drawbacks > too. This is true for heterogeneous vs monoculture in general. > But the sword cuts both ways: what if something is broken in "UFS layer" or > god > forbid in VFS layer and you have only UFS? > Besides, without mentioning specific classes of problems "ZFS layer is broken" > is too vague. The likelihood of something being broken in UFS is significantly lower given its established history. I have to go off of experience, both personal and professional -- in my years of dealing with FreeBSD (1997-present), I have only encountered issues with UFS a few times (I can count them on one, maybe two hands), and I'm choosing to exclude SU+J from the picture for what should be obvious reasons. With ZFS, well... just look at the mailing lists and PR count. I don't want to be a jerk about it, but you really have to look at the quantity. It doesn't mean ZFS is crap, it just means that for me, I don't think we're quite "there" yet. And I will gladly admit -- because you are the one who taught me this -- that every incident need be treated unique. But one can't deny that a substantial percentage (I would say majority) of -fs and -stable posts relate somehow to ZFS; I'm often thrilled when it turns out to be something else. Playing a strange devil's advocate, let me give you an interesting example: softupdates. When SU was introduced to FreeBSD back in the late 90s, there were issues and concerns -- lots. 
As such, SU was chosen to be disabled by default on root filesystems given the importance of that filesystem (re: "we do not want to risk losing as much data in the case of a crash" -- see the official FAQ, section 8.3). All other filesystems defaulted to SU enabled. It's been like that up until 9.x where it now defaults to enabled. So that's what, 15 years? You could say that my example could also apply to ZFS, i.e. the reports are a part of its growth and maturity, and I'd agree. But I don't feel it's reached the point where I'm willing to risk going ZFS-only. Down the road, sure, but not now. That's just my take on it. Please make sure to also consider, politely, that a lot of people who have issues wit
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 09:10:45PM +0300, Andriy Gapon wrote: > on 01/07/2013 20:04 Jeremy Chadwick said the following: > > People are operating with the belief that "ZFS just > > works", when reality shows "it works until it doesn't" > > That reality applies to everything that a man creates with a purpose to work. > I am not sure why you are so over-focused on ZFS. > Please stop spreading FUD. Thank you. The issue is that ZFS on FreeBSD is still young compared to other filesystems (specifically UFS). Nothing is perfect, but FFS/UFS tends to have a significantly larger number of bugs worked out of it to the point where people can use it without losing sleep (barring the SUJ stuff, don't get me started). I have the same concerns over other things, like ext2fs and fusefs for that matter -- but this thread is about a ZFS-related crash, and that's why I'm "over-focused" on it. A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only), results in a system where an admin can upgrade + boot into single-user and perform some tasks to test/troubleshoot; if the ZFS layer is broken, it doesn't mean an essentially useless box. That isn't FUD, that's just the stage we're at right now. I'm aware lots of people have working ZFS-exclusive setups; like I said, "works great until it doesn't". So, how do you kernel guys debug a problem in this environment: - ZFS-only - Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt with added debugging features, etc.) - No swap configured - No serial console -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 02:04:24PM -0400, Scott Sipe wrote: > On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick wrote: > > > On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote: > > > On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick wrote: > > > > > > > Of course when I see lines like this: > > > > > > > > Trying to mount root from zfs:zroot > > > > > > > > ...this greatly diminishes any chances of "live debugging" on the > > > > system. It amazes me how often I see this come up on the lists -- > > people > > > > who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish > > > > that behaviour would stop, as it makes debugging ZFS a serious PITA. > > > > This comes up on the list almost constantly, sad panda. > > > > > > > > > I'm not sure why it amazes you that people are making widespread use of > > ZFS. > > > > It's not widespread use of ZFS. It's widespread use of ZFS as their > > sole filesystem (specifically root/var/tmp/usr, or more specifically > > just root/usr). People are operating with the belief that "ZFS just > > works", when reality shows "it works until it doesn't". The mentality > > seems to be "it's so rock solid it'll never break" along with "it can't > > happen to me". I tend to err on the side of caution, hence avoidance of > > ZFS for critical things like the aforementioned. > > > > It's different if you have a UFS root/var/tmp/usr and ZFS for everything > > else. You then have a system you can boot/use without issue even if ZFS > > is crapping the bed. > > > > > > ... > > > > > > 95% of FreeBSD users cannot debug kernel problems**. To debug a kernel > > problem, you need: a crash dump, a usable system with the exact > > kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and > > boot into 8.2 and reliably debug it using that), and (most important of > > all) a developer who is familiar with kernel debugging *and* familiar > > with the bits which are crashing. 
Those who say what you're quoting are > > often the latter. > > > > > > ... > > > > > > But the OP is running -RELEASE, and chooses to run that, along with use > > of freebsd-update for binary updates. Their choices are limited: stick > > with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely. > > > > So I realize that neither 8.2-RELEASE or 8.4-RELEASE are stable, but I > ultimately wasn't sure where the right place to go for discuss 8.4 is? For filesystem issues, freebsd-fs@ is usually the best choice, because it discusses filesystem-related thing (regardless of stable vs. release, but knowing what version you have of course is mandatory). freebsd-stable@ is mainly for stable/X related discussions. Sorry to add pedanticism to an already difficult situation for you (and I sympathise, particularly since the purpose of the lists is often difficult to discern, even with their terse descriptions in mailman). > Beyond the FS mailing list, was there a better place for my question? I'll > provide the other requested information (zfs outputs, etc) to wherever > would be best. Nope, not as far as I know. The only other place is send-pr(1), once you have an issue that can be reproduced. Keep in mind, however, that none of these options (mailing lists, send-pr, etc.) mandate a response from anyone. You/your business (see below) should be aware that there is always the possibility no one can help solve the actual problem; as such it's important that companies have proper upgrade/migration paths, rollback plans, and so on. > This is a production machine (has been since late 2010) and after tweaking > some ZFS settings initially has been totally stable. I wasn't incredibly > closely involved in the initial configuration, but I've done at least one > binary freebsd-update previously. Well regardless it sounds like moving from 8.2-RELEASE to 8.4-RELEASE causes ZFS to break for you, so that would classify as a regression. 
What the root cause is, however, is still unknown. Point: 8.2-RELEASE came out in February 2011, and 8.4-RELEASE came out in June 2013 -- that's almost 2.5 years of changes between versions. The number of changes between these two is major -- hundreds, maybe thousands. ZFS got worked on heavily during this time as well. I tend to tell anyone using ZFS that they should be running a stable/X (particularly stable/9) branch. I can expand on that justification if needed, as it's well-founded for a lot of reasons. &
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote: > On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick wrote: > > > On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote: > >> *** Sorry for partial first message! (gmail sent after multiple returns > >> apparently?) *** > >> > >> Hello, > >> > >> I have not had much time to research this problem yet, so please let me > >> know what further information I might be able to provide. > >> [[...]] > >> Any thoughts? > > > > Thoughts: > > > > [[..]] > > Of course when I see lines like this: > > > > Trying to mount root from zfs:zroot > > > > ...this greatly diminishes any chances of "live debugging" on the > > system. It amazes me how often I see this come up on the lists -- people > > who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish > > that behaviour would stop, as it makes debugging ZFS a serious PITA. > > This comes up on the list almost constantly, sad panda. > > > I'm not sure why it amazes you that people are making widespread use of ZFS. It's not widespread use of ZFS. It's widespread use of ZFS as their sole filesystem (specifically root/var/tmp/usr, or more specifically just root/usr). People are operating with the belief that "ZFS just works", when reality shows "it works until it doesn't". The mentality seems to be "it's so rock solid it'll never break" along with "it can't happen to me". I tend to err on the side of caution, hence avoidance of ZFS for critical things like the aforementioned. It's different if you have a UFS root/var/tmp/usr and ZFS for everything else. You then have a system you can boot/use without issue even if ZFS is crapping the bed. > You could make the same argument that people shouldn't use UFS2 > journaling on their file systems because bugs in the implementation > might make debugging journaled UFS2 file systems "a serious PITA." Yup, and I do make that argument, quite regularly at that. 
There is even some evidence at this point in time that softupdates are broken: http://lists.freebsd.org/pipermail/freebsd-fs/2013-June/017424.html > The point is that there are VERY compelling reasons why people might > want to use ZFS for root/var/tmp/usr/etc. (pooled storage; easy > snapshots; etc.) and there should come a time when a given file system > is "generally regarded as safe." While there may be compelling reasons, those reasons quickly get shot down when they realise they have a system they can't easily do troubleshooting with when the issue is with ZFS. > I'd say the time for ZFS came when they removed the big disclaimer > from the boot messages. If ZFS is dangerous, they should reinstate > the "not ready for production" warning. Until they do, I think it's > unfair to castigate people for using ZFS universally. The warning meant absolutely nothing at the time (it did not keep people away from it), and would mean nothing now if brought back. A single kernel printf() is not the right choice of action. Are we better off today than we were when ZFS was originally ported over? Yes, by far. Lots of improvements, in many great/good ways. No argument there. But there is no way I'd risk putting my root filesystem (or other key filesystems) on it -- still too new, still too many bugs, and users don't know about those problems until it's too late. > Isn't it a recurring theme on freebsd-current and freebsd-stable that > more people need to use features so they can be debugged in realistic > environments? If you're telling them, "don't use that because it > makes debugging harder," how are they supposed to get debugged and > hence improved? :-) 95% of FreeBSD users cannot debug kernel problems**. To debug a kernel problem, you need: a crash dump, a usable system with the exact kernel/world where the crash happened (i.e. 
you cannot crash 8.4 ZFS and boot into 8.2 and reliably debug it using that), and (most important of all) a developer who is familiar with kernel debugging *and* familiar with the bits which are crashing. Those who say what you're quoting are often the latter. Part of the "need people to try this" process you refer to is what stable/X is about, *without* the extra chaos of head. I'm one of those who for the past 15 years has advocated stable/X usage for a lot of reasons; I'll save the diatribe for some other time. But the OP is running -RELEASE, and chooses to run that, along with use of freebsd-update for binary updates. Their choices are limited: stick with 8.2, switch to stable/X, cease use of ZFS, or change OSes
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 08:49:25AM -0700, Jeremy Chadwick wrote:
> - Is there a reason you do not have dumpdev defined in /etc/rc.conf (or
>   alternately, no swap device defined in /etc/fstab (which will get
>   used/honoured by the dumpdev="auto" (the default)) ?

This should have read "or alternately, ***A*** swap device defined in /etc/fstab ..."

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
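For reference, a sketch of what such a configuration might look like. The swap device name is hypothetical and must be replaced with a real swap partition; dumpdir is shown with what I understand to be its default value.

```shell
# /etc/rc.conf
dumpdev="AUTO"        # use a swap device listed in /etc/fstab as the dump device
dumpdir="/var/crash"  # where savecore(8) stores the dump after the reboot

# /etc/fstab (device name below is hypothetical)
# /dev/ada0p3   none   swap   sw   0   0
```

With neither a dumpdev nor a swap device available, the kernel has nowhere to write the crash dump, which matches the "Cannot dump. Device not defined or unavailable." line in the reported panic.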
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote: > *** Sorry for partial first message! (gmail sent after multiple returns > apparently?) *** > > Hello, > > I have not had much time to research this problem yet, so please let me > know what further information I might be able to provide. > > This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4 > using freebsd-update. After I rebooted to test the new kernel, I got a > panic. I had to take a picture of the screen. Here's a condensed version: > > panic: page fault > cpuid = 1 > KDB: stack backtrace: > #0 kdb_backtrace > #1 panic > #2 trap_fatal > #3 trap_pfault > #4 trap > #5 calltrap > #6 vdev_mirror_child_select > #7 ved_mirror_io_start > #8 zio_vdev_io_start > #9 zio_execute > #10 arc_read > #11 dbuf_read > #12 dbuf_findbp > #13 dbuf_hold_impl > #14 dbuf_hold > #15 dnode_hold_impl > #16 dnu_buf_hold > #17 zap_lockdir > Uptime: 5s > Cannot dump. Device not defined or unavailable. > Automatic reboot in 15 seconds - press a key on the console to abort > > uname -a from before (and after) the reboot: > > FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 > UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC > amd64 > > dmesg is attached. > > I was able to reboot to the old kernel and am up and running back on 8.2 > right now. > > Any thoughts? Thoughts: - All I see is an amd64 system with 16GB RAM and 4 disks driven by an ICH10 in AHCI mode. - Output from: zpool status - Output from: zpool get all - Output from: zfs get all - Output from: "gpart show -p" for every disk on the system - Output from: cat /etc/sysctl.conf - Output from: cat /boot/loader.conf - Is there a reason you do not have dumpdev defined in /etc/rc.conf (or alternately, no swap device defined in /etc/fstab (which will get used/honoured by the dumpdev="auto" (the default)) ? Taking photos of the console and manually typing backtraces in is borderline worthless. 
Of course when I see lines like this: Trying to mount root from zfs:zroot ...this greatly diminishes any chances of "live debugging" on the system. It amazes me how often I see this come up on the lists -- people who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish that behaviour would stop, as it makes debugging ZFS a serious PITA. This comes up on the list almost constantly, sad panda. - Get yourself stable/9 and try that: https://pub.allbsd.org/FreeBSD-snapshots/ - freebsd-fs is a better place for this discussion, especially since you're running a -RELEASE build, not a -STABLE build. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
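The outputs requested above can be gathered in one pass. This is a rough sketch: the function names section and collect are hypothetical, the commands are exactly the ones listed in the message, and some of them may need a pool name on older ZFS versions.

```shell
#!/bin/sh
# Run one command and prefix its output with a header so the sections are
# easy to tell apart in the final report. Failures are noted, not fatal.
section() {
    echo "===== $* ====="
    "$@" 2>&1 || echo "(command failed or not available: $*)"
}

# Gather everything asked for in the reply above.
collect() {
    section zpool status
    section zpool get all
    section zfs get all
    section gpart show -p
    section cat /etc/sysctl.conf
    section cat /boot/loader.conf
}

# Usage (the output path is just an example):
#   collect > /tmp/zfs-report.txt
```

Attaching a single text file like this to a list post or PR is far more useful than a photo of the console.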
Re: Subversion 1.8 / FreeBSD 8 x86 STABLE Symlinks
On Sun, Jun 30, 2013 at 02:20:21PM -0400, Jason Hellenthal wrote:
> When using svn 1.8 I have come across a situation where when it is used
> pointing to a symlink that refers to a working directory that a update will
> either segfault or exit prematurely and leave a lock held on the working
> directory that the symlink points to.
>
> This leaves you with one choice but to run cleanup on the referenced actual
> working directory which was AFAIK never the case for any version below 1.8.
>
> Not sure if this is a problem with svn or FreeBSD itself but thought I would
> report the characteristics in case it's noticed elsewhere.
>
> Details:
> Using UFS
> FreeBSD 8-STABLE i386 as of this date.
>
> In the directory...
> cd /exports/usr
> ln -s src8 src
> svn up /exports/usr/src

Known bug/problem in Subversion, not FreeBSD:

http://svn.apache.org/viewvc?view=revision&revision=r1496007

Previous discussion:

http://lists.freebsd.org/pipermail/freebsd-questions/2013-June/251842.html

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: FREEBSD_INSTALL failed with error 19 during booting installer
On Sun, Jun 30, 2013 at 02:09:36AM +1000, Ian Smith wrote:
> On Fri, 28 Jun 2013 11:26:15 -0700, Jeremy Chadwick wrote:
> > On Fri, Jun 28, 2013 at 08:22:29PM +0200, Marek Salwerowicz wrote:
> > > Hi list,
> > >
> > > I am trying to install FreeBSD 9.1-Release amd64 on a Supermicro server:
> > >
> > > SuperStorage Server 6027R-E1R12N
> > >
> > > with Intel Xeon E5-2640 CPU and 32 GB (4 x 8 ) KVR16R11D4/8HC installed
> > >
> > > Currently I have only 2 SSD Kingston drives (working in mirror)
> > > installed on that server.
> > >
> > > during booting installer from the ISO CD (amd64), the boot process
> > > fails with message:
> > >
> > > Mounting from cd9660:/dev/iso9660/FREEBSD_INSTALL failed with error 19.
> > >
> > > As I found here: http://forums.freebsd.org/showthread.php?t=36579 ,
> > > probably this could be issue with ACPI, but setting option in
> > > loader:
> > >
> > > # set debug.acpi.disabled ="hostres"
> > > # boot
> > >
> > > made nothing for me.
> > >
> > > Any ideas?
> >
> > Try using a USB flash drive + memstick image instead of CD-based media.
>
> Last time I tried - 9.1-release i386 - the memstick boot gave no option
> to drop to loader; I had to burn a disc1 CD so I could drop to loader to
> turn cam.ctl off to succeed installing in 128MB. Did I miss something?

I've used memstick images exclusively for years and have never seen this.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: FREEBSD_INSTALL failed with error 19 during booting installer
On Fri, Jun 28, 2013 at 08:22:29PM +0200, Marek Salwerowicz wrote:
> Hi list,
>
> I am trying to install FreeBSD 9.1-Release amd64 on a Supermicro server:
>
> SuperStorage Server 6027R-E1R12N
>
> with Intel Xeon E5-2640 CPU and 32 GB (4 x 8 ) KVR16R11D4/8HC installed
>
> Currently I have only 2 SSD Kingston drives (working in mirror)
> installed on that server.
>
> during booting installer from the ISO CD (amd64), the boot process
> fails with message:
>
> Mounting from cd9660:/dev/iso9660/FREEBSD_INSTALL failed with error 19.
>
> As I found here: http://forums.freebsd.org/showthread.php?t=36579 ,
> probably this could be issue with ACPI, but setting option in
> loader:
>
> # set debug.acpi.disabled ="hostres"
> # boot
>
> made nothing for me.
>
> Any ideas?

Try using a USB flash drive + memstick image instead of CD-based media.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: AHCI Patsburg SATA controller and slow transfer speed
On Thu, Jun 27, 2013 at 06:38:27PM -0700, Jeremy Chadwick wrote:
> Next, this statement by ahci(4) then confuses the user:
>
> > ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
>
> You see, when AHCI was invented, the existing idea was that all ports
> would have the same speed (and that was the case at the time). Only
> somewhat recently have some vendors begun to mix-match speeds on the
> same controller -- like this one.
>
> The AHCI specification probably (I have not read it even recently) only
> provides a number indicating "the total number of ports" followed by a
> single number indicating "the speed".
>
> There may be support somewhere within AHCI to provide an updated way to
> get more granular information, but I do not know if that's the case.
>
> If there is, FreeBSD's ahci(4) driver does not support such at this
> time (see sys/dev/ahci/ahci.c around line 502 for the device_printf()
> call and what the arguments are (specifically AHCI_CAP_ISS and
> AHCI_CAP_NPMASK)).

Just a technical follow-up: I spent some time this evening looking at AHCI specification 1.30. I'll try to explain the situation.

First, at the HBA level (meaning the entire AHCI controller):

Bits 23-20 of CAP (reg. offset 0x00): Interface Speed Support (ISS). This indicates, quote, "the maximum speed the HBA can support on its ports".

Next, on a per-port basis, there are two registers available relating to speed: one indicates speed, the other controls/limits speed:

1) Bits 7-4 of PxSSTS (reg. offset 0x28): SPD: Port x Serial ATA Status (SCR0: SStatus). This indicates, quote, "the negotiated interface speed".

2) Bits 7-4 of PxSCTL (reg. offset 0x2c): SPD: Port x Serial ATA Control (SCR2: SControl). The register controls, quote, "the highest allowable speed of the interface". The bit definitions indicate a way to limit the speed of a port and do not indicate capability.
The actual 1.30 specification even has a section (10.5) on this whole ordeal, which states clearly, quote:

  10.5 Interface Speed Support

  The HBA indicates the maximum speed it can support via the CAP.ISS
  register. Software can further limit the speed of a port by
  manipulating each port's PxSCTL.SPD field to a lower value.

AHCI spec "proposal" 1.31 also does not address/cover this (all that adds is per-port sleep capabilities).

I will point out that SATA600 is not officially mentioned in any spec at this time (that I can get my hands on), so what all the OSes run off of are educated assumptions. :-) But theoretically, a newer AHCI spec could support per-port maximum speed indication.

It's not easy to phrase all this tersely in a single device_printf(), and there has already been opposition to adding printing of more lines to the existing drivers/in dmesg (meaning, printing 6 lines, one for each port, indicating active speed + maximum speed, would probably be looked down upon outside of verbose booting).

The best I can come up with is this:

ahci0: AHCI v1.30, 6 ports, maximum 6Gbps, Port Multiplier not supported

...which is better, but could still be interpreted as "6 ports with a maximum of 6Gbps per port".

Hope this sheds light in some way or another.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
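To make the register layout concrete, here is a small sketch that pulls those fields out of raw register values. Several things are assumptions on my part: ISS is taken to occupy CAP bits 23:20 (in line with the 0x00f00000 mask FreeBSD uses for AHCI_CAP_ISS, as far as I recall), the port count in CAP bits 4:0 is treated as zero-based, the generation-to-speed mapping (1 = 1.5Gbps, 2 = 3Gbps, 3 = 6Gbps) follows the usual SATA numbering, and the sample register values are made up for illustration. All function names are hypothetical.

```shell
#!/bin/sh
# Extract CAP.ISS (assumed bits 23:20): maximum speed the HBA supports.
cap_iss() { echo $(( ($1 >> 20) & 0xF )); }

# Extract CAP.NP (assumed bits 4:0): number of ports, stored zero-based.
cap_ports() { echo $(( ($1 & 0x1F) + 1 )); }

# Extract the SPD field (bits 7:4) from a PxSSTS or PxSCTL value.
port_spd() { echo $(( ($1 >> 4) & 0xF )); }

# Map a SATA generation number to a readable speed (assumed mapping).
gen_to_speed() {
    case "$1" in
        1) echo "1.5Gbps" ;;
        2) echo "3Gbps" ;;
        3) echo "6Gbps" ;;
        *) echo "unknown/not reported" ;;
    esac
}

# Made-up sample values: a CAP with ISS=3 and NP=5, and a PxSSTS with SPD=2.
cap=0x00300005
ssts=0x123

echo "HBA: $(cap_ports "$cap") ports, maximum $(gen_to_speed "$(cap_iss "$cap")")"
echo "port: negotiated $(gen_to_speed "$(port_spd "$ssts")")"
```

The point of the exercise: CAP yields one speed for the whole controller, and only the per-port SStatus value tells you what a given link actually negotiated, which is why a single dmesg line cannot describe a controller with mixed-speed ports.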
Re: AHCI Patsburg SATA controller and slow transfer speed
On Thu, Jun 27, 2013 at 02:21:57PM -0700, Dave Hayes wrote: > Greetings all. I'm on FreeBSD 9.1-STABLE #0 r251391M. I'm noticing > two of my SATA disks are at half speed. Is this normal or is there > some configuration I'm forgetting? > > # dmesg | grep -C 4 ahc > ... > ahci0: port > 0x2070-0x2077,0x2060-0x2063,0x2050-0x2057,0x2040-0x2043,0x2020-0x203f > mem 0xd0b0-0xd0b007ff irq 21 at device 31.2 on pci0 > ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported > ahcich0: at channel 0 on ahci0 > ahcich1: at channel 1 on ahci0 > ahcich2: at channel 2 on ahci0 > ahcich3: at channel 3 on ahci0 > ahcich4: at channel 4 on ahci0 > ahcich5: at channel 5 on ahci0 > ... > ada0: ATA-8 SATA 3.x device > ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > ada1: ATA-8 SATA 3.x device > ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 > ada2: ATA-8 SATA 3.x device > ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ^ > ada2: Command Queueing enabled > ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada2: Previously was known as ad8 > ada3 at ahcich3 bus 0 scbus3 target 0 lun 0 > ada3: ATA-8 SATA 3.x device > ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ^^^ > ada3: Command Queueing enabled > ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada3: Previously was known as ad10 > # pciconf -lcvb > ahci0@pci0:0:31:2: class=0x010601 card=0x35ae8086 > chip=0x1d028086 rev=0x06 hdr=0x00 > vendor = 'Intel Corporation' > device = 'Patsburg 6-Port SATA AHCI Controller' > class = mass storage > subclass = SATA > bar [10] = type I/O Port, 
range 32, base 0x2070, size 8, enabled > bar [14] = type I/O Port, range 32, base 0x2060, size 4, enabled > bar [18] = type I/O Port, range 32, base 0x2050, size 8, enabled > bar [1c] = type I/O Port, range 32, base 0x2040, size 4, enabled > bar [20] = type I/O Port, range 32, base 0x2020, size 32, enabled > bar [24] = type Memory, range 32, base 0xd0b0, size 2048, enabled > cap 05[80] = MSI supports 1 message enabled with 1 message > cap 01[70] = powerspec 3 supports D0 D3 current D0 > cap 12[a8] = SATA Index-Data Pair > cap 13[b0] = PCI Advanced Features: FLR TP > > Thanks for any insight provided. Intel Patsburg is otherwise known as Intel X79. The X79 chipset/southbridge offers 6 SATA ports, 2 of which are SATA600, and the remaining 4 are SATA300: http://en.wikipedia.org/wiki/Intel_X79 The intention of this was to offer 2 ports for people wanting to use SSDs (which tend to throttle themselves based on negotiated PHY speed), and a remaining 4 ports for MHDDs or ATAPI. You can, of course, use whatever ports for whatever you want. More importantly (I think): your devices are MHDDs and will never be able to reach SATA600 (or SATA300) speeds. Pure MHDDs which use SATA600 PHYs are somewhat of a marketing gimmick (but my gut feeling is that the MHDD vendors are choosing to narrow the number of on-disk SATA controllers they use). Hybrid HDDs may benefit from faster PHYs. Next, this statement by ahci(4) then confuses the user: > ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported You see, when AHCI was invented, the existing idea was that all ports would have the same speed (and that was the case at the time). Only somewhat recently have some vendors begun to mix-match speeds on the same controller -- like this one. The AHCI specification probably (I have not read it even recently) only provides a number indicating "the total number of ports" followed by a single number indicating "the speed". 
There may be support somewhere within AHCI to provide an updated way to get more granular information, but I do not know if that's the case. If there is, FreeBSD's ahci(4) driver does not support such at this time (see sys/dev/ahci/ahci.c around line 502 for the device_printf() call and what the arguments are (specifically AHCI_CAP_ISS and AHCI_CAP_NPMASK)). TL;DR -- Your motherboard offers 6 ports, 2 of which are SATA600, 4 of which are SATA300, and despite the line shown above by FreeBSD not matching reality, everything is working as designed. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Ad
Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?
On Wed, Jun 26, 2013 at 01:23:32PM -0700, Jeremy Chadwick wrote: > On Wed, Jun 26, 2013 at 09:42:43AM -0700, Chris H wrote: > {snipping} Also, hoping the OP is subscribed to -stable -- you should probably deal with this. This is not the first time I've seen problems with mail delivery to a 1command.com address. : host male.ultimateDNS.NET[209.180.214.225] said: 550 5.0.0 SPAM and BULK mail REJECTED (in reply to MAIL FROM command) -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?
On Wed, Jun 26, 2013 at 09:42:43AM -0700, Chris H wrote: > Greetings, > I haven't upgraded my tree(s) for awhile. My last attempt to rebuild after > an updating > src && ports, resulted in nearly installing the entire ports tree, which is > why I've > waited so long. Try as I might, I've had great difficulty finding something > that will > _only_ upgrade what I already have installed, _and_ respect the "options" > used during the > original make && make install, or those options expressed in make.conf. > As portupgrade(1) && portmaster(8) appear to be the most used in this > scenario, > I'm soliciting opinions on which of these works best, or if there is > something else to > better manage this situation. Is there such a thing as a FreeBSD upgrade > "easy button"? Use portmaster, avoid portupgrade. And no I will not expand on my reasoning -- I urge anyone even mentioning the word portupgrade to spend a few hours of their day reading the horror stories on the mailing lists over the past 10 years or so (including recently). Choose wisely. And before going on any sort of "update crusade", I recommend you re-examine your make.conf methodologies for options if you haven't already. 
The OPTIONS framework has been revamped and improved many times over,
so you will find things like this on a system whose admin keeps up with
the times (compare this to older ways/methods, which may break or stop
working):

  OPTIONS_UNSET+= X11 IPV6 NLS
  php5_SET+= APACHE
  php5_UNSET+= CGI
  postfix_SET+= PCRE TLS SASL2
  samba36_SET+= AIO_SUPPORT
  samba36_UNSET+= LDAP CUPS ACL_SUPPORT WINBIND POPT
  wget_SET+= OPENSSL
  wget_UNSET+= IDN

When rebuilding everything, I have always resorted to this:

  rsync -avH /usr/local/ /usr/local.old/
  pkg_delete -a -f
  rm -fr /usr/local/*
  rm -fr /var/db/ports/*
  rm -fr /usr/ports/distfiles/*
  cd /usr/ports/whatever
  make install clean
  {lather rinse repeat until done}

And add some pkg_add -r's in there for large-ish things I don't want to
rebuild from source (I think folks who use X probably do this quite a
bit; I remember hearing how Open/LibreOffice takes something like 3-4
hours to build on some systems).

But that's just how I do things. My advice on using portmaster,
however, still stands.

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: Another bug in SSH in FreeBSD 8.4 (sftp cannot create relative symlinks)
On Tue, Jun 25, 2013 at 03:03:04AM +0200, Miroslav Lachman wrote: > Jeremy Chadwick wrote: > >On Mon, Jun 24, 2013 at 03:36:24PM -0700, Xin Li wrote: > >>-BEGIN PGP SIGNED MESSAGE- > >>Hash: SHA512 > >> > >>On 06/24/13 15:11, Miroslav Lachman wrote: > >>[...] > >>>The patch seems really simple and I know how to apply it, but I am > >>>not able to compile and install only fixed sftp command instead of > >>>the whole userland. Can you push me to the right direction? > >> > >>I think you can go to /usr/src/secure/usr.bin/sftp and do: > >> > >>make depend > >>make > >> > >>Then, as root: > >> > >>make install > > Thank you! I didn't know I must be in /usr/src/secure/usr.bin/sftp > > I tried your patch and can confirm it works for me! > > >>I usually do a full world build to make sure that this doesn't break > >>something else but this change should only affect sftp(1). > > > >I'm going to make this real simple: > > > >Is the problem with symlinks in the client (sftp(1)), in the server > >(sftp-server(8)), or both? The impression I get from the original post > >that started this thread is that it's in the server part. > > No, it is the problem on the client side. The server side in all > cases is good old OpenSSH 5.4 on FreeBSD 8.3. Only the newer sftp > client is broken and this bug is really fixed by patch provided by > Xin Li. > > We tried OpenSSH 6.2 client side from Mac OS X and it is broken too. > The same apply to openssh-portable from ports (openssh-portable-6.2.p2_3,1) > > >So, I believe he'd want to poke about in src/secure/libexec/sftp-server. > >However, that may not be enough, due to the fact that sftp-server(8) > >depends (links to) libssh.so.X, libcrypt.so.X, and libcrypto.so.X. I do > >not know where the actual broken code lies. 
> > > >Someone on -security might know exactly what all needs to be built/what > >commands need to be run, but I will tell you this up front: > > > >The official security announcements for SSL or SSH-related things have > >historically told people to build world. I went and read the mailing > >list archives for -security-announcements and found proof/examples of > >this fact when issues pertain to SSL or SSH. > > > >My recommendation is just to build world. Don't risk it -- this is a > >key piece of your system, all you're trying to do is save some time. > >Don't. Just build/install world and don't screw around. > > I understand your concern and I will rebuild world if the patch > changes anything in the server part, but this is realy just a fix in > sftp client command and I want to try it quickly and to have a quick > path to go back to original version of the sftp command. > > This is on testing machine anyway, I will not do this on production > machines. Understood -- it was my misunderstanding of the issue (being on the client side, not server side), so Xin's advice is sound. Sorry for the noise on my part. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Another bug in SSH in FreeBSD 8.4 (sftp cannot create relative symlinks)
On Mon, Jun 24, 2013 at 03:36:24PM -0700, Xin Li wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA512 > > On 06/24/13 15:11, Miroslav Lachman wrote: > [...] > > The patch seems really simple and I know how to apply it, but I am > > not able to compile and install only fixed sftp command instead of > > the whole userland. Can you push me to the right direction? > > I think you can go to /usr/src/secure/usr.bin/sftp and do: > > make depend > make > > Then, as root: > > make install > > I usually do a full world build to make sure that this doesn't break > something else but this change should only affect sftp(1). I'm going to make this real simple: Is the problem with symlinks in the client (sftp(1)), in the server (sftp-server(8)), or both? The impression I get from the original post that started this thread is that it's in the server part. So, I believe he'd want to poke about in src/secure/libexec/sftp-server. However, that may not be enough, due to the fact that sftp-server(8) depends (links to) libssh.so.X, libcrypt.so.X, and libcrypto.so.X. I do not know where the actual broken code lies. Someone on -security might know exactly what all needs to be built/what commands need to be run, but I will tell you this up front: The official security announcements for SSL or SSH-related things have historically told people to build world. I went and read the mailing list archives for -security-announcements and found proof/examples of this fact when issues pertain to SSL or SSH. My recommendation is just to build world. Don't risk it -- this is a key piece of your system, all you're trying to do is save some time. Don't. Just build/install world and don't screw around. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Sun, Jun 23, 2013 at 02:41:27AM +0200, Willem Jan Withagen wrote:
> On 19-6-2013 17:04, Jeremy Chadwick wrote:
> >- Adam runs 9.1-RELEASE because of business needs pertaining to
> >  freebsd-update and binary updates. (I ask more about this for
> >  benefits of readers below, however -- because this situation comes
> >  up a lot and I want to know what real-world admins do)
>
> The bug is very specifically available in 9.1-RELEASE because I got
> bit by it before the release of 9.1. But discussed it with avg@ and
> it did not make it into the release, but was submitted only like 2
> weeks later.
>
> So in that case you can probably stop looking.
>
> For just about any 9.1-STABLE after that should the fix be in the code.

I'm not sure why so many people (so far) seem to think that this
problem is always the same issue -- it isn't. Multiple things have
historically caused (and/or presently cause) this issue. Here's the
list I composed only a few days ago, and it is far from thorough:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073863.html

My point is that the "shutdown -r" issue might manifest itself in the
same fashion for everyone, but the **root cause** often differs. I.e.
what fixed it for you may not fix it for Adam. We must wait and see
(he's in the process of getting a system to try stable/9 on).

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: slow bootloader on Dell R320
On Sat, Jun 22, 2013 at 09:37:37PM +0200, Loc BLOT wrote: > Hi all ! > Thanks for the very good support of Dell R320 hardware, perc H310 is > well supported, BCM5720 seems to work correctly and performances are > great. > The only problem i have found is very strange. The FreeBSD bootloader > take many times to load, 30sec-2minutes to boot the kernel and show the > bootloader menu. After that, the system boots properly, at a normal > speed. > Is there any issue or optimization i can do ? > The OpenBSD bootloader doesn't have this problem. 1. What FreeBSD version exactly? (Please don't say "9.1", we need to know the full version, e.g. 9.1-RELEASE, or if you built your own we need uname -a output (you can hide the machine name)) 2. How many disks are in the machine? 3. Are any of the disks used for ZFS? There have been **many** improvements to the FreeBSD bootloader with regards to things taking a long time on boot-up in semi-recent days, but answers to the above questions will determine that. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 11:34:39AM -0500, Matthew D. Fuller wrote: > On Wed, Jun 19, 2013 at 09:16:35AM -0700 I heard the voice of > Jeremy Chadwick, and lo! it spake thus: > > > > The above CDB + subcommand disables APM entirely. There is a lot > > more to APM than just parking heads (and in all honesty, APM should > > have nothing to do with parking heads). Disabling APM can actually > > have drastic effects on drive temperature (meaning there are certain > > chip and/or motor operations that said feature controls *in > > addition* to head parking), and other firmware-level features that > > aren't documented. > > True enough, in concept. With all the drives sitting behind > ventilation perfectly capable of dealing with 15kRPM drives, I don't > worry about what that might do to the 7200's though... Justified in your environment, but not in mine -- where most of my systems (at home) are extremely quiet (1000-1200rpm fans, lots of noise dampening material, etc.). A 10C increase *during idle* is enough to make me wary. I also have extremely sensitive hearing, so drives clicking is something I can hear from quite a distance -- I guess working with them for so long over the years has made me sensitive to 'em. > > Furthermore, that CDB does not work for all drives. There are > > Seagate drives -- I know because I bought some and returned them > > when the APM trick did not work -- that lack the LCC-disable tie-in > > to APM. The drive either rejected the CDB (ATA status code error > > returned), while others accepted it but nothing in 0xec (IDENTIFY) > > reported as got changed. > > Well, I haven't seen it with these. Several of > ada0: ATA-8 SATA 3.x device > and some systems with CC4C too. The drives I was testing were STx000DM001. I don't remember if I had a DM002. I also don't remember the firmware version they had on them, but I do remember there were no updates available from Seagate at that time. 
On the other hand, their forum was *filled* with post after post about
the issue, including one fellow whose drive, in something like 3
months, was almost at the MTBF head park/reload count.

But my point is this: 3.5" drives do not need this feature in 95% of
environments. In desktop systems it's worthless -- in consumer desktops
it accomplishes nothing but noise and annoyance and impacts I/O, and in
business desktop environments it serves no purpose because most places
have their desktops go into sleep mode (so drive standby/sleep gets
used). And in the server environment it's pure 100% worthless.

With 2.5" drives I can see it being more useful, but only if the drive
is used in a laptop. There are NASes (and now servers too!) which use
2.5" drives, and I sure as hell wouldn't want that happening there.

So really it's just a bad feature all around that should be specific to
one environment demographic; the vendors should have made a 2.5" drive
"dedicated for laptops" that had this feature enabled, while disabled
on all other drives (2.5" and 3.5"). What we got was nearly the
opposite.

> > I will have -- and eat -- their souls.
>
> The problem with that is that the undigestible bits of "soul" just get
> passed right back into the ecosystem, and in a more concentrated form.
>
> Some might suggest that's already happened, and is got us here in the
> first place 8-}

If you had what I do (moderate-to-severe IBS), you'd know that it
definitely doesn't get passed back in a more concentrated form.

First joke I've been able to make about my health condition, yeah!

Ha! I kill me! -- Alf

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 10:53:46AM -0500, Matthew D. Fuller wrote: > On Wed, Jun 19, 2013 at 08:04:14AM -0700 I heard the voice of > Jeremy Chadwick, and lo! it spake thus: > > > > > > Readers: if any of you have a ST[123]000DM001 drive running the CC24 > > firmware, and can confirm high head parking counts (SMART attribute > > 193), and are willing to upgrade your drive firmware to the latest then > > see if the LCC increments stop (or at least settle down to normal > > levels), I'd love to hear from you. I have been socially boycotting > > these models of drives because of that idiotic firmware design choice > > for quite some time now (not to mention the parking on those drives > > is audibly loud in a normal living room), and if the F/W actually > > inhibits the excessive parking then I have some drives to consider > > upgrading. :-) > > > > I dunno about firmware, but you can smack 'em with a big hammer... > > /etc/rc.local: > for i in 0 1; do > /sbin/camcontrol cmd ada${i} -a "EF 85 00 00 00 00 00 00 00 00 00 00" > done > > x-ref: > http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052997.html > > > LCC was somewhere in the upper 400's (I wanna say 480-some?) a year > and change ago when I dropped that in. It's 506/493 now on the two > drives. The above CDB + subcommand disables APM entirely. There is a lot more to APM than just parking heads (and in all honesty, APM should have nothing to do with parking heads). Disabling APM can actually have drastic effects on drive temperature (meaning there are certain chip and/or motor operations that said feature controls *in addition* to head parking), and other firmware-level features that aren't documented. Furthermore, that CDB does not work for all drives. There are Seagate drives -- I know because I bought some and returned them when the APM trick did not work -- that lack the LCC-disable tie-in to APM. 
Some drives rejected the CDB (ATA status code error returned), while
others accepted it but nothing reported by 0xec (IDENTIFY) changed.

The only model of drive I know that reliably works with this method is
the WD Green/-GP drive, and the drive temperatures do increase. No idea
on the Blues. (Another reason I recommend the Reds...)

What *should* have happened is that a new 0xef subcommand should have
been created for this. Subs range from 0x00-0xff. T13 spec shows that a
huge number of them (I'd say 30% or more) are marked "Reserved" and an
additional 30% or so are marked "Obsolete". And finally, 0x56-0x5c,
0xd6-0xdc and 0xe0 are "Vendor Specific".

But looking at this from a more general view, the real issue is that
these types of features should not have been introduced to begin with.
The vendors introduced this problem, and now are marketing drives with
said feature disabled, claiming "we fixed the problem that annoys so
many of you!" -- the same problem **they introduced without asking
anyone**.

I will have -- and eat -- their souls.

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
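For anyone wanting to try the quoted rc.local approach with the caveats above in mind, here is a cautious sketch: a dry-run variant that only prints the camcontrol commands, plus a small filter for watching SMART attribute 193 before and after. The function names and the sample attribute line are hypothetical; the command bytes are exactly the ones quoted in this thread, and the filter assumes the usual ten-column layout of "smartctl -A" output.

```shell
#!/bin/sh
# Sketch only: dry-run version of the rc.local loop quoted in this
# thread, plus a check of SMART attribute 193 (Load_Cycle_Count).
# Function names and the sample line at the bottom are hypothetical.

# ATA command bytes from the quoted message: 0xEF with subcommand 0x85,
# which disables APM entirely (not just head parking).
APM_DISABLE="EF 85 00 00 00 00 00 00 00 00 00 00"

# Print (do not run) the camcontrol command for each disk given.
apm_disable_cmds() {
    for disk in "$@"; do
        echo "/sbin/camcontrol cmd $disk -a \"$APM_DISABLE\""
    done
}

# Read "smartctl -A" style output on stdin and print the raw value
# (assumed to be the tenth column) of attribute 193.
lcc_raw() {
    awk '$1 == 193 { print $10 }'
}

apm_disable_cmds ada0 ada1

# Hypothetical attribute line, for illustration only.
sample="193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 506"
echo "LCC: $(echo "$sample" | lcc_raw)"
```

In real use the filter would be fed from something like "smartctl -A /dev/ada0 | lcc_raw", sampled a day or so apart. If the count keeps climbing after the command was accepted, the drive is probably one of those that lacks the LCC tie-in to APM. Keep an eye on drive temperature as well.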
Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)
On Wed, Jun 19, 2013 at 05:02:20PM +0200, Dennis Kgel wrote:
> On 19.06.2013 at 16:47, Steven Hartland wrote:
> > I'm not familar with that model of the areca but have you tried
> > with the standard OS driver or does it not support that card?
>
> The ARC1320 (non-raid) unfortunately isn't supported by the in-tree driver.

Which model of the ARC1320 are you using (there are 2)? I'm having
trouble understanding their chart too:

http://www.areca.us/products/sasnoneraid6g.htm

Because the controllers claim to support up to 128 disks, via break-out
cables, but I'm not sure. You aren't using any port multipliers, are
you?

> > Also when you see hangs can you access the disk directly or not
> > e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
>
> Interesting idea. The dd then hangs right until everything else resumes as
> well.
>
> ^T during hang says: load: 12.39 cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0%
> 1632k

Is this **while** you have immense amounts of ZFS write I/O going to
those drives (your zpool iostat was showing ~250-300MB/sec to the
pool)? It's very important to note that the stats you showed were
during writes.

What we're trying to figure out here is where the blocking (waiting) is
happening:

a) the ZFS layer
b) the storage driver layer ('arcsas', the 3rd-party unofficial driver)
c) the CAM layer
d) the GEOM layer
e) something with the disk(s)
f) something with memory I/O going on (say between the storage driver
   and ZFS, for lack of better way to phrase it)

I have a very big Email written for you, but I wanted to let certain
answers to Ronald's questions come out first.

-rw---1 jdc users 5576 Jun 19 06:49 dennis_kgel_response.txt

I need to re-word this and take into consideration some of the new
stuff said up to now, but I don't know if I'll have the time for this
(you should see my desktop right now, I have literally 4 IM messages to
answer and my Email box is non-stop).
The one I want to get out of the way right now is this:

Can you please try putting this in /boot/loader.conf + reboot and see
if the behaviour for you changes?

  vfs.zfs.no_write_throttle="1"

Warning: this may actually make the problem worse, depending on what
the nature/root cause is.

Right now I'm of the opinion ZFS is actually doing the Right Thing(tm)
and that the issue may be in Areca's driver, but that's speculation
until I have proof. But the write throttling stuff added semi-recently
(by the Illumos folks, this is not a FreeBSD feature) has had some
reports of problems where disabling it helped immensely.

Important: 24 disks off a single controller is a lot of bandwidth. That
controller may be overwhelmed, in which case you would see exactly this
kind of behaviour as the controller is screaming "GOD HELP ME, I'M
TRYING TO DO ALL THIS STUFF AND YOU KEEP THROWING I/O AT ME". :-) This
is also why I ask about port multiplier usage.

--
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
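The raw-read test quoted in this message (a short dd straight off the device while the hang is in progress) can be turned into a small timestamped probe, so that stalls can be lined up against zpool iostat output taken at the same time. This is only a sketch under stated assumptions: probe_read and the log format are hypothetical names of my own, and the block size is spelled out in bytes rather than as "1m" so the same line works with different dd implementations.

```shell
#!/bin/sh
# Sketch only: time a small raw read so stalls can be lined up with
# "zpool iostat" output. probe_read and the log format are hypothetical.

# Read up to 10 MiB from the given device or file and print how many
# whole seconds it took. Returns non-zero if dd itself fails.
probe_read() {
    start=$(date +%s)
    dd if="$1" of=/dev/null bs=1048576 count=10 2>/dev/null || return 1
    end=$(date +%s)
    echo $(( end - start ))
}

# When run with a target (e.g. /dev/da0), log one timestamped sample.
target="${1:-}"
if [ -n "$target" ]; then
    secs=$(probe_read "$target") && echo "$(date '+%H:%M:%S') $target ${secs}s"
fi
```

Run it in a loop with a short sleep against one member disk while the pool is under write load. If the raw read stalls for the same number of seconds as everything else, that points below ZFS (driver, CAM, or the controller itself) rather than at the ZFS layer, though it does not by itself say which of those it is.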
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote:
> On 6/19/2013 20:35, Jeremy Chadwick wrote:

I've snipped out portions which aren't relevant at this point in the
convo. I'm trying to be as terse as possible here (honest).

To recap for readers/mailing list:

- Adam sees the same behaviour on systems on bare metal, as well as on
  FreeBSD guests running under the VMware ESXi 5.0 hypervisor. However,
  as I stated on the list just yesterday about "lock-ups on shutdown",
  every situation may be different, and there is a well-established
  history of this problem on FreeBSD where the root causes (bugs) were
  completely different from one another.

- The system we're discussing at this point in the thread is on bare
  metal -- specifically an Asus P8B-X motherboard, with BIOS version
  6103, driven entirely by on-board Intel AHCI (not BIOS-level RAID).

- Adam runs 9.1-RELEASE because of business needs pertaining to
  freebsd-update and binary updates. (I ask more about this for
  benefits of readers below, however -- because this situation comes
  up a lot and I want to know what real-world admins do)

> >Thanks. I was mainly interested in the storage controller being used
> >(in this case ahci(4)) and the disks being used (notorious ST3000DM001,
> >known for excessively parking heads).
>
> Yeah, was not my first choice but then again ... RAIDZ-2 :) HD
> supply chain here (Thailand) is weird considering how many are made
> here (and can't buy). Smartd screams about them possibly needing a
> firmware update (they don't according to Seagate). Had no issues
> aside from a failure a month or so again (it's an HD ... it
> happens).

Absolutely understood -- and FYI, in case you need backup, your thought
process/conclusion here is spot on (re: "it's a MHDD, failures
happen").
Irrelevant to your shutdown problem: as for smartmontools bitching about the firmware: no vendors disclose what actual changes go into their drive firmware updates (vendors if you are reading this: I will have your souls...), so I have to read a bunch of end-user forums where nobody knows what they're talking about, and then of course find this "highly educational" *cough* article from Adaptec: http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives The problem here is that there have been *so many* firmware bugs with Seagate's drives in the past 2 years or so that it's impossible for me to know which fixes what. You buy what you buy because that's what you buy, and that's cool -- but I avoid their stuff like the plague. Readers: if any of you have a ST[123]000DM001 drive running the CC24 firmware, and can confirm high head parking counts (SMART attribute 193), and are willing to upgrade your drive firmware to the latest then see if the LCC increments stop (or at least settle down to normal levels), I'd love to hear from you. I have been socially boycotting these models of drives because of that idiotic firmware design choice for quite some time now (not to mention the parking on those drives is audibly loud in a normal living room), and if the F/W actually inhibits the excessive parking then I have some drives to consider upgrading. :-) > >I can also see you're running your own kernel. We'll get to that in a > >moment. > > It's GENERIC with the following added to the end: > > # -- Add Support for nicer console > # > options VESA > options SC_PIXEL_MODE Can you try removing VESA and SC_PIXEL_MODE please? I know that sounds crazy ("what on earth would that have to do with it?"), but please try it. I can explain the justification if need be -- I'm being extra paranoid of something that got discovered here on -stable only a few days ago. It's a stretch, but I can see potential relevance. 
I can provide details/links later. > >>>4. Does "sysctl hw.usb.no_shutdown_wait=1" help you? > >> > >>Weirdly this allowed it to reboot on the first try (without needing > >>to be reset), but not the second. > > > >I'm not surprised. Pleas re-try with stable/9; Hans has been constantly > >working on the USB stack and fixing major bugs. > > Got it but probably not going to go this route as it means no more > binary upgrades. While I can reboot it, it is the office NAS here > and so 'testing out' -STABLE I think probably isn't going to happen. I understand. I have a question relating to this below. > >Place background_fsck="no" in /etc/rc.conf. If the machine does not > >have a clean filesystem on boot-up, you'll know because the system will > >immediately begin fsck (in the foreground actively). You'll recog
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 07:53:19PM +0700, Adam Strohl wrote: > On 6/19/2013 19:21, Jeremy Chadwick wrote: > >On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: > >>Hello -STABLE@, > >> > >>So I've seen this situation seemingly randomly on a number of both > >>physical 9.1 boxes as well as VMs for I would say 6-9 months at > >>least. I finally have a physical box here that reproduces it > >>consistently that I can reboot easily (ie; not a production/client > >>server). > >> > >>No matter what I do: > >> > >>reboot > >>shutdown -p > >>shutdown -r > >> > >>This specific server will stop at "All buffers synced" and not > >>actually power down or reboot. KB input seems to be ignored. This > >>server is a ZFS NAS (with GMIRROR for boot blocks) but the other > >>boxes which show this are using GMIRRORs for root/swap/boot (no > >>ZFS). > >> > >>Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg > >> > >>When I reset the server it appears that disks were not dismounted > >>cleanly ... on this ZFS box it comes back quick because ZFS is good > >>like that but on the other servers with GMIRROR roots rebuilding the > >>GMIRROR and fscking at the same time is murder on the > >>disk/performance until it finishes. > > > >1. You mention "as well as VMs". Anything under a "virtual machine" or > >under a hypervisor is going to be very, very, **VERY** different than > >bare metal. So I hope the issues you're talking about above are on bare > >metal -- I will assume so. > > Nope, I see basically the same thing sometimes under ESXi 5.0 > Hypervisor (and yes it worries me the implications of something so > broad). Those unites I just haven't been able to isolate on a > server which isn't critical. Lets focus on this server for now > though per your suggestion below. 
I'm sorry but I don't understand your first sentence -- the first part of your sentence says "nope" (I have to assume in reply to my "on bare metal" part), but then says "I see basically the same thing sometimes under ESXi" which implies an alternate environment in comparison (i.e. we *are* talking about bare metal). Consider me confused. :-) > >2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE. > >If you use stable/9 (RELENG_9) we need to see uname -a output (you can > >hide the machine name if you want). > > Sorry, this ZFS box is 9.1-R P4 (kernel built today): > > FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun > 19 15:31:12 ICT 2013 > root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS amd64 I suggest trying stable/9 (and staying with it, for that matter). > >3. Can we please have dmesg from this machine? The controller and some > >other hardware details matter. > > Sure take a look at the full log here: http://pastebin.com/k55gVVuU > > This includes a boot, then a reboot as I describe (you can see it > logs the All Buffers Synced, etc) then powering back on. Thanks. I was mainly interested in the storage controller being used (in this case ahci(4)) and the disks being used (notorious ST3000DM001, known for excessively parking heads). AFAIK this isn't one of the controllers that was known for weird "quirky issues" pertaining to flushing data to disk on shutdown. I have to ask: is this FreeBSD box running under a HV? If it *is not* running under a HV, could we please get exact motherboard model and version (including BIOS version)? Sometimes (not always) you can get this from "kenv | grep smbios." I can also see you're running your own kernel. We'll get to that in a moment. > >4. Does "sysctl hw.usb.no_shutdown_wait=1" help you? > > Weirdly this allowed it to reboot on the first try (without needing > to be reset), but not the second. I'm not surprised. 
Please re-try with stable/9; Hans has been constantly working on the USB stack and fixing major bugs.

> The "Starting background file
> system checks in 60 seconds" message appeared ... that only happens
> when something is dirty, right?

No, it does not. That message is always printed when you use background fsck, which is the default. I do not advocate using background fsck, because it has been known (and may still do this -- I do not care to find out, I do not have time for unreliable filesystem nonsense) to not always fix all filesystem problems. Meaning: people using background fsck have been known to boot into single-user and issue "fsck" manually and find iss
Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote: > Hello -STABLE@, > > So I've seen this situation seemingly randomly on a number of both > physical 9.1 boxes as well as VMs for I would say 6-9 months at > least. I finally have a physical box here that reproduces it > consistently that I can reboot easily (ie; not a production/client > server). > > No matter what I do: > > reboot > shutdown -p > shutdown -r > > This specific server will stop at "All buffers synced" and not > actually power down or reboot. KB input seems to be ignored. This > server is a ZFS NAS (with GMIRROR for boot blocks) but the other > boxes which show this are using GMIRRORs for root/swap/boot (no > ZFS). > > Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg > > When I reset the server it appears that disks were not dismounted > cleanly ... on this ZFS box it comes back quick because ZFS is good > like that but on the other servers with GMIRROR roots rebuilding the > GMIRROR and fscking at the same time is murder on the > disk/performance until it finishes. 1. You mention "as well as VMs". Anything under a "virtual machine" or under a hypervisor is going to be very, very, **VERY** different than bare metal. So I hope the issues you're talking about above are on bare metal -- I will assume so. 2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE. If you use stable/9 (RELENG_9) we need to see uname -a output (you can hide the machine name if you want). 3. Can we please have dmesg from this machine? The controller and some other hardware details matter. 4. Does "sysctl hw.usb.no_shutdown_wait=1" help you? 5. Does "sysctl hw.acpi.handle_reboot=1" help you? 6. Does "sysctl hw.acpi.disable_on_reboot=1" help you? 7. If none of the above helps, can you please boot verbose mode and then when the system "locks up" on "shutdown -r now" take a picture of the VGA console? 8. 
Does the machine run moused(8) (check the process list please, do not rely on rc.conf)?

> Another interesting thing is that this particular server runs slapd
> (OpenLDAP) which, when it comes back up, has a "corrupted" DB
> (easily fixed with db_recover, but still). This might be because FS
> commits aren't happening at the end. I can even manually stop
> slapd (service slapd stop) then run sync(8) (I assume this does
> something for ZFS too) and it still comes back as hosed if I reboot
> shortly after. If I start/stop slapd it's fine. So I feel like
> there is an FS/dismount thing going on here.

sync(8) does not do what you think it does. Please read (not skim) this entire thread starting here:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

Your problem is related to unclean shutdown; fix that and your issues go away.

> Additional information: I also have some boxes which will reboot
> (ie; they don't freeze like some do at the end) but they don't
> dismount cleanly either and have to rebuild both GMIRROR and fsck.
> This might be a different issue, too.

Every issue needs to be handled/treated separately.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
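As a rough sketch of items 4 through 6 and item 8 in the list above: the snippet below prints the three sysctl commands so they can be tried one at a time, then checks the process list for moused(8). The helper name enable_cmd and the print-only approach are assumptions made for illustration; they are not part of the original message.

```shell
#!/bin/sh
# Tunables taken from items 4-6 of the message above.
TUNABLES="hw.usb.no_shutdown_wait hw.acpi.handle_reboot hw.acpi.disable_on_reboot"

# enable_cmd is a hypothetical helper: it only prints the command, so
# nothing changes until you run the printed line yourself as root.
enable_cmd() {
    printf 'sysctl %s=1\n' "$1"
}

# Try one tunable per reboot attempt so a fix can be attributed to it.
for t in $TUNABLES; do
    enable_cmd "$t"
done

# Item 8: look at the process list, not rc.conf. pgrep(1) exits
# non-zero when no process matches.
if pgrep -x moused >/dev/null 2>&1; then
    echo "moused is running"
else
    echo "moused is not running"
fi
```

Setting one tunable per attempt keeps the results readable: if the box reboots cleanly after a single change, that change is the likely culprit.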
Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG
On Wed, Jun 19, 2013 at 01:41:10AM +0430, Javad Kouhi wrote: > I've read the posts again. Although the issue looks same as Michiel > Boland (first link) but I'm not sure if the root of the issue is same > as Michiel's too (second link). Anyway, it should be discussed in > another thread as you said. Let me be more clear: I have seen repeated reports from people complaining about "lockups when shutting down" many times over the years. The ones I remember: - Certain oddities with SCSI/SATA storage drivers and disks (many of these have been fixed) - ACPI-based reboot not working correctly on some motherboards (depends on hw.acpi.handle_reboot and sometimes hw.acpi.disable_on_reboot) -- not sure if this still pops up - USB layer causing issues, or possibly some USB CAM integration problem (this is still an ongoing one) - Now some sort of weird Intel graphics driver (and DRM?) quirk involving moused(8) and Vsync (the issue reported by Michiel) And I'm certain I'm forgetting others. What Kevin Oberman said also applies -- these are painful to debug because the system is already in a "shutting down" state where usability and accessibility becomes bare minimal, and you're kind of at your wits end. Booting verbose can help -- there are other messages printed to the VGA (and/or serial) console during the shutdown phase when verbose. All you can hope for is that the kernel is still alive and Ctrl-Alt-Esc to force a drop to DDB (assuming all of this is enabled in your kernel) works and that someone familiar with the FreeBSD kernel can help you debug it (possibly it's just easier to do that, type "panic", then issue "call doadump" to force a dump to swap at that point -- kib@ might have better recommendations). 
A serial console can also help greatly, because quite often there are pages upon pages of useful debugging information. Without one, you have to hope the VGA console keyboard is functional (even more tricky with USB), that Scroll Lock + Page Up/Down work, and then take photos of the screen; doing it this way is stressful and painful for everyone involved.

I hope this sheds some light on why I said what I did. :-)

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
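The verbose boot and serial console mentioned in this message are normally enabled from /boot/loader.conf. A minimal sketch, assuming the first serial port is wired to another machine and the kernel was built with KDB and DDB; the speed shown is an assumed value, so match it to whatever is on the other end of the cable:

```shell
# /boot/loader.conf fragment (plain sh-style assignments).
boot_verbose="YES"                 # extra kernel messages, including at shutdown
boot_multicons="YES"               # allow more than one console at a time
console="comconsole,vidconsole"    # serial console first, VGA second
comconsole_speed="115200"          # assumed speed; match the terminal side
```

With the serial side logged on a second machine, the pages of shutdown output survive even when the VGA keyboard stops responding.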
Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG
On Tue, Jun 18, 2013 at 10:37:10PM +0430, Javad Kouhi wrote: > On Tue, Jun 18, 2013 at 7:17 PM, Jeremy Chadwick wrote: > > > > I do not use git, I use svn, So I cannot help you with git "crap". > > > > Please revert your sys/dev/drm2/i915/intel_fb.c and > > sys/dev/syscons/scvgarndr.c back to r251934 (or newer) before following > > what I tell you below. > > > > The problem is either that: > > > > - The patch you were given is probably for a different FreeBSD release, > > thus the code/line numbers/info in the code break the fuzzy logic > > matching, > > - You copy-pasted the diff and because of tabs vs. spaces botched it, > > - git apply/patch/whatever is weird, > > - Multitudes of other possibilities I do not care to go into. > > > > The hack kib@ gave you is not hard to manually add yourself. It's very > > few lines of code. I'm very surprised you didn't try to manually add it > > yourself. So I have done that for you. First, the proof -- this is > > against r251939, by the way, but that shouldn't matter as nobody has > > touched this between r251934 and r251939: > > > > $ svn info > > Path: . > > Working Copy Root Path: /home/jdc/work/src > > URL: svn://svn.freebsd.org/base/stable/9 > > Repository Root: svn://svn.freebsd.org/base > > Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f > > Revision: 251939 > > Node Kind: directory > > Schedule: normal > > Last Changed Author: marius > > Last Changed Rev: 251939 > > Last Changed Date: 2013-06-18 07:20:14 -0700 (Tue, 18 Jun 2013) > > > > $ svn status > > M sys/dev/drm2/i915/intel_fb.c > > M sys/dev/syscons/scvgarndr.c > > > > The diff itself is available here: > > > > http://jdc.koitsu.org/freebsd/sysmouse_vsync.diff > > > > I've also attached it here in Email (assuming the mailing list doesn't > > delete it). 
> > > > You should apply the patch using: > > > > cd /usr/src (or wherever your source is) > > patch -p0 < sysmouse_vsync.diff > > > > Assuming use of svn, you can revert this patch by doing: > > > > cd /usr/src (or wherever your source is) > > svn revert sys/dev/drm2/i915/intel_fb.c > > svn revert sys/dev/syscons/scvgarndr.c > > rm sys/dev/drm2/i915/intel_fb.c.orig > > rm sys/dev/syscons/scvgarndr.c.orig > > > > There is probably some other "magical" way to do all of this, but as > > anyone here knows, I do things manually because in general I do not > > trust VCSes or the "magic" they do under the hood; I prefer to do things > > that I know work. > > > > Good luck -- I cannot help with any other aspect to the issue. > > > > -- > > | Jeremy Chadwick j...@koitsu.org | > > | UNIX Systems Administratorhttp://jdc.koitsu.org/ | > > | Making life hard for others since 1977. PGP 4BD6C0CB | > > > > Many thanks for the detailed answer. I've applied your patch and then > rebuilt the world and kernel. To be honest, I tried to apply the patch > manually but the syntax was too complex for me. Thanks for the help to > apply the patch. > > Unfortunately, the original issue is still exist and shutdown(8) > doesn't work properly. I'm a newbie and I don't know what informations > I should provide, but here is some basic information: > > % uname -a > FreeBSD minootux 9.1-STABLE FreeBSD 9.1-STABLE #0 r251946M: Tue Jun 18 > 21:16:56 IRDT 2013 root@minootux:/usr/obj/usr/src/sys/GIGABYTE > amd64 > > % pkg_info -I -x xorg-server -x drm > libdrm-2.4.44 Userspace interface to kernel Direct Rendering Module > servi > xorg-server-1.12.4,1 X.Org X server and related programs > > The machine is a laptop and the following link contains the details > about the hardware: > http://www.gigabyte.com/products/product-page.aspx?pid=3793#sp > > KMS and NEW_XORG are enabled in my /etc/make.conf. First, what makes you think your issue is the same issue as reported by Michiel Boland? 
Let me point you to two of his posts (read them slowly and in full please):

http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073821.html
http://lists.freebsd.org/pipermail/freebsd-stable/2013-June/073839.html

Second, the patch is not mine -- it's Konstantin's. I did not write the code/fix, nor do I understand it. All I did was provide a version of the same patch that applied cleanly on recent stable/9. (I'm sorry for needing to state this, but c
Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG
On Tue, Jun 18, 2013 at 07:00:30PM +0430, Javad Kouhi wrote: > Thanks for the reply, seems that our source trees are not same, I got this: > > % patch -p1 < /path/to/patch > Hmm... Looks like a unified diff to me... > The text leading up to this was: > -- > |diff --git a/sys/dev/drm2/i915/intel_fb.c b/sys/dev/drm2/i915/intel_fb.c > |index 3cb3b78..e41a49f 100644 > |--- a/sys/dev/drm2/i915/intel_fb.c > |+++ b/sys/dev/drm2/i915/intel_fb.c > -- > Patching file sys/dev/drm2/i915/intel_fb.c using Plan A... > Hunk #1 succeeded at 207 with fuzz 1. > Hunk #2 failed at 231. > 1 out of 2 hunks failed--saving rejects to sys/dev/drm2/i915/intel_fb.c.rej > Hmm... The next patch looks like a unified diff to me... > The text leading up to this was: > -- > |diff --git a/sys/dev/syscons/scvgarndr.c b/sys/dev/syscons/scvgarndr.c > |index 6e6663c..fc7f02f 100644 > |--- a/sys/dev/syscons/scvgarndr.c > |+++ b/sys/dev/syscons/scvgarndr.c > -- > Patching file sys/dev/syscons/scvgarndr.c using Plan A... > Hunk #1 succeeded at 395. > Hunk #2 failed at 447. > 1 out of 2 hunks failed--saving rejects to sys/dev/syscons/scvgarndr.c.rej > done > > > And the git way: > > % git apply /path/to/patch > error: patch failed: sys/dev/drm2/i915/intel_fb.c:207 > error: sys/dev/drm2/i915/intel_fb.c: patch does not apply > error: patch failed: sys/dev/syscons/scvgarndr.c:445 > error: sys/dev/syscons/scvgarndr.c: patch does not apply > > > I have revision 251934 of -STABLE branch. (I updated my source tree > about 3 hours ago using svn) I do not use git, I use svn, So I cannot help you with git "crap". Please revert your sys/dev/drm2/i915/intel_fb.c and sys/dev/syscons/scvgarndr.c back to r251934 (or newer) before following what I tell you below. The problem is either that: - The patch you were given is probably for a different FreeBSD release, thus the code/line numbers/info in the code break the fuzzy logic matching, - You copy-pasted the diff and because of tabs vs. 
spaces botched it, - git apply/patch/whatever is weird, - Multitudes of other possibilities I do not care to go into. The hack kib@ gave you is not hard to manually add yourself. It's very few lines of code. I'm very surprised you didn't try to manually add it yourself. So I have done that for you. First, the proof -- this is against r251939, by the way, but that shouldn't matter as nobody has touched this between r251934 and r251939: $ svn info Path: . Working Copy Root Path: /home/jdc/work/src URL: svn://svn.freebsd.org/base/stable/9 Repository Root: svn://svn.freebsd.org/base Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f Revision: 251939 Node Kind: directory Schedule: normal Last Changed Author: marius Last Changed Rev: 251939 Last Changed Date: 2013-06-18 07:20:14 -0700 (Tue, 18 Jun 2013) $ svn status M sys/dev/drm2/i915/intel_fb.c M sys/dev/syscons/scvgarndr.c The diff itself is available here: http://jdc.koitsu.org/freebsd/sysmouse_vsync.diff I've also attached it here in Email (assuming the mailing list doesn't delete it). You should apply the patch using: cd /usr/src (or wherever your source is) patch -p0 < sysmouse_vsync.diff Assuming use of svn, you can revert this patch by doing: cd /usr/src (or wherever your source is) svn revert sys/dev/drm2/i915/intel_fb.c svn revert sys/dev/syscons/scvgarndr.c rm sys/dev/drm2/i915/intel_fb.c.orig rm sys/dev/syscons/scvgarndr.c.orig There is probably some other "magical" way to do all of this, but as anyone here knows, I do things manually because in general I do not trust VCSes or the "magic" they do under the hood; I prefer to do things that I know work. Good luck -- I cannot help with any other aspect to the issue. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. 
PGP 4BD6C0CB | Index: sys/dev/drm2/i915/intel_fb.c === --- sys/dev/drm2/i915/intel_fb.c (revision 251939) +++ sys/dev/drm2/i915/intel_fb.c (working copy) @@ -207,6 +207,8 @@ static void intel_fbdev_destroy(struct drm_device } } +extern int sc_txtmouse_no_retrace_wait; + int intel_fbdev_init(struct drm_device *dev) { struct intel_fbdev *ifbdev; @@ -229,6 +231,7 @@ int intel_fbdev_init(struct drm_device *dev) drm_fb_helper_single_add_all_connectors(&ifbdev->helper); drm_fb_helper_initial_config(&ifbdev->helper, 32); + sc_txtmouse_no_retrace_wait = 1; return 0; } Index: sys/dev/syscons/scvgarndr.c === --- sys/dev/
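One detail behind the failed and successful patch runs in this message: a git-style diff names files as a/path and b/path, so it is applied with -p1, while an svn-style diff like the one above uses paths relative to the source root and is applied with -p0. The sketch below guesses the strip level from the first "--- " header. The function name strip_level is made up for illustration, and the guess only covers these two diff styles.

```shell
#!/bin/sh
# strip_level is a hypothetical helper: print 1 for git-style diffs
# (paths prefixed with a/ and b/), otherwise print 0.
strip_level() {
    # Take only the first "--- " header line of the diff.
    first=$(sed -n '/^--- /{p;q;}' "$1")
    case $first in
        '--- a/'*) echo 1 ;;   # git diff: strip the leading a/ component
        *)         echo 0 ;;   # svn diff: paths already relative to the tree
    esac
}

# Usage, from the top of the source tree (not executed here):
#   cd /usr/src
#   patch -p"$(strip_level sysmouse_vsync.diff)" < sysmouse_vsync.diff
```

A matching strip level does not help when the hunks themselves were generated against a different revision, which is the first possibility listed in the message.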
Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG
On Sun, Jun 16, 2013 at 06:01:49PM +0200, Michiel Boland wrote:
> On 06/16/2013 17:55, Jeremy Chadwick wrote:
> [...]
>
> >Are you running moused(8)? Actually, I can see quite clearly that you
> >are in your core.txt:
> >
> >Starting ums0 moused.
> >
> >Try turning that off. Don't ask me how, because devd(8) / devd.conf(5)
> >might be involved.
> >
>
> The moused is started by devd - I don't see a quick way of turning that off.

Comment out the relevant crap in devd.conf(5). Search for "ums" and comment out the two "notify" sections.

> As a workaround I'm trying to run a kernel with
>
> options SC_NO_SYSMOUSE
>
> to see if the hangs go away.

That's one way to do it, I guess.

Be aware that I do not use X. However, I have repeatedly seen these lists mention problems/complexities that arise when people rely on moused(8) to "drive their mouse" while inside of X (or possibly when X and moused(8) are both polling the mouse simultaneously). There's apparently a very specific kind of X configuration you're supposed to use to get proper mouse/keyboard/HAL/HID/whatever support, and tons of people have it wrong. Warren Block, I think, has some insights into this, or could maybe help shed some light on what I'm remembering.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
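The devd.conf(5) edit described above can be sketched with awk. This assumes each section starts with "notify" at the beginning of a line and ends with a line starting with "};", which should be checked against the actual file first. The function name comment_ums_notify is hypothetical, and the result goes to standard output rather than editing the file in place.

```shell
#!/bin/sh
# comment_ums_notify is a hypothetical helper: it prefixes "#" to every
# line of any notify section that mentions "ums" and passes everything
# else through unchanged.
comment_ums_notify() {
    awk '
        /^notify[ \t]/ { inblk = 1; n = 0; hit = 0 }   # a section starts
        inblk {
            buf[++n] = $0                    # hold the section until it ends
            if ($0 ~ /ums/) hit = 1          # remember if it mentions ums
            if (substr($0, 1, 2) == "};") {  # the section ends
                prefix = hit ? "#" : ""
                for (i = 1; i <= n; i++) print prefix buf[i]
                inblk = 0
            }
            next
        }
        { print }                            # lines outside notify sections
    ' "$1"
}

# Usage (not executed here): review the output before replacing anything.
#   comment_ums_notify /etc/devd.conf > /tmp/devd.conf.new
#   diff -u /etc/devd.conf /tmp/devd.conf.new
```

devd(8) has to be restarted before the edited file takes effect.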
Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG
On Sun, Jun 16, 2013 at 05:48:52PM +0200, Michiel Boland wrote: > On 06/16/2013 17:37, Konstantin Belousov wrote: > >On Sun, Jun 16, 2013 at 05:11:15PM +0200, Michiel Boland wrote: > >>Hi. Recently I switched to WITH_NEW_XORG, primarily because the stock X > >>server > >>with Intel driver has some issues that make it unusable for me. > >> > >>The new X server and Intel driver works extremely well, so kudos to whoever > >>made > >>this possible. > >> > >>Unfortunately, I am now experiencing random hangs on shutdown. On shutdown > >>the > >>system randomly freezes after > >> > >>[...] syslogd: exiting on signal 15 > >> > >>I would then expect to see 'Waiting (max 60 seconds) for system process > >>'XXX' to > >>stop messages, but these never arrive. > >> > >>I paniced the machine in ddb, so I have a crash dump if someone want to > >>look at > >>it. The crashinfo is at http://barrytown.boland.org/core.txt (I would have > >>pasted it here but it is a bit verbose.) > >> > >>Machine has an Intel G41 chipset, with a SAMSUNG SSD 830 Series HD, running > >>9.1-STABLE r251803. Serial console. GENERIC kernel, expect for options DDB > >>and > >>ALT_BREAK_TO_DEBUGGER. > >> > >>Who knows what's going on here? > > > >I do not see anything related to i915 in the core.txt you provided. > > > >Next time the machine hangs, start with the output of ps command from > >ddb and 'show allpcpu', together with 'alltrace'. > > > > Ok. > > I appended 'thread apply all bt' from kgdb to the core.txt, maybe > there is something interesting in there. 
> > I did notice the following > > Thread 17 (Thread 17): > #0 cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1392 > #1 0x80cbebbd in ipi_nmi_handler () at > /usr/src/sys/amd64/amd64/mp_machdep.c:1374 > #2 0x80ccc159 in trap (frame=0x81424890) at > /usr/src/sys/amd64/amd64/trap.c:211 > #3 0x80cb55af in nmi_calltrap () at > /usr/src/sys/amd64/amd64/exception.S:501 > #4 0x80d0c029 in vga_txtmouse (scp=0xfe0005586600, > x=320, y=200, on=) at cpufunc.h:186 > Previous frame inner to this frame (corrupt stack?) > > Maybe the hang is caused by the removal of the text mouse cursor? > (Just guessing here.) vga_txtmouse comes from syscons(4). Are you making use of vidcontrol(1) in any way to set the system console (outside of X) to something that uses the VGA framebuffer? There are probably some loader.conf or rc.conf variables that control this (I do not know). Are you running moused(8)? Actually, I can see quite clearly that you are in your core.txt: Starting ums0 moused. Try turning that off. Don't ask me how, because devd(8) / devd.conf(5) might be involved. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
On Sun, Jun 16, 2013 at 11:55:38AM +0200, Andre Albsmeier wrote: > On Sun, 16-Jun-2013 at 10:49:37 +0200, Jeremy Chadwick wrote: > > On Sun, Jun 16, 2013 at 10:02:39AM +0200, Andre Albsmeier wrote: > > > On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote: > > > > On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote: > > > > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote: > > > > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote: > > > > > > > Each day at 5:15 we are generating snapshots on various machines. > > > > > > > This used to work perfectly under 7-STABLE for years but since > > > > > > > we started to use 9.1-STABLE the machine reboots in about 10% > > > > > > > of all cases. > > > > > > > > > > > > > > After rebooting we find a new snapshot file which is a bit > > > > > > > smaller than the good ones and with different permissions > > > > > > > It does not succeed a fsck. In this example it is the one > > > > > > > whose name is beginning with s3: > > > > > > > > > > > > > > -r--r- 1 root operator snapshot 72802894528 29 May 05:15 > > > > > > > s2-2013.05.28-03.15.04 > > > > > > > -r 1 root operator snapshot 72802893824 29 May 05:15 > > > > > > > s3-2013.05.29-03.15.03 > > > > > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > > > > > s4-2013.05.23-06.38.44 > > > > > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > > > > > s5-2013.05.24-03.15.03 > > > > > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > > > > > s6-2013.05.25-03.15.03 > > > > > > > > > > > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel > > > > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15): > > > > > > > > > > > > > > May 29 05:15:00 palveli kernel: lock order reversal: > > > > > > > May 29 05:15:00 palveli kernel: 1st 0xc2371da8 ufs > > > > > > > (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240 > > > > > > > May 29 05:15:00 palveli kernel: 2nd 
0xc2371ec4 devfs > > > > > > > (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414 > > > > > > > May 29 05:15:04 palveli kernel: lock order reversal: > > > > > > > May 29 05:15:04 palveli kernel: 1st 0xc228471c snaplk > > > > > > > (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976 > > > > > > > May 29 05:15:04 palveli kernel: 2nd 0xc22f25e4 ufs > > > > > > > (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626 > > > > > > > > > > > > > > Unfortunatley no corefiles are being generated ;-(. > > > > > > > > > > > > > > I have checked and even rebuilt the (UFS1) fs in question > > > > > > > from scratch. I have also seen this happen on an UFS2 on > > > > > > > another machine and on a third one when running "dump -L" > > > > > > > on a root fs. > > > > > > > > > > > > > > Any hints of how to proceed? > > > > > > > > > > > > Would it be possible to setup a serial console that is logged on > > > > > > this machine > > > > > > to see if it is panic'ing but failing to write out a crashdump? > > > > > > > > > > I'll try to arrange that. It'll take a bit since this > > > > > box is 200 km away... > > > > > > > > > > Maybe I'll find another one nearby to reproduce it... > > > > > > > > SPECIFICALLY regarding "lack of crash dumps": I need to see the > > > > following: > > > > > > > > * cat /etc/rc.conf > > > > * cat /etc/fstab > > > > > > > > I may need output from other commands, but shall deal with that when I > > > > see output from the above. Thanks. > > > > > > No problem, see below... > > > > > > To make a long story short, the machine dumps core perfectly > > > (tested that a while ago), but not when deali
Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
On Sun, Jun 16, 2013 at 10:02:39AM +0200, Andre Albsmeier wrote: > On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote: > > On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote: > > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote: > > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote: > > > > > Each day at 5:15 we are generating snapshots on various machines. > > > > > This used to work perfectly under 7-STABLE for years but since > > > > > we started to use 9.1-STABLE the machine reboots in about 10% > > > > > of all cases. > > > > > > > > > > After rebooting we find a new snapshot file which is a bit > > > > > smaller than the good ones and with different permissions > > > > > It does not succeed a fsck. In this example it is the one > > > > > whose name is beginning with s3: > > > > > > > > > > -r--r- 1 root operator snapshot 72802894528 29 May 05:15 > > > > > s2-2013.05.28-03.15.04 > > > > > -r 1 root operator snapshot 72802893824 29 May 05:15 > > > > > s3-2013.05.29-03.15.03 > > > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > > > s4-2013.05.23-06.38.44 > > > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > > > s5-2013.05.24-03.15.03 > > > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > > > s6-2013.05.25-03.15.03 > > > > > > > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel > > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15): > > > > > > > > > > May 29 05:15:00 palveli kernel: lock order reversal: > > > > > May 29 05:15:00 palveli kernel: 1st 0xc2371da8 ufs (ufs) > > > > > @ /src/src-9/sys/kern/vfs_mount.c:1240 > > > > > May 29 05:15:00 palveli kernel: 2nd 0xc2371ec4 devfs > > > > > (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414 > > > > > May 29 05:15:04 palveli kernel: lock order reversal: > > > > > May 29 05:15:04 palveli kernel: 1st 0xc228471c snaplk > > > > > (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976 > > > 
> > May 29 05:15:04 palveli kernel: 2nd 0xc22f25e4 ufs (ufs) > > > > > @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626 > > > > > > > > > > Unfortunatley no corefiles are being generated ;-(. > > > > > > > > > > I have checked and even rebuilt the (UFS1) fs in question > > > > > from scratch. I have also seen this happen on an UFS2 on > > > > > another machine and on a third one when running "dump -L" > > > > > on a root fs. > > > > > > > > > > Any hints of how to proceed? > > > > > > > > Would it be possible to setup a serial console that is logged on this > > > > machine > > > > to see if it is panic'ing but failing to write out a crashdump? > > > > > > I'll try to arrange that. It'll take a bit since this > > > box is 200 km away... > > > > > > Maybe I'll find another one nearby to reproduce it... > > > > SPECIFICALLY regarding "lack of crash dumps": I need to see the > > following: > > > > * cat /etc/rc.conf > > * cat /etc/fstab > > > > I may need output from other commands, but shall deal with that when I > > see output from the above. Thanks. > > No problem, see below... > > To make a long story short, the machine dumps core perfectly > (tested that a while ago), but not when dealing with _this_ > issue... > > I dump on da1s1b and savecore fetches it from there and puts > it on /var (sitting on da0), that's faster. > > rc.conf (beware, rc.conf.local exists): > --- > rcshutdown_timeout=180 > tmpmfs=YES > tmpsize="$(( `/sbin/sysctl -n hw.usermem` / 300 ))m" > tmpmfs_flags="$tmpmfs_flags -v 1 -n" > > background_fsck=NO > > nisdomainname=ofw.tld > pflog_flags=-S > > syslogd_flags=-svv > inetd_enable=YES > inetd_flags=-l > named_flags="-S 1000" > named_chrootdir="" > rwhod_enable=YES > sshd_enable=YES > amd_enable=YES > amd_flags="-F /etc/amd.conf" > nfs_client_enable=YES > nfs_access_cache=2 > mountd_flags=-n > rpcbind_
Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote: > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote: > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote: > > > Each day at 5:15 we are generating snapshots on various machines. > > > This used to work perfectly under 7-STABLE for years but since > > > we started to use 9.1-STABLE the machine reboots in about 10% > > > of all cases. > > > > > > After rebooting we find a new snapshot file which is a bit > > > smaller than the good ones and with different permissions > > > It does not succeed a fsck. In this example it is the one > > > whose name is beginning with s3: > > > > > > -r--r- 1 root operator snapshot 72802894528 29 May 05:15 > > > s2-2013.05.28-03.15.04 > > > -r 1 root operator snapshot 72802893824 29 May 05:15 > > > s3-2013.05.29-03.15.03 > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > s4-2013.05.23-06.38.44 > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > s5-2013.05.24-03.15.03 > > > -r--r- 1 root operator snapshot 72802894528 28 May 14:22 > > > s6-2013.05.25-03.15.03 > > > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel > > > I see the following LORs (mksnap_ffs starts exactly at 5:15): > > > > > > May 29 05:15:00 palveli kernel: lock order reversal: > > > May 29 05:15:00 palveli kernel: 1st 0xc2371da8 ufs (ufs) @ > > > /src/src-9/sys/kern/vfs_mount.c:1240 > > > May 29 05:15:00 palveli kernel: 2nd 0xc2371ec4 devfs (devfs) > > > @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414 > > > May 29 05:15:04 palveli kernel: lock order reversal: > > > May 29 05:15:04 palveli kernel: 1st 0xc228471c snaplk > > > (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976 > > > May 29 05:15:04 palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ > > > /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626 > > > > > > Unfortunatley no corefiles are being generated ;-(. > > > > > > I have checked and even rebuilt the (UFS1) fs in question > > > from scratch. 
I have also seen this happen on an UFS2 on > > > another machine and on a third one when running "dump -L" > > > on a root fs. > > > > > > Any hints of how to proceed? > > > > Would it be possible to setup a serial console that is logged on this > > machine > > to see if it is panic'ing but failing to write out a crashdump? > > I'll try to arrange that. It'll take a bit since this > box is 200 km away... > > Maybe I'll find another one nearby to reproduce it... SPECIFICALLY regarding "lack of crash dumps": I need to see the following: * cat /etc/rc.conf * cat /etc/fstab I may need output from other commands, but shall deal with that when I see output from the above. Thanks. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
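For the crash dump question, the two files requested above are mostly of interest for the dumpdev setting and the swap entries. A small sketch that pulls out just those parts, assuming dumps are configured through dumpdev in an rc.conf-style file and swap is declared in fstab; both function names are made up for illustration, and a setting placed in rc.conf.local would need the same check run against that file.

```shell
#!/bin/sh
# dumpdev_setting is a hypothetical helper: print the last dumpdev value
# in an rc.conf-style file, or "unset" when there is none.
dumpdev_setting() {
    v=$(sed -n 's/^dumpdev=//p' "$1" | tail -n 1 | tr -d '"')
    echo "${v:-unset}"
}

# swap_devices is a hypothetical helper: print every device whose
# filesystem type (third fstab column) is swap, skipping comment lines.
swap_devices() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "$1"
}

# Only touch the real files when they exist and are readable.
if [ -r /etc/rc.conf ]; then
    echo "dumpdev: $(dumpdev_setting /etc/rc.conf)"
fi
if [ -r /etc/fstab ]; then
    echo "swap devices:"
    swap_devices /etc/fstab
fi
```

If dumpdev is unset, or no swap device is large enough to hold the dump, a panic can reboot the machine without leaving a core file behind.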
Re: ACPI Warning, then hang
On Thu, Jun 13, 2013 at 05:32:21PM -0500, Bryce Edwards wrote: > On Mon, Jun 10, 2013 at 9:32 PM, Jeremy Chadwick wrote: > > On Mon, Jun 10, 2013 at 09:18:47PM -0500, Bryce Edwards wrote: > >> Verbose boot: > >> > >> https://www.dropbox.com/s/obm8rtavro68ea8/acpi-verbose.jpg > >> > >> > >> On Mon, Jun 10, 2013 at 11:27 AM, Bryce Edwards wrote: > >> > On Mon, Jun 10, 2013 at 11:19 AM, John Baldwin wrote: > >> >> On Monday, June 10, 2013 10:35:07 am Jeremy Chadwick wrote: > >> >>> On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote: > >> >>> > I'm getting the following warning, and then the system locks: > >> >>> > > >> >>> > ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29, > >> >>> > should be 0x48 > >> >>> > > >> >>> > Here's a pic: http://db.tt/O6dxONzI > >> >>> > > >> >>> > System is on a SuperMicro C7X58 motherboard that I just upgraded to > >> >>> > BIOS 2.0a, which I would like to stay on if possible. I tried > >> >>> > adjusting all the ACPI related BIOS settings without success. > >> >>> > >> >>> The message in question refers to hard-coded data in one of the many > >> >>> ACPI tables (see acpidump(8) for the list -- there are many). ACPI > >> >>> tables are stored within the BIOS -- the motherboard/BIOS vendor has > >> >>> full control over all of them and is fully 100% responsible for their > >> >>> content. > >> >>> > >> >>> It looks to me like they severely botched their BIOS, or somehow it got > >> >>> flashed wrong. > >> >>> > >> >>> You need to contact Supermicro Technical Support and tell them of the > >> >>> problem. They need to either fix their BIOS, or help figure out what's > >> >>> become corrupted. You can point them to this thread if you'd like. > >> >>> > >> >>> I should note that the corruption/issue is major enough that you are > >> >>> missing very key/important lines from your dmesg (after "avail memory" > >> >>> but before "kdbX at kdbmuxX", which come from pure reliance upon ACPI. 
> >> >>> Lines such as: > >> >>> > >> >>> Event timer "LAPIC" quality 400 > >> >>> ACPI APIC Table: > >> >>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs > >> >>> FreeBSD/SMP: 1 package(s) x 4 core(s) > >> >>> cpu0 (BSP): APIC ID: 0 > >> >>> cpu1 (AP): APIC ID: 1 > >> >>> cpu2 (AP): APIC ID: 2 > >> >>> cpu3 (AP): APIC ID: 3 > >> >>> ioapic0 irqs 0-23 on motherboard > >> >>> ioapic1 irqs 24-47 on motherboard > >> >>> > >> >>> In the meantime, you can try booting without ACPI support (there should > >> >>> be a boot-up menu option for that) and pray that works. If it doesn't, > >> >>> then your workaround is to roll back to an older BIOS version and/or > >> >>> put > >> >>> pressure on Supermicro. You will find their Technical Support folks > >> >>> are > >> >>> quite helpful/responsive to technical issues. > >> >>> > >> >>> Good luck and keep us posted on what transpires. > >> >> > >> >> Actually, that message is mostly harmless. All sorts of vendors ship > >> >> tables with busted checksums that are in fact fine. :( However, the > >> >> table > >> >> name looks very odd which is more worrying. Booting without ACPI > >> >> enabled > >> >> would be a good first step. Trying a verbose boot to capture the last > >> >> message before the hang would also be useful. > >> >> > >> >> -- > >> >> John Baldwin > >> > > >> > Booting without ACPI did not work for me, although I might be able to > >> > hack away at lots of BIOS setting to make it work. It didn't assign > >> > IRQ's to things like the storage controller, etc. soI thought it was > >> > probably not worth the effort. > >> > > >> > I did contact SuperMicro support
Re: ACPI Warning, then hang
On Mon, Jun 10, 2013 at 09:18:47PM -0500, Bryce Edwards wrote: > Verbose boot: > > https://www.dropbox.com/s/obm8rtavro68ea8/acpi-verbose.jpg > > > On Mon, Jun 10, 2013 at 11:27 AM, Bryce Edwards wrote: > > On Mon, Jun 10, 2013 at 11:19 AM, John Baldwin wrote: > >> On Monday, June 10, 2013 10:35:07 am Jeremy Chadwick wrote: > >>> On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote: > >>> > I'm getting the following warning, and then the system locks: > >>> > > >>> > ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29, > >>> > should be 0x48 > >>> > > >>> > Here's a pic: http://db.tt/O6dxONzI > >>> > > >>> > System is on a SuperMicro C7X58 motherboard that I just upgraded to > >>> > BIOS 2.0a, which I would like to stay on if possible. I tried > >>> > adjusting all the ACPI related BIOS settings without success. > >>> > >>> The message in question refers to hard-coded data in one of the many > >>> ACPI tables (see acpidump(8) for the list -- there are many). ACPI > >>> tables are stored within the BIOS -- the motherboard/BIOS vendor has > >>> full control over all of them and is fully 100% responsible for their > >>> content. > >>> > >>> It looks to me like they severely botched their BIOS, or somehow it got > >>> flashed wrong. > >>> > >>> You need to contact Supermicro Technical Support and tell them of the > >>> problem. They need to either fix their BIOS, or help figure out what's > >>> become corrupted. You can point them to this thread if you'd like. > >>> > >>> I should note that the corruption/issue is major enough that you are > >>> missing very key/important lines from your dmesg (after "avail memory" > >>> but before "kdbX at kdbmuxX", which come from pure reliance upon ACPI. 
> >>> Lines such as: > >>> > >>> Event timer "LAPIC" quality 400 > >>> ACPI APIC Table: > >>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs > >>> FreeBSD/SMP: 1 package(s) x 4 core(s) > >>> cpu0 (BSP): APIC ID: 0 > >>> cpu1 (AP): APIC ID: 1 > >>> cpu2 (AP): APIC ID: 2 > >>> cpu3 (AP): APIC ID: 3 > >>> ioapic0 irqs 0-23 on motherboard > >>> ioapic1 irqs 24-47 on motherboard > >>> > >>> In the meantime, you can try booting without ACPI support (there should > >>> be a boot-up menu option for that) and pray that works. If it doesn't, > >>> then your workaround is to roll back to an older BIOS version and/or put > >>> pressure on Supermicro. You will find their Technical Support folks are > >>> quite helpful/responsive to technical issues. > >>> > >>> Good luck and keep us posted on what transpires. > >> > >> Actually, that message is mostly harmless. All sorts of vendors ship > >> tables with busted checksums that are in fact fine. :( However, the table > >> name looks very odd which is more worrying. Booting without ACPI enabled > >> would be a good first step. Trying a verbose boot to capture the last > >> message before the hang would also be useful. > >> > >> -- > >> John Baldwin > > > > Booting without ACPI did not work for me, although I might be able to > > hack away at lots of BIOS setting to make it work. It didn't assign > > IRQ's to things like the storage controller, etc. soI thought it was > > probably not worth the effort. > > > > I did contact SuperMicro support as well, so we'll see what they have to > > say. > > > > I'll get a verbose boot posted up in a bit. A screenshot of a verbose boot is insufficient; as I'm sure you noticed there are pages upon pages of information before the lock-up/crash. Those pages are what folks are interested in. Because the system is hung, I doubt hitting Scroll Lock + using PageUp/PageDown to go through the kernel message scrollback will work. You're going to need a serial-based console (i.e. 
hook something up to COM1 on the motherboard, and get a null modem cable to connect to another system where you use a serial port/terminal emulator (ex. PuTTY for Windows, etc.) that has a scrollback buffer which you can copy-paste or save). Set your serial port for 9600 baud, 8 bits, no parity, and 1 stop bit (9600bps, 8N1). You'll need to have physical access to both systems simultaneously.

At the VGA console, boot FreeBSD then escape to the loader prompt ("ok") and issue the following commands:

  set boot_multicons="YES"
  set boot_serial="YES"
  set console="comconsole,vidconsole"
  boot

You should begin seeing output on the serial port, and the system will eventually hang/etc.. Then provide the captured output from the serial port here. :-)

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
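If the capturing system happens to be a Unix box rather than Windows with PuTTY, the 9600bps 8N1 settings described above can be sketched with Python's standard termios module. This is only an illustration under assumptions: the helper names and the device path in the usage note are made up for the example, and any terminal emulator with logging or a scrollback buffer does the same job.

```python
import os
import termios


def set_9600_8n1(attrs):
    """Return a copy of a termios attribute list set to raw 9600 baud, 8N1."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cc = list(cc)  # copy so the caller's list is left untouched
    # 8 data bits, no parity, 1 stop bit; ignore modem control lines.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    # Raw input: no line editing, echo, signals, flow control or CR mapping.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL)
    # No output post-processing.
    oflag &= ~termios.OPOST
    # Block until at least one byte arrives, with no inter-byte timeout.
    cc[termios.VMIN] = 1
    cc[termios.VTIME] = 0
    return [iflag, oflag, cflag, lflag, termios.B9600, termios.B9600, cc]


def capture(device, logfile):
    """Append everything received on `device` to `logfile` until interrupted."""
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    try:
        current = termios.tcgetattr(fd)
        termios.tcsetattr(fd, termios.TCSANOW, set_9600_8n1(current))
        with open(logfile, "ab") as log:
            while True:
                data = os.read(fd, 4096)
                if not data:
                    break
                log.write(data)
                log.flush()  # keep the log current in case of a sudden hang
    finally:
        os.close(fd)
```

A call such as capture("/dev/cuau0", "console.log") would then log the boot messages; both the device path and the log name are hypothetical and depend on the capturing system.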
Re: ACPI Warning, then hang
On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote:
> I'm getting the following warning, and then the system locks:
>
> ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29,
> should be 0x48
>
> Here's a pic: http://db.tt/O6dxONzI
>
> System is on a SuperMicro C7X58 motherboard that I just upgraded to
> BIOS 2.0a, which I would like to stay on if possible. I tried
> adjusting all the ACPI related BIOS settings without success.

The message in question refers to hard-coded data in one of the many ACPI tables (see acpidump(8) for the list -- there are many). ACPI tables are stored within the BIOS -- the motherboard/BIOS vendor has full control over all of them and is fully 100% responsible for their content.

It looks to me like they severely botched their BIOS, or somehow it got flashed wrong.

You need to contact Supermicro Technical Support and tell them of the problem. They need to either fix their BIOS, or help figure out what's become corrupted. You can point them to this thread if you'd like.

I should note that the corruption/issue is major enough that you are missing very key/important lines from your dmesg (after "avail memory" but before "kbdX at kbdmuxX"), which come from pure reliance upon ACPI. Lines such as:

  Event timer "LAPIC" quality 400
  ACPI APIC Table:
  FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
  FreeBSD/SMP: 1 package(s) x 4 core(s)
  cpu0 (BSP): APIC ID: 0
  cpu1 (AP): APIC ID: 1
  cpu2 (AP): APIC ID: 2
  cpu3 (AP): APIC ID: 3
  ioapic0 irqs 0-23 on motherboard
  ioapic1 irqs 24-47 on motherboard

In the meantime, you can try booting without ACPI support (there should be a boot-up menu option for that) and pray that works. If it doesn't, then your workaround is to roll back to an older BIOS version and/or put pressure on Supermicro. You will find their Technical Support folks are quite helpful/responsive to technical issues.

Good luck and keep us posted on what transpires.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
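For readers wondering what the checksum in that warning actually checks: an ACPI system description table is valid when all of its bytes, including the checksum byte at offset 9 of the standard header, sum to zero modulo 256. The sketch below illustrates that rule on a made-up 36-byte table. The function names and the sample table are assumptions for illustration only, and it assumes the warning prints the stored checksum byte followed by the value that would make the table sum to zero.

```python
CHECKSUM_OFFSET = 9  # signature (4 bytes) + length (4) + revision (1)


def acpi_checksum(table):
    """Sum of all table bytes modulo 256; 0 means the table checks out."""
    return sum(table) & 0xFF


def expected_checksum_byte(table, offset=CHECKSUM_OFFSET):
    """Value the checksum byte must hold so the whole table sums to 0."""
    rest = (sum(table) - table[offset]) & 0xFF
    return (-rest) & 0xFF


# Made-up table: 4-byte signature, little-endian length, zero padding.
table = bytearray(b"TEST" + (36).to_bytes(4, "little") + bytes(28))
table[CHECKSUM_OFFSET] = expected_checksum_byte(table)
valid_before = acpi_checksum(table) == 0

# Corrupt one byte: the stored checksum no longer matches the contents.
table[20] ^= 0x1F
stored = table[CHECKSUM_OFFSET]
should_be = expected_checksum_byte(table)
```

After the corruption, stored and should_be differ, which is the same kind of mismatch the "0x29, should be 0x48" message reports. As John Baldwin notes in the thread, a mismatch alone does not prove the table contents are unusable.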
Re: Error in make buildkernel
On Mon, Jun 10, 2013 at 02:04:59PM +0200, Willem Jan Withagen wrote:
> I'm trying to build a stable kernel on a freshly built 8.4-Stable i386
> system.
>
> And I get:
> MAKE=make sh /usr/srcs/src9/src/sys/conf/newvers.sh GENERIC
> /usr/local/bin/svnversion
> cc -c -O -pipe -std=c99 -g -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions
> -Wmissing-include-dirs -fdiagnostics-show-option -nostdinc -I.
> -I/usr/srcs/src9/src/sys -I/usr/srcs/src9/src/sys/contrib/altq -D_KERNEL
> -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common
> -finline-limit=8000 --param inline-unit-growth=100 --param
> large-function-growth=1000 -mno-align-long-strings
> -mpreferred-stack-boundary=2 -mno-mmx -mno-sse -msoft-float
> -ffreestanding -fstack-protector -Werror vers.c
> ctfconvert -L VERSION -g vers.o
> linking kernel.debug
> ld:/usr/srcs/src9/src/sys/conf/ldscript.i386:66: syntax error
> *** Error code 1
>
> Stop in /usr/obj/usr/srcs/src9/src/sys/GENERIC.
> *** Error code 1
>
> Stop in /usr/srcs/src9/src.
> *** Error code 1
>
> Line 66 is: .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) }
> The piece of "code" around line 66 looks like:
>
> PROVIDE (__etext = .);
> PROVIDE (_etext = .);
> PROVIDE (etext = .);
> .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
> .rodata1: { *(.rodata1) }
> .eh_frame_hdr : { *(.eh_frame_hdr) }
> .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) }
> .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table .gcc_except_table.*) }
> /* Adjust the address for the data segment. We want to adjust up to
> the same address within the page on the next page up. */
> . = ALIGN (CONSTANT (MAXPAGESIZE)) - ((CONSTANT (MAXPAGESIZE) - .) & (CONSTANT (MAXPAGESIZE) - 1)); . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
> /* Exception handling */
>
> Any suggestions on how to fix this??

I can't help with the actual syntax error, but from the path names involved here, it looks like you:

1) are using an alternate location for src (/usr/srcs not /usr/src),
2) are trying to build FreeBSD 9.x on an 8.4-STABLE box (/usr/obj/usr/srcs/src9)

Is that correct?

You might want to provide /etc/make.conf and /etc/src.conf from this system or other details of the "build framework" you might be using. That might help/pertain to the situation.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: 8.4 and EHCI - regression?
le 802.11s draft support
> #lena device wlan_wep # 802.11 WEP support
> #lena device wlan_ccmp # 802.11 CCMP support
> #lena device wlan_tkip # 802.11 TKIP support
> #lena device wlan_amrr # AMRR transmit rate control algorithm
> #lena device an # Aironet 4500/4800 802.11 wireless NICs.
> #lena device ath # Atheros pci/cardbus NIC's
> #lena device ath_hal # pci/cardbus chip support
> #lena options AH_SUPPORT_AR5416 # enable AR5416 tx/rx descriptors
> #lena device ath_rate_sample # SampleRate tx rate control for ath
> #lena device ral # Ralink Technology RT2500 wireless NICs.
> #lena device wi # WaveLAN/Intersil/Symbol 802.11 wireless NICs.
> #lena #device wl # Older non 802.11 Wavelan wireless NIC.
>
> # Pseudo devices.
> device loop # Network loopback
> device random # Entropy device
> #lena options PADLOCK_RNG # VIA Padlock RNG
> options RDRAND_RNG # Intel Bull Mountain RNG
> device ether # Ethernet support
> #lena device vlan # 802.1Q VLAN support
> #lena device tun # Packet tunnel.
> device pty # BSD-style compatibility pseudo ttys
> #lena:load-as-module device md # Memory "disks"
> #lena device gif # IPv6 and IPv4 tunneling
> #lena device faith # IPv6-to-IPv4 relaying (translation)
> #lena device firmware # firmware assist module
>
> # The `bpf' device enables the Berkeley Packet Filter.
> # Be aware of the administrative consequences of enabling this!
> # Note that 'bpf' is required for DHCP.
> device bpf # Berkeley packet filter
>
> # USB support
> options USB_DEBUG # enable debug msgs
> device uhci # UHCI PCI->USB interface
> device ohci # OHCI PCI->USB interface
> device ehci # EHCI PCI->USB interface (USB 2.0)
> device usb # USB Bus (required)
> #device udbp # USB Double Bulk Pipe devices
> device uhid # "Human Interface Devices"
> device ukbd # Keyboard
> #lena device ulpt # Printer
> device umass # Disks/Mass storage - Requires scbus and da
> #lena:load-as-module device ums # Mouse
> #lena device urio # Diamond Rio 500 MP3 player
> # USB Serial devices
> #lena device u3g # USB-based 3G modems (Option, Huawei, Sierra)
> #lena device uark # Technologies ARK3116 based serial adapters
> #lena device ubsa # Belkin F5U103 and compatible serial adapters
> #lena device uftdi # For FTDI usb serial adapters
> #lena device uipaq # Some WinCE based devices
> #lena device uplcom # Prolific PL-2303 serial adapters
> #lena device uslcom # SI Labs CP2101/CP2102 serial adapters
> #lena device uvisor # Visor and Palm devices
> #lena device uvscom # USB serial support for DDI pocket's PHS
> # USB Ethernet, requires miibus
> #lena device aue # ADMtek USB Ethernet
> #lena device axe # ASIX Electronics USB Ethernet
> #lena device cdce # Generic USB over Ethernet
> #lena device cue # CATC USB Ethernet
> #lena device kue # Kawasaki LSI USB Ethernet
> #lena device rue # RealTek RTL8150 USB Ethernet
> #lena device udav # Davicom DM9601E USB
> # USB Wireless
> #lena device rum # Ralink Technology RT2501USB wireless NICs
> #lena device uath # Atheros AR5523 wireless NICs
> #lena device ural # Ralink Technology RT2500USB wireless NICs
> #lena device zyd # ZyDAS zd1211/zd1211b wireless NICs
>
> # FireWire support
> #lena device firewire # FireWire bus code
> #device sbp # SCSI over FireWire (Requires scbus and da)
> #lena device fwe # Ethernet over FireWire (non-standard!)
> #lena device fwip # IP over FireWire (RFC 2734,3146)
> #lena device dcons # Dumb console driver
> #lena device dcons_crom # Configuration ROM for dcons
>
> # VirtIO support
> device virtio # Generic VirtIO bus (required)
> device virtio_pci # VirtIO PCI device
> device vtnet # VirtIO Ethernet device
> device virtio_blk # VirtIO Block device
> device virtio_scsi # VirtIO SCSI device
> device virtio_balloon # VirtIO Memory Balloon device
>
> #lenab
> # from /sys/conf/NOTES:
>
> # Optional character code conversion support with LIBICONV.
> # Each option requires their base file system and LIBICONV.
>
> options MSDOSFS_ICONV
>
> # Kernel side iconv library
> options LIBICONV
>
> # Set the amount of time (in seconds) the system will wait before
> # rebooting automatically when a kernel panic occurs. If set to (-1),
> # the system will wait indefinitely until a key is pressed on the
> # console.
> options PANIC_REBOOT_WAIT_TIME=60 #lena was 16
>
> # from /sys/i386/conf/NOTES:
>
> # Enable Linux ABI emulation
> options COMPAT_LINUX
>
> #lenae

CC'ing freebsd-usb@, where Hans can probably help with this.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: fxp0 interface going up/down/up/down (dhclient related?)
On Sun, Jun 09, 2013 at 02:48:29PM +0200, Alban Hertroys wrote: > On Jun 9, 2013, at 12:44, Jeremy Chadwick wrote: > > > On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote: > >> I'm having an issue where my fxp0 interface keeps looping between DOWN/UP, > >> with dhclient requesting a lease each time in between. I think it's caused > >> by dhclient: > >> > >> solfertje # dhclient -d fxp0 > >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > >> send_packet: Network is down > >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > >> DHCPACK from 109.72.40.1 > >> bound to 141.105.10.89 -- renewal in 7200 seconds. > >> fxp0 link state up -> down > >> fxp0 link state down -> up > >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > >> DHCPACK from 109.72.40.1 > >> bound to 141.105.10.89 -- renewal in 7200 seconds. > >> fxp0 link state up -> down > >> fxp0 link state down -> up > >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > >> DHCPACK from 109.72.40.1 > >> bound to 141.105.10.89 -- renewal in 7200 seconds. > >> fxp0 link state up -> down > >> fxp0 link state down -> up > >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > >> DHCPACK from 109.72.40.1 > >> bound to 141.105.10.89 -- renewal in 7200 seconds. > >> fxp0 link state up -> down > >> fxp0 link state down -> up > >> DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > >> DHCPACK from 109.72.40.1 > >> bound to 141.105.10.89 -- renewal in 7200 seconds. > >> fxp0 link state up -> down > >> ^C > >> > >> In above test I turned off devd (/etc/rc.d/devd stop) and background > >> dhclient (/etc/rc.d/dhclient stop fxp0), and I still go the above result. > >> There's practically no time spent between up/down cycles, this just keeps > >> going on and on. > >> fxp0 is the only interface that runs on DHCP. The others have static IP's. > >> > >> Initially I thought the issue might be caused by devd, because I have both > >> ethernet and 822.11 type NICs (2x ethernet, 1x wifi) in that system. 
> >> > >> This is 9-STABLE from yesterday. > >> > >> Before, I had 9-RELEASE running on this system with the same config, and > >> that worked well. > > > > And so what I predicted begins... > > > > The issue is described in the 8.4-RELEASE Errata Notes; the driver is > > using the same driver version as in stable/9, hence you're experiencing > > the same problem. See Open Issues: > > > > http://www.freebsd.org/releases/8.4R/errata.html > > > > No fix for this has been committed. It is still under discussions by > > multiple kernel folks as to where the fix should be applied (dhclient or > > the fxp(4) driver), because the changes made to dhclient (that tickle > > this bug) may actually affect more drivers than just fxp(4). > > > > You can start by reading the (extremely long but very informative) > > thread here. I do urge you to read all the posts, not skim them: > > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/thread.html#73440 > > Goodness, and here I was hoping it was just a silly mistake I made? > > IIUC, the issue is a combination of: > - dhclient now being aware of link state changes and > - the fxp driver reinitializes for certain mode changes, such as assigning an > IP address > > Which causes dhclient to think that the link state changed, fetch a "new" IP > address and assigns it to the fxp adapter again, causing the same link state > change over and over again. > > Is that about correct? Someone else can answer this. > > The only known workarounds at this time are: > > > > a) Cease use of DHCP; set a static IP in rc.conf, > > > > b) Try some of the patches mentioned within the above thread, > > specifically this one: > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073581.html > > Or c) Use DHCP with a static media setting: > ifconfig_fxp0="DHCP media 100baseTX mediaopt full-duplex" DO NOT DO THIS. 
People who do this do not understand what this does. This has bad effects on IEEE 802.3 and will not do/behave like you might think. The short version: The ONLY TIME you should be hard-setting speed and duplex in ifconfig is when you have a managed switch on the other end where you can set the speed/duplex for that port as well.
Re: fxp0 interface going up/down/up/down (dhclient related?)
On Sun, Jun 09, 2013 at 01:21:53PM +0200, Łukasz Gruner wrote:
> On Sun, Jun 9, 2013, at 12:44, Jeremy Chadwick wrote:
> > On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote:
> > > I'm having an issue where my fxp0 interface keeps looping between
> > > DOWN/UP, with dhclient requesting a lease each time in between. I think
> > > it's caused by dhclient:
> > And so what I predicted begins...
>
> I have been suffering this issue since forever (which for me began at
> freebsd 9.0). Currently I'm at stable9.

The problem we're talking about was a direct result of this PR:

http://www.freebsd.org/cgi/query-pr.cgi?pr=166656

The commit (MFC) was done to stable/8 and stable/9 in these revisions, on this date:

stable/9 commit: r247335 -- 2013/02/26
stable/8 commit: r247336 -- 2013/02/26

You can see the commit log/messages in the PR.

Now let's talk about versions:

FreeBSD 9.0-RELEASE came out 2012/01/12:
http://lists.freebsd.org/pipermail/freebsd-announce/2012-January/001406.html

FreeBSD 9.1-RELEASE came out 2012/12/30:
http://lists.freebsd.org/pipermail/freebsd-announce/2012-December/001448.html

So when you say "the issue" for you "began at FreeBSD 9.0", you need to be more specific (uname -a output would be a good start), because otherwise to me it sounds like you're experiencing a *completely* different problem.

> Much appreciated, shouldn't this be at wiki?

What wiki? How would people know to read it? Using a web search engine like Google? That would return this mailing list thread, as well as the ones I've referenced.

There is enough old/outdated/completely and absolutely WRONG crap on the FreeBSD Wiki as is. The Wiki is not the "official source/list of problems" (there is no official source/list -- the mailing lists are, and for a decade have been, as good as it gets).

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
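The version argument in the message above is a plain date comparison, which can be sketched as follows. The dates are the ones quoted in the message; the function names are made up for this example, and comparing a release date against a commit date is only a rough necessary condition (release branches are cut before the announcement date), so treat it as an approximation rather than an authoritative check.

```python
from datetime import date

# Release dates and MFC date as quoted in the message above.
RELEASE_DATES = {
    "9.0-RELEASE": date(2012, 1, 12),
    "9.1-RELEASE": date(2012, 12, 30),
}
MFC_DATE = date(2013, 2, 26)  # r247335 (stable/9) and r247336 (stable/8)


def releases_that_could_include(change_date, release_dates):
    """Names of releases announced on or after the change was committed."""
    return sorted(name for name, released in release_dates.items()
                  if released >= change_date)


def snapshot_could_include(build_date, change_date):
    """True if a stable snapshot built on build_date could carry the change."""
    return build_date >= change_date


affected_releases = releases_that_could_include(MFC_DATE, RELEASE_DATES)
```

With these dates affected_releases comes out empty: neither 9.0-RELEASE nor 9.1-RELEASE predates... rather, both predate the MFC, so neither can contain it. Only a stable/9 or stable/8 build from after 2013/02/26 could, which is why the uname -a output matters.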
Re: fxp0 interface going up/down/up/down (dhclient related?)
On Sun, Jun 09, 2013 at 12:21:37PM +0200, Alban Hertroys wrote: > I'm having an issue where my fxp0 interface keeps looping between DOWN/UP, > with dhclient requesting a lease each time in between. I think it's caused by > dhclient: > > solfertje # dhclient -d fxp0 > DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > send_packet: Network is down > DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > DHCPACK from 109.72.40.1 > bound to 141.105.10.89 -- renewal in 7200 seconds. > fxp0 link state up -> down > fxp0 link state down -> up > DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > DHCPACK from 109.72.40.1 > bound to 141.105.10.89 -- renewal in 7200 seconds. > fxp0 link state up -> down > fxp0 link state down -> up > DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > DHCPACK from 109.72.40.1 > bound to 141.105.10.89 -- renewal in 7200 seconds. > fxp0 link state up -> down > fxp0 link state down -> up > DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > DHCPACK from 109.72.40.1 > bound to 141.105.10.89 -- renewal in 7200 seconds. > fxp0 link state up -> down > fxp0 link state down -> up > DHCPREQUEST on fxp0 to 255.255.255.255 port 67 > DHCPACK from 109.72.40.1 > bound to 141.105.10.89 -- renewal in 7200 seconds. > fxp0 link state up -> down > ^C > > In above test I turned off devd (/etc/rc.d/devd stop) and background dhclient > (/etc/rc.d/dhclient stop fxp0), and I still go the above result. There's > practically no time spent between up/down cycles, this just keeps going on > and on. > fxp0 is the only interface that runs on DHCP. The others have static IP's. > > Initially I thought the issue might be caused by devd, because I have both > ethernet and 822.11 type NICs (2x ethernet, 1x wifi) in that system. > > This is 9-STABLE from yesterday. > > Before, I had 9-RELEASE running on this system with the same config, and that > worked well. And so what I predicted begins... 
The issue is described in the 8.4-RELEASE Errata Notes; the driver is using the same driver version as in stable/9, hence you're experiencing the same problem. See Open Issues: http://www.freebsd.org/releases/8.4R/errata.html No fix for this has been committed. It is still under discussions by multiple kernel folks as to where the fix should be applied (dhclient or the fxp(4) driver), because the changes made to dhclient (that tickle this bug) may actually affect more drivers than just fxp(4). You can start by reading the (extremely long but very informative) thread here. I do urge you to read all the posts, not skim them: http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/thread.html#73440 The only known workarounds at this time are: a) Cease use of DHCP; set a static IP in rc.conf, b) Try some of the patches mentioned within the above thread, specifically this one: http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073581.html The patch is for head (CURRENT) so it may not patch cleanly. If not, you can try to work the patch in yourself/by hand, or you can ask Yong-Hyeon or others for help. > I'm not sure it's related, but on the wireless interface I get alot of: > Jun 9 12:08:11 solfertje kernel: ath0: stuck beacon; resetting (bmiss count > 4) Absolutely 100% unrelated. That issue has been around for years, and the root cause varies tremendously. I discussed it back in February 2011: http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061700.html If you want to know how I solved that problem, I can tell you, but I'm certain you won't be happy to hear what I have to say. If you're concerned about this problem, please start another thread discussing it. I'm sure Adrian Chadd can provide you lots of insights, but most of them are already in his response to my above thread/post. 
> {snipping other stuff}

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
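To put numbers on the loop shown in the quoted dhclient -d output, a small log summarizer can count how often the link drops and how often a lease is bound. This is a sketch only: the function name is made up, and the regular expression assumes the messages look exactly like the lines quoted in this thread.

```python
import re

# Matches lines such as "fxp0 link state up -> down".
LINK_RE = re.compile(r"^(\S+) link state (up|down) -> (up|down)$")


def summarize(lines, ifname):
    """Return (link_drops, leases_bound) seen in dhclient -d style output."""
    drops = 0
    leases = 0
    for raw in lines:
        line = raw.strip()
        match = LINK_RE.match(line)
        if match:
            # Count only up -> down transitions on the interface of interest.
            if match.group(1) == ifname and match.group(2, 3) == ("up", "down"):
                drops += 1
        elif line.startswith("bound to "):
            leases += 1
    return drops, leases
```

On a healthy interface the number of leases bound should stay at one for the whole renewal interval (7200 seconds in the quoted output). A drop count that keeps pace with the lease count, as in the quoted log, is the signature of the loop discussed in this thread.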
Re: TRIM support through ciss
On Thu, Jun 06, 2013 at 02:00:36AM +0400, Dmitry Morozovsky wrote:
> Dear colleagues,
>
> I have a DB server with ciss and a bunch of disks (8 SAS + 2 Intel SATA SSD).
>
> However, this setup does not seem to support TRIM on SSDs:
>
> kstat.zfs.misc.zio_trim.bytes: 0
> kstat.zfs.misc.zio_trim.success: 0
> kstat.zfs.misc.zio_trim.unsupported: 418
> kstat.zfs.misc.zio_trim.failed: 0
>
>
> Excerpt from dmesg about SSD:
>
> da9 at ciss0 bus 0 scbus0 target 9 lun 0
> da9: Fixed Direct Access SCSI-5 device
> da9: Serial Number PACCR9SZ7KJS
> da9: 135.168MB/s transfers
> da9: Command Queueing enabled
> da9: 114439MB (234371520 512 byte sectors: 255H 32S/T 28722C)
> da9: quirks=0x1
> da9: Delete methods:
>
> the last line bothers me...
>
> Is there any tuning I missed?

I'm sure Steve will respond, but in the meantime...

I assume this is you running stable/9 with r251419 or newer (which just got committed a few hours ago)?

I haven't looked at the code, but it is very, VERY important to remember that you are *always* at the whim of 1) the controller driver (ciss(4) in this case), and 2) the controller firmware, as to whether or not certain pass-through commands are supported (in this case, since you have a SAS controller, this would be accomplished via a SCSI command that your controller does not support).

Oh, it looks like Steve just replied and said more or less what I did. :-)

Bottom line as "we" (the royal we, I guess) have been saying for many years now: any controller which operates in a RAID fashion and does not support "true JBOD" (meaning the controller acts as a generic controller with no concept of RAID), will almost always get in the way. Instead, stick with true non-RAID controllers -- and yes I am aware choices are limited.

--
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
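A rough way to read the zio_trim counters quoted above is sketched below. The parsing assumes "name: value" lines as printed in the original post, the function names are made up, and the interpretation (no successes plus a non-zero unsupported count meaning TRIM requests are being rejected somewhere below ZFS) is an assumption consistent with the replies in this thread, not a statement about the ZFS code.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def trim_status(values, prefix="kstat.zfs.misc.zio_trim"):
    """Classify TRIM behaviour from the zio_trim counters (heuristic)."""
    success = values.get(prefix + ".success", 0)
    unsupported = values.get(prefix + ".unsupported", 0)
    failed = values.get(prefix + ".failed", 0)
    if success > 0 and unsupported == 0 and failed == 0:
        return "working"
    if success == 0 and unsupported > 0:
        return "unsupported"
    return "inconclusive"


QUOTED = """\
kstat.zfs.misc.zio_trim.bytes: 0
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.unsupported: 418
kstat.zfs.misc.zio_trim.failed: 0
"""
status = trim_status(parse_sysctl(QUOTED))
```

For the counters in the original post this classifies the setup as "unsupported", which matches the empty "Delete methods:" line in the dmesg excerpt.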
Re: Serial terminal issues
On Wed, Jun 05, 2013 at 09:29:56PM +0200, Alban Hertroys wrote: > {sniping stuff that is pending or has been acknowledged} > On Jun 5, 2013, at 2:59, Jeremy Chadwick wrote: > > Serial port speed settings in a BIOS pertain to BIOS-level console > > redirection -- that redirection is lost the instant anything (boot > > loader, kernel, etc.) touches SMI and/or interrupts and starts > > "fiddling" with the serial port. > > That's the bit I wasn't entirely certain of - that there is no possible > interaction from having a BIOS console to the point where the OS takes over. > That's why I mentioned it. > > I assumed that if the BIOS had set up the serial port to 19200 baud and the > OS didn't specify it, that it would be possible that the speed set up in the > BIOS would still be in effect and that the serial terminal just incidentally > worked for the last 10 years because of that. Far-fetched, I know. Not far-fetched. Some system BIOSes with BIOS-level serial console redirection offer what you describe -- on Supermicro systems, you can toggle this capability in the BIOS, it's called "Continue CR After POST" (CR stands for Console Redirection). This is hard to explain without getting into the technicalities, so bear with me here. Get coffee, etc.. What this BIOS feature does is "retain" the SMI/interrupt mapping stuff, so that certain calls to interrupt 0x10 (the BIOS interrupt) for things like cursor movement, writing text/strings, etc. are done on the native console (ex. VGA) *as well* as sent to the serial port (and converted into escape sequences of your choice -- another BIOS option, "Console Type", lets you pick between things like vt100, ANSI, ASCII, etc.). This option is useful for things like option ROMs or HBAs (SCSI/SAS controllers, etc.) which print stuff *after* POST. I'm sure you've seen this. With "Continue CR After POST" disabled, those types of messages are only seen on the VGA console. 
However, regardless of the setting of "Continue CR After POST", the instant any x86 code starts tinkering with the SMI/interrupt stuff, that functionality is lost (and cannot be restored). In FreeBSD, this definitely happens when the kernel starts, but AFAIR not during the bootstraps. Instead, the bootstraps (that is: boot0, as well as boot2/loader) have the ability to speak to the serial port *directly*, rather than relying on interrupt 0x10.

The -S19200 parameter in /boot.config causes the bootstraps **very** early on to set the serial port speed to 19200 baud. This could cause a problem if you have BIOS-level serial redirect set up and set to a different speed (ex. 57600), so naturally you need to make sure everything uses the same speed at all "stages".

The -Dh parameter in /boot.config causes the bootstraps **very** early on to tell FreeBSD to write data to the VGA console/text console, in addition to the serial port (directly, not via interrupt 0x10).

For how all that works (meaning how -D vs. -Dh behaves and at what "stages" of the FreeBSD boot process), please see the FreeBSD Handbook section 27.6.4.1:

http://www.freebsd.org/doc/en/books/handbook/serialconsole-setup.html

The handbook here is also outdated/wrong; it's talking about sio0 when it means to refer to uart0. Flags for uart0 in this case will be 0x00010 (meaning uart0 is a potential serial console).

Finally, the important/key part: the -Dh capability when used in /boot.config gets "passed on" to boot2/loader (so it knows to output data to the serial port as a console), and boot2/loader **ALSO** passes that information on to the kernel when it starts so that it knows to print data to the serial port too. Make sense?
:-) This is why I advocate using /boot.config (or you can use /boot/config if you wish -- both in 9.1-RELEASE work (thanks des@ !)) rather than mucking about with /boot/loader.conf -- the added advantage is that you can actually get serial output at an earlier phase/stage, in case some of your boot blocks don't work. More specifically, with /boot.config you can actually get this on the serial port (if you bang on Escape or Enter repeatedly VERY early on in the boot process): >> FreeBSD/i386 BOOT Default: 0:ad(0,a)/boot/loader boot: But if you don't bang on keys, you won't ever see this. Anyway, sorry for the long ramble there, but the above is how it works. (I'm sure readers will go "My god, that is one of the best write-ups I've seen of how the serial console/boot process stuff works, why isn't this in the handbook!?" to which I will opt out/not respond to). > > What you're adjusting in FreeBSD is 1) the FreeBSD boot loader touching > > the serial port, and 2) the FreeBSD kernel outputting to a serial port > > (it also in
Re: Serial terminal issues
= This diagram should allow you to build your own cable if need be, including a null-modem cable if you plan on doing a PC<-->PC (i.e. DB9 to DB9) connection. I just happened to use RJ45 stuff; for DB9-to-DB9; just follow the chart (signal/pin names) and go from there. If you already have a cable and you aren't sure of its wiring (which is very common -- sigh, stupid companies...), you will need to figure out the wiring using a multimetre (continuity test is all that's needed). > I didn't see any options in the BIOS to set the console speed (just > address and IRQ, those are in the above). ISTR that my old mobo did > allow to set that information, but then again, that board (Tyan Tiger) > gave me access to the BIOS through the serial console. This has absolutely no relevancy. Serial port speed settings in a BIOS pertain to BIOS-level console redirection -- that redirection is lost the instant anything (boot loader, kernel, etc.) touches SMI and/or interrupts and starts "fiddling" with the serial port. What you're adjusting in FreeBSD is 1) the FreeBSD boot loader touching the serial port, and 2) the FreeBSD kernel outputting to a serial port (it also initialises/sets the serial port), and 3) getty et al spawning a login prompt on the serial port. I would point you to my "FreeBSD via serial console and PXE" document, except there are one-offs specific to the PXE portions that are not relevant to your situation. The important part is that I've used FreeBSD serial console for almost 16 years and have a very good understanding of what works (including vs. what some developers say "should" work; i.e. reality vs. pragmatism). -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
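A minimal sketch, in Python, of the "same speed at every stage" rule that comes up in this thread (boot blocks/loader, then kernel, then getty). The function names and sample values are hypothetical. It only understands a -S<speed> flag in /boot.config text and a std.<speed> getty type in an /etc/ttys line, and it treats a missing speed as "unknown" rather than guessing a default.

```python
import re


def boot_config_speed(boot_config_text):
    """Return the speed given by -S<speed> in /boot.config text, or None."""
    match = re.search(r"(?:^|\s)-S(\d+)\b", boot_config_text)
    return int(match.group(1)) if match else None


def ttys_speed(ttys_line):
    """Return the speed from a getty type such as std.19200, or None."""
    match = re.search(r"\bstd\.(\d+)\b", ttys_line)
    return int(match.group(1)) if match else None


def speeds_consistent(stage_speeds):
    """True when every stage with a known speed uses the same one."""
    known = {speed for speed in stage_speeds.values() if speed is not None}
    return len(known) <= 1


# Hypothetical machine: BIOS redirection at 57600, FreeBSD stages at 19200.
stages = {
    "BIOS console redirection": 57600,
    "/boot.config": boot_config_speed("-Dh -S19200"),
    "/etc/ttys": ttys_speed('ttyu0 "/usr/libexec/getty std.19200" vt100 on secure'),
}

for stage, speed in stages.items():
    print(stage, speed)
print("consistent:", speeds_consistent(stages))
```

A check like this says nothing about whether the cable or terminal is right; it only flags the case where one stage was left at a different speed from the others.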
Re: ZFS crashing while zfs recv in progress
6GB RAM. That's a bit shocking for something of this size. Moving on. Can you tell me what exact disk (e.g. daXX) in the above list you used for swap, and what kind of both system and disk load were going on at the time you saw the swap message? I'm looking for a capture of "gstat -I500ms" output (you will need a VERY long/big terminal window to capture this given how many disks you have) while I/O is happening, as well as "top -s 1" in another window. I would also like to see "zpool iostat -v 1" output while things are going on, to help possibly narrow down if there is a single disk causing the entire I/O subsystem for that controller to choke. Next: are you using compression or dedup on any of your filesystems? If not, have you ever in the past? Next: could we have your loader.conf and sysctl.conf please? My gut feeling is that if you're doing zfs {send,recv} for "tank" -- which you are -- multiple subsystems and busses are so incredibly overwhelmed by all the I/O and interrupts and *everything* that it's very hard for the swap I/O time slicer to get a decent share of time to swap something out to swap (even worse if that controller is overwhelmed with requests). Worse, you're using raidz2, which means even more CPU time + calculation overhead, which means less time for other tasks (threads). Everything on the system -- everything! -- is fighting for time at multiple levels. If you could put a swap disk on a dedicated controller (and no other disks on it), that would be ideal. Please do not use USB for this task (the USB stack may introduce its own set of complexities pertaining to interrupt usage). If all this turns out to be an "overall system overwhelmed" situation, my advice is to cut back on the usage. I would STRONGLY suggest in that case a 2nd system, and split the number of disks across both. I'm really surprised given how many disks/etc. you have you didn't choose to get an actual filer (Netapp). I sure as hell would have. 
I really do not know why people think ZFS is a full-blown replacement for a Netapp of this scale -- it isn't. Anyway take what I say with a grain of salt -- really. I'm just throwing out thoughts/ideas as I look over everything. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 9.1-current disk throughput stalls ?
On Mon, Jun 03, 2013 at 03:34:26PM -0700, Jeremy Chadwick wrote: > 7. ZFS setup is a mirror (RAID-1-like), Should have referenced [2]. > 12. Rolling back to 8.4-STABLE (date/build unknown) apparently fixes > your issue (I would appreciate you running the system for 72 hours > before making this statement, and doing the *exact same things* on it > that cause the problem with 9.1-STABLE) [2] I should have used the word "exacerbate" instead of "cause". > v) I really wish you would not have rolled this system back to > 8.4-STABLE. For anyone to debug this, we need the system in a > consistent state. Changing kernels/etc. User error while using vim (I have an awful tendency to nuke entire lines when switching between input mode vs. navigation mode); last line should read "Changing kernels/etc. in the middle of troubleshooting a problem you ask for assistance with makes things very difficult". (And I say that knowing that rolling back as a form of testing is good, since it can help narrow things down to a specific version or release, i.e. a software problem). -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 9.1-current disk throughput stalls ?
On Mon, Jun 03, 2013 at 03:48:30PM -0600, Ross Alexander wrote: > On Mon, 3 Jun 2013, Jeremy Chadwick wrote: > > >1. There is no such thing as 9.1-CURRENT. Either you meant 9.1-STABLE > >(what should be called stable/9) or -CURRENT (what should be called > >head). > > >I wrote: > >>The oldest kernel I have that shows the syndrome is - > >> > >>FreeBSD aukward.bogons 9.1-STABLE FreeBSD 9.1-STABLE #59 r250498: > >>Sat May 11 00:03:15 MDT 2013 > >>toor@aukward.bogons:/usr/obj/usr/src/sys/GENERIC amd64 > > See above. You're right, I shouldn't post after a 07:00 dentist's > appt while my spouse is worrying me about the ins adjustor's report > on the car damage :(. Hey, I'm very fallible. I'll try harder. > > >2. Is there some reason you excluded details of your ZFS setup? > >"zpool status" would be a good start. > > Thanks for the useful hint as to what info you need to diagnose. > > One of the machines ran a 5 drive zraid-1 pool (Mnemosyne). > > Another was a 2 drive gmirror, in the simplest possible gpart/gmirror setup. > (Mnemosyne-sub-1.) > > The third is a 2 drive ZFS raid-1, again in the simplest possible > gpart/gmirror manner (Aukward). > > The fourth is a conceptually identical 2 drive ZFS raid-1, swapping > to a zvol (Griffon.) > > If you look on the FreeBSD wiki, the pages that say "bootable zfs > gptzfsboot" and "bootable mirror" - > > https://wiki.freebsd.org/RootOnZFS > http://www.freebsdwiki.net/index.php/RAID1,_Software,_How_to_setup > > Well, I just followed those in cookbook style (modulo device and pool > names). Didn't see any reason to be creative; I build for > reliability, not performance. 
>
> Aukward is gpart/zfs raid-1 box #1:
>
> aukward:/u0/rwa > ls -l /dev/gpt
> total 0
> crw-r-----  1 root  operator  0x91 Jun  3 10:18 vol0
> crw-r-----  1 root  operator  0x8e Jun  3 10:18 vol1
>
> aukward:/u0/rwa > zpool list -v
> NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> ult_root      111G   108G  2.53G    97%  1.00x  ONLINE  -
>   mirror      111G   108G  2.53G      -
>     gpt/vol0     -      -      -      -
>     gpt/vol1     -      -      -      -
>
> aukward:/u0/rwa > zpool status
>   pool: ult_root
>  state: ONLINE
>   scan: scrub repaired 0 in 1h13m with 0 errors on Sun May  5 04:29:30 2013
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         ult_root      ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             gpt/vol0  ONLINE       0     0     0
>             gpt/vol1  ONLINE       0     0     0
>
> errors: No known data errors
>
> (Yes, that machine has no swap.  Has NEVER had swap, has 16 GB and
> uses maybe 10% at max load.  Has been running 9.x since prerelease
> days, FWTW.  The ARC is throttled to 2 GB; zfs-stats says I never get
> near using even that.  It's just the box that drives the radios,
> a ham radio hobby machine.)
>
> Griffon is also gpart/zfs raid-1 -
>
> griffon:/u0/rwa > uname -a
> FreeBSD griffon.cs.athabascau.ca 9.1-STABLE FreeBSD 9.1-STABLE #25 r251062M:
> Tue May 28 10:39:13 MDT 2013
> t...@griffon.cs.athabascau.ca:/usr/obj/usr/src/sys/GENERIC amd64
>
> griffon:/u0/rwa > ls -l /dev/gpt
> total 0
> crw-r-----  1 root  operator  0x7b Jun  3 08:38 disk0
> crw-r-----  1 root  operator  0x80 Jun  3 08:38 disk1
> crw-r-----  1 root  operator  0x79 Jun  3 08:38 swap0
> crw-r-----  1 root  operator  0x7e Jun  3 08:38 swap1
>
> and the pool is fat and happy -
>
> griffon:/u0/rwa > zpool status -v
>   pool: pool0
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         pool0          ONLINE       0     0     0
>           mirror-0     ONLINE       0     0     0
>             gpt/disk0  ONLINE       0     0     0
>             gpt/disk1  ONLINE       0     0     0
>
> errors: No known data errors
>
> Note that swap is through ZFS zvol;
>
> griffon:/u0/rwa > cat /etc/fstab
> # Device              Mountpoint  FStype  Options  Dump  Pass#
> #
> #
> /dev/zvol/pool0/swap  none        swap    sw       0     0
>
> pool0                 /           zfs     rw       0     0
> pool0/tmp             /tmp        zfs     rw       0     0
> pool0/var             /var
Re: 9.1-current disk throughput stalls ?
world doesn't get finished.
> I am seeing very similar behaviour on three other 9.1-current
> machines, all of which are AHCI/SATA setups, using both Seagate and WD
> disks (of random sizes and ages). All these boxes ran fine a month
> ago.
>
> BTW, when I do the rattle-keyboard-to-get-disks-going trick, the NFS
> daemon reports that the system clock slews badly - machine time drops
> behind wall clock time. Something is locking the clock update off.
>
> (Hmmm, I see I'm running a pre-5000/feature flags ZFS pool, FWTW.
> I'll run zpool upgrade, my bad.)

1. There is no such thing as 9.1-CURRENT. Either you meant 9.1-STABLE (what should be called stable/9) or -CURRENT (what should be called head).

2. Is there some reason you excluded details of your ZFS setup? "zpool status" would be a good start.

3. Do any of your filesystems/pools have ZFS compression enabled, or have in the past?

4. Do any of your filesystems/pools have ZFS dedup enabled, or have in the past?

5. Does the problem go away after a reboot?

6. Can you provide smartctl -x output for both ada0 and ada1? You will need to install ports/sysutils/smartmontools for this. The reason I'm asking for this is there may be one of your disks which is causing I/O transactions to stall for the entire pool (i.e. "single point of annoyance").

7. Can you remove ZFS from the picture entirely (use UFS only) and re-test? My guess is that this is ZFS behaviour, particularly the ARC being flushed to disk, and your disks are old/slow. (Meaning: you have 16GB RAM + 4 core CPU but with very old disks).

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout
On Mon, Jun 03, 2013 at 03:06:53PM +0100, Mike Pumford wrote: > Ian Lepore wrote: > >On Wed, 2013-05-29 at 16:21 +0200, Oliver Fromme wrote: > >>Steven Hartland wrote: > >> > Have you checked your sata cables and psu outputs? > >> > > >> > Both of these could be the underlying cause of poor signalling. > >> > >>I can't easily check that because it is a cheap rented > >>server in a remote location. > >> > >>But I don't believe it is bad cabling or PSU anyway, or > >>otherwise the problem would occur intermittently all the > >>time if the load on the disks is sufficiently high. > >>But it only occurs at tags=3 and above. At tags=2 it does > >>not occur at all, no matter how hard I hammer on the disks. > >> > >>At the moment I'm inclined to believe that it is either > >>a bug in the HDD firmware or in the controller. The disks > >>aren't exactly new, they're 400 GB Samsung ones that are > >>several years old. I think it's not uncommon to have bugs > >>in the NCQ implementation in such disks. > >> > >>The only thing that puzzles me is the fact that the problem > >>also disappears completely when I reduce the SATA rev from > >>II to I, even at tags=32. > >> > > > >It seems to me that you dismiss signaling problems too quickly. > >Consider the possibilities... A bad cable leads to intermittant errors > >at higher speeds. When NCQ is disabled or limited the software handles > >these errors pretty much transparently. When NCQ is not limitted and > >there are many outstanding requests, suddenly the error handling in the > >software breaks down somehow and a minor recoverable problem becomes an > >in-your-face error. > > > It could also be a software bug in the way CAM handles the failure > of NCQ commands. When command queueing is used on a SCSI drive and a > queued command fails only that command fails. A queued command > failure on a SATA device fails ALL currently queued commands. I've > not looked at the code but do the SATA CAM drivers do the right > thing here? 
Quoting T13/2015-D ATA8-ACS2 WD spec: "If an error occurs while the device is processing an NCQ command, then the device shall return command aborted for all NCQ commands that are in the queue and shall return command aborted for any new commands, except a READ LOG EXT command requesting log address 10h, until the device completes a READ LOG EXT command requesting log address 10h (i.e., reading the NCQ Command Error log) without error." While I can't easily provide an answer to your question, I can tell you that sys/dev/ahci/ahci.c does execute READ LOG EXT (command 0x2f) for certain scenarios (the code is in function ahci_issue_recovery()). The one person who can answer this question is mav@, who is now CC'd. > Less commands queued makes it less likely that multiple commands > will be in progress when a failure occurs. A lower link rate also > makes you more immune to signal failures. He isn't seeing SATA-level signal/link failure; the AHCI driver would complain about that, and those messages aren't there. Unless, of course, those messages are only visible when verbose booting is enabled (I hope not). -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
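The rule quoted from the spec above can be pictured with a toy model in Python. This is only an illustration of the quoted paragraph, not how sys/dev/ahci/ahci.c or CAM implements recovery; the class and method names are made up for the sketch.

```python
NCQ_ERROR_LOG = 0x10  # log address 10h, the NCQ Command Error log


class NcqDevice:
    """Toy model of the quoted NCQ error rule; not a driver."""

    def __init__(self):
        self.queue = []           # tags of outstanding NCQ commands
        self.error_state = False  # set after an NCQ command fails

    def submit(self, tag):
        """Queue a command; new commands are aborted while an error is pending."""
        if self.error_state:
            return "aborted"
        self.queue.append(tag)
        return "queued"

    def fail(self, tag):
        """One queued command fails, so every queued command is aborted."""
        if tag not in self.queue:
            raise ValueError("tag is not outstanding")
        aborted = list(self.queue)
        self.queue.clear()
        self.error_state = True
        return aborted

    def read_log_ext(self, log_address):
        """Only READ LOG EXT for log 10h is accepted while an error is pending."""
        if self.error_state and log_address != NCQ_ERROR_LOG:
            return "aborted"
        self.error_state = False
        return "ok"


dev = NcqDevice()
for tag in range(3):
    dev.submit(tag)
print(dev.fail(1))               # every outstanding tag is reported aborted
print(dev.submit(3))             # refused until the error log has been read
print(dev.read_log_ext(NCQ_ERROR_LOG))
print(dev.submit(3))             # accepted again
```

In this model the number of commands taken down by a single failure is simply the queue depth at that moment, which is one way to read the observation that fewer queued commands means fewer commands in flight when a failure occurs.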
Re: Corrupt GPT header on disk from twa array - fixable?
On Mon, Jun 03, 2013 at 09:14:41AM +0200, Alban Hertroys wrote: > > On Jun 3, 2013, at 1:09, Warren Block wrote: > > > On Mon, 3 Jun 2013, Alban Hertroys wrote: > >>> > >>> Really, the easiest way would be to temporarily install the old RAID > >>> controller and copy the data off the array. > >> > >> Well, that would mean I'd have to assemble the old server again, as the > >> controller is not compatible with the hardware in the new one. And that > >> would probably be unnecessary as well, since I already did copy the data > >> off those disks. > >> > >> I was just curious whether it would be possible to read that data off the > >> disks while I still have them (with their original contents) in the new > >> server in the eventuality that I _did_ forget to copy something over or > >> that something wasn't copied over correctly. > >> > >> I copied the data over a 100MBit ethernet link, which was the fastest > >> option I had with the old server; it had USB1 and no native SATA. Hence > >> the RAID controller, but that was on a now deprecated PCI-X channel (those > >> 64-bit parallel things) and all 4 ports were in use. Not to mention that > >> the CPU was so old that it had a rather narrow margin for operating > >> temperatures and overheated several times during the copying process, > >> because rsync+sshd put a relatively high load on the CPU (An old Athlon XP > >> 2000+). > > > > PCI-X cards will operate in PCI slots. Or at least some will; I've done > > that with an Intel network card. The motherboard can't have components > > that block the unused part of the edge connector, or the offending card > > edge could be removed with extreme prejudice. > > Not this 3Ware card. I remember buying that particular motherboard because > the card wouldn't fit in the PCI slots on the board I had. There's a division > in those PCI-X slots opposite of where there's one in normal PCI slots and no > groove in the card to match the division in the PCI slot. 
This is all besides-the-point, but to clarify: please see the following diagram: http://en.wikipedia.org/wiki/File:PCI_Keying.png I recommend seeing the caption under the diagram, in addition to reading the "Mixing of 32-bit and 64-bit PCI cards in different width slots" section: http://en.wikipedia.org/wiki/PCI-X It sounds like your 3Ware card is 5V PCI-X (32-bit or 64-bit is irrelevant), and your new motherboard only supports 3.3V PCI (which is pretty much the norm on all motherboards today when it comes to classic PCI). The 5V stuff is generally shunned (both with regards to PCI and PCI-X) and is uncommon at this point in time. You can find some server-class boards that offer this capability, such as Supermicro's UIO slots, where you purchase the proper type of "riser" (adapter) for the type of card you have, i.e. UIO->5.5V PCI-X 64-bit), but you will not find this on consumer/desktop or even "enthusiast" boards. Example: http://www.supermicro.com/support/resources/riser/riser.aspx If you want to know what kind of card it is, ask 3Ware or see the user manual. Note that many vendors do not disclose all the relevant data in the manual or on their site. That info: voltage (3.3V vs. 5V vs. universal), bus width (32-bit vs. 64-bit), and if 64-bit if the card will function in a 32-bit slot (some cards won't). Educational footnote: AGP is another one of those standards that went through the same nonsense (specifically 3.3V vs. 1.5V), except the situation was worse when some card manufacturers began selling 1.5V cards with incorrect notchings, resulting in smoke/fire when installed in a 3.3V slot. I have one such card, and keep it solely as a reminder of manufacturer/vendor idiocy. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. 
PGP 4BD6C0CB |
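For reference, the voltage keying shown in the diagram linked in this message boils down to a very small rule. The Python sketch below is an assumption-laden simplification: it models only the voltage notch (3.3V, 5V, or universal), says nothing about bus width or whether a given 64-bit card works in a 32-bit slot, and the function name is hypothetical.

```python
CARD_KEYINGS = ("3.3V", "5V", "universal")
SLOT_VOLTAGES = ("3.3V", "5V")


def card_fits_slot(card_keying, slot_voltage):
    """Return True if the card's voltage keying lets it seat in the slot.

    A universal card carries both notches and fits either slot type; a
    single-voltage card only fits a slot keyed for the same voltage.
    """
    if card_keying not in CARD_KEYINGS:
        raise ValueError("unknown card keying: %r" % card_keying)
    if slot_voltage not in SLOT_VOLTAGES:
        raise ValueError("unknown slot voltage: %r" % slot_voltage)
    return card_keying == "universal" or card_keying == slot_voltage


for card in CARD_KEYINGS:
    for slot in SLOT_VOLTAGES:
        print(card, "card in", slot, "slot:", card_fits_slot(card, slot))
```

Whether a particular 3Ware card is keyed 5V, 3.3V or universal still has to come from the vendor or the manual, as noted above.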
Re: Corrupt GPT header on disk from twa array - fixable?
from the numbers would be 512 bytes in size). > > > Finally, GPT and gmirror are combined. That's a problematic combination > > because both want metadata in the last block of the drive. The new section > > in the Handbook about RAID1 (gmirror) describes that in the "Metadata > > Issues" section: > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/GEOM-mirror.html > > I'm pretty sure the disks on the controller had nothing to do with gmirror > ever. > > Gmirror is only applied to a pair of new disks that I put in the (new) server > to be able to copy my data over. I hadn't expected to be able to rely on > those original disks to be readable at all without the controller, so I > needed some place to store the data. I like the redundancy of a mirror, so I > used gmirror for (only) the new disks. I think you're missing what Warren is telling you, because you have multiple things going on/complexities to deal with simultaneously. You haven't provided any details about your gmirror setup either. All we know at this point: > >> GEOM_MIRROR: Device mirror/boot launched (2/2). > >> GEOM_MIRROR: Device mirror/swap launched (2/2). > >> GEOM_MIRROR: Device mirror/root launched (2/2). My gut feeling is ada2 and ada3 make up the mirror, and the mirror is at the disk level (ada2 and ada3). I'm basing this on past evidence presented in the thread, and having to make assumptions. No "gmirror status" output = we have to make assumptions. Now, what Warren is telling you: gmirror + GPT do not play well together. This is a design flaw** on the part of gmirror. If you want to use gmirror with disks using GPT, your only solutions are to mirror the partitions (adaXpX) and not the disk (adaX), which has its own set of caveats, or to use the MBR scheme (and if these are 4K sectors disks, or you plan on using those, you're even more screwed). I will not bring ZFS into this discussion since that also opens up a can of worms -- I'm trying to stay focused. 
The errors you see on ada4 and ada5 about the backup GPT header can be dealt with in a different manner. But for (again, assuming) ada2 and ada3, you will see GPT "backup header corruption" messages indefinitely because of the above flaw. ** -- I will not get into a debate about terminology. I am aware of the history (which came first), and so on. It's a flaw. Linux md had the same problem when GPT was introduced, and it has since been fixed/addressed. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
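The last-sector conflict described in this message can be shown with a little sector arithmetic. This is a sketch in Python under stated assumptions: the whole disk is the gmirror component, the GPT was written through the mirror provider, and the 1000-sector disk is a made-up size chosen to keep the numbers readable. The function names are hypothetical.

```python
def last_lba(sectors):
    """LBA of the last sector of a device with the given sector count."""
    return sectors - 1


def whole_disk_mirror_layout(disk_sectors):
    """Where the competing metadata lands when gmirror covers the raw disk."""
    # gmirror keeps its metadata in the last sector of the component, so
    # the mirror provider is one sector smaller than the raw disk.
    mirror_sectors = disk_sectors - 1
    return {
        "gmirror_metadata": last_lba(disk_sectors),
        # GPT written via the mirror puts its backup header in the last
        # sector of the mirror provider.
        "backup_gpt_written": last_lba(mirror_sectors),
        # Anything reading the raw disk expects the backup header in the
        # last sector of the disk itself.
        "backup_gpt_expected": last_lba(disk_sectors),
    }


layout = whole_disk_mirror_layout(1000)
for name, lba in layout.items():
    print(name, lba)
```

The written and expected locations differ by one sector, and the expected location is occupied by gmirror's own metadata, so complaints about the backup GPT header on the raw disks would persist for as long as that layout is kept.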
Re: 9.1-stable: ATI IXP600 AHCI: CAM timeout
0:0:0): CAM status: Command timeout > (ada0:ahcich0:0:0:0): Retrying command > .. > .. It's worth pointing out that all of the events you provided are writes. In my experience, historically, that has usually been the case. If a drive firmware screws around when handling an NCQ write, taking too long to do something (think firmware bug), this can happen. If that's the case, the fact it happens on 2 disks of the same type thus wouldn't surprise me. I've mentioned in the past that I know of a few situations where this can happen, particularly with 4KByte sector drives, depending on how the user set up the system. In this case, the Samsung HD403LJ is supposedly a 512-byte sector drive, but the drive probably complies with an older ATA specification and thus only provides the logical sector size in ATA IDENTIFY output, thus the system must assume physical=logical (camcontrol and smartmontools will both say something to the effect of "512 bytes logical/physical"). I would appreciate the following: 1. smartctl -x {ada0,ada1} output using a recent version of smartmontools (6.1 if possible please), 2. camcontrol identify {ada0,ada1} -v output (note the -v), 3. If you are running smartd(8) or not, 4. pciconf -lvbc output. Anecdotal story: A lot of people forget the infamous nVidia nForce 4 vs. Maxtor NCQ issue that circulated "PC enthusiast" sites during the mid-2000s. Neither company wanted to own up to the problem, blaming each other instead. There was never any official statement made as to where the problem was, only that nVidia updated their nForce 4 controller drivers with some sort of workaround (details were not disclosed), and Maxtor also quietly added a document to their website stating that you could get a firmware from Technical Support that would address the problem as well. I had a combination of the two at the time, which is why I remember it. Still to this day nobody knows who was really responsible. 
I won't get into the political/societal aspects of why vendors always blame one another rather than solve real problems.

There is currently no way (at run time or via loader.conf) to disable NCQ within the AHCI driver. It is possible to add an entry to the AHCI quirks table for your controller that sets AHCI_Q_NONCQ, if you want to try that. I can give you a patch for that, but I need to see the output for the four items requested above first -- it may not be necessary to try, depending on the results.

I have probably left out key information in this mail, which is an indicator of how tired I have grown of seeing it come up. :-(

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: System doesn't dump
On Wed, May 29, 2013 at 08:41:38AM +0200, Dominic Fandrey wrote: > I have a number of actions that reliably panic the system, such as > performing shutdown -p (yes I'm booting into an inconsistent file > system every time). Both with my notebook and my workstation. > > However I cannot get the system to dump. > > dumpdir=/var/crash > and I've tried ada0s2b, /dev/ada0s2b, label/5swap, /dev/label/5swap and AUTO > for dumpdev to no avail. > > The swap partition is 16g, the machines have 8g RAM and there's plenty > of hard disk space available for /var/crash. > > I'm looking for that secret, undocumented trigger, that makes the > system dump if a panic occurs. Once upon a time dumping just worked > if the swap partition was large enough. I miss those olden days. Foremost: the fact you did not disclose your FreeBSD version (and SVN rev if you have it) nor architecture is disappointing. It matters more than you think. Please disclose it. Onward ho... If you have VGA console access, try dropping to db> and issuing the command "call doadump" (possibly preceded by "panic"). If you have serial console access, there are ways to drop to ddb but it depends on your kernel config (look for BREAK_TO_DEBUGGER and ALT_BREAK_TO_DEBUGGER in /sys/conf/NOTES). "Break" with serial, by the way, means a serial-level break signal (often why I prefer ALT_BREAK_TO_DEBUGGER). After doing "call doadump" you should definitely see the kernel dumping memory to swap (it gives a progress indicator of sorts). Google for the phrase "call doadump" and look at some of the results to get an idea of what the output normally is during that phase, for comparison. If you don't see such, I'm sure many of the kernel folks here can help figure out why. See sysctl debug.ddb.scripting.scripts for what should get automatically done on a panic. 
This may or may not be affected by ddb_enable="yes" in rc.conf (which requires DDB to be enabled in your kernel) -- I can't remember though, so someone else may want to comment.

If your issue is that the kernel actually *does* dump memory to swap but that on boot-up savecore(8) doesn't recover the memory dump and populate the relevant files in /var/crash: that's a separate issue that has been discussed for probably 10 years or longer with (to my knowledge) no definitive explanation. Theories presented (going off of memory here) were that something ended up writing over parts of the "panic metadata" on the swap disk/slice/etc. and thus savecore(8) finds nothing. This is why rc scripts/etc. have to make sure to look for the swap "panic metadata" and run savecore(8) **before** issuing dumpon(8).

My opinion, others' may vary: Stick with using dumpdev="auto" in rc.conf, assuming you have a /etc/fstab entry of "swap" somewhere. Swap should ideally be a partition or slice, not something abstracted out by other layers. See the paragraph above for why I advocate that; my additional opinion is that when it comes to getting a kernel dump and system configurations, the KISS principle applies heavily. If your system is crashing, the last thing you want to deal with is why you can't get a kernel dump -- you could spend more time doing that than you do getting the panic info + debugging the actual crash. But again, this is my own opinion and there are other legitimate opinions as well -- I just follow what I do because I know it works.

Likewise I always get wary of people's setups when I start seeing labels mentioned. *waves cane* Screw all this newfangled stuff. :-)

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977.
PGP 4BD6C0CB |
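The "dumpdev=auto plus a swap entry in /etc/fstab" advice can be checked mechanically. Below is a rough Python sketch with hypothetical helper names; it only inspects rc.conf-style and fstab-style text handed to it, and it does not check the kernel configuration, the size of swap, or whether savecore(8) will actually find a dump.

```python
import shlex


def rc_conf_value(rc_conf_text, name):
    """Return the last value assigned to `name` in rc.conf-style text, or None."""
    value = None
    for line in rc_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        if line.startswith(name + "="):
            parts = shlex.split(line.split("=", 1)[1])
            value = parts[0] if parts else ""
    return value


def fstab_swap_devices(fstab_text):
    """Return the devices whose FStype column is 'swap'."""
    devices = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[2] == "swap":
            devices.append(fields[0])
    return devices


def dump_setup_looks_sane(rc_conf_text, fstab_text):
    """True if dumpdev is set to auto and fstab lists at least one swap device."""
    dumpdev = rc_conf_value(rc_conf_text, "dumpdev")
    has_swap = bool(fstab_swap_devices(fstab_text))
    return dumpdev is not None and dumpdev.lower() == "auto" and has_swap


# Sample input; the device name is taken from the message quoted above.
rc_conf = 'dumpdev="AUTO"\ndumpdir="/var/crash"\n'
fstab = "# Device Mountpoint FStype Options Dump Pass#\n/dev/ada0s2b none swap sw 0 0\n"
print(fstab_swap_devices(fstab), dump_setup_looks_sane(rc_conf, fstab))
```

A passing check only says the two files agree with the advice; it is not evidence that a dump will be written or recovered.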
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Tue, May 28, 2013 at 10:57:22AM +0300, Daniel Braniss wrote: > > [...] > > 1. r248226 in head was MFC'd to stable/9 as r248858. Validation: > > > > http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log > > > > So the answer: whether or not you have that MFC in stable/9 depends on > > what SVN rev your kernel is. > > I do a svnsync then I convert to mercurial so from the svn logs I see that > the highest rev number is 250960. > > [...] > > > > That "piggybacking" crap never should have been invented. All it has > > done is cause problems for every OS I know of (including Windows) since > > its inception, and is also exactly why today almost all vendors I've > > seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface. > > It's admission the "piggybacking" method doesn't work. And may it rot > > in hell for all I care, while simultaneously feeling very sorry for > > those who have to suffer/deal with it. > > > > This is just another reason why I've always been very picky about what > > hardware I'd buy for server deployments. Vendors never actually > > disclose this crap until you've shelled out money for the hardware, by > > which point it's too late and you're suffering. Really great model -- > > for the pocketbook. :/ > > > > I couldn't agree more! > > [...] > > in the case of the SunFire X2200, it has 4 bge ports, the > 2nd, bge1, is only used by the ilo, it's not enabled (UP'ed), > it doesn't have an interrupt assigned, it's, as far as I can tell, > just anoying to have the DOWN/UP messages - unless something more sinester > is lurking. Does output from "ps -auxH | grep kernel/bge" show anything for bge1? What about "vmstat -i -a" (you might be surprised about the -a flag and what shows up compared to just using -i). Gut feeling says it will show up there. 
(See vmstat(8) for what -a does) Possibly interrupt generation isn't what's "triggering" the bge(4) device to see link going up/down; maybe this is done via some memory mapped I/O, which would explain why "vmstat -i" shows nothing for bge1 (no interrupts ever generated). That doesn't change the fact that the driver still is being told via some means that link is going up/down. Just a general FYI (probably not relevant here too much, but I often have to point it out for younger SAs (not saying anyone here is one, but the list is archived...)): there is a very distinct difference between a link being physically up/down vs. administratively up/down. With *IX ifconfig, the social assumption is that there's a 1:1 correlation between those (especially with Ethernet devices), when in reality it depends on the device driver and all subsystems in between. I remember quite clearly on some OSes (can't remember if BSD or Linux or Solaris) where "ifconfig xxx down" on certain devices would still result in packets being passed across xxx. This used to shock me when I was younger, but nowadays doesn't because I have a better understanding of why. ifconfig is just a generic tool that interfaces with a lot of things and tries to do too much, in my opinion. On BSD we tend to cram as much crap into ifconfig as humanly possible, while on other OSes separate per-device tools/utilities have been developed to segregate the intended behaviours/desires. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
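For the archives, the physical vs. administrative distinction described above can be sketched in a few lines of sh. This is only an illustration, not part of bge(4) or ifconfig(8): the helper name link_states is made up, and it assumes FreeBSD-style ifconfig output where the hex flags= value carries IFF_UP in bit 0x1 and a separate "status:" line reports the media (physical) state.

```shell
# Hypothetical helper: read ifconfig-style output on stdin and print the
# administrative state (IFF_UP bit) next to the reported link status.
link_states() {
    flags=""
    status="unknown"
    while IFS= read -r line; do
        case "$line" in
            *flags=*)
                flags=${line#*flags=}           # drop everything up to "flags="
                flags=${flags%%[!0-9a-fA-F]*}   # keep only the leading hex digits
                ;;
            *status:*)
                status=${line#*status: }        # text after "status: "
                ;;
        esac
    done
    # Assumption: bit 0x1 of the flags word is IFF_UP.
    if [ $(( 0x${flags:-0} & 1 )) -ne 0 ]; then
        admin=up
    else
        admin=down
    fi
    echo "admin=$admin link=$status"
}
```

Something like "ifconfig bge1 | link_states" would be the intended use. Fed the "ifconfig bge1" output quoted elsewhere in this thread (flags=8802, status: active), it should report an interface that is administratively down while the PHY still has link.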
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Mon, May 27, 2013 at 11:49:31PM -0700, Jeremy Chadwick wrote:
> Other question: is there any correlation between the amount of time that
> goes by between events with, say, ARP/MAC address expiry in "arp -a"? I
> mention this because I know some of the ASF methods have historically
> shown two MAC addresses on the same physif, and I can see how this might
> confuse some stacks.

Never mind -- I thought about this more, and it's irrelevant.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote: > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote: > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote: > > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire > > > > > X2200, > > > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output. > > > > > > > > > > bge0: > > 0x009003> mem > > > 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6 > > > bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > miibus2: on bge0 > > > brgphy0: PHY 1 on miibus2 > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd > > > bge1: > > 0x009003> mem > > > 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6 > > > bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz > > > miibus3: on bge1 > > > brgphy1: PHY 1 on miibus3 > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > > > bge1: Ethernet address: 00:1b:24:5d:5b:be > > > > > > sf-10> ifconfig bge1 > > > bge1: flags=8802 metric 0 mtu 1500 > > > > > > options=8009b > > TE> > > > ether 00:1b:24:5d:5b:be > > > nd6 options=21 > > > media: Ethernet autoselect (100baseTX ) > > > status: active > > > > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events. > > Do you have some network script run by cron? > > no scripts. > this port is shared with the ILO/IPMI, and back in March you fixed a problem > that it was hanging soon after it was initialized by the driver, > (r248226 - but I'm not sure if it was ever MFC'ed). > Initialy I thought it could be caused by connections to it from other > hosts (either via the web, or ssh) so I killed them, but it didn't help. 
> without that patch the connection fails, and I don't see any DOWN/UP. Two things: 1. r248226 in head was MFC'd to stable/9 as r248858. Validation: http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log So the answer: whether or not you have that MFC in stable/9 depends on what SVN rev your kernel is. 2. Is there some way to verify that the ASF/iLO/IPMI bits (i.e. the IPMI firmware itself) are not shutting down bge1's PHY intentionally? Unless the IPMI module chooses to log something useful (e.g. "I'm doing this"), I'm not sure how you'd figure that out. Other question: is there any correlation between the amount of time that goes by between events with, say, ARP/MAC address expiry in "arp -a"? I mention this because I know some of the ASF methods have historically shown two MAC addresses on the same physif, and I can see how this might confuse some stacks. That "piggybacking" crap never should have been invented. All it has done is cause problems for every OS I know of (including Windows) since its inception, and is also exactly why today almost all vendors I've seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface. It's admission the "piggybacking" method doesn't work. And may it rot in hell for all I care, while simultaneously feeling very sorry for those who have to suffer/deal with it. This is just another reason why I've always been very picky about what hardware I'd buy for server deployments. Vendors never actually disclose this crap until you've shelled out money for the hardware, by which point it's too late and you're suffering. Really great model -- for the pocketbook. :/ -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
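The "depends on what SVN rev your kernel is" check boils down to an integer comparison, because SVN revision numbers are repository-wide. A minimal sketch, assuming both revisions refer to the same branch (stable/9 here); has_commit is a made-up helper name, not an existing tool.

```shell
# Hypothetical helper: succeed if a tree at revision $1 already contains
# the commit with revision $2 (same branch assumed). A leading "r" is allowed.
has_commit() {
    have=${1#r}
    want=${2#r}
    [ "$have" -ge "$want" ]
}
```

For example, "has_commit 250960 r248858 && echo present" compares the highest revision mentioned in this thread against the MFC. Note that this only says the source tree is new enough; the running kernel still has to have been built from that tree.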
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Fri, May 24, 2013 at 02:47:20PM +0900, YongHyeon PYUN wrote: > On Thu, May 23, 2013 at 09:49:19PM -0700, Jeremy Chadwick wrote: > > On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote: > > > On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote: > > > > On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote: > > > > > If someone wants me to test DHCP via fxp(4) on the above system (I can > > > > > do so with both NICs), just let me know; it should only take me half > > > > > an > > > > > hour or so. > > > > > > > > > > I'll politely wait for someone to say "please do so" else won't > > > > > bother. > > > > > > > > > > > > > For the sake of completeness... > > > > > > > > "Please do so." :) > > > > > > Issue reproduced 100% reliably, even within sysinstall. > > > > > > {snip} > > > > Forgot to add: > > > > This issue ONLY happens when using DHCP. > > > > Statically assigning the IP address works fine; fxp0 goes down once, > > up once, then stays up indefinitely. > > I asked Mike to try backing out dhclient(8) change(r247336) but it > seems he missed that. Jeremy, could you try that? > > I guess dhclient(8) does not like flow-control negotiation of > fxp(4) after link establishment. I can't test anything without an ISO -- the system in question is truly "bare-bones" (no hard disk, can't boot USB memsticks, etc.). I'm not a good test subject for changes on this one, I'm sorry to say. :-( If there's some way to disable flow-control negotiation in fxp(4) or miibus(4) via loader, I can try that, but I don't know what the MIB name would be. If r247336 turns out to be the cause: ironic, as r247336 references PR 166656, which was tested against -- wait for it -- xl(4). People in *this* thread are saying "screw legacy hardware" yet the PR is for something as old as the 3C905B? Maybe I should bow out of this thread before I have an aneurysm. 
-- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Fri, May 24, 2013 at 01:24:24AM -0400, Glen Barber wrote:
> Speaking entirely on behalf of myself now...
>
> On Thu, May 23, 2013 at 10:11:39PM -0700, Jeremy Chadwick wrote:
> > > I think this will likely be included in errata notes for the release.
> >
> > I urge you to meet with others in Release Engineering and discuss this
> > fully. This is major enough that, once fixed, it warrants an immediate
> > binary update (to the kernel + if_fxp.ko) pushed out via freebsd-update.
> >
>
> It can be solved with a -pN update after 8.4-RELEASE is out.

Not that I'm calling the shots or anything, but:

Let's go with that, combined with an included mention in the Errata section of the Release Notes as you initially mentioned.

Sorry I can't be of more help; Charles' environment sounds like it would be better-suited for testing, and I'm sure Michael can test out a patch if/when someone gets around to poking at things.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Fri, May 24, 2013 at 12:56:20AM -0400, Glen Barber wrote: > On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote: > > [...] > > So if someone wants to take a stab at this, they'll need to do so and > > make me an ISO. Sorry that I can't make things easier. :-( > > > > This definitely needs to get fixed before 8.4-RELEASE. > > > > *sigh* > > At this point, it is highly unlikely this will be fixed before > 8.4-RELEASE. We are _far_ too deep into the release cycle. In fact, we > are effectively done with the release, and waiting on release notes to > be completed. > > I think this will likely be included in errata notes for the release. I urge you to meet with others in Release Engineering and discuss this fully. This is major enough that, once fixed, it warrants an immediate binary update (to the kernel + if_fxp.ko) pushed out via freebsd-update. fxp(4) is a commonly-used driver; it isn't something rare/uncommon. Also remember at this stage we don't know if it's a specific PHY model or specific NIC model (or series) which triggers it. For all we know it could affect everything that fxp(4) drives. Please don't forget that FreeBSD has a very well-established history of having rock-solid Intel NIC support. Sure, mistakes happen, we're human, bugs get introduced, but this does not bode well -- meaning I would expect Slashdot et al to pick up on this. > It is very unfortunate that this waited so long to be reported, as much > time has passed since 8.4-BETA1... This is what happens when people socially proliferate the belief that "RELEASE is rock solid/stable, don't run stable/X" -- the number of people who test what changes between RELEASE builds is vastly smaller comparatively. I've only been saying this for the past 15 years, so it's even more unfortunate that people keep believing it. :/ -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. 
PGP 4BD6C0CB |
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Thu, May 23, 2013 at 09:40:35PM -0700, Jeremy Chadwick wrote:
> On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
> > On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
> > > If someone wants me to test DHCP via fxp(4) on the above system (I can
> > > do so with both NICs), just let me know; it should only take me half an
> > > hour or so.
> > >
> > > I'll politely wait for someone to say "please do so" else won't bother.
> > >
> >
> > For the sake of completeness...
> >
> > "Please do so." :)
>
> Issue reproduced 100% reliably, even within sysinstall.
>
> {snip}

Forgot to add:

This issue ONLY happens when using DHCP.

Statically assigning the IP address works fine; fxp0 goes down once, up once, then stays up indefinitely.

I also tested network I/O in the statically-assigned scenario. Pinging the box from another machine on the LAN:

$ ping 192.168.1.192
PING 192.168.1.192 (192.168.1.192): 56 data bytes
64 bytes from 192.168.1.192: icmp_seq=0 ttl=64 time=0.180 ms
64 bytes from 192.168.1.192: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 192.168.1.192: icmp_seq=2 ttl=64 time=0.214 ms
64 bytes from 192.168.1.192: icmp_seq=3 ttl=64 time=0.165 ms
64 bytes from 192.168.1.192: icmp_seq=4 ttl=64 time=0.114 ms
^C
--- 192.168.1.192 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.114/0.162/0.214/0.034 ms

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
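The round-trip summary in the ping output above can be cross-checked from the per-packet times. A rough sketch in sh/awk, assuming each reply line carries "time=<ms>" as shown and that the reported stddev is the population standard deviation (an assumption about ping(8), not something stated in this thread); rtt_stats is a hypothetical helper name.

```shell
# Hypothetical helper: read ping output on stdin and print
# min/avg/max/stddev of the time= values, three decimals each.
rtt_stats() {
    awk '
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" and force a numeric value.
            t = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            sum += t
            sumsq += t * t
            n++
        }
        END {
            if (n == 0) exit 1
            avg = sum / n
            var = sumsq / n - avg * avg   # population variance
            if (var < 0) var = 0          # guard against rounding noise
            printf "%.3f/%.3f/%.3f/%.3f\n", min, avg, max, sqrt(var)
        }'
}
```

Piping a saved copy of the ping session through rtt_stats should give a line in the same min/avg/max/stddev order, which makes it easy to compare a static-IP run against a DHCP run.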
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Thu, May 23, 2013 at 11:42:44PM -0400, Glen Barber wrote:
> On Thu, May 23, 2013 at 08:38:06PM -0700, Jeremy Chadwick wrote:
> > If someone wants me to test DHCP via fxp(4) on the above system (I can
> > do so with both NICs), just let me know; it should only take me half an
> > hour or so.
> >
> > I'll politely wait for someone to say "please do so" else won't bother.
> >
>
> For the sake of completeness...
>
> "Please do so." :)

Issue reproduced 100% reliably, even within sysinstall.

ISO image used:

ftp://ftp4.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/8.4/FreeBSD-8.4-RC3-i386-disc1.iso

I just chose to Configure the system, selected Networking, chose NO to the IPv6 configuration choice, and YES to the DHCP configuration choice, then hit Alt-F2 to watch relevant output. This was the result:

http://imgbin.org/index.php?page=image&id=13718

...with the fxp0 physif up/down messages continuing indefinitely.

fxp0 on the system is the Intel 82559. Shot of console's dmesg:

http://imgbin.org/index.php?page=image&id=13720

Nothing is connected to fxp1.

Key points for those asking me to help debug:

- I only have VGA console on this box
- I do not have an IDE hard disk of any sort for temporary OS installation, setup, kernel testing, etc.
- The system cannot boot USB media of any sort, so memsticks are out
- The ATAPI drive is CD-only; there is no DVD support, so there's no easy way to get a "real" shell with full utilities (i.e. "Fixit")

So if someone wants to take a stab at this, they'll need to do so and make me an ISO. Sorry that I can't make things easier. :-(

This definitely needs to get fixed before 8.4-RELEASE.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
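To put a number on "continuing indefinitely", the up/down messages can be counted from saved console or syslog text. A small sketch, assuming the kernel logs link changes as "<ifname>: link state changed to UP" or "... to DOWN"; count_link_flaps is a hypothetical helper name, not an existing utility.

```shell
# Hypothetical helper: count link UP/DOWN transitions for one interface
# in dmesg- or syslog-style text read from stdin.
count_link_flaps() {
    awk -v ifn="$1" '
        index($0, ifn ": link state changed to ") {
            if ($0 ~ /UP$/) up++
            else if ($0 ~ /DOWN$/) down++
        }
        END { printf "%s up=%d down=%d\n", ifn, up, down }'
}
```

Intended use would be along the lines of "dmesg | count_link_flaps fxp0", run once with DHCP and once with a static address. One down and one up in the static case matches what was described above; counts that keep climbing point at the flapping.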
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Thu, May 23, 2013 at 11:13:03PM -0400, Glen Barber wrote: > On Thu, May 23, 2013 at 08:03:51PM -0700, Jeremy Chadwick wrote: > > On Thu, May 23, 2013 at 09:21:17PM -0400, Glen Barber wrote: > > > On Thu, May 23, 2013 at 06:09:43PM -0700, Jeremy Chadwick wrote: > > > > On Thu, May 23, 2013 at 08:18:33PM -0400, Michael L. Squires wrote: > > > > > I've just tested 8.4-RC3 using a different Supermicro 1U box with a > > > > > fresh > > > > > installation of 8.4-RC3. I had problems with the installation, > > > > > wouldn't > > > > > boot until I used a Windows 98 FDISK to write a master boot record > > > > > (no idea why; this system uses an Adaptec SATA 1.5 6-channel PCI-X > > > > > board with two > > > > > drives in RAID 1). > > > > > > > > > > Using the em0 interface there are no problems with DHCP; when I > > > > > switch to the fxp0 interface the interface starts going up/down in > > > > > the same manner as reported. > > > > > > > > > > The problem appears associated with "world", not with the kernel > > > > > (running > > > > > the 8.4 kernel with the 8.3 world does not have this problem). > > > > > > > > > > This motherboard is an X5DPL-iGM with 2 Xeon 2.8GHz CPUs and 4 GB of > > > > > RAM. > > > > > The other unit (an earlier board) has a Serverworks chipset with a > > > > > single > > > > > Xeon CPU but also with a 100Mbit Intel Pro100 Ethernet port and a > > > > > 1000Mbit > > > > > Intel Pro1000 Ethernet port. > > > > > > > > > > This unit isn't doing anything useful, so testing isn't a problem. > > > > > > > > Mike, Yong-Hyeon asked you a very important question which you didn't > > > > answer: > > > > > > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073458.html > > > > > > > > If you assign a static IP address, does fxp0 behave properly? > > > > > > > > I'm also re-adding Yong-Hyeon to the CC list here. > > > > > > > > > > At this point, I am not convinced we have a problem with what will turn > > > out to be 8.4-RELEASE. 
> > > > > > There have been several attempts to ensure the upgraded version is > > > actually 8.4-RC3 (and again, 'uname -a' is not provided in this > > > email...). > > > > > > I find it very hard to believe that we have exactly one fxp(4) user > > > upgrading to 8.4-*. > > > > > > I'd really like to make sure that this is not an issue that will affect > > > an uncountable number of users, but truthfully, at this point have to > > > consider it a local configuration problem. > > > > I have numerous Supermicro 1U boxes sitting in my garage from closing > > down my hosting organisation back in August 2012. I am certain one or > > two of them have Intel NICs that use fxp(4) -- the problem is that I > > don't know what exact NIC and PHY model they use. > > > > >From what I can tell, there are at least two systems Mike has which > > experience this anomaly. One of those systems' dmesg: > > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html > > > > The relevant lines start at "fxp0: > the way down to "pci0:0:8:0: bad VPD cksum, remain 14". I'm not sure if > > the bad VPD checksum message is relevant to the fxp0 device or not. > > > > The 2nd system is mentioned above/in this post: > > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073530.html > > > > But there's no verbose dmesg etc. for the 2nd system so I don't know if > > it has the same NIC/PHY. > > > > My understanding from the start of this thread is that "both" machines > are actually the same machine, but with different combinations of > userland/kernel. (No, not arguing anything - only one person can answer > if my understanding is correct or not.) > > The model of NIC and PHY matters greatly; most users don't seem to > > realise how important this is, they think in terms of "Intel vs. > > Broadcom vs. Realtek". > > > > Output from "pciconf -lvbc", specifically the lines relevant to the fxp0 > > device, from both systems, would be highly beneficial. > > > >
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Thu, May 23, 2013 at 09:21:17PM -0400, Glen Barber wrote: > On Thu, May 23, 2013 at 06:09:43PM -0700, Jeremy Chadwick wrote: > > On Thu, May 23, 2013 at 08:18:33PM -0400, Michael L. Squires wrote: > > > I've just tested 8.4-RC3 using a different Supermicro 1U box with a fresh > > > installation of 8.4-RC3. I had problems with the installation, wouldn't > > > boot until I used a Windows 98 FDISK to write a master boot record > > > (no idea why; this system uses an Adaptec SATA 1.5 6-channel PCI-X > > > board with two > > > drives in RAID 1). > > > > > > Using the em0 interface there are no problems with DHCP; when I > > > switch to the fxp0 interface the interface starts going up/down in > > > the same manner as reported. > > > > > > The problem appears associated with "world", not with the kernel (running > > > the 8.4 kernel with the 8.3 world does not have this problem). > > > > > > This motherboard is an X5DPL-iGM with 2 Xeon 2.8GHz CPUs and 4 GB of RAM. > > > The other unit (an earlier board) has a Serverworks chipset with a single > > > Xeon CPU but also with a 100Mbit Intel Pro100 Ethernet port and a 1000Mbit > > > Intel Pro1000 Ethernet port. > > > > > > This unit isn't doing anything useful, so testing isn't a problem. > > > > Mike, Yong-Hyeon asked you a very important question which you didn't > > answer: > > > > http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073458.html > > > > If you assign a static IP address, does fxp0 behave properly? > > > > I'm also re-adding Yong-Hyeon to the CC list here. > > > > At this point, I am not convinced we have a problem with what will turn > out to be 8.4-RELEASE. > > There have been several attempts to ensure the upgraded version is > actually 8.4-RC3 (and again, 'uname -a' is not provided in this > email...). > > I find it very hard to believe that we have exactly one fxp(4) user > upgrading to 8.4-*. 
>
> I'd really like to make sure that this is not an issue that will affect
> an uncountable number of users, but truthfully, at this point have to
> consider it a local configuration problem.

I have numerous Supermicro 1U boxes sitting in my garage from closing down my hosting organisation back in August 2012. I am certain one or two of them have Intel NICs that use fxp(4) -- the problem is that I don't know what exact NIC and PHY model they use.

From what I can tell, there are at least two systems Mike has which experience this anomaly. One of those systems' dmesg:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073440.html

The relevant lines start at "fxp0: [...]" all the way down to "pci0:0:8:0: bad VPD cksum, remain 14". I'm not sure if the bad VPD checksum message is relevant to the fxp0 device or not.

The 2nd system is mentioned above/in this post:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073530.html

But there's no verbose dmesg etc. for the 2nd system so I don't know if it has the same NIC/PHY.

The model of NIC and PHY matters greatly; most users don't seem to realise how important this is, they think in terms of "Intel vs. Broadcom vs. Realtek".

Output from "pciconf -lvbc", specifically the lines relevant to the fxp0 device, from both systems, would be highly beneficial.

In the meantime, I'll head down to my garage to see if I can find those fxp(4) boxes and see if they're 85551s (I sure hope I haven't pulled the CPUs/RAM from them). If I find a match, I can try to reproduce this.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
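Pulling "the lines relevant to the fxp0 device" out of pciconf output can be done mechanically. A minimal sketch, assuming the usual layout where each device starts with an unindented "name@pciX:..." header line followed by indented detail lines; pci_stanza is a made-up helper name.

```shell
# Hypothetical helper: print only the stanza for one device from
# "pciconf -lv"-style text on stdin.
pci_stanza() {
    awk -v dev="$1" '
        # An unindented line starts a new device; remember whether it is ours.
        /^[^ \t]/ { show = (index($0, dev "@") == 1) }
        show'
}
```

Usage would be something like "pciconf -lvbc | pci_stanza fxp0", which keeps the reply short enough to paste into a mail while still including the chip/card IDs needed to tell NIC models apart.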
Re: Apparent fxp regression in FreeBSD 8.4-RC3
On Thu, May 23, 2013 at 08:18:33PM -0400, Michael L. Squires wrote:
> I've just tested 8.4-RC3 using a different Supermicro 1U box with a fresh
> installation of 8.4-RC3. I had problems with the installation, wouldn't
> boot until I used a Windows 98 FDISK to write a master boot record
> (no idea why; this system uses an Adaptec SATA 1.5 6-channel PCI-X
> board with two drives in RAID 1).
>
> Using the em0 interface there are no problems with DHCP; when I
> switch to the fxp0 interface the interface starts going up/down in
> the same manner as reported.
>
> The problem appears associated with "world", not with the kernel (running
> the 8.4 kernel with the 8.3 world does not have this problem).
>
> This motherboard is an X5DPL-iGM with 2 Xeon 2.8GHz CPUs and 4 GB of RAM.
> The other unit (an earlier board) has a Serverworks chipset with a single
> Xeon CPU but also with a 100Mbit Intel Pro100 Ethernet port and a 1000Mbit
> Intel Pro1000 Ethernet port.
>
> This unit isn't doing anything useful, so testing isn't a problem.

Mike, Yong-Hyeon asked you a very important question which you didn't answer:

http://lists.freebsd.org/pipermail/freebsd-stable/2013-May/073458.html

If you assign a static IP address, does fxp0 behave properly?

I'm also re-adding Yong-Hyeon to the CC list here.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
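The static-vs-DHCP comparison asked for above amounts to a one-line change in /etc/rc.conf. A sketch only: the address, netmask and router below are placeholder values for a typical 192.168.1.0/24 LAN and would need adjusting for the machine being tested.

```shell
# /etc/rc.conf fragment (placeholder values; adjust for the local LAN)

# DHCP case, the one reported to make fxp0 flap:
#ifconfig_fxp0="DHCP"

# Static case, for comparison:
ifconfig_fxp0="inet 192.168.1.192 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
```

After switching between the two, a reboot (or restarting the interface via the rc.d netif script) and a look at the console for link state messages should show whether the flapping follows DHCP.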
Re: Swap Warning Message?
On Thu, May 23, 2013 at 07:55:20AM -0500, Michael Gass wrote:
> Updated 9.1 to 9 stable on an old PII with 256 MB of memory.
> (FreeBSD runs fine on this machine). After updating have
> been getting the following warning on startup:
>
> warning: total configured swap (524288 pages) exceeds maximum recommended
> amount (497056 pages).
> warning: increase kern.maxswzone or reduce amount of swap space.
>
> I allocated 2.0 GB of swap when I installed. This was not a problem
> in the past.
>
> Should I ignore this warning or do I need to do something?

Taken from my /boot/loader.conf:

# Set kern.maxswzone to 0 to squelch "total configured swap exceeds
# maximum recommended amount" warning, even with maxpages/2 fix.
# http://lists.freebsd.org/pipermail/freebsd-stable/2012-August/thread.html#69301
#
kern.maxswzone="0"

Given the small amount of memory on your system, I would suggest using the above /boot/loader.conf setting, since your system is quite likely to make use of lots of swap; decreasing swap space in your case seems downright silly.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
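For reference, the page counts in the warning translate to sizes as follows. A quick sketch, assuming 4096-byte pages (the normal page size on i386); pages_to_mib is a made-up helper name.

```shell
# Hypothetical helper: convert a page count to whole MiB.
# $1 = number of pages, $2 = page size in bytes (default 4096, an assumption).
pages_to_mib() {
    pagesize=${2:-4096}
    echo $(( $1 * ($pagesize / 1024) / 1024 ))
}

pages_to_mib 524288   # configured swap    -> 2048
pages_to_mib 497056   # recommended limit  -> 1941
```

So the 2.0 GB of configured swap is only about 107 MiB over the recommended maximum, which is why squelching the warning is a reasonable choice here rather than repartitioning.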
Re: OpenSSH in -STABLE
On Tue, May 21, 2013 at 08:11:09PM -0700, Jeremy Chadwick wrote:
> ... 6.2p2 was imported to head/CURRENT on May 22nd ...

Typo on my part: this should have read May 17th, as is obvious from svnweb.

-- 
| Jeremy Chadwick j...@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: OpenSSH in -STABLE
On Tue, May 21, 2013 at 11:02:27PM -0400, usa...@hushmail.com wrote: > On Tue, 21 May 2013 22:20:08 -0400 "David Wolfskill" > wrote: > >On Tue, May 21, 2013 at 09:42:39PM -0400, usa...@hushmail.com > >wrote: > >> Hi. Are there any plans to get OpenSSH 6.2 in 9-STABLE? I'd like > >to > >> check out the new AES-GCM stuff without going to -CURRENT on > >this > >> system. If there are no plans, is there a possibility? Thanks > >> > > > >Please refer to ports/security/openssh-portable; its Makefile says > >it's > >6.2p2,1, last updated about 5 days ago. > > > > Thanks, but that wasn't what I asked about. I'm aware of the > version in ports. Try freebsd-secur...@freebsd.org, I am certain you will get an answer there. Fact: OpenSSH 6.2p1 was imported to head/CURRENT on March 22nd, and 6.2p2 was imported to head/CURRENT on May 22nd: http://svnweb.freebsd.org/base/head/crypto/openssh/ChangeLog OpenSSH is such an important/key piece of software that, much like OpenSSL, it is one that does not warrant haste when it comes to getting MFC'd. If you want something more recent on non-CURRENT, you will usually be told to run the version from ports. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Unexpected reboot/crash on 8.2-RELEASE.
On Sat, May 18, 2013 at 09:45:21PM -0400, kpn...@pobox.com wrote: > I had an unexpected reboot of my Dell R610 today around 2:05-06pm today. > I do not know if it crashed or if it was power cycled. > > This machine is running: > FreeBSD gunsight1.neutralgood.org 8.2-RELEASE FreeBSD 8.2-RELEASE #1: Thu Dec > 8 21:58:59 UTC 2011 root@:/usr/obj/usr/src/sys/GENERIC amd64 > > It's a stock 8.2-RELEASE kernel except I had to tweak it near the top of > vfs_mountroot() to delay before attempting to mount the root filesystem. > (Without my tweak it attempts to mount root before the USB drive is finished > getting attached.) > > The dmesg shows this at the reboot: > mfi0: 24272 (422106527s/0x0020/info) - Patrol Read complete > mfi0: 24273 (422172000s/0x0020/info) - Patrol Read started > mfi0: 24318 (422192750s/0x0020/info) - Patrol Read complete > mfi0: 24319 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID > 0060/1000/1f0c/1028) > mfi0: 24320 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952 > mfi0: 24321 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID > 0060/1000/1f0c/1028) > mfi0: 24322 (boot + 3s/0x0020/info) - Firmware version 1.22.12-0952 > > Does this mean the machine did not lose power? I ask because my datacenter > had some sort of power incident and I'm not sure if the server lost power > or not. But if the kernel message buffer from before the incident is still > present then the machine never lost power, correct? The datacenter's power > incident I'm told happened somewhere around the time of the reboot so I > have to ask. > > It looks like I didn't have dumps enabled. That's ... not helpful. > > The machine has been stable for: > 2:05PM up 472 days, 21 mins, 7 users, load averages: 0.01, 0.02, 0.00 > > http://www.neutralgood.org/~kpn/dmesg.boot > > Here's various stats I usually keep displayed. 
This is the last from > before the reboot: > http://www.neutralgood.org/~kpn/status.txt Your system did not reboot nor did it crash. If it did, your uptime would not be showing 472 days.. Really, it's that simple. > I've got all the power savings features turned off in the BIOS and, like > I said, the machine has been stable for all this time. However, one thing > to note from a couple of days ago: > > May 14 00:49:13 gunsight1 -- MARK -- > May 14 01:00:45 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 35 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 65 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 95 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 125 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 155 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 185 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 215 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 245 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 275 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 305 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 335 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 365 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 395 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 425 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 455 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 485 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 
TIMEOUT > AFTER 515 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 545 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 575 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 605 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 635 SECONDS > May 14 01:11:36 gunsight1 kernel: mfi0: COMMAND 0xff80009d1310 TIMEOUT > AFTER 665 SECONDS > May 14 01:19:36 gunsight1 -- MARK -- > May 14 01:39:36 gunsight1 -- MARK -- > May 14 01:59:37 gunsight1 -- MARK -- > May 14 02:10:55 gunsight1 kernel: mfi0: 24089 (421826400s/0
Re: still mbuf leak in 9.0 / 9.1?
On Sat, May 18, 2013 at 12:14:28PM +0200, Ronald Klop wrote: > On Fri, 17 May 2013 19:31:01 +0200, Jeremy Chadwick wrote: > > >On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote: > >>Hi List, > >>I can confirm that it is the bug you mentioned steven. > >>Here is how I found it. > >> > >>I recorded hourly zfskern and nfsd stats. like this. > >> > >>echo "PROCSTAT" >> $reportname > >>pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname > >> > >>luckily it crashed this night and logged this. > >> > >> 1910 101508 nfsd nfsd: servicemi_switch+0x186 > >>sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1 > >>uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 > >>arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c > >>dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7 > >>dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67 > >>dmu_read_uio+0x3f zfs_freebsd_read+0x3e3 > >> > >>Maybe it would be good to merge this fix into RELENG_9_1 and > >>distribute a fix via freebsd-update what do you think? > >> > >>best, > >>-dennis > >> > >> > >>Am 16.05.2013 um 11:42 schrieb dennis berger: > >> > >>> This is indeed a ZFS+NFS system and I can see that istgt and > >>nfs are stuck in some ZIO state. Maybe it's this. > >>> Thank's for pointing out. > >>> > >>> Is it this ZFS+NFS deadlock? > >>> > >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > >>> @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int > >>howto __unused) > >>> mutex_enter(&arc_reclaim_thr_lock); > >>> needfree = 1; > >>> cv_signal(&arc_reclaim_thr_cv); > >>> - while (needfree) > >>> - msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); > >>> + > >>> + /* > >>> + * It is unsafe to block here in arbitrary threads, because > >>we can come > >>> + * here from ARC itself and may hold ARC locks and thus risk > >>a deadlock > >>> + * with ARC reclaim thread. 
> >>> + */ > >>> + if (curproc == pageproc) { > >>> + while (needfree) > >>> + msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); > >>> + } > >>> mutex_exit(&arc_reclaim_thr_lock); > >>> mutex_exit(&arc_lowmem_lock); > >>> } > >>> > >>> I'll try to crash our testsystem. I'll assume that stressing > >>NFS backed with ZFS a lot might trigger this bug? > >>> > >>> -dennis > >>> > >>> > >>> Am 16.05.2013 um 00:03 schrieb Steven Hartland: > >>> > >>>> - Original Message - From: "dennis berger" > >>>>> FreeBSD 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec > >>4 09:23:10 UTC 2012 > >>>>> > >>>>>> 3. Regarding this: > >>>>>>>> A clean shutdown isn't possible though. It hangs after vnode > >>>>>>>> cleaning, normally you would see detaching of usb devices > >>here, or > >>>>>>>> other devices maybe? > >>>>>> Please don't conflate this with your above issue. This is almost > >>>>>> certainly unrelated. Please start a new thread about that > >>if desired. > >>>>> > >>>>> Maybe this is a misunderstanding normally this system will > >>shutdown cleanly, of course. > >>>>> This hang only appears after the network problem above. > >>>> > >>>> If this is a ZFS system, its a known issue which is fixed in current, > >>>> stable-9, stable-8 and the upcoming 8.4 release. > >>>> > >>>> If not and you have USB devices see if the following sysctl helps: > >>>> hw.usb.no_shutdown_wait=1 > > > >I'm sorry to say it won't happen. The only updates that the -RELEASE > >branches get are for security. If you want fixes for other things, you > >need to follow/run stables branches (i.e. stable/9), otherwise you will > >need to wait until 9.2-RELEASE comes out. > > > > And errata notices?
Re: Command line not responding
On Fri, May 17, 2013 at 09:49:20PM -0500, Michael Gass wrote: > On Fri, May 17, 2013 at 11:55:13AM -0700, Jeremy Chadwick wrote: > > On Fri, May 17, 2013 at 12:56:53PM -0500, Michael Gass wrote: > > > Running 9.0-Stable on an i386. > > > > > > Whenever I type a command at the prompt I get > > > the output > > > > > > /usr/local/lib/libintl.so.9: Undefined symbol "_ThreadRuneLocale" > > > > > > and nothing else - the command will not run. Just the > > > above output. Commands like "ls" and "exit" work, but not much > > > else. This happends whether I am logged in a user or as root. > > > Cannot even halt the system from the command line. > > > > > > Started to happen after trying to update the freetype2 port. > > > Got an error msg while updating libXft-2.1.14. From that point > > > on I cannot use the command line. > > > > > > I have no idea what to try. Any suggestions. > > > > > > First provide the contents of /etc/make.conf and /etc/src.conf. > > > > Thanks for getting back to me. Here are the contents of the two > files. I rebuilt the kernel last fall and have updated ports > fairly regularly since. Things have worked fine until today when > I tried to update ports. > > # File: make.conf > # The ? in the below is for buildworld > CPUTYPE?=pentium2 > # Uncomment the below for general builds. > CFLAGS= -O -pipe > # Uncomment the below for kernel builds. > # COPTFLAGS= -O -pipe > NO_PROFILE=true > INSTALL_NODEBUG=true > #WITHOUT_DILLO_IPV6=yes > #WITH_DILLO_DLGUI=yes > # added by use.perl 2013-05-17 11:04:30 > PERL_VERSION=5.12.4 > > # File: src.conf > WITHOUT_PROFILE=true > WITHOUT_BLUETOOTH=true These confs look generally good, meaning there isn't the "messing about" that the other user had. I did catch one thing, however. Speaking strictly about CFLAGS: This should be CFLAGS+= (plus-equals), not CFLAGS= (equals). 
Otherwise you're effectively overriding CFLAGS for everything, which
could cause issues: some portions of the build infrastructure may set or
adjust the optimiser flags to something other than -O, and you'd be
forcing -O on them anyway. I obviously don't know if that could/would
explain the missing symbol issue, but it's still an error, and a
significant one.

In general I recommend people *do not* tinker with CFLAGS at all in
make.conf -- it's not worth the hassle on i386/amd64 if something goes
wrong.

If you ever want to know which syntaxes to use (for example, your
CPUTYPE?= is correct, and your COPTFLAGS= is correct), review
/usr/share/examples/etc/make.conf or src/share/examples/etc/make.conf.

Unrelated to all of this (just a useful comment in passing): NO_PROFILE
serves no purpose there, just keep WITHOUT_PROFILE=true in src.conf like
you have. NO_PROFILE in make.conf would be from "old" FreeBSD days (i.e.
prior to src.conf existing). Your src.conf looks fine.

Sorry I can't be of more help. :-(

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
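On the CFLAGS= versus CFLAGS+= point in the reply above: the difference can be shown with a toy model of make's assignment operators. This is only a sketch in Python, not how make itself is implemented, and it does not model the real include order of the FreeBSD build files. The function names and the "earlier" flag values are hypothetical and exist purely for illustration.

```python
def apply_assignment(variables, name, op, value):
    """Apply one make-style assignment to a dict of variables (toy model)."""
    if op == "=":
        # Plain assignment discards whatever was set earlier.
        variables[name] = value
    elif op == "+=":
        # Append keeps the earlier value and adds to it, separated by a space.
        current = variables.get(name, "")
        variables[name] = (current + " " + value).strip()
    elif op == "?=":
        # Conditional assignment only takes effect if the variable is not yet defined.
        variables.setdefault(name, value)
    else:
        raise ValueError("unsupported operator: " + op)
    return variables


def resolve(assignments):
    """Apply a sequence of (name, op, value) assignments in order."""
    variables = {}
    for name, op, value in assignments:
        apply_assignment(variables, name, op, value)
    return variables


if __name__ == "__main__":
    # Hypothetical flags assumed to have been set before make.conf is read.
    earlier = [("CFLAGS", "=", "-O2 -pipe -fno-strict-aliasing")]
    print(resolve(earlier + [("CFLAGS", "=", "-O -pipe")])["CFLAGS"])
    print(resolve(earlier + [("CFLAGS", "+=", "-O -pipe")])["CFLAGS"])
```

With "=" the earlier flags are gone entirely; with "+=" they survive and the new ones are appended. The model says nothing about how a compiler treats the resulting flag string.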
Re: Command line not responding
On Fri, May 17, 2013 at 12:56:53PM -0500, Michael Gass wrote:
> Running 9.0-Stable on an i386.
>
> Whenever I type a command at the prompt I get
> the output
>
> /usr/local/lib/libintl.so.9: Undefined symbol "_ThreadRuneLocale"
>
> and nothing else - the command will not run. Just the
> above output. Commands like "ls" and "exit" work, but not much
> else. This happends whether I am logged in a user or as root.
> Cannot even halt the system from the command line.
>
> Started to happen after trying to update the freetype2 port.
> Got an error msg while updating libXft-2.1.14. From that point
> on I cannot use the command line.
>
> I have no idea what to try. Any suggestions.

First provide the contents of /etc/make.conf and /etc/src.conf.

The _ThreadRuneLocale thing has come up before, but on -CURRENT circa
early 2012. It happened to a user when trying to build kernel (really)
and that user was tinkering about in make.conf and src.conf heavily,
messing with Clang. I personally remove Clang from my systems entirely
for many reasons, by simply doing WITHOUT_CLANG=true in src.conf and
thus rely entirely on gcc.

My recommendation, and this isn't going to make you happy:

Boot into single-user, mount your filesystems, and try commands there,
in hopes that they work. If they do:

  pkg_delete -a -f
  cp -pR /usr/local /usr/local.old
  rm -fr /usr/local/*
  reboot

Boot into multi-user, log in, and things should be fine. Next:

  rm -fr /var/db/ports/*
  rm -fr /usr/ports/distfiles/*
  find /usr/ports -type d -name "work" -exec rm -fr {} \;

Now begin rebuilding your ports. If you prefer to use packages, go right
ahead, given that this was just announced a few days ago:

http://lists.freebsd.org/pipermail/freebsd-announce/2013-May/001476.html

But I tend to build everything from source, barring large-ish packages
(things like cmake, python27, perl) which I pkg_add -r.
My attitude has always been: when something catastrophic impacts a very
large number of commands (particularly a library with a missing symbol
that a very large number of programs link to), start fresh. It's not
worth scrambling around with leftover cruft in place that could appear
months later and make you say "I thought I fixed that!", where you then
have to follow up to a thread months old and admit "actually there is
more breakage..."

Footnote: I am likely to get a large amount of backlash for proposing
the above, with claims that will equate it to fixing a minor cut by
amputating the entire limb. My response to such: that's nice.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
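The find(1) step in the message above removes every directory named "work" under /usr/ports. Before running something that destructive, it can help to preview what would be matched. The following is a minimal dry-run sketch in Python (not part of the original advice); the function name find_work_dirs is hypothetical, and it only lists directories, it deletes nothing.

```python
import os


def find_work_dirs(ports_root):
    """Return every real directory named "work" under ports_root, without descending into them."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(ports_root):
        if "work" in dirnames:
            candidate = os.path.join(dirpath, "work")
            # Like "find -type d", skip symlinks that merely point at a directory.
            if not os.path.islink(candidate):
                found.append(candidate)
            # Prune: there is no point walking into a directory slated for removal.
            dirnames.remove("work")
    return sorted(found)


if __name__ == "__main__":
    for path in find_work_dirs("/usr/ports"):
        print(path)
```

Pruning the matched directory from the walk also means nested "work" directories inside a build tree are not reported separately, since removing the parent would take them with it.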
Re: still mbuf leak in 9.0 / 9.1?
On Fri, May 17, 2013 at 11:37:23AM +0200, dennis berger wrote: > Hi List, > I can confirm that it is the bug you mentioned steven. > Here is how I found it. > > I recorded hourly zfskern and nfsd stats. like this. > > echo "PROCSTAT" >> $reportname > pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> $reportname > > luckily it crashed this night and logged this. > > 1910 101508 nfsd nfsd: servicemi_switch+0x186 > sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1 > uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec > arc_read+0x93 dbuf_prefetch+0x12c dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 > dbuf_read+0x4a7 dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67 > dmu_read_uio+0x3f zfs_freebsd_read+0x3e3 > > Maybe it would be good to merge this fix into RELENG_9_1 and distribute a fix > via freebsd-update what do you think? > > best, > -dennis > > > Am 16.05.2013 um 11:42 schrieb dennis berger: > > > This is indeed a ZFS+NFS system and I can see that istgt and nfs are stuck > > in some ZIO state. Maybe it's this. > > Thank's for pointing out. > > > > Is it this ZFS+NFS deadlock? > > > > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c > > @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused) > > mutex_enter(&arc_reclaim_thr_lock); > > needfree = 1; > > cv_signal(&arc_reclaim_thr_cv); > > - while (needfree) > > -msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); > > + > > + /* > > +* It is unsafe to block here in arbitrary threads, because we can come > > +* here from ARC itself and may hold ARC locks and thus risk a deadlock > > +* with ARC reclaim thread. > > +*/ > > + if (curproc == pageproc) { > > +while (needfree) > > +msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0); > > + } > > mutex_exit(&arc_reclaim_thr_lock); > > mutex_exit(&arc_lowmem_lock); > > } > > > > I'll try to crash our testsystem. 
> > I'll assume that stressing NFS backed with ZFS a lot might trigger
> > this bug?
> >
> > -dennis
> >
> >
> > On 16.05.2013 at 00:03, Steven Hartland wrote:
> >
> >> - Original Message - From: "dennis berger"
> >>> FreeBSD 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10
> >>> UTC 2012
> >>>
> >>>> 3. Regarding this:
> >>>>>> A clean shutdown isn't possible though. It hangs after vnode
> >>>>>> cleaning, normally you would see detaching of usb devices here, or
> >>>>>> other devices maybe?
> >>>> Please don't conflate this with your above issue. This is almost
> >>>> certainly unrelated. Please start a new thread about that if desired.
> >>>
> >>> Maybe this is a misunderstanding normally this system will shutdown
> >>> cleanly, of course.
> >>> This hang only appears after the network problem above.
> >>
> >> If this is a ZFS system, its a known issue which is fixed in current,
> >> stable-9, stable-8 and the upcoming 8.4 release.
> >>
> >> If not and you have USB devices see if the following sysctl helps:
> >> hw.usb.no_shutdown_wait=1

I'm sorry to say it won't happen. The only updates that the -RELEASE
branches get are for security. If you want fixes for other things, you
need to follow/run the stable branches (i.e. stable/9), otherwise you
will need to wait until 9.2-RELEASE comes out.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: revision higher than 250508 breaks webcam support
On Thu, May 16, 2013 at 08:38:39PM -0700, Adrian Chadd wrote: > Are you able to narrow down the specific commit along 9-stable that broke it? > > Thanks! > > > > Adrian > > On 16 May 2013 18:00, Jo??e Zobec wrote: > > Sorry, for waiting this long to post this problem, I thought it would be > > dealt with this week, but since it wasn't better to report it now. I hope > > this is the right mailing list for this particular problem. > > > > I am running FreeBSD 9.1-STABLE and using Logitech Webcam C525. I it's not > > listed amongst the supported hardware, but it was working perfectly until > > the updates that came this Sunday, 2013-05-12. > > > > The problem I'm getting is this: > > > > I keep getting this error message from the kernel, if I'm using 9.1-STABLE > > r250707 > > > > ... > > pcm6: detached > > ugen7.2: at asbus7 > > uaudio0: > > on usbus7 > > uaudio0: No playback. > > uaudio0: Record: 48000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer. > > uaudio0: Record: 32000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer. > > uaudio0: Record: 24000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer. > > uaudio0: Record: 16000 Hz, 1 ch, 16-bit S-LE PCM format, 2x8ms buffer. > > uaudio: No MIDI squencer. > > pcm6: on uaudio0 > > uaudio0: No HID volume keys found. > > ugen7.2: at usbus7 (disconnected) > > uaudio0: at uhub7, port4, addr 2 (disconnected) > > pcm6: detached > > ... > > > > This message is displayed periodically "ad infinitum" or at least until I > > unplug the webcam. It stays this way, even if I use the GENERIC kernel. In > > a "healthy" case, revision 250508, kernel message upon plugging the webcam, > > is > > > > ... > > ugen7.2: at usbus7 > > uaudio0: > > on usbus7 > > uaudio: No playback. > > uaudio: Record: 48000 Hz, 1 ch, 16 bit S-LE PCM format, 2x8ms buffer. > > uaudio: No MIDI sequencer. > > pcm6: on uaudio0 > > uaudio0: No HID volume keys found. > > > > And there it stops, and the webcam works in Skype. 
Note: I told Joe to mail freebsd-usb@ about this, since it looks like it
pertains to the USB stack, and Hans tends to respond to stuff there.

That said...

Looking at commits between r250508 and r250707, my gut says it's very
likely one of these (with the most probable being marked with arrows):

http://www.freshbsd.org/commit/freebsd/r250581
http://www.freshbsd.org/commit/freebsd/r250561  <---
http://www.freshbsd.org/commit/freebsd/r250560  <---
http://www.freshbsd.org/commit/freebsd/r250559

How I got that list was by manually reviewing the following:

http://www.freshbsd.org/?branch=RELENG_9&project=freebsd

So I would recommend rolling back to r250558 (the last stable/9 commit
to happen before r250559) and see if things improve. Again, my gut
feeling says that they will, and that r250561 or r250560 are
responsible.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
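If rolling back to r250558 helps, the remaining question is which of the candidate commits introduced the breakage. One way to narrow that down with few rebuilds is a binary search over the revisions, sketched below in Python. The function name and the is_bad callback are hypothetical; in practice is_bad would mean "build and boot a kernel at this revision, plug in the webcam, and see whether uaudio0 keeps detaching". The sketch assumes the first revision in the list is known good, the last is known bad, and that the problem persists in every revision after it is introduced.

```python
def first_bad_revision(revisions, is_bad):
    """Binary-search a sorted list of revisions for the first one where is_bad() is true.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    """
    low, high = 0, len(revisions) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(revisions[mid]):
            high = mid  # breakage is at mid or earlier
        else:
            low = mid   # breakage is after mid
    return revisions[high]


if __name__ == "__main__":
    # Revisions named in the thread: known good, rollback target, candidates, known bad.
    candidates = [250508, 250558, 250559, 250560, 250561, 250581, 250707]

    def is_bad(revision):
        # Stand-in predicate for illustration only; the thread does not
        # establish which commit is actually at fault.
        return revision >= 250560

    print(first_bad_revision(candidates, is_bad))
```

With seven revisions this needs three test builds instead of five.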
Re: still mbuf leak in 9.0 / 9.1?
> > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > > > > > I should add that the servers that are directly connected to this freebsd > > server reboot every night. This is why you see ix0 UP/DOWN > > messages in dmesg. > > > > > > > > > > > > > > - END System information 1. You appear convinced that the issue is related to mbuf exhaustion, but you haven't provided evidence that you're hitting the mbuf maximum (in your case 262144). What you *have* shown is your mbuf count gradually increasing (sans 15-05-2013-13-09.txt vs. 15-05-2013-14-09.txt which shows mbufs almost doubling (!)), which could indicate a leak but then again might not. If you reach mbuf maximum, then yes, network I/O can cease or stall (possibly indefinitely). However, broken/busted network I/O can also happen due to other issues unrelated to mbufs, such as network stack issues, firewall stack issues, or network driver bugs. 
Are you using pf, ipfw, or ipfilter on this system?

2. I think we'd all appreciate it if you disclosed **exactly** what
version of FreeBSD you're using (Subject says "9.0 or 9.1" which is
insufficient). Please provide "uname -a" output (you can XXX out the
hostname if you want) -- and if you're still using csup/cvsup and built
your own kernel/world, we'll need to know exactly what date your src
files were from when you rebuilt.

I'm wary of CC'ing folks who can help troubleshoot mbuf exhaustion
issues until answers to the above can be provided, as I don't want to
waste their time.

3. Regarding this:

> > A clean shutdown isn't possible though. It hangs after vnode
> > cleaning, normally you would see detaching of usb devices here, or
> > other devices maybe?

Please don't conflate this with your above issue. This is almost
certainly unrelated. Please start a new thread about that if desired.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
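Point 1 above asks for evidence that the mbuf maximum is actually being reached, rather than just a rising count. A minimal sketch of how hourly "netstat -m" captures could be checked for that is shown below, in Python. It assumes the usual "current/cache/total/max mbuf clusters in use" line format and that the 262144 figure mentioned above is the max on that line. The function names, the 90% threshold, and the choice to compare total (current plus cache) against max are all assumptions for illustration, and the sample numbers are made up, not taken from the poster's reports.

```python
import re

# Matches e.g. "1024/1726/2750/262144 mbuf clusters in use (current/cache/total/max)"
CLUSTER_LINE = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def cluster_usage(netstat_output):
    """Return the cluster counters from netstat -m output as a dict, or None if absent."""
    for line in netstat_output.splitlines():
        match = CLUSTER_LINE.match(line.strip())
        if match:
            current, cache, total, maximum = (int(group) for group in match.groups())
            return {"current": current, "cache": cache, "total": total, "max": maximum}
    return None


def near_limit(usage, threshold=0.9):
    """True if total clusters are at or above threshold * max (assumed comparison)."""
    if usage is None or usage["max"] == 0:
        return False
    return usage["total"] / usage["max"] >= threshold


if __name__ == "__main__":
    # Made-up sample in the expected format.
    sample = (
        "1026/2709/3735 mbufs in use (current/cache/total)\n"
        "1024/1726/2750/262144 mbuf clusters in use (current/cache/total/max)\n"
    )
    usage = cluster_usage(sample)
    print(usage, near_limit(usage))
```

Run against each hourly capture, this separates "the count is growing" from "the count is at the configured ceiling", which is the distinction the reply is asking for.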
Re: Build GENERIC with IPX support
ead rc file as follows:
 * 1. read [server] section
 * 2. override with [server:user] section
 * Since abcence of rcfile is not a bug, silently ignore that fact.
 * rcfile never closed to reduce number of open/close operations.
 */
int
ncp_li_readrc(struct ncp_conn_loginfo *li) {
        int i, val, error;
        char uname[NCP_BINDERY_NAME_LEN*2+1];
        char *sect = NULL, *p;

        /*
         * if info from cmd line incomplete, try to find existing
         * connection and fill server/user from it.
         */
        if (li->server[0] == 0 || li->user == NULL) {
                int connHandle;
                struct ncp_conn_stat cs;

                if ((error = ncp_conn_scan(li, &connHandle)) != 0) {
                        ncp_error("no default connection found", errno);
                        return error;
                }

To me, this may indicate you have some kind of "ncp rc file" (I believe
this is ~/.nwfsrc according to the ncplist(1) man page) that may contain
something invalid, or maybe you lack such a file altogether (creating
one might work around the problem).

Back to the actual segfault itself: ncp_error() is pretty simple:

src/lib/libncp/ncpl_subr.c --

/*
 * Print a (descriptive) error message
 * error values:
 *         0 - no specific error code available;
 *  -999..-1 - NDS error
 *  1..32767 - system error
 *  the rest - requester error;
 */
void
ncp_error(const char *fmt, int error, ...) {
        va_list ap;

        fprintf(stderr, "%s: ", _getprogname());
        va_start(ap, error);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        if (error == -1)
                error = errno;
        if (error > -1000 && error < 0) {
                fprintf(stderr, ": dserr = %d\n", error);
        } else if (error & 0x8000) {
                fprintf(stderr, ": nwerr = %04x\n", error);
        } else if (error) {
                fprintf(stderr, ": syserr = %s\n", strerror(error));
        } else
                fprintf(stderr, "\n");
}

What I don't understand from the calling stack is how gettimeofday() is
involved.
I have looked at the libc code, looked at the underlying calling
functions and so on (from fprintf() to vfprintf_l() and deeper), and I
don't see how or where gettimeofday() would be called. The only place I
can think of might be the related locale stuff, but I doubt that given
what I've looked at, though I could still be wrong.

Have world/kernel on this system ever been rebuilt? If they have, were
both kernel and world rebuilt together from the same source code and not
at different times?

If you're setting LANG, LC_CTYPE, LC_COLLATE, or other locale-oriented
settings in your environment (and my gut feeling is that you are), you
could try removing them and see if you get an actual useful error
message on stderr, but I'm not holding my breath.

I cannot help you with the remaining IPX-specific "stuff"; it's fairly
obvious though, as I said, that this code has been neglected.

-- 
| Jeremy Chadwick                                  j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
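The branch order in the ncp_error() excerpt quoted earlier in this message decides which kind of message ends up on stderr. It can be restated compactly as the Python sketch below. This is a re-expression for illustration only, not libncp code: the function name classify_ncp_error and the errno_value parameter (standing in for the C global errno) are hypothetical, and the sketch returns a label instead of printing.

```python
def classify_ncp_error(error, errno_value=0):
    """Return which branch of the quoted ncp_error() a given error value would take."""
    if error == -1:
        # The C code substitutes the current errno for -1.
        error = errno_value
    if -1000 < error < 0:
        return "dserr"   # NDS error, printed as a decimal number
    if error & 0x8000:
        return "nwerr"   # requester error, printed as hex
    if error:
        return "syserr"  # system error, printed via strerror()
    return "none"        # no specific error code; only a newline is printed


if __name__ == "__main__":
    for value in (0, 2, -5, 0x8801):
        print(value, classify_ncp_error(value))
```

Since the quoted call site passes errno directly, a nonzero errno below 0x8000 at that point would take the "syserr" branch and go through strerror().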