Re: kernel: MCA: CPU 0 COR (1) internal parity error
be new MCEs or changes to the MCA that Intel implemented in some newer models of Core iX that aren't being handled correctly by the kernel (i.e. misreporting or mis-decoding). Good luck! -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Any objections/comments on axing out old ATA stack?
On Sun, Apr 21, 2013 at 02:11:04PM +0300, Alexander Motin wrote:

On 21.04.2013 00:29, Jeremy Chadwick wrote:

- The ATA commands which lead up to the error also vary. Many are for write requests, and from some entries I can see that the OS was doing NCQ writes (WRITE FPDMA QUEUED) and then suddenly decided to do a classic 28-bit LBA write (WRITE DMA). I'm not sure why an OS would do this (there's nothing optimal about it) unless there were conditions occurring where the OS/ATA driver said "this NCQ write isn't working (timeout, etc.), let me retry with a classic 28-bit LBA write".

ATA disk driver in CAM inserts non-queued command every several seconds of continuous load to limit possible command starvation inside the disk. SCSI driver does alike things, but inserts ordered command flag, that does not exist in SATA, instead of different command.

Thanks for the insights Alexander, greatly appreciated.

I'm a little confused by your description, because if I'm reading it right, it sounds like it conflicts with what the ACS-2 spec states. Quoting T13/2015-D rev 3 (I'm aware it's a working draft), section 4.16.1:

"If the device receives a command that is not an NCQ command while NCQ commands are in the queue, then the device shall return command aborted for the new command and for all of the NCQ commands that are in the queue."

I assume this means ABRT status is returned to the host controller; if so (and by design of course), how do we differentiate between that condition and any other I/O condition that induces ABRT?

Possibly the answer is in this admission: I should probably get around to reading ATA8-AST sometime. :-)

-- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 10:00:18AM +0200, Matthias Andree wrote: Am 04.04.2013 03:05, schrieb Jeremy Chadwick: { snipping stuff I have no comment on. reference thread: } { http://lists.freebsd.org/pipermail/freebsd-stable/2013-April/073036.html } One piece of evidence that refutes my theory is that if Windows and/or Linux partition are something you boot into and use often, I would imagine NCQ would be used in both of those environments and would suffer from the same issue. Although Windows tends to hide all sorts of transient errors from the user (sigh), Linux tends to be like FreeBSD with regards to such issues (on the console anyway; you wouldn't see such messages normally inside of X). Now, the FreeBSD slice is the only partition on that disk that would likely see concurrent write accesses (think make -j8 on a quadcore computer) which is more prone to ferret out such alignment contention. The NTFS partition is aligned on a multi-MB boundary, so wouldn't hit the problem anyways. The Linux partition is in ext4 format for mostly sequential access to files usually in excess of 10 MB each. Linux's ext4 jumps through several hoops to end up with bulk writes, like extents, delayed allocations (to avoid fragmentation), reordering of data and metadata writes, serialized log writes and all that stuff, and it would appear I am permitting it to cache writes -- Linux uses write barriers to enforce proper ordering of journal/meta-data writes. It would be rather hard to hit ATA taskfile timeouts, the expected rate with which the drive needs to do a partial write is orders of magnitude lower. Any good concurrent write exercise tools for Unix that I could run on the Linux ext4 partition that you would propose? The only tool I'm familiar with is bonnie++. But I don't think this (partition alignment) is what matters now. Your smartctl output has shed some light on your situation. 
- I am running with kern.cam.ada.default_timeout=5 which makes the computer recover faster

I can definitely imagine cases where a drive using NCQ but doing writes to a non-aligned partition could take longer than 5 seconds to respond to an ATA CDB (this is different than a SATA or AHCI layer timeout). I am not telling you to change this back to 30, but it might not be helping your situation at all given my above theory.

My feeling is that the stalls are mostly from the error handler and the overall time the drive is frozen gets shorter. If it had not _felt_ faster, I'd not have left that in sysctl.conf in the first place.

Your understanding of what that sysctl does is wrong, or I'm misunderstanding what you're saying (very possible!). How I interpret what you're saying: that the sysctl somehow decreases stall times during I/O operations that fail. This is incorrect.

What that sysctl does is define the number of seconds that transpire ***before*** the CAM layer says "Okay, I didn't get a response to the ATA CDB I sent the disk", and then re-submits the same CDB to the disk.

Rephrased: in the case of a disk stalling on an I/O request, you will experience the effects of that stall no matter what that sysctl is set to. A lower value in that sysctl will result in CAM spitting out nasties on the console + hitting the CDB retry submission scenario sooner, which if the drive is awake/responsive by that time will go smoothly. That's all it does.

Thus a value of 5 indicates a device/drive did not respond to a CDB within 5 seconds, and a value of 30 indicates a device/drive did not respond to a CDB within 30 seconds. Regardless, those lengths of time are VERY long for an I/O operation on a mechanical HDD.

When you get to the bottom of my Email, you'll understand why I screamed at you about adjusting that sysctl.

Finally: could you please provide output from smartctl -x /dev/ada1?
I would like to rule out any possibility of your drive having some other kind of issue that might cause it to go catatonic. Thanks. I have fetched the data with Linux this time (should not make a difference as it's all drive internal data, not host OS stuff). Looks sane to me, http://people.freebsd.org/~mandree/smartctl.log. I'll be happy to refetch this data with a more current smartctl version under FreeBSD if required. Oh look, it's the Samsung SpinPoint series, especially the EcoGreen (EG) series. No joke: ~60% of the problem reports I deal with when it comes to weird wonky problems stem from this drive series. I have no idea why, but they're a common pain point for me. First, about the shown sector size: smartmontools 5.41 was the first release to show the sector sizes per ATA IDENTIFY. I assume they got this right from the get-go. So as of this moment I'm going to assume that this drive really is a 512-byte sector drive. Politely, your analysis of the drive (looks sane to me) is an indicator of why SMART output needs to be interpreted by a person who is familiar
Re: Kernel output interleaved on boot
I have discussed this problem for years now -- over 5 years, to be exact. As if I haven't sounded like a broken record before, I surely do now.

Start here, under section "Kernel", item "Scrambled or garbled kernel output":

https://wiki.freebsd.org/BugBusting/Commonly_reported_issues

The problem has not gone away. It has not been solved. It has not been worked around. PRINTF_BUFR_SIZE does not solve the problem, and rarely helps relieve it.

I have discussed this issue more recently (2010) with John Baldwin as well:

http://lists.freebsd.org/pipermail/freebsd-questions/2010-March/214412.html
http://lists.freebsd.org/pipermail/freebsd-questions/2010-March/214423.html

And in December 2011 too -- particularly an important read if you think increasing the number is a wise idea:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-December/065158.html

Bottom line: there is no solution other than to switch OSes.

And yes, I am aware of how GSoC works, but this really should have become a GSoC project by now, otherwise the Foundation should have funded someone to fix this. It makes kernel debugging basically worthless.

-- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 12:15:32AM +0200, Matthias Andree wrote: I have just sent more information to the PR at http://www.freebsd.org/cgi/query-pr.cgi?pr=157397 The short summary (more info in the PR) is: - limiting tags to 31 does not help - disabling NCQ appears to help in initial testing, but warrants more testing - error happens during WRITE_FPDMA_QUEUED, This is an NCQ-based write LBA request. There are many non-NCQ equivalents of this, ATA-protocol-wise (too many to list here), but the most likely non-NCQ ATA command you'd see is WRITE_DMA48. - File system in question is SU+J UFS2 mounted on /usr, and I can for instance rm -rf /usr/obj or just log into GNOME and try to open a gnome-terminal to trigger stalls; - Linux uses 31 tags (for different reason) and has no drive quirks, but a controller quirk; for Jeremy's topic #6, regarding the ATI/AMD SB7x0 that I am using, it might be worthwhile investigating the AHCI_HFLAG_IGN_SERR_INTERNAL flag - it gets set by Linux on the SB700 that my computer is using, see ahci_error_intr() in libahci.h - I am not going to interpret that for lack of expertise, but it does affect error handling and appears to ignore a certain condition. Alexander could expand on this, but the name of the flag implies that there are certain conditions where the SATA-level SERR condition gets ignored (IGN). While skimming Linux libata code and commits in the past, the only glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the hardware revision apparently matters) and port multiplier (PMP) support and soft resets. Are you using a port multiplier? I doubt it, but I have to ask. Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet. Please provide gpart show -p ada1 output, both here and in the PR, if you could. 
I have a gut feeling I know what the issue is (and if it is what I think it is, it's actually happening all the time, just that NCQ exacerbates it given how command queueing works), but I won't know for sure until I see the output. Thanks. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Mountain View, CA, US| | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Any objections/comments on axing out old ATA stack?
On Thu, Apr 04, 2013 at 02:19:16AM +0200, Matthias Andree wrote:

On 04.04.2013 01:38, Jeremy Chadwick wrote:

... While skimming Linux libata code and commits in the past, the only glaringly obvious bug/issue I see is with SB600/SB700 chipsets (the hardware revision apparently matters) and port multiplier (PMP) support and soft resets. Are you using a port multiplier? I doubt it, but I have to ask.

I am not using a PMP as far as I know (unless one is buried on my Asus M4A78T-E main board). It would seem the drives are directly attached to the south bridge's SATA ports.

Then the answer is nope, you're not using a PM. Details:

http://www.serialata.org/technology/port_multipliers.asp
http://en.wikipedia.org/wiki/Port_multiplier

Why only my Samsung HDD drive triggers this but not the WD drive, I do not know yet.

Please provide gpart show -p ada1 output, both here and in the PR, if you could.

=>         63  1953525105    ada1  MBR  (931G)
           63   209714337  ada1s1  freebsd  [active]  (100G)
    209714400         800          - free -  (400k)
    209715200    71680000  ada1s2  ntfs  (34G)
    281395200       15405          - free -  (7.5M)
    281410605   488263545  ada1s3  linux-data  (232G)
    769674150  1183851018          - free -  (564G)

This is what I was worried about. Referring to your camcontrol identify output:

device model          SAMSUNG HD103SI
sector size           logical 512, physical 512, offset 0

Hear me out entirely on this one. My theory is that your hard disk actually uses 4096-byte sectors but is too old to provide ATA IDENTIFY semantics to delineate between logical vs. physical sector size. In other words, only logical is provided, thus logical=physical in the eyes of all software; smartctl will show you the exact same thing too.

There are drives like this in the wild, both SSDs as well as MHDDs. For example, the Intel 320-series SSD behaves this way too (providing only logical size). Do not let the capacity/size of the drive be the deciding factor; your drive is 1TB, but I also have many 1TB MHDDs that use 4096-byte sectors.
Seagate/Samsung's specification** for the HD103SI states, and I quote: Byte per Sensor: 512 bytes. Yes, it says Sensor. Whether or not this documentation is correct/accurate is unknown, and when vendors have typos in their own specification docs, I cannot help but to honour the possibility of the information being wrong. So I'm unsure if this drive uses 512-byte sectors or 4096-byte sectors. That said: in your gpart show ada1 output, none of your partitions (FreeBSD, NTFS, nor Linux) appear to be aligned to 4096-byte boundaries. Ideally you'd want to have these aligned to 1MB or 2MByte boundaries in the case you ever move to an SSD. You're also using the MBR scheme, which does not tend to play well with alignment. Comparatively, your WD5002ABYS drive **does** use 512-byte sectors (I know this for a fact). The problem here is that I cannot guarantee you that alignment is the problem. The performance impact of writes to partitions which are non-aligned is quite high, and NCQ just exacerbates this problem. I would love to tell you switch to GPT and follow Warren Block's document*** but if your NTFS partition is Windows and is a Windows version older than Windows 7 GPT is not supported. One piece of evidence that refutes my theory is that if Windows and/or Linux partition are something you boot into and use often, I would imagine NCQ would be used in both of those environments and would suffer from the same issue. Although Windows tends to hide all sorts of transient errors from the user (sigh), Linux tends to be like FreeBSD with regards to such issues (on the console anyway; you wouldn't see such messages normally inside of X). If you have the time and want to put forth the effort, I would recommend backing up all your data on ada1, zero the first and last 1MByte of the drive, and then try following Warren Block's guide. 
I'd just recommend doing this:

gpart create -s gpt ada1
gpart add -t freebsd-ufs -b 2m ada1
newfs -U -j /dev/ada1p1    (or remove -j if you don't want to use SUJ)

I picked an alignment value of 2MBytes since it's both 4K-aligned and is generally safe for things like newer SSDs that have larger NAND erase block size (I am not going to get into a discussion about that here, so please stay focused. :-) )

If the problem is gone after that (it should be easy to induce by writing tons and tons of data to the drive), then we can safely say that the drive uses 4096-byte sectors and that it needs to be added to the quirks list in ata_da.c.

If the problem remains after that, then further investigation is needed, and we can safely rule out alignment.

Welcome to all the pain/effort one has to go through when troubleshooting things like this. :-)

Another thing: in your PR you state:

- I am running with kern.cam.ada.default_timeout=5 which makes the computer recover faster

I can definitely imagine cases where
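As a rough cross-check of the alignment question, the slice start sectors from the gpart show output quoted earlier in this message can be tested against a 4096-byte boundary. This is only a sketch: it assumes 512-byte logical sectors (which is what camcontrol identify reports), the helper name is made up for illustration, and it says nothing about whether the drive really has 4096-byte physical sectors.

```python
LOGICAL_SECTOR = 512     # bytes per LBA, per the camcontrol identify output
BOUNDARY = 4096          # assumption under test: 4096-byte physical sectors

def is_aligned(start_lba, boundary=BOUNDARY, sector=LOGICAL_SECTOR):
    """True if the slice's byte offset is a multiple of the boundary."""
    return (start_lba * sector) % boundary == 0

# Start sectors taken from the gpart output in this thread.
partitions = {
    "ada1s1": 63,
    "ada1s2": 209714400 + 800,   # start + size of the free region before it
    "ada1s3": 281410605,
}

alignment = {name: is_aligned(lba) for name, lba in partitions.items()}
for name, ok in alignment.items():
    print(name, "aligned" if ok else "NOT aligned")
```

By this arithmetic ada1s1 and ada1s3 are off the boundary, while ada1s2 (the NTFS slice) sits on it, which agrees with Matthias's later remark that the NTFS partition is aligned on a multi-MB boundary.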
Re: Any objections/comments on axing out old ATA stack?
On Sun, Mar 31, 2013 at 03:02:09PM -0600, Scott Long wrote: On Mar 31, 2013, at 7:04 AM, Victor Balada Diaz vic...@bsdes.net wrote: On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote: Hi. Since FreeBSD 9.0 we are successfully running on the new CAM-based ATA stack, using only some controller drivers of old ata(4) by having `options ATA_CAM` enabled in all kernels by default. I have a wish to drop non-ATA_CAM ata(4) code, unused since that time from the head branch to allow further ATA code cleanup. Does any one here still uses legacy ATA stack (kernel explicitly built without `options ATA_CAM`) for some reason, for example as workaround for some regression? Does anybody have good ideas why we should not drop it now? Hello, At my previous job we had troubles with NCQ on some controllers. It caused failures and silent data corruption. As old ata code didn't use NCQ we just used it. I reported some of the problems on 8.2[1] but the problem existed with 8.3. I no longer have access to those systems, so i don't know if the problem still exists or have been fixed on newer versions. So what I hear you and Matthias saying, I believe, is that it should be easier to force disks to fall back to non-NCQ mode, and/or have a more responsive black-list for problematic controllers. Would this help the situation? It's hard to justify holding back overall forward progress because of some bad controllers; we do several Tbps off of AHCI controllers with NCQ enabled on FreeBSD 9.x, enough to make up a sizable percentage of the internet's traffic, and we see no problems. How can we move forward but also take care of you guys with problematic hardware? I've read a referenced PR (157397) except there really isn't enough technical troubleshooting/detail to determine what the root cause is. That isn't the fault of the reporter either -- the reporter needs to be told what information they need to provide / how to troubleshoot it. 
Meaning: kernel folks who are in-the-know need to step up and help. That PR is soon-to-be 2 years old and is missing tons of information that, even as a non-kernel guy, *I* would find useful:

1. Output from:
   - camcontrol tags ada1 -v
   - camcontrol identify ada1
   - What sorts of filesystems are on ada1; if UFS, tunefs -p output would be greatly appreciated
   - If the timeouts happen during heavy I/O load, and if so, during what kinds of I/O load (reads or writes).

2. Does camcontrol tags ada1 -N 31 help? I mention this because stated here:

   http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072985.html

   ...there are statements which imply decreasing queue length may solve the issue. What confuses me, however, is that the queue length on my own systems (with different models of disks, as well as an SSD) all have a limit of 32. I dug through the kernel source for a while but could not easily find where this number comes from. (I have very little familiarity with command queuing at the protocol level)

3. Why not find out why Linux (probably libata) has a 32 (or 31?) queue limit? They have commit logs, and there is the LKML where you could ask. While I understand reluctance to add something just because Linux does it, it doesn't appear anyone's stepped up to the plate to ask them why; I pray this is not caused by anti-Linux sentiment.

4. The ada1 device in the PR is a Samsung Spinpoint EcoGreen F2 hard drive (1TB, 5400rpm, 32MB cache). Possibly the drive has firmware bugs relating to its NCQ implementation, or possibly it's going into some power-saving mode (it is an EcoGreen model). I've always been wary of the EcoGreen disks since reading about the F4 EcoGreen firmware fiasco (even though the same page says the F1 and F3 EcoGreen had no issue):

   http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

5. We really need to have some way to print active quirks for devices, even if it's only at boot-up, e.g.:

   ada3: quirks=0x0003<4K,NO_NCQ>

   I'd be happy to write the code for this (basing it on how we do CPU flags), but as I've said in the past, kernel-land is scary to me.

6. The controller referenced is an ATI IXP700. I cannot tell you how many times on the mailing lists I've seen weird issues reported by people using that controller. I am in no way/shape/form saying the issue is with the controller or with AHCI compatibility (FreeBSD vs. ATI), because I have no proof. I just find it very unnerving that so many issues have been reported where that controller is involved, and often across all sorts of different device/disk models.

All that said: I agree a loader tunable to inhibit command queueing would be nice. sysctl would be even more convenient (easier for real-time testing) but I don't know the implications of turning CQ off in the middle of any pending I/O requests.

-- | Jeremy Chadwick j...@koitsu.org
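To make the quirks idea in item 5 concrete, here is a small user-space sketch of decoding a quirk bitmask into names, in the style the kernel uses for CPU feature flags (hex value followed by the names in angle brackets). The bit values and the NO_NCQ name are hypothetical and chosen only to match the example in the message; the real quirk definitions live in the driver source, and this is not kernel code.

```python
# Hypothetical quirk bits, for illustration only; not the driver's real table.
QUIRK_BITS = [
    (0x0001, "4K"),
    (0x0002, "NO_NCQ"),
]

def format_quirks(value, bits=QUIRK_BITS):
    """Render a quirk bitmask as quirks=0xNNNN<NAME,NAME>."""
    names = [name for bit, name in bits if value & bit]
    return "quirks=0x%04x<%s>" % (value, ",".join(names))

print("ada3: " + format_quirks(0x0003))
```

A table-driven decoder like this keeps the printed names next to the bit definitions, so adding a quirk only means adding one row.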
Re: [HEADS UP] pkgng binary packages regression in 1.0.9. Fixed in 1.0.9_1
On Wed, Mar 20, 2013 at 04:20:02PM +0100, Matthias Gamsjager wrote:

Due to the security incident, there are still no official FreeBSD packages. Do you know what the status is on that issue?

I'd also like to find out what the status of this is. The packages at:

ftp://ftp.freebsd.org/pub/FreeBSD/ports/amd64/packages-9-stable/

Are still circa October 2012 -- that's 4-5 months ago.

While I truly and deeply understand that proper engineering design and infrastructure changes take time, there has been absolutely no communication presented to the community as to what has (or hasn't) transpired, if there is (or isn't) a plan, or if people are simply waiting until future in-person BSD* events to work things out. freebsd-ops-announce has been silent on this matter as well:

http://lists.freebsd.org/mailman/listinfo/freebsd-ops-announce

At this point users and administrators do not know if newer packages will be made available or if they should stick to building purely from source.

Deep down I'm worried that this will solicit a response of "switch to ports-mgmt/pkg and ports-mgmt/poudriere". While I'm not opposed to the tools themselves, I'm strongly opposed to that kind of response as I'm tired of seeing the security incident being used as an opportunistic crutch (as it was for the sudden cvsup/csup deprecation).

-- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: ACPI broke going from 8 to 9
On Sat, Dec 31, 2011 at 04:17:16PM -0700, Dan Allen wrote:

On 31 Dec 2011, at 12:34 PM, Garrett Cooper wrote:

Not yet. Add 'nooptions NEW_PCIB' to your KERNCONF, recompile, and try booting the new kernel. See if this works.

It worked! No hang, power button works. Nice. I hope this experimental option stays in. Thank you everyone for your help. Happy New Years!

This option isn't documented **anywhere** in the entire src tree. It's purely #ifdef all over.

The code in question was committed 7 months ago. It was MFC'd to RELENG_8 6 months ago. Here's the HEAD commit message:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pci/pci.c#rev1.420

The RELENG_8 MFC is revision 1.386.2.15.

The committer is jhb@, with mav@ being the individual who tested it, so I imagine either of these folks will have some excellent insights as to what's causing Dan's problem. I'm CC'ing them both directly on this thread.

In the meantime: Dan, when you say in your original mail, "I just upgraded my Dell OptiPlex GX270 from RELENG_8 to RELENG_9", can you please provide uname -a output from the system when it was running RELENG_8? I'm looking specifically for the exact time when the kernel was built, because there may have been fixes (that broke things for you) between the above commit and present-day RELENG_8 (I have not examined all commits).

-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On Fri, Dec 23, 2011 at 10:00:05AM -0500, John Baldwin wrote: On Thursday, December 22, 2011 6:58:46 pm Jeremy Chadwick wrote: On Fri, Dec 23, 2011 at 12:44:14AM +0100, O. Hartmann wrote: On 12/21/11 19:41, Alexander Leidinger wrote: Hi, while the discussion continued here, some work started at some other place. Now... in case someone here is willing to help instead of talking, feel free to go to http://wiki.freebsd.org/BenchmarkAdvice and have a look what can be improved. The page is far from perfect and needs some additional people which are willing to improve it. This is only part of the problem. A tuning page in the wiki - which could be referenced from the benchmark page - would be great too. Any volunteers? A first step would be to take he tuning-man-page and wikify it. Other tuning sources are welcome too. Every FreeBSD dev with a wiki account can hand out write access to the wiki. The benchmark page gives contributor-access. If someone wants write access create a FirstnameLastname account and ask here for contributor-access. Don't worry if you think your english is not good enough, even some one- word notes can help (and _my_ english got already corrected by other people on the benchmark page). Bye, Alexander. Nice to see movement ;-) But there seems something unclear: man make.conf(5) says, that MALLOC_PRODUCTION is a knob set in /etc/make.conf. The WiJi says, MALLOC_PRODUCTION is to be set in /etc/src.conf. What's right and what's wrong now? I can say with certainty that this value belongs in /etc/make.conf (on RELENG_8 and earlier at least). src/share/mk/bsd.own.mk has no framework for MK_MALLOC_PRODUCTION, so, this is definitely a make.conf variable. Eh, normal make variables can go in src.conf as well. They do not have to be listed in bsd.own.mk. World builds include /etc/src.conf whereas every make invocation includes /etc/make.conf via sys.mk. 
The only reason to use /etc/src.conf is to have a place to put variables that only affect make buildworld / buildkernel but do not affect other make invocations.

I was always under the impression src.conf(5) variables had to be manually added to bsd.own.mk and similar bits (e.g. src/tools/build/options/WITH_xxx which is what's used to create the src.conf(5) man page), but upon your comment and manual investigation on my part, I found you're indeed right. Taken from bsd.own.mk:

.if !defined(_WITHOUT_SRCCONF)
SRCCONF?=	/etc/src.conf
.if exists(${SRCCONF})
.include "${SRCCONF}"
.endif
.endif

As long as third-party software doesn't depend on MALLOC_PRODUCTION for something (I don't know why something would, but who knows; maybe there's a third-party malloc implementation which might?), then putting it in src.conf would be fine (src/lib/libc/stdlib files reference it).

-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On Fri, Dec 23, 2011 at 12:44:14AM +0100, O. Hartmann wrote:

On 12/21/11 19:41, Alexander Leidinger wrote:

Hi, while the discussion continued here, some work started at some other place. Now... in case someone here is willing to help instead of talking, feel free to go to http://wiki.freebsd.org/BenchmarkAdvice and have a look what can be improved. The page is far from perfect and needs some additional people which are willing to improve it. This is only part of the problem. A tuning page in the wiki - which could be referenced from the benchmark page - would be great too. Any volunteers? A first step would be to take the tuning-man-page and wikify it. Other tuning sources are welcome too. Every FreeBSD dev with a wiki account can hand out write access to the wiki. The benchmark page gives contributor-access. If someone wants write access create a FirstnameLastname account and ask here for contributor-access. Don't worry if you think your english is not good enough, even some one-word notes can help (and _my_ english got already corrected by other people on the benchmark page). Bye, Alexander.

Nice to see movement ;-) But there seems something unclear: man make.conf(5) says, that MALLOC_PRODUCTION is a knob set in /etc/make.conf. The Wiki says, MALLOC_PRODUCTION is to be set in /etc/src.conf. What's right and what's wrong now?

I can say with certainty that this value belongs in /etc/make.conf (on RELENG_8 and earlier at least). src/share/mk/bsd.own.mk has no framework for MK_MALLOC_PRODUCTION, so, this is definitely a make.conf variable.

-- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
tests, reboots, etc. -- hours of work -- and if I get that wrong, it's wasted effort (thus wasted developer time). I want to get it right. :-) -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon, Dec 19, 2011 at 03:20:10PM -0800, Jeremy Chadwick wrote: On Mon, Dec 19, 2011 at 10:56:33PM +, Alexander Best wrote: On Mon Dec 19 11, Poul-Henning Kamp wrote: In message 20111219224700.ga75...@freebsd.org, Alexander Best writes: On Mon Dec 19 11, Poul-Henning Kamp wrote: In message 20111219221617.ga70...@freebsd.org, Alexander Best writes: ps: the hdd only gets mounted read-only! There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error. this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd. Ahh, sorry, I don't know why I thought it was flash. no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)? The improper alignment will result in sub-par write performance, and a slight decrease in read performance writes -- but will not impact life expectancy or harm the drive in any way. This should have read ...slight decrease in read performance, not read performance writes. Editing mistake on my part. :-) -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon, Dec 19, 2011 at 10:56:33PM +, Alexander Best wrote:
> On Mon Dec 19 11, Poul-Henning Kamp wrote:
> > In message 20111219224700.ga75...@freebsd.org, Alexander Best writes:
> > > On Mon Dec 19 11, Poul-Henning Kamp wrote:
> > > > In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
> > > > > ps: the hdd only gets mounted read-only!
> > > >
> > > > There are no known wear effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on the first bit that gives a read error.
> > >
> > > this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd.
> >
> > Ahh, sorry, I don't know why I thought it was flash.
>
> no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)?

The improper alignment will result in sub-par write performance, and a slight decrease in read performance writes -- but will not impact life expectancy or harm the drive in any way.

I recommend strongly that you rectify the situation before you get too carried away with software installations, etc.. And yes, I am aware that what you have is a mechanical HDD, not an SSD (I say this in advance of what I'm about to write).

If you need a safe alignment value, most software on Windows (including Windows 7) picks a value of 2MBytes as the alignment offset, which I believe is LBA 4096, since everything software-wise uses 512-byte sectors. That's calculated via: 2097152 / 512. This number is also evenly divisible by 4096 bytes (which is what you're trying to ensure for performance).

Readers, as well as you, may wonder where the magical 2MByte value comes from, and whether you can pick something smaller. Yes, you can pick something smaller, but the value itself stems from the added complexity of SSDs and NAND erase page size vs. NAND page size. A value of 2MBytes works well on all brands of SSDs on the market (as of this writing).

Which reminds me -- I need to go back and redo most of our systems that use Intel SSDs, since at the time I picked the default offset in sysinstall (LBA 63, thus 64 * 512 = 32KBytes), which though divisible by 4096, is not optimal for NAND erase page size.

I would love to advocate FreeBSD change sysinstall/bsdinstall to use a default offset of 2MBytes, but I imagine that would upset a lot of people who install FreeBSD on limited space devices (CF, etc.). Honestly though, with the size of media these days

> and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment?

USB 3.0 vs. 2.0 vs. eSATA vs. native SATA has no bearing on the situation. Those are transport protocols that define maximum bandwidth.

By the way, the hard disk itself does not support USB 3.0 -- your drive is in an enclosure that contains a SATA-USB3.0 conversion chipset inside. If you open the enclosure, you will find the hard disk is SATA, and probably supports SATA600.

--
| Jeremy Chadwick jdc at parodius.com |
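For anyone who wants to check the alignment arithmetic in this message, here is a minimal Python sketch. It assumes 512-byte logical sectors (as stated above) and a 4096-byte physical sector; the function names are made up for illustration. It only tests divisibility by 4096 bytes and says nothing about NAND erase block sizes, which vary by device.

```python
LOGICAL_SECTOR = 512     # bytes per LBA, as assumed in the message
PHYSICAL_SECTOR = 4096   # bytes per physical sector on a 4KByte-sector drive


def byte_offset(start_lba, sector_size=LOGICAL_SECTOR):
    """Byte offset at which a partition starting at start_lba begins."""
    return start_lba * sector_size


def is_aligned(start_lba, boundary=PHYSICAL_SECTOR, sector_size=LOGICAL_SECTOR):
    """True if the partition start falls on a multiple of boundary bytes."""
    return byte_offset(start_lba, sector_size) % boundary == 0


def lba_for_offset(offset_bytes, sector_size=LOGICAL_SECTOR):
    """LBA (counting from 0) that corresponds to a byte offset."""
    if offset_bytes % sector_size != 0:
        raise ValueError("offset is not a whole number of sectors")
    return offset_bytes // sector_size


if __name__ == "__main__":
    print("2MByte offset starts at LBA", lba_for_offset(2 * 1024 * 1024))
    for lba in (63, 64, 4096):
        print("LBA", lba, "->", byte_offset(lba), "bytes, aligned:", is_aligned(lba))
```

Running it prints the byte offset and alignment status for a start at LBA 63, LBA 64 and the 2MByte mark, so the three cases can be compared side by side.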
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
/universities; the first book is basically a beginner's guide to CPU architecture. The book is also a bit old at that. Individual proceeded to look up where the article author went to school, and noted that said school's CPU architecture course **ends** with that book.

The user/viewer demographic of overclockers.com is going to be significantly different from that of phoronix.com -- you know that I'm sure. The point is that you should be aware that there is going to be significant discussions that come from publishing such benchmark comparisons with such a demographic. Things that indicate severe performance differential (e.g. 10x to 100x worse) are going to be focused on and criticised -- and hopefully in a socially-agreeable manner[1] -- and in a much different way than, say, a 3D video card review site (lol ur pc sux if u spend onl $4000 on it lol).

The first step is to try and figure out what exactly you're seeing and why it's so significantly different when compared to other OSes.

[1]: I'm sure by now you know that the BSDs in general tend to harbour a community of folks who are more argumentative/aggressive than, say, Linux (generally speaking). In this thread though, I think all of us really want to assist in some way to figure out what exactly is going on here, scheduler-wise, and see if we can put something together to hand developers who are responsible for said code and see what comes of it. Remember, we're all here to try and make things better... I hope. :-)

Footnote: It's nice meeting you (indirectly), I was always curious who did the phoronix.com reviews/stuff when it came to FreeBSD. Greetings!

--
| Jeremy Chadwick jdc at parodius.com |
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On Thu, Dec 15, 2011 at 05:32:47AM -0700, Samuel J. Greear wrote:
> > Well, the only way it's going to get fixed is if someone sits down, replicates it, and starts to document exactly what it is that these benchmarks are/aren't doing.
>
> I think you will find that investigation is largely a waste of time, because not only are some of these benchmarks just downright silly, there are huge differences in the environments (compiler versions), etc., etc. leading to a largely apples/oranges comparison. But also the analysis and reporting of the results by Phoronix is simply moronic to the point of being worse than useless, they are spreading misinformation.
>
> Take the first test as an example, Blogbench read. This doesn't raise any red flags, right? At least not until you realize that Blogbench isn't a read test, it's a read/write test. So what they have done here is run a read/write test and then thrown away the write results for both platforms and reported only the read results. If you dig down into the actual results, http://openbenchmarking.org/result/1112113-AR-ORACLELIN37 -- you will see two Blogbench numbers, one for read and another for write. These were both taken from the same Blogbench run, so FreeBSD optimizes writes over reads, that's probably a good thing for your data but a bad thing when someone totally misrepresents benchmark results.
>
> Other benchmarks in the Phoronix suite and their representations are similarly flawed, _ALL_ of these results should be ignored and no time should be wasted by any FreeBSD committer further evaluating this garbage. (Yes, I have been down this rabbit hole).

For sake of argument, let's say we throw out the Phoronix benchmarks as a data source (I don't think the benchmark specifically implied or stated this is all because of SCHED_ULE though; remember, that's what we're supposed to be focusing on. There may not be a direct correlation between the Phoronix benchmarks and the ULE issue reported here...).

That said: thrown out, data ignored, done. Now what? Where are we? We're right back where we were a day or two ago; meaning no closer to solving the dilemma reported by users and SCHED_ULE. Heck, we're not even sure if there is an issue, other than some folks confirming that SCHED_4BSD performs better for them (that's what started this whole thread), and there are at least a couple who have stated this.

So given the above semi-devil's-advocate response -- Sam, do you have something positive or progressive to offer so we can move forward on the ULE vs. 4BSD debacle? :-) The smiley is meant to be sincere, not sarcastic.

I'm getting to the point where I'm considering formulating a private mail to Jeff Roberson, requesting that he be aware of the discussion that's happening (not that he necessarily follow or read it), and that based on what I can tell we're at a roadblock -- nobody so far is absolutely certain how to benchmark and compare ULE vs. 4BSD in multiple ways, so that those of us involved here can run such utilities and provide the data somewhere central for devs to review.

I only mention this because so far I haven't seen anyone really say "okay, this is what we should be using for these kinds of tests". Yay nature of the beast.

--
| Jeremy Chadwick jdc at parodius.com |
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:26:27PM +0100, Attilio Rao wrote:
> 2011/12/13 Jeremy Chadwick free...@jdc.parodius.com:
> > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
> > > > Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...]
> > >
> > > Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with a ncpu > 2. But in the end I see here contradictory statements. People complain about poor performance (especially in scientific environments), and others counter that this is not the case.
> > >
> > > Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the available schedulers.
> >
> > This is in no way shape or form the same kind of benchmark as what you're planning to do, but I thought I'd throw it out there for folks to take in as they see fit.
> >
> > I know folks were focused mainly on buildworld. I personally would find it interesting if someone with a higher-end system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test (changing -jX to -j{numofcores} of course).
> >
> > sched_ule
> > =========
> > - time make -j2 buildworld
> >   1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
> > - time make -j2 buildkernel
> >   640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w
> >
> > sched_4bsd
> > ==========
> > - time make -j2 buildworld
> >   1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
> > - time make -j2 buildkernel
> >   638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w
> >
> > software
> > ========
> > * sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011
> > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011
>
> Hi Jeremy, thanks for the time you spent on this. However, I wanted to ask/let you note 3 things:
>
> 1) Did you use 2 different code bases for the test? (one updated on December 1 and another one on December 12)

No; src-all (/usr/src on this system) was not updated between December 1st and December 12th PST. I do believe I updated it today (15th PST). I can/will obviously hold off so that we have a consistent code base for comparing numbers between schedulers during buildworld and/or buildkernel.

> 2) Please note that you should have repeated this test several times (basically until you get a standard deviation which is acceptable with ministat) and report the ministat output

This is the first time I have heard of ministat(1). I'm pretty sure I see what it's for and how it applies to this situation, but boy that man page could use some clarification (I have 3 people looking at this thing right now trying to figure out what means what in the graph :-) ). Anyway, graph or not, I see the point.

Regarding multiple tests: yup, you're absolutely right, the only way to do it would be to run a sequence of tests repeatedly (probably 10 per scheduler). Reboots and rm -fr /usr/obj/* would be required after each test too, to guarantee empty kernel caches (of all types) consistently every time.

What I posted was supposed to give people just a general idea if there was any gigantic difference between the two, and there really isn't. But, as others have stated (and you below), buildworld may not be an effective way to benchmark what we're trying to test. Hence me wondering exactly what would make for a good test. Example:

1. Run + background some program that beats on things (I really don't know what; creation/deletion of threads? CPU benchmark? bonnie++?), with output going to /dev/null.
2. Run + background "time make -j2 buildworld" with output going to /dev/null
3. Record/save output from time.
4. rm -fr /usr/obj && shutdown -r now
5. Repeat all steps ~10 times
6. Adjust kernel configuration file to use other scheduler
7. Repeat steps 1-5.

What I'm trying to figure out is what #1 and #2 should be in the above example.

> 3) The difference is less than 2% which I suspect is really
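To illustrate the point about repeating runs, here is a rough Python sketch that summarises a set of elapsed times the way one might before handing them to ministat(1). It is not ministat: it only reports count, min, max, mean and sample standard deviation, and does no significance test. The function name and dictionary layout are made up for illustration. The only inputs used are the two buildworld elapsed times already posted in this thread (18:46.20 and 17:12.02, converted to seconds), which shows why a single run per scheduler gives no standard deviation at all.

```python
import statistics


def summarize(elapsed_seconds):
    """Summarise repeated elapsed times (in seconds) for one scheduler."""
    n = len(elapsed_seconds)
    if n == 0:
        raise ValueError("need at least one measurement")
    return {
        "n": n,
        "min": min(elapsed_seconds),
        "max": max(elapsed_seconds),
        "mean": statistics.mean(elapsed_seconds),
        # A sample standard deviation needs at least two runs.
        "stddev": statistics.stdev(elapsed_seconds) if n > 1 else None,
    }


if __name__ == "__main__":
    # One buildworld run per scheduler, taken from the figures in this thread.
    runs = {
        "sched_ule": [18 * 60 + 46.20],
        "sched_4bsd": [17 * 60 + 12.02],
    }
    for scheduler, times in runs.items():
        print(scheduler, summarize(times))
```

With roughly ten runs per scheduler, as suggested above, the "stddev" field becomes meaningful and the two means can be compared against the spread of each set.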
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
> > Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...]
>
> Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with a ncpu > 2. But in the end I see here contradictory statements. People complain about poor performance (especially in scientific environments), and others counter that this is not the case.
>
> Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intel's new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing to perform some benchmarks on the same hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend also to look at performance with both of the available schedulers.

This is in no way shape or form the same kind of benchmark as what you're planning to do, but I thought I'd throw it out there for folks to take in as they see fit.

I know folks were focused mainly on buildworld. I personally would find it interesting if someone with a higher-end system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test (changing -jX to -j{numofcores} of course).

--
| Jeremy Chadwick jdc at parodius.com |

sched_ule
=========
- time make -j2 buildworld
  1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
- time make -j2 buildkernel
  640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w

sched_4bsd
==========
- time make -j2 buildworld
  1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
- time make -j2 buildkernel
  638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w

software
========
* sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011
* sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011

hardware
========
* Intel Core 2 Duo E8400, 3GHz
* Supermicro X7SBA
* 8GB ECC RAM (4x2GB), DDR2-800
* Intel 320-series SSD, 80GB: /, swap, /var, /tmp, /usr

tuning adjustments / etc.
=========================
* Before each scheduler test, system was rebooted to ensure I/O cache and other whatnots were empty
* All filesystems stock UFS2 + SU (root is non-SU)
* All filesystems had tunefs -t enable applied to them
* powerd(8) in use, with two rc.conf variables (per CPU spec):

  performance_cx_lowest=C2
  economy_cx_lowest=C2

* loader.conf

  kern.maxdsiz=2560M
  kern.dfldsiz=2560M
  kern.maxssiz=256M
  ahci_load=yes
  hint.p4tcc.0.disabled=1
  hint.acpi_throttle.0.disabled=1
  vfs.zfs.arc_max=5120M

* make.conf

  CPUTYPE?=core2

* src.conf

  WITHOUT_INET6=true
  WITHOUT_IPFILTER=true
  WITHOUT_LIB32=true
  WITHOUT_KERBEROS=true
  WITHOUT_PAM_SUPPORT=true
  WITHOUT_PROFILE=true
  WITHOUT_SENDMAIL=true

* kernel configuration - note: between kernel builds, config was changed to either use SCHED_4BSD or SCHED_ULE respectively.

  cpu          HAMMER
  ident        GENERIC

  makeoptions  DEBUG=-g        # Build kernel with gdb(1) debug symbols

  options      SCHED_4BSD      # Classic BSD scheduler
  #options     SCHED_ULE       # ULE scheduler
  options      PREEMPTION      # Enable kernel thread preemption
  options      INET            # InterNETworking
  options      FFS             # Berkeley Fast Filesystem
  options      SOFTUPDATES     # Enable FFS soft updates support
  options      UFS_ACL         # Support for access control lists
  options      UFS_DIRHASH     # Improve performance on big directories
  options      UFS_GJOURNAL    # Enable gjournal-based UFS journaling
  options      MD_ROOT         # MD is a potential root device
  options      NFSCLIENT       # Network Filesystem Client
  options      NFSSERVER       # Network Filesystem Server
  options      NFSLOCKD        # Network Lock Manager
  options      NFS_ROOT        # NFS usable as /, requires NFSCLIENT
  options      MSDOSFS         # MSDOS Filesystem
  options      CD9660          # ISO 9660 Filesystem
  options      PROCFS          # Process filesystem (requires PSEUDOFS)
  options      PSEUDOFS        # Pseudo-filesystem framework
  options      GEOM_PART_GPT   # GUID Partition Tables.
  options
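For comparing the posted figures without doing the minutes-to-seconds conversion by hand, here is a small Python sketch that parses a csh-style time(1) summary line of the form shown above (user seconds, system seconds, elapsed, CPU percentage) and computes a percentage change between two values. The regular expression and function names are my own and are only known to match the lines quoted in this message; other shells and time formats will need adjusting.

```python
import re

# Matches e.g. "1689.831u 229.328s 18:46.20 170.4% ..."
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<cpu>[\d.]+)%"
)


def elapsed_to_seconds(text):
    """Convert "MM:SS.ss" or "H:MM:SS" style elapsed time to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line):
    """Extract user, sys, elapsed (seconds) and CPU percentage from a line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("not a recognised time summary: %r" % line)
    return {
        "user": float(match.group("user")),
        "sys": float(match.group("sys")),
        "elapsed": elapsed_to_seconds(match.group("elapsed")),
        "cpu_percent": float(match.group("cpu")),
    }


def percent_change(baseline, other):
    """Change of other relative to baseline, in percent."""
    return (other - baseline) / baseline * 100.0


if __name__ == "__main__":
    ule = parse_time_line(
        "1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w")
    bsd = parse_time_line(
        "1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w")
    for key in ("user", "sys", "elapsed"):
        print(key, "4BSD vs ULE: %+.2f%%" % percent_change(ule[key], bsd[key]))
```

For the single buildworld pair above, this works out to roughly 1.6% less user CPU time and roughly 8.4% less elapsed time for the 4BSD run. With only one run per scheduler, and very different block I/O counts between the two, those figures should not be read as a measurement of the schedulers themselves.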
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 12:13:42PM +0100, O. Hartmann wrote:
> On 12/12/11 16:13, Vincent Hoffman wrote:
> > On 12/12/2011 13:47, O. Hartmann wrote:
> > > > Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance than SCHED_4BSD. [...]
> > >
> > > Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned that SCHED_ULE has better performance on boxes with a ncpu > 2. But in the end I see here contradictory statements. People complain about poor performance (especially in scientific environments), and others counter that this is not the case.
> >
> > It's all a little old now but some of the stuff in http://people.freebsd.org/~kris/scaling/ covers improvements that were seen. http://jeffr-tech.livejournal.com/5705.html shows a little too, reading through Jeff's blog is worth it as it has some interesting stuff on SCHED_ULE. I thought there were some more benchmarks floating round but can't find any with a quick google.
> >
> > Vince
>
> Interesting, there seems to be a much more performant scheduler in 7.0, called SCHED_SMP. I have some faint recollection of that ... where has this beast gone?

Boy I sure hope I remember this right. I strongly urge others to correct me where I'm wrong; thanks in advance!

The classic scheduler, SCHED_4BSD, was implemented back before there was oxygen. sched_4bsd(4) mentions this. No need to discuss it.

Jeff Roberson began working on the first-generation ULE scheduler during the days of FreeBSD 5.x (I believe 5.1), and a paper on it was presented at USENIX circa 2003:

http://www.usenix.org/event/bsdcon03/tech/full_papers/roberson/roberson.pdf

Over the following years, Jeff (and others I assume -- maybe folks like George Neville-Neil and/or Kirk McKusick?) adjusted and tinkered with some of the semantics and models/methods. If I remember right, some of these quirks/fixes were committed. All of this was happening under the scheduler that was then called SCHED_ULE, but it was "ULE 1.0" for lack of better terminology. This scheduler did not perform well, if I remember right, and Jeff was quite honest about that.

From this point forward, Jeff began idealising and working on a scheduler which he called SCHED_SMP -- think of it as "ULE 2.0", again, for lack of better terminology. It was different than the existing SCHED_ULE scheduler, hence a different name. Jeff blogged about this in early 2007, using exactly that term ("ULE 2.0"):

http://jeffr-tech.livejournal.com/3729.html

In mid-2007, prior to FreeBSD 7.0-RELEASE, Jeff announced that effectively he wanted to make SCHED_ULE do what SCHED_SMP did, and provided a patch to SCHED_ULE to accomplish just that:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-07/msg00755.html

Full thread is here (beware -- many replies):

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-07/threads.html#00755

The patch mentioned above was merged into HEAD on 2007/07/19.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c#rev1.202

So in effect, as of 2007/07/19, SCHED_ULE became SCHED_SMP.

FreeBSD 7.0-RELEASE was released on 2008/02/27, and the above commit/changes were available at that time as well (meaning: RELENG_7 and RELENG_7_0 at that moment in time should have included the patch from the above paragraph). The document released by Kris Kennaway hinted at those changes and performance improvements:

http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

Keep in mind, however, that at that time kernel configuration files (GENERIC, etc.) still defaulted to SCHED_4BSD.

The default scheduler in kernel config files (GENERIC, etc.) for i386 and amd64 (not sure about others) was changed on 2007/10/19:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/conf/GENERIC#rev1.475
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/amd64/conf/GENERIC#rev1.485

This was done *prior* to FreeBSD 7.1-RELEASE. So, it first became available as the default scheduler for the masses when 7.1-RELEASE came out on 2009/01/05.

All of the answers, in a roundabout and non-user-friendly way, are available by examining the commit history for src/sys/kern/sched_ule.c. It's hard to follow especially given that you have to consider all the releases/branchpoints that took place over time, but:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c

Are we having fun yet? :-)

--
| Jeremy Chadwick jdc at parodius.com |
Re: zfs i/o hangs on 9-PRERELEASE
On Sat, Nov 26, 2011 at 04:47:35PM -0600, Mark Felder wrote:
> It appears that I'm mistaken about those messages then. However this does both happen on my AMD x6 and Intel Atom machines with different hard drives, controllers, etc. I feel it would be unlikely to be hardware. Unfortunately the procstat command is probably of no use because I can't interact with the console or ssh for the periods of time when it is hanging (sometimes in excess of a minute).
>
> Zpool scrubs come up clean and I never see any errors reported. I've been running this hardware for 2 years and v28 for quite some time. It doesn't seem like it started happening until I upgraded to a build past RC1. I don't know where to find RC1 media and I don't know the svn revision of RC1 so I haven't tried.

The kernel backtrace you provided indicates a problem in pf(4), not ZFS. What piece am I missing?

--
| Jeremy Chadwick jdc at parodius.com |
Re: SIOCGIFADDR broken on 9.0-RC1?
On Tue, Nov 15, 2011 at 11:35:37PM +0100, GR wrote:
> From Kristof Provost kris...@sigsegv.be:
> [..]
> > The 'ia' pointer is later used to return the IP address. In other words: it returns the first address on the interface of type AF_INET (which isn't assigned to a jail).
> >
> > I think the order of the addresses is not fixed, or rather it depends on the order in which you assign addresses. In the handling of SIOCSIFADDR new addresses are just appended:
> >
> > TAILQ_INSERT_TAIL(&ifp->if_addrhead, ifa, ifa_link);
> >
> > I don't believe this has changed since 8.0. Is it possible something changed in the network initialisation, leading to the addresses being assigned in a different order?
> >
> > Eagerly awaiting to be told I'm wrong,
> > Kristof
>
> Thanks Kristof. It appears you are right, the order of assignment is important. I configured my interface using DHCP, and added aliases (all in /etc/rc.conf). But on the 8.2-RELEASE, I used static configuration. So, I switched to static assignment and it changes the behaviour (and fixes the bug).
>
> My guess is that during the time waiting for the DHCP offer, all aliases are already configured on the network interface, and the IP address given by DHCP is added at the end of the tail.
>
> Is that a wanted behaviour? I find it dangerous (i.e. not exactly what a user is expecting). Note: my aliases are attributed to jails.

I would recommend adding synchronous_dhclient=yes to /etc/rc.conf. This will cause dhclient (the DHCP client) to wait until it gets an answer + IP back from the DHCP server before continuing with the rc.d scripts. The default is no.

--
| Jeremy Chadwick jdc at parodius.com |
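To make the ordering effect described in this thread concrete, here is a toy Python model. It is only a sketch of the behaviour as described above (new addresses appended at the tail, lookup returning the first IPv4 address), not the kernel code; the class, method names and the 192.0.2.x documentation addresses are all made up, and the jail check mentioned by Kristof is left out.

```python
import socket


class Interface:
    """Toy model of an interface whose address list is append-only."""

    def __init__(self, name):
        self.name = name
        self.addrs = []  # stands in for the if_addrhead tail queue

    def add_address(self, family, addr):
        # Mirrors the TAILQ_INSERT_TAIL behaviour: always append at the end.
        self.addrs.append((family, addr))

    def first_inet_address(self):
        # Roughly what the SIOCGIFADDR path is described as doing:
        # return the first AF_INET address found on the list.
        for family, addr in self.addrs:
            if family == socket.AF_INET:
                return addr
        return None


if __name__ == "__main__":
    # Static configuration: primary address first, then the alias.
    static = Interface("em0")
    static.add_address(socket.AF_INET, "192.0.2.10")   # primary (hypothetical)
    static.add_address(socket.AF_INET, "192.0.2.20")   # jail alias (hypothetical)
    print("static:", static.first_inet_address())

    # Asynchronous DHCP: the alias is configured before the lease arrives.
    dhcp = Interface("em0")
    dhcp.add_address(socket.AF_INET, "192.0.2.20")     # jail alias (hypothetical)
    dhcp.add_address(socket.AF_INET, "192.0.2.10")     # DHCP lease (hypothetical)
    print("dhcp:  ", dhcp.first_inet_address())
```

In the second case the alias ends up first on the list, which matches the symptom GR reported. Making the DHCP client finish before aliases are added restores the first ordering.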
Re: FreeBSD 10.0-CURRENT/amd64: Weirdness with LOCALE settings: ghostswitching in csh?
On Fri, Nov 04, 2011 at 07:49:52AM +0100, O. Hartmann wrote:
> On 11/03/11 23:48, Jeremy Chadwick wrote:
> > On Thu, Nov 03, 2011 at 11:17:08PM +0100, O. Hartmann wrote:
> > > Hello. I realised something weird in FreeBSD 10.0-CURRENT/amd64 (CLANG compiled), built as of today (buildworld). Working the whole day coding some python scripts and committing the code to my subversion server (most recent subversion from the ports collection, the server is a FreeBSD 9.0-RC1/amd64 box, also system compiled with CLANG, most recent as compiled world of today), suddenly, out of the blue, trying again to commit I get this error:
> > >
> > > svn: warning: cannot set LC_CTYPE locale
> > > svn: warning: environment variable LC_CTYPE is de_DE.ISO-8859-1
> > > svn: warning: please check that your locale name is correct
> > >
> > > Checking csh shell setting with 'locale':
> > >
> > > LANG=
> > > LC_CTYPE=C
> > > LC_COLLATE=C
> > > LC_TIME=C
> > > LC_NUMERIC=C
> > > LC_MONETARY=C
> > > LC_MESSAGES=C
> > > LC_ALL=
> > >
> > > Checking my settings from /etc/csh.cshrc and ./.cshrc or .login reveals localised settings for some of the locales as I need those (set in $HOME/.cshrc):
> > >
> > > setenv LC_CTYPE de_DE.ISO-8859-1
> > > setenv LC_TIME de_DE.ISO-8859-1
> > > setenv LC_MONETARY de_DE.ISO-8859-1
> > >
> > > What is going on? I have now seen this behaviour several times. The first time I thought I had done something and couldn't remember, but this time only two terminal windows were open and the whole day committing data to the repository wasn't an issue. Is there an explanation for this?
> >
> > It sounds like a problem specific to the client end, meaning your -CURRENT box. If that's the case: shouldn't this mail have gone to freebsd-current@ instead of freebsd-stable@ ? What am I missing?
>
> Mea culpa, mea culpa, mea maxima culpa! It was intended to send the mail to CURRENT. Sorry, missed the list entry by one row ... Can you please be so kind and show mercy?

No worries. I wasn't sure if there was a reason -stable was involved; I saw it and thought "Hmm, he mentions a 9.0-RC1/amd64 box, maybe that's where the problem is? I must be missing something", so I thought I'd ask. Mistakes happen, especially ones from me! :-)

> > As for your problem: your locale looks incorrect. It's de_DE.ISO8859-1. Note that yours has an extra hyphen, which probably explains the error (sort of).
> >
> > $ ls -ld /usr/share/locale/de_DE*
> > drwxr-xr-x  2 root  wheel  512 Sep 28 14:36 /usr/share/locale/de_DE.ISO8859-1/
> > drwxr-xr-x  2 root  wheel  512 Sep 28 14:36 /usr/share/locale/de_DE.ISO8859-15/
> > drwxr-xr-x  2 root  wheel  512 Sep 28 14:36 /usr/share/locale/de_DE.UTF-8/
> >
> > As for the fact that it's random: I cannot explain why a sub-shell might get spawned in some cases but not others.
>
> I corrected this. Sorry. I feel a bit confused, since sometimes it is ISO-8859-1 and sometimes ISO8859-1. I got confused again. After correcting that, the locale variables have been set correctly. I will check now whether this also influences this weird random behaviour.

I find the randomness of the situation more perplexing than fixing your locale (I have a feeling you do too). I imagine this will probably fix the errors you were seeing, but I'm still surprised that the errors would happen seemingly intermittently. I wonder how one could go about debugging such a thing. Hmm.

Are there any environment complexities you might have, such as using GNU screen or tmux? I'm cycling through ideas as they come to me.

--
| Jeremy Chadwick jdc at parodius.com |
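Since the ISO-8859-1 vs. ISO8859-1 spelling is easy to trip over, here is a small Python sketch of one way to catch it: compare a requested locale name against the installed names while ignoring hyphens and case. The function names are hypothetical, and the list of available locales is simply the three directory names from the ls output quoted above; on a real system it could be built from the contents of /usr/share/locale instead.

```python
def canonical(name):
    """Fold a locale name for comparison: drop hyphens, ignore case."""
    return name.replace("-", "").lower()


def suggest_locale(requested, available):
    """Return the installed locale matching requested, or None.

    An exact match wins; otherwise the first installed name that is
    identical once hyphens and case are ignored is returned.
    """
    if requested in available:
        return requested
    wanted = canonical(requested)
    for candidate in available:
        if canonical(candidate) == wanted:
            return candidate
    return None


if __name__ == "__main__":
    # Directory names taken from the ls output in this thread.
    installed = ["de_DE.ISO8859-1", "de_DE.ISO8859-15", "de_DE.UTF-8"]
    for name in ("de_DE.ISO-8859-1", "de_DE.UTF-8"):
        print(name, "->", suggest_locale(name, installed))
```

This only helps with the misspelling; it does nothing to explain why the error showed up intermittently.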
Re: can audio CDs be played with ATA_CAM ?
On Tue, Oct 25, 2011 at 01:18:47PM +0200, Claude Buisson wrote:
> On 10/25/2011 12:52, Daniel O'Connor wrote:
> > On 25/10/2011, at 20:45, Claude Buisson wrote:
> > > When upgrading a system to 8.2-STABLE, I switched my kernel from atapicam to ATA_CAM, and found that vlc could not play audio CDs anymore. Reverting to atapicam (and reverting from cdN to acdN of course), vlc was OK again.
> > >
> > > It seems that I am not the only one having this kind of problem, as I found (for example) this message on questions@ (for releng9):
> > >
> > > http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234737.html
> > >
> > > Is this a known problem ? Is somebody working on it ?
> >
> > Have you tried pointing VLC at /dev/cd0 when using ATA_CAM?
>
> Of course yes ! (I even configured WITH_CDROM_DEVICE=/dev/cd1 when building VLC)
>
> > It may be trying old style ATA ioctls based on the device name.
>
> VLC recognizes the tracks and jumps quickly from one to the next, without playing it, and with a flow of messages:
>
> [0x2caf2a3c] cdda access error: Could not set block size
> [0x2caf2a3c] cdda access error: cannot read sector n
>
> where the sector number is incremented, and then emits (2 times if I remember):
>
> [0x2af28bc] es demux error: cannot peek
>
> Sorry for having omitted these messages in the previous mail.
>
> I found a PR 161760 about cdparanoia needing to be patched for 9.0 with CAM, and a proposal by avg@ related to libxine:
>
> http://lists.freebsd.org/pipermail/freebsd-multimedia/2010-December/011414.html
>
> These may not be the same problem, but I think they are related (a not so well documented change in the kernel interface).

You want atapicam(4). This is not the same thing as "options ATA_CAM". See /sys/conf/NOTES. Whether or not it works with audio CDs is unknown to me.

--
| Jeremy Chadwick jdc at parodius.com |
Re: avl_find() panic
On Wed, Jul 06, 2011 at 12:21:55AM +, John wrote: I have a system that panic'd this morning, 4 day old current (2011-07-01_11.45pm). Message typed in from the console immediately after reboot. OS on ufs, data volumes on zfs. ZFS filesystem version 5 ZFS storage pool version 28 panic: avl_find() succedded inside avl_find() Unfortunately, I don't have a traceback for this. The comment in avl.c makes it seem like the avl code is enforcing uniqueness in calling code, esp. where it talks about kernel vs userland. I'll followup with more info if this replicates. Cross-posting is generally shunned, but since this is a current thing, adding freebsd-current to the CC list. -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: snd_hda : sometimes sound sometimes not
On Sat, May 28, 2011 at 03:30:26PM +0200, David Demelier wrote:
On 12/05/2011 08:47, David Demelier wrote:

Hello, I don't know if there have been a lot of changes to the snd_hda driver in the -STABLE branch, but since I upgraded to it, sometimes I have sound and sometimes not. The mixer settings are exactly the same when this happens. It happened this morning: after booting I did not have any sound. I rebooted and suddenly I had sound again...

I only tweak snd_hda(4) for a pin sense on the front panel (it has no sound either), so I added this to /boot/device.hints:

hint.hdac.1.cad0.nid27.config="as=1 seq=15"

Here are both dmesgs, ok.txt from when sound is there and nok.txt from when there isn't; as you can see, there is no difference related to the hda driver.

http://markand.malikania.fr/ok.txt
http://markand.malikania.fr/nok.txt

I have a guess. My laptop has a mute shortcut; if I press it at the BIOS stage I will not have sound either, so is it possible that my chipset is being muted by something?

Cheers,

Sorry to cross-post again, but I just wanted to tell you that the problem disappeared in -CURRENT, so now I just hope the fix for the unknown bogus code will be MFC'd before 8.3-RELEASE.

Unless someone can chime in with details of the commits which changed, assuming the magic change will be MFC'd is a bad assumption. It's safe to say that if this problem haunts you again when 8.3-RELEASE comes out, you will be mailing the list about it, and this cycle will continue until 9.0-RELEASE comes out.

Does any developer/committer have familiarity with this issue and have some ideas as to what may have changed in CURRENT that addresses David's issue? And if so, can that code be MFC'd safely, or patches provided to David for RELENG_8 that he can try out?

I'm CC'ing mav@ here (snd_hda(4) says he's one of the authors), although he may not have any knowledge of the code which may need to be MFC'd. He may be able to point us to who has a better idea though.
--
Jeremy Chadwick
Re: ACL issue (Was Re: HEADS UP: ZFSv28 is in!)
On Sun, Mar 06, 2011 at 09:43:34AM -0500, Steve Wills wrote:
On 03/06/11 08:35, Steve Wills wrote:
On 03/06/11 04:22, Edward Tomasz Napierała wrote:
Message written by Steve Wills on 2011-03-06, at 05:11:

[..] Thanks for your work on this, I'm very happy to have ZFS v28. I just updated my -CURRENT system from a snapshot from about a month ago to code from today. I have 3 pools and one of them is for ports tinderbox. I only upgraded that pool. When I try to build something using tinderbox, I get this error:

cp: failed to set acl entries for /usr/local/tinderbox/9-CURRENT-amd64-FreeBSD/buildscript: Operation not supported

What does mount show?

/dev/md4 12186190 332724 11853466 3% /usr/local/tinderbox/9-CURRENT-amd64-FreeBSD

Sorry, I forgot about the mdmfs hacks I had in my local tinderd. Without them, it works fine. So the problem seems to be in mfs rather than zfs.

I should have said mdmfs, but all that's doing is running mdconfig and newfs for me. I've reproduced the issue without mdmfs:

% mdconfig -a -t swap -s 12G -u 4
% newfs -m 0 -o time /dev/md4
[...]
% mount /dev/md4 /tmp/foobar
% cp -p /usr/local/tinderbox/scripts/lib/buildscript /tmp/foobar
cp: failed to set acl entries for /tmp/foobar/buildscript: Operation not supported

Without -p it works fine. FWIW:

% getfacl /usr/local/tinderbox/scripts/lib/buildscript
# file: /usr/local/tinderbox/scripts/lib/buildscript
# owner: root
# group: wheel
owner@:--:--:deny
owner@:rwxp---A-W-Co-:--:allow
group@:-w-p--:--:deny
group@:r-x---:--:allow
everyone@:-w-p---A-W-Co-:--:deny
everyone@:r-x---a-R-c--s:--:allow

Any suggestions on where the problem could be?

At first glance it looks like acl_set_fd_np(3) isn't working on an md-backed filesystem; specifically, it's returning EOPNOTSUPP. You should be able to reproduce the problem by doing a setfacl on something in /tmp/foobar.
Looking through src/bin/cp/utils.c, this is the code:

    if (acl_set_fd_np(dest_fd, acl, acl_type) < 0) {
            warn("failed to set acl entries for %s", to.p_path);
            acl_free(acl);
            return (1);
    }

EOPNOTSUPP for acl_set_fd_np(3) is defined as:

    [EOPNOTSUPP] The file system does not support ACL retrieval.

This would be referring to the destination filesystem. Looking through the md(4) source for references to EOPNOTSUPP, we do find some references:

$ egrep -n -r "EOPNOTSUPP|ENOTSUP" /usr/src/sys/dev/md
/usr/src/sys/dev/md/md.c:423: return (EOPNOTSUPP);
/usr/src/sys/dev/md/md.c:475: error = EOPNOTSUPP;
/usr/src/sys/dev/md/md.c:523: return (EOPNOTSUPP);
/usr/src/sys/dev/md/md.c:601: return (EOPNOTSUPP);
/usr/src/sys/dev/md/md.c:731: error = EOPNOTSUPP;

Line 423 is within mdstart_malloc(), and it returns EOPNOTSUPP on any BIO operation other than READ/WRITE/DELETE. Line 475 is a continuation of that.

Line 523 is within mdstart_vnode(), behaving effectively the same as line 423.

Line 601 is within mdstart_swap(), behaving effectively the same as line 423.

Line 731 is within md_kthread(), and indicates only BIO operation BIO_GETATTR is supported. This would not be an ACL attribute thing, but rather getting attributes of the backing device itself. The code hints at that:

    722         if (bp->bio_cmd == BIO_GETATTR) {
    723                 if ((sc->fwsectors && sc->fwheads &&
    724                     (g_handleattr_int(bp, "GEOM::fwsectors",
    725                     sc->fwsectors) ||
    726                     g_handleattr_int(bp, "GEOM::fwheads",
    727                     sc->fwheads))) ||
    728                     g_handleattr_int(bp, "GEOM::candelete", 1))
    729                         error = -1;
    730                 else
    731                         error = EOPNOTSUPP;
    732         } else {

This leaves me with some ideas; just tossing them out here...

1. Maybe/somehow this is caused by swap being used as the backing type/store for md(4)? Try using mdconfig -t malloc -o reserve instead, temporarily anyway.

2. Are you absolutely 100% sure the kernel you're using was built with options UFS_ACL defined in it? Doing a strings -a /boot/kernel/kernel | grep UFS_ACL should suffice.
--
Jeremy Chadwick
Re: ACL issue (Was Re: HEADS UP: ZFSv28 is in!)
On Sun, Mar 06, 2011 at 11:06:09AM -0500, Steve Wills wrote:
On 03/06/11 10:37, Jeremy Chadwick wrote:

At first glance it looks like acl_set_fd_np(3) isn't working on an md-backed filesystem; specifically, it's returning EOPNOTSUPP. You should be able to reproduce the problem by doing a setfacl on something in /tmp/foobar.

Looking through src/bin/cp/utils.c, this is the code:

    if (acl_set_fd_np(dest_fd, acl, acl_type) < 0) {
            warn("failed to set acl entries for %s", to.p_path);
            acl_free(acl);
            return (1);
    }

EOPNOTSUPP for acl_set_fd_np(3) is defined as:

    [EOPNOTSUPP] The file system does not support ACL retrieval.

This would be referring to the destination filesystem. Looking through the md(4) source for references to EOPNOTSUPP, we do find some references:

$ egrep -n -r "EOPNOTSUPP|ENOTSUP" /usr/src/sys/dev/md
/usr/src/sys/dev/md/md.c:423: return (EOPNOTSUPP);
/usr/src/sys/dev/md/md.c:475: error = EOPNOTSUPP;
/usr/src/sys/dev/md/md.c:523: return (EOPNOTSUPP);
/usr/src/sys/dev/md/md.c:601: return (EOPNOTSUPP);
/usr/src/sys/dev/md/md.c:731: error = EOPNOTSUPP;

Line 423 is within mdstart_malloc(), and it returns EOPNOTSUPP on any BIO operation other than READ/WRITE/DELETE. Line 475 is a continuation of that.

Line 523 is within mdstart_vnode(), behaving effectively the same as line 423.

Line 601 is within mdstart_swap(), behaving effectively the same as line 423.

Line 731 is within md_kthread(), and indicates only BIO operation BIO_GETATTR is supported. This would not be an ACL attribute thing, but rather getting attributes of the backing device itself. The code hints at that:

    722         if (bp->bio_cmd == BIO_GETATTR) {
    723                 if ((sc->fwsectors && sc->fwheads &&
    724                     (g_handleattr_int(bp, "GEOM::fwsectors",
    725                     sc->fwsectors) ||
    726                     g_handleattr_int(bp, "GEOM::fwheads",
    727                     sc->fwheads))) ||
    728                     g_handleattr_int(bp, "GEOM::candelete", 1))
    729                         error = -1;
    730                 else
    731                         error = EOPNOTSUPP;
    732         } else {

Thanks for the investigation! So this seems to be a bug in md?
That's too bad, I was enjoying using it to make my tinderbox builds faster. Sorry, I should have been more clear -- my investigation wasn't to determine if the issue you're reporting was a bug or not, but more along the lines of hmm, where is userland getting EOPNOTSUPP from in the kernel in this situation? It could be that some piece hasn't been implemented somewhere yet (more an incomplete than a bug :-) ). I tend to trace source the way I did above in hopes that someone (kernel dev, etc.) will chime in and go Oh, yes, THAT... let me tell you about that! It's also for educational purposes; I figure sharing the innards along with some simple descriptions might help people feel more comfortable (vs. thinking everything is a black box; don't let the magic smoke out!). Sometimes digging through the code helps. This leaves me with some ideas; just tossing them out here... 1. Maybe/somehow this is caused by swap being used as the backing type/store for md(4)? Try using mdconfig -t malloc -o reserve instead, temporarily anyway. Seems to be the same. I'm not too surprised, but at least that rules out swap vs. non-block-device stuff being somehow responsible. I'm not a user of ACLs myself, but Robert Watson might know what's up with this, or where to go looking. I've CC'd him here. 2. Are you absolutely 100% sure the kernel you're using was built with options UFS_ACL defined in it? Doing a strings -a /boot/kernel/kernel | grep UFS_ACL should suffice. Yep, it does: % strings -a /boot/kernel/kernel | grep UFS_ACL options UFS_ACL (My kernel config is just include GENERIC then a bunch of nooptions for KDB, DDB, GDB, INVARIANTS, WITNESS, etc.) Cool, good to rule out the obvious. Thanks. The only other thing I can think of off the top of my head would be to ktrace -t+ -i the cp -p, then provide output of kdump -s -t+ after. I wouldn't say go about this quite yet (it may not even help determine what's going on); maybe wait for Robert to take a look first. 
--
Jeremy Chadwick
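The "was the kernel built with options UFS_ACL" check from this thread can be wrapped in a small helper. This is a sketch under stated assumptions: it relies on the kernel image embedding its config text (which is what makes the strings | grep approach in the thread work at all), and the function name kernel_has_option plus the tr fallback are illustrative additions, not something from the original mails.

```shell
#!/bin/sh
# Sketch: succeed if an "options NAME" line is embedded in a kernel image.
# Usage: kernel_has_option NAME [IMAGE]
# IMAGE defaults to /boot/kernel/kernel, the path used in the thread.
kernel_has_option() {
    opt=$1
    image=${2:-/boot/kernel/kernel}
    if command -v strings >/dev/null 2>&1; then
        strings -a "$image"
    else
        # Fallback (assumption): turn every byte that is neither
        # printable nor a tab into a newline, roughly like strings(1).
        LC_ALL=C tr -c '[:print:]\t' '\n' < "$image"
    fi | grep -q "options[[:space:]][[:space:]]*$opt"
}

# Typical use (commented out; needs a FreeBSD kernel image):
#   kernel_has_option UFS_ACL && echo "UFS_ACL is compiled in"
```

A positive result only says the option was in the kernel config; it does not show whether a particular filesystem has ACLs enabled at mount time.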
Re: ACL issue (Was Re: HEADS UP: ZFSv28 is in!)
On Sun, Mar 06, 2011 at 08:23:42AM -0800, Jeremy Chadwick wrote:
On Sun, Mar 06, 2011 at 11:06:09AM -0500, Steve Wills wrote:

[...]

2. Are you absolutely 100% sure the kernel you're using was built with options UFS_ACL defined in it? Doing a strings -a /boot/kernel/kernel | grep UFS_ACL should suffice.
Yep, it does: % strings -a /boot/kernel/kernel | grep UFS_ACL options UFS_ACL (My kernel config is just include GENERIC then a bunch of nooptions for KDB, DDB, GDB, INVARIANTS, WITNESS, etc.) Cool, good to rule out the obvious. Thanks. The only other thing I can think of off the top of my head would be to ktrace -t+ -i the cp -p, then provide output of kdump -s -t+ after. I wouldn't say go about this quite yet (it may not even help determine what's going on); maybe wait for Robert to take a look first. It would help if I actually added Robert to the CC list, wouldn't it? :-) -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com
Re: HEADS UP: ZFSv28 is in!
On Sun, Feb 27, 2011 at 09:29:57PM +0100, Pawel Jakub Dawidek wrote: I just committed ZFSv28 to HEAD. Thank you so much for this effort! I look forward to trying this once it's MFC'd to RELENG_8 in the upcoming future. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: About panic: bufwrite: buffer is not busy???
On Sun, Feb 20, 2011 at 10:30:52AM -0500, Mike Tancsa wrote:
On 2/20/2011 9:33 AM, Andrey Smagin wrote:

On a week-old -current I have the same problem; my box panicked every 2-15 min. I resolved the problem by unplugging the network connectors from 2 Intel em (82574L) cards. Last time I thought it was an mpd5-related panic, but mpd5 works with another re interface integrated on the motherboard. I think it may be an em-related panic, or em+mpd5.

The latest panic I saw didn't have anything to do with em. Are you sure your crashes are because of the NIC driver?

Not to mention, the error string the OP provided (see Subject) is only contained in one file: sys/ufs/ffs/ffs_vfsops.c, function ffs_bufwrite(). So, that would be some kind of weird filesystem-related issue, not NIC-specific. I have no idea how to debug said problem.

--
Jeremy Chadwick
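Tracking a panic string back to its source file, as done above, is a plain fixed-string search over the kernel tree. A minimal sketch, assuming the sources live under /usr/src/sys; the function name find_panic_string is made up for illustration.

```shell
#!/bin/sh
# Sketch: list file:line matches for a literal panic message.
# Usage: find_panic_string TEXT [TREE]
# TREE defaults to /usr/src/sys (assumed location of the kernel sources).
find_panic_string() {
    # -F treats TEXT literally, so characters such as "?" or "(" in a
    # panic message are not interpreted as a regular expression.
    grep -rnF -- "$1" "${2:-/usr/src/sys}"
}

# Typical use (commented out; needs a source tree):
#   find_panic_string "buffer is not busy"
```

Searching for a distinctive fragment rather than the whole message helps when the kernel builds the string with a format specifier or a function-name prefix.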
Re: TTY task group scheduling
On Fri, Nov 19, 2010 at 02:18:52PM +, Vincent Hoffman wrote: On 19/11/2010 12:42, Eric Masson wrote: Bruce Cran br...@cran.org.uk writes: Hello, Google suggests that the work was a GSoC project in 2005 on a pluggable disk scheduler. It seems that something similar has found its way in DFlyBSD, dsched. And indeed to FreeBSD, man gsched. Added sometime round April http://svn.freebsd.org/viewvc/base/head/sys/geom/sched/README?view=log It's been pointed out on the list a couple times, and I've sent mail to the authors about this, that gsched breaks (very, very badly) things like sysinstall, and does other strange things like leaves trailing periods at the end of its .sched. labels. This appears to be by design, but I'm still left thinking ?! It's hard to discern technical innards/workings of GEOM since the documentation is so poor (and reading the code doesn't help, especially with regards to libgeom). IMHO, the gsched stuff, as a layer, should probably be moved into the I/O framework by default, with the functionality *disabled* by default and tunables to adjust it. That's just how I feel about it. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: patch for topology detection of Intel CPUs
On Mon, Sep 06, 2010 at 03:17:42PM +0300, Andriy Gapon wrote: on 29/08/2010 12:25 Andriy Gapon said the following: The below patch is against sources in FreeBSD tree, it should be applied either to sys/amd64/amd64/mp_machdep.c or sys/i386/i386/mp_machdep.c depending on the desired architecture: http://people.freebsd.org/~avg/intel-cpu-topo.diff I see that I am not getting as many testers as I expected, so I am going to commit the patch. You still have a short while to either objectively object to the patch or to voluntary test it :-) I would gladly assist in testing this, except there doesn't appear to be an authoritative statement that it will apply to RELENG_8; when I see WIP, I assume -CURRENT/HEAD only. Let me know, since all the systems I have are Intel multi-core. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: patch for topology detection of Intel CPUs
On Mon, Sep 06, 2010 at 03:56:01PM +0300, Andriy Gapon wrote: on 06/09/2010 15:23 Jeremy Chadwick said the following: On Mon, Sep 06, 2010 at 03:17:42PM +0300, Andriy Gapon wrote: on 29/08/2010 12:25 Andriy Gapon said the following: The below patch is against sources in FreeBSD tree, it should be applied either to sys/amd64/amd64/mp_machdep.c or sys/i386/i386/mp_machdep.c depending on the desired architecture: http://people.freebsd.org/~avg/intel-cpu-topo.diff I see that I am not getting as many testers as I expected, so I am going to commit the patch. You still have a short while to either objectively object to the patch or to voluntary test it :-) I would gladly assist in testing this, except there doesn't appear to be an authoritative statement that it will apply to RELENG_8; when I see WIP, I assume -CURRENT/HEAD only. patch -C is much better than any statement :) Let me know, since all the systems I have are Intel multi-core. Yes, the patch should be applicable to stable/8 without any issues. Great, thanks! I'll be testing this out on two separate systems, both RELENG_8: - Supermicro X7SBA + Intel C2D E8400 (stepping 10) - Supermicro X7SBL-LN2 + Intel C2D E6600 (stepping 6) I'll make sure to provide what the topology looks like before and after. Is CPU-relevant dmesg output sufficient? -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: patch for topology detection of Intel CPUs
On Mon, Sep 06, 2010 at 04:28:02PM +0300, Andriy Gapon wrote:
on 06/09/2010 16:12 Jeremy Chadwick said the following:

Great, thanks! I'll be testing this out on two separate systems, both RELENG_8:

- Supermicro X7SBA + Intel C2D E8400 (stepping 10)
- Supermicro X7SBL-LN2 + Intel C2D E6600 (stepping 6)

I'll make sure to provide what the topology looks like before and after. Is CPU-relevant dmesg output sufficient?

If you mean something like the below, then yes. Thanks! [...]

All done. Good news (I think): there's no difference in the CPU-related topology on either system with your patch, aside from kernel build date. The topologies are still detected correctly. In case you want them:

Supermicro X7SBA Intel C2D E8400 (stepping 10)
==============================================
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-STABLE #0: Mon Sep 6 09:06:52 PDT 2010
    r...@icarus.home.lan:/usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64 amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2992.52-MHz K8-class CPU)
  Origin = GenuineIntel Id = 0x1067a Family = 6 Model = 17 Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x408e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 4294967296 (4096 MB)
avail memory = 4112097280 (3921 MB)
ACPI APIC Table: PTLTD APIC
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
kbd1 at kbdmux0
ichwd module loaded
acpi0: PTLTDXSDT on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0

Supermicro X7SBL-LN2 Intel C2D E6600 (stepping 6)
=================================================
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-STABLE #1: Mon Sep 6 07:59:49 PDT 2010
    r...@gujoja.home.lan:/usr/obj/usr/src/sys/X7SBL_RELENG_8_amd64 amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz (2394.01-MHz K8-class CPU)
  Origin = GenuineIntel Id = 0x6f6 Family = 6 Model = f Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 8589934592 (8192 MB)
avail memory = 8261648384 (7878 MB)
ACPI APIC Table: PTLTD APIC
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 Version 2.0 irqs 0-23 on motherboard
kbd1 at kbdmux0
ichwd module loaded
acpi0: PTLTDXSDT on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0

All other systems I have are C2D and C2Q-based, but I can't easily test on those given their production roles. If there's a particular Intel processor family/model you're interested in, let me know and I can dig around to see if I have access to one.
--
Jeremy Chadwick
Re: Watchdog resets on 82575
On Tue, Aug 10, 2010 at 10:30:21AM +0100, Steven Hartland wrote: Is there an easy way to check which chip is present as the startup doesnt seem to mention it? Not during start-up, but once the machine is running (including in single-user), you can do: pciconf -lvc And look for device igb0. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
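Since pciconf -lvc prints every PCI device, it can help to cut the output down to the one record of interest. The filter below is a sketch that assumes the usual pciconf layout, where each record starts in column one with name@selector and its detail lines are indented; the function name pci_record is made up for illustration.

```shell
#!/bin/sh
# Sketch: print only the pciconf record(s) for one device name.
# Usage: pciconf -lvc | pci_record igb0
pci_record() {
    awk -v dev="$1" '
        {
            first = substr($0, 1, 1)
            # An unindented line starts a new record; decide whether
            # that record belongs to the requested device.
            if (first != " " && first != "\t")
                show = (index($0, dev "@") == 1)
        }
        show { print }
    '
}
```

The chip= value in the first line of the selected record is the identifier to compare against the driver's device table.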
Re: Watchdog resets on 82575
On Tue, Aug 10, 2010 at 11:23:26AM +0100, Steven Hartland wrote:

Thanks Jeremy, from that we get:-

i...@pci0:1:0:0:class=0x02 card=0x060015d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages in map 0x1c enabled
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
i...@pci0:1:0:1:class=0x02 card=0x060015d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages in map 0x1c enabled
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x4(x4)

I assume there is a way to convert from the hex values to the human value but not sure what it is?

The card and chip identifiers are part of the PCI ID specification. You can see what the human value is by examining the source code for the driver. Sometimes it's easy to figure out, other times there's a series of #define's which you have to reverse engineer. In this case, there are two places with relevant information:

src/sys/dev/e1000/if_igb.c
src/sys/dev/e1000/e1000_hw.h

You have to split the Chip ID into two separate 16-bit portions, so 0x10c9 and 0x8086. 0x8086 is Intel's vendor code. 0x10c9 is the device ID of the individual NIC/model type. So:

$ grep -i 0x10c9 *
e1000_hw.h:#define E1000_DEV_ID_82576 0x10C9

For Jack: igb_vendor_info_array should really be extended to include actual ASCII strings for the individual chips/models/codenames. I'm sure that's on your todo list somewhere. I'd be willing to write this but would need a list of the models (or maybe the Linux driver has them in comments, etc. and I could go off of that).
--
Jeremy Chadwick
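The chip ID split described in this message (upper 16 bits are the device ID, lower 16 bits the vendor ID) can be sketched in a few lines of shell. The function name split_chip_id is made up for illustration, and the sketch assumes the value is always written as 0x followed by eight hex digits, as in the pciconf output quoted above.

```shell
#!/bin/sh
# Sketch: split a pciconf "chip=" value into vendor and device IDs.
# Usage: split_chip_id 0x10c98086
split_chip_id() {
    chip=${1#0x}                                  # drop the 0x prefix
    device=$(printf '%s' "$chip" | cut -c1-4)     # first four hex digits
    vendor=$(printf '%s' "$chip" | cut -c5-8)     # last four hex digits
    echo "vendor=0x$vendor device=0x$device"
}

split_chip_id 0x10c98086    # prints: vendor=0x8086 device=0x10c9
```

The device half is what to grep for (case-insensitively) in the driver's header, as shown with e1000_hw.h in the message above.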
Re: Results of BIND RFC
On Fri, Apr 02, 2010 at 09:24:51AM +, Poul-Henning Kamp wrote:
In message 20100402021715.669838e0.s...@freebsd.org, Stanislav Sedov writes:
On Fri, 02 Apr 2010 08:55:07 + Poul-Henning Kamp p...@phk.freebsd.dk mentioned:

Sorry, I think I was not clear enough.

Sorry for misunderstanding.

Yes, the case can certainly be made that a DNS query tool belongs in the base system.

I disagree (so what else is new?). It should be kept out of the base system. KISS:

Doug pulling BIND out of the base system / going ports-only = excellent.

Doug making a separate port for BIND-esque DNS query/maintenance tools = excellent.

Both of the above can be made into packages. Vendors who use FreeBSD can incorporate said package(s) into their build infrastructure. Folks who do not have Internet connections (yet for some reason want said DNS tools) can install the package(s) from CD/DVD/USB.

I want the bikeshed to be black. :-) [1]

[1]: FreeBSD really needs to move away from the base system as a concept, as I've ranted about in the past. Or if it cannot, the base system needs to start using pkg_* (somehow) for use, and src.conf WITHOUT_xxx (where xxx = some software) removed. Concept being: I don't need Kerberos; pkg_delete base-krb5. I also don't need lib32; pkg_delete base-lib32. Beautiful concept, hard to implement due to libraries being yanked out from underneath binaries that are linked to them. But you get the idea.

--
Jeremy Chadwick