Re: vmx0: watchdog timeout on queue 2, no interrupts on BSP
> On Jul 21, 2019, at 4:17 PM, Andriy Gapon wrote:
>
>> On 20/07/2019 20:08, Patrick Kelsey wrote:
>>
>> On Fri, Jul 19, 2019 at 10:07 AM Andriy Gapon <a...@freebsd.org> wrote:
>>
>>    Recently we experienced a strange problem.
>>    We noticed a lot of these messages in the logs:
>>    vmx0: watchdog timeout on queue 2
>>    (always queue 2)
>>    Also, we noticed that connections to some end points did not work at all
>>    while others worked without problems. I assume that that was because
>>    specific flows got assigned to that queue 2.
>>
>>    Further investigation has shown that none of interrupts assigned to the
>>    BSP has ever fired (since boot, of course). That included vmx0:rx2 and
>>    vmx0:tx2. But also interrupts for other drivers as well.
>>
>>    Trying to get more information I rebooted the system and the problem
>>    disappeared.
>>
>>    Has anyone seen anything like that?
>>    Any thoughts on possible causes?
>>    Any suggestions what to check if/when the problem reoccurs?
>>
>>    Thanks!
>>
>> If you are running head at or after r347221 or stable/12 at or after
>> r349112, then this could be due to
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118 (see Comment 4
>> - short story is that an iflib change has broken the vmx driver).
>
> I am not sure if that bug could lead to all interrupts on the core
> getting disabled (for all drivers), and right at the boot time.

I am not sure either, but it’s the kind of bug that breaks the design of the vmx driver in such a way that its state can get corrupted to the point where the kernel can panic. I haven’t fully analyzed the potential scope of memory corruption / hardware state corruption that can occur (because the fix for the issue is already apparent), so I am freely considering it to include elements beyond the device and driver itself.

If you are saying that zero vmx queue interrupts have occurred anywhere in the system, then I would rule out any connection to this as a prerequisite for the corruption to occur is having at least one such interrupt.

-Patrick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: vmx0: watchdog timeout on queue 2, no interrupts on BSP
On Fri, Jul 19, 2019 at 10:07 AM Andriy Gapon wrote:
>
> Recently we experienced a strange problem.
> We noticed a lot of these messages in the logs:
> vmx0: watchdog timeout on queue 2
> (always queue 2)
> Also, we noticed that connections to some end points did not work at all
> while others worked without problems. I assume that that was because
> specific flows got assigned to that queue 2.
>
> Further investigation has shown that none of interrupts assigned to the
> BSP has ever fired (since boot, of course). That included vmx0:rx2 and
> vmx0:tx2. But also interrupts for other drivers as well.
>
> Trying to get more information I rebooted the system and the problem
> disappeared.
>
> Has anyone seen anything like that?
> Any thoughts on possible causes?
> Any suggestions what to check if/when the problem reoccurs?
>
> Thanks!

If you are running head at or after r347221 or stable/12 at or after r349112, then this could be due to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118 (see Comment 4 - short story is that an iflib change has broken the vmx driver).

-Patrick
Re: posix_fallocate on ZFS
On Mon, Feb 12, 2018 at 12:04 PM, John Baldwin wrote:

> On Saturday, February 10, 2018 01:46:33 PM Garrett Wollman wrote:
> > asom...@freebsd.org writes:
> >
> > >On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen wrote:
> >
> > >> Is there any expectation that this is going to fixed in any near future?
> >
> > >No. It's fundamentally impossible to support posix_fallocate on a COW
> > >filesystem like ZFS. Ceph should be taught to ignore an EINVAL result,
> > >since the system call is merely advisory.
> >
> > I don't think it's true that this is _fundamentally_ impossible. What
> > the standard requires would in essence be a per-object refreservation.
> > ZFS supports refreservation, obviously, but not on a per-object basis.
> > Furthermore, there are mechanisms to preallocate blocks for things
> > like dumps. So it *could* be done (as in, the concept is there), but
> > it may not be practical. (And ultimately, there are ways in which the
> > administrator might manage the system that would defeat the desired
> > effect, but that's out of the standard's scope.) Given the semantic
> > mismatch, though, I suspect it's unreasonable to expect anyone to
> > prioritize implementation of such a feature.
>
> I don't think posix_fallocate() can be compatible with COW. Suppose you
> do reserve a fixed set of blocks. That ensures the first write has a
> place to write, but not if you overwrite one of those blocks. You'd have
> to reserve another block to maintain the reservation each time you wrote
> to a block, or you'd have to have a way to mark a file as not COW. The
> first case isn't really any better than not using posix_fallocate() in the
> first place as you are still requiring writes to allocate blocks, and the
> second seems a bit fraught with peril as well if the application is
> expecting the non-COW'd file to be in sync with other files in the system
> since presumably non-COW'd files couldn't be snapshotted, etc.

I think Garrett's assessment that it is not fundamentally impossible, but may not be felt to be worth implementing in any given file system for practical reasons, is correct. I say this having designed/implemented a COW file system that was driven by customer pressure to do things that at first pass one might declare represented an architectural contradiction, but upon further reflection were entirely possible to do given sufficient willingness to invest the effort and accept the accompanying trade-offs, additional knobs to turn, etc.

In this case (posix_fallocate() + COW + snapshots), it could be implemented with a per-object allocator that normally keeps at least one extra block beyond the reservation requirement on hand, plus a snapshot operation that in order to succeed has to be able to provision the local allocators of all fallocated objects with enough additional blocks to maintain the no-fail write guarantee post-snapshot.

-Patrick
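The suggestion quoted earlier in this thread, that callers such as Ceph tolerate an EINVAL result from posix_fallocate() on ZFS, could be sketched roughly as follows. This is only an illustration, not code from Ceph or FreeBSD: the helper names fallocate_error_is_fatal() and preallocate_best_effort() are hypothetical, and treating EOPNOTSUPP the same way as EINVAL is an assumption of the sketch. Note that posix_fallocate() returns the error number directly rather than setting errno.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: decide whether a posix_fallocate() result should
 * abort the caller. EINVAL is the result the thread reports for ZFS;
 * tolerating EOPNOTSUPP as well is an assumption made for this sketch.
 */
static int
fallocate_error_is_fatal(int error)
{
	switch (error) {
	case 0:
	case EINVAL:
	case EOPNOTSUPP:
		return (0);
	default:
		return (1);
	}
}

/*
 * Hypothetical wrapper: try to reserve len bytes starting at offset 0 and
 * carry on without a reservation when the file system cannot provide one.
 * Returns 0 on success or tolerated failure, otherwise the error number.
 */
static int
preallocate_best_effort(int fd, off_t len)
{
	int error;

	/*
	 * A non-positive len also yields EINVAL, which would be silently
	 * tolerated below, so treat it as a programming error here.
	 */
	assert(len > 0);

	/* posix_fallocate() returns the error number; it does not set errno. */
	error = posix_fallocate(fd, 0, len);
	return (fallocate_error_is_fatal(error) ? error : 0);
}
```

A caller that takes this route gives up the no-fail write guarantee discussed above, so a later write can still fail with ENOSPC and has to be handled as usual.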
Re: CURRENT: FreeBSD not reporting AES-NI on Intel(R) Xeon(R) CPU E5-1650 v3
On Fri, Mar 17, 2017 at 1:31 PM, O. Hartmannwrote: > Am Fri, 17 Mar 2017 20:07:35 +0300 > Slawa Olhovchenkov schrieb: > > > On Fri, Mar 17, 2017 at 05:53:24PM +0100, O. Hartmann wrote: > > > > > Am Fri, 17 Mar 2017 15:04:29 +0300 > > > Slawa Olhovchenkov schrieb: > > > > > > > On Fri, Mar 17, 2017 at 12:36:25PM +0100, O. Hartmann wrote: > > > > > > > > > Running recent CURRENT on a Fujitsu Celsius M740 equipted with an > Intel(R) > > > > > Xeon(R) CPU E5-1650 v3 @ 3.50GHz CPU makes me some trouble. > > > > > > > > > > FreeBSD does not report the existence or availability of AES-NI > feature, which > > > > > is supposed to be a feature of this type of CPU: > > > > > > > > What reassons to detect AES-NI by FreeBSD? > > > > > > What do you mean? I do not understand! FreeBSD is supposed to read the > CPUID and > > > therefore the capabilities as every other OS, too. But there may some > circumstances > > > why FBSD won't. I do not know, that is the reason why I'm asking here. > > > > This sample can have disabled AES-NI by vendor, in BIOS, for example. > > As I show by links this is posible. 
> > > > CPUID in you example don't show AES-NI capabilities, for example > > 1650v4 w/ AES-NI > > > > CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz (3600.07-MHz K8-class CPU) > > Origin="GenuineIntel" Id=0x406f1 Family=0x6 Model=0x4f Stepping=1 > > Features=0xbfebfbff APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI, > MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> > > Features2=0x7ffefbff VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA, > SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE, > OSXSAVE,AVX,F16C,RDRAND> > > > > ^^ > > AMD Features=0x2c100800 > > AMD Features2=0x121 > > Structured Extended > > Features=0x21cbfbb BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE> > > XSAVE Features=0x1 VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID, > VID,PostIntr > > TSC: P-state invariant, performance statistics > > > > In you sample: "TSCDLT,XSAVE" > > > > May be AES-NI disabled by vendor and FreeBSD correct show this. Or some > bug in FreeBSD, > > AES-NI work and other OS show AES-NI capabilities. > > > > ___ > > freebsd-current@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-current > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@ > freebsd.org" > > We have some LGA1151 XEON based 19 inch rack server, also equipted with > Haswell > E3-12XX-v3 XEONs and FreeBSD, also CURRENT, does show AES-NI. > > You're right, the vendor could have disabled AES-NI by intention - but > they offered this > box especially with AES-NI capabilities. > > See here: > > http://freebsd.1045724.x6.nabble.com/r285947-broken- > AESNI-support-No-aesni0-on-Intel-XEON-E5-1650-v3-on-Fujitsu-Celsius-M740- > td6028895.html > > I feel a bit pissed off right now due to Fujitsu, because we started > testing some > encrypting features and I'd like to use AES-NI and I run into this issue > again. > > I need to know that FreeBSD is not the issue with this specific CPU type. 
> I'm still frustrated by that stupid comment "UNIX is not supoorted" I got that time then when I
> reported 2015 the issue to Fujitsu.

It's pretty straightforward to gain confidence that FreeBSD is not the issue here. The 'Features2=' line is printed by printcpuinfo() in sys/x86/x86/identcpu.c based on the bits set in a variable called cpu_feature2 (the printf is currently at line 802). The value of cpu_feature2 is set in identify_cpu() in identcpu.c (for amd64, currently at line 1401) based on the result of the cpuid instruction that is executed by a call to do_cpuid(), which itself resides in sys/amd64/include/cpufunc.h. In other words, a single asm instruction is executed and the set bits from the result are printed.

Based on some poking around in open source bits (tianocore, coreboot), it appears that AES-NI is something the BIOS can irreversibly disable-until-next-reset by twiddling bits in the appropriate MSR register. There is no code that does this in FreeBSD on purpose, so there would have to be a bug introduced in -CURRENT that somehow clobbers those MSR bits early on - a bug that was also not merged to 11-STABLE (since Slawa shows AESNI enabled on the same processor under 11-STABLE).

I will also say that I have dealt with a manufacturer of Xeon hardware in Europe who will not provide a stock BIOS that allows you to enable AES-NI, out of concerns over violating export/import rules governing encryption technology. With that vendor, you have to pass an end-user verification and then they will make you a custom BIOS that gives you the option to enable AES-NI. It took quite some time working through the outer layers of their
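The check described in this message comes down to testing one bit of the value printed as Features2. A minimal sketch of that decoding step is below; it is not the code from identcpu.c, and the names feature_bit(), features2_has_aesni() and SKETCH_AESNI_BIT are hypothetical. It relies on the Intel-documented fact that CPUID leaf 1 reports AES-NI in bit 25 of ECX. To keep the sketch portable it takes the register value as an argument instead of executing cpuid; on x86 with GCC or Clang the raw value can be read with __get_cpuid() from <cpuid.h>.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 25 is the AES-NI flag (name is local to this sketch). */
#define SKETCH_AESNI_BIT 25u

/* Return 1 if the given bit is set in a 32-bit feature word, else 0. */
static int
feature_bit(uint32_t word, unsigned int bit)
{
	assert(bit < 32);
	return ((int)((word >> bit) & 1u));
}

/* Decode the AES-NI flag from a Features2 (CPUID.1:ECX) value. */
static int
features2_has_aesni(uint32_t cpu_feature2)
{
	return (feature_bit(cpu_feature2, SKETCH_AESNI_BIT));
}
```

Applied to the Features2=0x7ffefbff value quoted above for the E5-1650 v4, the function returns 1, which matches the AESNI entry in that feature list. A Features2 value with bit 25 clear means the cpuid instruction itself is reporting no AES-NI, whatever the reason.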
Re: sysctl -a panic on VIMAGE kernels
On Sun, Aug 9, 2015 at 6:36 AM, Gleb Smirnoff gleb...@freebsd.org wrote:

On Sun, Aug 09, 2015 at 12:28:22PM +0200, Kristof Provost wrote:
K Hi,
K
K I’ve run into a reproducible panic on a VIMAGE kernel with ‘sysctl -a’.
K
K Relevant backtrace bits:
K #8 0x80e7dd28 in trap (frame=0xfe01f16b26a0)
K     at /usr/src/sys/amd64/amd64/trap.c:426
K #9 0x80e5e6a2 in calltrap ()
K     at /usr/src/sys/amd64/amd64/exception.S:235
K #10 0x80cea67d in uma_zone_get_cur (zone=0x0)
K     at /usr/src/sys/vm/uma_core.c:3006
K #11 0x80cec029 in sysctl_handle_uma_zone_cur (
K     oidp=0x818a7c90, arg1=0xfe00010c0438, arg2=0,
K     req=0xfe01f16b2868) at /usr/src/sys/vm/uma_core.c:3580
K #12 0x80a28614 in sysctl_root_handler_locked (oid=0x818a7c90,
K     arg1=0xfe00010c0438, arg2=0, req=0xfe01f16b2868)
K     at /usr/src/sys/kern/kern_sysctl.c:183
K #13 0x80a27d70 in sysctl_root (arg1=<value optimized out>,
K     arg2=<value optimized out>) at /usr/src/sys/kern/kern_sysctl.c:1694
K #14 0x80a28372 in userland_sysctl (td=0x0, name=0xfe01f16b2930,
K     namelen=<value optimized out>, old=<value optimized out>,
K     oldlenp=<value optimized out>, inkernel=<value optimized out>,
K     new=<value optimized out>, newlen=<value optimized out>,
K     retval=<value optimized out>, flags=0)
K     at /usr/src/sys/kern/kern_sysctl.c:1798
K #15 0x80a28144 in sys___sysctl (td=0xf8000b1e49a0,
K     uap=0xfe01f16b2a40) at /usr/src/sys/kern/kern_sysctl.c:1724
K
K In essence, what happens is that we end up in sysctl_handle_uma_zone_cur() and arg1 is a pointer to NULL,
K so we call uma_zone_get_cur(zone); with zone == NULL.
K
K There’s been a bit of churn around tcp_reass_zone, and I think the latest version is wrong.
K It marks the sysctl as CTLFLAG_VNET, but the exposed variable is not VNET_DEFINE().
K
K The following fixes it for me:
K
K diff --git a/sys/netinet/tcp_reass.c b/sys/netinet/tcp_reass.c
K index 77d8940..3913ef3 100644
K --- a/sys/netinet/tcp_reass.c
K +++ b/sys/netinet/tcp_reass.c
K @@ -84,7 +84,7 @@ SYSCTL_INT(_net_inet_tcp_reass, OID_AUTO, maxsegments, CTLFLAG_RDTUN,
K      "Global maximum number of TCP Segments in Reassembly Queue");
K
K  static uma_zone_t tcp_reass_zone;
K -SYSCTL_UMA_CUR(_net_inet_tcp_reass, OID_AUTO, cursegments, CTLFLAG_VNET,
K +SYSCTL_UMA_CUR(_net_inet_tcp_reass, OID_AUTO, cursegments, 0,
K      &tcp_reass_zone,
K      "Global number of TCP Segments currently in Reassembly Queue");

Right, if a variable isn't virtualized, the CTLFLAG_VNET must be removed.

Patrick, how is your progress with improved reassembly?

Kristof, thanks for committing this patch.

Gleb, I expect to have a tcp reassembly patch up for review at some point this week.

-Patrick
Re: panic: UMA: Increase vm.boot_pages on Dell R920 r279210
On Sat, May 2, 2015 at 10:25 PM, Adrian Chadd adr...@freebsd.org wrote:

hi,

Hm, should we be upping this limit automatically? Can we get cpu counts or memory amount early enough in boot to have a hope of auto-tuning? 64 seems low, 1024 seems high as a default. :)

What is it that's exhausting the boot_pages?

I'm semi-guessing it's the number of vm radix tree nodes needed for the TiB of memory. The only thing I'm aware of (allow for ignorance here) that consumes boot_pages and scales with the cpu count is the uma zone used for uma cache objects, but on amd64 this zone only needs 640 + cpus * 128 bytes, or about 4 pages for 120 cpus. vm radix nodes are 144 bytes each on amd64, and by my back-of-the-envelope calculations (using traces of non-vm-radix boot_page use from another amd64 system), 64 boot_pages would be exhausted after about 1000 vm radix nodes were allocated. It would be interesting to know how many boot_pages were actually required for this particular system.

In any event, since startup_alloc() is designed to exhaust all the boot_pages before switching to the normal allocators, it doesn't seem necessarily harmful to err on the high side either in bumping up the static default or introducing an auto-tuned value (provided the excess is not so perversely large that startup_alloc() isn't able to make use of an embarrassment of pages due to zone creation timing and usage patterns).

We know the number of cpus at the time boot_pages is put to use, but I don't think we know how much memory there is (and even less sure that even if we did, we'd really want to try to estimate things like the vm radix tree size in a generic way).

Something like a default of boot_pages = max(64, 32 + k * cpus) might be sufficient for k = 4 or 8 (gathering some data points would give a clue here), and palatable since it is at a minimum the current value that's been in use, and at the other end approaches a modest commitment of 16 or 32 KiB per cpu in the worst case (unused and unreclaimed boot_pages with high cpu count).

-Patrick

On 24 March 2015 at 13:00, Keith White kwh...@site.uottawa.ca wrote:

On Tue, 24 Mar 2015, Rui Paulo wrote:

On Mar 24, 2015, at 04:19, kwh...@site.uottawa.ca wrote:

I'm using /boot/loader.conf. Is there another place I should be doing this?

No, that's correct, but apparently there's a problem: the RDTUN sysctl is not picked up early enough. Can you try this patch? I haven't really tested it. :-)

diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index 79665ba..a764788 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -134,8 +134,9 @@ long first_page;
 int vm_page_zero_count;

 static int boot_pages = UMA_BOOT_PAGES;
-SYSCTL_INT(_vm, OID_AUTO, boot_pages, CTLFLAG_RDTUN, &boot_pages, 0,
-    "number of pages allocated for bootstrapping the VM system");
+SYSCTL_INT(_vm, OID_AUTO, boot_pages, CTLFLAG_RDTUN | CTLFLAG_NOFETCH,
+    &boot_pages, 0,
+    "number of pages allocated for bootstrapping the VM system");

 static int pa_tryrelock_restart;
 SYSCTL_INT(_vm, OID_AUTO, tryrelock_restart, CTLFLAG_RD,
@@ -349,6 +350,7 @@ vm_page_startup(vm_offset_t vaddr)
      * Allocate memory for use when boot strapping the kernel memory
      * allocator.
      */
+    TUNABLE_INT_FETCH("vm.boot_pages", &boot_pages);
     new_end = end - (boot_pages * UMA_SLAB_SIZE);
     new_end = trunc_page(new_end);
     mapped = pmap_map(vaddr, new_end, end,
@@ -443,7 +445,7 @@ vm_page_startup(vm_offset_t vaddr)

--
Rui Paulo

Patch tried. Success!

I now get this after setting vm.boot_pages=1024 in /boot/loader.conf:

Booting...
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-CURRENT #1: Tue Mar 24 13:44:48 UTC 2015
root@:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.5.1 (tags/RELEASE_351/final 225668) 20150115
WARNING: WITNESS option enabled, expect reduced performance.
UMA startup boot_pages: 1024
...

And can start all 120 processors. Thanks!

...keith

--
Keith White, genie.uottawa.ca engineering.uottawa.ca kwh...@uottawa.ca [+1 613 562 5800 x6681]
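The sizing arithmetic in the reply at the top of this message can be worked through with a few lines of C. This is only a sketch of the proposed formula boot_pages = max(64, 32 + k * cpus) and of the 640 + cpus * 128 byte estimate for the uma cache zone; it is not kernel code. The function names and SKETCH_PAGE_SIZE are hypothetical, and the 4096-byte page size is an assumption (the amd64 base page size).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch (amd64 base pages). */
#define SKETCH_PAGE_SIZE 4096

/* Proposed default from the message: max(64, 32 + k * cpus). */
static int
suggested_boot_pages(int cpus, int k)
{
	int scaled;

	assert(cpus > 0 && k > 0);
	scaled = 32 + k * cpus;
	return (scaled > 64 ? scaled : 64);
}

/* Size estimate for the uma cache zone from the message: 640 + cpus * 128. */
static size_t
uma_cache_zone_bytes(int cpus)
{
	assert(cpus > 0);
	return (640 + (size_t)cpus * 128);
}

/* Round a byte count up to whole pages. */
static size_t
bytes_to_pages(size_t bytes)
{
	return ((bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

For the 120-cpu machine in this thread, the formula gives 512 pages with k = 4 and 992 pages with k = 8, while a 4-cpu machine stays at the existing default of 64. The cache zone estimate for 120 cpus is 16000 bytes, which rounds up to the 4 pages mentioned above.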
Re: _ftello() modification requires additional capsicum rights, breaking tcpdump and dhclient
On Wed, Sep 10, 2014 at 3:00 AM, Andrey Chernov a...@freebsd.org wrote:

On 09.09.2014 21:53, Patrick Kelsey wrote:

I don't think it is worth the trouble, as given the larger pattern of libc routines requiring multiple capsicum rights, it seems one will in general have to have libc implementation knowledge when using it in concert with capsicum. For example, consider the limitfd() routine in kdump.c, which provides rights for the TIOCGETA ioctl to be used on stdout so the eventual call to isatty() via printf() will work as intended.

I think the above kdump example is a good one for the subtle issues that can arise when using capsicum with libc. That call to isatty() is via a widely-used internal libc routine __smakebuf(). __smakebuf() also calls __swhatbuf(), which in turn calls _fstat(), all to make sure that output to a tty is line buffered by default. It would appear that programs that restrict rights on stdout without allowing CAP_IOCTL and CAP_FSTAT could be disabling the normally default line buffering when stdout is a tty. kdump goes the distance, but dhclient does not (restricting stdout to CAP_WRITE only).

In any event, the patch attached to my first message is seeming like the way to go.

Well, then commit it (if capsicum team agrees).

Will do - thanks for the feedback.

-Patrick
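The point above about stdout line buffering can be illustrated with a sketch of a rights restriction that keeps the fstat() and TIOCGETA paths working. This is modelled loosely on the description of kdump's limitfd() in this thread and is not a copy of it. The function name is hypothetical, tolerating ENOSYS (a kernel without Capsicum) is a choice made for this sketch, and the header is assumed to be <sys/capsicum.h> (older trees used <sys/capability.h>). On systems other than FreeBSD the function compiles to a no-op.

```c
#include <assert.h>
#include <errno.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#include <sys/ioctl.h>
#include <termios.h>
#endif

/*
 * Restrict fd (intended for stdout) to writing, plus the fstat() call and
 * TIOCGETA ioctl that stdio uses when deciding on line buffering.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
limit_stdout_keep_line_buffering(int fd)
{
	assert(fd >= 0);
#ifdef __FreeBSD__
	cap_rights_t rights;
	unsigned long cmd = TIOCGETA;

	/* CAP_WRITE alone would break the isatty() check made by stdio. */
	cap_rights_init(&rights, CAP_WRITE, CAP_FSTAT, CAP_IOCTL);
	if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS)
		return (-1);
	/* Narrow the ioctl right to the single command isatty() needs. */
	if (cap_ioctls_limit(fd, &cmd, 1) < 0 && errno != ENOSYS)
		return (-1);
#endif
	/* Outside FreeBSD nothing is restricted; Capsicum rights do not apply. */
	return (0);
}
```

A program would call this on STDOUT_FILENO before entering capability mode. Dropping CAP_FSTAT and CAP_IOCTL from the list reproduces the dhclient situation described above, where stdout is limited to CAP_WRITE only.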
Re: _ftello() modification requires additional capsicum rights, breaking tcpdump and dhclient
On Mon, Sep 8, 2014 at 6:00 PM, Andrey Chernov a...@freebsd.org wrote:

On 09.09.2014 1:13, Patrick Kelsey wrote:

You make a good point about the wider use of fcntl() in libc - aside from the rpc code, by my count there are 14 other entry points in libc that use fcntl in their implementation. To experience breakage, programs that use those entry points would also have to be supplying them fds with restricted rights that do not include CAP_FCNTL. By my count, there are currently only 12 programs in -current that call cap_rights_limit().

I don't think these counts inform us very well as to the presence and extent of any capsicum+libc issues similar to the one that I've raised. Those 12 programs mentioned above would have to be audited to determine if any of the 15 libc entry points (including fcntl) that use fcntl are being used on those restricted fds without being granted CAP_FCNTL rights, and whether there are overt or potential failures occurring as a result. Consider that the failure mode in tcpdump that I found requires that you be using multiple capture files with size-based rotation, otherwise all works fine. Also consider that the failure mode in dhclient only occurs when a rewritten client lease file is smaller than its predecessor.

Just to note by quick glance: tcpdump use fdopen(), so in some cases probably already broken without F_GETFL rights. openssh use fdopen(), so suspicious about F_GETFL too, but I don't traverse the order in which fdopen() and cap_rights_* there are applied.

I have now looked at all of the programs in -current that call cap_rights_limit() (dhclient, hastd, ping, tcpdump, rwhod, ctld, iscsid, kdump, rwho, units, uniq, and sshd) and examined them to see which file descriptors cap_rights_limit() is invoked on, with what rights, and whether libc functions that require fcntl rights (fcntl, fdopendir, fdopen, freopen, fseek, ftell, popen, lockf, etc) are subsequently used on those descriptors.

In most cases, the programs are simple and/or the application of cap_rights_limit() is otherwise limited in scope, and it is easy to see that they have sufficient rights on the restricted fds for the operations performed on those fds. This was a mostly manual inspection, and of course I may have missed something, but I did not find any further issues related to insufficient capsicum rights when using libc.

In the case of tcpdump, fdopen() is not used on a file descriptor whose rights have been restricted via cap_rights_limit().

In the case of openssh, cap_rights_limit() is used by sshd to sandbox the unprivileged child process when using privilege separation by restricting the child's stdin, stdout, and stderr, the child's end of the socketpair used to communicate with the privileged parent and the child's end of the pipe used to log to the privileged parent. fdopen() is not used on any of those descriptors.

I don't think that this read-only fcntl(F_GETFL) which does not modify anything deserves any special rights at all (i.e. can be just enabled by default in contrast to F_SETFL), but I am not capsicum expert.

I don't think I am in a position to comment on the implications of permanent F_GETFL rights either. I do think that the point about wider use of fcntl(F_GETFL) in libc does argue against making a CAP_FSEEK right in sys/capability.h, as it would appear users of capsicum and libc are more in need of a map of capsicum rights required by libc entry points than they are of convenience #defines.

Theoretically it will be possible to get rid of fcntl(F_GETFL) in fseek(), but O_APPEND flag need to be stored somewhere in that case, and stdio _flags already have all bit occupied for 16bit short. So the price will be changing size of the main stdio structure __sFILE to add new space for flags, which is undesirable I think.

I don't think it is worth the trouble, as given the larger pattern of libc routines requiring multiple capsicum rights, it seems one will in general have to have libc implementation knowledge when using it in concert with capsicum. For example, consider the limitfd() routine in kdump.c, which provides rights for the TIOCGETA ioctl to be used on stdout so the eventual call to isatty() via printf() will work as intended.

I think the above kdump example is a good one for the subtle issues that can arise when using capsicum with libc. That call to isatty() is via a widely-used internal libc routine __smakebuf(). __smakebuf() also calls __swhatbuf(), which in turn calls _fstat(), all to make sure that output to a tty is line buffered by default. It would appear that programs that restrict rights on stdout without allowing CAP_IOCTL and CAP_FSTAT could be disabling the normally default line buffering when stdout is a tty. kdump goes the distance, but dhclient does not (restricting stdout to CAP_WRITE only).

In any event, the patch attached to my first message is seeming like
_ftello() modification requires additional capsicum rights, breaking tcpdump and dhclient
In r268997, _ftello() was modified to use _fcntl(F_GETFL) in the non-append, write-only path. Consequently, programs that use _ftello() (via ftell, fgetpos, fsetpos, fseek, rewind...) on non-append, write-only files and that use capsicum to restrict capabilities on the associated fds to [CAP_SEEK, CAP_WRITE] broke as all ftell() (and friends) calls on those files fail with ENOTCAPABLE due to lack of CAP_FCNTL rights.

There appear to be only two affected programs in the tree - tcpdump and dhclient. This affects both CURRENT and 10-STABLE (including 10.1-PRERELEASE).

tcpdump, when configured to write to capture files rotated by size, fails to rotate and captures indefinitely to the first file in the series. This can be reproduced by a command such as:

tcpdump -i ifname -C 1 -W 2 -w packets -v

By inspection, dhclient will fail to trim old data from its client leases file when rewriting that file with a lesser amount of data than it currently contains. See the ftruncate() call in dhclient.c:rewrite_client_leases().

The attached patch adds CAP_FCNTL to the limited rights established for non-append, write-only files used by tcpdump and dhclient. It also restricts the fcntl rights to CAP_FCNTL_GETFL.

The current need to have CAP_FCNTL rights in order to get or set the file position on non-append, write-only files is subtle. Perhaps part of the answer is to define a CAP_FSEEK right in sys/capability.h that resolves to CAP_SEEK|CAP_FCNTL, or to modify the CAP_SEEK description in rights(4) to note the need for CAP_FCNTL when using ftell() and friends.

-Patrick

ftell_cap_rights.patch
Description: Binary data
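The rights change described in this message (add CAP_FCNTL to [CAP_SEEK, CAP_WRITE], then narrow the fcntl right to CAP_FCNTL_GETFL) can be sketched as below. This is not the attached patch, which is not reproduced in the archive. The function name is hypothetical, tolerating ENOSYS is a choice made for this sketch, and the header is assumed to be <sys/capsicum.h> (older trees used <sys/capability.h>). On systems other than FreeBSD the function compiles to a no-op.

```c
#include <assert.h>
#include <errno.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/*
 * Limit a non-append, write-only descriptor so that write() and the
 * ftell()/fseek() family keep working under Capsicum.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
limit_seekable_writer(int fd)
{
	assert(fd >= 0);
#ifdef __FreeBSD__
	cap_rights_t rights;

	/* CAP_FCNTL is the addition; _ftello() now calls _fcntl(F_GETFL). */
	cap_rights_init(&rights, CAP_SEEK, CAP_WRITE, CAP_FCNTL);
	if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS)
		return (-1);
	/* Allow only reading the file status flags through fcntl(). */
	if (cap_fcntls_limit(fd, CAP_FCNTL_GETFL) < 0 && errno != ENOSYS)
		return (-1);
#endif
	/* Outside FreeBSD nothing is restricted; Capsicum rights do not apply. */
	return (0);
}
```

With only CAP_SEEK and CAP_WRITE in the cap_rights_init() call, ftell() on the resulting stream fails with ENOTCAPABLE as described above, which is what stops tcpdump from rotating its capture files.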
Re: _ftello() modification requires additional capsicum rights, breaking tcpdump and dhclient
On Mon, Sep 8, 2014 at 4:42 PM, Andrey Chernov a...@freebsd.org wrote:

On 09.09.2014 0:28, Patrick Kelsey wrote:

In r268997, _ftello() was modified to use _fcntl(F_GETFL) in the non-append, write-only path. Consequently, programs that use _ftello() (via ftell, fgetpos, fsetpos, fseek, rewind...) on non-append, write-only files and that use capsicum to restrict capabilities on the associated fds to [CAP_SEEK, CAP_WRITE] broke as all ftell() (and friends) calls on those files fail with ENOTCAPABLE due to lack of CAP_FCNTL rights.

There appear to be only two affected programs in the tree - tcpdump and dhclient. This affects both CURRENT and 10-STABLE (including 10.1-PRERELEASE).

tcpdump, when configured to write to capture files rotated by size, fails to rotate and captures indefinitely to the first file in the series. This can be reproduced by a command such as:

tcpdump -i ifname -C 1 -W 2 -w packets -v

By inspection, dhclient will fail to trim old data from its client leases file when rewriting that file with a lesser amount of data than it currently contains. See the ftruncate() call in dhclient.c:rewrite_client_leases().

The attached patch adds CAP_FCNTL to the limited rights established for non-append, write-only files used by tcpdump and dhclient. It also restricts the fcntl rights to CAP_FCNTL_GETFL.

The current need to have CAP_FCNTL rights in order to get or set the file position on non-append, write-only files is subtle. Perhaps part of the answer is to define a CAP_FSEEK right in sys/capability.h that resolves to CAP_SEEK|CAP_FCNTL, or to modify the CAP_SEEK description in rights(4) to note the need for CAP_FCNTL when using ftell() and friends.

-Patrick

Stdio code use fcntl(F_GETFL) already in many places, f.e. fdopen(), freopen(). libc code in general use it in rpc code. According to your note, all that places are currently broken in anyway.

You make a good point about the wider use of fcntl() in libc - aside from the rpc code, by my count there are 14 other entry points in libc that use fcntl in their implementation. To experience breakage, programs that use those entry points would also have to be supplying them fds with restricted rights that do not include CAP_FCNTL. By my count, there are currently only 12 programs in -current that call cap_rights_limit().

I don't think these counts inform us very well as to the presence and extent of any capsicum+libc issues similar to the one that I've raised. Those 12 programs mentioned above would have to be audited to determine if any of the 15 libc entry points (including fcntl) that use fcntl are being used on those restricted fds without being granted CAP_FCNTL rights, and whether there are overt or potential failures occurring as a result. Consider that the failure mode in tcpdump that I found requires that you be using multiple capture files with size-based rotation, otherwise all works fine. Also consider that the failure mode in dhclient only occurs when a rewritten client lease file is smaller than its predecessor.

I don't think that this read-only fcntl(F_GETFL) which does not modify anything deserves any special rights at all (i.e. can be just enabled by default in contrast to F_SETFL), but I am not capsicum expert.

I don't think I am in a position to comment on the implications of permanent F_GETFL rights either. I do think that the point about wider use of fcntl(F_GETFL) in libc does argue against making a CAP_FSEEK right in sys/capability.h, as it would appear users of capsicum and libc are more in need of a map of capsicum rights required by libc entry points than they are of convenience #defines.

-Patrick