Re: question in sosend_generic()
On Fri, 7 Jun 2013, vasanth rao naik sabavat wrote:

> When sending data out of the socket I don't see in the code where the sb_cc is incremented.

sb_cc reflects data appended to the socket buffer; sosend_generic() is responsible for arranging copying in and performing flow control, but the protocol's own pru_send() routine performs the append. E.g., tcp_usr_send() performs sbappendstream(), which actually adds it to the socket buffer. Notice that not all protocols actually use the send socket buffer -- for example, UNIX domain sockets directly cross-deliver to the receiving socket's receive buffer.

> Is the socket send performed in the same thread of execution or the data is copied on to the socket send buffer and a different thread then sends the data out of the socket?

Protocols provide their own implementations to handle data moving down the stack, so the specifics are protocol-dependent. In TCP, socket buffer append occurs synchronously in the same thread as part of the pru_send() downcall from the socket layer. When data leaves the send socket buffer is quite a different question. For TCP, data may be sent immediately if the various windows allow immediate transmit of the data (e.g., flow control, congestion control) ... or it may remain enqueued in the send socket buffer until an ACK is received that indicates the receiver is ready for more data (e.g., growing window size, ACK clocking, etc). In the steady send state (e.g., filling the window) I would expect to see data sent (and later removed) from the socket buffer only in an asynchronous context. Typically, ACK processing occurs in one of two threads: device driver interrupt handling (i.e., in the ithread) or in the netisr thread for encapsulated or looped back traffic.

> Because, I see a call to sbwait(&so->so_snd) in the sosend_generic and I don't understand who would wake up this thread?

sbwait() implements blocking for flow/congestion control: when the socket buffer fills, the sending thread must wait for space to open up. Space becomes available as a result of successful transmit -- e.g., the sbtruncate() of the sending socket buffer when a TCP ACK has been received. So the thread that triggers the wakeup will usually be the ithread or netisr. In the case of UNIX domain sockets, it's actually the receiving thread that triggers the wakeup directly.

> If the data is not copied on to the socket buffers then it should technically send all data out in the same thread of execution and future socket send calls should see that space is always fully available. In that case I dont see a reason why we need to wait on the socket send buffer. As there would no one who will actually wake you up.

There are some false assumptions here. The sending thread will always append data [that fits] to the socket buffer, but may have to loop awaiting space for all data, depending on blocking/non-blocking status. Space becomes available when the remote endpoint acknowledges receipt, perhaps via a TCP ACK. You might never wake up if flow control from the remote endpoint doesn't find space becoming available, you've enabled blocking, and no timeout is set. If you fear the recipient may block the sender, then you need to implement some timeout mechanism to decide how long you're willing to wait.

>     if (space < resid + clen &&
>         (atomic || space < so->so_snd.sb_lowat || space < clen)) {
>             if ((so->so_state & SS_NBIO) || (flags & MSG_NBIO)) {
>                     SOCKBUF_UNLOCK(&so->so_snd);
>                     error = EWOULDBLOCK;
>                     goto release;
>             }
>             error = sbwait(&so->so_snd);
>             SOCKBUF_UNLOCK(&so->so_snd);
>             if (error)
>                     goto release;
>             goto restart;
>     }
>
> In the above code snippet, for a blocking socket if the space is not available, then it may trigger a deadlock?

You can experience deadlocks between senders and receivers as a result of cyclic waits for constrained resources (e.g., buffers). However, that is a property of application design, and applications that are killed will close their sockets, releasing resources. Most application designers attempt to avoid deadlock in their designs by ensuring that there is a path to progress, even a slow one.

The deadlock you're suggesting in general does not exist -- it would be silly to wait for something that could never happen. Instead, we wait for things that generally will happen (e.g., a TCP ACK) or a timeout, which would close the connection. Notice that sbwait() is allowed to fail -- if the connection is severed due to a timeout or RST, then it returns immediately with an error.

Robert
___
freebsd-hackers@freebsd.org mailing list
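As a rough illustration of the space test quoted in this message, the sketch below models the decision sosend_generic() makes before copying data in: proceed, fail with EWOULDBLOCK, or sleep in sbwait(). It is a userland model rather than kernel code; send_space_decision(), enum send_action and the plain integer parameters are names invented for this example, standing in for the socket buffer fields and flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the space check; not a kernel type. */
enum send_action {
	SEND_PROCEED,		/* room available: copy in and call pru_send() */
	SEND_WOULDBLOCK,	/* non-blocking socket: return EWOULDBLOCK */
	SEND_WAIT		/* blocking socket: sleep in sbwait() until woken */
};

/*
 * Model of the quoted condition. "space" is the free room in the send
 * buffer, "resid" the bytes still to send, "clen" the control data length,
 * "lowat" the low-water mark. All parameters are stand-ins for kernel state.
 */
static enum send_action
send_space_decision(long space, long resid, long clen, long lowat,
    bool atomic, bool nonblocking)
{
	assert(resid >= 0 && clen >= 0);

	if (space < resid + clen &&
	    (atomic || space < lowat || space < clen)) {
		if (nonblocking)
			return (SEND_WOULDBLOCK);
		return (SEND_WAIT);
	}
	return (SEND_PROCEED);
}
```

Note that a non-atomic send with some space above the low-water mark proceeds with a partial copy and loops, which is why the sender "may have to loop awaiting space for all data".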
Re: KVERIFY for non-debug invariants?
On Wed, 5 Dec 2012, Vijay Singh wrote:

> All. KASSERT() is a really neat way of expressing invariants when INVARIANTS is defined. However for regular, non-INVARIANTS code folks have the typical if() panic() combos, or private macros. Would a KVERIFY() that does this in non-INVARIANTS code make sense?

I'd certainly be fine with something like this. It might be worth posting to arch@ with a code example, as hackers@ has a subset of the potentially interested audience.

INVARIANTS has got a bit heavier-weight over the years -- the main thing I run into in higher-performance scenarios is its additional UMA debugging, which causes a global lock to be acquired during sanity checks. It might be worth our pondering adding a new configure option for particularly slow invariant tests -- e.g., INVARIANTS_SLOW ... or maybe just INVARIANTS_UMA. However, that's a different issue.

(I sort of feel that things labeled "assert" should be something we can turn on in production... so maybe INVARIANTS/KASSERT mission-creep is the issue.)

Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
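To make the proposal concrete, here is one possible shape for such a macro, written as a userland sketch. KVERIFY() does not exist in the tree as far as this thread indicates; the version below simply borrows the KASSERT(exp, (fmt, ...)) calling convention and keeps the check compiled in unconditionally. sketch_panic() is a hypothetical stand-in for panic(9), and checked_div() is only a usage example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Userland stand-in for the kernel's panic(9): print the message and abort. */
static void
sketch_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	abort();
}

/*
 * Hypothetical KVERIFY(): same argument shape as KASSERT(), but the test is
 * always compiled in. The message arguments are evaluated only on failure.
 */
#define	KVERIFY(exp, msg) do {						\
	if (!(exp))							\
		sketch_panic msg;					\
} while (0)

static int
checked_div(int num, int den)
{
	/* Debug-only check: vanishes under NDEBUG, as KASSERT does without INVARIANTS. */
	assert(num >= 0);
	/* Always-on check: what the proposed KVERIFY() would provide. */
	KVERIFY(den != 0, ("checked_div: zero divisor (num=%d)", num));
	return (num / den);
}
```

The expression is evaluated exactly once, so side effects in the condition behave the same as in an open-coded if () panic() pair.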
Re: A question about creating a system call
Hi Dave:

This wiki page may be of value:

    http://wiki.freebsd.org/AddingAuditEvents

Robert N M Watson
Computer Laboratory
University of Cambridge

On Thu, 8 Nov 2012, dave jones wrote:

> Hello,
>
> I know how to create system calls, but I'm a bit confused about how the sys/kern/syscalls.master file is explained. For example, if I have a foo system call, the following code is added:
>
>     532 AUE_NULL STD { int foo(char *str); }
>
> The question is about column two, AUE_NULL: can I replace it with AUE_FOO? How do I determine whether the system call should be audited or not?
>
> Thank you.
>
> Regards,
> Dave.
Re: No bus_space_read_8 on x86 ?
On Fri, 12 Oct 2012, Carl Delsey wrote:

>> Indeed -- and on non-x86, where there are uncached direct map segments, and TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit have quite different effects in terms of atomicity. Where uncached I/Os are being used, those differences may affect semantics significantly -- e.g., if your device has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you two halves of two different 64-bit values, rather than two halves of the same value. As device drivers depend on those atomicity semantics, we should (at the busspace level) offer only the exactly expected semantics, rather than trying to patch things up. If a device driver accessing 64-bit fields wants to support doing it using two 32-bit reads, it can figure out how to splice it together following bus_space_read_region_4().
>
> I wouldn't make any default behaviour for bus_space_read_8 on i386, just amd64. My assumption (which may be unjustified) is that by far the most common implementations to read a 64-bit register on i386 would be to read the lower 4 bytes first, followed by the upper 4 bytes (or vice versa) and then stitch them together. I think we should provide helper functions for these two cases, otherwise I fear our code base will be littered with multiple independent implementations of this.
>
> Some driver writer who wants to take advantage of these helper functions would do something like
>
>     #ifdef i386
>     #define bus_space_read_8 bus_space_read_8_lower_first
>     #endif
>
> otherwise, using bus_space_read_8 won't compile for i386 builds. If these implementations won't work for their case, they are free to write their own implementation or take whatever action is necessary.
>
> I guess my question is, are these cases common enough that it is worth helping developers by providing functions that do the double read and shifts for them, or do we leave them to deal with it on their own at the risk of possibly some duplicated code.

I was thinking we might suggest to developers that they use a KPI that specifically captures the underlying semantics, so it's clear they understand them. Untested example:

    uint64_t v;

    /*
     * On 32-bit systems, read the 64-bit statistic using two 32-bit
     * reads.
     *
     * XXX: This will sometimes lead to a race.
     *
     * XXX: Gosh, I wonder if some word-swapping is needed in the merge?
     */
    #ifndef __LP64__	/* 32-bit platforms */
    bus_space_read_region_4(space, handle, offset, (uint32_t *)&v, 2);
    #else
    v = bus_space_read_8(space, handle, offset);
    #endif

The potential need to word swap, however, suggests that you may be right about the error-prone nature of manual merging.

Robert
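For readers weighing the two positions, the sketch below shows what a "lower first" helper and a race-tolerant variant might look like. It is a userland model: plain volatile memory stands in for bus_space accesses, and read64_lower_first() and read64_counter() are hypothetical names, not existing KPIs. Both assume the device places bits 31:0 at the lower offset, which is exactly the device-dependent detail raised in the thread, and the second is only safe for monotonic counters whose reads have no side effects (not for FIFOs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 64-bit value as two 32-bit halves, low word first.
 * Assumes reg[0] holds bits 31:0 and reg[1] holds bits 63:32.
 * The two reads are not atomic with respect to the device.
 */
static uint64_t
read64_lower_first(const volatile uint32_t *reg)
{
	uint32_t lo, hi;

	assert(reg != NULL);
	lo = reg[0];
	hi = reg[1];
	return (((uint64_t)hi << 32) | lo);
}

/*
 * Variant for a free-running counter: re-read the high word and retry if it
 * changed, so a carry between the two halves cannot produce a torn value.
 */
static uint64_t
read64_counter(const volatile uint32_t *reg)
{
	uint32_t hi, lo, hi2;

	assert(reg != NULL);
	do {
		hi = reg[1];
		lo = reg[0];
		hi2 = reg[1];
	} while (hi != hi2);
	return (((uint64_t)hi << 32) | lo);
}
```

Neither helper addresses devices that latch the upper half when the lower half is read, or that expect the opposite order; a driver would still need to pick the variant that matches its hardware.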
Re: No bus_space_read_8 on x86 ?
On Fri, 12 Oct 2012, John Baldwin wrote:

>>>> I believe it was because bus reads weren't guaranteed to be atomic on i386. don't know if that's still the case or a concern, but it was an intentional omission.
>>>
>>> True. If you are on a 32-bit system you can read the two 4 byte values and then build a 64-bit value. For 64-bit platforms we should offer bus_read_8() however.
>>
>> I believe there is still no way to perform a 64-bit read on a i386 (or at least without messing with SSE instructions), but if you have to read a 64-bit register, you are stuck with doing two 32-bit reads and concatenating them. I figure we may as well provide an implementation for those who have to do that as well as the implementation for 64-bit.
>
> I think the problem though is that the way you should glue those two 32-bit reads together is device dependent. I don't think you can provide a completely device-neutral bus_read_8() on i386. We should certainly have it on 64-bit platforms, but I think drivers that want to work on 32-bit platforms need to explicitly merge the two words themselves.

Indeed -- and on non-x86, where there are uncached direct map segments, and TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit have quite different effects in terms of atomicity. Where uncached I/Os are being used, those differences may affect semantics significantly -- e.g., if your device has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you two halves of two different 64-bit values, rather than two halves of the same value. As device drivers depend on those atomicity semantics, we should (at the busspace level) offer only the exactly expected semantics, rather than trying to patch things up. If a device driver accessing 64-bit fields wants to support doing it using two 32-bit reads, it can figure out how to splice it together following bus_space_read_region_4().

Robert
Re: syslog(3) issues
On Mon, 3 Sep 2012, Attilio Rao wrote:

> I was trying to use syslog(3) in a port application that uses threading, having all of them at the LOG_CRIT level. What I see is that when the logging gets massive (1000 entries) I cannot find some items within the /var/log/messages (I know because I started stamping also some sort of message ID in order to see what is going on). The missing items are in the order of 25% of what should really be there.
>
> Someone has a good idea on where I can start verifying for my syslogd system? I have really 0 experience with syslogd and maybe I could be missing something obvious.

syslog(3)/syslogd(8) use datagram sockets for both local and networked logging, and it is possible for those datagram sockets to fill and drop messages. I'm not sure if we have per-socket counters that can easily be queried by syslogd, but if we do, it might be beneficial to have syslogd wake up once a second and check to see if the counters have changed -- if they have, inject a log message indicating how many messages were dropped in the last $epsilon. If we don't have counters along those lines, it might make sense to add them.

We might also find that it is appropriate to tune up the limits if they no longer seem sensible in the current world order -- they may have late 1980s/early 1990s values (or they may not).

Robert
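The once-a-second check suggested in this message could be sketched roughly as follows. This is only the bookkeeping half of the idea: where the drop counter comes from is left open, since the message itself is unsure whether such a per-socket counter exists. drop_report() and the message format are invented for this example and are not part of syslogd.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare the current value of a (hypothetical) dropped-datagram counter
 * with the value seen at the previous poll. If it moved, format a message
 * into buf and return 1; otherwise return 0 and leave buf untouched.
 */
static int
drop_report(uint64_t *last_seen, uint64_t current, char *buf, size_t len)
{
	uint64_t delta;

	assert(last_seen != NULL && buf != NULL && len > 0);

	/* Unsigned subtraction still yields the right delta across a wrap. */
	delta = current - *last_seen;
	*last_seen = current;
	if (delta == 0)
		return (0);
	snprintf(buf, len, "syslogd: %" PRIu64 " messages dropped", delta);
	return (1);
}
```

A caller would invoke this from a periodic timer and feed any generated line into the normal logging path, so that gaps in /var/log/messages become visible rather than silent.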
Re: projects/armv6 merged to HEAD
On Thu, 16 Aug 2012, Oleksandr Tymoshenko wrote:

> projects/armv6 branch was merged to HEAD and should be considered dead now. This patch is a result of a joint effort by many people. Including but not limited to:

Amazing work -- many thanks are due to everyone who was involved!

Robert

> Grzegorz Bernacki (gber@)
> Aleksander Dutkowski
> Ben R. Gray (bgray@)
> Olivier Houchard (cognet@)
> Rafal Jaworowski (raj@) and Semihalf team
> Tim Kientzle (kientzle@)
> Jakub Wojciech Klama (jceel@)
> Ian Lepore
> Warner Losh (imp@)
> Damjan Marion (dmarion@)
> Lukasz Plachno
> Stanislav Sedov (stas@)
> Mark Tinguely
> Andrew Turner (andrew@)
>
> Thanks to all, who contributed by submitting code, testing and giving valuable advice.
>
> Code drop includes following parts:
>
> - General ARMv6/ARMv7 kernel bits (pmap, cache, assembler routines, etc...)
> - ARM SMP support
> - VFP/Neon support
> - ARM Generic Interrupt Controller driver
> - Improved thread-local storage for cpus >=ARMv6
> - Two new values for TARGET_ARCH: armv6 and armv6eb
> - Driver for SMSC LAN95XX and LAN8710A ethernet controllers
> - Marvell MV78x60 support (multiuser, ARMADA XP kernel config)
> - TI OMAP4 and AM335x support (multiuser, no GPU or graphics support, kernel configs for Pandaboard and Beaglebone)
> - LPC32x0 support (multiuser, frame buffer works with SSD1289 LCD controller. Embedded Artists EA3250 kernel config)
> - Barebone Nvidia Tegra2 support (timers, interrupts and UART. No kernel config)
>
> Hope now that the code is in trunk it will get more attention and love from developers.
>
> Happy hacking
> --
> gonzo
Re: sysctl filesystem ?
On Tue, 26 Jun 2012, Chris Rees wrote:

>> as well as we don't depend of /proc for normal operation we shouldn't for say /proc/sysctl
>>
>> improvements are welcome, better documentation is welcome, changes to what is OK - isn't.
>
> /proc/sysctl might be useful. Just because Linux uses it doesn't make it a bad idea.

One of the problems we've encountered with synthetic file systems is that off-the-shelf file system tools (e.g., cp, dd, cat) make simplistic (but not unreasonable) assumptions about the static content of files. This comes up frequently with procfs-like systems where the size of, say, memory map data can be considerably larger than the perhaps 128-byte, 256-byte, or even 8k buffers that might exist in a stock file access tool. Unless we change all of those tools to use buffers much bigger than they currently do, which even suggests changing the C library buffer defaults for FILE *, that places an onus on the file system to provide persisting snapshots of data until it's sure that a user process is done -- e.g., over many system calls.

sysctl is not immune to the requirement of atomicity, but it has explicit control over it: sysctl is a single system call, rather than an unbounded open-read-seek-repeat-etc cycle, and has been carefully crafted to provide this and other MIB-like properties, such as a basic data type model so that command line tools know how to render content rather than having to guess and/or get it wrong.

sysctl has some file-system like properties, but on the whole, it's not a file system -- it's much more like an SNMP MIB. While you can map anything into anything (including Turing machines), I think the sysctl command line tool and API, despite its limitations, is a better match for accessing this sort of monitoring and control data than the POSIX file API, and would recommend against trying to move to a sysctl file system.

Robert
Re: SMP: protocol control block protection for a multithreaded process (ex: udp).
On Tue, 29 May 2012, vasanth rao naik sabavat wrote:

> In case of a Multicore cpu system running a multithreaded process. For protocol control blocks there is no protection provided in the FreeBSD 9. For example, udp_close() and udp_send() access the inp before taking the lock. Couldn't this cause the inp inconsistency on a multithreaded process running on multicore cpu system?
>
> Say, If the two threads of a process are concurrently executing socket send and socket close say on a udp connection (this can happen in case of poorly written user code.). udp_close() will access the inp on one cpu and udp_send() will access the inp on another cpu. it is possible that udp_close() gets the locks first and free's the inp before udp_send() has a chance to run?
>
> Am I missing anything?

The life cycle here is complicated and there is some subtlety. The simple answer to your question is that udp_abort() and udp_close() don't free the inpcb -- that occurs in udp_detach(), which is called only when the reference count on the socket hits 0, which can't happen while udp_send() is in flight, as the caller owns a reference maintaining the stability of the socket.

Take a look at the comment at the top of uipc_socket.c for more detailed coverage of socket life cycles; for UDP, inpcbs are around for the entire life cycle of the socket, so it is always safe to follow so->so_pcb if you hold a valid socket reference (either borrowed from a process's file descriptor, or held). For TCP, things are more complex.

Robert
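The life cycle argument in this reply can be modelled with a few lines of reference counting. The sketch below is a deliberately simplified userland model, not the kernel's socket code: struct sketch_socket and the sketch_*() functions are invented names, the "detached" flag stands in for udp_detach() freeing the inpcb, and real sockets involve file descriptor references and locking that are not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_socket {
	atomic_int	refs;		/* descriptor plus in-flight callers */
	bool		detached;	/* stands in for udp_detach() having run */
};

/* A new socket starts with one reference, owned by the file descriptor. */
static void
sketch_init(struct sketch_socket *so)
{
	atomic_init(&so->refs, 1);
	so->detached = false;
}

/* Taken by a caller (e.g., a send in progress) that already has a valid reference. */
static void
sketch_hold(struct sketch_socket *so)
{
	int old = atomic_fetch_add(&so->refs, 1);

	assert(old > 0);
	(void)old;
}

/* Drop a reference; protocol state is torn down only when the last one goes. */
static bool
sketch_release(struct sketch_socket *so)
{
	if (atomic_fetch_sub(&so->refs, 1) == 1) {
		so->detached = true;
		return (true);
	}
	return (false);
}
```

In this model, a close() racing with a send merely drops the descriptor's reference; the teardown is deferred until the sending thread releases its own.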
Re: SMP: protocol control block protection for a multithreaded process (ex: udp).
On Tue, 29 May 2012, vasanth rao naik sabavat wrote:

> Can somebody please reply to this email. basically, can udp_detach() and udp_send() execute simultaneously for a process with multiple threads? if yes, then inp reference in udp_send() will be stale if udp_detach() free's the inp?

You are confusing application-level close() with an actual close in the socket implementation. The socket will remain allocated as long as there are consumers using it, which is ensured through a reference count on the socket, regardless of close(). That isn't to say that there aren't bugs -- this stuff is pretty complex -- but the life cycle and synchronisation models around sockets should prevent the scenario you are describing from occurring.

Robert

> Thanks,
> Vasanth
>
> On Tue, May 29, 2012 at 10:53 AM, vasanth rao naik sabavat vasanth.raon...@gmail.com wrote:
>
>> Hi,
>>
>> In case of a Multicore cpu system running a multithreaded process. For protocol control blocks there is no protection provided in the FreeBSD 9. For example, udp_close() and udp_send() access the inp before taking the lock. Couldn't this cause the inp inconsistency on a multithreaded process running on multicore cpu system?
>>
>> Say, If the two threads of a process are concurrently executing socket send and socket close say on a udp connection (this can happen in case of poorly written user code.). udp_close() will access the inp on one cpu and udp_send() will access the inp on another cpu. it is possible that udp_close() gets the locks first and free's the inp before udp_send() has a chance to run?
>>
>> Am I missing anything?
>>
>> Thanks,
>> Vasanth
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On Mon, 16 Jan 2012, Julian Elischer wrote:

> On 1/16/12 3:32 PM, William Bentley wrote:
>
>> I also echo John's sentiments here. Very excellent points made here. Thank you for voicing your opinion. I was beginning to think I was the only one who felt this way.
>>
>> [...]
>>
>> We seem to have lost our way around the release of FreeBSD 7. I am all in favor of new features but not at the risk of stability and proper life cycle management. Are me and John the only people that feel this way or are we among the minority?
>
> It pretty much boils down to one thing.. man power..

I disagree. Resourcing is an issue, but it is not *the* issue.

The real issue here is a failure by the release engineering team (which includes me) to concurrently perform major and minor releases. Given that minor releases run like clockwork in most cases, this is disappointing. In the past, there have been a lot of good technical and structural obstacles to trying to do clockwork releases for both major and minor releases:

- Tight synchronisation of the ports and base release schedule means that the base release schedule limits ports productivity.
- Long freezes forced on us by poor revision control support for branching.

None of these really apply any longer -- and in as much as they do, they should be addressed. In particular, I think there's a growing feeling that ports should be conducting its own releases out of lockstep with the base tree, producing package sets as a primary product at regular intervals regardless of the base release schedule. Likewise, long freezes enforced by expensive branching operations in CVS no longer apply due to use of Subversion -- it's not perfect, but it's workable.

There's no way to satisfy everyone with any particular maintenance schedule and release cycle. However, it seems clear that the current model with minor releases spaced at a year is satisfying no one. It's easy to point at a developer-user divide, but I think that misses the point: most developers are users. A big gap between development branch and shipped features hurts the commercial users of FreeBSD that pay for so much of its development, since it forces them to support diverging local development and shipping products -- ISPs, etc. There is no incentive for year-long gaps in minor releases.

My view is therefore that we have a social -- which is to say structural -- problem. Regardless of .0 releases, we should be forcing out minor releases, which are morally similar to service packs in the vocabulary of other vendors: device driver improvements, new CPU support, steady, conservative feature development, etc, required to keep older major releases viable on contemporary hardware and with contemporary applications.

One known problem is using a single head release engineer in steering all releases. I think this is a mistake, as it makes the whole project's release schedule subject to individual unavailability, burnout, etc, as well as increasing the risks associated with low bus factor. I'd like to see us move to a model where new release engineers are mentored in from the developer community for point releases, ensuring that we increase our expertise, share knowledge about release engineering in the broader community, and get new eyes on the process which can lead more readily to process improvements. The role of the head release engineer shouldn't be hands-on production of every release, but rather, steering of the overall team. I'd like to see this begin with 8.3, drawing a per-release lead from the developer community, and continue with a fixed schedule release of 8.4.

Yes, more staffing is needed, but first, what is needed is an improvement in model.

On a related note, the security team owns the freebsd-update mechanism, largely for historical reasons (Colin wrote it), but this is actually a bit backwards from how you would expect things to run, as we now use freebsd-update for upgrades, which are almost never engineered by the security team. Not sure what the fix is there, but it seems related -- perhaps what is really called for is breaking out our .0 release engineering entirely from .x engineering, with freebsd-update being in the latter.

Robert
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On Wed, 18 Jan 2012, Andriy Gapon wrote:

> on 18/01/2012 02:16 Igor Mozolevsky said the following:
>
>> Seriously, WTF is the point of having a PR system that allows patches to be submitted??! When I submit a patch I fix *your* code (not yours personally, but you get my gist).
>
> Let me pretend that I don't get it. It is as much your code as it is mine if you are a user of FreeBSD. I just happen to have a commit bit at this point in time.
>
>> No other project requires a non-committer to be so ridiculously persistent in order to get a patch through.
>
> There are about 5000 open PRs for FreeBSD base system, maybe more. There are only a few dozens of active FreeBSD developers. Maybe less for any given particular point in time (as opposed to a period of time). And dealing with PRs is not always exciting. Need I continue?
>
> P.S. Using GNATS for the PR database doesn't help either, in some technical ways.

The structural problem around the PR system for the base system is that there isn't a whole lot of incentive for most developers to use it. I think we can reasonably categorise developers into three classes -- some move between or span them, of course:

(1) Volunteers. Due to childhood trauma, they have a desperate urge to write operating systems. Not much incentive to do PRs here, as most refer to versions of FreeBSD before their time, aren't great characterisations, rarely come with patches, and when they do, the patches are out of date, don't apply, have the wrong style, solve the wrong problem, etc. A sweeping generalisation, but you see what I mean. The only exceptions here are our dedicated team of bugmeisters, who get enormous respect from me, but they are a tiny minority.

(2) Employees. They work at a company using FreeBSD as a product, and effectively deliver their own CompanyBSD as a further product to their own internal customers -- to be put on a web service frontline, to ship as the foundation of an appliance, etc. The key phrase here is "internal customers" -- they have their own bug report database, which they respond to in a timely way due to the incentives of the workplace, but also because they are relevant bug reports for their product goals.

(3) Authors of upstream code. They don't even work on FreeBSD, but their code ends up in FreeBSD, so they also have their own bug report databases, fix bugs, and eventually the fixes trickle into FreeBSD.

With the above, the incentives to handle PRs are very weak -- and it's compounded by gnats being terrible for both submitters and handlers of bug reports. Contrast this with ports, where the PR database is a key part of the workflow.

However, and I am being entirely honest when I say this: FreeBSD works anyway. So somehow, we end up with a pretty good OS despite largely ignoring our bug report database. Why? Well, for (1) it's because volunteers have a strong sense of ownership of the code they've written and care about, (2) there's a significant internal QA and bug management effort at downstream companies from FreeBSD, whose improvements are frequently upstreamed by committers on staff, and (3) occurs independently of bugs in our bug report database.

Don't get me wrong: it's a problem that the PR database goes so unloved. But it's a symptom of the construction of *extremely large* volunteer projects in which the incentives are not aligned for dealing with PRs most of the time. If you want to see something similarly sad, try counting dropped patches on the linux-kernel list. Someone once ported the entire FreeBSD kernel audit framework and OpenBSM to Linux, posted on the list saying "here are my patches", never heard anything back, and went away. You can moralise in various ways and for various parties in that relationship, but at heart, that's pretty similar to a lot of the patches in the PR database; you'll find similar stuff in every open source project of scale. I submitted patches to fix several bugs in KDE a decade or so ago .. after five years, the reports were closed as "out of date". Yet large open source products *do* work, and become the foundations for amazing things.

I think shifting away from Gnats would help as it would make it easier for developers to find bugs they care about, users to submit higher-quality reports, and so on. Gnats makes it really hard to manage reports in a useful way. Another possibility is to get some combination of {The FreeBSD Foundation, iX Systems, ...} to trawl the bug report database in a more official capacity. The problem there is that this will be a high burn-out job. I'll bring it up at the next Foundation board meeting, especially after a bumper year of fund-raising, and see what we can do.

Robert
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On Tue, 17 Jan 2012, Andriy Gapon wrote:

> on 17/01/2012 00:28 John Kozubik said the following:
>
>> we going to run RELEASE software ONLY
>
> My opinion: you've put yourself in a box that is not very compatible with the current FreeBSD release strategy. With your scale and restrictions you probably should just use the FreeBSD source and roll your own releases from a stable branch of interest (including testing, etc). Or have your own branch where you could cherry-pick interesting changes from any FreeBSD branches. Tools like e.g. git and mercurial make it easy.
>
> Of course, this strategy is not as easy as trying to persuade the rest of FreeBSD community/project/thing to change its ways, but perhaps a little bit more realistic. You can bond with similarly minded organizations to share costs/work/etc. It's a community-driven project after all.

Suppose for a moment we get the .x release process fixed: we start cutting regular point releases from -STABLE on a 6-month cycle (just a strawman). freebsd-update's update and upgrade features actually make tracking -STABLE at release engineered time slices plausible.

One reason that's true is that between 5.x and 6.x, the FreeBSD Project underwent a substantive change in our approach to binary interfaces. In 4.x and before, the letters "ABI" rarely hit the mailing lists. In 6.x and later, it's a key topic discussed whenever merges to -STABLE come up. We now really care about keeping applications running as the OS moves under them. We also build packages to better-defined ABIs -- not perfectly, but OK.

I think John gets a lot of what he wants if we just fix our release cycle.

Robert
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On Tue, 17 Jan 2012, Doug Barton wrote: The other thing I think has been missing (as several have pointed out in this thread already) is any sort of planning for what should be in the next release. The current time-based release schedule is (in large part) a reaction to the problems we had in getting 5.0 out the door. However I think the pendulum has swung *way* too far in the wrong direction, such that we are now afraid to put *any* kind of plan in place for fear that it will cause the release schedule to slip. Aside from the obvious folly in that (lack of) plan, it fails to take into account the fact that the release schedules already slip, often comically far out into the future, and that the results are often worse than they would have been otherwise. Agreed entirely. There's been an over-swing caused by the diagnosis "it's like herding cats" turning into "cats can't be herded, so why try?". Projects like FreeBSD don't agree if there's no consensus on interesting problems to solve, directions to run in, etc. The history of FreeBSD is also full of examples of successful collaborative development in which developers decide, together, on a direction and run that way. Sure, it's not the same as "we are paying you to do X", but I think many FreeBSD developers like the idea that they are working on something larger than just their own micro-project, and would subscribe (and contribute) to a sensible plan. In fact, I think we'd find that if we were a bit more forthcoming about our plans, we'd have an easier time soliciting contributions from people less involved in the project, as it would be more obvious how they could get involved. It strikes me that the first basic plan would be a release schedule, however. :-) Robert
Re: buf_ring(9) API precisions
On Thu, 15 Sep 2011, K. Macy wrote: Why are you making an MD guess, the amount of padding to fit the size of a cache line, in MI API ? Strangely enough, you did not make this assumption in, say r205488 (picked randomly). It has been several years, and I haven't done any work in svn in over a year, I don't remember. I probably meant to refine it in a later iteration. If you would like to send me a patch addressing this I'd be more than happy to apply it if appropriate. Otherwise, I will deal with it some time after 9 settles. Thanks for pointing this out. I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be used on individual elements of a struct (causing appropriate padding to be added), but that may be one option here. Of course, that introduces a further alignment requirement on the struct itself, so a moderate amount of care would need to be used. Robert
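To make the per-member alignment idea above concrete, here is a minimal userland sketch. It assumes GCC or Clang, whose aligned attribute can be applied to individual struct members. EX_CACHE_LINE_SIZE (fixed at 64 here) and struct ex_ring are hypothetical stand-ins, not the kernel's CACHE_LINE_SIZE or the real struct buf_ring layout.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64 stands in for the machine-dependent CACHE_LINE_SIZE. */
#define EX_CACHE_LINE_SIZE 64

/* Hypothetical ring header; NOT the real struct buf_ring. */
struct ex_ring {
	/* Producer fields begin on a cache-line boundary. */
	volatile unsigned int prod_head
	    __attribute__((aligned(EX_CACHE_LINE_SIZE)));
	volatile unsigned int prod_tail;
	/* The compiler pads up to the next line before the consumer fields. */
	volatile unsigned int cons_head
	    __attribute__((aligned(EX_CACHE_LINE_SIZE)));
	volatile unsigned int cons_tail;
};

/* The member alignment also raises the alignment of the struct itself. */
static_assert(_Alignof(struct ex_ring) == EX_CACHE_LINE_SIZE,
    "struct inherits the strictest member alignment");
static_assert(offsetof(struct ex_ring, cons_head) % EX_CACHE_LINE_SIZE == 0,
    "consumer fields start on their own cache line");

/* Offset of the consumer half, so the inserted padding can be inspected. */
size_t
ex_ring_consumer_offset(void)
{
	return (offsetof(struct ex_ring, cons_head));
}
```

The second point in the message shows up in the first static_assert: any allocation of such a struct has to honour the larger alignment, which is where the "moderate amount of care" comes in.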
Re: TIME_WAIT Assassination in FreeBSD???
On Sat, 3 Sep 2011, Jarrod Lee Petz wrote: 3. Does FreeBSD handle this situation? How? I can't seem to find much info on TIME_WAIT assassination in FreeBSD; it is mentioned in RFC 6056. I'm not familiar with the RFC side here, but I can confirm that FreeBSD will recycle TIMEWAIT connections more quickly than specified when load is very high. This is done on the basis of allocated space; the sysctl net.inet.tcp.maxtcptw instructs the stack regarding how much state to retain -- this is implemented by adjusting the allocation limit on the tcptw zone. On my system, it seems to auto-tune to about 5000 connections, a value derived from the global limit on the number of sockets on the box I'm looking at -- your mileage may vary. The resource limit case can occur in tcp_twstart(), when uma_zalloc() returns NULL on failing to allocate new TIMEWAIT state for a connection. At that point, it forces an early scan of TIMEWAIT connections (which normally happens on 2msl intervals) with a 'reuse' argument of 1, authorising premature reuse. Without too close an analysis, it appears on face value to implement LRU: we reuse storage held by the connection that has been in TIMEWAIT the longest. Robert
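The allocate-or-reuse-the-oldest behaviour described above can be modelled in a few lines of userland C. This is only a sketch of the policy as the message describes it, not the code in tcp_twstart(): the pool, EX_MAXTCPTW (a tiny stand-in for net.inet.tcp.maxtcptw) and every ex_ name are hypothetical.

```c
#include <stddef.h>

/* Hypothetical tiny limit; stands in for net.inet.tcp.maxtcptw. */
#define EX_MAXTCPTW 3

struct ex_tw {
	int		conn_id;	/* hypothetical connection identifier */
	unsigned long	entered;	/* logical time TIMEWAIT was entered */
	int		in_use;
};

static struct ex_tw ex_pool[EX_MAXTCPTW];
static unsigned long ex_clock;

/* Model of a limited zone: a free slot, or NULL once the limit is hit. */
static struct ex_tw *
ex_tw_alloc(void)
{
	for (size_t i = 0; i < EX_MAXTCPTW; i++)
		if (!ex_pool[i].in_use)
			return (&ex_pool[i]);
	return (NULL);
}

/* The entry that has been in TIMEWAIT the longest. */
static struct ex_tw *
ex_tw_oldest(void)
{
	struct ex_tw *oldest = NULL;

	for (size_t i = 0; i < EX_MAXTCPTW; i++)
		if (ex_pool[i].in_use &&
		    (oldest == NULL || ex_pool[i].entered < oldest->entered))
			oldest = &ex_pool[i];
	return (oldest);
}

/*
 * Enter TIMEWAIT for conn_id.  Returns the id of the connection whose
 * state was recycled early, or -1 if a free slot was available.
 */
int
ex_tw_start(int conn_id)
{
	int evicted = -1;
	struct ex_tw *tw = ex_tw_alloc();

	if (tw == NULL) {		/* allocation failed: reuse oldest */
		tw = ex_tw_oldest();
		evicted = tw->conn_id;
	}
	tw->conn_id = conn_id;
	tw->entered = ++ex_clock;
	tw->in_use = 1;
	return (evicted);
}
```

With a limit of three, the fourth connection to enter TIMEWAIT displaces the first, the fifth displaces the second, and so on.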
Re: Dynamic kernel module linking problem
On Fri, 26 Aug 2011, Monthadar Al Jaberi wrote: I have written a dynamic loadable module using DECLARE_MODULE in FreeBSD-Current. And I want to iterate through the ifnet list using following code snippet: If this is on a recent version of FreeBSD (8.x and later), then you probably mean to be using V_ifnet, and you should include if_var.h rather than using an extern in order to ensure virtualisation is handled properly. Robert extern struct ifnethead ifnet; ... struct ifnet *ifp, *ifp_temp; TAILQ_FOREACH_SAFE(ifp, &ifnet, if_link, ifp_temp) { printf("%s\n", ifp->if_dname); } Compilation is fine, but when I load the module I get the following error: ... /sbin/kldload -v module.ko link_elf: symbol ifnet undefined ... What am I doing wrong? Shouldn't the kernel be able to link it on its own? Grateful for any advice. -- //Monthadar Al Jaberi
Re: Capsicum project: Ideas needed
On Thu, 4 Aug 2011, Lars Engels wrote: I just stumbled upon this rather outdated thread... On Fri, 8 Jul 2011 15:09:52 +0400, Ilya Bakulin wrote: [...] wget curl links/lynx This is Ports software, we may try to modify it and even send patches to upstream, or maintain our local patches. I wanted to focus on base system components during GSoC, but it doesn't hurt to try to capsicumize these tools either. fetch(1) is similar to wget and curl and is part of the base system, so would this be a candidate? I'd think fetch would be quite a good candidate -- most of its work is done as a pipeline between a socket and a file, and sandboxing the gubbins that sits in the middle of that pipeline would be quite beneficial. Robert
Re: MIPS toolchain
On Fri, 29 Jul 2011, James Jones wrote: Does anyone have a prebuilt MIPS tool chain? For FreeBSD-related MIPS work, I generally use the FreeBSD toolchain target followed by the buildenv environment, but that requires first building a cross-toolchain using TARGET_ARCH and TARGET. However, the result is a pretty sane compiler, linker, etc, setup for the MIPS of your choice (we tend to use mips64eb). We also use the MIPS-provided SDE toolchain for Linux at the CL, but that appears to be out of maintenance, and I haven't found its bug density to be any lower, really, than the even more ageing FreeBSD versions of the tools. In fact, there are some toolchain bugs I'm running into that manifest only in the SDE toolchain and not the FreeBSD toolchain. (Mind you, Philip has commented that in building Uboot for MIPS, he's found FreeBSD bugs that don't appear in the SDE toolchain, so mileage varies). We're greatly looking forward to MIPS support for LLVM, which currently appears very premature indeed. Someone from MIPS appears to be contributing to it, however, and we (cl.cam.ac.uk) hope to provide some implementation support for that effort in the immediate future. Robert ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: MAC Framework, Socket information
On Thu, 28 Jul 2011, s wrote: I need to get some info about the socket being created by the user. What I want to do is log all TCP/UDP outgoing connections that are being made. I *need* to get the local and remote address, as well as the local and remote port. I managed to get all of the remote data, but this is useless to me, if I haven't got the local port. Here is what I have already written: Most MAC Framework entry points are invoked before operations of interest, rather than after, because they are intended to perform access control on operations. I think the closest you may be able to get given current entry points is logging when the first operation is performed on the connected socket: i.e., read, write, sendfile, etc, since it will be established at that point (some caution required: you can invoke system calls on sockets before and during connect()). However, I can't help but wonder: would you be better-served by using the kernel's audit facilities to track events like socket connection? Are you blending access control and logging in your module, or is this really just about logging? Robert static int slog_socket_check_connect(struct ucred *cred, struct socket *socket, struct label *socketlabel, struct sockaddr *sockaddr) { if (sockaddr->sa_family == AF_INET) { struct sockaddr_in sa; log(LOG_SECURITY | LOG_DEBUG, "Somebody made a socket: %d:%d (%d)\n", cred->cr_ruid, ntohs(((struct sockaddr_in*)sockaddr)->sin_port), ntohs(((struct in_endpoints*)sockaddr)->ie_lport) ); } return 0; } -- Regards, Jakub 'samu' Szafrański
Re: Add setacl system call?
On Mon, 25 Jul 2011, exorcistkiller wrote: Another question while I'm reading the code. In ufs_acl.c, in static int ufs_getacl_posix1e(struct vop_getacl_args *ap), you commented: "As part of the ACL is stored in the inode, and the rest in an EA, assemble both into a final ACL product." From what I learned from Kirk's book, ACLs are supposed to be stored in extended attributes blocks. So what do you mean by part of the ACL is stored in the inode? I know extended attributes blocks data can be addressed by inode, but how to get ACL directly from the inode? POSIX.1e ACLs are defined as an extension to permissions: additional user entries, group entries, and a mask entry supplement the existing owner, group and other permission bits. Both the APIs and command line tools allow the portions of the ACL representing permission bits to be directly manipulated. For the purpose of the UFS implementation (and I suspect this to be common in other implementations as well), we keep the owner/group/other bits (or sometimes the mask bits) in the existing inode permissions field. All additional entries are stored in the extended attribute. This has some nice properties, not least: (1) stat(2) on the file still only needs to look at the inode, not the extended attributes, making it faster. (2) chmod(2) can be implemented by writing out only the inode, also faster. (3) Files without extended ACLs don't need extended attributes stored. The inclusion of a mask field in POSIX.1e is motivated similarly: it is what allows stat(2) and chmod(2) to not touch extended ACL fields. This is what the comment means by part of the ACL being stored in the inode, and part in the extended attribute: any areas of an ACL that are actually permission mask entries go in the existing mode bits in the inode for efficiency reasons. Robert
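The split described above can be sketched as a small userland model. It follows the description in the message rather than the actual ufs_acl.c code: the ex_ types are hypothetical, and the exact treatment of the group entry when a mask is present (mask goes into the mode's group bits, the group-owner entry moves to the extended attribute) is an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical tags modelled on the POSIX.1e entry kinds. */
enum ex_tag {
	EX_USER_OBJ, EX_USER, EX_GROUP_OBJ, EX_GROUP, EX_MASK, EX_OTHER
};

struct ex_entry {
	enum ex_tag	tag;
	unsigned	id;	/* uid/gid for EX_USER/EX_GROUP, else unused */
	unsigned	perm;	/* rwx bits, 0..7 */
};

/*
 * Split an ACL into inode mode bits (returned) and the entries that
 * would have to live in an extended attribute (copied into ea[]).
 * ea[] must have room for n entries.
 */
unsigned
ex_split_acl(const struct ex_entry *acl, size_t n,
    struct ex_entry *ea, size_t *nea)
{
	unsigned mode = 0;
	int has_mask = 0;
	size_t i;

	*nea = 0;
	for (i = 0; i < n; i++)
		if (acl[i].tag == EX_MASK)
			has_mask = 1;

	for (i = 0; i < n; i++) {
		switch (acl[i].tag) {
		case EX_USER_OBJ:		/* owner bits */
			mode |= (acl[i].perm & 7) << 6;
			break;
		case EX_OTHER:			/* other bits */
			mode |= acl[i].perm & 7;
			break;
		case EX_MASK:			/* mask occupies the group bits */
			mode |= (acl[i].perm & 7) << 3;
			break;
		case EX_GROUP_OBJ:
			if (has_mask)		/* assumption: displaced to the EA */
				ea[(*nea)++] = acl[i];
			else
				mode |= (acl[i].perm & 7) << 3;
			break;
		case EX_USER:			/* additional entries: EA only */
		case EX_GROUP:
			ea[(*nea)++] = acl[i];
			break;
		}
	}
	return (mode);
}
```

A file with only owner/group/other entries ends up with nothing for the extended attribute, which is property (3) in the message; stat(2) and chmod(2) only ever need the returned mode.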
Re: HTT vs SMT in x86 SMP topology reporting
On Tue, 26 Jul 2011, Andriy Gapon wrote: Can anybody explain to me why our _x86_ SMP topology discovery and reporting code sometimes reports HTT and sometimes SMT? As in FreeBSD/SMP: %d package(s) x %d core(s) x %d HTT threads vs FreeBSD/SMP: %d package(s) x %d core(s) x %d SMT threads As I understand, and quoting Wikipedia (I know, I know), SMT stands for simultaneous multithreading and is a generic term for a particular kind of hardware multithreading: http://en.wikipedia.org/wiki/Simultaneous_multithreading The only known (to me) implementation of SMT for x86 is Intel's Hyper-Threading Technology aka HTT aka HT Technology aka hyperthreading: http://en.wikipedia.org/wiki/Hyper-threading http://software.intel.com/en-us/articles/intel-hyper-threading-technology-your-questions-answered/?wapkw=%28Intel+Hyper-Threading+Technology%29 Several MIPS platforms we run on support SMT. Typically this means a set of weaker threads sharing a single core, usually context switching as a result of memory access stalls in other threads, and perhaps sharing particularly expensive CPU features, such as a TLB. They sometimes come with high-performance message-passing facilities between threads, or even between cores, to supplement shared memory and IPIs. It may be that HTT is, among other things, a trademark of Intel. Robert ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Add setacl system call?
On Sun, 24 Jul 2011, exorcistkiller wrote: Hi, I'm working on a course project in which I need to add 3 system calls. One of which is setacl(char *name, int type, int idnum, int perms), which set acl for a file specified by name. I used newfs as in ftp://ftp.tw.freebsd.org/pub/FreeBSD/FreeBSD-current/src/sbin/newfs/ to make this new filesystem, named myfs (which really is UFS2) and mounted it. My question is: 1) where to start with? 2) Is this filesystem actually a userland UFS and I can use functions in libufs(3)? 3) What about functions in ufs_acl.c? Should the acls be stored on the extended attributes blocks? Does FreeBSD 8.2 support it? I know I'm asking stupid questions, but a small hint might help me a lot. Thank you so much.. Hi... er.. exorcistkiller... (*) This being FreeBSD, you may want to start with the existing programmer documentation, which should prove quite useful given your goals. Try acl(3) for userspace, and acl(9) for the kernel. You are doing this in the context of a course, so the constraints may be somewhat artificial. However, normally my advice to someone wanting to add a new ACL implementation to FreeBSD would be to start with our existing implementation, which supports both POSIX.1e and NFSv4 ACLs (and is extensible to new ACL types without changing the current APIs (much)). For example, if I were going to teach our native system call API about AFS ACLs, I'd start by perusing the above man pages and code, including:

src/bin/*acl*          # Commands for manipulating ACLs
src/lib/libc/posix1e   # Library routines
src/sys/kern/*acl*     # File system-independent code
src/sys/sys/acl.h      # File system-independent header

As you've already found, ufs_acl.c contains the implementation for UFS; ZFS, NFS, etc, have similar-looking files with markedly different contents. In general, if something looks file system-independent, we try to put it in the centralised files in kern, rather than replicate the code across file systems. Roughly half the code in the kern directory has to do with calls *into* the file system, and the other half is a library of routines called *by* the file system. Robert
Re: Finding symlink information in MAC Framework
On Fri, 15 Jul 2011, s wrote: I am trying to get some information related to the symlink which is being accessed by the user in MAC Framework. Currently I managed to get the uid/gid of the owner of the symlink that is being read, but now I need to get the same information about the target, that the symlink points to. static int samplemac_vnode_check_link (struct ucred *cred, struct vnode *vp, struct label *vplabel) { int error; struct vattr vap; error = VOP_GETATTR(vp, &vap, cred); if (error) return (1); if (vap.va_uid != 0) { log(LOG_NOTICE, "stub_vnode_check_readlink: %i, gid: %i\n", vap.va_uid, vap.va_gid); return (0); } return (0); } And I have no idea how could I do that. Where should I look for that info? And what way would be the fastest? Hi Jakub: Could you say a bit more about what you're trying to accomplish? The reason it's hard to express what you're trying to do (inspect the target of a symlink during a read of the symlink) is that it's not really a coherent concept in terms of kernel implementation. At the point where the access control check on readlink is occurring, the string hasn't yet been read from the link, and even if it had, you couldn't look up the target object as you're already holding locks relating to lookup and read of the symlink itself. Even if you could, there's also a risk of recursion: the symlink could point straight back to where you are, etc. The readlink check is mid-lookup and triggering an entirely fresh lookup from there might be quite awkward for a number of such reasons. In general, however, this is not an issue for the policies we've encountered thus far: they almost all care only about authorising path segment lookups (in which case readlink is just another segment in evaluation), or absolute paths to objects reconstructed during the actual operation on the target object, etc. Hence my wondering what you're trying to accomplish -- the first question, really, is: is what you're trying to express actually safely expressible in a fine-grained, multiprocessing kernel? Robert
Re: Issue with 'Unknown Error: -512'
On Mon, 18 Jul 2011, Andriy Gapon wrote: In recent branches (confirmed with 224119) builds compiled with clang happen to throw 'Unknown error: -512' in a lot of places, making the system unusable. (Untested on gcc compiled systems). Originally I thought the problem was with specific programs, then I narrowed it down to file I/O, and now I've narrowed it down to open() with O_TRUNC. Without O_TRUNC there seems to be no issues whatsoever. With O_TRUNC on open() it fails with that 'Unknown error: -512' every other time you run the program. Common issues, portsnap is affected, making it impossible to fetch/extract ports. As well as redirecting output in shells eg `echo 'hi' > test` fails every other try. You have the same issue with text editors like `edit` where it fails every other save. There are no issues with `echo 'hi' >> test` as there is no O_TRUNC, it only seems to be an O_TRUNC error. Any tips? Otherwise I'll be looking into this today myself. Just a hint that you could try using DTrace syscall and fbt providers to see where in kernel (if in kernel) that -512 return value originates. Jon Anderson spotted that here during some Capsicum work -- initially we were concerned it was a local patch, but it sounds like it might be less local. I think he saw it on calls to open(2) as well, and I couldn't help but wonder (given its recent arrival) if it was an outcome of the change to break falloc into two parts, leading to some or another problematic handling of file descriptor numbers. I.e., it's not so much that -512 is being returned, as a number that's a bad file descriptor. (Although now having seen 512 twice on two different machines, that particular explanation seems less credible). Perhaps this is indeed unrelated to Capsicum, and triggered by a clang bug or something else. I've CC'd Jon, maybe he has gained further insight since we chatted. Robert
Re: Kernel timers infrastructure
On Mon, 25 Jul 2011, Filippo Sironi wrote: I'm working on a university project that's based on FreeBSD and I'm currently hacking the kernel... but I'm a complete newbie. My question is: what if I have to call a certain function 10 times per second? I've seen a bit of code regarding callout_* functions but I can't get through them. Is there anyone who can help me? Hi Filippo: I'm not sure if you've found the callout(9) man page yet, but it talks about the KPI in some detail. The basic idea, though, is that you describe a regular callout using a function pointer, an opaque data pointer, and how long until it should be invoked. In its more complex incantations, you can also specify locks for it to acquire, etc. The key aspect of the API that some people find confusing is that the time interval is described in ticks of length 1/hz seconds. Unless software really wants one invocation per tick (generally unlikely), you will want to pass in some constant multiplied or divided by hz so that it's appropriately scaled. You can find two fairly straightforward examples in kern/uipc_domain.c, which are respectively the fast and slow timers. Robert
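For the "10 times per second" case asked about above, the scaling by hz comes down to one division. The helper below is a userland sketch: ex_ticks_for_rate() is a hypothetical name, and hz is passed as a plain parameter to model the kernel's global tick rate. The comment shows roughly how the result might be handed to callout_reset(9) inside the kernel; that part is illustrative only and is not compiled here.

```c
/*
 * Number of ticks between invocations for a desired call rate.
 * Assumption: `hz` models the kernel's tick rate (ticks per second).
 */
int
ex_ticks_for_rate(int hz, int calls_per_second)
{
	int t = hz / calls_per_second;

	/* Never request a zero-tick delay when the rate exceeds hz. */
	return (t > 0 ? t : 1);
}

/*
 * In-kernel use might look roughly like this (not compiled here; my_tick
 * and my_callout are hypothetical).  A callout fires once per arming, so
 * a periodic handler re-arms itself:
 *
 *	static void
 *	my_tick(void *arg)
 *	{
 *		... do the work ...
 *		callout_reset(&my_callout, ex_ticks_for_rate(hz, 10),
 *		    my_tick, arg);
 *	}
 */
```

With hz at 1000 this asks for 100 ticks between calls; with hz at 100 it asks for 10.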
Re: priv_check() question
On Sun, 3 Jul 2011, exorcistkiller wrote: Hi! I am taking a FreeBSD course this summer and I'm doing a homework. A new system call uidkill() is to be added. uidkill(uid_t uid, int signum) sends signal specified by signum to all processes owned by uid, excluding the calling process itself. I'm almost done, however I get stuck with priv_check(). If the calling process is trying to send signal to processes owned by others, permission should be denied. My implementation simply uses an if (p->p_ucred->cr_uid == ksi.ksi_uid) to deny it, however priv_check() is required. My question is: what privilege a process should have to send signal to processes owned by others? PRIV_SIGNAL_DIFFCRED? The right way to think about privileges in FreeBSD is that they exempt subjects (usually processes) from normal access control rules -- typically as a result of a root uid. The access control rules for signalling are captured by p_cansignal() and cr_cansignal(), depending on whether the subject is a process or a cached credential. Processes have access to slightly greater rights than raw credentials due to additional context -- for example, information about parent-child relationships. These functions then invoke further privilege checks if required, perhaps overriding the normal requirement that uids match, etc. kill() implements a couple of broadcast modes for signals -- you may want to look at the implementation there to see how this is done. Robert
Re: FreeBSD I/OAT (QuickData now?) driver
On Mon, 6 Jun 2011, grarpamp wrote: I know we've got polling. And probably MSI-X in a couple drivers. Pretty sure there is still one CPU doing the interrupt work? And none of the multiple queue thread spreading tech exists? Actually, with most recent 10gbps cards, and even 1gbps cards, we process inbound data with as many CPUs as the hardware has MSI-X enabled input and output queues. So a couple understates things significantly. * Through PF_RING, expose the RX queues to the userland so that the application can spawn one thread per queue hence avoid using semaphores at all. I'm probably a bit out of date, but last I checked, PF_RING still implied copying, albeit into shared memory buffers. We support shared memory between the kernel and userspace for BPF and have done for quite a while. However, right now a single shared memory buffer is shared for all receive queues on a NIC. We have a Google summer of code student working on this actively right now -- my hope is that by the end of the summer we'll have a pretty functional system that allows different shared memory buffers to be used for different input queues. In particular, applications will be able to query the set of queues available, detect CPU affinity for them, and bind particular shared memory rings to particular queues. It's worth observing that for many types of high-performance analysis, BPF's packet filtering and truncation support is quite helpful, and if you're going to use multiple hardware threads per input queue anyway, you actually get a nice split this way (as long as those threads share L2 caches). Luigi's work on mapping receive rings straight into userspace looks quite interesting, but I'm pretty behind currently, so haven't had a chance to read his NetMap paper. The direct mapping of rings approach is what a number of high-performance FreeBSD shops have been doing for a while, but none had generalised it sufficiently to merge into our base stack. I hope to see this happen in the next year. 
Robert
Re: sizeof(function pointer)
On Tue, 31 May 2011, m...@freebsd.org wrote: I am looking into potentially MFC'ing r212367 and related, that adds drains to sbufs. The reason for MFC is that several pieces of new code in CURRENT are using the drain functionality and it would make MFCing those changes much easier. The problem is that r212367 added a pointer to a drain function in the sbuf (it replaced a pointer to void). The C standard doesn't guarantee that a void * and a function pointer have the same size, though it's true on amd64, i386 and I believe PPC. What I'm wondering is, though not guaranteed by the standard, is it *practically* true that sizeof(void *) == sizeof(int(*)(void)), such that an MFC won't break binary compatibility for any supported architecture? (The standard does guarantee, though not in words, that all function pointers have the same size, since it guarantees that pointers to functions can be cast to other pointers to functions and back without changing the value). I think you're OK for MFC purposes, but that in general, we shouldn't assume that they are the same size. I.e., we should use a function pointer type where we mean a function pointer type, and never write code that casts a function pointer to a regular pointer. (Which the change is fine with respect to, I believe). I'm doing some research on an experimental architecture where certain types of function pointers are 256-bit. This has some interesting consequences; we haven't yet gotten to investigating C language extensions/compatibility, but that will follow in the next year or so. (We also have 256-bit data references, similar to pointers, for use in some environments, which will also prove interesting. I'm not yet convinced we'll try to use a general pointer type for them, but perhaps instead extend the language to have a qualified type of some sort).
Robert
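One way to keep the ABI assumption in this thread honest is to state it at compile time and to confine casts to function-pointer types. The sketch below is userland C11; ex_drain_fn is a hypothetical drain signature used only for illustration, not a claim about the sbuf API. On a platform where data and function pointers differ in size the static_assert simply fails to compile, which is the intended signal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical drain callback type, for illustration only. */
typedef int (ex_drain_fn)(void *arg, const char *data, int len);
/* Generic function pointer type used for the round trip. */
typedef void (ex_generic_fn)(void);

/*
 * The "practically true" assumption from the thread, checked by the
 * compiler rather than trusted: replacing a void * field with a
 * function pointer keeps the struct layout only if this holds.
 */
static_assert(sizeof(ex_drain_fn *) == sizeof(void *),
    "function and data pointers differ in size on this platform");

static int
ex_drain(void *arg, const char *data, int len)
{
	(void)arg;
	(void)data;
	return (len);
}

/*
 * Function pointer -> other function pointer -> back is what the
 * standard guarantees preserves the value; no void * is involved.
 * Returns 1 if the round trip preserved the pointer and it still works.
 */
int
ex_round_trip(void)
{
	ex_generic_fn *g = (ex_generic_fn *)ex_drain;
	ex_drain_fn *d = (ex_drain_fn *)g;

	return (d == ex_drain && d(NULL, "abc", 3) == 3);
}
```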
Re: Mount_nfs question
On Mon, 30 May 2011, Mark Saad wrote: So I am stumped on this one. I want to know what the IP of each nfs server that is providing each nfs export. I am running 7.4-RELEASE When I run mount -t nfs I see something like this VIP-01:/export/source on /mnt/src VIP-02:/export/target on /mnt/target VIP-01:/export/logs on /mnt/logs VIP-02:/export/package on /mnt/pkg The issue is I use a load balanced nfs server , from isilon. So VIP-01 could be any one of a group of IPs . I am trying to track down a network congestion issue and I cant find a way to match the output of lsof , and netstat to the output of mount -t nfs . Does anyone have any ideas how I could track this down , is there a way to run mount and have it show the IP and not the name of the source server ? Unfortunately, there's not a good answer to this question. nfsstat(1) should have a mode that can iterate down active mount points displaying statistics and connection information for each, but doesn't. NFS sockets generally don't appear in sockstat(1) either. However, they should appear in netstat(1), so you can at least identify the sockets open to various NFS server IP addresses (especially if they are TCP mounts). Enhancing nfsstat(1) to display more detailed information would, I think, be a very useful task for someone to get up to (and perhaps should appear on our ideas list). Something that would be nice to have, in support of this, is a way for file systems to provide extended status via a system call that queries mountpoints, both portable information that spans file systems, and file system-specific data. Morally, similar to nmount(2) but for statistics rather than setting things. The easier route is to add new sysctls that dump per-mountpoint state directly from NFS, but given how much other information we'd like to export, it would be great to have a more general mechanism. 
(The more adventurous can, with a fairly high degree of safety, use kgdb on /dev/mem (read-only) to walk the NFS stack's mount tables, but that's not much fun.) Robert
Re: compiler warnings (was: Re: [rfc] a few kern.mk and bsd.sys.mk related changes)
On Tue, 31 May 2011, Alexander Best wrote: On Mon May 30 11, Dieter BSD wrote: Chris writes: Ports need attention. The warnings I get there are frightening. I find it comforting that they're just that: warnings. How do they frighten you? High quality code does not have any warnings. The most frightening thing is the attitude that "They're just warnings, so I'll ignore them." Most compiler warnings should be fatal errors. And a lot of the warnings that require a -Wwhatever should be on by default. please keep in mind that -Wfoo does reflect the ideas of the GNU people regarding *proper* code. the warnings themselves are sometimes wrong, because they complain about perfectly correct code. so -Wfoo should not be considered a code verifier, but in fact what it is: a warning flag. sometimes it's correct and indeed reports wrong code, sometimes it is completely wrong. And, it's also worth remembering that warnings change over time, as the compiler changes. One of the known issues building with clang is that large quantities of warning-free code under gcc are in fact rife with warnings under clang, including the gcc source code itself. In general, my hope is that we can get the FreeBSD base warning-free for a useful set of warnings, and on the whole, this is the case. Pretty much the entire kernel is compiled with quite a large number of warning classes enabled, and -Werror set, for example. (One of the other tensions, of course, is the locally maintained vs externally maintained tension: fixing warnings in other people's code is useful only if you can get them to accept the fixes back -- maintaining large numbers of patch sets over time is not sustainable for non-trivial quantities of code, if you're tracking the upstream vendor. Ports is the worst possible case, where maintaining local patches is quite expensive. In the FreeBSD base we can do a lot better, since we can use revision control and automatic merging to help us, but it's still an overhead that has to be reasoned about carefully.) Robert
Re: GSoC
On Fri, 1 Apr 2011, Oleksandr Dudinskyi wrote:

I would like to describe my plan of action more specifically. One of the main tasks is to find the places where errors are registered, then to analyse the errors (their type), separate out the errors related to disks, and modify the output format. There are different types of errors, such as soft, hard, transport, device not ready, recoverable and others. Currently there is a problem with reports, and the majority of error logs are built as individual files. Changes are necessary in the kernel to provide a database that processes information from several sources. The current kernel can't report which specific operations had errors, which further compounds the consistency problem. The reporting of driver errors requires a change. Systematizing the format in which errors are recorded is also a priority, so that we know what error we got and where it occurred.

Hi Oleksandr:

This sounds like a potentially interesting project, but it remains a bit abstract to me, which makes me worry about it as a GSoC project. Strong proposals typically have a well-defined and easily characterised objective (1-2 sentences), and 3-4 intermediate deliverables. I worry that what you've described may be a bit too researchy for a summer project, but I'm willing to be convinced otherwise!

Could you flesh out in a bit more detail how what you have in mind would work: are there new daemons? system calls? will you reuse existing logging or error-handling infrastructure? what is the namespace for errors? how will it affect current operations? We don't need perfect answers to these questions yet, but a slightly more worked out example might help resolve my concerns.

Thanks!

Robert
Re: Include file search path
On Wed, 30 Mar 2011, Warner Losh wrote:

On Mar 30, 2011, at 9:23 AM, Dimitry Andric wrote:

This is a rather nasty hack, though. If we can make it work, we should probably try using --sysroot instead, or alternatively, -nostdinc and adding include dirs by hand. The same for executable and library search paths, although I am not sure if there is a way to completely reset those with the current options.

I'm pretty sure that the origins of this hack pre-date the --sysroot feature in gcc. It works in -current and has for years, so nobody has cared enough to even contemplate changing it. If you can make the sysroot feature work, that would be great, since that would allow us to skip the compiler building phase if we were building using external compilers. I have some patches to make that work, but this very problem is what I'd worked my way up to. It works well if you are building current on current, but not so well if you are mixing versions (you can mix architectures if you are using the xdev feature I put in a while ago, but even that has one or two niggles I need to iron out).

Count me as another eager consumer awaiting a nice answer to the general cross-compile problem. I'm really looking for three things:

(1) A bit more intelligence from our build framework regarding not rebuilding the toolchain quite so many times! I'd like to be able to do a buildworld with TARGET_ARCH with significantly improved performance. Perhaps we can do this already, in which case a pointer would be welcome.

(2) Working clang/LLVM cross-compile of FreeBSD. This seems like a basic requirement to adopt clang/LLVM, and as far as I'm aware that's not yet a resolved issue?

(3) Making it easy to plug in, first, an external gcc, and second, an external clang/LLVM.

One worrying point for me on the last one is that we can't yet build the whole kernel with clang/LLVM, at least for i386/amd64, so I guess you need both external gcc *and* external clang/LLVM?

We (Cambridge) are currently bringing up FreeBSD on a new soft-core 64-bit MIPS platform. We're already using a non-base gcc for our boot loader work, and plan to move to using clang/LLVM later in the year. The base system seems a bit short on detail when it comes to the above, currently.

Robert
Re: Prebind from OpenBSD
On Sat, 26 Mar 2011, Jesse Smith wrote:

I'm interested in working on the "Port prebind from OpenBSD" project mentioned on the FreeBSD Ideas page. ( http://wiki.freebsd.org/IdeasPage#head-d28cdd95ca1755d5afe63d653cb4926d4bdc99de ) There isn't much to go on from the project description and I'm curious what FreeBSD devs are looking for specifically. For example, should the entire ldconfig program be ported from OpenBSD (it looks like it's close enough to FreeBSD's to make that suitable), or should just the prebind code be merged into FreeBSD's ldconfig? Once the project is complete, who should the work be submitted to? Has anyone else worked on this and made any progress?

Hi Jesse:

I think the intent of the ideas list entry is more a research project than a direct-to-commit project: the question is whether prebinding of some sort would observably help performance for important FreeBSD applications or, for example, the boot process. If so, then certainly the OpenBSD prebinding code is a possible model -- Mac OS X also has prebinding, of course, and it's done quite differently (and probably less reusably from our perspective as they use Mach-O rather than ELF); however, there might be interesting ideas as well.

I think therefore I'd structure a project along the following lines: first, you want to establish to what extent synchronous waiting on linkage at run-time is a significant problem. It could be that some combination of hwpmc and DTrace would be the right tools for this. I'd especially pay attention to boot time, since we know that quite a lot of executing takes place then as part of rc.d. I'd also investigate large applications like Firefox, Chrome, KDE, Gnome, etc. KDE already integrates prebinding tricks in its design, but I don't think the others do.

Next, I'd dig a bit more into the areas where it's hurting performance -- can you add up all the time spent waiting and cut 10 seconds from boot, or 5 seconds from Firefox startup? Or is the best win going to be .2 seconds in Firefox? Does the OpenBSD optimisation actually address the problem we're experiencing? Perhaps perform some experiments with prebinding-like behaviour, working up to an implementation.

It's worth remembering that prebinding comes with some baggage as well, of course. Perhaps less relevant in the world of 64-bit address spaces, but there are some design trade-offs in this department...

Robert
Re: GSoC
On Fri, 25 Mar 2011, Dudinskyi Olexandr wrote:

My name is Dudinskyi Oleksandr. I am a student of the National Aviation University, Ukraine. I want to participate in GSoC 2011 with your organization. My project: Disk device error counters, iostat –e. I think this project is very necessary in the FreeBSD system. I am now making a plan to develop this project. What can you say about the idea of my project? And what about the usefulness of this project? My mentor: Andriy Gapon.

Hi Dudinskyi:

It's a little hard to tell from your description exactly what it is you are proposing to do. Could you flesh out the idea some for us, so that we can give you feedback? What is the nature of the problem you want to solve? What software changes do you anticipate making? How will you test your changes?

Robert
Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...
On Sat, 12 Feb 2011, Alexander Leidinger wrote:

On Sat, 12 Feb 2011 00:52:48 + (GMT) Robert Watson rwat...@freebsd.org wrote:

The one comment I'd make is that the MAC case should indicate that "The MAC Framework" is supported, rather than mandatory access controls being present -- the presence of the framework doesn't imply the presence of mandatory access control policies.

Does FEATURE(mac, "Mandatory Access Control Framework support"); look better? Alternatively/additionally we could use mac_framework as the name of the feature.

The above seems fine -- while I've been moving to names like mac_framework.h, it's still "options MAC" and security/mac, etc, and I think that mac is the most consistent option.

Robert
Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...
On Fri, 11 Feb 2011, Alexander Leidinger wrote:

during the last GSoC various FEATURE macros were added to the system. Before committing them, I would like to get some review (like if the macro is in the correct file, and for those FEATURES where the description was not taken from NOTES, if the description is OK). If nobody complains, I would like to commit this in 1-2 weeks. If you need more time to review, just tell me. Here is the list of affected files (for those impatient ones who do not want to look at the attached patch before noticing that they are not interested in looking at it):

The additions for security/audit and security/mac both seem reasonable; I've been meaning to add them myself for quite a bit. There's then some code in libc that can learn to use this as well, at least for MAC.

The one comment I'd make is that the MAC case should indicate that "The MAC Framework" is supported, rather than mandatory access controls being present -- the presence of the framework doesn't imply the presence of mandatory access control policies.

Robert
Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...
On Fri, 11 Feb 2011, Ilya Bakulin wrote:

When I was beginning this GSoC work, I primarily thought about unifying the way to determine if a particular feature exists in the kernel. Of course there should be at least one way to check if the feature is available or not (by definition: if I may use some functionality, then the feature is present, otherwise... Oh, no, maybe I have no permissions to use it? or something is terribly wrong with the system configuration? Or?...), but it is better to have a sort of unified way to get this information without looking for files in /dev, parsing `kldstat -v`, etc.

One of the nice things about this is that when a conditionally compiled feature introduces a new system call, there can be forward (rather than backward) compatibility benefits. If login(1) had checked for the Audit feature before trying audit system calls when we introduced it in 6.x, it would have avoided a few people shooting their feet off in the (officially unsupported) case where, following a kernel and userspace roll-forward, a kernel roll-back was required to restore stability. While we don't support it (you shouldn't run a new userspace with an old kernel), the failure mode would have been improved.

More abstractly: for a feature like MAC, testing for the presence of the framework is functionally fairly different from exercising the feature, as most instances of exercising it work only based on modules loaded by the framework, which is a different goal. Right now, libc offers a mac_present API, which back-ends into manually testing a system call. I'd rather it backended into a common feature test framework.

In many cases, it is of course desirable to test for a feature by using it -- a much more pragmatic approach, and generally one preferred in the world of autoconf, etc...

Robert
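
As a rough illustration of the "common feature test" idea in this message, a userland program could ask the kernel whether a named feature is compiled in before exercising it. The sketch below assumes the feature_present(3) call from FreeBSD's libc and the feature name "mac" discussed in this thread; the wrapper kernel_has_feature() and its non-FreeBSD fallback are made up for the example and are not part of any FreeBSD API.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: returns 1 if the running kernel advertises the
 * named feature, 0 otherwise.  On FreeBSD this defers to feature_present(3);
 * elsewhere it conservatively reports "not present" so callers skip the
 * optional code path instead of attempting a system call that may not exist.
 */
static int
kernel_has_feature(const char *name)
{
	assert(name != NULL);
#ifdef __FreeBSD__
	return (feature_present(name) ? 1 : 0);
#else
	return (0);
#endif
}

/*
 * Hypothetical caller: decide up front whether MAC-related calls should be
 * attempted at all, rather than discovering it from a failing system call.
 */
static int
should_try_mac_calls(void)
{
	return (kernel_has_feature("mac"));
}
```

Note that this only answers "is the framework there?"; as the message says, whether any policy module is loaded is a separate question.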
Re: ixgbe DMA question
On Fri, 11 Feb 2011, Santosh Rao Gururajan wrote:

I have a host machine with 2 ixgbe NICs. I am trying to pass the frames from one NIC to the other with the lowest possible overhead to the host (high speed bridge). I am wondering if I can do a rx-ring to tx-ring DMA copy without creating a mbuf on the host. Is that possible? What are the risks?

The only real risk is the simple matter of programming, I think. There's no reason not to do it except that it involves modifying device drivers, memory models, etc. If you do what you describe, and you decide you do want to pass some frames up the stack, you can always hook up mbufs and use the external storage free routine to return the memory to the ring.

Jeff Roberson has been circulating some patches that eliminate the mbuf-cluster relationship in its current form, instead preferring variable size mbufs, and I can't help but wonder if with such a patch, that wouldn't be simpler than what you propose, offering many of the same performance benefits while making the device driver changes smaller and still allowing you to direct some packets up the stack if desired.

Robert
Re: Analyzing wired memory?
On Tue, 8 Feb 2011, Alan Cox wrote:

On Tue, Feb 8, 2011 at 6:20 AM, Ivan Voras ivo...@freebsd.org wrote:

Is it possible to track by some way what kernel system, process or thread has wired memory? (including "data exists but needs code to extract it")

No.

I'd like to analyze a system where there is a lot of memory wired but not accounted for in the output of vmstat -m and vmstat -z. There are no user processes which would lock memory themselves. Any pointers?

Have you accounted for the buffer cache?

John and I have occasionally talked about making procstat -v work on the kernel; conceivably it could also export a wired page count for mappings where it makes sense. Ideally procstat would drill in a bit and allow you to see things at least at the granularity of "this page range was allocated to UMA".

Robert
Re: Why does printf(9) hang network?
On Sat, 5 Feb 2011, dieter...@engineer.com wrote:

Why would doing a printf(9) in a device driver (usb, firewire, probably others) cause an obscenely long lockout on /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) ? Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause the problem, but printfs from device drivers do. Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The device drivers and printf don't even know that sb_sx exists.

I can't speak to the details of your situation, but one possible explanation might be: printf runs at the speed of the console, which for serial consoles can be extremely slow. Device driver interrupt threads can preempt other threads, possibly while those threads hold locks. That causes them to hold the locks for much longer, as the threads may not get rescheduled for some period (for example, until the device driver is done doing a printf), leading other threads waiting for that lock to wait significantly longer. This is especially the case if the other thread was spinning adaptively, in which case it will then yield since the holder of the lock effectively yielded.

You might try forcing all the various threads to run on different CPUs using cpuset and see if the variance goes down. You can also use KTR + schedgraph to explore the specific scheduling going on, although be aware that KTR can also noticeably perturb scheduling itself.

In general, things shouldn't call kernel printf in steady state operation; if they need to log something, they should use log(9) or similar. printf is primarily a tool for printing out device probe information, and for debugging purposes: it is not intended to be fast.
Robert

135 int
136 sblock(struct sockbuf *sb, int flags)
137 {
138
139         KASSERT((flags & SBL_VALID) == flags,
140             ("sblock: flags invalid (0x%x)", flags));
141
142         if (flags & SBL_WAIT) {
143                 if ((sb->sb_flags & SB_NOINTR) ||
144                     (flags & SBL_NOINTR)) {
145                         sx_xlock(&sb->sb_sx);
146                         return (0);
147                 }
148                 return (sx_xlock_sig(&sb->sb_sx));
149         } else {
150                 if (sx_try_xlock(&sb->sb_sx) == 0)
151                         return (EWOULDBLOCK);
152                 return (0);
153         }
154 }

More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093
Re: Creating an LVM-backed FreeBSD DomU in a Linux Dom0
On Tue, 28 Dec 2010, Avleen Vig wrote:

After searching high and low and not finding exactly what I wanted (although Adrian Chadd's documents came close), I decided to document a lengthy but worthwhile procedure: How to install a FreeBSD DomU guest in a Linux Dom0 Xen host, from scratch, with LVM-backed storage (rather than file based), and without the need to rely on random kernels and ISO[1]

http://bit.ly/dVhfFe

Hopefully people find it useful :-)

FYI, we now have a xen(4) man page, which will ship in 8.2. It's not tutorial material like your document, but is useful reference material. I'd like it very much if we could get something more along the lines of what you've created into the FreeBSD Handbook.

Robert

I haven't yet broached configuring inside the Xen host. Again there is scattered documentation available. I'll try to bring it together next.

[1] I gave serious thought to uploading my own stuff along with the other similar things available already, but in the end I thought it better if people try out how to do it, given that the amount of work will be almost the same, or even slightly less building it yourself. Plus there are the usual security and availability concerns.. :)

--
Avleen Vig
Systems Administrator
Personal: www.silverwraith.com
Re: libkvm: consumers of kvm_getprocs for non-live kernels?
On Wed, 10 Nov 2010, Ulrich Spörlein wrote:

I have this cleanup of libkvm sitting in my tree and it needs a little bit of testing, especially the function kvm_proclist, which is only called from kvm_deadprocs, which is only called from kvm_getprocs when kd is not ALIVE. The only consumer in our tree that I can make out is *probably* kgdb, as ps(1), top(1), w(1), pkill(1), fstat(1), systat(1), pmcstat(8) and bsnmpd don't really work on coredumps. But the kgdb file gnu/usr.bin/binutils/gdb/kvm-fbsd.c, where kvm_getprocs is probably called on a dead kernel, is not even used during build! So I guess I'm staring at dead code here, any kvm people around that can clue me in?

Even if those tools aren't using kvm properly, they should be. ps(1) at least used to work quite well on coredumps, and perhaps still does?

Stas has ongoing work on a libprocstat, you might want to give him a ping. I'm not sure if he plans to refactor some of those existing tools to use that library or not, but crashdump support is a key goal of it.

Robert
Re: [PATCH] Fix 'implicit declaration' warning and update vgone(9)
On Wed, 27 Oct 2010, Benjamin Kaduk wrote:

On Wed, 27 Oct 2010, Kostik Belousov wrote:

On Wed, Oct 27, 2010 at 10:59:56AM -0400, Benjamin Kaduk wrote:

[1] The old (racy) function is osi_TryEvictVCache, here: http://git.openafs.org/?p=openafs.git;a=blob;f=src/afs/FBSD/osi_vcache.c;h=c2060c74f0155a610d2ea94f3c7f508e8ca4373a;hb=HEAD

The function looks very strange for much more serious reasons. Why do you try to manage the vnode revocation in the filesystem module at all?

I am still becoming familiar with the AFS code, but I think this is largely due to a difference between the vfs structure that AFS has been using and the FreeBSD standard. E.g. vop_inactive/vop_reclaim do not actually free filesystem-specific resources, instead keeping a free list of vcache entries. So, the original authors of this AFS code were approaching the problem in a somewhat different way. Therefore, there are somewhat-orthogonal pools of vcaches and vnodes (with some intersection). If the vcaches are all in use, there is a routine which tries to shake some loose; if it can free up vcaches, their associated vnodes also need to be cleaned up in some fashion. It may be that no additional code is actually needed to do this, though -- I am not sure.

I have a hazy recollection, from quite a long time ago, that OpenAFS used to be a bit special with regard to vnodes -- allocating its own, or something along those lines. I expect it no longer does that, but it could be that it feels it owns the vnodes more than your average file system does, which may play less well with global management of vnodes. Derrick would probably have more to say on this.

Robert
Re: addition of sysctl nodes after compile time
On Tue, 19 Oct 2010, Alexander Best wrote:

does this limitation still exist?

Sysctls can be added dynamically using the sysctl_add_oid(9) KPI, which has existed (as far as I'm aware) at least since FreeBSD 4.x. It could be that this KPI provides the functionality required to do what the comment describes.

Robert
Re: Bumping MAXCPU on amd64?
On Wed, 22 Sep 2010, Maxim Sobolev wrote:

On 9/22/2010 6:37 AM, John Baldwin wrote:

Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for existing klds.

Ah, ok, sorry, I did only check RELENG_7. Can we make it a kernel option then?

In principle, yes, but MAXCPU is used to size various kernel data structures inspected by userspace crash post-mortem tools, etc. I've done a bit of work to teach some of those tools (in particular, vmstat -z and vmstat -m) to extract the version of maxcpu compiled into the kernel instead of just relying on the version of MAXCPU present when the command line tool was compiled.

However, I think a better long-term approach here is to generally eliminate sizing based on MAXCPU and instead size based on the number of CPUs present. Certain kernel subsystems already do this (UMA, netisr, ...) but others don't (malloc(9), ...). Additional hands on this project would probably help :-).

As John mentioned, the other issue is the use of fixed-width types instead of variable-length CPU bitmasks to name cores for IPIs, etc. There are people actively working on this, but it's a non-trivial project as kernel code likes to do things like cpumask & othermask. My expectation is that this problem will be solved in 9.0 but I don't see any obvious MFC paths for 8.x due to KBI issues. It could be that this forces our hand in terms of breaking the KBI at some point in the 8.x series, unclear...

Robert
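
To make the fixed-width versus variable-length point concrete, here is a small userland sketch of a CPU mask sized from the number of CPUs present rather than from a compile-time MAXCPU. Every name in it (struct cpumask, cpumask_alloc(), cpumask_andnot(), ...) is invented for illustration; it is not the kernel's actual CPU set interface. The point it tries to show is that once a mask spans several words, one-line expressions on a single integer have to become word-by-word loops.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not a FreeBSD kernel API. */
#define CPUMASK_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct cpumask {
	size_t ncpus;		/* CPUs present, known only at run time */
	size_t nwords;		/* words needed to hold ncpus bits */
	unsigned long *bits;	/* bit i set means CPU i is in the mask */
};

/* Allocate an empty mask sized for ncpus CPUs; returns NULL on failure. */
static struct cpumask *
cpumask_alloc(size_t ncpus)
{
	struct cpumask *m;

	assert(ncpus > 0);
	m = malloc(sizeof(*m));
	if (m == NULL)
		return (NULL);
	m->ncpus = ncpus;
	m->nwords = (ncpus + CPUMASK_WORD_BITS - 1) / CPUMASK_WORD_BITS;
	m->bits = calloc(m->nwords, sizeof(unsigned long));
	if (m->bits == NULL) {
		free(m);
		return (NULL);
	}
	return (m);
}

static void
cpumask_free(struct cpumask *m)
{
	free(m->bits);
	free(m);
}

static void
cpumask_set(struct cpumask *m, size_t cpu)
{
	assert(cpu < m->ncpus);
	m->bits[cpu / CPUMASK_WORD_BITS] |= 1UL << (cpu % CPUMASK_WORD_BITS);
}

static int
cpumask_isset(const struct cpumask *m, size_t cpu)
{
	assert(cpu < m->ncpus);
	return ((int)((m->bits[cpu / CPUMASK_WORD_BITS] >>
	    (cpu % CPUMASK_WORD_BITS)) & 1UL));
}

/* dst = a & ~b, one word at a time (e.g. "all CPUs except these"). */
static void
cpumask_andnot(struct cpumask *dst, const struct cpumask *a,
    const struct cpumask *b)
{
	assert(dst->nwords == a->nwords && a->nwords == b->nwords);
	for (size_t i = 0; i < dst->nwords; i++)
		dst->bits[i] = a->bits[i] & ~b->bits[i];
}
```

A mask for 100 CPUs needs more than one machine word, so code that used to write a single bitwise expression has to call something like cpumask_andnot() instead, which is roughly why converting existing code is a non-trivial project.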
Re: zfs + uma
On Fri, 17 Sep 2010, Andre Oppermann wrote:

Although keeping free items around improves performance, it does consume memory too. And the fact that that memory is not freed on lowmem condition makes the situation worse.

Interesting. We may run into related issues with excessive mbuf (cluster) caching in the per-cpu buckets as well. Having a general solution for that is appreciated. Maybe the size of the free per-cpu buckets should be specified when setting up the UMA zone. Of certain frequently re-used elements we may want to cache more, of others less.

I've been keeping a vague eye out for this over the last few years, and haven't spotted many problems in production machines I've inspected. You can use the umastat tool in the tools tree to look at the distribution of memory over buckets (etc) in UMA manually. It would be nice if it had some automated statistics on fragmentation however. Short-lived fragmentation is likely, and isn't an issue, so what you want is a tool that monitors over time and reports on longer-lived fragmentation.

The main fragmentation issue we've had in the past has been due to mbuf+cluster caching, which prevented mbufs from being freed usefully in some cases. Jeff's ongoing work on variable-sized mbufs would entirely eliminate that problem...

Robert
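
The suggestion quoted in this message (choose the size of the free-item cache when the zone is set up, and release cached items under memory pressure) can be pictured with a toy allocator. This is a single-threaded userland sketch with made-up names (toy_zone_*); it is not UMA and ignores per-CPU buckets, locking and slabs entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy "zone": hypothetical names, not the UMA API. */
struct toy_zone {
	size_t item_size;	/* size of each allocated item */
	size_t cache_limit;	/* max free items kept cached, set at init */
	size_t cached;		/* free items currently cached */
	void **cache;		/* stack of cached free items */
};

/* Set up a zone; cache_limit is chosen per zone by the caller. */
static int
toy_zone_init(struct toy_zone *z, size_t item_size, size_t cache_limit)
{
	assert(item_size > 0);
	z->item_size = item_size;
	z->cache_limit = cache_limit;
	z->cached = 0;
	z->cache = NULL;
	if (cache_limit > 0) {
		z->cache = malloc(cache_limit * sizeof(void *));
		if (z->cache == NULL)
			return (-1);
	}
	return (0);
}

/* Prefer a cached free item; fall back to the backing allocator. */
static void *
toy_zone_alloc(struct toy_zone *z)
{
	if (z->cached > 0)
		return (z->cache[--z->cached]);
	return (malloc(z->item_size));
}

/* Cache the item if there is room, otherwise give the memory back. */
static void
toy_zone_free(struct toy_zone *z, void *item)
{
	if (z->cached < z->cache_limit)
		z->cache[z->cached++] = item;
	else
		free(item);
}

/* What a low-memory handler might do: release every cached item. */
static void
toy_zone_drain(struct toy_zone *z)
{
	while (z->cached > 0)
		free(z->cache[--z->cached]);
}

/* Tear the zone down completely. */
static void
toy_zone_fini(struct toy_zone *z)
{
	toy_zone_drain(z);
	free(z->cache);
	z->cache = NULL;
}
```

A frequently reused item type would be given a larger cache_limit and a rarely reused one a smaller limit; the drain routine is the hook that the quoted complaint says is missing on low-memory conditions.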
Re: Intel TurboBoost in practice
On Sun, 25 Jul 2010, Alexander Motin wrote:

The numbers that you are showing don't show much difference. Have you tried buildworld?

If you mean relative difference -- as I have said, it's mostly because of my CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most of the time if the CPU is not overheated. It probably isn't, as it sits on a clear table under an air conditioner. So the maximal effect I can expect is 4.2%. In such a situation 2.8% is probably not so bad to illustrate that the feature works and that there is space for further improvements. If I had a Core i5-750S I would expect a 33% boost.

Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per configuration?

Robert

If you mean absolute difference, here are the results of four buildworld runs:

hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
hw.acpi.cpu.cx_lowest=C1: 4679.83 sec

Benefit is about 2.1%. Each time results were erased and sources pre-cached into RAM. Storage was SSD, so disk should not be an issue.

--
Alexander Motin
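
For readers wondering why more runs were requested, the arithmetic behind the comparison can be sketched as below, using the four buildworld timings quoted in this message. The helper names are invented for the example, and this is only the mean/variance part of what a tool like ministat(1) reports; it is not ministat's implementation and does no significance test.

```c
#include <assert.h>
#include <stddef.h>

/* The four buildworld timings from the message, grouped by setting. */
static const double c1_runs[] = { 4654.23, 4679.83 };	/* cx_lowest=C1 */
static const double c2_runs[] = { 4556.37, 4570.85 };	/* cx_lowest=C2 */

/* Arithmetic mean of n samples. */
static double
run_mean(const double *x, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return (sum / (double)n);
}

/* Unbiased sample variance; needs at least two samples. */
static double
run_variance(const double *x, size_t n)
{
	double m, ss = 0.0;

	assert(n >= 2);
	m = run_mean(x, n);
	for (size_t i = 0; i < n; i++)
		ss += (x[i] - m) * (x[i] - m);
	return (ss / (double)(n - 1));
}

/* Fraction of the baseline time saved by the candidate setting. */
static double
relative_gain(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return ((baseline - candidate) / baseline);
}
```

Calling relative_gain(run_mean(c1_runs, 2), run_mean(c2_runs, 2)) gives the improvement of C2 over C1 based on the means. With only two runs per configuration the variance estimates rest on a single degree of freedom each, which is the practical reason for asking for eight or more runs before trusting a difference of a few percent.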
Re: How to get stack bounds of current process?
On Mon, 10 May 2010, Lev Serebryakov wrote:

I'm porting some application from Linux, which discovers its stack bounds by reading and parsing /proc/self/maps. FreeBSD has /proc/curproc/map, but I can not find how to determine which record is for the stack (I've looked into the implementation of procfs, but it doesn't contain any special processing for the process stack). How could I determine the stack bounds of the current process on FreeBSD 7/8/9?

The procstat -v command in 8.x and 9.x will give this information based on sysctls; we're about to integrate a libprocstat(3) library which will provide a public API for this information. I'd agree with Kostik that you should think carefully about whether the application really needs this information :-).

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: make pkg_install suite reusable, please
On Fri, 9 Apr 2010, Alexander Churanov wrote:

2010/4/9 Leinier Cruz Salfran salfrancl.lis...@gmail.com

i want to ask you one thing: can you make the 'pkg_install' suite reusable .. means install 'libinstall.a' as a shared object in order to make it reusable by others devs

I'd like to add my 50 cents. From my point of view, the true UNIX way is re-using whole programs. This provides unbelievable isolation and correctness. If you don't want to fork myriads of processes each second, then, it's, probably, better to ask for pipe mode of pkg_* tools. For example, aspell works that way. You start a process, write commands and queries and read results.

While there are clearly benefits to process isolation, there are countless situations in UNIX where I've said to myself "Oh, I wish I had a libfoo not just a foo command". This is particularly the case for monitoring tools, where third-party applications have a lot of trouble parsing and tracking the output of tools like ps(1), etc. This is why recently we've been working on libmemstat(3), libprocstat(3), libnetstat(3), etc -- so that tools can avoid rewriting that code as well as avoid the parsing problem.

So I have no particular opinion on this tool, but I will say that in general, it would be nice if programs were often thin wrappers around a library that could be reused, not just command line tools.

Robert
Re: make pkg_install suite reusable, please
On Fri, 9 Apr 2010, Charlie Kester wrote:

It was a watershed moment in my programming career when I realized that the bubbles on those DFD charts we used to use for structured design could be whole processes and not just functions in a single, monolithic program. Suddenly everything the structured design folks were saying about re-use, encapsulation, loose coupling, module cohesion, etc. made a lot more sense when viewed from the perspective of simple Unix utilities communicating with plain text via pipes. We should encourage that approach as a default, and only put things into binary libraries when forced to by performance considerations.

Per my e-mail, I'm not sure I entirely agree with this view, although for certain types of scripting and programming it makes a lot of sense. What was always missing from this model is a structured way to pass complex data between components: streams of one-line ASCII strings work fine, but when you want to pass data structures, you end up replicating code to generate and parse data between components. Maybe XML is an answer to this, but more likely it's not :-).

There's also the issue of plugging and types: if you support complex types, why not have type checking on the plugs? For example, gzcat | tar -xf - only makes sense for certain file types: wouldn't it be nice if type information, as well as byte streams, were passed around and you could do static checking, or even negotiation? At the very least, it would be nice to get a clear typing error instead of garbage.

This is, BTW, what windowing systems do for copy-and-paste: when you copy from one program and paste to another, the two programmes negotiate an appropriate intermediate format: if the target doesn't support rich text, then it needs to be generated as plain text by the source, etc.

Robert
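
The copy-and-paste style negotiation described in this message can be sketched in a few lines: the source lists the formats it can produce in preference order, the target lists what it accepts, and the first common entry wins, with a clear failure if there is none. The function name and the MIME-like format strings below are placeholders chosen for illustration; no existing pipe or windowing API is being described.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the first format, in the source's order of preference, that the
 * target also accepts, or NULL when the two ends share no format.  A NULL
 * result is the "clear typing error" case: the plug should be refused
 * instead of letting garbage flow down the pipe.
 */
static const char *
negotiate_format(const char *const *offered, size_t noffered,
    const char *const *accepted, size_t naccepted)
{
	assert(offered != NULL && accepted != NULL);
	for (size_t i = 0; i < noffered; i++)
		for (size_t j = 0; j < naccepted; j++)
			if (strcmp(offered[i], accepted[j]) == 0)
				return (offered[i]);
	return (NULL);
}
```

For instance, a source offering "text/rtf" then "text/plain" to a target that accepts only "text/plain" would settle on plain text, while the same source plugged into a consumer that accepts only "application/x-tar" would get NULL and could report the mismatch up front.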
Re: GSoC 2010
On Sat, 10 Apr 2010, jax wrote:

I am Igor Druzhinin and I want to participate in GSoC 2010 in the FreeBSD project. I want to propose to completely implement fast syscalls support for FreeBSD on the x86 platform. I already submitted my proposal a few days ago on the GSoC site and tried to contact possible mentors from the technical contacts list. But they still have not answered me. So I have decided to try here. What do you think about this proposal? Is it still relevant or not? If so, who can be my mentor?

Hi Igor--

Due to the volume of proposals, it can take some time to get through them all, and the last few weeks have been a bit rife with holidays around the world, which has slowed down answers to some questions/pings. I see your proposal in the set, and I'll point some appropriate potential mentors at it this week.

Thanks for your proposal!

Robert
Re: Another tool for updating /etc -- lua||other script language bikeshed
On Fri, 26 Mar 2010, per...@pluto.rain.com wrote:

Robert Watson rwat...@freebsd.org wrote: ... web browsers [are] basically operating systems at this point ...

Isn't this a bit of an exaggeration? Not too many browsers have to deal with process/thread scheduling, or device drivers, or booting, or file system issues -- they rely on the OS for that (as does any other application).

I think it's more of an analogy than an exaggeration. The FreeBSD kernel, including device drivers and architectures, is around 3.9 million lines of code. Google's Chromium, including WebKit, is around 4.1 million lines of code. Both provide an extensive runtime environment for applications that run on top of them, security domains, storage services, and management models.

I'm not arguing that web browsers are a substitute for our current operating system layer: they clearly build on it. However, in terms of their goals in providing an execution environment, user interface, etc, they fill a very similar niche by being a general-purpose platform for many specific things.

And, to get back to the point I was making: if you toast your Chromium update or get configuration management wrong, then your applications (Google Docs, GMail, ...) on ChromeOS won't work any more than if you toasted your /lib or /etc in FreeBSD. For example, if the Chromium configuration files change and it forgets about web proxies, Chromium won't be able to call home to pick up a fix any more than if etcmerge toasts resolv.conf.

Making updates easy is, to a large extent, about avoiding the creation of foot-shooting opportunities. Some of it is about tools (binary updates, mergers, rollback, etc), but most of it is about avoiding scenarios in which a previously valid configuration becomes invalid. And if we look at problems FreeBSD has had with updates in the past, a lot come down to precisely that: for example, renaming serial port device names (several times in as many years).
Robert
Re: Another tool for updating /etc -- lua||other script language bikeshed
On Wed, 24 Mar 2010, Ivan Voras wrote:

Wouldn't it be nice to have a blessed (i.e. present-in-base) script language interpreter with a syntax that has evolved since the 1970s? (with a side-glance to C that *has* evolved since the K&R style). ... As a possible alternative, or at least to learn about others' opinion on the subject, I'd like to suggest Lua (http://www.lua.org/).

I think there are lots of good arguments for Lua in the base, but etcmerge is definitely not one of them :-). An important goal for a tool like etcmerge is a minimal dependency footprint, so that you can use it with all the existing versions of FreeBSD floating around and upgrade to new versions. None of those existing versions have Lua.

Good arguments for Lua in the base might include:

- Moving to Lua as the scripting language for the boot loader
- Improving scripting capabilities in the installer

etcmerge sounds very exciting, especially for shops that want a more automated upgrade path. It's easy to upgrade web browsers, and they're basically operating systems at this point, so it would be nice if we could offer FreeBSD upgrades with similar ease.

Quite a bit of our automated configuration update problem comes down to configuration file formats and the way diff/patch perform merges. Consider files like inetd.conf, master.passwd, group, etc: they essentially ensure that there will be a conflict if you have any local changes and the vendor (us) makes an upstream change. We used to have this problem with /etc/rc and /etc/rc.local, but rc.d has basically eliminated the problem by allowing boot-time customization through file insertion rather than file changes. Choices made in the configuration design for launchd, xinetd, and others avoid this mistake.

Perhaps we should be considering similar sorts of redesigns, focusing on how configuration files could be reworked to maximize automated update support. Where there's a true semantic conflict, an update conflict requiring resolution is fine, but where there's no semantic conflict (i.e., we add _anotheruser to the base master.passwd), no upgrade conflict should arise. (And we should definitely keep this in mind as we add new configuration files.)

Robert
RE: mac_mls mac_biba mac_lomac patches to fix ptys_equal mib support for new /dev/pts in FreeBSD 8
On Tue, 2 Mar 2010, Selphie Keller wrote:

- (2) Could you let me know how your login.conf + user labels are configured, and show me the output of ps -axZ | grep sshd?

/etc/login.conf label configurations I use:

Staff users: label=mls/2(low-high)
Daemons: label=mls/equal(equal-equal)
Insecure users: label=mls/low(low-low)

If you need the exact data from login.conf I can provide it, but it is a bit tricky, as I use tc= to call from one class to another class and override, and the default class is mls/low.

Am I right in thinking that you have security.mac.biba.revocation_enabled and/or security.mac.mls.revocation_enabled set? Revocation being enabled might explain why you're seeing this issue while other users aren't reporting problems.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: Automated kernel crash reporting system
On Thu, 4 Mar 2010, sean connolly wrote:

Automatic reporting would end up being a mess given that panics can be caused by hardware problems. Having an autoreport check if memtest was run before it reports, or having it only run with -CURRENT might be useful.

Hi Sean, Dan, et al:

I'm not sure I agree with this view. For releases, it's true that many reported panics are a result of bad hardware. However, on active development branches, especially -CURRENT, that's not the case. An automated scheme to track bug reports and find common themes could be incredibly valuable in the development environment.

And, to be honest, even if a fair number of reports are due to hardware failures, these often have common themes themselves, so it would be quite educational to be able to reason about panics on a large scale. Not to mention using it to identify potentially flaky hardware that users could then be warned about :-).

Collecting crash reports is widespread in industry for both operating systems and applications for these reasons. Certainly, the crashinfo summary gathered on recent FreeBSD versions is an excellent starting point for building such a system. If we were to move ahead with it, we'd need to pay very close attention to scrubbing potentially sensitive information from panic reports, however.

Robert

Sean

From: jhell jh...@dataix.net
To: Dan Naumov dan.nau...@gmail.com
Cc: FreeBSD Hackers freebsd-hackers@freebsd.org; freebsd-questi...@freebsd.org
Sent: Thu, March 4, 2010 8:06:50 AM
Subject: Re: Automated kernel crash reporting system

On Thu, 4 Mar 2010 07:09, dan.naumov@ wrote:

Hello

I noticed the following on the FreeBSD website: http://www.freebsd.org/projects/ideas/ideas.html#p-autoreport

Has there been any progress/work done on the automated kernel crash reporting system? The current ways of enabling and gathering the information required by developers for investigating panics and similar issues are unintuitive and user-hostile, to say the least, and anything to automate the process would be a very welcome addition.

- Sincerely, Dan Naumov

Hi Dan,

I am assuming that the output of crashinfo_enable=YES is not what you are talking about, is it? Are you aware of it? The info contained in crashinfo.txt.N is pretty informative for developers; maybe you're talking about another way of submitting it?

Regards,

-- jhell
Re: mac_mls mac_biba mac_lomac patches to fix ptys_equal mib support for new /dev/pts in FreeBSD 8
On Mon, 1 Mar 2010, Estella Mystagic wrote:

Found issues with the sysctl MIBs security.mac.biba.ptys_equal, security.mac.lomac.ptys_equal, and security.mac.mls.ptys_equal not supporting the new /dev/pts terminal system in FreeBSD 8; proposed fix for the issue. When using a higher security grade/clearance with mac_mls, it prevents writing to /dev/pts/5, as it's set as mls/low and a subject may not write to objects with a lower classification level than its own clearance level.

Feb 25 21:42:16 labyrinth sshd[30965]: error: /dev/pts/5: Permission denied
Feb 25 21:42:16 labyrinth sshd[30965]: error: open /dev/tty failed - could not set controlling tty: Permission denied

Hi Selphie:

Thanks for this patch. I'll go ahead and merge it, but had two questions:

(1) It looks like you didn't need to set any special label on /dev/ptmx itself?

(2) Could you let me know how your login.conf + user labels are configured, and show me the output of ps -axZ | grep sshd?

We need to rethink how we deal with ttys anyway, and I'd like to understand how the specific case you're running into comes about.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: mac_mls mac_biba mac_lomac patches to fix ptys_equal mib support for new /dev/pts in FreeBSD 8
On Tue, 2 Mar 2010, Robert Watson wrote:

Thanks for this patch. I'll go ahead and merge it, but had two questions:

Committed as r204581, thanks!

Robert
Re: unix socket: race on close?
On Thu, 18 Feb 2010, Mikolaj Golub wrote:

Below is a simple test code with unix sockets: the client does connect()/close() in a loop and the server -- accept()/close(). Sometimes close() fails with a 'Socket is not connected' error:

Hi Mikolaj:

Thanks for this report, and sorry about not spotting your earlier post to freebsd-net. I've been fairly preoccupied the last month and not keeping up with the mailing lists. Could I ask you to file a PR on this, and forward me the PR number so I can claim ownership? This should prevent it from getting lost while I catch up.

In short, your evaluation seems reasonable to me -- have you tried tweaking soclose() to ignore ENOTCONN from sodisconnect(), to confirm that this diagnosis accounts for all the instances you've been seeing?

Robert N M Watson
Computer Laboratory
University of Cambridge

a.out: parent: close error: 57

or

a.out: child: close error: 57

It looks to me like some race in close(). Looking at uipc_socket.c:soclose():

    int
    soclose(struct socket *so)
    {
        int error = 0;

        KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));

        CURVNET_SET(so->so_vnet);
        funsetown(&so->so_sigio);
        if (so->so_state & SS_ISCONNECTED) {
            if ((so->so_state & SS_ISDISCONNECTING) == 0) {
                error = sodisconnect(so);
                if (error)
                    goto drop;
            }

Isn't the problem here? so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING without locking, and then sodisconnect() is called, which closes both sockets of the connection. So it looks to me that if close() is called for both ends simultaneously, it is possible that sodisconnect() will be called for both ends, and for one of them ENOTCONN will be returned. Or have I missed something?

We have been periodically observing ENOTCONN errors on unix socket close in our applications, so it is not just curiosity :-) (I posted about our problem to freebsd-net@ some time ago, but it did not attract any attention: http://lists.freebsd.org/pipermail/freebsd-net/2009-December/024047.html).

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <strings.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <err.h>

    #define UNIXSTR_PATH "/tmp/mytest.socket"
    #define USLEEP 100

    int main(int argc, char **argv)
    {
        int listenfd, connfd, pid;
        struct sockaddr_un servaddr;

        pid = fork();
        if (-1 == pid)
            errx(1, "fork(): %d", errno);

        if (0 != pid) { /* parent */

            if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
                errx(1, "parent: socket error: %d", errno);

            unlink(UNIXSTR_PATH);
            bzero(&servaddr, sizeof(servaddr));
            servaddr.sun_family = AF_LOCAL;
            strcpy(servaddr.sun_path, UNIXSTR_PATH);

            if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
                errx(1, "parent: bind error: %d", errno);

            if (listen(listenfd, 1024) < 0)
                errx(1, "parent: listen error: %d", errno);

            for ( ; ; ) {
                if ((connfd = accept(listenfd, (struct sockaddr *) NULL, NULL)) < 0)
                    errx(1, "parent: accept error: %d", errno);

                //usleep(USLEEP / 2); // (I) uncomment this or (II) below to avoid the race

                if (close(connfd) < 0)
                    errx(1, "parent: close error: %d", errno);
            }

        } else { /* child */

            sleep(1); /* give the parent some time to create the socket */

            for ( ; ; ) {

                if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
                    errx(1, "child: socket error: %d", errno);

                bzero(&servaddr, sizeof(servaddr));
                servaddr.sun_family = AF_LOCAL;
                strcpy(servaddr.sun_path, UNIXSTR_PATH);

                if (connect(connfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
                    errx(1, "child: connect error %d", errno);

                // usleep(USLEEP); // (II) uncomment this or (I) above to avoid the race

                if (close(connfd) != 0)
                    errx(1, "child: close error: %d", errno);

                usleep(USLEEP);
            }
        }

        return 0;
    }

-- Mikolaj Golub
Re: PFIL: how to get tcp/ip fields from mbuf
On Mon, 1 Feb 2010, Lukasz Jaroszewski wrote:

I am wondering about the most elegant and proper way to get IP header fields from an mbuf, using PFIL hooks. I have read Murat Balaban's paper on PFIL_HOOKS, where I found an example function. The question is how I can access IP header fields in such a function.

The best reference here is probably firewall source code that already exists in the tree. For IP-layer hooks, you'll need to use the m_pullup() call to ensure the bytes you want are contiguously stored, and then mtod() to cast the mbuf data pointer appropriately. Although I notice ipfw, at least, doesn't call m_pullup() for the base header, as it assumes the calling context will already have arranged for it to be contiguous:

    static int
    ipfw_check_hook(void *arg, struct mbuf **m0, struct ifnet *ifp, int dir,
        struct inpcb *inp)
    {
        ...
        if (mtod(*m0, struct ip *)->ip_v == 4)
            ret = ip_dn_io_ptr(m0, dir, args);
        ...

Robert

    static int
    hisar_chkinput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
        struct inpcb *inp)
    {
        in_bytes += (*m)->m_len;
        return 0;
    }

Regards
LVJ.
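The "make sure the header bytes are contiguous, then read the fields" pattern can be tried outside the kernel. The sketch below is a userspace analogue over a flat byte buffer, not pfil code: the length check stands in for what m_pullup() guarantees in the kernel, and the field reads stand in for accesses through mtod(). The names ipv4_info and ipv4_peek() are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from an IPv4 header (hypothetical type for this sketch). */
struct ipv4_info {
	uint8_t  version;	/* 4 for IPv4 */
	uint8_t  header_len;	/* header length in bytes, 20..60 */
	uint8_t  protocol;	/* e.g. 6 = TCP, 17 = UDP */
	uint32_t src;		/* source address, host byte order */
	uint32_t dst;		/* destination address, host byte order */
};

/* Read a big-endian 32-bit value without assuming alignment. */
static uint32_t
be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Returns 0 on success, -1 if the buffer is too short or not IPv4. */
static int
ipv4_peek(const uint8_t *buf, size_t len, struct ipv4_info *out)
{
	/* Analogue of m_pullup(): the 20-byte base header must be present. */
	if (len < 20)
		return (-1);
	out->version = buf[0] >> 4;
	out->header_len = (uint8_t)((buf[0] & 0x0f) * 4);
	if (out->version != 4 || out->header_len < 20 || len < out->header_len)
		return (-1);
	out->protocol = buf[9];
	out->src = be32(buf + 12);
	out->dst = be32(buf + 16);
	return (0);
}
```

Reading byte by byte sidesteps alignment and byte-order questions; in a real hook the same fields would come from struct ip once the header is known to be contiguous.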
Re: Contribution to FreeBSD network stack
On Sun, 31 Jan 2010, shashidhara none wrote:

I am interested to contribute to FreeBSD network stack. I found some projects at http://wiki.freebsd.org/Networking . But could not figure out how to start working on the same. Please help.

Hi Shashi--

The FreeBSD network stack is a very large piece of code, and there are lots of opportunities to get involved helping to measure and improve its behavior, add new features, etc. Could you say a bit more about your background -- have you done much kernel programming and/or network stack programming before?

Robert
Re: Strange network issue in freebsd 8
On Sat, 23 Jan 2010, Sherin George wrote:

I am occasionally facing a strange network issue on a FreeBSD server.

OS: FreeBSD 8.0-RELEASE - amd64

The server loses its network connection once every few days. I logged into the console and verified that the network is up. I even restarted the network service using the following command.

I'd suggest sending this e-mail to freebsd-net; there have been significant link layer changes in 8.0, and it's possible this is a side effect (and bug) from that. The appropriate people will pick it up on that list.

Also, I notice you're running 8.0-RELEASE, rather than the latest patch level (which included some important security and stability improvements); you may want to slide forward using freebsd-update or a manual rebuild. You will need to reboot to pick up some of the improvements.

Robert N M Watson
Computer Laboratory
University of Cambridge

/etc/rc.d/netif restart

Still, it didn't fix the problem. I checked /var/log/messages, but I am not getting any clue.

==
Jan 19 12:10:20 myserver kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad0 finished.
Jan 19 20:20:23 myserver nfsd[732]: select failed: Interrupted system call
Jan 19 20:21:07 myserver nfsd[732]: select failed: Interrupted system call
Jan 23 02:14:33 myserver login: ROOT LOGIN (root) ON ttyv0
Jan 23 02:19:51 myserver kernel: ifa_del_loopback_route: deletion failed
Jan 23 02:19:57 myserver kernel: em0: link state changed to DOWN
Jan 23 02:20:02 myserver kernel: em0: link state changed to UP
Jan 23 02:29:58 myserver reboot: rebooted by root
Jan 23 02:29:58 myserver syslogd: exiting on signal 15
Jan 23 02:31:31 myserver syslogd: kernel boot file is /boot/kernel/kernel
Jan 23 02:31:31 myserver kernel: Copyright (c) 1992-2009 The FreeBSD Project.
Jan 23 02:31:31 myserver kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jan 23 02:31:31 myserver kernel: The Regents of the University of California. All rights reserved.
Jan 23 02:31:31 myserver kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jan 23 02:31:31 myserver kernel: FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009
Jan 23 02:31:31 myserver kernel: r...@mason.cse.buffalo.edu: /usr/obj/usr/src/sys/GENERIC
Jan 23 02:31:31 myserver kernel: Timecounter i8254 frequency 1193182 Hz quality 0
==

The network and TCP stack were all up. It was even pinging the gateway, but traceroute was not going beyond the gateway. I believe the issue is not related to anything outside the server, since a reboot always fixes the issue.

I will be grateful for any advice that can help me in troubleshooting this problem.

--
Best Regards,
Sherin
Re: yarrow random generator
On Thu, 24 Dec 2009, RW wrote:

And also according to Schneier it is a good idea to save state of the PRNG and restore it on boot to make it more seeded.

In the default configuration, we save some PRNG output every few minutes (using cron) to a file in /var so that it can be re-injected into Yarrow on the next boot (done by /etc/rc.d/random).

It isn't handled very well though. The files saved by crontab under /var are loaded a bit late in the boot sequence - after encrypted swap. The main entropy file is loaded earlier, but immediately after ps -fauxww, sysctl -a, etc are dumped into the device, saturating its 4K of buffer space.

I can't speak to the specific /dev/random design choices here, but I can say that there is a more general issue with swap being required to get to the point where you reliably have writable file system access. This is because fsck can be quite memory-heavy, and so swap is started before fsck is started. It could well be that the arrival of proper UFS journaling support in the immediate future allows more aggressive reordering of the boot process so that writable file systems can be assumed much earlier.

I'll point Mark Murray at this thread and see if we can get him to opine some on the current design choices and any potential changes to address them. I was interested by your observation that the boot-time dumping of bits into /dev/random may overflow the buffering -- indeed, it looks like the rate-controlling in effect for other entropy sources may not be appropriate for /dev/random.

Robert N M Watson
Computer Laboratory
University of Cambridge
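The restore half of the save-and-restore cycle described above amounts to copying a saved file into the random device. The sketch below shows only that copy, under the assumption that bytes written to the device are stirred into the generator's state. It is not the /etc/rc.d/random implementation; feed_entropy() is a hypothetical name, and both paths are parameters so that nothing about the real file locations is implied.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy previously saved bytes from src_path into
 * dst_path (for example a random device node).  Returns the number of
 * bytes copied, or -1 on any error.
 */
static long
feed_entropy(const char *src_path, const char *dst_path)
{
	char buf[4096];
	long total = 0;
	ssize_t n, w, off;
	int in, out;

	if ((in = open(src_path, O_RDONLY)) < 0)
		return (-1);
	if ((out = open(dst_path, O_WRONLY)) < 0) {
		close(in);
		return (-1);
	}
	while (total >= 0 && (n = read(in, buf, sizeof(buf))) != 0) {
		if (n < 0) {
			total = -1;		/* read error */
			break;
		}
		/* write() may be partial, so loop until the chunk is consumed. */
		for (off = 0; off < n; off += w) {
			w = write(out, buf + off, (size_t)(n - off));
			if (w <= 0) {
				total = -1;	/* write error */
				break;
			}
		}
		if (total >= 0)
			total += (long)n;
	}
	close(in);
	close(out);
	return (total);
}
```

Working in 4096-byte chunks is a choice made for this sketch; whether chunk size matters to the device's buffering, as raised above, is not something this code settles.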
Re: yarrow random generator
On Thu, 24 Dec 2009, Paul Graphov wrote:

And also according to Schneier it is a good idea to save state of the PRNG and restore it on boot to make it more seeded.

In the default configuration, we save some PRNG output every few minutes (using cron) to a file in /var so that it can be re-injected into Yarrow on the next boot (done by /etc/rc.d/random).

Robert N M Watson
Computer Laboratory
University of Cambridge

2009/12/24 Colin Percival cperc...@freebsd.org

Hi all,

Looks like there's a bug here, but it doesn't matter since this is dead code: .seeded is initialized to 1 and never modified, so we will never call into random_yarrow_block. IIRC this is because there are some places which ask for entropy before yarrow is seeded but don't actually need *cryptographic* entropy.

On Thu, Dec 24, 2009 at 03:45:15PM +0300, Paul Graphov wrote:

I've looked at the FreeBSD 8.0 cryptographically secure pseudorandom number generator and have a question. It looks like a bug, but I'm not sure. In file sys/dev/randomdev.c, function random_read:

    if (!random_systat.seeded)
        error = (*random_systat.block)(flag);

It blocks until the PRNG is seeded. For the software random generator implementation, the block method looks as follows, sys/dev/randomdev_soft.c:

    random_yarrow_block(int flag)
    {
        int error = 0;

        mtx_lock(&random_reseed_mtx);

        /* Blocking logic */
        while (random_systat.seeded && !error) {
            if (flag & O_NONBLOCK)
                error = EWOULDBLOCK;
            else {
                printf("Entropy device is blocking.\n");
                error = msleep(&random_systat, &random_reseed_mtx,
                    PUSER | PCATCH, "block", 0);
            }
        }
        mtx_unlock(&random_reseed_mtx);

        return error;
    }

It seems that random_systat.seeded in the while condition should be negated. Otherwise it will never actually block, or it will block erroneously until the next reseed (under very rare conditions).

--
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
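The loop shape that the report argues for, waiting while the generator is *not* seeded, can be illustrated in userspace. The sketch below is an analogue built on a pthread mutex and condition variable in place of mtx_lock()/msleep(). It is not the kernel code and not a committed fix; seed_state, wait_until_seeded() and mark_seeded() are names made up for this example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the seeded flag and the lock protecting it. */
struct seed_state {
	pthread_mutex_t mtx;
	pthread_cond_t  cv;
	bool            seeded;
};

/*
 * Wait until the state is seeded.  In non-blocking mode, return
 * EWOULDBLOCK instead of sleeping.  Returns 0 once seeded.
 */
static int
wait_until_seeded(struct seed_state *st, bool nonblock)
{
	int error = 0;

	pthread_mutex_lock(&st->mtx);
	while (!st->seeded && error == 0) {	/* note the negation */
		if (nonblock)
			error = EWOULDBLOCK;
		else
			error = pthread_cond_wait(&st->cv, &st->mtx);
	}
	pthread_mutex_unlock(&st->mtx);
	return (error);
}

/* Called by whoever finishes seeding: set the flag and wake all waiters. */
static void
mark_seeded(struct seed_state *st)
{
	pthread_mutex_lock(&st->mtx);
	st->seeded = true;
	pthread_cond_broadcast(&st->cv);
	pthread_mutex_unlock(&st->mtx);
}
```

With the condition written this way, a seeded generator returns immediately and an unseeded one either sleeps or reports EWOULDBLOCK, which is the behaviour the caller in random_read appears to expect.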
Re: 8.0-RELEASE-p1 Panic panic: sbdrop
On Tue, 15 Dec 2009, Linda Messerschmidt wrote:

This is a new one on me:

Hi Linda--

Unfortunately, this has historically been a tricky panic to debug, as it's associated with a sanity check that picks up kernel memory corruption that may have occurred at a much earlier time. Without a crashdump, we won't get much further. However, let's see what we can do, perhaps trying to find some common configuration element with another past report of the same diagnostic. FYI, typically this panic occurs as a result of a concurrency bug in a device driver, although it can be a symptom of a more general network stack bug.

Could you tell us a bit more about the network configuration -- especially, are you using any tunneling software (such as ipsec), netgraph, or other less commonly used network features? Are you using accept filters?

Robert N M Watson
Computer Laboratory
University of Cambridge

panic: sbdrop
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
sbdrop_internal() at sbdrop_internal+0x323
soisdisconnected() at soisdisconnected+0xbe
tcp_close() at tcp_close+0x45
tcp_do_segment() at tcp_do_segment+0x122f
tcp_input() at tcp_input+0xc92
ip_input() at ip_input+0xac
netisr_dispatch_src() at netisr_dispatch_src+0x7e
ether_demux() at ether_demux+0x15d
ether_input() at ether_input+0x17b
em_rxeof() at em_rxeof+0x287
em_handle_rxtx() at em_handle_rxtx+0x2f
taskqueue_run() at taskqueue_run+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8000117d30, rbp = 0 ---

This machine runs squid as a reverse proxy, and this has happened a couple of times now in the past day. Unfortunately it's a production machine, so we'll have to go back to 7.2.

I can probably leave it as-is for 24 hours or so if anybody wants me to check something, but it doesn't have a dump or a debug kernel and I unfortunately can't put it back in production to provoke another crash. :( But I wanted to at least report this before we did in case it's useful to anyone.
Re: Superpages on amd64 FreeBSD 7.2-STABLE
On Thu, 10 Dec 2009, Nate Eldredge wrote:

What about using posix_spawn(3)? This is implemented in terms of vfork(), so you'll gain the same performance advantages, but it avoids many of vfork's pitfalls. Also, since it's a POSIX standard function, you needn't worry that it will go away or change its semantics someday.

Just as a note here: while we do posix_spawn(3) as a library function, Mac OS X does it as a system call. As a result, they can implement certain spawn flags that we can't, among others, the ability to have the newly created process/image be suspended before its first instruction executes. This would be very useful when debugging the runtime linker, among other things. On the other hand, it's quite a complex kernel code path...

Robert N M Watson
Computer Laboratory
University of Cambridge
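For readers who have not used it, a basic posix_spawn(3) call looks roughly like the sketch below: spawn a program with default file actions and attributes, then wait for it. Only the portable POSIX interface is used, so none of the platform-specific spawn flags mentioned above appear. spawn_and_wait() is a hypothetical wrapper name.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical wrapper: run the program at argv[0] with the given
 * arguments and the caller's environment.  Returns the child's exit
 * status, or -1 if it could not be spawned or did not exit normally.
 */
static int
spawn_and_wait(char *const argv[])
{
	pid_t pid;
	int status;

	/* NULL file actions and attributes request the default behaviour. */
	if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
		return (-1);
	if (waitpid(pid, &status, 0) < 0)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

A caller would pass a NULL-terminated vector such as { "/bin/sh", "-c", "exit 0", NULL }. Note that posix_spawn() returns an error number directly rather than setting errno, which is why the result is compared against 0.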
Re: UNIX domain sockets on nullfs still broken?
On Mon, 30 Nov 2009, xorquew...@googlemail.com wrote:

jackd (audio/jack) creates a directory in /tmp with a UNIX domain socket in it. Clients connect to this socket to communicate with the server.

We currently support the sharing of UNIX domain sockets between file system layers on neither nullfs nor unionfs. In the former case, this is a bug, and in the latter case, it is a feature. The specific nature of the bug is that you can't just copy the socket pointer between layers in the vnode stack without additional reference counting (and other similar state propagation), so if we allowed inter-layer access it would lead to use-after-free panics and similar sorts of problems. This occurs, BTW, because the socket pointer is directly in struct vnode, and not queried by a VOP, which could be forwarded by nullfs down a layer.

The fixes here aren't easy, so I would anticipate UNIX domain sockets not working across nullfs layers for some time to come. It's not immediately clear to me which approach is the best way to fix it, since it likely requires UNIX domain sockets to learn about stacked file systems in some form, which will significantly complicate an already complicated relationship.

Robert N M Watson
Computer Laboratory
University of Cambridge

$ jackd -d oss -r 44100 -p 128
$ ls -alF /tmp/jack-11001/default
total 4
drwx------  2 xw  wheel  512 30 Nov 14:19 ./
drwx------  3 xw  wheel  512 30 Nov 14:19 ../
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-0|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-1|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-2|
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_0=
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_ack_0=
$ sudo mount_nullfs /tmp/ /jail/k4m/tmp

In the jail:

k4m$ ls -alF /tmp/jack-11001/default
drwx------  2 xw  wheel  512 30 Nov 14:19 ./
drwx------  3 xw  wheel  512 30 Nov 14:19 ../
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-0|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-1|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-2|
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_0=
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_ack_0=
k4m$ ktrace jack_showtime
jack server not running?
k4m$ kdump | grep '/tmp/jack-11001'
 76030 initial thread STRU  struct sockaddr { AF_LOCAL, /tmp/jack-11001/default/jack_0 }
 76030 initial thread NAMI  /tmp/jack-11001/default/jack_0
 76030 initial thread RET   connect -1 errno 61 Connection refused

$ uname -a
FreeBSD viper.internal.network 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

xw
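The failure in the trace above can be reproduced without jack by attempting a plain connect() to the socket path and looking at the error. The sketch below is one way to do that; try_unix_connect() is a hypothetical helper, and the idea of running it against the same path on both sides of the nullfs mount is a suggestion rather than something done in this thread.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to the UNIX domain stream socket
 * at 'path'.  Returns 0 on success, or the errno value describing why
 * the attempt failed.
 */
static int
try_unix_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd, error = 0;

	if (strlen(path) >= sizeof(addr.sun_path))
		return (ENAMETOOLONG);
	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
		return (errno);
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);	/* length was checked above */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		error = errno;		/* save before close() can change it */
	close(fd);
	return (error);
}
```

Given the kdump output above, the expectation would be 0 for the path on the host side and ECONNREFUSED (errno 61 on FreeBSD) for the same socket seen through the nullfs mount.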
Re: UNIX domain sockets on nullfs still broken?
On Mon, 30 Nov 2009, Ivan Voras wrote:

What's the sane solution, then, when the only method of communication is unix domain sockets?

It is a security problem. I think the long-term solution would be to add a sysctl analogous to security.jail.param.securelevel to handle this. I don't think there is a workaround right now.

I'm not sure I agree on the above, hence my comments about nullfs and unionfs. I see nullfs as intended to provide references (possibly masked to read-only) to the same fundamental object, and unionfs to provide independence between different consumers that see objects via different file system mounts. As such, I'd expect UNIX domain sockets to work for inter-jail communication when using nullfs, and not work when using unionfs. It's simply a property of the implementation of the linkage between VFS and UNIX domain sockets that they are currently both broken (in fact, someone tried to fix it with union mounts recently, running into the use-after-free bugs I mentioned, but also breaking the semantics in my view).

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: UNIX domain sockets on nullfs still broken?
On Tue, 1 Dec 2009, Linda Messerschmidt wrote:

On Mon, Nov 30, 2009 at 10:14 AM, Ivan Voras ivo...@freebsd.org wrote:

What's the sane solution, then, when the only method of communication is unix domain sockets?

It is a security problem. I think the long-term solution would be to add a sysctl analogous to security.jail.param.securelevel to handle this.

Out of curiosity, why is allowing access to a Unix domain socket in a filesystem to which a jail has explicitly been allowed access more or less secure than allowing access to a file or a devfs node in a filesystem to which a jail has explicitly been allowed access?

(I seem to have caught this thread rather late in the game due to being on travel) -- Ivan is wrong about nullfs: it's broken due to a bug, not a feature, and that bug is not present when using a single file system. He's thinking of unionfs semantics, where if it worked it would be a bug. :-)

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: UNIX domain sockets on nullfs still broken?
On Thu, 10 Dec 2009, Robert Watson wrote:
On Mon, 30 Nov 2009, xorquew...@googlemail.com wrote:
jackd (audio/jack) creates a directory in /tmp with a UNIX domain socket in it. Clients connect to this socket to communicate with the server.
We currently support the sharing of UNIX domain sockets between file system layers on either nullfs or unionfs. In the former case, this is a bug, and
Should read neither ... nor.
Robert N M Watson Computer Laboratory University of Cambridge
in the latter case, it is a feature. The specific nature of the bug is that you can't just copy the socket pointer between layers in the vnode stack without additional reference counting (and other similar state propagation), so if we allowed inter-layer access it would lead to use-after-free panics and similar sorts of problems. This occurs, BTW, because the socket pointer is directly in struct vnode, and not queried by a VOP, which could be forwarded by nullfs down a layer. The fixes here aren't easy, so I would anticipate UNIX domain sockets not working across nullfs layers for some time to come. It's not immediately clear to me which approach is the best way to fix it, since it likely requires UNIX domain sockets to learn about stacked file systems in some form, which will significantly complicate an already complicated relationship.
Robert N M Watson Computer Laboratory University of Cambridge
$ jackd -d oss -r 44100 -p 128
$ ls -alF /tmp/jack-11001/default
total 4
drwx-- 2 xw wheel 512 30 Nov 14:19 ./
drwx-- 3 xw wheel 512 30 Nov 14:19 ../
prw-r--r-- 1 xw wheel 0 30 Nov 14:19 jack-ack-fifo-54211-0|
prw-r--r-- 1 xw wheel 0 30 Nov 14:19 jack-ack-fifo-54211-1|
prw-r--r-- 1 xw wheel 0 30 Nov 14:19 jack-ack-fifo-54211-2|
srwxr-xr-x 1 xw wheel 0 30 Nov 14:19 jack_0=
srwxr-xr-x 1 xw wheel 0 30 Nov 14:19 jack_ack_0=
$ sudo mount_nullfs /tmp/ /jail/k4m/tmp
In the jail:
k4m$ ls -alF /tmp/jack-11001/default
drwx-- 2 xw wheel 512 30 Nov 14:19 ./
drwx-- 3 xw wheel 512 30 Nov 14:19 ../
prw-r--r-- 1 xw wheel 0 30 Nov 14:19 jack-ack-fifo-54211-0|
prw-r--r-- 1 xw wheel 0 30 Nov 14:19 jack-ack-fifo-54211-1|
prw-r--r-- 1 xw wheel 0 30 Nov 14:19 jack-ack-fifo-54211-2|
srwxr-xr-x 1 xw wheel 0 30 Nov 14:19 jack_0=
srwxr-xr-x 1 xw wheel 0 30 Nov 14:19 jack_ack_0=
k4m$ ktrace jack_showtime
jack server not running?
k4m$ kdump | grep '/tmp/jack-11001'
76030 initial thread STRU struct sockaddr { AF_LOCAL, /tmp/jack-11001/default/jack_0 }
76030 initial thread NAMI /tmp/jack-11001/default/jack_0
76030 initial thread RET connect -1 errno 61 Connection refused
$ uname -a
FreeBSD viper.internal.network 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
xw
Re: mprotect(2) clears the flag for whole page which causes program crash.
On Tue, 17 Nov 2009, Sharad Chandra wrote:
Is it a known bug, or is there any workaround? How can a userland process make sure that it will not crash, given that malloc(3) can allocate wherever it finds free memory?
mprotect(2) operates on pages, so you'll want to use mmap(2) and munmap(2) to allocate and free pages directly rather than malloc(3), which hands out byte ranges carved from pages that it obtains using those same interfaces.
Robert N M Watson Computer Laboratory University of Cambridge
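A minimal sketch of the page-granular allocation suggested above, assuming a POSIX-like system with anonymous mmap(2). The helper names round_to_pages(), alloc_pages() and free_pages() are hypothetical and are not part of any library; error handling is kept to the bare minimum.

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS	/* some systems only provide this spelling */
#endif

/* Round len up to a whole number of pages. */
size_t
round_to_pages(size_t len)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

	return ((len + pagesz - 1) / pagesz * pagesz);
}

/*
 * Hypothetical helper: take whole pages straight from the kernel, so a
 * later mprotect(2) on this region cannot touch an unrelated malloc(3)
 * object that happens to share a page.  Returns NULL on failure.
 */
void *
alloc_pages(size_t len)
{
	void *p;

	p = mmap(NULL, round_to_pages(len), PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	return (p == MAP_FAILED ? NULL : p);
}

/* Hypothetical helper: give the pages back; returns 0 on success. */
int
free_pages(void *p, size_t len)
{
	assert(p != NULL);
	return (munmap(p, round_to_pages(len)));
}
```

A caller would then pass the pointer from alloc_pages() and the rounded length to mprotect(2), for example with PROT_READ to make the whole region read-only, knowing that no other allocation lives in those pages.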
Re: mmap(2) segaults with certain len values and MAP_ANON|MAP_FIXED
On Wed, 21 Oct 2009, Alexander Best wrote:
this code serves only one purpose: to trigger a segfault. i don't use the code for any other purpose. i was under the impression that mmap() should either succeed or fail (tertium non datur). mmap's manual doesn't say anything about mmap() causing segfaults.
Have you tried ktracing the application? I think you'll find that the mmap(2) system call succeeded fine, and that the segfault comes from attempting to execute the address in libc on return to userspace, as a result of libc not being at that address anymore (since you removed its mapping). You can use procstat -v to inspect address space use by processes, but as a general rule you don't want to pass anything other than an address of 0x0 to mmap(2) unless you're very carefully managing the address space of the process. Many userspace libraries are involved in using that address space, but especially the runtime linker which begins execution in userspace when a binary is started.
Robert N M Watson Computer Laboratory University of Cambridge
Re: global TCP_NODELAY?
On Mon, 12 Oct 2009, Ivan Voras wrote:
2009/10/12 Alfred Perlstein alf...@freebsd.org:
* Ivan Voras ivo...@freebsd.org [091012 04:29] wrote:
I'm trying to work around some extreme brain damageness in PHP (yes, it sucks) which doesn't have a way to set TCP_NODELAY on stream sockets so I'm wondering what are my other options? Is there a way to set TCP_NODELAY system-wide?
Ivan, many people write php extensions, maybe you can do that?
While writing PHP extensions isn't hard (I've done it before), I'm not yet convinced it's worth the effort in this case - I don't know if TCP_NODELAY will help at all. I'll think about it if time permits.
Create a libc wrapper that calls setsockopt(2) whenever socket(2) is called to create a TCP socket in php, and inject it using LD_PRELOAD. This is a similar trick to what things like socks proxy library wrappers use, is easy to hack together, and avoids having to modify the kernel. When it doesn't work, move on, and if it does, change php :-).
Robert N M Watson Computer Laboratory University of Cambridge
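The wrapper suggested above might look roughly like the sketch below. It is an illustration under assumptions, not tested against PHP: it interposes socket() and looks up the real function with dlsym(RTLD_NEXT, ...). The fallback definition of RTLD_NEXT is only there for the case where a glibc header was included before _GNU_SOURCE took effect; it mirrors the value glibc and FreeBSD use. A file name such as nodelay.c is hypothetical.

```c
#define _GNU_SOURCE		/* glibc: exposes RTLD_NEXT; not needed on FreeBSD */
#include <dlfcn.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef RTLD_NEXT		/* assumption: same value as glibc and FreeBSD */
#define RTLD_NEXT ((void *)-1L)
#endif

/*
 * Interposed socket(2).  Calls the real libc socket(), then tries to
 * enable TCP_NODELAY on every IPv4/IPv6 socket.  The setsockopt(2) call
 * simply fails on non-TCP sockets, so its result is ignored and errno
 * is restored for the caller.
 */
int
socket(int domain, int type, int protocol)
{
	static int (*real_socket)(int, int, int);
	int fd, one = 1, saved_errno;

	if (real_socket == NULL)
		real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");
	if (real_socket == NULL) {
		errno = ENOSYS;
		return (-1);
	}

	fd = real_socket(domain, type, protocol);
	if (fd >= 0 && (domain == AF_INET || domain == AF_INET6)) {
		saved_errno = errno;
		(void)setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
		errno = saved_errno;
	}
	return (fd);
}
```

Built as a shared object (for example with cc -shared -fPIC, plus -ldl on systems that keep dlsym() in a separate library) and named in LD_PRELOAD, this would apply to every socket() call the target process makes through libc.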
Re: Need some help understanding a jail system call.
On Wed, 26 Aug 2009, Dag-Erling Smørgrav wrote:
Here is the link i used to find this code http://www.watson.org/~robert/freebsd/jailng/
You realize that this is eight years old, right? And that the jail infrastructure has been extensively modified since then, and is currently being rewritten again?
As DES points out, that jail work has been superseded by other, more interesting, work in the base tree. My suggestion would be to read the jail(2) man page, both the current 7.x version, and the forthcoming 8.x version which has been substantially enhanced, and disregard the jailng page. I should more clearly mark it as being of historic interest only.
Robert N M Watson Computer Laboratory University of Cambridge
Re: Common interface for sensors/health monitoring
On Sat, 22 Aug 2009, Marc Balmer wrote:
I was looking for the same info some time ago .. something that would allow me to gather all the info from the same place, but the only thing I came up with was the very same discussion about the sensors framework port and nothing else. Any info on any such project will be greatly appreciated
The OpenBSD sensors framework lacks some desirable features, e.g. event capabilities like getting an event if a certain threshold is exceeded. And it probably was used for things that it better had not (yes, I am the culprit for one of these (ab)uses...). I am sure these features could be added if only the code was in the tree to hack on...
One of the things I'd particularly like to see is an alignment between kernel/user level monitoring frameworks and the SNMP model (especially relating to traps). The SNMP information model (MIBs, agents, traps, etc) has its limitations, but having a compatible model at all layers of the system will make it easier to store, manipulate, manage, and report this information consistently throughout the OS and larger distributed systems.
Robert
Re: Security: information leaks in /proc enable keystroke recovery
On Sun, 16 Aug 2009, David Wagner wrote:
I accept your argument that there is no point trying to defend against deliberate communication of information between two cooperating processes via some sneaky channel; there is no hope of stopping that in general-purpose commodity OS's. If process X and Y are both colluding to send information from X to Y, they will succeed, no matter how hard we try. We have no hope of closing all such channels, for general-purpose commodity OS's (like FreeBSD or Linux).
Moving beyond EIP/ESP, which are clearly a bad idea: The OS community has not engaged well with the concerns raised by past cache-based crypto side channels, in part because it seemed the least complex solution was hardening crypto against having key-driven footprints in the cache. However, the problem they represent (avoiding the use of shared resources between mutually untrusting processes, and then mitigating effects that remain) definitely sounds like the covert channel problem, with very similar concerns extensively discussed in the documents I referred to.
In an interactive system, the scheduling of threads in a process reflects the completion of various events: user I/O, network I/O, disk I/O, or perhaps the expiration of a timer associated with application-internal events (animations, statistics, etc). Monitoring these from another process is intentionally easy on commodity OS's -- there are a variety of monitoring statistics, from the already mentioned process/thread execution time, to context switch counters, wait channels/addresses, lock states, timestamps on special devices, etc, not to mention having CPU sink processes that nice themselves appropriately and hang around monitoring execution of other processes/threads/the kernel through gaps in its own scheduling. Some of the intentional mechanisms are specific to processes, and easy to block by policy.
Others are global, and begin sliding down the slope of making the system and applications a lot harder to analyze and debug, something that sites frequently hosting large numbers of mutually untrusting users (web farms) may not be willing to do.
Into the area of techniques that annoy people: my guess is that you may also be able to measure the context switching of processes on other CPUs through very careful timing of events in the kernel on your local CPU. For example, it's a reasonable bet that using the TSC and carefully selected system calls/arguments, you can measure cache line behavior associated with kernel scheduler/statistic lines that will be pulled to another CPU when a context switch takes place. For example, consider per-CPU run queue locks or context switch statistics, which may in edge cases be pulled to another CPU, such as when monitoring takes place. If they are already local to the attacking CPU, no context switch has taken place on the other CPU since you last checked; if they're non-local, a context switch has taken place.
Following Colin Percival's paper on cache side channels for RSA, there was a lot of discussion about how the OS could help mitigate these problems: do you provide security critical sections around cryptography which introduce temporary but performance-degrading mutual exclusion of caches based on knowledge of the CPU topology, for example. Identifying and offering similar trade-offs between performance and security, avoiding excess complexity, and in particular, limiting the scope of those performance losses to only critical moments will be key if the security community wants to engage the OS community here. Otherwise I suspect these concerns will pass by, unaddressed, again.
Robert N M Watson Computer Laboratory University of Cambridge
Re: Security: information leaks in /proc enable keystroke recovery
On Sun, 16 Aug 2009, Oliver Pinter wrote:
FreeBSD manages its process files more cautiously than Linux: it puts all register values into the file /proc/pid/regs that can only be read by the owner of a process, which blocks the information used by
This is inaccurate, but largely in an academic sense. The FreeBSD kernel computes debug permissions between two processes from a number of factors, including:
- Comparison of uids (effective, real, saved)
- Comparison of gids (effective, real, saved)
- Subset check on the additional group set
- Jail information
- Mandatory access control policies
There are some other checks that fit the pattern of credential comparison less well:
- We deny debugger activity during execve(2) as a robustness protection against possible race conditions.
- We have global policy controls, security.bsd.see_other_gids and security.bsd.see_other_uids, which allow administrators to scope not just debugging facilities, but also monitoring facilities.
- We have a global policy control, security.bsd.unprivileged_proc_debug, to disable debugging facilities for unprivileged users, on the grounds that (a) this is sometimes a desirable system policy, and (b) all UNIX systems have historically suffered from significant debugger security vulnerabilities and this provides an easy work-around to use if that happens in the future.
Which is to say: the UNIX file system permissions appearing in procfs are purely decorative -- they roughly summarize, but do not implement, the above checks. procfs is also deprecated in FreeBSD, and has not been mounted by default for several major releases. Instead, the system call interfaces ktrace(2), ptrace(2), and sysctl(2) provide access to trace data, process debugging, and process state (such as address space layout and file descriptor information).
Some legacy setgid kvm tools exist that use libkvm and /dev/kmem, but we are eliminating these as quickly as we can; they may not follow the same policies as those implemented in the kernel. The see_other_uids and see_other_gids policy sysctls narrow the policy on inter-process visibility via monitoring controls -- however, additional hardening is required to enforce this policy universally. For example, administrators will also need to limit access to logs, accounting data, and so on, for this to be fully effective.
Beyond this, and assuming the correct implementation of the above, we're into the grounds of classic trusted OS covert channel analysis, against which no COTS UNIX OSes I'm aware of are hardened. This isn't to dismiss these attacks as purely hypothetical -- we've seen some rather compelling examples of covert channels being exploited in unexpected and remarkably practical ways in the last few years (Steven Murdoch's "Hot or Not" paper takes the cake in that regard, I think). However, this next step up from "the kernel doesn't reveal information on processes from other users" involves scheduler hardening, consideration of inter-CPU/core/thread cache interactions, and so on -- things that we don't have a good research, let alone production, OS understanding of.
There are tools in FreeBSD that can help with some of these issues -- for example, you can use login classes to pin different users to different CPU threads, cores, or packages. However, this leaves the implementation of policy up to the administrator, rather than simply allowing the administrator to specify the policy that mutually untrusting processes can't share CPUs with each other in some window.
Robert N M Watson Computer Laboratory University of Cambridge
Re: What's changed between 7.1 and 7.2
On Fri, 10 Jul 2009, Ivan Voras wrote:
Robert Watson wrote:
On Wed, 8 Jul 2009, Wojciech Puchar wrote:
i'm getting that crap every time i remount filesystem and on startup.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
i'm using glabel only to avoid mess about what drive is connected to what SATA port.
This is effectively debugging output that slipped into the release and shouldn't have. I believe it's now removed in 8.x, I'm not sure it's been MFC'd to 7.x yet. The output can be entirely ignored and does not reflect a problem, just state changes resulting from a volume becoming visible to geom, and then the label name being removed following mount.
If it's desirable to MFC it now, I'll do it. (but it will remove the above output for all glabel labels, not only ufs).
I think it would be widely appreciated. :-)
Robert N M Watson Computer Laboratory University of Cambridge
Re: What's changed between 7.1 and 7.2
On Wed, 8 Jul 2009, Wojciech Puchar wrote:
i'm getting that crap every time i remount filesystem and on startup.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
i'm using glabel only to avoid mess about what drive is connected to what SATA port.
This is effectively debugging output that slipped into the release and shouldn't have. I believe it's now removed in 8.x, I'm not sure it's been MFC'd to 7.x yet. The output can be entirely ignored and does not reflect a problem, just state changes resulting from a volume becoming visible to geom, and then the label name being removed following mount.
Robert N M Watson Computer Laboratory University of Cambridge
Re: large pages (amd64)
On Tue, 30 Jun 2009, Mel Flynn wrote:
It looks like sys/kern/kern_proc.c could call mincore around the loop at line 1601 (rev 194498), but I know nothing about the vm subsystem to know the implications or locking involved. There's still 16 bytes of spare to consume, in the kve_vminfo struct though ;)
Yes, to start with, you could replace the call to pmap_extract() with a call to pmap_mincore() and export a Boolean to user space that says, "This region of the address space contains one or more superpage mappings."
How about attached?
I like the idea -- there are some style nits that need fixing though. Assuming Alan is happy with the VM side of things, I can do the cleanup and get it in the tree.
Robert N M Watson Computer Laboratory University of Cambridge
% sudo procstat -av|grep 'S '
PID START END PRT RES PRES REF SHD FL TP PATH
1754 0x2890 0x2ae0 rw- 93850 3 0 --S df
2141 0x2f90 0x3080 rw- 37190 1 0 --S df
2146 0x3eec 0x4fac rwx 17450 1 0 --S df
-- Mel
Re: callout(9) and Giant lock
On Sun, 28 Jun 2009, Sebastian Huber wrote:
suppose that a certain time event triggered several callout functions. What happens if the first of these callout functions blocks on the Giant lock? Does this delay all further callout functions until the Giant lock is available for the first callout function? What happens if one of the callout function blocks forever? Does this deadlock the system?
Callouts are marked as MPSAFE or non-MPSAFE when registered. If non-MPSAFE, we will acquire Giant automatically for the callout, but I believe we'll also try and sort non-MPSAFE callouts behind MPSAFE ones in execution order to minimize latency for MPSAFE callouts. Most callouts acquire locks of some sort, and stalling any callout indefinitely will stall the entire callout thread indefinitely, which in turn could lead to a variety of odd behaviors and potentially (although not necessarily) deadlock. In general, we do not allow callouts to block, however, in the sense that with INVARIANTS enabled we will actually panic if a callout tries to call msleep() or related functions. Likewise, if another thread sleeps while holding Giant, it will automatically release it when it sleeps.
Relatively few kernel subsystems use Giant at this point, FYI, and even fewer in FreeBSD 8. One of our goals for FreeBSD 9 is to eliminate all last remaining references to the Giant lock in the kernel.
Robert N M Watson Computer Laboratory University of Cambridge
Re: Disk quota for Jail. Discussion.
On Tue, 26 May 2009, Menshikov Konstantin wrote:
Yes. But a jail cannot allocate blocks and inodes above its root path. In allocation functions, for example ffs_alloc, we have access to the ucred of the process and can check whether the process is in a jail.
Yes, you can check this for jailed process. Think about non-jailed processes that can do allocation below the jail root.
Processes outside the jail are not considered. I do not understand what relation these processes have to disk quotas for a jail. Please explain in more detail.
Historic UFS quotas are actually not interested in processes at all, really, except in as much as processes are where exception states are exposed. UFS quotas count blocks and inodes owned by users based on the 'uid' and 'gid' fields in the inode. There's no 'jailid' field, so quotas on this model can't capture the notion of per-jail quotas. In fact, quotacheck relies on being able to walk the file system looking only at file system data in order to establish initial usage accounting. You can imagine adding one, or managing the uid spaces across jails such that all uids are unique, etc, but all of these require some amount of rethinking. Or, some other model of quota.
Frankly, I've always been a fan of the AFS model, now accessible locally via ZFS, in which lightweight volumes with quota limits are used for individual user home directories, virtual machines, etc. This was hard to do in FreeBSD before ZFS because (a) UFS didn't want to resize trivially and (b) having lots and lots of mountpoints and file systems wasn't something we made administratively easy.
Robert N M Watson Computer Laboratory University of Cambridge
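To make the accounting model above concrete, here is a rough userland sketch, assuming fts(3) is available (it is on the BSDs and in glibc). It is not how quotacheck is implemented; the function name blocks_owned_by() is hypothetical. It only shows that usage can be derived from per-file ownership alone, with no knowledge of which process or jail did the allocating. Unlike real quota accounting, it counts a file with several hard links once per link.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: total the blocks (st_blocks units) of every file
 * under `root' whose owner is `uid'.  Only st_uid is consulted, which
 * is all the information the inode offers.  Returns -1 if the traversal
 * cannot be started.
 */
int64_t
blocks_owned_by(const char *root, uid_t uid)
{
	char *paths[2];
	FTS *fts;
	FTSENT *ent;
	int64_t total = 0;

	paths[0] = (char *)root;
	paths[1] = NULL;
	/* Do not follow symlinks, and stay on a single file system. */
	fts = fts_open(paths, FTS_PHYSICAL | FTS_XDEV, NULL);
	if (fts == NULL)
		return (-1);
	while ((ent = fts_read(fts)) != NULL) {
		switch (ent->fts_info) {
		case FTS_DP:	/* second (post-order) visit of a directory */
		case FTS_DNR:	/* unreadable directory, already seen once */
		case FTS_ERR:
		case FTS_NS:	/* no stat(2) information available */
			continue;
		default:
			break;
		}
		if (ent->fts_statp->st_uid == uid)
			total += (int64_t)ent->fts_statp->st_blocks;
	}
	(void)fts_close(fts);
	return (total);
}
```

A per-jail variant would need some extra ownership field to test in place of st_uid, which is exactly the information the inode does not carry.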
Re: compiling system binutils as cross tools
On Thu, 21 May 2009, xorquew...@googlemail.com wrote:
How do I compile the system binutils (contrib/binutils) as i386 -> x86_64 cross utils? That is, binutils that will run on an i386 host but will produce x86_64 binaries? I'm trying to produce a bootstrapping compiler for a port and need to get these working. I've spent a while reading Makefiles but would rather get information from someone who actually knows rather than waste *another* week on this stuff. I'd rather not compile the entire world if it can be avoided.
Not really my area, but if you haven't found make toolchain and make buildenv then you might want to take a look. Typically these will be combined with TARGET_ARCH=foo, and in your case foo is 'amd64'. The former builds the toolchain required for the architecture, and the latter creates a shell environment with paths appropriately munged and environments appropriately set to cross-compile using that chain. Normally the toolchain step is part of our integrated buildworld/buildkernel/etc process, but you can also use it for other things with buildenv.
Robert N M Watson Computer Laboratory University of Cambridge
Re: How to invalidate NFS read cache?
On Fri, 8 May 2009, Konrad Heuer wrote:
sporadically, I observe a strange but serious problem in our large NFS environment. NFS servers are Linux and OS X with StorNext/Xsan cluster filesystems, NFS clients Linux and FreeBSD. NFS client A changes a file, but nfs client B (running on FreeBSD) does still see the old version. On the NFS server itself, everything looks fine. Afaik the FreeBSD kernel invalidates the NFS read cache if file modification time on the server changed which should happen here but doesn't. Can I force FreeBSD (e.g. by sysctl setting) to read file buffers again unconditionally after vfs.nfs.access_cache_timeout seconds have passed?
Hi Konrad: Normally, NFS clients implement open-to-close consistency, which dictates that when a close() occurs on client A, all pending writes on the file should be issued to the server before close() returns, so that a signal to client B to open() the file can validate its cache before open() returns. This raises the following question: is client A closing the file, and is client B then opening it?
If not: relying on writes being visible on the client B before the close() on A and a fresh open() on B is not guaranteed to work, although we can discuss ways to improve behavior with respect to expectation. Try modifying your application and see if it gets the desired behavior, and then we can discuss ways to improve what you're seeing.
If you are: this is probably a bug in our caching and/or issuing of NFS RPCs. We cache both attribute and access data -- perhaps there is an open() path where we issue neither RPC? In the case of open, we likely should test for a valid access cache entry, and if there is one, issue an attribute read, and otherwise just issue an access check which will piggyback fresh attribute data on the reply. Perhaps there is a bug here somewhere.
A few other misc questions:
- Could you confirm you're using NFSv3 on all clients? Are there any special mount options in use?
- What version of FreeBSD are you running with? In FreeBSD 8.x, we now have DTrace probes for all of the above events -- VOPs, attribute cache hit/miss/load/flush, access cache hit/miss/load/flush, RPCs, etc, which we can use to debug the problem. I haven't yet MFC'd these to 7.x, but if you're able to run a very fresh 7-STABLE, I can probably produce a patch to add it for you in a few days.
Robert N M Watson Computer Laboratory University of Cambridge
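As a rough illustration of the ordering this message asks about (the writer closes the file, and only then does the reader open it), the two sides of an application might be shaped like the sketch below. The function names publish() and fetch() are hypothetical, the code contains nothing NFS-specific, and how client B learns that it is time to call fetch() is left to the application.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writer side (client A): pending writes are expected to have been
 * issued to the server only once close() has returned, so close() is
 * the "publish" point and its return value must be checked.
 */
int
publish(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);
	if (write(fd, msg, len) != (ssize_t)len) {
		(void)close(fd);
		return (-1);
	}
	return (close(fd));
}

/*
 * Reader side (client B): use a fresh open() for every look at the
 * file instead of keeping a descriptor around, since open() is where
 * the client gets the chance to revalidate its cache.
 */
ssize_t
fetch(const char *path, char *buf, size_t buflen)
{
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	n = read(fd, buf, buflen);
	(void)close(fd);
	return (n);
}
```

If client B instead keeps one descriptor open and re-reads it while client A is still writing, it falls into the "If not" case above and stale data is a permitted outcome.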
Re: How to invalidate NFS read cache?
On Tue, 12 May 2009, Robert Watson wrote:
Normally, NFS clients implement open-to-close consistency, which dictates that when a close() occurs on client A, all pending writes on the file should be issued to the server before close() returns, so that a signal to client B to open() the file can validate its cache before open() returns.
This should, of course, read close-to-open consistency -- I plead jetlag after an overnight flight back from Boston to the UK :-)
Robert N M Watson Computer Laboratory University of Cambridge
This raises the following question: is client A closing the file, and is client B then opening it?
If not: relying on writes being visible on the client B before the close() on A and a fresh open() on B is not guaranteed to work, although we can discuss ways to improve behavior with respect to expectation. Try modifying your application and see if it gets the desired behavior, and then we can discuss ways to improve what you're seeing.
If you are: this is probably a bug in our caching and/or issuing of NFS RPCs. We cache both attribute and access data -- perhaps there is an open() path where we issue neither RPC? In the case of open, we likely should test for a valid access cache entry, and if there is one, issue an attribute read, and otherwise just issue an access check which will piggyback fresh attribute data on the reply. Perhaps there is a bug here somewhere.
A few other misc questions:
- Could you confirm you're using NFSv3 on all clients? Are there any special mount options in use?
- What version of FreeBSD are you running with? In FreeBSD 8.x, we now have DTrace probes for all of the above events -- VOPs, attribute cache hit/miss/load/flush, access cache hit/miss/load/flush, RPCs, etc, which we can use to debug the problem. I haven't yet MFC'd these to 7.x, but if you're able to run a very fresh 7-STABLE, I can probably produce a patch to add it for you in a few days.
Robert N M Watson Computer Laboratory University of Cambridge
Re: NetBSD 5.0 statistics
On Thu, 30 Apr 2009, Oliver Pinter wrote:
Is FreeBSD's FS management so slow? http://www.netbsd.org/~ad/50/img15.html Or is the difference between the two CPU schedulers that big?
Also, there's a known and serious performance regression in CAM relating to tagged queueing, and the generic disk sort routine, introduced in 7.1, which will be fixed in 7.2. I can't speak more generally to the benchmarks -- we'll need to run them in a controlled environment and see if we can reproduce the results.
Robert N M Watson Computer Laboratory University of Cambridge
Re: FreeBSD memguard + spinlocks
On Sat, 11 Apr 2009, Andrew Brampton wrote:
I'm having a problem with memguard(9) on FreeBSD 7.1 but before I ask about that I just need to check my facts about malloc. When in interrupt context malloc must be called with M_NOWAIT, this is because I can't sleep inside an interrupt. Now when I hold a spinlock (MTX_SPIN) I am also not allowed to sleep or obtain a sleepable mutex (such as MTX_DEF). So I assume while holding a spin lock any mallocs I do must have the M_NOWAIT flag? This was not clear from the manual pages, but at least makes sense to me.
So my problem with memguard stems from the fact that I am locking a spinlock, and then I'm calling malloc with M_NOWAIT. But inside memguard_alloc a MTX_DEF is acquired causing WITNESS to panic. So I think fundamentally memguard is flawed and should be using MTX_SPIN instead of MTX_DEF otherwise it can't be called from inside an interrupt or when a spin lock is held. But maybe I'm missing something?
Also on a related note, I see that MTX_SPIN disables interrupts, making it a rather heavy spinlock. Is there a lighter spin lock that literally just spins? I read that MTX_DEF are far quicker to acquire, but surely a light spinlock would be easier to acquire than sleeping?
I think for the moment I will fix my code by not using a MTX_SPIN (since the code is not in an interrupt), however, I think memguard should change its lock.
Your understanding is mostly right. The missing bit is this: there are two kinds of interrupt contexts -- fast/filter interrupt handlers, which borrow the stack and execution context of the kernel thread they preempt, and interrupt threads, which get their own complete thread context. Fast interrupt handlers are allowed only to acquire spinlocks, so as to avoid deadlock because of the borrowed context.
This means they can't perform any sort of sleep, or acquire any locks that might sleep, since the thread they've preempted may hold conflicting locks, or be the one that would have woken up the sleep that the handler performed. Almost no code will run in fast handlers -- perhaps checking some device registers, doing work on a lockless or spinlock-protected queue, and waking up a worker thread. This is why, BTW, spin locks disable interrupts: they need to control preemption by other interrupt handlers to avoid deadlock, but they are not intended for use except when either in the scheduler, in a few related IPI contexts, or when synchronizing between normal kernel code and a fast handler. Full interrupt thread contexts are permitted to perform short lock sleeps, such as those performed when contending default mutexes, rwlocks, and rmlocks. They are permitted to invoke kernel services such as malloc(9), UMA(9), the network stack, etc, as long as they use M_NOWAIT and don't invoke msleep(9) or similar unbounded sleeps -- again to avoid the possibility of deadlocks, since you don't want an interrupt thread sleeping waiting for an event that only it can satisfy. So the first question, really, is whether you are, or mean to be, using a fast/filter interrupt handler. Device drivers will never call memory allocation, free, etc, from there, but will defer it to an ithread using the filter mechanism in 8.x, or to a task queue or other worker in 7.x and earlier. If you're using a regular INTR_MPSAFE ithread, you should be able to use only default mutexes (a single atomic operation if uncontended) without disabling interrupts, etc. Robert N M Watson Computer Laboratory University of Cambridge
Re: FreeBSD memguard + spinlocks
On Sat, 11 Apr 2009, Andrew Brampton wrote: Thanks very much for your detailed reply. I'm slowly understanding how everything in FreeBSD fits together, and I appreciate your help. I've been given a project to take over, and all of the design decisions were made before I started working on it, thus I'm playing catch up. One of the decisions was to implement their own version of a spin lock, which literally looks something like this: lock_aquire() { critical_enter(); while (! lockHeld ) {} lockHeld++; } This was actually the code tripping up MemGuard, as it is inside a critical section, which MemGuard is unable to sleep within. This is all running inside a kthread_create thread (I'm unsure of the proper name of this type of thread). Anyway, that is why I also asked about a lighter weight spin lock (perhaps similar to this one). I'm tempted to replace this custom spinlock with the standard MTX_DEF, however I'm unsure of its impact. The custom spin lock seems quick and light to acquire, and it does not concern me that an interrupt can potentially interrupt the code. On a related note, if you change the lock in memguard to a MTX_SPIN, it panics the kernel during boot. So that is not an option :) I was only using memguard because I suspected memory being used after it was freed. However, I think I will either change my locks to MTX_DEF or live without memguard. I realise I've not really asked any questions, but I would be grateful for any insights anyone may have. Andrew My advice, unless you're definitely executing code in fast interrupt contexts, is to simply use the FreeBSD default mutex primitive instead of either a custom-built spinlock or a FreeBSD MTX_SPIN mutex. The default mutex is adaptive, and will spin when contending the lock unless the thread holding the lock isn't executing, in which case it will fall back on a context switch.
Our mutexes also make correct use of memory barriers, which the above example code doesn't appear to, so they will work on systems that have weaker memory ordering properties. Using the default mutex scheme also allows you to take advantage of WITNESS, our lock order verifier, which proves a really useful tool when a lot of locks are in flight. The critical sections you're using above may not have the effect you intend: they prevent preemption by another thread, and they prevent migration to another CPU, but they don't prevent fast interrupt handlers from executing. Any synchronization with a fast interrupt handler needs to be done either using spinlocks, or other atomic operations. Robert N M Watson Computer Laboratory University of Cambridge
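To make the memory-barrier point concrete, here is a minimal userspace sketch of a spinlock built on C11 atomics. It is only an illustration of acquire/release ordering and atomic test-and-set, not how the kernel's mtx(9) is implemented; the names toy_spinlock, toy_trylock, toy_lock and toy_unlock are hypothetical and do not come from the thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace spinlock; a sketch, not FreeBSD's mtx(9). */
struct toy_spinlock {
    atomic_flag held;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was acquired. */
static bool
toy_trylock(struct toy_spinlock *l)
{
    /*
     * test_and_set atomically sets the flag and returns its previous
     * value: false means the lock was free and is now ours.  Acquire
     * ordering keeps the critical section from moving above this point.
     */
    return !atomic_flag_test_and_set_explicit(&l->held,
        memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void
toy_lock(struct toy_spinlock *l)
{
    while (!toy_trylock(l))
        ;   /* busy-wait */
}

/* Release ordering publishes the critical section's writes. */
static void
toy_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Unlike the quoted lock_aquire(), the test and the update here are a single atomic step, and the loop spins while the lock is held rather than while it is free. It still has none of the adaptive behaviour or WITNESS support of the default mutex, which is why the advice above remains to use the default mutex.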
Re: GSoC: Semantic File System
On Tue, 7 Apr 2009, Stephan Lichtenauer wrote: On 02.04.2009 at 19:26, Robert Watson wrote: In the BeOS model, or my reinterpretation based on something I read a long time ago and then presumably had dreams about, the split is a bit different: the file system maintains indexes of extended attributes, which are written by applications in order to expose searchable material. For example, a mail application might write out each message as a file, and attach a series of extended attributes, such as subject line, date, author, etc. These extended attributes are then indexed automatically by the file system in order to allow queries to be evaluated. I don't recall how queries and results are expressed, and in particular, whether the queries are processed by the file system (possibly exposed via special APIs or the name space) or userspace (accessing special files maintained by the kernel that are the indexes). It's also worth observing that one of the authors of BFS was Dominic Giampaolo, who now works on Apple's HFS+, and implemented fsevents there as part of their Spotlight project. Maybe you also might be interested that there is a PDF document (formerly a book) from Dominic available describing the BeOS file system in great detail: http://www.haiku-os.org/legacy-docs/practical-file-system-design.pdf Additionally, there seems to be a GSoC project to create something like Spotlight for Haiku, the open source BeOS clone. You could browse through the haiku-development mailing list archives at http://www.freelists.org/archive/haiku-development; the thread where this has been discussed is titled "Need Some GSoC Advice", with the first mail from 21 March. Actually, I have an original copy of the book on the bookshelf behind me. :-) Robert N M Watson Computer Laboratory University of Cambridge
Re: GSoC: Semantic File System
On Thu, 2 Apr 2009, Gabriele Modena wrote: On Sun, Mar 22, 2009 at 6:52 PM, Robert Watson rwat...@freebsd.org wrote: We are certainly not uninterested in projects along these lines, but I think the trick will be creating a convincing proposal that argues that (a) you can do the work in a summer, (b) there's a compelling usage case for including the results in FreeBSD, and (c) find a mentor who can supervise you in this project. Thanks, I will keep it in mind when writing the proposal. How do you suggest I proceed to find a mentor? By the way, this is a project that I'm very probably going to carry on even without GSoC support (even though that would be very useful). Well, I think the first step is to write the proposal, and we can see about shopping it around for a potential mentor. What sort of semantic file system do you have in mind? How would you feel about a middle-ground project along the lines of Mac OS X Spotlight or similar efficient userspace indexing of a file system based on feedback from the file system about what has changed, or something BeOS-like, in which indexing takes place for extended attributes rather than for contents? At the moment I am also considering a userspace approach similar to Spotlight/Beagles, but I don't know how I could propose this as a FreeBSD GSoC project. I think that would make a fine GSoC proposal. Keep in mind that one of the premises of Spotlight is the fsevents kernel feature, and fseventsd, which allow Spotlight to subscribe to changes in trees and kick off reindexing as required. Porting the fsevents API to FreeBSD is fairly straightforward, with one exception: HFS+ offers a much more reliable notion of vnode-path mapping, but it would be interesting to see how well our current vnode-path mapping mechanisms would suffice in practice (since a lot of the edge cases that don't work well with our mapping system are exactly that -- edge cases).
Between kernel and userspace parts there's quite a bit to do, but one possibility would be to borrow parts from Mac OS X/etc that we need. For example, do a literal port of the fsevents mechanism from XNU, provide our own implementation that provides a similar API, or provide a new mechanism that meets fseventsd's semantic requirements for monitoring. What I have in mind at the moment would be an indexing based on contents rather than extended fs attributes. I did not know about the BeOS semantics capabilities, I will surely have a look at that. I'm probably blending reality with imagination here, but my vague recollection is that the model was a slightly different blend of user vs. application involvement in indexing. For systems like Spotlight, there are no kernel-maintained indexes; the kernel simply provides a change list so that the userspace indexer can go through and apply file type-specific indexers to all files that have changed. So, for example, there are indexers for Word files, plain text files, PDFs, and so on. In the BeOS model, or my reinterpretation based on something I read a long time ago and then presumably had dreams about, the split is a bit different: the file system maintains indexes of extended attributes, which are written by applications in order to expose searchable material. For example, a mail application might write out each message as a file, and attach a series of extended attributes, such as subject line, date, author, etc. These extended attributes are then indexed automatically by the file system in order to allow queries to be evaluated. I don't recall how queries and results are expressed, and in particular, whether the queries are processed by the file system (possibly exposed via special APIs or the name space) or userspace (accessing special files maintained by the kernel that are the indexes).
It's also worth observing that one of the authors of BFS was Dominic Giampaolo, who now works on Apple's HFS+, and implemented fsevents there as part of their Spotlight project. Robert N M Watson Computer Laboratory University of Cambridge
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On Fri, 27 Mar 2009, Scott Long wrote: I've been talking about this for years. All I need is help with the VM magic to create the page on fork. I also want two pages, one global for gettimeofday (and any other global data we can think of) and one per-process for static data like getpid/getgid. FWIW, there are some variations in schemes across OS's -- one extreme is the Linux approach, which actually exports a mini shared library in ELF format on the shared page, providing implementations of various services (such as entering system calls), time stuff, etc. Less extreme are the shared pages offered on Mac OS X, etc. Robert N M Watson Computer Laboratory University of Cambridge Scott Sergey Babkin wrote: (Sorry for the top quoting). Probably the best implementation of gettimeofday() is to have a page in the kernel mapped read-only to all the user processes. Put the kernel's idea of time into this page. Then getting the time becomes a simple read (OK, two reads, to make sure that no update has happened in between). The TSC can then be used to add the precision between the ticks of the kernel timer: i.e. remember the value of TSC when the last tick happened, and the highest rate at which TSC may be ticking at this CPU, and export in the same page. This would guarantee that the time is not moving back. However there are more issues with TSC. TSC is guaranteed to have the same value on all the processors that share the same system bus. But if the machine is built of multiple buses with bridges between them, all bets are off. Each bus may be stopped, restarted and clocked separately. There is no way to tell on which CPU the process is currently running, and it may be rescheduled to a different CPU right before or after the RDTSC instruction.
-SB Mar 26, 2009 06:55:04 PM, ...@phk.freebsd.dk wrote: In message 17560ccf0903260551v1f5cba9eu8 7727c0bae7b...@mail.gmail.com, Prashant Vaibhav writes: The gettimeofday() function's implementation will then be changed to read the timestamp counter (TSC) from the processor, and use the reading in conjunction with the timing info exported by the kernel to calculate and return the time info in proper format. I take it as read, that you know that there are other relevant functions than gettimeofday() and that these must provide a monotonic timescale when queried interleaved? Be aware that the TSC may not be, and may not stay, synchronized across multiple cores. Furthermore, the TSC is not constant frequency and in particular not known frequency at all times. There are a lot of nasty cases to check, and a nasty interpolation required, which, in my tests some years back, totally negated any speedup from using the TSC in the first place. At the very minimum, you will have to add a quirk table where known good {CPU+MOBO+BIOS} combinations can be entered, as we find them. This will also pave way for optionally making the FreeBSD kernel tickless, Rubbish. Timecounters are not even closely associated with the tick or ticklessness of the kernel. [1] - The TSC frequency might change on certain processors with non-constant TSC rate (because of SpeedStep, dynamic freq scaling etc.). The only way to combat this is that the kernel be notified every time the processor frequency changes. Every cpu frequency driver will need to be updated to notify the kernel before and after a cpu freq change. That is not good enough, the bios may autonomously change the cpu speed and the skew from not knowing exactly _when_ and _how_ the cpu clock changed, is a significant number of microseconds, plenty of time to make strange things happen. You will want to study carefully Dave Mills work to tame the alpha chips wandering SAW clocks.
Poul-Henning [1] In my mind, reworking the callout system in the kernel would be a much better, more needed and much more worthwhile project. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 ...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On Fri, 27 Mar 2009, Scott Long wrote: I've been talking about this for years. All I need is help with the VM magic to create the page on fork. I also want two pages, one global for gettimeofday (and any other global data we can think of) and one per-process for static data like getpid/getgid. One note though -- the time to do the global page is at execve()-time. Robert N M Watson Computer Laboratory University of Cambridge
Re: Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On Fri, 27 Mar 2009, Sergey Babkin wrote: Would not a normal mmap be duplicated on fork? I'd do it as a small pseudo-driver that allows mmap of this page. Then libc would open this pseudo-device and mmap it, either in the on-load handler or on the first call of gettimeofday(). I think that should be it, no special magic necessary. The per-process is more difficult and would require the magic :-) Or maybe no magic as such: just mmap the file from the /proc filesystem. Then on fork in the child unmap this page, open the new file, and map it. vfork will still be tricky :-) It also means wasting an extra page per process. Part of the point of mapping in the page at execve()-time, or fork()-time for per-process pages (which I'm not entirely convinced we need yet) is to avoid the cost of an extra device open, mmap, etc, for every execve(), which can be quite expensive. I stuck a prototype page mapped from a special device exporting time information here a year or two ago: http://www.watson.org/~robert/freebsd/20080203-evilmem.diff http://www.watson.org/~robert/freebsd/evilmem_test.c This doesn't do TSC-based adjustment, just drops a timestamp in from the callout wheel, but was intended to allow Kris to do a bit of comparative benchmarking and decide if it might be a viable approach to invest further work in. Obviously, the above code should never, ever, get near a production kernel, since it was a 2-hour hack for experimental purposes. I think the right way forward is to prototype: map the page in at execve()-time in the kernel and pass the address to rtld via elf auxiliary arguments, and have rtld link it (via some or another means), exposing symbols or code or whatever, to libc. If someone wants to make it a dynamic shared object in ELF-speak, then I'm all for that as it would minimize the work rtld had to do. I guess interesting questions are whether (a) it would be desirable to have per-page, per-cpu, or per-thread mappings.
If there are non-synchronized TSCs, then there might be some interesting advantages to a per-CPU page. Robert N M Watson Computer Laboratory University of Cambridge -SB Mar 27, 2009 12:51:56 PM, sco...@samsco.org wrote: I've been talking about this for years. All I need is help with the VM magic to create the page on fork. I also want two pages, one global for gettimeofday (and any other global data we can think of) and one per-process for static data like getpid/getgid. Scott Sergey Babkin wrote: (Sorry for the top quoting). Probably the best implementation of gettimeofday() is to have a page in the kernel mapped read-only to all the user processes. Put the kernel's idea of time into this page. Then getting the time becomes a simple read (OK, two reads, to make sure that no update has happened in between).
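The "two reads, to make sure that no update has happened in between" idea in the quoted text can be sketched with a generation counter that the writer makes odd while it is updating. The following is a userspace simulation under stated assumptions: the layout of struct shared_time and the function names are hypothetical, and nothing here reflects the evilmem prototype linked above or any real kernel interface.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of the read-only page the kernel would export. */
struct shared_time {
    atomic_uint gen;              /* odd while an update is in progress */
    atomic_uint_fast64_t sec;
    atomic_uint_fast64_t nsec;
};

/* Writer side: in the real design this would run in the kernel. */
static void
shared_time_publish(struct shared_time *st, uint64_t sec, uint64_t nsec)
{
    unsigned g = atomic_load_explicit(&st->gen, memory_order_relaxed);

    /* Mark the update as in progress (generation becomes odd). */
    atomic_store_explicit(&st->gen, g + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&st->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&st->nsec, nsec, memory_order_relaxed);
    /* Publish: generation is even again and two higher than before. */
    atomic_store_explicit(&st->gen, g + 2, memory_order_release);
}

/* Reader side: no system call, just loads, retried if a write raced. */
static void
shared_time_read(struct shared_time *st, uint64_t *sec, uint64_t *nsec)
{
    unsigned g0, g1;

    do {
        g0 = atomic_load_explicit(&st->gen, memory_order_acquire);
        *sec = atomic_load_explicit(&st->sec, memory_order_relaxed);
        *nsec = atomic_load_explicit(&st->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        g1 = atomic_load_explicit(&st->gen, memory_order_relaxed);
        /* Retry if an update was in progress or completed meanwhile. */
    } while ((g0 & 1) != 0 || g0 != g1);
}
```

The reader never writes to the page, which is what allows it to be mapped read-only into every process. TSC-based interpolation between updates, as discussed in the thread, would be layered on top of this and is deliberately left out.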
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On Fri, 27 Mar 2009, Poul-Henning Kamp wrote: In message alpine.bsf.2.00.0903272254460.12...@fledge.watson.org, Robert Watson writes: I guess interesting questions are whether (a) it would be desirable to have per-page, per-cpu, or per-thread mappings. If there are non-synchronized TSCs, then there might be some interesting advantages to a per-CPU page. Rule #3: The only thing worse than generalizing from one example is generalizing from no examples at all. We can add those mappings when we know why we would want them. If we believe TSCs won't be synchronized, and don't want to synchronize them ourselves, then we'll need different mapping state to get from a TSC stamp to a time on different CPUs. In which case user application threads will need to know their CPU in order to use the right conversion data (ideally without a system call, since that's part of what we're avoiding here), or use a per-CPU mapping and not know (in which case they'll need to detect and handle the very rare race of being preempted and migrated between reading the TSC and reading the conversion data). I'm not pushing a per-CPU page, but there would be some interesting advantages to supporting that. Robert N M Watson Computer Laboratory University of Cambridge
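One way to picture the "preempted and migrated between read TSC and read conversion data" race is the retry loop below. It is a simulation only: struct tsc_conv, struct sim_platform, and the sim_* helpers are hypothetical stand-ins for per-CPU conversion data, "which CPU am I on", and RDTSC, none of which are defined in the thread.

```c
#include <stdint.h>

#define NCPU 2

/* Hypothetical per-CPU conversion record exported by the kernel. */
struct tsc_conv {
    uint64_t base_tsc;     /* TSC value at the last kernel tick */
    uint64_t base_ns;      /* kernel time at that tick, in ns */
    uint64_t ns_per_tick;  /* simplification: whole ns per TSC tick */
};

/* Simulated platform hooks (assumptions, not real APIs). */
struct sim_platform {
    int (*current_cpu)(void *ctx);
    uint64_t (*read_tsc)(void *ctx);
    void *ctx;
};

/*
 * Read the CPU number, then the TSC, then the CPU number again.  If the
 * two CPU numbers differ, the thread migrated in between and the TSC
 * sample cannot be trusted against either CPU's data, so try again.
 */
static uint64_t
percpu_time_ns(const struct tsc_conv conv[NCPU],
    const struct sim_platform *p, int *retries)
{
    for (;;) {
        int cpu_before = p->current_cpu(p->ctx);
        uint64_t tsc = p->read_tsc(p->ctx);
        int cpu_after = p->current_cpu(p->ctx);

        if (cpu_before == cpu_after) {
            const struct tsc_conv *c = &conv[cpu_before];
            return c->base_ns + (tsc - c->base_tsc) * c->ns_per_tick;
        }
        (*retries)++;
    }
}

/* Simulation: the first CPU query reports CPU 0, later ones CPU 1. */
struct sim_state {
    int calls;
};

static int
sim_cpu(void *ctx)
{
    struct sim_state *s = ctx;
    return s->calls++ == 0 ? 0 : 1;
}

static uint64_t
sim_tsc(void *ctx)
{
    (void)ctx;
    return 1500;
}
```

The before/after comparison does not catch a thread that migrates away and back between the two CPU reads, so a real design would presumably need something stronger, such as a per-CPU generation count; that part is outside this sketch.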
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On Fri, 27 Mar 2009, Poul-Henning Kamp wrote: In message alpine.bsf.2.00.0903272303040.12...@fledge.watson.org, Robert Watson writes: In which case user application threads will need to know their CPU [...] Didn't jemalloc solve that problem once already? I think jemalloc implements thread-affinity for arenas rather than CPU-affinity in the strict sense, but I may misread. Robert N M Watson Computer Laboratory University of Cambridge
Re: does Copyright on source files expire ?
On Thu, 26 Mar 2009, ttw+...@cobbled.net wrote: On 25.03-05:31, David Schultz wrote: [ ... ] A person's Copyright doesn't go away just because they die, disappear, or fail to respond. If you can't contact them, their heirs, or whomever they transferred the Copyright to, you're stuck. yeah but it's a little like finding something. if they're not about and not reachable there isn't much they can do to stop you using it. if they pop up and make demands later then you get to choose between re-writes and haggling (twenty shekels is standard). In some countries, such as the US, copyright violation can be a criminal, not just civil, matter. Also, in countries where copyright can be assigned, the holder listed in a file may not accurately represent who the current holder is, so while the original author may be unreachable, etc, the current holder may be alive and kicking. Robert N M Watson Computer Laboratory University of Cambridge point is you can use it, the actual copyright owner needs to sue you; not like saying jehovah which may result in action by the agents of the state. n.b: using the above opinion may get you crucified.
Re: 2 uni-directional TCP connection good
On Sun, 22 Mar 2009, Yoshihiro Ota wrote: On Fri, 20 Mar 2009, Yoshihiro Ota wrote: 1. With TCP connections, only the sender side can passively detect some communication issues if they happen. By using two connections, you lose that ability yourself. I agree on this one. Could you expand a bit on this point? While the connection creation process is (usually) asymmetric, once the connection is built it's essentially the same state machine on both sides of the connection, and socket semantics with respect to the state machine are effectively identical. Applications on both sides should be able to detect disconnect, monitor connection state using TCP_INFO, etc. What I meant was that there were cases when a receiver could not tell whether no data was coming or communication was interrupted. Once a connection is established, a route is available between a server and a client. Let's say this route is broken for some reason, i.e. someone unplugged a cable or a firewall started dropping or rejecting traffic between the server and client; a sender may not notice as soon as it happens, but at least a sender knows a message was not delivered. On the other hand, the receiver side does not have any idea that a message delivery failure has happened at all, or for a while, unless using heartbeat messages in an upper layer. The KEEP_ALIVE option seems to be implementation dependent, such that you cannot assure TCP connection availability for every minute. This is generally considered a robustness property rather than a fragility issue, but yes: if you need a liveness property for idle connections with TCP, it's something you have to implement at the application layer, and many protocols indeed do this. I don't see that this is an argument for using two TCP connections as opposed to one, however. If you're interested in alternative protocols, however, SCTP allows a number of these protocol behaviors to be modified, and includes support for a heartbeat.
Robert N M Watson Computer Laboratory University of Cambridge
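As a rough illustration of implementing liveness at the application layer, the sketch below tracks when data was last sent and received and answers two questions: is it time to send a heartbeat, and has the peer been silent for too long. All names (hb_state and the hb_* functions) and the millisecond units are assumptions made for the example; the wire format of the heartbeat itself is left to the application protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection liveness tracker; times in milliseconds. */
struct hb_state {
    uint64_t last_rx_ms;   /* when anything was last received from peer */
    uint64_t last_tx_ms;   /* when anything was last sent to the peer */
    uint64_t interval_ms;  /* send a heartbeat after this much tx idle */
    uint64_t timeout_ms;   /* declare the peer dead after this rx silence */
};

/* Call whenever any data (heartbeat or payload) arrives. */
static void
hb_note_rx(struct hb_state *hb, uint64_t now_ms)
{
    hb->last_rx_ms = now_ms;
}

/* Call whenever any data (heartbeat or payload) is sent. */
static void
hb_note_tx(struct hb_state *hb, uint64_t now_ms)
{
    hb->last_tx_ms = now_ms;
}

/* True if the connection has been send-idle for a full interval. */
static bool
hb_should_send(const struct hb_state *hb, uint64_t now_ms)
{
    return now_ms - hb->last_tx_ms >= hb->interval_ms;
}

/* True if nothing has been heard from the peer within the timeout. */
static bool
hb_peer_dead(const struct hb_state *hb, uint64_t now_ms)
{
    return now_ms - hb->last_rx_ms >= hb->timeout_ms;
}
```

Because both ends run the same logic over a single connection, the receiver gets the same failure detection as the sender, which is the point made above about not needing a second TCP connection.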
Re: GSoC: Semantic File System
On Sat, 21 Mar 2009, Gabriele Modena wrote: I am an AI master's student at the University of Amsterdam. One of my current research interests lies in the area of information retrieval, and I would like to do a project within my university research group starting next June. I am currently studying background literature about semantic filesystems and information retrieval over local files. Being also quite interested in kernel development, I would like to propose a proof of concept that implements such techniques. My goal, though, would not be just a reimplementation of existing code, but possibly some more extensive work that combines techniques already used in other domains of II. Could this be an interesting Summer of Code proposal for the FreeBSD Foundation? I plan to write down some notes/ideas (and details) I have on a wiki starting from next week. Hi Gabriele-- We are certainly not uninterested in projects along these lines, but I think the trick will be creating a convincing proposal that argues that (a) you can do the work in a summer, (b) there's a compelling usage case for including the results in FreeBSD, and (c) find a mentor who can supervise you in this project. What sort of semantic file system do you have in mind? How would you feel about a middle-ground project along the lines of Mac OS X Spotlight or similar efficient userspace indexing of a file system based on feedback from the file system about what has changed, or something BeOS-like, in which indexing takes place for extended attributes rather than for contents? Robert N M Watson Computer Laboratory University of Cambridge
Re: 2 uni-directional TCP connection good?
On Fri, 20 Mar 2009, Yoshihiro Ota wrote: 1. With TCP connections, only the sender side can passively detect some communication issues if they happen. By using two connections, you lose that ability yourself. I agree on this one. Could you expand a bit on this point? While the connection creation process is (usually) asymmetric, once the connection is built it's essentially the same state machine on both sides of the connection, and socket semantics with respect to the state machine are effectively identical. Applications on both sides should be able to detect disconnect, monitor connection state using TCP_INFO, etc. Robert N M Watson Computer Laboratory University of Cambridge
Re: SA add notification to external module
On Tue, 17 Mar 2009, srikanth jampala wrote: This is my first posting. I want notifications about SA (security association) add/delete events from the kernel to my external kernel module. How can I do this? Thanks in advance for your suggestions. I'm not sure if PF_KEY has an async notification event, but in principle you could consume those inside the kernel, not just in a user application. Alternatively, you might reasonably submit a patch to add an EVENTHANDLER(9) event at the right points in the kernel code so that future versions of FreeBSD will allow your code to plug in more easily. We already provide event handler hooks for things like process fork/exit, arrival/departure of network interfaces, etc. The trick is to place them at the right points so that appropriate locks are held, and you'll want to avoid having your handler code change the semantics of the calling site (i.e., don't sleep if that's not allowed). Robert N M Watson Computer Laboratory University of Cambridge