Re: CPU report in first line of vmstat 1 is meaningless
:
: On a related note I'm not sure if it makes sense to have the same
: behaviour for the first line when an interval is set as when it is
: invoked with no interval.
:
:...also vmstat seems to exist in a few other OSes (linux e.g). maybe they've
:fixed it already (or the netbsd/openbsd/dragonflybsd folks or apple?).
:
:cheers.
:alex

    No, we haven't.  I think it is meaningless too, and I can't imagine
    any script that would try to snarf that line.

    Another problem is that vmstat output in general is blowing out its
    column formatting.  We have made some changes in DFly... sub-second
    intervals can be specified, and I think we enforce at least one space
    between fields so the output doesn't become totally unreadable.  But
    the blowouts still remain.

                                        -Matt

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Examining the VM splay tree effectiveness
    I don't remember the reference but I read a comprehensive comparison
    between various indexing methods about a year ago and the splay tree
    did considerably better than an RB-tree.  The RB-tree actually did
    fairly poorly.

    Any binary tree-like structure makes fairly poor use of cpu caches.
    Splay trees work somewhat better as long as there is some locality
    of reference in the lookups, since the node being looked up is moved
    to the root.  It isn't a bad trade-off.

    On the negative side all binary-tree-like structures tend to be
    difficult to MP-lock in a fine-grained manner (not that very many
    structures are locked at that fine a grain anyway, but it's a data
    point).  Splay trees are impossible to lock at a fine grain due to
    the massive modifications made on any search and the movement
    of the root, and there are MP access issues too.

    --

    What turned out to be the best indexing mechanism was a chained
    hash table whose hoppers were small linear arrays instead of single
    elements.  So instead of pointer-chaining each element you have a
    small for() loop over 4-8 elements before you chain.  The structure
    being indexed would NOT be integrated into the index directly; the
    index would point at the final structure from the hopper.

    For our purposes such linear arrays would contain a pointer and
    an indexing value in as small an element as possible (8-16 bytes),
    the idea being that you make the absolute best use of your cache line
    and L1 cache / memory burst.  One random access (initial hash index),
    then linear accesses using a small indexing element, then one final
    random access to get to the result structure and validate that
    it's the one desired (at that point we'd be 99.9% sure that we have
    the right structure because we have already compared the index value
    stored in the hopper).  As a plus the initial hash index also makes
    MP locking the base of the chains easier.

    I don't use arrayized chained hash tables in DFly either, but only
    because it's stayed fairly low on the priority list.  cpu isn't really
    a major issue on systems these days.  I/O is the bigger problem.
    RB-Trees are simply extremely convenient from the programming side,
    which is why we still use them.

    I was surprised that splay trees did so well vs RB-trees.  I never
    liked the memory writes splay trees do, let alone the impossibility
    of fine-grained locking.  Originally I thought the writes would make
    performance worse, but they actually don't.  Still, if I were to
    change any topologies now I would definitely go with the
    chained-hash / small-linear-array / chain / small-linear-array / chain
    mechanic.  It seems to be the clear winner.

                                        -Matt
                                        Matthew Dillon
                                        dil...@backplane.com

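The chained-hash / small-linear-array layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from DragonFly or FreeBSD: the names (struct object, struct slot, struct hopper, table_insert, table_lookup), the table size, the 4-slot hopper, and the hash function are all made up for the example. Each slot holds an indexing value plus a pointer (16 bytes on a 64-bit machine), and the indexed structure lives outside the index.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HOPPER_SLOTS 4          /* 4-8 elements per hopper, per the text */
#define TABLE_SIZE   64         /* number of hash chains (assumed) */

struct object {                 /* the structure being indexed (hypothetical) */
    uint64_t key;
    int      value;
};

struct slot {                   /* small index element: key + pointer */
    uint64_t       key;
    struct object *obj;
};

struct hopper {
    struct slot    slots[HOPPER_SLOTS];
    int            used;
    struct hopper *next;        /* chain only once the array is full */
};

struct table {
    struct hopper *chains[TABLE_SIZE];
};

static size_t
hash_key(uint64_t key)
{
    /* arbitrary multiplicative hash, chosen only for the sketch */
    return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) % TABLE_SIZE;
}

/* Insert obj into the index; returns 0 on success, -1 if out of memory. */
int
table_insert(struct table *t, struct object *obj)
{
    size_t i = hash_key(obj->key);
    struct hopper *h = t->chains[i];

    if (h == NULL || h->used == HOPPER_SLOTS) {
        struct hopper *n = calloc(1, sizeof(*n));

        if (n == NULL)
            return -1;
        n->next = h;            /* new hopper becomes the chain head */
        t->chains[i] = n;
        h = n;
    }
    h->slots[h->used].key = obj->key;
    h->slots[h->used].obj = obj;
    h->used++;
    return 0;
}

/* One random access (hash), linear scans, one final access to validate. */
struct object *
table_lookup(const struct table *t, uint64_t key)
{
    const struct hopper *h;
    int i;

    for (h = t->chains[hash_key(key)]; h != NULL; h = h->next) {
        for (i = 0; i < h->used; i++) {
            if (h->slots[i].key == key && h->slots[i].obj->key == key)
                return h->slots[i].obj;
        }
    }
    return NULL;
}
```

A caller would zero a struct table, insert pointers to its own objects, and look them up by key. Locking is left out; in the scheme described above a lock per chain base would be the natural place to add it.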
Re: Increasing MAXPHYS
:The whole point of the discussion, sans PHK's interlude, is to reduce
:the context switches and indirection, not to increase it.  But if you
:can show decreased latency/higher-iops benefits of increasing it, more
:power to you.  I would think that the results of DFly's experiment with
:parallelism-via-more-queues would serve as a good warning, though.
:
:Scott

    Well, I'm not sure what experiment you are referring to but I'll assume
    it's the network threading, which works quite well actually.  The
    protocol threads can be matched against the toeplitz function and
    in that case the entire packet stream operates lockless.  Even without
    the matching we still get good benefits from batching (e.g. via
    ether_input_chain()) which drops the IPI and per-packet switch
    overhead basically to zero.  We have other issues but the protocol
    threads aren't one of them.

    In any case, the lesson to learn with batching to a thread is that you
    don't want the thread to immediately preempt the sender (if it happens
    to be on the same cpu), or to generate an instant IPI (if going between
    cpus).  This creates a degenerate case where you wind up with a
    thread switch on each message or an excessive messaging interrupt
    rate... THAT is what seriously screws up performance.  The key is to
    be able to batch multiple messages per thread switch when under load
    and to be able to maintain a pipeline.

    A single user-process test case will always have a bit more latency
    and can wind up being inefficient for a variety of other reasons
    (e.g. whether the target thread is on the same cpu or not), but that
    becomes less relevant when the machine is under load so it's a
    self-correcting problem for the most part.

    Once the machine is under load batching becomes highly efficient.
    That is, latency != cpu cycle cost under load.  When the threads
    have enough work to do they can pick up the next message without the
    cost of entering a sleep state or needing a wakeup (or needing to
    generate an actual IPI interrupt, etc).  Plus you can run lockless
    and you get excellent cache locality.  So as long as you ensure these
    optimal operations become the norm under load you win.

    Getting the threads to pipeline properly and avoid unnecessary
    tsleeps and wakeups is the hard part.

    --

    But with regard to geom, I'd have to agree with you.  You don't want
    to pipeline a single N-stage request through N threads.  One thread,
    sure... that can be batched to reduce overhead.  N stages through
    N threads just creates unnecessary latency, complicates your ability
    to maintain a pipeline, and has a multiplicative effect on thread
    activity that negates the advantage of having multiple cpus (and
    destroys cache locality as well).

    You could possibly use a different trick at least for some of the
    simpler transformations, and that is to replicate the control structures
    on a per-cpu basis.  If you replicate the control structures on a
    per-cpu basis then you can parallelize independent operations running
    through the same set of devices and remove the bottlenecks.  The
    set of transformations for a single BIO would be able to run lockless
    within a single thread and the control system as a whole would have
    one thread per cpu.  (Of course, a RAID layer would require some
    rendezvous to deal with contention/conflicts, but that's easily
    dealt with).

    That would be my suggestion.

    We use that trick for our route tables in DFly, and also for listen
    socket PCBs to remove choke points, and a few other things like
    statistics gathering.

                                        -Matt
                                        Matthew Dillon
                                        dil...@backplane.com

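As a rough illustration of the per-cpu replication idea mentioned above for statistics gathering, here is a minimal sketch. It is not DragonFly's implementation; NCPU, the cache line size, and the function names are assumptions made for the example. Each cpu updates only its own replica, so the update path needs no lock, and a reader sums the replicas when a total is wanted.

```c
#include <stdint.h>

#define NCPU       4            /* assumed cpu count for the sketch */
#define CACHE_LINE 64           /* assumed cache line size in bytes */

struct percpu_stats {
    uint64_t packets;
    uint64_t bytes;
    /* pad each replica to its own cache line to avoid false sharing */
    char     pad[CACHE_LINE - 2 * sizeof(uint64_t)];
};

static struct percpu_stats stats[NCPU];

/*
 * Called only by the thread bound to `cpu`.  Since no other cpu writes
 * this replica, no lock or atomic operation is used here.
 */
void
stats_account(int cpu, uint64_t nbytes)
{
    stats[cpu].packets++;
    stats[cpu].bytes += nbytes;
}

/*
 * Sum all replicas.  On a live system the result would be a slightly
 * stale snapshot, which is usually acceptable for statistics.
 */
void
stats_sum(uint64_t *packets, uint64_t *bytes)
{
    int cpu;

    *packets = 0;
    *bytes = 0;
    for (cpu = 0; cpu < NCPU; cpu++) {
        *packets += stats[cpu].packets;
        *bytes += stats[cpu].bytes;
    }
}
```

The same shape (one replica per cpu, written by a single owner, aggregated on demand) is what makes the update path lockless; structures that need cross-cpu agreement, such as the RAID rendezvous mentioned above, would need more than this.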
Re: Increasing MAXPHYS
:All above I have successfully tested last months with MAXPHYS of 1MB on
:i386 and amd64 platforms.
:
:So my questions are:
:- does somebody know any issues denying increasing MAXPHYS in HEAD?
:- are there any specific opinions about value? 512K, 1MB, MD?
:
:--
:Alexander Motin

    (nswbuf * MAXPHYS) of KVM is reserved for pbufs, so on i386 you
    might hit up against KVM exhaustion issues in unrelated subsystems.
    nswbuf typically maxes out at around 256.  For i386 1MB is probably
    too large (256M of reserved KVM is a lot for i386).  On amd64 there
    shouldn't be a problem.

    Diminishing returns get hit pretty quickly with larger MAXPHYS values.
    As long as the I/O can be pipelined the reduced transaction rate
    becomes less interesting when the transaction rate is less than a
    certain level.  Off the cuff I'd say 2000 tps is a good basis for
    considering whether it is an issue or not.  256K is actually quite
    a reasonable value.  Even 128K is reasonable.

    Nearly all the issues I've come up against in the last few years have
    been related more to pipeline algorithms breaking down and less with
    I/O size.  The cluster_read() code is especially vulnerable to
    algorithmic breakdowns when fast media (such as an SSD) is involved.
    e.g. I/Os queued from the previous cluster op can create stall
    conditions in subsequent cluster ops before they can issue new I/Os
    to keep the pipeline hot.

                                        -Matt
                                        Matthew Dillon
                                        dil...@backplane.com

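The (nswbuf * MAXPHYS) reservation above is easy to work through numerically. The helper names below are made up for illustration, and the 1 GB kernel address space used in the usage note is an assumption about a default i386 configuration rather than something computed from a running system.

```c
#include <stdint.h>

/* KVM reserved for pbufs, following the (nswbuf * MAXPHYS) rule above. */
uint64_t
pbuf_kvm_bytes(uint64_t nswbuf, uint64_t maxphys)
{
    return nswbuf * maxphys;
}

/* The reservation as a whole-number percentage of total KVM (rounded down). */
unsigned
pbuf_kvm_percent(uint64_t nswbuf, uint64_t maxphys, uint64_t kvm_bytes)
{
    return (unsigned)(pbuf_kvm_bytes(nswbuf, maxphys) * 100 / kvm_bytes);
}
```

With nswbuf at 256 and an assumed 1 GB of KVM, a 1 MB MAXPHYS reserves 256 MB (25% of KVM), while 256K reserves 64 MB (6%) and 128K reserves 32 MB (3%), which is in line with the view that 1 MB is probably too large for i386.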
Re: Increasing MAXPHYS
:Pardon my ignorance, but wouldn't so much KVM make small embedded
:devices like Soekris boards with 128 MB of physical RAM totally unusable
:then? On my net4801, running RELENG_8:
:
:vm.kmem_size: 40878080
:
:hw.physmem: 125272064
:hw.usermen: 84840448
:hw.realmem: 134217728

    KVM != physical memory.  On i386 by default the kernel has 1G of KVM
    and userland has 3G.  While the partition can be moved to increase
    available KVM on i386 (e.g. 2G/2G), it isn't recommended.

    So the KVM reserved for various things does not generally impact
    physical memory use.

    The number of swap buffers (nswbuf) is scaled to 1/4 nbufs with a
    maximum of 256.  Systems with small amounts of memory should not be
    impacted.

    The issue with regard to KVM problems on i386 is mostly restricted to
    systems with 2G+ of ram where the kernel's various internal parameters
    are scaled to their maximum values or limits.  On systems with less
    ram the kernel's internal parameters are usually scaled down
    sufficiently that there is very little chance of the kernel running
    out of KVM.

                                        -Matt

Re: 5.2-RELEASE TODO
: This argument is exactly why I added the 'disable acpi' option in the boot
: loader menu.  Of course, we STILL need to get good debugging information
: from you as to why you get a Trap 9 when ACPI is disabled.  This is the
: more important issue.
:
:This is actually a known issue on Intel motherboards.  Somehow we broke
:something in our bios32 code.  4.x works fine using the BIOS to enumerate
:PNP BIOS devices, but 5.x (including 5.0 and 5.1) get a GPF (Trap 9)
:with the code segment set to 0x58 trying to enumerate the last PNPBIOS
:device.  Somehow the BIOS routine jumps off into lala land where it
:eventually traps.  I dumped the BIOS and dissassembled and tried to walk
:it to figure out how it was breaking but couldn't see anything obvious.
:
:--
:
:John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/

    I have a motherboard that even 4.x doesn't work on... same deal, GPF
    while enumerating PNP devices.  It died on the 6th device or something
    like that.

    I went so far as to 'fix' DDB's disassembler (in DFly) to properly
    decode the code segment (instead of double faulting on a 'bad' EIP in
    DDB) and properly show 16 bit instructions, and I set break points and
    single-stepped through.

    It died in the same place every time, during an attempt to issue a
    write to a CS: prefixed memory address.  But I suspect a bad branch or
    indirect-jump table lookup.

    I finally gave up but if I were to do it again I would attach a serial
    console and record the single-step session all the way through two
    device number iterations... the one prior to the one that failed, and
    the one that failed, then I'd compare the output for the successful
    device iteration versus the failure to figure out where they diverge.

    That's what I suggest... don't play around, just connect the serial
    console, fix DDB's disassembler (you can pop the changes that I made
    in DFly, they should be the same), single step through two iterations,
    and compare.

    If you figure out what is causing the problem I'd love an email.  I
    suspect it is a mapping overlap somewhere due to the page 0 map.  I
    just can't think of anything else it might be.

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]

bug in CD9660 path handling in libstand (affects boot code)
    This fixes a bug where an attempt to load a file path which contains
    an intermediate path element which is a file on the CD rather than a
    directory results in the file being accessed like a directory... which
    can lock up the boot code.

    The fix is simple.  Check to see if an intermediate path element really
    represents a directory and return ENOENT if it doesn't.

    This situation can occur due to searches via the module search path.

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]

Index: cd9660.c
===================================================================
RCS file: /cvs/src/lib/libstand/cd9660.c,v
retrieving revision 1.3
diff -u -r1.3 cd9660.c
--- cd9660.c	8 Aug 2003 04:18:34 -0000	1.3
+++ cd9660.c	1 Dec 2003 08:37:25 -0000
@@ -372,7 +372,13 @@
 		rec = *dp;
 		while (*path && *path != '/') /* look for next component */
 			path++;
-		if (*path) path++; /* skip '/' */
+		if (*path == '/') {		/* skip /, make sure is dir */
+			path++;
+			if (*path && (isonum_711(dp->flags) & 2) == 0) {
+				rc = ENOENT;	/* not directory */
+				goto out;
+			}
+		}
 	}
 
 	/* allocate file system specific data structure */

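The check added by the patch can be seen in isolation in the following sketch, which walks a path over a small in-memory tree instead of ISO 9660 directory records. Everything here (struct node, find_child, lookup_path) is hypothetical and exists only to show the rule: when more path follows a component, that component must be a directory, otherwise the lookup returns ENOENT instead of descending into a file.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct node {                       /* hypothetical in-memory directory entry */
    const char  *name;
    int          is_dir;
    struct node *children;          /* array terminated by name == NULL */
};

/* Find the child of dir whose name matches the len bytes at name. */
static struct node *
find_child(struct node *dir, const char *name, size_t len)
{
    struct node *n;

    for (n = dir->children; n != NULL && n->name != NULL; n++) {
        if (strlen(n->name) == len && memcmp(n->name, name, len) == 0)
            return n;
    }
    return NULL;
}

/* Resolve path under root; returns 0 and sets *out, or an errno value. */
int
lookup_path(struct node *root, const char *path, struct node **out)
{
    struct node *cur = root;

    while (*path == '/')
        path++;
    while (*path != '\0') {
        const char *start = path;
        struct node *next;

        while (*path != '\0' && *path != '/')   /* look for next component */
            path++;
        next = find_child(cur, start, (size_t)(path - start));
        if (next == NULL)
            return ENOENT;
        if (*path == '/') {                     /* skip '/', make sure is dir */
            while (*path == '/')
                path++;
            if (*path != '\0' && !next->is_dir)
                return ENOENT;                  /* not a directory */
        }
        cur = next;
    }
    *out = cur;
    return 0;
}
```

With a tree holding a file "kernel" and a directory "modules" containing "foo.ko", looking up "/modules/foo.ko" succeeds while "/kernel/foo.ko" returns ENOENT, which is the module-search-path case the fix is aimed at.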
Re: fork speed vs /bin/sh
:What this shows is that vfork() is 3 times faster than fork() on static
:binaries, and 9 times faster on dynamic binaries.  If people are
:worried about a 40% slowdown, then perhaps they'd like to investigate
:a speedup that works no matter whether its static or dynamic?  There is
:a reason that popen(3) uses vfork().  /bin/sh should too, regardless of
:whether its dynamic or static.  csh/tcsh already uses vfork() for the
:same reason.
:
:NetBSD have already taken advantage of this speedup and their /bin/sh uses
:vfork().  Some enterprising individual who cares about /bin/sh speed should
:check out that.  Start looking near #ifdef DO_SHAREDVFORK.

    That isn't really a fair comparison because your vfork is hitting a
    degenerate case and isn't actually doing anything significant.  You
    really need to exec() something.  I've included a program below that
    [v]fork/execs "./sh -c exit 0" 5000 times.

    Dell2550, 2xCPU (MP build), DFly
    0.000u  4.095s 0:02.53 161.6%  154+107k 0+0io 0pf+0w  VFORK/EXEC STATIC SH
    0.000u  6.681s 0:04.04 165.3%   94+97k  0+0io 0pf+0w  FORK/EXEC STATIC SH
    0.500u 16.844s 0:16.34 106.1%   53+84k  0+0io 0pf+0w  VFORK/EXEC DYNAMIC SH
    0.093u 18.303s 0:23.86  77.0%   42+79k  0+0io 0pf+0w  FORK/EXEC DYNAMIC SH

    Athlon64, 2xCPU (UP), DFly
    0.078u 0.687s 0:00.74 101.3%  399+226k 0+0io 0pf+0w  VFORK/EXEC STATIC SH
    0.117u 0.968s 0:01.07 100.0%  273+208k 0+0io 0pf+0w  FORK/EXEC STATIC SH
    2.218u 2.484s 0:04.71  99.5%  121+180k 0+0io 1pf+0w  VFORK/EXEC DYNAMIC SH
    2.281u 2.773s 0:04.98 101.4%  113+179k 0+0io 0pf+0w  FORK/EXEC DYNAMIC SH
    1.304u 2.289s 0:03.60  99.4%  121+180k 0+0io 0pf+0w  VFORK/EXEC DYNAMIC SH WITH PREBINDING.
    1.296u 2.648s 0:03.90 100.7%  112+180k 0+0io 1pf+0w  FORK/EXEC DYNAMIC SH WITH PREBINDING.

    These results were rather unexpected, actually.  I'm not sure why the
    numbers on the DELL box are so bad with a dynamic 'sh' but I suspect
    that the dynamic linking is blowing out the L1 cache.

    In any case, taking the Athlon64 system the difference between static
    and dynamic is around 4 seconds while the difference between vfork and
    fork is only around 0.25 seconds, so while moving to vfork() helps it
    doesn't help all that much.

    Unless you happen to be hitting a boundary condition on the L1 cache,
    that is.  If that is the case on the Dell box, as I suspect (it only
    has a 16K L1 cache whereas the AMD64 has a 64K L1 cache), then the
    difference is around 14 seconds between vfork static and vfork dynamic
    versus an additional 8 seconds going from vfork to fork.  Vfork would
    probably be a significant improvement on the DELL box.

    Prebinding generates around a 20% overhead improvement for the dynamic
    'sh' on the Athlon64 but on the Dell2550 prebinding actually made
    things go slower (not shown above), from 23.8 seconds to 26 seconds.
    I think there is an edge case due to prebinding having a greater L1
    cache impact.  For larger, more complex programs prebinding shows
    definite, if small, improvements.

                                        -Matt

/*
 * CD into the directory containing the ./sh executable before running
 */
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int i;
    pid_t pid;

    for (i = 0; i < 5000; ++i) {
	if ((pid = vfork()) == 0) {	/* CHANGE THIS FORK/VFORK */
	    execl("./sh", "./sh", "-c", "exit", "0", (char *)NULL);
	    /* only reached if the exec failed */
	    write(2, "problem\n", 8);
	    _exit(1);
	}
	if (pid > 0)			/* parent: reap the child */
	    waitpid(pid, NULL, 0);
    }
    return(0);
}

Re: 40% slowdown with dynamic /bin/sh
:At 00:23 26/11/2003 -0500, Michael Edenfield wrote:
:Static /bin/sh:
:  real    385m29.977s
:  user    111m58.508s
:  sys     93m14.450s
:
:Dynamic /bin/sh:
:  real    455m44.852s
:  user    113m17.807s
:  sys     103m16.509s
:
:   Given that user+sys < real in both cases, it looks like you're running
:out of memory; it's not surprising that dynamic linking has an increased
:cost in such circumstances, since reading the diverse files into memory
:will take longer than reading a single static binary.
:   I doubt many systems will experience this sort of performance delta.
:
:Colin Percival

    It definitely looks memory related but the system isn't necessarily
    'running out' of memory.  It could simply be that the smaller amount
    of memory available for caching files is causing more disk I/O to
    occur.  It should be possible to quantify this by doing a full timing
    of the build ( /usr/bin/time -l ), which theoretically includes I/O ops.

    Dynamically linked code definitely dirties more anonymous memory than
    static, and definitely accesses more shared file pages.  The difference
    is going to depend on the complexity of the program.  How much this
    affects system performance depends on the situation.  If the system has
    significant idle cycles available the impact should not be too serious,
    but if it doesn't then the overhead will drag down the pool of
    pre-zero'd pages (even if the program is exec'd, does something real
    quick, and exits).

    I have included a program below that prints the delta free page count
    and the delta zero-fill count once a second.  This can be used to
    estimate anonymous memory use.  Run the program and let it stabilize.
    Be sure that the system is idle.  Then run the target program (it needs
    to stick around, it can't just exec and exit), then exit the target
    program and repeat.  Leave several seconds in between invocation, exit,
    and repeat to allow the system to stabilize.

    Note that it may take several runs to get reliable information since
    the program is measuring anonymous memory use for the whole system.
    Also note that shared pages will not be measured by this program, only
    the number of dirtied anonymous pages.  If on an idle system the
    program is not reporting '0 0' then your system isn't idle :-).

    The main indicator is the 'freepg' negative jump when the target
    program is invoked.  The zfod count will be a subset of that, indicating
    the number of zero-fill pages requested (versus program text/data COW
    pages which do not need zero'd pages but still eat anonymous memory for
    the duration of the target program).

    When I tested it with a static and dynamic /bin/sh on 4.8 I got
    (looking at 'freepg') 20 pages for the static binary and 50 pages for
    the dynamic binary.  So a dynamic /bin/sh eats 30 * 4K = 120K more
    anonymous memory than a static /bin/sh.  In the same test I got 12
    ZFOD faults for the static binary and 34 ZFOD faults for the dynamic
    binary, which means that 22 additional pre-zero'd pages are being
    allocated in the dynamic case (88KB).

    If /bin/sh is exec'd a lot in a situation where the system is otherwise
    not idle, this will impact the number of pre-zero'd pages available on
    the system.  Each exec of a dynamic /bin/sh eats 22 additional pages
    (88K) worth of zero-fill.  Each resident copy of (exec'd) /bin/sh eats
    120KB more dirty anonymous memory.

    make buildworld -j 1 may have as many as a dozen /bin/sh's exec'd at
    any given moment (impact 120K each) depending on where in the build it
    is.  -j 2 and so forth will have even more.  This will impact your
    system relative to the amount of total system memory you have.  The
    more system memory you have, the less the percentage impact.

                /bin/sh                 /bin/csh
                -------                 --------
    static      freepg -19 zfod 12      freepg -140 zfod 129
    dynamic     freepg -50 zfod 34      freepg -167 zfod 149

                /usr/bin/make (note that make is static by default)
                -------------
    static      freepg -33 zfod 27
    dynamic     freepg -51 zfod 44

    As you can see, the issue becomes less significant on a percentage
    basis with larger programs that already allocate more incidental
    memory.

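The page arithmetic above (30 * 4K = 120K, 22 pages = 88KB) can be restated as a small helper. This is only a worked example of the numbers already given; the function name is made up, and the 4K page size is the one used in the text.

```c
#define PAGE_KB 4               /* 4K pages, as in the text */

/*
 * Extra memory, in KB, that the dynamic binary uses over the static one,
 * given a page count measured for each (freepg or zfod deltas).
 */
long
extra_kb(long static_pages, long dynamic_pages)
{
    return (dynamic_pages - static_pages) * PAGE_KB;
}
```

Using the /bin/sh figures quoted in the prose, extra_kb(20, 50) gives 120 (KB of additional dirty anonymous memory per resident shell) and extra_kb(12, 34) gives 88 (KB of additional zero-fill per exec). A dozen resident dynamic shells would therefore account for roughly 12 * 120 = 1440 KB.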
    Also to my surprise I found that 'make' was already static.  It would
    seem that this issue was recognized long ago.  bzip2, chflags, make,
    and objformat are compiled statically even though they reside in
    /usr/bin.

                                        -Matt

/*
 * print delta free pages and zfod requests once a second.  Leave running
 * while testing other programs.  Note: ozfod is not displayed.  ozfod is
 * a subset of zfod, just as zfod deltas are a subset of v_free_count
 * allocations.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include
Re: 40% slowdown with dynamic /bin/sh
:    That seems to have the most impact.  We can also expend our efforts
:    to improve dynamic linking performance, since that will improve the
:    performance of the other 99.9% of the universe.
:
:
:What happened to mdodd's prebinding efforts?
:
:Drew

    Prebinding was put into DFly but the improvement is barely noticeable
    when you prebind /bin/sh.  Prebinding is only really useful for much
    larger programs like mozilla which have thousands or tens of thousands
    of bindings.

    These numbers are on DragonFly and will be different on FBsd, but the
    relative numbers should be similar with a FBsd prebinding
    implementation.

                /bin/sh
                -------
    static      freepg -16 zfod 12
    dynamic     freepg -46 zfod 34
    dyn/prebind freepg -46 zfod 26      saves 8*4=32KB worth of pre-zero'd
                                        pages but does not save any anon
                                        memory.

    1000 runs of /bin/sh -c exit 0 (program to vfork/exec /bin/sh -c exit 0
    1000 times).  This effectively measures startup overhead only and is
    not inclusive of what a script might do in addition to starting up.

                /bin/sh
    static      11023 zfod's    00.54s
    dynamic     33023 zfod's    02.88s
    dyn/prebind 25023 zfod's    02.36s

    There isn't much of a time improvement but there is a significant
    improvement in the number of ZFOD's with prebinding.

leaf:/usr/obj/usr/src/bin/sh# prebind /usr/obj/usr/src/bin/sh/sh
  object /usr/obj/usr/src/bin/sh/sh uniqid 986137809
  object /usr/lib/libedit.so.3 uniqid -1757875926
  object /usr/lib/libncurses.so.5 uniqid 1023436343
  object /usr/lib/libc.so.4 uniqid 2011683737
Non-PLT Prebindings:
  object /usr/lib/libedit.so.3 count 5
  object /usr/lib/libncurses.so.5 count 63
  object /usr/lib/libc.so.4 count 203
PLT Prebindings:
  object /usr/obj/usr/src/bin/sh/sh count 106
  object /usr/lib/libedit.so.3 count 63
  object /usr/lib/libncurses.so.5 count 270
  object /usr/lib/libc.so.4 count 638
Non-PLT COPY Prebindings:
  object /usr/obj/usr/src/bin/sh/sh count 13

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]

Re: 40% slowdown with dynamic /bin/sh
:...
:5.x and propaganda about DFBSD doesn't really mean a whole lot, unless you
:are looking for new recruits to your camp.  In any case, you've made your
:point on a nearly daily basis that 5.x is inferior to what DFBSD will be,
:and that you don't have much knowledge or care about 5.x anyways.  So
:please, go do what you do best and make DFBSD the envy of the BSD world.
:I'll be first in line to pat you on the back when you succeed.
:
:Scott

    Hmm.  Well, I think there's some confusion here.  While I certainly
    like my vision for DFly better than I like the vision for FreeBSD-5,
    that is simply in the eye of the beholder... of course I am going to
    like my own vision better.  It's my vision, after all!  Your vision is
    obviously different.  In fact, I expect that each person has his own
    vision for the project, so don't knock me for mine.

    But that has nothing to do with perceived inferiority or superiority.
    The issue isn't so much whether one project is better than the other
    as it is whether one is able and willing to borrow a proven concept
    from another project to replace the one that maybe isn't so hot in
    one's own.

    As it happens, I have borrowed quite a bit of code from 5.x.  As it
    also happens, I believe that 5.x would benefit by adopting some of the
    things that have already been proven to work quite well in DragonFly.
    For example, using a statistical time accumulation model instead of
    calling microtime() in the middle of the critical thread switch path,
    or not preemptively switching threads operating in kernelland to
    another cpu, or the implementation of a mutexless scheduler.  Just a
    few examples.  I can only point out the concepts and ideas and point
    to the code in DFly; it is up to FreeBSD-5 developers to take the ball
    up and either throw it away or run with it.

    I have not been posting daily, but you seem to be frustrated about
    something.  I can only suggest that blaming me for your frustrations
    is not going to solve any tangible, technical issue in FreeBSD-5.
    My posts are technical and to the point.  Just because it's coming out
    of my mouth rather than someone you might respect a bit more doesn't
    make it any more or less valid.  If you cannot address them based on
    their technical merit then you've missed the point of the post
    entirely.

    And, just for the record, I feel quite obligated to try to move the
    FreeBSD project forward along a path that I believe will be more
    beneficial to its users.  Just to be clear:  My obligation is to all
    the people who use FreeBSD, not to the feelings of particular
    developers whose vision(s) I might disagree with.  I have no
    intention of screwing over FreeBSD (how absurd!) but you should not
    mistake that for my being accommodating to FreeBSD's current vision
    which, yes! it is true! I have serious disagreements with.

    Over the years I have recommended FreeBSD to hundreds of people and I
    take that responsibility very seriously.  If it is within the scope of
    the FreeBSD charter for a person to post based on a perceived
    obligation to the end users of FreeBSD, then I certainly still have a
    right to post to this group.

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]

Re: 40% slowdown with dynamic /bin/sh
    password file is sufficiently limited in scope as to greatly reduce
    the potential for bugs creating situations which expose the password
    file.  That is a damn sight better than an NSS module which needs to
    physically open the password file.

:And if you *are* really talking about authentication code (and not
:directory services), then you need to get PAM to work in a statically
:linked world, also.  (You can compile PAM statically today, but that
:means no 3rd-party modules.  The same holds for NSS, of course.)

    ... Or you can build an IPC mechanism that implements the PAM
    functionality and then have programs which would otherwise use PAM
    instead use the IPC mechanism.  Which is the whole point of having
    the IPC mechanism in the first place.

:    The other huge advantage that IPC has over DLL is that you can switchout
:    the backend mechanisms on a running system without killing and restarting
:    anything other then he one service you want to switch out, and if it
:    doesn't work right you can restart the old one or keep live as a fallback.
:When using the current NSS implementation, there is no need to
:kill/restart anything when you update /etc/nsswitch.conf.  New
:modules are dynamically loaded, and any no-longer-referenced ones are
:unloaded.

    Sounds good, I guess... does it check the timestamp on
    /etc/nsswitch.conf every time you try to do a lookup?

    With the IPC mechanism an IPC request will either fail or get a
    'reconnect' request, in which case the IPC reconnects to the (updated)
    service.

:    the IPC model is so much better then the DLL model for this sort of thing
:    I don't understand why people are even arguing about it.
:
:Because the rest of us are stupid and lazy, remember? :-)
:
:Cheers,
:--
:Jacques Vidrine   NTT/Verio SME   FreeBSD UNIX   Heimdal

    Just not thinking out of the box, maybe.

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]

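For readers wondering what a per-lookup timestamp check of the kind asked about above might look like, here is a minimal sketch. It is not taken from the FreeBSD NSS code, and nothing here claims that NSS works this way; config_changed and its calling convention are made up for illustration. The point is only that such a check costs one stat() call per lookup.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Compare the file's modification time against *cached.
 * Returns 1 if it differs (and updates *cached), 0 if unchanged,
 * or -1 if the file cannot be stat'ed.
 */
int
config_changed(const char *path, time_t *cached)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (st.st_mtime == *cached)
        return 0;
    *cached = st.st_mtime;      /* remember the new timestamp */
    return 1;
}
```

A hypothetical caller would keep a time_t initialized to (time_t)-1, call config_changed() with the configuration file's path before each lookup, and reload its module list whenever the function returns 1. An IPC-based design avoids this polling by having the service tell clients to reconnect, as described above.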
Re: 40% slowdown with dynamic /bin/sh
:Matt, I'm talking about the de facto standard NSS, as found in Solaris :and Linux; and now FreeBSD 5 [*] and soon NetBSD [**]. You are talking :about some better mousetrap. The latter does not have any relevance :to this thread, which is about dynamic linking in next release of :FreeBSD. : :If you want to talk about FreeBSD 6.x and a better mousetrap, please :go right ahead with a new thread here on freebsd-current or over on :freebsd-arch. : :If you want to talk about the future of DragonFlyBSD, I'm sure there :is an appropriate list for that--- this one ain't it. Parts of your :message certainly seemed to describe what might be best for some other :operating system. : :Cheers, :-- :Jacques Vidrine NTT/Verio SME FreeBSD UNIX Heimdal :[EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] : : :Side notes: : :[*] The actual APIs used by Solaris and Linux are *very* different; :and the APIs used by FreeBSD are *somewhat* different from Linux. :However, because the *core concepts* are the same--- dynamic loading, :in-process modules--- portability issues are not much of a problem. : :[**] NetBSD doesn't support dynamic loading yet, but has had :considerable influence on the FreeBSD implementation. NetBSD :developers have indicated to me that they expect to bring in :the FreeBSD changes so that there will be basically the same :implementation on FreeBSD and NetBSD. I'm not sure of the relevance of this comment. My original opinion still stands... you guys are using this issue as an excuse to basically do away with static binaries, rather then fixing the real problem which is an inability to dynamically load modules in a static binary. How much do you intend to use NSS for? I mean, what's the point of adopting this cool infrastructure if all you are going to do with it is make a better PAM out of it? 
You can use it for basic authentication, sure, but the more things you try to use it for without static binary support the fewer things you can compile statically. You are basically doing away with the static linking capability of the system for no good reason that I can see, and coming up with all sorts of extra junk, like /rescue, to work around that fact. You are creating a huge mess *JUST* to be able to port a dynamic load NSS framework and you are throwing away functionality left and right to get it. That's no good in my book. If you *REALLY* want dynamic load NSS then you ought to spend the time to make it work with static binaries rather then just being lazy and getting rid of static binaries. So, yes, I do think you guys are being lazy in that regard. If this is the path you've chosen to go then you have an obligation not to tear out major existing system capabilities, such as the ability to generate static binaries, in the process. If the APIs are totally different then I don't see any difference between your direct NSS implementation and my ability, within an IPC framework, to implement NSS as a backend service. I suppose you can call me work building a better mousetrap, implying that I am doing work that has already been done, but it is really no different then what you are doing in FreeBSD-5 by changing existing API standards to suit your particular implementation. There is a lot of circular reasoning going on here... it's the same sort of circular reasoning that John uses to justify some of the more esoteric scheduling mechanisms in -current. A because of B because of A, and to hell with anyone who wanted to use C. What I am doing is not reinventing the wheel, I am simply reinventing the API that the backend of libc uses to access resources. There is nothing preventing me from then implementing something like NSS and PAM on the backend of the new API, as a service rather then as a DLL. 
I fully expect that someone will do that, in fact, possibly even me. But I also expect that straight IPC will solve at least as many problems and in fact solve significantly more problems than the DLL NSS solution solves. So, yes, IPC will be the basis used in DFly. That does not invalidate my comments vis-a-vis the dynamic/static problem with NSS and PAM that FreeBSD-5 has.

If you want to do it right then make DLLs work with static binaries. If you want to do it wrong then ignore static binaries. It is that simple.

-Matt
Matthew Dillon [EMAIL PROTECTED]
___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 40% slowdown with dynamic /bin/sh
: is the path you've chosen to go then you have an obligation not to : tear out major existing system capabilities, such as the ability to : generate static binaries, in the process. : :If this is what you think has happened, you're living in some parallel :fantasy universe.

I am simply repeating the reasoning being used for going to a dynamic root. Forgive me if I misread it, but I believe the argument was that FreeBSD-5 was migrating to NSS and NSS's DLL mechanism does not work in a static world, therefore dynamic becomes the default. If I am wrong and NSS's DLL mechanism can be used in a static world, please correct me!

So exactly how far do you intend to go with NSS? Because it seems to me that FreeBSD-5's goal is to start to depend on the DLL capabilities. If the goal is to depend on DLL then you damn well need to make it work with static ELF binaries. It's that simple.

: There is a lot of circular reasoning going on here... it's the same sort : of circular reasoning that John uses to justify some of the more esoteric : scheduling mechanisms in -current. A because of B because of A, and : to hell with anyone who wanted to use C. : :Keep the ad homenim attacks to yourself, buster! This was uncalled-for. : :Kris

Well, the scheduler arguments are more involved but I am not incorrect here. E.G. the priority borrowing exists to work around problems with the mutex implementation. Preemption by non-interrupt threads also exists to work around problems with the mutex implementation. Preemptive cpu migration could be turned off fairly easily and doesn't really count. The priority borrowing is a mess, though. You may think it's the best thing since sliced bread but I think it unnecessarily complicates both the scheduler core and the programming model.

-Matt
Re: 40% slowdown with dynamic /bin/sh
:No, what you said was not to tear out..the ability to generate static :binaries. That's completely different, and is absolutely not what :has happened, or what is planned. Static binaries continue to be :supported, available, and work with the system NSS and PAM modules as :before.

I think you are missing the point I made in that response, because it isn't that cut and dry. Obviously it isn't that cut and dry. Why is a dynamic root the default again? Because statically linking NSS and PAM is not the direction FreeBSD-5 is going.

So if you are going to start depending on dynamic loading, and I seem to recall a number of conversations where that is, in fact, the intention, then you are marginalizing your static binary support. The more you use NSS and operate on the assumption that DLL will be leveraged, the more you marginalize your static binary support. FreeBSD-5 has *ALREADY* made major concessions, such as going to the dynamic root, precisely because it has *ALREADY* marginalized static binary support. That is what I'm hearing.

What I am saying is that for something this fundamental to the infrastructure, it is not appropriate to marginalize static binary support. That is all I am saying here. Sure, I think an IPC mechanism is a better API, but that has nothing at all to do with this DLL / static/dynamic binary issue in FreeBSD-5. They are two separate issues. Right now, in FreeBSD-5, the issue is the marginalized static binary support.

:We're not talking about schedulers. What is at issue is that you :decided, for no reason appropriate to the topic of discussion, to :mention that you think an unrelated FreeBSD developer has difficulties :with logical reasoning. : :What the hell, Matt? By what standards of behaviour is this :acceptable? : :We have rules of conduct on the FreeBSD mailing lists, and people have :been removed in the past because they were unable to hold themselves :to them. Don't think that you're exempt just because you're Matt :Dillon.
Yes, and apparently you are breaking them as much as you believe I am, Kris. You have also seriously misinterpreted my postings. I am obviously not advocating that FreeBSD-5 rip everything out and move to an IPC model. It takes time and consideration to be able to do something like that. What I am advocating is that FreeBSD-5 not marginalize and restrict (make less flexible) basic infrastructure in order to get other infrastructure working.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: 40% slowdown with dynamic /bin/sh
:Ding! Oh god, not another one! *plonk* : :We need nsswitch type functionality in /bin/sh. To the people who want to :make it static, lets see some static binary dlopen() support or a nsswitch :proxy system. : :If half as much effort had been spent on implementing such a thing as there :has been hand wringing, then we'd have the best of both worlds by now. : :I'm sorry to sound harsh, but its the reality of the situation. Code :speaks louder than words. : :Cheers, :-Peter :-- :Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] :All of this is for nothing if we don't go to the stars - JMS/B5

You don't need dynamic loading to get nsswitch type functionality. You only need dynamic loading if nobody is willing to write an IPC model to get the functionality. It's really silly to create such a fundamental restriction on the binary because people are too lazy to build an IPC based mechanism.

Not only silly, but it seems to me that it also creates security issues. At least with a client/server model it is possible to isolate the authentication code to, for example, disallow exec(), filesystem operations, or the opening of any new files.

The other huge advantage that IPC has over DLL is that you can switch out the backend mechanisms on a running system without killing and restarting anything other than the one service you want to switch out, and if it doesn't work right you can restart the old one or keep it live as a fallback. The IPC model is so much better than the DLL model for this sort of thing that I don't understand why people are even arguing about it.

And, of course, moving all of this junk out means that the libc implementation becomes a lot smaller... it just becomes an IPC interface and maybe a small local cache rather than the full blown algorithm. The result? Static binaries become a lot smaller too (not that that is really a problem anyway).

This 'it has to be static so dlopen works' argument is just not a good argument.
It's really more of an excuse than an argument. If you really want to use dlopen then make it work with static binaries. If you want to do it right, develop an IPC model or use an existing IPC model.

That said, an IPC model is precisely what I am developing for DragonFly (a 'money where your mouth is' response). :-) Right now I'm building it as a userspace library but I intend to move the rendezvous code directly into the kernel. Unix domain sockets are so icky(!). They work, but they are icky. I intend to use it for all resource and authentication lookups... password, group, services, pam equivalent, kerberos, resolver, and so on and so forth. Hell, I think I might use it for C locale too just to be able to rip locale out of libc.

-Matt
Matthew Dillon [EMAIL PROTECTED]
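[Editor's sketch] The client/server lookup argued for above can be illustrated in a few lines of C. This is only a minimal sketch under stated assumptions, not the DragonFly design: the names `lookup_send`, `service_handle_one`, `lookup_recv` and the `demo_users` table are hypothetical, the wire format (a NUL-terminated name in, a decimal uid or "ENOENT" out) is invented for illustration, and a Unix domain socket pair stands in for whatever rendezvous mechanism a real implementation would use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical in-memory table standing in for a real backend
 * (password file, directory service, ...). */
struct user_rec { const char *name; int uid; };
static const struct user_rec demo_users[] = { { "root", 0 }, { "operator", 2 } };

/* Client side: all a statically linked libc would need is to send the name. */
static int lookup_send(int fd, const char *name)
{
    size_t n = strlen(name) + 1;            /* include the NUL terminator */
    return write(fd, name, n) == (ssize_t)n ? 0 : -1;
}

/* Service side: read one request, reply with the uid as text or "ENOENT".
 * The backend lives entirely here and can be replaced without relinking
 * any client. */
static int service_handle_one(int fd)
{
    char req[64], rep[32];
    size_t i;
    ssize_t n = read(fd, req, sizeof(req) - 1);

    if (n <= 0)
        return -1;
    req[n] = '\0';
    snprintf(rep, sizeof(rep), "ENOENT");
    for (i = 0; i < sizeof(demo_users) / sizeof(demo_users[0]); i++) {
        if (strcmp(req, demo_users[i].name) == 0)
            snprintf(rep, sizeof(rep), "%d", demo_users[i].uid);
    }
    n = (ssize_t)strlen(rep) + 1;
    return write(fd, rep, (size_t)n) == n ? 0 : -1;
}

/* Client side: read the reply; returns the uid, or -1 if not found. */
static int lookup_recv(int fd)
{
    char rep[32];
    int uid;
    ssize_t n = read(fd, rep, sizeof(rep) - 1);

    if (n <= 0)
        return -1;
    rep[n] = '\0';
    return sscanf(rep, "%d", &uid) == 1 ? uid : -1;
}
```

In a real system the service would run as a separate process, which is what allows it to be sandboxed or restarted independently of its clients; here both ends can be driven from one process over `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)` just to exercise the request/reply path.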
Re: 40% slowdown with dynamic /bin/sh
:I supported the decision because: : :1. It has been requested for years :2. It benefits PAM and NSS. :3. It is easy to revert.

Easy to revert? You are talking about depending on mechanisms for authentication and other things that WILL NOT WORK with static binaries as they currently stand and, apparently, will not work in the future either. Easy to revert? I don't think so. More like "Let's do away with support for static binaries entirely." Because that is precisely what is happening here.

:Now please move along and revert it on your local system. There are far :too many REAL problems out there that need to be addressed so that 5.2 can :go out the door. This is just wasting time and energy. : :Scott

This is a real problem. I have no problem with people who want dynamic roots to get dynamic roots. My problem is with this intention to not fix PAM or NSS in a way that works with static binaries, and my problem is with changing the default from static to dynamic. The result is, down the line, that either (A) it will become impossible to compile anything static, or (B) there will be things you WON'T be able to use NSS for because it will break static binaries. It is a serious logistical and planning mistake, IMHO.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: 40% slowdown with dynamic /bin/sh
: : :I supported the decision because: : : : :1. It has been requested for years : :2. It benefits PAM and NSS. : :3. It is easy to revert. : : Easy to revert? You are talking about depending on mechanisms for : authentication and other things that WILL NOT WORK with static binaries : as they currently stand and, apparently, will not work in the : future either. Easy to revert? I don't think so. : : More like Lets do away with support for static binaries entirely. : Because that is precisely what is happening here. : :What the hell are you talking about

What I am talking about is that if the intent in -CURRENT is to start to depend on things like NSS... and it really does make sense to be able to depend on something like NSS, then it will become less and less feasible to compile programs in a way that cannot use NSS. /bin/sh is an excellent example of this. Why is /bin/sh now dynamic again? Why can't it be static? I'm being rhetorical, but I think it demonstrates the problem and my point quite succinctly.

Regardless of whether you have a dynamic root or a static root, FreeBSD is digging itself into a hole if it cannot use these spiffy new mechanisms with static binaries.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: 40% slowdown with dynamic /bin/sh
:I think that you forgot to attach the patches that demonstrate all of :this. : :Also, I'm really starting to resent you using the FreeBSD mailing lists as :an advocacy channel for DragonFly. I fail to see how FreeBSD 4.x and :DFBSD relate to FreeBSD 5-current, which is the overall topic of this :mailing list at the moment. : :Scott That's too bad, Scott. You are just going to have to live with it. From my point of view anything related to BSD is fair and reasonable game, and regardless of anything else this mailing list is discussing topics and things that are near and dear to my heart and I have a perfect right to comment on them. I've contributed a considerable amount of time and effort to FreeBSD and I have a right to be on this list. From my point of view, it's all under the same open-source umbrella anyway. The moment you start thinking of a project in isolation is the moment you start to disconnect from your responsibility to the project and to all the users and developers who depend on the project. I still run a lot of FreeBSD boxes and know people who run a lot of FreeBSD boxes and, frankly, that gives me just as much right to comment on the future directions of FreeBSD as you or anyone else. So you are just going to have to live with it. Maybe instead of fuming over DFly you should consider the work from a more technical standpoint. -Matt Matthew Dillon [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Unfortunate dynamic linking for everything
: Don't you think that people are able to change defaults if they think : thats appropriate? : : Prior to that Jordan had bumped the root partition size to 100MB : in 1.98.2.3 in March 2001. It was 50MB before then, which is too : small even for 4.x. : : Hm, then why do I have still room on my 50MB root partition? :
:$ df
:Filesystem       1K-blocks  Used Avail Capacity  Mounted on
:/dev/vinum/root      49583 29040 16577    64%    /
: : Thats enough for installworld (if softupdates are turned off) : : Gunther

Try running installkernel and see what happens when it tries to rename the old kernel and modules and install a new one. Or try installing a kernel.debug (which is 14MB+) instead of a kernel.

The point here is that just because you can barely squeeze the system into 50MB doesn't mean it's a good idea to. It might work for a very narrow use pattern, but it will create terrible problems if you ever try to expand your use, and the system defaults have to cover more than just generic users reasonably. This is why the default is no longer 50MB.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: Unfortunate dynamic linking for everything
:GAD Many freebsd users (me for one) are still living on a modem, :GAD where even one bump of 1.5 meg is a significant issue... :GAD :GAD Remember that the issue we're talking about is security :GAD updates, not full system upgrades. Everyone would want :GAD the security updates, even if they're on a slow link. : :When security updates change but a few bytes, it seems that some :xdiff- or rsync-like algorithm would be an apropriate way to :distribute patches. : : :Eddy :-- :Brotsman Dreger, Inc. - EverQuick Internet Division

Security updates are a red herring regardless, because they are few and far between compared to even a modem user's bandwidth. Those modem users who are likely to read the security lists, especially, aren't going to care if it takes an hour to download a non-optimal binary patch if it only happens a few times a year.

-Matt
Re: Unfortunate dynamic linking for everything
:Our rationale for encouraging Gordon is as follows: : :1. 4.x upgrade path: As we approach 5-STABLE, a lot of users might want :to upgrade from 4-STABLE. Historically in 4.x, the / partition has :been very modest in size. One just simply cannot cram the bloat that :has grown in 5.x into a 4.x partition scheme. Of course there is the :venerable 'dump - clean install - restore' scheme, but we were looking :for something a little more user-friendly. This argument would apply to very old 4.x users but not to anyone who installed it as of March 2001. I bumped the nominal size of the root partition to 128MB in 1.98.2.7 of sysinstall/label.c. Prior to that Jordan had bumped the root partition size to 100MB in 1.98.2.3 in March 2001. It was 50MB before then, which is too small even for 4.x. :2. NSS support out-of-the-box: Again, this is a user-experience issue :in that forcing NSS users to recompile world was seen as a potential :roadblock to them. Users have to recompile the world anyway, I don't think NSS is an issue. Or to put it another way: Nobody in their right mind should be contemplating upgrading a 4.x system to 5.x for production purposes. There will simply be too much 4.x cruft left over. Upgrading really requires a wipe and reinstall in this instance. I seem to recall that the main reason 5.x went to a dynamic /bin was to work around a terribly broken PAM design. The other issues were just eye candy compared to the PAM problems. Personally speaking, I think the solution is to fix PAM instead :-) :3. Binary security updates: there is a lot of interest in providing a :binary update mechanism for doing security updates. Having a dynamic :root means that vulnerable libraries can be updated without having to :update all of the static binaries that might use them. Non-issue if you have an automatic security update mechanism or script to do the work for the user, which we don't. 
Even so, /bin and /sbin are sufficiently self-contained in the source hierarchy that recompiling and reinstalling them is not a big deal. I think the security argument is a red herring because, frankly, security problems are far more prevalent with ports and larger services (Apache, sendmail, sshd, etc... mostly in /usr/bin and /usr/local), and not as big an issue for /bin and /sbin.

:As for performance, we felt that the typical use of FreeBSD did not fall :into the category of doing forkbomb performance tests. While there might :certainly be users that do continuously create lots of short-lived :processes, we felt that the above benefits outweighed this, but left the :door open for tailoring to this need (recompiling world with :NO_DYNAMICROOT).

fork() is a non-issue (forking overhead for a dynamic executable imposes a slight time penalty but does not really impose any resource penalty). However, I personally believe that anything run by a shell script is an issue in regards to performance, and anything you exec multiple separate copies of is an issue in regards to memory overhead. A system running a large number of JAIL'd environments will wind up with a larger number of swap-managed dirty pages.

But, all this aside, the reason I am decidedly against a dynamic /bin is simply one of system recovery and safety. I've blown up /usr so many times over the years that had /bin and /sbin been dynamic I would have been in real trouble on many occasions. If I recall correctly, some people have proposed and/or implemented some sort of emergency fallback or safety bin in 5.x to make up for this issue. I would humbly submit that this simply acknowledges that a dynamic /bin and /sbin is a bad idea in the first place.

I do not intend to make /bin or /sbin dynamic in DragonFly by default. I don't mind adding support for those people who want it, but I am not going to make it the default.
-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: Unfortunate dynamic linking for everything
: :As far as I'm concerned, this is a non-issue. Identifying which static : binaries need to be replaced is now a solved problem, replacing them is : easy, and if binary patches are used, there is effectively no impact on : bandwidth usage either. : :Bandwidth is still a concern for a lot of people, and this has the :potential to save significant bandwidth in many situations. :.. :Scott

I would not consider this a viable argument since binary downloads are usually compressed. A compressed /bin stacks up as follows (from 4.x):

    -rw-r--r--  1 root  wheel  4034560 Nov 18 18:34 /tmp/x1.tar     static
    -rw-r--r--  1 root  wheel   849920 Nov 18 18:34 /tmp/x2.tar     dynamic
    -rw-r--r--  1 root  wheel  1860215 Nov 18 18:34 /tmp/x1.tgz     static
    -rw-r--r--  1 root  wheel   354576 Nov 18 18:34 /tmp/x2.tgz     dynamic

So you are talking about 1.5 MBytes less bandwidth, which is nothing compared to the cost of doing a full install over the net. Yah, yah, /sbin too... but you get the idea. It certainly isn't enough of a difference to justify going to a dynamic /bin and /sbin.

I'm not saying there aren't other arguments, just that this isn't one of them.

-Matt
Matthew Dillon [EMAIL PROTECTED]
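[Editor's sketch] The 1.5 MByte figure above follows directly from the listed archive sizes. The helper below just makes that subtraction explicit; the function name `saved_bytes` is hypothetical and the byte counts are the ones quoted in the message.

```c
#include <assert.h>

/* Bytes saved by shipping the dynamic archive instead of the static one. */
static long saved_bytes(long static_size, long dynamic_size)
{
    return static_size - dynamic_size;
}
```

For the compressed archives, `saved_bytes(1860215, 354576)` gives 1505639 bytes, roughly 1.5 MBytes; for the uncompressed tar files the gap is 3184640 bytes, so compression roughly halves the difference that actually crosses the wire.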
Re: Unfortunate dynamic linking for everything
:/boot has grown quite large too and threatens to be unbounded in size as :times go on. Shaving off the 30-40MB of size in /bin and /sbin can :help alleviate this, even on system formatted in 5.x partition sized. :... :This argument wasn't the most compelling one by itself, but it played a :part in the decision so I included it here. You aren't saving that much. I don't have a 5.x box handy but on a 4.x box a static /bin is only 4MB and /sbin is only 12MB. The dynamic versions will save you around 12MB is my guess. -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Unfortunate dynamic linking for everything
:Many freebsd users (me for one) are still living on a modem, :where even one bump of 1.5 meg is a significant issue... : :Remember that the issue we're talking about is security :updates, not full system upgrades. Everyone would want :the security updates, even if they're on a slow link. : :-- :Garance Alistair Drosehn= [EMAIL PROTECTED]

I really have to disagree with this comment. By your reasoning saving a few bytes could be argued as being 'significant'. I've done net installs over slow modems before.. hell, I ran cvsup over a slow modem for over a year! My problem was never / or /bin. Not once. Not ever. I really don't care about a measly few megabytes relative to the size of the whole dist, and I doubt modem users of today care much either when you put it in that perspective. Sure, if you could save 50% of the bandwidth over the whole dist it would be significant. But 12 MBytes? No.

The reason your argument makes little sense is that there are plenty of OTHER ways you can make modem users' lives easier that have not been implemented. We aren't talking about a few measly megabytes here, we are talking about not making modem users have to wait sitting in front of their terminal staring at a slow download for hours before they even know whether their system will boot the dist! A two-stage install... basic kernel and /, reboot, then install the rest, would have a major impact on modem users. A thousand times greater impact than the few measly megabytes.

Modem users don't mind waiting as long as they don't have to sit in front of the terminal while doing so. Once a basic dist is up and running on a modem user's machine believe me, they will happily go off and do something else while waiting for the rest of the bits to download and install because they know if something blows up they won't have to start all over again.
-Matt
Re: HEADS-UP new statfs structure condidered harmful
:Well, there's some glue there now, but its pretty slim. What you :advocate would swap system call numbers for doing structure reloading per :call, which would significantly incrase the cost of the call. :Considering that *BSD system call overhead is pretty bad as is, I don't :think I'd be putting structure recopies into the critical path of a :syscall. : :-- :Doug White| FreeBSD: The Power to Serve

Umm, no. I'm not sure why you are taking such a negative view of things, the actual implementation is a whole lot simpler than you seem to believe.

What we will be doing is adding new system calls to replace *stat() and *statfs(). They will for obvious reasons not be named the same, nor would the old system calls be removed.

The new system calls will generate a capability list into a buffer supplied by userland, which is really no different from the copyout that the old system calls already do. The only difference is that the userland libc function that takes over the *stat() and *statfs() functionality using the new system calls (obsoleting the original system calls) will have to loop through the capability list and populate the user-supplied statfs or stat structure from it. Since the returned capability list is simply a stack based buffer there won't be any cache contention and the data will already be in the L1 cache. My guess is that it would add perhaps 150ns to these system calls compared to the 3-5uS they already take for the non-I/O case.

The capability list would be 'chunky', e.g. one capability record would represent all three timespecs, another record would represent uid and gid, another record would represent file size and block count, and so forth.

The key point is that the individual capability elements would not change, ever. Instead, if a change is needed a new capability element would be added and an argument to the new syscalls will let the system know whether it needs to generate the older elements that the newer ones replace.
Userland will ignore capabilities it does not understand. The result is full forwards and backwards compatibility, forever.

I do not believe there is any performance impact at all, especially if stat has to go do I/O. If you care about performance then I recommend that you fix the syscall path in 5.x instead of worrying yourself over stat(). If a particular program really needs to save the 150ns, say 'find', then it can call the new system call directly. But I really doubt anyone would notice 'find' running any slower. I certainly care a great deal about performance in DragonFly and I am not worried about the capability idea's impact *AT* *ALL*.

The userland implementation would be something like this:

    int
    stat(const char *file, struct stat *st)
    {
        char tmpbuf[SMALLBUF];      /* stat info is expected to fit */
        char *buf = tmpbuf;
        int off;
        int len;
        struct stat_cap_header *cap;

        /*
         * Run the system call.  Try a small buffer first (designed to
         * succeed for the current version of the OS).  If it fails then
         * allocate a larger buffer (compatibility with future OSs that might
         * provide more information).
         */
        if ((len = stat_cap(file, buf, STAT_CAP_STDFIELDS)) < 0) {
            if (errno != E2BIG)
                return(-1);
            buf = malloc(((struct stat_cap_header *)buf)->c_len);
            if ((len = stat_cap(file, buf, STAT_CAP_STDFIELDS)) < 0) {
                free(buf);
                return(-1);
            }
        }

        /*
         * Populate the stat structure (this could be common code for all
         * stat*() calls).
         */
        off = 0;
        while (off < len) {
            cap = (struct stat_cap_header *)(buf + off);
            switch(cap->c_type) {
            case STAT_TIMESPEC1:
                st->st_atimespec = cap->c_timespec1.atimespec;
                st->st_mtimespec = cap->c_timespec1.mtimespec;
                st->st_ctimespec = cap->c_timespec1.ctimespec;
                break;
            case STAT_UIDGID1:
                st->st_uid = cap->c_uidgid1.uid;
                st->st_gid = cap->c_uidgid1.gid;
                break;
            ...
            }
            off += cap->c_len;
        }
        if (buf != tmpbuf)
            free(buf);
        return(0);
    }

-Matt
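[Editor's sketch] The forward-compatibility claim above ("userland will ignore capabilities it does not understand") can be checked with a small self-contained parser. This is a sketch under stated assumptions rather than the proposed kernel interface: the record layout (a type word followed by a total length), the `CAP_*` constants, `struct file_info`, and the helpers `cap_append` and `cap_parse` are all hypothetical names invented here, and the buffer is built in userland instead of being returned by a system call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record types; a real interface would define its own. */
enum { CAP_UIDGID1 = 1, CAP_SIZE1 = 2 };

struct cap_header {
    uint32_t c_type;    /* record type */
    uint32_t c_len;     /* total record length in bytes, header included */
};

struct file_info {
    uint32_t uid, gid;
    uint64_t size;
    int have_uidgid, have_size;
};

/* Append one record at *off. Returns 0, or -1 if it does not fit. */
static int cap_append(unsigned char *buf, size_t cap, size_t *off,
                      uint32_t type, const void *payload, uint32_t paylen)
{
    struct cap_header h;

    h.c_type = type;
    h.c_len = (uint32_t)sizeof(h) + paylen;
    if (h.c_len > cap - *off)
        return -1;
    memcpy(buf + *off, &h, sizeof(h));
    memcpy(buf + *off + sizeof(h), payload, paylen);
    *off += h.c_len;
    return 0;
}

/* Walk the list, fill in the fields we understand, and skip any record
 * type we do not know. Returns -1 only if the buffer is malformed. */
static int cap_parse(const unsigned char *buf, size_t len, struct file_info *out)
{
    size_t off = 0;

    memset(out, 0, sizeof(*out));
    while (off < len) {
        struct cap_header h;
        const unsigned char *p;
        size_t plen;

        if (len - off < sizeof(h))
            return -1;
        memcpy(&h, buf + off, sizeof(h));   /* memcpy avoids unaligned access */
        if (h.c_len < sizeof(h) || h.c_len > len - off)
            return -1;
        p = buf + off + sizeof(h);
        plen = h.c_len - sizeof(h);

        switch (h.c_type) {
        case CAP_UIDGID1:
            if (plen < 8)
                return -1;
            memcpy(&out->uid, p, 4);
            memcpy(&out->gid, p + 4, 4);
            out->have_uidgid = 1;
            break;
        case CAP_SIZE1:
            if (plen < 8)
                return -1;
            memcpy(&out->size, p, 8);
            out->have_size = 1;
            break;
        default:
            break;      /* unknown (newer) capability: skip it */
        }
        off += h.c_len;
    }
    return 0;
}
```

Because every record carries its own length, an older parser steps over a record type added later without needing to understand its payload, which is the property the message relies on. The extra bounds checks are an addition of this sketch; the original outline trusts the kernel-supplied buffer.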
Re: HEADS-UP new statfs structure condidered harmful
:Expect to have to recompile the entire fricking world for a change :this fundamental. : :Really, what should have appened is that the system call interface :for stat should have been retired as ostat, a new system call :interface introduced, and the libc version number bumped, given a :change this fundamental. : :Effectively, this will destroy binary backward compatability for :everything in the world. : :-- Terry :___ :[EMAIL PROTECTED] mailing list I recommend that instead of rolling these sorts of system calls over and over again (how many versions of stat do we have now? A lot!), that instead you make a system call which returns a capability buffer and then have libc load the capabilities it understands into the structure. That way you don't have to worry about forwards and backwards kernel compatibility. This is what I plan to do with DragonFly for *stat(), *statfs*(), various sysctls that return structures, and so forth. Wherever it makes sense to do it. -Matt Matthew Dillon [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: HEADS-UP new statfs structure condidered harmful
: sense to do it. : :How do you propose to achieve POSIX compliance? At the library :level? Or not at all? : :-- Terry

I don't understand the question. All that happens is that functions like fstat() and statfs() become libc functions rather than direct syscalls. The userland program doesn't know the difference, it uses fstat() and statfs() just like it always has.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: 3C940 / Asus P4P800 gigabit LAN driver
:I just tried Jung-uk Kim's driver on -stable and sofar it works OK: : ... and I just ported it to DragonFly and it works fine there too with an ASUS K8V Motherboard. Kudos! -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Help saving my system
: : :On Fri, 17 Oct 2003, Jason Dictos wrote: : : 1. Had FreeBSD 5.1 system running. : 2. Did a cvsup get of stable (that's right, stable, so 4.9) : 3. Compiled make buildworld, then did a make build world, and re-compiled : with generic kernel : 4. Booted into a system which could not mount the file system (expected : since its 4.9) : : So now I am trying to save the system and here's where I've gotten to: : : 1. Booted disc 2 of 5.1 cd into fixme mode. : 2. Mounted live cdrom filesystem and then mounted my root drive to /mnt : 3. Did a chroot to /mnt so that I would emulate my live system environment : 4. Re-did a cvsup of the branch current. : 5. Ran make buildworld from /usr/src : :At this point, I'd probably do a binary upgrade install of 5.1 to make :sure you have known-good binaries. I'd then boot to single-user mode and :run find on /bin, /sbin, /lib, /usr/lib, /usr/sbin, et al, and make sure :that there are no 4.x binaries left. If possible :-). : :Robert N M Watson FreeBSD Core Team, TrustedBSD Projects :[EMAIL PROTECTED] Network Associates Laboratories

I would boot with the 5.1 live CD and back up all the data you want to save somewhere (like to another machine), then I would reinstall the system from scratch with the release you intend to run, wiping the disks completely. Then restore the data you wanted to keep from wherever you saved it.

If you have a lot of data and a reasonably fast network link and no place to temporarily back up your data to while reinstalling, I can provide you with a temporary account with lots of disk space for a few days. EMail me privately if you need the temporary account.

-Matt
Re: Text file busy
: :Tim : :P.S. I wonder if demand-paging of executables is still a win for :program startup on modern systems with dynamically-linked executables? :Large reads are a lot more efficient, and it seems that dynamic :linking might cause more startup thrashing. Hmmm... Yes, they are a big win 95% of the time. Don't worry, the kernel will pre-fault pages that are already cached in memory (to a point), and the kernel will also cluster pagein operations if actual I/O becomes necessary. -Matt Matthew Dillon [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
64 bit quantities in statfs ?
As part of the DragonFly effort we are going to increase the mount path limit from 80 chars to 1024. This will change the statfs structure. I thought I would adopt the 64 bit changes that 5.x has made to keep things synchronized. Except... there don't appear to be any 64 bit changes to struct statfs in 5.x. Am I missing something here? Is there an 'nstatfs' structure that I have not seen?

The following probably need to be 64 bit entries:

    f_blocks
    f_bfree
    f_bavail
    f_files
    f_ffree
    f_syncwrites
    f_asyncwrites
    f_syncreads
    f_asyncreads

-Matt
Matthew Dillon [EMAIL PROTECTED]
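[Editor's sketch] To make the proposal above concrete, here is what a structure with those counters widened to 64 bits and a 1024-byte mount path could look like. It is purely illustrative: `struct statfs_sketch`, `SKETCH_MNAMELEN`, and `sketch_used_blocks` are hypothetical names, the choice of `f_mntonname` as the field holding the mount path is an assumption, and the layout is not taken from any FreeBSD or DragonFly header.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MNAMELEN 1024    /* mount path limit named in the message */

/* Hypothetical layout: only the fields listed in the message, all 64 bit. */
struct statfs_sketch {
    uint64_t f_blocks;          /* total data blocks in filesystem */
    uint64_t f_bfree;           /* free blocks */
    uint64_t f_bavail;          /* free blocks available to non-superuser */
    uint64_t f_files;           /* total file nodes */
    uint64_t f_ffree;           /* free file nodes */
    uint64_t f_syncwrites;      /* synchronous writes since mount */
    uint64_t f_asyncwrites;     /* asynchronous writes since mount */
    uint64_t f_syncreads;       /* synchronous reads since mount */
    uint64_t f_asyncreads;      /* asynchronous reads since mount */
    char     f_mntonname[SKETCH_MNAMELEN];  /* assumed mount path field */
};

/* Blocks in use; stays correct past the 32 bit limit with 64 bit counters. */
static uint64_t sketch_used_blocks(const struct statfs_sketch *sf)
{
    return sf->f_blocks - sf->f_bfree;
}
```

A block count of 5,000,000,000 already exceeds what a 32 bit field can hold, which is the practical reason for widening the block counters on large volumes.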
Re: Change to kernel+modules build approach
:: Has anyone in this discussion looked at what Matt has done with :: Dragonfly? He's re-arranged the kernel tree and moved each driver/module :: into its own directory. Each directory has a Makefile. thus :: a traversal of the kernel tree make hierarchy generates the modules. :: :: The modules subdirectory is going away.. (I think he's in the middle :: of doing that now) : :That's certainly an interesting concept. One that would make it :easier to deal with modules since you have the Makefile right where :you need it. If that is all that he's done, then that wouldn't answer :John's objection to the current state of affairs: meta data in two :places (module Makefile and conf/files*). If he's done something :else in addition to the movement, then that would be interesting to :look at. : :Warner Yes, I've done away with the 'modules' directory and have been reincorporating the module Makefiles into the main part of the kernel tree. It turns out not to be all that difficult. Most module Makefiles can be plopped into the proper directory with very few changes. On half of them I only had to remove the .PATH directive. Subdirectories are glued together with the standard bsd.subdir.mk. Some surgery is required, but nothing difficult. For example, the netgraph modules necessitated moving each /usr/src/sys/netgraph/* element into a subdirectory to accommodate the Makefile for that netgraph module. There are a few areas like that... mainly changes which force partitionable entities into their own subdirectories, which I consider to be good for the structure of the system. It is still a work in progress but I am very close to getting all the ducks honking properly in regards to config based kernel builds, make buildkernel, separate module builds, and module builds with and without 'make obj'. I heartily recommend that -current investigate and implement the refolding of the module build into the main kernel source tree. 
Eventually my goal is to make the entire kernel sufficiently modular that 'config' can be gotten rid of (or, at least, relegated to just generating the various .h files from the config options). -- I have also done additional (and very extensive) reorganization of the kernel source tree, but it is probably too extensive for -current to consider (read: about 40 dillon hours of work plus another 40 dillon hours to clean up the build issues afterwards). Not only did I completely reorganize filesystems, network subsystems, and device drivers, I also moved driver-specific architecture-specific files out of e.g. i386/ and into appropriate_driver/i386, and I also changed config to generate use_*.h instead of *.h files, which allowed me to remove the -I- silliness from the Makefiles, which in turn allows relative #include file paths to be used again in the kernel source (in the many places where they make sense, which is just about everywhere). This greatly improves our ability to modularize the kernel. But it was a huge amount of work. If I were to pick *one* thing to recommend that -current adopt it would be to change all the config generated *.h files to use_*.h (plus the same thing in those module makefiles which generated *.h files), get rid of the -I- compiler option, and then incrementally convert every #include that can trivially be made relative. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
:Wouldn't it be possible to achive the same result without the VFS with :well organized lib subdirs? like usr/lib/xyzlib1.2/ and :usr/lib/xyzlib1.3/ which would maintain the install for any given :version of a lib? In other words, instead of just dumping all the libs :into the one place, you simply place them into sub folders instead and :then link them as needed? Granted this would cause havoc for things like :LD_LIBRARY_PATH. I never did like the way we dump things in the lib :dir's, its messy. The VFS idea is interesting, but it like cleaning the :mess by sending parts of the big mess into another dimention, making it :a trans-dimentional mess (technically a larger mess). This throws away :the KISS principle. Not unless one wanted to make major modifications to all the third party applications out there, which nobody really wants to do, because hacking all those programs up makes it difficult to track updates. : taken for granted. Begin userland VFSs with the capability of : overlaying the entire filesystem space, these environments would be : extremely powerful. : :I suspect this ability would usefull for other things too, possibly for :security lock-downs on shell users env's without chrooting them as an :example. : :-Jon Yes, Exactly. -Matt Matthew Dillon [EMAIL PROTECTED] ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
:Matthew, :OK, you want a divergent kernel, EG src/sys/ , fair enough. : : - Are there any benefits to the BSD community in having :a 4th BSD bin/ sbin/ usr.bin/ usr.sbin/ ? :Can you not share co-operate with toolset maintainers of NetBSD :or OpenBSD, even if you can't work with some FreeBSD CVS people ? I don't think these particular subhierarchies are that big a deal. Except for libc and libc_r, most of the above will wind up being almost identical to their 4.x counterparts. You know the old saying... if it ain't broke... One big part of the goal set will be the creation of a middle 'emulation' layer which is managed by the kernel but runs in userland, which will take over all primary system call entry points (in userland) and convert them to syscall messages that the kernel understands. 4.x, 5.x, SysV, Linux, and other compatibility sets will be moved out of the kernel and into this middle layer. Even the 'native' syscall set will run through an emulation layer (though being aware of it the native sets will call the emulation layer directly rather than bounce through the kernel). The advantage of this methodology is that we will be able to keep the kernel clean. For example, we would be able to modify how certain syscall messages work and simply by fixing the backend of the appropriate emulation layers we can maintain binary compatibility with any past userland. The emulation layer would be fully versioned so older userland programs use emulation layers targeted to older APIs, and newer userland programs use emulation layers targeted to newer APIs. So there would no longer be five different versions of stat() in the kernel, for example. There might be five different versions of the '4.x emulation layer', but there would be only *ONE* stat in the kernel. : - Do you intend your own ports/ collection too ? (or Free, Net or Open ?) : :- :Julian Stacey Freelance Systems Engineer, Unix Net Consultant, Munich. 
A packaging system is a very important piece of any distribution. Our goal will be to create a packaging system that, via VFS 'environments', causes any particular package to see only the dependencies that it depends on, and the proper version of said dependencies as well. Multiple versions of third party apps that normally conflict with each other could be installed simultaneously. The packaging-system-controlled VFS environment would also hide everything a package does not depend on, like other libraries in the system, in order to guarantee that the dependencies listed in the packaging system are in fact what the application depends on. There's no point in having a packaging system that can't detect broken and incorrect dependencies or we wind up with the same mess that we have with ports. To make this work the VFS environment would have to be able to run as a userland process. Otherwise we would never be able to throw in the type of flexibility and sophistication required to make it do what we want it to do, and the kernel interfacing would have to be quite robust. I want to make these environments so ubiquitous that they are simply taken for granted. Being userland VFSs with the capability of overlaying the entire filesystem space, these environments would be extremely powerful. It might be possible to build this new packaging system on top of the existing ports infrastructure. It will be several months (possibly 6-12 months) before the kernelland is sufficiently progressed to be able to implement the userland VFS concept so we have a lot of time to think about how to do it. -Matt Matthew Dillon [EMAIL PROTECTED]
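The versioned emulation layer described earlier in this message (one stat in the kernel, several ABI-specific front ends in userland) might be pictured roughly as below. This is only a toy model: the function names, the ABI labels, the message format, and the sample file attributes are all invented for illustration and do not come from any DragonFly code.

```python
# Hypothetical model: the "kernel" understands exactly one stat message format.
def kernel_stat(msg):
    """The single in-kernel stat: takes a message dict, returns the newest layout."""
    files = {"/etc/motd": {"size": 42, "ino": 7, "birthtime": 1058000000}}
    return dict(files[msg["path"]])

# Each emulation layer turns an ABI-specific call into the kernel message and
# converts the reply back into the layout that ABI's binaries expect.
def stat_abi_old(path):
    reply = kernel_stat({"op": "stat", "path": path})
    return (reply["ino"], reply["size"])            # old layout: no birthtime

def stat_abi_new(path):
    reply = kernel_stat({"op": "stat", "path": path})
    return (reply["ino"], reply["size"], reply["birthtime"])

EMULATION_LAYERS = {"abi-old": stat_abi_old, "abi-new": stat_abi_new}

def syscall_stat(abi, path):
    """Userland entry point: pick the layer matching the binary's ABI version."""
    return EMULATION_LAYERS[abi](path)
```

In this picture, changing the kernel message only requires touching the back end of each layer; binaries built against either ABI keep seeing the layout they were compiled for.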
Re: Annoucning DragonFly BSD!
: Hmm, typing this command a second time I nopw see extra info: : Registrant: :Matthew Dillon :41 Vicente Rd :Berkeley, CA 94705 :US : :I've been to Matt's house before -- its real. He does have a T-1 at :home. : :-- :-- David ([EMAIL PROTECTED]) :P.S. I offer my home address for anyone that wants to at my place. Yah. There is truth in the registration address :-). I guess that means I really have got to go in and secure my WIFI system now. Speaking of which, that alpha box is just sitting there in my machine room like a boat anchor. If you know someone that would like to have it it's available, no charge! -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
for the Amiga. In modern day terms it is consigned to history now, having outlived the platform it was originally designed for (the Amiga, though there are people who still use it to support legacy 68000 based hardware). What is amazing to me are the stories from people who got their start in programming using that compiler, and have gone on to great jobs and interesting work in later years. *THAT* is what I care about. I don't care a whit about what happens to the actual code because I know it won't last. I've written a huge amount of code in my life and 95% of it is no longer being actively used. That same 95%, however, has affected the lives of thousands of people in a positive fashion so the idea that code must somehow become immortal or it's a wasted effort is just absurd. Open source is the ultimate expression of Darwinism, and evolution takes many forms, but one thing is for certain: There is no such thing as immortality for Linux, FreeBSD, or anything else for that matter. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
: I also have the disk space. : : Let me know if you are interested. : :I'm happy with it, but right now, until we get a bit more organized, we :only need one yea vote: Matt's. I *don't* want to inconvenience his plans :any (especially not when I'm really sure I don't understand them all :yet). : :Is Larry's offer OK with you, Matt? We need off the FreeBSD lists, before :complaints start up. We can advertise later, if it's necessary. I've got a bunch of mailing lists already set up on dragonflybsd.org. -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
: before :complaints start up. We can advertise later, if it's necessary. : : I've got a bunch of mailing lists already set up on dragonflybsd.org. :I didn't notice. Sorry for stepping all over you. : :LER : :Larry Rosenman http://www.lerctr.org/~ler No biggy! I would have gotten back to you sooner but I've been typing nearly uninterrupted for 6 hours answering email and just now catching up. The dragonfly lists will be where most of the meat is, but I will certainly post major achievements to -hackers. I also hope to get a list archive browser interface up today or tomorrow for lurkers. -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Annoucning DragonFly BSD!
Announcing DragonFly BSD! http://www.dragonflybsd.org/ Hello everyone! For the last few months I have been investigating and then working on a new approach to the BSD kernel. This has snowballed into a far more ambitious project which is now ready for wider participation. It is the intent of this project to take over development of the 4.x tree, to move kernel development along an entirely new path towards SMP, and to completely rewrite the packaging and distribution system. We eventually intend to backport many FreeBSD-5 features into the new tree, but that is not where the initial focus will be. The preliminary 'proving' work I have done is now available on the new DragonFly site. You can access it through cvsup or browse it through ftp. This proving work involved implementing much of the earlier UP-SMP conversion work that was done when 5.x first branched, but under an entirely new mutex-free light weight kernel threading infrastructure. It includes the LWKT system, interrupt threads, and pure threads for system processes among other things. For obvious reasons the codebase will only run on i386 for now, and ports to other platforms will not happen until the MD infrastructure is cleaned up and finalized. I considered starting with a 5.x base but it is simply too heavily mutexed; it was actually faster to start with 4.x and move forward rather than to start with 5.x and move backwards. I have both UP and SMP builds working in the current codebase. I believe it proves out the core concepts quite nicely and there is much more work coming down the pipeline. The site is: http://www.dragonflybsd.org/ Hopefully my T1 can handle the cvsup load. Eventually I'll colocate some boxes to deal with that issue. For the next few months the project is going to concentrate on low level kernel development. 
There are still a number of big ticket items that have to be accomplished, primarily in converting the I/O path to using VM Object/range lists, before work can branch out into other areas. I expect the project to start fairly slowly but then for momentum to build. Anyone interested in working on or discussing the project is welcome! I have created a mailing list server and newsgroup forums and I am working on web-accessibility to same for passive listeners. I will be posting periodic updates to freebsd-hackers as well. Again, the site is below. It contains a great deal of documentation and other information. I even have a mascot! And, hopefully, it will all work from outside my LAN :-) http://www.dragonflybsd.org/ -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
: :Is it real or another troll? : :-Maxim I stupidly misspelled 'announcing' in the subject line, but it's very real. Check the site out: http://www.dragonflybsd.org/ It's basically the reason why I've been so quiet lately. I've been working 12 hours a day on proving the model out. -Matt ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Annoucning DragonFly BSD!
: : I stupidly misspelled 'announcing' in the subject line, : :Well, at least you didn't misspell your name... :-) : : but it's very real. Check the site out: : : http://www.dragonflybsd.org/ : :The site looks interesting. All the kernel-level stuff is :pretty much over my head, but I will be interested in the :package-level ideas (once work starts on that part). : :-- :Garance Alistair Drosehn= [EMAIL PROTECTED] I have the beginnings of an idea for how to do the packaging stuff properly... and how to automate it so one gets the dependencies correct. To realize the idea I will have to get to the point where a VFS layer can run in userland. Then it becomes trivial to build 'filtering VFS layers' that run in userland (i.e. don't take up kernel resources) which can be used to figure out *exactly* what a package references in the system and *exactly* what it affects. Once one is able to do that the dependency and conflict information can not only be completely automated, but one can theoretically (and automatically) create 'environments' for each package to run in which resolve the conflicts (i.e. make sure that the exact versions of third party shared libraries and so forth used by a package are the ones that package sees even if multiple versions would normally install over each other). I'm guessing 6 months until the project gets to that point, owing to the complexity of fixing VFS and, in particular, VFS_LOOKUP. But then, watch out! Infinite application... not only to the packaging system, but to the concept of how jails should work as well, and many other things. -Matt
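The 'filtering VFS layer' idea above can be mocked up in a few lines to show the two roles it would play: recording what a package touches, and then presenting an environment that exposes only those paths. Everything here is a stand-in chosen for the example (the class names, the dict-backed "filesystem", the library paths); a real layer would sit in the VFS lookup path rather than wrap a Python dict.

```python
class DictFS:
    """Stand-in for a real filesystem: maps paths to file contents."""
    def __init__(self, files):
        self.files = dict(files)

    def open(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]

class RecordingLayer:
    """Pass lookups through to the lower layer, remembering every path asked for."""
    def __init__(self, lower):
        self.lower = lower
        self.referenced = set()

    def open(self, path):
        self.referenced.add(path)
        return self.lower.open(path)

class EnvironmentLayer:
    """Expose only the paths a package is known to depend on; hide the rest."""
    def __init__(self, lower, allowed):
        self.lower = lower
        self.allowed = set(allowed)

    def open(self, path):
        if path not in self.allowed:
            raise FileNotFoundError(path)
        return self.lower.open(path)

if __name__ == "__main__":
    base = DictFS({"/usr/lib/libfoo.so.1.2": "v1.2", "/usr/lib/libfoo.so.1.3": "v1.3"})
    probe = RecordingLayer(base)
    probe.open("/usr/lib/libfoo.so.1.2")          # package run under observation
    env = EnvironmentLayer(base, probe.referenced)
    print(sorted(probe.referenced), env.open("/usr/lib/libfoo.so.1.2"))
```

The recorded set is what would feed the automated dependency list; the environment built from it is what would keep a package from silently picking up an unlisted library.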
Re: libthr and 1:1 threading.
:Terry Lambert wrote: : Peter Wemm wrote: : No. It gives the ability for a thread to block on a syscall without : stalling the entire system. Just try using mysqld on a system using libc_r : and heavy disk IO. You can't select() on a read() from disk. Thats the : ultimate reason to do it. The SMP parallelism is a bonus. : : Bug in FreeBSD's NBIO implementation. A read() that would result : in page-in needs to queue the request, but return EAGAIN to user : space to indicate the request cannot be satisfied. Making select() : come true for disk I/O after the fault is satisfied is a seperate : issue. Probably need to pass the fd all the way down. : :Umm Terry.. we have zero infrastructure to support this. : :Cheers, :-Peter :-- :Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] :All of this is for nothing if we don't go to the stars - JMS/B5 It would be a very bad idea anyway. If someone is that dependent on detecting page-fault or page-in behavior during I/O then they ought to be using AIO (which does queue the request), not read(), or they should wire the memory in question. I think I know what Terry wants... the best of both worlds when faced with the classic performance tradeoff between a cached synchronous operation and an asynchronous operation. Giving read() + NBIO certain asynchronous characteristics solves the performance problem but breaks the read() API (with or without NBIO) in a major way. A better solution would be to give AIO the capability to operate synchronously if the operation would occur in a non-blocking fashion (inclusive of blockages on page faults), and asynchronously otherwise. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: libthr and 1:1 threading.
That's a cute trick. The ultimate solution is to implement a semi-synchronous message passing API to replace the myriad system calls we have now. Roughly speaking, what the Amiga did for messages, ports, and I/O is far superior to what is done in Linux and *BSD. You get the benefit of being able to operate synchronously when possible, and having a convenient cup-holder for the operation state if you decide you have to 'block' the operation (instead of the state being strewn all over the call stack in a syscall implementation). Userland can decide whether to block or not block on an operation entirely independent of the OS deciding whether to block or not block on an operation. -Matt Matthew Dillon [EMAIL PROTECTED] :Without wanting to get too far off into the weeds, squid does something :interesting. They need to be able to nonblock for everything including :open(), read(), unlink(), readdir() etc. So what they do is implement a :fairly significant superset of the traditional AIO stuff using pthreads. It :seems to work pretty well for them, even with linuxthreads style threads. :Granted, squid's needs are not exactly typical. But I did want to point :out that a good part of the delays come not only from data IO but operations :like opening a file (pathname traversal), creating or removing a file, :reading a directory etc. This is a particular problem when the disk :is really busy. : :Cheers, :-Peter :-- :Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] :All of this is for nothing if we don't go to the stars - JMS/B5
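To make the 'semi-synchronous' idea a little more concrete, here is a small model in which a message either completes immediately or is parked with all of its state inside the message itself. The names Message, Port, send and run_pending, and the cache/backing-store split, are invented for this sketch and are not taken from the Amiga or any BSD API.

```python
from collections import deque

DONE, IN_PROGRESS = "done", "in-progress"

class Message:
    """Holds all state for one operation, so nothing lives on a call stack."""
    def __init__(self, op, key):
        self.op, self.key = op, key
        self.state, self.result = None, None

class Port:
    """Hypothetical service port backed by a fast cache and a slow store."""
    def __init__(self, cache, backing):
        self.cache, self.backing = cache, backing
        self.pending = deque()

    def send(self, msg):
        if msg.key in self.cache:
            # Fast path: finish synchronously because the data is at hand.
            msg.result, msg.state = self.cache[msg.key], DONE
        else:
            # Slow path: park the message; the caller chooses whether to wait.
            msg.state = IN_PROGRESS
            self.pending.append(msg)
        return msg.state

    def run_pending(self):
        """Stand-in for the service completing queued work at a later time."""
        while self.pending:
            msg = self.pending.popleft()
            msg.result = self.cache[msg.key] = self.backing[msg.key]
            msg.state = DONE
```

A caller that gets IN_PROGRESS back can go do something else or wait on the message; either way the decision is the caller's, which is the separation between userland blocking and OS blocking that the message argues for.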
Re: libthr and 1:1 threading.
:How does this break the read() API? The read() API, when called :on a NBIO fd is *supposed* to return EAGAIN, if the request cannot :be immediately satisfied, but could be satisfied later. Right now, :it blocks. This looks like breakage of disk I/O introducing a :stall, when socket I/O doesn't. : :If this breaks read() semantics, then socket I/O needs fixing to :unbreak them, right? Oh please. You know very well that every single UNIX out there operates on disk files as if their data was immediately available regardless of whether the process blocks in an uninterruptible disk wait or not. What you are suggesting is that we make our file interface incompatible with every other unix out there... ours will return EAGAIN in situations where programs wouldn't expect it. Additionally, the EAGAIN operation would be highly non-deterministic and it would be fairly difficult for a program to rely on it because there would be no easy way (short of experimentation or a sysctl) for it to determine whether the 'feature' is present or not. Also, the idea that the resulting block I/O operation is then queued and one returns immediately from way down deep in the filesystem device driver code, and that this whole mess is then tied into select()/kqueue()/ poll(), is just asking for more non-determinism... now it would depend on the filesystem AND the OS supporting the feature, and other UNIX implementations (if they were to adopt the mechanism) would likely wind up with slightly different semantics, just like O_NONBLOCK on listen() sockets has wound up being broken on things like HPUX. For example, how would one deal with, say, issuing a million of these special non-blocking read()s, all of which fail? Do we queue a million I/Os? Do we queue just the last requested I/O? You see the problem? The API would be unstable and almost certainly implemented differently on each OS platform. 
A better solution would be to implement a new system call, similar to pread(), which simply checks the buffer cache and returns a short read or an error if the data is not present. If the call fails you would then know that reading that data would block in the disk subsystem and you could back off to a more expensive mechanism like AIO. If you want to select() on it you would then simply use kqueue with EVFILT_AIO and AIO. The call could be a new pread_cache(), or perhaps we could even use recvmsg() with a flag. Such an interface would not have to touch the filesystem code, only the buffer cache and the VM page cache, and could be implemented in less than a day. -Matt
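The proposed cache-only read and its fallback could behave roughly as in the model below. pread_cache() is only the name floated in the message; its signature here, the page-set model of the cache, the 4096-byte page size, and the slow_read callback standing in for AIO are all assumptions made for the sketch.

```python
PAGE = 4096  # assumed page size for the model

def pread_cache(cached_pages, file_pages, offset, length):
    """Return only the bytes available from cache, stopping at the first miss.

    cached_pages: set of page numbers resident in memory (a modelling choice).
    file_pages:   dict mapping page number -> bytes of that page.
    """
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        page = pos // PAGE
        if page not in cached_pages:
            break                      # would block on disk: return a short read
        start = pos % PAGE
        take = min(PAGE - start, end - pos)
        out += file_pages[page][start:start + take]
        pos += take
    return bytes(out)

def read_with_fallback(cached_pages, file_pages, offset, length, slow_read):
    """Try the cache-only read first; hand the remainder to a slower path."""
    data = pread_cache(cached_pages, file_pages, offset, length)
    if len(data) < length:
        # In the scheme from the message this is where AIO would be queued.
        data += slow_read(offset + len(data), length - len(data))
    return data
```

The point of the split is that the cheap call never blocks and never queues anything, so the "million failed reads" question does not arise: nothing is queued until the caller explicitly chooses the expensive path.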
Re: 5.x locking plan
:My curiousity has overcome my fear of the bikeshed so I'll ask the :question that has been bugging me for a while. Why haven't we gone :through the tree and created a lock for each spl and then converted every :spl call into the appropriate mtx_lock call? At that point, we can mark :large sections of the tree giant-free and then make the locking data-based :(instead of code-based) one section at a time. This is the approach :Solaris took. : :-Nate The problem is that SPLs are per-thread masks, and different sets of bits can be added or removed from the master mask in any order and at any time. There is no direct translation to a mutex (which cannot be obtained in random order, is not per-thread, and may result in preemption or a context switch). Most of the code locked under Giant assumes the single-threading of kernel threads regardless of the SPL. This 'inherent' single threading is one of the reasons why the original code was so efficient. Since preemption can occur now under many new circumstances, including when 'normal' (non-spin) mutexes are used to replace prior uses of SPLs (which could not cause thread level preemption)... well, it basically means there is no easy way to remove Giant short of going through every bit of code and fixing it one subsystem at a time. Giant itself is a special case. It is not a normal mutex. Instead, the kernel very carefully saves and restores the state of Giant on a per-thread basis so programs don't 'need to know' whether Giant is being held or not and so Giant can be held in combination with another mutex without violating the basic 'only one mutex can be held when going to sleep' rule. -Matt Matthew Dillon [EMAIL PROTECTED]
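A toy model may help show why a mask does not map onto a mutex. Following the description above, each thread carries its own mask, bits are OR-ed in and restored in whatever order the code happens to run, and nothing ever waits. The bit values and the helper names spl_raise and interrupt_allowed are illustrative only; splx is used here simply as the conventional name for restoring a saved level.

```python
SPL_NET, SPL_BIO, SPL_TTY = 0x1, 0x2, 0x4    # illustrative bit values

class KThread:
    def __init__(self):
        self.spl_mask = 0        # interrupt classes this thread currently blocks

def spl_raise(thread, bits):
    """Block additional interrupt classes; return the old mask for splx()."""
    old = thread.spl_mask
    thread.spl_mask |= bits
    return old

def splx(thread, old_mask):
    """Restore a previously saved mask."""
    thread.spl_mask = old_mask

def interrupt_allowed(thread, bit):
    """True if the given interrupt class is not masked for this thread."""
    return not (thread.spl_mask & bit)
```

Raising SPL_NET then SPL_BIO, or the reverse, ends in the same mask and neither call can block, whereas two mutexes taken in opposite orders by two threads can deadlock and acquiring one may force a context switch. That difference is why a mechanical spl-to-mtx_lock conversion does not work.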
Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S
:Seems like diskless clients would have to have separate kernels with :the option BOOTP while any servers must omit this option. : :How do you keep them separate? or am I missing something fundamental? : :Thanks. You can compile pxeboot with the LOADER_TFTP_SUPPORT=YES option. Add LOADER_TFTP_SUPPORT=YES to your /etc/make.conf and recompile /usr/src/sys/boot (make clean; make obj; make; make install from within /usr/src/sys/boot). If you do this pxeboot will attempt to load the kernel via TFTP instead of via NFS. You then put your kernel in /tftpboot right alongside a copy of pxeboot. This allows you to netboot a different kernel than the one in the server's root directory. :PS: could you show me your dhcpd.conf so I can see how you're :specifying your root filesystem? Mine's currently: : option root-path "192.168.255.185:/";

    subnet 216.240.41.0 netmask 255.255.255.192 {
        range ...;
        server-name "apollo.backplane.com";
        option subnet-mask 255.255.255.192;
        option domain-name-servers 216.240.41.2;
        option domain-name "backplane.com";
        option broadcast-address 216.240.41.63;
        option routers 216.240.41.15;

        group {
            filename "pxeboot";
            option root-path "216.240.41.2:/";

            host net1 {
                # Alternative server to boot -current
                # option root-path "216.240.41.12:/";
                # next-server 216.240.41.12;
                hardware ethernet ...;
            }
            host net2 {
                hardware ethernet ...;
            }
            ...
        }
    }

-Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S
4.x and -current use the same mechanism, except 4.x uses MFS and -current uses MD. Ignore the handbook. Try 'man diskless'. kenv is only used in current's rc.diskless scripts, and it resides in /bin on -current. kenv is not used in 4.x's diskless scripts. Basically what you do is create files and directories in /conf/base and /conf/default which are used to populate the MFS/MD root and other directories. I have included my setup at the end. :I was running a VIA Mini-ITX diskless box off a 4.7-STABLE box for a :while using a root fs created by the clone_root discussed in the :handbook, then some tweaks. I'm having a heck of a time trying to get :this running under 5.0-RELEASE, now sync'd to 5.0-CURRENT as of :yesterday, then mergemastered. : :If someone can provide some clues or pointers, I'd be happy to doc how :I get it to work (for the Handbook?) and could take a stab at updating :clone_root for 5.x if it's needed. : : :Background: : :Been using FreeBSD since 2.2.x. I can code. I can RTFM. :-) : :I've read the 5.0 Release Notes and Early Adopters docs. :... : :Upon boot, after kernel loaded, console shows a bunch of rc.conf-style :vars being set, then spews some debugging which I put in :$DISKLESSROOT/conf/default/etc/rc.d/diskless, so it's running that :rather than the old /etc/rc.diskless* files. I've moved the mount -a :near the top of rc.d/diskless since it runs commands which are and not :available until /usr is mounted (e.g., mtree). The NFS mount fails :with a message I don't understand: : :Can someone point me in the right direction ? Thanks! I'm not sure what you are doing here. You don't want to override any rc.d files in /conf. That will blow things up for sure. 
Ok, here is an ls -lR of my setup:

    # ls -lR /conf
    total 2
    drwxr-xr-x  5 root  wheel  512 Dec 21 10:37 base
    drwxr-xr-x  3 root  wheel  512 Dec 19 21:56 default

    /conf/base:
    total 5
    drwxr-xr-x  2 root  wheel  512 Dec 21 10:37 dev
    drwxr-xr-x  2 root  wheel  512 Dec 19 22:22 etc
    -rw-r--r--  1 root  wheel   11 Dec 20 15:38 etc.remove
    drwxr-xr-x  2 root  wheel  512 Dec 20 14:31 root
    -rw-r--r--  1 root  wheel   12 Dec 20 15:38 root.remove

    /conf/base/dev:
    total 2
    -rw-r--r--  1 root  wheel   18 Dec 21 10:37 diskless_remount
    -rw-r--r--  1 root  wheel    6 Dec 19 22:22 md_size

    /conf/base/etc:
    total 2
    -rw-r--r--  1 root  wheel   18 Dec 19 22:10 diskless_remount
    -rw-r--r--  1 root  wheel    6 Dec 19 22:22 md_size

    /conf/base/root:
    total 2
    -rw-r--r--  1 root  wheel   19 Dec 20 14:31 diskless_remount
    -rw-r--r--  1 root  wheel    5 Dec 20 14:31 md_size

    /conf/default:
    total 1
    drwxr-xr-x  3 root  wheel  512 Dec 20 11:18 etc

    /conf/default/etc:
    total 4
    -rw-r--r--  1 root  wheel  184 Feb 18 18:16 fstab
    -rw-r--r--  1 root  wheel  867 Dec 21 00:04 rc.conf
    -rw-r--r--  1 root  wheel  197 Feb 18 18:19 rc.local

And some cats:

    apollo:/conf# cat /conf/base/dev/diskless_remount
    216.240.41.2:/dev
    apollo:/conf# cat /conf/base/dev/md_size
    16384
    apollo:/conf# cat /conf/base/etc/diskless_remount
    216.240.41.2:/etc
    apollo:/conf# cat /conf/base/etc/md_size
    16384
    apollo:/conf# cat /conf/base/root/diskless_remount
    216.240.41.2:/root
    apollo:/conf# cat /conf/base/root/md_size
    8192

    apollo:/conf# cat /conf/default/etc/rc.conf
    hostname=mobile.backplane.com
    nfs_client_enable=YES
    local_startup=
    ip_portrange_first=4000
    ip_portrange_last=8192
    syslogd_enable=NO
    firewall_enable=YES
    firewall_type=/etc/ipfw.conf
    ntpdate_enable=YES
    ntpdate_flags=apollo.backplane.com
    xntpd_enable=YES
    sshd_enable=YES
    sendmail_enable=NO
    linux_enable=NO

    apollo:/conf# cat /conf/default/etc/fstab
    # Device          Mountpoint  FStype  Options  Dump  Pass#
    apollo:/usr       /usr        nfs     ro       0     0
    apollo:/FreeBSD   /FreeBSD    nfs     ro       0     0
    apollo:/backup1   /usr/obj    nfs     rw       0     0
    proc              /proc       procfs  rw       0     0

That's basically it.
You can also use /conf/base/dir.remove and /conf/default/dir.remove files to list files to remove from the clone. You don't have to specify a network config in your /conf/default/etc/rc.conf because the network will have already been set up. -Matt Matthew Dillon [EMAIL PROTECTED]
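For anyone wanting to reproduce the /conf/base part of the setup shown in this message, the small script below writes the same diskless_remount and md_size files under a directory of your choosing. The values are copied from the cat output above; the function name, the CONF_BASE table, and the idea of generating the tree from a script at all are illustrative assumptions, and the script deliberately does not touch /conf/default or the *.remove files.

```python
import os

# Values copied from the cat output in the message; everything else here
# (names, structure) is a hypothetical convenience.
CONF_BASE = {
    "dev":  {"diskless_remount": "216.240.41.2:/dev",  "md_size": "16384"},
    "etc":  {"diskless_remount": "216.240.41.2:/etc",  "md_size": "16384"},
    "root": {"diskless_remount": "216.240.41.2:/root", "md_size": "8192"},
}

def write_conf_base(conf_root):
    """Create <conf_root>/base/<dir>/{diskless_remount,md_size} files."""
    for dirname, files in CONF_BASE.items():
        target = os.path.join(conf_root, "base", dirname)
        os.makedirs(target, exist_ok=True)
        for name, value in files.items():
            # One value per file, terminated by a newline.
            with open(os.path.join(target, name), "w") as fh:
                fh.write(value + "\n")

if __name__ == "__main__":
    write_conf_base("./conf-example")   # writes into a scratch directory
```

Pointing it at a scratch directory first and comparing the result with ls -lR is a cheap way to check the layout before copying anything onto the NFS server.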
Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S
I'm not sure what is occurring here but it sounds like cockpit trouble somewhere. Make sure your NFS server is exporting to your subnet and that it is running the necessary services (portmap, mountd, nfsd -t -u -n 4). If you have another box that you can boot normally (not netboot), test the NFS server from that box by mounting / and /usr: other# mount 192.168.255.185:/usr /mnt Or, if you have no other box, make sure you can mount the server onto the server itself as a test: nfsserver# mount localhost:/usr /mnt nfsserver# mount myaddress:/usr /mnt (aka 192.168.255.185) If you are running a firewall on the server, make sure it is letting NFS through to your LAN. It is also possible that someone has broken something in NFS recently. The -current I am running (which works fine as a server for my EPIA 5000 and EPIA M 9000) is several weeks old. If your /usr partition is on / on your server (i.e. not its own partition), then remember to use the -alldirs option in /etc/exports for / and /usr. If /usr is on its own partition you don't need -alldirs unless you are trying to mount a subdirectory in / or /usr. You *might* need -alldirs on your / export. In any case, I always set -alldirs on all my read-only exports and that is what I would recommend you do too. -Matt :Chris Shenton [EMAIL PROTECTED] writes: : : I've moved the mount -a near the top of rc.d/diskless since it : runs commands which are and not available until /usr is mounted : (e.g., mtree). The NFS mount fails with a message I don't : understand: : : [udp] pectopah.shenton.org:/usr: RPCPROG_NFS: RPC: Unknown host : :Tasteless self-followup: : :I get the same error when the boot process fails and drops me to a :shell; I can get it with UDP or TCP mounts. 
For example: : : mount_nfs -U -2 192.168.255.185:/usr /mnt : mount_nfs -U -3 192.168.255.185:/usr /mnt : mount_nfs -T -2 192.168.255.185:/usr /mnt : mount_nfs -T -3 192.168.255.185:/usr /mnt : :The only difference is the [udp] vs [tcp] in the error msg: : : [tcp] 192.168.255.185:/usr: RPCPROG_NFS: RPC: Unknown host : :I sniffed traffic with tcpdump and ethereal: the diskless client is :contacting the server so it's not having problems resolving that IP :addr. :... To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S
:Matthew Dillon [EMAIL PROTECTED] writes: : : 4.x and -current use the same mechanism, except 4.x uses MFS and : -current uses MD. : :4.x uses /etc/diskless[12] while 5.x (by default) uses :/etc/rc.d/(init)?diskless. The latter is works very differently than :the former. 4.x uses /etc/rc.diskless[12]. I synchronized most of -current and -stable's rc.diskless scripts a month or two ago so even though parts of the scripts are different (like the use of MFS instead of MD), they should operate according to the same /conf rules. : kenv is only used in current's rc.diskless scripts, and it : resides in /bin on -current. : :Not on mine: : : chris@Pectopah103 whereis kenv : kenv: /usr/bin/kenv /usr/share/man/man1/kenv.1.gz /usr/src/bin/kenv : chris@Pectopah104 ls /bin/kenv : ls: /bin/kenv: No such file or directory Are you sure you have done a recent buildworld/installworld? It sounds like you haven't. In -current kenv is in /bin (i.e. the source is in /usr/src/bin/kenv on -current) as of the 15th of this month. But it still should have worked because /usr should have been mounted by then. : Basically what you do is create a files and directories in : /conf/base and /conf/default which are used to populate the : MFS/MD root and other directories. I have included my setup : at the end. : :Which startup scripts are you running, old diskless[12] or new :rc.d/(init)?diskless ? Both should work the same. The new rc.d/initdiskless and rc.d/diskless run the same rules that rc.diskless1 and rc.diskless2 do. :Thanks for your examples, I'll plow through them tonight. But -- more :below -- these sure look like 4.x-compatible stuff, not 5.0. 4.x and 5.x work the same. That is, -stable and -current work the same. If you are running an older 4.x (possibly even the last 4.x release) it may not have the new scripts. :Actually, I don't see any code to look for that md_size or :diskless_remount in either of 5.0's rc.diskless[12] or :rc.d/(init)?diskless. 
I do know that what you're describing is in :4.x's rc.diskless[12], and I did have that working on a 4.7S system. :That's why I'm having so much trouble with the 5.0 diskless boot -- :everything's changed. :.. :Thanks a bunch! You must be working off an out of date source tree. I have included -current's current /usr/src/etc/rc.d/initdiskless script below for you to compare against your sources. If your sources are out of date you should update them (e.g. via cvsup or whatever you use to get the sources). As you can see, the initidiskless script is full of references to md_size :-) -Matt Matthew Dillon [EMAIL PROTECTED] #!/bin/sh # # Copyright (c) 1999 Matt Dillion # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright #notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright #notice, this list of conditions and the following disclaimer in the #documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. 
# # $FreeBSD: src/etc/rc.d/initdiskless,v 1.23 2003/02/15 16:29:20 jhay Exp $ # # PROVIDE: initdiskless # KEYWORD: FreeBSD # On entry to this script the entire system consists of a read-only root # mounted via NFS. We use the contents of /conf to create and populate # memory filesystems. The kernel has run BOOTP and configured an interface # (otherwise it would not have been able to mount the NFS root!) # # The following directories are scanned. Each sucessive directory overrides # (is merged into) the previous one. # # /conf/base universal base # /conf/default modified by a secondary universal base # /conf/${ipba} modified based on the assigned broadcast IP # /conf/${ip} modified based on the machine's assigned IP
Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S
:OK, I'll change this to tag=. and re-cvsup, try again. A big doh! : :Many thanks. Yup, the 5.0 release tag was messing up your cvsup. Keep at it! Once you get the diskless stuff working with the more recent sources it will become a whole lot easier to maintain it. For one thing, you won't have to maintain a complete copy of /etc any more :-), just overrides for particular files. I'm not sure what is going on with your NFS issues, but hopefully you will be able to solve it with a fully updated system. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: Hyperthreading and machdep.cpu_idle_hlt
: The ideal situation would be to have as Matt (and the comment : actually) says a cpu mask of idle cpus and generate an IPI to wake up : CPUs sitting in HLT when something hits the runqueue, then you can : just hlt all of them and rely on the IPI to wake you up, or the next : timer tick, whichever comes first and you can really get the best of : both worlds. : :I think it's more complicated than that; you don't want to have :anything other than the CPU that owns the per CPU run queue doing :anything with it, which means that it's the wakeup event, not the :arrival on the run queue, which needs to be signalled. Then the :CPU in question has to do its own processing of pending wakeup :events in order to handle the placing of the process on the run :queue itself, rather than it being handled by another CPU. : :This also implies per-CPU wait queues, and a reliable message :delivery mechanism for wakeup messages. : :Though it may be enough to simply mark everything on the wait :queue as wakeup pending, for a first rev., and run the wait :queue, it's probably not a good idea for a production system, :since it brings back the Giant Scheduler Lock for the wait queue :(on the plus side, items awakened could be moved to the head of :the queue when they were marked, with the lock held anyway, and :that would shorten the list of traversed items per CPU to all :pending wakeup processing, rather than all queue entries). :But it's still too ugly for words. The HLT/clock interrupt issue is precisely what I describe in the idle_hlt comments in i386/i386/machdep.c (last July). I wish we had a better mechanism than the stupid IPI stuff, like a simple per-cpu latch/acknowledge level interrupt (softint), but we don't. I don't think we want to over-engineer per-cpu scheduling. The system really doesn't know what cpu a task is going to wind up running on until a cpu scheduler (sched_choose()) comes along and needs to locate the next task to run. 
Too many things can happen in between the initiation of the wait, the wakeup, and the task actually getting cpu. Introducing a complex per-cpu wait queue or trying to do something complex at wakeup time instead of at sched_choose() time is just going to be a waste of time. I think it is best to wake up a task by placing it on the same cpu run queue as it was previously on (which is what Jeff's code does for the most part), and deal with task stealing in sched_choose(). The scheduler, when it comes time to actually switch in the next runnable task, then deals with complexities associated with misbalancing (i.e. cpu A is idle and ready to accept a new task, and cpu B's run-queue has a task ready to be run). While it is true that we would like a cpu to predominantly use the per-cpu run-queue that it owns, we don't really lose anything in the way of performance by allowing cpu A to add a task to cpu B's run queue or for cpu A to steal a task from cpu B's run queue. Sure we have the overhead of a per-cpu mutex, but the reason we don't lose anything is that this sort of mechanism will *STILL* scale linearly with the number of cpus in the system (whereas the global run queue in sched_4bsd.c constricts at a single sched_mtx and does not scale). The overhead of a per-cpu run-queue with a per-cpu mutex is *STILL* effectively O(1) and the more complex overheads involved with locating a new task to schedule from some other cpu's run queue when the current cpu's run-queue is empty are irrelevant because you are only eating into cycles which would otherwise be idle anyway. -Matt Matthew Dillon [EMAIL PROTECTED] :I think something like wakeup signalling, as a message abstraction, :is required, in any case, considering support for clustering or NUMA, :going forward, to deal with slower signal paths on a single system :image for much more loosely coupled CPUs. Directly modifying queues :in memory of other CPUs is unlikely to scale well, if it can even be :made to work at all. 
: :-- Terry
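The scheme argued for in the reply above (queue a woken task on the cpu it last ran on, and let an idle cpu steal work when it goes looking for something to run) can be sketched roughly as follows. This is only an illustration, not the scheduler code under discussion: NCPU, QLEN, struct runq, runq_add() and choose_task() are hypothetical names, task ids are assumed to be non-negative, and the per-cpu mutex is left as a comment so the sketch stays single-threaded.

```c
#include <assert.h>

#define NCPU 4		/* hypothetical number of cpus */
#define QLEN 8		/* hypothetical per-queue capacity */

struct runq {
	int tasks[QLEN];	/* FIFO ring of non-negative task ids */
	int head;
	int count;
	/* A real kernel would keep one mutex per queue here. */
};

static struct runq runqs[NCPU];

/* Queue a task on the run queue of the cpu it last ran on. */
int
runq_add(int cpu, int task)
{
	struct runq *rq;

	assert(cpu >= 0 && cpu < NCPU);
	rq = &runqs[cpu];
	if (rq->count == QLEN)
		return (-1);
	rq->tasks[(rq->head + rq->count) % QLEN] = task;
	rq->count++;
	return (0);
}

/* Remove the oldest task from one queue, or return -1 if it is empty. */
static int
runq_take(struct runq *rq)
{
	int task;

	if (rq->count == 0)
		return (-1);
	task = rq->tasks[rq->head];
	rq->head = (rq->head + 1) % QLEN;
	rq->count--;
	return (task);
}

/*
 * Pick the next task for `cpu`: its own queue first, then steal from
 * the other queues.  The stealing scan only runs when the local queue
 * is empty, i.e. in cycles that would otherwise be idle.
 */
int
choose_task(int cpu)
{
	int i, task;

	assert(cpu >= 0 && cpu < NCPU);
	task = runq_take(&runqs[cpu]);
	for (i = 1; task < 0 && i < NCPU; i++)
		task = runq_take(&runqs[(cpu + i) % NCPU]);
	return (task);
}
```

In this sketch the common path touches only the local queue; the scan over other queues is the part the message describes as irrelevant to performance.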
Re: Style fixups for proc.h
:Julian Elischer writes: : I don't know about the protection with a '_'. : : It's not standard and usually the name matches that used in the actual : function. : :When the prototype parameter name matches a local variable, the C compiler :(and lint) whine about clashes between names in local/global namespace. I've never in my life heard of this behavior before, what compiler arguments reproduce it? :2 ways to fix this are to protect the prototype argument names with the :_, or to remove the argument name altogether. If it is a problem, why not simply use the same variable names that are declared in the procedure proper? The underscore looks ugly and out of place and doesn't make that much sense to me. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Style fixups for proc.h
:WARNS=5. This isn't helpful. I tried adding every -W switch in bsd.sys.mk and couldn't reproduce the problem. What compiler option is causing the problem? : :2 ways to fix this are to protect the prototype argument names with the : :_, or to remove the argument name altogether. : : If it is a problem, why not simply use the same variable names that are : declared in the procedure proper? The underscore looks ugly and out of : place and doesn't make that much sense to me. : :Because this doesn't always help, or if it did, the diffs are often :much bigger and to many more files. : :M :-- :Mark Murray :iumop ap!sdn w,I idlaH Ok, now I'm really confused. How can it not always help? If the arguments are the same as the arguments declared in the underlying procedures why would an error still be produced? The diff you produced for proc.h is *already* fairly extensive. If you want to fix this, you only need to fix the lines generating compiler warnings. I really dislike screwing around with source code to work around bugs in the compiler, or lint. Given the choice of underlines or leaving the arguments unnamed, I would leave them unnamed. Or I would figure out and remove whatever broken compiler option is generating the warning in the first place. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: Style fixups for proc.h
:If a named prototype clashes with something in global scope, :isn't it still a shadowing issue? They should probably never :be *in* scope. :-- :Juli Mallett [EMAIL PROTECTED] :AIM: BSDFlata -- IRC: juli on EFnet

I finally was able to reproduce the bug. But it's an obvious bug in the compiler, at least in regards to named arguments in prototypes. The -Wshadow switch causes the error message to be produced, and it occurs for named arguments in prototypes, for named arguments in the procedure proper, and for named locals in the procedure. The solution is simply to name the arguments in the prototype the same as the arguments in the procedure. If we are going to use -Wshadow then the arguments in the procedure will generate the error too, so naming the prototype arguments the same will not make things any better OR worse. So the prototype arguments should either be named the same as those in the procedure proper, or should not be named at all. We definitely should not go adding underscores to the prototypes to work around this problem. -Matt Matthew Dillon [EMAIL PROTECTED]

    int x;

    int fubar(int x);

    int
    fubar(int y)
    {
        int x = 2;
        ++x;
        y = 1;
        return(1);
    }

    int
    fubar2(int x)
    {
        ++x;
        return(1);
    }
Re: Style fixups for proc.h
:I really dislike screwing around with source code to work around :bugs in the compiler, or lint. Given the choice of underlines :or leaving the arguments unnamed, I would leave them unnamed. Or I :would figure out and remove whatever broken compiler option is generating :the warning in the first place. : :Then can we just get the proc.h prototypes into a (any) consistent :style? : :M :-- :Mark Murray

Let's ask ourselves what the goal of the named prototypes is... the compiler doesn't need them, obviously, so one would presume that the goal is human readability. So if we care about human readability we should simply name them after the argument names used in the procedures proper. If we don't care about human readability we should omit the names entirely. An underscore would be detrimental to human readability. It makes the prototypes look rather nasty when I look at the fully patched proc.h, and also makes them different from the arguments as declared in the procedures proper.

A quick perusal of include files shows that we use a mix. Examples:

    sys/acl.h     -- looks like the authors tried to use the underscore technique but forgot a couple.
    sys/aio.h     -- a mix of named (without underscore) and unnamed.
    sys/blist.h   -- named prototypes without underscore (mine originally)
    sys/buf.h     -- a mix of named (without underscore) and unnamed. Mostly unnamed, and __P() is still being used. (the named one is probably mine).
    sys/callout.h -- unnamed.
    sys/conf.h    -- mostly named (without underscore) (not mine)
    sys/cons.h    -- unnamed

And it goes on. Quite a mess we have, actually. We still have __P in many places. The newest header file would arguably be acl.h in which the author used underscores. I can't say I like the way it reads on the screen. Older header files either still have __P, don't have __P and the arguments are named (typically without an underscore), or mix with some of the arguments named and some not (some wholly not).

-Matt Matthew Dillon [EMAIL PROTECTED]
Re: split out patch
:02:59:24 -0800 (PST). The date/time stamp on the message that I am :replying to is Sat, 1 Feb 2003 10:47:44 -0800 (PST). That's :something around seven hours and forty-five minutes, unless I have :miscalculated. : : Is it really normal to expect replies within that kind of a time :frame, especially since we're talking about 3:00 AM to 10:45 AM, and :most people are likely to be asleep? Granted, not everyone is in :PST, but it's still a relatively quiet period of time for most people :... : I'm not questioning the patch at all, just the apparent impatience. :-- :Brad Knowles, [EMAIL PROTECTED] Well, it is an active conversation/thread. Either people care enough to stay involved or they don't. Considering how much time has been wasted so far bickering back and forth over a commit that was *almost* corrected before the inevitable calls for a backout, and the fairly hostile history between some of the participants, I can understand Julian's impatience. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Style fixups for proc.h
Well, there is something to be said for trying to avoid userland namespace pollution, but it is still somewhat of a stretch since most userland programs #include standard and system headers before they #include their own, and the includes are typically done before any code. But I see no reason why the underscore methodology would need to be used for kernelland prototypes. C has its problems and we need to live with them, but we shouldn't have to add bogus underscores to prototyped arguments to work around those problems. I'd prefer normally named arguments but if I were given only a choice between underscored named arguments and unnamed arguments, I'd take unnamed arguments hands down. -Matt :Actually, the pattern is that the function prototypes exposed to userspace :are prefixed with '_' to prevent interfering with the application :namespace. The ones exposed only in the kernel don't. I should probably :update the kernel ones as well. This is mostly because of the profound :evils associated with the C preprocessor, which can cause substitutions to :occur in function prototypes (this is often used intentionally). : :For an example of this evil: there appears to be a convention in which we :name structure elements with a prefix, such as m_blah, based on the :structure name. At one point I added m_flags to struct mac. When I :included mbuf.h, this became m_hdr.mh_flags, resulting in fairly obtuse :compile errors. Protecting user applications against hard to understand :compile errors is an important part of providing useful include files to :application writers, so avoiding exposing things to the application :namespace where it can be avoided. : :Robert N M Watson FreeBSD Core Team, TrustedBSD Projects To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
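The preprocessor hazard described in the quoted text can be illustrated with a small, made-up example. Every name below (struct mbuf_sketch, struct mac_sketch, mbuf_get_flags(), flags_are_set()) is hypothetical and only mimics the shape of the m_flags / m_hdr.mh_flags clash; it is not the real mbuf.h or mac.h.

```c
#include <assert.h>
#include <stddef.h>

struct m_hdr_sketch {
	int mh_flags;
};

struct mbuf_sketch {
	struct m_hdr_sketch m_hdr;
};

/* Accessor macro in the style the quoted text describes. */
#define m_flags m_hdr.mh_flags

/*
 * From here on the preprocessor rewrites every m_flags token, so
 *
 *	struct mac_sketch { int m_flags; };
 *
 * would expand to "int m_hdr.mh_flags;" and fail to compile with a
 * fairly obscure error.  A name the macro does not match is safe.
 */
struct mac_sketch {
	int mac_flags;
};

int
mbuf_get_flags(const struct mbuf_sketch *m)
{
	assert(m != NULL);
	return (m->m_flags);	/* expands to m->m_hdr.mh_flags */
}

/*
 * A prototype parameter spelled m_flags would be rewritten as well;
 * the underscore-prefixed spelling is left alone by the macro.
 */
int flags_are_set(int _m_flags);

int
flags_are_set(int _m_flags)
{
	return (_m_flags != 0);
}
```

Leaving the prototype parameter unnamed, as preferred in the reply above, sidesteps the macro just as well as the underscore does.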
Re: Hyperthreading and machdep.cpu_idle_hlt
:So, at the request of bmilekic, I ran netpipe on a hyperthreading box (non :hyperthreading, I'll do when I can turn it off in BIOS next time I'm down :there) :... : :The results are here: : :http://bsdunix.net/performance : :all information on what command line options I used is in there. : :the difference with it on is pretty substantial, might be worth noting in :tuning(7) : :-Trish : :-- :Trish Lynch [EMAIL PROTECTED]

Those results are indeed quite substantial. Before you modify tuning(7), though, let's wait a bit to see if anyone comes up with a fix to the performance issue when idle_hlt is turned off. In particular I would like to try using a per-cpu global test in the idle loop that avoids doing any locked bus cycles. Unfortunately I am not sure if I have any hyperthreading capable boxes. My primary machine is a pentium 4 but it is running -stable.

    Timecounter "i8254"  frequency 1193182 Hz
    Timecounter "TSC"  frequency 1296069572 Hz
    CPU: Pentium 4 (1296.07-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0xf07  Stepping = 7
      Features=0x3febf9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,b28,ACC>

It has HTT set but it's only a 1.2GHz box and I heard somewhere that only 2+ GHz P4's had hyperthreading. I noticed some MFCs to stable that suggested hyperthreading support but I do not know if full hyperthreading support has been MFCd yet or is intended to be MFCd to -stable. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: Hyperthreading and machdep.cpu_idle_hlt
:AFAIK, full hyperthreading support, as it is, has been merged to :-stable. It consists of a patch to recognize the virtual CPUs, so they :will be dealt with like any SMP system, as long as HTT is enabled on the :BIOS. : :-- :Daniel C. Sobral (8-DCS) :Gerencia de Operacoes Yah. Shoot, well this Sony VAIO desktop has a P4 with HTT set in it, but it doesn't have an APIC, the BIOS is clueless, and there is no mptable, so I guess I am S.O.L. in regards to using hyperthreading on this box. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Hyperthreading and machdep.cpu_idle_hlt
: Why do you think that hlt-ing the CPU(s) when idle would actually : improve performance in this case? My only suspicion is that perhaps : this reduces scheduling on the auxiliary 'logical' (fake) CPUs, : thereby indirectly reducing cache ping-ponging and abuse. I would : imagine that both units sharing the same execution engine in the : HTT-enabled model would be effectively 'hlt'-ed when one of the two : threads executes an 'hlt' until the next timer tick. : : I guess we'll wait for the two other data sets from Trish: one with : HTT off, and cpu_idle_hlt=0, and the other with HTT off, and : cpu_idle_hlt=1, before figuring this out. : :-- :Bosko Milekic * [EMAIL PROTECTED] * [EMAIL PROTECTED] I am almost certain that it is related to pipeline stalls created by the fairly long (in instructions) idle loop and the locked bus cycles used by the mutex code. It could also possibly be related to L1 cache contention. I think that if we can fit the idle loop for the 'logical' processor into a single instruction cache line and a single data cache line, accessing a single memory location without any locked bus cycles, that it would solve the problem. Unfortunately I have no boxes I can do testing on so this is just a guess. Another solution would be to have a global mask of 'idle' cpus and send an IPI to them when a new KSE is scheduled on a non-idle cpu that would simply serve to wakeup the HLT. IPIs are nasty, but there are large (power consumption) advantages to standardizing on the HLT methodology. -Matt Matthew Dillon [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
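The "global mask of idle cpus" idea at the end of the reply above might look something like the following. This is a sketch under stated assumptions, not kernel code: idle_mask, cpu_enter_idle(), cpu_leave_idle() and claim_idle_cpu() are hypothetical names, C11 atomics stand in for whatever primitives the kernel would use, and the IPI itself is not shown (the caller would send it to the cpu index returned).

```c
#include <assert.h>
#include <stdatomic.h>

#define NCPU 8		/* hypothetical; must fit in an unsigned int */

static atomic_uint idle_mask;	/* bit N set: cpu N is parked in HLT */

/* Called by a cpu just before it halts. */
void
cpu_enter_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	atomic_fetch_or(&idle_mask, 1u << cpu);
}

/* Called by a cpu that woke up on its own (e.g. on a timer tick). */
void
cpu_leave_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	atomic_fetch_and(&idle_mask, ~(1u << cpu));
}

/*
 * Claim one idle cpu to wake when new work is scheduled.  Returns the
 * cpu index, or -1 if no cpu is idle.  Clearing the bit atomically
 * means only one scheduler gets to wake a given cpu.
 */
int
claim_idle_cpu(void)
{
	unsigned int mask, bit;
	int cpu;

	mask = atomic_load(&idle_mask);
	while (mask != 0) {
		for (cpu = 0; ((mask >> cpu) & 1u) == 0; cpu++)
			;
		bit = 1u << cpu;
		if (atomic_fetch_and(&idle_mask, ~bit) & bit)
			return (cpu);	/* we cleared it: send the IPI here */
		mask = atomic_load(&idle_mask);
	}
	return (-1);
}
```

Note that the atomic operations here happen once on entry to and exit from idle, not inside a spin loop, which is the property the message is after when it talks about avoiding locked bus cycles while idle.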
Re: Hyperthreading and machdep.cpu_idle_hlt
:The cache and most of the execution hardware is shared. The execution :units can run something like 4 instructions per clock. If the idle :logical core is in a spinloop, then it is generating instructions for :execution, so you are dividing the execution resources between one context :that is doing real work, and the other context that is burning off the :excess resources. Overall, it is a huge loss. It is absolutely essential :that logical cpus be halted when they are not doing useful work. Ah, that makes sense. Are the two logical cpus shared 50-50? -Matt :Cheers, :-Peter :-- :Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Autodefaults in disklabel on 5.0dp2 install
I recently changed the swap backoff algorithm in -current and the MFC is slated for -stable. Try this change and see if it produces better results.

-Matt

Index: label.c
===
RCS file: /home/ncvs/src/release/sysinstall/Attic/label.c,v
retrieving revision 1.98.2.12
diff -u -r1.98.2.12 label.c
--- label.c	3 Jul 2002 00:01:08 -0000	1.98.2.12
+++ label.c	4 Jul 2002 04:39:03 -0000
@@ -1228,7 +1228,7 @@
 		    def = SWAP_MIN_SIZE * ONE_MEG;
 		if (def > SWAP_AUTO_LIMIT_SIZE * ONE_MEG)
 		    def = SWAP_AUTO_LIMIT_SIZE * ONE_MEG;
-		nom = (int)(physmem / 512) / 2;
+		nom = (int)(physmem / 512) / 8;
 		sz = nom + (def - nom) * perc / 100;
 	    }
 	    swap_chunk = Create_Chunk_DWIM(label_chunk_info[here].c->disk,

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
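To see what the one-character change in the diff does, the arithmetic can be pulled out into a standalone function. This is a minimal sketch, assuming sizes are counted in 512-byte sectors and that def is supplied by the caller; swap_autosize() and the divisor parameter are hypothetical, and the SWAP_AUTO_LIMIT_SIZE value is a placeholder, since the real constants are not shown in the message.

```c
#include <assert.h>

#define SECTOR_SIZE		512L
#define ONE_MEG			2048L	/* sectors per megabyte */
#define SWAP_AUTO_LIMIT_SIZE	4096L	/* hypothetical limit, in MB */

/*
 * Interpolate between a nominal swap size derived from physical memory
 * and the clamped default `def`, using the percentage `perc` as in the
 * diff.  divisor is 2 for the old behaviour and 8 for the new one.
 * physmem is in bytes; def and the result are in sectors.
 */
long
swap_autosize(long physmem, long def, int perc, int divisor)
{
	long nom;

	assert(physmem >= 0 && def >= 0);
	assert(perc >= 0 && perc <= 100);
	assert(divisor > 0);

	if (def > SWAP_AUTO_LIMIT_SIZE * ONE_MEG)
		def = SWAP_AUTO_LIMIT_SIZE * ONE_MEG;
	nom = (physmem / SECTOR_SIZE) / divisor;
	return (nom + (def - nom) * perc / 100);
}
```

With 1 GB of physical memory and perc at 0, the old divisor gives 1048576 sectors (512 MB) and the new one gives 262144 sectors (128 MB), so the change lowers the floor that the automatic sizing can back off to.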
Re: background fsck did not create lost+found
:However, when you are saving a new version of an important file, :you need to be careful that the new version (and its directory :entry) hits the disk before the old one goes away. I know that vi :saves files in a safe way, whereas ee and emacs do not. (Emacs :introduces only a small race, though.) Also, mv will DTRT only if :the source and destination files live on the same filesystem. : I think you have that reversed. vi just overwrites the destination file (O_CREAT|O_TRUNC, try ktrace'ing a vi session and you will see). With vi, this means that there is a period of time during which a crash may result in the loss of the file if the vi session cannot be recovered (with vi -r) after the fact. I believe emacs defaults to a mode where it creates a new file and renames it over the original. -Matt Matthew Dillon [EMAIL PROTECTED]
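The difference between the two save strategies comes down to whether the old contents are destroyed before the new ones are safely on disk. A rough sketch of the write-new-then-rename approach, using only standard POSIX calls, is shown below. save_file_atomically() and the fixed ".tmp" suffix are hypothetical simplifications (an editor would more likely use mkstemp() and preserve ownership and permissions), and this is not a description of what any particular editor actually does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `text` to a temporary file next to `path`, force it to disk,
 * then rename it over `path`.  Both names must live on the same
 * filesystem for rename() to replace the target in one step.
 * Returns 0 on success, -1 on failure (the old file is left intact).
 */
int
save_file_atomically(const char *path, const char *text)
{
	char tmp[1024];
	size_t len, off;
	ssize_t n;
	int fd;

	assert(path != NULL && text != NULL);
	if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
		return (-1);

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);

	len = strlen(text);
	for (off = 0; off < len; off += (size_t)n) {
		n = write(fd, text + off, len - off);
		if (n < 0) {
			close(fd);
			unlink(tmp);
			return (-1);
		}
	}

	/* New data must be on disk before the old file goes away. */
	if (fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return (-1);
	}
	if (close(fd) != 0 || rename(tmp, path) != 0) {
		unlink(tmp);
		return (-1);
	}
	return (0);
}
```

A stricter version would also fsync() the containing directory after the rename so that the new directory entry itself is known to be on disk.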
Re: FreeBSD panic with umass
:Backtrace would be useful since you shouldn't be getting a panic. At the :worst, your usb reader just wouldn't work. : :-Nate At worst the system will crash. The problem is that USB devices sometimes return total garbage for the READ CAPACITY command. This isn't CAM's fault, it is a major issue with the way the USB code works as well as a major issue with the way certain USB devices respond to commands (or request lengths) that they do not understand. Sometimes the USB layer does not get an error or returns the wrong sense code. The USB code can also sometimes get out of sync with the device between commands, causing all sorts of havoc. When I was tracking down the Sony DiskKey problem I hit this problem. The Sony often returned complete garbage for READ CAPACITY until I put the proper quirk entries in. Since the CAM/SCSI layer does not bzero() the scsi_read_capacity_data structure, the result was random capacities and system crashes (e.g. when a certain value would wind up 0 the CAM layer would crash with a divide by 0). I added some debugging to try to catch the problem and it still happens to be in my tree so I can give you guys the diff. This is *NOT* for commit. It's just junk debugging code. Note that I added the M_ZERO option to the malloc. 
-Matt

Index: scsi_da.c
===
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_da.c,v
retrieving revision 1.42.2.30
diff -u -r1.42.2.30 scsi_da.c
--- scsi_da.c	20 Dec 2002 15:20:25 -0000	1.42.2.30
+++ scsi_da.c	30 Dec 2002 23:22:22 -0000
@@ -546,8 +556,10 @@
 		rcap = (struct scsi_read_capacity_data *)malloc(sizeof(*rcap),
 								M_TEMP,
-								M_WAITOK);
-
+								M_WAITOK|M_ZERO);
+		scsi_ulto4b(313, (void *)rcap->length);
+		scsi_ulto4b(512, (void *)rcap->addr);
+
 		ccb = cam_periph_getccb(periph, /*priority*/1);
 		scsi_read_capacity(&ccb->csio,
 				   /*retries*/1,
@@ -1187,6 +1199,7 @@
 		softc->minimum_cmd_size = 10;
 	else
 		softc->minimum_cmd_size = 6;
+	printf("QUIRKS %04x MCS %d MATCH %p\n", softc->quirks, softc->minimum_cmd_size, match);
 
 	/*
 	 * Block our timeout handler while we
@@ -1748,6 +1761,8 @@
 	dp = &softc->params;
 	dp->secsize = scsi_4btoul(rdcap->length);
 	dp->sectors = scsi_4btoul(rdcap->addr) + 1;
+	printf("RDCAP SECSIZE %d\n", (int)dp->secsize);
+	printf("RDCAP SECTORS %d\n", (int)dp->sectors);
 	/*
 	 * Have the controller provide us with a geometry
 	 * for this disk.  The only time the geometry
@@ -1767,6 +1782,7 @@
 	dp->heads = ccg.heads;
 	dp->secs_per_track = ccg.secs_per_track;
 	dp->cylinders = ccg.cylinders;
+	printf("RDCAP HEADS %d SPT %d CYL %d\n", (int)ccg.heads, (int)ccg.secs_per_track, (int)ccg.cylinders);
 }
 
 static void

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: FreeBSD panic with umass
:Hmm, good stuff, but shouldn't something be committed anyway? I mean if it :causes a panic just by plugging in the device that's totally unacceptable. :I'll provide a backtrace of the crash on my computer tomorrow I suppose (I :won't be home until then) and let people know if that's what's causing my :crash. : :Ken Yes, but it isn't quite that easy. I did fix the incorrect sense code issue with UMASS, but that's only one of the potentially many problems that could occur. It would probably also help (give us more deterministic panics / errors) if the read_capacity structure were at least bzero'd by CAM/SCSI. But half the problem is with the USB devices themselves. The device firmware for many of these devices, especially the Sony, was written by idiots. The entire USB specification was written by idiots IMHO. For example, the Sony will respond with garbage, and no error whatsoever, to just about any page inquiry command you send it. The Sony doesn't even return reasonable data for the code pages that the USB spec requires! Ultimately this means that the best we can do is to try to ensure that garbage data doesn't result in a system panic. That's a fairly tall order for such a low level subsystem. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
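One way to picture "ensure that garbage data doesn't result in a system panic" is a sanity check on the decoded READ CAPACITY reply before anything divides by it. The sketch below is purely illustrative: struct read_capacity_sketch mirrors only the two field names visible in the earlier diff, capacity_decode() is a hypothetical helper, and the accepted range of sector sizes (a power of two between 512 and 65536) is an assumed policy, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct read_capacity_sketch {
	uint8_t addr[4];	/* last logical block address, big-endian */
	uint8_t length[4];	/* block length in bytes, big-endian */
};

/* Decode a 4-byte big-endian field. */
static uint32_t
be32_decode(const uint8_t b[4])
{
	return (((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	    ((uint32_t)b[2] << 8) | (uint32_t)b[3]);
}

/*
 * Decode the reply and reject values that would later blow up, such as
 * a zero sector size.  Returns 0 and fills in the outputs on success,
 * -1 if the reply looks like garbage.
 */
int
capacity_decode(const struct read_capacity_sketch *rc, uint32_t *secsize,
    uint64_t *sectors)
{
	uint32_t len;

	assert(rc != NULL && secsize != NULL && sectors != NULL);
	len = be32_decode(rc->length);
	if (len < 512 || len > 65536 || (len & (len - 1)) != 0)
		return (-1);
	*secsize = len;
	/* Widen before adding 1 so an all-ones address cannot wrap. */
	*sectors = (uint64_t)be32_decode(rc->addr) + 1;
	return (0);
}
```

Starting from a zeroed buffer, as the M_ZERO change does, at least makes a device that returns nothing fail this kind of check the same way every time instead of producing random capacities.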
Re: VM_METER no longer defined?
: #define VM_METER VM_TOTAL This change was part of a larger patch set. I have no particular objection to adding the #define or renaming VM_TOTAL back to VM_METER, but I *WILL* point out that even without a name change the VM cnt structure has changed so often that for all intents and purposes the API has changed with or without the name change, on many occasions, and is likely to continue to change in the future. Because of this there isn't much point keeping name compatibility and unless you can point to some piece of third party software that is so adversely affected that the name must be changed back, I would not recommend changing it back. But if you want to do it, I will not stop you. -Matt
Re: ia64 tinderbox failure
:... : is never a problem with its bounds). Other users of getbsize() in the : src tree but perhaps not ones in ports have been broken to match the : interface breakage. The usual breakage is to cast the size_t to int : without checking bounds. : :Agreed. Not a single consumer actually wants a size_t and not all base :system uses have been fixed for the new interface (ls(1) for instance). :I'd like to see the interface restored and merged into RELENG_5_0 before :we introduce this mistake on the world. : :Best regards, :Mike Barcroft Agreed. It really should be an int * like it used to be. But it's up to Mark Murray to fix it since it was his commit that changed it to a size_t in the first place. -Matt To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: patch #3 Re: swapoff code committed.
The swapctl code has been committed. Bruce, you never supplied feedback in regards to your original nits, but feel free to clean the code up now that it has been committed. All other bullets have been taken care of. I'm still on the fence in regards to backporting it. I would like to, but do not have the time at the moment. -Matt
Re: PAWS ack-on-ack loop avoided
The printf() is only in HEAD for feedback purposes. I'd like to leave it in there just a little while longer (maybe a week at the rate things are going). It looks like more people are hitting this bug(fix) than we previously thought would hit it, which is actually somewhat worrying because it only occurs when you get out-of-order timestamp replies. Could you tell me what services were running or what you were doing when you got the warnings? Are you running a web server? Talking to windows boxes at all? -Matt Matthew Dillon [EMAIL PROTECTED] : : Which appears to just be triggered by a mechanism to drop : bad packets. Is this correct? Is this something I should be : concerned about? : :Matt, : :I'm seeing these too. Can you please remove the relevant :printf() or at least limit it to the ``if (verbose)'' or :DIAGNOSTIC, whatever is more appropriate? : : :Cheers, :-- :Ruslan Ermilov Sysadmin and DBA, :[EMAIL PROTECTED] Sunbay Software AG,
Re: PAWS ack-on-ack loop avoided
:I've got this on my development box which doesn't run any services.
:I don't remember exactly what I've been doing when these appeared;
:probably printing some connection data like IPs and ports from TCB
:would help.
:
:Cheers,
:--
:Ruslan Ermilov Sysadmin and DBA,
:[EMAIL PROTECTED] Sunbay Software AG,

That's very odd. I see them on my development box too which is just talking FreeBSD-FreeBSD. We should not be seeing them at all.

-Matt
Re: PAWS ack-on-ack loop avoided
:Hello -current,
:
:I'm seeing a bit (12 or more per day) of
:PAWS ack-on-ack loop avoided in my /var/log/messages
:
:Which appears to just be triggered by a mechanism to drop
:bad packets. Is this correct? Is this something I should be
:concerned about?
:
:Thanks in advance,
:
:./muk
:
:--
:m. kolb [EMAIL PROTECTED]

No, you don't need to be concerned, it's just debugging what we thought was a rare bug case (but apparently is not) that was recently fixed. The printf is only going to be in -current another couple of days.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: UMASS USB bug? (getting the Sony disk-on-key device working)
:The NetBSD code is already different:
:1.48 (augustss 15-Sep-99): /* The OHCI hardware can handle at most one page crossing. */
:1.48 (augustss 15-Sep-99): if (OHCI_PAGE(dataphys) == dataphysend ||
:1.48 (augustss 15-Sep-99):     OHCI_PAGE(dataphys) + OHCI_PAGE_SIZE == dataphysend) {
:1.48 (augustss 15-Sep-99):         /* we can handle it in this TD */
:1.48 (augustss 15-Sep-99):         curlen = len;
:1.48 (augustss 15-Sep-99): } else {
:1.48 (augustss 15-Sep-99):         /* must use multiple TDs, fill as much as possible. */
:1.120(augustss 03-Feb-02):         curlen = 2 * OHCI_PAGE_SIZE -
:1.48 (augustss 15-Sep-99):             (dataphys & (OHCI_PAGE_SIZE-1));
:1.78 (augustss 20-Mar-00):         /* the length must be a multiple of the max size */
:1.78 (augustss 20-Mar-00):         curlen -= curlen % UGETW(opipe->pipe.endpoint->edesc->wMaxPacketSize);
:1.78 (augustss 20-Mar-00): #ifdef DIAGNOSTIC
:1.78 (augustss 20-Mar-00):         if (curlen == 0)
:1.128(provos 27-Sep-02):                 panic("ohci_alloc_std: curlen == 0");
:1.78 (augustss 20-Mar-00): #endif
:1.48 (augustss 15-Sep-99): }
:
:Too bad we did not catch it.
:
:--
:B.Walter COSMO-Project http://www.cosmo-project.de

Well, that's the curlen fix, which doesn't apply to us at all (in FreeBSD we do not try to optimize for two physically contiguous pages). I'm not sure why they are using a mod there, I think it is as simple as if (curlen > len) curlen = len, but I don't understand the 'the length must be a multiple of the max size' comment so maybe there is some magic there that I haven't considered.

The fix that applies to both FreeBSD and NetBSD was the calculation of dataphysend just above the code you indicate. When I look at ohci.c via cvsweb for NetBSD, their 1.135, they have not fixed the dataphysend calculation yet. They still have (which is WRONG):

    dataphysend = OHCI_PAGE(dataphys + len - 1);

The correct answer is:

    dataphysend = OHCI_PAGE(DMAADDR(dma, len - 1));

I am going to attempt to add [EMAIL PROTECTED] to this thread, I don't know if that is a valid email address :-)

-Matt
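For readers trying to follow the curlen arithmetic in the quoted NetBSD fragment, here is a stand-alone sketch of the same calculation. Everything in it is an assumption made for illustration: `td_len`, `PAGE_SIZE_SKETCH` and the 4096-byte page size are made-up names and values, not driver code. The reading of the "multiple of the max size" comment is also only a guess, namely that a non-final TD which is not a whole number of packets would end in a short packet and so look like the end of the transfer.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u	/* assumed page size */

/*
 * Hypothetical stand-alone version of the per-TD length calculation.
 *   physaddr:  physical address of the first byte for this TD
 *   remaining: bytes still to be transferred
 *   mps:       endpoint max packet size
 * Returns the number of bytes to place in this TD.
 */
static uint32_t
td_len(uint32_t physaddr, uint32_t remaining, uint32_t mps)
{
	uint32_t room;

	assert(mps > 0 && mps <= PAGE_SIZE_SKETCH);

	/* Bytes from physaddr to the end of the following page. */
	room = 2 * PAGE_SIZE_SKETCH - (physaddr & (PAGE_SIZE_SKETCH - 1));

	/* Everything left fits in this TD; the last TD may be any length. */
	if (remaining <= room)
		return remaining;

	/* More TDs follow: round down to a whole number of packets. */
	return room - room % mps;
}
```

Since room is always larger than one page and mps is at most one page here, the rounded length cannot reach zero, which is the condition the DIAGNOSTIC panic in the quoted code guards against.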
UMASS USB bug? (getting the Sony disk-on-key device working).
It took a hellofalong time pulling my hair out trying to figure out why the Sony disk-on-key I just bought didn't work. First I added a Quirk entry for the standard 6-byte problem, but it didn't solve the problem.

Finally, after slogging through an insane amount of debugging (I mean, it really generates a lot of debugging if you turn it all on!) I came up with the following patch. It appears that when an error occurs and the umass device tries to read the sense data that it fails comparing:

    sc->transfer_datalen - sc->transfer_actlen != UGETDW(sc->csw.dCSWDataResidue)

As far as I can tell sc->transfer_actlen is NEVER updated. It is always 0, so I don't quite see how the calculation could ever possibly be correct if DataResidue is what I think it is (a count-down of the number of unused bytes after a transfer).

Note that my other UMASS device, a compact flash reader, has always worked fine with just the Quirk entry. I really need a USB expert to tell me what is going on :-)

With the patch below my Sony diskkey works. Note that the junk at the end of the patch is debugging. I noticed that the CAM layer thought the READCAPACITY command succeeded when it didn't, and it was generating weird sector-size errors due to the malloc'd return buffer containing garbage. I had to put real values in the buffer to catch the problem consistently. I don't know why the UMASS layer was returning a success code to CAM for failed READCAPACITY commands but it took an hour just to figure out that CAM was using garbage in the return buffer.

USB Experts gravitate here! Tell me I'm right or explain to me why I'm wrong, because this stuff is incredibly complex and I'm having problems thinking straight at 2:30 a.m.
:-)

-Matt

Index: dev/usb/umass.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/usb/umass.c,v
retrieving revision 1.11.2.13
diff -u -r1.11.2.13 umass.c
--- dev/usb/umass.c	21 Nov 2002 21:26:14 -0000	1.11.2.13
+++ dev/usb/umass.c	19 Dec 2002 10:21:58 -0000
@@ -1488,6 +1488,7 @@
 			panic("%s: transferred %d bytes instead of %d bytes\n",
 				USBDEVNAME(sc->sc_dev),
 				sc->transfer_actlen, sc->transfer_datalen);
+#if 0
 		} else if (sc->transfer_datalen - sc->transfer_actlen
 			   != UGETDW(sc->csw.dCSWDataResidue)) {
 			DPRINTF(UDMASS_BBB, ("%s: actlen=%d != residue=%d\n",
@@ -1497,6 +1498,7 @@
 
 			umass_bbb_reset(sc, STATUS_WIRE_FAILED);
 			return;
+#endif
 
 		} else if (sc->csw.bCSWStatus == CSWSTATUS_FAILED) {
 			DPRINTF(UDMASS_BBB, ("%s: Command Failed, res = %d\n",
Index: cam/scsi/scsi_da.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_da.c,v
retrieving revision 1.42.2.29
diff -u -r1.42.2.29 scsi_da.c
--- cam/scsi/scsi_da.c	23 Nov 2002 23:21:42 -0000	1.42.2.29
+++ cam/scsi/scsi_da.c	19 Dec 2002 10:28:11 -0000
@@ -250,6 +250,14 @@
 	},
 	{
 		/*
+		 * Sony Key-Storage media fails in terrible ways without
+		 * both quirks.
+		 */
+		{T_DIRECT, SIP_MEDIA_REMOVABLE, "Sony", "Storage Media", "*"},
+		/*quirks*/ DA_Q_NO_6_BYTE|DA_Q_NO_SYNC_CACHE
+	},
+	{
+		/*
 		 * Sony DSC cameras (DSC-S30, DSC-S50, DSC-S70)
 		 */
 		{T_DIRECT, SIP_MEDIA_REMOVABLE, "Sony", "Sony DSC", "*"},
@@ -546,8 +554,10 @@
 		rcap = (struct scsi_read_capacity_data *)malloc(sizeof(*rcap),
 								M_TEMP,
-								M_WAITOK);
-
+								M_WAITOK|M_ZERO);
+		scsi_ulto4b(313, (void *)rcap->length);
+		scsi_ulto4b(512, (void *)rcap->addr);
+
 		ccb = cam_periph_getccb(periph, /*priority*/1);
 		scsi_read_capacity(&ccb->csio,
 				   /*retries*/1,
@@ -1185,6 +1195,7 @@
 			softc->minimum_cmd_size = 10;
 		else
 			softc->minimum_cmd_size = 6;
+		printf("QUIRKS %04x MCS %d MATCH %p\n", softc->quirks, softc->minimum_cmd_size, match);
 
 		/*
 		 * Block our timeout handler while we
@@ -1746,6 +1757,8 @@
 		dp = &softc->params;
 		dp->secsize = scsi_4btoul(rdcap->length);
 		dp->sectors = scsi_4btoul(rdcap->addr) + 1;
+		printf("RDCAP SECSIZE %d\n", (int)dp->secsize);
+		printf("RDCAP SECTORS %d\n", (int)dp->sectors);
 		/*
 		 * Have the controller provide us with a geometry
 		 * for
Re: UMASS USB bug? (getting the Sony disk-on-key device working).
:On Thu, Dec 19, 2002 at 12:08:27PM +0100, Frode Nordahl wrote:
: Hey, Matt
:
: While you're at it, could you have a look at PR kern/46176 ? =)
:
: At least would you tell me if you have the same problem with your
: device(s)
:
:The umass cam interaction is questionable.
:I've seen lots of problems caused by this.
:As I have problems with my device caused by this too I already placed it on
:my todo list.
:
:--
:B.Walter COSMO-Project http://www.cosmo-project.de
:[EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]

The panic described in 46176 has happened to me on -current. On -stable I have managed (very easily) to get the CAM layer vs UMASS layer into a confused state where the CAM layer thinks it is still attached but the UMASS layer thinks it has detached / cleaned everything out. In both cases it appears that memory is being freed by one side which is still being used by the other side but I haven't tracked down the exact cause.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: UMASS USB bug? (getting the Sony disk-on-key device working).
: Note that my other UMASS device, a compact flash reader, has always
: worked fine with just the Quirk entry. I really need a USB expert to
: tell me what is going on :-)
:
:The problem is that an umass bulk only umass device is allowed to stall the
:communication pipe on an invalid command.
:I got the impression that the umass driver doesn't clear the pipe on
:errors.
:
:--
:B.Walter COSMO-Project http://www.cosmo-project.de

In my traces I did occasionally see a command wind up in a STALLED state, but most of the time it either wound up in a BABBLE state or in a NAK state and hit the 5 second timeout. Since I removed the conditional (the #if 0 in the patch I posted earlier), I have not seen either state.

These problems always occurred while the CAM layer was probing the device (e.g. doing a READ CAPACITY, telling the lower layer to not allow user removal (as if that helps)). Once the CAM layer was actually able to get past that state the actual READs and WRITEs worked fine with just the Quirk entry, before my #if 0 patch. Prior to the patch it took a lot of random pulling of the device, putting it back in, pulling it again, putting it back in, camcontrol reset 1, camcontrol rescan 1, that sort of thing, before I would either get the device to work or the system would panic :-). After the patch everything works fine (though I'm sure pulling the device without unmounting it first will still lead to problems similar to those described by Frode).

p.s. the patch was against -stable. -current was crashing on me too much while playing with the disk key, but I'm sure it applies to -current too.

-Matt
Matthew Dillon [EMAIL PROTECTED]
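To make the residue comparison discussed in this thread concrete, here is a small model of it. This is only a sketch: `struct xfer_state` and `residue_consistent` are hypothetical stand-ins for the softc fields and the inline test in umass.c, and it assumes the residue reported in the CSW is the number of requested bytes the device did not transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two softc fields the check looks at. */
struct xfer_state {
	uint32_t datalen;	/* bytes requested for the data phase */
	uint32_t actlen;	/* bytes the driver believes were moved */
};

/*
 * Returns true when the residue reported by the device agrees with
 * the driver's own accounting (datalen - actlen).
 */
static bool
residue_consistent(const struct xfer_state *x, uint32_t csw_residue)
{
	assert(x->actlen <= x->datalen);
	return x->datalen - x->actlen == csw_residue;
}
```

If actlen is never updated and stays at 0, a fully successful 18-byte sense transfer (residue 0) compares 18 against 0 and is treated as a wire failure, which would match the behaviour the #if 0 works around.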
Re: UMASS USB bug? (getting the Sony disk-on-key device working)
: didn't solve the problem.
:
: You don't need the 6-byte quirk entries anymore. The umass(4) driver
: automatically handles 6-byte commands (converting them to 10-byte commands)
: and has done so for a while now. You should at least try removing the
: 6 byte quirk for now.
:
:I thought this too and it's true for many devices, but the umass device
:gets an invalid command first and the umass driver is required to
:handle that failure in a special way for some devices.
:
:--
:B.Walter COSMO-Project http://www.cosmo-project.de
:[EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]

I'll try to get the diskkey working on -current first (I now have it working on -stable), then I'll try removing the Quirk entry on both stable and current to see if it still works and post a followup.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: [src] cvs commit: src/sys/geom geom_dev.c
This commit is crashing my -current box on boot when it goes to check for a core. I get the panic:

    Negative bio_offset (-1024) on bio ...

Userland probably should not be allowed to panic the box in that way.

-Matt
Matthew Dillon [EMAIL PROTECTED]

:phk 2002/12/13 14:04:45 PST
:
: Modified files:
:   sys/geom geom_dev.c
: Log:
: Add a couple of KASSERTS, just in case.
:
: Revision Changes Path
: 1.33 +4 -0 src/sys/geom/geom_dev.c
:
:Index: src/sys/geom/geom_dev.c
:diff -u src/sys/geom/geom_dev.c:1.32 src/sys/geom/geom_dev.c:1.33
:--- src/sys/geom/geom_dev.c:1.32	Fri Nov 1 07:56:26 2002
:+++ src/sys/geom/geom_dev.c	Fri Dec 13 14:04:45 2002
:@@ -32,7 +32,7 @@
:  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
:  * SUCH DAMAGE.
:  *
:- * $FreeBSD: /repoman/r/ncvs/src/sys/geom/geom_dev.c,v 1.32 2002/11/01 15:56:26 phk Exp $
:+ * $FreeBSD: /repoman/r/ncvs/src/sys/geom/geom_dev.c,v 1.33 2002/12/13 22:04:45 phk Exp $
:  */
:
: #include <sys/param.h>
:@@ -388,7 +388,11 @@
: 	gp = dev->si_drv1;
: 	cp = dev->si_drv2;
: 	bp2 = g_clone_bio(bp);
:+	KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place"));
: 	bp2->bio_offset = (off_t)bp->bio_blkno << DEV_BSHIFT;
:+	KASSERT(bp2->bio_offset >= 0,
:+	    ("Negative bio_offset (%jd) on bio %p",
:+	    (intmax_t)bp2->bio_offset, bp));
: 	bp2->bio_length = (off_t)bp->bio_bcount;
: 	bp2->bio_done = g_dev_done;
: 	g_trace(G_T_BIO,
Re: UMASS USB bug? (getting the Sony disk-on-key device working)
: Eh? For ATAPI and UFM devices we never send a 6 byte command to the
: device that can fail, only 10 byte commands.
:
:I believed this was a SCSI over bulk only device.
:
:--
:B.Walter COSMO-Project http://www.cosmo-project.de
:[EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]

Yes, this is a USB DiskKey - UMASS storage, SCSI over bulk only device.

I've done some further testing on both -current and -stable. I cannot get the device to work unless I have the quirk entry in scsi_da.c.

-Current has a quirk table for umass.c and already has a flag which disables the residue test. The patch for current thus does not require #if 0'ing out that code, only a quirk entry. Since I don't need any hacks beyond what is there already I am going to commit the two quirk entries for -current now.

However, I am still unable to get the device to work properly in -current. This is what happens (see below).

test2 kernel: umass0: Sony USB Storage Media, rev 1.10/2.00, addr 2
test2 kernel: umass0: Get Max Lun not supported (IOERROR)
test2 kernel: da2 at umass-sim0 bus 0 target 0 lun 0
test2 kernel: da2: Sony Storage Media 2.51 Removable Direct Access SCSI-0 device
test2 kernel: da2: 1.000MB/s transfers
test2 kernel: da2: Attempt to query device size failed: UNIT ATTENTION, Medium not present
test2 kernel: (da2:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
test2 kernel: (da2:umass-sim0:0:0:0): CAM Status: SCSI Status Error
test2 kernel: (da2:umass-sim0:0:0:0): SCSI Status: Check Condition
test2 kernel: (da2:umass-sim0:0:0:0): UNIT ATTENTION asc:3a,0
test2 kernel: (da2:umass-sim0:0:0:0): Medium not present
test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
test2 kernel: (da2:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
test2 kernel: (da2:umass-sim0:0:0:0): CAM Status: SCSI Status Error
test2 kernel: (da2:umass-sim0:0:0:0): SCSI Status: Check Condition
test2 kernel: (da2:umass-sim0:0:0:0): UNIT ATTENTION asc:3a,0
test2 kernel: (da2:umass-sim0:0:0:0): Medium not present
test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
test2 kernel: (da2:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
test2 kernel: (da2:umass-sim0:0:0:0): CAM Status: SCSI Status Error
test2 kernel: (da2:umass-sim0:0:0:0): SCSI Status: Check Condition
test2 kernel: (da2:umass-sim0:0:0:0): UNIT ATTENTION asc:3a,0
test2 kernel: (da2:umass-sim0:0:0:0): Medium not present
test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
test2 kernel: (da2:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
test2 kernel: (da2:umass-sim0:0:0:0): CAM Status: SCSI Status Error
test2 kernel: (da2:umass-sim0:0:0:0): SCSI Status: Check Condition
test2 kernel: (da2:umass-sim0:0:0:0): UNIT ATTENTION asc:3a,0
test2 kernel: (da2:umass-sim0:0:0:0): Medium not present
test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
test2 kernel: (da2:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
test2 kernel: (da2:umass-sim0:0:0:0): CAM Status: SCSI Status Error
test2 kernel: (da2:umass-sim0:0:0:0): SCSI Status: Check Condition
test2 kernel: (da2:umass-sim0:0:0:0): UNIT ATTENTION asc:3a,0
test2 kernel: (da2:umass-sim0:0:0:0): Medium not present
test2 kernel: (da2:umass-sim0:0:0:0): Retries Exhausted
test2 kernel: Opened disk da2 - 6

But then I get this:

test2 kernel: (da2:umass-sim0:0:0:0): Not ready to ready change, medium may have changed
test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
(no retry occurs)

And if I tell cam to rescan a different Lun it works:

camcontrol rescan 2:0:1
da3 at umass-sim0 bus 0 target 0 lun 1
da3: Sony Storage Media 2.51 Removable Direct Access SCSI-0 device
da3: 1.000MB/s transfers
da3: 125MB (256352 512 byte sectors: 64H 32S/T 125C)

camcontrol rescan 2:0:2
da3 at umass-sim0 bus 0 target 0 lun 1
da3: Sony Storage Media 2.51 Removable Direct Access SCSI-0 device
da3: 1.000MB/s transfers
da3: 125MB (256352 512 byte sectors: 64H 32S/T 125C)

I am not sure what is going on but I think in -current CAM is exhausting its retries too quickly (the messages are instantaneous) and not giving the device enough time to boot up. This is because, I believe, the usb controller is now a kernel-land thread instead of a userland usbd and is responding instantly to the device presence.

I would have expected 'camcontrol rescan 2:0:0' to work but it doesn't. It just says:

# camcontrol rescan 2:0:0
Re-scan of 2:0:0 was successful

But then doesn't do anything.

-Matt
Re: [src] cvs commit: src/sys/geom geom_dev.c
:In message [EMAIL PROTECTED], Matthew Dillon writes:
:This commit is crashing my -current box on boot when it
:goes to check for a core. I get the panic:
:
:Negative bio_offset (-1024) on bio ...
:
:Userland probably should not be allowed to panic the box
:in that way.
:
:Backtrace ?
:
:--
:Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
:[EMAIL PROTECTED] | TCP/IP since RFC 956

userland read() system call from savecore running through the standard sequence to read from a physical device, then panicking in geom. I don't have the serial console connected to that machine but I'll try to reproduce it in a little while and manually transcribe the backtrace.

-Matt
Matthew Dillon [EMAIL PROTECTED]
VAX tinderbox failure
- Rebuilding the temporary build tree -- stage 1: bootstrap tools -- stage 2: cleaning up the object tree -- stage 2: rebuilding the object tree -- stage 2: build tools -- stage 3: cross tools -- stage 4: populating /home/dillon/tinderbox/vax/obj/local0/scratch/des/src/vax/usr/include -- stage 4: building libraries -- stage 4: make dependencies -- stage 4: building everything.. -- Kernel build for GENERIC started on Thu Dec 19 09:37:06 PST 2002 -- Kernel build for GENERIC completed on Thu Dec 19 10:28:44 PST 2002 -- Kernel build for LINT started on Thu Dec 19 10:28:44 PST 2002 -- === vesa Makefile, line 5401: warning: duplicate script for target geom_bsd.o ignored Makefile, line 5404: warning: duplicate script for target geom_mbr.o ignored /local0/scratch/dillon/src/sys/contrib/dev/acpica/dbdisply.c:131: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/dbexec.c:124: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/dbhistry.c:124: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/dbinput.c:125: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/dbstats.c:125: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/dbxface.c:127: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/hwgpe.c:122: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/hwregs.c: In function `AcpiGetSleepTypeData': /local0/scratch/dillon/src/sys/contrib/dev/acpica/hwregs.c:242: warning: cast discards qualifiers from pointer target type /local0/scratch/dillon/src/sys/contrib/dev/acpica/nsxfname.c:125: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/nsxfobj.c:126: warning: `_THIS_MODULE' defined but not used 
/local0/scratch/dillon/src/sys/contrib/dev/acpica/rsdump.c:124: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/utclib.c:129: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/utdebug.c:122: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c: In function `AcpiUtGetRegionName': /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c:482: warning: cast discards qualifiers from pointer target type /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c: In function `AcpiUtGetEventName': /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c:520: warning: cast discards qualifiers from pointer target type /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c: In function `AcpiUtGetTypeName': /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c:590: warning: cast discards qualifiers from pointer target type /local0/scratch/dillon/src/sys/contrib/dev/acpica/utglobal.c:593: warning: cast discards qualifiers from pointer target type /local0/scratch/dillon/src/sys/dev/acpica/acpi_acad.c:50: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/dev/acpica/acpi_cmbat.c:56: warning: `_THIS_MODULE' defined but not used /local0/scratch/dillon/src/sys/dev/acpica/acpi_powerres.c:272: warning: `acpi_pwr_deregister_consumer' defined but not used /local0/scratch/dillon/src/sys/dev/acpica/acpi_powerres.c:210: warning: `acpi_pwr_deregister_resource' defined but not used /local0/scratch/dillon/src/sys/dev/ie/if_ie.c: In function `ieattach': /local0/scratch/dillon/src/sys/dev/ie/if_ie.c:778: warning: assignment discards qualifiers from pointer target type /local0/scratch/des/src/sys/dev/ie/if_ie.c: In function `ieget': /local0/scratch/dillon/src/sys/dev/ie/if_ie.c:1147: warning: passing arg 1 of `bcopy' discards qualifiers from pointer target type 
/local0/scratch/dillon/src/sys/dev/ie/if_ie.c:1237: warning: passing arg 1 of `bcopy' discards qualifiers from pointer target type /local0/scratch/dillon/src/sys/dev/ie/if_ie.c:1237: warning: passing arg
Re: UMASS USB bug? (getting the Sony disk-on-key device working)
This is a real mess but I finally got it to work. (Note to John: both quirk entries are absolutely necessary, everything stalls and dies without them).

Problem #1: RS_NO_CLEAR_UA quirk required in umass.c.

Problem #2: RS_NO_CLEAR_UA quirk code is broken, causes CAM to think that the READ_CAPACITY command succeeded when it actually failed due to the umass_cam_quirk_cb() function changing the return status. (Machine crashes on bogus capacity data, junk that was sitting in the malloc'd buffer which the machine thinks is real).

Problem #3: After fixing RS_NO_CLEAR_UA, CAM still retries four times quickly and fails. But at least it doesn't crash and burn. This is odd... in this state /dev/da2 exists and if I run 'fdisk da2' it does in fact work, as does dd'ing. If I run 'fdisk da2' manually CAM tries to do a READ CAPACITY and gets a sense key indicating that the media changed, and recovers from there, except disklabel still doesn't work (/dev only has da2, it doesn't have any of the slice entries for some reason).

I tried bumping the number of CAM retries from 4 to 10. No joy.

I tried adding the NO_TEST_UNIT_READY quirk to force UMASS to issue a start-unit command instead of a test-unit-ready command. That didn't work.

I tried adding a 0.3-second DELAY(300000) between retries. Ahhh... THAT WORKED! The device takes over a second before TEST_UNIT_READY returns TRUE.

Ok, so here is the patch. I need help with two things. First, are my RS_NO_CLEAR_UA bug fixes correct? Second, does anyone have any ideas on how we can make CAM/UMASS friendlier to devices which take longer to get themselves going? Obviously sticking a DELAY(300000) in the middle of an interrupt routine is not a good thing to do. Is there any way to get CAM to poll the device every once in a while to see if the media is ready?

Note that 'camcontrol rescan correct_bus' does not work. It does not cause cam to rescan the USB device, it does not cause geom to pick up on the fact that da2 is good.
It doesn't seem to do anything in fact :-(

-Matt

Index: sys/cam/scsi/scsi_da.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_da.c,v
retrieving revision 1.118
diff -u -r1.118 scsi_da.c
--- sys/cam/scsi/scsi_da.c	18 Dec 2002 21:47:52 -0000	1.118
+++ sys/cam/scsi/scsi_da.c	19 Dec 2002 21:56:31 -0000
@@ -271,6 +271,16 @@
 	},
 	{
 		/*
+		 * Sony Key-Storage media fails in terrible ways without
+		 * both quirks.  The auto 6-10 code doesn't do the job.
+		 * (note: The Sony diskkey is actually the MSYSTEMS
+		 * disk-on-key device).
+		 */
+		{T_DIRECT, SIP_MEDIA_REMOVABLE, "Sony", "Storage Media", "*"},
+		/*quirks*/ DA_Q_NO_6_BYTE|DA_Q_NO_SYNC_CACHE
+	},
+	{
+		/*
 		 * Sony DSC cameras (DSC-S30, DSC-S50, DSC-S70)
 		 */
 		{T_DIRECT, SIP_MEDIA_REMOVABLE, "Sony", "Sony DSC", "*"},
Index: sys/dev/usb/umass.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/usb/umass.c,v
retrieving revision 1.67
diff -u -r1.67 umass.c
--- sys/dev/usb/umass.c	8 Nov 2002 07:57:42 -0000	1.67
+++ sys/dev/usb/umass.c	19 Dec 2002 23:05:13 -0000
@@ -345,6 +345,10 @@
 	  UMASS_PROTO_SCSI | UMASS_PROTO_CBI,
 	  NO_TEST_UNIT_READY | NO_START_STOP
 	},
+	{ USB_VENDOR_MSYSTEMS, USB_PRODUCT_MSYSTEMS_DISKONKEY, RID_WILDCARD,
+	  UMASS_PROTO_SCSI | UMASS_PROTO_BBB,
+	  IGNORE_RESIDUE | NO_GETMAXLUN | RS_NO_CLEAR_UA
+	},
 	{ USB_VENDOR_OLYMPUS, USB_PRODUCT_OLYMPUS_C1, RID_WILDCARD,
 	  UMASS_PROTO_SCSI | UMASS_PROTO_BBB,
 	  WRONG_CSWSIG
@@ -2606,7 +2610,7 @@
 		/* Getting sense data always succeeds (apart from wire
 		 * failures).
 		 */
-		if (sc->quirks & RS_NO_CLEAR_UA
+		if ((sc->quirks & RS_NO_CLEAR_UA)
 		    && csio->cdb_io.cdb_bytes[0] == INQUIRY
 		    && (csio->sense_data.flags & SSD_KEY)
 						== SSD_KEY_UNIT_ATTENTION) {
@@ -2622,21 +2626,24 @@
 			 * CCI)
 			 */
 			ccb->ccb_h.status = CAM_REQ_CMP;
-		} else if ((sc->quirks & RS_NO_CLEAR_UA) && /* XXX */
+		} else if ((sc->quirks & RS_NO_CLEAR_UA)
Re: UMASS USB bug? (getting the Sony disk-on-key device working)
: -Current has a quirk table for umass.c and already has a flag which
: disables the residue test. The patch for current thus does not require
: #if 0'ing out that code, only a quirk entry. Since I don't need any
: hacks beyond what is there already I am going to commit the two quirk
: entries for -current now.
:
:Please respect the maintainer of da(4). There's info about quirk
:documentation at:
: http://www.root.org/~nate/freebsd/quirks.html
:
:I still have a broken finger and a job so slow responses do not mean I am
:ignoring you.

Well, what do you think about the Quirk entry? I tested both stable and current with and without the quirk entry, and with various combinations of options. It doesn't work without the quirk entry.

: test2 kernel: umass0: Sony USB Storage Media, rev 1.10/2.00, addr 2
: test2 kernel: umass0: Get Max Lun not supported (IOERROR)
: test2 kernel: da2 at umass-sim0 bus 0 target 0 lun 0
: test2 kernel: da2: Sony Storage Media 2.51 Removable Direct Access SCSI-0 device
: test2 kernel: da2: 1.000MB/s transfers
: test2 kernel: da2: Attempt to query device size failed: UNIT ATTENTION, Medium not present
:
:Bus scan is probably happening before the device is fully powered up.
:One problem I noticed was that umass_cam_rescan() doesn't fill out the ccb
:properly, leaving timeout as 0 for instance. This probably won't hurt but
:I'm not sure. Another bug available from cursory overview is:

I'll take a look at the attach code. It would be cool if there were something easy we could do there, but I am not optimistic considering the fact that the device responds to the control channel queries just fine long, long before it tells us that the media is ready.

: test2 kernel: (da2:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
: test2 kernel: (da2:umass-sim0:0:0:0): CAM Status: SCSI Status Error
: test2 kernel: (da2:umass-sim0:0:0:0): SCSI Status: Check Condition
: test2 kernel: (da2:umass-sim0:0:0:0): UNIT ATTENTION asc:3a,0
: test2 kernel: (da2:umass-sim0:0:0:0): Medium not present
: test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
:
:CAM retries the CCB as requested by scsi_da.c in daopen().

Yes, I know. That wasn't really my question. This is USB we are talking about here after all.

: test2 kernel: (da2:umass-sim0:0:0:0): Not ready to ready change, medium may have changed
: test2 kernel: (da2:umass-sim0:0:0:0): Retrying Command (per Sense Data)
: (no retry occurs)
:
:And if I tell cam to rescan a different Lun it works:
:
:Because the device has had time to power up.

That is my assumption too, but then again the device responds just fine to the control channel commands (getting the device name and all of that rot). I'm hoping it is as simple as a timeout parameter somewhere but where to look...

: I am not sure what is going on but I think in -current CAM is
: exhausting its retries too quickly (the messages are instantaneous)
: and not giving the device enough time to boot up. This is because,
: I believe, the usb controller is now a kernel-land thread instead of
: a userland usbd and is responding instantly to the device presence.
:
:No, umass is attaching before the device is ready. Note the difference in
:delay between camcontrol rescan on a SPI bus vs. USB. The SPI controller
:is taking the appropriate time per device for it to respond to select.
:umass_cam_attach attempts to delay the bus scan to have the same effect
:but may not be working properly here. I have no idea what the USB spec
:says here.

Hmm. Any idea where I should look?

: I would have expected 'camcontrol rescan 2:0:0' but it doesn't.
: It just says:
:
: # camcontrol rescan 2:0:0
: Re-scan of 2:0:0 was successful
:
: But then doesn't do anything.
:
:umass is not calling the attach code again for some reason.
:
:-Nate

Is it supposed to? I'm looking for guidance.

--

I found another couple of bugs, this time in OHCI's DMA buffer chaining code. A patch for this with additional debugging code is included below (for current). There are two bugs. I do not know if -stable is affected.

First, the calculation of dataphysend is totally bogus. You can just take the physical address and add (len - 1) to it. You have to take the virtual address, add len - 1 to it, and convert it to a physical address. I can crash my machine simply by doing a 'newfs -f 1024 -b 8192 /dev/da2s1a' on the disk-on-key USB device.

Second, I believe the OpenBSD and NetBSD code is broken. The range can be one or two pages, but the remaining bytes may be less than one page and this has to be taken into account.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: UMASS USB bug? (getting the Sony disk-on-key device working)
:First, the calculation of dataphysend is totally bogus.
:You can just take the physical address and add (len - 1)
:to it. You have to take the virtual address, add len - 1
:to it, and convert it to a physical address. I can
:crash my machine simply by doing a

God, my grammar is getting really bad. I meant, "You can't just take the physical address and add (len - 1) to it", not that you can :-) Then the sentence makes sense.

-Matt

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
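The corrected statement above can be illustrated with a small user-space sketch. Everything in it is an assumption made for illustration: the page table `sk_phys_pages`, the helper `sk_dma_addr()` (standing in for what the thread calls DMAADDR) and the `SK_` macros are hypothetical and are not the driver's code. It only shows why the end page has to come from translating the last byte's offset when the buffer's physical pages are not contiguous.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u
#define SK_PAGE_BASE(a) ((a) & ~(uint32_t)(SK_PAGE_SIZE - 1))

/* Hypothetical mapping of the buffer's pages to physical page bases.
 * The physical pages are deliberately not contiguous. */
static const uint32_t sk_phys_pages[] = { 0x00200000u, 0x00913000u, 0x00054000u };

/* Stand-in for a DMAADDR-style lookup: translate a buffer offset
 * to a physical address through the page table above. */
static uint32_t sk_dma_addr(uint32_t offset)
{
    assert(offset / SK_PAGE_SIZE < sizeof(sk_phys_pages) / sizeof(sk_phys_pages[0]));
    return sk_phys_pages[offset / SK_PAGE_SIZE] + (offset & (SK_PAGE_SIZE - 1));
}

/* Naive: translate the start, then do arithmetic on the physical address. */
static uint32_t end_page_naive(uint32_t len)
{
    assert(len > 0);
    return SK_PAGE_BASE(sk_dma_addr(0) + len - 1);
}

/* Corrected: do the arithmetic on the offset, then translate. */
static uint32_t end_page_translated(uint32_t len)
{
    assert(len > 0);
    return SK_PAGE_BASE(sk_dma_addr(len - 1));
}
```

With this made-up table, an 8192-byte buffer ends on page 0x00913000, while the naive computation reports 0x00201000, a page the buffer never touches. For a buffer that fits in a single page the two functions agree, which is presumably why small transfers did not expose the problem.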
Re: kernel panic trying to utilize a da(4)/umass(4) device with ohci(4)
I fixed a serious DMA physical address translation bug in OHCI today. Look on the lists today for this message. Message and patch enclosed.

Note that this fixes one bug in FreeBSD's implementation and a second bug that is NetBSD/OpenBSD specific. I would appreciate it if someone would forward this information to the NetBSD/OpenBSD folks. These were very serious bugs.

-Matt

:Date: Thu, 19 Dec 2002 17:11:32 -0800 (PST)
:From: Matthew Dillon [EMAIL PROTECTED]
:Message-Id: [EMAIL PROTECTED]
:To: Nate Lawson [EMAIL PROTECTED]
:Cc: [EMAIL PROTECTED]
:Subject: Re: UMASS USB bug? (getting the Sony disk-on-key device working)
:References: [EMAIL PROTECTED]
:Sender: [EMAIL PROTECTED]
:List-ID: freebsd-current.FreeBSD.ORG
:List-Archive: http://docs.freebsd.org/mail/ (Web Archive)
:List-Help: mailto:[EMAIL PROTECTED]?subject=help (List Instructions)
:List-Subscribe: mailto:[EMAIL PROTECTED]?subject=subscribe%20freebsd-current
:List-Unsubscribe: mailto:[EMAIL PROTECTED]?subject=unsubscribe%20freebsd-current
:X-Loop: FreeBSD.ORG
:Precedence: bulk
:...
:

Index: ohci.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/usb/ohci.c,v
retrieving revision 1.116
diff -u -r1.116 ohci.c
--- ohci.c	9 Dec 2002 01:41:24 -0000	1.116
+++ ohci.c	20 Dec 2002 01:02:11 -0000
@@ -493,17 +493,17 @@
 	u_int32_t intr, tdflags;
 	int offset = 0;
 	int len, curlen;
+	int orig_len;
 	usb_dma_t *dma = &xfer->dmabuf;
 	u_int16_t flags = xfer->flags;
 	DPRINTFN(alen < 4096,("ohci_alloc_std_chain: start len=%d\n", alen));
-	len = alen;
+	orig_len = len = alen;
 	cur = sp;
-	dataphys = DMAADDR(dma, 0);
-	dataphysend = OHCI_PAGE(dataphys + len - 1);
+	dataphysend = OHCI_PAGE(DMAADDR(dma, len - 1));
 	tdflags = htole32(
 	    (rd ? OHCI_TD_IN : OHCI_TD_OUT) |
 	    (flags & USBD_SHORT_XFER_OK ? OHCI_TD_R : 0) |
@@ -518,8 +518,8 @@
 		/* The OHCI hardware can handle at most one page crossing. */
 #if defined(__NetBSD__) || defined(__OpenBSD__)
-		if (OHCI_PAGE(dataphys) == OHCI_PAGE(dataphysend) ||
-		    OHCI_PAGE(dataphys) + OHCI_PAGE_SIZE == OHCI_PAGE(dataphysend))
+		if (OHCI_PAGE(dataphys) == dataphysend ||
+		    OHCI_PAGE(dataphys) + OHCI_PAGE_SIZE == dataphysend)
 #elif defined(__FreeBSD__)
 		/* XXX This is pretty broken: Because we do not allocate
 		 * a contiguous buffer (contiguous in physical pages) we
@@ -527,7 +527,7 @@
 		 * So check whether the start and end of the buffer are on
 		 * the same page.
 		 */
-		if (OHCI_PAGE(dataphys) == OHCI_PAGE(dataphysend))
+		if (OHCI_PAGE(dataphys) == dataphysend)
 #endif
 		{
 			/* we can handle it in this TD */
@@ -544,6 +544,8 @@
 			/* must use multiple TDs, fill as much as possible. */
 			curlen = 2 * OHCI_PAGE_SIZE -
 				 OHCI_PAGE_MASK(dataphys);
+			if (curlen > len)	/* may have fit in one page */
+				curlen = len;
 #elif defined(__FreeBSD__)
 			/* See comment above (XXX) */
 			curlen = OHCI_PAGE_SIZE -
@@ -568,6 +570,9 @@
 			    dataphys, dataphys + curlen - 1));
 		if (len == 0)
 			break;
+		if (len < 0)
+			panic("Length went negative: %d curlen %d (dma %p offset %08x dataphysend %p currentdataphysend %p", len, curlen, *dma, (int)offset, (void *)dataphysend, (void *)OHCI_PAGE(DMAADDR(dma,0) + orig_len - 1));
+
 		DPRINTFN(10,("ohci_alloc_std_chain: extend chain\n"));
 		offset += curlen;
 		cur = next;
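The second fix in the patch, clamping curlen to the bytes that remain, can be looked at in isolation. This is only a sketch under stated assumptions: `sk_chunk_len()` and the `SK_` macros are hypothetical names, and the function mirrors just the arithmetic of the NetBSD/OpenBSD branch shown in the diff, not the surrounding TD chain handling.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096
/* Offset of an address within its page (hypothetical helper). */
#define SK_PAGE_OFFSET(a) ((int)((a) & (uint32_t)(SK_PAGE_SIZE - 1)))

/* Length of the next chunk starting at dataphys: at most up to the end of
 * the following page (one page crossing), and never more than the number
 * of bytes still left in the transfer. */
static int sk_chunk_len(uint32_t dataphys, int len)
{
    int curlen;

    assert(len > 0);
    curlen = 2 * SK_PAGE_SIZE - SK_PAGE_OFFSET(dataphys);
    if (curlen > len)   /* the clamp the patch adds */
        curlen = len;
    return curlen;
}
```

Without the clamp, a 100-byte remainder starting 0x100 bytes into a page would be assigned a 7936-byte chunk, and subtracting that from the remaining length is the kind of thing that would drive it negative, which is what the debugging panic in the patch is there to catch.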
Re: swapoff code comitted.
:I have made a small patch, added l, s and h switches to show
:information about the swap devices. And the U switch to swapctl only
:to remove all activated swap devices.
:If anything else is needed let me know and I will add it.
:
:--
:
:Eirik Nygaard [EMAIL PROTECTED]
:PGP Key: 83C55EDE

That is a pretty good first attempt. I have a few suggestions and found one bug. First the bug:

:+ is_swapctl ? "lsU" : "");

I think that was supposed to be a call to is_swapctl, not a pointer to the function.

Suggestions: Get rid of the is_swap*() functions and instead use av[0] at the top of main() and use strstr() to determine if the program is swapon, swapoff, or swapctl. Check against "swapon" and "swapoff" and if it is neither then default to swapctl (don't test against "swapctl"). Store which program it is in a global variable, e.g. an enum like this:

    enum { SWAPON, SWAPOFF, SWAPCTL } which_prog = SWAPCTL;
    ...
    main(...)
    {
        if (strstr(av[0], "swapon"))
            which_prog = SWAPON;
        else if (strstr(av[0], "swapoff"))
            which_prog = SWAPOFF;
        ...
    }

In regards to retrieving swap information, in -current there is a sysctl() to do it. Take a look at /usr/src/usr.sbin/pstat/pstat.c (in the current source tree), at the swapmode_kvm() and swapmode_sysctl() functions. The sysctl is much, much faster than the kvm call because the kvm call has to run through the swap radix tree to collect the usage information.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: swapoff code comitted.
:Added the enum instead of is_swap* commands and changed from kvm to :sysctl to get the swap information. : :Eirik Nygaard [EMAIL PROTECTED] :PGP Key: 83C55EDE All right, I found a couple more bugs and fleshed it out a bit. You got your LINKS and MLINKS reversed and forgot a +=, you forgot to initialize the do_swapoff variable in swap_on_off(), and you reuse 'total' and 'used' in swaplist() in a manner which breaks the -s option. I have included an updated patch below based on these fixes and a few other minor cleanups. I also changed the block size for -h from 1000 to 1024 bytes to make it consistent with pstat -s and friends (and also OpenBSD and NetBSD). I also added -A, -U, cleaned up usage(), and made the options conform to NetBSD and OpenBSD. The only thing really missing is for us to handle the BLOCKSIZE environment variable like 'df' does, and appropriate additions to the manual page (which I would be happy to do). Once we get those in I will commit it. Here is an updated patch. 
-Matt

Index: Makefile
===================================================================
RCS file: /home/ncvs/src/sbin/swapon/Makefile,v
retrieving revision 1.7
diff -u -r1.7 Makefile
--- Makefile	15 Dec 2002 19:17:56 -0000	1.7
+++ Makefile	18 Dec 2002 21:31:41 -0000
@@ -4,6 +4,8 @@
 PROG=	swapon
 MAN=	swapon.8
 LINKS=	${BINDIR}/swapon ${BINDIR}/swapoff
+LINKS+=${BINDIR}/swapon ${BINDIR}/swapctl
 MLINKS=swapon.8 swapoff.8
+MLINKS+=swapon.8 swapctl.8
 
 .include <bsd.prog.mk>
Index: swapon.c
===================================================================
RCS file: /home/ncvs/src/sbin/swapon/swapon.c,v
retrieving revision 1.13
diff -u -r1.13 swapon.c
--- swapon.c	15 Dec 2002 19:17:56 -0000	1.13
+++ swapon.c	18 Dec 2002 22:20:42 -0000
@@ -45,6 +45,11 @@
   "$FreeBSD: src/sbin/swapon/swapon.c,v 1.13 2002/12/15 19:17:56 dillon Exp $";
 #endif /* not lint */
 
+#include <sys/stat.h>
+#include <sys/param.h>
+#include <sys/user.h>
+#include <sys/sysctl.h>
+
 #include <err.h>
 #include <errno.h>
 #include <fstab.h>
@@ -52,10 +57,13 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+
+static void usage(void);
+static int swap_on_off(char *name, int ignoreebusy);
+static void swaplist(int, int, int);
 
-static void usage(const char *);
-static int is_swapoff(const char *);
-int	swap_on_off(char *name, int ignoreebusy, int do_swapoff);
+enum { SWAPON, SWAPOFF, SWAPCTL } orig_prog, which_prog = SWAPCTL;
 
 int
 main(int argc, char **argv)
@@ -63,48 +71,105 @@
 	struct fstab *fsp;
 	int stat;
 	int ch, doall;
-	int do_swapoff;
-	char *pname = argv[0];
-
-	do_swapoff = is_swapoff(pname);
-
+	int sflag = 0, lflag = 0, hflag = 0;
+
+	if (strstr(argv[0], "swapon"))
+		which_prog = SWAPON;
+	else if (strstr(argv[0], "swapoff"))
+		which_prog = SWAPOFF;
+	orig_prog = which_prog;
+
 	doall = 0;
-	while ((ch = getopt(argc, argv, "a")) != -1)
-		switch((char)ch) {
+	while ((ch = getopt(argc, argv, "AadlhksU")) != -1) {
+		switch(ch) {
+		case 'A':
+			if (which_prog == SWAPCTL) {
+				doall = 1;
+				which_prog = SWAPON;
+			} else {
+				usage();
+			}
+			break;
 		case 'a':
-			doall = 1;
+			if (which_prog == SWAPON || which_prog == SWAPOFF)
+				doall = 1;
+			else
+				which_prog = SWAPON;
+			break;
+		case 'd':
+			if (which_prog == SWAPCTL)
+				which_prog = SWAPOFF;
+			else
+				usage();
+			break;
+		case 's':
+			sflag = 1;
+			break;
+		case 'l':
+			lflag = 1;
+			break;
+		case 'h':
+			hflag = 'M';
+			break;
+		case 'k':
+			hflag = 'K';
+			break;
+		case 'U':
+			if (which_prog == SWAPCTL) {
+				doall = 1;
+				which_prog = SWAPOFF;
+			} else {
+				usage();
+			}
 			break;
 		case '?':
 		default:
-			usage(pname);
+			usage();
 		}
+	}
 	argv += optind;
-
+
 	stat = 0;
-	if (doall)
-
patch #3 Re: swapoff code comitted.
Here's another update. I cleaned things up even more, added BLOCKSIZE support, and updated the manual page. It looks quite nice now.

-Matt

Index: Makefile
===================================================================
RCS file: /home/ncvs/src/sbin/swapon/Makefile,v
retrieving revision 1.7
diff -u -r1.7 Makefile
--- Makefile	15 Dec 2002 19:17:56 -0000	1.7
+++ Makefile	18 Dec 2002 21:31:41 -0000
@@ -4,6 +4,8 @@
 PROG=	swapon
 MAN=	swapon.8
 LINKS=	${BINDIR}/swapon ${BINDIR}/swapoff
+LINKS+=${BINDIR}/swapon ${BINDIR}/swapctl
 MLINKS=swapon.8 swapoff.8
+MLINKS+=swapon.8 swapctl.8
 
 .include <bsd.prog.mk>
Index: swapon.8
===================================================================
RCS file: /home/ncvs/src/sbin/swapon/swapon.8,v
retrieving revision 1.21
diff -u -r1.21 swapon.8
--- swapon.8	15 Dec 2002 19:17:56 -0000	1.21
+++ swapon.8	18 Dec 2002 22:46:01 -0000
@@ -43,45 +43,101 @@
 .Fl a
 .Nm swap[on|off]
 .Ar special_file ...
+.Nm swapctl
+.Fl lshk
+.Nm swapctl
+.Fl AU
+.Nm swapctl
+.Fl a
+.Ar special_file ...
+.Nm swapctl
+.Fl d
+.Ar special_file ...
 .Sh DESCRIPTION
 The
+.Nm swap[on,off,ctl]
+utilties are used to control swap devices in the system. At boot time all
+swap entries in
+.Pa /etc/fstab
+are added automatically when the system goes multi-user.
+Swap devices are interleaved and kernels are typically configured
+to handle a maximum of 4 swap devices. There is no priority mechanism.
+.Pp
+The
 .Nm swapon
-utility is used to specify additional devices on which paging and swapping
-are to take place.
-The system begins by swapping and paging on only a single device
-so that only one disk is required at bootstrap time.
-Calls to
-.Nm swapon
-normally occur in the system multi-user initialization file
-.Pa /etc/rc
-making all swap devices available, so that the paging and swapping
-activity is interleaved across several devices.
+utility adds the specified swap devices to the system. If the
+.Fl a
+option is used, all swap devices in
+.Pa /etc/fstab
+will be added, unless their ``noauto'' option is also set.
 .Pp
 The
 .Nm swapoff
-utility disables paging and swapping on a device.
-Calls to
+utility removes the specified swap devices from the system. If the
+.Fl a
+option is used, all swap devices in
+.Pa /etc/fstab
+will be removed, unless their ``noauto'' option is also set.
+Note that
 .Nm swapoff
-succeed only if disabling the device would leave enough
-remaining virtual memory to accomodate all running programs.
+will fail and refuse to remove a swap device if there is insufficient
+VM (memory + remaining swap devices) to run the system.
+.Nm Swapoff
+must move sawpped pages out of the device being removed which could
+lead to high system loads for a period of time, depending on how
+much data has been swapped out to that device.
 .Pp
-Normally, the first form is used:
-.Bl -tag -width indent
-.It Fl a
-All devices marked as ``sw''
-swap devices in
+The
+.Nm swapctl
+utility exists primarily for those familiar with other BSDs and may be
+used to add, remove, or list swap. Note that the
+.Fl a
+option is used diferently in
+.Nm swapctl
+and indicates that a specific list of devices should be added.
+The
+.Fl d
+option indicates that a specific list should be removed. The
+.Fl A
+and
+.Fl D
+options to
+.Nm swapctl
+operate on all swap entries in
 .Pa /etc/fstab
-are added to or removed from the pool of available swap
-unless their ``noauto'' option is also set.
-.El
+which do not have their ``noauto'' option set.
+.Pp
+Swap information can be generated using the
+.Nm swapinfo
+program,
+.Nm pstat
+.Fl s ,
+or
+.Nm swapctl
+.Fl lshk .
+The
+.Nm swapctl
+utility has the following options for listing swap:
+.Bl -tag -width indent
+.It Fl l
+List the devices making up system swap.
+.It Fl s
+Print a summary line for system swap.
+.It Fl h
+Output values in megabytes.
+.It Fl k
+Output values in kilobytes.
 .Pp
-The second form is used to configure or disable individual devices.
+The BLOCKSIZE environment variable is used if not specifically
+overridden. 512 byte blocks are used by default.
+.El
 .Sh SEE ALSO
 .Xr swapon 2 ,
 .Xr fstab 5 ,
 .Xr init 8 ,
 .Xr mdconfig 8 ,
 .Xr pstat 8 ,
+.Xr swapinfo 8 ,
 .Xr rc 8
 .Sh FILES
 .Bl -tag -width /dev/{ad,da}?s?b -compact
Index: swapon.c
===================================================================
RCS file: /home/ncvs/src/sbin/swapon/swapon.c,v
retrieving revision 1.13
diff -u -r1.13 swapon.c
--- swapon.c	15 Dec 2002 19:17:56 -0000	1.13
+++ swapon.c	18 Dec 2002 22:53:52 -0000
@@ -45,6 +45,11 @@
   "$FreeBSD: src/sbin/swapon/swapon.c,v 1.13 2002/12/15 19:17:56 dillon Exp $";
 #endif /* not lint */
 
+#include <sys/stat.h>
+#include <sys/param.h>
+#include <sys/user.h>
+#include <sys/sysctl.h>
+
 #include <err.h>
 #include <errno.h>
 #include <fstab.h>
@@ -52,10 +57,13 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include
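The unit selection the manual page describes (-h for megabytes, -k for kilobytes, otherwise BLOCKSIZE, otherwise 512-byte blocks) might be sketched as below. The part of the patch that actually does this is cut off above, so this is not the committed code: `sk_pick_blocksize()` and `sk_bytes_to_units()` are hypothetical helpers, the 'M'/'K' values follow the hflag assignments in the earlier swapon.c diff, and BLOCKSIZE is assumed here to be a plain decimal byte count.

```c
#include <assert.h>
#include <stdlib.h>

/* hflag mirrors the patch: 'M' for -h, 'K' for -k, 0 when neither was given.
 * env_blocksize is the value of BLOCKSIZE, or NULL if it is unset. */
static long sk_pick_blocksize(int hflag, const char *env_blocksize)
{
    char *end;
    long bs;

    if (hflag == 'M')
        return 1024L * 1024L;
    if (hflag == 'K')
        return 1024L;
    if (env_blocksize == NULL)
        return 512L;
    bs = strtol(env_blocksize, &end, 10);
    if (end == env_blocksize || *end != '\0' || bs <= 0)
        return 512L;    /* unparsable value: fall back to the default */
    return bs;
}

/* Convert a byte count into whole display units, rounding down. */
static long long sk_bytes_to_units(long long bytes, long blocksize)
{
    assert(blocksize > 0);
    return bytes / blocksize;
}
```

A caller would pass getenv("BLOCKSIZE") as the second argument. On FreeBSD the real utilities most likely lean on getbsize(3), as df does, which also understands suffixed values; the strtol() parsing here is a simplification.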
Re: patch #3 Re: swapoff code comitted.
:Looks good to me, modulo a few nits. I try not to nitpick, but
:I've mentioned a few of them below. (BDE does a better job of it
:than I do anyway. :-)
:
:The patch puts identical functionality in two places, so maybe it
:would make sense to rip support for -s out of pstat/swapinfo (and
:integrate 'pstat -ss' support into swapctl). If we really want to
:go the NetBSD way, we could even integrate the swapon(2) and
:swapoff(2) into swapctl(2). Doesn't matter to me.

I think we should keep swapon and swapoff as separate commands. They are the most intuitive of the lot. NetBSD's pstat supports -s, as does OpenBSD's, so there is no reason to rip out support for -s in our pstat. Neither OpenBSD nor NetBSD has a swapinfo that I can find. We could potentially rip out the swapinfo command, though all it is is a hardlink to pstat so it wouldn't really save us anything.

:(BTW, when I get the chance, I'll re-run my swapoff torture tests
:now that Alan Cox's new locking is in place. Chances are the
:swapoff code needs to be brought up to date..)

I ran it across Alan and he thought it looked ok at a glance, but I agree now that it is integrated in we should take a more involved look at it.

:...
:[...]
: +if (strstr(argv[0], "swapon"))
: +which_prog = SWAPON;
: +else if (strstr(argv[0], "swapoff"))
: +which_prog = SWAPOFF;
:
:It's probably better to do a strcmp on strrchr(argv[0], '/') when
:argv[0] contains a slash. Otherwise people will wonder why
:swapoff(8) breaks when they (perhaps mistakenly) compile and run
:it from the src/sbin/swapon directory.

Hmm. How about a strstr on a strrchr. I don't like making exact comparisons because it removes flexibility that someone might want in regards to hardlinks (for example, someone might want to add a version or other suffix to differentiate certain binaries in a test environment or in an emulation environment). e.g. bsdswapon vs swapon.

Isn't there a shortcut procedure to handle the NULL return case? I know there is one for a forward scan. I thought there was one for the reverse scan too.

    if ((ptr = strrchr(argv[0], '/')) == NULL)
        ptr = argv[0];
    if (strstr(ptr, "swapon"))
        ...

: +if (which_prog == SWAPCTL) {
: +doall = 1;
: +which_prog = SWAPON;
: +} else {
: +usage();
: +}
: +break;
:[...]
:
:The repeated 'whichprog == foo' tests can be combined into a
:single test at the end of the loop.

They do subtly different things so I am not sure what you mean. You need to provide some code here.

: -
: +
:
:?

It's probably a space or a tab. I'll track it down.

-Matt
Matthew Dillon [EMAIL PROTECTED]
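The "strstr on a strrchr" idea from this reply can be tried out as a small stand-alone function. This is a sketch, not the code that was committed: the function name `sk_which_prog()` and the named enum are hypothetical, and only the strrchr()/strstr() logic comes from the message.

```c
#include <assert.h>
#include <string.h>

enum sk_prog { SK_SWAPON, SK_SWAPOFF, SK_SWAPCTL };

/* Decide which personality to run as from argv[0]. Only the last path
 * component is searched, so a directory named "swapon" does not confuse
 * the match, while prefixed or suffixed hardlink names still work. */
static enum sk_prog sk_which_prog(const char *argv0)
{
    const char *ptr;

    assert(argv0 != NULL);
    if ((ptr = strrchr(argv0, '/')) == NULL)
        ptr = argv0;
    if (strstr(ptr, "swapon"))
        return SK_SWAPON;
    if (strstr(ptr, "swapoff"))
        return SK_SWAPOFF;
    return SK_SWAPCTL;  /* default, never matched by name */
}
```

Run as /usr/src/sbin/swapon/swapoff this picks swapoff, where a plain strstr() over the whole argv[0] would have matched the directory name and picked swapon; a hardlink called bsdswapon still resolves to swapon, which is the flexibility argued for above.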
Re: patch #3 Re: swapoff code comitted.
:
:On Wed, 18 Dec 2002, Matthew Dillon wrote:
:
: Here's another update. I cleaned things up even more, add BLOCKSIZE
: support, and updated the manual page. It looks quite nice now.
:
:I still dislike it. It starts by adding style bugs to the Makefile
:(changing = to += for the initial assignments to LINKS and MLINKS)
:and doesn't get any better.
:
:Bruce

It hasn't been committed yet so please feel free to email me patches for stylistic issues over the next day or two.

In regards to having a swapctl command at all, I think anything that offers familiarity between the BSDs is a good thing to have. If anything we should remove 'swapinfo'. 'pstat -s' is actually shorter and easier to type :-)

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: ipfw userland breaks again.
Huh. Interesting. The IP_FW_ADD test threw me but now that I look at the code more closely it is only there because IP_FW_ADD is a valid SOPT_GET op as well as a SOPT_SET op. But FLUSH and friends are SOPT_SET only. Now I see how it works :-)

-Matt

:..
:: rule that, say, prevents spoofing is as bad as adding a rule that
:: allows everything through :-(
:
:This comment got me thinking. The thinking lead to a lot of looking
:at code between compiles today, and more this evening. It would
:appear that the test that was there was sufficient to deal with the
:cases that I was worried about. Revisiting the change:
:
:- if (sopt->sopt_name == IP_FW_ADD ||
:+ if (sopt->sopt_name == IP_FW_ADD || sopt->sopt_name == IP_FW_UNBREAK ||
: (sopt->sopt_dir == SOPT_SET && sopt->sopt_name != IP_FW_RESETLOG)) {
:
:Earlier, we only allow IP_FW_{ADD,UNBREAK,RESETLOG,FLUSH,DELETE} for
:SOPT_SET requests and IP_FW_ADD (and a few others) for SOPT_GET
:requests. Since GET + ADD is only case that isn't a SET that changes
:things, the == SOPT_SET takes care of the case that you added.
:
:For a while I thought one could do nasty things based on GET + FLUSH,
:say, but in raw_ip.c, we do the proper checks before calling
:ip_fw_ctl_ptr().
:
:So it looks like this code is subtle enough to have fooled both of
:us. This one change isn't needed for this patch.
:
:Warner
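Warner's argument, that the extra IP_FW_UNBREAK clause adds nothing as long as UNBREAK only ever arrives as a SOPT_SET request, can be checked mechanically with a toy model. The enums and function names below are hypothetical stand-ins with made-up values, not the kernel's definitions, and the sketch models only the boolean test from the quoted diff, not whatever the kernel does inside that branch.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_dir  { SK_SOPT_GET, SK_SOPT_SET };
enum sk_name { SK_FW_ADD, SK_FW_DEL, SK_FW_FLUSH, SK_FW_RESETLOG,
               SK_FW_UNBREAK, SK_FW_NAME_COUNT };

/* The test as it stood before the patch. */
static bool sk_test_original(enum sk_dir dir, enum sk_name name)
{
    return name == SK_FW_ADD ||
        (dir == SK_SOPT_SET && name != SK_FW_RESETLOG);
}

/* The test with the extra UNBREAK clause from the patch. */
static bool sk_test_patched(enum sk_dir dir, enum sk_name name)
{
    return name == SK_FW_ADD || name == SK_FW_UNBREAK ||
        (dir == SK_SOPT_SET && name != SK_FW_RESETLOG);
}

/* True if both tests agree for every request name in the SET direction. */
static bool sk_same_for_all_set_requests(void)
{
    for (int n = 0; n < SK_FW_NAME_COUNT; n++)
        if (sk_test_original(SK_SOPT_SET, (enum sk_name)n) !=
            sk_test_patched(SK_SOPT_SET, (enum sk_name)n))
            return false;
    return true;
}
```

The two tests differ only for GET + UNBREAK, which according to the quoted message is rejected before this code is reached, so for every request that can actually get here the original test already covers the new operation.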
Re: I'm leaving the project
:
:Matt Dillon wrote:
: Thanks to my dear friend Warner Losh. I've decided to leave FreeBSD and
: flame in another project. Maybe I could join OpenBSD, the seem to share
: my views on how to deal with other people.
:
: I hereby give maintainership of all my code to Warner, or, whoever wants
: it, for that matter.
:
:Does anyone know why this person is trying to (poorly) impersonate MD?

Probably because I lambast him mercilessly for being such a wimp. It's kinda sad, actually. He's probably not making any friends with the people running the blind proxies he abuses to post, either.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: I'm leaving the project
:You know the person by name/alias, then? Who is it?
:

I do not know who it is; he posts through anonymous proxies.

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: ipfw userland breaks again.
:How this could be helpful in a remote upgrade scenario that has
:IPFW ABI incompatibility issues?
:
:One alternative approach would be to not compile IPFW into a
:kernel but rather have it loaded as a module. Then, you
:install new kernel, edit out ipfw_enable=YES for the time
:being, reboot with the new kernel, installworld, edit
:ipfw_enable=YES back in, reboot, and you're done.
:
:
:Cheers,
:--
:Ruslan Ermilov Sysadmin and DBA,

Well, the basic problem is that you don't actually know when the IPFW API is going to break. I do incremental upgrades most of the time and IPFW breaks maybe once every 5 upgrades. So for a manual upgrade it can be a severe inconvenience to have to deal with the possibility every time you upgrade. For an automated upgrade one can always automate and 'ipfw unbreak' (or 'ipfw open' as John just suggested to me) is not needed.

What this patch does is allow you to upgrade via a serial console normally, without having to pay particular attention to IPFW, and if the IPFW API happens to break you can then simply 'ipfw unbreak' to get access to the network and then fix whatever broke.

The only viable alternative that I've heard so far on the lists, other than 'Matt should rewrite the API so it doesn't break', is to have the installkernel and installworld targets check for ipfw incompatibility and install the new ipfw. Of course, this doesn't help if you have to revert the kernel. I still prefer the failsafe my solution supplies.

-Matt
Re: 'I want to apologize'
Ha ha. -Matt Matthew Dillon [EMAIL PROTECTED] :From [EMAIL PROTECTED] Mon Dec 16 14:00:03 2002 :... :Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) : by hub.freebsd.org (Postfix) with ESMTP : id D49B837B401; Mon, 16 Dec 2002 13:59:20 -0800 (PST) :Received: from gator.darkhorse.com (mail.darkhorse.com [209.95.33.140]) : by mx1.FreeBSD.org (Postfix) with ESMTP : id 5C37943EDC; Mon, 16 Dec 2002 13:59:20 -0800 (PST) : (envelope-from [EMAIL PROTECTED]) :Received: from [207.236.15.9] (account [EMAIL PROTECTED]) : by gator.darkhorse.com (CommuniGate Pro WebUser 3.5.9) : with HTTP id 9513340; Mon, 16 Dec 2002 13:58:55 -0800 :From: Matt Dillon [EMAIL PROTECTED] :Subject: I want to apologize :To: [EMAIL PROTECTED] :Cc: [EMAIL PROTECTED] :X-Mailer: CommuniGate Pro Web Mailer v.3.5.9 :Date: Mon, 16 Dec 2002 13:58:55 -0800 :Message-ID: [EMAIL PROTECTED] :MIME-Version: 1.0 :Content-Type: text/plain; charset=ISO-8859-1; format=flowed :Content-Transfer-Encoding: 8bit :Sender: [EMAIL PROTECTED] :List-ID: freebsd-hackers.FreeBSD.ORG :List-Archive: http://docs.freebsd.org/mail/ (Web Archive) :List-Help: mailto:[EMAIL PROTECTED]?subject=help (List Instructions) :List-Subscribe: mailto:[EMAIL PROTECTED]?subject=subscribe%20freebsd-hackers :List-Unsubscribe: mailto:[EMAIL PROTECTED]?subject=unsubscribe%20freebsd-hackers :X-Loop: FreeBSD.ORG :Precedence: bulk : :Hey dudes, I want to apologize for being a total *asshole* :wrt the ipfw thingie. Sorry. I know my patch was shit :anyway, and that ipfw blows dead goats when compared to :ipf, but even with that in mind, I had to pull a deraadt, :sorry. I'm so sorry. I mean, I've had my commit bit taken :away many times already, and yet I'm stupid enough to keep :with the same attitude. Damn, : :Yours, : Matthew. :_ :For the best comics, toys, movies, and more, :please visit http://www.tfaw.com/?qt=wmf To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: ipfw userland breaks again.
:I don't like the patch from a security standpoint. It makes it too
:easy to turn off a firewall. If you want to be that stupid about
:security, you should just make the default be 'accept all' and be done
:with it. I'm opposed to this patch unless you can get the security
:officer to sign off on it. The defaults are there for a reason so
:that we fail 'safe' from a security point of view.
:
:The real fix is to fix the abi problems.
:
:Warner

This is complete BULLSHIT, Warner. This patch exists precisely so the firewall can be turned on in secure mode. It does not make it any easier to turn off than adding a rule:

    ipfw add 2 allow all from any to any

So don't give me this bullshit about the patch being a security issue. YOU KNOW IT ISN'T. Now you are forcing me to go to core. It's absolutely ridiculous and you know it. Goddamn it, next time I won't even bother posting if all I get is this sort of crap.

-Matt
Re: ipfw userland breaks again.
:
:The real fix is to fix the abi problems.
:
:Warner

Doh!! Thanks for volunteering to fix the ABI problems. No? You don't want to do it? Gee, I saw that one coming a mile away! THEN DON'T COMPLAIN.

This is not a fucking security issue. This is a patch that solves a major irritation, nothing more, nothing less, except some people can't stand an 8-line fix to the problem apparently.

-Matt
Re: ipfw userland breaks again.
:How about sending the patch to the Technical Review Board, trb@ instead.
:
:Thanks.
:
:Cheers,
:
:--
:Anders.

Getting bored sitting on your buns? It's already gone to core and, frankly, I think core is the proper forum now that Warner has declared it a security issue (when it obviously isn't. How easy is it to do an ipfw add 2 allow all from any to any? It's ludicrous to call it a security issue).

I really don't mind people disagreeing, but I do mind it when people believe that the proper solution is for Matt Dillon to spend a man week fixing a major API that he didn't write instead of committing an 8 line patch that deals with the issue well enough so sysads don't have to pull their hair out every time it happens.

As I said before, I have no problem with the patch being removed once the API is fixed, but I am NOT the guy who should be rewriting the API and, frankly, it is inappropriate for anyone to suggest that I should be if they themselves are not willing to sit down in front of a keyboard and come up with a committable solution of their own. So far all I've heard are utterly trivial complaints from people who aren't willing to code up a solution themselves.

-Matt
Re: ipfw userland breaks again.
:
:In message: [EMAIL PROTECTED]
:Matthew Dillon [EMAIL PROTECTED] writes:
:: :
:: :The real fix is to fix the abi problems.
:: :
:: :Warner
::
:: Doh!! Thanks for volunteering to fix the ABI problems. No? You
:: don't want to do it? Gee, I saw that one coming a mile away!
:: THEN DON'T COMPLAIN.
:
:GET OVER YOURSELF. YOUR CONTRIBUTIONS ARE NOT GREAT ENOUGH THAT I WILL
:TOLERATE THIS BULLSHIT ANYMORE.
:
:Warner

Bullshit is exactly what it is Warner, but I'm not the one spouting it. When all is said and done, this patch is utterly trivial and doesn't hurt anyone. I have said on multiple occasions that it can be removed when the API is fixed, but I am not willing to wait for the API to be fixed because the API has been an open issue for, what, a year now? More? If you want to fix the API then you should go and fix it.

I should have just committed the damn thing rather than ask for a review on the mailing list. I should know better by now.

-Matt