Re: [PATCH] Searching for users of netncp and nwfs to help debug5.0 problems
some comments: firstly: ncp_conn_locklist(int flags, struct proc *p) { ! struct thread *td = TAILQ_FIRST(p-p_threads); /* XXX */ ! ! return lockmgr(listlock, flags | LK_CANRECURSE, 0, td); } can't you use unidifs? :-) ok there is a Macro to find the first thread in a process. FIRST_THREAD_IN_PROC(p) I use it because it makes it easy to find the places that are DEFINITLY BROKEN. The marker for a KSE breakage is: XXXKSE, but places that use FIRST_THREAD_IN_PROC(p) are marked implicitly since nearly any time it is used, the code is trying to do something that doesn't make sense. (except in fork and maybe exit and exec, and in things like linux emulation where you KNOW there is only one thread). if you see TAILQ_FIRST(p-p_threads) (or FIRST_THREAD_IN_PROC(p)) you can pretty much guarantee that what is happenning there need to be completely rewritten, possibly to the extent of rerwiting it's callers and the caller's callers. That is the problem I hit when trying to convert it to start with... Since I don't know the way that process IDs come into the session control (you have to understand the protocol for that) I basically hit a wall on trying to work out what to rewrite, and how. BTW the obnoxious FIRST_THREAD_IN_PROC will go away when we have got rid of most of the broken code and be replaced in places for which it is ok with p2td() which will be: __inline struct thread * p2td(struct proc *p) { KASSERT(((p-p_flag P_KSES) == 0), (Threaded program uses p2td())); return (TAILQ_FIRST(p-p_threads)); } You can always get from a thread to a single process but the reverse always presents the question which thread?. This question can only be answered with external information or if you know there is only one thread at this moment. julian On Tue, 26 Nov 2002, Terry Lambert wrote: Nate Lawson wrote: It's not so much that I volunteered as I said that I'd help with thread/proc issues.. The trouble was that there are places where it used a proc in the old code, but in some cases it needs to be a proc, and in other cases it now needs to be a thread. But all they stored was the proc. Also, from my memories of the code you needed to understand the protocol to know which needed to be which, and I don't know that protocol. In addition whoever does it needs to remember that any structure that stores a thread poitner is probably in error, as threads are transient items and any stored thread pointer is probably a wild pointer within a few milliseconds of being stored. :-) I'll take a whack at it and send it out by tomorrow, working or not. Don't bother. 8-). The attached patch makes it compile, and takes a shot at doing the right thing. The threasd stuff is problematic; it's useful only for a blocking context. The process stuff is there to identify the connection, actually, which can mean huge latencies (hence the caching of procp). It helps to know that the protocol is exclusively request/response per session, the current code handles only a single session per process (not one per thread), and that lock requests are answered bith synchronously and asynchronously (request/response, then async message on timeout or success). -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: [PATCH] Searching for users of netncp and nwfs to help debug5.0 problems
First of all, the patch was just to get to the point of compilability, which other prople said they would take it from there. I don't have a NetWare server to test against in my apartment. I'd be just as happy to _let_ the other people who wanted to take it from there do do, now that I made it compile. That said... Julian Elischer wrote: some comments: firstly: ncp_conn_locklist(int flags, struct proc *p) { ! struct thread *td = TAILQ_FIRST(p-p_threads); /* XXX */ ! ! return lockmgr(listlock, flags | LK_CANRECURSE, 0, td); } can't you use unidifs? :-) Only if I want the code to be unreadable. 8-) 8-). Can't you apply the patch to a -current tree checked out for that purpose, and get whatever flavor of diffs you want, e.g. cvs diff -u? Unidiffs don't support fuzz, and uless you are committing the thing, I'd rather not have to recreate it interatively until it passes someone's filter. A context diff gives me that. ok there is a Macro to find the first thread in a process. FIRST_THREAD_IN_PROC(p) I didn't see this. It's definitely the way to go. Any chance of that being documented any time soon, so that no one else does the obvious thing, instead, like I did? I use it because it makes it easy to find the places that are DEFINITLY BROKEN. The marker for a KSE breakage is: XXXKSE, but places that use FIRST_THREAD_IN_PROC(p) are marked implicitly since nearly any time it is used, the code is trying to do something that doesn't make sense. (except in fork and maybe exit and exec, and in things like linux emulation where you KNOW there is only one thread). if you see TAILQ_FIRST(p-p_threads) (or FIRST_THREAD_IN_PROC(p)) you can pretty much guarantee that what is happenning there need to be completely rewritten, possibly to the extent of rerwiting it's callers and the caller's callers. That is the problem I hit when trying to convert it to start with... Actually, I really wanted a LAST_THREAD_IN_PROC(p). The reality is that I want the most recently inserted thread to use as a blocking context, on the theory that it would not be used until much later, and that all the pages it cares about are much more likely to be in core, particularly on a process with tons of threads. The only reason it's being used at all is that the lockmgr() class need a blocking context for their calls, and it's an explicit parameter (arguably, it should be curthread). I can edit the patch (since it's not a unidiff 8-) 8-) 8-)), or I can post a new one, if you want (it's only 9K, compressed). But realize that all it means is give me a thread that I can use as a blocking context while I'm waiting on lockmgr(). Since I don't know the way that process IDs come into the session control (you have to understand the protocol for that) I basically hit a wall on trying to work out what to rewrite, and how. The wire protocol is always request/response. Always. As I stated before, the only exception is a lock, with/without a timeout. In that case, you get the synchronous response to your synchornous request, which basically means request has been queue for servicing, and you later get a seperate notification. The notification is by connection. The connection is per process, because we are talking about a connection where the credentials are associated with the connection in question. The connection provides both a state context (waiting for request vs. request in progress). Making additional requests over the same VC will result in a request in progress, go away you dork response to the client. When an async response to a lock request comes in, in comes in on a seperate VC (each connection has 3 VCs, only 2 of which are normally used: the one for request/response, and the one for async notifications, which is overloaded to handle lock responses). The connection is mapped back to a process to map it back to the blocker. What this basically means is that NCP's can't be handled as multithreaded, without establishing a VC per thread. It just does not work. Therefore NCP request are going to serialize in the kernel, no matter what you do in happy-thread-town. If you are asking for the code to be thread safe, then you are basically talking about multithreading the whole stack: NWFS - NCP Client - IPX - IP Probably it would be a better idea if TCP/IP was multithreaded first? BTW the obnoxious FIRST_THREAD_IN_PROC will go away when we have got rid of most of the broken code and be replaced in places for which it is ok with p2td() which will be: __inline struct thread * p2td(struct proc *p) { KASSERT(((p-p_flag P_KSES) == 0), (Threaded program uses p2td())); return (TAILQ_FIRST(p-p_threads)); } Uh, how exactly is that less obnoxious, given it's the same code with a different name and an obnoxious inline instead of a macro? 8-). You can always get from a thread to a single process but the reverse always presents
Re: [PATCH] Searching for users of netncp and nwfs to help debug5.0 problems
Terry Lambert wrote: Did you want me to update the patch to use your FIRST_THREAD_IN_PROC macro and resend it? OK; here it is, whether you wanted it or not. -- Terry smbfs_thr.diff.gz Description: GNU Zip compressed data
Re: Searching for users of netncp and nwfs to help debug5.0 problems
At 1:04 PM -0800 2002/11/23, Nate Lawson wrote: I'd like to see this on in 5.0 for a while, given the number of new users and problem reports we are already getting and will soon get even more of. If it's too hard to complete this work, could we add DDB in by default in GENERIC? (I shudder to think at the amount of debate such a request would cause but I'm making it nevertheless). IMO, for -CURRENT, we should turn on all possible debugging by default in the kernel. Then, we should make it as easy as possible to capture and report the kind of trace information we need to help locate and fix the problem. Hmm. I was thinking that I needed just one Cyclades terminal server box for my old Sun SPARC 5 clone (break-safe, of course). Now, I'm thinking that I'll need a multi-port model, which I can also connect to the Compaq Armada 4131T that I'll be running -DP2 on, and given all the network problems I've had lately, I may need to also connect it to the cisco ISDN/Ethernet router -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Searching for users of netncp and nwfs to help debug5.0 problems
At 6:45 PM -0500 2002/11/21, Robert Watson wrote: I appreciate the effort, and am interested in the idea, but in this case it was as much a solicitation for a developer as for the testing environment itself. This won't just be testing of netncp and nwfs, this will probably require a developer to have local access to a netware configuration that they can do nasty things to in order to exercise the code properly. Unfortunately, those seem to be in short supply. I understand. My understanding is that part of the job of this team is to find people that have various different types of environments that could be used for testing like this, and put them together with the necessary development personnel. If I might suggest: there's a freebsd-qa mailing list. It's a great place to organize QA efforts, whereas freebsd-chat is notorious for its lack of signal (it's where dead signals go to rot). There's been some talk of freebsd-qa, but so far the only thing I recall being said is that the sort of thing we're talking about doing is not what this list has been used for in the past. We were kind of wondering where we could take this particular effort, and if we could commandeer the freebsd-qa list for this purpose. If you moving the conversation there and get a bunch of people subscribed and interested, they'll be able to look there for the stream of bug fixes associated with the install process, and get easy access to the testing guide as it evolves, since we usually pass drafts through there, etc. We will be moving the conversation to somewhere more appropriate, but I don't know when or where. I believe that our biggest problem at the moment is finding the necessary development types to help us debug the problems and get them sorted out -- we've got people who have hardware, and would be more than willing to help test things out, but in the past they haven't gotten much help from the developers. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Searching for users of netncp and nwfs to help debug5.0 problems
At 2:31 PM -0800 2002/11/22, Terry Lambert wrote: A bug-filing wizard would be useful. The send-pr system doesn't cut it, and most people are unaware of how to file a decent bug report. It doesn't help when the process involves another computer, a serial cable, recompiling a kernel to use a serial console and turn DDB support on, special configuration for system dump images, and changing the size of your swap partition to support the amount of RAM you have put into the machine. Speaking as someone who is about to step off the deep end and start trying to actually run and test -CURRENT on my system here at home, I believe that this kind of resource would be vitally important. In contrast, I've had a few crashes this past week from other programs here on my PowerBook G4 running MacOS X (primarily Chimera, based on the Mozilla Gecko engine with native Aqua interface), and they have made it very easy for me to report crashes. They have integrated tools to extract the maximum amount of information from the system as to exactly what other programs were running, what the program stack was, and a whole host of other things. All I have to do is type in my e-mail address, optionally describe what I was trying to do at the time, and have a functioning Internet connection so that they can upload the reports. I'd share some examples with you, but they are *huge*. Now, we can say that running -CURRENT is not for people who want to be molly-coddled. But I believe it's a good idea to give people better tools to help make a better system. I am convinced that we can find a better compromise. If the PR contains a patch, and the owner does nothing in the allotted time, then give the PR submitter a commit bit, and give ownership of the area over to them. At the very least, PR's will be closed, and more people actively writing code will end up with commit bits. Gack. I'm not sure even I would be quite this radical -- any moron (like me ;-) can generate a PR that might include a patch. IMO, better would be to give the area to another person who is suitably qualified, has the available cycles, and presumably already has a commit bit. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Searching for users of netncp and nwfs to help debug5.0 problems
[ ... bug filing wizard ... ] Brad Knowles wrote: Speaking as someone who is about to step off the deep end and start trying to actually run and test -CURRENT on my system here at home, I believe that this kind of resource would be vitally important. In contrast, I've had a few crashes this past week from other programs here on my PowerBook G4 running MacOS X (primarily Chimera, based on the Mozilla Gecko engine with native Aqua interface), and they have made it very easy for me to report crashes. They have integrated tools to extract the maximum amount of information from the system as to exactly what other programs were running, what the program stack was, and a whole host of other things. All I have to do is type in my e-mail address, optionally describe what I was trying to do at the time, and have a functioning Internet connection so that they can upload the reports. I'd share some examples with you, but they are *huge*. The amount of information is much less important than the utility of the information to someone who is attempting to resolve the bug. Most bug reports contain a lot of information that is really useless, not topical to the bug in question (I was running XYZ and the kernel paniced!), etc.. In terms of kernel problems, the absolutely most useful information is the DDB traceback, followed by a DDB traceback mapped against a debug kernel, followed by a system dump image and a debug kernel, etc.. By default, the system, as distributed, is not setup to supply this. In terms of general problems, diagnostic messages are pretty lame; setting aside the argument against data interfaces not being ripped out being in itself lame, one bix example is the libkvm error message kinfo_proc size mismatch (expected %d, got %d). An error message should tell give you enough information that you could deal with the problem reasonably; for the example message, a better message would be: _kvm_err(kd, kd-program, kinfo_proc size mismatch (expected %d, got %d), sizeof(struct kinfo_proc), kd-procbase-ki_structsize); _kvm_err(kd, kd-program, recompile libkvm and recompile and reinstall %s, kd-programsrcdir); Now, we can say that running -CURRENT is not for people who want to be molly-coddled. But I believe it's a good idea to give people better tools to help make a better system. I am convinced that we can find a better compromise. Plus we aren't really talking about -CURRENT, we are talking about 5.0-DP2, or almost 5.0-RC1, if we're being honest. If the PR contains a patch, and the owner does nothing in the allotted time, then give the PR submitter a commit bit, and give ownership of the area over to them. At the very least, PR's will be closed, and more people actively writing code will end up with commit bits. Gack. I'm not sure even I would be quite this radical -- any moron (like me ;-) can generate a PR that might include a patch. IMO, better would be to give the area to another person who is suitably qualified, has the available cycles, and presumably already has a commit bit. Moron PR's are easily filterable: they are closable by the owner with little effort. PR's that are ignored, on the other hand, rather than being closed, are most likely legitimate, but inconvenient. -- Terry To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Searching for users of netncp and nwfs to help debug5.0 problems
At 5:23 PM -0500 2002/11/21, Robert Watson wrote: (And, you have to bring your own test environment, as the second sentence suggests, but doesn't actually state). Over on -chat, we're in the process of putting together a list of volunteers, hardware, organizational talent, etc... to help test out -DP2. Mark Murray is involved, but I personally would like to see at least one or two more core team members committed to making this happen. If we can get a suitable group of people together, with suitable hardware, and get the coordination effort done correctly, I believe that we can help make this a much more successful project. Your assistance in this effort would be greatly appreciated. -- Brad Knowles, [EMAIL PROTECTED] They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania. GCS/IT d+(-) s:+(++): a C++(+++)$ UMBSHI$ P+++ L+ !E W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+() DI+() D+(++) G+() e++ h--- r---(+++)* z(+++) To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message