Re: [nvi-iconv]Call for test
I totally hate Gmail's reply -- I enabled "Reply to all" by default but I still got things wrong. Anyway, in short, the problem is caused by the lack of a wide-char-enabled regex. I ported the one used by nvi-devel-1.8x. A new patch is uploaded:

https://github.com/downloads/lichray/nvi2/nvi2-freebsd-2011-08-17.diff.gz

and I tested it with make buildworld. Note that this version sets WARNS=1 in vi's Makefile, since it's warning-free with clang and gcc. There is also a change to `rescue`'s compilation: it now links to libcursesw if WITH_ICONV is on.

On Tue, Aug 16, 2011 at 5:56 PM, Test Rat <ttse...@gmail.com> wrote:

Zhihao Yuan <lich...@gmail.com> writes:

On Sun, Aug 14, 2011 at 10:39 AM, Zhihao Yuan <lich...@gmail.com> wrote:

Hi, hackers:

My GSoC2011 project, Multibyte Encoding Support in Nvi, is ready for testing. The proposal of the project is here:
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/zy/1

[...]

Let me try to ``quickly'' explain how to get involved in the testing. First, download the patch from
https://github.com/downloads/lichray/nvi2/nvi2-freebsd-2011-08-14.diff.gz

It breaks buildworld for me, e.g.

  $ make all -C share/termcap
  gzip -cn /usr/src/share/termcap/termcap.5 > termcap.5.gz
  TERM=dumb TERMCAP=dumb: ex - /usr/src/share/termcap/termcap.src < /usr/src/share/termcap/reorder
  Error: stderr: Inappropriate ioctl for device
  script, 3: Destination line is inside move range
  *** Error code 1

and it crashes when WITH_ICONV is not defined. Can you confirm?

  Starting program: /usr/bin/ex - /usr/src/share/termcap/termcap.src < /usr/src/share/termcap/reorder

  Program received signal SIGSEGV, Segmentation fault.
  0x000800be7760 in ?? ()
  (gdb) bt
  #0  0x000800be7760 in ?? ()
  #1  0x0044092b in ex_writefp (sp=0x801106800, name=0x801103148 "termcap", fp=0x800e51d90, fm=0x801007ca8, tm=0x801007cb8, nlno=0x7fffc808, nch=0x7fffc800, silent=0) at /usr/src/usr.bin/vi/../../contrib/nvi2/ex/ex_write.c:321
  #2  0x0040bfb2 in file_write (sp=0x801106800, fm=0x801007ca8, tm=0x801007cb8, name=0x801103148 "termcap", flags=21) at /usr/src/usr.bin/vi/../../contrib/nvi2/common/exf.c:924
  #3  0x00440739 in exwr (sp=0x801106800, cmdp=0x801007be8, cmd=WRITE) at /usr/src/usr.bin/vi/../../contrib/nvi2/ex/ex_write.c:264
  #4  0x004400d2 in ex_write (sp=0x801106800, cmdp=0x801007be8) at /usr/src/usr.bin/vi/../../contrib/nvi2/ex/ex_write.c:91
  #5  0x00422b78 in ex_cmd (sp=0x801106800) at /usr/src/usr.bin/vi/../../contrib/nvi2/ex/ex.c:1375
  #6  0x0041f788 in ex (spp=0x7fffd040) at /usr/src/usr.bin/vi/../../contrib/nvi2/ex/ex.c:133
  #7  0x00412377 in editor (gp=0x801007b00, argc=1, argv=0x7fffd268) at /usr/src/usr.bin/vi/../../contrib/nvi2/common/main.c:424
  #8  0x0040513f in main (argc=3, argv=0x7fffd258) at /usr/src/usr.bin/vi/../../contrib/nvi2/cl/cl_main.c:123
  (gdb) bt f
  #0  0x000800be7760 in ?? ()
  No symbol table info available.
  #1  0x0044092b in ex_writefp (sp=0x801106800, name=0x801103148 "termcap", fp=0x800e51d90, fm=0x801007ca8, tm=0x801007cb8, nlno=0x7fffc808, nch=0x7fffc800, silent=0) at /usr/src/usr.bin/vi/../../contrib/nvi2/ex/ex_write.c:321
          sb = { st_dev = 4294955600, st_ino = 32767, st_mode = 0, st_nlink = 0, st_uid = 0, st_gid = 1944, st_rdev = 0, st_atim = { tv_sec = 3, tv_nsec = 34377892735 }, st_mtim = { tv_sec = 6798080, tv_nsec = 4096 }, st_ctim = { tv_sec = 0, tv_nsec = 34366769152 }, st_size = 4, st_blocks = 140737488343672, st_blksize = 3, st_flags = 0, st_gen = 4294954160, st_lspare = 32767, st_birthtim = { tv_sec = 34366543277, tv_nsec = 582 } }
          gp = (GS *) 0x801007b00
          ccnt = 0
          fline = 1
          tline = 4666
          lcnt = 0
          len = 140737488340496
          rval = -11656
          msg = 0x46f540 "253|Writing..."
          p = (CHAR_T *) 0xb <Error reading address 0xb: Bad address>
          f = 0x800c0981f "H\211A^]\017\037\204"
          flen = 140737488340168
          isutf16 = 0
  #2  0x0040bfb2 in file_write (sp=0x801106800, fm=0x801007ca8, tm=0x801007cb8, name=0x801103148 "termcap", flags=21) at /usr/src/usr.bin/vi/../../contrib/nvi2/common/exf.c:924
          mtype = OLDFILE
          sb = { st_dev = 745804815, st_ino = 70073, st_mode = 33188, st_nlink = 1, st_uid = 1001, st_gid = 1001, st_rdev = 4294967295, st_atim = { tv_sec = 1313534959, tv_nsec = 905174484 }, st_mtim = { tv_sec = 1313535150, tv_nsec = 420174354 }, st_ctim = { tv_sec = 1313535150, tv_nsec = 420174354 }, st_size = 0, st_blocks = 1, st_blksize = 131072, st_flags = 0, st_gen = 0, st_lspare = 0, st_birthtim = {
Re: building my own release images from -CURRENT
On Tue, 2011-08-16 at 15:15 -0700, Test Rat wrote:

Sean Bruno <sean...@yahoo-inc.com> writes:

Just trying some hackery with building my own release images from -CURRENT today. I've built world and my kernel. When I enter release and run make memstick, I get the following:

  *** Creating the temporary root environment in /var/tmp/temproot
   *** /var/tmp/temproot ready for use
   *** Creating and populating directory structure in /var/tmp/temproot
  *** FATAL ERROR: Cannot 'cd' to /usr/src and install files to the temproot environment
  *** Error code 1

  Stop in /dumpster/scratch/sbruno-scratch/test/release.

Where does /usr/src point? Have you tried the patch in misc/159666?

Well, wow. OK, misc/159666 does indeed fix this problem. I'm very confused that it has not come up before. Let me cc r...@freebsd.org and see if we can get a commit for this.

Sean

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: debugging frequent kernel panics on 8.2-RELEASE
Thanks to the debugging information that Steven provided and to the help I received from Kostik, I think I now understand the basic mechanics of this panic, but, unfortunately, not the details of its root cause. It seems like everything starts with some kind of a race between terminating processes in a jail and termination of the jail itself. This is where the details are very thin so far.

What we see is that a process (http) is in the exit(2) syscall, in the exit1() function actually, and past the place where the P_WEXIT flag is set and even past the place where p_limit is freed and reset to NULL. At that place the thread calls prison_proc_free(), which calls prison_deref(). Then we see that in prison_deref() the thread gets a page fault because of what seems like a NULL pointer dereference. That's just the start of the problem and its root cause.

Then trap_pfault() gets invoked and, because addresses close to NULL look like userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes on to call vm_map_growstack. The first thing vm_map_growstack does is call lim_cur(), but because p_limit is already NULL, that call results in a NULL pointer dereference and a page fault. Goto the beginning of this paragraph. So we get this recursion of sorts, which only ends when the stack is exhausted and the CPU generates a double fault.

So, of course, Steven is interested in finding and fixing the root cause. I hope we will get to that with some help from the prison guards :-)

But I would also like to use this opportunity to discuss how we can make it easier to debug issues such as this one. I think this problem demonstrates that when we treat a junk kernel address value as a userland address, we pile heaps of irrelevant stuff on top of the actual problem. One solution could be to use a special flag that marks all genuine attempts to access userland addresses (e.g. setting the flag on entrance to copyin and clearing it upon return), so that the page fault handler could distinguish real faults on userland addresses from faults on garbage kernel addresses. I am sure there are other clever techniques to catch such garbage addresses early.

-- 
Andriy Gapon
Re: debugging frequent kernel panics on 8.2-RELEASE
On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote:

[skip]

But I would also like to use this opportunity to discuss how we can make it easier to debug issues such as this one. I think this problem demonstrates that when we treat a junk kernel address value as a userland address, we pile heaps of irrelevant stuff on top of the actual problem. One solution could be to use a special flag that marks all genuine attempts to access userland addresses (e.g. setting the flag on entrance to copyin and clearing it upon return), so that the page fault handler could distinguish real faults on userland addresses from faults on garbage kernel addresses. I am sure there are other clever techniques to catch such garbage addresses early.

We already have such a mechanism: kernel code that is prepared to access usermode pages sets pcb_onfault. See the end of the trap_pfault() handler. In fact, we can catch it earlier, before even calling vm_fault(). BTW, I think this is especially useful in combination with support for SMEP in recent Intel CPUs.

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov <kos...@pooma.home>
Date:   Thu Aug 18 00:08:50 2011 +0300

    Assert that the exiting process does not return to usermode.
    On x86, do not call vm_fault() when the kernel is not prepared to
    handle unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
 			goto nogo;
 
 		map = &vm->vm_map;
+
+		/*
+		 * When accessing a usermode address, kernel must be
+		 * ready to accept the page fault, and provide a
+		 * handling routine.  Since accessing the address
+		 * without the handler is a bug, do not try to handle
+		 * it normally, and panic immediately.
+		 */
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
 			goto nogo;
 
 		map = &vm->vm_map;
+		if (!usermode && (td->td_intr_nesting_level != 0 ||
+		    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+			trap_fatal(frame, eva);
+			return (-1);
+		}
 	}
 
 	/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 	CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid, td->td_name);
+	KASSERT((p->p_flag & P_WEXIT) == 0,
+	    ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
 	/* Check that we called signotify() enough. */
Re: debugging frequent kernel panics on 8.2-RELEASE
----- Original Message -----
From: "Andriy Gapon" <a...@freebsd.org>

Thanks to the debugging information that Steven provided and to the help I received from Kostik, I think I now understand the basic mechanics of this panic, but, unfortunately, not the details of its root cause. It seems like everything starts with some kind of a race between terminating processes in a jail and termination of the jail itself. This is where the details are very thin so far.

What we see is that a process (http) is in the exit(2) syscall, in the exit1() function actually, and past the place where the P_WEXIT flag is set and even past the place where p_limit is freed and reset to NULL. At that place the thread calls prison_proc_free(), which calls prison_deref(). Then we see that in prison_deref() the thread gets a page fault because of what seems like a NULL pointer dereference. That's just the start of the problem and its root cause.

That's interesting. Are you using http as an example, or is that something that's been gleaned from debugging our output? I ask because there's only one process running in each of our jails, and that's a single java process.

Given your description, there may be something I can add that may help clarify what the cause could be. In a nutshell, the jail manager we're using will attempt to resurrect a jail from a dying state in a few specific scenarios. Here's an example:

1. A jail restart is requested.
2. The jail is stopped, so the java process is killed off, but active TCP sessions may prevent the timely full shutdown of the jail.
3. If an existing jail is detected, i.e. a dying jail from #2, then instead of starting a new jail we attach to the old one and exec the new java process.
4. If an existing jail isn't detected, i.e. there were no hanging TCP sessions and #2 cleanly shut down the jail, a new jail is created and attached to, and the java process is exec'ed.

The system uses static jail IDs, so it's possible to determine whether an existing jail for this service exists or not. This prevents duplicate services and also makes services easy to identify by their jail ID.

So what we could be seeing is a race between the jail shutdown and the attach of the new process?

Now, man 2 jail seems to indicate this is a valid use case for jail_set, as it documents JAIL_DYING as a valid flag, but I suspect it's something quite out of the ordinary to actually do, which may be why this panic hasn't been seen before now.

As some background, the reason we use static jail IDs is to ensure only one instance of the jailed service is running, and the reason we re-attach to the dying jail is so that jails can be restarted in a timely manner. Without the re-attach we would need to wait for all aborted TCP sessions to time out.

So, of course, Steven is interested in finding and fixing the root cause. I hope we will get to that with some help from the prison guards :-)

Does the above potentially explain how we're getting into the situation that generates the panic? If so, we can certainly look at alternatives to the current design to work around this issue. Flagging the jail as permanent and using manual process management plus additional external locking to prevent duplicates is what instantly springs to mind.

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.