Re: Multiple NFS server problems with Solaris 8 clients
On Sun, 2001/10/14 at 21:38:26 +0100, Ian Dowse wrote:
> >The last one is a known problem. There is an (unfinished) patch available to
> >solve this problem. Thomas Moestl <[EMAIL PROTECTED]> is still working on
> >some issues of the patch. Please contact him if you would like to know more.
> >
> >Here is the URL for the patch:
> >
> >http://home.teleport.ch/freebsd/userland/nfsd-loop.diff
>
> That patch is a bit out of date, because Peter removed a big chunk
> of kerberos code from nfsd since. I was actually just looking at
> this problem again, so I include an updated version of Thomas's
> patch below.
>
> This version also removes entries from the children[] array when
> a slave nfsd dies to avoid the possibility of accidentally killing
> unrelated processes.
>
> The issue that remains open with the patch is that currently if a
> slave nfsd dies, then all nfsds will shut down. This is because
> nfssvc() in the master nfsd returns 0 when the master nfsd receives
> a SIGCHLD. This behaviour is probably reasonable enough, but the
> way it happens is a bit odd.
>
> Thomas, I'll probably commit this within the next few days if you
> have no objections, and if you don't get there before me. The
> exiting behaviour can be resolved later if necessary.

Thanks! I've been meaning to update and commit this patch for quite
some time, but was rather focused on sparc64 development recently when
I had time. I also wanted to resolve this exiting behaviour first,
but I agree that it is probably not a real issue.

- thomas

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
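The children[] bookkeeping described above can be sketched roughly as follows. This is a minimal userland sketch and not the nfsd patch itself: MAX_CHILDREN, children_add(), children_forget() and children_kill_all() are hypothetical names chosen for illustration. The point is only that a PID is dropped from the table as soon as the slave is known to be dead, so a later cleanup pass cannot signal a recycled PID.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_CHILDREN 20         /* hypothetical table size */

/* PIDs of running slaves; 0 marks a free slot. */
static pid_t children[MAX_CHILDREN];

/* Record a newly forked slave. Returns 0, or -1 if the table is full. */
static int
children_add(pid_t pid)
{
    assert(pid > 0);
    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        if (children[i] == 0) {
            children[i] = pid;
            return (0);
        }
    }
    return (-1);
}

/* Forget a slave that has been reaped. Returns 1 if it was recorded. */
static int
children_forget(pid_t pid)
{
    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        if (children[i] == pid) {
            children[i] = 0;
            return (1);
        }
    }
    return (0);
}

/* Signal only the slaves still recorded; returns how many were signalled. */
static int
children_kill_all(int sig)
{
    int n = 0;

    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        if (children[i] > 0) {
            (void)kill(children[i], sig);
            n++;
        }
    }
    return (n);
}
```

In a real daemon children_forget() would be called with each PID returned by waitpid() in the SIGCHLD handling path, before any shutdown code gets a chance to run children_kill_all().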
Re: [PATCH] for linux_connect (ugly)
On Wed, Feb 28, 2001 at 01:17:12AM +0100, Martin Blapp wrote:
> Thomas Moestl and I tried to fix linux_connect. Most of this patch
> is from Thomas Moestl. I did only a little part of it and testing.
>
> Staroffice5.2 has been broken about one year now, and it needs
> a fix with the same behaviour to work correctly with FreeBSD.
>
> This patch should be rewritten so it can be committed to CURRENT
> and (IMPORTANT) to STABLE before 4.3 is out.
>
> +        /*
> +         * Ugly kluge: some applications depend on 0 being
> +         * returned only the first time. Therefore, we set
> +         * the (otherwise invisible) SO_KNBCONN flag.
> +         * If it is set, return EISCONN.
> +         */
> +        error = holdsock(p->p_fd, linux_args.s, &fp);
> +        if (error)
> +                return (error);
> +        iconn = ((struct socket *)fp->f_data)->so_options &
> +            SO_KNBCONN;
> +        ((struct socket *)fp->f_data)->so_options |= SO_KNBCONN;
> +        fdrop(fp, p);
> +
> +        if (iconn)
> +                return (EISCONN);

Some background: when a socket is connected in non-blocking mode and
the connect does not immediately succeed (i.e. EINPROGRESS is
returned), Linux evidently returns, from the first connect() call made
on the socket after the connection has been established, the value
that getsockopt(...SO_ERROR...) on the socket would give on FreeBSD
(i.e. 0 if the connection attempt succeeded). Only the next call will
return EISCONN.
So, the linuxulator has been modified in the past to always return the
value getsockopt(...SO_ERROR...) gives. This does break applications
that loop and wait for EISCONN, e.g. StarOffice.

I might add that I do not particularly like this patch (because of
fiddling with the socket internals) and consider it more of a quick
fix.
A somewhat cleaner solution might be to add a SO_USER socket option
that can be freely set or reset by any FreeBSD application (without
any effect). This could then be used to store connect state, and linux
applications would run fine because they are ignorant of this. This
way the getsockopt/setsockopt interface could be used in the
linuxulator. Then again, maybe it is better to hide this...
Well. I dislike the Linux behaviour. I see no really clean way of
emulating this without touching our socket internals (a separate state
could be kept in the linuxulator, but this is even more ugly, I
presume).

- thomas
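For comparison, the portable way for an application to finish a non-blocking connect does not rely on what a second connect() call returns at all. The following is a minimal userland sketch of that pattern (wait for writability, then read SO_ERROR); connect_nonblocking() and demo_loopback_connect() are hypothetical helper names, and the loopback listener exists only so the sketch can be run on its own.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns 0 once connected, otherwise an errno value. */
static int
connect_nonblocking(int fd, const struct sockaddr *sa, socklen_t salen,
    int timeout_ms)
{
    struct pollfd pfd;
    socklen_t len;
    int flags, n, soerr;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return (errno);

    if (connect(fd, sa, salen) == 0)
        return (0);             /* connected immediately */
    if (errno != EINPROGRESS)
        return (errno);

    /* The socket becomes writable when the attempt has finished. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n == -1)
        return (errno);
    if (n == 0)
        return (ETIMEDOUT);

    /* SO_ERROR is 0 on success, or the pending connect error. */
    soerr = 0;
    len = sizeof(soerr);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == -1)
        return (errno);
    return (soerr);
}

/* Demo: connect to a throwaway listener on 127.0.0.1. */
static int
demo_loopback_connect(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int lfd, cfd, rc;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;           /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1 || cfd == -1 ||
        bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) == -1)
        rc = errno;
    else
        rc = connect_nonblocking(cfd, (struct sockaddr *)&sin,
            sizeof(sin), 2000);
    if (lfd != -1)
        close(lfd);
    if (cfd != -1)
        close(cfd);
    return (rc);
}
```

Applications written this way never see the difference between the two connect() behaviours discussed above; the problem only affects programs that call connect() repeatedly and wait for EISCONN.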
Re: [PATCH] for linux_connect (ugly)
On Wed, Feb 28, 2001 at 01:55:15AM +0100, Thomas Moestl wrote:
> On Wed, Feb 28, 2001 at 01:17:12AM +0100, Martin Blapp wrote:
> > Thomas Moestl and I tried to fix linux_connect. Most of this patch
> > is from Thomas Moestl. I did only a little part of it and testing.
> >
> > Staroffice5.2 has been broken about one year now, and it needs
> > a fix with the same behaviour to work correctly with FreeBSD.
> >
> > This patch should be rewritten so it can be comitted to CURRENT
> > and (IMPORTANT) to STABLE before 4.3 is out.
> >
> > +        /*
> > +         * Ugly kluge: some applications depend on 0 being
> > +         * returned only the first time. Therefore, we set
> > +         * the (otherwise invisible) SO_KNBCONN flag.
> > +         * If it is set, return EISCONN.
> > +         */
> > +        error = holdsock(p->p_fd, linux_args.s, &fp);
> > +        if (error)
> > +                return (error);
> > +        iconn = ((struct socket *)fp->f_data)->so_options &
> > +            SO_KNBCONN;
> > +        ((struct socket *)fp->f_data)->so_options |= SO_KNBCONN;
> > +        fdrop(fp, p);
> > +
> > +        if (iconn)
> > +                return (EISCONN);

I have forgotten to add that if we fiddle with the socket's internals
anyway, this should probably be put into the socket state.
This really is a quick hack, but maybe it can be sanitized to be
useful.

- thomas
Re: new rc.diskless{1,2} files
On Fri, Mar 30, 2001 at 07:38:30PM +0200, Falco Krepel wrote:
> I have implemented good ideas from Mike Smith in my
> rc.diskless{1,2} files and make some other changes:
>
> 1. Now I use the kernel flag "vm.nswapdev" to determine if swap is
> available.

Just a note - this might go away soon because the swap dev sysctls
will probably be reorganized a little. There will be another method to
determine this then. I'll keep you posted on this change if you wish.

- thomas
Re: ssh: no RSA support in libssl and libcrypto. See ssl(8).
On Fri, Mar 30, 2001 at 06:36:51PM -0400, The Hermit Hacker wrote:
>
> how does one fix this? *puzzled look* there is no ssl(8) that I can seem
> to find to See, and nothing apparent in the UPGRADING file ... system
> uptodate as of last night, and mergemaster just finished completing ...
>
> ssh: no RSA support in libssl and libcrypto. See ssl(8).
> Disabling protocol version 1
> DH_generate_key

Do you have device random in your kernel configuration? Absence of a
working random device triggers this bug. The need for this is also
documented in UPDATING.

- thomas
*HEADS UP* libposix1e is integrated into libc
src/lib/libposix1e was repocopied to src/lib/libc/posix1e, and I'll
start to commit the necessary patches now and will then activate the
build.

World may be broken during a short interval due to the switch. You
will also need to rebuild anything that uses libposix1e. In the base
system, those are src/bin/getfacl and src/bin/setfacl for now. I'm not
aware of any ports using it, so normally you should be fine after a
buildworld.

Please let me know of any problems.

- thomas
Re: *HEADS UP* libposix1e is integrated into libc
On Wed, Apr 04, 2001 at 07:23:18PM +0200, Thomas Moestl wrote:
> src/lib/libposix1e was repocopied to src/lib/libc/posix1e, and I'll
> start to commit the necessary patches now and will then activate the
> build.
>
> World may be broken during a short interval due to the switch. You
> will also need to rebuild anything that uses libposix1e. In the base
> system, those are src/bin/getfacl and src/bin/setfacl for now. I'm not
> aware of any ports using it, so normally you should be fine after a
> buildworld.

The changes are complete now, so any possible breakage should be over.

- thomas
Re: recursed on non-recursive lock
On Sat, 2001/05/26 at 11:07:36 -0500, Michael Harnois wrote:
> I finally got this much. I hope it helps.
>
> lock order reversal
> 1st 0xc03af0a0 mntvnode @ ../../ufs/ffs/ffs_vnops.c:1007
> 2nd 0xc8b539cc vnode interlock @ ../../ufs/ffs/ffs_vfsops.c:1016
>
> recursed on non-recursive lock (sleep mutex) vm @ ../../ufs/ufs/ufs_readwrite.c:420
> first acquired @ ../../vm/vnode_pager.c:912
> panic:recurse
> Debugger ("panic")
> Stopped at Debugger+0x45: pushl %ebx
> db> t
> Debugger(c0310767b) at Debugger+0x45
> panic(c0313348,c81b9cb8,a0,10,0) at panic+0x70
> witness_lock(c03b3f20,8,c03263b6,1a4) at witness_lock+0x356
> ffs_write(c81b9ca4) at ffs_write+0xba
> vnode_pager_generic_putpages(c8c31d00,c81b9ddc,1,0,c81b9d74) at
>vnode_pager_generic_putpages+0x19c
> vop_stdputpages(c81b9d28,c81b9d0c,c02a7f9d,c81b9d28,c81b9d48) at vop_stdputpages+0x1f
> vop_defaultop(c81b9d28,c81b9d48,c02c5c3d,c81b9d28,0) at vop_defaultop+0x15
> ufs_vnoperate(c81b9d28) at ufs_vnoperate+0x15
> vnode_pager_putpages(c8c4b360,c81b9ddc,10,0,c81b9d74,c03b3f20,1,c0329ffa,91) at
>vnode_pager_putpages+0x1ad
> [...]

I can reproduce this panic fairly reliably here...
The problem appears to be that the vm_mtx is held when VOP_WRITE is
called in vnode_pager_generic_putpages
(sys/vm/vnode_pager.c:999). The callee may itself try to grab the
vm_mtx (e.g. the ufs implementation in sys/ufs/ufs/ufs_readwrite.c),
so you end up with a recursion on the lock.
Even if it wouldn't recurse, VOP_WRITE can AFAIK block, so there is a
potential for other panics, too.

The attached patch just unlocks vm_mtx before this call and reacquires
it when it's done. This works for me, and I think it theoretically
should be safe because all relevant parts are under Giant again for
now; YMMV, it might cause other panics or corruption, so you've been
warned ;)

- thomas

Index: sys/vm/vnode_pager.c
===
RCS file: /home/ncvs/src/sys/vm/vnode_pager.c,v
retrieving revision 1.130
diff -u -r1.130 vnode_pager.c
--- sys/vm/vnode_pager.c        2001/05/23 22:51:23     1.130
+++ sys/vm/vnode_pager.c        2001/05/27 01:07:19
@@ -996,7 +996,9 @@
         auio.uio_rw = UIO_WRITE;
         auio.uio_resid = maxsize;
         auio.uio_procp = (struct proc *) 0;
+        mtx_unlock(&vm_mtx);
         error = VOP_WRITE(vp, &auio, ioflags, curproc->p_ucred);
+        mtx_lock(&vm_mtx);
         cnt.v_vnodeout++;
         cnt.v_vnodepgsout += ncount;
Re: -current is _definitely_ not stable right now
On Tue, 2001/05/29 at 09:39:42 -0700, John Baldwin wrote:
>
> On 28-May-01 Doug Barton wrote:
> > I forgot something:
> >
> > IdlePTD 4734976
> > initial pcb at 3b5f80
> > panicstr: mutex sched lock recursed at /usr/src/sys/kern/kern_synch.c:858
> > panic messages:
>
> I would need a traceback from here. It looks like someone called msleep or
> tsleep with sched lock held.

OK, I think I've found the problem, patch attached.
set_user_ldt is called from cpu_switch on i386, where the sched lock
is already held by the process that is just being scheduled away, and
curproc has already been changed, so this isn't treated like a
recursed mutex, but rather like the new process (dead-) locking
against the old one.
The solution taken in the attached patch is to create a
set_user_ldt_nolock. This way, we have a more or less consistent
environment (of the new process) there. The (pcb != PCPU_GET(curpcb))
check is in the outer locking set_user_ldt wrapper (it seems only to
be needed in the smp rendezvous case and is a "can't happen" when
called from cpu_switch).

This works for me; Doug, could you please test it too?
I'd be thankful for any review.

- thomas

Index: i386/swtch.s
===
RCS file: /home/ncvs/src/sys/i386/i386/swtch.s,v
retrieving revision 1.114
diff -u -r1.114 swtch.s
--- i386/swtch.s        2001/05/20 16:51:08     1.114
+++ i386/swtch.s        2001/05/29 22:09:14
@@ -248,7 +248,7 @@
         movl    %eax,PCPU(CURRENTLDT)
         jmp     2f
 1:      pushl   %edx
-        call    set_user_ldt
+        call    set_user_ldt_nolock
         popl    %edx
 2:
Index: i386/sys_machdep.c
===
RCS file: /home/ncvs/src/sys/i386/i386/sys_machdep.c,v
retrieving revision 1.57
diff -u -r1.57 sys_machdep.c
--- i386/sys_machdep.c  2001/05/15 23:22:20     1.57
+++ i386/sys_machdep.c  2001/05/29 22:24:04
@@ -239,17 +239,16 @@
 /*
  * Update the GDT entry pointing to the LDT to point to the LDT of the
- * current process.
+ * current process. Assumes that sched_lock is held. This is needed
+ * in cpu_switch because sched_lock is held by the process that has
+ * just been scheduled away and we would deadlock if we would try to
+ * acquire sched_lock.
  */
 void
-set_user_ldt(struct pcb *pcb)
+set_user_ldt_nolock(struct pcb *pcb)
 {
         struct pcb_ldt *pcb_ldt;

-        if (pcb != PCPU_GET(curpcb))
-                return;
-
-        mtx_lock_spin(&sched_lock);
         pcb_ldt = pcb->pcb_ldt;
 #ifdef SMP
         gdt[PCPU_GET(cpuid) * NGDT + GUSERLDT_SEL].sd = pcb_ldt->ldt_sd;
@@ -258,6 +257,17 @@
 #endif
         lldt(GSEL(GUSERLDT_SEL, SEL_KPL));
         PCPU_SET(currentldt, GSEL(GUSERLDT_SEL, SEL_KPL));
+}
+
+/* Locking wrapper of the above */
+void
+set_user_ldt(struct pcb *pcb)
+{
+        if (pcb != PCPU_GET(curpcb))
+                return;
+
+        mtx_lock_spin(&sched_lock);
+        set_user_ldt_nolock(pcb);
         mtx_unlock_spin(&sched_lock);
 }
Index: include/pcb_ext.h
===
RCS file: /home/ncvs/src/sys/i386/include/pcb_ext.h,v
retrieving revision 1.6
diff -u -r1.6 pcb_ext.h
--- include/pcb_ext.h   2001/05/10 17:03:03     1.6
+++ include/pcb_ext.h   2001/05/29 22:06:37
@@ -55,6 +55,7 @@
 int i386_extend_pcb __P((struct proc *));
 void set_user_ldt __P((struct pcb *));
+void set_user_ldt_nolock __P((struct pcb *));
 struct pcb_ldt *user_ldt_alloc __P((struct pcb *, int));
 void user_ldt_free __P((struct pcb *));
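The split into a _nolock worker and a locking wrapper is a general pattern, and a small userland model may make the structure of the patch easier to follow. Everything here is a hypothetical stand-in: struct ldt_state, ldt_set_nolock(), ldt_set() and switch_path() are illustrative names, and a pthread mutex plays the role of sched_lock.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

struct ldt_state {
    pthread_mutex_t lock;       /* stand-in for sched_lock */
    int current_ldt;            /* stand-in for the loaded LDT selector */
};

/* Worker: the caller must already hold st->lock. */
static void
ldt_set_nolock(struct ldt_state *st, int ldt)
{
    assert(st != NULL);
    st->current_ldt = ldt;
}

/* Locking wrapper for callers that do not hold the lock. */
static void
ldt_set(struct ldt_state *st, int ldt)
{
    pthread_mutex_lock(&st->lock);
    ldt_set_nolock(st, ldt);
    pthread_mutex_unlock(&st->lock);
}

/*
 * A path that runs with the lock held, as cpu_switch does.
 * Calling ldt_set() from here would try to take the lock twice.
 */
static void
switch_path(struct ldt_state *st, int ldt)
{
    pthread_mutex_lock(&st->lock);
    ldt_set_nolock(st, ldt);
    pthread_mutex_unlock(&st->lock);
}
```

The model leaves out the curpcb check, which in the patch lives in the wrapper only.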
Re: alpha tinderbox failure
On Sun, 2003/01/05 at 01:33:14 -0800, David O'Brien wrote:
> On Sun, Jan 05, 2003 at 06:00:26PM +1100, Bruce Evans wrote:
> > > >> ===> usr.bin/vi
> > > >> *** Error code 1 (ignored)
> > > >> *** Error code 1 (ignored)
> > > >> ===> usr.bin/vis
> ..
> > No; it would be more profitable to teach programmers to not ignore errors.
> > whereintheworld is perfectly non-broken in not ignoring them. These
> > "*** Error" messages (not to mention other error output from makeworld)
> > also make it harder for human readers to see the actual errors.
>
> Agreed. I'd love to hear from fanf what the changes are to unifdef that
> causes this change in exit code.

According to the man page, this is the correct behaviour:

     The unifdef utility exits 0 if the output is an exact copy of the
     input, 1 if not, and 2 if in trouble.

The exit status code in unifdef seems to have been broken for a while
before. The vi Makefile was just sloppy in checking the exit code;
it should probably check for 1 and also exclude 0, like:

---
--- Makefile    29 Jul 2002 09:40:16 -  1.38
+++ Makefile    5 Jan 2003 13:20:49 -
@@ -75,10 +75,12 @@
 # unifdef has some *weird* exit codes, sigh! RTFM unifdef(1)...
 ex_notcl.c: ex_tcl.c
-        -unifdef -UHAVE_TCL_INTERP ${SRCDIR}/ex/ex_tcl.c > ${.TARGET}
+        ! { unifdef -UHAVE_TCL_INTERP ${SRCDIR}/ex/ex_tcl.c > ${.TARGET} || \
+            [ $$? -ne 1 ] ; }

 ex_noperl.c: ex_perl.c
-        -unifdef -UHAVE_PERL_INTERP ${SRCDIR}/ex/ex_perl.c > ${.TARGET}
+        ! { unifdef -UHAVE_PERL_INTERP ${SRCDIR}/ex/ex_perl.c > ${.TARGET} || \
+            [ $$? -ne 1 ] ; }

 CLEANFILES+= ex_notcl.c ex_noperl.c
---

(there's probably a more elegant way to do this, my sh is a bit
rusty).

- Thomas

--
Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/
              <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C
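The shell construct in the Makefile change above succeeds only when unifdef exits with status 1, and fails for 0 (nothing was changed) as well as for 2 (trouble). The same decision written out in C, as a small sketch: unifdef_exit_ok() and unifdef_step_ok() are hypothetical helpers, and the wait status is assumed to come from something like system(3) or waitpid(2).

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* The build step is fine only if unifdef actually changed its input. */
static int
unifdef_exit_ok(int code)
{
    assert(code >= 0 && code <= 255);
    return (code == 1);
}

/* Same check, starting from a raw wait status. */
static int
unifdef_step_ok(int wait_status)
{
    if (wait_status == -1 || !WIFEXITED(wait_status))
        return (0);             /* could not run, or killed by a signal */
    return (unifdef_exit_ok(WEXITSTATUS(wait_status)));
}
```

Treating exit status 0 as a failure is deliberate here: for these targets an unchanged file would mean the -U option removed nothing, which is not what the rule expects.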
Re: PANIC in tcp_syncache.c sonewconn() line 562
On Mon, 2003/01/13 at 17:47:11 +0100, Martin Blapp wrote:
> #10 0xc03df350 in trap (frame=
> {tf_fs = 24, tf_es = -65520, tf_ds = 16, tf_edi = 2, tf_esi = -1031597312,
> tf_ebp = -854635944, tf_isp = -854635988, tf_ebx = -1031595264, tf_edx = 4,
> tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 2, tf_eip = -1071076206, tf_cs
> = 8, tf_eflags = 66050, tf_esp = -1031595264, tf_ss =
> 0}) at /usr/src/sys/i386/i386/trap.c:445
> #11 0xc03cf9f8 in calltrap () at {standard input}:98
> #12 0xc02e1f3f in syncache_socket (sc=0xc2831300, lso=0xc2831300, m=0xc0ed9c00)
> at /usr/src/sys/netinet/tcp_syncache.c:562
> #13 0xc02e23e8 in syncache_expand (inc=0xcd0f4b4c, th=0xc0ed9c68,
> sop=0xcd0f4b18, m=0xc0ed9c00)
> at /usr/src/sys/netinet/tcp_syncache.c:781
> #14 0xc02db779 in tcp_input (m=0xc0ed9c68, off0=20) at
> /usr/src/sys/netinet/tcp_input.c:703
> #15 0xc02d409b in ip_input (m=0xc0ed9c00) at /usr/src/sys/netinet/ip_input.c:923
> #16 0xc02d4192 in ipintr () at /usr/src/sys/netinet/ip_input.c:941
> #17 0xc02c1713 in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:97
> #18 0xc0238df1 in ithread_loop (arg=0xc0eba000) at
> /usr/src/sys/kern/kern_intr.c:535
> #19 0xc0237cf3 in fork_exit (callout=0xc0238c20 , arg=0x0,
> frame=0x0)
> at /usr/src/sys/kern/kern_fork.c:873
>
> 562 so = sonewconn(lso, SS_ISCONNECTED);

This seems to actually be a quite old bug: we allow listen() to be
called on connected sockets, which messes up the state of the socket
(it will get SO_ACCEPTCONN set).
Before syncache, this would likely only lead to the connection
becoming catatonic, unless a matching SYN packet came along (in a
state where the initial SYN of the connection was already received).
With syncache however, a panic can be triggered by normal ACK packets.

In your example, the listen is buried in the bowels of the RPC code.

The solution should be to reject the listen() with EINVAL (which seems
to be the standard-mandated error for connected sockets); patch
attached. Any thoughts on this?

- Thomas

Index: kern/uipc_socket.c
===
RCS file: /ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.140
diff -u -r1.140 uipc_socket.c
--- kern/uipc_socket.c  5 Jan 2003 11:14:04 -   1.140
+++ kern/uipc_socket.c  13 Jan 2003 21:43:52 -
@@ -266,6 +266,10 @@
         int s, error;

         s = splnet();
+        if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING)) {
+                splx(s);
+                return (EINVAL);
+        }
         error = (*so->so_proto->pr_usrreqs->pru_listen)(so, td);
         if (error) {
                 splx(s);
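The effect of the added check can be shown with a tiny userland model of the socket state. This is a sketch only: struct sk_socket, the SK_* bits and sk_listen() are hypothetical stand-ins with illustrative values, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative state and option bits (values are made up). */
#define SK_ISCONNECTED  0x0002
#define SK_ISCONNECTING 0x0004
#define SK_ACCEPTCONN   0x0200

struct sk_socket {
    int state;                  /* SK_ISCONNECTED, SK_ISCONNECTING */
    int options;                /* SK_ACCEPTCONN */
    int qlimit;                 /* listen queue depth */
};

/* Model of listen(): refuse sockets that are connected or connecting. */
static int
sk_listen(struct sk_socket *so, int backlog)
{
    assert(so != NULL);
    if (so->state & (SK_ISCONNECTED | SK_ISCONNECTING))
        return (EINVAL);
    so->options |= SK_ACCEPTCONN;
    so->qlimit = backlog < 0 ? 0 : backlog;
    return (0);
}
```

Note that the check looks only at the connected and connecting bits, so calling the function again on a socket that is already listening still succeeds and simply updates the queue depth.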
Re: PANIC in tcp_syncache.c sonewconn() line 562
On Tue, 2003/01/14 at 02:51:03 -0800, Terry Lambert wrote:
> Martin Blapp wrote:
> > Can you commit this ? The fix looks appropriate, but the manpage should
> > also be changed to reflect the change.
> >
> > ERRORS
> > Listen() will fail if:
> >
> > [EBADF]         The argument s is not a valid descriptor.
> > [ENOTSOCK]      The argument s is not a socket.
> > [EOPNOTSUPP]    The socket is not of a type that
> >                 supports the operation listen().
> > [EINVAL]        Listen() has been already called on the socket.
> >
> > Any objections from others ?
>
> It seems to me that calling listen() on a socket to change the
> listen queue depth is a reasonable thing to do; this is true
> before it's bound, after it's bound, before listen() has been
> called on it, and after listen() has been called on it once (or
> more).
>
> Am I missing something here? Is there a good technical reason
> to not permit an application to change the listen queue depth?
> Or is there some way that an application can do this, using a
> call other than listen()?
>
> That it causes a panic when the SYN cache is enabled isn't really
> a technical reason, it's a circumstantial reason.

The manpage change does not reflect the change in the patch :)
It should be:

     [EINVAL]           The socket is connected.

- Thomas
Re: Bus DMA for USB - compilation problems.
On Wed, 2003/01/15 at 20:20:33 +, Josef Karthauser wrote:
> On Wed, Jan 15, 2003 at 12:05:20PM -0800, Maxime Henrion wrote:
> > Josef Karthauser wrote:
> > > I've partially ported the NetBSD busdma code for USB to FreeBSD, but
> > > it doesn't compile, probably for a trivial reason.
> > >
> > > Anyone fancy helping me out?
> >
> > I didn't look at the patches yet, but could you give me the compilation
> > error you are getting ?
> >
>
> cc -c -O -pipe -march=pentium3 -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual -fformat-extensions -ansi -g -nostdinc -I- -I.
> -I/usr/src/sys -I/usr/src/sys/dev -I/usr/src/sys/contrib/dev/acpica
> -I/usr/src/sys/contrib/ipfilter -D_KERNEL -include opt_global.h
> -fno-common -mno-align-long-strings -mpreferred-stack-boundary=2
> -ffreestanding -Werror /usr/src/sys/dev/usb/uhci.c
> /usr/src/sys/dev/usb/uhci.c: In function `uhci_init':
> /usr/src/sys/dev/usb/uhci.c:425: dereferencing pointer to incomplete type
> /usr/src/sys/dev/usb/uhci.c: In function `uhci_power':
> /usr/src/sys/dev/usb/uhci.c:714: dereferencing pointer to incomplete type
> /usr/src/sys/dev/usb/uhci.c: In function `uhci_alloc_std':
>
> It's failing at lines like:
>
> UWRITE4(sc, UHCI_FLBASEADDR, DMAADDR(&sc->sc_dma, 0)); /* set frame list */
>
> The problematic is DMAADDR, and it's because the sc->sc_dma, which is
> defined as usb_dma_t. This is defined in usb_port.h, and it uses
> usb_dma_block which is defined in usb_mem.h. I think that it's the
> usb_dma_block that is coming up as incomplete, but I'm not sure.

DMAADDR is:

  #define DMAADDR(dma, o) ((dma)->block->map->dm_segs[0].ds_addr + (dma)->offs + (o))

struct usb_dma_block starts like:

  typedef struct usb_dma_block {
          bus_dma_tag_t tag;
          bus_dmamap_t map;

However, bus_dmamap_t (like bus_dma_tag_t) is supposed to be opaque to
users of the busdma interface on FreeBSD. Our implementations enforce
this by defining it as:

  typedef struct bus_dmamap *bus_dmamap_t;

, and by not exporting struct bus_dmamap in public headers.
The DMA addresses are obtained by writing an appropriate callback
routine to process them and passing it to bus_dmamap_load().

The usb_mem.c will need some other changes to work on FreeBSD, since
our busdma code has diverged from NetBSD's quite a bit.

- Thomas
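The callback approach mentioned above can be modelled in userland to show the control flow: the driver never looks inside the map, it hands the loader a function that receives the segment list and stores what it needs. This is a sketch with stand-in types only. struct sk_dma_segment, sk_dmamap_callback_t, sk_save_addr() and sk_dmamap_load() are hypothetical, and the toy loader merely imitates the role bus_dmamap_load() plays; consult bus_dma(9) for the real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a DMA segment descriptor. */
struct sk_dma_segment {
    uint64_t ds_addr;           /* bus address of the segment */
    uint64_t ds_len;            /* length in bytes */
};

/* Stand-in for the load callback type. */
typedef void sk_dmamap_callback_t(void *arg, struct sk_dma_segment *segs,
    int nseg, int error);

/* Where the driver keeps the result of a load. */
struct sk_load_result {
    uint64_t busaddr;
    int error;
};

/* Callback: remember the bus address of the first segment. */
static void
sk_save_addr(void *arg, struct sk_dma_segment *segs, int nseg, int error)
{
    struct sk_load_result *res = arg;

    assert(res != NULL);
    res->error = error;
    if (error == 0 && nseg >= 1)
        res->busaddr = segs[0].ds_addr;
}

/*
 * Toy loader: "maps" the buffer as a single segment and reports it
 * through the callback. The identity mapping is for the sketch only.
 */
static int
sk_dmamap_load(void *buf, size_t len, sk_dmamap_callback_t *cb, void *arg)
{
    struct sk_dma_segment seg;

    seg.ds_addr = (uint64_t)(uintptr_t)buf;
    seg.ds_len = len;
    cb(arg, &seg, 1, 0);
    return (0);
}
```

With this structure a macro like DMAADDR would read the address saved by the callback rather than reaching into the map.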
Re: PANIC in tcp_syncache.c sonewconn() line 562
On Wed, 2003/01/15 at 02:20:12 +1100, Bruce Evans wrote:
> On Tue, 14 Jan 2003, Martin Blapp wrote:
>
> >
> > Hi Thomas,
> >
> > >         s = splnet();
> > > +       if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING)) {
> > > +               splx(s);
> > > +               return (EINVAL);
> > > +       }
> > >         error = (*so->so_proto->pr_usrreqs->pru_listen)(so, td);
> > >         if (error) {
> > >                 splx(s);
> > >
> >
> > Can you commit this ? The fix looks appropriate, but the manpage should
> > also be changed to reflect the change.
> >
> > ERRORS
> > Listen() will fail if:
> >
> > [EBADF]         The argument s is not a valid descriptor.
> > [ENOTSOCK]      The argument s is not a socket.
> > [EOPNOTSUPP]    The socket is not of a type that
> >                 supports the operation listen().
> > [EINVAL]        Listen() has been already called on the socket.
> >
> > Any objections from others ?
>
> EINVAL is a bogus errno for this, but is standard. POSIX has better
> wording: "The socket is already connected". The patch also returns
> EINVAL if the socket is being connected. Is this right? (Maybe we
> should wait until we can tell if it is connected.)

Yes, I think so; calling listen() for SS_ISCONNECTING sockets can also
lead to bogus states, although that could of course be avoided in
another way. For applications, however, it does not matter much, since
sockets can never be safely assumed to be in SS_ISCONNECTING (they can
always change to SS_ISCONNECTED or time out behind the application's
back).

> POSIX also specifies the errors EDESTADDRREQ, EACCES, another EINVAL for
> shut down sockets, and ENOBUFS. The last 3 "may" cause listen() to fail
> and the others (including the first EINVAL) "shall" cause it to fail.

EDESTADDRREQ seems to not be generated, instead e.g. tcp_usr_listen()
always chooses a local address (which does not really make much sense,
but changing it might break old applications I guess). ENOBUFS does
not seem to occur. Shut down sockets seem to not be handled specially.

- Thomas
Re: panic: malloc(M_WAITOK) returned NULL
On Sat, 2003/01/18 at 09:00:39 +0100, [EMAIL PROTECTED] wrote:
> In message <[EMAIL PROTECTED]>, Kris Kennaway writes:
>
> >I just got the following on axp1:
> >
> >panic: malloc(M_WAITOK) returned NULL
> >db_print_backtrace() at db_print_backtrace+0x18
> >panic() at panic+0x104
> >malloc() at malloc+0x1a8
> >initiate_write_inodeblock_ufs1() at initiate_write_inodeblock_ufs1+0xc4
> >softdep_disk_io_initiation() at softdep_disk_io_initiation+0xa4
> >spec_strategy() at spec_strategy+0x158
> >spec_vnoperate() at spec_vnoperate+0x2c
>
> This is a bug in the kernel memory allocator, since it should not be
> able to return NULL when M_WAITOK is specified. The potential bugs
> are more likely because M_WAITOK is defined as zero.

I found two instances of bogus M_WAITOK tests a few days ago (but
haven't received an answer from Jeff yet). The first occurrence in the
attached patch would cause malloc to fail if a zone was exhausted
(without M_NOWAIT); the second one is mostly harmless.

Neither of the two could have caused this panic. I would guess that it
was caused by the alpha uma_small_alloc() implementation trying less
hard to allocate a page than kmem_alloc() (i.e. it does not sleep at
all). This problem also seems to affect the ia64 and sparc64
uma_small_alloc() versions.

- Thomas

Index: uma_core.c
===
RCS file: /ncvs/src/sys/vm/uma_core.c,v
retrieving revision 1.45
diff -u -r1.45 uma_core.c
--- uma_core.c  1 Jan 2003 18:48:59 -   1.45
+++ uma_core.c  12 Jan 2003 23:15:49 -
@@ -1476,10 +1476,10 @@
                     zone->uz_pages >= zone->uz_maxpages) {
                         zone->uz_flags |= UMA_ZFLAG_FULL;

-                        if (flags & M_WAITOK)
-                                msleep(zone, &zone->uz_lock, PVM, "zonelimit", 0);
-                        else
+                        if (flags & M_NOWAIT)
                                 break;
+                        else
+                                msleep(zone, &zone->uz_lock, PVM, "zonelimit", 0);
                         continue;
                 }
                 zone->uz_recurse++;
@@ -1499,7 +1499,7 @@
                  * could have while we were unlocked. Check again before we
                  * fail.
                  */
-                if ((flags & M_WAITOK) == 0)
+                if (flags & M_NOWAIT)
                         flags |= M_NOVM;
         }
         return (slab);
@@ -1587,7 +1587,6 @@
                         }
                         /* Don't block on the next fill */
                         flags |= M_NOWAIT;
-                        flags &= ~M_WAITOK;
                 }
                 zone->uz_fills--;
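Why the old tests were bogus follows directly from M_WAITOK being zero, as noted in the quoted message: a bitwise AND with zero is always zero, so the "may sleep" branch could never be taken. A minimal sketch of the difference, with SK_M_WAITOK and SK_M_NOWAIT as hypothetical stand-ins for the kernel flags (the value of the NOWAIT bit is illustrative):

```c
#include <assert.h>

#define SK_M_NOWAIT 0x0001      /* illustrative bit value */
#define SK_M_WAITOK 0x0000      /* zero: merely the absence of NOWAIT */

/* Broken test: masking with a zero-valued flag is never non-zero. */
static int
may_sleep_buggy(int flags)
{
    assert(SK_M_WAITOK == 0);   /* the premise of the bug */
    return ((flags & SK_M_WAITOK) != 0);
}

/* Correct test: sleeping is allowed unless NOWAIT was requested. */
static int
may_sleep(int flags)
{
    return ((flags & SK_M_NOWAIT) == 0);
}
```

The same reasoning explains the second hunk: (flags & M_WAITOK) == 0 is always true, so the old code set M_NOVM unconditionally.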
Re: panic: malloc(M_WAITOK) returned NULL
On Sat, 2003/01/18 at 13:22:45 +0100, Thomas Moestl wrote:
> None of the two could have caused this panic. I would guess that it
> was caused by the alpha uma_small_alloc() implementation trying less
> hard to allocate a page than kmem_alloc() (i.e. it does not sleep at
> all). This problem does also affect the ia64 and sparc64
> uma_small_alloc() versions it seems.

To follow up on this, the attached patch should fix the problem if
this was really the cause. It makes the alpha and sparc64
implementations wait if requested (I don't know how I got the idea
that there was an ia64 one, must have accidentally opened the wrong
file).

- Thomas

Index: alpha/alpha/pmap.c
===
RCS file: /d/ncvs/src/sys/alpha/alpha/pmap.c,v
retrieving revision 1.117
diff -u -r1.117 pmap.c
--- alpha/alpha/pmap.c  28 Dec 2002 22:47:45 -  1.117
+++ alpha/alpha/pmap.c  18 Jan 2003 16:50:19 -
@@ -582,7 +582,14 @@
         if (wait & M_ZERO)
                 pflags |= VM_ALLOC_ZERO;

-        m = vm_page_alloc(NULL, color++, pflags | VM_ALLOC_NOOBJ);
+        for (;;) {
+                m = vm_page_alloc(NULL, color, pflags | VM_ALLOC_NOOBJ);
+                if (m == NULL && (wait & M_NOWAIT) == 0)
+                        VM_WAIT;
+                else
+                        break;
+        }
+        color++;

         if (m) {
                 va = (void *)ALPHA_PHYS_TO_K0SEG(m->phys_addr);
Index: sparc64/sparc64/vm_machdep.c
===
RCS file: /d/ncvs/src/sys/sparc64/sparc64/vm_machdep.c,v
retrieving revision 1.32
diff -u -r1.32 vm_machdep.c
--- sparc64/sparc64/vm_machdep.c        5 Jan 2003 05:30:40 -   1.32
+++ sparc64/sparc64/vm_machdep.c        18 Jan 2003 17:00:51 -
@@ -64,6 +64,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -330,7 +331,14 @@
         if (wait & M_ZERO)
                 pflags |= VM_ALLOC_ZERO;

-        m = vm_page_alloc(NULL, color++, pflags | VM_ALLOC_NOOBJ);
+        for (;;) {
+                m = vm_page_alloc(NULL, color, pflags | VM_ALLOC_NOOBJ);
+                if (m == NULL && (wait & M_NOWAIT) == 0)
+                        VM_WAIT;
+                else
+                        break;
+        }
+        color++;

         if (m) {
                 pa = VM_PAGE_TO_PHYS(m);
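The retry loop added in both hunks has a simple shape that can be exercised outside the kernel. The following is a sketch with a fake allocator: sk_page_alloc(), sk_wait_for_memory() and sk_small_alloc() are hypothetical stand-ins for vm_page_alloc(), VM_WAIT and uma_small_alloc(), and the counter of remaining failures exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

#define SK_M_NOWAIT 0x0001      /* illustrative bit value */

static int sk_failures_left;    /* upcoming allocation attempts that fail */
static int sk_waits;            /* how often the caller had to wait */
static char sk_page[1];         /* the one "page" the fake allocator owns */

/* Fake page allocator: fails a configurable number of times. */
static void *
sk_page_alloc(void)
{
    assert(sk_failures_left >= 0);
    if (sk_failures_left > 0) {
        sk_failures_left--;
        return (NULL);
    }
    return (sk_page);
}

/* Stand-in for sleeping until memory may be available again. */
static void
sk_wait_for_memory(void)
{
    sk_waits++;
}

/* Retry until a page is available, unless the caller asked not to wait. */
static void *
sk_small_alloc(int wait)
{
    void *m;

    for (;;) {
        m = sk_page_alloc();
        if (m == NULL && (wait & SK_M_NOWAIT) == 0)
            sk_wait_for_memory();
        else
            break;
    }
    return (m);
}
```

A caller that passes the NOWAIT flag still has to handle a NULL return; only callers that allow waiting get the guarantee that the panic message above is about.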
Re: kernel panic with today's CURRENT on sparc64 at boot
On Tue, 2003/01/21 at 17:33:42 +, [EMAIL PROTECTED] wrote:
> Hi
>
> cvsup'd to . today on an E220R, single CPU with two Symbios scsi cards.
> Box had been running fine with RC3, but would not boot with -CURRENT
> kernel:
>
> Current:
>
> hme4: Ethernet address: 08:00:20:b7:ef:44
> miibus4: on hme4
> ukphy3: on miibus4
> ukphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> sym0: <875> port 0x1000-0x10ff mem 0x1090a000-0x1090afff,0x10908000-0x109080ff i
> rq 32 at device 3.0 on pci0
> panic: trap: fast data access mmu miss
> cpuid = 0;
> Debugger("panic")
> Stopped at Debugger+0x1c: ta %xcc, 1

Can you please cvsup again to pick up some changes that were made
yesterday and try again?

- Thomas
Re: kernel panic with today's CURRENT on sparc64 at boot
On Fri, 2003/01/24 at 11:54:41 +, Steven Haywood wrote:
>hme6: mem 0xc80-0xc807fff irq 26 at device 1.1 on pci2
>hme6: DMA buffer map load error 12
>hme6: could not be configured
>device_probe_and_attach: hme6 attach returned 6
>pci2: at device 2.0 (no driver attached)
>hme6: mem 0xe80-0xe807fff irq 27 at device 2.1 on pci2
>hme6: DMA buffer map load error 12
>hme6: could not be configured
>device_probe_and_attach: hme6 attach returned 6
>pci2: at device 3.0 (no driver attached)
>pci2: at device 3.0 (no driver attached)
>n pci2
>hme6: DMA buffer map load error 12
>hme6: could not be configured
>device_probe_and_attach: hme6 attach returned 6
>pci0: at device 5.0 (no driver attached)
>pcib3: on nexus0
>pcib3: Psycho, impl 0, version 4, ign 0x7c0
>pci3: on pcib3
>Timecounters tick every 10.000 msec
>ipfw2 initialized, divert disabled, rule-based forwarding enabled, default to deny, logging limited to 100 packets/entry by default
>Waiting 5 seconds for SCSI devices to settle
>da0 at sym0 bus 0 target 1 lun 0
>da0: Fixed Direct Access SCSI-2 device
>da0: 40.000MB/s transfers (20.000MHz, offset 16, 16bit), Tagged Queueing Enabled
>da0: 8637MB (17689267 512 byte sectors: 255H 63S/T 1101C)
>Mounting root from ufs:/dev/da0a
>exec /sbin/init: error 8
>init: not found in path /sbin/init:/sbin/oinit:/sbin/init.bak:/stand/sysinstall
>panic: no init
>cpuid = 0;
>Debugger("panic")
>Stopped at Debugger+0x1c: ta %xcc, 1

This is probably easy to work around for you by increasing the amount
of available DVMA:

--
diff -u -r1.26 psycho.c
--- sparc64/pci/psycho.c 21 Jan 2003 08:56:14 - 1.26
+++ sparc64/pci/psycho.c 24 Jan 2003 16:05:00 -
@@ -565,7 +565,7 @@
 		sc->sc_is->is_sb[1] = 0;
 		if (OF_getproplen(sc->sc_node, "no-streaming-cache") < 0)
 			sc->sc_is->is_sb[0] = sc->sc_pcictl + PCR_STRBUF;
-		psycho_iommu_init(sc, 2);
+		psycho_iommu_init(sc, 3);
 	} else {
 		/* Just copy IOMMU state, config tag and address */
 		sc->sc_is = osc->sc_is;
--

If that still doesn't help, you can further increase the constant to 4
or 5 (at the expense of another 64kB or 192kB of memory).

- Thomas

-- 
Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/
<[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C
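The memory figures quoted above fit a translation table that doubles in size with every step of the constant. A minimal sketch of that arithmetic follows. It assumes that the table occupies 8kB << n bytes and that each 8-byte entry maps one 8kB page; both assumptions are inferred from the 64kB and 192kB figures rather than taken from psycho.c, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: size in bytes of the IOMMU translation table for constant n. */
uint64_t
tsb_bytes(unsigned n)
{
	assert(n <= 7);		/* keep the shift well inside 64 bits */
	return ((uint64_t)8192 << n);
}

/* Assumed: every 8-byte table entry maps one 8kB page of DVMA space. */
uint64_t
dvma_bytes(unsigned n)
{
	return (tsb_bytes(n) / 8 * 8192);
}

/* Extra table memory needed when raising the constant from 'from' to 'to'. */
uint64_t
extra_bytes(unsigned from, unsigned to)
{
	assert(from <= to);
	return (tsb_bytes(to) - tsb_bytes(from));
}
```

Under these assumptions, extra_bytes(3, 4) is 64kB and extra_bytes(3, 5) is 192kB, which matches the figures in the message, and each step of the constant doubles the mappable DVMA range.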
Re: psm not working on Toshiba 1905-s301 and 5.0 Release
On Fri, 2003/01/31 at 10:00:28 -0500, Jason wrote:
> I searched the archives, found a few similar problems, and possible
> resolutions, but none have worked for me.
>
> In windows, it is shown as an Alps Glidepoint on irq12, however fbsd
> refuses to find it.
>
> The device.hints file shows the correct info. I ran acpidump, but was not
> quite sure what to look for. (In one of the posts involving a sony with
> similar problems, someone said to check for MOUE meant MOUSE> to get the ID to modify psm.c, but neither exist in acpidump
> results)
>
> Any ideas?

I have a Satellite 1110; ignoring port errors gets the glidepoint
detection to work there. To do this, add

	hint.psm.0.flags="0x1000"

(or-ed with any other flags you might want) to your device.hints.

On my notebook, that is not the only quirk this glidepoint has: it
completely stops sending interrupts as soon as you stop moving the
cursor for a bit. I have found no other solution to this problem but
to poll the glidepoint. This has the obvious disadvantages of using up
some CPU time (< 0.5%), making the mouse movement feel a bit jittery,
and occasionally missing some very fast clicks. Since I do not use the
glidepoint overly much, I can live with that. If your notebook has
that problem too, I can supply a patch.

- Thomas
Re: Dumping broken?
On Sat, 2003/02/08 at 03:18:54 -0800, Kris Kennaway wrote: > I'm having lots of problems with crashdumps under 5.0. Most of the > time trying to force a dump via 'call doadump' returns an error about > 'Context switches not permitted in the debugger'. Calling it again > causes the system to hang. Is anyone else seeing this? If this is on ATA, try the attached patch. It changes ata_getparam() to not block anymore by reverting to the pre-r1.138 behaviour. This is just a quick hack though, since r1.138 was apparently made to fix probing of some CD-ROMs (so these are broken again by this patch). - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C Index: ata-all.c === RCS file: /ncvs/src/sys/dev/ata/ata-all.c,v retrieving revision 1.163 diff -u -r1.163 ata-all.c --- ata-all.c 19 Jan 2003 20:18:07 - 1.163 +++ ata-all.c 21 Jan 2003 17:01:13 - @@ -514,7 +514,7 @@ /* apparently some devices needs this repeated */ do { - if (ata_command(atadev, command, 0, 0, 0, ATA_WAIT_INTR)) { + if (ata_command(atadev, command, 0, 0, 0, ATA_WAIT_READY)) { ata_prtdev(atadev, "%s identify failed\n", command == ATA_C_ATAPI_IDENTIFY ? "ATAPI" : "ATA"); free(ata_parm, M_ATA);
Re: Panic in fork()
On Sat, 2003/02/08 at 15:15:44 +0100, Morten Rodal wrote:
> On Sat, Feb 08, 2003 at 03:05:12AM -0800, Kris Kennaway wrote:
> > bento# addr2line -e kernel.debug 0xc01a1e2d
> > ../../../kern/kern_fork.c:388
> >
> > for (; p2 != NULL; p2 = LIST_NEXT(p2, p_list)) {
> > PROC_LOCK(p2);
> > 388 --> while (p2->p_pid == trypid ||
> >
>
> That is the exact same spot I saw my computer (old smp machine) crash.
> I think someone mentioned that it would be more or less impossible to
> crash there since one would not enter the for loop when p2 is NULL.
>
> Could it be that PROC_LOCK tampers with p2?

addr2line will usually point to the first line of a statement if it
spans multiple lines; in this case, the full guard is:

	while (p2->p_pid == trypid ||
	    p2->p_pgrp->pg_id == trypid ||
	    p2->p_session->s_sid == trypid) {

The fault address indicates that p2->p_pgrp->pg_session (p_session is
a macro that expands to p_pgrp->pg_session) is NULL, since the offset
of s_sid in struct session is 0x14. I haven't yet found out how that
could happen, though; this field is never legitimately NULL, and the
locking seems to be tight enough that it cannot be freed from under
fork1().

- Thomas
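The reasoning from fault address to NULL pointer can be illustrated with a small sketch: reading a member through a NULL structure pointer touches the address that equals the member's offset. The structure below is a mock, not the real struct session; it is padded with fixed-width words only so that s_sid lands at the 0x14 offset named in the message, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mock layout, NOT the real struct session: five 32-bit words are assumed
 * to precede s_sid so that its offset comes out as 0x14.
 */
struct mock_session {
	uint32_t	s_leading[5];	/* stand-in for the real leading fields */
	int32_t		s_sid;
};

/* Address touched when s_sid is read through a pointer whose value is 'base'. */
uintptr_t
sid_access_address(uintptr_t base)
{
	const uintptr_t off = offsetof(struct mock_session, s_sid);

	assert(base <= UINTPTR_MAX - off);	/* no wrap-around */
	return (base + off);
}
```

With base 0 (a NULL session pointer) the function yields 0x14, so a fault at that address points at the session pointer being NULL rather than at p2 itself.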
Re: Panic in fork()
On Sun, 2003/02/09 at 14:39:36 +1100, Tim Robbins wrote: > On Sat, Feb 08, 2003 at 02:04:56PM -0800, Kris Kennaway wrote: > > > On Sat, Feb 08, 2003 at 04:12:26PM +0100, Thomas Moestl wrote: > > > > > addr2line will usually point to the first line of a statement if it > > > spans multiple lines; in this case, the full guard is: > > > > > > while (p2->p_pid == trypid || > > > p2->p_pgrp->pg_id == trypid || > > > p2->p_session->s_sid == trypid) { > > > > OK, I suspected that. > > > > tjr was looking into this last night and proposed the following patch: > > Alfred was the one who pointed out that holding proctree was probably > necessary, though :-) I don't really get why this is required - the pg_session pointer in struct pgrp is constant over the pgrp's lifetime, so for it to be invalid the wrong struct pgrp must be referenced; the p_pgrp pointer is protected by the process lock however, which is held for this check. - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: What broke X between 5.0R and recent current?
On Mon, 2003/02/24 at 17:28:57 -0500, Daniel Eischen wrote: > So I got a new Dell laptop, ATI Radeon 7500. Installed FreeBSD > 5.0-RELEASE, X 4.2.1, and KDE right off the FreeBSD Mall CD-ROM. > I configured X from the installation setup and was happily > running X and KDE @ 1400x1050. Cool. > > Then I cvsup'd to a recent -current from about a week ago, > which my other laptop and desktop are currently running just > fine (both use very old 3.x or 4.x built X's, probably XFree86 > 3.2.x and running with compat libraries). Now X refuses to > work. It's not the kernel because before installing world, > I installed the kernel and booted it to make sure everything > still worked. X was happy with the old kernel too 'cause > I went back and tried it again also. But after the installworld, > X didn't work, not even xf86cfg. I tried the installworld with > and without mergemaster'ing and X didn't work regardless. I > went back and did another fresh install of 5.0R and X worked > again, but after another installworld, I got the same problem. > The XFree86 log file is at the end. It aborts with: > > [...] > (II) LoadModule: "ddc" > (II) Reloading /usr/X11R6/lib/modules/libddc.a > (II) RADEON(0): VESA VBE DDC supported > (II) RADEON(0): VESA VBE DDC Level none > (II) RADEON(0): VESA VBE DDC transfer in appr. 2 sec. > (II) RADEON(0): VESA VBE DDC read failed > (==) RADEON(0): Write-combining range (0xfcff,0x8) was already clear > (==) RADEON(0): Write-combining range (0xe000,0x200) > (II) RADEON(0): PLL parameters: rf=2700 rd=12 min=12000 max=35000; xclk=16600 > (==) RADEON(0): Using gamma correction (1.0, 1.0, 1.0) > > Fatal server error: > Caught signal 10. Server aborting > > Any ideas? I'm in the process of trying a more recent -current > and am also going to try and rebuild X (though we shouldn't > have to do that). 
I saw something like that on my notebook, too - in my case, it was a problem in the radeon driver which could be worked around by adding an explicit "Modes" line in the used Display subsection in XF86Config, like: Section "Screen" [...] DefaultColorDepth 24 [...] SubSection "Display" Depth 24 Modes "1024x768" EndSubSection EndSection - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: dereferencing type-punned pointer will break strict-aliasingrules
On Mon, 2003/07/28 at 09:30:08 +0900, Jun Kuriyama wrote: > > Is this caused by -oS option? > > - in making BOOTMFS in make release > cc -c -Os -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes > -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions > -std=c99 -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/dev > -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/ipfilter > -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -D_KERNEL > -include opt_global.h -fno-common -finline-limit=15000 -mno-align-long-strings > -mpreferred-stack-boundary=2 -ffreestanding -Werror /usr/src/sys/geom/geom_dev.c > /usr/src/sys/geom/geom_dev.c: In function `g_dev_open': > /usr/src/sys/geom/geom_dev.c:198: warning: dereferencing type-punned pointer will > break strict-aliasing rules > [...] Yes, by implying -fstrict-aliasing, so using -fno-strict-aliasing is a workaround. The problem is caused by the i386 PCPU_GET/PCPU_SET implementation: #define __PCPU_GET(name) ({ \ __pcpu_type(name) __result; \ \ [...] } else if (sizeof(__result) == 4) { \ u_int __i; \ __asm __volatile("movl %%fs:%1,%0" \ : "=r" (__i)\ : "m" (*(u_int *)(__pcpu_offset(name; \ __result = *(__pcpu_type(name) *)&__i; \ [...] In this case, the PCPU_GET is used to retrieve curthread, causing sizeof(__result) to be 4, so the cast at the end of the code snippet is from a u_int * to struct thread *, and __i is accessed through the casted pointer, which violates the C99 aliasing rules. An alternative is to type-pun via a union, which is also a bit ugly, but explicitly allowed by C99. Patch attached (but only superficially tested). 
- Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C Index: pcpu.h === RCS file: /vol/ncvs/src/sys/i386/include/pcpu.h,v retrieving revision 1.36 diff -u -r1.36 pcpu.h --- pcpu.h 27 Jun 2003 21:50:52 - 1.36 +++ pcpu.h 28 Jul 2003 01:37:57 - @@ -96,23 +96,32 @@ __pcpu_type(name) __result; \ \ if (sizeof(__result) == 1) {\ - u_char __b; \ + union { \ + u_char __b; \ + __pcpu_type(name) __r; \ + } __u; \ __asm __volatile("movb %%fs:%1,%0" \ - : "=r" (__b)\ + : "=r" (__u.__b)\ : "m" (*(u_char *)(__pcpu_offset(name; \ - __result = *(__pcpu_type(name) *)&__b; \ + __result = __u.__r; \ } else if (sizeof(__result) == 2) { \ - u_short __w;\ + union { \ + u_short __w;\ + __pcpu_type(name) __r; \ + } __u; \ __asm __volatile("movw %%fs:%1,%0" \ - : "=r" (__w)\ + : "=r" (__u.__w)\ : "m" (*(u_short *)(__pcpu_offset(name; \ - __result = *(__pcpu_type(name) *)&__w; \ + __result = __u.__r; \ } else if (sizeof(__result) == 4) { \ - u_int __i; \ +
Re: dereferencing type-punned pointer will break strict-aliasingrules
On Mon, 2003/07/28 at 03:59:00 +0200, Thomas Moestl wrote: > Yes, by implying -fstrict-aliasing, so using -fno-strict-aliasing is a > workaround. The problem is caused by the i386 PCPU_GET/PCPU_SET > implementation: > > #define __PCPU_GET(name) ({ \ > __pcpu_type(name) __result; \ > \ > [...] > } else if (sizeof(__result) == 4) { \ > u_int __i; \ > __asm __volatile("movl %%fs:%1,%0" \ > : "=r" (__i)\ > : "m" (*(u_int *)(__pcpu_offset(name; \ > __result = *(__pcpu_type(name) *)&__i; \ > [...] > > In this case, the PCPU_GET is used to retrieve curthread, causing > sizeof(__result) to be 4, so the cast at the end of the code snippet > is from a u_int * to struct thread *, and __i is accessed through the ^^^ struct thread **, of course. > casted pointer, which violates the C99 aliasing rules. - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
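The union approach mentioned in this thread can be shown in isolation. The following is a minimal user-space sketch, not the pcpu.h macro itself: the function name is made up, struct thread is only an opaque stand-in, and uintptr_t replaces the u_int of the i386 code so that the example also works where pointers are wider than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

struct thread;	/* opaque stand-in for the kernel type */

/*
 * Reinterpret a raw machine word as a thread pointer by writing one union
 * member and reading the other, instead of casting the address of the word
 * to another pointer type and dereferencing the result.
 */
struct thread *
word_to_thread(uintptr_t raw)
{
	union {
		uintptr_t	 u_raw;
		struct thread	*u_td;
	} u;

	assert(sizeof(u.u_raw) == sizeof(u.u_td));
	u.u_raw = raw;
	return (u.u_td);
}
```

The value is stored and loaded through the same union object, so the compiler is not asked to access an integer object through an lvalue of pointer type, which is what triggered the warning.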
Re: panic: trap: fast data access mmu miss
On Fri, 2003/10/24 at 09:09:25 +0400, Alex Deiter wrote: > panic: trap: fast data access mmu miss > cpuid = 0; > Debugger("panic") > Stopped at Debugger+0x1c: ta %xcc, 1 > db> tr > panic() at panic+0x174 > trap() at trap+0x394 > -- fast data access mmu miss tar=0 %o7=0xc018b820 -- > quotactl() at quotactl+0x98 > syscall() at syscall+0x308 > -- syscall (148, FreeBSD ELF64, quotactl) %o7=0x1e3044 -- > userland() at 0x41187e88 > user trace: trap %o7=0x1e3044 > pc 0x41187e88, sp 0x7fde221 > pc 0x15149c, sp 0x7fde321 > pc 0x151818, sp 0x7fde871 > pc 0x1c771c, sp 0x7fde931 > pc 0x1a6938, sp 0x7fdea01 > pc 0x1b3904, sp 0x7fdec81 > pc 0x1d987c, sp 0x7fdedc1 > pc 0x1d99c0, sp 0x7fdeec1 > pc 0x1da06c, sp 0x7fdefa1 > pc 0x1db99c, sp 0x7fdf071 > pc 0x451ea8, sp 0x7fdf161 > pc 0x133560, sp 0x7fdf3f1 > pc 0x405d3f94, sp 0x7fdf4b1 > done I believe that the attached patch should fix that; the panic is not sparc64-specific, and should occur on all file systems that do not define a vop_getwritemount method. 
- Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C Index: vfs_syscalls.c === RCS file: /vol/ncvs/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.331 diff -u -r1.331 vfs_syscalls.c --- vfs_syscalls.c 21 Aug 2003 13:53:01 - 1.331 +++ vfs_syscalls.c 24 Oct 2003 19:08:29 - @@ -189,7 +189,7 @@ caddr_t arg; } */ *uap; { - struct mount *mp; + struct mount *mp, *wmp; int error; struct nameidata nd; @@ -199,12 +199,13 @@ if ((error = namei(&nd)) != 0) return (error); NDFREE(&nd, NDF_ONLY_PNBUF); - error = vn_start_write(nd.ni_vp, &mp, V_WAIT | PCATCH); + error = vn_start_write(nd.ni_vp, &wmp, V_WAIT | PCATCH); + mp = nd.ni_vp->v_mount; vrele(nd.ni_vp); if (error) return (error); error = VFS_QUOTACTL(mp, uap->cmd, uap->uid, uap->arg, td); - vn_finished_write(mp); + vn_finished_write(wmp); return (error); } ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: NULL pointer problem in pid selection ?
On Tue, 2003/03/11 at 08:43:46 +1100, Tim Robbins wrote:
> On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
>
> > On 08-Mar-2003 Kris Kennaway wrote:
> > > On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> > >>
> > >> Just got this crash on -current, and I believe I have seen similar
> > >> before. addr2line(1) reports the faulting address to be
> > >> ../../../kern/kern_fork.c:395
> > >> which is in the inner loop of pid collision avoidance.
> > >
> > > I've been running this patch from Alfred for the past month or so on
> > > bento, which has fixed a similar panic I was seeing regularly.
> >
> > Using just a shared lock instead of an xlock should be ok there. You
> > aren't modifying the process tree, just looking at it. OTOH, the
> > proc lock is supposed to protect p_grp and p_session, so they shouldn't
> > be NULL. :(
>
> I have a suspicion that the bug is actually in wait1():
>
> sx_xlock(&proctree_lock);
> [...]
> /*
>  * Remove other references to this process to ensure
>  * we have an exclusive reference.
>  */
> leavepgrp(p);
>
> sx_xlock(&allproc_lock);
> LIST_REMOVE(p, p_list); /* off zombproc */
> sx_xunlock(&allproc_lock);
>
> LIST_REMOVE(p, p_sibling);
> sx_xunlock(&proctree_lock);
>
>
> Shouldn't we be removing the process from zombproc before setting
> p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
> fork1() and wait1() are still protected by Giant?

Hmmm, I think you're right; if allproc_lock happens to be contended in
fork1() (which can happen because it is locked without Giant held in
some places, and because sleeping with an sx lock is allowed), we'll
go to sleep there, dropping Giant. This opens up a race, since wait1()
can now proceed until after the leavepgrp() before blocking; when
allproc_lock is released, fork1() will be the first to pick it up, and
this panic will happen.

Seems that I relied on Giant too much when I first took a look into
that code :)

- Thomas
Re: phoenix crash in libc_r on sparc64
On Wed, 2003/06/04 at 00:30:36 -0700, Kris Kennaway wrote:
> On Mon, Jun 02, 2003 at 04:15:43PM -0700, Kris Kennaway wrote:
> > phoenix on my sparc64 crashed while idle with the following:
> >
> > Fatal error '_waitq_insert: Already in queue' at line 321 in file
> > /usr/src/lib/libc_r/uthread/uthread_priority_queue.c (errno = 2)
> >
> > Any ideas?

It should have dropped a core - can you please take a look at it with
gdb?

> One of the libc_r tests seems to hang:
>
> Test static library:
> --
> Test           c_user  c_system  c_total  chng
>  passed/FAILED h_user  h_system  h_total  % chng
> --
> hello_d          0.00      0.02     0.02
>  passed
> --
> hello_s          0.00      0.02     0.02
>  passed
> --
> join_leak_d      0.77      0.18     0.95
>  passed
> --
> mutex_d          9.08     92.42   101.50
>  passed
> --
> sem_d            0.01      0.02     0.02
>  passed
> --
> sigsuspend_d     0.00      0.02     0.02
>  passed
> --
> sigwait_d        0.00      0.02     0.02
>  *** FAILED ***
> --
> guard_s.pl
>
> It's been sitting there for hours now.

This is an unfortunate failure mode, which is caused by a fault on the
stack while all signals are masked (by libc_r internals, I assume);
the kernel will fail to store the user register windows on the stack,
and because SIGILL is blocked, it cannot notify (or terminate) the
process and is stuck trying to copy out the register windows over and
over.

> P.S. Why do 3 of the tests even fail on i386?

The guard test includes constants which are machine- and
compiler-specific; this probably broke due to a gcc upgrade.
The sigwait test is killed by its own SIGUSR1, and this behaviour
actually looks correct to me (but I could easily be wrong, since the
signal behaviour of pthreads seems to be quite complex).
The propagate test failure is due to problems in libc (failing to use
the underscored versions of functions overridden in libc_r). The
attached patch should fix that; Daniel, does this look OK to you?
- Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C Index: gen/sysconf.c === RCS file: /vol/ncvs/src/lib/libc/gen/sysconf.c,v retrieving revision 1.20 diff -u -r1.20 sysconf.c --- gen/sysconf.c 17 Nov 2002 08:54:29 - 1.20 +++ gen/sysconf.c 4 Jun 2003 20:44:47 - @@ -40,6 +40,7 @@ #include __FBSDID("$FreeBSD: src/lib/libc/gen/sysconf.c,v 1.20 2002/11/17 08:54:29 dougb Exp $"); +#include "namespace.h" #include #include #include @@ -52,6 +53,7 @@ #include/* we just need the limits */ #include #include +#include "un-namespace.h" #include "../stdlib/atexit.h" #include "../stdtime/tzfile.h" @@ -560,7 +562,7 @@ value = socket(PF_INET6, SOCK_DGRAM, 0); errno = sverrno; if (value >= 0) { - close(value); + _close(value); return (200112L); } else return (0); Index: include/namespace.h === RCS file: /vol/ncvs/src/lib/libc/include/namespace.h,v retrieving revision 1.16 diff -u -r1.16 namespace.h --- include/namespace.h 1 May 2003 19:03:13 - 1.16 +++ include/namespace.h 4 Jun 2003 20:38:29 - @@ -122,8 +122,10 @@ /*#define sigaction _sigaction*/ #definesigprocmask _sigprocmask #definesigsuspend _sigsuspend +#definesleep _sleep #definesocket _socket #de
Re: PCI bus numbering and orphaned devices
On Mon, 2003/06/09 at 16:58:38 -0700, John-Mark Gurney wrote: > Hello, > > I've recently started work on making FreeBSD work better on a sparc64 > box that a friend has. It's a Netra AX1105-500 (UltraSPARC-IIe 500MHz). > > So far I have found out that the pci bus numbering has problems. We > don't attach pci busses as they are numbered in the bridge/OFW info. > This causes problems with pciconf -l and pciconf -{w,r} not agreeing. > It isn't too hard to tie down the busses to make pciconf agree with > itself. > > [...] > > Index: apb.c > === > RCS file: /home/ncvs/src/sys/sparc64/pci/apb.c,v > retrieving revision 1.4 > diff -u -r1.4 apb.c > --- apb.c 2002/03/24 02:10:56 1.4 > +++ apb.c 2003/06/09 23:33:07 > @@ -207,9 +207,11 @@ >* number, we should pick a better value. One sensible alternative >* would be to pick 255; the only tradeoff here is that configuration >* transactions would be more widely routed than absolutely necessary. > + * > + * If we don't hardware the bus down, pciconf gets confused. >*/ > if (sc->secbus != 0) { > - child = device_add_child(dev, "pci", -1); > + child = device_add_child(dev, "pci", sc->secbus); > if (child != NULL) > return (bus_generic_attach(dev)); > } else This one looks good, please commit. The comment above is outdated, so it might be better to just remove it completely. - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: PCI bus numbering and orphaned devices
On Tue, 2003/06/10 at 15:34:36 -0700, John-Mark Gurney wrote: > M. Warner Losh wrote this message on Tue, Jun 10, 2003 at 08:27 -0600: > > : > > hdrtype = REG(PCIR_HEADERTYPE, 1); > > : > This needs to be tested on that given hardware. > > : > I don't know if REG will work as expected because it asks function 0, > > : > which is disabled. > > : > > : I've reread John-Mark's last mail about the readable registers. > > : So - yes it should work. > > > > That's what inspired me. Also, I'd expected that we'd need some kind > > of tweaking to make it actually compile and be neat. > > Ok, attached is a patched I tried, Hmmm, you seem to have forgotten to actually attach it. > but sad to say, this doesn't work > out to well. I added a printf before and after the REG statement, and > a boot with the kernel give this output: > found-> vendor=0x108e, dev=0x5000, revid=0x13 > bus=0, slot=1, func=1 > class=06-04-00, hdrtype=0x01, mfdev=1 > cmdreg=0x0147, statreg=0x02a0, cachelnsz=16 (dwords) > lattimer=0x50 (2400 ns), mingnt=0x02 (500 ns), maxlat=0x00 (0 ns) > about to read HEADERTYPE > panic: trap: data access error > > [...] > > the last three lines repeate for a while, but this is because of: > psycho_read_config(...) > [...] > /* >* The psycho bridge does not tolerate accesses to unconfigured PCI >* devices' or function's config space, so look up the device in the >* firmware device tree first, and if it is not present, return a value >* that will make the detection code think that there is no device here. >* This is ugly... >*/ > if (reg == 0 && ofw_pci_find_node(bus, slot, func) == 0) > return (0x); > > Which obviously will fail if reg != 0 which we try to do when reading > the PCIR_HEADERTYPE.. > > So, the question is, does other arch's do something nasty like this > too? Should I change the check to just do ofw_pci_find_node? 
You could safely (it would just be slow), but that alone would not help you, since you would also be ignoring requests to the registers you want to read without further hackery. You could, of course, look into the device tree to see if there are devices at higher functions, that would just make that kludge more ugly than it already is :) There's a similar problem with hme devices in some Netra models, and so far I have just ignored this because of the ugliness involved (there were patches floating around at one point, but I did not commit them). The real fix (and the way I wanted to implement things from the beginning) is to write an OFW PCI bus, analogous to the ACPI one. This is high on my list, waiting for time to become available :) > Is this why pciconf -r is returning 0x when reading the ebus > and firewire parts of the SME2300BGA? Simply because it isn't in > the ofw tree? Could be. We just cannot handle devices without firmware nodes - we don't know whether we can safely access them, cannot assign interrupts etc. - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: PCI bus numbering and orphaned devices
On Wed, 2003/06/11 at 01:16:50 +0200, Bernd Walter wrote: > On Tue, Jun 10, 2003 at 03:34:36PM -0700, John-Mark Gurney wrote: > > M. Warner Losh wrote this message on Tue, Jun 10, 2003 at 08:27 -0600: > > > : > > hdrtype = REG(PCIR_HEADERTYPE, 1); > > > : > This needs to be tested on that given hardware. > > > : > I don't know if REG will work as expected because it asks function 0, > > > : > which is disabled. > > > : > > > : I've reread John-Mark's last mail about the readable registers. > > > : So - yes it should work. > > > > > > That's what inspired me. Also, I'd expected that we'd need some kind > > > of tweaking to make it actually compile and be neat. > > > > Ok, attached is a patched I tried, but sad to say, this doesn't work > > out to well. I added a printf before and after the REG statement, and > > a boot with the kernel give this output: > > found-> vendor=0x108e, dev=0x5000, revid=0x13 > > bus=0, slot=1, func=1 > > class=06-04-00, hdrtype=0x01, mfdev=1 > > cmdreg=0x0147, statreg=0x02a0, cachelnsz=16 (dwords) > > lattimer=0x50 (2400 ns), mingnt=0x02 (500 ns), maxlat=0x00 (0 ns) > > about to read HEADERTYPE > > panic: trap: data access error > > cpuid = 0; > > Uptime: 1s > > panic: Assertion mtx_unowned(m) failed at ../../../kern/kern_mutex.c:956 > > cpuid = 0; > > Uptime: 1s > > panic: Assertion mtx_unowned(m) failed at ../../../kern/kern_mutex.c:956 > > cpuid = 0; > > Uptime: 1s > > > > the last three lines repeate for a while, but this is because of: > > psycho_read_config(...) > > [...] > > /* > > * The psycho bridge does not tolerate accesses to unconfigured PCI > > * devices' or function's config space, so look up the device in the > > * firmware device tree first, and if it is not present, return a value > > * that will make the detection code think that there is no device here. > > * This is ugly... 
> > */ > > if (reg == 0 && ofw_pci_find_node(bus, slot, func) == 0) > > return (0x); > > > > Which obviously will fail if reg != 0 which we try to do when reading > > the PCIR_HEADERTYPE.. > > > > So, the question is, does other arch's do something nasty like this > > too? Should I change the check to just do ofw_pci_find_node? Is this > > why pciconf -r is returning 0x when reading the ebus and firewire > > parts of the SME2300BGA? Simply because it isn't in the ofw tree? > > Possible - in fact I was very surprised that a disabled device was not > readable on some registers. > We have a similar situation on alpha, where we get traps for reading non > available devices. > It's handled in that we tell the system to expect traps before accessing > registers. > This is done by reading with the badaddr function, which sets a flag for > our trap handler so it can continue in case the device doesn't exist. > badaddr() returns a flags which tells if reading was successfull. > > > I don't have any data sheets or the PCI spec, so making heads or tails > > of this is going be hard. > > It's OK to get errors when reading locations that aren't available. > Some chipsets nerver trap, others only trap for type2 access (behind > Bridges) and some always trap. I don't have the standard handy, but from my reading of the Shanley book, it seems that for the vendor ID register, a host bridge is required to return 0x if no device is present. Loading this task off to the software is a bit annoying. There is no mention of other registers, so reading them in the probe might theoretically cause problems even on host bridges that handle the vendor ID register correctly. 
- Thomas
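The probing convention discussed in this message can be sketched as follows: a function is treated as absent when a read of its vendor ID register returns the all-ones value 0xffff, which PCI reserves as an invalid vendor ID, and no other register is touched during the probe. This is only an illustration; the accessor type, the mock bridge, and the MOCK_ constants are hypothetical and do not correspond to the psycho driver's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	MOCK_PCIR_VENDOR	0x00	/* vendor ID register offset */
#define	MOCK_VENDOR_NONE	0xffffu	/* all ones: nothing responded */

/* Hypothetical accessor: read 'width' bytes at 'reg' of bus/slot/func. */
typedef uint32_t cfg_read_t(int bus, int slot, int func, int reg, int width);

/* Probe a function by reading only its vendor ID register. */
bool
function_present(cfg_read_t *rd, int bus, int slot, int func)
{
	assert(rd != NULL);
	return (rd(bus, slot, func, MOCK_PCIR_VENDOR, 2) != MOCK_VENDOR_NONE);
}

/*
 * Mock bridge for illustration: only bus 0, slot 1, function 1 answers,
 * with vendor 0x108e; everything else reads as all ones.
 */
uint32_t
mock_read(int bus, int slot, int func, int reg, int width)
{
	(void)width;
	if (bus == 0 && slot == 1 && func == 1 && reg == MOCK_PCIR_VENDOR)
		return (0x108e);
	return (MOCK_VENDOR_NONE);
}
```

With the mock bridge, function_present(mock_read, 0, 1, 1) is true while function 0 of the same slot is reported absent, mirroring the situation in this thread where function 0 is disabled but function 1 exists.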
Re: PCI bus numbering and orphaned devices
On Tue, 2003/06/10 at 16:41:44 -0700, John-Mark Gurney wrote:
> Thomas Moestl wrote this message on Wed, Jun 11, 2003 at 01:02 +0200:
> > On Tue, 2003/06/10 at 15:34:36 -0700, John-Mark Gurney wrote:
> > There's a similar problem with hme devices in some Netra models, and
> > so far I have just ignored this because of the ugliness involved
> > (there were patches floating around at one point, but I did not commit
> > them).
> >
> > The real fix (and the way I wanted to implement things from the
> > beginning) is to write an OFW PCI bus, analogous to the ACPI one. This
> > is high on my list, waiting for time to become available :)
>
> Yikes, I just started looking at the acpi code, and there's a lot of
> code in it!

There's much setup to be done that the firmware is too lazy to do for
us.

> > > Is this why pciconf -r is returning 0x when reading the ebus
> > > and firewire parts of the SME2300BGA? Simply because it isn't in
> > > the ofw tree?
> >
> > Could be. We just cannot handle devices without firmware nodes - we
> > don't know whether we can safely access them, cannot assign interrupts
> > etc.
>
> Ok, the only problem is that is then we have the same problem the ACPI
> code does in that hot swapping cards would have a problem. Since it
> appears to me that the OFW tree doesn't get updated upon a swap. (At
> least the usb part of the tree doesn't.)

We do not support hotplugging at the moment anyway. If a bridge driver
were to implement that in the future without using any firmware
support, however, it would need to know all the information about the
interrupt routing required for its devices itself. In that case, it
can just prevent the ofw_pci bus from attaching to it (this will be
easily possible).
I'd hope that machines that support hot-plugging of PCI devices would
have firmware methods available to support that, though.
> Does this mean that we should eliminate most of the Sun specific pci > bus drivers in favor of OFW specific ones like ACPI? or? No, it means introducing an OFW bus driver, which uses the firmware to enumerate the devices and to support interrupt routing. The bridge drivers would be mostly unaffected by this. The only problem with this approach is that it can change the device enumeration; I hope that the resulting one will be the same one that is printed on the boxen, so it should be advantageous for new installations, but a minor migration problem for old ones. I've got some code for this already, it just isn't done yet. - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: PCI bus numbering and orphaned devices
On Wed, 2003/06/11 at 08:34:39 -0600, M. Warner Losh wrote: > In message: <[EMAIL PROTECTED]> > Thomas Moestl <[EMAIL PROTECTED]> writes: > : On Tue, 2003/06/10 at 16:41:44 -0700, John-Mark Gurney wrote: > : > Thomas Moestl wrote this message on Wed, Jun 11, 2003 at 01:02 +0200: > : > Ok, the only problem is that is then we have the same problem the ACPI > : > code does in that hot swapping cards would have a problem. Since it > : > appears to me that the OFW tree doesn't get updated upon a swap. (At > : > least the usb part of the tree doesn't.) > : > : We do not support hotplugging at the moment anyway. If a bridge driver > : would implement that in the future without using any firmware support > : however, it will then need to know everything information about the > : interrupt routing required for its devices if it cannot use the > : firmware for this. in that case, it can just prevent the ofw_pci bus > : from attaching to it (this will be easily possible). > : I'd hope that machines that support hot-plugging of PCI devices would > : have firmware methods available to support that though. > > We'll need to do so for the cardbus bridges. Yes, I was just speaking of PCI. > However, the interupt is routed to the bridge and has to be shared > with the cardbus/pccard cards. That should be far less of a problem with the new code, since it will make interrupt routing work right finally (right now, everything needs to be prerouted). > : > Does this mean that we should eliminate most of the Sun specific pci > : > bus drivers in favor of OFW specific ones like ACPI? or? > : > : No, it means introducing an OFW bus driver, which uses the firmware to > : enumerate the devices and to support interrupt routing. The bridge > : drivers would be mostly unaffected by this. 
> : The only problem with this approach is that it can change the device > : enumeration; I hope that the resulting one will be the same one that is > : printed on the boxen, so it should be advantageous for new > : installations, but a minor migration problem for old ones. > : > : I've got some code for this already, it just isn't done yet. > > So are you talking about doing something akin to the acpi bridge code > or something else? Would this more properly be called a OFW PCI bus > driver, Yes, that's what I meant to say. It will "override" some of the PCI methods (by using its own method table), and use the rest of them unaltered. It will attach to PCI bridges which offer an additional method to get the firmware device node (but with a higher priority than the standard PCI bus), so bridges can choose which bus driver they want to have attached by offering or not offering that method. Of course, there will need to be a generic OFW PCI-PCI bridge driver that adds this method, but it's needed anyway to override the standard interrupt routing method. - Thomas
Re: [-CURRENT tinderbox] failure on sparc64/sparc64
On Tue, 2003/07/15 at 12:04:56 -0700, Marcel Moolenaar wrote: > On Tue, Jul 15, 2003 at 09:00:17PM +0200, Dag-Erling Smørgrav wrote: > > Marcel Moolenaar <[EMAIL PROTECTED]> writes: > > > It needs to be analyzed because cross-builds should not fail. Do > > > we have a machine problem? What exactly is dumping core? Is it > > > gzip or some binary started immediately after it? If it's gzip, > > > is there a relation with the recent compiler warning about strncmp? > > > > It's not a machine problem if it only happens to the sparc64 build - > > the same machine runs all the other -CURRENT tinderboxen except > > powerpc. > > It does not only happen to sparc64. I've seen it fail for all but > i386 and pc98, I think. i386 and pc98 have failed too (random example: July 5). - Thomas
Re: different packing of structs in kernel vs. userland ?
On Sun, 2002/07/14 at 01:18:10 -0700, Luigi Rizzo wrote: > Hi, > the following message seems to suggest that the compiler > (the way it is invoked) packs structures differently > when building the kernel and userland. > > The stize of the structure in question is computed > by both kernel and userland app using sizeof(), > so there is no assumption on the size of its members, > so i believe the only possibility of a mismatch is > the one above. > > Any ideas ? (Disclaimer: my solution below is untested, so it may all be bogus) No, you are not accounting for "external" structure padding. Take a look: struct ip_fw { struct ip_fw *next; /* linked list of rules */ u_int16_t act_ofs;/* offset of action in 32-bit units */ u_int16_t cmd_len;/* # of 32-bit words in cmd */ u_int16_t rulenum;/* rule number */ u_int16_t _pad; /* padding */ /* These fields are present in all rules. */ u_int64_t pcnt; /* Packet counter */ u_int64_t bcnt; /* Byte counter */ u_int32_t timestamp; /* tv_sec of last match */ struct ip_fw *next_rule; /* ptr to next rule */ ipfw_insn cmd[1]; /* storage for commands */ }; On a 64-bit architecture, pointers are obviously 8 bytes in size; structure members must or should be on natural borders, depending on the architecture. So, next_rule will not be on a natural border; 4 bytes of padding will be inserted before it. With that, the total structure size would be 52. The compiler must account for the fact that an array of struct ip_fws may be used. For obvious reasons, it can not just insert extra padding in the array case; instead, the structure size must be chosen so that in this situation, the first member will be on a natural border. This results in an extra 4 bytes of "external" padding at the end, after the member 'cmd'. The macro you use to compute the size in the kernel is: #define RULESIZE(rule) (sizeof(struct ip_fw) + \ ((struct ip_fw *)(rule))->cmd_len * 4 - 4) In the userland code, you start at &foo.cmd and append data directly. 
This means that the padding will also be used to store data, so the '- 4' (= sizeof(foo.cmd)) will not always be enough. The following definition of RULESIZE (untested) should fix this: #define RULESIZE(rule) (offsetof(struct ip_fw, cmd) + \ ((struct ip_fw *)(rule))->cmd_len * 4) It also removes the explicit 4 for sizeof(ipfw_insn). - thomas
Re: different packing of structs in kernel vs. userland ?
On Sun, 2002/07/14 at 13:43:37 -0700, Luigi Rizzo wrote: > [i am deliberately not trimming the email in case someone wants to > look at the context] > > i am a bit dubious about your explaination -- it also does not > explain why the person reporting this problem "fixed" that > by swapping "timestamp" and "next_rule" in the structure It does - doing so removes the need for padding before 'next_rule', because it is properly aligned then. 'timestamp' and 'cmd' are both 4 bytes in size and immediately follow each other, so the total structure size is a multiple of 8 (48 bytes). Because of that, no padding after 'cmd' is required, and the effect is gone. - thomas
Re: different packing of structs in kernel vs. userland ?
On Sun, 2002/07/14 at 23:08:21 -0400, Mike Barcroft wrote: > Thomas Moestl <[EMAIL PROTECTED]> writes: > > (Disclaimer: my solution below is untested, so it may all be bogus) > > As request, here are the test results. > > Most rules work, except my final one: > %%% > bowie# ipfw add allow all from any to any > ipfw: getsockopt(IP_FW_ADD): Invalid argument > %%% Oh, right, that's related: the kernel checks for a minimum size of the passed data on two occasions, first in sooptcopyin(), and then again in check_ipfw_struct(). It requires the size to be at least sizeof(struct ip_fw); however, for structures containing just one action (like the one for the command above) this is again too much in the 64-bit case because of the padding. Can you please try the attached patch (against the CVS version)? - thomas Index: ip_fw.h === RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v retrieving revision 1.71 diff -u -r1.71 ip_fw.h --- ip_fw.h 8 Jul 2002 22:39:19 - 1.71 +++ ip_fw.h 15 Jul 2002 10:48:19 - @@ -294,8 +294,9 @@ #define ACTION_PTR(rule) \ (ipfw_insn *)( (u_int32_t *)((rule)->cmd) + ((rule)->act_ofs) ) -#define RULESIZE(rule) (sizeof(struct ip_fw) + \ - ((struct ip_fw *)(rule))->cmd_len * 4 - 4) +#define RULESIZE_FROMLEN(len) (offsetof(struct ip_fw, cmd) + (len) * 4) +#define RULESIZE(rule) RULESIZE_FROMLEN(((struct ip_fw *)(rule))->cmd_len) +#define RULESIZE_MIN RULESIZE_FROMLEN(1) /* * This structure is used as a flow mask and a flow id for various Index: ip_fw2.c === RCS file: /home/ncvs/src/sys/netinet/ip_fw2.c,v retrieving revision 1.4 diff -u -r1.4 ip_fw2.c --- ip_fw2.c 8 Jul 2002 22:46:01 - 1.4 +++ ip_fw2.c 15 Jul 2002 10:38:09 - @@ -2142,7 +2142,7 @@ int have_action=0; ipfw_insn *cmd; - if (size < sizeof(*rule)) { + if (size < RULESIZE_MIN) { printf("ipfw: rule too short\n"); return (EINVAL); }
@@ -2428,7 +2428,7 @@ case IP_FW_ADD: rule = (struct ip_fw *)rule_buf; /* XXX do a malloc */ error = sooptcopyin(sopt, rule, sizeof(rule_buf), - sizeof(struct ip_fw) ); + RULESIZE_MIN); size = sopt->sopt_valsize; if (error || (error = check_ipfw_struct(rule, size))) break;
Re: different packing of structs in kernel vs. userland ?
On Mon, 2002/07/15 at 04:00:08 -0700, Luigi Rizzo wrote: > sorry but all this just does not make sense to me. > > sizeof(foo) should give the same result irrespective of > where you use it. OK, let me rephrase it: as I explained before, struct ip_fw has padding after 'cmd' (the last member) to ensure that arrays can be built from it safely, so that the first member will always be properly aligned. Since the first member must/should be aligned on an 8-byte boundary on 64-bit platforms, sizeof(struct ip_fw) must be a multiple of 8; the size of the padding is therefore 4 bytes (unless the situation is changed by reordering structure members). This can easily be checked on a 64-bit platform. The following program fragment: struct ip_fw f; printf("sizeof(ip_fw) = %d\n", (int)sizeof(f)); printf("offsetof(ip_fw, cmd) = %d\n", (int)offsetof(struct ip_fw, cmd)); printf("sizeof(ip_fw.cmd) = %d\n", (int)sizeof(f.cmd)); Produces this output on sparc64: sizeof(ip_fw) = 56 offsetof(ip_fw, cmd) = 48 sizeof(ip_fw.cmd) = 4 This illustrates that indeed, padding is appended after 'cmd'. In the (userland) ipfw2.c, you basically do the following: ipfw_insn *dst; /* sizeof(ipfw_insn) = 4 */ dst = (ipfw_insn *)rule->cmd; /* Write n instructions and increase dst accordingly. */ rule->cmd_len = (u_int32_t *)dst - (u_int32_t *)(rule->cmd); i = (void *)dst - (void *)rule; if (getsockopt(s, IPPROTO_IP, IP_FW_ADD, rule, &i) == -1) err(EX_UNAVAILABLE, "getsockopt(%s)", "IP_FW_ADD"); Let's consider the case where only one instruction was added. In this case, dst was incremented once and points directly after cmd, so i is 52 on a 64-bit platform. However, sizeof(struct ip_fw) is 56 because of the aforementioned 4 bytes of padding following 'cmd', so i < sizeof(struct ip_fw). This explains why rules with just one instruction would not work properly in this case with just my first patch.
Likewise, when adding more instructions, the second one will be added to the memory location directly following 'cmd'. If padding is present, the second instruction will write into it. The size of the total structure will thus not be properly computed by the old RULESIZE macro: #define RULESIZE(rule) (sizeof(struct ip_fw) + \ ((struct ip_fw *)(rule))->cmd_len * 4 - 4) The '- 4' is meant to subtract the size of the cmd, which is accounted for in cmd_len. Still, you are counting the padding twice, once in the sizeof() and once in cmd_len. So, sizeof(struct ip_fw) is no different between userland and kernel; the problem is that userland computes the sizes with pointer arithmetic rather than sizeof(struct ip_fw), while the kernel uses sizeof(struct ip_fw) for its checks. - thomas
Re: different packing of structs in kernel vs. userland ?
On Mon, 2002/07/15 at 04:24:33 -0700, Terry Lambert wrote: > Luigi Rizzo wrote: > > sorry but all this just does not make sense to me. > > > > sizeof(foo) should give the same result irrespective of > > where you use it. > > > > Perhaps the best thing would be to put a > > > > printf("struct ip_fw has size %d\n", sizeof(struct ip_fw)); > > > > both in ipfw2.c and somewhere in ip_fw2.c and see if there is > > a mismatch between the two numbers. > > I have to assume that what didn't make sense was that his patch > worked? 8-). > > He's making the valid point that for: > > struct foo *fee; > > It's possible that: > > sizeof(struct foo) != (((char *)&fee[1]) - ((char *)&fee[0])) No, I do not. In fact, the opposite: sizeof(struct foo) == (((char *)&fee[1]) - ((char *)&fee[0])) _must_ always be true, since it is legal to compute the size of storage needed for an n-element array of struct foo by using (sizeof(struct foo) * n). My point was that, because of the above, any padding that might be required between the last member of one struct foo and the first member of the struct foo immediately following it must be _included_ in struct foo, after the last element. - thomas
Re: alpha tinderbox failure - kernel is broken.
On Tue, 2002/09/03 at 09:37:14 -0700, Peter Wemm wrote: > Bernd Walter wrote: > > On Tue, Sep 03, 2002 at 09:01:07AM -0700, Peter Wemm wrote: > > I was running -current from 2002/08/11 before without any sign about > > this kind of problem. > > Building libiconv failed reproduceable for me, but booting an > > 2002/08/11 kernel made me build the port. > > Yes, imgact_elf.c rev 1.121 is the culprit. Reverting that change solves > the problem. I have attached a patch which, I believe, should fix the problem. I have no alpha box, so I cannot test kernel patches beyond compiling them; be warned that all below is just a theory, and the patch might of course be broken (so keep kernel.old around :). The problem was caused by the fact that in static executables on alpha, the text segment is writable, so the heuristic prot & VM_PROT_WRITE in the ELF image activator will regard everything as a data segment. This has the (non-fatal) effect that the program text size is taken to be 0. Much more fatal, however, is that obreak() assumes that all data segments start on consecutive pages (see below). Newer binutils will however place the data segment on the next 64k page at the same offset after the text segment (probably to make it easier for the OS to use super pages), so that holes of more than a page size can occur. obreak() will calculate the heap end address by taking the start of the program data and adding the current data size. The data size of a process is initially set by the image activator; the ELF one sums up the number of 8k-pages actually needed to hold the data. Now, if a "hole" happens to be between the segments that the image activator believes to hold data, (start address + number of used pages) does of course not suffice to calculate the end address any more.
The result is that the vm_map_insert() in obreak() can collide with program segments when trying to insert a mapping starting with the old address (that was calculated incorrectly), so it will fail, causing ENOMEM to be returned. For dynamic executables, this does not occur because the text segment is not writable; the dynamic section, which is writable and executable (because of the plt), starts after the hole and is directly followed by the rest of the data. The attached patch just changes the heuristics used to detect "text" segments to look for executable segments (using an idea from Peter). As a result, the dynamic section is viewed as text, which should not break anything. This way, it should be possible to avoid the hole for now; a real fix would be to add a new vmspace field to represent the heap size including holes, which could then be used by obreak(), while vm_dsize would only be used for statistics (which is however difficult to maintain when shrinking below the initial size with brk()). Can somebody who is feeling adventurous and has an alpha box please test whether this fixes it for now? Thanks, - Thomas Index: kern/imgact_elf.c === RCS file: /home/ncvs/src/sys/kern/imgact_elf.c,v retrieving revision 1.124 diff -u -r1.124 imgact_elf.c --- kern/imgact_elf.c 2 Sep 2002 17:27:30 - 1.124 +++ kern/imgact_elf.c 3 Sep 2002 17:10:21 - @@ -738,14 +738,14 @@ * to distinguish between the two for the purpose * of limit checking and vmspace fields.
*/ - if (prot & VM_PROT_WRITE) { - data_size += seg_size; - if (data_addr == 0) - data_addr = seg_addr; - } else { + if (prot & VM_PROT_EXECUTE) { text_size += seg_size; if (text_addr == 0) text_addr = seg_addr; + } else { + data_size += seg_size; + if (data_addr == 0) + data_addr = seg_addr; } /*
Re: alpha tinderbox failure - kernel is broken.
On Tue, 2002/09/03 at 11:21:05 -0700, Matthew Dillon wrote: > > :The attached patch does just change the heuristics used to detect > :"text" segments to look for executable segments (using an idea from > :Peter). This results in the fact that dynamic section is viewed as > :text, which should not break anything. > :This way, it should be possible to avoid the hole currently; a real > :fix would be to add a new vmspace field to represent the heap size > :including holes which could then be used by obreak(), while vm_dsize > :would only be used for statistics (which is however difficult to > :maintain when shrinking below the initial size with brk()). > : > :Can somebody who is feeling adventurous and has an alpha box please > :test whether this fixes it for now? > : > :Thanks, > : - Thomas > > Excellent Thomas! Thanks for tracking this down. Your patch looks > far better then mine if the circumstances of the failure are as you > believe. > > As soon as we get verification that your patch solves the problem, > I will commit / MFC cycle it. > > I am also still somewhat worried about the data segment start address > and I am wondering if I should remove the if (data_addr == 0) > and instead unconditionally set data_addr to the last data segment > loaded (which is what the original code did). That would only allow shrinking bss, but since that seems to be the traditional behaviour (and it's not likely that anybody would want to shrink away other segments), that would probably be better. I think this would also require adding that other vmspace field as I described above; otherwise there would be a gap between bss and the heap start because vm_dsize counts all data segments. However, it has the advantage of making this real (non-workaround) fix easy to implement, since no bookkeeping problems would occur when shrinking just bss away (as no holes can be crossed).
- Thomas
Re: alpha tinderbox failure - kernel is broken.
On Tue, 2002/09/03 at 15:11:06 -0400, John Baldwin wrote: > > On 03-Sep-2002 Thomas Moestl wrote: > > On Tue, 2002/09/03 at 09:37:14 -0700, Peter Wemm wrote: > >> Bernd Walter wrote: > >> > On Tue, Sep 03, 2002 at 09:01:07AM -0700, Peter Wemm wrote: > >> > I was running -current from 2002/08/11 before without any sign about > >> > this kind of problem. > >> > Building libiconv failed reproduceable for me, but booting an > >> > 2002/08/11 kernel made me build the port. > >> > >> Yes, imgact_elf.c rev 1.121 is the culprit. Reverting that change solves > >> the problem. > > > > Can somebody who is feeling adventurous and has an alpha box please > > test whether this fixes it for now? > > Nope, if anything it's now worse. :( We should perhaps revert this > change in -stable until we can get it to work in -current. FWIW, with > the patch all sorts of programs no longer work including find, > rpc.lockd, cron, sendmail, getty, etc., not just static c++ programs. Thanks for testing, and sorry! This time, I broke dynamically linked programs :) It turns out that only C++ programs actually had their text segments mapped writable; dynamically linked programs have their data segment mapped executable though (contrary to what I said before, the PLT is actually included in the data segment, sorry). So, protections cannot be used to discriminate between text and data. I have attached a new workaround patch that uses the old method to find the text segment again (i.e. finding the entry point), and treats everything else as data. This time it's tested (thanks to jhb) and actually seems to work.
- Thomas Index: imgact_elf.c === RCS file: /home/ncvs/src/sys/kern/imgact_elf.c,v retrieving revision 1.124 diff -u -r1.124 imgact_elf.c --- imgact_elf.c 2 Sep 2002 17:27:30 - 1.124 +++ imgact_elf.c 3 Sep 2002 19:11:58 - @@ -734,18 +734,20 @@ phdr[i].p_vaddr - seg_addr); /* -* Is this .text or .data? Use VM_PROT_WRITE -* to distinguish between the two for the purpose -* of limit checking and vmspace fields. +* Check whether the entry point is in this segment +* to determine whether to count is as text or data. +* XXX: this needs to be done better! */ - if (prot & VM_PROT_WRITE) { + if (hdr->e_entry >= phdr[i].p_vaddr && + hdr->e_entry < (phdr[i].p_vaddr + + phdr[i].p_memsz)) { + text_size = seg_size; + text_addr = seg_addr; + entry = (u_long)hdr->e_entry; + } else { data_size += seg_size; if (data_addr == 0) data_addr = seg_addr; - } else { - text_size += seg_size; - if (text_addr == 0) - text_addr = seg_addr; } /* @@ -762,12 +764,6 @@ goto fail; } - /* Does the entry point belong to this segment? */ - if (hdr->e_entry >= phdr[i].p_vaddr && - hdr->e_entry < (phdr[i].p_vaddr + - phdr[i].p_memsz)) { - entry = (u_long)hdr->e_entry; - } break; case PT_PHDR: /* Program header table info */ proghdr = phdr[i].p_vaddr;
Re: alpha tinderbox failure - kernel is broken.
On Tue, 2002/09/03 at 20:32:48 +0200, Thomas Moestl wrote: > On Tue, 2002/09/03 at 11:21:05 -0700, Matthew Dillon wrote: > > I am also still somewhat worried about the data segment start address > > and I am wondering if I should remove the if (data_addr == 0) > > and instead unconditionally set data_addr to the last data segment > > loaded (which is what the original code did). > > That would only allow to shrink bss, but since that seems to be the > traditional behaviour (and it's not likely that anybody would like to > shrink away other segments), that would probably better. Huh, that should read data+bss for usual elf binaries which share the two in one segment (and there seems to be some code around in other places that expect binaries formed with only two PT_LOAD segments). Assuming that, setting data_addr conditionally or unconditionally should not make any difference, it will always be set for the first data PT_LOAD segment and there will be only one (the other one will be text). Sorry for the confusion, - Thomas
Re: Anyone seeing snp(4) problems?
On Sun, 2002/11/10 at 17:40:47 +, Dima Dorfman wrote: > Juli Mallett <[EMAIL PROTECTED]> wrote: > > I just tried to "sudo watch ttyv1" and ran into the following: > > > > % Fatal trap 12: page fault while in kernel mode > .. > > Looks like use of a NULL structure, accessing member at offsetof==0x60? > > > > Anyway, I couldn't get a dump, but I'll keep trying... Also this kernel > > is a bit stale, but it'll take a while to get the kernel on this box updated, > > so I figured I'd go ahead and post now, and try with a new one when I can. > > Is snp loaded from a module, and if it is, are the modules in sync > with the kernel? I tried the above command on a -current about a > month old and it works for me. If something broke recently, I'm > interested in tracebacks. Speaking of snp, what do you think of the attached patches? They fix three problems: - the ioctl()s really operate on udev_ts. dev_t is defined differently in kernel and userland and is even of different size on 64-bit platforms, leading to an ioctl() number mismatch there. - SNPGTTY returned a kernel pointer instead of a correctly formed udev_t - watch(8) assumed that FIONREAD takes a size_t *, when it really takes just an int * The last patch updates the manpage accordingly. - Thomas -- Thomas Moestl <[EMAIL PROTECTED]> http://www.tu-bs.de/~y0015675/ <[EMAIL PROTECTED]> http://people.FreeBSD.org/~tmm/ PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C Index: sys/snoop.h === RCS file: /d/ncvs/src/sys/sys/snoop.h,v retrieving revision 1.21 diff -u -r1.21 snoop.h --- sys/snoop.h 10 Sep 2002 03:58:44 - 1.21 +++ sys/snoop.h 10 Nov 2002 19:49:39 - @@ -30,8 +30,8 @@ * detached from its current tty. 
*/ -#define SNPSTTY _IOW('T', 90, dev_t) -#define SNPGTTY _IOR('T', 89, dev_t) +#define SNPSTTY _IOW('T', 90, udev_t) +#define SNPGTTY _IOR('T', 89, udev_t) /* * These values would be returned by FIONREAD ioctl Index: dev/snp/snp.c === RCS file: /d/ncvs/src/sys/dev/snp/snp.c,v retrieving revision 1.73 diff -u -r1.73 snp.c --- dev/snp/snp.c 10 Apr 2002 03:51:49 - 1.73 +++ dev/snp/snp.c 10 Nov 2002 19:49:39 - @@ -544,7 +544,7 @@ * SNPGTTY happy, else we can't know what is device * major/minor for tty. */ - *((dev_t *)data) = snp->snp_target; + *((udev_t *)data) = dev2udev(snp->snp_target); break; case FIONBIO: Index: watch.c === RCS file: /d/ncvs/src/usr.sbin/watch/watch.c,v retrieving revision 1.26 diff -u -r1.26 watch.c --- watch.c 10 Aug 2002 08:42:10 - 1.26 +++ watch.c 10 Nov 2002 19:49:09 - @@ -285,8 +285,8 @@ int main(int ac, char *av[]) { - int res, idata, rv; - size_t nread, b_size = MIN_SIZE; + int res, rv, nread; + size_t b_size = MIN_SIZE; char ch, *buf, chb[READB_LEN]; fd_set fd_s; @@ -362,7 +362,7 @@ if (nread > READB_LEN) nread = READB_LEN; rv = read(std_in, chb, nread); - if (rv == -1 || (unsigned)rv != nread) + if (rv == -1 || rv != nread) fatal(EX_IOERR, "read (stdin) failed"); switch (chb[0]) { @@ -379,7 +379,7 @@ default: if (opt_write) { rv = write(snp_io, chb, nread); - if (rv == -1 || (unsigned)rv != nread) { + if (rv == -1 || rv != nread) { detach_snp(); if (opt_no_switch) fatal(EX_IOERR, @@ -394,10 +394,10 @@ if (!FD_ISSET(snp_io, &fd_s)) continue; - if ((res = ioctl(snp_io, FIONREAD, &idata)) != 0) + if ((res = ioctl(snp_io, FIONREAD, &nread)) != 0) fatal(EX_OSERR, "ioctl(FIONREAD)"); - switch (nread) { case SNP_OFLOW: if (opt_reconn_oflow) attach_snp(); @@ -418,7 +418,6 @@ cleanup(-1); break; defaul
Re: Preparing innocent users for -current
On Fri, 2002/03/08 at 11:23:36 -0800, Doug Barton wrote: > 3. xconsole causes periodic panics. The problem (according to BDE) is "a > well-know bug in printf(9)," caused by "The TIOCCONS ioctl ... panics when > printf() is called while sched_lock is held." I reported this bug in > October 2001, if anyone wants to look through the archives. While this issue is still present, printf()s with sched_lock held fortunately seem to be quite rare. IIRC your panics were caused by the "microuptime went backwards" message, which was recently removed. The only other relatively frequently reported printf() in this category I can think of at the moment is for the "calcru: negative time..." message. - thomas