Re: hang in i386 pmap_tlb_shootwait
On Wed, May 09, 2018 at 06:21:54PM +0200, Hans-Joerg Hoexer wrote:
> Hi,
>
> I think this is fallout from using interrupt gates now. I did not properly
> enable interrupts for dna, fpu and f00f_redirect: Thus npxintr() tries to
> get the kernel lock with interrupts disabled. Meanwhile the IPI for tlb
> shootdown is pending for delivery. When the sender of the IPI is holding
> the kernel lock it will spin in pmap_tlb_shootwait() and we deadlock.
>
> Diff below fixes dna, fpu and f00f_redirect by enabling interrupts.
>
> (dna and fpu leave the kernel directly, thus they have to disable
> interrupts again; f00f_redirect goes through calltrap which will enable
> interrupts)
>
> Take care,
> HJ.
>

This makes sense, ok mlarkin.

-ml

> Index: sys/arch/i386//i386/locore.s
> ===
> RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
> retrieving revision 1.185
> diff -u -p -u -p -r1.185 locore.s
> --- sys/arch/i386//i386/locore.s 11 Apr 2018 15:44:08 - 1.185
> +++ sys/arch/i386//i386/locore.s 9 May 2018 15:47:51 -
> @@ -988,6 +988,7 @@ IDTVEC(dna)
>         pushl   $0                      # dummy error code
>         pushl   $T_DNA
>         INTRENTRY(dna)
> +       sti
>         pushl   CPUVAR(SELF)
>         call    *_C_LABEL(npxdna_func)
>         addl    $4,%esp
> @@ -996,6 +997,7 @@ IDTVEC(dna)
>  #ifdef DIAGNOSTIC
>         movl    $0xfd,%esi
>  #endif
> +       cli
>         INTRFASTEXIT
>  #else
>         ZTRAP(T_DNA)
> @@ -1015,6 +1017,7 @@ IDTVEC(prot)
>  IDTVEC(f00f_redirect)
>         pushl   $T_PAGEFLT
>         INTRENTRY(f00f_redirect)
> +       sti
>         testb   $PGEX_U,TF_ERR(%esp)
>         jnz     calltrap
>         movl    %cr2,%eax
> @@ -1050,6 +1053,7 @@ IDTVEC(fpu)
>          */
>         subl    $8,%esp                 /* space for tf_{err,trapno} */
>         INTRENTRY(fpu)
> +       sti
>         pushl   CPL                     # if_ppl in intrframe
>         pushl   %esp                    # push address of intrframe
>         incl    _C_LABEL(uvmexp)+V_TRAP
> @@ -1058,6 +1062,7 @@ IDTVEC(fpu)
>  #ifdef DIAGNOSTIC
>         movl    $0xfc,%esi
>  #endif
> +       cli
>         INTRFASTEXIT
>  #else
>         ZTRAP(T_ARITHTRAP)
Re: hang in i386 pmap_tlb_shootwait
On Wed, May 09, 2018 at 06:21:54PM +0200, Hans-Joerg Hoexer wrote:
> Hi,
>
> I think this is fallout from using interrupt gates now. I did not properly
> enable interrupts for dna, fpu and f00f_redirect: Thus npxintr() tries to
> get the kernel lock with interrupts disabled. Meanwhile the IPI for tlb
> shootdown is pending for delivery. When the sender of the IPI is holding
> the kernel lock it will spin in pmap_tlb_shootwait() and we deadlock.
>
> Diff below fixes dna, fpu and f00f_redirect by enabling interrupts.

This fixes my test setup.

bluhm

> (dna and fpu leave the kernel directly, thus they have to disable
> interrupts again; f00f_redirect goes through calltrap which will enable
> interrupts)
>
> Take care,
> HJ.
>
> Index: sys/arch/i386//i386/locore.s
> ===
> RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
> retrieving revision 1.185
> diff -u -p -u -p -r1.185 locore.s
> --- sys/arch/i386//i386/locore.s 11 Apr 2018 15:44:08 - 1.185
> +++ sys/arch/i386//i386/locore.s 9 May 2018 15:47:51 -
> @@ -988,6 +988,7 @@ IDTVEC(dna)
>         pushl   $0                      # dummy error code
>         pushl   $T_DNA
>         INTRENTRY(dna)
> +       sti
>         pushl   CPUVAR(SELF)
>         call    *_C_LABEL(npxdna_func)
>         addl    $4,%esp
> @@ -996,6 +997,7 @@ IDTVEC(dna)
>  #ifdef DIAGNOSTIC
>         movl    $0xfd,%esi
>  #endif
> +       cli
>         INTRFASTEXIT
>  #else
>         ZTRAP(T_DNA)
> @@ -1015,6 +1017,7 @@ IDTVEC(prot)
>  IDTVEC(f00f_redirect)
>         pushl   $T_PAGEFLT
>         INTRENTRY(f00f_redirect)
> +       sti
>         testb   $PGEX_U,TF_ERR(%esp)
>         jnz     calltrap
>         movl    %cr2,%eax
> @@ -1050,6 +1053,7 @@ IDTVEC(fpu)
>          */
>         subl    $8,%esp                 /* space for tf_{err,trapno} */
>         INTRENTRY(fpu)
> +       sti
>         pushl   CPL                     # if_ppl in intrframe
>         pushl   %esp                    # push address of intrframe
>         incl    _C_LABEL(uvmexp)+V_TRAP
> @@ -1058,6 +1062,7 @@ IDTVEC(fpu)
>  #ifdef DIAGNOSTIC
>         movl    $0xfc,%esi
>  #endif
> +       cli
>         INTRFASTEXIT
>  #else
>         ZTRAP(T_ARITHTRAP)
Re: hang in i386 pmap_tlb_shootwait
Hi,

I think this is fallout from using interrupt gates now. I did not properly
enable interrupts for dna, fpu and f00f_redirect: Thus npxintr() tries to
get the kernel lock with interrupts disabled. Meanwhile the IPI for tlb
shootdown is pending for delivery. When the sender of the IPI is holding
the kernel lock it will spin in pmap_tlb_shootwait() and we deadlock.

Diff below fixes dna, fpu and f00f_redirect by enabling interrupts.

(dna and fpu leave the kernel directly, thus they have to disable
interrupts again; f00f_redirect goes through calltrap which will enable
interrupts)

Take care,
HJ.

Index: sys/arch/i386//i386/locore.s
===
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.185
diff -u -p -u -p -r1.185 locore.s
--- sys/arch/i386//i386/locore.s 11 Apr 2018 15:44:08 - 1.185
+++ sys/arch/i386//i386/locore.s 9 May 2018 15:47:51 -
@@ -988,6 +988,7 @@ IDTVEC(dna)
        pushl   $0                      # dummy error code
        pushl   $T_DNA
        INTRENTRY(dna)
+       sti
        pushl   CPUVAR(SELF)
        call    *_C_LABEL(npxdna_func)
        addl    $4,%esp
@@ -996,6 +997,7 @@ IDTVEC(dna)
 #ifdef DIAGNOSTIC
        movl    $0xfd,%esi
 #endif
+       cli
        INTRFASTEXIT
 #else
        ZTRAP(T_DNA)
@@ -1015,6 +1017,7 @@ IDTVEC(prot)
 IDTVEC(f00f_redirect)
        pushl   $T_PAGEFLT
        INTRENTRY(f00f_redirect)
+       sti
        testb   $PGEX_U,TF_ERR(%esp)
        jnz     calltrap
        movl    %cr2,%eax
@@ -1050,6 +1053,7 @@ IDTVEC(fpu)
         */
        subl    $8,%esp                 /* space for tf_{err,trapno} */
        INTRENTRY(fpu)
+       sti
        pushl   CPL                     # if_ppl in intrframe
        pushl   %esp                    # push address of intrframe
        incl    _C_LABEL(uvmexp)+V_TRAP
@@ -1058,6 +1062,7 @@ IDTVEC(fpu)
 #ifdef DIAGNOSTIC
        movl    $0xfc,%esi
 #endif
+       cli
        INTRFASTEXIT
 #else
        ZTRAP(T_ARITHTRAP)
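
The hang described in this message can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not kernel code: two CPUs are reduced to a few booleans, and the names handler_makes_progress and deadlock_model_checks are made up for the illustration. CPU 0 holds the kernel lock and waits for its shootdown IPI to be handled; CPU 1 sits in a trap handler and wants the lock.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the two-CPU hang, NOT kernel code.
 * CPU 0 holds the kernel lock and waits for CPU 1 to handle a TLB
 * shootdown IPI.  CPU 1 is in a trap handler waiting for the lock.
 * Returns true if CPU 1 gets the lock within max_steps rounds.
 */
bool
handler_makes_progress(bool handler_enables_interrupts, int max_steps)
{
	bool ipi_pending = true;	/* CPU 0 sent the shootdown IPI */
	bool lock_held_by_cpu0 = true;	/* CPU 0 owns the kernel lock */
	int step;

	for (step = 0; step < max_steps; step++) {
		/* CPU 1: a pending IPI is only delivered with interrupts on. */
		if (ipi_pending && handler_enables_interrupts)
			ipi_pending = false;

		/* CPU 0: stops spinning once the IPI was handled, then unlocks. */
		if (lock_held_by_cpu0 && !ipi_pending)
			lock_held_by_cpu0 = false;

		/* CPU 1: the handler can take the lock as soon as it is free. */
		if (!lock_held_by_cpu0)
			return true;
	}
	return false;			/* both CPUs are still spinning */
}

/* Usage sketch: with interrupts enabled the handler proceeds, otherwise not. */
void
deadlock_model_checks(void)
{
	assert(handler_makes_progress(true, 100));
	assert(!handler_makes_progress(false, 100));
}
```

In this model the only difference between the two outcomes is whether the handler runs with interrupts enabled, which is the role the added sti plays in the diff.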
Re: hang in i386 pmap_tlb_shootwait
On Wed, May 09, 2018 at 11:01:58AM +0200, Alexander Bluhm wrote:
> Hi,
>
> While running my nightly regression tests, I compiled
> /ports/misc/posixtestsuite. It was the first time that I was running
> regress while having some other load on the machine. During
> regress/lib/libc/ieeefp/except the machine hung. It has 2 CPUs.
>

Based on the discussion below, it sounds like the same bug mpi and I noticed
a few weeks ago in Nantes. A cpu gets stuck with interrupts disabled and a
shootdown can't happen because the IPI isn't being received by that CPU.

You might want to apply mpi's changes to see if it spins out waiting for the
lock, and where. The output of show all locks might be useful also.

-ml

> The final output of the test:
>
> ===> ieeefp/except
> cc -O2 -pipe -MD -MP -c /usr/src/regress/lib/libc/ieeefp/except/except.c
> cc -o except except.o
> ./except fltdiv
>
> This kernel was running:
>
> OpenBSD 6.3-current (GENERIC.MP) #592: Mon May 7 10:07:12 MDT 2018
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
>
> I could break into ddb:
>
> Stopped at db_enter+0x4: popl %ebp
> ddb{0}> trace
> db_enter() at db_enter+0x4
> comintr(d577d000) at comintr+0x21e
> intr_handler(f58be8e4,d577c840) at intr_handler+0x30
> Xintr_ioapic3_untramp() at Xintr_ioapic3_untramp+0xd7
> --- interrupt ---
> pmap_tlb_shootwait() at pmap_tlb_shootwait+0x12
> pmap_do_remove_pae(d0d33ce0,f55f2000,f55f3000,0) at pmap_do_remove_pae+0x2ac
> pmap_remove(d0d33ce0,f55f2000,f55f3000) at pmap_remove+0x18
> uvm_unmap_kill_entry(d0d2d2b4,d4c810dc) at uvm_unmap_kill_entry+0xde
> uvm_unmap_remove(d0d2d2b4,f55f2000,f55f3000,f58bea00,0,1) at uvm_unmap_remove+0x194
> sys_kbind(d435dcf0,f58bea80,f58bea78) at sys_kbind+0x295
> syscall() at syscall+0x25e
> --- syscall (number -813868376) ---
> end of kernel
> 0x7d6558e8:
>
> CPU 0 is running clang, CPU 1 is running the except test script.
>
> ddb{0}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT    COMMAND
>  92284  394442  70506      0  7         0x2          except
> *47266  113041  37786     55  7         0x2          cc
>  37786  281652  35994     55  3    0x10008a  pause   sh
>  70506  372899  71391      0  3    0x10008a  pause   make
>  71391  488915  75345      0  3    0x10008a  pause   sh
>  75345  253329  29923      0  3    0x10008a  pause   make
>  29923   89609  68217      0  3    0x10008a  pause   sh
>  68217  294846  81420      0  3    0x10008a  pause   make
>  51311  445816  20823      0  2       0x491          perl
>  81420  149032  81906      0  3    0x10008a  pause   sh
>  81906  389989  44981      0  3    0x10008a  pause   make
>  24237   35914  94782      0  3    0x100082  piperd  gzip
>  94782  375463  44981      0  3    0x100082  piperd  pax
>  44981  114211  25893      0  3        0x82  piperd  perl
>  25893  239558   5387      0  3    0x10008a  pause   ksh
>   5387  100109  39691      0  3        0x92  select  sshd
>  65456  428886  57598      0  3    0x100083  kqread  tail
>  57598  364467  56435      0  3    0x10008b  pause   ksh
>  39040  394741  84200     55  2       0x482          perl
>  84200   57590  22769     55  3    0x10008a  pause   sh
>  22769  388112  71080     55  3    0x10008a  pause   make
>  71080  289240  55503     55  3    0x10008a  pause   sh
>  55503  177103  20823     55  3    0x10008a  pause   make
>  20823  473630  90353      0  3        0x93  wait    perl
>  35994  500360  35455     55  3        0x82  piperd  gmake
>  35455   82895  18413     55  3    0x10008a  pause   make
>  18413    9872   9766     55  3    0x10008a  pause   sh
>   9766   29157  60819     55  3    0x10008a  pause   make
>  60819  198028  51400     55  3    0x10008a  pause   sh
>  51400  455284      1     55  3    0x10008a  pause   make
>  90353  444304  56435      0  3    0x10008b  pause   ksh
>  56435  213296      1      0  2    0x100480          tmux
>  12943  273120  79318      0  3    0x100083  kqread  tmux
>  79318   90427  49332      0  3    0x10008b  pause   ksh
>  49332  480938  39691      0  3        0x92  select  sshd
>  79215  221858      1      0  2    0x100083          getty
>   5182   91398      1      0  3    0x100083  ttyin   getty
>  68061  353121      1      0  3    0x100083  ttyin   getty
>  61973  471346      1      0  3    0x100083  ttyin   getty
>  58677  314567      1      0  3    0x100083  ttyin   getty
>  26310   59684      1      0  3    0x100083  ttyin   getty
>      2  266793      1      0  2    0x100498          cron
>  69017  469788      1     99  3    0x100090  poll    sndiod
>  67250  378711      1    110  3    0x100090  poll    sndiod
>   7419  486904  35256     95  3    0x100092  kqread  smtpd
>  87223
Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)
Thanks - I'm inclined to leave that one. I'm not sure why (possibly
because of the division rather than using the value directly), but that
one feels better to me as-is.

On 2018/05/09 13:45, BRAND Arnaud wrote:
> I do agree with you.
> I based my patch on the code in the ifHighSpeed case block at line 1291 in
> the same file.
> You might want to make it more explicit too.
>
> -----Original Message-----
> From: Stuart Henderson
> Sent: Wednesday, 9 May 2018 15:41
> To: BRAND Arnaud
> Cc: bugs@openbsd.org
> Subject: Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)
>
> On 2018/05/09 12:48, BRAND Arnaud wrote:
> > Hi,
> >
> > I would like to report what looks like a bug in snmpd.
> >
> > When walking the ifTable, my client crashes on 10G interfaces.
> > Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
> > 100 (10Gbps).
> > But ifSpeed is of type GAUGE and maxes out at 2^32-1 (4Gbps-1).
> >
> > My MIB browser states :
> > "An estimate of the interface's current bandwidth in bits per second.
> > For interfaces which do not vary in bandwidth or for those where no
> > accurate estimation can be made, this object should contain the
> > nominal bandwidth. If the bandwidth of the interface is greater than
> > the maximum value reportable by this object then this object should
> > report its maximum value (4,294,967,295) and ifHighSpeed must be used
> > to report the interface's speed. For a sub-layer which has no concept
> > of bandwidth, this object should be zero."
> >
> > So I guess the case block at line in /usr.sbin/snmpd/mib.c should read :
> > case 5:
> >         i = kif->if_baudrate >= 4294967295 ?
> >                 4294967295 : kif->if_baudrate ;
> >         ber = ber_add_integer(ber, i);
> >         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> >         break;
> > instead of
> > case 5:
> >         ber = ber_add_integer(ber, kif->if_baudrate);
> >         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> >         break;
> >
> > Is my assumption correct or have I missed something ?
> >
> > I'm gonna give it a try while a fix perhaps makes its way into the next
> > release or patches.
> >
> > Have a nice day and thanks for your nice work in OpenBSD !
> >
> > Best regards
> > Arnaud
>
> I think that's the right thing to do, but an if() and using a macro instead
> of writing 4294967295 out in full is easier on the eye.
> Any OKs for this?
>
> Index: mib.c
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v
> retrieving revision 1.85
> diff -u -p -r1.85 mib.c
> --- mib.c 18 Dec 2017 05:51:53 - 1.85
> +++ mib.c 9 May 2018 13:38:50 -
> @@ -1109,7 +1109,11 @@ mib_iftable(struct oid *oid, struct ber_
>                         ber = ber_add_integer(ber, kif->if_mtu);
>                         break;
>                 case 5:
> -                       ber = ber_add_integer(ber, kif->if_baudrate);
> +                       if (kif->if_baudrate > UINT32_MAX) {
> +                               /* speed should be obtained from ifHighSpeed instead */
> +                               ber = ber_add_integer(ber, UINT32_MAX);
> +                       } else
> +                               ber = ber_add_integer(ber, kif->if_baudrate);
>                         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
>                         break;
>                 case 6:
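
For context on the division mentioned in this reply: RFC 2863 defines ifHighSpeed in units of 1,000,000 bits per second, where a reported value n stands for a speed between n-500,000 and n+499,999 bits per second. The following is a minimal sketch of that conversion, assuming the rate is available as a 64-bit bits-per-second value. The helper names are hypothetical, and this is neither the code in mib.c nor a proposal to change it.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_MBIT	1000000ULL

/*
 * Convert a bits-per-second rate to ifHighSpeed units (Mbit/s),
 * rounded to nearest and limited to the Gauge32 range.
 * Hypothetical helper, not taken from snmpd.
 */
uint32_t
ifhighspeed_mbit(uint64_t baudrate)
{
	uint64_t mbit = baudrate / BITS_PER_MBIT;

	/* Round to nearest without computing baudrate + 500000 (overflow). */
	if (baudrate % BITS_PER_MBIT >= BITS_PER_MBIT / 2)
		mbit++;
	if (mbit > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)mbit;
}

void
ifhighspeed_checks(void)
{
	assert(ifhighspeed_mbit(10000000000ULL) == 10000);	/* 10G */
	assert(ifhighspeed_mbit(499999) == 0);
	assert(ifhighspeed_mbit(500000) == 1);
	assert(ifhighspeed_mbit(UINT64_MAX) == UINT32_MAX);
}
```

Because the value is divided down before it is stored, a 10G link yields 10000, far below the Gauge32 limit, which may be why the ifHighSpeed block feels less urgent to guard.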
Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)
I do agree with you.
I based my patch on the code in the ifHighSpeed case block at line 1291 in
the same file.
You might want to make it more explicit too.

-----Original Message-----
From: Stuart Henderson
Sent: Wednesday, 9 May 2018 15:41
To: BRAND Arnaud
Cc: bugs@openbsd.org
Subject: Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)

On 2018/05/09 12:48, BRAND Arnaud wrote:
> Hi,
>
> I would like to report what looks like a bug in snmpd.
>
> When walking the ifTable, my client crashes on 10G interfaces.
> Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
> 100 (10Gbps).
> But ifSpeed is of type GAUGE and maxes out at 2^32-1 (4Gbps-1).
>
> My MIB browser states :
> "An estimate of the interface's current bandwidth in bits per second.
> For interfaces which do not vary in bandwidth or for those where no
> accurate estimation can be made, this object should contain the
> nominal bandwidth. If the bandwidth of the interface is greater than
> the maximum value reportable by this object then this object should
> report its maximum value (4,294,967,295) and ifHighSpeed must be used
> to report the interface's speed. For a sub-layer which has no concept
> of bandwidth, this object should be zero."
>
> So I guess the case block at line in /usr.sbin/snmpd/mib.c should read :
> case 5:
>         i = kif->if_baudrate >= 4294967295 ?
>                 4294967295 : kif->if_baudrate ;
>         ber = ber_add_integer(ber, i);
>         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
>         break;
> instead of
> case 5:
>         ber = ber_add_integer(ber, kif->if_baudrate);
>         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
>         break;
>
> Is my assumption correct or have I missed something ?
>
> I'm gonna give it a try while a fix perhaps makes its way into the next
> release or patches.
>
> Have a nice day and thanks for your nice work in OpenBSD !
>
> Best regards
> Arnaud

I think that's the right thing to do, but an if() and using a macro instead
of writing 4294967295 out in full is easier on the eye.
Any OKs for this?

Index: mib.c
===
RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v
retrieving revision 1.85
diff -u -p -r1.85 mib.c
--- mib.c 18 Dec 2017 05:51:53 - 1.85
+++ mib.c 9 May 2018 13:38:50 -
@@ -1109,7 +1109,11 @@ mib_iftable(struct oid *oid, struct ber_
                        ber = ber_add_integer(ber, kif->if_mtu);
                        break;
                case 5:
-                       ber = ber_add_integer(ber, kif->if_baudrate);
+                       if (kif->if_baudrate > UINT32_MAX) {
+                               /* speed should be obtained from ifHighSpeed instead */
+                               ber = ber_add_integer(ber, UINT32_MAX);
+                       } else
+                               ber = ber_add_integer(ber, kif->if_baudrate);
                        ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
                        break;
                case 6:
Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)
On 2018/05/09 12:48, BRAND Arnaud wrote:
> Hi,
>
> I would like to report what looks like a bug in snmpd.
>
> When walking the ifTable, my client crashes on 10G interfaces.
> Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
> 100 (10Gbps).
> But ifSpeed is of type GAUGE and maxes out at 2^32-1 (4Gbps-1).
>
> My MIB browser states :
> "An estimate of the interface's current bandwidth in bits
> per second. For interfaces which do not vary in bandwidth
> or for those where no accurate estimation can be made, this
> object should contain the nominal bandwidth. If the
> bandwidth of the interface is greater than the maximum value
> reportable by this object then this object should report its
> maximum value (4,294,967,295) and ifHighSpeed must be used
> to report the interface's speed. For a sub-layer which has
> no concept of bandwidth, this object should be zero."
>
> So I guess the case block at line in /usr.sbin/snmpd/mib.c should read :
> case 5:
>         i = kif->if_baudrate >= 4294967295 ?
>                 4294967295 : kif->if_baudrate ;
>         ber = ber_add_integer(ber, i);
>         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
>         break;
> instead of
> case 5:
>         ber = ber_add_integer(ber, kif->if_baudrate);
>         ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
>         break;
>
> Is my assumption correct or have I missed something ?
>
> I'm gonna give it a try while a fix perhaps makes its way into the next
> release or patches.
>
> Have a nice day and thanks for your nice work in OpenBSD !
>
> Best regards
> Arnaud

I think that's the right thing to do, but an if() and using a macro instead
of writing 4294967295 out in full is easier on the eye.
Any OKs for this?

Index: mib.c
===
RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v
retrieving revision 1.85
diff -u -p -r1.85 mib.c
--- mib.c 18 Dec 2017 05:51:53 - 1.85
+++ mib.c 9 May 2018 13:38:50 -
@@ -1109,7 +1109,11 @@ mib_iftable(struct oid *oid, struct ber_
                        ber = ber_add_integer(ber, kif->if_mtu);
                        break;
                case 5:
-                       ber = ber_add_integer(ber, kif->if_baudrate);
+                       if (kif->if_baudrate > UINT32_MAX) {
+                               /* speed should be obtained from ifHighSpeed instead */
+                               ber = ber_add_integer(ber, UINT32_MAX);
+                       } else
+                               ber = ber_add_integer(ber, kif->if_baudrate);
                        ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
                        break;
                case 6:
snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)
Hi,

I would like to report what looks like a bug in snmpd.

When walking the ifTable, my client crashes on 10G interfaces.
Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
100 (10Gbps).
But ifSpeed is of type GAUGE and maxes out at 2^32-1 (4Gbps-1).

My MIB browser states :
"An estimate of the interface's current bandwidth in bits per second.
For interfaces which do not vary in bandwidth or for those where no
accurate estimation can be made, this object should contain the
nominal bandwidth. If the bandwidth of the interface is greater than
the maximum value reportable by this object then this object should
report its maximum value (4,294,967,295) and ifHighSpeed must be used
to report the interface's speed. For a sub-layer which has no concept
of bandwidth, this object should be zero."

So I guess the case block at line in /usr.sbin/snmpd/mib.c should read :
case 5:
        i = kif->if_baudrate >= 4294967295 ?
                4294967295 : kif->if_baudrate ;
        ber = ber_add_integer(ber, i);
        ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
        break;
instead of
case 5:
        ber = ber_add_integer(ber, kif->if_baudrate);
        ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
        break;

Is my assumption correct or have I missed something ?

I'm gonna give it a try while a fix perhaps makes its way into the next
release or patches.

Have a nice day and thanks for your nice work in OpenBSD !

Best regards
Arnaud
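
The clamping asked for in this report can be tried outside snmpd with a few lines of C. This is a minimal sketch, assuming the interface rate is held in a 64-bit unsigned integer; the parameter baudrate stands in for kif->if_baudrate, and the helper names ifspeed_gauge32 and ifspeed_checks are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Limit a 64-bit bits-per-second rate to what a Gauge32 can carry.
 * Hypothetical helper, not part of snmpd.
 */
uint32_t
ifspeed_gauge32(uint64_t baudrate)
{
	if (baudrate > UINT32_MAX)
		return UINT32_MAX;	/* real speed belongs in ifHighSpeed */
	return (uint32_t)baudrate;
}

void
ifspeed_checks(void)
{
	assert(ifspeed_gauge32(1000000000ULL) == 1000000000U);	/* 1G fits */
	assert(ifspeed_gauge32(10000000000ULL) == UINT32_MAX);	/* 10G clamps */
	assert(ifspeed_gauge32(UINT32_MAX) == UINT32_MAX);
}
```

Using UINT32_MAX from <stdint.h> avoids spelling out 4294967295 and keeps the comparison in 64-bit arithmetic, so a 10G rate is never truncated before the check.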
Re: ddb(4): p[rint] man page example vs. result.
On 09/05/18(Wed) 12:13, Artturi Alm wrote:
> On Wed, May 09, 2018 at 10:23:41AM +0200, Martin Pieuchot wrote:
> > On 09/05/18(Wed) 07:48, Artturi Alm wrote:
> > > On Tue, May 08, 2018 at 01:44:39AM +0300, Artturi Alm wrote:
> >
> > No bug is irrelevant to fix. But working with you is hard, really
> > hard. You never explain what the problem is. Reading your email is
> > an exercise in frustration because you can do some good work but you
> > fail to communicate.
> >
> > > > (manual "copypaste"):
> > > > nc2k4hp# sysctl ddb.trigger=1
> > > > Stopped at db_enter+0x4: popl %ebp
> > > > ddb{0}> print/x "eax = " $eax "\necx = " $ecx "\n"
> > > > 3
> > > > ddb{0}> c
> > > > ddb.trigger: 0 -> 1
> > > >
> > > > so, for reasons yet unknown to me, p[rint] doesn't seem to work at all
> > > > like described in the man page, tested on i386.
> >
> > What does not work? What does the man page describe? Do you expect us to
> > read the man page, then look at your mail again, then try to understand
> > what is not working?
> >
>
> For example,
>       print/x "eax = " $eax "\necx = " $ecx "\n"
> will print something like this:
>       eax = xx
>       ecx = yy
>
> Now I did install 5.0 into a VM, and there the result for the above example
> would have been just "Ambiguous", and I'm guessing now that this
> has not been working as in the example since import.
> My fix is limited to producing output just like in the example, but
> input requires more, as it needs escapes for everything not a-z,A-Z,0-9.
>
> > > > Should it work? I hope it would.
> >
> > What should work? Why do you hope? Maybe the manpage should be fixed?
> >
>
> Multiple [addr] arguments to p[rint], including support for strings,
> and I hope so because I would find it useful while testing/writing/porting
> drivers. Maybe, I do like "show struct", and have more than just
> the filtering diff for it, but it doesn't really work for the ad hoc
> use cases p[rint] seems so excellent for.
>
> > > Does feel like waste of time to go any further fixing this, if this is
> > > yet another bug too irrelevant for anyone to ack for, so _any_ input
> > > here would be great.
> >
> > Like I said, no bug is irrelevant but if the one finding the bug, you
> > in that case, is not willing to properly explain the problem, then
> > better not send an email at all ;)
>
> Will try in the future.

Thanks for the explanation!

> haven't tested the diff below yet, but compared to previous, it should
> have working /modifierS.

IMHO we should just amend the man page and keep ddb(4) code simple.
Re: ddb(4): p[rint] man page example vs. result.
On Wed, May 09, 2018 at 10:23:41AM +0200, Martin Pieuchot wrote:
> On 09/05/18(Wed) 07:48, Artturi Alm wrote:
> > On Tue, May 08, 2018 at 01:44:39AM +0300, Artturi Alm wrote:
>
> No bug is irrelevant to fix. But working with you is hard, really
> hard. You never explain what the problem is. Reading your email is
> an exercise in frustration because you can do some good work but you
> fail to communicate.
>
> > > (manual "copypaste"):
> > > nc2k4hp# sysctl ddb.trigger=1
> > > Stopped at db_enter+0x4: popl %ebp
> > > ddb{0}> print/x "eax = " $eax "\necx = " $ecx "\n"
> > > 3
> > > ddb{0}> c
> > > ddb.trigger: 0 -> 1
> > >
> > > so, for reasons yet unknown to me, p[rint] doesn't seem to work at all
> > > like described in the man page, tested on i386.
>
> What does not work? What does the man page describe? Do you expect us to
> read the man page, then look at your mail again, then try to understand
> what is not working?
>

For example,
      print/x "eax = " $eax "\necx = " $ecx "\n"
will print something like this:
      eax = xx
      ecx = yy

Now I did install 5.0 into a VM, and there the result for the above example
would have been just "Ambiguous", and I'm guessing now that this
has not been working as in the example since import.
My fix is limited to producing output just like in the example, but
input requires more, as it needs escapes for everything not a-z,A-Z,0-9.

> > > Should it work? I hope it would.
>
> What should work? Why do you hope? Maybe the manpage should be fixed?
>

Multiple [addr] arguments to p[rint], including support for strings,
and I hope so because I would find it useful while testing/writing/porting
drivers. Maybe, I do like "show struct", and have more than just
the filtering diff for it, but it doesn't really work for the ad hoc
use cases p[rint] seems so excellent for.

> > Does feel like waste of time to go any further fixing this, if this is
> > yet another bug too irrelevant for anyone to ack for, so _any_ input
> > here would be great.
>
> Like I said, no bug is irrelevant but if the one finding the bug, you
> in that case, is not willing to properly explain the problem, then
> better not send an email at all ;)

Will try in the future.

haven't tested the diff below yet, but compared to previous, it should
have working /modifierS.

-Artturi

diff --git sys/ddb/db_command.c sys/ddb/db_command.c
index a275023dc58..27cda0ba641 100644
--- sys/ddb/db_command.c
+++ sys/ddb/db_command.c
@@ -612,8 +612,8 @@ struct db_command db_command_table[] = {
        { "machine",    NULL,                   0,              NULL},
 #endif
        { "kill",       db_kill_cmd,            0,              NULL },
-       { "print",      db_print_cmd,           0,              NULL },
-       { "p",          db_print_cmd,           0,              NULL },
+       { "print",      db_print_cmd,           CS_OWN,         NULL },
+       { "p",          db_print_cmd,           CS_OWN,         NULL },
        { "pprint",     db_ctf_pprint_cmd,      CS_OWN,         NULL },
        { "examine",    db_examine_cmd,         CS_SET_DOT,     NULL },
        { "x",          db_examine_cmd,         CS_SET_DOT,     NULL },
diff --git sys/ddb/db_examine.c sys/ddb/db_examine.c
index d8fec8219f1..e8b1912b937 100644
--- sys/ddb/db_examine.c
+++ sys/ddb/db_examine.c
@@ -238,19 +238,68 @@ db_examine(db_addr_t addr, char *fmt, int count)
 /*
  * Print value.
  */
-char   db_print_format = 'x';
+char   db_print_format[TOK_STRING_SIZE] = "x";

 /*ARGSUSED*/
 void
 db_print_cmd(db_expr_t addr, int have_addr, db_expr_t count, char *modif)
 {
        db_expr_t       value;
+       char            tmptok[TOK_STRING_SIZE];
        char            tmpfmt[28];
+       char            *s;
+       int             i, m, t;

-       if (modif[0] != '\0')
-               db_print_format = modif[0];
+       /* check for modifier */
+       t = db_read_token();
+       if (t == tSLASH) {
+               t = db_read_token();
+               if (t != tIDENT) {
+                       db_printf("\nBad modifier\n");
+                       db_flush_lex();
+                       return;
+               }
+               db_strlcpy(db_print_format, db_tok_string,
+                   sizeof(db_print_format));
+
+               t = db_read_token();
+       }
+
+_inp_loop:
+       if (t == tDITTO) {
+               t = db_read_token();
+               db_strlcpy(tmptok, db_tok_string, sizeof(tmptok));
+               t = db_read_token();
+               if (t != tDITTO) {
+                       db_printf("\nBad string, missing \"\n");
+                       db_flush_lex();
+                       return;
+               }
+               s = db_tok_string;
+               for (i = 0; i < TOK_STRING_SIZE && s[i] != '\0'; i++) {
+                       if (i < (TOK_STRING_SIZE - 1) && s[i] == '\\') {
+                               switch (s[++i]) {
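
The diff is cut off where it starts handling backslash escapes inside a quoted string, so which escapes it accepts is not visible here. As a standalone illustration of that kind of decoding, and only as a sketch under assumptions, the function below turns "\n" and "\t" into single characters and copies any other escaped character literally. The name decode_escapes and the chosen escape set are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src to dst, decoding "\n" and "\t"; any other escaped character
 * (such as '\\' or '"') is copied literally.  A lone trailing backslash
 * is kept.  Always NUL-terminates when dstsize > 0 and returns the
 * decoded length.  Hypothetical helper, not the ddb code.
 */
size_t
decode_escapes(char *dst, size_t dstsize, const char *src)
{
	size_t n = 0;

	assert(dst != NULL && src != NULL);
	if (dstsize == 0)
		return 0;
	while (*src != '\0' && n < dstsize - 1) {
		char c = *src++;

		if (c == '\\' && *src != '\0') {
			c = *src++;
			switch (c) {
			case 'n':
				c = '\n';
				break;
			case 't':
				c = '\t';
				break;
			default:	/* keep the escaped character as-is */
				break;
			}
		}
		dst[n++] = c;
	}
	dst[n] = '\0';
	return n;
}

void
decode_escapes_checks(void)
{
	char buf[32];

	/* The string from the man page example: backslash, 'n', "ecx = ". */
	assert(decode_escapes(buf, sizeof(buf), "\\necx = ") == 7);
	assert(strcmp(buf, "\necx = ") == 0);

	/* Output is truncated to fit, and still terminated. */
	assert(decode_escapes(buf, 3, "abcdef") == 2);
	assert(strcmp(buf, "ab") == 0);
}
```

Bounding the loop by the destination size rather than the source length keeps the copy safe even when the input token fills its whole buffer.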
hang in i386 pmap_tlb_shootwait
Hi,

While running my nightly regression tests, I compiled
/ports/misc/posixtestsuite. It was the first time that I was running
regress while having some other load on the machine. During
regress/lib/libc/ieeefp/except the machine hung. It has 2 CPUs.

The final output of the test:

===> ieeefp/except
cc -O2 -pipe -MD -MP -c /usr/src/regress/lib/libc/ieeefp/except/except.c
cc -o except except.o
./except fltdiv

This kernel was running:

OpenBSD 6.3-current (GENERIC.MP) #592: Mon May 7 10:07:12 MDT 2018
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP

I could break into ddb:

Stopped at db_enter+0x4: popl %ebp
ddb{0}> trace
db_enter() at db_enter+0x4
comintr(d577d000) at comintr+0x21e
intr_handler(f58be8e4,d577c840) at intr_handler+0x30
Xintr_ioapic3_untramp() at Xintr_ioapic3_untramp+0xd7
--- interrupt ---
pmap_tlb_shootwait() at pmap_tlb_shootwait+0x12
pmap_do_remove_pae(d0d33ce0,f55f2000,f55f3000,0) at pmap_do_remove_pae+0x2ac
pmap_remove(d0d33ce0,f55f2000,f55f3000) at pmap_remove+0x18
uvm_unmap_kill_entry(d0d2d2b4,d4c810dc) at uvm_unmap_kill_entry+0xde
uvm_unmap_remove(d0d2d2b4,f55f2000,f55f3000,f58bea00,0,1) at uvm_unmap_remove+0x194
sys_kbind(d435dcf0,f58bea80,f58bea78) at sys_kbind+0x295
syscall() at syscall+0x25e
--- syscall (number -813868376) ---
end of kernel
0x7d6558e8:

CPU 0 is running clang, CPU 1 is running the except test script.

ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT    COMMAND
 92284  394442  70506      0  7         0x2          except
*47266  113041  37786     55  7         0x2          cc
 37786  281652  35994     55  3    0x10008a  pause   sh
 70506  372899  71391      0  3    0x10008a  pause   make
 71391  488915  75345      0  3    0x10008a  pause   sh
 75345  253329  29923      0  3    0x10008a  pause   make
 29923   89609  68217      0  3    0x10008a  pause   sh
 68217  294846  81420      0  3    0x10008a  pause   make
 51311  445816  20823      0  2       0x491          perl
 81420  149032  81906      0  3    0x10008a  pause   sh
 81906  389989  44981      0  3    0x10008a  pause   make
 24237   35914  94782      0  3    0x100082  piperd  gzip
 94782  375463  44981      0  3    0x100082  piperd  pax
 44981  114211  25893      0  3        0x82  piperd  perl
 25893  239558   5387      0  3    0x10008a  pause   ksh
  5387  100109  39691      0  3        0x92  select  sshd
 65456  428886  57598      0  3    0x100083  kqread  tail
 57598  364467  56435      0  3    0x10008b  pause   ksh
 39040  394741  84200     55  2       0x482          perl
 84200   57590  22769     55  3    0x10008a  pause   sh
 22769  388112  71080     55  3    0x10008a  pause   make
 71080  289240  55503     55  3    0x10008a  pause   sh
 55503  177103  20823     55  3    0x10008a  pause   make
 20823  473630  90353      0  3        0x93  wait    perl
 35994  500360  35455     55  3        0x82  piperd  gmake
 35455   82895  18413     55  3    0x10008a  pause   make
 18413    9872   9766     55  3    0x10008a  pause   sh
  9766   29157  60819     55  3    0x10008a  pause   make
 60819  198028  51400     55  3    0x10008a  pause   sh
 51400  455284      1     55  3    0x10008a  pause   make
 90353  444304  56435      0  3    0x10008b  pause   ksh
 56435  213296      1      0  2    0x100480          tmux
 12943  273120  79318      0  3    0x100083  kqread  tmux
 79318   90427  49332      0  3    0x10008b  pause   ksh
 49332  480938  39691      0  3        0x92  select  sshd
 79215  221858      1      0  2    0x100083          getty
  5182   91398      1      0  3    0x100083  ttyin   getty
 68061  353121      1      0  3    0x100083  ttyin   getty
 61973  471346      1      0  3    0x100083  ttyin   getty
 58677  314567      1      0  3    0x100083  ttyin   getty
 26310   59684      1      0  3    0x100083  ttyin   getty
     2  266793      1      0  2    0x100498          cron
 69017  469788      1     99  3    0x100090  poll    sndiod
 67250  378711      1    110  3    0x100090  poll    sndiod
  7419  486904  35256     95  3    0x100092  kqread  smtpd
 87223  110989  35256    103  3    0x100092  kqread  smtpd
 22973  257799  35256     95  3    0x100092  kqread  smtpd
 22893  197212  35256     95  3    0x100092  kqread  smtpd
 55776      30  35256     95  3    0x100092  kqread  smtpd
 67856  519997  35256     95  3    0x100092  kqread  smtpd
 35256  194026      1      0  3    0x100080  kqread  smtpd
 39691  482995      1      0  3        0x80  select  sshd
 91848  227431      0      0  2     0x14600          acct
 57929  439430      0      0  3     0x14280  nfsidl  nfsio
 22984  278690      0      0  3     0x14280  nfsidl
Re: ddb(4): p[rint] man page example vs. result.
On 09/05/18(Wed) 07:48, Artturi Alm wrote:
> On Tue, May 08, 2018 at 01:44:39AM +0300, Artturi Alm wrote:

No bug is irrelevant to fix. But working with you is hard, really
hard. You never explain what the problem is. Reading your email is
an exercise in frustration because you can do some good work but you
fail to communicate.

> > (manual "copypaste"):
> > nc2k4hp# sysctl ddb.trigger=1
> > Stopped at db_enter+0x4: popl %ebp
> > ddb{0}> print/x "eax = " $eax "\necx = " $ecx "\n"
> > 3
> > ddb{0}> c
> > ddb.trigger: 0 -> 1
> >
> > so, for reasons yet unknown to me, p[rint] doesn't seem to work at all
> > like described in the man page, tested on i386.

What does not work? What does the man page describe? Do you expect us to
read the man page, then look at your mail again, then try to understand
what is not working?

> > Should it work? I hope it would.

What should work? Why do you hope? Maybe the manpage should be fixed?

> Does feel like waste of time to go any further fixing this, if this is
> yet another bug too irrelevant for anyone to ack for, so _any_ input
> here would be great.

Like I said, no bug is irrelevant but if the one finding the bug, you
in that case, is not willing to properly explain the problem, then
better not send an email at all ;)