Re: hang in i386 pmap_tlb_shootwait

2018-05-09 Thread Mike Larkin
On Wed, May 09, 2018 at 06:21:54PM +0200, Hans-Joerg Hoexer wrote:
> Hi,
> 
> I think this is fallout from using interrupt gates now.  I did not properly
> enable interrupts for dna, fpu and f00f_redirect:  Thus npxintr() tries to
> get the kernel lock with interrupts disabled.  Meanwhile the IPI for tlb
> shootdown is pending for delivery.  When the sender of the IPI is holding
> the kernel lock it will spin in pmap_tlb_shootwait() and we deadlock.
> 
> Diff below fixes dna, fpu and f00f_redirect by enabling interrupts.
> 
> (dna and fpu leave the kernel directly, thus they have to disable
> interrupts again; f00f_redirect goes through calltrap which will enable
> interrupts)
> 
> Take care,
> HJ.
> 

This makes sense, ok mlarkin.

-ml

> Index: sys/arch/i386//i386/locore.s
> ===================================================================
> RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
> retrieving revision 1.185
> diff -u -p -u -p -r1.185 locore.s
> --- sys/arch/i386//i386/locore.s	11 Apr 2018 15:44:08 -0000	1.185
> +++ sys/arch/i386//i386/locore.s	9 May 2018 15:47:51 -0000
> @@ -988,6 +988,7 @@ IDTVEC(dna)
>  	pushl	$0			# dummy error code
>  	pushl	$T_DNA
>  	INTRENTRY(dna)
> +	sti
>  	pushl	CPUVAR(SELF)
>  	call	*_C_LABEL(npxdna_func)
>  	addl	$4,%esp
> @@ -996,6 +997,7 @@ IDTVEC(dna)
>  #ifdef DIAGNOSTIC
>  	movl	$0xfd,%esi
>  #endif
> +	cli
>  	INTRFASTEXIT
>  #else
>  	ZTRAP(T_DNA)
> @@ -1015,6 +1017,7 @@ IDTVEC(prot)
>  IDTVEC(f00f_redirect)
>  	pushl	$T_PAGEFLT
>  	INTRENTRY(f00f_redirect)
> +	sti
>  	testb	$PGEX_U,TF_ERR(%esp)
>  	jnz	calltrap
>  	movl	%cr2,%eax
> @@ -1050,6 +1053,7 @@ IDTVEC(fpu)
>  	 */
>  	subl	$8,%esp			/* space for tf_{err,trapno} */
>  	INTRENTRY(fpu)
> +	sti
>  	pushl	CPL			# if_ppl in intrframe
>  	pushl	%esp			# push address of intrframe
>  	incl	_C_LABEL(uvmexp)+V_TRAP
> @@ -1058,6 +1062,7 @@ IDTVEC(fpu)
>  #ifdef DIAGNOSTIC
>  	movl	$0xfc,%esi
>  #endif
> +	cli
>  	INTRFASTEXIT
>  #else
>  	ZTRAP(T_ARITHTRAP)
> 



Re: hang in i386 pmap_tlb_shootwait

2018-05-09 Thread Alexander Bluhm
On Wed, May 09, 2018 at 06:21:54PM +0200, Hans-Joerg Hoexer wrote:
> Hi,
> 
> I think this is fallout from using interrupt gates now.  I did not properly
> enable interrupts for dna, fpu and f00f_redirect:  Thus npxintr() tries to
> get the kernel lock with interrupts disabled.  Meanwhile the IPI for tlb
> shootdown is pending for delivery.  When the sender of the IPI is holding
> the kernel lock it will spin in pmap_tlb_shootwait() and we deadlock.
> 
> Diff below fixes dna, fpu and f00f_redirect by enabling interrupts.

This fixes my test setup.

bluhm

> (dna and fpu leave the kernel directly, thus they have to disable
> interrupts again; f00f_redirect goes through calltrap which will enable
> interrupts)
> 
> Take care,
> HJ.
> 
> Index: sys/arch/i386//i386/locore.s
> ===================================================================
> RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
> retrieving revision 1.185
> diff -u -p -u -p -r1.185 locore.s
> --- sys/arch/i386//i386/locore.s	11 Apr 2018 15:44:08 -0000	1.185
> +++ sys/arch/i386//i386/locore.s	9 May 2018 15:47:51 -0000
> @@ -988,6 +988,7 @@ IDTVEC(dna)
>  	pushl	$0			# dummy error code
>  	pushl	$T_DNA
>  	INTRENTRY(dna)
> +	sti
>  	pushl	CPUVAR(SELF)
>  	call	*_C_LABEL(npxdna_func)
>  	addl	$4,%esp
> @@ -996,6 +997,7 @@ IDTVEC(dna)
>  #ifdef DIAGNOSTIC
>  	movl	$0xfd,%esi
>  #endif
> +	cli
>  	INTRFASTEXIT
>  #else
>  	ZTRAP(T_DNA)
> @@ -1015,6 +1017,7 @@ IDTVEC(prot)
>  IDTVEC(f00f_redirect)
>  	pushl	$T_PAGEFLT
>  	INTRENTRY(f00f_redirect)
> +	sti
>  	testb	$PGEX_U,TF_ERR(%esp)
>  	jnz	calltrap
>  	movl	%cr2,%eax
> @@ -1050,6 +1053,7 @@ IDTVEC(fpu)
>  	 */
>  	subl	$8,%esp			/* space for tf_{err,trapno} */
>  	INTRENTRY(fpu)
> +	sti
>  	pushl	CPL			# if_ppl in intrframe
>  	pushl	%esp			# push address of intrframe
>  	incl	_C_LABEL(uvmexp)+V_TRAP
> @@ -1058,6 +1062,7 @@ IDTVEC(fpu)
>  #ifdef DIAGNOSTIC
>  	movl	$0xfc,%esi
>  #endif
> +	cli
>  	INTRFASTEXIT
>  #else
>  	ZTRAP(T_ARITHTRAP)



Re: hang in i386 pmap_tlb_shootwait

2018-05-09 Thread Hans-Joerg Hoexer
Hi,

I think this is fallout from using interrupt gates now.  I did not properly
enable interrupts for dna, fpu and f00f_redirect:  Thus npxintr() tries to
get the kernel lock with interrupts disabled.  Meanwhile the IPI for tlb
shootdown is pending for delivery.  When the sender of the IPI is holding
the kernel lock it will spin in pmap_tlb_shootwait() and we deadlock.

Diff below fixes dna, fpu and f00f_redirect by enabling interrupts.

(dna and fpu leave the kernel directly, thus they have to disable
interrupts again; f00f_redirect goes through calltrap which will enable
interrupts)
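
The failure mode described above can be illustrated with a toy model.  This
is only a sketch and not kernel code: the struct and function names are
invented for illustration, and each CPU is reduced to a few booleans.  It
shows why the CPU waiting for the shootdown only makes progress once the
trapping CPU has re-enabled interrupts.

```c
#include <stdbool.h>

/*
 * Toy model of the hang, NOT kernel code.  All names are hypothetical.
 * CPU A holds the kernel lock, has sent a TLB shootdown IPI to CPU B and
 * spins until B acknowledges it.  CPU B sits in a trap handler and spins
 * on the kernel lock.  B can only take the IPI while its interrupts are
 * enabled.
 */
struct model {
	bool lock_held_by_a;	/* A owns the kernel lock */
	bool ipi_pending;	/* shootdown IPI not yet handled by B */
	bool b_intr_enabled;	/* B ran "sti" after trap entry */
	bool b_has_lock;	/* B finally got the kernel lock */
};

/* Advance both CPUs by one step. */
void
model_step(struct model *m)
{
	/* CPU B: with interrupts enabled it takes the pending IPI. */
	if (m->ipi_pending && m->b_intr_enabled)
		m->ipi_pending = false;

	/* CPU A: leaves its wait loop and drops the lock once acked. */
	if (m->lock_held_by_a && !m->ipi_pending)
		m->lock_held_by_a = false;

	/* CPU B: spins on the kernel lock until it is free. */
	if (!m->b_has_lock && !m->lock_held_by_a)
		m->b_has_lock = true;
}

/* Return true if B never obtains the lock within max_steps. */
bool
model_deadlocks(bool b_intr_enabled, int max_steps)
{
	struct model m = { true, true, b_intr_enabled, false };
	int i;

	for (i = 0; i < max_steps && !m.b_has_lock; i++)
		model_step(&m);
	return !m.b_has_lock;
}
```

Calling model_deadlocks(false, 1000) models the state before the diff (no
sti in the handler) and reports a deadlock; model_deadlocks(true, 1000)
models the state after the diff and does not.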

Take care,
HJ.

Index: sys/arch/i386//i386/locore.s
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.185
diff -u -p -u -p -r1.185 locore.s
--- sys/arch/i386//i386/locore.s	11 Apr 2018 15:44:08 -0000	1.185
+++ sys/arch/i386//i386/locore.s	9 May 2018 15:47:51 -0000
@@ -988,6 +988,7 @@ IDTVEC(dna)
 	pushl	$0			# dummy error code
 	pushl	$T_DNA
 	INTRENTRY(dna)
+	sti
 	pushl	CPUVAR(SELF)
 	call	*_C_LABEL(npxdna_func)
 	addl	$4,%esp
@@ -996,6 +997,7 @@ IDTVEC(dna)
 #ifdef DIAGNOSTIC
 	movl	$0xfd,%esi
 #endif
+	cli
 	INTRFASTEXIT
 #else
 	ZTRAP(T_DNA)
@@ -1015,6 +1017,7 @@ IDTVEC(prot)
 IDTVEC(f00f_redirect)
 	pushl	$T_PAGEFLT
 	INTRENTRY(f00f_redirect)
+	sti
 	testb	$PGEX_U,TF_ERR(%esp)
 	jnz	calltrap
 	movl	%cr2,%eax
@@ -1050,6 +1053,7 @@ IDTVEC(fpu)
 	 */
 	subl	$8,%esp			/* space for tf_{err,trapno} */
 	INTRENTRY(fpu)
+	sti
 	pushl	CPL			# if_ppl in intrframe
 	pushl	%esp			# push address of intrframe
 	incl	_C_LABEL(uvmexp)+V_TRAP
@@ -1058,6 +1062,7 @@ IDTVEC(fpu)
 #ifdef DIAGNOSTIC
 	movl	$0xfc,%esi
 #endif
+	cli
 	INTRFASTEXIT
 #else
 	ZTRAP(T_ARITHTRAP)



Re: hang in i386 pmap_tlb_shootwait

2018-05-09 Thread Mike Larkin
On Wed, May 09, 2018 at 11:01:58AM +0200, Alexander Bluhm wrote:
> Hi,
> 
> While running my nightly regression tests, I compiled
> /ports/misc/posixtestsuite.  It was the first time that I was running
> regress while having some other load on the machine.  During
> regress/lib/libc/ieeefp/except the machine hung.  It has 2 CPUs.
> 

Based on the discussion below, it sounds like the same bug mpi and I noticed
a few weeks ago in Nantes. A CPU gets stuck with interrupts disabled and a
shootdown can't complete because that CPU never receives the IPI.

You might want to apply mpi's changes to see if it spins out waiting for the
lock, and where. The output of "show all locks" might also be useful.
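
As a rough illustration of what "spinning out" means, here is a minimal
sketch of a spin loop with an iteration budget.  It is an assumption about
the general technique only, not a copy of mpi's changes: the function name
and the budget parameter are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code: spin until *flag becomes zero.
 * Give up after "budget" failed checks so the caller can report where it
 * got stuck (for example by entering a debugger) instead of hanging
 * silently.  Returns true if the flag cleared, false if we spun out.
 */
bool
spin_until_clear(volatile int *flag, unsigned long budget)
{
	while (*flag != 0) {
		if (budget == 0)
			return false;	/* spun out */
		budget--;
	}
	return true;
}
```

A caller that gets false back knows which wait loop never finished, which
is the information asked for above.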

-ml

> The final output of the test:
> 
> ===> ieeefp/except
> cc -O2 -pipe   -MD -MP  -c /usr/src/regress/lib/libc/ieeefp/except/except.c
> cc   -o except except.o 
> ./except fltdiv
> 
> This kernel was running:
> 
> OpenBSD 6.3-current (GENERIC.MP) #592: Mon May  7 10:07:12 MDT 2018
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> 
> I could break into ddb:
> 
> Stopped at      db_enter+0x4:   popl    %ebp
> ddb{0}> trace
> db_enter() at db_enter+0x4
> comintr(d577d000) at comintr+0x21e
> intr_handler(f58be8e4,d577c840) at intr_handler+0x30
> Xintr_ioapic3_untramp() at Xintr_ioapic3_untramp+0xd7
> --- interrupt ---
> pmap_tlb_shootwait() at pmap_tlb_shootwait+0x12
> pmap_do_remove_pae(d0d33ce0,f55f2000,f55f3000,0) at pmap_do_remove_pae+0x2ac
> pmap_remove(d0d33ce0,f55f2000,f55f3000) at pmap_remove+0x18
> uvm_unmap_kill_entry(d0d2d2b4,d4c810dc) at uvm_unmap_kill_entry+0xde
> uvm_unmap_remove(d0d2d2b4,f55f2000,f55f3000,f58bea00,0,1) at uvm_unmap_remove+0x194
> sys_kbind(d435dcf0,f58bea80,f58bea78) at sys_kbind+0x295
> syscall() at syscall+0x25e
> --- syscall (number -813868376) ---
> end of kernel
> 0x7d6558e8:
> 
> CPU 0 is running clang, CPU 1 is running the except test script.
> 
> ddb{0}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  92284  394442  70506      0  7         0x2                except
> *47266  113041  37786     55  7         0x2                cc
>  37786  281652  35994     55  3    0x10008a  pause         sh
>  70506  372899  71391      0  3    0x10008a  pause         make
>  71391  488915  75345      0  3    0x10008a  pause         sh
>  75345  253329  29923      0  3    0x10008a  pause         make
>  29923   89609  68217      0  3    0x10008a  pause         sh
>  68217  294846  81420      0  3    0x10008a  pause         make
>  51311  445816  20823      0  2       0x491                perl
>  81420  149032  81906      0  3    0x10008a  pause         sh
>  81906  389989  44981      0  3    0x10008a  pause         make
>  24237   35914  94782      0  3    0x100082  piperd        gzip
>  94782  375463  44981      0  3    0x100082  piperd        pax
>  44981  114211  25893      0  3        0x82  piperd        perl
>  25893  239558   5387      0  3    0x10008a  pause         ksh
>   5387  100109  39691      0  3        0x92  select        sshd
>  65456  428886  57598      0  3    0x100083  kqread        tail
>  57598  364467  56435      0  3    0x10008b  pause         ksh
>  39040  394741  84200     55  2       0x482                perl
>  84200   57590  22769     55  3    0x10008a  pause         sh
>  22769  388112  71080     55  3    0x10008a  pause         make
>  71080  289240  55503     55  3    0x10008a  pause         sh
>  55503  177103  20823     55  3    0x10008a  pause         make
>  20823  473630  90353      0  3        0x93  wait          perl
>  35994  500360  35455     55  3        0x82  piperd        gmake
>  35455   82895  18413     55  3    0x10008a  pause         make
>  18413    9872   9766     55  3    0x10008a  pause         sh
>   9766   29157  60819     55  3    0x10008a  pause         make
>  60819  198028  51400     55  3    0x10008a  pause         sh
>  51400  455284      1     55  3    0x10008a  pause         make
>  90353  444304  56435      0  3    0x10008b  pause         ksh
>  56435  213296      1      0  2    0x100480                tmux
>  12943  273120  79318      0  3    0x100083  kqread        tmux
>  79318   90427  49332      0  3    0x10008b  pause         ksh
>  49332  480938  39691      0  3        0x92  select        sshd
>  79215  221858      1      0  2    0x100083                getty
>   5182   91398      1      0  3    0x100083  ttyin         getty
>  68061  353121      1      0  3    0x100083  ttyin         getty
>  61973  471346      1      0  3    0x100083  ttyin         getty
>  58677  314567      1      0  3    0x100083  ttyin         getty
>  26310   59684      1      0  3    0x100083  ttyin         getty
>      2  266793      1      0  2    0x100498                cron
>  69017  469788      1     99  3    0x100090  poll          sndiod
>  67250  378711      1    110  3    0x100090  poll          sndiod
>   7419  486904  35256     95  3    0x100092  kqread        smtpd
>  87223  

Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)

2018-05-09 Thread Stuart Henderson
Thanks - I'm inclined to leave that one, I'm not sure why (possibly
because of the division rather than using the value directly) but that
one feels better to me as-is.
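
For comparison, a sketch of how an ifHighSpeed-style value can be derived by
division.  IF-MIB defines ifHighSpeed in units of 1,000,000 bits per second;
everything else here is an assumption for illustration: the helper name is
hypothetical, the division truncates, and this is not the code at line 1291
of mib.c, which may round or clamp differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from snmpd: convert a 64-bit rate in
 * bits per second to whole Mbit/s, saturating at the Gauge32 maximum.
 */
uint32_t
ifhighspeed_mbps(uint64_t baudrate)
{
	uint64_t mbps = baudrate / 1000000ULL;

	return mbps > UINT32_MAX ? UINT32_MAX : (uint32_t)mbps;
}
```

With this helper a 10G interface yields 10000, far below the Gauge32 limit,
which may be why the division makes an explicit clamp feel less necessary
there.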


On 2018/05/09 13:45, BRAND Arnaud wrote:
> I do agree with you.
> I based my patch on the code in the ifHighSpeed case block at line 1291 of
> the same file.
> You might want to make that one more explicit too.
> 
> -----Original Message-----
> From: Stuart Henderson
> Sent: Wednesday, 9 May 2018 15:41
> To: BRAND Arnaud
> Cc: bugs@openbsd.org
> Subject: Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)
> 
> On 2018/05/09 12:48, BRAND Arnaud wrote:
> > Hi,
> > 
> > I would like to report what looks like a bug in snmpd.
> > 
> > When walking the ifTable, my client crashes when it reaches 10G
> > interfaces.
> > Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
> > 10000000000 (10Gbps).
> > But ifSpeed is of type GAUGE and maxes out at 2^32-1 (about 4.29Gbps).
> > 
> > My MIB browser  states :
> > "An estimate of the interface's current bandwidth in bits per second.  
> > For interfaces which do not vary in bandwidth or for those where no 
> > accurate estimation can be made, this object should contain the 
> > nominal bandwidth.  If the bandwidth of the interface is greater than 
> > the maximum value reportable by this object then this object should 
> > report its maximum value (4,294,967,295) and ifHighSpeed must be used 
> > to report the interace's speed.  For a sub-layer which has no concept 
> > of bandwidth, this object should be zero."
> > 
> > So I guess the case block at line  in /usr.sbin/snmpd/mib.c should read:
> > 
> > 	case 5:
> > 		i = kif->if_baudrate >= 4294967295 ?
> > 		    4294967295 : kif->if_baudrate;
> > 		ber = ber_add_integer(ber, i);
> > 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> > 		break;
> > 
> > instead of
> > 
> > 	case 5:
> > 		ber = ber_add_integer(ber, kif->if_baudrate);
> > 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> > 		break;
> > 
> > Is my assumption correct or have I missed something ?
> > 
> > I'm gonna give it a try while a fix perhaps makes its way in the next 
> > release or patches.
> > 
> > Have a nice day and thanks for your nice work in OpenBSD !
> > 
> > Best regards
> > Arnaud
> 
> I think that's the right thing to do, but an if() and using a macro instead 
> of writing 4294967295 out in full is easier on the eye.
> Any OKs for this?
> 
> 
> Index: mib.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v
> retrieving revision 1.85
> diff -u -p -r1.85 mib.c
> --- mib.c	18 Dec 2017 05:51:53 -0000	1.85
> +++ mib.c	9 May 2018 13:38:50 -0000
> @@ -1109,7 +1109,11 @@ mib_iftable(struct oid *oid, struct ber_
>  		ber = ber_add_integer(ber, kif->if_mtu);
>  		break;
>  	case 5:
> -		ber = ber_add_integer(ber, kif->if_baudrate);
> +		if (kif->if_baudrate > UINT32_MAX) {
> +			/* speed should be obtained from ifHighSpeed instead */
> +			ber = ber_add_integer(ber, UINT32_MAX);
> +		} else
> +			ber = ber_add_integer(ber, kif->if_baudrate);
>  		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
>  		break;
>  	case 6:
> 
> 



Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)

2018-05-09 Thread BRAND Arnaud
I do agree with you.
I based my patch on the code in the ifHighSpeed case block at line 1291 of the
same file.
You might want to make that one more explicit too.

-----Original Message-----
From: Stuart Henderson
Sent: Wednesday, 9 May 2018 15:41
To: BRAND Arnaud
Cc: bugs@openbsd.org
Subject: Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)

On 2018/05/09 12:48, BRAND Arnaud wrote:
> Hi,
> 
> I would like to report what looks like a bug in snmpd.
> 
> When walking the ifTable, my client crashes when it reaches 10G
> interfaces.
> Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
> 10000000000 (10Gbps).
> But ifSpeed is of type GAUGE and maxes out at 2^32-1 (about 4.29Gbps).
> 
> My MIB browser  states :
> "An estimate of the interface's current bandwidth in bits per second.  
> For interfaces which do not vary in bandwidth or for those where no 
> accurate estimation can be made, this object should contain the 
> nominal bandwidth.  If the bandwidth of the interface is greater than 
> the maximum value reportable by this object then this object should 
> report its maximum value (4,294,967,295) and ifHighSpeed must be used 
> to report the interace's speed.  For a sub-layer which has no concept 
> of bandwidth, this object should be zero."
> 
> So I guess the case block at line  in /usr.sbin/snmpd/mib.c should read:
> 
> 	case 5:
> 		i = kif->if_baudrate >= 4294967295 ?
> 		    4294967295 : kif->if_baudrate;
> 		ber = ber_add_integer(ber, i);
> 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> 		break;
> 
> instead of
> 
> 	case 5:
> 		ber = ber_add_integer(ber, kif->if_baudrate);
> 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> 		break;
> 
> Is my assumption correct or have I missed something ?
> 
> I'm gonna give it a try while a fix perhaps makes its way in the next release 
> or patches.
> 
> Have a nice day and thanks for your nice work in OpenBSD !
> 
> Best regards
> Arnaud

I think that's the right thing to do, but an if() and using a macro instead of 
writing 4294967295 out in full is easier on the eye.
Any OKs for this?


Index: mib.c
===================================================================
RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v
retrieving revision 1.85
diff -u -p -r1.85 mib.c
--- mib.c	18 Dec 2017 05:51:53 -0000	1.85
+++ mib.c	9 May 2018 13:38:50 -0000
@@ -1109,7 +1109,11 @@ mib_iftable(struct oid *oid, struct ber_
 		ber = ber_add_integer(ber, kif->if_mtu);
 		break;
 	case 5:
-		ber = ber_add_integer(ber, kif->if_baudrate);
+		if (kif->if_baudrate > UINT32_MAX) {
+			/* speed should be obtained from ifHighSpeed instead */
+			ber = ber_add_integer(ber, UINT32_MAX);
+		} else
+			ber = ber_add_integer(ber, kif->if_baudrate);
 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
 		break;
 	case 6:




Re: snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)

2018-05-09 Thread Stuart Henderson
On 2018/05/09 12:48, BRAND Arnaud wrote:
> Hi,
> 
> I would like to report what looks like a bug in snmpd.
> 
> When walking the ifTable, my client crashes when it reaches 10G
> interfaces.
> Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
> 10000000000 (10Gbps).
> But ifSpeed is of type GAUGE and maxes out at 2^32-1 (about 4.29Gbps).
> 
> My MIB browser  states :
> "An estimate of the interface's current bandwidth in bits
> per second.  For interfaces which do not vary in bandwidth
> or for those where no accurate estimation can be made, this
> object should contain the nominal bandwidth.  If the
> bandwidth of the interface is greater than the maximum value
> reportable by this object then this object should report its
> maximum value (4,294,967,295) and ifHighSpeed must be used
> to report the interace's speed.  For a sub-layer which has
> no concept of bandwidth, this object should be zero."
> 
> So I guess the case block at line  in /usr.sbin/snmpd/mib.c should read:
> 
> 	case 5:
> 		i = kif->if_baudrate >= 4294967295 ?
> 		    4294967295 : kif->if_baudrate;
> 		ber = ber_add_integer(ber, i);
> 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> 		break;
> 
> instead of
> 
> 	case 5:
> 		ber = ber_add_integer(ber, kif->if_baudrate);
> 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
> 		break;
> 
> Is my assumption correct or have I missed something ?
> 
> I'm gonna give it a try while a fix perhaps makes its way in the next release 
> or patches.
> 
> Have a nice day and thanks for your nice work in OpenBSD !
> 
> Best regards
> Arnaud

I think that's the right thing to do, but an if() and using a macro
instead of writing 4294967295 out in full is easier on the eye.
Any OKs for this?


Index: mib.c
===================================================================
RCS file: /cvs/src/usr.sbin/snmpd/mib.c,v
retrieving revision 1.85
diff -u -p -r1.85 mib.c
--- mib.c	18 Dec 2017 05:51:53 -0000	1.85
+++ mib.c	9 May 2018 13:38:50 -0000
@@ -1109,7 +1109,11 @@ mib_iftable(struct oid *oid, struct ber_
 		ber = ber_add_integer(ber, kif->if_mtu);
 		break;
 	case 5:
-		ber = ber_add_integer(ber, kif->if_baudrate);
+		if (kif->if_baudrate > UINT32_MAX) {
+			/* speed should be obtained from ifHighSpeed instead */
+			ber = ber_add_integer(ber, UINT32_MAX);
+		} else
+			ber = ber_add_integer(ber, kif->if_baudrate);
 		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
 		break;
 	case 6:




snmpd ifSpeed reporting seems wrong for speeds over 4G (ex: 10G)

2018-05-09 Thread BRAND Arnaud
Hi,

I would like to report what looks like a bug in snmpd.

When walking the ifTable, my client crashes when it reaches 10G interfaces.
Tcpdump shows that ifSpeed (1.3.6.1.2.1.2.2.1.5) is sending the value
10000000000 (10Gbps).
But ifSpeed is of type GAUGE and maxes out at 2^32-1 (about 4.29Gbps).

My MIB browser  states :
"An estimate of the interface's current bandwidth in bits
per second.  For interfaces which do not vary in bandwidth
or for those where no accurate estimation can be made, this
object should contain the nominal bandwidth.  If the
bandwidth of the interface is greater than the maximum value
reportable by this object then this object should report its
maximum value (4,294,967,295) and ifHighSpeed must be used
to report the interace's speed.  For a sub-layer which has
no concept of bandwidth, this object should be zero."
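
The rule quoted above amounts to a saturating conversion from a 64-bit rate
to a 32-bit gauge.  A minimal sketch, assuming the interface rate is held in
a 64-bit unsigned integer; the helper name is hypothetical and is not part
of snmpd:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not snmpd code: report the rate unchanged when it
 * fits in a Gauge32, otherwise report the maximum value (4294967295) as
 * the ifSpeed description requires.
 */
uint32_t
ifspeed_gauge32(uint64_t baudrate)
{
	if (baudrate > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)baudrate;
}
```

A 1G interface (1000000000) passes through unchanged, while a 10G interface
(10000000000) is reported as 4294967295.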

So I guess the case block at line  in /usr.sbin/snmpd/mib.c should read:

	case 5:
		i = kif->if_baudrate >= 4294967295 ?
		    4294967295 : kif->if_baudrate;
		ber = ber_add_integer(ber, i);
		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
		break;

instead of

	case 5:
		ber = ber_add_integer(ber, kif->if_baudrate);
		ber_set_header(ber, BER_CLASS_APPLICATION, SNMP_T_GAUGE32);
		break;

Is my assumption correct or have I missed something ?

I'm gonna give it a try while a fix perhaps makes its way in the next release 
or patches.

Have a nice day and thanks for your nice work in OpenBSD !

Best regards
Arnaud


Re: ddb(4): p[rint] man page example vs. result.

2018-05-09 Thread Martin Pieuchot
On 09/05/18(Wed) 12:13, Artturi Alm wrote:
> On Wed, May 09, 2018 at 10:23:41AM +0200, Martin Pieuchot wrote:
> > On 09/05/18(Wed) 07:48, Artturi Alm wrote:
> > > On Tue, May 08, 2018 at 01:44:39AM +0300, Artturi Alm wrote:
> > 
> > 
> > No bug is irrelevant to fix.  But working with you is hard, really
> > hard.  You never explain what the problem is.  Reading your email is
> > an exercise in frustration because you can do some good work but you
> > fail to communicate.
> > 
> > > > (manual "copypaste"):
> > > > nc2k4hp# sysctl ddb.trigger=1
> > > > Stopped at      db_enter+0x4:   popl    %ebp
> > > > ddb{0}> print/x "eax = " $eax "\necx = " $ecx "\n"
> > > > 3
> > > > ddb{0}> c
> > > > ddb.trigger: 0 -> 1
> > > > 
> > > > so, for reasons yet unknown to me, p[rint] doesn't seem to work at all
> > > > like described in the man page, tested on i386.
> > 
> > What does not work?  What does the man page describe?  Do you expect us to
> > read the man page, then look at your mail again, then try to understand
> > what is not working? 
> > 
> 
> For example,
> 
>   print/x "eax = " $eax "\necx = " $ecx "\n"
> 
> will print something like this:
> 
>   eax = xx
>   ecx = yy
> 
> Now I did install 5.0 into a VM, and there the result for the above example
> would have been just "Ambiguous", and I'm guessing now that this
> has not been working as in the example since import.
> My fix is limited to producing output just like in the example, but
> input requires more, as it needs escapes for everything not a-z,A-Z,0-9.
> 
> > > > Should it work? I hope it would.
> > 
> > What should work?  Why do you hope?  Maybe the manpage should be fixed?
> > 
> 
> Multiple [addr] arguments to p[rint], including support for strings,
> and I hope so because I would find it useful while testing/writing/porting
> drivers. Maybe, I do like "show struct", and have more than just
> the filtering diff for it, but it doesn't really work for the ad hoc
> use cases p[rint] seems so excellent for.
> 
> > > Does feel like waste of time to go any further fixing this, if this is
> > > yet another bug too irrelevant for anyone to ack for, so _any_ input
> > > here would be great.
> > 
> > Like I said, no bug is irrelevant, but if the one finding the bug, you
> > in this case, is not willing to properly explain the problem, then
> > better not send an email at all ;)
> 
> Will try in the future.

Thanks for the explanation!

> haven't tested the diff below yet, but compared to previous, it should
> have working /modifierS.

IMHO we should just amend the man page and keep ddb(4) code simple. 



Re: ddb(4): p[rint] man page example vs. result.

2018-05-09 Thread Artturi Alm
On Wed, May 09, 2018 at 10:23:41AM +0200, Martin Pieuchot wrote:
> On 09/05/18(Wed) 07:48, Artturi Alm wrote:
> > On Tue, May 08, 2018 at 01:44:39AM +0300, Artturi Alm wrote:
> 
> 
> No bug is irrelevant to fix.  But working with you is hard, really
> hard.  You never explain what the problem is.  Reading your email is
> an exercise in frustration because you can do some good work but you
> fail to communicate.
> 
> > > (manual "copypaste"):
> > > nc2k4hp# sysctl ddb.trigger=1
> > > Stopped at      db_enter+0x4:   popl    %ebp
> > > ddb{0}> print/x "eax = " $eax "\necx = " $ecx "\n"
> > >   3
> > > ddb{0}> c
> > > ddb.trigger: 0 -> 1
> > > 
> > > so, for reasons yet unknown to me, p[rint] doesn't seem to work at all
> > > like described in the man page, tested on i386.
> 
> What does not work?  What does the man page describe?  Do you expect us to
> read the man page, then look at your mail again, then try to understand
> what is not working? 
> 

For example,

  print/x "eax = " $eax "\necx = " $ecx "\n"

will print something like this:

  eax = xx
  ecx = yy

Now I did install 5.0 into a VM, and there the result for the above example
would have been just "Ambiguous", and I'm guessing now that this
has not been working as in the example since import.
My fix is limited to producing output just like in the example, but
input requires more, as it needs escapes for everything not a-z,A-Z,0-9.
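
To make the escape handling concrete, here is a minimal sketch of turning
the two-character sequences in a string like "\necx = " into real
characters.  It is an illustration only: the function name and the set of
supported escapes (\n, \t, \\) are assumptions, and it does not use or
reproduce the ddb lexer.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not ddb code: copy src to dst, turning the
 * two-character sequences \n, \t and \\ into single characters.
 * Unknown escapes are copied through unchanged.  dst is always
 * NUL-terminated when dstsize > 0.  Returns the number of characters
 * written, not counting the terminating NUL.
 */
size_t
unescape(char *dst, size_t dstsize, const char *src)
{
	size_t n = 0;

	if (dstsize == 0)
		return 0;
	while (*src != '\0' && n + 1 < dstsize) {
		char c = *src++;

		if (c == '\\' && *src != '\0') {
			switch (*src) {
			case 'n':
				c = '\n';
				src++;
				break;
			case 't':
				c = '\t';
				src++;
				break;
			case '\\':
				src++;	/* c is already a backslash */
				break;
			default:
				/* unknown escape: keep the backslash as is */
				break;
			}
		}
		dst[n++] = c;
	}
	dst[n] = '\0';
	return n;
}
```

With this sketch the eight input characters of "\necx = " (backslash, n,
e, c, x, space, equals sign, space) become seven output characters starting
with a newline.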

> > > Should it work? I hope it would.
> 
> What should work?  Why do you hope?  Maybe the manpage should be fixed?
> 

Multiple [addr] arguments to p[rint], including support for strings,
and I hope so because I would find it useful while testing/writing/porting
drivers. Maybe, I do like "show struct", and have more than just
the filtering diff for it, but it doesn't really work for the ad hoc
use cases p[rint] seems so excellent for.

> > Does feel like waste of time to go any further fixing this, if this is
> > yet another bug too irrelevant for anyone to ack for, so _any_ input
> > here would be great.
> 
> Like I said, no bug is irrelevant, but if the one finding the bug, you
> in this case, is not willing to properly explain the problem, then
> better not send an email at all ;)

Will try in the future.


haven't tested the diff below yet, but compared to previous, it should
have working /modifierS.

-Artturi

diff --git sys/ddb/db_command.c sys/ddb/db_command.c
index a275023dc58..27cda0ba641 100644
--- sys/ddb/db_command.c
+++ sys/ddb/db_command.c
@@ -612,8 +612,8 @@ struct db_command db_command_table[] = {
 	{ "machine",	NULL,	0,	NULL},
 #endif
 	{ "kill",	db_kill_cmd,	0,	NULL },
-	{ "print",	db_print_cmd,	0,	NULL },
-	{ "p",		db_print_cmd,	0,	NULL },
+	{ "print",	db_print_cmd,	CS_OWN,	NULL },
+	{ "p",		db_print_cmd,	CS_OWN,	NULL },
 	{ "pprint",	db_ctf_pprint_cmd,	CS_OWN,	NULL },
 	{ "examine",	db_examine_cmd,	CS_SET_DOT, NULL },
 	{ "x",		db_examine_cmd,	CS_SET_DOT, NULL },
diff --git sys/ddb/db_examine.c sys/ddb/db_examine.c
index d8fec8219f1..e8b1912b937 100644
--- sys/ddb/db_examine.c
+++ sys/ddb/db_examine.c
@@ -238,19 +238,68 @@ db_examine(db_addr_t addr, char *fmt, int count)
 /*
  * Print value.
  */
-char	db_print_format = 'x';
+char	db_print_format[TOK_STRING_SIZE] = "x";
 
 /*ARGSUSED*/
 void
 db_print_cmd(db_expr_t addr, int have_addr, db_expr_t count, char *modif)
 {
 	db_expr_t	value;
+	char		tmptok[TOK_STRING_SIZE];
 	char		tmpfmt[28];
+	char		*s;
+	int		i, m, t;
 
-	if (modif[0] != '\0')
-		db_print_format = modif[0];
+	/* check for modifier */
+	t = db_read_token();
+	if (t == tSLASH) {
+		t = db_read_token();
+		if (t != tIDENT) {
+			db_printf("\nBad modifier\n");
+			db_flush_lex();
+			return;
+		}
+		db_strlcpy(db_print_format, db_tok_string,
+		    sizeof(db_print_format));
+
+		t = db_read_token();
+	}
+
+_inp_loop:
+	if (t == tDITTO) {
+		t = db_read_token();
+		db_strlcpy(tmptok, db_tok_string, sizeof(tmptok));
+		t = db_read_token();
+		if (t != tDITTO) {
+			db_printf("\nBad string, missing \"\n");
+			db_flush_lex();
+			return;
+		}
+		s = db_tok_string;
+		for (i = 0; i < TOK_STRING_SIZE && s[i] != '\0'; i++) {
+			if (i < (TOK_STRING_SIZE - 1) && s[i] == '\\') {
+				switch (s[++i]) {
+   

hang in i386 pmap_tlb_shootwait

2018-05-09 Thread Alexander Bluhm
Hi,

While running my nightly regression tests, I compiled
/ports/misc/posixtestsuite.  It was the first time that I was running
regress while having some other load on the machine.  During
regress/lib/libc/ieeefp/except the machine hung.  It has 2 CPUs.

The final output of the test:

===> ieeefp/except
cc -O2 -pipe   -MD -MP  -c /usr/src/regress/lib/libc/ieeefp/except/except.c
cc   -o except except.o 
./except fltdiv

This kernel was running:

OpenBSD 6.3-current (GENERIC.MP) #592: Mon May  7 10:07:12 MDT 2018
dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP

I could break into ddb:

Stopped at      db_enter+0x4:   popl    %ebp
ddb{0}> trace
db_enter() at db_enter+0x4
comintr(d577d000) at comintr+0x21e
intr_handler(f58be8e4,d577c840) at intr_handler+0x30
Xintr_ioapic3_untramp() at Xintr_ioapic3_untramp+0xd7
--- interrupt ---
pmap_tlb_shootwait() at pmap_tlb_shootwait+0x12
pmap_do_remove_pae(d0d33ce0,f55f2000,f55f3000,0) at pmap_do_remove_pae+0x2ac
pmap_remove(d0d33ce0,f55f2000,f55f3000) at pmap_remove+0x18
uvm_unmap_kill_entry(d0d2d2b4,d4c810dc) at uvm_unmap_kill_entry+0xde
uvm_unmap_remove(d0d2d2b4,f55f2000,f55f3000,f58bea00,0,1) at uvm_unmap_remove+0x194
sys_kbind(d435dcf0,f58bea80,f58bea78) at sys_kbind+0x295
syscall() at syscall+0x25e
--- syscall (number -813868376) ---
end of kernel
0x7d6558e8:

CPU 0 is running clang, CPU 1 is running the except test script.

ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 92284  394442  70506      0  7         0x2                except
*47266  113041  37786     55  7         0x2                cc
 37786  281652  35994     55  3    0x10008a  pause         sh
 70506  372899  71391      0  3    0x10008a  pause         make
 71391  488915  75345      0  3    0x10008a  pause         sh
 75345  253329  29923      0  3    0x10008a  pause         make
 29923   89609  68217      0  3    0x10008a  pause         sh
 68217  294846  81420      0  3    0x10008a  pause         make
 51311  445816  20823      0  2       0x491                perl
 81420  149032  81906      0  3    0x10008a  pause         sh
 81906  389989  44981      0  3    0x10008a  pause         make
 24237   35914  94782      0  3    0x100082  piperd        gzip
 94782  375463  44981      0  3    0x100082  piperd        pax
 44981  114211  25893      0  3        0x82  piperd        perl
 25893  239558   5387      0  3    0x10008a  pause         ksh
  5387  100109  39691      0  3        0x92  select        sshd
 65456  428886  57598      0  3    0x100083  kqread        tail
 57598  364467  56435      0  3    0x10008b  pause         ksh
 39040  394741  84200     55  2       0x482                perl
 84200   57590  22769     55  3    0x10008a  pause         sh
 22769  388112  71080     55  3    0x10008a  pause         make
 71080  289240  55503     55  3    0x10008a  pause         sh
 55503  177103  20823     55  3    0x10008a  pause         make
 20823  473630  90353      0  3        0x93  wait          perl
 35994  500360  35455     55  3        0x82  piperd        gmake
 35455   82895  18413     55  3    0x10008a  pause         make
 18413    9872   9766     55  3    0x10008a  pause         sh
  9766   29157  60819     55  3    0x10008a  pause         make
 60819  198028  51400     55  3    0x10008a  pause         sh
 51400  455284      1     55  3    0x10008a  pause         make
 90353  444304  56435      0  3    0x10008b  pause         ksh
 56435  213296      1      0  2    0x100480                tmux
 12943  273120  79318      0  3    0x100083  kqread        tmux
 79318   90427  49332      0  3    0x10008b  pause         ksh
 49332  480938  39691      0  3        0x92  select        sshd
 79215  221858      1      0  2    0x100083                getty
  5182   91398      1      0  3    0x100083  ttyin         getty
 68061  353121      1      0  3    0x100083  ttyin         getty
 61973  471346      1      0  3    0x100083  ttyin         getty
 58677  314567      1      0  3    0x100083  ttyin         getty
 26310   59684      1      0  3    0x100083  ttyin         getty
     2  266793      1      0  2    0x100498                cron
 69017  469788      1     99  3    0x100090  poll          sndiod
 67250  378711      1    110  3    0x100090  poll          sndiod
  7419  486904  35256     95  3    0x100092  kqread        smtpd
 87223  110989  35256    103  3    0x100092  kqread        smtpd
 22973  257799  35256     95  3    0x100092  kqread        smtpd
 22893  197212  35256     95  3    0x100092  kqread        smtpd
 55776      30  35256     95  3    0x100092  kqread        smtpd
 67856  519997  35256     95  3    0x100092  kqread        smtpd
 35256  194026      1      0  3    0x100080  kqread        smtpd
 39691  482995      1      0  3        0x80  select        sshd
 91848  227431      0      0  2     0x14600                acct
 57929  439430      0      0  3     0x14280  nfsidl        nfsio
 22984  278690      0      0  3     0x14280  nfsidl

Re: ddb(4): p[rint] man page example vs. result.

2018-05-09 Thread Martin Pieuchot
On 09/05/18(Wed) 07:48, Artturi Alm wrote:
> On Tue, May 08, 2018 at 01:44:39AM +0300, Artturi Alm wrote:


No bug is irrelevant to fix.  But working with you is hard, really
hard.  You never explain what the problem is.  Reading your email is
an exercise in frustration because you can do some good work but you
fail to communicate.

> > (manual "copypaste"):
> > nc2k4hp# sysctl ddb.trigger=1
> > Stopped at      db_enter+0x4:   popl    %ebp
> > ddb{0}> print/x "eax = " $eax "\necx = " $ecx "\n"
> > 3
> > ddb{0}> c
> > ddb.trigger: 0 -> 1
> > 
> > so, for reasons yet unknown to me, p[rint] doesn't seem to work at all
> > like described in the man page, tested on i386.

What does not work?  What does the man page describe?  Do you expect us to
read the man page, then look at your mail again, then try to understand
what is not working? 

> > Should it work? I hope it would.

What should work?  Why do you hope?  Maybe the manpage should be fixed?

> Does feel like waste of time to go any further fixing this, if this is
> yet another bug too irrelevant for anyone to ack for, so _any_ input
> here would be great.

Like I said, no bug is irrelevant, but if the one finding the bug, you
in this case, is not willing to properly explain the problem, then
better not send an email at all ;)