Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg drivers (?))

2018-06-03 Thread Masanobu SAITOH

On 2018/06/03 18:20, 6b...@6bone.informatik.uni-leipzig.de wrote:

Hello,

I have applied

 http://www.netbsd.org/~msaitoh/ixgbe-eitr-20180522-0.dif
and
 http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif

to netbsd-8 RC1. With these patches the problem seems to be solved.


 Thanks. I've committed the latest patch now!




Thank you for your efforts

Regards
Uwe


On Fri, 1 Jun 2018, Masanobu SAITOH wrote:


Date: Fri, 1 Jun 2018 12:47:32 +0900
From: Masanobu SAITOH 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: msai...@execsw.org, Martin Husemann ,
    current-users@netbsd.org
Subject: Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg
    drivers (?))




  The same diff is at:

 http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif



Updated patch (Fix compile error and ixv patch):

--
Don't call ixgbe_rearm_queues() in ixgbe_local_timer1(). ixgbe_enable_queue()
and ixgbe_disable_queue() try to enable/disable queue interrupt safely. It
has the internal counter. When a queue's MSI-X is received, ixgbe_msix_que()
is called (IPL_NET). This function disable the queue's interrupt by
ixgbe_disable_queue() and issues an softint. ixgbe_handle() queue is called by
the softint (IPL_SOFTNET), process TX,RX and call ixgbe_enable_queue() at the
end.

ixgbe_local_timer1() is a callout and run always on CPU 0 (IPL_SOFTCLOCK).
When ixgbe_rearm_queues() called, an MSI-X interrupt is issued for a specific
queue. It may not CPU 0. If this interrupt's ixgbe_msix_que() is called
and sofint_schedule() is called before the last sofint's softint_execute()
is not called, the softint_schedule() fails because of SOFTINT_PENDING.
It result in breaking ixgbe_{enable,disable}_queue()'s internal counter.

ixgbe_local_timer1() is written not to call ixgbe_rearm_queues() if
the interrupt is disabled, but it's called because of unknown bug or a race.

One solution is to not to use the internal counter, but it's little difficult.
Another solution is stop using ixgbe_rearm_queues() at all.  Essentially,
ixgbe_rearm_queues() is not required (it was added in ixgbe.c rev. 1.43
(2016/12/01)). ixgbe_rearm_queues() helps for lost interrupt problem but
I've never seen it other than ixgbe_rearm_queues() problem.


Index: ixgbe.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.158
diff -u -p -r1.158 ixgbe.c
--- ixgbe.c    30 May 2018 09:17:17 -    1.158
+++ ixgbe.c    1 Jun 2018 03:22:05 -
@@ -4411,6 +4411,7 @@ ixgbe_local_timer1(void *arg)
/* Only truely watchdog if all queues show hung */
if (hung == adapter->num_queues)
    goto watchdog;
+#if 0 /* XXX Avoid unexpectedly disabling interrupt forever (PR#53294) */
else if (queues != 0) { /* Force an IRQ on queues with work */
    que = adapter->queues;
    for (i = 0; i < adapter->num_queues; i++, que++) {
@@ -4421,6 +4422,7 @@ ixgbe_local_timer1(void *arg)
    mutex_exit(>dc_mtx);
    }
}
+#endif
 out:
callout_reset(>timer, hz, ixgbe_local_timer, adapter);
@@ -6643,7 +6645,7 @@ ixgbe_handle_link(void *context)
/
 * ixgbe_rearm_queues
 /
-static void
+static __inline void
ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
{
u32 mask;
Index: ixv.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixv.c,v
retrieving revision 1.102
diff -u -p -r1.102 ixv.c
--- ixv.c    30 May 2018 08:35:26 -    1.102
+++ ixv.c    1 Jun 2018 03:22:05 -
@@ -1266,9 +1266,11 @@ ixv_local_timer_locked(void *arg)
/* Only truly watchdog if all queues show hung */
if (hung == adapter->num_queues)
    goto watchdog;
+#if 0
else if (queues != 0) { /* Force an IRQ on queues with work */
    ixv_rearm_queues(adapter, queues);
}
+#endif
 callout_reset(>timer, hz, ixv_local_timer, adapter);
--

The same diff is at:

http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180531-0.dif

--
---
   SAITOH Masanobu (msai...@execsw.org
    msai...@netbsd.org)




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: More build.sh ctf fallout on a linux host

2018-06-03 Thread Chuck Silvers
On Sun, Jun 03, 2018 at 10:27:06PM +0300, Valery Ushakov wrote:
> #   compile  libctf/ctf_error.lo
> cc -pipe -O2   -DCTF_OLD_VERSIONS 
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../compat  
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/sys  
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/include 
>  
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/head
>   
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/common/ctf
>   
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/lib/libctf/common
>   
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/uts/common
>   
> -I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/bsd/elftoolchain/dist/libelf
>  -DHAVE_NBTOOL_CONFIG_H=1 -D_FILE_OFFSET_BITS=64  
> -I/home/uwe/work/netbsd/build/tools/include/compat 
> -I/home/uwe/work/netbsd/ro/src/tools/compat  -DHAVE_NBTOOL_CONFIG_H=1 
> -D_FILE_OFFSET_BITS=64 -I/home/uwe/work/netbsd/build/tools/include 
> -I/home/uwe/work/netbsd/build/tools/include/nbinclude -c -o ctf_error.lo.o
> /home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/common/ctf/ctf_error.c
> In file included from 
> /home/uwe/work/netbsd/ro/src/tools/libctf/../compat/compat_defs.h:75:0,
>  from 
> /home/uwe/work/netbsd/build/tools/include/compat/nbtool_config.h:882,
>  from 
> /home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/common/ctf/ctf_error.c:23:
> /home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/uts/common/rpc/types.h:57:9:
>  error: unknown type name 'u_longlong_t'
>  typedef u_longlong_t ulonglong_t;
>  ^


I was afraid that might happen...
which distribution and version of linux was that on?
the one I did my fix for linux on was fedora 26.

-Chuck


re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread Christos Zoulas
On Jun 4,  6:07am, m...@eterna.com.au (matthew green) wrote:
-- Subject: re: Automated report: NetBSD-current/i386 build failure

| FWIW, yes, i did do this, both before committing and after i
| heard a report it wasn't working for chuq.
| 
| i still was unable to reproduce it, and while christos commited
| one fix, i still think forcing that code to look at _our_
| headers first before cddl ones is better than the hacks we've
| added to the cddl ones to be usable in non-cddl code contexts,
| is also the right thing to do.

Let me explain my way of thinking about this:

There are two kinds of system header files:
1. "infrastructure" headers that provide base definitions and
   other stuff that other headers use (systm.h)
2.  "feature" headers that provide a specific feature (proc.h).

The solaris ones generally augment ours; the template should be;

solaris/foo.h:

#ifndef SOLARIS_SYS_FOO_H
#define SOLARIS_SYS_FOO_H

#include_next 

/* More definitions */

#endif /* SOLARIS_SYS_FOO_H *?

Now there caaes where the solaris headers expose more stuff than ours,
so we need to include more of our system headers from theirs. When that
happens the rule should be that:

Their infrastructure headers should not include our feature headers,
but can include our infrastructure headers, otherwise we end up with
circular dependencies. Violations to the rule should be kept to a minimum
and tested carefully.

In the general case, if we want the augmented symbols we should include
the solaris ones first in the search path.

christos




re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread matthew green
Andreas Gustafsson writes:
> matthew green wrote:
> > FWIW, i tested my change on about 5 platforms, but all
> > netbsd host.  i can't reproduce the above, unfortunately.
> 
> The testbed runs on a NetBSD host.  Have you tried doing a
> "build.sh release" from scratch?
> 
> In the meantime, the build has been broken in a fifth way, this time
> by pgoyette:

FWIW, yes, i did do this, both before committing and after i
heard a report it wasn't working for chuq.

i still was unable to reproduce it, and while christos commited
one fix, i still think forcing that code to look at _our_
headers first before cddl ones is better than the hacks we've
added to the cddl ones to be usable in non-cddl code contexts,
is also the right thing to do.


.mrg.


More build.sh ctf fallout on a linux host

2018-06-03 Thread Valery Ushakov
#   compile  libctf/ctf_error.lo
cc -pipe -O2   -DCTF_OLD_VERSIONS 
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../compat  
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/sys  
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/include  
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/head 
 
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/common/ctf
  
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/lib/libctf/common
  
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/uts/common
  
-I/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/bsd/elftoolchain/dist/libelf
 -DHAVE_NBTOOL_CONFIG_H=1 -D_FILE_OFFSET_BITS=64  
-I/home/uwe/work/netbsd/build/tools/include/compat 
-I/home/uwe/work/netbsd/ro/src/tools/compat  -DHAVE_NBTOOL_CONFIG_H=1 
-D_FILE_OFFSET_BITS=64 -I/home/uwe/work/netbsd/build/tools/include 
-I/home/uwe/work/netbsd/build/tools/include/nbinclude -c -o ctf_error.lo.o
/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/common/ctf/ctf_error.c
In file included from 
/home/uwe/work/netbsd/ro/src/tools/libctf/../compat/compat_defs.h:75:0,
 from 
/home/uwe/work/netbsd/build/tools/include/compat/nbtool_config.h:882,
 from 
/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/common/ctf/ctf_error.c:23:
/home/uwe/work/netbsd/ro/src/tools/libctf/../../external/cddl/osnet/dist/uts/common/rpc/types.h:57:9:
 error: unknown type name 'u_longlong_t'
 typedef u_longlong_t ulonglong_t;
 ^

-uwe


re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread Andreas Gustafsson
> > --- dependall-kdump ---
> > In file included from 
> > /tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/net/if.h:99:0,
> >  from kdump-ioctl.c:23:
> > /tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/altq/if_altq.h:48:2: 
> > error: unknown type name 'kmutex_t'
> >   kmutex_t *ifq_lock;
> >   ^~~~

So, christos fixed that one - thanks.  Now, the build is failing with
a sixth (!) issue:

--- dependall-sysinst ---
/tmp/bracket/build/2018.06.03.15.26.03-i386/src/usr.sbin/sysinst/arch/i386/md.c:
 In function 'md_post_newfs':
/tmp/bracket/build/2018.06.03.15.26.03-i386/src/usr.sbin/sysinst/arch/i386/md.c:388:1:
 error: declaration of 'md_post_extract' shadows a global declaration 
[-Werror=shadow]
 md_post_extract(void)
 ^~~
In file included from 
/tmp/bracket/build/2018.06.03.15.26.03-i386/src/usr.sbin/sysinst/arch/i386/md.c:50:0:
/tmp/bracket/build/2018.06.03.15.26.03-i386/src/usr.sbin/sysinst/arch/i386/../../defs.h:434:5:
 note: shadowed declaration is here
 int md_post_extract(void);
 ^~~

(that's just the first one of many error messages, more at
http://releng.netbsd.org/b5reports/i386/2018/2018.06.03.15.26.03/build.log.tail)

Looks like martin's sysinst changes...
-- 
Andreas Gustafsson, g...@gson.org


re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread Andreas Gustafsson
FWIW, amd64 is also affected:

  
http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2018/2018.06.03.13.23.58/build.log.tail

To find the relevant error message in the above, search for kmutex_t.
-- 
Andreas Gustafsson, g...@gson.org


re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread Andreas Gustafsson
matthew green wrote:
> FWIW, i tested my change on about 5 platforms, but all
> netbsd host.  i can't reproduce the above, unfortunately.

The testbed runs on a NetBSD host.  Have you tried doing a
"build.sh release" from scratch?

In the meantime, the build has been broken in a fifth way, this time
by pgoyette:

 --- cleandir-share ---
 nbmake[6]: 
"/tmp/bracket/build/2018.06.03.09.22.34-i386/src/share/mk/bsd.man.mk" line 257: 
Wrong number of words (2549) in .for substitution list with 2 vars

I just committed a change that will hopefully fix that one.
-- 
Andreas Gustafsson, g...@gson.org


re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread matthew green
Andreas Gustafsson writes:
> The build is still failing as of source date 2018.06.03.05.55.08,
> now with the following error messages:
> 
> --- dependall-kdump ---
> In file included from 
> /tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/net/if.h:99:0,
>  from kdump-ioctl.c:23:
> /tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/altq/if_altq.h:48:2: 
> error: unknown type name 'kmutex_t'
>   kmutex_t *ifq_lock;
>   ^~~~
> In file included from kdump-ioctl.c:23:0:
> /tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/net/if.h:211:2: error: 
> unknown type name 'kmutex_t'
>   kmutex_t *ifq_lock;
>   ^~~~
> 
> Looks like this latest error was introduced by the following commit:
> 
>   2018.06.02.20.07.15 mrg src/usr.bin/kdump/mkioctls 1.51
> 
> The build has now been broken in four different ways within 24 hours,
> without a single successful build inbetween.
> 
> If would be helpful if developers could (a) compile test their changes
> with a full release build before committing, (b) fix or revert their
> commits quickly if they nonetheless turn out to break the build, and
> (c) wait until the tree is building successfully before committing
> unrelated changes, so that those changes can benefit from automated
> testing.

FWIW, i tested my change on about 5 platforms, but all
netbsd host.  i can't reproduce the above, unfortunately.


.mrg.


Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg drivers (?))

2018-06-03 Thread 6bone

Hello,

I have applied

http://www.netbsd.org/~msaitoh/ixgbe-eitr-20180522-0.dif
and
http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif

to netbsd-8 RC1. With these patches the problem seems to be solved.


Thank you for your efforts

Regards
Uwe


On Fri, 1 Jun 2018, Masanobu SAITOH wrote:


Date: Fri, 1 Jun 2018 12:47:32 +0900
From: Masanobu SAITOH 
To: 6b...@6bone.informatik.uni-leipzig.de
Cc: msai...@execsw.org, Martin Husemann ,
current-users@netbsd.org
Subject: Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg
drivers (?))




  The same diff is at:

 http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif



Updated patch (Fix compile error and ixv patch):

--
Don't call ixgbe_rearm_queues() in ixgbe_local_timer1(). 
ixgbe_enable_queue()

and ixgbe_disable_queue() try to enable/disable queue interrupt safely. It
has the internal counter. When a queue's MSI-X is received, ixgbe_msix_que()
is called (IPL_NET). This function disable the queue's interrupt by
ixgbe_disable_queue() and issues an softint. ixgbe_handle() queue is called 
by

the softint (IPL_SOFTNET), process TX,RX and call ixgbe_enable_queue() at the
end.

ixgbe_local_timer1() is a callout and run always on CPU 0 (IPL_SOFTCLOCK).
When ixgbe_rearm_queues() called, an MSI-X interrupt is issued for a specific
queue. It may not CPU 0. If this interrupt's ixgbe_msix_que() is called
and sofint_schedule() is called before the last sofint's softint_execute()
is not called, the softint_schedule() fails because of SOFTINT_PENDING.
It result in breaking ixgbe_{enable,disable}_queue()'s internal counter.

ixgbe_local_timer1() is written not to call ixgbe_rearm_queues() if
the interrupt is disabled, but it's called because of unknown bug or a race.

One solution is to not to use the internal counter, but it's little 
difficult.

Another solution is stop using ixgbe_rearm_queues() at all.  Essentially,
ixgbe_rearm_queues() is not required (it was added in ixgbe.c rev. 1.43
(2016/12/01)). ixgbe_rearm_queues() helps for lost interrupt problem but
I've never seen it other than ixgbe_rearm_queues() problem.


Index: ixgbe.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.158
diff -u -p -r1.158 ixgbe.c
--- ixgbe.c 30 May 2018 09:17:17 -  1.158
+++ ixgbe.c 1 Jun 2018 03:22:05 -
@@ -4411,6 +4411,7 @@ ixgbe_local_timer1(void *arg)
/* Only truely watchdog if all queues show hung */
if (hung == adapter->num_queues)
goto watchdog;
+#if 0 /* XXX Avoid unexpectedly disabling interrupt forever (PR#53294) */
else if (queues != 0) { /* Force an IRQ on queues with work */
que = adapter->queues;
for (i = 0; i < adapter->num_queues; i++, que++) {
@@ -4421,6 +4422,7 @@ ixgbe_local_timer1(void *arg)
mutex_exit(>dc_mtx);
}
}
+#endif
 out:
callout_reset(>timer, hz, ixgbe_local_timer, adapter);
@@ -6643,7 +6645,7 @@ ixgbe_handle_link(void *context)
/
 * ixgbe_rearm_queues
 /
-static void
+static __inline void
ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
{
u32 mask;
Index: ixv.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixv.c,v
retrieving revision 1.102
diff -u -p -r1.102 ixv.c
--- ixv.c   30 May 2018 08:35:26 -  1.102
+++ ixv.c   1 Jun 2018 03:22:05 -
@@ -1266,9 +1266,11 @@ ixv_local_timer_locked(void *arg)
/* Only truly watchdog if all queues show hung */
if (hung == adapter->num_queues)
goto watchdog;
+#if 0
else if (queues != 0) { /* Force an IRQ on queues with work */
ixv_rearm_queues(adapter, queues);
}
+#endif
callout_reset(>timer, hz, ixv_local_timer, adapter);
--

The same diff is at:

http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180531-0.dif

--
---
   SAITOH Masanobu (msai...@execsw.org
msai...@netbsd.org)


Re: Automated report: NetBSD-current/i386 build failure

2018-06-03 Thread Andreas Gustafsson
The build is still failing as of source date 2018.06.03.05.55.08,
now with the following error messages:

--- dependall-kdump ---
In file included from 
/tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/net/if.h:99:0,
 from kdump-ioctl.c:23:
/tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/altq/if_altq.h:48:2: error: 
unknown type name 'kmutex_t'
  kmutex_t *ifq_lock;
  ^~~~
In file included from kdump-ioctl.c:23:0:
/tmp/bracket/build/2018.06.03.05.55.08-i386/src/sys/net/if.h:211:2: error: 
unknown type name 'kmutex_t'
  kmutex_t *ifq_lock;
  ^~~~

Looks like this latest error was introduced by the following commit:

  2018.06.02.20.07.15 mrg src/usr.bin/kdump/mkioctls 1.51

The build has now been broken in four different ways within 24 hours,
without a single successful build inbetween.

If would be helpful if developers could (a) compile test their changes
with a full release build before committing, (b) fix or revert their
commits quickly if they nonetheless turn out to break the build, and
(c) wait until the tree is building successfully before committing
unrelated changes, so that those changes can benefit from automated
testing.
-- 
Andreas Gustafsson, g...@gson.org