Re: make not closing fds?

2013-09-01 Thread Frank Kardel
That's what I assume too. But, should normal commands really get access 
to those fds?


Frank
On 09/01/13 14:06, Christos Zoulas wrote:

In article 522317ab.1020...@netbsd.org,
Frank Kardel  kar...@netbsd.org wrote:

While building a release I saw in fstat that commands started from make
had many (pipe) file descriptors allocated. Is make missing
setting FD_CLOEXEC/closing before fork on these ? While this is not
really critical it opens up possibilities to clobber at least the output
and gobble up
input data with misbehaved programs.

I believe parallel make passes tokens through fds to children to keep
track of how many parallel makes are running.

christos
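
For reference, a minimal sketch of the close-on-exec mechanism asked about above, written in Python rather than make's own C. It only shows how FD_CLOEXEC is read and set on a descriptor. The helper names is_cloexec and set_cloexec are invented for illustration and say nothing about how make(1) actually handles its pipes.

```python
import fcntl
import os


def is_cloexec(fd):
    """Return True if the descriptor is closed automatically on exec."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Set FD_CLOEXEC while preserving any other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```

A descriptor that a child is supposed to inherit (for example a token pipe meant for a sub-make, if that is indeed what these are) would be left without the flag; only descriptors that ordinary commands have no business touching would get it.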




Re: link problems

2013-10-11 Thread Frank Kardel

Great,

should I make the rest of the list from my build available?

Frank

On 10/11/13 13:18, Roy Marples wrote:

On 05/10/2013 19:10, Frank Kardel wrote:

My build is now at  6681/11478 and I see these link errors so far:
./audio/alsa-utils/.broken.html:ld: note: 'ceil' is defined in DSO
/usr/lib/libm.so.0 so try adding it to the linker command line
./audio/cmus/.broken.html:ld: note: 'tgoto' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./audio/festival/.broken.html:ld: note: 'tgetstr' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./chat/silc-client/.broken.html:ld: note: 'tgetflag' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./comms/kermit/.broken.html:ld: note: 'tgoto' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./comms/tn3270/.broken.html:ld: note: 'tgetstr' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./editors/bvi/.broken.html:ld: note: 'tgetstr' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./editors/ce/.broken.html:ld: note: 'tgoto' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./editors/ex/.broken.html:ld: note: 'tgoto' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./filesystems/fuse-chironfs/.broken.html:ld: note: 'pthread_create' is
defined in DSO /usr/lib/libpthread.so.1 so try adding it to the linker
command line
./games/greed/.broken.html:ld: note: 'tgetstr' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./games/level9/.broken.html:ld: note: 'tgetent' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./games/tads/.broken.html:ld: note: 'tgetflag' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./games/tads/.broken.html:ld: note: 'tgetflag' is defined in DSO
/usr/lib/libterminfo.so.1 so try adding it to the linker command line
./multimedia/ffmpegthumbnailer/.broken.html:ld: note: 'pthread_cancel'
is defined in DSO /usr/lib/libpthread.so.1 so try adding it to
add tradcpp also which I manually helped


All the termcap/terminfo related errors reported here have been fixed.
In most cases the package really used termcap with no curses functions.

Thanks

Roy
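
A list like the one quoted above can be summarised per package with a few lines of Python. This is only a sketch: it assumes the report has the shape shown in the mail (a path ending in .broken.html followed by the ld note), and the function name missing_libs is made up for this example.

```python
import re
from collections import defaultdict

# Matches one ld note; \s+ tolerates the line wrapping added by mail clients.
NOTE_RE = re.compile(
    r"\./(?P<pkg>[^/\s]+/[^/\s]+)/\.broken\.html:ld:\s+note:\s+"
    r"'(?P<sym>[^']+)'\s+is\s+defined\s+in\s+DSO\s+(?P<dso>\S+)"
)


def missing_libs(report):
    """Map each package path to the set of DSOs that ld suggested adding."""
    result = defaultdict(set)
    for match in NOTE_RE.finditer(report):
        result[match.group("pkg")].add(match.group("dso"))
    return dict(result)
```

Grouping by DSO instead of by package would show at a glance how many failures are terminfo related and how many are libpthread or libm.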




Re: editors/Sigil vs. binutils-2.23

2013-10-13 Thread Frank Kardel

See the link problems thread for more issues and discussed/planned fixes.

Frank


On 10/13/13 00:18, Ryo ONODERA wrote:

From: Ryo ONODERA ryo...@yk.rim.or.jp, Date: Sat, 12 Oct 2013 21:56:15 +0900 
(JST)


Hi,

From: Thomas Klausner w...@netbsd.org, Date: Sat, 12 Oct 2013 14:49:03 +0200


Hi!

I'm confused by editors/Sigil breakage:

Linking CXX executable ../../bin/sigil
ld: /scratch/editors/Sigil/work/.buildlink/qt4/lib/libQtCore.so: undefined 
reference to symbol 'pthread_cancel'
ld: note: 'pthread_cancel' is defined in DSO /usr/lib/libpthread.so.1 so try 
adding it to the linker command line
/usr/lib/libpthread.so.1: could not read symbols: Invalid operation
*** Error code 1

The Sigil source code does not use pthread_cancel. QtCore itself is
linked against pthread:

# ldd /usr/pkg/qt4/lib/libQtCore.so
/usr/pkg/qt4/lib/libQtCore.so:
 -lz.1 = /usr/lib/libz.so.1
 -lgcc_s.1 = /usr/lib/libgcc_s.so.1
 -lc.12 = /usr/lib/libc.so.12
 -lstdc++.7 = /usr/lib/libstdc++.so.7
 -lm.0 = /usr/lib/libm.so.0
 -lpthread.1 = /usr/lib/libpthread.so.1

So where is the problem?

Now I am updating my NetBSD current environment.
I will try to build Sigil on 6.99.24.

I can reproduce it on today's NetBSD/amd64 6.99.24.

It seems http://mail-index.netbsd.org/pkgsrc-users/2013/10/04/msg018735.html
is the same problem.

editors/Sigil that is built on NetBSD/amd64 6.99.23 works with no error
on today's NetBSD/amd64 6.99.24.

--
Ryo ONODERA // ryo...@yk.rim.or.jp
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3




routing messages missing?

2014-01-20 Thread Frank Kardel

Hi,

with a -current as of 2014-01-12 I don't see
on wm0 for 'ifconfig wm0 alias 10.200.10.2 netmask 0xff00'
a RTM_NEWADDR for 10.200.10.2 in 'route monitor'.

but I see for 'ifconfig wm0 10.200.10.2 delete':
got message of size 88 on Mon Jan 20 19:53:01 2014
RTM_DELADDR: address being removed from iface: len 88, metric 0, flags: 
CLONING

sockaddrs: NETMASK,IFP,IFA,BRD
 255.255.255.0 wm0:bc.5f.f4.98.32.84 10.200.10.2 10.200.10.255

doing 'ifconfig lo0 alias 127.0.0.2; ifconfig lo0 127.0.0.2 delete'
gives as expected:
RTM_NEWADDR: address being added to iface: len 80, metric 0, flags: none
sockaddrs: NETMASK,IFP,IFA,BRD
 255.0.0.0 lo0 127.0.0.2 127.0.0.2
got message of size 152 on Mon Jan 20 19:51:01 2014
RTM_ADD: Add Route: len 152, pid 0, seq 0, errno 0, flags: UP,HOST
locks: none inits: none
sockaddrs: DST,GATEWAY
 127.0.0.2 127.0.0.2
got message of size 152 on Mon Jan 20 19:51:30 2014
RTM_DELETE: Delete Route: len 152, pid 0, seq 0, errno 0, flags: HOST
locks: none inits: none
sockaddrs: DST,GATEWAY
 127.0.0.2 127.0.0.2
got message of size 80 on Mon Jan 20 19:51:30 2014
RTM_DELADDR: address being removed from iface: len 80, metric 0, flags: none
sockaddrs: NETMASK,IFP,IFA,BRD
 255.0.0.0 lo0 127.0.0.2 127.0.0.2

Why is there no routing message RTM_NEWADDR for an ifconfig wm0 alias 
w.x.y.z on wm0 ?


Frank
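
To check a longer capture for the missing message, the 'route monitor' output can be scanned mechanically. The sketch below assumes the output format shown in this mail (each message starts with its RTM_* name followed by a colon); message_types and saw_newaddr are names invented for this example.

```python
import re

# A message header as printed in the capture above, e.g.
# "RTM_NEWADDR: address being added to iface: len 80, ..."
RTM_RE = re.compile(r"^(RTM_[A-Z_]+):", re.MULTILINE)


def message_types(monitor_output):
    """Return the RTM_* message types in the order they were logged."""
    return RTM_RE.findall(monitor_output)


def saw_newaddr(monitor_output):
    """True if at least one RTM_NEWADDR message appears in the capture."""
    return "RTM_NEWADDR" in message_types(monitor_output)
```

Run against the lo0 capture this would report RTM_NEWADDR, RTM_ADD, RTM_DELETE and RTM_DELADDR; against the wm0 capture only RTM_DELADDR.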


Re: ntpq peculiarity

2014-04-02 Thread Frank Kardel

Hello Dave !

I tripped yesterday over a merge mishap concerning libntp/atouint.c and 
fixed that.


It could be that this is related to what you see. Can you 
checkout/recompile libntp/ and recompile ntpq?


Best regards,
  Frank

On 04/02/14 00:36, Dave Tyson wrote:
I've noticed that the recent version of ntpq always reports peers as 
being stratum 0 regardless of what they really are:


root(cruncher)root$ ntpq
ntpq> host 192.168.0.200
current host set to 192.168.0.200
ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*garamon.von-opp 129.69.1.153     0 u   34   64    17  61.750   -2.490   0.658
 iwik.org        195.113.144.201  0 u  161   64    14  62.069   -1.863   0.499
+2001:67c:12ac:: 172.2.53.81      0 u   33   64    17  90.235    0.128   0.468
 panda.zeroloop. 145.238.203.14   0 u   29   64    17  42.939   23.580   0.257
ntpq>

but on 192.168.0.200

ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*garamon.von-opp 129.69.1.153     2 u   11   64    17  61.750   -2.490   0.658
 iwik.org        195.113.144.201  2 u  138   64    14  62.069   -1.863   0.499
+2001:67c:12ac:: 172.2.53.81      2 u   10   64    17  90.235    0.128   0.468
 panda.zeroloop. 145.238.203.14   2 u    6   64    17  42.939   23.580   0.257


Cruncher is running a very recent current:

NetBSD cruncher.anduin.org.uk 6.99.38 NetBSD 6.99.38 (DAVE) #0: Tue 
Mar 25 19:47:36 UTC 2014 
r...@cruncher.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/DAVE amd64


with ntp: ntpd 4.2.7p404-o Fri Dec 27 19:28:17 EST 2013 (import)

192.168.0.200 is running an older current:

NetBSD rakelane.anduin.org.uk 6.99.23 NetBSD 6.99.23 (HP) #1: Wed Aug  
7 11:46:04 BST 2013 
r...@rakelane.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/HP amd64


with ntp:ntpd 4.2.6p5-o Wed Feb  1 07:49:06 UTC 2012 (import)

Is this a known problem or should I send-pr it?

Cheers,
Dave
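
The two listings above differ only in the st column, which is easy to confirm with a small parser. A minimal sketch, assuming the ten-column peers layout shown in the mail with a one-character tally code in front of each row; peer_strata is a name chosen for this example.

```python
def peer_strata(billboard):
    """Map each peer name to the stratum shown in a peers listing."""
    strata = {}
    for line in billboard.splitlines():
        # Column 0 is the tally code (' ', '*', '+', ...); skip it.
        fields = line[1:].split()
        # Data rows have ten fields and a numeric stratum in the third one;
        # this filters out the header and the '====' separator.
        if len(fields) == 10 and fields[2].isdigit():
            strata[fields[0]] = int(fields[2])
    return strata
```

Comparing the result for the local and the remote query would flag every peer whose stratum is reported as 0 by one client and 2 by the other.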






build break evbarm

2014-04-04 Thread Frank Kardel

Hi !


It seems OMAP5_CM_L3INIT_SATA_CLKCTRL is inconsistently defined in 
src/sys/arch/arm/omap/omap2_reg.h.


compile  TISDP2420_INSTALL/obio_com.o
compile  TISDP2420_INSTALL/obio_mputmr.o
compile  TISDP2420_INSTALL/omap2_gpio.o
compile  TISDP2420_INSTALL/obio_wdt.o
compile  TISDP2420_INSTALL/omap2_gpmc.o
compile  TISDP2420_INSTALL/omap2_icu.o
compile  TISDP2420_INSTALL/omap2_l3i.o
In file included from 
/fs/raid1a/src/NetBSD/cur/src/sys/arch/arm/omap/obio_mputmr.c:129:0:
/fs/raid1a/src/NetBSD/cur/src/sys/arch/arm/omap/omap2_reg.h:885:0: 
error: OMAP5_CM_L3INIT_SATA_CLKCTRL redefined [-Werror]

 #define OMAP5_CM_L3INIT_SATA_CLKCTRL  0x4a009688
 ^
/fs/raid1a/src/NetBSD/cur/src/sys/arch/arm/omap/omap2_reg.h:400:0: note: 
this is the location of the previous definition

 #define OMAP5_CM_L3INIT_SATA_CLKCTRL  0x0088
 ^
In file included from 
/fs/raid1a/src/NetBSD/cur/src/sys/arch/arm/omap/obio_com.c:59:0:
/fs/raid1a/src/NetBSD/cur/src/sys/arch/arm/omap/omap2_reg.h:885:0: 
error: OMAP5_CM_L3INIT_SATA_CLKCTRL redefined [-Werror]

 #define OMAP5_CM_L3INIT_SATA_CLKCTRL  0x4a009688
 ^
/fs/raid1a/src/NetBSD/cur/src/sys/arch/arm/omap/omap2_reg.h:400:0: note: 
this is the location of the previous definition

 #define OMAP5_CM_L3INIT_SATA_CLKCTRL  0x0088
 ^

Best regards,
  Frank
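
Conflicts of this kind can be spotted before the compiler does with a simple scan of the header. The sketch below is an approximation only: it looks at object-like #define lines, ignores #if/#ifdef context and comments, and the name conflicting_defines is invented for this example.

```python
import re
from collections import defaultdict

# Object-like macros only: "#define NAME value" on a single line.
DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+(\w+)[ \t]+(\S.*?)[ \t]*$", re.MULTILINE
)


def conflicting_defines(header_text):
    """Return {macro: [values...]} for macros defined with more than one
    distinct replacement text, in order of first appearance."""
    values = defaultdict(list)
    for name, value in DEFINE_RE.findall(header_text):
        if value not in values[name]:
            values[name].append(value)
    return {name: vals for name, vals in values.items() if len(vals) > 1}
```

For the two definitions quoted in the error output this would report the macro with the values 0x0088 and 0x4a009688.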


Re: RPI kernels in -current expected to work?

2014-04-06 Thread Frank Kardel

Great ! Thanks - works again.

Frank

On 04/06/14 14:43, Nick Hudson wrote:

On 04/06/14 13:01, Frank Kardel wrote:

Hi,

I see a long stream of

fixup: pd 
fixup: pde ... nothing to do

lines scrolling (forever?) after the initial boot kernel messages.
The boot process does not seem to make any reasonably observable 
progress at that point.


This happens with self compiled kernels (as of 2014-04-06) and
kernels fetched from nyftp for 20140403 and 20140406.

The following older kernel works:
NetBSD rpi 6.99.38 NetBSD 6.99.38 (RPI) #0: Sat Mar 29 06:14:39 UTC 2014
bui...@b44.netbsd.org:/home/builds/ab/HEAD/evbarm-earmhf/201403290440Z-obj/home/builds/ab/HEAD/src/sys/arch/evbarm/compile/RPI 
evbarm


Best regards,
  Frank



cvs update :)

Nick




Re: netstat routing table (verbose) output

2014-04-24 Thread Frank Kardel

Hi Christos !

Bringing the functionality back would be good - I have yet to find a 
replacement.


Frank

On 04/24/14 14:44, Christos Zoulas wrote:

In article 5358fd54.1060...@netbsd.org,
Frank Kardel  kar...@netbsd.org wrote:

Hi,

once upon a time (NetBSD 6.x and before) the command
netstat -nvrf inet
delivered:

Routing tables

Internet:
DestinationGatewayFlagsRefs  UseMtu
Interface
default10.0.2.1   UGS15 35833864  -  nfe0
 expire0   recvpipe  0   sendpipe  0
 ssthresh  0   rtt   0   rttvar0
 hopcount  0
10.0.2/24  link#1 UC 140  -  nfe0
 expire   1390815881   recvpipe  0   sendpipe  0
 ssthresh  0   rtt   0   rttvar0
 hopcount  0
10.0.2.1   00:00:24:c9:2c:84  UHLc1  2690346  -  nfe0
 expire   1398341586   recvpipe  0   sendpipe  0
 ssthresh  0   rtt  109375   rttvar93750
 hopcount  0

Nowadays (-current) I only see the same output as without the -v option.

Is this deliberate? Is this a regression?

The -v option was useful to check things like hopcount and diagnose
routing trouble.
The route show dest command still outputs the hopcount and other
information, but not the entire routing table in extended format.

Is there a plan to re-instate the functionality?

I think that died when Elad? tried to cleanup netstat not to use kmem.
It should be fixed properly by adding more stuff to sysctl if we bring
the functionality back.

christos




Re: netstat routing table (verbose) output

2014-04-24 Thread Frank Kardel

On 04/24/14 16:56, Christos Zoulas wrote:

On Apr 24,  2:57pm, kar...@netbsd.org (Frank Kardel) wrote:
-- Subject: Re: netstat routing table (verbose) output

| Hi Christos !
|
| Bringing the functionality back would be good - I have yet to find a
| replacement.

Fixed,

christos

Thanks - will check.

Frank


Re: netstat routing table (verbose) output

2014-04-24 Thread Frank Kardel

On 04/24/14 20:45, Frank Kardel wrote:

On 04/24/14 16:56, Christos Zoulas wrote:

On Apr 24,  2:57pm, kar...@netbsd.org (Frank Kardel) wrote:
-- Subject: Re: netstat routing table (verbose) output

| Hi Christos !
|
| Bringing the functionality back would be good - I have yet to find a
| replacement.

Fixed,

christos

Thanks - will check.

Frank

Works again - thanks!

Frank



interface send-q stall in 6.99.40?

2014-05-12 Thread Frank Kardel

Hi,

I have two observations on 6.99.44 (amd64/evbarm) where a wm-interface 
send-queue is filled to the max. sendto()-calls terminate with ENOBUFS.


net.interfaces.wm3.sndq.len = 256
net.interfaces.wm3.sndq.maxlen = 256
net.interfaces.wm3.sndq.drops = 20007

the interface status is:
wm3: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
address: 00:00:xx:xx:xx:xx
media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,master,rxpause,txpause)
status: active
[addresses skipped]

traceroute packets from outside look like they are answered, but ICMP 
ECHO is not answered.


The interface recovers with an ifconfig wmX down/up.

While I do not know how to provoke this on wm (just happens once a 
week). I found the same phenomenon occurring on a Raspberry Pi when 
detaching the cable. The same symptoms occur there and ifconfig usmsc0 
down/up will recover.


Is anybody else seeing this?

Best regards,
  Frank
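
Since the stall only shows up about once a week, a periodic check of the sysctl values above may help to catch it. A minimal sketch, assuming the net.interfaces.<if>.sndq.* lines look exactly as quoted in this mail; full_send_queues is a name made up for this example, and obtaining the sysctl output is left to the caller.

```python
import re

SNDQ_RE = re.compile(
    r"^net\.interfaces\.(\w+)\.sndq\.(len|maxlen|drops)[ \t]*=[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def full_send_queues(sysctl_output):
    """Return the sorted names of interfaces whose send queue is at its limit."""
    stats = {}
    for ifname, key, value in SNDQ_RE.findall(sysctl_output):
        stats.setdefault(ifname, {})[key] = int(value)
    return sorted(
        ifname
        for ifname, s in stats.items()
        if "len" in s and "maxlen" in s and s["len"] >= s["maxlen"]
    )
```

A queue that is briefly full under load is normal, so a real monitor would want to see the condition persist over several samples (with drops still climbing) before treating it as a stall.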


Re: interface send-q stall in 6.99.40?

2014-05-12 Thread Frank Kardel

On 05/12/14 22:39, Manuel Bouyer wrote:

On Mon, May 12, 2014 at 09:56:30PM +0200, Frank Kardel wrote:

[...]
traceroute packets from outside look like they are answered, but ICMP ECHO is
not answered.

The interface recovers with an ifconfig wmX down/up.

While I do not know how to provoke this on wm (just happens once a week). I
found the same phenomenon occurring on a Raspberry Pi when detaching the
cable. The same symptoms occur there and ifconfig usmsc0 down/up will
recover.

Is anybody else seeing this?

I do, on a amd64 host with wm interface. Disabling TSO4 and TSO6 fixed the
problem for me.


Will try that! Thanks for the hint.

Frank


recent dhcpcd looping on ppp0

2014-09-28 Thread Frank Kardel
The recent dhcpcd version (-current around 20140927) seems to be looping 
on ppp* interfaces.


Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: unknown carrier
Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: carrier_status: 
Inappropriate ioctl for device

Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: unknown carrier
Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: carrier_status: 
Inappropriate ioctl for device

Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: unknown carrier
Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: carrier_status: 
Inappropriate ioctl for device

Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: unknown carrier
Sep 28 15:14:11 Andromeda dhcpcd[3259]: ppp0: carrier_status: 
Inappropriate ioctl for device



Any ideas ? This seems to be a regression.

Frank


Re: recent dhcpcd looping on ppp0

2014-09-28 Thread Frank Kardel

On 09/28/14 23:17, Roy Marples wrote:

On Sunday 28 Sep 2014 22:06:47 Roy Marples wrote:

Going to guess that ppp0 doesn't have a carrier status OR IFF_RUNNING set?
The attached patch should reduce the log spam, let me know how it works out.

Errm, this patch should do better!


Roy

Well, less spam, but still
- busy wait (repeated count goes up to ~300k)
Sep 28 23:50:05 Andromeda dhcpcd[1101]: ppp0: unknown carrier
Sep 28 23:50:05 Andromeda dhcpcd[1101]: ppp0: carrier_status: 
Inappropriate ioctl for device

Sep 28 23:50:14 Andromeda syslogd[1251]: last message repeated 12776 times

- a flurry of routing messages (probably the same number as syslog 
entries)

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown, 
flags: UP,PTP,MULTICAST

leading to busy routing message aware daemons (named, mdnsd)

As a bonus (not necessarily related) a panic:

Sep 28 23:36:30 Andromeda /netbsd: panic: kernel diagnostic assertion 
"c->c_cpu->cc_lwp == curlwp || c->c_cpu->cc_active != c" failed: file 
/usr/srccur/src/sys/kern/kern_timeout.c, line 313

Sep 28 23:36:30 Andromeda /netbsd: cpu1: Begin traceback...
Sep 28 23:36:30 Andromeda /netbsd: vpanic() at netbsd:vpanic+0x13c
Sep 28 23:36:30 Andromeda /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Sep 28 23:36:30 Andromeda /netbsd: callout_destroy() at 
netbsd:callout_destroy+0x74
Sep 28 23:36:30 Andromeda /netbsd: in6_delmulti() at 
netbsd:in6_delmulti+0x177
Sep 28 23:36:30 Andromeda /netbsd: in6_leavegroup() at 
netbsd:in6_leavegroup+0x15
Sep 28 23:36:30 Andromeda /netbsd: ip6_freemoptions() at 
netbsd:ip6_freemoptions+0x30
Sep 28 23:36:30 Andromeda /netbsd: in6_pcbdetach() at 
netbsd:in6_pcbdetach+0xd7
Sep 28 23:36:30 Andromeda /netbsd: udp6_detach_wrapper() at 
netbsd:udp6_detach_wrapper+0x3f

Sep 28 23:36:30 Andromeda /netbsd: soclose() at netbsd:soclose+0x63
Sep 28 23:36:30 Andromeda /netbsd: soo_close() at netbsd:soo_close+0x16
Sep 28 23:36:30 Andromeda /netbsd: closef() at netbsd:closef+0x54
Sep 28 23:36:30 Andromeda /netbsd: fd_close() at netbsd:fd_close+0x19f
Sep 28 23:36:30 Andromeda /netbsd: sys_close() at netbsd:sys_close+0x20
Sep 28 23:36:30 Andromeda /netbsd: syscall() at netbsd:syscall+0x9a
Sep 28 23:36:30 Andromeda /netbsd: --- syscall (number 6) ---
Sep 28 23:36:30 Andromeda /netbsd: 7f7ff743c46a:
Sep 28 23:36:30 Andromeda /netbsd: cpu1: End traceback...

Best regards,
  Frank



Re: recent dhcpcd looping on ppp0

2014-09-29 Thread Frank Kardel

Much better now.

The busy wait loop is now gone.

Thanks !

Frank

On 09/29/14 14:03, Roy Marples wrote:

On 2014-09-29 11:33, Roy Marples wrote:
Going to guess that ppp0 doesn't have a carrier status OR 
IFF_RUNNING set?
The attached patch should reduce the log spam, let me know how it 
works out.

Errm, this patch should do better!


Well, less spam, but still
- busy wait (repeated count goes up to ~300k)
Sep 28 23:50:05 Andromeda dhcpcd[1101]: ppp0: unknown carrier
Sep 28 23:50:05 Andromeda dhcpcd[1101]: ppp0: carrier_status:
Inappropriate ioctl for device
Sep 28 23:50:14 Andromeda syslogd[1251]: last message repeated 12776 
times


- a flurry of routing messages (probably the same number as 
syslog entries)

got message of size 152 on Sun Sep 28 23:50:10 2014
RTM_IFINFO: iface status change: len 152, if# 6, carrier: unknown,
flags: UP,PTP,MULTICAST
got message of size 152 on Sun Sep 28 23:50:10 2014


Yes, these messages are being triggered by a flurry of routing messages.
This is probably a bug in NetBSD somewhere and the attached patch
should pretty much silence the warnings down to one in dhcpcd until the
link actually comes up.


So the flurry of messages is triggered by dhcpcd still (thanks 
jmcneill!).

This patch should fix it and all should now be well.

Roy




Bananapi - looping on boot

2014-12-30 Thread Frank Kardel

Hi !

I just tried out -current (20141229 evbarm/earmv7hf) and see the boot 
constantly looping like this:


U-Boot SPL 2014.04-10733-gea1ac32 (Nov 24 2014 - 09:46:23)
Board: Bananapi
DRAM: 1024 MiB
CPU: 96000Hz, AXI/AHB/APB: 3/2/2
spl: not an uImage at 1600


U-Boot 2014.04-10733-gea1ac32 (Nov 24 2014 - 09:46:23) Allwinner Technology

CPU:   Allwinner A20 (SUN7I)
Board: Bananapi
I2C:   ready
DRAM:  1 GiB
MMC:   SUNXI SD/MMC: 0
*** Warning - bad CRC, using default environment

In:serial
Out:   serial
Err:   serial
Net:   dwmac.1c5
Hit any key to stop autoboot:  0
reading uEnv.txt
140 bytes read in 21 ms (5.9 KiB/s)
Loaded environment from uEnv.txt
Running uenvcmd ...
mmc0 is current device
reading Bananapi.bin
50908 bytes read in 31 ms (1.6 MiB/s)
reading netbsd.ub
5136896 bytes read in 310 ms (15.8 MiB/s)
## Booting kernel from Legacy Image at 8200 ...
   Image Name:   NetBSD/bpi 7.99.3
   Image Type:   ARM NetBSD Kernel Image (uncompressed)
   Data Size:5136832 Bytes = 4.9 MiB
   Load Address: 40007800
   Entry Point:  40007800
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
## Transferring control to NetBSD stage-2 loader (at address 40007800) ...
Early console started
[ Kernel symbol table missing! ]
The Regents of the University of California.  All rights reserved.

NetBSD 7.99.3 (BPI) #1: Tue Dec 30 10:10:55 CET 2014

kardel@Andromeda:/usr/srccur/src/sys/arch/evbarm/compile/obj.evbarm/BPI
total memory = 1024 MB
avail memory = 1008 MB
sysctl_createv: sysctl_create(machine_arch) returned 17
mainbus0 (root)
cpu0 at mainbus0 core 0: 960 MHz Cortex-A7 r0p4 (Cortex V7A core)
cpu0: DC enabled IC enabled WB disabled EABT branch prediction enabled
cpu0: 32KB/32B 2-way L1 VIPT Instruction cache
cpu0: 32KB/64B 4-way write-back-locking-C L1 PIPT Data cache
cpu0: 256KB/64B 8-way write-through L2 PIPT Unified cache
vfp0 at cpu0: NEON MPE (VFP 3.0+), rounding, NaN propagation, denormals
cpu1 at mainbus0 core 1
armperiph0 at mainbus0
armgic0 at armperiph0: Generic Interrupt Controller, 160 sources (151 valid)
armgic0: 32 Priorities, 128 SPIs, 7 PPIs, 16 SGIs
armgtmr0 at armperiph0: ARMv7 Generic 64-bit Timer (24000 kHz)
armgtmr0: interrupting on irq 27
awinio0 at mainbus0: A20 (0x1651)
awingpio0 at awinio0
awindma0 at awinio0: DMA
awindma0: interrupting on irq 59
awincnt0 at awinio0
com0 at awinio0 port 0: ns16550a, working fifo
com0: console
awindebe0 at awinio0 port 0: Display Engine Backend (BE0)
awintcon0 at awinio0 port 0: LCD/TV timing controller (TCON0)
awinhdmi0 at awinio0: HDMI 1.3
awinwdt0 at awinio0: default period is 10 seconds
awinrtc0 at awinio0: RTC
awinusb0 at awinio0 port 0
ohci0 at awinusb0: OHCI USB controller
ohci0: OHCI version 1.0
usb0 at ohci0: USB revision 1.0
ohci0: interrupting on irq 96
ehci0 at awinusb0: EHCI USB controller
ehci0: companion controller, 1 port each: ohci0
usb1 at ehci0: USB revision 2.0
ehci0: interrupting on irq 71
awinusb1 at awinio0 port 1
ohci1 at awinusb1: OHCI USB controller
ohci1: OHCI version 1.0
usb2 at ohci1: USB revision 1.0
ohci1: interrupting on irq 97
ehci1 at awinusb1: EHCI USB controller
ehci1: companion controller, 1 port each: ohci1
usb3 at ehci1: USB revision 2.0
ehci1: interrupting on irq 72
motg0 at awinio0: OTG
motg0: interrupting at irq 70

U-Boot SPL 2014.04-10733-gea1ac32 (Nov 24 2014 - 09:46:23)
Board: Bananapi
DRAM: 1024 MiB
CPU: 96000Hz, AXI/AHB/APB: 3/2/2
spl: not an uImage at 1600


U-Boot 2014.04-10733-gea1ac32 (Nov 24 2014 - 09:46:23) Allwinner Technology

CPU:   Allwinner A20 (SUN7I)
Board: Bananapi
I2C:   ready
DRAM:  1 GiB
MMC:   SUNXI SD/MMC: 0
*** Warning - bad CRC, using default environment

In:serial
Out:   serial
Err:   serial
Net:   dwmac.1c5
Hit any key to stop autoboot:  0
sun7i#

Any ideas? I was following the instruction from 
http://wiki.netbsd.org/ports/evbarm/allwinner/.


Frank


Re: Bananapi - looping on boot

2014-12-31 Thread Frank Kardel

Spot on - works with the other port. I was just too used to the RPI.

Thanks for the hint.

Best regards,
  Frank



On 12/31/14 09:05, Nick Hudson wrote:

On 12/30/14 11:44, Frank Kardel wrote:


Hi,


motg0 at awinio0: OTG
motg0: interrupting at irq 70


It appears to be rebooting here - my guess is that you have your power 
supply plugged into
the OTG port and not the power socket. The power socket is on the long 
side in between the

sata power/sata connectors



http://1.bp.blogspot.com/-azSvZIIpG34/U8jNKCkvGsI/Aro/stwR2lJqlnI/s1600/Banana-pi-%E6%AD%A3%E9%9D%A2.png 



Nick




DRMKMS: NetBSD-current 201501242100 (7.99.4) and ATI Radeon HD 5450

2015-01-25 Thread Frank Kardel

Hi,

I tried out the GENERIC kernel of current-201501242100. The good news is 
that the KMS seems to work:

...
match_bootwedge: unable to read block 65 of dev dk6 (5)
boot device: sd2
root on sd2a dumps on sd2b
root file system type: ffs
kern.module.path=/stand/amd64/7.99.4/modules
drm: initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1787:0x2291).
drm: register mmio base: 0xfea2
drm: register mmio size: 131072
drm kern info: ATOM BIOS: CEDAR
radeon0: info: VRAM: 2048M 0x - 0x7FFF 
(2048M used)

radeon0: info: GTT: 1024M 0x8000 - 0xBFFF
drm: Detected VRAM RAM=800M, BAR=256M
drm: RAM width 64bits DDR
Zone  kernel: Available graphics memory: 11587444 kiB
Zone   dma32: Available graphics memory: 2097152 kiB
drm: radeon: 2048M of VRAM memory ready
drm: radeon: 1024M of GTT memory ready.
drm: Loading CEDAR Microcode
drm: Internal thermal controller with fan control
drm: radeon: dpm initialized
drm: GART: num cpu pages 262144, num gpu pages 262144
drm: PCIE GART of 1024M enabled (table at 0x0025D000).
radeon0: info: WB enabled
radeon0: info: fence driver on ring 0 use gpu addr 0x8c00 
and cpu addr 0x0x80023ccddc00
radeon0: info: fence driver on ring 3 use gpu addr 0x8c0c 
and cpu addr 0x0x80023ccddc0c
radeon0: info: fence driver on ring 5 use gpu addr 0x0005c418 
and cpu addr 0x0x80023c8dc418

drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
radeon0: interrupting at ioapic1 pin 0 (radeon)
drm: radeon: irq initialized.
drm: ring test on 0 succeeded in 1 usecs
drm: ring test on 3 succeeded in 1 usecs
drm: ring test on 5 succeeded in 1 usecs
drm: UVD initialized successfully.
drm: ib test on ring 0 succeeded in 0 usecs
drm: ib test on ring 3 succeeded in 0 usecs
drm: ib test on ring 5 succeeded
drm: Radeon Display Connectors
drm: Connector 0:
drm:   HDMI-A-1
drm:   HPD2
drm:   DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c
drm:   Encoders:
drm: DFP1: INTERNAL_UNIPHY1
drm: Connector 1:
drm:   DVI-D-1
drm:   HPD4
drm:   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
drm:   Encoders:
drm: DFP2: INTERNAL_UNIPHY
drm: Connector 2:
drm:   VGA-1
drm:   DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
drm:   Encoders:
drm: CRT1: INTERNAL_KLDSCP_DAC1
radeondrmkmsfb0 at radeon0
radeon0: info: registered panic notifier
radeondrmkmsfb0: framebuffer at 0x80023ceff000, size 2560x1440, 
depth 32, stride 10240
wsdisplay0 at radeondrmkmsfb0 kbdmux 1: console (default, vt100 
emulation), using wskbd0

wsmux1: connecting to wsdisplay0
wsdisplay0: screen 1 added (default, vt100 emulation)
wsdisplay0: screen 2 added (default, vt100 emulation)
wsdisplay0: screen 3 added (default, vt100 emulation)
wsdisplay0: screen 4 added (default, vt100 emulation)
...

The radeondrmkmsfb0 driver is correctly initialized and a big 
console screen of 2560x1440 correctly appears.


Unfortunately X -configure fails to detect any card.
The RADEON driver shortly lists the supported chipsets
but does not go on with the initialization.

Compiling GENERIC with radeondrm* at drm? fails
due to macro redefinitions. (with radeondrm in 7.99.1 the
Xserver was able to initialize the radeon driver).

Both Xservers find the card in the PCI bus as:
7.99.1:
[52.580] (--) PCI:*(1:1:0:0) 1002:68f9:1787:2291 rev 0, Mem @ 
0xd000/268435456, 0xfea2/131072, I/O @ 0xe000/256, BIOS @ 
0x/131072


7.99.4:
[   655.185] (--) PCI:*(1:1:0:0) 1002:68f9:1787:2291 rev 0, Mem @ 
0xd000/268435456, 0xfea2/131072, I/O @ 0xe000/256, BIOS @ 
0x/131072


Is there anything else I need to chant as magic words?

Best regards,
  Frank
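
As a sanity check that the kernel DRM driver and the X server are looking at the same device, the PCI IDs from the two logs can be compared. This is only a sketch: the regular expressions are written against the two line formats quoted in this mail, the four fields are taken to be vendor, device, subvendor and subdevice, and same_card is a name invented for this example.

```python
import re

HEX4 = r"([0-9a-fA-F]{4})"

# X server log: "(--) PCI:*(1:1:0:0) 1002:68f9:1787:2291 rev 0, ..."
XORG_RE = re.compile(
    r"\(--\) PCI:\*?\(\S+\) " + HEX4 + ":" + HEX4 + ":" + HEX4 + ":" + HEX4
)
# Kernel log: "... (CEDAR 0x1002:0x68F9 0x1787:0x2291)."
DRM_RE = re.compile(
    r"\(\w+ 0x" + HEX4 + ":0x" + HEX4 + " 0x" + HEX4 + ":0x" + HEX4 + r"\)"
)


def _ids(match):
    """Convert the four captured hex fields to integers."""
    return tuple(int(group, 16) for group in match.groups())


def same_card(xorg_line, drm_line):
    """True if both log lines carry the same four PCI IDs."""
    x = XORG_RE.search(xorg_line)
    d = DRM_RE.search(drm_line)
    return bool(x and d) and _ids(x) == _ids(d)
```

For the lines quoted above the IDs agree (1002:68f9, 1787:2291), which suggests the failure lies in driver initialization rather than in device detection.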


Re: LSI MegaRaid SAS3108

2015-04-21 Thread Frank Kardel

On 04/20/15 11:06, k...@wide.ad.jp wrote:

Folks,

Is anybody working to support 12Gbps SAS version of LSI MegaRAID
SAS3108 based card on NetBSD-current, -7Beta, or even 6.1?

It seems that SAS3108 is supported in latest versions of OpenBSD and
FreeBSD.

-- Akira Kato
I have one of those on my desk. According to FreeBSD it was sufficient 
to add the PCI IDs to mfi_pci.c. So I tried that and managed to install. 
So far so good, but now I am stuck with generic HBA errors on startup. 
Also the EOL termination of SG-lists that was done for later cards in 
FreeBSD didn't improve the situation.


As I am not working full time on it, I have not done any research on 
the exact pattern that leads to a failure (and once you get the generic 
HBA error you are stuck with it). Also I have not tracked any further 
differences between the mfi driver versions. So I may be of limited help 
here. Nevertheless I am very interested in the solution or in testing 
proposed solutions.


BTW: on the Dell R730 (the card is a PERC H730P) bus_dmamap_sync() 
fails with a diagnostic assertion (no stack trace collected yet) because 
offset 0 is not valid for a 0-length map. Maybe that is one more data 
point to be investigated.


Frank


Re: why does dk(4) take precedence in boot device selection???

2015-04-25 Thread Frank Kardel

On 04/25/15 02:10, Michael van Elst wrote:

There is no safe way to identify the boot disk from information passed 
by the BIOS. Here is what the MD code for x86 does:

1. BTINFO_ROOTDEVICE



3. BTINFO_BOOTDISK

a) ...
b) ...

c) bootloader passed BIOS disk number for a CD
Search for the first cd device, you can only boot from unit 0.
Pass driver instance (and partition 0) to MI code.
Well on our Dell system this assumption is invalid. When booting from a 
virtual CD image this is emulated as a USB cdrom. SATA cdroms attached 
to the system will be found/attached first. USB cdroms later in the boot 
sequence. The 'first device' heuristic leads to surprising behavior in 
this case. In order not to silently switch the boot device you need to 
completely disable the SATA controller in the BIOS. So maybe this logic 
should be revisited.


Frank
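
The surprise described above can be shown with a toy model. Nothing here is the actual x86 MD code: Device, first_cd and the attach list are hypothetical stand-ins that only mimic "take the first cd device" against the attach order reported for the Dell system.

```python
from collections import namedtuple

# Hypothetical stand-in for an attached device: driver name, unit, parent bus.
Device = namedtuple("Device", "driver unit bus")


def first_cd(devices):
    """Mimic the heuristic under discussion: return the first cd device."""
    for dev in devices:
        if dev.driver == "cd":
            return dev
    return None


# Attach order as described in the mail: the SATA drive attaches before
# the USB-emulated virtual CD that the machine was actually booted from.
attached = [
    Device("cd", 0, "sata"),
    Device("cd", 1, "usb"),
]
```

With this attach order first_cd(attached) returns the SATA drive, although the boot medium was the USB one; picking the right device would need some additional hint about where the boot came from.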



RPI usb can get stuck

2015-04-09 Thread Frank Kardel

Using a USB-serial adapter I experience USB lockups in 7.99.9.

Device (from dmesg):
uslsa0 at uhub1 port 3 configuration 1 interface 0
uslsa0: Silicon Labs ELV USB-WDE1 WetterdatenempfM-CM-$nger, rev 
1.10/1.00, addr 4

ucom0 at uslsa0: Silicon Labs CP210x

Symptoms:
rpi$ cu -l /dev/ttyU0 -s 9600
Connected

ELV USB-WDE1 v1.3
Baud:9600bit/s
Mode:LogView
$1;1;;15,7;41;0,2;1567;0;0
$1;1;;16,0;41;0,0;1567;0;0
$1;1;;16,0;41;0,0;1567;0;0
$1;1;;16,2;41;0,2;1567;0;0
$1;1;;16,4;40;0,5;1567;0;0
~
[EOT]
rpiahz$ cu -l /dev/ttyU0 -s 9600
Connected
[no response to status query '?' here]
~
[EOT]
rpiahz$ cu -l /dev/ttyU0 -s 9600
[no response to status query '?' here]
Connected
~
[EOT]
~~.
~.
[stuck here]

Process status:
1000  6706  3117 0   0  0 0 0 -   DE+  pts/0  0:00.00 (cu)
1000  9564  6706 0   0  0 0 0 -   Z+   pts/0  0:00.00 (cu)

stacktrace for 6706:
trace: pid 6706 lid 1 at 0x99373c0c
0x99373c0c: mi_switch+0xc
0x99373c3c: sleepq_block+0xa0
0x99373c7c: cv_timedwait_sig+0x118
0x99373cac: ttysleep+0x80
0x99373ccc: ttywait+0x38
0x99373ce4: ttywflush+0x14
0x99373cfc: ttylclose+0x1c
0x99373d1c: ucomclose+0x74
0x99373d44: cdev_close+0x7c
0x99373d84: spec_close+0x108
0x99373d9c: ufsspec_close+0x60
0x99373dbc: VOP_CLOSE+0x60
0x99373de4: vn_close+0x40
0x99373e24: closef+0x6c
0x99373e6c: fd_free+0x170
0x99373ee4: exit1+0x100
0x99373f04: sys_exit+0x3c
0x99373f7c: syscall+0x88
0x99373fac: swi_handler+0xa0

related dmesg output:
ucomreadcb: wonky status=INVAL

Other observations:
The interrupt rate climbs from 8k to 10k while the device is open (much 
better than the 40k it reached before the last batch of fixes).

In general you get one good run; on the second run you can still exit, 
and on the third cu interaction you get stuck in exit(2).


Any clues?

Frank


Re: MegaRAID 3008/3108

2015-06-03 Thread Frank Kardel

On 06/03/15 21:30, Christos Zoulas wrote:

In article 20150603122110.5f267ef8@taliesin-2.local,
Harry Waddell  wadd...@caravan-epub.com wrote:

On Wed, 3 Jun 2015 18:27:44 + (UTC)
chris...@astron.com (Christos Zoulas) wrote:


In article 20150603111042.4fad14b2@taliesin-2.local,
Harry Waddell  wadd...@caravaninfotech.com wrote:

On Tue, 2 Jun 2015 16:13:07 +0100 (BST)
Stephen Borrill net...@precedence.co.uk wrote:


Anyone working on adding support for SYMBIOS MEGARAID 3108 (0x1000/0x005d)
or 3008 (0x1000/0x005f)? These are supported in OpenBSD by the mfii driver
which also supports the MEGARAID 2208 (0x1000/0x005b). In NetBSD, the
mfi(4) driver was extended to support the 2208 (Thunderbolt) rather than
adding a new driver. The 3008/3108 will require another MFI_IOP type
(OpenBSD call it 25).

--
Stephen


I have a system with this on the motherboard, but I'm dropping an lsi
9261-i8 in
because the newer cards are not supported. My vendor has told me that
the 9261 is near EOL,
so it would be really helpful if someone could add support for the newer LSI
cards. Unfortunately, I don't have much experience in this area.

Shouldn't be too hard to do... As long someone has a card to test...

christos


I just double checked my invoice and the MB has an embedded 3108
hooked up to 7 sata and 1 ssd drive ( for cachecade ).

I don't need to put this into production immediately, so I
can test something during the next few days.
After that, it will be hard  to test with drives hooked up as
the cables are not the same for it and the 9261. Just seeing if the
hw get recognized without drives attached would be easy enough.

I should be getting two more identical systems in the next few months,
so I could test it with one of those too. The plan is to put
netbsd-7 on these, but I could boot a current kernel if needed.

If Frank can post his patches and we can take a look at the OpenBSD
driver for the I/O performance fix.

christos

Well, here is the proof of concept patch - could be a starting point.
Frank
Index: ic/mfi.c
===
RCS file: /cvsroot/src/sys/dev/ic/mfi.c,v
retrieving revision 1.57
diff -u -r1.57 mfi.c
--- ic/mfi.c	4 Apr 2015 15:10:47 -	1.57
+++ ic/mfi.c	3 Jun 2015 19:41:32 -
@@ -71,7 +71,6 @@
  * are those of the authors and should not be interpreted as representing
  * official policies,either expressed or implied, of the FreeBSD Project.
  */
-
 #include <sys/cdefs.h>
 __KERNEL_RCSID(0, "$NetBSD: mfi.c,v 1.57 2015/04/04 15:10:47 christos Exp $");
 
@@ -109,7 +108,7 @@
 #endif /* NBIO > 0 */
 
 #ifdef MFI_DEBUG
-uint32_t	mfi_debug = 0
+uint32_t	mfi_debug = ~0 /* XXXkd */
 /*		| MFI_D_CMD  */
 /*		| MFI_D_INTR */
 /*		| MFI_D_MISC */
@@ -3292,7 +3291,11 @@
 	for (i = 0; i < sge_idx; i++) {
 		sgl_ptr->Address = htole64(sgd[i].ds_addr);
 		sgl_ptr->Length = htole32(sgd[i].ds_len);
-		sgl_ptr->Flags = 0;
+		if (i == sge_count - 1) {
+			sgl_ptr->Flags = MPI25_IEEE_SGE_FLAGS_END_OF_LIST;
+		} else {
+			sgl_ptr->Flags = 0;
+		}
 		if (sge_idx < sge_count) {
 			DNPRINTF(MFI_D_DMA,
 			"sgl %p %d 0x%" PRIx64 " len 0x%" PRIx32
@@ -3309,8 +3312,11 @@
 		sg_chain = sgl_ptr;
 		/* Prepare chain element */
 		sg_chain->NextChainOffset = 0;
+		sg_chain->Flags = (MPI2_IEEE_SGE_FLAGS_CHAIN_ELEMENT);
+#if 0 /* XXXkd */
 		sg_chain->Flags = (MPI2_IEEE_SGE_FLAGS_CHAIN_ELEMENT |
-		MPI2_IEEE_SGE_FLAGS_IOCPLBNTA_ADDR);
+		MPI2_IEEE_SGE_FLAGS_IOCPLBNTA_ADDR); /* XXXkd not in FreeBSD! */
+#endif
 		sg_chain->Length =  (sizeof(mpi2_sge_io_union) *
 		(sge_count - sge_idx));
 		sg_chain->Address = ccb->ccb_tb_psg_frame;
@@ -3322,7 +3328,11 @@
 		for (; i < sge_count; i++) {
 			sgl_ptr->Address = htole64(sgd[i].ds_addr);
 			sgl_ptr->Length = htole32(sgd[i].ds_len);
-			sgl_ptr->Flags = 0;
+			if (i == sge_count - 1) {
+				sgl_ptr->Flags = MPI25_IEEE_SGE_FLAGS_END_OF_LIST;
+			} else {
+				sgl_ptr->Flags = 0;
+			}
 			DNPRINTF(MFI_D_DMA,
 			"sgl %p %d 0x%" PRIx64 " len 0x%" PRIx32
 			" flags 0x%x\n", sgl_ptr, i, sgl_ptr->Address,
Index: pci/mfi_pci.c
===
RCS file: /cvsroot/src/sys/dev/pci/mfi_pci.c,v
retrieving revision 1.18
diff -u -r1.18 mfi_pci.c
--- pci/mfi_pci.c	29 Mar 2014 19:28:25 -	1.18
+++ pci/mfi_pci.c	3 Jun 2015 19:41:32 -
@@ -144,6 +144,8 @@
 	  MFI_IOP_SKINNY,	mfi_skinny_subtypes },
 	{ PCI_VENDOR_SYMBIOS,	PCI_PRODUCT_SYMBIOS_MEGARAID_2208,
 	  MFI_IOP_TBOLT,	mfi_tbolt_subtypes },
+	{ PCI_VENDOR_SYMBIOS,	PCI_PRODUCT_SYMBIOS_MEGARAID_3108,
+	  MFI_IOP_TBOLT,	mfi_tbolt_subtypes },
 };
 
 const struct mfi_pci_device *
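
As a rough illustration of what the mfi.c hunks above do to the scatter/gather list, the following stand-alone sketch flags only the final element as end of list. The struct layouts, the helper name fill_sgl() and the flag value are placeholders chosen for the example, not taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value; the patch uses MPI25_IEEE_SGE_FLAGS_END_OF_LIST. */
#define SKETCH_SGE_END_OF_LIST 0x40

/* Simplified stand-in for one scatter/gather element (hypothetical layout). */
struct sketch_sge {
	uint64_t address;
	uint32_t length;
	uint8_t  flags;
};

/* Simplified stand-in for one DMA segment. */
struct sketch_seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Copy sge_count DMA segments into the list and flag only the last
 * element as end of list; every other element keeps flags == 0.
 */
static void
fill_sgl(struct sketch_sge *sgl, const struct sketch_seg *segs,
    size_t sge_count)
{
	assert(sge_count > 0);
	for (size_t i = 0; i < sge_count; i++) {
		sgl[i].address = segs[i].addr;
		sgl[i].length = segs[i].len;
		sgl[i].flags = (i == sge_count - 1) ?
		    SKETCH_SGE_END_OF_LIST : 0;
	}
}
```

The real code additionally splits the list at sge_idx and inserts a chain element; the sketch leaves that out to keep the end-of-list handling visible.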


Re: MegaRAID 3008/3108

2015-06-04 Thread Frank Kardel

On 06/03/15 20:27, Christos Zoulas wrote:

In article 20150603111042.4fad14b2@taliesin-2.local,
Harry Waddell  wadd...@caravaninfotech.com wrote:

On Tue, 2 Jun 2015 16:13:07 +0100 (BST)
Stephen Borrill net...@precedence.co.uk wrote:


Anyone working on adding support for SYMBIOS MEGARAID 3108 (0x1000/0x005d)
or 3008 (0x1000/0x005f)? These are supported in OpenBSD by the mfii driver
which also supports the MEGARAID 2208 (0x1000/0x005b). In NetBSD, the
mfi(4) driver was extended to support the 2208 (Thunderbolt) rather than
adding a new driver. The 3008/3108 will require another MFI_IOP type
(OpenBSD call it 25).

--
Stephen


I have a system with this on the motherboard, but I'm dropping an lsi
9261-i8 in
because the newer cards are not supported. My vendor has told me that
the 9261 is near EOL,
so it would be really helpful if someone could add support for the newer LSI
cards. Unfortunately, I don't have much experience in this area.

Shouldn't be too hard to do... As long someone has a card to test...

christos
One of our customer systems (Dell PowerEdge R730) has this card. I got 
it to work by adding the PCI IDs to the driver and crudely adjusting the 
Thunderbolt support to use EOM markers and to stop setting one flag. 
I/O seemed to be working (installation was OK and the system was running 
fine). Issues left were: abysmal I/O performance on SSDs (no non-SSDs 
were available) in the range of 5 - 40 MB/sec, averaging around 20 
MB/sec. Checking other OSes delivered: FreeBSD 10 - 5 MB/sec, OpenBSD - 420 
MB/sec slowly decreasing, Linux SuSe 13.2 - 525-490 MB/sec. So due to 
time constraints and this being a customer machine we went for the fastest. 
Patches (mis-using the MFI_IOP type for Thunderbolt) have been posted already. 
OpenBSD seems to have an additional change in the way I/O commands are 
handled.


Frank


drm/radeon + X + pkgsrc 2015Q1

2015-05-24 Thread Frank Kardel

Hi,

I just built 2015Q1/amd64 for 7.99.16.

The good news is KMS seems to work for my ATI Radeon HD 5450.
However, there seems to be quite some trouble with the X server when glx 
is not disabled. For a while everything seems to work smoothly (glxgears 
is fine, most glx screensavers work - great), but then the Xorg server
crashes with SIGSEGV. Unfortunately it is not predictable when. 
Sometimes it is enough to open the first or second window via ALT-F2. 
Sometimes the X server survives longer. The good thing is that the
system remains usable after the crash and kdm restarts the Xorg server. 
So the issue seems to be limited to the X server.


System amd64, 32Gb, 7.99.16 kernel+userland, pkgsrc-2015Q1, kde4.

Any ideas?

Frank




if_wm.c 1.410 sometimes hangs / sndq drops

2016-05-29 Thread Frank Kardel

With -current as of 20160526T13Z and if_wm.c 1.410
a stuck interface is observed on the following hardware:

wm1 at pci11 dev 0 function 0: Intel i82583V (rev. 0x00)
wm1: interrupting at ioapic0 pin 18
wm1: PCI-Express bus
wm1: 2048 words FLASH, version 1.10.0, Image Unique ID 
wm1: Ethernet address bc:5f:f4:98:32:84
makphy0 at wm1 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1

when rsyncing. The interface is on the sending side and
the sndq drops increase dramatically:

net.interfaces.wm1.rcvq.drops = 0
net.interfaces.wm1.sndq.len = 0
net.interfaces.wm1.sndq.maxlen = 256
net.interfaces.wm1.sndq.drops = 6077

ifconfig down/up recovers the interface.

Any ideas?

Frank


Re: nouveau under -current

2016-03-05 Thread Frank Kardel

FWIW, a Lenovo W510 also panics.

Frank
On 03/05/16 16:06, Tom Ivar Helbekkmo wrote:

Taylor R Campbell  writes:


I have a pretty similar machine with a pretty similar issue -- a T61p
with some kind of nvidia graphics (not sure the marketing number).  If
you see something about a uvm fault during nouveau_ramht_new, it's
probably the same issue: .

I get this too, on my Dell Latitude E6400 laptop.  Disabling nouveau
locally for now.

-tih




bananapi awge0 & dhcpcd issue observed in current-20160702

2016-07-03 Thread Frank Kardel

Hi *!

I am currently observing that dhcpcd does not seem to obtain an ip 
address at boot on awge0.


ifconfig shows status active, but no IP address. Shortly after the 
ifconfig the IP address is obtained though.


Doing ifconfig seems to get things unwedged. It looks like dhcpcd does 
not recognize the carrier for awge0 at first (until ifconfig).


Is anybody seeing similar behavior on current-20160702 ?

Frank


if_tun broken?

2016-09-02 Thread Frank Kardel

Hi !

When running a -current i386 kernel as of 20160902 (own and daily 
builds) and a 7.99.16 userland, the tun interface seems broken.


Data received via vtund (tty side) is seen and correctly received 
(routes are recognized, tcpdump works)


Interfaces seem correctly configured:
Gateway# ifconfig tun0
tun0: flags=51 mtu 1500
inet 10.0.0.200 -> 10.0.0.1 netmask 0x
Gateway# ifconfig tun1
tun1: flags=51 mtu 1500
inet 10.0.0.200 -> 10.200.100.1 netmask 0x

tundebug=1 gives:

tun0: tunpoll
tun0: tunpoll waiting
tun0: tunwrite
tun0: tun_output
tun0: not ready 03
tun0: tunpoll
tun0: tunpoll waiting
tun0: tunpoll
tun0: tunpoll waiting
tun0: tunwrite
tun0: tun_output
tun0: not ready 03

So the interface is not ready: TUN_IASET is not set. The 
IFADDR_READER_FOREACH(ifa, ifp) loop in tuncreate() does not find any 
addresses, though ifconfig lists them.
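
The "not ready 03" value can be read as a bit mask. The sketch below decodes it under the assumption that the low flag bits are laid out as in if_tun.h (open 0x01, inited 0x02, address set 0x08) and that output is only allowed when all three are present. The SK_ names and helper functions are hypothetical, and the values should be checked against the header of the kernel in question.

```c
#include <stdbool.h>

/* Assumed flag values, modeled on if_tun.h; verify against your tree. */
#define SK_TUN_OPEN   0x0001
#define SK_TUN_INITED 0x0002
#define SK_TUN_IASET  0x0008
#define SK_TUN_READY  (SK_TUN_OPEN | SK_TUN_INITED | SK_TUN_IASET)

/* True only when every bit required for output is present. */
static bool
sk_tun_is_ready(unsigned int flags)
{
	return (flags & SK_TUN_READY) == SK_TUN_READY;
}

/* Return the required bits that are missing from flags. */
static unsigned int
sk_tun_missing(unsigned int flags)
{
	return SK_TUN_READY & ~flags;
}
```

Under these assumptions a flags value of 03 has the open and inited bits set, and sk_tun_missing(03) leaves exactly the address-set bit, which matches the observation above.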


Thus something seems to be broken here - any ideas?
Must userland match the kernel, even though ifconfig output and behavior look fine?
Can this be replicated?

Best regards,
  Frank




-current 7.99.36 multicast panic: trap

2016-09-04 Thread Frank Kardel

Hi !

running the -current (7.99.36) with 7.99.16 userland reliably traps at:

src/sys/netinet/ip_mroute.c:1751

on a multi-homed i386 system running mrouted.

A 7.99.16 kernel survives that fine. Do we have a userland dependency or 
is this a new regression from the network stack multiprocessing changes?


Frank


wm WOL not working anymore

2016-10-22 Thread Frank Kardel

Hi !

There has been quite some work going on for wm interfaces.

When testing current kernels I found that some time after
if_wm.c:1.347 the WOL functionality has stopped working
on my ASRock 990FX Extreme 9 wm interfaces (PHYs are down
after "shutdown -p").

Compiling if_wm.c with "options WM_WOL" leads to
compilation errors (defined, but not used).

So currently I gather that WOL on wm is work in
progress - am I right ?

dmesg snippets:
wm0 at pci6 dev 0 function 0: Intel i82572EI 1000baseT Ethernet (rev. 0x06)
wm0: interrupting at ioapic1 pin 23
wm0: PCI-Express bus
wm0: 2048 words (16 address bits) SPI EEPROM, version 5.11.8, Image 
Unique ID 

wm0: Ethernet address 00:1b:21:xx:yy:zz
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto


wm1 at pci13 dev 0 function 0: Intel i82583V (rev. 0x00)
wm1: interrupting at ioapic0 pin 18
wm1: PCI-Express bus
wm1: 2048 words FLASH, version 1.10.0, Image Unique ID 
wm1: Ethernet address bc:5f:f4:xx:yy:zz
makphy0 at wm1 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto


Frank


Re: strange observations on network configuration (ifconfig)

2016-12-11 Thread Frank Kardel

Hi,

thanks for your reply.

As for backward compatibility: An old userland routed works ok with a 
new kernel.


The log for an added interface for routed for 7.99.16 on a 7.99.42 
kernel is:

-- 10:38:43 --
Recv RIPv2 RESPONSE from 10.200.1.1.520 via wm1
0.0.0.0metric=9
10.0.0.0   metric=1
10.0.0.128/32  metric=2
...
-- 10:38:45 --
Add interface wm0  A.B.C.38   -->A.B.C.38/29 
ChgA.B.C.32/29 -->A.B.C.38 metric=16 wm0 10:35:43 <>
   metric=0  10:38:45 
note RTM_NEWADDR with flags 0x100 for unknown interface index #0
ignore RTM type 0x16 without dst
### the new version of RTM_NEWADDR (RTM_ONEWADDR now)
ignore ARP RTM_ADD from pid 19889: A.B.C.38/32
### here the old routed still detects and ignores the loopback route
RTM_ADD from pid 19889: A.B.C.32/29 --> A.B.C.38
### correct network route
-- 10:38:56 --
send all routes and inhibit dynamic updates for 2.637 sec

As the 'old' routed still manages the routes, the routed for the new kernel should 
probably follow the previous behavior.


Looking at the routed diffs related to "ignore ARP RTM_ADD from pid 
19889: A.B.C.38/32"

I find the following commit:
Index: table.c
===
RCS file: /cvsroot/src/sbin/routed/table.c,v
retrieving revision 1.24
retrieving revision 1.25
diff -u -r1.24 -r1.25
--- table.c 26 Oct 2009 02:53:15 -  1.24
+++ table.c 4 Apr 2016 07:37:07 -   1.25
@@ -1,4 +1,4 @@
-/* $NetBSD: table.c,v 1.24 2009/10/26 02:53:15 christos Exp $  */
+/* $NetBSD: table.c,v 1.25 2016/04/04 07:37:07 ozaki-r Exp $   */

 /*
  * Copyright (c) 1983, 1988, 1993
@@ -36,7 +36,7 @@
 #include "defs.h"

 #ifdef __NetBSD__
-__RCSID("$NetBSD: table.c,v 1.24 2009/10/26 02:53:15 christos Exp $");
+__RCSID("$NetBSD: table.c,v 1.25 2016/04/04 07:37:07 ozaki-r Exp $");
 #elif defined(__FreeBSD__)
 __RCSID("$FreeBSD$");
 #else
@@ -1106,12 +1106,6 @@
|| INFO_DST()->sa_family != AF_INET)
continue;

-   /* ignore ARP table entries on systems with a merged route
-* and ARP table.
-*/
-   if (rtm->rtm_flags & RTF_LLINFO)
-   continue;
-
/* ignore cloned routes
 */
 #if defined(RTF_CLONED) && defined(__bsdi__)
@@ -1273,11 +1267,6 @@
continue;
}

-   if (m.r.rtm.rtm_flags & RTF_LLINFO) {
-   trace_act("ignore ARP %s", str);
-   continue;
-   }
-
 #if defined(RTF_CLONED) && defined(__bsdi__)
if (m.r.rtm.rtm_flags & RTF_CLONED) {
trace_act("ignore cloned %s", str);

Could that be related to the observed behavior (especially the if 
(rtm->rtm_flags & RTF_LLINFO) continue;)?

The RTF_LLINFO is set when looking at the route monitor trace.

Best regards,
  Frank


On 12/11/16 09:04, Ryota Ozaki wrote:

Hi,

Thank you for the report.

On Mon, Dec 5, 2016 at 11:47 PM, Frank Kardel <kar...@netbsd.org> wrote:

Hi !

when trying out a -current from 20161127 (7.99.42) I see issues with routed.

On configuration of an interface address A.B.C.D/m the local network address
A.B.C.D is correctly entered with a loopback host route for the local
address
in the routing table.
Also the network route via the interface is correctly entered in the table.

As soon as routed detects the new interface it seems to miss the loopback
host route for the local address and consequently decides to remove the
loopback host route from the kernel routing table,

route monitor output:
got message of size 160 on Mon Dec  5 15:10:49 2016
RTM_CHANGE: Change Metrics, Flags or Gateway: len 160, pid 25290, seq 1,
errno 0, flags: <GATEWAY,DONE>
locks: none inits: 
sockaddrs: <DST,GATEWAY,NETMASK>
  default 10.200.1.1 0.0.0.0
got message of size 96 on Mon Dec  5 15:10:52 2016
RTM_ONEWADDR: address being added to iface: len 96, pid 2, seq 0, errno 528,
flags: 
locks: <sendpipe,mtu> inits: none
got message of size 104 on Mon Dec  5 15:10:52 2016
RTM_NEWADDR: address being added to iface: len 104, metric 0, flags:

sockaddrs: <NETMASK,IFP,IFA,BRD>
  255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 default
### new address (tentative)
got message of size 160 on Mon Dec  5 15:10:52 2016
RTM_ADD: Add Route: len 160, pid 4878, seq 0, errno 0, flags:
<UP,HOST,LLINFO,LOCAL>
locks: none inits: none
sockaddrs: <DST,GATEWAY>
  A.B.C.38 link#2
### local address loopback link
got message of size 208 on Mon Dec  5 15:10:52 2016
RTM_ADD: Add Route: len 208, pid 4878, seq 0, errno 0, flags:
<UP,DONE,CONNECTED>
locks: none inits: none
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
  A.B.C.32 link#2 255.255.25

Re: strange observations on network configuration (ifconfig)

2016-12-11 Thread Frank Kardel

Hi !

Reverting that change (1.24->1.25)  and using RTF_LLDATA instead of 
RTF_LLINFO seems to solve the problem.

Is this correct or am I overlooking something?
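
To make the idea concrete, here is a minimal sketch of the kind of test that revision 1.25 removed, with the flag name swapped as described above. The flag values and the helper name are stand-ins for illustration (the real definitions live in net/route.h), so this is not the actual routed code.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration; real ones come from <net/route.h>. */
#define SK_RTF_UP     0x0001
#define SK_RTF_HOST   0x0004
#define SK_RTF_LLDATA 0x0400

/*
 * Decide whether a routing-socket message should be skipped because it
 * describes a link-layer (ARP-style) entry rather than a route that
 * routed should manage.
 */
static bool
sk_skip_link_layer_route(int rtm_flags)
{
	return (rtm_flags & SK_RTF_LLDATA) != 0;
}
```

A message flagged UP, HOST and LLDATA would be skipped, while a plain UP network route would still be processed.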

Frank


Re: strange observations on network configuration (ifconfig)

2016-12-12 Thread Frank Kardel

Hi,

I did try that before and verified it again. Now routed attempts to 
install a local route for the lo0 interface and fills the log with 
EEXIST messages.

That's why I went for LLDATA, in order to avoid having to analyse routed's 
inner workings completely.

Maybe we need a different test for ignoring kernel routing messages.

Here is the error message from the log:
2016-12-12T09:08:06.522364+01:00 pip routed 10002 - - write(rt_sock) 
RTM_ADD127.0.0.1/32-->127.0.0.1 metric=0 flags=0: File exists


Here is the trace for the failed route insert attempt:
Tracing actions started
Tracing packets started
Tracing packet contents started
Tracing kernel changes started
Add interface lo0  127.0.0.1  -->127.0.0.1/32  
RCVBUF=61440
Add interface wm1  10.200.1.2 -->10.200.1.0/24 
turn on RIP
Add10.200.1.0/24   -->10.200.1.2   metric=0  wm1 
Add127.0.0.1/32-->127.0.0.1metric=0  lo0 
Send mcast RIPv2 REQUEST to 224.0.0.9.520 via wm1
QUERY
write(rt_sock) RTM_ADD127.0.0.1/32-->127.0.0.1 metric=0 flags=0: 
File exists

-- 09:08:06 --

The other part of the patch (not deleting loopback routes for local 
addresses) works.


Frank
On 12/12/16 01:36, Ryota Ozaki wrote:

Hi,

Thank you for the investigation.

On Sun, Dec 11, 2016 at 9:08 PM, Frank Kardel <kar...@netbsd.org> wrote:

Hi !

Reverting that change (1.24->1.25)  and using RTF_LLDATA instead of
RTF_LLINFO seems to solve the problem.
Is this correct or am I overlooking something?

Local routes aren't actually link-layer routes; RTF_LLDATA remain in them
for backward compatibility, IIRC. So as you said if old routed works on
a new kernel, I think it is good to fix routed as I proposed in my earlier
mail.

Could you try the patch?
   http://www.netbsd.org/~ozaki-r/fix-routed.diff

Thanks,
   ozaki-r




strange observations on network configuration (ifconfig)

2016-12-05 Thread Frank Kardel

Hi !

when trying out a -current from 20161127 (7.99.42) I see issues with routed.

On configuration of an interface address A.B.C.D/m the local network address
A.B.C.D is correctly entered with a loopback host route for the local address
in the routing table.
Also the network route via the interface is correctly entered in the table.

As soon as routed detects the new interface it seems to miss the loopback
host route for the local address and consequently decides to remove the
loopback host route from the kernel routing table,

route monitor output:
got message of size 160 on Mon Dec  5 15:10:49 2016
RTM_CHANGE: Change Metrics, Flags or Gateway: len 160, pid 25290, seq 1, errno 0, 
flags: 
locks: none inits: 
sockaddrs: 
 default 10.200.1.1 0.0.0.0
got message of size 96 on Mon Dec  5 15:10:52 2016
RTM_ONEWADDR: address being added to iface: len 96, pid 2, seq 0, errno 528, flags: 

locks:  inits: none
got message of size 104 on Mon Dec  5 15:10:52 2016
RTM_NEWADDR: address being added to iface: len 104, metric 0, flags: 
sockaddrs: 
 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 default
### new address (tentative)
got message of size 160 on Mon Dec  5 15:10:52 2016
RTM_ADD: Add Route: len 160, pid 4878, seq 0, errno 0, flags: 

locks: none inits: none
sockaddrs: 
 A.B.C.38 link#2
### local address loopback link
got message of size 208 on Mon Dec  5 15:10:52 2016
RTM_ADD: Add Route: len 208, pid 4878, seq 0, errno 0, flags: 

locks: none inits: none
sockaddrs: 
 A.B.C.32 link#2 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38
### net route via interface
got message of size 160 on Mon Dec  5 15:10:52 2016
RTM_DELETE: Delete Route: len 160, pid 25290, seq 2, errno 0, flags: 

locks: none inits: none
sockaddrs: 
 A.B.C.38 link#2
### routed deletes local address loopback link
got message of size 88 on Mon Dec  5 15:10:57 2016
RTM_ONEWADDR: address being added to iface: len 88, pid 2, seq 0, errno 520, flags: 

locks:  inits: 
got message of size 96 on Mon Dec  5 15:10:57 2016
RTM_NEWADDR: address being added to iface: len 96, metric 0, flags: 

sockaddrs: 
 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 A.B.C.39
### address finally valid

[BTW: routed/table.c contains an out of date RTM_* number to string table - 
fixed in output below]

Trace from routed:
Tracing actions started
Tracing packets started
Tracing packet contents started
Tracing kernel changes started
Add interface lo0  127.0.0.1  -->127.0.0.1/32 
RCVBUF=61440
Add interface wm1  10.200.1.2 -->10.200.1.0/24   
turn on RIP
Add10.200.1.0/24   -->10.200.1.2   metric=0  wm1 
Add127.0.0.1/32-->127.0.0.1metric=0  lo0 
### initial interface state
Send mcast RIPv2 REQUEST to 224.0.0.9.520 via wm1
QUERY
-- 15:10:46 --
Recv RIPv2 REQUEST from 10.200.1.2.520 via wm1
QUERY
discard our own RIP request
-- 15:10:46 --
Recv RIPv2 RESPONSE from 10.200.1.1.520 via wm1
0.0.0.0metric=9
10.0.0.0   metric=1
10.0.0.128/32  metric=2
...
Add0.0.0.0 -->10.200.1.1   metric=9  wm1 15:10:46
Add10.0.0.0-->10.200.1.1   metric=1  wm1 15:10:46
Add10.0.0.128/32   -->10.200.1.1   metric=2  wm1 15:10:46
...
### received routing information

-- 15:10:47 --
Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0
-- 15:10:48 --
write kernel RTM_CHANGE 0.0.0.0 -->10.200.1.1  metric=9 flags=0x2
-- 15:10:50 --
Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0
-- 15:10:50 --
ignore RTM_ONEWADDR without dst
### old routing messages are not properly skipped?

Add interface wm0  A.B.C.38   -->A.B.C.32/29 
AddA.B.C.32/29 -->A.B.C.38 metric=0  wm0 
### new interface due to ifconfig wm0 A.B.C.D/29

note RTM_NEWADDR with flags 0x100 for unknown interface index #180
### RTM_NEWADDR not properly handled/skipped

RTM_ADD from pid 4878: A.B.C.38/32 --> A.B.C.38
RTM_ADD from pid 4878: A.B.C.32/29 --> A.B.C.38
-- 15:10:51 --
write kernel RTM_DELETE A.B.C.38/32 -->A.B.C.38metric=0 flags=0
### routed does not seem to consider the A.B.C.38/32 -->A.B.C.38 (if=lo0, 
gw=link#2) as being valid

-- 15:10:53 --
Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0
-- 15:10:53 --
ignore RTM_ONEWADDR without dst
note RTM_NEWADDR with flags 0x101 for unknown interface index #180

netstat -nrf inet shows directly after setting the local address:
Routing tables

Internet:
DestinationGatewayFlagsRefs  UseMtu Interface
default10.200.1.1 UG  --  -  wm1
10.200.1/24link#3 UC  --  -  wm1

Re: panic: kernel diagnostic assertion "next != _PSLIST_POISON"

2017-03-14 Thread Frank Kardel

Hmm, I think ch_voltag_convert_in() is a red herring.

Both panic messages are consistent with the upper frames of their 
respective stack traces. So I would disregard the ch_voltag_convert_in() 
frame here and conclude that these are two distinct panics: one relates to 
psref corruption in network code, the other to wapbl and possibly the 
recent mount update (-u) changes.
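
One plausible reason why the same unrelated function can show up directly after vpanic() in two different traces (an assumption, not verified on these kernels): a backtrace usually maps each return address to the nearest preceding symbol. If a call to a function that never returns is the last instruction of its caller, the return address already lies at the start of whatever function the linker placed next. The sketch below illustrates that lookup; the symbol table, names and addresses are all made up.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_sym {
	uintptr_t   start;	/* first address of the function */
	const char *name;
};

/* Made-up symbol table, sorted by start address. */
static const struct sk_sym sk_symtab[] = {
	{ 0x1000, "some_caller" },		/* occupies 0x1000..0x10ff */
	{ 0x1100, "unrelated_next_func" },	/* placed directly after it */
	{ 0x1200, "another_func" },
};

/* Return the name of the symbol with the greatest start <= pc, or NULL. */
static const char *
sk_lookup(uintptr_t pc)
{
	const char *best = NULL;
	size_t n = sizeof(sk_symtab) / sizeof(sk_symtab[0]);

	for (size_t i = 0; i < n && sk_symtab[i].start <= pc; i++)
		best = sk_symtab[i].name;
	return best;
}
```

If some_caller ends with a call to a non-returning function, the saved return address is 0x1100, and sk_lookup(0x1100) reports unrelated_next_func at offset 0. That would fit the offset-less ch_voltag_convert_in frames in both traces.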

Other ideas ?

Frank

On 03/14/17 08:56, Masanobu SAITOH wrote:

Hi.

On 2017/03/14 16:36, Frank Kardel wrote:

Has anyone seen this panic recently?

Seen in -current-20170311, i386, Soekris 6501.

panic: kernel diagnostic assertion "next != _PSLIST_POISON" failed: 
file "/fs/raid2a/src/NetBSD/cur/src/sys/sys/pslist.h", line 270

cpu0: Begin traceback...
vpanic(c0cb1784,dba43dac,dba43e2c,c09e0d1e,c0cb1784,c0cb16d3,c0cb681b,c0cb6458,10e,a8) 
at netbsd:vpanic+0x121
ch_voltag_convert_in(c0cb1784,c0cb16d3,c0cb681b,c0cb6458,10e,a8,0,c3d70578,c09e0988,c3d70348) 
at netbsd:ch_voltag_convert_in
sysctl_iflist(4,cbd8cf60,c7,cbd8cff9,c33c06c0,c7,c090f986,0,cbd8cf60,a43e90) 
at c09e0d1e
sysctl_rtable(dba43f0c,3,afe01000,dba43efc,0,0,dba43f00,c3de1560,c3c11c0c,3) 
at c09e129c
sysctl_dispatch(dba43f00,6,afe01000,dba43efc,0,0,dba43f00,c3de1560,c3c11c0c,dba43efc) 
at netbsd:sysctl_dispatch+0xbd
sys___sysctl(c3de1560,dba43f68,dba43f60,7dd51000,c3de1560,dba43f60,dba43f68,0,0,b0094fb0) 
at netbsd:sys___sysctl+0xe3

syscall() at netbsd:syscall+0x257
--- syscall (number 202) ---
b00736f7:
cpu0: End traceback...

Frank


Yesterday I sent the following mail to current-users@ but it hasn't
been delivered yet...


 I updated my machine's kernel which was made from 1 hour ago's
-current source. It paniced. It's reproducible.


/dev/rwd0a: file system is clean; not checking
panic: kernel diagnostic assertion "!(bp->b_oflags & BO_DELWRI)" 
failed: file "../../../../kern/vfs_wapbl.c", line 1142

fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 0x80215455 cs 0x8 rflags 0x246 cr2 
0x770e1f2ae190 ilevel 0 rsp 0xfe8120956b00

curlwp 0xfe847b8820a0 pid 30.1 lowest kstack 0xfe81209532c0
Stopped in pid 30.1 (mount_ffs) at netbsd:breakpoint+0x5:  leave
db{15}> trace
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
wapbl_add_buf() at netbsd:wapbl_add_buf+0x133
bdwrite() at netbsd:bdwrite+0xbd
bwrite() at netbsd:bwrite+0x95
ffs_sbupdate() at netbsd:ffs_sbupdate+0x1b9
ffs_wapbl_start() at netbsd:ffs_wapbl_start+0x177
ffs_mount() at netbsd:ffs_mount+0x4e9
VFS_MOUNT() at netbsd:VFS_MOUNT+0x34
do_sys_mount() at netbsd:do_sys_mount+0x5ee
sys___mount50() at netbsd:sys___mount50+0x33
syscall() at netbsd:syscall+0x1ed
--- syscall (number 410) ---
770e1f28989a:
db{15}>


 At least five days ago's kernel worked without this problem.


Both panics include ch_voltag_convert_in()





netbsd-8 dhcpd dynamic dns updates working?

2017-07-16 Thread Frank Kardel

Has anyone seen dns dynamic updates from dhcpd working?

A previously working config (working in 7.99.71) does not seem to do DNS 
dynamic updates on my netbsd-8 installation.

Could this be a fall-out/regression from updating dhcpd or bind? dhcpd 
relies on libdns from bind, and debugging shows that libdns returns 
ISC error 35 (ALREADYRUNNING).


Frank


AMD Ryzen and NetBSD?

2017-06-30 Thread Frank Kardel

Hi,

has anybody had any experience with -current on the new AMD generation 
like Ryzen 7 1800X? Is there any motherboard that booted up and if so 
what devices were supported?


Best regards,
  Frank


Re: NetBSD -current/-8 EFI CDROM boot on SuperMicro A2SDi broken

2017-10-22 Thread Frank Kardel

I don't think that is the problem. It is not a manually built
GPT boot partition, but the stock UEFI install image.

I debugged the "bad partition" path in the boot loader.
I put a NetBSD-8.0_BETA-amd64-uefi-install.img.gz image on a CD via a real USB 
DVD reader/writer.
A bootx64.efi compiled with -DDISK_DEBUG shows "illegal partition". This 
happens because the EFI environment on that system claims that the CDROM 
device is a HARD_DISK. Thus stand/lib/biosdisk.c:biosdisk_open() exits 
early due to a wrongly created default label in read_partitions().

int
biosdisk_open(struct open_file *f, ...)
/* struct open_file *f, int biosdev, int partition */
{
	va_list ap;
	struct biosdisk *d;
	int biosdev;
	int partition;
	int error = 0;

	va_start(ap, f);
	biosdev = va_arg(ap, int);
	d = alloc_biosdisk(biosdev);
	if (d == NULL) {
		error = ENXIO;
		goto out;
	}

	partition = va_arg(ap, int);
#ifdef _STANDALONE
	bi_disk.biosdev = d->ll.dev;
	bi_disk.partition = partition;
	bi_disk.labelsector = -1;

	bi_wedge.biosdev = d->ll.dev;
	bi_wedge.matchblk = -1;
#endif

#if !defined(NO_DISKLABEL) || !defined(NO_GPT)
	error = read_partitions(d);
	if (error == -1) {
		error = 0;
		goto nolabel;
	}
	if (error)
		goto out;

	if (partition >= BIOSDISKNPART ||
	    d->part[partition].fstype == FS_UNUSED) {
#ifdef DISK_DEBUG
		printf("illegal partition\n");
#endif
		error = EPART;		<<<< here we start returning
		goto out;
	}
...

If just the kernel were missing, I would have expected a 'not found' 
instead of 'illegal partition'.


Did I miss something? Is the UEFI install image known to work elsewhere?

Frank

On 10/22/17 19:14, Michael van Elst wrote:

kar...@netbsd.org (Frank Kardel) writes:


booting hd0a:netbsd - starting in 0 seconds
open netbsd: bad partition


hd0a is the EFI partition. Did you copy the kernel there? If the
root partition follows next, then you need to boot hd0b:netbsd.





Re: NetBSD -current/-8 EFI CDROM boot on SuperMicro A2SDi broken

2017-10-22 Thread Frank Kardel

I debugged something more.

The BIOS claims the CDROM device is a hard disk - so all our bootloader 
assumptions about no labels on non-harddisks (CDROM, FLOPPY) don't hold. 
Thus biosdisk_open operated on a fake label with block offset 0 and thus 
no access to the file system.


Then I put an install-image (8.99.4, -8.0_BETA) on a USB stick. Here I 
get into the boot loader - good. booting loads the kernel ... and then: 
nothing


Famous last words:


Welcome to the NetBSD/amd64 8.99.4 installation image
===

ACPI (Advanced Configuration and Power Interface) should work on all modern
and legacy hardware.  However if you do encounter a problem while booting,
try disabling it and report a bug at http://www.NetBSD.org/.

 1. Install NetBSD
 2. Install NetBSD (no ACPI)
 3. Install NetBSD (no ACPI, no SMP)
 4. Drop to boot prompt

Choose an option; RETURN for default; SPACE to stop countdown.
Option 1 will be chosen in 0 seconds.
18240704+815212+1281940 [1192536+820900+12704]=0x15ef580
[-STALLED/STUCK/CRASHED/WHATEVER-]

Nothing on VGA (consdev pc) or the SOL redirected com ports 
(com,0x...,115200). Looks like early trouble in startup or EFI.


Same behavior with -8.0_BETA from yesterday.

So I think we are looking at two issues:

1) the bootloader fails when the device type from BIOS is HARD_DISK for CDROMs.

2) starting the kernel from the EFI bootloader does not seem to 
work on this machine (possibly others)


Frank



On 10/22/17 09:12, Martin Husemann wrote:

On Sat, Oct 21, 2017 at 07:11:38PM +0200, Frank Kardel wrote:

Manual transcription:

NetBSD/x86 EFI Boot (x86), Revision 1.0 (..) (from NetBSD 8.0_BETA)
Memory: 252/1971496 k

Press...
booting hd0a:netbsd - starting in 0 seconds
open netbsd: bad partition

It would probably help to see your partition information.

Martin




Re: Automated report: NetBSD-current/i386 test failure

2017-12-12 Thread Frank Kardel
That may also be related to the symptoms I see when booting 8.99.9 on 
amd64 while dhcpcd attempts to solicit a router - it gets stuck in route 
code on psref_release() - needless to say the machine is wedged at that 
point.


Frank

db{0}> mach cpu 1
using CPU 1
db{0}> bt
psref_release() at netbsd:psref_release+0x90
route_output() at netbsd:route_output+0xcb5
route_send_wrapper() at netbsd:route_send_wrapper+0x6c
sosend() at netbsd:sosend+0x7b9
soo_write() at netbsd:soo_write+0x2c
dofilewrite() at netbsd:dofilewrite+0x97
sys_write() at netbsd:sys_write+0x5f
syscall() at netbsd:syscall+0x235
--- syscall (number 4) ---
db{0}> mach cpu 2
using CPU 2
db{0}> bt
x86_pause() at netbsd:x86_pause+0x2
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
Xintr_x2apic_edge1() at netbsd:Xintr_x2apic_edge1+0xef
--- interrupt ---
x86_pause() at netbsd:x86_pause+0x2
sleepq_block() at netbsd:sleepq_block+0x1c5
cv_timedwait() at netbsd:cv_timedwait+0x104
ipmi_thread() at netbsd:ipmi_thread+0x2f4
db{0}> mach cpu 3
using CPU 3
db{0}> bt
x86_pause() at netbsd:x86_pause+0x2
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
Xintr_x2apic_edge3() at netbsd:Xintr_x2apic_edge3+0xef
--- interrupt ---
x86_pause() at netbsd:x86_pause+0x2
nd6_timer_work() at netbsd:nd6_timer_work+0x3d
workqueue_worker() at netbsd:workqueue_worker+0xbc
db{0}> mach cpu 0
using CPU 0
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x746
Xintr_x2apic_edge2() at netbsd:Xintr_x2apic_edge2+0xef
--- interrupt ---
x86_pause() at netbsd:x86_pause+0x2
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
Xintr_x2apic_edge20() at netbsd:Xintr_x2apic_edge20+0xef
--- interrupt ---
x86_pause() at netbsd:x86_pause+0x2
softint_dispatch() at netbsd:softint_dispatch+0x15a
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xe4011d1eeff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---

Frank

On 12/12/17 10:29, NetBSD Test Fixture wrote:

This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

 fs/vfs/t_mtime_otrunc:puffs_otrunc_mtime_update
 net/route/t_change:route_change_ifp
 net/route/t_change:route_change_ifp_ifa

The above tests failed in each of the last 3 test runs, and passed in
at least 27 consecutive runs before that.

The following commits were made between the last successful test and
the failed test:

 2017.12.11.02.33.17 knakahara src/sys/kern/subr_psref.c,v 1.8
 2017.12.11.02.33.17 knakahara src/sys/sys/psref.h,v 1.3
 2017.12.11.03.25.45 ozaki-r src/sys/net/if.c,v 1.412
 2017.12.11.03.25.45 ozaki-r src/sys/net/if.h,v 1.252
 2017.12.11.03.25.46 ozaki-r src/sys/net/npf/npf_ifaddr.c,v 1.3
 2017.12.11.03.25.46 ozaki-r src/sys/net/npf/npf_os.c,v 1.9
 2017.12.11.03.29.20 ozaki-r src/sys/net/if.c,v 1.413
 2017.12.11.03.29.20 ozaki-r src/sys/net/if.h,v 1.253
 2017.12.11.03.29.20 ozaki-r src/sys/net/if_bridge.c,v 1.145
 2017.12.11.03.29.20 ozaki-r src/sys/net/if_spppsubr.c,v 1.177
 2017.12.11.03.29.20 ozaki-r src/sys/net/if_vlan.c,v 1.119
 2017.12.11.03.29.20 ozaki-r 
src/sys/rump/net/lib/libnetinet/netinet_component.c,v 1.10

Log files can be found at:

 
http://releng.NetBSD.org/b5reports/i386/commits-2017.12.html#2017.12.11.03.29.20



NetBSD -current/-8 EFI CDROM boot on SuperMicro A2SDi broken

2017-10-21 Thread Frank Kardel

I just got a A2SDi-2C-HLN4F on my desk.

I have not been able to boot NetBSD-current/-8 on this device with 
following symptoms:

1) pxeboot is loaded successfully via network and ignored
2) loading efibootx64.efi via network works and runs - failing to access 
any file systems (bad partition).
   dev never finishes (though it lists some disks with partitions - USB, 
simulated USB cdrom from iKVM)
3) booting a physical CD from a USB drive also starts the EFI 
bootloader that then fails with 'bad partition' on all tried file names.


Thus 1) shows that this board does not care much about non EFI boot.
2 & 3) show that our EFI bootloader needs some care (8.99.4 and 8_BETA).

Manual transcription:
>> NetBSD/x86 EFI Boot (x86), Revision 1.0 (..) (from NetBSD 8.0_BETA)
>> Memory: 252/1971496 k
Press...
booting hd0a:netbsd - starting in 0 seconds
open netbsd: bad partition

same for netbsd.gz, onetbsd, onetbsd.gz, netbsd.old, netbsd.old.gz.

Any ideas how to proceed? Currently, I try to find out where and why 
EPART is returned.


BTW: the penguin just runs

Best regards,
  Frank


lockup in -current 8.99.18

2018-05-20 Thread Frank Kardel

Hi !

I just tried to upgrade from 8.99.14 to 8.99.18 (amd64).

Sadly this version does lock up (no console input, no net, no disk 
activity) when running the DB phase of bacula (backup program). Seems 
the intensive i/o of postgres triggers this lock up.


Unfortunately I cannot enter DDB via the PS2 keyboard - so no further 
information on system state is available.


Is anybody else experiencing lock up with 8.99.18 and normal operation i/o ?

Best regards,

  Frank



Lockups in a Ryzen 7 1800X ASUS Crosshair Hero VI system

2018-02-05 Thread Frank Kardel

I am currently collecting information on failure modes on a
Ryzen 7 1800X ASUS Crosshair Hero VI system 64Gb.

Since August 2017 this system has been unreliable to put it nicely.
Most of the time it just seemed locked up. The CPU was replaced by AMD 
due to the SEGFAULT issue. Now as there is a new AGESA available with 
the BIOS I try to collect information about the failure.


Most of the time the system just seems to lock up hard from what one can 
determine.

No DDB access - only USB keyboard.

There have been 3 cases where I was able to enter DDB and collect some 
more information- seems like the machine is not always completely dead:


First time:
many processes stuck around VN_LOGK()/biglock while pkgsrc 
bulk-building on null- and tmpfs-mounted sandboxes.

no dump - dump partition too small

Second time:
panic: locking against myself - I forgot where because the output 
started to scroll away. Running pkgsrc bulk builds with fewer null 
mounts and no tmpfs mounts on build area.

no dump - DDB looped outputting garbage.

Third time:
lots of processes stuck in pmap_update() - while running parallel bacula 
backups.

DDB showed CPUs 12 and 13 as not paused
dump available
ps and active process back traces below:

So are we still having locking issues (kernel from 20180130/amd64 is 
with LOCKDEBUG,DIAGNOSTIC and DEBUG)?


pkgsrc bulk-builds (18 builders) lockup happened repeatedly around 
22:30h into the pbulk-bulkbuild run - might suggest a SW issue.


BTW: DDB pagination does not seem to work with the USB keyboard.

Dump information below:

Crash version 8.99.12, image version 8.99.12.
System panicked: reboot forced via kernel debugger
Backtrace from time of crash is available.

  PID  LID S CPU FLAGS STRUCT LWP *         NAME WAIT
 6976    1 3  14   902 e40f94bc5580      dovecot tstile
10954    1 3   4   802 e40ec556ca80       master tstile
 9331 >  1 7   8   802 e40a251e0040 send2display
 7442    1 2   4   802 e40eafa0c2a0           sh
 8109    1 2  12   802 e40f62ae0100           sh
 7540    1 2   8   802 e40fe76a4480         cron
 3838    1 3  14   902 e40fecee4180       tlsmgr tstile
 4964    1 2  11   802 e40a251e0880     postgres
 4591    1 2   0   802 e40fecee49c0     postgres
 2379    1 2   0   802 e40fe76a48a0     postgres
 3944    1 2   0   802 e40e8a038840     postgres
 4121    1 2  15   802 e40e5e7c0360     postgres
 3369    1 2   0   802 e40feec63500     postgres
 1931    1 2   0   802 e40f97458200     postgres
 1808    1 2   0   802 e40feecea0a0     postgres
 2868    1 2   5   802 e40f62ae0520         imap
 4663    1 2   0   802 e40fe932c700   imap-login
 4582    1 2   3   802 e40edbfad5e0         imap
 4280    1 2   0   802 e40fe7ded120   imap-login
  658    1 2  14   802 e40f89507900         imap
 2346    1 2   9   802 e40f8cba3340   imap-login
  522    1 3   3   802 e40f439ad9a0         imap tstile
 3165    1 3   0   802 e40f8c294440         tcsh tstile
 2906    1 2   0   802 e40f62ae0940       systat
 2300    1 2   1   802 e40f632e00e0         tcsh
 1029    1 2   0   802 e40f632e0920         tcsh
  137    2 2   0   802 e40feecea4c0     bconsole
  137    1 2   0   802 e40feede34a0     bconsole
 3078    1 2  10   802 e40f895070c0         tcsh
 2939    1 3  14   802 e40faaccc1e0         tcsh tstile
  336    2 2   0   802 e40feede3080    kdm_greet
  336    1 2   0   802 e40fe7ded960    kdm_greet
 2573    1 2   0   802 e40fe78fb080          kdm
  328    2 2   0   902 e40feecea8e0            X
  328    1 2   0   802 e40fe57650c0            X
 2892    1 2   0   802 e40fe67d8320          kdm
 2577    1 2   0   802 e40f94440240        getty
 2936    1 2   0   802 e40f8fd69700        getty
 2747    1 2   7   802 e40feede38c0        getty
  633   28 2  13   802 e40fa8ddd6e0   bacula-dir
  633   26 2   0   802 e40eb258d9e0   bacula-dir
  633   23 2   0   802 e40eafa0c6c0   bacula-dir
  633   22 2   0   802 e40eafa0cae0   bacula-dir
  633   21 2   0   802 e40f61109560   bacula-dir
  633   20 2   0   802 e40f94440660   bacula-dir
  633   19 2   0   802 e40fa8ddd2c0   bacula-dir
  633   18 2   0   802 e40fa71229e0   bacula-dir
  633   16 2   6   802 e40ec556c240   bacula-dir
  633   14 2   0   802

Re: Lockups in a Ryzen 7 1800X ASUS Crosshair Hero VI system

2018-02-11 Thread Frank Kardel

On 02/06/18 13:16, m...@netbsd.org wrote:

upon further reading it's probably not related but it bugs me that this
patch/similar hack is still not in.


Yes - I have been running with XSAVEOPT disabled since mlelstx sent that 
observation. I still have lockups - the Ryzen machine is difficult as 
most of the time I cannot get into the debugger, thus I cannot decide 
whether it is a hard (CPU/system) lockup or just a software bug.


I have seen tstile related lockups on another 8.99.9 machine that ceases 
network operation at that point and processes pile up on tstiles when 
accessing the network. So at least one locking issue seems to be there.


Frank



Re: npf in -current amd64 (7 Mar 2018) now cannot use a "ruleset" multiple times

2018-03-10 Thread Frank Kardel

Hi!

It may be a fix/safeguard for a reload problem I discussed with rmind@ 
and christos@ in May 2017:


To quote my analysis:

OK, I got the time to dig into it and found the cause.

Silly me has (now had) a configuration like this:

group "a" on if1 {
  ruleset "blacklistd"
  ...
}

group default {
  ruleset "blacklistd"
  ...
}

This leads to two group entries in the rs_dynamic list referencing the same named 
dynamic ruleset.
Dynamic rulesets are matched by name on a first-found basis.
rs_dynamic[0] has the ruleset reference for the default group (last entry in 
the config file).
rs_dynamic[1] has the ruleset reference for the interface specific group.
When reloading, oldconfig->rs_dynamic[0] will be re-parented correctly to the new first 
group referencing the rule name "blacklistd".
For the second new interface specific group the old first group is looked up as 
parent (first match of old group list). That's when
the KASSERT fires as these dynamic rules have already been re-parented to the 
new configuration.

This also explains why I see no effect of the dynamic rules. dynamic rules are 
added by name and thus to the first matching entry. This is the ruleset 
reference from the default group - bummer: one name but two instances.
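To illustrate the "one name but two instances" effect, here is a minimal userland sketch of a first-match-by-name lookup. It is not NPF code: struct dyn_group, its fields and dyn_lookup() are hypothetical names made up for this illustration, assuming only what is described above (entries are found by name, first hit wins).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry in the dynamic ruleset list. */
struct dyn_group {
	const char *name;	/* ruleset name, e.g. "blacklistd" */
	int owner;		/* which group references it (illustrative) */
};

/*
 * Return the index of the first entry whose name matches, or -1.
 * With duplicate names, any later entry can never be returned.
 */
static int
dyn_lookup(const struct dyn_group *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].name, name) == 0)
			return (int)i;
	}
	return -1;
}
```

With two entries both named "blacklistd", every lookup lands on index 0, so the second entry is unreachable by name. That matches both symptoms: rules added by name only reach the first instance, and a reload looks up the same old parent twice.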

So we have a semantic issue here - following questions arise:
1) can groups only reference unique dynamic ruleset names? This needs to be 
enforced then.
2) If dynamic rulesets are to be shared then the parent notion needs to be 
reworked/redesigned.

Best regards,
  Frank 


Looks like solution 1) was chosen for now.

Seems you just didn't trip over the underlying issue. I have not closely 
followed NPF lately so I don't know the actual fix status of NPF. As in 
the example above I also had the idea of re-using dynamic rulesets - 
maybe this can be implemented as I still think it is useful. For now my 
workaround was to have several rulesets, but this is just good enough to 
work with blacklistd.


Frank

On 03/07/18 05:46, Geoff Wing wrote:

Hi,
npf previously had no issues using a "ruleset" in multiple groups, however
it now has a problem and fails with

npfctl: (re)load failed: some table has a duplicate entry?

The following is a minimal npf.conf to illustrate with it failing due to
the second ``ruleset "blacklistd"'' causing the issue:
-
$if1_if = inet4(vmx0)
$if2_if = inet4(vmx1)

alg "icmp"

group "foo" on $if1_if {
ruleset "blacklistd"
}
group "bar" on $if2_if {
ruleset "blacklistd"
}

group default {
pass final on lo0 all
block all
}
-

I haven't investigated further yet.  Ring any bells with anyone?

System is amd64 -current.

Regards,
Geoff



-current cloner interfaces broken/gone/unusable

2018-04-23 Thread Frank Kardel

Hi,

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21 
23:01:29 UTC 2018 
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)


no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clues what broke here ?

Frank



Re: -current cloner interfaces broken/gone/unusable

2018-04-23 Thread Frank Kardel

Hi Robert !

That made it work again. I share your view on relative beauty here.

There are also 2 other observations with a 8.99.12 userland:

named has now trouble with interface scanning.

2018-04-24T05:13:34.522295+00:00 gateway named 345 - - automatic 
interface scanning terminated: not enough free resources


syslogd has sometimes issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix 
`/var/run/log': No buffer space available


Looks like there may be more issues with (compatibility) code.

Thanks for the (preliminary) fix.

Frank

On 04/24/18 00:34, Robert Swindells wrote:

Frank Kardel <kar...@netbsd.org> wrote:

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21
23:01:29 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)

no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clues what broke here ?

It looks to be the test for a valid interface name in
sys/compat/common/uipc_syscalls_50.c that is causing this, I think it
should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c  12 Apr 2018 18:50:13 -  1.4
+++ uipc_syscalls_50.c  23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 struct ifnet *ifp;
 int error;
  
-   ifp = ifunit(ifdr->ifdr_name);
-   if (ifp == NULL)
-   return ENXIO;
+   switch (cmd) {
+   case SIOCGIFDATA:
+   case SIOCZIFDATA:
+   ifp = ifunit(ifdr->ifdr_name);
+   if (ifp == NULL)
+   return ENXIO;
+   break;
+   default:
+   ifp = NULL;
+   break;
+   }
  
 switch (cmd) {
 case SIOCGIFDATA:





Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Frank Kardel

It is not only a boot time issue - I also see during normal operation:

2018-04-24T10:30:10.466723+00:00 gateway blacklistd 611 - - bl_recv: 
recvmsg failed (No buffer space available)
2018-04-24T10:30:10.466821+00:00 gateway blacklistd 611 - - no message 
(No buffer space available)
2018-04-24T10:56:47.223562+00:00 gateway sshd 13053 - - error: maximum 
authentication attempts exceeded for invalid user root from 
106.113.147.190 port 63303 ssh2 [preauth]
2018-04-24T11:15:09.240247+00:00 gateway blacklistd 611 - - bl_recv: 
recvmsg failed (No buffer space available)
2018-04-24T11:15:09.240791+00:00 gateway blacklistd 611 - - no message 
(No buffer space available)


I don't expect major resource usage for blacklistd though.

Also named does not seem to be too happy and ceases interface scanning. 
This does not yet give a warm fuzzy feeling :-) && :-(


Frank

On 04/24/18 09:56, Roy Marples wrote:

On 24/04/2018 08:26, Martin Husemann wrote:

On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:

syslogd has sometimes issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() 
unix

`/var/run/log': No buffer space available


This is a separate change and unrelated to compatibility. It happens
with up to date binaries as well. I think it was a silent bug before
and has now been made more verbose. Still pretty annoying and happens
for me on various machines on every boot. Roy, did you have a chance to
look at it?


Not yet no. But yes, in all releases prior it was a silent bug on all 
types of socket and in all the BSDs as well. I know, I checked - only 
OpenBSD has an overflow check like this and they solve that with a 
magic message on route(4) only which is just yuck as it makes the 
problem worse.


I only have one machine where I can reliably repro this, my erlite and 
that only happens because route(4) overflows (detected in dhcpcd) as 
it's a router and the box isn't up yet and a load of address 
validation flows over the socket when the link comes up. This is a 
good thing, because dhcpcd can then react to the error and sync its 
state using getifaddrs().


I think the easiest fix is to increase the default size of the socket 
buffer. Where this is done, I don't know but could find out if pushed.

This would fix everything if the default buffer was big enough.

Saying this, from what I'm hearing this only happens at boot time, so 
we could potentially shrink the buffer back down again if we need to 
consider dynamically growing it in the kernel as well. No idea if 
that's even possible or what performance impact it would have.


The last option is to increase the socket buffer size in all affected 
applications using ioctl (or is it setsockopt?). But to what value I 
don't know. Trial and error?
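
For reference, the socket buffer size is set with setsockopt(2) and SO_RCVBUF rather than ioctl. A minimal sketch of the per-application variant follows; grow_rcvbuf() is a hypothetical helper name, and which size is actually sufficient is still the open question from above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask the kernel for a receive buffer of `bytes` on socket `s`.
 * Returns the size reported by getsockopt() afterwards, or -1 on error.
 * grow_rcvbuf() is a made-up name for this sketch.
 */
static int
grow_rcvbuf(int s, int bytes)
{
	int cur = 0;
	socklen_t len = sizeof(cur);

	if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
		return -1;
	if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &cur, &len) == -1)
		return -1;
	return cur;
}
```

A caller would do this once right after socket(2), e.g. grow_rcvbuf(s, 65536), and log the returned value, since the kernel may round or limit the request.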


Roy




Re: st(4) and mt eom

2019-03-21 Thread Frank Kardel

On 03/21/19 07:00, John Nemeth wrote:


On Mar 21,  6:48am, Frank Kardel wrote:
}
} As I wrote, extending MTIO ioctls to support more mt commands is not a
} real issue (e. g. add access to the LOCATE command). We just need to
} decide which features are useful to support and find the time to
} implement them.

  Yes, those would be enhancements, not bug fixes.  Also, what
software would use these enhancements besides mt(1).  Some of the
stuff, like informational items and control items, look to be useful
in their own right, but other stuff, like locate, look to be
redundant.

Yes, agreed.

} For Bacula it wouldn't help right now, thus the EOM code needs fixing.

  For this, it is debatable if it is an enhancement or a bug
fix.  However, it is needed by a significant application, which
should raise its priority.

} There are also some other things that are useful to add to the SCSI
} subsystem like acquiring timeout information from the device to avoid
} early device resets while the drive is still fighting with its servo
} errors (just happened to me with a bad LTO7 drive). I have that update
} currently sitting in my tree.

  Getting timeout information from the device would be useful.
I had a problem with an earlier tape changer.  If I issued a tape
move command without first rewinding the tape, the operation would
often timeout.  This would cause a SCSI bus reset to be issued
while the tape was rewinding.  Then it would try to probe the
changer while it was still resetting.  This drove the changer crazy.
At first, I tried to remember to rewind the tape before issuing a
move command.  Eventually I found the right spot in our code and
changed the timeout.
I tripped over the same thing a long time ago, as some of our timeout 
constants are too low.
E. g. our LTO7 drive specifies 60 seconds for TUR, but the coded timeout 
for TUR is 15 seconds. Thus
getting the timeout values from the device is useful in case they are 
longer than our timeouts.
Unfortunately the changer device in our FlexStorII library does not 
provide timeout information.
Would be nice if it did. The drives provide the information for their 
operations.


But fetching the timeout is a plus anyway, e. g. WRITE on our LTO7 drive 
can take up to 1560 seconds.



} On 03/21/19 04:27, John Nemeth wrote:
} > On Mar 20,  8:08pm, Frank Kardel wrote:
} > }
} > } This seems to be a long standing deficiency of the driver. Looking at
} > } the SCSI spec it is recommended to issue a READ POSITION command
} >
} >   I did read a book about SCSI sometime ago, so I have a good
} > overview of CCBs and the protocol, but it has been some time since
} > I've looked at it in any depth.  And, although I've made minor
} > changes to our SCSI code, I can't say that I know it in any depth.
} >
} > }
} [snip]
}-- End of excerpt from Frank Kardel



Frank


Re: st(4) and mt eom

2019-03-20 Thread Frank Kardel

Hi John!


As I wrote, extending MTIO ioctls to support more mt commands is not a 
real issue (e. g. add access to the LOCATE command). We just need to 
decide which features are useful to support and find the time to 
implement them.


For Bacula it wouldn't help right now, thus the EOM code needs fixing.

There are also some other things that are useful to add to the SCSI 
subsystem like acquiring timeout information from the device to avoid 
early device resets while the drive is still fighting with its servo 
errors (just happened to me with a bad LTO7 drive). I have that update 
currently sitting in my tree.


Frank


On 03/21/19 04:27, John Nemeth wrote:

On Mar 20,  8:08pm, Frank Kardel wrote:
}
} This seems to be a long standing deficiency of the driver. Looking at
} the SCSI spec it is recommended to issue a READ POSITION command

  I did read a book about SCSI sometime ago, so I have a good
overview of CCBs and the protocol, but it has been some time since
I've looked at it in any depth.  And, although I've made minor
changes to our SCSI code, I can't say that I know it in any depth.

}

[snip]


HEADS UP: SCSI device specific timeouts

2019-03-28 Thread Frank Kardel

Hi!

I just committed fetching device specific timeout values from SCSI devices.

The benefit is that we no longer abort a perfectly well running SCSI command 
with a device reset before its device-provided timeout has expired, and thus 
gain more reliability with slower devices.

The timeout values are loaded at attachment time if the device supports 
them.
If the timeout information is available it shows up as "timeout-info" in 
dmesg like this:


st0 at scsibus0 target 9 lun 0:  tape removable 
timeout-info

st0: density code 92, variable blocks, write-enabled

This mechanism is disabled for USB umass devices as some of these may 
get into strange states when the SCSI command is sent.

The code has been tested with SCSI tapes, changers and disks. The newer 
tapes provide the timeout information and thus
some helpful longer timeouts like TUR 60 sec instead of our default of 
15 sec and WRITE ~1500 sec instead of 120 sec.
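
As a rough illustration of the idea (not the committed code): one plausible policy is to use the device-provided value only when it is longer than the built-in default. effective_timeout() is a hypothetical name, and the assumption that a timeout is never shortened is mine, not something stated above.

```c
/*
 * Pick the timeout (in seconds) for one command.
 * default_s: built-in driver default for this opcode
 * device_s:  value reported by the device, 0 if it reported none
 * Assumption for this sketch: the device value only ever extends
 * the default, it never shortens it.
 */
static unsigned int
effective_timeout(unsigned int default_s, unsigned int device_s)
{
	if (device_s == 0)
		return default_s;
	return device_s > default_s ? device_s : default_s;
}
```

With the numbers from this thread, TUR would go from 15 to 60 seconds and WRITE from 120 to the drive's reported value, while devices without timeout information keep the old defaults.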


So, if anything gets now stuck at SCSI attachment time it might be the 
new MAINTENANCE_IN/REPORT_SUPPORTED_OPCODES
query and thus might just need a quirk. Hopefully there are not many of 
these devices.


Best regards,
  Frank





Re: st(4) and mt eom

2019-03-20 Thread Frank Kardel

Hi John !

This seems to be a long standing deficiency of the driver. Looking at 
the SCSI spec it is recommended to issue a READ POSITION command to 
get the current position. Looking at the spec and code it should be 
possible to handle the SP_EOM case better with respect to the position 
information by issuing READ POSITION (service action code 6 = LONG FORM) 
at that point to set the correct position.


I also ran bacula (now I run freshly ported bareos 18.2.5 for HW 
encryption, tapealert) with HW EOM set to false. btape test made those 
recommendations a long time ago.


Time permitting I will try to update the driver this week, would you be 
willing to test?


Frank



On 03/20/19 19:02, John Nemeth wrote:

  If you issue an "mt eom" (forward to end of media), the driver
loses track of the tape position.  This seriously messes with
Bacula's tape handling.  Since Bacula expects the driver not to
lose the tape position I get the feeling there are other operating
systems that don't.  I found this code in st.c:st_space():

 error = scsipi_command(st->sc_periph, (void *)&cmd, sizeof(cmd), 0, 0,
 0, ST_SPC_TIME, NULL, flags);

 if (error == 0 && (st->flags & ST_POSUPDATED) == 0) {
 number = number - st->last_ctl_resid;
 if (what == SP_BLKS) {
 if (st->blkno != -1)
 st->blkno += number;
 } else if (what == SP_FILEMARKS) {
 if (st->fileno != -1) {
 st->fileno += number;
 if (number > 0)
 st->blkno = 0;
 else if (number < 0)
 st->blkno = -1;
 }
 } else if (what == SP_EOM) {
 /* This loses us relative position. */
 st->fileno = st->blkno = -1;
 }
 }
 return error;
}

Notice the SP_EOM case.  Can any SCSI experts, in particular SCSI
tape experts, shed some light on this and what can be done about
it?

  I have found a workaround for Bacula which is to tell it about
this problem.  If you do that, Bacula will do "mt fsf 65535" (and
pray that there aren't more files than that on the tape).  The tape
I have with the largest number of files is at 1186, so this will
do for now.  Still, it would be nice to fix the underlying problem.

  For those wondering, my bacula-sd.conf contains:

Device {
   Name = LTO-4a
   Archive Device = /dev/nrst0
   Device Type = Tape
   Media Type = LTO-4
   AutoChanger = yes
   LabelMedia = no
   Drive Index = 0
   AlwaysOpen = yes;
   Removable Media = yes;
   Random Access = no;
   Maximum File Size = 2GB
   Automatic Mount = yes;   # when device opened, read it
   Spool Directory = /bacula/spool-sd
   Hardware End of Medium = No
   #Fast Forward Space File = No
   BSF at EOM = yes
   #
   # New alert command in Bacula 9.0.0
   #  Note: you must have the sg3_utils (rpms) or the
   #sg3-utils (deb) installed on your system.
   #and you must set the correct control device that
   #corresponds to the Archive Device
#  Control Device = /dev/sg??  # must be SCSI ctl for /dev/nrst0
#  Alert Command = "/usr/pkg/libexec/bacula/tapealert %l"

   # Enable the Alert command only if you have the mtx package loaded
# Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
# If you have smartctl, enable this, it has more info than tapeinfo
# Alert Command = "sh -c 'smartctl -H -l error %c'"
}





Re: st(4) and mt eom

2019-03-20 Thread Frank Kardel

Hi Adrian!

I just finished implementing the EOM fix. On SPACE(EOM) a READ 
POSITION(LONG_FORMAT) is done and the current file number is set to the 
number of filemarks since BOT, which is what we need.
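
For anyone who wants to check the numbers by hand, here is a small userland sketch of decoding the long form READ POSITION data. It is not the driver code. The layout (32 bytes, flags in byte 0, logical object number at offset 8, logical file identifier at offset 16, all big-endian) is my reading of the SSC spec and should be treated as an assumption, as are all names below.

```c
#include <stddef.h>
#include <stdint.h>

#define RDPOS_LONG_LEN	32	/* assumed length of long form data */
#define RDPOS_MPU	0x08	/* assumed: mark position unknown */
#define RDPOS_LONU	0x04	/* assumed: logical object number unknown */

/* Decode an 8 byte big-endian integer. */
static uint64_t
be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Extract file and block number from long form position data.
 * Returns 0 on success, -1 if the buffer is too short or the drive
 * flags the position as unknown.
 */
static int
rdpos_long_parse(const uint8_t *buf, size_t len,
    uint64_t *fileno, uint64_t *blkno)
{
	if (len < RDPOS_LONG_LEN)
		return -1;
	if (buf[0] & (RDPOS_MPU | RDPOS_LONU))
		return -1;
	*blkno = be64(buf + 8);		/* logical object number */
	*fileno = be64(buf + 16);	/* filemarks since BOT */
	return 0;
}
```

The file identifier is the value that ends up as the current file number after SPACE(EOM).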

On a 10 file tape (LTO6) both commands

mt fsf 64

mt eom

now return a current file number of 10.

No additional changes to mt like adding a locate command are needed, 
though adding LOCATE might also be an option to be added separately for 
mt even if it does not help with bacula at all, as bacula uses the MT 
ioctls directly and not via scripts.


Thanks for the hint anyway.

Frank


On 03/20/19 21:09, Adrian Bocaniciu wrote:

On Wed, 20 Mar 2019 20:08:17 +0100
Frank Kardel  wrote:


This seems to be a long standing deficiency of the driver. Looking at
the SCSI spec it is recommended to issue a READ POSITION command

get the current position. Looking at the spec and code it should be
possible to handle the SP_EOM case better with respect to the position
information


I suggest that you should look at what FreeBSD does for the command:
mt locate -e

I have not attempted to use NetBSD with a tape, but I am using FreeBSD with an LTO-7 
drive and "mt locate -e" works flawlessly.

I always follow that command in my scripts with a "mt rdspos", to verify that it worked 
correctly, by comparing the result with the last written position on that tape (the position after 
"mt locate -e" should be the position read after writing the previous last file + 1).

Best regards !





Re: st(4) and mt eom

2019-03-22 Thread Frank Kardel

Hi Adrian!

No worries. I am just trying to separate useful features from bugs and 
unnecessary deficiencies that can be fixed quickly without disturbing 
APIs and compatibility.


On 03/22/19 05:54, Adrian Bocaniciu wrote:

On Thu, 21 Mar 2019 06:48:22 +0100
Frank Kardel  wrote:


For Bacula it wouldn't help right now, thus the EOM code needs fixing.


When I referred you to FreeBSD "mt locate -e", my point was not that you should 
implement the additional FreeBSD features, even if that will also be good.
Yes, picking up some features might be useful as already mentioned in other 
parts of this thread.

Supporting LOCATE seems useful and others may be useful too.



My point was that you should just search in the FreeBSD code which is the sequence of 
SCSI commands that is given for a "mt locate -e", because that sequence of SCSI 
commands is proven to move reliably the tape to the end of the recorded data (also 
maintaining correctly the position information).
I looked at the driver code and it does what is expected and what the spec 
recommends: do a READ POSITION after a SPACE/LOCATE command.


Frank


Re: What to do with base X11 for netbsd-9 ?

2019-06-05 Thread Frank Kardel
Same EFI issue here: EFI/GK208B [GeForce GT 710] gets stuck when 
attempting to configure the card.


Booting via CSM or EFI and nouveau disabled gets the machine up and 
nouveau works fine with acceleration in the CSM case.


Looking at the PCI configuration the main difference is that the CSM 
sets the BusMaster bits to enabled on the bridges while this is not the 
case for EFI.


Also more IO/MEM/ROM bits are set.

I think we still lack sufficient PCI setup when booting via EFI.

(lspci dumps are available on request).

Frank


On 06/05/19 11:25, Patrick Welche wrote:

On Tue, Jun 04, 2019 at 01:03:49PM +0100, Patrick Welche wrote:

On another front, I cannot boot a computer with nouveau and a
NVidia GeForce GTX 680 (GK104). At the point where the console normally
changes resolution, the screen goes black and everything stops.
(No panic AFAICT)

This has changed! The above is still true with EFI, but with BIOS booting,
nouveau attaches successfully, and I can run xdm+twm! The experience
is exactly the opposite to the one with Sandy Bridge: images display fine,
glmark2 runs. It's the fonts which are unreadable (not a question of size
but of artifacts). This is in 3840x mode. Still, at least I no longer
have to disable nouveau in order to boot!

Cheers,

Patrick




Re: st(4) and mt eom

2019-05-19 Thread Frank Kardel

Hi !

It had not been committed up to now. I just committed the simple fix to 
-current.


I am in the middle of upgrading the positioning code so more changes may 
come time permitting.


Frank

On 05/19/19 09:35, Staffan Thomén wrote:

Frank Kardel wrote:

Hi Adrian!

I just finished implementing the EOM fix. On SPACE(EOM) a READ
POSITION(LONG_FORMAT) is

done and current file number is set to the number of filemarks since BOT

which is what we need.

On a 10-file tape (LTO6) both commands

mt fsf 64

mt eom

now return a current file number of 10.

no additional changes to mt like adding a locate command are needed,
Though adding LOCATE might also be an option to be added separately for
mt even if it does not help with bacula at all as bacula uses the MT
ioctls directly and not via scripts.


Hey!

I'd really like to have this feature, having recently moved to LTO5 
and been forced to sit through bacula going one file at a time.


So what happened? Are there patches? Will there be a pull-up to -8?

_DenverCoder9, what did you see?_ - xkcd (wisdom of the ancients)

Staffan





Hints for Bananapi and -current

2019-05-01 Thread Frank Kardel

I tried -current with my Bananapi and had limited success:

Using the first steps (copying the armv7 image and the 2018.05 u-boot) I 
found u-boot attempting to perform a DHCP boot, as nothing was 
found on the mmc drive in autoboot. Did I miss something to set up there?


I finally got a kernel to start booting with the following chants:

mmc dev 0

fatload mmc 0:1 $fdt_addr_r $fdtfile

fatload mmc 0:1 8200 netbsd-GENERIC.ub

bootm 8200 - $fdt_addr_r root=ld0a console=fb/none

The output always stops at:

[   1.000] NetBSD 8.99.37 (GENERIC) #2: Sun Apr 28 10:09:56 CEST 2019
[   1.000] kardel@Andromeda:/src/NetBSD/cur/src/obj.evbarm/sys/arch/evbarm/compile/GENERIC
[   1.000] total memory = 1022 MB
[   1.000] avail memory = 1012 MB
[   1.000] armfdt0 (root)
[   1.000] simplebus0 at armfdt0: LeMaker Banana Pi
[   1.000] simplebus1 at simplebus0
[   1.000] cpus0 at simplebus0
[   1.000] simplebus2 at simplebus0
[   1.000] simplebus3 at simplebus0
[   1.000] cpu0 at cpus0: Cortex-A7 r0p4 (Cortex V7A core)
[   1.000] cpu0: DC enabled IC enabled WB enabled LABT branch prediction enabled
[   1.000] cpu0: 32KB/32B 2-way L1 VIPT Instruction cache
[   1.000] cpu0: 32KB/64B 4-way write-back-locking-C L1 PIPT Data cache
[   1.000] cpu0: 256KB/64B 8-way write-through L2 PIPT Unified cache
[   1.000] vfp0 at cpu0: NEON MPE (VFP 3.0+), rounding, NaN propagation, denormals
[   1.000] cpufreqdt0 at cpu0
[   1.000] cpu1 at cpus0
[   1.000] cpufreqdt1 at cpu1
[   1.000] gic0 at simplebus1: GIC
[   1.000] armgic0 at gic0: Generic Interrupt Controller, 160 sources (150 valid)
[   1.000] armgic0: 16 Priorities, 128 SPIs, 7 PPIs, 15 SGIs
[   1.000] fclock0 at simplebus2: 2500 Hz fixed clock (mii_phy_tx)
[   1.000] fclock1 at simplebus2: 12500 Hz fixed clock (gmac_int_tx)
[   1.000] fclock2 at simplebus2: 2400 Hz fixed clock (osc24M)
[   1.000] fclock3 at simplebus2: 32768 Hz fixed clock (osc32k)
[   1.000] gtmr0 at simplebus0: Generic Timer
[   1.000] gtmr0: interrupting on GIC irq 27
[   1.000] armgtmr0 at gtmr0: ARM Generic Timer (24000 kHz)
[   1.420] sun4ia10ccu0 at simplebus1: A20 CCU
[   1.420] sunxinmi0 at simplebus1: NMI
[   1.420] sunxigmacclk0 at simplebus2: GMAC MII/RGMII clock mux
[   1.420] sunxigpio0 at simplebus1: PIO
[   1.420] gpio0 at sunxigpio0: 175 pins
[   1.420] sunxigpio0: interrupting on GIC irq 60
[   1.420] sunxisramc0 at simplebus1: SRAM Controller
[   1.420] sunxidebe0 at simplebus1: Display Engine Backend (display-backend@1e6)
[   1.420] sunxidebe1 at simplebus1: Display Engine Backend (display-backend@1e4)


So in summary I seem to get up to video initialization. For my 4K TV I 
had to increase the MAX_FB reserved memory to 32M, but that didn't help; 
not connecting any HDMI device didn't help either.


The u-boot bootm command was changed to manage ramdisk images, so the 
tips on our web site don't apply to the new bootm syntax.


Any other things I can try or that I overlooked?

Frank




Re: Hints for Bananapi and -current

2019-05-01 Thread Frank Kardel

Thanks - that got me beyond screen initialization.

but

bootm 8200 - $fdt_addr_r root=ld0a console=fb

asked for the root device, swap, fs type and init - so the parameters 
probably did not reach the kernel at all.


starting X didn't show anything on the 4K screen, X seemed to be running 
though.


I am also not sure whether the Bananapi HDMI can do the u-boot 
determined 3840x2160 resolution. I have yet to play around with the hdmi 
configuration.


The dmesg output is attached.

Adding a usb keyboard uncovered a panic while awaiting root device input:

[   3.4159917] ehci1: handing over low speed device on port 1 to companion controller
[   3.6660050] boot device: 
[   3.6660050] root device: uhidev0 at uhub3 port 1 configuration 1 interface 0
[   5.0672791] uhidev0: DaKai (0xe8f) 2.4G RX (0xa8), rev 1.10/3.11, addr 2, iclass 3/1
[   5.1663953] ukbd0 at uhidev0: 8 Variable keys, 6 Array codes
[   5.3375655] This port is broken, it does not call cnpollc() before calling cngetc().
[   5.4375739] This should be fixed, but it will work anyway (for now).
[   5.6775820] wskbd0 at ukbd0: console keyboard, using wsdisplay0
[   5.7732545] uhidev1 at uhub3 port 1 configuration 1 interface 1
[   5.8694170] uhidev1: DaKai (0xe8f) 2.4G RX (0xa8), rev 1.10/3.11, addr 2, iclass 3/1
[   5.9727975] panic: usbd_transfer: not done
[   6.0611224] cpu0: Begin traceback...
[   6.1478907] 0x9c695b84: netbsd:db_panic+0x14
[   6.2380975] 0x9c695b9c: netbsd:vpanic+0x194
[   6.3276372] 0x9c695bb4: netbsd:snprintf
[   6.4161552] 0x9c695bf4: netbsd:usbd_sync_transfer
[   6.5075330] 0x9c695c34: netbsd:usbd_do_request_flags+0xa4
[   6.6012676] 0x9c695c4c: netbsd:usbd_do_request+0x20
[   6.6932964] 0x9c695c7c: netbsd:usbd_set_idle+0x70
[   6.7838060] 0x9c695d54: netbsd:uhidev_attach+0xdc
[   6.8737881] 0x9c695d8c: netbsd:config_attach_loc+0x1b4
[   6.9701440] 0x9c695dbc: netbsd:config_found_sm_loc+0x54
[   7.0606941] 0x9c695e5c: netbsd:usbd_attachinterfaces+0x1b0
[   7.1525235] 0x9c695e8c: netbsd:usbd_probe_and_attach+0x84
[   7.2413633] 0x9c695ef4: netbsd:usbd_new_device+0x254
[   7.3280048] 0x9c695f5c: netbsd:uhub_explore+0x2dc
[   7.4155137] 0x9c695f84: netbsd:usb_discover.isra.2+0x74
[   7.5044602] 0x9c695fac: netbsd:usb_event_thread+0x84
[   7.5926830] cpu0: End traceback...
Stopped in pid 0.59 (system) at netbsd:cpu_Debugger+0x4: bx  r14

Any ideas ?

Frank

On 05/01/19 18:40, Jared McNeill wrote:
Remove the following devices from your kernel config and the kernel 
should use simplefb instead: sunxidebe, sunxitcon, sunxihdmi, sunxidep


I just peeked at the code quickly and it looks like the DE drivers are 
blindly using the display's advertised preferred mode without taking 
its own capabilities into consideration.



On Wed, 1 May 2019, Frank Kardel wrote:


I tried -current with my Bananapi and had limited success:

Using the first steps copying the image armv7 and the 2018.05 u-boot 
I found the u-boot load attempting to perform a dhcp boot as nothing 
was found on the mmc drive in autoboot. Did I miss something to set 
up there ?


I finally got a kernel to start booting with following chants:

mmc dev 0

fatload mmc 0:1 $fdt_addr_r $fdtfile

fatload mmc 0:1 8200 netbsd-GENERIC.ub

bootm 8200 - $fdt_addr_r root=ld0a console=fb/none

The output always stops at:

[   1.000] NetBSD 8.99.37 (GENERIC) #2: Sun Apr 28 10:09:56 CEST 
2019
[   1.000] 
kardel@Andromeda:/src/NetBSD/cur/src/obj.evbarm/sys/arch/evbarm/compile/GENERIC

[   1.000] total memory = 1022 MB
[   1.000] avail memory = 1012 MB
[   1.000] armfdt0 (root)
[   1.000] simplebus0 at armfdt0: LeMaker Banana Pi
[   1.000] simplebus1 at simplebus0
[   1.000] cpus0 at simplebus0
[   1.000] simplebus2 at simplebus0
[   1.000] simplebus3 at simplebus0
[   1.000] cpu0 at cpus0: Cortex-A7 r0p4 (Cortex V7A core)
[   1.000] cpu0: DC enabled IC enabled WB enabled LABT branch 
prediction enabled

[   1.000] cpu0: 32KB/32B 2-way L1 VIPT Instruction cache
[   1.000] cpu0: 32KB/64B 4-way write-back-locking-C L1 PIPT Data 
cache

[   1.000] cpu0: 256KB/64B 8-way write-through L2 PIPT Unified cache
[   1.000] vfp0 at cpu0: NEON MPE (VFP 3.0+), rounding, NaN 
propagation, denormals

[   1.000] cpufreqdt0 at cpu0
[   1.000] cpu1 at cpus0
[   1.000] cpufreqdt1 at cpu1
[   1.000] gic0 at simplebus1: GIC
[   1.000] armgic0 at gic0: Generic Interrupt Controller, 160 
sources (150 valid)

[   1.000] armgic0: 16 Priorities, 128 SPIs, 7 PPIs, 15 SGIs
[   1.000] fclock0 at simplebus2: 2500 Hz fixed clock 
(mii_phy_tx)
[   1.000] fclock1 at simplebus2: 12500 Hz fixed clock 
(gmac_int_tx)

[   1.000] fclock2 at simplebus2: 2400 Hz fixed clock (osc24M)
[   1.000] fclock3 at simplebus2: 32768 Hz fixed clock (osc32k)
[   1.000] gtmr0 at simplebus0: Generic Timer
[   1.000] gtmr0: interrupting on GIC irq 27
[   1.000] armgtmr0 at gtmr0: ARM Generic

Re: Hints for Bananapi and -current

2019-05-07 Thread Frank Kardel

That looks much cleaner; I will try that.

Frank


On 05/01/19 22:58, Jared McNeill wrote:
So there is a better way to boot modern NetBSD/arm (using UEFI and 
bootarm.efi). If you want to boot the old way, it goes something like 
this:


  setenv bootargs root=ld0a console=fb
  fatload mmc 0 $kernel_addr_r netbsd-GENERIC.ub
  fatload mmc 0 $fdt_addr_r $fdtfile
  fdt addr $fdt_addr_r
  bootm $kernel_addr_r - $fdt_addr_r

This method relies on 1) the kernel being wrapped with a legacy U-Boot 
image header, and 2) both the kernel and .dtb file being present on 
the FAT partition.


Now on to the modern boot method..

Using U-Boot 2018.11 or later, setup a FAT partition with the 
following files on it:


  EFI/BOOT/bootarm.efi
  your-fdt-file.dtb

U-Boot will automatically launch the UEFI bootloader and you will be 
presented with a countdown timer. bootarm will load a native ELF 
kernel (by default /netbsd) from the first FFS partition on the same 
drive that the loader came from. In addition, bootarm passes 
information about where to find the root device to the kernel 
automatically, so you shouldn’t need to specify a root= option. 
GENERIC and GENERIC64 kernels are setup to automatically use fb when 
available, so console=fb is also no longer required.


Fast path to try this all out is to grab armv7.img from a build, add 
your U-Boot to it, and write to SD card. The image should boot 
automatically. Alternatively, you can download an image from 
http://www.invisible.ca/arm that has U-Boot 
already applied for your board.


Hope this helps!
Jared


On May 1, 2019, at 5:07 PM, Frank Kardel <kar...@netbsd.org> wrote:


Thanks - that got me beyond screen initialization.

but

bootm 8200 - $fdt_addr_r root=ld0a console=fb

asked for the root device, swap, fs type and init - so the parameters 
probably did not reach the kernel at all.


starting X didn't show anything on the 4K screen, X seemed to be 
running though.


I am also not sure whether the Bananapi HDMI can do the u-boot 
determined 3840x2160 resolution. I have yet to play around with the hdmi 
configuration.


The dmesg output is attached.

Adding a usb keyboard uncovered a panic while awaiting root device input:

[   3.4159917] ehci1: handing over low speed device on port 1 to companion controller
[   3.6660050] boot device: 
[   3.6660050] root device: uhidev0 at uhub3 port 1 configuration 1 interface 0
[   5.0672791] uhidev0: DaKai (0xe8f) 2.4G RX (0xa8), rev 1.10/3.11, addr 2, iclass 3/1
[   5.1663953] ukbd0 at uhidev0: 8 Variable keys, 6 Array codes
[   5.3375655] This port is broken, it does not call cnpollc() before calling cngetc().
[   5.4375739] This should be fixed, but it will work anyway (for now).
[   5.6775820] wskbd0 at ukbd0: console keyboard, using wsdisplay0
[   5.7732545] uhidev1 at uhub3 port 1 configuration 1 interface 1
[   5.8694170] uhidev1: DaKai (0xe8f) 2.4G RX (0xa8), rev 1.10/3.11, addr 2, iclass 3/1
[   5.9727975] panic: usbd_transfer: not done
[   6.0611224] cpu0: Begin traceback...
[   6.1478907] 0x9c695b84: netbsd:db_panic+0x14
[   6.2380975] 0x9c695b9c: netbsd:vpanic+0x194
[   6.3276372] 0x9c695bb4: netbsd:snprintf
[   6.4161552] 0x9c695bf4: netbsd:usbd_sync_transfer
[   6.5075330] 0x9c695c34: netbsd:usbd_do_request_flags+0xa4
[   6.6012676] 0x9c695c4c: netbsd:usbd_do_request+0x20
[   6.6932964] 0x9c695c7c: netbsd:usbd_set_idle+0x70
[   6.7838060] 0x9c695d54: netbsd:uhidev_attach+0xdc
[   6.8737881] 0x9c695d8c: netbsd:config_attach_loc+0x1b4
[   6.9701440] 0x9c695dbc: netbsd:config_found_sm_loc+0x54
[   7.0606941] 0x9c695e5c: netbsd:usbd_attachinterfaces+0x1b0
[   7.1525235] 0x9c695e8c: netbsd:usbd_probe_and_attach+0x84
[   7.2413633] 0x9c695ef4: netbsd:usbd_new_device+0x254
[   7.3280048] 0x9c695f5c: netbsd:uhub_explore+0x2dc
[   7.4155137] 0x9c695f84: netbsd:usb_discover.isra.2+0x74
[   7.5044602] 0x9c695fac: netbsd:usb_event_thread+0x84
[   7.5926830] cpu0: End traceback...
Stopped in pid 0.59 (system) at netbsd:cpu_Debugger+0x4: bx  r14

Any ideas ?

Frank

On 05/01/19 18:40, Jared McNeill wrote:
Remove the following devices from your kernel config and the kernel 
should use simplefb instead: sunxidebe, sunxitcon, sunxihdmi, sunxidep


I just peeked at the code quickly and it looks like the DE drivers 
are blindly using the display's advertised preferred mode without 
taking its own capabilities into consideration.



On Wed, 1 May 2019, Frank Kardel wrote:


I tried -current with my Bananapi and had limited success:

Using the first steps copying the image armv7 and the 2018.05 
u-boot I found the u-boot load attempting to perform a dhcp boot as 
nothing was found on the mmc drive in autoboot. Did I miss 
something to set up there ?


I finally got a kernel to start booting with following chants:

mmc dev 0

fatload mmc 0:1 $fdt_addr_r $fdtfile

fatload mmc 0:1 8200 netbsd-GENERIC.ub

bootm 8200 - $fdt_addr_r root=ld0a console=fb/none

The outp

Re: recurring tstile hangs on -current

2019-06-28 Thread Frank Kardel

Hi Thomas,

Glad that this is observed elsewhere.

Maybe the following bugs resonate with your observations:

kern/54207 [serious/high]:
-current locks up solidly when pkgsrc building 
adapta-gtk-theme-3.95.0.11
looks like locking issue in layerfs* (nullfs). (AMD 1800X, 64GB)

kern/54210 [serious/high]:
NetBSD-8 processes presumably not exiting
Not tested with -current, but it may be there too. (Intel(R) Xeon(R) Gold 6134 CPU 
@ 3.20GHz, ~380GB)

At this time I am not too confident that -current is reliably able to do a 
pkgsrc build, though I have occasionally seen bulk builds that did finish.
Most of the time I run into hard lockups with no information about the system 
state available (no console, no X, no network, no DDB).

Frank


On 06/28/19 10:46, Thomas Klausner wrote:

Hi!

I've set up a new machine for bulk building. I have tried various
things, but in the end it always hangs in tstile.

First try was what I currently use: tmpfs sandboxes with nullfs
mounted /bin, /lib, ... When it hung, the suspicion was that it's
nullfs' fault. (The same setup works fine on my current machine.)

The second try was tmpfs with copied-in /bin, /lib, ... and
NFS-mounted packages/distfiles/pkgsrc (from localhost). That also
hung. So the suspicion was that tmpfs or NFS are broken.

The last try was building in the root file system, i.e. not even a
sandbox (chroot). The only tmpfs is in /dev. distfiles/pkgsrc/packages
are on spinning rust, / is on an ld@nvme. With 8 MAKE_JOBS this
finished one pkgsrc build (where some packages didn't build because of
missing distfiles, or because they randomly break like rust). When I
restarted the bulk build with 24 MAKE_JOBS, it hung after ~4 hours.

I have the following systat output:

 2 usersLoad  8.78  7.19  3.62  Fri Jun 28 04:27:32

Proc:r  d  sCsw  Traps SysCal  Intr   Soft  Fault PAGING   SWAPPING
 2410   7548 265849 157956  3504   2399 265476 in  out   in  out
 ops
   56.2% Sy   1.2% Us   0.0% Ni   0.0% In  42.5% Idpages
|||||||||||
> 670 forks
   fkppw
Anon   294104%   zero 62161268  5572 Interrupts   fksvm
Exec14116%   wired   16296  1968 TLB shootdownpwait
File 24587740  18%   inact   43756   100 cpu0 timer   relck
Meta  2606694%   bufs   495676   msi1 vec 0   rlkok
  (kB)real   swaponly  free 9 msix2 vec 0  noram
Active   24835908100033996 9 msix2 vec 157262 ndcpy
Namei Sys-cache Proc-cache   msix2 vec 227906 fltcp
 Calls hits% hits %  3427 ioapic1 pin 12 87178 zfod
125076   122834   98   80 059 ioapic2 pin 0  35775 cow
  msix7 vec 0 8192 fmin
   Disks:   seeks   xfers   bytes   %busy10922 ftarg
  ld01969  16130K34.8  itarg
  dk01969  16130K34.8  flnan
  wd0  pdfre
  dk1  pdscn
  dk2

and this from top:

load averages:  5.13,  6.53,  3.56;   up 1+16:08:05 

 04:28:13
59 processes: 2 runnable, 55 sleeping, 2 on CPU
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
Memory: 24G Act, 43M Inact, 16M Wired, 14M Exec, 23G File, 95G Free
Swap: 163G Total, 163G Free

   PID USERNAME PRI NICE   SIZE   RES STATE  TIME   WCPUCPU COMMAND
10353 pbulk 770   185M  172M select/0   0:13  4.74%  4.54% bjam
12120 wiz  109083M   59M tstile/1 165:46  1.46%  1.46% systat
 0 root   00 0K   93M CPU/3135:39  0.00%  0.00% [system]
   219 root  85032M 2676K kqueue/4   7:34  0.00%  0.00% syslogd
13354 wiz   85089M 4948K select/0   0:52  0.00%  0.00% sshd
   380 root  85030M   16M pause/40:04  0.00%  0.00% ntpd
10918 wiz   43025M 2872K CPU/3  0:01  0.00%  0.00% top
 1 root  85020M 1756K wait/290:01  0.00%  0.00% init
  5594 pbulk  00 0K0K RUN/0  0:00  0.00%  0.00% bjam
22861 pbulk  00 0K0K RUN/0  0:00  0.00%  0.00% bjam
   747 root 117020M 2080K tstile/8   0:00  0.00%  0.00% cron
16473 pbulk117018M 1564K tstile/2   0:00  0.00%  0.00% cp
  9705 pbulk1170

NPF on 8.1 and pcap-filter expressions

2019-08-22 Thread Frank Kardel

I just tripped over:

  pass in final pcap-filter "ip multicast or ip6 multicast"

flawlessly compiles ... but:

  pass in final pcap-filter "ip broadcast"

gives in "npf validate"

/etc/npf.conf:xx:9: invalid pcap-filter(7) syntax

although man 7 pcap-filter says otherwise and tcpdump gladly accepts ip 
broadcast.


What needs to be fixed?

Frank



Re: NPF on 8.1 and pcap-filter expressions

2019-08-22 Thread Frank Kardel

I found that in the meantime - thanks for looking.

That probably leaves me with no generic way in npf to detect/determine 
broadcast addresses.

NPF does not seem to have PF's :network/:broadcast/:peer mechanism, and 
all we can access is the IP layer information.

This looks a bit clumsy.

Ideally I would like a generic way to determine networks, broadcast 
addresses and maybe peers statically and dynamically in order to reduce 
the configuration spread between interface configuration and NPF 
configuration. This would be useful for my case, where the IP 
address/network is configured via DHCP and I'd rather avoid having 
dhcpcd's hooks rewrite/reload the NPF configuration.

Also partial interface names like tun for tun0...tun could be helpful, 
especially as these interfaces can come and go.

Am I dreaming too much ?

Frank

On 08/22/19 13:22, Michael van Elst wrote:

kar...@netbsd.org (Frank Kardel) writes:


I just tripped over:
   pass in final pcap-filter "ip multicast or ip6 multicast"
flawlessly compiles ... but:
   pass in final pcap-filter "ip broadcast"
gives in "npf validate"
/etc/npf.conf:xx:9: invalid pcap-filter(7) syntax
although man 7 pcap-filter says otherwise and tcpdump gladly accepts ip
broadcast.

from libpcap:

	case Q_IP:
		/*
		 * We treat a netmask of PCAP_NETMASK_UNKNOWN (0x)
		 * as an indication that we don't know the netmask, and fail
		 * in that case.
		 */
		if (cstate->netmask == PCAP_NETMASK_UNKNOWN)
			bpf_error(cstate, "netmask not known, so 'ip broadcast' not supported");

npfctl compiles the filter expression with PCAP_NETMASK_UNKNOWN, there
is no netmask it could apply.
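
To illustrate why the netmask matters here: the directed broadcast address of a network is the interface address with all host bits set, so it cannot be derived from the address alone. The following is a minimal sketch in Python using the standard `ipaddress` module; the function name `directed_broadcast` and the example addresses are made up for illustration, and nothing in it is part of npfctl or libpcap.

```python
import ipaddress


def directed_broadcast(address: str, netmask: str) -> str:
    """Return the IPv4 directed broadcast address for address/netmask."""
    addr = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(netmask))
    host_bits = ~mask & 0xFFFFFFFF  # bits not covered by the netmask
    return str(ipaddress.IPv4Address(addr | host_bits))


if __name__ == "__main__":
    # Same address, different netmasks: the broadcast address differs.
    for mask in ("255.255.255.0", "255.255.0.0"):
        print(mask, "->", directed_broadcast("192.168.1.10", mask))
```

The same address yields different broadcast addresses under different netmasks, which is why a filter compiled without a known netmask has nothing to compare against.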





Re: NPF on 8.1 and pcap-filter expressions

2019-08-23 Thread Frank Kardel

On 08/22/19 17:44, Michael van Elst wrote:

On Thu, Aug 22, 2019 at 04:02:43PM +0200, Frank Kardel wrote:

I found that in the mean time - thanks for looking.

That leaves me probably with no generic way in npf to detect/determine
broadcast addresses.

NPF does not seem to have PF's :network/:broadcast/:peer mechanism and all
we

can access is the IP layer information.

This looks a bit clumsy.

Ideally I would like a generic way to determine networks, broadcast
addresses and maybe peers statically and dynamically

npfctl already reads IP information from interfaces, also reading the
netmask wouldn't be much of a problem. It wouldn't be magic though,
But it would allow the npfctl reload strategy after changing things. 
Though the reload strategy opens a small time frame where installed 
filter rules do not match the interface configuration.

rules aren't necessarily bound to an interface, so pcap-filter() would
need an explicit netmask argument, which makes it obvious that the
filter might not work correctly if applied to an interface with a
different netmask.
Yes. But we currently have the situation that NPF does not seem to have 
any means to handle netmasks and broadcasts related to 
interface addresses. As NPF works at the IP level, I think supporting 
netmask/broadcast/network should be part of NPF, even if we start out 
with the static solution in npfctl and support a dynamic one later.


In many situations it might be easier to just match the list of broadcast
addresses without pcap-filter.
And what mechanisms does NPF provide to access the broadcast address 
except for manually coding it - did I overlook something?
(I am currently looking at 8.1 as that is what runs in production, and I am 
trying to convert our pf router configuration to NPF with very limited 
success right now - more on that in another thread).




for my case where the IP address/network is configured via DHCP and I'd
rather like to avoid dhcpcd's hooks to rewrite/reload the
Also partial interface names like tun for tun0...tun could be helpful
especially as these interfaces can come and go.

That's more a question on how much code should be pushed into the
kernel. I'd rather trigger userland to reload the config.

Leaving small time windows of inconsistent configurations.
I think it depends more on the mechanisms/primitives we can think of for 
efficient dynamic access to interface properties.


Using partial interface names doesn't sound like a security feature to
me. Matching the new interface descriptions instead is probably safer
but then descriptions must also be supported by the program that manages
the interfaces.
Yes, interface descriptions are the right/better thing here. But how do 
we handle groups for an interface description when these interfaces 
appear and disappear? Should we compile these groups anyway? How do we 
handle groups for interface names and interface descriptions - it looks 
like we might have two different groups for one interface - which rule 
do we run? This needs more thought or just a decision to use either 
interface names or interface descriptions.


Greetings,


Greetings

  Frank



gcc8 not compiling pkgsrc boost-libs-1.71.0

2019-11-06 Thread Frank Kardel

When bulk building pkgsrc boost libs 1.71.0 fails to compile with:

In file included from /usr/include/stddef.h:37,
 from /usr/include/g++/cstddef:50,
 from ./boost/config/compiler/gcc.hpp:165,
 from ./boost/config.hpp:39,
 from ./boost/log/detail/config.hpp:34,
 from libs/log/src/syslog_backend.cpp:18:
./boost/asio/detail/impl/kqueue_reactor.ipp: In constructor 'boost::asio::detail::kqueue_reactor::kqueue_reactor(boost::asio::execution_context&)':
./boost/asio/detail/impl/kqueue_reactor.ipp:32:5: error: invalid static_cast from type 'intptr_t' {aka 'long int'} to type 'void*'
 EV_SET(ev, ident, filt, flags, fflags, data, \
 ^~
./boost/asio/detail/impl/kqueue_reactor.ipp:54:3: note: in expansion of macro 'BOOST_ASIO_KQUEUE_EV_SET'
   BOOST_ASIO_KQUEUE_EV_SET([0], interrupter_.read_descriptor(),
   ^~~~

Is gcc8 wrong or do we have a bug in boost-libs ?

Needless to say that this obstacle breaks over 600 packages...

Frank



Re: gcc8 not compiling pkgsrc boost-libs-1.71.0

2019-11-06 Thread Frank Kardel

Thanks for the hint. Will rebuild with current pkgsrc.

Frank


On 11/06/19 21:49, Kamil Rytarowski wrote:

On 06.11.2019 21:03, Frank Kardel wrote:

When bulk building pkgsrc boost libs 1.71.0 fails to compile with:

In file included from /usr/include/stddef.h:37,
  from /usr/include/g++/cstddef:50,
  from ./boost/config/compiler/gcc.hpp:165,
  from ./boost/config.hpp:39,
  from ./boost/log/detail/config.hpp:34,
  from libs/log/src/syslog_backend.cpp:18:
./boost/asio/detail/impl/kqueue_reactor.ipp: In constructor
'boost::asio::detail::kqueue_reactor::kqueue_reactor(boost::asio::execution_context&)':

./boost/asio/detail/impl/kqueue_reactor.ipp:32:5: error: invalid
static_cast from type 'intptr_t' {aka 'long int'} to type 'void*'
  EV_SET(ev, ident, filt, flags, fflags, data, \
  ^~
./boost/asio/detail/impl/kqueue_reactor.ipp:54:3: note: in expansion of
macro 'BOOST_ASIO_KQUEUE_EV_SET'
BOOST_ASIO_KQUEUE_EV_SET([0], interrupter_.read_descriptor(),
^~~~

Is gcc8 wrong or do we have a bug in boost-libs ?

Needless to say that this obstacle breaks over 600 packages...

Frank


Please upgrade the pkgsrc package to current. This (and other failures) was
fixed post-branch.





NetBSD 9 gpt migrate

2020-02-24 Thread Frank Kardel

Hi,

I just had fun with NetBSD 9 gpt migrate.

After creating some space at the end of an MBR partitioned disk I
invoked "gpt migrate wd0".
This shows error messages that the wedges cannot be created (errno 22 I 
believe).


After that, there is a PMBR and gpt refuses any work chanting:
wd0: bogus map current= gpt partition new=gpt partition (from memory)

Also primary and secondary gpt header seem to exist.

As the MBR is now a PMBR the MBR partition information is lost.
gpt is unusable due to the error messages above.

Recovering works by zeroing out both gpt headers and manually
creating the gpt labels.

Did I do something wrong or is "gpt migrate" broken?

Frank


Re: 9.99.40: panic: kernel diagnostic assertion "ci->ci_biglock_count == 0" failed

2020-01-26 Thread Frank Kardel

Hi,

While bulk building pkgsrc with 9.99.42 from Jan 25th I see

panic:kernel diagnostic assertion "curcpu()->ci_biglockcount == 0" 
failed: ..kern_exit.c, line 209 kernel lock leaked


That happens every couple of thousand packages - sorry no dump (locking 
against myself as expected).


Frank



On 01/22/20 17:02, Andrew Doran wrote:

On Tue, Jan 21, 2020 at 07:59:35PM +, Andrew Doran wrote:


Hi Thomas,

On Tue, Jan 21, 2020 at 08:47:44PM +0100, Thomas Klausner wrote:


During a bulk build (in rust AFAICT), I got a panic with
panic: kernel diagnostic assertion "ci->ci_biglock_count == 0" failed: file 
"/usr/src/sys/sys/userret.h", line 88

That's this one:

static __inline void
mi_userret(struct lwp *l)
{
 struct cpu_info *ci;

 KPREEMPT_DISABLE(l);
 ci = l->l_cpu;
 KASSERT(l->l_blcnt == 0);
 KASSERT(ci->ci_biglock_count == 0);



The backtrace in the crash dump is not very helpful:

(gdb) bt
#0  0x80224315 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:720
#1  0x809f5ec3 in kern_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:61
#2  0x80a37109 in vpanic (fmt=0x8135e980 "kernel %sassertion \"%s\" failed: 
file \"%s\", line %d ", ap=ap@entry=0xad0928973f48)
 at /usr/src/sys/kern/subr_prf.c:336
#3  0x80e7b0b3 in kern_assert (fmt=fmt@entry=0x8135e980 "kernel %sassertion 
\"%s\" failed: file \"%s\", line %d ")
 at /usr/src/sys/lib/libkern/kern_assert.c:51
#4  0x802568ce in mi_userret (l=0xcfc320ca9c00) at 
/usr/src/sys/sys/userret.h:91
#5  userret (l=0xcfc320ca9c00) at ./machine/userret.h:81
#6  syscall (frame=) at /usr/src/sys/arch/x86/x86/syscall.c:166
#7  0x802096ad in handle_syscall ()

hannken@ supplied me with a repro for this one so I'm going to look into it
tomorrow morning.  syzbot has also run into it recently.

This should be fixed now, with the following revisions:

cvs rdiff -u -r1.165 -r1.166 src/sys/kern/kern_lock.c
cvs rdiff -u -r1.336 -r1.337 src/sys/kern/kern_synch.c

Cheers,
Andrew




Experience with NetBSD on Supermicro

2020-02-05 Thread Frank Kardel

Hi !

Does anybody have experience with NetBSD on Supermicro H11SSW-NT 
mainboard like in 2113S-WN24RT system with an EPYC 7302P?


I assume that the BCM57416 10GE is not yet supported when looking at our 
code.


Best regards,

  Frank



Re: 9.99.47 panic: diagnostic assertion "lwp_locked(l, spc->spc_mutex)" failed: file ".../kern_synch.c", line 1001

2020-02-16 Thread Frank Kardel

Yepp - I had two of those also

Frank


On 02/16/20 12:27, Thomas Klausner wrote:

Hi!

I just updated -current and quite soon had a panic:
cpu1: Begin traceback...
vpanic()
kern_assert
sched_lendpri
turnstile_block
rw_vector_enter
genfs_lock
layer_bypass
VOP_LOCK
vn_lock
layerfs_root
VFS_ROOT
lookup_once
namei_tryemulroot
namei
vn_open
do_open
do_sys_openat
sys_open
syscall

  Thomas




XEN 4.11 and 9.99.48 DOMU performance

2020-03-10 Thread Frank Kardel

This is my first XEN setup so I may have misconfigured something:

I have a 4G DOM0 on a 512G System with a EPYC 7302P 16-Core Processor.

On that I configured a 400G DOMU with 12 vcpus, like this:

name = "system"
kernel = "/netbsd-XEN3_DOMU.gz"
memory = 40
cpus="all"
vcpus=4
maxvcpus=12
vif = [ 'mac=aa:00:00:d1:00:01,bridge=bridge0',
'mac=aa:00:00:d1:00:02,bridge=bridge1' ]
disk = [ 'file:/data0/xen-roots/root-Alpine-system.img,0x0,w',
 'phy:/dev/wedges/data1,0x1,w' ]

On that I run postgresql 11 attempting to load a 1TB database.

Usually this workload keeps a machine continually busy cpu/io-wise.

I was expecting that I/O via the xen backend would be the bottleneck.

Instead DOM0 is only seldom busy for IO. DOMU is crawling along, sleeping
at all sorts of places:

load averages:  0.01,  0.11,  0.14;   up 4+00:44:49 16:04:56
68 processes: 66 sleeping, 2 on CPU
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle

Memory: 314G Act, 26M Inact, 16M Wired, 24M Exec, 312G File, 60G Free
Swap:

  PID USERNAME PRI NICE   SIZE   RES STATE  TIME WCPUCPU COMMAND
0 root   00 0K  803M CPU/1112:25 0.00%  0.00% [system]
19870 pgsql117078M 4056K biowai/4   3:32 0.00%  0.00% postgres
23443 pgsql117051G 1067M tstile/2   2:05 0.00%  0.00% postgres
17623 kardel85031M 4584K ttyraw/2   0:50 0.00%  0.00% systat
18581 pgsql 85051G  104M biowai/0   0:36 0.00%  0.00% postgres
19164 pgsql 85051G  106M uvn_fp/1   0:35 0.00%  0.00% postgres
 9695 pgsql 85051G  105M biowai/8   0:31 0.00%  0.00% postgres
23770 pgsql 84051G  103M uvn_fp/4   0:31 0.00%  0.00% postgres
24557 pgsql117051G 6440K tstile/0   0:30 0.00%  0.00% postgres
  135 pgsql 85051G  113M uvn_fp/4   0:28 0.00%  0.00% postgres
29232 pgsql 85051G  110M uvn_fp/2   0:28 0.00%  0.00% postgres
18124 pgsql 85051G  113M uvn_fp/6   0:27 0.00%  0.00% postgres
27138 pgsql 85051G  106M uvn_fp/0   0:26 0.00%  0.00% postgres
17531 pgsql 85051G  113M uvn_fp/7   0:24 0.00%  0.00% postgres
 3800 pgsql117051G  104M tstile/0   0:21 0.00%  0.00% postgres
25673 pgsql117051G  105M tstile/6   0:20 0.00%  0.00% postgres
24550 kardel85027M 2616K select/0   0:15 0.00%  0.00% screen
22941 pgsql 85051G  104M bioloc/0   0:14 0.00%  0.00% postgres
  345 root  85030M   16M pause/20:11 0.00%  0.00% ntpd
24483 pgsql 85051G  405M poll/1 0:06 0.00%  0.00% postgres
19694 pgsql 85051G   20M poll/4 0:05 0.00%  0.00% postgres
24189 kardel85087M 5260K select/0   0:04 0.00%  0.00% sshd
 1068 root  85062M 3388K poll/100:04 0.00%  0.00% pg_restore


xl list shows the DOMU being blocked most of the time.

Also the buffercache seems small:

     15282 metadata buffers using               233730 kBytes of memory ( 0%).
  81970980 pages for cached file data using  327883920 kBytes of memory (83%).
      6043 pages for executables using           24172 kBytes of memory ( 0%).
    366880 pages for anon (non-file) data      1467520 kBytes of memory ( 0%).
  15752203 free pages                         63008812 kBytes of memory (16%).
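
The page and kByte columns above are consistent with a 4 KiB page size
(81970980 pages x 4 = 327883920 kBytes). A minimal C sketch of that
arithmetic follows. The 4 KiB page size and the use of the sum of the five
lines as the 100% base are assumptions, not something the output states,
and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size: 4 KiB (inferred from the figures, not stated). */
#define PAGE_SIZE_BYTES 4096ULL

/* Convert a page count into kBytes, the unit used in the listing. */
static uint64_t
pages_to_kbytes(uint64_t pages)
{
	return pages * PAGE_SIZE_BYTES / 1024;
}

/* Truncating integer percentage of part relative to total. */
static unsigned
percent_of(uint64_t part, uint64_t total)
{
	assert(total != 0);
	return (unsigned)(part * 100 / total);
}
```

With the figures above, pages_to_kbytes(81970980) gives 327883920, and
percent_of(327883920, 392618154) gives 83, which matches the (83%) shown
for cached file data (392618154 being the sum of the five kByte values).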


What needs to be changed to avoid these surprising stalls and get the 
DOMU moving?


Do I need some XEN parameter tuning?

Frank



Re: XEN 4.11 and 9.99.48 DOMU performance

2020-03-10 Thread Frank Kardel
  0.0000 0.000  
0.0000 0.000   0  0  0  0 100
   069 0.0000 0.000  0.0000 0.000  0.0000 0.000 
0.0000 0.000  0.0000 0.000  0.0000 0.000  0.0000 0.000  
0.0000 0.000   0  0  0  0 100
   069 0.0000 0.000  0.0000 0.000  0.0000 0.000 
0.0000 0.000  0.0000 0.000  0.0000 0.000  0.0000 0.000  
0.0000 0.000   0  0  0  0 100
   069 4.0000 0.000  0.0000 0.000  33.999 0.293 
33.999 0.293  0.0000 0.000  4.0000 0.000  0.0000 0.000  
0.0000 0.000   0  0  0  0 100


To me it looks more like locking issues or xen scheduling features.

It makes progress, but very very slowly.

Frank

On 03/10/20 18:14, Manuel Bouyer wrote:

On Tue, Mar 10, 2020 at 04:20:22PM +0100, Frank Kardel wrote:

This is my first XEN setup so I may have misconfigured something:

I have a 4G DOM0 on a 512G System with a EPYC 7302P 16-Core Processor.

On that I configured a 400G DOMU with 12 vcpus, like this:

name = "system"
kernel = "/netbsd-XEN3_DOMU.gz"
memory = 40
cpus="all"
vcpus=4
maxvcpus=12
vif = [ 'mac=aa:00:00:d1:00:01,bridge=bridge0',
 'mac=aa:00:00:d1:00:02,bridge=bridge1' ]
disk = [ 'file:/data0/xen-roots/root-Alpine-system.img,0x0,w',
  'phy:/dev/wedges/data1,0x1,w' ]

On that I run postgresql 11 attempting to load a 1TB database.

Usually this workload keeps a machine continually busy cpu/io-wise.

I was expecting that I/O via the xen backend would be the bottleneck.

Instead DOM0 is only seldom busy for IO. DOMU is crawling along, sleeping
at all sorts of places:


What does
iostat 5

show about the disks, in the dom0 and domU ?





Re: XEN 4.11 and 9.99.48 DOMU performance

2020-03-10 Thread Frank Kardel

No information about IPI in vmstat -i in DOM0 and DOMU.

Otherwise it is usually responsive. Sometimes things get stuck but 
switching a screen in screen seems to unstick things.


It seems like "wakeups" get sometimes lost.

Frank


On 03/10/20 19:20, Manuel Bouyer wrote:

On Tue, Mar 10, 2020 at 06:48:14PM +0100, Frank Kardel wrote:

[...]

To me it looks more like locking issues or xen scheduling features.

yes, that could be. does vmstat -i show anything about IPIs ?

Is the domU otherwise responsive ?





Re: XEN 4.11 and 9.99.48 DOMU performance

2020-03-16 Thread Frank Kardel

Hi Manuel !

I am running with this morning's -current and things seem to have 
improved quite a bit. I see some usable I/O performance with a DOMU
load > 8, which is more like it.

Let's see how it progresses.

Frank


On 03/14/20 12:48, Manuel Bouyer wrote:

There have been scheduler-related fixes in the last few days; did you
try with an up to date kernel ?





modload & xen and -current 9.99.60

2020-05-06 Thread Frank Kardel

Hi,

Running 9.99.60 XEN3_DOMU shows

[ 67264.313173] kobj_load, 444: [%M/bpfjit/bpfjit.kmod]: linker error: 
out of memory
[ 67292.894143] kobj_load, 428: [%M/scsiverbose/scsiverbose.kmod]: 
linker error: out of memory


and modload fails with the OOM error.

Is this an expected behavior or a bug? (kern.securelevel is -1).

Frank



Re: modload & xen and -current 9.99.60

2020-05-11 Thread Frank Kardel

A short summary from my private discussion with Manuel:

- the tripping point in my setup was around 44096 Kb

- type="pvh" works in large memory constellations

- Thus the issue seems to exist primarily in DOMU kernels

So it is similar, but not the same. In my scenario modloads fail when
a certain amount or more of memory is available.

I switched to type=pvh now.

Frank

On 05/11/20 15:15, Stephen Borrill wrote:

On Fri, 8 May 2020, Manuel Bouyer wrote:


On Fri, May 08, 2020 at 02:55:10PM +0200, Frank Kardel wrote:
I checked the same kernel in an instance with memory=2048 and it just 
works.

Using today's kernel also works with memory=2048.

Using memory=65536 for the xen instance gives a surprisingly familiar

TEST-A# modload bpfjit
[  97.4727034] kobj_load, 444: [%M/bpfjit/bpfjit.kmod]: linker error: out of memory
modload: bpfjit: Cannot allocate memory
TEST-A#

So it seems to be linked to available memory.

The more you have the less you get for modload.


It could be a variable overflow somewhere but I can't see how it 
relates to 64Gb. Does it work with 16Gb ?


This sounds similar to the problem I reported a couple of weeks ago 
with exactly 16GB:


http://mail-index.netbsd.org/port-xen/2020/04/17/msg009654.html

Also could you try with a PVH or HVM guest ? These ones would use 
modules
from /stand/amd64/ and not /stand/amd64-xen/ and should be close to 
native.


I don't have a box with that much RAM to test ...

--
Manuel Bouyer 
NetBSD: 26 ans d'experience feront toujours la difference
--





Re: modload & xen and -current 9.99.60

2020-05-08 Thread Frank Kardel

I checked the same kernel in an instance with memory=2048 and it just works.

Using today's kernel also works with memory=2048.

Using memory=65536 for the xen instance gives a surprisingly familiar

TEST-A# modload bpfjit
[  97.4727034] kobj_load, 444: [%M/bpfjit/bpfjit.kmod]: linker error: out of memory

modload: bpfjit: Cannot allocate memory
TEST-A#

So it seems to be linked to available memory.

The more you have the less you get for modload.

Frank


On 05/07/20 22:52, Manuel Bouyer wrote:

On Thu, May 07, 2020 at 09:50:18PM +0200, Frank Kardel wrote:

see here:

Alpine: 21:45 ~ [8] sysctl  kern.module.path
kern.module.path = /stand/amd64-xen/9.99.60/modules

looks good


Alpine: 21:46 ~ [9] ll /stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
-r--r--r--  1 root  wheel  34328 May  5 16:58
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
Alpine: 21:46 ~ [10] size
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
textdata bss dec hex filename
   10399   0   0   10399289f
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
Alpine: 21:46 ~ [11] ll
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
-r--r--r--  1 root  wheel  140600 May  5 16:55
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
Alpine: 21:47 ~ [12] size
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
textdata bss dec hex filename
  132575  16   0  132591   205ef
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod

no problem for me, with sources from today:
xen1:/#modload bpfjit
xen1:/#modstat | grep !$
modstat | grep bpfjit
bpfjit misc filesys  -09174 sljit
xen1:/#modload pciverbose
xen1:/#modstat | grep !$
modstat | grep pciverbose
pciverbose misc filesys  -0 218 pci





Re: XEN 4.11 and 9.99.48 DOMU performance

2020-03-10 Thread Frank Kardel

interrupt total rate type
vmcmd kills   179840 misc
vmcmd extends 179800 misc
vmcmd calls  1850330 misc
pserialize exclusive access 1450 misc
vmem static_bt_inuse2000 misc
vmem static_bt_count2000 misc
rndpseudo open soft  390 misc
TLB shootdown  50603277  139 intr
softint net/0  33488890   92 misc
softint bio/0 10 misc
softint clk/0   4239404   11 misc
softint ser/0  26560 misc
callout late/0  1990 misc
crosscall unicast600 misc
namecache entries collected  9405282 misc
namecache under scan target  3622200 misc
vcpu0 xenev0 channel 4 18112410   50 intr
softint net/15644651 misc
softint bio/1 10 misc
...

softint clk/11   3856351 misc
softint ser/11  1490 misc
callout late/11   10 misc
vcpu0 xenev0 channel 2  2970 intr
vcpu0 raw systime went backwards1580 intr
vcpu0 xenev0 channel 5 36222558   99 intr
vcpu1 xenev0 channel 6   1554970 intr
vcpu1 missed hardclock   830 intr
vcpu1 xenev0 channel 7 36222475   99 intr
vcpu2 xenev0 channel 8 15438790   42 intr
...

xbd0 map unaligned960510 misc
xbd1 map unaligned  14069263 misc

TLB shootdown is there, as is some crosscall unicast. I don't see any other 
IPIs though.


Frank


On 03/10/20 19:52, Manuel Bouyer wrote:

On Tue, Mar 10, 2020 at 07:30:33PM +0100, Frank Kardel wrote:

No information about IPI in vmstat -i in DOM0 and DOMU.

the dom0 is not MP so I don't expect to see IPIs here.
But the domU is, so there should be IPIs here.

Hum, it looks like IPIs are in vmstat -e, not -i ...
sorry


Otherwise it is usually responsive. Sometimes things get stuck but switching
a screen in screen seems to unstick things.

It seems like "wakeups" get sometimes lost.

I guess it could be related to IPIs.
But I'm running daily tests on domUs and I didn't notice anything strange





Re: modload & xen and -current 9.99.60

2020-05-07 Thread Frank Kardel

see here:

Alpine: 21:45 ~ [8] sysctl  kern.module.path
kern.module.path = /stand/amd64-xen/9.99.60/modules
Alpine: 21:46 ~ [9] ll /stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
-r--r--r--  1 root  wheel  34328 May  5 16:58 
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
Alpine: 21:46 ~ [10] size 
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod

   textdata bss dec hex filename
  10399   0   0   10399289f 
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
Alpine: 21:46 ~ [11] ll 
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
-r--r--r--  1 root  wheel  140600 May  5 16:55 
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
Alpine: 21:47 ~ [12] size 
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod

   textdata bss dec hex filename
 132575  16   0  132591   205ef 
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod


Frank


On 05/07/20 17:40, Manuel Bouyer wrote:

On Thu, May 07, 2020 at 07:45:48AM +0200, Frank Kardel wrote:

Hi,

Running 9.99.60 XEN3_DOMU shows

[ 67264.313173] kobj_load, 444: [%M/bpfjit/bpfjit.kmod]: linker error: out
of memory
[ 67292.894143] kobj_load, 428: [%M/scsiverbose/scsiverbose.kmod]: linker
error: out of memory

and modload fails with the OOM error.

Is this an expected behavior or a bug? (kern.securelevel is -1).

What does kern.module.path show for you ?





Re: modload & xen and -current 9.99.60

2020-05-07 Thread Frank Kardel

Will test again with a newer kernel.

I forgot to mention that the DOMU is configured with memory=40.

I will also test with a lower memory value.

Frank


On 05/07/20 22:52, Manuel Bouyer wrote:

On Thu, May 07, 2020 at 09:50:18PM +0200, Frank Kardel wrote:

see here:

Alpine: 21:45 ~ [8] sysctl  kern.module.path
kern.module.path = /stand/amd64-xen/9.99.60/modules

looks good


Alpine: 21:46 ~ [9] ll /stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
-r--r--r--  1 root  wheel  34328 May  5 16:58
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
Alpine: 21:46 ~ [10] size
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
textdata bss dec hex filename
   10399   0   0   10399289f
/stand/amd64-xen/9.99.60/modules/bpfjit/bpfjit.kmod
Alpine: 21:46 ~ [11] ll
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
-r--r--r--  1 root  wheel  140600 May  5 16:55
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
Alpine: 21:47 ~ [12] size
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod
textdata bss dec hex filename
  132575  16   0  132591   205ef
/stand/amd64-xen/9.99.60/modules/pciverbose/pciverbose.kmod

no problem for me, with sources from today:
xen1:/#modload bpfjit
xen1:/#modstat | grep !$
modstat | grep bpfjit
bpfjit misc filesys  -09174 sljit
xen1:/#modload pciverbose
xen1:/#modstat | grep !$
modstat | grep pciverbose
pciverbose misc filesys  -0 218 pci





Re: zfs forgetting cache wedges?

2020-10-06 Thread Frank Kardel
Yepp, moving devpubd earlier (before mountall, as that does the "zfs 
mount -a"!) works.

Looks like we could refine the sequence here or pursue a variant of 
devfs in our spare time :-).


Frank


On 09/28/20 19:41, Michael van Elst wrote:

kar...@kardel.name (Frank Kardel) writes:


Interesting - I am running 9.99.72 currently.
I was always wondering why the devices show no statistics. These are
simple gpt zfs wedges.
Any idea what is wrong there?


When you use devpubd to create symlinks in dev/wedges, the links may
be stale when zfs starts because devpubd runs too late.

Moving devpubd to an earlier position would help, but the wedgenames hook
doesn't work without /usr.





zfs forgetting cache wedges?

2020-09-28 Thread Frank Kardel

Hi, is it normal that ZFS sort of forgets its cache configuration?

Given this configuration:
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP 
 HEALTH  ALTROOT
pool-18.94T  2.76T  6.17T - 5%30%  1.11x 
 ONLINE  -

  raidz1  8.94T  2.76T  6.17T - 5%30%
wedges/zfs10g0-0  -  -  - -  -  -
wedges/zfs10g1-0  -  -  - -  -  -
wedges/zfs10g2-0  -  -  - -  -  -
cache -  -  - -  -  -
  wedges/zfs-c1256G  4.40G   252G - 0% 1%

After a boot it looks like this:
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP 
 HEALTH  ALTROOT
pool-18.94T  2.76T  6.17T - 5%30%  1.11x 
 ONLINE  -

  raidz1  8.94T  2.76T  6.17T - 5%30%
wedges/zfs10g0-0  -  -  - -  -  -
wedges/zfs10g1-0  -  -  - -  -  -
wedges/zfs10g2-0  -  -  - -  -  -
cache -  -  - -  -  -
  384839849488-  -  - -  -  -

The cache wedge does not look very usable in that state.
Removing the device and adding it again gives a working cache device 
until the next reboot.


What am I doing wrong or what needs to be fixed?

Frank


Re: zfs forgetting cache wedges?

2020-09-28 Thread Frank Kardel

Interesting - I am running 9.99.72 currently.

I was always wondering why the devices show no statistics. These are 
simple gpt zfs wedges.


Any idea what is wrong there?

Frank


On 09/28/20 18:04, Michael van Elst wrote:

kar...@netbsd.org (Frank Kardel) writes:


After a boot it looks like this:
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP
  HEALTH  ALTROOT
pool-18.94T  2.76T  6.17T - 5%30%  1.11x
  ONLINE  -
   raidz1  8.94T  2.76T  6.17T - 5%30%
 wedges/zfs10g0-0  -  -  - -  -  -
 wedges/zfs10g1-0  -  -  - -  -  -
 wedges/zfs10g2-0  -  -  - -  -  -
cache -  -  - -  -  -
   384839849488-  -  - -  -  -
The cache wedge does not look very usable in that state.


Didn't happen here (with recent -current). The device paths for
the pool devices are stored in /etc/zfs/zfs.cache and the device
path to the cache device is stored on the pool devices.

But your pool devices also do not return data, and that's probably
the reason for the strange cache device path that couldn't be read.

NAME  SIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP  HEALTH  
ALTROOT
mypool 80M   104K  79.9M - 4% 0%  1.00x  ONLINE  -
   wedges/image080M   104K  79.9M - 4% 0%
cache-  -  - -  -  -
   wedges/image1  95.2M 1K  95.2M - 0% 0%





Re: RPI3 serlial clock confusion?

2020-07-09 Thread Frank Kardel

You are looking at the console port running at 115200 at boot time, right?

My 9.99.69 console output starts being garbled when 
machdep.cpu.frequency.target is being set to 1400.


That would match the other comments here.

As you didn't update the dtb files, could that be the difference? I think 
we got an upgrade of the dtb files between .17 and .69.

Best regards,

  Frank


On 07/08/20 22:50, Michael Cheponis wrote:
I have been running 9.99.17 on RPi3+ and did ./build.sh current, which 
produced 9.99.69 -- and it works perfectly, no garbling.  I just 
copied src/sys/arch/evbarm/compile/obj/GENERIC/netbsd.img to 
/boot/KERNEL7.IMG and rebooted.


# sysctl -a|grep freq
machdep.cpu.frequency.target = 1400
machdep.cpu.frequency.current = 1400
machdep.cpu.frequency.min = 600
machdep.cpu.frequency.max = 1400
machdep.cpu.frequency.available = 600 1400

So I'm now more confused than ever.



On Wed, Jul 8, 2020 at 9:03 AM Michael van Elst <mlel...@serpens.de> wrote:

kar...@netbsd.org (Frank Kardel) writes:

>The next message is the setting of the maximum frequency which hoses the
>RPI3B serial port speed. Do we have a clock setting/source issue here?

It's a hardware limitation, the UART frequency is coupled with the
CPU frequency.

You can either run with a fixed CPU frequency or configure the
other UART
as serial port. The latter then causes problems with the bluetooth
controller.
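
To illustrate why a clock change garbles the console, here is a small
sketch. It assumes the console is the BCM283x mini UART, whose data sheet
gives baudrate = system_clock_freq / (8 * (baudrate_reg + 1)). The 250 MHz
and 400 MHz clock values are hypothetical, chosen only to show the effect,
and the function names are illustrative rather than anything from the
NetBSD driver.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor register value for a wanted baud rate at a given clock. */
static uint32_t
miniuart_divisor(uint32_t clock_hz, uint32_t baud)
{
	assert(baud != 0 && clock_hz >= 8 * baud);
	return clock_hz / (8 * baud) - 1;
}

/* Baud rate that actually results from a divisor at a given clock. */
static uint32_t
miniuart_baud(uint32_t clock_hz, uint32_t reg)
{
	return clock_hz / (8 * (reg + 1));
}
```

A divisor computed for 115200 baud at the assumed 250 MHz is 270, which
yields about 115313 baud. If the clock then rises to the assumed 400 MHz
and the divisor is left alone, the same register value yields about
184501 baud, so a terminal still set to 115200 sees garbage.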

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."





RPI3 serlial clock confusion?

2020-07-08 Thread Frank Kardel
Using the image from 
http://nycdn.netbsd.org/pub/NetBSD-daily/HEAD/202007080050Z/evbarm-earmv7hf/binary/gzimg/armv7.img.gz


The boot succeeds, but the serial console output becomes garbled after the
message "Starting local daemons":

Setting securelevel: kern.securelevel: 0 -> 1
Starting virecover.
Starting devpubd.
Starting local daemons:.
[garbled]

machdep.cpu.frequency.target: 600 -> 1400 is the next line in the HDMI 
screen.


The next message is the setting of the maximum frequency, which hoses the
RPI3B serial port speed. Do we have a clock setting/source issue here?

Older releases (9.0, 9.99.56) do not change the serial port setup at 
that point.


Frank



-current and RPI 2/3

2020-07-17 Thread Frank Kardel

I am having trouble getting Raspberries to boot recent -current (9.99.69).

Raspberry Pi 2 Model B Rev 1.1

[   1.030] genfb0 at simplebus1: switching to framebuffer console
[   1.030] wsdisplay0 at genfb0 kbdmux 1: console (default, vt100 
emulation)

[   1.030] vchiq0 at simplebus1: BCM2835 VCHIQ
[   1.030] armpmu0 at simplebus0: Performance Monitor Unit
[   1.030] gpioleds0 at simplebus0: ACT PWR
[   1.030] bcmrng0 at simplebus1: RNG
[   1.030] entropy: ready

[   1.030] uvm_fault(0x80b2f3c8, 0, 1) -> e
[   1.030] Fatal kernel mode data abort: 'Translation Fault (S)'
[   1.030] trapframe: 0x80b68ea8
[   1.030] FSR=0005, FAR=00c0, spsr=a153
[   1.030] r0 =80808a00, r1 =0004, r2 =, r3 =
[   1.030] r4 =809e6cc4, r5 =0001, r6 =809e6ccc, r7 =0004
[   1.030] r8 =809e6cd4, r9 =809e6cc4, r10=809e6cc4, r11=80b68f34
[   1.030] r12=80808a00, ssp=80b68ef8, slr=0004, pc =8046f66c

Stopped in pid 0.0 (system) at  netbsd:cpu_topology_init+0x14c: ldr 
r1, [r3,


Raspberry Pi 2 Model B Rev 1.2

[   1.030] vchiq0 at simplebus1: BCM2835 VCHIQ
[   1.030] armpmu0 at simplebus0: Performance Monitor Unit
[   1.030] gpioleds0 at simplebus0: ACT PWR
[   1.030] bcmrng0 at simplebus1: RNG
[   1.030] entropy: ready
[   1.030] cpu_topology_init: info bogus, faking it
[   1.5549067] cpu3: 600 MHz Cortex-A53 r0p4 (Cortex V8A core)
[   1.5749092] cpu3: DC enabled IC enabled WB enabled EABT branch 
prediction enabled

[   1.6149103] cpu3: 32KB/64B 2-way L1 VIPT Instruction cache
[   1.6449126] cpu3: 32KB/64B 4-way write-back-locking-C L1 PIPT Data cache
[   1.6749184] cpu3: 512KB/64B 16-way write-through L2 PIPT Unified cache
[   1.7049206] vfp3 at cpu3: NEON MPE (VFP 3.0+), rounding, NaN 
propagation, denormals

### stuck here

Raspberry Pi 3 Model B Plus Rev 1.3

[   1.030] wsdisplay0 at genfb0 kbdmux 1: console (default, vt100 
emulation)

[   1.030] vchiq0 at simplebus1: BCM2835 VCHIQ
[   1.030] armpmu0 at simplebus0: Performance Monitor Unit
[   1.030] gpioleds0 at simplebus0: ACT
[   1.030] bcmrng0 at simplebus1: RNG
[   1.030] entropy: ready
[   1.030] cpu_topology_init: info bogus, faking it
[   1.7022878] cpu3: 600 MHz Cortex-A53 r0p4 (Cortex V8A core)
[   1.7222897] cpu3: DC enabled IC enabled WB enabled EABT branch 
prediction enabled

[   1.7622915] cpu3: 32KB/64B 2-way L1 VIPT Instruction cache
[   1.7922937] cpu3: 32KB/64B 4-way write-back-locking-C L1 PIPT Data cache
[   1.8222994] cpu3: 512KB/64B 16-way write-through L2 PIPT Unified cache
[   1.8523009] vfp3 at cpu3: NEON MPE (VFP 3.0+), rounding, NaN 
propagation, denormals

### stuck here

Sometimes cpu2 manages to show up.

This happens with self-compiled sources and the latest releng version

[   1.000] NetBSD 9.99.69 (GENERIC) #0: Fri Jul 17 02:16:57 UTC 2020
[   1.000] 
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/evbarm/compile/GENERIC


Older -current versions manage to boot.

Frank



Re: -current and RPI 2/3

2020-07-19 Thread Frank Kardel

Thanks: RPI2 and RPI3 boot again !

Frank


On 07/19/20 13:50, Nick Hudson wrote:

On 17/07/2020 14:00, Frank Kardel wrote:

I am having trouble getting Raspberries to boot recent -current (9.99.69).



Should be fixed now with

src/sys/arch/arm/arm/armv6_start.S:1.21
src/sys/arch/arm/include/asan.h:1.3


Raspberry Pi 3 Model B Plus Rev 1.3



Hoping my commit fixes this too - I didn't test.

Sorry for the breakage.

Nick




Re: Routing socket issue?

2021-01-31 Thread Frank Kardel

Looks reasonable to me.

I think the overflows can be triggered when the routing daemon adds a 
longer list of routes, like when discovering another part of the net.
This happens e.g. here when my host starts up the VPN and then frr
needs to insert 30+ routes it just learned.


So depending on the topology there can be at times route message floods.

Frank


On 01/31/21 09:33, Roy Marples wrote:

Hi Frank :)

On 31/01/2021 07:58, Frank Kardel wrote:
For example I fail to see how RTM_LOSING helps that because it won't 
change

how ntpd would configure itself.

Well if I read the comment right I am inclined to differ here:
In in_pcb.c we find:
/*
  * Check for alternatives when higher level complains
  * about service problems.  For now, invalidate cached
  * routing information.  If the route was created dynamically
  * (by a redirect), time to try a default gateway again.
  */
in_losing(struct inpcb *inp)

and the call is in tcp_timer.c:
 /*
  * If losing, let the lower level know and try for
  * a better route.  Also, if we backed off this far,
  * our srtt estimate is probably bogus.  Clobber it
  * so we'll take the next rtt measurement as our srtt;
  * move the current srtt into rttvar to keep the current
  * retransmit times until then.
  */

As ntpd acts after a grace period the routing engine may have 
corrected this situation and routing may indeed change.
ntpd's interactions with peers can take up to 1024s so it is good to 
attempt in a best effort way to keep the internal local address/socket
state close to the current state.
It is likely though that there have been routing messages like 
RTM_CHANGE/ADD/DELETE before that and RTM_LOSING is not providing
additional information at that point.


Right, RTM_LOSING is just informational.
If any routing does change then we get RTM_CHANGE/ADD/DELETE etc.





As NTP doesn't bring interfaces up or down, RTM_IFANNOUNCE is 
useless as well.
If the interface does vanish, any addresses on it will be reported 
via RTM_DELADDR.
RTM_IFINFO is also questionable as commentary in the code is that it 
only cares about addresses.



Well I read
ntp_io.c
 /*
  * we are keen on new and deleted addresses and
  * if an interface goes up and down or routing
  * changes
  */
not as being interested in addresses only.

Also keep in mind that at this point routing messages are processed 
in a loop and the action here

 timer_interfacetimeout(current_time + UPDATE_GRACE);
just sets the variable for the next interface+local address update 
run. This is very cheap. The grace period
will batch multiple routing messages together. An explicit routing 
message flush is, from my point of view,
code clutter here, as the socket is effectively drained in the loop 
at the cost of examining the msg_type and setting
a variable. Not much gained here.
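
As a rough illustration of the "drain in a loop, then arm the grace timer"
pattern discussed here, a minimal C sketch follows. It is not ntpd's code:
the function names and the UPDATE_GRACE value are assumptions, it does not
filter on message type, and it treats ENOBUFS (the overflow indication)
simply as one more reason to rescan.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

#define UPDATE_GRACE 2	/* seconds; hypothetical value for this sketch */

/* Put a descriptor into non-blocking mode; returns -1 on failure. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Drain a non-blocking descriptor. If anything was read, or an
 * overflow (ENOBUFS) was reported, schedule one update at
 * now + UPDATE_GRACE. Returns false on an unexpected error, in which
 * case the caller would fall back to periodic scanning.
 */
static bool
drain_and_schedule(int fd, time_t now, time_t *next_update)
{
	char buf[2048];
	bool seen = false;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			seen = true;	/* a message arrived */
			continue;
		}
		if (n == 0)
			break;		/* end of file */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			break;		/* fully drained */
		if (errno == ENOBUFS) {
			seen = true;	/* overflow: rescan anyway */
			continue;
		}
		return false;		/* unexpected error */
	}
	if (seen)
		*next_update = now + UPDATE_GRACE;
	return true;
}
```

However many messages arrive in one burst, the result is a single rescan
after the grace period, which is why an overflow loses nothing important
in this scheme.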


OK, we'll keep RTM_IFINFO but drop RTM_IFANNOUNCE.
The point is trying to eliminate the overflow message entirely.

I mean, if you want to argue against any of that then I would 
suggest why even bother filtering or looking at overflow at all?
Shrink the code - any activity on the routing socket, drain it 
ignoring all error, start the interface update timer.
That would be an option but we should react only on known events. 
There may be one or two events that could be removed from
the list after examination as other messages can cover for them. Keep 
in mind that this is a portable code section and the
code tries to be on the fail safe, robust side for the goal of 
address/routing tracking, so adjusting it to a particular implementation
may break on other OS implementations.


Well, Dragonfly (prior to my patches there) and by extension FreeBSD 
(not checked to see if that changed) both emit RTM_DELADDR before 
RTM_IFANNOUNCE (ie wrong order) so when they do overflow you never see 
RTM_IFANNOUNCE to say the interface vanished. Hence there is zero 
point in listening for it for ntp.






As for the message: IMHO it does not need to be logged at all 
(DPRINTF/maybe LOGDEBUG at most) because the overflow should and 
does just trigger ntpd to reevaluate the interface/routing 
configuration.


This information is not important at all for normal operation as 
the effects are correctly mitigated.


I changed it to LOG_DEBUG as well as removing RTM_LOSING and 
RTM_IFANNOUNCE as discussed above.




Great.

BTW: does the current code revert to (fail safe) periodic interface 
scanning if the routing socket is being disabled (happens when an 
unexpected error code is returned from read(2))?


No.

The socket is non blocking so the only error to ignore here would be 
EINTR.

Any other errors are due to bad programming IMO.
Could be bad programming, but I prefer ntpd being forgiving 
of hiccups by reverting to periodic scanning when we
disable the routing socket. That is a fail safe
Re: st.c update has broken dump multi-tape support

2021-06-09 Thread Frank Kardel

Hi Brett !

A quick analysis leads me to believe that the culprit is in this commit:

   revision 1.234
   date: 2018-03-24 09:08:19 +0100;  author: mlelstv;  state: Exp;
   lines: +176 -134;  commitid: xU4Kh6YFLfDywGvA;
   branches:  1.234.2;
   Use separate lock to protect internal state and release locks when
   calling biodone.

Here the logic for ST_EARLY_WARN got lost. So the EOM always delivers 
EIO instead of a 0 write count when EOM is reported by the drive and
early warning is enabled.


The early warning logic is described in st.4 as

EOM HANDLING
     Attempts to write past EOM and how EOM is reported are handled slightly
     differently based upon whether EARLY WARNING recognition is enabled in
     the driver.

     If EARLY WARNING recognitions is not enabled, then detection of EOM (as
     reported in SCSI Sense Data with an EOM indicator) causes the write
     operation to be flagged with I/O error (EIO).  This has the effect for
     the user application of not knowing actually how many bytes were read
     (since the return of the read(2) system call is set to −1).

     If EARLY WARNING recognition is enabled, then detection of EOM (as
     reported in SCSI Sense Data with an EOM indicator) has no immediate
     effect except that the driver notes that EOM has been detected. If the
     write completing didn't transfer all data that was requested, then the
     residual count (counting bytes not written) is returned to the user
     application. In any event, the next attempt to write (if that is the next
     action the user application takes) is immediately completed with no data
     transferred, and a residual returned to the user application indicating
     that no data was transferred.  This is the traditional UNIX EOF
     indication. The state that EOM had been seen is then cleared.

     In either mode of operation, the driver does not prohibit the user
     application from writing more data, if it chooses to do so. This will
     continue up until the physical end of media, which is usually signalled
     internally to the driver as a CHECK CONDITION with the Sense Key set to
     VOLUME OVERFLOW. When this or any otherwise unhandled error occurs, an
     error return of EIO will be transmitted to the user application.  This
     does indeed mean that if EARLY WARNING is enables and the device
     continues to set EOM indicators prior to hitting physical end of media,
     that an indeterminate number of 'short write returns' as described in the
     previous paragraph will occur. However, the expected user application
     behaviour (in common with other systems) is to close the tape and rewind
     and request another tape upon the receipt of the first EOM indicator,
     possibly after writing one trailer record.

dump aborts on EIO. dump will switch tapes if it gets a zero write count.

Thus the 1.234 commit should be fixed with respect to EOM signalling.
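
To make the distinction concrete, here is a small C sketch of how an
application might interpret the return value of write(2) on a tape with
early warning enabled, following the st.4 text quoted above. This is not
dump's actual code; the enum and function names are invented for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

enum tape_write_result {
	TAPE_WRITE_OK,		/* everything requested was written */
	TAPE_WRITE_SHORT,	/* residual returned, EOM is near */
	TAPE_END_OF_MEDIA,	/* zero count: switch to the next tape */
	TAPE_WRITE_ERROR	/* -1, e.g. EIO: treat as a hard error */
};

/* Classify the return value n of a write(2) of 'requested' bytes. */
static enum tape_write_result
classify_tape_write(ssize_t n, size_t requested)
{
	if (n < 0)
		return TAPE_WRITE_ERROR;
	if (n == 0 && requested > 0)
		return TAPE_END_OF_MEDIA;
	if ((size_t)n < requested)
		return TAPE_WRITE_SHORT;
	return TAPE_WRITE_OK;
}
```

With the regression described above the driver returns -1/EIO where the
zero count is expected, so an application written along these lines ends
up in the error branch instead of asking for the next tape.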

Frank


On 06/09/21 02:47, Brett Lymn wrote:

Folks,

I don't perform a tape backup nor update this machine very often so it
has taken a while for me to spot this.

I backup to tape which takes a few tapes to complete, in the past this
has worked fine, when one tape is full dump recognises this and prompts
for a new tape.

I attempted a backup a couple of days ago and now dump says "write
error" and then asks if it should restart the dump, answering yes does
restart the dump from the beginning, answering no causes dump to exit.

As I said, this machine does not get updated often so I suspect this
problem has been there for a while.  The kernel was built with v1.240 of
st.c, this version causes dump to misbehave.  I reverted st.c back to
v1.231 (this was the version of st.c that was used in the kernel that
made the last successful backup).  After adding a couple of FALLTHROUGH
comments to get v1.231 to compile I booted to this kernel and found that
dump behaved correctly again.

Given the above it looks like a change to st.c between v1.231 and v1.240
has broken multi-tape dumps.  Fortunately most of the commits in that
bracket are cosmetic, one that does stand out is v1.238 which does
modify the tape position handling.  I will try a kernel that
incorporates v1.237 of st.c and see what happens.  Unfortunately,
testing is a very slow process as it takes about 3 hours to fill a tape
though I may be able to reduce that by using a lto-1 tape instead which
should halve the time taken to fill a tape.





Re: st.c update has broken dump multi-tape support

2021-06-10 Thread Frank Kardel

Hi Brett,

I meant the section in ststart1 where error is set to zero followed by 
goto out in the fixed blocksize part.

On that path the biodone() would be missing - just something I noticed 
when looking at the code.


/*
 * only FIXEDBLOCK devices have pending I/O or space operations.
 */
if (st->flags & ST_FIXEDBLOCKS) {
	/*
	 * If we are at a filemark but have not reported it yet
	 * then we should report it now
	 */
	if (st->flags & ST_AT_FILEMARK) {
		if ((bp->b_flags & B_READ) == B_WRITE) {
			/*
			 * Handling of ST_AT_FILEMARK in
			 * st_space will fill in the right file
			 * mark count.
			 * Back up over filemark
			 */
			if (st_space(st, 0, SP_FILEMARKS, 0)) {
				error = EIO;
				goto out;
			}
		} else {
			bp->b_resid = bp->b_bcount;
			error = 0;
			st->flags &= ~ST_AT_FILEMARK;
 >>>>>>/* XXX missing a biodone() here? */
			goto out;
		}
	}
}

Frank


On 06/10/21 08:42, Brett Lymn wrote:

Hi Frank,

On Thu, Jun 10, 2021 at 07:45:25AM +0200, Frank Kardel wrote:

Could you check whether my suspicion that biodone() may be missing the
ststart1 function in the


It is missing there but is called in ststart if the error is != 0 after
the ststart1 call.  I was going to update the ststart function to do
something very close to what you have done.


I have not tested the patch as my machine with the tapes is remote and has
no remote console

and I don't want to brick that while being off-site.


That's ok - I can test here without harming anything, the machine the
tape drive is attached to has to be booted to windows for $WORK during
the day so my testing window is limited :)





Re: st.c update has broken dump multi-tape support

2021-06-09 Thread Frank Kardel

Hi Brett,

That was my impression too. Just check that the write return count on
EOM is zero to indicate EOF.


My stab at this would be the attached patch.

Could you check my suspicion that biodone() may be missing in the
ststart1 function in the error == 0 case?

I have not tested the patch as my machine with the tapes is remote and
has no remote console, and I don't want to brick it while being off-site.

Frank


On 06/10/21 04:53, Brett Lymn wrote:

Hi Frank,

On Wed, Jun 09, 2021 at 07:06:07PM +0200, Frank Kardel wrote:

A quick analysis leads me to believe that the culprit is in this commit:

revision 1.234
date: 2018-03-24 09:08:19 +0100;  author: mlelstv;  state: Exp;
lines: +176 -134;  commitid: xU4Kh6YFLfDywGvA;
branches:  1.234.2;
Use separate lock to protect internal state and release locks when
calling biodone.

Here the logic for ST_EARLY_WARN got lost. So the EOM always delivers EIO
instead


Yes, I think you are correct looking at that change.  My backup script
does set the early warning flag.  I will have a stab at fixing this
later today, I think if I just avoid returning EIO in the ST_EOM_PENDING
case inside ststart it should be good.



--- st.c	2021-06-09 18:31:50.860128750 +0200
+++ st.c.240	2021-06-10 07:30:55.828024537 +0200
@@ -1242,6 +1242,7 @@
 bp->b_resid = bp->b_bcount;
 error = 0;
 st->flags &= ~ST_AT_FILEMARK;
+/* XXX missing a biodone() here? */
 goto out;
 			}
 		}
@@ -1251,7 +1252,13 @@
 	 * yet then we should report it now.
 	 */
 	if (st->flags & (ST_EOM_PENDING|ST_EIO_PENDING)) {
-		error = EIO;
+		if (st->flags & ST_EIO_PENDING) {
+			error = EIO;
+		} else {
+			error = 0;
+			bp->b_resid = bp->b_bcount;
+			biodone(bp);			
+		}
 		goto out;
 	}
 


Re: st.c update has broken dump multi-tape support

2021-06-12 Thread Frank Kardel

Hi !

Looks pretty good so far ... can we remove the following marked lines,
which are already taken care of in the ststart1 complete case?

/*
 * only FIXEDBLOCK devices have pending I/O or space operations.
 */
if (st->flags & ST_FIXEDBLOCKS) {
	/*
	 * If we are at a filemark but have not reported it yet
	 * then we should report it now
	 */
	if (st->flags & ST_AT_FILEMARK) {
		if ((bp->b_flags & B_READ) == B_WRITE) {
			/*
			 * Handling of ST_AT_FILEMARK in
			 * st_space will fill in the right file
			 * mark count.
			 * Back up over filemark
			 */
			if (st_space(st, 0, SP_FILEMARKS, 0)) {
				error = EIO;
				goto out;
			}
		} else {
>>> bp->b_resid = bp->b_bcount;
			error = 0;
			st->flags &= ~ST_AT_FILEMARK;
			goto out;
		}
	}
}

/*
 * If we are at EOM but have not reported it
 * yet then we should report it now.
 */
if (st->flags & (ST_EOM_PENDING|ST_EIO_PENDING)) {
>>  bp->b_resid = bp->b_bcount;
	error = 0;
	if (st->flags & ST_EIO_PENDING)
		error = EIO;
	st->flags &= ~(ST_EOM_PENDING|ST_EIO_PENDING);
	goto out;
}

Frank


On 06/11/21 21:10, Michael van Elst wrote:

bl...@internode.on.net (Brett Lymn) writes:


Here is the patch that makes multi-tape dumps work for me:

I'm currently testing

http://ftp.netbsd.org/pub/NetBSD/misc/mlelstv/st.diff

It's a bit cumbersome to do multi-tape dumps if your disk has 11GB
data and the tape fits 40GB uncompressed.





Re: st.c update has broken dump multi-tape support

2021-06-10 Thread Frank Kardel

Hi !

I assumed Michael was proposing a solution for the missing biodone() in 
the fixed block path (though that part was missing in the patch).


We should try to fix both issues (write return code and missing biodone) 
with hopefully minimal changes without sacrificing clarity and abstraction.


IMHO ststart() should manage the interface to ststart1() but not look
into specific bits (ST_EOM_PENDING), and ststart1() should signal errno
and biodone() to ststart(). Thus I did see merit in Michael's proposal.
This is a style discussion, however.

On a more important note: looking into the code again, we also seem to
miss clearing ST_{EOM,EIO}_PENDING, which was present in 1.231. Clearing
these flags would bring st.c in line with st(4) again.
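
The interface split discussed above could look roughly like the sketch below: a helper reports both an errno value and whether the request has to be completed right away, and it clears the pending flags as 1.231 did. This is a standalone model, not kernel code. The struct, the flag values and the function name are made up for illustration; only the flag names ST_EOM_PENDING and ST_EIO_PENDING come from st.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real definitions live in the st driver. */
#define ST_EOM_PENDING	0x01
#define ST_EIO_PENDING	0x02

/* Hypothetical stand-in for the parts of struct buf used here. */
struct model_buf {
	size_t b_bcount;	/* requested transfer size */
	size_t b_resid;		/* bytes not transferred */
};

/*
 * Report a pending end-of-medium or I/O error condition.
 * Returns true if the request must be completed right away (the
 * caller would then set b_error and call biodone()), false if
 * normal processing should continue. *errorp receives the errno.
 */
bool
model_report_pending(unsigned int *flags, struct model_buf *bp, int *errorp)
{
	*errorp = 0;
	if ((*flags & (ST_EOM_PENDING | ST_EIO_PENDING)) == 0)
		return false;

	/* Nothing was transferred: EOM shows up as a zero-length write. */
	bp->b_resid = bp->b_bcount;
	if (*flags & ST_EIO_PENDING)
		*errorp = EIO;

	/* Clear both flags so the condition is reported only once. */
	*flags &= ~(unsigned int)(ST_EOM_PENDING | ST_EIO_PENDING);
	return true;
}
```

With this shape the caller never inspects ST_EOM_PENDING itself; it only acts on the returned boolean and the errno value.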

Frank


On 06/10/21 15:52, Brett Lymn wrote:

On Thu, Jun 10, 2021 at 12:13:22PM +0200, Michael van Elst wrote:

On Thu, Jun 10, 2021 at 12:02:19PM +0200, Michael van Elst wrote:


If you don't like the fake errno, the function needs to return
two values, the error value and a boolean to finish the
unqueued request. Cleaner, but more changes.

E.g. (not even compile-tested):


I don't think that is quite right.  At line 1204 error is set to EIO; even with
your changes b_error will still get set to EIO when EOM_PENDING is true.  In
previous versions b_error would only be set to EIO if the ST_EOM_PENDING flag
was not set.  I did a much smaller change in ststart: inside the if at line
1290 I added a check so that b_error is set to the value of error unless error
== EIO and st->flags contains ST_EOM_PENDING.  This change made dump perform as
expected and prompt for a new tape.





Re: st.c update has broken dump multi-tape support

2021-06-11 Thread Frank Kardel

Hi !

ST_EOM_PENDING is set in st_interpret_sense()

- always for fixed block mode on EOM condition

- if EWARN enabled and EOM condition for variable block size
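
The two rules above can be modelled as a small predicate. This is only a sketch of the condition as described in this message, not the code from st_interpret_sense(); the function and parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Model of when ST_EOM_PENDING gets set, following the two rules
 * listed above. All names here are hypothetical.
 */
bool
model_eom_pending(bool at_eom, bool fixed_blocks, bool early_warning)
{
	if (!at_eom)
		return false;
	if (fixed_blocks)
		return true;		/* fixed block mode: always */
	return early_warning;		/* variable block size: only with EWARN */
}
```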

Frank.


On 06/10/21 23:59, Brett Lymn wrote:

On Thu, Jun 10, 2021 at 05:38:34PM +0200, Michael van Elst wrote:

Sorry, it doesn't fix the EOM handling, just the biodone.


mea culpa... I should take more time before replying...


I still have to understand the EOM logic :)


I will post a diff later that appears to work for me.  From what the code
used to do and the description Frank posted, EOM is indicated by a 0 length
write with no error iff the early warning flag is set.  I haven't checked but
I ASSuME that ST_EOM_PENDING will only be set if the early warning flag is on.





NetBSD on Lenovo

2021-05-03 Thread Frank Kardel

Hi !

Has anyone tried NetBSD with Lenovo ThinkPad P17 Gen 1 20SN ?

Initial digging makes me believe that the Intel integrated graphics could
be working. NVIDIA RTX{3,4,5}000 will probably not work.


Network should work, NVMe has a good chance also from what I could find.

Did anyone try this notebook, and what were the results?

Regards,

  Frank



Re: ZFS on current vs wedges - best practice?

2021-07-19 Thread Frank Kardel

Hi Jeff !

Yes, you can use wedge names.

You can configure them like "zpool create tank raidz2 wedges/wedgename-a 
wedges/wedgename-b wedges/wedgename-c wedges/wedgename-d wedges/wedgename-e"


For wedges to work you need to start devpubd (/etc/rc.conf: devpubd=YES)
before ZFS. Currently devpubd starts too late with its dependencies.


To start devpubd earlier you can use the following dependencies in
/etc/rc.d/devpubd:


# PROVIDE: devpubd
# REQUIRE: root
# BEFORE:  DISKS

To recover (not tested) you may try:

start devpubd

zpool export tank # you may try without this first

zpool import -d /dev/wedges # list found pools

zpool import -d /dev/wedges -a # imports all found pools

Use the hints at your own risk. I learned those when we briefly
(9.99.85-9.99.86) broke zfs vdev access via symlinks. Be sure to use a
recent (>2021-07-18) -current kernel in case you are running -current.


Frank

On 07/19/21 19:52, Jeff Rizzo wrote:
I had forgotten about this little detail, and am not sure about the 
best way to deal with it.



I have four disks partitioned with GPT (so I can create a raidframe 
raid1 on part of the disk, and use the rest for ZFS), and I made the 
mistake (?) of using wedge names to create the zpool.  So, after a 
reboot (but not the first time! only happened after n reboots), the 
wedges reordered themselves, and now my zpool looks like this:



NAME  STATE READ WRITE CKSUM
tank  UNAVAIL  0 0 0
  raidz2-0UNAVAIL  0 0 0
3140223856450238961   UNAVAIL  0 0 0  was /dev/dk4
1770477436286968258   FAULTED  0 0 0  was /dev/dk5
11594062134542531370  UNAVAIL  0 0 0  was /dev/dk6
dk7   ONLINE   0 0 0


I _think_ I can figure out how to recover my data without recreating 
the entire pool.  (I hope - suggestions there welcome as well!  Once I 
recover this time, I'm going to have to replace the vdevs one at a 
time anyway because I just realized the wedges are misaligned to the 
underlying disk block size.  Sigh.)



However, I'm not sure the best way (is there a way?) to keep this from 
happening again.  Can I use wedge names? (Will those persist across 
boots?)  Other than this minor detail, I've been quite happy with ZFS 
in 9 and -current.






Re: pgdaemon high CPU consumption

2022-07-01 Thread Frank Kardel

Hi Matthias !

See PR 55707
http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=55707 , which
I do not consider fixed due to the pgdaemon issue. Reverting arc.c to
1.20 will give you many xcalls, but the system stays more usable.


Frank


On 07/01/22 07:55, Matthias Petermann wrote:

Good day,

since some time I noticed that on several of my systems with 
NetBSD/amd64 9.99.97/98 after longer usage the kernel process pgdaemon 
completely claims a CPU core for itself, i.e. constantly consumes 100%.
The affected systems do not have a shortage of RAM and the problem 
does not disappear even if all workloads are stopped, and thus no RAM 
is actually used by application processes.


I noticed this especially in connection with accesses to the ZFS set 
up on the respective machines - for example after checkout from the 
local CVS relic hosted on ZFS.


Is there already a known problem or what information would have to be 
collected to get to the bottom of this?


I currently have such a case online, so I would be happy to pull 
diagnostic information this evening/afternoon. At the moment all info 
I have is from top.


Normal view:

```
  PID USERNAME PRI NICE   SIZE   RES STATE   TIME   WCPU    CPU COMMAND
    0 root     126    0     0K   34M CPU/0 102:45   100%   100% [system]

```

Thread view:


```
  PID   LID USERNAME PRI STATE   TIME   WCPUCPU NAME COMMAND
0   173 root 126 CPU/1  96:57 98.93% 98.93% pgdaemon [system]
```

Kind regards
Matthias





build.sh -m evbarm -a earmv6hf release fails

2022-07-19 Thread Frank Kardel

to compile vchiq_arm.o with today's -current(2022-07-19)

/tmp//cc4Xs586.s: Assembler messages:
/tmp//cc4Xs586.s:1362: Error: selected processor does not support `dsb' 
in ARM mode
/tmp//cc4Xs586.s:6689: Error: selected processor does not support `dsb' 
in ARM mode
/tmp//cc4Xs586.s:6711: Error: selected processor does not support `dsb' 
in ARM mode

--- vchiq_arm.o ---

*** Failed target: vchiq_arm.o
*** Failed commands:
${NORMAL_C}
=> @echo '   ' "compile  RPI_INSTALL/vchiq_arm.o" &&  : echo 
/usr/curbtools/bin/armv6--netbsdelf-eabihf-gcc   -mfloat-abi=soft 
-mapcs-frame -ffreestanding -fno-zero-initialized-in-bss 
-fno-delete-null-pointer-checks   -O2  -fstack-usage -Wstack-usage=3584  
-fno-strict-aliasing -fno-common -std=gnu99   -Werror -Wall -Wno-main 
-Wno-format-zero-length -Wpointer-arith -Wmissing-prototypes 
-Wstrict-prototypes -Wold-style-definition -Wswitch -Wshadow -Wcast-qual 
-Wwrite-strings -Wno-unreachable-code -Wno-pointer-sign -Wno-attributes 
-Wno-sign-compare -Walloca -Wno-address-of-packed-member  -march=armv6z 
-mtune=arm1176jzf-s -mfpu=vfp  --sysroot=/src/NetBSD/cur/BUILD.evbarm 
-I. -I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/acpica/dist 
-I/src/NetBSD/cur/src/sys/../common/lib/libx86emu 
-I/src/NetBSD/cur/src/sys/../common/lib/libc/misc 
-I/src/NetBSD/cur/src/sys/../common/include 
-I/src/NetBSD/cur/src/sys/arch  -I/src/NetBSD/cur/src/sys -nostdinc 
-DCOMPAT_UTILS  -DARM_GENERIC_TODR -D__HAVE_CPU_COUNTER  
-D__HAVE_CPU_UAREA_ALLOC_IDLELWP -D__HAVE_FAST_SOFTINTS  
-D__HAVE_MM_MD_DIRECT_MAPPED_PHYS -DCOMPAT_44  -DDIAGNOSTIC  
-D__HAVE_MM_MD_CACHE_ALIASING -DPLCONSOLE -D_KERNEL -D_KERNEL_OPT 
-std=gnu99 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/quad 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/atomic 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/hash/sha3 
-I/src/NetBSD/cur/src/sys/external/bsd 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-I/src/NetBSD/cur/src/sys/external/bsd/dwc2/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libfdt/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-DVCOS_VERIFY_BKPTS=1 -DUSE_VCHIQ_ARM -D__VCCOREVER__=0x0400 
-DVCHIQ_ENABLE_DEBUG=1 -DVCHIQ_LOG_DEFAULT=5 -c 
/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist/interface/vchiq_arm/vchiq_arm.c 
-o vchiq_arm.o  && /usr/curbtools/bin/armv6--netbsdelf-eabihf-gcc   
-mfloat-abi=soft -mapcs-frame -ffreestanding 
-fno-zero-initialized-in-bss -fno-delete-null-pointer-checks   -O2  
-fstack-usage -Wstack-usage=3584  -fno-strict-aliasing -fno-common 
-std=gnu99   -Werror -Wall -Wno-main -Wno-format-zero-length 
-Wpointer-arith -Wmissing-prototypes -Wstrict-prototypes 
-Wold-style-definition -Wswitch -Wshadow -Wcast-qual -Wwrite-strings 
-Wno-unreachable-code -Wno-pointer-sign -Wno-attributes 
-Wno-sign-compare -Walloca -Wno-address-of-packed-member  -march=armv6z 
-mtune=arm1176jzf-s -mfpu=vfp  --sysroot=/src/NetBSD/cur/BUILD.evbarm 
-I. -I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/acpica/dist 
-I/src/NetBSD/cur/src/sys/../common/lib/libx86emu 
-I/src/NetBSD/cur/src/sys/../common/lib/libc/misc 
-I/src/NetBSD/cur/src/sys/../common/include 
-I/src/NetBSD/cur/src/sys/arch  -I/src/NetBSD/cur/src/sys -nostdinc 
-DCOMPAT_UTILS  -DARM_GENERIC_TODR -D__HAVE_CPU_COUNTER  
-D__HAVE_CPU_UAREA_ALLOC_IDLELWP -D__HAVE_FAST_SOFTINTS  
-D__HAVE_MM_MD_DIRECT_MAPPED_PHYS -DCOMPAT_44  -DDIAGNOSTIC  
-D__HAVE_MM_MD_CACHE_ALIASING -DPLCONSOLE -D_KERNEL -D_KERNEL_OPT 
-std=gnu99 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/quad 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/atomic 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/hash/sha3 
-I/src/NetBSD/cur/src/sys/external/bsd 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-I/src/NetBSD/cur/src/sys/external/bsd/dwc2/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libfdt/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-DVCOS_VERIFY_BKPTS=1 -DUSE_VCHIQ_ARM -D__VCCOREVER__=0x0400 
-DVCHIQ_ENABLE_DEBUG=1 -DVCHIQ_LOG_DEFAULT=5 -c 
/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist/interface/vchiq_arm/vchiq_arm.c 
-o vchiq_arm.o  &&  : echo /usr/curbtools/bin/nbctfconvert -g -L VERSION 
vchiq_arm.o && /usr/curbtools/bin/nbctfconvert -g -L VERSION vchiq_arm.o

*** [vchiq_arm.o] Error code 1

Anything I missed here?


Re: build.sh -m evbarm -a earmv6hf release fails

2022-07-19 Thread Frank Kardel

fixed with src/sys/external/bsd/common/include/asm/barrier.h 1.18

Frank

On 07/19/22 21:43, Frank Kardel wrote:

to compile vchiq_arm.o with today's -current(2022-07-19)

/tmp//cc4Xs586.s: Assembler messages:
/tmp//cc4Xs586.s:1362: Error: selected processor does not support 
`dsb' in ARM mode
/tmp//cc4Xs586.s:6689: Error: selected processor does not support 
`dsb' in ARM mode
/tmp//cc4Xs586.s:6711: Error: selected processor does not support 
`dsb' in ARM mode

--- vchiq_arm.o ---

*** Failed target: vchiq_arm.o
*** Failed commands:
${NORMAL_C}
=> @echo '   ' "compile  RPI_INSTALL/vchiq_arm.o" &&  : echo 
/usr/curbtools/bin/armv6--netbsdelf-eabihf-gcc   -mfloat-abi=soft 
-mapcs-frame -ffreestanding -fno-zero-initialized-in-bss 
-fno-delete-null-pointer-checks   -O2  -fstack-usage 
-Wstack-usage=3584  -fno-strict-aliasing -fno-common -std=gnu99 
-Werror -Wall -Wno-main -Wno-format-zero-length -Wpointer-arith 
-Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition 
-Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wno-unreachable-code 
-Wno-pointer-sign -Wno-attributes -Wno-sign-compare -Walloca 
-Wno-address-of-packed-member -march=armv6z -mtune=arm1176jzf-s 
-mfpu=vfp --sysroot=/src/NetBSD/cur/BUILD.evbarm -I. 
-I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/acpica/dist 
-I/src/NetBSD/cur/src/sys/../common/lib/libx86emu 
-I/src/NetBSD/cur/src/sys/../common/lib/libc/misc 
-I/src/NetBSD/cur/src/sys/../common/include 
-I/src/NetBSD/cur/src/sys/arch  -I/src/NetBSD/cur/src/sys -nostdinc 
-DCOMPAT_UTILS  -DARM_GENERIC_TODR -D__HAVE_CPU_COUNTER 
-D__HAVE_CPU_UAREA_ALLOC_IDLELWP -D__HAVE_FAST_SOFTINTS 
-D__HAVE_MM_MD_DIRECT_MAPPED_PHYS -DCOMPAT_44  -DDIAGNOSTIC 
-D__HAVE_MM_MD_CACHE_ALIASING -DPLCONSOLE -D_KERNEL -D_KERNEL_OPT 
-std=gnu99 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/quad 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/atomic 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/hash/sha3 
-I/src/NetBSD/cur/src/sys/external/bsd 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-I/src/NetBSD/cur/src/sys/external/bsd/dwc2/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libfdt/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-DVCOS_VERIFY_BKPTS=1 -DUSE_VCHIQ_ARM -D__VCCOREVER__=0x0400 
-DVCHIQ_ENABLE_DEBUG=1 -DVCHIQ_LOG_DEFAULT=5 -c 
/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist/interface/vchiq_arm/vchiq_arm.c 
-o vchiq_arm.o  && /usr/curbtools/bin/armv6--netbsdelf-eabihf-gcc   
-mfloat-abi=soft -mapcs-frame -ffreestanding 
-fno-zero-initialized-in-bss -fno-delete-null-pointer-checks   -O2  
-fstack-usage -Wstack-usage=3584  -fno-strict-aliasing -fno-common 
-std=gnu99 -Werror -Wall -Wno-main -Wno-format-zero-length 
-Wpointer-arith -Wmissing-prototypes -Wstrict-prototypes 
-Wold-style-definition -Wswitch -Wshadow -Wcast-qual -Wwrite-strings 
-Wno-unreachable-code -Wno-pointer-sign -Wno-attributes 
-Wno-sign-compare -Walloca -Wno-address-of-packed-member -march=armv6z 
-mtune=arm1176jzf-s -mfpu=vfp --sysroot=/src/NetBSD/cur/BUILD.evbarm 
-I. -I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/acpica/dist 
-I/src/NetBSD/cur/src/sys/../common/lib/libx86emu 
-I/src/NetBSD/cur/src/sys/../common/lib/libc/misc 
-I/src/NetBSD/cur/src/sys/../common/include 
-I/src/NetBSD/cur/src/sys/arch  -I/src/NetBSD/cur/src/sys -nostdinc 
-DCOMPAT_UTILS  -DARM_GENERIC_TODR -D__HAVE_CPU_COUNTER 
-D__HAVE_CPU_UAREA_ALLOC_IDLELWP -D__HAVE_FAST_SOFTINTS 
-D__HAVE_MM_MD_DIRECT_MAPPED_PHYS -DCOMPAT_44  -DDIAGNOSTIC 
-D__HAVE_MM_MD_CACHE_ALIASING -DPLCONSOLE -D_KERNEL -D_KERNEL_OPT 
-std=gnu99 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/quad 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/string 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/arch/arm/atomic 
-I/src/NetBSD/cur/src/sys/lib/libkern/../../../common/lib/libc/hash/sha3 
-I/src/NetBSD/cur/src/sys/external/bsd 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-I/src/NetBSD/cur/src/sys/external/bsd/dwc2/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libfdt/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/libnv/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist 
-I/src/NetBSD/cur/src/sys/external/bsd/common/include 
-DVCOS_VERIFY_BKPTS=1 -DUSE_VCHIQ_ARM -D__VCCOREVER__=0x0400 
-DVCHIQ_ENABLE_DEBUG=1 -DVCHIQ_LOG_DEFAULT=5 -c 
/src/NetBSD/cur/src/sys/external/bsd/vchiq/dist/interface/vchiq_arm/vchiq_arm.c 
-o vchiq_arm.o  &&  : echo /usr/curbtools/bin/nbctfconvert -g -L 
VERS

Re: "zfs send" freezes system

2022-07-19 Thread Frank Kardel
I'd vote for backing out the patch unless someone(TM) can find the 
pgdaemon issue.


Best regards,

  Frank


On 07/19/22 08:46, Matthias Petermann wrote:

Hello,

On 13.07.22 12:30, Matthias Petermann wrote:

I can now confirm that reverting the patch also solved my problem. Of 
course I first fell into the trap, because I had not considered that 
the ZFS code is loaded as a module and had only changed the kernel. 
As a result, it looked at first as if this would not help. Finally it 
did...I am now glad that I can use a zfs send again in this way. This 
previously led reproducibly to a crash, whereby I could not make 
backups. This is critical for me and I would like to support tests 
regarding this.


In contrast to the PR, there are hardly any xcalls in my use case - 
however, my system only has 4 CPU cores, 2 of which are physical.



Many greetings
Matthias



Roughly one week after removing the patch, my system with ZFS is 
behaving "normally" for the most part and the freezes have 
disappeared. What is the recommended way given the 10 branch? If it is 
not foreseeable that the basic problem can be solved shortly, would it 
also be an option to withdraw the patch in the sources to get at least 
a stable behavior? (Not only) on the sidelines, I would still be 
interested in whether this "zfs send" problem occurs in general, or 
whether certain hardware requirements have a favorable effect on it.


Kind regards
Matthias





i915 observations

2022-07-21 Thread Frank Kardel

Hi !

Where do we collect i915 (-current) issues? We don't seem to have many 
PRs in that area. I have a data point with a Thinkpad T15p where i915 
almost works but has flickering dark dashes and massively delayed 
keyboard input. Unfortunately I have to pass the notebook on to a colleague.


Frank



