Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Julian Elischer

On 19/2/18 4:33 am, Gleb Smirnoff wrote:
> On Sun, Feb 18, 2018 at 10:15:24PM +0200, Andriy Gapon wrote:
> A> On 18/02/2018 15:26, Gleb Smirnoff wrote:
> A> > My only point is that it is a performance improvement. IMHO that's enough :)
> A>
> A> I don't think that passing an invalid argument to a documented KPI is "enough"
> A> for any optimization.
>
> I don't see a sense in making this KPI so sacred. This is something used internally
> in kernel, and not used outside. The KPI has changed several times in the past.
>
> A> > If you can't suggest a more elegant way of doing that improvement, then all
> A> > I can suggest is to document it and add its support to ZFS.
> A>
> A> In return I can only suggest that (1) you run your suggestion by arch@ -- unless
> A> that's already been done and you can point me to the discussion,  (2) document
> A> it and (3) double-check that all implementations conform to it.
>
> I can provide a patch for ZFS.

If any module outside of the code that implements it needs to know about it,
then it is in the KPI and should be documented in the KPI documentation
(e.g. man 9).

Since the filesystems need to know about this, it must be an externally
visible feature and therefore needs to be documented.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Andriy Gapon
On 18/02/2018 22:33, Gleb Smirnoff wrote:
> On Sun, Feb 18, 2018 at 10:15:24PM +0200, Andriy Gapon wrote:
> A> On 18/02/2018 15:26, Gleb Smirnoff wrote:
> A> > My only point is that it is a performance improvement. IMHO that's enough :)
> A>
> A> I don't think that passing an invalid argument to a documented KPI is "enough"
> A> for any optimization.
>
> I don't see a sense in making this KPI so sacred. This is something used internally
> in kernel, and not used outside. The KPI has changed several times in the past.

I don't have anything against changing the KPI.
At the same time, I think that it should be well-defined at all times.

> A> > If you can't suggest a more elegant way of doing that improvement, then all
> A> > I can suggest is to document it and add its support to ZFS.
> A>
> A> In return I can only suggest that (1) you run your suggestion by arch@ -- unless
> A> that's already been done and you can point me to the discussion,  (2) document
> A> it and (3) double-check that all implementations conform to it.
>
> I can provide a patch for ZFS.

Thank you.  But I think that the documentation update will be much more valuable.


-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Gleb Smirnoff
On Sun, Feb 18, 2018 at 10:15:24PM +0200, Andriy Gapon wrote:
A> On 18/02/2018 15:26, Gleb Smirnoff wrote:
A> > My only point is that it is a performance improvement. IMHO that's enough :)
A>
A> I don't think that passing an invalid argument to a documented KPI is "enough"
A> for any optimization.

I don't see a sense in making this KPI so sacred. This is something used internally
in kernel, and not used outside. The KPI has changed several times in the past.

A> > If you can't suggest a more elegant way of doing that improvement, then all
A> > I can suggest is to document it and add its support to ZFS.
A>
A> In return I can only suggest that (1) you run your suggestion by arch@ -- unless
A> that's already been done and you can point me to the discussion,  (2) document
A> it and (3) double-check that all implementations conform to it.

I can provide a patch for ZFS.

-- 
Gleb Smirnoff


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Andriy Gapon
On 18/02/2018 15:26, Gleb Smirnoff wrote:
> My only point is that it is a performance improvement. IMHO that's enough :)

I don't think that passing an invalid argument to a documented KPI is "enough"
for any optimization.

> If you can't suggest a more elegant way of doing that improvement, then all
> I can suggest is to document it and add its support to ZFS.

In return I can only suggest that (1) you run your suggestion by arch@ -- unless
that's already been done and you can point me to the discussion,  (2) document
it and (3) double-check that all implementations conform to it.


-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Gleb Smirnoff
On Sun, Feb 18, 2018 at 09:28:30AM +0200, Andriy Gapon wrote:
A> > A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame 0xfe00b3c36650
A> > A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfe00b3c368e0
A> > A> sendfile() at sendfile+0x149/frame 0xfe00b3c36980
A> > A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfe00b3c36ab0
A> > A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffdb00
A> > A>
A> > A> I looked at sendfile_swapin() code and it seems that it uses the pager API in an
A> > A> undocumented way.  Specifically, it inserts bogus_page into the array of
A> > A> requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
A> > A> documented to have all requested pages exclusively busied.  Second, I always had
A> > A> an impression that bogus_page is an implementation detail of the unified buffer
A> > A> / page cache and that other code need not be aware of it.
A> > A>
A> > A> So, my opinion is that the sendfile code uses a "clever hack" that happens to
A> > A> work with the buffer cache based filesystems, but that that hack is a bug.
A> > A> So, I'd prefer that the problem is fixed in that code.
A> > A> But I am open to being convinced that all VOP_GETPAGES implementations,
A> > A> including that in ZFS, must be made aware of bogus_page.  Or, at least, that
A> > A> they should not verify that the requested pages are busied.
A> >
A> > This is an optimization that improves throughput when file memory cache is
A> > fragmented. Why don't you like adding the code to zfs_freebsd_getpages()?
A>
A> I cited two reasons above and expected to hear some counter-points rather than
A> them being ignored :-)
A> If we settle upon allowing bogus_page to be used in ma[], then that will
A> obviously need to be documented.

My only point is that it is a performance improvement. IMHO that's enough :)
If you can't suggest a more elegant way of doing that improvement, then all
I can suggest is to document it and add its support to ZFS.

-- 
Gleb Smirnoff


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andriy Gapon
On 18/02/2018 04:35, Gleb Smirnoff wrote:
>   Andriy,
> 
> On Sun, Feb 18, 2018 at 12:54:21AM +0200, Andriy Gapon wrote:
> A> > Today's rebuild has given me uptimes of below an hour, usually.  The box will
> A> > stay up in single user mode long enough to rebuild world/kernel, but
> A> > multi-user it is panicking at
> A> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> A> >
> A> > The backtrace shows that it gets to this panic from a sendfile() syscall.
> A> > The line above is in the middle of a big edit that's part of svn revision
> A> > 329363.  The tripping assertion seems to suggest that m->valid != 0, for
> A> > whatever that's worth.
> A>
> A> I am doing a bit of an offline investigation with Andrew and it seems that the
> A> actual panic message is this:
> A>
> A> panic: vm_page_assert_xbusied: page 0xf807ebbd8f98 not exclusive busy @
> A> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> A>
> A> The stack is this:
> A> vpanic() at vpanic/frame 0xfe00b3c36390
> A> dmu_read_pages() at dmu_read_pages+0x535/frame 0xfe00b3c36460
> A> zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfe00b3c36510
> A> VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfe00b3c36540
> A> vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfe00b3c36590
> A> VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfe00b3c365c0
> A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame 0xfe00b3c36650
> A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfe00b3c368e0
> A> sendfile() at sendfile+0x149/frame 0xfe00b3c36980
> A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfe00b3c36ab0
> A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffdb00
> A>
> A> I looked at sendfile_swapin() code and it seems that it uses the pager API in an
> A> undocumented way.  Specifically, it inserts bogus_page into the array of
> A> requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
> A> documented to have all requested pages exclusively busied.  Second, I always had
> A> an impression that bogus_page is an implementation detail of the unified buffer
> A> / page cache and that other code need not be aware of it.
> A>
> A> So, my opinion is that the sendfile code uses a "clever hack" that happens to
> A> work with the buffer cache based filesystems, but that that hack is a bug.
> A> So, I'd prefer that the problem is fixed in that code.
> A> But I am open to being convinced that all VOP_GETPAGES implementations,
> A> including that in ZFS, must be made aware of bogus_page.  Or, at least, that
> A> they should not verify that the requested pages are busied.
>
> This is an optimization that improves throughput when file memory cache is
> fragmented. Why don't you like adding the code to zfs_freebsd_getpages()?

I cited two reasons above and expected to hear some counter-points rather than
them being ignored :-)
If we settle upon allowing bogus_page to be used in ma[], then that will
obviously need to be documented.

-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Gleb Smirnoff
  Andriy,

On Sun, Feb 18, 2018 at 12:54:21AM +0200, Andriy Gapon wrote:
A> > Today's rebuild has given me uptimes of below an hour, usually.  The box will
A> > stay up in single user mode long enough to rebuild world/kernel, but
A> > multi-user it is panicking at
A> > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A> >
A> > The backtrace shows that it gets to this panic from a sendfile() syscall.
A> > The line above is in the middle of a big edit that's part of svn revision
A> > 329363.  The tripping assertion seems to suggest that m->valid != 0, for
A> > whatever that's worth.
A>
A> I am doing a bit of an offline investigation with Andrew and it seems that the
A> actual panic message is this:
A>
A> panic: vm_page_assert_xbusied: page 0xf807ebbd8f98 not exclusive busy @
A> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A>
A> The stack is this:
A> vpanic() at vpanic/frame 0xfe00b3c36390
A> dmu_read_pages() at dmu_read_pages+0x535/frame 0xfe00b3c36460
A> zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfe00b3c36510
A> VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfe00b3c36540
A> vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfe00b3c36590
A> VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfe00b3c365c0
A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame 0xfe00b3c36650
A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfe00b3c368e0
A> sendfile() at sendfile+0x149/frame 0xfe00b3c36980
A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfe00b3c36ab0
A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffdb00
A>
A> I looked at sendfile_swapin() code and it seems that it uses the pager API in an
A> undocumented way.  Specifically, it inserts bogus_page into the array of
A> requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
A> documented to have all requested pages exclusively busied.  Second, I always had
A> an impression that bogus_page is an implementation detail of the unified buffer
A> / page cache and that other code need not be aware of it.
A>
A> So, my opinion is that the sendfile code uses a "clever hack" that happens to
A> work with the buffer cache based filesystems, but that that hack is a bug.
A> So, I'd prefer that the problem is fixed in that code.
A> But I am open to being convinced that all VOP_GETPAGES implementations,
A> including that in ZFS, must be made aware of bogus_page.  Or, at least, that
A> they should not verify that the requested pages are busied.

This is an optimization that improves throughput when file memory cache is
fragmented. Why don't you like adding the code to zfs_freebsd_getpages()?

-- 
Gleb Smirnoff


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Conrad Meyer
On Sat, Feb 17, 2018 at 12:52 PM, Andrew Reilly wrote:
> I've applied the patch, and the boot process is quiet now, but it's still
> loading cc_vegas.ko, seemingly in response to seeing this device: (from
> pciconf -l -v)
>
> none4@pci0:17:0:2:  class=0x108000 card=0x14561022 chip=0x14561022 rev=0x00 hdr=0x00
> vendor = 'Advanced Micro Devices, Inc. [AMD]'
> device = 'Family 17h (Models 00h-0fh) Platform Security Processor'
> class  = encrypt/decrypt
>
> (from devmatch -v)
> Searching  pci bus at slot=0 function=2 dbsf=pci0:17:0:2 handle=\_SB_.PCI0.GP17.APSP for pnpinfo vendor=0x1022 device=0x1456 subvendor=0x1022 subdevice=0x1456 class=0x108000
> cc_vegas.ko

That's kind of interesting.  That device should match ccp.ko, not
cc_vegas.ko.  As far as I can tell, cc_vegas has no PNP data at all.
Maybe this is a bug in kldxref or devmatch.

Best,
Conrad


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andriy Gapon
On 17/02/2018 14:16, Andrew Reilly wrote:
> Today's rebuild has given me uptimes of below an hour, usually.  The box will 
> stay up in single user mode long enough to rebuild world/kernel, but 
> multi-user it is panicking at 
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> 
> The backtrace shows that it gets to this panic from a sendfile() syscall.  
> The line above is in the middle of a big edit that's part of svn revision 
> 329363.  The tripping assertion seems to suggest that m->valid != 0, for 
> whatever that's worth.

I am doing a bit of an offline investigation with Andrew and it seems that the
actual panic message is this:

panic: vm_page_assert_xbusied: page 0xf807ebbd8f98 not exclusive busy @
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592

The stack is this:
vpanic() at vpanic/frame 0xfe00b3c36390
dmu_read_pages() at dmu_read_pages+0x535/frame 0xfe00b3c36460
zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfe00b3c36510
VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfe00b3c36540
vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfe00b3c36590
VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfe00b3c365c0
vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame 0xfe00b3c36650
vn_sendfile() at vn_sendfile+0xe70/frame 0xfe00b3c368e0
sendfile() at sendfile+0x149/frame 0xfe00b3c36980
amd64_syscall() at amd64_syscall+0x79b/frame 0xfe00b3c36ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffdb00

I looked at sendfile_swapin() code and it seems that it uses the pager API in an
undocumented way.  Specifically, it inserts bogus_page into the array of
requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
documented to have all requested pages exclusively busied.  Second, I always had
an impression that bogus_page is an implementation detail of the unified buffer
/ page cache and that other code need not be aware of it.

So, my opinion is that the sendfile code uses a "clever hack" that happens to
work with the buffer cache based filesystems, but that that hack is a bug.
So, I'd prefer that the problem is fixed in that code.
But I am open to being convinced that all VOP_GETPAGES implementations,
including that in ZFS, must be made aware of bogus_page.  Or, at least, that
they should not verify that the requested pages are busied.


-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andrew Reilly
I've applied the patch, and the boot process is quiet now, but it's still
loading cc_vegas.ko, seemingly in response to seeing this device: (from
pciconf -l -v)

none4@pci0:17:0:2:  class=0x108000 card=0x14561022 chip=0x14561022 rev=0x00 hdr=0x00
vendor = 'Advanced Micro Devices, Inc. [AMD]'
device = 'Family 17h (Models 00h-0fh) Platform Security Processor'
class  = encrypt/decrypt

(from devmatch -v)
Searching  pci bus at slot=0 function=2 dbsf=pci0:17:0:2 handle=\_SB_.PCI0.GP17.APSP for pnpinfo vendor=0x1022 device=0x1456 subvendor=0x1022 subdevice=0x1456 class=0x108000
cc_vegas.ko

The output above suggests that there isn't a driver attached to that device
anyway, though.

Cheers,

Andrew Reilly



> On 18 Feb 2018, at 00:06, Hans Petter Selasky wrote:
> 
> On 02/17/18 13:42, Hans Petter Selasky wrote:
>> On 02/17/18 13:16, Andrew Reilly wrote:
>>> On a side-note, the new devmatch workings are giving me 43 boot warnings
>>> about "Malformed NOMATCH string: ''?''", and devmatch_enable="NO" in
>>> /etc/rc.conf doesn't seem to help, and the new matching is very very keen
>>> to load cc_vegas.ko, a lot.  Here's the output of devmatch -v, in case that
>>> helps:
>> Hi,
>> Does the attached patch solve the devmatch issue? Just apply it directly on 
>> /etc and reboot.
>> --HPS
> 
> Please find updated patch attached.
> 
> --HPS
> 
> 



Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andriy Gapon
On 17/02/2018 14:16, Andrew Reilly wrote:
> Today's rebuild has given me uptimes of below an hour, usually.  The box will
> stay up in single user mode long enough to rebuild world/kernel, but
> multi-user it is panicking at
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> 
> The backtrace shows that it gets to this panic from a sendfile() syscall.
> The line above is in the middle of a big edit that's part of svn revision
> 329363.  The tripping assertion seems to suggest that m->valid != 0, for
> whatever that's worth.

The panic message and the backtrace would be a good start, but a crash dump is
probably what's really needed to analyze the issue.

-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Hans Petter Selasky

On 02/17/18 13:42, Hans Petter Selasky wrote:
> On 02/17/18 13:16, Andrew Reilly wrote:
>> On a side-note, the new devmatch workings are giving me 43 boot
>> warnings about "Malformed NOMATCH string: ''?''", and
>> devmatch_enable="NO" in /etc/rc.conf doesn't seem to help, and the new
>> matching is very very keen to load cc_vegas.ko, a lot.  Here's the
>> output of devmatch -v, in case that helps:
>
> Hi,
>
> Does the attached patch solve the devmatch issue? Just apply it directly
> on /etc and reboot.
>
> --HPS

Please find updated patch attached.

--HPS

Index: etc/devd/devmatch.conf
===
--- etc/devd/devmatch.conf	(revision 329447)
+++ etc/devd/devmatch.conf	(working copy)
@@ -9,7 +9,7 @@
 #
 # Generic NOMATCH event
 nomatch 100 {
-	action "service devmatch start '?$_'";
+	action "/etc/rc.d/devmatch start '?$_'";
 };
 
 # Add the following to devd.conf to prevent this from running:
Index: etc/rc.d/devmatch
===
--- etc/rc.d/devmatch	(revision 329447)
+++ etc/rc.d/devmatch	(working copy)
@@ -37,13 +37,17 @@
 
 start_cmd="${name}_start"
 stop_cmd=':'
-[ -n "$2" ] && one_nomatch="-p '$2'"
+one_nomatch=$2
 
 devmatch_start()
 {
 	local x
 
-	x=$(devmatch ${one_nomatch} | sort -u)
+	if [ -n "$one_nomatch" ]; then
+		x=$(devmatch -p "${one_nomatch}" | sort -u)
+	else
+		x=$(devmatch | sort -u)
+	fi
 
 	[ -n "$x" ] || return
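
Note on the rc.d/devmatch hunk above: a minimal sketch of the shell behaviour that the change from `one_nomatch="-p '$2'"` to a quoted `-p "${one_nomatch}"` avoids. Quote characters stored inside a variable are not re-parsed when the variable is expanded, so they reach the program as literal characters, and an unquoted expansion is also split on whitespace. This looks consistent with the stray quotes in the reported "Malformed NOMATCH string" warning, although the thread does not confirm the cause. `show_args` is a hypothetical stand-in for devmatch, and the event string is made up for illustration.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for devmatch: it prints every
# argument it receives inside brackets so that the splitting is visible.
show_args() {
	for a in "$@"; do
		printf '[%s]' "$a"
	done
	printf '\n'
}

# Made-up event string, for illustration only.
event='?vendor=0x1022 device=0x1456'

# Old form: the single quotes become part of the value, and the
# unquoted expansion below is split on whitespace.
old_nomatch="-p '$event'"
old_result=$(show_args ${old_nomatch})

# New form: keep the raw string and quote it at the point of use,
# so the program receives exactly two arguments.
new_nomatch=$event
new_result=$(show_args -p "${new_nomatch}")

echo "old: $old_result"
echo "new: $new_result"
```

Comparing the two printed lines shows how many arguments each form passes and whether the quote characters survive into them.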
 


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Hans Petter Selasky

On 02/17/18 13:16, Andrew Reilly wrote:
> On a side-note, the new devmatch workings are giving me 43 boot warnings
> about "Malformed NOMATCH string: ''?''", and devmatch_enable="NO" in
> /etc/rc.conf doesn't seem to help, and the new matching is very very keen
> to load cc_vegas.ko, a lot.  Here's the output of devmatch -v, in case that
> helps:

Hi,

Does the attached patch solve the devmatch issue? Just apply it directly
on /etc and reboot.

--HPS
Index: etc/devd/devmatch.conf
===
--- etc/devd/devmatch.conf	(revision 329447)
+++ etc/devd/devmatch.conf	(working copy)
@@ -9,7 +9,7 @@
 #
 # Generic NOMATCH event
 nomatch 100 {
-	action "service devmatch start '?$_'";
+	action "/etc/rc.d/devmatch start '?$_'";
 };
 
 # Add the following to devd.conf to prevent this from running:
Index: etc/rc.d/devmatch
===
--- etc/rc.d/devmatch	(revision 329447)
+++ etc/rc.d/devmatch	(working copy)
@@ -37,7 +37,7 @@
 
 start_cmd="${name}_start"
 stop_cmd=':'
-[ -n "$2" ] && one_nomatch="-p '$2'"
+[ -n "$2" ] && one_nomatch=-p "$2"
 
 devmatch_start()
 {
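
Note on the rc.d/devmatch hunk above: under ordinary POSIX sh parsing, `one_nomatch=-p "$2"` is not a two-word assignment. It is a temporary assignment (`one_nomatch=-p`) placed in the environment of a command whose name is the value of `$2`, so the variable is not set for the rest of the script. The sketch below illustrates this; it uses `env` as a harmless stand-in for `$2` (an assumption made only for the demonstration). With a real event string beginning with `?`, the shell would instead try to execute that string as a command.

```shell
#!/bin/sh
unset one_nomatch

# Pretend the script was invoked as "devmatch start env". "env" is a
# stand-in for the event string so that the mistaken command is harmless.
set -- start env

# What the command named by $2 sees: one_nomatch=-p is in its environment.
seen=$(one_nomatch=-p "$2" | grep '^one_nomatch=')

# The line from the patch, run in the current shell. The assignment
# applies only to the command and does not persist afterwards.
[ -n "$2" ] && one_nomatch=-p "$2" >/dev/null
after=${one_nomatch-unset}

echo "seen by the command: $seen"
echo "left in the script:  $after"
```

Keeping the raw string in the variable and passing `-p` explicitly at the call site avoids this parse.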


Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andrew Reilly
Hi,

I do a weekly build to track changes, on 12-current since I gave my fileserver 
this new Ryzen motherboard a few months ago.  I switched to current because 
there was some badness in 11-stable that I attributed to new processor 
twitchiness (wouldn't reboot, temperature sensors not working.)  A month or so 
of 12- has been lovely, for the most part.

Today's rebuild has given me uptimes of below an hour, usually.  The box will 
stay up in single user mode long enough to rebuild world/kernel, but multi-user 
it is panicking at 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592

The backtrace shows that it gets to this panic from a sendfile() syscall.  The 
line above is in the middle of a big edit that's part of svn revision 329363.  
The tripping assertion seems to suggest that m->valid != 0, for whatever that's 
worth.

Anything that I should be trying?

On a side-note, the new devmatch workings are giving me 43 boot warnings about
"Malformed NOMATCH string: ''?''", and devmatch_enable="NO" in /etc/rc.conf
doesn't seem to help, and the new matching is very very keen to load
cc_vegas.ko, a lot.  Here's the output of devmatch -v, in case that helps:

$ devmatch -v
Searching  acpi bus at handle=\_PR_.P008 for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P009 for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P00A for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P00B for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P00C for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P00D for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P00E for pnpinfo _HID=none _UID=0
Searching  acpi bus at handle=\_PR_.P00F for pnpinfo _HID=none _UID=0
Searching  pci bus at slot=0 function=2 dbsf=pci0:0:0:2 handle=\_SB_.PCI0.IOMA for pnpinfo vendor=0x1022 device=0x1451 subvendor=0x1022 subdevice=0x1451 class=0x080600
Searching  pci bus at slot=0 function=0 dbsf=pci0:9:0:0 for pnpinfo vendor=0x8086 device=0x24fb subvendor=0x8086 subdevice=0x2110 class=0x028000
Searching  pci bus at slot=0 function=1 dbsf=pci0:11:0:1 for pnpinfo vendor=0x1002 device=0xaab0 subvendor=0x174b subdevice=0xaab0 class=0x040300
Searching  pci bus at slot=0 function=0 dbsf=pci0:17:0:0 for pnpinfo vendor=0x1022 device=0x145a subvendor=0x1022 subdevice=0x145a class=0x13
Searching  pci bus at slot=0 function=2 dbsf=pci0:17:0:2 handle=\_SB_.PCI0.GP17.APSP for pnpinfo vendor=0x1022 device=0x1456 subvendor=0x1022 subdevice=0x1456 class=0x108000
cc_vegas.ko
Searching  pci bus at slot=0 function=0 dbsf=pci0:18:0:0 for pnpinfo vendor=0x1022 device=0x1455 subvendor=0x1022 subdevice=0x1455 class=0x13
Searching  acpi bus at handle=\_SB_.PCI0.SBRG.PIC_ for pnpinfo _HID=PNP _UID=0
Searching  acpi bus at handle=\_SB_.PCI0.SBRG.SPKR for pnpinfo _HID=PNP0800 _UID=0
Searching  acpi bus at handle=\_SB_.GPIO for pnpinfo _HID=AMDI0030 _UID=0
Searching  acpi bus at handle=\_SB_.PTIO for pnpinfo _HID=AMDIF030 _UID=0
Searching  acpi bus at handle=\AOD_ for pnpinfo _HID=PNP0C14 _UID=0

I can't tell if this is related to the zfs problem or not.  As far as I'm 
aware, cc_vegas.ko was not loaded into the kernel before today.

FWIW uname -a says:
FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #6 r329450: Sat Feb 17 22:36:19 AEDT 2018 root@:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

I'll attach the dmesg.boot from the boot that I had to do while composing this 
message...

Cheers,

Andrew




dmesg.boot
Description: Binary data