Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

2018-03-17 Thread Andriy Gapon
On 17/03/2018 18:51, Mark Millard wrote:
> I'll note that top has a -w that reports:
> 
> -w Display approximate swap usage for each process.

As far as I can tell, this option is quite broken.
The "approximate swap usage" it reports is nothing like the actual swap usage.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ZFS panic at boot when mounting root on r330386

2018-03-04 Thread Andriy Gapon
On 05/03/2018 02:59, Bryan Drewery wrote:
>> panic: solaris assert: refcount_count(&spa->spa_refcount) > spa->spa_minref 
>> || MUTEX_HELD(&spa_namespace_lock), file: 
>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c, line: 952
>> cpuid = 10
>> time = 1520207367
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
>> 0xfe23f57a2420
>> vpanic() at vpanic+0x18d/frame 0xfe23f57a2480
>> panic() at panic+0x43/frame 0xfe23f57a24e0
>> assfail() at assfail+0x1a/frame 0xfe23f57a24f0
>> spa_close() at spa_close+0x5d/frame 0xfe23f57a2520
>> spa_get_stats() at spa_get_stats+0x481/frame 0xfe23f57a2700
>> zfs_ioc_pool_stats() at zfs_ioc_pool_stats+0x25/frame 0xfe23f57a2740
>> zfsdev_ioctl() at zfsdev_ioctl+0x76b/frame 0xfe23f57a27e0
>> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe23f57a2830
>> VOP_IOCTL_APV() at VOP_IOCTL_APV+0x102/frame 0xfe23f57a2860
>> vn_ioctl() at vn_ioctl+0x124/frame 0xfe23f57a2970
>> devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfe23f57a2990
>> kern_ioctl() at kern_ioctl+0x2c2/frame 0xfe23f57a29f0
>> sys_ioctl() at sys_ioctl+0x15c/frame 0xfe23f57a2ac0
>> amd64_syscall() at amd64_syscall+0x786/frame 0xfe23f57a2bf0
>> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe23f57a2bf0
>> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80049afda, rsp = 
>> 0x7fffbd18, rbp = 0x7fffbd90 ---
>> KDB: enter: panic
>> [ thread pid 56 tid 100606 ]
>> Stopped at  kdb_enter+0x3b: movq $0,kdb_why
>> db>
> 
> It seems like a race as I can get it to boot sometimes.

Yes, it does.  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210409

-- 
Andriy Gapon


Re: lock order reversal

2018-02-25 Thread Andriy Gapon
On 26/02/2018 07:18, Jon Brawn wrote:
> Wotcha!
> 
> So, I’ve been using FreeBSD 12-CURRENT at various svn releases for a while 
> now, and I get quite a few “lock order reversal” dumps. The one I’ve got on 
> my screen at the moment is for ufs / bufwait / ufs:
> 
> root@brax:/usr/src/stand # lock order reversal:
>  1st 0xfd0003ec17e8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2602
>  2nd 0x410efa20 bufwait (bufwait) @ 
> /usr/src/sys/ufs/ffs/ffs_vnops.c:282
>  3rd 0xfd00b83ca7e8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2602
> stack backtrace:
> #0 0x003b59d4 at witness_debugger+0x64
> #1 0x0032bd34 at __lockmgr_args+0x6ac
> #2 0x005c6af0 at ffs_lock+0x88
> #3 0x00679eb0 at VOP_LOCK1_APV+0xac
> #4 0x00426fa8 at _vn_lock+0x64
> #5 0x00417550 at vget+0x78
> #6 0x00409fdc at vfs_hash_get+0xec
> #7 0x005c2b94 at ffs_vgetf+0x44
> #8 0x005b96a8 at softdep_sync_buf+0x9f4
> #9 0x005c7834 at ffs_syncvnode+0x26c
> #10 0x005a1b5c at ffs_truncate+0x6b0
> #11 0x005ce3cc at ufs_direnter+0x778
> #12 0x005d64bc at ufs_makeinode+0x4b8
> #13 0x005d2b90 at ufs_create+0x38
> #14 0x00677168 at VOP_CREATE_APV+0xac
> #15 0x0042691c at vn_open_cred+0x264
> #16 0x0041fc84 at kern_openat+0x208
> #17 0x0064b59c at do_el0_sync+0x8bc
> 
> Is there something I should be doing to help debug these?

IMO, no. Please ignore LORs involving "bufwait", "filedesc structure", "syncer"
unless you experience any real problem (like a lock up).

-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Andriy Gapon
On 18/02/2018 22:33, Gleb Smirnoff wrote:
> On Sun, Feb 18, 2018 at 10:15:24PM +0200, Andriy Gapon wrote:
> A> On 18/02/2018 15:26, Gleb Smirnoff wrote:
> A> > My only point is that it is a performance improvement. IMHO that's 
> enough :)
> A> 
> A> I don't think that passing an invalid argument to a documented KPI is 
> "enough"
> A> for any optimization.
> 
> I don't see a sense in making this KPI so sacred. This is something used 
> internally
> in kernel, and not used outside. The KPI has changed several times in the 
> past.

I don't have anything against changing KPI.
At the same time, I think that it should be well-defined at all times.

> A> > If you can't suggest a more elegant way of doing that improvement, then 
> all
> A> > I can suggest is to document it and add its support to ZFS.
> A> 
> A> In return I can only suggest that (1) you run your suggestion by arch@ -- 
> unless
> A> that's already been done and you can point me to the discussion,  (2) 
> document
> A> it and (3) double-check that all implementations conform to it.
> 
> I can provide a patch for ZFS.

Thank you.  But I think that the documentation update will be much more 
valuable.


-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-18 Thread Andriy Gapon
On 18/02/2018 15:26, Gleb Smirnoff wrote:
> My only point is that it is a performance improvement. IMHO that's enough :)

I don't think that passing an invalid argument to a documented KPI is "enough"
for any optimization.

> If you can't suggest a more elegant way of doing that improvement, then all
> I can suggest is to document it and add its support to ZFS.

In return I can only suggest that (1) you run your suggestion by arch@ -- unless
that's already been done and you can point me to the discussion,  (2) document
it and (3) double-check that all implementations conform to it.


-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andriy Gapon
On 18/02/2018 04:35, Gleb Smirnoff wrote:
>   Andriy,
> 
> On Sun, Feb 18, 2018 at 12:54:21AM +0200, Andriy Gapon wrote:
> A> > Today's rebuild has given me uptimes of below an hour, usually.  The box 
> will stay up in single user mode long enough to rebuild world/kernel, but 
> multi-user it is panicking at 
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> A> > 
> A> > The backtrace shows that it gets to this panic from a sendfile() 
> syscall.  The line above is in the middle of a big edit that's part of svn 
> revision 329363.  The tripping assertion seems to suggest that m->valid != 0, 
> for whatever that's worth.
> A> 
> A> I am doing a bit of an offline investigation with Andrew and it seems that 
> the
> A> actual panic message is this:
> A> 
> A> panic: vm_page_assert_xbusied: page 0xf807ebbd8f98 not exclusive busy @
> A> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> A> 
> A> The stack is this:
> A> vpanic() at vpanic/frame 0xfe00b3c36390
> A> dmu_read_pages() at dmu_read_pages+0x535/frame 0xfe00b3c36460
> A> zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 
> 0xfe00b3c36510
> A> VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfe00b3c36540
> A> vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 
> 0xfe00b3c36590
> A> VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 
> 0xfe00b3c365c0
> A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame
> A> 0xfe00b3c36650
> A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfe00b3c368e0
> A> sendfile() at sendfile+0x149/frame 0xfe00b3c36980
> A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfe00b3c36ab0
> A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffdb00
> A> 
> A> I looked at sendfile_swapin() code and it seems that it uses the pager API 
> in an
> A> undocumented way.  Specifically, it inserts bogus_page into the array of
> A> requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES 
> is
> A> documented to have all requested pages exclusively busied.  Second, I 
> always had
> A> an impression that bogus_page is an implementation detail of the unified 
> buffer
> A> / page cache and that other code need not be aware of it.
> A> 
> A> So, my opinion is that the sendfile code uses a "clever hack" that happens 
> to
> A> work with the buffer cache based filesystems, but that that hack is a bug.
> A> So, I'd prefer that the problem is fixed in that code.
> A> But I am open to being convinced that all VOP_GETPAGES implementations,
> A> including that in ZFS, must be made aware of bogus_page.  Or, at least, 
> that
> A> they should not verify that the requested pages are busied.
> 
> This is optimization that improves throughput when file memory cache is
> fragmented. Why don't you like adding the code to zfs_freebsd_getpages()?

I cited two reasons above and expected to hear some counter-points rather than
them being ignored :-)
If we settle upon allowing bogus_page to be used in ma[], then that will
obviously need to be documented.

-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andriy Gapon
On 17/02/2018 14:16, Andrew Reilly wrote:
> Today's rebuild has given me uptimes of below an hour, usually.  The box will 
> stay up in single user mode long enough to rebuild world/kernel, but 
> multi-user it is panicking at 
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> 
> The backtrace shows that it gets to this panic from a sendfile() syscall.  
> The line above is in the middle of a big edit that's part of svn revision 
> 329363.  The tripping assertion seems to suggest that m->valid != 0, for 
> whatever that's worth.

I am doing a bit of an offline investigation with Andrew and it seems that the
actual panic message is this:

panic: vm_page_assert_xbusied: page 0xf807ebbd8f98 not exclusive busy @
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592

The stack is this:
vpanic() at vpanic/frame 0xfe00b3c36390
dmu_read_pages() at dmu_read_pages+0x535/frame 0xfe00b3c36460
zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfe00b3c36510
VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfe00b3c36540
vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfe00b3c36590
VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfe00b3c365c0
vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame
0xfe00b3c36650
vn_sendfile() at vn_sendfile+0xe70/frame 0xfe00b3c368e0
sendfile() at sendfile+0x149/frame 0xfe00b3c36980
amd64_syscall() at amd64_syscall+0x79b/frame 0xfe00b3c36ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffdb00

I looked at sendfile_swapin() code and it seems that it uses the pager API in an
undocumented way.  Specifically, it inserts bogus_page into the array of
requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
documented to have all requested pages exclusively busied.  Second, I always had
an impression that bogus_page is an implementation detail of the unified buffer
/ page cache and that other code need not be aware of it.

So, my opinion is that the sendfile code uses a "clever hack" that happens to
work with the buffer cache based filesystems, but that that hack is a bug.
So, I'd prefer that the problem is fixed in that code.
But I am open to being convinced that all VOP_GETPAGES implementations,
including that in ZFS, must be made aware of bogus_page.  Or, at least, that
they should not verify that the requested pages are busied.


-- 
Andriy Gapon


Re: Since last week (today) current on my Ryzen box is unstable

2018-02-17 Thread Andriy Gapon
On 17/02/2018 14:16, Andrew Reilly wrote:
> Today's rebuild has given me uptimes of below an hour, usually.  The box will
> stay up in single user mode long enough to rebuild world/kernel, but
> multi-user it is panicking at
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
> 
> The backtrace shows that it gets to this panic from a sendfile() syscall.
> The line above is in the middle of a big edit that's part of svn revision
> 329363.  The tripping assertion seems to suggest that m->valid != 0, for
> whatever that's worth.

The panic message and the backtrace would be a good start, but a crash dump is
probably what's really needed to analyze the issue.

-- 
Andriy Gapon


Re: Fatal trap 12 booting FreeBSD-CURRENT via isboot kernel module.

2018-02-04 Thread Andriy Gapon
On 04/02/2018 11:50, Maurizio Vairani wrote:
> I have added a socket in the ifioctl() call as in the
> /usr/src/sys/nfs/bootp_subr.c source.
> Please let me know if you prefer a patch.

A patch here https://reviews.freebsd.org/ would be the best.

-- 
Andriy Gapon


Re: couple of nvidia-driver issues

2017-12-07 Thread Andriy Gapon

[cc-ing current@ to raise more awareness]

On 05/12/2017 16:03, Alexey Dokuchaev wrote:
> On Fri, Nov 24, 2017 at 11:31:51AM +0200, Andriy Gapon wrote:
>>
>> I have reported a couple of nvidia-driver issues in the FreeBSD section
>> of the nVidia developer forum, but no replies so far.
>>
>> Well, the first issue is not with the driver, but with a utility that
>> comes with it, nvidia-smi:
>> https://devtalk.nvidia.com/default/topic/1026589/freebsd/nvidia-smi-query-gpu-spins-forever-on-freebsd-head-amd64-/
>> I wonder if I am the only one affected or if I see the problem because
>> I am on head or something else.
>> I am pretty sure that the problem is caused by a programming bug related
>> to strtok_r.
> 
> I'll try to reproduce it and report back.

I've done some work with a debugger and it seems that there is code that does
something like this:

char *last = NULL;

while (1) {
    if (last == NULL)
        p = strtok_r(str, sep, &last);
    else
        p = strtok_r(NULL, sep, &last);
    if (p == NULL)
        break;
    ...
}

The problem is that when 'p' points to the last token, 'last' is NULL (in the
FreeBSD implementation of strtok_r).  That means that when we go to the next
iteration the parsing starts all over again, leading to an endless loop.
The code is incorrect from the standards point of view, because the value of
'last' is completely opaque and should not be used for anything other than
passing it back to strtok_r.

I used gdb -w to change the logic to:

char *last = (char *)1;

while (1) {
    if (last == (char *)1)
        p = strtok_r(str, sep, &last);
    else
        p = strtok_r(NULL, sep, &last);
    ...
}

Where 1 is used as an "impossible" pointer value which is neither NULL nor a
valid pointer that can be set by strtok_r.  It's not ideal, but binary code
editing is not as easy as that of source code.

The binary patch is here: https://people.freebsd.org/~avg/nvidia-smi.bsdiff

>> The second issue is with the FreeBSD support for the kernel driver:
>> https://devtalk.nvidia.com/default/topic/1026645/freebsd/panic-related-to-nvkms_timers-lock-sx-lock-/
>> I would like to get some feedback on my analysis.
>> I am testing this patch right now:
>> https://people.freebsd.org/~avg/extra-patch-src_nvidia-modeset_nvidia-modeset-freebsd.c
> 
> Unfortunately, I'm not an expert on kernel locking primitives to give you
> a proper review, let's see what others have to say.

It's been a while since I posted the patch and there are no comments yet.
I can only add that I am running an INVARIANTS and WITNESS enabled kernel all
the time and before the patch I was getting kernel panics every now and then.
Since I started using the patch I haven't had a single nvidia panic yet.

>> Also, what's the best place or who are the best people with whom to
>> discuss such issues?
> 
> Yes, this is a problem now: since Christian Zander had left nVidia, he
> could not tell me who'd be their next liaison to talk to from FreeBSD
> community. :-(

Oh, I didn't know about Christian's departure.
So, we are not in a very good position now.


-- 
Andriy Gapon


Re: dmesg -a shows "Failed to fully fault in a core file segment at VA" examples, anything to worry about?

2017-12-01 Thread Andriy Gapon
On 30/11/2017 23:04, Mark Millard wrote:
> The messages seem to be considered non-fatal at the system
> level, although the processes are getting signal 11. It is
> not clear to me if the signal 11's are consequences, causes,
> or just happen to be associated.


The messages are produced if there is a problem while writing a core file.
So, they can appear only if (after) a process crashed.

-- 
Andriy Gapon


Re: Loader.conf problem

2017-11-21 Thread Andriy Gapon
On 21/11/2017 14:48, Thomas Laus wrote:
> I had boot success when copying gptzfsboot file from my laptop that is
> running r325474.  The problem CURRENT version running on my desktop is
> r326012.

Thomas,

could you please compare sizes of the files before going further?

-- 
Andriy Gapon


Re: [HEADS UP] posix_fallocate support removed from ZFS, lld affected

2017-11-16 Thread Andriy Gapon
On 13/11/2017 17:02, Ed Maste wrote:
> On 7 November 2017 at 13:12, Andriy Gapon <a...@freebsd.org> wrote:
>>
>> I hope that lld is not that widely used now.
>> But I admit that I put the cart before the horse.
>> I didn't expect that posix_fallocate is used in the development toolchain 
>> and I
>> didn't try to check for it.
> 
> For amd64 it is probably not a very large problem; it's not used by
> default and those who have enabled it can likely adapt. However, it is
> used by arm64 and is the default linker in FreeBSD 11.1, so we'll need
> to go with either an errata fix or Kostik's p_osrel suggestion to
> avoid breaking the package builds.

I agree with that proposal.
I think that that is what I should have done from the very start.
I need a little bit of help to implement it, though.


-- 
Andriy Gapon


Re: [HEADS UP] posix_fallocate support removed from ZFS, lld affected

2017-11-07 Thread Andriy Gapon
On 06/11/2017 19:26, Ian Lepore wrote:
> On Mon, 2017-11-06 at 17:40 +0200, Andriy Gapon wrote:
>> From UPDATING:
>> The naive and non-compliant support of posix_fallocate(2) in ZFS
>> has been removed as of r325320.  The system call now returns EINVAL
>> when used on a ZFS file.  Although the new behavior complies with the
>> standard, some consumers are not prepared to cope with it.
>> One known victim is lld prior to r325420.
>>
> 
> It just popped into my head... does this mean that kernels running
> r325320+ on systems using ZFS will be unable to host build jails for
> earlier versions / branches because lld will fail in the jail?

I am afraid that this is true.

> I think that will be a big problem for the ports team's package
> building process, and for anyone using poudriere.

I hope that lld is not that widely used now.
But I admit that I put the cart before the horse.
I didn't expect that posix_fallocate is used in the development toolchain and I
didn't try to check for it.


-- 
Andriy Gapon

[HEADS UP] posix_fallocate support removed from ZFS, lld affected

2017-11-06 Thread Andriy Gapon

From UPDATING:
The naive and non-compliant support of posix_fallocate(2) in ZFS
has been removed as of r325320.  The system call now returns EINVAL
when used on a ZFS file.  Although the new behavior complies with the
standard, some consumers are not prepared to cope with it.
One known victim is lld prior to r325420.

From https://svnweb.freebsd.org/changeset/base/325320
The generic (naive) implementation of posix_fallocate cannot provide the
standard mandated guarantee that overwrites would never fail due to the lack
of free space.  The fundamental reason is the copy-on-write architecture
of ZFS.  Other features like compression and deduplication can also
increase the size difference between the (pre-)allocated dummy content
and the future content.

So, until ZFS can properly implement the feature it's better to report
that it is unsupported rather than providing an ersatz implementation.
Please note that EINVAL is used to report that the underlying file system
does not support the operation (POSIX.1-2008).

illumos and ZoL seem to do the same.


lld is affected by the change.
That means that any world builds where lld is used are affected as well (if ZFS
is involved, of course).
One example is the arm64 build (typically a cross build from amd64).
The lld issue is fixed in head as of r325420.  But other branches are still
affected (if you are building them on a head kernel).

Other posix_fallocate consumers could be affected too.
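
For consumers that only want preallocation as an optimization, one possible way
to cope is to treat EINVAL the same as "not supported" and carry on.  This is
just a sketch under that assumption, not the actual lld fix; the helper names
fallocate_unsupported and try_preallocate are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Return nonzero if 'err', as returned by posix_fallocate(), means that the
 * file system cannot preallocate.  POSIX.1-2008 specifies EINVAL for this;
 * EOPNOTSUPP is accepted as well because some callers already check for it.
 */
static int
fallocate_unsupported(int err)
{
    return (err == EINVAL || err == EOPNOTSUPP);
}

/*
 * Try to preallocate 'size' bytes for 'fd'.  Returns 0 on success or when
 * preallocation is merely unsupported, otherwise an errno value.
 * A non-positive size is rejected up front, because EINVAL is also the
 * error for a bad length and would otherwise be mistaken for "unsupported".
 */
static int
try_preallocate(int fd, off_t size)
{
    int err;

    if (size <= 0)
        return (EINVAL);
    /* posix_fallocate() returns the error number; it does not set errno. */
    err = posix_fallocate(fd, 0, size);
    if (err != 0 && !fallocate_unsupported(err))
        return (err);
    return (0);
}
```

A caller that takes this path gets no guarantee about free space, so later
writes can still fail with ENOSPC and have to be checked as usual.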

-- 
Andriy Gapon


Re: svn commit: r325320 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs [breaks lld on zfs: lld uses fallocate]

2017-11-04 Thread Andriy Gapon
On 04/11/2017 13:58, Ed Maste wrote:
> I have no idea how they decided EINVAL was a reasonable errno for this case.

I completely agree.  That's a weird choice that I have not seen for any other 
API.

-- 
Andriy Gapon


Re: svn commit: r325320 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs [breaks lld on zfs: lld uses fallocate]

2017-11-04 Thread Andriy Gapon
On 04/11/2017 13:41, Andriy Gapon wrote:
> On 04/11/2017 12:32, Mark Millard wrote:
>>   if (int Err = ::posix_fallocate(FD, 0, Size)) {
>> if (Err != EOPNOTSUPP)
>>   return std::error_code(Err, std::generic_category());
>>   }
> 
> The commit message that you didn't include into your reply contains some 
> useful
> information that authors / maintainers of this code should probably take into
> account:
> 
>>   Please note that EINVAL is used to report that the underlying file system
>>   does not support the operation (POSIX.1-2008).
> 
> Here is a link for that:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
> 

My response above is quite dry, so I want to add this.
Thank you very much for the deep analysis.
I am sorry for the trouble that my change caused, but I think that its root
cause lies elsewhere (lld, posix).

-- 
Andriy Gapon


Re: svn commit: r325320 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs [breaks lld on zfs: lld uses fallocate]

2017-11-04 Thread Andriy Gapon
On 04/11/2017 12:32, Mark Millard wrote:
>   if (int Err = ::posix_fallocate(FD, 0, Size)) {
> if (Err != EOPNOTSUPP)
>   return std::error_code(Err, std::generic_category());
>   }

The commit message that you didn't include in your reply contains some useful
information that authors / maintainers of this code should probably take into
account:

>   Please note that EINVAL is used to report that the underlying file system
>   does not support the operation (POSIX.1-2008).

Here is a link for that:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html

-- 
Andriy Gapon


Re: panic: vtopde on a uva/gpa 0x1030000 @r325228 (amd64)

2017-11-01 Thread Andriy Gapon
On 01/11/2017 10:12, Andriy Gapon wrote:
> On 01/11/2017 09:33, O. Hartmann wrote:
>> I have the same (or similar) problem here on two boxes now, maybe more to
>> come as I start updating CURRENT cyclic.
>>
>> Reverting r325227 solves the problem for now.
> 
> Oliver,
> 
> David and I have been working on this and a fix is coming soon.
> Sorry for the trouble and thanks for the report.
> 

Committed as r325272.

-- 
Andriy Gapon


Re: panic: vtopde on a uva/gpa 0x1030000 @r325228 (amd64)

2017-11-01 Thread Andriy Gapon
On 01/11/2017 09:33, O. Hartmann wrote:
> I have the same (or similar) problem here on two boxes now, maybe more to
> come as I start updating CURRENT cyclic.
> 
> Reverting r325227 solves the problem for now.

Oliver,

David and I have been working on this and a fix is coming soon.
Sorry for the trouble and thanks for the report.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic: vtopde on a uva/gpa 0x1030000 @r325228 (amd64)

2017-10-31 Thread Andriy Gapon
On 31/10/2017 14:32, David Wolfskill wrote:
> Andriy, I "cloned" the slice before doing the above, so I can poke
> at this a bit more (e.g., try to get a crash dump), if that would
> still be useful.

Yes, it would be, as I currently do not see what the problem with r325227 is.

-- 
Andriy Gapon


Re: panic: vtopde on a uva/gpa 0x1030000 @r325228 (amd64)

2017-10-31 Thread Andriy Gapon
On 31/10/2017 13:37, David Wolfskill wrote:
> Any suggestions for diagnosing or fixing it?

Try setting a dump device via loader.conf (e.g. dumpdev="ada0p99") and obtaining
a crash dump.

-- 
Andriy Gapon


D12420 fix the misleading log facility used in devd/zfs.conf

2017-09-22 Thread Andriy Gapon
https://reviews.freebsd.org/D12420

Who would be the best people to review this change?
Where are they lurking?
Please point me towards them or add yourself as a reviewer if you are one of 
them :)
Thanks!
-- 
Andriy Gapon


Re: order of executing MOD_LOAD and registering module sysctl-s

2017-08-03 Thread Andriy Gapon
On 02/08/2017 18:49, John Baldwin wrote:
> sysctl nodes are created explicitly via linker_file_register_sysctls, not via
> SYSINITs, so you can't order them with respect to other init functions.
> 
> I think Andriy's suggestion of doing sysctls "inside" sysinits (so they are
> registered last and unregistered first) is probably better than the current
> state and is a simpler fix than changing all sysctls to use SYSINITs.

Kostik (kib) suggested a possible valid use-case that depends on the current
order: adding dynamic sysctl-s under static sysctl-s via the module load 
handler.
He also offered an idea for a possible solution: holding the modules lock in the
shared mode (MOD_SLOCK) around calls to sysctl-s registered from modules.

-- 
Andriy Gapon


order of executing MOD_LOAD and registering module sysctl-s

2017-08-02 Thread Andriy Gapon

As far as I understand a module initialization routine is executed via the
sysinit mechanism.  Specifically, module_register_init is set up as the sysinit
function for every module and it calls MOD_EVENT(mod, MOD_LOAD) to invoke the
module event handler.

In linker_load_file() I see the following code:
linker_file_register_sysctls(lf);
linker_file_sysinit(lf);

I think that this means that any statically declared sysctl-s in the module
would be registered before the module receives the MOD_LOAD event.
It's possible that some of the sysctl-s could have procedures as handlers and
they might access data that is supposed to be initialized by the module event
handler.

So, for example, running sysctl -a at just the right moment during the loading
of a module might end up in unexpected behavior (including a crash).

Is my interpretation of how the code works correct?
Can the order of linker_file_sysinit and linker_file_register_sysctls be changed
without a great risk?

Thank you!

P.S.
The same applies to:
linker_file_sysuninit(file);
linker_file_unregister_sysctls(file);
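
To illustrate the window I am worried about, here is a toy userland model, not
kernel code.  All names in it (toy_module, toy_mod_load, toy_sysctl_handler) are
made up; it only assumes that a procedural handler becomes callable as soon as
the sysctl is registered, that is, possibly before the MOD_LOAD work is done.

```c
#include <errno.h>
#include <stddef.h>

/* Toy userland model; none of these names are real kernel interfaces. */
struct toy_module {
    int *stats;    /* set up by the MOD_LOAD-like handler */
    int storage;
};

/* Stand-in for the work done by the module event handler on MOD_LOAD. */
static void
toy_mod_load(struct toy_module *m)
{
    m->storage = 42;
    m->stats = &m->storage;
}

/*
 * Stand-in for a procedural sysctl handler.  It can be called as soon as the
 * sysctl is registered, so it has to tolerate running before toy_mod_load().
 */
static int
toy_sysctl_handler(const struct toy_module *m, int *out)
{
    if (m->stats == NULL)
        return (ENXIO);  /* module data is not initialized yet */
    *out = *m->stats;
    return (0);
}
```

A handler without the NULL check would dereference uninitialized module data
in that window.  The model is single-threaded; a real handler would also need
proper synchronization with the event handler.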

-- 
Andriy Gapon


Re: zfs.ko no longer loads after r320156: unresolved symbol: abd_is_linear

2017-08-02 Thread Andriy Gapon
On 02/08/2017 04:00, Ngie Cooper (yaneurabeya) wrote:
> 
>> On Aug 1, 2017, at 09:21, John Baldwin <j...@freebsd.org> wrote:
>>
>> On Tuesday, August 01, 2017 09:47:41 AM Andriy Gapon wrote:
>>> On 01/08/2017 02:31, Ngie Cooper wrote:
>>>> Hi,
>>>>I tried upgrading my host from 11.1-STABLE to 12.0-CURRENT, and it 
>>>> didn’t work because abd_is_linear is an undefined symbol (it exists in 
>>>> sys/conf/files, but not sys/modules/zfs/Makefile). I tried adding abd.c to 
>>>> sys/modules/zfs/Makefile and it didn’t immediately fix my compilation 
>>>> problem (ran into a linker error instead).
>>>>If it isn’t fixed in the next few hours I’ll try my hand at fixing the 
>>>> problem.
>>>
>>> I am not sure what exact problem you have...
>>> abd.c should be added to the list of source files via
>>> .include "${SUNW}/uts/common/Makefile.files"
>>>
>>> Perhaps something to do with "inline"...
>>
>> Oh, yes.  If you use -fno-inline-funcs or the like.  I forgot to
>> send this to Andriy earlier, but here is the fix I'm using:
>>
>> https://github.com/freebsd/freebsd/commit/574dc95cf8272e16f6d44aff6cb4e08dede08886
> 
>   Unfortunately… this is head, verbatim, which means that the bug still 
> exists.
>   This gives me an idea of where I should look though.

The URL indeed suggests that the change should be in head, but it's not there as
far as I can tell.  I never saw it being committed.


-- 
Andriy Gapon

Re: zfs.ko no longer loads after r320156: unresolved symbol: abd_is_linear

2017-08-01 Thread Andriy Gapon
On 01/08/2017 19:21, John Baldwin wrote:
> On Tuesday, August 01, 2017 09:47:41 AM Andriy Gapon wrote:
>> On 01/08/2017 02:31, Ngie Cooper wrote:
>>> Hi,
>>> I tried upgrading my host from 11.1-STABLE to 12.0-CURRENT, and it 
>>> didn’t work because abd_is_linear is an undefined symbol (it exists in 
>>> sys/conf/files, but not sys/modules/zfs/Makefile). I tried adding abd.c to 
>>> sys/modules/zfs/Makefile and it didn’t immediately fix my compilation 
>>> problem (ran into a linker error instead).
>>> If it isn’t fixed in the next few hours I’ll try my hand at fixing the 
>>> problem.
>>
>> I am not sure what exact problem you have...
>> abd.c should be added to the list of source files via
>> .include "${SUNW}/uts/common/Makefile.files"
>>
>> Perhaps something to do with "inline"...
> 
> Oh, yes.  If you use -fno-inline-funcs or the like.  I forgot to
> send this to Andriy earlier, but here is the fix I'm using:
> 
> https://github.com/freebsd/freebsd/commit/574dc95cf8272e16f6d44aff6cb4e08dede08886
> 

Please commit at your convenience.
I'll make sure that the fix is upstreamed as well.
Thank you!

-- 
Andriy Gapon

Re: zfs.ko no longer loads after r320156: unresolved symbol: abd_is_linear

2017-08-01 Thread Andriy Gapon
On 01/08/2017 02:31, Ngie Cooper wrote:
> Hi,
>   I tried upgrading my host from 11.1-STABLE to 12.0-CURRENT, and it 
> didn’t work because abd_is_linear is an undefined symbol (it exists in 
> sys/conf/files, but not sys/modules/zfs/Makefile). I tried adding abd.c to 
> sys/modules/zfs/Makefile and it didn’t immediately fix my compilation problem 
> (ran into a linker error instead).
>   If it isn’t fixed in the next few hours I’ll try my hand at fixing the 
> problem.

I am not sure what exact problem you have...
abd.c should be added to the list of source files via
.include "${SUNW}/uts/common/Makefile.files"

Perhaps something to do with "inline"...

-- 
Andriy Gapon

Re: Caveat emptor: Beware of ZFS on HEAD

2017-07-17 Thread Andriy Gapon
On 12/07/2017 23:20, Peter Wemm wrote:
> We mostly run HEAD in the freebsd.org cluster.  Sometime in the last few weeks
> an ugly zfs problem has surfaced. If a redundant volume is degraded, zfs
> panics on boot.  If a drive fails while running, or is manually put offline,
> zfs panics the same way.
> 
> I do not have a smoking gun, but I am suspicious of the June 28th commits 
> (starting at r320156) and their follow-ups. eg: r320452.
> 
> https://bugs.freebsd.org/220691
> 
> I believe single disk systems will *not* be affected by this - the panic only
> happens when a raidz (and presumably mirror) degrades.  Your laptop etc
> should be fine.
> 
> I apologize for being vague - I do not know more. Folks running HEAD should
> take appropriate precautions (eg: keeping a known-good kernel.old and modules
> around).  This is always advisable when running HEAD anyway, particularly so
> now.  For us, a kernel.old from June 18th worked fine.
> 

My apologies for the bug.
Everyone affected, could you please test the patch from the bug report?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220691#c3
Thank you!

-- 
Andriy Gapon


Re: ZFS ABD Panic

2017-06-27 Thread Andriy Gapon
On 27/06/2017 17:16, Shawn Webb wrote:
> On Tue, Jun 27, 2017 at 05:12:01PM +0300, Andriy Gapon wrote:
>> On 26/06/2017 03:31, Shawn Webb wrote:
>>> This is on the latest HardenedBSD 12-CURRENT on one of my servers:
>>>
>>> [141] panic: sleepq_add: td 0xf80008d20560 to sleep on wchan 
>>> 0xf803b7d4e810 with sleeping prohibited
>>> [141] cpuid = 5
>>> [141] time = 1498436043
>>> [141] KDB: stack backtrace:
>>> [141] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
>>> 0xfe2fc8b0
>>> [141] vpanic() at vpanic+0x19c/frame 0xfe2fc930
>>> [141] kassert_panic() at kassert_panic+0x126/frame 0xfe2fc9a0
>>> [141] sleepq_add() at sleepq_add+0x34f/frame 0xfe2fc9f0
>>> [141] _sx_xlock_hard() at _sx_xlock_hard+0x2a4/frame 0xfe2fcaa0
>>> [141] _sx_xlock() at _sx_xlock+0x98/frame 0xfe2fcae0
>>> [141] refcount_remove_many() at refcount_remove_many+0x2a/frame 
>>> 0xfe2fcb20
>>> [141] abd_return_buf() at abd_return_buf+0xe3/frame 0xfe2fcb50
>>> [141] vdev_geom_io_intr() at vdev_geom_io_intr+0x114/frame 
>>> 0xfe2fcb70
>>> [141] g_io_schedule_up() at g_io_schedule_up+0x42/frame 0xfe2fcba0
>>> [141] g_up_procbody() at g_up_procbody+0x6d/frame 0xfe2fcbb0
>>> [141] fork_exit() at fork_exit+0x84/frame 0xfe2fcbf0
>>> [141] fork_trampoline() at fork_trampoline+0xe/frame 0xfe2fcbf0
>>
>> Seems like another architectural incompatibility between illumos and FreeBSD.
>> Are you able to reproduce the crash more or less reliably?
> 
> Yup. I just need to do a buildworld and it's triggered.
> 

Could you please test this patch?
http://dpaste.com/12M183A.txt
I put it together rather quickly, but I think that it should work :)

-- 
Andriy Gapon


Re: ZFS ABD Panic

2017-06-27 Thread Andriy Gapon
On 26/06/2017 03:31, Shawn Webb wrote:
> This is on the latest HardenedBSD 12-CURRENT on one of my servers:
> 
> [141] panic: sleepq_add: td 0xf80008d20560 to sleep on wchan 
> 0xf803b7d4e810 with sleeping prohibited
> [141] cpuid = 5
> [141] time = 1498436043
> [141] KDB: stack backtrace:
> [141] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> 0xfe2fc8b0
> [141] vpanic() at vpanic+0x19c/frame 0xfe2fc930
> [141] kassert_panic() at kassert_panic+0x126/frame 0xfe2fc9a0
> [141] sleepq_add() at sleepq_add+0x34f/frame 0xfe2fc9f0
> [141] _sx_xlock_hard() at _sx_xlock_hard+0x2a4/frame 0xfe2fcaa0
> [141] _sx_xlock() at _sx_xlock+0x98/frame 0xfe2fcae0
> [141] refcount_remove_many() at refcount_remove_many+0x2a/frame 
> 0xfe2fcb20
> [141] abd_return_buf() at abd_return_buf+0xe3/frame 0xfe2fcb50
> [141] vdev_geom_io_intr() at vdev_geom_io_intr+0x114/frame 0xfe2fcb70
> [141] g_io_schedule_up() at g_io_schedule_up+0x42/frame 0xfe2fcba0
> [141] g_up_procbody() at g_up_procbody+0x6d/frame 0xfe2fcbb0
> [141] fork_exit() at fork_exit+0x84/frame 0xfe2fcbf0
> [141] fork_trampoline() at fork_trampoline+0xe/frame 0xfe2fcbf0

Seems like another architectural incompatibility between illumos and FreeBSD.
Are you able to reproduce the crash more or less reliably?

-- 
Andriy Gapon


Re: Crash in base/head in abd_put() after r320156

2017-06-21 Thread Andriy Gapon
On 21/06/2017 00:45, Trond Endrestøl wrote:
> On Tue, 20 Jun 2017 17:31-0400, Allan Jude wrote:
> 
>> On 2017-06-20 17:27, Trond Endrestøl wrote:
>>> Has anyone else seen a crash in base/head in abd_put() after r320156?
>>>
>>> One of my experimental VMs at home crashed spectacularly after 
>>> upgrading to r320156. I even wiped my /usr/obj, recompiled everything 
>>> and got the same result. Everything's back to normal when I boot 
>>> r320146.
>>>
>>> Here's the backtrace:
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 3; apic id = 03
>>>
>>> fault virtual address   = 0x8
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>>
>>> cpuid = 2; 
>>> Fatal trap 12: page fault while in kernel mode
>>> apic id = 02
>>> fault virtual address   = 0x8
>>> cpuid = 0; apic id = 00
>>> fault virtual address   = 0x8
>>> fault code  = supervisor read data, page not present
>>> fault code  = supervisor read data, page not present
>>> instruction pointer = 0x20:0x803260fa
>>> stack pointer   = 0x28:0xfe01b0231860
>>> frame pointer   = 0x28:0xfe01b0231870
>>> code segment= base 0x0, limit 0xf, type 0x1b
>>>
>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>>
>>> Fatal trap 12: page fault while in kernel mode
>>> fault code  = supervisor read data, page not present
>>> processor eflags= interrupt enabled, resume, IOPL = 0
>>> current process = 0 (zio_free_issue_5_2)
>>> trap number = 12
>>> instruction pointer = 0x20:0x803260fa
>>> stack pointer   = 0x28:0xfe01b022c860
>>> frame pointer   = 0x28:0xfe01b022c870
>>> panic: page fault
>>> cpuid = 0
>>> time = 4
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at 0x8044f93b = 
>>> db_trace_self_wrapper+0x2b/frame 0xfe01b0231440
>>> vpanic() at 0x8067ec0c = vpanic+0x19c/frame 0xfe01b02314c0
>>> panic() at 0x8067ea63 = panic+0x43/frame 0xfe01b0231520
>>> trap_fatal() at 0x80983b32 = trap_fatal+0x322/frame 
>>> 0xfe01b0231570
>>> trap_pfault() at 0x80983b89 = trap_pfault+0x49/frame 
>>> 0xfe01b02315d0
>>> trap() at 0x809833c5 = trap+0x295/frame 0xfe01b0231790
>>> calltrap() at 0x80968c21 = calltrap+0x8/frame 0xfe01b0231790
>>> --- trap 0xc, rip = 0x803260fa, rsp = 0xfe01b0231860, rbp = 
>>> 0xfe01b0231870 ---
>>> abd_put() at 0x803260fa = abd_put+0xa/frame 0xfe01b0231870
>>> vdev_raidz_map_free() at 0x803aa7c2 = 
>>> vdev_raidz_map_free+0x82/frame 0xfe01b02318a0
>>> zio_vdev_io_assess() at 0x803ecc04 = zio_vdev_io_assess+0x74/frame 
>>> 0xfe01b02318e0
>>> zio_execute() at 0x803e913c = zio_execute+0xac/frame 
>>> 0xfe01b0231930
>>> zio_vdev_io_start() at 0x803ec894 = zio_vdev_io_start+0x2b4/frame 
>>> 0xfe01b0231990
>>> zio_execute() at 0x803e913c = zio_execute+0xac/frame 
>>> 0xfe01b02319e0
>>> zio_nowait() at 0x803e8a8b = zio_nowait+0xcb/frame 
>>> 0xfe01b0231a20
>>> vdev_mirror_io_start() at 0x803a744c = 
>>> vdev_mirror_io_start+0x35c/frame 0xfe01b0231a70
>>> zio_vdev_io_start() at 0x803ec86c = zio_vdev_io_start+0x28c/frame 
>>> 0xfe01b0231ad0
>>> zio_execute() at 0x803e913c = zio_execute+0xac/frame 
>>> 0xfe01b0231b20
>>> taskqueue_run_locked() at 0x806d3d27 = 
>>> taskqueue_run_locked+0x127/frame 0xfe01b0231b80
>>> taskqueue_thread_loop() at 0x806d4ee8 = 
>>> taskqueue_thread_loop+0xc8/frame 0xfe01b0231bb0
>>> fork_exit() at 0x80640df5 = fork_exit+0x85/frame 0xfe01b0231bf0
>>> fork_trampoline() at 0x8096915e = fork_trampoline+0xe/frame 
>>> 0xfe01b0231bf0
>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>> Uptime: 4s
>>>
>>
>> This seems to be an unintended consequence of some code that was pulled
>> in from upstream today.
>>
>> Try adding: vfs.zfs.trim.enabled=0
>> to /boot/loader.conf
>>
>> (you can set it manually from the boot loader menu with the set command
>> to get the system to boot)
> 
> That worked. Thanks.
> 
> BTW, the call to abd_put() was given a NULL pointer.
> 

Could you please re-enable ZFS TRIM support and test r320186 or later?
ZFS ABD is a rather large upstream change and our TRIM support is sprinkled over
a non-trivial amount of code as well.
Thank you.

-- 
Andriy Gapon

Re: hwpmc and Xeon E5 v4

2017-06-15 Thread Andriy Gapon
On 15/06/2017 11:16, Ngie Cooper wrote:
> On Thu, Jun 15, 2017 at 12:44 AM, Andriy Gapon <a...@freebsd.org> wrote:
>>
>> It seems that hwpmc does not support newer Xeon processors:
>>   pmc: Unknown Intel CPU.
> 
> What FreeBSD version is this?

Head as of January.
I do not see any changes to hwpmc_intel.c since then.

-- 
Andriy Gapon


hwpmc and Xeon E5 v4

2017-06-15 Thread Andriy Gapon

It seems that hwpmc does not support newer Xeon processors:
  pmc: Unknown Intel CPU.

This is how FreeBSD reports a processor from Xeon E5 v4 line:
CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (2095.20-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended
Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE>
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics

I think that this processor belongs to Broadwell-EP family:
http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20E5-2620%20v4.html

From my reading of the code it seems that a 0x4f case needs to be added to the
big switch statement on the CPU model number.
But I am not sure if / how the processor is compatible with the previous
models.  Would it suffice to treat that CPU as PMC_CPU_INTEL_BROADWELL_XEON?
Or would something more elaborate be required?
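
For reference, the model number that such a switch would be keyed on can be
derived from the Id value in the CPU line above.  The sketch below follows the
usual CPUID leaf 1 signature layout (stepping in bits 3:0, model in 7:4, family
in 11:8, extended model in 19:16, extended family in 27:20).  The struct and
function names are made up for illustration and are not taken from hwpmc.

```c
#include <stdint.h>

/* Decoded CPUID signature; a hypothetical helper type for this example. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode a CPUID.1:EAX signature such as the Id= value printed at boot. */
struct cpu_sig decode_cpu_id(uint32_t id)
{
    struct cpu_sig s;
    unsigned base_family = (id >> 8) & 0xf;
    unsigned base_model = (id >> 4) & 0xf;
    unsigned ext_model = (id >> 16) & 0xf;
    unsigned ext_family = (id >> 20) & 0xff;

    s.stepping = id & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += ext_family;

    /* The extended model applies to base families 0x6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ext_model << 4;

    return s;
}
```

With Id=0x406f1 this yields family 0x6, model 0x4f and stepping 1, which agrees
with the Family/Model/Stepping fields printed by the kernel.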

I would appreciate any help, patches, suggestions, documentation links, etc.
Thank you.

-- 
Andriy Gapon


Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

2017-05-29 Thread Andriy Gapon

On 28/05/2017 01:20, Rick Macklem wrote:
> - with the "obvious change" mentioned in r312426's commit message, using
>(flags & SW_TYPE_MASK) == SWT_RELINQUISH instead of (flag & SWT_RELINQUISH)
>121minutes

Rick,

can I see how exactly your variant of the obvious change looks in your version?

I am asking, because I meant applying that change to the original code while it
can be also interpreted as applying it to the code after r312426.

That is, does it look like this:
preempted = !((td->td_flags & TDF_SLICEEND) ||
    ((flags & SW_TYPE_MASK) == SWT_RELINQUISH));

or like this:
preempted = !(td->td_flags & TDF_SLICEEND) &&
    ((flags & SW_TYPE_MASK) == SWT_RELINQUISH);
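
To show that the two readings really are different, here is a small standalone
sketch that evaluates both expressions.  The constant values are assumptions
made for the example: TDF_SLICEEND is just a placeholder bit, and the SWT_*
numbers are what I recall from sys/proc.h, so please check them against your
tree.  The point that matters is only that the SWT_* values are small
enumerated numbers and not independent bits.

```c
#include <stdbool.h>

/* Illustrative values only; see the note above. */
#define TDF_SLICEEND    0x1u    /* placeholder bit, not the real value */
#define SW_TYPE_MASK    0xffu
#define SWT_PREEMPT     1u
#define SWT_RELINQUISH  6u
#define SWT_NEEDRESCHED 7u

/* First reading: the type check is inside the negation. */
bool preempted_variant_a(unsigned td_flags, unsigned flags)
{
    return !((td_flags & TDF_SLICEEND) ||
        ((flags & SW_TYPE_MASK) == SWT_RELINQUISH));
}

/* Second reading: the type check is ANDed with the negation. */
bool preempted_variant_b(unsigned td_flags, unsigned flags)
{
    return !(td_flags & TDF_SLICEEND) &&
        ((flags & SW_TYPE_MASK) == SWT_RELINQUISH);
}

/* Bit-style test: also true for other types sharing bits with the value. */
bool relinquish_bit_test(unsigned flags)
{
    return (flags & SWT_RELINQUISH) != 0;
}

/* Masked comparison: true only for the relinquish type itself. */
bool relinquish_type_test(unsigned flags)
{
    return (flags & SW_TYPE_MASK) == SWT_RELINQUISH;
}
```

For a plain preemption without TDF_SLICEEND the first variant yields true and
the second yields false; for SWT_RELINQUISH it is the other way around.  With
these values the bit-style test also fires for SWT_NEEDRESCHED, which is why
the masked comparison is the "obvious change".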

> I also tested:
> ((flags & SW_PREEMPT) != 0 || (flags & SW_TYPE_MASK) == SWT_IDLE ||
>   (flags & SW_TYPE_MASK) == SWT_IWAIT)
> and it also resulted in121minutes

So, this sets the preempted flag for SWT_IDLE and SWT_IWAIT cases.
The flag makes any difference only if the current thread is calling mi_switch()
but remains running (of which typical cases are preemption and yielding).  As
far as I can tell, mi_switch(SWT_IWAIT) is only called when the thread is
already inhibited via TD_SET_IWAIT, so that should not make any difference.
SWT_IDLE is set only when the current thread is an idle thread, so that should
not make any difference either.
Thus, I am puzzled as to why this change could make any difference.

Could you please post full code snippets for each local change that you tried?

Also, could you please capture a KTR sched trace while running the test on the
kernel with no local modifications and on the 1yr old kernel?
Ideally, I would like to see the trace with KTR_SCHED | KTR_RUNQ compiled into
the kernel via KTR_COMPILE and then enabled at run time via
debug.ktr.mask=0x20400000.
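
As a sanity check on the mask, a tiny sketch of how the two classes combine.
The two constants are written from memory of sys/ktr_class.h and should be
treated as assumptions to verify against your source tree; the function name
is made up for this example.

```c
#include <stdint.h>

/* Class bits as I recall them from sys/ktr_class.h (assumed values). */
#define KTR_RUNQ  0x00400000u
#define KTR_SCHED 0x20000000u

/* Mask that enables both the scheduler and the run queue trace classes. */
uint32_t ktr_sched_runq_mask(void)
{
    return KTR_SCHED | KTR_RUNQ;
}
```

With those values the result is 0x20400000, which is the number to use both
for KTR_COMPILE in the kernel config and for debug.ktr.mask at run time.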

Thank you.



-- 
Andriy Gapon


Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

2017-05-28 Thread Andriy Gapon
On 28/05/2017 01:20, Rick Macklem wrote:
> After poking at this some more, it appears that r312426 is the main cause of
> this degradation.

Rick,

thank you for the investigation!
A quick question before a longer reply: what network driver do you use in your
test setup?  Is it ixl by any chance?

-- 
Andriy Gapon


Re: zfs recv panic

2017-05-16 Thread Andriy Gapon
On 16/05/2017 16:49, Kristof Provost wrote:
> On 16 May 2017, at 15:41, Andriy Gapon wrote:
>> On 10/05/2017 12:37, Kristof Provost wrote:
>>> I have a reproducible panic on CURRENT (r318136) doing
>>> (jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
>>> (dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var
>>>
>>> For clarity, the receiving machine is CURRENT r318136, the sending machine
>>> is running a somewhat older CURRENT version.
>>>
>>> The receiving machine panics a few seconds in:
>>>
>>> receiving full stream of zroot/var@before-kernel-2017-04-03 into
>>> tank/jupiter/var@before-kernel-2017-04-03
>>> panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 ==
>>> 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c,
>>> line: 2007
>>
>> could you please try to revert commits related to the compressed send and
>> see if that helps?  I assume that the sending machine does not have (does
>> not use) the feature while the target machine is capable of the feature.
>>
>> The commits are: r317648 and r317414.  Not that I really suspect that change,
>> but just to eliminate the possibility.
> 
> Those commits appear to be the trigger.
> I’ve not changed the sender, but with those reverted I don’t see the panic any
> more.

Thank you for testing.
Do you still have the old kernel / module and the crash dump?
It would be interesting to poke around in frame 14.


-- 
Andriy Gapon

Re: zfs recv panic

2017-05-16 Thread Andriy Gapon
On 10/05/2017 12:37, Kristof Provost wrote:
> Hi,
> 
> I have a reproducible panic on CURRENT (r318136) doing
> (jupiter) # zfs send -R -v zroot/var@before-kernel-2017-04-26 | nc dual 1234
> (dual) # nc -l 1234 | zfs recv -v -F tank/jupiter/var
> 
> For clarity, the receiving machine is CURRENT r318136, the sending machine is
> running a somewhat older CURRENT version.
> 
> The receiving machine panics a few seconds in:
> 
> receiving full stream of zroot/var@before-kernel-2017-04-03 into
> tank/jupiter/var@before-kernel-2017-04-03
> panic: solaris assert: dbuf_is_metadata(db) == arc_is_metadata(buf) (0x0 ==
> 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c,
> line: 2007

Kristof,

could you please try to revert commits related to the compressed send and see if
that helps?  I assume that the sending machine does not have (does not use) the
feature while the target machine is capable of the feature.

The commits are: r317648 and r317414.  Not that I really suspect that change,
but just to eliminate the possibility.
Thank you.

> cpuid = 0
> time = 1494408122
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0120cad930
> vpanic() at vpanic+0x19c/frame 0xfe0120cad9b0
> panic() at panic+0x43/frame 0xfe0120cada10
> assfail3() at assfail3+0x2c/frame 0xfe0120cada30
> dbuf_assign_arcbuf() at dbuf_assign_arcbuf+0xf2/frame 0xfe0120cada80
> dmu_assign_arcbuf() at dmu_assign_arcbuf+0x170/frame 0xfe0120cadad0
> receive_writer_thread() at receive_writer_thread+0x6ac/frame 
> 0xfe0120cadb70
> fork_exit() at fork_exit+0x84/frame 0xfe0120cadbb0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe0120cadbb0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 7 tid 100672 ]
> Stopped at  kdb_enter+0x3b: movq$0,kdb_why
> db>
> 
> 
> kgdb backtrace:
> #0  doadump (textdump=0) at pcpu.h:232
> #1  0x803a208b in db_dump (dummy=, dummy2= optimized out>, dummy3=, dummy4=) at
> /usr/src/sys/ddb/db_command.c:546
> #2  0x803a1e7f in db_command (cmd_table=) at
> /usr/src/sys/ddb/db_command.c:453
> #3  0x803a1bb4 in db_command_loop () at 
> /usr/src/sys/ddb/db_command.c:506
> #4  0x803a4c7f in db_trap (type=, code= optimized out>) at /usr/src/sys/ddb/db_main.c:248
> #5  0x80a93cb3 in kdb_trap (type=3, code=-61456, tf= out>) at /usr/src/sys/kern/subr_kdb.c:654
> #6  0x80ed3de6 in trap (frame=0xfe0120cad860) at
> /usr/src/sys/amd64/amd64/trap.c:537
> #7  0x80eb62f1 in calltrap () at 
> /usr/src/sys/amd64/amd64/exception.S:236
> #8  0x80a933eb in kdb_enter (why=0x8143d8f5 "panic", 
> msg= optimized out>) at cpufunc.h:63
> #9  0x80a51cf9 in vpanic (fmt=,
> ap=0xfe0120cad9f0) at /usr/src/sys/kern/kern_shutdown.c:772
> #10 0x80a51d63 in panic (fmt=) at
> /usr/src/sys/kern/kern_shutdown.c:710
> #11 0x8262b26c in assfail3 (a=, lv= optimized
> out>, op=, rv=, f= out>, l=)
> at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #12 0x822ad892 in dbuf_assign_arcbuf (db=0xf8008f23e560,
> buf=0xf8008f09fcc0, tx=0xf8008a8d5200) at
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:2007
> #13 0x822b87f0 in dmu_assign_arcbuf (handle=,
> offset=0, buf=0xf8008f09fcc0, tx=0xf8008a8d5200) at
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1542
> #14 0x822bf7fc in receive_writer_thread (arg=0xfe0120a1d168) at
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2284
> #15 0x80a13704 in fork_exit (callout=0x822bf150
> , arg=0xfe0120a1d168, frame=0xfe0120cadbc0) at
> /usr/src/sys/kern/kern_fork.c:1038
> #16 0x80eb682e in fork_trampoline () at
> /usr/src/sys/amd64/amd64/exception.S:611
> #17 0x in ?? ()
> 
> Let me know if there’s any other information I can provide, or things I can 
> test.
> Fortunately the target machine is not a production machine, so I can panic it 
> as
> often as required.

-- 
Andriy Gapon

Re: Panic String: solaris assert: (lsize != psize) implies ((flags & ZIO_FLAG_RAW) != 0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 631

2017-04-28 Thread Andriy Gapon
On 28/04/2017 14:56, Michael Jung wrote:
> I have made the requested change..
> 
> [root@bsd11 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs]# diff zio.c ~mikej/zio.c.orig
> 965c965
> < size, NULL, NULL, ZIO_TYPE_FREE, ZIO_PRIORITY_NOW,
> ---
>> BP_GET_PSIZE(bp), NULL, NULL, ZIO_TYPE_FREE, ZIO_PRIORITY_NOW,

Yes, that's the change that I had in mind.
I was a little bit confused by the order of the original and modified files,
though :-)

> [root@bsd11 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs]#
> 
> As to the pool size:
> 
> [root@bsd11 /usr/home/mikej]# zpool list
> NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAGCAP  DEDUP  HEALTH  ALTROOT
> tank   199G   143G  55.9G -85%71%  1.00x  ONLINE  -
> [root@bsd11 /usr/home/mikej]#
> 
> I should have also mentioned that besides poudriere running a build, it was
> removing old logs - There was some 43G of old logs files that were in the 
> process
> of being removed.

So, given that the panic was in the freeing path, you were probably low on the
pool space back when those log files were created.  I mean that the gang blocks
are typically created when a pool is very fragmented.

> I will hammer the box with and report back first of the week whether the panic
> re-occurs or not.

Please also try removing those old files again too.
Running zpool scrub afterwards could be a good idea too.

Thank you again!

-- 
Andriy Gapon


Re: Panic String: solaris assert: (lsize != psize) implies ((flags & ZIO_FLAG_RAW) != 0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 631

2017-04-27 Thread Andriy Gapon
/db_command.c:453
> #3  0x803a1aa4 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:506
> #4  0x803a4b6f in db_trap (type=,
> code=) at /usr/src/sys/ddb/db_main.c:248
> #5  0x80a9 in kdb_trap (type=3, code=-61456,
> tf=) at /usr/src/sys/kern/subr_kdb.c:654
> #6  0x80ed2de6 in trap (frame=0xfe086140e780)
> at /usr/src/sys/amd64/amd64/trap.c:537
> #7  0x80eb54e1 in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:236
> #8  0x80a92a6b in kdb_enter (why=0x8143c265 "panic",
> msg=) at cpufunc.h:63
> #9  0x80a513c9 in vpanic (fmt=,
> ap=0xfe086140e910) at /usr/src/sys/kern/kern_shutdown.c:772
> #10 0x80a51433 in panic (fmt=)
> at /usr/src/sys/kern/kern_shutdown.c:710
> #11 0x82a6623a in assfail (a=,
> f=, l=)
> at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
> #12 0x828eab3f in zio_create (pio=0xf807def8e810,
> spa=, txg=19514648, bp=0xf807def8e880, data=0x0,
> lsize=512, psize=1024, done=0, private=0x19, type=ZIO_TYPE_NULL,
> priority=512, offset=0, zb=0x80a98d40, pipeline=17301632)
> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:631
> #13 0x828eb897 in zio_free_sync (pio=0xf807def8e810,
> spa=0xfe000289e000, txg=19514648, bp=0xf807def8e880,
> size=, flags=17301632) at time.h:67
> #14 0x828f422f in zio_gang_tree_issue (pio=0xf807def8e810,
> gn=0xf805bd9162e0, bp=0xf807def8e880, data=0x0)
> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2114
> #15 0x828f0992 in zio_gang_issue (zio=0xf807def8e810)
> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2162
> #16 0x828ecb4c in zio_execute (zio=)
> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1694
> #17 0x80aa56bd in taskqueue_run_locked (queue=0xf800096bf900)
> at /usr/src/sys/kern/subr_taskqueue.c:454
> #18 0x80aa6478 in taskqueue_thread_loop (arg=)
> at /usr/src/sys/kern/subr_taskqueue.c:746
> #19 0x80a13074 in fork_exit (
> callout=0x80aa63f0 ,
> arg=0xf80009350f00, frame=0xfe086140ec00)
> at /usr/src/sys/kern/kern_fork.c:1038
> #20 0x80eb5a1e in fork_trampoline ()
> at /usr/src/sys/amd64/amd64/exception.S:611
> #21 0x in ?? ()
> Current language:  auto; currently minimal
> (kgdb)
> 


-- 
Andriy Gapon


Re: New syscons bugs: shutdown -r doesn't execute rc.d sequence and others

2017-03-30 Thread Andriy Gapon
On 30/03/2017 14:23, Andriy Gapon wrote:
> On 30/03/2017 12:34, Andrey Chernov wrote:
>> On 30.03.2017 12:23, Andrey Chernov wrote:
>>> Yes, only for reboot/shutdown. The system does not do anythings wrong
>>> even under high load. On reboot or hang those lines are never printed:
>>>
>>> kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
>>> kernel: Waiting (max 60 seconds) for system process `bufdaemon' to
>>> stop...done
>>> kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
>>> kernel: Syncing disks, vnodes remaining...5 3 0 1 0 0 done
>>> kernel: All buffers synced.
>>> (it is from 10-stable sample, old -current samples are lost)
>>>
>>> Moreover, GELI swap deactivation lines are never printed too (I already
>>> mention that I change swap to normal, but nothing is changed).
>>
>> I start to have raw guess that _any_ kernel printf in shutdown mode
>> cause not printf but premature reboot.
> 
> This sounds somewhat familiar...
> I vaguely recall an opposite issue that happened in the past.  After one of my
> changes the reboot started hanging for one user.  Turned out that the actual 
> bug
> was always there, but previously the system rebooted because of a printf that
> caused a LOR (between spinlocks, AFAIR), witness tried to report it... using
> printf, and that recursed and there was a triple fault in the end.
> 
> Let me try to dig some details, maybe the current issue is related in some 
> ways.

Here they are:
https://lists.freebsd.org/pipermail/freebsd-hackers/2012-May/038812.html
Turns out I remembered them quite wrong.

> By chance, do you have WITNESS but not WITNESS_SKIPSPIN in your kernel config?


-- 
Andriy Gapon


Re: New syscons bugs: shutdown -r doesn't execute rc.d sequence and others

2017-03-30 Thread Andriy Gapon
On 30/03/2017 12:34, Andrey Chernov wrote:
> On 30.03.2017 12:23, Andrey Chernov wrote:
>> Yes, only for reboot/shutdown. The system does not do anythings wrong
>> even under high load. On reboot or hang those lines are never printed:
>>
>> kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
>> kernel: Waiting (max 60 seconds) for system process `bufdaemon' to
>> stop...done
>> kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
>> kernel: Syncing disks, vnodes remaining...5 3 0 1 0 0 done
>> kernel: All buffers synced.
>> (it is from 10-stable sample, old -current samples are lost)
>>
>> Moreover, GELI swap deactivation lines are never printed too (I already
>> mention that I change swap to normal, but nothing is changed).
> 
> I start to have raw guess that _any_ kernel printf in shutdown mode
> cause not printf but premature reboot.

This sounds somewhat familiar...
I vaguely recall an opposite issue that happened in the past.  After one of my
changes the reboot started hanging for one user.  Turned out that the actual bug
was always there, but previously the system rebooted because of a printf that
caused a LOR (between spinlocks, AFAIR), witness tried to report it... using
printf, and that recursed and there was a triple fault in the end.

Let me try to dig up some details, maybe the current issue is related in some way.

By chance, do you have WITNESS but not WITNESS_SKIPSPIN in your kernel config?


-- 
Andriy Gapon


Re: Opteron 6100-series "Magny-Cours"

2017-03-27 Thread Andriy Gapon
On 27/03/2017 15:06, Piotr Kubaj wrote:
> Does it have to be specifically 61xx series? I have a server running 2 
> 6262HE's.
> 

Yes.  I have the info that I need for 62xx Opterons.
Thanks.

-- 
Andriy Gapon


Re: New /head/sys/amd64/amd64/genassym.c breaks buildkernel amd64 current

2017-03-27 Thread Andriy Gapon
On 03/27/2017 14:35, Rainer Hurling wrote:
> Am 27.03.2017 um 10:31 schrieb Andriy Gapon:
>> On 03/26/2017 00:21, Manfred Antar wrote:
>>> Recent change to genassym.c breaks building a current kernel:
>>>
>>> --
>>>>>> stage 3.1: building everything
>>> --
>>> cd /usr/obj/usr/src/sys/pozo; COMPILER_VERSION=4  COMPILER_TYPE=clang 
>>> COMPILER_FREEBSD_VERSION=126 MAKEOBJDIRPREFIX=/usr/obj 
>>> MACHINE_ARCH=amd64  MACHINE=amd64  CPUTYPE=
>>> GROFF_BIN_PATH=/usr/obj/usr/src/tmp/legacy/usr/bin 
>>> GROFF_FONT_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/groff_font 
>>> GROFF_TMAC_PATH=/usr/obj/usr/src/tmp/legacy/usr/share/tmac
>>> CC="/usr/local/bin/ccache cc -target x86_64-unknown-freebsd12.0
>>> --sysroot=/usr/obj/usr/src/tmp -B/usr/obj/usr/src/tmp/usr/bin"
>>> CXX="/usr/local/bin/ccache c++  -target x86_64-unknown-freebsd12.0
>>> --sysroot=/usr/obj/usr/src/tmp -B/usr/obj/usr/src/tmp/usr/bin"  CPP="cpp
>>> -target x86_64-unknown-freebsd12.0 --sysroot=/usr/obj/usr/src/tmp
>>> -B/usr/obj/usr/src/tmp/usr/bin"  AS="as" AR="ar" LD="ld" LLVM_LINK=""  NM=nm
>>> OBJCOPY="objcopy"  RANLIB=ranlib STRINGS=  SIZE="size"  INSTALL="sh
>>> /usr/src/tools/install.sh" 
>>> PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/usr/obj/usr/src/tmp/usr
>>>
>>   /sbin:/usr/obj/usr/src/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin make  -m
>> /usr/src/share/mk  KERNEL=kernel all -DNO_MODULES_OBJ
>>> machine -> /usr/src/sys/amd64/include
>>> x86 -> /usr/src/sys/x86/include
>>> /usr/local/bin/ccache cc -target x86_64-unknown-freebsd12.0
>>> --sysroot=/usr/obj/usr/src/tmp -B/usr/obj/usr/src/tmp/usr/bin -c -O2 -pipe
>>> -fno-strict-aliasing -g -nostdinc -I. -I/usr/src/sys
>>> -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS 
>>> -include
>>> opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -MD
>>> -MF.depend.genassym.o -MTgenassym.o -mcmodel=kernel -mno-red-zone -mno-mmx
>>> -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv
>>> -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs
>>> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline 
>>> -Wcast-qual
>>> -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__
>>> -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas
>>> -Wno-error-tautological-compare -Wno-error-empty-body
>>> -Wno-error-parentheses-equality -Wno-error-unused-function
>>> -Wno-error-pointer-sign -Wno-error-shift-negative-value
>>> -Wno-error-address-of-packed-member -mno-aes -mno-avx -std=iso9
>>   899:1999 /usr/src/sys/amd64/amd64/genassym.c
>>> In file included from /usr/src/sys/amd64/amd64/genassym.c:47:
>>> /usr/src/sys/sys/bus.h:730:10: fatal error: 'device_if.h' file not found
>>> #include "device_if.h"
>>>   ^
>>> 1 error generated.
>>> *** Error code 1
>>>
>>> Stop.
>>> make[2]: stopped in /usr/obj/usr/src/sys/pozo
>>> *** Error code 1
>>>
>>> Stop.
>>> make[1]: stopped in /usr/src
>>> *** Error code 1
>>>
>>> Stop.
>>> make: stopped in /usr/src
>>>
>>>
>>> cd /usr/obj/usr/src/sys/pozo ; make device_if.h
>>> awk -f /usr/src/sys/tools/makeobjops.awk /usr/src/sys/kern/device_if.m -h
>>>
>>> also bus_if.h is missing:
>>> (pozo)5023}make
>>> /usr/local/bin/ccache cc -c -O2 -pipe -fno-strict-aliasing -g -nostdinc -I.
>>> -I/usr/src/sys -I/usr/src/sys/contrib/libfdt -D_KERNEL
>>> -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer
>>> -mno-omit-leaf-frame-pointer -MD -MF.depend.genassym.o -MTgenassym.o
>>> -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float
>>> -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector
>>> -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
>>> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
>>> -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs
>>> -fdiagnostics-show-option -Wno-unknown-pragmas
>>> -Wno-error-tautological-compare -Wno-error-empty-body
&g

Re: Opteron 6100-series "Magny-Cours"

2017-03-27 Thread Andriy Gapon
On 03/25/2017 23:26, Jack L. wrote:
> I have a few still sitting in a corner with FreeBSD 7 or 8 on them. Someday i 
> might put them back on with FreeBSD but not anytime soon

Apologies for not qualifying my question.
I would like to obtain some information from such a system and possibly ask
someone to test a patch.
Looks like you won't be able to help with that.  At least, not until that
some day :-).

>> On Mar 25, 2017, at 11:02 AM, Andriy Gapon <a...@freebsd.org> wrote:
>>
>>
>> Does anyone [still] use Opteron 6100-series / "Magny-Cours" processors with 
>> FreeBSD?

-- 
Andriy Gapon


Re: New /head/sys/amd64/amd64/genassym.c breaks buildkernel amd64 current

2017-03-27 Thread Andriy Gapon
s -I/usr/src/sys/contrib/libfdt -D_KERNEL 
> -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h  -fno-omit-frame-pointer 
> -mno-omit-leaf-frame-pointer  -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse 
> -msoft-float  -fno-asynchronous-unwind-tables -ffreestanding -fwrapv 
> -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs 
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual 
> -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ 
> -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas 
> -Wno-error-tautological-compare -Wno-error-empty-body 
> -Wno-error-parentheses-equality -Wno-error-unused-function 
> -Wno-error-pointer-sign -Wno-error-shift-negative-value 
> -Wno-error-address-of-packed-member  -mno-aes -mno-avx  -std=iso9899:1999   
> vers.c
> ctfconvert -L VERSION -g vers.o
> --- kernel.full ---
> linking kernel.full
> ctfmerge -L VERSION -g -o kernel.full ...
>  text data   bssdechex   filename
>   8657083   805570   3350664   12813317   0xc38405   kernel.full
> --- kernel.debug ---
> objcopy --only-keep-debug kernel.full kernel.debug
> --- kernel ---
> objcopy --strip-debug --add-gnu-debuglink=kernel.debug  kernel.full kernel
> 
> somehow this needs to happen before genassym.c is compiled
> this is a kernel without any modules

I've got another report about this problem, but I can not reproduce it here with
a clean kernel build of GENERIC.
I am not sure what the problem is.
Do you have anything unusual in make.conf, src.conf or your kernel 
configuration?

-- 
Andriy Gapon


Opteron 6100-series "Magny-Cours"

2017-03-25 Thread Andriy Gapon

Does anyone [still] use Opteron 6100-series / "Magny-Cours" processors with 
FreeBSD?


-- 
Andriy Gapon


Re: start-up failure at SVN r314889

2017-03-08 Thread Andriy Gapon
On 08/03/2017 14:55, Michael Butler wrote:
> My laptop usually starts like this ..
> 
> FreeBSD 12.0-CURRENT #21 r314812M: Mon Mar  6 19:34:51 EST 2017
> i...@toshi.auburn.protected-networks.net:/usr/obj/usr/src/sys/TOSHI amd64
> FreeBSD clang version 4.0.0 (branches/release_40 296509) (based on LLVM
> 4.0.0)
> VT(vga): resolution 640x480
> info: [drm] Initialized drm 1.1.0 20060810
> CPU: Intel(R) Core(TM)2 CPU T7600  @ 2.33GHz (2327.56-MHz
> K8-class CPU)
>  [ .. ]
> 
> This morning, I get this :-(
> 
> FreeBSD 12.0-CURRENT #27 r314889M: Tue Mar  7 19:55:25 EST 2017
> i...@toshi.auburn.protected-networks.net:/usr/obj/usr/src/sys/TOSHI
> FreeBSD clang version 4.0.0 (branches/release_40 296509) (based on LLVM
> 4.0.0)
> VT(vga): resolution 640x480
> panic: kthread_add called too soon
>  [ .. ]
> 
> Any thoughts?

Were the messages replaced by the second '[ .. ]' really so useless?


-- 
Andriy Gapon


CFT: aacraid users

2017-03-06 Thread Andriy Gapon

If you currently use the aacraid(4) driver and can afford to run a test,
could you please check whether you get any regressions after applying the
following patch?

https://reviews.freebsd.org/D9900.diff

Thank you!

-- 
Andriy Gapon


Re: kernel trap 12 with interrupts disabled

2017-03-04 Thread Andriy Gapon
er,,PauseFilterThreshold> Revision=1, ASIDs=65536
>>   TSC: P-state invariant, performance statistics
>> L1 2MB data TLB: 64 entries, fully associative
>> L1 2MB instruction TLB: 24 entries, fully associative
>> L1 4KB data TLB: 64 entries, fully associative
>> L1 4KB instruction TLB: 48 entries, fully associative
>> L1 data cache: 16 kbytes, 64 bytes/line, 1 lines/tag, 4-way associative
>> L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way
>> associative L2 2MB data TLB: 1024 entries, 8-way associative
>> L2 4KB data TLB: 1024 entries, 8-way associative
>> L2 4KB instruction TLB: 1024 entries, 8-way associative
>> L2 unified cache: 2048 kbytes, 64 bytes/line, 1 lines/tag, 16-way associative
>> real memory  = 34359738368 (32768 MB)
>> Physical memory chunk(s):
>> 0x0001 - 0x0005, 327680 bytes (80 pages)
>> 0x0007 - 0x00098fff, 167936 bytes (41 pages)
>> 0x0010 - 0x001f, 1048576 bytes (256 pages)
>> 0x0134 - 0xbfd9, 3198550016 bytes (780896 pages)
>> 0x0001 - 0x00080a849fff, 30241234944 bytes (7383114 pages)
>> avail memory = 33272029184 (31730 MB)
>> Event timer "LAPIC" quality 100
>> LAPIC: ipi_wait() us multiplier 29 (r 13818693 tsc 4018024582)
>> ACPI APIC Table: 
>> Package ID shift: 4
>> L3 cache ID shift: 3
>> L2 cache ID shift: 1
>> L1 cache ID shift: 0
>> Core ID shift: 0
>> INTR: Adding local APIC 1 as a target
>> INTR: Adding local APIC 2 as a target
>> INTR: Adding local APIC 3 as a target
>> INTR: Adding local APIC 4 as a target
>> INTR: Adding local APIC 5 as a target
>> INTR: Adding local APIC 6 as a target
>> INTR: Adding local APIC 7 as a target
>> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>> FreeBSD/SMP: 1 package(s) x 8 core(s)
>> Package HW ID = 0
>> Core HW ID = 0
>> CPU0 (BSP): APIC ID: 0
>> Core HW ID = 1
>> CPU1 (AP): APIC ID: 1
>> Core HW ID = 2
>> CPU2 (AP): APIC ID: 2
>> Core HW ID = 3
>> CPU3 (AP): APIC ID: 3
>> Core HW ID = 4
>> CPU4 (AP): APIC ID: 4
>> Core HW ID = 5
>> CPU5 (AP): APIC ID: 5
>> Core HW ID = 6
>> CPU6 (AP): APIC ID: 6
>> Core HW ID = 7
>> CPU7 (AP): APIC ID: 7
>> APIC: CPU 0 has ACPI ID 0
>> APIC: CPU 1 has ACPI ID 1
>> APIC: CPU 2 has ACPI ID 2
>> APIC: CPU 3 has ACPI ID 3
>> APIC: CPU 4 has ACPI ID 4
>> APIC: CPU 5 has ACPI ID 5
>> APIC: CPU 6 has ACPI ID 6
>> APIC: CPU 7 has ACPI ID 7
>> lapic0: MCE Thresholding ELVT unmasked
>> kernel trap 12 with interrupts disabled
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0x0
>> fault code  = supervisor write data, page not present
>> instruction pointer = 0x20:0x809b36ed
>> stack pointer   = 0x28:0x8130baa0
>> frame pointer   = 0x28:0x8130bad0
>> code segment= base 0x0, limit 0xf, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags= resume, IOPL = 0
>> current process = 0 ()
>> [ thread pid 0 tid 0 ]
>> Stopped at  _mca_init+0x55d:movl$0x1,(%rax,%rcx,1)
>> db> bt
>> Tracing pid 0 tid 0 td 0x810a9dc0
>> _mca_init() at _mca_init+0x55d/frame 0x8130bad0
>> mi_startup() at mi_startup+0x9c/frame 0x8130baf0
>> btext() at btext+0x2c
>> db>

-- 
Andriy Gapon


Re: major code change for .zfs

2017-02-20 Thread Andriy Gapon
On 23/08/2016 11:43, Andriy Gapon wrote:
> 
> Please review and test a change to .zfs code that is intended to make the code
> aligned with FreeBSD VFS and, as such, more stable:
> https://reviews.freebsd.org/D7421
> 
> The change removes two features.
> .zfs/shares is gone because it was unused on FreeBSD anyway.  We can restore
> that when we need it.
> An ability to take a snapshot by creating a directory under .zfs/snapshot is
> removed.  I hope that you didn't use it.  Please do not start using it now :-)
> Again, this feature can be restored with some work.
> The reason I removed it is that its companion features of destroying and
> renaming snapshots were already missing on FreeBSD, and properly implementing
> the feature required some more work.

This is a heads-up that I am going to commit the change.
If you have objections or concerns please speak up.
Thanks!

-- 
Andriy Gapon


confusing KTR_SCHED traces

2017-02-17 Thread Andriy Gapon

First, an example, three consecutive entries for the same thread (from top to
bottom):
KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep",
attributes: prio:84, wmesg:"-", lockname:"(null)"
KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning",
attributes: lockname:"sched lock 1"
KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running",
attributes: none

Any automatic analysis tool, including schedgraph.py, will assume that the thread
ends up in the running state.  In reality, of course, the thread is in the
sleeping state.
The confusing trace is a result of logging the thread's intention to switch out
in mi_switch() before calling sched_switch().  In ULE's sched_switch() we
acquire the TDQ lock (TDQ_LOCK), which could be contended.  In that case the
thread spins waiting for the lock to be released.  This is reported as the
"spinning" and then "running" states.
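
To illustrate the effect, here is a minimal sketch in plain C of a
"last record wins" analysis, which is roughly what a trace consumer does when
it takes the most recent state of a thread as its current state.  The names
(trace_event, last_state, ends_in and the two arrays) are made up for this
example and are not part of KTR or schedgraph.py.  The second array is only a
guess at what the records might look like if the switch-out record were logged
after the lock is taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One KTRGRAPH-style record; only the thread state matters here. */
struct trace_event {
	const char *state;
};

/* Naive analysis: the last logged state is taken as the current state. */
const char *
last_state(const struct trace_event *ev, size_t n)
{
	assert(ev != NULL && n > 0);
	return (ev[n - 1].state);
}

/* Convenience check: does the sequence end in the given state? */
int
ends_in(const struct trace_event *ev, size_t n, const char *state)
{
	return (strcmp(last_state(ev, n), state) == 0);
}

/* Order from the example: the intent to sleep is logged before the lock. */
const struct trace_event logged_before_lock[] = {
	{ "sleep" }, { "spinning" }, { "running" },
};

/* Hypothetical order if the switch-out record were logged after the lock. */
const struct trace_event logged_after_lock[] = {
	{ "spinning" }, { "running" }, { "sleep" },
};
```

With the first ordering the naive analysis concludes "running"; with the
second, hypothetical ordering it concludes "sleep", which matches what the
thread is really doing.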

I would like to fix that, but I am not sure how best to do it.
One idea is to move the mi_switch() trace closer to the cpu_switch() call,
similarly to the DTrace sched:::off-cpu and sched:::on-cpu probes.

Any suggestions are welcome.
Thanks!
-- 
Andriy Gapon


basic evdev setup

2017-02-15 Thread Andriy Gapon

Oleksandr,

at the moment the documentation for evdev on FreeBSD is very scarce, even if we
count wiki pages, informal howtos and blog posts.
So, I would like to ask for your help with a very basic evdev test setup.

The only input devices I have are a standard keyboard and a mouse with some
extra keys.  I would like to be able to use the keyboard and the mouse as usual
when in the console.  And I would like to be able to use the extra mouse keys in X.

What steps should I take to achieve that?
I already have evdev + EVDEV_SUPPORT on the kernel side in addition to the regular
keyboard and mouse drivers (atkbdc + ums).
I have also installed xf86-input-evdev.

Do I need any additional kernel evdev configuration via sysctl?
What should I add to xorg configuration to enable evdev for X?

Thank you!
-- 
Andriy Gapon


Re: panic on current during shutdown: panic: racct_adjust_resource: resource 4 usage < 0

2017-01-20 Thread Andriy Gapon
On 20/01/2017 02:09, Larry Rosenman wrote:
> Thu Jan 19 18:03:38 CST 2017
> 
> FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #13 r311997: Sat Jan
> 14 22:35:29 CST 2017 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  
> amd64
> 
> panic: racct_adjust_resource: resource 4 usage < 0
[snip]

Very interesting.
Could you please contribute this information to
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210315
?
These could be related issues.

> Unread portion of the kernel message buffer:
> <118>.
> <118>Terminated
> <118>Jan 19 17:54:50 192.168.200.11 last message repeated 13 times
> <118>Jan 19 17:54:59 borg syslogd: exiting on signal 15
> panic: racct_adjust_resource: resource 4 usage < 0
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe2eb7c18830
> vpanic() at vpanic+0x186/frame 0xfe2eb7c188b0
> kassert_panic() at kassert_panic+0x126/frame 0xfe2eb7c18920
> racct_adjust_resource() at racct_adjust_resource+0xca/frame 0xfe2eb7c18950
> racct_set_locked() at racct_set_locked+0xec/frame 0xfe2eb7c18990
> racct_set() at racct_set+0x54/frame 0xfe2eb7c189c0
> vmspace_exit() at vmspace_exit+0x147/frame 0xfe2eb7c18a00
> exit1() at exit1+0x56b/frame 0xfe2eb7c18a60
> sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe2eb7c18a70
> amd64_syscall() at amd64_syscall+0x2ea/frame 0xfe2eb7c18bf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe2eb7c18bf0
> --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8025c916a, rsp =
> 0x7fffebd8, rbp = 0x7fffebf0 ---
> Uptime: 4d4h19m38s
> Dumping 12670 out of 64463 
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
[snip]
> __curthread () at ./machine/pcpu.h:222
> 222 __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) #0  __curthread () at ./machine/pcpu.h:222
> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:318
> #2  0x80a2ffb5 in kern_reboot (howto=)
> at /usr/src/sys/kern/kern_shutdown.c:386
> #3  0x80a30590 in vpanic (fmt=, ap=0xfe2eb7c188f0)
> at /usr/src/sys/kern/kern_shutdown.c:779
> #4  0x80a303c6 in kassert_panic (
> fmt=0x813ee4fb "%s: resource %d usage < 0")
> at /usr/src/sys/kern/kern_shutdown.c:669
> #5  0x80a21eca in racct_adjust_resource (racct=0xf8001b7c00d0,
> resource=4, amount=) at /usr/src/sys/kern/kern_racct.c:528
> #6  0x80a21acc in racct_set_locked (p=0xf80055f41528,
> resource=, amount=0, force=0)
> at /usr/src/sys/kern/kern_racct.c:718
> #7  0x80a21994 in racct_set (p=0xf80055f41528, resource=4,
> amount=0) at /usr/src/sys/kern/kern_racct.c:741
> #8  0x80d0f8e7 in vmspace_container_reset (p=)
> at /usr/src/sys/vm/vm_map.c:311
> #9  vmspace_exit (td=) at /usr/src/sys/vm/vm_map.c:420
> #10 0x809f01ab in exit1 (td=, rval=,
> signo=) at /usr/src/sys/kern/kern_exit.c:399
> #11 0x809efc3d in sys_sys_exit (td=, uap=)
> at /usr/src/sys/kern/kern_exit.c:178
> #12 0x80e9a98a in syscallenter (td=0xf80055de6000,
> sa=)
> at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
> #13 amd64_syscall (td=0xf80055de6000, traced=0)
> at /usr/src/sys/amd64/amd64/trap.c:902
> #14 
> Can't read data for section '.eh_frame' in file '/'
> (kgdb)
> 
> vmcore IS available.
> 
> 


-- 
Andriy Gapon


Re: firewire panic

2016-11-14 Thread Andriy Gapon
On 14/11/2016 11:58, Gary Jennejohn wrote:
> On Sun, 13 Nov 2016 23:56:09 +0200
> Andriy Gapon <a...@freebsd.org> wrote:
> 
>> On 11/11/2016 14:25, Andriy Gapon wrote:
>>> panic: mutex sbp not owned at /usr/src/sys/dev/firewire/sbp.c:967
>>> cpuid = 2
>>> curthread: 0xf8000ada5000
>>> stack: 0xfe0504ded000 - 0xfe0504df1000
>>> stack pointer: 0xfe0504df0a00
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at 0x80420bbb = 
>>> db_trace_self_wrapper+0x2b/frame
>>> 0xfe0504df0930
>>> kdb_backtrace() at 0x80670359 = kdb_backtrace+0x39/frame 
>>> 0xfe0504df09e0
>>> vpanic() at 0x8063986c = vpanic+0x14c/frame 0xfe0504df0a20
>>> panic() at 0x806395b3 = panic+0x43/frame 0xfe0504df0a80
>>> __mtx_assert() at 0x8061c40d = __mtx_assert+0xed/frame 
>>> 0xfe0504df0ac0
>>> sbp_cam_scan_lun() at 0x80474667 = sbp_cam_scan_lun+0x37/frame
>>> 0xfe0504df0af0
>>> xpt_done_process() at 0x802aacfa = xpt_done_process+0x2da/frame
>>> 0xfe0504df0b30
>>> xpt_done_td() at 0x802ac2e5 = xpt_done_td+0xd5/frame 
>>> 0xfe0504df0b80  
>>
>> So, it's pretty obvious that the sbp mutex can not be held when
>> sbp_cam_scan_lun() is called.
>>
> 
> The code seems to assume that the scan_callout callout is still
> holding the mutex when sbp_cam_scan_lun() is entered.
> 
> Seems reasonable, since the man page claims that the callout routine
> keeps the mutex locked until the callout function, in this case that's
> sbp_cam_scan_target(), returns.  Since sbp_cam_scan_target() invokes
> xpt_action() with sbp_cam_scan_lun() as its callback, it seems like
> the assumption should be true.

The wrong assumption in your reasoning is that the callback is executed in the
same thread.
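
A toy model in plain C of why that matters, assuming only that the completion
callback runs in a different context from the one that submitted the request.
Threads are represented by integer ids and every name here (toy_mtx, toy_ccb,
toy_submit, toy_complete and so on) is invented for the illustration; none of
it is the real mutex(9), callout(9) or CAM API.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_OWNER (-1)

/* Toy mutex that only records which "thread" (an integer id) owns it. */
struct toy_mtx {
	int owner;
};

typedef bool (*toy_done_cb)(const struct toy_mtx *, int);

/* A queued request: its completion callback runs later, elsewhere. */
struct toy_ccb {
	toy_done_cb done;
	const struct toy_mtx *lock;
};

void
toy_lock(struct toy_mtx *m, int curthread)
{
	assert(m->owner == TOY_NO_OWNER);
	m->owner = curthread;
}

void
toy_unlock(struct toy_mtx *m, int curthread)
{
	assert(m->owner == curthread);
	m->owner = TOY_NO_OWNER;
}

bool
toy_owned(const struct toy_mtx *m, int curthread)
{
	return (m->owner == curthread);
}

/* Stand-in for the completion callback: does the caller own the lock? */
bool
toy_scan_lun_done(const struct toy_mtx *m, int curthread)
{
	return (toy_owned(m, curthread));
}

/* Stand-in for submitting a request: it is only queued, not completed. */
void
toy_submit(struct toy_ccb *ccb, const struct toy_mtx *m)
{
	ccb->done = toy_scan_lun_done;
	ccb->lock = m;
}

/* Stand-in for the completion thread invoking the callback. */
bool
toy_complete(const struct toy_ccb *ccb, int curthread)
{
	return (ccb->done(ccb->lock, curthread));
}
```

If thread 1 takes the lock and submits, and thread 2 later runs
toy_complete(), the ownership check fails even while thread 1 still holds the
lock.  An "owned" assertion in the callback can only hold if the callback
happens to run in the submitting thread.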

> Perhaps there's some asynchronous action happening with the
> firewire code which is releasing the mutex prematurely.
> 
> Or maybe the sbp used in sbp_cam_scan_lun() is wrong?  Dunno.


-- 
Andriy Gapon


Re: firewire panic

2016-11-13 Thread Andriy Gapon
On 11/11/2016 14:25, Andriy Gapon wrote:
> panic: mutex sbp not owned at /usr/src/sys/dev/firewire/sbp.c:967
> cpuid = 2
> curthread: 0xf8000ada5000
> stack: 0xfe0504ded000 - 0xfe0504df1000
> stack pointer: 0xfe0504df0a00
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x80420bbb = 
> db_trace_self_wrapper+0x2b/frame
> 0xfe0504df0930
> kdb_backtrace() at 0x80670359 = kdb_backtrace+0x39/frame 
> 0xfe0504df09e0
> vpanic() at 0x8063986c = vpanic+0x14c/frame 0xfe0504df0a20
> panic() at 0x806395b3 = panic+0x43/frame 0xfe0504df0a80
> __mtx_assert() at 0x8061c40d = __mtx_assert+0xed/frame 
> 0xfe0504df0ac0
> sbp_cam_scan_lun() at 0x80474667 = sbp_cam_scan_lun+0x37/frame
> 0xfe0504df0af0
> xpt_done_process() at 0x802aacfa = xpt_done_process+0x2da/frame
> 0xfe0504df0b30
> xpt_done_td() at 0x802ac2e5 = xpt_done_td+0xd5/frame 
> 0xfe0504df0b80

So, it's pretty obvious that the sbp mutex can not be held when
sbp_cam_scan_lun() is called.
After removing the assertion, just to move further, I do not get any panics and
can access the disk.
But I see many witness warnings.  Some examples:

bus_dma_tag_create with the following non-sleepable locks held:
exclusive sleep mutex sbp (sbp) r = 0 (0xf8000ff04f48) locked @
/usr/src/sys/modules/firewire/sbp/../../../dev/firewire/sbp.c:802
stack backtrace:
#0 0x80ab20a0 at witness_debugger+0x70
#1 0x80ab3387 at witness_warn+0x3d7
#2 0x810174dd at bus_dma_tag_create+0x3d
#3 0x83703402 at fwdma_malloc+0x82
#4 0x836ed759 at sbp_post_explore+0x849
#5 0x836f9e76 at fw_bus_probe_thread+0x906
#6 0x80a164c4 at fork_exit+0x84
#7 0x80ea970e at fork_trampoline+0xe

bus_dmamem_alloc with the following non-sleepable locks held:
exclusive sleep mutex sbp (sbp) r = 0 (0xf8000ff04f48) locked @
/usr/src/sys/modules/firewire/sbp/../../../dev/firewire/sbp.c:802
stack backtrace:
#0 0x80ab20a0 at witness_debugger+0x70
#1 0x80ab3387 at witness_warn+0x3d7
#2 0x810175c3 at bus_dmamem_alloc+0x33
#3 0x83703424 at fwdma_malloc+0xa4
#4 0x836ed759 at sbp_post_explore+0x849
#5 0x836f9e76 at fw_bus_probe_thread+0x906
#6 0x80a164c4 at fork_exit+0x84
#7 0x80ea970e at fork_trampoline+0xe

bus_dmamap_create with the following non-sleepable locks held:
exclusive sleep mutex sbp (sbp) r = 0 (0xf8000ff04f48) locked @
/usr/src/sys/modules/firewire/sbp/../../../dev/firewire/sbp.c:802
stack backtrace:
#0 0x80ab20a0 at witness_debugger+0x70
#1 0x80ab3387 at witness_warn+0x3d7
#2 0x8101755f at bus_dmamap_create+0x2f
#3 0x836ed7d2 at sbp_post_explore+0x8c2
#4 0x836f9e76 at fw_bus_probe_thread+0x906
#5 0x80a164c4 at fork_exit+0x84
#6 0x80ea970e at fork_trampoline+0xe

lock order reversal:
1st 0xf8000ff04f48 sbp (sbp) @ /usr/src/sys/kern/kern_mutex.c:220
2nd 0xfe0001b94870 firewire (firewire) @
/usr/src/sys/modules/firewire/firewire/../../../dev/firewire/firewire.c:302
stack backtrace:
#0 0x80ab20a0 at witness_debugger+0x70
#1 0x80ab1f94 at witness_checkorder+0xe54
#2 0x80a33854 at __mtx_lock_flags+0xa4
#3 0x836f6423 at fw_asyreq+0x2d3
#4 0x80a68f2c at softclock_call_cc+0x19c
#5 0x80a69327 at softclock+0x47
#6 0x80a18d76 at intr_event_execute_handlers+0x96
#7 0x80a193f6 at ithread_loop+0xa6
#8 0x80a164c4 at fork_exit+0x84
#9 0x80ea970e at fork_trampoline+0xe

lock order reversal:
1st 0xf8000ff04f48 sbp (sbp) @ /usr/src/sys/kern/kern_mutex.c:220
2nd 0xf8000a0b4460 CAM device lock (CAM device lock) @
/usr/src/sys/cam/scsi/scsi_xpt.c:2349
stack backtrace:
#0 0x80ab20a0 at witness_debugger+0x70
#1 0x80ab1f94 at witness_checkorder+0xe54
#2 0x80a33854 at __mtx_lock_flags+0xa4
#3 0x8031af12 at scsi_scan_lun+0x122
#4 0x836eef40 at sbp_cam_scan_target+0x100
#5 0x80a68f2c at softclock_call_cc+0x19c
#6 0x80a69327 at softclock+0x47
#7 0x80a18d76 at intr_event_execute_handlers+0x96
#8 0x80a193f6 at ithread_loop+0xa6
#9 0x80a164c4 at fork_exit+0x84
#10 0x80ea970e at fork_trampoline+0xe


-- 
Andriy Gapon


Re: firewire panic

2016-11-12 Thread Andriy Gapon
On 12/11/2016 11:34, Gary Jennejohn wrote:
> FYI also works on FreeBSD 12.0-CURRENT r308568 amd64.
> 
> Nov 12 10:24:31 ernst kernel: fwohci0: fwohci_intr_core: BUS reset
> Nov 12 10:24:31 ernst kernel: fwohci0: fwohci_intr_core: node_id=0x, 
> SelfID Count=3, non CYCLEMASTER mode
> Nov 12 10:24:31 ernst kernel: firewire0: 2 nodes, maxhop <= 1 cable IRM 
> irm(0)  (me)
> Nov 12 10:24:31 ernst kernel: firewire0: root node is not cycle master capable
> Nov 12 10:24:31 ernst kernel: firewire0: bus manager 0
> Nov 12 10:24:31 ernst kernel: fwohci0: too many cycles lost, no cycle master 
> present?
> Nov 12 10:24:31 ernst kernel: firewire0: split transaction timeout: tl=0x1 
> flag=0x04
> Nov 12 10:24:31 ernst kernel: send: dst=0x01 tl=0x01 rt=0 tcode=0x4 pri=0x0 
> src=0x000
> Nov 12 10:24:34 ernst kernel: fwohci0: fwohci_intr_core: BUS reset
> Nov 12 10:24:34 ernst kernel: fwohci0: fwohci_intr_core: node_id=0x0001, 
> SelfID Count=4, CYCLEMASTER mode
> Nov 12 10:24:34 ernst kernel: firewire0: 2 nodes, maxhop <= 1 cable IRM 
> irm(1)  (me)
> Nov 12 10:24:34 ernst kernel: firewire0: bus manager 1
> Nov 12 10:24:34 ernst kernel: sbp0: sbp_show_sdev_info: sbp0:0:0: ordered:1 
> type:0 EUI:0001a305ee4c node:0 speed:2 maxrec:8
> Nov 12 10:24:34 ernst kernel: sbp0: sbp_show_sdev_info: sbp0:0:0 'Genesys ' 
> '' ''
> Nov 12 10:24:35 ernst kernel: da0 at sbp0 bus 0 scbus6 target 0 lun 0
> Nov 12 10:24:35 ernst kernel: da0:  Fixed Direct Access SCSI device
> Nov 12 10:24:35 ernst kernel: da0: 50.000MB/s transfers
> Nov 12 10:24:35 ernst kernel: da0: 305245MB (625142448 512 byte sectors)
> Nov 12 10:24:35 ernst kernel: da0: quirks=0x2
> Nov 12 10:25:00 ernst kernel: fwohci0: fwohci_intr_core: BUS reset
> Nov 12 10:25:00 ernst kernel: fwohci0: fwohci_intr_core: node_id=0x, 
> SelfID Count=5, CYCLEMASTER mode
> Nov 12 10:25:00 ernst kernel: firewire0: 1 nodes, maxhop <= 0 cable IRM 
> irm(0)  (me)
> Nov 12 10:25:00 ernst kernel: firewire0: bus manager 0
> Nov 12 10:25:00 ernst kernel: da0 at sbp0 bus 0 scbus6 target 0 lun 0
> Nov 12 10:25:00 ernst kernel: da0:  detached
> Nov 12 10:25:00 ernst kernel: (da0:sbp0:0:0:0): Periph destroyed

Is this with INVARIANTS ?

-- 
Andriy Gapon


firewire panic

2016-11-11 Thread Andriy Gapon

Does anyone still use firewire or hack on its code?
I've recently tried to connect an external firewire HDD enclosure and got this:

Unread portion of the kernel message buffer:
lock order reversal:
 1st 0xf8002b0f2f48 sbp (sbp) @ /usr/src/sys/kern/kern_mutex.c:158
 2nd 0xf8003f86f460 CAM device lock (CAM device lock) @
/usr/src/sys/cam/scsi/scsi_xpt.c:2323
stack backtrace:
#0 0x8068d220 at witness_debugger+0x70
#1 0x8068cd81 at witness_checkorder+0x7a1
#2 0x8061bab8 at __mtx_lock_flags+0x98
#3 0x802b663d at scsi_scan_lun+0x11d
#4 0x802b51f7 at scsi_action+0x67
#5 0x802a756a at xpt_action+0x1a
#6 0x8047459e at sbp_cam_scan_target+0xce
#7 0x8064f856 at softclock_call_cc+0x2d6
#8 0x8064fbf7 at softclock+0x47
#9 0x80602190 at intr_event_execute_handlers+0xe0
#10 0x806029ec at ithread_execute_handlers+0x2c
#11 0x8060285b at ithread_loop+0x5b
#12 0x805ff72f at fork_exit+0xdf
#13 0x8082483e at fork_trampoline+0xe
lock order reversal:
panic: mutex sbp not owned at /usr/src/sys/dev/firewire/sbp.c:967
cpuid = 2
curthread: 0xf8000ada5000
stack: 0xfe0504ded000 - 0xfe0504df1000
stack pointer: 0xfe0504df0a00
KDB: stack backtrace:
db_trace_self_wrapper() at 0x80420bbb = db_trace_self_wrapper+0x2b/frame
0xfe0504df0930
kdb_backtrace() at 0x80670359 = kdb_backtrace+0x39/frame 
0xfe0504df09e0
vpanic() at 0x8063986c = vpanic+0x14c/frame 0xfe0504df0a20
panic() at 0x806395b3 = panic+0x43/frame 0xfe0504df0a80
__mtx_assert() at 0x8061c40d = __mtx_assert+0xed/frame 
0xfe0504df0ac0
sbp_cam_scan_lun() at 0x80474667 = sbp_cam_scan_lun+0x37/frame
0xfe0504df0af0
xpt_done_process() at 0x802aacfa = xpt_done_process+0x2da/frame
0xfe0504df0b30
xpt_done_td() at 0x802ac2e5 = xpt_done_td+0xd5/frame 0xfe0504df0b80
fork_exit() at 0x805ff72f = fork_exit+0xdf/frame 0xfe0504df0bf0
fork_trampoline() at 0x8082483e = fork_trampoline+0xe/frame
0xfe0504df0bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
-- 
Andriy Gapon


Re: SM bus ioctls incorrect in FreeBSD 11

2016-11-10 Thread Andriy Gapon
On 14/10/2016 18:51, Lewis Donzis wrote:
> Our opinion doesn’t count for much, but I like 2 or 4.  Option 1 would
> essentially obviate the entire purpose of changing the structure.  Option 2
> basically finishes the job and makes it work properly.  Option 3 is, as you
> say, unappealing.  I have no problem with Option 4, obviously we can change
> our code back to the old way, but assuming there was a good reason for this
> change in the first place, Option 2 seems more logical.
> 
> But whatever y’all decide is fine with us, we’ll just change code to match at
> the appropriate time.

Anyone interested in the issue, could you please take a look at this review?
https://reviews.freebsd.org/D8430

Thank you.

-- 
Andriy Gapon

hwpmc / amd panic when stopping pmcstat

2016-11-07 Thread Andriy Gapon

panic: [pmc,1473] pp_pmcval outside of expected range cpu=2 ri=17
pp_pmcval=fa529f5b pm_reloadcount=1

(kgdb) p pp->pp_pmcs[17].pp_pmc->pm_state
$2 = PMC_STATE_DELETED

Those are the interesting bits.  The counter is logically stopped and the value read
from the hardware is small (it becomes huge after "munging").  My theory is that, at
least on AMD processors, a counter keeps running after overflowing.
At the same time, amd_intr() takes an early way out if pm_state !=
PMC_STATE_RUNNING.  So, the counter is allowed to overflow while it's logically
stopped.  But that makes the assertion in pmc_process_csw_out() invalid.
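
A small arithmetic sketch of the theory, in plain C.  It assumes a 48-bit
up-counter that is loaded with the negated reload count and simply wraps and
keeps counting after an overflow.  The helper names and CTR_MASK are
illustrative only; they are not the macros from hwpmc_amd.h, and the range
check only approximates what the assertion tests.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width for this sketch: 48 bits. */
#define CTR_MASK ((UINT64_C(1) << 48) - 1)

/* Value loaded into the counter so that it overflows after 'reload' events. */
uint64_t
reload_to_raw(uint64_t reload)
{
	return ((0 - reload) & CTR_MASK);
}

/* "Munging": convert a raw reading back to the number of events remaining. */
uint64_t
raw_to_remaining(uint64_t raw)
{
	return ((0 - raw) & CTR_MASK);
}

/* Advance the raw counter, wrapping like a free-running hardware counter. */
uint64_t
count_events(uint64_t raw, uint64_t events)
{
	return ((raw + events) & CTR_MASK);
}

/* Approximation of the range check that the assertion performs. */
int
in_expected_range(uint64_t remaining, uint64_t reload)
{
	return (remaining <= reload);
}
```

With a reload count of 1, as in the panic message, the counter overflows on
the first event.  If it then counts five more events, the raw reading is 5 and
the munged value is 2^48 - 5, far outside the expected range.  A counter that
has not overflowed yet stays within the range.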

It seems that the following patch fixes the problem.
But I wonder if there is a better, perhaps hardware specific, fix.

Also, maybe the condition should be pm_state == PMC_STATE_RUNNING instead of
pm_state != PMC_STATE_DELETED.

diff --git a/sys/dev/hwpmc/hwpmc_mod.c b/sys/dev/hwpmc/hwpmc_mod.c
index 55dc499b1c40e..36bcccb8c27ac 100644
--- a/sys/dev/hwpmc/hwpmc_mod.c
+++ b/sys/dev/hwpmc/hwpmc_mod.c
@@ -1431,8 +1431,8 @@ pmc_process_csw_out(struct thread *td)
 * save the reading.
 */

-   if (pp != NULL && pp->pp_pmcs[ri].pp_pmc != NULL) {
-
+   if (pm->pm_state != PMC_STATE_DELETED && pp != NULL &&
+   pp->pp_pmcs[ri].pp_pmc != NULL) {
KASSERT(pm == pp->pp_pmcs[ri].pp_pmc,
("[pmc,%d] pm %p != pp_pmcs[%d] %p", __LINE__,
        pm, ri, pp->pp_pmcs[ri].pp_pmc));

-- 
Andriy Gapon


Re: pmcstat -T -P instructions -t $pid ==> NMI

2016-10-28 Thread Andriy Gapon
On 28/10/2016 16:01, Andriy Gapon wrote:
> I suspect that under the right conditions it's possible for wrmsr to cause a
> counter overflow, such that an interrupt (if enabled) is generated after wrmsr
> is executed, even if wrmsr disables the counter.
> 
> In amd_intr() we have this code:
> wrmsr(evsel, config & ~AMD_PMC_ENABLE);
> wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v));
> 
> /* Restart the counter if logging succeeded. */
> error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
> TRAPF_USERMODE(tf));
> if (error == 0)
> wrmsr(evsel, config | AMD_PMC_ENABLE);
> 
> I suspect that in the scenario above, if it is indeed possible, the last wrmsr
> would re-enable the counter that's supposed to be stopped.
> 
> I think that writing back the original value should be more correct, that is:
>   wrmsr(evsel, config);
> 
> I'll test if this change would help.

So, I have tried this change:

--- a/sys/dev/hwpmc/hwpmc_amd.c
+++ b/sys/dev/hwpmc/hwpmc_amd.c
@@ -577,6 +577,7 @@ amd_start_pmc(int cpu, int ri)

PMCDBG1(MDP,STA,2,"amd-start config=0x%x", config);

+   KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu"));
wrmsr(pd->pm_evsel, config);
return 0;
 }
@@ -613,6 +614,7 @@ amd_stop_pmc(int cpu, int ri)

/* turn off the PMC ENABLE bit */
config = pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE;
+   KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu"));
wrmsr(pd->pm_evsel, config);
return 0;
 }
@@ -676,6 +678,7 @@ amd_intr(int cpu, struct trapframe *tf)
perfctr = AMD_PMC_PERFCTR_0 + i;
v   = pm->pm_sc.pm_reloadcount;
config  = rdmsr(evsel);
+   PMCDBG1(MDP,INT,2, "enabled=%d", config & AMD_PMC_ENABLE);

KASSERT((config & ~AMD_PMC_ENABLE) ==
(pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE),
@@ -689,12 +692,13 @@ amd_intr(int cpu, struct trapframe *tf)
error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
TRAPF_USERMODE(tf));
if (error == 0)
-   wrmsr(evsel, config | AMD_PMC_ENABLE);
+   wrmsr(evsel, config);
}

atomic_add_int(retval ? &pmc_stats.pm_intr_processed :
&pmc_stats.pm_intr_ignored, 1);

+   PMCDBG1(MDP,INT,3, "reval=%d", retval);
return (retval);
 }



And I couldn't reproduce the problem with it.
Also, in the debug log I see the following, for instance:
315466  1  1068994822044  MDP:INT:3: reval=1
315465  1  1068994821176  MDP:INT:2: enabled=4194304
315464  1  1068994820930  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
315463  1  1068994796610  MDP:INT:3: reval=1
315462  1  1068994795833  MDP:INT:2: enabled=4194304
315461  1  1068994795589  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
315460  1  1068994771107  MDP:INT:3: reval=1
315459  1  1068994770176  MDP:INT:2: enabled=4194304
315458  1  1068994769933  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
315457  0  1068994766498  MDP:SWO:1: pc=0xf8001c6e0e00 pp=0x0 enable-msr=0
315456  0  1068994765449  CSW:SWO:1: cpu=0 proc=0xf80073767a50 (1655, pmcstat) pp=0x0
315455  1  1068994742201  MDP:INT:3: reval=1
315454  1  1068994739535  MDP:INT:2: enabled=4194304
315453  1  1068994739169  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
315452  1  1068994700076  MDP:SWI:1: pc=0xf8007318ea00 pp=0xf80021f56400 enable-msr=0
315451  1  1068994698957  MDP:STA:2: amd-start config=0x5300c0
315450  1  1068994698699  MDP:STA:1: amd-start cpu=1 ri=0
315449  1  1068994697513  MDP:WRI:1: amd-write cpu=1 ri=0 v=
315448  1  1068994696676  CSW:SWI:1: cpu=1 ri=17 new=65536
315447  1  1068994694784  MDP:CFG:1: cpu=1 ri=0 pm=0xf8001c6e0900
315446  1  1068994691210  CSW:SWI:1: cpu=1 proc=0xf8017b0d3a50 (1654, burnK7) pp=0xf80021f56400
315445  0  1068994674368  MDP:SWO:1: pc=0xf8001c6e0e00 pp=0xf80021f56400 enable-msr=0
315444  0  1068994674033  MDP:CFG:1: cpu=0 ri=0 pm=0x0
315443  0  1068994673597  CSW:SWO:1: cpu=0 ri=17 tmp=-3205 (samp)
315442  0  1068994673412  MDP:REA:2: amd-read (post-munge) id=0 -> 65536
315441  0  1068994673247  MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976645120
315440  0  1068994673006  MDP:REA:1: amd-read id=0 class=2
315439  0  1068994672443  MDP:INT:3: reval=1
315438  0  1068994670981  MDP:INT:2: enabled=0
315437  0  1068994670591  MDP:INT:1: cpu=0 tf=0x81d769d0 um=0
315436  0  1068994669599  MDP:STO:1: amd-stop ri=0
315435  0  1068994668389  CSW:SWO:1: cpu=0 proc=0xf8017b0d3a50 (1654, burnK7) pp=0xf80021f5

Re: pmcstat -T -P instructions -t $pid ==> NMI

2016-10-28 Thread Andriy Gapon
On 27/10/2016 16:20, Andriy Gapon wrote:
> 
> I observe a problem on a relatively recent, but not the latest, head.
> r306752 amd64 on AMD hardware.
> If I run
>   pmcstat -T -P instructions -t $pid
> with a pid of a busy userland process, then I shortly get a (stray) NMI.
> Apparently hwpmc does not recognize that NMI.

Because the problem was readily reproducible, I managed to gather some hwpmc
debug traces.  The following are last messages collected from a crash dump that
I made after getting an NMI on CPU#0:

 index  cpu  timestamp      trace
------  ---  -------------  -----
 97230  1    1089795818106  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97229  1    1089795792315  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97228  1    1089795767010  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97227  1    1089795741843  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97226  1    1089795716970  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97225  1    1089795691881  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97224  1    1089795666756  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97223  1    1089795641637  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97222  1    1089795616422  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97221  1    1089795590715  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97220  0    1089795590076  MDP:INT:1: cpu=0 tf=0x81d769d0 um=0
 97219  1    1089795560493  MDP:INT:1: cpu=1 tf=0xfe03c45ecf30 um=1
 97218  1    1089795520646  MDP:SWI:1: pc=0xf8002f355e00 pp=0xf80021f56400 enable-msr=0
 97217  1    1089795519677  MDP:STA:2: amd-start config=0x5300c0
 97216  1    1089795519420  MDP:STA:1: amd-start cpu=1 ri=0
 97215  1    1089795518154  MDP:WRI:1: amd-write cpu=1 ri=0 v=00fe
 97214  1    1089795517311  CSW:SWI:1: cpu=1 ri=17 new=65282
 97213  1    1089795514866  MDP:CFG:1: cpu=1 ri=0 pm=0xf8001c6e0500
 97212  1    1089795510620  CSW:SWI:1: cpu=1 proc=0xf8013b1fe000 (1666, burnK7) pp=0xf80021f56400
 97211  0    1089795494715  MDP:SWO:1: pc=0xf8001c6e0800 pp=0xf80021f56400 enable-msr=0
 97210  0    1089795494448  MDP:CFG:1: cpu=0 ri=0 pm=0x0
 97209  0    1089795494003  CSW:SWO:1: cpu=0 ri=17 tmp=-2839 (samp)
 97208  0    1089795493811  MDP:REA:2: amd-read (post-munge) id=0 -> 65282
 97207  0    1089795493644  MDP:REA:2: amd-read (pre-munge) id=0 -> 281474976645374
 97206  0    1089795493299  MDP:REA:1: amd-read id=0 class=2
 97205  0    1089795491453  MDP:INT:1: cpu=0 tf=0x81d769d0 um=0
 97204  0    1089795490487  MDP:STO:1: amd-stop ri=0
 97203  0    1089795489399  CSW:SWO:1: cpu=0 proc=0xf8013b1fe000 (1666, burnK7) pp=0xf80021f56400
 97202  0    1089795460401  MDP:INT:1: cpu=0 tf=0x81d769d0 um=1
 97201  0    1089795433574  MDP:INT:1: cpu=0 tf=0x81d769d0 um=1
 97200  0    1089795371727  MDP:INT:1: cpu=0 tf=0x81d769d0 um=0
 97199  0    1089795328556  MDP:INT:1: cpu=0 tf=0x81d769d0 um=1
 97198  0    1089795303542  MDP:INT:1: cpu=0 tf=0x81d769d0 um=1
 97197  0    1089795278288  MDP:INT:1: cpu=0 tf=0x81d769d0 um=1
 97196  0    1089795253226  MDP:INT:1: cpu=0 tf=0x81d769d0 um=1

I think that what we see here is the target process being migrated from CPU#0 to
CPU#1 and the corresponding reconfiguration of the performance counters on both
CPUs.

I interpret event #97220 to be the unclaimed NMI.
Events #97204 and #97205 look curious.  They seem like a possible "race" between
amd_stop_pmc() and amd_intr().  As I understand, amd_stop_pmc() is called from
the context switch code when the target process gets off CPU.
I suspect that under the right conditions it's possible for wrmsr to cause a
counter overflow, such that an interrupt (if enabled) is generated after wrmsr
is executed, even if wrmsr disables the counter.

In amd_intr() we have this code:
wrmsr(evsel, config & ~AMD_PMC_ENABLE);
wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v));

/* Restart the counter if logging succeeded. */
error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
TRAPF_USERMODE(tf));
if (error == 0)
wrmsr(evsel, config | AMD_PMC_ENABLE);

I suspect that in the scenario above, if it is indeed possible, the last wrmsr
would re-enable the counter that's supposed to be stopped.

I think that writing back the original value should be more correct, that is:
wrmsr(evsel, config);

I'll test if this change would help.
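
The suspected sequence can be modelled in user space.  The sketch below is
only an illustration under stated assumptions: the event-select MSR is replaced
by a plain variable, the enable bit is taken to be bit 22, and every sketch_*
name is hypothetical.  It compares forcing the enable bit on at the end of the
interrupt handler with writing back the value that was read on entry.

```c
#include <stdint.h>

/* Assumed position of the enable bit in the event-select value (bit 22). */
#define SKETCH_PMC_ENABLE	(UINT32_C(1) << 22)

/* Simulated event-select MSR for one counter (stand-in for rdmsr/wrmsr). */
static uint32_t sketch_evsel;

/* Models amd_stop_pmc(): clear the enable bit. */
static void
sketch_stop(void)
{
	sketch_evsel &= ~SKETCH_PMC_ENABLE;
}

/*
 * Models the tail of amd_intr().  "config" is the value read on entry.
 * restore_original == 0: force the enable bit on (the existing code).
 * restore_original != 0: write back what was read (the proposed change).
 */
static void
sketch_intr(int restore_original)
{
	uint32_t config = sketch_evsel;

	sketch_evsel = config & ~SKETCH_PMC_ENABLE;	/* stop while handling */
	if (restore_original)
		sketch_evsel = config;
	else
		sketch_evsel = config | SKETCH_PMC_ENABLE;
}

/* Non-zero if the counter is enabled after a stop followed by a late interrupt. */
static int
sketch_enabled_after_late_intr(int restore_original)
{
	sketch_evsel = UINT32_C(0x5300c0);	/* running, as in "amd-start config" */
	sketch_stop();
	sketch_intr(restore_original);
	return ((sketch_evsel & SKETCH_PMC_ENABLE) != 0);
}
```

In this model sketch_enabled_after_late_intr(0) reports the counter as enabled
again even though it was stopped, while sketch_enabled_after_late_intr(1)
leaves it stopped.  A counter that was still running when the interrupt
arrived is restarted in both variants.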

-- 
Andriy Gapon


pmcstat -T -P instructions -t $pid ==> NMI

2016-10-27 Thread Andriy Gapon

I observe a problem on a relatively recent, but not the latest, head.
r306752 amd64 on AMD hardware.
If I run
pmcstat -T -P instructions -t $pid
with a pid of a busy userland process, then I shortly get a (stray) NMI.
Apparently hwpmc does not recognize that NMI.

Any suggestions, help, me-toos?
Thank you!
-- 
Andriy Gapon


Re: SM bus ioctls incorrect in FreeBSD 11

2016-10-14 Thread Andriy Gapon
On 14/10/2016 18:11, Michael Gmelin wrote:
> For some history on these changes, please see also [1] and [2] (there
> were a few discussions and the revision was bumped, I also tried to
> get some attention, but not enough it seems).
> 
> Given your recent changes to iicbus in HEAD, I think it would be best to
> MFC those and go with Option 4 or, if that's too drastic, go with
> Option 1.

I am leaning towards this approach as well.

> Thanks for cleaning after me.

You asked for a discussion and reviews.
I can not recall what I was doing at that time, but I completely ignored the
development and for that I can only blame myself.

> [1]https://lists.freebsd.org/pipermail/freebsd-arch/2015-March/016972.html
> [2]https://lists.freebsd.org/pipermail/freebsd-arch/2015-May/017157.html

I also agree that having a thin library on top of the ioctl would be a
convenience.

-- 
Andriy Gapon


Re: SM bus ioctls incorrect in FreeBSD 11

2016-10-14 Thread Andriy Gapon
On 14/10/2016 00:39, Lewis Donzis wrote:
> After upgrading to FreeBSD 11.0 and changing source code to use the new
> version of “struct smbcmd”, some commands are not working as documented,
> specifically those that read data.
> 
> As an example, SMB_READW is documented as returning the word read from the
> device in rdata.word.  However, this doesn’t happen, I think because the
> ioctl request value is defined using _IOW(), so the kernel doesn’t copy the
> data it read back out.
> 
> In prior versions, the structure had only a pointer to the data, and the
> smb.c code used copyout() to transfer the data back to userland.
> 
> As a temporary work-around, we added code to set rbuf to point to rdata.word
> and rcount to two.

Lewis,

thank you for the report.  This is a bug indeed and your analysis is correct.
Could you please open a bugzilla issue for the bug?
https://bugs.freebsd.org/bugzilla/

Looking at ports commit 385155
https://svnweb.freebsd.org/ports/head/sysutils/xmbmon/files/patch-getMB-smb_ioctl.c?r1=385155&r2=385154&pathrev=385155
I see that it also used the approach that you use as a workaround.
And that port commit is by Michael Gmelin who made the change to smb.h in
r281985 https://svnweb.freebsd.org/base?view=revision&revision=281985
So, I am not sure if the documented approach was known to not work.

The src change is described as "Expand SMBUS API ...", but in fact it also
_changed_ the existing ioctls.  And both binary compatibility and programming
compatibility were broken because of how struct smbcmd was changed.
In FreeBSD we try to not do that without a very strong reason, but alas.
And, as you report, the change was not done entirely correctly.

I see several possibilities now.

Option 1.  Change the documentation to reflect the actual behavior.
In this case data.rdata will remain unusable and unused.  No interface changes.

Option 2. Redefine SMB_READB, SMB_READW and SMB_PCALL ioctls using _IOWR, so
that data.rdata could be returned from kernel.  This seems like a proper fix,
but it is another binary level incompatibility.

Option 3.  Use a horrible hack to discover a userland address of smbcmd and
explicitly copyout to data.rdata.  No interface incompatibilities, but it will
be a horrible hack.  Besides, not sure how feasible it is.

Option 4.  Revert smb ioctl changes to what they used to be before r281985.
Personally, I would prefer this approach.  But now that the new interface is in
11.0, it means another interface change just like Option 2.

I would like to hear other developers' opinions about this situation.
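
For readers less familiar with the ioctl direction encoding, the difference
between the current behavior and Option 2 can be sketched in user space.
Everything below is a simplified model with hypothetical names: the SKETCH_DIR_*
flags stand in for the direction bits encoded by _IOW() and _IOWR(), and struct
sketch_smbcmd keeps only the fields needed for the illustration.  It is not the
real struct smbcmd or the real kernel ioctl path.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical direction flags, standing in for the _IOW/_IOWR encoding. */
#define SKETCH_DIR_IN	0x1	/* copy argument from user to kernel */
#define SKETCH_DIR_OUT	0x2	/* copy argument from kernel back to user */

/* Simplified stand-in for struct smbcmd; only rdata.word is modelled. */
struct sketch_smbcmd {
	uint8_t	slave;
	uint8_t	cmd;
	union {
		uint8_t		byte;
		uint16_t	word;
	} rdata;
};

/* Pretend driver handler: "reads" a word from the device into rdata.word. */
static void
sketch_handler(struct sketch_smbcmd *kcmd)
{
	kcmd->rdata.word = 0xbeef;
}

/*
 * Pretend generic ioctl layer: the handler works on a kernel copy, and the
 * copy is written back to the caller only if the request has the OUT flag.
 */
static void
sketch_ioctl(int dir, struct sketch_smbcmd *ucmd)
{
	struct sketch_smbcmd kcmd;

	memset(&kcmd, 0, sizeof(kcmd));
	if (dir & SKETCH_DIR_IN)
		memcpy(&kcmd, ucmd, sizeof(kcmd));
	sketch_handler(&kcmd);
	if (dir & SKETCH_DIR_OUT)
		memcpy(ucmd, &kcmd, sizeof(kcmd));
}

/* Returns the word the caller sees after an SMB_READW-like request. */
static uint16_t
sketch_readw(int dir)
{
	struct sketch_smbcmd cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.slave = 0x88;
	sketch_ioctl(dir, &cmd);
	return (cmd.rdata.word);
}
```

With only the IN flag (the _IOW-like case) the caller's rdata.word stays at
zero; with IN and OUT (the _IOWR-like case) the value set by the handler
becomes visible to the caller.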

P.S.
Two changes that I want to do no matter which course of action we select are:
- revert SMB_MAXBLOCKSIZE to 32
- remove SMB_TRANS as it does not map to anything defined by the SMBus
  specification and it can not be implemented for most, if not all,
  SMBus controllers

-- 
Andriy Gapon

install header files required for development with libzfs_core

2016-10-12 Thread Andriy Gapon

JFYI.  I bumped __FreeBSD_version to 1200013 to mark this change.

-------- Forwarded Message --------
Subject: svn commit: r307131 - head/include
Date: Wed, 12 Oct 2016 07:08:32 +0000 (UTC)
From: Andriy Gapon <a...@freebsd.org>
To: src-committ...@freebsd.org, svn-src-...@freebsd.org,
svn-src-h...@freebsd.org

Author: avg
Date: Wed Oct 12 07:08:32 2016
New Revision: 307131
URL: https://svnweb.freebsd.org/changeset/base/307131

Log:
  install header files required for development with libzfs_core

  libzfs_core provides a rather limited but committed (stable) interface
  for working with ZFS.  We install libzfs_core shared library but we do
  not install header files required for developing programs that use
  the library.  This change is to install the required header files
  libzfs_core.h, libnvpair.h and sys/nvpair.h.

  The headers are installed into the same locations as on illumos.

  Reviewed by:	mav, markj
  Differential Revision: https://reviews.freebsd.org/D8005

Modified:
  head/include/Makefile

Modified: head/include/Makefile
==
--- head/include/Makefile   Wed Oct 12 06:58:01 2016(r307130)
+++ head/include/Makefile   Wed Oct 12 07:08:32 2016(r307131)
@@ -237,6 +237,17 @@ copies: .PHONY .META
cd ${.CURDIR}/../sys/teken; \
${INSTALL} -C ${TAG_ARGS} -o ${BINOWN} -g ${BINGRP} -m 444 teken.h \
${DESTDIR}${INCLUDEDIR}/teken
+.if ${MK_CDDL} != "no"
+   cd ${.CURDIR}/../cddl/contrib/opensolaris/lib/libzfs_core/common; \
+   ${INSTALL} -C ${TAG_ARGS} -o ${BINOWN} -g ${BINGRP} -m 444 libzfs_core.h \
+   ${DESTDIR}${INCLUDEDIR}
+   cd ${.CURDIR}/../cddl/contrib/opensolaris/lib/libnvpair; \
+   ${INSTALL} -C ${TAG_ARGS} -o ${BINOWN} -g ${BINGRP} -m 444 libnvpair.h \
+   ${DESTDIR}${INCLUDEDIR}
+   cd ${.CURDIR}/../sys/cddl/contrib/opensolaris/uts/common/sys; \
+   ${INSTALL} -C ${TAG_ARGS} -o ${BINOWN} -g ${BINGRP} -m 444 nvpair.h \
+   ${DESTDIR}${INCLUDEDIR}/sys
+.endif
  symlinks: .PHONY .META
@${ECHO} "Setting up symlinks to kernel source tree..."



Re: [request for testing] isl, cyapa on chromebooks

2016-10-12 Thread Andriy Gapon
On 12/10/2016 09:21, Andriy Gapon wrote:
> On 12/10/2016 07:03, Warner Losh wrote:
>> I think I can do the device table mechanism if Andriy isn't up for it.
> 
> That would be great, thank you!
> 

Meanwhile, I've added a "stop-gap" version of 'chromebook_platform' driver here:
https://reviews.freebsd.org/D8172
A full patch can be downloaded from the review request.

Could you please test if it works?

All hints should be removed and the new module should be loaded in addition to
other modules.  It should not matter in which order the modules are loaded.
Could you please test if loading chromebook_platform before and after isl and
cyapa works the same?
It's required to either reboot or reload iicbus between the tests, so that
previously added devices are not re-used.

Thanks!

P.S.
Not sure if the name is good, it's certainly verbose.
-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-12 Thread Andriy Gapon
On 12/10/2016 07:03, Warner Losh wrote:
> I think I can do the device table mechanism if Andriy isn't up for it.

That would be great, thank you!

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-10 Thread Andriy Gapon
On 10/10/2016 21:45, Michael Gmelin wrote:
> I see three tasks here:
> - Andriy finishes his change, moving things from smbus to iicbus, adding some
>   workaround to keep the user experience like it is
> - Someone else implements the device table mechanism for auto detection
> - Someone else ports HID over I2C to allow implementing drivers for devices
>   like the elan touchpad Matthias is referring to
> 
> Makes sense?

It does to me.
Also, I can suggest another task related to SMBus / I2C.

Looking at the code in the Linux chromeos_laptop driver I see that on some
models some sensors are actually attached to SMBus rather than to I2C.  And, for
example, cyapa can be attached to either bus.  But there is a quirk.  cyapa
won't work over a standard SMBus, it needs some extensions that are typically
provided by Intel chipsets.  I specifically mean the so called "I2C Block Read"
and the transaction that results from the Block Write command when the I2C bit
is set in the SMBus controller's configuration register.  Neither of these modes
is supported by our ichsmb(4) driver.  But on Linux they are both supported and
exposed as I2C_SMBUS_I2C_BLOCK_DATA transaction type.

For one reference please see Mobile 4th Generation Intel® Core™
Processor Family I/O Datasheet, section 5.21.1.1.
And, just in case, ig4(4) is about the controllers described in section 5.22 of
that document.

Perhaps, I2C_SMBUS_I2C_BLOCK_DATA served as an inspiration (and perhaps a source
of confusion) for Matt when he added smbus_trans().

Right now I do not have any good suggestion on how to expose that 90% SMBus, 10%
I2C functionality in the FreeBSD model.

-- 
Andriy Gapon

Re: [request for testing] isl, cyapa on chromebooks

2016-10-10 Thread Andriy Gapon
On 10/10/2016 18:26, Warner Losh wrote:
> I see no reason not to start the table right away based on
> smbios.sys.product and other criteria. I don't think we need all the
> matches that Linux uses, but we can expand the table if we find it so.
> Why have a stop gap that's a table that we kludge together when the
> real table is of comparable difficulty and wouldn't need to be
> reworked.

One simple reason for me personally.  I do not have the hardware and I am not
particularly interested in it.  I am interested only in cleaning up the smbus
interface and moving ig4iic to iicbus.  I want to get done with that as quickly
as possible and my goal is just that the result is not worse than the current
code.
I am sure that people who are more interested than me can make the code much
better.

> On Mon, Oct 10, 2016 at 5:46 AM, Michael Gmelin <gre...@freebsd.org> wrote:
>> On Mon, 10 Oct 2016 14:35:22 +0300
>> Andriy Gapon <a...@freebsd.org> wrote:
>>
>>> On 09/10/2016 23:22, Warner Losh wrote:
>>>> There seems to be enough information present in the smbios data to
>>>> know what devices are at what addresses. Perhaps we should use it as
>>>> much as possible in well controlled situations to move this
>>>> knowledge into the OS.
>>>
>>> So, I was thinking about maybe doing something like this to preserve
>>> the status quo, to avoid requiring manual hints and to lay a
>>> foundation for the proper Chromebook I2C slave discovery:
>>>
[snip]


-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-10 Thread Andriy Gapon
On 09/10/2016 23:22, Warner Losh wrote:
> There seems to be enough information present in the smbios data to
> know what devices are at what addresses. Perhaps we should use it as
> much as possible in well controlled situations to move this knowledge
> into the OS.

So, I was thinking about maybe doing something like this to preserve the status
quo, to avoid requiring manual hints and to lay a foundation for the proper
Chromebook I2C slave discovery:


static struct {
	uint32_t	ctlrid;		/* PCI device ID of the controller */
	const char	*name;		/* driver name of the slave device */
	u_int		addr;		/* slave address on the bus */
} slaves[] = {
	{ 0x9c628086,	"isl",		0x88 },
	{ 0x9c628086,	"cyapa",	0xce },
};

static void
chromebook_i2c_identify(driver_t *driver, device_t bus)
{
	device_t controller;
	device_t child;
	int i;

	/*
	 * A stop gap approach to preserve the status quo.
	 * A more intelligent approach is required to correctly
	 * identify a machine model and hardware available on it.
	 * For instance, DMI could be used.
	 * See http://lxr.free-electrons.com/source/drivers/platform/chrome/chromeos_laptop.c
	 */
	controller = device_get_parent(bus);
	if (strcmp(device_get_name(controller), "ig4iic") != 0)
		return;

	for (i = 0; i < nitems(slaves); i++) {
		/* Skip slaves that are already present on this bus. */
		if (device_find_child(bus, slaves[i].name, -1) != NULL)
			continue;
		/* Skip slaves that belong behind a different controller. */
		if (slaves[i].ctlrid != pci_get_devid(controller))
			continue;
		child = BUS_ADD_CHILD(bus, 0, slaves[i].name, -1);
		if (child != NULL)
			iicbus_set_addr(child, slaves[i].addr);
	}
}

static device_method_t chromebook_i2c_methods[] = {
	DEVMETHOD(device_identify,	chromebook_i2c_identify),
	{ 0, 0 }
};

static driver_t chromebook_i2c_driver = {
	"chromebook_i2c",
	chromebook_i2c_methods,
	0	/* no softc */
};

static devclass_t chromebook_i2c_devclass;

DRIVER_MODULE(chromebook_i2c, iicbus, chromebook_i2c_driver,
    chromebook_i2c_devclass, 0, 0);
MODULE_DEPEND(chromebook_i2c, iicbus, IICBUS_MINVER, IICBUS_PREFVER,
    IICBUS_MAXVER);
MODULE_VERSION(chromebook_i2c, 1);

The idea is that this is a driver that listens for new iicbus-es and adds isl
and cyapa devices to a bus if some criteria are met.

-- 
Andriy Gapon


vtterm_cngrab is broken on kms-enabled systems when entering kdb

2016-10-09 Thread Andriy Gapon

JFYI,

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213334

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [request for testing] isl, cyapa on chromebooks

2016-10-09 Thread Andriy Gapon
On 09/10/2016 09:36, Matthias Apitz wrote:
> On Saturday, October 08, 2016 at 10:17:07PM +0300, Andriy Gapon wrote:
> 
>> On 08/10/2016 21:07, Matthias Apitz wrote:
>>> I have now produced a memstick with (unpatched) r306769 of October 6. It 
>>> boots
>>> fine in my Acer C720 Chromebook and the moused is working fine with the
>>> cyapa(4) driver. I will apply tomorrow the above v4 patch or is there
>>> anything newer? And will test/report.
>>
>> v4 is the latest.  Thanks!
> 
> The patch applies cleanly, the 'make buildkernel' does fine and system
> boots, but cyapa(4) can not bring the device out of bootstrap. The verbose 
> dmesg
> is here http://www.unixarea.de/dmesg-00.txt 
> 
> And yes, I have in /boot/device.hints:

Well, comparing the hints and the boot message, you have exactly the problem
that I feared many people would have until we add auto-probing to isl and cyapa.

Basically it's easy to connect the dots once you know what they are.  You hint
that isl and cyapa should be on iicbus0 or iicbus1.  But iicbus0 is connected to
iicbb0 which is intel_iicbb0 on drmn0, which is a video card.  That is clearly
wrong.  And a similar thing with iicbus1:  on intel_gmbus0.
But later you have:
iicbus14:  on ig4iic0
iicbus15:  on ig4iic1
So, in your case, and with that probing order (hopefully it does not change from
boot to boot), you should use those two buses to look for isl and cyapa.

Examining the output of devinfo -v -r should be even easier than going through
dmesg.
Hope this helps.

> ...
> # The change moves the drivers from the SMBus to the I2C bus and as such some
> # configuration changes are required.
> # Namely, you will now need iicbus driver either in the kernel configuration or
> # as a module.  For now the smbus driver is also required.
> # You also need to add some entries to /boot/device.hints:
> # 
> hint.isl.0.at="iicbus0"
> hint.isl.0.addr=0x88
> hint.isl.1.at="iicbus1"
> hint.isl.1.addr=0x88
> hint.cyapa.0.at="iicbus0"
> hint.cyapa.0.addr=0xce
> hint.cyapa.1.at="iicbus1"
> hint.cyapa.1.addr=0xce
> # 
> # The hints are required because auto-probing (either via the bus enumeration
> # or self-identification) is disabled for now for safety reasons.
> # Also, as I understand, the Intel chipset used in the supported Chromebooks
> # provides two i2c buses (possibly in addition to an smbus) and I am not sure
> # on which of the i2c buses the devices reside.
> 
> Please let me know what to check/debug.
> 
>   matthias
> 


-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: [request for testing] isl, cyapa on chromebooks

2016-10-08 Thread Andriy Gapon
On 08/10/2016 21:07, Matthias Apitz wrote:
> I have now produced a memstick with (unpatched) r306769 of October 6. It boots
> fine in my Acer C720 Chromebook and the moused is working fine with the
> cyapa(4) driver. I will apply tomorrow the above v4 patch or is there
> anything newer? And will test/report.

v4 is the latest.  Thanks!

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-06 Thread Andriy Gapon
On 06/10/2016 11:10, Michael Gmelin wrote:
> 
> 
>> On 06 Oct 2016, at 09:48, Andriy Gapon <a...@freebsd.org> wrote:
>>
>>> On 06/10/2016 08:37, Andriy Gapon wrote:
>>> The more testing the better!
>>
>> Based on Michael's results I've uploaded a new version:
>> https://people.freebsd.org/~avg/ig4-i2c.v4.diff
>>
>>
> 
> Good news. Applying the last two fixes manually, isl is working now.

Great!

> I'll rebuild a kernel using the complete v4 patch on my second machine later 
> today to test the modified cyapa driver.

Thank you again!

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-06 Thread Andriy Gapon
On 06/10/2016 08:37, Andriy Gapon wrote:
> The more testing the better!

Based on Michael's results I've uploaded a new version:
https://people.freebsd.org/~avg/ig4-i2c.v4.diff

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-06 Thread Andriy Gapon
On 06/10/2016 10:25, Andriy Gapon wrote:
> On 06/10/2016 10:08, Michael Gmelin wrote:
>>
>>
>>> On 05 Oct 2016, at 15:01, Andriy Gapon <a...@freebsd.org> wrote:
>>>
>>>> On 05/10/2016 14:19, Michael Gmelin wrote:
>>>>
>>>> ig4iic_start is called, but iicbus_hinted_child, isl_probe, iicbus_probe 
>>>> and
>>>> iicbus_attach are not.
>>>
>>> Thank you!
>>> Now I think I see where I made a silly mistake.
>>> Please try an updated version of the patch from here
>>> https://people.freebsd.org/~avg/ig4-i2c.v3.diff
>>> It contains a fix and some cosmetic changes on top of the previous patch.
>>>
>>
>> Isl attaches cleanly on iicbus1 now, but it doesn't appear to function (all 
>> inputs, like  dev.isl.ir etc, are stuck at 0).
> 
> At least some progress...
> Anything interesting in logs?
> 
> Oh! and I've just spotted a typo in isl.c: the last call to isl_read_byte() in
> isl_read_sensor() should have REG_DATA2 (not REG_DATA1 again).

And another, more severe typo :-(
In isl_read_byte we should pass both messages to the bus:
return (iicbus_transfer(dev, msgs, 2));
That is, s/1/2/.
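
Why the message count matters can be shown with a toy model.  The sketch below
assumes the usual I2C register-read pattern of one write message (the register
number) followed by one read message (the data).  All sketch_* names, the fake
register file and the flag values are hypothetical; they are only loosely
modelled on struct iic_msg and iicbus_transfer() and are not the isl driver
code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_M_WR	0x0	/* write message */
#define SKETCH_M_RD	0x1	/* read message */

/* Hypothetical message descriptor, loosely modelled on struct iic_msg. */
struct sketch_msg {
	uint16_t	flags;
	uint16_t	len;
	uint8_t		*buf;
};

/* Fake device: 256 byte-wide registers and a current register pointer. */
static uint8_t sketch_regs[256];
static uint8_t sketch_regptr;

/* Fake bus transfer: processes exactly nmsgs messages, in order. */
static int
sketch_transfer(struct sketch_msg *msgs, size_t nmsgs)
{
	size_t i;
	uint16_t j;

	for (i = 0; i < nmsgs; i++) {
		if (msgs[i].flags & SKETCH_M_RD) {
			for (j = 0; j < msgs[i].len; j++)
				msgs[i].buf[j] = sketch_regs[sketch_regptr++];
		} else if (msgs[i].len > 0) {
			sketch_regptr = msgs[i].buf[0];	/* select register */
		}
	}
	return (0);
}

/* Register read: write the register number, then read one byte back. */
static uint8_t
sketch_read_byte(uint8_t reg, size_t nmsgs)
{
	uint8_t val = 0;
	struct sketch_msg msgs[2] = {
		{ SKETCH_M_WR, 1, &reg },
		{ SKETCH_M_RD, 1, &val },
	};

	(void)sketch_transfer(msgs, nmsgs);
	return (val);
}
```

If only the first message is passed, the register is selected but never read,
so the caller always gets the initial zero back.  That would be consistent
with the reported symptom of all inputs being stuck at 0.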

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-06 Thread Andriy Gapon
On 06/10/2016 10:08, Michael Gmelin wrote:
> 
> 
>> On 05 Oct 2016, at 15:01, Andriy Gapon <a...@freebsd.org> wrote:
>>
>>> On 05/10/2016 14:19, Michael Gmelin wrote:
>>>
>>> ig4iic_start is called, but iicbus_hinted_child, isl_probe, iicbus_probe and
>>> iicbus_attach are not.
>>
>> Thank you!
>> Now I think I see where I made a silly mistake.
>> Please try an updated version of the patch from here
>> https://people.freebsd.org/~avg/ig4-i2c.v3.diff
>> It contains a fix and some cosmetic changes on top of the previous patch.
>>
> 
> Isl attaches cleanly on iicbus1 now, but it doesn't appear to function (all 
> inputs, like  dev.isl.ir etc, are stuck at 0).

At least some progress...
Anything interesting in logs?

Oh! and I've just spotted a typo in isl.c: the last call to isl_read_byte() in
isl_read_sensor() should have REG_DATA2 (not REG_DATA1 again).

Thank you for testing!

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-06 Thread Andriy Gapon
On 06/10/2016 07:47, Matthias Apitz wrote:
> On Wednesday, October 05, 2016 at 04:01:25PM +0300, Andriy Gapon wrote:
> 
>> On 05/10/2016 14:19, Michael Gmelin wrote:
>>>
>>> ig4iic_start is called, but iicbus_hinted_child, isl_probe, iicbus_probe and
>>> iicbus_attach are not.
>>
>> Thank you!
>> Now I think I see where I made a silly mistake.
>> Please try an updated version of the patch from here
>> https://people.freebsd.org/~avg/ig4-i2c.v3.diff
>> It contains a fix and some cosmetic changes on top of the previous patch.
> 
> Hi Andriy,
> 
> I have had an Acer C720 too for around two years, and it works fine,
> thanks to Michael, with the cyapa chip. My C720 is in heavy daily
> use, currently running r292778. Should I apply the above patch, or do I
> need to update to a more recent CURRENT first?

I think that the patch should apply.
But if it doesn't...

> I have a second C720
> where I could do such a test/update more easily, but it is currently not in
> my hands and has, after a repair at Acer, an Elan touch pad.
> But I could prepare a USB key with 12-CURRENT, just for testing.

The more testing the better!

> Btw: What is the reason for this change of ig4/i2c/cyapa driver?

The reason is that the hardware that ig4 handles is really an I2C controller.
Moreover, isl and cyapa need to issue I2C commands that cannot be mapped to
SMBus commands to talk to their hardware.
I am not sure why Matt Dillon (who I believe is the original author of the code)
chose to use smbus(4) instead of iicbus(4).  To make that work he had to
"extend" smbus(4) with an smbus_trans() method, which does not really map to
anything defined by the SMBus specification and which cannot be implemented on
any real, pure SMBus controller (like intpm or ichsmb).  The closest command
that SMBus supports is 'Block write - block read process call', but it's not
quite the same.

> Last thing: I propose to remove freebsd-mobile@ from the thread.

Okay.  I just was not sure where I can find FreeBSD Chromebook owners.

-- 
Andriy Gapon

iicsmb

2016-10-05 Thread Andriy Gapon

Does anyone use iicsmb driver for any practical purposes?
Or more broadly, does anyone have a system with an I2C controller behind which
SMBus-compatible slaves are known to exist?

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-05 Thread Andriy Gapon
On 05/10/2016 14:19, Michael Gmelin wrote:
> 
> ig4iic_start is called, but iicbus_hinted_child, isl_probe, iicbus_probe and
> iicbus_attach are not.

Thank you!
Now I think I see where I made a silly mistake.
Please try an updated version of the patch from here
https://people.freebsd.org/~avg/ig4-i2c.v3.diff
It contains a fix and some cosmetic changes on top of the previous patch.

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-04 Thread Andriy Gapon
On 05/10/2016 01:48, Michael Gmelin wrote:
> Double-checked the hints, it's all ok.
> 
> Please find a more verbose log file of loading the kernel modules here:
> 
> https://people.freebsd.org/~grembo/c720-20161105.log

Unfortunately this doesn't provide any new insights.

Could you please add some printf-s to iicbus_hinted_child() in
sys/dev/iicbus/iicbus.c to see whether the isl devices are really added via the
hints and what their properties are?
Also, some printf-s to isl_probe() to see if it gets called and where it fails.
And, just in case, to ig4iic_start() to see if it gets called.

Thank you!

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-04 Thread Andriy Gapon
On 04/10/2016 12:46, Michael Gmelin wrote:
> iicbus(0|1) actually show up in devinfo -v, but nothing else works.
> 
> You can find a log file and various outputs (dmesg, devinfo etc) here:
> 
> https://people.freebsd.org/~grembo/c720-20161104.log

Thank you.
Could you please double-check that device.hints contains the necessary hints?
Could you also set debug.bootverbose=1 before kldload ig4 and kldload isl and
show me any new log messages that appear after doing kldload?

-- 
Andriy Gapon


Re: Regression with revision 303970 (was kern.proc.pathname failure while booting from zfs)

2016-10-04 Thread Andriy Gapon

I've written a patch that implements zfs_vptocnp() using the z_parent property.
The new code should be 100% reliable for directories and "mostly" reliable for
files (that is, when hardlinks across directories are not used).

Could you please review / test it?
https://reviews.freebsd.org/D8146

Thanks!
-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-03 Thread Andriy Gapon
On 03/10/2016 23:25, Michael Gmelin wrote:
> On Mon, 3 Oct 2016 19:41:17 +0300
> Andriy Gapon <a...@freebsd.org> wrote:
> 
>> On 03/10/2016 19:07, Michael Gmelin wrote:
>>> I upgraded the latter to r306641, applied your patches (cleanly)
>>> and ran "make kernel" (GENERIC kernel), added the entries to
>>> device.hints and rebooted. Unfortunately ig4 won't load:
>>>
>>> # kldload ig4
>>> link_elf_obj: symbol iicbus_transfer_desc undefined
>>> linker_load_file: Unsupported file type  
>>
>> Hmm, seems like I forgot to declare the iicbus module dependency.
>> Could you please kldload iicbus and see if that helps?
>> Meanwhile I'll add the dependency.
> 
> Unfortunately this doesn't help, you should be able to reproduce it
> yourself without access to the actual hardware though.
> 
> I worked around this by adding the iicbus dependency to ig4_pci.c (and
> also to "files").

Okay, this should be in the latest patch anyway.

> Now loading ig4 works and both lynx point controllers are detected.
> Loading isl doesn't create any output and doesn't seem to detect any
> devices. Also, devinfo shows both controllers (ig4iic0, ig4iic1), but
> no iicbus devices.
> 
> # devinfo | grep iic
> ig4iic0
> ig4iic1

Is there anything interesting from ig4 in the log / dmesg?
Could you please check that your copy of the patch contains this chunk?
@@ -549,7 +780,7 @@ ig4iic_attach(ig4iic_softc_t *sc)
 		      IG4_CTL_RESTARTEN |
 		      IG4_CTL_SPEED_STD);
 
-	sc->smb = device_add_child(sc->dev, "smbus", -1);
+	sc->smb = device_add_child(sc->dev, "iicbus", -1);
 	if (sc->smb == NULL) {
 		device_printf(sc->dev, "smbus driver not found\n");
 		error = ENXIO;

-- 
Andriy Gapon


Re: [request for testing] isl, cyapa on chromebooks

2016-10-03 Thread Andriy Gapon
On 03/10/2016 19:07, Michael Gmelin wrote:
> I upgraded the latter to r306641, applied your patches (cleanly) and
> ran "make kernel" (GENERIC kernel), added the entries to device.hints
> and rebooted. Unfortunately ig4 won't load:
> 
> # kldload ig4
> link_elf_obj: symbol iicbus_transfer_desc undefined
> linker_load_file: Unsupported file type

Hmm, seems like I forgot to declare the iicbus module dependency.
Could you please kldload iicbus and see if that helps?
Meanwhile I'll add the dependency.

> I also noticed that isl cannot be built from the module source
> directory:
> 
> # cd /usr/src/sys/modules/i2c/isl
> # make
> ...
> /usr/src/sys/sys/vnode.h:571:10: fatal error: 'vnode_if.h' file not
> found
> #include "vnode_if.h"
>  ^
> 1 error generated
> *** Error code 1
> 
> This can be easily fixed by removing "#include " from
> isl.c (line 56).

Thank you for reporting this!
Looks like I overlooked this because I didn't do make clean after removing
vnode_if.h from the Makefile.
Will fix this too.

-- 
Andriy Gapon


[request for testing] isl, cyapa on chromebooks

2016-10-03 Thread Andriy Gapon

If you have a Chromebook where you are currently able to use isl and cyapa
drivers, could you please test the following code change?
https://people.freebsd.org/~avg/ig4-i2c.diff

The change moves the drivers from the SMBus to the I2C bus and as such some
configuration changes are required.
Namely, you will now need the iicbus driver either in the kernel configuration
or as a module.  For now the smbus driver is also required.
You also need to add some entries to /boot/device.hints:
hint.isl.0.at="iicbus0"
hint.isl.0.addr=0x88
hint.isl.1.at="iicbus1"
hint.isl.1.addr=0x88
hint.cyapa.0.at="iicbus0"
hint.cyapa.0.addr=0xce
hint.cyapa.1.at="iicbus1"
hint.cyapa.1.addr=0xce

The hints are required because auto-probing (either via the bus enumeration or
self-identification) is disabled for now for safety reasons.
Also, as I understand, the Intel chipset used in the supported Chromebooks
provides two I2C buses (possibly in addition to an SMBus) and I am not sure on
which of the I2C buses the devices reside.
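
A note on the addr values in the hints above, offered as a sketch rather than
anything taken from the patch: it assumes the hints use the 8-bit address form,
that is, the 7-bit slave address shifted left by one with the R/W bit position
left clear.  Under that assumption 0x88 and 0xce correspond to the 7-bit
addresses 0x44 and 0x67.  The helper names below are made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not part of the patch): convert between the
 * 8-bit address form assumed for the hints and the 7-bit form that
 * datasheets usually quote.
 */
static inline uint8_t
addr8_to_addr7(uint8_t addr8)
{
	/* Drop the R/W bit position. */
	return (addr8 >> 1);
}

static inline uint8_t
addr7_to_addr8(uint8_t addr7)
{
	/* Shift left, leaving bit 0 (R/W) clear. */
	return ((uint8_t)(addr7 << 1));
}
```

This may help when comparing the hints against a datasheet or against another
system's driver that quotes 7-bit addresses.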

The changes are build tested only, because I do not have access to the hardware.
So, kernel panics, etc are not unexpected.

Please let me know if drivers attach at all and if there are any issues with 
them.
A verbose dmesg would be of great help.  That could be obtained by booting in a
verbose mode if the drivers are auto-loaded or by setting debug.bootverbose=1
before loading the drivers if that's done manually.

Please also note that ig4 driver is changed, so it too has to be rebuilt if you
are going to build individual modules rather than do a kernel + modules build.

I will appreciate your testing and feedback.
Thank you!
-- 
Andriy Gapon


aibs(4) / atk0110 support for newer systems

2016-09-30 Thread Andriy Gapon

I've written a patch for aibs(4) (also known as ASUS AI Booster and ATK0110)
that adds support for discovering and querying sensors using newer GGRP ("get
group"?) and GITM ("get item"?) methods:
https://people.freebsd.org/~avg/aibs-ggrp-gitm.diff

If you are using the driver could you please test that the patch does not break
it for you?
If you have an ASUS motherboard where aibs does not find sensors could you
please check if there is any improvement with the patch?
Testing the patch should be as easy as building, installing and loading aibs
module (found in sys/modules/acpi/aibs).  Well, also don't forget to apply the
patch with patch -p1 :-)

-- 
Andriy Gapon


Re: Destroy GPT partition scheme absolutely, how?

2016-09-28 Thread Andriy Gapon
On 28/09/2016 21:08, Andrey V. Elsukov wrote:
> This is a very strange problem: how did you create an MBR if you had not
> destroyed the GPT? :)

Using a tool that's not aware of GPT at all?

-- 
Andriy Gapon


Re: CURRENT: SMBus controller driver for AMD APU GX-412TC SOC

2016-09-18 Thread Andriy Gapon
On 18/09/2016 17:05, O. Hartmann wrote:
> Running a recent CURRENT (FreeBSD 12.0-CURRENT #11 r305903: Sat Sep 17
> 20:30:19 CEST 2016) on a PCengines APU 2C4, I see no driver adapted to the
> SMBus of the system, although I have the driver statically linked into the
> kernel via
> 
> [...]
> # System Management Bus
> device  smbus
> device  smb # SMB generic I/O device driver
> device  ichsmb  # Intel ICH SMBus controller driver
> device  amdsmb  # AMD-8111 SMBus 2.0 controller driver
> device  iicsmb  #
> device  iicbus
> device  iicbb
> device  iic
> device  ic
> [...]
> 
> pciconf -lvbp shows up this message:
> 
> [...]
> none1@pci0:0:20:0:  class=0x0c0500 card=0x780b1022 chip=0x780b1022 rev=0x42 hdr=0x00
> vendor = 'Advanced Micro Devices, Inc. [AMD]'
> device = 'FCH SMBus Controller'
> class  = serial bus
> subclass   = SMBus
> 
> So, I guess CURRENT doesn't have a driver covering this type of system? Or am
> I missing something here?

First of all, as strange as it may seem, the correct driver in this case would
be intpm.  See its manual page.
Second, the driver doesn't support your hardware yet.  Some rather small changes
are required and I am working on that.


> I also see a very strange, mysterious and interesting feature, also not
> attached with a driver:
> 
> none0@pci0:0:8:0:   class=0x108000 card=0x15371022 chip=0x15371022 rev=0x00 hdr=0x00
> vendor = 'Advanced Micro Devices, Inc. [AMD]'
> class  = encrypt/decrypt
> bar   [10] = type Prefetchable Memory, range 64, base 0xfea0, size 131072, enabled
> bar   [18] = type Memory, range 32, base 0xfe80, size 1048576, enabled
> bar   [1c] = type Memory, range 32, base 0xfea24000, size 4096, enabled
> bar   [20] = type Memory, range 32, base 0xfe90, size 1048576, enabled
> bar   [24] = type Memory, range 32, base 0xfea2, size 8192, enabled
> cap 11[50] = MSI-X supports 2 messages
>  Table in map 0x24[0x0], PBA in map 0x24[0x1000]
> cap 08[5c] = HT MSI fixed address window enabled at 0xfee0
> cap 01[60] = powerspec 3  supports D0 D3  current D0
> 
> 
> Encrypt/decrypt? What is this?
> 
> Thanks for your patience and for enlightening me,

I think that that's this hardware:
http://www.amd.com/en-us/innovations/software-technologies/security
http://www.anandtech.com/show/6007/amd-2013-apus-to-include-arm-cortexa5-processor-for-trustzone-capabilities

Linux has a driver for it:
http://www.phoronix.com/scan.php?px=MTU4MTM&page=news_item
http://cateee.net/lkddb/web-lkddb/CRYPTO_DEV_CCP.html

-- 
Andriy Gapon


Re: kldload intpm

2016-09-08 Thread Andriy Gapon
On 07/09/2016 20:49, John Baldwin wrote:
> You can request a specific ordering via DRIVER_MODULE_ORDERED (you can specify
> the SI_ORDER to use as an extra argument).  The typical practice is to load the
> "base" driver (the one that attaches highest up the device hierarchy) "last" so
> that all other drivers are registered once it tries to attach.  For example, in
> xl(4) this is used to have the PCI attachment register last so that the miibus
> driver is registered when xl0 attaches:
> 
> DRIVER_MODULE_ORDERED(xl, pci, xl_driver, xl_devclass, NULL, NULL,
> SI_ORDER_ANY);
> DRIVER_MODULE(miibus, xl, miibus_driver, miibus_devclass, NULL, NULL);
> 
> DRIVER_MODULE() uses SI_ORDER_MIDDLE by default.
> 
> This probably needs to be fixed in all of the smbus controller drivers.

Thank you for the advice.
I'm going to fix intpm.

-- 
Andriy Gapon


kldload intpm

2016-09-07 Thread Andriy Gapon

Has kldload intpm ever worked?
Ditto for other smbus drivers.

The reason I am asking is that it doesn't work for me on the latest head.
And it doesn't work because device_probe_and_attach(sc->smbus) fails in
intsmb_attach().

With a little help from DTrace I obtained the following output:
CPU     ID                    FUNCTION:NAME
  0  41924        devclass_add_driver:entry devclass = 0xf8000675b700, name = pci, driver = 0x832ed058, name = intsmb

  0  32121         device_probe_child:entry
parent = 0xf8000af78100, nameunit = intsmb0, devclass = 0xf8001d955880, name = intsmb, driver = 0x0, name =
child = 0xf8001d933500, nameunit = smbus1, devclass = 0xf8001d955780, name = smbus

  kernel`device_probe+0x9d
  kernel`device_probe_and_attach+0x2e
  intpm.ko`intsmb_attach+0x651
  kernel`device_attach+0x41d
  kernel`pci_driver_added+0xed
  kernel`devclass_driver_added+0x7d
  kernel`devclass_add_driver+0x144
  kernel`module_register_init+0xb0
  kernel`linker_load_module+0xc88
  kernel`kern_kldload+0xa7
  kernel`sys_kldload+0x5b
  kernel`amd64_syscall+0x2db
  kernel`0x80e918ab

  1  41924        devclass_add_driver:entry devclass = 0xf8001d955880, name = intsmb, driver = 0x832ee930, name = smbus

My interpretation is that intsmb_attach() is called before the smbus driver is
associated with the intsmb devclass.  That means that the devclass does not have
any drivers at all when intsmb_attach() calls device_probe_and_attach() on its
smbus child.  By the time the smbus driver is added to the intsmb devclass, it
is too late.

Okay, writing the above gave me an idea to try to change the order of
DRIVER_MODULE() lines in intpm.c and that fixed the problem.
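
The ordering problem described above can be shown with a toy user-space model.
This is only a sketch under my reading of the DTrace output, not newbus code:
the toy_* names and the structure are invented for the example, and the only
behaviour modelled is that registering intsmb on pci attaches it immediately
and that the attach probes the smbus child against whatever drivers the intsmb
devclass has at that moment.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_DRIVERS	4

/* Toy devclass: just a list of registered driver names. */
struct toy_devclass {
	const char *drivers[TOY_MAX_DRIVERS];
	int ndrivers;
};

static bool
toy_has_driver(const struct toy_devclass *dc, const char *name)
{
	for (int i = 0; i < dc->ndrivers; i++)
		if (strcmp(dc->drivers[i], name) == 0)
			return (true);
	return (false);
}

static void
toy_add_driver(struct toy_devclass *dc, const char *name)
{
	if (dc->ndrivers < TOY_MAX_DRIVERS)
		dc->drivers[dc->ndrivers++] = name;
}

/*
 * Simulate loading the module with one of the two registration orders.
 * Returns true if the smbus child found a driver during intsmb attach.
 */
static bool
toy_load_module(bool smbus_first)
{
	struct toy_devclass pci = { .ndrivers = 0 };
	struct toy_devclass intsmb = { .ndrivers = 0 };
	bool child_attached;

	if (smbus_first)
		toy_add_driver(&intsmb, "smbus");

	toy_add_driver(&pci, "intsmb");
	/* The intsmb attach runs here and probes its smbus child. */
	child_attached = toy_has_driver(&intsmb, "smbus");

	if (!smbus_first)
		toy_add_driver(&intsmb, "smbus");	/* too late */

	return (child_attached);
}
```

In this model toy_load_module(false) corresponds to the failing order and
toy_load_module(true) to the order after swapping the DRIVER_MODULE() lines.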

But I seem to recall that some years ago kldload intpm worked without the
change.  Perhaps the order has changed in the module loading code.
Anyway, this seems to be very subtle and error prone.  I wonder if we could make
it more robust.

-- 
Andriy Gapon


Re: Regression with revision 303970 (was kern.proc.pathname failure while booting from zfs)

2016-09-04 Thread Andriy Gapon
On 04/09/2016 17:51, Konstantin Belousov wrote:
> It is only masked when name cache has an entry for the vnode.  So sometimes
> vn_fullpath() should be broken even if no normalization is applied.

Yes, this is true.

> OTOH, classic filesystems like UFS do not have any other means to translate
> non-directory inode to name and parent at all, except the namecache hint.

In fact, this is true for ZFS as well.  While ZFS znodes have an attribute that
specifies a (single) parent, it's obviously unreliable for files, because a file
can be linked into multiple directories and then unlinked from a directory
specified by the attribute.

So, at the moment I do not have any good ideas on how to make this work.
Maybe trying to use the parent attribute and failing when it's inconsistent
would be good enough...
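
A toy model of that last idea, purely as a sketch: look up the directory named
by the parent attribute and fail if it no longer has an entry for the file.
None of this is ZFS code; the toy_* types and functions are invented, and the
model never updates the parent attribute on link or unlink, which is a
simplifying assumption.

```c
#include <stddef.h>

#define TOY_MAX_ENTRIES	4

struct toy_entry {
	const char *name;	/* NULL marks a free slot */
	int ino;
};

struct toy_dir {
	int ino;
	struct toy_entry entries[TOY_MAX_ENTRIES];
};

struct toy_file {
	int ino;
	int parent;	/* single parent attribute; never updated in this toy */
};

/* Add a directory entry for ino; returns 0 on success, -1 if full. */
static int
toy_link(struct toy_dir *d, const char *name, int ino)
{
	for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (d->entries[i].name == NULL) {
			d->entries[i].name = name;
			d->entries[i].ino = ino;
			return (0);
		}
	}
	return (-1);
}

/* Remove every entry for ino from the directory. */
static void
toy_unlink(struct toy_dir *d, int ino)
{
	for (int i = 0; i < TOY_MAX_ENTRIES; i++)
		if (d->entries[i].name != NULL && d->entries[i].ino == ino)
			d->entries[i].name = NULL;
}

/*
 * Resolve a file to a name via its parent attribute.  Returns the name,
 * or NULL if the directory named by the attribute no longer has an
 * entry for the file (the attribute is stale).
 */
static const char *
toy_vptocnp(const struct toy_file *f, const struct toy_dir *dirs, int ndirs)
{
	for (int i = 0; i < ndirs; i++) {
		if (dirs[i].ino != f->parent)
			continue;
		for (int j = 0; j < TOY_MAX_ENTRIES; j++)
			if (dirs[i].entries[j].name != NULL &&
			    dirs[i].entries[j].ino == f->ino)
				return (dirs[i].entries[j].name);
	}
	return (NULL);
}
```

The lookup works while the file is still linked in the directory recorded by
the attribute, and fails cleanly once the file has been linked elsewhere and
unlinked from that directory.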


-- 
Andriy Gapon


Re: Regression with revision 303970 (was kern.proc.pathname failure while booting from zfs)

2016-09-04 Thread Andriy Gapon
On 04/09/2016 11:24, Andriy Gapon wrote:
> On 27/08/2016 22:09, Frederic Chardon wrote:
>>> Anybody is able to reproduce this behavior or is it a local problem?
>> Reverting 303970 solves this issue. gcore and adb works again, and I
>> can start the vboxnet service.
>> I recreated my boot pool with no properties defined, just to be sure.
> 
> I can not reproduce this issue here.

I was not trying hard enough.  I've just reproduced the problem using a
non-default normalization property.  The issue is that 303970 disabled the use
of VFS name cache when any name "mangling" (normalization, case-insensitivity)
is enabled.  And apparently I misunderstood how vop_stdvptocnp() works.  So,
right now zfs_vptocnp() is broken when its argument is a non-directory vnode.
That fact is masked when the name cache is used and is exposed otherwise.

I will think about a fix.  Could you please file a bug report for this (if not
already)?
Sorry about the breakage.

-- 
Andriy Gapon


Re: Regression with revision 303970 (was kern.proc.pathname failure while booting from zfs)

2016-09-04 Thread Andriy Gapon
On 27/08/2016 22:09, Frederic Chardon wrote:
>> Anybody is able to reproduce this behavior or is it a local problem?
> Reverting 303970 solves this issue. gcore and adb works again, and I
> can start the vboxnet service.
> I recreated my boot pool with no properties defined, just to be sure.

I can not reproduce this issue here.
Unfortunately, I have no clue how kern.proc.pathname works, so I would
appreciate any hints at what filesystem operations I should look for potential
problems.

-- 
Andriy Gapon


major code change for .zfs

2016-08-23 Thread Andriy Gapon

Please review and test a change to .zfs code that is intended to make the code
aligned with FreeBSD VFS and, as such, more stable:
https://reviews.freebsd.org/D7421

The change removes two features.
.zfs/shares is gone because it was unused on FreeBSD anyway.  We can restore
that when we need it.
The ability to take a snapshot by creating a directory under .zfs/snapshot is
removed.  I hope that you didn't use it.  Please do not start using it now :-)
Again, this feature can be restored with some work.
The reason I removed it is that its companion features of destroying and
renaming snapshots were already missing on FreeBSD, and properly implementing
the feature required some more work.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ahci timeout during boot on a particular mobo

2016-08-19 Thread Andriy Gapon
On 19/08/2016 14:06, Alexander Motin wrote:
> On 19.08.16 11:30, Andriy Gapon wrote:
>> So, what's suspicious here is that we discover two AHCI channels on the
>> JMicron device and we seem to discover some sort of a device on one of them.
>> But the communication with that (phantom?) device times out and that causes a
>> very long delay during the boot.
> 
> This fake device is the most interesting part.  Marvell AHCI RAID chips
> in such way expose RAID management device, but I doubt that JMicron is
> so advanced, at least it seems like not implemented properly enough.
> 
>> Is there a way to fix the boot delay?
>>
>> Searched for JMB361 in the source code, looked at some nearby device entries,
>> and - is it as simple as adding AHCI_Q_1CH quirk for this device?
> 
> The AHCI_Q_1CH quirk was added for early Marvell chips that were an even
> dirtier mix of legacy ATA and AHCI, and that reported the total number of
> ports instead of only the expected AHCI ones.  Maybe the JMB361 is also like
> that, but I never had one to check.  The JMB362 I have does not have this
> problem, reporting two real SATA ports, even though it has one legacy PATA
> port as well.  I don't have strong objections to this quirk.  I am not sure
> whether it is the right solution, but I suppose that in a couple of years
> nobody will care about that hardware at all.
> 

Thank you for the reply!
I found this bit of info about JMB361
http://www.clubedohardware.com.br/datasheets/JMB361.pdf and it confirms that the
controller has a single SATA port.  And JMB362 has two ports
http://www.clubedohardware.com.br/datasheets/JMB361.pdf.
Maybe the second port on JMB361 has some sort of a SATA-to-IDE adapter and
perhaps it's that adapter that gets detected as a phantom device.

-- 
Andriy Gapon


ahci timeout during boot on a particular mobo

2016-08-19 Thread Andriy Gapon
 problems or strange behavior.

$ camcontrol devlist -v
scbus0 on ahcich0 bus 0:
<> at scbus0 target -1 lun  ()
scbus1 on ahcich1 bus 0:
<> at scbus1 target -1 lun  ()
scbus2 on ata2 bus 0:
   at scbus2 target 0 lun 0 (pass0,cd0)
<> at scbus2 target -1 lun  ()
scbus3 on ahcich2 bus 0:
<> at scbus3 target -1 lun  ()
scbus4 on ahcich3 bus 0:
 at scbus4 target 0 lun 0 (pass1,ada0)
<> at scbus4 target -1 lun  ()
scbus5 on ahcich4 bus 0:
<> at scbus5 target -1 lun  ()
scbus6 on ahcich5 bus 0:
<> at scbus6 target -1 lun  ()
scbus7 on ahcich6 bus 0:
<> at scbus7 target -1 lun  ()
scbus8 on ahcich7 bus 0:
<> at scbus8 target -1 lun  ()
scbus9 on sbp0 bus 0:
<> at scbus9 target -1 lun  ()
scbus-1 on xpt0 bus 0:
<>

Is there a way to fix the boot delay?
Thank you!

P.S.
Searched for JMB361 in the source code, looked at some nearby device entries,
and - is it as simple as adding AHCI_Q_1CH quirk for this device?

-- 
Andriy Gapon

vtterm_cngrab + kms does too much to be useful

2016-08-15 Thread Andriy Gapon

Here is an example:

NMI ISA 2c, EISA ff
NMI ... going to debugger

panic: malloc: called with spinlock or critical section held

(kgdb) bt
#0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:297
#1  0x8063c33f in kern_reboot (howto=) at
/usr/src/sys/kern/kern_shutdown.c:365
#2  0x8063c988 in vpanic (fmt=, ap=0xfe03e218cac0) at
/usr/src/sys/kern/kern_shutdown.c:641
#3  0x8063c693 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:574
#4  0x8061abaa in malloc (size=5136, mtp=0x821b2810
, flags=257) at /usr/src/sys/kern/kern_malloc.c:475
#5  0x821849ea in drm_crtc_helper_set_config (set=0xf800282b4a00) at
/usr/src/sys/modules/drm2/drm2/../../../dev/drm2/drm_crtc_helper.c:596
#6  0x8218a03e in drm_fb_helper_restore_fbdev_mode (fb_helper=) at /usr/src/sys/modules/drm2/drm2/../../../dev/drm2/drm_fb_helper.c:344
#7  0x82189efc in vt_kms_postswitch (arg=) at
/usr/src/sys/modules/drm2/drm2/../../../dev/drm2/drm_fb_helper.c:80
#8  0x805237ba in vt_fb_postswitch (vd=) at
/usr/src/sys/dev/vt/hw/fb/vt_fb.c:385
#9  0x805290fd in vt_window_switch (vw=0x80c4e4c0
) at /usr/src/sys/dev/vt/vt_core.c:540
#10 0x80527410 in vtterm_cngrab (tm=) at
/usr/src/sys/dev/vt/vt_core.c:1465
#11 0x806868fe in termcn_cngrab (cp=) at
/usr/src/sys/kern/subr_terminal.c:488
#12 0x805e8ef2 in cngrab () at /usr/src/sys/kern/kern_cons.c:368
#13 0x8067399e in kdb_trap (type=19, code=0, tf=0xfe03e218cf30) at
/usr/src/sys/kern/subr_kdb.c:650
#14 0x8083e2bc in trap (frame=0xfe03e218cf30) at
/usr/src/sys/amd64/amd64/trap.c:389

I do not have any solution for this.
It's certainly nice to be able to switch to console when kdb becomes active.
But the code that does the switching should be prepared to work in that rather
restrictive context, which the kms code is not.

P.S. It seems that the latest version of drm_fb_helper.c in Linux is quite
different from what we have.

-- 
Andriy Gapon


Re: Boot environments and zfs canmount=noauto

2016-08-11 Thread Andriy Gapon
On 28/07/2016 13:34, Andriy Gapon wrote:
> Locally I have the following rc script to handle subordinate datasets of
> a boot environment: http://dpaste.com/0Q0JPGN.txt
> It is designed for exactly the scenario described above.
> The script is automatically enabled when zfs_enable is enabled.
> 
> It would probably make sense to include the script into the OS after
> some testing and a review.

For posterity, as the paste has expired, I've placed it here:
https://people.freebsd.org/~avg/zfsbe.sh

-- 
Andriy Gapon

