On 12/07/2017 07:35 AM, Alan Somers wrote:
On Thu, Dec 7, 2017 at 2:33 AM, Andriy Gapon <a...@freebsd.org> wrote:
[cc-ing current@ to raise more awareness]
On 05/12/2017 16:03, Alexey Dokuchaev wrote:
> On Fri, Nov 24, 2017 at 11:31:51AM +0200, Andriy Gapon wrote:
>>
>> I have reported a couple of nvidia-driver issues in the FreeBSD section
>> of the nVidia developer forum, but no replies so far.
>>
>> Well, the first issue is not with the driver, but with a utility that
>> comes with it, nvidia-smi:
>>
https://devtalk.nvidia.com/default/topic/1026589/freebsd/nvidia-smi-query-gpu-spins-forever-on-freebsd-head-amd64-/
>> I wonder if I am the only one affected or if I see the problem because
>> I am on head or something else.
>> I am pretty sure that the problem is caused by a programming bug related
>> to strtok_r.
>
> I'll try to reproduce it and report back.
I've done some work with a debugger and it seems that there is code that does
something like this:
    char *last = NULL;
    while (1) {
        if (last == NULL)
            p = strtok_r(str, sep, &last);
        else
            p = strtok_r(NULL, sep, &last);
        if (p == NULL)
            break;
        ...
    }
The problem is that when 'p' points to the last token, 'last' is NULL (in the
FreeBSD implementation of strtok_r). That means that when we go to the next
iteration, the parsing starts all over again, leading to an endless loop.
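To make the failure mode concrete, here is a small sketch in C. It does not use the real libc function: model_strtok_r is a hypothetical, simplified stand-in that only reproduces the behaviour described above (it stores NULL in 'last' once the final token has been returned). buggy_count and its 'cap' parameter are also made up for illustration; the cap exists only so the demonstration terminates.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a strtok_r that stores NULL in *last after
 * returning the final token. Not the real libc code.
 */
static char *model_strtok_r(char *s, const char *sep, char **last)
{
    char *tok;

    if (s == NULL && (s = *last) == NULL)
        return NULL;
    s += strspn(s, sep);            /* skip leading separators */
    if (*s == '\0') {               /* nothing left to return */
        *last = NULL;
        return NULL;
    }
    tok = s;
    s += strcspn(s, sep);           /* find the end of the token */
    if (*s == '\0') {
        *last = NULL;               /* final token: no saved position */
    } else {
        *s = '\0';
        *last = s + 1;
    }
    return tok;
}

/*
 * The loop shape from the message. Returns the number of tokens seen,
 * giving up after 'cap' tokens so the demonstration always ends.
 */
size_t buggy_count(char *str, const char *sep, size_t cap)
{
    char *last = NULL;
    char *p;
    size_t n = 0;

    while (n < cap) {
        if (last == NULL)           /* wrong: inspects opaque state */
            p = model_strtok_r(str, sep, &last);
        else
            p = model_strtok_r(NULL, sep, &last);
        if (p == NULL)
            break;
        n++;
    }
    return n;
}
```

With this model, the input "a,b,c" never reaches the p == NULL exit: after "c" is returned, 'last' is NULL, so the loop restarts from 'str' and keeps returning "a" until the cap is hit. An input with a trailing separator such as "a,b," does terminate, because 'last' is still non-NULL after "b" is returned.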
The code is incorrect from the standards point of view, because the value of
'last' is completely opaque and should not be used for anything other than
passing it back to strtok_r.
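For comparison, a minimal sketch of a loop that never inspects 'last'. This is not the nvidia-smi source, which I have not seen; split_tokens and its parameters are hypothetical names chosen for illustration. The first call passes the string, every later call passes NULL, and termination depends only on the return value.

```c
#define _POSIX_C_SOURCE 200809L     /* for strtok_r */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splits buf in place on any character in sep,
 * stores up to max token pointers in out, and returns the count.
 */
size_t split_tokens(char *buf, const char *sep, char **out, size_t max)
{
    char *last;     /* opaque state: only ever handed back to strtok_r */
    char *p;
    size_t n = 0;

    for (p = strtok_r(buf, sep, &last);
         p != NULL && n < max;
         p = strtok_r(NULL, sep, &last))
        out[n++] = p;

    return n;
}
```

Because 'last' is written by strtok_r before it is ever read, it does not even need an initial value here, and the loop behaves the same regardless of what a given implementation stores in it.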
I used gdb -w to change the logic to:
    char *last = (char *)1;
    while (1) {
        if (last == (char *)1)
            p = strtok_r(str, sep, &last);
        else
            p = strtok_r(NULL, sep, &last);
        ...
    }
Where 1 is used as an "impossible" pointer value which is neither NULL nor a
valid pointer that can be set by strtok_r. It's not ideal, but binary code
editing is not as easy as editing source code.
The binary patch is here:
https://people.freebsd.org/~avg/nvidia-smi.bsdiff
>> The second issue is with the FreeBSD support for the kernel driver:
>>
https://devtalk.nvidia.com/default/topic/1026645/freebsd/panic-related-to-nvkms_timers-lock-sx-lock-/
>> I would like to get some feedback on my analysis.
>> I am testing this patch right now:
>>
https://people.freebsd.org/~avg/extra-patch-src_nvidia-modeset_nvidia-modeset-freebsd.c
>
> Unfortunately, I'm not enough of an expert on kernel locking primitives to
> give you a proper review; let's see what others have to say.
It's been a while since I posted the patch and there are no comments yet.
I can only add that I am running an INVARIANTS and WITNESS enabled kernel all
the time and before the patch I was getting kernel panics every now and then.
Since I started using the patch I haven't had a single nvidia panic yet.
>> Also, what's the best place or who are the best people with whom to
>> discuss such issues?
>
> Yes, this is a problem now: since Christian Zander left nVidia, he
> could not tell me who their next liaison for the FreeBSD community
> would be. :-(
Oh, I didn't know about Christian's departure.
So, we are not in a very good position now.
How about Aaron Plattner (CC'd)? Aaron, are you still working on
FreeBSD driver issues?
Thanks for the heads up, Alan. I filed bug 2032249 to track this.
-- Aaron
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"