Re: panic: invalid bcd xxx

2017-03-01 Thread Michael Gmelin


> On 2 Mar 2017, at 00:35, Adrian Chadd wrote:
> 
> This is an emulated BIOS though, right?
> 
> I don't know if we're going to get the RTC 'bugfixed'...
> 


It's SeaBIOS, yes. I feel like this might end up in another quirk/workaround 
solution.

-m


> 
> -adrian
> 
>> On 28 February 2017 at 15:26, Michael Gmelin  wrote:
>> On Tue, 28 Feb 2017 17:16:02 -0600
>> Eric van Gyzen  wrote:
>> 
>>> On 02/28/2017 16:57, Conrad Meyer wrote:
>>>> On Tue, Feb 28, 2017 at 2:31 PM, Eric van Gyzen wrote:
>>>>> Your system's real-time clock is returning garbage.  r312702 added
>>>>> some input validation a few weeks ago.  Previously, the kernel was
>>>>> reading beyond the end of an array and either complaining about
>>>>> the clock or setting it to the wrong time based on whatever was in
>>>>> the memory beyond the array.
>>>>>
>>>>> The added validation shouldn't be an assertion because it operates
>>>>> on data beyond the kernel's control.  Try this:
>>>>>
>>>>> --- sys/libkern.h   (revision 314424)
>>>>> +++ sys/libkern.h   (working copy)
>>>>> @@ -57,8 +57,10 @@
>>>>>  bcd2bin(int bcd)
>>>>>  {
>>>>>
>>>>> -       KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
>>>>> -           ("invalid bcd %d", bcd));
>>>>> +       if (bcd < 0 || bcd >= LIBKERN_LEN_BCD2BIN) {
>>>>> +               printf("invalid bcd %d\n", bcd);
>>>>> +               return (0);
>>>>> +       }
>>>>>         return (bcd2bin_data[bcd]);
>>>>>  }
>>>>
>>>> I don't think removing this assertion and truncating to zero is the
>>>> right thing to do.  Adding an error return to this routine is a
>>>> little much, though.  I think probably the caller should perform
>>>> input validation between the broken device and this routine.
>>> 
>>> Either of those would be a much better solution.  This was just a
>>> quick hack to get the memstick to boot.
>>> 
>> 
>> Thanks for your response.
>> 
>> I'm not in a hurry, so I can wait for a proper solution. Let me know if
>> I should test anything or can help in some other way.
>> 
>> -m
>> 
>> 
>> --
>> Michael Gmelin
>> ___
>> freebsd-current@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"



Re: panic: invalid bcd xxx

2017-03-01 Thread Adrian Chadd
This is an emulated BIOS though, right?

I don't know if we're going to get the RTC 'bugfixed'...


-adrian

On 28 February 2017 at 15:26, Michael Gmelin  wrote:
> On Tue, 28 Feb 2017 17:16:02 -0600
> Eric van Gyzen  wrote:
>
>> On 02/28/2017 16:57, Conrad Meyer wrote:
>> > On Tue, Feb 28, 2017 at 2:31 PM, Eric van Gyzen
>> >  wrote:
>> >> Your system's real-time clock is returning garbage.  r312702 added
>> >> some input validation a few weeks ago.  Previously, the kernel was
>> >> reading beyond the end of an array and either complaining about
>> >> the clock or setting it to the wrong time based on whatever was in
>> >> the memory beyond the array.
>> >>
>> >> The added validation shouldn't be an assertion because it operates
>> >> on data beyond the kernel's control.  Try this:
>> >>
>> >> --- sys/libkern.h   (revision 314424)
>> >> +++ sys/libkern.h   (working copy)
>> >> @@ -57,8 +57,10 @@
>> >>  bcd2bin(int bcd)
>> >>  {
>> >>
>> >> -       KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
>> >> -           ("invalid bcd %d", bcd));
>> >> +       if (bcd < 0 || bcd >= LIBKERN_LEN_BCD2BIN) {
>> >> +               printf("invalid bcd %d\n", bcd);
>> >> +               return (0);
>> >> +       }
>> >>         return (bcd2bin_data[bcd]);
>> >>  }
>> >
>> > I don't think removing this assertion and truncating to zero is the
>> > right thing to do.  Adding an error return to this routine is a
>> > little much, though.  I think probably the caller should perform
>> > input validation between the broken device and this routine.
>>
>> Either of those would be a much better solution.  This was just a
>> quick hack to get the memstick to boot.
>>
>
> Thanks for your response.
>
> I'm not in a hurry, so I can wait for a proper solution. Let me know if
> I should test anything or can help in some other way.
>
> -m
>
>
> --
> Michael Gmelin
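
For illustration only, the caller-side validation Conrad Meyer suggests in this thread might look roughly like the sketch below. It is not the committed fix. The names rtc_bcd_is_valid() and rtc_bcd_to_bin() are hypothetical and do not exist in the FreeBSD tree, and plain arithmetic stands in for the kernel's bcd2bin_data[] lookup table. The idea is that a raw RTC byte is checked for valid packed BCD before conversion, so bcd2bin() can keep its assertion.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of FreeBSD): true when both nibbles of
 * a raw RTC register byte are decimal digits, i.e. the byte is valid
 * packed BCD in the range 0x00..0x99.
 */
static bool
rtc_bcd_is_valid(uint8_t raw)
{
	return ((raw >> 4) <= 9 && (raw & 0x0f) <= 9);
}

/*
 * Hypothetical caller-side wrapper: converts a validated BCD byte to
 * binary, or returns -1 so the caller can reject the clock reading
 * instead of tripping an assertion.  The arithmetic below is a stand-in
 * for the table lookup done by the real bcd2bin().
 */
static int
rtc_bcd_to_bin(uint8_t raw)
{
	if (!rtc_bcd_is_valid(raw))
		return (-1);
	return ((raw >> 4) * 10 + (raw & 0x0f));
}
```

With this shape, a clock driver could treat a -1 result as "RTC contents unusable" and fall back to whatever it already does for an unset clock. Unlike the quick hack in the quoted patch, it would not silently substitute zero.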


Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

2017-03-01 Thread Mark Millard

On 2017-Feb-28, at 10:13 PM, Mateusz Guzik wrote:

> On Sat, Feb 25, 2017 at 08:31:04PM +0100, Mateusz Guzik wrote:
>> On Sat, Feb 25, 2017 at 09:58:39AM -0800, Mark Millard wrote:
>>> Thus the PowerMac G5 so-called "Quad Core" is back to
>>> -r313254 without your patches. (The "Quad Core" really has
>>> two processors, each with 2 cores.)
>>> 
>> 
>> 
>> Thanks a lot for testing. I'll have to think what to do with it, worst
>> case I'll #ifdef changes with powerpc.
>> 
> 
> Should be fixed with r314474. Got a real powerpc to test on (60 cores),
> was able to lock it up in seconds. Now it is perfectly stable.
> 
> -- 
> Mateusz Guzik 

The updated so-called "Quad Core" PowerMac G5 used for
TARGET_ARCH=powerpc64 was able to do a self hosted
buildworld buildkernel for -r314479 just fine.

Thanks much for the fixes: Now I can track head again
for powerpc64.


Summary of the transition interval:

So for powerpc64 (and powerpc?) it is a good
idea to avoid anything that is after -r313254
and before -r314474 in head. (Would this be
appropriate for a UPDATING notice given its
span?)

There may be other architectures with a similar
status(?): the last fixes involved were not in
machine-dependent code. (Some architectures,
such as amd64, are apparently insensitive to
the errors.)

===
Mark Millard
markmi at dsl-only.net



Re: confusing KTR_SCHED traces

2017-03-01 Thread Julian Elischer

On 18/2/17 2:48 am, Andriy Gapon wrote:

> First, an example, three consecutive entries for the same thread (from top to
> bottom):
> KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"sleep",
> attributes: prio:84, wmesg:"-", lockname:"(null)"
> KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"spinning",
> attributes: lockname:"sched lock 1"
> KTRGRAPH group:"thread", id:"zio_write_intr_3 tid 100260", state:"running",
> attributes: none
>
> Any automatic analysis tool including schedgraph.py will assume that the thread
> ends up in the running state.  In reality, of course, the thread is in the
> sleeping state.
> The confusing trace is a result of logging the thread's intention to switch out
> in mi_switch() before calling sched_switch().  In ULE's sched_switch() we
> acquire the "TDQ_LOCK" which could be contended.  In that case the thread spins
> waiting for the lock to be released.  This is reported as "spinning" and then
> "running" states.
>
> I would like to fix that, but I am not sure how best to do it.
> One idea is to move the mi_switch() trace closer to the cpu_switch() call
> similarly to DTrace sched:cpu-off and sched:cpu-on probes.


I think that is the way to fix it


> Any suggestions are welcome.
> Thanks!
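
As a rough model of the ordering problem described above (this is not kernel code), the sketch below mimics an analysis tool that takes the last record it sees for a thread as that thread's state. The type trace_rec, the function final_state() and both arrays are hypothetical names made up for illustration. The "proposed" order is only an assumption about what the trace might look like if the switch-out record were emitted next to cpu_switch(), after the lock has been acquired.

```c
#include <stddef.h>

/* Hypothetical, simplified trace record: only the thread state name. */
struct trace_rec {
	const char *state;
};

/*
 * What a naive analyzer does: the state named in the last record for a
 * thread is taken to be the state the thread ends up in.
 */
static const char *
final_state(const struct trace_rec *recs, size_t n)
{
	return (n == 0 ? "unknown" : recs[n - 1].state);
}

/*
 * Order reported in the thread: "sleep" is logged in mi_switch(), then
 * the contended lock produces "spinning" and "running" records.
 */
static const struct trace_rec order_current[] = {
	{ "sleep" }, { "spinning" }, { "running" },
};

/*
 * Assumed order if the switch-out record moved next to cpu_switch():
 * the lock records come first, and "sleep" is the last word.
 */
static const struct trace_rec order_proposed[] = {
	{ "spinning" }, { "running" }, { "sleep" },
};
```

Fed the first array, final_state() reports "running", which is the misreading described above. Fed the second, it reports "sleep", which matches what the thread is really doing.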


