Re: Boot failure: panic: No heap setup

2018-03-30 Thread Toomas Soome


> On 30 Mar 2018, at 18:03, Stefan Esser  wrote:
> 
> Am 29.03.18 um 07:15 schrieb Toomas Soome:
>> 
>> 
>>> On 29 Mar 2018, at 01:06, Stefan Esser  wrote:
>>> 
>>> Am 28.03.18 um 22:28 schrieb Warner Losh:
>>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>>>
>>>>>>> [1]
>>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>>>
>>>>>> Seems that setheap has either not been called or has been called with
>>>>>> base=0.
>>>>>
>>>>> Right, which is odd...
>>>>>
>>>>>>> [2]
>>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>>>
>>>>>> I had thought, that the zfs boot code has been initialized before the
>>>>>> menu is displayed?
>>>>>
>>>>> Right, all of this should be done long before we get to the
>>>>> interpreter. Can you break into the loader prompt and try the `heap`
>>>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>>>> things.
>>>>
>>>> Totally weird. I'd add a printf to the setheap() function to display its
>>>> args and see if you get this panic before/after that printf...
>>> 
>>> I'm currently using a Forth-enabled boot loader again, since this is a
>>> "production" machine (my home server, which also receives and keeps all
>>> my work email, for example).
>>> 
>>> I'll build a clean world with the LUA loader and test it on one of the
>>> next days. Tests will include the "heap" loader command and I'll add the
>>> printf (though, if sbrk() has really not been called, I guess that will
>>> not go too well ...).
>>> 
>>> Is it possible, that the setheap function is called a second time, just
>>> before jumping into the kernel? (In that case adding the printf might
>>> crash the loader in the first setheap call ...)
>>> 
>>> Since the loader menu (and escaping from the menu) works, there must be
>>> a valid heap, at that time.
>>> 
>> 
>> Indeed. And assuming the message really is from the loader, it means there
>> must be memory corruption. If so, you can check which variables are located
>> close to the heap-related ones… Also, since you have a working menu, it has
>> to be related to the actual loading. Since the loading itself has been
>> working so far, it should be related to the Lua-specific bits that prepare
>> to call the load functions.
> 
> Ok, some more data points:
> 
> 1) A printf in setheap reported plausible values during start-up of zfsboot.
>   The menu appeared and wiped away the values so fast that I could not take
>   a photo or write them down.
> 


If you got the menu and so on, it means that at that point the heap was OK.
Right after setheap(), bcache_init() is called, and that too will allocate
memory.

What you can do is escape from the menu to the OK prompt and check the output
of the heap and biosmem commands…


> 2) I have rebuilt world and kernel based on r331763. Booting resulted in the
>   same panic as reported before. There was no debug output from the patched
>   setheap call before the panic (which indicates that it was not called a
>   second time).
> 
> 3) In order to get my system to boot, I interrupted loading of zfsloader and
>   forced loading of the previous version (from a world build with Forth in
>   the loader). Booting succeeded with the latest kernel ...
> 
> It looks as if sbrk() was called in zfsloader before setheap() has been used
> to initialize the heap parameters, if lua is enabled instead of Forth. See
> stand/i386/loader/main.c:124 for the location of the setheap call in the
> loader.

This can only happen if something is called before main()…

> 
> This is obviously hard to debug, though, since printf cannot be called at that
> point. A pure write(2) should be possible without heap, but since the console
> has not been initialized at the point of the setheap invocation, there is no
> working output device, AFAIK.
> 
> I do not see, how any sbrk() call could occur before setheap is called. And
> there does not appear to be any other setheap function (or macro) in the
> tree, that could overload the one defined in stand/libsa/sbrk.c ...
> 
> I have no idea how to proceed from here ...
> 
> But now I'm sure it is a problem in zfsloader (or loader in general?).
> 
> Hmmm: How is the panic message printed by sbrk() without an initialized heap?
> The definition of panic in stand/libsa/panic.c relies on a working printf!
> 
> I should be able to use printf in the same way as panic does, but I did
> not succeed when I tried to use it early in zfsloader ...
> 
> Regards, STefan


rgds,
toomas

___
freebsd-current@freebsd.org mailing list

Re: Boot failure: panic: No heap setup

2018-03-30 Thread Stefan Esser
Am 29.03.18 um 07:15 schrieb Toomas Soome:
> 
> 
>> On 29 Mar 2018, at 01:06, Stefan Esser  wrote:
>>
>> Am 28.03.18 um 22:28 schrieb Warner Losh:
>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>>
>>>>>> [1]
>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>>
>>>>> Seems that setheap has either not been called or has been called with
>>>>> base=0.
>>>>
>>>> Right, which is odd...
>>>>
>>>>>> [2]
>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>>
>>>>> I had thought, that the zfs boot code has been initialized before the
>>>>> menu is displayed?
>>>>
>>>> Right, all of this should be done long before we get to the
>>>> interpreter. Can you break into the loader prompt and try the `heap`
>>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>>> things.
>>>
>>> Totally weird. I'd add a printf to the setheap() function to display its
>>> args and see if you get this panic before/after that printf...
>>
>> I'm currently using a Forth-enabled boot loader again, since this is a
>> "production" machine (my home server, which also receives and keeps all
>> my work email, for example).
>>
>> I'll build a clean world with the LUA loader and test it on one of the
>> next days. Tests will include the "heap" loader command and I'll add the
>> printf (though, if sbrk() has really not been called, I guess that will
>> not go too well ...).
>>
>> Is it possible, that the setheap function is called a second time, just
>> before jumping into the kernel? (In that case adding the printf might
>> crash the loader in the first setheap call ...)
>>
>> Since the loader menu (and escaping from the menu) works, there must be
>> a valid heap, at that time.
>>
> 
> Indeed. And assuming the message really is from the loader, it means there
> must be memory corruption. If so, you can check which variables are located
> close to the heap-related ones… Also, since you have a working menu, it has
> to be related to the actual loading. Since the loading itself has been
> working so far, it should be related to the Lua-specific bits that prepare
> to call the load functions.

Ok, some more data points:

1) A printf in setheap reported plausible values during start-up of zfsboot.
   The menu appeared and wiped away the values so fast that I could not take
   a photo or write them down.

2) I have rebuilt world and kernel based on r331763. Booting resulted in the
   same panic as reported before. There was no debug output from the patched
   setheap call before the panic (which indicates that it was not called a
   second time).

3) In order to get my system to boot, I interrupted loading of zfsloader and
   forced loading of the previous version (from a world build with Forth in
   the loader). Booting succeeded with the latest kernel ...

It looks as if sbrk() was called in zfsloader before setheap() has been used
to initialize the heap parameters, if lua is enabled instead of Forth. See
stand/i386/loader/main.c:124 for the location of the setheap call in the
loader.
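
For reference, the guard that produces this message can be sketched roughly as
follows. This is a minimal illustration under assumptions, not the code in
stand/libsa/sbrk.c: all `sketch_` names are hypothetical, and an error string
is handed back where the loader would call panic().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the heap bookkeeping discussed in this thread. */
static char  *sketch_heapbase;  /* start of the heap, NULL until configured */
static size_t sketch_maxheap;   /* total number of bytes available */
static size_t sketch_heapsize;  /* number of bytes handed out so far */

/* Record the heap boundaries; a NULL base leaves the heap unconfigured. */
void
sketch_setheap(void *base, void *top)
{
	sketch_heapbase = base;
	sketch_heapsize = 0;
	sketch_maxheap = (base != NULL) ?
	    (size_t)((char *)top - (char *)base) : 0;
}

/*
 * Grow the break by incr bytes and return the old break.
 * On failure, return NULL and store a message in *err.
 */
void *
sketch_sbrk(size_t incr, const char **err)
{
	char *ret;

	if (sketch_heapbase == NULL) {
		/* Either setheap was never called or it was called with base=0. */
		*err = "No heap setup";
		return (NULL);
	}
	if (incr > sketch_maxheap - sketch_heapsize) {
		*err = "out of memory";
		return (NULL);
	}
	*err = NULL;
	ret = sketch_heapbase + sketch_heapsize;
	sketch_heapsize += incr;
	return (ret);
}
```

In this sketch, calling sketch_sbrk() before sketch_setheap(), or after
sketch_setheap() with a NULL base, trips the same check. A heap base that is
overwritten with zero later on would look identical from inside the check.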

This is obviously hard to debug, though, since printf cannot be called at that
point. A pure write(2) should be possible without heap, but since the console
has not been initialized at the point of the setheap invocation, there is no
working output device, AFAIK.

I do not see, how any sbrk() call could occur before setheap is called. And
there does not appear to be any other setheap function (or macro) in the
tree, that could overload the one defined in stand/libsa/sbrk.c ...

I have no idea how to proceed from here ...

But now I'm sure it is a problem in zfsloader (or loader in general?).

Hmmm: How is the panic message printed by sbrk() without an initialized heap?
The definition of panic in stand/libsa/panic.c relies on a working printf!

I should be able to use printf in the same way as panic does, but I did
not succeed when I tried to use it early in zfsloader ...

Regards, STefan
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Boot failure: panic: No heap setup

2018-03-29 Thread Toomas Soome


> On 29 Mar 2018, at 01:06, Stefan Esser  wrote:
> 
> Am 28.03.18 um 22:28 schrieb Warner Losh:
>>>> Hmmm, the code references point into the boot loader code - I had
>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>
>>>>> [1]
>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>
>>>> Seems that setheap has either not been called or has been called with
>>>> base=0.
>>>
>>> Right, which is odd...
>>>
>>>>> [2]
>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>
>>>> I had thought, that the zfs boot code has been initialized before the
>>>> menu is displayed?
>>>
>>> Right, all of this should be done long before we get to the
>>> interpreter. Can you break into the loader prompt and try the `heap`
>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>> things.
>>
>> Totally weird. I'd add a printf to the setheap() function to display its args
>> and see if you get this panic before/after that printf...
> 
> I'm currently using a Forth-enabled boot loader again, since this is a
> "production" machine (my home server, which also receives and keeps all
> my work email, for example).
> 
> I'll build a clean world with the LUA loader and test it on one of the
> next days. Tests will include the "heap" loader command and I'll add the
> printf (though, if sbrk() has really not been called, I guess that will
> not go too well ...).
> 
> Is it possible, that the setheap function is called a second time, just
> before jumping into the kernel? (In that case adding the printf might
> crash the loader in the first setheap call ...)
> 
> Since the loader menu (and escaping from the menu) works, there must be
> a valid heap, at that time.
> 

Indeed. And assuming the message really is from the loader, it means there must
be memory corruption. If so, you can check which variables are located close to
the heap-related ones… Also, since you have a working menu, it has to be related
to the actual loading. Since the loading itself has been working so far, it
should be related to the Lua-specific bits that prepare to call the load
functions.
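
One hedged way to test the corruption theory is to bracket the heap
bookkeeping with guard words and check them just before the kernel is started.
This is a different technique from inspecting the symbol layout, and everything
below (the struct, the `ghs_` names, the magic value) is hypothetical and not
taken from the loader source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GHS_GUARD_MAGIC 0xdeadc0deU	/* arbitrary marker value (assumption) */

/* Heap bookkeeping with a guard word on each side. */
struct guarded_heap_state {
	uint32_t guard_lo;
	void    *heapbase;
	size_t   maxheap;
	uint32_t guard_hi;
};

static struct guarded_heap_state ghs = {
	GHS_GUARD_MAGIC, NULL, 0, GHS_GUARD_MAGIC
};

/* Record the heap location, as a setheap-like function would. */
void
ghs_set(void *base, size_t size)
{
	ghs.heapbase = base;
	ghs.maxheap = size;
}

/* Return 1 while both guard words still hold the marker value. */
int
ghs_intact(void)
{
	return (ghs.guard_lo == GHS_GUARD_MAGIC &&
	    ghs.guard_hi == GHS_GUARD_MAGIC);
}

/* Test helper: imitate a stray write that zeroes the first n bytes. */
void
ghs_simulate_overrun(size_t n)
{
	if (n > sizeof(ghs))
		n = sizeof(ghs);
	memset(&ghs, 0, n);
}
```

If a check like ghs_intact() fails right before the boot command hands over to
the kernel, that would point at a stray write rather than a missing setheap
call. It does not say which code did the writing.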

rgds,
toomas





Re: Boot failure: panic: No heap setup

2018-03-28 Thread Stefan Esser
Am 28.03.18 um 22:28 schrieb Warner Losh:
>>> Hmmm, the code references point into the boot loader code - I had
>>> expected that there is a problem in the kernel, not the boot loader.
>>>
>>>> [1]
>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>
>>> Seems that setheap has either not been called or has been called with
>>> base=0.
>>
>> Right, which is odd...
>>
>>>> [2]
>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>
>>> I had thought, that the zfs boot code has been initialized before the
>>> menu is displayed?
>>
>> Right, all of this should be done long before we get to the
>> interpreter. Can you break into the loader prompt and try the `heap`
>> command, see what that outputs? CC'ing imp@ because he actually knows
>> things.
>
> Totally weird. I'd add a printf to the setheap() function to display its args
> and see if you get this panic before/after that printf...

I'm currently using a Forth-enabled boot loader again, since this is a
"production" machine (my home server, which also receives and keeps all
my work email, for example).

I'll build a clean world with the LUA loader and test it in the next few
days. Tests will include the "heap" loader command and I'll add the
printf (though, if sbrk() has really not been called, I guess that will
not go too well ...).

Is it possible, that the setheap function is called a second time, just
before jumping into the kernel? (In that case adding the printf might
crash the loader in the first setheap call ...)

Since the loader menu (and escaping from the menu) works, there must be
a valid heap, at that time.

Regards, STefan


Re: Boot failure: panic: No heap setup

2018-03-28 Thread Warner Losh
On Wed, Mar 28, 2018 at 1:10 PM, Kyle Evans  wrote:

> On Tue, Mar 27, 2018 at 6:39 PM, Stefan Esser  wrote:
> > Am 27.03.18 um 21:31 schrieb Kyle Evans:
> >>
> >> On Tue, Mar 27, 2018 at 11:06 AM, Stefan Esser  wrote:
> >>>
> >>> A few weeks ago I tried the LUA boot and found, that my kernel did not
> >>> start (i.e. did not print the initial FreeBSD version line), but instead
> >>> stopped with:
> >>
> >>
> >> Oy =/
> >>
> >>> panic: No heap setup
> >>>
> >>> I recovered by booting from an alternate boot device and kept my system
> >>> running until today, where I decided to give the LUA boot another try.
> >>>
> >>> The boot failure happened again, with identical message:
> >>>
> >>>  panic: No heap setup
> >>
> >>
> >> Hmm... that's an sbrk panic [1], indicating that setheap hadn't been
> >> called. gptzfsboot is zfsboot with gpt bits included, so the relevant
> >> setheap call is [2] I believe. It's not immediately clear to me how
> >> switching interpreters could actually be breaking it in this way.
> >>
> >> At what point are you hitting this panic? After menu, before kernel
> >> transition?
> >
> >
> > The menu is displayed and I can unload the kernel and load the kernel
> > and modules from an alternate path. The lua code seems to work just fine,
> > but as soon as I enter the "boot" command, the panic happens.
> >
> > This happens when the loader transfers control to the kernel but before
> > any other output is generated. I tried booting a GENERIC kernel just to
> > be sure this is not caused by an out-dated kernel config file.
> >
> >>> I tried booting a GENERIC kernel, but only rebuilding the boot loader
> >>> (gptzfsloader in my case) without LUA support fixed the issue for me ...
> >>>
> >>> The system is -CURRENT (built today) on amd64 (not converted to UEFI,
> >>> yet).
> >
> >
> > Hmmm, the code references point into the boot loader code - I had
> > expected that there is a problem in the kernel, not the boot loader.
> >
> >> [1]
> >> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
> >
> >
> > Seems that setheap has either not been called or has been called with
> > base=0.
>
> Right, which is odd...
>
> >> [2]
> >> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
> >
> >
> > I had thought, that the zfs boot code has been initialized before the
> > menu is displayed?
>
> Right, all of this should be done long before we get to the
> interpreter. Can you break into the loader prompt and try the `heap`
> command, see what that outputs? CC'ing imp@ because he actually knows
> things.


Totally weird. I'd add a printf to the setheap() function to display its
args and see if you get this panic before/after that printf...
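
Something along these lines, as a rough sketch only. The `dbg_` names and the
call counter are made up for illustration; in the real loader the printf would
go into setheap() itself, and this assumes console output already works at the
point of the call.

```c
#include <stdio.h>

/* Hypothetical instrumented copy of a setheap-like function. */
static void    *dbg_base;
static void    *dbg_top;
static unsigned dbg_setheap_calls;

void
dbg_setheap(void *base, void *top)
{
	/* Count the calls so that a second, unexpected call stands out. */
	dbg_setheap_calls++;
	printf("setheap call %u: base=%p top=%p\n",
	    dbg_setheap_calls, base, top);
	dbg_base = base;
	dbg_top = top;
}

/* Number of times dbg_setheap() has run so far. */
unsigned
dbg_setheap_count(void)
{
	return (dbg_setheap_calls);
}
```

A base of 0 in the output, or a second line appearing right before the panic,
would each narrow things down considerably.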

Warner


Re: Boot failure: panic: No heap setup

2018-03-28 Thread Kyle Evans
On Tue, Mar 27, 2018 at 6:39 PM, Stefan Esser  wrote:
> Am 27.03.18 um 21:31 schrieb Kyle Evans:
>>
>> On Tue, Mar 27, 2018 at 11:06 AM, Stefan Esser  wrote:
>>>
>>> A few weeks ago I tried the LUA boot and found, that my kernel did not
>>> start (i.e. did not print the initial FreeBSD version line), but instead
>>> stopped with:
>>
>>
>> Oy =/
>>
>>> panic: No heap setup
>>>
>>> I recovered by booting from an alternate boot device and kept my system
>>> running until today, where I decided to give the LUA boot another try.
>>>
>>> The boot failure happened again, with identical message:
>>>
>>>  panic: No heap setup
>>
>>
>> Hmm... that's an sbrk panic [1], indicating that setheap hadn't been
>> called. gptzfsboot is zfsboot with gpt bits included, so the relevant
>> setheap call is [2] I believe. It's not immediately clear to me how
>> switching interpreters could actually be breaking it in this way.
>>
>> At what point are you hitting this panic? After menu, before kernel
>> transition?
>
>
> The menu is displayed and I can unload the kernel and load the kernel
> and modules from an alternate path. The lua code seems to work just fine,
> but as soon as I enter the "boot" command, the panic happens.
>
> This happens when the loader transfers control to the kernel but before
> any other output is generated. I tried booting a GENERIC kernel just to
> be sure this is not caused by an out-dated kernel config file.
>
>>> I tried booting a GENERIC kernel, but only rebuilding the boot loader
>>> (gptzfsloader in my case) without LUA support fixed the issue for me ...
>>>
>>> The system is -CURRENT (built today) on amd64 (not converted to UEFI,
>>> yet).
>
>
> Hmmm, the code references point into the boot loader code - I had
> expected that there is a problem in the kernel, not the boot loader.
>
>> [1]
>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>
>
> Seems that setheap has either not been called or has been called with
> base=0.

Right, which is odd...

>> [2]
>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>
>
> I had thought, that the zfs boot code has been initialized before the
> menu is displayed?

Right, all of this should be done long before we get to the
interpreter. Can you break into the loader prompt and try the `heap`
command, see what that outputs? CC'ing imp@ because he actually knows
things.

> Or do I misunderstand this phase of the boot process???
>
> Regards, STefan


Re: Boot failure: panic: No heap setup

2018-03-27 Thread Stefan Esser

Am 27.03.18 um 21:31 schrieb Kyle Evans:
> On Tue, Mar 27, 2018 at 11:06 AM, Stefan Esser wrote:
>> A few weeks ago I tried the LUA boot and found, that my kernel did not start
>> (i.e. did not print the initial FreeBSD version line), but instead stopped
>> with:
>
> Oy =/
>
>> panic: No heap setup
>>
>> I recovered by booting from an alternate boot device and kept my system
>> running until today, where I decided to give the LUA boot another try.
>>
>> The boot failure happened again, with identical message:
>>
>>  panic: No heap setup
>
> Hmm... that's an sbrk panic [1], indicating that setheap hadn't been
> called. gptzfsboot is zfsboot with gpt bits included, so the relevant
> setheap call is [2] I believe. It's not immediately clear to me how
> switching interpreters could actually be breaking it in this way.
>
> At what point are you hitting this panic? After menu, before kernel transition?

The menu is displayed and I can unload the kernel and load the kernel
and modules from an alternate path. The lua code seems to work just fine,
but as soon as I enter the "boot" command, the panic happens.

This happens when the loader transfers control to the kernel but before
any other output is generated. I tried booting a GENERIC kernel just to
be sure this is not caused by an out-dated kernel config file.

>> I tried booting a GENERIC kernel, but only rebuilding the boot loader
>> (gptzfsloader in my case) without LUA support fixed the issue for me ...
>>
>> The system is -CURRENT (built today) on amd64 (not converted to UEFI, yet).

Hmmm, the code references point into the boot loader code - I had
expected that there is a problem in the kernel, not the boot loader.

> [1] https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56

Seems that setheap has either not been called or has been called with base=0.

> [2] https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688

I had thought, that the zfs boot code has been initialized before the
menu is displayed?

Or do I misunderstand this phase of the boot process???

Regards, STefan


Re: Boot failure: panic: No heap setup

2018-03-27 Thread Kyle Evans
On Tue, Mar 27, 2018 at 11:06 AM, Stefan Esser  wrote:
> A few weeks ago I tried the LUA boot and found, that my kernel did not start
> (i.e. did not print the initial FreeBSD version line), but instead stopped
> with:

Oy =/

> panic: No heap setup
>
> I recovered by booting from an alternate boot device and kept my system
> running until today, where I decided to give the LUA boot another try.
>
> The boot failure happened again, with identical message:
>
> panic: No heap setup

Hmm... that's an sbrk panic [1], indicating that setheap hadn't been
called. gptzfsboot is zfsboot with gpt bits included, so the relevant
setheap call is [2] I believe. It's not immediately clear to me how
switching interpreters could actually be breaking it in this way.

At what point are you hitting this panic? After menu, before kernel transition?

> I tried booting a GENERIC kernel, but only rebuilding the boot loader
> (gptzfsloader in my case) without LUA support fixed the issue for me ...
>
> The system is -CURRENT (built today) on amd64 (not converted to UEFI, yet).

[1] https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
[2] https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688