Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-23 Thread Willem Jan Withagen
On 23-12-2016 22:07, Slawa Olhovchenkov wrote:
> On Fri, Dec 23, 2016 at 09:37:40PM +0100, Willem Jan Withagen wrote:
> 
>> On 23-12-2016 20:30, Slawa Olhovchenkov wrote:
>>> On Fri, Dec 23, 2016 at 08:16:39PM +0100, Willem Jan Withagen wrote:
>>>
>>>> On 23-12-2016 14:26, Slawa Olhovchenkov wrote:
>>>>> On Thu, Dec 22, 2016 at 09:26:02PM +0100, Willem Jan Withagen wrote:
>>>>>
>>>>>> On 16-12-2016 00:57, Adrian Chadd wrote:
>>>>>>> heh, an updated BIOS that solves the problem will solve the problem. :)
>>>>>>>
>>>>>>> I think you have enough information to provide to supermicro. Ie,
>>>>>>> "SMAP says X, when physical memory pages at addresses X are accessed,
>>>>>>> they don't behave like memory, maybe something is wrong".
>>>>>>>
>>>>>>> All I can think of is some hack to add a blacklist for that region so
>>>>>>> you can boot the unit. But it makes me wonder what else is going on.
>>>>>>
>>>>>> I have an X10DRL-iT with 256Gb and 2* 2630V4 available for testing until
>>>>>> begin January. Started it on 11-RELEASE and upgraded to 12-CURRENT of
>>>>>> 20-12-2016.
>>>>>> Boots just fine, and seems to run OKE.
>>>>>>
>>>>>> If anything useful to test, just let me know.
>>>>>
>>>>> For touch issuse you must enable in BIOS both NUMA and Memory
>>>>> Interleave below 4G.
>>>>
>>>> Numa was already on, but I cannot find the Memory Interleave option.
>>>
>>> for X10DRi:
>>>
>>> Advanced/Chipset Config/North Bridge/Memory Config/Socket Interleave below 
>>> 4G
>>
>> The only thing that could be this is:
>>  a7 mode,
>> but that is already enabled.
>> This speaks about a bit higher memory bandwidth.
> 
> In may case A7 immediately below 'Socket Interleave below 4G'

Right, then I do not have this option.

Sorry,
--WjW


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-23 Thread Slawa Olhovchenkov
On Fri, Dec 23, 2016 at 09:37:40PM +0100, Willem Jan Withagen wrote:

> On 23-12-2016 20:30, Slawa Olhovchenkov wrote:
> > On Fri, Dec 23, 2016 at 08:16:39PM +0100, Willem Jan Withagen wrote:
> > 
> >> On 23-12-2016 14:26, Slawa Olhovchenkov wrote:
> >>> On Thu, Dec 22, 2016 at 09:26:02PM +0100, Willem Jan Withagen wrote:
> >>>
> >>>> On 16-12-2016 00:57, Adrian Chadd wrote:
> >>>>> heh, an updated BIOS that solves the problem will solve the problem. :)
> >>>>>
> >>>>> I think you have enough information to provide to supermicro. Ie,
> >>>>> "SMAP says X, when physical memory pages at addresses X are accessed,
> >>>>> they don't behave like memory, maybe something is wrong".
> >>>>>
> >>>>> All I can think of is some hack to add a blacklist for that region so
> >>>>> you can boot the unit. But it makes me wonder what else is going on.
> >>>>
> >>>> I have an X10DRL-iT with 256Gb and 2* 2630V4 available for testing until
> >>>> begin January. Started it on 11-RELEASE and upgraded to 12-CURRENT of
> >>>> 20-12-2016.
> >>>> Boots just fine, and seems to run OKE.
> >>>>
> >>>> If anything useful to test, just let me know.
> >>>
> >>> For touch issuse you must enable in BIOS both NUMA and Memory
> >>> Interleave below 4G.
> >>
> >> Numa was already on, but I cannot find the Memory Interleave option.
> > 
> > for X10DRi:
> > 
> > Advanced/Chipset Config/North Bridge/Memory Config/Socket Interleave below 
> > 4G
> 
> The only thing that could be this is:
>   a7 mode,
> but that is already enabled.
> This speaks about a bit higher memory bandwidth.

In my case, A7 is immediately below 'Socket Interleave below 4G'.

> On the PCIe page ther is something like:
>   above 4G encoding
> but that will probably not be it.
> 
> --WjW
> 


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-23 Thread Willem Jan Withagen
On 23-12-2016 20:30, Slawa Olhovchenkov wrote:
> On Fri, Dec 23, 2016 at 08:16:39PM +0100, Willem Jan Withagen wrote:
> 
>> On 23-12-2016 14:26, Slawa Olhovchenkov wrote:
>>> On Thu, Dec 22, 2016 at 09:26:02PM +0100, Willem Jan Withagen wrote:
>>>
>>>> On 16-12-2016 00:57, Adrian Chadd wrote:
>>>>> heh, an updated BIOS that solves the problem will solve the problem. :)
>>>>>
>>>>> I think you have enough information to provide to supermicro. Ie,
>>>>> "SMAP says X, when physical memory pages at addresses X are accessed,
>>>>> they don't behave like memory, maybe something is wrong".
>>>>>
>>>>> All I can think of is some hack to add a blacklist for that region so
>>>>> you can boot the unit. But it makes me wonder what else is going on.
>>>>
>>>> I have an X10DRL-iT with 256Gb and 2* 2630V4 available for testing until
>>>> begin January. Started it on 11-RELEASE and upgraded to 12-CURRENT of
>>>> 20-12-2016.
>>>> Boots just fine, and seems to run OKE.
>>>>
>>>> If anything useful to test, just let me know.
>>>
>>> For touch issuse you must enable in BIOS both NUMA and Memory
>>> Interleave below 4G.
>>
>> Numa was already on, but I cannot find the Memory Interleave option.
> 
> for X10DRi:
> 
> Advanced/Chipset Config/North Bridge/Memory Config/Socket Interleave below 4G

The only option that could be this is:
  a7 mode,
but that is already enabled.
Its description mentions slightly higher memory bandwidth.

On the PCIe page there is something like:
  above 4G encoding
but that is probably not it.

--WjW



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-23 Thread Slawa Olhovchenkov
On Fri, Dec 23, 2016 at 08:16:39PM +0100, Willem Jan Withagen wrote:

> On 23-12-2016 14:26, Slawa Olhovchenkov wrote:
> > On Thu, Dec 22, 2016 at 09:26:02PM +0100, Willem Jan Withagen wrote:
> > 
> >> On 16-12-2016 00:57, Adrian Chadd wrote:
> >>> heh, an updated BIOS that solves the problem will solve the problem. :)
> >>>
> >>> I think you have enough information to provide to supermicro. Ie,
> >>> "SMAP says X, when physical memory pages at addresses X are accessed,
> >>> they don't behave like memory, maybe something is wrong".
> >>>
> >>> All I can think of is some hack to add a blacklist for that region so
> >>> you can boot the unit. But it makes me wonder what else is going on.
> >>
> >> I have an X10DRL-iT with 256Gb and 2* 2630V4 available for testing until
> >> begin January. Started it on 11-RELEASE and upgraded to 12-CURRENT of
> >> 20-12-2016.
> >> Boots just fine, and seems to run OKE.
> >>
> >> If anything useful to test, just let me know.
> > 
> > For touch issuse you must enable in BIOS both NUMA and Memory
> > Interleave below 4G.
> 
> Numa was already on, but I cannot find the Memory Interleave option.

for X10DRi:

Advanced/Chipset Config/North Bridge/Memory Config/Socket Interleave below 4G


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-23 Thread Willem Jan Withagen
On 23-12-2016 14:26, Slawa Olhovchenkov wrote:
> On Thu, Dec 22, 2016 at 09:26:02PM +0100, Willem Jan Withagen wrote:
> 
>> On 16-12-2016 00:57, Adrian Chadd wrote:
>>> heh, an updated BIOS that solves the problem will solve the problem. :)
>>>
>>> I think you have enough information to provide to supermicro. Ie,
>>> "SMAP says X, when physical memory pages at addresses X are accessed,
>>> they don't behave like memory, maybe something is wrong".
>>>
>>> All I can think of is some hack to add a blacklist for that region so
>>> you can boot the unit. But it makes me wonder what else is going on.
>>
>> I have an X10DRL-iT with 256Gb and 2* 2630V4 available for testing until
>> begin January. Started it on 11-RELEASE and upgraded to 12-CURRENT of
>> 20-12-2016.
>> Boots just fine, and seems to run OKE.
>>
>> If anything useful to test, just let me know.
> 
> For touch issuse you must enable in BIOS both NUMA and Memory
> Interleave below 4G.

NUMA was already on, but I cannot find the Memory Interleave option.

--WjW




Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-23 Thread Slawa Olhovchenkov
On Thu, Dec 22, 2016 at 09:26:02PM +0100, Willem Jan Withagen wrote:

> On 16-12-2016 00:57, Adrian Chadd wrote:
> > heh, an updated BIOS that solves the problem will solve the problem. :)
> > 
> > I think you have enough information to provide to supermicro. Ie,
> > "SMAP says X, when physical memory pages at addresses X are accessed,
> > they don't behave like memory, maybe something is wrong".
> > 
> > All I can think of is some hack to add a blacklist for that region so
> > you can boot the unit. But it makes me wonder what else is going on.
> 
> I have an X10DRL-iT with 256Gb and 2* 2630V4 available for testing until
> begin January. Started it on 11-RELEASE and upgraded to 12-CURRENT of
> 20-12-2016.
> Boots just fine, and seems to run OKE.
> 
> If anything useful to test, just let me know.

To hit the issue you must enable both NUMA and Memory Interleave
below 4G in the BIOS.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-22 Thread Willem Jan Withagen
On 16-12-2016 00:57, Adrian Chadd wrote:
> heh, an updated BIOS that solves the problem will solve the problem. :)
> 
> I think you have enough information to provide to supermicro. Ie,
> "SMAP says X, when physical memory pages at addresses X are accessed,
> they don't behave like memory, maybe something is wrong".
> 
> All I can think of is some hack to add a blacklist for that region so
> you can boot the unit. But it makes me wonder what else is going on.

I have an X10DRL-iT with 256GB and 2* 2630V4 available for testing until
the beginning of January. I started it on 11-RELEASE and upgraded to the
12-CURRENT of 20-12-2016.
It boots just fine and seems to run OK.

If there is anything useful to test, just let me know.

--WjW




Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-21 Thread Steve Wills
Hi,

On 12/21/2016 06:58, Konstantin Belousov wrote:
> What is the exact version of the kernel you are running and which hangs ?

Right now I'm running r310303, booted with SMP disabled.

> Try to bisect.
> 

The issue appeared without updating the OS, but I have since updated.
That said, I have tried booting 10.3 and 9.3 from USB memstick without
success.

> Do you have EARLY_AP_STARTUP option in the kernel config ?
> 

I did, but I have since tried adding nooptions EARLY_AP_STARTUP to my
kernel config (my config includes GENERIC), which didn't affect the issue.

> Send NMI with 'ipmi power diag' and show the machine state from ddb.
> 

This board is the version without ipmi support, so I unfortunately can't
do that.

Steve





Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-21 Thread Konstantin Belousov
On Tue, Dec 20, 2016 at 04:49:29PM -0500, Steve Wills wrote:
> Hi,
> 
> On 12/16/2016 16:20, John Baldwin wrote:
> > On Thursday, December 15, 2016 03:57:58 PM Adrian Chadd wrote:
> >> heh, an updated BIOS that solves the problem will solve the problem. :)
> >>
> >> I think you have enough information to provide to supermicro. Ie,
> >> "SMAP says X, when physical memory pages at addresses X are accessed,
> >> they don't behave like memory, maybe something is wrong".
> >>
> >> All I can think of is some hack to add a blacklist for that region so
> >> you can boot the unit. But it makes me wonder what else is going on.
> > 
> > We have the blacklist: it is the memory test.  That is the way to workaround
> > this type of BIOS breakage.  This is just the first time in over a decade 
> > that
> > test has been relevant.
> 
> I've got a SuperMicro X10SRA board that I bought back in March, I think.
> It was run CURRENT fine since then, until last month, when it started
> hanging during boot. I was about to update it to a new version of
> CURRENT when it started hanging at boot, but hadn't updated yet. The
> hang is after (verbose boot):
What is the exact version of the kernel that you are running and that hangs?
Try to bisect.

Do you have the EARLY_AP_STARTUP option in the kernel config?

> 
> ACPI APIC Table: 
> Package ID shift: 4
> L3 cache ID shift: 4
> L2 cache ID shift: 1
> L1 cache ID shift: 1
> Core ID shift: 1
Send an NMI with 'ipmi power diag' and show the machine state from ddb.

> 
> Recently I've tried booting 9.3 and 10.3 on it without success. Other
> operating systems boot fine. Thinking the hang was similar to the one in
> this thread (or at least the board is), I tried many different BIOS
> changes and also tried enabling the memory test, but none of that
> changes anything. This is a single socket board so there are no NUMA or
> memory interleaving options in the BIOS. The BIOS is up to date (2.0a).
> It will boot if SMP is disabled. That's obviously sub-optimal, but is
> useful for building updated kernels, which I've tried. If anyone has any
> suggestions or ideas, I'd appreciate it.
> 
> Thanks,
> Steve
> 





Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-20 Thread Steve Wills
Hi,

On 12/16/2016 16:20, John Baldwin wrote:
> On Thursday, December 15, 2016 03:57:58 PM Adrian Chadd wrote:
>> heh, an updated BIOS that solves the problem will solve the problem. :)
>>
>> I think you have enough information to provide to supermicro. Ie,
>> "SMAP says X, when physical memory pages at addresses X are accessed,
>> they don't behave like memory, maybe something is wrong".
>>
>> All I can think of is some hack to add a blacklist for that region so
>> you can boot the unit. But it makes me wonder what else is going on.
> 
> We have the blacklist: it is the memory test.  That is the way to workaround
> this type of BIOS breakage.  This is just the first time in over a decade that
> test has been relevant.

I've got a SuperMicro X10SRA board that I bought back in March, I think.
It has run CURRENT fine since then, until last month, when it started
hanging during boot. I was about to update it to a new version of
CURRENT when it started hanging at boot, but hadn't updated yet. The
hang is after (verbose boot):

ACPI APIC Table: 
Package ID shift: 4
L3 cache ID shift: 4
L2 cache ID shift: 1
L1 cache ID shift: 1
Core ID shift: 1

Recently I've tried booting 9.3 and 10.3 on it without success. Other
operating systems boot fine. Thinking the hang was similar to the one in
this thread (or at least the board is), I tried many different BIOS
changes and also tried enabling the memory test, but none of that
changes anything. This is a single socket board so there are no NUMA or
memory interleaving options in the BIOS. The BIOS is up to date (2.0a).
It will boot if SMP is disabled. That's obviously sub-optimal, but is
useful for building updated kernels, which I've tried. If anyone has any
suggestions or ideas, I'd appreciate it.

Thanks,
Steve





Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-16 Thread John Baldwin
On Thursday, December 15, 2016 03:57:58 PM Adrian Chadd wrote:
> heh, an updated BIOS that solves the problem will solve the problem. :)
> 
> I think you have enough information to provide to supermicro. Ie,
> "SMAP says X, when physical memory pages at addresses X are accessed,
> they don't behave like memory, maybe something is wrong".
> 
> All I can think of is some hack to add a blacklist for that region so
> you can boot the unit. But it makes me wonder what else is going on.

We have the blacklist: it is the memory test.  That is the way to work around
this type of BIOS breakage.  This is just the first time in over a decade that
the test has been relevant.
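
To illustrate how a boot-time memory test can act as a blacklist, here is a toy model in Python. It is only a sketch and not the kernel's code: the `RamPage` and `BrokenPage` classes, the test patterns, and the `usable_ranges` helper are all hypothetical names introduced for this example. Each page is probed through a single 32-bit word, and only pages that read back what was written are kept in the list of usable ranges.

```python
# Toy model (not kernel code): probe one 32-bit word per page and keep
# only the pages that behave like RAM. All names here are illustrative.
PAGE_SIZE = 4096
PATTERNS = (0xAAAAAAAA, 0x55555555, 0xFFFFFFFF, 0x00000000)


class RamPage:
    """Page backed by real memory: reads return the last value written."""

    def __init__(self):
        self.word = 0

    def write(self, value):
        self.word = value

    def read(self):
        return self.word


class BrokenPage:
    """Page reported as RAM that ignores writes and reads as all ones."""

    def write(self, value):
        pass

    def read(self):
        return 0xFFFFFFFF


def page_is_good(page):
    """Write each pattern, read it back, then restore the saved word."""
    saved = page.read()
    ok = True
    for pattern in PATTERNS:
        page.write(pattern)
        if page.read() != pattern:
            ok = False
    page.write(saved)
    return ok


def usable_ranges(pages, base=0):
    """Return merged (start, end) ranges covering only the good pages."""
    ranges = []
    for index, page in enumerate(pages):
        if not page_is_good(page):
            continue  # a bad page never enters the list: the "blacklist"
        start = base + index * PAGE_SIZE
        end = start + PAGE_SIZE
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    pages = [RamPage(), RamPage(), BrokenPage(), BrokenPage()]
    print(usable_ranges(pages))
```

In this model the pages that fail the probe are simply left out of the result, so nothing later allocates from them.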

-- 
John Baldwin


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-15 Thread Adrian Chadd
heh, an updated BIOS that solves the problem will solve the problem. :)

I think you have enough information to provide to Supermicro. I.e.,
"SMAP says X, when physical memory pages at addresses X are accessed,
they don't behave like memory, maybe something is wrong".

All I can think of is some hack to add a blacklist for that region so
you can boot the unit. But it makes me wonder what else is going on.


-adrian


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-15 Thread Slawa Olhovchenkov
On Thu, Dec 15, 2016 at 03:56:56PM +0200, Konstantin Belousov wrote:

> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD.  This might be long.
> > 
> > Do you need all (262144?) lines?
> > 
> > Testing system memory
> > pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> > 
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >*/
> > >   *(int *)ptr = tmp;
> > >  
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >   /*
> > >* Adjust array of valid/good pages.
> > 
> > PS: memtest86 hung at test 128-130G (server have 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors.  E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for PCIe config
> window, PCI BARs, APICs, and other i/o register pages.  Intel chipsets
> allow to remap the RAM hidden by the io pages, which is probably not
> done correctly by BIOS.
> 
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM, this is 4G-130G.  Very primitive memory test in kernel does
> not like all pages starting at 129G.  Possibly important detail is that
> kernel memory test only touches first 4 bytes on each page.  So if BIOS
> erronously mapped any io registers into that range, memory test might
> luckily avoid touching anything critical, but still noting that the
> page does not behave as RAM.
> 
> Update BIOS, and if the issue persists, contact supermicro. This
> interesting detail adds even more evidence that BIOS is problematic.

Updating the BIOS doesn't solve this.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-15 Thread Konstantin Belousov
On Thu, Dec 15, 2016 at 04:16:24PM +0300, Slawa Olhovchenkov wrote:
> On Thu, Dec 15, 2016 at 02:33:30PM +0200, Konstantin Belousov wrote:
> 
> > On Thu, Dec 15, 2016 at 01:51:18PM +0300, Slawa Olhovchenkov wrote:
> > > On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:
> > > 
> > > > So my opinion did not changed, this sounds like firmware problem.
> > > > I do not see how can I drill into it more.
> > > 
> > > I am don't know how it related. msgbufp mapped different with and w/o
> > > memory test:
> > > 
> > > w/o memory test, hang:
> > > msgbufp=0xf8207ff0 pa_indx=7 phys_avail[pa_indx]=207ff0
> > > 
> > > w/ memory test, boot:
> > > msgbufp=0xf8203ff0 pa_indx=7 phys_avail[pa_indx]=203ff0
> > Interesting.
> > 
> > Can you show me
> > - the output of the smap command from the loader (yes, I know it was already
> >   shown, I want it in the same mail as the data below for convenience);
> 
> OK set hw.memtest.tests=1
> OK smap
> SMAP type=01 base=0000000000000000 len=0000000000099c00 attr=01
> SMAP type=02 base=0000000000099c00 len=0000000000006400 attr=01
> SMAP type=02 base=00000000000e0000 len=0000000000020000 attr=01
> SMAP type=01 base=0000000000100000 len=000000007906b000 attr=01
> SMAP type=02 base=000000007916b000 len=0000000000936000 attr=01
> SMAP type=04 base=0000000079aa1000 len=0000000000509000 attr=01
> SMAP type=02 base=0000000079faa000 len=0000000002056000 attr=01
> SMAP type=01 base=0000000100000000 len=0000001f80000000 attr=01
> SMAP type=02 base=000000007c000000 len=0000000014000000 attr=01
> SMAP type=02 base=00000000fed1c000 len=0000000000029000 attr=01
> SMAP type=02 base=00000000ff000000 len=0000000001000000 attr=01
> 
> > - the output of sysctl machdep.smap after the succesfull boot with the
> >   memtest enabled.
> 
> machdep.smap:
> SMAP type=01, xattr=01, base=0000000000000000, len=0000000000099c00
> SMAP type=02, xattr=01, base=0000000000099c00, len=0000000000006400
> SMAP type=02, xattr=01, base=00000000000e0000, len=0000000000020000
> SMAP type=01, xattr=01, base=0000000000100000, len=000000007906b000
> SMAP type=02, xattr=01, base=000000007916b000, len=0000000000936000
> SMAP type=04, xattr=01, base=0000000079aa1000, len=0000000000509000
> SMAP type=02, xattr=01, base=0000000079faa000, len=0000000002056000
> SMAP type=01, xattr=01, base=0000000100000000, len=0000001f80000000
> SMAP type=02, xattr=01, base=000000007c000000, len=0000000014000000
> SMAP type=02, xattr=01, base=00000000fed1c000, len=0000000000029000
> SMAP type=02, xattr=01, base=00000000ff000000, len=0000000001000000
> 
> > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > this patch applied against stock HEAD.  This might be long.
> 
> Do you need all (262144?) lines?
> 
> Testing system memory
> pb 0x2040000000
> pb 0x2040001000
> pb 0x2040002000
> pb 0x2040003000
> pb 0x2040004000
> pb 0x2040005000
> pb 0x2040006000
> [...]
> pb 0x207ffff000
> 
> > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > index 682307f5fe4..072c8d76acf 100644
> > --- a/sys/amd64/amd64/machdep.c
> > +++ b/sys/amd64/amd64/machdep.c
> > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> >  */
> > *(int *)ptr = tmp;
> >  
> > +if (page_bad) printf("pb 0x%lx\n", pa);
> >  skip_memtest:
> > /*
> >  * Adjust array of valid/good pages.
> 
> PS: memtest86 hung at test 128-130G (server have 128G installed).
Well, the physical memory is 128G, but it is not mapped contiguously into
the address space accessible to the processors.  E.g. in the SMAPs you
posted above, there are several holes (type 2) used for the PCIe config
window, PCI BARs, APICs, and other i/o register pages.  Intel chipsets
allow remapping the RAM hidden by the i/o pages, which is probably not
done correctly by the BIOS.

The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
by RAM, this is 4G-130G.  The very primitive memory test in the kernel does
not like any of the pages starting at 129G.  A possibly important detail is
that the kernel memory test only touches the first 4 bytes of each page.  So
if the BIOS erroneously mapped any i/o registers into that range, the memory
test might luckily avoid touching anything critical, while still noting that
the page does not behave as RAM.
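
The arithmetic behind these ranges can be checked with a few lines of Python. This is only a sketch under stated assumptions: the segment base 0x100000000 and length 0x1f80000000 are a reading of the high type-1 SMAP entry, whose zero runs appear truncated in the archived copy, and the first flagged page is assumed to be 0x2040000000, one page below the `pb 0x2040001000` line in the test output.

```python
# Sketch: convert the physical addresses from this thread to GiB and count
# the 4 KiB pages in the failing range. All constants are assumptions
# reconstructed from the partly truncated output in the thread.
GIB = 1 << 30
PAGE_SIZE = 4096

seg_base = 0x100000000        # assumed start of the high RAM segment
seg_len = 0x1F80000000        # assumed length of that segment
seg_end = seg_base + seg_len  # first address past the segment

first_bad = 0x2040000000      # assumed first page flagged by the memory test


def to_gib(addr):
    """Return an address expressed in GiB."""
    return addr / GIB


# Number of 4 KiB pages between the first flagged page and the segment end.
bad_pages = (seg_end - first_bad) // PAGE_SIZE

if __name__ == "__main__":
    print(f"segment: {to_gib(seg_base):.0f}G-{to_gib(seg_end):.0f}G")
    print(f"flagged pages start at {to_gib(first_bad):.0f}G")
    print(f"pages from there to the end of the segment: {bad_pages}")
```

Under these assumptions the page count comes out equal to the 262144 lines mentioned earlier in the thread, which is consistent with the whole last gigabyte of the segment failing the test.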

Update the BIOS, and if the issue persists, contact Supermicro. This
interesting detail adds even more evidence that the BIOS is problematic.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-15 Thread Slawa Olhovchenkov
On Thu, Dec 15, 2016 at 02:33:30PM +0200, Konstantin Belousov wrote:

> On Thu, Dec 15, 2016 at 01:51:18PM +0300, Slawa Olhovchenkov wrote:
> > On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:
> > 
> > > So my opinion did not changed, this sounds like firmware problem.
> > > I do not see how can I drill into it more.
> > 
> > I am don't know how it related. msgbufp mapped different with and w/o
> > memory test:
> > 
> > w/o memory test, hang:
> > msgbufp=0xf8207ff0 pa_indx=7 phys_avail[pa_indx]=207ff0
> > 
> > w/ memory test, boot:
> > msgbufp=0xf8203ff0 pa_indx=7 phys_avail[pa_indx]=203ff0
> Interesting.
> 
> Can you show me
> - the output of the smap command from the loader (yes, I know it was already
>   shown, I want it in the same mail as the data below for convenience);

OK set hw.memtest.tests=1
OK smap
SMAP type=01 base=0000000000000000 len=0000000000099c00 attr=01
SMAP type=02 base=0000000000099c00 len=0000000000006400 attr=01
SMAP type=02 base=00000000000e0000 len=0000000000020000 attr=01
SMAP type=01 base=0000000000100000 len=000000007906b000 attr=01
SMAP type=02 base=000000007916b000 len=0000000000936000 attr=01
SMAP type=04 base=0000000079aa1000 len=0000000000509000 attr=01
SMAP type=02 base=0000000079faa000 len=0000000002056000 attr=01
SMAP type=01 base=0000000100000000 len=0000001f80000000 attr=01
SMAP type=02 base=000000007c000000 len=0000000014000000 attr=01
SMAP type=02 base=00000000fed1c000 len=0000000000029000 attr=01
SMAP type=02 base=00000000ff000000 len=0000000001000000 attr=01

> - the output of sysctl machdep.smap after the succesfull boot with the
>   memtest enabled.

machdep.smap:
SMAP type=01, xattr=01, base=0000000000000000, len=0000000000099c00
SMAP type=02, xattr=01, base=0000000000099c00, len=0000000000006400
SMAP type=02, xattr=01, base=00000000000e0000, len=0000000000020000
SMAP type=01, xattr=01, base=0000000000100000, len=000000007906b000
SMAP type=02, xattr=01, base=000000007916b000, len=0000000000936000
SMAP type=04, xattr=01, base=0000000079aa1000, len=0000000000509000
SMAP type=02, xattr=01, base=0000000079faa000, len=0000000002056000
SMAP type=01, xattr=01, base=0000000100000000, len=0000001f80000000
SMAP type=02, xattr=01, base=000000007c000000, len=0000000014000000
SMAP type=02, xattr=01, base=00000000fed1c000, len=0000000000029000
SMAP type=02, xattr=01, base=00000000ff000000, len=0000000001000000
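
A listing like this is easier to read once it is turned into address ranges. The Python sketch below parses SMAP lines in either the loader or the sysctl layout and sums the type-01 (usable RAM) lengths. The helper names `parse_smap` and `usable_bytes` are hypothetical, and the sample lines use full-width hex values that are an assumed reading of three entries from this thread, since the archived copy appears to have lost runs of zeros.

```python
# Sketch: parse "SMAP type=.. base=.. len=.." lines and total the usable RAM.
# The sample values below are assumed full-width readings, not verified data.
import re

GIB = 1 << 30
FIELD_RE = re.compile(r"(\w+)=([0-9a-fA-F]+)")


def parse_smap(text):
    """Return a list of (type, base, length) tuples, one per SMAP line."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("SMAP"):
            continue
        # Works for both "key=value key=value" and "key=value, key=value".
        fields = {key: int(value, 16) for key, value in FIELD_RE.findall(line)}
        entries.append((fields["type"], fields["base"], fields["len"]))
    return entries


def usable_bytes(entries):
    """Sum the lengths of type-1 entries, which describe usable RAM."""
    return sum(length for kind, _base, length in entries if kind == 1)


SAMPLE = """\
SMAP type=01, xattr=01, base=0000000000100000, len=000000007906b000
SMAP type=02, xattr=01, base=000000007916b000, len=0000000000936000
SMAP type=01, xattr=01, base=0000000100000000, len=0000001f80000000
"""

entries = parse_smap(SAMPLE)

if __name__ == "__main__":
    for kind, base, length in entries:
        print(f"type={kind} {base:#x}-{base + length:#x}")
    print(f"usable: {usable_bytes(entries) / GIB:.2f} GiB")
```

Printing the end address of each entry makes it straightforward to see where one range stops and the next begins, which is how the holes between RAM segments show up.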

> Possibly, the dmesg of the boot (with late_console=0) with this and only
> this patch applied against stock HEAD.  This might be long.

Do you need all (262144?) lines?

Testing system memory
pb 0x2040000000
pb 0x2040001000
pb 0x2040002000
pb 0x2040003000
pb 0x2040004000
pb 0x2040005000
pb 0x2040006000
[...]
pb 0x207ffff000

> diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> index 682307f5fe4..072c8d76acf 100644
> --- a/sys/amd64/amd64/machdep.c
> +++ b/sys/amd64/amd64/machdep.c
> @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
>*/
>   *(int *)ptr = tmp;
>  
> +if (page_bad) printf("pb 0x%lx\n", pa);
>  skip_memtest:
>   /*
>* Adjust array of valid/good pages.

PS: memtest86 hung at test 128-130G (the server has 128G installed).


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-15 Thread Konstantin Belousov
On Thu, Dec 15, 2016 at 01:51:18PM +0300, Slawa Olhovchenkov wrote:
> On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:
> 
> > So my opinion did not changed, this sounds like firmware problem.
> > I do not see how can I drill into it more.
> 
> I am don't know how it related. msgbufp mapped different with and w/o
> memory test:
> 
> w/o memory test, hang:
> msgbufp=0xf8207ff0 pa_indx=7 phys_avail[pa_indx]=207ff0
> 
> w/ memory test, boot:
> msgbufp=0xf8203ff0 pa_indx=7 phys_avail[pa_indx]=203ff0
Interesting.

Can you show me
- the output of the smap command from the loader (yes, I know it was already
  shown, I want it in the same mail as the data below for convenience);
- the output of sysctl machdep.smap after a successful boot with the
  memtest enabled.
Possibly also the dmesg of the boot (with late_console=0) with this and only
this patch applied against stock HEAD.  This might be long.

diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index 682307f5fe4..072c8d76acf 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
 */
*(int *)ptr = tmp;
 
+if (page_bad) printf("pb 0x%lx\n", pa);
 skip_memtest:
/*
 * Adjust array of valid/good pages.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-15 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:

> So my opinion did not changed, this sounds like firmware problem.
> I do not see how can I drill into it more.

I don't know how this is related, but msgbufp is mapped differently with
and without the memory test:

w/o memory test, hang:
msgbufp=0xf8207ff0 pa_indx=7 phys_avail[pa_indx]=207ff0

w/ memory test, boot:
msgbufp=0xf8203ff0 pa_indx=7 phys_avail[pa_indx]=203ff0



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread A. Wilcox
On 14/12/16 13:48, Slawa Olhovchenkov wrote:
> On Wed, Dec 14, 2016 at 09:43:24PM +0200, Konstantin Belousov wrote:
> 
>> On Wed, Dec 14, 2016 at 10:29:43PM +0300, Slawa Olhovchenkov wrote:
>>> On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:
>>>
>>>> On Wed, Dec 14, 2016 at 06:26:27PM +0300, Slawa Olhovchenkov wrote:
>>>>> For test hardware setup (NUMA+interleave), what ISO I can try to boot?
>>>> Didn't you already tried ?
>>>
>>> Different from FreeBSD.
>> Can you reformulate the statement ?
>> Did you booted some other (non-FreeBSD) OS and it hung with that options
>> as well ?
> 
> No, I don't try now, can you advice some OS for test?

Ugh, Supermicro is a big pain.

Try CentOS, also try Debian.  Just to see.  Maybe you get lucky, and one
of them hangs too... Debian usually runs older kernels, so more likely
to not have a workaround.

Best solution: new mainboard vendor, until Supermicro works out their
dumb firmware and makes it less dumb. :(

--arw


-- 
A. Wilcox (awilfox)
Open-source programmer (C, C++, Python)
https://code.foxkit.us/u/awilfox/





Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 09:43:24PM +0200, Konstantin Belousov wrote:

> On Wed, Dec 14, 2016 at 10:29:43PM +0300, Slawa Olhovchenkov wrote:
> > On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:
> > 
> > > On Wed, Dec 14, 2016 at 06:26:27PM +0300, Slawa Olhovchenkov wrote:
> > > > For test hardware setup (NUMA+interleave), what ISO I can try to boot?
> > > Didn't you already tried ?
> > 
> > Different from FreeBSD.
> Can you reformulate the statement ?
> Did you booted some other (non-FreeBSD) OS and it hung with that options
> as well ?

No, I haven't tried yet. Can you suggest an OS to test?

> > For sure about firmware problem and complains to Supermicro.
> > I think FreeBSD problem don't be accepted by Supermicro.
> You never know.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Konstantin Belousov
On Wed, Dec 14, 2016 at 10:29:43PM +0300, Slawa Olhovchenkov wrote:
> On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:
> 
> > On Wed, Dec 14, 2016 at 06:26:27PM +0300, Slawa Olhovchenkov wrote:
> > > For test hardware setup (NUMA+interleave), what ISO I can try to boot?
> > Didn't you already tried ?
> 
> Different from FreeBSD.
Can you rephrase that?
Did you boot some other (non-FreeBSD) OS, and did it hang with those options
as well?

> For sure about firmware problem and complains to Supermicro.
> I think FreeBSD problem don't be accepted by Supermicro.
You never know.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 09:03:49PM +0200, Konstantin Belousov wrote:

> On Wed, Dec 14, 2016 at 06:26:27PM +0300, Slawa Olhovchenkov wrote:
> > On Wed, Dec 14, 2016 at 03:13:36PM +0300, Slawa Olhovchenkov wrote:
> > 
> > > On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:
> > > 
> > > > In other words, it is almost certainly the hang and not a fault causing
> > > > hang. This means that the machine is not compliant with the IA32
> > > > architecture, in particular, the region reported as normal memory by
> > > > E820 BIOS service does not behave as normal memory.
> > > > 
> > > > Since regardless of the option setting, the memory map is same, and
> > > > bootstrap page table only depend on the memory map, we use the same page
> > > > table when hanging and when operating correctly. We do not fault or hang
> > > > when the option is turned off, which together with the improved early
> > > > fault handling in the patch, makes it almost certain that the problem is
> > > > in hardware configuration and not in our early setup.
> > > > 
> > > > Of course, the most puzzling part is that memory test makes the hang
> > > > go away, while repeating memory test operation only on the msgbuf region
> > > > does not. msgbuf is special in that it is located at TOHM (top of high
> > > > memory). It spans 128KB from below it to the last byte of the last
> > > > physical segment.
> > > > 
> > > > The only ideas I have right now is that there is either a bug in the
> > > > Caching Agent/Home agent/IMC configuration in BIOS, in which case there
> > > > is nothing OS can do to mitigate it.  Or it might be that the memory
> > > > map reported by CMS is wrong (you said that you use legacy boot, right
> > > > ?).  This is not too surprising if true, because non-EFI boot code path
> > > > definitely get less and less testing.
> > > > 
> > > > For the later case (potential bug in CMS), could you switch to EFI boot
> > > > mode and see whether the issue magically healths itself ?  You could boot
> > > > from USB stick in EFI mode without reinstalling for test.
> > > 
> > > I can't boot from USB stick -- this is remote DC and IPMI allow only
> > > CDROM emulation.
> > > 
> > > OK, I am boot in UEFI 12.0 snapshot ISO.
> > > Boot ok.
> > 
> > Sorry. Overload bu work and test wrong combination (NUMA=ON,
> > interleave=OFF)
> > 
> > snapshot iso don't boot with NUMA=ON interleave=ON
> Ok.
> 
> > 
> > For test hardware setup (NUMA+interleave), what ISO I can try to boot?
> Didn't you already tried ?

I mean an OS other than FreeBSD, to be sure that this is a firmware
problem before I complain to Supermicro.
I don't think Supermicro will accept a report of a FreeBSD-only problem.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Konstantin Belousov
On Wed, Dec 14, 2016 at 06:26:27PM +0300, Slawa Olhovchenkov wrote:
> On Wed, Dec 14, 2016 at 03:13:36PM +0300, Slawa Olhovchenkov wrote:
> 
> > On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:
> > 
> > > In other words, it is almost certainly the hang and not a fault causing
> > > hang. This means that the machine is not compliant with the IA32
> > > architecture, in particular, the region reported as normal memory by
> > > E820 BIOS service does not behave as normal memory.
> > > 
> > > Since regardless of the option setting, the memory map is same, and
> > > bootstrap page table only depend on the memory map, we use the same page
> > > table when hanging and when operating correctly. We do not fault or hang
> > > when the option is turned off, which together with the improved early
> > > fault handling in the patch, makes it almost certain that the problem is
> > > in hardware configuration and not in our early setup.
> > > 
> > > Of course, the most puzzling part is that memory test makes the hang
> > > go away, while repeating memory test operation only on the msgbuf region
> > > does not. msgbuf is special in that it is located at TOHM (top of high
> > > memory). It spans 128KB from below it to the last byte of the last
> > > physical segment.
> > > 
> > > The only ideas I have right now is that there is either a bug in the
> > > Caching Agent/Home agent/IMC configuration in BIOS, in which case there
> > > is nothing OS can do to mitigate it.  Or it might be that the memory
> > > map reported by CMS is wrong (you said that you use legacy boot, right
> > > ?).  This is not too surprising if true, because non-EFI boot code path
> > > definitely get less and less testing.
> > > 
> > > For the later case (potential bug in CMS), could you switch to EFI boot
> > > mode and see whether the issue magically healths itself ?  You could boot
> > > from USB stick in EFI mode without reinstalling for test.
> > 
> > I can't boot from USB stick -- this is remote DC and IPMI allow only
> > CDROM emulation.
> > 
> > OK, I am boot in UEFI 12.0 snapshot ISO.
> > Boot ok.
> 
> Sorry. Overload bu work and test wrong combination (NUMA=ON,
> interleave=OFF)
> 
> snapshot iso don't boot with NUMA=ON interleave=ON
Ok.

> 
> For test hardware setup (NUMA+interleave), what ISO I can try to boot?
Didn't you already try?

> 
> PS: memmaps:
> 
> NUMA=ON interleave=OFF
> OK memmap
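>                    Type     Physical      Virtual   #Pages Attr
>        BootServicesCode 000000000000 000000000000 00000008 UC WC WT WB
>      ConventionalMemory 000000008000 000000000000 00000027 UC WC WT WB
>        BootServicesData 00000002f000 000000000000 00000011 UC WC WT WB
>        BootServicesCode 000000040000 000000000000 00000060 UC WC WT WB
>      ConventionalMemory 000000100000 000000000000 000660a3 UC WC WT WB
>        BootServicesData 0000661a3000 000000000000 00000080 UC WC WT WB
>      ConventionalMemory 000066223000 000000000000 000076b8 UC WC WT WB
>              LoaderData 00006d8db000 000000000000 00008000 UC WC WT WB
>              LoaderCode 0000758db000 000000000000 00000070 UC WC WT WB
>        BootServicesData 00007594b000 000000000000 00003220 UC WC WT WB
>      ConventionalMemory 000078b6b000 000000000000 0000028e UC WC WT WB
>        BootServicesCode 000078df9000 000000000000 00000372 UC WC WT WB
>                Reserved 00007916b000 000000000000 00000817 UC WC WT WB
>      ConventionalMemory 000079982000 000000000000 0000011f UC WC WT WB
>           ACPIMemoryNVS 000079aa1000 000000000000 00000509 UC WC WT WB
>     RuntimeServicesData 000079faa000 000000000000 00001dbd UC WC WT WB
>     RuntimeServicesCode 00007bd67000 000000000000 00000061 UC WC WT WB
>        BootServicesData 00007bdc8000 000000000000 00000001 UC WC WT WB
>     RuntimeServicesData 00007bdc9000 000000000000 00000086 UC WC WT WB
>        BootServicesData 00007be4f000 000000000000 000001b1 UC WC WT WB
>      ConventionalMemory 000100000000 000000000000 01f80000 UC WC WT WB
>                Reserved 00007c000000 000000000000 00004000
>          MemoryMappedIO 000080000000 000000000000 00010000 UC
>          MemoryMappedIO 0000fed1c000 000000000000 00000029 UC
>          MemoryMappedIO 0000ff000000 000000000000 00001000 UC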
> 
> NUMA=ON interleave=ON
>                    Type     Physical      Virtual   #Pages Attr
>        BootServicesCode 000000000000 000000000000 00000008 UC WC WT WB
>      ConventionalMemory 000000008000 000000000000 00000027 UC WC WT WB
>        BootServicesData 00000002f000 000000000000 00000011 UC WC WT WB
>        BootServicesCode 000000040000 000000000000 00000060 UC WC WT WB
>      ConventionalMemory 000000100000 000000000000 000660a3 UC WC WT WB
>        BootServicesData 0000661a3000 000000000000 00000080 UC WC WT WB
>      ConventionalMemory 000066223000 000000000000 000076b8 UC WC WT WB
>              LoaderData 00006d8db000 000000000000 00008000 UC WC WT WB
>              LoaderCode 0000758db000

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 03:13:36PM +0300, Slawa Olhovchenkov wrote:

> On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:
> 
> > In other words, it is almost certainly the hang and not a fault causing
> > hang. This means that the machine is not compliant with the IA32
> > architecture, in particular, the region reported as normal memory by
> > E820 BIOS service does not behave as normal memory.
> > 
> > Since regardless of the option setting, the memory map is same, and
> > bootstrap page table only depend on the memory map, we use the same page
> > table when hanging and when operating correctly. We do not fault or hang
> > when the option is turned off, which together with the improved early
> > fault handling in the patch, makes it almost certain that the problem is
> > in hardware configuration and not in our early setup.
> > 
> > Of course, the most puzzling part is that memory test makes the hang
> > go away, while repeating memory test operation only on the msgbuf region
> > does not. msgbuf is special in that it is located at TOHM (top of high
> > memory). It spans 128KB from below it to the last byte of the last
> > physical segment.
> > 
> > The only ideas I have right now is that there is either a bug in the
> > Caching Agent/Home agent/IMC configuration in BIOS, in which case there
> > is nothing OS can do to mitigate it.  Or it might be that the memory
> > map reported by CMS is wrong (you said that you use legacy boot, right
> > ?).  This is not too surprising if true, because non-EFI boot code path
> > definitely get less and less testing.
> > 
> > For the later case (potential bug in CMS), could you switch to EFI boot
> > mode and see whether the issue magically healths itself ?  You could boot
> > from USB stick in EFI mode without reinstalling for test.
> 
> I can't boot from USB stick -- this is remote DC and IPMI allow only
> CDROM emulation.
> 
> OK, I am boot in UEFI 12.0 snapshot ISO.
> Boot ok.

Sorry, I was overloaded with work and tested the wrong combination
(NUMA=ON, interleave=OFF).

The snapshot ISO does not boot with NUMA=ON interleave=ON.

To test the hardware setup (NUMA+interleave), what ISO can I try to boot?

PS: memmaps:

NUMA=ON interleave=OFF
OK memmap
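                   Type     Physical      Virtual   #Pages Attr
       BootServicesCode 000000000000 000000000000 00000008 UC WC WT WB
     ConventionalMemory 000000008000 000000000000 00000027 UC WC WT WB
       BootServicesData 00000002f000 000000000000 00000011 UC WC WT WB
       BootServicesCode 000000040000 000000000000 00000060 UC WC WT WB
     ConventionalMemory 000000100000 000000000000 000660a3 UC WC WT WB
       BootServicesData 0000661a3000 000000000000 00000080 UC WC WT WB
     ConventionalMemory 000066223000 000000000000 000076b8 UC WC WT WB
             LoaderData 00006d8db000 000000000000 00008000 UC WC WT WB
             LoaderCode 0000758db000 000000000000 00000070 UC WC WT WB
       BootServicesData 00007594b000 000000000000 00003220 UC WC WT WB
     ConventionalMemory 000078b6b000 000000000000 0000028e UC WC WT WB
       BootServicesCode 000078df9000 000000000000 00000372 UC WC WT WB
               Reserved 00007916b000 000000000000 00000817 UC WC WT WB
     ConventionalMemory 000079982000 000000000000 0000011f UC WC WT WB
          ACPIMemoryNVS 000079aa1000 000000000000 00000509 UC WC WT WB
    RuntimeServicesData 000079faa000 000000000000 00001dbd UC WC WT WB
    RuntimeServicesCode 00007bd67000 000000000000 00000061 UC WC WT WB
       BootServicesData 00007bdc8000 000000000000 00000001 UC WC WT WB
    RuntimeServicesData 00007bdc9000 000000000000 00000086 UC WC WT WB
       BootServicesData 00007be4f000 000000000000 000001b1 UC WC WT WB
     ConventionalMemory 000100000000 000000000000 01f80000 UC WC WT WB
               Reserved 00007c000000 000000000000 00004000
         MemoryMappedIO 000080000000 000000000000 00010000 UC
         MemoryMappedIO 0000fed1c000 000000000000 00000029 UC
         MemoryMappedIO 0000ff000000 000000000000 00001000 UC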

NUMA=ON interleave=ON
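                   Type     Physical      Virtual   #Pages Attr
       BootServicesCode 000000000000 000000000000 00000008 UC WC WT WB
     ConventionalMemory 000000008000 000000000000 00000027 UC WC WT WB
       BootServicesData 00000002f000 000000000000 00000011 UC WC WT WB
       BootServicesCode 000000040000 000000000000 00000060 UC WC WT WB
     ConventionalMemory 000000100000 000000000000 000660a3 UC WC WT WB
       BootServicesData 0000661a3000 000000000000 00000080 UC WC WT WB
     ConventionalMemory 000066223000 000000000000 000076b8 UC WC WT WB
             LoaderData 00006d8db000 000000000000 00008000 UC WC WT WB
             LoaderCode 0000758db000 000000000000 00000070 UC WC WT WB
       BootServicesData 00007594b000 000000000000 00003220 UC WC WT WB
     ConventionalMemory 000078b6b000 000000000000 0000028e UC WC WT WB
       BootServicesCode 000078df9000 000000000000 00000372 UC WC WT WB
               Reserved 00007916b000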

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 04:40:33PM +0200, Konstantin Belousov wrote:

> On Wed, Dec 14, 2016 at 03:13:36PM +0300, Slawa Olhovchenkov wrote:
> > On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:
> > 
> > > In other words, it is almost certainly the hang and not a fault causing
> > > hang. This means that the machine is not compliant with the IA32
> > > architecture, in particular, the region reported as normal memory by
> > > E820 BIOS service does not behave as normal memory.
> > > 
> > > Since regardless of the option setting, the memory map is same, and
> > > bootstrap page table only depend on the memory map, we use the same page
> > > table when hanging and when operating correctly. We do not fault or hang
> > > when the option is turned off, which together with the improved early
> > > fault handling in the patch, makes it almost certain that the problem is
> > > in hardware configuration and not in our early setup.
> > > 
> > > Of course, the most puzzling part is that memory test makes the hang
> > > go away, while repeating memory test operation only on the msgbuf region
> > > does not. msgbuf is special in that it is located at TOHM (top of high
> > > memory). It spans 128KB from below it to the last byte of the last
> > > physical segment.
> > > 
> > > The only ideas I have right now is that there is either a bug in the
> > > Caching Agent/Home agent/IMC configuration in BIOS, in which case there
> > > is nothing OS can do to mitigate it.  Or it might be that the memory
> > > map reported by CMS is wrong (you said that you use legacy boot, right
> > > ?).  This is not too surprising if true, because non-EFI boot code path
> > > definitely get less and less testing.
> > > 
> > > For the later case (potential bug in CMS), could you switch to EFI boot
> > > mode and see whether the issue magically healths itself ?  You could boot
> > > from USB stick in EFI mode without reinstalling for test.
> > 
> > I can't boot from USB stick -- this is remote DC and IPMI allow only
> > CDROM emulation.
> > 
> > OK, I am boot in UEFI 12.0 snapshot ISO.
> > Boot ok.
> > 
> > Can I convert installed OS to UEFI mode?
> I am not sure what do you ask there.  Are you asking whether I need any
> further information from the broken setup ?  I believe that no, I cannot
> debug this any further.

I have not touched UEFI before. I am trying to find out how to switch an
existing installation from legacy boot to UEFI boot (to use a less
broken setup).

[Maybe NUMA+interleaving will not give me any benefit, but I need to
test to be sure.]

> I think that the interesting piece of data that can be obtained now is
> the memmap command output from the EFI loader from all three configurations,
> NUMA on/off and interleaving.

What do you mean by 'EFI loader'?
The FreeBSD loader for UEFI mode?
Or the UEFI shell from the BIOS?

> > 
> > > Do you use latest BIOS for your motherboard ?
> > 
> > This is new MB (X10DRi) w/ BIOS 2.0, new is 2.1 but update is not
> > simple (need to prepare bootable dos ISO, mostly utilites don't work
> > under FreeBSD).
> IMO the only way to fix this issue, if it is really important, is
> to contact supermicro and show them the bug.  But this only makes sense if
> repeated on the latest firmware version.



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Konstantin Belousov
On Wed, Dec 14, 2016 at 03:13:36PM +0300, Slawa Olhovchenkov wrote:
> On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:
> 
> > In other words, it is almost certainly the hang and not a fault causing
> > hang. This means that the machine is not compliant with the IA32
> > architecture, in particular, the region reported as normal memory by
> > E820 BIOS service does not behave as normal memory.
> > 
> > Since regardless of the option setting, the memory map is same, and
> > bootstrap page table only depend on the memory map, we use the same page
> > table when hanging and when operating correctly. We do not fault or hang
> > when the option is turned off, which together with the improved early
> > fault handling in the patch, makes it almost certain that the problem is
> > in hardware configuration and not in our early setup.
> > 
> > Of course, the most puzzling part is that memory test makes the hang
> > go away, while repeating memory test operation only on the msgbuf region
> > does not. msgbuf is special in that it is located at TOHM (top of high
> > memory). It spans 128KB from below it to the last byte of the last
> > physical segment.
> > 
> > The only ideas I have right now is that there is either a bug in the
> > Caching Agent/Home agent/IMC configuration in BIOS, in which case there
> > is nothing OS can do to mitigate it.  Or it might be that the memory
> > map reported by CMS is wrong (you said that you use legacy boot, right
> > ?).  This is not too surprising if true, because non-EFI boot code path
> > definitely get less and less testing.
> > 
> > For the later case (potential bug in CMS), could you switch to EFI boot
> > mode and see whether the issue magically healths itself ?  You could boot
> > from USB stick in EFI mode without reinstalling for test.
> 
> I can't boot from USB stick -- this is remote DC and IPMI allow only
> CDROM emulation.
> 
> OK, I am boot in UEFI 12.0 snapshot ISO.
> Boot ok.
> 
> Can I convert installed OS to UEFI mode?
I am not sure what you are asking here.  Are you asking whether I need
any further information from the broken setup?  I believe not; I cannot
debug this any further.

I think that the interesting piece of data that can be obtained now is
the memmap command output from the EFI loader from all three configurations,
NUMA on/off and interleaving.
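
Once the memmap outputs are collected, comparing them row by row is
mechanical. Below is a minimal sketch in C, not code from the loader or
the kernel: struct memmap_row, row_end() and memmap_first_diff() are
hypothetical names made up for this illustration, and the only outside
fact it relies on is that the #Pages column counts 4KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFI_PAGE_BYTES 4096u	/* #Pages counts 4KiB pages */

/* Hypothetical mirror of one row of the memmap output. */
struct memmap_row {
	const char *type;	/* e.g. "ConventionalMemory" */
	uint64_t phys;		/* "Physical" column */
	uint64_t pages;		/* "#Pages" column */
};

/* First physical address past the end of the row. */
static uint64_t
row_end(const struct memmap_row *r)
{
	return (r->phys + r->pages * EFI_PAGE_BYTES);
}

/*
 * Return the index of the first row at which the two maps differ
 * (including one map simply being shorter), or -1 if they are equal.
 */
static long
memmap_first_diff(const struct memmap_row *a, size_t na,
    const struct memmap_row *b, size_t nb)
{
	size_t i, n = na < nb ? na : nb;

	assert(a != NULL && b != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(a[i].type, b[i].type) != 0 ||
		    a[i].phys != b[i].phys || a[i].pages != b[i].pages)
			return ((long)i);
	}
	return (na == nb ? -1 : (long)n);
}
```

A result of -1 for two configurations would mean the firmware reports
the same layout in both, so any difference in behaviour must come from
somewhere other than the reported map. row_end() is handy for checking
that consecutive rows are contiguous.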

> 
> > Do you use latest BIOS for your motherboard ?
> 
> This is new MB (X10DRi) w/ BIOS 2.0, new is 2.1 but update is not
> simple (need to prepare bootable dos ISO, mostly utilites don't work
> under FreeBSD).
IMO the only way to fix this issue, if it is really important, is
to contact Supermicro and show them the bug.  But this only makes sense if
it is reproduced on the latest firmware version.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:

> In other words, it is almost certainly the hang and not a fault causing
> hang. This means that the machine is not compliant with the IA32
> architecture, in particular, the region reported as normal memory by
> E820 BIOS service does not behave as normal memory.
> 
> Since regardless of the option setting, the memory map is same, and
> bootstrap page table only depend on the memory map, we use the same page
> table when hanging and when operating correctly. We do not fault or hang
> when the option is turned off, which together with the improved early
> fault handling in the patch, makes it almost certain that the problem is
> in hardware configuration and not in our early setup.
> 
> Of course, the most puzzling part is that memory test makes the hang
> go away, while repeating memory test operation only on the msgbuf region
> does not. msgbuf is special in that it is located at TOHM (top of high
> memory). It spans 128KB from below it to the last byte of the last
> physical segment.
> 
> The only ideas I have right now is that there is either a bug in the
> Caching Agent/Home agent/IMC configuration in BIOS, in which case there
> is nothing OS can do to mitigate it.  Or it might be that the memory
> map reported by CMS is wrong (you said that you use legacy boot, right
> ?).  This is not too surprising if true, because non-EFI boot code path
> definitely get less and less testing.
> 
> For the later case (potential bug in CMS), could you switch to EFI boot
> mode and see whether the issue magically healths itself ?  You could boot
> from USB stick in EFI mode without reinstalling for test.

I can't boot from a USB stick: this is a remote DC, and the IPMI allows
only CD-ROM emulation.

OK, I booted the 12.0 snapshot ISO in UEFI mode.
It boots OK.

Can I convert the installed OS to UEFI mode?

> Do you use latest BIOS for your motherboard ?

This is a new motherboard (X10DRi) with BIOS 2.0. The newest is 2.1, but
updating is not simple (I need to prepare a bootable DOS ISO, and most of
the utilities don't work under FreeBSD).


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Konstantin Belousov
On Wed, Dec 14, 2016 at 01:52:11PM +0300, Slawa Olhovchenkov wrote:
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
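> SMAP type=01 base=0000000000000000 len=0000000000099c00
> SMAP type=02 base=0000000000099c00 len=0000000000006400
> SMAP type=02 base=00000000000e0000 len=0000000000020000
> SMAP type=01 base=0000000000100000 len=000000007906b000
> SMAP type=02 base=000000007916b000 len=0000000000936000
> SMAP type=04 base=0000000079aa1000 len=0000000000509000
> SMAP type=02 base=0000000079faa000 len=0000000002056000
> SMAP type=01 base=0000000100000000 len=0000001f80000000
> SMAP type=02 base=000000007c000000 len=0000000014000000
> SMAP type=02 base=00000000fed1c000 len=0000000000029000
> SMAP type=02 base=00000000ff000000 len=0000000001000000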
> TTT1 0xf8207ff0 0xf8207fb8 10
> . 0
> . 1000
> . 2000
> . 3000
> . 4000
> . 5000
> . 6000
> . 7000
> . 8000
> . 9000
> . a000
> . b000
> . c000
> . d000
> . e000
> . f000
> . 10000
> . 11000
> . 12000
> . 13000
> . 14000
> . 15000
> . 16000
> . 17000
> . 18000
> . 19000
> . 1a000
> . 1b000
> . 1c000
> . 1d000
> . 1e000
> . 1f000
> . 20000
> . 21000
> . 22000
> . 23000
> . 24000
> . 25000
> . 26000
> . 27000
> . 28000
> . 29000
> . 2a000
> . 2b000

In other words, it is almost certainly a genuine hang and not a fault
that causes a hang. This means that the machine is not compliant with
the IA32 architecture; in particular, the region reported as normal
memory by the E820 BIOS service does not behave as normal memory.

Since the memory map is the same regardless of the option setting, and
the bootstrap page tables depend only on the memory map, we use the same
page tables when hanging and when operating correctly. We do not fault
or hang when the option is turned off, which, together with the improved
early fault handling in the patch, makes it almost certain that the
problem is in the hardware configuration and not in our early setup.

Of course, the most puzzling part is that the memory test makes the hang
go away, while repeating the memory test operation only on the msgbuf
region does not. msgbuf is special in that it is located at TOHM (top of
high memory): it spans the last 128KB of the last physical segment, up
to and including the segment's last byte.
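
To make the placement concrete, here is a minimal sketch in C of where
such a window falls for a given SMAP-style map. It is an illustration
only, not the kernel's code: struct smap_entry and msgbuf_window() are
hypothetical names, and the sketch assumes that "last physical segment"
means the usable (type=01) entry with the highest end address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMAP_TYPE_MEMORY 1		/* usable RAM ("type=01") */
#define MSGBUF_BYTES (128u * 1024u)	/* 128KB, as described above */

/* Hypothetical mirror of one "SMAP type=.. base=.. len=.." line. */
struct smap_entry {
	uint64_t base;
	uint64_t len;
	uint32_t type;
};

/*
 * Find the MSGBUF_BYTES window that ends at the top of the highest
 * usable segment.  Stores its first physical address in *start and
 * returns 0, or returns -1 if no usable segment is large enough.
 */
static int
msgbuf_window(const struct smap_entry *map, size_t n, uint64_t *start)
{
	uint64_t end, top = 0;
	size_t i;

	assert(map != NULL && start != NULL);
	for (i = 0; i < n; i++) {
		if (map[i].type != SMAP_TYPE_MEMORY ||
		    map[i].len < MSGBUF_BYTES)
			continue;
		end = map[i].base + map[i].len;
		if (end > top)
			top = end;
	}
	if (top == 0)
		return (-1);
	*start = top - MSGBUF_BYTES;
	return (0);
}
```

For a usable segment with base 0x100000000 and length 0x1f80000000,
for example, the window would start at 0x207ffe0000 and end with the
segment's last byte.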

The only ideas I have right now are these. Either there is a bug in the
Caching Agent/Home Agent/IMC configuration in the BIOS, in which case
there is nothing the OS can do to mitigate it. Or the memory map
reported by the CSM is wrong (you said that you use legacy boot, right?).
That would not be too surprising, because the non-EFI boot code path
definitely gets less and less testing.

For the latter case (a potential bug in the CSM), could you switch to EFI
boot mode and see whether the issue magically heals itself? You could
boot from a USB stick in EFI mode without reinstalling, just for the test.

Do you use the latest BIOS for your motherboard?


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Slawa Olhovchenkov
On Wed, Dec 14, 2016 at 12:27:11PM +0200, Konstantin Belousov wrote:

> On Wed, Dec 14, 2016 at 11:53:50AM +0200, Konstantin Belousov wrote:
> > On Tue, Dec 13, 2016 at 08:43:45PM +0300, Slawa Olhovchenkov wrote:
> > > On Tue, Dec 13, 2016 at 07:25:29PM +0200, Konstantin Belousov wrote:
> > > 
> > > > This is not what I expected.
> > > > Also, I realized that I mis-read the memory test code.  It does not
> > > > obliterate memory, old content is preserved.
> > > > 
> > > > Please do exactly the same testing with another patch, at the end of the
> > > > message.  There could be more output, up to 256 lines.
> > > 
> > > No problem.
> > > 
> > > Booting...
> > > KDB: debugger backends: ddb
> > > KDB: current backend: ddb
> > > SMAP type=01 base=0000000000000000 len=0000000000099c00
> > > SMAP type=02 base=0000000000099c00 len=0000000000006400
> > > SMAP type=02 base=00000000000e0000 len=0000000000020000
> > > SMAP type=01 base=0000000000100000 len=000000007906b000
> > > SMAP type=02 base=000000007916b000 len=0000000000936000
> > > SMAP type=04 base=0000000079aa1000 len=0000000000509000
> > > SMAP type=02 base=0000000079faa000 len=0000000002056000
> > > SMAP type=01 base=0000000100000000 len=0000001f80000000
> > > SMAP type=02 base=000000007c000000 len=0000000014000000
> > > SMAP type=02 base=00000000fed1c000 len=0000000000029000
> > > SMAP type=02 base=00000000ff000000 len=0000000001000000
> > > TTT1 0xf8207ff0 0xf8207fb8 10
> > > . 0
> > > . 1000
> > > . 2000
> > > . 3000
> > > . 4000
> > > . 5000
> > > . 6000
> > > . 7000
> > > . 8000
> > > . 9000
> > > . a000
> > > . b000
> > > . c000
> > > . d000
> > > . e000
> > > . f000
> > > . 10000
> > > . 11000
> > > . 12000
> > > . 13000
> > > . 14000
> > > . 15000
> > > . 16000
> > > . 17000
> > > . 18000
> > > . 19000
> > > . 1a000
> > > . 1b000
> > > . 1c000
> > > . 1d000
> > > . 1e000
> > > . 1f000
> > > . 20000
> > > . 21000
> > > . 22000
> > > . 23000
> > > . 24000
> > > . 25000
> > > . 26000
> > > . 27000
> > > . 28000
> > > . 29000
> > > . 2a000
> > > . 2b000
> > > 
> > 
> > Do you still have access to the machine ?
> > If yes, please try this patch (against clean tree, as always) with the
> > same instructions as before.
> > 
> 
> Updated patch, it should provide the expected information in case of
> page fault.

Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
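SMAP type=01 base=0000000000000000 len=0000000000099c00
SMAP type=02 base=0000000000099c00 len=0000000000006400
SMAP type=02 base=00000000000e0000 len=0000000000020000
SMAP type=01 base=0000000000100000 len=000000007906b000
SMAP type=02 base=000000007916b000 len=0000000000936000
SMAP type=04 base=0000000079aa1000 len=0000000000509000
SMAP type=02 base=0000000079faa000 len=0000000002056000
SMAP type=01 base=0000000100000000 len=0000001f80000000
SMAP type=02 base=000000007c000000 len=0000000014000000
SMAP type=02 base=00000000fed1c000 len=0000000000029000
SMAP type=02 base=00000000ff000000 len=0000000001000000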
TTT1 0xf8207ff0 0xf8207fb8 10
. 0
. 1000
. 2000
. 3000
. 4000
. 5000
. 6000
. 7000
. 8000
. 9000
. a000
. b000
. c000
. d000
. e000
. f000
. 10000
. 11000
. 12000
. 13000
. 14000
. 15000
. 16000
. 17000
. 18000
. 19000
. 1a000
. 1b000
. 1c000
. 1d000
. 1e000
. 1f000
. 20000
. 21000
. 22000
. 23000
. 24000
. 25000
. 26000
. 27000
. 28000
. 29000
. 2a000
. 2b000



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Konstantin Belousov
On Wed, Dec 14, 2016 at 11:53:50AM +0200, Konstantin Belousov wrote:
> On Tue, Dec 13, 2016 at 08:43:45PM +0300, Slawa Olhovchenkov wrote:
> > On Tue, Dec 13, 2016 at 07:25:29PM +0200, Konstantin Belousov wrote:
> > 
> > > This is not what I expected.
> > > Also, I realized that I mis-read the memory test code.  It does not
> > > obliterate memory, old content is preserved.
> > > 
> > > Please do exactly the same testing with another patch, at the end of the
> > > message.  There could be more output, up to 256 lines.
> > 
> > No problem.
> > 
> > Booting...
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
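> > SMAP type=01 base=0000000000000000 len=0000000000099c00
> > SMAP type=02 base=0000000000099c00 len=0000000000006400
> > SMAP type=02 base=00000000000e0000 len=0000000000020000
> > SMAP type=01 base=0000000000100000 len=000000007906b000
> > SMAP type=02 base=000000007916b000 len=0000000000936000
> > SMAP type=04 base=0000000079aa1000 len=0000000000509000
> > SMAP type=02 base=0000000079faa000 len=0000000002056000
> > SMAP type=01 base=0000000100000000 len=0000001f80000000
> > SMAP type=02 base=000000007c000000 len=0000000014000000
> > SMAP type=02 base=00000000fed1c000 len=0000000000029000
> > SMAP type=02 base=00000000ff000000 len=0000000001000000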
> > TTT1 0xf8207ff0 0xf8207fb8 10
> > . 0
> > . 1000
> > . 2000
> > . 3000
> > . 4000
> > . 5000
> > . 6000
> > . 7000
> > . 8000
> > . 9000
> > . a000
> > . b000
> > . c000
> > . d000
> > . e000
> > . f000
> > . 10000
> > . 11000
> > . 12000
> > . 13000
> > . 14000
> > . 15000
> > . 16000
> > . 17000
> > . 18000
> > . 19000
> > . 1a000
> > . 1b000
> > . 1c000
> > . 1d000
> > . 1e000
> > . 1f000
> > . 20000
> > . 21000
> > . 22000
> > . 23000
> > . 24000
> > . 25000
> > . 26000
> > . 27000
> > . 28000
> > . 29000
> > . 2a000
> > . 2b000
> > 
> 
> Do you still have access to the machine ?
> If yes, please try this patch (against clean tree, as always) with the
> same instructions as before.
> 

Here is an updated patch; it should provide the expected information in
case of a page fault.

diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index b2283339405..682307f5fe4 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -1673,6 +1673,16 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D);
 
/*
+* Temporary forge some valid pointer to PCB, for exception
+* handlers.  It is reinitialized properly below after FPU is
+* set up.  Also set up td_critnest to short-cut the page
+* fault handler.
+*/
+   cpu_max_ext_state_size = sizeof(struct savefpu);
+   thread0.td_pcb = get_pcb_td();
+   thread0.td_critnest = 1;
+
+   /*
 * The console and kdb should be initialized even earlier than here,
 * but some console drivers don't work until after getmemsize().
 * Default to late console initialization to support these drivers.
@@ -1762,6 +1772,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
 #ifdef FDT
x86_init_fdt();
 #endif
+   thread0.td_critnest = 0;
 
/* Location of kernel stack for locore */
return ((u_int64_t)thread0.td_pcb);
diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
index f275aef3b4f..1be7a629f65 100644
--- a/sys/kern/subr_msgbuf.c
+++ b/sys/kern/subr_msgbuf.c
@@ -67,14 +67,19 @@ msgbuf_init(struct msgbuf *mbp, void *ptr, int size)
mbp->msg_ptr = ptr;
mbp->msg_size = size;
mbp->msg_seqmod = SEQMOD(size);
+printf("YYY1\n");
msgbuf_clear(mbp);
+printf("YYY2\n");
mbp->msg_magic = MSG_MAGIC;
mbp->msg_lastpri = -1;
mbp->msg_flags = 0;
+printf("YYY3\n");
bzero(&mbp->msg_lock, sizeof(mbp->msg_lock));
mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
+printf("YYY4\n");
 }
 
+
 /*
  * Reinitialize a message buffer, retaining its previous contents if
  * the size and checksum are correct. If the old contents cannot be
@@ -85,8 +90,10 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
 {
u_int cksum;
 
-   if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+   if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+printf("XXX1\n");
msgbuf_init(mbp, ptr, size);
+printf("XXX2\n");
return;
}
mbp->msg_seqmod = SEQMOD(size);
@@ -117,10 +124,12 @@ void
 msgbuf_clear(struct msgbuf *mbp)
 {
 
+printf("ZZZ1\n");
bzero(mbp->msg_ptr, mbp->msg_size);
mbp->msg_wseq = 0;
mbp->msg_rseq = 0;
mbp->msg_cksum = 0;
+printf("ZZZ2\n");
 }
 
 /*
diff --git a/sys/kern/subr_prf.c b/sys/kern/subr_prf.c
index e78863830c7..a72984dbc19 100644
--- a/sys/kern/subr_prf.c
+++ b/sys/kern/subr_prf.c
@@ -998,6 +998,14 @@ msgbufinit(void *ptr, int size)
char *cp;
static struct msgbuf *oldp = NULL;
 
+printf("TTT1 %p %p %x\n", ptr, (char *)ptr + size - 

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-14 Thread Konstantin Belousov
On Tue, Dec 13, 2016 at 08:43:45PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Dec 13, 2016 at 07:25:29PM +0200, Konstantin Belousov wrote:
> 
> > This is not what I expected.
> > Also, I realized that I mis-read the memory test code.  It does not
> > obliterate memory, old content is preserved.
> > 
> > Please do exactly the same testing with another patch, at the end of the
> > message.  There could be more output, up to 256 lines.
> 
> No problem.
> 
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
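> SMAP type=01 base=0000000000000000 len=0000000000099c00
> SMAP type=02 base=0000000000099c00 len=0000000000006400
> SMAP type=02 base=00000000000e0000 len=0000000000020000
> SMAP type=01 base=0000000000100000 len=000000007906b000
> SMAP type=02 base=000000007916b000 len=0000000000936000
> SMAP type=04 base=0000000079aa1000 len=0000000000509000
> SMAP type=02 base=0000000079faa000 len=0000000002056000
> SMAP type=01 base=0000000100000000 len=0000001f80000000
> SMAP type=02 base=000000007c000000 len=0000000014000000
> SMAP type=02 base=00000000fed1c000 len=0000000000029000
> SMAP type=02 base=00000000ff000000 len=0000000001000000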
> TTT1 0xf8207ff0 0xf8207fb8 10
> . 0
> . 1000
> . 2000
> . 3000
> . 4000
> . 5000
> . 6000
> . 7000
> . 8000
> . 9000
> . a000
> . b000
> . c000
> . d000
> . e000
> . f000
> . 10000
> . 11000
> . 12000
> . 13000
> . 14000
> . 15000
> . 16000
> . 17000
> . 18000
> . 19000
> . 1a000
> . 1b000
> . 1c000
> . 1d000
> . 1e000
> . 1f000
> . 20000
> . 21000
> . 22000
> . 23000
> . 24000
> . 25000
> . 26000
> . 27000
> . 28000
> . 29000
> . 2a000
> . 2b000
> 

Do you still have access to the machine ?
If yes, please try this patch (against clean tree, as always) with the
same instructions as before.

diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index b2283339405..917ea4475f3 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -1673,6 +1673,14 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D);
 
/*
+* Temporary forge some valid pointer to PCB, for exception
+* handlers.  It is reinitialized properly below after FPU is
+* set up.
+*/
+   cpu_max_ext_state_size = sizeof(struct savefpu);
+   thread0.td_pcb = get_pcb_td();
+
+   /*
 * The console and kdb should be initialized even earlier than here,
 * but some console drivers don't work until after getmemsize().
 * Default to late console initialization to support these drivers.
diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
index f275aef3b4f..1be7a629f65 100644
--- a/sys/kern/subr_msgbuf.c
+++ b/sys/kern/subr_msgbuf.c
@@ -67,14 +67,19 @@ msgbuf_init(struct msgbuf *mbp, void *ptr, int size)
mbp->msg_ptr = ptr;
mbp->msg_size = size;
mbp->msg_seqmod = SEQMOD(size);
+printf("YYY1\n");
msgbuf_clear(mbp);
+printf("YYY2\n");
mbp->msg_magic = MSG_MAGIC;
mbp->msg_lastpri = -1;
mbp->msg_flags = 0;
+printf("YYY3\n");
bzero(&mbp->msg_lock, sizeof(mbp->msg_lock));
mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
+printf("YYY4\n");
 }
 
+
 /*
  * Reinitialize a message buffer, retaining its previous contents if
  * the size and checksum are correct. If the old contents cannot be
@@ -85,8 +90,10 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
 {
u_int cksum;
 
-   if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+   if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+printf("XXX1\n");
msgbuf_init(mbp, ptr, size);
+printf("XXX2\n");
return;
}
mbp->msg_seqmod = SEQMOD(size);
@@ -117,10 +124,12 @@ void
 msgbuf_clear(struct msgbuf *mbp)
 {
 
+printf("ZZZ1\n");
bzero(mbp->msg_ptr, mbp->msg_size);
mbp->msg_wseq = 0;
mbp->msg_rseq = 0;
mbp->msg_cksum = 0;
+printf("ZZZ2\n");
 }
 
 /*
diff --git a/sys/kern/subr_prf.c b/sys/kern/subr_prf.c
index e78863830c7..a72984dbc19 100644
--- a/sys/kern/subr_prf.c
+++ b/sys/kern/subr_prf.c
@@ -998,6 +998,14 @@ msgbufinit(void *ptr, int size)
char *cp;
static struct msgbuf *oldp = NULL;
 
+printf("TTT1 %p %p %x\n", ptr, (char *)ptr + size - sizeof(*msgbufp), size);
+for (int i = 0; i < size; i++) {
+if (i % PAGE_SIZE == 0) printf(". %x\n", i);
+   volatile char *c = (char *)ptr + i;
+   char tmp;
+   tmp = *c;
+   *c = tmp;
+}
size -= sizeof(*msgbufp);
cp = (char *)ptr;
msgbufp = (struct msgbuf *)(cp + size);
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Slawa Olhovchenkov
On Tue, Dec 13, 2016 at 07:25:29PM +0200, Konstantin Belousov wrote:

> This is not what I expected.
> Also, I realized that I mis-read the memory test code.  It does not
> obliterate memory, old content is preserved.
> 
> Please do exactly the same testing with another patch, at the end of the
> message.  There could be more output, up to 256 lines.

No problem.

Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
SMAP type=01 base= len=00099c00
SMAP type=02 base=00099c00 len=6400
SMAP type=02 base=000e len=0002
SMAP type=01 base=0010 len=7906b000
SMAP type=02 base=7916b000 len=00936000
SMAP type=04 base=79aa1000 len=00509000
SMAP type=02 base=79faa000 len=02056000
SMAP type=01 base=0001 len=001f8000
SMAP type=02 base=7c00 len=1400
SMAP type=02 base=fed1c000 len=00029000
SMAP type=02 base=ff00 len=0100
TTT1 0xf8207ff0 0xf8207fb8 10
. 0
. 1000
. 2000
. 3000
. 4000
. 5000
. 6000
. 7000
. 8000
. 9000
. a000
. b000
. c000
. d000
. e000
. f000
. 1
. 11000
. 12000
. 13000
. 14000
. 15000
. 16000
. 17000
. 18000
. 19000
. 1a000
. 1b000
. 1c000
. 1d000
. 1e000
. 1f000
. 2
. 21000
. 22000
. 23000
. 24000
. 25000
. 26000
. 27000
. 28000
. 29000
. 2a000
. 2b000

> > 
> > > If the patched kernel boots successfully, or if the patched kernel
> > > boots further, I will provide one more, last patch, to test.
> > 
> > Please, next time point out which version of the source needs to be
> > patched: vanilla or already patched.
> I usually send full patches, i.e. the patch must be applied to the clean
> checkout.  Patch the vanilla sources.

np.



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Konstantin Belousov
On Tue, Dec 13, 2016 at 06:28:38PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Dec 13, 2016 at 05:01:39PM +0200, Konstantin Belousov wrote:
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=01 base= len=00099c00
> SMAP type=02 base=00099c00 len=6400
> SMAP type=02 base=000e len=0002
> SMAP type=01 base=0010 len=7906b000
> SMAP type=02 base=7916b000 len=00936000
> SMAP type=04 base=79aa1000 len=00509000
> SMAP type=02 base=79faa000 len=02056000
> SMAP type=01 base=0001 len=001f8000
> SMAP type=02 base=7c00 len=1400
> SMAP type=02 base=fed1c000 len=00029000
> SMAP type=02 base=ff00 len=0100
> TTT1 0xf8207ff0 0xf8207fb8 10
This is not what I expected.
Also, I realized that I mis-read the memory test code.  It does not
obliterate memory, old content is preserved.

Please do exactly the same testing with another patch, at the end of the
message.  There could be more output, up to 256 lines.

> 
> > If the patched kernel boots successfully, or if the patched kernel
> > boots further, I will provide one more, last patch, to test.
> 
> Please, next time point out which version of the source needs to be
> patched: vanilla or already patched.
I usually send full patches, i.e. the patch must be applied to the clean
checkout.  Patch the vanilla sources.

diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
index f275aef3b4f..1be7a629f65 100644
--- a/sys/kern/subr_msgbuf.c
+++ b/sys/kern/subr_msgbuf.c
@@ -67,14 +67,19 @@ msgbuf_init(struct msgbuf *mbp, void *ptr, int size)
mbp->msg_ptr = ptr;
mbp->msg_size = size;
mbp->msg_seqmod = SEQMOD(size);
+printf("YYY1\n");
msgbuf_clear(mbp);
+printf("YYY2\n");
mbp->msg_magic = MSG_MAGIC;
mbp->msg_lastpri = -1;
mbp->msg_flags = 0;
+printf("YYY3\n");
bzero(&mbp->msg_lock, sizeof(mbp->msg_lock));
mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
+printf("YYY4\n");
 }
 
+
 /*
  * Reinitialize a message buffer, retaining its previous contents if
  * the size and checksum are correct. If the old contents cannot be
@@ -85,8 +90,10 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
 {
u_int cksum;
 
-   if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+   if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+printf("XXX1\n");
msgbuf_init(mbp, ptr, size);
+printf("XXX2\n");
return;
}
mbp->msg_seqmod = SEQMOD(size);
@@ -117,10 +124,12 @@ void
 msgbuf_clear(struct msgbuf *mbp)
 {
 
+printf("ZZZ1\n");
bzero(mbp->msg_ptr, mbp->msg_size);
mbp->msg_wseq = 0;
mbp->msg_rseq = 0;
mbp->msg_cksum = 0;
+printf("ZZZ2\n");
 }
 
 /*
diff --git a/sys/kern/subr_prf.c b/sys/kern/subr_prf.c
index e78863830c7..a72984dbc19 100644
--- a/sys/kern/subr_prf.c
+++ b/sys/kern/subr_prf.c
@@ -998,6 +998,14 @@ msgbufinit(void *ptr, int size)
char *cp;
static struct msgbuf *oldp = NULL;
 
+printf("TTT1 %p %p %x\n", ptr, (char *)ptr + size - sizeof(*msgbufp), size);
+for (int i = 0; i < size; i++) {
+if (i % PAGE_SIZE == 0) printf(". %x\n", i);
+   volatile char *c = (char *)ptr + i;
+   char tmp;
+   tmp = *c;
+   *c = tmp;
+}
size -= sizeof(*msgbufp);
cp = (char *)ptr;
msgbufp = (struct msgbuf *)(cp + size);


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Slawa Olhovchenkov
On Tue, Dec 13, 2016 at 05:01:39PM +0200, Konstantin Belousov wrote:

> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> > SMAP type=01 base= len=00099c00
> > SMAP type=02 base=00099c00 len=6400
> > SMAP type=02 base=000e len=0002
> > SMAP type=01 base=0010 len=7906b000
> > SMAP type=02 base=7916b000 len=00936000
> > SMAP type=04 base=79aa1000 len=00509000
> > SMAP type=02 base=79faa000 len=02056000
> > SMAP type=01 base=0001 len=001f8000
> > SMAP type=02 base=7c00 len=1400
> > SMAP type=02 base=fed1c000 len=00029000
> > SMAP type=02 base=ff00 len=0100
> > XXX1
> > YYY1
> > ZZZ1
> 
> Ok, please do exactly the same testing with the following patch.

KDB: debugger backends: ddb
KDB: current backend: ddb
SMAP type=01 base= len=00099c00
SMAP type=02 base=00099c00 len=6400
SMAP type=02 base=000e len=0002
SMAP type=01 base=0010 len=7906b000
SMAP type=02 base=7916b000 len=00936000
SMAP type=04 base=79aa1000 len=00509000
SMAP type=02 base=79faa000 len=02056000
SMAP type=01 base=0001 len=001f8000
SMAP type=02 base=7c00 len=1400
SMAP type=02 base=fed1c000 len=00029000
SMAP type=02 base=ff00 len=0100
TTT1 0xf8207ff0 0xf8207fb8 10

> If the patched kernel boots successfully, or if the patched kernel
> boots further, I will provide one more, last patch, to test.

Please, next time point out which version of the source needs to be
patched: vanilla or already patched.



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Konstantin Belousov
On Tue, Dec 13, 2016 at 05:34:01PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Dec 13, 2016 at 05:11:14PM +0300, Slawa Olhovchenkov wrote:
> 
> > On Tue, Dec 13, 2016 at 03:57:59PM +0200, Konstantin Belousov wrote:
> > 
> > > On Tue, Dec 13, 2016 at 03:49:32PM +0300, Slawa Olhovchenkov wrote:
> > > > > Boot with NUMA enabled and interleave off.
> > > > 
> > > > Already with patched kernel
> > > > 
> > > > > Patch kernel with the 'if (1 || ...)' patch.
> > > > > Reboot, enter BIOS setup and enable interleave there.
> > > > > Try to boot - does it boot ?
> > > > 
> > > > No.
> > > > 
> > > > > If it does not boot, power the machine off for 10 minutes.
> > > > 
> > > > OK
> > > > 
> > > > > Power it on, try to boot (with the same patched kernel).
> > > > > Does the machine boot now ?
> > > > 
> > > > Don't boot.
> > > 
> > > I am really puzzled.  In other words, touching all memory causes the
> > > msgbuf to not hang.
> > 
> > yes
> > 
> > > Can you try one more experiment ?
> > > Take the patch below, apply it.
> > > From the config where interleave is disabled, install new kernel.
> > > Reboot, enter BIOS setup and enable interleave.
> > > Set late_console to zero in loader.
> > > Do not enable memory test.
> > > Boot the patched kernel.
> > > Kernel must hang, according to your previous reports.
> > > I want to see the console log.
> > 
> > Hmm. I have [already] shown the output from ddb, and I guess the kernel
> > will hang at the first write to *mbp, i.e. you won't see anything in the
> > console log.
> >
> > OK, anyway I will try this patch.
> 
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=01 base= len=00099c00
> SMAP type=02 base=00099c00 len=6400
> SMAP type=02 base=000e len=0002
> SMAP type=01 base=0010 len=7906b000
> SMAP type=02 base=7916b000 len=00936000
> SMAP type=04 base=79aa1000 len=00509000
> SMAP type=02 base=79faa000 len=02056000
> SMAP type=01 base=0001 len=001f8000
> SMAP type=02 base=7c00 len=1400
> SMAP type=02 base=fed1c000 len=00029000
> SMAP type=02 base=ff00 len=0100
> XXX1
> YYY1
> ZZZ1

Ok, please do exactly the same testing with the following patch.
If the patched kernel boots successfully, or if the patched kernel
boots further, I will provide one more, last patch, to test.

diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
index f275aef3b4f..1be7a629f65 100644
--- a/sys/kern/subr_msgbuf.c
+++ b/sys/kern/subr_msgbuf.c
@@ -67,14 +67,19 @@ msgbuf_init(struct msgbuf *mbp, void *ptr, int size)
mbp->msg_ptr = ptr;
mbp->msg_size = size;
mbp->msg_seqmod = SEQMOD(size);
+printf("YYY1\n");
msgbuf_clear(mbp);
+printf("YYY2\n");
mbp->msg_magic = MSG_MAGIC;
mbp->msg_lastpri = -1;
mbp->msg_flags = 0;
+printf("YYY3\n");
bzero(&mbp->msg_lock, sizeof(mbp->msg_lock));
mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
+printf("YYY4\n");
 }
 
+
 /*
  * Reinitialize a message buffer, retaining its previous contents if
  * the size and checksum are correct. If the old contents cannot be
@@ -85,8 +90,10 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
 {
u_int cksum;
 
-   if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+   if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+printf("XXX1\n");
msgbuf_init(mbp, ptr, size);
+printf("XXX2\n");
return;
}
mbp->msg_seqmod = SEQMOD(size);
@@ -117,10 +124,12 @@ void
 msgbuf_clear(struct msgbuf *mbp)
 {
 
+printf("ZZZ1\n");
bzero(mbp->msg_ptr, mbp->msg_size);
mbp->msg_wseq = 0;
mbp->msg_rseq = 0;
mbp->msg_cksum = 0;
+printf("ZZZ2\n");
 }
 
 /*
diff --git a/sys/kern/subr_prf.c b/sys/kern/subr_prf.c
index e78863830c7..435412d55ea 100644
--- a/sys/kern/subr_prf.c
+++ b/sys/kern/subr_prf.c
@@ -998,6 +998,8 @@ msgbufinit(void *ptr, int size)
char *cp;
static struct msgbuf *oldp = NULL;
 
+printf("TTT1 %p %p %x\n", ptr, (char *)ptr + size - sizeof(*msgbufp), size);
+bzero(ptr, size);
size -= sizeof(*msgbufp);
cp = (char *)ptr;
msgbufp = (struct msgbuf *)(cp + size);


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Slawa Olhovchenkov
On Tue, Dec 13, 2016 at 05:11:14PM +0300, Slawa Olhovchenkov wrote:

> On Tue, Dec 13, 2016 at 03:57:59PM +0200, Konstantin Belousov wrote:
> 
> > On Tue, Dec 13, 2016 at 03:49:32PM +0300, Slawa Olhovchenkov wrote:
> > > > Boot with NUMA enabled and interleave off.
> > > 
> > > Already with patched kernel
> > > 
> > > > Patch kernel with the 'if (1 || ...)' patch.
> > > > Reboot, enter BIOS setup and enable interleave there.
> > > > Try to boot - does it boot ?
> > > 
> > > No.
> > > 
> > > > If it does not boot, power the machine off for 10 minutes.
> > > 
> > > OK
> > > 
> > > > Power it on, try to boot (with the same patched kernel).
> > > > Does the machine boot now ?
> > > 
> > > Don't boot.
> > 
> > I am really puzzled.  In other words, touching all memory causes the
> > msgbuf to not hang.
> 
> yes
> 
> > Can you try one more experiment ?
> > Take the patch below, apply it.
> > From the config where interleave is disabled, install new kernel.
> > Reboot, enter BIOS setup and enable interleave.
> > Set late_console to zero in loader.
> > Do not enable memory test.
> > Boot the patched kernel.
> > Kernel must hang, according to your previous reports.
> > I want to see the console log.
> 
> Hmm. I have [already] shown the output from ddb, and I guess the kernel
> will hang at the first write to *mbp, i.e. you won't see anything in the
> console log.
>
> OK, anyway I will try this patch.

KDB: debugger backends: ddb
KDB: current backend: ddb
SMAP type=01 base= len=00099c00
SMAP type=02 base=00099c00 len=6400
SMAP type=02 base=000e len=0002
SMAP type=01 base=0010 len=7906b000
SMAP type=02 base=7916b000 len=00936000
SMAP type=04 base=79aa1000 len=00509000
SMAP type=02 base=79faa000 len=02056000
SMAP type=01 base=0001 len=001f8000
SMAP type=02 base=7c00 len=1400
SMAP type=02 base=fed1c000 len=00029000
SMAP type=02 base=ff00 len=0100
XXX1
YYY1
ZZZ1


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Slawa Olhovchenkov
On Tue, Dec 13, 2016 at 03:57:59PM +0200, Konstantin Belousov wrote:

> On Tue, Dec 13, 2016 at 03:49:32PM +0300, Slawa Olhovchenkov wrote:
> > > Boot with NUMA enabled and interleave off.
> > 
> > Already with patched kernel
> > 
> > > Patch kernel with the 'if (1 || ...)' patch.
> > > Reboot, enter BIOS setup and enable interleave there.
> > > Try to boot - does it boot ?
> > 
> > No.
> > 
> > > If it does not boot, power the machine off for 10 minutes.
> > 
> > OK
> > 
> > > Power it on, try to boot (with the same patched kernel).
> > > Does the machine boot now ?
> > 
> > Don't boot.
> 
> I am really puzzled.  In other words, touching all memory causes the
> msgbuf to not hang.

yes

> Can you try one more experiment ?
> Take the patch below, apply it.
> From the config where interleave is disabled, install new kernel.
> Reboot, enter BIOS setup and enable interleave.
> Set late_console to zero in loader.
> Do not enable memory test.
> Boot the patched kernel.
> Kernel must hang, according to your previous reports.
> I want to see the console log.

Hmm. I have [already] shown the output from ddb, and I guess the kernel will
hang at the first write to *mbp, i.e. you won't see anything in the console log.

OK, anyway I will try this patch.

> diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
> index f275aef3b4f..1be7a629f65 100644
> --- a/sys/kern/subr_msgbuf.c
> +++ b/sys/kern/subr_msgbuf.c
> @@ -67,14 +67,19 @@ msgbuf_init(struct msgbuf *mbp, void *ptr, int size)
>   mbp->msg_ptr = ptr;
>   mbp->msg_size = size;
>   mbp->msg_seqmod = SEQMOD(size);
> +printf("YYY1\n");
>   msgbuf_clear(mbp);
> +printf("YYY2\n");
>   mbp->msg_magic = MSG_MAGIC;
>   mbp->msg_lastpri = -1;
>   mbp->msg_flags = 0;
> +printf("YYY3\n");
>   bzero(&mbp->msg_lock, sizeof(mbp->msg_lock));
>   mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
> +printf("YYY4\n");
>  }
>  
> +
>  /*
>   * Reinitialize a message buffer, retaining its previous contents if
>   * the size and checksum are correct. If the old contents cannot be
> @@ -85,8 +90,10 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
>  {
>   u_int cksum;
>  
> - if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> + if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> +printf("XXX1\n");
>   msgbuf_init(mbp, ptr, size);
> +printf("XXX2\n");
>   return;
>   }
>   mbp->msg_seqmod = SEQMOD(size);
> @@ -117,10 +124,12 @@ void
>  msgbuf_clear(struct msgbuf *mbp)
>  {
>  
> +printf("ZZZ1\n");
>   bzero(mbp->msg_ptr, mbp->msg_size);
>   mbp->msg_wseq = 0;
>   mbp->msg_rseq = 0;
>   mbp->msg_cksum = 0;
> +printf("ZZZ2\n");
>  }
>  
>  /*


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Konstantin Belousov
On Tue, Dec 13, 2016 at 03:49:32PM +0300, Slawa Olhovchenkov wrote:
> > Boot with NUMA enabled and interleave off.
> 
> Already with patched kernel
> 
> > Patch kernel with the 'if (1 || ...)' patch.
> > Reboot, enter BIOS setup and enable interleave there.
> > Try to boot - does it boot ?
> 
> No.
> 
> > If it does not boot, power the machine off for 10 minutes.
> 
> OK
> 
> > Power it on, try to boot (with the same patched kernel).
> > Does the machine boot now ?
> 
> Don't boot.

I am really puzzled.  In other words, touching all memory causes the
msgbuf to not hang.

Can you try one more experiment ?
Take the patch below, apply it.
From the config where interleave is disabled, install new kernel.
Reboot, enter BIOS setup and enable interleave.
Set late_console to zero in loader.
Do not enable memory test.
Boot the patched kernel.
Kernel must hang, according to your previous reports.
I want to see the console log.

diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
index f275aef3b4f..1be7a629f65 100644
--- a/sys/kern/subr_msgbuf.c
+++ b/sys/kern/subr_msgbuf.c
@@ -67,14 +67,19 @@ msgbuf_init(struct msgbuf *mbp, void *ptr, int size)
mbp->msg_ptr = ptr;
mbp->msg_size = size;
mbp->msg_seqmod = SEQMOD(size);
+printf("YYY1\n");
msgbuf_clear(mbp);
+printf("YYY2\n");
mbp->msg_magic = MSG_MAGIC;
mbp->msg_lastpri = -1;
mbp->msg_flags = 0;
+printf("YYY3\n");
bzero(&mbp->msg_lock, sizeof(mbp->msg_lock));
mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
+printf("YYY4\n");
 }
 
+
 /*
  * Reinitialize a message buffer, retaining its previous contents if
  * the size and checksum are correct. If the old contents cannot be
@@ -85,8 +90,10 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
 {
u_int cksum;
 
-   if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+   if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+printf("XXX1\n");
msgbuf_init(mbp, ptr, size);
+printf("XXX2\n");
return;
}
mbp->msg_seqmod = SEQMOD(size);
@@ -117,10 +124,12 @@ void
 msgbuf_clear(struct msgbuf *mbp)
 {
 
+printf("ZZZ1\n");
bzero(mbp->msg_ptr, mbp->msg_size);
mbp->msg_wseq = 0;
mbp->msg_rseq = 0;
mbp->msg_cksum = 0;
+printf("ZZZ2\n");
 }
 
 /*


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Slawa Olhovchenkov
On Tue, Dec 13, 2016 at 01:23:40PM +0200, Konstantin Belousov wrote:

> On Tue, Dec 13, 2016 at 02:14:37PM +0300, Slawa Olhovchenkov wrote:
> > On Tue, Dec 13, 2016 at 01:05:35PM +0200, Konstantin Belousov wrote:
> > 
> > > On Tue, Dec 13, 2016 at 02:37:14AM +0300, Slawa Olhovchenkov wrote:
> > > > On Mon, Dec 12, 2016 at 04:20:33PM -0600, A. Wilcox wrote:
> > > > 
> > > > >  Try the debugging patch below, which unconditionally disables 
> > > > >  import of
> > > > >  previous buffer.  To test, you would need to boot, then frob 
> > > > >  options in
> > > > >  BIOS, reboot, again frob etc.
> > > > > >>>
> > > > > >>> still need test patch? if yes, with BIOS options?
> > > > > >> Yes, please test the patch.  I explained the procedure above.
> > > > > > 
> > > > > > sorry, i don't know 'frob'.
> > > > > > what exactly options combination I need test and what about memory 
> > > > > > test?
> > > > > > 
> > > > > 
> > > > > 
> > > > > The idea is that when rebooting, stale memory contents remain, but are
> > > > > corrupted due to interleave.
> > > > > 
> > > > > "Frob" basically means "mess with".  So apply patch, test kernel,
> > > > > reboot, change NUMA option, reboot again, see if it works, and so on.
> > > > > Basically repeat your test with the NUMA=on interleave=on, NUMA=off
> > > > > interleave=on, etc etc.
> > > > 
> > > > NUMA=on interleave=off booted
> > > > NUMA=on interleave=on hang
> > > > 
> > > > I think different combination whatever?
> > > 
> > > Do you mean, that both patched kernel, and unpatched kernel with the
> > > memory test enabled, hang when NUMA and interleave options enabled ?
> > 
> > Unpatched kernel boot with the memory test enabled when NUMA and
> > interleave options enabled -- I am already reported this.
> > 
> > patched kernel  with the memory test enabled boot too.
^^
> > i.e. memory test enabled allow boot in any situation.
> Then what was the statement above about ?

That was about the unpatched kernel:
https://lists.freebsd.org/pipermail/freebsd-current/2016-December/064069.html
The patched kernel with the memory test I have tested just now, and wrote about it in the previous mail ^^^

> You said that NUMA and interleave
> on caused hang.  Was that on the patched kernel ?

patched w/o memory test

> > 
> > > Could you enable the options, power down the machine for 10-20 minutes,
> > > and try to boot ?
> > 
> > With which kernel, BIOS options and boot options?
> > I have two days before the server is put into production for any
> > experiments, but please be clearer about which combinations need testing.
> 
> Boot with NUMA enabled and interleave off.

Already with patched kernel

> Patch kernel with the 'if (1 || ...)' patch.
> Reboot, enter BIOS setup and enable interleave there.
> Try to boot - does it boot ?

No.

> If it does not boot, power the machine off for 10 minutes.

OK

> Power it on, try to boot (with the same patched kernel).
> Does the machine boot now ?

Don't boot.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Konstantin Belousov
On Tue, Dec 13, 2016 at 02:14:37PM +0300, Slawa Olhovchenkov wrote:
> On Tue, Dec 13, 2016 at 01:05:35PM +0200, Konstantin Belousov wrote:
> 
> > On Tue, Dec 13, 2016 at 02:37:14AM +0300, Slawa Olhovchenkov wrote:
> > > On Mon, Dec 12, 2016 at 04:20:33PM -0600, A. Wilcox wrote:
> > > 
> > > >  Try the debugging patch below, which unconditionally disables 
> > > >  import of
> > > >  previous buffer.  To test, you would need to boot, then frob 
> > > >  options in
> > > >  BIOS, reboot, again frob etc.
> > > > >>>
> > > > >>> still need test patch? if yes, with BIOS options?
> > > > >> Yes, please test the patch.  I explained the procedure above.
> > > > > 
> > > > > sorry, i don't know 'frob'.
> > > > > what exactly options combination I need test and what about memory 
> > > > > test?
> > > > > 
> > > > 
> > > > 
> > > > The idea is that when rebooting, stale memory contents remain, but are
> > > > corrupted due to interleave.
> > > > 
> > > > "Frob" basically means "mess with".  So apply patch, test kernel,
> > > > reboot, change NUMA option, reboot again, see if it works, and so on.
> > > > Basically repeat your test with the NUMA=on interleave=on, NUMA=off
> > > > interleave=on, etc etc.
> > > 
> > > NUMA=on interleave=off booted
> > > NUMA=on interleave=on hang
> > > 
> > > I think different combination whatever?
> > 
> > Do you mean, that both patched kernel, and unpatched kernel with the
> > memory test enabled, hang when NUMA and interleave options enabled ?
> 
> The unpatched kernel boots with the memory test enabled when the NUMA and
> interleave options are enabled -- I have already reported this.
> 
> The patched kernel with the memory test enabled boots too.
> 
> I.e. enabling the memory test allows booting in any situation.
Then what was the statement above about ?  You said that NUMA and interleave
on caused a hang.  Was that on the patched kernel ?

> 
> > Could you enable the options, power down the machine for 10-20 minutes,
> > and try to boot ?
> 
> With which kernel, BIOS options and boot options?
> I have two days before the server is put into production for any
> experiments, but please be clearer about which combinations need testing.

Boot with NUMA enabled and interleave off.
Patch kernel with the 'if (1 || ...)' patch.
Reboot, enter BIOS setup and enable interleave there.
Try to boot - does it boot ?
If it does not boot, power the machine off for 10 minutes.
Power it on, try to boot (with the same patched kernel).
Does the machine boot now ?


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Slawa Olhovchenkov
On Tue, Dec 13, 2016 at 01:05:35PM +0200, Konstantin Belousov wrote:

> On Tue, Dec 13, 2016 at 02:37:14AM +0300, Slawa Olhovchenkov wrote:
> > On Mon, Dec 12, 2016 at 04:20:33PM -0600, A. Wilcox wrote:
> > 
> > >  Try the debugging patch below, which unconditionally disables import 
> > >  of
> > >  previous buffer.  To test, you would need to boot, then frob options 
> > >  in
> > >  BIOS, reboot, again frob etc.
> > > >>>
> > > >>> still need test patch? if yes, with BIOS options?
> > > >> Yes, please test the patch.  I explained the procedure above.
> > > > 
> > > > Sorry, I don't know 'frob'.
> > > > Which exact combination of options do I need to test, and what about the memory test?
> > > > 
> > > 
> > > 
> > > The idea is that when rebooting, stale memory contents remain, but are
> > > corrupted due to interleave.
> > > 
> > > "Frob" basically means "mess with".  So apply patch, test kernel,
> > > reboot, change NUMA option, reboot again, see if it works, and so on.
> > > Basically repeat your test with the NUMA=on interleave=on, NUMA=off
> > > interleave=on, etc etc.
> > 
> > NUMA=on interleave=off booted
> > NUMA=on interleave=on hang
> > 
> > I think different combination whatever?
> 
> Do you mean, that both patched kernel, and unpatched kernel with the
> memory test enabled, hang when NUMA and interleave options enabled ?

The unpatched kernel boots with the memory test enabled when the NUMA and
interleave options are enabled -- I have already reported this.

The patched kernel with the memory test enabled boots too.

I.e. enabling the memory test allows booting in any situation.

> Could you enable the options, power down the machine for 10-20 minutes,
> and try to boot ?

With which kernel, BIOS options and boot options?
I have two days before the server is put into production for any
experiments, but please be clearer about which combinations need testing.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-13 Thread Konstantin Belousov
On Tue, Dec 13, 2016 at 02:37:14AM +0300, Slawa Olhovchenkov wrote:
> On Mon, Dec 12, 2016 at 04:20:33PM -0600, A. Wilcox wrote:
> 
> >  Try the debugging patch below, which unconditionally disables import of
> >  previous buffer.  To test, you would need to boot, then frob options in
> >  BIOS, reboot, again frob etc.
> > >>>
> > >>> still need test patch? if yes, with BIOS options?
> > >> Yes, please test the patch.  I explained the procedure above.
> > > 
> > > Sorry, I don't know 'frob'.
> > > Which exact combination of options do I need to test, and what about the memory test?
> > > 
> > 
> > 
> > The idea is that when rebooting, stale memory contents remain, but are
> > corrupted due to interleave.
> > 
> > "Frob" basically means "mess with".  So apply patch, test kernel,
> > reboot, change NUMA option, reboot again, see if it works, and so on.
> > Basically repeat your test with the NUMA=on interleave=on, NUMA=off
> > interleave=on, etc etc.
> 
> NUMA=on interleave=off booted
> NUMA=on interleave=on hang
> 
> I think different combination whatever?

Do you mean, that both patched kernel, and unpatched kernel with the
memory test enabled, hang when NUMA and interleave options enabled ?

Could you enable the options, power down the machine for 10-20 minutes,
and try to boot ?


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 04:20:33PM -0600, A. Wilcox wrote:

>  Try the debugging patch below, which unconditionally disables import of
>  previous buffer.  To test, you would need to boot, then frob options in
>  BIOS, reboot, again frob etc.
> >>>
> >>> still need test patch? if yes, with BIOS options?
> >> Yes, please test the patch.  I explained the procedure above.
> > 
> > Sorry, I don't know 'frob'.
> > Which exact combination of options do I need to test, and what about the memory test?
> > 
> 
> 
> The idea is that when rebooting, stale memory contents remain, but are
> corrupted due to interleave.
> 
> "Frob" basically means "mess with".  So apply patch, test kernel,
> reboot, change NUMA option, reboot again, see if it works, and so on.
> Basically repeat your test with the NUMA=on interleave=on, NUMA=off
> interleave=on, etc etc.

NUMA=on interleave=off booted
NUMA=on interleave=on hang

I think different combination whatever?


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread A. Wilcox
>>>> Try the debugging patch below, which unconditionally disables import of
>>>> previous buffer.  To test, you would need to boot, then frob options in
>>>> BIOS, reboot, again frob etc.
>>>
>>> still need test patch? if yes, with BIOS options?
>> Yes, please test the patch.  I explained the procedure above.
> 
> sorry, i don't know 'frob'.
> what exactly options combination I need test and what about memory test?
> 


The idea is that when rebooting, stale memory contents remain, but are
corrupted due to interleave.

"Frob" basically means "mess with".  So apply patch, test kernel,
reboot, change NUMA option, reboot again, see if it works, and so on.
Basically repeat your test with the NUMA=on interleave=on, NUMA=off
interleave=on, etc etc.

hth,
--arw


-- 
A. Wilcox (awilfox)
Open-source programmer (C, C++, Python)
https://code.foxkit.us/u/awilfox/





Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 08:36:47PM +0200, Konstantin Belousov wrote:

> On Mon, Dec 12, 2016 at 08:43:11PM +0300, Slawa Olhovchenkov wrote:
> > On Mon, Dec 12, 2016 at 07:24:18PM +0200, Konstantin Belousov wrote:
> > 
> > > On Mon, Dec 12, 2016 at 08:16:34PM +0300, Slawa Olhovchenkov wrote:
> > > > On Mon, Dec 12, 2016 at 06:54:57PM +0200, Konstantin Belousov wrote:
> > > > 
> > > > > On Mon, Dec 12, 2016 at 07:21:53PM +0300, Slawa Olhovchenkov wrote:
> > > > > > On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:
> > > > > > 
> > > > > > > On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov 
> > > > > > > wrote:
> > > > > > > > Booting...
> > > > > > > > ESC[01;00H8+0x8+0xe9bdc]
> > > > > > > >   KDB: debugger backends: ddb
> > > > > > > > KDB: current backend: ddb
> > > > > > > > exit from kdb_init
> > > > > > > > KDB: enter: Boot flags requested debugger
> > > > > > > > [ thread pid 0 tid 0 ]
> > > > > > > - remove any video consoles from the HEAD kernel config, i.e. 
> > > > > > > sc/vt and
> > > > > > >   vga/efi,
> > > > > > > - do not use boot -d,
> > > > > > > - use serial console (IPMI SOL qualifies),
> > > > > > > - set late console to 0,
> > > > > > > and show me the verbose dmesg of such boot with the BIOS options
> > > > > > > which cause troubles.
> > > > > > 
> > > > > > Booting...
> > > > > > KDB: debugger backends: ddb
> > > > > > KDB: current backend: ddb
> > > > > > SMAP type=01 base= len=00099c00
> > > > > > SMAP type=02 base=00099c00 len=6400
> > > > > > SMAP type=02 base=000e len=0002
> > > > > > SMAP type=01 base=0010 len=7906b000
> > > > > > SMAP type=02 base=7916b000 len=00936000
> > > > > > SMAP type=04 base=79aa1000 len=00509000
> > > > > > SMAP type=02 base=79faa000 len=02056000
> > > > > > SMAP type=01 base=0001 len=001f8000
> > > > > > SMAP type=02 base=7c00 len=1400
> > > > > > SMAP type=02 base=fed1c000 len=00029000
> > > > > > SMAP type=02 base=ff00 len=0100
> > > > > > 
> > > > > > This is all. No more.
> > > > > When you switch between variations of the NUMA enablement options, do
> > > > > you just reboot the machine or do you sometimes physically turn it 
> > > > > off ?
> > > > 
> > > > just reboot an 'power reset' via kvm client (memory preserved, i mean)
> > > > 
> > > > > Try to enable memtest, with the hw.memtest.tests=1 loader variable.
> > > > > Does it change things ?
> > > > 
> > > > System booted, dmesg is http://zxy.spb.ru/dmesg.numa
> > > I suspect now the reversed situation could take place, the non-interleaved
> > > option would cause hang.
> > 
> > No, also booted, dmesg http://zxy.spb.ru/dmesg.numa-ninter
> I mean, it could hang if memory testing is not enabled.

Really? The system boots without memory testing when NUMA is ON and
interleave is OFF; I already reported that it boots after turning
interleave off.

> > 
> > > My current guess is that memory content is preserved but swizzled by
> > > the cache line sized chunks.  So that the msgbuf header, left after the
> > > previous boot, looks correct while the real buffer content is shuffled.
> > > 
> > > Try the debugging patch below, which unconditionally disables import of
> > > previous buffer.  To test, you would need to boot, then frob options in
> > > BIOS, reboot, again frob etc.
> > 
> > still need test patch? if yes, with BIOS options?
> Yes, please test the patch.  I explained the procedure above.

Sorry, I don't know the word 'frob'.
Exactly which combinations of options do I need to test, and what about
the memory test?

> > 
> > > diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
> > > index f275aef3b4f..d45ef502204 100644
> > > --- a/sys/kern/subr_msgbuf.c
> > > +++ b/sys/kern/subr_msgbuf.c
> > > @@ -85,7 +85,7 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
> > >  {
> > >   u_int cksum;
> > >  
> > > - if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> > > + if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> > >   msgbuf_init(mbp, ptr, size);
> > >   return;
> > >   }


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Konstantin Belousov
On Mon, Dec 12, 2016 at 08:43:11PM +0300, Slawa Olhovchenkov wrote:
> On Mon, Dec 12, 2016 at 07:24:18PM +0200, Konstantin Belousov wrote:
> 
> > On Mon, Dec 12, 2016 at 08:16:34PM +0300, Slawa Olhovchenkov wrote:
> > > On Mon, Dec 12, 2016 at 06:54:57PM +0200, Konstantin Belousov wrote:
> > > 
> > > > On Mon, Dec 12, 2016 at 07:21:53PM +0300, Slawa Olhovchenkov wrote:
> > > > > On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:
> > > > > 
> > > > > > On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> > > > > > > Booting...
> > > > > > > ESC[01;00H8+0x8+0xe9bdc]  
> > > > > > > KDB: debugger backends: ddb
> > > > > > > KDB: current backend: ddb
> > > > > > > exit from kdb_init
> > > > > > > KDB: enter: Boot flags requested debugger
> > > > > > > [ thread pid 0 tid 0 ]
> > > > > > - remove any video consoles from the HEAD kernel config, i.e. sc/vt 
> > > > > > and
> > > > > >   vga/efi,
> > > > > > - do not use boot -d,
> > > > > > - use serial console (IPMI SOL qualifies),
> > > > > > - set late console to 0,
> > > > > > and show me the verbose dmesg of such boot with the BIOS options
> > > > > > which cause troubles.
> > > > > 
> > > > > Booting...
> > > > > KDB: debugger backends: ddb
> > > > > KDB: current backend: ddb
> > > > > SMAP type=01 base= len=00099c00
> > > > > SMAP type=02 base=00099c00 len=6400
> > > > > SMAP type=02 base=000e len=0002
> > > > > SMAP type=01 base=0010 len=7906b000
> > > > > SMAP type=02 base=7916b000 len=00936000
> > > > > SMAP type=04 base=79aa1000 len=00509000
> > > > > SMAP type=02 base=79faa000 len=02056000
> > > > > SMAP type=01 base=0001 len=001f8000
> > > > > SMAP type=02 base=7c00 len=1400
> > > > > SMAP type=02 base=fed1c000 len=00029000
> > > > > SMAP type=02 base=ff00 len=0100
> > > > > 
> > > > > This is all. No more.
> > > > When you switch between variations of the NUMA enablement options, do
> > > > you just reboot the machine or do you sometimes physically turn it off ?
> > > 
> > > just reboot an 'power reset' via kvm client (memory preserved, i mean)
> > > 
> > > > Try to enable memtest, with the hw.memtest.tests=1 loader variable.
> > > > Does it change things ?
> > > 
> > > System booted, dmesg is http://zxy.spb.ru/dmesg.numa
> > I suspect now the reversed situation could take place, the non-interleaved
> > option would cause hang.
> 
> No, also booted, dmesg http://zxy.spb.ru/dmesg.numa-ninter
I mean, it could hang if memory testing is not enabled.

> 
> > My current guess is that memory content is preserved but swizzled by
> > the cache line sized chunks.  So that the msgbuf header, left after the
> > previous boot, looks correct while the real buffer content is shuffled.
> > 
> > Try the debugging patch below, which unconditionally disables import of
> > previous buffer.  To test, you would need to boot, then frob options in
> > BIOS, reboot, again frob etc.
> 
> still need test patch? if yes, with BIOS options?
Yes, please test the patch.  I explained the procedure above.

> 
> > diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
> > index f275aef3b4f..d45ef502204 100644
> > --- a/sys/kern/subr_msgbuf.c
> > +++ b/sys/kern/subr_msgbuf.c
> > @@ -85,7 +85,7 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
> >  {
> >  	u_int cksum;
> >  
> > -	if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> > +	if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> >  		msgbuf_init(mbp, ptr, size);
> >  		return;
> >  	}


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 07:24:18PM +0200, Konstantin Belousov wrote:

> On Mon, Dec 12, 2016 at 08:16:34PM +0300, Slawa Olhovchenkov wrote:
> > On Mon, Dec 12, 2016 at 06:54:57PM +0200, Konstantin Belousov wrote:
> > 
> > > On Mon, Dec 12, 2016 at 07:21:53PM +0300, Slawa Olhovchenkov wrote:
> > > > On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:
> > > > 
> > > > > On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> > > > > > Booting...
> > > > > > ESC[01;00H8+0x8+0xe9bdc]
> > > > > >   KDB: debugger backends: ddb
> > > > > > KDB: current backend: ddb
> > > > > > exit from kdb_init
> > > > > > KDB: enter: Boot flags requested debugger
> > > > > > [ thread pid 0 tid 0 ]
> > > > > - remove any video consoles from the HEAD kernel config, i.e. sc/vt 
> > > > > and
> > > > >   vga/efi,
> > > > > - do not use boot -d,
> > > > > - use serial console (IPMI SOL qualifies),
> > > > > - set late console to 0,
> > > > > and show me the verbose dmesg of such boot with the BIOS options
> > > > > which cause troubles.
> > > > 
> > > > Booting...
> > > > KDB: debugger backends: ddb
> > > > KDB: current backend: ddb
> > > > SMAP type=01 base= len=00099c00
> > > > SMAP type=02 base=00099c00 len=6400
> > > > SMAP type=02 base=000e len=0002
> > > > SMAP type=01 base=0010 len=7906b000
> > > > SMAP type=02 base=7916b000 len=00936000
> > > > SMAP type=04 base=79aa1000 len=00509000
> > > > SMAP type=02 base=79faa000 len=02056000
> > > > SMAP type=01 base=0001 len=001f8000
> > > > SMAP type=02 base=7c00 len=1400
> > > > SMAP type=02 base=fed1c000 len=00029000
> > > > SMAP type=02 base=ff00 len=0100
> > > > 
> > > > This is all. No more.
> > > When you switch between variations of the NUMA enablement options, do
> > > you just reboot the machine or do you sometimes physically turn it off ?
> > 
> > just reboot an 'power reset' via kvm client (memory preserved, i mean)
> > 
> > > Try to enable memtest, with the hw.memtest.tests=1 loader variable.
> > > Does it change things ?
> > 
> > System booted, dmesg is http://zxy.spb.ru/dmesg.numa
> I suspect now the reversed situation could take place, the non-interleaved
> option would cause hang.

No, also booted, dmesg http://zxy.spb.ru/dmesg.numa-ninter

> My current guess is that memory content is preserved but swizzled by
> the cache line sized chunks.  So that the msgbuf header, left after the
> previous boot, looks correct while the real buffer content is shuffled.
> 
> Try the debugging patch below, which unconditionally disables import of
> previous buffer.  To test, you would need to boot, then frob options in
> BIOS, reboot, again frob etc.

Do you still need me to test the patch? If yes, with which BIOS options?

> diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
> index f275aef3b4f..d45ef502204 100644
> --- a/sys/kern/subr_msgbuf.c
> +++ b/sys/kern/subr_msgbuf.c
> @@ -85,7 +85,7 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
>  {
>   u_int cksum;
>  
> - if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
> + if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
>   msgbuf_init(mbp, ptr, size);
>   return;
>   }


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Konstantin Belousov
On Mon, Dec 12, 2016 at 08:16:34PM +0300, Slawa Olhovchenkov wrote:
> On Mon, Dec 12, 2016 at 06:54:57PM +0200, Konstantin Belousov wrote:
> 
> > On Mon, Dec 12, 2016 at 07:21:53PM +0300, Slawa Olhovchenkov wrote:
> > > On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:
> > > 
> > > > On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> > > > > Booting...
> > > > > ESC[01;00H8+0x8+0xe9bdc]  
> > > > > KDB: debugger backends: ddb
> > > > > KDB: current backend: ddb
> > > > > exit from kdb_init
> > > > > KDB: enter: Boot flags requested debugger
> > > > > [ thread pid 0 tid 0 ]
> > > > - remove any video consoles from the HEAD kernel config, i.e. sc/vt and
> > > >   vga/efi,
> > > > - do not use boot -d,
> > > > - use serial console (IPMI SOL qualifies),
> > > > - set late console to 0,
> > > > and show me the verbose dmesg of such boot with the BIOS options
> > > > which cause troubles.
> > > 
> > > Booting...
> > > KDB: debugger backends: ddb
> > > KDB: current backend: ddb
> > > SMAP type=01 base= len=00099c00
> > > SMAP type=02 base=00099c00 len=6400
> > > SMAP type=02 base=000e len=0002
> > > SMAP type=01 base=0010 len=7906b000
> > > SMAP type=02 base=7916b000 len=00936000
> > > SMAP type=04 base=79aa1000 len=00509000
> > > SMAP type=02 base=79faa000 len=02056000
> > > SMAP type=01 base=0001 len=001f8000
> > > SMAP type=02 base=7c00 len=1400
> > > SMAP type=02 base=fed1c000 len=00029000
> > > SMAP type=02 base=ff00 len=0100
> > > 
> > > This is all. No more.
> > When you switch between variations of the NUMA enablement options, do
> > you just reboot the machine or do you sometimes physically turn it off ?
> 
> just reboot an 'power reset' via kvm client (memory preserved, i mean)
> 
> > Try to enable memtest, with the hw.memtest.tests=1 loader variable.
> > Does it change things ?
> 
> System booted, dmesg is http://zxy.spb.ru/dmesg.numa
I suspect the reverse situation could now take place: the non-interleaved
option would cause the hang.

My current guess is that the memory content is preserved but swizzled in
cache-line-sized chunks, so that the msgbuf header left over from the
previous boot looks correct while the real buffer content is shuffled.

Try the debugging patch below, which unconditionally disables import of
previous buffer.  To test, you would need to boot, then frob options in
BIOS, reboot, again frob etc.

diff --git a/sys/kern/subr_msgbuf.c b/sys/kern/subr_msgbuf.c
index f275aef3b4f..d45ef502204 100644
--- a/sys/kern/subr_msgbuf.c
+++ b/sys/kern/subr_msgbuf.c
@@ -85,7 +85,7 @@ msgbuf_reinit(struct msgbuf *mbp, void *ptr, int size)
 {
 	u_int cksum;
 
-	if (mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
+	if (1 || mbp->msg_magic != MSG_MAGIC || mbp->msg_size != size) {
 		msgbuf_init(mbp, ptr, size);
 		return;
 	}
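
As an illustration of the guess above, here is a minimal userland sketch in C; it is not kernel code. It assumes a 64-byte cache line and an order-independent additive checksum over the buffer body. The names demo_hdr, demo_cksum, demo_swap_chunks and DEMO_MAGIC are hypothetical and only loosely modelled on the fields the patch touches. Under those assumptions, a header that stays in place still passes the magic and size test after two body chunks trade places, and the byte sum does not change either, although the text itself is scrambled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 0x12345678u	/* hypothetical magic, not the kernel's value */
#define DEMO_CHUNK 64u		/* assumed cache-line size in bytes */
#define DEMO_SIZE  256u		/* toy buffer body: four chunks */

/* Hypothetical, simplified stand-in for the message buffer header. */
struct demo_hdr {
	uint32_t magic;
	uint32_t size;
	uint32_t cksum;
};

/* Additive checksum over the body; the result ignores byte order. */
static uint32_t
demo_cksum(const unsigned char *p, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return (sum);
}

/* Swap two DEMO_CHUNK-sized chunks, modelling a changed interleave. */
static void
demo_swap_chunks(unsigned char *p, size_t a, size_t b)
{
	unsigned char tmp[DEMO_CHUNK];

	assert(a != b);
	memcpy(tmp, p + a * DEMO_CHUNK, DEMO_CHUNK);
	memcpy(p + a * DEMO_CHUNK, p + b * DEMO_CHUNK, DEMO_CHUNK);
	memcpy(p + b * DEMO_CHUNK, tmp, DEMO_CHUNK);
}

/* The test the unpatched condition applies: magic and size only. */
static int
demo_header_ok(const struct demo_hdr *h, uint32_t size)
{
	return (h->magic == DEMO_MAGIC && h->size == size);
}

/* Returns 1 if a chunk shuffle slips past both header and checksum. */
static int
demo_shuffle_undetected(void)
{
	unsigned char before[DEMO_SIZE], after[DEMO_SIZE];
	struct demo_hdr h = { DEMO_MAGIC, DEMO_SIZE, 0 };

	for (size_t i = 0; i < DEMO_SIZE; i++)
		before[i] = (unsigned char)i;
	h.cksum = demo_cksum(before, DEMO_SIZE);

	/* "Reboot" with a different layout: chunks 1 and 2 trade places. */
	memcpy(after, before, DEMO_SIZE);
	demo_swap_chunks(after, 1, 2);

	return (demo_header_ok(&h, DEMO_SIZE) &&
	    demo_cksum(after, DEMO_SIZE) == h.cksum &&
	    memcmp(before, after, DEMO_SIZE) != 0);
}
```

Under these assumptions demo_shuffle_undetected() returns 1, which is the situation the `1 ||` in the patch sidesteps by always reinitializing the buffer instead of importing the old one.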


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 06:54:57PM +0200, Konstantin Belousov wrote:

> On Mon, Dec 12, 2016 at 07:21:53PM +0300, Slawa Olhovchenkov wrote:
> > On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:
> > 
> > > On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> > > > Booting...
> > > > ESC[01;00H8+0x8+0xe9bdc]
> > > >   KDB: debugger backends: ddb
> > > > KDB: current backend: ddb
> > > > exit from kdb_init
> > > > KDB: enter: Boot flags requested debugger
> > > > [ thread pid 0 tid 0 ]
> > > - remove any video consoles from the HEAD kernel config, i.e. sc/vt and
> > >   vga/efi,
> > > - do not use boot -d,
> > > - use serial console (IPMI SOL qualifies),
> > > - set late console to 0,
> > > and show me the verbose dmesg of such boot with the BIOS options
> > > which cause troubles.
> > 
> > Booting...
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> > SMAP type=01 base= len=00099c00
> > SMAP type=02 base=00099c00 len=6400
> > SMAP type=02 base=000e len=0002
> > SMAP type=01 base=0010 len=7906b000
> > SMAP type=02 base=7916b000 len=00936000
> > SMAP type=04 base=79aa1000 len=00509000
> > SMAP type=02 base=79faa000 len=02056000
> > SMAP type=01 base=0001 len=001f8000
> > SMAP type=02 base=7c00 len=1400
> > SMAP type=02 base=fed1c000 len=00029000
> > SMAP type=02 base=ff00 len=0100
> > 
> > This is all. No more.
> When you switch between variations of the NUMA enablement options, do
> you just reboot the machine or do you sometimes physically turn it off ?

I just reboot and 'power reset' via the KVM client (memory is preserved, I mean).

> Try to enable memtest, with the hw.memtest.tests=1 loader variable.
> Does it change things ?

System booted, dmesg is http://zxy.spb.ru/dmesg.numa

# sysctl vm.ndomains vm.default_policy vm.phys_locality vm.phys_segs
vm.ndomains: 1
vm.default_policy: rr
sysctl: unknown oid 'vm.phys_locality'
vm.phys_segs:
SEGMENT 0:

start: 0x1
end:   0x96000
domain:0
free list: 0x80caabc0

SEGMENT 1:

start: 0x10
end:   0x20
domain:0
free list: 0x80caabc0

SEGMENT 2:

start: 0x15c1000
end:   0x15ed000
domain:0
free list: 0x80caa950

SEGMENT 3:

start: 0x15ee000
end:   0x7916b000
domain:0
free list: 0x80caa950

SEGMENT 4:

start: 0x1
end:   0x1f6d8ae000
domain:0
free list: 0x80caa6e0


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Konstantin Belousov
On Mon, Dec 12, 2016 at 07:21:53PM +0300, Slawa Olhovchenkov wrote:
> On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:
> 
> > On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> > > Booting...
> > > ESC[01;00H8+0x8+0xe9bdc]  
> > > KDB: debugger backends: ddb
> > > KDB: current backend: ddb
> > > exit from kdb_init
> > > KDB: enter: Boot flags requested debugger
> > > [ thread pid 0 tid 0 ]
> > - remove any video consoles from the HEAD kernel config, i.e. sc/vt and
> >   vga/efi,
> > - do not use boot -d,
> > - use serial console (IPMI SOL qualifies),
> > - set late console to 0,
> > and show me the verbose dmesg of such boot with the BIOS options
> > which cause troubles.
> 
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=01 base= len=00099c00
> SMAP type=02 base=00099c00 len=6400
> SMAP type=02 base=000e len=0002
> SMAP type=01 base=0010 len=7906b000
> SMAP type=02 base=7916b000 len=00936000
> SMAP type=04 base=79aa1000 len=00509000
> SMAP type=02 base=79faa000 len=02056000
> SMAP type=01 base=0001 len=001f8000
> SMAP type=02 base=7c00 len=1400
> SMAP type=02 base=fed1c000 len=00029000
> SMAP type=02 base=ff00 len=0100
> 
> This is all. No more.
When you switch between variations of the NUMA enablement options, do
you just reboot the machine, or do you sometimes physically turn it off?

Try enabling memtest with the hw.memtest.tests=1 loader variable.
Does it change things?


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 04:54:18PM +0200, Konstantin Belousov wrote:

> On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> > Booting...
> > ESC[01;00H8+0x8+0xe9bdc]
> >   KDB: debugger backends: ddb
> > KDB: current backend: ddb
> > exit from kdb_init
> > KDB: enter: Boot flags requested debugger
> > [ thread pid 0 tid 0 ]
> - remove any video consoles from the HEAD kernel config, i.e. sc/vt and
>   vga/efi,
> - do not use boot -d,
> - use serial console (IPMI SOL qualifies),
> - set late console to 0,
> and show me the verbose dmesg of such boot with the BIOS options
> which cause troubles.

Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
SMAP type=01 base= len=00099c00
SMAP type=02 base=00099c00 len=6400
SMAP type=02 base=000e len=0002
SMAP type=01 base=0010 len=7906b000
SMAP type=02 base=7916b000 len=00936000
SMAP type=04 base=79aa1000 len=00509000
SMAP type=02 base=79faa000 len=02056000
SMAP type=01 base=0001 len=001f8000
SMAP type=02 base=7c00 len=1400
SMAP type=02 base=fed1c000 len=00029000
SMAP type=02 base=ff00 len=0100

That is all; nothing more is printed.

> > > Hm, might be also show the output of the 'smap' and 'memmap' output from
> > > the loader.  If any of them worked, could be useful to see the same output
> > > with the NUMA option disabled as well.
> > 
> > NUMA disabled:
> > OK smap
> > SMAP type=01 base= len=00099c00 attr=01
> > SMAP type=02 base=00099c00 len=6400 attr=01
> > SMAP type=02 base=000e len=0002 attr=01
> > SMAP type=01 base=0010 len=7906b000 attr=01
> > SMAP type=02 base=7916b000 len=0093a000 attr=01
> > SMAP type=04 base=79aa5000 len=00505000 attr=01
> > SMAP type=02 base=79faa000 len=02056000 attr=01
> > SMAP type=01 base=0001 len=001f8000 attr=01
> > SMAP type=02 base=7c00 len=1400 attr=01
> > SMAP type=02 base=fed1c000 len=00029000 attr=01
> > SMAP type=02 base=ff00 len=0100 attr=01
> > NUMA enabled:
> > OK smap
> > SMAP type=01 base= len=00099c00 attr=01
> > SMAP type=02 base=00099c00 len=6400 attr=01
> > SMAP type=02 base=000e len=0002 attr=01
> > SMAP type=01 base=0010 len=7906b000 attr=01
> > SMAP type=04 base=79aa1000 len=00509000 attr=01
> > SMAP type=02 base=79faa000 len=02056000 attr=01
> > SMAP type=01 base=0001 len=001f8000 attr=01
> > SMAP type=02 base=7c00 len=1400 attr=01
> > SMAP type=02 base=fed1c000 len=00029000 attr=01
> > SMAP type=02 base=ff00 len=0100 attr=01
> > OK memmap
> > memmap not found
> 
> I.e. you use CMS in BIOS/legacy BIOS boot, right ?

legacy BIOS boot, yes

> Can you double-check that the smap output is indeed same for all three
> cases (no-NUMA, NUMA without <4G interleave, NUMA with 4G interleave) ?

NUMA=ON  4G interleave=ON
SMAP type=01 base= len=00099c00 attr=01 
SMAP type=02 base=00099c00 len=6400 attr=01 
SMAP type=02 base=000e len=0002 attr=01 
SMAP type=01 base=0010 len=7906b000 attr=01 
SMAP type=02 base=7916b000 len=00936000 attr=01 
SMAP type=04 base=79aa1000 len=00509000 attr=01 
SMAP type=02 base=79faa000 len=02056000 attr=01 
SMAP type=01 base=0001 len=001f8000 attr=01 
SMAP type=02 base=7c00 len=1400 attr=01 
SMAP type=02 base=fed1c000 len=00029000 attr=01 
SMAP type=02 base=ff00 len=0100 attr=01 

NUMA=ON  4G interleave=OFF
SMAP type=01 base= len=00099c00 attr=01 
SMAP type=02 base=00099c00 len=6400 attr=01 
SMAP type=02 base=000e len=0002 attr=01 
SMAP type=01 base=0010 len=7906b000 attr=01 
SMAP type=02 base=7916b000 len=00936000 attr=01 
SMAP type=04 base=79aa1000 len=00509000 attr=01 
SMAP type=02 base=79faa000 len=02056000 attr=01 
SMAP type=01 base=0001 len=001f8000 attr=01 
SMAP type=02 base=7c00 len=1400 

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-12 Thread Konstantin Belousov
On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> Booting...
> ESC[01;00H8+0x8+0xe9bdc]  
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> exit from kdb_init
> KDB: enter: Boot flags requested debugger
> [ thread pid 0 tid 0 ]
- remove any video consoles from the HEAD kernel config, i.e. sc/vt and
  vga/efi,
- do not use boot -d,
- use serial console (IPMI SOL qualifies),
- set late console to 0,
and show me the verbose dmesg of such boot with the BIOS options
which cause troubles.

> 
> > Hm, might be also show the output of the 'smap' and 'memmap' output from
> > the loader.  If any of them worked, could be useful to see the same output
> > with the NUMA option disabled as well.
> 
> NUMA disabled:
> OK smap
> SMAP type=01 base= len=00099c00 attr=01
> SMAP type=02 base=00099c00 len=6400 attr=01
> SMAP type=02 base=000e len=0002 attr=01
> SMAP type=01 base=0010 len=7906b000 attr=01
> SMAP type=02 base=7916b000 len=0093a000 attr=01
> SMAP type=04 base=79aa5000 len=00505000 attr=01
> SMAP type=02 base=79faa000 len=02056000 attr=01
> SMAP type=01 base=0001 len=001f8000 attr=01
> SMAP type=02 base=7c00 len=1400 attr=01
> SMAP type=02 base=fed1c000 len=00029000 attr=01
> SMAP type=02 base=ff00 len=0100 attr=01
> NUMA enabled:
> OK smap
> SMAP type=01 base= len=00099c00 attr=01
> SMAP type=02 base=00099c00 len=6400 attr=01
> SMAP type=02 base=000e len=0002 attr=01
> SMAP type=01 base=0010 len=7906b000 attr=01
> SMAP type=04 base=79aa1000 len=00509000 attr=01
> SMAP type=02 base=79faa000 len=02056000 attr=01
> SMAP type=01 base=0001 len=001f8000 attr=01
> SMAP type=02 base=7c00 len=1400 attr=01
> SMAP type=02 base=fed1c000 len=00029000 attr=01
> SMAP type=02 base=ff00 len=0100 attr=01
> OK memmap
> memmap not found

I.e. you use CSM in the BIOS (legacy BIOS boot), right?
Can you double-check that the smap output is indeed the same for all three
cases (no NUMA, NUMA without <4G interleave, NUMA with <4G interleave)?

msgbuf setup is very early activity and for most practical purposes it
only depends on the physical segments layout of the machine.  So I do
expect that smap layout changes when the options are frobbed.
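
To make such a comparison concrete, the sketch below takes three SMAP entries from the listings posted in this thread (the NUMA-disabled listing, and the fuller NUMA=ON listing posted in reply) and checks them mechanically. The struct and function names are hypothetical; only the type/base/len values are copied from the thread, and entries whose digits look truncated in the archive are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for one "SMAP type=.. base=.. len=.." line. */
struct demo_smap {
	uint32_t type;
	uint64_t base;
	uint64_t len;
};

/* Values copied from the NUMA-disabled listing in this thread. */
static const struct demo_smap demo_numa_off[] = {
	{ 2, 0x7916b000, 0x0093a000 },
	{ 4, 0x79aa5000, 0x00505000 },
	{ 2, 0x79faa000, 0x02056000 },
};

/* Values copied from the NUMA=ON listing in this thread. */
static const struct demo_smap demo_numa_on[] = {
	{ 2, 0x7916b000, 0x00936000 },
	{ 4, 0x79aa1000, 0x00509000 },
	{ 2, 0x79faa000, 0x02056000 },
};

/* Index of the first entry that differs, or -1 if the tables match. */
static int
demo_smap_first_diff(const struct demo_smap *a, const struct demo_smap *b,
    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i].type != b[i].type || a[i].base != b[i].base ||
		    a[i].len != b[i].len)
			return ((int)i);
	}
	return (-1);
}

/* 1 if every entry ends exactly where the next one begins. */
static int
demo_smap_contiguous(const struct demo_smap *m, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i + 1 < n; i++) {
		if (m[i].base + m[i].len != m[i + 1].base)
			return (0);
	}
	return (1);
}
```

With these values both tables are contiguous and the very first entry already differs: the boundary between the type 02 and type 04 regions moves by 0x4000 bytes, while the overall range the three entries cover stays the same.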


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 01:46:21AM +0300, Slawa Olhovchenkov wrote:

> > > > > > Can you show the verbose dmesg up to the failure point ?
> > > > > > In particular, the SMAP lines should be relevant.
> > > > > 
> > > > > KDB: debugger backends: ddb
> > > > > KDB: current backend: ddb
> > > > > exit from kdb_init
> > > > > KDB: enter: Boot flags requested debugger
> > > > > [ thread pid 0 tid 0 ]
> > > > > Stopped at  0x805361eb = kdb_enter+0x3b:movq
> > > > > $0,0x80dcef20 = kdb_why

The hang at boot results from the combination of the following two BIOS
settings:

NUMA enabled
Socket Interleave below 4G enabled (memory options)

Disabling either one allows the system to boot.



Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Mon, Dec 12, 2016 at 12:15:53AM +0300, Slawa Olhovchenkov wrote:

> On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:
> 
> > On Sun, Dec 11, 2016 at 10:06:54PM +0200, Konstantin Belousov wrote:
> > 
> > > On Sun, Dec 11, 2016 at 10:45:59PM +0300, Slawa Olhovchenkov wrote:
> > > > On Sun, Dec 11, 2016 at 09:26:56PM +0200, Konstantin Belousov wrote:
> > > > 
> > > > > On Sun, Dec 11, 2016 at 10:16:26PM +0300, Slawa Olhovchenkov wrote:
> > > > > > On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:
> > > > > > 
> > > > > > > On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov 
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov 
> > > > > > > > wrote:
> > > > > > > > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > > > > > > > Boot stoped after next messages:
> > > > > > > > > 
> > > > > > > > > ===
> > > > > > > > > Booting...
> > > > > > > > > KDB: debugger backends: ddb
> > > > > > > > > KDB: current backend: ddb
> > > > > > > > So at least the hammer_time() has a chance to initialize the 
> > > > > > > > console.
> > > > > > > > Do you have serial console ?  Set the loader tunable 
> > > > > > > > debug.late_console
> > > > > > > > to 1 and see if any NMI reaction appear.
> > > > > > > > 
> > > > > > > > > ===
> > > > > > > > > 
> > > > > > > > > This is verbose boot.
> > > > > > > > > No reaction to ~^B, NMI.
> > > > > > > > > 
> > > > > > > > > Same for head and 10.3-RELEASE.
> > > > > > > > > 
> > > > > > > > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > > > > > > > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' 
> > > > > > > > ?
> > > > > > > > What if you try to frob it ?
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On slight different hardware
> > > > > > > > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > > > > > > > 10.3 boot ok w/ BIOS NUMA enabled.
> > > > > > > > 
> > > > > > > > I think the only way to debug this is to add printf() lines to 
> > > > > > > > hammer_time()
> > > > > > > > to see where does it break.  Note that amd64_kdb_init() call 
> > > > > > > > succeeded,
> > > > > > > > so you can start bisect the code from there.
> > > > > > > > 
> > > > > > > 
> > > > > > > Hang in next two lines:
> > > > > > > 
> > > > > > > msgbufinit(msgbufp, msgbufsize);
> > > > > > >   fpuinit();
> > > > > 
> > > > > Can you show the verbose dmesg up to the failure point ?
> > > > > In particular, the SMAP lines should be relevant.
> > > > 
> > > > KDB: debugger backends: ddb
> > > > KDB: current backend: ddb
> > > > exit from kdb_init
> > > > KDB: enter: Boot flags requested debugger
> > > > [ thread pid 0 tid 0 ]
> > > > Stopped at  0x805361eb = kdb_enter+0x3b:movq
> > > > $0,0x80dcef20 = kdb_why
> > > > 
> > > > No SMAP print, boot_verbose enabled.
> > > The log above shows that you used boot -d. What are the pristine boot
> > > messages, with debug.late_console set to 0, of course ?
> > 
> > This is stable/11, no debug.late_console.
> 
> Booting HEAD:
> 
> panic: pmap_mapdev_attr: too many preinit mappings
> cpuid = 0
> KDB: stack backtrace:
> #0 0x80535197 at ??+0
> #1 0x804eb0f2 at ??+0
> #2 0x804eaf63 at ??+0
> #3 0x807b5995 at ??+0
> #4 0x808479ca at ??+0
> #5 0x804079ea at ??+0
> #6 0x8040bb44 at ??+0
> #7 0x8047e178 at ??+0
> #8 0x807a47c3 at ??+0
> #9 0x8028f0a4 at ??+0
> Uptime: 1s

This looks like a debug.late_console=0 issue.
The same panic occurs with NUMA disabled.
Only setting debug.late_console=1 allows the system to boot.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Sun, Dec 11, 2016 at 11:47:09PM +0300, Slawa Olhovchenkov wrote:

> On Sun, Dec 11, 2016 at 10:06:54PM +0200, Konstantin Belousov wrote:
> 
> > On Sun, Dec 11, 2016 at 10:45:59PM +0300, Slawa Olhovchenkov wrote:
> > > On Sun, Dec 11, 2016 at 09:26:56PM +0200, Konstantin Belousov wrote:
> > > 
> > > > On Sun, Dec 11, 2016 at 10:16:26PM +0300, Slawa Olhovchenkov wrote:
> > > > > On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:
> > > > > 
> > > > > > On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> > > > > > 
> > > > > > > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov 
> > > > > > > wrote:
> > > > > > > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > > > > > > Boot stoped after next messages:
> > > > > > > > 
> > > > > > > > ===
> > > > > > > > Booting...
> > > > > > > > KDB: debugger backends: ddb
> > > > > > > > KDB: current backend: ddb
> > > > > > > So at least the hammer_time() has a chance to initialize the 
> > > > > > > console.
> > > > > > > Do you have serial console ?  Set the loader tunable 
> > > > > > > debug.late_console
> > > > > > > to 1 and see if any NMI reaction appear.
> > > > > > > 
> > > > > > > > ===
> > > > > > > > 
> > > > > > > > This is verbose boot.
> > > > > > > > No reaction to ~^B, NMI.
> > > > > > > > 
> > > > > > > > Same for head and 10.3-RELEASE.
> > > > > > > > 
> > > > > > > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > > > > > > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> > > > > > > What if you try to frob it ?
> > > > > > > 
> > > > > > > > 
> > > > > > > > On slight different hardware
> > > > > > > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > > > > > > 10.3 boot ok w/ BIOS NUMA enabled.
> > > > > > > 
> > > > > > > I think the only way to debug this is to add printf() lines to 
> > > > > > > hammer_time()
> > > > > > > to see where does it break.  Note that amd64_kdb_init() call 
> > > > > > > succeeded,
> > > > > > > so you can start bisect the code from there.
> > > > > > > 
> > > > > > 
> > > > > > Hang in next two lines:
> > > > > > 
> > > > > > msgbufinit(msgbufp, msgbufsize);
> > > > > > fpuinit();
> > > > 
> > > > Can you show the verbose dmesg up to the failure point ?
> > > > In particular, the SMAP lines should be relevant.
> > > 
> > > KDB: debugger backends: ddb
> > > KDB: current backend: ddb
> > > exit from kdb_init
> > > KDB: enter: Boot flags requested debugger
> > > [ thread pid 0 tid 0 ]
> > > Stopped at  0x805361eb = kdb_enter+0x3b:movq
> > > $0,0x80dcef20 = kdb_why
> > > 
> > > No SMAP print, boot_verbose enabled.
> > The log above shows that you used boot -d. What are the pristine boot
> > messages, with debug.late_console set to 0, of course ?
> 
> This is stable/11, no debug.late_console.

Booting HEAD:

panic: pmap_mapdev_attr: too many preinit mappings
cpuid = 0
KDB: stack backtrace:
#0 0x80535197 at ??+0
#1 0x804eb0f2 at ??+0
#2 0x804eaf63 at ??+0
#3 0x807b5995 at ??+0
#4 0x808479ca at ??+0
#5 0x804079ea at ??+0
#6 0x8040bb44 at ??+0
#7 0x8047e178 at ??+0
#8 0x807a47c3 at ??+0
#9 0x8028f0a4 at ??+0
Uptime: 1s


> With ANSI ESC, captured from SOL:
> 
> ESC[01;00HType '?' for a /boot/kernel.VSTREAM/opensolaris.ko size 0xcb10 at 
> 0x13d3000
> ESC[01;00HOK smap 
> /boot/kernel.VSTREAM/if_igb.ko size 0x69f10 at 0x13e
> ESC[02;00HSMAP type=02 base=0009ESC[01;00HSMAP type=01 
> base= len=00099c00 attr=01 
> ESC[02;00HSMAP type=02 base=00099c00 len=6400 attr=01 
> ESC[03;00Hcan't find 'if_ixgbe'
> ESC[01;00H/boot/kernel.VSTREAM/if_lagg.ko size 0x150c0 at 0x144a000
> ESC[01;00HSMAP type=02 base=000e/boot/kernel.VSTREAM/ukbd.ko size 
> 0xe280 at 0x146
> loading required moduleESC[01;00H 'usb'^MSMAP type=01 base=7916b000 
> len=00936000 attr
> =01 ESC[02;00H/boot/kernel.VSTREAM/usb.ko size 0x45d40 at 
> 0x146f000
> ESC[01;00HSMAP type=04 base=79aa1000 
> len=00/boot/kernel.VSTREAM/umass.ko size 0xaa10 at 0x14b5000
> ESC[01;00HSMAP type=02 base=79faa000 len=02056000 attr=01 
> ESC[02;00HSMAP type=01 ba/boot/kernel.VSTREAM/accf_http.ko size 
> 0x2710 at 0x14c
> ESC[01;00HSMAP type=01 base=0001 len=001f8000 attr=01 
> /boot/kernel.VSTREAM/sfxge.ko ESC[02;00HSMAP type=02 
> base=7c00 len=1400 attr=01 size 
> 0x1a8ee0 at 0x14c3000
> ESC[03;00HSMAP type=02 base=ff00 len=0100 
> attr/boot/kernel.VSTREAM/uhci.ko size 0xd448 at 0x166c000
> ESC[01;00HSMAP type=02 base=fed1c000 len=00029000 attr=01 
> /boot/kernel.VSTREAM/ohci.ko size 

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Sun, Dec 11, 2016 at 10:06:54PM +0200, Konstantin Belousov wrote:

> On Sun, Dec 11, 2016 at 10:45:59PM +0300, Slawa Olhovchenkov wrote:
> > On Sun, Dec 11, 2016 at 09:26:56PM +0200, Konstantin Belousov wrote:
> > 
> > > On Sun, Dec 11, 2016 at 10:16:26PM +0300, Slawa Olhovchenkov wrote:
> > > > On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:
> > > > 
> > > > > On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> > > > > 
> > > > > > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > > > > > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > > > > > Boot stoped after next messages:
> > > > > > > 
> > > > > > > ===
> > > > > > > Booting...
> > > > > > > KDB: debugger backends: ddb
> > > > > > > KDB: current backend: ddb
> > > > > > So at least the hammer_time() has a chance to initialize the 
> > > > > > console.
> > > > > > Do you have serial console ?  Set the loader tunable 
> > > > > > debug.late_console
> > > > > > to 1 and see if any NMI reaction appear.
> > > > > > 
> > > > > > > ===
> > > > > > > 
> > > > > > > This is verbose boot.
> > > > > > > No reaction to ~^B, NMI.
> > > > > > > 
> > > > > > > Same for head and 10.3-RELEASE.
> > > > > > > 
> > > > > > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > > > > > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> > > > > > What if you try to frob it ?
> > > > > > 
> > > > > > > 
> > > > > > > On slight different hardware
> > > > > > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > > > > > 10.3 boot ok w/ BIOS NUMA enabled.
> > > > > > 
> > > > > > I think the only way to debug this is to add printf() lines to 
> > > > > > hammer_time()
> > > > > > to see where does it break.  Note that amd64_kdb_init() call 
> > > > > > succeeded,
> > > > > > so you can start bisect the code from there.
> > > > > > 
> > > > > 
> > > > > Hang in next two lines:
> > > > > 
> > > > > msgbufinit(msgbufp, msgbufsize);
> > > > >   fpuinit();
> > > 
> > > Can you show the verbose dmesg up to the failure point ?
> > > In particular, the SMAP lines should be relevant.
> > 
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> > exit from kdb_init
> > KDB: enter: Boot flags requested debugger
> > [ thread pid 0 tid 0 ]
> > Stopped at  0x805361eb = kdb_enter+0x3b:movq
> > $0,0x80dcef20 = kdb_why
> > 
> > No SMAP print, boot_verbose enabled.
> The log above shows that you used boot -d. What are the pristine boot
> messages, with debug.late_console set to 0, of course ?

This is stable/11, no debug.late_console.
With ANSI ESC, captured from SOL:

ESC[01;00HType '?' for a /boot/kernel.VSTREAM/opensolaris.ko size 0xcb10 at 
0x13d3000
ESC[01;00HOK smap   
  /boot/kernel.VSTREAM/if_igb.ko size 0x69f10 at 0x13e
ESC[02;00HSMAP type=02 base=0009ESC[01;00HSMAP type=01 
base= len=00099c00 attr=01 
ESC[02;00HSMAP type=02 base=00099c00 len=6400 attr=01   
  ESC[03;00Hcan't find 'if_ixgbe'
ESC[01;00H/boot/kernel.VSTREAM/if_lagg.ko size 0x150c0 at 0x144a000
ESC[01;00HSMAP type=02 base=000e/boot/kernel.VSTREAM/ukbd.ko size 
0xe280 at 0x146
loading required moduleESC[01;00H 'usb'^MSMAP type=01 base=7916b000 
len=00936000 attr
=01 ESC[02;00H/boot/kernel.VSTREAM/usb.ko size 0x45d40 at 
0x146f000
ESC[01;00HSMAP type=04 base=79aa1000 
len=00/boot/kernel.VSTREAM/umass.ko size 0xaa10 at 0x14b5000
ESC[01;00HSMAP type=02 base=79faa000 len=02056000 attr=01   
  ESC[02;00HSMAP type=01 ba/boot/kernel.VSTREAM/accf_http.ko size 
0x2710 at 0x14c
ESC[01;00HSMAP type=01 base=0001 len=001f8000 attr=01   
  /boot/kernel.VSTREAM/sfxge.ko ESC[02;00HSMAP type=02 
base=7c00 len=1400 attr=01 size 
0x1a8ee0 at 0x14c3000
ESC[03;00HSMAP type=02 base=ff00 len=0100 
attr/boot/kernel.VSTREAM/uhci.ko size 0xd448 at 0x166c000
ESC[01;00HSMAP type=02 base=fed1c000 len=00029000 attr=01   
  /boot/kernel.VSTREAM/ohci.ko size 0xc900 at 0x167a000
ESC[02;00HOK memmap 
  ESC[03;00Hmemmap not foun/boot/kernel.VSTREAM/ehci.ko size 0xfb60 at 
0x1687000
ESC[01;00HOK memmap   
/boot/kernel.VSTREAM/xhci.ko size 0x11010 at 0x1697000
ESC[01;00Hmemmap not found  /boot/kernel.VSTREAM/if_ix.ko size 
0x51358 at 0x16a9000^MESC[01;00HOK boot  
  ESC[01;00HOK boot   
/boot/kernel.VSTREAM/cc_htcp.ko size 0x3a70 at 0x16fb000
Booting...
ESC[01;00H8+0x8+0xe9bdc]
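
The capture above is hard to read because loader output and console output are interleaved through cursor-positioning sequences. A minimal sketch (not part of the original mail) for pulling the complete SMAP records out of such a capture. It assumes the escape byte appears either as a real ESC or as the literal text "ESC", as in this archive. Records that were cut in half by the interleaving are skipped rather than guessed, and the numbers in this archived copy seem to have lost digits, so treat any values as illustrative.

```python
import re

# Cursor-positioning sequence "ESC[row;colH". The archive shows the escape
# byte as the literal text "ESC"; a live capture would contain "\x1b".
CURSOR = re.compile(r"(?:\x1b|ESC)\[\d+;\d+H")

# One complete record in the form seen after the loader's "smap" command.
SMAP = re.compile(
    r"SMAP type=([0-9a-fA-F]+) base=([0-9a-fA-F]+) "
    r"len=([0-9a-fA-F]+) attr=([0-9a-fA-F]+)"
)


def extract_smap(capture):
    """Return (type, base, length, attr) tuples for every complete record."""
    # Turn each cursor move into a line break so records stop running together.
    text = CURSOR.sub("\n", capture)
    return [tuple(int(field, 16) for field in match.groups())
            for match in SMAP.finditer(text)]


# Two lines copied from the capture above.
sample = (
    "ESC[02;00HSMAP type=02 base=00099c00 len=6400 attr=01 \n"
    "ESC[03;00Hcan't find 'if_ixgbe'\n"
)
for smap_type, base, length, attr in extract_smap(sample):
    print("type=%02x base=%#x len=%#x attr=%02x" % (smap_type, base, length, attr))
```

Running the same function over captures taken with NUMA enabled and disabled would give two lists that can be compared directly.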

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Konstantin Belousov
On Sun, Dec 11, 2016 at 10:45:59PM +0300, Slawa Olhovchenkov wrote:
> On Sun, Dec 11, 2016 at 09:26:56PM +0200, Konstantin Belousov wrote:
> 
> > On Sun, Dec 11, 2016 at 10:16:26PM +0300, Slawa Olhovchenkov wrote:
> > > On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:
> > > 
> > > > On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> > > > 
> > > > > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > > > > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > > > > Boot stoped after next messages:
> > > > > > 
> > > > > > ===
> > > > > > Booting...
> > > > > > KDB: debugger backends: ddb
> > > > > > KDB: current backend: ddb
> > > > > So at least the hammer_time() has a chance to initialize the console.
> > > > > Do you have serial console ?  Set the loader tunable 
> > > > > debug.late_console
> > > > > to 1 and see if any NMI reaction appear.
> > > > > 
> > > > > > ===
> > > > > > 
> > > > > > This is verbose boot.
> > > > > > No reaction to ~^B, NMI.
> > > > > > 
> > > > > > Same for head and 10.3-RELEASE.
> > > > > > 
> > > > > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > > > > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> > > > > What if you try to frob it ?
> > > > > 
> > > > > > 
> > > > > > On slight different hardware
> > > > > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > > > > 10.3 boot ok w/ BIOS NUMA enabled.
> > > > > 
> > > > > I think the only way to debug this is to add printf() lines to 
> > > > > hammer_time()
> > > > > to see where does it break.  Note that amd64_kdb_init() call 
> > > > > succeeded,
> > > > > so you can start bisect the code from there.
> > > > > 
> > > > 
> > > > Hang in next two lines:
> > > > 
> > > > msgbufinit(msgbufp, msgbufsize);
> > > > fpuinit();
> > 
> > Can you show the verbose dmesg up to the failure point ?
> > In particular, the SMAP lines should be relevant.
> 
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> exit from kdb_init
> KDB: enter: Boot flags requested debugger
> [ thread pid 0 tid 0 ]
> Stopped at  0x805361eb = kdb_enter+0x3b:movq
> $0,0x80dcef20 = kdb_why
> 
> No SMAP print, boot_verbose enabled.
The log above shows that you used boot -d. What are the pristine boot
messages, with debug.late_console set to 0, of course ?

Hm, it might also help to show the output of the 'smap' and 'memmap'
commands from the loader.  If either of them works, it could be useful to
see the same output with the NUMA option disabled as well.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Sun, Dec 11, 2016 at 09:26:56PM +0200, Konstantin Belousov wrote:

> On Sun, Dec 11, 2016 at 10:16:26PM +0300, Slawa Olhovchenkov wrote:
> > On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:
> > 
> > > On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> > > 
> > > > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > > > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > > > Boot stoped after next messages:
> > > > > 
> > > > > ===
> > > > > Booting...
> > > > > KDB: debugger backends: ddb
> > > > > KDB: current backend: ddb
> > > > So at least the hammer_time() has a chance to initialize the console.
> > > > Do you have serial console ?  Set the loader tunable debug.late_console
> > > > to 1 and see if any NMI reaction appear.
> > > > 
> > > > > ===
> > > > > 
> > > > > This is verbose boot.
> > > > > No reaction to ~^B, NMI.
> > > > > 
> > > > > Same for head and 10.3-RELEASE.
> > > > > 
> > > > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > > > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> > > > What if you try to frob it ?
> > > > 
> > > > > 
> > > > > On slight different hardware
> > > > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > > > 10.3 boot ok w/ BIOS NUMA enabled.
> > > > 
> > > > I think the only way to debug this is to add printf() lines to 
> > > > hammer_time()
> > > > to see where does it break.  Note that amd64_kdb_init() call succeeded,
> > > > so you can start bisect the code from there.
> > > > 
> > > 
> > > Hang in next two lines:
> > > 
> > > msgbufinit(msgbufp, msgbufsize);
> > >   fpuinit();
> 
> Can you show the verbose dmesg up to the failure point ?
> In particular, the SMAP lines should be relevant.

KDB: debugger backends: ddb
KDB: current backend: ddb
exit from kdb_init
KDB: enter: Boot flags requested debugger
[ thread pid 0 tid 0 ]
Stopped at  0x805361eb = kdb_enter+0x3b: movq    $0,0x80dcef20 = kdb_why

No SMAP lines are printed, even with boot_verbose enabled.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Konstantin Belousov
On Sun, Dec 11, 2016 at 10:16:26PM +0300, Slawa Olhovchenkov wrote:
> On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:
> 
> > On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> > 
> > > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > > Boot stoped after next messages:
> > > > 
> > > > ===
> > > > Booting...
> > > > KDB: debugger backends: ddb
> > > > KDB: current backend: ddb
> > > So at least the hammer_time() has a chance to initialize the console.
> > > Do you have serial console ?  Set the loader tunable debug.late_console
> > > to 1 and see if any NMI reaction appear.
> > > 
> > > > ===
> > > > 
> > > > This is verbose boot.
> > > > No reaction to ~^B, NMI.
> > > > 
> > > > Same for head and 10.3-RELEASE.
> > > > 
> > > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> > > What if you try to frob it ?
> > > 
> > > > 
> > > > On slight different hardware
> > > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > > 10.3 boot ok w/ BIOS NUMA enabled.
> > > 
> > > I think the only way to debug this is to add printf() lines to 
> > > hammer_time()
> > > to see where does it break.  Note that amd64_kdb_init() call succeeded,
> > > so you can start bisect the code from there.
> > > 
> > 
> > Hang in next two lines:
> > 
> > msgbufinit(msgbufp, msgbufsize);
> > fpuinit();

Can you show the verbose dmesg up to the failure point ?
In particular, the SMAP lines should be relevant.
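
One way to use the SMAP lines once they are available is to check whether the physical range the kernel is about to touch is entirely covered by usable (type 01) entries. A rough sketch, not part of the original mail; the map and the addresses below are made up for illustration and are not taken from this machine.

```python
SMAP_TYPE_MEMORY = 1  # usable RAM; other types are reserved or ACPI regions


def covered_by_ram(smap, start, length):
    """True if [start, start + length) lies inside usable SMAP entries.

    smap is a list of (type, base, length) tuples. Usable entries are
    walked in address order, so a range may span adjacent entries.
    """
    end = start + length
    cursor = start
    usable = sorted((base, base + size)
                    for smap_type, base, size in smap
                    if smap_type == SMAP_TYPE_MEMORY)
    for base, limit in usable:
        if base <= cursor < limit:
            cursor = limit
            if cursor >= end:
                return True
    return False


# Hypothetical map: two usable regions with a reserved hole between them.
example = [
    (1, 0x00000000, 0x00099c00),
    (2, 0x00099c00, 0x00006400),
    (1, 0x00100000, 0x7ff00000),
]
print(covered_by_ram(example, 0x200000, 0x10000))  # inside the third entry
print(covered_by_ram(example, 0x98000, 0x4000))    # runs into the reserved hole
```

A range that the map reports as usable but that still faults when written would point at the firmware's map rather than at the kernel.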


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Sun, Dec 11, 2016 at 09:21:11PM +0300, Slawa Olhovchenkov wrote:

> On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> 
> > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > Boot stoped after next messages:
> > > 
> > > ===
> > > Booting...
> > > KDB: debugger backends: ddb
> > > KDB: current backend: ddb
> > So at least the hammer_time() has a chance to initialize the console.
> > Do you have serial console ?  Set the loader tunable debug.late_console
> > to 1 and see if any NMI reaction appear.
> > 
> > > ===
> > > 
> > > This is verbose boot.
> > > No reaction to ~^B, NMI.
> > > 
> > > Same for head and 10.3-RELEASE.
> > > 
> > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> > What if you try to frob it ?
> > 
> > > 
> > > On slight different hardware
> > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > 10.3 boot ok w/ BIOS NUMA enabled.
> > 
> > I think the only way to debug this is to add printf() lines to hammer_time()
> > to see where does it break.  Note that amd64_kdb_init() call succeeded,
> > so you can start bisect the code from there.
> > 
> 
> Hang in next two lines:
> 
> msgbufinit(msgbufp, msgbufsize);
>   fpuinit();

[ thread pid 0 tid 0 ]
Stopped at  0x80538c10 = msgbuf_reinit: pushq   %rbp
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c11 = msgbuf_reinit+0x1: movq    %rsp,%rbp
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c14 = msgbuf_reinit+0x4: pushq   %r14
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c16 = msgbuf_reinit+0x6: pushq   %rbx
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c17 = msgbuf_reinit+0x7: movl    %edx,%r8d
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c1a = msgbuf_reinit+0xa: movq    %rdi,%r14
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c1d = msgbuf_reinit+0xd: movq    0x8(%r14),%rax
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c21 = msgbuf_reinit+0x11: cmpl    $0x63062,%eax
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538c26 = msgbuf_reinit+0x16: jnz     0x80538d37 = msgbuf_reinit+0x127
db> 
[ thread pid 0 tid 0 ]
Stopped at  0x80538d37 = msgbuf_reinit+0x127: movq    %rsi,(%r14)
db> 
[ thread pid 0 tid 0 ]
Stopped atKDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper() at 0x8032de4b = db_trace_self_wrapper+0x2b/frame 0x80bb9370
kdb_reenter() at 0x8053670e = kdb_reenter+0x8e/frame 0x80bb9420
trap() at 0x807c0a31 = trap+0x51/frame 0x80bb9630
calltrap() at 0x807a5011 = calltrap+0x8/frame 0x80bb9630
--- trap 0x1c, rip = 0x8032db33, rsp = 0x80bb9700, rbp = 0x80bb9730 ---
X_db_search_symbol() at 0x8032db33 = X_db_search_symbol+0x53/frame 0x80bb9730
db_printsym() at 0x80330f30 = db_printsym+0x70/frame 0x80bb97a0
db_print_loc_and_inst() at 0x8032c0a3 = db_print_loc_and_inst+0x13/frame 0x80bb97c0
db_trap() at 0x8032df7f = db_trap+0xcf/frame 0x80bb9850
kdb_trap() at 0x80536b43 = kdb_trap+0x193/frame 0x80bb98e0
trap() at 0x807c0c3c = trap+0x25c/frame 0x80bb9af0
calltrap() at 0x807a5011 = calltrap+0x8/frame 0x80bb9af0
--- trap 0xa, rip = 0x80538d3a, rsp = 0x80bb9bc0, rbp = 0x80bb9bd0 ---
msgbuf_reinit() at 0x80538d3a = msgbuf_reinit+0x12a/frame 0x80bb9bd0
msgbufinit() at 0x8053dd31 = msgbufinit+0x21/frame 0x80bb9be0
hammer_time() at 0x807aa5fa = hammer_time+0xf8a/frame 0x80bba070
btext() at 0x8028fc34 = btext+0x24
db> show registers
cs          0x20
ds          0x28
es          0x28
fs          0x28
gs          0x28
ss          0x28
rax         0x
rcx         0
rdx         0x17fb8
rbx         0x7
rsp         0x80bb9bc0  __stop_set_pcpu+0xb48
rbp         0x80bb9bd0  __stop_set_pcpu+0xb58
rsi         0xf8207ffe8000
rdi         0xf8207fb8
r8          0x17fb8
r9          0x80bb9818  __stop_set_pcpu+0x7a0
r10         0
r11         0
r12         0
r13         0x208000
r14         0xf8207fb8
r15         0x8180
rip         0x80538d3a  msgbuf_reinit+0x12a
rflags      0x82
0x80538d3a = msgbuf_reinit+0x12a:   movl    %r8d,0xc(%r14)
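
For reading the backtrace above, the two "--- trap N ---" frames can be decoded with a small lookup. A minimal sketch, not part of the original mail; the numbers are the x86 trap types as I recall them from sys/x86/include/trap.h, only a subset is listed, and they should be checked against your own tree.

```python
# Subset of x86 trap type numbers, assumed from sys/x86/include/trap.h.
TRAP_NAMES = {
    1: "T_PRIVINFLT (privileged instruction)",
    3: "T_BPTFLT (breakpoint)",
    9: "T_PROTFLT (protection fault)",
    10: "T_TRCTRAP (trace / single-step trap)",
    12: "T_PAGEFLT (page fault)",
    19: "T_NMI (non-maskable interrupt)",
    23: "T_DOUBLEFLT (double fault)",
    28: "T_MCHK (machine check)",
}


def decode_trap(text):
    """Return a readable name for a trap number written as in ddb, e.g. '0x1c'."""
    number = int(text, 16)
    return TRAP_NAMES.get(number, "unknown trap %d" % number)


for trap in ("0xa", "0x1c"):
    print(trap, "->", decode_trap(trap))
```

If that table is right, trap 0xa is only the single-step trap from stepping in ddb, while trap 0x1c would be a machine check taken when ddb tried to print the next instruction, right after the first store through %r14.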

Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-12-11 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:

> On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > I am try to enable NUMA in bios and can't boot FreeBSD.
> > Boot stoped after next messages:
> > 
> > ===
> > Booting...
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> So at least the hammer_time() has a chance to initialize the console.
> Do you have serial console ?  Set the loader tunable debug.late_console
> to 1 and see if any NMI reaction appear.
> 
> > ===
> > 
> > This is verbose boot.
> > No reaction to ~^B, NMI.
> > 
> > Same for head and 10.3-RELEASE.
> > 
> > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> What if you try to frob it ?
> 
> > 
> > On slight different hardware
> > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > 10.3 boot ok w/ BIOS NUMA enabled.
> 
> I think the only way to debug this is to add printf() lines to hammer_time()
> to see where does it break.  Note that amd64_kdb_init() call succeeded,
> so you can start bisect the code from there.
> 

It hangs in the next two lines:

msgbufinit(msgbufp, msgbufsize);
fpuinit();


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-27 Thread Adrian Chadd
On 26 November 2016 at 13:59, Slawa Olhovchenkov  wrote:
> On Sat, Nov 26, 2016 at 01:55:14PM -0800, Adrian Chadd wrote:
>
>> ok, hm. then i don' know offhand, not without putting in printf debugging. :)
>
> I am not expert in this code, I am need you patches for printf debugging.

heh, sorry, I'm too busy atm to help out more with this. My day job
doesn't involve FreeBSD at all and I have 802.11ac to get off the
ground.

Someone else will have to help figure this out :(



-adrian


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 01:55:14PM -0800, Adrian Chadd wrote:

> ok, hm. then i don' know offhand, not without putting in printf debugging. :)

I am not an expert in this code; I need your patches for printf debugging.

> On 26 November 2016 at 13:51, Slawa Olhovchenkov  wrote:
> > On Sat, Nov 26, 2016 at 01:49:00PM -0800, Adrian Chadd wrote:
> >
> >> Ok. So boot verbose and let's see what it says.
> >
> > See first message: it's already verbose boot.
> > Yes, only 3 lines.
> >
> >>
> >> On 26 November 2016 at 10:39, Slawa Olhovchenkov  wrote:
> >> > On Sat, Nov 26, 2016 at 09:44:49AM -0800, Adrian Chadd wrote:
> >> >
> >> >> The ACPI SRAT parsing code - sys/x86/acpica/srat.c .
> >> >>
> >> >> I'd start by enabling bootverbose - adds one echo (SLIT.Localities and
> >> >> the table); adds CPU affinity info (legacy, XAPIC, ACPI) and other
> >> >> locality stuff.
> >> >
> >> > I am use r308809 of HEAD.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Adrian Chadd
ok, hm. then i don't know offhand, not without putting in printf debugging. :)


-a

On 26 November 2016 at 13:51, Slawa Olhovchenkov  wrote:
> On Sat, Nov 26, 2016 at 01:49:00PM -0800, Adrian Chadd wrote:
>
>> Ok. So boot verbose and let's see what it says.
>
> See first message: it's already verbose boot.
> Yes, only 3 lines.
>
>>
>> On 26 November 2016 at 10:39, Slawa Olhovchenkov  wrote:
>> > On Sat, Nov 26, 2016 at 09:44:49AM -0800, Adrian Chadd wrote:
>> >
>> >> The ACPI SRAT parsing code - sys/x86/acpica/srat.c .
>> >>
>> >> I'd start by enabling bootverbose - adds one echo (SLIT.Localities and
>> >> the table); adds CPU affinity info (legacy, XAPIC, ACPI) and other
>> >> locality stuff.
>> >
>> > I am use r308809 of HEAD.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 01:49:00PM -0800, Adrian Chadd wrote:

> Ok. So boot verbose and let's see what it says.

See first message: it's already verbose boot.
Yes, only 3 lines.

> 
> On 26 November 2016 at 10:39, Slawa Olhovchenkov  wrote:
> > On Sat, Nov 26, 2016 at 09:44:49AM -0800, Adrian Chadd wrote:
> >
> >> The ACPI SRAT parsing code - sys/x86/acpica/srat.c .
> >>
> >> I'd start by enabling bootverbose - adds one echo (SLIT.Localities and
> >> the table); adds CPU affinity info (legacy, XAPIC, ACPI) and other
> >> locality stuff.
> >
> > I am use r308809 of HEAD.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Adrian Chadd
Ok. So boot verbose and let's see what it says.


-adrian


On 26 November 2016 at 10:39, Slawa Olhovchenkov  wrote:
> On Sat, Nov 26, 2016 at 09:44:49AM -0800, Adrian Chadd wrote:
>
>> The ACPI SRAT parsing code - sys/x86/acpica/srat.c .
>>
>> I'd start by enabling bootverbose - adds one echo (SLIT.Localities and
>> the table); adds CPU affinity info (legacy, XAPIC, ACPI) and other
>> locality stuff.
>
> I am use r308809 of HEAD.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 09:44:49AM -0800, Adrian Chadd wrote:

> The ACPI SRAT parsing code - sys/x86/acpica/srat.c .
> 
> I'd start by enabling bootverbose - adds one echo (SLIT.Localities and
> the table); adds CPU affinity info (legacy, XAPIC, ACPI) and other
> locality stuff.

I am using r308809 of HEAD.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Adrian Chadd
The ACPI SRAT parsing code - sys/x86/acpica/srat.c .

I'd start by enabling bootverbose - adds one echo (SLIT.Localities and
the table); adds CPU affinity info (legacy, XAPIC, ACPI) and other
locality stuff.


-adrian


On 26 November 2016 at 09:37, Slawa Olhovchenkov  wrote:
> On Sat, Nov 26, 2016 at 09:35:08AM -0800, Adrian Chadd wrote:
>
>> It may be something to do with memory topology parsing. Maybe we need
>> some more debugging there to try and catch it.
>
> What debug you need?
>
>> On 26 November 2016 at 01:21, Slawa Olhovchenkov  wrote:
>> > I am try to enable NUMA in bios and can't boot FreeBSD.
>> > Boot stoped after next messages:
>> >
>> > ===
>> > Booting...
>> > KDB: debugger backends: ddb
>> > KDB: current backend: ddb
>> > ===
>> >
>> > This is verbose boot.
>> > No reaction to ~^B, NMI.
>> >
>> > Same for head and 10.3-RELEASE.
>> >
>> > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
>> >
>> > On slight different hardware
>> > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
>> > 10.3 boot ok w/ BIOS NUMA enabled.
>> >


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 09:35:08AM -0800, Adrian Chadd wrote:

> It may be something to do with memory topology parsing. Maybe we need
> some more debugging there to try and catch it.

What debug output do you need?

> On 26 November 2016 at 01:21, Slawa Olhovchenkov  wrote:
> > I am try to enable NUMA in bios and can't boot FreeBSD.
> > Boot stoped after next messages:
> >
> > ===
> > Booting...
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> > ===
> >
> > This is verbose boot.
> > No reaction to ~^B, NMI.
> >
> > Same for head and 10.3-RELEASE.
> >
> > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> >
> > On slight different hardware
> > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > 10.3 boot ok w/ BIOS NUMA enabled.
> >


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Adrian Chadd
It may be something to do with memory topology parsing. Maybe we need
some more debugging there to try and catch it.



-a


On 26 November 2016 at 01:21, Slawa Olhovchenkov  wrote:
> I am try to enable NUMA in bios and can't boot FreeBSD.
> Boot stoped after next messages:
>
> ===
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> ===
>
> This is verbose boot.
> No reaction to ~^B, NMI.
>
> Same for head and 10.3-RELEASE.
>
> Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
>
> On slight different hardware
> (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> 10.3 boot ok w/ BIOS NUMA enabled.
>


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 07:07:20PM +0300, Slawa Olhovchenkov wrote:

> On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> 
> > On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > > I am try to enable NUMA in bios and can't boot FreeBSD.
> > > Boot stoped after next messages:
> > > 
> > > ===
> > > Booting...
> > > KDB: debugger backends: ddb
> > > KDB: current backend: ddb
> > So at least the hammer_time() has a chance to initialize the console.
> > Do you have serial console ?  Set the loader tunable debug.late_console
> 
> Via ipmi sol
> 
> > to 1 and see if any NMI reaction appear.
> 
> I am try this late.
> 
> > > ===
> > > 
> > > This is verbose boot.
> > > No reaction to ~^B, NMI.
> > > 
> > > Same for head and 10.3-RELEASE.
> > > 
> > > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> > Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> 
> No
> 
> > What if you try to frob it ?
> > 
> > > 
> > > On slight different hardware
> > > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > > 10.3 boot ok w/ BIOS NUMA enabled.
> > 
> > I think the only way to debug this is to add printf() lines to hammer_time()
> > to see where does it break.  Note that amd64_kdb_init() call succeeded,
> > so you can start bisect the code from there.
> > 
> 
> I am not expert in this code.

I think the code halts in getmemsize() or before it, since I don't
see any messages from init_ops.parse_memmap()/native_parse_memmap().


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Slawa Olhovchenkov
On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:

> On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > I am try to enable NUMA in bios and can't boot FreeBSD.
> > Boot stoped after next messages:
> > 
> > ===
> > Booting...
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> So at least the hammer_time() has a chance to initialize the console.
> Do you have serial console ?  Set the loader tunable debug.late_console

Via ipmi sol

> to 1 and see if any NMI reaction appear.

I will try this later.

> > ===
> > 
> > This is verbose boot.
> > No reaction to ~^B, NMI.
> > 
> > Same for head and 10.3-RELEASE.
> > 
> > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?

No

> What if you try to frob it ?
> 
> > 
> > On slight different hardware
> > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > 10.3 boot ok w/ BIOS NUMA enabled.
> 
> I think the only way to debug this is to add printf() lines to hammer_time()
> to see where it breaks.  Note that the amd64_kdb_init() call succeeded,
> so you can start bisecting the code from there.
> 

I am not an expert in this code.


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Konstantin Belousov
On Sat, Nov 26, 2016 at 05:57:47PM +0200, Konstantin Belousov wrote:
> On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> > I am trying to enable NUMA in the BIOS and can't boot FreeBSD.
> > Boot stopped after the following messages:
> > 
> > ===
> > Booting...
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> So at least hammer_time() had a chance to initialize the console.
> Do you have a serial console?  Set the loader tunable debug.late_console
> to 1 and see if any NMI reaction appears.
Err, sorry.  Set it to 0.
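As a sketch, assuming a kernel recent enough to have the
debug.late_console tunable, it can be set persistently with a line
like this in /boot/loader.conf:

```
# /boot/loader.conf
debug.late_console="0"
```

For a one-off test, typing "set debug.late_console=0" and then "boot"
at the loader prompt should have the same effect without touching the
file.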

> 
> > ===
> > 
> > This is verbose boot.
> > No reaction to ~^B, NMI.
> > 
> > Same for head and 10.3-RELEASE.
> > 
> > Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
> Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
> What if you try to frob it ?
> 
> > 
> > On slightly different hardware
> > (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> > 10.3 boots OK w/ BIOS NUMA enabled.
> 
> I think the only way to debug this is to add printf() lines to hammer_time()
> to see where it breaks.  Note that the amd64_kdb_init() call succeeded,
> so you can start bisecting the code from there.
> 


Re: Enabling NUMA in BIOS stop booting FreeBSD

2016-11-26 Thread Konstantin Belousov
On Sat, Nov 26, 2016 at 12:21:24PM +0300, Slawa Olhovchenkov wrote:
> I am trying to enable NUMA in the BIOS and can't boot FreeBSD.
> Boot stopped after the following messages:
> 
> ===
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
So at least hammer_time() had a chance to initialize the console.
Do you have a serial console?  Set the loader tunable debug.late_console
to 1 and see if any NMI reaction appears.

> ===
> 
> This is verbose boot.
> No reaction to ~^B, NMI.
> 
> Same for head and 10.3-RELEASE.
> 
> Hardware is Supermicro X10DRi, Dual E5-2650v4, 256GB RAM.
Is there a BIOS option for 'on-chip cluster' or 'HPC computing' ?
What if you try to frob it ?

> 
> On slightly different hardware
> (Supermicro X10DRi w/ old BIOS, Dual E5-2640v3, 128GB RAM)
> 10.3 boots OK w/ BIOS NUMA enabled.

I think the only way to debug this is to add printf() lines to hammer_time()
to see where it breaks.  Note that the amd64_kdb_init() call succeeded,
so you can start bisecting the code from there.
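
To illustrate the idea, here is a minimal userland sketch of
checkpoint-style printf() bisection.  It is not kernel code: the helper
names (checkpoint, simulate_boot), the hang_in parameter and the
"after_getmemsize" step are hypothetical, and only amd64_kdb_init and
getmemsize are names from the real code path.  In the kernel, the same
pattern would be a printf() placed before each call of interest in
hammer_time().

```c
#include <stdio.h>

/* Hypothetical helper state: the last checkpoint that was reached. */
static const char *last_checkpoint = "none";

/* Record and print a checkpoint; in the kernel this would be printf(). */
static void
checkpoint(const char *name)
{
	last_checkpoint = name;
	printf("hammer_time: reached %s\n", name);
}

/*
 * Stand-in for the code after console setup.  A checkpoint is printed
 * just before each step runs.  hang_in is the index of the step that
 * never returns (simulating the hang); pass -1 to let every step run.
 * Returns the name of the last checkpoint printed.
 */
static const char *
simulate_boot(int hang_in)
{
	static const char *steps[] = {
		"amd64_kdb_init",
		"getmemsize",
		"after_getmemsize",	/* hypothetical placeholder step */
	};
	int i, nsteps = (int)(sizeof(steps) / sizeof(steps[0]));

	last_checkpoint = "none";
	for (i = 0; i < nsteps; i++) {
		checkpoint(steps[i]);
		if (i == hang_in)
			break;		/* this step hangs; nothing after it runs */
	}
	return (last_checkpoint);
}
```

The last line that shows up on the console names the step in which (or
right after which) the hang occurs; moving the printf() calls inside
that step narrows it down further.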
