Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread O. Hartmann
On Tue, 14 Aug 2018 20:41:03 -0700
Pete Wright  wrote:

> On 8/14/18 6:13 PM, Kyle Evans wrote:
> > On Tue, Aug 14, 2018 at 7:28 PM, Pete Wright  wrote:  
> >> i also attempted to boot using UEFI but the system hangs very early in the
> >> boot process.  i have reverted to legacy mode for now so that i can work,
> >> but am keen to test out any patches or do any other debugging that is
> >> needed.  
> > Hi Pete,
> >
> > Where in the process does it hang with UEFI? I can't help much with
> > any of your other problems, but I am curious about this one. =)  
> sure thing - the last several lines are:
> 
> random: fast provider: "Intel Secure Key RNG"
> kbd1 at kbdmux0
> netmap: loaded module
> nexus0

Similar situation here on a Lenovo ThinkPad E540, except that the netmap
message occurs prior to 'random: fast provider: "Intel Secure Key RNG"'.


> 
> at this point it hangs.  let me know if you want me to try booting with 
> verbose output to dmesg or something.
> 
> 
> cheers,
> -pete
> 
>

Other UEFI-booting systems (oldish ASRock Z77-Pro4/Pro4-M) print some messages
shortly after booting the kernel that look like the output I see from the
UEFI loader very early on, but it goes by too fast to catch with the naked eye.
The boxes then reboot and keep cycling through the boot this way.


Addendum: this is with CURRENT r337832

Last known working version for me is: r337718
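
The two revisions above bound the regression, so a bisection needs only a
handful of test builds. A minimal Python sketch of that search follows.
bisect_first_bad and the is_bad callback are hypothetical names, and the
predicate in the example (treating r337773, the revision named in the EFIRT
thread, as the first bad one) is an assumption for illustration only. In
practice is_bad would mean building and booting that revision, and not every
revision number in the range necessarily touches the kernel.

```python
def bisect_first_bad(good, bad, is_bad):
    """Binary-search for the first bad revision.

    good   -- a revision known to work (e.g. 337718)
    bad    -- a revision known to fail (e.g. 337832)
    is_bad -- hypothetical callback: returns True if the revision fails
    Returns (first_bad_revision, list_of_revisions_tested).
    """
    if good >= bad:
        raise ValueError("good revision must be older than bad revision")
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid      # failure is at mid or earlier
        else:
            good = mid     # failure was introduced after mid
    return bad, tested


def max_steps(good, bad):
    """Upper bound on test builds needed: ceil(log2(bad - good))."""
    return (bad - good - 1).bit_length()


if __name__ == "__main__":
    # Assumption for illustration: everything from r337773 on is broken.
    first_bad, tested = bisect_first_bad(337718, 337832, lambda r: r >= 337773)
    print("first bad revision:", first_bad)
    print("revisions tested:", tested)
    print("upper bound on builds:", max_steps(337718, 337832))
```

With 114 revisions between the known-good and known-bad kernels, the upper
bound works out to seven builds.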

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Pete Wright


On 8/14/18 9:06 PM, Pete Wright wrote:
> On 8/14/18 9:01 PM, Kyle Evans wrote:
>>>> I'm curious if you've been bitten somehow by recently enabling EFIRT
>>>> in GENERIC. Can you try setting efi.rt.disabled=1 at loader prompt and
>>>> see where that gets you?
>>>
>>> i did attempt to set that in loader.conf - and it progressed farther but
>>> kernel panic'd when trying to bring up my iwn wireless interface.
>>
>> Interesting... out of side curiosity, what does this panic look like?
>> Can you also try running a kernel >= r337773 with kib's patch from [1]
>> applied to make sure the EFI part of this isn't already solved?
>
> sure i can give that a spin.
>
>>> i'm building an older version in an attempt to bisect this issue (i have a
>>> skylake system running a checkout from monday without issues, so testing
>>> that now).  if i am still running into problems i'll boot with
>>> efi.rt.disabled=1 and will post the gdb panic string here.
>>
>> Excellent.
>
> so i have reverted back to this git hash:
> 90f37b39e4a
>
> https://github.com/freebsd/freebsd/commit/90f37b39e4ad481d3e5a059123f7d68ac153f0c5
>
> and i can confirm that i am able to boot with EFI enabled, do not
> experience any issues bringing up my iwn interface and HDMI is
> recognized according to xrandr.  so that solves my immediate problem!
> :)  interestingly enough i still see the ACPI errors I reported
> earlier, but perhaps that is a red herring.
>
> i'll go back to the tip of master and apply kib's patch and see how it
> goes.



ok great (thanks ccache for making buildkernel fast :) )

so after applying the patch from kib i have the same behavior as i'm 
seeing on git hash 90f37b39e4a.


i.e. boots fine with UEFI enabled, iwn interface comes up, HDMI output 
is detected by xrandr and interesting ACPI warning messages.


i'll dogfood this patch tomorrow when i get into the office and validate 
connecting my HDMI display works as expected and will report any other 
issues i bump into.


thanks for your help Kyle!  I didn't think to test kib's patch as i was 
assuming my issue was related to the ACPI errors, but this seems to get 
me back to where i need to be to work tomorrow so i'm good to go :)


-pete


--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA



Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Pete Wright


On 8/14/18 9:01 PM, Kyle Evans wrote:
>>> I'm curious if you've been bitten somehow by recently enabling EFIRT
>>> in GENERIC. Can you try setting efi.rt.disabled=1 at loader prompt and
>>> see where that gets you?
>>
>> i did attempt to set that in loader.conf - and it progressed farther but
>> kernel panic'd when trying to bring up my iwn wireless interface.
>
> Interesting... out of side curiosity, what does this panic look like?
> Can you also try running a kernel >= r337773 with kib's patch from [1]
> applied to make sure the EFI part of this isn't already solved?

sure i can give that a spin.

>> i'm building an older version in an attempt to bisect this issue (i have a
>> skylake system running a checkout from monday without issues, so testing
>> that now).  if i am still running into problems i'll boot with
>> efi.rt.disabled=1 and will post the gdb panic string here.
>
> Excellent.


so i have reverted back to this git hash:
90f37b39e4a

https://github.com/freebsd/freebsd/commit/90f37b39e4ad481d3e5a059123f7d68ac153f0c5

and i can confirm that i am able to boot with EFI enabled, do not 
experience any issues bringing up my iwn interface and HDMI is 
recognized according to xrandr.  so that solves my immediate problem! 
:)  interestingly enough i still see the ACPI errors I reported earlier, 
but perhaps that is a red herring.


i'll go back to the tip of master and apply kib's patch and see how it goes.

-p

--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA



Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Kyle Evans
On Tue, Aug 14, 2018 at 10:56 PM, Pete Wright  wrote:
>
> On 8/14/18 8:45 PM, Kyle Evans wrote:
>>
>> On Tue, Aug 14, 2018 at 10:41 PM, Pete Wright  wrote:
>>>
>>> On 8/14/18 6:13 PM, Kyle Evans wrote:

>>>> On Tue, Aug 14, 2018 at 7:28 PM, Pete Wright
>>>> wrote:
>>>>>
>>>>> i also attempted to boot using UEFI but the system hangs very early in
>>>>> the boot process.  i have reverted to legacy mode for now so that i can
>>>>> work, but am keen to test out any patches or do any other debugging that
>>>>> is needed.
>>>>
>>>> Hi Pete,
>>>>
>>>> Where in the process does it hang with UEFI? I can't help much with
>>>> any of your other problems, but I am curious about this one. =)
>>>
>>> sure thing - the last several lines are:
>>>
>>> random: fast provider: "Intel Secure Key RNG"
>>> kbd1 at kbdmux0
>>> netmap: loaded module
>>> nexus0
>>>
>>> at this point it hangs.  let me know if you want me to try booting with
>>> verbose output to dmesg or something.
>>>
>> Are you running GENERIC, or custom config? Any modules loaded?
>
>
>
> this is a GENERIC kernel using ZFS as well as GELI full disk encryption.
>

Good to know, thanks!

>
>>
>> I'm curious if you've been bitten somehow by recently enabling EFIRT
>> in GENERIC. Can you try setting efi.rt.disabled=1 at loader prompt and
>> see where that gets you?
>
>
> i did attempt to set that in loader.conf - and it progressed farther but
> kernel panic'd when trying to bring up my iwn wireless interface.
>

Interesting... out of side curiosity, what does this panic look like?
Can you also try running a kernel >= r337773 with kib's patch from [1]
applied to make sure the EFI part of this isn't already solved?

>
> i'm building an older version in an attempt to bisect this issue (i have a
> skylake system running a checkout from monday without issues, so testing
> that now).  if i am still running into problems i'll boot with
> efi.rt.disabled=1 and will post the gdb panic string here.
>

Excellent.

> -pete
>

Thanks,

Kyle Evans

[1] https://lists.freebsd.org/pipermail/freebsd-current/2018-August/070660.html


Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Pete Wright


On 8/14/18 8:45 PM, Kyle Evans wrote:
> On Tue, Aug 14, 2018 at 10:41 PM, Pete Wright  wrote:
>> On 8/14/18 6:13 PM, Kyle Evans wrote:
>>> On Tue, Aug 14, 2018 at 7:28 PM, Pete Wright  wrote:
>>>> i also attempted to boot using UEFI but the system hangs very early in
>>>> the boot process.  i have reverted to legacy mode for now so that i can
>>>> work, but am keen to test out any patches or do any other debugging that
>>>> is needed.
>>>
>>> Hi Pete,
>>>
>>> Where in the process does it hang with UEFI? I can't help much with
>>> any of your other problems, but I am curious about this one. =)
>>
>> sure thing - the last several lines are:
>>
>> random: fast provider: "Intel Secure Key RNG"
>> kbd1 at kbdmux0
>> netmap: loaded module
>> nexus0
>>
>> at this point it hangs.  let me know if you want me to try booting with
>> verbose output to dmesg or something.
>
> Are you running GENERIC, or custom config? Any modules loaded?

this is a GENERIC kernel using ZFS as well as GELI full disk encryption.

> I'm curious if you've been bitten somehow by recently enabling EFIRT
> in GENERIC. Can you try setting efi.rt.disabled=1 at loader prompt and
> see where that gets you?


i did attempt to set that in loader.conf - and it progressed farther but 
kernel panic'd when trying to bring up my iwn wireless interface.



i'm building an older version in an attempt to bisect this issue (i have 
a skylake system running a checkout from monday without issues, so 
testing that now).  if i am still running into problems i'll boot with 
efi.rt.disabled=1 and will post the gdb panic string here.


-pete


--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA



Re: zpool scrub. Wtf?

2018-08-14 Thread Cy Schubert
In message <93818fab-6b53-bade-dffa-afcf322dc...@zyxst.net>, tech-lists 
writes:
> On 12/08/2018 18:53, Cy Schubert wrote:
> > I haven't looked at it closely but from what I saw it was counting scan
> > reads and issued reads. It may be a simple matter of dividing by 2.
>
> Dividing what by 2?
>
> scan: scrub in progress since Tue Aug  7 21:21:51 2018
>  804G scanned at 163M/s, 1,06T issued at 219M/s, 834G total
>  0 repaired, 129,87% done, 929637 days 13:43:01 to go
>
> I'm also seeing this problem running 12.0-ALPHA1 #0 r337682

This is probably incorrect but this is what I was talking about.

Index: cddl/contrib/opensolaris/cmd/zpool/zpool_main.c
===
--- cddl/contrib/opensolaris/cmd/zpool/zpool_main.c (revision 337830)
+++ cddl/contrib/opensolaris/cmd/zpool/zpool_main.c (working copy)
@@ -4492,8 +4492,8 @@
 
 	scanned = ps->pss_examined;
 	pass_scanned = ps->pss_pass_exam;
-	issued = ps->pss_issued;
-	pass_issued = ps->pss_pass_issued;
+	issued = ps->pss_issued / 2;
+	pass_issued = ps->pss_pass_issued / 2;
 	total = ps->pss_to_examine;
 
 	/* we are only done with a block once we have issued the IO for it */
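
For reference, the arithmetic behind the odd progress figure can be
reproduced outside zpool. The sketch below is plain Python, not ZFS code.
scrub_percent is a hypothetical helper, and the inputs are the rounded values
from the report quoted above, so the result only approximates the 129,87% that
zpool printed. It shows that issued/total exceeds 100% and what halving the
issued counter, as the untested diff above does, would yield instead.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Rounded figures taken from the quoted "zpool status" output.
ISSUED = 1.06 * TIB   # "1,06T issued"
TOTAL = 834 * GIB     # "834G total"


def scrub_percent(issued_bytes, total_bytes):
    """Percentage done, computed as issued / total * 100."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * issued_bytes / total_bytes


if __name__ == "__main__":
    print("as reported: %.2f%%" % scrub_percent(ISSUED, TOTAL))
    # Assumption under test: the issued counter is double-counted.
    print("issued / 2:  %.2f%%" % scrub_percent(ISSUED / 2, TOTAL))
```

Whether the counter really is double-counted is exactly what the thread has
not yet established; the sketch only shows that halving would bring the
figure back under 100%.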


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.




Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Kyle Evans
On Tue, Aug 14, 2018 at 10:41 PM, Pete Wright  wrote:
>
> On 8/14/18 6:13 PM, Kyle Evans wrote:
>>
>> On Tue, Aug 14, 2018 at 7:28 PM, Pete Wright  wrote:
>>>
>>> i also attempted to boot using UEFI but the system hangs very early in
>>> the
>>> boot process.  i have reverted to legacy mode for now so that i can work,
>>> but am keen to test out any patches or do any other debugging that is
>>> needed.
>>
>> Hi Pete,
>>
>> Where in the process does it hang with UEFI? I can't help much with
>> any of your other problems, but I am curious about this one. =)
>
> sure thing - the last several lines are:
>
> random: fast provider: "Intel Secure Key RNG"
> kbd1 at kbdmux0
> netmap: loaded module
> nexus0
>
> at this point it hangs.  let me know if you want me to try booting with
> verbose output to dmesg or something.
>

Are you running GENERIC, or custom config? Any modules loaded?

I'm curious if you've been bitten somehow by recently enabling EFIRT
in GENERIC. Can you try setting efi.rt.disabled=1 at loader prompt and
see where that gets you?


Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Pete Wright


On 8/14/18 6:13 PM, Kyle Evans wrote:
> On Tue, Aug 14, 2018 at 7:28 PM, Pete Wright  wrote:
>> i also attempted to boot using UEFI but the system hangs very early in the
>> boot process.  i have reverted to legacy mode for now so that i can work,
>> but am keen to test out any patches or do any other debugging that is
>> needed.
>
> Hi Pete,
>
> Where in the process does it hang with UEFI? I can't help much with
> any of your other problems, but I am curious about this one. =)

sure thing - the last several lines are:

random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
netmap: loaded module
nexus0

at this point it hangs.  let me know if you want me to try booting with 
verbose output to dmesg or something.



cheers,
-pete


--

Pete Wright
p...@nomadlogic.org
@nomadlogicLA



Re: zpool scrub. Wtf?

2018-08-14 Thread tech-lists

On 12/08/2018 18:53, Cy Schubert wrote:
> I haven't looked at it closely but from what I saw it was counting scan reads
> and issued reads. It may be a simple matter of dividing by 2.


Dividing what by 2?

scan: scrub in progress since Tue Aug  7 21:21:51 2018
804G scanned at 163M/s, 1,06T issued at 219M/s, 834G total
0 repaired, 129,87% done, 929637 days 13:43:01 to go

I'm also seeing this problem running 12.0-ALPHA1 #0 r337682

thanks,
--
J.


Re: boot errors since upgrading to 12-current

2018-08-14 Thread tech-lists

On 14/08/2018 21:16, Toomas Soome wrote:
>> On 14 Aug 2018, at 22:37, tech-lists  wrote:
>>
>> Hello,
>>
>> context: amd64, FreeBSD 12.0-ALPHA1 #0 r337682, ZFS. The system is
>> *not* root-on-zfs. It boots to an SSD. The three disks indicated
>> below are spinning rust.
>>
>>   NAME        STATE     READ WRITE CKSUM
>>   storage     ONLINE       0     0     0
>>     raidz1-0  ONLINE       0     0     0
>>       ada1    ONLINE       0     0     0
>>       ada2    ONLINE       0     0     0
>>       ada3    ONLINE       0     0     0
>>
>> This machine was running 11.2 up until about a month ago.
>>
>> Recently I've seen this flash up on the screen before getting to
>> the beastie screen:
>>
>> BIOS drive C: is disk0
>> BIOS drive D: is disk1
>> BIOS drive E: is disk2
>> BIOS drive F: is disk3
>> BIOS drive G: is disk4
>> BIOS drive H: is disk5
>> BIOS drive I: is disk6
>> BIOS drive J: is disk7
>>
>> [the above is normal and has always been seen on every boot]
>>
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>
>> the above has been happening since upgrading to -current a month
>> ago
>>
>> ZFS: i/o error - all block copies unavailable
>> ZFS: can't read MOS of pool storage
>>
>> the above is alarming and has been happening for the past couple of
>> days, since upgrading to r337682 on the 12th August.
>>
>> The beastie screen then loads and it boots normally.
>>
>> Should I be concerned? Is the output indicative of a problem?
>
> Not immediately and yes. In the BIOS loader, we do all disk IO with INT13,
> and error 0x31 often hints at missing media or some other
> controller-related error. Could you paste the output of lsdev -v from
> the loader?
>
> The drive list appears as a result of probing the disks in
> biosdisk.c. The read errors are from an attempt to read 1 sector from
> sector 0 (that is, to read the partition table from the disk). Why
> this ends with an error would be interesting to know; unfortunately,
> that error does not tell us which disk was probed.


Hi Toomas, thanks for looking at this.

lsdev -v looks like this:

OK lsdev -v
disk devices:
disk0: BIOS drive C (16514064 X 512):
disk0s1: FreeBSD  111GB
disk0s1a: FreeBSD UFS 108GB
disk0s1b: FreeBSD swap3881MB

disk1: BIOS drive D (16514064 X 512):
disk2: BIOS drive E (16514064 X 512):
disk3: BIOS drive F (16514064 X 512):
disk4: BIOS drive G (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk5: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk6: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk7: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
OK
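
Unlike the boot-time messages, the lsdev -v listing above does place each
read error next to a disk. A small Python sketch of how that pairing can be
extracted follows. It assumes, as the listing suggests but the loader does not
state, that an error line belongs to the disk line printed immediately before
it. failed_probes and SAMPLE are hypothetical names, and SAMPLE is simply the
output pasted above.

```python
import re

# Whole-disk line, e.g. "disk4: BIOS drive G (2880 X 512):".
# Slice/partition lines such as "disk0s1: ..." do not match.
DISK_RE = re.compile(r"^(disk\d+):\s+BIOS drive")
# Error line, e.g. "read 1 from 0 to 0xcbde0a20, error 0x31"
# (the boot-time variant prints "error: 0x31", so the colon is optional).
ERR_RE = re.compile(
    r"^read \d+ from \d+ to 0x[0-9a-fA-F]+, error:? (0x[0-9a-fA-F]+)$"
)

SAMPLE = """\
disk0: BIOS drive C (16514064 X 512):
disk0s1: FreeBSD  111GB
disk0s1a: FreeBSD UFS 108GB
disk0s1b: FreeBSD swap3881MB

disk1: BIOS drive D (16514064 X 512):
disk2: BIOS drive E (16514064 X 512):
disk3: BIOS drive F (16514064 X 512):
disk4: BIOS drive G (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk5: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk6: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk7: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
"""


def failed_probes(lsdev_output):
    """Map each disk name to the error code printed right after it."""
    failures = {}
    current = None
    for raw in lsdev_output.splitlines():
        line = raw.strip()
        disk = DISK_RE.match(line)
        if disk:
            current = disk.group(1)
            continue
        err = ERR_RE.match(line)
        if err and current is not None:
            failures[current] = err.group(1)
    return failures


if __name__ == "__main__":
    for name, code in sorted(failed_probes(SAMPLE).items()):
        print(name, code)
```

On the pasted output this attributes the 0x31 errors to disk4 through disk7
only, which matches the reading below that they are the empty card-reader
slots rather than the ZFS disks.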

disk4 to disk7 corresponds with da0 to da3 which are sd/mmc devices 
without any media in. What made me notice it is it never showed the read 
1 from 0 to $random_value on 11-stable. The system runs 12-current now.


disk1 to disk3 are the hard drives making up ZFS. These are 4TB Western 
Digital SATA-3 WDC WD4001FAEX.



> Since you are getting errors from data pool ‘storage’, it does not
> affect the boot. Why the pool storage is unreadable - it likely has
> to do with the errors above, but I cannot tell for sure based on the
> data presented here.


Thing is, the data pool works fine when boot completes. i.e it loads 
read/write and behaves normally.


thanks,
--
J.


Re: CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Kyle Evans
On Tue, Aug 14, 2018 at 7:28 PM, Pete Wright  wrote:
> i also attempted to boot using UEFI but the system hangs very early in the
> boot process.  i have reverted to legacy mode for now so that i can work,
> but am keen to test out any patches or do any other debugging that is
> needed.

Hi Pete,

Where in the process does it hang with UEFI? I can't help much with
any of your other problems, but I am curious about this one. =)

Thanks,

Kyle Evans


CURRENT from today throws lots of ACPI errors, lost HDMI detection

2018-08-14 Thread Pete Wright

howdy,
running code from today and having lots of issues.  when i boot the
system (a kabylake laptop) using legacy mode in the BIOS i see lots of
these errors thrown in dmesg:


acpi0:  on motherboard
Firmware Error (ACPI): Failure creating [\134_SB.PCI0.XHC.RHUB.HS01._UPC], AE_ALREADY_EXISTS (20180810/dswload2-468)
ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180810/psobject-372)
ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-689)
Firmware Error (ACPI): Failure creating [\134_SB.PCI0.XHC.RHUB.HS01._PLD], AE_ALREADY_EXISTS (20180810/dswload2-468)
ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180810/psobject-372)
ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-689)
Firmware Error (ACPI): Failure creating [\134_SB.PCI0.XHC.RHUB.HS02._UPC], AE_ALREADY_EXISTS (20180810/dswload2-468)
ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180810/psobject-372)
ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-689)
Firmware Error (ACPI): Failure creating [\134_SB.PCI0.XHC.RHUB.HS02._PLD], AE_ALREADY_EXISTS (20180810/dswload2-468)
ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180810/psobject-372)
ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-689)
Firmware Error (ACPI): Failure creating [\134_SB.PCI0.XHC.RHUB.HS03._UPC], AE_ALREADY_EXISTS (20180810/dswload2-468)
ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180810/psobject-372)
ACPI Error: Skip parsing opcode OpcodeName unavailable (20180810/psloop-689)


there are lots more, i can post a pastebin link if needed.

i also attempted to boot using UEFI but the system hangs very early in 
the boot process.  i have reverted to legacy mode for now so that i can 
work, but am keen to test out any patches or do any other debugging that 
is needed.


thx!
-pete


--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA



Re: EFIRT on machines with pcid after r337773

2018-08-14 Thread Oliver Pinter
Hi!

Seems like this patch fixed the boot issue on Dell e5440 with UEFI.

Once you get to MFC, please X-MFC-with the following patch:

commit dfe1112fa878c5d8fa0605d1de10c96ecc993569
Author: rlibby 
Date:   Fri Jul 21 17:11:36 2017 +

__pcpu: gcc -Wredundant-decls

Pollution from counter.h made __pcpu visible in amd64/pmap.c.  Delete
the existing extern decl of __pcpu in amd64/pmap.c and avoid referring
to that symbol, instead accessing the pcpu region via PCPU_SET macros.
Also delete an unused extern decl of __pcpu from mp_x86.c.

Reviewed by:kib
Approved by:markj (mentor)
Sponsored by:   Dell EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D11666

Notes:
svn path=/head/; revision=321335

On 8/15/18, Konstantin Belousov  wrote:
> If you use UEFI boot, have EFIRT compiled in kernel (the case of
> GENERIC) or pre-loaded as module, and efirt is not disabled by a tunable,
> and the machine resets during kernel initialization, try this.
>
> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> index d5d795ab502..c9334eab916 100644
> --- a/sys/amd64/amd64/pmap.c
> +++ b/sys/amd64/amd64/pmap.c
> @@ -1188,7 +1188,7 @@ pmap_bootstrap(vm_paddr_t *firstaddr)
>   kernel_pmap->pm_pcids[i].pm_pcid = PMAP_PCID_KERN;
>   kernel_pmap->pm_pcids[i].pm_gen = 1;
>   }
> - PCPU_SET(pcid_next, PMAP_PCID_KERN + 1);
> + PCPU_SET(pcid_next, PMAP_PCID_KERN + 2);
>   PCPU_SET(pcid_gen, 1);
>   /*
>* pcpu area for APs is zeroed during AP startup.
> @@ -2651,8 +2651,8 @@ pmap_pinit0(pmap_t pmap)
>   bzero(&pmap->pm_stats, sizeof pmap->pm_stats);
>   pmap->pm_flags = pmap_flags;
>   CPU_FOREACH(i) {
> - pmap->pm_pcids[i].pm_pcid = PMAP_PCID_NONE;
> - pmap->pm_pcids[i].pm_gen = 0;
> + pmap->pm_pcids[i].pm_pcid = PMAP_PCID_KERN + 1;
> + pmap->pm_pcids[i].pm_gen = 1;
>   if (!pti) {
>   __pcpu[i].pc_kcr3 = PMAP_NO_CR3;
>   __pcpu[i].pc_ucr3 = PMAP_NO_CR3;
>


Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

2018-08-14 Thread Michael Gmelin



On Wed, 6 Jun 2018 01:06:25 +0200
Michael Gmelin  wrote:

> On Tue, 5 Jun 2018 16:11:35 +0300
> Konstantin Belousov  wrote:
> 
> > On Mon, Jun 04, 2018 at 11:17:56PM +0200, Michael Gmelin wrote:  
> > > 
> > > 
> > > On Mon, 4 Jun 2018 14:06:55 +0300
> > > Konstantin Belousov  wrote:
> > > 
> > > > On Mon, Jun 04, 2018 at 12:46:32AM +0200, Michael Gmelin
> > > > wrote:
> > > > > 
> [...]
> > > > > > > > > This machine comes with it by default (my model was
> > > > > > > > > delivered with SeaBIOS 20131018_145217-build121-m2).
> > > > > > > > > So I didn't flash anything (didn't feel like bricking
> > > > > > > > > it). 
> > > > > > > > > >   
> > > > > > > > > > > kernel trap 12 with interrupts disabled
> > > > > > > > > > > 
> > > > > > > > > > > > Fatal trap 12: page fault while in kernel mode
> > > > > > > > > > > > cpuid = 0; apic id = 00
> > > > > > > > > > > > fault virtual address   = 0xf8000100
> > > > > > > > > > > > fault code              = supervisor write data, protection violation
> > > > > > > > > > > > instruction pointer     = 0x20:0x8102955f
> > > > > > > > > > > > stack pointer           = 0x28:0x82a79be0
> > > > > > > > > > > > frame pointer           = 0x28:0x82a79c10
> > > > > > > > > > > > code segment            = base 0x0, limit 0xf, type 0x1b
> > > > > > > > > > > >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > > > > > > > processor eflags        = resume, IOPL = 0
> > > > > > > > > > > > current process         = 0 ()
> > > > > > > > > > > > [ thread pid 0 tid 0 ]
> > > > > > > > > > > > Stopped at  native_start_all_aps+0x08f: movq %rax,(%rsi)
> > > > > > > > > > Look up the source line number for this address.
> > > > > > > > > >   
> > > > > > > > > 
> > > > > > > > > I guess that's sys/amd64/amd64/support.S line 854 (in
> > > > > > > > > rdmsr), called by native_start_all_aps. Any additional
> > > > > > > > > hints how I can track it down?  
> > > > > > > > > Why did you decide that this is rdmsr_safe()? First,
> > > > > > > > > native_start_all_aps() does not call rdmsr; second, the
> > > > > > > > > ddb report clearly indicates that the fault occurred
> > > > > > > > > accessing DMAP in native_start_all_aps().
> > > > > > > > 
> > > > > > > > Just look up the source line by the address
> > > > > > > > native_start_all_aps+0x08f.
> > > > > > > 
> > > > > > > Okay, according to kgbd this should be here:
> > > > > > > 
> > > > > > > https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=68=markup#l369
> > > > > > > 
> > > > > > > 364
> > > > > > > 365/* Create the initial 1GB replicated page tables */
> > > > > > > 366for (i = 0; i < 512; i++) {
> > > > > > > 367/* Each slot of the level 4 pages points to
> > > > > > > the same level 3 page */ 368pt4[i] =
> > > > > > > (u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE);
> > > > > > > 369 pt4[i] |= PG_V | PG_RW | PG_U; 370
> > > > > > > 371/* Each slot of the level 3 pages points to
> > > > > > > the same level 2 page */ 372pt3[i] =
> > > > > > > (u_int64_t)(uintptr_t)(mptramp_pagetables + (2 *
> > > > > > > PAGE_SIZE)); 373pt3[i] |= PG_V | PG_RW | PG_U;
> > > > > > > 374 375/* The level 2 page slots are mapped
> > > > > > > with 2MB pages for 1GB. */ 376pt2[i] = i * (2
> > > > > > > * 1024 * 1024); 377pt2[i] |= PG_V | PG_RW |
> > > > > > > PG_PS | PG_U; 378}
> > > > > > > 
> > > > > > > -m
> > > > > > You have fault on write due to read-only mapping of the
> > > > > > portion of the direct map, which maps the kernel text.  It
> > > > > > is consistent with the faulting address.  It is not clear
> > > > > > if it is something new on your machine, or before the
> > > > > > kernel text was silently corrupted, since ro protection is
> > > > > > somewhat recent.
> > > > > > 
> > > > > > It seems that mp_bootaddress() selected the bad place for
> > > > > > the bootstrap page tables. Even more, we do not include the
> > > > > > kernel text into the physmem[] array, so it is not clear
> > > > > > how did it happen. This code was also changed recently.
> > > > > > 
> > > > > > Can you add the print of the physmap[] array somewhere
> > > > > > before the panic, to see what is the kernel idea of the
> > > > > > available memory ? It should be already done if you have
> > > > > > serial console and set debug.late_console tunable to
> > > > > > 0.  
> > > > > 
> > > > > This is a sad little machine without any kind of serial
> > > > > console.
> > > > > 
> > > > > Physmap looks like this after calling getmemsize():
> > > > > 
> > > > > [0]: 0x1
> > > > > [1]: 0x3
> > > > > [2]: 0x4
> > > > > [3]: 0x9e000
> > > > > [4]: 0x10
> > > > > [5]: 0xf0
> > > > > [6]: 0x1003000
> > > > > [7]: 0x7bf7a000
> > > > > 
> > > > > Physical memory chunks logged in cpu_startup are:
> > > > > 
> > 

EFIRT on machines with pcid after r337773

2018-08-14 Thread Konstantin Belousov
If you use UEFI boot, have EFIRT compiled in kernel (the case of
GENERIC) or pre-loaded as module, and efirt is not disabled by a tunable,
and the machine resets during kernel initialization, try this.

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index d5d795ab502..c9334eab916 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -1188,7 +1188,7 @@ pmap_bootstrap(vm_paddr_t *firstaddr)
kernel_pmap->pm_pcids[i].pm_pcid = PMAP_PCID_KERN;
kernel_pmap->pm_pcids[i].pm_gen = 1;
}
-   PCPU_SET(pcid_next, PMAP_PCID_KERN + 1);
+   PCPU_SET(pcid_next, PMAP_PCID_KERN + 2);
PCPU_SET(pcid_gen, 1);
/*
 * pcpu area for APs is zeroed during AP startup.
@@ -2651,8 +2651,8 @@ pmap_pinit0(pmap_t pmap)
bzero(&pmap->pm_stats, sizeof pmap->pm_stats);
pmap->pm_flags = pmap_flags;
CPU_FOREACH(i) {
-   pmap->pm_pcids[i].pm_pcid = PMAP_PCID_NONE;
-   pmap->pm_pcids[i].pm_gen = 0;
+   pmap->pm_pcids[i].pm_pcid = PMAP_PCID_KERN + 1;
+   pmap->pm_pcids[i].pm_gen = 1;
if (!pti) {
__pcpu[i].pc_kcr3 = PMAP_NO_CR3;
__pcpu[i].pc_ucr3 = PMAP_NO_CR3;


Re: boot errors since upgrading to 12-current

2018-08-14 Thread Toomas Soome


> On 14 Aug 2018, at 22:37, tech-lists  wrote:
> 
> Hello,
> 
> context: amd64, FreeBSD 12.0-ALPHA1 #0 r337682, ZFS. The system is *not* 
> root-on-zfs. It boots to an SSD. The three disks indicated below are spinning 
> rust.
> 
>   NAME        STATE     READ WRITE CKSUM
>   storage     ONLINE       0     0     0
>     raidz1-0  ONLINE       0     0     0
>       ada1    ONLINE       0     0     0
>       ada2    ONLINE       0     0     0
>       ada3    ONLINE       0     0     0
> 
> This machine was running 11.2 up until about a month ago.
> 
> Recently I've seen this flash up on the screen before getting to the beastie 
> screen:
> 
> BIOS drive C: is disk0
> BIOS drive D: is disk1
> BIOS drive E: is disk2
> BIOS drive F: is disk3
> BIOS drive G: is disk4
> BIOS drive H: is disk5
> BIOS drive I: is disk6
> BIOS drive J: is disk7
> 
> [the above is normal and has always been seen on every boot]
> 
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> read 1 from 0 to 0xcbdb1330, error: 0x31
> 
> the above has been happening since upgrading to -current a month ago
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS of pool storage
> 
> the above is alarming and has been happening for the past couple of days, 
> since upgrading to r337682 on the 12th August.
> 
> The beastie screen then loads and it boots normally.
> 
> Should I be concerned? Is the output indicative of a problem?
> 

Not immediately, and yes. In the BIOS loader, we do all disk I/O with INT13, and 
error 0x31 often hints at missing media or some other controller-related 
error. Could you paste the output of lsdev -v from the loader prompt?

The drive list appears as a result of probing the disks in biosdisk.c. The 
read errors come from an attempt to read 1 sector from sector 0 (that is, to 
read the partition table from the disk). Why this ends with an error would be 
interesting to know; unfortunately, the error does not tell us which disk was 
probed.

Since the errors are from the data pool ‘storage’, they do not affect the 
boot. Why the pool is unreadable likely has to do with the read errors 
above, but I cannot tell for sure based on the data presented here.
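
To make the repeated loader messages easier to compare between boots, here is a 
rough Python sketch that groups them by sector, count and status code. It is 
not part of the loader; the function names and the small status table are my 
own assumptions, with the descriptions taken from commonly published INT 13h 
status lists rather than from biosdisk.c.

```python
import re
from collections import Counter

# Line format as printed by the loader in the report above.
READ_ERROR = re.compile(
    r"read (?P<count>\d+) from (?P<lba>\d+) to (?P<buf>0x[0-9a-fA-F]+), "
    r"error: (?P<code>0x[0-9a-fA-F]+)"
)

# Partial, illustrative table (an assumption, not taken from the loader source).
INT13_STATUS = {
    0x01: "invalid function or parameter",
    0x04: "sector not found",
    0x20: "controller failure",
    0x31: "no media in drive",
    0x80: "timeout (drive not ready)",
}


def summarize(lines):
    """Count failed reads per (start sector, sector count, status code)."""
    counts = Counter()
    for line in lines:
        m = READ_ERROR.search(line)
        if m:
            key = (int(m["lba"]), int(m["count"]), int(m["code"], 16))
            counts[key] += 1
    return counts


def describe(code):
    """Return a human-readable hint for a status code, if we know one."""
    return INT13_STATUS.get(code, "unknown status")
```

Feeding it the eight lines from the report gives a single group (sector 0, 
1 sector, status 0x31), which matches the reading above: the same 
partition-table read fails once per probe, but the message alone does not say 
which disk was involved.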

rgds,
toomas

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Make drm drivers use MTRR write-combine

2018-08-14 Thread Johannes Lundberg
On Tue, Aug 14, 2018 at 3:39 PM Konstantin Belousov 
wrote:

> On Tue, Aug 14, 2018 at 08:55:36AM -0500, Eric van Gyzen wrote:
> > On 8/14/18 4:12 AM, Johannes Lundberg wrote:
> > > Hi
> > >
> > > Something that we have seen for a long time on FreeBSD is the boot message
> > >
> > > Failed to add WC MTRR for [0xd000-0xdfff]: -22; performance may
> > > suffer
> > >
> > > Taking a closer look at this with memcontrol I can see that the 256 MB
> > > region that DRM wants to set as WC is already covered by this entry
> > > 0xc000/0x4000 BIOS uncacheable set-by-firmware active
> > >
> > > Similar on both my Skylake and Broadwell systems.
> > I see something similar on my Dell XPS 13 with a Kaby Lake R:
> >
> > Failed to add WC MTRR for [0x9000-0x9fff]: -22; performance may
> > suffer
> >
> > 0x8000/0x8000 BIOS uncacheable set-by-firmware active
> >
> > The only mappings in this range are MMIO:
> >
> > machdep.efi_map:
> >Type Physical  Virtual   #Pages Attr
> > [snip]
> > MemoryMappedIO e000   0xe000 0001 RUNTIME
> > MemoryMappedIO fe00   0xfe00 0011 UC RUNTIME
> > MemoryMappedIO fec0   0xfec0 0001 UC RUNTIME
> > MemoryMappedIO fee0   0xfee0 0001 UC WT WB WP RUNTIME
> > MemoryMappedIO ff00   0xff00 1000 UC WT WB WP RUNTIME
>
> Yes, the cause of the message is that the current x86 MTRR code is not
> sufficient to handle this situation. You have a BIOS-configured
> variable-range MTRR which covers the upper half of the low 4G as
> uncacheable (UC). It is reasonable for the BIOS to set it up this way
> because this is where PCIe BARs and other devices' MMIO regions are located.
>
> One of the BARs there is the GPU aperture, which really should be WC
> (write-combining). There are two ways to achieve this. One is to split the
> UC variable-range MTRR into three, UC/WC/UC, which would require
> three MTRRs to cover. This is what the current x86_mem.c code does not
> support.
>
> Another way is to set WC mode in the page table entries (PTEs) using the
> Page Attribute Table (PAT), for all PTEs that map the aperture. According
> to the rules for combining the memory access types of MTRR and PAT, WC in
> PAT and any access mode in MTRR gives effective WC.
>
> I saw the same warning when I initially ported GEM. My code used the WC PAT
> type, which makes the warning cosmetic, which is why I did not add the
> MTRR split. If the new drm driver also consistently uses the WC memattr when
> creating aperture mappings, then the warning can be ignored as well.
>

Hi Kib

Thanks for the detailed answer. This might already be the case for the
out-of-tree drivers as well. From what I read about the VESA driver, the
performance difference should be quite big w/o WC, and I haven't noticed any
performance issues with the newer drivers at all.

I will confirm this tomorrow.



Re: Restart at boot with new kernel

2018-08-14 Thread Matthias Gamsjager
> Is your kernel up to r337773 yet? That should have addressed the EFIRT
> boot issue.
>
>
Rebuilding now. Not sure how late I started so I might have missed it.


Re: Restart at boot with new kernel

2018-08-14 Thread Kyle Evans
On Tue, Aug 14, 2018 at 3:00 PM, Matthias Gamsjager
 wrote:
> OK, I missed the solution posted earlier: adding 'efi.rt.disabled=1' to
> loader.conf fixed the issue for me as well.

Is your kernel up to r337773 yet? That should have addressed the EFIRT
boot issue.

Thanks,

Kyle Evans


Re: Restart at boot with new kernel

2018-08-14 Thread Oliver Pinter
On 8/14/18, Matthias Gamsjager  wrote:
> Hi list,
>
> just compiled current and after reboot into the new kernel the machine
> reboots itself at the marked line below. No further info as the reset is
> sudden. Reboot with the old kernel from 12 aug works. Removed all kernel
> modules but with no result.
>
> Any pointers how to approach this?
>
> Aug 14 21:35:19 workstation kernel: ---<>---
> Aug 14 21:35:19 workstation kernel: Copyright (c) 1992-2018 The FreeBSD
> Project.
> Aug 14 21:35:19 workstation kernel: Copyright (c) 1979, 1980, 1983, 1986,
> 1988, 1989, 1991, 1992, 1993, 1994
> Aug 14 21:35:19 workstation kernel: The Regents of the University of
> California. All rights reserved.
> Aug 14 21:35:19 workstation kernel: FreeBSD is a registered trademark of
> The FreeBSD Foundation.
> Aug 14 21:35:19 workstation kernel: FreeBSD 12.0-ALPHA1 #9
> 8501082ee8a(master): Tue Aug 14 21:13:15 CEST 2018
> Aug 14 21:35:19 workstation kernel:
> root@workstation:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG
> amd64
> Aug 14 21:35:19 workstation kernel: FreeBSD clang version 6.0.1
> (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
> Aug 14 21:35:19 workstation kernel: VT(efifb): resolution 1024x768
> Aug 14 21:35:19 workstation kernel: CPU: Intel(R) Core(TM) i7-2600K CPU @
> 3.40GHz (3410.09-MHz K8-class CPU)
> Aug 14 21:35:19 workstation kernel:   Origin="GenuineIntel"  Id=0x206a7
> Family=0x6  Model=0x2a  Stepping=7
> Aug 14 21:35:19 workstation kernel:
> Features=0xbfebfbff
> Aug 14 21:35:19 workstation kernel:
> Features2=0x1f9ae3bf
> Aug 14 21:35:19 workstation kernel:   AMD
> Features=0x28100800
> Aug 14 21:35:19 workstation kernel:   AMD Features2=0x1
> Aug 14 21:35:19 workstation kernel:   XSAVE Features=0x1
> Aug 14 21:35:19 workstation kernel:   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> Aug 14 21:35:19 workstation kernel:   TSC: P-state invariant, performance
> statistics
> Aug 14 21:35:19 workstation kernel: real memory  = 34359738368 (32768 MB)
> Aug 14 21:35:19 workstation kernel: avail memory = 3342208 (31789 MB)
> Aug 14 21:35:19 workstation kernel: Event timer "LAPIC" quality 600
> Aug 14 21:35:19 workstation kernel: ACPI APIC Table: 
> Aug 14 21:35:19 workstation kernel: FreeBSD/SMP: Multiprocessor System
> Detected: 8 CPUs
> Aug 14 21:35:19 workstation kernel: FreeBSD/SMP: 1 package(s) x 4 core(s) x
> 2 hardware threads
> Aug 14 21:35:19 workstation kernel: random: unblocking device.
> Aug 14 21:35:19 workstation kernel: ioapic0  irqs 0-23 on
> motherboard
>>> REBOOT Aug 14 21:35:19 workstation kernel: Launching APs: 1 7 3 6 4 5
> ---<>---
> Aug 14 21:35:19 workstation kernel: Copyright (c) 1992-2018 The FreeBSD
> Project

I observed the same with r337773 MFC'd back to 11-STABLE.

So CC'ing kib@ and kevans@.

My system is a Dell Latitude e5440 with EFI.



Re: Restart at boot with new kernel

2018-08-14 Thread Matthias Gamsjager
OK, I missed the solution posted earlier: adding 'efi.rt.disabled=1' to
loader.conf fixed the issue for me as well.


Restart at boot with new kernel

2018-08-14 Thread Matthias Gamsjager
Hi list,

just compiled current and after reboot into the new kernel the machine
reboots itself at the marked line below. No further info as the reset is
sudden. Reboot with the old kernel from 12 aug works. Removed all kernel
modules but with no result.

Any pointers how to approach this?

Aug 14 21:35:19 workstation kernel: ---<>---
Aug 14 21:35:19 workstation kernel: Copyright (c) 1992-2018 The FreeBSD
Project.
Aug 14 21:35:19 workstation kernel: Copyright (c) 1979, 1980, 1983, 1986,
1988, 1989, 1991, 1992, 1993, 1994
Aug 14 21:35:19 workstation kernel: The Regents of the University of
California. All rights reserved.
Aug 14 21:35:19 workstation kernel: FreeBSD is a registered trademark of
The FreeBSD Foundation.
Aug 14 21:35:19 workstation kernel: FreeBSD 12.0-ALPHA1 #9
8501082ee8a(master): Tue Aug 14 21:13:15 CEST 2018
Aug 14 21:35:19 workstation kernel:
root@workstation:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG
amd64
Aug 14 21:35:19 workstation kernel: FreeBSD clang version 6.0.1
(tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
Aug 14 21:35:19 workstation kernel: VT(efifb): resolution 1024x768
Aug 14 21:35:19 workstation kernel: CPU: Intel(R) Core(TM) i7-2600K CPU @
3.40GHz (3410.09-MHz K8-class CPU)
Aug 14 21:35:19 workstation kernel:   Origin="GenuineIntel"  Id=0x206a7
Family=0x6  Model=0x2a  Stepping=7
Aug 14 21:35:19 workstation kernel:
Features=0xbfebfbff
Aug 14 21:35:19 workstation kernel:
Features2=0x1f9ae3bf
Aug 14 21:35:19 workstation kernel:   AMD
Features=0x28100800
Aug 14 21:35:19 workstation kernel:   AMD Features2=0x1
Aug 14 21:35:19 workstation kernel:   XSAVE Features=0x1
Aug 14 21:35:19 workstation kernel:   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
Aug 14 21:35:19 workstation kernel:   TSC: P-state invariant, performance
statistics
Aug 14 21:35:19 workstation kernel: real memory  = 34359738368 (32768 MB)
Aug 14 21:35:19 workstation kernel: avail memory = 3342208 (31789 MB)
Aug 14 21:35:19 workstation kernel: Event timer "LAPIC" quality 600
Aug 14 21:35:19 workstation kernel: ACPI APIC Table: 
Aug 14 21:35:19 workstation kernel: FreeBSD/SMP: Multiprocessor System
Detected: 8 CPUs
Aug 14 21:35:19 workstation kernel: FreeBSD/SMP: 1 package(s) x 4 core(s) x
2 hardware threads
Aug 14 21:35:19 workstation kernel: random: unblocking device.
Aug 14 21:35:19 workstation kernel: ioapic0  irqs 0-23 on
motherboard
>> REBOOT Aug 14 21:35:19 workstation kernel: Launching APs: 1 7 3 6 4 5
---<>---
Aug 14 21:35:19 workstation kernel: Copyright (c) 1992-2018 The FreeBSD
Project


boot errors since upgrading to 12-current

2018-08-14 Thread tech-lists

Hello,

context: amd64, FreeBSD 12.0-ALPHA1 #0 r337682, ZFS. The system is *not* 
root-on-zfs. It boots to an SSD. The three disks indicated below are 
spinning rust.


NAME        STATE     READ WRITE CKSUM
storage     ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    ada1    ONLINE       0     0     0
    ada2    ONLINE       0     0     0
    ada3    ONLINE       0     0     0

This machine was running 11.2 up until about a month ago.

Recently I've seen this flash up on the screen before getting to the 
beastie screen:


BIOS drive C: is disk0
BIOS drive D: is disk1
BIOS drive E: is disk2
BIOS drive F: is disk3
BIOS drive G: is disk4
BIOS drive H: is disk5
BIOS drive I: is disk6
BIOS drive J: is disk7

[the above is normal and has been seen on every boot]

read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31
read 1 from 0 to 0xcbdb1330, error: 0x31

the above has been happening since upgrading to -current a month ago

 ZFS: i/o error - all block copies unavailable
 ZFS: can't read MOS of pool storage

the above is alarming and has been happening for the past couple of 
days, since upgrading to r337682 on the 12th August.


The beastie screen then loads and it boots normally.

Should I be concerned? Is the output indicative of a problem?

thanks,
--
J.


Re: TCP server app performance

2018-08-14 Thread Navdeep Parhar
On 8/12/18 9:50 AM, Honda Michio wrote:
> Hi,
> 
> I'm measuring TCP server app performance using my toy web server.
> It just accepts TCP connections and responds with HTTP OK to the clients.
> It monitors sockets using kqueue, and processes each ready descriptor using
> a pair of read() and write(). (in more detail, it's
> https://github.com/micchie/netmap/tree/paste/apps/phttpd)
> 
> Using 100 persistent TCP connections (the client sends 44 B HTTP GET and
> the server responds with 151 B of HTTP OK) and a single CPU core, I only
> get 152K requests per second, which is 2.5x slower than Linux that runs the
> same app  (except that it uses epoll instead of kqueue).
> I cannot explain this myself. Does anybody have some intuition about how
> much FreeBSD would get with such workloads?
> I tried disabling TCP delayed ack and changing interrupt rates, but no
> significant difference was observed.
> 
> I use FreeBSD-CURRENT with GENERIC-NODEBUG (git commit hash: 3015145c3aa4b).
> For hardware, the server has Xeon Silver 4110 and Intel X540 NIC (activate
> only a single queue as I test with a single CPU core). All the offloadings
> are disabled.

I hope hw L3/L4 checksumming is still on?

Are your results similar to what you get with 100 netperf instances (the same
number as your test clients) doing TCP_RR on this setup, or wildly different?
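
For readers who have not looked at phttpd: the event loop described in the
quoted message (wait for ready descriptors, then one read and one write per
descriptor) has roughly the following shape. This is only a Python sketch of
the pattern, not the C code under test; the names make_listener and event_loop
and the fixed response are my own assumptions. Python's selectors module is
used because its DefaultSelector picks kqueue on FreeBSD and epoll on Linux.

```python
import selectors
import socket

RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK"


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 = any free port)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(128)
    s.setblocking(False)
    return s


def event_loop(listener, stop):
    """Serve until stop (a threading.Event) is set."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    try:
        while not stop.is_set():
            for key, _events in sel.select(timeout=0.1):
                sock = key.fileobj
                if sock is listener:
                    try:
                        conn, _addr = listener.accept()
                    except BlockingIOError:
                        continue
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ)
                    continue
                try:
                    data = sock.recv(4096)
                    if data:
                        # Toy behaviour: one reply per readable event; the
                        # small reply is assumed to fit in the send buffer.
                        sock.send(RESPONSE)
                except BlockingIOError:
                    continue
                except ConnectionError:
                    data = b""
                if not data:
                    sel.unregister(sock)
                    sock.close()
    finally:
        # Close any client sockets still registered, then the selector.
        for key in list(sel.get_map().values()):
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()
```

Each request costs one wakeup, one recv() and one send(), so with persistent
connections and tiny messages the per-request syscall and stack overhead is
what a netperf TCP_RR run on the same setup would also measure.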

Regards,
Navdeep


Re: Panic in efi_get_time() on EPYC system when booting

2018-08-14 Thread Michael Tuexen
> On 14. Aug 2018, at 17:09, Kyle Evans  wrote:
> 
> On Tue, Aug 14, 2018 at 11:04 AM, Michael Tuexen  wrote:
>> Dear all,
>> 
>> r337761 panics on boot with a GENERIC kernel on an EPYC system:
> 
> Oy. =(
> 
>> [...]
>> panic: mutex pmap not owned at ../../../amd64/amd64/efirt_machdep.c:268
>> [...]
>> 
>> Any idea what is wrong?
>> 
> 
> Ah, this should be fixed by https://reviews.freebsd.org/D16618 -- the
> immediate workaround is to set efi.rt.disabled=1 in loader.conf(5) or at
> the loader prompt. Apologies for the hassle.
Hi Kyle,

I can confirm that r337761 + D16618 boots fine.

Thanks for the quick response!

Best regards
Michael
> 
> Thanks,
> 
> Kyle Evans



Re: Panic in efi_get_time() on EPYC system when booting

2018-08-14 Thread Kyle Evans
On Tue, Aug 14, 2018 at 11:04 AM, Michael Tuexen  wrote:
> Dear all,
>
> r337761 panics on boot with a GENERIC kernel on an EPYC system:

Oy. =(

> [...]
> panic: mutex pmap not owned at ../../../amd64/amd64/efirt_machdep.c:268
> [...]
>
> Any idea what is wrong?
>

Ah, this should be fixed by https://reviews.freebsd.org/D16618 -- the
immediate workaround is to set efi.rt.disabled=1 in loader.conf(5) or at
the loader prompt. Apologies for the hassle.

Thanks,

Kyle Evans


Re: Panic in efi_get_time() on EPYC system when booting

2018-08-14 Thread Rebecca Cran
This also happens on Qemu with UEFI.

—  
Rebecca
(Apologies for top posting, I’m replying on my phone)

On August 14, 2018 at 10:04:53 AM, Michael Tuexen (tue...@freebsd.org) wrote:

> Dear all,
>  
> r337761 panics on boot with a GENERIC kernel on an EPYC system:
>  
> Copyright (c) 1992-2018 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 12.0-ALPHA1 #3 r337761: Tue Aug 14 17:59:05 CEST 2018
> tue...@epyc.nplab.de:/usr/home/tuexen/head/sys/amd64/compile/TCP amd64
> FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 
> 6.0.1)
> WARNING: WITNESS option enabled, expect reduced performance.
> VT(efifb): resolution 1024x768
> CPU: AMD EPYC 7551P 32-Core Processor (2000.05-MHz K8-class CPU)
> Origin="AuthenticAMD" Id=0x800f12 Family=0x17 Model=0x1 Stepping=2
> Features=0x178bfbff
> Features2=0x7ed8320b
> AMD Features=0x2e500800
> AMD 
> Features2=0x35c233ff
> Structured Extended 
> Features=0x209c01a9
> XSAVE Features=0xf
> AMD Extended Feature Extensions ID EBX=0x1007
> SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
> TSC: P-state invariant, performance statistics
> real memory = 137438953472 (131072 MB)
> avail memory = 133661786112 (127469 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: < >
> FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs
> FreeBSD/SMP: 1 package(s) x 4 groups x 2 cache groups x 4 core(s) x 2 
> hardware threads
> random: unblocking device.
> ioapic0: Changing APIC ID to 128
> ioapic1: Changing APIC ID to 129
> ioapic2: Changing APIC ID to 130
> ioapic3: Changing APIC ID to 131
> ioapic4: Changing APIC ID to 132
> ioapic0  irqs 0-23 on motherboard
> ioapic1  irqs 24-55 on motherboard
> ioapic2  irqs 56-87 on motherboard
> ioapic3  irqs 88-119 on motherboard
> ioapic4  irqs 120-151 on motherboard
> Launching APs: 14 6 52 16 28 44 42 36 20 13 37 4 54 15 21 5 55 18 30 8 26 9 
> 29 58 53 10 56 38 31 11 49 46 22 48 43 12 35 45 41 23 39 51 32 24 27 61 63 33 
> 1 62 60 7 59 40 34 47 2 3 17 19 50 25 57
> Timecounter "TSC" frequency 254700 Hz quality 1000
> random: entropy device external interface
> netmap: loaded module
> [ath_hal] loaded
> module_register_init: MOD_LOAD (vesa, 0x81126100, 0) error 19
> kbd1 at kbdmux0
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> nexus0
> panic: mutex pmap not owned at ../../../amd64/amd64/efirt_machdep.c:268
> cpuid = 60
> time = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0x826ce810
> vpanic() at vpanic+0x1a3/frame 0x826ce870
> panic() at panic+0x43/frame 0x826ce8d0
> __mtx_assert() at __mtx_assert+0xb4/frame 0x826ce8e0
> efi_arch_enter() at efi_arch_enter+0x30/frame 0x826ce910
> efi_get_time() at efi_get_time+0xbd/frame 0x826ce960
> efirtc_probe() at efirtc_probe+0x17/frame 0x826ce990
> device_probe_child() at device_probe_child+0x164/frame 0x826ce9f0
> device_probe() at device_probe+0x98/frame 0x826cea20
> device_probe_and_attach() at device_probe_and_attach+0x32/frame 
> 0x826cea50
> bus_generic_attach() at bus_generic_attach+0x18/frame 0x826cea70
> device_attach() at device_attach+0x3f3/frame 0x826ceab0
> device_probe_and_attach() at device_probe_and_attach+0x71/frame 
> 0x826ceae0
> bus_generic_new_pass() at bus_generic_new_pass+0xdd/frame 0x826ceb10
> bus_set_pass() at bus_set_pass+0x8c/frame 0x826ceb40
> configure() at configure+0x9/frame 0x826ceb50
> mi_startup() at mi_startup+0x118/frame 0x826ceb70
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 10 ]
> Stopped at kdb_enter+0x3b: movq $0,kdb_why
> db>
>  
> Any idea what is wrong?
>  
> Best regards
> Michael
>  
>  


Panic in efi_get_time() on EPYC system when booting

2018-08-14 Thread Michael Tuexen
Dear all,

r337761 panics on boot with a GENERIC kernel on an EPYC system:

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-ALPHA1 #3 r337761: Tue Aug 14 17:59:05 CEST 2018
tue...@epyc.nplab.de:/usr/home/tuexen/head/sys/amd64/compile/TCP amd64
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
WARNING: WITNESS option enabled, expect reduced performance.
VT(efifb): resolution 1024x768
CPU: AMD EPYC 7551P 32-Core Processor(2000.05-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f12  Family=0x17  Model=0x1  Stepping=2
  Features=0x178bfbff
  Features2=0x7ed8320b
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  Structured Extended Features=0x209c01a9
  XSAVE Features=0xf
  AMD Extended Feature Extensions ID EBX=0x1007
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 137438953472 (131072 MB)
avail memory = 133661786112 (127469 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: < >
FreeBSD/SMP: Multiprocessor System Detected: 64 CPUs
FreeBSD/SMP: 1 package(s) x 4 groups x 2 cache groups x 4 core(s) x 2 hardware threads
random: unblocking device.
ioapic0: Changing APIC ID to 128
ioapic1: Changing APIC ID to 129
ioapic2: Changing APIC ID to 130
ioapic3: Changing APIC ID to 131
ioapic4: Changing APIC ID to 132
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-55 on motherboard
ioapic2  irqs 56-87 on motherboard
ioapic3  irqs 88-119 on motherboard
ioapic4  irqs 120-151 on motherboard
Launching APs: 14 6 52 16 28 44 42 36 20 13 37 4 54 15 21 5 55 18 30 8 26 9 29 58 53 10 56 38 31 11 49 46 22 48 43 12 35 45 41 23 39 51 32 24 27 61 63 33 1 62 60 7 59 40 34 47 2 3 17 19 50 25 57
Timecounter "TSC" frequency 254700 Hz quality 1000
random: entropy device external interface
netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0x81126100, 0) error 19
kbd1 at kbdmux0
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
panic: mutex pmap not owned at ../../../amd64/amd64/efirt_machdep.c:268
cpuid = 60
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0x826ce810
vpanic() at vpanic+0x1a3/frame 0x826ce870
panic() at panic+0x43/frame 0x826ce8d0
__mtx_assert() at __mtx_assert+0xb4/frame 0x826ce8e0
efi_arch_enter() at efi_arch_enter+0x30/frame 0x826ce910
efi_get_time() at efi_get_time+0xbd/frame 0x826ce960
efirtc_probe() at efirtc_probe+0x17/frame 0x826ce990
device_probe_child() at device_probe_child+0x164/frame 0x826ce9f0
device_probe() at device_probe+0x98/frame 0x826cea20
device_probe_and_attach() at device_probe_and_attach+0x32/frame 0x826cea50
bus_generic_attach() at bus_generic_attach+0x18/frame 0x826cea70
device_attach() at device_attach+0x3f3/frame 0x826ceab0
device_probe_and_attach() at device_probe_and_attach+0x71/frame 0x826ceae0
bus_generic_new_pass() at bus_generic_new_pass+0xdd/frame 0x826ceb10
bus_set_pass() at bus_set_pass+0x8c/frame 0x826ceb40
configure() at configure+0x9/frame 0x826ceb50
mi_startup() at mi_startup+0x118/frame 0x826ceb70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 10 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> 

Any idea what is wrong?

Best regards
Michael




Mergemaster tweak

2018-08-14 Thread Warner Losh
I never have /usr/src on my system (other than the directory that mtree
creates). So when I run mergemaster it always fails until I remember to
add -m . (or was it -t .? I sometimes forget).

Given the recent ntpd stuff, I was annoyed enough with this default
behavior to create a fix for it, which I've uploaded to
https://reviews.freebsd.org/D16709 for review.

Basically, if the proposed SOURCEDIR doesn't have a Makefile.inc1, but .
does, then it prompts to use that. If it can't find one, it errors out
early as a nice side effect.
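
To illustrate the selection rule only, here is a small Python sketch.
mergemaster itself is a shell script and the real change is in the review
above; the function name pick_source_dir is hypothetical, and the interactive
prompt is left out.

```python
import os


def pick_source_dir(proposed, cwd="."):
    """Return the directory to use as SOURCEDIR, or raise if none qualifies.

    A directory qualifies if it contains Makefile.inc1. The real change asks
    the user before switching to the current directory; this sketch just
    returns it.
    """
    def has_makefile_inc1(directory):
        return os.path.isfile(os.path.join(directory, "Makefile.inc1"))

    if has_makefile_inc1(proposed):
        return proposed
    if has_makefile_inc1(cwd):
        return cwd
    raise FileNotFoundError(
        "no Makefile.inc1 in %s or %s" % (proposed, cwd))
```

Checking for Makefile.inc1 up front is what gives the early error: a source
tree that is missing entirely is caught before any comparison work starts.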

Please comment on the review. Thanks!

Warner


Re: Make drm drivers use MTRR write-combine

2018-08-14 Thread Konstantin Belousov
On Tue, Aug 14, 2018 at 08:55:36AM -0500, Eric van Gyzen wrote:
> On 8/14/18 4:12 AM, Johannes Lundberg wrote:
> > Hi
> > 
> > Something that we have seen for a long time on FreeBSD is the boot message
> > 
> > Failed to add WC MTRR for [0xd000-0xdfff]: -22; performance may
> > suffer
> > 
> > Taking a closer look at this with memcontrol I can see that the 256 MB
> > region that DRM wants to set as WC is already covered by this entry
> > 0xc000/0x4000 BIOS uncacheable set-by-firmware active
> > 
> > Similar on both my Skylake and Broadwell systems.
> I see something similar on my Dell XPS 13 with a Kaby Lake R:
> 
> Failed to add WC MTRR for [0x9000-0x9fff]: -22; performance may 
> suffer
> 
> 0x8000/0x8000 BIOS uncacheable set-by-firmware active
> 
> The only mappings in this range are MMIO:
> 
> machdep.efi_map:
>Type Physical  Virtual   #Pages Attr
> [snip]
> MemoryMappedIO e000   0xe000 0001 RUNTIME
> MemoryMappedIO fe00   0xfe00 0011 UC RUNTIME
> MemoryMappedIO fec0   0xfec0 0001 UC RUNTIME
> MemoryMappedIO fee0   0xfee0 0001 UC WT WB WP RUNTIME
> MemoryMappedIO ff00   0xff00 1000 UC WT WB WP RUNTIME

Yes, the cause of the message is that the current x86 MTRR code is not
sufficient to handle this situation. You have a BIOS-configured
variable-range MTRR which covers the upper half of the low 4G as
uncacheable (UC). It is reasonable for the BIOS to set it up this way
because this is where PCIe BARs and other devices' MMIO regions are located.

One of the BARs there is the GPU aperture, which really should be WC
(write-combining). There are two ways to achieve this. One is to split the
UC variable-range MTRR into three, UC/WC/UC, which would require
three MTRRs to cover. This is what the current x86_mem.c code does not
support.

Another way is to set WC mode in the page table entries (PTEs) using the
Page Attribute Table (PAT), for all PTEs that map the aperture. According
to the rules for combining the memory access types of MTRR and PAT, WC in
PAT and any access mode in MTRR gives effective WC.

I saw the same warning when I initially ported GEM. My code used the WC PAT
type, which makes the warning cosmetic, which is why I did not add the
MTRR split. If the new drm driver also consistently uses the WC memattr when
creating aperture mappings, then the warning can be ignored as well.
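
For illustration, the UC/WC/UC split could be computed along these lines. This
is a Python sketch of the arithmetic only, not what x86_mem.c does or would
do; the function names and the example addresses are assumptions chosen to
resemble the reports (a 256 MB window inside a larger firmware UC range).

```python
GIB = 1 << 30
MIB = 1 << 20


def is_mtrr_sized(base, length):
    """Variable-range MTRRs cover a power-of-two size aligned to that size."""
    return length > 0 and length & (length - 1) == 0 and base % length == 0


def split_uc_range(uc_base, uc_len, wc_base, wc_len):
    """Split one UC range around a WC window; returns (base, length, type)."""
    uc_end = uc_base + uc_len
    wc_end = wc_base + wc_len
    if not (uc_base <= wc_base and wc_end <= uc_end):
        raise ValueError("WC window is not inside the UC range")
    pieces = []
    if wc_base > uc_base:
        pieces.append((uc_base, wc_base - uc_base, "UC"))
    pieces.append((wc_base, wc_len, "WC"))
    if uc_end > wc_end:
        pieces.append((wc_end, uc_end - wc_end, "UC"))
    return pieces


# Illustrative layout, not taken from any specific machine in this thread:
# a 1 GiB UC range starting at 3 GiB, with a 256 MiB WC window 256 MiB in.
EXAMPLE = split_uc_range(3 * GIB, 1 * GIB, 3 * GIB + 256 * MIB, 256 * MIB)
```

In this example all three pieces happen to be power-of-two sized and
naturally aligned, so three MTRRs suffice. For other layouts a piece may need
to be broken up further, which is one reason the PAT approach is simpler.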


Re: non-SMP i386 build failure after SVN r337715

2018-08-14 Thread Mark Johnston
On Mon, Aug 13, 2018 at 09:53:22PM -0400, Michael Butler wrote:
> non-SMP builds apparently don't define some required structures ..

Thanks, this should be fixed by r337751.


Re: Make drm drivers use MTRR write-combine

2018-08-14 Thread Eric van Gyzen

On 8/14/18 4:12 AM, Johannes Lundberg wrote:

> Hi
>
> Something that we have seen for a long time on FreeBSD is the boot message
>
> Failed to add WC MTRR for [0xd000-0xdfff]: -22; performance may
> suffer
>
> Taking a closer look at this with memcontrol I can see that the 256 MB
> region that DRM wants to set as WC is already covered by this entry
> 0xc000/0x4000 BIOS uncacheable set-by-firmware active
>
> Similar on both my Skylake and Broadwell systems.

I see something similar on my Dell XPS 13 with a Kaby Lake R:

Failed to add WC MTRR for [0x9000-0x9fff]: -22; performance may 
suffer


0x8000/0x8000 BIOS uncacheable set-by-firmware active

The only mappings in this range are MMIO:

machdep.efi_map:
  Type Physical  Virtual   #Pages Attr
[snip]
MemoryMappedIO e000   0xe000 0001 RUNTIME
MemoryMappedIO fe00   0xfe00 0011 UC RUNTIME
MemoryMappedIO fec0   0xfec0 0001 UC RUNTIME
MemoryMappedIO fee0   0xfee0 0001 UC WT WB WP RUNTIME
MemoryMappedIO ff00   0xff00 1000 UC WT WB WP RUNTIME

Eric


Make drm drivers use MTRR write-combine

2018-08-14 Thread Johannes Lundberg
Hi

Something that we have seen for a long time on FreeBSD is the boot message

Failed to add WC MTRR for [0xd000-0xdfff]: -22; performance may
suffer

Taking a closer look at this with memcontrol I can see that the 256 MB
region that DRM wants to set as WC is already covered by this entry
0xc000/0x4000 BIOS uncacheable set-by-firmware active

Similar on both my Skylake and Broadwell systems.

The linuxkpi wrapper can be found here:
https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.15/linuxkpi/gplv2/src/linux_mtrr.c

There doesn't seem to exist a function for changing the properties of a sub
region:
https://github.com/FreeBSDDesktop/freebsd-base/blob/master/sys/dev/mem/memutil.c

Any ideas of a good solution to this? Can this region be blacklisted or is
there a safe way to split the big region into several regions with
different flags when the drm driver loads?

For reference, my AMD machine logs this
# dmesg | grep MTRR
Successfully added WC MTRR for [0xe000-0xefff]: 0;
# memcontrol list
--SNIP--
0xff000/0x1000 BIOS write-protect fixed-base fixed-length set-by-firmware active
0x0/0x8000 BIOS write-back set-by-firmware active
0x8000/0x4000 BIOS write-back set-by-firmware active
0xc000/0x2000 BIOS write-back set-by-firmware active
0xe000/0x1000 drm write-combine active

Not sure if it's a BIOS or CPU vendor issue.