Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-02 Thread Xiao Guangrong



On 09/02/2015 06:54 PM, Markus Trippelsdorf wrote:
> On 2015.09.02 at 18:27 +0800, Xiao Guangrong wrote:
> > On 09/02/2015 05:38 PM, Markus Trippelsdorf wrote:
> > > On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
> > >>> No. PAT is of course enabled and booting is successful sometimes even
> > >>> with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.
> > >>
> > >> Thanks for your confirmation.
> > >>
> > >>> markus@x4 linux % cat .config | grep  X86_PAT
> > >>> CONFIG_X86_PAT=y
> > >>> markus@x4 linux % dmesg | grep PAT
> > >>> [0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> > >>
> > >> Strange, the BP has already set PAT1 to WC, but KVM does not read it out
> > >> from the PAT MSR on its local CPU.
> > >>
> > >> Hmm... the PAT default values do not include WC, so it seems initializing
> > >> PAT on the SP has not finished after module_init()?
> > >>
> > >> Could you please apply this diff and test it again?
> > >
> > > (Your patch was malformed.)
> > >
> > > [2.138098] kvm: Nested Virtualization enabled
> > > [2.138153] kvm: Nested Paging enabled
> > > [2.138204] KVM PAT: 0x7040600070406.
> >
> > So the PAT still holds its reset value; it's likely that PAT has not been
> > initialized on the local CPU.
>
> Could it be a simple AMD/INTEL difference? I'm running an AMD CPU and I
> see many !use_intel() in if statements in arch/x86/kernel/cpu/mtrr/main.c...

#define use_intel() (mtrr_if && mtrr_if->use_intel_if == 1)

I checked that your CPU lists "mtrr" in /proc/cpuinfo, so it should use
generic_mtrr_ops, and generic_mtrr_ops.use_intel_if = 1. That means AMD CPUs
also use the "intel" way. :)

Please refer to the initialization of "mtrr_if" in mtrr_bp_init().
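
The point about use_intel() can be illustrated with a small user-space
sketch. It is not the kernel code: the struct below is a hypothetical,
pared-down stand-in for the kernel's ops structure, and select_mtrr_if() is
an invented helper that ignores the vendor-specific legacy paths. Only the
use_intel() macro is copied from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for the kernel's MTRR ops structure. */
struct mtrr_ops_sketch {
    const char *name;
    int use_intel_if;   /* 1 when the generic ("intel") interface is used */
};

static const struct mtrr_ops_sketch generic_ops = { "generic", 1 };

/* NULL until an implementation has been selected. */
static const struct mtrr_ops_sketch *mtrr_if;

/* Macro as quoted in the message above. */
#define use_intel() (mtrr_if && mtrr_if->use_intel_if == 1)

/*
 * Assumption for illustration: the choice depends on the "mtrr" CPU feature
 * flag, not on the CPU vendor. Vendor-specific legacy paths are left out.
 */
static void select_mtrr_if(int cpu_has_mtrr)
{
    assert(cpu_has_mtrr == 0 || cpu_has_mtrr == 1);
    mtrr_if = cpu_has_mtrr ? &generic_ops : NULL;
}
```

With this sketch, use_intel() is true after select_mtrr_if(1) regardless of
vendor, which is the behaviour described for an AMD CPU that advertises
"mtrr".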
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-02 Thread Markus Trippelsdorf
On 2015.09.02 at 18:27 +0800, Xiao Guangrong wrote:
> On 09/02/2015 05:38 PM, Markus Trippelsdorf wrote:
> > On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
> >>> No. PAT is of course enabled and booting is successful sometimes even
> >>> with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.
> >>
> >> Thanks for your confirmation.
> >>
> >>> markus@x4 linux % cat .config | grep  X86_PAT
> >>> CONFIG_X86_PAT=y
> >>> markus@x4 linux % dmesg | grep PAT
> >>> [0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> >>
> >> Strange, the BP has already set PAT1 to WC, but KVM does not read it out
> >> from the PAT MSR on its local CPU.
> >>
> >> Hmm... the PAT default values do not include WC, so it seems initializing
> >> PAT on the SP has not finished after module_init()?
> >>
> >> Could you please apply this diff and test it again?
> >
> > (Your patch was malformed.)
> >
> > [2.138098] kvm: Nested Virtualization enabled
> > [2.138153] kvm: Nested Paging enabled
> > [2.138204] KVM PAT: 0x7040600070406.
>
> So the PAT still holds its reset value; it's likely that PAT has not been
> initialized on the local CPU.

Could it be a simple AMD/INTEL difference? I'm running an AMD CPU and I
see many !use_intel() in if statements in arch/x86/kernel/cpu/mtrr/main.c...

-- 
Markus


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-02 Thread Xiao Guangrong



On 09/02/2015 05:38 PM, Markus Trippelsdorf wrote:
> On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
> > > No. PAT is of course enabled and booting is successful sometimes even
> > > with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.
> >
> > Thanks for your confirmation.
> >
> > > markus@x4 linux % cat .config | grep  X86_PAT
> > > CONFIG_X86_PAT=y
> > > markus@x4 linux % dmesg | grep PAT
> > > [0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> >
> > Strange, the BP has already set PAT1 to WC, but KVM does not read it out
> > from the PAT MSR on its local CPU.
> >
> > Hmm... the PAT default values do not include WC, so it seems initializing
> > PAT on the SP has not finished after module_init()?
> >
> > Could you please apply this diff and test it again?
>
> (Your patch was malformed.)
>
> [2.138098] kvm: Nested Virtualization enabled
> [2.138153] kvm: Nested Paging enabled
> [2.138204] KVM PAT: 0x7040600070406.

So the PAT still holds its reset value; it's likely that PAT has not been
initialized on the local CPU.

Maybe something escaped from stop_machine() called in native_smp_cpus_done(),
Ingo?
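
To make the reading of the logged value concrete, here is a minimal
user-space sketch that decodes a PAT MSR value into its eight entries. The
helper names (pat_entry, pat_type_name, pat_has_type) are made up for this
illustration; the type encodings (0 = UC, 1 = WC, 4 = WT, 5 = WP, 6 = WB,
7 = UC-) are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

/* Return PAT entry i (0..7); only the low 3 bits of each byte are defined. */
static uint8_t pat_entry(uint64_t pat, int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)((pat >> (8 * i)) & 0x7);
}

/* Map a PAT memory-type encoding to its usual short name. */
static const char *pat_type_name(uint8_t type)
{
    switch (type & 0x7) {
    case 0: return "UC";
    case 1: return "WC";
    case 4: return "WT";
    case 5: return "WP";
    case 6: return "WB";
    case 7: return "UC-";
    default: return "reserved";
    }
}

/* Return 1 if any of the eight entries selects the given memory type. */
static int pat_has_type(uint64_t pat, uint8_t type)
{
    for (int i = 0; i < 8; i++)
        if (pat_entry(pat, i) == type)
            return 1;
    return 0;
}
```

Applied to 0x7040600070406 from the "KVM PAT:" line, the entries decode to
WB WT UC- UC WB WT UC- UC, so pat_has_type(pat, 1) returns 0: there is no WC
slot. The layout printed by dmesg (WB WC UC- UC WB WC UC- WT) would
correspond to 0x0407010600070106 under the same encoding, which does contain
WC.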


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-02 Thread Markus Trippelsdorf
On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
> >
> > No. PAT is of course enabled and booting is successful sometimes even
> > with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.
> 
> Thanks for your confirmation.
> 
> >
> > markus@x4 linux % cat .config | grep  X86_PAT
> > CONFIG_X86_PAT=y
> > markus@x4 linux % dmesg | grep PAT
> > [0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> 
> Strange, the BP has already set PAT1 to WC, but KVM does not read it out
> from the PAT MSR on its local CPU.
> 
> Hmm... the PAT default values do not include WC, so it seems initializing
> PAT on the SP has not finished after module_init()?
> 
> Could you please apply this diff and test it again?

(Your patch was malformed.)

[2.138098] kvm: Nested Virtualization enabled
[2.138153] kvm: Nested Paging enabled
[2.138204] KVM PAT: 0x7040600070406.
[2.138255] mtrr2protval[0]:18.
[2.138306] mtrr2protval[1]:ff.
[2.138356] mtrr2protval[2]:0.
[2.138408] mtrr2protval[3]:0.
[2.138459] mtrr2protval[4]:8.
[2.138510] mtrr2protval[5]:ff.
[2.138561] mtrr2protval[6]:0.
[2.138612] mtrr2protval[7]:10.
[2.138662] BUG in fallback_mtrr_type, mtrr = 1.

-- 
Markus
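
The table in the log above can be reproduced in user space. The sketch below
is an assumption-laden reconstruction, not the kernel function: it follows
the loop visible in the debugging diff elsewhere in this thread, takes
MTRR2PROTVAL_INVALID to be 0xff (inferred from the "ff" entries in the log),
uses the x86 PTE bit positions PWT = 3, PCD = 4, PAT = 7, and leaves out the
fallback step. The names cm_idx2pte and build_table are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: inferred from the "ff" entries printed in the log. */
#define MTRR2PROTVAL_INVALID 0xff

/* Turn PAT slot index i into PTE cache bits: PWT = bit 3, PCD = bit 4, PAT = bit 7. */
static uint8_t cm_idx2pte(int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(((i & 4) << 5) | ((i & 2) << 3) | ((i & 1) << 3));
}

/*
 * Fill table[8] (indexed by MTRR memory type) from a host PAT value.
 * Returns the first type left without a mapping, or -1 if all are mapped.
 * The kernel's fallback step is deliberately not modelled.
 */
static int build_table(uint64_t pat, uint8_t table[8])
{
    int i;

    for (i = 0; i < 8; i++)
        table[i] = MTRR2PROTVAL_INVALID;

    /* Types 2 and 3 are not valid MTRR types; give them a harmless value. */
    table[2] = 0;
    table[3] = 0;

    /* The first PAT slot holding a given type provides its mapping. */
    for (i = 0; i < 8; i++) {
        uint8_t type = (uint8_t)((pat >> (8 * i)) & 0x7);

        if (table[type] == MTRR2PROTVAL_INVALID)
            table[type] = cm_idx2pte(i);
    }

    for (i = 0; i < 8; i++)
        if (table[i] == MTRR2PROTVAL_INVALID)
            return i;
    return -1;
}
```

For pat = 0x7040600070406 this yields 18, ff, 0, 0, 8, ff, 0, 10 (hex) for
entries 0 to 7 and returns 1, matching both the mtrr2protval[] lines and the
"mtrr = 1" (MTRR_TYPE_WRCOMB) message.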


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-02 Thread Xiao Guangrong



On 09/02/2015 11:50 AM, Markus Trippelsdorf wrote:
> On 2015.09.02 at 06:31 +0800, Xiao Guangrong wrote:
> > On 09/01/2015 09:56 PM, Markus Trippelsdorf wrote:
> > > On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
> > >> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
> > >> If yes, could you please print the actual value out?
> > >
> > > It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
> > > it prints 1 for the value of mtrr.
> > >
> > >   MTRR_TYPE_WRCOMB 1
> >
> > Then I suspect PAT is not enabled on your box. Could you please check that
> > CONFIG_X86_PAT is selected in your .config file, that pat is shown in
> > /proc/cpuinfo, whether the "nopat" kernel parameter is used, and the
> > output of dmesg | grep PAT?
>
> No. PAT is of course enabled and booting is successful sometimes even
> with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.

Thanks for your confirmation.

> markus@x4 linux % cat .config | grep  X86_PAT
> CONFIG_X86_PAT=y
> markus@x4 linux % dmesg | grep PAT
> [0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT

Strange, the BP has already set PAT1 to WC, but KVM does not read it out
from the PAT MSR on its local CPU.

Hmm... the PAT default values do not include WC, so it seems initializing
PAT on the SP has not finished after module_init()?

Could you please apply this diff and test it again?

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 189e464..d9d3a30 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -884,6 +884,7 @@ static u8 fallback_mtrr_type(int mtrr)
case MTRR_TYPE_WRPROT:
return MTRR_TYPE_UC_MINUS;
default:
+   printk("BUG in %s, mtrr = %d.\n", __FUNCTION__, mtrr);
BUG();
}
 }
@@ -907,6 +908,8 @@ static void build_mtrr2protval(void)
 * guest.
 */
rdmsrl(MSR_IA32_CR_PAT, pat);
+   printk("KVM PAT: 0x%llx.\n", pat);
+
for (i = 0; i < 8; i++) {
u8 mtrr = pat >> (8 * i);

@@ -914,10 +917,17 @@ static void build_mtrr2protval(void)
mtrr2protval[mtrr] = __cm_idx2pte(i);
}

+   for (i = 0; i < 8; i++)
+   printk("mtrr2protval[%d]:%x.\n", i, mtrr2protval[i]);
+
+
for (i = 0; i < 8; i++) {
if (mtrr2protval[i] == MTRR2PROTVAL_INVALID) {
u8 fallback = fallback_mtrr_type(i);
mtrr2protval[i] = mtrr2protval[fallback];
+   if (mtrr2protval[i] == MTRR2PROTVAL_INVALID)
+   printk("BUG in %s, mtrr2protval[%d] = %x.\n", __FUNCTION__, i, 
mtrr2protval[i]);

+
BUG_ON(mtrr2protval[i] == MTRR2PROTVAL_INVALID);
}
}


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-01 Thread Markus Trippelsdorf
On 2015.09.02 at 06:31 +0800, Xiao Guangrong wrote:
> 
> 
> On 09/01/2015 09:56 PM, Markus Trippelsdorf wrote:
> > On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
> >>
> >> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
> >> If yes, could you please print the actual value out?
> >
> > It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
> > it prints 1 for the value of mtrr.
> >
> >   MTRR_TYPE_WRCOMB 1
> >
> 
> Then I suspect PAT is not enabled on your box. Could you please check that
> CONFIG_X86_PAT is selected in your .config file, that pat is shown in
> /proc/cpuinfo, whether the "nopat" kernel parameter is used, and the
> output of dmesg | grep PAT?

No. PAT is of course enabled and booting is successful sometimes even
with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.

markus@x4 linux % cat .config | grep  X86_PAT
CONFIG_X86_PAT=y
markus@x4 linux % dmesg | grep PAT
[0.00] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
markus@x4 linux % cat /proc/cpuinfo| grep pat
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni 
monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock 
nrip_save vmmcall
...

-- 
Markus


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-01 Thread Xiao Guangrong



On 09/01/2015 09:56 PM, Markus Trippelsdorf wrote:
> On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
> > Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
> > If yes, could you please print the actual value out?
>
> It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
> it prints 1 for the value of mtrr.
>
>  MTRR_TYPE_WRCOMB 1

Then I suspect PAT is not enabled on your box. Could you please check that
CONFIG_X86_PAT is selected in your .config file, that pat is shown in
/proc/cpuinfo, whether the "nopat" kernel parameter is used, and the
output of dmesg | grep PAT?

I will post a fix if the suspicion is right.


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-01 Thread Markus Trippelsdorf
On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
> 
> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
> If yes, could you please print the actual value out?

It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
it prints 1 for the value of mtrr.

 MTRR_TYPE_WRCOMB 1

-- 
Markus


Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-01 Thread Xiao Guangrong



On 09/01/2015 06:04 PM, Markus Trippelsdorf wrote:
> On 2015.09.01 at 10:56 +0200, Ingo Molnar wrote:
> > * Markus Trippelsdorf  wrote:
> > > As I wrote in my other reply, the boot failure is nondeterministic (boot
> > > succeeds roughly every sixth time). So the bisection and the patch are
> > > just bogus (but the boot failure is real).
> > >
> > > Sorry.
> >
> > No problem. Please let us know if any of these commits does turn out to be
> > the culprit. (Which is always a possibility.)
>
> I'm pretty sure commit 3c2e7f7de3 is the culprit.
>
> commit 3c2e7f7de3240216042b61073803b61b9b3cfb22
> Author: Paolo Bonzini
> Date:   Tue Jul 7 14:32:17 2015 +0200
>
>     KVM: SVM: use NPT page attributes
>
> I've booted ten times in a row successfully with the following patch:
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 74d825716f4f..3190173a575f 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -989,7 +989,7 @@ static __init int svm_hardware_setup(void)
> } else
> kvm_disable_tdp();
>
> -   build_mtrr2protval();
> +// build_mtrr2protval();
> return 0;
>
>   err:
>
> Paolo, your commit causes nondeterministic boot failure on my machine.
> It sometimes crashes early with the following backtrace:

Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
If yes, could you please print the actual value out?

BTW, you may change BUG() to WARN() to get the printed info more easily.



Re: commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

2015-09-01 Thread Markus Trippelsdorf
On 2015.09.01 at 10:56 +0200, Ingo Molnar wrote:
> 
> * Markus Trippelsdorf  wrote:
> > As I wrote in my other reply, the boot failure is nondeterministic (boot
> > succeeds roughly every sixth time). So the bisection and the patch are
> > just bogus (but the boot failure is real).
> > 
> > Sorry.
> 
> No problem. Please let us know if any of these commits does turn out to be
> the culprit. (Which is always a possibility.)

I'm pretty sure commit 3c2e7f7de3 is the culprit.

commit 3c2e7f7de3240216042b61073803b61b9b3cfb22
Author: Paolo Bonzini 
Date:   Tue Jul 7 14:32:17 2015 +0200

KVM: SVM: use NPT page attributes

I've booted ten times in a row successfully with the following patch:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74d825716f4f..3190173a575f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -989,7 +989,7 @@ static __init int svm_hardware_setup(void)
} else
kvm_disable_tdp();
 
-   build_mtrr2protval();
+// build_mtrr2protval();
return 0;
 
 err:

Paolo, your commit causes nondeterministic boot failure on my machine.
It sometimes crashes early with the following backtrace:

map_vsyscall
kvm_arch_hardware_setup
map_vsyscall
kvm_init
map_vsyscall
do_one_initcall
kernel_init_freeable
rest_init
kernel_init
ret_from_fork
rest_init

RIP: svm_hardware_setup 

-- 
Markus
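
As a rough check on why ten clean boots in a row is meaningful evidence, the
sketch below computes the chance of such a streak happening by luck. It
assumes independent boots and takes "boot succeeds roughly every sixth time"
as a success probability of 1/6 without the patch; both are simplifying
assumptions, and streak_probability is a name invented for this note.

```c
#include <assert.h>

/*
 * Probability of n consecutive successes when each independent trial
 * succeeds with probability p.
 */
static double streak_probability(double p, int n)
{
    double result = 1.0;

    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    for (int i = 0; i < n; i++)
        result *= p;
    return result;
}
```

streak_probability(1.0 / 6.0, 10) is 1/60466176, roughly 1.65e-8, so under
these assumptions ten successful boots in a row would be extremely unlikely
if commenting out build_mtrr2protval() had no effect.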

