[PATCH 3.16.y-ckt 061/126] MIPS: KVM: Fix CACHE immediate offset sign extension

2016-01-06 Thread Luis Henriques
3.16.7-ckt22 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: James Hogan 

commit c5c2a3b998f1ff5a586f9d37e154070b8d550d17 upstream.

The immediate field of the CACHE instruction is signed, so ensure that
it gets sign extended by casting it to an int16_t rather than just
masking the low 16 bits.

Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch 
emulation.")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
[ luis: backported to 3.16:
  - file rename: emulate.c -> kvm_mips_emul.c ]
Signed-off-by: Luis Henriques 
---
 arch/mips/kvm/kvm_mips_emul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
index 18b4e2fdae33..950229176c2f 100644
--- a/arch/mips/kvm/kvm_mips_emul.c
+++ b/arch/mips/kvm/kvm_mips_emul.c
@@ -1434,7 +1434,7 @@ kvm_mips_emulate_cache(uint32_t inst, uint32_t *opc, uint32_t cause,
 
base = (inst >> 21) & 0x1f;
op_inst = (inst >> 16) & 0x1f;
-   offset = inst & 0xffff;
+   offset = (int16_t)inst;
cache = (inst >> 16) & 0x3;
op = (inst >> 18) & 0x7;
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3.16.y-ckt 060/126] MIPS: KVM: Fix ASID restoration logic

2016-01-06 Thread Luis Henriques
3.16.7-ckt22 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: James Hogan 

commit 002374f371bd02df864cce1fe85d90dc5b292837 upstream.

ASID restoration on guest resume should determine the guest execution
mode based on the guest Status register rather than bit 30 of the guest
PC.

Fix the two places in locore.S that do this, loading the guest status
from the cop0 area. Note, this assembly is specific to the trap &
emulate implementation of KVM, so it doesn't need to check the
supervisor bit as that mode is not implemented in the guest.

Fixes: b680f70fc111 ("KVM/MIPS32: Entry point for trampolining to...")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
[ luis: backported to 3.16:
  - file rename: locore.S -> kvm_locore.S ]
Signed-off-by: Luis Henriques 
---
 arch/mips/kvm/kvm_locore.S | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/kvm/kvm_locore.S b/arch/mips/kvm/kvm_locore.S
index 17376cd838e6..fc24acb3a837 100644
--- a/arch/mips/kvm/kvm_locore.S
+++ b/arch/mips/kvm/kvm_locore.S
@@ -159,9 +159,11 @@ FEXPORT(__kvm_mips_vcpu_run)
 
 FEXPORT(__kvm_mips_load_asid)
/* Set the ASID for the Guest Kernel */
-   INT_SLL t0, t0, 1   /* with kseg0 @ 0x40000000, kernel */
-   /* addresses shift to 0x80000000 */
-   bltz    t0, 1f  /* If kernel */
+   PTR_L   t0, VCPU_COP0(k1)
+   LONG_L  t0, COP0_STATUS(t0)
+   andi    t0, KSU_USER | ST0_ERL | ST0_EXL
+   xori    t0, KSU_USER
+   bnez    t0, 1f  /* If kernel */
 INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID/* else user */
 1:
@@ -438,9 +440,11 @@ __kvm_mips_return_to_guest:
mtc0t0, CP0_EPC
 
/* Set the ASID for the Guest Kernel */
-   INT_SLL t0, t0, 1   /* with kseg0 @ 0x40000000, kernel */
-   /* addresses shift to 0x80000000 */
-   bltz    t0, 1f  /* If kernel */
+   PTR_L   t0, VCPU_COP0(k1)
+   LONG_L  t0, COP0_STATUS(t0)
+   andi    t0, KSU_USER | ST0_ERL | ST0_EXL
+   xori    t0, KSU_USER
+   bnez    t0, 1f  /* If kernel */
 INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID/* else user */
 1:
--


[PATCH 3.16.y-ckt 062/126] MIPS: KVM: Uninit VCPU in vcpu_create error path

2016-01-06 Thread Luis Henriques
3.16.7-ckt22 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: James Hogan 

commit 585bb8f9a5e592f2ce7abbe5ed3112d5438d2754 upstream.

If either of the memory allocations in kvm_arch_vcpu_create() fail, the
vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd
in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
[ luis: backported to 3.16:
  - file rename: mips.c -> kvm_mips.c ]
Signed-off-by: Luis Henriques 
---
 arch/mips/kvm/kvm_mips.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index cc721a3c8996..2c81c2c9e8dc 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -307,7 +307,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 
if (!gebase) {
err = -ENOMEM;
-   goto out_free_cpu;
+   goto out_uninit_cpu;
}
kvm_debug("Allocated %d bytes for KVM Exception Handlers @ %p\n",
  ALIGN(size, PAGE_SIZE), gebase);
@@ -368,6 +368,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 out_free_gebase:
kfree(gebase);
 
+out_uninit_cpu:
+   kvm_vcpu_uninit(vcpu);
+
 out_free_cpu:
kfree(vcpu);
 
--


Re: [PATCH v1] kvm/x86: Hyper-V tsc page setup

2016-01-06 Thread Andrey Smetanin



On 01/06/2016 12:48 AM, Peter Hornyack wrote:

On Thu, Dec 24, 2015 at 1:33 AM, Andrey Smetanin
 wrote:

The TSC page was implemented recently but filled with empty
values. This patch sets up the TSC page scale and offset based
on the vcpu TSC, tsc_khz, and the HV_X64_MSR_TIME_REF_COUNT value.

A valid TSC page drops the HV_X64_MSR_TIME_REF_COUNT MSR
read count to zero, which potentially improves performance.

The patch applies on top of
'kvm: Make vcpu->requests as 64 bit bitmap'
previously sent.

Signed-off-by: Andrey Smetanin 
CC: Paolo Bonzini 
CC: Gleb Natapov 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

Reviewed-by: Peter Hornyack 



---
  arch/x86/kvm/hyperv.c| 117 +--
  arch/x86/kvm/hyperv.h|   2 +
  arch/x86/kvm/x86.c   |  12 +
  include/linux/kvm_host.h |   1 +
  4 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d50675a..504fdc7 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -753,6 +753,105 @@ static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu,
 return 0;
  }

+static u64 calc_tsc_page_scale(u32 tsc_khz)
+{
+   /*
+    * reftime (in 100ns) = tsc * tsc_scale / 2^64 + tsc_offset
+    * so reftime_delta = (tsc_delta * tsc_scale) / 2^64
+    * so tsc_scale = (2^64 * reftime_delta) / tsc_delta
+    * so tsc_scale = (2^64 * 10 * 10^6) / tsc_hz = (2^64 * 10^4) / tsc_khz
+    * so tsc_scale = (2^63 * 2 * 10^4) / tsc_khz
+    */
+   return mul_u64_u32_div(1ULL << 63, 2 * 10000, tsc_khz);
+}
+
+static int write_tsc_page(struct kvm *kvm, u64 gfn,
+ PHV_REFERENCE_TSC_PAGE tsc_ref)
+{
+   if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
+   tsc_ref, sizeof(*tsc_ref)))
+   return 1;
+   mark_page_dirty(kvm, gfn);
+   return 0;
+}
+
+static int read_tsc_page(struct kvm *kvm, u64 gfn,
+PHV_REFERENCE_TSC_PAGE tsc_ref)
+{
+   if (kvm_read_guest(kvm, gfn_to_gpa(gfn),
+  tsc_ref, sizeof(*tsc_ref)))
+   return 1;
+   return 0;
+}
+
+static u64 calc_tsc_page_time(struct kvm_vcpu *vcpu,
+ PHV_REFERENCE_TSC_PAGE tsc_ref)
+{
+
+   u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+
+   return mul_u64_u64_shr(tsc, tsc_ref->tsc_scale, 64)
+   + tsc_ref->tsc_offset;
+}
+
+static int setup_blank_tsc_page(struct kvm_vcpu *vcpu, u64 gfn)
+{
+   HV_REFERENCE_TSC_PAGE tsc_ref;
+
+   memset(&tsc_ref, 0, sizeof(tsc_ref));
+   return write_tsc_page(vcpu->kvm, gfn, &tsc_ref);
+}
+
+int kvm_hv_setup_tsc_page(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+   struct kvm_hv *hv = &kvm->arch.hyperv;
+   HV_REFERENCE_TSC_PAGE tsc_ref;
+   u32 tsc_khz;
+   int r;
+   u64 gfn, ref_time, tsc_scale, tsc_offset, tsc;
+
+   if (WARN_ON_ONCE(!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)))
+   return -EINVAL;
+
+   gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
+   vcpu_debug(vcpu, "tsc page gfn 0x%llx\n", gfn);
+
+   tsc_khz = vcpu->arch.virtual_tsc_khz;
+   if (!tsc_khz) {
+   vcpu_unimpl(vcpu, "no tsc khz\n");
+   return setup_blank_tsc_page(vcpu, gfn);
+   }
+
+   r = read_tsc_page(kvm, gfn, &tsc_ref);
+   if (r) {
+   vcpu_err(vcpu, "can't access tsc page gfn 0x%llx\n", gfn);
+   return r;
+   }
+
+   tsc_scale = calc_tsc_page_scale(tsc_khz);
+   ref_time = get_time_ref_counter(kvm);
+   tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+
+   /* tsc_offset = reftime - tsc * tsc_scale / 2^64 */
+   tsc_offset = ref_time - mul_u64_u64_shr(tsc, tsc_scale, 64);
+   vcpu_debug(vcpu, "tsc khz %u tsc %llu scale %llu offset %llu\n",
+  tsc_khz, tsc, tsc_scale, tsc_offset);
+
+   tsc_ref.tsc_sequence++;
+   if (tsc_ref.tsc_sequence == 0)


Also avoid tsc_sequence == 0xFFFFFFFF here. In the Hyper-V TLFS 4.0
(Win2012 R2), 0xFFFFFFFF is the special sequence number to disable the
reference TSC page.


we already discussed with Microsoft that the documentation contains the
wrong sequence number - 0xFFFFFFFF (instead of 0). Please take a look at
the details here:
https://lkml.org/lkml/2015/11/2/655

+   tsc_ref.tsc_sequence = 1;
+
+   tsc_ref.tsc_scale = tsc_scale;
+   tsc_ref.tsc_offset = tsc_offset;
+
+   vcpu_debug(vcpu, "tsc page calibration time %llu vs. reftime %llu\n",
+  calc_tsc_page_time(vcpu, &tsc_ref),
+  get_time_ref_counter(kvm));
+
+   return write_tsc_page(kvm, gfn, &tsc_ref);
+}
+
  static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
  bool host)
  {
@@ 

Re: [RFC PATCH v2 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2016-01-06 Thread Yongji Xie

On 2016/1/5 5:42, Benjamin Herrenschmidt wrote:

On Mon, 2016-01-04 at 14:07 -0700, Alex Williamson wrote:

On Thu, 2015-12-31 at 16:50 +0800, Yongji Xie wrote:

The current vfio-pci implementation disallows mmapping the MSI-X
table, in case the user gets to touch it directly.

However, the EEH mechanism can ensure that a given PCI device
can only shoot the MSIs assigned to its PE. So we think it's
safe to expose the MSI-X table to userspace, because the exposed
MSI-X table can't be used to do harm to other memory space.

And with the MSI-X table mmapped, some performance issues that
arise when PCI adapters have critical registers in the same page
as the MSI-X table can also be resolved.

So this patch adds a Kconfig option, VFIO_PCI_MMAP_MSIX,
to support mmapping the MSI-X table.

Signed-off-by: Yongji Xie 
---
  drivers/vfio/pci/Kconfig| 4 ++++
  drivers/vfio/pci/vfio_pci.c | 6 ++++--
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 02912f1..67b0a2c 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -23,6 +23,10 @@ config VFIO_PCI_MMAP
depends on VFIO_PCI
def_bool y if !S390
  
+config VFIO_PCI_MMAP_MSIX

+   depends on VFIO_PCI_MMAP
+   def_bool y if EEH

Does CONFIG_EEH necessarily mean that EEH is enabled?  Could the
system not support EEH, or could EEH be disabled via kernel
command-line options?

EEH is definitely the wrong thing to test here anyway. What needs to be
tested is that the PCI Host bridge supports filtering of MSIs, so
ideally this should be some kind of host bridge attribute set by the
architecture backend.


So do you mean this attribute can be added in pci_host_bridge like this:

--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -412,6 +412,7 @@ struct pci_host_bridge {
void (*release_fn)(struct pci_host_bridge *);
void *release_data;
unsigned int ignore_reset_delay:1;  /* for entire hierarchy */
+   unsigned int msix_filtered:1;   /* support filtering of MSIs */
/* Resource alignment requirements */
resource_size_t (*align_resource)(struct pci_dev *dev,
const struct resource *res,

I can surely do it if there is no objection from PCI folks. Thanks.

Regards,
Yongji Xie


This can happen with or without CONFIG_EEH and you are right,
CONFIG_EEH can be enabled and the machine not support it.

Any IODA bridge will support this.

Cheers,
Ben.



--


Re: How to reserve guest physical region for ACPI

2016-01-06 Thread Igor Mammedov
On Tue, 5 Jan 2016 18:22:33 +0100
Laszlo Ersek  wrote:

> On 01/05/16 18:08, Igor Mammedov wrote:
> > On Mon, 4 Jan 2016 21:17:31 +0100
> > Laszlo Ersek  wrote:
> >   
> >> Michael CC'd me on the grandparent of the email below. I'll try to add
> >> my thoughts in a single go, with regard to OVMF.
> >>
> >> On 12/30/15 20:52, Michael S. Tsirkin wrote:  
> >>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>  On Mon, 28 Dec 2015 14:50:15 +0200
>  "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:
> >>
> >> Hi Michael, Paolo,
> >>
> >> Now it is the time to return to the challenge that how to reserve guest
> >> physical region internally used by ACPI.
> >>
> >> Igor suggested that:
> >> | An alternative place to allocate reserve from could be high memory.
> >> | For pc we have "reserved-memory-end" which currently makes sure
> >> | that hotpluggable memory range isn't used by firmware
> >> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
> >> 
> >>
> >> OVMF has no support for the "reserved-memory-end" fw_cfg file. The
> >> reason is that nobody wrote that patch, nor asked for the patch to be
> >> written. (Not implying that just requesting the patch would be
> >> sufficient for the patch to be written.)
> >>  
> > I don't want to tie things to reserved-memory-end because this
> > does not scale: next time we need to reserve memory,
> > we'll need to find yet another way to figure out what is where.
>  Could you elaborate a bit more on a problem you're seeing?
> 
>  To me it looks like it scales rather well.
>  For example lets imagine that we adding a device
>  that has some on device memory that should be mapped into GPA
>  code to do so would look like:
> 
>    pc_machine_device_plug_cb(dev)
>    {
> ...
> if (dev == OUR_NEW_DEVICE_TYPE) {
> memory_region_add_subregion(as, current_reserved_end, &dev->mr);
> set_new_reserved_end(current_reserved_end + 
>  memory_region_size(&dev->mr));
> }
>    }
> 
>  we can practically add any number of new devices that way.
> >>>
> >>> Yes but we'll have to build a host side allocator for these, and that's
> >>> nasty. We'll also have to maintain these addresses indefinitely (at
> >>> least per machine version) as they are guest visible.
> >>> Not only that, there's no way for guest to know if we move things
> >>> around, so basically we'll never be able to change addresses.
> >>>
> >>> 
>   
> > I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
> > support 64 bit RAM instead
> >>
> >> This looks quite doable in OVMF, as long as the blob to allocate from
> >> high memory contains *zero* ACPI tables.
> >>
> >> (
> >> Namely, each ACPI table is installed from the containing fw_cfg blob
> >> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
> >> own allocation policy for the *copies* of ACPI tables it installs.
> >>
> >> This allocation policy is left unspecified in the section of the UEFI
> >> spec that governs EFI_ACPI_TABLE_PROTOCOL.
> >>
> >> The current policy in edk2 (= the reference implementation) seems to be
> >> "allocate from under 4GB". It is currently being changed to "try to
> >> allocate from under 4GB, and if that fails, retry from high memory". (It
> >> is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
> >> )
> >>  
> > (and maybe a way to allocate and
> > zero-initialize buffer without loading it through fwcfg),
> >>
> >> Sounds reasonable.
> >>  
> > this way bios
> > does the allocation, and addresses can be patched into acpi.
>  and then guest side needs to parse/execute some AML that would
>  initialize QEMU side so it would know where to write data.
> >>>
> >>> Well not really - we can put it in a data table, by itself
> >>> so it's easy to find.
> >>
> >> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
> >> acpi_get_table_with_size()?
> >>  
> >>>
> >>> AML is only needed if access from ACPI is desired.
> >>>
> >>> 
>  bios-linker-loader is a great interface for initializing some
>  guest owned data and linking it together but I think it adds
>  unnecessary complexity and is misused if it's used to handle
>  device owned data/on device memory in this and VMGID cases.
> >>>
> >>> I want a generic interface for guest to enumerate these things.  linker
> >>> seems quite reasonable but if you see a reason why it won't do, or want
> >>> to propose a better interface, fine.
> >>
> >> * The guest could do the following:
> >> - while processing the ALLOCATE commands, it would make a note where in
> >> GPA space each fw_cfg blob gets allocated
> >> - at the 

Re: [PATCH 3/6] nvdimm acpi: introduce patched dsm memory

2016-01-06 Thread Xiao Guangrong



On 01/06/2016 11:23 PM, Igor Mammedov wrote:

On Tue,  5 Jan 2016 02:52:05 +0800
Xiao Guangrong  wrote:


The dsm memory is used to save the input parameters and store
the dsm result which is filled by QEMU.

The address of dsm memory is decided by bios and patched into
int64 object returned by "MEMA" method

Signed-off-by: Xiao Guangrong 
---
  hw/acpi/aml-build.c | 12 
  hw/acpi/nvdimm.c| 24 ++--
  include/hw/acpi/aml-build.h |  1 +
  3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 78e1290..83eadb3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -394,6 +394,18 @@ Aml *aml_int(const uint64_t val)
  }

  /*
+ * ACPI 1.0b: 16.2.3 Data Objects Encoding:
+ * encode: QWordConst
+ */
+Aml *aml_int64(const uint64_t val)
+{
+Aml *var = aml_alloc();
+build_append_byte(var->buf, 0x0E); /* QWordPrefix */
+build_append_int_noprefix(var->buf, val, 8);
+return var;
+}
+
+/*
   * helper to construct NameString, which returns Aml object
   * for using with aml_append or other aml_* terms
   */
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bc7cd8f..a72104c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -28,6 +28,7 @@

  #include "hw/acpi/acpi.h"
  #include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
  #include "hw/nvram/fw_cfg.h"
  #include "hw/mem/nvdimm.h"

@@ -402,7 +403,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
  state->dsm_mem->len);
  }

-#define NVDIMM_COMMON_DSM  "NCAL"
+#define NVDIMM_GET_DSM_MEM  "MEMA"
+#define NVDIMM_COMMON_DSM   "NCAL"

  static void nvdimm_build_common_dsm(Aml *dev)
  {
@@ -468,7 +470,8 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
GArray *table_data, GArray *linker,
uint8_t revision)
  {
-Aml *ssdt, *sb_scope, *dev;
+Aml *ssdt, *sb_scope, *dev, *method;
+int offset;

  acpi_add_table(table_offsets, table_data);

@@ -499,9 +502,26 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,

  aml_append(sb_scope, dev);

+/*
+ * leave it at the end of ssdt so that we can conveniently get the
+ * offset of int64 object returned by the function which will be
+ * patched with the real address of the dsm memory by BIOS.
+ */
+method = aml_method(NVDIMM_GET_DSM_MEM, 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int64(0x0)));

there is no need in dedicated aml_int64(), you can use aml_int(0x64) 
trick


We cannot do this due to the trick in bios_linker_loader_add_pointer(), which
issues a COMMAND_ADD_POINTER to the BIOS; that request does:
/*
 * COMMAND_ADD_POINTER - patch the table (originating from
 * @dest_file) at @pointer.offset, by adding a pointer to the table
 * originating from @src_file. 1,2,4 or 8 byte unsigned
 * addition is used depending on @pointer.size.
 */

That means new-offset = old-offset + the address of the new table allocated
by BIOS.

So we expect a 0 offset here.




+aml_append(sb_scope, method);
  aml_append(ssdt, sb_scope);
  /* copy AML table into ACPI tables blob and patch header there */
  g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
+
+offset = table_data->len - 8;
+
+bios_linker_loader_alloc(linker, NVDIMM_DSM_MEM_FILE, TARGET_PAGE_SIZE,
+ false /* high memory */);
+bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+   NVDIMM_DSM_MEM_FILE, table_data,
+   table_data->data + offset,
+   sizeof(uint64_t));

this offset magic will break badly as soon as someone adds something
to the end of the SSDT.



Yes, it is, so don't do that, :) and this is why we made the comment here:
 +/*
 + * leave it at the end of ssdt so that we can conveniently get the
 + * offset of int64 object returned by the function which will be
 + * patched with the real address of the dsm memory by BIOS.
 + */


--


Re: How to reserve guest physical region for ACPI

2016-01-06 Thread Laszlo Ersek
On 01/06/16 14:39, Igor Mammedov wrote:
> On Tue, 5 Jan 2016 18:22:33 +0100
> Laszlo Ersek  wrote:
> 
>> On 01/05/16 18:08, Igor Mammedov wrote:
>>> On Mon, 4 Jan 2016 21:17:31 +0100
>>> Laszlo Ersek  wrote:
>>>   
 Michael CC'd me on the grandparent of the email below. I'll try to add
 my thoughts in a single go, with regard to OVMF.

 On 12/30/15 20:52, Michael S. Tsirkin wrote:  
> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote:
>> On Mon, 28 Dec 2015 14:50:15 +0200
>> "Michael S. Tsirkin"  wrote:
>>
>>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote:

 Hi Michael, Paolo,

 Now it is the time to return to the challenge that how to reserve guest
 physical region internally used by ACPI.

 Igor suggested that:
 | An alternative place to allocate reserve from could be high memory.
 | For pc we have "reserved-memory-end" which currently makes sure
 | that hotpluggable memory range isn't used by firmware
 (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html)
 

 OVMF has no support for the "reserved-memory-end" fw_cfg file. The
 reason is that nobody wrote that patch, nor asked for the patch to be
 written. (Not implying that just requesting the patch would be
 sufficient for the patch to be written.)
  
>>> I don't want to tie things to reserved-memory-end because this
>>> does not scale: next time we need to reserve memory,
>>> we'll need to find yet another way to figure out what is where.
>> Could you elaborate a bit more on a problem you're seeing?
>>
>> To me it looks like it scales rather well.
>> For example lets imagine that we adding a device
>> that has some on device memory that should be mapped into GPA
>> code to do so would look like:
>>
>>   pc_machine_device_plug_cb(dev)
>>   {
>>...
>>if (dev == OUR_NEW_DEVICE_TYPE) {
>>memory_region_add_subregion(as, current_reserved_end, &dev->mr);
>>set_new_reserved_end(current_reserved_end + 
>> memory_region_size(&dev->mr));
>>}
>>   }
>>
>> we can practically add any number of new devices that way.
>
> Yes but we'll have to build a host side allocator for these, and that's
> nasty. We'll also have to maintain these addresses indefinitely (at
> least per machine version) as they are guest visible.
> Not only that, there's no way for guest to know if we move things
> around, so basically we'll never be able to change addresses.
>
> 
>>  
>>> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to
>>> support 64 bit RAM instead

 This looks quite doable in OVMF, as long as the blob to allocate from
 high memory contains *zero* ACPI tables.

 (
 Namely, each ACPI table is installed from the containing fw_cfg blob
 with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its
 own allocation policy for the *copies* of ACPI tables it installs.

 This allocation policy is left unspecified in the section of the UEFI
 spec that governs EFI_ACPI_TABLE_PROTOCOL.

 The current policy in edk2 (= the reference implementation) seems to be
 "allocate from under 4GB". It is currently being changed to "try to
 allocate from under 4GB, and if that fails, retry from high memory". (It
 is motivated by Aarch64 machines that may have no DRAM at all under 4GB.)
 )
  
>>> (and maybe a way to allocate and
>>> zero-initialize buffer without loading it through fwcfg),

 Sounds reasonable.
  
>>> this way bios
>>> does the allocation, and addresses can be patched into acpi.
>> and then guest side needs to parse/execute some AML that would
>> initialize QEMU side so it would know where to write data.
>
> Well not really - we can put it in a data table, by itself
> so it's easy to find.

 Do you mean acpi_tb_find_table(), acpi_get_table_by_index() /
 acpi_get_table_with_size()?
  
>
> AML is only needed if access from ACPI is desired.
>
> 
>> bios-linker-loader is a great interface for initializing some
>> guest owned data and linking it together but I think it adds
>> unnecessary complexity and is misused if it's used to handle
>> device owned data/on device memory in this and VMGID cases.
>
> I want a generic interface for guest to enumerate these things.  linker
> seems quite reasonable but if you see a reason why it won't do, or want
> to propose a better interface, fine.

 * The guest could do the following:
 - while processing the ALLOCATE commands, it would make a note where in
 GPA space 

Re: [PATCH 3/6] nvdimm acpi: introduce patched dsm memory

2016-01-06 Thread Igor Mammedov
On Tue,  5 Jan 2016 02:52:05 +0800
Xiao Guangrong  wrote:

> The dsm memory is used to save the input parameters and store
> the dsm result which is filled by QEMU.
> 
> The address of dsm memory is decided by bios and patched into
> int64 object returned by "MEMA" method
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  hw/acpi/aml-build.c | 12 
>  hw/acpi/nvdimm.c| 24 ++--
>  include/hw/acpi/aml-build.h |  1 +
>  3 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 78e1290..83eadb3 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -394,6 +394,18 @@ Aml *aml_int(const uint64_t val)
>  }
>  
>  /*
> + * ACPI 1.0b: 16.2.3 Data Objects Encoding:
> + * encode: QWordConst
> + */
> +Aml *aml_int64(const uint64_t val)
> +{
> +Aml *var = aml_alloc();
> +build_append_byte(var->buf, 0x0E); /* QWordPrefix */
> +build_append_int_noprefix(var->buf, val, 8);
> +return var;
> +}
> +
> +/*
>   * helper to construct NameString, which returns Aml object
>   * for using with aml_append or other aml_* terms
>   */
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index bc7cd8f..a72104c 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -28,6 +28,7 @@
>  
>  #include "hw/acpi/acpi.h"
>  #include "hw/acpi/aml-build.h"
> +#include "hw/acpi/bios-linker-loader.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/mem/nvdimm.h"
>  
> @@ -402,7 +403,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, 
> MemoryRegion *io,
>  state->dsm_mem->len);
>  }
>  
> -#define NVDIMM_COMMON_DSM  "NCAL"
> +#define NVDIMM_GET_DSM_MEM  "MEMA"
> +#define NVDIMM_COMMON_DSM   "NCAL"
>  
>  static void nvdimm_build_common_dsm(Aml *dev)
>  {
> @@ -468,7 +470,8 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray 
> *table_offsets,
>GArray *table_data, GArray *linker,
>uint8_t revision)
>  {
> -Aml *ssdt, *sb_scope, *dev;
> +Aml *ssdt, *sb_scope, *dev, *method;
> +int offset;
>  
>  acpi_add_table(table_offsets, table_data);
>  
> @@ -499,9 +502,26 @@ static void nvdimm_build_ssdt(GSList *device_list, 
> GArray *table_offsets,
>  
>  aml_append(sb_scope, dev);
>  
> +/*
> + * leave it at the end of ssdt so that we can conveniently get the
> + * offset of int64 object returned by the function which will be
> + * patched with the real address of the dsm memory by BIOS.
> + */
> +method = aml_method(NVDIMM_GET_DSM_MEM, 0, AML_NOTSERIALIZED);
> +aml_append(method, aml_return(aml_int64(0x0)));
there is no need in dedicated aml_int64(), you can use aml_int(0x64) 
trick

> +aml_append(sb_scope, method);
>  aml_append(ssdt, sb_scope);
>  /* copy AML table into ACPI tables blob and patch header there */
>  g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> +
> +offset = table_data->len - 8;
> +
> +bios_linker_loader_alloc(linker, NVDIMM_DSM_MEM_FILE, TARGET_PAGE_SIZE,
> + false /* high memory */);
> +bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +   NVDIMM_DSM_MEM_FILE, table_data,
> +   table_data->data + offset,
> +   sizeof(uint64_t));
this offset magic will break badly as soon as someone adds something
to the end of the SSDT.


>  build_header(linker, table_data,
>  (void *)(table_data->data + table_data->len - ssdt->buf->len),
>  "SSDT", ssdt->buf->len, revision, "NVDIMM");
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index ef44d02..b4726a4 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -246,6 +246,7 @@ Aml *aml_name(const char *name_format, ...) 
> GCC_FMT_ATTR(1, 2);
>  Aml *aml_name_decl(const char *name, Aml *val);
>  Aml *aml_return(Aml *val);
>  Aml *aml_int(const uint64_t val);
> +Aml *aml_int64(const uint64_t val);
>  Aml *aml_arg(int pos);
>  Aml *aml_to_integer(Aml *arg);
>  Aml *aml_to_hexstring(Aml *src, Aml *dst);

--


[Bug 110441] New: KVM guests randomly get I/O errors on VirtIO based devices

2016-01-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=110441

Bug ID: 110441
   Summary: KVM guests randomly get I/O errors on VirtIO based
devices
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.16.7-ckt11-1+deb8u5
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
  Assignee: virtualization_...@kernel-bugs.osdl.org
  Reporter: jordi.mall...@collabora.co.uk
Regression: No

We've been seeing a strange bug in KVM guests hosted by a Debian jessie box
(running 3.16.7-ckt11-1+deb8u5 on x86-64),

Basically, we are getting random VirtIO errors inside our guests, resulting in
stuff like this

[4735406.568235] blk_update_request: I/O error, dev vda, sector 142339584
[4735406.572008] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -5
writing to inode 1184437 (offset 0 size 208896 starting block 17729472)
[4735406.572008] Buffer I/O error on device dm-0, logical block 17729472
[ ... ]
[4735406.572008] Buffer I/O error on device dm-0, logical block 17729481
[4735406.643486] blk_update_request: I/O error, dev vda, sector 142356480
[ ... ]
[4735406.748456] blk_update_request: I/O error, dev vda, sector 38587480
[4735411.020309] Buffer I/O error on dev dm-0, logical block 12640808, lost
sync page write
[4735411.055184] Aborting journal on device dm-0-8.
[4735411.056148] Buffer I/O error on dev dm-0, logical block 12615680, lost
sync page write
[4735411.057626] JBD2: Error -5 detected when updating journal superblock for
dm-0-8.
[4735411.057936] Buffer I/O error on dev dm-0, logical block 0, lost sync page
write
[4735411.057946] EXT4-fs error (device dm-0): ext4_journal_check_start:56:
Detected aborted journal
[4735411.057948] EXT4-fs (dm-0): Remounting filesystem read-only
[4735411.057949] EXT4-fs (dm-0): previous I/O error to superblock detected

(From an Ubuntu 15.04 guest, EXT4 on LVM2)

Or,

Jan 06 03:39:11 titanium kernel: end_request: I/O error, dev vda, sector
1592467904
Jan 06 03:39:11 titanium kernel: EXT4-fs warning (device vda3):
ext4_end_bio:317: I/O error -5 writing to inode 31169653 (offset 0 size 0
starting block 199058492)
Jan 06 03:39:11 titanium kernel: Buffer I/O error on device vda3, logical block
198899256
[...]
Jan 06 03:39:12 titanium kernel: Aborting journal on device vda3-8.
Jan 06 03:39:12 titanium kernel: Buffer I/O error on device vda3, logical block
99647488

(From a Debian jessie guest, EXT4 directly on a VirtIO-based block device)

When this happens, it affects multiple guests on the host at the same time.
Normally the errors are severe enough that the guests end up with a read-only
file system, but we've seen a few guests survive with a non-fatal I/O error.
The host's dmesg has nothing interesting to see.

We've seen this happen with quite heterogeneous guests:

Debian 6, 7 and 8 (Debian kernels 2.6.32, 3.2 and 3.16)
Ubuntu 14.09 and 15.04 (Ubuntu kernels)
32 bit and 64 bit installs.

In short, we haven't seen a clear characteristic in any guest, other than the
affected guests being the ones with some sustained I/O load (build machines,
cgit servers, PostgreSQL RDBMSs...). Most of the time, guests that just sit
there doing nothing with their disks are not affected.

The host is a stock Debian jessie install that manages libvirt-based QEMU
guests. All the guests have their block devices using virtio drivers, some of
them on spinning media behind LSI RAID (it was a 3ware card before, which got
replaced as we were very suspicious about it, but we are getting the same
results), and some of them on PCIe SSD storage. We have 3 other hosts with a
similar setup, except that they run Debian wheezy (and honestly we're not too
keen on upgrading them yet, just in case); none of them has ever shown this
kind of problem.

We've been seeing this since last summer, and haven't found a pattern that
tells us where these I/O errors are coming from. Google isn't revealing other
people with a similar problem, which we find quite surprising, as our setup is
quite basic.

This has also been reported downstream at the Debian BTS as Bug#810121
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810121).



RE: HAPPY NEW YEAR

2016-01-06 Thread Carl Leinbach


From: Carl Leinbach
Sent: Wednesday, January 06, 2016 4:21 PM
To: Carl Leinbach
Subject: HAPPY NEW YEAR


Donation has been made to you by Mrs. Liliane Bettencourt. Contact email: 
mrslilianebettencou...@gmail.com  for more details





What's kvmclock's custom sched_clock for?

2016-01-06 Thread Andy Lutomirski
AFAICT KVM reliably passes a monotonic TSC through to guests, even if
the host suspends.  That's all that sched_clock needs, I think.

So why does kvmclock have a custom sched_clock?

On a related note, KVM doesn't pass the "invariant TSC" feature
through to guests on my machine even though "invtsc" is set in QEMU
and the kernel host code appears to support it.  What gives?

--Andy


Re: [PATCH v3 0/4] scsi: cleanup ioctl headers and provide UAPI versions

2016-01-06 Thread Martin K. Petersen
> "Paolo" == Paolo Bonzini  writes:

>> This is v3 of the series to provide an "official" sg.h header (and
>> scsi_ioctl.h too, though it's basically obsolete) together with the
>> other userspace API definitions.  The change from v2 to v3 is that
>> defaults for sg.c are not exported in include/uapi/linux/sg.c.

Paolo> What happened to these patches?...

They predate me being patch monkey. Please repost with any review tags
or acks you may have received.

-- 
Martin K. Petersen  Oracle Linux Engineering