Re: [Xen-devel] Xen 4.6 released

2015-10-14 Thread Chen, Tiejun

On 10/7/2015 7:16 PM, Wei Liu wrote:

Hi all

I'm pleased to announce that Xen 4.6 is released. As release manager I
would like to thank everyone who involved in the making of 4.6 release
(either in the form of patch, bug report or packaging effort). This
release wouldn't have happened without all your contributions.

You can check out 4.6 release from xen.git with the tag "RELEASE-4.6.0".

Tarball can be obtained from:

   http://bits.xensource.com/oss-xen/release/4.6.0/xen-4.6.0.tar.gz


Wei and Ian,

Could we merge these following commits into xen-4.6.1?

commit 83215fba3a80c30b5191a4a1086dc510cca43069

libxl: introduce libxl__is_igd_vga_passthru

commit 488508b49a65dda20cf6eb6d8f549e8d758e610f

libxl: introduce gfx_passthru_kind

Currently they're already in staging branch.

I know we're going to sync qemu-upstream into qemu-xen in xen 4.7 to 
support IGD passthrough with qemu-xen. But actually only these two 
commits can make sure xen-4.6.x work out IGD passthrough with 
qemu-upstream, instead of qemu-xen. I mean we can set something like this,


device_model_version="qemu-xen"
device_model_override="//qemu-system-i386"

Thanks
Tiejun



Signature for tarball:

   http://bits.xensource.com/oss-xen/release/4.6.0/xen-4.6.0.tar.gz.sig

Note that this is xen-devel@ only announcement. Official release
announcement on xen-announce@, press release, blog post and various
other docs / web pages will be available next week on October 13. The
official download link may also be different from the one provided
above.


Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-27 Thread Chen, Tiejun

On 9/22/2015 12:03 AM, Stefano Stabellini wrote:

It is going to be in QEMU 2.5 and qemu-xen 4.7.


Thanks for your reply.

Do we have any possibility of just merging this series into qemu-xen 
4.6? We really want to support IGD passthrough on xen 4.6 if possible :)


Thanks
Tiejun



On Mon, 21 Sep 2015, Chen, Tiejun wrote:

Stefano,

I have two questions,

#1. Which qemu version is this igd stuff going into? 2.6?
#2. Is this igd stuff going into qemu-xen inside xen? Any plan to go into xen
4.6?

Thanks
Tiejun

On 9/9/2015 1:21 AM, Stefano Stabellini wrote:
> The following changes since commit 8611280505119e296757a60711a881341603fa5a:
>
>target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200)
>
> are available in the git repository at:
>
>git://xenbits.xen.org/people/sstabellini/qemu-dm.git
> tags/xen-2015-09-08-tag
>
> for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4:
>
>xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings.
> (2015-09-08 15:21:56 +)
>
> 
> Xen branch xen-2015-09-08
>
> 
> Don Slutz (1):
>xen-hvm: Add trace to ioreq
>
> Jan Beulich (1):
>xen/HVM: atomically access pointers in bufioreq handling
>
> Konrad Rzeszutek Wilk (7):
>xen-hvm: When using xc_domain_add_to_physmap also include errno when
> reporting
>xen/pt: Update comments with proper function name.
>xen/pt: Make xen_pt_msi_set_enable static
>xen/pt: xen_host_pci_config_read returns -errno, not -1 on failure
>xen: use errno instead of rc for xc_domain_add_to_physmap
>xen/pt/msi: Add the register value when printing logging and error
> messages
>xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings.
>
> Michael S. Tsirkin (1):
>i440fx: make types configurable at run-time
>
> Tiejun Chen (9):
>pc_init1: pass parameters just with types
>piix: create host bridge to passthrough
>hw/pci-assign: split pci-assign.c
>xen, gfx passthrough: basic graphics passthrough support
>xen, gfx passthrough: retrieve VGA BIOS to work
>igd gfx passthrough: create a isa bridge
>xen, gfx passthrough: register a isa bridge
>xen, gfx passthrough: register host bridge specific to passthrough
>xen, gfx passthrough: add opregion mapping
>
>   configure |   28 +
>   hw/core/machine.c |   20 +++
>   hw/i386/Makefile.objs |1 +
>   hw/i386/kvm/pci-assign.c  |   82 ++---
>   hw/i386/pc_piix.c |  139 -
>   hw/i386/pci-assign-load-rom.c |   93 ++
>   hw/pci-host/piix.c|   91 +-
>   hw/xen/Makefile.objs  |1 +
>   hw/xen/xen-host-pci-device.c  |5 +
>   hw/xen/xen-host-pci-device.h  |1 +
>   hw/xen/xen_pt.c   |   42 ++-
>   hw/xen/xen_pt.h   |   22 +++-
>   hw/xen/xen_pt_config_init.c   |   59 -
>   hw/xen/xen_pt_graphics.c  |  272
> +
>   hw/xen/xen_pt_msi.c   |2 +-
>   include/hw/boards.h   |1 +
>   include/hw/i386/pc.h  |9 +-
>   include/hw/pci/pci-assign.h   |   27 
>   include/hw/xen/xen_common.h   |   34 +-
>   qemu-options.hx   |3 +
>   trace-events  |7 ++
>   vl.c  |   10 ++
>   xen-hvm.c |   55 +++--
>   23 files changed, 891 insertions(+), 113 deletions(-)
>   create mode 100644 hw/i386/pci-assign-load-rom.c
>   create mode 100644 hw/xen/xen_pt_graphics.c
>   create mode 100644 include/hw/pci/pci-assign.h
>
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0

2015-09-24 Thread Chen, Tiejun

On 9/23/2015 11:56 PM, Elena Ufimtseva wrote:

Hi

There is a regression in RMRR patch 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in
new set_identity_p2m_entry. RMRRs are not being mapped in IOMMU for PVH Dom0.
This causes pages faults and some long 'hang-like' delays during boot and
device assignments.

During construct_dom0, in PVH path  p2m is being constructed and identity mapped
in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
New code used to map RMRRs invoked from rmrr_identity_mapping
checks if p2m entry exists with same type and access and if yes, skips iommu
mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
mapped in IOMMU.

This debug patch attached fixes this and Ill be glad to see if there is a more 
elegant fix.


Based on your explanation, sounds pvh always creates this mapping 
beforehand, so what about this?


diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index cf8485e..d026845 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -964,7 +964,7 @@ int set_identity_p2m_entry(struct domain *d, 
unsigned long gfn,

 struct p2m_domain *p2m = p2m_get_hostp2m(d);
 int ret;

-if ( !paging_mode_translate(p2m->domain) )
+if ( !paging_mode_translate(p2m->domain) || is_pvh_domain(d) )
 {
 if ( !need_iommu(d) )
 return 0;

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [v4][PATCH 0/2] libxl: try to support IGD passthrough for qemu upstream

2015-09-24 Thread Chen, Tiejun

Ping...

Thanks
Tiejun

On 9/18/2015 4:30 PM, Tiejun Chen wrote:

Ian,

As we discussed previously,

http://patchwork.ozlabs.org/patch/457055/

now it's time to push this into on xen/tools side since all qemu stuffs
have been merged.

https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg02094.html

v4:

Ian,

Actually we had v3.5 online previously, which was reviewed by you.

http://permalink.gmane.org/gmane.comp.emulators.qemu/329100

So here I just bring a little bit to refine code just for patch #2
according to out last conversation.

v3:

* Refine some codes based on Campbell's feedback so thanks for Campbell's
   kind guideline to patch #2
* Update the manpages in patch #2

v2:

* Refine patch #2's head description
* Improve codes quality inside patch #1 based on Wei's comments
* Refill the summary inside patch #0 based on Konrad and Wei's suggestion

When we're working to support IGD GFX passthrough with qemu
upstream, instead of "-gfx_passthru" we'd like to make that
a machine option, "-machine xxx,igd-passthru=on".

https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02050.html

This need to bring a change on tool side.

After a discussion with Campbell, we'd like to construct a table to record
all IGD devices we can support. If we hit that table, we should pass that
option. And so we also introduce a new field of type, 'gfx_passthru_kind',
to cooperate with 'gfx_passthru' to cover all scenarios like this,

 gfx_passthru = 0=> sets build_info.u.gfx_passthru to false
 gfx_passthru = 1=> sets build_info.u.gfx_passthru to true and
build_info.u.gfx_passthru_kind to DEFAULT
 gfx_passthru = "igd"=> sets build_info.u.gfx_passthru to false
and build_info.u.gfx_passthru_kind to IGD

And note actually that option "-gfx_passthru" is just introduced to
work for qemu-xen-traditional so we should get this away from
libxl__build_device_model_args_new() in the case of qemu upstream.


Tiejun Chen (2):
   libxl: introduce libxl__is_igd_vga_passthru
   libxl: introduce gfx_passthru_kind

  docs/man/xl.cfg.pod.5|  35 --
  tools/libxl/libxl.h  |   6 ++
  tools/libxl/libxl_dm.c   |  46 +++--
  tools/libxl/libxl_internal.h |   2 +
  tools/libxl/libxl_pci.c  | 124 +++
  tools/libxl/libxl_types.idl  |   6 ++
  tools/libxl/xl_cmdimpl.c |  14 +++-
  7 files changed, 223 insertions(+), 10 deletions(-)

Thanks
Tiejun




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-20 Thread Chen, Tiejun

Stefano,

I have two questions,

#1. Which qemu version is this igd stuff going into? 2.6?
#2. Is this igd stuff going into qemu-xen inside xen? Any plan to go 
into xen 4.6?


Thanks
Tiejun

On 9/9/2015 1:21 AM, Stefano Stabellini wrote:

The following changes since commit 8611280505119e296757a60711a881341603fa5a:

   target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200)

are available in the git repository at:

   git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-2015-09-08-tag

for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4:

   xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. 
(2015-09-08 15:21:56 +)


Xen branch xen-2015-09-08


Don Slutz (1):
   xen-hvm: Add trace to ioreq

Jan Beulich (1):
   xen/HVM: atomically access pointers in bufioreq handling

Konrad Rzeszutek Wilk (7):
   xen-hvm: When using xc_domain_add_to_physmap also include errno when 
reporting
   xen/pt: Update comments with proper function name.
   xen/pt: Make xen_pt_msi_set_enable static
   xen/pt: xen_host_pci_config_read returns -errno, not -1 on failure
   xen: use errno instead of rc for xc_domain_add_to_physmap
   xen/pt/msi: Add the register value when printing logging and error 
messages
   xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings.

Michael S. Tsirkin (1):
   i440fx: make types configurable at run-time

Tiejun Chen (9):
   pc_init1: pass parameters just with types
   piix: create host bridge to passthrough
   hw/pci-assign: split pci-assign.c
   xen, gfx passthrough: basic graphics passthrough support
   xen, gfx passthrough: retrieve VGA BIOS to work
   igd gfx passthrough: create a isa bridge
   xen, gfx passthrough: register a isa bridge
   xen, gfx passthrough: register host bridge specific to passthrough
   xen, gfx passthrough: add opregion mapping

  configure |   28 +
  hw/core/machine.c |   20 +++
  hw/i386/Makefile.objs |1 +
  hw/i386/kvm/pci-assign.c  |   82 ++---
  hw/i386/pc_piix.c |  139 -
  hw/i386/pci-assign-load-rom.c |   93 ++
  hw/pci-host/piix.c|   91 +-
  hw/xen/Makefile.objs  |1 +
  hw/xen/xen-host-pci-device.c  |5 +
  hw/xen/xen-host-pci-device.h  |1 +
  hw/xen/xen_pt.c   |   42 ++-
  hw/xen/xen_pt.h   |   22 +++-
  hw/xen/xen_pt_config_init.c   |   59 -
  hw/xen/xen_pt_graphics.c  |  272 +
  hw/xen/xen_pt_msi.c   |2 +-
  include/hw/boards.h   |1 +
  include/hw/i386/pc.h  |9 +-
  include/hw/pci/pci-assign.h   |   27 
  include/hw/xen/xen_common.h   |   34 +-
  qemu-options.hx   |3 +
  trace-events  |7 ++
  vl.c  |   10 ++
  xen-hvm.c |   55 +++--
  23 files changed, 891 insertions(+), 113 deletions(-)
  create mode 100644 hw/i386/pci-assign-load-rom.c
  create mode 100644 hw/xen/xen_pt_graphics.c
  create mode 100644 include/hw/pci/pci-assign.h

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-15 Thread Chen, Tiejun

On 9/15/2015 7:00 PM, Paolo Bonzini wrote:



On 15/09/2015 11:55, Stefano Stabellini wrote:

On Mon, 14 Sep 2015, Paolo Bonzini wrote:

> On 10/09/2015 12:29, Stefano Stabellini wrote:

> > +if (lseek(config_fd, pos, SEEK_SET) != pos) {
> > +return -errno;
> > +}
> >  do {
> > -rc = pread(config_fd, (uint8_t *), len, pos);
> > +rc = read(config_fd, (uint8_t *), len);
> >  } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

>
> This leaks config_fd.

I don't follow, it leaks config_fd where?


Where lseek returns -errno (and IIRC in other places in the same function).


Do you mean we need this change?

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 1fb71c8..7d44228 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -775,15 +775,18 @@ static int host_pci_config_read(int pos, int len, 
uint32_t val)

 }

 if (lseek(config_fd, pos, SEEK_SET) != pos) {
+close(config_fd);
 return -errno;
 }
 do {
 rc = read(config_fd, (uint8_t *), len);
 } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 if (rc != len) {
+close(config_fd);
 return -errno;
 }

+close(config_fd);
 return 0;
 }


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-14 Thread Chen, Tiejun

But looks its not better, so any idea?


Did you at least make an attempt to find other examples of where
we dynamically determine the log level to be used for a message?
I would assume that if you did, you'd have come to

 printk(XENLOG_GUEST "%s" VTDPREFIX


I didn't know this tip on Xen side and its really good.


" It's %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
relaxed ? XENLOG_WARNING : XENLOG_ERROR,
relaxed ? "risky" : "disallowed",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);

pretty naturally.



But I noticed my original patch is already merged into staging. So

Wei,

Do you think if we need a small patch to improved this? Maybe you can 
squash that if necessary.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-14 Thread Chen, Tiejun

OK, that explanation is fine to me as long as it's made clear no
security guarantee once admin uses 'relax' for any domain. Tiejun
could you resend patch with right warning/error type?



Sure, but a little bit makes me confused when I'm trying to address 
this. Actually most messages are same, except for logevel, so I did this 
like,


printk(XENLOG_G_INFO VTDPREFIX " Assign %04x:%02x:%02x.%u"
   " with shared RMRR at %"PRIx64" for Dom%d.",
   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
   rmrr->base_address, d->domain_id);
if ( relaxed )
printk(XENLOG_G_WARNING VTDPREFIX " It's really risky.");
else
printk(XENLOG_G_ERR VTDPREFIX " So it's disallowed!");
printk(XENLOG_G_INFO VTDPREFIX "\n");

But looks its not better, so any idea?

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping

2015-09-10 Thread Chen, Tiejun

Sort of (the patch has the intended effect, but for its size very
many rough edges).



I guess we need to amend the original parameter, once_mapping_mfns, like 
this,


/* xen_once_mapping_mfns: memory mapping mfn bumbers once. */
unsigned int xen_once_mapping_mfns;
size_param("once_mapping_mfns", xen_once_mapping_mfns);

static void xen_once_mapping_mfns_setup(void)
{
if ( once_mapping_mfns < 64 )
xen_once_mapping_mfns = 64;
else if ( once_mapping_mfns > 1024 )
xen_once_mapping_mfns = 1024;
else
xen_once_mapping_mfns = once_mapping_mfns;
}

Or what is your expected rule?

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-10 Thread Chen, Tiejun

As you see this short log, "hw/pci-assign: split pci-assign.c", so this
means I just extract something from the original hw/i386/kvm/pci-assign.c,
and here so I just keep those original head files residing
hw/i386/kvm/pci-assign.c, and I didn't introduce anything new.


hw/i386/kvm/pci-assign.c is only built if configure set CONFIG_KVM,
which it won't do on Windows or OSX builds.

It sounds like your patch has incorrectly moved code out of files
which are compiled only if KVM is present, or only if we're doing
Xen PCI passthrough, and into compiled-for-everything files.



Yes, we want to share this chunk of codes between Xen and Kvm. Just to 
this error, could we remove #include ? As I mentioned I still 
can compile this file without this head file.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-10 Thread Chen, Tiejun

xen-host-pci-device.c is only compiled if CONFIG_XEN_PCI_PASSTHROUGH
was set by configure. That won't be the case on OSX or Windows, where
the Xen headers don't exist.



Okay. This actually shouldn't be enabled on Windows so what about this?

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 58a33fb..9a1fcb9 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -739,6 +739,7 @@ static const TypeInfo i440fx_info = {
 .class_init= i440fx_class_init,
 };

+#ifndef _WIN32
 /* IGD Passthrough Host Bridge. */
 typedef struct {
 uint8_t offset;
@@ -819,6 +820,7 @@ static const TypeInfo igd_passthrough_i440fx_info = {
 .instance_size = sizeof(PCII440FXState),
 .class_init= igd_passthrough_i440fx_class_init,
 };
+#endif

 static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge,
 PCIBus *rootbus)
@@ -861,7 +863,9 @@ static const TypeInfo i440fx_pcihost_info = {
 static void i440fx_register_types(void)
 {
 type_register_static(_info);
+#ifndef _WIN32
 type_register_static(_passthrough_i440fx_info);
+#endif
 type_register_static(_pci_type_info);
 type_register_static(_info);
 type_register_static(_xen_info);

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping

2015-09-10 Thread Chen, Tiejun

Right, that's one of the things that would need taking care of.
(Whether enforcing an upper limit is actually needed I'm not
sure - we generally allow the admin to shoot himself in the foot
if he wants to. And whether the lower limit should be 64 instead
of just ensuring the limit is not zero is another question.)


64 was semi-arbitrary  - it ended up giving good latency on
highly scalar machines (8 socket). Higher numbers ended up
affecting the latency.

But higher numbers on small socket machines were OK.
(As they do not have 8 IOMMU VT-d chipsets all potentially
flodding the QPI with serialized cache flushes).




So we should make this range [8, ] here, but 64 by 
default. Right?


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-10 Thread Chen, Tiejun

> > Need to have separate warning/error level for relax/strict.
> >
> > However I don't think this patch is a right fix. So far relax/strict policy
> > is per-domain. what about one VM specifies relax while another VM
> > specifies strict when each is assigned with a device sharing rmrr
> > with the other? In that case it becomes a system-wide security hole.
>
> The one specifying "strict" won't gets its device assigned (due to
> the code above, taking the path that was there already without
> the patch), so I don't see the security issue.
>

Agreed. A VM can't get such device assigned in the first place, so the
hypothetical scenario doesn't exist.



Sorry it's a bad example. My actual concern is that we can't count
on this per-VM relax/strict policy to prevent group devices assigned
to different VM. In that case it's definitely a security hole since
one VM may clobber shared RMRR to impact another VM. So right
example for that scenario is both VMs specified with 'relax'.


What if one of group devices is still owned by Dom0?

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-10 Thread Chen, Tiejun

Thanks! I'll fold it the offending patch
(http://marc.info/?l=qemu-devel=144174596628052=2) and resend.



Reviewed-by: Michael S. Tsirkin 



Michale and Stefano,

Thanks a lot :)

Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-10 Thread Chen, Tiejun

>> > > However I don't think this patch is a right fix. So far relax/strict 
policy
>> > > is per-domain. what about one VM specifies relax while another VM
>> > > specifies strict when each is assigned with a device sharing rmrr
>> > > with the other? In that case it becomes a system-wide security hole.
>> >
>> > The one specifying "strict" won't gets its device assigned (due to
>> > the code above, taking the path that was there already without
>> > the patch), so I don't see the security issue.
>> >
>>
>> Agreed. A VM can't get such device assigned in the first place, so the
>> hypothetical scenario doesn't exist.
>>
>
> Sorry it's a bad example. My actual concern is that we can't count
> on this per-VM relax/strict policy to prevent group devices assigned
> to different VM. In that case it's definitely a security hole since
> one VM may clobber shared RMRR to impact another VM. So right
> example for that scenario is both VMs specified with 'relax'.

What if one of group devices is still owned by Dom0?



It's also risky since other VM may attack Dom0 in such scenario.



In my opinion, Dom0 should have a big impact...

Anyway, this always means we have to start refactoring some codes. For 
example, we are probably going to introduce some new fields in struct 
acpi_rmrr_unit, just like,


int domain_id -> Distinguish which domain owns this unit
unsigned int flag -> Record state: XEN_DOMCTL_DEV_RDM_RELAXE or 
!XEN_DOMCTL_DEV_RDM_RELAXE


This should involve several sections, such as parsing rmrr, setup 
hwdomain and assign/remove device.


But I'm not sure if this is good to handle current problem. Actually I 
prefer to work on current patch just now, and then we can start 
discussing our final solution :)


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Question to Xen log level in the case of PT

2015-09-09 Thread Chen, Tiejun

So can Xen change log level dynamically like Linux? If yes, we might
change this level temporarily while passing through IGD. If not, any
suggestion?


First of all you could boot without lowering the log level (non-debug
builds) or raising the log level ("loglvl=warning"; debug builds). But


Sorry I don't know how to build "non-debug" here. Could you give me this 
detail? Or where I can get this info?



that would change the log level for the entire session, which may
not be what you're after. I too realized that having a way to
dynamically adjust the log level would be useful occasionally. For
post-4.6 I have a patch (attached) ready allowing to do so in a
limited way from the serial console (and hence also via "xl debug-key").
As you'll see in there I also took note of it probably being desirable
to have a sysctl (and then a wrapping xl command) to full control the
log level. I didn't get around to implement that yet.


Good to know this.



Otoh the specific messages you cite are of quite questionable use
in the first place. I certainly would welcome a patch lowering their
priority to XENLOG_G_DEBUG (which however would still not


Done.

Thanks
Tiejun



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-09 Thread Chen, Tiejun

On 9/9/2015 2:54 PM, Jan Beulich wrote:

On 09.09.15 at 03:59,  wrote:

@@ -2310,12 +2312,16 @@ static int intel_iommu_assign_device(
  PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
-printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+bool_t relaxed = !!(flag & XEN_DOMCTL_DEV_RDM_RELAXED);
+
+printk(XENLOG_G_WARNING VTDPREFIX


Well, I can live with this always being a warning, but it's not what I
had asked for. The VT-d maintainers will have to judge.



Kevin,

Any comments?

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-09 Thread Chen, Tiejun

On 9/10/2015 12:10 AM, Stefano Stabellini wrote:

On Wed, 9 Sep 2015, Stefano Stabellini wrote:

On Tue, 8 Sep 2015, Peter Maydell wrote:
> On 8 September 2015 at 18:21, Stefano Stabellini
>  wrote:
> > The following changes since commit 8611280505119e296757a60711a881341603fa5a:
> >
> >   target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200)
> >
> > are available in the git repository at:
> >
> >   git://xenbits.xen.org/people/sstabellini/qemu-dm.git 
tags/xen-2015-09-08-tag
> >
> > for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4:
> >
> >   xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. 
(2015-09-08 15:21:56 +)
> >
> > 
> > Xen branch xen-2015-09-08
> >
> > 
>
> Hi. I'm afraid this fails to build on OSX (and probably Windows too,
> though that build hasn't run yet):
>
>   CCi386-softmmu/hw/i386/pci-assign-load-rom.o
> /Users/pm215/src/qemu/hw/i386/pci-assign-load-rom.c:6:10: fatal error:
>   'sys/io.h' file not found
> #include 
>  ^
>   CCalpha-softmmu/hw/alpha/pci.o
> 1 error generated.

Tiejun,

this is caused by 33d33242b7d802e6c994f3d56ecba96a66465dc3,
"hw/pci-assign: split pci-assign.c". Could you please double-check
non-Linux builds?


I found another issue introduced by the gfx passthrough series on
Windows:

../hw/pci-host/piix.o: In function `host_pci_config_read':
/root/qemu/hw/pci-host/piix.c:778: undefined reference to `_pread'

It is introduced by:

commit fdb70721ba0496a767137e5505dd27627d19c4a8
Author: Tiejun Chen 
Date:   Wed Jul 15 13:37:43 2015 +0800

 piix: create host bridge to passthrough


You might have to replace the pread call with lseek and read.



This is also surprising to me. Just see xen_host_pci_config_() 
inside /hw/xen/xen-host-pci-device.c, there are so many this pread 
usage(). So I really don't understand what's difference between these 
two files.	


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-09 Thread Chen, Tiejun

On 9/9/2015 9:06 PM, Stefano Stabellini wrote:

On Tue, 8 Sep 2015, Peter Maydell wrote:

On 8 September 2015 at 18:21, Stefano Stabellini
 wrote:
> The following changes since commit 8611280505119e296757a60711a881341603fa5a:
>
>   target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200)
>
> are available in the git repository at:
>
>   git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-2015-09-08-tag
>
> for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4:
>
>   xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. 
(2015-09-08 15:21:56 +)
>
> 
> Xen branch xen-2015-09-08
>
> 

Hi. I'm afraid this fails to build on OSX (and probably Windows too,
though that build hasn't run yet):

  CCi386-softmmu/hw/i386/pci-assign-load-rom.o
/Users/pm215/src/qemu/hw/i386/pci-assign-load-rom.c:6:10: fatal error:
  'sys/io.h' file not found
#include 
 ^
  CCalpha-softmmu/hw/alpha/pci.o
1 error generated.


Tiejun,

this is caused by 33d33242b7d802e6c994f3d56ecba96a66465dc3,
"hw/pci-assign: split pci-assign.c". Could you please double-check
non-Linux builds?


Its interesting.

As you see this short log, "hw/pci-assign: split pci-assign.c", so this 
means I just extract something from the original 
hw/i386/kvm/pci-assign.c, and here so I just keep those original head 
files residing hw/i386/kvm/pci-assign.c, and I didn't introduce anything 
new.


So its very probably that you still can't compile successfully even 
without my commit on OSX/Windows, right? I think Peter may be right,


"Will passthrough even work on Windows and OSX hosts?
Consider whether we should be building this code on those
hosts at all..."

I prefer this isn't what we did previously.




I suspect that the fix would be quite small, but I don't have an OSX or
a Windows build environment to try it.


I haven't a this build environment as well. But I think right now you 
can remove "#include " to fix this simply since looks this is 
redundant actually.


hw/i386/pci-assign: remove one head file

This is redundant actually but really break OS/Windows build.

Signed-off-by: Tiejun Chen 

diff --git a/hw/i386/pci-assign-load-rom.c b/hw/i386/pci-assign-load-rom.c
index bad53b7..1f0d4ef 100644
--- a/hw/i386/pci-assign-load-rom.c
+++ b/hw/i386/pci-assign-load-rom.c
@@ -3,7 +3,6 @@
  */
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 


At least I can build this under Linux,

./configure --target-list=x86_64-softmmu && make

Thanks
Tiejun



Speak about build environments, Peter, would you care to share your
scripts and setup so that I can run similar tests in the future on my
own?  I have no OSX machines so I tried to do a Windows
cross-compile, following http://wiki.qemu.org/Hosts/W32 on Debian 7, but
I failed very early with an "ERROR: zlib check failed".



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping

2015-09-09 Thread Chen, Tiejun

If the 64 limit was arbitrary then I would suggest increasing it to at least
1024 so that
at least 4M of BAR can be mapped in one go and it reduces the overhead by a
factor of 16.


1024 may be a little much, but 256 is certainly a possibility, plus
Konrad's suggestion to allow this limit to be controlled via command
line option.


Are you guys talking this way?

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3946e4c..a9671bb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -88,6 +88,10 @@ boolean_param("noapic", skip_ioapic_setup);
 s8 __read_mostly xen_cpuidle = -1;
 boolean_param("cpuidle", xen_cpuidle);

+/* once_mapping_mfns: memory mapping mfn bumbers once. */
+unsigned int xen_once_mapping_mfns;
+integer_param("once_mapping_mfns", xen_once_mapping_mfns);
+
 #ifndef NDEBUG
 unsigned long __initdata highmem_start;
 size_param("highmem-start", highmem_start);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 3bf39f1..82c85e3 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -33,6 +33,8 @@
 #include 
 #include 

+extern unsigned int xen_once_mapping_mfns;
+
 static DEFINE_SPINLOCK(domctl_lock);
 DEFINE_SPINLOCK(vcpu_alloc_lock);

@@ -1035,7 +1037,7 @@ long 
do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)


 ret = -E2BIG;
 /* Must break hypercall up as this could take a while. */
-if ( nr_mfns > 64 )
+if ( nr_mfns > xen_once_mapping_mfns )
 break;

 ret = -EPERM;

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-09 Thread Chen, Tiejun

Need to have separate warning/error level for relax/strict.

However I don't think this patch is a right fix. So far relax/strict policy
is per-domain. what about one VM specifies relax while another VM
specifies strict when each is assigned with a device sharing rmrr
with the other? In that case it becomes a system-wide security hole.

Once we add code to track group relationship cross domains, it'd be
close to the final fix to support group assignment which originally target
4.7. It might be risky to add that in 4.6.


Yes.



So my suggestion is to live with current limitation.



But recently someone was encountering this problem.

http://www.gossamer-threads.com/lists/xen/devel/391684?page=last

We'd better figure out a simple way to this regression.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Question to Xen log level in the case of PT

2015-09-08 Thread Chen, Tiejun

All guys,

Sorry to raise a question to you since I'm not very sure how to deal 
with this.


When I passthrough a device like IGD, I can see so many messages:

"memory_map:add:" and "memory_map:remove:"

since we have to add/remove all pages map residing PCI bar. Especially 
as a graphic device, oftentimes this range would occupy dozens of MB, 
even hundreds of MB. These print messages consume a lot of time to boot 
a VM. For instance, it takes about 5 minutes to boot a Windows guest on 
my BDW. But if I remove these output simply like this,


diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 7f959f3..82da9d1 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1049,10 +1049,6 @@ long 
do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)


 if ( add )
 {
-printk(XENLOG_G_INFO
-   "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-   d->domain_id, gfn, mfn, nr_mfns);
-
 ret = map_mmio_regions(d, gfn, nr_mfns, mfn);
 if ( ret )
 printk(XENLOG_G_WARNING
@@ -1061,10 +1057,6 @@ long 
do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)

 }
 else
 {
-printk(XENLOG_G_INFO
-   "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-   d->domain_id, gfn, mfn, nr_mfns);
-
 ret = unmap_mmio_regions(d, gfn, nr_mfns, mfn);
 if ( ret && is_hardware_domain(current->domain) )
 printk(XENLOG_ERR

its down to a half, about 2.5 minutes.

I know I can't delete this directly. But currently there are four log 
level on Xen side,


 *   XENLOG_ERR: Fatal errors, either Xen, Guest or Dom0
 *   is about to crash.
 *
 *   XENLOG_WARNING: Something bad happened, but we can recover.
 *
 *   XENLOG_INFO: Interesting stuff, but not too noisy.
 *
 *   XENLOG_DEBUG: Use where ever you like. Lots of noise.

looks I have to change XENLOG_G_INFO to XENLOG_G_WARNING but its not 
appropriate here.


So can Xen change log level dynamically like Linux? If yes, we might 
change this level temporarily while passing through IGD. If not, any 
suggestion?


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-05 Thread Chen, Tiejun

Sorry for this delay response because I was on vacation.

On 9/5/2015 5:52 AM, Tamas K Lengyel wrote:

On Fri, Sep 4, 2015 at 2:17 AM, Jan Beulich  wrote:


>>> On 03.09.15 at 21:39,  wrote:
> So this change in 4.6 prevents me from passing through devices that have
> worked previously with VT-d:
>
> (XEN) [VT-D] cannot assign :00:1a.0 with shared RMRR at ae8a9000 for
> Dom30.
> (XEN) [VT-D] cannot assign :00:1d.0 with shared RMRR at ae8a9000 for
> Dom31.
>
> The devices are the USB 2.0 devices on a DQ67SW motherboard:
>
> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset
> Family USB Enhanced Host Controller #2 (rev 04)
> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset
> Family USB Enhanced Host Controller #1 (rev 04)

Please don't top post. And I'm also puzzled by you sending this to
Ian rather than the author.



Hm, I've just hit reply-all to the latest message I've found in the thread.




Technically - Tiejun, should this perhaps be permitted in relaxed
mode, at least until group assignment gets implemented? (Or


I agree. What about this?

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c

index 836aed5..038776a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
  PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
+u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
+
 printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+   " Currently its %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "disallowed" : "risky",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }



Tamas, do you actually mean to assign these to _different_
guests, considering the log fragment above?)



No, I actually want to assign them to the same domain. The domain creation
fails with either of those devices specified for passthrough whether they
are to be attached to the same domain or not.



Tamas, could you try this in your case?

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-05 Thread Chen, Tiejun

On 9/6/2015 11:19 AM, Tamas K Lengyel wrote:


diff --git a/xen/drivers/passthrough/vtd/iommu.c
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..038776a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
  PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
+u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
+
 printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+   " Currently its %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "disallowed" : "risky",



This debug message is backwards?


Yeah. Its indeed like this, relaxed ? "risky" : "disallowed"

But lets wait Jan's comment to step next.





seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }


Tamas, do you actually mean to assign these to _different_

guests, considering the log fragment above?)



No, I actually want to assign them to the same domain. The domain creation
fails with either of those devices specified for passthrough whether they
are to be attached to the same domain or not.



Tamas, could you try this in your case?



Took me a while to find the xl config option to set this flag (pci = [
'sbdf, rdm_policy=strict/relaxed' ]) but now it works as expected!



I remember 'relaxed' is a default value so 'rdm_policy' can't be dropped 
here if you like this.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6

2015-08-27 Thread Chen, Tiejun

On 8/27/2015 7:03 PM, Konrad Rzeszutek Wilk wrote:

On Thu, Aug 27, 2015 at 11:06:30AM +0800, Chen, Tiejun wrote:

On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote:
On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote:
On 8/25/2015 8:19 AM, Tamas K Lengyel wrote:
Hi everyone,
I saw some people passingly mention this on the list before but just in
case it has been missed, my serial is also being spammed with the following
printouts with both Xen 4.6 RC1 and the latest staging build:

...
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
2610742000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
...


What's your platform? BDW? And how much memory is set to your guest OS?

Is see this as well. But oddly enough - only when I use the AMT feature
(normally I just use serial console on the machine).

The platform is /DQ67SW, BIOS
SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

There is no guest OS - this is initial domain. And I boot with 2GB:
  Released 0 page(s)

Xen: [mem 0x-0x00099fff] usable
Xen: [mem 0x0009a800-0x000f] reserved
Xen: [mem 0x0010-0x1fff] usable
Xen: [mem 0x2000-0x201f] reserved
Xen: [mem 0x2020-0x3fff] usable
Xen: [mem 0x4000-0x401f] reserved
Xen: [mem 0x4020-0x80465fff] usable
Xen: [mem 0x80466000-0x9e855fff] unusable
Xen: [mem 0x9e856000-0x9e85efff] ACPI data
Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS
Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable
Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved
Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable
Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved
Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable
Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved
Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS
Xen: [mem 0x9ebf5000-0x9ec18fff] reserved
Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS
Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved
Xen: [mem 0x9ee7c000-0x9eff] unusable
Xen: [mem 0x9f80-0xbf9f] reserved
Xen: [mem 0xfec0-0xfec00fff] reserved
Xen: [mem 0xfed1c000-0xfed3] reserved
Xen: [mem 0xfed9-0xfed91fff] reserved
Xen: [mem 0xfee0-0xfeef] reserved
Xen: [mem 0xff00-0x] reserved
Xen: [mem 0x0001-0x00043e5f] unusable


Just at first glance to fault address, this seems be issued from some

As you see those fault addresses are out of the normal memory range  here.

known erratas on BDS and SKL.

I am runnig v4.2-rc8.

So I really doubt this is related to some erratas. Currently the pre-fetch
unit of IOMMU unit dedicated to IGD can't work well on some platforms, so
you can see these wired faults.


Do you have some ideas for a solution/patch?


I can't reproduce this on my BDW. And this is my assumption as well. So 
could anyone provide a log with iommu=debug?


Thanks
Tiejun



Thanks!



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6] VT-d: Create IOMMU mappings for RMRR regions if shared EPT is not being used

2015-08-27 Thread Chen, Tiejun

On 8/27/2015 4:40 PM, Malcolm Crossley wrote:

On 27/08/15 03:59, Chen, Tiejun wrote:

This kind of issue is already gone.

https://www.mail-archive.com/xen-devel@lists.xen.org/msg32464.html


There is a bug in the code you refer to above which results in no IOMMU page 
table
mappings being created if the guest domain is not sharing it's EPT page tables 
with
the IOMMU.

set_identity_p2m_entry only configures the EPT page tables and does not 
configure
the IOMMU page tables.


Okay, I got what you mean.

Instead, could you insert iommu_{map,unmap_page() into 
{set,clear}_identity_p2m_entry()? I think this can make 
{set,clear}_identity_p2m_entry approachable in all circumstances.


Kevin and Jan,

Is this fine?

Thanks
Tiejun



We had a real world regression (with xen 4.6-rc1) on a Intel Haswell system with
integrated graphics.

The patch below resolves the regression.

Malcolm



Thanks
Tiejun

On 8/26/2015 11:49 PM, Malcolm Crossley wrote:

Add RMRR 1:1 IOMMU mappings to IOMMU page tables if EPT page table are not being
shared with the IOMMU.

This is a regression in behaviour versus Xen 4.5.

Signed-off-by: Malcolm Crossley malcolm.cross...@citrix.com
---
  xen/drivers/passthrough/vtd/iommu.c | 23 ---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..89de741 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,8 +1839,16 @@ static int rmrr_identity_mapping(struct domain *d, 
bool_t map,

  while ( base_pfn  end_pfn )
  {
-if (   (d, base_pfn) )
-ret = -ENXIO;
+if ( iommu_use_hap_pt(d) )
+{
+if ( clear_identity_p2m_entry(d, base_pfn) )
+ret = -ENXIO;
+}
+else
+{
+if ( intel_iommu_unmap_page(d, base_pfn) )
+ret = -ENXIO;
+}
  base_pfn++;
  }

@@ -1855,7 +1863,16 @@ static int rmrr_identity_mapping(struct domain *d, 
bool_t map,

  while ( base_pfn  end_pfn )
  {
-int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
+int err;
+if ( iommu_use_hap_pt(d) )
+{
+err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
+}
+else
+{
+err = intel_iommu_map_page(d, base_pfn, base_pfn,
+   IOMMUF_readable|IOMMUF_writable);
+}

  if ( err )
  return err;






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6

2015-08-26 Thread Chen, Tiejun


On 8/27/2015 12:19 AM, Malcolm Crossley wrote:

On 25/08/15 15:43, Konrad Rzeszutek Wilk wrote:

On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote:

On 8/25/2015 8:19 AM, Tamas K Lengyel wrote:

Hi everyone,
I saw some people passingly mention this on the list before but just in
case it has been missed, my serial is also being spammed with the following
printouts with both Xen 4.6 RC1 and the latest staging build:

...
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
2610742000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
...





I think this problem is caused by missing IOMMU mappings for the RMRR regions
if the domain does not have shared EPT enabled. This includes PV Dom 0.


I don't think this is relevant to current problem.



I have posted a patch to fix the issue:

http://lists.xen.org/archives/html/xen-devel/2015-08/msg02090.html



And we already fixed this on 4.6.

Thanks
Tiejun





What's your platform? BDW? And how much memory is set to your guest OS?


Is see this as well. But oddly enough - only when I use the AMT feature
(normally I just use serial console on the machine).

The platform is /DQ67SW, BIOS
SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

There is no guest OS - this is initial domain. And I boot with 2GB:
 Released 0 page(s)

Xen: [mem 0x-0x00099fff] usable
Xen: [mem 0x0009a800-0x000f] reserved
Xen: [mem 0x0010-0x1fff] usable
Xen: [mem 0x2000-0x201f] reserved
Xen: [mem 0x2020-0x3fff] usable
Xen: [mem 0x4000-0x401f] reserved
Xen: [mem 0x4020-0x80465fff] usable
Xen: [mem 0x80466000-0x9e855fff] unusable
Xen: [mem 0x9e856000-0x9e85efff] ACPI data
Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS
Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable
Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved
Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable
Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved
Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable
Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved
Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS
Xen: [mem 0x9ebf5000-0x9ec18fff] reserved
Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS
Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved
Xen: [mem 0x9ee7c000-0x9eff] unusable
Xen: [mem 0x9f80-0xbf9f] reserved
Xen: [mem 0xfec0-0xfec00fff] reserved
Xen: [mem 0xfed1c000-0xfed3] reserved
Xen: [mem 0xfed9-0xfed91fff] reserved
Xen: [mem 0xfee0-0xfeef] reserved
Xen: [mem 0xff00-0x] reserved
Xen: [mem 0x0001-0x00043e5f] unusable



Just at first glance to fault address, this seems be issued from some
known erratas on BDS and SKL.


I am runnig v4.2-rc8.


Thanks
Tiejun


The device in question is an integrated Intel graphics card:

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core
Processor Family Integrated Graphics Controller (rev 09)


Same device..


This is a sandy bridge device which doesn't support shared EPT for GPU device.




The only way I found to stop the messages from making my serial connection
useless was by assigning the device to xen-pciback.


This will cause the GPU to be reset and the reset stopped the GPU accessing the
RMRR region.



Cheers,
Tamas



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 4.6] VT-d: Create IOMMU mappings for RMRR regions if shared EPT is not being used

2015-08-26 Thread Chen, Tiejun

This kind of issue is already gone.

https://www.mail-archive.com/xen-devel@lists.xen.org/msg32464.html

Thanks
Tiejun

On 8/26/2015 11:49 PM, Malcolm Crossley wrote:

Add RMRR 1:1 IOMMU mappings to IOMMU page tables if EPT page table are not being
shared with the IOMMU.

This is a regression in behaviour versus Xen 4.5.

Signed-off-by: Malcolm Crossley malcolm.cross...@citrix.com
---
  xen/drivers/passthrough/vtd/iommu.c | 23 ---
  1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..89de741 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,8 +1839,16 @@ static int rmrr_identity_mapping(struct domain *d, 
bool_t map,

  while ( base_pfn  end_pfn )
  {
-if ( clear_identity_p2m_entry(d, base_pfn) )
-ret = -ENXIO;
+if ( iommu_use_hap_pt(d) )
+{
+if ( clear_identity_p2m_entry(d, base_pfn) )
+ret = -ENXIO;
+}
+else
+{
+if ( intel_iommu_unmap_page(d, base_pfn) )
+ret = -ENXIO;
+}
  base_pfn++;
  }

@@ -1855,7 +1863,16 @@ static int rmrr_identity_mapping(struct domain *d, 
bool_t map,

  while ( base_pfn  end_pfn )
  {
-int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
+int err;
+if ( iommu_use_hap_pt(d) )
+{
+err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
+}
+else
+{
+err = intel_iommu_map_page(d, base_pfn, base_pfn,
+   IOMMUF_readable|IOMMUF_writable);
+}

  if ( err )
  return err;



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6

2015-08-26 Thread Chen, Tiejun

On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote:

On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote:

On 8/25/2015 8:19 AM, Tamas K Lengyel wrote:
Hi everyone,
I saw some people passingly mention this on the list before but just in
case it has been missed, my serial is also being spammed with the following
printouts with both Xen 4.6 RC1 and the latest staging build:

...
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
2610742000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
...


What's your platform? BDW? And how much memory is set to your guest OS?


Is see this as well. But oddly enough - only when I use the AMT feature
(normally I just use serial console on the machine).

The platform is /DQ67SW, BIOS
SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

There is no guest OS - this is initial domain. And I boot with 2GB:
  Released 0 page(s)

Xen: [mem 0x-0x00099fff] usable
Xen: [mem 0x0009a800-0x000f] reserved
Xen: [mem 0x0010-0x1fff] usable
Xen: [mem 0x2000-0x201f] reserved
Xen: [mem 0x2020-0x3fff] usable
Xen: [mem 0x4000-0x401f] reserved
Xen: [mem 0x4020-0x80465fff] usable
Xen: [mem 0x80466000-0x9e855fff] unusable
Xen: [mem 0x9e856000-0x9e85efff] ACPI data
Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS
Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable
Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved
Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable
Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved
Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable
Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved
Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS
Xen: [mem 0x9ebf5000-0x9ec18fff] reserved
Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS
Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved
Xen: [mem 0x9ee7c000-0x9eff] unusable
Xen: [mem 0x9f80-0xbf9f] reserved
Xen: [mem 0xfec0-0xfec00fff] reserved
Xen: [mem 0xfed1c000-0xfed3] reserved
Xen: [mem 0xfed9-0xfed91fff] reserved
Xen: [mem 0xfee0-0xfeef] reserved
Xen: [mem 0xff00-0x] reserved
Xen: [mem 0x0001-0x00043e5f] unusable



Just at first glance to fault address, this seems be issued from some


As you see those fault addresses are out of the normal memory range  here.


known erratas on BDS and SKL.


I am runnig v4.2-rc8.


So I really doubt this is related to some erratas. Currently the 
pre-fetch unit of IOMMU unit dedicated to IGD can't work well on some 
platforms, so you can see these wired faults.


Thanks
Tiejun



Thanks
Tiejun

The device in question is an integrated Intel graphics card:

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core
Processor Family Integrated Graphics Controller (rev 09)


Same device..


The only way I found to stop the messages from making my serial connection
useless was by assigning the device to xen-pciback.

Cheers,
Tamas



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6

2015-08-25 Thread Chen, Tiejun

On 8/25/2015 8:19 AM, Tamas K Lengyel wrote:

Hi everyone,
I saw some people passingly mention this on the list before but just in
case it has been missed, my serial is also being spammed with the following
printouts with both Xen 4.6 RC1 and the latest staging build:

...
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
33487d7000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr
2610742000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid
...



What's your platform? BDW? And how much memory is set to your guest OS?

Just at first glance to fault address, this seems be issued from some 
known erratas on BDS and SKL.


Thanks
Tiejun


The device in question is an integrated Intel graphics card:

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core
Processor Family Integrated Graphics Controller (rev 09)

The only way I found to stop the messages from making my serial connection
useless was by assigning the device to xen-pciback.

Cheers,
Tamas



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] libxl: move calling libxl__arch_domain_construct_memmap to right place

2015-08-05 Thread Chen, Tiejun

On 8/5/2015 7:25 PM, Wei Liu wrote:

On Wed, Aug 05, 2015 at 12:06:22PM +0100, Ian Campbell wrote:

On Wed, 2015-08-05 at 11:58 +0100, Wei Liu wrote:
 On Wed, Aug 05, 2015 at 11:48:55AM +0100, Ian Campbell wrote:
  On Wed, 2015-08-05 at 11:43 +0100, Wei Liu wrote:
   On Wed, Aug 05, 2015 at 11:38:54AM +0100, Ian Campbell wrote:
On Wed, 2015-08-05 at 11:23 +0100, Wei Liu wrote:
 This function was called in the wrong place, because both
 libxl__vnuma_build_vmemrange_hvm and xc_hvm_build rely on its
 output.
   
What is the effect of this call being in the wrong place?
Presumably
one or
the other of those functions reaches the wrong conclusion?
  
   Originally, by the time that function got called, all guest pages
   were
   already populated.  The end result is E820 map disagrees with what
   vNUMA
   says and what address ranges memory actually resides, i.e. risk of
   guest
   accessing region that doesn't have backing pages.
 
  Ouch. This should certainly be explained in the commit message.
 
  With that: Acked-by: Ian Campbell ian.campb...@citrix.com
 

 I will post v2 shortly with that information in commit message.

  Although perhaps we should wait for confirmation this fix doesn't
  regress
  RMRR somehow?
 

 I doubt it will. The code as-is is already broken for RMRR. But let's
 wait till tomorrow for Tiejun to reply.

 If he doesn't reply by tomorrow, I suggest we apply v2 first and
 fix up any subsequent issues later.

Agreed.


Actually I want to retract this patch. I confused hvm path with pv path
and drew my conclusion when looking at both code paths.

In hvm path, neither libxl__vnuma_build_vmemragen_hvm nor xc_hvm_build
depends on output of libxl__arch_domain_construct_memmap (in fact it
doesn't change anything). So the code is OK.

In pv path, there is a path which relies on having a valid E820 map
first, but that path 1) relies on host E820 map; 2) doesn't involve RMRR
support.

In the end, moving that function call has no effect whatsoever.


Sorry I don't go into this details but seems I have nothing to do about 
this, ultimately. Right?


Thanks
Tiejun



Sorry for the noise!

Wei.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression in OVMF + RMRR series

2015-07-28 Thread Chen, Tiejun

Jul 25 17:48:51.685107 (d1) BIOS map:
Jul 25 17:48:51.685145 (d1)  ffe0-: Main BIOS
Jul 25 17:48:51.693030 (d1) *** HVMLoader bug at e820.c:262
Jul 25 17:48:51.693064 (d1) *** HVMLoader crashed.

Git blame shows that the change that crashes hvmloader was part of the
RMRR series.

Tiejun, could you please fix this please?  It should be easy to reproduce.



This is really my responsibility and I just sent out one patch to fix this,

tools/hvmloader: sync memory map[]

Please take a review.

Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v2 7/8] python/xc: reinstate original implementation of next_bdf

2015-07-27 Thread Chen, Tiejun

On 2015/7/28 1:45, Wei Liu wrote:

I missed the fact that next_bdf is used to parsed user supplied
strings when reviewing. The user supplied string is a NULL-terminated
string separated by comma. User can supply several PCI devices in that
string. There is, however, no delimiter for different devices, hence
we can't change the syntax of that string.

This patch reinstate the original implementation of next_bdf to
preserve the original syntax. The last argument for xc_assign_device
is always 0.

Signed-off-by: Wei Liu wei.l...@citrix.com
---
Cc: Tiejun Chen tiejun.c...@intel.com

Tiejun, are you actually using this python binding? I don't think we


This change is just following to RMRR series but currently we don't use 
this. So its fine if you think this don't break any other usages.


Thanks
Tiejun


have in tree user.

If nobody is using it, I propose we remove this binding in next
release.

I don't have live example of that string. My analysis is based on
reverse-engineering of original code.
---
  tools/python/xen/lowlevel/xc/xc.c | 30 ++
  1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/tools/python/xen/lowlevel/xc/xc.c 
b/tools/python/xen/lowlevel/xc/xc.c
index c8380d1..2c36eb2 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,8 +592,7 @@ static int token_value(char *token)
  return strtol(token, NULL, 16);
  }

-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
-int *flag)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
  {
  char *token;

@@ -608,17 +607,8 @@ static int next_bdf(char **str, int *seg, int *bus, int 
*dev, int *func,
  *dev  = token_value(token);
  token = strchr(token, ',') + 1;
  *func  = token_value(token);
-token = strchr(token, ',') + 1;
-if ( token ) {
-*flag = token_value(token);
-*str = token + 1;
-}
-else
-{
-/* O means we take strict as our default policy. */
-*flag = 0;
-*str = NULL;
-}
+token = strchr(token, ',');
+*str = token ? token + 1 : NULL;

  return 1;
  }
@@ -630,14 +620,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
  uint32_t dom;
  char *pci_str;
  int32_t sbdf = 0;
-int seg, bus, dev, func, flag;
+int seg, bus, dev, func;

  static char *kwd_list[] = { domid, pci, NULL };
  if ( !PyArg_ParseTupleAndKeywords(args, kwds, is, kwd_list,
dom, pci_str) )
  return NULL;

-while ( next_bdf(pci_str, seg, bus, dev, func, flag) )
+while ( next_bdf(pci_str, seg, bus, dev, func) )
  {
  sbdf = seg  16;
  sbdf |= (bus  0xff)  8;
@@ -663,21 +653,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
  uint32_t dom;
  char *pci_str;
  int32_t sbdf = 0;
-int seg, bus, dev, func, flag;
+int seg, bus, dev, func;

  static char *kwd_list[] = { domid, pci, NULL };
  if ( !PyArg_ParseTupleAndKeywords(args, kwds, is, kwd_list,
dom, pci_str) )
  return NULL;

-while ( next_bdf(pci_str, seg, bus, dev, func, flag) )
+while ( next_bdf(pci_str, seg, bus, dev, func) )
  {
  sbdf = seg  16;
  sbdf |= (bus  0xff)  8;
  sbdf |= (dev  0x1f)  3;
  sbdf |= (func  0x7);

-if ( xc_assign_device(self-xc_handle, dom, sbdf, flag) != 0 )
+if ( xc_assign_device(self-xc_handle, dom, sbdf, 0) != 0 )
  {
  if (errno == ENOSYS)
  sbdf = -1;
@@ -696,14 +686,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
  uint32_t dom;
  char *pci_str;
  int32_t sbdf = 0;
-int seg, bus, dev, func, flag;
+int seg, bus, dev, func;

  static char *kwd_list[] = { domid, pci, NULL };
  if ( !PyArg_ParseTupleAndKeywords(args, kwds, is, kwd_list,
dom, pci_str) )
  return NULL;

-while ( next_bdf(pci_str, seg, bus, dev, func, flag) )
+while ( next_bdf(pci_str, seg, bus, dev, func) )
  {
  sbdf = seg  16;
  sbdf |= (bus  0xff)  8;



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-23 Thread Chen, Tiejun

Tiejun, please can you send a patch to fix this up.  Please send just
a revised version of this patch.  I think the rest of the series will
rebase just fine on top of it.  (If I'm wrong then we will need to do
something more complex.)


Sorry to this.


On ARM side the flag field doesn't take any affect.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com

diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c

index 9a667e9..b62c8cf 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2709,7 +2709,8 @@ static int arm_smmu_reassign_dev(struct domain *s, 
struct domain *t,

return ret;

if (t) {
-   ret = arm_smmu_assign_dev(t, devfn, dev);
+   /* The flag field doesn't matter to DT device. */
+   ret = arm_smmu_assign_dev(t, devfn, dev, 0);
if (ret)
return ret;
}


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-23 Thread Chen, Tiejun

Can you avoid the mention of DT in the comment please, since PCI will
eventually go that path. Something like No flags are defined for ARM
would suffice I think.


Works for me.

But in some way 0 also represents one valid flag. So what about this ?

No flags are defined for ARM so its always safe to set 0.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-23 Thread Chen, Tiejun

These cosmetic changes can be fixed by a follow-up patch.



If Jackson would like not to fix this directly in his tree, I can post 
this a small patch but we'd better squash this into the predecessor just 
as one commit.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun
OOPS! Please refer to this version: (One miss changing flag to flags in 
patch #11 although we can compile successfully.)



#1. To patch #8

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2991333..9c5ef8b 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1316,7 +1316,7 @@ int xc_get_machine_memory_map(xc_interface *xch,
   uint32_t max_entries);

 int xc_reserved_device_memory_map(xc_interface *xch,
-  uint32_t flag,
+  uint32_t flags,
   uint16_t seg,
   uint8_t bus,
   uint8_t devfn,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 298b3b5..1b074b7 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -686,7 +686,7 @@ int xc_domain_set_memory_map(xc_interface *xch,
 }

 int xc_reserved_device_memory_map(xc_interface *xch,
-  uint32_t flag,
+  uint32_t flags,
   uint16_t seg,
   uint8_t bus,
   uint8_t devfn,
@@ -695,11 +695,11 @@ int xc_reserved_device_memory_map(xc_interface *xch,
 {
 int rc;
 struct xen_reserved_device_memory_map xrdmmap = {
-.flag = flag,
-.seg = seg,
-.bus = bus,
-.devfn = devfn,
-.nr_entries = *max_entries
+.flags = flags,
+.nr_entries = *max_entries,
+.dev.pci.seg = seg,
+.dev.pci.bus = bus,
+.dev.pci.devfn = devfn,
 };
 DECLARE_HYPERCALL_BOUNCE(entries,
  sizeof(struct xen_reserved_device_memory) *

#2. To patch #11
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 29476fc..8d103c3 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -94,7 +94,7 @@ const char *libxl__domain_device_model(libxl__gc *gc,

 static int
 libxl__xc_device_get_rdm(libxl__gc *gc,
- uint32_t flag,
+ uint32_t flags,
  uint16_t seg,
  uint8_t bus,
  uint8_t devfn,
@@ -107,7 +107,7 @@ libxl__xc_device_get_rdm(libxl__gc *gc,
  * We really can't presume how many entries we can get in advance.
  */
 *nr_entries = 0;
-r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+r = xc_reserved_device_memory_map(CTX-xch, flags, seg, bus, devfn,
   NULL, nr_entries);
 assert(r = 0);
 /* 0 means we have no any rdm entry. */
@@ -119,7 +119,7 @@ libxl__xc_device_get_rdm(libxl__gc *gc,
 }

 GCNEW_ARRAY(*xrdm, *nr_entries);
-r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+r = xc_reserved_device_memory_map(CTX-xch, flags, seg, bus, devfn,
   *xrdm, nr_entries);
 if (r)
 rc = ERROR_FAIL;
@@ -212,7 +212,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
 unsigned int nr_entries;

 /* Collect all rdm info if exist. */
-rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+rc = libxl__xc_device_get_rdm(gc, XENMEM_RDM_ALL,
   0, 0, 0, nr_entries, xrdm);
 if (rc)
 goto out;
@@ -240,7 +240,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
 devfn = PCI_DEVFN(d_config-pcidevs[i].dev,
   d_config-pcidevs[i].func);
 nr_entries = 0;
-rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+rc = libxl__xc_device_get_rdm(gc, 0,
   seg, bus, devfn, nr_entries, 
xrdm);

 if (rc)
 goto out;

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun

I intend to produce a git branch RSN.



On the staging? Tomorrow I can pull to check/test this.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun

Thanks for your clarification to me.


The solution to this is to be systematic in how you handle the email
based review of a series, not to add a further to the reviewer by using
IRC as well.

For example, here is how I handle review of a series I am working on:

I keep all the replies to a series I'm working on marked unread. When I
come to rework the series I take the first reply to the first patch and
start a draft reply to it quoting the entire review.

I then go through the comments one by one and either:

   * make the _complete_ code change, including adding the Changes
 in vN bit to the commit log and delete that comment from the
 reply


Are you saying this case of resending this particular patch online?


or
   * write a reply in my draft to that particular comment which does
 one or more of:

   * Explain that I don't understand the suggestion,
 probably asking questions to try and clarify what was
 being asked for.


Yes, in this case we're arguing, I was really trying to send a sample of 
this code fragment to ask this before I sent out the complete change.



   * Explain, with reasons, why I disagree with the
 suggestion
   * Explain, with reasons, why I only implemented part of
 the suggestion.

Only then do I move on to the next comment in that mail and repeat. At
the end I've either deleted all the comments from my draft (because
I've fully implemented everything) so the draft can be discarded or I
have an email to send explaining what I've not done and why. Only now
do I mark the original review email as read.

Then I move on to the next reply to that thread in my mail box and
repeat until I have been through every mail in the thread and addressed
_all_ of the comments.

At the end of this process _every_ bit of review feedback is addressed
either by making the requested change or by responding explaining the
reason why the change hasn't been made. This is absolutely crucial. You
should never silently ignore a piece of review, even if you don't


I should double check each line but sometimes this doesn't mean I can 
understand everything correctly to do right thing as you expect. But 
this is really not what I intend to do.


Thanks
Tiejun


understand it or disagree with it, always reply and explain why you
haven't.

If the review was particularly long or complex I will then do a second
pass through the review comments and check that every comment is either
mentioned in a Changes in vN changelog comment or I have replied to
it.

I do all of that before posting the next version. IMHO until a
contributor has shown they are diligent about addressing review
comments they should _never_ send out a series which only has review
partially addressed.

The presence of an IRC channel in no way changes the requirement to be
systematic and thorough when dealing with email review.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v12] introduce XENMEM_reserved_device_memory_map

2015-07-22 Thread Chen, Tiejun

On 2015/7/22 21:03, Jan Beulich wrote:

On 22.07.15 at 14:55, tiejun.c...@intel.com wrote:

  +#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( unlikely(start_extent) )
+return -ENOSYS;
+
+if ( copy_from_guest(grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries)

)

+return -EFAULT;
+
+if ( grdm.map.flags  ~XENMEM_RDM_ALL )
+return -EINVAL;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  grdm);
+


Just as you asked me previously,

Here if RDM doesn't exist, so

grdm.map.nr_entries = grdm.used_entries = 0, and rc = 0, right?


No, grdm.map.nr_entries still holds whatever the caller passed.


What if the caller pass 0 like raising an inquiry? Indeed, this is 
what we did in patch #11. I think this is reasonable since the caller 
always doesn't know how much buffers should be allocated beforehand, so 
instead, the caller prefer to make this sort of inquiry without any 
buffers.





+if ( !rc  grdm.map.nr_entries  grdm.used_entries )
+rc = -ENOBUFS;
+grdm.map.nr_entries = grdm.used_entries;
+if ( __copy_to_guest(compat, grdm.map, 1) )


So can we still do this copy here?


We not only can, we need to. The only case where we might skip it
is when the incoming grdm.map.nr_entries is unchanged.


If what I'm saying above is right, __copy_to_guest() would return a 
error in this case, right? I don't think this make sense.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun

On 2015/7/22 21:24, Ian Jackson wrote:

Tiejun Chen writes ([v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with 
RDM):

Acked-by: Wei Liu wei.l...@citrix.com


I have dropped Wei's ack on this from my git branch, as it was clearly
inappropriate to retain it.  (v11 was acked by me, so there is no need
for a re-review by Wei unless he feels it would be useful.)


Ian,

Sounds you start to merge them into your tree?

But now Jan is trying to update patch #1 as you see. I think something 
needs to be synced on tool sides. Although that is not finished, at 
least three changes exist:


#1. PCI_DEV_RDM_ALL - XENMEM_RDM_ALL
#2. new public memop interface structure

#2.1
+union {
+struct physdev_pci_device pci;
+} dev;

#2.2 flag - flags

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun

On 2015/7/22 16:28, Jan Beulich wrote:

On 22.07.15 at 03:30, tiejun.c...@intel.com wrote:

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v11:

* Use GCNEW_ARRAY to replace libxl__malloc()

* #define pfn_to_paddrk is missing safety () around x, and
   move this into libxl_internal.h

* Rename set_rdm_entries() to add_rdm_entry() and put the
   increment at the end so that the assignments are
   to -rdms[d_config-num_rdms].

* Simply make it so that if there are any rdms specified
   in the domain config, they are used instead of the
   automatically gathered information (from strategy and
   devices). So just return if d_config-rmds is valid.

* Shorten some code comments.


I think it is not the first time that we're pointing out to you that
when you make not just cosmetic changes, review and ack tags
should be dropped.


I don't recall this sort of requirement was mentioned. Instead, this is 
new to me. So where can I found this warning you said previously?


Furthermore, you ask me to drop Reviewed-by/Acked-by in this revision, 
what's next? Just to this example,


No.1 revision:

Acked-by: Wei Liu wei.l...@citrix.com
Reviewed-by: Kevin Tian kevin.t...@intel.com

No.2 revision:

I addressed some comments raised by Jackson. But you mean 
Reviewed-by/Acked-by should be dropped.


No.3 revision:

I assume Jackson Ack or Review to this so I should leave one line like this,

Reviewed-by: Ian Jackson ian.jack...@eu.citrix.com

without two previous Acked-by/Reviewed-by, right? So looks like the 
latter always override the former, right?


And I also can't understand why we should drop Reviewed-by/Acked-by from 
other guys. And, all new comments I addressed don't conflict with our 
previous revision so why?


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v13 00/16] Fix RMRR (avoid RDM)

2015-07-22 Thread Chen, Tiejun

Ian,

This was supposed to be my responsibility to resend this series so 
appreciate for your extra effort.



git branch here:
   http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=summary
   git://xenbits.xen.org/people/iwj/xen.git
   base.rdm-v13..wip.rdm-v13



I pick all RMRR patches from wip.rdm-v13 and apply them into staging 
branch in my tree.


...



Tiejun, can you please test this series ?  Hopefully if you say it


Yes. I can compile and boot successfully with/without rdm setting.


still works after this, we can commit it tomorrow.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun

Ian,

Thanks for your effort.

A tiny change may be needed but I don't block this.


+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,


Since now we are sitting on xc_reserved_device_memory_map(, flags, xxx), 
s/flag/flags may be better.



+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{


[snip]


+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,


Ditto.


+  NULL, nr_entries);
+assert(r = 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+GCNEW_ARRAY(*xrdm, *nr_entries);
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,


Ditto.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-22 Thread Chen, Tiejun

On 2015/7/22 22:04, Ian Jackson wrote:

Chen, Tiejun writes (Re: [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):

Sounds you start to merge them into your tree?

But now Jan is trying to update patch #1 as you see. I think something
needs to be synced on tool sides. Although that is not finished, at
least three changes exist:


Thanks.  I am negotiating with Jan et al on IRC.



We just need to sync two patches:

#1. To patch #8:

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2991333..9c5ef8b 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1316,7 +1316,7 @@ int xc_get_machine_memory_map(xc_interface *xch,
   uint32_t max_entries);

 int xc_reserved_device_memory_map(xc_interface *xch,
-  uint32_t flag,
+  uint32_t flags,
   uint16_t seg,
   uint8_t bus,
   uint8_t devfn,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 298b3b5..1b074b7 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -686,7 +686,7 @@ int xc_domain_set_memory_map(xc_interface *xch,
 }

 int xc_reserved_device_memory_map(xc_interface *xch,
-  uint32_t flag,
+  uint32_t flags,
   uint16_t seg,
   uint8_t bus,
   uint8_t devfn,
@@ -695,11 +695,11 @@ int xc_reserved_device_memory_map(xc_interface *xch,
 {
 int rc;
 struct xen_reserved_device_memory_map xrdmmap = {
-.flag = flag,
-.seg = seg,
-.bus = bus,
-.devfn = devfn,
-.nr_entries = *max_entries
+.flags = flags,
+.nr_entries = *max_entries,
+.dev.pci.seg = seg,
+.dev.pci.bus = bus,
+.dev.pci.devfn = devfn,
 };
 DECLARE_HYPERCALL_BOUNCE(entries,
  sizeof(struct xen_reserved_device_memory) *

#2. To patch #11:

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 29476fc..40b2bba 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -212,7 +212,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
 unsigned int nr_entries;

 /* Collect all rdm info if exist. */
-rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+rc = libxl__xc_device_get_rdm(gc, XENMEM_RDM_ALL,
   0, 0, 0, nr_entries, xrdm);
 if (rc)
 goto out;
@@ -240,7 +240,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc,
 devfn = PCI_DEVFN(d_config-pcidevs[i].dev,
   d_config-pcidevs[i].func);
 nr_entries = 0;
-rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+rc = libxl__xc_device_get_rdm(gc, 0,
   seg, bus, devfn, nr_entries, 
xrdm);

 if (rc)
 goto out;

Note I just compiled with these changes since right now I can't access 
any machine to test.


Thanks
Tiejun



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v12] introduce XENMEM_reserved_device_memory_map

2015-07-22 Thread Chen, Tiejun

+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( unlikely(start_extent) )
+return -ENOSYS;
+
+if ( copy_from_guest(grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+if ( grdm.map.flags  ~XENMEM_RDM_ALL )
+return -EINVAL;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  grdm);
+


Just as you asked me previously,

Here if RDM doesn't exist, so

grdm.map.nr_entries = grdm.used_entries = 0, and rc = 0, right?


+if ( !rc  grdm.map.nr_entries  grdm.used_entries )
+rc = -ENOBUFS;
+grdm.map.nr_entries = grdm.used_entries;
+if ( __copy_to_guest(compat, grdm.map, 1) )


So can we still do this copy here?

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

Just to your example,

libxl_domain_config cfg;
cfg.stuff = blah;
cfg.rdm.strategy = HOST;

libxl_domain_create_new(cfg, domid);
libxl_domain_destroy(domid);

Here looks you mean d_config-rdms would be changed, right? Currently
this shouldn't be allowed. But I think we need to further discussion
make this case clear after feature freeze since we didn't have this kind
of assumption in our previous design.

libxl_domain_create_new(cfg, domid);


This response of yours does not lead me to think you have understood
what I am saying, but I agree that this can be dealt with later (if


Indeed, I'm not a fan to Xen tools so I can't picture what this real 
scenario would happen. So if I'm misunderstanding what you mean, just 
please correct me. Or if you still think its hard to explain this to me, 
just tell me what I should do. I think this make your life easy.


Thanks
Tiejun


indeed it needs to be dealt with at all).

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

On 2015/7/21 23:09, Ian Jackson wrote:

Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):

+static void
+add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
+  uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
+{
+d_config-num_rdms++;
+d_config-rdms = libxl__realloc(NOGC, d_config-rdms,
+d_config-num_rdms * sizeof(libxl_device_rdm));
+
+d_config-rdms[d_config-num_rdms - 1].start = rdm_start;
+d_config-rdms[d_config-num_rdms - 1].size = rdm_size;
+d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy;
+}



But, I wrote:

Can I suggest a function

   void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
 uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)

which assumes that d_config-num_rdms is set correctly, and increments
it ?

(Please put the increment at the end so that the assignments are to
-rdms[d_config-num_rdms], or perhaps make a convenience alias.)

Note the last paragraph.

This is now the third time I have posted that text.  It is the fifth
request or clarification I have had to make about this very small
area.  I have to say that I'm finding this rather frustrating.


Sorry, I just ignore the line in brackets since I always think this kind 
of thing is often not a big deal, and next time I should pay more 
attention to the (). But indeed, before I post this whole patch online I 
also picked up this chunk of code to ask you to take a look that. This 
manner means I'm not very sure if I'm addressing this properly. But I 
didn't get a further response, so I guess that should work for you and 
then I posted the whole online.


Now back on our problem,

static void
add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
  uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
{
d_config-rdms = libxl__realloc(NOGC, d_config-rdms,
(d_config-num_rdms+1) * sizeof(libxl_device_rdm));

d_config-rdms[d_config-num_rdms].start = rdm_start;
d_config-rdms[d_config-num_rdms].size = rdm_size;
d_config-rdms[d_config-num_rdms].policy = rdm_policy;
d_config-num_rdms++;
}

Does this work for you? If I'm still wrong, please correct this function 
directly to cost you less.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

On 2015/7/21 23:57, Ian Jackson wrote:

Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):

Sorry, I just ignore the line in brackets since I always think this kind
of thing is often not a big deal, and next time I should pay more
attention to the (). But indeed, before I post this whole patch online I
also picked up this chunk of code to ask you to take a look that. This
manner means I'm not very sure if I'm addressing this properly. But I
didn't get a further response, so I guess that should work for you and
then I posted the whole online.


You are talking about 55ae2bb1.9030...@intel.com I guess.  I replied
to that with several comments about your prose and about the
computation of the new set of rdms.

It's true that I didn't comment on the frat that you had half-done one
of the things I had requested.  It is of course a waste of my time to
be constantly re-reviewing half-done changes.


Next time I should see each line of all comments carefully. Maybe its 
good way to use IRC to take your quick advice in advance, and I hope 
this make you feel better. Anyway, sorry to bring this kind of 
inconvenience.





Now back on our problem,

static void
add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
{
  d_config-rdms = libxl__realloc(NOGC, d_config-rdms,
  (d_config-num_rdms+1) * sizeof(libxl_device_rdm));

  d_config-rdms[d_config-num_rdms].start = rdm_start;
  d_config-rdms[d_config-num_rdms].size = rdm_size;
  d_config-rdms[d_config-num_rdms].policy = rdm_policy;
  d_config-num_rdms++;
}

Does this work for you? If I'm still wrong, please correct this function
directly to cost you less.


Yes, that is what I meant.



Good to know.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

I hope the following can address all comments below:


diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2f8e590..a4bd2a1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc,

 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 634b8d2..ba852fe 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -92,6 +92,276 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }

+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r = 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+GCNEW_ARRAY(*xrdm, *nr_entries);
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, Could not get reserved device memory maps.\n);
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+ uint64_t rdm_start, uint64_t rdm_size)
+{
+return (start + memsize  rdm_start)  (start  rdm_start + rdm_size);
+}
+
+static void
+add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
+  uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
+{
+assert(d_config-num_rdms);
+
+d_config-rdms = libxl__realloc(NOGC, d_config-rdms,
+d_config-num_rdms * sizeof(libxl_device_rdm));
+
+d_config-rdms[d_config-num_rdms - 1].start = rdm_start;
+d_config-rdms[d_config-num_rdms - 1].size = rdm_size;
+d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could
+ * still be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM probably scatter the whole RAM space.
+ * Especially multiple RDM entries would worsen this to lead a complicated
+ * memory layout. And then its hard to extend hvm_info_table{} to work
+ * hvmloader out. So here we're trying to figure out a simple solution to
+ * avoid breaking existing layout. So when a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * strict policy leads to fail libxl.
+ * relaxed policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+   libxl_domain_config *d_config,
+   uint64_t rdm_mem_boundary,
+   struct xc_hvm_build_args *args)
+{
+int i, j, conflict, rc;
+struct xen_reserved_device_memory *xrdm = NULL;
+uint32_t strategy = d_config-b_info.u.hvm.rdm.strategy;
+uint16_t seg;
+uint8_t bus, devfn;
+uint64_t rdm_start, rdm_size;
+uint64_t highmem_end = args-highmem_end ? args-highmem_end : 
(1ull32);

+
+/* Might not expose rdm. */
+if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE 
+!d_config-num_pcidevs)
+return 0;
+
+/* Query all RDM entries in this platform */
+if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {
+unsigned int nr_entries;
+
+/* Collect all rdm info if exist. */
+rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+  0, 0, 0, nr_entries, xrdm);
+  

Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

The domain destroy would not change cfg at all.


Okay.




If not, I should double check this duplication so what about a return in
the case of duplicating one entry?


What I am looking for is a *decision* about what the right behaviour
is, backed up by a *rationale*.

The most obvious answer to me would be that if an rdms array is
specified, the strategy should be ignored.  That is how the automatic
numa placement API works.


I'm not familiar with this.

Do you mean this?

if (d_config-num_rdms)
strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE;

But except this global strategy, we also have per-device setting so 
maybe something like this?


if (d_config-num_rdms)
return 0;



But another answer would be to take the union of the specified
regions.  That would be more complicated, because the union would have
to be computed.


  if (d_config-rdms[i].start == rdm_start)
  return;


That doesn't, of course, compute the union.



Sorry I don't understand what the take the union of the specified 
regions is here.


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

On 2015/7/21 19:11, Ian Jackson wrote:

Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):

I would add this check at the beginning of this function:

assert(!d_config-num_rdms  !d_config-rdms).


As Ian Campbell and I have explained, that is not OK.

The caller may choose to pass both some rdms and strategy=host.

The most obvious case is something like the following code

   libxl_domain_config cfg;
   cfg.stuff = blah;
   cfg.rdm.strategy = HOST;

   libxl_domain_create_new(cfg, domid);
   libxl_domain_destroy(domid);


I'm not sure about this procedure, so do you mean this doesn't free 
d_config-num_rdms/d_config-rdms?


If not, I should double check this duplication so what about a return in 
the case of duplicating one entry?


static void
add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
  uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
{
if (d_config-rdms)
{
unsigned int i;

for (i = 0; i  d_config-num_rdms; i++)
{
if (d_config-rdms[i].start == rdm_start)
return;
}
}

d_config-num_rdms++;
d_config-rdms = libxl__realloc(NOGC, d_config-rdms,
d_config-num_rdms * sizeof(libxl_device_rdm));

d_config-rdms[d_config-num_rdms - 1].start = rdm_start;
d_config-rdms[d_config-num_rdms - 1].size = rdm_size;
d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy;
}

Thanks
Tiejun


   libxl_domain_create_new(cfg, domid);

which must work (and the second domain should have identical
configuration to the first).

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

On 2015/7/21 19:11, Ian Jackson wrote:

Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):

I would add this check at the beginning of this function:

assert(!d_config-num_rdms  !d_config-rdms).


As Ian Campbell and I have explained, that is not OK.

The caller may choose to pass both some rdms and strategy=host.


If you're talking to create the domain twice I think this should be 
reasonable,


if (d_config-num_rdms) return 0;

Because RDMs are always associated to the platform so we should keep the 
same array. I mean same configuration shouldn't change this point.


Thanks
Tiejun



The most obvious case is something like the following code

   libxl_domain_config cfg;
   cfg.stuff = blah;
   cfg.rdm.strategy = HOST;

   libxl_domain_create_new(cfg, domid);
   libxl_domain_destroy(domid);
   libxl_domain_create_new(cfg, domid);

which must work (and the second domain should have identical
configuration to the first).

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

But d_config is a libxl_domain_config which is supplied by libxl's
caller.  It might contain some rdms.


I guess this line make you or other guys confused so lets delete this
line directly.


I don't think I am very confused.


And if you still worry about something, I can add assert() at the
beginning of this function like this,

assert(!d_config-num_rdms  !d_config-rdms).


If you are sure that this assertion is correct, then that would be
proper.

But as I say above, I don't think it is.



Based on Campbell' explanation I think you guys are raising a reasonable 
concern. We shouldn't clear that over there arbitrarily.


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

  I think the confusion here is that the d_config-rdms array (which

num_rdms is the length of) is in the public API (because it is in
libxl_types.idl) but is apparently only being used in this series as an
internal state for the domain build process (i.e. xl doesn't ever add
anything to the array rdms).

Tiejun, is that an accurate summary?


Yes.



If the field is in the public API then the possibility of something
being passed in their must be considered now, even if this particular
series adds no such calls, since we cannot prevent 3rd party users of
libxl adding such configuration.

Is the possibility of the toolstack (i.e. the caller of libxl) supplying
an array of rdm regions seems to be being left aside for future work or
it not intended to ever support that?


Its very possible so you're right.
Thanks
Tiejun



Ian.




And if you still worry about something, I can add assert() at the
beginning of this function like this,

assert(!d_config-num_rdms  !d_config-rdms).


If you are sure that this assertion is correct, then that would be
proper.

But as I say above, I don't think it is.

Ian.






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

On 2015/7/21 20:33, Ian Jackson wrote:

Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts 
with RDM):

[Ian Jackson:]

The most obvious answer to me would be that if an rdms array is
specified, the strategy should be ignored.  That is how the automatic
numa placement API works.


I'm not familiar with this.

Do you mean this?

  if (d_config-num_rdms)
  strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE;

But except this global strategy, we also have per-device setting so
maybe something like this?

  if (d_config-num_rdms)
  return 0;


This latter was the kind of approach I had in mind.  I think that will
do for 4.6.



Thanks for your guideline on this patch, and also I realize lots of 
things need to be improved. So I'm going to make a TODO list publically 
after this feature freeze.


Now just for this patch, please take a final review (I just hope so.)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2f8e590..a4bd2a1 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc,

 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 634b8d2..3afa735 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -92,6 +92,280 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }

+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r = 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+GCNEW_ARRAY(*xrdm, *nr_entries);
+r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, Could not get reserved device memory maps.\n);
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+ uint64_t rdm_start, uint64_t rdm_size)
+{
+return (start + memsize  rdm_start)  (start  rdm_start + rdm_size);
+}
+
+static void
+add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config,
+  uint64_t rdm_start, uint64_t rdm_size, int rdm_policy)
+{
+d_config-num_rdms++;
+d_config-rdms = libxl__realloc(NOGC, d_config-rdms,
+d_config-num_rdms * sizeof(libxl_device_rdm));
+
+d_config-rdms[d_config-num_rdms - 1].start = rdm_start;
+d_config-rdms[d_config-num_rdms - 1].size = rdm_size;
+d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy;
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could
+ * still be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM probably scatter the whole RAM space.
+ * Especially multiple RDM entries would worsen this to lead a complicated
+ * memory layout. And then its hard to extend hvm_info_table{} to work
+ * hvmloader out. So here we're trying to figure out a simple solution to
+ * avoid breaking existing layout. So when a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * strict policy leads to fail libxl.
+ * relaxed policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ * Note when both policies are specified on a given region, the per-device
+ * policy should override the global policy.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+   libxl_domain_config *d_config

Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Chen, Tiejun

But another answer would be to take the union of the specified
regions.  That would be more complicated, because the union would have
to be computed.


   if (d_config-rdms[i].start == rdm_start)
   return;


That doesn't, of course, compute the union.


Sorry I don't understand what the take the union of the specified
regions is here.


A region is a subset of the address space.  I meant the set union:
   https://en.wikipedia.org/wiki/Union_%28set_theory%29

That is, an address would be reserved if it was reserved in any of the
rdm regions implied by the config.


Are you saying this point?

The union of two sets A and B is the set of elements which are in A, in 
B, or in both A and B.




The explicitly specified regions might overlap with the computed ones,
without being identical. Computing the union would not be entirely
trivial.



Just to your example,

  libxl_domain_config cfg;
  cfg.stuff = blah;
  cfg.rdm.strategy = HOST;

  libxl_domain_create_new(cfg, domid);
  libxl_domain_destroy(domid);

Here looks you mean d_config-rdms would be changed, right? Currently 
this shouldn't be allowed. But I think we need to further discussion 
make this case clear after feature freeze since we didn't have this kind 
of assumption in our previous design.


  libxl_domain_create_new(cfg, domid);

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Chen, Tiejun

Before you add memory_map.nr_map, you should be able to iterate
from 0 to (not inclusive) nr. At least as far as I recall the original
patch.



Sorry, I really don't understand what you want.

Before we add memory_map.nr_map, e820[0, nr) don't include low/high
memory, right?


Why? memory_map is representing the reserved areas only, isn't it?
If that's not the case, then of course everything is fine.


I'm pretty sure the memory map we get here is an extension of the
original PV-only get_e820 hypercall, which *does* include both the
lowmem and highmem regions.

In any case, it's pretty clear from the patched code that Tiejun is
removing the old code which created the lowmem and highmem regions and
is not replacing it.  Where do you think the highmem region he's
looking for was coming from?



On second thoughts, I prefer to check/sync memory_map.map[] before copy 
them into e820 since ultimately this can make sure hvm_info, 
memory_map.map[] and e820 are on the same page.


Anyway, I would send out v10 to address others so please further post 
your comments over there.


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 00/16] Fix RMRR

2015-07-20 Thread Chen, Tiejun

One brief request -- if you do end up sending this series again, can
you please rebase to staging?  This is the second time I've had to
manually fix up some patches so that they apply.



Sorry for this inconvenience.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Chen, Tiejun

v10:

* Instead of correcting e820, I'd like to correct memory_map.map[]
   and then copy them into e820 directly. I think this can make sure
   hvm_info, memory_map.map[] and e820 are on the same page.


Actually, now that you mention it -- this should probably happen
instead when we update hvm_info-{low,high}_mem_pgend.


I also considered this point previously but I thought just right now we 
only update hvm_info-low/high_mem_pgend inside pci_setup(). But you 
can't guarantee this would be a sole place in the future. Instead, 
memory_map.map[] would always be copied into e820 when we build e820 table.


So I think we'd better do this update once at the last minute.

Thanks
Tiejun



I'm happy to leave this where it is for now, so with Jan's comments
addressed (in particular, incrementing nr_map):

Reviewed-by: George Dunlap george.dun...@eu.citrix.com

But if we have time it might be better to pull the loop in pci_setup()
which moves the memory out into a function, and have that function
modify the lowmem and highmem map[] entries at the same time we modify
low_mem_pgend and high_mem_pgend.  Alternately, you could put that on
your list of clean-ups to hvmloader for 4.7.

Thanks,
  -George



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread Chen, Tiejun

On 2015/7/20 22:16, Jan Beulich wrote:

On 20.07.15 at 16:10, george.dun...@citrix.com wrote:

Hmm... although I suppose that doesn't catch the possibility of a memory
range crossing the 4G boundary.


I think we can safely ignore that - both real and virtual hardware have
special regions right below 4Gb, so neither RAM not RMRRs can be
reasonably placed there.



Okay, I regenerate this patch online. And I just hope its good to be 
acked here:


hvmloader/pci: Try to avoid placing BARs in RMRRs

Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
  in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
  the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.

Signed-off-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 tools/firmware/hvmloader/pci.c | 65 
++

 1 file changed, 65 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..74fc080 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,46 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;

+/* Check if the specified range conflicts with any reserved device 
memory. */

+static bool check_overlap_all(uint64_t start, uint64_t size)
+{
+unsigned int i;
+
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED 
+ check_overlap(start, size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+return true;
+}
+
+return false;
+}
+
+/* Find the lowest RMRR higher than base. */
+static int find_next_rmrr(uint32_t base)
+{
+unsigned int i;
+int next_rmrr = -1;
+uint64_t end, min_end = (1ull  32);
+
+for ( i = 0; i  memory_map.nr_map ; i++ )
+{
+end = memory_map.map[i].addr + memory_map.map[i].size;
+
+if ( memory_map.map[i].type == E820_RESERVED 
+ end  base 
+ min_end  min_end )
+{
+next_rmrr = i;
+min_end = end;
+}
+}
+
+return next_rmrr;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -46,6 +86,7 @@ void pci_setup(void)
 uint32_t vga_devfn = 256;
 uint16_t class, vendor_id, device_id;
 unsigned int bar, pin, link, isa_irq;
+int next_rmrr;

 /* Resources assignable to PCI devices via BARs. */
 struct resource {
@@ -299,6 +340,15 @@ void pci_setup(void)
 || (((pci_mem_start  1)  PAGE_SHIFT)
 = hvm_info-low_mem_pgend)) )
 pci_mem_start = 1;
+
+/*
+ * Try to accomodate RMRRs in our MMIO region on a best-effort 
basis.
+ * If we have RMRRs in the range, then make pci_mem_start just 
after

+ * hvm_info-low_mem_pgend.
+ */
+if ( pci_mem_start  (hvm_info-low_mem_pgend  PAGE_SHIFT) 
+ check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) )
+pci_mem_start = hvm_info-low_mem_pgend  PAGE_SHIFT;
 }

 if ( mmio_total  (pci_mem_end - pci_mem_start) )
@@ -352,6 +402,8 @@ void pci_setup(void)
 io_resource.base = 0xc000;
 io_resource.max = 0x1;

+next_rmrr = find_next_rmrr(pci_mem_start);
+
 /* Assign iomem and ioport resources in descending order of size. */
 for ( i = 0; i  nr_bars; i++ )
 {
@@ -407,6 +459,19 @@ void pci_setup(void)
 }

 base = (resource-base  + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
+
+/* If we're using mem_resource, check for RMRR conflicts. */
+while ( resource == mem_resource 
+next_rmrr = 0 
+check_overlap(base, bar_sz,
+  memory_map.map[next_rmrr].addr,
+  memory_map.map[next_rmrr].size) )
+{
+base = memory_map.map[next_rmrr].addr + 
memory_map.map[next_rmrr].size;

+base = (base + bar_sz - 1)  ~(bar_sz - 1);
+next_rmrr = find_next_rmrr(base);
+}
+
 bar_data |= (uint32_t)base;
 bar_data_upper = (uint32_t)(base  32);
 base += bar_sz;
--
1.9.1

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Chen, Tiejun
Looks just a little bit should be changed so I also paste this new 
online to try winning your Acked here,



hvmloader/e820: construct guest e820 table

Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 tools/firmware/hvmloader/e820.c | 109 
+++-

 1 file changed, 96 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c 
b/tools/firmware/hvmloader/e820.c

index 7a414ab..a6cacdf 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info-low_mem_pgend  PAGE_SHIFT;
+uint32_t add_high_mem = 0;
+uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend  
PAGE_SHIFT;

+uint64_t map_start, map_size, map_end;

 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;

-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info-low_mem_pgend  PAGE_SHIFT)  (2u  20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info-low_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;

-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,102 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }

+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));

-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ *
+ * Note we just have one low memory entry and one high mmeory entry if
+ * exists.
+ *
+ * But we may have relocated RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range firstly.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;

-e820[nr].type = E820_RAM;
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+/* If we need to adjust lowmem. */
+if ( memory_map.map[i].type == E820_RAM 
+ low_mem_end  map_start  low_mem_end  map_end )
+{
+add_high_mem = map_end - low_mem_end;
+memory_map.map[i].size = low_mem_end - map_start;
+break;
+}
+}
+
+/* If we need to adjust highmem. */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+if ( memory_map.map[i].type == E820_RAM 
+ map_start == ((uint64_t)1  32))
+{
+memory_map.map[i].size += add_high_mem;
+break;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( i == memory_map.nr_map )
+{
+memory_map.map[i].addr = ((uint64_t)1  32);
+memory_map.map[i].size = add_high_mem;
+memory_map.map[i].type = E820_RAM;
+memory_map.nr_map++;
+}
+
+/* A sanity 

Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-20 Thread Chen, Tiejun


Note I need more time to address others.


+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+   libxl_domain_config *d_config,
+   uint64_t rdm_mem_boundary,
+   struct xc_hvm_build_args *args)
+{

...

+/* Query all RDM entries in this platform */
+if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {

...

+} else {
+d_config-num_rdms = 0;
+}


Does this not override the domain configuration's num_rdms ?  I don't


We don't have the specific num_rdms parameter in .cfg so I don't 
understand what you mean here.


Thanks
Tiejun


think that is correct.

If the domain configuration has rdms and num_rdms already set, then
the strategy should presumably be ignored.  (Passing the same domain
configuration struct to libxl_domain_create_new, after destroying the
domain, ought to work, even after the first call has modified it.)





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Chen, Tiejun

Actually, now that you mention it -- this should probably happen
instead when we update hvm_info-{low,high}_mem_pgend.



I also considered this point previously but I thought just right now we only
update hvm_info-low/high_mem_pgend inside pci_setup(). But you can't
guarantee this would be a sole place in the future. Instead,
memory_map.map[] would always be copied into e820 when we build e820 table.


We can guarantee it, if nothing else by making sure that no new
changes are added which change the one but not the other.


This means you have to syn this again once you change hvm_info so I 
think this may cost a little bit.


Thanks
Tiejun



But perhaps better would be to put a check in build_e820_map() to
BUG() if hvm_info and memory_map are out of sync.

On the other hand, looking at this now, I think that long-term we
should probably move to have *all* information about the memory layout
passed to hvmloader via the memory map, rather than via hvm_info.
That way we can also get rid of the magic knowledge that hvmloader
has about the memory layout (e.g., the VGA hole).

  -George



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread Chen, Tiejun

+/* Find the lowest RMRR higher than base. */
+static int find_next_rmrr(uint32_t base)
+{
+unsigned int i;
+int next_rmrr = -1;
+uint64_t min_base = (1ull  32);
+
+for ( i = 0; i  memory_map.nr_map ; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED 
+ memory_map.map[i].addr  base 
+ memory_map.map[i].addr  min_base )
+{
+next_rmrr = i;
+min_base = memory_map.map[i].addr;
+}
+}
+return next_rmrr;
+}


Considering _both_ callers, I think the function should actually return
the lowest RMRR higher than or equal to base.


You mean instead of strictly greater than the base.


Or wait - we actually
need to find the lowest RMRR the _end_ of which is higher than base.


Yes, you're right: there's always a risk that pci_mem_start will *start*
in the middle of a range.  Looking for the next *end* is more robust.



Sorry this is not very clear to me so are you saying something like this?

for ( i = 0; i  memory_map.nr_map ; i++ )
{
end = memory_map.map[i].addr + memory_map.map[i].size;

if ( memory_map.map[i].type == E820_RESERVED 
 end  base 
 memory_map.map[i].addr  min_base )
{
next_rmrr = i;
min_base = memory_map.map[i].addr;
}
}


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-20 Thread Chen, Tiejun

For clarity:

I am not acking this patch, primarily because I am not happy with the
code in xlu_rdm_parse which is (a) the result of repeated
clone-and-hack and (b) consists of ad-hoc string pointer fiddling.


Yes, I knew you mentioned this previously but I also remember our last 
deal was something as follows:


 Really I would prefer that this parsing was done with a miniature flex
 parser, rather than ad-hoc pointer arithmetic and use of strtok.

 Sorry, could you show this explicitly?

 Something like what was done for disk devices.  See libxlu_disk_l.l
 for an example.  In this case your code would be a lot less
 complicated than what you see there.

 After the codefreeze I would probably have some time to write it for

Sounds yourself would do this so currently I just keep the original, right?

Thanks
Tiejun

 you.  (I think that would be valuable because libxlu_disk_l.l is a
 very complicated example, and I want be able to point future
 submitters at something simpler.)

 Ian.

Then I didn't receive any response again so I thought yourself made this 
promise.


Thanks
Tiejun



If I had been able to review this patch earlier in the release cycle I
would be explictly nacking this patch.  It is true that maybe someone
will have some time to clean this up later; but in practice it often
turns out that they don't - which is why we usually try not to accept
patches on the basis of promises to do further cleanup.

However, I am late to this party.  I first made this complaint in
response to v7 on the 9th of July.  Under the circumstances I am going
to stand aside and neither ack nor nack this patch.

The rules then say that the patch may be committed given that it has
Wei's ack as a tools maintainer.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-20 Thread Chen, Tiejun

+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+   libxl_domain_config *d_config,
+   uint64_t rdm_mem_boundary,
+   struct xc_hvm_build_args *args)
+{

...

+/* Query all RDM entries in this platform */
+if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) {

...

+} else {
+d_config-num_rdms = 0;
+}


Does this not override the domain configuration's num_rdms ?  I don't


We don't have the specific num_rdms parameter in .cfg so I don't
understand what you mean here.


The domain configuration specified to libxl might contain some rdms.
Then num_rdms in the incoming config would be nonzero.


We never set d_config-num_rdms/d_config-rdms before we goes inside 
libxl__domain_device_construct_rdm(). And actually 
libxl__domain_device_construct_rdm is only one place to set 
d_config-num_rdms/d_config-rdms.


I guess this line make you or other guys confused so lets delete this 
line directly.


And if you still worry about something, I can add assert() at the 
beginning of this function like this,


assert(!d_config-num_rdms  !d_config-rdms).

Thanks
Tiejun



So I think there are two problems here:

1. If that were the case you would leak the application's rdms array.

2. Anyway, if the caller specifies such an array you should use it.
(Fixing this would avoid (1) in any case.)

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread Chen, Tiejun

Okay, I regenerate this patch online. And I just hope its good to be
acked here:


Provided it also works,
Reviewed-by: Jan Beulich jbeul...@suse.com



Why is this marked as Acked-by this time? The same question is raised to 
another hvmloader patch as well.


This really makes me confused since you're the key maintainer associated 
to this, and I remember you also gave me Acked-by to the first hvmloader 
patch. I know this solution is always argued, so does this mean you 
still don't think this is good to go in the tree in your perspective, so 
you want to leave this Acked-by to other maintainers, right?


And what about patch #7, hvmloader/e820: construct guest e820 table, is 
this also not fine to you?


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-17 Thread Chen, Tiejun

---
v9:

* A little improvement to code implementation but again, its still argued about
   this solution.


And as said in reply to George's reply to v8 - the alternative he
proposed is still better than this one, and would therefore have
better chances of me agreeing to take what is there instead of
pushing for a proper solution.



Looks I just need to pick up George' patch as our solution at least to 
eliminate all arguments in current cycle.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Requesting for freeze exception for RMRR

2015-07-17 Thread Chen, Tiejun

My main disagreement here continues to be that we're talking
about a bug fix, and hence I don't view this as needing a freeze
exception in the first place (at least not at this point in time). Yes,
the bug fix involves adding code that looks like a new feature, but
that happens with bug fixes.



Fine then. I'm not going to argue feature vs bug fix at this stage.  The
final resolution is still the same. Tiejun can continue working on this
next week.



Wei and Jan,

Really thanks for your clarification to this case.

Looks two key problems should be addressed as follows:

#1. mmio conflicting with RDM

As Jan suggested George's patch is fine in this phrase.

#2. construct e820

I need to understand what Jan comments properly than resend a patch.

I'm going to finalize these things next week as early as possible.

Again, thanks for all guys's help and I can't walk here without any your 
guides.


Thanks
Tiejun



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-17 Thread Chen, Tiejun

On 2015/7/17 18:50, Jan Beulich wrote:

On 17.07.15 at 11:09, tiejun.c...@intel.com wrote:

And then of course there's the question of whether nr is really
the right upper loop bound here: Just prior to this you added
the hypervisor supplied entries - why would you need to iterate
over them here? I.e. I'd see this better be moved ahead of that
other code.



Sounds you mean I should sync low/high memory in memory_map.map[]
beforehand and then fulfill e820 like this,


Why would you want/need to sync into memory_map.map[]?


But actually I just felt this make our process clear.


That's certainly not what I suggested.



Do you mean I should check low/high mem before we add the hypervisor 
supplied entries like this?


+for ( i = nr-1; i  memory_map.nr_map; i-- )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+
+if ( e820[i].type == E820_RAM 
+ low_mem_end  e820[i].addr  low_mem_end  end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+break;
+}
+}
+
+/* And then we also need to adjust highmem. */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = nr-1 ; i  memory_map.nr_map; i-- )
+{
+if ( e820[i].type == E820_RAM 
+ e820[i].addr == ((uint64_t)1  32))
+{
+e820[i].size += add_high_mem;
+break;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( i == memory_map.nr_map )
+{
+e820[nr].addr = ((uint64_t)1  32);
+e820[nr].size = add_high_mem;
+e820[nr].type = E820_RAM;
+i = nr;
+nr++;
+}
+
+/* A sanity check if high memory is broken. */
+BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
+}

Thanks
Tiejun






___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Requesting for freeze exception for RMRR

2015-07-17 Thread Chen, Tiejun

The PCI allocation code is in a state, but it was in a similarly bad
state before.  I agree with Jan's point of the risk that these new
changes cause a regression in booting guests, although we can mitigate
that somewhat by testing.

I feel at this point that we shouldn't block the RMRR bugfix on also
fixing the PCI allocation algorithm (which was a pre-existing issue).

Therefore, I recommend that v9 gets respun to v10 to address the current
comments, and accepted.  Afterwards, the PCI allocation algorithm gets
worked on as a bugfix activity, to pro actively cater for the risk of
regression.


If you have any good solution to fix the PCI allocation algorithm 
completely, things couldn't be better :)


Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-17 Thread Chen, Tiejun

The way it's written I take it that you assume there to be exactly
one region that the adjustment needs to be done for. Iirc this is
correct with the current model, but why would you continue the
loop then afterwards? Placing a break in the if()'s body would
document the fact that only one such region should exist, and
would eliminate questions as to whether add_high_mem shouldn't
be updated (+=) instead of simply being assigned a new value.


Yes, break should be added here.



And then of course there's the question of whether nr is really
the right upper loop bound here: Just prior to this you added
the hypervisor supplied entries - why would you need to iterate
over them here? I.e. I'd see this better be moved ahead of that
other code.



Sounds you mean I should sync low/high memory in memory_map.map[] 
beforehand and then fulfill e820 like this,


diff --git a/tools/firmware/hvmloader/e820.c 
b/tools/firmware/hvmloader/e820.c

index 7a414ab..b0aa48d 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info-low_mem_pgend  PAGE_SHIFT;
+uint32_t add_high_mem = 0;
+uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend  
PAGE_SHIFT;

+uint64_t map_start, map_size, map_end;

 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;

-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info-low_mem_pgend  PAGE_SHIFT)  (2u  20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info-low_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;

-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }

+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));

-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ *
+ * Note we just have one low memory entry and one high mmeory entry if
+ * exists.
+ *
+ * But we may have relocated RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range firstly.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;

-e820[nr].type = E820_RAM;
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_end;
+
+/* If we need to adjust lowmem. */
+if ( memory_map.map[i].type == E820_RAM 
+ low_mem_end  map_start  low_mem_end  map_end )
+{
+add_high_mem = map_end - low_mem_end;
+memory_map.map[i].size = low_mem_end - map_start;
+break;
+}
+}
+
+/* If we need to adjust highmem. */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_end;
+
+if ( memory_map.map[i].type == E820_RAM 
+ map_start == ((uint64_t)1  32))
+{
+memory_map.map[i].size += add_high_mem;
+break;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( i == memory_map.nr_map )
+{
+memory_map.map[i].addr = ((uint64_t)1  32);
+memory_map.map[i].size = add_high_mem;
+memory_map.map[i].type = E820_RAM;
+}
+
+/* A sanity check if high memory is broken. */
+BUG_ON( high_mem_end !=
+memory_map.map[i].addr + memory_map.map[i].size);
+}
+
+/* Now fulfill e820. */
+for ( i = 0; i  memory_map.nr_map; i++ )

Re: [Xen-devel] Requesting for freeze exception for RMRR

2015-07-17 Thread Chen, Tiejun

I think Andrew means you (or someone else) improves that algorithm
later. No need to provide a perfect solution next week.



Yes, I understand what he mean. But I still want to further ask if he 
have such a good idea right now, maybe we can try to address that 
directly :)


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-17 Thread Chen, Tiejun

+for ( i = nr-1; i  memory_map.nr_map; i-- )


Before you add memory_map.nr_map, you should be able to iterate
from 0 to (not inclusive) nr. At least as far as I recall the original
patch.



Sorry, I really don't understand what you want.

Before we add memory_map.nr_map, e820[0, nr) don't include low/high 
memory, right? So sounds you want me to


for ( i = 0 i  memory_map.nr_map; i++ )
{
if we need to adjust low memory, we just set final low e820 entry;
if we need to adjust high memory, we just set final high e820 entry;
}

Right? But its impossible to do this since we can't assume 
memory_map.map[low memory] is always prior to memory_map.map[high memory].


If I still follow your way, please don't mind to show a pseudocode help 
me understand what you want.


Thanks
Tiejun



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-17 Thread Chen, Tiejun

Remind me again please - what prevents the highmem region from
colliding with hypervisor supplied entries?

Also, what if the resulting region exceeds the addressable range
(guest's view of CPUID[8008].EAX[0:7])?


Any idea to this? I think this issue also exists previously.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-17 Thread Chen, Tiejun



On 2015/7/18 0:06, Jan Beulich wrote:

On 17.07.15 at 17:54, tiejun.c...@intel.com wrote:
+for ( i = nr-1; i  memory_map.nr_map; i-- )


Before you add memory_map.nr_map, you should be able to iterate
from 0 to (not inclusive) nr. At least as far as I recall the original
patch.



Sorry, I really don't understand what you want.

Before we add memory_map.nr_map, e820[0, nr) don't include low/high
memory, right?


Why? memory_map is representing the reserved areas only, isn't it?


+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists

Am I wrong with your expectation?

Thanks
Tiejun


If that's not the case, then of course everything is fine.

Jan




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-16 Thread Chen, Tiejun

On 2015/7/16 17:40, George Dunlap wrote:

On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun tiejun.c...@intel.com wrote:

Could you take a look at the original patch #06 ?  Although Jan thought that
is complicated, that is really one version that I can refine in current time
slot.


When you say original, which version are you talking about?  You
mean the one at the base of this thread (v7)?



Yes, I'm pointing patch #6 in v7. And sorry to make this confused to you.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-16 Thread Chen, Tiejun

On 2015/7/16 15:40, Jan Beulich wrote:

On 16.07.15 at 08:52, tiejun.c...@intel.com wrote:

@@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl(
  seg = machine_sbdf  16;
  bus = PCI_BUS(machine_sbdf);
  devfn = PCI_DEVFN2(machine_sbdf);
+flag = domctl-u.assign_device.flag;
+if ( flag  XEN_DOMCTL_DEV_RDM_RELAXED )


Didn't we settle on flag  ~XEN_DOMCTL_DEV_RDM_RELAXED?


Sorry its my fault to miss this merge.

BTW, could I resend this patch separately to get your Ack? If you don't 
have other objections.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 00/16] Fix RMRR

2015-07-16 Thread Chen, Tiejun

On 2015/7/16 15:55, Jan Beulich wrote:

On 10.07.15 at 16:50, george.dun...@eu.citrix.com wrote:

On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen tiejun.c...@intel.com wrote:

v7:


It looks like most of the libxl/libxc patches have been acked.  It
seems to me that most of the hypervisor patches (1-3, 14-15) are
either ready to go in or pretty close.


Now that I looked over v8 I have to admit that if I was a tools
maintainer I wouldn't want to see some of the tools patches in
with just an ack, but without any review.


I'm somewhat confused at this point.

Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch. It is a 
record that the acker has at least reviewed the patch and has indicated 
acceptance.


Does this imply this is already reviewed?

Thanks
Tiejun  


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 00/16] Fix RMRR

2015-07-16 Thread Chen, Tiejun

It looks like most of the libxl/libxc patches have been acked.  It
seems to me that most of the hypervisor patches (1-3, 14-15) are
either ready to go in or pretty close.


Now that I looked over v8 I have to admit that if I was a tools
maintainer I wouldn't want to see some of the tools patches in
with just an ack, but without any review.


I'm somewhat confused at this point.

Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch. It is a
record that the acker has at least reviewed the patch and has indicated
acceptance.

Does this imply this is already reviewed?


No, that would be expressed by Reviewed-by. Acked-by merely
means no objection by the maintainer for the change to go in.



Sorry I'm trying to dig into this.

If nobody would like to take a look at this, so isn't this the 
associated maintainer's responsibility to review finally? In this case 
isn't Acked-by fine enough?


Or you still want to us add two lines explicitly,

Reviewed-by: A
Acked-by: A


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-16 Thread Chen, Tiejun

+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i  nr; i++ )
+{
+if ( e820[i].type == E820_RAM 
+ e820[i].addr == (1ull  32))
+{
+e820[i].size += add_high_mem;
+add_high_mem = 0;
+break;
+}
+}
+}
+
+/* Or this is just populated by hvmloader itself. */


This should probably say something like:

If there was no highmem region, we need to create one.


Okay, If there was no highmem entry, we need to create one.




+if ( add_high_mem )
+{
+/*
+ * hvmloader should always update hvm_info-high_mem_pgend
+ * when it relocates RAM anywhere.
+ */
+BUG_ON( !hvm_info-high_mem_pgend );
+
  e820[nr].addr = ((uint64_t)1  32);
  e820[nr].size =
  ((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;


In theory add_high_mem and hvm_info-high_mem_pgend  PAGE_SHIFT -
4GiB is the same, but it seems like asking for trouble to assume so


No, its not true in the first if( add_high_mem ) conditional.

Before we enter hvmloader, there are two cases:

#1. hvm_info-high_mem_pgend == 0

So we wouldn't have a highmem entry in e820. But hvmloader may relocate 
RAM upward highmem (add_high_mem) to get sufficient mmio, so 
hvm_info-high_mem_pgend is expanded somewhere (4GiB + add_high_mem).


Then we would fall into the second if( add_high_mem ) conditional.

#2. hvm_info-high_mem_pgend != 0

We always walk into the first if( add_high_mem ) conditional. But here 
add_high_mem just represents that highmem section expanded by 
hvmloader, its really not the whole higmem:(hvm_info-high_mem_pgend  
PAGE_SHIFT - 4GiB).


Thanks
Tiejun


without checking.

Perhaps in the first if( add_high_mem ) conditional, you can
BUG_ON(add_high_mem != ((hvm_info-high_mem_pgend  PAGE_SHIFT) -
(ull1  32))) ?

Other than that, this looks good, thanks.

  -George



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun

  Except that this isn't valid C (no statement following the label). I can

accept goto-s for some error handling cases where the alternatives
might be considered even more ugly than using goto. But the way
this or your original proposal look, I'd rather not have goto-s used
like this.



What about this?

+bool is_conflict = false;

 for ( devfn = 0; devfn  256; devfn++ )
 {
@@ -60,7 +61,7 @@ static void disable_conflicting_devices(void)
 continue;

 /* Check all bars */
-for ( bar = 0; bar  7; bar++ )
+for ( bar = 0; bar  7  !is_conflict; bar++ )
 {
 bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
 if ( bar == 6 )
@@ -89,7 +90,7 @@ static void disable_conflicting_devices(void)
 bar_sz = pci_readl(devfn, bar_reg);
 bar_sz = PCI_BASE_ADDRESS_MEM_MASK;

-for ( i = 0; i  memory_map.nr_map ; i++ )
+for ( i = 0; i  memory_map.nr_map  !is_conflict; i++ )
 {
 if ( memory_map.map[i].type == E820_RESERVED )
 {
@@ -105,13 +106,13 @@ static void disable_conflicting_devices(void)
devfn3, devfn7, bar_reg, bar_data);
 cmd = pci_readw(devfn, PCI_COMMAND);
 pci_writew(devfn, PCI_COMMAND, ~cmd);
-/* Jump next device. */
-goto check_next_device;
+/* So need to jump next device. */
+is_conflict = true;
 }
 }
 }
 }
- check_next_device:
+is_conflict = false;
 }
 }

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun

+for ( i = 0; i  memory_map.nr_map ; i++ )
+{
+if ( memory_map.map[i].type != E820_RAM )


Here we're assuming that any region not marked as RAM is an RMRR.  Is that true?

In any case, it would be just as strange to have a device BAR overlap
with guest RAM as with an RMRR, wouldn't it?


OOPS! Actually I should take this,

if ( memory_map.map[i].type == E820_RESERVED )

This is same as when I check [RESERVED_MEMORY_DYNAMIC_START, 
RESERVED_MEMORY_DYNAMIC_END).





+{
+uint64_t reserved_start, reserved_size;
+reserved_start = memory_map.map[i].addr;
+reserved_size = memory_map.map[i].size;
+if ( check_overlap(bar_data , bar_sz,
+   reserved_start, reserved_size) )
+{
+is_conflict = true;
+/* Now disable the memory or I/O mapping. */
+printf(pci dev %02x:%x bar %02x : 0x%08x : conflicts 
+   reserved resource so disable this device.!\n,
+   devfn3, devfn7, bar_reg, bar_data);
+cmd = pci_readw(devfn, PCI_COMMAND);
+pci_writew(devfn, PCI_COMMAND, ~cmd);
+break;
+}
+}
+
+/* Jump next device. */
+if ( is_conflict )
+{
+is_conflict = false;
+break;
+}


This conditional is still inside the memory_map loop; you want it one
loop futher out, in the bar loop, don't you?


Here what I intended to do is if one of all bars specific to one given 
device already conflicts with RDM, its not necessary to continue check 
other remaining bars of this device and other RDM regions, we just 
disable this device simply then check next device.




Also, if you declare is_conflict inside the devfn loop, rather than in
the main function, then you don't need this is_conflict=false here.

It might also be more sensible to use a goto instead; but this is one


This can work for me so it may be as follows:

for ( devfn = 0; devfn  256; devfn++ )
{
 check_next_device:
vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
device_id = pci_readw(devfn, PCI_DEVICE_ID);
if ( (vendor_id == 0x)  (device_id == 0x) )
continue;
...
if ( check_overlap(bar_data , bar_sz,
   reserved_start, reserved_size) )
{
...
/* Jump next device. */
devfn++;
goto check_next_device;
}



where Jan will have a better idea what standard practice will be.



I can follow that again if Jan has any good implementation.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-16 Thread Chen, Tiejun

Yes, sorry, add_high_mem will be the size of memory *relocated*, not
the actual end of it (unless, as you say, the original highmem region
didn't exist).

What I really meant was that either way, after adjusting the highmem
region in the e820, the end of that region should correspond to
hvm_info-high_mem_pgend.

What about something like this?
---
 /*
  * And then we also need to adjust highmem.
  */
 if ( add_high_mem )
 {
 /*
  * Modify the existing highmem region if it exists
  */
 for ( i = 0; i  nr; i++ )
 {
 if ( e820[i].type == E820_RAM 
  e820[i].addr == (1ull  32))
 {
 e820[i].size += add_high_mem;
 break;
 }
 }

 /*
  * If we didn't find a highmem region, make one
  */
 if ( i == nr )
 {
 e820[nr].addr = ((uint64_t)1  32);
 e820[nr].size = e820[nr].addr + add_high_mem;
 e820[nr].type = E820_RAM;
 nr++;
 }

 /*
  * Either way, at this point i points to the entry containing
  * highmem.  Compare it to what's in hvm_info as a sanity
  * check.
  */
 BUG_ON(e820[i].addr+e820[i].size !=
((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT));
 }



Looks really better.

I just introduce a little change based on yours, and I post this as a whole,

diff --git a/tools/firmware/hvmloader/e820.c 
b/tools/firmware/hvmloader/e820.c

index 7a414ab..8c9b01f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info-low_mem_pgend  PAGE_SHIFT;
+uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend  
PAGE_SHIFT;

+uint64_t add_high_mem = 0;

 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA;
@@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;

-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info-low_mem_pgend  PAGE_SHIFT)  (2u  20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info-low_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;

-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }

-
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) - 
e820[nr].addr;

-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }

+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));
+
+/*
+ * Its possible to relocate RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range in e820.
+ */
+for ( i = 0; i  nr; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM 
+ low_mem_end  e820[i].addr  low_mem_end  end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i  nr; i++ )
+{
+if ( e820[i].type == E820_RAM 
+ e820[i].addr == ((uint64_t)1  32))
+{
+e820[i].size += add_high_mem;
+break;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( i == nr )
+{
+e820[nr].addr = ((uint64_t)1  32);
+e820[nr].size = high_mem_end  - e820[nr].addr;
+e820[nr].type = E820_RAM;
+

Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun

What about this?


Looks reasonable (but don't forget that I continue to be unconvinced
that the patch as a whole makes sense).


Yes, I always keep this in my mind as I mentioned in patch #00. Any risk 
you're still concerning? Is it that case if guest OS force enabling 
these devices again? IMO, at this point there are two cases:


#1. Without passing through a RMRR device

Those emulated devices don't create 1:1 mapping so its safe, right?

#2. With passing through a RMRR device

This just probably cause these associated devices not to work well, but 
still don't bring any impact to other Domains, right? I mean this isn't 
going to worsen the preexisting situation.


If I'm wrong please correct me.

Thanks
Tiejun



Jan


+bool is_conflict = false;

   for ( devfn = 0; devfn  256; devfn++ )
   {
@@ -60,7 +61,7 @@ static void disable_conflicting_devices(void)
   continue;

   /* Check all bars */
-for ( bar = 0; bar  7; bar++ )
+for ( bar = 0; bar  7  !is_conflict; bar++ )
   {
   bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
   if ( bar == 6 )
@@ -89,7 +90,7 @@ static void disable_conflicting_devices(void)
   bar_sz = pci_readl(devfn, bar_reg);
   bar_sz = PCI_BASE_ADDRESS_MEM_MASK;

-for ( i = 0; i  memory_map.nr_map ; i++ )
+for ( i = 0; i  memory_map.nr_map  !is_conflict; i++ )
   {
   if ( memory_map.map[i].type == E820_RESERVED )
   {
@@ -105,13 +106,13 @@ static void disable_conflicting_devices(void)
  devfn3, devfn7, bar_reg, bar_data);
   cmd = pci_readw(devfn, PCI_COMMAND);
   pci_writew(devfn, PCI_COMMAND, ~cmd);
-/* Jump next device. */
-goto check_next_device;
+/* So need to jump next device. */
+is_conflict = true;
   }
   }
   }
   }
- check_next_device:
+is_conflict = false;
   }
   }

Thanks
Tiejun







___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-16 Thread Chen, Tiejun



On 2015/7/16 23:16, George Dunlap wrote:

On 07/16/2015 04:04 PM, Chen, Tiejun wrote:

Yes, sorry, add_high_mem will be the size of memory *relocated*, not
the actual end of it (unless, as you say, the original highmem region
didn't exist).

What I really meant was that either way, after adjusting the highmem
region in the e820, the end of that region should correspond to
hvm_info-high_mem_pgend.

What about something like this?
---
  /*
   * And then we also need to adjust highmem.
   */
  if ( add_high_mem )
  {
  /*
   * Modify the existing highmem region if it exists
   */
  for ( i = 0; i  nr; i++ )
  {
  if ( e820[i].type == E820_RAM 
   e820[i].addr == (1ull  32))
  {
  e820[i].size += add_high_mem;
  break;
  }
  }

  /*
   * If we didn't find a highmem region, make one
   */
  if ( i == nr )
  {
  e820[nr].addr = ((uint64_t)1  32);
  e820[nr].size = e820[nr].addr + add_high_mem;
  e820[nr].type = E820_RAM;
  nr++;
  }

  /*
   * Either way, at this point i points to the entry containing
   * highmem.  Compare it to what's in hvm_info as a sanity
   * check.
   */
  BUG_ON(e820[i].addr+e820[i].size !=
 ((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT));
  }



Looks really better.

I just introduce a little change based on yours, and I post this as a
whole,

diff --git a/tools/firmware/hvmloader/e820.c
b/tools/firmware/hvmloader/e820.c
index 7a414ab..8c9b01f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
   unsigned int lowmem_reserved_base,
   unsigned int bios_image_base)
  {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info-low_mem_pgend  PAGE_SHIFT;
+uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend 
PAGE_SHIFT;
+uint64_t add_high_mem = 0;

  if ( !lowmem_reserved_base )
  lowmem_reserved_base = 0xA;
@@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
  e820[nr].type = E820_RESERVED;
  nr++;

-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info-low_mem_pgend  PAGE_SHIFT)  (2u  20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info-low_mem_pgend  PAGE_SHIFT) -
e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
  /*
   * Explicitly reserve space for special pages.
   * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
  nr++;
  }

-
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
  {
-e820[nr].addr = ((uint64_t)1  32);
-e820[nr].size =
-((uint64_t)hvm_info-high_mem_pgend  PAGE_SHIFT) -
e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
  nr++;
  }

+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));
+
+/*
+ * Its possible to relocate RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less
than
+ * the original we need to revise low/high memory range in e820.
+ */
+for ( i = 0; i  nr; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM 
+ low_mem_end  e820[i].addr  low_mem_end  end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i  nr; i++ )
+{
+if ( e820[i].type == E820_RAM 
+ e820[i].addr == ((uint64_t)1  32))
+{
+e820[i].size += add_high_mem;
+break;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( i == nr )
+{
+e820[nr].addr

Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-16 Thread Chen, Tiejun

Honestly I didn't try to change that point but maybe I'm missing something?


Yes, you are missing something. :-)  I told you exactly what I wanted
changed and what I said could remain the same:


By all means, calculate high_mem_end so it's easier to read.  But then,
when creating a new region, set e820[nr].size = add_high_mem, so that
the BUG_ON() that follows actually checks something useful.


Just to be clear, I want the second if() statement to look like this:


+if ( i == nr )
+{
+e820[nr].addr = ((uint64_t)1  32);
+e820[nr].size = add_high_mem;


Ahh, when you're replying this, I also see this difference and realize 
what you meant. Sorry to this inconvenience and I'll sync this line into 
my tree :)


Thanks
Tiejun


+e820[nr].type = E820_RAM;
+nr++;
+}


Think about why and maybe that will help you understand what I'm talking
about.

  -George



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun

Here what I intended to do is if one of all bars specific to one given
device already conflicts with RDM, its not necessary to continue check other
remaining bars of this device and other RDM regions, we just disable this
device simply then check next device.


I know what you're trying to do; what I'm saying is I don't think it
does what you want it to do.

You have loops nested 3 deep:
1. for each dev
   2.  for each bar
 3. for each memory range

This conditional is in loop 3; you want it to be in loop 2.

(In fact, when you set is_conflict, you then break out of loop 3 back
into loop 2; so this code will never actually be run.)


Sorry I should make this clear last time.

I mean I already knew what you were saying is right at this point so I 
tried to use goto to fix this bug.




   Also, if you declare is_conflict inside the devfn loop, rather than in

the main function, then you don't need this is_conflict=false here.

It might also be more sensible to use a goto instead; but this is one





[snip]


I'm not a fan of hard-coding the loop continuing condition like this;
if I were going to do a goto, I'd want to go to the end of the loop.



I guess something like this,

...
pci_writew(devfn, PCI_COMMAND, ~cmd);
/* Jump next device. */
goto check_next_device;
}
}
}
}
 check_next_device:
}
}


Anyway, the code is OK as it is; I'd rather spend time working on
something that's more of a blocker.



Thanks
Tiejun



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun

On 2015/7/16 23:39, George Dunlap wrote:

On 07/16/2015 04:20 PM, Chen, Tiejun wrote:

What about this?


Looks reasonable (but don't forget that I continue to be unconvinced
that the patch as a whole makes sense).


Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
you're still concerning? Is it that case if guest OS force enabling
these devices again? IMO, at this point there are two cases:

#1. Without passing through a RMRR device

Those emulated devices don't create 1:1 mapping so its safe, right?

#2. With passing through a RMRR device

This just probably cause these associated devices not to work well, but
still don't bring any impact to other Domains, right? I mean this isn't
going to worsen the preexisting situation.

If I'm wrong please correct me.


But I think the issue is, without doing *something* about MMIO
collisions, the feature as a whole is sort of pointless.  You can
carefully specify rdm=strategy=host,reserved=strict, but you might


I got what your mean. But there's no a good approach to bridge between 
xl and hvmloader to follow this policy. Right now, maybe just one thing 
could be tried like this,


Under hvmloader circumstance,

strict - Still set RDM as E820_RESERVED
relaxed - Set RDM as a new internal E820 flag like E820_HAZARDOUS

Then in the case of MMIO collisions

E820_RESERVED - BUG() - Stop VM
E820_HAZARDOUS - our warning messages + disable devices

I think this can make sure we always take consistent policy in each 
involved cycle.


Thanks
Tiejun


still get devices whose MMIO regions conflict with RMMRs, and there's
nothing you can really do about it.

And although I personally think it might be possible / reasonable to
check in a newly-written, partial MMIO collision avoidance patch, not
everyone might agree.  Even if I were to rewrite and post a patch
myself, they may argue that doing such a complicated re-design after the
feature freeze shouldn't be allowed.

  -George



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun

Jan and George,

That implementation to this problem in v7 is really not accepted? Yes, 
that isn't a best solution to make our original mechanism very well, but 
in high level I just think that should be a correct solution fixing this 
problem. According to recent discussion seems we have not a efficient 
way which can be put into 4.6. So instead, could your guys help me make 
that better gradually to reach our current requirement as possible?


Thanks
Tiejun

On 2015/7/17 0:40, George Dunlap wrote:

On 07/16/2015 05:08 PM, Chen, Tiejun wrote:

On 2015/7/16 23:39, George Dunlap wrote:

On 07/16/2015 04:20 PM, Chen, Tiejun wrote:

What about this?


Looks reasonable (but don't forget that I continue to be unconvinced
that the patch as a whole makes sense).


Yes, I always keep this in my mind as I mentioned in patch #00. Any risk
you're still concerning? Is it that case if guest OS force enabling
these devices again? IMO, at this point there are two cases:

#1. Without passing through a RMRR device

Those emulated devices don't create 1:1 mapping so its safe, right?

#2. With passing through a RMRR device

This just probably cause these associated devices not to work well, but
still don't bring any impact to other Domains, right? I mean this isn't
going to worsen the preexisting situation.

If I'm wrong please correct me.


But I think the issue is, without doing *something* about MMIO
collisions, the feature as a whole is sort of pointless.  You can
carefully specify rdm=strategy=host,reserved=strict, but you might


I got what your mean. But there's no a good approach to bridge between
xl and hvmloader to follow this policy. Right now, maybe just one thing
could be tried like this,

Under hvmloader circumstance,

strict - Still set RDM as E820_RESERVED
relaxed - Set RDM as a new internal E820 flag like E820_HAZARDOUS

Then in the case of MMIO collisions

E820_RESERVED - BUG() - Stop VM
E820_HAZARDOUS - our warning messages + disable devices

I think this can make sure we always take consistent policy in each
involved cycle.


A better way to communicate between xl and hvmloader is to use xenstore,
as we do for allow_memory_reallocate.  But I have very little hope we
can hash out a suitable design for that by tomorrow.

  -George




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Chen, Tiejun


  base = (resource-base  + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
+
+/* If we're using mem_resource, check for RMRR conflicts */
+while ( resource == mem_resource 
+next_rmrr  0 
+check_overlap(base, bar_sz,
+  memory_map.map[next_rmrr].addr,
+  memory_map.map[next_rmrr].size)) {
+base = memory_map.map[next_rmrr].addr +
memory_map.map[next_rmrr].size;
+base = (resource-base  + bar_sz - 1)  ~(uint64_t)(bar_sz - 1);
+next_rmrr=find_next_rmrr(base);
+}
+
  bar_data |= (uint32_t)base;
  bar_data_upper = (uint32_t)(base  32);
  base += bar_sz;



Actually this chunk of codes are really similar as what we did in my 
previous revisions from RFC ~ v3. It's just trying to skip and then 
allocate, right? As Jan pointed out, there are two key problems:


#1. All skipping action probably cause a result of no sufficient MMIO to 
allocate all devices as before.


#2. Another is that alignment issue. When the original base change to 
align to rdm_end, some spaces are wasted. Especially, these spaces could 
be allocated to other smaller bars.


This is one key reason why I had new revision started from v4 to address 
these two points :)


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Requesting for freeze exception for RMRR

2015-07-16 Thread Chen, Tiejun

On 2015/7/14 17:29, Wei Liu wrote:

On Tue, Jul 14, 2015 at 09:27:17AM +0800, Chen, Tiejun wrote:

Please work with maintainers to get those hvmloader patches acked or
reviewed.


I will do.


Maybe I need to update current status today.

I just send out v8:

* All tools comments raised by Jackson and Campbell are addressed and I 
don't see further feedback.


* On hv side, last one is reviewed by George. And looks Jan have no 
obvious objection to that. (Jan: If I'm wrong let me take it back. )


* On hvmloader side, there three patches now:
  patch #5 is reviewed by George and Acked by Jan;
  patch #6 is reviewed just for code implementation but that solution 
can't convince all maintainers. Honestly, this is a rare possibility of 
collision in real world and the original pci allocation is not good so 
its hard to reshape the original mechanism in one week. But actually we 
also have some other solutions but we need a little time to figure out 
how to step forward but I just think any solution wouldn't bring any 
change to other stuffs.

  patch #7 is pretty close to be Acked.

So what's your final determination?

Thanks
Tiejun







Note Jackson and Campbell also raised some comments to improve current
codes.

2. explain why it needs to be in this release (benefits).

RMRR mechanism was broken for a long time and this makes VM always face
security issues. In addition, those associated devices can't be passed
through to VM and even result in VM crashes.

3. explain why it doesn't break things (risks).

Our policy makes sure that system will work in the original way by default
as without the RMRR patches. And especially, this series just impacts those
platforms which have RMRR.



Your patches touch crucial path in hvmloader to build memory map. There
is risk that it may break hvmloader even if it is turned off.

I would need at least some positive test results from you to confirm if
this feature is turned off everything works as before.



Could you show what sort of test you need here? I just did boot a VM without
any RDM parameters. I just think maybe someone had this good experience to
check this.



Yes, this is the basic test I need -- at least booting a VM with RDM
off.

Feel free to do more tests and report back if you have time.

Wei.


Thanks
Tiejun





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/16 0:14, George Dunlap wrote:

On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap
george.dun...@eu.citrix.com wrote:

Would it be possible, on a collision, to have one last stab at
allocating the BAR somewhere else, without relocating memory (or
relocating any other BARs)?  At very least then an administrator could
work around this kind of thing by setting the mmio_hole larger in the
domain config.


If it's not possible to have this last-ditch relocation effort, then


Could you take a look at the original patch #06 ?  Although Jan thought 
that is complicated, that is really one version that I can refine in 
current time slot.



yes, I'd be OK with just disabling the device for the time being.



Just let me send out new patch series based this idea. We can continue 
discuss this over there but we also need to further review other 
remaining comments based on a new revision.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-15 Thread Chen, Tiejun

I think I would say:

--
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
--


I'll update this and thanks a lot.





CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---


[snip]


+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end  (2u  20));


Won't this BUG if the guest was actually given less than 2GiB of RAM?


2u  20 = 0x20, so this is 2M, not 2G :)




+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */


[snip]


+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM 
+ e820[i].addr  (1ull  32))
+e820[i].size += add_high_mem;
+}
+}


What if there was originally no high memory, but resizing the pci hole
meant we had to create a high memory region?



You're right. We need to consider this case.

Thanks
Tiejun


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
 pci_mem_start = 1;
 }

+for ( i = 0; i  memory_map.nr_map ; i++ )
+{
+uint64_t reserved_start, reserved_size;
+reserved_start = memory_map.map[i].addr;
+reserved_size = memory_map.map[i].size;
+if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+   reserved_start, reserved_size) )
+{
+printf(Reserved device memory conflicts current PCI 
memory.\n);

+BUG();
+}
+}
+
 if ( mmio_total  (pci_mem_end - pci_mem_start) )
 {
 printf(Low MMIO hole not large enough for all devices,

This is very similar to our current policy to 
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6 
since actually this is also another rare possibility in real world. Even 
I can do this as well when we handle that conflict with 
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Note its not necessary to concern high memory since we already handle 
this case in the hv code previously, and its also not affected by those 
relocated memory later since our previous policy can make sure RAM isn't 
overlapping with RDM.


Thanks
Tiejun


to co-maintainers overriding me).

Jan





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

This is very similar to our current policy to
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6
since actually this is also another rare possibility in real world. Even
I can do this as well when we handle that conflict with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6.


Sorry, here is one typo, s/#6/#5

Thanks
Tiejun



Note its not necessary to concern high memory since we already handle
this case in the hv code previously, and its also not affected by those
relocated memory later since our previous policy can make sure RAM isn't
overlapping with RDM.

Thanks
Tiejun


to co-maintainers overriding me).

Jan





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Certainly appreciate your time.

I didn't mean its wasting time at this point. I just want to express
that its hard to implement that solution in one or two weeks to walking
into 4.6 as an exception.

Note I know this feature is still not accepted as an exception to 4.6
right now so I'm making an assumption.


After all this is a bug fix (and would have been allowed into 4.5 had
it been ready in time), so doesn't necessarily need a freeze
exception (but of course the bar raises the later it gets). Rather


Yes, this is not a bug fix again into 4.6.


than rushing in something that's cumbersome to maintain, I'd much
prefer this to be done properly.



Indeed, we'd like to finalize this properly as you said. But apparently 
time is not sufficient to allow this happened. So I just suggest we can 
further seek the best solution in next phase.


Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

On 2015/7/15 19:05, George Dunlap wrote:

On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote:

On 15.07.15 at 10:59, tiejun.c...@intel.com wrote:

On 2015/7/15 16:34, Jan Beulich wrote:

On 15.07.15 at 06:27, tiejun.c...@intel.com wrote:

Furthermore, could we have this solution as follows?


Yet more special casing code you want to add. I said no to this
model, and unless you can address the issue _without_ adding
a lot of special casing code, the answer will remain no (subject


What about this?

@@ -301,6 +301,19 @@ void pci_setup(void)
   pci_mem_start = 1;
   }

+for ( i = 0; i  memory_map.nr_map ; i++ )
+{
+uint64_t reserved_start, reserved_size;
+reserved_start = memory_map.map[i].addr;
+reserved_size = memory_map.map[i].size;
+if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+   reserved_start, reserved_size) )
+{
+printf(Reserved device memory conflicts current PCI memory.\n);
+BUG();
+}
+}


So what would the cure be if someone ran into this BUG() (other
than removing the device associated with the conflicting RMRR)?
Afaics such a guest would remain permanently unbootable, which
of course is not an option.


Is not booting worse than what we have now -- which is, booting
successfully but (probably) having issues due to MMIO ranges
overlapping RMRRs?


Its really so rare possibility here since in the real world we didn't 
see any RMRR regions = 0xF000 (the default pci memory start.) And I 
already sent out a little better revision in that ensuing email so also 
please take a review if possible :)




This patch series as a whole represents a lot of work and a lot of
tangible improvements to the situation; and (unless the situation has
changed) it's almost entirely acked apart from the MMIO placement
part.  If there is a simple way that we can change hvmloader so that
most (or even many) VM/device combinations work properly for the 4.6
release, then I think it's worth considering.



Current MMIO allocation mechanism is not good. So we really need to 
reshape that, but we'd better do this with some further discussion in 
next release :)


Thanks
Tiejun



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-15 Thread Chen, Tiejun

Maybe I  can move this chunk of codes downward those actual allocation
to check if RDM conflicts with the final allocation, and then just
disable those associated devices by writing PCI_COMMAND without BUG()
like this draft code,


And what would keep the guest from re-enabling the device?



We can't but IMO,

#1. We're already posting that warning messages.

#2. Actually this is like this sort of case like, as you known, even in 
the native platform, some broken devices are also disabled by BIOS, 
right? So I think this is OS's responsibility or risk to force enabling 
such a broken device.


#3. Its really rare possibility in real world.

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   3   4   >