Re: remove alloc_vm_area v2

2020-09-25 Thread Andrew Morton
On Thu, 24 Sep 2020 15:58:42 +0200 Christoph Hellwig  wrote:

> this series removes alloc_vm_area, which was left over from the big
> vmalloc interface rework.  It is a rather arcane interface, basically
> the equivalent of get_vm_area + actually faulting in all PTEs in
> the allocated area.  It was originally added for Xen (which isn't
> modular to start with), and then grew users in zsmalloc and i915
> which seem to mostly qualify as abuses of the interface, especially
> for i915 as a random driver should not set up PTE bits directly.
> 
> Note that the i915 patches apply on top of the drm-tip tree, as that
> tree has recent conflicting commits in the same area.

Is the drm-tip material in linux-next yet?  I'm still seeing a non-trivial
reject in there at present.
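
For reference, alloc_vm_area() in mm/vmalloc.c amounted to roughly the
following (a sketch reconstructed from the pre-removal code, not
necessarily the literal source):

  static int f(pte_t *pte, unsigned long addr, void *data)
  {
          pte_t ***p = data;

          if (p) {
                  *(*p) = pte;    /* hand each PTE pointer back to the caller */
                  (*p)++;
          }
          return 0;
  }

  struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
  {
          struct vm_struct *area;

          area = get_vm_area_caller(size, VM_IOREMAP,
                                    __builtin_return_address(0));
          if (!area)
                  return NULL;

          /*
           * Construct page tables for the region and map it into init_mm,
           * i.e. pre-fault all PTEs -- the part on top of get_vm_area.
           */
          if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
                                  size, f, ptes ? &ptes : NULL)) {
                  free_vm_area(area);
                  return NULL;
          }

          return area;
  }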




[xen-4.13-testing bisection] complete test-xtf-amd64-amd64-4

2020-09-25 Thread osstest service owner
branch xen-4.13-testing
xenbranch xen-4.13-testing
job test-xtf-amd64-amd64-4
testid xtf/test-hvm64-xsa-221

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  e1364e05f92d6c2f12cc77f100cea584354c66cb
  Bug not present: 5867a14ac1747d7411066d7fb2cf238658346ab0
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/154848/


  commit e1364e05f92d6c2f12cc77f100cea584354c66cb
  Author: Jan Beulich 
  Date:   Tue Sep 22 16:24:29 2020 +0200
  
  evtchn/x86: enforce correct upper limit for 32-bit guests
  
  The recording of d->max_evtchns in evtchn_2l_init(), in particular with
  the limited set of callers of the function, is insufficient. Neither for
  PV nor for HVM guests is the bitness known at domain_create() time, yet
  the upper bound in 2-level mode depends upon guest bitness. Recording
  too high a limit "allows" x86 32-bit domains to open event channels
  which aren't properly usable, management of which (inside Xen) would
  then result in corruption of the shared info and vCPU info structures.
  
  Keep the upper limit dynamic for the 2-level case, introducing a helper
  function to retrieve the effective limit. This helper is now supposed to
  be private to the event channel code. The uses in do_poll() and
  domain_dump_evtchn_info() weren't consistent with port uses elsewhere
  and hence get switched to port_is_valid().
  
  Furthermore FIFO mode's setup_ports() gets adjusted to loop only up to
  the prior ABI limit, rather than all the way up to the new one.
  
  Finally a word on the change to do_poll(): Accessing ->max_evtchns
  without holding a suitable lock was never safe, as it as well as
  ->evtchn_port_ops may change behind do_poll()'s back. Using
  port_is_valid() instead widens somewhat the window for potential abuse,
  until we've dealt with the race altogether (see XSA-343).
  
  This is XSA-342.
  
  Reported-by: Julien Grall 
  Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels")
  Signed-off-by: Jan Beulich 
  Reviewed-by: Stefano Stabellini 
  Reviewed-by: Julien Grall 
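
For context, the 2-level event channel ABI limit is the square of the
per-guest event word size, so the fix makes the limit a function of the
domain rather than a value recorded once. A sketch of the idea (not the
literal patch; helper name and placement assumed):

  /* 2-level limit: 64 * 64 = 4096 ports for 64-bit guests,
   * 32 * 32 = 1024 for 32-bit ones; FIFO mode has its own fixed limit. */
  static unsigned int max_evtchns(const struct domain *d)
  {
      return BITS_PER_EVTCHN_WORD(d) * BITS_PER_EVTCHN_WORD(d);
  }

Since a guest's bitness is only known after domain_create(), computing
this on demand avoids recording a too-high limit in evtchn_2l_init().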


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-4.13-testing/test-xtf-amd64-amd64-4.xtf--test-hvm64-xsa-221.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-4.13-testing/test-xtf-amd64-amd64-4.xtf--test-hvm64-xsa-221
 --summary-out=tmp/154848.bisection-summary --basis-template=154358 
--blessings=real,real-bisect xen-4.13-testing test-xtf-amd64-amd64-4 
xtf/test-hvm64-xsa-221
Searching for failure / basis pass:
 154667 fail [host=huxelrebe1] / 154602 [host=albana0] 154358 [host=godello0] 
152528 [host=godello0] 151712 [host=elbling1] 151337 [host=albana0] 151153 
[host=godello1] 151048 [host=pinot0] 150944 ok.
Failure / basis pass flights: 154667 / 150944
(tree with no url: minios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
fb97626fe04747ec89599dce0992def9a36e2f6b 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
730e2b1927e7d911bbd5350714054ddd5912f4ed 
155821a1990b6de78dde5f98fa5ab90e802021e0 
88f5b414ac0f8008c1e2b26f93c3d980120941f7 
17d372b763cb0b2e2e6b5a637c11f3997d2533fa
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
6ff7c838d09224dd4e4c9b5b93152d8db1b19740 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
730e2b1927e7d911bbd5350714054ddd5912f4ed 
2e3de6253422112ae43e608661ba94ea6b345694 
67958a166f6b019e5ad8dcd60a96dcd262669092 
2a8859e87761a0efc119778e094f203dc2ea487a
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 

Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Thomas Gleixner
On Fri, Sep 25 2020 at 17:49, Peter Zijlstra wrote:
> Here it looks like this:
>
> [1.830276] BUG: kernel NULL pointer dereference, address: 
> [1.838043] #PF: supervisor instruction fetch in kernel mode
> [1.844357] #PF: error_code(0x0010) - not-present page
> [1.850090] PGD 0 P4D 0
> [1.852915] Oops: 0010 [#1] SMP
> [1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 5.9.0-rc6-00700-g0248dedd12d4 #419
> [1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS 
> SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [1.876902] RIP: 0010:0x0
> [1.879824] Code: Bad RIP value.
> [1.883423] RSP: :82803da0 EFLAGS: 00010282
> [1.889251] RAX:  RBX: 8282b980 RCX: 
> 82803e40
> [1.897241] RDX: 0001 RSI: 82803e40 RDI: 
> 8282b980
> [1.905201] RBP: 88842f331000 R08:  R09: 
> 0001
> [1.913162] R10: 0001 R11:  R12: 
> 0048
> [1.921123] R13: 82803e40 R14: 8282b9c0 R15: 
> 
> [1.929085] FS:  () GS:88842f40() 
> knlGS:
> [1.938113] CS:  0010 DS:  ES:  CR0: 80050033
> [1.944524] CR2: ffd6 CR3: 02811001 CR4: 
> 000606b0
> [1.952484] Call Trace:
> [1.955214]  msi_domain_alloc+0x36/0x130

Hrm. That looks like an uninitialized mandatory callback. Confused.

Is this on -next and if so, does this happen on tip:x86/irq as well?

Can you provide your config please?

Thanks,

tglx
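
For illustration, the failure mode of an unset mandatory callback looks
like this (a contrived sketch, not the actual kernel code in question):

  struct demo_ops {
          int (*prepare)(void *dev);      /* mandatory, but left NULL */
  };

  static struct demo_ops ops;             /* .prepare never initialized */

  int demo_alloc(void *dev)
  {
          /* Mandatory ops are called without a NULL check; with .prepare
           * unset this jumps to address 0 -- hence "RIP: 0010:0x0" and
           * "Code: Bad RIP value." in the oops above. */
          return ops.prepare(dev);
  }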



[xen-4.12-testing bisection] complete test-xtf-amd64-amd64-4

2020-09-25 Thread osstest service owner
branch xen-4.12-testing
xenbranch xen-4.12-testing
job test-xtf-amd64-amd64-4
testid xtf/test-hvm64-xsa-221

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  b8c9776986968bd1e90835df408f8bfc60640040
  Bug not present: 253a1e64d30e09ae089a060e364a01b4d442d550
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/154839/


  commit b8c9776986968bd1e90835df408f8bfc60640040
  Author: Jan Beulich 
  Date:   Tue Sep 22 17:09:25 2020 +0200
  
  evtchn/x86: enforce correct upper limit for 32-bit guests
  
  The recording of d->max_evtchns in evtchn_2l_init(), in particular with
  the limited set of callers of the function, is insufficient. Neither for
  PV nor for HVM guests is the bitness known at domain_create() time, yet
  the upper bound in 2-level mode depends upon guest bitness. Recording
  too high a limit "allows" x86 32-bit domains to open event channels
  which aren't properly usable, management of which (inside Xen) would
  then result in corruption of the shared info and vCPU info structures.
  
  Keep the upper limit dynamic for the 2-level case, introducing a helper
  function to retrieve the effective limit. This helper is now supposed to
  be private to the event channel code. The uses in do_poll() and
  domain_dump_evtchn_info() weren't consistent with port uses elsewhere
  and hence get switched to port_is_valid().
  
  Furthermore FIFO mode's setup_ports() gets adjusted to loop only up to
  the prior ABI limit, rather than all the way up to the new one.
  
  Finally a word on the change to do_poll(): Accessing ->max_evtchns
  without holding a suitable lock was never safe, as it as well as
  ->evtchn_port_ops may change behind do_poll()'s back. Using
  port_is_valid() instead widens somewhat the window for potential abuse,
  until we've dealt with the race altogether (see XSA-343).
  
  This is XSA-342.
  
  Reported-by: Julien Grall 
  Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels")
  Signed-off-by: Jan Beulich 
  Reviewed-by: Stefano Stabellini 
  Reviewed-by: Julien Grall 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-4.12-testing/test-xtf-amd64-amd64-4.xtf--test-hvm64-xsa-221.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-4.12-testing/test-xtf-amd64-amd64-4.xtf--test-hvm64-xsa-221
 --summary-out=tmp/154839.bisection-summary --basis-template=154601 
--blessings=real,real-bisect xen-4.12-testing test-xtf-amd64-amd64-4 
xtf/test-hvm64-xsa-221
Searching for failure / basis pass:
 154663 fail [host=chardonnay0] / 154601 [host=godello1] 154121 
[host=huxelrebe1] 152525 [host=huxelrebe0] 151715 [host=elbling0] 151388 
[host=godello1] 151367 [host=godello0] 151341 [host=fiano0] 151316 
[host=albana0] 151292 [host=albana1] 151276 [host=huxelrebe0] 151248 
[host=huxelrebe1] 151227 ok.
Failure / basis pass flights: 154663 / 151227
(tree with no url: minios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
fb97626fe04747ec89599dce0992def9a36e2f6b 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
8023a62081ffbe3f734019076ec1a2b4213142bb 
155821a1990b6de78dde5f98fa5ab90e802021e0 
0186e76a62f7409804c2e4785d5a11e7f82a7c52 
17d372b763cb0b2e2e6b5a637c11f3997d2533fa
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
8927e286a43cddfaa328b0f4c41a09c629c9 
d0d8ad39ecb51cd7497cd524484fe09f50876798 
8023a62081ffbe3f734019076ec1a2b4213142bb 
2e3de6253422112ae43e608661ba94ea6b345694 
06760c2bf322cef4622b56bafee74fe93ae01630 
2a8859e87761a0efc119778e094f203dc2ea487a
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 

[xen-4.10-testing test] 154795: trouble: blocked/broken

2020-09-25 Thread osstest service owner
flight 154795 xen-4.10-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/154795/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64       broken
 build-amd64-prev  broken
 build-amd64-pvops broken
 build-amd64-xsm   broken
 build-amd64-xtf   broken
 build-arm64       broken
 build-arm64-pvops broken
 build-arm64-xsm   broken
 build-armhf       broken
 build-armhf-pvops broken
 build-i386        broken
 build-i386-prev   broken
 build-i386-pvops  broken
 build-i386-xsm    broken
 build-i386-prev   4 host-install(4) broken REGR. vs. 151728
 build-i386-pvops  4 host-install(4) broken REGR. vs. 151728
 build-i386-xsm    4 host-install(4) broken REGR. vs. 151728
 build-amd64-pvops 4 host-install(4) broken REGR. vs. 151728
 build-amd64-xsm   4 host-install(4) broken REGR. vs. 151728
 build-i386        4 host-install(4) broken REGR. vs. 151728
 build-armhf-pvops 4 host-install(4) broken REGR. vs. 151728
 build-arm64       4 host-install(4) broken REGR. vs. 151728
 build-arm64-xsm   4 host-install(4) broken REGR. vs. 151728
 build-arm64-pvops 4 host-install(4) broken REGR. vs. 151728
 build-amd64       4 host-install(4) broken REGR. vs. 151728
 build-amd64-xtf   4 host-install(4) broken REGR. vs. 151728
 build-amd64-prev  4 host-install(4) broken REGR. vs. 151728
 build-armhf       4 host-install(4) broken REGR. vs. 151728

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-1   1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-2   1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-3   1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-4   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 build-arm64-libvirt   1 

Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks

2020-09-25 Thread Dario Faggioli
On Fri, 2020-09-25 at 20:21 +, Volodymyr Babchuk wrote:
> Hi Dario,
> 
Hi! :-)

> Dario Faggioli writes:
> > And what about the cases where schedule() does return?
> 
> Can it return on x86? I want to test this case, but how to force it?
> The null scheduler, perhaps?
> 
> > Are these also fine because they're handled within __do_softirq()
> > (i.e., without actually going back to do_softirq() and hence never
> > calling end_hyp_task() for a second time)?
> 
> I'm afraid there will be a bug. schedule() calls end_hyp_task(), and
> if it eventually returns from __do_softirq() to do_softirq(),
> end_hyp_task() will be called twice.
>
Yeah, exactly. That's why I was asking whether you had verified that we
actually never get to this, either because we context switch or because
we stay inside __do_softirq() and never go back to do_softirq().

I was, in fact, referring to all the various cases of handling primary
and secondary scheduling requests, when core-scheduling is enabled.

> > > I have put a bunch of ASSERTs to ensure that vcpu_begin_hyp_task()
> > > or vcpu_end_hyp_task() are not called twice and that
> > > vcpu_end_hyp_task() is called after vcpu_begin_hyp_task(). Those
> > > asserts are not failing, so I assume that I did all this in the
> > > right way :)
> > > 
> > Yeah, good to know. :-)
> > 
> > Are you doing these tests with both core-scheduling disabled and
> > enabled?
> 
> Good question. On x86 I am running Xen in QEMU. With -smp 2 it sees
> two CPUs:
> 
> (XEN) Brought up 2 CPUs
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> 
> You are right, I need to try other variants of scheduling
> granularity.
> 
> Do you by any chance know how to emulate a more complex setup in QEMU?
>
Like enabling a virtual topology, on top of which you could test core
(or socket) scheduling? If yes, indeed you can do that in QEMU:

https://www.qemu.org/docs/master/qemu-doc.html

-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies]
 [,sockets=sockets][,maxcpus=maxcpus]

Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
are supported. On the Sparc32 target, Linux limits the number of usable
CPUs to 4. For the PC target, the number of cores per die, the number
of threads per core, the number of dies per package and the total
number of sockets can be specified. Missing values will be computed. If
any of the three values is given, the total number of CPUs n can be
omitted. maxcpus specifies the maximum number of hotpluggable CPUs.

Once you have an SMT virtual topology, you can boot Xen inside, with a
higher scheduling granularity.

A (rather big!) example would be:

-smp 224,sockets=4,cores=28,threads=2

You can even define a virtual NUMA topology, if you want.

And you can pin the vCPUs to the physical CPUs of the host, in such a
way that the virtual topology is mapped to the physical one. This is
good for performance but also increases the accuracy of the testing a
little bit.
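
For instance, with libvirt the pinning could look like this (domain
name and host CPU numbering are assumptions, not from this thread):

  # map vCPU SMT siblings onto host SMT siblings
  virsh vcpupin xen-test 0 0
  virsh vcpupin xen-test 1 1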

> Also, what is the preferred way to test/debug Xen on x86?
> 
I test on real hardware, at least most of the times, if this is what
you're asking.

Checking if the code is "functionally correct" is ok-ish if done in a
VM first. But then, especially for scheduling related things, where
timing plays a rather significant role, I personally prefer to test on
actual hardware sooner rather than later.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)




[xen-unstable bisection] complete test-xtf-amd64-amd64-4

2020-09-25 Thread osstest service owner
branch xen-unstable
xenbranch xen-unstable
job test-xtf-amd64-amd64-4
testid xtf/test-hvm64-xsa-221

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  62bcdc4edbf6d8c6e8a25544d48de22ccf75310d
  Bug not present: 112992b05b2d2ca63f3c78eefe1cf8d192d7303a
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/154832/


  commit 62bcdc4edbf6d8c6e8a25544d48de22ccf75310d
  Author: Jan Beulich 
  Date:   Tue Sep 22 15:50:09 2020 +0200
  
  evtchn/x86: enforce correct upper limit for 32-bit guests
  
  The recording of d->max_evtchns in evtchn_2l_init(), in particular with
  the limited set of callers of the function, is insufficient. Neither for
  PV nor for HVM guests is the bitness known at domain_create() time, yet
  the upper bound in 2-level mode depends upon guest bitness. Recording
  too high a limit "allows" x86 32-bit domains to open event channels
  which aren't properly usable, management of which (inside Xen) would
  then result in corruption of the shared info and vCPU info structures.
  
  Keep the upper limit dynamic for the 2-level case, introducing a helper
  function to retrieve the effective limit. This helper is now supposed to
  be private to the event channel code. The uses in do_poll() and
  domain_dump_evtchn_info() weren't consistent with port uses elsewhere
  and hence get switched to port_is_valid().
  
  Furthermore FIFO mode's setup_ports() gets adjusted to loop only up to
  the prior ABI limit, rather than all the way up to the new one.
  
  Finally a word on the change to do_poll(): Accessing ->max_evtchns
  without holding a suitable lock was never safe, as it as well as
  ->evtchn_port_ops may change behind do_poll()'s back. Using
  port_is_valid() instead widens somewhat the window for potential abuse,
  until we've dealt with the race altogether (see XSA-343).
  
  This is XSA-342.
  
  Reported-by: Julien Grall 
  Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max number of event channels")
  Signed-off-by: Jan Beulich 
  Reviewed-by: Stefano Stabellini 
  Reviewed-by: Julien Grall 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-xtf-amd64-amd64-4.xtf--test-hvm64-xsa-221.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-unstable/test-xtf-amd64-amd64-4.xtf--test-hvm64-xsa-221
 --summary-out=tmp/154832.bisection-summary --basis-template=154611 
--blessings=real,real-bisect xen-unstable test-xtf-amd64-amd64-4 
xtf/test-hvm64-xsa-221
Searching for failure / basis pass:
 154634 fail [host=albana1] / 154611 [host=huxelrebe1] 154592 [host=godello1] 
154576 [host=fiano1] 154556 [host=huxelrebe0] 154521 [host=chardonnay1] 154504 
[host=elbling0] 154494 [host=albana0] 154481 [host=chardonnay0] 154465 
[host=godello0] 154090 [host=godello0] 154058 [host=pinot1] 154036 
[host=godello1] 154016 [host=rimava1] 153983 [host=fiano0] 153957 ok.
Failure / basis pass flights: 154634 / 153957
(tree with no url: minios)
(tree with no url: ovmf)
(tree with no url: seabios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ea6d3cd1ed79d824e605a70c3626bc437c386260 
2785b2a9e04abc148e1c5259f4faee708ea356f4 
17d372b763cb0b2e2e6b5a637c11f3997d2533fa
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ea6d3cd1ed79d824e605a70c3626bc437c386260 
b11910082d90bb1597f6679524eb726a33306672 
17d372b763cb0b2e2e6b5a637c11f3997d2533fa
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 
git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860
 
git://xenbits.xen.org/qemu-xen-traditional.git#3d273dd05e51e5a1ffba3d98c7437ee84e8f8764-3d273dd05e51e5a1ffba3d98c7437ee84e8f8764
 git://xenbits.xen.org/qemu-xen.git#ea6d3cd1ed79d824e605a70c3626bc4\
 

Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks

2020-09-25 Thread Volodymyr Babchuk


Hi Dario,


Dario Faggioli writes:

> On Thu, 2020-09-24 at 18:08 +, Volodymyr Babchuk wrote:
>> So, as I see this, functions are called in the following way (on
>> x86):
>> 
>> 1. do_softirq() calls vcpu_begin_hyp_task() and then executes
>> __do_softirq()
>> 
>> 2. __do_softirq() does different jobs and eventually calls schedule()
>> 
>> 3. schedule() calls vcpu_end_hyp_task() and makes scheduling decision
>> which leads to call to context_switch()
>> 
>> 4. At the end of context_switch() we will exit the hypervisor and
>> enter the VM. At least, this is how I understand the
>> 
>>nextd->arch.ctxt_switch->tail(next);
>> 
>> call.
>> 
>> So, no need to call vcpu_begin_hyp_task() in context_switch() for
>> x86.
>> 
> Mmm... This looks correct to me too.
>
> And what about the cases where schedule() does return?

Can it return on x86? I want to test this case, but how to force it?
The null scheduler, perhaps?

> Are these also fine because they're handled within __do_softirq()
> (i.e., without actually going back to do_softirq() and hence never
> calling end_hyp_task() for a second time)?

I'm afraid there will be a bug. schedule() calls end_hyp_task(), and
if it eventually returns from __do_softirq() to do_softirq(),
end_hyp_task() will be called twice.

>
>> I have put a bunch of ASSERTs to ensure that vcpu_begin_hyp_task() or
>> vcpu_end_hyp_task() are not called twice and that vcpu_end_hyp_task()
>> is called after vcpu_begin_hyp_task(). Those asserts are not failing,
>> so I assume that I did all this in the right way :)
>> 
> Yeah, good to know. :-)
>
> Are you doing these tests with both core-scheduling disabled and
> enabled?

Good question. On x86 I am running Xen in QEMU. With -smp 2 it sees two
CPUs:

(XEN) Brought up 2 CPUs
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource

You are right, I need to try other variants of scheduling granularity.

Do you by any chance know how to emulate a more complex setup in QEMU?
Also, what is the preferred way to test/debug Xen on x86?

-- 
Volodymyr Babchuk at EPAM
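

As an aside, the begin/end pairing invariant discussed above can be
sketched like this (a minimal sketch; the in_hyp_task flag and the
accounting fields are assumptions, not the actual patch):

  void vcpu_begin_hyp_task(struct vcpu *v)
  {
      ASSERT(!v->in_hyp_task);          /* begin must not be called twice */
      v->hyp_task_start = NOW();
      v->in_hyp_task = true;
  }

  void vcpu_end_hyp_task(struct vcpu *v)
  {
      ASSERT(v->in_hyp_task);           /* end requires a matching begin */
      v->hyp_time += NOW() - v->hyp_task_start;
      v->in_hyp_task = false;
  }

The double-call hazard discussed above is then exactly the second
ASSERT firing when __do_softirq() returns to do_softirq() after
schedule() has already called vcpu_end_hyp_task().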


Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode

2020-09-25 Thread boris . ostrovsky


On 9/25/20 3:04 PM, Anchal Agarwal wrote:
> On Tue, Sep 22, 2020 at 11:17:36PM +, Anchal Agarwal wrote:
>> On Tue, Sep 22, 2020 at 12:18:05PM -0400, boris.ostrov...@oracle.com wrote:
>>>
>>> On 9/21/20 5:54 PM, Anchal Agarwal wrote:
 Thanks for the above suggestion. You are right, I didn't find a way to
 declare a global state either. I just split the above check in two so
 that once we have support for ARM we should be able to remove the
 aarch64 condition easily. Let me know if I am missing any corner cases
 with this one.

 static int xen_pm_notifier(struct notifier_block *notifier,
                            unsigned long pm_event, void *unused)
 {
     int ret = NOTIFY_OK;

     if (!xen_hvm_domain() || xen_initial_domain())
         ret = NOTIFY_BAD;
     if (IS_ENABLED(CONFIG_ARM64) && (pm_event == PM_SUSPEND_PREPARE ||
                                      pm_event == PM_HIBERNATION_PREPARE))
         ret = NOTIFY_BAD;

     return ret;
 }
>>>
>>>
>>> This will allow PM suspend to proceed on x86.
>> Right!! Missed it.
>> Also, wrt KASLR stuff, that issue is still seen sometimes but I haven't had
>> bandwidth to dive deep into the issue and fix it.


So what's the plan there? You first mentioned this issue early this year, and
judging by your response it is not clear whether you will ever spend time
looking at it.


>>  I seem to have lost your email
>> in my inbox hence covering the question here.
>>>
> Can I add your Reviewed-by or Signed-off-by to it?


Are you asking me to add my R-b to the broken code above?


-boris
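
For what it's worth, one reading of the objection above is that PM
suspend must be refused on all architectures, not just ARM64. A sketch
of that variant (an assumption about the intended semantics, not a
reviewed fix):

  static int xen_pm_notifier(struct notifier_block *notifier,
                             unsigned long pm_event, void *unused)
  {
          if (!xen_hvm_domain() || xen_initial_domain())
                  return NOTIFY_BAD;

          /* PM suspend is not supported at all; hibernation only on x86. */
          if (pm_event == PM_SUSPEND_PREPARE)
                  return NOTIFY_BAD;
          if (IS_ENABLED(CONFIG_ARM64) && pm_event == PM_HIBERNATION_PREPARE)
                  return NOTIFY_BAD;

          return NOTIFY_OK;
  }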




Re: [PATCH v3 01/11] xen/manage: keep track of the on-going suspend mode

2020-09-25 Thread Anchal Agarwal
On Tue, Sep 22, 2020 at 11:17:36PM +, Anchal Agarwal wrote:
> On Tue, Sep 22, 2020 at 12:18:05PM -0400, boris.ostrov...@oracle.com wrote:
> > 
> > On 9/21/20 5:54 PM, Anchal Agarwal wrote:
> > > Thanks for the above suggestion. You are right, I didn't find a way
> > > to declare a global state either. I just split the above check in
> > > two so that once we have support for ARM we should be able to remove
> > > the aarch64 condition easily. Let me know if I am missing any corner
> > > cases with this one.
> > >
> > > static int xen_pm_notifier(struct notifier_block *notifier,
> > >                            unsigned long pm_event, void *unused)
> > > {
> > >     int ret = NOTIFY_OK;
> > >
> > >     if (!xen_hvm_domain() || xen_initial_domain())
> > >         ret = NOTIFY_BAD;
> > >     if (IS_ENABLED(CONFIG_ARM64) && (pm_event == PM_SUSPEND_PREPARE ||
> > >                                      pm_event == PM_HIBERNATION_PREPARE))
> > >         ret = NOTIFY_BAD;
> > >
> > >     return ret;
> > > }
> > 
> > 
> > 
> > This will allow PM suspend to proceed on x86.
> Right!! Missed it.
> Also, wrt KASLR stuff, that issue is still seen sometimes but I haven't had
> bandwidth to dive deep into the issue and fix it. I seem to have lost your 
> email
> in my inbox hence covering the question here.
> > 
> >
Can I add your Reviewed-by or Signed-off-by to it?
> > -boris
> > 
>
-Anchal



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jason Andryuk
On Fri, Sep 25, 2020 at 12:02 PM Julien Grall  wrote:
>
> Hi Jan,
>
> On 25/09/2020 16:36, Jan Beulich wrote:
> > On 25.09.2020 16:33, Julien Grall wrote:
> >> On 25/09/2020 14:58, Jan Beulich wrote:
> >>> On 25.09.2020 15:16, Julien Grall wrote:
>  Fair enough. I would still like to consider a way where we could avoid
>  to hack xsm_* because we have the interrupts disabled.
> >>>
> >>> Well, from a conceptual pov it's at least questionable for XSM to
> >>> need any memory allocations at all when merely being asked for
> >>> permission. And indeed the need for it arises solely from its
> >>> desire to cache the result, for the sake of subsequent lookups.
> >>>
> >>> I also find it odd that there's an XSM check on the send path in
> >>> the first place. This isn't just because it would seem to me that
> >>> it should be decided at binding time whether sending is permitted
> >>> - I may easily be missing something in the conceptual model here.
> >>> It's also because __domain_finalise_shutdown() too uses
> >>> evtchn_send(), and I didn't think this one should be subject to
> >>> any XSM check (just like send_guest_*() aren't).
> >>
> >> Maybe this is the first question we need to answer?
> >
> > Indeed that was the question I asked myself before putting together
> > the patch. Yet I have no idea who could answer it, with Daniel
> > having gone silent for quite long a time now. Hence I didn't even
> > put up the question, but instead tried to find a halfway reasonable
> > solution.
>
> IIRC, XSM is used by OpenXT and QubeOS. So I have CCed them to get some
> input.

I think the send hook exists because send is its own distinct
operation.  While most common usage could be handled by just checking
at bind time, the send hook provides more flexibility.  For instance,
the send hook can be used to restrict signalling to only one
direction.  Also, a domain label can transition (change) at runtime.
Dropping the send check would latch the permission at bind time, which
would not necessarily still be valid under the security policy.

Am I correct that the assertion mentioned in the patch description
would only be seen in debug builds?

Thanks,
Jason



Re: [RFC PATCH v1 2/6] sched: track time spent in hypervisor tasks

2020-09-25 Thread Dario Faggioli
On Thu, 2020-09-24 at 18:08 +, Volodymyr Babchuk wrote:
> So, as I see this, functions are called in the following way (on
> x86):
> 
> 1. do_softirq() calls vcpu_begin_hyp_task() and then executes
> __do_softirq()
> 
> 2. __do_softirq() does different jobs and eventually calls schedule()
> 
> 3. schedule() calls vcpu_end_hyp_task() and makes scheduling decision
> which leads to call to context_switch()
> 
> 4. At the end of context_switch() we will exit the hypervisor and
> enter the VM. At least, this is how I understand the
> 
>    nextd->arch.ctxt_switch->tail(next);
> 
> call.
> 
> So, no need to call vcpu_begin_hyp_task() in context_switch() for
> x86.
> 
Mmm... This looks correct to me too.

And what about the cases where schedule() does return?

Are these also fine because they're handled within __do_softirq()
(i.e., without actually going back to do_softirq() and hence never
calling end_hyp_task() for a second time)?


> I have put a bunch of ASSERTs to ensure that vcpu_begin_hyp_task() or
> vcpu_end_hyp_task() are not called twice and that vcpu_end_hyp_task()
> is called after vcpu_begin_hyp_task(). Those asserts are not failing,
> so I assume that I did all this in the right way :)
> 
Yeah, good to know. :-)

Are you doing these tests with both core-scheduling disabled and
enabled?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)




Re: [PATCH] x86: Use LOCK ADD instead of MFENCE for smp_mb()

2020-09-25 Thread Andrew Cooper
On 21/09/2020 14:58, Jan Beulich wrote:
> On 21.09.2020 15:04, Andrew Cooper wrote:
>> MFENCE is overly heavyweight for SMP semantics on WB memory, because it also
>> orders weaker cached writes, and flushes the WC buffers.
>>
>> This technique was used as an optimisation in Java[1], and later adopted by
>> Linux[2] where it was measured to have a 60% performance improvement in 
>> VirtIO
>> benchmarks.
>>
>> The stack is used because it is hot in the L1 cache, and a -4 offset is used
>> to avoid creating a false data dependency on live data.  (For 64bit 
>> userspace,
>> the offset needs to be under the red zone to avoid false dependences).
>>
>> Fix up the 32 bit definitions in HVMLoader and libxc to avoid a false data
>> dependency.
>>
>> [1] https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
>> [2] 
>> https://git.kernel.org/torvalds/c/450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
>>
>> Signed-off-by: Andrew Cooper 
> For the hypervisor and hvmloader part:
> Reviewed-by: Jan Beulich 
>
>> --- a/tools/libs/ctrl/include/xenctrl.h
>> +++ b/tools/libs/ctrl/include/xenctrl.h
>> @@ -68,11 +68,11 @@
>>  #define xen_barrier() asm volatile ( "" : : : "memory")
>>  
>>  #if defined(__i386__)
>> -#define xen_mb()  asm volatile ( "lock; addl $0,0(%%esp)" : : : "memory" )
> If this causes a false dependency (which I agree it does), how
> come ...
>
>> +#define xen_mb()  asm volatile ( "lock addl $0, -4(%%esp)" ::: "memory" )
>>  #define xen_rmb() xen_barrier()
>>  #define xen_wmb() xen_barrier()
>>  #elif defined(__x86_64__)
>> -#define xen_mb()  asm volatile ( "mfence" : : : "memory")
>> +#define xen_mb()  asm volatile ( "lock addl $0, -128(%%rsp)" ::: "memory" )
> ... this doesn't? It accesses the bottom 4 bytes of the redzone,
> doesn't it?

I suppose it does.  However, if anything, I'd be tempted to move it
closer, perhaps even to -32 or so.

The red-zone isn't used a great deal to start with.  It is only useful
for blocks of code with a working set marginally larger than the GPRs,
and no calls/etc.  This is far more commonly seen in number crunching
code, and rare in data-marshalling logic around a shared memory buffer.

However, you also don't want the memory operand to hit a cache line
which isn't in L1, because that will be far slower than any false data
dependency in the memory order buffer.

OTOH, this is userspace, and neither a locked access hitting DRAM, nor a
false data dependency on the bottom of the red zone are going to be the
dominating factor in overall performance.

> As a minor other thought for all of its incarnations: Is a 32-bit
> memory access really the best choice?

Yes.

> Wouldn't an 8-bit access
> further reduce (albeit not eliminate) the risk of unnecessary
> dependencies between this memory access and others in functions
> called from the users of this barrier?

Not really.  The overwhelming majority of stack accesses are full words,
and would still alias even if the memory order buffer tracks aliasing at
byte granularity (which it might not, to save silicon area).

8-bit and 32-bit accesses are probably identical, based on the arch and
uarch details I'm aware of.  However, just in case they are not, I'm
definitely going for the form in common with several other major
software projects.

> Or actually, in the hypervisor case, since the used stack slot
> would typically hold the return address of the next level's
> functions, would a 64-bit access or one further away from the
> current stack pointer not help avoid partial dependencies?

No.  The following call instruction is a write onto the stack, and will
form a write-after-write condition, rather than a false read-after-write
dependency.

> And finally, already when Linux used this for just 32-bit I've
> always been wondering why they bother preserving the contents of
> this piece of memory. How about using NOT (saving the immediate
> byte) or XCHG (requiring a dead register instead of the saved
> arithmetic, immediate byte, and lock prefix)?

Because performance is not dictated by the instruction length.

For equivalent instructions, the shorter encoding provides a marginal
benefit in terms of reduced instruction cache space, compound
improvements such as possibly being able to relax JMP disp32 to JMP
disp8, and increased decode bandwidth to a point.  There are some code
layout patterns where the longer forms are actually faster to decode on
some microarchitectures.

However, most instructions are not equivalent.

The XCHG plan may force a register to be spilled.  Some processors
really do track zeroes all the way out into the cache[1] and make
optimisations based on this knowledge.  ADD $0 is a no-op which can be
spotted at decode time, by a sufficiently clever pipeline.

~Andrew

[1] https://travisdowns.github.io/blog/2020/05/13/intel-zero-opt.html



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jan Beulich
On 25.09.2020 18:00, Julien Grall wrote:
> On 25/09/2020 16:36, Jan Beulich wrote:
>> Plus, as said, the minimal change of making Flask
>> avoid xmalloc() when IRQs are off is a change that ought to be made
>> anyway imo, in order to favor proper functioning over performance.
> If there are other uses, then yes. What I don't like in the current
> approach is we are hijacking xsm_event_send() for pre-allocating memory.

I first had a designated function for this, but the need to wire
this through ops, dummy.h, etc made me pretty quickly turn to this
lower churn variant. But that other one hasn't been ruled out if
the "hijacking" is deemed too bad.

Jan
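
The resulting call pattern in evtchn_send() looks roughly like this (a
sketch of the patch being discussed; local variable names assumed):

  /* First check, IRQs still enabled: Flask's AVC may xmalloc() here
   * and cache the looked-up node. */
  ret = xsm_evtchn_send(XSM_HOOK, ld, lchn);
  if ( ret )
      return ret;

  spin_lock_irqsave(&lchn->lock, flags);

  /* Second check, IRQs off: expected to hit the cache, so no
   * allocation happens while check_lock() would object. */
  ret = xsm_evtchn_send(XSM_HOOK, ld, lchn);
  /* ... actual send logic ... */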



[PATCH 08/11, fixed] drm/i915: use vmap in i915_gem_object_map

2020-09-25 Thread Christoph Hellwig
i915_gem_object_map implements fairly low-level vmap functionality in
a driver.  Split it into two helpers, one for remapping kernel memory
which can use vmap, and one for I/O memory that uses vmap_pfn.

The only practical difference is that alloc_vm_area prefaults the
vmalloc area PTEs, which doesn't seem to be required here for the
kernel memory case (and could be added to vmap using a flag if actually
required).

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 127 ++
 2 files changed, 60 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 9afa5c4a6bf006..1e1cb245fca778 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -25,6 +25,7 @@ config DRM_I915
select CRC32
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
+   select VMAP_PFN
help
  Choose this option if you have a system that has "Intel Graphics
  Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6550c0bc824ea2..f60ca6dc911f29 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -232,34 +232,21 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
return err;
 }
 
-static inline pte_t iomap_pte(resource_size_t base,
- dma_addr_t offset,
- pgprot_t prot)
-{
-   return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
-}
-
 /* The 'mapping' part of i915_gem_object_pin_map() below */
-static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
-enum i915_map_type type)
+static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
+   enum i915_map_type type)
 {
-   unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
-   struct sg_table *sgt = obj->mm.pages;
-   pte_t *stack[32], **mem;
-   struct vm_struct *area;
+   unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
+   struct page *stack[32], **pages = stack, *page;
+   struct sgt_iter iter;
pgprot_t pgprot;
+   void *vaddr;
 
-   if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
-   return NULL;
-
-   if (GEM_WARN_ON(type == I915_MAP_WC &&
-   !static_cpu_has(X86_FEATURE_PAT)))
-   return NULL;
-
-   /* A single page can always be kmapped */
-   if (n_pte == 1 && type == I915_MAP_WB) {
-   struct page *page = sg_page(sgt->sgl);
-
+   switch (type) {
+   default:
+   MISSING_CASE(type);
+   fallthrough;/* to use PAGE_KERNEL anyway */
+   case I915_MAP_WB:
/*
 * On 32b, highmem using a finite set of indirect PTE (i.e.
 * vmap) to provide virtual mappings of the high pages.
@@ -277,30 +264,8 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
 * So if the page is beyond the 32b boundary, make an explicit
 * vmap.
 */
-   if (!PageHighMem(page))
-   return page_address(page);
-   }
-
-   mem = stack;
-   if (n_pte > ARRAY_SIZE(stack)) {
-   /* Too big for stack -- allocate temporary array instead */
-   mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
-   if (!mem)
-   return NULL;
-   }
-
-   area = alloc_vm_area(obj->base.size, mem);
-   if (!area) {
-   if (mem != stack)
-   kvfree(mem);
-   return NULL;
-   }
-
-   switch (type) {
-   default:
-   MISSING_CASE(type);
-   fallthrough;/* to use PAGE_KERNEL anyway */
-   case I915_MAP_WB:
+   if (n_pages == 1 && !PageHighMem(sg_page(obj->mm.pages->sgl)))
+   return page_address(sg_page(obj->mm.pages->sgl));
pgprot = PAGE_KERNEL;
break;
case I915_MAP_WC:
@@ -308,30 +273,50 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
break;
}
 
-   if (i915_gem_object_has_struct_page(obj)) {
-   struct sgt_iter iter;
-   struct page *page;
-   pte_t **ptes = mem;
+   if (n_pages > ARRAY_SIZE(stack)) {
+   /* Too big for stack -- allocate temporary array instead */
+   pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
+   return NULL;
+   }
 
-   for_each_sgt_page(page, iter, sgt)
-   **ptes++ = mk_pte(page, pgprot);
-   } else 
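
The diff is truncated in the archive; the I/O-memory counterpart reads
roughly as follows, reconstructed from the visible context and the
follow-up review below (a sketch assuming the vmap_pfn() helper added
earlier in this series, not the literal patch):

  static void *i915_gem_object_map_pfn(struct drm_i915_gem_object *obj,
                                       enum i915_map_type type)
  {
          resource_size_t iomap = obj->mm.region->iomap.base -
                  obj->mm.region->region.start;
          unsigned long n_pfn = obj->base.size >> PAGE_SHIFT;
          unsigned long stack[32], *pfns = stack, i;
          struct sgt_iter iter;
          dma_addr_t addr;
          void *vaddr;

          if (n_pfn > ARRAY_SIZE(stack)) {
                  /* Too big for stack -- allocate temporary array instead */
                  pfns = kvmalloc_array(n_pfn, sizeof(*pfns), GFP_KERNEL);
                  if (!pfns)
                          return NULL;
          }

          i = 0;  /* the initialization Dan's fix adds, see the reply below */
          for_each_sgt_daddr(addr, iter, obj->mm.pages)
                  pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
          vaddr = vmap_pfn(pfns, n_pfn, pgprot_writecombine(PAGE_KERNEL_IO));
          if (pfns != stack)
                  kvfree(pfns);
          return vaddr;
  }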

Re: [Intel-gfx] [PATCH 08/11] drm/i915: use vmap in i915_gem_object_map

2020-09-25 Thread Christoph Hellwig
On Fri, Sep 25, 2020 at 03:08:59PM +0100, Matthew Auld wrote:
> > +   i = 0;
> > +   for_each_sgt_page(page, iter, obj->mm.pages)
> > +   pages[i++] = page;
> > +   vaddr = vmap(pages, n_pages, 0, pgprot);
> > +   if (pages != stack)
> > +   kvfree(pages);
> > +   return vaddr;
> > +}

> > -   return area->addr;
> > +   for_each_sgt_daddr(addr, iter, obj->mm.pages)
> > +   pfns[i++] = (iomap + addr) >> PAGE_SHIFT;
> 
> Missing the i = 0 fix from Dan?

Yeah, looks like I only managed to apply the one in the page based
version above.



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Julien Grall

Hi Jan,

On 25/09/2020 16:36, Jan Beulich wrote:
> On 25.09.2020 16:33, Julien Grall wrote:
>> On 25/09/2020 14:58, Jan Beulich wrote:
>>> On 25.09.2020 15:16, Julien Grall wrote:
>>>> Fair enough. I would still like to consider a way where we could avoid
>>>> to hack xsm_* because we have the interrupts disabled.
>>>
>>> Well, from a conceptual pov it's at least questionable for XSM to
>>> need any memory allocations at all when merely being asked for
>>> permission. And indeed the need for it arises solely from its
>>> desire to cache the result, for the sake of subsequent lookups.
>>>
>>> I also find it odd that there's an XSM check on the send path in
>>> the first place. This isn't just because it would seem to me that
>>> it should be decided at binding time whether sending is permitted
>>> - I may easily be missing something in the conceptual model here.
>>> It's also because __domain_finalise_shutdown() too uses
>>> evtchn_send(), and I didn't think this one should be subject to
>>> any XSM check (just like send_guest_*() aren't).
>>
>> Maybe this is the first question we need to answer?
>
> Indeed that was the question I asked myself before putting together
> the patch. Yet I have no idea who could answer it, with Daniel
> having gone silent for quite long a time now. Hence I didn't even
> put up the question, but instead tried to find a halfway reasonable
> solution.

IIRC, XSM is used by OpenXT and QubesOS. So I have CCed them to get some
input.

> After all it's not just the master branch which is stuck
> right now. And from a backporting perspective having the "fix"
> touch code which hasn't had much churn over the last years may even
> be beneficial.

Right, but dropping xsm_evtchn_send() (if it is possible) is going to
be even better. So let's explore this solution first.

> Plus, as said, the minimal change of making Flask
> avoid xmalloc() when IRQs are off is a change that ought to be made
> anyway imo, in order to favor proper functioning over performance.

If there are other uses, then yes. What I don't like in the current
approach is we are hijacking xsm_event_send() for pre-allocating memory.

Cheers,

--
Julien Grall



Re: [PATCH v2 04/11] xen/memory: Fix acquire_resource size semantics

2020-09-25 Thread Jan Beulich
On 22.09.2020 20:24, Andrew Cooper wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1007,6 +1007,26 @@ static long xatp_permission_check(struct domain *d, unsigned int space)
>      return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
>  }
>  
> +/*
> + * Return 0 on any kind of error.  Caller converts to -EINVAL.
> + *
> + * All nonzero values should be repeatable (i.e. derived from some fixed
> + * property of the domain), and describe the full resource (i.e. mapping the
> + * result of this call will be the entire resource).
> + */
> +static unsigned int resource_max_frames(struct domain *d,

With the lockless intentions I think this could be const from
here on through all the descendants. With this
Reviewed-by: Jan Beulich 
albeit I have one more minor remark:

> @@ -1058,6 +1066,27 @@ static int acquire_resource(
>      if ( rc )
>          goto out;
>  
> +    max_frames = resource_max_frames(d, xmar.type, xmar.id);
> +
> +    rc = -EINVAL;
> +    if ( !max_frames )
> +        goto out;
> +
> +    if ( guest_handle_is_null(xmar.frame_list) )
> +    {
> +        if ( xmar.nr_frames )
> +            goto out;
> +
> +        xmar.nr_frames = max_frames;
> +
> +        rc = -EFAULT;
> +        if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
> +            goto out;
> +
> +        rc = 0;
> +        goto out;
> +    }

That's a lot of "goto out" here. I don't suppose I could talk you
into reducing their amount some, since at least the last two look
to be easy to fold?

Jan
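
For illustration, folding the last two of those per the remark could
look like this (a sketch, assuming no other cleanup is needed on this
path):

      if ( guest_handle_is_null(xmar.frame_list) )
      {
          rc = -EINVAL;
          if ( xmar.nr_frames )
              goto out;

          xmar.nr_frames = max_frames;
          rc = __copy_field_to_guest(arg, &xmar, nr_frames) ? -EFAULT : 0;
          goto out;
      }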



Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Peter Zijlstra
On Fri, Sep 25, 2020 at 11:29:13AM -0400, Qian Cai wrote:

> It looks like the crashes happen in the interrupt remapping code where
> they are only able to generate partial call traces.

> [8.466614][T0] BUG: kernel NULL pointer dereference, address: 
> 
> [8.474295][T0] #PF: supervisor instruction fetch in kernel mode
> [8.480669][T0] #PF: error_code(0x0010) - not-present page
> [8.486518][T0] PGD 0 P4D 0 
> [8.489757][T0] Oops: 0010 [#1] SMP KASAN PTI
> [8.494476][T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I    
>5.9.0-rc6-next-20200925 #2
> [8.503987][T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 
> Gen10, BIOS U34 11/13/2019
> [8.513238][T0] RIP: 0010:0x0
> [8.516562][T0] Code: Bad RIP v

Here it looks like this:

[1.830276] BUG: kernel NULL pointer dereference, address: 
[1.838043] #PF: supervisor instruction fetch in kernel mode
[1.844357] #PF: error_code(0x0010) - not-present page
[1.850090] PGD 0 P4D 0
[1.852915] Oops: 0010 [#1] SMP
[1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.9.0-rc6-00700-g0248dedd12d4 #419
[1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS 
SE5C600.86B.02.02.0002.122320131210 12/23/2013
[1.876902] RIP: 0010:0x0
[1.879824] Code: Bad RIP value.
[1.883423] RSP: :82803da0 EFLAGS: 00010282
[1.889251] RAX:  RBX: 8282b980 RCX: 82803e40
[1.897241] RDX: 0001 RSI: 82803e40 RDI: 8282b980
[1.905201] RBP: 88842f331000 R08:  R09: 0001
[1.913162] R10: 0001 R11:  R12: 0048
[1.921123] R13: 82803e40 R14: 8282b9c0 R15: 
[1.929085] FS:  () GS:88842f40() 
knlGS:
[1.938113] CS:  0010 DS:  ES:  CR0: 80050033
[1.944524] CR2: ffd6 CR3: 02811001 CR4: 000606b0
[1.952484] Call Trace:
[1.955214]  msi_domain_alloc+0x36/0x130
[1.959594]  __irq_domain_alloc_irqs+0x165/0x380
[1.964748]  dmar_alloc_hwirq+0x9a/0x120
[1.969127]  dmar_set_interrupt.part.0+0x1c/0x60
[1.974281]  enable_drhd_fault_handling+0x2c/0x6c
[1.979532]  apic_intr_mode_init+0xfa/0x100
[1.984191]  x86_late_time_init+0x20/0x30
[1.988662]  start_kernel+0x723/0x7e6
[1.992748]  secondary_startup_64_no_verify+0xa6/0xab
[1.998386] Modules linked in:
[2.001794] CR2: 
[2.005510] ---[ end trace 837dc60d7c66efa2 ]---




Re: [PATCH 03/11 RFC] gitignore: Add/Generalize entries

2020-09-25 Thread Elliott Mitchell
On Fri, Sep 25, 2020 at 08:49:01AM +0200, Jan Beulich wrote:
> I don't think so, no - new ports will similarly have asm-/config.h,
> and there shouldn't be a requirement to "git add -f" them at that point.
> The role of such named files really is too different to have such a top
> level entry imo.

Okay.  I had thought autoconf/configure was by far the most common source
of config.h files, and thus best to have in there, but my local copies
have been adjusted.  There aren't many config.h entries so dumping an
attempt at a common pattern is quite appropriate.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jan Beulich
On 25.09.2020 16:33, Julien Grall wrote:
> On 25/09/2020 14:58, Jan Beulich wrote:
>> On 25.09.2020 15:16, Julien Grall wrote:
>>> Fair enough. I would still like to consider a way where we could avoid
>>> to hack xsm_* because we have the interrupts disabled.
>>
>> Well, from a conceptual pov it's at least questionable for XSM to
>> need any memory allocations at all when merely being asked for
>> permission. And indeed the need for it arises solely from its
>> desire to cache the result, for the sake of subsequent lookups.
>>
>> I also find it odd that there's an XSM check on the send path in
>> the first place. This isn't just because it would seem to me that
>> it should be decided at binding time whether sending is permitted
>> - I may easily be missing something in the conceptual model here.
>> It's also because __domain_finalise_shutdown() too uses
>> evtchn_send(), and I didn't think this one should be subject to
>> any XSM check (just like send_guest_*() aren't).
> 
> Maybe this is the first question we need to answer?

Indeed that was the question I asked myself before putting together
the patch. Yet I have no idea who could answer it, with Daniel
having gone silent for quite long a time now. Hence I didn't even
put up the question, but instead tried to find a halfway reasonable
solution. After all it's not just the master branch which is stuck
right now. And from a backporting perspective having the "fix"
touch code which hasn't had much churn over the last years may even
be beneficial. Plus, as said, the minimal change of making Flask
avoid xmalloc() when IRQs are off is a change that ought to be made
anyway imo, in order to favor proper functioning over performance.

Jan



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jan Beulich
On 25.09.2020 16:57, Jürgen Groß wrote:
> On 25.09.20 14:21, Jan Beulich wrote:
>> On 25.09.2020 12:34, Julien Grall wrote:
>>> On 24/09/2020 11:53, Jan Beulich wrote:
 xmalloc() & Co may not be called with IRQs off, or else check_lock()
 will have its assertion trigger about locks getting acquired
 inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
 very reasonable, especially since the per-channel lock was introduced to
 avoid acquiring the per-domain event lock on the send paths. Issue a
 second call to xsm_evtchn_send() instead, before acquiring the lock, to
 give XSM / Flask a chance to pre-allocate whatever it may need.
>>>
>>> This is the sort of fall-out I was expecting when we decided to turn
>>> off the interrupts for a big chunk of code. I couldn't find any at the
>>> time though...
>>>
>>> Can you remind me which callers of send_guest_{global,vcpu}_virq()
>>> will call them with interrupts off?
>>
>> I don't recall which one of the two it was that I hit; we wanted
>> both to use the lock anyway. send_guest_pirq() very clearly also
>> gets called with IRQs off.
>>
>>> Would it be possible to consider deferring the call to a softirq
>>> tasklet? If so, this would allow us to turn the interrupts on again.
>>
>> Of course this is in principle possible; the question is how
>> involved this is going to get. However, on x86 oprofile's call to
>> send_guest_vcpu_virq() can't easily be replaced - it's dangerous
>> enough already that it involves locks in NMI context. I don't
>> fancy seeing it use more commonly used ones.
> 
> Is it really so hard to avoid calling send_guest_vcpu_virq() in NMI
> context?
> 
> Today it is called only if the NMI happened inside the guest, so the
> main Xen stack is unused at this time. It should be rather
> straightforward to mimic a stack frame on the main stack and iret to a
> special handler from NMI context. This handler would then call
> send_guest_vcpu_virq() and return to the guest.

Quite possible that it's not overly difficult to arrange for. But
even with this out of the way I don't really view this softirq
tasklet route as viable; I could be proven wrong by demonstrating
that it's sufficiently straightforward.

Jan



Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Qian Cai
On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Storm)
> based devices in a halfways architecture independent way.
> 
> The first version can be found here:
> 
> https://lore.kernel.org/r/20200821002424.119492...@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.

Reverting the part of this patchset on top of today's linux-next fixed a
boot issue on HPE ProLiant DL560 Gen10, i.e.,

$ git revert --no-edit 13b90cadfc29..bc95fd0d7c42

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/x86.config

It looks like the crashes happen in the interrupt remapping code where they are
only able to generate partial call traces.

[1.912386][T0] ACPI: X2APIC_NMI (uid[0xf5] high level 9983][T0] ... 
MAX_LOCK_DEPTH:  48
[7.914876][T0] ... MAX_LOCKDEP_KEYS:8192
[7.919942][T0] ... CLASSHASH_SIZE:  4096
[7.925009][T0] ... MAX_LOCKDEP_ENTRIES: 32768
[7.930163][T0] ... MAX_LOCKDEP_CHAINS:  65536
[7.935318][T0] ... CHAINHASH_SIZE:  32768
[7.940473][T0]  memory used by lock dependency info: 6301 kB
[7.946586][T0]  memory used for stack traces: 4224 kB
[7.952088][T0]  per task-struct memory footprint: 1920 bytes
[7.968312][T0] mempolicy: Enabling automatic NUMA balancing. Configure 
with numa_balancing= or the kernel.numa_balancing sysctl
[7.980281][T0] ACPI: Core revision 20200717
[7.993343][T0] clocksource: hpet: mask: 0x max_cycles: 
0x, max_idle_ns: 79635855245 ns
[8.003270][T0] APIC: Switch to symmetric I/O mode setup
[8.008951][T0] DMAR: Host address width 46
[8.013512][T0] DMAR: DRHD base: 0x00e5ffc000 flags: 0x0
[8.019680][T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 
8d2078c106f0466 [...]
[...][T0] DMAR-IR: IOAPIC id 15 under DRHD base  0xe5ffc000 IOMMU 0
[8.420990][T0] DMAR-IR: IOAPIC id 8 under DRHD base  0xddffc000 IOMMU 15
[8.428166][T0] DMAR-IR: IOAPIC id 9 under DRHD base  0xddffc000 IOMMU 15
[8.435341][T0] DMAR-IR: HPET id 0 under DRHD base 0xddffc000
[8.441456][T0] DMAR-IR: Queued invalidation will be enabled to support 
x2apic and Intr-remapping.
[8.457911][T0] DMAR-IR: Enabled IRQ remapping in x2apic mode
[8.466614][T0] BUG: kernel NULL pointer dereference, address: 

[8.474295][T0] #PF: supervisor instruction fetch in kernel mode
[8.480669][T0] #PF: error_code(0x0010) - not-present page
[8.486518][T0] PGD 0 P4D 0 
[8.489757][T0] Oops: 0010 [#1] SMP KASAN PTI
[8.494476][T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I  
 5.9.0-rc6-next-20200925 #2
[8.503987][T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 
Gen10, BIOS U34 11/13/2019
[8.513238][T0] RIP: 0010:0x0
[8.516562][T0] Code: Bad RIP value.

or

[2.906744][T0] ACPI: X2API[...]
[...]32, address 0xfec68000, GSI 128-135
[2.907063][T0] IOAPIC[15]: apic_id 29, version 32, address 0xfec7, 
GSI 136-143
[2.907071][T0] IOAPIC[16]: apic_id 30, version 32, address 0xfec78000, 
GSI 144-151
[2.907079][T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[2.907084][T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high 
level)
[2.907100][T0] Using ACPI (MADT) for SMP configuration information
[2.907105][T0] ACPI: HPET id: 0x8086a701 base: 0xfed0
[2.907116][T0] ACPI: SPCR: console: uart,mmio,0x0,115200
[2.907121][T0] TSC deadline timer available
[2.907126][T0] smpboot: Allowing 144 CPUs, 0 hotplug CPUs
[2.907163][T0] [mem 0xd000-0xfdff] available for PCI devices
[2.907175][T0] clocksource: refined-jiffies: mask: 0x 
max_cycles: 0x, max_idle_ns: 1911260446275 ns
[2.914541][T0] setup_percpu: NR_CPUS:256 nr_cpumask_bits:144 
nr_cpu_ids:144 nr_node_ids:4
[2.926109][...]
[...]466 ecap f020df
[9.134709][T0] DMAR: DRHD base: 0x00f5ffc000 flags: 0x0
[9.140867][T0] DMAR: dmar8: reg_base_addr f5ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.149610][T0] DMAR: DRHD base: 0x00f7ffc000 flags: 0x0
[9.155762][T0] DMAR: dmar9: reg_base_addr f7ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.164491][T0] DMAR: DRHD base: 0x00f9ffc000 flags: 0x0
[9.170645][T0] DMAR: dmar10: reg_base_addr f9ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.179476][T0] DMAR: DRHD base: 0x00fbffc000 flags: 0x0
[9.185626][T0] DMAR: dmar11: reg_base_addr fbffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.194442][T0] DMAR: DRHD base: 0x00dfffc000 flags: 0x0
[...]

Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jürgen Groß

On 25.09.20 14:21, Jan Beulich wrote:

On 25.09.2020 12:34, Julien Grall wrote:

On 24/09/2020 11:53, Jan Beulich wrote:

xmalloc() & Co may not be called with IRQs off, or else check_lock()
will have its assertion trigger about locks getting acquired
inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
very reasonable, especially since the per-channel lock was introduced to
avoid acquiring the per-domain event lock on the send paths. Issue a
second call to xsm_evtchn_send() instead, before acquiring the lock, to
give XSM / Flask a chance to pre-allocate whatever it may need.


This is the sort of fall-out I was expecting when we decided to turn off
the interrupts for a big chunk of code. I couldn't find any at the time
though...

Can you remind me which caller of send_guest_{global,vcpu}_virq() will call
them with interrupts off?


I don't recall which one of the two it was that I hit; we wanted
both to use the lock anyway. send_guest_pirq() very clearly also
gets called with IRQs off.


Would it be possible to consider deferring the call to a softirq
tasklet? If so, this would allow us to turn the interrupts back on.


Of course this is in principle possible; the question is how
involved this is going to get. However, on x86 oprofile's call to
send_guest_vcpu_virq() can't easily be replaced - it's dangerous
enough already that it involves locks in NMI context. I don't
fancy seeing it use more commonly used ones.


Is it really so hard to avoid calling send_guest_vcpu_virq() in NMI
context?

Today it is called only if the NMI happened inside the guest, so the
main Xen stack is unused at this time. It should be rather
straightforward to mimic a stack frame on the main stack and iret to a special
handler from NMI context. This handler would then call
send_guest_vcpu_virq() and return to the guest.

This would basically be similar to the Linux kernel's
__run_on_irqstack().


Juergen



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Julien Grall

Hi Jan,

On 25/09/2020 14:58, Jan Beulich wrote:

On 25.09.2020 15:16, Julien Grall wrote:

Hi Jan,

On 25/09/2020 13:21, Jan Beulich wrote:

On 25.09.2020 12:34, Julien Grall wrote:

On 24/09/2020 11:53, Jan Beulich wrote:

xmalloc() & Co may not be called with IRQs off, or else check_lock()
will have its assertion trigger about locks getting acquired
inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
very reasonable, especially since the per-channel lock was introduced to
avoid acquiring the per-domain event lock on the send paths. Issue a
second call to xsm_evtchn_send() instead, before acquiring the lock, to
give XSM / Flask a chance to pre-allocate whatever it may need.


This is the sort of fall-out I was expecting when we decided to turn off
the interrupts for a big chunk of code. I couldn't find any at the time
though...

Can you remind me which caller of send_guest_{global,vcpu}_virq() will call
them with interrupts off?


I don't recall which one of the two it was that I hit; we wanted
both to use the lock anyway. send_guest_pirq() very clearly also
gets called with IRQs off.


Would it be possible to consider deferring the call to a softirq
tasklet? If so, this would allow us to turn the interrupts back on.


Of course this is in principle possible; the question is how
involved this is going to get.
However, on x86 oprofile's call to
send_guest_vcpu_virq() can't easily be replaced - it's dangerous
enough already that it involves locks in NMI context. I don't
fancy seeing it use more commonly used ones.


Fair enough. I would still like to consider a way where we could avoid 
hacking xsm_* because we have the interrupts disabled.


Well, from a conceptual pov it's at least questionable for XSM to
need any memory allocations at all when merely being asked for
permission. And indeed the need for it arises solely from its
desire to cache the result, for the sake of subsequent lookups.

I also find it odd that there's an XSM check on the send path in
the first place. This isn't just because it would seem to me that
it should be decided at binding time whether sending is permitted
- I may easily be missing something in the conceptual model here.
It's also because __domain_finalise_shutdown() too uses
evtchn_send(), and I didn't think this one should be subject to
any XSM check (just like send_guest_*() aren't).


Maybe this is the first question we need to answer?




AFAICT, we don't need send_guest_global_virq() and evtchn_send() to be
mutually exclusive. Is that correct?


Yes, any number of sends (even to the same port) could in principle
run concurrently, I think. Or else the FIFO code would have been
broken from the very point where the per-channel lock was
introduced and acquiring of the per-domain one then dropped from
evtchn_send() (other sending paths weren't using the per-domain one
anyway already before that).


So how about splitting the lock in two? One would be used when the
interrupts have to be disabled while the other would be used when we can
keep interrupts enabled.


Now that's an interesting proposal. I thought one lock per channel
was already pretty fine-grained. Now you propose making it two.


The two locks would have to be taken when the event channel needs to be
modified.


Requiring a total of 6 locks to be acquired when fiddling with
interdomain channels... Wow. Definitely more intrusive overall than
the change here.


Well, hacks are always more self-contained, but long term they are a 
nightmare to maintain :).


6 locks is indeed a lot, but the discussion here shows that the current 
locking is probably not suitable for the current uses.


I don't have a better solution so far, but I will have a think about it. 
Meanwhile, it would be good to figure out why xsm_evtchn_send() is needed.


Cheers,

--
Julien Grall
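
A rough shape of the two-lock idea, purely for illustration (no such code
exists; the field names are made up):

    struct evtchn {
        ...
        spinlock_t irq_lock;  /* taken with IRQs off: IRQ/NMI-safe send paths */
        spinlock_t lock;      /* taken with IRQs on: hypercall-context paths  */
        ...
    };

    /*
     * Senders in interrupt context take only irq_lock; senders in hypercall
     * context take only lock. Anything that mutates the channel (bind,
     * close, reset) takes both, always in the same order:
     *
     *     spin_lock(&chn->lock);
     *     spin_lock_irqsave(&chn->irq_lock, flags);
     *     ... modify channel state ...
     *     spin_unlock_irqrestore(&chn->irq_lock, flags);
     *     spin_unlock(&chn->lock);
     */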



Re: [Intel-gfx] [PATCH 08/11] drm/i915: use vmap in i915_gem_object_map

2020-09-25 Thread Matthew Auld
On Thu, 24 Sep 2020 at 14:59, Christoph Hellwig  wrote:
>
> i915_gem_object_map implements fairly low-level vmap functionality in
> a driver.  Split it into two helpers, one for remapping kernel memory
> which can use vmap, and one for I/O memory that uses vmap_pfn.
>
> The only practical difference is that alloc_vm_area prefaults the
> vmalloc area PTEs, which doesn't seem to be required here for the
> kernel memory case (and could be added to vmap using a flag if actually
> required).
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/i915/Kconfig  |   1 +
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 126 ++
>  2 files changed, 59 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 9afa5c4a6bf006..1e1cb245fca778 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -25,6 +25,7 @@ config DRM_I915
> select CRC32
> select SND_HDA_I915 if SND_HDA_CORE
> select CEC_CORE if CEC_NOTIFIER
> +   select VMAP_PFN
> help
>   Choose this option if you have a system that has "Intel Graphics
>   Media Accelerator" or "HD Graphics" integrated graphics,
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6550c0bc824ea2..b519417667eb4b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -232,34 +232,21 @@ int __i915_gem_object_put_pages(struct 
> drm_i915_gem_object *obj)
> return err;
>  }
>
> -static inline pte_t iomap_pte(resource_size_t base,
> - dma_addr_t offset,
> - pgprot_t prot)
> -{
> -   return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
> -}
> -
>  /* The 'mapping' part of i915_gem_object_pin_map() below */
> -static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
> -enum i915_map_type type)
> +static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
> +   enum i915_map_type type)
>  {
> -   unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
> -   struct sg_table *sgt = obj->mm.pages;
> -   pte_t *stack[32], **mem;
> -   struct vm_struct *area;
> +   unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
> +   struct page *stack[32], **pages = stack, *page;
> +   struct sgt_iter iter;
> pgprot_t pgprot;
> +   void *vaddr;
>
> -   if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
> -   return NULL;
> -
> -   if (GEM_WARN_ON(type == I915_MAP_WC &&
> -   !static_cpu_has(X86_FEATURE_PAT)))
> -   return NULL;
> -
> -   /* A single page can always be kmapped */
> -   if (n_pte == 1 && type == I915_MAP_WB) {
> -   struct page *page = sg_page(sgt->sgl);
> -
> +   switch (type) {
> +   default:
> +   MISSING_CASE(type);
> +   fallthrough;/* to use PAGE_KERNEL anyway */
> +   case I915_MAP_WB:
> /*
>  * On 32b, highmem using a finite set of indirect PTE (i.e.
>  * vmap) to provide virtual mappings of the high pages.
> @@ -277,30 +264,8 @@ static void *i915_gem_object_map(struct 
> drm_i915_gem_object *obj,
>  * So if the page is beyond the 32b boundary, make an explicit
>  * vmap.
>  */
> -   if (!PageHighMem(page))
> -   return page_address(page);
> -   }
> -
> -   mem = stack;
> -   if (n_pte > ARRAY_SIZE(stack)) {
> -   /* Too big for stack -- allocate temporary array instead */
> -   mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
> -   if (!mem)
> -   return NULL;
> -   }
> -
> -   area = alloc_vm_area(obj->base.size, mem);
> -   if (!area) {
> -   if (mem != stack)
> -   kvfree(mem);
> -   return NULL;
> -   }
> -
> -   switch (type) {
> -   default:
> -   MISSING_CASE(type);
> -   fallthrough;/* to use PAGE_KERNEL anyway */
> -   case I915_MAP_WB:
> +   if (n_pages == 1 && !PageHighMem(sg_page(obj->mm.pages->sgl)))
> +   return page_address(sg_page(obj->mm.pages->sgl));
> pgprot = PAGE_KERNEL;
> break;
> case I915_MAP_WC:
> @@ -308,30 +273,49 @@ static void *i915_gem_object_map(struct 
> drm_i915_gem_object *obj,
> break;
> }
>
> -   if (i915_gem_object_has_struct_page(obj)) {
> -   struct sgt_iter iter;
> -   struct page *page;
> -   pte_t **ptes = mem;
> +   if (n_pages > ARRAY_SIZE(stack)) {
> +   /* Too big for stack -- allocate 
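
For orientation, the tail end of such a vmap()-based helper condenses to
roughly the following (an illustrative sketch, not the missing hunk
verbatim; all identifiers appear in the quoted code above):

    /* Gather the backing pages, then map them in one go. */
    i = 0;
    for_each_sgt_page(page, iter, obj->mm.pages)
        pages[i++] = page;

    vaddr = vmap(pages, n_pages, 0, pgprot);
    if (pages != stack)
        kvfree(pages);
    return vaddr;    /* NULL if vmap() failed */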

[PATCH v2] x86/xen: disable Firmware First mode for correctable memory errors

2020-09-25 Thread Juergen Gross
When running as Xen dom0 the kernel isn't responsible for selecting the
error handling mode, this should be handled by the hypervisor.

So disable setting FF mode when running as Xen pv guest. Not doing so
might result in boot splats like:

[7.509696] HEST: Enabling Firmware First mode for corrected errors.
[7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 2.
[7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 3.
[7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 4.
[7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 5.
[7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 6.
[7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 7.
[7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 8.

Reason is that the HEST ACPI table contains the real number of MCA
banks, while the hypervisor is emulating only 2 banks for guests.

Cc: sta...@vger.kernel.org
Signed-off-by: Juergen Gross 
Reviewed-by: Boris Ostrovsky 
---
 arch/x86/xen/enlighten_pv.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 22e741e0b10c..351ac1a9a119 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1376,6 +1376,15 @@ asmlinkage __visible void __init xen_start_kernel(void)
x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 
xen_boot_params_init_edd();
+
+#ifdef CONFIG_ACPI
+   /*
+* Disable selecting "Firmware First mode" for correctable
+* memory errors, as this is the duty of the hypervisor to
+* decide.
+*/
+   acpi_disable_cmcff = 1;
+#endif
}
 
if (!boot_params.screen_info.orig_video_isVGA)
-- 
2.26.2




Re: [PATCH] x86/xen: disable Firmware First mode for correctable memory errors

2020-09-25 Thread Jan Beulich
On 25.09.2020 15:45, boris.ostrov...@oracle.com wrote:
> On 9/25/20 6:11 AM, Juergen Gross wrote:
>> @@ -1296,6 +1296,14 @@ asmlinkage __visible void __init 
>> xen_start_kernel(void)
>>  
>>  xen_smp_init();
>>  
>> +#ifdef CONFIG_ACPI
>> +/*
>> + * Disable selecting "Firmware First mode" for correctable memory
>> + * errors, as this is the duty of the hypervisor to decide.
>> + */
>> +acpi_disable_cmcff = 1;
>> +#endif
> 
> 
> Not that it matters greatly but should this go under if (xen_initial_domain())
> clause a bit further down?

Yes - DomU-s are supposed to be in charge of their (virtual) firmware,
no matter that right now APEI for guests is completely out of sight as
far as I'm aware.

Jan



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jan Beulich
On 25.09.2020 15:16, Julien Grall wrote:
> Hi Jan,
> 
> On 25/09/2020 13:21, Jan Beulich wrote:
>> On 25.09.2020 12:34, Julien Grall wrote:
>>> On 24/09/2020 11:53, Jan Beulich wrote:
 xmalloc() & Co may not be called with IRQs off, or else check_lock()
 will have its assertion trigger about locks getting acquired
 inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
 very reasonable, especially since the per-channel lock was introduced to
 avoid acquiring the per-domain event lock on the send paths. Issue a
 second call to xsm_evtchn_send() instead, before acquiring the lock, to
 give XSM / Flask a chance to pre-allocate whatever it may need.
>>>
>>> This is the sort of fall-out I was expecting when we decided to turn off
>>> the interrupts for a big chunk of code. I couldn't find any at the time
>>> though...
>>>
>>> Can you remind me which caller of send_guest_{global,vcpu}_virq() will call
>>> them with interrupts off?
>>
>> I don't recall which one of the two it was that I hit; we wanted
>> both to use the lock anyway. send_guest_pirq() very clearly also
>> gets called with IRQs off.
>>
>>> Would it be possible to consider deferring the call to a softirq
>>> tasklet? If so, this would allow us to turn the interrupts back on.
>>
>> Of course this is in principle possible; the question is how
>> involved this is going to get. 
>> However, on x86 oprofile's call to
>> send_guest_vcpu_virq() can't easily be replaced - it's dangerous
>> enough already that it involves locks in NMI context. I don't
>> fancy seeing it use more commonly used ones.
> 
> Fair enough. I would still like to consider a way where we could avoid 
> hacking xsm_* because we have the interrupts disabled.

Well, from a conceptual pov it's at least questionable for XSM to
need any memory allocations at all when merely being asked for
permission. And indeed the need for it arises solely from its
desire to cache the result, for the sake of subsequent lookups.

I also find it odd that there's an XSM check on the send path in
the first place. This isn't just because it would seem to me that
it should be decided at binding time whether sending is permitted
- I may easily be missing something in the conceptual model here.
It's also because __domain_finalise_shutdown() too uses
evtchn_send(), and I didn't think this one should be subject to
any XSM check (just like send_guest_*() aren't).

> AFAICT, we don't need send_guest_global_virq() and evtchn_send() to be 
> mutually exclusive. Is that correct?

Yes, any number of sends (even to the same port) could in principle
run concurrently, I think. Or else the FIFO code would have been
broken from the very point where the per-channel lock was
introduced and acquiring of the per-domain one then dropped from
evtchn_send() (other sending paths weren't using the per-domain one
anyway already before that).

> So how about splitting the lock in two? One would be used when the 
> interrupts have to be disabled while the other would be used when we can 
> keep interrupts enabled.

Now that's an interesting proposal. I thought one lock per channel
was already pretty fine-grained. Now you propose making it two.

> The two locks would have to be taken when the event channel needs to be 
> modified.

Requiring a total of 6 locks to be acquired when fiddling with
interdomain channels... Wow. Definitely more intrusive overall than
the change here.

Jan



Re: [PATCH RFC 3/4] mm/page_alloc: always move pages to the tail of the freelist in unset_migratetype_isolate()

2020-09-25 Thread Oscar Salvador
On Wed, Sep 16, 2020 at 08:34:10PM +0200, David Hildenbrand wrote:
> Page isolation doesn't actually touch the pages, it simply isolates
> pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
> 
> We already place pages to the tail of the freelists when undoing
> isolation via __putback_isolated_page(), let's do it in any case
> (e.g., if order == pageblock_order) and document the behavior.
> 
> This change results in all pages getting onlined via online_pages() to
> be placed to the tail of the freelist.
> 
> Cc: Andrew Morton 
> Cc: Alexander Duyck 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Dave Hansen 
> Cc: Vlastimil Babka 
> Cc: Wei Yang 
> Cc: Oscar Salvador 
> Cc: Mike Rapoport 
> Cc: Scott Cheloha 
> Cc: Michael Ellerman 
> Signed-off-by: David Hildenbrand 

LGTM.
I feel the same way about the move_freepages_block_tail()/move_freepages_tail()
wrappers; I think we are better off without them.

Reviewed-by: Oscar Salvador 

Thanks

-- 
Oscar Salvador
SUSE L3
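
For readers following along, the helper at the heart of this behavior is
essentially the following (sketch, condensed from the patches under review):

    static inline void add_to_free_list_tail(struct page *page,
                                             struct zone *zone,
                                             unsigned int order,
                                             int migratetype)
    {
        struct free_area *area = &zone->free_area[order];

        /*
         * Tail placement: the buddy allocator hands out pages from the
         * head, so pages added here (e.g. freshly onlined memory) stay
         * unused the longest.
         */
        list_add_tail(&page->lru, &area->free_list[migratetype]);
        area->nr_free++;
    }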



Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable

2020-09-25 Thread Qian Cai
On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.
> 
> Signed-off-by: Thomas Gleixner 

Today's linux-next produces some warnings on s390x:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]
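
One hypothetical way to resolve this would be to make the select
conditional on PCI, e.g. (sketch, not a posted patch):

--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
-	select PCI_MSI_ARCH_FALLBACKS
+	select PCI_MSI_ARCH_FALLBACKS	if PCI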

> ---
> V2: Make the architectures (and drivers) which need the fallbacks select them
> and not the other way round (Bjorn).
> ---
>  arch/ia64/Kconfig  |1 +
>  arch/mips/Kconfig  |1 +
>  arch/powerpc/Kconfig   |1 +
>  arch/s390/Kconfig  |1 +
>  arch/sparc/Kconfig |1 +
>  arch/x86/Kconfig   |1 +
>  drivers/pci/Kconfig|3 +++
>  drivers/pci/controller/Kconfig |3 +++
>  drivers/pci/msi.c  |3 ++-
>  include/linux/msi.h|   31 ++-
>  10 files changed, 40 insertions(+), 6 deletions(-)
> 
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -56,6 +56,7 @@ config IA64
>   select NEED_DMA_MAP_STATE
>   select NEED_SG_DMA_LENGTH
>   select NUMA if !FLATMEM
> + select PCI_MSI_ARCH_FALLBACKS
>   default y
>   help
> The Itanium Processor Family is Intel's 64-bit successor to
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -86,6 +86,7 @@ config MIPS
>   select MODULES_USE_ELF_REL if MODULES
>   select MODULES_USE_ELF_RELA if MODULES && 64BIT
>   select PERF_USE_VMALLOC
> + select PCI_MSI_ARCH_FALLBACKS
>   select RTC_LIB
>   select SYSCTL_EXCEPTION_TRACE
>   select VIRT_TO_BUS
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -246,6 +246,7 @@ config PPC
>   select OLD_SIGACTIONif PPC32
>   select OLD_SIGSUSPEND
>   select PCI_DOMAINS  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select PCI_SYSCALL  if PCI
>   select PPC_DAWR if PPC64
>   select RTC_LIB
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -185,6 +185,7 @@ config S390
>   select OLD_SIGSUSPEND3
>   select PCI_DOMAINS  if PCI
>   select PCI_MSI  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select SPARSE_IRQ
>   select SYSCTL_EXCEPTION_TRACE
>   select THREAD_INFO_IN_TASK
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -43,6 +43,7 @@ config SPARC
>   select GENERIC_STRNLEN_USER
>   select MODULES_USE_ELF_RELA
>   select PCI_SYSCALL if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select ODD_RT_SIGACTION
>   select OLD_SIGSUSPEND
>   select CPU_NO_EFFICIENT_FFS
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -225,6 +225,7 @@ config X86
>   select NEED_SG_DMA_LENGTH
>   select PCI_DOMAINS  if PCI
>   select PCI_LOCKLESS_CONFIG  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select PERF_EVENTS
>   select RTC_LIB
>   select RTC_MC146818_LIB
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_ARCH_FALLBACKS
> + bool
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -41,6 +41,7 @@ config PCI_TEGRA
>   bool "NVIDIA Tegra PCIe controller"
>   depends on ARCH_TEGRA || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
> + select PCI_MSI_ARCH_FALLBACKS
>   help
> Say Y here if you want support for the PCIe host controller found
> on NVIDIA Tegra SoCs.
> @@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
>   bool "Renesas R-Car PCIe host controller"
>   depends on ARCH_RENESAS || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
> + select PCI_MSI_ARCH_FALLBACKS
>   help
> Say Y here if you want PCIe controller support on R-Car SoCs in host
> mode.
> @@ -103,6 +105,7 @@ config 

Re: [PATCH] x86/xen: disable Firmware First mode for correctable memory errors

2020-09-25 Thread boris . ostrovsky


On 9/25/20 6:11 AM, Juergen Gross wrote:
> @@ -1296,6 +1296,14 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  
>   xen_smp_init();
>  
> +#ifdef CONFIG_ACPI
> + /*
> +  * Disable selecting "Firmware First mode" for correctable memory
> +  * errors, as this is the duty of the hypervisor to decide.
> +  */
> + acpi_disable_cmcff = 1;
> +#endif


Not that it matters greatly, but should this go under the if (xen_initial_domain())
clause a bit further down?


Either way:


Reviewed-by: Boris Ostrovsky 





Re: [PATCH RFC 2/4] mm/page_alloc: place pages to tail in __putback_isolated_page()

2020-09-25 Thread David Hildenbrand
On 25.09.20 15:19, Oscar Salvador wrote:
> On Wed, Sep 16, 2020 at 08:34:09PM +0200, David Hildenbrand wrote:
>> __putback_isolated_page() already documents that pages will be placed to
>> the tail of the freelist - this is, however, not the case for
>> "order >= MAX_ORDER - 2" (see buddy_merge_likely()) - which should be
>> the case for all existing users.
>>
>> This change affects two users:
>> - free page reporting
>> - page isolation, when undoing the isolation.
>>
>> This behavior is desirable for pages that haven't really been touched
>> lately, so exactly the two users that don't actually read/write page
>> content, but rather move untouched pages.
>>
>> The new behavior is especially desirable for memory onlining, where we
>> allow allocation of newly onlined pages via undo_isolate_page_range()
>> in online_pages(). Right now, we always place them to the head of the
>> free list, resulting in undesirable behavior: Assume we add
>> individual memory chunks via add_memory() and online them right away to
>> the NORMAL zone. We create a dependency chain of unmovable allocations
>> e.g., via the memmap. The memmap of the next chunk will be placed onto
>> previous chunks - if the last block cannot get offlined+removed, all
>> dependent ones cannot get offlined+removed. While this can already be
>> observed with individual DIMMs, it's more of an issue for virtio-mem
>> (and I suspect also ppc DLPAR).
>>
>> Note: If we observe a degradation due to the changed page isolation
>> behavior (which I doubt), we can always make this configurable by the
>> instance triggering undo of isolation (e.g., alloc_contig_range(),
>> memory onlining, memory offlining).
>>
>> Cc: Andrew Morton 
>> Cc: Alexander Duyck 
>> Cc: Mel Gorman 
>> Cc: Michal Hocko 
>> Cc: Dave Hansen 
>> Cc: Vlastimil Babka 
>> Cc: Wei Yang 
>> Cc: Oscar Salvador 
>> Cc: Mike Rapoport 
>> Cc: Scott Cheloha 
>> Cc: Michael Ellerman 
>> Signed-off-by: David Hildenbrand 
> 
> LGTM, the only thing is the shuffle_zone topic that Wei and Vlastimil raised.
> Feels a bit odd that it takes precedence over something we explicitly demanded.
>

Thanks, yeah I'll be changing that.

> With the comment Vlastimil suggested:
> 
> Reviewed-by: Oscar Salvador 
> 

-- 
Thanks,

David / dhildenb




Re: [PATCH 06/11] drm/i915: use vmap in shmem_pin_map

2020-09-25 Thread Tvrtko Ursulin



On 24/09/2020 14:58, Christoph Hellwig wrote:

shmem_pin_map somewhat awkwardly reimplements vmap using
alloc_vm_area and manual pte setup.  The only practical difference
is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't
seem to be required here (and could be added to vmap using a flag if
actually required).  Switch to use vmap, and use vfree to free both the
vmalloc mapping and the page array, as well as dropping the references
to each page.

Signed-off-by: Christoph Hellwig 
---
  drivers/gpu/drm/i915/gt/shmem_utils.c | 76 +++
  1 file changed, 18 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c 
b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 43c7acbdc79dea..f011ea42487e11 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -49,80 +49,40 @@ struct file *shmem_create_from_object(struct 
drm_i915_gem_object *obj)
return file;
  }
  
-static size_t shmem_npte(struct file *file)

-{
-   return file->f_mapping->host->i_size >> PAGE_SHIFT;
-}
-
-static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte)
-{
-   unsigned long pfn;
-
-   vunmap(ptr);
-
-   for (pfn = 0; pfn < n_pte; pfn++) {
-   struct page *page;
-
-   page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
-  GFP_KERNEL);
-   if (!WARN_ON(IS_ERR(page))) {
-   put_page(page);
-   put_page(page);
-   }
-   }
-}
-
  void *shmem_pin_map(struct file *file)
  {
-   const size_t n_pte = shmem_npte(file);
-   pte_t *stack[32], **ptes, **mem;
-   struct vm_struct *area;
-   unsigned long pfn;
-
-   mem = stack;
-   if (n_pte > ARRAY_SIZE(stack)) {
-   mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
-   if (!mem)
-   return NULL;
-   }
+   struct page **pages;
+   size_t n_pages, i;
+   void *vaddr;
  
-	area = alloc_vm_area(n_pte << PAGE_SHIFT, mem);

-   if (!area) {
-   if (mem != stack)
-   kvfree(mem);
+   n_pages = file->f_mapping->host->i_size >> PAGE_SHIFT;
+   pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
return NULL;
-   }
  
-	ptes = mem;

-   for (pfn = 0; pfn < n_pte; pfn++) {
-   struct page *page;
-
-   page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
-  GFP_KERNEL);
-   if (IS_ERR(page))
+   for (i = 0; i < n_pages; i++) {
+   pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i,
+  GFP_KERNEL);
+   if (IS_ERR(pages[i]))
goto err_page;
-
-   **ptes++ = mk_pte(page,  PAGE_KERNEL);
}
  
-	if (mem != stack)

-   kvfree(mem);
-
+   vaddr = vmap(pages, n_pages, VM_MAP_PUT_PAGES, PAGE_KERNEL);
+   if (!vaddr)
+   goto err_page;
mapping_set_unevictable(file->f_mapping);
-   return area->addr;
-
+   return vaddr;
  err_page:
-   if (mem != stack)
-   kvfree(mem);
-
-   __shmem_unpin_map(file, area->addr, pfn);
+   while (--i >= 0)
+   put_page(pages[i]);
+   kvfree(pages);
return NULL;
  }
  
  void shmem_unpin_map(struct file *file, void *ptr)

  {
mapping_clear_unevictable(file->f_mapping);
-   __shmem_unpin_map(file, ptr, shmem_npte(file));
+   vfree(ptr);
  }
  
  static int __shmem_rw(struct file *file, loff_t off,




Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
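
A note on the flag doing the heavy lifting here: with VM_MAP_PUT_PAGES,
vfree() not only removes the mapping but also drops a reference on every
mapped page and frees the pages[] array itself, which is why
shmem_unpin_map() above shrinks to a single vfree() call. Minimal usage
sketch:

    void *va = vmap(pages, n_pages, VM_MAP_PUT_PAGES, PAGE_KERNEL);

    if (!va) {
        /* On failure the caller still owns the pages and the array. */
        while (n_pages--)
            put_page(pages[n_pages]);
        kvfree(pages);
        return NULL;
    }
    /* ... use the mapping ... */
    vfree(va);    /* unmaps, puts each page, and frees the array */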



Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common

2020-09-25 Thread Oleksandr



On 25.09.20 10:03, Jan Beulich wrote:

Hi Jan.


On 24.09.2020 18:45, Oleksandr wrote:

On 24.09.20 14:16, Jan Beulich wrote:

Hi Jan


On 22.09.2020 21:32, Oleksandr wrote:

On 16.09.20 11:50, Jan Beulich wrote:

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
break;
}

+#ifdef CONFIG_IOREQ_SERVER

+if ( op == XENMEM_decrease_reservation )
+curr_d->qemu_mapcache_invalidate = true;
+#endif

I don't see why you put this right into decrease_reservation(). This
isn't just to avoid the extra conditional, but first and foremost to
avoid bypassing the earlier return from the function (in the case of
preemption). In the context of this I wonder whether the ordering of
operations in hvm_hypercall() is actually correct.

Good point, indeed we may return earlier in case of preemption, I missed
that.
Will move it to decrease_reservation(). But, we may return even earlier
in case of error...
Now I am wondering whether we should move it to the very beginning of command
processing or not?

In _this_ series I'd strongly recommend you keep things working as
they are. If independently you think you've found a reason to
re-order certain operations, then feel free to send a patch with
suitable justification.

Of course, I will try to retain current behavior.



I'm also unconvinced curr_d is the right domain in all cases here;
while this may be a pre-existing issue in principle, I'm afraid it
gets more pronounced by the logic getting moved to common code.

Sorry I didn't get your concern here.

Well, you need to be concerned whose qemu_mapcache_invalidate flag
you set.

May I ask, in what cases the *curr_d* is the right domain?

When a domain does a decrease-reservation on itself. I thought
that's obvious. But perhaps your question was rather meant as to
whether a->domain ever is _not_ the right one?

No, my question was about *curr_d*. I saw your answer
> I'm also unconvinced curr_d is the right domain in all cases here;
and just wanted to clarify these cases. Sorry if I was unclear.





We need to make sure that domain is using IOREQ server(s) at least.
Hopefully, we have a helper for this
which is hvm_domain_has_ioreq_server(). Please clarify: is there anything else I
should take care of?

Nothing I can recall / think of right now, except that the change
may want to come under a different title and with a different
description. As indicated, I don't think this is correct for PVH
Dom0 issuing the request against a HVM DomU, and addressing this
will likely want this moved out of hvm_memory_op() anyway. Of
course an option is to split this into two patches - the proposed
bug fix (perhaps wanting backporting) and then the moving of the
field out of arch.hvm. If you feel uneasy about the bug fix part,
let me know and I (or maybe Roger) will see to put together a
patch.


Thank you for the clarification.

Yes, it would be really nice if you (or maybe Roger) could create a 
patch for the bug fix part.


--
Regards,

Oleksandr Tyshchenko
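
For concreteness, the guard being discussed would look something like this
(sketch; the exact placement inside the decrease-reservation handler is
still to be settled, d being the domain the operation acts on):

    #ifdef CONFIG_IOREQ_SERVER
        /* Only domains actually served by an IOREQ server need the flush. */
        if ( hvm_domain_has_ioreq_server(d) )
            d->qemu_mapcache_invalidate = true;
    #endif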




Re: [PATCH RFC 2/4] mm/page_alloc: place pages to tail in __putback_isolated_page()

2020-09-25 Thread Oscar Salvador
On Wed, Sep 16, 2020 at 08:34:09PM +0200, David Hildenbrand wrote:
> __putback_isolated_page() already documents that pages will be placed to
> the tail of the freelist - this is, however, not the case for
> "order >= MAX_ORDER - 2" (see buddy_merge_likely()) - which should be
> the case for all existing users.
> 
> This change affects two users:
> - free page reporting
> - page isolation, when undoing the isolation.
> 
> This behavior is desirable for pages that haven't really been touched
> lately, so exactly the two users that don't actually read/write page
> content, but rather move untouched pages.
> 
> The new behavior is especially desirable for memory onlining, where we
> allow allocation of newly onlined pages via undo_isolate_page_range()
> in online_pages(). Right now, we always place them to the head of the
> free list, resulting in undesirable behavior: Assume we add
> individual memory chunks via add_memory() and online them right away to
> the NORMAL zone. We create a dependency chain of unmovable allocations
> e.g., via the memmap. The memmap of the next chunk will be placed onto
> previous chunks - if the last block cannot get offlined+removed, all
> dependent ones cannot get offlined+removed. While this can already be
> observed with individual DIMMs, it's more of an issue for virtio-mem
> (and I suspect also ppc DLPAR).
> 
> Note: If we observe a degradation due to the changed page isolation
> behavior (which I doubt), we can always make this configurable by the
> instance triggering undo of isolation (e.g., alloc_contig_range(),
> memory onlining, memory offlining).
> 
> Cc: Andrew Morton 
> Cc: Alexander Duyck 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Dave Hansen 
> Cc: Vlastimil Babka 
> Cc: Wei Yang 
> Cc: Oscar Salvador 
> Cc: Mike Rapoport 
> Cc: Scott Cheloha 
> Cc: Michael Ellerman 
> Signed-off-by: David Hildenbrand 

LGTM, the only thing is the shuffle_zone topic that Wei and Vlastimil raised.
Feels a bit odd that it takes precedence over something we explicitly demanded.

With the comment Vlastimil suggested:

Reviewed-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE L3



Re: [PATCH v2 02/11] xen/gnttab: Rework resource acquisition

2020-09-25 Thread Jan Beulich
On 22.09.2020 20:24, Andrew Cooper wrote:
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,81 @@ static int gnttab_get_shared_frame_mfn(struct domain 
> *d,
>  return 0;
>  }
>  
> +int gnttab_acquire_resource(
> +struct domain *d, unsigned int id, unsigned long frame,
> +unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +struct grant_table *gt = d->grant_table;
> +unsigned int i = nr_frames, tot_frames;
> +mfn_t tmp;
> +void **vaddrs = NULL;
> +int rc;
> +
> +/* Input sanity. */
> +if ( !nr_frames )
> +return -EINVAL;

I continue to object to this becoming an error. It wasn't before,
and it wouldn't be in line with various other operations, not the
least XENMEM_resource_ioreq_server handling, but also various of
the other mem-op functions (just to give concrete examples) or
basically all of the grant-table ones.

And if it really was to be an error, it could be had by folding
into ...

> +/* Overflow checks */
> +if ( frame + nr_frames < frame )
> +return -EINVAL;

this:

if ( frame + nr_frames <= frame )
return -EINVAL;

> +tot_frames = frame + nr_frames;
> +if ( tot_frames != frame + nr_frames )
> +return -EINVAL;
> +
> +/* Grow table if necessary. */
> +grant_write_lock(gt);
> +switch ( id )
> +{
> +case XENMEM_resource_grant_table_id_shared:
> +rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
> +break;
> +
> +case XENMEM_resource_grant_table_id_status:
> +if ( gt->gt_version != 2 )
> +{
> +default:
> +rc = -EINVAL;
> +break;
> +}
> +rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
> +break;
> +}
> +
> +/* Any errors from growing the table? */
> +if ( rc )
> +goto out;
> +
> +switch ( id )
> +{
> +case XENMEM_resource_grant_table_id_shared:
> +vaddrs = gt->shared_raw;
> +break;
> +
> +case XENMEM_resource_grant_table_id_status:
> +/* Check that void ** is a suitable representation for gt->status. */
> +BUILD_BUG_ON(!__builtin_types_compatible_p(
> + typeof(gt->status), grant_status_t **));
> +vaddrs = (void **)gt->status;
> +break;
> +}

Why the 2nd switch()? As long as you don't de-reference vaddrs
prior to checking rc, I don't see anything that could go wrong.
And once both are folded, maybe vaddr's initializer could also
be dropped again?

Jan
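
Folding both suggestions together, the head of the function could shrink to
something like this (sketch based on the review comments above; the
BUILD_BUG_ON type check is elided):

    /* Rejects nr_frames == 0 and unsigned long overflow in one go. */
    if ( frame + nr_frames <= frame )
        return -EINVAL;

    tot_frames = frame + nr_frames;
    if ( tot_frames != frame + nr_frames )  /* truncation to unsigned int */
        return -EINVAL;

    grant_write_lock(gt);
    switch ( id )
    {
    case XENMEM_resource_grant_table_id_shared:
        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
        vaddrs = gt->shared_raw;
        break;

    case XENMEM_resource_grant_table_id_status:
        if ( gt->gt_version != 2 )
        {
    default:
            rc = -EINVAL;
            break;
        }
        rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
        vaddrs = (void **)gt->status;
        break;
    }

    /* vaddrs is only assigned, never dereferenced, before rc is checked. */
    if ( rc )
        goto out;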



Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Julien Grall

Hi Jan,

On 25/09/2020 13:21, Jan Beulich wrote:

On 25.09.2020 12:34, Julien Grall wrote:

On 24/09/2020 11:53, Jan Beulich wrote:

xmalloc() & Co may not be called with IRQs off, or else check_lock()
will have its assertion trigger about locks getting acquired
inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
very reasonable, especially since the per-channel lock was introduced to
avoid acquiring the per-domain event lock on the send paths. Issue a
second call to xsm_evtchn_send() instead, before acquiring the lock, to
give XSM / Flask a chance to pre-allocate whatever it may need.


This is the sort of fall-out I was expecting when we decided to turn off 
the interrupts for a big chunk of code. I couldn't find any at the time 
though...

Can you remind me which caller of send_guest_{global,vcpu}_virq() will call 
them with interrupts off?


I don't recall which one of the two it was that I hit; we wanted
both to use the lock anyway. send_guest_pirq() very clearly also
gets called with IRQs off.


Would it be possible to consider deferring the call to a softirq
tasklet? If so, this would allow us to turn the interrupts back on.


Of course this is in principle possible; the question is how
involved this is going to get. 
However, on x86 oprofile's call to

send_guest_vcpu_virq() can't easily be replaced - it's dangerous
enough already that it involves locks in NMI context. I don't
fancy seeing it use more commonly used ones.


Fair enough. I would still like to consider a way where we could avoid 
hacking xsm_* because we have the interrupts disabled.


AFAICT, we don't need send_guest_global_virq() and evtchn_send() to be 
mutually exclusive. Is that correct?


So how about splitting the lock in two? One would be used when the 
interrupts have to be disabled while the other would be used when we can 
keep interrupts enabled.


The two locks would have to be taken when the event channel needs to be 
modified.


Cheers,

--
Julien Grall



RE: [PATCH v9 0/8] domain context infrastructure

2020-09-25 Thread Paul Durrant
> -Original Message-
> From: Lengyel, Tamas 
> Sent: 24 September 2020 20:36
> To: Paul Durrant ; xen-devel@lists.xenproject.org
> Cc: Paul Durrant ; Andrew Cooper 
> ; Daniel De Graaf
> ; George Dunlap ; Ian Jackson
> ; Jan Beulich ; Julien Grall 
> ; Marek
> Marczykowski-Górecki ; Roger Pau Monné 
> ;
> Stefano Stabellini ; Volodymyr Babchuk 
> ; Wei Liu
> 
> Subject: RE: [PATCH v9 0/8] domain context infrastructure
> 
> 
> 
> > -Original Message-
> > From: Xen-devel  On Behalf Of Paul
> > Durrant
> > Sent: Thursday, September 24, 2020 9:10 AM
> > To: xen-devel@lists.xenproject.org
> > Cc: Paul Durrant ; Andrew Cooper
> > ; Daniel De Graaf ;
> > George Dunlap ; Ian Jackson
> > ; Jan Beulich ; Julien Grall
> > ; Marek Marczykowski-Górecki
> > ; Roger Pau Monné
> > ; Stefano Stabellini ;
> > Volodymyr Babchuk ; Wei Liu
> > 
> > Subject: [PATCH v9 0/8] domain context infrastructure
> >
> > From: Paul Durrant 
> >
> > Paul Durrant (8):
> >   xen/common: introduce a new framework for save/restore of 'domain'
> > context
> >   xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
> >   tools/misc: add xen-domctx to present domain context
> >   docs/specs: add missing definitions to libxc-migration-stream
> >   docs / tools: specific migration v4 to include DOMAIN_CONTEXT
> >   common/domain: add a domain context record for shared_info...
> >   x86/time: add a domain context record for tsc_info...
> >   tools/libxc: add DOMAIN_CONTEXT records to the migration stream...
> 
> 
> Hi Paul,
> Could you push a git branch somewhere for this series? I would like to see 
> this being integrated with
> VM forking, and if it's not too much effort just create the patch for that so 
> that it could be appended
> to the series.
> 

Hi Tamas,

  Done. See 
https://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs/heads/domain-save14

  Cheers,

Paul





Re: [PATCH v9 6/8] common/domain: add a domain context record for shared_info...

2020-09-25 Thread Jan Beulich
On 24.09.2020 15:10, Paul Durrant wrote:
> From: Paul Durrant 
> 
> ... and update xen-domctx to dump some information describing the record.
> 
> NOTE: Processing of the content during restore is currently limited to
>   PV domains, and matches processing of the PV-only SHARED_INFO record
>   done by libxc. All content is, however, saved such that restore
>   processing can be modified in future without requiring a new record
>   format.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Jan Beulich 



[xen-4.10-testing test] 154750: trouble: blocked/broken

2020-09-25 Thread osstest service owner
flight 154750 xen-4.10-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/154750/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64  broken
 build-amd64-prev broken
 build-amd64-pvopsbroken
 build-amd64-xsm  broken
 build-amd64-xtf  broken
 build-arm64  broken
 build-arm64-pvopsbroken
 build-arm64-xsm  broken
 build-armhf  broken
 build-armhf-pvopsbroken
 build-i386   broken
 build-i386-prev  broken
 build-i386-pvops broken
 build-i386-xsm   broken
 build-i386-pvops  4 host-install(4)broken REGR. vs. 151728
 build-i386-prev   4 host-install(4)broken REGR. vs. 151728
 build-i3864 host-install(4)broken REGR. vs. 151728
 build-i386-xsm4 host-install(4)broken REGR. vs. 151728
 build-amd64-pvops 4 host-install(4)broken REGR. vs. 151728
 build-armhf-pvops 4 host-install(4)broken REGR. vs. 151728
 build-amd64-xsm   4 host-install(4)broken REGR. vs. 151728
 build-arm64   4 host-install(4)broken REGR. vs. 151728
 build-arm64-xsm   4 host-install(4)broken REGR. vs. 151728
 build-arm64-pvops 4 host-install(4)broken REGR. vs. 151728
 build-amd64   4 host-install(4)broken REGR. vs. 151728
 build-amd64-xtf   4 host-install(4)broken REGR. vs. 151728
 build-amd64-prev  4 host-install(4)broken REGR. vs. 151728
 build-armhf   4 host-install(4)broken REGR. vs. 151728

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-seattle   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-11 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-21 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-31 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 

Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Jan Beulich
On 25.09.2020 12:34, Julien Grall wrote:
> On 24/09/2020 11:53, Jan Beulich wrote:
>> xmalloc() & Co may not be called with IRQs off, or else check_lock()
>> will have its assertion trigger about locks getting acquired
>> inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
>> very reasonable, especially since the per-channel lock was introduced to
>> avoid acquiring the per-domain event lock on the send paths. Issue a
>> second call to xsm_evtchn_send() instead, before acquiring the lock, to
>> give XSM / Flask a chance to pre-allocate whatever it may need.
> 
> This is the sort of fall-out I was expecting when we decided to turn off 
> the interrupts for a big chunk of code. I couldn't find any at the time 
> though...
> 
> Can you remind me which caller of send_guest_{global,vcpu}_virq() will call 
> them with interrupts off?

I don't recall which one of the two it was that I hit; we wanted
both to use the lock anyway. send_guest_pirq() very clearly also
gets called with IRQs off.

> Would it be possible to consider deferring the call to a softirq 
> tasklet? If so, this would allow us to turn the interrupts back on.

Of course this is in principle possible; the question is how
involved this is going to get. However, on x86 oprofile's call to
send_guest_vcpu_virq() can't easily be replaced - it's dangerous
enough already that it involves locks in NMI context. I don't
fancy seeing it use more commonly used ones.

Jan



RE: [PATCH v2 0/2] fix 'xl block-detach'

2020-09-25 Thread Paul Durrant
Ping? AFAICT this series is fully acked. Can it be committed soon?

  Paul

> -Original Message-
> From: Paul Durrant 
> Sent: 15 September 2020 15:10
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant 
> Subject: [PATCH v2 0/2] fix 'xl block-detach'
> 
> From: Paul Durrant 
> 
> This series makes it behave as the documentation states it should
> 
> Paul Durrant (2):
>   libxl: provide a mechanism to define a device 'safe remove'
> function...
>   xl: implement documented '--force' option for block-detach
> 
>  docs/man/xl.1.pod.in |  4 ++--
>  tools/libxl/libxl.h  | 33 +
>  tools/libxl/libxl_device.c   |  9 +
>  tools/libxl/libxl_disk.c |  4 +++-
>  tools/libxl/libxl_domain.c   |  2 +-
>  tools/libxl/libxl_internal.h | 30 +++---
>  tools/xl/xl_block.c  | 21 -
>  tools/xl/xl_cmdtable.c   |  3 ++-
>  8 files changed, 77 insertions(+), 29 deletions(-)
> 
> --
> 2.20.1





Re: [PATCH] evtchn/Flask: pre-allocate node on send path

2020-09-25 Thread Julien Grall

Hi Jan,

On 24/09/2020 11:53, Jan Beulich wrote:

xmalloc() & Co may not be called with IRQs off, or else check_lock()
will have its assertion trigger about locks getting acquired
inconsistently. Re-arranging the locking in evtchn_send() doesn't seem
very reasonable, especially since the per-channel lock was introduced to
avoid acquiring the per-domain event lock on the send paths. Issue a
second call to xsm_evtchn_send() instead, before acquiring the lock, to
give XSM / Flask a chance to pre-allocate whatever it may need.


This is the sort of fall-out I was expecting when we decided to turn off 
the interrupts for a big chunk of code. I couldn't find any at the time 
though...


Can you remind me which caller of send_guest_{global,vcpu}_virq() will call 
them with interrupts off?


Would it be possible to consider deferring the call to a softirq 
tasklet? If so, this would allow us to turn the interrupts back on.


Cheers,

--
Julien Grall



Re: [PATCH RFC 1/4] mm/page_alloc: convert "report" flag of __free_one_page() to a proper flag

2020-09-25 Thread Oscar Salvador
On Wed, Sep 16, 2020 at 08:34:08PM +0200, David Hildenbrand wrote:
> Let's prepare for additional flags and avoid long parameter lists of bools.
> Follow-up patches will also make use of the flags in __free_pages_ok(),
> however, I wasn't able to come up with a better name for the type - should
> be good enough for internal purposes.
> 
> Cc: Andrew Morton 
> Cc: Alexander Duyck 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Dave Hansen 
> Cc: Vlastimil Babka 
> Cc: Wei Yang 
> Cc: Oscar Salvador 
> Cc: Mike Rapoport 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Oscar Salvador 

-- 
Oscar Salvador
SUSE L3
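
The conversion boils down to replacing the bool with a small flags type,
roughly (sketch matching the shape described in the patch subject):

    typedef int __bitwise fpi_t;

    #define FPI_NONE                ((__force fpi_t)0)

    /* Skip the free-page-reporting notification (was: "report" == false). */
    #define FPI_SKIP_REPORT_NOTIFY  ((__force fpi_t)BIT(0))

    static void __free_one_page(struct page *page, unsigned long pfn,
                                struct zone *zone, unsigned int order,
                                int migratetype, fpi_t fpi_flags);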



[PATCH] x86/xen: disable Firmware First mode for correctable memory errors

2020-09-25 Thread Juergen Gross
When running as Xen dom0 the kernel isn't responsible for selecting the
error handling mode, this should be handled by the hypervisor.

So disable setting FF mode when running as Xen pv guest. Not doing so
might result in boot splats like:

[7.509696] HEST: Enabling Firmware First mode for corrected errors.
[7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 2.
[7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 3.
[7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 4.
[7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 5.
[7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 6.
[7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 7.
[7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA 
bank 8.

Reason is that the HEST ACPI table contains the real number of MCA
banks, while the hypervisor is emulating only 2 banks for guests.

Cc: sta...@vger.kernel.org
Signed-off-by: Juergen Gross 
---
 arch/x86/xen/enlighten_pv.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 22e741e0b10c..065c049d0f3c 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1296,6 +1296,14 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
xen_smp_init();
 
+#ifdef CONFIG_ACPI
+   /*
+* Disable selecting "Firmware First mode" for correctable memory
+* errors, as this is the duty of the hypervisor to decide.
+*/
+   acpi_disable_cmcff = 1;
+#endif
+
 #ifdef CONFIG_ACPI_NUMA
/*
 * The pages we from Xen are not related to machine pages, so
-- 
2.26.2




Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2020-09-25 Thread Oleksandr



On 25.09.20 09:51, Jan Beulich wrote:

Hi Jan


On 24.09.2020 20:02, Oleksandr wrote:

On 24.09.20 19:02, Oleksandr wrote:
Julien, I noticed that three fields mmio* are not used within current
series on Arm. Do we expect them to be used later on?
I would rather not add fields which are not used. We could add them when
needed.

Would the following be acceptable?
1. Both fields "io_completion" and "io_req" (which seems to be the only
fields used in common/ioreq.c) are moved to common struct vcpu as part
of struct vcpu_io,
      enum hvm_io_completion is also moved (and renamed).
2. We remove everything related to hvm_vcpu* from the Arm header.
3. x86's struct hvm_vcpu_io stays as it is (but without two fields
"io_completion" and "io_req").
      I think, this way we separate a common part and reduce Xen changes
(which are getting bigger).

If this works, it would be my preference too.


Thanks. I will wait for Julien's input on that and if he doesn't have 
any objections I will go this direction.



--
Regards,

Oleksandr Tyshchenko
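
The layout being proposed would look roughly like this (sketch; the enum
rename is still to be decided, so names beyond those mentioned in the
thread are illustrative):

    struct vcpu_io {
        /* I/O request in flight to the device model. */
        enum vio_completion  completion;   /* was enum hvm_io_completion */
        ioreq_t              req;          /* was hvm_vcpu_io.io_req */
    };

    struct vcpu {
        ...
        struct vcpu_io io;   /* common: used by xen/common/ioreq.c */
        ...
    };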




Re: [PATCH v3 00/22] Convert all remaining drivers to GEM object functions

2020-09-25 Thread Thomas Zimmermann
Hi

Am 23.09.20 um 16:33 schrieb Christian König:
> Feel free to add an Acked-by: Christian König 
> to all patches which I haven't explicitly reviewed.

Done, thanks.

> 
> I would say we should just push this to drm-misc-next now.

It's merged now.

Best regards
Thomas

> 
> Thanks for the nice cleanup,
> Christian.
> 
> Am 23.09.20 um 12:21 schrieb Thomas Zimmermann:
>> The GEM and PRIME related callbacks in struct drm_driver are
>> deprecated in
>> favor of GEM object functions in struct drm_gem_object_funcs. This
>> patchset
>> converts the remaining drivers to object functions and removes most of
>> the
>> obsolete interfaces.
>>
>> Version 3 of this patchset mostly fixes drm_gem_prime_handle_to_fd and
>> updates i.MX's dcss driver. The driver was missing from earlier versions
>> and still needs review.
>>
>> Patches #1 to #6, #8 to #17 and #19 to #20 convert DRM drivers to GEM
>> object
>> functions, one by one. Each patch moves existing callbacks from struct
>> drm_driver to an instance of struct drm_gem_object_funcs, and sets these
>> funcs when the GEM object is initialized. The exception is
>> .gem_prime_mmap.
>> There are different ways of how drivers implement the callback, and
>> moving
>> it to GEM object functions requires a closer review for each.
>>
>> Patch #18 fixes virtgpu to use GEM object functions where possible. The
>> driver recently introduced a function for one of the deprecated
>> callbacks.
>>
>> Patches #7 and #20 convert i.MX's dcss and xlnx to CMA helper macros.
>> There's
>> no apparent reason why the drivers do the GEM setup on their's own.
>> Using CMA
>> helper macros adds GEM object functions implicitly.
>>
>> With most of the GEM and PRIME moved to GEM object functions, related
>> code
>> in struct drm_driver and in the DRM core/helpers is being removed by
>> patch
>> #22.
>>
>> Further testing is welcome. I tested the drivers for which I have HW
>> available. These are gma500, i915, nouveau, radeon and vc4. The console,
>> Weston and Xorg apparently work with the patches applied.
>>
>> v3:
>> * restore default call to drm_gem_prime_export() in
>>   drm_gem_prime_handle_to_fd()
>> * return -ENOSYS if get_sg_table is not set
>> * drop all checks for obj->funcs
>> * clean up TODO list and documentation
>> v2:
>> * moved code in amdgpu and radeon
>> * made several functions static in various drivers
>> * updated TODO-list item
>> * fix virtgpu
>>
>> Thomas Zimmermann (22):
>>    drm/amdgpu: Introduce GEM object functions
>>    drm/armada: Introduce GEM object functions
>>    drm/etnaviv: Introduce GEM object functions
>>    drm/exynos: Introduce GEM object functions
>>    drm/gma500: Introduce GEM object functions
>>    drm/i915: Introduce GEM object functions
>>    drm/imx/dcss: Initialize DRM driver instance with CMA helper macro
>>    drm/mediatek: Introduce GEM object functions
>>    drm/msm: Introduce GEM object funcs
>>    drm/nouveau: Introduce GEM object functions
>>    drm/omapdrm: Introduce GEM object functions
>>    drm/pl111: Introduce GEM object functions
>>    drm/radeon: Introduce GEM object functions
>>    drm/rockchip: Convert to drm_gem_object_funcs
>>    drm/tegra: Introduce GEM object functions
>>    drm/vc4: Introduce GEM object functions
>>    drm/vgem: Introduce GEM object functions
>>    drm/virtgpu: Set PRIME export function in struct drm_gem_object_funcs
>>    drm/vkms: Introduce GEM object functions
>>    drm/xen: Introduce GEM object functions
>>    drm/xlnx: Initialize DRM driver instance with CMA helper macro
>>    drm: Remove obsolete GEM and PRIME callbacks from struct drm_driver
>>
>>   Documentation/gpu/drm-mm.rst  |  4 +-
>>   Documentation/gpu/todo.rst    |  9 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  6 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 23 +++--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h   |  5 --
>>   drivers/gpu/drm/armada/armada_drv.c   |  3 -
>>   drivers/gpu/drm/armada/armada_gem.c   | 12 ++-
>>   drivers/gpu/drm/armada/armada_gem.h   |  2 -
>>   drivers/gpu/drm/drm_gem.c | 53 
>>   drivers/gpu/drm/drm_gem_cma_helper.c  |  8 +-
>>   drivers/gpu/drm/drm_prime.c   | 14 +--
>>   drivers/gpu/drm/etnaviv/etnaviv_drv.c | 13 ---
>>   drivers/gpu/drm/etnaviv/etnaviv_drv.h |  1 -
>>   drivers/gpu/drm/etnaviv/etnaviv_gem.c | 19 -
>>   drivers/gpu/drm/exynos/exynos_drm_drv.c   | 10 ---
>>   drivers/gpu/drm/exynos/exynos_drm_gem.c   | 15 
>>   drivers/gpu/drm/gma500/framebuffer.c  |  2 +
>>   drivers/gpu/drm/gma500/gem.c  | 18 +++-
>>   drivers/gpu/drm/gma500/gem.h  |  3 +
>>   drivers/gpu/drm/gma500/psb_drv.c  |  9 --
>>   drivers/gpu/drm/gma500/psb_drv.h  |  2 -
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c    | 21 -
>>   
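
(The quoted diffstat is truncated in the archive.) To illustrate the
conversion pattern the cover letter describes - a sketch with made-up
example_* names, not code from any driver in the series:

    #include <drm/drm_gem.h>
    #include <drm/drm_prime.h>

    static void example_gem_free_object(struct drm_gem_object *obj);
    static struct sg_table *example_gem_get_sg_table(struct drm_gem_object *obj);
    static void *example_gem_vmap(struct drm_gem_object *obj);
    static void example_gem_vunmap(struct drm_gem_object *obj, void *vaddr);

    /* Callbacks that used to live in struct drm_driver now sit in a
     * per-object function table... */
    static const struct drm_gem_object_funcs example_gem_funcs = {
        .free         = example_gem_free_object,
        .export       = drm_gem_prime_export,
        .get_sg_table = example_gem_get_sg_table,
        .vmap         = example_gem_vmap,
        .vunmap       = example_gem_vunmap,
    };

    /* ...and are wired up when the GEM object is initialized, instead of
     * being looked up via the driver structure. */
    static int example_gem_init(struct drm_device *dev,
                                struct drm_gem_object *obj, size_t size)
    {
        obj->funcs = &example_gem_funcs;
        return drm_gem_object_init(dev, obj, size);
    }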

Re: [PATCH RFC 3/4] mm/page_alloc: always move pages to the tail of the freelist in unset_migratetype_isolate()

2020-09-25 Thread Vlastimil Babka
On 9/25/20 10:05 AM, David Hildenbrand wrote:
  static inline void del_page_from_free_list(struct page *page, struct zone 
 *zone,
   unsigned int order)
  {
 @@ -2323,7 +2332,7 @@ static inline struct page 
 *__rmqueue_cma_fallback(struct zone *zone,
   */
  static int move_freepages(struct zone *zone,
  struct page *start_page, struct page *end_page,
 -int migratetype, int *num_movable)
 +int migratetype, int *num_movable, bool to_tail)
  {
struct page *page;
unsigned int order;
 @@ -2354,7 +2363,10 @@ static int move_freepages(struct zone *zone,
VM_BUG_ON_PAGE(page_zone(page) != zone, page);
  
order = page_order(page);
 -  move_to_free_list(page, zone, order, migratetype);
 +  if (to_tail)
 +  move_to_free_list_tail(page, zone, order, migratetype);
 +  else
 +  move_to_free_list(page, zone, order, migratetype);
page += 1 << order;
pages_moved += 1 << order;
}
 @@ -2362,8 +2374,9 @@ static int move_freepages(struct zone *zone,
return pages_moved;
  }
  
 -int move_freepages_block(struct zone *zone, struct page *page,
 -  int migratetype, int *num_movable)
 +static int __move_freepages_block(struct zone *zone, struct page *page,
 +int migratetype, int *num_movable,
 +bool to_tail)
  {
unsigned long start_pfn, end_pfn;
struct page *start_page, *end_page;
 @@ -2384,7 +2397,20 @@ int move_freepages_block(struct zone *zone, struct 
 page *page,
return 0;
  
return move_freepages(zone, start_page, end_page, migratetype,
 -  num_movable);
 +num_movable, to_tail);
 +}
 +
 +int move_freepages_block(struct zone *zone, struct page *page,
 +   int migratetype, int *num_movable)
 +{
 +  return __move_freepages_block(zone, page, migratetype, num_movable,
 +false);
 +}
 +
 +int move_freepages_block_tail(struct zone *zone, struct page *page,
 +int migratetype)
 +{
 +  return __move_freepages_block(zone, page, migratetype, NULL, true);
  }
>>>
>>> Likewise, just 5 callers of move_freepages_block(), all in the files you're
>>> already changing, so no need for this wrappers IMHO.
> 
> As long as we don't want to move the implementation to the header, we'll
> need it for the constant propagation to work at compile time (we don't
> really have link-time optimizations). Or am I missing something?

I guess move_freepages_block() is not exactly fast path, so we could do without 
it.

> Thanks!
> 
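
For readers weighing the constant-propagation argument above, a
standalone toy illustration (plain C, not kernel code) of why a static
helper plus thin wrappers in the same file lets the compiler drop the
bool entirely, while a caller in another translation unit would not get
that benefit without LTO:

    #include <stdbool.h>

    /* Static, so the compiler sees every call site in this file and can
     * specialize the helper for each constant 'to_tail' value. */
    static int pick(int head, int tail, bool to_tail)
    {
        return to_tail ? tail : head;
    }

    /* Exported wrappers: each passes a compile-time constant, so the
     * branch above folds away in the specialized copies. */
    int pick_head(int head, int tail) { return pick(head, tail, false); }
    int pick_tail(int head, int tail) { return pick(head, tail, true); }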




RE: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common

2020-09-25 Thread Paul Durrant
> -Original Message-
> From: Julien Grall 
> Sent: 24 September 2020 19:01
> To: Oleksandr Tyshchenko ; xen-devel@lists.xenproject.org
> Cc: Oleksandr Tyshchenko ; Andrew Cooper 
> ;
> George Dunlap ; Ian Jackson 
> ; Jan Beulich
> ; Stefano Stabellini ; Wei Liu 
> ; Roger Pau
> Monné ; Paul Durrant ; Jun Nakajima 
> ;
> Kevin Tian ; Tim Deegan ; Julien Grall 
> 
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> 
> 
> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> > +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> > +{
> > +unsigned int prev_state = STATE_IOREQ_NONE;
> > +unsigned int state = p->state;
> > +uint64_t data = ~0;
> > +
> > +smp_rmb();
> > +
> > +/*
> > + * The only reason we should see this condition be false is when an
> > + * emulator dying races with I/O being requested.
> > + */
> > +while ( likely(state != STATE_IOREQ_NONE) )
> > +{
> > +if ( unlikely(state < prev_state) )
> > +{
> > +gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> 
> > %u\n",
> > + prev_state, state);
> > +sv->pending = false;
> > +domain_crash(sv->vcpu->domain);
> > +return false; /* bail */
> > +}
> > +
> > +switch ( prev_state = state )
> > +{
> > +case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> > +p->state = STATE_IOREQ_NONE;
> > +data = p->data;
> > +break;
> > +
> > +case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> 
> > IORESP_READY */
> > +case STATE_IOREQ_INPROCESS:
> > +wait_on_xen_event_channel(sv->ioreq_evtchn,
> > +  ({ state = p->state;
> > + smp_rmb();
> > + state != prev_state; }));
> > +continue;
> 
> As I pointed out previously [1], this helper was implemented with the
> expectation that wait_on_xen_event_channel() will not return if the vCPU
> got rescheduled.
> 
> However, this assumption doesn't hold on Arm.
> 
> I can see two solution:
> 1) Re-execute the caller
> 2) Prevent an IOREQ to disappear until the loop finish.
> 
> @Paul any opinions?

The ioreq control plane is largely predicated on there being no pending I/O 
when the state of a server is modified, and it is assumed that domain_pause() 
is sufficient to achieve this. If that assumption doesn't hold then we need 
additional synchronization.

  Paul

> 
> Cheers,
> 
> [1]
> https://lore.kernel.org/xen-devel/6bfc3920-8f29-188c-cff4-2b99dabe1...@xen.org/
> 
> 
> --
> Julien Grall
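
To make Julien's first option above concrete, a rough sketch of
re-executing the caller when the wait may return after a reschedule;
everything not in the quoted patch (IO_RETRY, the loop shape) is an
assumption, not a proposal from the thread:

    /*
     * Hypothetical shape only: if wait_on_xen_event_channel() can return
     * because the vCPU was rescheduled (as on Arm), the helper cannot
     * resume mid-loop; it could instead report "retry" and let the
     * caller restart handling of the ioreq from the top.
     */
    for ( ;; )
    {
        rc = hvm_wait_for_io(sv, p);
        if ( rc != IO_RETRY )
            break;
        /* re-fetch the ioreq state and re-enter the handler */
    }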




Re: [PATCH RFC 2/4] mm/page_alloc: place pages to tail in __putback_isolated_page()

2020-09-25 Thread David Hildenbrand
On 24.09.20 12:37, Vlastimil Babka wrote:
> On 9/16/20 8:34 PM, David Hildenbrand wrote:
>> __putback_isolated_page() already documents that pages will be placed to
>> the tail of the freelist - this is, however, not the case for
>> "order >= MAX_ORDER - 2" (see buddy_merge_likely()) - which should be
>> the case for all existing users.
> 
> I think here should be a sentence saying something along "Thus this patch
> introduces a FOP_TO_TAIL flag to really ensure moving pages to tail."

Agreed, thanks!

> 
>> This change affects two users:
>> - free page reporting
>> - page isolation, when undoing the isolation.
>>
>> This behavior is desirable for pages that haven't really been touched
>> lately, so exactly the two users that don't actually read/write page
>> content, but rather move untouched pages.
>>
>> The new behavior is especially desirable for memory onlining, where we
>> allow allocation of newly onlined pages via undo_isolate_page_range()
>> in online_pages(). Right now, we always place them to the head of the
>> free list, resulting in undesirable behavior: Assume we add
>> individual memory chunks via add_memory() and online them right away to
>> the NORMAL zone. We create a dependency chain of unmovable allocations
>> e.g., via the memmap. The memmap of the next chunk will be placed onto
>> previous chunks - if the last block cannot get offlined+removed, all
>> dependent ones cannot get offlined+removed. While this can already be
>> observed with individual DIMMs, it's more of an issue for virtio-mem
>> (and I suspect also ppc DLPAR).
>>
>> Note: If we observe a degradation due to the changed page isolation
>> behavior (which I doubt), we can always make this configurable by the
>> instance triggering undo of isolation (e.g., alloc_contig_range(),
>> memory onlining, memory offlining).
>>
>> Cc: Andrew Morton 
>> Cc: Alexander Duyck 
>> Cc: Mel Gorman 
>> Cc: Michal Hocko 
>> Cc: Dave Hansen 
>> Cc: Vlastimil Babka 
>> Cc: Wei Yang 
>> Cc: Oscar Salvador 
>> Cc: Mike Rapoport 
>> Cc: Scott Cheloha 
>> Cc: Michael Ellerman 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  mm/page_alloc.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 91cefb8157dd..bba9a0f60c70 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -89,6 +89,12 @@ typedef int __bitwise fop_t;
>>   */
>>  #define FOP_SKIP_REPORT_NOTIFY  ((__force fop_t)BIT(0))
>>  
>> +/*
>> + * Place the freed page to the tail of the freelist after buddy merging. 
>> Will
>> + * get ignored with page shuffling enabled.
>> + */
>> +#define FOP_TO_TAIL ((__force fop_t)BIT(1))
>> +
>>  /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
>>  static DEFINE_MUTEX(pcp_batch_high_lock);
>>  #define MIN_PERCPU_PAGELIST_FRACTION(8)
>> @@ -1040,6 +1046,8 @@ static inline void __free_one_page(struct page *page, 
>> unsigned long pfn,
>>  
>>  if (is_shuffle_order(order))
>>  to_tail = shuffle_pick_tail();
>> +else if (fop_flags & FOP_TO_TAIL)
>> +to_tail = true;
> 
> Should we really let the random shuffling decision take priority over an
> explicit FOP_TO_TAIL request? Wei Yang mentioned that there's a call to
> shuffle_zone() anyway to process freshly added memory, so we don't need to
> do
> that also during the process of addition itself? Might help with your goal of
> reducing dependencies even on systems that do have shuffling enabled?

So, we do have cases where generic_online_page() -> __free_pages_core()
isn't called (see patch #4):

generic_online_page() is used in two cases:

1. Direct memory onlining in online_pages(). Here, we call
   shuffle_zone().
2. Deferred memory onlining in memory-ballooning-like mechanisms (HyperV
   balloon and virtio-mem), when parts of a section are kept
   fake-offline to be fake-onlined later on.

While we shuffle the whole zone in the first case, we wouldn't
shuffle in the second case.

But maybe this should be tackled (just like when freeing a large
contiguous range, when memory offlining fails, or when
alloc_contig_range() fails) by manually shuffling the zone again. That
would be cleaner, and the right thing to do when exposing large,
contiguous ranges again to the buddy.

Thanks!


-- 
Thanks,

David / dhildenb
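
For reference, the reordering Vlastimil asked about earlier in this
thread would amount to swapping the two branches quoted from the patch -
a hypothetical fragment, not something posted:

    /* Give the explicit placement request priority over shuffling. */
    if (fop_flags & FOP_TO_TAIL)
        to_tail = true;
    else if (is_shuffle_order(order))
        to_tail = shuffle_pick_tail();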




Re: [PATCH RFC 3/4] mm/page_alloc: always move pages to the tail of the freelist in unset_migratetype_isolate()

2020-09-25 Thread David Hildenbrand
On 25.09.20 04:45, Wei Yang wrote:
> On Thu, Sep 24, 2020 at 01:13:29PM +0200, Vlastimil Babka wrote:
>> On 9/16/20 8:34 PM, David Hildenbrand wrote:
>>> Page isolation doesn't actually touch the pages, it simply isolates
>>> pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
>>>
>>> We already place pages to the tail of the freelists when undoing
>>> isolation via __putback_isolated_page(), let's do it in any case
>>> (e.g., if order == pageblock_order) and document the behavior.
>>>
>>> This change results in all pages getting onlined via online_pages() to
>>> be placed to the tail of the freelist.
>>>
>>> Cc: Andrew Morton 
>>> Cc: Alexander Duyck 
>>> Cc: Mel Gorman 
>>> Cc: Michal Hocko 
>>> Cc: Dave Hansen 
>>> Cc: Vlastimil Babka 
>>> Cc: Wei Yang 
>>> Cc: Oscar Salvador 
>>> Cc: Mike Rapoport 
>>> Cc: Scott Cheloha 
>>> Cc: Michael Ellerman 
>>> Signed-off-by: David Hildenbrand 
>>> ---
>>>  include/linux/page-isolation.h |  2 ++
>>>  mm/page_alloc.c| 36 +-
>>>  mm/page_isolation.c|  8 ++--
>>>  3 files changed, 39 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>>> index 572458016331..a36be2cf4dbb 100644
>>> --- a/include/linux/page-isolation.h
>>> +++ b/include/linux/page-isolation.h
>>> @@ -38,6 +38,8 @@ struct page *has_unmovable_pages(struct zone *zone, 
>>> struct page *page,
>>>  void set_pageblock_migratetype(struct page *page, int migratetype);
>>>  int move_freepages_block(struct zone *zone, struct page *page,
>>> int migratetype, int *num_movable);
>>> +int move_freepages_block_tail(struct zone *zone, struct page *page,
>>> + int migratetype);
>>>  
>>>  /*
>>>   * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index bba9a0f60c70..75b0f49b4022 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -899,6 +899,15 @@ static inline void move_to_free_list(struct page 
>>> *page, struct zone *zone,
>>> 	list_move(&page->lru, &area->free_list[migratetype]);
>>>  }
>>>  
>>> +/* Used for pages which are on another list */
>>> +static inline void move_to_free_list_tail(struct page *page, struct zone 
>>> *zone,
>>> + unsigned int order, int migratetype)
>>> +{
>>> +	struct free_area *area = &zone->free_area[order];
>>> +
>>> +	list_move_tail(&page->lru, &area->free_list[migratetype]);
>>> +}
>>
>> There are just 3 callers of move_to_free_list() before this patch, I would 
>> just
>> add the to_tail parameter there instead of new wrapper. For callers with
>> constant parameter, the inline will eliminate it anyway.
> 
> Got the same feeling :-)

I once was told boolean parameters are the root of all evil, so I tried
to keep them file-local :)

One thing to be aware of is that inline optimizations won't help as
long as this function is in mm/page_alloc.c, see below.

> 
>>
>>>  static inline void del_page_from_free_list(struct page *page, struct zone 
>>> *zone,
>>>unsigned int order)
>>>  {
>>> @@ -2323,7 +2332,7 @@ static inline struct page 
>>> *__rmqueue_cma_fallback(struct zone *zone,
>>>   */
>>>  static int move_freepages(struct zone *zone,
>>>   struct page *start_page, struct page *end_page,
>>> - int migratetype, int *num_movable)
>>> + int migratetype, int *num_movable, bool to_tail)
>>>  {
>>> struct page *page;
>>> unsigned int order;
>>> @@ -2354,7 +2363,10 @@ static int move_freepages(struct zone *zone,
>>> VM_BUG_ON_PAGE(page_zone(page) != zone, page);
>>>  
>>> order = page_order(page);
>>> -   move_to_free_list(page, zone, order, migratetype);
>>> +   if (to_tail)
>>> +   move_to_free_list_tail(page, zone, order, migratetype);
>>> +   else
>>> +   move_to_free_list(page, zone, order, migratetype);
>>> page += 1 << order;
>>> pages_moved += 1 << order;
>>> }
>>> @@ -2362,8 +2374,9 @@ static int move_freepages(struct zone *zone,
>>> return pages_moved;
>>>  }
>>>  
>>> -int move_freepages_block(struct zone *zone, struct page *page,
>>> -   int migratetype, int *num_movable)
>>> +static int __move_freepages_block(struct zone *zone, struct page *page,
>>> + int migratetype, int *num_movable,
>>> + bool to_tail)
>>>  {
>>> unsigned long start_pfn, end_pfn;
>>> struct page *start_page, *end_page;
>>> @@ -2384,7 +2397,20 @@ int move_freepages_block(struct zone *zone, struct 
>>> page *page,
>>> return 0;
>>>  
>>> return move_freepages(zone, start_page, end_page, migratetype,
>>> -   num_movable);
>>> +

[xen-4.14-testing bisection] complete test-xtf-amd64-amd64-1

2020-09-25 Thread osstest service owner
branch xen-4.14-testing
xenbranch xen-4.14-testing
job test-xtf-amd64-amd64-1
testid xtf/test-hvm64-xsa-221

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  xen git://xenbits.xen.org/xen.git
  Bug introduced:  b8c2efbe7b3e8fa5f0b0a3679afccd1204949070
  Bug not present: f5469067ee0260673ca1e554ff512a55ccfc
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/154774/


  commit b8c2efbe7b3e8fa5f0b0a3679afccd1204949070
  Author: Jan Beulich 
  Date:   Tue Sep 22 16:13:34 2020 +0200
  
  evtchn/x86: enforce correct upper limit for 32-bit guests
  
  The recording of d->max_evtchns in evtchn_2l_init(), in particular with
  the limited set of callers of the function, is insufficient. Neither for
  PV nor for HVM guests the bitness is known at domain_create() time, yet
  the upper bound in 2-level mode depends upon guest bitness. Recording
  too high a limit "allows" x86 32-bit domains to open not properly usable
  event channels, management of which (inside Xen) would then result in
  corruption of the shared info and vCPU info structures.
  
  Keep the upper limit dynamic for the 2-level case, introducing a helper
  function to retrieve the effective limit. This helper is now supposed to
  be private to the event channel code. The uses in do_poll() and
  domain_dump_evtchn_info() weren't consistent with port uses elsewhere
  and hence get switched to port_is_valid().
  
  Furthermore FIFO mode's setup_ports() gets adjusted to loop only up to
  the prior ABI limit, rather than all the way up to the new one.
  
  Finally a word on the change to do_poll(): Accessing ->max_evtchns
  without holding a suitable lock was never safe, as it as well as
  ->evtchn_port_ops may change behind do_poll()'s back. Using
  port_is_valid() instead widens somewhat the window for potential abuse,
  until we've dealt with the race altogether (see XSA-343).
  
  This is XSA-342.
  
  Reported-by: Julien Grall 
  Fixes: 48974e6ce52e ("evtchn: use a per-domain variable for the max 
number of event channels")
  Signed-off-by: Jan Beulich 
  Reviewed-by: Stefano Stabellini 
  Reviewed-by: Julien Grall 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-4.14-testing/test-xtf-amd64-amd64-1.xtf--test-hvm64-xsa-221.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-4.14-testing/test-xtf-amd64-amd64-1.xtf--test-hvm64-xsa-221
 --summary-out=tmp/154774.bisection-summary --basis-template=154350 
--blessings=real,real-bisect xen-4.14-testing test-xtf-amd64-amd64-1 
xtf/test-hvm64-xsa-221
Searching for failure / basis pass:
 154641 fail [host=chardonnay1] / 154350 [host=huxelrebe1] 154148 
[host=albana0] 154116 [host=godello0] 152545 [host=elbling1] 152537 
[host=chardonnay0] 152531 [host=godello0] 152153 [host=godello0] 152124 
[host=fiano1] 152081 [host=albana0] 152061 [host=albana1] 152043 
[host=huxelrebe1] 151922 [host=elbling1] 151899 ok.
Failure / basis pass flights: 154641 / 151899
(tree with no url: minios)
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Tree: xtf git://xenbits.xen.org/xtf.git
Latest c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
fb97626fe04747ec89599dce0992def9a36e2f6b 
3c659044118e34603161457db9934a34f816d78b 
ea6d3cd1ed79d824e605a70c3626bc437c386260 
155821a1990b6de78dde5f98fa5ab90e802021e0 
f37a1cf023b277d0d49323bf322ce3ff0c92262d 
17d372b763cb0b2e2e6b5a637c11f3997d2533fa
Basis pass c3038e718a19fc596f7b1baba0f83d5146dc7784 
c530a75c1e6a472b0eb9558310b518f0dfcd8860 
9c6f3545aee0808b78a0ad4480b6eb9d24989dc1 
3c659044118e34603161457db9934a34f816d78b 
ea6d3cd1ed79d824e605a70c3626bc437c386260 
88ab0c15525ced2eefe39220742efe4769089ad8 
ce3c4493e4e6c94495ddd8538e801a35980bff0d 
f645a19115e666ce6401ca63b7d7388571463b55
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/linux-pvops.git#c3038e718a19fc596f7b1baba0f83d5146dc7784-c3038e718a19fc596f7b1baba0f83d5146dc7784
 

[xen-4.13-testing test] 154667: regressions - FAIL

2020-09-25 Thread osstest service owner
flight 154667 xen-4.13-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/154667/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-xtf-amd64-amd64-4   68 xtf/test-hvm64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-5   68 xtf/test-hvm64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-4   106 xtf/test-pv64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-5   106 xtf/test-pv64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-2   68 xtf/test-hvm64-xsa-221   fail REGR. vs. 154358
 test-amd64-amd64-xl-xsm  12 guest-start  fail REGR. vs. 154358
 test-xtf-amd64-amd64-2   106 xtf/test-pv64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-3   68 xtf/test-hvm64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-1   68 xtf/test-hvm64-xsa-221   fail REGR. vs. 154358
 test-amd64-i386-xl-xsm   12 guest-start  fail REGR. vs. 154358
 test-xtf-amd64-amd64-3   106 xtf/test-pv64-xsa-221   fail REGR. vs. 154358
 test-xtf-amd64-amd64-1   106 xtf/test-pv64-xsa-221   fail REGR. vs. 154358
 test-amd64-amd64-libvirt-xsm 12 guest-start  fail REGR. vs. 154358
 test-amd64-i386-libvirt-xsm  12 guest-start  fail REGR. vs. 154358
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 10 debian-hvm-install 
fail REGR. vs. 154358
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. 
vs. 154358
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 10 debian-hvm-install 
fail REGR. vs. 154358
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. 
vs. 154358
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. 
vs. 154358
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail 
REGR. vs. 154358
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail 
REGR. vs. 154358
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. 
vs. 154358

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   

Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common

2020-09-25 Thread Jan Beulich
On 24.09.2020 18:45, Oleksandr wrote:
> 
> On 24.09.20 14:16, Jan Beulich wrote:
> 
> Hi Jan
> 
>> On 22.09.2020 21:32, Oleksandr wrote:
>>> On 16.09.20 11:50, Jan Beulich wrote:
 On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>break;
>}
>
> +#ifdef CONFIG_IOREQ_SERVER
> +if ( op == XENMEM_decrease_reservation )
> +curr_d->qemu_mapcache_invalidate = true;
> +#endif
 I don't see why you don't put this right into decrease_reservation(). This
 isn't just to avoid the extra conditional, but first and foremost to
 avoid bypassing the earlier return from the function (in the case of
 preemption). In the context of this I wonder whether the ordering of
 operations in hvm_hypercall() is actually correct.
>>> Good point, indeed we may return earlier in case of preemption, I missed
>>> that.
>>> Will move it to decrease_reservation(). But, we may return even earlier
>>> in case of error...
>>> Now I am wondering whether we should move it to the very beginning of command
>>> processing or not?
>> In _this_ series I'd strongly recommend you keep things working as
>> they are. If independently you think you've found a reason to
>> re-order certain operations, then feel free to send a patch with
>> suitable justification.
> 
> Of course, I will try to retain current behavior.
> 
> 
 I'm also unconvinced curr_d is the right domain in all cases here;
 while this may be a pre-existing issue in principle, I'm afraid it
 gets more pronounced by the logic getting moved to common code.
>>> Sorry I didn't get your concern here.
>> Well, you need to be concerned whose qemu_mapcache_invalidate flag
>> you set.
> May I ask, in what cases the *curr_d* is the right domain?

When a domain does a decrease-reservation on itself. I thought
that's obvious. But perhaps your question was rather meant as to
whether a->domain is ever _not_ the right one?

> We need to make sure that domain is using IOREQ server(s) at least. 
> Hopefully, we have a helper for this
> which is hvm_domain_has_ioreq_server(). Please clarify, anything else I 
> should be taking care of?

Nothing I can recall / think of right now, except that the change
may want to come under a different title and with a different
description. As indicated, I don't think this is correct for PVH
Dom0 issuing the request against a HVM DomU, and addressing this
will likely want this moved out of hvm_memory_op() anyway. Of
course an option is to split this into two patches - the proposed
bug fix (perhaps wanting backporting) and then the moving of the
field out of arch.hvm. If you feel uneasy about the bug fix part,
let me know and I (or maybe Roger) will see to put together a
patch.

Jan
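
A rough sketch of where the flag-setting discussed above might end up,
using only names already mentioned in the thread (this is not the posted
patch): at the tail of decrease_reservation() itself, so the
preemption-induced early return from do_memory_op() cannot bypass it,
and acting on the subject domain rather than the caller:

    #ifdef CONFIG_IOREQ_SERVER
        /*
         * a->domain, not current->domain: for a PVH Dom0 issuing the
         * request against an HVM DomU the two differ. Only flag domains
         * actually using an IOREQ server.
         */
        if ( hvm_domain_has_ioreq_server(a->domain) )
            a->domain->qemu_mapcache_invalidate = true;
    #endif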



Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2020-09-25 Thread Jan Beulich
On 24.09.2020 20:02, Oleksandr wrote:
> On 24.09.20 19:02, Oleksandr wrote:
> Julien, I noticed that three fields mmio* are not used within the current 
> series on Arm. Do we expect them to be used later on?
> I would rather not add fields which are not used. We could add them when 
> needed.
> 
> Would the following be acceptable?
> 1. Both fields "io_completion" and "io_req" (which seem to be the only 
> fields used in common/ioreq.c) are moved to common struct vcpu as part 
> of struct vcpu_io,
>      enum hvm_io_completion is also moved (and renamed).
> 2. We remove everything related to hvm_vcpu* from the Arm header.
> 3. x86's struct hvm_vcpu_io stays as it is (but without two fields 
> "io_completion" and "io_req").
>      I think, this way we separate a common part and reduce Xen changes 
> (which are getting bigger).

If this works, it would be my preference too. So far I was left
under the impression that you did move / mention further fields
for a reason.

Jan



Re: [PATCH 03/11 RFC] gitignore: Add/Generalize entries

2020-09-25 Thread Jan Beulich
On 24.09.2020 23:48, Elliott Mitchell wrote:
> On Thu, Sep 24, 2020 at 05:44:09PM +0200, Jan Beulich wrote:
>> On 02.09.2020 03:08, Elliott Mitchell wrote:
>>> @@ -33,12 +38,12 @@ cscope.po.out
>>>  .vscode
>>>  
>>>  dist
>>> -stubdom/*.tar.gz
>>>  
>>>  autom4te.cache/
>>>  config.log
>>>  config.status
>>>  config.cache
>>> +config.h
>>>  config/Toplevel.mk
>>>  config/Paths.mk
>>
>> While in userland config.h may indeed be a typically generated file,
>> there are three source files by this name under xen/. Patch 6 also
>> doesn't look to override this in any way for xen/. I think this wants
>> to move a level down, into tools/ and wherever else it may be
>> applicable.
> 
> Another possibility is that Git isn't as aggressive with enforcing ignores as
> some other version control software is.  A file which matches a
> .gitignore pattern will not show up under "Untracked files" in
> `git status`; however, /modifying/ a file which is already under control,
> but matches an ignore pattern *will* cause it to show up under
> "Changes not staged for commit".  Git will also allow you to use
> `git add -f` on a file which matches an ignore pattern.
> 
> There are already a few files in Git which have targeted matches pointed
> at them, yet are still present.  Perhaps these were added by mistaken use
> of `add -f`, perhaps they were deliberately added and the author missed
> removing the .gitignore entry.
> 
> As such, perhaps the generalized "config.h" pattern is appropriate, and we
> should move towards removing the few examples which currently exist?

I don't think so, no - new ports will similarly have asm-<arch>/config.h,
and there shouldn't be a requirement to "git add -f" them at that point.
The role of such named files really is too different to have such a top
level entry imo.

Jan
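
In .gitignore terms, the two positions in this subthread might look like
this (paths assumed from the xen.git layout of the time):

    # Jan's suggestion: scope the pattern so only generated files are
    # ignored, e.g. by placing it in tools/.gitignore:
    config.h

    # Elliott's generalized top-level pattern, which would then need
    # explicit exceptions for the hand-written Xen headers:
    config.h
    !xen/include/asm-*/config.h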



[PATCH v2 0/3] Fix and cleanup xenguest.h

2020-09-25 Thread Juergen Gross
This series fixes builds of libxenguest users outside the Xen build
system and it cleans up the xenguest.h header merging xenctrl_dom.h
into it.

Juergen Gross (3):
  tools/libs: merge xenctrl_dom.h into xenguest.h
  tools/libxenguest: make xc_dom_loader interface private to libxenguest
  tools/libxenguest: hide struct elf_dom_parms layout from users

 stubdom/grub/kexec.c|  20 +-
 tools/helpers/init-xenstore-domain.c|   2 +-
 tools/libs/ctrl/Makefile|   2 +-
 tools/libs/ctrl/include/xenctrl_dom.h   | 455 
 tools/libs/ctrl/xc_private.c|   1 -
 tools/libs/guest/include/xenguest.h | 400 -
 tools/libs/guest/xg_dom_arm.c   |   1 -
 tools/libs/guest/xg_dom_armzimageloader.c   |   1 -
 tools/libs/guest/xg_dom_binloader.c |   1 -
 tools/libs/guest/xg_dom_boot.c  |   1 -
 tools/libs/guest/xg_dom_compat_linux.c  |   1 -
 tools/libs/guest/xg_dom_core.c  |  86 +++-
 tools/libs/guest/xg_dom_decompress.h|   4 +-
 tools/libs/guest/xg_dom_decompress_unsafe.h |   2 -
 tools/libs/guest/xg_dom_elfloader.c |   1 -
 tools/libs/guest/xg_dom_hvmloader.c |   1 -
 tools/libs/guest/xg_dom_x86.c   |   1 -
 tools/libs/guest/xg_offline_page.c  |   1 -
 tools/libs/guest/xg_private.h   |  14 +
 tools/libs/guest/xg_sr_common.h |   1 -
 tools/libxl/libxl_arm.c |   1 -
 tools/libxl/libxl_arm.h |   2 -
 tools/libxl/libxl_create.c  |   1 -
 tools/libxl/libxl_dm.c  |   1 -
 tools/libxl/libxl_dom.c |   1 -
 tools/libxl/libxl_internal.h|   1 -
 tools/libxl/libxl_vnuma.c   |   2 -
 tools/libxl/libxl_x86.c |   2 -
 tools/libxl/libxl_x86_acpi.c|   7 +-
 tools/python/xen/lowlevel/xc/xc.c   |   2 +-
 30 files changed, 496 insertions(+), 520 deletions(-)
 delete mode 100644 tools/libs/ctrl/include/xenctrl_dom.h

-- 
2.26.2




[PATCH v2 2/3] tools/libxenguest: make xc_dom_loader interface private to libxenguest

2020-09-25 Thread Juergen Gross
The pluggable kernel loader interface is used only internally by
libxenguest, so make it private. This removes a dependency on the Xen
internal header xen/libelf/libelf.h from xenguest.h.

Signed-off-by: Juergen Gross 
---
 tools/libs/guest/include/xenguest.h | 15 ---
 tools/libs/guest/xg_private.h   | 13 +
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/tools/libs/guest/include/xenguest.h 
b/tools/libs/guest/include/xenguest.h
index 279f06345c..dba6a21643 100644
--- a/tools/libs/guest/include/xenguest.h
+++ b/tools/libs/guest/include/xenguest.h
@@ -247,21 +247,6 @@ struct xc_dom_image {
 unsigned int max_vcpus;
 };
 
-/* --- pluggable kernel loader - */
-
-struct xc_dom_loader {
-char *name;
-/* Sadly the error returns from these functions are not consistent: */
-elf_negerrnoval (*probe) (struct xc_dom_image * dom);
-elf_negerrnoval (*parser) (struct xc_dom_image * dom);
-elf_errorstatus (*loader) (struct xc_dom_image * dom);
-
-struct xc_dom_loader *next;
-};
-
-#define __init __attribute__ ((constructor))
-void xc_dom_register_loader(struct xc_dom_loader *loader);
-
 /* --- arch specific hooks - */
 
 struct xc_dom_arch {
diff --git a/tools/libs/guest/xg_private.h b/tools/libs/guest/xg_private.h
index b2b9b6..9940d554ef 100644
--- a/tools/libs/guest/xg_private.h
+++ b/tools/libs/guest/xg_private.h
@@ -41,6 +41,19 @@
 #endif
 #endif
 
+struct xc_dom_loader {
+char *name;
+/* Sadly the error returns from these functions are not consistent: */
+elf_negerrnoval (*probe) (struct xc_dom_image * dom);
+elf_negerrnoval (*parser) (struct xc_dom_image * dom);
+elf_errorstatus (*loader) (struct xc_dom_image * dom);
+
+struct xc_dom_loader *next;
+};
+
+#define __init __attribute__ ((constructor))
+void xc_dom_register_loader(struct xc_dom_loader *loader);
+
 char *xc_read_image(xc_interface *xch,
 const char *filename, unsigned long *size);
 char *xc_inflate_buffer(xc_interface *xch,
-- 
2.26.2
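
For context on the interface being hidden here: a loader registers
itself at library load time via the constructor hook, along the lines of
this illustrative, stub-only sketch (the example_* names are made up):

    #include <errno.h>
    #include "xg_private.h"  /* struct xc_dom_loader, after this patch */

    static elf_negerrnoval example_probe(struct xc_dom_image *dom)
    {
        return -EINVAL;  /* stub: decline every image */
    }

    static elf_negerrnoval example_parse(struct xc_dom_image *dom)
    {
        return 0;  /* stub */
    }

    static elf_errorstatus example_load(struct xc_dom_image *dom)
    {
        return 0;  /* stub */
    }

    static struct xc_dom_loader example_loader = {
        .name   = "example",
        .probe  = example_probe,
        .parser = example_parse,
        .loader = example_load,
    };

    /* __init expands to __attribute__((constructor)) here, so this runs
     * when the library is loaded. */
    static void __init register_example_loader(void)
    {
        xc_dom_register_loader(&example_loader);
    }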




[PATCH v2 1/3] tools/libs: merge xenctrl_dom.h into xenguest.h

2020-09-25 Thread Juergen Gross
Today xenctrl_dom.h is part of libxenctrl as it is included by
xc_private.c. This seems not to be needed, so merge xenctrl_dom.h into
xenguest.h where its contents really should be.

Replace all #includes of xenctrl_dom.h by xenguest.h ones or drop them
if xenguest.h is already included.

Signed-off-by: Juergen Gross 
---
 stubdom/grub/kexec.c|   2 +-
 tools/helpers/init-xenstore-domain.c|   2 +-
 tools/libs/ctrl/Makefile|   2 +-
 tools/libs/ctrl/include/xenctrl_dom.h   | 455 
 tools/libs/ctrl/xc_private.c|   1 -
 tools/libs/guest/include/xenguest.h | 426 +-
 tools/libs/guest/xg_dom_arm.c   |   1 -
 tools/libs/guest/xg_dom_armzimageloader.c   |   1 -
 tools/libs/guest/xg_dom_binloader.c |   1 -
 tools/libs/guest/xg_dom_boot.c  |   1 -
 tools/libs/guest/xg_dom_compat_linux.c  |   1 -
 tools/libs/guest/xg_dom_core.c  |   1 -
 tools/libs/guest/xg_dom_decompress.h|   4 +-
 tools/libs/guest/xg_dom_decompress_unsafe.h |   2 -
 tools/libs/guest/xg_dom_elfloader.c |   1 -
 tools/libs/guest/xg_dom_hvmloader.c |   1 -
 tools/libs/guest/xg_dom_x86.c   |   1 -
 tools/libs/guest/xg_offline_page.c  |   1 -
 tools/libs/guest/xg_sr_common.h |   1 -
 tools/libxl/libxl_arm.c |   1 -
 tools/libxl/libxl_arm.h |   2 -
 tools/libxl/libxl_create.c  |   1 -
 tools/libxl/libxl_dm.c  |   1 -
 tools/libxl/libxl_dom.c |   1 -
 tools/libxl/libxl_internal.h|   1 -
 tools/libxl/libxl_vnuma.c   |   2 -
 tools/libxl/libxl_x86.c |   2 -
 tools/libxl/libxl_x86_acpi.c|   2 -
 tools/python/xen/lowlevel/xc/xc.c   |   2 +-
 29 files changed, 430 insertions(+), 490 deletions(-)
 delete mode 100644 tools/libs/ctrl/include/xenctrl_dom.h

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 24001220a9..e9a69d2a32 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -20,7 +20,7 @@
 #include 
 
 #include 
-#include <xenctrl_dom.h>
+#include <xenguest.h>
 
 #include 
 #include 
diff --git a/tools/helpers/init-xenstore-domain.c 
b/tools/helpers/init-xenstore-domain.c
index 5bdb48dc80..bcaa0e6fa9 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -8,7 +8,7 @@
 #include 
 #include 
 #include 
-#include <xenctrl_dom.h>
+#include <xenguest.h>
 #include 
 #include 
 #include 
diff --git a/tools/libs/ctrl/Makefile b/tools/libs/ctrl/Makefile
index ec93fb5b73..0071226d2a 100644
--- a/tools/libs/ctrl/Makefile
+++ b/tools/libs/ctrl/Makefile
@@ -49,7 +49,7 @@ CFLAGS += -include $(XEN_ROOT)/tools/config.h
 # Needed for posix_fadvise64() in xc_linux.c
 CFLAGS-$(CONFIG_Linux) += -D_GNU_SOURCE
 
-LIBHEADER := xenctrl.h xenctrl_compat.h xenctrl_dom.h
+LIBHEADER := xenctrl.h xenctrl_compat.h
 PKG_CONFIG := xencontrol.pc
 PKG_CONFIG_NAME := Xencontrol
 
diff --git a/tools/libs/ctrl/include/xenctrl_dom.h 
b/tools/libs/ctrl/include/xenctrl_dom.h
deleted file mode 100644
index 40b85b7755..00
--- a/tools/libs/ctrl/include/xenctrl_dom.h
+++ /dev/null
@@ -1,455 +0,0 @@
-/*
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation;
- * version 2.1 of the License.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef _XC_DOM_H
-#define _XC_DOM_H
-
-#include 
-
-#define X86_HVM_NR_SPECIAL_PAGES8
-#define X86_HVM_END_SPECIAL_REGION  0xff000u
-#define XG_MAX_MODULES 2
-
-/* --- typedefs and structs  */
-
-typedef uint64_t xen_vaddr_t;
-typedef uint64_t xen_paddr_t;
-
-#define PRIpfn PRI_xen_pfn
-
-struct xc_dom_seg {
-xen_vaddr_t vstart;
-xen_vaddr_t vend;
-xen_pfn_t pfn;
-xen_pfn_t pages;
-};
-
-struct xc_hvm_firmware_module {
-uint8_t  *data;
-uint32_t  length;
-uint64_t  guest_addr_out;
-};
-
-struct xc_dom_mem {
-struct xc_dom_mem *next;
-void *ptr;
-enum {
-XC_DOM_MEM_TYPE_MALLOC_INTERNAL,
-XC_DOM_MEM_TYPE_MALLOC_EXTERNAL,
-XC_DOM_MEM_TYPE_MMAP,
-} type;
-size_t len;
-unsigned char memory[0];
-};
-
-struct xc_dom_phys {
-struct xc_dom_phys *next;
-void *ptr;
-xen_pfn_t first;
-xen_pfn_t count;
-};
-
-struct xc_dom_module {
-void *blob;
-size_t size;
-void *cmdline;
-/* If seg.vstart is non zero then the module will be loaded 

[PATCH v2 3/3] tools/libxenguest: hide struct elf_dom_parms layout from users

2020-09-25 Thread Juergen Gross
Don't include struct elf_dom_parms in struct xc_dom_image, but rather
use a pointer to reference it. Together with adding accessor functions
for the externally needed elements, this enables dropping the inclusion
of the Xen private header xen/libelf/libelf.h from xenguest.h.

Fixes: 7e0165c19387 ("tools/libxc: untangle libxenctrl from libxenguest")
Signed-off-by: Juergen Gross 
---
 stubdom/grub/kexec.c| 18 +++---
 tools/libs/guest/include/xenguest.h | 29 +++---
 tools/libs/guest/xg_dom_core.c  | 85 +++--
 tools/libs/guest/xg_private.h   |  1 +
 tools/libxl/libxl_x86_acpi.c|  5 +-
 5 files changed, 88 insertions(+), 50 deletions(-)

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index e9a69d2a32..3da80b5b4a 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -222,6 +222,7 @@ void kexec(void *kernel, long kernel_size, void *module, 
long module_size, char
 char features[] = "";
 struct mmu_update *m2p_updates;
 unsigned long nr_m2p_updates;
+uint64_t virt_base;
 
 DEBUG("booting with cmdline %s\n", cmdline);
 xc_handle = xc_interface_open(0,0,0);
@@ -294,10 +295,11 @@ void kexec(void *kernel, long kernel_size, void *module, 
long module_size, char
 goto out;
 }
 
+virt_base = xc_dom_virt_base(dom);
 /* copy hypercall page */
 /* TODO: domctl instead, but requires privileges */
-if (dom->parms.virt_hypercall != -1) {
-pfn = PHYS_PFN(dom->parms.virt_hypercall - dom->parms.virt_base);
+if (xc_dom_virt_hypercall(dom) != -1) {
+pfn = PHYS_PFN(xc_dom_virt_hypercall(dom) - virt_base);
 memcpy((void *) pages[pfn], hypercall_page, PAGE_SIZE);
 }
 
@@ -313,11 +315,11 @@ void kexec(void *kernel, long kernel_size, void *module, 
long module_size, char
 /* Move current console, xenstore and boot MFNs to the allocated place */
 do_exchange(dom, dom->console_pfn, start_info.console.domU.mfn);
 do_exchange(dom, dom->xenstore_pfn, start_info.store_mfn);
-DEBUG("virt base at %llx\n", dom->parms.virt_base);
+DEBUG("virt base at %llx\n", virt_base);
 DEBUG("bootstack_pfn %lx\n", dom->bootstack_pfn);
-_boot_target = dom->parms.virt_base + PFN_PHYS(dom->bootstack_pfn);
+_boot_target = virt_base + PFN_PHYS(dom->bootstack_pfn);
 DEBUG("_boot_target %lx\n", _boot_target);
-do_exchange(dom, PHYS_PFN(_boot_target - dom->parms.virt_base),
+do_exchange(dom, PHYS_PFN(_boot_target - virt_base),
 virt_to_mfn(&_boot_page));
 
 if ( dom->arch_hooks->setup_pgtables )
@@ -373,13 +375,13 @@ void kexec(void *kernel, long kernel_size, void *module, 
long module_size, char
 _boot_oldpdmfn = virt_to_mfn(start_info.pt_base);
 DEBUG("boot old pd mfn %lx\n", _boot_oldpdmfn);
 DEBUG("boot pd virt %lx\n", dom->pgtables_seg.vstart);
-_boot_pdmfn = dom->pv_p2m[PHYS_PFN(dom->pgtables_seg.vstart - 
dom->parms.virt_base)];
+_boot_pdmfn = dom->pv_p2m[PHYS_PFN(dom->pgtables_seg.vstart - virt_base)];
 DEBUG("boot pd mfn %lx\n", _boot_pdmfn);
 _boot_stack = _boot_target + PAGE_SIZE;
 DEBUG("boot stack %lx\n", _boot_stack);
-_boot_start_info = dom->parms.virt_base + PFN_PHYS(dom->start_info_pfn);
+_boot_start_info = virt_base + PFN_PHYS(dom->start_info_pfn);
 DEBUG("boot start info %lx\n", _boot_start_info);
-_boot_start = dom->parms.virt_entry;
+_boot_start = xc_dom_virt_entry(dom);
 DEBUG("boot start %lx\n", _boot_start);
 
 /* Keep only useful entries */
diff --git a/tools/libs/guest/include/xenguest.h 
b/tools/libs/guest/include/xenguest.h
index dba6a21643..a9984dbea5 100644
--- a/tools/libs/guest/include/xenguest.h
+++ b/tools/libs/guest/include/xenguest.h
@@ -22,8 +22,6 @@
 #ifndef XENGUEST_H
 #define XENGUEST_H
 
-#include <xen/libelf/libelf.h>
-
 #define XC_NUMA_NO_NODE   (~0U)
 
 #define XCFLAGS_LIVE  (1 << 0)
@@ -109,7 +107,7 @@ struct xc_dom_image {
 uint32_t f_requested[XENFEAT_NR_SUBMAPS];
 
 /* info from (elf) kernel image */
-struct elf_dom_parms parms;
+struct elf_dom_parms *parms;
 char *guest_type;
 
 /* memory layout */
@@ -390,6 +388,13 @@ void *xc_dom_pfn_to_ptr_retcount(struct xc_dom_image *dom, 
xen_pfn_t first,
  xen_pfn_t count, xen_pfn_t *count_out);
 void xc_dom_unmap_one(struct xc_dom_image *dom, xen_pfn_t pfn);
 void xc_dom_unmap_all(struct xc_dom_image *dom);
+void *xc_dom_vaddr_to_ptr(struct xc_dom_image *dom,
+  xen_vaddr_t vaddr, size_t *safe_region_out);
+uint64_t xc_dom_virt_base(struct xc_dom_image *dom);
+uint64_t xc_dom_virt_entry(struct xc_dom_image *dom);
+uint64_t xc_dom_virt_hypercall(struct xc_dom_image *dom);
+char *xc_dom_guest_os(struct xc_dom_image *dom);
+bool xc_dom_feature_get(struct xc_dom_image *dom, unsigned int nr);
 
 static inline void *xc_dom_seg_to_ptr_pages(struct xc_dom_image *dom,
   struct xc_dom_seg *seg,
@@ -411,24 +416,6 @@ 
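
(The diff is truncated here in the archive.) The accessors declared
above are presumably thin wrappers over the now-private structure; a
sketch under that assumption, mirroring the fields the kexec.c hunks
read:

    uint64_t xc_dom_virt_base(struct xc_dom_image *dom)
    {
        return dom->parms->virt_base;
    }

    uint64_t xc_dom_virt_entry(struct xc_dom_image *dom)
    {
        return dom->parms->virt_entry;
    }

    uint64_t xc_dom_virt_hypercall(struct xc_dom_image *dom)
    {
        return dom->parms->virt_hypercall;
    }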

[libvirt test] 154675: regressions - FAIL

2020-09-25 Thread osstest service owner
flight 154675 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/154675/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  af5fb476da55759ba8265194e554d24f64007ceb
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z   77 days
Failing since    151818  2020-07-11 04:18:52 Z   76 days   71 attempts
Testing same since   154675  2020-09-24 04:20:04 Z    1 days    1 attempts


People who touched revisions under test:
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Bastien Orivel 
  Bihong Yu 
  Binfeng Wu 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Collin Walling 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Erik Skultety 
  Fabian Freyer 
  Fangge Jin 
  Fedora Weblate Translation 
  Han Han 
  Hao Wang 
  Ian Wienand 
  Jamie Strandboge 
  Jamie Strandboge 
  Jean-Baptiste Holcroft 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jiri Denemark 
  Jonathon Jongsma 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Laine Stump 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Marc Hartmayer 
  Marek Marczykowski-Górecki 
  Martin Kletzander 
  Matt Coleman 
  Matt Coleman 
  Michal Privoznik 
  Milo Casagrande 
  Neal Gompa 
  Nikolay Shirokovskiy 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Roman Bogorodskiy 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  Simon Gaiser 
  Stefan Bader 
  Stefan Berger 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Wang Xin 
  Weblate 
  Yang Hang 
  Yanqiu Zhang 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zheng Chuan 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt