[Xen-devel] [ovmf test] 112305: all pass - PUSHED

2017-07-25 Thread osstest service owner
flight 112305 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112305/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 3b341e263da957b2c8896317f41cc32880c878b0
baseline version:
 ovmf 1683ecec41a7c944783c51efa75375f1e0a71d08

Last test of basis   112091  2017-07-21 10:17:54 Z    4 days
Testing same since   112305  2017-07-26 02:04:42 Z    0 days    1 attempts


People who touched revisions under test:
  Liming Gao 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=3b341e263da957b2c8896317f41cc32880c878b0
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 3b341e263da957b2c8896317f41cc32880c878b0
+ branch=ovmf
+ revision=3b341e263da957b2c8896317f41cc32880c878b0
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable
+ prevxenbranch=xen-4.9-testing
+ '[' x3b341e263da957b2c8896317f41cc32880c878b0 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
++ : 

[Xen-devel] [qemu-mainline test] 112290: regressions - trouble: blocked/broken/fail/pass

2017-07-25 Thread osstest service owner
flight 112290 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112290/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-credit2   4 host-install(4) broken REGR. vs. 111765
 test-arm64-arm64-xl   4 host-install(4) broken REGR. vs. 111765
 test-arm64-arm64-xl-xsm   4 host-install(4) broken REGR. vs. 111765
 test-arm64-arm64-libvirt-xsm  4 host-install(4) broken REGR. vs. 111765
 build-armhf-xsm   5 host-build-prep  fail REGR. vs. 111765
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install   fail REGR. vs. 111765
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install   fail REGR. vs. 111765
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install  fail REGR. vs. 111765

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 111765
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail  like 111765
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 111765
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 111765
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass

version targeted for testing:
 qemuu4c4414a4388f902b7ae2814f9a64898dd0e426a5
baseline version:
 qemuu31fe1c414501047cbb91b695bdccc0068496dcf6

Last test of basis   111765  2017-07-13 10:20:16 Z   12 days
Failing since        111790  2017-07-14 04:20:46 Z   11 days   16 attempts
Testing same since   112290  2017-07-25 14:08:31 Z    0 days    1 attempts


People who touched revisions under test:
  Alex Bennée 
  Alex Williamson 
  Alexander Graf 
  Alexey G 
  Alexey Gerasimenko 
  Alexey Kardashevskiy 
  Alistair Francis 
  Anthony PERARD 
  Anton Nefedov 
  Aurelien Jarno 
  Bharata B Rao 
  Boqun Feng (Intel) 
  Christian Borntraeger 
  Christian Borntraeger  for the s390 part.
  Claudio Imbrenda 
  Cornelia Huck 
  Cornelia Huck 
  Cédric Le Goater 
  Daniel Barboza 
  Daniel Henrique Barboza 
  Daniel P. Berrange 
  Daniel Rempel 

[Xen-devel] [xen-unstable test] 112286: tolerable trouble: broken/fail/pass - PUSHED

2017-07-25 Thread osstest service owner
flight 112286 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112286/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-xsm   4 host-install(4)  broken pass in 112274
 test-arm64-arm64-xl-credit2   4 host-install(4)  broken pass in 112274
 test-amd64-i386-xl-qemuu-ovmf-amd64 5 host-ping-check-native fail in 112274 pass in 112286
 test-armhf-armhf-xl   6 xen-install fail pass in 112274

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 17 guest-start.2 fail REGR. vs. 112143
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop   fail REGR. vs. 112260

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 18 guest-start/win.repeat fail in 112274 like 112260
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail in 112274 like 112260
 test-arm64-arm64-xl-xsm 13 migrate-support-check fail in 112274 never pass
 test-arm64-arm64-xl-xsm 14 saverestore-support-check fail in 112274 never pass
 test-arm64-arm64-xl-credit2 13 migrate-support-check fail in 112274 never pass
 test-arm64-arm64-xl-credit2 14 saverestore-support-check fail in 112274 never pass
 test-armhf-armhf-xl 13 migrate-support-check fail in 112274 never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail in 112274 never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 112210
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 112260
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 112260
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 112260
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 112260
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 112260
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass

version targeted for testing:
 xen  

[Xen-devel] [xen-unstable-smoke test] 112297: tolerable trouble: broken/pass - PUSHED

2017-07-25 Thread osstest service owner
flight 112297 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112297/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass

version targeted for testing:
 xen  6c9abf0e8022807bb7d677570d0775659950ff1a
baseline version:
 xen  33a0b4fe90f1ef1a104dd454c931bb46d417ffca

Last test of basis   112284  2017-07-25 10:00:59 Z    0 days
Testing same since   112297  2017-07-25 20:11:50 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Chao Gao 
  George Dunlap 
  Julien Grall 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=6c9abf0e8022807bb7d677570d0775659950ff1a
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 6c9abf0e8022807bb7d677570d0775659950ff1a
+ branch=xen-unstable-smoke
+ revision=6c9abf0e8022807bb7d677570d0775659950ff1a
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.9-testing
+ '[' x6c9abf0e8022807bb7d677570d0775659950ff1a = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : 

[Xen-devel] [linux-linus test] 112277: regressions - FAIL

2017-07-25 Thread osstest service owner
flight 112277 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112277/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-examine  7 reboot   fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-xl-pvh-intel  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-qemuu-nested-intel  7 xen-boot  fail REGR. vs. 110515
 test-amd64-amd64-amd64-pvgrub  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-xl-qcow2 7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-pygrub   7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-i386-pvgrub  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-libvirt-pair 21 guest-start/debian  fail REGR. vs. 110515
 test-amd64-i386-libvirt-pair 21 guest-start/debian   fail REGR. vs. 110515
 test-amd64-i386-pair 21 guest-start/debian   fail REGR. vs. 110515

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-pair21 guest-start/debian fail pass in 112271

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 18 guest-start/win.repeat fail blocked in 110515
 test-amd64-amd64-xl-qemut-win7-amd64 18 guest-start/win.repeat fail in 112271 blocked in 110515
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop   fail in 112271 like 110515
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 110515
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 110515
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 110515
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 110515
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 110515
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 110515
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 110515
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail  like 110515
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check fail   

Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-4.13

2017-07-25 Thread Jens Axboe
On 07/25/2017 03:21 PM, Konrad Rzeszutek Wilk wrote:
> Hi Jens,
> 
> Please git pull in your branch "for-linus" the following
> branch which is based on 765e40b675a9566459ddcb8358ad16f3b8344bbe
> "blk-mq: map queues to all present CPUs":
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> stable/for-jens-4.13
> 
> It has two bug-fixes for the xen-blkfront driver.
> 
> Thank you!
> 
> 
> drivers/block/xen-blkfront.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> Dongli Zhang (1):
>   xen/blkfront: always allocate grants first from per-queue persistent 
> grants
> 
> Junxiao Bi (1):
>   xen-blkfront: fix mq start/stop race

Pulled, thanks.

-- 
Jens Axboe




[Xen-devel] [PATCH v2 07/13] xen/pvcalls: implement accept command

2017-07-25 Thread Stefano Stabellini
Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make
sure that only one accept command is executed at any given time by
setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the
inflight_accept_req waitqueue.

sock->sk->sk_send_head is not used for IP sockets: reuse the field to
store a pointer to the struct sock_mapping corresponding to the socket.

Convert the new struct socket pointer into a uint64_t and use it as the
id for the new socket to pass to the backend.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 79 +
 drivers/xen/pvcalls-front.h |  3 ++
 2 files changed, 82 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 3b5d50e..b8c4538 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -413,6 +413,85 @@ int pvcalls_front_listen(struct socket *sock, int backlog)
return ret;
 }
 
+int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
+{
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map;
+   struct sock_mapping *map2 = NULL;
+   struct xen_pvcalls_request *req;
+   int notify, req_id, ret, evtchn;
+
+   if (!pvcalls_front_dev)
+   return -ENOTCONN;
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+   if (!map)
+   return -ENOTSOCK;
+
+   if (map->passive.status != PVCALLS_STATUS_LISTEN)
+   return -EINVAL;
+
+   /*
+* Backend only supports 1 inflight accept request, will return
+* errors for the others
+*/
+   if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+(void *)&map->passive.flags)) {
+   if (wait_event_interruptible(map->passive.inflight_accept_req,
+   !test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+ (void *)&map->passive.flags))
+   != 0)
+   return -EINTR;
+   }
+
+
+   newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL);
+   if (newsock->sk == NULL)
+   return -ENOMEM;
+
+   spin_lock(&bedata->pvcallss_lock);
+   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+   if (RING_FULL(&bedata->ring) ||
+   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+   kfree(newsock->sk);
+   spin_unlock(&bedata->pvcallss_lock);
+   return -EAGAIN;
+   }
+
+   map2 = create_active(&evtchn);
+
+   req = RING_GET_REQUEST(&bedata->ring, req_id);
+   req->req_id = req_id;
+   req->cmd = PVCALLS_ACCEPT;
+   req->u.accept.id = (uint64_t) sock;
+   req->u.accept.ref = map2->active.ref;
+   req->u.accept.id_new = (uint64_t) newsock;
+   req->u.accept.evtchn = evtchn;
+
+   list_add_tail(&map2->list, &bedata->socket_mappings);
+   WRITE_ONCE(newsock->sk->sk_send_head, (void *)map2);
+   map2->sock = newsock;
+
+   bedata->ring.req_prod_pvt++;
+   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+   spin_unlock(&bedata->pvcallss_lock);
+   if (notify)
+   notify_remote_via_irq(bedata->irq);
+
+   wait_event(bedata->inflight_req,
+  READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+   clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)&map->passive.flags);
+   wake_up(&map->passive.inflight_accept_req);
+
+   ret = bedata->rsp[req_id].ret;
+   /* read ret, then set this rsp slot to be reused */
+   smp_mb();
+   WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+   return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
{ "pvcalls" },
{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index aa8fe10..ab4f1da 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -10,5 +10,8 @@ int pvcalls_front_bind(struct socket *sock,
   struct sockaddr *addr,
   int addr_len);
 int pvcalls_front_listen(struct socket *sock, int backlog);
+int pvcalls_front_accept(struct socket *sock,
+struct socket *newsock,
+int flags);
 
 #endif
-- 
1.9.1



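The accept gate described above (test_and_set_bit plus a waitqueue) is a
self-contained idiom. A minimal standalone sketch, with illustrative names
(gate_flags, gate_wq, gate_enter/gate_exit) that are not from the patch:

#include <linux/bitops.h>
#include <linux/wait.h>

static unsigned long gate_flags;
static DECLARE_WAIT_QUEUE_HEAD(gate_wq);

/* Claim the single in-flight slot, sleeping until it is free. */
static int gate_enter(void)
{
	if (test_and_set_bit(0, &gate_flags)) {
		/* Slot taken: sleep, and retry the claim inside the wait
		 * condition so waking up and claiming are one atomic step. */
		if (wait_event_interruptible(gate_wq,
					     !test_and_set_bit(0, &gate_flags)))
			return -EINTR;	/* a signal arrived while sleeping */
	}
	return 0;	/* we now own the slot */
}

/* Release the slot and let one of the sleepers retry. */
static void gate_exit(void)
{
	clear_bit(0, &gate_flags);
	wake_up(&gate_wq);
}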

[Xen-devel] [PATCH v2 09/13] xen/pvcalls: implement recvmsg

2017-07-25 Thread Stefano Stabellini
Implement recvmsg by copying data from the "in" ring. If not enough data
is available and the recvmsg call is blocking, then wait on the
inflight_conn_req waitqueue. Take the active socket in_mutex so that
only one function can access the ring at any given time.

If not enough data is available on the ring, rather than returning
immediately or sleep-waiting, spin for up to 5000 cycles. This small
optimization turns out to improve performance and latency significantly.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 106 
 drivers/xen/pvcalls-front.h |   4 ++
 2 files changed, 110 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index d8ed280..b4ca569 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -96,6 +96,20 @@ static int pvcalls_front_write_todo(struct sock_mapping *map)
return size - pvcalls_queued(prod, cons, size);
 }
 
+static bool pvcalls_front_read_todo(struct sock_mapping *map)
+{
+   struct pvcalls_data_intf *intf = map->active.ring;
+   RING_IDX cons, prod;
+   int32_t error;
+
+   cons = intf->in_cons;
+   prod = intf->in_prod;
+   error = intf->in_error;
+   return (error != 0 ||
+   pvcalls_queued(prod, cons,
+  XEN_FLEX_RING_SIZE(intf->ring_order))) != 0;
+}
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
struct xenbus_device *dev = dev_id;
@@ -418,6 +432,98 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
return tot_sent;
 }
 
+static int __read_ring(struct pvcalls_data_intf *intf,
+  struct pvcalls_data *data,
+  struct iov_iter *msg_iter,
+  size_t len, int flags)
+{
+   RING_IDX cons, prod, size, masked_prod, masked_cons;
+   RING_IDX array_size = XEN_FLEX_RING_SIZE(intf->ring_order);
+   int32_t error;
+
+   cons = intf->in_cons;
+   prod = intf->in_prod;
+   error = intf->in_error;
+   /* get pointers before reading from the ring */
+   virt_rmb();
+   if (error < 0)
+   return error;
+
+   size = pvcalls_queued(prod, cons, array_size);
+   masked_prod = pvcalls_mask(prod, array_size);
+   masked_cons = pvcalls_mask(cons, array_size);
+
+   if (size == 0)
+   return 0;
+
+   if (len > size)
+   len = size;
+
+   if (masked_prod > masked_cons) {
+   copy_to_iter(data->in + masked_cons, len, msg_iter);
+   } else {
+   if (len > (array_size - masked_cons)) {
+   copy_to_iter(data->in + masked_cons,
+array_size - masked_cons, msg_iter);
+   copy_to_iter(data->in,
+len - (array_size - masked_cons),
+msg_iter);
+   } else {
+   copy_to_iter(data->in + masked_cons, len, msg_iter);
+   }
+   }
+   /* read data from the ring before increasing the index */
+   virt_mb();
+   if (!(flags & MSG_PEEK))
+   intf->in_cons += len;
+
+   return len;
+}
+
+int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+int flags)
+{
+   struct pvcalls_bedata *bedata;
+   int ret = -EAGAIN;
+   struct sock_mapping *map;
+   int count = 0;
+
+   if (!pvcalls_front_dev)
+   return -ENOTCONN;
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+   if (!map)
+   return -ENOTSOCK;
+
+   if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
+   return -EOPNOTSUPP;
+
+   mutex_lock(&map->active.in_mutex);
+   if (len > XEN_FLEX_RING_SIZE(map->active.ring->ring_order))
+   len = XEN_FLEX_RING_SIZE(map->active.ring->ring_order);
+
+   while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
+   if (count < PVCALLS_FRONT_MAX_SPIN)
+   count++;
+   else
+   wait_event_interruptible(map->active.inflight_conn_req,
+pvcalls_front_read_todo(map));
+   }
+   ret = __read_ring(map->active.ring, &map->active.data,
+ &msg->msg_iter, len, flags);
+
+   if (ret > 0)
+   notify_remote_via_irq(map->active.irq);
+   if (ret == 0)
+   ret = -EAGAIN;
+   if (ret == -ENOTCONN)
+   ret = 0;
+
+   mutex_unlock(&map->active.in_mutex);
+   return ret;
+}
+
int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
struct pvcalls_bedata *bedata;
diff --git 

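For reference, pvcalls_queued() and pvcalls_mask(), used above and in the
sendmsg patch, are defined earlier in this series and do not appear in this
digest. Assuming the usual free-running, power-of-two flex-ring arithmetic,
they behave roughly as follows (a sketch of the inferred semantics, not the
posted code):

/* ring_size is XEN_FLEX_RING_SIZE(ring_order), always a power of two. */
static RING_IDX pvcalls_mask(RING_IDX idx, RING_IDX ring_size)
{
	return idx & (ring_size - 1);	/* map a free-running index into the array */
}

static RING_IDX pvcalls_queued(RING_IDX prod, RING_IDX cons, RING_IDX ring_size)
{
	return prod - cons;	/* unsigned wrap-around gives the bytes in flight */
}

With free-running indexes the producer and consumer never reset; masking is
only needed when the data array is actually touched, which is why
pvcalls_front_write_todo() can compute the free space simply as
size - pvcalls_queued(prod, cons, size).
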
[Xen-devel] [PATCH v2 10/13] xen/pvcalls: implement poll command

2017-07-25 Thread Stefano Stabellini
For active sockets, check the indexes and use the inflight_conn_req
waitqueue to wait.

For passive sockets, send PVCALLS_POLL to the backend. Use the
inflight_accept_req waitqueue if an accept is outstanding. Otherwise use
the inflight_req waitqueue: inflight_req is woken when a new response
is received; on wakeup we check whether the POLL response has arrived by
looking at the PVCALLS_FLAG_POLL_RET flag. We set the flag from
pvcalls_front_event_handler, if the response was for a POLL command.

In pvcalls_front_event_handler, get the struct socket pointer from the
poll id (we previously converted struct socket* to uint64_t and used it
as id).

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 134 
 drivers/xen/pvcalls-front.h |   3 +
 2 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index b4ca569..833b717 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -130,17 +130,35 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
 
req_id = rsp->req_id;
-   src = (uint8_t *)&bedata->rsp[req_id];
-   src += sizeof(rsp->req_id);
-   dst = (uint8_t *)rsp;
-   dst += sizeof(rsp->req_id);
-   memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
-   /*
-* First copy the rest of the data, then req_id. It is
-* paired with the barrier when accessing bedata->rsp.
-*/
-   smp_wmb();
-   WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
+   if (rsp->cmd == PVCALLS_POLL) {
+   struct socket *sock = (struct socket *) rsp->u.poll.id;
+   struct sock_mapping *map =
+   (struct sock_mapping *)
+   READ_ONCE(sock->sk->sk_send_head);
+
+   set_bit(PVCALLS_FLAG_POLL_RET,
+   (void *)&map->passive.flags);
+   /*
+* Set RET, then clear INFLIGHT. It pairs with
+* the checks at the beginning of
+* pvcalls_front_poll_passive.
+*/
+   smp_wmb();
+   clear_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+ (void *)&map->passive.flags);
+   } else {
+   src = (uint8_t *)&bedata->rsp[req_id];
+   src += sizeof(rsp->req_id);
+   dst = (uint8_t *)rsp;
+   dst += sizeof(rsp->req_id);
+   memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
+   /*
+* First copy the rest of the data, then req_id. It is
+* paired with the barrier when accessing bedata->rsp.
+*/
+   smp_wmb();
+   WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
+   }
 
done = 1;
bedata->ring.rsp_cons++;
@@ -707,6 +725,100 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
return ret;
 }
 
+static unsigned int pvcalls_front_poll_passive(struct file *file,
+  struct pvcalls_bedata *bedata,
+  struct sock_mapping *map,
+  poll_table *wait)
+{
+   int notify, req_id;
+   struct xen_pvcalls_request *req;
+
+   if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+(void *)&map->passive.flags)) {
+   poll_wait(file, &map->passive.inflight_accept_req, wait);
+   return 0;
+   }
+
+   if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
+  (void *)&map->passive.flags))
+   return POLLIN;
+
+   /*
+* First check RET, then INFLIGHT. No barriers necessary to
+* ensure execution ordering because of the conditional
+* instructions creating control dependencies.
+*/
+
+   if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+(void *)&map->passive.flags)) {
+   poll_wait(file, &bedata->inflight_req, wait);
+   return 0;
+   }
+
+   spin_lock(&bedata->pvcallss_lock);
+   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+   if (RING_FULL(&bedata->ring) ||
+   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+   spin_unlock(&bedata->pvcallss_lock);
+   return -EAGAIN;
+   }
+   req = RING_GET_REQUEST(&bedata->ring, req_id);
+   req->req_id = req_id;
+   

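Condensed, the POLL_RET/POLL_INFLIGHT handshake described above pairs the two
sides like this (a sketch of the ordering contract, not patch code):

/*
 * event handler (response arrives)        pvcalls_front_poll_passive()
 * ----------------------------------      --------------------------------------
 * set_bit(PVCALLS_FLAG_POLL_RET);         if (test_and_clear_bit(POLL_RET))
 * smp_wmb();                                      return POLLIN;
 * clear_bit(PVCALLS_FLAG_POLL_INFLIGHT);  if (test_and_set_bit(POLL_INFLIGHT))
 *                                                 poll_wait(...);  // still pending
 *                                          // otherwise: queue a new PVCALLS_POLL
 *
 * The poller checks RET before INFLIGHT; the handler sets RET before clearing
 * INFLIGHT. So by the time INFLIGHT reads as clear, the RET of the completed
 * command is guaranteed to be visible.
 */
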
[Xen-devel] [PATCH v2 01/13] xen/pvcalls: introduce the pvcalls xenbus frontend

2017-07-25 Thread Stefano Stabellini
Introduce a xenbus frontend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs, the code will be added by the following
patches.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 61 +
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.c

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
new file mode 100644
index 000..a8d38c2
--- /dev/null
+++ b/drivers/xen/pvcalls-front.c
@@ -0,0 +1,61 @@
+/*
+ * (c) 2017 Stefano Stabellini 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+
+#include <xen/events.h>
+#include <xen/grant_table.h>
+#include <xen/xen.h>
+#include <xen/xenbus.h>
+#include <xen/interface/io/pvcalls.h>
+
+static const struct xenbus_device_id pvcalls_front_ids[] = {
+   { "pvcalls" },
+   { "" }
+};
+
+static int pvcalls_front_remove(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int pvcalls_front_probe(struct xenbus_device *dev,
+ const struct xenbus_device_id *id)
+{
+   return 0;
+}
+
+static void pvcalls_front_changed(struct xenbus_device *dev,
+   enum xenbus_state backend_state)
+{
+}
+
+static struct xenbus_driver pvcalls_front_driver = {
+   .ids = pvcalls_front_ids,
+   .probe = pvcalls_front_probe,
+   .remove = pvcalls_front_remove,
+   .otherend_changed = pvcalls_front_changed,
+};
+
+static int __init pvcalls_frontend_init(void)
+{
+   if (!xen_domain())
+   return -ENODEV;
+
+   pr_info("Initialising Xen pvcalls frontend driver\n");
+
+   return xenbus_register_frontend(&pvcalls_front_driver);
+}
+
+module_init(pvcalls_frontend_init);
-- 
1.9.1




[Xen-devel] [PATCH v2 04/13] xen/pvcalls: implement connect command

2017-07-25 Thread Stefano Stabellini
Send PVCALLS_CONNECT to the backend. Allocate a new ring and evtchn for
the active socket.

Introduce a data structure to keep track of sockets. Introduce a
waitqueue to allow the frontend to wait on data coming from the backend
on the active socket (recvmsg command).

Two mutexes (one for reads and one for writes) will be used to protect
the active socket in and out rings from concurrent accesses.

sock->sk->sk_send_head is not used for IP sockets: reuse the field to
store a pointer to the struct sock_mapping corresponding to the socket.
This way, we can easily get the struct sock_mapping from the struct
socket.

Convert the struct socket pointer into a uint64_t and use it as the id for
the new socket to pass to the backend.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 177 +---
 drivers/xen/pvcalls-front.h |   2 +
 2 files changed, 168 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index d1dbcf1..d0f5f42 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -13,6 +13,10 @@
  */
 
 #include <linux/module.h>
+#include <linux/net.h>
+#include <linux/socket.h>
+
+#include <net/sock.h>
 
 #include <xen/events.h>
 #include <xen/grant_table.h>
@@ -40,6 +44,24 @@ struct pvcalls_bedata {
 };
 struct xenbus_device *pvcalls_front_dev;
 
+struct sock_mapping {
+   bool active_socket;
+   struct list_head list;
+   struct socket *sock;
+   union {
+   struct {
+   int irq;
+   grant_ref_t ref;
+   struct pvcalls_data_intf *ring;
+   struct pvcalls_data data;
+   struct mutex in_mutex;
+   struct mutex out_mutex;
+
+   wait_queue_head_t inflight_conn_req;
+   } active;
+   };
+};
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
struct xenbus_device *dev = dev_id;
@@ -84,6 +106,18 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
+{
+   struct sock_mapping *map = sock_map;
+
+   if (map == NULL)
+   return IRQ_HANDLED;
+
+   wake_up_interruptible(&map->active.inflight_conn_req);
+
+   return IRQ_HANDLED;
+}
+
 int pvcalls_front_socket(struct socket *sock)
 {
struct pvcalls_bedata *bedata;
@@ -137,6 +171,127 @@ int pvcalls_front_socket(struct socket *sock)
return ret;
 }
 
+static struct sock_mapping *create_active(int *evtchn)
+{
+   struct sock_mapping *map = NULL;
+   void *bytes;
+   int ret, irq = -1, i;
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL)
+   return NULL;
+
+   init_waitqueue_head(&map->active.inflight_conn_req);
+
+   map->active.ring = (struct pvcalls_data_intf *)
+   __get_free_page(GFP_KERNEL | __GFP_ZERO);
+   if (map->active.ring == NULL)
+   goto out_error;
+   memset(map->active.ring, 0, XEN_PAGE_SIZE);
+   map->active.ring->ring_order = RING_ORDER;
+   bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+   map->active.ring->ring_order);
+   if (bytes == NULL)
+   goto out_error;
+   for (i = 0; i < (1 << map->active.ring->ring_order); i++)
+   map->active.ring->ref[i] = gnttab_grant_foreign_access(
+   pvcalls_front_dev->otherend_id,
+   pfn_to_gfn(virt_to_pfn(bytes) + i), 0);
+
+   map->active.ref = gnttab_grant_foreign_access(
+   pvcalls_front_dev->otherend_id,
+   pfn_to_gfn(virt_to_pfn((void *)map->active.ring)), 0);
+
+   map->active.data.in = bytes;
+   map->active.data.out = bytes +
+   XEN_FLEX_RING_SIZE(map->active.ring->ring_order);
+
+   ret = xenbus_alloc_evtchn(pvcalls_front_dev, evtchn);
+   if (ret)
+   goto out_error;
+   irq = bind_evtchn_to_irqhandler(*evtchn, pvcalls_front_conn_handler,
+   0, "pvcalls-frontend", map);
+   if (irq < 0)
+   goto out_error;
+
+   map->active.irq = irq;
+   map->active_socket = true;
+   mutex_init(&map->active.in_mutex);
+   mutex_init(&map->active.out_mutex);
+
+   return map;
+
+out_error:
+   if (irq >= 0)
+   unbind_from_irqhandler(irq, map);
+   else if (*evtchn >= 0)
+   xenbus_free_evtchn(pvcalls_front_dev, *evtchn);
+   kfree(map->active.data.in);
+   kfree(map->active.ring);
+   kfree(map);
+   return NULL;
+}
+
+int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
+   int addr_len, int flags)
+{
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map = NULL;
+   struct 

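For orientation, this is the shared-memory layout that create_active() sets
up, as consumed by the sendmsg/recvmsg patches later in the series
(descriptive sketch; field names are taken from the code above):

/*
 * One "indexes" page plus 2^ring_order data pages, all granted to the backend:
 *
 *   map->active.ring  (struct pvcalls_data_intf, a single page)
 *       in_cons,  in_prod,  in_error    - backend produces,  frontend consumes
 *       out_cons, out_prod, out_error   - frontend produces, backend consumes
 *       ring_order, ref[]               - order and grant refs of the data pages
 *
 *   map->active.data  (2^ring_order pages, split in half)
 *       data.in  = bytes                                    (receive direction)
 *       data.out = bytes + XEN_FLEX_RING_SIZE(ring_order)   (send direction)
 */
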
[Xen-devel] [PATCH v2 05/13] xen/pvcalls: implement bind command

2017-07-25 Thread Stefano Stabellini
Send PVCALLS_BIND to the backend. Introduce a new structure, part of
struct sock_mapping, to store information specific to passive sockets.

Introduce a status field to keep track of the status of the passive
socket.

Introduce a waitqueue for the "accept" command (see the accept command
implementation): it is used to allow only one outstanding accept
command at any given time and to implement polling on the passive
socket. Introduce a flags field to keep track of in-flight accept and
poll commands.

sock->sk->sk_send_head is not used for IP sockets: reuse the field to
store a pointer to the struct sock_mapping corresponding to the socket.

Convert the struct socket pointer into a uint64_t and use it as the id for
the socket to pass to the backend.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 73 +
 drivers/xen/pvcalls-front.h |  3 ++
 2 files changed, 76 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index d0f5f42..af2ce20 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -59,6 +59,23 @@ struct sock_mapping {
 
wait_queue_head_t inflight_conn_req;
} active;
+   struct {
+   /* Socket status */
+#define PVCALLS_STATUS_UNINITALIZED  0
+#define PVCALLS_STATUS_BIND  1
+#define PVCALLS_STATUS_LISTEN2
+   uint8_t status;
+   /*
+* Internal state-machine flags.
+* Only one accept operation can be inflight for a socket.
+* Only one poll operation can be inflight for a given socket.
+*/
+#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
+#define PVCALLS_FLAG_POLL_INFLIGHT   1
+#define PVCALLS_FLAG_POLL_RET2
+   uint8_t flags;
+   wait_queue_head_t inflight_accept_req;
+   } passive;
};
 };
 
@@ -292,6 +309,62 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
return ret;
 }
 
+int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
+{
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map = NULL;
+   struct xen_pvcalls_request *req;
+   int notify, req_id, ret;
+
+   if (!pvcalls_front_dev)
+   return -ENOTCONN;
+   if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
+   return -ENOTSUPP;
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   map = kzalloc(sizeof(*map), GFP_KERNEL);
+   if (map == NULL)
+   return -ENOMEM;
+
+   spin_lock(&bedata->pvcallss_lock);
+   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+   if (RING_FULL(&bedata->ring) ||
+   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+   kfree(map);
+   spin_unlock(&bedata->pvcallss_lock);
+   return -EAGAIN;
+   }
+   req = RING_GET_REQUEST(&bedata->ring, req_id);
+   req->req_id = req_id;
+   map->sock = sock;
+   req->cmd = PVCALLS_BIND;
+   req->u.bind.id = (uint64_t) sock;
+   memcpy(req->u.bind.addr, addr, sizeof(*addr));
+   req->u.bind.len = addr_len;
+
+   init_waitqueue_head(&map->passive.inflight_accept_req);
+
+   list_add_tail(&map->list, &bedata->socketpass_mappings);
+   WRITE_ONCE(sock->sk->sk_send_head, (void *)map);
+   map->active_socket = false;
+
+   bedata->ring.req_prod_pvt++;
+   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+   spin_unlock(&bedata->pvcallss_lock);
+   if (notify)
+   notify_remote_via_irq(bedata->irq);
+
+   wait_event(bedata->inflight_req,
+  READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+   map->passive.status = PVCALLS_STATUS_BIND;
+   ret = bedata->rsp[req_id].ret;
+   /* read ret, then set this rsp slot to be reused */
+   smp_mb();
+   WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+   return 0;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
{ "pvcalls" },
{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 63b0417..8b0a274 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -6,5 +6,8 @@
 int pvcalls_front_socket(struct socket *sock);
 int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
  int addr_len, int flags);
+int pvcalls_front_bind(struct socket *sock,
+  struct sockaddr *addr,
+  int addr_len);
 
 #endif
-- 
1.9.1



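The status field introduced here advances in lockstep with the pvcalls
commands (sketch; listen is implemented by a separate patch in this series):

/*
 *   kzalloc(map)             ->  PVCALLS_STATUS_UNINITALIZED  (0, zeroed)
 *   pvcalls_front_bind()     ->  PVCALLS_STATUS_BIND
 *   pvcalls_front_listen()   ->  PVCALLS_STATUS_LISTEN
 *   pvcalls_front_accept()   ->  only legal while status == PVCALLS_STATUS_LISTEN
 */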

[Xen-devel] [PATCH v2 11/13] xen/pvcalls: implement release command

2017-07-25 Thread Stefano Stabellini
Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
in_mutex and out_mutex to avoid concurrent accesses. Then, free the
socket.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 85 +
 drivers/xen/pvcalls-front.h |  1 +
 2 files changed, 86 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 833b717..5a4040e 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -184,6 +184,23 @@ static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
return IRQ_HANDLED;
 }
 
+static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
+  struct sock_mapping *map)
+{
+   int i;
+
+   spin_lock(&bedata->pvcallss_lock);
+   if (!list_empty(&map->list))
+   list_del_init(&map->list);
+   spin_unlock(&bedata->pvcallss_lock);
+
+   for (i = 0; i < (1 << map->active.ring->ring_order); i++)
+   gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
+   gnttab_end_foreign_access(map->active.ref, 0, 0);
+   free_page((unsigned long)map->active.ring);
+   unbind_from_irqhandler(map->active.irq, map);
+}
+
 int pvcalls_front_socket(struct socket *sock)
 {
struct pvcalls_bedata *bedata;
@@ -819,6 +836,74 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
return pvcalls_front_poll_passive(file, bedata, map, wait);
 }
 
+int pvcalls_front_release(struct socket *sock)
+{
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map;
+   int req_id, notify;
+   struct xen_pvcalls_request *req;
+
+   if (!pvcalls_front_dev)
+   return -EIO;
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+   if (!bedata)
+   return -EIO;
+
+   if (sock->sk == NULL)
+   return 0;
+
+   map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+   if (map == NULL)
+   return 0;
+
+   spin_lock(&bedata->pvcallss_lock);
+   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+   if (RING_FULL(&bedata->ring) ||
+   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+   spin_unlock(&bedata->pvcallss_lock);
+   return -EAGAIN;
+   }
+   WRITE_ONCE(sock->sk->sk_send_head, NULL);
+
+   req = RING_GET_REQUEST(&bedata->ring, req_id);
+   req->req_id = req_id;
+   req->cmd = PVCALLS_RELEASE;
+   req->u.release.id = (uint64_t)sock;
+
+   bedata->ring.req_prod_pvt++;
+   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+   spin_unlock(&bedata->pvcallss_lock);
+   if (notify)
+   notify_remote_via_irq(bedata->irq);
+
+   wait_event(bedata->inflight_req,
+   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+   if (map->active_socket) {
+   /* 
+* Set in_error and wake up inflight_conn_req to force
+* recvmsg waiters to exit.
+*/
+   map->active.ring->in_error = -EBADF;
+   wake_up_interruptible(&map->active.inflight_conn_req);
+
+   mutex_lock(&map->active.in_mutex);
+   mutex_lock(&map->active.out_mutex);
+   pvcalls_front_free_map(bedata, map);
+   mutex_unlock(&map->active.out_mutex);
+   mutex_unlock(&map->active.in_mutex);
+   kfree(map);
+   } else {
+   spin_lock(&bedata->pvcallss_lock);
+   list_del_init(&map->list);
+   kfree(map);
+   spin_unlock(&bedata->pvcallss_lock);
+   }
+   WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+
+   return 0;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
{ "pvcalls" },
{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 25e05b8..3332978 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -23,5 +23,6 @@ int pvcalls_front_recvmsg(struct socket *sock,
 unsigned int pvcalls_front_poll(struct file *file,
struct socket *sock,
poll_table *wait);
+int pvcalls_front_release(struct socket *sock);
 
 #endif
-- 
1.9.1



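The ordering in the active-socket branch above is what makes this teardown
safe; condensed (sketch, not patch code):

/*
 * 1. ring->in_error = -EBADF; wake_up_interruptible(inflight_conn_req)
 *       -> blocked recvmsg callers see an error and back out of the ring
 * 2. mutex_lock(in_mutex); mutex_lock(out_mutex)
 *       -> waits for any sendmsg/recvmsg still inside its critical section
 * 3. pvcalls_front_free_map(): end the grants, free the pages, unbind the irq
 *       -> safe, since nothing can reference the ring any more
 */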

[Xen-devel] [PATCH v2 13/13] xen: introduce a Kconfig option to enable the pvcalls frontend

2017-07-25 Thread Stefano Stabellini
Also add pvcalls-front to the Makefile.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/Kconfig  | 9 +
 drivers/xen/Makefile | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 4545561..0b2c828 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,15 @@ config XEN_PCIDEV_BACKEND
 
  If in doubt, say m.
 
+config XEN_PVCALLS_FRONTEND
+   tristate "XEN PV Calls frontend driver"
+   depends on INET && XEN
+   help
+ Experimental frontend for the Xen PV Calls protocol
+ (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+ sends a small set of POSIX calls to the backend, which
+ implements them.
+
 config XEN_PVCALLS_BACKEND
bool "XEN PV Calls backend driver"
depends on INET && XEN && XEN_BACKEND
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 480b928..afb9e03 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_XEN_EFI) += efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)   += xlate_mmu.o
 obj-$(CONFIG_XEN_PVCALLS_BACKEND)  += pvcalls-back.o
+obj-$(CONFIG_XEN_PVCALLS_FRONTEND) += pvcalls-front.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
-- 
1.9.1



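With the tristate above, the driver can be built either into the kernel or as
a module; for example, a .config fragment for a module build would be:

CONFIG_XEN_PVCALLS_FRONTEND=m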

[Xen-devel] [PATCH v2 08/13] xen/pvcalls: implement sendmsg

2017-07-25 Thread Stefano Stabellini
Send data to an active socket by copying data to the "out" ring. Take
the active socket out_mutex so that only one function can access the
ring at any given time.

If not enough room is available on the ring, rather than returning
immediately or sleep-waiting, spin for up to 5000 cycles. This small
optimization turns out to improve performance significantly.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 109 
 drivers/xen/pvcalls-front.h |   3 ++
 2 files changed, 112 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index b8c4538..d8ed280 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -29,6 +29,7 @@
 #define PVCALLS_INVALID_ID (UINT_MAX)
 #define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
 #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
+#define PVCALLS_FRONT_MAX_SPIN 5000
 
 struct pvcalls_bedata {
struct xen_pvcalls_front_ring ring;
@@ -79,6 +80,22 @@ struct sock_mapping {
};
 };
 
+static int pvcalls_front_write_todo(struct sock_mapping *map)
+{
+   struct pvcalls_data_intf *intf = map->active.ring;
+   RING_IDX cons, prod, size = XEN_FLEX_RING_SIZE(intf->ring_order);
+   int32_t error;
+
+   cons = intf->out_cons;
+   prod = intf->out_prod;
+   error = intf->out_error;
+   if (error == -ENOTCONN)
+   return 0;
+   if (error != 0)
+   return error;
+   return size - pvcalls_queued(prod, cons, size);
+}
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
struct xenbus_device *dev = dev_id;
@@ -309,6 +326,98 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
return ret;
 }
 
+static int __write_ring(struct pvcalls_data_intf *intf,
+   struct pvcalls_data *data,
+   struct iov_iter *msg_iter,
+   size_t len)
+{
+   RING_IDX cons, prod, size, masked_prod, masked_cons;
+   RING_IDX array_size = XEN_FLEX_RING_SIZE(intf->ring_order);
+   int32_t error;
+
+   cons = intf->out_cons;
+   prod = intf->out_prod;
+   error = intf->out_error;
+   /* read indexes before continuing */
+   virt_mb();
+
+   if (error < 0)
+   return error;
+
+   size = pvcalls_queued(prod, cons, array_size);
+   if (size >= array_size)
+   return 0;
+   if (len > array_size - size)
+   len = array_size - size;
+
+   masked_prod = pvcalls_mask(prod, array_size);
+   masked_cons = pvcalls_mask(cons, array_size);
+
+   if (masked_prod < masked_cons) {
+   copy_from_iter(data->out + masked_prod, len, msg_iter);
+   } else {
+   if (len > array_size - masked_prod) {
+   copy_from_iter(data->out + masked_prod,
+  array_size - masked_prod, msg_iter);
+   copy_from_iter(data->out,
+  len - (array_size - masked_prod),
+  msg_iter);
+   } else {
+   copy_from_iter(data->out + masked_prod, len, msg_iter);
+   }
+   }
+   /* write to ring before updating pointer */
+   virt_wmb();
+   intf->out_prod += len;
+
+   return len;
+}
+
+int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
+ size_t len)
+{
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map;
+   int sent = 0, tot_sent = 0;
+   int count = 0, flags;
+
+   if (!pvcalls_front_dev)
+   return -ENOTCONN;
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+   if (!map)
+   return -ENOTSOCK;
+
+   flags = msg->msg_flags;
+   if (flags & (MSG_CONFIRM|MSG_DONTROUTE|MSG_EOR|MSG_OOB))
+   return -EOPNOTSUPP;
+
+   mutex_lock(&map->active.out_mutex);
+   if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) {
+   mutex_unlock(&map->active.out_mutex);
+   return -EAGAIN;
+   }
+
+again:
+   count++;
+   sent = __write_ring(map->active.ring,
+   &map->active.data, &msg->msg_iter,
+   len);
+   if (sent > 0) {
+   len -= sent;
+   tot_sent += sent;
+   notify_remote_via_irq(map->active.irq);
+   }
+   if (sent >= 0 && len > 0 && count < PVCALLS_FRONT_MAX_SPIN)
+   goto again;
+   if (sent < 0)
+   tot_sent = sent;
+
+   mutex_unlock(&map->active.out_mutex);
+   return tot_sent;
+}
+
int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
struct pvcalls_bedata 

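A worked example of the split copy in __write_ring() above, to make the index
arithmetic concrete (numbers are illustrative):

/*
 * array_size = 8, out_cons = 2, out_prod = 7 (free-running), caller len = 5:
 *
 *   size = pvcalls_queued(7, 2, 8) = 5 bytes already queued
 *   only array_size - size = 3 bytes fit, so len is capped to 3
 *   masked_prod = 7 & 7 = 7, masked_cons = 2 & 7 = 2
 *   masked_prod >= masked_cons and len > array_size - masked_prod (= 1), so:
 *       copy 1 byte  to data->out + 7   (tail of the array)
 *       copy 2 bytes to data->out + 0   (wrap to the front)
 *   virt_wmb(); out_prod += 3  ->  the ring now holds 8 bytes, i.e. it is full
 */
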
[Xen-devel] [PATCH v2 12/13] xen/pvcalls: implement frontend disconnect

2017-07-25 Thread Stefano Stabellini
Implement pvcalls frontend removal function. Go through the list of
active and passive sockets and free them all, one at a time.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Juergen Gross 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 5a4040e..b3d4675 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -911,6 +911,34 @@ int pvcalls_front_release(struct socket *sock)
 
 static int pvcalls_front_remove(struct xenbus_device *dev)
 {
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map = NULL, *n;
+
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
+   mutex_lock(&map->active.in_mutex);
+   mutex_lock(&map->active.out_mutex);
+   pvcalls_front_free_map(bedata, map);
+   mutex_unlock(&map->active.out_mutex);
+   mutex_unlock(&map->active.in_mutex);
+   kfree(map);
+   }
+   list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
+   spin_lock(&bedata->pvcallss_lock);
+   list_del_init(&map->list);
+   spin_unlock(&bedata->pvcallss_lock);
+   kfree(map);
+   }
+   if (bedata->irq > 0)
+   unbind_from_irqhandler(bedata->irq, dev);
+   if (bedata->ref >= 0)
+   gnttab_end_foreign_access(bedata->ref, 0, 0);
+   kfree(bedata->ring.sring);
+   kfree(bedata);
+   dev_set_drvdata(&dev->dev, NULL);
+   xenbus_switch_state(dev, XenbusStateClosed);
+   pvcalls_front_dev = NULL;
return 0;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 02/13] xen/pvcalls: connect to the backend

2017-07-25 Thread Stefano Stabellini
Implement the probe function for the pvcalls frontend. Read the
supported versions, max-page-order and function-calls nodes from
xenstore.

Introduce a data structure named pvcalls_bedata. It contains pointers to
the command ring, the event channel, a list of active sockets and a list
of passive sockets. List accesses are protected by a spinlock.

Introduce a waitqueue to allow waiting for a response on commands sent
to the backend.

Introduce an array of struct xen_pvcalls_response to store command
responses.

Only one frontend<->backend connection is supported at any given time
for a guest. Store the active frontend device in a static pointer.

Introduce a stub function for the event handler.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 153 
 1 file changed, 153 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index a8d38c2..5e0b265 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -20,6 +20,29 @@
 #include 
 #include 
 
+#define PVCALLS_INVALID_ID (UINT_MAX)
+#define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
+
+struct pvcalls_bedata {
+   struct xen_pvcalls_front_ring ring;
+   grant_ref_t ref;
+   int irq;
+
+   struct list_head socket_mappings;
+   struct list_head socketpass_mappings;
+   spinlock_t pvcallss_lock;
+
+   wait_queue_head_t inflight_req;
+   struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
+};
+struct xenbus_device *pvcalls_front_dev;
+
+static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
+{
+   return IRQ_HANDLED;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
{ "pvcalls" },
{ "" }
@@ -33,12 +56,142 @@ static int pvcalls_front_remove(struct xenbus_device *dev)
 static int pvcalls_front_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
+   int ret = -EFAULT, evtchn, ref = -1, i;
+   unsigned int max_page_order, function_calls, len;
+   char *versions;
+   grant_ref_t gref_head = 0;
+   struct xenbus_transaction xbt;
+   struct pvcalls_bedata *bedata = NULL;
+   struct xen_pvcalls_sring *sring;
+
+   if (pvcalls_front_dev != NULL) {
+   dev_err(&dev->dev, "only one PV Calls connection supported\n");
+   return -EINVAL;
+   }
+
+   versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
+   if (!len)
+   return -EINVAL;
+   if (strcmp(versions, "1")) {
+   kfree(versions);
+   return -EINVAL;
+   }
+   kfree(versions);
+   ret = xenbus_scanf(XBT_NIL, dev->otherend,
+  "max-page-order", "%u", _page_order);
+   if (ret <= 0)
+   return -ENODEV;
+   if (max_page_order < RING_ORDER)
+   return -ENODEV;
+   ret = xenbus_scanf(XBT_NIL, dev->otherend,
+  "function-calls", "%u", _calls);
+   if (ret <= 0 || function_calls != 1)
+   return -ENODEV;
+   pr_info("%s max-page-order is %u\n", __func__, max_page_order);
+
+   bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL);
+   if (!bedata)
+   return -ENOMEM;
+
+   init_waitqueue_head(&bedata->inflight_req);
+   for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++)
+   bedata->rsp[i].req_id = PVCALLS_INVALID_ID;
+
+   sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL |
+__GFP_ZERO);
+   if (!sring)
+   goto error;
+   SHARED_RING_INIT(sring);
+   FRONT_RING_INIT(&bedata->ring, sring, XEN_PAGE_SIZE);
+
+   ret = xenbus_alloc_evtchn(dev, &evtchn);
+   if (ret)
+   goto error;
+
+   bedata->irq = bind_evtchn_to_irqhandler(evtchn,
+   pvcalls_front_event_handler,
+   0, "pvcalls-frontend", dev);
+   if (bedata->irq < 0) {
+   ret = bedata->irq;
+   goto error;
+   }
+
+   ret = gnttab_alloc_grant_references(1, &gref_head);
+   if (ret < 0)
+   goto error;
+   bedata->ref = ref = gnttab_claim_grant_reference(&gref_head);
+   if (ref < 0)
+   goto error;
+   gnttab_grant_foreign_access_ref(ref, dev->otherend_id,
+   virt_to_gfn((void *)sring), 0);
+
+ again:
+   ret = xenbus_transaction_start(&xbt);
+   if (ret) {
+   xenbus_dev_fatal(dev, ret, "starting transaction");
+   goto error;
+   }
+   ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
+   if (ret)
+   goto error_xenbus;
+   ret = xenbus_printf(xbt, 

[Xen-devel] [PATCH v2 03/13] xen/pvcalls: implement socket command and handle events

2017-07-25 Thread Stefano Stabellini
Send a PVCALLS_SOCKET command to the backend, use the masked
req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array
ready for the response, and there cannot be two outstanding responses
with the same req_id.

Wait for the response by waiting on the inflight_req waitqueue and
checking the req_id field in rsp[req_id]. Use atomic accesses to
read the field. Once a response is received, clear the corresponding rsp
slot by setting req_id to PVCALLS_INVALID_ID. Note that
PVCALLS_INVALID_ID is invalid only from the frontend point of view. It
is not part of the PVCalls protocol.

pvcalls_front_event_handler is in charge of copying responses from the
ring to the appropriate rsp slot. It is done by copying the body of the
response first, then by copying req_id atomically. After the copies,
wake up anybody waiting on the waitqueue.

pvcallss_lock protects accesses to the ring.
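
To make the ordering contract concrete, here is a compact userspace model
of the rsp-slot handshake described above (a sketch, not code from this
series): C11 release/acquire atomics stand in for the kernel's
WRITE_ONCE/READ_ONCE plus smp_wmb()/smp_mb(), a busy loop stands in for
wait_event(), and all names are illustrative.

#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>

#define NR_SLOTS 8
#define INVALID_ID UINT_MAX

struct slot {
	_Atomic unsigned int req_id; /* slot is free while == INVALID_ID */
	int ret;                     /* response body, reduced to one field */
};

static struct slot rsp[NR_SLOTS];

/* event-handler side: fill the body first, then publish req_id */
static void deliver(unsigned int req_id, int ret)
{
	rsp[req_id].ret = ret;
	/* release pairs with the acquire in wait_for() (the smp_wmb above) */
	atomic_store_explicit(&rsp[req_id].req_id, req_id,
			      memory_order_release);
}

/* requester side: wait for the id, read the body, then free the slot */
static int wait_for(unsigned int req_id)
{
	while (atomic_load_explicit(&rsp[req_id].req_id,
				    memory_order_acquire) != req_id)
		; /* wait_event() in the real code */
	int ret = rsp[req_id].ret;
	/* read ret before marking the slot reusable (the smp_mb above) */
	atomic_store_explicit(&rsp[req_id].req_id, INVALID_ID,
			      memory_order_release);
	return ret;
}

int main(void)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_init(&rsp[i].req_id, INVALID_ID);
	deliver(3, 0);
	printf("slot 3 -> %d\n", wait_for(3));
	return 0;
}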

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 94 +
 drivers/xen/pvcalls-front.h |  8 
 2 files changed, 102 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.h

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 5e0b265..d1dbcf1 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#include "pvcalls-front.h"
+
 #define PVCALLS_INVALID_ID (UINT_MAX)
 #define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
 #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
@@ -40,9 +42,101 @@ struct pvcalls_bedata {
 
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
+   struct xenbus_device *dev = dev_id;
+   struct pvcalls_bedata *bedata;
+   struct xen_pvcalls_response *rsp;
+   uint8_t *src, *dst;
+   int req_id = 0, more = 0, done = 0;
+
+   if (dev == NULL)
+   return IRQ_HANDLED;
+
+   bedata = dev_get_drvdata(>dev);
+   if (bedata == NULL)
+   return IRQ_HANDLED;
+
+again:
+   while (RING_HAS_UNCONSUMED_RESPONSES(&bedata->ring)) {
+   rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
+
+   req_id = rsp->req_id;
+   src = (uint8_t *)&bedata->rsp[req_id];
+   src += sizeof(rsp->req_id);
+   dst = (uint8_t *)rsp;
+   dst += sizeof(rsp->req_id);
+   memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
+   /*
+* First copy the rest of the data, then req_id. It is
+* paired with the barrier when accessing bedata->rsp.
+*/
+   smp_wmb();
+   WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
+
+   done = 1;
+   bedata->ring.rsp_cons++;
+   }
+
+   RING_FINAL_CHECK_FOR_RESPONSES(&bedata->ring, more);
+   if (more)
+   goto again;
+   if (done)
+   wake_up(&bedata->inflight_req);
return IRQ_HANDLED;
 }
 
+int pvcalls_front_socket(struct socket *sock)
+{
+   struct pvcalls_bedata *bedata;
+   struct xen_pvcalls_request *req;
+   int notify, req_id, ret;
+
+   if (!pvcalls_front_dev)
+   return -EACCES;
+   /*
+* PVCalls only supports domain AF_INET,
+* type SOCK_STREAM and protocol 0 sockets for now.
+*
+* Check socket type here, AF_INET and protocol checks are done
+* by the caller.
+*/
+   if (sock->type != SOCK_STREAM)
+   return -ENOTSUPP;
+
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   spin_lock(&bedata->pvcallss_lock);
+   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+   if (RING_FULL(&bedata->ring) ||
+   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+   spin_unlock(&bedata->pvcallss_lock);
+   return -EAGAIN;
+   }
+   req = RING_GET_REQUEST(&bedata->ring, req_id);
+   req->req_id = req_id;
+   req->cmd = PVCALLS_SOCKET;
+   req->u.socket.id = (uint64_t) sock;
+   req->u.socket.domain = AF_INET;
+   req->u.socket.type = SOCK_STREAM;
+   req->u.socket.protocol = 0;
+
+   bedata->ring.req_prod_pvt++;
+   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+   spin_unlock(&bedata->pvcallss_lock);
+   if (notify)
+   notify_remote_via_irq(bedata->irq);
+
+   if (wait_event_interruptible(bedata->inflight_req,
+   READ_ONCE(bedata->rsp[req_id].req_id) == req_id) != 0)
+   return -EINTR;
+
+   ret = bedata->rsp[req_id].ret;
+   /* read ret, then set this rsp slot to be reused */
+   smp_mb();
+   WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+
+   return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
{ "pvcalls" },
{ "" }
diff --git 

[Xen-devel] [PATCH v2 06/13] xen/pvcalls: implement listen command

2017-07-25 Thread Stefano Stabellini
Send PVCALLS_LISTEN to the backend.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 48 +
 drivers/xen/pvcalls-front.h |  1 +
 2 files changed, 49 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index af2ce20..3b5d50e 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -365,6 +365,54 @@ int pvcalls_front_bind(struct socket *sock, struct 
sockaddr *addr, int addr_len)
return 0;
 }
 
+int pvcalls_front_listen(struct socket *sock, int backlog)
+{
+   struct pvcalls_bedata *bedata;
+   struct sock_mapping *map;
+   struct xen_pvcalls_request *req;
+   int notify, req_id, ret;
+
+   if (!pvcalls_front_dev)
+   return -ENOTCONN;
+   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+   map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+   if (!map)
+   return -ENOTSOCK;
+
+   if (map->passive.status != PVCALLS_STATUS_BIND)
+   return -EOPNOTSUPP;
+
+   spin_lock(&bedata->pvcallss_lock);
+   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+   if (RING_FULL(&bedata->ring) ||
+   bedata->rsp[req_id].req_id != PVCALLS_INVALID_ID) {
+   spin_unlock(&bedata->pvcallss_lock);
+   return -EAGAIN;
+   }
+   req = RING_GET_REQUEST(&bedata->ring, req_id);
+   req->req_id = req_id;
+   req->cmd = PVCALLS_LISTEN;
+   req->u.listen.id = (uint64_t) sock;
+   req->u.listen.backlog = backlog;
+
+   bedata->ring.req_prod_pvt++;
+   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+   spin_unlock(&bedata->pvcallss_lock);
+   if (notify)
+   notify_remote_via_irq(bedata->irq);
+
+   wait_event(bedata->inflight_req,
+  READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+   map->passive.status = PVCALLS_STATUS_LISTEN;
+   ret = bedata->rsp[req_id].ret;
+   /* read ret, then set this rsp slot to be reused */
+   smp_mb();
+   WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+   return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
{ "pvcalls" },
{ "" }
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 8b0a274..aa8fe10 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -9,5 +9,6 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr 
*addr,
 int pvcalls_front_bind(struct socket *sock,
   struct sockaddr *addr,
   int addr_len);
+int pvcalls_front_listen(struct socket *sock, int backlog);
 
 #endif
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 00/13] introduce the Xen PV Calls frontend

2017-07-25 Thread Stefano Stabellini
Hi all,

this series introduces the frontend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which implements them, acts on
the function call, and returns a value to the frontend.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html

This patch series only implements the frontend driver. It doesn't
attempt to redirect POSIX calls to it. The functions exported in
pvcalls-front.h are meant to be used for that. A separate patch series
will be sent to use them and hook them into the system.
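
As a concrete (and purely hypothetical) illustration of how that later
series might consume these exports, a passive-socket setup path could chain
the calls like this; only the pvcalls_front_* prototypes come from
pvcalls-front.h, the helper itself is made up:

#include "pvcalls-front.h"

/* hypothetical helper, not part of this series */
static int pvcalls_stream_server(struct socket *sock, struct sockaddr *addr,
				 int addr_len, int backlog)
{
	int ret;

	ret = pvcalls_front_socket(sock);               /* PVCALLS_SOCKET */
	if (ret < 0)
		return ret;
	ret = pvcalls_front_bind(sock, addr, addr_len); /* PVCALLS_BIND */
	if (ret < 0)
		return ret;
	return pvcalls_front_listen(sock, backlog);     /* PVCALLS_LISTEN */
}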


Changes in v2:
- use xenbus_read_unsigned when possible
- call dev_set_drvdata earlier in pvcalls_front_probe not to dereference
  a NULL pointer in the error path
- set ret appropriately in pvcalls_front_probe
- include pvcalls-front.h in pvcalls-front.c
- call wake_up only once after the consuming loop in pvcalls_front_event_handler
- don't leak "bytes" in case of errors in create_active
- call spin_unlock appropriately in case of errors in create_active
- remove all BUG_ONs
- don't leak newsock->sk in pvcalls_front_accept in case of errors
- rename PVCALLS_FRON_MAX_SPIN to PVCALLS_FRONT_MAX_SPIN
- return bool from pvcalls_front_read_todo
- add a barrier after setting PVCALLS_FLAG_POLL_RET in
  pvcalls_front_event_handler
- remove outdated comment in pvcalls_front_free_map
- clear sock->sk->sk_send_head later in pvcalls_front_release
- make XEN_PVCALLS_FRONTEND tristate
- don't add an empty resume function


Stefano Stabellini (13):
  xen/pvcalls: introduce the pvcalls xenbus frontend
  xen/pvcalls: connect to the backend
  xen/pvcalls: implement socket command and handle events
  xen/pvcalls: implement connect command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement sendmsg
  xen/pvcalls: implement recvmsg
  xen/pvcalls: implement poll command
  xen/pvcalls: implement release command
  xen/pvcalls: implement frontend disconnect
  xen: introduce a Kconfig option to enable the pvcalls frontend

 drivers/xen/Kconfig |9 +
 drivers/xen/Makefile|1 +
 drivers/xen/pvcalls-front.c | 1103 +++
 drivers/xen/pvcalls-front.h |   28 ++
 4 files changed, 1141 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.c
 create mode 100644 drivers/xen/pvcalls-front.h

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [GIT PULL] (xen) stable/for-jens-4.13

2017-07-25 Thread Konrad Rzeszutek Wilk
Hi Jens,

Please git pull in your branch "for-linus" the following
branch which is based on 765e40b675a9566459ddcb8358ad16f3b8344bbe
"blk-mq: map queues to all present CPUs":

 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
stable/for-jens-4.13

It has two bug-fixes for the xen-blkfront driver.

Thank you!


drivers/block/xen-blkfront.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

Dongli Zhang (1):
  xen/blkfront: always allocate grants first from per-queue persistent 
grants

Junxiao Bi (1):
  xen-blkfront: fix mq start/stop race


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 11/13] xen/pvcalls: implement release command

2017-07-25 Thread Stefano Stabellini
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:12, Stefano Stabellini wrote:
> > Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
> > in_mutex and out_mutex to avoid concurrent accesses. Then, free the
> > socket.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 86 
> > +
> >  drivers/xen/pvcalls-front.h |  1 +
> >  2 files changed, 87 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index b6cfb7d..bd3dfac 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -174,6 +174,24 @@ static irqreturn_t pvcalls_front_conn_handler(int irq, 
> > void *sock_map)
> > return IRQ_HANDLED;
> >  }
> >  
> > +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
> > +  struct sock_mapping *map)
> > +{
> > +   int i;
> > +
> > +   spin_lock(&bedata->pvcallss_lock);
> > +   if (!list_empty(&map->list))
> > +   list_del_init(&map->list);
> > +   spin_unlock(&bedata->pvcallss_lock);
> > +
> > +   /* what if the thread waiting still need access? */
> 
> Is this handled? If not, why is it no problem?

Yes, sorry. This is a left-over from earlier versions of the code.

This scenario is handled because threads waiting will have already been
awakened by the wake_up_interruptible call in pvcalls_front_release, and
also the code is protected by both the in_mutex and out_mutex. I hadn't
introduced in_mutex and out_mutex yet when I wrote this comment, it no
longer applies.



> > +   for (i = 0; i < (1 << map->active.ring->ring_order); i++)
> > +   gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
> > +   gnttab_end_foreign_access(map->active.ref, 0, 0);
> > +   free_page((unsigned long)map->active.ring);
> > +   unbind_from_irqhandler(map->active.irq, map);
> > +}
> > +
> >  int pvcalls_front_socket(struct socket *sock)
> >  {
> > struct pvcalls_bedata *bedata;
> > @@ -805,6 +823,74 @@ unsigned int pvcalls_front_poll(struct file *file, 
> > struct socket *sock,
> > return pvcalls_front_poll_passive(file, bedata, map, wait);
> >  }
> >  
> > +int pvcalls_front_release(struct socket *sock)
> > +{
> > +   struct pvcalls_bedata *bedata;
> > +   struct sock_mapping *map;
> > +   int req_id, notify;
> > +   struct xen_pvcalls_request *req;
> > +
> > +   if (!pvcalls_front_dev)
> > +   return -EIO;
> > +   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +   if (!bedata)
> > +   return -EIO;
> > +
> > +   if (sock->sk == NULL)
> > +   return 0;
> > +
> > +   map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
> > +   if (map == NULL)
> > +   return 0;
> > +   WRITE_ONCE(sock->sk->sk_send_head, NULL);
> > +
> > +   spin_lock(&bedata->pvcallss_lock);
> > +   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +   BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING);
> > +   if (RING_FULL(&bedata->ring) ||
> > +   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
> > +   spin_unlock(&bedata->pvcallss_lock);
> > +   return -EAGAIN;
> 
> Isn't it a problem you already cleared sock->sk->sk_send_head?

Yes, you are right. It would effectively leak the socket. I'll move the
clearing of sk_send_head after this check.
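
Sketched out, the agreed reordering would look roughly like this
(illustrative only, not the final v2 hunk):

	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
	if (map == NULL)
		return 0;

	spin_lock(&bedata->pvcallss_lock);
	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
	if (RING_FULL(&bedata->ring) ||
	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
		spin_unlock(&bedata->pvcallss_lock);
		/* sk_send_head is still set, so the socket is not leaked */
		return -EAGAIN;
	}
	/* only detach the map once the request is sure to be submitted */
	WRITE_ONCE(sock->sk->sk_send_head, NULL);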



> > +   }
> > +   req = RING_GET_REQUEST(&bedata->ring, req_id);
> > +   req->req_id = req_id;
> > +   req->cmd = PVCALLS_RELEASE;
> > +   req->u.release.id = (uint64_t)sock;
> > +
> > +   bedata->ring.req_prod_pvt++;
> > +   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> > +   spin_unlock(&bedata->pvcallss_lock);
> > +   if (notify)
> > +   notify_remote_via_irq(bedata->irq);
> > +
> > +   wait_event(bedata->inflight_req,
> > +   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
> > +
> > +   if (map->active_socket) {
> > +   /* 
> > +* Set in_error and wake up inflight_conn_req to force
> > +* recvmsg waiters to exit.
> > +*/
> > +   map->active.ring->in_error = -EBADF;
> > +   wake_up_interruptible(&map->active.inflight_conn_req);
> > +
> > +   mutex_lock(&map->active.in_mutex);
> > +   mutex_lock(&map->active.out_mutex);
> > +   pvcalls_front_free_map(bedata, map);
> > +   mutex_unlock(&map->active.out_mutex);
> > +   mutex_unlock(&map->active.in_mutex);
> > +   kfree(map);
> > +   } else {
> > +   spin_lock(&bedata->pvcallss_lock);
> > +   list_del_init(&map->list);
> > +   kfree(map);
> > +   spin_unlock(&bedata->pvcallss_lock);
> > +   }
> > +   WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
> > +
> > +   return 0;
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> > { "pvcalls" },
> > { "" }
> > diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
> > index 25e05b8..3332978 

Re: [Xen-devel] [PATCH v1 10/13] xen/pvcalls: implement poll command

2017-07-25 Thread Stefano Stabellini
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:12, Stefano Stabellini wrote:
> > For active sockets, check the indexes and use the inflight_conn_req
> > waitqueue to wait.
> > 
> > For passive sockets, send PVCALLS_POLL to the backend. Use the
> > inflight_accept_req waitqueue if an accept is outstanding. Otherwise use
> > the inflight_req waitqueue: inflight_req is awaken when a new response
> > is received; on wakeup we check whether the POLL response is arrived by
> > looking at the PVCALLS_FLAG_POLL_RET flag. We set the flag from
> > pvcalls_front_event_handler, if the response was for a POLL command.
> > 
> > In pvcalls_front_event_handler, get the struct socket pointer from the
> > poll id (we previously converted struct socket* to uint64_t and used it
> > as id).
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 123 
> > 
> >  drivers/xen/pvcalls-front.h |   3 ++
> >  2 files changed, 115 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 3d1041a..b6cfb7d 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -128,17 +128,29 @@ static irqreturn_t pvcalls_front_event_handler(int 
> > irq, void *dev_id)
> > rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
> >  
> > req_id = rsp->req_id;
> > -   src = (uint8_t *)&bedata->rsp[req_id];
> > -   src += sizeof(rsp->req_id);
> > -   dst = (uint8_t *)rsp;
> > -   dst += sizeof(rsp->req_id);
> > -   memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
> > -   /*
> > -* First copy the rest of the data, then req_id. It is
> > -* paired with the barrier when accessing bedata->rsp.
> > -*/
> > -   smp_wmb();
> > -   WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
> > +   if (rsp->cmd == PVCALLS_POLL) {
> > +   struct socket *sock = (struct socket *) rsp->u.poll.id;
> > +   struct sock_mapping *map =
> > +   (struct sock_mapping *)
> > +   READ_ONCE(sock->sk->sk_send_head);
> > +
> > +   set_bit(PVCALLS_FLAG_POLL_RET,
> > +   (void *)&map->passive.flags);
> 
> Add a barrier here to make sure PVCALLS_FLAG_POLL_INFLIGHT is cleared
> _after_ setting PVCALLS_FLAG_POLL_RET?

Yes, good point, I'll add an smp_wmb() here. A barrier is unnecessary at
the other end (the beginning of pvcalls_front_poll_passive) because of
the conditional instructions creating control dependencies. I'll add a
comment.
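
For clarity, the ordering agreed here would look roughly like this (a
sketch of the plan above, not the final patch):

	/* writer: pvcalls_front_event_handler */
	set_bit(PVCALLS_FLAG_POLL_RET, (void *)&map->passive.flags);
	/* POLL_RET must be visible before POLL_INFLIGHT is cleared */
	smp_wmb();
	clear_bit(PVCALLS_FLAG_POLL_INFLIGHT, (void *)&map->passive.flags);

	/* reader: pvcalls_front_poll_passive; no explicit read barrier is
	 * needed because the conditional test_and_*_bit calls create a
	 * control dependency that orders the accesses that follow */
	if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
			       (void *)&map->passive.flags))
		return POLLIN;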


> > +   clear_bit(PVCALLS_FLAG_POLL_INFLIGHT,
> > + (void *)&map->passive.flags);
> > +   } else {
> > +   src = (uint8_t *)&bedata->rsp[req_id];
> > +   src += sizeof(rsp->req_id);
> > +   dst = (uint8_t *)rsp;
> > +   dst += sizeof(rsp->req_id);
> > +   memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
> > +   /*
> > +* First copy the rest of the data, then req_id. It is
> > +* paired with the barrier when accessing bedata->rsp.
> > +*/
> > +   smp_wmb();
> > +   WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
> > +   }
> >  
> > bedata->ring.rsp_cons++;
> > wake_up(&bedata->inflight_req);
> > @@ -704,6 +716,95 @@ int pvcalls_front_accept(struct socket *sock, struct 
> > socket *newsock, int flags)
> > return ret;
> >  }
> >  
> > +static unsigned int pvcalls_front_poll_passive(struct file *file,
> > +  struct pvcalls_bedata *bedata,
> > +  struct sock_mapping *map,
> > +  poll_table *wait)
> > +{
> > +   int notify, req_id;
> > +   struct xen_pvcalls_request *req;
> > +
> > +   if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +(void *)>passive.flags)) {
> > +   poll_wait(file, &map->passive.inflight_accept_req, wait);
> > +   return 0;
> > +   }
> > +
> > +   if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
> > +  (void *)&map->passive.flags))
> > +   return POLLIN;
> > +
> > +   if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT,
> > +(void *)&map->passive.flags)) {
> > +   poll_wait(file, &bedata->inflight_req, wait);
> > +   return 0;
> > +   }
> > +
> > +   spin_lock(&bedata->pvcallss_lock);
> > +   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +   BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING);
> > +   if (RING_FULL(&bedata->ring) ||
> > +   READ_ONCE(bedata->rsp[req_id].req_id) != 

Re: [Xen-devel] [PATCH v1 03/13] xen/pvcalls: implement socket command and handle events

2017-07-25 Thread Stefano Stabellini
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:11, Stefano Stabellini wrote:
> > Send a PVCALLS_SOCKET command to the backend, use the masked
> > req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0
> > and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array
> > ready for the response, and there cannot be two outstanding responses
> > with the same req_id.
> > 
> > Wait for the response by waiting on the inflight_req waitqueue and
> > check for the req_id field in rsp[req_id]. Use atomic accesses to
> > read the field. Once a response is received, clear the corresponding rsp
> > slot by setting req_id to PVCALLS_INVALID_ID. Note that
> > PVCALLS_INVALID_ID is invalid only from the frontend point of view. It
> > is not part of the PVCalls protocol.
> > 
> > pvcalls_front_event_handler is in charge of copying responses from the
> > ring to the appropriate rsp slot. It is done by copying the body of the
> > response first, then by copying req_id atomically. After the copies,
> > wake up anybody waiting on waitqueue.
> > 
> > pvcallss_lock protects accesses to the ring.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 91 
> > +
> >  drivers/xen/pvcalls-front.h |  8 
> >  2 files changed, 99 insertions(+)
> >  create mode 100644 drivers/xen/pvcalls-front.h
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index fb08ebf..7933c73 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> 
> Shouldn't you include pvcalls-front.h?

Yes


> > @@ -40,9 +40,100 @@ struct pvcalls_bedata {
> >  
> >  static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> >  {
> > +   struct xenbus_device *dev = dev_id;
> > +   struct pvcalls_bedata *bedata;
> > +   struct xen_pvcalls_response *rsp;
> > +   uint8_t *src, *dst;
> > +   int req_id = 0, more = 0;
> > +
> > +   if (dev == NULL)
> > +   return IRQ_HANDLED;
> > +
> > +   bedata = dev_get_drvdata(>dev);
> > +   if (bedata == NULL)
> > +   return IRQ_HANDLED;
> > +
> > +again:
> > +   while (RING_HAS_UNCONSUMED_RESPONSES(&bedata->ring)) {
> > +   rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
> > +
> > +   req_id = rsp->req_id;
> > +   src = (uint8_t *)&bedata->rsp[req_id];
> > +   src += sizeof(rsp->req_id);
> > +   dst = (uint8_t *)rsp;
> > +   dst += sizeof(rsp->req_id);
> > +   memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
> > +   /*
> > +* First copy the rest of the data, then req_id. It is
> > +* paired with the barrier when accessing bedata->rsp.
> > +*/
> > +   smp_wmb();
> > +   WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
> > +
> > +   bedata->ring.rsp_cons++;
> > +   wake_up(&bedata->inflight_req);
> > +   }
> > +
> > +   RING_FINAL_CHECK_FOR_RESPONSES(&bedata->ring, more);
> > +   if (more)
> > +   goto again;
> 
> Wouldn't it make more sense to use wake_up() just once if there is any
> response pending and do the consuming loop outside the irq handler?

You are definitely right: it's far better to call wake_up() just once
after the consuming loop if there is any response pending. I'll do that.

However, I am not sure there is much to gain in moving the consuming
loop out of the irq handler: it's pretty short and doesn't call any
long-running or sleeping functions.
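
The single-wakeup shape being agreed on here looks roughly like the
following, and is indeed what the v2 event handler earlier in this digest
ends up doing:

	int done = 0;

again:
	while (RING_HAS_UNCONSUMED_RESPONSES(&bedata->ring)) {
		/* ... copy the response into bedata->rsp[req_id] ... */
		done = 1;
		bedata->ring.rsp_cons++;
	}

	RING_FINAL_CHECK_FOR_RESPONSES(&bedata->ring, more);
	if (more)
		goto again;
	if (done)
		wake_up(&bedata->inflight_req); /* one wakeup per batch */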



 
> > return IRQ_HANDLED;
> >  }
> >  
> > +int pvcalls_front_socket(struct socket *sock)
> > +{
> > +   struct pvcalls_bedata *bedata;
> > +   struct xen_pvcalls_request *req;
> > +   int notify, req_id, ret;
> > +
> > +   if (!pvcalls_front_dev)
> > +   return -EACCES;
> > +   /*
> > +* PVCalls only supports domain AF_INET,
> > +* type SOCK_STREAM and protocol 0 sockets for now.
> > +*
> > +* Check socket type here, AF_INET and protocol checks are done
> > +* by the caller.
> > +*/
> > +   if (sock->type != SOCK_STREAM)
> > +   return -ENOTSUPP;
> > +
> > +   bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +
> > +   spin_lock(&bedata->pvcallss_lock);
> > +   req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +   BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING);
> > +   if (RING_FULL(&bedata->ring) ||
> > +   READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
> > +   spin_unlock(&bedata->pvcallss_lock);
> > +   return -EAGAIN;
> > +   }
> > +   req = RING_GET_REQUEST(&bedata->ring, req_id);
> > +   req->req_id = req_id;
> > +   req->cmd = PVCALLS_SOCKET;
> > +   req->u.socket.id = (uint64_t) sock;
> > +   req->u.socket.domain = AF_INET;
> > +   req->u.socket.type = SOCK_STREAM;
> > +   req->u.socket.protocol = 0;
> > +
> > +   bedata->ring.req_prod_pvt++;
> > +   RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> > +   

Re: [Xen-devel] [PATCH v1 02/13] xen/pvcalls: connect to the backend

2017-07-25 Thread Stefano Stabellini
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:11, Stefano Stabellini wrote:
> > Implement the probe function for the pvcalls frontend. Read the
> > supported versions, max-page-order and function-calls nodes from
> > xenstore.
> > 
> > Introduce a data structure named pvcalls_bedata. It contains pointers to
> > the command ring, the event channel, a list of active sockets and a list
> > of passive sockets. Lists accesses are protected by a spin_lock.
> > 
> > Introduce a waitqueue to allow waiting for a response on commands sent
> > to the backend.
> > 
> > Introduce an array of struct xen_pvcalls_response to store commands
> > responses.
> > 
> > Only one frontend<->backend connection is supported at any given time
> > for a guest. Store the active frontend device to a static pointer.
> > 
> > Introduce a stub functions for the event handler.
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 153 
> > 
> >  1 file changed, 153 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 173e204..fb08ebf 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -20,6 +20,29 @@
> >  #include 
> >  #include 
> >  
> > +#define PVCALLS_INVALID_ID (UINT_MAX)
> > +#define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> > +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, 
> > XEN_PAGE_SIZE)
> > +
> > +struct pvcalls_bedata {
> > +   struct xen_pvcalls_front_ring ring;
> > +   grant_ref_t ref;
> > +   int irq;
> > +
> > +   struct list_head socket_mappings;
> > +   struct list_head socketpass_mappings;
> > +   spinlock_t pvcallss_lock;
> > +
> > +   wait_queue_head_t inflight_req;
> > +   struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
> > +};
> > +struct xenbus_device *pvcalls_front_dev;
> > +
> > +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> > +{
> > +   return IRQ_HANDLED;
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> > { "pvcalls" },
> > { "" }
> > @@ -33,7 +56,114 @@ static int pvcalls_front_remove(struct xenbus_device 
> > *dev)
> >  static int pvcalls_front_probe(struct xenbus_device *dev,
> >   const struct xenbus_device_id *id)
> >  {
> > +   int ret = -EFAULT, evtchn, ref = -1, i;
> > +   unsigned int max_page_order, function_calls, len;
> > +   char *versions;
> > +   grant_ref_t gref_head = 0;
> > +   struct xenbus_transaction xbt;
> > +   struct pvcalls_bedata *bedata = NULL;
> > +   struct xen_pvcalls_sring *sring;
> > +
> > +   if (pvcalls_front_dev != NULL) {
> > +   dev_err(&dev->dev, "only one PV Calls connection supported\n");
> > +   return -EINVAL;
> > +   }
> > +
> > +   versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
> > +   if (!len)
> > +   return -EINVAL;
> > +   if (strcmp(versions, "1")) {
> > +   kfree(versions);
> > +   return -EINVAL;
> > +   }
> > +   kfree(versions);
> > +   ret = xenbus_scanf(XBT_NIL, dev->otherend,
> > +  "max-page-order", "%u", &max_page_order);
> 
> Use xenbus_read_unsigned() instead?

OK


> > +   if (ret <= 0)
> > +   return -ENODEV;
> > +   if (max_page_order < RING_ORDER)
> > +   return -ENODEV;
> > +   ret = xenbus_scanf(XBT_NIL, dev->otherend,
> > +  "function-calls", "%u", &function_calls);
> 
> xenbus_read_unsigned() again?

OK
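
For reference, the suggested conversion would look roughly like this (a
sketch only; note that the v2 posting earlier in this digest still uses
xenbus_scanf):

	/* 0 is out of range for both nodes, so a missing entry
	 * falls through to the -ENODEV checks */
	max_page_order = xenbus_read_unsigned(dev->otherend,
					      "max-page-order", 0);
	if (max_page_order < RING_ORDER)
		return -ENODEV;

	function_calls = xenbus_read_unsigned(dev->otherend,
					      "function-calls", 0);
	if (function_calls != 1)
		return -ENODEV;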


> > +   if (ret <= 0 || function_calls != 1)
> > +   return -ENODEV;
> > +   pr_info("%s max-page-order is %u\n", __func__, max_page_order);
> > +
> > +   bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL);
> > +   if (!bedata)
> > +   return -ENOMEM;
> > +
> 
> You should call dev_set_drvdata() here already, otherwise entering the
> error path will dereference a NULL pointer instead of bedata.

OK


> > +   init_waitqueue_head(&bedata->inflight_req);
> > +   for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++)
> > +   bedata->rsp[i].req_id = PVCALLS_INVALID_ID;
> > +
> > +   sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL |
> > +__GFP_ZERO);
> > +   if (!sring)
> > +   goto error;
> 
> ret will be 1 here. Shouldn't you set it to -ENOMEM?

Yes, I'll do that


> 
> > +   SHARED_RING_INIT(sring);
> > +   FRONT_RING_INIT(&bedata->ring, sring, XEN_PAGE_SIZE);
> > +
> > +   ret = xenbus_alloc_evtchn(dev, &evtchn);
> > +   if (ret)
> > +   goto error;
> > +
> > +   bedata->irq = bind_evtchn_to_irqhandler(evtchn,
> > +   pvcalls_front_event_handler,
> > +   0, "pvcalls-frontend", dev);
> > +   if (bedata->irq < 0) {
> > +   ret = bedata->irq;
> > +   goto error;
> > +   }
> > +
> > +   ret = 

[Xen-devel] [libvirt test] 112276: tolerable all pass - PUSHED

2017-07-25 Thread osstest service owner
flight 112276 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112276/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 112201
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-checkfail  like 112201
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 112201
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-qcow2 12 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 13 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass

version targeted for testing:
 libvirt  f7237d63e8f02f3689f9b63b413fae7d4221faa9
baseline version:
 libvirt  f36f2e463f877292ee1f30dcb1000337739c2fd3

Last test of basis   112201  2017-07-23 04:21:57 Z2 days
Testing same since   112276  2017-07-25 04:21:09 Z0 days1 attempts


People who touched revisions under test:
  John Ferlan 
  Martin Kletzander 
  Michal Privoznik 
  Peter Krempa 
  Shivaprasad G Bhat 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master

[Xen-devel] [PATCH v2] hvmloader, libxl: use the correct ACPI settings depending on device model

2017-07-25 Thread Igor Druzhinin
We need to choose ACPI tables and ACPI IO port location
properly depending on the device model version we are running.
Previously, this decision was made by BIOS type specific
code in hvmloader, e.g. always load QEMU traditional specific
tables if it's ROMBIOS and always load QEMU Xen specific
tables if it's SeaBIOS.

This change preserves that behavior but adds an additional way
(a xenstore key) to specify the correct device model if we
happen to run a non-default one. The toolstack side makes use of it.

Signed-off-by: Igor Druzhinin 
Reviewed-by: Paul Durrant 
---
Changes in v2:
* fix insufficient allocation size of localent
---
 tools/firmware/hvmloader/hvmloader.c |  2 --
 tools/firmware/hvmloader/ovmf.c  |  2 ++
 tools/firmware/hvmloader/rombios.c   |  2 ++
 tools/firmware/hvmloader/seabios.c   |  3 +++
 tools/firmware/hvmloader/util.c  | 24 
 tools/libxl/libxl_create.c   |  4 +++-
 6 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index f603f68..db11ab1 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -405,8 +405,6 @@ int main(void)
 }
 
 acpi_enable_sci();
-
-hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
 }
 
 init_vm86_tss();
diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..ebadc64 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -127,6 +127,8 @@ static void ovmf_acpi_build_tables(void)
 .dsdt_15cpu_len = 0
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
+hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/rombios.c 
b/tools/firmware/hvmloader/rombios.c
index 56b39b7..31a7c65 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -181,6 +181,8 @@ static void rombios_acpi_build_tables(void)
 .dsdt_15cpu_len = dsdt_15cpu_len,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+
+hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }
 
diff --git a/tools/firmware/hvmloader/seabios.c 
b/tools/firmware/hvmloader/seabios.c
index 870576a..5878eff 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 
 extern unsigned char dsdt_anycpu_qemu_xen[];
 extern int dsdt_anycpu_qemu_xen_len;
@@ -99,6 +100,8 @@ static void seabios_acpi_build_tables(void)
 .dsdt_15cpu_len = 0,
 };
 
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
+hvmloader_acpi_build_tables(&config, rsdp);
 add_table(rsdp);
 }
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..45b777c 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -31,6 +31,9 @@
 #include 
 #include 
 
+extern unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+
 /*
  * Check whether there exists overlap in the specified memory range.
  * Returns true if exists, else returns false.
@@ -897,6 +900,27 @@ void hvmloader_acpi_build_tables(struct acpi_config 
*config,
 /* Allocate and initialise the acpi info area. */
 mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+/* If the device model is specified switch to the corresponding tables */
+s = xenstore_read("platform/device-model", "");
+if ( !strncmp(s, "qemu_xen_traditional", 21) )
+{
+config->dsdt_anycpu = dsdt_anycpu;
+config->dsdt_anycpu_len = dsdt_anycpu_len;
+config->dsdt_15cpu = dsdt_15cpu;
+config->dsdt_15cpu_len = dsdt_15cpu_len;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+}
+else if ( !strncmp(s, "qemu_xen", 9) )
+{
+config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+config->dsdt_15cpu = NULL;
+config->dsdt_15cpu_len = 0;
+
+hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+}
+
 config->lapic_base_address = LAPIC_BASE_ADDRESS;
 config->lapic_id = acpi_lapic_id;
 config->ioapic_base_address = ioapic_base_address;
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..1d24209 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
 vments[4] = "start_time";
 vments[5] = GCSPRINTF("%lu.%02d", 
start_time.tv_sec,(int)start_time.tv_usec/10000);
 
-localents = libxl__calloc(gc, 11, sizeof(char *));
+localents = libxl__calloc(gc, 13, sizeof(char *));
 i = 0;
 localents[i++] = "platform/acpi";
 localents[i++] = 

Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Paolo Bonzini

> > Thanks---however, after re-reading xen-mapcache.c, dma needs to be false
> > for unlocked mappings.
> 
> If there is a DMA operation already in progress, it means that we'll
> already have a locked mapping for it.

Yes, I only wanted to say that qemu_ram_ptr_length should pass dma=false
when called by address_space_*_continue (i.e. with locked=false).

Paolo

> When address_space_write_continue is called, which in turn would call
> qemu_map_ram_ptr, or qemu_ram_ptr_length(unlocked), if the start and
> size of the requested mapping matches the one of the previously created
> locked mapping, then a pointer to the locked mapping will be returned.
> 
> If they don't match, a new unlocked mapping will be created and a
> pointer to it will be returned. (Arguably the algorithm could be
> improved so that a new mapping is not created if the address and size
> are contained within the locked mapping. This is a missing optimization
> today.)
> 
> It doesn't matter if a new unlocked mapping is created, or if the locked
> mapping is returned, because the pointer returned by
> qemu_ram_ptr_length(unlocked) is only used to do the memcpy, and never
> again. So I don't think this is a problem.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4.12 047/196] xen/scsiback: Fix a TMR related use-after-free

2017-07-25 Thread Greg Kroah-Hartman
4.12-stable review patch.  If anyone has any objections, please let me know.

--

From: Bart Van Assche 

commit 9f4ab18ac51dc87345a9cbd2527e6acf7a0a9335 upstream.

scsiback_release_cmd() must not dereference se_cmd->se_tmr_req
because that memory is freed by target_free_cmd_mem() before
scsiback_release_cmd() is called. Fix this use-after-free by
inlining struct scsiback_tmr into struct vscsibk_pend.

Signed-off-by: Bart Van Assche 
Reviewed-by: Juergen Gross 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: David Disseldorp 
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/xen/xen-scsiback.c |   33 +
 1 file changed, 9 insertions(+), 24 deletions(-)

--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -134,9 +134,7 @@ struct vscsibk_pend {
struct page *pages[VSCSI_MAX_GRANTS];
 
struct se_cmd se_cmd;
-};
 
-struct scsiback_tmr {
atomic_t tmr_complete;
wait_queue_head_t tmr_wait;
 };
@@ -599,26 +597,20 @@ static void scsiback_device_action(struc
struct scsiback_tpg *tpg = pending_req->v2p->tpg;
struct scsiback_nexus *nexus = tpg->tpg_nexus;
struct se_cmd *se_cmd = &pending_req->se_cmd;
-   struct scsiback_tmr *tmr;
u64 unpacked_lun = pending_req->v2p->lun;
int rc, err = FAILED;
 
-   tmr = kzalloc(sizeof(struct scsiback_tmr), GFP_KERNEL);
-   if (!tmr) {
-   target_put_sess_cmd(se_cmd);
-   goto err;
-   }
-
-   init_waitqueue_head(&tmr->tmr_wait);
+   init_waitqueue_head(&pending_req->tmr_wait);
 
rc = target_submit_tmr(&pending_req->se_cmd, nexus->tvn_se_sess,
   &pending_req->sense_buffer[0],
-  unpacked_lun, tmr, act, GFP_KERNEL,
+  unpacked_lun, NULL, act, GFP_KERNEL,
   tag, TARGET_SCF_ACK_KREF);
if (rc)
goto err;
 
-   wait_event(tmr->tmr_wait, atomic_read(&tmr->tmr_complete));
+   wait_event(pending_req->tmr_wait,
+  atomic_read(&pending_req->tmr_complete));
 
err = (se_cmd->se_tmr_req->response == TMR_FUNCTION_COMPLETE) ?
SUCCESS : FAILED;
@@ -626,9 +618,8 @@ static void scsiback_device_action(struc
scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
transport_generic_free_cmd(&pending_req->se_cmd, 1);
return;
+
 err:
-   if (tmr)
-   kfree(tmr);
scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
 }
 
@@ -1389,12 +1380,6 @@ static int scsiback_check_stop_free(stru
 static void scsiback_release_cmd(struct se_cmd *se_cmd)
 {
struct se_session *se_sess = se_cmd->se_sess;
-   struct se_tmr_req *se_tmr = se_cmd->se_tmr_req;
-
-   if (se_tmr && se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
-   struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr;
-   kfree(tmr);
-   }
 
percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
 }
@@ -1455,11 +1440,11 @@ static int scsiback_queue_status(struct
 
 static void scsiback_queue_tm_rsp(struct se_cmd *se_cmd)
 {
-   struct se_tmr_req *se_tmr = se_cmd->se_tmr_req;
-   struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr;
+   struct vscsibk_pend *pending_req = container_of(se_cmd,
+   struct vscsibk_pend, se_cmd);
 
-   atomic_set(&tmr->tmr_complete, 1);
-   wake_up(&tmr->tmr_wait);
+   atomic_set(&pending_req->tmr_complete, 1);
+   wake_up(&pending_req->tmr_wait);
 }
 
 static void scsiback_aborted_task(struct se_cmd *se_cmd)



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4.9 037/125] xen/scsiback: Fix a TMR related use-after-free

2017-07-25 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Bart Van Assche 

commit 9f4ab18ac51dc87345a9cbd2527e6acf7a0a9335 upstream.

scsiback_release_cmd() must not dereference se_cmd->se_tmr_req
because that memory is freed by target_free_cmd_mem() before
scsiback_release_cmd() is called. Fix this use-after-free by
inlining struct scsiback_tmr into struct vscsibk_pend.

Signed-off-by: Bart Van Assche 
Reviewed-by: Juergen Gross 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: David Disseldorp 
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/xen/xen-scsiback.c |   33 +
 1 file changed, 9 insertions(+), 24 deletions(-)

--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -134,9 +134,7 @@ struct vscsibk_pend {
struct page *pages[VSCSI_MAX_GRANTS];
 
struct se_cmd se_cmd;
-};
 
-struct scsiback_tmr {
atomic_t tmr_complete;
wait_queue_head_t tmr_wait;
 };
@@ -599,26 +597,20 @@ static void scsiback_device_action(struc
struct scsiback_tpg *tpg = pending_req->v2p->tpg;
struct scsiback_nexus *nexus = tpg->tpg_nexus;
struct se_cmd *se_cmd = &pending_req->se_cmd;
-   struct scsiback_tmr *tmr;
u64 unpacked_lun = pending_req->v2p->lun;
int rc, err = FAILED;
 
-   tmr = kzalloc(sizeof(struct scsiback_tmr), GFP_KERNEL);
-   if (!tmr) {
-   target_put_sess_cmd(se_cmd);
-   goto err;
-   }
-
-   init_waitqueue_head(&tmr->tmr_wait);
+   init_waitqueue_head(&pending_req->tmr_wait);
 
rc = target_submit_tmr(&pending_req->se_cmd, nexus->tvn_se_sess,
   &pending_req->sense_buffer[0],
-  unpacked_lun, tmr, act, GFP_KERNEL,
+  unpacked_lun, NULL, act, GFP_KERNEL,
   tag, TARGET_SCF_ACK_KREF);
if (rc)
goto err;
 
-   wait_event(tmr->tmr_wait, atomic_read(&tmr->tmr_complete));
+   wait_event(pending_req->tmr_wait,
+  atomic_read(&pending_req->tmr_complete));
 
err = (se_cmd->se_tmr_req->response == TMR_FUNCTION_COMPLETE) ?
SUCCESS : FAILED;
@@ -626,9 +618,8 @@ static void scsiback_device_action(struc
scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
transport_generic_free_cmd(&pending_req->se_cmd, 1);
return;
+
 err:
-   if (tmr)
-   kfree(tmr);
scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
 }
 
@@ -1389,12 +1380,6 @@ static int scsiback_check_stop_free(stru
 static void scsiback_release_cmd(struct se_cmd *se_cmd)
 {
struct se_session *se_sess = se_cmd->se_sess;
-   struct se_tmr_req *se_tmr = se_cmd->se_tmr_req;
-
-   if (se_tmr && se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
-   struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr;
-   kfree(tmr);
-   }
 
percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
 }
@@ -1455,11 +1440,11 @@ static int scsiback_queue_status(struct
 
 static void scsiback_queue_tm_rsp(struct se_cmd *se_cmd)
 {
-   struct se_tmr_req *se_tmr = se_cmd->se_tmr_req;
-   struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr;
+   struct vscsibk_pend *pending_req = container_of(se_cmd,
+   struct vscsibk_pend, se_cmd);
 
-   atomic_set(&tmr->tmr_complete, 1);
-   wake_up(&tmr->tmr_wait);
+   atomic_set(&pending_req->tmr_complete, 1);
+   wake_up(&pending_req->tmr_wait);
 }
 
 static void scsiback_aborted_task(struct se_cmd *se_cmd)



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Julien Grall wrote:
> On 25/07/17 19:48, Stefano Stabellini wrote:
> > On Tue, 25 Jul 2017, Julien Grall wrote:
> > > On 25/07/17 07:47, Vijay Kilari wrote:
> > > > > >  void numa_failed(void)
> > > > > >  {
> > > > > >  numa_off = true;
> > > > > >  init_dt_numa_distance();
> > > > > >  node_distance_fn = NULL;
> > > > > > +init_cpu_to_node();
> > > > > > +}
> > > > > > +
> > > > > > +void __init numa_set_cpu_node(int cpu, unsigned int nid)
> > > > > > +{
> > > > > > +if ( !node_isset(nid, processor_nodes_parsed) || nid >=
> > > > > > MAX_NUMNODES
> > > > > > )
> > > > > > +nid = 0;
> > > > > 
> > > > > 
> > > > > This looks wrong to me. If the node-id is invalid, why would you
> > > > > blindly
> > > > > set
> > > > > to 0?
> > > > 
> > > > Generally this check will not pass. I will make this function return
> > > > error code in case
> > > > of wrong nid.
> > > 
> > > I don't really want to see error code and error handling everywhere in the
> > > initialization code. I would assume that if the NUMA bindings are wrong we
> > > should just crash Xen rather continuing with NUMA disabled.
> > > 
> > > Stefano do you have any opinion here?
> > 
> > Yes, I noticed that there is an overabundance of error checks in the
> > patches. I have pointed out in other cases that some of these checks are
> > duplicates.
> > 
> > I am OK with some checks but we should not do the same check over and
> > over.
> > 
> > To answer the question: do we need any checks at all?
> > 
> > I am fine with no checks on the device tree or ACPI bindings themselves.
> > I am also OK with some checks in few places to check that the
> > information passed by the firmware is in the right shape (for example we
> > check for the ACPI header length before accessing any ACPI tables). That
> > is good. But I am not OK with repeating the same check multiple times
> > uselessly or checking for conditions that cannot happen (like a NULL
> > pointer in the ACPI header case again).
> 
> I would prefer to keep the check on the DT bindings and ACPI bindings. I hit
> some problems in the past that were quite annoying to debug without them.
> 
> But I was wondering if we should just panic/BUG_ON directly. Rather than
> returning an error.

I think BUG_ON is fine, but it would be best if we also printed a
useful message before crashing Xen. At least the user would know that
the problem is a broken device_tree/ACPI.


> For any redundant check, we could just turn to an ASSERT.
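
Putting the two suggestions together, the check might end up looking
something like this (illustrative only; the exact message text is made up):

if ( nid >= MAX_NUMNODES || !node_isset(nid, processor_nodes_parsed) )
{
    printk(XENLOG_ERR
           "NUMA: invalid node %u for CPU %d, check DT/ACPI tables\n",
           nid, cpu);
    BUG();
}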

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Paolo Bonzini wrote:
> - Original Message -
> > From: "Stefano Stabellini" 
> > To: "Paolo Bonzini" 
> > Cc: "Anthony PERARD" , "Stefano Stabellini" 
> > ,
> > xen-devel@lists.xen.org, qemu-de...@nongnu.org
> > Sent: Tuesday, July 25, 2017 8:08:21 PM
> > Subject: Re: QEMU commit 04bf2526ce breaks use of xen-mapcache
> > 
> > On Tue, 25 Jul 2017, Paolo Bonzini wrote:
> > > > Hi,
> > > > 
> > > > Commits 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
> > > > start using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
> > > > That result in calling xen_map_cache() with lock=true, but this mapping
> > > > is never invalidated.
> > > > So QEMU use more and more RAM until it stop working for a reason or an
> > > > other. (crash if host have little RAM or stop emulating but no crash)
> > > > 
> > > > I don't know if calling xen_invalidate_map_cache_entry() in
> > > > address_space_read_continue() and address_space_write_continue() is the
> > > > right answer.  Is there something better to do ?
> > > 
> > > I think it's correct for dma to be true... maybe add a lock argument to
> > > qemu_ram_ptr_length, so that make address_space_{read,write}_continue can
> > > pass 0 and everyone else passes 1?
> > 
> > I think that is a great suggestion. That way, the difference between
> > locked mappings and unlocked mappings would be explicit, rather than
> > relying on callers to use qemu_map_ram_ptr for unlocked mappings and
> > qemu_ram_ptr_length for locked mappings. And there aren't that many
> > callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an
> > issue.
> 
> Thanks---however, after re-reading xen-mapcache.c, dma needs to be false
> for unlocked mappings.

If there is a DMA operation already in progress, it means that we'll
already have a locked mapping for it.

When address_space_write_continue is called, which in turn would call
qemu_map_ram_ptr, or qemu_ram_ptr_length(unlocked), if the start and
size of the requested mapping matches the one of the previously created
locked mapping, then a pointer to the locked mapping will be returned.

If they don't match, a new unlocked mapping will be created and a
pointer to it will be returned. (Arguably the algorithm could be
improved so that a new mapping is not created if the address and size
are contained within the locked mapping. This is a missing optimization
today.)

It doesn't matter if a new unlocked mapping is created, or if the locked
mapping is returned, because the pointer returned by
qemu_ram_ptr_length(unlocked) is only used to do the memcpy, and never
again. So I don't think this is a problem.
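
Concretely, Paolo's suggestion amounts to something like the following
(hypothetical signature change, not committed code):

/* proposed: let callers say whether the mapping must stay locked */
static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
                                 hwaddr *size, bool lock);

/* transient copies in address_space_{read,write}_continue */
ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);

/* everyone else keeps today's locked behaviour */
ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, true);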

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86/hvm: Fix boundary check in hvmemul_insn_fetch()

2017-07-25 Thread Andrew Cooper
c/s 0943a03037 added some extra protection for overflowing the emulation
instruction cache, but Coverity points out that the boundary condition is off by
one when memcpy()'ing out of the buffer.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Paul Durrant 
---
 xen/arch/x86/hvm/emulate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 495e312..52bed04 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -958,8 +958,8 @@ int hvmemul_insn_fetch(
  * Will we overflow insn_buf[]?  This shouldn't be able to happen,
  * which means something went wrong with instruction decoding...
  */
-if ( insn_off > sizeof(hvmemul_ctxt->insn_buf) ||
- (insn_off + bytes) > sizeof(hvmemul_ctxt->insn_buf) )
+if ( insn_off >= sizeof(hvmemul_ctxt->insn_buf) ||
+ (insn_off + bytes) >= sizeof(hvmemul_ctxt->insn_buf) )
 {
 ASSERT_UNREACHABLE();
 return X86EMUL_UNHANDLEABLE;
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86/svm: Drop svm_segment_register_t

2017-07-25 Thread Boris Ostrovsky
On 07/25/2017 02:28 PM, Andrew Cooper wrote:
> Most SVM code already uses struct segment_register.  Drop the typedef and
> adjust the definitions in struct vmcb_struct, and svm_dump_sel().  Introduce
> some build-time assertions that struct segment_register from the common
> emulation code is usable in struct vmcb_struct.
>
> While making these adjustments, fix some comments to not mix decimal and
> hexadecimal offsets, and drop all trailing whitespace in vmcb.h.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 
> ---

Reviewed-by: Boris Ostrovsky 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support

2017-07-25 Thread Julien Grall



On 25/07/17 19:48, Stefano Stabellini wrote:
> On Tue, 25 Jul 2017, Julien Grall wrote:
> > On 25/07/17 07:47, Vijay Kilari wrote:
> > > > >  void numa_failed(void)
> > > > >  {
> > > > >  numa_off = true;
> > > > >  init_dt_numa_distance();
> > > > >  node_distance_fn = NULL;
> > > > > +init_cpu_to_node();
> > > > > +}
> > > > > +
> > > > > +void __init numa_set_cpu_node(int cpu, unsigned int nid)
> > > > > +{
> > > > > +if ( !node_isset(nid, processor_nodes_parsed) || nid >= MAX_NUMNODES )
> > > > > +nid = 0;
> > > > 
> > > > This looks wrong to me. If the node-id is invalid, why would you
> > > > blindly set it to 0?
> > > 
> > > Generally this check will not pass. I will make this function return
> > > an error code in case of a wrong nid.
> > 
> > I don't really want to see error codes and error handling everywhere in
> > the initialization code. I would assume that if the NUMA bindings are
> > wrong we should just crash Xen rather than continuing with NUMA disabled.
> > 
> > Stefano do you have any opinion here?
> 
> Yes, I noticed that there is an overabundance of error checks in the
> patches. I have pointed out in other cases that some of these checks are
> duplicates.
> 
> I am OK with some checks but we should not do the same check over and
> over.
> 
> To answer the question: do we need any checks at all?
> 
> I am fine with no checks on the device tree or ACPI bindings themselves.
> I am also OK with some checks in a few places to check that the
> information passed by the firmware is in the right shape (for example we
> check for the ACPI header length before accessing any ACPI tables). That
> is good. But I am not OK with repeating the same check multiple times
> uselessly or checking for conditions that cannot happen (like a NULL
> pointer in the ACPI header case again).


I would prefer to keep the checks on the DT bindings and ACPI bindings. I
hit some problems in the past that were quite annoying to debug without
them.

But I was wondering if we should just panic/BUG_ON directly, rather than
returning an error.

For any redundant check, we could just turn it into an ASSERT.
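
For illustration only, the two styles being weighed might look like this
(the condition is from the patch; the panic message is made up):

/* Fail loudly on firmware that hands us broken NUMA bindings... */
if ( !node_isset(nid, processor_nodes_parsed) || nid >= MAX_NUMNODES )
    panic("NUMA: invalid node %u for CPU %d\n", nid, cpu);

/* ...and demote checks that merely re-verify already-validated data. */
ASSERT(node_isset(nid, processor_nodes_parsed));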

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Julien Grall wrote:
> On 25/07/17 07:47, Vijay Kilari wrote:
> > > >  void numa_failed(void)
> > > >  {
> > > >  numa_off = true;
> > > >  init_dt_numa_distance();
> > > >  node_distance_fn = NULL;
> > > > +init_cpu_to_node();
> > > > +}
> > > > +
> > > > +void __init numa_set_cpu_node(int cpu, unsigned int nid)
> > > > +{
> > > > +if ( !node_isset(nid, processor_nodes_parsed) || nid >= MAX_NUMNODES )
> > > > +nid = 0;
> > > 
> > > This looks wrong to me. If the node-id is invalid, why would you
> > > blindly set it to 0?
> > 
> > Generally this check will not pass. I will make this function return
> > an error code in case of a wrong nid.
> 
> I don't really want to see error codes and error handling everywhere in
> the initialization code. I would assume that if the NUMA bindings are
> wrong we should just crash Xen rather than continuing with NUMA disabled.
> 
> Stefano do you have any opinion here?

Yes, I noticed that there is an overabundance of error checks in the
patches. I have pointed out in other cases that some of these checks are
duplicates.

I am OK with some checks but we should not do the same check over and
over.

To answer the question: do we need any checks at all?

I am fine with no checks on the device tree or ACPI bindings themselves.
I am also OK with some checks in a few places to check that the
information passed by the firmware is in the right shape (for example we
check for the ACPI header length before accessing any ACPI tables). That
is good. But I am not OK with repeating the same check multiple times
uselessly or checking for conditions that cannot happen (like a NULL
pointer in the ACPI header case again).

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Paolo Bonzini


- Original Message -
> From: "Stefano Stabellini" 
> To: "Paolo Bonzini" 
> Cc: "Anthony PERARD" , "Stefano Stabellini" 
> ,
> xen-devel@lists.xen.org, qemu-de...@nongnu.org
> Sent: Tuesday, July 25, 2017 8:08:21 PM
> Subject: Re: QEMU commit 04bf2526ce breaks use of xen-mapcache
> 
> On Tue, 25 Jul 2017, Paolo Bonzini wrote:
> > > Hi,
> > > 
> > > Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
> > > started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
> > > That results in calling xen_map_cache() with lock=true, but this mapping
> > > is never invalidated.
> > > So QEMU uses more and more RAM until it stops working for one reason or
> > > another (a crash if the host has little RAM, or it stops emulating but
> > > without crashing).
> > > 
> > > I don't know if calling xen_invalidate_map_cache_entry() in
> > > address_space_read_continue() and address_space_write_continue() is the
> > > right answer.  Is there something better to do?
> > 
> > I think it's correct for dma to be true... maybe add a lock argument to
> > qemu_ram_ptr_length, so that make address_space_{read,write}_continue can
> > pass 0 and everyone else passes 1?
> 
> I think that is a great suggestion. That way, the difference between
> locked mappings and unlocked mappings would be explicit, rather than
> relying on callers to use qemu_map_ram_ptr for unlocked mappings and
> qemu_ram_ptr_length for locked mappings. And there aren't that many
> callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an
> issue.

Thanks---however, after re-reading xen-mapcache.c, dma needs to be false
for unlocked mappings.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support

2017-07-25 Thread Julien Grall



On 25/07/17 07:47, Vijay Kilari wrote:
> > >  void numa_failed(void)
> > >  {
> > >  numa_off = true;
> > >  init_dt_numa_distance();
> > >  node_distance_fn = NULL;
> > > +init_cpu_to_node();
> > > +}
> > > +
> > > +void __init numa_set_cpu_node(int cpu, unsigned int nid)
> > > +{
> > > +if ( !node_isset(nid, processor_nodes_parsed) || nid >= MAX_NUMNODES )
> > > +nid = 0;
> > 
> > This looks wrong to me. If the node-id is invalid, why would you
> > blindly set it to 0?
> 
> Generally this check will not pass. I will make this function return
> an error code in case of a wrong nid.

I don't really want to see error codes and error handling everywhere in
the initialization code. I would assume that if the NUMA bindings are
wrong we should just crash Xen rather than continuing with NUMA disabled.

Stefano do you have any opinion here?

[...]

> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -106,6 +107,7 @@ static void __init dt_smp_init_cpus(void)
> > >  [0 ... NR_CPUS - 1] = MPIDR_INVALID
> > >  };
> > >  bool_t bootcpu_valid = 0;
> > > +nodeid_t *cpu_to_nodemap;
> > >  int rc;
> > > 
> > >  mpidr = boot_cpu_data.mpidr.bits & MPIDR_HWID_MASK;
> > > @@ -117,11 +119,18 @@ static void __init dt_smp_init_cpus(void)
> > >  return;
> > >  }
> > > 
> > > +cpu_to_nodemap = xzalloc_array(nodeid_t, NR_CPUS);
> > 
> > Why do you need to allocate cpu_to_nodemap? Would it not be easier to
> > put it on the stack as we do for other variables?
> 
> This array holds the nodemap indexed by cpuid once for all the cpus.
> Later while setting the logical cpu id mapping, the node mapping is set
> by calling numa_set_cpu_node().

This does not answer the question... Please read it again and explain why
you can't do:

nodeid_t cpu_to_nodemap[NR_CPUS];

> > > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > > index d1dc83a..0d3146c 100644
> > > --- a/xen/include/asm-arm/numa.h
> > > +++ b/xen/include/asm-arm/numa.h
> > > @@ -10,12 +10,19 @@ void init_dt_numa_distance(void);
> > >  #ifdef CONFIG_NUMA
> > >  void numa_init(void);
> > >  int dt_numa_init(void);
> > > +void numa_set_cpu_node(int cpu, unsigned int nid);
> > > +
> > >  #else
> > >  static inline void numa_init(void)
> > >  {
> > >  return;
> > >  }
> > > 
> > > +static inline void numa_set_cpu_node(int cpu, unsigned int nid)
> > > +{
> > > +return;
> > > +}
> > > +
> > >  /* Fake one node for now. See also node_online_map. */
> > >  #define cpu_to_node(cpu) 0
> > >  #define node_to_cpumask(node)   (cpu_online_map)
> > > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > > index ca0a2a6..fc4747f 100644
> > > --- a/xen/include/asm-x86/numa.h
> > > +++ b/xen/include/asm-x86/numa.h
> > > @@ -15,7 +15,6 @@ extern nodeid_t acpi_setup_node(unsigned int pxm);
> > >  extern void srat_detect_node(int cpu);
> > > 
> > >  extern nodeid_t apicid_to_node[];
> > > -extern void init_cpu_to_node(void);
> > > 
> > >  void srat_parse_regions(paddr_t addr);
> > >  unsigned int arch_get_dma_bitsize(void);
> > > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > > index 10ef4c4..8a306e7 100644
> > > --- a/xen/include/xen/numa.h
> > > +++ b/xen/include/xen/numa.h
> > > @@ -30,6 +30,7 @@ extern s8 acpi_numa;
> > >  void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
> > >  int srat_disabled(void);
> > >  int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node);
> > > +void init_cpu_to_node(void);
> > 
> > You never used this function in common code. So why did you move it in
> > the common headers?
> 
> Same was defined for x86 as well. So I have moved it to the common
> header file.

You should make common only functions that will be called from common
code or are part of common code.

In this particular case, the name might be the same but the behavior is
completely different and the way to use it too...


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/3] x86/hvm: Rearrange check_segment() to use a switch statement

2017-07-25 Thread Andrew Cooper
This simplifies the logic by separating the x86_segment check from the type
check.  No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/hvm/domain.c | 43 +--
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index dca7a00..293956c 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -70,23 +70,38 @@ static int check_segment(struct segment_register *reg, enum x86_segment seg)
 return -EINVAL;
 }
 
-if ( seg == x86_seg_cs && !(reg->attr.fields.type & 0x8) )
+switch ( seg )
 {
-gprintk(XENLOG_ERR, "Non-code segment provided for CS\n");
-return -EINVAL;
-}
+case x86_seg_cs:
+if ( !(reg->attr.fields.type & 0x8) )
+{
+gprintk(XENLOG_ERR, "Non-code segment provided for CS\n");
+return -EINVAL;
+}
+break;
 
-if ( seg == x86_seg_ss &&
- ((reg->attr.fields.type & 0x8) || !(reg->attr.fields.type & 0x2)) )
-{
-gprintk(XENLOG_ERR, "Non-writeable segment provided for SS\n");
-return -EINVAL;
-}
+case x86_seg_ss:
+if ( (reg->attr.fields.type & 0x8) || !(reg->attr.fields.type & 0x2) )
+{
+gprintk(XENLOG_ERR, "Non-writeable segment provided for SS\n");
+return -EINVAL;
+}
+break;
 
-if ( reg->attr.fields.s && seg != x86_seg_ss && seg != x86_seg_cs &&
- (reg->attr.fields.type & 0x8) && !(reg->attr.fields.type & 0x2) )
-{
-gprintk(XENLOG_ERR, "Non-readable segment provided for DS or ES\n");
+case x86_seg_ds:
+case x86_seg_es:
+if ( (reg->attr.fields.type & 0x8) && !(reg->attr.fields.type & 0x2) )
+{
+gprintk(XENLOG_ERR, "Non-readable segment provided for DS or 
ES\n");
+return -EINVAL;
+}
+break;
+
+case x86_seg_tr:
+break;
+
+default:
+ASSERT_UNREACHABLE();
 return -EINVAL;
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 0/3] Improvements to struct segment_register

2017-07-25 Thread Andrew Cooper
No functional change, but the source code is less verbose.

Andrew Cooper (3):
  x86/svm: Drop svm_segment_register_t
  x86/hvm: Rearrange check_segment() to use a switch statement
  x86/emul: Drop segment_attributes_t

 tools/fuzz/x86_instruction_emulator/fuzz-emul.c |  10 +-
 tools/tests/x86_emulator/test_x86_emulator.c|   2 +-
 xen/arch/x86/cpu/vpmu.c |   2 +-
 xen/arch/x86/hvm/domain.c   |  80 +++-
 xen/arch/x86/hvm/emulate.c  |  20 +--
 xen/arch/x86/hvm/hvm.c  | 154 
 xen/arch/x86/hvm/svm/svm.c  |  10 +-
 xen/arch/x86/hvm/svm/svmdebug.c |   6 +-
 xen/arch/x86/hvm/svm/vmcb.c |  32 +++--
 xen/arch/x86/hvm/vmx/realmode.c |  10 +-
 xen/arch/x86/hvm/vmx/vmx.c  |  41 +++
 xen/arch/x86/mm/shadow/common.c |   6 +-
 xen/arch/x86/pv/emul-priv-op.c  |  40 +++---
 xen/arch/x86/vm_event.c |   2 +-
 xen/arch/x86/x86_emulate/x86_emulate.c  |  55 +
 xen/arch/x86/x86_emulate/x86_emulate.h  |  37 +++---
 xen/include/asm-x86/hvm/svm/vmcb.h  |  37 +++---
 17 files changed, 279 insertions(+), 265 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 3/3] x86/emul: Drop segment_attributes_t

2017-07-25 Thread Andrew Cooper
The amount of namespace resolution is unnecessarily large, as all code deals
in terms of struct segment_register.  This removes the attr.fields part of all
references, and alters attr.bytes to just attr.

Three areas of code using initialisers for segment_register are tweaked to
compile with older versions of GCC.  arch_set_info_hvm_guest() has its SEG()
macros altered to use plain comma-based initialisation, while
{rm,vm86}_{cs,ds}_attr are simplified to plain numbers which matches their
description in the manuals.

No functional change.  (For some reason, the old {rm,vm86}_{cs,ds}_attr causes
GCC to create a variable in .rodata, whereas the new code uses immediate
operands.  As a result, vmx_{get,set}_segment_register() are slightly
shorter.)

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c |  10 +-
 tools/tests/x86_emulator/test_x86_emulator.c|   2 +-
 xen/arch/x86/cpu/vpmu.c |   2 +-
 xen/arch/x86/hvm/domain.c   |  43 ---
 xen/arch/x86/hvm/emulate.c  |  20 +--
 xen/arch/x86/hvm/hvm.c  | 154 
 xen/arch/x86/hvm/svm/svm.c  |  10 +-
 xen/arch/x86/hvm/svm/svmdebug.c |   4 +-
 xen/arch/x86/hvm/svm/vmcb.c |  16 +--
 xen/arch/x86/hvm/vmx/realmode.c |  10 +-
 xen/arch/x86/hvm/vmx/vmx.c  |  41 +++
 xen/arch/x86/mm/shadow/common.c |   6 +-
 xen/arch/x86/pv/emul-priv-op.c  |  40 +++---
 xen/arch/x86/vm_event.c |   2 +-
 xen/arch/x86/x86_emulate/x86_emulate.c  |  55 +
 xen/arch/x86/x86_emulate/x86_emulate.h  |  37 +++---
 16 files changed, 219 insertions(+), 233 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index aadbb40..a2329f8 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -583,7 +583,7 @@ static bool in_longmode(struct x86_emulate_ctxt *ctxt)
 const struct fuzz_state *s = ctxt->data;
 const struct fuzz_corpus *c = s->corpus;
 
-return long_mode_active(ctxt) && c->segments[x86_seg_cs].attr.fields.l;
+return long_mode_active(ctxt) && c->segments[x86_seg_cs].l;
 }
 
 static void set_sizes(struct x86_emulate_ctxt *ctxt)
@@ -597,8 +597,8 @@ static void set_sizes(struct x86_emulate_ctxt *ctxt)
 ctxt->addr_size = ctxt->sp_size = 64;
 else
 {
-ctxt->addr_size = c->segments[x86_seg_cs].attr.fields.db ? 32 : 16;
-ctxt->sp_size   = c->segments[x86_seg_ss].attr.fields.db ? 32 : 16;
+ctxt->addr_size = c->segments[x86_seg_cs].db ? 32 : 16;
+ctxt->sp_size   = c->segments[x86_seg_ss].db ? 32 : 16;
 }
 }
 
@@ -741,8 +741,8 @@ static void sanitize_input(struct x86_emulate_ctxt *ctxt)
 /* EFLAGS.VM implies 16-bit mode */
 if ( regs->rflags & X86_EFLAGS_VM )
 {
-c->segments[x86_seg_cs].attr.fields.db = 0;
-c->segments[x86_seg_ss].attr.fields.db = 0;
+c->segments[x86_seg_cs].db = 0;
+c->segments[x86_seg_ss].db = 0;
 }
 }
 
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 1955332..76665ab 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -264,7 +264,7 @@ static int read_segment(
 if ( !is_x86_user_segment(seg) )
 return X86EMUL_UNHANDLEABLE;
 memset(reg, 0, sizeof(*reg));
-reg->attr.fields.p = 1;
+reg->p = 1;
 return X86EMUL_OKAY;
 }
 
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 21383d3..90954ca 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -304,7 +304,7 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
 r->cs = seg.sel;
 hvm_get_segment_register(sampled, x86_seg_ss, );
 r->ss = seg.sel;
-r->cpl = seg.attr.fields.dpl;
+r->cpl = seg.dpl;
 if ( !(sampled->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
 *flags |= PMU_SAMPLE_REAL;
 }
diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 293956c..7e11541 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -27,13 +27,13 @@
 static int check_segment(struct segment_register *reg, enum x86_segment seg)
 {
 
-if ( reg->attr.fields.pad != 0 )
+if ( reg->pad != 0 )
 {
 gprintk(XENLOG_ERR, "Segment attribute bits 12-15 are not zero\n");
 return -EINVAL;
 }
 
-if ( reg->attr.bytes == 0 )
+if ( reg->attr == 0 )
 {
 if ( seg != x86_seg_ds && seg != x86_seg_es )
 {
@@ -45,26 +45,26 @@ static int check_segment(struct 

[Xen-devel] [PATCH v2 1/3] x86/svm: Drop svm_segment_register_t

2017-07-25 Thread Andrew Cooper
Most SVM code already uses struct segment_register.  Drop the typedef and
adjust the definitions in struct vmcb_struct, and svm_dump_sel().  Introduce
some build-time assertions that struct segment_register from the common
emulation code is usable in struct vmcb_struct.

While making these adjustments, fix some comments to not mix decimal and
hexadecimal offsets, and drop all trailing whitespace in vmcb.h.

No functional change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Boris Ostrovsky 
CC: Suravee Suthikulpanit 

v2:
 * Extend the build assertions.
---
 xen/arch/x86/hvm/svm/svmdebug.c|  2 +-
 xen/arch/x86/hvm/svm/vmcb.c| 16 
 xen/include/asm-x86/hvm/svm/vmcb.h | 37 +
 3 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index a3f8685..4902824 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 
-static void svm_dump_sel(const char *name, const svm_segment_register_t *s)
+static void svm_dump_sel(const char *name, const struct segment_register *s)
 {
 printk("%s: %04x %04x %08x %016"PRIx64"\n",
name, s->sel, s->attr.bytes, s->limit, s->base);
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 96abf8d..2e67d8d 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -310,6 +310,22 @@ void __init setup_vmcb_dump(void)
 register_keyhandler('v', vmcb_dump, "dump AMD-V VMCBs", 1);
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+struct segment_register sreg;
+
+/* Check struct segment_register against the VMCB segment layout. */
+BUILD_BUG_ON(sizeof(sreg)   != 16);
+BUILD_BUG_ON(sizeof(sreg.sel)   != 2);
+BUILD_BUG_ON(sizeof(sreg.attr)  != 2);
+BUILD_BUG_ON(sizeof(sreg.limit) != 4);
+BUILD_BUG_ON(sizeof(sreg.base)  != 8);
+BUILD_BUG_ON(offsetof(struct segment_register, sel)   != 0);
+BUILD_BUG_ON(offsetof(struct segment_register, attr)  != 2);
+BUILD_BUG_ON(offsetof(struct segment_register, limit) != 4);
+BUILD_BUG_ON(offsetof(struct segment_register, base)  != 8);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index 30a228b..fa0d3e2 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -31,7 +31,7 @@ enum GenericIntercept1bits
 GENERAL1_INTERCEPT_SMI   = 1 << 2,
 GENERAL1_INTERCEPT_INIT  = 1 << 3,
 GENERAL1_INTERCEPT_VINTR = 1 << 4,
-GENERAL1_INTERCEPT_CR0_SEL_WRITE = 1 << 5, 
+GENERAL1_INTERCEPT_CR0_SEL_WRITE = 1 << 5,
 GENERAL1_INTERCEPT_IDTR_READ = 1 << 6,
 GENERAL1_INTERCEPT_GDTR_READ = 1 << 7,
 GENERAL1_INTERCEPT_LDTR_READ = 1 << 8,
@@ -304,13 +304,10 @@ enum VMEXIT_EXITCODE
 VMEXIT_INVALID  =  -1
 };
 
-/* Definition of segment state is borrowed by the generic HVM code. */
-typedef struct segment_register svm_segment_register_t;
-
 typedef union
 {
 u64 bytes;
-struct 
+struct
 {
 u64 vector:8;
 u64 type:  3;
@@ -324,7 +321,7 @@ typedef union
 typedef union
 {
 u64 bytes;
-struct 
+struct
 {
 u64 tpr:  8;
 u64 irq:  1;
@@ -342,7 +339,7 @@ typedef union
 typedef union
 {
 u64 bytes;
-struct 
+struct
 {
 u64 type: 1;
 u64 rsv0: 1;
@@ -438,23 +435,23 @@ struct vmcb_struct {
 u8  guest_ins[15];  /* offset 0xD1 */
 u64 res10a[100];/* offset 0xE0 pad to save area */
 
-svm_segment_register_t es;  /* offset 1024 - cleanbit 8 */
-svm_segment_register_t cs;  /* cleanbit 8 */
-svm_segment_register_t ss;  /* cleanbit 8 */
-svm_segment_register_t ds;  /* cleanbit 8 */
-svm_segment_register_t fs;
-svm_segment_register_t gs;
-svm_segment_register_t gdtr; /* cleanbit 7 */
-svm_segment_register_t ldtr;
-svm_segment_register_t idtr; /* cleanbit 7 */
-svm_segment_register_t tr;
+struct segment_register es;  /* offset 0x400 - cleanbit 8 */
+struct segment_register cs;  /* cleanbit 8 */
+struct segment_register ss;  /* cleanbit 8 */
+struct segment_register ds;  /* cleanbit 8 */
+struct segment_register fs;
+struct segment_register gs;
+struct segment_register gdtr; /* cleanbit 7 */
+struct segment_register ldtr;
+struct segment_register idtr; /* cleanbit 7 */
+struct segment_register tr;
 u64 res10[5];
 u8 res11[3];
 u8 _cpl;/* cleanbit 8 */
 u32 res12;
-u64 _efer;  /* offset 1024 + 0xD0  - cleanbit 5 */
+u64 _efer;  /* offset 0x400 + 0xD0 - cleanbit 5 */
 u64 

Re: [Xen-devel] [OSSTEST PATCH v13 00/24] Have OpenStack tested on top of xen's master and libvirt's master.

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 00/24] Have OpenStack tested on top 
of xen's master and libvirt's master."):
> Now powered with subunit-to-substep engine.
> 
> The Tempest test names reported via subunit are in the form:
> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,id-bdbb5441-9204-419d-a225-b4fdbfb1a1a8,image,network,volume]

Blimey.  Well, I guess we will put up with that.

Thanks.  I think I have been through all the patches now.  I hope you
find my comments helpful.

Regards,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v13 18/24] TestSupport: Introduce target_cmd_stashed

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 18/24] TestSupport: Introduce 
target_cmd_stashed"):
> This works like target_cmd, but takes a ref to a filename as argument
> and stashes the output of the command, then returns a path to the stashed
> output.
...
> +# Like target_cmd, but stash cmd stdout and return a path to it.
> +sub target_cmd_stashed ($$$;$$) {
> +my ($tho,$leafref,$tcmd,$timeout,$extrasshopts) = @_;
> +my $stdout = open_unique_stashfile($leafref);
> +my $rc = tcmd(undef, $stdout, 0, 'osstest', $tho, $tcmd, $timeout,
> +$extrasshopts);
> +die "$stdout $!" if $stdout->error or !close $stdout;

You can't sensibly interpolate a filehandle into a "" string.
You should use the filename (probably, the whole filename).

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v13 22/24] New branch openstack-ocata

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 22/24] New branch openstack-ocata"):
> Testing of the Ocata stable branch of OpenStack against Xen unstable.
> 
> OpenStack has many different repos which should be in sync, so we should
> attempt to grab the revisions of the stable branch of every OpenStack
> tree, but for now, the runvars REVISION_* of trees other than nova are set
> to "origin/stable/ocata", except that Tempest does not have a stable
> branch and should be able to test any OpenStack version.

Ah I see this is where the new branch is created.  You will want to
run standalone-generate-dump-flight-runvars after this, not after the
previous patch (which is fine as it is - sorry, please forget my
comment about editing cr-for-branches there).

I'm afraid I don't understand your explanation about stable branches.
Earlier I asked:

  > Do you intend to provide a version of this patch which maintains a
  > tested branch for all of these different trees ?

  No, I don't. This would be a different patch (and maybe different patch
  series).

So you don't intend to maintain a tested branch for each of these
trees.  But you do intend, I think, to maintain a tested branch of the
main tree.  You say the subtrees "should be in sync", which I take to
mean that openstack upstream only expect it to work if you grab a
revision from all of these trees at "roughly the same time" or
something.

I don't think this will necessarily work properly.  The most obvious
problem I see is that regressions which are introduced in the subtrees
will be picked up even by flights which are attempting to use a
working (ie osstest-tested) version of "openstack".

This may not be critical if openstack jobs appear only on the flights
for openstack branches.  openstack branches will occasionally suffer
trouble but things will probably be BALGE.

But we definitely won't be able to add openstack tests to the other
branches without doing something different.

I have a very strong feeling we have discussed this before but I'm
afraid the answer doesn't seem to have been written down somewhere I
can easily find it.  It should be explained in your series somewhere,
in a comment or a commit message.

It would also be nice to have a theory about how this could be
improved in the future.  That would mean we could be more confident
that we're not painting ourselves into a corner.

Having said all that, with a suitable explanation, I think the code is
probably about right.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v13 24/24] openstack tests: Don't run them on arm*

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 24/24] openstack tests: Don't run 
them on arm*"):
> Signed-off-by: Anthony PERARD 

Acked-by: Ian Jackson 

You should probably mention the existence of this patch in the other
one's commit message...

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v13 21/24] Create a flight to test OpenStack with xen-unstable and libvirt

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 21/24] Create a flight to test 
OpenStack with xen-unstable and libvirt"):
> This patch creates a flight "openstack*", with those jobs:

Do you mean it creates a "branch" ?  But I don't think it does.  I
predict no changes to the output of mg-list-all-branches.  You
probably want to edit cr-for-branches.

>   build-amd64
>   build-amd64-libvirt
>   build-amd64-pvops
>   build-amd64-xsm
>   build-arm64
>   build-arm64-libvirt
>   build-arm64-pvops
>   build-arm64-xsm
>   build-armhf
>   build-armhf-libvirt
>   build-armhf-pvops
>   build-armhf-xsm
>   test-amd64-amd64-devstack
>   test-amd64-amd64-devstack-xsm
>   test-arm64-arm64-devstack
>   test-arm64-arm64-devstack-xsm
>   test-armhf-armhf-devstack
>   test-armhf-armhf-devstack-xsm

Does it add these jobs to existing flights ?  I think it probably
does.  Is that intentional ?  I think it probably isn't.  If it is you
should explain it in the commit message - and also explain why this is
OK despite you not keeping tested versions of all the input branches
(ie, risking uncontrolled regressions in openstack components breaking
the xen pushes...)

You can see what it does with standalone-dump-all-flight-runvars
(which is best run with eatmydata).

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Julien Grall wrote:
> On 25/07/17 18:52, Stefano Stabellini wrote:
> > On Tue, 25 Jul 2017, Andrii Anisov wrote:
> > > Hello Andrew,
> > > 
> > > 
> > > On 25.07.17 19:23, Andrew Cooper wrote:
> > > > As a general rule, Xen needs to be able to tolerate and cope with any
> > > > quantity of crap described by the firmware.  On the x86 side, we have
> > > > large quantities of workarounds for buggy ACPI/MP/SMBIOS tables.
> > > That approach is somehow covered by the earlier mentioned options:
> > > 
> > > On 25.07.17 15:24, Andrii Anisov wrote:
> > > > * ignore next duplicating (overlapping) memory node in favor of one
> > > > already in a memory banks list
> > > > * merge duplicating (overlapping), even neighboring, memory banks
> > > 
> > > On 25.07.17 19:23, Andrew Cooper wrote:
> > > > It might be the case that the best Xen can do is give up, but it should
> > > > do so with a clear error message identifying what the firmware has done
> > > > which is sufficiently crazy to prevent further booting.
> > > We have one more option to choose for the case:
> > > 
> > > * BUG() with clear notification at the moment we are trying to add
> > > overlapping memory bank
> > > 
> > > So what to choose?
> > 
> > Certainly we need to print a clear warning.
> 
> +1 here.
> 
> > Then, we can decide whether we prefer to crash (as we do today), or
> > work-around the broken device-tree. I think it would be more beneficial
> > to Xen users if we tried to continue anyway, and probably the best way
> > to do that would be by merging the overlapping memory regions.
> > I fully understand that this is not required by the spec, but lots of
> > hardware (x86 and ARM) get released every day with some sort of broken
> > spec compliance. It's our job to decide on a case by case basis whether
> > it makes sense for us to support these platforms anyway.
> > 
> > In this case, the cost of supporting the Renesas R-Car Gen3 seems pretty
> > limited to me. Of course, I would have to see the patch, but if we can
> > make this work with limited amount of changes, very maintainable, I
> > would take them. Otherwise, screw it :-)
> 
> I tend to disagree here. This is the first board where the bug occurs and the
> Device-Tree is replaceable.
> 
> Furthermore, if you look at the wikipage for Renesas R-Car on Xen (see [1]), a
> specific DT for Xen is provided.
> 
> So I can't see any reason to implement that in Xen at the moment.

That's true. It does not make sense to do this until we get rid of the
Xen specific R-Car device tree on the wiki.


> [1]
> https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Paolo Bonzini wrote:
> > Hi,
> > 
> > Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
> > started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
> > That results in calling xen_map_cache() with lock=true, but this mapping
> > is never invalidated.
> > So QEMU uses more and more RAM until it stops working for one reason or
> > another (a crash if the host has little RAM, or it stops emulating but
> > without crashing).
> > 
> > I don't know if calling xen_invalidate_map_cache_entry() in
> > address_space_read_continue() and address_space_write_continue() is the
> > right answer.  Is there something better to do?
> 
> I think it's correct for dma to be true... maybe add a lock argument to
> qemu_ram_ptr_length, so that make address_space_{read,write}_continue can
> pass 0 and everyone else passes 1?

I think that is a great suggestion. That way, the difference between
locked mappings and unlocked mappings would be explicit, rather than
relying on callers to use qemu_map_ram_ptr for unlocked mappings and
qemu_ram_ptr_length for locked mappings. And there aren't that many
callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an
issue.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Julien Grall



On 25/07/17 18:52, Stefano Stabellini wrote:
> On Tue, 25 Jul 2017, Andrii Anisov wrote:
> > Hello Andrew,
> > 
> > On 25.07.17 19:23, Andrew Cooper wrote:
> > > As a general rule, Xen needs to be able to tolerate and cope with any
> > > quantity of crap described by the firmware.  On the x86 side, we have
> > > large quantities of workarounds for buggy ACPI/MP/SMBIOS tables.
> > That approach is somehow covered by the earlier mentioned options:
> > 
> > On 25.07.17 15:24, Andrii Anisov wrote:
> > > * ignore next duplicating (overlapping) memory node in favor of one
> > > already in a memory banks list
> > > * merge duplicating (overlapping), even neighboring, memory banks
> > 
> > On 25.07.17 19:23, Andrew Cooper wrote:
> > > It might be the case that the best Xen can do is give up, but it should
> > > do so with a clear error message identifying what the firmware has done
> > > which is sufficiently crazy to prevent further booting.
> > We have one more option to choose for the case:
> > 
> > * BUG() with clear notification at the moment we are trying to add
> > overlapping memory bank
> > 
> > So what to choose?
> 
> Certainly we need to print a clear warning.

+1 here.

> Then, we can decide whether we prefer to crash (as we do today), or
> work-around the broken device-tree. I think it would be more beneficial
> to Xen users if we tried to continue anyway, and probably the best way
> to do that would be by merging the overlapping memory regions.
> I fully understand that this is not required by the spec, but lots of
> hardware (x86 and ARM) get released every day with some sort of broken
> spec compliance. It's our job to decide on a case by case basis whether
> it makes sense for us to support these platforms anyway.
> 
> In this case, the cost of supporting the Renesas R-Car Gen3 seems pretty
> limited to me. Of course, I would have to see the patch, but if we can
> make this work with limited amount of changes, very maintainable, I
> would take them. Otherwise, screw it :-)

I tend to disagree here. This is the first board where the bug occurs and
the Device-Tree is replaceable.

Furthermore, if you look at the wikipage for Renesas R-Car on Xen (see
[1]), a specific DT for Xen is provided.

So I can't see any reason to implement that in Xen at the moment.

Cheers,

[1] 
https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X


--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement 
target_subunit_cmd a subunit stream parser into substeps"):
> target_subunit_cmd can be used like target_cmd, but the command would
> need to output a subunit v1 stream, which will be parsed and turned
> into osstest substeps. The command can be `| subunit-2to1` in order to
> turn a subunit v2 stream into v1.
> 
> Currently, time is not taken into account, and all substeps will have
> bogus timestamps as the output of the command is parsed after it has
> run.
> 
> This is a description of the subunit v1 protocol, taken from
> python-subunit README, or https://pypi.python.org/pypi/python-subunit

What a lot of code!

> +while (<$stdout>) {
> +if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) {
> +# This is the timestamp for the next events

I'm not sure what your ( ) are doing here.

> +} elsif (/^test(?:ing)?:? (.+)\n/) {
> +# Start of a new test.
> +$logfilename = subunit_sanitize_testname($1) . '.log';
> +$fh = open_unique_stashfile(\$logfilename);

This name might clash with existing logfile names, which might be
generated later.  Can you put "subunit-" on the front maybe?

> +substep_start(subunit_sanitize_testname($1), $logfilename);

And here, I think you should start the parameter you pass to
substep_start with '/' so that it gets appended to the testid for the
whole script, for a similar reason.

I think it would be better to call subunit_sanitize_testname only
once.

> +} elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess):
> +   \ (.+?)(\ \[(\ multipart)?)?$/x) {
> +# Result of a test, with its output.
> +my $result = $1;
> +my $testname = $2;
> +my $have_details = $3;
> +my $is_multipart = $4;

I would normally write this:
   my ($result, $testname, $have_... ) = ($1,$2,$3,$4,$5)
although I don't really mind much that you have written it as you
have.

> +if ($have_details) {
> +if ($is_multipart) {
> +# Test output
> +while (<$stdout>) {
> +# part content-type
> +# from 
> https://tools.ietf.org/html/rfc6838#section-4.2
> +my $restricted_name = 
> qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*';
> +if (m{ ^Content-Type:\s+
> +$restricted_name/$restricted_name # 
> type/sub-type
> +# parameters
> +(?:\s*;\s*$restricted_name=[^,]+
> +  (?:,\s*$restricted_name=[^,]+)*)
> +\s*$
> +}xi) {

I don't understand why you are trying to match this Content-Type so
precisely.  AFAICT from the grammar, all you need to do is see whether
there is something vaguely like a c-t header.

> +print $fh $_ or die $!;
> +
> +# part name
> +my $line = <$stdout>;
> +print $fh $line or die $!;
> +
> +# Read chunks of a part
> +while (<$stdout>) {
> +if (/^([0-9A-F]+)\r$/i) {
> +my $chunk_size = hex($1);

What makes you think the digits are in hex?

Since you have to go to the effort of separating out all of this
stuff, it might be worth printing these multipart objects with one
object per logfile.  Although I won't insist on that because I suspect
that multipart results are rare.

> +} else {
> +# Unexpected output
> +chomp;
> +logm("*** $_");

I guess the error recovery is to continue until you see "]"
and hope.  Fair enough.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Andrii Anisov wrote:
> Hello Andrew,
> 
> 
> On 25.07.17 19:23, Andrew Cooper wrote:
> > As a general rule, Xen needs to be able to tolerate and cope with any
> > quantity of crap described by the firmware.  On the x86 side, we have
> > large quantities of workarounds for buggy ACPI/MP/SMBIOS tables.
> That approach is somehow covered by the earlier mentioned options:
> 
> On 25.07.17 15:24, Andrii Anisov wrote:
> > * ignore next duplicating (overlapping) memory node in favor of one already
> > in a memory banks list
> > * merge duplicating (overlapping), even neighboring, memory banks 
> 
> On 25.07.17 19:23, Andrew Cooper wrote:
> > It might be the case that the best Xen can do is give up, but it should
> > do so with a clear error message identifying what the firmware has done
> > which is sufficiently crazy to prevent further booting.
> We have one more option to choose for the case:
> 
> * BUG() with clear notification at the moment we are trying to add overlapping
> memory bank
> 
> So what to choose?

Certainly we need to print a clear warning.

Then, we can decide whether we prefer to crash (as we do today), or
work-around the broken device-tree. I think it would be more beneficial
to Xen users if we tried to continue anyway, and probably the best way
to do that would be by merging the overlapping memory regions.

I fully understand that this is not required by the spec, but lots of
hardware (x86 and ARM) get released every day with some sort of broken
spec compliance. It's our job to decide on a case by case basis whether
it makes sense for us to support these platforms anyway.

In this case, the cost of supporting the Renesas R-Car Gen3 seems pretty
limited to me. Of course, I would have to see the patch, but if we can
make this work with limited amount of changes, very maintainable, I
would take them. Otherwise, screw it :-)
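
To illustrate what that merging could look like, here is a minimal,
untested sketch (field names assumed from the struct meminfo/membank
layout in xen/include/asm-arm/setup.h of this era, using Xen's usual
helpers):

/* Hypothetical helper: fold a new bank [start, start+size) into the
 * bank list, coalescing it with any overlapping or adjacent bank. */
static void __init merge_bank(struct meminfo *mi, paddr_t start,
                              paddr_t size)
{
    paddr_t end = start + size;
    unsigned int i;

    for ( i = 0; i < mi->nr_banks; i++ )
    {
        struct membank *b = &mi->bank[i];

        if ( start <= b->start + b->size && b->start <= end )
        {
            /* Overlapping or adjacent: grow the existing bank. */
            paddr_t new_start = min(b->start, start);
            paddr_t new_end = max(b->start + b->size, end);

            printk(XENLOG_WARNING "DT: merging overlapping memory banks\n");
            b->start = new_start;
            b->size = new_end - new_start;
            return;
        }
    }

    /* No overlap: append as a fresh bank (NR_MEM_BANKS check omitted). */
    mi->bank[mi->nr_banks].start = start;
    mi->bank[mi->nr_banks].size = size;
    mi->nr_banks++;
}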

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Paolo Bonzini
> Hi,
> 
> Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
> started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
> That results in calling xen_map_cache() with lock=true, but this mapping
> is never invalidated.
> So QEMU uses more and more RAM until it stops working for one reason or
> another (a crash if the host has little RAM, or it stops emulating but
> without crashing).
> 
> I don't know if calling xen_invalidate_map_cache_entry() in
> address_space_read_continue() and address_space_write_continue() is the
> right answer.  Is there something better to do?

I think it's correct for dma to be true... maybe add a lock argument to
qemu_ram_ptr_length, so that make address_space_{read,write}_continue can
pass 0 and everyone else passes 1?

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-25 Thread Andrew Cooper
On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin  wrote:
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain is being created. Some PCI devices will
>>> be emulated, some will be merely passed through and yet there will be
>>> some RMRR ranges. libxl can't know all this stuff - some comes from the
>>> host, some comes from DM. So actual MMIO requirements are known to
>>> hvmloader at the PCI bus enumeration time.
>>>   
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
> If moving most tasks of hvmloader to libxl is a planned feature in Citrix,
> please let it be discussed on xen-devel first as it may affect many
> people... and not all of them might be happy. :)
>
> (tons of IMO and TLDR ahead, be warned)
>
> Moving PCI BAR allocation from guest side to libxl is a controversial step.
> This may be the architecturally wrong way in fact. There are properties and
> areas of responsibility. Among primary responsibilities of guest's firmware
> is PCI BARs and MMIO hole size allocation.

There is already a very blurry line concerning "firmware".  What you
describe is correct for real hardware, but remember that virtual
machines are anything but.  There are already a lot of aspects of
initialisation covered by Xen or the toolstack which would be covered by
"firmware" in a native system.  A lot of these are never ever going to
move within guest control.

> That's a guest's territory.

Every tweakable which is available inside the guest is a security attack
surface.

It is important to weigh up all options, and it might indeed be the case
that putting the tweakable inside the guest is the correct action to
take, but simply "because that's what real hardware does" is not a good
enough argument.

We've had far too many XSAs due to insufficient forethought when lashing
things together in the past.

> Guest relocates PCI BARs (and not just BIOS able to do this), guest
> firmware relocates MMIO hole base for them. If it was a real system, all
> tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done
> by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were
> offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for
> them and generating ACPI tables. And that's ok as hvmloader can be
> considered merely a 'supplemental' firmware to perform some tasks of
> SeaBIOS/OVMF before passing control to them. This solution has some
> architecture logic at least and doesn't look bad.

PCI BAR relocation isn't interesting to consider.  It obviously has to
be dynamic (as the OS is free to renumber the bridges).

The issue I am concerned with is purely the MMIO window selection.  From
the point of view of the guest, this is fixed at boot; changing it
requires a reboot and altering the BIOS settings.
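
(Concretely, with purely illustrative numbers: if the toolstack sizes the
low hole as, say, 0xf0000000-0xfc000000, that choice is baked into the
E820 map and the ACPI tables which the firmware hands to the guest at
boot; enlarging the hole afterwards would invalidate both, which is why
it behaves like a BIOS setting rather than a runtime knob.)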

>
> On the other hand, moving PCI hole calculation to libxl just to let Xen/libxl
> know what the MMIO size value is might be a bad idea.
> Aside from some code duplication, straying too far from the real hw paths,
> or breaking existing (or future) interfaces this might have some other
> negative consequences. Ex. who will be initializing guest's ACPI tables if
> only libxl will know the memory layout? Some new interfaces between libxl
> and hvmloader just to let the latter know what values to write to ACPI
> tables being created? Or libxl will be initializing guest's ACPI tables as
> well (another guest's internal task)? Similar concerns are applicable to
> guest's final E820 construction.

Who said anything about only libxl knowing the layout?

Whatever ends up happening, the hypervisor needs to know the layout to
be able to sensibly audit a number of guest actions which currently go
unaudited.  (I am disappointed that this wasn't done in the first place,
and surprised that Xen as a whole has managed to last this long without
this information being known to the hypervisor.)

>
> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU who traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already talks to Xen which ranges should be passed through or trapped
> -- it can tell him the current overall MMIO limits as well... or handle
> these limits himself -- if the MMIO hole range check is all what required to
> avoid MMIO space misusing, this check can be easily implemented in QEMU,
> provided that QEMU knows what memory/MMIO layout is. There is a lot of
> implementation freedom where to place restrictions and checks, Xen or QEMU.
> Strictly 

Re: [Xen-devel] [PATCH 22/25 v6] xen/arm: vpl011: Add support for vuart console in xenconsole

2017-07-25 Thread Stefano Stabellini
Hi Bhupinder,

Thanks for trying, and I would have done the same thing as you did:

ifeq ($(CONFIG_SBSA_VUART_CONSOLE),y)
CONFIG_VUART_CONSOLE := y
endif

However, we don't want to introduce a dependency between "make tools"
and "make xen", meaning that one should be able to do "make tools"
successfully even without having done "make xen" before.

I think this is a generic tools building issue that will need to be
fixed, probably by always generating the xen .config, even when building
tools. But I don't think it's fair to ask you to do it as part of this
series. So feel free to disregard my little request for now and always
set CONFIG_VUART_CONSOLE := y.


On Tue, 25 Jul 2017, Bhupinder Thakur wrote:
> Hi Stefano,
> 
> Can we make CONFIG_VUART_CONSOLE dependent on CONFIG_SBSA_VUART_CONSOLE?
> 
> CONFIG_SBSA_VUART_CONSOLE is a Kconfig option while
> CONFIG_VUART_CONSOLE is an option defined in the .mk file which is
> used while compiling the toolstack.
> 
> So if I try to do something like this in arm64.mk/arm32.mk file, I am
> not sure if CONFIG_SBSA_VUART_CONSOLE definition will be available
> (since .config would not be generated) if I have not compiled Xen
> hypervisor code first:
> 
> ifeq ($(CONFIG_SBSA_VUART_CONSOLE),y)
> CONFIG_VUART_CONSOLE := y
> endif
> 
> Regards,
> Bhupinder
> 
> On 22 July 2017 at 01:14, Stefano Stabellini  wrote:
> > On Fri, 21 Jul 2017, Julien Grall wrote:
> >> Hi,
> >>
> >> On 18/07/17 21:07, Stefano Stabellini wrote:
> >> > On Mon, 17 Jul 2017, Bhupinder Thakur wrote:
> >> > > This patch finally adds the support for vuart console. It adds
> >> > > two new fields in the console initialization:
> >> > >
> >> > > - optional
> >> > > - prefer_gnttab
> >> > >
> >> > > optional flag tells whether the console is optional.
> >> > >
> >> > > prefer_gnttab tells whether the ring buffer should be allocated using
> >> > > grant table.
> >> > >
> >> > > Signed-off-by: Bhupinder Thakur 
> >> > > ---
> >> > > CC: Ian Jackson 
> >> > > CC: Wei Liu 
> >> > > CC: Stefano Stabellini 
> >> > > CC: Julien Grall 
> >> > >
> >> > > Changes since v4:
> >> > > - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the
> >> > > convention.
> >> > >
> >> > >  config/arm32.mk   |  1 +
> >> > >  config/arm64.mk   |  1 +
> >> > >  tools/console/Makefile|  3 ++-
> >> > >  tools/console/daemon/io.c | 29 -
> >> > >  4 files changed, 32 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/config/arm32.mk b/config/arm32.mk
> >> > > index f95228e..b9f23fe 100644
> >> > > --- a/config/arm32.mk
> >> > > +++ b/config/arm32.mk
> >> > > @@ -1,5 +1,6 @@
> >> > >  CONFIG_ARM := y
> >> > >  CONFIG_ARM_32 := y
> >> > > +CONFIG_VUART_CONSOLE := y
> >> > >  CONFIG_ARM_$(XEN_OS) := y
> >> > >
> >> > >  CONFIG_XEN_INSTALL_SUFFIX :=
> >> >
> >> > What about leaving this off for ARM32 by default?
> >>
> >> Why? This will only disable xenconsole changes and not the hypervisor. The
> >> changes are quite tiny, so I would even be in favor of enabling for all
> >> architectures.
> >>
> >> Or are you suggesting to disable the VPL011 emulation in the hypervisor? 
> >> But I
> >> don't see the emulation AArch64 specific, and a user could disable it if he
> >> doesn't want it...
> >
> > I was thinking that the virtual pl011 is mostly useful for SBSA
> > compliance, which doesn't really apply to ARM32 (there are no ARM32 SBSA
> > compliant platforms as far as I am aware).
> >
> > Given that we don't need vpl011 on ARM32, I thought we might as well
> > disable it. Less code the better. I wouldn't go as far as introducing
> > more #ifdefs to disable it, but I would make use of the existing config
> > options to turn it off by default on ARM32. Does it make sense?
> >
> > That said, you are right that there is no point in disabling only
> > CONFIG_VUART_CONSOLE, which affects the tools only. We should really
> > disable SBSA_VUART_CONSOLE by default on ARM32. In fact, ideally
> > CONFIG_VUART_CONSOLE would be set dependning on the value of
> > SBSA_VUART_CONSOLE. What do you think?
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH v13 10/24] ts-openstack-deploy: Increase open fd limit for RabbitMQ

2017-07-25 Thread Ian Jackson
Anthony PERARD writes ("[OSSTEST PATCH v13 10/24] ts-openstack-deploy: Increase 
open fd limit for RabbitMQ"):
> Signed-off-by: Anthony PERARD 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6] x86/monitor: Notify monitor if an emulation fails.

2017-07-25 Thread Razvan Cojocaru
On 07/18/2017 01:20 PM, Razvan Cojocaru wrote:
> On 07/18/2017 01:09 PM, Andrew Cooper wrote:
>> On 18/07/17 10:37, Petre Pircalabu wrote:
>>> In case of a vm_event with the emulate_flags set, if the instruction
>>> cannot be emulated, the monitor should be notified instead of directly
>>> injecting a hw exception.
>>> This behavior can be used to re-execute an instruction not supported by
>>> the emulator using the real processor (e.g. altp2m) instead of just
>>> crashing.
>>>
>>> Signed-off-by: Petre Pircalabu 
>>
>> There are many situations which end up failing an emulation with
>> UNHANDLEABLE.
>>
>> What exact scenario are you looking to catch?  Is it just instructions
>> which aren't implemented in the emulator?
> 
> Instructions that are not implemented in the emulator are our main use
> case for this, yes. In which case, we'd like a chance to be able to
> single-step them (using altp2m), so that the guest will continue to run
> even with an incomplete emulator.
> 
> We don't care about instructions that would have failed to run in both
> scenarios (emulated or single-stepped). I'm not sure if there are other
> cases in which an instruction, although supported by the emulator, would
> fail emulation but pass single-stepping.

Is there further action required on our part to get this patch
committed? FWIW, while not ideal, from our perspective trying to
single-step an instruction that was UNHANDLEABLE when attempting to
emulate it is acceptable (even if UNHANDLEABLE doesn't mean that the
instruction is not supported by the emulator). At worst it won't run
when single-stepped either, and that'll be that.
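
For concreteness, a rough sketch (ours, not the patch itself) of the
shape such a change could take in the emulation-failure switch; the
enablement predicate and the event reason value are assumptions, while
monitor_traps() and hvm_inject_hw_exception() are real helpers:

case X86EMUL_UNHANDLEABLE:
    if ( monitor_emul_unhandleable_enabled(curr->domain) ) /* hypothetical */
    {
        vm_event_request_t req = {
            .reason = VM_EVENT_REASON_EMUL_UNHANDLEABLE,   /* hypothetical */
            .vcpu_id = curr->vcpu_id,
        };

        /* Pause the vcpu and let the monitor decide (e.g. altp2m
         * single-step) instead of injecting #UD into the guest. */
        monitor_traps(curr, /* sync */ true, &req);
    }
    else
        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
    break;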


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 22/25 v6] xen/arm: vpl011: Add support for vuart console in xenconsole

2017-07-25 Thread Stefano Stabellini
On Tue, 25 Jul 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 07/21/2017 08:44 PM, Stefano Stabellini wrote:
> > On Fri, 21 Jul 2017, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 18/07/17 21:07, Stefano Stabellini wrote:
> > > > On Mon, 17 Jul 2017, Bhupinder Thakur wrote:
> > > > > This patch finally adds the support for vuart console. It adds
> > > > > two new fields in the console initialization:
> > > > > 
> > > > > - optional
> > > > > - prefer_gnttab
> > > > > 
> > > > > optional flag tells whether the console is optional.
> > > > > 
> > > > > prefer_gnttab tells whether the ring buffer should be allocated using
> > > > > grant table.
> > > > > 
> > > > > Signed-off-by: Bhupinder Thakur 
> > > > > ---
> > > > > CC: Ian Jackson 
> > > > > CC: Wei Liu 
> > > > > CC: Stefano Stabellini 
> > > > > CC: Julien Grall 
> > > > > 
> > > > > Changes since v4:
> > > > > - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the
> > > > > convention.
> > > > > 
> > > > >   config/arm32.mk   |  1 +
> > > > >   config/arm64.mk   |  1 +
> > > > >   tools/console/Makefile|  3 ++-
> > > > >   tools/console/daemon/io.c | 29 -
> > > > >   4 files changed, 32 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/config/arm32.mk b/config/arm32.mk
> > > > > index f95228e..b9f23fe 100644
> > > > > --- a/config/arm32.mk
> > > > > +++ b/config/arm32.mk
> > > > > @@ -1,5 +1,6 @@
> > > > >   CONFIG_ARM := y
> > > > >   CONFIG_ARM_32 := y
> > > > > +CONFIG_VUART_CONSOLE := y
> > > > >   CONFIG_ARM_$(XEN_OS) := y
> > > > > 
> > > > >   CONFIG_XEN_INSTALL_SUFFIX :=
> > > > 
> > > > What about leaving this off for ARM32 by default?
> > > 
> > > Why? This will only disable xenconsole changes and not the hypervisor. The
> > > changes are quite tiny, so I would even be in favor of enabling for all
> > > architectures.
> > > 
> > > Or are you suggesting to disable the VPL011 emulation in the hypervisor?
> > > But I
> > > don't see the emulation AArch64 specific, and a user could disable it if
> > > he
> > > doesn't want it...
> > 
> > I was thinking that the virtual pl011 is mostly useful for SBSA
> > compliance, which doesn't really apply to ARM32 (there are no ARM32 SBSA
> > compliant platforms as far as I am aware).
> > 
> > Given that we don't need vpl011 on ARM32, I thought we might as well
> > disable it. Less code the better. I wouldn't go as far as introducing
> > more #ifdefs to disable it, but I would make use of the existing config
> > options to turn it off by default on ARM32. Does it make sense?
> > 
> > That said, you are right that there is no point in disabling only
> > CONFIG_VUART_CONSOLE, which affects the tools only. We should really
> > disable SBSA_VUART_CONSOLE by default on ARM32. In fact, ideally
> > CONFIG_VUART_CONSOLE would be set dependning on the value of
> > SBSA_VUART_CONSOLE. What do you think?
> 
> Kconfig is only targeting the hypervisor and this is the tools. It is possible
> to build the tools separately from the hypervisor and therefore .config would
> not be generated.
> 
> Therefore your suggestion cannot work at the moment.

It's incredible this problem hasn't been solved yet. I guess the right
way of fixing it would be to generate the Xen .config even when doing
"make tools".


> However, imposing an
> #ifdef to require some work to support correctly for 29 lines does not seem
> very warranted.

Yes, I don't want to feature-creep it.


> So I think we should keep on by default.

OK

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 04/25 v6] xen/arm: vpl011: Add support for vuart in libxl

2017-07-25 Thread Bhupinder Thakur
Hi,

On 18 July 2017 at 17:00, Wei Liu  wrote:
> CC x86 maintainers
>
> On Tue, Jul 18, 2017 at 12:19:19PM +0100, Julien Grall wrote:
>> >
>> >  ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
>> > +   ("vuart", libxl_vuart_type),
>>
>> ... here it is ARM specific. I am not convinced that we should tie vuart to
>> ARM only. I cannot see why x86 would not be able to use it in the future.
>> Any opinions?
>
> I don't know. I asked Bhupinder to put it here because it looked
> arm-specific to me. I will let the x86 maintainers decide whether they
> want such a thing.

What is the decision on this?

I believe that since most of the vuart code added in libxl is
arch-agnostic, it should be fine to keep libxl_vuart_type as a generic
type.

Regards,
Bhupinder

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()

2017-07-25 Thread Dario Faggioli
On Tue, 2017-07-25 at 17:52 +0100, George Dunlap wrote:
> On 07/25/2017 05:47 PM, Dario Faggioli wrote:
> > 
> > All that being said, it probably would be good to add a performance
> > counter, and try to get a sense of how frequently we actually end
> > up in
> > this function as a fallback.
> > 
> > But in the meantime, yes, I'd try to make svc stay in the runqueue
> > where it is, in this case, if possible.
> 
> Sounds good.  So are you going to respin the series then?
> 
Yep, I'll rebase and respin this series. And then rebase the caps
series on top of this and respin it too.

Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap

2017-07-25 Thread Dario Faggioli
On Tue, 2017-07-25 at 15:34 +0100, George Dunlap wrote:
> On 06/29/2017 11:09 AM, Dario Faggioli wrote:
> > E.g., if the vcpu is "greedy", and always tries to run as soon as it
> > has some budget, the difference between the two solutions is _where_
> > we put the "sitting period".
> 
> Sorry, dropped this thread to prepare for XenSummit.
> 
NP.

> Well I think there's a principle a bit like Ockham's Razor: In general,
> if there's not an obvious advantage between two implementations, the
> simpler / easier to understand implementation is better.
> 
> My sense is also that in general, as long as context switching overhead
> doesn't become an issue, allowing a guest to run for shorter bursts more
> often is better than allowing it to run for longer bursts less often.  A
> cpu-bound task will perform about as well in both cases, but a
> latency-sensitive task will perform much worse if it's allowed to burn
> up its timeslice and forced to wait for a big chunk of time.
> 
That's actually a fair point.

> On the other hand, my intuitions about how to improve AFL fuzzing these
> last few weeks have turned out to be consistently wrong, so at the
> moment I'm not inclined to be overly confident in my intuition without
> some testing. :-)
> 
Well, this is a similar situation to the one in the other thread. It's
an edge case that, likely, will never or only very rarely occur. So, we
don't need to think too much about it, and we can even afford to
follow your intuition! :-D

Mmm... Maybe this came out as a bit less of a "cheering you up" than I
wanted it to be, but hey, I tried! :-P :-P :-P

No, jokes apart, I like your argument about the effect this may have on
latency-sensitive workloads, if they happen to suffer from one of these
overruns, so I will change the code the way you suggest. :-)
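
To make the latency argument concrete, a small standalone illustration
(the numbers are invented for the example, not taken from the series): a
greedy, capped vcpu that burns its whole budget in one burst must then sit
out the rest of the replenishment period, so the period length directly
bounds the worst-case gap:

#include <stdio.h>

/* Illustrative only: worst-case off-CPU gap for a "greedy" capped vcpu
 * that exhausts its whole budget in a single burst and must then sit
 * out the remainder of the replenishment period. */
static double worst_gap_ms(double cap, double period_ms)
{
    return (1.0 - cap) * period_ms;
}

int main(void)
{
    /* Same 25% cap; only the replenishment period differs. */
    printf("period  10ms -> worst gap %.1f ms\n", worst_gap_ms(0.25, 10.0));
    printf("period 100ms -> worst gap %.1f ms\n", worst_gap_ms(0.25, 100.0));
    return 0; /* 7.5 ms vs. 75.0 ms: shorter, more frequent bursts win */
}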

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 02/13] iommu: Add extra order argument to the IOMMU APIs and platform callbacks

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Replace the existing single-page stuff (IOMMU APIs and platform callbacks)
with multi-page ones, followed by modifications of all related parts.

These new map_pages/unmap_pages APIs do almost the same thing
as the old map_page/unmap_page ones, except that the former take an extra
order argument and as a result can handle a number of pages.
The same goes for the new platform callbacks.

Although the current behavior was retained in all places (I hope),
it should be noted that the rollback logic was moved from the common code
to the IOMMU drivers. The IOMMU drivers are now responsible for unmapping
already-mapped pages if something goes wrong while mapping a number
of pages (order > 0).

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Jan Beulich  (only for x86 and generic parts)
CC: Julien Grall 
CC: Kevin Tian 
CC: Suravee Suthikulpanit 
CC: Andrew Cooper 
CC: George Dunlap 
CC: Ian Jackson 
CC: Konrad Rzeszutek Wilk 
CC: Stefano Stabellini 
CC: Tim Deegan 
CC: Wei Liu 

---
   Changes in v1:
  - Replace existing single-page IOMMU APIs/platform callbacks with
multi-page ones instead of just keeping both variants of them.
  - Use order argument instead of page_count.
  - Clarify patch subject/description.

   Changes in v2:
  - Add maintainers in CC
---
 xen/arch/x86/mm.c | 11 +++---
 xen/arch/x86/mm/p2m-ept.c | 21 ++-
 xen/arch/x86/mm/p2m-pt.c  | 26 +++---
 xen/arch/x86/mm/p2m.c | 38 
 xen/arch/x86/x86_64/mm.c  |  5 +--
 xen/common/grant_table.c  | 10 +++---
 xen/drivers/passthrough/amd/iommu_map.c   | 50 +--
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |  8 ++---
 xen/drivers/passthrough/arm/smmu.c| 41 --
 xen/drivers/passthrough/iommu.c   | 21 +--
 xen/drivers/passthrough/vtd/iommu.c   | 48 +++--
 xen/drivers/passthrough/vtd/x86/vtd.c |  4 +--
 xen/drivers/passthrough/x86/iommu.c   |  6 ++--
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  8 +++--
 xen/include/xen/iommu.h   | 20 ++-
 15 files changed, 196 insertions(+), 121 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2dc7db9..33fcffe 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2623,11 +2623,14 @@ static int __get_page_type(struct page_info *page, 
unsigned long type,
 if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
 {
 if ( (x & PGT_type_mask) == PGT_writable_page )
-iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, 
page_to_mfn(page)));
+iommu_ret = iommu_unmap_pages(d,
+  mfn_to_gmfn(d, 
page_to_mfn(page)),
+  0);
 else if ( type == PGT_writable_page )
-iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, 
page_to_mfn(page)),
-   page_to_mfn(page),
-   IOMMUF_readable|IOMMUF_writable);
+iommu_ret = iommu_map_pages(d,
+mfn_to_gmfn(d, page_to_mfn(page)),
+page_to_mfn(page), 0,
+IOMMUF_readable|IOMMUF_writable);
 }
 }
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index ecab56f..0ccf451 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -870,26 +870,9 @@ out:
 else
 {
 if ( iommu_flags )
-for ( i = 0; i < (1 << order); i++ )
-{
-rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, 
iommu_flags);
-if ( unlikely(rc) )
-{
-while ( i-- )
-/* If statement to satisfy __must_check. */
-if ( iommu_unmap_page(p2m->domain, gfn + i) )
-continue;
-
-break;
-}
-}
+rc = iommu_map_pages(d, gfn, mfn_x(mfn), order, iommu_flags);
 else
-for ( i = 0; i < (1 << order); i++ )
-{
-ret = iommu_unmap_page(d, gfn + i);
-if ( !rc )
-rc = ret;
-}
+rc = iommu_unmap_pages(d, gfn, order);
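
The diff breaks off above; the rollback responsibility described in the
commit message (drivers undoing a partially completed multi-page mapping)
follows the general pattern below. A standalone sketch with hypothetical
per-page helpers, not the actual Xen code:

#include <stdio.h>

/* Hypothetical stand-ins; not the Xen types or the real map helpers. */
struct domain { int id; };

static int map_one_page(struct domain *d, unsigned long gfn,
                        unsigned long mfn, unsigned int flags)
{
    (void)d; (void)mfn; (void)flags;
    return (gfn == 5) ? -1 : 0;   /* pretend gfn 5 fails, to show rollback */
}

static void unmap_one_page(struct domain *d, unsigned long gfn)
{
    (void)d;
    printf("rolled back gfn %lu\n", gfn);
}

/* The rollback pattern: map each 4K page in the 2^order range; on
 * failure, unmap everything mapped so far and return the error. */
static int map_pages_sketch(struct domain *d, unsigned long gfn,
                            unsigned long mfn, unsigned int order,
                            unsigned int flags)
{
    unsigned long i;
    int rc = 0;

    for ( i = 0; i < (1UL << order); i++ )
    {
        rc = map_one_page(d, gfn + i, mfn + i, flags);
        if ( rc )
        {
            while ( i-- )
                unmap_one_page(d, gfn + i);
            break;
        }
    }

    return rc;
}

int main(void)
{
    struct domain d = { 0 };
    return map_pages_sketch(&d, 0, 0, 3, 0) ? 1 : 0; /* order 3 = 8 pages */
}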
  

[Xen-devel] [PATCH v2 06/13] iommu: Add extra use_iommu argument to iommu_domain_init()

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

The presence of this flag lets us know that the guest domain has statically
assigned devices which will most likely be used for passthrough, and as a
result the IOMMU is expected to be used for this domain.

Taking this hint into account when dealing with non-shared IOMMUs, we can
populate the IOMMU page tables beforehand and avoid going through the list
of pages when the first device is assigned.
As this flag doesn't cover the hotplug case, we will continue to populate
IOMMU page tables on the fly there.

Extend the corresponding platform callback with an extra argument as well,
pass the incoming flag through to the IOMMU drivers, and update the
"d->need_iommu" flag for all domains. For hardware domains, however, some
additional logic is needed before updating this flag, which the next patch
introduces.

As iommu_domain_init() is called with the "use_iommu" flag forced to false
for now, no functional change is intended on either ARM or x86.

Signed-off-by: Oleksandr Tyshchenko 
CC: Jan Beulich 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Andrew Cooper 
CC: Kevin Tian 
CC: Suravee Suthikulpanit 

---
   Changes in v1:
  - Clarify patch subject/description.
  - s/bool_t/bool/

   Changes in v2:
  - Extend "init" callback with extra argument too.
  - Clarify patch description.
  - Add maintainers in CC
---
 xen/arch/arm/domain.c   |  2 +-
 xen/arch/x86/domain.c   |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  2 +-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/iommu.c | 10 --
 xen/drivers/passthrough/vtd/iommu.c |  2 +-
 xen/include/xen/iommu.h |  4 ++--
 7 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 76310ed..ec19310 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -569,7 +569,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 ASSERT(config != NULL);
 
 /* p2m_init relies on some value initialized by the IOMMU subsystem */
-if ( (rc = iommu_domain_init(d)) != 0 )
+if ( (rc = iommu_domain_init(d, false)) != 0 )
 goto fail;
 
 if ( (rc = p2m_init(d)) != 0 )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d7e6992..1ffe76c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -641,7 +641,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 if ( (rc = init_domain_irq_mapping(d)) != 0 )
 goto fail;
 
-if ( (rc = iommu_domain_init(d)) != 0 )
+if ( (rc = iommu_domain_init(d, false)) != 0 )
 goto fail;
 }
 spin_lock_init(>arch.e820_lock);
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index fe744d2..2491e8c 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -261,7 +261,7 @@ static int get_paging_mode(unsigned long entries)
 return level;
 }
 
-static int amd_iommu_domain_init(struct domain *d)
+static int amd_iommu_domain_init(struct domain *d, bool use_iommu)
 {
 struct domain_iommu *hd = dom_iommu(d);
 
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index e828308..652b58c 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2705,7 +2705,7 @@ static int arm_smmu_reassign_dev(struct domain *s, struct 
domain *t,
return 0;
 }
 
-static int arm_smmu_iommu_domain_init(struct domain *d)
+static int arm_smmu_iommu_domain_init(struct domain *d, bool use_iommu)
 {
struct arm_smmu_xen_domain *xen_domain;
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 3e9e4c3..19c87d1 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -129,7 +129,7 @@ static void __init parse_iommu_param(char *s)
 } while ( ss );
 }
 
-int iommu_domain_init(struct domain *d)
+int iommu_domain_init(struct domain *d, bool use_iommu)
 {
 struct domain_iommu *hd = dom_iommu(d);
 int ret = 0;
@@ -142,7 +142,13 @@ int iommu_domain_init(struct domain *d)
 return 0;
 
 hd->platform_ops = iommu_get_ops();
-return hd->platform_ops->init(d);
+ret = hd->platform_ops->init(d, use_iommu);
+if ( ret )
+return ret;
+
+d->need_iommu = use_iommu;
+
+return 0;
 }
 
 static void __hwdom_init check_hwdom_reqs(struct domain *d)
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index b4e8c89..45d1f36 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1277,7 +1277,7 
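[The VT-d hunk is truncated in the archive; judging by the diffstat
(vtd/iommu.c | 2 +-) it presumably makes the analogous one-line signature
change to the VT-d init callback. The net flow the patch sets up can be
sketched as follows; stub types stand in for the real Xen ones.]

#include <stdbool.h>
#include <stdio.h>

/* Stub type; the real code operates on Xen's struct domain. */
struct domain { bool need_iommu; };

/* Per-driver init callback, now taking the hint too (cf. the AMD and
 * SMMU hunks above). */
static int driver_init(struct domain *d, bool use_iommu)
{
    (void)d;
    printf("driver init, use_iommu=%d\n", use_iommu);
    return 0;
}

static int iommu_domain_init(struct domain *d, bool use_iommu)
{
    int ret = driver_init(d, use_iommu);

    if ( ret )
        return ret;

    /* Latched early, so no IOMMU mapping updates get skipped later. */
    d->need_iommu = use_iommu;
    return 0;
}

int main(void)
{
    struct domain d = { false };

    /* Forced to false for now, hence no functional change (see above). */
    return iommu_domain_init(&d, false);
}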

[Xen-devel] [PATCH v2 13/13] [RFC] iommu: AMD-Vi: Squash map_pages/unmap_pages with map_page/unmap_page

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Reduce the scope of the TODO by squashing the single-page stuff with the
multi-page one. The next target is to use large pages whenever possible,
in case the hardware supports them.

Signed-off-by: Oleksandr Tyshchenko 
CC: Jan Beulich 
CC: Suravee Suthikulpanit 

---
   Changes in v1:
  -

   Changes in v2:
  -

Signed-off-by: Oleksandr Tyshchenko 
---
 xen/drivers/passthrough/amd/iommu_map.c | 250 
 1 file changed, 121 insertions(+), 129 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index ea3a728..22d0cc6 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -631,188 +631,180 @@ static int update_paging_mode(struct domain *d, 
unsigned long gfn)
 return 0;
 }
 
-static int __must_check amd_iommu_map_page(struct domain *d, unsigned long gfn,
-   unsigned long mfn,
-   unsigned int flags)
+/*
+ * TODO: Optimize by using large pages whenever possible in the case
+ * that hardware supports them.
+ */
+int __must_check amd_iommu_map_pages(struct domain *d, unsigned long gfn,
+ unsigned long mfn,
+ unsigned int order,
+ unsigned int flags)
 {
-bool_t need_flush = 0;
 struct domain_iommu *hd = dom_iommu(d);
 int rc;
-unsigned long pt_mfn[7];
-unsigned int merge_level;
+unsigned long orig_gfn = gfn;
+unsigned long i;
 
 if ( iommu_use_hap_pt(d) )
 return 0;
 
-memset(pt_mfn, 0, sizeof(pt_mfn));
-
 spin_lock(>arch.mapping_lock);
-
 rc = amd_iommu_alloc_root(hd);
+spin_unlock(>arch.mapping_lock);
 if ( rc )
 {
-spin_unlock(>arch.mapping_lock);
 AMD_IOMMU_DEBUG("Root table alloc failed, gfn = %lx\n", gfn);
 domain_crash(d);
 return rc;
 }
 
-/* Since HVM domain is initialized with 2 level IO page table,
- * we might need a deeper page table for lager gfn now */
-if ( is_hvm_domain(d) )
+for ( i = 0; i < (1UL << order); i++, gfn++, mfn++ )
 {
-if ( update_paging_mode(d, gfn) )
+bool_t need_flush = 0;
+unsigned long pt_mfn[7];
+unsigned int merge_level;
+
+memset(pt_mfn, 0, sizeof(pt_mfn));
+
+spin_lock(>arch.mapping_lock);
+
+/* Since HVM domain is initialized with 2 level IO page table,
+ * we might need a deeper page table for lager gfn now */
+if ( is_hvm_domain(d) )
+{
+if ( update_paging_mode(d, gfn) )
+{
+spin_unlock(>arch.mapping_lock);
+AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+domain_crash(d);
+rc = -EFAULT;
+goto err;
+}
+}
+
+if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
 {
 spin_unlock(>arch.mapping_lock);
-AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
 domain_crash(d);
-return -EFAULT;
+rc = -EFAULT;
+goto err;
 }
-}
 
-if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
-{
-spin_unlock(>arch.mapping_lock);
-AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
-domain_crash(d);
-return -EFAULT;
-}
+/* Install 4k mapping first */
+need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn,
+   IOMMU_PAGING_MODE_LEVEL_1,
+   !!(flags & IOMMUF_writable),
+   !!(flags & IOMMUF_readable));
 
-/* Install 4k mapping first */
-need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn, 
-   IOMMU_PAGING_MODE_LEVEL_1,
-   !!(flags & IOMMUF_writable),
-   !!(flags & IOMMUF_readable));
+/* Do not increase pde count if io mapping has not been changed */
+if ( !need_flush )
+{
+spin_unlock(>arch.mapping_lock);
+continue;
+}
 
-/* Do not increase pde count if io mapping has not been changed */
-if ( !need_flush )
-goto out;
+/* 4K mapping for PV guests never changes,
+ * no need to flush if we trust non-present bits */
+if ( is_hvm_domain(d) )
+amd_iommu_flush_pages(d, gfn, 0);
 
-/* 4K mapping for PV guests never changes, 
- * no need to flush if we trust 

[Xen-devel] [PATCH v2 04/13] xen/arm: p2m: Update IOMMU mapping whenever possible if page table is not shared

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Update the IOMMU mapping if the IOMMU doesn't share the page table with the
CPU. The best place to do so on ARM is __p2m_set_entry(). Use mfn as an
indicator of the required action: if mfn is valid, call iommu_map_pages();
otherwise, call iommu_unmap_pages().

Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Julien Grall 

---
   Changes in v1:
  - Update IOMMU mapping in __p2m_set_entry() instead of p2m_set_entry().
  - Pass order argument to IOMMU APIs instead of page_count.

   Changes in v2:
  - Add Julien's acked-by
---
 xen/arch/arm/p2m.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index fc2a106..729ed94 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -984,7 +984,15 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
 p2m_free_entry(p2m, orig_pte, level);
 
 if ( need_iommu(p2m->domain) && (p2m_valid(orig_pte) || p2m_valid(*entry)) 
)
-rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order);
+{
+if ( iommu_use_hap_pt(p2m->domain) )
+rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << 
page_order);
+else if ( !mfn_eq(smfn, INVALID_MFN) )
+rc = iommu_map_pages(p2m->domain, gfn_x(sgfn), mfn_x(smfn),
+ page_order, p2m_get_iommu_flags(t));
+else
+rc = iommu_unmap_pages(p2m->domain, gfn_x(sgfn), page_order);
+}
 else
 rc = 0;
 
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 05/13] iommu/arm: Re-define iommu_use_hap_pt(d) as iommu_hap_pt_share

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Not every IOMMU integrated into an ARM SoC can share page tables
with the CPU, and as a result iommu_use_hap_pt(d) must not
always be true.
Reuse x86's iommu_hap_pt_share flag to indicate whether the IOMMU
page table is shared or not.

As the P2M table must always be shared between the CPU and the SMMU,
print an error message and bail out if this flag was previously unset.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 

---
   Changes in v1:
  -

   Changes in v2:
  - Bail out if iommu_hap_pt_share is unset instead of overriding it
  - Clarify comment in code
---
 xen/drivers/passthrough/arm/smmu.c | 6 ++
 xen/include/asm-arm/iommu.h| 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 7c313c0..e828308 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2856,6 +2856,12 @@ static __init int arm_smmu_dt_init(struct dt_device_node 
*dev,
 */
dt_device_set_used_by(dev, DOMID_XEN);
 
+   if (!iommu_hap_pt_share) {
+   dev_err(dt_to_dev(dev),
+   "P2M table must always be shared between the CPU and 
the SMMU\n");
+   return -EINVAL;
+   }
+
rc = arm_smmu_device_dt_probe(dev);
if (rc)
return rc;
diff --git a/xen/include/asm-arm/iommu.h b/xen/include/asm-arm/iommu.h
index 57d9b1e..2a6bd3d 100644
--- a/xen/include/asm-arm/iommu.h
+++ b/xen/include/asm-arm/iommu.h
@@ -20,8 +20,8 @@ struct arch_iommu
 void *priv;
 };
 
-/* Always share P2M Table between the CPU and the IOMMU */
-#define iommu_use_hap_pt(d) (1)
+/* Not every ARM SoCs IOMMU use the same page-table format as the CPU. */
+#define iommu_use_hap_pt(d) (iommu_hap_pt_share)
 
 const struct iommu_ops *iommu_get_ops(void);
 void __init iommu_set_ops(const struct iommu_ops *ops);
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 09/13] xen/arm: Add use_iommu flag to xen_arch_domainconfig

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

This flag is intended to let Xen know that the guest has devices
which will most likely be used for passthrough, and as a result
the IOMMU is expected to be used for this domain.

The primary aim of this knowledge is to help IOMMUs that don't
share page tables with the CPU on ARM be ready before the P2M code starts
updating the IOMMU mapping.
So, if this flag is set, the non-shared IOMMUs will populate
their page tables at domain creation time and will thereby be able
to handle IOMMU mapping updates from *the very beginning*.

In order to retain the current behavior on x86, iommu_domain_init()
is still called with the use_iommu flag forced to false.

Signed-off-by: Oleksandr Tyshchenko 
CC: Jan Beulich 
CC: Julien Grall 
CC: Ian Jackson 
CC: Wei Liu 

---
   Changes in V1:
  - Treat use_iommu flag as the ARM decision only. Don't use
common domain creation flag for it, use ARM config instead.
  - Clarify patch subject/description.

   Changes in V2:
  - Cosmetic fixes.
---
 tools/libxl/libxl_arm.c   | 8 
 xen/arch/arm/domain.c | 2 +-
 xen/include/public/arch-arm.h | 5 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index d842d88..cb9fe05 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -78,6 +78,14 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 return ERROR_FAIL;
 }
 
+if (d_config->num_dtdevs || d_config->num_pcidevs)
+xc_config->use_iommu = 1;
+else
+xc_config->use_iommu = 0;
+
+LOG(DEBUG, "IOMMU %s expected to be used for this domain",
+xc_config->use_iommu ? "is" : "isn't");
+
 return 0;
 }
 
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index ec19310..3079bbe 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -569,7 +569,7 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 ASSERT(config != NULL);
 
 /* p2m_init relies on some value initialized by the IOMMU subsystem */
-if ( (rc = iommu_domain_init(d, false)) != 0 )
+if ( (rc = iommu_domain_init(d, !!config->use_iommu)) != 0 )
 goto fail;
 
 if ( (rc = p2m_init(d)) != 0 )
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index bd974fb..b1fae45 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -322,6 +322,11 @@ struct xen_arch_domainconfig {
  *
  */
 uint32_t clock_frequency;
+/*
+ * IN
+ * IOMMU is expected to be used for this domain.
+ */
+uint8_t use_iommu;
 };
 #endif /* __XEN__ || __XEN_TOOLS__ */
 
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 03/13] xen/arm: p2m: Add helper to convert p2m type to IOMMU flags

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

The helper has the same purpose as existing for x86 one.
It is used for choosing IOMMU mapping attribute according to
the memory type.

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Julien Grall 

---
   Changes in v1:
  - Add Julien's reviewed-by

   Changes in v2:
  -
---
 xen/include/asm-arm/p2m.h | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 1269052..635cc25 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include  /* for vm_event_response_t */
 #include 
 #include 
@@ -353,6 +354,39 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned 
int order)
 return gfn_add(gfn, 1UL << order);
 }
 
+/*
+ * p2m type to IOMMU flags
+ */
+static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
+{
+unsigned int flags;
+
+switch( p2mt )
+{
+case p2m_ram_rw:
+case p2m_iommu_map_rw:
+case p2m_map_foreign:
+case p2m_grant_map_rw:
+case p2m_mmio_direct_dev:
+case p2m_mmio_direct_nc:
+case p2m_mmio_direct_c:
+flags = IOMMUF_readable | IOMMUF_writable;
+break;
+case p2m_ram_ro:
+case p2m_iommu_map_ro:
+case p2m_grant_map_ro:
+flags = IOMMUF_readable;
+break;
+default:
+flags = 0;
+break;
+}
+
+/* TODO Do we need to handle access permissions here? */
+
+return flags;
+}
+
 #endif /* _XEN_P2M_H */
 
 /*
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 08/13] iommu/arm: Misc fixes for arch specific part

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

1. Add a missing return in case the IOMMU ops have already been set.
2. Add a check for a shared IOMMU before returning an error.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 

---
   Changes in V1:
  -

   Changes in V2:
  -
---
 xen/drivers/passthrough/arm/iommu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
index 95b1abb..6f01c13 100644
--- a/xen/drivers/passthrough/arm/iommu.c
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -32,7 +32,10 @@ void __init iommu_set_ops(const struct iommu_ops *ops)
 BUG_ON(ops == NULL);
 
 if ( iommu_ops && iommu_ops != ops )
+{
 printk("WARNING: Cannot set IOMMU ops, already set to a different 
value\n");
+return;
+}
 
 iommu_ops = ops;
 }
@@ -70,6 +73,6 @@ void arch_iommu_domain_destroy(struct domain *d)
 
 int arch_iommu_populate_page_table(struct domain *d)
 {
-/* The IOMMU shares the p2m with the CPU */
-return -ENOSYS;
+/* Return an error if the IOMMU shares the p2m with the CPU */
+return iommu_use_hap_pt(d) ? -ENOSYS : 0;
 }
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 11/13] iommu/arm: smmu: Squash map_pages/unmap_pages with map_page/unmap_page

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Eliminate the TODO by squashing the single-page stuff with the multi-page one.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 

---
   Changes in v1:
  -

   Changes in v2:
  -
---
 xen/drivers/passthrough/arm/smmu.c | 48 +-
 1 file changed, 6 insertions(+), 42 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 652b58c..021031a 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2737,8 +2737,8 @@ static void arm_smmu_iommu_domain_teardown(struct domain 
*d)
xfree(xen_domain);
 }
 
-static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn,
-   unsigned long mfn, unsigned int flags)
+static int __must_check arm_smmu_map_pages(struct domain *d, unsigned long gfn,
+   unsigned long mfn, unsigned int order, unsigned int 
flags)
 {
p2m_type_t t;
 
@@ -2763,10 +2763,11 @@ static int __must_check arm_smmu_map_page(struct domain 
*d, unsigned long gfn,
 * The function guest_physmap_add_entry replaces the current mapping
 * if there is already one...
 */
-   return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), 0, t);
+   return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), order, t);
 }
 
-static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long 
gfn)
+static int __must_check arm_smmu_unmap_pages(struct domain *d,
+   unsigned long gfn, unsigned int order)
 {
/*
 * This function should only be used by gnttab code when the domain
@@ -2775,44 +2776,7 @@ static int __must_check arm_smmu_unmap_page(struct 
domain *d, unsigned long gfn)
if ( !is_domain_direct_mapped(d) )
return -EINVAL;
 
-   return guest_physmap_remove_page(d, _gfn(gfn), _mfn(gfn), 0);
-}
-
-/* TODO: Optimize by squashing map_pages/unmap_pages with map_page/unmap_page 
*/
-static int __must_check arm_smmu_map_pages(struct domain *d, unsigned long gfn,
-   unsigned long mfn, unsigned int order, unsigned int flags)
-{
-   unsigned long i;
-   int rc = 0;
-
-   for (i = 0; i < (1UL << order); i++) {
-   rc = arm_smmu_map_page(d, gfn + i, mfn + i, flags);
-   if (unlikely(rc)) {
-   while (i--)
-   /* If statement to satisfy __must_check. */
-   if (arm_smmu_unmap_page(d, gfn + i))
-   continue;
-
-   break;
-   }
-   }
-
-   return rc;
-}
-
-static int __must_check arm_smmu_unmap_pages(struct domain *d,
-   unsigned long gfn, unsigned int order)
-{
-   unsigned long i;
-   int rc = 0;
-
-   for (i = 0; i < (1UL << order); i++) {
-   int ret = arm_smmu_unmap_page(d, gfn + i);
-   if (!rc)
-   rc = ret;
-   }
-
-   return rc;
+   return guest_physmap_remove_page(d, _gfn(gfn), _mfn(gfn), order);
 }
 
 static const struct iommu_ops arm_smmu_iommu_ops = {
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 07/13] iommu: Make decision about needing IOMMU for hardware domains in advance

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Hardware domains require the IOMMU to be used in most cases, and the
decision to use it is made at hardware domain construction time.
But that is not the best moment for non-shared IOMMUs, due to
the necessity of retrieving all the mappings which could happen in the
period of time between per-domain IOMMU initialization and that moment.

So, make the decision about needing the IOMMU a bit earlier, in
iommu_domain_init(). With the "d->need_iommu" flag set at this early stage,
we won't skip any IOMMU mapping updates. As a result, the existing code in
iommu_hwdom_init() that goes through the list of pages and tries to retrieve
the mappings for non-shared IOMMUs won't be needed anymore and can simply
be dropped.

Signed-off-by: Oleksandr Tyshchenko 
CC: Jan Beulich 
CC: Julien Grall 

---
Changes in v1:
   -

Changes in v2:
   - This is the result of reworking old patch:
 [PATCH v1 08/10] iommu: Split iommu_hwdom_init() into arch specific parts
---
 xen/drivers/passthrough/iommu.c | 44 ++---
 1 file changed, 10 insertions(+), 34 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 19c87d1..f5e5b7e 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -52,7 +52,7 @@ custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
 bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
-bool_t __hwdom_initdata iommu_dom0_strict;
+bool_t __read_mostly iommu_dom0_strict;
 bool_t __read_mostly iommu_verbose;
 bool_t __read_mostly iommu_workaround_bios_bug;
 bool_t __read_mostly iommu_igfx = 1;
@@ -141,6 +141,15 @@ int iommu_domain_init(struct domain *d, bool use_iommu)
 if ( !iommu_enabled )
 return 0;
 
+if ( is_hardware_domain(d) )
+{
+if ( (paging_mode_translate(d) && !iommu_passthrough) ||
+  iommu_dom0_strict )
+use_iommu = 1;
+else
+use_iommu = 0;
+}
+
 hd->platform_ops = iommu_get_ops();
 ret = hd->platform_ops->init(d, use_iommu);
 if ( ret )
@@ -161,8 +170,6 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d)
 if ( iommu_passthrough )
 panic("Dom0 uses paging translated mode, dom0-passthrough must not be "
   "enabled\n");
-
-iommu_dom0_strict = 1;
 }
 
 void __hwdom_init iommu_hwdom_init(struct domain *d)
@@ -175,37 +182,6 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
 return;
 
 register_keyhandler('o', _dump_p2m_table, "dump iommu p2m table", 0);
-d->need_iommu = !!iommu_dom0_strict;
-if ( need_iommu(d) && !iommu_use_hap_pt(d) )
-{
-struct page_info *page;
-unsigned int i = 0;
-int rc = 0;
-
-page_list_for_each ( page, >page_list )
-{
-unsigned long mfn = page_to_mfn(page);
-unsigned long gfn = mfn_to_gmfn(d, mfn);
-unsigned int mapping = IOMMUF_readable;
-int ret;
-
-if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
- ((page->u.inuse.type_info & PGT_type_mask)
-  == PGT_writable_page) )
-mapping |= IOMMUF_writable;
-
-ret = hd->platform_ops->map_pages(d, gfn, mfn, 0, mapping);
-if ( !rc )
-rc = ret;
-
-if ( !(i++ & 0xf) )
-process_pending_softirqs();
-}
-
-if ( rc )
-printk(XENLOG_WARNING "d%d: IOMMU mapping failed: %d\n",
-   d->domain_id, rc);
-}
 
 return hd->platform_ops->hwdom_init(d);
 }
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 10/13] xen/arm: domain_build: Don't expose IOMMU specific properties to the guest

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

We don't pass the IOMMU device through to DOM0 even if it is not used by
Xen. Therefore, exposing the properties that describe the relationship
between master devices and IOMMUs does not make any sense.

According to the:
1. Documentation/devicetree/bindings/iommu/iommu.txt
2. Documentation/devicetree/bindings/pci/pci-iommu.txt

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 

---
   Changes in v1:
  -

   Changes in v2:
  - Skip optional properties too.
  - Clarify patch description
---
 xen/arch/arm/domain_build.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 3abacc0..fadfbbc 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -432,6 +432,16 @@ static int write_properties(struct domain *d, struct 
kernel_info *kinfo,
 continue;
 }
 
+/* Don't expose IOMMU specific properties to the guest */
+if ( dt_property_name_is_equal(prop, "iommus") )
+continue;
+
+if ( dt_property_name_is_equal(prop, "iommu-map") )
+continue;
+
+if ( dt_property_name_is_equal(prop, "iommu-map-mask") )
+continue;
+
 res = fdt_property(kinfo->fdt, prop->name, prop_data, prop_len);
 
 if ( res )
-- 
2.7.4
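
A small design note on the patch above: the three property checks could
equally be table-driven, which keeps the skip list in one place if more
IOMMU bindings appear later. A sketch only; the property names come from
the patch, the helper is hypothetical:

#include <stdbool.h>
#include <string.h>

/* Sketch only: a table-driven variant of the three checks above. */
static const char *const iommu_props[] = {
    "iommus",
    "iommu-map",
    "iommu-map-mask",
};

static bool is_iommu_property(const char *name)
{
    unsigned int i;

    for ( i = 0; i < sizeof(iommu_props) / sizeof(iommu_props[0]); i++ )
        if ( strcmp(name, iommu_props[i]) == 0 )
            return true;

    return false;
}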


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 01/13] xen/device-tree: Add dt_count_phandle_with_args helper

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Port the Linux helper of_count_phandle_with_args for counting the
number of phandles in a property.

Signed-off-by: Oleksandr Tyshchenko 
Reviewed-by: Julien Grall 

---
   Changes in v1:
  - Add Julien's reviewed-by

   Changes in v2:
  -
---
 xen/common/device_tree.c  |  7 +++
 xen/include/xen/device_tree.h | 19 +++
 2 files changed, 26 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 7b009ea..60b0095 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -1663,6 +1663,13 @@ int dt_parse_phandle_with_args(const struct 
dt_device_node *np,
 index, out_args);
 }
 
+int dt_count_phandle_with_args(const struct dt_device_node *np,
+   const char *list_name,
+   const char *cells_name)
+{
+return __dt_parse_phandle_with_args(np, list_name, cells_name, 0, -1, 
NULL);
+}
+
 /**
  * unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @fdt: The parent device tree blob
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 0aecbe0..738f1b6 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -764,6 +764,25 @@ int dt_parse_phandle_with_args(const struct dt_device_node 
*np,
const char *cells_name, int index,
struct dt_phandle_args *out_args);
 
+/**
+ * dt_count_phandle_with_args() - Find the number of phandles references in a 
property
+ * @np: pointer to a device tree node containing a list
+ * @list_name: property name that contains a list
+ * @cells_name: property name that specifies phandles' arguments count
+ *
+ * Returns the number of phandle + argument tuples within a property. It
+ * is a typical pattern to encode a list of phandle and variable
+ * arguments into a single property. The number of arguments is encoded
+ * by a property in the phandle-target node. For example, a gpios
+ * property would contain a list of GPIO specifies consisting of a
+ * phandle and 1 or more arguments. The number of arguments are
+ * determined by the #gpio-cells property in the node pointed to by the
+ * phandle.
+ */
+int dt_count_phandle_with_args(const struct dt_device_node *np,
+   const char *list_name,
+   const char *cells_name);
+
 #ifdef CONFIG_DEVICE_TREE_DEBUG
 #define dt_dprintk(fmt, args...)  \
 printk(XENLOG_DEBUG fmt, ## args)
-- 
2.7.4
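
A hypothetical usage sketch of the new helper, mirroring the
gpios/#gpio-cells example from the doc-comment above; the
"iommus"/"#iommu-cells" property pair is chosen for illustration and
error handling is elided:

/* Hypothetical caller: count how many IOMMU specifiers a device node
 * references through its "iommus" property. */
static int count_device_iommus(const struct dt_device_node *np)
{
    int n = dt_count_phandle_with_args(np, "iommus", "#iommu-cells");

    /* A negative value means no such property, or a malformed one. */
    return (n < 0) ? 0 : n;
}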


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 12/13] [RFC] iommu: VT-d: Squash map_pages/unmap_pages with map_page/unmap_page

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Reduce the scope of the TODO by squashing the single-page stuff with the
multi-page one. The next target is to use large pages whenever possible,
in case the hardware supports them.

Signed-off-by: Oleksandr Tyshchenko 
CC: Jan Beulich 
CC: Kevin Tian 

---
   Changes in v1:
  -

   Changes in v2:
  -
---
 xen/drivers/passthrough/vtd/iommu.c | 138 +---
 1 file changed, 67 insertions(+), 71 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 45d1f36..d20b2f9 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1750,15 +1750,24 @@ static void iommu_domain_teardown(struct domain *d)
 spin_unlock(>arch.mapping_lock);
 }
 
-static int __must_check intel_iommu_map_page(struct domain *d,
- unsigned long gfn,
- unsigned long mfn,
- unsigned int flags)
+static int __must_check intel_iommu_unmap_pages(struct domain *d,
+unsigned long gfn,
+unsigned int order);
+
+/*
+ * TODO: Optimize by using large pages whenever possible in the case
+ * that hardware supports them.
+ */
+static int __must_check intel_iommu_map_pages(struct domain *d,
+  unsigned long gfn,
+  unsigned long mfn,
+  unsigned int order,
+  unsigned int flags)
 {
 struct domain_iommu *hd = dom_iommu(d);
-struct dma_pte *page = NULL, *pte = NULL, old, new = { 0 };
-u64 pg_maddr;
 int rc = 0;
+unsigned long orig_gfn = gfn;
+unsigned long i;
 
 /* Do nothing if VT-d shares EPT page table */
 if ( iommu_use_hap_pt(d) )
@@ -1768,78 +1777,60 @@ static int __must_check intel_iommu_map_page(struct 
domain *d,
 if ( iommu_passthrough && is_hardware_domain(d) )
 return 0;
 
-spin_lock(>arch.mapping_lock);
-
-pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
-if ( pg_maddr == 0 )
+for ( i = 0; i < (1UL << order); i++, gfn++, mfn++ )
 {
-spin_unlock(>arch.mapping_lock);
-return -ENOMEM;
-}
-page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
-pte = page + (gfn & LEVEL_MASK);
-old = *pte;
-dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
-dma_set_pte_prot(new,
- ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
- ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
+struct dma_pte *page = NULL, *pte = NULL, old, new = { 0 };
+u64 pg_maddr;
 
-/* Set the SNP on leaf page table if Snoop Control available */
-if ( iommu_snoop )
-dma_set_pte_snp(new);
+spin_lock(>arch.mapping_lock);
 
-if ( old.val == new.val )
-{
+pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
+if ( pg_maddr == 0 )
+{
+spin_unlock(>arch.mapping_lock);
+rc = -ENOMEM;
+goto err;
+}
+page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
+pte = page + (gfn & LEVEL_MASK);
+old = *pte;
+dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
+dma_set_pte_prot(new,
+ ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
+ ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
+
+/* Set the SNP on leaf page table if Snoop Control available */
+if ( iommu_snoop )
+dma_set_pte_snp(new);
+
+if ( old.val == new.val )
+{
+spin_unlock(>arch.mapping_lock);
+unmap_vtd_domain_page(page);
+continue;
+}
+*pte = new;
+
+iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
 spin_unlock(>arch.mapping_lock);
 unmap_vtd_domain_page(page);
-return 0;
-}
-*pte = new;
-
-iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
-spin_unlock(>arch.mapping_lock);
-unmap_vtd_domain_page(page);
 
-if ( !this_cpu(iommu_dont_flush_iotlb) )
-rc = iommu_flush_iotlb(d, gfn, dma_pte_present(old), 1);
-
-return rc;
-}
-
-static int __must_check intel_iommu_unmap_page(struct domain *d,
-   unsigned long gfn)
-{
-/* Do nothing if hardware domain and iommu supports pass thru. */
-if ( iommu_passthrough && is_hardware_domain(d) )
-return 0;
-
-return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K);
-}
-
-/* TODO: Optimize by squashing map_pages/unmap_pages with 

[Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM

2017-07-25 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Hi, all.

The purpose of this patch series is to create a base for porting
any "non-shared" IOMMU to Xen on ARM. By "non-shared" IOMMU I mean
an IOMMU that can't share the page table with the CPU.
Primarily, we are interested in IPMMU-VMSA, and I hope that it will be
the first candidate. It is a VMSA-compatible IOMMU integrated in the
newest Renesas R-Car Gen3 SoCs (ARM).
I am about to push IPMMU-VMSA support in a while.

With regard to the patch series, it was rebased on the Xen 4.9.0 release
and tested on Renesas R-Car Gen3 H3/M3 based boards with IPMMU-VMSA
support applied:
- Patches 1 and 3 have Julien's Rb.
- Patch 2 has Jan's Rb but only for x86 and generic parts.
- Patch 4 has Julien's Ab.
- Patches 5,6,9,10 were slightly reworked.
- Patch 7 was significantly reworked; it replaces the previous patch
  "iommu: Split iommu_hwdom_init() into arch specific parts".
- Patches 8,11,12,13 are new.

I am not really sure about the x86-related changes since I had no way to
test them, so they are only compile-tested on x86.

You can find current patch series here:
repo: https://github.com/otyshchenko1/xen.git branch: non_shared_iommu_v2

Previous patch series here:
[PATCH v1 00/10] "Non-shared" IOMMU support on ARM
https://www.mail-archive.com/xen-devel@lists.xen.org/msg107532.html

[RFC PATCH 0/9] "Non-shared" IOMMU support on ARM
https://www.mail-archive.com/xen-devel@lists.xen.org/msg100468.html

Thank you.

Oleksandr Tyshchenko (13):
  xen/device-tree: Add dt_count_phandle_with_args helper
  iommu: Add extra order argument to the IOMMU APIs and platform
callbacks
  xen/arm: p2m: Add helper to convert p2m type to IOMMU flags
  xen/arm: p2m: Update IOMMU mapping whenever possible if page table is
not shared
  iommu/arm: Re-define iommu_use_hap_pt(d) as iommu_hap_pt_share
  iommu: Add extra use_iommu argument to iommu_domain_init()
  iommu: Make decision about needing IOMMU for hardware domains in
advance
  iommu/arm: Misc fixes for arch specific part
  xen/arm: Add use_iommu flag to xen_arch_domainconfig
  xen/arm: domain_build: Don't expose IOMMU specific properties to the
guest
  iommu/arm: smmu: Squash map_pages/unmap_pages with map_page/unmap_page
  [RFC] iommu: VT-d: Squash map_pages/unmap_pages with
map_page/unmap_page
  [RFC] iommu: AMD-Vi: Squash map_pages/unmap_pages with
map_page/unmap_page

 tools/libxl/libxl_arm.c   |   8 +
 xen/arch/arm/domain.c |   2 +-
 xen/arch/arm/domain_build.c   |  10 ++
 xen/arch/arm/p2m.c|  10 +-
 xen/arch/x86/domain.c |   2 +-
 xen/arch/x86/mm.c |  11 +-
 xen/arch/x86/mm/p2m-ept.c |  21 +--
 xen/arch/x86/mm/p2m-pt.c  |  26 +---
 xen/arch/x86/mm/p2m.c |  38 +
 xen/arch/x86/x86_64/mm.c  |   5 +-
 xen/common/device_tree.c  |   7 +
 xen/common/grant_table.c  |  10 +-
 xen/drivers/passthrough/amd/iommu_map.c   | 212 +++---
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |  10 +-
 xen/drivers/passthrough/arm/iommu.c   |   7 +-
 xen/drivers/passthrough/arm/smmu.c|  23 ++-
 xen/drivers/passthrough/iommu.c   |  73 -
 xen/drivers/passthrough/vtd/iommu.c   | 116 +-
 xen/drivers/passthrough/vtd/x86/vtd.c |   4 +-
 xen/drivers/passthrough/x86/iommu.c   |   6 +-
 xen/include/asm-arm/iommu.h   |   4 +-
 xen/include/asm-arm/p2m.h |  34 +
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   8 +-
 xen/include/public/arch-arm.h |   5 +
 xen/include/xen/device_tree.h |  19 +++
 xen/include/xen/iommu.h   |  24 +--
 26 files changed, 402 insertions(+), 293 deletions(-)

-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Anthony PERARD
Hi,

Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
That results in calling xen_map_cache() with lock=true, but this mapping
is never invalidated.
So QEMU uses more and more RAM until it stops working for one reason or
another (a crash if the host has little RAM, or it stops emulating with
no crash).

I don't know if calling xen_invalidate_map_cache_entry() in
address_space_read_continue() and address_space_write_continue() is the
right answer.  Is there something better to do?

(A good way to reproduce: install Windows, i.e. a guest without PV drivers.)
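
To make the lifecycle issue concrete, a schematic sketch; the function
names are taken from the report above, hwaddr and the xen_map_cache()
prototype come from QEMU, and the two wrappers are invented for
illustration, not actual QEMU code:

/* Schematic only: a mapcache entry created with lock=true stays pinned
 * until a matching xen_invalidate_map_cache_entry() drops it. */

void *old_path(hwaddr addr)
{
    /* qemu_map_ram_ptr() path: lock=false, so the mapcache entry is
     * reclaimable at the next mapcache invalidation. */
    return xen_map_cache(addr, 0, 0 /* lock */);
}

void *new_path(hwaddr addr, hwaddr len)
{
    /* qemu_ram_ptr_length() path after 04bf2526ce: lock=true. Without
     * a later xen_invalidate_map_cache_entry() on the returned pointer,
     * each access leaves another pinned mapping behind. */
    return xen_map_cache(addr, len, 1 /* lock */);
}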

Thanks,

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-25 Thread Igor Druzhinin
On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin  wrote:
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain is being created. Some PCI devices will
>>> be emulated, some will be merely passed through and yet there will be
>>> some RMRR ranges. libxl can't know all this stuff - some comes from the
>>> host, some comes from DM. So actual MMIO requirements are known to
>>> hvmloader at the PCI bus enumeration time.
>>>   
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
> 
> If moving most tasks of hvmloader to libxl is a planned feature in Citrix,
> please let it be discussed on xen-devel first as it may affect many
> people... and not all of them might be happy. :)
> 

Everything always goes through the mailing list.

> (tons of IMO and TLDR ahead, be warned)
> 
> Moving PCI BAR allocation from the guest side to libxl is a controversial
> step. It may in fact be the architecturally wrong way. There are properties
> and areas of responsibility. Among the primary responsibilities of the
> guest's firmware are PCI BAR and MMIO hole size allocation. That's the
> guest's territory. The guest relocates PCI BARs (and not just the BIOS is
> able to do this); the guest firmware relocates the MMIO hole base for them.
> On a real system, all tasks like PCI BAR allocation, remapping part of RAM
> above 4G, etc. would be done by the system BIOS. In our case some of the
> SeaBIOS/OVMF responsibilities were offloaded to hvmloader, like PCI BAR
> allocation, sizing the MMIO hole(s) for them and generating ACPI tables.
> And that's OK, as hvmloader can be considered merely 'supplemental'
> firmware performing some tasks of SeaBIOS/OVMF before passing control to
> them. This solution has some architectural logic at least and doesn't
> look bad.
> 

libxl is also a part of firmware so to speak. It's incorrect to think
that only hvmloader and BIOS images are "proper" firmware.

> On the other hand, moving the PCI hole calculation to libxl just to let
> Xen/libxl know what the MMIO size value is might be a bad idea.
> Aside from some code duplication, straying too far from the real HW paths,
> or breaking existing (or future) interfaces, this might have some other
> negative consequences. E.g., who will be initializing the guest's ACPI
> tables if only libxl knows the memory layout? Some new interfaces between
> libxl and hvmloader just to let the latter know what values to write to
> the ACPI tables being created? Or will libxl be initializing the guest's
> ACPI tables as well (another guest-internal task)? Similar concerns apply
> to the guest's final E820 construction.
> 

The information is not confined by libxl - it's passed to hvmloader and
it can finish the tasks libxl couldn't. Although, ACPI tables could be
harmlessly initialized inside libxl as well (see PVH implementation).

> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU that traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already tells Xen which ranges should be passed through or trapped,
> it can tell it the current overall MMIO limits as well... or handle
> these limits itself -- if the MMIO hole range check is all that is required
> to avoid misusing MMIO space, this check can easily be implemented in QEMU,
> provided that QEMU knows what the memory/MMIO layout is. There is a lot of
> implementation freedom in where to place restrictions and checks, Xen or
> QEMU. Strictly speaking, the MMIO hole itself can be considered a property
> of the emulated machine and may have implementation differences for
> different emulated chipsets. For example, the real i440's NB has no idea
> of a high MMIO hole at all.
> 
> We already have a sort of interface between hvmloader and QEMU --
> hvmloader has to do basic initialization for some of the emulated chipset's
> registers (and this depends on the machine). Providing additional handling
> for a few other registers (TOM/TOLUD/etc.) will cost almost nothing, and
> the purpose of these registers will actually match their usage in real HW.
> This way we can use an existing, available interface and don't stray too
> far from the real HW ways.
> 
> I want to try this approach for the Q35 bringup patches for Xen I'm
> currently working on. I'll send these patches as RFC and will be glad to
> receive some constructive criticism.
> 

Sure. Static hole size provisioning doesn't prohibit its dynamic
counterpart.

Igor

>>> libxl can be taught to retrieve all missing info from QEMU, but this way
>>> will require to perform all grunt work of 

Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()

2017-07-25 Thread George Dunlap
On 07/25/2017 05:47 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote:
>> On 07/25/2017 05:00 PM, Dario Faggioli wrote:
>>> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:

>>> Mmm.. I think you're right. In fact, in a properly configured
>>> system,
>>> we'll never go past step 3 (from the comment at the top).
>>>
>>> Which is not ideal, or at least not what I had in mind. In fact, I
>>> think it's better to check step 4 (svc->vcpu->processor in hard-
>>> affinity) and step 5 (a CPU from svc's runqueue in hard affinity),
>>> as
>>> that would mean avoiding a runqueue migration.
>>>
>>> What about I basically kill step 3, i.e., if we reach this point
>>> during
>>> the soft-affinity step, I just continue to the hard-affinity one?
>>
>> Hmm, well *normally* we would rather have a vcpu running within its
>> soft
>> affinity, even if that means moving it to another runqueue.  
>>
> Yes, but both *ideally* and *normally*, we just should not be here. :-)
> 
> If we did end up here, we're in guessing territory, and, although what
> you say about a guest wanting to run on within its soft-affinity is
> always true, from the guest own point of view, our job as the scheduler
> is to do what would be best for the system as a whole. But we are in a
> situation where we could not gather the information to make such a
> decision.
> 
>> Is your
>> idea that, the only reason we're in this particular code is because
>> we
>> couldn't grab the lock we need to make a more informed decision; so
>> defer if possible to previous decisions, which (we might presume) was
>> able to make a more informed decision?
>>
> Kind of, yes. Basically I think we should "escape" from this situation
> as quickly as possible, and causing as few troubles as possible to both
> ourself and to others, in the hope that it will go better next time.
> 
> Trying to stay in the same runqueue seems to me to fit this
> requirement, as:
> - as you say, we're here because a previous (presumably well informed)
>   decision brought us here so, hopefully, staying here is not too bad, 
>   neither for us nor overall;
> - staying here is quicker and means less overhead for svc;
> - staying here means less overhead overall. In fact, if we decide to 
>   change runqueue, we will have to take the remote runqueue lock at 
>   some point... And I'd prefer that to be for good reasons.
> 
> All that being said, it probably would be good to add a performance
> counter, and try to get a sense of how frequently we actually end up in
> this function as a fallback.
> 
> But in the meantime, yes, I'd try to make svc stay in the runqueue
> where it is, in this case, if possible.

Sounds good.  So are you going to respin the series then?

> 
>>> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
>>> "something random"? The value to be assigned to cpu?
>>
>> Er, yes, I meant the return value.  Returning 0, or v->processor
>> would
>> be simple options.  *Really* defensive programming would attempt to
>> chose something somewhat sensible with the minimal risk of triggering
>> some other hidden assumptions (say, any cpu on our current runqueue).
>> But part of me says even thinking too long about it is a waste of
>> time
>> for something we're 99.99% sure can never happen. :-)
>>
> Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
> either v->processor (with a comment), or a cpumask_any(something). Of
> course the latter is expensive, but it should not be a big problem,
> considering we'll never get there (I'll have a look at generated the
> assembly, to confirm that).

OK, thanks.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Andrii Anisov

Hello Andrew,


On 25.07.17 19:23, Andrew Cooper wrote:

As a general rule, Xen needs to be able to tolerate and cope with any
quantity of crap described by the firmware.  On the x86 side, we have
large quantities of workarounds for buggy ACPI/MP/SMBIOS tables.

That approach somehow covered with early mentioned options:

On 25.07.17 15:24, Andrii Anisov wrote:
* ignore the next duplicated (overlapping) memory node in favor of the one
already in the memory banks list
* merge duplicated (overlapping), or even neighboring, memory banks


On 25.07.17 19:23, Andrew Cooper wrote:

It might be the case that the best Xen can do is give up, but it should
do so with a clear error message identifying what the firmware has done
which is sufficiently crazy to prevent further booting.

We have one more option to choose for the case:

* BUG() with a clear notification at the moment we try to add an
overlapping memory bank


So what to choose?
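
For the merge option, a standalone sketch of folding duplicated,
overlapping, or even neighboring banks into a sorted list; the bank
layout in main() is invented for the example:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical memory bank descriptor, start/size as in a DT node. */
struct membank { unsigned long long start, size; };

static int cmp_start(const void *a, const void *b)
{
    const struct membank *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort banks, then merge any that duplicate, overlap, or abut.
 * Returns the new number of banks. */
static unsigned int merge_banks(struct membank *b, unsigned int n)
{
    unsigned int i, out = 0;

    qsort(b, n, sizeof(*b), cmp_start);
    for ( i = 0; i < n; i++ )
    {
        unsigned long long end = b[i].start + b[i].size;

        if ( out && b[i].start <= b[out - 1].start + b[out - 1].size )
        {
            /* Overlaps (or touches) the previous bank: extend it. */
            unsigned long long prev_end = b[out - 1].start + b[out - 1].size;

            if ( end > prev_end )
                b[out - 1].size = end - b[out - 1].start;
        }
        else
            b[out++] = b[i];
    }
    return out;
}

int main(void)
{
    struct membank banks[] = {
        { 0x48000000ULL, 0x08000000ULL },
        { 0x48000000ULL, 0x08000000ULL },   /* exact duplicate */
        { 0x50000000ULL, 0x10000000ULL },   /* neighboring bank */
    };
    unsigned int n = merge_banks(banks, 3), i;

    for ( i = 0; i < n; i++ )
        printf("bank %u: %#llx+%#llx\n", i, banks[i].start, banks[i].size);
    return 0;   /* prints a single merged bank */
}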

ps:

Hitting a BUG()
Actually, it silently dies without earlyprintk enabled in this case; it
happens pretty early.


--

*Andrii Anisov*



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-25 Thread Alexey G
On Tue, 25 Jul 2017 15:13:17 +0100
Igor Druzhinin  wrote:
> >> The algorithm implemented in hvmloader for that is not complicated and
> >> can be moved to libxl easily. What we can do is to provision a hole big
> >> enough to include all the initially assigned PCI devices. We can also
> >> account for emulated MMIO regions if necessary. But, to be honest, it
> >> doesn't really matter since if there is not enough space in the lower MMIO
> >> hole for some BARs - they can be easily relocated to upper MMIO
> >> hole by hvmloader or the guest itself (dynamically).
> >>
> >> Igor  
> > [Zhang, Xiong Y] yes, if we could supply a big enough MMIO hole and
> > not allow hvmloader to relocate, things would be easier. But how
> > could we supply a big enough MMIO hole? a. Statically set the base
> > address of the MMIO hole to 2G/3G. b. Have something like hvmloader
> > probe all the PCI devices and calculate the MMIO size. But this runs
> > prior to QEMU, so how do we probe the PCI devices?
> 
> It's true that we don't know the space occupied by emulated devices
> before QEMU is started.  But QEMU needs to be started with some lower
> MMIO hole size statically assigned.
> 
> One of the possible solutions is to calculate the hole size required to
> include all the initially assigned pass-through devices and round it up
> to the nearest GB boundary, but no larger than 2GB total. If that's not
> enough to also include all the emulated devices - then it's not enough,
> and some of the PCI devices are going to be relocated to the upper MMIO
> hole in that case.

Not all devices are BAR64-capable, and even those which are may have Option
ROM BARs (mem32 only). And there are 32-bit guests which will find 64-bit
BARs with values above 4GB extremely unacceptable. The low MMIO hole is a
precious resource. Also, one needs to consider the implications of PCI
device hotplug for the 'static' precalculation approach.
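
A rough standalone sketch of the rounding rule proposed above -- sum the
pass-through BARs, round up to a GB boundary, cap at 2GB. Real code would
also need to honor per-BAR alignment and the 32-bit constraints raised
here; the BAR sizes in main() are invented for the example:

#include <stdio.h>

#define GB (1ULL << 30)

/* Hedged sketch: hole size from the sum of assigned device BARs,
 * rounded up to the next GB, capped at 2GB total. */
static unsigned long long mmio_hole_size(const unsigned long long *bars,
                                         unsigned int nr)
{
    unsigned long long total = 0;
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        total += bars[i];

    total = (total + GB - 1) & ~(GB - 1);   /* round up to 1GB boundary */
    if ( total > 2 * GB )
        total = 2 * GB;                     /* bigger BARs go above 4GB */
    return total ? total : GB;              /* assume at least some hole */
}

int main(void)
{
    unsigned long long bars[] = { 256ULL << 20, 16ULL << 20, 64ULL << 20 };

    printf("hole: %llu MB\n",
           mmio_hole_size(bars, 3) >> 20);  /* 336MB rounds up to 1024MB */
    return 0;
}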

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()

2017-07-25 Thread Dario Faggioli
On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote:
> On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> > On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
> > > 
> > Mmm.. I think you're right. In fact, in a properly configured system,
> > we'll never go past step 3 (from the comment at the top).
> > 
> > Which is not ideal, or at least not what I had in mind. In fact, I
> > think it's better to check step 4 (svc->vcpu->processor in hard-
> > affinity) and step 5 (a CPU from svc's runqueue in hard affinity), as
> > that would mean avoiding a runqueue migration.
> > 
> > What about I basically kill step 3, i.e., if we reach this point during
> > the soft-affinity step, I just continue to the hard-affinity one?
> 
> Hmm, well *normally* we would rather have a vcpu running within its soft
> affinity, even if that means moving it to another runqueue.
>
Yes, but both *ideally* and *normally*, we just should not be here. :-)

If we did end up here, we're in guessing territory, and although what you
say about a guest wanting to run within its soft-affinity is always true
from the guest's own point of view, our job as the scheduler is to do what
is best for the system as a whole. But we are in a situation where we could
not gather the information needed to make such a decision.

> Is your
> idea that, the only reason we're in this particular code is because we
> couldn't grab the lock we need to make a more informed decision; so
> defer if possible to previous decisions, which (we might presume) were
> able to make a more informed decision?
> 
Kind of, yes. Basically I think we should "escape" from this situation
as quickly as possible, causing as little trouble as possible to both
ourselves and others, in the hope that things will go better next time.

Trying to stay in the same runqueue seems to me to fit this
requirement, as:
- as you say, we're here because a previous (presumably well informed)
  decision brought us here so, hopefully, staying here is not too bad,
  either for us or overall;
- staying here is quicker and means less overhead for svc;
- staying here means less overhead overall. In fact, if we decide to 
  change runqueue, we will have to take the remote runqueue lock at 
  some point... And I'd prefer that to be for good reasons.

All that being said, it probably would be good to add a performance
counter, and try to get a sense of how frequently we actually end up in
this function as a fallback.

But in the meantime, yes, I'd try to make svc stay in the runqueue
where it is, in this case, if possible.

> > ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> > "something random"? The value to be assigned to cpu?
> 
> Er, yes, I meant the return value.  Returning 0, or v->processor would
> be simple options.  *Really* defensive programming would attempt to
> choose something somewhat sensible with the minimal risk of triggering
> some other hidden assumptions (say, any cpu on our current runqueue).
> But part of me says even thinking too long about it is a waste of time
> for something we're 99.99% sure can never happen. :-)
> 
Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
either v->processor (with a comment), or a cpumask_any(something). Of
course the latter is expensive, but it should not be a big problem,
considering we'll never get there (I'll have a look at the generated
assembly, to confirm that).
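
Something like this minimal standalone sketch (SCHED_STAT_CRANK and
ASSERT_UNREACHABLE are toy stand-ins for Xen's macros; this is not the
actual patch):

#include <assert.h>
#include <stdatomic.h>

static atomic_ulong need_fallback_cpu;               /* toy perf counter */
#define SCHED_STAT_CRANK(c)   atomic_fetch_add(&(c), 1)
#define ASSERT_UNREACHABLE()  assert(!"unreachable")

struct vcpu { int processor; };

static int get_fallback_cpu_sketch(const struct vcpu *v)
{
    /* Count how often we actually end up in the fallback path. */
    SCHED_STAT_CRANK(need_fallback_cpu);

    /* ... the soft- and hard-affinity balancing steps would go here ... */

    /* We should never fall out of the steps above; if we do, complain
     * loudly in debug builds and still return something sane. */
    ASSERT_UNREACHABLE();
    return v->processor;
}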

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

2017-07-25 Thread Alexey G
On Mon, 24 Jul 2017 21:39:08 +0100
Igor Druzhinin  wrote:
> > But, the problem is that the overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain is being created. Some PCI devices will
> > be emulated, some will be merely passed through, and yet there will be
> > some RMRR ranges. libxl can't know all this stuff - some comes from the
> > host, some comes from the DM. So the actual MMIO requirements are known to
> > hvmloader at PCI bus enumeration time.
> >   
> 
> IMO hvmloader shouldn't really be allowed to relocate memory under any
> conditions. As Andrew said it's much easier to provision the hole
> statically in libxl during domain construction process and it doesn't
> really compromise any functionality. Having one more entity responsible
> for guest memory layout only makes things more convoluted.

If moving most of hvmloader's tasks to libxl is a planned feature at Citrix,
please let it be discussed on xen-devel first, as it may affect many
people... and not all of them might be happy. :)

(tons of IMO and TLDR ahead, be warned)

Moving PCI BAR allocation from the guest side to libxl is a controversial
step; in fact, it may be the architecturally wrong way. There are properties
and areas of responsibility here. Among the primary responsibilities of a
guest's firmware are PCI BAR allocation and MMIO hole sizing. That's guest
territory: the guest relocates PCI BARs (and not just the BIOS is able to do
this), and the guest firmware relocates the MMIO hole base for them. If this
were a real system, all tasks like PCI BAR allocation, remapping part of RAM
above 4G, etc. would be done by the system BIOS. In our case some of the
SeaBIOS/OVMF responsibilities were offloaded to hvmloader, like PCI BAR
allocation, sizing the MMIO hole(s) for them, and generating ACPI tables.
And that's OK, as hvmloader can be considered merely 'supplemental' firmware
performing some of SeaBIOS/OVMF's tasks before passing control to them. This
solution has at least some architectural logic and doesn't look bad.

On the other hand, moving the PCI hole calculation to libxl just to let
Xen/libxl know what the MMIO hole size is might be a bad idea.
Aside from some code duplication, straying too far from the real-hardware
paths, or breaking existing (or future) interfaces, this might have other
negative consequences. For example, who will initialize the guest's ACPI
tables if only libxl knows the memory layout? Some new interface between
libxl and hvmloader just to let the latter know what values to write to the
ACPI tables being created? Or will libxl initialize the guest's ACPI tables
as well (another internal guest task)? Similar concerns apply to the
guest's final E820 construction.
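
As a toy illustration of what is at stake in E820 terms (values invented for
a guest with 3GB of low RAM, a 1GB low MMIO hole, and 1GB of RAM remapped
above 4GB; not the output of any real Xen component):

#include <stdint.h>

struct e820entry { uint64_t addr, size; uint32_t type; };

/* E820 type 1 = usable RAM, type 2 = reserved. */
static const struct e820entry e820_example[] = {
    { 0x000000000ULL, 0x0000A0000ULL, 1 },  /* low 640K of RAM */
    { 0x0000F0000ULL, 0x000010000ULL, 2 },  /* BIOS area, reserved */
    { 0x000100000ULL, 0x0BFF00000ULL, 1 },  /* RAM up to 3GB */
    /* [3GB, 4GB) is simply absent: the low MMIO hole */
    { 0x100000000ULL, 0x040000000ULL, 1 },  /* RAM remapped above 4GB */
};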

Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
of the device model (at least for now). Right now it's QEMU that traps PCI
BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
QEMU already tells Xen which ranges should be passed through or trapped,
it can tell Xen the current overall MMIO limits as well... or handle
these limits itself: if an MMIO hole range check is all that is required to
avoid misuse of MMIO space, this check can easily be implemented in QEMU,
provided that QEMU knows what the memory/MMIO layout is. There is a lot of
implementation freedom in where to place restrictions and checks, Xen or
QEMU. Strictly speaking, the MMIO hole itself can be considered a property
of the emulated machine and may have implementation differences between
emulated chipsets. For example, the real i440's northbridge has no notion
of a high MMIO hole at all.

We already have a sort of interface between hvmloader and QEMU --
hvmloader has to do basic initialization of some emulated chipset registers
(and this depends on the machine). Providing additional handling for a few
other registers (TOM/TOLUD/etc.) would cost almost nothing, and the purpose
of these registers would actually match their usage in real HW. This way we
can use an existing, available interface and not stray too far from the
real HW ways.
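
A hedged sketch of what that could look like on the hvmloader side, using
plain port I/O for PCI config space; the register offset below is a
placeholder, not a verified offset of any emulated chipset:

#include <stdint.h>

static inline void outl(uint16_t port, uint32_t val)
{
    __asm__ __volatile__ ("outl %0, %1" : : "a" (val), "Nd" (port));
}

static void pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn,
                            uint8_t reg, uint32_t val)
{
    outl(0xcf8, 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) |
                (reg & 0xfc));
    outl(0xcfc, val);
}

#define TOLUD_REG 0xb0   /* placeholder offset on the emulated NB at 0:0.0 */

/* Tell the device model where the low MMIO hole starts. */
static void report_mmio_hole_base(uint32_t hole_base)
{
    pci_cfg_write32(0, 0, 0, TOLUD_REG, hole_base);
}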

I want to try this approach in the Q35 bring-up patches for Xen I'm
currently working on. I'll send these patches as an RFC and will be glad to
receive some constructive criticism.

> > libxl can be taught to retrieve all the missing info from QEMU, but this
> > way would require performing all the grunt work of PCI BAR allocation in
> > libxl itself - in order to calculate the real MMIO hole(s) size, one needs
> > to take into account all the PCI BAR sizes and their diverse alignment
> > requirements, plus existing gaps due to RMRR ranges... basically, libxl
> > will need to do most of hvmloader/pci.c's job.
> >   
> 
> The algorithm implemented in hvmloader for that is not complicated and
> can be moved to libxl easily. What we can do is to provision a hole big
> enough to include all the initially assigned PCI devices. We can also
> account for emulated MMIO regions if necessary. But, to be honest, it
> doesn't really matter since if there is not enough space in the lower MMIO
> hole for some BARs - they can easily be relocated to the upper MMIO hole by
> hvmloader or the guest itself (dynamically).

Re: [Xen-devel] Android bring up over xen for Rcar-H3

2017-07-25 Thread Andrii Anisov

Dear George,

On 19.07.17 12:42, George John wrote:
I want to bring up Android over Xen as Dom0 on the R-Car H3 Salvator-X
board. Can anybody please share the procedure to do so?
Once you have an Android build for the Salvator-X board, the procedure is
not really specific.
You have to have ATF running u-boot in EL2 mode, and the stuff from here [1]
applied to the kernel.

You have to build XEN itself.
Then run the system similarly to [2].
You will have Android running in Dom0.

But the main question here is building the XEN toolstack for Android. I'm
not sure it is feasible. From my previous experience, there were no
good results.


[1] 
https://github.com/xen-troops/meta-demo/tree/master/meta-rcar-gen3-xen/recipes-kernel/linux/linux-renesas
[2] 
https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X#Running_the_system


--

*Andrii Anisov*





Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Andrew Cooper
On 25/07/17 17:02, Andrii Anisov wrote:
> Hello Julien,
>
>
> On 25.07.17 18:44, Julien Grall wrote:
>> I have seen work on this board for the past year and it is the first
>> time I have seen a complaint about memory overlap. So why this sudden
>> change?
> It is just an approach change. I'm cleaning up nits for the board.
>
>> Is that a comment from the vendor or your guess?
> It is my impression.
> But yes, it could be worth asking them why they hardcode the memory
> layout in their u-boot but stuff the dts with a number of memory nodes.
> Actually, from the beginning of our time with this board, the solution
> was to merge all memory description nodes into one in a dts for XEN, like
> [1]. So u-boot's runtime updates to the dtb were just ignored.
>
>> You need to differentiate between the device-tree spec itself and the
>> Linux implementation. memblock is something common to all architectures.
>> It does not mean it is a valid thing to do.
> That is why I asked for advice.
>
> [1]
> https://github.com/xen-troops/meta-demo/blob/master/meta-rcar-gen3-xen/recipes-kernel/linux/linux-renesas/r8a7795-salvator-x-xen.dts#L61

As a general rule, Xen needs to be able to tolerate and cope with any
quantity of crap described by the firmware.  On the x86 side, we have
large quantities of workarounds for buggy ACPI/MP/SMBIOS tables.

It might be the case that the best Xen can do is give up, but it should
do so with a clear error message identifying what the firmware has done
which is sufficiently crazy to prevent further booting.  Hitting a BUG()
is not a user-friendly thing to do, so it should be fixed.
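
For illustration, a standalone C sketch of that "give up loudly" behaviour
(in Xen proper this would be a panic() with a clear message rather than the
fprintf/exit used here):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t paddr_t;
struct membank { paddr_t start, size; };

/* Return nonzero if [start, start+size) overlaps any existing bank. */
static int overlaps(const struct membank *b, unsigned int n,
                    paddr_t start, paddr_t size)
{
    unsigned int i;

    for ( i = 0; i < n; i++ )
        if ( start < b[i].start + b[i].size && start + size > b[i].start )
            return 1;
    return 0;
}

static void add_bank_or_die(struct membank *b, unsigned int *n,
                            paddr_t start, paddr_t size)
{
    if ( overlaps(b, *n, start, size) )
    {
        fprintf(stderr,
                "DT: memory node [%#llx, %#llx) overlaps an earlier node; "
                "firmware bug - refusing to boot\n",
                (unsigned long long)start,
                (unsigned long long)(start + size));
        exit(1);
    }
    b[(*n)++] = (struct membank){ start, size };
}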

~Andrew



Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Mark Rutland
On Tue, Jul 25, 2017 at 06:27:31PM +0300, Andrii Anisov wrote:
> On 25.07.17 17:23, Julien Grall wrote:
> >I think this is by chance rather than by design. The first
> >question to answer is why a firmware would specify the same memory
> >banks twice. Is that valid per the specification?
> Yep, that is the question.

I believe that all memory regions described in any memory node's reg
entries should be disjoint (i.e. should not overlap). Per IEEE-1275, reg
entries (within a node) are supposed to be disjoint, and I would expect
this requirement to extend across nodes in the case of memory nodes.

Currently, the DT spec is somewhat sparse in wording, but this can be
corrected.

> >Regardless of that, it looks to me like the device-tree you give
> >to the board should not contain the memory nodes.
> The device-tree is provided by the vendor in that form, and u-boot is
> theirs. It seems to me that they do not really care, since the kernel
> tolerates duplication.
> 
> >> PS: the Linux kernel does tolerate duplicated memory nodes by
> >> merging memory blocks, i.e. the memblock_add_range() function is
> >> commented as follows:
> >> /**
> >>  * memblock_add_range - add new memblock region
> >>  * @type: memblock type to add new region into
> >>  * @base: base address of the new region
> >>  * @size: size of the new region
> >>  * @nid: nid of the new region
> >>  * @flags: flags of the new region
> >>  *
> >>  * Add new memblock region [@base,@base+@size) into @type. The new region
> >>  * is allowed to overlap with existing ones - overlaps don't affect already
> >>  * existing regions. @type is guaranteed to be minimal (all neighbouring
> >>  * compatible regions are merged) after the addition.
> >>  *
> >>  * RETURNS:
> >>  * 0 on success, -errno on failure.
> >>  */
> IMO the function description is pretty straightforward.
> But let us wait for the device tree guys' feedback.

This might be the design of *memblock*, but that does not mean that it
is a deliberate part of the DT handling. I believe this is simply an
implementation detail that happens to mask a DT bug.

Thanks,
Mark.



Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()

2017-07-25 Thread George Dunlap
On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
>> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
>>>
>>> diff --git a/xen/common/sched_credit2.c
>>> b/xen/common/sched_credit2.c
>>> index c749d4e..54f6e21 100644
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu,
>>> cpumask_t *mask)
>>>  }
>>>  
>>>  /*
>>> - * When a hard affinity change occurs, we may not be able to check
>>> some
>>> - * (any!) of the other runqueues, when looking for the best new
>>> processor
>>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
>>> happens, we
>>> - * pick, in order of decreasing preference:
>>> - *  - svc's current pcpu;
>>> - *  - another pcpu from svc's current runq;
>>> - *  - any cpu.
>>> + * In csched2_cpu_pick(), it may not be possible to actually look
>>> at remote
>>> + * runqueues (the trylock-s on their spinlocks can fail!). If that
>>> happens,
>>> + * we pick, in order of decreasing preference:
>>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft
>>> affinity;
>>> + *  3) just one valid pcpu from svc's soft affinity;
>>> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
>>> + *  5) a pcpu in svc's current runqueue that is also in svc's hard
>>> affinity;
>>> + *  6) just one valid pcpu from svc's hard affinity
>>> + *
>>> + * Of course, 1, 2 and 3 makes sense only if svc has a soft
>>> affinity. Also
>>> + * note that at least 6 is guaranteed to _always_ return at least
>>> one pcpu.
>>>   */
>>>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>>>  {
>>>  struct vcpu *v = svc->vcpu;
>>> -int cpu = v->processor;
>>> +unsigned int bs;
>>>  
>>> -cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
>>> -cpupool_domain_cpumask(v->domain));
>>> +for_each_affinity_balance_step( bs )
>>> +{
>>> +int cpu = v->processor;
>>> +
>>> +if ( bs == BALANCE_SOFT_AFFINITY &&
>>> + !has_soft_affinity(v, v->cpu_hard_affinity) )
>>> +continue;
>>>  
>>> -if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>>> -return cpu;
>>> +affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
>>> +cpumask_and(cpumask_scratch_cpu(cpu),
>>> cpumask_scratch_cpu(cpu),
>>> +cpupool_domain_cpumask(v->domain));
>>>  
>>> -if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> -   >rqd->active)) )
>>> -{
>>> -cpumask_and(cpumask_scratch_cpu(cpu), >rqd->active,
>>> -cpumask_scratch_cpu(cpu));
>>> -return cpumask_first(cpumask_scratch_cpu(cpu));
>>> -}
>>> +/*
>>> + * This is cases 1 or 4 (depending on bs): if v->processor 
>>> is (still)
>>> + * in our affinity, go for it, for cache betterness.
>>> + */
>>> +if ( likely(cpumask_test_cpu(cpu,
>>> cpumask_scratch_cpu(cpu))) )
>>> +return cpu;
>>>  
>>> -ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>>> +/*
>>> + * This is cases 2 or 5 (depending on bs): v->processor
>>> isn't there
>>> + * any longer, check if we at least can stay in our
>>> current runq.
>>> + */
>>> +if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> +   >rqd->active)) )
>>> +{
>>> +cpumask_and(cpumask_scratch_cpu(cpu),
>>> cpumask_scratch_cpu(cpu),
>>> +>rqd->active);
>>> +return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +}
>>>  
>>> -return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +/*
>>> + * This is cases 3 or 6 (depending on bs): last stand,
>>> just one valid
>>> + * pcpu from our soft affinity, if we have one and if
>>> there's any. In
>>> + * fact, if we are doing soft-affinity, it is possible
>>> that we fail,
>>> + * which means we stay in the loop and look for hard
>>> affinity. OTOH,
>>> + * if we are at the hard-affinity balancing step, it's
>>> guaranteed that
>>> + * there is at least one valid cpu, and therefore we are
>>> sure that we
>>> + * return it, and never really exit the loop.
>>> + */
>>> +ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
>>> +   bs == BALANCE_SOFT_AFFINITY);
>>> +cpu = cpumask_first(cpumask_scratch_cpu(cpu));
>>
>> So just checking my understanding here... at this point we're not
>> taking
>> into consideration load or idleness or anything else -- we're just
>> saying, "Is there a cpu in my soft affinity it is *possible* to run
>> on?"
>>
> Exactly. If we are in this function, it means we failed to take the
> locks we needed, for making a choice based on load, idleness, etc, but
> we need a CPU, so we pick whatever is valid.

Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()

2017-07-25 Thread Andrew Cooper
On 25/07/17 16:27, Tim Deegan wrote:
> At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote:
>> The purpose of guest_walk_tables() is to match the behaviour of real 
>> hardware.
>>
>> A PV guest can have 2M superpages in its pagetables, via the M2P and the
>> initial initrd mapping, even if it isn't permitted to create arbitrary 2M
>> superpage mappings.
> Can the domain builder (or Xen?) really give a guest superpage
> mappings for its initrd?  Wouldn't that cause problems for live
> migration?

From Wei's work, dom0 definitely ends up calling put_page_type() on an L2
superpage.

Looking at the dom0 construction code more closely, it appears that Xen
creates the initial p2m in the guest using superpages.

I will s/initrd/initial p2m/ in the commit message.  I don't believe the
domain builder has any way of creating such mappings for a domU, so the
migration aspect is less important.

>
> In any case this patch looks correct: the presence of superpages in PV
> pagetables is decided by the PV MM rules, so the walker should accept them.
>
> Reviewed-by: Tim Deegan 

Thanks.



Re: [Xen-devel] [PATCH 4/5] xen: sched_null: add some tracing

2017-07-25 Thread Dario Faggioli
On Tue, 2017-07-25 at 16:15 +0100, George Dunlap wrote:
> On Thu, Jun 29, 2017 at 1:56 PM, Dario Faggioli
>  wrote:
> > In line with what is there in all the other schedulers.
> > 
> > Signed-off-by: Dario Faggioli 
> > ---
> > George Dunlap 
> 
> FYI you forgot the 'CC:' for this and patch 5.  :-)
>
Ah, indeed. Sorry for that.

> (No problem, just one
> extra step to download the series into an mbox.)
> 
Ok, will pay more attention next time. :-)

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()

2017-07-25 Thread Wei Liu
On Tue, Jul 25, 2017 at 04:27:00PM +0100, Tim Deegan wrote:
> At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote:
> > The purpose of guest_walk_tables() is to match the behaviour of real 
> > hardware.
> > 
> > A PV guest can have 2M superpages in its pagetables, via the M2P and the
> > initial initrd mapping, even if it isn't permitted to create arbitrary 2M
> > superpage mappings.
> 
> Can the domain builder (or Xen?) really give a guest superpage
> mappings for its initrd?  Wouldn't that cause problems for live
> migration?
> 

I can see superpage mapping when constructing Dom0 (kernel or initrd,
haven't checked).



Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()

2017-07-25 Thread Wei Liu
On Tue, Jul 25, 2017 at 04:00:13PM +0100, Andrew Cooper wrote:
> The purpose of guest_walk_tables() is to match the behaviour of real hardware.
> 
> A PV guest can have 2M superpages in its pagetables, via the M2P and the
> initial initrd mapping, even if it isn't permitted to create arbitrary 2M
> superpage mappings.
> 
> guest_can_use_l2_superpages() checking opt_allow_superpage is a piece of PV
> guest policy enforcement, rather than its intended purpose of meaning "would
> hardware tolerate finding an L2 superpage with these control settings?"
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Wei Liu 



Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap

2017-07-25 Thread Dario Faggioli
On Tue, 2017-07-25 at 16:08 +0100, George Dunlap wrote:
> On 06/08/2017 01:08 PM, Dario Faggioli wrote:
> > 
> Hmm, this needs to be rebased on the structure layout patches I checked
> in last week. :-)
> 
Indeed it does. And of course, I'm up for it. :-)

The soft-affinity series is likely to generate fewer conflicts... It's
probably easier if we merge that first (after taking care of patches 3
and 6, of course), isn't it?

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Andrii Anisov

Hello Julien,


On 25.07.17 18:44, Julien Grall wrote:
I have seen work on this board for the past year and it is the first
time I have seen a complaint about memory overlap. So why this sudden
change?

It is just an approach change. I'm cleaning up nits for the board.


Is that a comment from the vendor or your guess?

It is my impression.
But yes, it could be worth asking them why they hardcode the memory
layout in their u-boot but stuff the dts with a number of memory nodes.
Actually, from the beginning of our time with this board, the solution was
to merge all memory description nodes into one in a dts for XEN, like [1].
So u-boot's runtime updates to the dtb were just ignored.


You need to differentiate between the device-tree spec itself and the Linux
implementation. memblock is something common to all architectures. It
does not mean it is a valid thing to do.

That is why I asked for advice.

[1] 
https://github.com/xen-troops/meta-demo/blob/master/meta-rcar-gen3-xen/recipes-kernel/linux/linux-renesas/r8a7795-salvator-x-xen.dts#L61

--

*Andrii Anisov*





Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()

2017-07-25 Thread Dario Faggioli
On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index c749d4e..54f6e21 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu,
> > cpumask_t *mask)
> >  }
> >  
> >  /*
> > - * When a hard affinity change occurs, we may not be able to check
> > some
> > - * (any!) of the other runqueues, when looking for the best new
> > processor
> > - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
> > happens, we
> > - * pick, in order of decreasing preference:
> > - *  - svc's current pcpu;
> > - *  - another pcpu from svc's current runq;
> > - *  - any cpu.
> > + * In csched2_cpu_pick(), it may not be possible to actually look
> > at remote
> > + * runqueues (the trylock-s on their spinlocks can fail!). If that
> > happens,
> > + * we pick, in order of decreasing preference:
> > + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> > + *  2) a pcpu in svc's current runqueue that is also in svc's soft
> > affinity;
> > + *  3) just one valid pcpu from svc's soft affinity;
> > + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> > + *  5) a pcpu in svc's current runqueue that is also in svc's hard
> > affinity;
> > + *  6) just one valid pcpu from svc's hard affinity
> > + *
> > + * Of course, 1, 2 and 3 makes sense only if svc has a soft
> > affinity. Also
> > + * note that at least 6 is guaranteed to _always_ return at least
> > one pcpu.
> >   */
> >  static int get_fallback_cpu(struct csched2_vcpu *svc)
> >  {
> >  struct vcpu *v = svc->vcpu;
> > -int cpu = v->processor;
> > +unsigned int bs;
> >  
> > -cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> > -cpupool_domain_cpumask(v->domain));
> > +for_each_affinity_balance_step( bs )
> > +{
> > +int cpu = v->processor;
> > +
> > +if ( bs == BALANCE_SOFT_AFFINITY &&
> > + !has_soft_affinity(v, v->cpu_hard_affinity) )
> > +continue;
> >  
> > -if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> > -return cpu;
> > +affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
> > +cpumask_and(cpumask_scratch_cpu(cpu),
> > cpumask_scratch_cpu(cpu),
> > +cpupool_domain_cpumask(v->domain));
> >  
> > -if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > -   >rqd->active)) )
> > -{
> > -cpumask_and(cpumask_scratch_cpu(cpu), >rqd->active,
> > -cpumask_scratch_cpu(cpu));
> > -return cpumask_first(cpumask_scratch_cpu(cpu));
> > -}
> > +/*
> > + * This is cases 1 or 4 (depending on bs): if v->processor 
> > is (still)
> > + * in our affinity, go for it, for cache betterness.
> > + */
> > +if ( likely(cpumask_test_cpu(cpu,
> > cpumask_scratch_cpu(cpu))) )
> > +return cpu;
> >  
> > -ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
> > +/*
> > + * This is cases 2 or 5 (depending on bs): v->processor
> > isn't there
> > + * any longer, check if we at least can stay in our
> > current runq.
> > + */
> > +if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > +   >rqd->active)) )
> > +{
> > +cpumask_and(cpumask_scratch_cpu(cpu),
> > cpumask_scratch_cpu(cpu),
> > +>rqd->active);
> > +return cpumask_first(cpumask_scratch_cpu(cpu));
> > +}
> >  
> > -return cpumask_first(cpumask_scratch_cpu(cpu));
> > +/*
> > + * This is cases 3 or 6 (depending on bs): last stand,
> > just one valid
> > + * pcpu from our soft affinity, if we have one and if
> > there's any. In
> > + * fact, if we are doing soft-affinity, it is possible
> > that we fail,
> > + * which means we stay in the loop and look for hard
> > affinity. OTOH,
> > + * if we are at the hard-affinity balancing step, it's
> > guaranteed that
> > + * there is at least one valid cpu, and therefore we are
> > sure that we
> > + * return it, and never really exit the loop.
> > + */
> > +ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
> > +   bs == BALANCE_SOFT_AFFINITY);
> > +cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> 
> So just checking my understanding here... at this point we're not
> taking
> into consideration load or idleness or anything else -- we're just
> saying, "Is there a cpu in my soft affinity it is *possible* to run
> on?"
>
Exactly. If we are in this function, it means we failed to take the
locks we needed, for making a choice based on load, idleness, etc, but
we need a CPU, so we pick whatever is valid.

Re: [Xen-devel] [PATCH 3/5] xen: sched-null: support soft-affinity

2017-07-25 Thread George Dunlap
On Thu, Jun 29, 2017 at 1:56 PM, Dario Faggioli
 wrote:
> The null scheduler does not really use hard-affinity for
> scheduling, it uses it for 'placement', i.e., for deciding
> to what pCPU to statically assign a vCPU.
>
> Let's use soft-affinity in the same way, of course with the
> difference that, if there's no free pCPU within the vCPU's
> soft-affinity, we go checking the hard-affinity, instead of
> putting the vCPU in the waitqueue.
>
> This has no impact on the scheduling overhead, because
> soft-affinity is only considered in cold paths (like when a
> vCPU joins the scheduler for the first time, or is manually
> moved between pCPUs by the user).
>
> Signed-off-by: Dario Faggioli 

Reviewed-by: George Dunlap 
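
For context, an illustrative sketch of the placement logic the changelog
describes, using a toy bitmask in place of Xen's cpumask API (not the
actual sched_null.c code):

#include <stdint.h>

typedef uint64_t cpumask_t;   /* toy stand-in for Xen's cpumask_t */

static int first_set(cpumask_t m)
{
    int i;

    for ( i = 0; i < 64; i++ )
        if ( m & (1ULL << i) )
            return i;
    return -1;
}

/* free: currently free pCPUs; soft/hard: the vCPU's affinity masks.
 * Returns a pCPU index, or -1 meaning "put the vCPU on the waitqueue". */
static int null_pick_cpu(cpumask_t free, cpumask_t soft, cpumask_t hard)
{
    int cpu = first_set(free & soft & hard);  /* prefer soft affinity */

    if ( cpu >= 0 )
        return cpu;
    return first_set(free & hard);            /* fall back to hard only */
}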



Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Julien Grall

On 25/07/17 16:27, Andrii Anisov wrote:

Hello Julien,


Hi Andrii,


On 25.07.17 17:23, Julien Grall wrote:

I think this is by chance rather than by design. The first question to
answer is why a firmware would specify the same memory banks twice. Is
that valid per the specification?

Yep, that is the question.


Regardless of that, it looks to me like the device-tree you give to
the board should not contain the memory nodes.

The device-tree is provided by the vendor in that form, and u-boot is
theirs. It seems to me that they do not really care, since the kernel
tolerates duplication.


I have seen work on this board for the past year and it is the first
time I have seen a complaint about memory overlap. So why this sudden
change? Is that a comment from the vendor or your guess?





PS: the Linux kernel does tolerate duplicated memory nodes by merging
memory blocks, i.e. the memblock_add_range() function is commented as
follows:
/**
 * memblock_add_range - add new memblock region
 * @type: memblock type to add new region into
 * @base: base address of the new region
 * @size: size of the new region
 * @nid: nid of the new region
 * @flags: flags of the new region
 *
 * Add new memblock region [@base,@base+@size) into @type. The new region
 * is allowed to overlap with existing ones - overlaps don't affect already
 * existing regions. @type is guaranteed to be minimal (all neighbouring
 * compatible regions are merged) after the addition.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */

IMO the function description is pretty straightforward.


You need to differentiate between the device-tree spec itself and the Linux
implementation. memblock is something common to all architectures. It
does not mean it is a valid thing to do.



But let us wait for the device tree guys' feedback.



Cheers,

--
Julien Grall



Re: [Xen-devel] [PATCH v7 20/20] osstest: save/retrieve the last successfully tested FreeBSD build

2017-07-25 Thread Roger Pau Monne
On Tue, Jul 25, 2017 at 04:18:52PM +0100, Ian Jackson wrote:
> Roger Pau Monne writes ("[PATCH v7 20/20] osstest: save/retrieve the last 
> successfully tested FreeBSD build"):
> > And use it in order to install the hosts for the next FreeBSD flight.
> ...
> > +case "$branch" in
> > +freebsd-*)
> > +IFS=$'\n'
> 
> That's quite brave, but I don't object.  I would have piped the output
> into `read' or something.

Yes, I think that would be better, let me do that as a patch on top of
this afterwards. I'm currently building the new images and I cannot
play freely with osstest ATM.

> > +for anointed in \
> > +`./mg-anoint list-prepared "freebsd build $freebsd_branch*"`; do
>                                                               ^
> I think there is a missing space between $freebsd_branch and * ?
> 
> > +IFS=$'\n'
> > +for anointed in \
> > +`./mg-anoint list-prepared "freebsd build $freebsd_branch*"`; do
> > +# Retrieve previous successful FreeBSD build for each arch.
> > +freebsd_arch=${anointed##* }
> > +freebsd_envvar="FREEBSD_${freebsd_arch^^}_BUILDJOB"
> > +if [ "x${!freebsd_envvar}" = "x" ]; then
> > +flight_job=`./mg-anoint retrieve "$anointed"`
> > +export ${freebsd_envvar}=${flight_job/ /.}
> > +fi
> > +done
> > +unset IFS
> 
> LGTM apart from the same missing space.
> 
> So if you add those spaces:
> 
> Acked-by: Ian Jackson 

Thanks, I've now pushed this (also with the other acks) to:

git://xenbits.xen.org/people/royger/osstest.git freebsd_v8

Roger.



Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())

2017-07-25 Thread Andrii Anisov

Hello Julien,


On 25.07.17 17:23, Julien Grall wrote:
I think this is by chance rather than by design. The first question to
answer is why a firmware would specify the same memory banks twice. Is
that valid per the specification?

Yep, that is the question.

Regardless of that, it looks to me like the device-tree you give to
the board should not contain the memory nodes.

The device-tree is provided by the vendor in that form, and u-boot is
theirs. It seems to me that they do not really care, since the kernel
tolerates duplication.


PS: the Linux kernel does tolerate duplicated memory nodes by merging
memory blocks, i.e. the memblock_add_range() function is commented as
follows:

/**
 * memblock_add_range - add new memblock region
 * @type: memblock type to add new region into
 * @base: base address of the new region
 * @size: size of the new region
 * @nid: nid of the new region
 * @flags: flags of the new region
 *
 * Add new memblock region [@base,@base+@size) into @type. The new region
 * is allowed to overlap with existing ones - overlaps don't affect already
 * existing regions. @type is guaranteed to be minimal (all neighbouring
 * compatible regions are merged) after the addition.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */

IMO the function description is pretty straightforward.
But let us wait for the device tree guys' feedback.

--

*Andrii Anisov*





Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()

2017-07-25 Thread Tim Deegan
At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote:
> The purpose of guest_walk_tables() is to match the behaviour of real hardware.
> 
> A PV guest can have 2M superpages in its pagetables, via the M2P and the
> initial initrd mapping, even if it isn't permitted to create arbitrary 2M
> superpage mappings.

Can the domain builder (or Xen?) really give a guest superpage
mappings for its initrd?  Wouldn't that cause problems for live
migration?

In any case this patch looks correct: the presence of superpages in PV
pagetables is decided by the PV MM rules, so the walker should accept them.

Reviewed-by: Tim Deegan 

> guest_can_use_l2_superpages() checking opt_allow_superpage is a piece of PV
> guest policy enforcement, rather than its intended purpose of meaning "would
> hardware tolerate finding an L2 superpage with these control settings?"
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Tim Deegan 
> CC: George Dunlap 
> CC: Wei Liu 
> ---
>  xen/include/asm-x86/guest_pt.h | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
> index 72126d5..08031c8 100644
> --- a/xen/include/asm-x86/guest_pt.h
> +++ b/xen/include/asm-x86/guest_pt.h
> @@ -205,15 +205,17 @@ static inline guest_l4e_t guest_l4e_from_gfn(gfn_t gfn, 
> u32 flags)
>  static inline bool guest_can_use_l2_superpages(const struct vcpu *v)
>  {
>  /*
> + * PV guests use Xen's paging settings.  Being 4-level, 2M
> + * superpages are unconditionally supported.
> + *
>   * The L2 _PAGE_PSE bit must be honoured in HVM guests, whenever
>   * CR4.PSE is set or the guest is in PAE or long mode.
>   * It's also used in the dummy PT for vcpus with CR0.PG cleared.
>   */
> -return (is_pv_vcpu(v)
> -? opt_allow_superpage
> -: (GUEST_PAGING_LEVELS != 2
> -   || !hvm_paging_enabled(v)
> -   || (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)));
> +return (is_pv_vcpu(v) ||
> +GUEST_PAGING_LEVELS != 2 ||
> +!hvm_paging_enabled(v) ||
> +(v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE));
>  }
>  
>  static inline bool guest_can_use_l3_superpages(const struct domain *d)
> -- 
> 2.1.4
> 



Re: [Xen-devel] [PATCH 2/5] xen: sched_null: check for pending tasklet work a bit earlier

2017-07-25 Thread George Dunlap
On Thu, Jun 29, 2017 at 1:56 PM, Dario Faggioli
 wrote:
> Whether or not there's pending tasklet work to do, it's
> something we know from the tasklet_work_scheduled parameter.
>
> Deal with that as soon as possible, like all other schedulers
> do.
>
> Signed-off-by: Dario Faggioli 

Reviewed-by: George Dunlap 



Re: [Xen-devel] [PATCH v7 20/20] osstest: save/retrieve the last successfully tested FreeBSD build

2017-07-25 Thread Ian Jackson
Roger Pau Monne writes ("[PATCH v7 20/20] osstest: save/retrieve the last 
successfully tested FreeBSD build"):
> And use it in order to install the hosts for the next FreeBSD flight.
...
> +case "$branch" in
> +freebsd-*)
> +IFS=$'\n'

That's quite brave, but I don't object.  I would have piped the output
into `read' or something.

> +for anointed in \
> +`./mg-anoint list-prepared "freebsd build $freebsd_branch*"`; do
                                                              ^
I think there is a missing space between $freebsd_branch and * ?

> +IFS=$'\n'
> +for anointed in \
> +`./mg-anoint list-prepared "freebsd build $freebsd_branch*"`; do
> +# Retrieve previous successful FreeBSD build for each arch.
> +freebsd_arch=${anointed##* }
> +freebsd_envvar="FREEBSD_${freebsd_arch^^}_BUILDJOB"
> +if [ "x${!freebsd_envvar}" = "x" ]; then
> +flight_job=`./mg-anoint retrieve "$anointed"`
> +export ${freebsd_envvar}=${flight_job/ /.}
> +fi
> +done
> +unset IFS

LGTM apart from the same missing space.

So if you add those spaces:

Acked-by: Ian Jackson 

Ian.


