[Xen-devel] [ovmf test] 112305: all pass - PUSHED
flight 112305 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112305/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 3b341e263da957b2c8896317f41cc32880c878b0
baseline version:
 ovmf                 1683ecec41a7c944783c51efa75375f1e0a71d08

Last test of basis   112091  2017-07-21 10:17:54 Z    4 days
Testing same since   112305  2017-07-26 02:04:42 Z    0 days   1 attempts

People who touched revisions under test:
  Liming Gao

jobs:
 build-amd64-xsm                              pass
 build-i386-xsm                               pass
 build-amd64                                  pass
 build-i386                                   pass
 build-amd64-libvirt                          pass
 build-i386-libvirt                           pass
 build-amd64-pvops                            pass
 build-i386-pvops                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64          pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=3b341e263da957b2c8896317f41cc32880c878b0
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; '
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 3b341e263da957b2c8896317f41cc32880c878b0
+ branch=ovmf
+ revision=3b341e263da957b2c8896317f41cc32880c878b0
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; '
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable
+ prevxenbranch=xen-4.9-testing
+ '[' x3b341e263da957b2c8896317f41cc32880c878b0 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e ' use Osstest; readglobalconfig(); print $c{"OsstestUpstream"} or die $!; '
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
++ :
[Xen-devel] [qemu-mainline test] 112290: regressions - trouble: blocked/broken/fail/pass
flight 112290 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112290/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-credit2           4 host-install(4)           broken REGR. vs. 111765
 test-arm64-arm64-xl                   4 host-install(4)           broken REGR. vs. 111765
 test-arm64-arm64-xl-xsm               4 host-install(4)           broken REGR. vs. 111765
 test-arm64-arm64-libvirt-xsm          4 host-install(4)           broken REGR. vs. 111765
 build-armhf-xsm                       5 host-build-prep           fail REGR. vs. 111765
 test-amd64-i386-xl-qemuu-win7-amd64  10 windows-install           fail REGR. vs. 111765
 test-amd64-i386-xl-qemuu-ws16-amd64  10 windows-install           fail REGR. vs. 111765
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install           fail REGR. vs. 111765

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-xsm          1 build-check(1)            blocked n/a
 test-armhf-armhf-xl-xsm               1 build-check(1)            blocked n/a
 test-armhf-armhf-libvirt             14 saverestore-support-check fail like 111765
 test-armhf-armhf-xl-rtds             16 guest-start/debian.repeat fail like 111765
 test-armhf-armhf-libvirt-raw         13 saverestore-support-check fail like 111765
 test-amd64-amd64-xl-rtds             10 debian-install            fail like 111765
 test-amd64-i386-libvirt              13 migrate-support-check     fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install           fail never pass
 test-amd64-amd64-libvirt             13 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-xsm          13 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm         13 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd    17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd         12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        13 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2          13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2          14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds             14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw         12 migrate-support-check     fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd              12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd              13 saverestore-support-check fail never pass
 test-armhf-armhf-xl                  13 migrate-support-check     fail never pass
 test-armhf-armhf-xl                  14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale          13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale          14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemuu-win10-i386  10 windows-install           fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install           fail never pass
 test-armhf-armhf-xl-cubietruck       13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck       14 saverestore-support-check fail never pass

version targeted for testing:
 qemuu                4c4414a4388f902b7ae2814f9a64898dd0e426a5
baseline version:
 qemuu                31fe1c414501047cbb91b695bdccc0068496dcf6

Last test of basis   111765  2017-07-13 10:20:16 Z   12 days
Failing since        111790  2017-07-14 04:20:46 Z   11 days   16 attempts
Testing same since   112290  2017-07-25 14:08:31 Z    0 days    1 attempts

People who touched revisions under test:
  Alex Bennée
  Alex Williamson
  Alexander Graf
  Alexey G
  Alexey Gerasimenko
  Alexey Kardashevskiy
  Alistair Francis
  Anthony PERARD
  Anton Nefedov
  Aurelien Jarno
  Bharata B Rao
  Boqun Feng (Intel)
  Christian Borntraeger
  Christian Borntraeger for the s390 part.
  Claudio Imbrenda
  Cornelia Huck
  Cornelia Huck
  Cédric Le Goater
  Daniel Barboza
  Daniel Henrique Barboza
  Daniel P. Berrange
  Daniel Rempel
[Xen-devel] [xen-unstable test] 112286: tolerable trouble: broken/fail/pass - PUSHED
flight 112286 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112286/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl-xsm               4 host-install(4)           broken pass in 112274
 test-arm64-arm64-xl-credit2           4 host-install(4)           broken pass in 112274
 test-amd64-i386-xl-qemuu-ovmf-amd64   5 host-ping-check-native    fail in 112274 pass in 112286
 test-armhf-armhf-xl                   6 xen-install               fail pass in 112274

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds             17 guest-start.2             fail REGR. vs. 112143
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop                fail REGR. vs. 112260

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 18 guest-start/win.repeat    fail in 112274 like 112260
 test-armhf-armhf-xl-rtds             16 guest-start/debian.repeat fail in 112274 like 112260
 test-arm64-arm64-xl-xsm              13 migrate-support-check     fail in 112274 never pass
 test-arm64-arm64-xl-xsm              14 saverestore-support-check fail in 112274 never pass
 test-arm64-arm64-xl-credit2          13 migrate-support-check     fail in 112274 never pass
 test-arm64-arm64-xl-credit2          14 saverestore-support-check fail in 112274 never pass
 test-armhf-armhf-xl                  13 migrate-support-check     fail in 112274 never pass
 test-armhf-armhf-xl                  14 saverestore-support-check fail in 112274 never pass
 test-amd64-i386-xl-qemut-win7-amd64  16 guest-localmigrate/x10    fail like 112210
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop                fail like 112260
 test-armhf-armhf-libvirt             14 saverestore-support-check fail like 112260
 test-armhf-armhf-libvirt-xsm         14 saverestore-support-check fail like 112260
 test-armhf-armhf-libvirt-raw         13 saverestore-support-check fail like 112260
 test-amd64-amd64-xl-rtds             10 debian-install            fail like 112260
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install           fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install           fail never pass
 test-amd64-i386-libvirt-xsm          13 migrate-support-check     fail never pass
 test-amd64-i386-libvirt              13 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt             13 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         13 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm         13 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd         12 migrate-support-check     fail never pass
 test-amd64-amd64-qemuu-nested-amd    17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  13 guest-saverestore         fail never pass
 test-armhf-armhf-xl-xsm              13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-xsm              14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2          13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2          14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  13 guest-saverestore         fail never pass
 test-armhf-armhf-xl-rtds             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds             14 saverestore-support-check fail never pass
 test-arm64-arm64-xl                  13 migrate-support-check     fail never pass
 test-arm64-arm64-xl                  14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale          13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale          14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-xsm         13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw         12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd              12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd              13 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-win10-i386  10 windows-install           fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install           fail never pass
 test-amd64-i386-xl-qemuu-win10-i386  10 windows-install           fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install           fail never pass
 test-armhf-armhf-xl-cubietruck       13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck       14 saverestore-support-check fail never pass

version targeted for testing:
 xen
[Xen-devel] [xen-unstable-smoke test] 112297: tolerable trouble: broken/pass - PUSHED
flight 112297 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112297/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm               1 build-check(1)            blocked n/a
 test-amd64-amd64-libvirt             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl                  13 migrate-support-check     fail never pass
 test-armhf-armhf-xl                  14 saverestore-support-check fail never pass

version targeted for testing:
 xen                  6c9abf0e8022807bb7d677570d0775659950ff1a
baseline version:
 xen                  33a0b4fe90f1ef1a104dd454c931bb46d417ffca

Last test of basis   112284  2017-07-25 10:00:59 Z   0 days
Testing same since   112297  2017-07-25 20:11:50 Z   0 days   1 attempts

People who touched revisions under test:
  Andrew Cooper
  Chao Gao
  George Dunlap
  Julien Grall

jobs:
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      broken
 test-amd64-amd64-xl-qemuu-debianhvm-i386     pass
 test-amd64-amd64-libvirt                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=xen-unstable-smoke
+ revision=6c9abf0e8022807bb7d677570d0775659950ff1a
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; '
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 6c9abf0e8022807bb7d677570d0775659950ff1a
+ branch=xen-unstable-smoke
+ revision=6c9abf0e8022807bb7d677570d0775659950ff1a
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos getconfig Repos perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; '
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.9-testing
+ '[' x6c9abf0e8022807bb7d677570d0775659950ff1a = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e ' use Osstest; readglobalconfig(); print $c{"OsstestUpstream"} or die $!; '
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ :
[Xen-devel] [linux-linus test] 112277: regressions - FAIL
flight 112277 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112277/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-examine              7 reboot                    fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot  fail REGR. vs. 110515
 test-amd64-amd64-xl-pvh-intel         7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-qemuu-nested-intel   7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-amd64-pvgrub         7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-xl-qcow2             7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-debianhvm-amd64 7 xen-boot              fail REGR. vs. 110515
 test-amd64-amd64-pygrub               7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-i386-pvgrub          7 xen-boot                  fail REGR. vs. 110515
 test-amd64-amd64-libvirt-pair        21 guest-start/debian        fail REGR. vs. 110515
 test-amd64-i386-libvirt-pair         21 guest-start/debian        fail REGR. vs. 110515
 test-amd64-i386-pair                 21 guest-start/debian        fail REGR. vs. 110515

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-pair                21 guest-start/debian        fail pass in 112271

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64  18 guest-start/win.repeat    fail blocked in 110515
 test-amd64-amd64-xl-qemut-win7-amd64 18 guest-start/win.repeat    fail in 112271 blocked in 110515
 test-amd64-i386-xl-qemuu-win7-amd64  17 guest-stop                fail in 112271 like 110515
 test-armhf-armhf-libvirt             14 saverestore-support-check fail like 110515
 test-armhf-armhf-libvirt-xsm         14 saverestore-support-check fail like 110515
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop                fail like 110515
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop                fail like 110515
 test-amd64-i386-xl-qemut-win7-amd64  17 guest-stop                fail like 110515
 test-armhf-armhf-libvirt-raw         13 saverestore-support-check fail like 110515
 test-amd64-amd64-xl-rtds             10 debian-install            fail like 110515
 test-armhf-armhf-xl-rtds             16 guest-start/debian.repeat fail like 110515
 test-amd64-i386-libvirt-xsm          13 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm         13 migrate-support-check     fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install           fail never pass
 test-amd64-i386-libvirt              13 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt             13 migrate-support-check     fail never pass
 test-arm64-arm64-xl                  13 migrate-support-check     fail never pass
 test-arm64-arm64-xl                  14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm              13 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm              14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale          13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale          14 saverestore-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd    17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-credit2          13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2          14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-xsm              13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-xsm              14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-xsm         13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu        14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck       13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck       14 saverestore-support-check fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install           fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  13 guest-saverestore         fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64  13 guest-saverestore         fail never pass
 test-armhf-armhf-xl-rtds             13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds             14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw         12 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         13 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         14 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2          13 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit2          14 saverestore-support-check fail
Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-4.13
On 07/25/2017 03:21 PM, Konrad Rzeszutek Wilk wrote:
> Hi Jens,
>
> Please git pull in your branch "for-linus" the following
> branch which is based on 765e40b675a9566459ddcb8358ad16f3b8344bbe
> "blk-mq: map queues to all present CPUs":
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
>   stable/for-jens-4.13
>
> It has two bug-fixes for the xen-blkfront driver.
>
> Thank you!
>
>  drivers/block/xen-blkfront.c | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> Dongli Zhang (1):
>       xen/blkfront: always allocate grants first from per-queue persistent
>       grants
>
> Junxiao Bi (1):
>       xen-blkfront: fix mq start/stop race

Pulled, thanks.

-- 
Jens Axboe

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 07/13] xen/pvcalls: implement accept command
Send PVCALLS_ACCEPT to the backend. Allocate a new active socket. Make
sure that only one accept command is executed at any given time by
setting PVCALLS_FLAG_ACCEPT_INFLIGHT and waiting on the
inflight_accept_req waitqueue.

sock->sk->sk_send_head is not used for ip sockets: reuse the field to
store a pointer to the struct sock_mapping corresponding to the socket.

Convert the new struct socket pointer into a uint64_t and use it as id
for the new socket to pass to the backend.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 79 +
 drivers/xen/pvcalls-front.h |  3 ++
 2 files changed, 82 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 3b5d50e..b8c4538 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -413,6 +413,85 @@ int pvcalls_front_listen(struct socket *sock, int backlog)
 	return ret;
 }
 
+int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	struct sock_mapping *map2 = NULL;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret, evtchn;
+
+	if (!pvcalls_front_dev)
+		return -ENOTCONN;
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+	if (!map)
+		return -ENOTSOCK;
+
+	if (map->passive.status != PVCALLS_STATUS_LISTEN)
+		return -EINVAL;
+
+	/*
+	 * Backend only supports 1 inflight accept request, will return
+	 * errors for the others
+	 */
+	if (test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+			     (void *)&map->passive.flags)) {
+		if (wait_event_interruptible(map->passive.inflight_accept_req,
+			!test_and_set_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+					  (void *)&map->passive.flags))
+		    != 0)
+			return -EINTR;
+	}
+
+	newsock->sk = kzalloc(sizeof(*newsock->sk), GFP_KERNEL);
+	if (newsock->sk == NULL)
+		return -ENOMEM;
+
+	spin_lock(&bedata->pvcallss_lock);
+	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+	if (RING_FULL(&bedata->ring) ||
+	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+		kfree(newsock->sk);
+		spin_unlock(&bedata->pvcallss_lock);
+		return -EAGAIN;
+	}
+
+	map2 = create_active(&evtchn);
+
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_ACCEPT;
+	req->u.accept.id = (uint64_t) sock;
+	req->u.accept.ref = map2->active.ref;
+	req->u.accept.id_new = (uint64_t) newsock;
+	req->u.accept.evtchn = evtchn;
+
+	list_add_tail(&map2->list, &bedata->socket_mappings);
+	WRITE_ONCE(newsock->sk->sk_send_head, (void *)map2);
+	map2->sock = newsock;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->pvcallss_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	clear_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT, (void *)&map->passive.flags);
+	wake_up(&map->passive.inflight_accept_req);
+
+	ret = bedata->rsp[req_id].ret;
+	/* read ret, then set this rsp slot to be reused */
+	smp_mb();
+	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }

diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index aa8fe10..ab4f1da 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -10,5 +10,8 @@ int pvcalls_front_bind(struct socket *sock,
 		       struct sockaddr *addr,
 		       int addr_len);
 int pvcalls_front_listen(struct socket *sock, int backlog);
+int pvcalls_front_accept(struct socket *sock,
+			 struct socket *newsock,
+			 int flags);
 #endif
-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 09/13] xen/pvcalls: implement recvmsg
Implement recvmsg by copying data from the "in" ring. If not enough data
is available and the recvmsg call is blocking, then wait on the
inflight_conn_req waitqueue. Take the active socket in_mutex so that
only one function can access the ring at any given time.

If not enough data is available on the ring, rather than returning
immediately or sleep-waiting, spin for up to 5000 cycles. This small
optimization turns out to improve performance and latency significantly.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 106
 drivers/xen/pvcalls-front.h | 4 ++
 2 files changed, 110 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index d8ed280..b4ca569 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -96,6 +96,20 @@ static int pvcalls_front_write_todo(struct sock_mapping *map)
 	return size - pvcalls_queued(prod, cons, size);
 }
 
+static bool pvcalls_front_read_todo(struct sock_mapping *map)
+{
+	struct pvcalls_data_intf *intf = map->active.ring;
+	RING_IDX cons, prod;
+	int32_t error;
+
+	cons = intf->in_cons;
+	prod = intf->in_prod;
+	error = intf->in_error;
+	return (error != 0 ||
+		pvcalls_queued(prod, cons,
+			       XEN_FLEX_RING_SIZE(intf->ring_order))) != 0;
+}
+
 static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 {
 	struct xenbus_device *dev = dev_id;
@@ -418,6 +432,98 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
 	return tot_sent;
 }
 
+static int __read_ring(struct pvcalls_data_intf *intf,
+		       struct pvcalls_data *data,
+		       struct iov_iter *msg_iter,
+		       size_t len, int flags)
+{
+	RING_IDX cons, prod, size, masked_prod, masked_cons;
+	RING_IDX array_size = XEN_FLEX_RING_SIZE(intf->ring_order);
+	int32_t error;
+
+	cons = intf->in_cons;
+	prod = intf->in_prod;
+	error = intf->in_error;
+	/* get pointers before reading from the ring */
+	virt_rmb();
+	if (error < 0)
+		return error;
+
+	size = pvcalls_queued(prod, cons, array_size);
+	masked_prod = pvcalls_mask(prod, array_size);
+	masked_cons = pvcalls_mask(cons, array_size);
+
+	if (size == 0)
+		return 0;
+
+	if (len > size)
+		len = size;
+
+	if (masked_prod > masked_cons) {
+		copy_to_iter(data->in + masked_cons, len, msg_iter);
+	} else {
+		if (len > (array_size - masked_cons)) {
+			copy_to_iter(data->in + masked_cons,
+				     array_size - masked_cons, msg_iter);
+			copy_to_iter(data->in,
+				     len - (array_size - masked_cons),
+				     msg_iter);
+		} else {
+			copy_to_iter(data->in + masked_cons, len, msg_iter);
+		}
+	}
+	/* read data from the ring before increasing the index */
+	virt_mb();
+	if (!(flags & MSG_PEEK))
+		intf->in_cons += len;
+
+	return len;
+}
+
+int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+			  int flags)
+{
+	struct pvcalls_bedata *bedata;
+	int ret = -EAGAIN;
+	struct sock_mapping *map;
+	int count = 0;
+
+	if (!pvcalls_front_dev)
+		return -ENOTCONN;
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+	if (!map)
+		return -ENOTSOCK;
+
+	if (flags & (MSG_CMSG_CLOEXEC|MSG_ERRQUEUE|MSG_OOB|MSG_TRUNC))
+		return -EOPNOTSUPP;
+
+	mutex_lock(&map->active.in_mutex);
+	if (len > XEN_FLEX_RING_SIZE(map->active.ring->ring_order))
+		len = XEN_FLEX_RING_SIZE(map->active.ring->ring_order);
+
+	while (!(flags & MSG_DONTWAIT) && !pvcalls_front_read_todo(map)) {
+		if (count < PVCALLS_FRONT_MAX_SPIN)
+			count++;
+		else
+			wait_event_interruptible(map->active.inflight_conn_req,
+						 pvcalls_front_read_todo(map));
+	}
+	ret = __read_ring(map->active.ring, &map->active.data,
+			  &msg->msg_iter, len, flags);
+
+	if (ret > 0)
+		notify_remote_via_irq(map->active.irq);
+	if (ret == 0)
+		ret = -EAGAIN;
+	if (ret == -ENOTCONN)
+		ret = 0;
+
+	mutex_unlock(&map->active.in_mutex);
+	return ret;
+}
+
 int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr,
 		       int addr_len)
 {
 	struct pvcalls_bedata *bedata;
diff --git
[Xen-devel] [PATCH v2 10/13] xen/pvcalls: implement poll command
For active sockets, check the indexes and use the inflight_conn_req
waitqueue to wait.

For passive sockets, send PVCALLS_POLL to the backend. Use the
inflight_accept_req waitqueue if an accept is outstanding. Otherwise use
the inflight_req waitqueue: inflight_req is woken when a new response is
received; on wakeup we check whether the POLL response has arrived by
looking at the PVCALLS_FLAG_POLL_RET flag. We set the flag from
pvcalls_front_event_handler, if the response was for a POLL command.

In pvcalls_front_event_handler, get the struct socket pointer from the
poll id (we previously converted struct socket* to uint64_t and used it
as id).

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 134
 drivers/xen/pvcalls-front.h | 3 +
 2 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index b4ca569..833b717 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -130,17 +130,35 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
 		rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
 
 		req_id = rsp->req_id;
-		src = (uint8_t *)&bedata->rsp[req_id];
-		src += sizeof(rsp->req_id);
-		dst = (uint8_t *)rsp;
-		dst += sizeof(rsp->req_id);
-		memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
-		/*
-		 * First copy the rest of the data, then req_id. It is
-		 * paired with the barrier when accessing bedata->rsp.
-		 */
-		smp_wmb();
-		WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
+		if (rsp->cmd == PVCALLS_POLL) {
+			struct socket *sock = (struct socket *) rsp->u.poll.id;
+			struct sock_mapping *map =
+				(struct sock_mapping *)
+				READ_ONCE(sock->sk->sk_send_head);
+
+			set_bit(PVCALLS_FLAG_POLL_RET,
+				(void *)&map->passive.flags);
+			/*
+			 * Set RET, then clear INFLIGHT. It pairs with
+			 * the checks at the beginning of
+			 * pvcalls_front_poll_passive.
+			 */
+			smp_wmb();
+			clear_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+				  (void *)&map->passive.flags);
+		} else {
+			src = (uint8_t *)&bedata->rsp[req_id];
+			src += sizeof(rsp->req_id);
+			dst = (uint8_t *)rsp;
+			dst += sizeof(rsp->req_id);
+			memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
+			/*
+			 * First copy the rest of the data, then req_id. It is
+			 * paired with the barrier when accessing bedata->rsp.
+			 */
+			smp_wmb();
+			WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
+		}
 
 		done = 1;
 		bedata->ring.rsp_cons++;
@@ -707,6 +725,100 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
 	return ret;
 }
 
+static unsigned int pvcalls_front_poll_passive(struct file *file,
+					       struct pvcalls_bedata *bedata,
+					       struct sock_mapping *map,
+					       poll_table *wait)
+{
+	int notify, req_id;
+	struct xen_pvcalls_request *req;
+
+	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
+		     (void *)&map->passive.flags)) {
+		poll_wait(file, &map->passive.inflight_accept_req, wait);
+		return 0;
+	}
+
+	if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
+			       (void *)&map->passive.flags))
+		return POLLIN;
+
+	/*
+	 * First check RET, then INFLIGHT. No barriers necessary to
+	 * ensure execution ordering because of the conditional
+	 * instructions creating control dependencies.
+	 */
+
+	if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT,
+			     (void *)&map->passive.flags)) {
+		poll_wait(file, &bedata->inflight_req, wait);
+		return 0;
+	}
+
+	spin_lock(&bedata->pvcallss_lock);
+	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+	if (RING_FULL(&bedata->ring) ||
+	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
+		spin_unlock(&bedata->pvcallss_lock);
+		return -EAGAIN;
+	}
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+
[Xen-devel] [PATCH v2 01/13] xen/pvcalls: introduce the pvcalls xenbus frontend
Introduce a xenbus frontend for the pvcalls protocol, as defined by
https://xenbits.xen.org/docs/unstable/misc/pvcalls.html.

This patch only adds the stubs, the code will be added by the following
patches.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 61 +
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.c

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
new file mode 100644
index 000..a8d38c2
--- /dev/null
+++ b/drivers/xen/pvcalls-front.c
@@ -0,0 +1,61 @@
+/*
+ * (c) 2017 Stefano Stabellini
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+static const struct xenbus_device_id pvcalls_front_ids[] = {
+	{ "pvcalls" },
+	{ "" }
+};
+
+static int pvcalls_front_remove(struct xenbus_device *dev)
+{
+	return 0;
+}
+
+static int pvcalls_front_probe(struct xenbus_device *dev,
+			       const struct xenbus_device_id *id)
+{
+	return 0;
+}
+
+static void pvcalls_front_changed(struct xenbus_device *dev,
+				  enum xenbus_state backend_state)
+{
+}
+
+static struct xenbus_driver pvcalls_front_driver = {
+	.ids = pvcalls_front_ids,
+	.probe = pvcalls_front_probe,
+	.remove = pvcalls_front_remove,
+	.otherend_changed = pvcalls_front_changed,
+};
+
+static int __init pvcalls_frontend_init(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	pr_info("Initialising Xen pvcalls frontend driver\n");
+
+	return xenbus_register_frontend(&pvcalls_front_driver);
+}
+
+module_init(pvcalls_frontend_init);
--
1.9.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 04/13] xen/pvcalls: implement connect command
Send PVCALLS_CONNECT to the backend. Allocate a new ring and evtchn for the active socket. Introduce a data structure to keep track of sockets. Introduce a waitqueue to allow the frontend to wait on data coming from the backend on the active socket (recvmsg command). Two mutexes (one of reads and one for writes) will be used to protect the active socket in and out rings from concurrent accesses. sock->sk->sk_send_head is not used for ip sockets: reuse the field to store a pointer to the struct sock_mapping corresponding to the socket. This way, we can easily get the struct sock_mapping from the struct socket. Convert the struct socket pointer into an uint64_t and use it as id for the new socket to pass to the backend. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 177 +--- drivers/xen/pvcalls-front.h | 2 + 2 files changed, 168 insertions(+), 11 deletions(-) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index d1dbcf1..d0f5f42 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -13,6 +13,10 @@ */ #include +#include +#include + +#include #include #include @@ -40,6 +44,24 @@ struct pvcalls_bedata { }; struct xenbus_device *pvcalls_front_dev; +struct sock_mapping { + bool active_socket; + struct list_head list; + struct socket *sock; + union { + struct { + int irq; + grant_ref_t ref; + struct pvcalls_data_intf *ring; + struct pvcalls_data data; + struct mutex in_mutex; + struct mutex out_mutex; + + wait_queue_head_t inflight_conn_req; + } active; + }; +}; + static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) { struct xenbus_device *dev = dev_id; @@ -84,6 +106,18 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) return IRQ_HANDLED; } +static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map) +{ + struct sock_mapping *map = sock_map; + + if (map == NULL) + return IRQ_HANDLED; + + 
wake_up_interruptible(>active.inflight_conn_req); + + return IRQ_HANDLED; +} + int pvcalls_front_socket(struct socket *sock) { struct pvcalls_bedata *bedata; @@ -137,6 +171,127 @@ int pvcalls_front_socket(struct socket *sock) return ret; } +static struct sock_mapping *create_active(int *evtchn) +{ + struct sock_mapping *map = NULL; + void *bytes; + int ret, irq = -1, i; + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (map == NULL) + return NULL; + + init_waitqueue_head(>active.inflight_conn_req); + + map->active.ring = (struct pvcalls_data_intf *) + __get_free_page(GFP_KERNEL | __GFP_ZERO); + if (map->active.ring == NULL) + goto out_error; + memset(map->active.ring, 0, XEN_PAGE_SIZE); + map->active.ring->ring_order = RING_ORDER; + bytes = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, + map->active.ring->ring_order); + if (bytes == NULL) + goto out_error; + for (i = 0; i < (1 << map->active.ring->ring_order); i++) + map->active.ring->ref[i] = gnttab_grant_foreign_access( + pvcalls_front_dev->otherend_id, + pfn_to_gfn(virt_to_pfn(bytes) + i), 0); + + map->active.ref = gnttab_grant_foreign_access( + pvcalls_front_dev->otherend_id, + pfn_to_gfn(virt_to_pfn((void *)map->active.ring)), 0); + + map->active.data.in = bytes; + map->active.data.out = bytes + + XEN_FLEX_RING_SIZE(map->active.ring->ring_order); + + ret = xenbus_alloc_evtchn(pvcalls_front_dev, evtchn); + if (ret) + goto out_error; + irq = bind_evtchn_to_irqhandler(*evtchn, pvcalls_front_conn_handler, + 0, "pvcalls-frontend", map); + if (irq < 0) + goto out_error; + + map->active.irq = irq; + map->active_socket = true; + mutex_init(>active.in_mutex); + mutex_init(>active.out_mutex); + + return map; + +out_error: + if (irq >= 0) + unbind_from_irqhandler(irq, map); + else if (*evtchn >= 0) + xenbus_free_evtchn(pvcalls_front_dev, *evtchn); + kfree(map->active.data.in); + kfree(map->active.ring); + kfree(map); + return NULL; +} + +int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, + int 
addr_len, int flags) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map = NULL; + struct
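The id scheme described above — casting the struct socket pointer to a uint64_t for the backend, and stashing the struct sock_mapping pointer in the unused sk_send_head field so it can be recovered on the response path — can be illustrated with a small userspace sketch. The struct definitions below are toy stand-ins, not the real kernel types; only the pointer-round-trip pattern is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures. */
struct sock_mapping { int ring_slot; };
struct sock   { void *sk_send_head; };  /* field reused as a back-pointer */
struct socket { struct sock *sk; };

/* Wire id sent to the backend: the socket pointer, widened to 64 bits. */
static uint64_t wire_id(struct socket *sock) {
    return (uint64_t)(uintptr_t)sock;
}

/* Response path: turn the id back into a pointer, then follow the
 * stashed sk_send_head back-pointer to the sock_mapping. */
static struct sock_mapping *map_from_id(uint64_t id) {
    struct socket *sock = (struct socket *)(uintptr_t)id;
    return (struct sock_mapping *)sock->sk->sk_send_head;
}
```

This only works because, as the commit message notes, sk_send_head is unused for this socket family, so the frontend is free to repurpose it as the back-pointer.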
[Xen-devel] [PATCH v2 05/13] xen/pvcalls: implement bind command
Send PVCALLS_BIND to the backend. Introduce a new structure, part of struct sock_mapping, to store information specific to passive sockets. Introduce a status field to keep track of the status of the passive socket. Introduce a waitqueue for the "accept" command (see the accept command implementation): it is used to allow only one outstanding accept command at any given time and to implement polling on the passive socket. Introduce a flags field to keep track of in-flight accept and poll commands. sock->sk->sk_send_head is not used for ip sockets: reuse the field to store a pointer to the struct sock_mapping corresponding to the socket. Convert the struct socket pointer into an uint64_t and use it as id for the socket to pass to the backend. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 73 + drivers/xen/pvcalls-front.h | 3 ++ 2 files changed, 76 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index d0f5f42..af2ce20 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -59,6 +59,23 @@ struct sock_mapping { wait_queue_head_t inflight_conn_req; } active; + struct { + /* Socket status */ +#define PVCALLS_STATUS_UNINITALIZED 0 +#define PVCALLS_STATUS_BIND 1 +#define PVCALLS_STATUS_LISTEN2 + uint8_t status; + /* +* Internal state-machine flags. +* Only one accept operation can be inflight for a socket. +* Only one poll operation can be inflight for a given socket. 
+*/ +#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0 +#define PVCALLS_FLAG_POLL_INFLIGHT 1 +#define PVCALLS_FLAG_POLL_RET2 + uint8_t flags; + wait_queue_head_t inflight_accept_req; + } passive; }; }; @@ -292,6 +309,62 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, return ret; } +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map = NULL; + struct xen_pvcalls_request *req; + int notify, req_id, ret; + + if (!pvcalls_front_dev) + return -ENOTCONN; + if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM) + return -ENOTSUPP; + bedata = dev_get_drvdata(_front_dev->dev); + + map = kzalloc(sizeof(*map), GFP_KERNEL); + if (map == NULL) + return -ENOMEM; + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + if (RING_FULL(>ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + kfree(map); + spin_unlock(>pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + map->sock = sock; + req->cmd = PVCALLS_BIND; + req->u.bind.id = (uint64_t) sock; + memcpy(req->u.bind.addr, addr, sizeof(*addr)); + req->u.bind.len = addr_len; + + init_waitqueue_head(>passive.inflight_accept_req); + + list_add_tail(>list, >socketpass_mappings); + WRITE_ONCE(sock->sk->sk_send_head, (void *)map); + map->active_socket = false; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(>ring, notify); + spin_unlock(>pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + wait_event(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id); + + map->passive.status = PVCALLS_STATUS_BIND; + ret = bedata->rsp[req_id].ret; + /* read ret, then set this rsp slot to be reused */ + smp_mb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + return 0; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } 
diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h index 63b0417..8b0a274 100644 --- a/drivers/xen/pvcalls-front.h +++ b/drivers/xen/pvcalls-front.h @@ -6,5 +6,8 @@ int pvcalls_front_socket(struct socket *sock); int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, int addr_len, int flags); +int pvcalls_front_bind(struct socket *sock, + struct sockaddr *addr, + int addr_len); #endif -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 11/13] xen/pvcalls: implement release command
Send PVCALLS_RELEASE to the backend and wait for a reply. Take both in_mutex and out_mutex to avoid concurrent accesses. Then, free the socket. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 85 + drivers/xen/pvcalls-front.h | 1 + 2 files changed, 86 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 833b717..5a4040e 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -184,6 +184,23 @@ static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map) return IRQ_HANDLED; } +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata, + struct sock_mapping *map) +{ + int i; + + spin_lock(>pvcallss_lock); + if (!list_empty(>list)) + list_del_init(>list); + spin_unlock(>pvcallss_lock); + + for (i = 0; i < (1 << map->active.ring->ring_order); i++) + gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0); + gnttab_end_foreign_access(map->active.ref, 0, 0); + free_page((unsigned long)map->active.ring); + unbind_from_irqhandler(map->active.irq, map); +} + int pvcalls_front_socket(struct socket *sock) { struct pvcalls_bedata *bedata; @@ -819,6 +836,74 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock, return pvcalls_front_poll_passive(file, bedata, map, wait); } +int pvcalls_front_release(struct socket *sock) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map; + int req_id, notify; + struct xen_pvcalls_request *req; + + if (!pvcalls_front_dev) + return -EIO; + bedata = dev_get_drvdata(_front_dev->dev); + if (!bedata) + return -EIO; + + if (sock->sk == NULL) + return 0; + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (map == NULL) + return 0; + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + if (RING_FULL(>ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + spin_unlock(>pvcallss_lock); + return 
-EAGAIN; + } + WRITE_ONCE(sock->sk->sk_send_head, NULL); + + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_RELEASE; + req->u.release.id = (uint64_t)sock; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(>ring, notify); + spin_unlock(>pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + wait_event(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id); + + if (map->active_socket) { + /* +* Set in_error and wake up inflight_conn_req to force +* recvmsg waiters to exit. +*/ + map->active.ring->in_error = -EBADF; + wake_up_interruptible(>active.inflight_conn_req); + + mutex_lock(>active.in_mutex); + mutex_lock(>active.out_mutex); + pvcalls_front_free_map(bedata, map); + mutex_unlock(>active.out_mutex); + mutex_unlock(>active.in_mutex); + kfree(map); + } else { + spin_lock(>pvcallss_lock); + list_del_init(>list); + kfree(map); + spin_unlock(>pvcallss_lock); + } + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + + return 0; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h index 25e05b8..3332978 100644 --- a/drivers/xen/pvcalls-front.h +++ b/drivers/xen/pvcalls-front.h @@ -23,5 +23,6 @@ int pvcalls_front_recvmsg(struct socket *sock, unsigned int pvcalls_front_poll(struct file *file, struct socket *sock, poll_table *wait); +int pvcalls_front_release(struct socket *sock); #endif -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 13/13] xen: introduce a Kconfig option to enable the pvcalls frontend
Also add pvcalls-front to the Makefile.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/Kconfig  | 9 +
 drivers/xen/Makefile | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 4545561..0b2c828 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,15 @@ config XEN_PCIDEV_BACKEND
 
 	  If in doubt, say m.
 
+config XEN_PVCALLS_FRONTEND
+	tristate "XEN PV Calls frontend driver"
+	depends on INET && XEN
+	help
+	  Experimental frontend for the Xen PV Calls protocol
+	  (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+	  sends a small set of POSIX calls to the backend, which
+	  implements them.
+
 config XEN_PVCALLS_BACKEND
 	bool "XEN PV Calls backend driver"
 	depends on INET && XEN && XEN_BACKEND

diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 480b928..afb9e03 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_XEN_EFI)			+= efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND)		+= xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)		+= xlate_mmu.o
 obj-$(CONFIG_XEN_PVCALLS_BACKEND)	+= pvcalls-back.o
+obj-$(CONFIG_XEN_PVCALLS_FRONTEND)	+= pvcalls-front.o
 xen-evtchn-y				:= evtchn.o
 xen-gntdev-y				:= gntdev.o
 xen-gntalloc-y				:= gntalloc.o
--
1.9.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 08/13] xen/pvcalls: implement sendmsg
Send data to an active socket by copying data to the "out" ring. Take the active socket out_mutex so that only one function can access the ring at any given time. If not enough room is available on the ring, rather than returning immediately or sleep-waiting, spin for up to 5000 cycles. This small optimization turns out to improve performance significantly. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 109 drivers/xen/pvcalls-front.h | 3 ++ 2 files changed, 112 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index b8c4538..d8ed280 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -29,6 +29,7 @@ #define PVCALLS_INVALID_ID (UINT_MAX) #define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE) +#define PVCALLS_FRONT_MAX_SPIN 5000 struct pvcalls_bedata { struct xen_pvcalls_front_ring ring; @@ -79,6 +80,22 @@ struct sock_mapping { }; }; +static int pvcalls_front_write_todo(struct sock_mapping *map) +{ + struct pvcalls_data_intf *intf = map->active.ring; + RING_IDX cons, prod, size = XEN_FLEX_RING_SIZE(intf->ring_order); + int32_t error; + + cons = intf->out_cons; + prod = intf->out_prod; + error = intf->out_error; + if (error == -ENOTCONN) + return 0; + if (error != 0) + return error; + return size - pvcalls_queued(prod, cons, size); +} + static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) { struct xenbus_device *dev = dev_id; @@ -309,6 +326,98 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr, return ret; } +static int __write_ring(struct pvcalls_data_intf *intf, + struct pvcalls_data *data, + struct iov_iter *msg_iter, + size_t len) +{ + RING_IDX cons, prod, size, masked_prod, masked_cons; + RING_IDX array_size = XEN_FLEX_RING_SIZE(intf->ring_order); + int32_t error; + + cons = intf->out_cons; + prod = intf->out_prod; + error = 
intf->out_error; + /* read indexes before continuing */ + virt_mb(); + + if (error < 0) + return error; + + size = pvcalls_queued(prod, cons, array_size); + if (size >= array_size) + return 0; + if (len > array_size - size) + len = array_size - size; + + masked_prod = pvcalls_mask(prod, array_size); + masked_cons = pvcalls_mask(cons, array_size); + + if (masked_prod < masked_cons) { + copy_from_iter(data->out + masked_prod, len, msg_iter); + } else { + if (len > array_size - masked_prod) { + copy_from_iter(data->out + masked_prod, + array_size - masked_prod, msg_iter); + copy_from_iter(data->out, + len - (array_size - masked_prod), + msg_iter); + } else { + copy_from_iter(data->out + masked_prod, len, msg_iter); + } + } + /* write to ring before updating pointer */ + virt_wmb(); + intf->out_prod += len; + + return len; +} + +int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg, + size_t len) +{ + struct pvcalls_bedata *bedata; + struct sock_mapping *map; + int sent = 0, tot_sent = 0; + int count = 0, flags; + + if (!pvcalls_front_dev) + return -ENOTCONN; + bedata = dev_get_drvdata(_front_dev->dev); + + map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head); + if (!map) + return -ENOTSOCK; + + flags = msg->msg_flags; + if (flags & (MSG_CONFIRM|MSG_DONTROUTE|MSG_EOR|MSG_OOB)) + return -EOPNOTSUPP; + + mutex_lock(>active.out_mutex); + if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) { + mutex_unlock(>active.out_mutex); + return -EAGAIN; + } + +again: + count++; + sent = __write_ring(map->active.ring, + >active.data, >msg_iter, + len); + if (sent > 0) { + len -= sent; + tot_sent += sent; + notify_remote_via_irq(map->active.irq); + } + if (sent >= 0 && len > 0 && count < PVCALLS_FRONT_MAX_SPIN) + goto again; + if (sent < 0) + tot_sent = sent; + + mutex_unlock(>active.out_mutex); + return tot_sent; +} + int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len) { struct pvcalls_bedata
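The wraparound copy in __write_ring — mask the producer index, and split the copy in two when the free span crosses the end of the ring buffer — can be demonstrated with plain memcpy in userspace. This is a hedged sketch of the index arithmetic only (power-of-two ring, free-running indexes, masked offsets); it deliberately omits the virt_mb()/virt_wmb() barriers and error handling of the real code, and `ring_write` is an illustrative name, not the kernel function.

```c
#include <assert.h>
#include <string.h>

#define RING_SIZE 8U   /* power of two, like XEN_FLEX_RING_SIZE(order) */

static unsigned ring_mask(unsigned idx)                 { return idx & (RING_SIZE - 1); }
static unsigned ring_queued(unsigned prod, unsigned cons) { return prod - cons; }

/* Copy up to len bytes into the ring at prod, splitting the copy at the
 * wrap point, mirroring the two copy_from_iter() calls in __write_ring(). */
static unsigned ring_write(char *ring, unsigned *prod, unsigned cons,
                           const char *src, unsigned len) {
    unsigned avail = RING_SIZE - ring_queued(*prod, cons);
    unsigned mp = ring_mask(*prod);

    if (len > avail)
        len = avail;
    if (len > RING_SIZE - mp) {
        /* Free space wraps: fill to the end, then continue at offset 0. */
        memcpy(ring + mp, src, RING_SIZE - mp);
        memcpy(ring, src + (RING_SIZE - mp), len - (RING_SIZE - mp));
    } else {
        memcpy(ring + mp, src, len);
    }
    *prod += len;   /* in the patch a write barrier precedes this update */
    return len;
}
```

Because prod and cons are free-running unsigned counters, `prod - cons` gives the queued byte count even across integer wraparound, which is why only the masked values are ever used as buffer offsets.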
[Xen-devel] [PATCH v2 12/13] xen/pvcalls: implement frontend disconnect
Implement pvcalls frontend removal function. Go through the list of
active and passive sockets and free them all, one at a time.

Signed-off-by: Stefano Stabellini
Reviewed-by: Juergen Gross
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 5a4040e..b3d4675 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -911,6 +911,34 @@ int pvcalls_front_release(struct socket *sock)
 
 static int pvcalls_front_remove(struct xenbus_device *dev)
 {
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map = NULL, *n;
+
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	list_for_each_entry_safe(map, n, &bedata->socket_mappings, list) {
+		mutex_lock(&map->active.in_mutex);
+		mutex_lock(&map->active.out_mutex);
+		pvcalls_front_free_map(bedata, map);
+		mutex_unlock(&map->active.out_mutex);
+		mutex_unlock(&map->active.in_mutex);
+		kfree(map);
+	}
+	list_for_each_entry_safe(map, n, &bedata->socketpass_mappings, list) {
+		spin_lock(&bedata->pvcallss_lock);
+		list_del_init(&map->list);
+		spin_unlock(&bedata->pvcallss_lock);
+		kfree(map);
+	}
+	if (bedata->irq > 0)
+		unbind_from_irqhandler(bedata->irq, dev);
+	if (bedata->ref >= 0)
+		gnttab_end_foreign_access(bedata->ref, 0, 0);
+	kfree(bedata->ring.sring);
+	kfree(bedata);
+	dev_set_drvdata(&dev->dev, NULL);
+	xenbus_switch_state(dev, XenbusStateClosed);
+	pvcalls_front_dev = NULL;
 	return 0;
 }
--
1.9.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 02/13] xen/pvcalls: connect to the backend
Implement the probe function for the pvcalls frontend. Read the supported versions, max-page-order and function-calls nodes from xenstore. Introduce a data structure named pvcalls_bedata. It contains pointers to the command ring, the event channel, a list of active sockets and a list of passive sockets. Lists accesses are protected by a spin_lock. Introduce a waitqueue to allow waiting for a response on commands sent to the backend. Introduce an array of struct xen_pvcalls_response to store commands responses. Only one frontend<->backend connection is supported at any given time for a guest. Store the active frontend device to a static pointer. Introduce a stub functions for the event handler. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 153 1 file changed, 153 insertions(+) diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index a8d38c2..5e0b265 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -20,6 +20,29 @@ #include #include +#define PVCALLS_INVALID_ID (UINT_MAX) +#define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE) + +struct pvcalls_bedata { + struct xen_pvcalls_front_ring ring; + grant_ref_t ref; + int irq; + + struct list_head socket_mappings; + struct list_head socketpass_mappings; + spinlock_t pvcallss_lock; + + wait_queue_head_t inflight_req; + struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING]; +}; +struct xenbus_device *pvcalls_front_dev; + +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) +{ + return IRQ_HANDLED; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } @@ -33,12 +56,142 @@ static int pvcalls_front_remove(struct xenbus_device *dev) static int pvcalls_front_probe(struct xenbus_device *dev, const struct xenbus_device_id *id) { + int ret = -EFAULT, evtchn, ref = -1, i; + unsigned int 
max_page_order, function_calls, len; + char *versions; + grant_ref_t gref_head = 0; + struct xenbus_transaction xbt; + struct pvcalls_bedata *bedata = NULL; + struct xen_pvcalls_sring *sring; + + if (pvcalls_front_dev != NULL) { + dev_err(>dev, "only one PV Calls connection supported\n"); + return -EINVAL; + } + + versions = xenbus_read(XBT_NIL, dev->otherend, "versions", ); + if (!len) + return -EINVAL; + if (strcmp(versions, "1")) { + kfree(versions); + return -EINVAL; + } + kfree(versions); + ret = xenbus_scanf(XBT_NIL, dev->otherend, + "max-page-order", "%u", _page_order); + if (ret <= 0) + return -ENODEV; + if (max_page_order < RING_ORDER) + return -ENODEV; + ret = xenbus_scanf(XBT_NIL, dev->otherend, + "function-calls", "%u", _calls); + if (ret <= 0 || function_calls != 1) + return -ENODEV; + pr_info("%s max-page-order is %u\n", __func__, max_page_order); + + bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL); + if (!bedata) + return -ENOMEM; + + init_waitqueue_head(>inflight_req); + for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++) + bedata->rsp[i].req_id = PVCALLS_INVALID_ID; + + sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL | +__GFP_ZERO); + if (!sring) + goto error; + SHARED_RING_INIT(sring); + FRONT_RING_INIT(>ring, sring, XEN_PAGE_SIZE); + + ret = xenbus_alloc_evtchn(dev, ); + if (ret) + goto error; + + bedata->irq = bind_evtchn_to_irqhandler(evtchn, + pvcalls_front_event_handler, + 0, "pvcalls-frontend", dev); + if (bedata->irq < 0) { + ret = bedata->irq; + goto error; + } + + ret = gnttab_alloc_grant_references(1, _head); + if (ret < 0) + goto error; + bedata->ref = ref = gnttab_claim_grant_reference(_head); + if (ref < 0) + goto error; + gnttab_grant_foreign_access_ref(ref, dev->otherend_id, + virt_to_gfn((void *)sring), 0); + + again: + ret = xenbus_transaction_start(); + if (ret) { + xenbus_dev_fatal(dev, ret, "starting transaction"); + goto error; + } + ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1); + if (ret) 
+ goto error_xenbus; + ret = xenbus_printf(xbt,
[Xen-devel] [PATCH v2 03/13] xen/pvcalls: implement socket command and handle events
Send a PVCALLS_SOCKET command to the backend, use the masked req_prod_pvt as req_id. This way, req_id is guaranteed to be between 0 and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp array ready for the response, and there cannot be two outstanding responses with the same req_id. Wait for the response by waiting on the inflight_req waitqueue and check for the req_id field in rsp[req_id]. Use atomic accesses to read the field. Once a response is received, clear the corresponding rsp slot by setting req_id to PVCALLS_INVALID_ID. Note that PVCALLS_INVALID_ID is invalid only from the frontend point of view. It is not part of the PVCalls protocol. pvcalls_front_event_handler is in charge of copying responses from the ring to the appropriate rsp slot. It is done by copying the body of the response first, then by copying req_id atomically. After the copies, wake up anybody waiting on waitqueue. pvcallss_lock protects accesses to the ring. Signed-off-by: Stefano StabelliniCC: boris.ostrov...@oracle.com CC: jgr...@suse.com --- drivers/xen/pvcalls-front.c | 94 + drivers/xen/pvcalls-front.h | 8 2 files changed, 102 insertions(+) create mode 100644 drivers/xen/pvcalls-front.h diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c index 5e0b265..d1dbcf1 100644 --- a/drivers/xen/pvcalls-front.c +++ b/drivers/xen/pvcalls-front.c @@ -20,6 +20,8 @@ #include #include +#include "pvcalls-front.h" + #define PVCALLS_INVALID_ID (UINT_MAX) #define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER #define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE) @@ -40,9 +42,101 @@ struct pvcalls_bedata { static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id) { + struct xenbus_device *dev = dev_id; + struct pvcalls_bedata *bedata; + struct xen_pvcalls_response *rsp; + uint8_t *src, *dst; + int req_id = 0, more = 0, done = 0; + + if (dev == NULL) + return IRQ_HANDLED; + + bedata = dev_get_drvdata(>dev); + if (bedata == NULL) + return IRQ_HANDLED; 
+ +again: + while (RING_HAS_UNCONSUMED_RESPONSES(>ring)) { + rsp = RING_GET_RESPONSE(>ring, bedata->ring.rsp_cons); + + req_id = rsp->req_id; + src = (uint8_t *)>rsp[req_id]; + src += sizeof(rsp->req_id); + dst = (uint8_t *)rsp; + dst += sizeof(rsp->req_id); + memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id)); + /* +* First copy the rest of the data, then req_id. It is +* paired with the barrier when accessing bedata->rsp. +*/ + smp_wmb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id); + + done = 1; + bedata->ring.rsp_cons++; + } + + RING_FINAL_CHECK_FOR_RESPONSES(>ring, more); + if (more) + goto again; + if (done) + wake_up(>inflight_req); return IRQ_HANDLED; } +int pvcalls_front_socket(struct socket *sock) +{ + struct pvcalls_bedata *bedata; + struct xen_pvcalls_request *req; + int notify, req_id, ret; + + if (!pvcalls_front_dev) + return -EACCES; + /* +* PVCalls only supports domain AF_INET, +* type SOCK_STREAM and protocol 0 sockets for now. +* +* Check socket type here, AF_INET and protocol checks are done +* by the caller. 
+*/ + if (sock->type != SOCK_STREAM) + return -ENOTSUPP; + + bedata = dev_get_drvdata(_front_dev->dev); + + spin_lock(>pvcallss_lock); + req_id = bedata->ring.req_prod_pvt & (RING_SIZE(>ring) - 1); + if (RING_FULL(>ring) || + READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) { + spin_unlock(>pvcallss_lock); + return -EAGAIN; + } + req = RING_GET_REQUEST(>ring, req_id); + req->req_id = req_id; + req->cmd = PVCALLS_SOCKET; + req->u.socket.id = (uint64_t) sock; + req->u.socket.domain = AF_INET; + req->u.socket.type = SOCK_STREAM; + req->u.socket.protocol = 0; + + bedata->ring.req_prod_pvt++; + RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(>ring, notify); + spin_unlock(>pvcallss_lock); + if (notify) + notify_remote_via_irq(bedata->irq); + + if (wait_event_interruptible(bedata->inflight_req, + READ_ONCE(bedata->rsp[req_id].req_id) == req_id) != 0) + return -EINTR; + + ret = bedata->rsp[req_id].ret; + /* read ret, then set this rsp slot to be reused */ + smp_mb(); + WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID); + + return ret; +} + static const struct xenbus_device_id pvcalls_front_ids[] = { { "pvcalls" }, { "" } diff --git
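The req_id scheme above — use the masked req_prod_pvt as the request id, so each request has a dedicated rsp slot, refuse to submit while the ring is full or the slot still holds an unconsumed response, and release the slot by writing PVCALLS_INVALID_ID back — can be modeled as a small userspace state machine. This is a hedged sketch of the slot lifecycle only (no barriers, no locking, illustrative names); in the kernel the same checks appear as RING_FULL() and the READ_ONCE() test under pvcallss_lock.

```c
#include <assert.h>

#define RING_SIZE  4U
#define INVALID_ID (~0U)

static unsigned rsp_id[RING_SIZE];       /* models bedata->rsp[i].req_id */
static unsigned req_prod_pvt, req_cons;  /* free-running ring indexes */

static void ring_init(void) {
    for (unsigned i = 0; i < RING_SIZE; i++)
        rsp_id[i] = INVALID_ID;
    req_prod_pvt = req_cons = 0;
}

/* Claim the slot the next request lands in.  Masking req_prod_pvt keeps
 * req_id in [0, RING_SIZE); the two checks mirror RING_FULL() and the
 * READ_ONCE(bedata->rsp[req_id].req_id) test in the patch. */
static int claim_req_id(unsigned *out) {
    unsigned req_id = req_prod_pvt & (RING_SIZE - 1);

    if (req_prod_pvt - req_cons == RING_SIZE ||  /* ring full          */
        rsp_id[req_id] != INVALID_ID)            /* slot not yet freed */
        return -1;                               /* caller sees -EAGAIN */
    *out = req_id;
    req_prod_pvt++;
    return 0;
}

/* Event handler: copy the response body first, req_id last. */
static void response_arrived(unsigned req_id) {
    rsp_id[req_id] = req_id;
    req_cons++;
}

/* Waiter: read ret, then mark the slot reusable. */
static void consume_rsp(unsigned req_id) {
    rsp_id[req_id] = INVALID_ID;
}
```

The invariant this buys, as the commit message says, is that there can never be two outstanding responses with the same req_id: a slot is reused only after both the ring entry and the rsp copy have been retired.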
[Xen-devel] [PATCH v2 06/13] xen/pvcalls: implement listen command
Send PVCALLS_LISTEN to the backend.

Signed-off-by: Stefano Stabellini
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-front.c | 48 +
 drivers/xen/pvcalls-front.h | 1 +
 2 files changed, 49 insertions(+)

diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index af2ce20..3b5d50e 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -365,6 +365,54 @@ int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	return 0;
 }
 
+int pvcalls_front_listen(struct socket *sock, int backlog)
+{
+	struct pvcalls_bedata *bedata;
+	struct sock_mapping *map;
+	struct xen_pvcalls_request *req;
+	int notify, req_id, ret;
+
+	if (!pvcalls_front_dev)
+		return -ENOTCONN;
+	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
+
+	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
+	if (!map)
+		return -ENOTSOCK;
+
+	if (map->passive.status != PVCALLS_STATUS_BIND)
+		return -EOPNOTSUPP;
+
+	spin_lock(&bedata->pvcallss_lock);
+	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
+	if (RING_FULL(&bedata->ring) ||
+	    bedata->rsp[req_id].req_id != PVCALLS_INVALID_ID) {
+		spin_unlock(&bedata->pvcallss_lock);
+		return -EAGAIN;
+	}
+	req = RING_GET_REQUEST(&bedata->ring, req_id);
+	req->req_id = req_id;
+	req->cmd = PVCALLS_LISTEN;
+	req->u.listen.id = (uint64_t) sock;
+	req->u.listen.backlog = backlog;
+
+	bedata->ring.req_prod_pvt++;
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
+	spin_unlock(&bedata->pvcallss_lock);
+	if (notify)
+		notify_remote_via_irq(bedata->irq);
+
+	wait_event(bedata->inflight_req,
+		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
+
+	map->passive.status = PVCALLS_STATUS_LISTEN;
+	ret = bedata->rsp[req_id].ret;
+	/* read ret, then set this rsp slot to be reused */
+	smp_mb();
+	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
+	return ret;
+}
+
 static const struct xenbus_device_id pvcalls_front_ids[] = {
 	{ "pvcalls" },
 	{ "" }

diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
index 8b0a274..aa8fe10 100644
--- a/drivers/xen/pvcalls-front.h
+++ b/drivers/xen/pvcalls-front.h
@@ -9,5 +9,6 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
 int pvcalls_front_bind(struct socket *sock,
 		       struct sockaddr *addr,
 		       int addr_len);
+int pvcalls_front_listen(struct socket *sock, int backlog);
 #endif
--
1.9.1
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 00/13] introduce the Xen PV Calls frontend
Hi all,

this series introduces the frontend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which acts on them and
returns a value to the frontend.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html

This patch series only implements the frontend driver. It doesn't
attempt to redirect POSIX calls to it. The functions exported in
pvcalls-front.h are meant to be used for that. A separate patch series
will be sent to use them and hook them into the system.

Changes in v2:
- use xenbus_read_unsigned when possible
- call dev_set_drvdata earlier in pvcalls_front_probe, so as not to
  dereference a NULL pointer in the error path
- set ret appropriately in pvcalls_front_probe
- include pvcalls-front.h in pvcalls-front.c
- call wake_up only once after the consuming loop in
  pvcalls_front_event_handler
- don't leak "bytes" in case of errors in create_active
- call spin_unlock appropriately in case of errors in create_active
- remove all BUG_ONs
- don't leak newsock->sk in pvcalls_front_accept in case of errors
- rename PVCALLS_FRON_MAX_SPIN to PVCALLS_FRONT_MAX_SPIN
- return bool from pvcalls_front_read_todo
- add a barrier after setting PVCALLS_FLAG_POLL_RET in
  pvcalls_front_event_handler
- remove outdated comment in pvcalls_front_free_map
- clear sock->sk->sk_send_head later in pvcalls_front_release
- make XEN_PVCALLS_FRONTEND tristate
- don't add an empty resume function

Stefano Stabellini (13):
  xen/pvcalls: introduce the pvcalls xenbus frontend
  xen/pvcalls: connect to the backend
  xen/pvcalls: implement socket command and handle events
  xen/pvcalls: implement connect command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement sendmsg
  xen/pvcalls: implement recvmsg
  xen/pvcalls: implement poll command
  xen/pvcalls: implement release command
  xen/pvcalls: implement frontend disconnect
  xen: introduce a Kconfig option to enable the pvcalls frontend

 drivers/xen/Kconfig         |    9 +
 drivers/xen/Makefile        |    1 +
 drivers/xen/pvcalls-front.c | 1103 +++
 drivers/xen/pvcalls-front.h |   28 ++
 4 files changed, 1141 insertions(+)
 create mode 100644 drivers/xen/pvcalls-front.c
 create mode 100644 drivers/xen/pvcalls-front.h
[Xen-devel] [GIT PULL] (xen) stable/for-jens-4.13
Hi Jens,

Please git pull in your branch "for-linus" the following branch, which
is based on 765e40b675a9566459ddcb8358ad16f3b8344bbe ("blk-mq: map
queues to all present CPUs"):

  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-4.13

It has two bug-fixes for the xen-blkfront driver.

Thank you!

 drivers/block/xen-blkfront.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

Dongli Zhang (1):
      xen/blkfront: always allocate grants first from per-queue
      persistent grants

Junxiao Bi (1):
      xen-blkfront: fix mq start/stop race
Re: [Xen-devel] [PATCH v1 11/13] xen/pvcalls: implement release command
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:12, Stefano Stabellini wrote:
> > Send PVCALLS_RELEASE to the backend and wait for a reply. Take both
> > in_mutex and out_mutex to avoid concurrent accesses. Then, free the
> > socket.
> > 
> > Signed-off-by: Stefano Stabellini
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 86 +
> >  drivers/xen/pvcalls-front.h |  1 +
> >  2 files changed, 87 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index b6cfb7d..bd3dfac 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -174,6 +174,24 @@ static irqreturn_t pvcalls_front_conn_handler(int irq, void *sock_map)
> >  	return IRQ_HANDLED;
> >  }
> >  
> > +static void pvcalls_front_free_map(struct pvcalls_bedata *bedata,
> > +				   struct sock_mapping *map)
> > +{
> > +	int i;
> > +
> > +	spin_lock(&bedata->pvcallss_lock);
> > +	if (!list_empty(&map->list))
> > +		list_del_init(&map->list);
> > +	spin_unlock(&bedata->pvcallss_lock);
> > +
> > +	/* what if the thread waiting still need access? */

Is this handled? If not, why is it no problem?

Yes, sorry, this is a left-over from earlier versions of the code. The
scenario is handled: threads waiting will have already been awakened by
the wake_up_interruptible call in pvcalls_front_release, and the code is
also protected by both in_mutex and out_mutex. I hadn't introduced
in_mutex and out_mutex yet when I wrote this comment; it no longer
applies.
> > +	for (i = 0; i < (1 << map->active.ring->ring_order); i++)
> > +		gnttab_end_foreign_access(map->active.ring->ref[i], 0, 0);
> > +	gnttab_end_foreign_access(map->active.ref, 0, 0);
> > +	free_page((unsigned long)map->active.ring);
> > +	unbind_from_irqhandler(map->active.irq, map);
> > +}
> > +
> >  int pvcalls_front_socket(struct socket *sock)
> >  {
> >  	struct pvcalls_bedata *bedata;
> > @@ -805,6 +823,74 @@ unsigned int pvcalls_front_poll(struct file *file, struct socket *sock,
> >  	return pvcalls_front_poll_passive(file, bedata, map, wait);
> >  }
> >  
> > +int pvcalls_front_release(struct socket *sock)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map;
> > +	int req_id, notify;
> > +	struct xen_pvcalls_request *req;
> > +
> > +	if (!pvcalls_front_dev)
> > +		return -EIO;
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +	if (!bedata)
> > +		return -EIO;
> > +
> > +	if (sock->sk == NULL)
> > +		return 0;
> > +
> > +	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
> > +	if (map == NULL)
> > +		return 0;
> > +	WRITE_ONCE(sock->sk->sk_send_head, NULL);
> > +
> > +	spin_lock(&bedata->pvcallss_lock);
> > +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +	BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING);
> > +	if (RING_FULL(&bedata->ring) ||
> > +	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
> > +		spin_unlock(&bedata->pvcallss_lock);
> > +		return -EAGAIN;

Isn't it a problem you already cleared sock->sk->sk_send_head?

Yes, you are right. It would effectively leak the socket. I'll move the
clearing of sk_send_head after this check.
> > +	}
> > +	req = RING_GET_REQUEST(&bedata->ring, req_id);
> > +	req->req_id = req_id;
> > +	req->cmd = PVCALLS_RELEASE;
> > +	req->u.release.id = (uint64_t)sock;
> > +
> > +	bedata->ring.req_prod_pvt++;
> > +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> > +	spin_unlock(&bedata->pvcallss_lock);
> > +	if (notify)
> > +		notify_remote_via_irq(bedata->irq);
> > +
> > +	wait_event(bedata->inflight_req,
> > +		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
> > +
> > +	if (map->active_socket) {
> > +		/*
> > +		 * Set in_error and wake up inflight_conn_req to force
> > +		 * recvmsg waiters to exit.
> > +		 */
> > +		map->active.ring->in_error = -EBADF;
> > +		wake_up_interruptible(&map->active.inflight_conn_req);
> > +
> > +		mutex_lock(&map->active.in_mutex);
> > +		mutex_lock(&map->active.out_mutex);
> > +		pvcalls_front_free_map(bedata, map);
> > +		mutex_unlock(&map->active.out_mutex);
> > +		mutex_unlock(&map->active.in_mutex);
> > +		kfree(map);
> > +	} else {
> > +		spin_lock(&bedata->pvcallss_lock);
> > +		list_del_init(&map->list);
> > +		kfree(map);
> > +		spin_unlock(&bedata->pvcallss_lock);
> > +	}
> > +	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
> > +
> > +	return 0;
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> >  	{ "pvcalls" },
> >  	{ "" }
> > diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
> > index 25e05b8..3332978
Re: [Xen-devel] [PATCH v1 10/13] xen/pvcalls: implement poll command
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:12, Stefano Stabellini wrote:
> > For active sockets, check the indexes and use the inflight_conn_req
> > waitqueue to wait.
> > 
> > For passive sockets, send PVCALLS_POLL to the backend. Use the
> > inflight_accept_req waitqueue if an accept is outstanding. Otherwise
> > use the inflight_req waitqueue: inflight_req is woken when a new
> > response is received; on wakeup we check whether the POLL response
> > has arrived by looking at the PVCALLS_FLAG_POLL_RET flag. We set the
> > flag from pvcalls_front_event_handler, if the response was for a
> > POLL command.
> > 
> > In pvcalls_front_event_handler, get the struct socket pointer from
> > the poll id (we previously converted struct socket* to uint64_t and
> > used it as id).
> > 
> > Signed-off-by: Stefano Stabellini
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 123 
> >  drivers/xen/pvcalls-front.h |   3 ++
> >  2 files changed, 115 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 3d1041a..b6cfb7d 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -128,17 +128,29 @@ static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> >  		rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
> >  
> >  		req_id = rsp->req_id;
> > -		src = (uint8_t *)&bedata->rsp[req_id];
> > -		src += sizeof(rsp->req_id);
> > -		dst = (uint8_t *)rsp;
> > -		dst += sizeof(rsp->req_id);
> > -		memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
> > -		/*
> > -		 * First copy the rest of the data, then req_id. It is
> > -		 * paired with the barrier when accessing bedata->rsp.
> > -		 */
> > -		smp_wmb();
> > -		WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
> > +		if (rsp->cmd == PVCALLS_POLL) {
> > +			struct socket *sock = (struct socket *) rsp->u.poll.id;
> > +			struct sock_mapping *map =
> > +				(struct sock_mapping *)
> > +				READ_ONCE(sock->sk->sk_send_head);
> > +
> > +			set_bit(PVCALLS_FLAG_POLL_RET,
> > +				(void *)&map->passive.flags);

Add a barrier here to make sure PVCALLS_FLAG_POLL_INFLIGHT is cleared
_after_ setting PVCALLS_FLAG_POLL_RET?

Yes, good point, I'll add an smp_wmb() here. A barrier is unnecessary at
the other end (the beginning of pvcalls_front_poll_passive) because the
conditional instructions there create control dependencies. I'll add a
comment.

> > +			clear_bit(PVCALLS_FLAG_POLL_INFLIGHT,
> > +				  (void *)&map->passive.flags);
> > +		} else {
> > +			src = (uint8_t *)&bedata->rsp[req_id];
> > +			src += sizeof(rsp->req_id);
> > +			dst = (uint8_t *)rsp;
> > +			dst += sizeof(rsp->req_id);
> > +			memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
> > +			/*
> > +			 * First copy the rest of the data, then req_id. It is
> > +			 * paired with the barrier when accessing bedata->rsp.
> > +			 */
> > +			smp_wmb();
> > +			WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
> > +		}
> >  
> >  		bedata->ring.rsp_cons++;
> >  		wake_up(&bedata->inflight_req);
> > @@ -704,6 +716,95 @@ int pvcalls_front_accept(struct socket *sock, struct socket *newsock, int flags)
> >  	return ret;
> >  }
> >  
> > +static unsigned int pvcalls_front_poll_passive(struct file *file,
> > +					       struct pvcalls_bedata *bedata,
> > +					       struct sock_mapping *map,
> > +					       poll_table *wait)
> > +{
> > +	int notify, req_id;
> > +	struct xen_pvcalls_request *req;
> > +
> > +	if (test_bit(PVCALLS_FLAG_ACCEPT_INFLIGHT,
> > +		     (void *)&map->passive.flags)) {
> > +		poll_wait(file, &map->passive.inflight_accept_req, wait);
> > +		return 0;
> > +	}
> > +
> > +	if (test_and_clear_bit(PVCALLS_FLAG_POLL_RET,
> > +			       (void *)&map->passive.flags))
> > +		return POLLIN;
> > +
> > +	if (test_and_set_bit(PVCALLS_FLAG_POLL_INFLIGHT,
> > +			     (void *)&map->passive.flags)) {
> > +		poll_wait(file, &bedata->inflight_req, wait);
> > +		return 0;
> > +	}
> > +
> > +	spin_lock(&bedata->pvcallss_lock);
> > +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +	BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING);
> > +	if (RING_FULL(&bedata->ring) ||
> > +	    READ_ONCE(bedata->rsp[req_id].req_id) !=
Re: [Xen-devel] [PATCH v1 03/13] xen/pvcalls: implement socket command and handle events
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:11, Stefano Stabellini wrote:
> > Send a PVCALLS_SOCKET command to the backend, use the masked
> > req_prod_pvt as req_id. This way, req_id is guaranteed to be between
> > 0 and PVCALLS_NR_REQ_PER_RING. We already have a slot in the rsp
> > array ready for the response, and there cannot be two outstanding
> > responses with the same req_id.
> > 
> > Wait for the response by waiting on the inflight_req waitqueue and
> > check for the req_id field in rsp[req_id]. Use atomic accesses to
> > read the field. Once a response is received, clear the corresponding
> > rsp slot by setting req_id to PVCALLS_INVALID_ID. Note that
> > PVCALLS_INVALID_ID is invalid only from the frontend point of view.
> > It is not part of the PVCalls protocol.
> > 
> > pvcalls_front_event_handler is in charge of copying responses from
> > the ring to the appropriate rsp slot. It is done by copying the body
> > of the response first, then by copying req_id atomically. After the
> > copies, wake up anybody waiting on waitqueue.
> > 
> > pvcallss_lock protects accesses to the ring.
> > 
> > Signed-off-by: Stefano Stabellini
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 91 +
> >  drivers/xen/pvcalls-front.h |  8 
> >  2 files changed, 99 insertions(+)
> >  create mode 100644 drivers/xen/pvcalls-front.h
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index fb08ebf..7933c73 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c

Shouldn't you include pvcalls-front.h?
Yes

> > @@ -40,9 +40,100 @@ struct pvcalls_bedata {
> >  
> >  static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> >  {
> > +	struct xenbus_device *dev = dev_id;
> > +	struct pvcalls_bedata *bedata;
> > +	struct xen_pvcalls_response *rsp;
> > +	uint8_t *src, *dst;
> > +	int req_id = 0, more = 0;
> > +
> > +	if (dev == NULL)
> > +		return IRQ_HANDLED;
> > +
> > +	bedata = dev_get_drvdata(&dev->dev);
> > +	if (bedata == NULL)
> > +		return IRQ_HANDLED;
> > +
> > +again:
> > +	while (RING_HAS_UNCONSUMED_RESPONSES(&bedata->ring)) {
> > +		rsp = RING_GET_RESPONSE(&bedata->ring, bedata->ring.rsp_cons);
> > +
> > +		req_id = rsp->req_id;
> > +		src = (uint8_t *)&bedata->rsp[req_id];
> > +		src += sizeof(rsp->req_id);
> > +		dst = (uint8_t *)rsp;
> > +		dst += sizeof(rsp->req_id);
> > +		memcpy(dst, src, sizeof(*rsp) - sizeof(rsp->req_id));
> > +		/*
> > +		 * First copy the rest of the data, then req_id. It is
> > +		 * paired with the barrier when accessing bedata->rsp.
> > +		 */
> > +		smp_wmb();
> > +		WRITE_ONCE(bedata->rsp[req_id].req_id, rsp->req_id);
> > +
> > +		bedata->ring.rsp_cons++;
> > +		wake_up(&bedata->inflight_req);
> > +	}
> > +
> > +	RING_FINAL_CHECK_FOR_RESPONSES(&bedata->ring, more);
> > +	if (more)
> > +		goto again;

Wouldn't it make more sense to use wake_up() just once if there is any
response pending and do the consuming loop outside the irq handler?

You are definitely right: it's far better to call wake_up() just once
after the consuming loop if there is any response pending. I'll do that.

However, I am not sure there is much to gain in moving the consuming
loop out of the irq handler: it's pretty short and doesn't call any
long-running or sleeping functions.
> >  	return IRQ_HANDLED;
> >  }
> >  
> > +int pvcalls_front_socket(struct socket *sock)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	struct xen_pvcalls_request *req;
> > +	int notify, req_id, ret;
> > +
> > +	if (!pvcalls_front_dev)
> > +		return -EACCES;
> > +	/*
> > +	 * PVCalls only supports domain AF_INET,
> > +	 * type SOCK_STREAM and protocol 0 sockets for now.
> > +	 *
> > +	 * Check socket type here, AF_INET and protocol checks are done
> > +	 * by the caller.
> > +	 */
> > +	if (sock->type != SOCK_STREAM)
> > +		return -ENOTSUPP;
> > +
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +
> > +	spin_lock(&bedata->pvcallss_lock);
> > +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +	BUG_ON(req_id >= PVCALLS_NR_REQ_PER_RING);
> > +	if (RING_FULL(&bedata->ring) ||
> > +	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
> > +		spin_unlock(&bedata->pvcallss_lock);
> > +		return -EAGAIN;
> > +	}
> > +	req = RING_GET_REQUEST(&bedata->ring, req_id);
> > +	req->req_id = req_id;
> > +	req->cmd = PVCALLS_SOCKET;
> > +	req->u.socket.id = (uint64_t) sock;
> > +	req->u.socket.domain = AF_INET;
> > +	req->u.socket.type = SOCK_STREAM;
> > +	req->u.socket.protocol = 0;
> > +
> > +	bedata->ring.req_prod_pvt++;
> > +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> > +
Re: [Xen-devel] [PATCH v1 02/13] xen/pvcalls: connect to the backend
On Mon, 24 Jul 2017, Juergen Gross wrote:
> On 22/07/17 02:11, Stefano Stabellini wrote:
> > Implement the probe function for the pvcalls frontend. Read the
> > supported versions, max-page-order and function-calls nodes from
> > xenstore.
> > 
> > Introduce a data structure named pvcalls_bedata. It contains pointers
> > to the command ring, the event channel, a list of active sockets and
> > a list of passive sockets. List accesses are protected by a
> > spin_lock.
> > 
> > Introduce a waitqueue to allow waiting for a response on commands
> > sent to the backend.
> > 
> > Introduce an array of struct xen_pvcalls_response to store command
> > responses.
> > 
> > Only one frontend<->backend connection is supported at any given time
> > for a guest. Store the active frontend device in a static pointer.
> > 
> > Introduce a stub function for the event handler.
> > 
> > Signed-off-by: Stefano Stabellini
> > CC: boris.ostrov...@oracle.com
> > CC: jgr...@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 153 
> >  1 file changed, 153 insertions(+)
> > 
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index 173e204..fb08ebf 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -20,6 +20,29 @@
> >  #include
> >  #include
> >  
> > +#define PVCALLS_INVALID_ID (UINT_MAX)
> > +#define RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
> > +#define PVCALLS_NR_REQ_PER_RING __CONST_RING_SIZE(xen_pvcalls, XEN_PAGE_SIZE)
> > +
> > +struct pvcalls_bedata {
> > +	struct xen_pvcalls_front_ring ring;
> > +	grant_ref_t ref;
> > +	int irq;
> > +
> > +	struct list_head socket_mappings;
> > +	struct list_head socketpass_mappings;
> > +	spinlock_t pvcallss_lock;
> > +
> > +	wait_queue_head_t inflight_req;
> > +	struct xen_pvcalls_response rsp[PVCALLS_NR_REQ_PER_RING];
> > +};
> > +struct xenbus_device *pvcalls_front_dev;
> > +
> > +static irqreturn_t pvcalls_front_event_handler(int irq, void *dev_id)
> > +{
> > +	return IRQ_HANDLED;
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> >  	{ "pvcalls" },
> >  	{ "" }
> > @@ -33,7 +56,114 @@ static int pvcalls_front_remove(struct xenbus_device *dev)
> >  static int pvcalls_front_probe(struct xenbus_device *dev,
> >  			  const struct xenbus_device_id *id)
> >  {
> > +	int ret = -EFAULT, evtchn, ref = -1, i;
> > +	unsigned int max_page_order, function_calls, len;
> > +	char *versions;
> > +	grant_ref_t gref_head = 0;
> > +	struct xenbus_transaction xbt;
> > +	struct pvcalls_bedata *bedata = NULL;
> > +	struct xen_pvcalls_sring *sring;
> > +
> > +	if (pvcalls_front_dev != NULL) {
> > +		dev_err(&dev->dev, "only one PV Calls connection supported\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
> > +	if (!len)
> > +		return -EINVAL;
> > +	if (strcmp(versions, "1")) {
> > +		kfree(versions);
> > +		return -EINVAL;
> > +	}
> > +	kfree(versions);
> > +	ret = xenbus_scanf(XBT_NIL, dev->otherend,
> > +			   "max-page-order", "%u", &max_page_order);

Use xenbus_read_unsigned() instead?

OK

> > +	if (ret <= 0)
> > +		return -ENODEV;
> > +	if (max_page_order < RING_ORDER)
> > +		return -ENODEV;
> > +	ret = xenbus_scanf(XBT_NIL, dev->otherend,
> > +			   "function-calls", "%u", &function_calls);

xenbus_read_unsigned() again?

OK

> > +	if (ret <= 0 || function_calls != 1)
> > +		return -ENODEV;
> > +	pr_info("%s max-page-order is %u\n", __func__, max_page_order);
> > +
> > +	bedata = kzalloc(sizeof(struct pvcalls_bedata), GFP_KERNEL);
> > +	if (!bedata)
> > +		return -ENOMEM;
> > +

You should call dev_set_drvdata() here already, otherwise entering the
error path will dereference a NULL pointer instead of bedata.

OK

> > +	init_waitqueue_head(&bedata->inflight_req);
> > +	for (i = 0; i < PVCALLS_NR_REQ_PER_RING; i++)
> > +		bedata->rsp[i].req_id = PVCALLS_INVALID_ID;
> > +
> > +	sring = (struct xen_pvcalls_sring *) __get_free_page(GFP_KERNEL |
> > +							     __GFP_ZERO);
> > +	if (!sring)
> > +		goto error;

ret will be 1 here.
Shouldn't you set it to -ENOMEM?

Yes, I'll do that.

> > +	SHARED_RING_INIT(sring);
> > +	FRONT_RING_INIT(&bedata->ring, sring, XEN_PAGE_SIZE);
> > +
> > +	ret = xenbus_alloc_evtchn(dev, &evtchn);
> > +	if (ret)
> > +		goto error;
> > +
> > +	bedata->irq = bind_evtchn_to_irqhandler(evtchn,
> > +						pvcalls_front_event_handler,
> > +						0, "pvcalls-frontend", dev);
> > +	if (bedata->irq < 0) {
> > +		ret = bedata->irq;
> > +		goto error;
> > +	}
> > +
> > +	ret =
[Xen-devel] [libvirt test] 112276: tolerable all pass - PUSHED
flight 112276 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112276/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt     14 saverestore-support-check    fail  like 112201
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check    fail  like 112201
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check    fail  like 112201
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check        fail  never pass
 test-amd64-i386-libvirt      13 migrate-support-check        fail  never pass
 test-amd64-amd64-libvirt     13 migrate-support-check        fail  never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt     13 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt     14 saverestore-support-check    fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-qcow2 12 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-qcow2 13 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check        fail  never pass
 test-armhf-armhf-libvirt     13 migrate-support-check        fail  never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check        fail  never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check        fail  never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check    fail  never pass

version targeted for testing:
 libvirt  f7237d63e8f02f3689f9b63b413fae7d4221faa9
baseline version:
 libvirt  f36f2e463f877292ee1f30dcb1000337739c2fd3

Last test of basis   112201  2017-07-23 04:21:57 Z  2 days
Testing same since   112276  2017-07-25 04:21:09 Z  0 days  1 attempts

People who touched revisions under test:
  John Ferlan
  Martin Kletzander
  Michal Privoznik
  Peter Krempa
  Shivaprasad G Bhat

jobs:
 build-amd64-xsm    pass
 build-arm64-xsm    pass
 build-armhf-xsm    pass
 build-i386-xsm     pass
 build-amd64        pass
 build-arm64        pass
 build-armhf        pass
 build-i386         pass
 build-amd64-libvirt    pass
 build-arm64-libvirt    pass
 build-armhf-libvirt    pass
 build-i386-libvirt     pass
 build-amd64-pvops      pass
 build-arm64-pvops      pass
 build-armhf-pvops      pass
 build-i386-pvops       pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm    pass
 test-amd64-amd64-libvirt-xsm   pass
 test-arm64-arm64-libvirt-xsm   pass
 test-armhf-armhf-libvirt-xsm   pass
 test-amd64-i386-libvirt-xsm    pass
 test-amd64-amd64-libvirt       pass
 test-arm64-arm64-libvirt       pass
 test-armhf-armhf-libvirt       pass
 test-amd64-i386-libvirt        pass
 test-amd64-amd64-libvirt-pair  pass
 test-amd64-i386-libvirt-pair   pass
 test-arm64-arm64-libvirt-qcow2 pass
 test-armhf-armhf-libvirt-raw   pass
 test-amd64-amd64-libvirt-vhd   pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
[Xen-devel] [PATCH v2] hvmloader, libxl: use the correct ACPI settings depending on device model
We need to choose ACPI tables and the ACPI IO port location properly
depending on the device model version we are running.

Previously, this decision was made by BIOS-type-specific code in
hvmloader, e.g. always load QEMU-traditional-specific tables if it's
ROMBIOS, and always load QEMU-Xen-specific tables if it's SeaBIOS.

This change preserves that behavior but adds an additional way (a
xenstore key) to specify the correct device model if we happen to run a
non-default one. The toolstack side makes use of it.

Signed-off-by: Igor Druzhinin
Reviewed-by: Paul Durrant
---
Changes in v2:
* fix insufficient allocation size of localent
---
 tools/firmware/hvmloader/hvmloader.c |  2 --
 tools/firmware/hvmloader/ovmf.c      |  2 ++
 tools/firmware/hvmloader/rombios.c   |  2 ++
 tools/firmware/hvmloader/seabios.c   |  3 +++
 tools/firmware/hvmloader/util.c      | 24 
 tools/libxl/libxl_create.c           |  4 +++-
 6 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index f603f68..db11ab1 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -405,8 +405,6 @@ int main(void)
         }
 
         acpi_enable_sci();
-
-        hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
     }
 
     init_vm86_tss();

diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 4ff7f1d..ebadc64 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -127,6 +127,8 @@ static void ovmf_acpi_build_tables(void)
         .dsdt_15cpu_len = 0
     };
 
+    hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
     hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }

diff --git a/tools/firmware/hvmloader/rombios.c b/tools/firmware/hvmloader/rombios.c
index 56b39b7..31a7c65 100644
--- a/tools/firmware/hvmloader/rombios.c
+++ b/tools/firmware/hvmloader/rombios.c
@@ -181,6 +181,8 @@ static void rombios_acpi_build_tables(void)
         .dsdt_15cpu_len = dsdt_15cpu_len,
     };
 
+    hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+
     hvmloader_acpi_build_tables(&config, ACPI_PHYSICAL_ADDRESS);
 }

diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
index 870576a..5878eff 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -28,6 +28,7 @@
 #include
 #include
+#include
 
 extern unsigned char dsdt_anycpu_qemu_xen[];
 extern int dsdt_anycpu_qemu_xen_len;
@@ -99,6 +100,8 @@ static void seabios_acpi_build_tables(void)
         .dsdt_15cpu_len = 0,
     };
 
+    hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+
     hvmloader_acpi_build_tables(&config, rsdp);
     add_table(rsdp);
 }

diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index db5f240..45b777c 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -31,6 +31,9 @@
 #include
 #include
 
+extern unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+
 /*
  * Check whether there exists overlap in the specified memory range.
  * Returns true if exists, else returns false.
@@ -897,6 +900,27 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
     /* Allocate and initialise the acpi info area.
      */
     mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
 
+    /* If the device model is specified switch to the corresponding tables */
+    s = xenstore_read("platform/device-model", "");
+    if ( !strncmp(s, "qemu_xen_traditional", 21) )
+    {
+        config->dsdt_anycpu = dsdt_anycpu;
+        config->dsdt_anycpu_len = dsdt_anycpu_len;
+        config->dsdt_15cpu = dsdt_15cpu;
+        config->dsdt_15cpu_len = dsdt_15cpu_len;
+
+        hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 0);
+    }
+    else if ( !strncmp(s, "qemu_xen", 9) )
+    {
+        config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
+        config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+        config->dsdt_15cpu = NULL;
+        config->dsdt_15cpu_len = 0;
+
+        hvm_param_set(HVM_PARAM_ACPI_IOPORTS_LOCATION, 1);
+    }
+
     config->lapic_base_address = LAPIC_BASE_ADDRESS;
     config->lapic_id = acpi_lapic_id;
     config->ioapic_base_address = ioapic_base_address;

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1158303..1d24209 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
         vments[4] = "start_time";
         vments[5] = GCSPRINTF("%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/1);
 
-        localents = libxl__calloc(gc, 11, sizeof(char *));
+        localents = libxl__calloc(gc, 13, sizeof(char *));
         i = 0;
         localents[i++] = "platform/acpi";
         localents[i++] =
Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache
> > Thanks---however, after re-reading xen-mapcache.c, dma needs to be
> > false for unlocked mappings.
> 
> If there is a DMA operation already in progress, it means that we'll
> already have a locked mapping for it.

Yes, I only wanted to say that qemu_ram_ptr_length should pass dma=false
when called by address_space_*_continue (i.e. with locked=false).

Paolo

> When address_space_write_continue is called, which in turn would call
> qemu_map_ram_ptr, or qemu_ram_ptr_length(unlocked), if the start and
> size of the requested mapping matches the one of the previously created
> locked mapping, then a pointer to the locked mapping will be returned.
> 
> If they don't match, a new unlocked mapping will be created and a
> pointer to it will be returned. (Arguably the algorithm could be
> improved so that a new mapping is not created if the address and size
> are contained within the locked mapping. This is a missing optimization
> today.)
> 
> It doesn't matter if a new unlocked mapping is created, or if the
> locked mapping is returned, because the pointer returned by
> qemu_ram_ptr_length(unlocked) is only used to do the memcpy, and never
> again. So I don't think this is a problem.
[Xen-devel] [PATCH 4.12 047/196] xen/scsiback: Fix a TMR related use-after-free
4.12-stable review patch. If anyone has any objections, please let me
know.

--

From: Bart Van Assche

commit 9f4ab18ac51dc87345a9cbd2527e6acf7a0a9335 upstream.

scsiback_release_cmd() must not dereference se_cmd->se_tmr_req because
that memory is freed by target_free_cmd_mem() before
scsiback_release_cmd() is called. Fix this use-after-free by inlining
struct scsiback_tmr into struct vscsibk_pend.

Signed-off-by: Bart Van Assche
Reviewed-by: Juergen Gross
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: David Disseldorp
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Nicholas Bellinger
Signed-off-by: Greg Kroah-Hartman
---
 drivers/xen/xen-scsiback.c | 33 +
 1 file changed, 9 insertions(+), 24 deletions(-)

--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -134,9 +134,7 @@ struct vscsibk_pend {
 	struct page *pages[VSCSI_MAX_GRANTS];
 
 	struct se_cmd se_cmd;
-};
 
-struct scsiback_tmr {
 	atomic_t tmr_complete;
 	wait_queue_head_t tmr_wait;
 };
@@ -599,26 +597,20 @@ static void scsiback_device_action(struc
 	struct scsiback_tpg *tpg = pending_req->v2p->tpg;
 	struct scsiback_nexus *nexus = tpg->tpg_nexus;
 	struct se_cmd *se_cmd = &pending_req->se_cmd;
-	struct scsiback_tmr *tmr;
 	u64 unpacked_lun = pending_req->v2p->lun;
 	int rc, err = FAILED;
 
-	tmr = kzalloc(sizeof(struct scsiback_tmr), GFP_KERNEL);
-	if (!tmr) {
-		target_put_sess_cmd(se_cmd);
-		goto err;
-	}
-
-	init_waitqueue_head(&tmr->tmr_wait);
+	init_waitqueue_head(&pending_req->tmr_wait);
 
 	rc = target_submit_tmr(&pending_req->se_cmd, nexus->tvn_se_sess,
 			       &pending_req->sense_buffer[0],
-			       unpacked_lun, tmr, act, GFP_KERNEL,
+			       unpacked_lun, NULL, act, GFP_KERNEL,
 			       tag, TARGET_SCF_ACK_KREF);
 	if (rc)
 		goto err;
 
-	wait_event(tmr->tmr_wait, atomic_read(&tmr->tmr_complete));
+	wait_event(pending_req->tmr_wait,
+		   atomic_read(&pending_req->tmr_complete));
 
 	err = (se_cmd->se_tmr_req->response == TMR_FUNCTION_COMPLETE) ?
SUCCESS : FAILED; @@ -626,9 +618,8 @@ static void scsiback_device_action(struc scsiback_do_resp_with_sense(NULL, err, 0, pending_req); transport_generic_free_cmd(_req->se_cmd, 1); return; + err: - if (tmr) - kfree(tmr); scsiback_do_resp_with_sense(NULL, err, 0, pending_req); } @@ -1389,12 +1380,6 @@ static int scsiback_check_stop_free(stru static void scsiback_release_cmd(struct se_cmd *se_cmd) { struct se_session *se_sess = se_cmd->se_sess; - struct se_tmr_req *se_tmr = se_cmd->se_tmr_req; - - if (se_tmr && se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) { - struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr; - kfree(tmr); - } percpu_ida_free(_sess->sess_tag_pool, se_cmd->map_tag); } @@ -1455,11 +1440,11 @@ static int scsiback_queue_status(struct static void scsiback_queue_tm_rsp(struct se_cmd *se_cmd) { - struct se_tmr_req *se_tmr = se_cmd->se_tmr_req; - struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr; + struct vscsibk_pend *pending_req = container_of(se_cmd, + struct vscsibk_pend, se_cmd); - atomic_set(>tmr_complete, 1); - wake_up(>tmr_wait); + atomic_set(_req->tmr_complete, 1); + wake_up(_req->tmr_wait); } static void scsiback_aborted_task(struct se_cmd *se_cmd) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4.9 037/125] xen/scsiback: Fix a TMR related use-after-free
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Bart Van Assche

commit 9f4ab18ac51dc87345a9cbd2527e6acf7a0a9335 upstream.

scsiback_release_cmd() must not dereference se_cmd->se_tmr_req because that
memory is freed by target_free_cmd_mem() before scsiback_release_cmd() is
called. Fix this use-after-free by inlining struct scsiback_tmr into struct
vscsibk_pend.

Signed-off-by: Bart Van Assche
Reviewed-by: Juergen Gross
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: David Disseldorp
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Nicholas Bellinger
Signed-off-by: Greg Kroah-Hartman
---
 drivers/xen/xen-scsiback.c | 33 +
 1 file changed, 9 insertions(+), 24 deletions(-)

--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -134,9 +134,7 @@ struct vscsibk_pend {
 	struct page *pages[VSCSI_MAX_GRANTS];
 
 	struct se_cmd se_cmd;
-};
 
-struct scsiback_tmr {
 	atomic_t tmr_complete;
 	wait_queue_head_t tmr_wait;
 };
@@ -599,26 +597,20 @@ static void scsiback_device_action(struc
 	struct scsiback_tpg *tpg = pending_req->v2p->tpg;
 	struct scsiback_nexus *nexus = tpg->tpg_nexus;
 	struct se_cmd *se_cmd = &pending_req->se_cmd;
-	struct scsiback_tmr *tmr;
 	u64 unpacked_lun = pending_req->v2p->lun;
 	int rc, err = FAILED;
 
-	tmr = kzalloc(sizeof(struct scsiback_tmr), GFP_KERNEL);
-	if (!tmr) {
-		target_put_sess_cmd(se_cmd);
-		goto err;
-	}
-
-	init_waitqueue_head(&tmr->tmr_wait);
+	init_waitqueue_head(&pending_req->tmr_wait);
 
 	rc = target_submit_tmr(&pending_req->se_cmd, nexus->tvn_se_sess,
 			       &pending_req->sense_buffer[0],
-			       unpacked_lun, tmr, act, GFP_KERNEL,
+			       unpacked_lun, NULL, act, GFP_KERNEL,
 			       tag, TARGET_SCF_ACK_KREF);
 	if (rc)
 		goto err;
 
-	wait_event(tmr->tmr_wait, atomic_read(&tmr->tmr_complete));
+	wait_event(pending_req->tmr_wait,
+		   atomic_read(&pending_req->tmr_complete));
 
 	err = (se_cmd->se_tmr_req->response == TMR_FUNCTION_COMPLETE) ?
 		SUCCESS : FAILED;
@@ -626,9 +618,8 @@ static void scsiback_device_action(struc
 	scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
 	transport_generic_free_cmd(&pending_req->se_cmd, 1);
 	return;
+
 err:
-	if (tmr)
-		kfree(tmr);
 	scsiback_do_resp_with_sense(NULL, err, 0, pending_req);
 }
@@ -1389,12 +1380,6 @@ static int scsiback_check_stop_free(stru
 static void scsiback_release_cmd(struct se_cmd *se_cmd)
 {
 	struct se_session *se_sess = se_cmd->se_sess;
-	struct se_tmr_req *se_tmr = se_cmd->se_tmr_req;
-
-	if (se_tmr && se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) {
-		struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr;
-		kfree(tmr);
-	}
 
 	percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
 }
@@ -1455,11 +1440,11 @@ static int scsiback_queue_status(struct
 static void scsiback_queue_tm_rsp(struct se_cmd *se_cmd)
 {
-	struct se_tmr_req *se_tmr = se_cmd->se_tmr_req;
-	struct scsiback_tmr *tmr = se_tmr->fabric_tmr_ptr;
+	struct vscsibk_pend *pending_req = container_of(se_cmd,
+				struct vscsibk_pend, se_cmd);
 
-	atomic_set(&tmr->tmr_complete, 1);
-	wake_up(&tmr->tmr_wait);
+	atomic_set(&pending_req->tmr_complete, 1);
+	wake_up(&pending_req->tmr_wait);
 }
 
 static void scsiback_aborted_task(struct se_cmd *se_cmd)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support
On Tue, 25 Jul 2017, Julien Grall wrote: > On 25/07/17 19:48, Stefano Stabellini wrote: > > On Tue, 25 Jul 2017, Julien Grall wrote: > > > On 25/07/17 07:47, Vijay Kilari wrote: > > > > > > void numa_failed(void) > > > > > > { > > > > > > numa_off = true; > > > > > > init_dt_numa_distance(); > > > > > > node_distance_fn = NULL; > > > > > > +init_cpu_to_node(); > > > > > > +} > > > > > > + > > > > > > +void __init numa_set_cpu_node(int cpu, unsigned int nid) > > > > > > +{ > > > > > > +if ( !node_isset(nid, processor_nodes_parsed) || nid >= > > > > > > MAX_NUMNODES > > > > > > ) > > > > > > +nid = 0; > > > > > > > > > > > > > > > This looks wrong to me. If the node-id is invalid, why would you > > > > > blindly > > > > > set > > > > > to 0? > > > > > > > > Generally this check will not pass. I will make this function return > > > > error code in case > > > > of wrong nid. > > > > > > I don't really want to see error code and error handling everywhere in the > > > initialization code. I would assume that if the NUMA bindings are wrong we > > > should just crash Xen rather continuing with NUMA disabled. > > > > > > Stefano do you have any opinion here? > > > > Yes, I noticed that there is an overabundance of error checks in the > > patches. I have pointed out in other cases that some of these checks are > > duplicates. > > > > I am OK with some checks but we should not do the same check over and > > over. > > > > To answer the question: do we need any checks at all? > > > > I am fine with no checks on the device tree or ACPI bindings themselves. > > I am also OK with some checks in few places to check that the > > information passed by the firmware is in the right shape (for example we > > check for the ACPI header length before accessing any ACPI tables). That > > is good. But I am not OK with repeating the same check multiple times > > uselessly or checking for conditions that cannot happen (like a NULL > > pointer in the ACPI header case again). 
> > I would prefer to keep the check on the DT bindings and ACPI bindings. I hit > some problem in the past that were quite annoying to debug without them. > > But I was wondering if we should just panic/BUG_ON directly. Rather than > returning an error. I think BUG_ON is fine, but it would be best if we also printed a useful message before crashing Xen. At least the user would know that the problem is a broken device_tree/ACPI. > For any redundant check, we could just turn to an ASSERT. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache
On Tue, 25 Jul 2017, Paolo Bonzini wrote: > - Original Message - > > From: "Stefano Stabellini"> > To: "Paolo Bonzini" > > Cc: "Anthony PERARD" , "Stefano Stabellini" > > , > > xen-devel@lists.xen.org, qemu-de...@nongnu.org > > Sent: Tuesday, July 25, 2017 8:08:21 PM > > Subject: Re: QEMU commit 04bf2526ce breaks use of xen-mapcache > > > > On Tue, 25 Jul 2017, Paolo Bonzini wrote: > > > > Hi, > > > > > > > > Commits 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram) > > > > start using qemu_ram_ptr_length() instead of qemu_map_ram_ptr(). > > > > That result in calling xen_map_cache() with lock=true, but this mapping > > > > is never invalidated. > > > > So QEMU use more and more RAM until it stop working for a reason or an > > > > other. (crash if host have little RAM or stop emulating but no crash) > > > > > > > > I don't know if calling xen_invalidate_map_cache_entry() in > > > > address_space_read_continue() and address_space_write_continue() is the > > > > right answer. Is there something better to do ? > > > > > > I think it's correct for dma to be true... maybe add a lock argument to > > > qemu_ram_ptr_length, so that make address_space_{read,write}_continue can > > > pass 0 and everyone else passes 1? > > > > I think that is a great suggestion. That way, the difference between > > locked mappings and unlocked mappings would be explicit, rather than > > relying on callers to use qemu_map_ram_ptr for unlocked mappings and > > qemu_ram_ptr_length for locked mappings. And there aren't that many > > callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an > > issue. > > Thanks---however, after re-reading xen-mapcache.c, dma needs to be false > for unlocked mappings. If there is a DMA operation already in progress, it means that we'll already have a locked mapping for it. 
When address_space_write_continue is called, which in turn would call qemu_map_ram_ptr, or qemu_ram_ptr_length(unlocked), if the start and size of the requested mapping matches the one of the previously created locked mapping, then a pointer to the locked mapping will be returned. If they don't match, a new unlocked mapping will be created and a pointer to it will be returned. (Arguably the algorithm could be improved so that a new mapping is not created if the address and size are contained within the locked mapping. This is a missing optimization today.) It doesn't matter if a new unlocked mapping is created, or if the locked mapping is returned, because the pointer returned by qemu_ram_ptr_length(unlocked) is only used to do the memcpy, and never again. So I don't think this is a problem. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH] x86/hvm: Fix boundary check in hvmemul_insn_fetch()
c/s 0943a03037 added some extra protection for overflowing the emulation instruction cache, but Coverity points out that the boundary condition is off by one when memcpy()'ing out of the buffer.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Paul Durrant
---
 xen/arch/x86/hvm/emulate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 495e312..52bed04 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -958,8 +958,8 @@ int hvmemul_insn_fetch(
      * Will we overflow insn_buf[]?  This shouldn't be able to happen,
      * which means something went wrong with instruction decoding...
      */
-    if ( insn_off > sizeof(hvmemul_ctxt->insn_buf) ||
-         (insn_off + bytes) > sizeof(hvmemul_ctxt->insn_buf) )
+    if ( insn_off >= sizeof(hvmemul_ctxt->insn_buf) ||
+         (insn_off + bytes) >= sizeof(hvmemul_ctxt->insn_buf) )
     {
         ASSERT_UNREACHABLE();
         return X86EMUL_UNHANDLEABLE;
-- 
2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/3] x86/svm: Drop svm_segment_register_t
On 07/25/2017 02:28 PM, Andrew Cooper wrote:
> Most SVM code already uses struct segment_register.  Drop the typedef and
> adjust the definitions in struct vmcb_struct, and svm_dump_sel().  Introduce
> some build-time assertions that struct segment_register from the common
> emulation code is usable in struct vmcb_struct.
>
> While making these adjustments, fix some comments to not mix decimal and
> hexadecimal offsets, and drop all trailing whitespace in vmcb.h.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper
> Reviewed-by: Jan Beulich
> ---

Reviewed-by: Boris Ostrovsky

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support
On 25/07/17 19:48, Stefano Stabellini wrote: On Tue, 25 Jul 2017, Julien Grall wrote: On 25/07/17 07:47, Vijay Kilari wrote: void numa_failed(void) { numa_off = true; init_dt_numa_distance(); node_distance_fn = NULL; +init_cpu_to_node(); +} + +void __init numa_set_cpu_node(int cpu, unsigned int nid) +{ +if ( !node_isset(nid, processor_nodes_parsed) || nid >= MAX_NUMNODES ) +nid = 0; This looks wrong to me. If the node-id is invalid, why would you blindly set to 0? Generally this check will not pass. I will make this function return error code in case of wrong nid. I don't really want to see error code and error handling everywhere in the initialization code. I would assume that if the NUMA bindings are wrong we should just crash Xen rather continuing with NUMA disabled. Stefano do you have any opinion here? Yes, I noticed that there is an overabundance of error checks in the patches. I have pointed out in other cases that some of these checks are duplicates. I am OK with some checks but we should not do the same check over and over. To answer the question: do we need any checks at all? I am fine with no checks on the device tree or ACPI bindings themselves. I am also OK with some checks in few places to check that the information passed by the firmware is in the right shape (for example we check for the ACPI header length before accessing any ACPI tables). That is good. But I am not OK with repeating the same check multiple times uselessly or checking for conditions that cannot happen (like a NULL pointer in the ACPI header case again). I would prefer to keep the check on the DT bindings and ACPI bindings. I hit some problem in the past that were quite annoying to debug without them. But I was wondering if we should just panic/BUG_ON directly. Rather than returning an error. For any redundant check, we could just turn to an ASSERT. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support
On Tue, 25 Jul 2017, Julien Grall wrote: > On 25/07/17 07:47, Vijay Kilari wrote: > > > > void numa_failed(void) > > > > { > > > > numa_off = true; > > > > init_dt_numa_distance(); > > > > node_distance_fn = NULL; > > > > +init_cpu_to_node(); > > > > +} > > > > + > > > > +void __init numa_set_cpu_node(int cpu, unsigned int nid) > > > > +{ > > > > +if ( !node_isset(nid, processor_nodes_parsed) || nid >= > > > > MAX_NUMNODES > > > > ) > > > > +nid = 0; > > > > > > > > > This looks wrong to me. If the node-id is invalid, why would you blindly > > > set > > > to 0? > > > > Generally this check will not pass. I will make this function return > > error code in case > > of wrong nid. > > I don't really want to see error code and error handling everywhere in the > initialization code. I would assume that if the NUMA bindings are wrong we > should just crash Xen rather continuing with NUMA disabled. > > Stefano do you have any opinion here? Yes, I noticed that there is an overabundance of error checks in the patches. I have pointed out in other cases that some of these checks are duplicates. I am OK with some checks but we should not do the same check over and over. To answer the question: do we need any checks at all? I am fine with no checks on the device tree or ACPI bindings themselves. I am also OK with some checks in few places to check that the information passed by the firmware is in the right shape (for example we check for the ACPI header length before accessing any ACPI tables). That is good. But I am not OK with repeating the same check multiple times uselessly or checking for conditions that cannot happen (like a NULL pointer in the ACPI header case again). ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache
- Original Message - > From: "Stefano Stabellini"> To: "Paolo Bonzini" > Cc: "Anthony PERARD" , "Stefano Stabellini" > , > xen-devel@lists.xen.org, qemu-de...@nongnu.org > Sent: Tuesday, July 25, 2017 8:08:21 PM > Subject: Re: QEMU commit 04bf2526ce breaks use of xen-mapcache > > On Tue, 25 Jul 2017, Paolo Bonzini wrote: > > > Hi, > > > > > > Commits 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram) > > > start using qemu_ram_ptr_length() instead of qemu_map_ram_ptr(). > > > That result in calling xen_map_cache() with lock=true, but this mapping > > > is never invalidated. > > > So QEMU use more and more RAM until it stop working for a reason or an > > > other. (crash if host have little RAM or stop emulating but no crash) > > > > > > I don't know if calling xen_invalidate_map_cache_entry() in > > > address_space_read_continue() and address_space_write_continue() is the > > > right answer. Is there something better to do ? > > > > I think it's correct for dma to be true... maybe add a lock argument to > > qemu_ram_ptr_length, so that make address_space_{read,write}_continue can > > pass 0 and everyone else passes 1? > > I think that is a great suggestion. That way, the difference between > locked mappings and unlocked mappings would be explicit, rather than > relying on callers to use qemu_map_ram_ptr for unlocked mappings and > qemu_ram_ptr_length for locked mappings. And there aren't that many > callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an > issue. Thanks---however, after re-reading xen-mapcache.c, dma needs to be false for unlocked mappings. Paolo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH v3 15/24] ARM: NUMA: DT: Add CPU NUMA support
On 25/07/17 07:47, Vijay Kilari wrote: void numa_failed(void) { numa_off = true; init_dt_numa_distance(); node_distance_fn = NULL; +init_cpu_to_node(); +} + +void __init numa_set_cpu_node(int cpu, unsigned int nid) +{ +if ( !node_isset(nid, processor_nodes_parsed) || nid >= MAX_NUMNODES ) +nid = 0; This looks wrong to me. If the node-id is invalid, why would you blindly set to 0? Generally this check will not pass. I will make this function return error code in case of wrong nid. I don't really want to see error code and error handling everywhere in the initialization code. I would assume that if the NUMA bindings are wrong we should just crash Xen rather continuing with NUMA disabled. Stefano do you have any opinion here? [...] #include #include #include @@ -106,6 +107,7 @@ static void __init dt_smp_init_cpus(void) [0 ... NR_CPUS - 1] = MPIDR_INVALID }; bool_t bootcpu_valid = 0; +nodeid_t *cpu_to_nodemap; int rc; mpidr = boot_cpu_data.mpidr.bits & MPIDR_HWID_MASK; @@ -117,11 +119,18 @@ static void __init dt_smp_init_cpus(void) return; } +cpu_to_nodemap = xzalloc_array(nodeid_t, NR_CPUS); Why do you need to allocate cpu_to_nodemap? Would not it be easier to put it on the stack as we do for other variable? This array holds nodemap indexed by cpuid once for all the cpus. Later while setting the logical cpu id mapping, the node mapping is set by calling numa_set_cpu_node(). This does not answer question... 
Please read it again and explain why you can't do: nodeid_t cpu_to_nodemap[NR_CPUS]; diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h index d1dc83a..0d3146c 100644 --- a/xen/include/asm-arm/numa.h +++ b/xen/include/asm-arm/numa.h @@ -10,12 +10,19 @@ void init_dt_numa_distance(void); #ifdef CONFIG_NUMA void numa_init(void); int dt_numa_init(void); +void numa_set_cpu_node(int cpu, unsigned int nid); + #else static inline void numa_init(void) { return; } +static inline void numa_set_cpu_node(int cpu, unsigned int nid) +{ +return; +} + /* Fake one node for now. See also node_online_map. */ #define cpu_to_node(cpu) 0 #define node_to_cpumask(node) (cpu_online_map) diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h index ca0a2a6..fc4747f 100644 --- a/xen/include/asm-x86/numa.h +++ b/xen/include/asm-x86/numa.h @@ -15,7 +15,6 @@ extern nodeid_t acpi_setup_node(unsigned int pxm); extern void srat_detect_node(int cpu); extern nodeid_t apicid_to_node[]; -extern void init_cpu_to_node(void); void srat_parse_regions(paddr_t addr); unsigned int arch_get_dma_bitsize(void); diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h index 10ef4c4..8a306e7 100644 --- a/xen/include/xen/numa.h +++ b/xen/include/xen/numa.h @@ -30,6 +30,7 @@ extern s8 acpi_numa; void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn); int srat_disabled(void); int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node); +void init_cpu_to_node(void); You never used this function in common code. So why did you move it in the common headers? Same was defined for x86 as well. So I have moved to common header file. You should make common only functions that will be called from code common or are part of common code. In this particular case, the name might be the same but the behavior is completely different and the way to use it too... Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 2/3] x86/hvm: Rearrange check_segment() to use a switch statement
This simplifies the logic by separating the x86_segment check from the type check.

No functional change.

Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
---
 xen/arch/x86/hvm/domain.c | 43 +--
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index dca7a00..293956c 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -70,23 +70,38 @@ static int check_segment(struct segment_register *reg, enum x86_segment seg)
         return -EINVAL;
     }
 
-    if ( seg == x86_seg_cs && !(reg->attr.fields.type & 0x8) )
+    switch ( seg )
     {
-        gprintk(XENLOG_ERR, "Non-code segment provided for CS\n");
-        return -EINVAL;
-    }
+    case x86_seg_cs:
+        if ( !(reg->attr.fields.type & 0x8) )
+        {
+            gprintk(XENLOG_ERR, "Non-code segment provided for CS\n");
+            return -EINVAL;
+        }
+        break;
 
-    if ( seg == x86_seg_ss &&
-         ((reg->attr.fields.type & 0x8) || !(reg->attr.fields.type & 0x2)) )
-    {
-        gprintk(XENLOG_ERR, "Non-writeable segment provided for SS\n");
-        return -EINVAL;
-    }
+    case x86_seg_ss:
+        if ( (reg->attr.fields.type & 0x8) || !(reg->attr.fields.type & 0x2) )
+        {
+            gprintk(XENLOG_ERR, "Non-writeable segment provided for SS\n");
+            return -EINVAL;
+        }
+        break;
 
-    if ( reg->attr.fields.s && seg != x86_seg_ss && seg != x86_seg_cs &&
-         (reg->attr.fields.type & 0x8) && !(reg->attr.fields.type & 0x2) )
-    {
-        gprintk(XENLOG_ERR, "Non-readable segment provided for DS or ES\n");
+    case x86_seg_ds:
+    case x86_seg_es:
+        if ( (reg->attr.fields.type & 0x8) && !(reg->attr.fields.type & 0x2) )
+        {
+            gprintk(XENLOG_ERR, "Non-readable segment provided for DS or ES\n");
+            return -EINVAL;
+        }
+        break;
+
+    case x86_seg_tr:
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
         return -EINVAL;
     }
-- 
2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 0/3] Improvements to struct segment_register
No functional change, but the source code is less verbose.

Andrew Cooper (3):
  x86/svm: Drop svm_segment_register_t
  x86/hvm: Rearrange check_segment() to use a switch statement
  x86/emul: Drop segment_attributes_t

 tools/fuzz/x86_instruction_emulator/fuzz-emul.c |  10 +-
 tools/tests/x86_emulator/test_x86_emulator.c    |   2 +-
 xen/arch/x86/cpu/vpmu.c                         |   2 +-
 xen/arch/x86/hvm/domain.c                       |  80 +++-
 xen/arch/x86/hvm/emulate.c                      |  20 +--
 xen/arch/x86/hvm/hvm.c                          | 154
 xen/arch/x86/hvm/svm/svm.c                      |  10 +-
 xen/arch/x86/hvm/svm/svmdebug.c                 |   6 +-
 xen/arch/x86/hvm/svm/vmcb.c                     |  32 +++--
 xen/arch/x86/hvm/vmx/realmode.c                 |  10 +-
 xen/arch/x86/hvm/vmx/vmx.c                      |  41 +++
 xen/arch/x86/mm/shadow/common.c                 |   6 +-
 xen/arch/x86/pv/emul-priv-op.c                  |  40 +++---
 xen/arch/x86/vm_event.c                         |   2 +-
 xen/arch/x86/x86_emulate/x86_emulate.c          |  55 +
 xen/arch/x86/x86_emulate/x86_emulate.h          |  37 +++---
 17 files changed, 279 insertions(+), 265 deletions(-)

-- 
2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 3/3] x86/emul: Drop segment_attributes_t
The amount of namespace resolution is unnecessarily large, as all code deals in terms of struct segment_register. This removes the attr.fields part of all references, and alters attr.bytes to just attr.

Three areas of code using initialisers for segment_register are tweaked to compile with older versions of GCC. arch_set_info_hvm_guest() has its SEG() macros altered to use plain comma-based initialisation, while {rm,vm86}_{cs,ds}_attr are simplified to plain numbers which matches their description in the manuals.

No functional change. (For some reason, the old {rm,vm86}_{cs,ds}_attr causes GCC to create a variable in .rodata, whereas the new code uses immediate operands. As a result, vmx_{get,set}_segment_register() are slightly shorter.)

Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
---
 tools/fuzz/x86_instruction_emulator/fuzz-emul.c |  10 +-
 tools/tests/x86_emulator/test_x86_emulator.c    |   2 +-
 xen/arch/x86/cpu/vpmu.c                         |   2 +-
 xen/arch/x86/hvm/domain.c                       |  43 ---
 xen/arch/x86/hvm/emulate.c                      |  20 +--
 xen/arch/x86/hvm/hvm.c                          | 154
 xen/arch/x86/hvm/svm/svm.c                      |  10 +-
 xen/arch/x86/hvm/svm/svmdebug.c                 |   4 +-
 xen/arch/x86/hvm/svm/vmcb.c                     |  16 +--
 xen/arch/x86/hvm/vmx/realmode.c                 |  10 +-
 xen/arch/x86/hvm/vmx/vmx.c                      |  41 +++
 xen/arch/x86/mm/shadow/common.c                 |   6 +-
 xen/arch/x86/pv/emul-priv-op.c                  |  40 +++---
 xen/arch/x86/vm_event.c                         |   2 +-
 xen/arch/x86/x86_emulate/x86_emulate.c          |  55 +
 xen/arch/x86/x86_emulate/x86_emulate.h          |  37 +++---
 16 files changed, 219 insertions(+), 233 deletions(-)

diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index aadbb40..a2329f8 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -583,7 +583,7 @@ static bool in_longmode(struct x86_emulate_ctxt *ctxt)
     const struct fuzz_state *s = ctxt->data;
     const struct fuzz_corpus *c = s->corpus;
 
-    return long_mode_active(ctxt) && c->segments[x86_seg_cs].attr.fields.l;
+    return long_mode_active(ctxt) && c->segments[x86_seg_cs].l;
 }
 
 static void set_sizes(struct x86_emulate_ctxt *ctxt)
@@ -597,8 +597,8 @@ static void set_sizes(struct x86_emulate_ctxt *ctxt)
         ctxt->addr_size = ctxt->sp_size = 64;
     else
     {
-        ctxt->addr_size = c->segments[x86_seg_cs].attr.fields.db ? 32 : 16;
-        ctxt->sp_size = c->segments[x86_seg_ss].attr.fields.db ? 32 : 16;
+        ctxt->addr_size = c->segments[x86_seg_cs].db ? 32 : 16;
+        ctxt->sp_size = c->segments[x86_seg_ss].db ? 32 : 16;
     }
 }
@@ -741,8 +741,8 @@ static void sanitize_input(struct x86_emulate_ctxt *ctxt)
     /* EFLAGS.VM implies 16-bit mode */
     if ( regs->rflags & X86_EFLAGS_VM )
     {
-        c->segments[x86_seg_cs].attr.fields.db = 0;
-        c->segments[x86_seg_ss].attr.fields.db = 0;
+        c->segments[x86_seg_cs].db = 0;
+        c->segments[x86_seg_ss].db = 0;
     }
 }

diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 1955332..76665ab 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -264,7 +264,7 @@ static int read_segment(
     if ( !is_x86_user_segment(seg) )
         return X86EMUL_UNHANDLEABLE;
     memset(reg, 0, sizeof(*reg));
-    reg->attr.fields.p = 1;
+    reg->p = 1;
     return X86EMUL_OKAY;
 }

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 21383d3..90954ca 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -304,7 +304,7 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
             r->cs = seg.sel;
             hvm_get_segment_register(sampled, x86_seg_ss, &seg);
             r->ss = seg.sel;
-            r->cpl = seg.attr.fields.dpl;
+            r->cpl = seg.dpl;
             if ( !(sampled->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
                 *flags |= PMU_SAMPLE_REAL;
         }

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 293956c..7e11541 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -27,13 +27,13 @@
 static int check_segment(struct segment_register *reg, enum x86_segment seg)
 {
-    if ( reg->attr.fields.pad != 0 )
+    if ( reg->pad != 0 )
     {
         gprintk(XENLOG_ERR, "Segment attribute bits 12-15 are not zero\n");
         return -EINVAL;
     }
 
-    if ( reg->attr.bytes == 0 )
+    if ( reg->attr == 0 )
     {
         if ( seg != x86_seg_ds && seg != x86_seg_es )
         {
@@ -45,26 +45,26 @@ static int check_segment(struct
[Xen-devel] [PATCH v2 1/3] x86/svm: Drop svm_segment_register_t
Most SVM code already uses struct segment_register. Drop the typedef and adjust the definitions in struct vmcb_struct, and svm_dump_sel(). Introduce some build-time assertions that struct segment_register from the common emulation code is usable in struct vmcb_struct.

While making these adjustments, fix some comments to not mix decimal and hexadecimal offsets, and drop all trailing whitespace in vmcb.h.

No functional change.

Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
---
CC: Boris Ostrovsky
CC: Suravee Suthikulpanit

v2:
 * Extend the build assertions.
---
 xen/arch/x86/hvm/svm/svmdebug.c    |  2 +-
 xen/arch/x86/hvm/svm/vmcb.c        | 16
 xen/include/asm-x86/hvm/svm/vmcb.h | 37 +
 3 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svmdebug.c b/xen/arch/x86/hvm/svm/svmdebug.c
index a3f8685..4902824 100644
--- a/xen/arch/x86/hvm/svm/svmdebug.c
+++ b/xen/arch/x86/hvm/svm/svmdebug.c
@@ -21,7 +21,7 @@
 #include
 #include
 
-static void svm_dump_sel(const char *name, const svm_segment_register_t *s)
+static void svm_dump_sel(const char *name, const struct segment_register *s)
 {
     printk("%s: %04x %04x %08x %016"PRIx64"\n",
            name, s->sel, s->attr.bytes, s->limit, s->base);

diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 96abf8d..2e67d8d 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -310,6 +310,22 @@ void __init setup_vmcb_dump(void)
     register_keyhandler('v', vmcb_dump, "dump AMD-V VMCBs", 1);
 }
 
+static void __init __maybe_unused build_assertions(void)
+{
+    struct segment_register sreg;
+
+    /* Check struct segment_register against the VMCB segment layout. */
+    BUILD_BUG_ON(sizeof(sreg) != 16);
+    BUILD_BUG_ON(sizeof(sreg.sel) != 2);
+    BUILD_BUG_ON(sizeof(sreg.attr) != 2);
+    BUILD_BUG_ON(sizeof(sreg.limit) != 4);
+    BUILD_BUG_ON(sizeof(sreg.base) != 8);
+    BUILD_BUG_ON(offsetof(struct segment_register, sel) != 0);
+    BUILD_BUG_ON(offsetof(struct segment_register, attr) != 2);
+    BUILD_BUG_ON(offsetof(struct segment_register, limit) != 4);
+    BUILD_BUG_ON(offsetof(struct segment_register, base) != 8);
+}
+
 /*
  * Local variables:
  * mode: C

diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index 30a228b..fa0d3e2 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -31,7 +31,7 @@ enum GenericIntercept1bits
     GENERAL1_INTERCEPT_SMI = 1 << 2,
     GENERAL1_INTERCEPT_INIT = 1 << 3,
     GENERAL1_INTERCEPT_VINTR = 1 << 4,
-    GENERAL1_INTERCEPT_CR0_SEL_WRITE = 1 << 5, 
+    GENERAL1_INTERCEPT_CR0_SEL_WRITE = 1 << 5,
     GENERAL1_INTERCEPT_IDTR_READ = 1 << 6,
     GENERAL1_INTERCEPT_GDTR_READ = 1 << 7,
     GENERAL1_INTERCEPT_LDTR_READ = 1 << 8,
@@ -304,13 +304,10 @@ enum VMEXIT_EXITCODE
     VMEXIT_INVALID = -1
 };
 
-/* Definition of segment state is borrowed by the generic HVM code. */
-typedef struct segment_register svm_segment_register_t;
-
 typedef union
 {
     u64 bytes;
-    struct 
+    struct
     {
         u64 vector:8;
         u64 type: 3;
@@ -324,7 +321,7 @@
 typedef union
 {
     u64 bytes;
-    struct 
+    struct
     {
         u64 tpr: 8;
         u64 irq: 1;
@@ -342,7 +339,7 @@
 typedef union
 {
     u64 bytes;
-    struct 
+    struct
     {
         u64 type: 1;
         u64 rsv0: 1;
@@ -438,23 +435,23 @@ struct vmcb_struct {
     u8 guest_ins[15];  /* offset 0xD1 */
     u64 res10a[100];   /* offset 0xE0 pad to save area */
 
-    svm_segment_register_t es;   /* offset 1024 - cleanbit 8 */
-    svm_segment_register_t cs;   /* cleanbit 8 */
-    svm_segment_register_t ss;   /* cleanbit 8 */
-    svm_segment_register_t ds;   /* cleanbit 8 */
-    svm_segment_register_t fs;
-    svm_segment_register_t gs;
-    svm_segment_register_t gdtr; /* cleanbit 7 */
-    svm_segment_register_t ldtr;
-    svm_segment_register_t idtr; /* cleanbit 7 */
-    svm_segment_register_t tr;
+    struct segment_register es;   /* offset 0x400 - cleanbit 8 */
+    struct segment_register cs;   /* cleanbit 8 */
+    struct segment_register ss;   /* cleanbit 8 */
+    struct segment_register ds;   /* cleanbit 8 */
+    struct segment_register fs;
+    struct segment_register gs;
+    struct segment_register gdtr; /* cleanbit 7 */
+    struct segment_register ldtr;
+    struct segment_register idtr; /* cleanbit 7 */
+    struct segment_register tr;
     u64 res10[5];
     u8 res11[3];
     u8 _cpl;  /* cleanbit 8 */
     u32 res12;
-    u64 _efer;  /* offset 1024 + 0xD0 - cleanbit 5 */
+    u64 _efer;  /* offset 0x400 + 0xD0 - cleanbit 5 */
     u64
Re: [Xen-devel] [OSSTEST PATCH v13 00/24] Have OpenStack tested on top of xen's master and libvirt's master.
Anthony PERARD writes ("[OSSTEST PATCH v13 00/24] Have OpenStack tested on top of xen's master and libvirt's master."): > Now powered with subunit-to-substep engine. > > The Tempest test names reported via subunit are in the form: > tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,id-bdbb5441-9204-419d-a225-b4fdbfb1a1a8,image,network,volume] Blimey. Well, I guess we will put up with that. Thanks. I think I have been through all the patches now. I hope you find my comments helpful. Regards, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH v13 18/24] TestSupport: Introduce target_cmd_stashed
Anthony PERARD writes ("[OSSTEST PATCH v13 18/24] TestSupport: Introduce target_cmd_stashed"): > This works like target_cmd, but takes a ref to a filename as argument > and stash the output of the command then return a path to the stashed > output. ... > +# Like target_cmd, but stash cmd stdout and return a path to it. > +sub target_cmd_stashed ($$$;$$) { > +my ($tho,$leafref,$tcmd,$timeout,$extrasshopts) = @_; > +my $stdout = open_unique_stashfile($leafref); > +my $rc = tcmd(undef, $stdout, 0, 'osstest', $tho, $tcmd, $timeout, > +$extrasshopts); > +die "$stdout $!" if $stdout->error or !close $stdout; You can't sensibly interpolate a filehandle into a "" string. You should use the filename (probably, the whole filename). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH v13 22/24] New branch openstack-ocata
Anthony PERARD writes ("[OSSTEST PATCH v13 22/24] New branch openstack-ocata"): > Testing of the Ocata stable branch of OpenStack against Xen unstable. > > OpenStack has many different repos which should be in sync, so we should > attempt to grab the revisions of the stable branch of every OpenStack > tree, but for now, the runvars REVISION_* of trees other than nova are set > to "origin/stable/ocata", except Tempest, which does not have a stable branch and > should be able to test any OpenStack version. Ah I see this is where the new branch is created. You will want to run standalone-generate-dump-flight-runvars after this, not after the previous patch (which is fine as it is - sorry, please forget my comment about editing cr-for-branches there). I'm afraid I don't understand your explanation about stable branches. Earlier I asked: > Do you intend to provide a version of this patch which maintains a > tested branch for all of these different trees ? No, I don't. This would be a different patch (and maybe a different patch series). So you don't intend to maintain a tested branch for each of these trees. But you do intend, I think, to maintain a tested branch of the main tree. You say the subtrees "should be in sync", which I take to mean that openstack upstream only expect it to work if you grab a revision from all of these trees at "roughly the same time" or something. I don't think this will necessarily work properly. The most obvious problem I see is that regressions which are introduced in the subtrees will be picked up even by flights which are attempting to use a working (ie osstest-tested) version of "openstack". This may not be critical if openstack jobs appear only on the flights for openstack branches. openstack branches will occasionally suffer trouble but things will probably be BALGE. But we definitely won't be able to add openstack tests to the other branches without doing something different. 
I have a very strong feeling we have discussed this before but I'm afraid the answer doesn't seem to have been written down somewhere I can easily find it. It should be explained in your series somewhere, in a comment or a commit message. It would also be nice to have a theory about how this could be improved in the future. That would mean we could be more confident that we're not painting ourselves into a corner. Having said all that, with a suitable explanation, I think the code is probably about right. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH v13 24/24] openstack tests: Don't run them on arm*
Anthony PERARD writes ("[OSSTEST PATCH v13 24/24] openstack tests: Don't run them on arm*"):
> Signed-off-by: Anthony PERARD

Acked-by: Ian Jackson

You should probably mention the existence of this patch in the other one's commit message... Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH v13 21/24] Create a flight to test OpenStack with xen-unstable and libvirt
Anthony PERARD writes ("[OSSTEST PATCH v13 21/24] Create a flight to test OpenStack with xen-unstable and libvirt"): > This patch creates a flight "openstack*", with those jobs: Do you mean it creates a "branch" ? But I don't think it does. I predict no changes to the output of mg-list-all-branches. You probably want to edit cr-for-branches. > build-amd64 > build-amd64-libvirt > build-amd64-pvops > build-amd64-xsm > build-arm64 > build-arm64-libvirt > build-arm64-pvops > build-arm64-xsm > build-armhf > build-armhf-libvirt > build-armhf-pvops > build-armhf-xsm > test-amd64-amd64-devstack > test-amd64-amd64-devstack-xsm > test-arm64-arm64-devstack > test-arm64-arm64-devstack-xsm > test-armhf-armhf-devstack > test-armhf-armhf-devstack-xsm Does it add these jobs to existing flights ? I think it probably does. Is that intentional ? I think it probably isn't. If it is you should explain it in the commit message - and also explain why this is OK despite you not keeping tested versions of all the input branches (ie, risking uncontrolled regressions in openstack components breaking the xen pushes...) You can see what it does with standalone-dump-all-flight-runvars (which is best run with eatmydata). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
On Tue, 25 Jul 2017, Julien Grall wrote: > On 25/07/17 18:52, Stefano Stabellini wrote: > > On Tue, 25 Jul 2017, Andrii Anisov wrote: > > > Hello Andrew, > > > > > > > > > On 25.07.17 19:23, Andrew Cooper wrote: > > > > As a general rule, Xen needs to be able to tolerate and cope with any > > > > quantity of crap described by the firmware. On the x86 side, we have > > > > large quantities of workarounds for buggy ACPI/MP/SMBIOS tables. > > > That approach somehow covered with early mentioned options: > > > > > > On 25.07.17 15:24, Andrii Anisov wrote: > > > > * ignore next duplicating (overlapping) memory node in favor of one > > > > already > > > > in a memory banks list > > > > * merge duplicating (overlapping), even neighboring, memory banks > > > > > > On 25.07.17 19:23, Andrew Cooper wrote: > > > > It might be the case that the best Xen can do is give up, but it should > > > > do so with a clear error message identifying what the firmware has done > > > > which is sufficiently crazy to prevent further booting. > > > We have one more option to choose for the case: > > > > > > * BUG() with clear notification at the moment we are trying to add > > > overlapping > > > memory bank > > > > > > So what to choose? > > > > Certainly we need to print a clear warning. > > +1 here. > > > Then, we can decide whether we prefer to crash (as we do today), or > > work-around the broken device-tree. I think it would be more beneficial > > to Xen users if we tried to continue anyway, and probably the best way > > to do that would be by merging the overlapping memory regions. > > I fully understand that this is not required by the spec, but lots of > > hardware (x86 and ARM) get released every day with some sort of broken > > spec compliance. It's our job to decide on a case by case basis whether > > it makes sense for us to support these platforms anyway. > > > > In this case, the cost of supporting the Renesas R-Car Gen3 seems pretty > > limited to me. 
Of course, I would have to see the patch, but if we can > > make this work with limited amount of changes, very maintainable, I > > would take them. Otherwise, screw it :-) > > I tend to disagree here. This is the first board were the bug occurs and the > Device-Tree is replaceable. > > Furthermore, if you look at the wikipage for Renesas R-Car on Xen (see [1]), a > specific DT for Xen is provided. > > So I can't see any reason to implement that in Xen at the moment. That's true. It does not make sense to do this until we get rid of the Xen specific R-Car device tree on the wiki. > [1] > https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache
On Tue, 25 Jul 2017, Paolo Bonzini wrote: > > Hi, > > > > Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram) > > started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr(). > > That results in calling xen_map_cache() with lock=true, but this mapping > > is never invalidated. > > So QEMU uses more and more RAM until it stops working for one reason or > > another (a crash if the host has little RAM, or it stops emulating without > > crashing). > > > > I don't know if calling xen_invalidate_map_cache_entry() in > > address_space_read_continue() and address_space_write_continue() is the > > right answer. Is there something better to do ? > > I think it's correct for dma to be true... maybe add a lock argument to > qemu_ram_ptr_length, so that address_space_{read,write}_continue can > pass 0 and everyone else passes 1? I think that is a great suggestion. That way, the difference between locked mappings and unlocked mappings would be explicit, rather than relying on callers to use qemu_map_ram_ptr for unlocked mappings and qemu_ram_ptr_length for locked mappings. And there aren't that many callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an issue.
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
On 25/07/17 18:52, Stefano Stabellini wrote: On Tue, 25 Jul 2017, Andrii Anisov wrote: Hello Andrew, On 25.07.17 19:23, Andrew Cooper wrote: As a general rule, Xen needs to be able to tolerate and cope with any quantity of crap described by the firmware. On the x86 side, we have large quantities of workarounds for buggy ACPI/MP/SMBIOS tables. That approach somehow covered with early mentioned options: On 25.07.17 15:24, Andrii Anisov wrote: * ignore next duplicating (overlapping) memory node in favor of one already in a memory banks list * merge duplicating (overlapping), even neighboring, memory banks On 25.07.17 19:23, Andrew Cooper wrote: It might be the case that the best Xen can do is give up, but it should do so with a clear error message identifying what the firmware has done which is sufficiently crazy to prevent further booting. We have one more option to choose for the case: * BUG() with clear notification at the moment we are trying to add overlapping memory bank So what to choose? Certainly we need to print a clear warning. +1 here. Then, we can decide whether we prefer to crash (as we do today), or work-around the broken device-tree. I think it would be more beneficial to Xen users if we tried to continue anyway, and probably the best way to do that would be by merging the overlapping memory regions. I fully understand that this is not required by the spec, but lots of hardware (x86 and ARM) get released every day with some sort of broken spec compliance. It's our job to decide on a case by case basis whether it makes sense for us to support these platforms anyway. In this case, the cost of supporting the Renesas R-Car Gen3 seems pretty limited to me. Of course, I would have to see the patch, but if we can make this work with limited amount of changes, very maintainable, I would take them. Otherwise, screw it :-) I tend to disagree here. This is the first board were the bug occurs and the Device-Tree is replaceable. 
Furthermore, if you look at the wikipage for Renesas R-Car on Xen (see [1]), a specific DT for Xen is provided. So I can't see any reason to implement that in Xen at the moment. Cheers, [1] https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps
Anthony PERARD writes ("[OSSTEST PATCH v13 19/24] TestSupport: Implement target_subunit_cmd a subunit stream parser into substeps"): > target_subunit_cmd can be used like target_cmd, but the command would > needs to output a subunit v1 stream, which will be parsed and turned > into osstest substeps. The command can be `| subunit-2to1` in order to > turn a subunit v2 stream into v1. > > Currently, time is not taken into account, and all substeps will have > bogus timestamp as the output of the command is parsed after it has > runned. > > This is a description of the subunit v1 protocol, taken from > python-subunit README, or https://pypi.python.org/pypi/python-subunit What a lot of code! > +while (<$stdout>) { > +if (/^time: (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)(\.\d+)?Z$/) { > +# This is the timestamp for the next events I'm not sure what your ( ) are doing here. > +} elsif (/^test(?:ing)?:? (.+)\n/) { > +# Start of a new test. > +$logfilename = subunit_sanitize_testname($1) . '.log'; > +$fh = open_unique_stashfile(\$logfilename); This name might clash with existing logfile names, which might be generated later. Can you put "subunit-" on the front maybe ? > +substep_start(subunit_sanitize_testname($1), $logfilename); And here, I think you should start the parameter you pass to substep_start with '/' so that it gets appended to the testid for the whole script, for a similar reason. I think it would be better to call subunit_sanitize_testname only once. > +} elsif (/^(success(?:ful)?|failure|skip|error|xfail|uxsuccess): > + \ (.+?)(\ \[(\ multipart)?)?$/x) { > +# Result of a test, with its output. > +my $result = $1; > +my $testname = $2; > +my $have_details = $3; > +my $is_multipart = $4; I would normally write this: my ($result, $testname, $have_... ) = ($1,$2,$3,$4,$5) although I don't really mind much that you have written it as you have. 
> +if ($have_details) { > +if ($is_multipart) { > +# Test output > +while (<$stdout>) { > +# part content-type > +# from > https://tools.ietf.org/html/rfc6838#section-4.2 > +my $restricted_name = > qr'[a-zA-Z0-9][a-zA-Z0-9!#$&^_.+-]*'; > +if (m{ ^Content-Type:\s+ > +$restricted_name/$restricted_name # > type/sub-type > +# parameters > +(?:\s*;\s*$restricted_name=[^,]+ > + (?:,\s*$restricted_name=[^,]+)*) > +\s*$ > +}xi) { I don't understand why you are trying to match this Content-Type so precisely. AFAICT from the grammar, all you need to do is see whether there is something vaguely like a c-t header. > +print $fh $_ or die $!; > + > +# part name > +my $line = <$stdout>; > +print $fh $line or die $!; > + > +# Read chunks of a part > +while (<$stdout>) { > +if (/^([0-9A-F]+)\r$/i) { > +my $chunk_size = hex($1); What makes you think the digits are in hex ? Since you have to go to the effort of separating out all of this stuff, it might be worth printing these multipart objects with one object per logfile. Although I won't insist on that because I suspect that multipart results are rare. > +} else { > +# Unexpected output > +chomp; > +logm("*** $_"); I guess the error recovery is to continue until you see "]" and hope. Fair enough. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
On Tue, 25 Jul 2017, Andrii Anisov wrote: > Hello Andrew, > > > On 25.07.17 19:23, Andrew Cooper wrote: > > As a general rule, Xen needs to be able to tolerate and cope with any > > quantity of crap described by the firmware. On the x86 side, we have > > large quantities of workarounds for buggy ACPI/MP/SMBIOS tables. > That approach somehow covered with early mentioned options: > > On 25.07.17 15:24, Andrii Anisov wrote: > > * ignore next duplicating (overlapping) memory node in favor of one already > > in a memory banks list > > * merge duplicating (overlapping), even neighboring, memory banks > > On 25.07.17 19:23, Andrew Cooper wrote: > > It might be the case that the best Xen can do is give up, but it should > > do so with a clear error message identifying what the firmware has done > > which is sufficiently crazy to prevent further booting. > We have one more option to choose for the case: > > * BUG() with clear notification at the moment we are trying to add overlapping > memory bank > > So what to choose? Certainly we need to print a clear warning. Then, we can decide whether we prefer to crash (as we do today), or work-around the broken device-tree. I think it would be more beneficial to Xen users if we tried to continue anyway, and probably the best way to do that would be by merging the overlapping memory regions. I fully understand that this is not required by the spec, but lots of hardware (x86 and ARM) get released every day with some sort of broken spec compliance. It's our job to decide on a case by case basis whether it makes sense for us to support these platforms anyway. In this case, the cost of supporting the Renesas R-Car Gen3 seems pretty limited to me. Of course, I would have to see the patch, but if we can make this work with limited amount of changes, very maintainable, I would take them. Otherwise, screw it :-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache
> Hi, > > Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram) > started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr(). > That results in calling xen_map_cache() with lock=true, but this mapping > is never invalidated. > So QEMU uses more and more RAM until it stops working for one reason or > another (a crash if the host has little RAM, or it stops emulating without > crashing). > > I don't know if calling xen_invalidate_map_cache_entry() in > address_space_read_continue() and address_space_write_continue() is the > right answer. Is there something better to do ? I think it's correct for dma to be true... maybe add a lock argument to qemu_ram_ptr_length, so that address_space_{read,write}_continue can pass 0 and everyone else passes 1? Paolo ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu
On 25/07/17 17:40, Alexey G wrote: > On Mon, 24 Jul 2017 21:39:08 +0100 > Igor Druzhinin wrote: >>> But, the problem is that overall MMIO hole(s) requirements are not known >>> exactly at the time the HVM domain is being created. Some PCI devices will >>> be emulated, some will be merely passed through and yet there will be >>> some RMRR ranges. libxl can't know all this stuff - some comes from the >>> host, some comes from DM. So actual MMIO requirements are known to >>> hvmloader at the PCI bus enumeration time. >>> >> IMO hvmloader shouldn't really be allowed to relocate memory under any >> conditions. As Andrew said it's much easier to provision the hole >> statically in libxl during the domain construction process and it doesn't >> really compromise any functionality. Having one more entity responsible >> for guest memory layout only makes things more convoluted. > If moving most tasks of hvmloader to libxl is a planned feature in Citrix, > please let it be discussed on xen-devel first as it may affect many > people... and not all of them might be happy. :) > > (tons of IMO and TLDR ahead, be warned) > > Moving PCI BAR allocation from the guest side to libxl is a controversial step. > This may be the architecturally wrong way in fact. There are properties and > areas of responsibility. Among the primary responsibilities of a guest's firmware > are PCI BAR and MMIO hole size allocation. There is already a very blurry line concerning "firmware". What you describe is correct for real hardware, but remember that virtual machines are anything but. There are already a lot of aspects of initialisation covered by Xen or the toolstack which would be covered by "firmware" in a native system. A lot of these are never ever going to move within guest control. > That's a guest's territory. Every tweakable which is available inside the guest is a security attack surface. 
It is important to weigh up all options, and it might indeed be the case that putting the tweakable inside the guest is the correct action to take, but simply "because that's what real hardware does" is not a good enough argument. We've had far too many XSAs due to insufficient forethought when lashing things together in the past. > Guest relocates PCI BARs (and not just BIOS able to do this), guest > firmware relocates MMIO hole base for them. If it was a real system, all > tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done > by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were > offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for > them and generating ACPI tables. And that's ok as hvmloader can be > considered merely a 'supplemental' firmware to perform some tasks of > SeaBIOS/OVMF before passing control to them. This solution has some > architecture logic at least and doesn't look bad. PCI BAR relocation isn't interesting to consider. It obviously has to be dynamic (as the OS is free to renumber the bridges). The issue I am concerned with is purely the MMIO window selection. From the point of view of the guest, this is fixed at boot; changing it requires a reboot and altering the BIOS settings. > > On other hand, moving PCI hole calculation to libxl just to let Xen/libxl > know what the MMIO size value is might be a bad idea. > Aside from some code duplication, straying too far from the real hw paths, > or breaking existing (or future) interfaces this might have some other > negative consequences. Ex. who will be initializing guest's ACPI tables if > only libxl will know the memory layout? Some new interfaces between libxl > and hvmloader just to let the latter know what values to write to ACPI > tables being created? Or libxl will be initializing guest's ACPI tables as > well (another guest's internal task)? Similar concerns are applicable to > guest's final E820 construction. 
Who said anything about only libxl knowing the layout? Whatever ends up happening, the hypervisor needs to know the layout to be able to sensibly audit a number of guest actions which currently go unaudited. (I am disappointed that this wasn't done in the first place, and surprised that Xen as a whole has managed to last this long without this information being known to the hypervisor.) > > Another thing is that handling ioreq/PT MMIO ranges is somewhat a property > of the device model (at least for now). Right now it's QEMU who traps PCI > BAR accesses and tells Xen how to handle specific ranges of MMIO space. If > QEMU already talks to Xen which ranges should be passed through or trapped > -- it can tell him the current overall MMIO limits as well... or handle > these limits himself -- if the MMIO hole range check is all what required to > avoid MMIO space misusing, this check can be easily implemented in QEMU, > provided that QEMU knows what memory/MMIO layout is. There is a lot of > implementation freedom where to place restrictions and checks, Xen or QEMU. > Strictly
Re: [Xen-devel] [PATCH 22/25 v6] xen/arm: vpl011: Add support for vuart console in xenconsole
Hi Bhupinder, Thanks for trying, and I would have done the same thing as you did: ifeq ($(CONFIG_SBSA_VUART_CONSOLE),y) CONFIG_VUART_CONSOLE := y endif However, we don't want to introduce a dependency between "make tools" and "make xen", meaning that one should be able to do "make tools" successfully even without having done "make xen" before. I think this is a generic tools building issue that will need to be fixed, probably by always generating the xen .config, even when building tools. But I don't think it's fair to ask you to do it as part of this series. So feel free to disregard my little request for now and always set CONFIG_VUART_CONSOLE := y. On Tue, 25 Jul 2017, Bhupinder Thakur wrote: > Hi Stefano, > > Can we make CONFIG_VUART_CONSOLE dependent on CONFIG_SBSA_VUART_CONSOLE? > > CONFIG_SBSA_VUART_CONSOLE is a Kconfig option while > CONFIG_VUART_CONSOLE is an option defined in the .mk file which is > used while compiling the toolstack. > > So if I try to do something like this in the arm64.mk/arm32.mk file, I am > not sure if the CONFIG_SBSA_VUART_CONSOLE definition will be available > (since .config would not be generated) if I have not compiled the Xen > hypervisor code first: > > ifeq ($(CONFIG_SBSA_VUART_CONSOLE),y) > CONFIG_VUART_CONSOLE := y > endif > > Regards, > Bhupinder > > On 22 July 2017 at 01:14, Stefano Stabellini wrote: > > On Fri, 21 Jul 2017, Julien Grall wrote: > >> Hi, > >> > >> On 18/07/17 21:07, Stefano Stabellini wrote: > >> > On Mon, 17 Jul 2017, Bhupinder Thakur wrote: > >> > > This patch finally adds the support for vuart console. It adds > >> > > two new fields in the console initialization: > >> > > > >> > > - optional > >> > > - prefer_gnttab > >> > > > >> > > optional flag tells whether the console is optional. > >> > > > >> > > prefer_gnttab tells whether the ring buffer should be allocated using > >> > > grant table. 
> >> > > > >> > > Signed-off-by: Bhupinder Thakur > >> > > --- > >> > > CC: Ian Jackson > >> > > CC: Wei Liu > >> > > CC: Stefano Stabellini > >> > > CC: Julien Grall > >> > > > >> > > Changes since v4: > >> > > - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the > >> > > convention. > >> > > > >> > > config/arm32.mk | 1 + > >> > > config/arm64.mk | 1 + > >> > > tools/console/Makefile| 3 ++- > >> > > tools/console/daemon/io.c | 29 - > >> > > 4 files changed, 32 insertions(+), 2 deletions(-) > >> > > > >> > > diff --git a/config/arm32.mk b/config/arm32.mk > >> > > index f95228e..b9f23fe 100644 > >> > > --- a/config/arm32.mk > >> > > +++ b/config/arm32.mk > >> > > @@ -1,5 +1,6 @@ > >> > > CONFIG_ARM := y > >> > > CONFIG_ARM_32 := y > >> > > +CONFIG_VUART_CONSOLE := y > >> > > CONFIG_ARM_$(XEN_OS) := y > >> > > > >> > > CONFIG_XEN_INSTALL_SUFFIX := > >> > > >> > What about leaving this off for ARM32 by default? > >> > >> Why? This will only disable xenconsole changes and not the hypervisor. The > >> changes are quite tiny, so I would even be in favor of enabling for all > >> architectures. > >> > >> Or are you suggesting to disable the VPL011 emulation in the hypervisor? > >> But I > >> don't see the emulation AArch64 specific, and a user could disable it if he > >> doesn't want it... > > > > I was thinking that the virtual pl011 is mostly useful for SBSA > > compliance, which doesn't really apply to ARM32 (there are no ARM32 SBSA > > compliant platforms as far as I am aware). > > > > Given that we don't need vpl011 on ARM32, I thought we might as well > > disable it. Less code the better. I wouldn't go as far as introducing > > more #ifdefs to disable it, but I would make use of the existing config > > options to turn it off by default on ARM32. Does it make sense? > > > > That said, you are right that there is no point in disabling only > > CONFIG_VUART_CONSOLE, which affects the tools only. 
We should really > > disable SBSA_VUART_CONSOLE by default on ARM32. In fact, ideally > > CONFIG_VUART_CONSOLE would be set dependning on the value of > > SBSA_VUART_CONSOLE. What do you think? > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH v13 10/24] ts-openstack-deploy: Increase open fd limit for RabbitMQ
Anthony PERARD writes ("[OSSTEST PATCH v13 10/24] ts-openstack-deploy: Increase open fd limit for RabbitMQ"):
> Signed-off-by: Anthony PERARD

Acked-by: Ian Jackson
___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6] x86/monitor: Notify monitor if an emulation fails.
On 07/18/2017 01:20 PM, Razvan Cojocaru wrote: > On 07/18/2017 01:09 PM, Andrew Cooper wrote: >> On 18/07/17 10:37, Petre Pircalabu wrote: >>> In case of a vm_event with the emulate_flags set, if the instruction >>> cannot be emulated, the monitor should be notified instead of directly >>> injecting a hw exception. >>> This behavior can be used to re-execute an instruction not supported by >>> the emulator using the real processor (e.g. altp2m) instead of just >>> crashing. >>> >>> Signed-off-by: Petre Pircalabu >> >> There are many situations which end up failing an emulation with >> UNHANDLEABLE. >> >> What exact scenario are you looking to catch? Is it just instructions >> which aren't implemented in the emulator? > > Instructions that are not implemented in the emulator are our main use > case for this, yes. In which case, we'd like a chance to be able to > single-step them (using altp2m), so that the guest will continue to run > even with an incomplete emulator. > > We don't care about instructions that would have failed to run in both > scenarios (emulated or single-stepped). I'm not sure if there are other > cases in which an instruction, although supported by the emulator, would > fail emulation but pass single-stepping. Is there further action required on our part to get this patch committed? FWIW, while not ideal, from our perspective trying to single-step an instruction that was UNHANDLEABLE when attempting to emulate it is acceptable (even if UNHANDLEABLE doesn't mean that the instruction is not supported by the emulator). At worst it won't run when single-stepped either, and that'll be that. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 22/25 v6] xen/arm: vpl011: Add support for vuart console in xenconsole
On Tue, 25 Jul 2017, Julien Grall wrote: > Hi Stefano, > > On 07/21/2017 08:44 PM, Stefano Stabellini wrote: > > On Fri, 21 Jul 2017, Julien Grall wrote: > > > Hi, > > > > > > On 18/07/17 21:07, Stefano Stabellini wrote: > > > > On Mon, 17 Jul 2017, Bhupinder Thakur wrote: > > > > > This patch finally adds the support for vuart console. It adds > > > > > two new fields in the console initialization: > > > > > > > > > > - optional > > > > > - prefer_gnttab > > > > > > > > > > optional flag tells whether the console is optional. > > > > > > > > > > prefer_gnttab tells whether the ring buffer should be allocated using > > > > > grant table. > > > > > > > > > > Signed-off-by: Bhupinder Thakur> > > > > --- > > > > > CC: Ian Jackson > > > > > CC: Wei Liu > > > > > CC: Stefano Stabellini > > > > > CC: Julien Grall > > > > > > > > > > Changes since v4: > > > > > - Renamed VUART_CFLAGS- to CFLAGS_vuart- in the Makefile as per the > > > > > convention. > > > > > > > > > > config/arm32.mk | 1 + > > > > > config/arm64.mk | 1 + > > > > > tools/console/Makefile| 3 ++- > > > > > tools/console/daemon/io.c | 29 - > > > > > 4 files changed, 32 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/config/arm32.mk b/config/arm32.mk > > > > > index f95228e..b9f23fe 100644 > > > > > --- a/config/arm32.mk > > > > > +++ b/config/arm32.mk > > > > > @@ -1,5 +1,6 @@ > > > > > CONFIG_ARM := y > > > > > CONFIG_ARM_32 := y > > > > > +CONFIG_VUART_CONSOLE := y > > > > > CONFIG_ARM_$(XEN_OS) := y > > > > > > > > > > CONFIG_XEN_INSTALL_SUFFIX := > > > > > > > > What about leaving this off for ARM32 by default? > > > > > > Why? This will only disable xenconsole changes and not the hypervisor. The > > > changes are quite tiny, so I would even be in favor of enabling for all > > > architectures. > > > > > > Or are you suggesting to disable the VPL011 emulation in the hypervisor? 
> > > But I > > > don't see the emulation as AArch64 specific, and a user could disable it if > > > he > > > doesn't want it... > > I was thinking that the virtual pl011 is mostly useful for SBSA > > compliance, which doesn't really apply to ARM32 (there are no ARM32 SBSA > > compliant platforms as far as I am aware). > > > > Given that we don't need vpl011 on ARM32, I thought we might as well > > disable it. The less code, the better. I wouldn't go as far as introducing > > more #ifdefs to disable it, but I would make use of the existing config > > options to turn it off by default on ARM32. Does it make sense? > > > > That said, you are right that there is no point in disabling only > > CONFIG_VUART_CONSOLE, which affects the tools only. We should really > > disable SBSA_VUART_CONSOLE by default on ARM32. In fact, ideally > > CONFIG_VUART_CONSOLE would be set depending on the value of > > SBSA_VUART_CONSOLE. What do you think? > > Kconfig is only targeting the hypervisor and this is the tools. It is possible > to build the tools separately from the hypervisor and therefore .config would > not be generated. > > Therefore your suggestion cannot work at the moment. It's incredible this problem hasn't been solved yet. I guess the right way of fixing it would be to generate the Xen .config even when doing "make tools". > However, imposing an > #ifdef that would require some work to support correctly, for 29 lines, does not seem > very warranted. Yes, I don't want to feature-creep it. > So I think we should keep it on by default. OK ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 04/25 v6] xen/arm: vpl011: Add support for vuart in libxl
Hi, On 18 July 2017 at 17:00, Wei Liu wrote: > CC x86 maintainers > > On Tue, Jul 18, 2017 at 12:19:19PM +0100, Julien Grall wrote: >> > >> > ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), >> > + ("vuart", libxl_vuart_type), >> >> ... here it is ARM specific. I am not convinced that we should tie vuart to >> ARM only. I cannot see why x86 would not be able to use it in the future. >> Any opinions? > > I don't know. I asked Bhupinder to put it here because it looked arm > specific to me. I will let the x86 maintainers decide whether they want > such a thing. What is the decision on this? I believe that since most of the vuart code added in libxl is arch agnostic, it should be fine to keep the libxl_vuart_type as a generic type. Regards, Bhupinder ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
On Tue, 2017-07-25 at 17:52 +0100, George Dunlap wrote: > On 07/25/2017 05:47 PM, Dario Faggioli wrote: > > > > All that being said, it probably would be good to add a performance > > counter, and try to get a sense of how frequently we actually end > > up in > > this function as a fallback. > > > > But in the meantime, yes, I'd try to make svc stay in the runqueue > > where it is, in this case, if possible. > > Sounds good. So are you going to respin the series then? > Yep, I'll rebase and respin this series. And then rebase the caps series on top of this and respin it too. Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap
On Tue, 2017-07-25 at 15:34 +0100, George Dunlap wrote: > On 06/29/2017 11:09 AM, Dario Faggioli wrote: > > E.g., if the vcpu is "greedy", and always tries to run, as soon as > > it > > has some budget, the difference between the two solutions is _where_ > > we > > put the "sitting period". > > Sorry, dropped this thread to prepare for XenSummit. > NP. > Well I think there's a principle a bit like Ockham's Razor: In > general, > if there's not an obvious advantage between two implementations, the > simpler / easier to understand implementation is better. > > My sense is also that in general, as long as context switching > overhead > doesn't become an issue, allowing a guest to run for shorter bursts > more > often is better than allowing it to run for longer bursts less > often. A > cpu-bound task will perform about as well in both cases, but a > latency-sensitive task will perform much worse if it's allowed to > burn > up its timeslice and is then forced to wait for a big chunk of time. > That's actually a fair point. > On the other hand, my intuitions about how to improve AFL fuzzing > these > last few weeks have turned out to be consistently wrong, so at the > moment I'm not inclined to be overly confident in my intuition > without > some testing. :-) > Well, this is a similar situation to the one in the other thread. It's an edge case that, likely, will never or only very rarely occur. So, we don't need to think too much about it, and we can even afford to follow your intuition! :-D Mmm... Maybe this came out a bit less of a "cheering you up" than I wanted it to be, but hey, I tried! :-P :-P :-P No, jokes apart, I like your argument about the effect this may have on latency-sensitive workloads, if they happen to suffer from one of these overruns, so I will change the code the way you suggest. 
:-) Thanks and Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 02/13] iommu: Add extra order argument to the IOMMU APIs and platform callbacks
From: Oleksandr Tyshchenko
Replace the existing single-page IOMMU APIs and platform callbacks with multi-page ones, followed by modifications of all related parts. The new map_pages/unmap_pages APIs do almost the same thing as the old map_page/unmap_page ones, except that the former take an extra order argument and, as a result, can handle a number of pages at once. The same goes for the new platform callbacks. Although the current behavior was retained in all places (I hope), it should be noted that the rollback logic was moved from the common code to the IOMMU drivers. Now the IOMMU drivers are responsible for unmapping the already mapped pages if something goes wrong while mapping a number of pages (order > 0). Signed-off-by: Oleksandr Tyshchenko (only for x86 and generic parts) Reviewed-by/CC: Jan Beulich CC: Julien Grall CC: Kevin Tian CC: Suravee Suthikulpanit CC: Andrew Cooper CC: George Dunlap CC: Ian Jackson CC: Konrad Rzeszutek Wilk CC: Stefano Stabellini CC: Tim Deegan CC: Wei Liu --- Changes in v1: - Replace existing single-page IOMMU APIs/platform callbacks with multi-page ones instead of just keeping both variants of them. - Use order argument instead of page_count. - Clarify patch subject/description. 
Changes in v2: - Add maintainers in CC --- xen/arch/x86/mm.c | 11 +++--- xen/arch/x86/mm/p2m-ept.c | 21 ++- xen/arch/x86/mm/p2m-pt.c | 26 +++--- xen/arch/x86/mm/p2m.c | 38 xen/arch/x86/x86_64/mm.c | 5 +-- xen/common/grant_table.c | 10 +++--- xen/drivers/passthrough/amd/iommu_map.c | 50 +-- xen/drivers/passthrough/amd/pci_amd_iommu.c | 8 ++--- xen/drivers/passthrough/arm/smmu.c| 41 -- xen/drivers/passthrough/iommu.c | 21 +-- xen/drivers/passthrough/vtd/iommu.c | 48 +++-- xen/drivers/passthrough/vtd/x86/vtd.c | 4 +-- xen/drivers/passthrough/x86/iommu.c | 6 ++-- xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 8 +++-- xen/include/xen/iommu.h | 20 ++- 15 files changed, 196 insertions(+), 121 deletions(-) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 2dc7db9..33fcffe 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -2623,11 +2623,14 @@ static int __get_page_type(struct page_info *page, unsigned long type, if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) { if ( (x & PGT_type_mask) == PGT_writable_page ) -iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); +iommu_ret = iommu_unmap_pages(d, + mfn_to_gmfn(d, page_to_mfn(page)), + 0); else if ( type == PGT_writable_page ) -iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), - page_to_mfn(page), - IOMMUF_readable|IOMMUF_writable); +iommu_ret = iommu_map_pages(d, +mfn_to_gmfn(d, page_to_mfn(page)), +page_to_mfn(page), 0, +IOMMUF_readable|IOMMUF_writable); } } diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index ecab56f..0ccf451 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -870,26 +870,9 @@ out: else { if ( iommu_flags ) -for ( i = 0; i < (1 << order); i++ ) -{ -rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags); -if ( unlikely(rc) ) -{ -while ( i-- ) -/* If statement to satisfy __must_check. 
*/ -if ( iommu_unmap_page(p2m->domain, gfn + i) ) -continue; - -break; -} -} +rc = iommu_map_pages(d, gfn, mfn_x(mfn), order, iommu_flags); else -for ( i = 0; i < (1 << order); i++ ) -{ -ret = iommu_unmap_page(d, gfn + i); -if ( !rc ) -rc = ret; -} +rc = iommu_unmap_pages(d, gfn, order);
[Xen-devel] [PATCH v2 06/13] iommu: Add extra use_iommu argument to iommu_domain_init()
From: Oleksandr Tyshchenko
The presence of this flag lets us know that the guest domain has statically assigned devices which will most likely be used for passthrough, and as a result the IOMMU is expected to be used for this domain. Taking this hint into account when dealing with non-shared IOMMUs, we can populate the IOMMU page tables beforehand and avoid going through the list of pages when the first device is assigned. As this flag doesn't cover the hotplug case, we will continue to populate IOMMU page tables on the fly there. Extend the corresponding platform callback with an extra argument as well, pass the incoming flag through to the IOMMU drivers, and then update the "d->need_iommu" flag for all domains. However, additional logic is needed before updating this flag for hardware domains; the next patch introduces it. As iommu_domain_init() is called with the "use_iommu" flag forced to false for now, no functional change is intended for either ARM or x86. Signed-off-by: Oleksandr Tyshchenko CC: Jan Beulich CC: Julien Grall CC: Stefano Stabellini CC: Andrew Cooper CC: Kevin Tian CC: Suravee Suthikulpanit --- Changes in v1: - Clarify patch subject/description. - s/bool_t/bool/ Changes in v2: - Extend "init" callback with extra argument too. - Clarify patch description. 
- Add maintainers in CC --- xen/arch/arm/domain.c | 2 +- xen/arch/x86/domain.c | 2 +- xen/drivers/passthrough/amd/pci_amd_iommu.c | 2 +- xen/drivers/passthrough/arm/smmu.c | 2 +- xen/drivers/passthrough/iommu.c | 10 -- xen/drivers/passthrough/vtd/iommu.c | 2 +- xen/include/xen/iommu.h | 4 ++-- 7 files changed, 15 insertions(+), 9 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 76310ed..ec19310 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -569,7 +569,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags, ASSERT(config != NULL); /* p2m_init relies on some value initialized by the IOMMU subsystem */ -if ( (rc = iommu_domain_init(d)) != 0 ) +if ( (rc = iommu_domain_init(d, false)) != 0 ) goto fail; if ( (rc = p2m_init(d)) != 0 ) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index d7e6992..1ffe76c 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -641,7 +641,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags, if ( (rc = init_domain_irq_mapping(d)) != 0 ) goto fail; -if ( (rc = iommu_domain_init(d)) != 0 ) +if ( (rc = iommu_domain_init(d, false)) != 0 ) goto fail; } spin_lock_init(>arch.e820_lock); diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c index fe744d2..2491e8c 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -261,7 +261,7 @@ static int get_paging_mode(unsigned long entries) return level; } -static int amd_iommu_domain_init(struct domain *d) +static int amd_iommu_domain_init(struct domain *d, bool use_iommu) { struct domain_iommu *hd = dom_iommu(d); diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index e828308..652b58c 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2705,7 +2705,7 @@ static int arm_smmu_reassign_dev(struct domain *s, struct domain *t, 
return 0; } -static int arm_smmu_iommu_domain_init(struct domain *d) +static int arm_smmu_iommu_domain_init(struct domain *d, bool use_iommu) { struct arm_smmu_xen_domain *xen_domain; diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 3e9e4c3..19c87d1 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -129,7 +129,7 @@ static void __init parse_iommu_param(char *s) } while ( ss ); } -int iommu_domain_init(struct domain *d) +int iommu_domain_init(struct domain *d, bool use_iommu) { struct domain_iommu *hd = dom_iommu(d); int ret = 0; @@ -142,7 +142,13 @@ int iommu_domain_init(struct domain *d) return 0; hd->platform_ops = iommu_get_ops(); -return hd->platform_ops->init(d); +ret = hd->platform_ops->init(d, use_iommu); +if ( ret ) +return ret; + +d->need_iommu = use_iommu; + +return 0; } static void __hwdom_init check_hwdom_reqs(struct domain *d) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index b4e8c89..45d1f36 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1277,7 +1277,7
[Xen-devel] [PATCH v2 13/13] [RFC] iommu: AMD-Vi: Squash map_pages/unmap_pages with map_page/unmap_page
From: Oleksandr Tyshchenko
Reduce the scope of the TODO by squashing the single-page implementation into the multi-page one. The next target is to use large pages whenever possible in the case that hardware supports them. Signed-off-by: Oleksandr Tyshchenko CC: Jan Beulich CC: Suravee Suthikulpanit --- Changes in v1: - Changes in v2: - Signed-off-by: Oleksandr Tyshchenko --- xen/drivers/passthrough/amd/iommu_map.c | 250 1 file changed, 121 insertions(+), 129 deletions(-) diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c index ea3a728..22d0cc6 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -631,188 +631,180 @@ static int update_paging_mode(struct domain *d, unsigned long gfn) return 0; } -static int __must_check amd_iommu_map_page(struct domain *d, unsigned long gfn, - unsigned long mfn, - unsigned int flags) +/* + * TODO: Optimize by using large pages whenever possible in the case + * that hardware supports them. 
+ */ +int __must_check amd_iommu_map_pages(struct domain *d, unsigned long gfn, + unsigned long mfn, + unsigned int order, + unsigned int flags) { -bool_t need_flush = 0; struct domain_iommu *hd = dom_iommu(d); int rc; -unsigned long pt_mfn[7]; -unsigned int merge_level; +unsigned long orig_gfn = gfn; +unsigned long i; if ( iommu_use_hap_pt(d) ) return 0; -memset(pt_mfn, 0, sizeof(pt_mfn)); - spin_lock(>arch.mapping_lock); - rc = amd_iommu_alloc_root(hd); +spin_unlock(>arch.mapping_lock); if ( rc ) { -spin_unlock(>arch.mapping_lock); AMD_IOMMU_DEBUG("Root table alloc failed, gfn = %lx\n", gfn); domain_crash(d); return rc; } -/* Since HVM domain is initialized with 2 level IO page table, - * we might need a deeper page table for lager gfn now */ -if ( is_hvm_domain(d) ) +for ( i = 0; i < (1UL << order); i++, gfn++, mfn++ ) { -if ( update_paging_mode(d, gfn) ) +bool_t need_flush = 0; +unsigned long pt_mfn[7]; +unsigned int merge_level; + +memset(pt_mfn, 0, sizeof(pt_mfn)); + +spin_lock(>arch.mapping_lock); + +/* Since HVM domain is initialized with 2 level IO page table, + * we might need a deeper page table for lager gfn now */ +if ( is_hvm_domain(d) ) +{ +if ( update_paging_mode(d, gfn) ) +{ +spin_unlock(>arch.mapping_lock); +AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn); +domain_crash(d); +rc = -EFAULT; +goto err; +} +} + +if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) ) { spin_unlock(>arch.mapping_lock); -AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn); +AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn); domain_crash(d); -return -EFAULT; +rc = -EFAULT; +goto err; } -} -if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) ) -{ -spin_unlock(>arch.mapping_lock); -AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn); -domain_crash(d); -return -EFAULT; -} +/* Install 4k mapping first */ +need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn, + IOMMU_PAGING_MODE_LEVEL_1, + !!(flags & 
IOMMUF_writable), + !!(flags & IOMMUF_readable)); -/* Install 4k mapping first */ -need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn, - IOMMU_PAGING_MODE_LEVEL_1, - !!(flags & IOMMUF_writable), - !!(flags & IOMMUF_readable)); +/* Do not increase pde count if io mapping has not been changed */ +if ( !need_flush ) +{ +spin_unlock(>arch.mapping_lock); +continue; +} -/* Do not increase pde count if io mapping has not been changed */ -if ( !need_flush ) -goto out; +/* 4K mapping for PV guests never changes, + * no need to flush if we trust non-present bits */ +if ( is_hvm_domain(d) ) +amd_iommu_flush_pages(d, gfn, 0); -/* 4K mapping for PV guests never changes, - * no need to flush if we trust
[Xen-devel] [PATCH v2 04/13] xen/arm: p2m: Update IOMMU mapping whenever possible if page table is not shared
From: Oleksandr Tyshchenko
Update the IOMMU mapping if the IOMMU doesn't share the page table with the CPU. The best place to do so on ARM is __p2m_set_entry(). Use mfn as an indicator of the required action: if mfn is valid, call iommu_map_pages(); otherwise, iommu_unmap_pages(). Signed-off-by: Oleksandr Tyshchenko Acked-by: Julien Grall --- Changes in v1: - Update IOMMU mapping in __p2m_set_entry() instead of p2m_set_entry(). - Pass order argument to IOMMU APIs instead of page_count. Changes in v2: - Add Julien's acked-by --- xen/arch/arm/p2m.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index fc2a106..729ed94 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -984,7 +984,15 @@ static int __p2m_set_entry(struct p2m_domain *p2m, p2m_free_entry(p2m, orig_pte, level); if ( need_iommu(p2m->domain) && (p2m_valid(orig_pte) || p2m_valid(*entry)) ) -rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order); +{ +if ( iommu_use_hap_pt(p2m->domain) ) +rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order); +else if ( !mfn_eq(smfn, INVALID_MFN) ) +rc = iommu_map_pages(p2m->domain, gfn_x(sgfn), mfn_x(smfn), + page_order, p2m_get_iommu_flags(t)); +else +rc = iommu_unmap_pages(p2m->domain, gfn_x(sgfn), page_order); +} else rc = 0; -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 05/13] iommu/arm: Re-define iommu_use_hap_pt(d) as iommu_hap_pt_share
From: Oleksandr Tyshchenko
Not every IOMMU integrated into ARM SoCs can share page tables with the CPU, and as a result iommu_use_hap_pt(d) must not always be true. Reuse x86's iommu_hap_pt_share flag to indicate whether the IOMMU page table is shared or not. As the P2M table must always be shared between the CPU and the SMMU, print an error message and bail out if this flag was previously unset. Signed-off-by: Oleksandr Tyshchenko CC: Julien Grall --- Changes in v1: - Changes in v2: - Bail out if iommu_hap_pt_share is unset instead of overriding it - Clarify comment in code --- xen/drivers/passthrough/arm/smmu.c | 6 ++ xen/include/asm-arm/iommu.h | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 7c313c0..e828308 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2856,6 +2856,12 @@ static __init int arm_smmu_dt_init(struct dt_device_node *dev, */ dt_device_set_used_by(dev, DOMID_XEN); + if (!iommu_hap_pt_share) { + dev_err(dt_to_dev(dev), + "P2M table must always be shared between the CPU and the SMMU\n"); + return -EINVAL; + } + rc = arm_smmu_device_dt_probe(dev); if (rc) return rc; diff --git a/xen/include/asm-arm/iommu.h b/xen/include/asm-arm/iommu.h index 57d9b1e..2a6bd3d 100644 --- a/xen/include/asm-arm/iommu.h +++ b/xen/include/asm-arm/iommu.h @@ -20,8 +20,8 @@ struct arch_iommu void *priv; }; -/* Always share P2M Table between the CPU and the IOMMU */ -#define iommu_use_hap_pt(d) (1) +/* Not every ARM SoCs IOMMU use the same page-table format as the CPU. */ +#define iommu_use_hap_pt(d) (iommu_hap_pt_share) const struct iommu_ops *iommu_get_ops(void); void __init iommu_set_ops(const struct iommu_ops *ops); -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 09/13] xen/arm: Add use_iommu flag to xen_arch_domainconfig
From: Oleksandr Tyshchenko
This flag is intended to let Xen know that the guest has devices which will most likely be used for passthrough, and as a result the IOMMU is expected to be used for this domain. The primary aim of this knowledge is to help the IOMMUs that don't share page tables with the CPU on ARM be ready before the P2M code starts updating IOMMU mappings. So, if this flag is set, the non-shared IOMMUs will populate their page tables at domain creation time and thereby will be able to handle IOMMU mapping updates from *the very beginning*. In order to retain the current behavior on x86, iommu_domain_init() is still called with the use_iommu flag forced to false. Signed-off-by: Oleksandr Tyshchenko CC: Jan Beulich CC: Julien Grall CC: Ian Jackson CC: Wei Liu --- Changes in V1: - Treat use_iommu flag as the ARM decision only. Don't use common domain creation flag for it, use ARM config instead. - Clarify patch subject/description. Changes in V2: - Cosmetic fixes. --- tools/libxl/libxl_arm.c | 8 xen/arch/arm/domain.c | 2 +- xen/include/public/arch-arm.h | 5 + 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c index d842d88..cb9fe05 100644 --- a/tools/libxl/libxl_arm.c +++ b/tools/libxl/libxl_arm.c @@ -78,6 +78,14 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, return ERROR_FAIL; } +if (d_config->num_dtdevs || d_config->num_pcidevs) +xc_config->use_iommu = 1; +else +xc_config->use_iommu = 0; + +LOG(DEBUG, "IOMMU %s expected to be used for this domain", +xc_config->use_iommu ? 
"is" : "isn't"); + return 0; } diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index ec19310..3079bbe 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -569,7 +569,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags, ASSERT(config != NULL); /* p2m_init relies on some value initialized by the IOMMU subsystem */ -if ( (rc = iommu_domain_init(d, false)) != 0 ) +if ( (rc = iommu_domain_init(d, !!config->use_iommu)) != 0 ) goto fail; if ( (rc = p2m_init(d)) != 0 ) diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index bd974fb..b1fae45 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -322,6 +322,11 @@ struct xen_arch_domainconfig { * */ uint32_t clock_frequency; +/* + * IN + * IOMMU is expected to be used for this domain. + */ +uint8_t use_iommu; }; #endif /* __XEN__ || __XEN_TOOLS__ */ -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 03/13] xen/arm: p2m: Add helper to convert p2m type to IOMMU flags
From: Oleksandr Tyshchenko
The helper has the same purpose as the existing x86 one. It is used for choosing the IOMMU mapping attribute according to the memory type. Signed-off-by: Oleksandr Tyshchenko Reviewed-by: Julien Grall --- Changes in v1: - Add Julien's reviewed-by Changes in v2: - --- xen/include/asm-arm/p2m.h | 34 ++ 1 file changed, 34 insertions(+) diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index 1269052..635cc25 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -5,6 +5,7 @@ #include #include #include +#include #include /* for vm_event_response_t */ #include #include @@ -353,6 +354,39 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order) return gfn_add(gfn, 1UL << order); } +/* + * p2m type to IOMMU flags + */ +static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt) +{ +unsigned int flags; + +switch( p2mt ) +{ +case p2m_ram_rw: +case p2m_iommu_map_rw: +case p2m_map_foreign: +case p2m_grant_map_rw: +case p2m_mmio_direct_dev: +case p2m_mmio_direct_nc: +case p2m_mmio_direct_c: +flags = IOMMUF_readable | IOMMUF_writable; +break; +case p2m_ram_ro: +case p2m_iommu_map_ro: +case p2m_grant_map_ro: +flags = IOMMUF_readable; +break; +default: +flags = 0; +break; +} + +/* TODO Do we need to handle access permissions here? */ + +return flags; +} + #endif /* _XEN_P2M_H */ /* -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 08/13] iommu/arm: Misc fixes for arch specific part
From: Oleksandr Tyshchenko
1. Add a missing return in case the IOMMU ops have already been set. 2. Add a check for a shared IOMMU before returning an error. Signed-off-by: Oleksandr Tyshchenko CC: Julien Grall --- Changes in V1: - Changes in V2: - --- xen/drivers/passthrough/arm/iommu.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c index 95b1abb..6f01c13 100644 --- a/xen/drivers/passthrough/arm/iommu.c +++ b/xen/drivers/passthrough/arm/iommu.c @@ -32,7 +32,10 @@ void __init iommu_set_ops(const struct iommu_ops *ops) BUG_ON(ops == NULL); if ( iommu_ops && iommu_ops != ops ) +{ printk("WARNING: Cannot set IOMMU ops, already set to a different value\n"); +return; +} iommu_ops = ops; } @@ -70,6 +73,6 @@ void arch_iommu_domain_destroy(struct domain *d) int arch_iommu_populate_page_table(struct domain *d) { -/* The IOMMU shares the p2m with the CPU */ -return -ENOSYS; +/* Return an error if the IOMMU shares the p2m with the CPU */ +return iommu_use_hap_pt(d) ? -ENOSYS : 0; } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 11/13] iommu/arm: smmu: Squash map_pages/unmap_pages with map_page/unmap_page
From: Oleksandr Tyshchenko
Eliminate the TODO by squashing the single-page callbacks into the multi-page ones. Signed-off-by: Oleksandr Tyshchenko CC: Julien Grall --- Changes in v1: - Changes in v2: - --- xen/drivers/passthrough/arm/smmu.c | 48 +- 1 file changed, 6 insertions(+), 42 deletions(-) diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 652b58c..021031a 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2737,8 +2737,8 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d) xfree(xen_domain); } -static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn, - unsigned long mfn, unsigned int flags) +static int __must_check arm_smmu_map_pages(struct domain *d, unsigned long gfn, + unsigned long mfn, unsigned int order, unsigned int flags) { p2m_type_t t; @@ -2763,10 +2763,11 @@ static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn, * The function guest_physmap_add_entry replaces the current mapping * if there is already one... 
*/ - return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), 0, t); + return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), order, t); } -static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long gfn) +static int __must_check arm_smmu_unmap_pages(struct domain *d, + unsigned long gfn, unsigned int order) { /* * This function should only be used by gnttab code when the domain @@ -2775,44 +2776,7 @@ static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long gfn) if ( !is_domain_direct_mapped(d) ) return -EINVAL; - return guest_physmap_remove_page(d, _gfn(gfn), _mfn(gfn), 0); -} - -/* TODO: Optimize by squashing map_pages/unmap_pages with map_page/unmap_page */ -static int __must_check arm_smmu_map_pages(struct domain *d, unsigned long gfn, - unsigned long mfn, unsigned int order, unsigned int flags) -{ - unsigned long i; - int rc = 0; - - for (i = 0; i < (1UL << order); i++) { - rc = arm_smmu_map_page(d, gfn + i, mfn + i, flags); - if (unlikely(rc)) { - while (i--) - /* If statement to satisfy __must_check. */ - if (arm_smmu_unmap_page(d, gfn + i)) - continue; - - break; - } - } - - return rc; -} - -static int __must_check arm_smmu_unmap_pages(struct domain *d, - unsigned long gfn, unsigned int order) -{ - unsigned long i; - int rc = 0; - - for (i = 0; i < (1UL << order); i++) { - int ret = arm_smmu_unmap_page(d, gfn + i); - if (!rc) - rc = ret; - } - - return rc; + return guest_physmap_remove_page(d, _gfn(gfn), _mfn(gfn), order); } static const struct iommu_ops arm_smmu_iommu_ops = { -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 07/13] iommu: Make decision about needing IOMMU for hardware domains in advance
From: Oleksandr Tyshchenko
Hardware domains require the IOMMU to be used in most cases, and the decision to use it is made at hardware domain construction time. But that is not the best moment for the non-shared IOMMUs, due to the need to retrieve all the mappings that could have been created between per-domain IOMMU initialization and that moment. So, make the decision about needing the IOMMU a bit earlier, in iommu_domain_init(). With the "d->need_iommu" flag set at this early stage, we won't skip any IOMMU mapping updates. As a result, the existing code in iommu_hwdom_init() that goes through the list of pages and tries to retrieve mappings for non-shared IOMMUs is no longer needed and can simply be dropped. Signed-off-by: Oleksandr Tyshchenko CC: Jan Beulich CC: Julien Grall --- Changes in v1: - Changes in v2: - This is the result of reworking old patch: [PATCH v1 08/10] iommu: Split iommu_hwdom_init() into arch specific parts --- xen/drivers/passthrough/iommu.c | 44 ++--- 1 file changed, 10 insertions(+), 34 deletions(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 19c87d1..f5e5b7e 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -52,7 +52,7 @@ custom_param("iommu", parse_iommu_param); bool_t __initdata iommu_enable = 1; bool_t __read_mostly iommu_enabled; bool_t __read_mostly force_iommu; -bool_t __hwdom_initdata iommu_dom0_strict; +bool_t __read_mostly iommu_dom0_strict; bool_t __read_mostly iommu_verbose; bool_t __read_mostly iommu_workaround_bios_bug; bool_t __read_mostly iommu_igfx = 1; @@ -141,6 +141,15 @@ int iommu_domain_init(struct domain *d, bool use_iommu) if ( !iommu_enabled ) return 0; +if ( is_hardware_domain(d) ) +{ +if ( (paging_mode_translate(d) && !iommu_passthrough) || + iommu_dom0_strict ) +use_iommu = 1; +else +use_iommu = 0; +} + hd->platform_ops = iommu_get_ops(); ret = hd->platform_ops->init(d, use_iommu); if ( ret ) @@ -161,8 +170,6 @@ static void 
__hwdom_init check_hwdom_reqs(struct domain *d) if ( iommu_passthrough ) panic("Dom0 uses paging translated mode, dom0-passthrough must not be " "enabled\n"); - -iommu_dom0_strict = 1; } void __hwdom_init iommu_hwdom_init(struct domain *d) @@ -175,37 +182,6 @@ void __hwdom_init iommu_hwdom_init(struct domain *d) return; register_keyhandler('o', _dump_p2m_table, "dump iommu p2m table", 0); -d->need_iommu = !!iommu_dom0_strict; -if ( need_iommu(d) && !iommu_use_hap_pt(d) ) -{ -struct page_info *page; -unsigned int i = 0; -int rc = 0; - -page_list_for_each ( page, >page_list ) -{ -unsigned long mfn = page_to_mfn(page); -unsigned long gfn = mfn_to_gmfn(d, mfn); -unsigned int mapping = IOMMUF_readable; -int ret; - -if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) || - ((page->u.inuse.type_info & PGT_type_mask) - == PGT_writable_page) ) -mapping |= IOMMUF_writable; - -ret = hd->platform_ops->map_pages(d, gfn, mfn, 0, mapping); -if ( !rc ) -rc = ret; - -if ( !(i++ & 0xf) ) -process_pending_softirqs(); -} - -if ( rc ) -printk(XENLOG_WARNING "d%d: IOMMU mapping failed: %d\n", - d->domain_id, rc); -} return hd->platform_ops->hwdom_init(d); } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 10/13] xen/arm: domain_build: Don't expose IOMMU specific properties to the guest
From: Oleksandr Tyshchenko

We don't pass the IOMMU device through to DOM0 even if it is not used by Xen. Therefore, exposing the properties that describe the relationship between master devices and IOMMUs does not make any sense. The relevant properties are described in:
1. Documentation/devicetree/bindings/iommu/iommu.txt
2. Documentation/devicetree/bindings/pci/pci-iommu.txt

Signed-off-by: Oleksandr Tyshchenko
CC: Julien Grall
---
Changes in v1: -
Changes in v2: - Skip optional properties too. - Clarify patch description
---
 xen/arch/arm/domain_build.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 3abacc0..fadfbbc 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -432,6 +432,16 @@ static int write_properties(struct domain *d, struct kernel_info *kinfo,
             continue;
         }

+        /* Don't expose IOMMU specific properties to the guest */
+        if ( dt_property_name_is_equal(prop, "iommus") )
+            continue;
+
+        if ( dt_property_name_is_equal(prop, "iommu-map") )
+            continue;
+
+        if ( dt_property_name_is_equal(prop, "iommu-map-mask") )
+            continue;
+
         res = fdt_property(kinfo->fdt, prop->name, prop_data, prop_len);
         if ( res )
-- 
2.7.4
[Xen-devel] [PATCH v2 01/13] xen/device-tree: Add dt_count_phandle_with_args helper
From: Oleksandr Tyshchenko

Port the Linux helper of_count_phandle_with_args for counting the number of phandles in a property.

Signed-off-by: Oleksandr Tyshchenko
Reviewed-by: Julien Grall
---
Changes in v1: - Add Julien's reviewed-by
Changes in v2: -
---
 xen/common/device_tree.c | 7 +++
 xen/include/xen/device_tree.h | 19 +++
 2 files changed, 26 insertions(+)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 7b009ea..60b0095 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -1663,6 +1663,13 @@ int dt_parse_phandle_with_args(const struct dt_device_node *np,
                                         index, out_args);
 }

+int dt_count_phandle_with_args(const struct dt_device_node *np,
+                               const char *list_name,
+                               const char *cells_name)
+{
+    return __dt_parse_phandle_with_args(np, list_name, cells_name, 0, -1, NULL);
+}
+
 /**
  * unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @fdt: The parent device tree blob
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index 0aecbe0..738f1b6 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -764,6 +764,25 @@ int dt_parse_phandle_with_args(const struct dt_device_node *np,
                                const char *cells_name, int index,
                                struct dt_phandle_args *out_args);

+/**
+ * dt_count_phandle_with_args() - Find the number of phandle references in a property
+ * @np: pointer to a device tree node containing a list
+ * @list_name: property name that contains a list
+ * @cells_name: property name that specifies phandles' arguments count
+ *
+ * Returns the number of phandle + argument tuples within a property. It
+ * is a typical pattern to encode a list of phandle and variable
+ * arguments into a single property. The number of arguments is encoded
+ * by a property in the phandle-target node. For example, a gpios
+ * property would contain a list of GPIO specifiers consisting of a
+ * phandle and 1 or more arguments. The number of arguments is
+ * determined by the #gpio-cells property in the node pointed to by the
+ * phandle.
+ */
+int dt_count_phandle_with_args(const struct dt_device_node *np,
+                               const char *list_name,
+                               const char *cells_name);
+
 #ifdef CONFIG_DEVICE_TREE_DEBUG
 #define dt_dprintk(fmt, args...) \
     printk(XENLOG_DEBUG fmt, ## args)
-- 
2.7.4
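The counting that dt_count_phandle_with_args() performs (via __dt_parse_phandle_with_args with a NULL out-arg) can be illustrated with a toy model: a list property is a flat array of cells, each tuple being one phandle followed by N argument cells, where N comes from the #xxx-cells property of the node the phandle points to. Everything below is illustrative, not Xen code; cells_of() is a hypothetical stand-in for that target-node lookup:

```c
#include <stddef.h>
#include <assert.h>

/*
 * Toy model of the phandle-list walk: advance over one phandle plus
 * its argument cells per iteration, counting tuples.  Returns -1 for
 * a malformed property (argument cells running past the end).
 */
static int count_phandle_tuples(const unsigned *cells, size_t len,
                                int (*cells_of)(unsigned phandle))
{
    size_t i = 0;
    int count = 0;

    while ( i < len )
    {
        int nargs = cells_of(cells[i]); /* #xxx-cells of the target node */

        if ( nargs < 0 || i + 1 + (size_t)nargs > len )
            return -1;                  /* malformed property */
        i += 1 + (size_t)nargs;
        count++;
    }
    return count;
}

/* Example lookup: every target advertises 2 argument cells. */
static int two_cells(unsigned phandle)
{
    (void)phandle;
    return 2;
}
```

With two_cells() standing in for a #gpio-cells lookup of 2, a property of six cells encodes exactly two phandle+args tuples, and a property that truncates the last tuple's arguments is rejected.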
[Xen-devel] [PATCH v2 12/13] [RFC] iommu: VT-d: Squash map_pages/unmap_pages with map_page/unmap_page
From: Oleksandr Tyshchenko

Reduce the scope of the TODO by squashing single-page stuff with multi-page one. Next target is to use large pages whenever possible in the case that hardware supports them.

Signed-off-by: Oleksandr Tyshchenko
CC: Jan Beulich
CC: Kevin Tian
---
Changes in v1: -
Changes in v2: -
---
 xen/drivers/passthrough/vtd/iommu.c | 138 +---
 1 file changed, 67 insertions(+), 71 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 45d1f36..d20b2f9 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1750,15 +1750,24 @@ static void iommu_domain_teardown(struct domain *d)
     spin_unlock(&hd->arch.mapping_lock);
 }

-static int __must_check intel_iommu_map_page(struct domain *d,
-                                             unsigned long gfn,
-                                             unsigned long mfn,
-                                             unsigned int flags)
+static int __must_check intel_iommu_unmap_pages(struct domain *d,
+                                                unsigned long gfn,
+                                                unsigned int order);
+
+/*
+ * TODO: Optimize by using large pages whenever possible in the case
+ * that hardware supports them.
+ */
+static int __must_check intel_iommu_map_pages(struct domain *d,
+                                              unsigned long gfn,
+                                              unsigned long mfn,
+                                              unsigned int order,
+                                              unsigned int flags)
 {
     struct domain_iommu *hd = dom_iommu(d);
-    struct dma_pte *page = NULL, *pte = NULL, old, new = { 0 };
-    u64 pg_maddr;
     int rc = 0;
+    unsigned long orig_gfn = gfn;
+    unsigned long i;

     /* Do nothing if VT-d shares EPT page table */
     if ( iommu_use_hap_pt(d) )
@@ -1768,78 +1777,60 @@ static int __must_check intel_iommu_map_page(struct domain *d,
     if ( iommu_passthrough && is_hardware_domain(d) )
         return 0;

-    spin_lock(&hd->arch.mapping_lock);
-
-    pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
-    if ( pg_maddr == 0 )
+    for ( i = 0; i < (1UL << order); i++, gfn++, mfn++ )
     {
-        spin_unlock(&hd->arch.mapping_lock);
-        return -ENOMEM;
-    }
-    page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
-    pte = page + (gfn & LEVEL_MASK);
-    old = *pte;
-    dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
-    dma_set_pte_prot(new,
-                     ((flags & IOMMUF_readable) ? DMA_PTE_READ : 0) |
-                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
+        struct dma_pte *page = NULL, *pte = NULL, old, new = { 0 };
+        u64 pg_maddr;

-    /* Set the SNP on leaf page table if Snoop Control available */
-    if ( iommu_snoop )
-        dma_set_pte_snp(new);
+        spin_lock(&hd->arch.mapping_lock);

-    if ( old.val == new.val )
-    {
+        pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
+        if ( pg_maddr == 0 )
+        {
+            spin_unlock(&hd->arch.mapping_lock);
+            rc = -ENOMEM;
+            goto err;
+        }
+        page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
+        pte = page + (gfn & LEVEL_MASK);
+        old = *pte;
+        dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
+        dma_set_pte_prot(new,
+                         ((flags & IOMMUF_readable) ? DMA_PTE_READ : 0) |
+                         ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
+
+        /* Set the SNP on leaf page table if Snoop Control available */
+        if ( iommu_snoop )
+            dma_set_pte_snp(new);
+
+        if ( old.val == new.val )
+        {
+            spin_unlock(&hd->arch.mapping_lock);
+            unmap_vtd_domain_page(page);
+            continue;
+        }
+        *pte = new;
+
+        iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
         spin_unlock(&hd->arch.mapping_lock);
         unmap_vtd_domain_page(page);
-        return 0;
-    }
-    *pte = new;
-
-    iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
-    spin_unlock(&hd->arch.mapping_lock);
-    unmap_vtd_domain_page(page);

-    if ( !this_cpu(iommu_dont_flush_iotlb) )
-        rc = iommu_flush_iotlb(d, gfn, dma_pte_present(old), 1);
-
-    return rc;
-}
-
-static int __must_check intel_iommu_unmap_page(struct domain *d,
-                                               unsigned long gfn)
-{
-    /* Do nothing if hardware domain and iommu supports pass thru. */
-    if ( iommu_passthrough && is_hardware_domain(d) )
-        return 0;
-
-    return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K);
-}
-
-/* TODO: Optimize by squashing map_pages/unmap_pages with
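The shape of the conversion above — one order-based loop wrapped around the old single-page body — can be sketched independently of the VT-d details. In this sketch map_one() is a hypothetical stand-in for the per-page work; it keeps the first error and continues, whereas the actual patch aborts on error and unwinds via its err label:

```c
#include <assert.h>

/*
 * Sketch of the multi-page wrapper pattern used by the series: map
 * 1UL << order consecutive pages via a single-page primitive,
 * remembering the first error encountered.
 */
static int map_pages_sketch(unsigned long gfn, unsigned long mfn,
                            unsigned int order,
                            int (*map_one)(unsigned long gfn,
                                           unsigned long mfn))
{
    unsigned long i;
    int rc = 0;

    for ( i = 0; i < (1UL << order); i++, gfn++, mfn++ )
    {
        int ret = map_one(gfn, mfn);

        if ( !rc )
            rc = ret;   /* keep the first error, keep going */
    }
    return rc;
}

/* Example primitive that fails only for gfn 2. */
static int fail_on_gfn2(unsigned long gfn, unsigned long mfn)
{
    (void)mfn;
    return gfn == 2 ? -5 : 0;
}
```

An order-2 mapping starting at gfn 0 covers gfns 0..3, so it trips the failing page and reports that first error even though the later pages succeed.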
[Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM
From: Oleksandr Tyshchenko

Hi, all.

The purpose of this patch series is to create a base for porting any "non-shared" IOMMUs to Xen on ARM. By "non-shared" IOMMU I mean an IOMMU that can't share its page table with the CPU. Primarily, we are interested in IPMMU-VMSA, and I hope it will be the first candidate. It is a VMSA-compatible IOMMU integrated into the newest Renesas R-Car Gen3 SoCs (ARM). I am about to push IPMMU-VMSA support in a while.

The patch series was rebased on the Xen 4.9.0 release and tested on Renesas R-Car Gen3 H3/M3 based boards with IPMMU-VMSA support applied:
- Patches 1 and 3 have Julien's Rb.
- Patch 2 has Jan's Rb but only for x86 and generic parts.
- Patch 4 has Julien's Ab.
- Patches 5,6,9,10 were slightly reworked.
- Patch 7 was significantly reworked. The previous patch -> iommu: Split iommu_hwdom_init() into arch specific parts
- Patches 8,11,12,13 are new.

Not really sure about the x86-related changes since I had no possibility to check them, so they are only compile-tested on x86.

You can find the current patch series here:
repo: https://github.com/otyshchenko1/xen.git
branch: non_shared_iommu_v2

Previous patch series here:
[PATCH v1 00/10] "Non-shared" IOMMU support on ARM
https://www.mail-archive.com/xen-devel@lists.xen.org/msg107532.html
[RFC PATCH 0/9] "Non-shared" IOMMU support on ARM
https://www.mail-archive.com/xen-devel@lists.xen.org/msg100468.html

Thank you.
Oleksandr Tyshchenko (13): xen/device-tree: Add dt_count_phandle_with_args helper iommu: Add extra order argument to the IOMMU APIs and platform callbacks xen/arm: p2m: Add helper to convert p2m type to IOMMU flags xen/arm: p2m: Update IOMMU mapping whenever possible if page table is not shared iommu/arm: Re-define iommu_use_hap_pt(d) as iommu_hap_pt_share iommu: Add extra use_iommu argument to iommu_domain_init() iommu: Make decision about needing IOMMU for hardware domains in advance iommu/arm: Misc fixes for arch specific part xen/arm: Add use_iommu flag to xen_arch_domainconfig xen/arm: domain_build: Don't expose IOMMU specific properties to the guest iommu/arm: smmu: Squash map_pages/unmap_pages with map_page/unmap_page [RFC] iommu: VT-d: Squash map_pages/unmap_pages with map_page/unmap_page [RFC] iommu: AMD-Vi: Squash map_pages/unmap_pages with map_page/unmap_page tools/libxl/libxl_arm.c | 8 + xen/arch/arm/domain.c | 2 +- xen/arch/arm/domain_build.c | 10 ++ xen/arch/arm/p2m.c| 10 +- xen/arch/x86/domain.c | 2 +- xen/arch/x86/mm.c | 11 +- xen/arch/x86/mm/p2m-ept.c | 21 +-- xen/arch/x86/mm/p2m-pt.c | 26 +--- xen/arch/x86/mm/p2m.c | 38 + xen/arch/x86/x86_64/mm.c | 5 +- xen/common/device_tree.c | 7 + xen/common/grant_table.c | 10 +- xen/drivers/passthrough/amd/iommu_map.c | 212 +++--- xen/drivers/passthrough/amd/pci_amd_iommu.c | 10 +- xen/drivers/passthrough/arm/iommu.c | 7 +- xen/drivers/passthrough/arm/smmu.c| 23 ++- xen/drivers/passthrough/iommu.c | 73 - xen/drivers/passthrough/vtd/iommu.c | 116 +- xen/drivers/passthrough/vtd/x86/vtd.c | 4 +- xen/drivers/passthrough/x86/iommu.c | 6 +- xen/include/asm-arm/iommu.h | 4 +- xen/include/asm-arm/p2m.h | 34 + xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 8 +- xen/include/public/arch-arm.h | 5 + xen/include/xen/device_tree.h | 19 +++ xen/include/xen/iommu.h | 24 +-- 26 files changed, 402 insertions(+), 293 deletions(-) -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache
Hi,

Commit 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram) started using qemu_ram_ptr_length() instead of qemu_map_ram_ptr(). That results in calling xen_map_cache() with lock=true, but this mapping is never invalidated, so QEMU uses more and more RAM until it stops working for one reason or another (a crash if the host has little RAM, or it simply stops emulating, without a crash).

I don't know if calling xen_invalidate_map_cache_entry() in address_space_read_continue() and address_space_write_continue() is the right answer. Is there something better to do?

(A good way to reproduce: install Windows, so without PV drivers.)

Thanks,

-- 
Anthony PERARD
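The lifetime problem being described is essentially a refcount imbalance: every guest-RAM access now takes a locked mapcache entry, and nothing ever drops it. A toy model (not QEMU code; the names are illustrative) of why the cache can never shrink:

```c
#include <assert.h>

/*
 * Toy model of the xen-mapcache lifetime bug: an access with
 * lock=true pins an entry; only a matching invalidate unpins it.
 * An entry with outstanding pins can never be reclaimed, so locked
 * accesses without invalidates accumulate mapped memory.
 */
struct entry { int locks; };

static void map_locked(struct entry *e)
{
    e->locks++;             /* what qemu_ram_ptr_length-style access does */
}

static void invalidate(struct entry *e)
{
    if ( e->locks > 0 )
        e->locks--;         /* the missing counterpart */
}

static int reclaimable(const struct entry *e)
{
    return e->locks == 0;
}
```

Three locked accesses with no invalidates leave the entry pinned forever; only after the same number of invalidates does it become reclaimable, which is why pairing each locked access with an invalidate (as the mail suggests for the read/write continue paths) stops the growth.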
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu
On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin wrote:
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain being created. Some PCI devices will
>>> be emulated, some will be merely passed through and yet there will be
>>> some RMRR ranges. libxl can't know all this stuff - some comes from the
>>> host, some comes from DM. So actual MMIO requirements are known to
>>> hvmloader at the PCI bus enumeration time.
>>>
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
>
> If moving most tasks of hvmloader to libxl is a planned feature in Citrix,
> please let it be discussed on xen-devel first as it may affect many
> people... and not all of them might be happy. :)
>

Everything always goes through the mailing list.

> (tons of IMO and TLDR ahead, be warned)
>
> Moving PCI BAR allocation from guest side to libxl is a controversial step.
> This may be the architecturally wrong way in fact. There are properties and
> areas of responsibility. Among primary responsibilities of guest's firmware
> is PCI BARs and MMIO hole size allocation. That's a guest's territory.
> Guest relocates PCI BARs (and not just BIOS able to do this), guest
> firmware relocates MMIO hole base for them. If it was a real system, all
> tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done
> by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were
> offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for
> them and generating ACPI tables. And that's ok as hvmloader can be
> considered merely a 'supplemental' firmware to perform some tasks of
> SeaBIOS/OVMF before passing control to them. This solution has some
> architecture logic at least and doesn't look bad.
>

libxl is also a part of firmware so to speak. It's incorrect to think that only hvmloader and BIOS images are "proper" firmware.

> On other hand, moving PCI hole calculation to libxl just to let Xen/libxl
> know what the MMIO size value is might be a bad idea.
> Aside from some code duplication, straying too far from the real hw paths,
> or breaking existing (or future) interfaces this might have some other
> negative consequences. Ex. who will be initializing guest's ACPI tables if
> only libxl will know the memory layout? Some new interfaces between libxl
> and hvmloader just to let the latter know what values to write to ACPI
> tables being created? Or libxl will be initializing guest's ACPI tables as
> well (another guest's internal task)? Similar concerns are applicable to
> guest's final E820 construction.
>

The information is not confined by libxl - it's passed to hvmloader and it can finish the tasks libxl couldn't. Although, ACPI tables could be harmlessly initialized inside libxl as well (see PVH implementation).

> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU who traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already talks to Xen which ranges should be passed through or trapped
> -- it can tell him the current overall MMIO limits as well... or handle
> these limits himself -- if the MMIO hole range check is all what required to
> avoid MMIO space misusing, this check can be easily implemented in QEMU,
> provided that QEMU knows what memory/MMIO layout is. There is a lot of
> implementation freedom where to place restrictions and checks, Xen or QEMU.
> Strictly speaking, the MMIO hole itself can be considered a property of the
> emulated machine and may have implementation differences for different
> emulated chipsets. For example, the real i440' NB do not have an idea of
> high MMIO hole at all.
>
> We have already a sort of an interface between hvmloader and QEMU --
> hvmloader has to do basic initialization for some emulated chipset's
> registers (and this depends on the machine). Providing additional handling
> for few other registers (TOM/TOLUD/etc) will cost almost nothing and
> purpose of this registers will actually match their usage in real HW. This
> way we can use an existing available interface and don't stray too far from
> the real HW ways.
>
> I want to try this approach for Q35 bringup patches for Xen I'm currently
> working on. I'll send these patches as RFC and will be glad to receive some
> constructive criticism.
>

Sure. Static hole size provisioning doesn't prohibit its dynamic counterpart.

Igor

>>> libxl can be taught to retrieve all missing info from QEMU, but this way
>>> will require to perform all grunt work of
Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
On 07/25/2017 05:47 PM, Dario Faggioli wrote: > On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote: >> On 07/25/2017 05:00 PM, Dario Faggioli wrote: >>> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote: >>> Mmm.. I think you're right. In fact, in a properly configured >>> system, >>> we'll never go past step 3 (from the comment at the top). >>> >>> Which is not ideal, or at least not what I had in mind. In fact, I >>> think it's better to check step 4 (svc->vcpu->processor in hard- >>> affinity) and step 5 (a CPU from svc's runqueue in hard affinity), >>> as >>> that would mean avoiding a runqueue migration. >>> >>> What about I basically kill step 3, i.e., if we reach this point >>> during >>> the soft-affinity step, I just continue to the hard-affinity one? >> >> Hmm, well *normally* we would rather have a vcpu running within its >> soft >> affinity, even if that means moving it to another runqueue. >> > Yes, but both *ideally* and *normally*, we just should not be here. :-) > > If we did end up here, we're in guessing territory, and, although what > you say about a guest wanting to run on within its soft-affinity is > always true, from the guest own point of view, our job as the scheduler > is to do what would be best for the system as a whole. But we are in a > situation where we could not gather the information to make such a > decision. > >> Is your >> idea that, the only reason we're in this particular code is because >> we >> couldn't grab the lock we need to make a more informed decision; so >> defer if possible to previous decisions, which (we might presume) was >> able to make a more informed decision? >> > Kind of, yes. Basically I think we should "escape" from this situation > as quickly as possible, and causing as few troubles as possible to both > ourself and to others, in the hope that it will go better next time. 
>
> Trying to stay in the same runqueue seems to me to fit this
> requirement, as:
> - as you say, we're here because a previous (presumably well informed)
>   decision brought us here so, hopefully, staying here is not too bad,
>   neither for us nor overall;
> - staying here is quicker and means less overhead for svc;
> - staying here means less overhead overall. In fact, if we decide to
>   change runqueue, we will have to take the remote runqueue lock at
>   some point... And I'd prefer that to be for good reasons.
>
> All that being said, it probably would be good to add a performance
> counter, and try to get a sense of how frequently we actually end up in
> this function as a fallback.
>
> But in the meantime, yes, I'd try to make svc stay in the runqueue
> where it is, in this case, if possible.

Sounds good. So are you going to respin the series then?

>>> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
>>> "something random"? The value to be assigned to cpu?
>>
>> Er, yes, I meant the return value. Returning 0, or v->processor would
>> be simple options. *Really* defensive programming would attempt to
>> chose something somewhat sensible with the minimal risk of triggering
>> some other hidden assumptions (say, any cpu on our current runqueue).
>> But part of me says even thinking too long about it is a waste of time
>> for something we're 99.99% sure can never happen. :-)
>>
> Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
> either v->processor (with a comment), or a cpumask_any(something). Of
> course the latter is expensive, but it should not be a big problem,
> considering we'll never get there (I'll have a look at the generated
> assembly, to confirm that).

OK, thanks.
 -George
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
Hello Andrew,

On 25.07.17 19:23, Andrew Cooper wrote:
As a general rule, Xen needs to be able to tolerate and cope with any quantity of crap described by the firmware. On the x86 side, we have large quantities of workarounds for buggy ACPI/MP/SMBIOS tables.

That approach is somewhat covered by the earlier mentioned options:

On 25.07.17 15:24, Andrii Anisov wrote:
* ignore next duplicating (overlapping) memory node in favor of one already in a memory banks list
* merge duplicating (overlapping), even neighboring, memory banks

On 25.07.17 19:23, Andrew Cooper wrote:
It might be the case that the best Xen can do is give up, but it should do so with a clear error message identifying what the firmware has done which is sufficiently crazy to prevent further booting.

We have one more option to choose for this case:
* BUG() with a clear notification at the moment we try to add an overlapping memory bank

So what to choose?

ps:
Hitting a BUG()
Actually, it is silently dying without earlyprintk enabled in this case. It happens pretty early.

-- 
*Andrii Anisov*
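The second option ("merge duplicating (overlapping), even neighboring, memory banks") is straightforward to implement. A standalone sketch — illustrative only, not Xen's bootinfo code, with hypothetical bank types:

```c
#include <stddef.h>
#include <assert.h>

typedef unsigned long long paddr_sk;

struct membank { paddr_sk start, size; };

/*
 * Sort banks by start address (insertion sort is fine for the
 * handful of banks firmware describes), then coalesce overlapping
 * or touching neighbours in place.  Returns the new bank count.
 */
static size_t merge_banks(struct membank *b, size_t n)
{
    size_t i, out = 0;

    for ( i = 1; i < n; i++ )
    {
        struct membank key = b[i];
        size_t j = i;

        while ( j > 0 && b[j - 1].start > key.start )
        {
            b[j] = b[j - 1];
            j--;
        }
        b[j] = key;
    }

    for ( i = 0; i < n; i++ )
    {
        if ( out && b[i].start <= b[out - 1].start + b[out - 1].size )
        {
            /* overlaps or touches the previous bank: extend it */
            paddr_sk end = b[i].start + b[i].size;
            paddr_sk cur_end = b[out - 1].start + b[out - 1].size;

            if ( end > cur_end )
                b[out - 1].size = end - b[out - 1].start;
        }
        else
            b[out++] = b[i];    /* disjoint: keep as a new bank */
    }
    return out;
}
```

A duplicated bank plus an adjacent one collapses into a single bank covering the union, while genuinely disjoint banks are left alone — which is the behaviour the "merge" option asks for, as opposed to ignoring the later node or BUG()ing.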
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu
On Tue, 25 Jul 2017 15:13:17 +0100
Igor Druzhinin wrote:
> >> The algorithm implemented in hvmloader for that is not complicated and
> >> can be moved to libxl easily. What we can do is to provision a hole big
> >> enough to include all the initially assigned PCI devices. We can also
> >> account for emulated MMIO regions if necessary. But, to be honest, it
> >> doesn't really matter since if there is no enough space in lower MMIO
> >> hole for some BARs - they can be easily relocated to upper MMIO
> >> hole by hvmloader or the guest itself (dynamically).
> >>
> >> Igor
> >
> > [Zhang, Xiong Y] yes, If we could supply a big enough mmio hole and
> > don't allow hvmloader to do relocate, things will be easier. But how
> > could we supply a big enough mmio hole ? a. statical set base address
> > of mmio hole to 2G/3G. b. Like hvmloader to probe all the pci devices
> > and calculate mmio size. But this runs prior to qemu, how to probe pci
> > devices ?
>
> It's true that we don't know the space occupied by emulated device
> before QEMU is started. But QEMU needs to be started with some lower
> MMIO hole size statically assigned.
>
> One of the possible solutions is to calculate a hole size required to
> include all the assigned pass-through devices and round it up to the
> nearest GB boundary but not larger than 2GB total. If it's not enough to
> also include all the emulated devices - it's not enough, some of the PCI
> device are going to be relocated to upper MMIO hole in that case.

Not all devices are BAR64-capable, and even those which are may have Option ROM BARs (mem32 only). Yet there are 32-bit guests who will find 64-bit BARs with values above 4GB extremely unacceptable. The low MMIO hole is a precious resource.

Also, one needs to consider the implications of PCI device hotplug against the 'static' precalculation approach.
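The sizing policy quoted above — the total BAR footprint of the initially assigned devices, rounded up to the next 1GB boundary and clamped at 2GB — is a one-liner. The sketch below is illustrative only; the name and exact policy are assumptions, not libxl code:

```c
#include <assert.h>

#define GB(x) ((unsigned long long)(x) << 30)

/*
 * Round the pass-through BAR footprint up to the nearest GB and cap
 * the low MMIO hole at 2GB, per the proposal in the quoted mail.
 */
static unsigned long long mmio_hole_size(unsigned long long bar_total)
{
    unsigned long long hole = (bar_total + GB(1) - 1) & ~(GB(1) - 1);

    if ( hole > GB(2) )
        hole = GB(2);
    return hole;
}
```

Anything that still doesn't fit after the 2GB clamp is exactly the overflow the thread discusses: those BARs would have to be relocated to the high MMIO hole, which is where the BAR64/Option-ROM/32-bit-guest objections come in.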
Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote: > On 07/25/2017 05:00 PM, Dario Faggioli wrote: > > On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote: > > > > > Mmm.. I think you're right. In fact, in a properly configured > > system, > > we'll never go past step 3 (from the comment at the top). > > > > Which is not ideal, or at least not what I had in mind. In fact, I > > think it's better to check step 4 (svc->vcpu->processor in hard- > > affinity) and step 5 (a CPU from svc's runqueue in hard affinity), > > as > > that would mean avoiding a runqueue migration. > > > > What about I basically kill step 3, i.e., if we reach this point > > during > > the soft-affinity step, I just continue to the hard-affinity one? > > Hmm, well *normally* we would rather have a vcpu running within its > soft > affinity, even if that means moving it to another runqueue. > Yes, but both *ideally* and *normally*, we just should not be here. :-) If we did end up here, we're in guessing territory, and, although what you say about a guest wanting to run on within its soft-affinity is always true, from the guest own point of view, our job as the scheduler is to do what would be best for the system as a whole. But we are in a situation where we could not gather the information to make such a decision. > Is your > idea that, the only reason we're in this particular code is because > we > couldn't grab the lock we need to make a more informed decision; so > defer if possible to previous decisions, which (we might presume) was > able to make a more informed decision? > Kind of, yes. Basically I think we should "escape" from this situation as quickly as possible, and causing as few troubles as possible to both ourself and to others, in the hope that it will go better next time. 
Trying to stay in the same runqueue seems to me to fit this requirement, as:
- as you say, we're here because a previous (presumably well informed) decision brought us here so, hopefully, staying here is not too bad, neither for us nor overall;
- staying here is quicker and means less overhead for svc;
- staying here means less overhead overall. In fact, if we decide to change runqueue, we will have to take the remote runqueue lock at some point... And I'd prefer that to be for good reasons.

All that being said, it probably would be good to add a performance counter, and try to get a sense of how frequently we actually end up in this function as a fallback.

But in the meantime, yes, I'd try to make svc stay in the runqueue where it is, in this case, if possible.

> > ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> > "something random"? The value to be assigned to cpu?
>
> Er, yes, I meant the return value. Returning 0, or v->processor would
> be simple options. *Really* defensive programming would attempt to
> chose something somewhat sensible with the minimal risk of triggering
> some other hidden assumptions (say, any cpu on our current runqueue).
> But part of me says even thinking too long about it is a waste of time
> for something we're 99.99% sure can never happen. :-)
>
Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using either v->processor (with a comment), or a cpumask_any(something). Of course the latter is expensive, but it should not be a big problem, considering we'll never get there (I'll have a look at the generated assembly, to confirm that).

Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu
On Mon, 24 Jul 2017 21:39:08 +0100
Igor Druzhinin wrote:
> > But, the problem is that overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain being created. Some PCI devices will
> > be emulated, some will be merely passed through and yet there will be
> > some RMRR ranges. libxl can't know all this stuff - some comes from the
> > host, some comes from DM. So actual MMIO requirements are known to
> > hvmloader at the PCI bus enumeration time.
> >
> IMO hvmloader shouldn't really be allowed to relocate memory under any
> conditions. As Andrew said it's much easier to provision the hole
> statically in libxl during domain construction process and it doesn't
> really compromise any functionality. Having one more entity responsible
> for guest memory layout only makes things more convoluted.

If moving most tasks of hvmloader to libxl is a planned feature in Citrix, please let it be discussed on xen-devel first as it may affect many people... and not all of them might be happy. :)

(tons of IMO and TLDR ahead, be warned)

Moving PCI BAR allocation from guest side to libxl is a controversial step. This may be the architecturally wrong way in fact. There are properties and areas of responsibility. Among primary responsibilities of guest's firmware is PCI BARs and MMIO hole size allocation. That's a guest's territory. Guest relocates PCI BARs (and not just BIOS able to do this), guest firmware relocates MMIO hole base for them. If it was a real system, all tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for them and generating ACPI tables. And that's ok as hvmloader can be considered merely a 'supplemental' firmware to perform some tasks of SeaBIOS/OVMF before passing control to them. This solution has some architecture logic at least and doesn't look bad.
On other hand, moving PCI hole calculation to libxl just to let Xen/libxl know what the MMIO size value is might be a bad idea. Aside from some code duplication, straying too far from the real hw paths, or breaking existing (or future) interfaces this might have some other negative consequences. Ex. who will be initializing guest's ACPI tables if only libxl will know the memory layout? Some new interfaces between libxl and hvmloader just to let the latter know what values to write to ACPI tables being created? Or libxl will be initializing guest's ACPI tables as well (another guest's internal task)? Similar concerns are applicable to guest's final E820 construction. Another thing is that handling ioreq/PT MMIO ranges is somewhat a property of the device model (at least for now). Right now it's QEMU who traps PCI BAR accesses and tells Xen how to handle specific ranges of MMIO space. If QEMU already talks to Xen which ranges should be passed through or trapped -- it can tell him the current overall MMIO limits as well... or handle these limits himself -- if the MMIO hole range check is all what required to avoid MMIO space misusing, this check can be easily implemented in QEMU, provided that QEMU knows what memory/MMIO layout is. There is a lot of implementation freedom where to place restrictions and checks, Xen or QEMU. Strictly speaking, the MMIO hole itself can be considered a property of the emulated machine and may have implementation differences for different emulated chipsets. For example, the real i440' NB do not have an idea of high MMIO hole at all. We have already a sort of an interface between hvmloader and QEMU -- hvmloader has to do basic initialization for some emulated chipset's registers (and this depends on the machine). Providing additional handling for few other registers (TOM/TOLUD/etc) will cost almost nothing and purpose of this registers will actually match their usage in real HW. 
This way we can use an existing available interface and don't stray too far from the real HW ways. I want to try this approach for Q35 bringup patches for Xen I'm currently working on. I'll send these patches as RFC and will be glad to receive some constructive criticism. > > libxl can be taught to retrieve all missing info from QEMU, but this way > > will require to perform all grunt work of PCI BARs allocation in libxl > > itself - in order to calculate the real MMIO hole(s) size, one needs to > > take into account all PCI BARs sizes and their alignment requirements > > diversity + existing gaps due to RMRR ranges... basically, libxl will > > need to do most of hvmloader/pci.c's job. > > > > The algorithm implemented in hvmloader for that is not complicated and > can be moved to libxl easily. What we can do is to provision a hole big > enough to include all the initially assigned PCI devices. We can also > account for emulated MMIO regions if necessary. But, to be honest, it > doesn't really matter since if there is no
Re: [Xen-devel] Android bring up over xen for Rcar-H3
Dear George, On 19.07.17 12:42, George John wrote: I want to bring up Android over Xen as Dom0 on the Rcar-H3 Salvator-X board. Can anybody please share the procedure to do so? Once you have an Android for the Salvator-X board, the procedure is not really board-specific. You have to have the ATF running u-boot in EL2 mode, and the stuff from here [1] applied to the kernel. You have to build XEN itself. Then run the system similarly to [2]. You will have Android running in Dom0. But the main question here is building the XEN toolstack for Android. I'm not sure if it is feasible. From my previous experience there were no good results. [1] https://github.com/xen-troops/meta-demo/tree/master/meta-rcar-gen3-xen/recipes-kernel/linux/linux-renesas [2] https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions/Salvator-X#Running_the_system -- *Andrii Anisov* ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
On 25/07/17 17:02, Andrii Anisov wrote: > Hello Julien, > > > On 25.07.17 18:44, Julien Grall wrote: >> I have seen work on this board for the past year and it is the first >> time I have seen a complaint about memory overlap. So why this sudden >> change? > It is just an approach change. I'm cleaning up nits for the board. > >> Is that a comment from the vendor or your guess? > It is my impression. > But yes, it could be worth asking them why they hardcode the memory > layout in their u-boot but stuff the dts with a number of memory nodes. > Actually from the beginning of our time for this board the solution > was merging all memory description nodes in one in a dts for XEN, like > [1]. So u-boot runtime updates to the dtb were just ignored. > >> You need to differentiate between the device-tree spec itself and the Linux >> implementation. memblock is something common to all architectures. It >> does not mean it is something valid to do. > That is why I asked for advice. > > [1] > https://github.com/xen-troops/meta-demo/blob/master/meta-rcar-gen3-xen/recipes-kernel/linux/linux-renesas/r8a7795-salvator-x-xen.dts#L61 As a general rule, Xen needs to be able to tolerate and cope with any quantity of crap described by the firmware. On the x86 side, we have large quantities of workarounds for buggy ACPI/MP/SMBIOS tables. It might be the case that the best Xen can do is give up, but it should do so with a clear error message identifying what the firmware has done which is sufficiently crazy to prevent further booting. Hitting a BUG() is not a user-friendly thing to do, so it should be fixed. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
On Tue, Jul 25, 2017 at 06:27:31PM +0300, Andrii Anisov wrote: > On 25.07.17 17:23, Julien Grall wrote: > >I think this is by chance rather than by design. The first > >question to answer is why a Firmware would specify twice the same > >memory banks? Is it valid from the specification? > Yep, that is the question. I believe that all memory regions described in any memory node's reg entries should be disjoint (i.e. should not overlap). Per IEEE-1275, reg entries (within a node) are supposed to be disjoint, and I would expect this requirement to extend across nodes in the case of memory nodes. Currently, the DT spec is somewhat sparse in wording, but this can be corrected. > >Regardless that, it looks like to me that the device-tree you give > >to the board should not contain the memory nodes. > The device-tree is provided by the vendor in that form, and u-boot is > theirs. It seems to me that they do not really care since the kernel > tolerates duplication. > > >> ps. Linux kernel does tolerate duplicated memory nodes by > >>merging memory blocks. I.e. memblock_add_range() function is > >>commented as following: > >>/** > >> * memblock_add_range - add new memblock region > >> * @type: memblock type to add new region into > >> * @base: base address of the new region > >> * @size: size of the new region > >> * @nid: nid of the new region > >> * @flags: flags of the new region > >> * > >> * Add new memblock region [@base,@base+@size) into @type. The new > >>region > >> * is allowed to overlap with existing ones - overlaps don't affect > >>already > >> * existing regions. @type is guaranteed to be minimal (all > >>neighbouring > >> * compatible regions are merged) after the addition. > >> * > >> * RETURNS: > >> * 0 on success, -errno on failure. > >> */ > IMO the function description is pretty straightforward. > But let us wait for device tree guys feedback. This might be the design of *memblock*, but that does not mean that it is a deliberate part of the DT handling. 
I believe this is simply an implementation detail that happens to mask a DT bug. Thanks, Mark. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
On 07/25/2017 05:00 PM, Dario Faggioli wrote: > On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote: >> On 06/16/2017 03:13 PM, Dario Faggioli wrote: >>> >>> diff --git a/xen/common/sched_credit2.c >>> b/xen/common/sched_credit2.c >>> index c749d4e..54f6e21 100644 >>> --- a/xen/common/sched_credit2.c >>> +++ b/xen/common/sched_credit2.c >>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, >>> cpumask_t *mask) >>> } >>> >>> /* >>> - * When a hard affinity change occurs, we may not be able to check >>> some >>> - * (any!) of the other runqueues, when looking for the best new >>> processor >>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that >>> happens, we >>> - * pick, in order of decreasing preference: >>> - * - svc's current pcpu; >>> - * - another pcpu from svc's current runq; >>> - * - any cpu. >>> + * In csched2_cpu_pick(), it may not be possible to actually look >>> at remote >>> + * runqueues (the trylock-s on their spinlocks can fail!). If that >>> happens, >>> + * we pick, in order of decreasing preference: >>> + * 1) svc's current pcpu, if it is part of svc's soft affinity; >>> + * 2) a pcpu in svc's current runqueue that is also in svc's soft >>> affinity; >>> + * 3) just one valid pcpu from svc's soft affinity; >>> + * 4) svc's current pcpu, if it is part of svc's hard affinity; >>> + * 5) a pcpu in svc's current runqueue that is also in svc's hard >>> affinity; >>> + * 6) just one valid pcpu from svc's hard affinity >>> + * >>> + * Of course, 1, 2 and 3 makes sense only if svc has a soft >>> affinity. Also >>> + * note that at least 6 is guaranteed to _always_ return at least >>> one pcpu. 
>>> */ >>> static int get_fallback_cpu(struct csched2_vcpu *svc) >>> { >>> struct vcpu *v = svc->vcpu; >>> -int cpu = v->processor; >>> +unsigned int bs; >>> >>> -cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, >>> -cpupool_domain_cpumask(v->domain)); >>> +for_each_affinity_balance_step( bs ) >>> +{ >>> +int cpu = v->processor; >>> + >>> +if ( bs == BALANCE_SOFT_AFFINITY && >>> + !has_soft_affinity(v, v->cpu_hard_affinity) ) >>> +continue; >>> >>> -if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) ) >>> -return cpu; >>> +affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu)); >>> +cpumask_and(cpumask_scratch_cpu(cpu), >>> cpumask_scratch_cpu(cpu), >>> +cpupool_domain_cpumask(v->domain)); >>> >>> -if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu), >>> - >rqd->active)) ) >>> -{ >>> -cpumask_and(cpumask_scratch_cpu(cpu), >rqd->active, >>> -cpumask_scratch_cpu(cpu)); >>> -return cpumask_first(cpumask_scratch_cpu(cpu)); >>> -} >>> +/* >>> + * This is cases 1 or 4 (depending on bs): if v->processor >>> is (still) >>> + * in our affinity, go for it, for cache betterness. >>> + */ >>> +if ( likely(cpumask_test_cpu(cpu, >>> cpumask_scratch_cpu(cpu))) ) >>> +return cpu; >>> >>> -ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu))); >>> +/* >>> + * This is cases 2 or 5 (depending on bs): v->processor >>> isn't there >>> + * any longer, check if we at least can stay in our >>> current runq. >>> + */ >>> +if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu), >>> + >rqd->active)) ) >>> +{ >>> +cpumask_and(cpumask_scratch_cpu(cpu), >>> cpumask_scratch_cpu(cpu), >>> +>rqd->active); >>> +return cpumask_first(cpumask_scratch_cpu(cpu)); >>> +} >>> >>> -return cpumask_first(cpumask_scratch_cpu(cpu)); >>> +/* >>> + * This is cases 3 or 6 (depending on bs): last stand, >>> just one valid >>> + * pcpu from our soft affinity, if we have one and if >>> there's any. 
In >>> + * fact, if we are doing soft-affinity, it is possible >>> that we fail, >>> + * which means we stay in the loop and look for hard >>> affinity. OTOH, >>> + * if we are at the hard-affinity balancing step, it's >>> guaranteed that >>> + * there is at least one valid cpu, and therefore we are >>> sure that we >>> + * return it, and never really exit the loop. >>> + */ >>> +ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) || >>> + bs == BALANCE_SOFT_AFFINITY); >>> +cpu = cpumask_first(cpumask_scratch_cpu(cpu)); >> >> So just checking my understanding here... at this point we're not >> taking >> into consideration load or idleness or anything else -- we're just >> saying, "Is there a cpu in my soft affinity it is *possible* to run >> on?" >> > Exactly. If we are in this function, it means we failed to take the > locks we needed, for making a choice based on load,
Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()
On 25/07/17 16:27, Tim Deegan wrote: > At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote: >> The purpose of guest_walk_tables() is to match the behaviour of real >> hardware. >> >> A PV guest can have 2M superpages in its pagetables, via the M2P and the >> initial initrd mapping, even if it isn't permitted to create arbitrary 2M >> superpage mappings. > Can the domain builder (or Xen?) really give a guest superpage > mappings for its initrd? Wouldn't that cause problems for live > migration? From Wei's work, dom0 definitely ends up calling put_page_type() on an L2 superpage. Looking at the dom0 construction code more closely, it appears that Xen creates the initial p2m in the guest using superpages. I will s/initrd/initial p2m/ in the commit message. I don't believe the domain builder has any way of creating such mappings for a domU, so the migration aspect is less important. > > In any case this patch looks correct: the presence of superpages in PV > pagetables is decided by the PV MM rules, so the walker should accept them. > > Reviewed-by: Tim Deegan Thanks. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 4/5] xen: sched_null: add some tracing
On Tue, 2017-07-25 at 16:15 +0100, George Dunlap wrote: > On Thu, Jun 29, 2017 at 1:56 PM, Dario Faggioli wrote: > > In line with what is there in all the other schedulers. > > > > Signed-off-by: Dario Faggioli > > --- > > George Dunlap > > FYI forgot the 'CC:' for this and patch 5. :-) > Ah, indeed. Sorry for that. > (No problem, just one > extra step to download the series into an mbox.) > Ok, will pay more attention next time. :-) Thanks and Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()
On Tue, Jul 25, 2017 at 04:27:00PM +0100, Tim Deegan wrote: > At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote: > > The purpose of guest_walk_tables() is to match the behaviour of real > > hardware. > > > > A PV guest can have 2M superpages in its pagetables, via the M2P and the > > initial initrd mapping, even if it isn't permitted to create arbitrary 2M > > superpage mappings. > > Can the domain builder (or Xen?) really give a guest superpage > mappings for its initrd? Wouldn't that cause problems for live > migration? > I can see superpage mapping when constructing Dom0 (kernel or initrd, haven't checked). ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()
On Tue, Jul 25, 2017 at 04:00:13PM +0100, Andrew Cooper wrote: > The purpose of guest_walk_tables() is to match the behaviour of real hardware. > > A PV guest can have 2M superpages in its pagetables, via the M2P and the > initial initrd mapping, even if it isn't permitted to create arbitrary 2M > superpage mappings. > > guest_can_use_l2_superpages() checking opt_allow_superpage is a piece of PV > guest policy enforcement, rather than its intended purpose of meaning "would > hardware tolerate finding an L2 superpage with these control settings?" > > Signed-off-by: Andrew Cooper Reviewed-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/4] xen: credit2: implement utilization cap
On Tue, 2017-07-25 at 16:08 +0100, George Dunlap wrote: > On 06/08/2017 01:08 PM, Dario Faggioli wrote: > > > Hmm, this needs to be rebased on the structure layout patches I > checked > in last week. :-) > Indeed it does. And of course, I'm up for it. :-) The soft-affinity series is likely to generate fewer conflicts... It's probably easier if we merge that first (after taking care of patches 3 and 6, of course), isn't it? Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
Hello Julien, On 25.07.17 18:44, Julien Grall wrote: I have seen work on this board for the past year and it is the first time I have seen a complaint about memory overlap. So why this sudden change? It is just an approach change. I'm cleaning up nits for the board. Is that a comment from the vendor or your guess? It is my impression. But yes, it could be worth asking them why they hardcode the memory layout in their u-boot but stuff the dts with a number of memory nodes. Actually from the beginning of our time for this board the solution was merging all memory description nodes in one in a dts for XEN, like [1]. So u-boot runtime updates to the dtb were just ignored. You need to differentiate between the device-tree spec itself and the Linux implementation. memblock is something common to all architectures. It does not mean it is something valid to do. That is why I asked for advice. [1] https://github.com/xen-troops/meta-demo/blob/master/meta-rcar-gen3-xen/recipes-kernel/linux/linux-renesas/r8a7795-salvator-x-xen.dts#L61 -- *Andrii Anisov* ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/7] xen: credit2: soft-affinity awareness in fallback_cpu()
On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote: > On 06/16/2017 03:13 PM, Dario Faggioli wrote: > > > > diff --git a/xen/common/sched_credit2.c > > b/xen/common/sched_credit2.c > > index c749d4e..54f6e21 100644 > > --- a/xen/common/sched_credit2.c > > +++ b/xen/common/sched_credit2.c > > @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, > > cpumask_t *mask) > > } > > > > /* > > - * When a hard affinity change occurs, we may not be able to check > > some > > - * (any!) of the other runqueues, when looking for the best new > > processor > > - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that > > happens, we > > - * pick, in order of decreasing preference: > > - * - svc's current pcpu; > > - * - another pcpu from svc's current runq; > > - * - any cpu. > > + * In csched2_cpu_pick(), it may not be possible to actually look > > at remote > > + * runqueues (the trylock-s on their spinlocks can fail!). If that > > happens, > > + * we pick, in order of decreasing preference: > > + * 1) svc's current pcpu, if it is part of svc's soft affinity; > > + * 2) a pcpu in svc's current runqueue that is also in svc's soft > > affinity; > > + * 3) just one valid pcpu from svc's soft affinity; > > + * 4) svc's current pcpu, if it is part of svc's hard affinity; > > + * 5) a pcpu in svc's current runqueue that is also in svc's hard > > affinity; > > + * 6) just one valid pcpu from svc's hard affinity > > + * > > + * Of course, 1, 2 and 3 makes sense only if svc has a soft > > affinity. Also > > + * note that at least 6 is guaranteed to _always_ return at least > > one pcpu. 
> > */ > > static int get_fallback_cpu(struct csched2_vcpu *svc) > > { > > struct vcpu *v = svc->vcpu; > > -int cpu = v->processor; > > +unsigned int bs; > > > > -cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity, > > -cpupool_domain_cpumask(v->domain)); > > +for_each_affinity_balance_step( bs ) > > +{ > > +int cpu = v->processor; > > + > > +if ( bs == BALANCE_SOFT_AFFINITY && > > + !has_soft_affinity(v, v->cpu_hard_affinity) ) > > +continue; > > > > -if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) ) > > -return cpu; > > +affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu)); > > +cpumask_and(cpumask_scratch_cpu(cpu), > > cpumask_scratch_cpu(cpu), > > +cpupool_domain_cpumask(v->domain)); > > > > -if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu), > > - >rqd->active)) ) > > -{ > > -cpumask_and(cpumask_scratch_cpu(cpu), >rqd->active, > > -cpumask_scratch_cpu(cpu)); > > -return cpumask_first(cpumask_scratch_cpu(cpu)); > > -} > > +/* > > + * This is cases 1 or 4 (depending on bs): if v->processor > > is (still) > > + * in our affinity, go for it, for cache betterness. > > + */ > > +if ( likely(cpumask_test_cpu(cpu, > > cpumask_scratch_cpu(cpu))) ) > > +return cpu; > > > > -ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu))); > > +/* > > + * This is cases 2 or 5 (depending on bs): v->processor > > isn't there > > + * any longer, check if we at least can stay in our > > current runq. > > + */ > > +if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu), > > + >rqd->active)) ) > > +{ > > +cpumask_and(cpumask_scratch_cpu(cpu), > > cpumask_scratch_cpu(cpu), > > +>rqd->active); > > +return cpumask_first(cpumask_scratch_cpu(cpu)); > > +} > > > > -return cpumask_first(cpumask_scratch_cpu(cpu)); > > +/* > > + * This is cases 3 or 6 (depending on bs): last stand, > > just one valid > > + * pcpu from our soft affinity, if we have one and if > > there's any. 
In > > + * fact, if we are doing soft-affinity, it is possible > > that we fail, > > + * which means we stay in the loop and look for hard > > affinity. OTOH, > > + * if we are at the hard-affinity balancing step, it's > > guaranteed that > > + * there is at least one valid cpu, and therefore we are > > sure that we > > + * return it, and never really exit the loop. > > + */ > > +ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) || > > + bs == BALANCE_SOFT_AFFINITY); > > +cpu = cpumask_first(cpumask_scratch_cpu(cpu)); > > So just checking my understanding here... at this point we're not > taking > into consideration load or idleness or anything else -- we're just > saying, "Is there a cpu in my soft affinity it is *possible* to run > on?" > Exactly. If we are in this function, it means we failed to take the locks we needed, for making a choice based on load, idleness, etc, but we need a CPU, so we pick whatever is valid.
Re: [Xen-devel] [PATCH 3/5] xen: sched-null: support soft-affinity
On Thu, Jun 29, 2017 at 1:56 PM, Dario Faggioli wrote: > The null scheduler does not really use hard-affinity for > scheduling, it uses it for 'placement', i.e., for deciding > to what pCPU to statically assign a vCPU. > > Let's use soft-affinity in the same way, of course with the > difference that, if there's no free pCPU within the vCPU's > soft-affinity, we go checking the hard-affinity, instead of > putting the vCPU in the waitqueue. > > This has no impact on the scheduling overhead, because > soft-affinity is only considered in cold paths (like when a > vCPU joins the scheduler for the first time, or is manually > moved between pCPUs by the user). > > Signed-off-by: Dario Faggioli Reviewed-by: George Dunlap ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
On 25/07/17 16:27, Andrii Anisov wrote: Hello Julien, Hi Andrii, On 25.07.17 17:23, Julien Grall wrote: I think this is by chance rather than by design. The first question to answer is why a Firmware would specify twice the same memory banks? Is it valid from the specification? Yep, that is the question. Regardless that, it looks like to me that the device-tree you give to the board should not contain the memory nodes. The device-tree is provided by the vendor in that form, and u-boot is theirs. It seems to me that they do not really care since the kernel tolerates duplication. I have seen work on this board for the past year and it is the first time I have seen a complaint about memory overlap. So why this sudden change? Is that a comment from the vendor or your guess? ps. Linux kernel does tolerate duplicated memory nodes by merging memory blocks. I.e. memblock_add_range() function is commented as following: /** * memblock_add_range - add new memblock region * @type: memblock type to add new region into * @base: base address of the new region * @size: size of the new region * @nid: nid of the new region * @flags: flags of the new region * * Add new memblock region [@base,@base+@size) into @type. The new region * is allowed to overlap with existing ones - overlaps don't affect already * existing regions. @type is guaranteed to be minimal (all neighbouring * compatible regions are merged) after the addition. * * RETURNS: * 0 on success, -errno on failure. */ IMO the function description is pretty straightforward. You need to differentiate between the device-tree spec itself and the Linux implementation. memblock is something common to all architectures. It does not mean it is something valid to do. But let us wait for device tree guys feedback. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 20/20] osstest: save/retrieve the last successfully tested FreeBSD build
On Tue, Jul 25, 2017 at 04:18:52PM +0100, Ian Jackson wrote: > Roger Pau Monne writes ("[PATCH v7 20/20] osstest: save/retrieve the last > successfully tested FreeBSD build"): > > And use it in order to install the hosts for the next FreeBSD flight. > ... > > +case "$branch" in > > +freebsd-*) > > +IFS=$'\n' > > That's quite brave, but I don't object. I would have piped the output > into `read' or something. Yes, I think that would be better, let me do that as a patch on top of this afterwards. I'm currently building the new images and I cannot play freely with osstest ATM. > > +for anointed in \> +`./mg-anoint list-prepared "freebsd build > > $freebsd_branch*"`; do > ^ > I think there is a missing space between $freebsd_branch and * ? > > > +IFS=$'\n' > > +for anointed in \ > > +`./mg-anoint list-prepared "freebsd build $freebsd_branch*"`; do > > +# Retrieve previous successful FreeBSD build for each arch. > > +freebsd_arch=${anointed##* } > > +freebsd_envvar="FREEBSD_${freebsd_arch^^}_BUILDJOB" > > +if [ "x${!freebsd_envvar}" = "x" ]; then > > +flight_job=`./mg-anoint retrieve "$anointed"` > > +export ${freebsd_envvar}=${flight_job/ /.} > > +fi > > +done > > +unset IFS > > LGTM apart from the same missing space. > > So if you add those spaces: > > Acked-by: Ian Jackson Thanks, I've now pushed this (also with the other acks) to: git://xenbits.xen.org/people/royger/osstest.git freebsd_v8 Roger. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Duplicated memory node in the Device-Tree (WAS [XEN] Re: Duplicated memory nodes cause the BUG())
Hello Julien, On 25.07.17 17:23, Julien Grall wrote: I think this is by chance rather than by design. The first question to answer is why a Firmware would specify twice the same memory banks? Is it valid from the specification? Yep, that is the question. Regardless that, it looks like to me that the device-tree you give to the board should not contain the memory nodes. The device-tree is provided by vendor in that form, and u-boot is theirs. It seems to me that they do not really care since the kernel tolerates duplication. ps. Linux kernel does tolerate duplicated memory nodes by merging memory blocks. I.e. memblock_add_range() function is commented as following: /** * memblock_add_range - add new memblock region * @type: memblock type to add new region into * @base: base address of the new region * @size: size of the new region * @nid: nid of the new region * @flags: flags of the new region * * Add new memblock region [@base,@base+@size) into @type. The new region * is allowed to overlap with existing ones - overlaps don't affect already * existing regions. @type is guaranteed to be minimal (all neighbouring * compatible regions are merged) after the addition. * * RETURNS: * 0 on success, -errno on failure. */ IMO the function description is pretty straight-forward. But let us wait for device tree guys feedback. -- *Andrii Anisov* ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/pagewalk: Remove opt_allow_superpage check from guest_can_use_l2_superpages()
At 16:00 +0100 on 25 Jul (1500998413), Andrew Cooper wrote: > The purpose of guest_walk_tables() is to match the behaviour of real hardware. > > A PV guest can have 2M superpages in its pagetables, via the M2P and the > initial initrd mapping, even if it isn't permitted to create arbitrary 2M > superpage mappings. Can the domain builder (or Xen?) really give a guest superpage mappings for its initrd? Wouldn't that cause problems for live migration? In any case this patch looks correct: the presence of superpages in PV pagetables is decided by the PV MM rules, so the walker should accept them. Reviewed-by: Tim Deegan> guest_can_use_l2_superpages() checking opt_allow_superpage is a piece of PV > guest policy enforcement, rather than its intended purpose of meaning "would > hardware tolerate finding an L2 superpage with these control settings?" > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Tim Deegan > CC: George Dunlap > CC: Wei Liu > --- > xen/include/asm-x86/guest_pt.h | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h > index 72126d5..08031c8 100644 > --- a/xen/include/asm-x86/guest_pt.h > +++ b/xen/include/asm-x86/guest_pt.h > @@ -205,15 +205,17 @@ static inline guest_l4e_t guest_l4e_from_gfn(gfn_t gfn, > u32 flags) > static inline bool guest_can_use_l2_superpages(const struct vcpu *v) > { > /* > + * PV guests use Xen's paging settings. Being 4-level, 2M > + * superpages are unconditionally supported. > + * > * The L2 _PAGE_PSE bit must be honoured in HVM guests, whenever > * CR4.PSE is set or the guest is in PAE or long mode. > * It's also used in the dummy PT for vcpus with CR0.PG cleared. > */ > -return (is_pv_vcpu(v) > -? 
opt_allow_superpage > -: (GUEST_PAGING_LEVELS != 2 > - || !hvm_paging_enabled(v) > - || (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE))); > +return (is_pv_vcpu(v) || > +GUEST_PAGING_LEVELS != 2 || > +!hvm_paging_enabled(v) || > +(v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_PSE)); > } > > static inline bool guest_can_use_l3_superpages(const struct domain *d) > -- > 2.1.4 > ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/5] xen: sched_null: check for pending tasklet work a bit earlier
On Thu, Jun 29, 2017 at 1:56 PM, Dario Faggioli wrote: > Whether or not there's pending tasklet work to do, it's > something we know from the tasklet_work_scheduled parameter. > > Deal with that as soon as possible, like all other schedulers > do. > > Signed-off-by: Dario Faggioli Reviewed-by: George Dunlap ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7 20/20] osstest: save/retrieve the last successfully tested FreeBSD build
Roger Pau Monne writes ("[PATCH v7 20/20] osstest: save/retrieve the last successfully tested FreeBSD build"): > And use it in order to install the hosts for the next FreeBSD flight. ... > +case "$branch" in > +freebsd-*) > +IFS=$'\n' That's quite brave, but I don't object. I would have piped the output into `read' or something. > +for anointed in \> +`./mg-anoint list-prepared "freebsd build > $freebsd_branch*"`; do ^ I think there is a missing space between $freebsd_branch and * ? > +IFS=$'\n' > +for anointed in \ > +`./mg-anoint list-prepared "freebsd build $freebsd_branch*"`; do > +# Retrieve previous successful FreeBSD build for each arch. > +freebsd_arch=${anointed##* } > +freebsd_envvar="FREEBSD_${freebsd_arch^^}_BUILDJOB" > +if [ "x${!freebsd_envvar}" = "x" ]; then > +flight_job=`./mg-anoint retrieve "$anointed"` > +export ${freebsd_envvar}=${flight_job/ /.} > +fi > +done > +unset IFS LGTM apart from the same missing space. So if you add those spaces: Acked-by: Ian Jackson Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
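The `read`-based alternative Ian mentions could look something like the sketch below. The `mg-anoint` invocation is stood in for by a plain `printf`, since this is only an illustration of the loop shape, not the real osstest code:

```shell
# Stand-in for `./mg-anoint list-prepared "freebsd build $freebsd_branch *"`.
list_prepared() {
    printf '%s\n' 'freebsd build freebsd-master amd64' \
                  'freebsd build freebsd-master i386'
}

# Process one anointed record per line without touching the global IFS.
list_prepared | while read -r anointed; do
    freebsd_arch=${anointed##* }    # last word of the record: the arch
    echo "arch: $freebsd_arch"
done
```

One caveat: a piped `while read` body runs in a subshell, so the `export`s in the original loop would not survive it; `done < <(...)` process substitution avoids that, which may be why the original patch used a `for` loop with a modified IFS instead.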