Re: [Xen-devel] [PATCH] [tools/hotplug] Use ip on systems where brctl is not available

2019-12-18 Thread Steven Haigh

On 2019-12-19 02:42, Ian Jackson wrote:

Steven Haigh writes ("[PATCH] [tools/hotplug] Use ip on systems where
brctl is not available"):

Newer distros like CentOS 8 do not have brctl available. As such, we
can't use it to configure networking anymore.

This patch will fall back to 'ip' or 'bridge' commands if brctl is not
available in the working PATH.


This looks good to me at least in the brctl case.  I have two minor
comments.

For the avoidance of doubt, I guess you have tested this in the
`ip'/`bridge' case ?  How thoroughly ? :-)


I have tested it to the point that it's almost a port of the Fedora 
patch - however the Fedora patch removes brctl completely in favour of 
the ip / bridge commands. While I haven't specifically debugged the 
result on Fedora, the networking works successfully when running a 
Domain-0 in Fedora 31 - which was the source of the 'ip' commands to 
run.
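
As a rough aid for exercising the `ip`/`bridge` path, here is a minimal sketch that lists bridges from sysfs instead of parsing the output of either tool. It relies on the same /sys/class/net/<name>/bridge test that create_bridge in xen-network-common.sh already uses. The function name list_bridges and the SYSFS_NET override are made up for illustration and are not part of the patch.

```shell
#!/bin/sh
# Illustrative only: list_bridges and SYSFS_NET are hypothetical names,
# not something the posted patch defines.
list_bridges () {
    # SYSFS_NET lets a test point the function at a fake sysfs tree.
    for entry in "${SYSFS_NET:-/sys/class/net}"/*/bridge; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$entry" ] || continue
        # Print the interface name, i.e. the directory holding "bridge".
        basename "$(dirname "$entry")"
    done
}
```

Running list_bridges on a Domain-0 and comparing the result with what the brctl and bridge code paths pick would show whether both paths agree on the bridge name.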





-if [ -z "$bridge" ]
-then
-  bridge=$(brctl show | awk 'NR==2{print$1}')
-
+if [ -z "$bridge" ]; then


The presumably-unintentional style change makes the review slightly
harder...


I'm intending to submit a new patch series after this (to make 
backporting this easier) that cleans up formatting / whitespace / syntax 
across the majority of scripts in the Linux directory. It'll look like a 
hot mess when submitting the next lot of patches - but it's better than 
nothing.



-bridge=$(brctl show | cut -d "
+if which brctl >&/dev/null; then


Maybe introduce
   have_brctl () { ... }
so we can say
   if have_brctl; then
?


I don't really have a preference. brctl is used through quite a few 
scripts - none of which really have a standard method of operation or 
common presentation. Some scripts call xen-network-common.sh - some do 
not.


Would I be correct in thinking that your proposal would be to ensure all 
network scripts source xen-network-common.sh? That would be a more 
invasive change for backporting, hence I've tried to keep it as simple 
as possible for now.


Would a restructure of these things be better for something to be 
committed as yet another patch set (after formatting/style cleanups) 
that makes things a little more consistent?
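
For reference, the have_brctl idea could look roughly like the sketch below. This is an assumption about what such a helper might contain, not code from any posted patch; detach_from_bridge is a hypothetical wrapper added only to show a call site, using the same brctl and ip commands as the vif-bridge hunk.

```shell
#!/bin/sh
# Hypothetical helper along the lines suggested in review; not in the patch.
have_brctl () {
    # command -v is the POSIX way to ask whether a command can be found.
    command -v brctl >/dev/null 2>&1
}

# Hypothetical call site: remove a device from its bridge.
detach_from_bridge () {
    local bridge=$1
    local dev=$2

    if have_brctl; then
        brctl delif "$bridge" "$dev"
    else
        # ip needs only the device; nomaster drops it from whatever bridge it is on.
        ip link set "$dev" nomaster
    fi
}
```

A helper like this would have to live somewhere every script sources, which is the xen-network-common.sh question raised above.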


--
Steven Haigh

 net...@crc.id.au  https://www.crc.id.au

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] [tools/hotplug] Use ip on systems where brctl is not available

2019-12-17 Thread Steven Haigh
Newer distros like CentOS 8 do not have brctl available. As such, we
can't use it to configure networking anymore.

This patch will fall back to 'ip' or 'bridge' commands if brctl is not
available in the working PATH.

This would be a likely backport candidate to any version expected to be
built on CentOS 8 etc.

---
 tools/hotplug/Linux/colo-proxy-setup      | 30 +--
 tools/hotplug/Linux/vif-bridge            | 16 
 tools/hotplug/Linux/vif2                  | 12 +++--
 tools/hotplug/Linux/xen-network-common.sh | 16 +---
 4 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
index 94e2034452..d709146c47 100755
--- a/tools/hotplug/Linux/colo-proxy-setup
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -76,10 +76,17 @@ function teardown_primary()
 
 function setup_secondary()
 {
-do_without_error brctl delif $bridge $vifname
-do_without_error brctl addbr $forwardbr
-do_without_error brctl addif $forwardbr $vifname
-do_without_error brctl addif $forwardbr $forwarddev
+if which brctl >&/dev/null; then
+do_without_error brctl delif $bridge $vifname
+do_without_error brctl addbr $forwardbr
+do_without_error brctl addif $forwardbr $vifname
+do_without_error brctl addif $forwardbr $forwarddev
+else
+do_without_error ip link set $vifname nomaster
+do_without_error ip link add name $forwardbr type bridge
+do_without_error ip link set $vifname master $forwardbr
+do_without_error ip link set $forwarddev master $forwardbr
+fi
 do_without_error ip link set dev $forwardbr up
 do_without_error modprobe xt_SECCOLO
 
@@ -91,10 +98,17 @@ function setup_secondary()
 
 function teardown_secondary()
 {
-do_without_error brctl delif $forwardbr $forwarddev
-do_without_error brctl delif $forwardbr $vifname
-do_without_error brctl delbr $forwardbr
-do_without_error brctl addif $bridge $vifname
+if which brctl >&/dev/null; then
+do_without_error brctl delif $forwardbr $forwarddev
+do_without_error brctl delif $forwardbr $vifname
+do_without_error brctl delbr $forwardbr
+do_without_error brctl addif $bridge $vifname
+else
+do_without_error ip link set $forwarddev nomaster
+do_without_error ip link set $vifname nomaster
+do_without_error ip link delete $forwardbr type bridge
+do_without_error ip link set $vifname master $bridge
+fi
 
 do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
 $vifname -j SECCOLO --index $index
diff --git a/tools/hotplug/Linux/vif-bridge b/tools/hotplug/Linux/vif-bridge
index 6956dea66a..e722090ca8 100644
--- a/tools/hotplug/Linux/vif-bridge
+++ b/tools/hotplug/Linux/vif-bridge
@@ -31,10 +31,12 @@ dir=$(dirname "$0")
 bridge=${bridge:-}
 bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
 
-if [ -z "$bridge" ]
-then
-  bridge=$(brctl show | awk 'NR==2{print$1}')
-
+if [ -z "$bridge" ]; then
+if which brctl >&/dev/null; then
+bridge=$(brctl show | awk 'NR==2{print$1}')
+else
+bridge=$(bridge link | cut -d" " -f7)
+fi
   if [ -z "$bridge" ]
   then
  fatal "Could not find bridge, and none was specified"
@@ -82,7 +84,11 @@ case "$command" in
 ;;
 
 offline)
-do_without_error brctl delif "$bridge" "$dev"
+if which brctl >&/dev/null; then
+do_without_error brctl delif "$bridge" "$dev"
+else
+do_without_error ip link set "$dev" nomaster
+fi
 do_without_error ifconfig "$dev" down
 ;;
 
diff --git a/tools/hotplug/Linux/vif2 b/tools/hotplug/Linux/vif2
index 2c155be68c..5bd555c6f0 100644
--- a/tools/hotplug/Linux/vif2
+++ b/tools/hotplug/Linux/vif2
@@ -7,13 +7,21 @@ dir=$(dirname "$0")
 bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
 if [ -z "$bridge" ]
 then
-nr_bridges=$(($(brctl show | cut -f 1 | grep -v "^$" | wc -l) - 1))
+if which brctl >&/dev/null; then
+nr_bridges=$(($(brctl show | cut -f 1 | grep -v "^$" | wc -l) - 1))
+else
+nr_bridges=$(bridge link | wc -l)
+fi
 if [ "$nr_bridges" != 1 ]
then
fatal "no bridge specified, and don't know which one to use ($nr_bridges found)"
 fi
-bridge=$(brctl show | cut -d "
+if which brctl >&/dev/null; then
+bridge=$(brctl show | cut -d "
 " -f 2 | cut -f 1)
+else
+bridge=$(bridge link | cut -d" " -f6)
+fi
 fi
 
 command="$1"
diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index 92ffa603f7..8dd3a62068 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -111,9 +111,13 @@ create_bridge () {
 
 # Don't create the bridge if it already exists.
 if [ ! -e 

Re: [Xen-devel] [PATCH 1/2] Tidy up whitespace and formatting in file to be consistent.

2019-12-17 Thread Steven Haigh
Ok, if it's going to be 4 spaces for each file, I can batch convert & 
tidy stuff up...


The file I changed had both types, so I went with my own preference :)

If it's a better approach, I'll sort out the majority of scripts 
in that directory, make no functional changes, and post a series that 
does nothing but cleanup - then do the brctl / ip changes on top of 
that in a different patch.


I might as well do them all - and it makes sense to do nothing but 
cleanup, then functional changes based on the cleaned up code.
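
A batch conversion along those lines might start from something like the sketch below. It only handles tabs, using the standard expand utility with 4-column stops; the existing 2-space indentation would still need re-indenting by hand or by another tool. retab_file is a made-up name, and any tab inside a string literal would be expanded too, so the resulting diff needs a read-through.

```shell
#!/bin/sh
# Hypothetical helper: replace tabs with spaces (4-column tab stops) in place.
retab_file () {
    tmp=$(mktemp) || return 1
    # Write back through the original path so the file mode (e.g. 100755) is kept.
    expand -t 4 "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}
```

It could be applied per file, for example `retab_file tools/hotplug/Linux/vif2`, with each result checked using `git diff -w` to confirm that nothing but whitespace changed.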

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au


On Tue, Dec 17, 2019 at 14:13, Wei Liu wrote:

On Fri, Dec 13, 2019 at 03:08:34PM +1100, Steven Haigh wrote:

 Signed-off-by: Steven Haigh 


Acked-by: Wei Liu 

I will need to add tools/hotplug to the subject line and the following
commit message:

   Use 4 spaces for indentation throughout the file. No functional
   change.

Wei.


[Xen-devel] [PATCH 1/2] Tidy up whitespace and formatting in file to be consistent.

2019-12-12 Thread Steven Haigh
Signed-off-by: Steven Haigh 
---
 tools/hotplug/Linux/xen-network-common.sh | 144 +++---
 1 file changed, 70 insertions(+), 74 deletions(-)

diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index 92ffa603f7..ab76827a64 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -26,118 +26,114 @@
 #   that the virtual device will take once the physical device has
 #   been renamed.
 
-if ! which ifup >/dev/null 2>/dev/null
-then
-  preiftransfer()
-  {
-true
-  }
-  ifup()
-  {
-false
-  }
-  ifdown()
-  {
-false
-  }
+if ! which ifup >/dev/null 2>/dev/null; then
+   preiftransfer()
+   {
+   true
+   }
+   ifup()
+   {
+   false
+   }
+   ifdown()
+   {
+   false
+   }
 else
-  preiftransfer()
-  {
-true
-  }
+   preiftransfer()
+   {
+   true
+   }
 fi
 
 
 first_file()
 {
-  t="$1"
-  shift
-  for file in $@
-  do
-if [ "$t" "$file" ]
-then
-  echo "$file"
-  return
-fi
-  done
+   t="$1"
+   shift
+   for file in $@; do
+   if [ "$t" "$file" ]; then
+   echo "$file"
+   return
+   fi
+   done
 }
 
 find_dhcpd_conf_file()
 {
-  first_file -f /etc/dhcp3/dhcpd.conf /etc/dhcpd.conf
+   first_file -f /etc/dhcp3/dhcpd.conf /etc/dhcpd.conf
 }
 
 
 find_dhcpd_init_file()
 {
-  first_file -x /etc/init.d/{dhcp3-server,dhcp,dhcpd}
+   first_file -x /etc/init.d/{dhcp3-server,dhcp,dhcpd}
 }
 
 find_dhcpd_arg_file()
 {
-  first_file -f /etc/sysconfig/dhcpd /etc/defaults/dhcp /etc/default/dhcp3-server
+   first_file -f /etc/sysconfig/dhcpd /etc/defaults/dhcp /etc/default/dhcp3-server
 }
 
 # configure interfaces which act as pure bridge ports:
 _setup_bridge_port() {
-local dev="$1"
-local virtual="$2"
-
-# take interface down ...
-ip link set dev ${dev} down
-
-if [ $virtual -ne 0 ] ; then
-# Initialise a dummy MAC address. We choose the numerically
-# largest non-broadcast address to prevent the address getting
-# stolen by an Ethernet bridge for STP purposes.
-# (FE:FF:FF:FF:FF:FF)
-ip link set dev ${dev} address fe:ff:ff:ff:ff:ff || true
-fi
-
-# ... and configure it
-ip address flush dev ${dev}
+   local dev="$1"
+   local virtual="$2"
+
+   # take interface down ...
+   ip link set dev ${dev} down
+
+   if [ $virtual -ne 0 ]; then
+   # Initialise a dummy MAC address. We choose the numerically
+   # largest non-broadcast address to prevent the address getting
+   # stolen by an Ethernet bridge for STP purposes.
+   # (FE:FF:FF:FF:FF:FF)
+   ip link set dev ${dev} address fe:ff:ff:ff:ff:ff || true
+   fi
+
+   # ... and configure it
+   ip address flush dev ${dev}
 }
 
 setup_physical_bridge_port() {
-_setup_bridge_port $1 0
+   _setup_bridge_port $1 0
 }
 setup_virtual_bridge_port() {
-_setup_bridge_port $1 1
+   _setup_bridge_port $1 1
 }
 
 # Usage: create_bridge bridge
 create_bridge () {
-local bridge=$1
-
-# Don't create the bridge if it already exists.
-if [ ! -e "/sys/class/net/${bridge}/bridge" ]; then
-   brctl addbr ${bridge}
-   brctl stp ${bridge} off
-   brctl setfd ${bridge} 0
-fi
+   local bridge=$1
+
+   # Don't create the bridge if it already exists.
+   if [ ! -e "/sys/class/net/${bridge}/bridge" ]; then
+   brctl addbr ${bridge}
+   brctl stp ${bridge} off
+   brctl setfd ${bridge} 0
+   fi
 }
 
 # Usage: add_to_bridge bridge dev
 add_to_bridge () {
-local bridge=$1
-local dev=$2
-
-# Don't add $dev to $bridge if it's already on a bridge.
-if [ -e "/sys/class/net/${bridge}/brif/${dev}" ]; then
-   ip link set dev ${dev} up || true
-   return
-fi
-brctl addif ${bridge} ${dev}
-ip link set dev ${dev} up
+   local bridge=$1
+   local dev=$2
+
+   # Don't add $dev to $bridge if it's already on a bridge.
+   if [ -e "/sys/class/net/${bridge}/brif/${dev}" ]; then
+   ip link set dev ${dev} up || true
+   return
+   fi
+   brctl addif ${bridge} ${dev}
+   ip link set dev ${dev} up
 }
 
 # Usage: set_mtu bridge dev
 set_mtu () {
-local bridge=$1
-local dev=$2
-mtu="`ip link show dev ${bridge}| awk '/mtu/ { print $5 }'`"
-if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
-then
-ip link set dev ${dev} mtu $mtu || :
-fi
+   local bridge=$1
+   

[Xen-devel] [PATCH 2/2] Use ip for bridge related functions where brctl is not present

2019-12-12 Thread Steven Haigh
Signed-off-by: Steven Haigh 
---
 tools/hotplug/Linux/colo-proxy-setup      | 30 +--
 tools/hotplug/Linux/vif-bridge            | 19 --
 tools/hotplug/Linux/vif2                  | 12 +++--
 tools/hotplug/Linux/xen-network-common.sh | 15 +---
 4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
index 94e2034452..cbd5b773c6 100755
--- a/tools/hotplug/Linux/colo-proxy-setup
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -76,10 +76,17 @@ function teardown_primary()
 
 function setup_secondary()
 {
-do_without_error brctl delif $bridge $vifname
-do_without_error brctl addbr $forwardbr
-do_without_error brctl addif $forwardbr $vifname
-do_without_error brctl addif $forwardbr $forwarddev
+if [ -x "/usr/sbin/brctl" ]; then
+do_without_error brctl delif $bridge $vifname
+do_without_error brctl addbr $forwardbr
+do_without_error brctl addif $forwardbr $vifname
+do_without_error brctl addif $forwardbr $forwarddev
+else
+do_without_error ip link set $vifname nomaster
+do_without_error ip link add name $forwardbr type bridge
+do_without_error ip link set $vifname master $forwardbr
+do_without_error ip link set $forwarddev master $forwardbr
+fi
 do_without_error ip link set dev $forwardbr up
 do_without_error modprobe xt_SECCOLO
 
@@ -91,10 +98,17 @@ function setup_secondary()
 
 function teardown_secondary()
 {
-do_without_error brctl delif $forwardbr $forwarddev
-do_without_error brctl delif $forwardbr $vifname
-do_without_error brctl delbr $forwardbr
-do_without_error brctl addif $bridge $vifname
+if [ -x "/usr/sbin/brctl" ]; then
+do_without_error brctl delif $forwardbr $forwarddev
+do_without_error brctl delif $forwardbr $vifname
+do_without_error brctl delbr $forwardbr
+do_without_error brctl addif $bridge $vifname
+else
+do_without_error ip link set $forwarddev nomaster
+do_without_error ip link set $vifname nomaster
+do_without_error ip link delete $forwardbr type bridge
+do_without_error ip link set $vifname master $bridge
+fi
 
 do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
 $vifname -j SECCOLO --index $index
diff --git a/tools/hotplug/Linux/vif-bridge b/tools/hotplug/Linux/vif-bridge
index 6956dea66a..e035411934 100644
--- a/tools/hotplug/Linux/vif-bridge
+++ b/tools/hotplug/Linux/vif-bridge
@@ -31,12 +31,13 @@ dir=$(dirname "$0")
 bridge=${bridge:-}
 bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
 
-if [ -z "$bridge" ]
-then
-  bridge=$(brctl show | awk 'NR==2{print$1}')
-
-  if [ -z "$bridge" ]
-  then
+if [ -z "$bridge" ]; then
+if [ -x "/usr/sbin/brctl" ]; then
+bridge=$(brctl show | awk 'NR==2{print$1}')
+else
+bridge=$(bridge link | cut -d" " -f7)
+fi
+  if [ -z "$bridge" ]; then
  fatal "Could not find bridge, and none was specified"
   fi
 else
@@ -82,7 +83,11 @@ case "$command" in
 ;;
 
 offline)
-do_without_error brctl delif "$bridge" "$dev"
+if [ -x "/usr/sbin/brctl" ]; then
+do_without_error brctl delif "$bridge" "$dev"
+else
+do_without_error ip link set "$dev" nomaster
+fi
 do_without_error ifconfig "$dev" down
 ;;
 
diff --git a/tools/hotplug/Linux/vif2 b/tools/hotplug/Linux/vif2
index 2c155be68c..e36070cbbb 100644
--- a/tools/hotplug/Linux/vif2
+++ b/tools/hotplug/Linux/vif2
@@ -7,13 +7,21 @@ dir=$(dirname "$0")
 bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
 if [ -z "$bridge" ]
 then
-nr_bridges=$(($(brctl show | cut -f 1 | grep -v "^$" | wc -l) - 1))
+if [ -x "/usr/sbin/brctl" ]; then
+nr_bridges=$(($(brctl show | cut -f 1 | grep -v "^$" | wc -l) - 1))
+else
+nr_bridges=$(bridge link | wc -l)
+fi
 if [ "$nr_bridges" != 1 ]
then
fatal "no bridge specified, and don't know which one to use ($nr_bridges found)"
 fi
-bridge=$(brctl show | cut -d "
+if [ -x "/usr/sbin/brctl" ]; then
+bridge=$(brctl show | cut -d "
 " -f 2 | cut -f 1)
+else
+bridge=$(bridge link | cut -d" " -f6)
+fi
 fi
 
 command="$1"
diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index ab76827a64..7833deac6c 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -108,9 +108,12 @@ create_bridge 

[Xen-devel] [PATCH v2 0/2] [PATCH-for-4.13] Work towards removing brctl

2019-12-12 Thread Steven Haigh
Start updating scripts for network functionality

(Resending as the patch emails seem to have been eaten somewhere)

The networking scripts in Xen are inconsistently formatted, mixing
tab and space indentation.

CentOS 8 also no longer ships brctl, which has been replaced by the
ip / bridge commands.

This series starts cleaning up whitespace and formatting, and starts
adding conditionals that use brctl if present and fall back to ip if
/usr/sbin/brctl is not installed.
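
To illustrate the kind of conditional meant here, a minimal sketch of create_bridge with an ip fallback follows. The create_bridge hunk of the patch is cut off in this archive, so this is an assumption about one way to write it rather than the posted change. The function name create_bridge_sketch is made up, and stp_state and forward_delay are the iproute2 bridge options corresponding to `brctl stp` and `brctl setfd`.

```shell
#!/bin/sh
# Sketch only; not the hunk from the posted patch.
create_bridge_sketch () {
    local bridge=$1

    # Don't create the bridge if it already exists.
    if [ -e "/sys/class/net/${bridge}/bridge" ]; then
        return 0
    fi

    if command -v brctl >/dev/null 2>&1; then
        brctl addbr "$bridge"
        brctl stp "$bridge" off
        brctl setfd "$bridge" 0
    else
        # iproute2 equivalents: create the bridge, then disable STP
        # and set the forward delay to zero.
        ip link add name "$bridge" type bridge
        ip link set dev "$bridge" type bridge stp_state 0 forward_delay 0
    fi
}
```

Unlike the series, which tests for /usr/sbin/brctl, this sketch probes with `command -v`, so it does not depend on where a distro installs brctl.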

Changes since v1
  * Fixed references to /usr/bin/brctl, which should have been /usr/sbin/brctl

Steven Haigh (2):
  Tidy up whitespace and formatting in file to be consistent.
  Use ip for bridge related functions where brctl is not present

 tools/hotplug/Linux/colo-proxy-setup      |  30 +++--
 tools/hotplug/Linux/vif-bridge            |  19 ++-
 tools/hotplug/Linux/vif2                  |  12 +-
 tools/hotplug/Linux/xen-network-common.sh | 151 +++---
 4 files changed, 121 insertions(+), 91 deletions(-)

-- 
2.24.1


[Xen-devel] [PATCH 0/2] [PATCH-for-4.13] Work towards removing brctl

2019-12-12 Thread Steven Haigh
Start updating scripts for network functionality

The networking scripts in Xen are inconsistently formatted, mixing
tab and space indentation.

CentOS 8 also no longer ships brctl, which has been replaced by the
ip / bridge commands.

This series starts cleaning up whitespace and formatting, and starts
adding conditionals that use brctl if present and fall back to ip if
/usr/sbin/brctl is not installed.

Steven Haigh (2):
  Tidy up whitespace and formatting in file to be consistent.
  Use ip for bridge related functions where brctl is not present

 tools/hotplug/Linux/colo-proxy-setup      |  30 +++--
 tools/hotplug/Linux/vif-bridge            |  19 ++-
 tools/hotplug/Linux/vif2                  |  12 +-
 tools/hotplug/Linux/xen-network-common.sh | 151 +++---
 4 files changed, 121 insertions(+), 91 deletions(-)

-- 
2.24.1



[Xen-devel] [PATCH 2/2] Use ip for bridge related functions where brctl is not present

2019-12-12 Thread Steven Haigh
Signed-off-by: Steven Haigh 
---
 tools/hotplug/Linux/colo-proxy-setup      | 30 +--
 tools/hotplug/Linux/vif-bridge            | 19 --
 tools/hotplug/Linux/vif2                  | 12 +++--
 tools/hotplug/Linux/xen-network-common.sh | 15 +---
 4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
index 94e2034452..690021d10a 100755
--- a/tools/hotplug/Linux/colo-proxy-setup
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -76,10 +76,17 @@ function teardown_primary()
 
 function setup_secondary()
 {
-do_without_error brctl delif $bridge $vifname
-do_without_error brctl addbr $forwardbr
-do_without_error brctl addif $forwardbr $vifname
-do_without_error brctl addif $forwardbr $forwarddev
+if [ -x "/usr/bin/brctl" ]; then
+do_without_error brctl delif $bridge $vifname
+do_without_error brctl addbr $forwardbr
+do_without_error brctl addif $forwardbr $vifname
+do_without_error brctl addif $forwardbr $forwarddev
+else
+do_without_error ip link set $vifname nomaster
+do_without_error ip link add name $forwardbr type bridge
+do_without_error ip link set $vifname master $forwardbr
+do_without_error ip link set $forwarddev master $forwardbr
+fi
 do_without_error ip link set dev $forwardbr up
 do_without_error modprobe xt_SECCOLO
 
@@ -91,10 +98,17 @@ function setup_secondary()
 
 function teardown_secondary()
 {
-do_without_error brctl delif $forwardbr $forwarddev
-do_without_error brctl delif $forwardbr $vifname
-do_without_error brctl delbr $forwardbr
-do_without_error brctl addif $bridge $vifname
+if [ -x "/usr/bin/brctl" ]; then
+do_without_error brctl delif $forwardbr $forwarddev
+do_without_error brctl delif $forwardbr $vifname
+do_without_error brctl delbr $forwardbr
+do_without_error brctl addif $bridge $vifname
+else
+do_without_error ip link set $forwarddev nomaster
+do_without_error ip link set $vifname nomaster
+do_without_error ip link delete $forwardbr type bridge
+do_without_error ip link set $vifname master $bridge
+fi
 
 do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
 $vifname -j SECCOLO --index $index
diff --git a/tools/hotplug/Linux/vif-bridge b/tools/hotplug/Linux/vif-bridge
index 6956dea66a..e035411934 100644
--- a/tools/hotplug/Linux/vif-bridge
+++ b/tools/hotplug/Linux/vif-bridge
@@ -31,12 +31,13 @@ dir=$(dirname "$0")
 bridge=${bridge:-}
 bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
 
-if [ -z "$bridge" ]
-then
-  bridge=$(brctl show | awk 'NR==2{print$1}')
-
-  if [ -z "$bridge" ]
-  then
+if [ -z "$bridge" ]; then
+if [ -x "/usr/sbin/brctl" ]; then
+bridge=$(brctl show | awk 'NR==2{print$1}')
+else
+bridge=$(bridge link | cut -d" " -f7)
+fi
+  if [ -z "$bridge" ]; then
  fatal "Could not find bridge, and none was specified"
   fi
 else
@@ -82,7 +83,11 @@ case "$command" in
 ;;
 
 offline)
-do_without_error brctl delif "$bridge" "$dev"
+if [ -x "/usr/sbin/brctl" ]; then
+do_without_error brctl delif "$bridge" "$dev"
+else
+do_without_error ip link set "$dev" nomaster
+fi
 do_without_error ifconfig "$dev" down
 ;;
 
diff --git a/tools/hotplug/Linux/vif2 b/tools/hotplug/Linux/vif2
index 2c155be68c..e36070cbbb 100644
--- a/tools/hotplug/Linux/vif2
+++ b/tools/hotplug/Linux/vif2
@@ -7,13 +7,21 @@ dir=$(dirname "$0")
 bridge=$(xenstore_read_default "$XENBUS_PATH/bridge" "$bridge")
 if [ -z "$bridge" ]
 then
-nr_bridges=$(($(brctl show | cut -f 1 | grep -v "^$" | wc -l) - 1))
+if [ -x "/usr/sbin/brctl" ]; then
+nr_bridges=$(($(brctl show | cut -f 1 | grep -v "^$" | wc -l) - 1))
+else
+nr_bridges=$(bridge link | wc -l)
+fi
 if [ "$nr_bridges" != 1 ]
then
fatal "no bridge specified, and don't know which one to use ($nr_bridges found)"
 fi
-bridge=$(brctl show | cut -d "
+if [ -x "/usr/sbin/brctl" ]; then
+bridge=$(brctl show | cut -d "
 " -f 2 | cut -f 1)
+else
+bridge=$(bridge link | cut -d" " -f6)
+fi
 fi
 
 command="$1"
diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index ab76827a64..7833deac6c 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -108,9 +108,12 @@ create_bridge 

Re: [Xen-devel] Status of 4.13

2019-11-20 Thread Steven Haigh

On 2019-11-21 17:05, Jürgen Groß wrote:

Where do we stand with Xen 4.13 regarding blockers and related patches?

2. Ryzen/Rome failures with Windows guests:
   What is the currently planned way to address the problem? Who is
   working on that?


A workaround was found by specifying cpuid values in the Windows VM 
config file.


The workaround line is:
cpuid = [ "0x80000008:ecx=xxxxxxxxxxxxxxxx0100xxxxxxxxxxxx" ]

It was suggested that this be documented, but that no immediate action 
be taken, with a view to correcting this properly in 4.14.


I'm not sure of the status of any patches / additions to documentation - 
perhaps the wiki is the place for this? I'll leave that for someone else 
to comment on.


--
Steven Haigh

 net...@crc.id.au  http://www.crc.id.au
 +61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] Ryzen 3xxx works with Windows

2019-11-15 Thread Steven Haigh
I can add weight to these findings. I tested with Xen 4.12.1 and the 
suggested cpuid line, and it looks like my Windows VM has come up with 
4 vCPUs.


I can't RDP in to make sure it's 100% booted, but it certainly isn't 
doing the crash dump cycle - and CPU usage is consistent with being 
successfully booted.

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Fri, Nov 15, 2019 at 18:06, Andreas Kinzler wrote:

Hello All,

I compared the CPUID listings from Ryzen 2700X (attached as tar.xz) 
to 3700X and found only very few differences. I added


cpuid = [ "0x80000008:ecx=xxxxxxxxxxxxxxxx0100xxxxxxxxxxxx" ]

to xl.cfg and then Windows runs great with 16 vCPUs. Cinebench R15 
score is >2050 which is more or less the bare metal value.


Regards Andreas





Re: [Xen-devel] [PATCH RFC] x86: Add hack to disable "Fake HT" mode

2019-11-15 Thread Steven Haigh
Just regarding the use of a system environment variable to turn this 
feature / bugfix / hack on and off - this would probably break starting 
the VM via the xendomains script.


If the VM definition is in /etc/xen/auto/, then there would be nothing 
to set the environment variable before the VM is launched - hence it 
would not be applied and a guest crash would occur...


Depending on the VM's settings, this would either continue to start & 
crash - or just stop again until it could be started with the ENV 
variable.
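
The concern can be reproduced in miniature with the sketch below. XEN_DISABLE_FAKE_HT is a placeholder name chosen for illustration (the variable the RFC actually checks is not shown in this excerpt), and `env -i` merely stands in for a boot-time service whose environment the administrator never touched.

```shell
#!/bin/sh
# Placeholder variable name; the real one is whatever the RFC's getenv() reads.
check='if [ -n "${XEN_DISABLE_FAKE_HT:-}" ]; then echo "workaround on"; else echo "workaround off"; fi'

# Guest started by hand from a shell where the admin set the variable.
from_admin_shell=$(XEN_DISABLE_FAKE_HT=1 sh -c "$check")

# Guest started at boot: the process tree never saw the admin's setting.
from_boot_script=$(env -i PATH="$PATH" sh -c "$check")

echo "manual start: $from_admin_shell"
echo "boot-time start: $from_boot_script"
```

The manual start reports the workaround as on and the boot-time start reports it as off, which is the xendomains case described above.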

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Fri, Nov 15, 2019 at 10:57, George Dunlap wrote:

Changeset ca2eee92df44 ("x86, hvm: Expose host core/HT topology to HVM
guests") attempted to "fake up" a topology which would induce guest
operating systems to not treat vcpus as sibling hyperthreads.  This
involved (among other things) actually reporting hyperthreading as
available, but giving vcpus every other APICID.  The resulting cpu
featureset is invalid, but most operating systems on most hardware
managed to cope with it.
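
As a small worked example of "every other APICID", the sketch below applies the APIC_ID = vcpu_id * 2 rule quoted in the comment that this patch removes. The function name fake_ht_apic_id is invented for illustration.

```shell
#!/bin/sh
# Illustration of the "Fake HT" numbering: APIC_ID = vcpu_id * 2.
fake_ht_apic_id () {
    echo $(( $1 * 2 ))
}

for vcpu_id in 0 1 2 3; do
    echo "vcpu $vcpu_id -> APIC ID $(fake_ht_apic_id "$vcpu_id")"
done
```

Only even APIC IDs are handed out, so the odd IDs, which a guest would read as sibling hyperthreads, are never populated.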

Unfortunately, Windows running on modern AMD hardware -- including
Ryzen 3xxx series processors, and reportedly EPYC "Rome" cpus -- gets
confused by the resulting contradictory feature bits and crashes
during installation.  (Linux guests have so far continued to cope.)

A "proper" fix is complicated and it's too late to fix it either for
4.13, or to backport to supported branches.  As a short-term fix,
implement an option to disable this "Fake HT" mode.  The resulting
topology reported will not be canonical, but experimentally continues
to work with Windows guests.

However, disabling this "Fake HT" mode has not been widely tested, and
will almost certainly break migration if applied inconsistently.

To minimize impact while allowing administrators to disable "Fake HT"
only on guests which are known not to work without it (i.e., Windows
guests) on affected hardware, add an environment variable which can be
set to disable the "Fake HT" mode on such hardware.

Reported-by: Steven Haigh 
Reported-by: Andreas Kinzler 
Signed-off-by: George Dunlap 
---
This has been compile-tested only; I'm posting it early to get
feedback on the approach.

TODO: Prevent such guests from being migrated

Open questions:

- Is this the right place to put the `getenv` check?

- Is there any way we can make migration work, at least in some cases?

- Can we check for known-problematic models, and at least report a
  more useful error?

CC: Andrew Cooper 
CC: Jan Beulich 
CC: Ian Jackson 
CC: Anthony Perard 
---
 tools/libxc/xc_cpuid_x86.c | 74 +++---
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 312c481f1e..70c85e1467 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -579,52 +579,68 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid,
 }
 else
 {
-/*
- * Topology for HVM guests is entirely controlled by Xen.  For now, we
- * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
- */
-p->basic.htt = true;
+p->basic.htt = false;
 p->extd.cmp_legacy = false;

-/*
- * Leaf 1 EBX[23:16] is Maximum Logical Processors Per Package.
- * Update to reflect vLAPIC_ID = vCPU_ID * 2, but make sure to avoid
- * overflow.
- */
-if ( !(p->basic.lppp & 0x80) )
-p->basic.lppp *= 2;
-
 switch ( p->x86_vendor )
 {
 case X86_VENDOR_INTEL:
 for ( i = 0; (p->cache.subleaf[i].type &&
   i < ARRAY_SIZE(p->cache.raw)); ++i )
 {
-p->cache.subleaf[i].cores_per_package =
-(p->cache.subleaf[i].cores_per_package << 1) | 1;
+p->cache.subleaf[i].cores_per_package = 0;
 p->cache.subleaf[i].threads_per_cache = 0;
 }
 break;
+}

-case X86_VENDOR_AMD:
-case X86_VENDOR_HYGON:
+if ( !getenv("XEN_LIBXC_DISABLE_FAKEHT") ) {
 /*
- * Leaf 0x80000008 ECX[15:12] is ApicIdCoreSize.
- * Leaf 0x80000008 ECX[7:0] is NumberOfCores (minus one).
- * Update to reflect vLAPIC_ID = vCPU_ID * 2.  But avoid
- * - overflow,
- * - going out of sync with leaf 1 EBX[23:16],
- * - incrementing ApicIdCoreSize when it's zero (which changes the
- *   meaning of bits 7:0).
+ * Topology for HVM guests is entirely controlled by Xen.  For now, we
+ * hardcode APIC_ID = vcpu_id * 2 to give the illusion of no SMT.
  */
-if

Re: [Xen-devel] [XEN PATCH 0/3] read grubenv and set default from it

2019-10-27 Thread Steven Haigh

Awesome - thanks Michael.

I'll try and test this out tomorrow.

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Sat, Oct 26, 2019 at 16:00, "YOUNG, MICHAEL A." wrote:

On Sat, 26 Oct 2019, Steven Haigh wrote:


 If / when pygrub is able to properly read and boot from BLS based
 configurations (I'm not sure if this patchset makes pygrub BLS compatible, or
 just fixes the existing issues), we can revisit removing these workarounds
 from anaconda / grub2 packages in F30 / F31 / Rawhide.


The patchset doesn't add BLS compatibility but should be useful for what I
expect BLS support to look like (I have an idea of what would be required
though I haven't worked out the details yet).

Michael Young


Re: [Xen-devel] [XEN PATCH 0/3] read grubenv and set default from it

2019-10-26 Thread Steven Haigh
Just for the record, the grub packages have been updated in Fedora 31 
to automatically disable BLS when installing / removing a kernel on Xen 
Dom0 / DomU installations.


As such, we should never come across a Fedora 31 install with BLS 
enabled from this point forwards.


There is currently ongoing work to disable BLS during the installation 
via anaconda - but this hasn't hit yet - and I believe it's already a 
freeze exception.


If / when pygrub is able to properly read and boot from BLS based 
configurations (I'm not sure if this patchset makes pygrub BLS 
compatible, or just fixes the existing issues), we can revisit 
removing these workarounds from anaconda / grub2 packages in 
F30 / F31 / Rawhide.


Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Fri, Oct 25, 2019 at 22:52, "YOUNG, MICHAEL A." wrote:

This series of patches is to improve the parsing by pygrub of grub
configuration on Fedora. The current result of parsing is generally
that the second kernel listed is set as the default due to a
set default=1 line in grub.cfg which is only intended to be
reached after repeated boot failures.

The patches read the grubenv file (which consists of key=value lines
padded to 1024 characters by # characters) to get the values of
next_entry and saved_entry, which can be a kernel string or an
order number. Unfortunately, for Fedora 31 at least, this is
often a BLS-style string so it isn't necessarily useful. The patches
use the value of next_entry or of saved_entry to set the default
kernel or sets it to the first kernel listed if those values are set
but not used.
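
For readers unfamiliar with the file format, the following is a minimal Python sketch of the parsing and selection logic described above. It is not the code from the patches: parse_grubenv and pick_default are hypothetical names, and the fallback rules are only one reading of the description (next_entry first, then saved_entry, first kernel if a value is set but matches nothing).

```python
def parse_grubenv(data):
    """Parse grubenv text into a dict, skipping '#' header and padding lines."""
    env = {}
    for line in data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def pick_default(env, titles):
    """Return a default index, or None if grubenv gives no hint at all."""
    seen = False
    for name in ("next_entry", "saved_entry"):
        value = env.get(name, "")
        if not value:
            continue
        seen = True
        if value.isdigit():
            # An order number; only usable if it is within range.
            if int(value) < len(titles):
                return int(value)
        elif value in titles:
            # A kernel string that matches one of the listed entries.
            return titles.index(value)
    # Set but not usable (e.g. a BLS-style id): fall back to the first kernel.
    return 0 if seen else None
```

A BLS-style saved_entry that matches none of the titles therefore ends up selecting index 0, which is the "first kernel listed" case mentioned above.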


Michael Young (3):
  set default kernel from grubenv next_entry or saved_entry
  read a grubenv file if it is next to the grub.cfg file
  Example Fedora 31 grub.cfg and grubenv files

 tools/pygrub/examples/fedora-31.grub.cfg | 200 +++
 tools/pygrub/examples/fedora-31.grubenv  |   5 +
 tools/pygrub/src/GrubConf.py |  31 +++-
 tools/pygrub/src/pygrub  |  21 ++-
 4 files changed, 253 insertions(+), 4 deletions(-)
 create mode 100644 tools/pygrub/examples/fedora-31.grub.cfg
 create mode 100644 tools/pygrub/examples/fedora-31.grubenv

--
2.21.0



Re: [Xen-devel] Debugging Windows HVM crashes on Ryzen 3xxx series CPUs.

2019-10-25 Thread Steven Haigh

# patch -p1 < ../000-debug-patch-0.patch
patching file xen/arch/x86/hvm/hvm.c
Hunk #1 succeeded at 3373 (offset 1 line).
patching file xen/arch/x86/hvm/svm/svm.c
Hunk #1 succeeded at 2159 (offset -64 lines).

I've attached the output from around boot all the way until after the 
Windows HVM DomU crashed.


I gzip'ed it, as it's a few hundred KB.

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Fri, Oct 25, 2019 at 10:28, Jan Beulich  wrote:

On 25.10.2019 09:00, Steven Haigh wrote:
 Further to my last, I downloaded the latest Windows Server 2016 ISO from
 Microsoft.

 Filename: Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO

 Have attached as much of the log as I could get attempting to boot from
 the ISO and having a blank LV as the install target.

 The Windows error message (shown via VNC) is HAL MEMORY ALLOCATION.


Hmm, that's as if there was still (again?) an issue with CPUID
handling - iirc the same was observable on maximum-size Rome
systems prior to df29d03f1d (and its fixup). Below the debugging
patch I did use at the time, maybe it turns out helpful here too
(and perhaps you'd really only need the first hunk, I had put in
the other one just in case anyway).

However this looks to be different from your earlier report,
where you said you've got some

(XEN) d1v0 VIRIDIAN CRASH: ac 0 a0a0 f8065c06bf88 bf8

So I wonder whether there's a new issue masking the old one.

Jan

--- unstable.orig/xen/arch/x86/hvm/hvm.c
+++ unstable/xen/arch/x86/hvm/hvm.c
@@ -3372,6 +3372,9 @@ int hvm_vmexit_cpuid(struct cpu_user_reg
 }

 guest_cpuid(curr, leaf, subleaf, &res);
+if(regs->ax && (regs->eax >> 16) != 0x4000 && (long)regs->rip < 0) {//temp
+ printk("%pv[%08lx]: %08x:%08x=%08x:%08x:%08x:%08x\n", curr, regs->rip, leaf, subleaf, res.a, res.b, res.c, res.d);
+}
 HVMTRACE_6D(CPUID, leaf, subleaf, res.a, res.b, res.c, res.d);

 regs->rax = res.a;
--- unstable.orig/xen/arch/x86/hvm/svm/svm.c
+++ unstable/xen/arch/x86/hvm/svm/svm.c
@@ -2223,7 +2223,13 @@ static void svm_do_msr_access(struct cpu

 rc = hvm_msr_read_intercept(regs->ecx, &msr_content);
 if ( rc == X86EMUL_OKAY )
+{//temp
 msr_split(regs, msr_content);
+ if(regs->ecx == 0xc001100c || regs->ecx == 0xc0011005)
+  printk("%pv[%08lx]: %08x -> %08x:%08x\n", curr, regs->rip, regs->ecx, regs->edx, regs->eax);
+} else if(regs->ecx == 0xc001100c || regs->ecx == 0xc0011005) {
+ printk("%pv[%08lx]: %08x -> #GP\n", curr, regs->rip, regs->ecx);
+}
 }
 else
 rc = hvm_msr_write_intercept(regs->ecx, msr_fold(regs), true);






xen-output.log.gz
Description: application/gzip

Re: [Xen-devel] Debugging Windows HVM crashes on Ryzen 3xxx series CPUs.

2019-10-25 Thread Steven Haigh
Further to my last, I downloaded the latest Windows Server 2016 ISO from 
Microsoft.


Filename: Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO

Have attached as much of the log as I could get attempting to boot from 
the ISO and having a blank LV as the install target.


The Windows error message (shown via VNC) is HAL MEMORY ALLOCATION.

On 2019-10-25 16:26, Steven Haigh wrote:

Just to make things annoying, I also get the following message in the
logs for correctly operating Linux PVH DomU's:

(XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x2600, fault
address = 0xfffdf800, flags = 0x8

As such, I think we're back to zero clues at the moment as to what is 
going on.


Suggestions welcome :)

On 2019-10-25 01:45, Paul Durrant wrote:

Not much clue in the logs. The crash params are weird though...
certainly not matching the doc.
(https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xac--hal-memory-allocation)
but then again they are not always to be believed.
There are some odd looking IOMMU faults in there too.

 Paul

On Thu, 24 Oct 2019 at 13:01, Steven Haigh  wrote:


Hi all,

I've managed to get the git master version of Xen on this affected
system and tried to boot a Windows Server 2016 system. It crashes as
per normal.

I managed to get these logs, but I'm not quite sure what else to do to
debug this issue further.

Suggestions welcome.

The boot log in /var/log/xen/ shows:
Waiting for domain soti.vm (domid 4) to die [pid 9174]
Domain 4 has shut down, reason code 3 0x3
Action for shutdown reason code 3 is destroy
Domain 4 needs to be cleaned up: destroying the domain
Done. Exiting now

For some reason I'm not getting any serial output - so I'll have to
take a look at that tomorrow - but if you need anything further,
please
let me know and I'll see what I can turn up.

Windows config file:

type = "hvm"
name = "$vmname.vm"
viridian = 1
#viridian = ['base']
memory = 8192
vcpus = 4
vif = ['bridge=br51, mac=00:16:3E:64:CC:A0']
#disk = [ '/dev/vg_hosting/$vmname.vm,raw,xvda,rw',
'file:/root/SW_DVD9_NTRL_Windows_Svrs_2016_English_2_Std_DC_FPP_OEM_X21-22567.ISO,hdc:cdrom,r'
]
disk = [ '/dev/vg_hosting/$vmname.vm,raw,hda,rw' ]
boot = 'cd'
vnc = 2
vnclisten = "0.0.0.0"
#vncpasswd = ''

## Set the clock to localtime - not UTC...
localtime = 1

## Fix the mouse cursor for VNC usage
usbdevice = 'tablet'

## Lower CPU prio than other VMs...
cpu_weight = 128

on_poweroff = 'destroy'
on_reboot = 'destroy'
on_crash = 'destroy'

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897



--
Steven Haigh

 net...@crc.id.au  http://www.crc.id.au
 +61 (3) 9001 6090  0412 935 897

(XEN) HVM d8v0 save: CPU
(XEN) HVM d8v1 save: CPU
(XEN) HVM d8v2 save: CPU
(XEN) HVM d8v3 save: CPU
(XEN) HVM d8 save: PIC
(XEN) HVM d8 save: IOAPIC
(XEN) HVM d8v0 save: LAPIC
(XEN) HVM d8v1 save: LAPIC
(XEN) HVM d8v2 save: LAPIC
(XEN) HVM d8v3 save: LAPIC
(XEN) HVM d8v0 save: LAPIC_REGS
(XEN) HVM d8v1 save: LAPIC_REGS
(XEN) HVM d8v2 save: LAPIC_REGS
(XEN) HVM d8v3 save: LAPIC_REGS
(XEN) HVM d8 save: PCI_IRQ
(XEN) HVM d8 save: ISA_IRQ
(XEN) HVM d8 save: PCI_LINK
(XEN) HVM d8 save: PIT
(XEN) HVM d8 save: RTC
(XEN) HVM d8 save: HPET
(XEN) HVM d8 save: PMTIMER
(XEN) HVM d8v0 save: MTRR
(XEN) HVM d8v1 save: MTRR
(XEN) HVM d8v2 save: MTRR
(XEN) HVM d8v3 save: MTRR
(XEN) HVM d8 save: VIRIDIAN_DOMAIN
(XEN) HVM d8v0 save: CPU_XSAVE
(XEN) HVM d8v1 save: CPU_XSAVE
(XEN) HVM d8v2 save: CPU_XSAVE
(XEN) HVM d8v3 save: CPU_XSAVE
(XEN) HVM d8v0 save: VIRIDIAN_VCPU
(XEN) HVM d8v1 save: VIRIDIAN_VCPU
(XEN) HVM d8v2 save: VIRIDIAN_VCPU
(XEN) HVM d8v3 save: VIRIDIAN_VCPU
(XEN) HVM d8v0 save: VMCE_VCPU
(XEN) HVM d8v1 save: VMCE_VCPU
(XEN) HVM d8v2 save: VMCE_VCPU
(XEN) HVM d8v3 save: VMCE_VCPU
(XEN) HVM d8v0 save: TSC_ADJUST
(XEN) HVM d8v1 save: TSC_ADJUST
(XEN) HVM d8v2 save: TSC_ADJUST
(XEN) HVM d8v3 save: TSC_ADJUST
(XEN) HVM d8v0 save: CPU_MSR
(XEN) HVM d8v1 save: CPU_MSR
(XEN) HVM d8v2 save: CPU_MSR
(XEN) HVM d8v3 save: CPU_MSR
(XEN) HVM8 restore: CPU 0
(XEN) emul-priv-op.c::d0v0 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v1 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v0 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v2 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v1 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) e

Re: [Xen-devel] Debugging Windows HVM crashes on Ryzen 3xxx series CPUs.

2019-10-24 Thread Steven Haigh
Just to make things annoying, I also get the following message in the 
logs for correctly operating Linux PVH DomU's:


(XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x2600, fault 
address = 0xfffdf800, flags = 0x8


As such, I think we're back to zero clues at the moment as to what is 
going on.


Suggestions welcome :)

On 2019-10-25 01:45, Paul Durrant wrote:

Not much clue in the logs. The crash params are weird though...
certainly not matching the doc.
(https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xac--hal-memory-allocation)
but then again they are not always to be believed.
There are some odd looking IOMMU faults in there too.

 Paul

On Thu, 24 Oct 2019 at 13:01, Steven Haigh  wrote:


Hi all,

I've managed to get the git master version of Xen on this affected
system and tried to boot a Windows Server 2016 system. It crashes as
per normal.

I managed to get these logs, but I'm not quite sure what else to do to
debug this issue further.

Suggestions welcome.

The boot log in /var/log/xen/ shows:
Waiting for domain soti.vm (domid 4) to die [pid 9174]
Domain 4 has shut down, reason code 3 0x3
Action for shutdown reason code 3 is destroy
Domain 4 needs to be cleaned up: destroying the domain
Done. Exiting now

For some reason I'm not getting any serial output - so I'll have to
take a look at that tomorrow - but if you need anything further,
please
let me know and I'll see what I can turn up.

Windows config file:

type = "hvm"
name = "$vmname.vm"
viridian = 1
#viridian = ['base']
memory = 8192
vcpus = 4
vif = ['bridge=br51, mac=00:16:3E:64:CC:A0']
#disk = [ '/dev/vg_hosting/$vmname.vm,raw,xvda,rw',
'file:/root/SW_DVD9_NTRL_Windows_Svrs_2016_English_2_Std_DC_FPP_OEM_X21-22567.ISO,hdc:cdrom,r'
]
disk = [ '/dev/vg_hosting/$vmname.vm,raw,hda,rw' ]
boot = 'cd'
vnc = 2
vnclisten = "0.0.0.0"
#vncpasswd = ''

## Set the clock to localtime - not UTC...
localtime = 1

## Fix the mouse cursor for VNC usage
usbdevice = 'tablet'

## Lower CPU prio than other VMs...
cpu_weight = 128

on_poweroff = 'destroy'
on_reboot = 'destroy'
on_crash = 'destroy'

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897



--
Steven Haigh

 net...@crc.id.au  http://www.crc.id.au
 +61 (3) 9001 6090  0412 935 897


[Xen-devel] Debugging Windows HVM crashes on Ryzen 3xxx series CPUs.

2019-10-24 Thread Steven Haigh

Hi all,

I've managed to get the git master version of Xen on this affected 
system and tried to boot a Windows Server 2016 system. It crashes as 
per normal.


I managed to get these logs, but I'm not quite sure what else to do to 
debug this issue further.


Suggestions welcome.

The boot log in /var/log/xen/ shows:
Waiting for domain soti.vm (domid 4) to die [pid 9174]
Domain 4 has shut down, reason code 3 0x3
Action for shutdown reason code 3 is destroy
Domain 4 needs to be cleaned up: destroying the domain
Done. Exiting now

For some reason I'm not getting any serial output - so I'll have to 
take a look at that tomorrow - but if you need anything further, please 
let me know and I'll see what I can turn up.


Windows config file:

type = "hvm"
name = "$vmname.vm"
viridian = 1
#viridian = ['base']
memory = 8192
vcpus = 4
vif = ['bridge=br51, mac=00:16:3E:64:CC:A0']
#disk = [ '/dev/vg_hosting/$vmname.vm,raw,xvda,rw', 
'file:/root/SW_DVD9_NTRL_Windows_Svrs_2016_English_2_Std_DC_FPP_OEM_X21-22567.ISO,hdc:cdrom,r' 
]

disk = [ '/dev/vg_hosting/$vmname.vm,raw,hda,rw' ]
boot = 'cd'
vnc = 2
vnclisten = "0.0.0.0"
#vncpasswd = ''

## Set the clock to localtime - not UTC...
localtime = 1

## Fix the mouse cursor for VNC usage
usbdevice = 'tablet'

## Lower CPU prio than other VMs...
cpu_weight = 128

on_poweroff = 'destroy'
on_reboot = 'destroy'
on_crash = 'destroy'

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


(XEN) HVM d4v0 save: CPU
(XEN) HVM d4v1 save: CPU
(XEN) HVM d4v2 save: CPU
(XEN) HVM d4v3 save: CPU
(XEN) HVM d4 save: PIC
(XEN) HVM d4 save: IOAPIC
(XEN) HVM d4v0 save: LAPIC
(XEN) HVM d4v1 save: LAPIC
(XEN) HVM d4v2 save: LAPIC
(XEN) HVM d4v3 save: LAPIC
(XEN) HVM d4v0 save: LAPIC_REGS
(XEN) HVM d4v1 save: LAPIC_REGS
(XEN) HVM d4v2 save: LAPIC_REGS
(XEN) HVM d4v3 save: LAPIC_REGS
(XEN) HVM d4 save: PCI_IRQ
(XEN) HVM d4 save: ISA_IRQ
(XEN) HVM d4 save: PCI_LINK
(XEN) HVM d4 save: PIT
(XEN) HVM d4 save: RTC
(XEN) HVM d4 save: HPET
(XEN) HVM d4 save: PMTIMER
(XEN) HVM d4v0 save: MTRR
(XEN) HVM d4v1 save: MTRR
(XEN) HVM d4v2 save: MTRR
(XEN) HVM d4v3 save: MTRR
(XEN) HVM d4 save: VIRIDIAN_DOMAIN
(XEN) HVM d4v0 save: CPU_XSAVE
(XEN) HVM d4v1 save: CPU_XSAVE
(XEN) HVM d4v2 save: CPU_XSAVE
(XEN) HVM d4v3 save: CPU_XSAVE
(XEN) HVM d4v0 save: VIRIDIAN_VCPU
(XEN) HVM d4v1 save: VIRIDIAN_VCPU
(XEN) HVM d4v2 save: VIRIDIAN_VCPU
(XEN) HVM d4v3 save: VIRIDIAN_VCPU
(XEN) HVM d4v0 save: VMCE_VCPU
(XEN) HVM d4v1 save: VMCE_VCPU
(XEN) HVM d4v2 save: VMCE_VCPU
(XEN) HVM d4v3 save: VMCE_VCPU
(XEN) HVM d4v0 save: TSC_ADJUST
(XEN) HVM d4v1 save: TSC_ADJUST
(XEN) HVM d4v2 save: TSC_ADJUST
(XEN) HVM d4v3 save: TSC_ADJUST
(XEN) HVM d4v0 save: CPU_MSR
(XEN) HVM d4v1 save: CPU_MSR
(XEN) HVM d4v2 save: CPU_MSR
(XEN) HVM d4v3 save: CPU_MSR
(XEN) HVM4 restore: CPU 0
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v2 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v2 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v2 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v0 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c::d0v3 Domain attempted WRMSR c0011020 from 0x00064040 to 0x000640400400
(XEN) emul-priv-op.c:11

Re: [Xen-devel] /sys/hypervisor entries for Xen (Domain-0, PV, PVH and HVM)

2019-10-09 Thread Steven Haigh

Thanks Michael,

In the meantime, we're looking at just disabling BLS by default in the 
grub packages within Fedora when it's run on a Xen guest. This means we 
should at least be at a point where Fedora guests will work reliably 
again as Xen guests.


It seems to be agreed that this will stay in place until such point 
where pygrub understands BLS and this no longer becomes an issue - and 
likely there'd be some overlap to let the updated pygrub spread as far 
as possible before yanking out this workaround.


For now, the only big issue that remains is that the current pygrub 
will always boot the second image in the list due to pygrub incorrectly 
parsing the failover sections of the Fedora grub.cfg where the failover 
will set 'default=1' causing this behaviour.


Assuming that the Fedora side is resolved, and we always get a non-BLS 
grub.cfg in a Fedora guest, is there a simpler fix that could be 
included before Xen 4.13 gets launched (and hopefully backported)?


I'm not sure if the proposed changes to Fedora make it a little 
simpler to fix the entire issue.


(apologies for top posting, Geary doesn't seem to like letting me 
bottom post!)


Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Wed, Oct 9, 2019 at 09:00, M A Young  wrote:

On Wed, 9 Oct 2019, Steven Haigh wrote:


 Hi all,

 I'm working on fixing up the grub packages for Fedora so that they detect
 environments that are not compatible with the new BLS logic and disable it
 there.

 BZ Report:
 https://bugzilla.redhat.com/show_bug.cgi?id=1703700

 Currently, it seems that we can deduce the following two scenarios:

 in /sys/hypervisor:

 1) type == xen && uuid == all zeros, then this is BLS safe (the Domain-0).
 2) type == xen && uuid != all zeros, then this is BLS *unsafe* (covers PV, HVM
 and PVH guests).

 Are there any other variables that could change the outcome of the above
 checks when deciding whether to enable or disable BLS?

 Right now, I'm proposing that we try to disable the new BLS behaviour in
 Fedora for PV, HVM and PVH guests - as pygrub is not up to the task of booting
 them. We included HVM as it may be common for users to switch between HVM and
 PVH configurations for the same installed VM.


I do have a long term plan to try to get pygrub to handle BLS, though I
don't expect to have it working soon.

Michael Young


[Xen-devel] /sys/hypervisor entries for Xen (Domain-0, PV, PVH and HVM)

2019-10-08 Thread Steven Haigh

Hi all,

I'm working on fixing up the grub packages for Fedora so that they detect 
environments that are not compatible with the new BLS logic and disable it 
there.


BZ Report:
https://bugzilla.redhat.com/show_bug.cgi?id=1703700

Currently, it seems that we can deduce the following two scenarios:

in /sys/hypervisor:

1) type == xen && uuid == all zeros, then this is BLS safe (the 
Domain-0).
2) type == xen && uuid != all zeros, then this is BLS *unsafe* (covers 
PV, HVM and PVH guests).
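
The two rules above could be sketched in Python roughly as follows. This is only an illustration of the proposed check, not what the Fedora grub packages actually ship: xen_bls_safe is a made-up name, and the sysfs directory is a parameter purely so the logic can be tried against a scratch directory.

```python
import os


def xen_bls_safe(sysfs="/sys/hypervisor"):
    """Return True (Domain-0), False (Xen guest) or None (not Xen / unknown)."""

    def read(name):
        try:
            with open(os.path.join(sysfs, name)) as f:
                return f.read().strip()
        except OSError:
            return None

    if read("type") != "xen":
        return None
    uuid = read("uuid")
    if uuid is None:
        # Cannot tell Domain-0 from a guest without the uuid.
        return None
    # Rule 1: an all-zero uuid means Domain-0, so BLS is safe.
    # Rule 2: anything else is a PV, HVM or PVH guest, so BLS is unsafe.
    return set(uuid.replace("-", "")) == {"0"}
```

A None result is kept separate from False so that non-Xen systems are left entirely alone.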


Are there any other variables that could change the outcome of the above 
checks when deciding whether to enable or disable BLS?


Right now, I'm proposing that we try to disable the new BLS behaviour 
in Fedora for PV, HVM and PVH guests - as pygrub is not up to the task 
of booting them. We included HVM as it may be common for users to 
switch between HVM and PVH configurations for the same installed VM.


Any comments either here or via the BZ report above would be welcome.

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897





Re: [Xen-devel] [ANNOUNCE] Xen 4.13 Development Update

2019-09-28 Thread Steven Haigh
At the risk of sounding like a broken record, is there any progress with 
investigations on the AMD Ryzen 3xxx series and Windows HVM systems?

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Sat, Sep 28, 2019 at 07:09, Juergen Gross  wrote:
As multiple rather important patch series are very short before being ready
I have decided to push the hard code freeze one week back to October 4th.

This email only tracks big items for xen.git tree. Please reply for items you
would like to see in 4.13 so that people have an idea what is going on and
prioritise accordingly.

You're welcome to provide description and use cases of the feature you're
working on.

= Timeline =

We now adopt a fixed cut-off date scheme. We will release about every 8 months.
The upcoming 4.13 timeline is as follows:

* Last posting date: September 13th, 2019
---> We are here
* Hard code freeze: October 4th, 2019
* RC1: TBD
* Release: November 7th, 2019

Note that we don't have freeze exception scheme anymore. All patches
that wish to go into 4.13 must be posted initially no later than the
last posting date and finally no later than the hard code freeze. All
patches posted after that date will be automatically queued into next
release.

RCs will be arranged immediately after freeze.

We recently introduced a jira instance to track all the tasks (not only big)
for the project. See: https://xenproject.atlassian.net/projects/XEN/issues.

Some of the tasks tracked by this e-mail also have a corresponding jira task
referred by XEN-N.

I have started to include the version number of series associated to each
feature. Can each owner send an update on the version number if the series
was posted upstream?

= Projects =

== Hypervisor ==

*  Core scheduling (v4)
  -  Juergen Gross

=== x86 ===

*  Intel Processor Trace virtualization enabling (v1)
  -  Luwei Kang

*  Linux stub domains (RFC v2)
  -  Marek Marczykowski-Górecki

*  Improve late microcode loading (v12)
  -  Chao Gao

*  Fixes to #DB injection
  -  Andrew Cooper

*  CPUID/MSR Xen/toolstack improvements
  -  Andrew Cooper

*  Improvements to domain_crash()
  -  Andrew Cooper

*  EIBRS
  -  Andrew Cooper

*  Xen ioreq server (v2)
  -  Roger Pau Monne

=== ARM ===

== Completed ==

*  Drop tmem
  -  Wei Liu

*  Add support for Hygon Dhyana Family 18h processor
  -  Pu Wen

*  hypervisor x86 instruction emulator additions for AVX512
  -  Jan Beulich

*  x2APIC support for AMD
  -  Jan Beulich

*  add per-domain IOMMU control
  -  Paul Durrant

*  TEE mediator (and OP-TEE) support in XEN
  -  Volodymyr Babchuk

*  Renesas IPMMU-VMSA support + Linux's iommu_fwspec
  -  Oleksandr Tyshchenko


Juergen Gross


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X

2019-09-24 Thread Steven Haigh

If this is helpful, I can probably provide the same from:
* Ryzen 1700x
* Ryzen 2700x
* Ryzen 3900x

I'll leave it to those in the know as to what is useful or not...

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Tue, Sep 24, 2019 at 11:56, Andreas Kinzler  wrote:

On 23.09.2019 10:17, Jan Beulich wrote:

While, according to AMD's processor specs page, the 3700X is just an
8-core chip, I wonder whether
https://lists.xenproject.org/archives/html/xen-devel/2019-09/msg01954.html
still affects this configuration as well. Could you give this a try in
at least the viridian=0 case? As to Linux, did you check that PVH


As a first input for the Xen developers I used the tool from 
http://www.etallen.com/cpuid.html to dump complete cpuid information.


Regards Andreas

Re: [Xen-devel] [ANNOUNCE] Call for agenda items for September 5th Community Call @ 15:00 UTC

2019-09-05 Thread Steven Haigh

On 2019-09-05 18:19, Andrew Cooper wrote:

On 05/09/2019 09:00, Lars Kurth wrote:
IMPORTANT: I had a few additions to the agenda, but do not know WHO 
added these. I believe one was Juergen. Who added the items related to 
MA Youngs patches?


Steven Haigh I believe.

Booting fedora guests is currently in a very broken state.


Yep - I added points 1 & 2 to the AOB section.

I've also added point 3 to inform that I wouldn't be able to drive those 
myself due to timezones.


I have added some references to xen-devel list posts that may be able to 
assist. I'm happy to answer any questions via freenode if someone wants 
to clobber me a few hours before the meeting - pending availability.


--
Steven Haigh

 net...@crc.id.au  http://www.crc.id.au
 +61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)

2019-09-02 Thread Steven Haigh


On Mon, Sep 2, 2019 at 6:34 PM, Paul Durrant wrote:

 -Original Message-
 From: Steven Haigh 
 Sent: 02 September 2019 09:32
 To: Paul Durrant 
 Cc: Andrew Cooper ; Andreas Kinzler ; xen-de...@lists.xenproject.org
 Subject: Re: Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)


 On 2019-09-02 18:25, Steven Haigh wrote:
 > On 2019-09-02 18:20, Paul Durrant wrote:
 >>> -Original Message-
 >>> From: Steven Haigh 
 >>> Sent: 02 September 2019 09:09
 >>> To: Paul Durrant 
 >>> Cc: Andreas Kinzler ; Andrew Cooper ; xen-de...@lists.xenproject.org
 >>> Subject: Re: Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)
 >>>
 >>> On 2019-09-02 18:04, Paul Durrant wrote:
 >>> >> -Original Message-
 >>> >> Further to the above, I did some experimentation. The 
following is a

 >>> >> list of attempted boot configurations and their outcomes:
 >>> >>
 >>> >> viridian=1
 >>> >> vcpus=4
 >>> >> STOPCODE: HAL MEMORY ALLOCATION
 >>> >>
 >>> >> viridian=0
 >>> >> vcpus=4
 >>> >> STOPCODE: MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED
 >>> >>
 >>> >> viridian=0
 >>> >> vcpus=1
 >>> >> Boot OK - get to Windows Server 2016 login etc
 >>> >>
 >>> >
 >>> > And to complete the set, how about viridian=1 vcpus=1?
 >>>
 >>> Any vcpus value where viridian=1 is used creates a HAL MEMORY
 >>> ALLOCATION
 >>> stopcode when trying to boot Windows.
 >>
 >> Ok, so I guess that issue hits first and, only if you get beyond 
that

 >> do you hit the multiprocessor problem.
 >>
 >> The viridian option is not actually a boolean any more (that
 >> interpretation is just for compat) so it would be a good 
datapoint to
 >> know which of the enlightenments causes the change in behaviour. 
Could
 >> you try viridian=['base'] to see if that's sufficient to cause 
the
 >> problem? (I'm guessing it probably is but it would be good to 
know).

 >
 >
 > Hi Paul,
 >
 > I can confirm that viridian=['base'] crashes with the same HAL 
MEMORY

 > ALLOCATION stopcode - even on 1 vcpu.

 Also, just wondering, we're using 8.2.0 of the Windows PV drivers on
 this VM.

 Does this matter? Has there been any changes that would affect this 
in

 8.2.1 or 8.2.2?



From what you describe, I think this is happening way before any PV 
driver code is entered. I guess it would be prudent to make sure by 
trying it on a fresh VM (but didn't you say before that this happens 
when booting from the installation media?)



The original poster mentioned the problem with the install media. To be 
honest, I haven't tried that as yet.


My case / tests are from a working Windows Server 2016 install imaged 
on a different machine (that works and boots fine there).







Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)

2019-09-02 Thread Steven Haigh

On 2019-09-02 18:25, Steven Haigh wrote:

On 2019-09-02 18:20, Paul Durrant wrote:

-Original Message-
From: Steven Haigh 
Sent: 02 September 2019 09:09
To: Paul Durrant 
Cc: Andreas Kinzler ; Andrew Cooper ; xen-de...@lists.xenproject.org
Subject: Re: Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)


On 2019-09-02 18:04, Paul Durrant wrote:
>> -Original Message-
>> Further to the above, I did some experimentation. The following is a
>> list of attempted boot configurations and their outcomes:
>>
>> viridian=1
>> vcpus=4
>> STOPCODE: HAL MEMORY ALLOCATION
>>
>> viridian=0
>> vcpus=4
>> STOPCODE: MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED
>>
>> viridian=0
>> vcpus=1
>> Boot OK - get to Windows Server 2016 login etc
>>
>
> And to complete the set, how about viridian=1 vcpus=1?

Any vcpus value where viridian=1 is used creates a HAL MEMORY ALLOCATION
stopcode when trying to boot Windows.


Ok, so I guess that issue hits first and, only if you get beyond that
do you hit the multiprocessor problem.

The viridian option is not actually a boolean any more (that
interpretation is just for compat) so it would be a good datapoint to
know which of the enlightenments causes the change in behaviour. Could
you try viridian=['base'] to see if that's sufficient to cause the
problem? (I'm guessing it probably is but it would be good to know).



Hi Paul,

I can confirm that viridian=['base'] crashes with the same HAL MEMORY
ALLOCATION stopcode - even on 1 vcpu.


Also, just wondering, we're using 8.2.0 of the Windows PV drivers on 
this VM.


Does this matter? Have there been any changes that would affect this in 
8.2.1 or 8.2.2?


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)

2019-09-02 Thread Steven Haigh

On 2019-09-02 18:20, Paul Durrant wrote:

-Original Message-
From: Steven Haigh 
Sent: 02 September 2019 09:09
To: Paul Durrant 
Cc: Andreas Kinzler ; Andrew Cooper ; xen-de...@lists.xenproject.org
Subject: Re: Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)


On 2019-09-02 18:04, Paul Durrant wrote:
>> -Original Message-
>> Further to the above, I did some experimentation. The following is a
>> list of attempted boot configurations and their outcomes:
>>
>> viridian=1
>> vcpus=4
>> STOPCODE: HAL MEMORY ALLOCATION
>>
>> viridian=0
>> vcpus=4
>> STOPCODE: MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED
>>
>> viridian=0
>> vcpus=1
>> Boot OK - get to Windows Server 2016 login etc
>>
>
> And to complete the set, how about viridian=1 vcpus=1?

Any vcpus value where viridian=1 is used creates a HAL MEMORY ALLOCATION
stopcode when trying to boot Windows.


Ok, so I guess that issue hits first and, only if you get beyond that
do you hit the multiprocessor problem.

The viridian option is not actually a boolean any more (that
interpretation is just for compat) so it would be a good datapoint to
know which of the enlightenments causes the change in behaviour. Could
you try viridian=['base'] to see if that's sufficient to cause the
problem? (I'm guessing it probably is but it would be good to know).



Hi Paul,

I can confirm that viridian=['base'] crashes with the same HAL MEMORY 
ALLOCATION stopcode - even on 1 vcpu.


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)

2019-09-02 Thread Steven Haigh

On 2019-09-02 18:04, Paul Durrant wrote:

-Original Message-
Further to the above, I did some experimentation. The following is a
list of attempted boot configurations and their outcomes:

viridian=1
vcpus=4
STOPCODE: HAL MEMORY ALLOCATION

viridian=0
vcpus=4
STOPCODE: MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED

viridian=0
vcpus=1
Boot OK - get to Windows Server 2016 login etc



And to complete the set, how about viridian=1 vcpus=1?


Any vcpus value where viridian=1 is used creates a HAL MEMORY ALLOCATION 
stopcode when trying to boot Windows.


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)

2019-09-01 Thread Steven Haigh

On 2019-09-02 11:42, Steven Haigh wrote:

On 2019-08-21 06:57, Andreas Kinzler wrote:

On 20.08.2019 22:38, Andrew Cooper wrote:

On 20/08/2019 21:36, Andreas Kinzler wrote:

On 20.08.2019 20:12, Andrew Cooper wrote:
Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged
from 2700X (working) to 3700X (crashing).
So you've done a Zen v1 => Zen v2 CPU upgrade and an existing system?
With "existing system" you mean the Windows installation?

I meant same computer, not same VM.


Tried with 2 mainboards: Asrock X370 Pro4 and AsrockRack X470D4U.
You need to flash the BIOS for Zen2. X470D4U BIOS 3.1 works with 2700X
but not with 3700X. X370 Pro4 with somewhat older BIOS worked for
2700X and does not work with current (6.00) BIOS and 3700X.

Yes, but it is not relevant. The same BSODs happen if you boot the HVM
with just the iso installation medium and no disks.
That's a useful datapoint.  I wouldn't expect this to be relevant, given
how the Windows HAL works.


It should make debugging for you quite "simple" because it can be
reproduced very easily.


Just to add a data point to this - I also see this problem on a Ryzen 9 
3900x.


xl dmesg shows:
(XEN) d2v0 VIRIDIAN CRASH: ac 0 a0a0 f80293254750 aea
(XEN) d3v0 VIRIDIAN CRASH: ac 0 a0a0 f80093a40750 aea
(XEN) d5v0 VIRIDIAN CRASH: ac 0 a0a0 f8028e422350 aea
(XEN) d6v0 VIRIDIAN CRASH: ac 0 a0a0 f80309431750 aea
(XEN) d10v0 VIRIDIAN CRASH: ac 0 a0a0 f8012823e750 aea
(XEN) d11v0 VIRIDIAN CRASH: ac 0 a0a0 f8032e657350 aea

Windows usually has a stopcode of "HAL MEMORY ALLOCATION" when it blue 
screens.
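
For anyone triaging similar logs, the VIRIDIAN CRASH lines above can be picked apart with a short Python sketch. It assumes the five hex values are the crash parameters reported by the guest and that the first one is the Windows bugcheck code (0xac is HAL_MEMORY_ALLOCATION, which matches the stopcode shown on screen). The names parse_viridian_crash and BUGCHECK_NAMES are hypothetical helpers for illustration, not part of Xen.

```python
import re

# Matches lines such as "(XEN) d2v0 VIRIDIAN CRASH: ac 0 a0a0 f80293254750 aea".
CRASH_RE = re.compile(
    r"\(XEN\)\s+d(?P<dom>\d+)v(?P<vcpu>\d+)\s+VIRIDIAN CRASH:\s+"
    r"(?P<params>[0-9a-f]+(?: [0-9a-f]+)*)$"
)

# Assumption: only the bugcheck code relevant to this thread is listed here.
BUGCHECK_NAMES = {0xAC: "HAL_MEMORY_ALLOCATION"}


def parse_viridian_crash(line):
    """Return domain id, vcpu and crash parameters, or None if no match."""
    match = CRASH_RE.search(line.strip())
    if match is None:
        return None
    params = [int(p, 16) for p in match.group("params").split()]
    return {
        "domid": int(match.group("dom")),
        "vcpu": int(match.group("vcpu")),
        "params": params,
        # Assumed: the first parameter is the bugcheck code.
        "bugcheck": BUGCHECK_NAMES.get(params[0], hex(params[0])),
    }
```

Feeding it the first log line gives domain 2, vcpu 0 and a bugcheck name of HAL_MEMORY_ALLOCATION, so every crash listed here appears to be the same failure.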


From xl info:
hw_caps    : 178bf3ff:f6d8320b:2e500800:244037ff:000f:219c91a9:0044:0500
virt_caps  : hvm hvm_directio
xen_version: 4.11.2
xen_caps   : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64

Is there any further info that can be provided? Not being able to
virtualise Windows is a bit of a PITA...


Further to the above, I did some experimentation. The following is a 
list of attempted boot configurations and their outcomes:


viridian=1
vcpus=4
STOPCODE: HAL MEMORY ALLOCATION

viridian=0
vcpus=4
STOPCODE: MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED

viridian=0
vcpus=1
Boot OK - get to Windows Server 2016 login etc

As such, it looks like it's not a completely fatal problem - but running 
Windows on a single vcpu is... unpleasant ;)


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] Windows HVM no longer boots with AMD Ryzen 3700X (and 3900X)

2019-09-01 Thread Steven Haigh

On 2019-08-21 06:57, Andreas Kinzler wrote:

On 20.08.2019 22:38, Andrew Cooper wrote:

On 20/08/2019 21:36, Andreas Kinzler wrote:

On 20.08.2019 20:12, Andrew Cooper wrote:
Xen version 4.10.2. dom0 kernel 4.13.16. The BIOS version is unchanged
from 2700X (working) to 3700X (crashing).
So you've done a Zen v1 => Zen v2 CPU upgrade and an existing system?
With "existing system" you mean the Windows installation?

I meant same computer, not same VM.


Tried with 2 mainboards: Asrock X370 Pro4 and AsrockRack X470D4U.
You need to flash the BIOS for Zen2. X470D4U BIOS 3.1 works with 2700X
but not with 3700X. X370 Pro4 with somewhat older BIOS worked for
2700X and does not work with current (6.00) BIOS and 3700X.

Yes, but it is not relevant. The same BSODs happen if you boot the HVM
with just the iso installation medium and no disks.
That's a useful datapoint.  I wouldn't expect this to be relevant, given
how the Windows HAL works.


It should make debugging for you quite "simple" because it can be
reproduced very easily.


Just to add a data point to this - I also see this problem on a Ryzen 9 
3900x.


xl dmesg shows:
(XEN) d2v0 VIRIDIAN CRASH: ac 0 a0a0 f80293254750 aea
(XEN) d3v0 VIRIDIAN CRASH: ac 0 a0a0 f80093a40750 aea
(XEN) d5v0 VIRIDIAN CRASH: ac 0 a0a0 f8028e422350 aea
(XEN) d6v0 VIRIDIAN CRASH: ac 0 a0a0 f80309431750 aea
(XEN) d10v0 VIRIDIAN CRASH: ac 0 a0a0 f8012823e750 aea
(XEN) d11v0 VIRIDIAN CRASH: ac 0 a0a0 f8032e657350 aea

Windows usually has a stopcode of "HAL MEMORY ALLOCATION" when it blue 
screens.


From xl info:
hw_caps    : 178bf3ff:f6d8320b:2e500800:244037ff:000f:219c91a9:0044:0500
virt_caps  : hvm hvm_directio
xen_version: 4.11.2
xen_caps   : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64


Is there any further info that can be provided? Not being able to 
virtualise Windows is a bit of a PITA...


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] [PATCH] read grubenv and set default from saved_entry or next_entry

2019-08-27 Thread Steven Haigh
Just wanted to give this a quick followup... Did this end up 
progressing?


On Fri, Aug 16, 2019 at 3:37 PM, Steven Haigh  wrote:

On 2019-08-16 15:25, Steven Haigh wrote:

On 2019-08-16 05:05, YOUNG, MICHAEL A. wrote:

On Thu, 15 Aug 2019, Steven Haigh wrote:


Having a bit of a look here

My test system grubenv file has:
# GRUB Environment Block
saved_entry=0
kernelopts=root=UUID=5346b4d9-885f-4673-8aff-04a16bf1971a ro rootflags=subvol=root selinux=0 rhgb quiet
boot_success=1
#


I have attached a revision of the first patch which should handle a
numeric saved_entry.


Hi Michael,

I tried this - and it successfully works for systems that have 
saved_entry=0.


I noticed that stock installs still have problems with updating
grubenv from new kernel installs. I had to manually regenerate
grub.cfg after upgrading to kernel 5.2.8. grubenv doesn't seem to get
changed at all unless you manually use 'grub2-set-default 0'

$ rpm -qa | grep kernel | sort
kernel-5.1.15-300.fc30.x86_64
kernel-5.2.8-200.fc30.x86_64
kernel-core-5.1.15-300.fc30.x86_64
kernel-core-5.2.8-200.fc30.x86_64
kernel-headers-5.2.8-200.fc30.x86_64
kernel-modules-5.1.15-300.fc30.x86_64
kernel-modules-5.2.8-200.fc30.x86_64

$ rpm -qa | grep grub | sort
grub2-common-2.02-81.fc30.noarch
grub2-pc-2.02-81.fc30.x86_64
grub2-pc-modules-2.02-81.fc30.noarch
grub2-tools-2.02-81.fc30.x86_64
grub2-tools-efi-2.02-81.fc30.x86_64
grub2-tools-extra-2.02-81.fc30.x86_64
grub2-tools-minimal-2.02-81.fc30.x86_64
grubby-8.40-31.fc30.x86_64
grubby-deprecated-8.40-31.fc30.x86_64

$ cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false

It seems we still have issues with this configuration - but this is a
Fedora 30 problem -  not Xen.


Sorry, forgot to add this for using the functionality of 
saved_entry=0.


Tested-by: Steven Haigh 

Have not tested using a string as the entry - as all of my installs 
seem to have other problems wrt semi-related fedora issues.


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897





Re: [Xen-devel] [PATCH] read grubenv and set default from saved_entry or next_entry

2019-08-15 Thread Steven Haigh

On 2019-08-16 15:25, Steven Haigh wrote:

On 2019-08-16 05:05, YOUNG, MICHAEL A. wrote:

On Thu, 15 Aug 2019, Steven Haigh wrote:


Having a bit of a look here

My test system grubenv file has:
# GRUB Environment Block
saved_entry=0
kernelopts=root=UUID=5346b4d9-885f-4673-8aff-04a16bf1971a ro rootflags=subvol=root selinux=0 rhgb quiet
boot_success=1
#


I have attached a revision of the first patch which should handle a
numeric saved_entry.


Hi Michael,

I tried this - and it successfully works for systems that have 
saved_entry=0.


I noticed that stock installs still have problems with updating
grubenv from new kernel installs. I had to manually regenerate
grub.cfg after upgrading to kernel 5.2.8. grubenv doesn't seem to get
changed at all unless you manually use 'grub2-set-default 0'

$ rpm -qa | grep kernel | sort
kernel-5.1.15-300.fc30.x86_64
kernel-5.2.8-200.fc30.x86_64
kernel-core-5.1.15-300.fc30.x86_64
kernel-core-5.2.8-200.fc30.x86_64
kernel-headers-5.2.8-200.fc30.x86_64
kernel-modules-5.1.15-300.fc30.x86_64
kernel-modules-5.2.8-200.fc30.x86_64

$ rpm -qa | grep grub | sort
grub2-common-2.02-81.fc30.noarch
grub2-pc-2.02-81.fc30.x86_64
grub2-pc-modules-2.02-81.fc30.noarch
grub2-tools-2.02-81.fc30.x86_64
grub2-tools-efi-2.02-81.fc30.x86_64
grub2-tools-extra-2.02-81.fc30.x86_64
grub2-tools-minimal-2.02-81.fc30.x86_64
grubby-8.40-31.fc30.x86_64
grubby-deprecated-8.40-31.fc30.x86_64

$ cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false

It seems we still have issues with this configuration - but this is a
Fedora 30 problem -  not Xen.


Sorry, forgot to add this for using the functionality of saved_entry=0.

Tested-by: Steven Haigh 

Have not tested using a string as the entry - as all of my installs seem 
to have other problems wrt semi-related fedora issues.


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] [PATCH] read grubenv and set default from saved_entry or next_entry

2019-08-15 Thread Steven Haigh

On 2019-08-16 05:05, YOUNG, MICHAEL A. wrote:

On Thu, 15 Aug 2019, Steven Haigh wrote:


Having a bit of a look here

My test system grubenv file has:
# GRUB Environment Block
saved_entry=0
kernelopts=root=UUID=5346b4d9-885f-4673-8aff-04a16bf1971a ro rootflags=subvol=root selinux=0 rhgb quiet
boot_success=1
#


I have attached a revision of the first patch which should handle a
numeric saved_entry.


Hi Michael,

I tried this - and it successfully works for systems that have 
saved_entry=0.


I noticed that stock installs still have problems with updating grubenv 
from new kernel installs. I had to manually regenerate grub.cfg after 
upgrading to kernel 5.2.8. grubenv doesn't seem to get changed at all 
unless you manually use 'grub2-set-default 0'


$ rpm -qa | grep kernel | sort
kernel-5.1.15-300.fc30.x86_64
kernel-5.2.8-200.fc30.x86_64
kernel-core-5.1.15-300.fc30.x86_64
kernel-core-5.2.8-200.fc30.x86_64
kernel-headers-5.2.8-200.fc30.x86_64
kernel-modules-5.1.15-300.fc30.x86_64
kernel-modules-5.2.8-200.fc30.x86_64

$ rpm -qa | grep grub | sort
grub2-common-2.02-81.fc30.noarch
grub2-pc-2.02-81.fc30.x86_64
grub2-pc-modules-2.02-81.fc30.noarch
grub2-tools-2.02-81.fc30.x86_64
grub2-tools-efi-2.02-81.fc30.x86_64
grub2-tools-extra-2.02-81.fc30.x86_64
grub2-tools-minimal-2.02-81.fc30.x86_64
grubby-8.40-31.fc30.x86_64
grubby-deprecated-8.40-31.fc30.x86_64

$ cat /etc/default/grub
GRUB_TIMEOUT=1
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false

It seems we still have issues with this configuration - but this is a 
Fedora 30 problem -  not Xen.


--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] [PATCH] read grubenv and set default from saved_entry or next_entry

2019-08-15 Thread Steven Haigh

On 2019-08-15 09:56, YOUNG, MICHAEL A. wrote:

On Thu, 15 Aug 2019, Michael Young wrote:

This patch may help your issue with the default kernel setting on Fedora 30
as it uses the setting of saved_entry or next_entry from the grubenv file to
choose the default kernel which should override any setting picked up from if
clauses in the grub.cfg file.

I have only done limited and somewhat imperfect testing on it and it isn't a
proper fix (which would use grubenv settings based on what is in the if
clauses) but I think it should work in your case.


The patch is actually attached this time.

Michael Young


Having a bit of a look here

My test system grubenv file has:
# GRUB Environment Block
saved_entry=0
kernelopts=root=UUID=5346b4d9-885f-4673-8aff-04a16bf1971a ro rootflags=subvol=root selinux=0 rhgb quiet
boot_success=1
#


The grub-set-default man page states:
SYNOPSIS
   grub-set-default [--boot-directory=DIR] MENU_ENTRY

   MENU_ENTRY
          A number, a menu item title or a menu item identifier.

Do I have to use a name for the title to match?

I notice I still have the second option selected with saved_entry=0
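
To make the question concrete, here is a rough Python sketch of how a saved_entry value could be mapped onto a menu index, covering the three forms the man page lists (number, title, identifier). resolve_default is a hypothetical helper written for this mail, not the code in pygrub or in the attached patch, and falling back to entry 0 is my assumption.

```python
def resolve_default(entry, titles, ids=()):
    """Map a saved_entry/next_entry value to a zero-based menu index.

    entry  - the raw string from grubenv (may be None or empty)
    titles - menu entry titles in the order they appear in grub.cfg
    ids    - optional menu entry identifiers, in the same order
    """
    titles = list(titles)
    ids = list(ids)
    if not entry:
        return 0
    if entry.isdigit():
        index = int(entry)
        # Out-of-range numbers fall back to the first entry (assumption).
        return index if index < len(titles) else 0
    if entry in ids:
        return ids.index(entry)
    if entry in titles:
        return titles.index(entry)
    return 0
```

With this logic saved_entry=0 would select the first menu entry, which is what I expected to see rather than the second one.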

--
Steven Haigh

net...@crc.id.au  http://www.crc.id.au
+61 (3) 9001 6090  0412 935 897


Re: [Xen-devel] [PATCH] failing to set value to 0 in Grub2ConfigFile

2019-08-13 Thread Steven Haigh
I've had a tinker with the patch - I don't have a Fedora build system 
atm - so I just edited the file on the Dom0 and removed the pyc/pyo 
files. Same issue:


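   pyGRUB  version 0.6
┌──────────────────────────────────────────────────────────────────────┐
│ Fedora (5.2.6-200.fc30.x86_64) 30 (Thirty)                           │
│ Fedora (0-rescue-ee4b18b1898e4bf2b36ff71077b23b5e) 30 (Thirty)       │
│                                                                      │
│                                                                      │
│                                                                      │
│                                                                      │
│                                                                      │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘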
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, 'e' to edit the
commands before booting, 'a' to modify the kernel arguments
before booting, or 'c' for a command line.

The rescue entry is selected in the above example.

My crappy hack has been to edit /usr/libexec/xen/bin/pygrub and add 
sel=0 as follows:


    def image_index(self):
        if isinstance(self.cf.default, int):
            sel = self.cf.default
        elif self.cf.default.isdigit():
            sel = int(self.cf.default)
            sel = 0
        else:

I know this is horrible!

I'm still disabling BLSCFG in /etc/default/grub - otherwise the pygrub 
menu is completely empty.


I don't know what the solution is right now - but I do somewhat agree 
with ignoring anything inside an if statement in grub.cfg - as the 
logic is ignored anyway. Do you still need to read the grubenv in doing 
this?


I assume the read for grubenv is to get the 'saved_entry' value?
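
For reference, a minimal Python sketch of what reading grubenv for that value might look like, assuming the file is plain name=value lines plus '#' header and padding as in the example I posted earlier. parse_grubenv and default_from_grubenv are hypothetical names, not what the patch actually uses.

```python
def parse_grubenv(text):
    """Parse the name=value lines of a GRUB environment block."""
    env = {}
    for line in text.splitlines():
        # Skip the "# GRUB Environment Block" header and '#' padding.
        if line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so kernelopts=root=UUID=... survives.
        key, _, value = line.partition("=")
        env[key.strip()] = value
    return env


def default_from_grubenv(env):
    """Prefer next_entry over saved_entry, as 00_header in grub.cfg does."""
    return env.get("next_entry") or env.get("saved_entry")
```

The result would still need to be matched against the menu entries, since the value can be a number, a title or an identifier.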

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Wed, Aug 14, 2019 at 7:51 AM, "YOUNG, MICHAEL A." wrote:

On Tue, 13 Aug 2019, Andrew Cooper wrote:


 On 13/08/2019 22:02, YOUNG, MICHAEL A. wrote:
 I have been looking at the pygrub code to see if it is possible to cope
 with grub files with BLSCFG and spotted this minor issue in GrubConf.py
 where the code intends to replace ${saved_entry} and ${next_entry} with 0
 but doesn't succeed.

 Signed-off-by: Michael Young 


 Ah - this looks suspiciously like it might be the bugfix for an issue
 reported by Steven.

 Steven - do you mind giving this patch a try for your "Fedora 30 DomU -
 pygrub always boots the second menu option" problem?


Sadly I don't think it is that simple and to do it properly would require
parsing if clauses in the grub file and also reading variables from the
grubenv file.

I do however have an idea which might work which is to ignore anything in
if clauses, read the grubenv file (which I now have a hacky way of doing)
and treat the value of next_entry or saved_entry as the setting for
the default kernel to pick. If I finish a patch that does this I will post
it on the list, but I very much doubt it will be of committable quality.


Michael Young






Re: [Xen-devel] Fedora 30 DomU - pygrub always boots the second menu option

2019-08-09 Thread Steven Haigh
Further looking into this, it seems the problem occurs on this section 
of the grub.cfg:


### BEGIN /etc/grub.d/08_fallback_counting ###
insmod increment
# Check if boot_counter exists and boot_success=0 to activate this behaviour.
if [ -n "${boot_counter}" -a "${boot_success}" = "0" ]; then
 # if countdown has ended, choose to boot rollback deployment,
 # i.e. default=1 on OSTree-based systems.
 if [ "${boot_counter}" = "0" -o "${boot_counter}" = "-1" ]; then
   set default=1
   set boot_counter=-1
 # otherwise decrement boot_counter
 else
   decrement boot_counter
 fi
 save_env boot_counter
fi
### END /etc/grub.d/08_fallback_counting ###

It seems pygrub sees the 'set default=1' and uses it - even though it 
shouldn't be normally hit due to it being in a conditional.
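
To illustrate what I mean, a rough Python sketch of a parser that only honours 'set default=' outside of if/fi blocks. This is not how pygrub's GrubConf.py is written; top_level_default is a hypothetical helper, and it assumes each if, elif, else and fi sits at the start of its own line, as in the generated grub.cfg.

```python
def top_level_default(cfg_text):
    """Return the last `set default=` value found outside any if/fi block."""
    depth = 0        # current nesting level of if ... fi
    default = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split()[0]
        if first == "if":
            depth += 1
        elif first == "fi":
            depth = max(depth - 1, 0)
        elif depth == 0 and line.startswith("set default="):
            default = line[len("set default="):].strip().strip('"')
    return default
```

Run over the 08_fallback_counting section above this returns None, so the 'set default=1' would no longer be picked up. The default would then have to come from somewhere else, such as grubenv or simply entry 0.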

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


On Thu, Aug 1, 2019 at 12:54 AM, Steven Haigh  wrote:
There's a ton of changes to grub in Fedora 30... Most of them 
causing pain.


When booting using pygrub, the presented menu always has the second 
option selected.


The contents of /etc/default/grub is as follows:
GRUB_TIMEOUT=1
GRUB_DEFAULT=0
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false

I have attached the generated grub.cfg created via:
grub2-mkconfig -o /boot/grub/grub.cfg

BLSCFG is a whole new clusterf**k that became the default in 
Fedora 30 and causes many problems - but first things first...

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897







[Xen-devel] Fedora 30 DomU - pygrub always boots the second menu option

2019-07-31 Thread Steven Haigh
There's a ton of changes to grub in Fedora 30... Most of them causing 
pain.


When booting using pygrub, the presented menu always has the second 
option selected.


The contents of /etc/default/grub is as follows:
GRUB_TIMEOUT=1
GRUB_DEFAULT=0
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="audit=0 selinux=0 console=hvc0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=false

I have attached the generated grub.cfg created via:
grub2-mkconfig -o /boot/grub/grub.cfg

BLSCFG is a whole new clusterf**k that became the default in 
Fedora 30 and causes many problems - but first things first...

Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897


#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub2-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
set pager=1

if [ -f ${config_directory}/grubenv ]; then
  load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi
if [ "${next_entry}" ] ; then
   set default="${next_entry}"
   set next_entry=
   save_env next_entry
   set boot_once=true
else
   set default="0"
fi

if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi

export menuentry_id_option

if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}

function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}

terminal_output console
if [ x$feature_timeout_style = xy ] ; then
  set timeout_style=menu
  set timeout=1
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
  set timeout=1
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/01_users ###
if [ -f ${prefix}/user.cfg ]; then
  source ${prefix}/user.cfg
  if [ -n "${GRUB2_PASSWORD}" ]; then
    set superusers="root"
    export superusers
    password_pbkdf2 root ${GRUB2_PASSWORD}
  fi
fi
### END /etc/grub.d/01_users ###

### BEGIN /etc/grub.d/08_fallback_counting ###
insmod increment
# Check if boot_counter exists and boot_success=0 to activate this behaviour.
if [ -n "${boot_counter}" -a "${boot_success}" = "0" ]; then
  # if countdown has ended, choose to boot rollback deployment,
  # i.e. default=1 on OSTree-based systems.
  if  [ "${boot_counter}" = "0" -o "${boot_counter}" = "-1" ]; then
    set default=1
    set boot_counter=-1
  # otherwise decrement boot_counter
  else
    decrement boot_counter
  fi
  save_env boot_counter
fi
### END /etc/grub.d/08_fallback_counting ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Fedora (5.1.20-300.fc30.x86_64) 30 (Thirty)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.1.20-300.fc30.x86_64-advanced-e2f94071-1c3b-4b45-b6fb-22e3f952d4ae' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod ext2
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root  e2f94071-1c3b-4b45-b6fb-22e3f952d4ae
    else
      search --no-floppy --fs-uuid --set=root e2f94071-1c3b-4b45-b6fb-22e3f952d4ae
    fi
    linux   /boot/vmlinuz-5.1.20-300.fc30.x86_64 root=UUID=e2f94071-1c3b-4b45-b6fb-22e3f952d4ae ro audit=0 selinux=0 console=hvc0
    initrd  /boot/initramfs-5.1.20-300.fc30.x86_64.img
}
menuentry 'Fedora (5.1.18-300.fc30.x86_64) 30 (Thirty)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-5.1.18-300.fc30.x86_64-advanced-e2f94071-1c3b-4b45-b6fb-22e3f952d4ae' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod ext2
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root  e2f94071-1c3b-4b45-b6fb-22e3f952d4ae
    else
      search --no-floppy --fs-uuid --set=root e2f94071-1c3b-4b45-b6fb-22e3f952d4ae
    fi
    linux   /boot/vmlinuz-5.1.18-300.fc30.x86_64 root=UUID=e2f94071-1c3b-4b45-b6fb-22e3f952d4ae ro audit=0 selinux=0 console=hvc0
    initrd  /boot/initramfs-5.1.18-300.fc30.x86_64.img
}
menuentry 'Fedora (0-rescue-46e72612de204d5d8d6a9fe68e255ba3) 30 (Thirty)' 
--class fedora --class gnu-linux --class gnu --class os --un

[Xen-devel] Fedora 30 and BLSCFG changes equals non-booting DomUs.

2019-07-31 Thread Steven Haigh
Fedora 30 implemented Boot Loader Specification (BLS) by default for 
all newly installed, and any upgraded systems.


This causes hell when booting a DomU that is *not* configured as HVM - 
booting fails whenever the bootloader inside the guest is not used.


pygrub will always fail to boot these VMs.

Links:

Fedora change page:
https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault

Main Fedora BZ with lots of issues:
https://bugzilla.redhat.com/show_bug.cgi?id=1652806

My bug report on new kernels not appearing in generated grub.cfg files:
https://bugzilla.redhat.com/show_bug.cgi?id=1703700

So far, the only workaround is to install the 'grubby-deprecated' 
package, set 'GRUB_ENABLE_BLSCFG=false' in /etc/default/grub, then 
manually re-create the grub.cfg file via: grub2-mkconfig -o 
/boot/grub/grub.cfg


Upon a newer kernel being installed, it may or may not appear in the 
grub.cfg configuration - even with the above changes. As such, numerous 
kernel upgrades later, your installed VM might not boot at all.


In numerous systems, I run grub2-mkconfig in /etc/rc.d/rc.local to 
avoid a completely broken VM. Not ideal.


So, to start the discussion, with none of this currently being sent 
upstream, this is a Fedora-ism. How to handle BLS enabled guests?


It also seems to be a Fedora problem with respect to kernel updates 
still causing problems - but that's another issue.


Steven Haigh

 net...@crc.id.au  https://www.crc.id.au
 +613 9001 6090    +614 1293 5897





Re: [Xen-devel] Criteria / validation proposal: drop Xen

2019-05-14 Thread Steven Haigh


On Tue, May 14, 2019 at 11:50 PM, Lars Kurth wrote:

Apologies,
I mixed up some references
Lars


...

[A2] https://bugzilla.redhat.com/show_bug.cgi?id=1264103
[B1] https://bugzilla.redhat.com/show_bug.cgi?id=1703700


Bug B1 here was lodged by myself. There is also a post to xen-devel 
titled "pygrub not starting first menuentry in Fedora 30".


I just added a comment there which I shall paste below to include those 
not subscribed to that BZ:


Thinking about this further - and noticing it being referenced on 
xen-devel mailing list, I would like to suggest the following - which 
may have been overlooked right now...


If the grub %post scripting checked to see if it was installing / 
upgrading in a Xen DomU, it could set 'GRUB_ENABLE_BLSCFG=false' in 
/etc/default/grub automatically. This would fix both new installs and 
upgrades.
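
As a rough sketch of what such a %post check could look like (this is an 
assumption on my part, not something taken from the Fedora packaging): the 
function name is_xen_domu is hypothetical, and it relies on 
/sys/hypervisor/type reading "xen" under Xen and on Dom0 listing "control_d" 
in /proc/xen/capabilities. The optional argument is a root prefix so the 
check can be tried against a fake tree.

```shell
#!/bin/sh
# Sketch: succeed if we appear to be running as a Xen guest (not Dom0).
# The optional argument is a root prefix for testing; it defaults to the
# real filesystem.
is_xen_domu() {
    root="${1:-}"
    type_file="$root/sys/hypervisor/type"
    caps_file="$root/proc/xen/capabilities"
    # No hypervisor type exposed: treat as "not Xen".
    [ -r "$type_file" ] || return 1
    [ "$(cat "$type_file")" = "xen" ] || return 1
    # Assumption: Dom0 advertises "control_d" here, guests do not.
    if [ -r "$caps_file" ] && grep -q control_d "$caps_file"; then
        return 1
    fi
    return 0
}

# Possible use in a scriptlet:
#   if is_xen_domu; then
#       echo "Xen DomU detected, BLS should be disabled"
#   fi
```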


The final fix would be figuring out why pygrub currently boots the 
*second* entry in the resulting grub.cfg - unlike how F29 worked. This 
may be either a fix on the grub2-mkconfig or pygrub side - I'm not 
quite sure yet. This would likely restore functionality completely. At 
least until something else more suitable is done?





Re: [Xen-devel] pygrub not starting first menuentry in Fedora 30

2019-05-14 Thread Steven Haigh


On Tue, May 14, 2019 at 11:40 PM, George Dunlap wrote:
On Mon, May 13, 2019 at 11:25 AM Steven Haigh wrote:


 There seem to be some changes in Fedora 30 that cause the second boot
 entry in grub.cfg to be booted instead of the first.

 This means that Fedora 30 systems either always boot into an older
 kernel, or in the case of systems with only one kernel installed, the
 rescue image.

 There also seem to be some new issues with the move to BLSCFG -
 however it seems a new requirement is to have
 GRUB_ENABLE_BLSCFG="false" in /etc/default/grub. This causes
 grub2-mkconfig to work correctly and spit out a grub.cfg file that
 pygrub can then use.

 Is this a bug in pygrub, or a problem with how Fedora 30 generates a
 grub.cfg?

 I tried to pick through pygrub - but couldn't quite follow the python
 logic to see where the default boot option is selected.


AFAICT, the basic issue is that pygrub is a partial re-implementation
of grub, and hasn't re-implemented the blscfg functionality.


I don't think this is an issue. When using 'GRUB_ENABLE_BLSCFG=false' 
in /etc/default/grub, the grub config file is generated correctly and 
works as expected. The problem is not that it doesn't work, but 
something is causing an offset in the default menu item (almost like an 
off-by-one) that causes the *second* entry in the grub.cfg to boot.



The *most robust* solution going forward is always going to be to use
grub-xen (AKA pvgrub2) instead of pygrub.  grub-xen is a port of the
actual grub project to run as a PV guest, and so will always be the
most compatible with upstream grub.

Not sure who's "in charge" of pygrub enough to teach it how to use 
blscfg.


I'm not sure there's a huge rush for this... If upstream grub 
installers checked to see if it was installing on a Xen DomU, then set 
GRUB_ENABLE_BLSCFG=false by default - the remaining fix should be 
rather simple to figure out - after all, functionality is correct - 
apart from the wrong entry being selected by default.





[Xen-devel] pygrub not starting first menuentry in Fedora 30

2019-05-13 Thread Steven Haigh
There seem to be some changes in Fedora 30 that cause the second boot 
entry in grub.cfg to be booted instead of the first.


This means that Fedora 30 systems either always boot into an older 
kernel, or in the case of systems with only one kernel installed, the 
rescue image.


There also seem to be some new issues with the move to BLSCFG - 
however it seems a new requirement is to have 
GRUB_ENABLE_BLSCFG="false" in /etc/default/grub. This causes 
grub2-mkconfig to work correctly and spit out a grub.cfg file that 
pygrub can then use.


Is this a bug in pygrub, or a problem with how Fedora 30 generates a 
grub.cfg?


I tried to pick through pygrub - but couldn't quite follow the python 
logic to see where the default boot option is selected.
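
To compare what is in grub.cfg against what actually gets booted, something 
like the following could help. It is only a sketch: list_menuentries is a 
made-up name, and it numbers every menuentry line in plain file order 
starting at 0, ignoring submenu nesting, so it is not necessarily how grub 
or pygrub number entries internally.

```shell
#!/bin/sh
# Sketch: print "index: title" for each menuentry line in a grub.cfg,
# numbered from 0 in file order (submenu nesting is ignored).
# Usage: list_menuentries /boot/grub/grub.cfg
list_menuentries() {
    awk -v sq="'" '
        /^[ \t]*menuentry[ \t]/ {
            line = $0
            # Strip the leading "menuentry" keyword and whitespace.
            sub(/^[ \t]*menuentry[ \t]+/, "", line)
            quote = substr(line, 1, 1)
            # If the title is quoted, keep only the text inside the quotes.
            if (quote == sq || quote == "\"") {
                line = substr(line, 2)
                stop = index(line, quote)
                if (stop > 0)
                    line = substr(line, 1, stop - 1)
            }
            printf "%d: %s\n", n++, line
        }
    ' "$1"
}
```

If the newest kernel shows up as index 0 here but the guest boots index 1, 
that would at least narrow things down to how the default is being picked.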





Re: [Xen-devel] Criteria / validation proposal: drop Xen

2019-04-27 Thread Steven Haigh
(and sending to the list this time, due to Geary being a rather 
featureless mail client)


As one of those being caught by regressions upgrading F29 to F30 under 
Xen DomU's, I think this is a bad idea.


It shows that it wasn't tested, because it doesn't work. To me, this 
exposes weaknesses in the testing and the solution shouldn't be "The 
check fails, remove the check".


On Sat, Apr 27, 2019 at 4:18 AM, Konrad Rzeszutek Wilk wrote:

On Fri, Apr 26, 2019 at 10:22:13PM +0530, Sumantro Mukherjee wrote:

 Yup +1 from my side too. Xen is hardly tested since a lot of time.


Hi!

And that is thanks to one of the GRUB2 bugs that needs some love
from Peter Jones.

As without that bug being fixed - it is very difficult to test it - 
as you can't even load Xen!


I've asked the upstream GRUB maintainer to shed some light on the
confusion about multiboot2 + SecureBoot - hopefully that will resolve
the question.

My vote is to have it remain as is.

Thank you.


 On Fri, Apr 26, 2019 at 10:07 PM Geoffrey Marr wrote:


 > Since F24, I haven't seen or heard of anyone who uses Xen over KVM
 > anywhere other than this thread... I'm +1 for making this test an
 > "Optional" one.
 >
 > Geoff Marr
 > IRC: coremodule
 >
 >
 > On Fri, Apr 26, 2019 at 10:33 AM Adam Williamson <
 > adamw...@fedoraproject.org> wrote:
 >
 >> On Thu, 2017-07-06 at 13:19 -0700, Adam Williamson wrote:
 >> > On Thu, 2017-07-06 at 15:59 -0400, Konrad Rzeszutek Wilk wrote:
 >> > > > > I would prefer for it to remain as it is.
 >> > > >
 >> > > > This is only practical if it's going to be tested, and tested
 >> > > > regularly - not *only* on the final release candidate, right
 >> > > > before we sign off on the release. It needs to be tested
 >> > > > regularly throughout the release cycle, on the composes that
 >> > > > are "nominated for testing".
 >> > >
 >> > > Right, which is why I am happy that you have pointed me to the
 >> > > right place so I can be up-to-date.
 >> >
 >> > Great, thanks. So let's leave it as it is for now, but we'll keep an
 >> > eye on this during F27 cycle. If we get to, say, Beta and there are
 >> > no results for the test, that's gonna be a problem. Thanks!
 >>
 >> So, for Fedora 30, this was not tested throughout the whole cycle. I
 >> think we can consider the proposal to remove the criterion active
 >> again.
 >> --
 >> Adam Williamson
 >> Fedora QA Community Monkey
 >> IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
 >> http://www.happyassassin.net
 >> ___
 >> test mailing list -- t...@lists.fedoraproject.org
 >> To unsubscribe send an email to test-le...@lists.fedoraproject.org
 >> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
 >> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
 >> List Archives:
 >> https://lists.fedoraproject.org/archives/list/t...@lists.fedoraproject.org
 >>


 --
 //sumantro
 Fedora QE
 TRIED AND PERSONALLY TESTED, ERGO TRUSTED 




Re: [Xen-devel] [PATCH] Make credit2 the default scheduler

2018-09-19 Thread Steven Haigh

On 2018-09-19 19:38, Dario Faggioli wrote:

On Sat, 2018-09-15 at 00:21 +1000, Steven Haigh wrote:

On Friday, 14 September 2018 6:45:35 PM AEST Jan Beulich wrote:
> > > >
> And that's despite "sched=credit2 crashes system when using
> cpupools"? While I agree that we shouldn't delay the switch for
> much longer, in particular with there already being a fix available
> from you I think that one should go in before the one here.

Even though my opinion probably isn't very heavy on this matter, I've
used credit2 exclusively for a considerable time.


Well, this is really interesting and useful to know. Can I ask what
your typical workload is (if any), and how are things going?


I don't really have a 'typical' workload. There's mail servers, web 
servers, DNS, shell boxes, all kinds of varied stuff. I haven't had any 
noticeable performance issues that I could look at and say "X is 
different".



If you're talking the issue I think you're talking about, then I
discovered it when doing stuff that most people probably wouldn't
bother with - evidenced that I hadn't done it before either.


Actually, we do expect the default scheduler not to crash if one
creates a cpupool.

Not that there hasn't been similar bug in Credit1, while it was the
default (check `git log' :-/). But what this all means is that we need
to do better at testing these things, e.g., finally adding cpupool and
CPU online/offline testing to OSSTest.

Anyway, the bugfix is in now. :-)


Agreed. I haven't had a chance to test that patch as yet - as I need to 
reconfigure the IPMI BMC to recover from a hard crash. It's tucked away 
in a rack out of sight and mind - so this isn't quite straightforward - 
but possible if I know I'm going to try and crash it :)



I take peoples word on the performance +/- of a few percent here and
there - so if its easier to maintain and better code, then yeah - it
makes sense to move on with it. I certainly haven't found any normal
use cases that would lead me to object to this.


Great, and thanks again for the feedback!

Regards,
Dario


--
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] [PATCH] Make credit2 the default scheduler

2018-09-14 Thread Steven Haigh
On Friday, 14 September 2018 6:45:35 PM AEST Jan Beulich wrote:
> >>> On 13.09.18 at 18:51,  wrote:
> > On Thu, 2018-09-13 at 17:38 +0100, George Dunlap wrote:
> >> Credit2 was declared "supported" in 4.8, and as of 4.10 had two other
> >> critical features implemented (soft affinity / NUMA and caps).
> >> 
> >> [..]
> >> 
> >> Credit2, like credit, has a number of workloads / setups for which
> >> performance could be improved.  Personally I think networking and
> >> partially-loaded systems is going to be more representative of what
> >> Xen is actually used for; so I think credit2 is on the whole the
> >> better scheduler to use by default.  And in any case, making those
> >> improvements on credit2 will be easier than on credit.
> >> 
> >> Signed-off-by: George Dunlap 
> > 
> > After all the effort we've spent on this, I'm really, really happy to
> > see this (trying to) happen. Thanks for sending the patch. :-)
> > 
> > I fully agree with and second George's reasoning, and feel 100% like
> > providing my:
> > 
> > Acked-by: Dario Faggioli 
> 
> And that's despite "sched=credit2 crashes system when using
> cpupools"? While I agree that we shouldn't delay the switch for
> much longer, in particular with there already being a fix available
> from you I think that one should go in before the one here.

Even though my opinion probably isn't very heavy on this matter, I've used 
credit2 exclusively for a considerable time. If you're talking about the 
issue I think you're talking about, then I discovered it when doing stuff 
that most people probably wouldn't bother with - as evidenced by the fact 
that I hadn't done it before either.

I take people's word on the performance +/- of a few percent here and there - 
so if it's easier to maintain and better code, then yeah - it makes sense to 
move on with it. I certainly haven't found any normal use cases that would 
lead me to object to this.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] BUG: sched=credit2 crashes system when using cpupools

2018-09-12 Thread Steven Haigh
On Thursday, 13 September 2018 1:11:20 AM AEST Dario Faggioli wrote:
> On Thu, 2018-08-30 at 18:49 +1000, Steven Haigh wrote:
> > On 2018-08-30 18:33, Jan Beulich wrote:
> > > Anyway - as Jürgen says, something for the scheduler
> > > maintainers to look into.
> 
> Ok, I'm back.
> 
> > Yep - I just want to confirm that we tested this in BOTH NUMA
> > configurations - and credit2 crashed on both.
> > 
> > I switched back to sched=credit, and it seems to work as expected:
> > # xl cpupool-list
> > > Name               CPUs   Sched     Active   Domain count
> > > Pool-node0           12   credit       y          3
> > > Pool-node1           12   credit       y          0

Hi Dario,

I'll try to clarify below.
 
> Wait, in a previous message, you said: "A machine where we could get
> this working every time shows". Doesn't that mean creating a separate
> pool for node 1 works with both Credit and Credit2, if the node has
> memory?
> 
> I mean, trying to clarify, my understanding is that you have two
> systems:
> 
> system A: node 1 has *no* memory
> system B: both node 0 and node 1 have memory
> 
> Creating a Credit pool with pcpus from node 1 always works on both
> systems.

Correct. With the credit scheduler, the pool split worked correctly on both 
systems.
 
> OTOH, when you try to create a Credit2 pool with pcpus from node 1,
> does it always crash on both systems, or does it work on system B and
> crashes on system A ?

Both systems crashed when using credit2. We originally thought this was due to 
the different memory layout between the two systems. This bit turned out to 
not matter as both systems crashed in the same way.

> I do have a NUMA box with RAM in both nodes (so similar to system B).
> Last time I checked, what you're trying to do worked there, pretty much
> with any scheduler combination, but I'll recheck.
> 
> I don't have a box similar to system A. I'll try to remove some of the
> RAM from that NUMA box, and check what happens.

In theory, if you reproduce what we did, it should crash anyway. The RAM 
layout shouldn't matter.

We set the scheduler on the grub line with 'sched=credit2', then did the 
split. I didn't try booting Dom0 with credit and making only the pools 
credit2.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] BUG: sched=credit2 crashes system when using cpupools

2018-08-30 Thread Steven Haigh

On 2018-08-30 18:33, Jan Beulich wrote:

On 30.08.18 at 06:01,  wrote:
Managed to get the same crash log when adding CPUs to Pool-1 as follows:


Create the pool:
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns

Add the CPUs:
(XEN) Adding cpu 12 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Removing cpu 12 from runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Removing cpu 13 from runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Removing cpu 14 from runqueue 0
(XEN) Xen BUG at sched_credit2.c:3452


credit2 still not being the default - do things work if you don't
override the default (of using credit1)? I guess the problem is
connected to the "Removing cpu  from runqueue 0", considering this

BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));

is what triggers. Anyway - as Jürgen says, something for the scheduler
maintainers to look into.


Yep - I just want to confirm that we tested this in BOTH NUMA 
configurations - and credit2 crashed on both.


I switched back to sched=credit, and it seems to work as expected:
# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-node0           12   credit       y          3
Pool-node1           12   credit       y          0

I've updated the subject - as this isn't a NUMA issue at all.

--
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] RFE: Detect NUMA misconfigurations and prevent machine freezes

2018-08-29 Thread Steven Haigh

On 2018-08-29 15:49, Juergen Gross wrote:

On 29/08/18 07:33, Steven Haigh wrote:
When playing with NUMA support recently, I noticed a host would always
hang when trying to create a cpupool for the second NUMA node in the
system.

I was using the following commands:
# xl cpupool-create name=\"Pool-1\" sched=\"credit2\"
# xl cpupool-cpu-remove Pool-0 node:1
# xl cpupool-cpu-add Pool-1 node:1

After the last command, the system would hang - requiring a hard reset
of the machine to fix.

I tried a different variation with the same result:
# xl cpupool-create name=\"Pool-1\" sched=\"credit2\"
# xl cpupool-cpu-remove Pool-0 node:1
# xl cpupool-cpu-add Pool-1 12

It turns out that the RAM was installed sub-optimally in this machine.
A partial output from 'xl info -n' shows:
numa_info              :
node:    memsize    memfree    distances
   0:     67584      62608      10,21
   1:         0          0      21,10

A machine where we could get this working every time shows:
node:    memsize    memfree    distances
   0:     34816      30483      10,21
   1:     32768      32125      21,10

As we can deduce RAM misconfigurations in this scenario, I believe we
should check to ensure that RAM configuration / layout is sane *before*
attempting to split the system and print a warning.

This would prevent a hard system freeze in this scenario.


RAM placement should not matter here. As the name already suggests
cpupools do assignment of cpus. RAM allocated will be preferred taken
from a local node, but this shouldn't be mandatory for success.

Would it be possible to use a debug hypervisor (e.g. 4.12-unstable) for
generating a verbose log (hypervisor boot parameter "loglvl=all") and
sending the complete hypervisor log?


I don't have a package for 4.11 or 4.12 built at all - but I did this on 
4.10.2-pre (built from staging-4.10).


Managed to get the same crash log when adding CPUs to Pool-1 as follows:

Create the pool:
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns

Add the CPUs:
(XEN) Adding cpu 12 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Removing cpu 12 from runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Removing cpu 13 from runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Removing cpu 14 from runqueue 0
(XEN) Xen BUG at sched_credit2.c:3452
(XEN) Adding cpu 15 to runqueue 0
(XEN) [ Xen-4.10.2-pre  x86_64  debug=n   Not tainted ]
(XEN) CPU:13
(XEN) RIP:e008:[] 
sched_credit2.c#csched2_schedule+0xf60/0x1330

(XEN) RFLAGS: 00010046   CONTEXT: hypervisor
(XEN) rax:    rbx: 82d080562500   rcx: 
fffd
(XEN) rdx: 831040ae1f80   rsi: 000d   rdi: 
831050abfe38
(XEN) rbp: 831044ba7f30   rsp: 831050abfd20   r8:  
831050ac61a0
(XEN) r9:  82d08022eee0   r10: 0001   r11: 
83007dc11060
(XEN) r12: 000d   r13: 831050aefec0   r14: 
831050ac6180
(XEN) r15: 82d080562500   cr0: 8005003b   cr4: 
001526e0

(XEN) cr3: 7dc32000   cr2: 7fff9cb47838
(XEN) fsb:    gsb:    gss: 


(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen code around  
(sched_credit2.c#csched2_schedule+0xf60/0x1330):
(XEN)  ff ff 0f 0b 0f 1f 40 00 <0f> 0b 0f 0b 4c 89 ef e8 d4 b6 ff ff e9 
87 f1 ff

(XEN) Xen stack trace from rsp=831050abfd20:
(XEN)83107f5f6010 831050abfe38 002dd914464a 
000d
(XEN)02f8d264 8310  
8310419efdd0
(XEN)831050ac6400 82d0802344fa  
001bc97d2ea7
(XEN)002d9f005422 831050ac63c0 82d080562500 
82d080577280
(XEN)000d 0046 0282 
0082
(XEN)000d 831050ac61c8 82d080576fac 
000d
(XEN)83007dd7f000 831044ba8000 002dd914464a 
831050ac6180
(XEN)82d080562500 82d080233eed 002d 
831050ac61a0
(XEN)831050ab6010 82d080267cf5 82d0802bffe0 
82d0802c4505
(XEN)002dd61b7fed 82d08023ae48 3800380b 

(XEN) 831050ab 82d08054bc80 
82d080562500
(XEN)831050ab 82d0802375b2 82d0805771f0 
82d08054c300
(XEN)82d0805771f0 000d 000d 
82d08026d2d5
(XEN)83007dd7f000 83007dd7f000 83007dbae000 
83107cd38000
(XEN) 83107fb0d000 82d080562500 

(XEN)   

(XEN) 0

[Xen-devel] RFE: Detect NUMA misconfigurations and prevent machine freezes

2018-08-28 Thread Steven Haigh
When playing with NUMA support recently, I noticed a host would always hang 
when trying to create a cpupool for the second NUMA node in the system.

I was using the following commands:
# xl cpupool-create name=\"Pool-1\" sched=\"credit2\"
# xl cpupool-cpu-remove Pool-0 node:1
# xl cpupool-cpu-add Pool-1 node:1

After the last command, the system would hang - requiring a hard reset of the 
machine to fix.

I tried a different variation with the same result:
# xl cpupool-create name=\"Pool-1\" sched=\"credit2\"
# xl cpupool-cpu-remove Pool-0 node:1
# xl cpupool-cpu-add Pool-1 12

It turns out that the RAM was installed sub-optimally in this machine. A 
partial output from 'xl info -n' shows:
numa_info              :
node:    memsize    memfree    distances
   0:     67584      62608      10,21
   1:         0          0      21,10

A machine where we could get this working every time shows:
node:    memsize    memfree    distances
   0:     34816      30483      10,21
   1:     32768      32125      21,10

As we can deduce RAM misconfigurations in this scenario, I believe we should 
check to ensure that RAM configuration / layout is sane *before* attempting to 
split the system and print a warning.

This would prevent a hard system freeze in this scenario.
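
To illustrate the kind of pre-check I mean, here is a rough sketch that reads 
'xl info -n' output on stdin and warns about any NUMA node reporting a 
memsize of 0. The function name check_numa_memory is hypothetical, and the 
parsing assumes the numa_info table layout shown above; nothing like this 
exists in xl as far as I know.

```shell
#!/bin/sh
# Sketch: warn about NUMA nodes with no memory, based on the numa_info
# table printed by `xl info -n` (read from stdin).
# Returns non-zero if any node reports memsize 0.
check_numa_memory() {
    awk '
        # The "node: memsize memfree distances" header starts the table.
        /^[ \t]*node:/ { in_table = 1; next }
        # Rows look like "   1:   0   0   21,10"; field 2 is memsize.
        in_table && /^[ \t]*[0-9]+:/ {
            node = $1
            sub(/:$/, "", node)
            if ($2 + 0 == 0) {
                printf "warning: NUMA node %s has no memory\n", node
                bad = 1
            }
            next
        }
        # Any other line ends the table.
        in_table { in_table = 0 }
        END { exit (bad ? 1 : 0) }
    '
}

# Possible use before splitting pools:
#   xl info -n | check_numa_memory || echo "check RAM layout first"
```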

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] Unable to build staging-4.9 on RHEL6

2018-08-27 Thread Steven Haigh
On Monday, 27 August 2018 8:36:50 PM AEST Jan Beulich wrote:
> >>> On 27.08.18 at 12:03,  wrote:
> > On Monday, 27 August 2018 6:32:17 PM AEST Jan Beulich wrote:
> >> >>> On 24.08.18 at 04:56,  wrote:
> >> > When trying to build both xen and qemu-xen from the staging-4.9
> >> > branches, I'm running into issues compiling.
> >> > 
> >> > Errors start with:
> >> > 
> >> > BUILDSTDERR: sse.c: In function 'simd_test':
> >> > BUILDSTDERR: sse.c:319: error: subscripted value is neither array nor
> >> > pointer
> >> 
> >> That's the x86 insn emulator test harness afaict, which doesn't get
> >> built unless you explicitly ask for it. Why are you building it? It's
> >> well known that it requires a new enough compiler (and the bar will
> >> raise with AVX512 support, which is in the works).
> > 
> > Hi Jan,
> > 
> > I don't specifically enable any testing (that I'm aware of).
> > 
> > The current SPEC file that I'm using is at:
> > https://git.crc.id.au/netwiz/xen49/src/devel/SPECS/xen49.spec
> > 
> > Essentially it boils down to:
> > ./configure --enable-systemd --prefix=/usr --enable-xsmpolicy %{?enable_ocaml} \
> >   --libdir=%{_libdir} --enable-efi --disable-qemu-traditional \
> >   --with-extra-qemuU-configure-args="--enable-spice --enable-usb-redir"
> > 
> > (cd xen; make defconfig; sed -i 's/# CONFIG_LIVEPATCH is not set/CONFIG_LIVEPATCH=y/g' .config; make oldconfig)
> > 
> > export XEN_DOMAIN=xen.crc.id.au
> > make %{?_smp_mflags} %{?efi_flags} dist
> 
> It's this last step which is of interest afaict; how you configure
> things does not seem to matter. The dist target implies building
> dist-tools, which in turn has install-tools as a dependency. Most
> of the sub-directories under tools/tests/ have an empty
> install target in their makefiles though, which means nothing is
> going to be done when entering the respective directories.

This is actually interesting.

I tried originally doing a 'make build', then 'make DESTDIR=%{buildroot} 
install' - however I found that this didn't produce any EFI binaries in 
/boot/efi/efi/$EFI_VENDOR/ like the 'make dist' option does.

This seemed to persist across all versions (4.6 - 4.10) hence the path that I 
had taken.

Should this then be a part of fixing why no EFI binaries exist in the make 
build / make install method, and not trying to fix a test as a side effect of 
having to use 'make dist'?
 
> > I'll note that 4.6, 4.7, and 4.10 do not fail in this fashion using almost
> > the same command set to build.
> 
> I can't spot any relevant difference between the versions you
> mention (or master). I'm afraid you'll need to dig a little deeper
> yourself (unless someone else has an idea).
> 
> Jan



-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] Unable to build staging-4.9 on RHEL6

2018-08-27 Thread Steven Haigh
On Monday, 27 August 2018 6:32:17 PM AEST Jan Beulich wrote:
> >>> On 24.08.18 at 04:56,  wrote:
> > When trying to build both xen and qemu-xen from the staging-4.9
> > branches, I'm running into issues compiling.
> > 
> > Errors start with:
> > 
> > BUILDSTDERR: sse.c: In function 'simd_test':
> > BUILDSTDERR: sse.c:319: error: subscripted value is neither array nor
> > pointer
> 
> That's the x86 insn emulator test harness afaict, which doesn't get
> built unless you explicitly ask for it. Why are you building it? It's
> well known that it requires a new enough compiler (and the bar will
> raise with AVX512 support, which is in the works).

Hi Jan,

I don't specifically enable any testing (that I'm aware of).

The current SPEC file that I'm using is at:
https://git.crc.id.au/netwiz/xen49/src/devel/SPECS/xen49.spec

Essentially it boils down to:
./configure --enable-systemd --prefix=/usr --enable-xsmpolicy %{?enable_ocaml} \
  --libdir=%{_libdir} --enable-efi --disable-qemu-traditional \
  --with-extra-qemuU-configure-args="--enable-spice --enable-usb-redir"

(cd xen; make defconfig; sed -i 's/# CONFIG_LIVEPATCH is not set/CONFIG_LIVEPATCH=y/g' .config; make oldconfig)

export XEN_DOMAIN=xen.crc.id.au
make %{?_smp_mflags} %{?efi_flags} dist

I'll note that 4.6, 4.7, and 4.10 do not fail in this fashion using almost the 
same command set to build.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



[Xen-devel] Unable to build staging-4.9 on RHEL6

2018-08-23 Thread Steven Haigh

Hi all,

When trying to build both xen and qemu-xen from the staging-4.9 
branches, I'm running into issues compiling.


Errors start with:

BUILDSTDERR: sse.c: In function 'simd_test':
BUILDSTDERR: sse.c:319: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse.c:320: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse.c:324: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse.c:328: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse.c:334: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse.c:340: error: can't convert between vector values of different size
BUILDSTDERR: sse.c:345: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse.c:368: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse.c:377: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse.c:380: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse.c:405: error: invalid operands to binary == (have 'float __vector__' and 'vec_t')
BUILDSTDERR: sse.c:538: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse.c:555: error: invalid operands to binary == (have 'float __vector__' and 'vec_t')
BUILDSTDERR: sse.c:569: error: invalid operands to binary == (have 'vec_t' and 'int')
BUILDSTDERR: sse.c:645: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse.c:645: error: subscripted value is neither array nor pointer

BUILDSTDERR: make[6]: *** [sse.bin] Error 1
BUILDSTDERR: sse2.c: In function 'simd_test':
BUILDSTDERR: sse2.c:319: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse2.c:320: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse2.c:324: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse2.c:328: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse2.c:334: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse2.c:340: error: can't convert between vector values of different size
BUILDSTDERR: sse2.c:345: error: invalid operands to binary == (have 'vec_t' and 'vec_t')
BUILDSTDERR: sse2.c:569: error: invalid operands to binary == (have 'vec_t' and 'int')
BUILDSTDERR: sse2.c:645: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse2.c:645: error: subscripted value is neither array nor pointer
BUILDSTDERR: sse2.c:651: error: invalid operands to binary > (have 'vec_t' and 'int')
BUILDSTDERR: sse2.c:653: error: invalid operands to binary == (have 'vec_t' and 'vec_t')

BUILDSTDERR: make[6]: *** [sse2.bin] Error 1

This continues for sse4, sse-avx, sse2-avx, and sse4-avx.

Right now, the code I'm using for this build is:

git clone -n -b staging-4.9 git://xenbits.xen.org/xen.git .
git checkout %{gitid}
pushd tools
git clone -b staging-4.9 git://xenbits.xen.org/qemu-xen.git
popd

gitid is currently set to: 6c9d139cdd0289f2b35b5deea4b41b8e3e1b39b7

--
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


[Xen-devel] Backport request - 3a2b8525b883baa87fe89b3da58f5c09fa599b99 to staging-4.9

2018-08-23 Thread Steven Haigh

Commit:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=3a2b8525b883baa87fe89b3da58f5c09fa599b99

I've also run across this building the staging-4.9 branch on RHEL7.

Build errors with:
gcc  -m64 -DBUILD_ID -fno-strict-aliasing -std=gnu99 -Wall 
-Wstrict-prototypes -Wdeclaration-after-statement 
-Wno-unused-but-set-variable -Wno-unused-local-typedefs   -O2 
-fomit-frame-pointer 
-D__XEN_INTERFACE_VERSION__=__XEN_LATEST_INTERFACE_VERSION__ -MMD -MF 
.xs-test.o.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE   -Werror 
-I/builddir/build/BUILD/tools/tests/xenstore/../../../tools/xenstore/include 
-I/builddir/build/BUILD/tools/tests/xenstore/../../../tools/include  -c 
-o xs-test.o xs-test.c

BUILDSTDERR: xs-test.c: In function 'call_test':
BUILDSTDERR: xs-test.c:111:8: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
BUILDSTDERR:  if ( ret )
BUILDSTDERR: ^

--
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] Why Xen Project need an OS for doing Virtualization?

2018-07-15 Thread Steven Haigh

On 2018-07-16 06:40, Jason Long wrote:

Hello.
If Xen Project is a type-1 Hypervisor then why it need an OS for doing
Virtualization?


A quick google search should have pointed you to this:
https://wiki.xenproject.org/wiki/Dom0

--
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...

2018-07-04 Thread Steven Haigh
On Thursday, 5 July 2018 1:47:27 AM AEST Ian Jackson wrote:
> George Dunlap writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
session] Process changes: is the 6 monthly release Cadence too short, Security 
Process, ..."):
> > I seem to recall saying that even if we agreed that moving to continuous
> > delivery was a goal we wanted to pursue, we would still be several years
> > away from achieving anything like it; and so in the mean time, it would
> > probably make sense to move back to a 9-month cycle while we attack the
> > problem.
> Another thing that is that as our window of N years'
> security-supported releases has filled up with ~6-month releases,
> there are more of them.
> 
> I know we had concerns that this makes backporting harder.  I'm not
> really sure that's true.  The total amount of backporting lossage
> (merge conflicts etc.) is the same, and trivial automatic backports
> are nearly no work.
> 
> But one thing that is noticeable is that this significantly increases
> our test load when a security update comes out.  Each
> security-supported branch gets updates, and osstest suddenly needs to
> test them all.

I did like the idea of an 'LTS' vs 'testing' release: odd / even versions (or
similar) would vary in lifecycle, allowing entire versions to be killed off
rapidly.

If we had 'LTS' support at 2 (or more?) years, and 'testing' support at 6 (or 
less?) months, would this help?

I guess the idea would be that the 'testing' versions are where all the rapid
features are added, which are then 'frozen' into an LTS release at some point
that only gets security fixes via point releases (à la Linux kernel version
bumping).

Is this even practical? I guess this would depend on how often an LTS release 
is spawned...

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...

2018-07-04 Thread Steven Haigh
On Thursday, 5 July 2018 1:26:16 AM AEST George Dunlap wrote:
> > On Jul 3, 2018, at 11:07 AM, Roger Pau Monné wrote:
> > 
> > On Mon, Jul 02, 2018 at 06:03:39PM +, Lars Kurth wrote:
> > 
> >> We then had a discussion around why the positive benefits didn't
> >> materialize:
> >> * Andrew and a few other believe that the model isn't broken, but that
> >>   the issue is with how we develop. In other words, moving to a 9 months
> >>   model will *not* fix the underlying issues, but merely provide an
> >>   incentive not to fix them.
> >> 
> >> * Issues highlighted were:
> >> 
> >>   * 2-3 months stabilizing period is too long
> > 
> > 
> > I think one of the goals with the 6 month release cycle was to shrink
> > the stabilizing period, but it didn't turn that way, and the
> > stabilizing period is quite similar with a 6 or a 9 month release
> > cycle.
> 
> 
> Right, and I think this was something that wasn’t quite captured in Lars’
> summary.
> 
> Everyone agreed:
> 1. The expectation was that a shorter release cycle would lead to shorter
> stabilization periods
> 2. This has not turned out to be the case, which means
> 3. At the moment, our “time doing development” to “time fixing bugs for a
> release” ratio is far too low.
> 
> One option to fix #3 is to go back to a 9-month cycle (or even a 12-month
> cycle), which would increase the “development” part of the equation.
> 
> But Doug was advocating trying instead to attack the “time fixing bugs” part
> of the equation.  He said he was a big fan “continuous delivery” — of being
> *always* ready to release.  And I think there’s a fair amount of agreement
> that one of the reasons it takes so long to stabilize is that our testing
> isn’t reliably catching bugs for whatever reason.

On this point alone, release quickly, release often. The kernel for instance 
runs about 1 release per week on the stable branches - sometimes 2.

With a regular process, releases should be easy to achieve - and that means
automation, automation and more automation.

From my end as a 'consumer' of this process, a regular process makes it easy 
for me to rebase work based on a regular release. In fact, it usually takes me 
less than 30 minutes to update, compile, package and release kernel builds - 
and all of that is compile time. I check kernel.org every 6 hours, and fire 
off automated builds as needed. The reality is, I could easily make this 
hourly and reduce time from release to package to less than an hour - but at 
that point is it worth it? :)
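
A minimal sketch of the "is there anything new to build?" decision behind such a
poller, in Python. The function names and version strings are illustrative
assumptions, not the actual build tooling; fetching the latest version from
kernel.org is deliberately left out.

```python
# Hypothetical sketch: decide whether a newly published kernel version
# needs a package rebuild. All names here are illustrative assumptions.

def parse_version(version):
    """Turn '4.14.25' into (4, 14, 25) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_build(latest_upstream, last_packaged):
    """Return True when upstream is newer than the last packaged version."""
    return parse_version(latest_upstream) > parse_version(last_packaged)


if __name__ == "__main__":
    # A job run every 6 hours would pass in the version it just fetched.
    print(needs_build("4.14.26", "4.14.25"))
    print(needs_build("4.14.25", "4.14.25"))
```

Comparing integer tuples rather than raw strings matters here: as strings,
"4.14.9" would sort after "4.14.25".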

The longest delay on getting this to the clients is the time it takes for non-
local mirrors to catch up.
 
> So a fair amount of the discussion was about what it would look like, and
> what it would take, to make it such that almost any push from osstest (or
> whatever testing infrasctructure we went with) could reasonably be
> released, and would have a very low expectation of having extraneous bugs.

The key here is testing. If we started at the release end of the tree,
something as simple as a git tag triggering a tarball export of the tree,
versioned as per the tag, may well be a huge step forward in the release
process.
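
As a rough sketch of the tag-to-tarball idea (Python): the helper below only
builds the `git archive` command for a tag. The `RELEASE-<version>` tag naming
and the `xen-<version>` tarball naming are assumptions for illustration, and a
real release tarball may need more than a plain tree export.

```python
# Hypothetical sketch: map a release tag to a `git archive` invocation.
# The tag prefix and output naming are assumptions, not a documented process.

TAG_PREFIX = "RELEASE-"


def archive_command(tag, project="xen"):
    """Build the git command that exports `tag` as <project>-<version>.tar.gz."""
    if not tag.startswith(TAG_PREFIX):
        raise ValueError("not a release tag: %r" % tag)
    version = tag[len(TAG_PREFIX):]
    name = "%s-%s" % (project, version)
    return [
        "git", "archive",
        "--format=tar.gz",
        "--prefix=%s/" % name,      # top-level directory inside the tarball
        "-o", "%s.tar.gz" % name,   # output file name
        tag,
    ]


if __name__ == "__main__":
    # A tag-triggered hook could hand this list to subprocess.run().
    print(" ".join(archive_command("RELEASE-4.9.3")))
```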

Then time can be taken after this in tweaking the testing part as we find 
things that become obvious through the rapid release process.

> I seem to recall saying that even if we agreed that moving to continuous
> delivery was a goal we wanted to pursue, we would still be several years
> away from achieving anything like it; and so in the mean time, it would
> probably make sense to move back to a 9-month cycle while we attack the
> problem.

Honestly, as a packager, I don't see ground-breaking changes in any version of 
Xen that is currently released. Most are optimisations like the new PVH code.

My point here is that really, are there enough major features that would break 
everything to require a new major release every 9 months?

Would 12 months for major features be more suitable and keep smaller 
refinements / additions as point releases?

With a solid base via testing, there's no real reason why a weekly (or daily) 
release wouldn't be technically feasible - apart from not having enough 
changes to justify it ;)

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] Xen Project Security Process Whitepaper v1 is ready for community review

2018-06-27 Thread Steven Haigh
On Wednesday, 27 June 2018 7:19:58 PM AEST Jan Beulich wrote:
> >>> On 27.06.18 at 06:05,  wrote:
> > Right now, we're at a stage where we could probably justify a new release
> > of 4.6, 4.7, 4.8, 4.9, and 4.10 due to the depth of XSAs contained within
> > that can't be patched on top of the release archive.
> 
> 4.7.6 and 4.8.4 are imminent anyway, and 4.9.3 is due in about a
> month's time (I'll send a respective call for pointing out missing
> backports once I've flushed out my own queue). There's not going to
> be another release off the 4.6 branch, at least not one organized by
> XenProject. Even us meaning to do so for 4.7 is only because of the
> circumstances.
> 
> As mentioned before - personally I'm not fancying to do more frequent
> stable releases.

Surely we are able to automate the majority of the process?

I could imagine that with a regular release schedule, it could be refined 
enough to automatically package the current git branch based on just 
committing a tag.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] Xen Project Security Process Whitepaper v1 is ready for community review

2018-06-26 Thread Steven Haigh
On Tuesday, 5 June 2018 8:34:28 PM AEST George Dunlap wrote:
> On Mon, Jun 4, 2018 at 3:55 PM, Lars Kurth  wrote:
> > 2.2.3 B. Git baseline of patches
> > This created quite a bit of discussion and we did learn a few things:
> > * From the thread, having to cherry pick a small (around 5-6) patches have
> >   to be cherry-picked for XSAs to apply to tarballs this appears to be seen
> >   as OK for most users. More patches are a problem
> > * Recently this issue has become much worse, because some security fixes
> >   (or pre-requisites for them) have been developed in public and some XSAs
> >   required significant backporting to be able to be run
> > * A point release has usually <50% security fixes
> > * There is no appetite amongst existing point release maintainers to
> >   maintain a staging branch and an XSA + pre-requisites only branch
> > 
> > In other words, we are at a stale-mate. I see two ways around it
> > a) Find an additional volunteer to maintain XSA + pre-requisites only
> > branches for releases
> > b) Find some tooling/test based solution which
> > exposes issues applying XSAs on the last releases of a staging branch for
> > a point release. This is a little bit of a half-baked idea, but it may be
> > worthwhile looking into. For example, we could create an OSSTEST, that
> > checks out the last released stable branch and applies outstanding XSAs
> > and pre-requisites based on the meta-info to it (e.g. via xsatool or a
> > variant thereof). This test would fail, if an XSA does not apply, which
> > implies that the pre-requisites are incomplete. If all XSAs apply, we can
> > run the full OSSTEST on it. The test could also produce a list of git
> > commits from staging that include XSAs and pre-requisites that can be
> > applied in order. This should in theory - if doable - help downstreams
> > which are struggling with this problem, while flagging up potential
> > issues to stable maintainers early. Any thoughts? Would this be workable
> > and if so, would it actually help?
> Here's a question:  What would it take for most downstreams to update
> to staging when a public release was made?
> 
> Suppose we did this:
> 1. When we predisclose an issue, freeze the stable branches until the
> embargo lifts -- no backports.
> 2. When the embargo lifts, addition to the patches, we release a new
> point release, complete with signed tag and tarball.
> 3. We only do non-security point releases if we go 4 months without a
> security-prompted point release.
> 
> At the moment the release process is quite manual, which isn't
> terrible for one point release every 4 months per supported release,
> but would significantly increase the workload if we did it for every
> supported version for every XSA.  We'd have to invest quite a bit in
> automating that process, which would make it only worth it if a
> significant number of people would find that useful.
> 
> The other thing we could probably do is write a tool which would
> automatically determine the minimum number of 'extra' patches to
> backport from the stable branch to allow the patch to apply and build.
> The issue with that, of course, is that such a branch will be an
> artificial branch which has almost no testing.

I just wanted to give this a bit of a prod to keep it current.

Right now, we're at a stage where we could probably justify a new release of 
4.6, 4.7, 4.8, 4.9, and 4.10 due to the depth of XSAs contained within that 
can't be patched on top of the release archive.

It's actually easier to rebuild packages often by just updating version
numbers than to carry patches to eternity. The kernel packages are one such
example where this can be easily automated to even build / distribute without
human interaction.
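
A minimal sketch of the "just update the version number" step, assuming an RPM
spec file with conventional `Version:` and `Release:` tags (Python). The
function name and the sample spec text are illustrative assumptions, not taken
from any real package.

```python
import re

# Hypothetical sketch: bump Version and reset Release in RPM spec text.


def bump_spec(spec_text, new_version):
    """Set Version: to new_version and reset the Release: number to 1."""
    spec_text = re.sub(
        r"(?m)^(Version:[ \t]*).*$",
        lambda m: m.group(1) + new_version,
        spec_text,
    )
    # Only the leading number is replaced, so a %{?dist} suffix is preserved.
    spec_text = re.sub(
        r"(?m)^(Release:[ \t]*)\d+",
        lambda m: m.group(1) + "1",
        spec_text,
    )
    return spec_text


if __name__ == "__main__":
    sample = "Name: xen\nVersion: 4.9.1\nRelease: 3%{?dist}\n"
    print(bump_spec(sample, "4.9.2"))
```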

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



[Xen-devel] Xen crashes with Ryzen 7 CPUs

2018-06-23 Thread Steven Haigh
Hi all,

So I recently decided to upgrade the hardware on my home server from a good 
old i5-2500k to a Ryzen 7 1700x - while throwing in 32GB of DDR4 2400.

I have noticed that every time I boot Xen on the Ryzen 7, within a couple of 
minutes, the system crashes.

As this happens on both RHEL7 and Fedora 28, I figure there's something more
going on here.

Does anyone have a set of recommendations or experiences of running Xen on a 
Ryzen CPU?

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] Xen Project Security Whitepaper v1 is ready for community review

2018-05-23 Thread Steven Haigh

On 2018-05-22 20:52, Steven Haigh wrote:

On Tuesday, 22 May 2018 8:11:38 PM AEST Jan Beulich wrote:

>>> On 18.05.18 at 19:53, <marma...@invisiblethingslab.com> wrote:
> Alternative workaround for this would be more frequent point releases by
> default (maybe with ability to delay it very few commits are queued).
> For example every 3 months. It wouldn't solve all the cases, but I think
> will make it easier most of the time.

Is every 3 months so much better than every 4 months? Granted we
basically never manage to make it exactly 4 months, but on the average
I think we're not too far off.


I think the big thing is reducing the delta between the staging branch and the
release. I can only assume that would reduce the number of issues that occur
with patching vs release tarballs - hopefully making the security teams job a
little easier.

That being said, if an approach of releasing a new build when we come across
broken patch sets for XSAs (like the current 4.9.1 vs XSAs, and prior 4.10.0
vs XSAs), then I think this part becomes irrelevant.


As another example for this, the patches for XSA263 do not apply to 
*any* released tarball version of Xen.


So far, the patches included with the announcement fail on 4.6, 4.7, 4.9 
and 4.10.


I can only assume that this means all the XSA patches require commits 
that are currently in various staging git trees that have not been 
released in any formal manner via a point release.


--
Steven Haigh

net...@crc.id.au    https://www.crc.id.au
+61 (3) 9001 6090    0412 935 897


Re: [Xen-devel] Xen Project Security Whitepaper v1 is ready for community review

2018-05-22 Thread Steven Haigh
On Tuesday, 22 May 2018 8:11:38 PM AEST Jan Beulich wrote:
> >>> On 18.05.18 at 19:53, <marma...@invisiblethingslab.com> wrote:
> > Alternative workaround for this would be more frequent point releases by
> > default (maybe with ability to delay it very few commits are queued).
> > For example every 3 months. It wouldn't solve all the cases, but I think
> > will make it easier most of the time.
> 
> Is every 3 months so much better than every 4 months? Granted we
> basically never manage to make it exactly 4 months, but on the average
> I think we're not too far off.

I think the big thing is reducing the delta between the staging branch and the
release. I can only assume that would reduce the number of issues that occur
with patching vs release tarballs - hopefully making the security team's job a
little easier.

That being said, if we adopt an approach of releasing a new build whenever we
come across broken patch sets for XSAs (like the current 4.9.1 vs XSAs, and
prior 4.10.0 vs XSAs), then I think this part becomes irrelevant.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] Xen Project Security Whitepaper v1 is ready for community review

2018-05-18 Thread Steven Haigh
Hi Lars,

I think this is an excellent start.

A specific concern that I have is when we get into a state between releases 
and XSAs where you cannot take the current release and then apply all released 
/ embargoed XSA patches.

The current reasoning for this is that XSA patches are developed on top of the 
staging git branches. While this is still acceptable, I believe we need the 
ability to roll a new point release that will allow end users to be up to 
date.

Expecting things to always be built for distribution from the staging git 
branch is somewhat of a hassle - as in the current case of 4.9.1. With 
publicly released XSAs, you cannot begin with a release of 4.9.1 and patch all 
post-released XSAs.

While this does not seem to happen very often - I would estimate around 4-5 
times in the past decade - we should encourage an out-of-schedule point 
release. This can be based off the current state post-XSA of the staging 
branch - but enables reproducible builds at the very least.

Recently, this situation happened with the batch of XSAs before 4.10.1 was 
released, and is currently the case of 4.9.1 + existing XSAs.

This potentially leaves end users in limbo until the next point release rolls 
around - without rebasing off a semi-random git commit (which is not 4.9.1 or 
4.9.2 - but something in between) - or backporting massive amounts of commits
to a release.

As this is a somewhat rare occasion, if only a handful of commits need to be 
cherry-picked, I would see this as fine. If it requires many more, I believe it
should trigger an out-of-cycle point release.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897

On Friday, 18 May 2018 8:13:55 PM AEST Lars Kurth wrote:
> Dear Community Members,
> 
> just under 3 months ago, we started a community consultation titled "Xen
> Security Process Consultation: is there a case to change anything?" (see
> https://lists.xenproject.org/archives/html/xen-announce/2018-02/msg0.ht
> ml). As promised, I would collate the input - together with further analysis
> trying to genuinely consider the implications of what respondents to the
> consultation have been suggesting - in a white paper. The white paper is
> attached and contains
> 
> 1) Baseline: an analysis of our XSAs and how we dealt with XSAs in the
> recent past
> 2) Results from the Community Consultation
> 2.1) Feedback received from a community consultation
> 2.2) Analysis
> 3) Recommendations and policy changes - some is quite extensive to try and
> tries to evaluate the impact of policy changes, which would result if we
> implemented solutions to issues highlighted by our users.
> 
> The next step is for community members to provide public feedback. If it
> turns out there is a case for changes/improvements, I will condense the
> output of this discussion into a concrete change proposal (or a series
> thereof) to be voted on in the usual way. This may require several
> iterations. Note that the document contains workflow and tools related
> feedback, which I did not anticipate. Some issues highlighted should be
> easy to fix, others will require additional discussion on xen-devel@, such
> as
> * Inconsistent Meta Data and XSA prerequisites
> * Git baseline of patches
> * Release cycle related (issues)
> 
> The document tries to label all discussion items, such that it is easy to
> comment. I normally attach a converted markdown version: however, this is
> unwieldly in this case, because there is a large number of tables and
> images. Thus, I have created a google doc copy which allows anyone with the
> following link
> https://docs.google.com/document/d/1FbGV4ZZB9OU8SI4b9ntnM-l6NaQLND8Yfd9u11V
> 5Q5A/edit?usp=sharing to comment on sections of the document. If you do,
> please make sure you identify yourself in the comment and/or also highlight
> feedback in the e-mail thread discussion that will follow this document.  
> 
> Please also let us know areas of the whitepaper you agree with, as this will
> make it overall easier to identify how much consensus there would be to
> address specific issues and proposals in the document. Otherwise the
> discussion will primarily focus on points of contention, while other areas
> where in fact there may be consensus, will be missed. If there is little or
> no feedback (either positive or negative), we have to assume that people
> are happy with the status quo and that there is only a weak case for
> changes. 
 
> Best Regards
> Lars
> 
> 
> 




Re: [Xen-devel] Problem with Xen 4.7.5

2018-04-05 Thread Steven Haigh
On Thursday, 5 April 2018 7:19:15 PM AEST Ian Jackson wrote:
> Steven Haigh writes ("Re: Problem with Xen 4.7.5"):
> > On 2018-04-05 03:22, Ian Jackson wrote:
> > > Apologies for the inconvenience.
> 
> ...
> 
> > I'm wondering if the severity of this is high enough that I should
> > withdraw the 4.7.5 packages from my repos.
> > 
> > Is this a show-stopper? Security issue? Dom0 crash?
> > 
> > I released 4.7.5 packages within an hour of the 4.7.5 release
> > announcement - so there are quite a few user systems that have already
> > grabbed them.
> 
> It's quite bad.  We don't understand the impact properly yet but it is
> at least a domU crash in some situations.
> 
> I would withdraw the packages, yes.  Sorry.

Thanks Ian, it's all good - I've withdrawn them and I'll send an announcement
to my user list to get them to downgrade etc...

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] Problem with Xen 4.7.5

2018-04-04 Thread Steven Haigh

On 2018-04-05 03:22, Ian Jackson wrote:


We have discovered a bug in Xen 4.7.5 (related to shadow paging).
This bug is a new regression compared to 4.7.4 and does not affect
other Xen releases.

We are investigating the problem.  For now, we recommend that users of
4.7.x do not upgrade to 4.7.5.

Apologies for the inconvenience.


Hi Ian,

I'm wondering if the severity of this is high enough that I should 
withdraw the 4.7.5 packages from my repos.


Is this a show-stopper? Security issue? Dom0 crash?

I released 4.7.5 packages within an hour of the 4.7.5 release 
announcement - so there are quite a few user systems that have already 
grabbed them.


--
Steven Haigh

net...@crc.id.au    http://www.crc.id.au
+61 (3) 9001 6090    0412 935 897


[Xen-devel] Kernel trace when using xl block-attach - 4.14.25

2018-03-17 Thread Steven Haigh
Hi all,

I've noticed recently that once every now and again I get a kernel traceback 
when attaching a drive to a DomU.

The drive in question is attached to the system via eSATA, opened via 
cryptsetup, then added to the DomU. When this crash occurs, any process trying 
to access either the /dev/mapper entry, or the DomU accessing the associated 
block device will hang and never come back.

The only real resolution is to forcefully destroy the DomU (which fails) and 
then eventually hit the reset button on the Dom0.

The trace from dmesg is as follows:
[504869.792058] xen-blkback: backend/vbd/4/51745: using 2 queues, protocol 1 (x86_64-abi) persistent grants
[504877.624108] [ cut here ]
[504877.624117] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:2725 rcu_process_callbacks+0x495/0x4b0
[504877.624118] Modules linked in: dm_crypt algif_skcipher af_alg xt_physdev br_netfilter iptable_filter bridge 8021q garp stp llc btrfs zstd_decompress zstd_compress xxhash it87 hwmon_vid dm_mod dax raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor crct10dif_pclmul ghash_clmulni_intel pcbc raid6_pq aesni_intel iTCO_wdt aes_x86_64 iTCO_vendor_support crypto_simd glue_helper cryptd pl2303 pcspkr usbserial sg lpc_ich mei_me mei shpchp i2c_i801 mfd_core xenfs xen_privcmd ip_tables xfs libcrc32c raid1 sd_mod i915 iosf_mbi i2c_algo_bit drm_kms_helper drm crc32c_intel r8169 serio_raw ahci libahci mii i2c_core sata_mv video xen_acpi_processor xen_pciback xen_netback xen_gntalloc xen_gntdev xen_evtchn ipv6 crc_ccitt autofs4
[504877.624167] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.25-5.el7xen.x86_64 #2
[504877.624168] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z68M-D2H, BIOS U1G 03/06/2013
[504877.624169] task: 820124c0 task.stack: 8200
[504877.624172] RIP: e030:rcu_process_callbacks+0x495/0x4b0
[504877.624173] RSP: e02b:880080203f20 EFLAGS: 00010002
[504877.624175] RAX:  RBX: 880080222cc0 RCX: 172f7355
[504877.624177] RDX: d801 RSI: 880080203f30 RDI: 880080222cf8
[504877.624178] RBP: 82059880 R08: 00025200 R09: 815dc988
[504877.624179] R10: 880080225200 R11: eaf325c0 R12: 820124c0
[504877.624180] R13: 0001 R14: 880080222cf8 R15: 7fff
[504877.624194] FS:  7fa23850c740() GS:88008020() knlGS:
[504877.624196] CS:  e033 DS:  ES:  CR0: 80050033
[504877.624197] CR2: 7f82fda14000 CR3: 61b88000 CR4: 00042660
[504877.624199] Call Trace:
[504877.624203]  
[504877.624208]  __do_softirq+0xc8/0x26b
[504877.624211]  irq_exit+0x93/0xb0
[504877.624215]  xen_evtchn_do_upcall+0x2c/0x40
[504877.624219]  xen_do_hypervisor_callback+0x29/0x40
[504877.624221]  
[504877.624224] RIP: e030:xen_hypercall_sched_op+0xa/0x20
[504877.624225] RSP: e02b:82003e90 EFLAGS: 0246
[504877.624227] RAX:  RBX: 820124c0 RCX: 810013aa
[504877.624228] RDX: ab50466e RSI:  RDI: 0001
[504877.624229] RBP:  R08: 0002 R09:
[504877.624230] R10: 7ff0 R11: 0246 R12: 820124c0
[504877.624231] R13:  R14:  R15:
[504877.624234]  ? xen_hypercall_sched_op+0xa/0x20
[504877.624238]  ? xen_safe_halt+0xc/0x20
[504877.624240]  ? default_idle+0x18/0xf0
[504877.624242]  ? do_idle+0x164/0x1a0
[504877.624244]  ? cpu_startup_entry+0x5f/0x70
[504877.624247]  ? start_kernel+0x4e1/0x4ec
[504877.624248]  ? set_init_arg+0x55/0x55
[504877.624251]  ? xen_start_kernel+0x52e/0x538
[504877.624253] Code: 12 ff 5e 00 e9 45 fc ff ff 0f 0b e9 23 fc ff ff 0f 0b 0f 1f 40 00 e9 f3 fc ff ff 0f 0b 66 0f 1f 84 00 00 00 00 00 e9 d5 fb ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 d0 fd ff ff 90 66 2e 0f 1f
[504877.624290] ---[ end trace 62909e6ca83c56bf ]---

Has anyone seen this before or is aware of its cause?

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] Xen Security Advisory 255 - grant table v2 -> v1 transition may crash Xen

2018-02-27 Thread Steven Haigh
On Wednesday, 28 February 2018 1:36:14 AM AEDT George Dunlap wrote:
> On 02/27/2018 02:22 PM, Jan Beulich wrote:
> >>>> On 27.02.18 at 13:37, <net...@crc.id.au> wrote:
> >> On Tuesday, 27 February 2018 11:00:08 PM AEDT Xen. org security team 
wrote:
> >>> RESOLUTION
> >>> ==
> >>> 
> >>> Applying the appropriate attached patch resolves this issue.
> >>> 
> >>> xsa255-?.patch xen-unstable, Xen 4.10.x
> >>> xsa255-4.9-?.patch Xen 4.9.x, Xen 4.8.x
> >>> xsa255-4.7-?.patch Xen 4.7.x
> >>> xsa255-4.6-?.patch Xen 4.6.x
> >> 
> >> Is there a missing pre-requisite patch required for 4.6.6?
> >> 
> >> I'm currently getting a failure on these patches as follows:
> >> 
> >> Patch #55 (xsa255-4.6-1.patch):
> >> + echo 'Patch #55 (xsa255-4.6-1.patch):'
> >> + /bin/cat /builddir/build/SOURCES/xsa255-4.6-1.patch
> >> + /usr/bin/patch -p1 --fuzz=2
> >> patching file xen/arch/arm/domain.c
> >> patching file xen/arch/arm/mm.c
> >> Hunk #2 FAILED at 1075.
> >> Hunk #3 FAILED at 1090.
> >> 2 out of 3 hunks FAILED -- saving rejects to file xen/arch/arm/mm.c.rej
> > 
> > I've just applied the patches to all stable branches, and they all
> > applied fine, including the 4.6 ones. Are you perhaps missing the
> > XSA-235 fix there? In any event, as said a number of times in
> > the past, the patches we provide are against the staging branches
> > for the respective stable versions; we don't guarantee patches
> > apply to vanilla stable releases.
> 
> And as other people have said several times, most downstreams don't
> build from stable-XX, but take a tarball and add patches to it.  I
> expect Steven was asking if someone could point him to specific commits
> from stable-XX that might be required.

Hi George,

Yes, you are correct.

As XSA-235 was an ARM only issue (and I don't build anything for ARM), these 
usually get skipped in my packaging.

As XSA-255 is *both* ARM & x86, it needed that extra bit of TLC... This 
probably makes it a little unique in how XSAs are normally presented.

I did look at the two patches in XSA-255, but it looked like there is a 
combination of both ARM & x86 changes in specifically the -2 patch which led
me to the conclusion that I couldn't just remove one patch to take out the 
common and x86 parts.

I figured something was missing, but wasn't able to track it back to the patch 
from August last year.

Thanks to Jan for the pointers to the missing requirement - I've got packages 
built for 4.6 now to push shortly.

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897



Re: [Xen-devel] Xen Security Advisory 255 - grant table v2 -> v1 transition may crash Xen

2018-02-27 Thread Steven Haigh
On Tuesday, 27 February 2018 11:00:08 PM AEDT Xen. org security team wrote:
> Xen Security Advisory XSA-255
>   version 3
> 
>  grant table v2 -> v1 transition may crash Xen
> 

> RESOLUTION
> ==
> 
> Applying the appropriate attached patch resolves this issue.
> 
> xsa255-?.patch xen-unstable, Xen 4.10.x
> xsa255-4.9-?.patch Xen 4.9.x, Xen 4.8.x
> xsa255-4.7-?.patch Xen 4.7.x
> xsa255-4.6-?.patch Xen 4.6.x

Is there a missing pre-requisite patch required for 4.6.6?

I'm currently getting a failure on these patches as follows:

Patch #55 (xsa255-4.6-1.patch):
+ echo 'Patch #55 (xsa255-4.6-1.patch):'
+ /bin/cat /builddir/build/SOURCES/xsa255-4.6-1.patch
+ /usr/bin/patch -p1 --fuzz=2
patching file xen/arch/arm/domain.c
patching file xen/arch/arm/mm.c
Hunk #2 FAILED at 1075.
Hunk #3 FAILED at 1090.
2 out of 3 hunks FAILED -- saving rejects to file xen/arch/arm/mm.c.rej

The patches for 4.7, 4.9 and 4.10 seem to apply successfully.
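
For anyone wanting to catch this kind of failure automatically in a build log,
a small sketch (Python) that pulls the failed hunks out of `patch` output like
the above. It assumes the output format shown in this message; the function
name is illustrative.

```python
import re

# Hypothetical sketch: extract failed hunks from GNU patch output.
PATCHING = re.compile(r"^patching file (.+)$")
HUNK_FAILED = re.compile(r"^Hunk #(\d+) FAILED at (\d+)\.")

SAMPLE = """\
patching file xen/arch/arm/domain.c
patching file xen/arch/arm/mm.c
Hunk #2 FAILED at 1075.
Hunk #3 FAILED at 1090.
2 out of 3 hunks FAILED -- saving rejects to file xen/arch/arm/mm.c.rej
"""


def failed_hunks(patch_output):
    """Return (file, hunk number, line) for every hunk that failed to apply."""
    failures = []
    current_file = None
    for line in patch_output.splitlines():
        line = line.strip()
        match = PATCHING.match(line)
        if match:
            # Remember which file the following hunk messages refer to.
            current_file = match.group(1)
            continue
        match = HUNK_FAILED.match(line)
        if match:
            failures.append((current_file, int(match.group(1)), int(match.group(2))))
    return failures


if __name__ == "__main__":
    for path, hunk, lineno in failed_hunks(SAMPLE):
        print("%s: hunk %d failed at line %d" % (path, hunk, lineno))
```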

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] Xen release cycle revisited

2017-12-19 Thread Steven Haigh
On Tuesday, 19 December 2017 10:44:04 PM AEDT George Dunlap wrote:
> On 12/19/2017 10:42 AM, Steven Haigh wrote:
> > On Tuesday, 19 December 2017 7:47:14 PM AEDT Jan Beulich wrote:
> >>>>> On 19.12.17 at 07:58, <jgr...@suse.com> wrote:
> >>> My proposal addresses the 4.10 experience. I see the following
> >>> alternatives (assuming we want to keep the two releases per year
> >>> scheme):
> >>> 
> >>> 1. Leave everything as is
> >>> 
> >>>    Pro: seems to work for the June release
> >>>    Con: release date for the December release is risky
> >>> 
> >>> 2. Move releases one month earlier, freeze dates as well (my proposal)
> >>> 
> >>>    Pro: more time for release at end of the year
> >>>    Con: freeze date end of February at end of Chinese New Year holidays
> >>>         in some years (2018 not applicable, as we would move that
> >>>         release by 2 weeks only, so next time this will really hit us
> >>>         will be 2026, maybe a little bit in 2021)
> >>> 
> >>> 3. Move releases one month earlier, freeze dates before holidays
> >>> 
> >>>    Pro: developers won't have to let feature slip due to holiday
> >>>    Con: shorter development time for _all_ developers
> >>> 
> >>> 4. Keep the June release like today, move the December release 2 or 4
> >>>    weeks earlier
> >>> 
> >>>    Pro: all Pros of 1-3
> >>>    Con: every second release will have shorter development cycle
> >> 
> >> 5. Go to a yearly release cycle, with June as expected release date.
> >> At the risk of (still) being the only one to dislike the 6-month cycle,
> >> I have to say that there, at the moment, being 4 actively maintained
> >> stable branches and 6 security maintained ones is - just like I did
> >> anticipate back when we discussed the shortening of the cycle - a
> >> significant burden. And we haven't even reached the point yet
> >> where all security maintained branches are from the 6-month cycle.
> > 
> > I've gotta agree here - I've already been skipping releases to keep up.
> > 4.8 was a complete non-starter for me, and 4.10 might be the same. It's
> > exhausting.
> 
> FWIW the CentOS Virt SIG had already decided to only consider updating
> every other release, back when the release cadence was 9 months (leaving
> a year and a half between upgrades).

Understandable.
 
> But of course, that's in part because CentOS is meant to be "stodgy and
> reliable".  Fedora and Ubuntu, for instance, have 6-month release
> cycles.  It looks like Ubuntu 17.10 has Xen 4.9 in it; and there's no
> reason to think Ubuntu 18.04 LTS won't have 4.10 in it.

Agreed - but things like Fedora evolve pretty rapidly. I'm not sure Xen is in 
the same rapid development model - so I don't think we should follow suit in 
this. It would be more than reasonable to have, say, Xen 4.10.0 in one version 
of Fedora - and 4.10.2 in the next...

> Steven, when I glanced at your site it looked like you're actively
> supporting all versions of Xen you've ever released -- is that right?
> If so it's a lot more work than I think anyone else is doing. :-)

Correct. The only versions I don't support are EOL'ed versions and 4.8 - which 
I skipped due to workload.

This means the current build list is 4.5, 4.6, 4.7, 4.9 for both EL6 and EL7.

4.2 and 4.4 got EOL'ed a while back and I don't make them available anymore.

If I hadn't skipped 4.8 and already had 4.10 out the door, then I'd be 
publishing 4.5, 4.6, 4.7, 4.8, 4.9 and 4.10. I'm not sure if that's feasible 
for anyone - let alone how it'd be managed effectively by the security team!

For me personally, December release dates for builds are horrible - so it's 
likely just about any December release would be on the ignore list - hence I 
feel a single yearly release in July (+/- up to 2 months) would be fine as a 
target.

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


Re: [Xen-devel] Xen release cycle revisited

2017-12-19 Thread Steven Haigh
On Tuesday, 19 December 2017 7:47:14 PM AEDT Jan Beulich wrote:
> >>> On 19.12.17 at 07:58, <jgr...@suse.com> wrote:
> > My proposal addresses the 4.10 experience. I see the following
> > alternatives (assuming we want to keep the two releases per year
> > scheme):
> > 
> > 1. Leave everything as is
> > 
> >    Pro: seems to work for the June release
> >    Con: release date for the December release is risky
> > 
> > 2. Move releases one month earlier, freeze dates as well (my proposal)
> > 
> >    Pro: more time for release at end of the year
> >    Con: freeze date end of February at end of Chinese New Year holidays
> >         in some years (2018 not applicable, as we would move that
> >         release by 2 weeks only, so next time this will really hit us
> >         will be 2026, maybe a little bit in 2021)
> > 
> > 3. Move releases one month earlier, freeze dates before holidays
> > 
> >    Pro: developers won't have to let feature slip due to holiday
> >    Con: shorter development time for _all_ developers
> > 
> > 4. Keep the June release like today, move the December release 2 or 4
> >    weeks earlier
> > 
> >    Pro: all Pros of 1-3
> >    Con: every second release will have shorter development cycle
> 
> 5. Go to a yearly release cycle, with June as expected release date.
> At the risk of (still) being the only one to dislike the 6-month cycle,
> I have to say that there, at the moment, being 4 actively maintained
> stable branches and 6 security maintained ones is - just like I did
> anticipate back when we discussed the shortening of the cycle - a
> significant burden. And we haven't even reached the point yet
> where all security maintained branches are from the 6-month cycle.

I've gotta agree here - I've already been skipping releases to keep up. 4.8 
was a complete non-starter for me, and 4.10 might be the same. It's exhausting.

I'm not sure there are really enough under-the-hood changes to justify a 
6-month rapid release cycle.

It adds extra load on the security team, packagers, distro builders etc. 
which could probably be avoided with a sane release cycle.

I would think one release per year + point releases / roll ups with all XSAs 
and backported fixes every quarter would be fantastic - as well as a 'master' 
that people can build off git at will...

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


[Xen-devel] Windows PV drivers and Windows 10 / Windows Server 2016

2017-12-12 Thread Steven Haigh
Hi all,

Re the Windows PV drivers - I've tried v8.2.0 on Windows 10, and I had to put 
Windows into TEST MODE before it would load the drivers. Taking it out of 
test mode results in the Xen PV drivers being uninstalled.

I now have to create a Windows Server 2016 DomU, and I'm wondering whether 
there is any way to install the PV drivers without leaving it in TEST MODE 
for the rest of its life?

-- 
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897
