Bug#1071480: libldap: sends some IPv6 addresses as server name
On Mon, May 20, 2024 at 04:25:57PM -0700, Quanah Gibson-Mount wrote:
> --On Monday, May 20, 2024 3:45 PM -0700 Elliott Mitchell wrote:
>
> Side note - I did raise this issue with the rest of the OpenLDAP project,
> and Howard noted:
>
> "DNS names are required to begin with a letter. RFC 1035, sec 2.3.1. The
> fact that gnutls allows names that are all numeric is certainly their bug".
>
> So I guess two bugs here.

According to what I found, that requirement was removed. This doesn't
invalidate the fact that no top-level domain consists exclusively of numbers
(in fact I'm pretty sure none have any numbers).

I'm proposing checking only for nul-characters and passing everything else
through. The principle is that anything handling SNI must already handle the
case of a string which fails to match a known entry. If a server program
chooses to honor strings which violate RFC 6066, GnuTLS doesn't need to get
in the way of that. Simply terminating the connection really isn't too
helpful (it could simply be a bug).

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
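As a rough illustration of the "check only for nul-characters" proposal above, here is a minimal sketch in C. The function name `sni_name_has_no_nul` is hypothetical and is not part of GnuTLS; it assumes the received name arrives as a length-delimited buffer, as it does on the wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT part of GnuTLS: returns true when the
 * length-delimited name contains no embedded NUL byte. Everything
 * else (digits, colons, dots) is passed through unjudged, leaving
 * the decision to whatever matches the name against known entries.
 */
bool sni_name_has_no_nul(const void *name, size_t len)
{
    assert(name != NULL);
    return memchr(name, '\0', len) == NULL;
}
```

With a check like this an IPv6 literal would be passed on to the application rather than causing the handshake to be dropped; whether that is desirable is exactly the policy question being argued in this thread.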
Bug#1071480: libldap: sends some IPv6 addresses as server name
On Mon, May 20, 2024 at 12:46:34PM -0700, Ryan Tandy wrote:
> However, I tested your patch, and I'm not sure it's correct.
>
> If the IPv6 address contains a letter a-f before the first colon, I
> think the code you changed is never reached. On seeing the first
> non-digit, we break the loop with numeric=0, and never reach the colon.
>
> Have I missed something?
>
> I would appreciate if you would pursue this issue upstream. If the fix
> needs further review or discussion with the upstream developers, I'd
> really rather not be a middleman in that conversation.

No, you haven't missed anything. %-) Turns out I goofed when reading the
loop. Indeed the `if(!isdigit(*c)) {` needs to have the `break;` removed
too, then it will work.

The person writing the loop was thinking of the most commonly used block of
IPv6 addresses, which start with "2001:". Yet IPv6 is hexadecimal and
"fd00::/8" is part of a validly used block.

On Mon, May 20, 2024 at 01:13:11PM -0700, Quanah Gibson-Mount wrote:
> --On Monday, May 20, 2024 1:46 PM -0700 Ryan Tandy wrote:
>
> > Control: tag -1 upstream moreinfo
> >
> > Hi Elliott, thank you for investigating this issue and contributing a
> > patch.
>
> [snip]
>
> > I would appreciate if you would pursue this issue upstream. If the fix
> > needs further review or discussion with the upstream developers, I'd
> > really rather not be a middleman in that conversation.
>
> Upstream generally does not accept 3rd party patch contributions, so asking
> debian to contribute it will likely result in it not being accepted. So
> it's better to work directly with the OpenLDAP project. I'd start by
> filing an issue in the issue tracker if one doesn't already exist:
>
> https://bugs.openldap.org
>
> and then apply for a gitlab account with the OpenLDAP project:
>
> https://git.openldap.org
>
> After the account is approved, you can open a PR to have your patch
> evaluated.

Debian policy is that maintainers are required to take care of pushing
issues upstream. I didn't want to deal with the OpenLDAP bug tracker and
those steps, so pushing to the Debian project seemed handiest.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1071480: libldap: sends some IPv6 addresses as server name
Seems there were two bugs in #1070033. The part for OpenLDAP is pretty
simple.

When detecting an IPv6 address (via ':' in the string), the function
`ldap_int_tls_connect()` triggers a `break;`, but this requires `numeric=1`
to still be in effect. Since IPv6 addresses are hexadecimal, this isn't
always true.

Patch attached. Given how small it is, any license acceptable to the Debian
project is acceptable to me. I'll let the maintainer forward it to the
OpenLDAP project.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

From: Elliott Mitchell
Date: Sun, 19 May 2024 09:49:36 -0700
Subject: [PATCH] tls: fix handling of numeric IPv6 addresses for SNI

A colon in the SNI is a strong indicator of an IPv6 address. Since IPv6
addresses are hexadecimal, `numeric` may already be false and falling
through to the test doesn't work. Address this by preemptively setting
`sni` to invalid (NULL).

Fixes: b8f34888 ("ITS#9176 check for numeric addrs before passing SNI")
---
 libraries/libldap/tls2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libraries/libldap/tls2.c b/libraries/libldap/tls2.c
index f9dcbfc8d..d433e6508 100644
--- a/libraries/libldap/tls2.c
+++ b/libraries/libldap/tls2.c
@@ -399,8 +399,10 @@ ldap_int_tls_connect( LDAP *ld, LDAPConn *conn, const char *host )
 		int numeric = 1;
 		unsigned char *c;
 		for ( c = (unsigned char *)sni; *c; c++ ) {
-			if ( *c == ':' )	/* IPv6 address */
+			if ( *c == ':' ) {	/* IPv6 address */
+				sni = NULL;
 				break;
+			}
 			if ( *c == '.' )
 				continue;
 			if ( !isdigit( *c )) {
-- 
2.39.2
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Sat, May 18, 2024 at 10:47:55AM +0200, Andreas Metzler wrote: > On 2024-05-18 Elliott Mitchell wrote: > > On Sat, May 18, 2024 at 08:16:25AM +0200, Andreas Metzler wrote: > [...] > > >> You seem to argue that it is major problem for a gnutls client to *send* > >> e.g. "127.0.0.1" as SNI. My point is that this is not a problem but at > >> most uncomely since client-side certificate verification will fail. > >> Even for a trusted certificate name checking is done (if gnutls is > >> correctly used). And this will not succeed if the CN or SAN is an IP > >> address. (I have tried with test certificates and gnutls-cli/-serv. My > >> testing might be flawed of course.) > > > This is purely hypothetical since this case isn't being observed. > > > What #1070033 is about is, a program was configured to directly connect > > to a server via IPv6. This address was provided to libgnutls. libgnutls > > sent the provided address to the server as SNI without verifying it was > > valid for SNI. > > > The usual approach is be conservative in what you send, but liberal in > > what you accept. This means libgnutls needs to check whether what is > > provided is acceptable before sending it, but the server side could > > allow an IP address which violates RFC 6066. > > > `gnutls-cli` is a very poor simulcrum for this case. `gnutls-cli` does > > lots of checking which specialized clients may skip. `gnutls-cli` also > > assumes name service is fully available. Whereas `nslcd` cannot rely on > > name service being operational as it may provide name service. > Let's assume > a) _gnutls_server_name_send_params() was changed to reject >e.g. "127.0.0.1"[1] and > b) this stopped libgnutls from sending "127.0.0.1" to the server as SNI. > > How would this help you, or how is this related to this bug report? In > this bug report perhaps an IPv6 address was used which is already > rejected by _gnutls_server_name_send_params(). This is not something I proposed and indeed this wouldn't help me. 
_gnutls_server_name_recv_params() does some rough filtering which catches IPv6 addresses, but not IPv4 addresses. _gnutls_server_name_send_params() does NO filtering and thus sends both IPv4 and IPv6 addresses. libgnutls is being conservative in what it accepts, but liberal in what it sends. This breaks interoperability. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Sat, May 18, 2024 at 08:16:25AM +0200, Andreas Metzler wrote: > On 2024-05-18 Elliott Mitchell wrote: > > On Sat, May 18, 2024 at 07:40:13AM +0200, Andreas Metzler wrote: > >> On 2024-05-18 Elliott Mitchell wrote: > >>> On Sat, May 18, 2024 at 06:55:06AM +0200, Andreas Metzler wrote: > [...] > >>>> Afaict it is a short-cut to save more expensive processing for obvious > >>>> errors. gnutls_session_get_verify_cert_status() (with > >>>> gnutls_session_set_verify_cert() set correctly) or > >>>> gnutls_x509_crt_check_hostname()/gnutls_certificate_verify_peers3() > >>>> does more elaborate stuff on the data, > >>>> gnutls_certificate_verify_peers2() requires a separate > >>>> gnutls_x509_crt_check_hostname(). > > >>> Which seems to argue the more urgent issue is > >>> _gnutls_server_name_send_params() needs to do checking of the provided > >>> server hostname before sending it as SNI. > > >> Why is this urgent or even relevant? Certificate checking (client-side) > >> will not accept IP adresses as SNI field. > > > Not relevant. If the certificate comes from a local file, it is assumed > > trusted. If the certificate comes from the server, then it is only > > available *after* connection and the SNI has already been sent. > [...] > You seem to argue that it is major problem for a gnutls client to *send* > e.g. "127.0.0.1" as SNI. My point is that this is not a problem but at > most uncomely since client-side certificate verification will fail. > Even for a trusted certificate name checking is done (if gnutls is > correctly used). And this will not succeed if the CN or SAN is an IP > address. (I have tried with test certificates and gnutls-cli/-serv. My > testing might be flawed of course.) This is purely hypothetical since this case isn't being observed. What #1070033 is about is, a program was configured to directly connect to a server via IPv6. This address was provided to libgnutls. 
libgnutls sent the provided address to the server as SNI without verifying it was valid for SNI. The usual approach is to be conservative in what you send, but liberal in what you accept. This means libgnutls needs to check whether what is provided is acceptable before sending it, but the server side could allow an IP address which violates RFC 6066. `gnutls-cli` is a very poor simulacrum for this case. `gnutls-cli` does lots of checking which specialized clients may skip. `gnutls-cli` also assumes name service is fully available, whereas `nslcd` cannot rely on name service being operational as it may itself provide name service. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Sat, May 18, 2024 at 07:40:13AM +0200, Andreas Metzler wrote: > On 2024-05-18 Elliott Mitchell wrote: > > On Sat, May 18, 2024 at 06:55:06AM +0200, Andreas Metzler wrote: > [...] > > > > > I notice the `_gnutls_dnsname_is_valid()` function in > > > > > gnutls28-3.8.5/lib/str.h accepts IPv4 addresses (which are NOT valid > > > > > in > > > > > DNS), but rejects IPv6 addresses. > > > > At a very bare level an IPv4 address is a valid DNS name (alnum, dashes, > > > and dots), an IPv6 adress is not. That is what gnutls is checking here. > > > No, there isn't any IPv4 address which is a valid DNS name. No top-level > > domain consists purely of decimal digits, whereas IPv4 addresses consist > > of purely decimal digits. In fact I don't believe there are any > > top-level domains which have even a single decimal digit in them. > which is totally irrelevant if my reading (quoted below) that this is > not a policy check but a performance optimization is correct. Most recent change to the line, commit 71d921edc4: Add GNUTLS_E_RECEIVED_DISALLOWED_NAME for illegal SNI names An illegal/disallowed SNI server name previously generated the misleading message "An illegal parameter has been received.". This commit changes it to "A disallowed SNI server name has been received.". That commit message clearly indicates the author was thinking of it as a policy check. > > > Afaict it is a short-cut to save more expensive processing for obvious > > > errors. gnutls_session_get_verify_cert_status() (with > > > gnutls_session_set_verify_cert() set correctly) or > > > gnutls_x509_crt_check_hostname()/gnutls_certificate_verify_peers3() > > > does more elaborate stuff on the data, > > > gnutls_certificate_verify_peers2() requires a separate > > > gnutls_x509_crt_check_hostname(). > > > Which seems to argue the more urgent issue is > > _gnutls_server_name_send_params() needs to do checking of the provided > > server hostname before sending it as SNI. > > Why is this urgent or even relevant? 
Certificate checking (client-side) > will not accept IP adresses as SNI field. Not relevant. If the certificate comes from a local file, it is assumed trusted. If the certificate comes from the server, then it is only available *after* connection and the SNI has already been sent. The issue is libgnutls's API requires providing the library with the server being connected to. libgnutls then assumes the provided server can be used for SNI, which is untrue (in this case IP addresses violate RFC 6066). -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Sat, May 18, 2024 at 06:55:06AM +0200, Andreas Metzler wrote: > On 2024-05-17 Elliott Mitchell wrote: > > On Thu, May 16, 2024 at 07:06:49PM -0700, Elliott Mitchell wrote: > > > On Tue, May 14, 2024 at 06:22:09PM +0200, Andreas Metzler wrote: > [...] > > > > Could you please post the requested output, although there are no > > > > obvious clues there to your eyes? > > > > > > Problem is that provides rather a lot of data about this network setup. > > > The quantity of information is enough for me to be rather uncomfortable > > > with providing it via public channel. > [...] > > > > I notice the `_gnutls_dnsname_is_valid()` function in > > > gnutls28-3.8.5/lib/str.h accepts IPv4 addresses (which are NOT valid in > > > DNS), but rejects IPv6 addresses. > At a very bare level an IPv4 address is a valid DNS name (alnum, dashes, > and dots), an IPv6 adress is not. That is what gnutls is checking here. No, there isn't any IPv4 address which is a valid DNS name. No top-level domain consists purely of decimal digits, whereas IPv4 addresses consist of purely decimal digits. In fact I don't believe there are any top-level domains which have even a single decimal digit in them. > Afaict it is a short-cut to save more expensive processing for obvious > errors. gnutls_session_get_verify_cert_status() (with > gnutls_session_set_verify_cert() set correctly) or > gnutls_x509_crt_check_hostname()/gnutls_certificate_verify_peers3() > does more elaborate stuff on the data, > gnutls_certificate_verify_peers2() requires a separate > gnutls_x509_crt_check_hostname(). Which seems to argue the more urgent issue is _gnutls_server_name_send_params() needs to do checking of the provided server hostname before sending it as SNI. I've got an initial implementation of this here, but I'm left wondering how far verification should go. 
-- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Thu, May 16, 2024 at 07:06:49PM -0700, Elliott Mitchell wrote: > On Tue, May 14, 2024 at 06:22:09PM +0200, Andreas Metzler wrote: > > On 2024-05-14 Elliott Mitchell wrote: > > > On Wed, May 01, 2024 at 01:45:00PM +0200, Andreas Metzler wrote: > > [...] > > >> well you could post the complete output of > > >> gnutls-cli --port 636 fd12:3456:7890:abcd::3 > > >> perhaps even with -d10? I would reassign to openldap then if there are > > >> no obvious clues. > > > > > `gnutls-cli` doesn't yield anything obvious. > > [...] > > > Could you please post the requested output, although there are no > > obvious clues there to your eyes? > > Problem is that provides rather a lot of data about this network setup. > The quantity of information is enough for me to be rather uncomfortable > with providing it via public channel. > > > I did get the connection to proceed further than before though. If I add > the IPv6 address of the LDAP server to /etc/hosts, and then use the > hostname instead of IPv6 address for the uri line of /etc/nslcd.conf > things get further (I believe over IPv6, but I haven't satisfactorily > verified this). > > This suggests #1070033 is either in libgnutls30 or slapd. The issue > could be slapd is passing an IPv6 address to a portion of libgnutls30's > API which requires a hostname. The issue could be libgnutls30 rejects > IPv6 addresses in some place(s) where they should be valid by the API. > > I notice the `_gnutls_dnsname_is_valid()` function in > gnutls28-3.8.5/lib/str.h accepts IPv4 addresses (which are NOT valid in > DNS), but rejects IPv6 addresses. Then I look deeper and find RFC 6066 (https://www.rfc-editor.org/rfc/rfc6066), page 7: Literal IPv4 and IPv6 addresses are not permitted in "HostName". This suggests there are at least 2, possibly 3 or more bugs. #1 RFC 6066 says neither are legal, yet _gnutls_dnsname_is_valid() accepts IPv4 addresses (including the 32-bit integer version), but rejects IPv6 addresses. 
This sort of inconsistency leads to security breaches. #2 The gnutls library uses the SNI extension without checking whether it was passed a literal address. #3 nslcd always passes the host string provided to its "uri" configuration setting to the gnutls API without checking whether it is a literal address. #1 is definitely a bug present in the libgnutls30 package. At least one of #2 and #3 is definitely a bug, but both may very well be bugs. Seems better to check in the library as it could affect multiple programs using the library. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Tue, May 14, 2024 at 06:22:09PM +0200, Andreas Metzler wrote: > On 2024-05-14 Elliott Mitchell wrote: > > On Wed, May 01, 2024 at 01:45:00PM +0200, Andreas Metzler wrote: > [...] > >> well you could post the complete output of > >> gnutls-cli --port 636 fd12:3456:7890:abcd::3 > >> perhaps even with -d10? I would reassign to openldap then if there are > >> no obvious clues. > > > `gnutls-cli` doesn't yield anything obvious. > [...] > Could you please post the requested output, although there are no > obvious clues there to your eyes? Problem is that provides rather a lot of data about this network setup. The quantity of information is enough for me to be rather uncomfortable with providing it via public channel. I did get the connection to proceed further than before though. If I add the IPv6 address of the LDAP server to /etc/hosts, and then use the hostname instead of IPv6 address for the uri line of /etc/nslcd.conf things get further (I believe over IPv6, but I haven't satisfactorily verified this). This suggests #1070033 is either in libgnutls30 or slapd. The issue could be slapd is passing an IPv6 address to a portion of libgnutls30's API which requires a hostname. The issue could be libgnutls30 rejects IPv6 addresses in some place(s) where they should be valid by the API. I notice the `_gnutls_dnsname_is_valid()` function in gnutls28-3.8.5/lib/str.h accepts IPv4 addresses (which are NOT valid in DNS), but rejects IPv6 addresses. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
affects 1070033 nslcd quit On Wed, May 01, 2024 at 01:45:00PM +0200, Andreas Metzler wrote: > On 2024-04-30 Elliott Mitchell wrote: > > On Tue, Apr 30, 2024 at 05:55:15AM +0200, Andreas Metzler wrote: > > > On 2024-04-29 Elliott Mitchell wrote: > [...] > > > > From `nslcd` on clients I was getting the message: > > > > nslcd[12345]: [1a2b3c] failed to bind to LDAP > > > > server ldaps://[fd12:3456:7890:abcd::3]/: Can't contact LDAP server: > > > > The TLS connection was non-properly terminated.: Resource temporarily > > > > unavailable > [...] > > > > Once I finally figured out `slapd`'s debug mode ('-h ldaps:/// > > > > ldapi:///' > > > > is two arguments, the ldaps and ldapi are a single argument). I got > > > > traces from `slapd`: (serial numbers filed off) > > > > > > > tls_read: want=5, got=5 > > > > : 16 03 01 01 8f > > > > > > > tls_read: want=399, got=399 > > > > 0160:fd12 > > > > 0170::3456:7890:abcd: > > > > 0180::3.-.@. > > > > TLS: can't accept: A disallowed SNI server name has been received.. > > > > connection_read(13): TLS accept failure error=-1 id=1005, closing > [...] > > > I guess you used the IPv6 address as either CN or Subject Alternative > > > Name. Both take names, not IP addresses. There is a different field for > > > IP addresses. > > > > > > gnutls-cli --port 636 fd12:3456:7890:abcd::3 > > > > > > will probably give more info. > > > > > > FWIW I have just generated a local test certificate with "IPAddress:" > > > set to '::1' and things work for me as expected. > > > Hmm, `gnutls-cli --port ldaps` gave a different result. The connection > > successfully established and I was left being able to type to `slapd`. > [...] > > Anything further is purely guesswork. > well you could post the complete output of > gnutls-cli --port 636 fd12:3456:7890:abcd::3 > perhaps even with -d10? I would reassign to openldap then if there are > no obvious clues. `gnutls-cli` doesn't yield anything obvious. 
Problem is there are at least 3 packages where the bug could lurk: libgnutls30's API could indicate numeric addresses are legal somewhere, but not accept IPv6 addresses (something gets fed to _gnutls_dnsname_is_valid() which shouldn't be). I notice the libgnutls30 function _gnutls_dnsname_is_valid() will return true for "127.0.0.1". This function is almost certainly wrong as it accepts IPv4 addresses (which are not valid in DNS), but rejects IPv6 addresses. nslcd could be passing something which could be an IP address to the wrong part of the libgnutls30 API. nslcd might also be sending an IP address in LDAP somewhere it is required to send a hostname. slapd could be passing something which could be an IP address to the wrong part of the libgnutls30 API. slapd might also be assuming something in LDAP is a hostname when it is valid to be an IP address. Right now _gnutls_dnsname_is_valid() seems highly suspect. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
On Tue, Apr 30, 2024 at 05:55:15AM +0200, Andreas Metzler wrote: > On 2024-04-29 Elliott Mitchell wrote: > > Package: libgnutls30 > > Version: 3.7.9-2+deb12u2 > > Severity: important > > > Long story to finding this one. Trying to get LDAP setup on this > > network. As a recent deployment it seemed appropriate to use IPv6. > > > From `nslcd` on clients I was getting the message: > > nslcd[12345]: [1a2b3c] failed to bind to LDAP server > > ldaps://[fd12:3456:7890:abcd::3]/: Can't contact LDAP server: The TLS > > connection was non-properly terminated.: Resource temporarily unavailable > > > Running `nslcd` in debug mode failed to yield any additional useful > > information. > > > Once I finally figured out `slapd`'s debug mode ('-h ldaps:/// ldapi:///' > > is two arguments, the ldaps and ldapi are a single argument). I got > > traces from `slapd`: (serial numbers filed off) > > > tls_read: want=5, got=5 > > : 16 03 01 01 8f > > > tls_read: want=399, got=399 > > 0160:fd12 > > 0170::3456:7890:abcd: > > 0180::3.-.@. > > TLS: can't accept: A disallowed SNI server name has been received.. > > connection_read(13): TLS accept failure error=-1 id=1005, closing > > > Further tracing of the error message appears to point to the function > > `_gnutls_dnsname_is_valid()` in gnutls/lib/str.h. Seems libgnutls30 is > > incompatible with numeric IPv6 addresses. > > > While IPv6-only hosts are presently uncommon, there is now quite a bit of > > IPv6 traffic in many places. I think this is worthy of having a severity > > of "critical" as "bookworm" may remain as "stable" past when there is > > more IPv6 traffic than IPv4 traffic. For "trixie" this seems very > > likely. > [...] > > Good morning, > > I guess you used the IPv6 address as either CN or Subject Alternative > Name. Both take names, not IP addresses. There is a different field for > IP addresses. > > gnutls-cli --port 636 fd12:3456:7890:abcd::3 > > will probably give more info. 
> > FWIW I have just generated a local test certificate with "IPAddress:" > set to '::1' and things work for me as expected. Hmm, `gnutls-cli --port ldaps` gave a different result. The connection successfully established and I was left being able to type to `slapd`. Unfortunately that causes there to be 3 packages which could be the one responsible for the problem. Could be libgnutls30 as I originally suspected. Yet `slapd` and `nslcd` could also be responsible for the problem. The string "A disallowed SNI server name has been received." is found in `libgnutls.so.30`. The string "connection_read(%d): input error=%d id=%lu, closing." is found in `/usr/sbin/slapd`. Anything further is purely guesswork. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1070033: libgnutls30: rejects numeric IPv6 addresses during connection
Package: libgnutls30
Version: 3.7.9-2+deb12u2
Severity: important

Long story to finding this one. Trying to get LDAP set up on this network.
As a recent deployment it seemed appropriate to use IPv6.

From `nslcd` on clients I was getting the message:

nslcd[12345]: [1a2b3c] failed to bind to LDAP server ldaps://[fd12:3456:7890:abcd::3]/: Can't contact LDAP server: The TLS connection was non-properly terminated.: Resource temporarily unavailable

Running `nslcd` in debug mode failed to yield any additional useful
information.

Once I finally figured out `slapd`'s debug mode ('-h ldaps:/// ldapi:///'
is two arguments: the -h, then the ldaps and ldapi URLs together as a
single argument), I got traces from `slapd` (serial numbers filed off):

tls_read: want=5, got=5
  : 16 03 01 01 8f
tls_read: want=399, got=399
  0160:fd12
  0170::3456:7890:abcd:
  0180::3.-.@.
TLS: can't accept: A disallowed SNI server name has been received..
connection_read(13): TLS accept failure error=-1 id=1005, closing

Further tracing of the error message appears to point to the function
`_gnutls_dnsname_is_valid()` in gnutls/lib/str.h. Seems libgnutls30 is
incompatible with numeric IPv6 addresses.

While IPv6-only hosts are presently uncommon, there is now quite a bit of
IPv6 traffic in many places. I think this is worthy of having a severity
of "critical" as "bookworm" may remain as "stable" past when there is
more IPv6 traffic than IPv4 traffic. For "trixie" this seems very
likely.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1069264: grub: chooses stale RAID1 mirror over fresh mirror
Package: grub
Version: 2.06-13+deb12u1

From `dmesg`:

md: kicking non-fresh from array!

This is using MD-RAID1. It appears GRUB is opting to load grub.cfg, the
kernel and the initial ramdisk off of this device, rather than the still
operational mirror. The result is that, without manual intervention, an
older kernel potentially gets loaded and causes problems.

I must argue this qualifies as "critical" since an older kernel might have
security holes or other problems. For now I'll leave this as "important"
since I'm unsure how many are affected by this.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#988477: Also observing #988477
tags 988477 - moreinfo
found 988477 4.17.2+76-ge1f9cb16e2-1~deb12u1
affects 988477 src:linux
severity 988477 critical
quit

I am also observing #988477 occur. This machine has an AMD Zen 4
processor. The first observation was when the motherboard/processor was
swapped out; the older motherboard/processor was several generations old.

The pattern which is emerging is Linux MD RAID1 plus a recent AMD processor
which has full IOMMU functionality. The older machine was believed to have
an IOMMU, but the BIOS wasn't creating appropriate ACPI tables (IVRS) and
thus Xen was unable to utilize it.

This seems to be occurring with a small percentage of write operations.
Subsequent read operations appear to be fine.

I am not convinced this is a Xen bug. I suspect this is instead a bug in
the Linux MD subsystem. In particular, the DMA interface may have been
designed assuming only a single device would ever access any page, while
the MD RAID1 driver reuses the same page for both devices.

IOMMU page release could be handled by marking the page unused in a device
data structure, with the entry later removed by sweeping a table. In such a
case, if the MD-RAID1 driver were to redirect the page to another device
between these two steps, the entry for a subsequent device could be wiped
out when trying to invalidate an entry for a prior device.

Anyway, I'm also observing bug #988477. This could also be a kernel bug.
So far no crashes or confirmed data loss have occurred, but sweeping the
mirror does turn up small numbers of inconsistencies.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#810964: #810964 is more kernel driver than Xen
reassign 810964 src:linux
tags 810964 -moreinfo
affects 810964 src:xen
found 810964 5.10.191-1
found 810964 6.1.52-1
found 810964 6.5.3-1
found 810964 5.10.127-2~bpo10+1
found 810964 6.1.38-4~bpo11+1
found 810964 6.4.4-3~bpo12+1
quit

Upon further investigation, while some part of #810964 may be in Xen, the
biggest issue is in the Linux kernel.

It appears MCE/EDAC support for Xen was implemented around 2008-2012. Since
that time the maintainer has changed and the new maintainer was unaware the
driver was supposed to function on Xen. As such the current maintainer has
been adding in constructs which are incompatible with operation on Xen, and
at 767f4b620eda overtly broke Xen support.

Part of the fix may require adjustments to Xen, but right now the immediate
source of breakage is the Linux kernel. As such I'm reassigning this to
src:linux.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1050030: Similar reproduction
On Fri, Aug 18, 2023 at 02:05:31PM -0700, Elliott Mitchell wrote:
> From reading the available information I suspect Tianocore/EDK2 may have
> tried to move some functionality to a distinct build and neither setup
> quite works. Notably there is now a "OvmfPkg/OvmfXen.dsc" build
> configuration. The OVMF.fd for Qemu for Xen functionality may have been
> moved /here/. There might also be an attempt at functionality similar to
> "ArmVirtPkg/ArmVirtXen.dsc" (Debian 978595) for x86.

Now confirmed: reverting to 2020.11-2+deb11u1 takes care of the issues I'm
running into.

I've been able to build OvmfPkg/OvmfXen.dsc, but haven't gotten it to do
anything. I suspect the support for running headless didn't get into
OvmfXen. I'm interacting with someone knowledgeable, but nothing yet.

I suspect the "ovmf" package needs to be split. I've gotten the impression
the build needed for normal `qemu` isn't going to be the same as the build
needed for xen-qemu.

I think what is really needed is for xen-utils-X.YY to Recommend a virtual
package "xen-domu-bootloader" which is then provided by tools which can
load VMs. The other tool currently in service is grub-xen-host, but it
appears OvmfXen may also be able to provide the service.

I'm attaching two patches which should help organize the source package.
These leave all the "./edksetup.sh" lines identical. Perhaps make use of
this to make the build cleaner?

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

From 910f6592733dbef2166ceb469320b8e21c4fa977 Mon Sep 17 00:00:00 2001
Message-Id: <910f6592733dbef2166ceb469320b8e21c4fa977.1692832840.git.ehem+deb...@m5p.com>
From: Elliott Mitchell
Date: Wed, 20 Jan 2021 17:40:15 -0800
Subject: [PATCH 1/2] debian/rules: Rework edksetup.sh invocations

Instead of using "set -e", overtly test return codes using a conditional.
Move commonly used build flags to front of command as a precursor to merging into a macro. This also makes the varying flags more overt by being on the end. --- debian/rules | 45 +++-- 1 file changed, 19 insertions(+), 26 deletions(-) diff --git a/debian/rules b/debian/rules index 116c9c74b7..36b1ffc045 100755 --- a/debian/rules +++ b/debian/rules @@ -59,8 +59,7 @@ undefine CONF_PATH override_dh_auto_build: build-qemu-efi-aarch64 build-qemu-efi-arm build-ovmf build-ovmf32 debian/setup-build-stamp: - set -e; . ./edksetup.sh; \ - make -C BaseTools ARCH=$(EDK2_BUILD_ARCH) + . ./edksetup.sh && make -C BaseTools ARCH=$(EDK2_BUILD_ARCH) touch $@ OVMF_INSTALL_DIR = debian/ovmf-install @@ -95,11 +94,10 @@ build-ovmf32: $(OVMF32_BINARIES) $(OVMF32_IMAGES) $(OVMF32_BINARIES) $(OVMF32_IMAGES): debian/setup-build-stamp rm -rf $(OVMF32_INSTALL_DIR) mkdir $(OVMF32_INSTALL_DIR) - set -e; . ./edksetup.sh; \ - build -a IA32 \ - -t $(EDK2_TOOLCHAIN) \ + . ./edksetup.sh && build -b $(BUILD_TYPE) -t $(EDK2_TOOLCHAIN) \ + -a IA32 \ -p OvmfPkg/OvmfPkgIa32.dsc \ - $(OVMF32_4M_SMM_FLAGS) -b $(BUILD_TYPE) + $(OVMF32_4M_SMM_FLAGS) cp $(OVMF32_BUILD_DIR)/FV/OVMF_CODE.fd \ $(OVMF32_INSTALL_DIR)/OVMF32_CODE_4M.secboot.fd cp $(OVMF32_BUILD_DIR)/FV/OVMF_VARS.fd \ @@ -109,38 +107,34 @@ build-ovmf: $(OVMF_BINARIES) $(OVMF_IMAGES) $(OVMF_PREENROLLED_VARS) $(OVMF_BINARIES) $(OVMF_IMAGES): debian/setup-build-stamp rm -rf $(OVMF_INSTALL_DIR) mkdir $(OVMF_INSTALL_DIR) - set -e; . ./edksetup.sh; \ - build -a X64 \ - -t $(EDK2_TOOLCHAIN) \ + . ./edksetup.sh && build -b $(BUILD_TYPE) -t $(EDK2_TOOLCHAIN) \ + -a X64 \ -p OvmfPkg/OvmfPkgX64.dsc \ - $(OVMF_2M_FLAGS) -b $(BUILD_TYPE) + $(OVMF_2M_FLAGS) cp $(OVMF_BUILD_DIR)/FV/OVMF_CODE.fd \ $(OVMF_BUILD_DIR)/FV/OVMF.fd $(OVMF_INSTALL_DIR)/ cp $(OVMF_BUILD_DIR)/FV/OVMF_VARS.fd $(OVMF_INSTALL_DIR)/ rm -rf Build/OvmfX64 - set -e; . ./edksetup.sh; \ - build -a IA32 -a X64 \ - -t $(EDK2_TOOLCHAIN) \ + . 
./edksetup.sh && build -b $(BUILD_TYPE) -t $(EDK2_TOOLCHAIN) \ + -a IA32 -a X64 \ -p OvmfPkg/OvmfPkgIa32X64.dsc \ - $(OVMF_4M_FLAGS) -b $(BUILD_TYPE) + $(OVMF_4M_FLAGS) cp $(OVMF3264_BUILD_DIR)/FV/OVMF_CODE.fd \ $(OVMF_INSTALL_DIR)/OVMF_CODE_4M.fd cp $(OVMF3264_BUILD_DIR)/FV/OVMF_VARS.fd \ $(OVMF_INSTALL_DIR)/OVMF_VARS_4M.fd rm -rf Build/OvmfX64 - set -e; . ./edksetup.sh; \ - build -a X64 \ - -t $(EDK2_TOOLCHAIN) \ + . ./edksetup.sh && build -b $(BUILD_TYPE) -t $(EDK2_TOOLCHAIN) \ + -a X64 \ -p OvmfPkg/OvmfPkgX64.dsc \ - $(OVMF_2M_SMM_FLAGS) -b $(BUILD_TYPE) + $(OVMF_2M_SMM_FLAGS) cp $(OVMF_BUILD_DIR)/FV/OVMF_CODE.fd \ $(OVMF_INSTALL_DIR)/OVMF_CODE.secboot.fd rm -rf
Bug#1050030: Similar reproduction
affects 1050030 src:xen quit I'm seeing a similar situation, though instead using FreeBSD/x86 in the VM. For FreeBSD the bootloader appears to operate normally, but something fails quickly after loading the kernel: Loading kernel... /boot/kernel/kernel text=0x18aa98 text=0xdfd150 text=0x675154 data=0x140 data=0x1c38e8+0x43b718 0x8+0x18fe70+0x8+0x1ae449/ Loading configured modules... /boot/entropy size=0x1000 /etc/hostid size=0x25 staging 0xe3e0 (not copying) tramp 0xe351b000 PT4 0xe3512000 Start @ 0x8038b000 ... EFI framebuffer information: addr, size 0xf000, 0x1d5000 dimensions 800 x 600 stride 800 masks 0x00ff, 0xff00, 0x00ff, 0xff00 I believe all these messages are from FreeBSD's bootloader. The first message from the kernel should be "---<<BOOT>>---", yet that message never shows. Xen shows the domain spinning on a single processor which makes me believe the FreeBSD kernel has loaded, panic()ed and the debugger is loaded (but there is no VGA console). From reading the available information I suspect Tianocore/EDK2 may have tried to move some functionality to a distinct build and neither setup quite works. Notably there is now a "OvmfPkg/OvmfXen.dsc" build configuration. The OVMF.fd for Qemu for Xen functionality may have been moved /here/. There might also be an attempt at functionality similar to "ArmVirtPkg/ArmVirtXen.dsc" (Debian 978595) for x86. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#978595: #978595 is looking higher priority
On Tue, Jul 04, 2023 at 11:56:39PM +0300, Michael Tokarev wrote: > Out of curiocity, what value is it to boot a xen domU (or qemu) guest in uefi > mode? > I mean, bios mode is still recommended for at least commercial virt solutions > such > as vmware, and it works significantly faster in qemu and xen too. It is > more, qemu > ships minimal bios (qboot) to eliminate all boot-time cruft which is not > needed in > a vm most of the time. First, the known high value portion of #978595 is getting ArmVirtPkg/ArmVirtXen.dsc built and packaged. This results in a XEN_EFI.fd file. As such the presently verified value only applies to ARM. What you do with XEN_EFI.fd is you configure an ARM domain with 'kernel = "${edk2_install_dir}/XEN_EFI.fd"' The resultant domain has no extra daemons emulating hardware. Inside the domain, Tianocore/EDK2 will search via its normal means for a boot.efi file and load that if it can. This is similar to PyGRUB versus PvGRUB. If the OS being loaded has native Xen drivers, you've gotten rid of the Qemu process hanging around in domain 0 providing security holes. So far this is reliably booting the WIP FreeBSD/arm64. I imagine this could also load GRUB. I believe OvmfPkg/OvmfXen.dsc aims to be something similar for x86, but I've yet to achieve results from that. My hope is this could load FreeBSD/x86 in a PVH domain. On Tue, Jul 04, 2023 at 10:30:34PM +0200, Paul Leiber wrote: > As the Windows systems are not usable anymore, Xen is significantly > reduced in functionality after the upgrade. Is this existing bug report > the right place to file this, or should I open a new bug report? If this > bug report is the right place, its priority should indeed be raised, at > least to important (linux PVH DomUs are still working fine). If I should > open a new bug report, for which package? New report. The topic for #978595 is I was hoping for some other build types of EDK2/Tianocore to be built and packaged. 
What you're describing is a regression and certainly not merely a wishlist packaging request. I'm unsure, but at first thought this would be src:xen. On that note a FreeBSD VM I've got has been having difficulty since the 4.14 -> 4.17 upgrade. I'm still fighting other upgrade issues right now. Some portions of the EDK2/Tianocore packaging look suspicious, so I wouldn't be surprised if the failure was there. On a very different note, I'm concerned about commit 5e68feec5b2. If you would care to examine patch #3 attached to message #22 on bug 978595, you will notice it bears a strong resemblance. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978595#22 https://bugs.debian.org/cgi-bin/bugreport.cgi?att=3;bug=978595;filename=0003-debian-rules-Switch-to-truncate-from-dd.patch;msg=22 I believe 5e68feec5b2 is simply parallel development (that was an ugly use of `dd` and `truncate` is known). Yet I find it discouraging that I pointed to the issue more than a full 2 years earlier and it was ignored. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#452721: irt: irt: Bug#452721 notes from explorations
Synthesizing things since I hadn't been copied on previous message... On Mon Jul 31 18:10:34 BST 2023, zithro wrote: > > On 31 Jul 2023 03:39, Elliott Mitchell wrote: > > > Presently I hope to convince the Xen core to allow full Python in domain > > configuration files, but no news on that front so far. This would mean > > /etc/default/xendomains would need to change to match Python syntax. > > There was an answer today on xen-devel: the ability to use scripts in > domU cfg files has been explicitely removed for various reasons. > This does not prevent you from "source"-ing teh cfg files in your > script(s) if they are proper Python syntax. Or you could simply > parse/regex the values you want. Though the reasons given seem orthogonal to my thinking. I'm thinking use libpython as the parser since that allows dictionaries and guarantees the syntax remains a subset of Python. Whereas the responses read like they think I'm asking for full Python scripts as domain configurations (which is a very large superset of what I'm proposing). > And as Marek suggested in his answer, you can also put any arbitrary > settings in the comments. I had already thought of that as it is a common strategy for such things. This though has substantial limitations and since Python has all the capabilities needed, strategies based on Python seem very attractive. I was thinking Perl for a bit, but Python provides a simple strategy for extracting required information out of configurations. Crucially the UUID which lets you match running domains to their configuration. > Although ... > > > My thinking for adding to domain configuration files would be something > > along these lines: > > > > init = { > > 'tool': 'xendomains-ng', > > 'version': 0, > > 'order': 9, > > 'startwait': 60, > > 'stopaction': 'save', > > } > > The problem with adding this to a domU config file is that it could > cause problems for (live) migrations. 
The start/stop order is "per > dom0", and may be different on another one. > Imagine two dom0s, one storing the domain files "locally", while the > other uses NFS. Only in the second case the domU should wait for the NFS > server/domain to be available. > > To me, the start/stop logic should be in a dom0 config file. I'm not understanding the situation you're thinking of. The closest I can come is you're thinking of a situation which would be handled by having host defaults, but also overrides in domain.cfg files. Generic VMs would act according to the host settings, only domains which had overridden values would act differently. You could have a network of VM hosts where normal hosts specify 'migrate' in /etc/default/xendomains. Then you have the magic host which specifies 'save' or 'shutdown'. You would also specify something other than 'migrate' for domains handling services local to a particular host. > > 'startwait' would tell the script to wait that long before starting > > subsquent domains. > > A time-based wait may be useful for when everything goes well, but what > about when there are problems ? > If you want to be sure a domain is up (ie. ready to serve), you would > need to peek at the related "service". > For example, to be sure a DNS domU is up, you would have to try a DNS > request, as a ping or "xl list" would not be enough. > Also, domains in xen/auto are started with a mix of serialization AND > parallelization, as "xl create" returns once the domain has started (ie. > in the Xen point of view, not the user's). Indeed. I'm well aware what I'm suggesting has major limitations. I'm proposing what I consider feasible given available time. What you're suggesting could be a feature for v2, which might be written based on what I manage. > > 'stopaction' would allow different actions if the machine was to stop. > > The 3 options which come to mind are 'stop' (shutdown), 'save' (save to > > specified storage location), and 'migrate'. 
> > Then, each time you do NOT want to follow the usual action, you'd have > to edit -each- domU cfg file ? Usually if you didn't want to follow the usual action, you would invoke `xl` manually. What has come to mind though is perhaps the action should be uploaded to the xenstore. Then when an unusual action was desired, the xenstore information could be changed and the action would follow the domain. This though seems a feature for a future version. > > If full Python doesn't become available, this might take the format: > > init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save' > > Not needing to parse the string though does make one's life simpler. > > Well, it makes -your- life easier, not
Bug#1049450: New rpc.mountd rejects -N 2 option
On Wed, Aug 16, 2023 at 08:57:16AM +0200, Salvatore Bonaccorso wrote: > > On Tue, Aug 15, 2023 at 04:13:59PM -0700, Elliott Mitchell wrote: > > Package: nfs-kernel-server > > Version: 1:2.6.2-4 > > > > Hopefully SSIA. > > > > `rpc.mountd` has a -N option to disable versions of NFS. > > > > I had been previously using "-N 2", but that is now broken. The error > > message was quite non-helpful ("nfsd2" if I recall correctly). Upon > > removing "-N 2", luckily NFSv2 didn't get enabled, but this was still > > annoying to deal with. At worst using a deprecated setting should merely > > generate a warning. > > Removal of NFSv2 support was documented with a Debian NEWS entry for > 1:2.6.1-1~exp1, cf. #1006650. > > nfs-utils (1:2.6.1-1~exp1) unstable; urgency=medium > > Support for NFSv2 has been removed from nfs-kernel-server. It was > previously disabled by default, but still available. > > -- Ben Hutchings Sun, 13 Mar 2022 19:05:02 +0100 Removing NFSv2 support shouldn't invalidate "-N 2". "-N 2" is supposed to disable NFSv2 at runtime, as such removing all NFSv2 support should merely render "-N 2" 100% redundant and at worst produce a warning. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
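The behavior argued for above (a request to disable an already-removed version is redundant, not an error) can be sketched as follows. This is only an illustration in Python, not rpc.mountd's actual code; the function name `disable_versions` and the version sets are hypothetical.

```python
import warnings

# Assumed for illustration only: versions still built in, and versions removed.
SUPPORTED_VERSIONS = frozenset({3, 4})
REMOVED_VERSIONS = frozenset({2})


def disable_versions(requested, supported=SUPPORTED_VERSIONS,
                     removed=REMOVED_VERSIONS):
    """Return the set of versions left enabled after applying "-N" requests.

    Disabling a removed version only produces a warning, since the
    requested end state (version not served) already holds.
    """
    enabled = set(supported)
    for version in requested:
        if version in removed:
            warnings.warn(
                f"NFSv{version} support has been removed; "
                f"'-N {version}' is redundant"
            )
        elif version in supported:
            enabled.discard(version)
        else:
            raise ValueError(f"unknown NFS version: {version}")
    return enabled
```

With this shape, an old configuration carrying "-N 2" keeps working and the administrator still gets a hint to clean it up.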
Bug#1049450: New rpc.mountd rejects -N 2 option
Package: nfs-kernel-server Version: 1:2.6.2-4 Hopefully SSIA. `rpc.mountd` has a -N option to disable versions of NFS. I had been previously using "-N 2", but that is now broken. The error message was quite non-helpful ("nfsd2" if I recall correctly). Upon removing "-N 2", luckily NFSv2 didn't get enabled, but this was still annoying to deal with. At worst using a deprecated setting should merely generate a warning. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#452721: irt: Bug#452721 notes from explorations
Even though there hasn't been any discussion recently, bug #452721 is very much still of major concern to me. First issue is how to parse domain configuration files. Reason being a foo.cfg file might have the configuration 'name = "bar"'. This would also let the script retrieve the UUID if that has been set. Turns out while Python in domain configuration files isn't supported, the syntax is still a proper subset of the Python language. This makes Python the ideal programming language for a replacement script. Only weakness is being able to have full Python syntax in configuration files might make the task simpler. Presently I hope to convince the Xen core to allow full Python in domain configuration files, but no news on that front so far. This would mean /etc/default/xendomains would need to change to match Python syntax. My thinking for adding to domain configuration files would be something along these lines: init = { 'tool': 'xendomains-ng', 'version': 0, 'order': 9, 'startwait': 60, 'stopaction': 'save', } Mainly a Python dictionary holding key values. The thought behind the 'tool' and 'version' values is to hope for some form of compatibility if such scripts were to become common. My thinking is 'order' would indicate sequence. Domains with higher order get started first (same order would nominally allow parallel start). If a domain.cfg file didn't define order then its order is 0. 'startwait' would tell the script to wait that long before starting subsequent domains. 'stopaction' would allow different actions if the machine was to stop. The 3 options which come to mind are 'stop' (shutdown), 'save' (save to specified storage location), and 'migrate'. If full Python doesn't become available, this might take the format: init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save' Not needing to parse the string though does make one's life simpler. Other concerns include: Sometimes you may want to take a distinct action during stop. 
I.e., if you're doing restarts for kernel updates, you'll want to override and have domains reboot. It may be handier to have distinct options for 'restart'. Full restarts can follow proper order, or could simply involve bouncing domains based on order. Notably with HVM domains and Qemu updates, you could do: order 0 down, order 1 down, order 9 down, order 9 up, order 1 up, order 0 up Or you could do: order 9 down, order 9 up, order 1 down, order 1 up, order 0 down, order 0 up I'm basically certain writing a new xendomains script in Python is the way to go. Now to get an answer as to whether full Python in domain configuration files could be reenabled. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
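The parsing and ordering idea from this message can be sketched in Python. This is a minimal sketch, assuming domain configuration files contain only top-level `name = literal` assignments; it uses the standard `ast` module so nothing in the file is executed. The helper names `parse_domain_cfg`, `init_settings` and `start_order` are hypothetical, and the 'init' key is the proposal from this message, not an existing Xen setting.

```python
import ast


def parse_domain_cfg(text):
    """Collect top-level `name = literal` assignments without executing them."""
    settings = {}
    for node in ast.parse(text).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name):
            continue
        try:
            settings[target.id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # not a plain literal; skip rather than evaluate
    return settings


def init_settings(cfg):
    """Return the proposed 'init' settings as a dict.

    Accepts either the dictionary form or the fallback string form
    'key=value,key=value'.
    """
    init = cfg.get("init", {})
    if isinstance(init, str):
        init = dict(item.split("=", 1) for item in init.split(",") if item)
        for key in ("version", "order", "startwait"):
            if key in init:
                init[key] = int(init[key])
    return init


def start_order(domains):
    """Given {name: parsed cfg}, list names with higher 'order' first.

    Domains that do not define an order are treated as order 0.
    """
    return sorted(
        domains,
        key=lambda name: -init_settings(domains[name]).get("order", 0),
    )
```

The same parsed dictionary also yields 'name' and 'uuid' when they are set, which is what is needed to match running domains to their configuration files.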
Bug#1036364: zfsutils-linux: please remove GPT creation bug
Package: zfsutils-linux Version: 2.0.3-9+deb11u1 Would the Debian ZFS maintainers be so kind as to remove the GPT creation bug from zfsutils-linux? Full details are at: https://github.com/openzfs/zfs/issues/94 The issue is simply zpool's create/replace and other subcommands try to unconditionally create GPT labels on anything resembling a whole device. Unfortunately whatever algorithm is used for detecting whole devices is poor and generates almost as many false positives and false negatives as true positives and true negatives. Attempts to get the OpenZFS project to address this seem to have met with deaf ears. Notably this was reported as a bug a decade ago, but nothing has happened (notice the dates on the upstream bug report). At this point I hope the Debian maintainers are willing to patch out this bug. Hopefully maintainers of other projects will follow and upstream might notice they're approaching the losing end of a fork. This has been a problem for so long that the workarounds are headed towards being well-established techniques. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1034811: linux: consider CONFIG_HW_RANDOM_VIRTIO=n
Package: src:linux Version: 6.0.3-1~bpo11+1 Severity: wishlist Looks like someone had the idea of a virtualized HW RNG. Yet looking at the kernel source, there isn't a single actual implementation. Unless I'm missing something, having CONFIG_HW_RANDOM_VIRTIO simply wastes processor time during build and enlarges the package for no gain. Perhaps time for Debian to quit packaging this unused idea? Looks like on-processor HW RNGs are what are taking over. Possibly also the HW RNG from the vTPM implementation. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1034463: closing 1034463
On Sun, Apr 16, 2023 at 07:08:03AM +0200, Salvatore Bonaccorso wrote: > CONFIG_AGP is built-in in Debian, in particular for: > > debian/config/alpha/config:CONFIG_AGP=y > debian/config/amd64/config:CONFIG_AGP=y > debian/config/hppa/config.parisc64:CONFIG_AGP=y > debian/config/ia64/config:CONFIG_AGP=y > debian/config/kernelarch-powerpc/config:CONFIG_AGP=y > debian/config/kernelarch-x86/config:CONFIG_AGP=y I hadn't checked all architectures, but was well-aware it is built-in for amd64. I was suggesting it should change from being built-in to being a module. The reason being AGP is very rare on amd64 motherboards. According to the handy reference, AGP was starting to disappear just as amd64 hardware started hitting the market. I'm unsure where other architectures stand on the issue. Yet on amd64 it shouldn't be built-in. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1034463: linux: consider CONFIG_AGP=m
Package: src:linux Version: 5.10.158+2 Severity: wishlist Could AGP support be turned into a module for Debian kernels? I'm tempted to suggest it shouldn't even be built for amd64, but it does seem reasonable for i686 kernels. Given this, module seems to make sense. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1032480: xen: Important cherry-picks for bookworm/updates
On Tue, Mar 07, 2023 at 01:13:56PM -0800, Elliott Mitchell wrote: > > ad15a0a8ca2515d8ac58edfc0bc1d3719219cb77 > x86/time: prevent overflow with high frequency TSCs Okay, looks like this one had already been grabbed. Sorry for the way too late alert. Thanks for staying on top of what was happening with upstream Xen. > I haven't found a patch for the other one yet. There is some issue with > the latest generation which needs "x2apic=false" on Xen's command-line > in order to get interrupts to domain 0. I'm guessing the latest from AMD > broke the PIC emulation. > > If this isn't actually patched yet, I suspect it soon will be. I haven't > observed anything on xen-devel, so perhaps the workaround was found too > quickly to get noticed as urgent. This one though looks potentially more and less serious. The workaround is simpler than the above ("x2apic=false" on Xen's command-line, instead of "tsc_mode = 1" for *every* VM). Yet the underlying problem could be more severe. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1032480: xen: Important cherry-picks for bookworm/updates
Package: src:xen Version: 4.17.0+46-gaaf74a532c-1 Severity: important Two major bugs have shown up with the release of new hardware from AMD. Since the new hardware is likely to become common during the life of Debian/bookworm, you may wish to grab them early: ad15a0a8ca2515d8ac58edfc0bc1d3719219cb77 x86/time: prevent overflow with high frequency TSCs Turns out the latest generation is fast enough to cause overflows. I haven't found a patch for the other one yet. There is some issue with the latest generation which needs "x2apic=false" on Xen's command-line in order to get interrupts to domain 0. I'm guessing the latest from AMD broke the PIC emulation. If this isn't actually patched yet, I suspect it soon will be. I haven't observed anything on xen-devel, so perhaps the workaround was found too quickly to get noticed as urgent. Now to continue the work on figuring out the consequences from upgrading hardware a bit too early... -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#921187: IRT: backports for Xen
From looking, it doesn't appear necessary to remove the dependency of QEMU on libxenmiscX.YY to make backports possible. According to DPKG, multiple versions of libxenmisc can be installed at the same time, so the issue is simply whether multiple versions of QEMU can be installed at the same time. Last time I tried, it was /almost/ possible to install the testing version of Xen on an otherwise stable system. The only dependency issue was the testing version of Xen needed an incompatible version of libc. Backports already look 99% possible. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1026914: arcanist client improperly uploading files
Package: arcanist Version: 0~git20200925-1 Severity: grave If one has one or more commits in /some/repo one can create a Phabricator diff by running `arc diff $oldver`. If there are untracked files in the directory the arcanist client gives the message: 8<-8< You have untracked files in this working copy. Working copy: /some/repo Untracked changes in working copy: (To ignore these 1 change(s), add them to ".git/info/exclude".) file0 file1 file2 Ignore these 3 untracked file(s) and continue? [y/N] 8<-8< Suspicious resemblance to what `git status` might give. If one then goes to an appropriate version of Phabricator, on the right column between "Tags" and "Subscribers" will be "Referenced Files". I have noticed "Referenced Files" appears when untracked files are present. Diffs done from repository directories with no untracked files do not have the "Referenced Files". As such I reasonably believe arcanist is NOT ignoring these files. At a minimum it is uploading metadata about them to Phabricator, at worst it is uploading them to the server without notification. Privacy and security violation. This is visible enough I suspect many people have already noticed. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1006418: #1006418: Linux stubdomains?
Not a proper In-Reply-To since that message ended up /somewhere/ and I'm thus going back to the bug DB for this reply. I guess I'm neutral-ish on Linux versus Mini-OS for doing stub domains for Debian on Xen. I suspect development on Xen's Mini-OS isn't all that active. On the flip side due to its limited requirements, Mini-OS might not need much updating. My major concern is can current Linux kernels be made small enough for this to be worthwhile? I've done some small Debian domains and they really want a minimum of 192MB of memory, which seems a bit large. Really the big issue seems to be someone simply needs to play with this a *lot* in order to figure the thing out. The information which is out there isn't easy to understand. I suspect the simplest may be to examine Qubes OS as they have figured the thing out. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1017944: Another reproduction of #1017944
X-Debbugs-Cc: pkg-xen-de...@lists.alioth.debian.org Guess we're finding out where everyone's update windows are. Some though may report before resolving the issue or somewhat after. Yet another reproducer of the issue here. I also observed the failure in Xen's dmesg and confirm the issue occurs with PVH VMs. I haven't tried rebuilding with Valentin Kleibel's patch, but another potential workaround is to add: deb https://snapshot.debian.org/archive/debian/20220801T032804Z/ bullseye main To /etc/apt/sources.list, then *hold* the GRUB packages at 2.04-20. I'm wondering whether we should subscribe pkg-xen-de...@lists.alioth.debian.org to this bug as it has a kind of major impact. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#737564: #737564 is becoming more urgent
For some time the Linux kernel hasn't guaranteed the order of block devices. #737564 is a good solution to this issue. (yeah, suddenly running into devices getting different designations due to restart) -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1009793: linux-source 5.10.106-1 changes block device order
Package: src:linux Version: 5.10.106-1 Between 5.10.103-1 and 5.10.106-1 (image -13) something changed which reliably causes what used to show as /dev/sda to show as /dev/sdb. Other block devices plugged into the SCSI subsystem may have swapped around, but I've yet to untangle the others. A few utilities are still sensitive to block device order and this causes issues for those. Nothing on the hardware explains this. The controller thinks the device has a lower number, the device should respond much faster. The lowest level is the cciss driver. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1008911: initscripts: /run often mounted nodev, "/run/rootdev" likely to fail
Package: initscripts Version: 3.02-1 Often /run is mounted with the "nodev" option, at which point doing a `mknod` "/run/rootdev", then trying to `fsck` that doesn't work as a fallback. Perhaps "/dev/fsckfallbackdev"? -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
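Whether /run is mounted "nodev" can be checked from the mount table. The following is a minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, comma-separated options); the function names are hypothetical and the text is passed in rather than read from the system.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set of the last entry for mount_point, or None.

    The last matching line wins, since later mounts shadow earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))
    return found


def is_nodev(mounts_text, mount_point="/run"):
    """True if mount_point is mounted with the nodev option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "nodev" in options
```

On a Linux system the input would typically be the contents of /proc/mounts; a True result for "/run" means a device node created there cannot be opened as a device, so the `mknod` fallback cannot work.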
Bug#1008910: mount-functions: Only allows for LABEL/UUID
found 1008910 3.02-1 found 1008910 2.96-7+deb11u1 found 1008910 2.93-8 quit On Mon, Apr 04, 2022 at 12:48:07AM +0200, Thorsten Glaser wrote: > On Sun, 3 Apr 2022, Elliott Mitchell wrote: > > > Perhaps the test should be: "[A-Z][A-Z]*[A-Z][A-Z]=*"? > > No, that's a shellglob, no BRE. Indeed. That was the strictest pattern I could come up with likely to match new additions and exclude other things. All other matches start with "/" so nominally "[A-Z]*" would be enough by itself. > I think it's best here to update the list with whatever findfs(8) > comes up when it does come up; anything else would require either > ksh extglobs or really excessive parsing attempts few would want > to maintain. > > - LABEL=*|UUID=*) > + LABEL=*|UUID=*|PARTUUID=*|PARTLABEL=*) Not my decision to make, I simply narrowed down the issue and stated it was a problem. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
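The explicit list from the quoted diff behaves like a fixed set of prefixes. A minimal sketch of that approach in Python (the real code is a shell `case` in mount-functions.sh; `FINDFS_PREFIXES` and `is_findfs_spec` are hypothetical names):

```python
# The four prefixes named in the quoted diff.
FINDFS_PREFIXES = ("LABEL=", "UUID=", "PARTUUID=", "PARTLABEL=")


def is_findfs_spec(spec):
    """True if an fstab device field should be resolved via findfs."""
    return spec.startswith(FINDFS_PREFIXES)
```

The trade-off is the one discussed above: this never matches anything unexpected, but it has to be updated by hand whenever `findfs` learns a new tag.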
Bug#1008910: mount-functions: Only allows for LABEL/UUID
Package: initscripts Version: 3.01-1 This is *almost* #677420, but not quite. The test in /lib/init/mount-functions.sh, _read_fstab() tests for "LABEL=*|UUID=*" before resorting to `findfs`. Thing is `findfs` has two other cases it can handle and that test misses those two. Perhaps the test should be: "[A-Z][A-Z]*[A-Z][A-Z]=*"? That matches the two currently supported extra cases and would hopefully catch further additions to what `findfs` can handle. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
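What the proposed pattern would and would not match can be checked quickly. This sketch uses Python's `fnmatch.fnmatchcase` as a stand-in for shell `case` globbing, which is an assumption that holds for a pattern built only from bracket ranges and `*`; `needs_findfs` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Pattern proposed in the report: at least four uppercase letters'
# worth of tag (two at the start, two right before the "=").
PROPOSED_PATTERN = "[A-Z][A-Z]*[A-Z][A-Z]=*"


def needs_findfs(spec, pattern=PROPOSED_PATTERN):
    """True if the fstab device field matches the proposed glob."""
    return fnmatchcase(spec, pattern)
```

The pattern accepts LABEL=, UUID=, PARTUUID= and PARTLABEL= entries and rejects plain paths such as /dev/sda1, but it also accepts any other string with the same shape, which is the intended catch-all for future `findfs` tags.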
Bug#1008857: irt: fsck: Automatic filesystem check skipped
Come to think of it, my initial message may have pointed to the root cause. May very well be `fsck` skips checks on filesystems on USB devices. Problem is this behavior is taking precedence over checking filesystems listed in /etc/fstab. If the root filesystem is located on USB it is irrelevant the device can easily be removed; if the device is removed the system will be in a very problematic state, so acting as if it is non-removable is the correct behavior. Similar situation for any device in /etc/fstab which isn't marked "noauto". -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1008857: fsck: Automatic filesystem check skipped
Package: util-linux Version: 2.36.1-8+devuan2 Severity: important For some reason on this aarch64 device, the automated filesystem checks which should be done via `fsck -T -M -A -a -t ext4` are getting skipped. When trying to run this manually, no error messages of any sort were observed. During boot I am observing the message "warning: maximal mount count reached, running e2fsck is recommended". I'm aware of two system quirks which might cause `fsck` to fail. This is an aarch64 (ARM64) system. The main storage device is UAS and features a hybrid MBR. Most utilities and the Linux kernel find the valid GPT and ignore the hybrid MBR. I recall running into an issue like this on a mipsel system many years ago. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1008308: radicale: TLS broken with several clients
On Sat, Mar 26, 2022 at 05:38:21PM +0100, Jonas Smedegaard wrote: > Quoting Elliott Mitchell (2022-03-26 16:35:53) > > Has been reported upstream: > > https://github.com/Kozea/Radicale/issues/1183 > > > > Upstream has been completely unresponsive. No fix is available. > > Thanks for reporting this upstream where it belongs. > > For the Debian packaging of Radicale the recommended use is to *not* > handle TLS directly but let another frontend web service handle that. > Upstream calls this approach "Reverse Proxy": > https://radicale.org/v3.html#reverse-proxy "Recommended" means other configurations should function. Notably the documentation suggests running as a daemon is in theory supported: https://radicale.org/v3.html#running-as-a-service Reverse-proxy is also a specialized configuration not appropriate for all situations. For the setup I've got, adding Apache or nginx would more than double the size of the installation. This would also add Apache's or nginx's security vulnerabilities to this setup (they've been pretty good, but that is not perfect). > Lowering severity accordingly. important: "a bug which has a major effect on the usability of a package, without rendering it completely unusable to everyone." Broken seems the definition of major effect on usability. In fact I believe "grave" is appropriate for this issue. I don't know the frequencies of the various types of configuration. I have a suspicion standalone daemon is a very common configuration and most users are unaware they're relying on the security of WPA2. > > With no fix available this renders the Radicale package useless unless > > one wishes to run in with an insecure configuration (disable TLS/SSL). > > No. Radicale is certainly not useless. Okay, that is true. It is simply broken for the type of setup I've got and no assistance has been forthcoming from upstream. My hope was your channels as a package maintainer might be able to place more pressure on upstream to address a grave bug.
Bug#1008308: radicale: TLS broken with several clients
Package: radicale Version: 3.0.6-3 Severity: important Has been reported upstream: https://github.com/Kozea/Radicale/issues/1183 Upstream has been completely unresponsive. No fix is available. Their changelog fails to mention any fix for this. Reputedly upstream plans to force upgrades and doing so would violate Debian policy. With no fix available this renders the Radicale package useless unless one wishes to run it with an insecure configuration (disable TLS/SSL). Sorry to say this, but perhaps the Radicale package needs to be removed from Debian if this is the support level. Clients known to be affected include iPhone and DAVx5 (Android). I suspect this only manifests if Radicale is in the standalone configuration (likely not when set up as an Apache module). Presently the only visible solution is to remain with the old stable version of Radicale.
Bug#1006595: libexec move patch update
This still seems a Good Idea(tm) to start the process of moving to /usr/libexec, but I do update patches if I discover issues. Note, for the shorter term it makes sense to leave things in /usr/lib. Until a few revisions pass with both /usr/lib and /usr/libexec copies, xen-utils-common must keep using /usr/lib. Issue is once xen-utils-common uses /usr/libexec, older installations break. Best to keep compatibility with old builds for a while.

From b7477e7fab01b48b663d3e89e4f4c7bd352c8b7e Mon Sep 17 00:00:00 2001
From: Elliott Mitchell
Date: Sat, 26 Feb 2022 17:15:46 -0800
Subject: [PATCH] debian: Initial phase of moving xen-utils-* to libexec, future compat

At some future point the executables will be moved to /usr/libexec.
Ensure current versions of the package will be compatible with future
xen-utils-common packages which expect the files in /usr/libexec.

Signed-off-by: Elliott Mitchell
---
 debian/rules                      | 4 ++++
 debian/xen-utils-V.install.vsn-in | 3 +++
 debian/xen-utils-common.install   | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/debian/rules b/debian/rules
index ba2567b4de..095ad07c51 100755
--- a/debian/rules
+++ b/debian/rules
@@ -298,6 +298,10 @@ xenstore_rm = $(addprefix debian/xen-utils-common/, \
 override_dh_install:
 	debian/shuffle-binaries $(upstream_version)
 	:
+	mkdir $(t)/usr/libexec
+	ln -s /usr/lib/xen-$(upstream_version)/bin $(t)/usr/libexec/xen-$(upstream_version)
+	ln -s /usr/lib/xen-common/bin $(t)/usr/libexec/xen
+	:
 	debian/shuffle-boot-files $(upstream_version) $(flavour)
 	:
 	dh_install $(dh_install_excludes)
diff --git a/debian/xen-utils-V.install.vsn-in b/debian/xen-utils-V.install.vsn-in
index da04b59d42..66dc5cd190 100644
--- a/debian/xen-utils-V.install.vsn-in
+++ b/debian/xen-utils-V.install.vsn-in
@@ -1,3 +1,6 @@
+# initial phase of moving to libexec, future compatibility
+usr/libexec/xen-@version@
+
 usr/lib/xen-@version@/bin
 usr/lib/xen-@version@/lib/python
diff --git a/debian/xen-utils-common.install b/debian/xen-utils-common.install
index 620825ad18..121d45d8a0 100755
--- a/debian/xen-utils-common.install
+++ b/debian/xen-utils-common.install
@@ -29,3 +29,6 @@ usr/share/man
 ../scripts/xen-toolstack-wrapper usr/lib/xen-common/bin
 ../scripts/xen-toolstack usr/lib/xen-common/bin
+
+# initial phase of moving to libexec, future compatibility
+usr/libexec/xen
-- 
2.30.2
Bug#1005176: xen-utils-4 library dependencies need update
On Fri, Feb 25, 2022 at 06:40:23PM +0100, Hans van Kranenburg wrote: > > However, I hope you understand that there's no way we can help when you > use something else than the actual packages in Debian, do not provide > any error messages seen, and describe what you see instead as "it felt > like everything wanted to explode". I'm aware I've got things in a state which is outside the support envelope. I was hoping observations might also apply inside the support envelope. > For me, Xen 4.16 does run OK on my test servers, FWIW. That doesn't surprise me, it didn't take long to get things into a working state for me. Just I was able to get things into a problematic state which the packaging is supposed to prevent. xen-utils-4.16 depends on: libxencall1, libxenevtchn1, libxenforeignmemory1, libxengnttab1, and libxentoollog1. On a system being upgraded there will be 3 versions of each of these libraries available. 4.14.3+32-g9de3671772-1 4.14.3+32-g9de3671772-1~deb11u1 4.16.0-1~exp1 Issue is the rebuilt xen-hypervisor-4.16 and xen-utils-4.16 could be installed without updating libxencall1, libxenevtchn1, libxenforeignmemory1, libxengnttab1, and libxentoollog1. With the 4.14.3+32-g9de3671772-1~deb11u1 versions of libraries things were broken. I'm unsure which one(s) was the problem, though the problem disappeared once all 5 were updated. That enough for you? -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#466064: xserver-xorg-core: -novtswitch is still broken
found 466064 2:1.20.11-1+deb11u1 quit I almost wonder whether I'm seeing a distinct bug since #466064 is so old. -novtswitch continues(?) to be problematic. In the current version the option doesn't work. Not switching VTs is rather valuable for having multiple X-servers started by init and running on distinct VTs. Combined with VMs all sorts of interesting things become possible. Having a broken -novtswitch option causes all sorts of trouble for such a setup.
Bug#1002670: grub2-common: Unable to force MBR/embedding installation
On Mon, Jan 03, 2022 at 05:17:19PM +, Steve McIntyre wrote: > > On Mon, Jan 03, 2022 at 08:52:48AM -0800, Elliott Mitchell wrote: > > > arm64 machines categorically do *not* have any capability to run this > way. It has never been a thing. Instead, systems running GRUB will > load GRUB as a UEFI binary from an EFI System Partition (ESP) and go > from there. Depending on the exact installation on your system, you > may have an ESP on removable storage (SD?), on hard drive / SSD, or > maybe on internal eMMC or similar. It's possible you could be loading > from the network too, but I assume you'd know if that was happening. Finally figured out what was occurring. I believe your statement "a UEFI binary from an EFI System Partition (ESP)" is wrong. It appears Tianocore will load UEFI binaries from any disk slice which simply contains a filesystem it understands. > You have not identified the exact platform you're using, so I've no > idea exactly which of the above options is most likely. A cheap and popular ARM64 device which got a Tianocore implementation fairly quickly due to its level of popularity. Trick is it doesn't have NVRAM available for storage of EFI variables, so `grub-install` cannot make itself the default boot method. I would suggest the "EFI variables are not supported on this system." warning/error message needs more information. In Tianocore's configuration (boot menu) I needed to go in and select "Add boot method" and navigate to where `grub-install` put grubaa64.efi. Then I simply needed to tell Tianocore to have that as the highest priority boot method. > >Or at least that is a simple explanation for why traces of > >2.02+dfsg1-20+deb10u4 continue to persist, while 2.04-20 appears > >reluctant. 
> > My first guess would be that either: > > * You have more than one ESP on your system somewhere, and the system >is finding an old grub binary that way; or > > * The old grub is in the removable media path and you haven't managed >to replace it yet (see my first mail for details on how to do >that). I finally found the 2.02+dfsg1-20+deb10u4 "grubaa64.efi". It was on /dev/sda128 which was marked with a type UUID of bc13c2ff-59e6-4262-a352-b275fd6f7172 ("Linux extended boot" according to `fdisk`). /dev/sda128 contained an ISO9660 filesystem which was the previous Debian netinst ISO image. The message "EFI variables are not supported on this system." was less than wonderful for figuring out what to do next. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1005176: xen-utils-4 library dependencies need update
Package: src:xen Version: 4.16.0-1~exp1 I'm guilty of pulling in later Xen source and building it based on the experimental 4.16 packaging. As such this may actually only be an issue for a package version beyond 4.16.0. I'm uncertain which it is, but xen-utils-4.16 appears to need an update to one or more of libxencall1, libxenevtchn1, libxenforeignmemory1, libxengnttab1 and/or libxentoollog1 in order to function. During my initial update I merely updated libxenmisc4.16 and libxenstore4. In this condition something (I suspect xenstored) was rather broken and things were unusable. Notably `xl list` was hanging. I was unable to get VMs started and it felt like everything wanted to explode. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#989560: Bug #989560 solved by update?
Nothing further has been heard. Was bug #989560 resolved by updating to the GRUB 2.04 packages? Possibly as part of upgrading to bullseye? The provided information looks like what one might expect from trying to load Xen on ARM via GRUB 2.02. As such I'm left suspecting this was resolved by updating. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#1002670: grub2-common: Unable to force MBR/embedding installation
On Mon, Jan 03, 2022 at 05:17:19PM +, Steve McIntyre wrote: > > On Mon, Jan 03, 2022 at 08:52:48AM -0800, Elliott Mitchell wrote: > >On Mon, Jan 03, 2022 at 02:35:48PM +, Steve McIntyre wrote: > >> > >> What you're asking for here won't work; arm64 devices don't/can't use > >> the embedding MBR/gap style of GRUB installation - that's x86 only. > >> Instead, > >> what you need is to do an EFI installation but with a couple of extra > >> options chosen. Run "dpkg-reconfigure -plow grub-efi-arm64" and say: > >> > >> * "yes" to "Force extra installation to the EFI removable media path?" > >> * "no" to "Update NVRAM variables to automatically boot into Debian?" > >> > >> and and you should be fine from now on. > > > >Justify the statement "arm64 devices don't/can't use the embedding > >MBR/gap style of GRUB installation". I concur that is not the normal way > >of doing EFI installation on ARM64 devices, but in this case I've got a > >device which is unable to store persistent variables (if you sacrifice a > >SD Card it can store them, but otherwise it loses them on restart). > > The MBR/gap style is *totally specific* to the old-school x86 BIOS way > of doing things: > > * A tiny 16-bit x86 asm loader is added into the boot sector; it uses >BIOS routines to load the GRUB core image from the raw space after >the partition table and execute it. > > * The core image contains enough functionality (display, filesystems, >storage drivers, etc.) to be able to find load further modules from >the /boot filesystem. It loads those, runs the menu, etc. > > arm64 machines categorically do *not* have any capability to run this > way. It has never been a thing. Instead, systems running GRUB will > load GRUB as a UEFI binary from an EFI System Partition (ESP) and go > from there. Depending on the exact installation on your system, you > may have an ESP on removable storage (SD?), on hard drive / SSD, or > maybe on internal eMMC or similar. 
It's possible you could be loading > from the network too, but I assume you'd know if that was happening. > > You have not identified the exact platform you're using, so I've no > idea exactly which of the above options is most likely. The platform is Tianocore built for the particular hardware. While the hardware does have a SD controller, since there is no card present it couldn't be loaded from there (no eMMC). As such it is definitely coming from what Linux sees as /dev/sda (via USB3 since that hardware is available). Now, since everything on the VFAT filesystem used by both the initial stage loader and Tianocore was moved to a subdirectory, the older version isn't coming from there (unless Tianocore does the equivalent of a `find` during load, which seems unlikely). As such I'm pretty sure Tianocore is finding the older GRUB by looking in the gap between the GPT entries and data start. > >Yet somehow despite restarting from a mostly clean slate an older > >installation of 2.02+dfsg1-20+deb10u4 keeps managing to manifest, while > >2.04-20 is unable to be loaded. > > > >My conclusion is 2.02+dfsg1-20+deb10u4 was able to successfully install > >in the embedding area despite not being supposed to work. Meanwhile I'm > >guessing Tianocore/ARM64 inherited the ability to boot from the embedding > >area, despite using that being strongly discouraged. > > Nope, sorry. > > >Or at least that is a simple explanation for why traces of > >2.02+dfsg1-20+deb10u4 continue to persist, while 2.04-20 appears > >reluctant. > > My first guess would be that either: > > * You have more than one ESP on your system somewhere, and the system >is finding an old grub binary that way; or > > * The old grub is in the removable media path and you haven't managed >to replace it yet (see my first mail for details on how to do >that). The former isn't possible. I am though wondering if use of the "--removable" option will resolve the situation. 
When it comes down to it, all storage media is removable; it is just an issue of how difficult it is to remove and replace. Since this one is via USB3 it is pretty simple to swap out, but I could see internal storage being attached via USB3...
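For reference, the "removable media path" mentioned in the quoted advice is a fixed, architecture-dependent file name under EFI/BOOT/ that firmware tries when no boot variables are available. A small sketch of that mapping follows; the helper name `removable_path` is invented for this example.

```shell
#!/bin/sh
# Print the fallback ("removable media") loader path, relative to the
# root of the EFI System Partition, for a given architecture name.
# Hypothetical helper for illustration only.
removable_path() {
    case "$1" in
        arm64|aarch64) echo EFI/BOOT/BOOTAA64.EFI ;;
        amd64|x86_64)  echo EFI/BOOT/BOOTX64.EFI ;;
        i386)          echo EFI/BOOT/BOOTIA32.EFI ;;
        *)             return 1 ;;   # unknown architecture
    esac
}
```

On the command line, something along the lines of `grub-install --target=arm64-efi --efi-directory=/boot/efi --removable --no-nvram` should place GRUB at that path without touching EFI variables; the /boot/efi mount point here is an assumption about the local setup.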
Bug#1002670: grub2-common: Unable to force MBR/embedding installation
On Mon, Jan 03, 2022 at 02:35:48PM +, Steve McIntyre wrote: > > On Sun, Dec 26, 2021 at 05:12:38PM -0800, Elliott Mitchell wrote: > > > >Hopefully the subject tells the tale. Due to some odd hardware, I need > >to force `grub-install` to install the EFI version of GRUB into the > >MBR/boot area gap. Unfortunately the documentation suggest none of > >`grub-install`'s options can get this result. As a result I've got a > >problem. > > > >The background: I'm trying to get GRUB installed on a very popular ARM64 > >device which has a full Tianocore/UEFI image available. Unfortunately > >while it is full Tianocore, the device lacks any private NVRAM and thus > >is unable to store EFI variables. > > > >`grub-install` tries to do a "normal" UEFI installation, which fails due > >to lack of EFI variables. As a result I need GRUB to install in the > >MBR/GPT gap, but none of `grub-install`'s options appear to cause this. > > What you're asking for here won't work; arm64 devices don't/can't use > the embedding MBR/gap style of GRUB installation - that's x86 only. Instead, > what you need is to do an EFI installation but with a couple of extra > options chosen. Run "dpkg-reconfigure -plow grub-efi-arm64" and say: > > * "yes" to "Force extra installation to the EFI removable media path?" > * "no" to "Update NVRAM variables to automatically boot into Debian?" > > and and you should be fine from now on. Justify the statement "arm64 devices don't/can't use the embedding MBR/gap style of GRUB installation". I concur that is not the normal way of doing EFI installation on ARM64 devices, but in this case I've got a device which is unable to store persistent variables (if you sacrifice a SD Card it can store them, but otherwise it loses them on restart). Yet somehow despite restarting from a mostly clean slate an older installation of 2.02+dfsg1-20+deb10u4 keeps managing to manifest, while 2.04-20 is unable to be loaded. 
My conclusion is 2.02+dfsg1-20+deb10u4 was able to successfully install in the embedding area despite not being supposed to work. Meanwhile I'm guessing Tianocore/ARM64 inherited the ability to boot from the embedding area, despite using that being strongly discouraged. Or at least that is a simple explanation for why traces of 2.02+dfsg1-20+deb10u4 continue to persist, while 2.04-20 appears reluctant. (now back to pondering whether grub-uboot may still be a more maintainable choice for this installation)
Bug#1002670: grub2-common: Unable to force MBR/embedding installation
Package: grub2-common Version: 2.04-20 Severity: important Hopefully the subject tells the tale. Due to some odd hardware, I need to force `grub-install` to install the EFI version of GRUB into the MBR/boot area gap. Unfortunately the documentation suggests none of `grub-install`'s options can get this result. As a result I've got a problem. The background: I'm trying to get GRUB installed on a very popular ARM64 device which has a full Tianocore/UEFI image available. Unfortunately while it is full Tianocore, the device lacks any private NVRAM and thus is unable to store EFI variables. `grub-install` tries to do a "normal" UEFI installation, which fails due to lack of EFI variables. As a result I need GRUB to install in the MBR/GPT gap, but none of `grub-install`'s options appear to cause this. Plan B might be to remove the EFI System UUID from the boot area, but this solution seems wrong.
Bug#991967: (Presently) Not in 5.10 source
Having finally gotten to test this, the issue does NOT affect 5.10.70-1. So far I've only gotten to try reboot, but that went fine. Might have been an ACPI or Xen mismerge into 4.19. Alas this may simply disappear into history.
Bug#1000147: radicale: Non-working init script
On Thu, Nov 18, 2021 at 07:26:50PM +0100, Jonas Smedegaard wrote: > > Quoting Elliott Mitchell (2021-11-18 16:45:58) > > Appears the documentation for `start-stop-daemon` is misleading or > > wrong, and the "--exec" option is needed if "--startas" is given a > > pathname. > > This sounds like a bug in start-stop-daemon: please report against the > package dpkg which seems to provide start-stop-daemon, and provide more > details on how it fails to work. > > > > Might be this is an issue for me, but not others since the "radicale" > > user's shell had been set to `/bin/false`. As this is strongly > > recommended security hardening, the radicale package should work with > > a system setup this way. > > Not sure what you are saying here, but seems a separate issue (even if > affecting the other one). > > If you mean to say that using shell /usr/sbin/nologin for radicale > account is strongly discouraged, then please file a separate bugreport > about that - preferably with more details, as that is not obvious to me. > > Also, please file a separate bugreport if you believe radicale should > work with custom shell setting and fails to do so (but works without > such change). Because I agree that should work, and am surprised if it > doesn't (but I don't use sysV init system myself so cannot easily test). My guess is this could be a documentation problem for `start-stop-daemon`. Based upon observed behavior, I suspect "--exec" changes to the appropriate user and then does an execve() of the specified executable. Whereas "--startas" is instead executing the shell of the specified user with arguments as specified. The latter requires the shell be valid. Unless there is an overwhelmingly important reason for the radicale user's shell to be valid, it should instead be `/bin/false`. This though requires use of "--exec". Since Radicale appears to function properly when started with "--exec" that seems a vastly superior approach (doesn't result in security concerns). 
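For illustration, the "--exec" approach described above might look roughly like the following in an init script. This is only a sketch: the daemon path, pidfile location, and account name are assumptions rather than values taken from the Debian package, and the function prints the command instead of running it so it is safe to try.

```shell
#!/bin/sh
# Assumed values for illustration; adjust to the real installation.
DAEMON=/usr/bin/radicale
PIDFILE=/run/radicale.pid
RUN_AS=radicale

# Print the start command (dry run). --exec names the executable
# directly, while --background and --make-pidfile let start-stop-daemon
# handle backgrounding in place of the removed "--daemon" option.
start_cmd() {
    echo start-stop-daemon --start --quiet --background \
        --make-pidfile --pidfile "$PIDFILE" \
        --chuid "$RUN_AS" --exec "$DAEMON"
}
```

Dropping the `echo` would turn this into the real invocation.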
Bug#1000147: radicale: Non-working init script
Package: radicale Version: 3.0.6-3 Severity: important The init script `/etc/init.d/radicale` which is included with the 3.0.6-3 package failed to start Radicale for me. Radicale's "--daemon" option was apparently removed with 3.0.6-3. Attempting to use the "--daemon" option resulted in an error. `start-stop-daemon`'s "-b" option was able to work around this. Appears the documentation for `start-stop-daemon` is misleading or wrong, and the "--exec" option is needed if "--startas" is given a pathname. Might be this is an issue for me, but not others since the "radicale" user's shell had been set to `/bin/false`. As this is strongly recommended security hardening, the radicale package should work with a system setup this way. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#972950: ncal: cal fails to highlight current date (and rejects -h flag)
Yet another person who has noticed this. Highlighting the current date is rather handy for interactive use. The basis of #904839 is incorrect. Without that change `cal` uses isatty() to determine whether output is a terminal. If not a terminal, highlighting is disabled (compare `ncal -b` and `ncal -b | cat`). As such, if the reporter for #904839 really was attempting to parse the output, that wouldn't have been affected by highlighting. There was only a single reporter for #904839 (after the feature had been in place for 5 years); there are now 5 complaints after it has been absent for less than a year. That seems like rather overwhelming support for keeping highlighting.
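The isatty() behavior described above can be mimicked in shell with `[ -t 1 ]`, which tests whether standard output is a terminal. A minimal sketch follows; `show_day` is a made-up helper, not anything from ncal (which is written in C).

```shell
#!/bin/sh
# Print a day number, in reverse video only when it is today AND
# standard output is a terminal. Hypothetical helper for illustration.
show_day() {
    day=$1 today=$2
    if [ "$day" = "$today" ] && [ -t 1 ]; then
        # ESC[7m = reverse video on, ESC[0m = attributes off
        printf '\033[7m%2d\033[0m\n' "$day"
    else
        printf '%2d\n' "$day"
    fi
}
```

Run interactively, `show_day 14 14` highlights the number; piped through `cat` or captured by a script, the output contains no escape sequences, which is why parsing would not have been disturbed.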
Bug#996988: Should Provide: flash-kernel on ARM(64)
Package: pv-grub-menu Version: 1.3 SSIA. On ARM(64) systems typical Linux kernel packages recommend "flash-kernel", but for VMs this is quite undesirable. As such I would suggest pv-grub-menu should be marked as providing flash-kernel on ARM(64). (I suspect this is harmless on other architectures, but is only really needed on ARM(64))
Bug#996666: Xen PVH domains lack console
Package: grub-xen-host
Version: 2.04-20

I'm unsure which versions from stable were tried, but at a minimum 2.02+dfsg1-20+deb10u4 was and also had this issue. I'm also unsure whether this is actually a GRUB bug versus a Linux kernel bug.

When booting in x86 PVH mode the Linux kernel fails to load the Xen virtual console as console. As a result the kernel's dmesg is unavailable for debugging during boot.

When using grub-x86_64-xen.bin (x86 PV mode) as Xen kernel:

$ cat /proc/consoles
tty0 -WU (EC p ) 4:1
hvc0 -W- (E p ) 229:0
$

When using grub-i386-xen_pvh.bin (x86 PVH mode) as Xen kernel:

$ cat /proc/consoles
tty0 -WU (EC p ) 4:1
$

When using Tianocore's XEN_EFI.fd (arm64 PVH mode) as Xen kernel:

$ cat /proc/consoles
hvc0 -W- (EC p ) 229:0
$

Presently I think GRUB is more likely the culprit, but this is far from certain. Notably there are a few messages from the ACPI code, so I'm wondering if GRUB sets up an ACPI table which isn't quite right.

I'm surprised at "tty0" being listed given the complete lack of any sort of potential console device. Yet this is x86, so I can understand.
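For anyone wanting to check this condition from a script inside the guest, here is a minimal sketch that tests whether a given console name is listed. The function name `has_console` is invented for this example; it reads a /proc/consoles style listing on stdin.

```shell
#!/bin/sh
# Succeed (exit 0) if the named console appears in the first column of
# a /proc/consoles style listing read from stdin.
# Hypothetical helper for illustration only.
has_console() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}
```

Possible usage: `has_console hvc0 < /proc/consoles || echo "Xen console not registered"`.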
Bug#996608: linux-source-5.10: Missing dependency: dwarves
Package: linux-source-5.10 Version: 5.10.70-1 SSIA. Debian's 5.10 configuration will NOT build without the "dwarves" package (`pahole`). In light of this some package, likely linux-source-5.10 should recommend "dwarves". -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#452721: [Pkg-xen-devel] Bug#452721: "xendomains" does not restore domains in same order as it would start them
On Tue, Sep 28, 2021 at 11:39:49PM +0200, Diederik de Haas wrote: > On Tuesday, 28 September 2021 13:41:57 CEST Andy Smith wrote: > > > > > Could the domain ID be used for that? > > > > I don't like it because it only says how recent a domain was > > started relative to others, not any intention about start/stop > > order. Shut one down manually (or crash) and start it again and it > > gets a new domid higher than all existing. > > It is a (really) simple heuristic and likely too simple. > But at first glance it seemed (to me) to actually do the right thing. It is *definitely* too simple to do a good job; however, this has the advantages of being a significant improvement and simple enough to be in service quickly. On Wed, Sep 29, 2021 at 01:24:58AM +0200, Diederik de Haas wrote: > On Wednesday, 29 September 2021 00:02:46 CEST Andy Smith wrote: > > On Tue, Sep 28, 2021 at 11:39:49PM +0200, Diederik de Haas wrote: > > The idea of the domid controlling/influencing order of shutdown > > It was just an idea that popped in my head. All in all I've likely spend less > then a minute thinking about the domid idea. > Don't spend more on it then you already have ;) The record shows I suggested it first: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=452721#35 This isn't an adequate solution, but is a distinct improvement. > > > I really agree with the 'upstream' tag as not only should it be > > > fixed/adjusted there, but it also engages a (much) larger audience who > > > think of scenarios we likely didn't think about. > > > > Should we move discussion to xen-us...@lists.xen.org then? > > I can make a case for both xen-users and xen-devel. > xen-users: > It could be that a solution already exists. I know that in Qubes (which uses > Xen) has some dependency mechanism in that if you start vmA which depends on > vmB, then it first starts vmB and then vmA. I don't know if that is a Qubes > 'extension' or that they simply use available functionality of Xen. 
Could be interesting to learn of what solutions are already out there and what features are must have. Most existing solutions likely have problems. Some may be GPL-incompatible. Most are likely very limited. > xen-devel: > If needed functionality doesn't yet exist and needs to be built anew, then > xen-devel is the right place to discuss that. > > It could be that the best place to start is xen-users which then may/could > 'transition' to xen-devel. > > Let's hear others first what they think is the best approach. Perhaps. Question is how much person-time is available for this? If a great deal of xen-devel person-time can be devoted to this a very ambitious solution might be viable. If only a little bit of xen-devel person-time is available, the approach would need to be very limited. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#452721: [Pkg-xen-devel] Bug#452721: #452721 moreinfo?
On Mon, Sep 27, 2021 at 05:13:04PM +, Andy Smith wrote: > On Sun, Sep 26, 2021 at 08:07:58PM -0700, Elliott Mitchell wrote: > > During a full downtime when all VMs were fully shut down, this effect > > can be achieved by including numbers in the filename. Say > > /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg, > > /etc/xen/auto/9_everything_else.cfg. > > I also do this to control start up order, though I use a prefix of > NNN-. > > The main missing functionality from my point of view is not being > able to control the order of save/shutdown. As you say the script > for saving everything or shutting everything down just does a read > of all existing domids and does the action on them one by one in > increasing order. Seems we're running into the same problems, coming up with the same first-tier workaround and now we all need a common complete solution. > I think the "auto" directory is a pretty good and simple interface, > so how about using it for save/shutdown as well? So, instead of just > enumerating all running domids, enumerate all files in > /etc/xen/auto/ in REVERSE order, parsing the name of the domain out > of each one and doing the action on that name. When all files have > been exhausted, THEN do the action on any remaining running domains. > > This has the advantages of: > > - still working even if administrator does not use ordering in > /etc/xen/auto. Filename format there does not change from what it > is now, where ordering is already possible but is optional. > > - being quite obvious behaviour - save/shutdown order is reverse of > start order. This though requires something which understands the format of those files, can retrieve name or uuid, and then resolve that to something suitable for `xl {save|shutdown}`. Alternatively this requires `xl {save|shutdown}` to be able to select the target domain based on the configuration file (documentation reads like this might be halfway implemented). 
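The reverse-order enumeration proposed in the quoted text, together with the name lookup it requires, could be sketched as below. This is only an illustration under strong assumptions: each config file is assumed to contain a plain line of the form `name = "something"`, which real xl.cfg files need not follow, and `shutdown_order` is a made-up function name.

```shell
#!/bin/sh
# Print domain names from the config files in a directory, in reverse
# filename order (so 9_foo.cfg is handled before 0_bar.cfg).
# Assumes a simple line of the form: name = "something"
# Hypothetical helper for illustration; not part of xendomains.
shutdown_order() {
    dir=${1:-/etc/xen/auto}
    ls "$dir" | sort -r | while read -r f; do
        # Take the first matching name line from each file.
        sed -n 's/^[[:space:]]*name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
            "$dir/$f" | head -n 1
    done
}
```

The resulting names could then be fed one at a time to `xl shutdown` or `xl save`. Domains that are running but not listed in the directory would still need separate handling.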
Additionally this needs a tool to identify domains which are NOT listed in /etc/xen/auto/ then do save/shutdown on them first. > That seems like a good minimal improvement, but if one wanted to > explicitly control save/shutdown order then perhaps the next > enhancement could be an /etc/xen/shutdown/ directory with similar > purpose to the "auto" one? i.e.: > > 1. Enumerate files in "shutdown" directory in reverse order, getting >name from each and doing shutdown action on it > > 2. If there were no files there, instead use "auto" directory for >this purpose > > 3. Then do shutdown action on every remaining running domain as >usual > > Again this still results in everything getting a shutdown action if > administrator does not want to do any of this. > > It's an open question for me whether step 2 (falling back to > enumerating "auto" directory) only happens when "shutdown" directory > is empty or if it should happen all of the time. This strikes me (note, I am NOT a Debian maintainer) as likely to involve too much work for too little gain. For complex setups this won't be enough, for simple setups this will be overkill. > > If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save; > > they will be paused in identifier order, but saved by domain name. When > > scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs > > are restored in a distinct (and often problematic) order. > > > > A minimal solution would be for `xendomains` to save VMs in > > /var/lib/xen/save - and then use `sort -n` during restore. > > If by this you mean it would be good if the "save all" action picked > the filename from the filename in the "auto" directory, to replicate > that directory's ordering, then I agree. > > If however you mean the actual Xen domid of the running domain then > I'm not sure what that would buy us. 
If I had a domain with a > filename of 010-ldap0.cfg it might get started first and have domid > 1, but then I reboot it and it has domid 99, I wouldn't want it > saved as /var/lib/xen/save/99-ldap0, I'd still want it saved as > /var/lib/xen/save/010-ldap0, Minimal meaning very simple to implement, but very limited. The idea is that domains which start later get higher domain IDs. As long as crucial domains rarely get restarted, they will tend to keep low domain IDs. This fails when a crucial domain gets restarted late for some reason, but this might capture enough low-hanging fruit to be worthwhile. > > A better approach would be to have a LSB style header specifying > > dependencies to flag VMs which should be saved or shutdown late, > > and VMs which should be saved or shutdown early.
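A minimal sketch of the reverse-order idea discussed in this message, assuming the domain name for each /etc/xen/auto/ file has already been extracted by some other tool (that parsing step is the hard part and is not shown). The function name `shutdown_order` and its arguments are hypothetical. It follows the variant where domains NOT listed in /etc/xen/auto/ are handled first, then the listed ones in reverse filename order.

```python
def shutdown_order(auto_files, running):
    """Return the order in which running domains would be saved/shut down.

    auto_files: dict mapping a filename in /etc/xen/auto/ to the domain
                name it defines (hypothetical input, parsed elsewhere).
    running:    iterable of names of currently running domains.
    """
    running = list(running)
    running_set = set(running)

    # Start order is plain filename order, so shutdown uses the reverse.
    listed = [auto_files[f] for f in sorted(auto_files, reverse=True)]
    listed_set = set(listed)

    # Domains with no entry in the auto directory go down first.
    unlisted = [d for d in running if d not in listed_set]

    # Only act on listed domains which are actually running.
    return unlisted + [d for d in listed if d in running_set]


if __name__ == "__main__":
    auto = {"0_ldap.cfg": "ldap", "1_fileserver.cfg": "fileserver"}
    print(shutdown_order(auto, ["ldap", "scratch", "fileserver"]))
```

Plain string sorting is used for the filenames, so a zero-padded prefix such as NNN- keeps the order predictable.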
Bug#452721: #452721 moreinfo?
I'm surprised #452721 is tagged moreinfo since it seems simple, but that may depend on installation capability. Note, I am not the original reporter, so I might actually be observing something distinct. I doubt this, but I cannot be certain. Issue is this, a hypervisor machine could have tens or even hundreds of VMs. There could be ordering dependencies during startup and shutdown. Notably there are core services, such as LDAP, DHCP, fileserver and DNS. Often these need to be up before anything else and they may need to come up in a particular order. Most often the LDAP server (which can be a distinct VM) needs to be up first. Meanwhile for downtimes, a fileserver (which can also be a VM) needs to go down last. During a full downtime when all VMs were fully shut down, this effect can be achieved by including numbers in the filename. Say /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg, /etc/xen/auto/9_everything_else.cfg. If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save; they will be paused in identifier order, but saved by domain name. When scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs are restored in a distinct (and often problematic) order. A minimal solution would be for `xendomains` to save VMs in /var/lib/xen/save - and then use `sort -n` during restore. A better approach would be to have a LSB style header specifying dependencies to flag VMs which should be saved or shutdown late, and VMs which should be saved or shutdown early. A ridiculous overkill solution might be to turn the /etc/xen/*.cfg files into full init scripts. This could be done by having a script which understood domain configuration files well enough to identify the name/UUID and then start/stop the domain as specified by $1. Use that script as the interpreter (#! line), then it could find the configuration via $0. Then normal init script handling tools could take care of ordering. 
(geeze, that really does actually seem kind of like a semi-workable solution despite seeming rather crazy at first)
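A rough sketch of what restoring in `sort -n` order would mean, assuming save files carry a leading number (for example a domain ID or the prefix from the auto directory). The filename layout and the function names are assumptions for illustration only, not what `xendomains` does today.

```python
import re


def numeric_key(filename):
    """Approximate `sort -n`: compare by leading integer, then by name.

    Names without a leading number are treated as 0, as `sort -n` does.
    """
    match = re.match(r"\s*(-?\d+)", filename)
    number = int(match.group(1)) if match else 0
    return (number, filename)


def restore_order(save_files):
    """Order save-file names so lower-numbered domains are restored first."""
    return sorted(save_files, key=numeric_key)


if __name__ == "__main__":
    names = ["10-web", "9-everything_else", "0-ldap", "1-fileserver"]
    print(restore_order(names))
```

A plain lexical sort would put "10-web" ahead of "9-everything_else", which is the kind of ordering surprise the numeric sort avoids.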
Bug#939186: irt: Bug #939186 and 4.11/4.14
found 939186 4.14.2+25-gb6a8c4f72d-2
found 939186 4.14.3-1
tags 939186 upstream
quit

Upon a bit more experimentation, it seems my minimal example had become too minimal. Bring in the less minimal example and things explode again. I finally set up an appropriate downtime window and it reproduced. After some fighting to get Xen's console operational, detail on the panic was recorded. My wild guess from the output is some combination of enabling "nestedhvm" and this being an AMD processor machine.
Bug#939186: Bug #939186 and 4.11/4.14
Control: found 939186 4.11.4+107-gef32c7afa2-1

Certainly reproduced with 4.11. Most recently I tried 4.14 and it *didn't* occur. Problem is I've got two guesses: First, the system had most VMs shut down for some experimentation. It could be that having >50% of memory allocated is needed for this to occur. Second, it could be the issue got fixed sometime before 4.14. I *really* hope it is the second, but #939186 should remain open for a while so that experimentation can occur.
Bug#991967: Simply ACPI powerdown/reset issue?
On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote: > I presume you are suggesting I try booting 4.19.181-1 on the > current version of Xen-4.14 for bullseye as a dom0. I am not > inclined to try it until an official Debian developer endorses > your opinion that the bug I am seeing is distinct > from #991967, at which point I will report the bug I am > seeing as a new bug.

Chuck Zmudzinski you are getting rather close to my threshold for calling harassment. You're not /quite/ there, but I'm concerned.

Since the purpose of the bug reports is to find and diagnose bugs, I did a bit of experimentation and made some observations. I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is:
e7a17db0305c8de891b366ad3528e5a43015

On top of this I cherry-picked 3 commits from Xen's main branch:
5a4087004d1adbbb223925f3306db0e5824a2bdc
0f089bbf43ecce6f27576cb548ba4341d0ec46a8
bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
(these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command)

With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were:
Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot. As such, I believe there are in fact two distinct bugs being observed. The presence of EITHER of these is sufficient to cause hangs during powerdown or reboot. First, some patch originally from Linux's main branch which breaks Xen reboots was backported somewhere between 4.19.181-1 and 4.19.194-3.
This may either have been introduced before 5.10 diverged from main, or may also have been backported to 5.10. THIS is Debian bug #991967. Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1, which is valuable to ARM devices, breaks reboots and powerdowns on x86. This is correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently this has no Debian bug report. The first is presently unidentified; someone enthusiastic either needs to read git logs/source code, or bisect and build to find where it got broken. For the second we seem to have a fix. The only question is how many patches to cherry-pick? bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-contained and needed for some highly popular ARM devices.
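A small sketch of how the gitweb patch URLs for the three cherry-picked commits could be generated, assuming only the URL layout quoted in this message (`?p=xen.git;a=patch;h=<hash>`). The helper name `patch_url` is made up for illustration; fetching the result and feeding it to `git am` is left to whatever tool is at hand.

```python
import re

GITWEB = "https://xenbits.xen.org/gitweb/"

# Commits named in the message above.
COMMITS = [
    "5a4087004d1adbbb223925f3306db0e5824a2bdc",
    "0f089bbf43ecce6f27576cb548ba4341d0ec46a8",
    "bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b",
]


def patch_url(commit):
    """Build the gitweb URL which serves a commit as a mailbox-style patch."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError(f"not a full commit hash: {commit!r}")
    return f"{GITWEB}?p=xen.git;a=patch;h={commit}"


if __name__ == "__main__":
    for commit in COMMITS:
        print(patch_url(commit))
```

The full-hash check is deliberately strict so that a truncated hash is caught before anything is downloaded.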
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: > > On 9/20/21 7:39 PM, Diederik de Haas wrote: > > On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: > >> Merely having the path is a sufficiently strong indicator for me to > >> simply wave it past. I though would suggest Debian should instead > >> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. > >> > >> This is available as a patch at: > >> > >> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 > > You probably then also want the following commit, which is a fix on that > > patch: > > https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b > > > > Found that via the following url/query: > > https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI > > > > I don't know whether others should be used from that as well. > > I tried these two commits (adapted for the xen-4.14 branch) but this > approach did not fix the bug - with these patches applied the dom0 > did not power down. > > My advice for the Debian Xen Team is to consult with upstream and > get their advice on whether or not it is advisable for Debian to > retain the patches from the Xen-4.16 branch that have been > added to the Debian 4.14 package in an attempt to support > some arm devices that panic during on an unpatched Xen-4.14. > If upstream cannot help Debian backport fixes for arm panics > from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian > Xen team should remove aggressive patches that really have now > turned the Debian Xen-4.14 package into a Frankenstein version > that is a mixture of Xen-4.14 and Xen-4.16, and decide that support > for those arm devices must wait until Debian gets Xen 4.16 up > and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the patch you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Mon, Sep 20, 2021 at 06:29:49PM -0400, Chuck Zmudzinski wrote: > On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: > > > > On 9/20/21 12:27 AM, Elliott Mitchell wrote: > >> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: > >> > >>> I suspect the following patch is the culprit for problems > >>> shutting down on the amd64 architecture: > >>> > >>> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch > >>> This patch does affect amd64 acpi code, and is probably causing > >>> the problem on my amd64 system, so my build of the xen-4.14 > >>> hypervisor without this patch fixed the problem. > >> Of the ones listed that is the only one which has any overlap with x86 > >> code. The next reproduction step is `apt-get source xen && > >> patch -p1 -R < > >> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch > >> && dpkg-buildpackage -b`. Then try with this to confirm that patch > >> is what does it. > >> > >> Thing is that delta is rather small. I don't have a simulator, but that > >> is rather small to be the culprit. > > > > I just tested the build with > > patch -p1 -R < > > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch > > applied before building the package and I can confirm that this is the > > patch > > causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch > > fixes it on my amd64 system. But this would probably break the arm build. > > > > I think one possible fix would require modifying > > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch > > so it only applies at runtime to the arm architecture. I will try some > > modifications to the patch instead of removing it, and if I get something > > that works on amd64 and also might work on arm, I will post it > > for Elliott to try. > > I have an encouraging result.
I found a very simple patch > to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff > bug on my system and it should not affect the arm patches > at all:
> --
> This patch partially reverts previous patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
>
> This hopefully fixes #911976
>
> --- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
> +++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
> @@ -46,10 +46,6 @@
>   if ((phys + size) <= (1 * 1024 * 1024))
>     return __va(phys);
>
> - /* No further arch specific implementation after early boot */
> - if (system_state >= SYS_STATE_boot)
> -   return NULL;
> -
>   offset = phys & (PAGE_SIZE - 1);
>   mapped_size = PAGE_SIZE - offset;
>   set_fixmap(FIX_ACPI_END, phys);
> --
>
> Can you try this patch to src:xen and see if your
> arm devices are OK with it?

Merely having the path is a sufficiently strong indicator for me to simply wave it past. I would, though, suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 The other commit I would suggest being picked by src:xen is 5a4087004d1adbbb223925f3306db0e5824a2bdc. This is for device-tree funkiness which got added between linux-5.10.0 and linux-5.10.y (if the Debian kernel team wants to maintain a fix in Debian's kernel source, that works too). BTW have I mentioned I've become rather skeptical of device-trees being a usable way of representing hardware information?
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: > xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 > > linux kernel version: 5.10.46-4 (the current amd64 kernel > for bullseye) > > Boot system: EFI, not using secure boot, booting xen > hypervisor and dom0 bullseye with grub-efi package for > bullseye, and it boots the xen-4.14-amd64.gz file, not > the xen-4.14-amd64.efi file. > I also tested a buster dom0 with the 4.19 series kernel > on the xen-4.14 hypervisor from bullseye and saw the > problem, but I did not see the problem with either > a buster (linux 4.19) or bullseye (linux 5.10) dom0 on > the xen-4.11 hypervisor, so I think the problem is > with the Debian version of the xen-4.14 hypervisor, > not with src:linux. You're referencing several software versions which are mismatches for #991967. #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3, but not Linux kernel 4.19.181. The fact it correlates with a Linux kernel update rather strongly points to the Linux kernel. I could believe the situation is partially the fault of both though. > I suspect the following patch is the culprit for problems > shutting down on the amd64 architecture: > > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch > This patch does affect amd64 acpi code, and is probably causing > the problem on my amd64 system, so my build of the xen-4.14 > hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. > I think this bug should be re-classified as a bug in src:xen. There could be a separate bug in src:xen, but that is not #991967. 
> I also would inquire with the Debian Xen Team about why they > are backporting patches from the upstream xen unstable > branch into Debian's 4.14 package that is currently shipping > on Debian stable (bullseye). IMHO, the aforementioned > patches that are not in the stable 4.14 branch upstream > should not be included in the xen package for Debian stable. It was requested since someone trying to have Xen operational on a device needed those for operation. Rather a lot of bugfix or very small standalone feature patches get cherry-picked. Presently I haven't been convinced this is a Xen bug (though it does affect Xen installations). Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux kernel? I'm suspecting something got incorrectly backported on the Linux side (alternatively the Xen project seems a bit poor at keeping needed patches in Linux).
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: > On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso > wrote: > > > > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote: > > > An experiment lead to a potential alternative explanation for #991967. > > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at > > > 4.19.194-3. Presence of Xen on the system may be unrelated. > > > > > > Failing that, it could be Xen and non-UEFI systems are effected. (Xen > > > was tried on a UEFI system and the issue wasn't observed) > > > > Following up on https://bugs.debian.org/991967#12 > > > > Did you succeeded in bisecting the issue as you seem to have it > > reproducible? > > I noticed this bug on bullseye ever since I have been > running bullseye as a dom0, but my testing indicates > there is no problem with src:linux but the problem > appeared in src:xen with the 4.14 version of xen on > bullseye. > > I ask Elliott if you are only seeing the problem on Debian's > xen-4.14 hypervisor? Also, which architecture, arm or > amd64? I only see the problem on the Debian xen-4.14 > hypervisor, and I have only tested on amd64, and I > have found a fix for my amd64 system which is as > follows: > > Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, > with a Haswell CPU (core i5-4590S) > > xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 > > linux kernel version: 5.10.46-4 (the current amd64 kernel > for bullseye) Nope. As per the report the problem appeared with kernel 4.19.194-3 and at the time using Xen 4.11. The kernel you're listing is rather more recent, which might suggest a patch which had been backported from 5.x to 4.19. I could believe a Xen security update being the trigger though (I don't recall there being one at the right time, but I wouldn't rule it out). 
> Boot system: EFI, not using secure boot, booting xen > hypervisor and dom0 bullseye with grub-efi package for > bullseye, and it boots the xen-4.14-amd64.gz file, not > the xen-4.14-amd64.efi file. > > I also tested a buster dom0 with the 4.19 series kernel > on the xen-4.14 hypervisor from bullseye and saw the > problem, but I did not see the problem with either > a buster (linux 4.19) or bullseye (linux 5.10) dom0 on > the xen-4.11 hypervisor, so I think the problem is > with the Debian version of the xen-4.14 hypervisor, > not with src:linux. Just to make sure, the kernel you were testing was 4.19.194-3? The issue didn't manifest with kernels earlier than that. Could be we're seeing distinct bugs. > This patch does affect amd64 acpi code, and is probably causing > the problem on my amd64 system, so my build of the xen-4.14 > hypervisor without this patch fixed the problem. While that commit modifies the code path the processor takes, the modified path appears identical. > I also would inquire with the Debian Xen Team about why they > are backporting patches from the upstream xen unstable > branch into Debian's 4.14 package that is currently shipping > on Debian stable (bullseye). IMHO, the aforementioned > patches that are not in the stable 4.14 branch upstream > should not be included in the xen package for Debian stable. Some people are asking for those. Those are bugfixes for an extremely popular device which panics on boot without the patches. Meanwhile turned out between 5.10.0 and 5.10.30 the ARM64 device-trees were modified in a way which broke Xen 4.14 on ARM64. The change violated Linux's own standards for device-trees, yet still appeared in a stable branch. In other news, if you see device-trees compared to ACPI tables, they're not very comparable. 99% of ACPI tables work for all versions of all OSes. Any given device-tree is only likely to work for a single version of a single OS. 
While a useful abstraction for portions of kernel code, device-trees are utter garbage compared to ACPI tables.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sat, Sep 11, 2021 at 01:29:12PM +0200, Salvatore Bonaccorso wrote: > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote: > > An experiment lead to a potential alternative explanation for #991967. > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at > > 4.19.194-3. Presence of Xen on the system may be unrelated. > > > > Failing that, it could be Xen and non-UEFI systems are effected. (Xen > > was tried on a UEFI system and the issue wasn't observed) > > Following up on https://bugs.debian.org/991967#12 > > Did you succeeded in bisecting the issue as you seem to have it > reproducible? Problem is that is rather a lot of kernel builds, which also means a lot of downtime... Right now distribution update seems worthy of greater attention. The one notable bit is the one I sent in the last message. The system does NOT have UEFI, and a test system with UEFI seemed to have no problem.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
An experiment led to a potential alternative explanation for #991967. The issue may be that ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated. Failing that, it could be that Xen on non-UEFI systems is affected. (Xen was tried on a UEFI system and the issue wasn't observed)
Bug#991967: linux-src 4.19.194-3 breaks Xen Dom0 powerdown and reboot
Package: src:linux
Version: 4.19.194-3
Control: affects -1 src:xen

SSIA. Previous versions of 4.19 had no issues (4.19.181-1 according to notes), but this cropped up with 4.19.194-3 (-1 and -2 weren't tested). When a Xen domain 0 tries to reboot or powerdown the computer, it hangs with the display off, but the power supply is active. I'm rebuilding from source, so I imagine this also affects linux-image-4.19.0-17-amd64. Seems .194 caused multiple problems for Xen given #990642.
Bug#989560: Bug #989560 is grub-common, not xen-hypervisor-common
I rate #989560 as a grub-common bug, *not* a xen-hypervisor-common bug. As you've noticed, the problem is with the file /etc/grub.d/20_linux_xen, which is part of grub-common, not xen-hypervisor-common. A working grub.cfg will be generated by the version of the file from GRUB 2.04. If you can deal with installing *only* GRUB from testing, that should work. The bug should be reassigned to grub-common, but marked as affecting Xen so duplicate reports don't show up (actually I'm pretty sure reports against grub-common or src:grub2 already exist).
Bug#979548: u-boot: Package Xen build
On Thu, Jan 07, 2021 at 11:34:44PM -0800, Vagrant Cascadian wrote: > On 2021-01-07, Elliott Mitchell wrote: > > Might it be possible to get a u-boot-xen-arm64 package built? While > > "PyGRUB" is great for Linux, it isn't so good for booting other OSes. > > Do you mean: > > > https://gitlab.denx.de/u-boot/u-boot/-/blob/master/doc/board/xen/xenguest_arm64.rst > > This doesn't describe how to use it or, importantly, what files we would > need to ship in the package. If you could help clarify that (possibly > provide a patch), and ideally get it clarified in the upstream > documentation, then I would think we would be able to ship such a > package. Appears the build issue wasn't libfdt-dev, but instead `dwz` and `debhelper`. I suspect libfdt-dev:any or libfdt-dev may now be sufficient for building (I'm not 100% sure since I have a workaround in place). Anyway I now have something which looks like a first pass at having U-Boot/ARM boot a Xen VM. Some progress has been made, but I haven't confirmed full operation yet. The build was achieved by copying configs/xenguest_arm64_defconfig over qemu_arm64_defconfig and then cross-building for arm64. This suggests extra steps for "qemu" are also appropriate for "xenguest". Once complete, the file ./usr/lib/u-boot/qemu_arm64/u-boot.bin was copied to the host machine. A configuration file was created for xl, the value for "kernel" was pointed at the u-boot.bin file, and both the "bootloader" and "ramdisk" options were left unset. Upon attempting to boot this VM (`xl create -c u-boot.cfg`) I ended up at a prompt "xenguest# ". The command-line appeared to act how I would expect U-Boot to act, so I conclude U-Boot had successfully loaded. The next task is to get the OS I wish to run in the VM loaded by U-Boot. As of 2020.10+dfsg-2, it appears the "xenguest" defconfig disables all EFI/GPT support. I must recommend the U-Boot maintainers advise upstream to set CONFIG_EFI_LOADER=y and CONFIG_CMD_PART=y in the Xen defconfig.
While some smaller VMs may not need EFI support, it appears to be gaining traction everywhere with ARM64. I note SuSE uses it as an intermediate stage between U-Boot and GRUB. FreeBSD's ARM64 VM images appear to *assume* EFI is in use. I haven't gotten U-Boot/Xen to successfully load FreeBSD's bootloader yet, but progress is being made.
Bug#979548: u-boot: Package Xen build
On Thu, Jan 07, 2021 at 11:34:44PM -0800, Vagrant Cascadian wrote: > This doesn't describe how to use it or, importantly, what files we would > need to ship in the package. If you could help clarify that (possibly > provide a patch), and ideally get it clarified in the upstream > documentation, then I would think we would be able to ship such a > package. I think 2 or 3 files would be useful to ship in such a package. First would be "u-boot.bin" or whatever the output filename is. Second might be a README mentioning the 3 values needing to be set in a domu.cfg file. Third might be a /etc/xen/xlexample.u-boot file. The 3 values which need to be set in the domain configuration file are:
    kernel = "/usr/lib/u-boot/xen/u-boot.bin"
    # ramdisk =
    # extra =
Mainly the "ramdisk" and "extra" settings should be left unset, while "kernel" points at the U-Boot image. A /etc/xen/xlexample.u-boot would be a copy of Xen's /etc/xen/xlexample.pvlinux with the 3 values set appropriately. Then https://wiki.debian.org/Xen should be adjusted to mention U-Boot being available to boot user domains for Xen. In fact I'm trying to find out whether Xen/U-Boot can load OSes besides Linux. Note, this is presently theory as src:u-boot has a problematic set of package requirements. Presently libfdt-dev doesn't allow installation of multiple architecture versions. My build VM is set up to target Xen which needs the host package, while the u-boot build needs the build package. Grr!
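To make the three settings concrete, here is a rough sketch which renders a domain configuration fragment of the kind a /etc/xen/xlexample.u-boot might contain. The function name, the domain name and the memory size are placeholders invented for illustration; the image path is the one proposed in this message and does not exist in any shipped package yet.

```python
def uboot_domain_config(name,
                        uboot_image="/usr/lib/u-boot/xen/u-boot.bin",
                        memory_mb=512):
    """Render a minimal xl domain configuration which boots via U-Boot.

    "kernel" points at the U-Boot image; "ramdisk" and "extra" are
    emitted only as comments, i.e. deliberately left unset.
    """
    lines = [
        f'name = "{name}"',
        f'kernel = "{uboot_image}"',
        "# ramdisk =   (left unset, U-Boot loads the real OS)",
        "# extra =     (left unset)",
        f"memory = {memory_mb}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(uboot_domain_config("example-guest"), end="")
```

A real configuration would still need disk and network entries, which are omitted here because nothing in the thread specifies them.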
Bug#979548: u-boot: Package Xen build
On Thu, Jan 07, 2021 at 11:34:44PM -0800, Vagrant Cascadian wrote: > On 2021-01-07, Elliott Mitchell wrote: > > Might it be possible to get a u-boot-xen-arm64 package built? While > > "PyGRUB" is great for Linux, it isn't so good for booting other OSes. > > Do you mean: > > > https://gitlab.denx.de/u-boot/u-boot/-/blob/master/doc/board/xen/xenguest_arm64.rst > > This doesn't describe how to use it or, importantly, what files we would > need to ship in the package. If you could help clarify that (possibly > provide a patch), and ideally get it clarified in the upstream > documentation, then I would think we would be able to ship such a > package. I'm less than 100% sure myself. :-) Most likely you simply configure xenguest_arm64_defconfig, build the configuration and the package would be the copyright plus one output file. In order to use this you setup a VM/domain configuration file where the single output file is specified as the "kernel" parameter. This would cause the U-Boot image to be loaded as if it was an OS kernel and be loaded into the resultant VM and started. Then in theory U-Boot loads configuration parameters from VM disk devices as it normally would.
Bug#974755: smartd: Problematic memory activity
Hmm, don't see a copy of the follow-up message anywhere. Sent to the bug and not me? 6 devices are being monitored, they're behind a HP controller (cciss driver). I don't know for certain that triggering self-tests is the cause, this is merely obvious speculation. My most recent observations seem to suggest this is incorrect as the oom-killer was triggered at a time when self-tests shouldn't be run. I'm open to other programs on the system being the actual cause, allocating memory quickly and pushing the limits. Thing is I've never observed any program besides `smartd` triggering the oom-killer. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#976123: u-boot-rpi: Unreliable USB with storage+keyboard
Package: u-boot-rpi Version: 2020.10+dfsg-1+b1 Severity: important Hopefully SSIA. U-Boot's USB support is highly unreliable. Trying to interact with an advanced bootloader (GRUB) via USB-keyboard is highly troublesome if the Raspberry PI is also booting from a USB storage device. There is some level of magic timing needed to get U-Boot to detect both and provide access for long enough for the bootloader (GRUB here) to finish its job. UEFI is very new to U-Boot, but USB is highly unreliable. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#976122: u-boot-rpi: Fails with mini-UART
Package: u-boot-rpi Version: 2020.10+dfsg-1+b1 Severity: important Appears "standard" device trees for the Raspberry PI 4B connect the serial pins to the mini-UART. This is troublesome due to the mini-UART's baud rate changing when the processor clock changes. Often Raspberry PI devices have an initial boot phase where the processor clock is locked at maximum for a period, and then decreased. If/when that decrease occurs, the baud rate changes and suddenly serial communication becomes corrupt. 3 strategies come to mind for U-Boot:
1> Dynamically modify the baud rate register as the processor clock changes. The register holds a clock divisor, so if the processor clock is increased, increase the baud rate register. If the processor clock is decreased, decrease the baud rate register.
2> Peg the processor clock at maximum until EFI boot mode is exited.
3> Peg the processor clock at minimum until EFI boot mode is exited.
The first is ideal, but requires U-Boot to monitor the processor clock as it changes dynamically. The next two are suboptimal, but not too likely to cause problems. The likely cause for a bootloader (GRUB) to remain active is user interaction via the serial port. In this case a stable baud rate is crucial. GRUB is likely to issue halt instructions while waiting and this should keep processor temperature down. As such I feel pegging the processor clock to max is better than pegging to minimum. Once an OS takes over, EFI boot services should be exited and then it is no longer a U-Boot issue. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
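For the first strategy, a rough sketch of the arithmetic involved. It assumes the divisor relation given in Broadcom's BCM2835 peripherals documentation for the mini-UART, baud = clock / (8 * (reg + 1)), and assumes the same relation holds on the 4B. The clock values are examples only, and the function name is made up; this is not U-Boot code.

```python
def mini_uart_baud_reg(clock_hz, baud):
    """Return the baud rate register value for a given input clock.

    Assumes baud = clock_hz / (8 * (reg + 1)), so
    reg = clock_hz / (8 * baud) - 1, rounded to the nearest integer.
    """
    divisor = (clock_hz + 4 * baud) // (8 * baud)  # rounded integer division
    if divisor < 1:
        raise ValueError("clock too slow for requested baud rate")
    return divisor - 1


if __name__ == "__main__":
    for clock in (250_000_000, 500_000_000):  # example clocks only
        print(clock, mini_uart_baud_reg(clock, 115200))
```

Under that assumption the register value has to be rewritten each time the clock feeding the UART changes, otherwise the line rate drifts with the clock.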
Bug#939633: More severe #939633 for RP4 on 5.8?
found 935456 5.9.6-1~bpo10+1 quit After having spent several hours on kernel compiles and experimenting with the situation, I'm fairly sure this also applies to linux-source-5.9. Odd thing is, when I booted the device using the Tianocore implementation it came right up with no problems. I'm getting this odd suspicion someone deliberately broke the device-trees in Debian's kernel source. The goal being to force everyone onto the Tianocore/ACPI implementation and try to kill device-trees. Right now I think this is conspiracy theory territory, but I'm left wondering how such a serious bug could hang around so long... -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#824954: IRT: [bug #52939] [PATCH] 10_linux: support loading device trees
The patch to have GRUB load a device-tree is interesting. This is certainly worthy of discussion. Three issues come up when looking though: First, your patch modifies /etc/grub.d/10_linux, but misses /etc/grub.d/20_linux_xen. /etc/grub.d/20_linux_xen needs a fairly similar treatment. Second, rather than having this get buried inside Debian bug #824954, you should instead file a new bug against grub-common. Third, there may be a need for extra guarding to ensure these sections *only* get invoked on ARM devices (I'm fairly sure the *exact* *same* file is shipped for all architectures). -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#963962: /etc/grub.d/20_linux_xen generates non-functional menu entries
found 963962 2.02+dfsg1-20+deb10u2 2.04-10 quit I was going to report I'd never observed this bug, but then I examined the grub.cfg files and I discover they're present. I would tend to rate this as minor, but the original submitter didn't adjust severity. With 2.04-10 the xen-4.*.config file entries are absent, but entries for both the .efi file and the other are produced. On an aarch64 system the .efi file can be booted by GRUB 2.04. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#824954: flash-kernel: GRUB? via U-Boot?
For a Raspberry PI, I've got the initial workings of a script to accomplish this goal. First, install u-boot-rpi, raspi-firmware, and grub-efi-arm64. Next, create a filesystem on a device the Raspberry PI will boot from. For anything pre-RP4, this will have to be VFAT and show up in an MBR. A system I've done has a GPT with entry #3, which matches entry #1 in the MBR. The Raspberry PI will find this and boot from it; Linux will see it as /dev/sda3. Mount this filesystem on /boot/efi. Do the following:
cp /usr/lib/raspi-firmware/* /boot/efi
# cp /usr/share/doc/raspi-firmware/copyright /boot/efi/LICENSE.broadcom
cp /usr/lib/u-boot/rpi_arm64/u-boot.bin /boot/efi/u-boot64.bin
cp /usr/lib/u-boot/rpi_3/u-boot.bin /boot/efi/u-boot3.bin
cp /usr/lib/u-boot/rpi_4/u-boot.bin /boot/efi/u-boot4.bin
cp /boot/dtbs/`uname -r`/broadcom/bcm2*-rpi*.dtb /boot/efi
grub-install --bootloader-id=BOOT
cp /boot/efi/EFI/BOOT/grubaa64.efi /boot/efi/EFI/BOOT/bootaa64.efi
echo bootaa64 > /boot/efi/startup.nsh
Now, I'm using SuSE as a starting point. They copy a series of device-tree overlays into /boot/efi/overlays. These may come from the Raspberry PI Foundation for optional hardware/configuration the RPF provides. Next would be to create /boot/efi/config.txt. I'm unsure of which directives would be appropriate for Debian. Debian would certainly need to configure distinct "kernel=" lines depending upon which variant was being booted. This is rather badly damaged by bug #939633. Until the device-trees are fixed, this is completely broken. Not ready for most people, but almost there... -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
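The copy steps above could be captured as data before being turned into a proper script. A rough sketch follows; the function name is invented, it only builds the list of (source, destination) pairs and copies nothing. The grub-install and startup.nsh steps are left out, as is the commented-out LICENSE copy.

```python
import os


def boot_area_copy_plan(kernel_release, efi_dir="/boot/efi"):
    """Return (source, destination) pairs mirroring the cp commands above.

    Sources may contain shell-style globs; expanding them and doing the
    actual copying is left to the caller.
    """
    boot_dir = os.path.join(efi_dir, "EFI", "BOOT")
    return [
        ("/usr/lib/raspi-firmware/*", efi_dir),
        ("/usr/lib/u-boot/rpi_arm64/u-boot.bin",
         os.path.join(efi_dir, "u-boot64.bin")),
        ("/usr/lib/u-boot/rpi_3/u-boot.bin",
         os.path.join(efi_dir, "u-boot3.bin")),
        ("/usr/lib/u-boot/rpi_4/u-boot.bin",
         os.path.join(efi_dir, "u-boot4.bin")),
        (f"/boot/dtbs/{kernel_release}/broadcom/bcm2*-rpi*.dtb", efi_dir),
        # After grub-install --bootloader-id=BOOT has run:
        (os.path.join(boot_dir, "grubaa64.efi"),
         os.path.join(boot_dir, "bootaa64.efi")),
    ]


if __name__ == "__main__":
    for src, dst in boot_area_copy_plan(os.uname().release):
        print(f"cp {src} {dst}")
```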
Bug#940628: Working in 2.04-8 and 2.04-10
As of 2.04-8 it was possible to boot Xen on ARM. The funky mechanism by which GRUB loads its modules does a good job of obscuring which modules to confirm presence of. Seeing 'xen_loader="xen_hypervisor"' makes one expect to find "/usr/lib/grub/arm64-efi/xen_hypervisor.mod", not for it to be taken care of by "/usr/lib/grub/arm64-efi/xen_boot.mod". -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#939633: More severe #939633 for RP4 on 5.8?
found 939633 5.8.10-1~bpo10+1 severity 939633 important merge 935456 939633 quit I'm left suspecting bugs #935456 and #939633, are in reality a single bug: Raspberry Pi device trees were garbled during Debian's 5.2 kernel development. They appear to remain very garbled, to the point of being pretty well useless. I've built a kernel from Debian's 5.8 kernel source and the device tree binary produced doesn't appear to allow a Raspberry PI 4B to complete its boot. Might be USB functionality is operational, but neither ethernet interface nor display function. Ironically, the additional ACPI/EFI support DOES function. This means the Tianocore image for Raspberry PI 4B works better with the current source. I'm unsure whether badly breaking all Raspberry PI variants quite justifies critical or grave (popular machine, but kernel issues by nature cause 10x the damage so severities should be somewhat damped). I certainly hope to see the 5.9 release since that has additional high-value improvements... -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#939186: [Pkg-xen-devel] Bug#939186: HVM + Balloon crashes Xen hypervisor
On Wed, Nov 25, 2020 at 01:32:10PM +0100, Hans van Kranenburg wrote: > Can you still reproduce this with Xen 4.11 or 4.14? > If not, can you mail 939186-cl...@bugs.debian.org to close it? > > I just tried a few things with maxmem and memory with a PVH guest on Xen > 4.14, and it just seems to work like it should. I /think/ I tried it with 4.11 and it continued to reproduce. That though was sufficiently non-recently that I need to recheck to be certain. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#921547: u-boot: Please consider making u-boot* arch:all
My thinking mirrors one of Jonathan McDowell's: One should be able to build an installation image for $device/$architecture on $random_device/$random_architecture. This is very useful for exactly the same situations where using `debootstrap --foreign` is. Say if one has a desktop already running proper Debian and a target device which needs to get U-Boot. As such let me suggest this should also be considered for all of the u-boot-* packages. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#975685: grub-install fails with U-Boot EFI
Package: grub2-common Version: 2.04-10 `grub-install` fails to install properly when run on a system using U-Boot's implementation of the EFI protocol (potentially also affects package grub-efi-arm64, perhaps this should be against src:grub2). Since a Tianocore-based implementation of the EFI protocol is also available, I can provide more information. A useful distinction is U-Boot's EFI implementation does NOT implement EFI variables. This seems a plausible method to distinguish U-Boot's partial EFI implementation from Tianocore's complete EFI implementation. On the U-Boot implementation grubaa64.efi needs to be installed as /boot/efi/EFI/BOOT/bootaa64.efi instead. Roughly akin to --bootloader-id=BOOT, plus an extra rename. I suspect I may be filing other bugs soon. (the platform is a Raspberry Pi 4B, the Tianocore implementation is quite workable except too many pieces of software assume device-tree on ARM and won't work with ACPI) -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
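A sketch of the distinction suggested above, in case it helps discussion. It assumes that an empty or missing efivarfs mount (/sys/firmware/efi/efivars on Linux) is a usable proxy for "no EFI variable support"; that heuristic is a guess, not something grub-install does today, and both function names are invented.

```python
import os


def efi_variables_available(efivars_dir="/sys/firmware/efi/efivars"):
    """Return True if the efivarfs directory exists and has any entry."""
    try:
        with os.scandir(efivars_dir) as entries:
            return any(True for _ in entries)
    except OSError:
        return False


def grub_image_destination(has_efi_vars, efi_root="/boot/efi"):
    """Pick where grubaa64.efi should end up.

    With EFI variables a boot entry can point at EFI/debian; without
    them only the fallback path EFI/BOOT/bootaa64.efi will be found.
    """
    if has_efi_vars:
        return os.path.join(efi_root, "EFI", "debian", "grubaa64.efi")
    return os.path.join(efi_root, "EFI", "BOOT", "bootaa64.efi")


if __name__ == "__main__":
    print(grub_image_destination(efi_variables_available()))
```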
Bug#824954: flash-kernel: GRUB? via U-Boot?
There may be several distinct bugs involved with #824954. For one, I suspect `grub-install`'s behavior needs to change if EFI variables aren't supported. I use this as a flag which could distinguish installation on top of a full EFI implementation (perhaps Tianocore-derived), versus U-Boot's rather primitive EFI implementation. Notably right now `grub-install` tries to install to /boot/efi/EFI/debian by default. This is appropriate for a full EFI implementation where boot entries can be added by adding variables. Yet with U-Boot's limited implementation, the files must go in EFI/BOOT (--bootloader-id=BOOT). Right now I'm simply trying to figure out what others have done to reuse it for my own purposes. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#948712: Returned mail: see transcript for details
reopen 948712 quit There should be a rather obvious use case where an absent /boot/firmware is quite appropriate: someone needing a copy of the firmware, but using other tools to build the boot area. Notably one might use raspi-firmware to retrieve start*.elf/fixup*.dat. Then add u-boot-rpi for second stage bootloader. Next grub-efi-arm* for third stage. Lastly flash-kernel to glue all the pieces together. Not one of these requires the existence of /boot/firmware. In fact, not one of these needs the installation of dosfstools. Perhaps the raspi-firmware package should be split into pieces so as to allow merely installing the actually required portions? (raspi-firmware-bin which depends upon: raspi1-firmware-bin, raspi2-firmware-bin, raspi3-firmware-bin, and raspi4-firmware-bin?) -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#563204: Recommends is really too strong for os-prober
Commenting since the report still exists in the bug DB... I've found `os-prober` often produces many false positive OS installation detections. As such I really find recommends too strong, simply including during installation and then merely suggests would be better. If someone removes it, likely that is due to not needing it and the choice should be honored. Worse, on VM systems searching for additional OS installations is a major security risk due to potential for finding VMs instead of host. If one of those manages to boot on bare hardware, everything is compromised. Producing messages during updates though is reasonable. Just don't be too pushy, even keeping recommends packages from being reinstalled requires a careful eye. Mostly documenting some people *really* don't want os-prober, even though we are likely the minority. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#968965: xen: FTBFS woes in sid
On Fri, Nov 20, 2020 at 08:02:26PM +0100, Hans van Kranenburg wrote: > So, > > On 9/21/20 4:16 PM, Hans van Kranenburg wrote: > > [...] > > > > gcc-Wl,-z,relro -Wl,-z,now -pthread -Wl,-soname > > -Wl,libxentoolcore.so.1 -shared -Wl,--version-script=libxentoolcore.map > > -o libxentoolcore.so.1.0 handlereg.opic > > /usr/bin/ld: i386:x86-64 architecture of input file `handlereg.opic' is > > incompatible with i386 output > > /usr/bin/ld: handlereg.opic: file class ELFCLASS64 incompatible with > > ELFCLASS32 > > /usr/bin/ld: final link failed: file in wrong format > > collect2: error: ld returned 1 exit status > > This one is caused by "debian/rules: Combine shared Make args". I > reverted that change for now. > > When retrying the i386 build, I run into yet another failure, sigh: > > >8 > > dh_install: warning: Cannot find (any matches for) > "usr/lib/debug/usr/lib/xen-*/boot/*" (tried in ., debian/tmp) > > dh_install: warning: xen-utils-4.14 missing files: > usr/lib/debug/usr/lib/xen-*/boot/* > dh_install: error: missing files, aborting > > >8 > > I can only find CONFIG_PV_SHIM=n in the build log. What is going on > here? Attached is the build log. > > My WIP branch is here (including the make-patches commit, it's ready to > build). I also forwarded the thing to latest stable-4.14. > > https://salsa.debian.org/xen-team/debian-xen/-/commits/knorrie/4.14/ I was going to type, "That can't be true! Both sections are identical, so that commit *couldn't* have done it!" Being the careful sort, look closer. Look closer. Then realize if one reads fast they look identical, but they're getting *slightly* different values for ${XEN_TARGET_ARCH}. Mainly for $(make_args_xen), ${XEN_TARGET_ARCH} gets $(xen_arch_$(flavour)), but for $(make_args_tools), ${XEN_TARGET_ARCH} gets $(xen_arch_$(DEB_HOST_ARCH)). Three of us and we didn't spot that difference. Should still combine ${XEN_COMPILE_ARCH} which remains identical for both values. 
-- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#546392: Isn't bug #546392 complete?
I'm pretty sure bug #546392 was completed several /years/ ago, yet the bug was never marked complete. I don't recall when, but perhaps near version 2.01 or earlier? -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#975062: Python 3 (pygrub) in 4.14 packages
On Wed, Nov 18, 2020 at 04:32:00PM +0100, Hans van Kranenburg wrote: > I also have a little snippet from IRC, which is about this, where Ian > reports that he's seen it working. > > https://salsa.debian.org/xen-team/debian-xen/-/snippets/500 > > So, apparently there are cases in which pygrub 'works' and in which it > does not, and apparently using pygrub with "amd64 kernel and Xen tools > but i386 userland" is problematic, and I remember some remarks which I > can't find back about that that use case was probably already broken > always, in the past. > > I wanted to find out about this and set up some test cases to reproduce > things (I've never used pygrub yet), but that obviously did not happen > yet. I have some stuff going on in my personal life that is taking up a > lot of time currently. What is rather easy for *me* is to help > organizing the work and managing todo lists etc, but not learning new > stuff ATM. > > So, my current questions are: > > 1. Is pygrub a blocker for having Xen 4.14 in unstable? Because that > should be our first team-goal now. > 2. What exactly is going on, can we make a list/table/whatever about in > which cases pygrub 'does not work' (in more detail, how does it fail). > 3. pygrub keeps being the thing that always causes problems. What would > be your (asking anyone who wants to think along) ideas about which > well-defined situations/test-cases we should have to execute instead of > having the users report problems after big package changes? > > Hans > > P.S. Next message after the commercials will be on #968965 which is the > other biggest issue for Xen 4.14 in unstable now. Due to working with Pry Mar, I can state the cross-compilation of the Python shared objects may not be 100% functional yet. Looks very much like Python's "distutils" took 2 steps forward and then 2 steps backward during the Python 2 -> Python 3 transition. (Great! Linking and compilation got separated. Ewww! CFLAGS gets appended to LDFLAGS. Great! 
We'll add support for architecture-specific compilation directories. Ewww! There isn't a good way to pass in the architecture triplet.) I've got an initial patch for working around an issue here, but the quality doesn't look great to me. Something along those lines should be submitted to Xen, but I'm unsure of all the issues. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 >From 1bb407482fa82ad5034a4e4bdfa34dfa3a828f9a Mon Sep 17 00:00:00 2001 From: Elliott Mitchell Date: Thu, 1 Oct 2020 15:19:33 -0700 Subject: [PATCH] tools/python: Correct extension filenames for Python 3 Appears Python became *more* difficult to properly cross-compile between Python 2 and Python 3. This takes care of the naming of the xc.so/xs.so extension shared objects for Python. Signed-off-by: Elliott Mitchell --- tools/pygrub/setup.py | 16 +++- tools/python/setup.py | 16 +++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/tools/pygrub/setup.py b/tools/pygrub/setup.py index 91019e97e7..dfe01e6220 100644 --- a/tools/pygrub/setup.py +++ b/tools/pygrub/setup.py @@ -11,6 +11,19 @@ except KeyError: pass XEN_ROOT = "../.." 
+from distutils import command +import distutils.command.build_ext +class BuildExtArch(distutils.command.build_ext.build_ext): + arch_map = { + 'x86_64': 'amd64', + 'x86_32': 'i386', + 'arm64': 'aarch64', + 'arm32': 'armel', + } + def get_ext_filename(self, ext_name): + name = super().get_ext_filename(ext_name) + return name.replace(os.getenv("XEN_COMPILE_ARCH"), self.arch_map[os.getenv("XEN_TARGET_ARCH")]) + xenfsimage = Extension("xenfsimage", extra_compile_args = extra_compile_args, extra_link_args = extra_link_args, @@ -30,5 +43,6 @@ setup(name='pygrub', package_dir={'grub': 'src', 'fsimage': 'src'}, scripts = ["src/pygrub"], packages=pkgs, - ext_modules = [ xenfsimage ] + ext_modules = [ xenfsimage ], + cmdclass = {'build_ext': BuildExtArch}, ) diff --git a/tools/python/setup.py b/tools/python/setup.py index 8faf1c0ddc..6b95d30b89 100644 --- a/tools/python/setup.py +++ b/tools/python/setup.py @@ -13,6 +13,19 @@ PATH_LIBXC= XEN_ROOT + "/tools/libxc" PATH_LIBXL= XEN_ROOT + "/tools/libxl" PATH_XENSTORE = XEN_ROOT + "/tools/xenstore" +from distutils import command +import distutils.command.build_ext +class BuildExtArch(distutils.command.build_ext.build_ext): + arch_map = { + 'x86_64': 'amd64', + 'x86_32': 'i386', + 'arm64': 'aarch64', + 'arm32': '
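Stripped of the distutils plumbing, the core of the patch is a string substitution on the extension filename. The following standalone sketch shows just that step; the map is copied from the first hunk of the patch, while the function name and the example filename are hypothetical and chosen only to show the effect.

```python
# Map from Xen's architecture names to the names expected in the
# extension filename (taken from the patch above).
ARCH_MAP = {
    "x86_64": "amd64",
    "x86_32": "i386",
    "arm64": "aarch64",
    "arm32": "armel",
}


def cross_ext_filename(native_name, compile_arch, target_arch,
                       arch_map=ARCH_MAP):
    """Rewrite a natively-named extension filename for the target arch.

    native_name is what the build machine's Python would produce;
    compile_arch and target_arch correspond to XEN_COMPILE_ARCH and
    XEN_TARGET_ARCH in the patch.
    """
    return native_name.replace(compile_arch, arch_map[target_arch])


if __name__ == "__main__":
    print(cross_ext_filename("xc.cpython-38-x86_64-linux-gnu.so",
                             "x86_64", "arm64"))
```

This is part of why the quality doesn't look great: a plain substring replace will also hit any other occurrence of the architecture string in the path.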
Bug#974756: idle3-tools: Needs support for drives behind controllers
Package: idle3-tools Version: 0.9.1-2 `idle3ctl` needs an implementation of `smartctl`'s -d option in order to talk to disks behind hardware RAID controllers. This is nearly a bug in smartmontools of the code for the -d option needing to turn into a library so other low-level tools can utilize it. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#974755: smartd: Problematic memory activity (triggers oom-killer)
Package: smartmontools Version: 6.6-1 `smartd` is doing some sort of activity which tends to trigger the kernel oom-killer. I suspect this may relate to triggering self-tests. System in question has plenty of swap available, and presently reports more than 50MB of available memory. Presently adding "choom -p $$ -n +500 >/dev/null" to /etc/default/smartmontools works around the worst of the damage as this causes it to tend to kill itself instead of rather more critical daemons. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
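For reference, the `choom` workaround boils down to writing a value into the process's oom_score_adj file under /proc. A minimal sketch of the same adjustment follows; the function name is invented and the proc_root parameter exists only so it can be tried against a scratch directory.

```python
import os


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write an OOM score adjustment for a process and return the path.

    Valid values run from -1000 (never kill) to +1000 (kill first).
    Raising the value needs no privileges; lowering it usually does.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be within -1000..1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as handle:
        handle.write(f"{value}\n")
    return path


if __name__ == "__main__":
    # Same effect as "choom -p $$ -n +500", but for this Python process.
    print(set_oom_score_adj(500))
```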
Bug#774129: dpkg-buildpackage: Should set the cross build profile automatically
(sending a second copy to the body of the message since <774...@bugs.debian.orgg> didn't quite work) retitle 774129 dpkg-buildpackage: Should set the cross build profile automatically severity 774129 normal quit Setting the "cross" build profile could be the difference between a successful cross package build and a build failure. As such I believe this rates a normal bug as the -a/-t options are effectively broken right now. Stating in the man page "-Pcross" must be used if -a or -t is used might turn this minor, though I really think the full solution should be aimed for. As stated in my last message, I also think setting the cross profile should cause "noocaml" and potentially a few other no profiles to be set. This would both simplify supporting cross-building (since packages wouldn't need to detect the cross profile at every point the noocaml profile is detected). This would also make a future transition when OCAML became cross build-friendly simpler since some packages wouldn't need further adjustment to work, and those which did need adjustment would cause bug reports mentioning the situation. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#971397: dpkg-dev: dpkg-buildpackage -P option behavior change in update
Package: dpkg-dev Version: 1.19.7 Severity: important Between versions 1.19.6 and 1.19.7 the behavior of the -P option for dpkg-buildpackage changed. At 1.19.6 if there was no string directly on the -P option, the following argument would be interpreted as the profiles to set. At 1.19.7 the string MUST be part of the same argument. ie at 1.19.6, `dpkg-buildpackage -a arm64 -P cross` worked, while 1.19.7 *requires* `dpkg-buildpackage -a arm64 -Pcross` (the latter may have worked with 1.19.6, but the former worked with 1.19.6) I see good arguments both for and against allowing or not the profile list being a separate argument. Overtly user-visible behavior though should NOT change with patch-level changes (should be minor-version). -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
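To illustrate that accepting both spellings is not hard, here is a small sketch using Python's argparse, which takes a short option's value either attached or as the following argument. This is purely an illustration of the two forms; dpkg-buildpackage is Perl and parses its options its own way, and the function name and "build-sketch" program name are invented.

```python
import argparse


def parse_profiles(argv):
    """Return the build profile list from -P, attached or separate."""
    parser = argparse.ArgumentParser(prog="build-sketch", add_help=False)
    parser.add_argument("-a", dest="host_arch")
    parser.add_argument("-P", dest="profiles", default="")
    args = parser.parse_args(argv)
    # Profiles are given as a comma-separated list.
    return [p for p in args.profiles.split(",") if p]


if __name__ == "__main__":
    print(parse_profiles(["-a", "arm64", "-P", "cross"]))
    print(parse_profiles(["-a", "arm64", "-Pcross"]))
```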
Bug#961511: [Pkg-xen-devel] Bug#961511: [PATCH] d/xen-utils-common.xen.init: disable oom killer for xenstored
On Tue, Sep 22, 2020 at 02:39:09PM +0200, Hans van Kranenburg wrote: > How did you test it and how did you get a working process without the --? By reading the man page, noticing there was no mention of "--" and then trying `choom -n +5 sleep 5` and found that worked. When you sent this message I checked and GNU `sleep` does have "--version", thus I tried `choom -n +5 sleep 5 --version` and found *that* failed. A "--" seemed natural, but documentation omitting crucial details is a problem. Never mind. Nice find, I did at one point have the oom-killer get the wrong process and saw *problems*. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#961511: [PATCH] d/xen-utils-common.xen.init: disable oom killer for xenstored
This is fun. Actually isn't too difficult to trigger, simply slowly reduce the memory Xen allocates to Dom0 and eventually the oom-killer is likely to trigger (having tried to shrink Dom0 as far as possible, believe me, I know). I had been wondering which of the Xen daemons could be safely restarted since it is handy to restart daemons instead of whole machine for security updates... Interestingly running `xenstored --help` mentions: -I, --internal-db store database in memory, not on disk There is a run/xenstored/tdb file so I end up wondering if newer versions are in fact storing everything in a file and restarting isn't so bad. The patch switches the arguments from: --exec "$try_xenstored" -- ... to: --exec /usr/bin/choom -- -n -1000 "$try_xenstored" -- ... I'm pretty sure start-stop-daemon is consuming the "--" and the second "--" shouldn't be there. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
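To see what actually reaches `choom`, it helps to split the argument list the way start-stop-daemon does, at the first "--". A sketch, with the daemon path and the trailing xenstored option invented for the example:

```python
def split_at_first_double_dash(argv):
    """Split argv into (own options, arguments passed to the program).

    Only the first "--" is consumed; any later "--" is passed through
    untouched to the started program.
    """
    if "--" not in argv:
        return list(argv), []
    index = argv.index("--")
    return argv[:index], argv[index + 1:]


if __name__ == "__main__":
    argv = ["--start", "--exec", "/usr/bin/choom", "--",
            "-n", "-1000", "/usr/sbin/xenstored", "--",
            "--pid-file", "/run/xenstored.pid"]  # hypothetical arguments
    own, passed = split_at_first_double_dash(argv)
    print(own)
    print(passed)
```

So the second "--" does arrive at `choom`. Whether `choom` needs it there depends on how `choom` itself parses options placed after the command name, which this sketch does not model.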
Bug#774129: dpkg-*: Doesn't set cross-build profile with -a or -t
found 774129 1.19.7 quit You might consider -a/--target-arch or -t/--target-type to merely be conveniences, but /not/ enabling the cross profile when the build arch differs from the host arch is stopping a decimeter shy of the goal line. Is it even possible someone /wouldn't/ want the cross profile if cross-building? (a --no-default-profiles option could be worthwhile) While the original bug report is about `dpkg-buildpackage`, this also applies to `dpkg-checkbuilddeps`. I'm inclined to rate this as more than "wishlist". At a minimum, the man page for `dpkg-buildpackage` would need to suggest using "-P cross" when using -a or -t. What is worthy of consideration is whether enabling the cross profile should cause the noocaml profile (possibly others) to be enabled. Cross-compiling Python bindings is pretty simple, depend on libpython-dev and python-dev:any. OCAML is presently very broken for cross-compilation (pretty well impossible). I'm unsure of the other profiles. My concern is if enabling cross doesn't enable noocaml, any package setup for cross-compilation, but includes some OCAML will have to disable OCAML if either is enabled. In the future once OCAML becomes cross-friendly, every single package will then need to reenable OCAML. Whereas if dpkg-buildpackage is handling the situation merely removing cross => noocaml could take care of those which don't need adjustment to OCAML invocation (and would then generate new reports for packages which do need adjustment). -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
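The proposed behaviour is small enough to sketch. This is the proposal, not what dpkg does; the function name is invented, and the list of profiles implied by "cross" is an assumption kept in one place so that removing noocaml later is a one-line change.

```python
def effective_profiles(build_arch, host_arch, requested=(),
                       implied_by_cross=("noocaml",)):
    """Return the build profiles to use under the proposed rules.

    "cross" is added automatically when build and host architectures
    differ, and "cross" in turn pulls in the implied profiles.
    """
    profiles = list(requested)
    if build_arch != host_arch and "cross" not in profiles:
        profiles.append("cross")
    if "cross" in profiles:
        for implied in implied_by_cross:
            if implied not in profiles:
                profiles.append(implied)
    return profiles


if __name__ == "__main__":
    print(effective_profiles("amd64", "arm64"))
    print(effective_profiles("amd64", "amd64", requested=["nocheck"]))
```

Once OCAML becomes cross-friendly, passing an empty implied_by_cross models dropping cross => noocaml without touching any package.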
Bug#965245: [Pkg-xen-devel] Bug#965245: Cross-build issues
On Sat, Jul 18, 2020 at 04:08:50PM +0200, Hans van Kranenburg wrote: > On 7/18/20 5:53 AM, Elliott Mitchell wrote: > > Package: src:xen > > Version: 4.13 > > Tags: patch > > > > I've been playing try to get Xen 4.13 to cross-build for ARM. In the > > process I've been running into bunches of problems, so here are fixes. > > Can you: > * add a 'why' line to the commit message of the first patch > * add Signed-off-by lines > * and then mailbomb (git send-email) it to > pkg-xen-de...@lists.alioth.debian.org with Cc to Ian Jackson > ? Just all of it in 1 mail thread? (So, > with 0/10 cover letter which does not have to contain anything else than > something like 'Hi! See #965245, kthxbye'.) > > Then we can collect some Reviewed-by etc. Will do, may end up collecting an extra patch or two in the process (one of these has been sent upstream, Debian builds are unfinished for me). > > OCAML/xenstored is being problematic, that looks like outright bugs on > > ocaml-nox making it unusable for cross-building. > > The cxenstored is also still there. The init scripts look if oxenstored > is installed, and if not, it falls back to using normal xenstored. So, I > suspect if you patch it out of the build for this arch, then no other > changes are necessary. (Normally both are built now, so that if a user > wants, in case of problems or whatever, they can switch back). The problem is OCAML is basically utterly broken for cross-building. There is the "-cc" argument for `ocamlc` which looks like someone started work on making it work cross-architecture, but never finished. In light of this, that is pretty much what I've done. In order to get dh_install to cooperate and ensure xen-utils-wrapper functions with distinct builds, I need substitutes for oxenstored.conf and oxenstored. > > I'm including copies of 3 patches from Julien Grall. Upstream source for > > this is: git://xenbits.xen.org/people/julieng/xen-unstable.git The > > branch "arm-dma/v2". 
> > Ok, these patches are in Xen 4.14 I see. First thing I want to do going > forward is forwarding the packaging to that. I hope this will also only > make your life easier. Hmm, thought they were against 4.13. Might be these revised ones are targeting 4.14, but the code is the same on 4.13. > But, keep the 3 upstream patches in the set for now, so that it's > explicit that you need them for this. > > > Why yes, I am trying to get Xen operational on a Raspberry PI. Why do > > you ask? :-) > > Haha. Exciting. I like it. Looking forward to see it working and help > testing it here. I didn't do cross-building yet, so time to learn > something new. There appear to be a *bunch* of people trying to get Xen operational on Raspberry PI 4b devices. I'm aiming for what I consider to be a straightforward approach, which is to use existing packaging tools. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445