On Tue, May 01, 2018 at 11:57:25AM +0100, Dr. David Alan Gilbert wrote: > * Daniel P. Berrangé (berra...@redhat.com) wrote: > > On Tue, May 01, 2018 at 11:00:35AM +0100, Daniel P. Berrangé wrote: > > > On Mon, Apr 30, 2018 at 07:59:43PM +0100, Dr. David Alan Gilbert (git) > > > wrote: > > > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > > > > > > > During a TLS connect we see: > > > > migration_channel_connect calls > > > > migration_tls_channel_connect > > > > (calls after TLS setup) > > > > migration_channel_connect > > > > > > > > My previous error handling fix made migration_channel_connect > > > > call migrate_fd_connect in all cases; unfortunately the above > > > > means it gets called twice and crashes doing double cleanup. > > > > > > > > Fixes: 688a3dcba98 > > > > > > This fixes the crash, but we're still seeing error messages duplicated > > > > > > (qemu) migrate_set_parameter tls-creds tls0 > > > (qemu) migrate tcp:localhost:9000 > > > qemu-system-x86_64: Certificate does not match the hostname localhost > > > qemu-system-x86_64: Certificate does not match the hostname localhost > > > > > > git bisect points to 688a3dcba98 as the cause of these double > > > errors still. > > Can you give me the full sequence to trigger that; I did try yesterday > and have failed to get that error out of the TLS code.
Essentially I follow this: https://qemu.weilnetz.de/doc/qemu-doc.html#network_005ftls when generating the server-cert.pem file I use this input template: $ cat server.info organization = My Org cn = 127.0.0.1 # Commented out to force failure with 'localhost' name #dns_name = "localhost" dns_name = "localhost.localdomain" ip_address = 127.0.0.1 ip_address = ::1 tls_www_server encryption_key signing_key $ certtool --generate-certificate \ --load-ca-certificate ca-cert.pem \ --load-ca-privkey ca-key.pem \ --load-privkey server-key.pem \ --template server.info \ --outfile server-cert.pem On the server (mig target) side I run: $ ./x86_64-softmmu/qemu-system-x86_64 \ -object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/security/qemutls,verify-peer=no \ -incoming defer -monitor stdio -display none QEMU 2.11.50 monitor - type 'help' for more information (qemu) migrate_set_parameter tls-creds tls0 (qemu) migrate_incoming tcp:localhost:9000 Notice I've set verify-peer=no on the server, since I've not bothered to issue any client-cert.pem on my dev setup here. On the client (mig source) side I run: $ ./x86_64-softmmu/qemu-system-x86_64 \ -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/security/qemutls\ -monitor stdio -display none QEMU 2.11.50 monitor - type 'help' for more information (qemu) migrate_set_parameter tls-creds tls0 (qemu) migrate tcp:localhost:9000 qemu-system-x86_64: Certificate does not match the hostname localhost qemu-system-x86_64: Certificate does not match the hostname localhost (qemu) info migrate globals: store-global-state: on only-migratable: off send-configuration: on send-section-footer: on capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off Migration status: failed (Certificate does not match the hostname localhost) total time: 0 milliseconds Notice that when generating the server certificate I did *not* include "localhost" as a DNS name, nor in the Common Name. So when I connect with a URI of "tcp:localhost:9000" we correctly see that the "localhost" name doesn't match any DNS name or common name in the server cert. If I instead set the tls-hostname parameter it will succeed: (qemu) migrate_set_parameter tls-hostname 127.0.0.1 (qemu) migrate_set_parameter tls-creds tls0 (qemu) migrate tcp:localhost:9000 (qemu) info migrate globals: store-global-state: on only-migratable: off send-configuration: on send-section-footer: on capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off Migration status: completed total time: 131 milliseconds downtime: 5 milliseconds setup: 1 milliseconds transferred ram: 1369 kbytes throughput: 105.76 mbps remaining ram: 0 kbytes total ram: 148296 kbytes duplicate: 36817 pages skipped: 0 pages normal: 261 pages normal bytes: 1044 kbytes dirty sync count: 2 page size: 4 kbytes Or likewise if I used "tcp:127.0.0.1:9000" it would succeed too, because "127.0.0.1" is listed as a IP address in the server cert. > > > The second stack trace is the error reporting context that I added > > originally > > in > > > > commit d59ce6f34434bf47a9b26138c908650bf9a24be1 > > Author: Daniel P. Berrange <berra...@redhat.com> > > Date: Wed Apr 27 11:05:00 2016 +0100 > > > > migration: add reporting of errors for outgoing migration > > > > > > So the first stack trace is the new duplicate. > > > > Which error reporting context is "better" though, I don't know ? > > > OK, I see why I didn't see this - I almost always use migrate -d and > that timer only happens without the -d. Oh, yes, I forget to mention I left migration in the foreground. If you background it, then my original patch would only show the error message when you did "info migrate" > > My patch was based on the view that, although alot of code uses > > error_report, > > long term all migration would eventually need to be able to filter an > > 'Error *errp' back up the stack, so that we can pass it back to QMP / HMP > > via > > 'info migrate' / query-migrate. So I decided to leave the error_report_err > > call to the hmp.c code, as long term that's the only place that would need > > to print to the console. > > I wanted it to land in the /var/log/libvirt/qemu/whatever so we could > see what happened. In general QMP commands are expected to send errors back to the QMP client, not write them to stderr. Of course we have substantial legacy code so some commands do end up only using error_report, but the general expectation with QMP is that we'd eliminate error_report in codepaths triggered from monitor commands. The only reason we print to stderr with QMP is that it would be impossible to diagnose failures for code that is not converted to "Error *" yet. Places where we have converted to "Error *" don't need to use stderr. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|