On 06.02.20 17:27, Lukáš Doktor wrote: > Dne 06. 02. 20 v 16:03 Max Reitz napsal(a): >> On 03.02.20 08:59, Lukáš Doktor wrote: >>> Using a range of ports from 32768 to 65538 is dangerous as some >>> application might already be listening there and interfere with the >>> testing. There is no way to reserve ports, but let's decrease the chance >>> of interactions by only using ports that were free at the time of >>> importing this module. >>> >>> Without this patch CI occasionally fails with used ports. Additionally I >>> tried listening on the first port to be tried via "nc -l localhost >>> $port" and no matter how many other ports were available it always >>> hanged for infinity. >> >> I’m afraid I don’t quite understand. The new functions check whether >> the ports are available for use by creating a server on them (i.e., >> binding a socket there). The current code just lets qemu create a >> server there, and see if that works or not. >> >> So the only difference I can see is that instead of trying out random >> ports during the test and see whether they’re free to use we do this >> check only once when the test is started. >> >> And the only problem I can imagine from your description is that there >> is some other tool on the system that tries to set up a server but >> cannot because we already have an NBD server there (by accident). >> >> But I don’t see how checking for free ports once at startup solves that >> problem reliably. >> >> If what I guessed above is right, the only reliable solution I can >> imagine would be to allow users to specify the port range through >> environment variables, and then you’d have to specify a range that you >> know is free for use. >> >> Max >> > > Hello Max, > > thank you and I am sorry for not digging deep enough. This week my CI failed > with: > > 01:24:06 DEBUG| [stdout] +ERROR: test_inet (__main__.QemuNBD) > 01:24:06 DEBUG| [stdout] > +---------------------------------------------------------------------- > 01:24:06 DEBUG| [stdout] +Traceback (most recent call last): > 01:24:06 DEBUG| [stdout] + File "147", line 85, in setUp > 01:24:06 DEBUG| [stdout] + self.vm.launch() > 01:24:06 DEBUG| [stdout] + File > "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", > line 302, in launch > 01:24:06 DEBUG| [stdout] + self._launch() > 01:24:06 DEBUG| [stdout] + File > "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", > line 319, in _launch > 01:24:06 DEBUG| [stdout] + self._pre_launch() > 01:24:06 DEBUG| [stdout] + File > "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/qtest.py", > line 106, in _pre_launch > 01:24:06 DEBUG| [stdout] + super(QEMUQtestMachine, self)._pre_launch() > 01:24:06 DEBUG| [stdout] + File > "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", > line 270, in _pre_launch > 01:24:06 DEBUG| [stdout] + self._qmp = > qmp.QEMUMonitorProtocol(self._vm_monitor, server=True) > 01:24:06 DEBUG| [stdout] + File > "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/qmp.py", > line 60, in __init__ > 01:24:06 DEBUG| [stdout] + self.__sock.bind(self.__address) > 01:24:06 DEBUG| [stdout] +OSError: [Errno 98] Address already in use > > I made the mistake of reproducing this on my home system using the qemu > revision that I had and assuming it's caused by a used port. So I limited the > port range and used nc to occupy the port. It sort-of reproduced but instead > of Address already in use it hanged until I kill the nc process. Then it > failed with: > > +Traceback (most recent call last): > + File "147", line 124, in test_inet > + flatten_sock_addr(address)) > + File "147", line 59, in client_test > + self.assert_qmp(result, 'return', {}) > + File "/home/medic/Work/Projekty/qemu/tests/qemu-iotests/iotests.py", line > 821, in assert_qmp > + result = self.dictpath(d, path) > + File "/home/medic/Work/Projekty/qemu/tests/qemu-iotests/iotests.py", line > 797, in dictpath > + self.fail('failed path traversal for "%s" in "%s"' % (path, str(d))) > +AssertionError: failed path traversal for "return" in "{'error': {'class': > 'GenericError', 'desc': 'Failed to read initial magic: Unexpected end-of-file > before all bytes were read'}}" > > After a brief study I thought qemu is not doing the job well enough and > wanted to add a protection. Anyway after a more thorough overview I came to a > different conclusion and that is that we are facing the same issue as with > incoming migration about a year ago. What happened is that I started "nc -l > localhost 32789" which results in: > > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > nc 26758 medic 3u IPv6 9579487 0t0 TCP localhost:32789 (LISTEN) > > Then we start the server by "_try_server_up" where qemu-nbd detects the port > is occupied on IPv6 but available on IPv4, so it claims it: > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > nc 26758 medic 3u IPv6 9579487 0t0 TCP localhost:32789 > (LISTEN) > qemu-nbd 26927 medic 4u IPv4 9591857 0t0 TCP localhost:32789 > (LISTEN) > > and reports success. Then we try to connect but the hotplugged VM first > attempts to connect on the IPv6 address and hangs for infinity. > > Now is this an expected behavior? If so then we need the find_free_address > (but preferably directly in _try_server_up just before starting the qemu-nbd) > to leave as little time-frame for collision as possible. Otherwise the test > is alright and qemu-nbd needs a fix to bail out in case some address is > already used (IIRC this is what incoming migration does).
Ah, OK. Well, expected behavior... It’s a shame, that’s what it is. > My second mistake was testing this on the old code-base and rebasing it only > before sending the patch (without testing after the rebase). If I were to > test it first, I would have found out that the real reproducer is simply > running the test as the commit 8dff69b9415b4287e900358744b732195e1ab2e2 broke > it. > > > So basically there are 2 actions: > > 1. fix the test as on my system it fails in 100% of cases, bisect says the > first bad commit is 8dff69b9415b4287e900358744b732195e1ab2e2. Would anyone > have time in digging into this? I already spent way too much time on this and > don't really know what that commit is trying to do. Yep, I’ve sent a patch: https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00294.html > 2. decide on the behavior when IPv4/6 is already in use (bail-out or start). > 2a. In case it should bail-out than the test is correct and there is no need > for my patch. On the other hand qemu-nbd has to be fixed I don’t think it makes much sense to let qemu’s NBD server ensure that it claims both IPv4 and IPv6 in case the user specifies a non-descriptive hostname. > 2b. Otherwise I can send a v2 that will check the port in the _try_server_up > just before starting qemu-nbd to minimize the risk of using a utilized port > (or should you decide it's not worth checking, I can simply forget about this) Hm. It wouldn’t be fully reliable, but, well... The risk would be minimal. OTOH, would it work if we just did a %s/localhost/127.0.0.1/ in the test? We have specific cases for IPv6, so I think it makes sense to force IPv4 in all other cases. Max