Hello,

This problem has been bugging me for some time, so I tried to dissect it using strace, and eventually I got lucky.

The problem is with the code that transfers data from the testbed to the host. The relevant part of the trace is:

read(4, "3\"\n\t\t      (literal (question-an"..., 1000000) = 466944
mremap(0xb4e16000, 1003520, 471040, MREMAP_MAYMOVE) = 0xb4e16000
munmap(0xb4f0b000, 4096)                = 0
write(1, "3\"\n\t\t      (literal (question-an"..., 466944) = 466944
mmap2(NULL, 1003520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb4f0b000
read(4, " to remove any fcrontab, and\n   "..., 1000000) = 1000000
munmap(0xb4e16000, 471040)              = 0
write(1, " to remove any fcrontab, and\n   "..., 1000000) = 1000000
munmap(0xb4f0b000, 1003520)             = 0
madvise(0xb5bff000, 8372224, MADV_DONTNEED) = 0
exit(0)                                 = ?

As you can see, the process read 1000000 bytes and then exited, although more bytes were probably available: the last read filled the entire 1000000-byte buffer. IMHO that happened because it had been signaled (via the running flag) that the testbed process had ended.

I have created a patch that somewhat mitigates this race condition by ensuring that all available data is read from the file before the running flag is checked. I need to test it more, but it seems to work so far. However, it can still fail if data arrives in the file in the (small) window between the read and the check of the running flag, and the flag changes within that same window.
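
To make the intended ordering explicit, here is a simplified, self-contained sketch of the loop. It is not the actual shovel function: the names drain_then_check and is_running are made up for illustration, and the stop condition is passed in as a callable rather than read from a global flag.

```python
import os
import time


def drain_then_check(fin, fout, is_running, poll_interval=0.01):
    """Copy bytes from fd `fin` to fd `fout`.

    The stop condition is consulted only after a read came back empty,
    so data that is already available is never abandoned.
    (Hypothetical helper, not part of autopkgtest.)
    """
    os.set_blocking(fin, False)
    total = 0
    while True:
        try:
            block = os.read(fin, 1000000)
        except BlockingIOError:
            block = b""  # nothing available right now
        if not block:
            # Only now look at the flag. Data arriving between the read
            # above and this check can still be lost if the flag changes
            # in the same window.
            if not is_running():
                return total
            time.sleep(poll_interval)
            continue
        view = memoryview(block)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fout, view)
            view = view[written:]
        total += len(block)
```

With this ordering, a call made when is_running() already returns False still copies everything that is currently readable before returning.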

Regarding this whole auxverb thing and the shovel function, did you consider any other solutions that could be more reliable? For example, I believe the files it actually uses are normal files, is that right? If so, couldn't the output from the testbed be collected synchronously after the testbed has exited? Or could named pipes be used instead, which would obviate the need to guess where the data ends?
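
To illustrate the named-pipe idea, here is a minimal sketch that runs on a single Linux machine. The names read_until_eof and demo are hypothetical, and I have not checked whether a FIFO would work across the host/testbed boundary. The point is only that a reader of a FIFO gets an unambiguous end-of-file once every writer has closed its end, so no flag is needed.

```python
import os
import tempfile
import threading


def read_until_eof(path, chunk_size=65536):
    """Read a FIFO until all writers have closed it (hypothetical helper)."""
    chunks = []
    fd = os.open(path, os.O_RDONLY)  # blocks until a writer opens the FIFO
    try:
        while True:
            block = os.read(fd, chunk_size)
            if not block:  # empty read means EOF: no writers remain
                break
            chunks.append(block)
    finally:
        os.close(fd)
    return b"".join(chunks)


def demo():
    """Send 3 MB through a FIFO and report whether all of it arrived."""
    tmpdir = tempfile.mkdtemp()
    fifo = os.path.join(tmpdir, "stdout.fifo")
    os.mkfifo(fifo)
    payload = b"x" * 3000000

    def writer():
        # Stands in for the testbed side; closing the file signals EOF.
        with open(fifo, "wb") as f:
            f.write(payload)

    t = threading.Thread(target=writer)
    t.start()
    data = read_until_eof(fifo)
    t.join()
    os.remove(fifo)
    os.rmdir(tmpdir)
    return data == payload
```

Here the reader blocks instead of polling, and it stops exactly when the writer closes the pipe, regardless of how the data was chunked on the way.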


Regards

    Jiri Palecek

--- autopkgtest-normal/usr/bin/autopkgtest-virt-qemu	2017-04-30 19:09:57.000000000 +0200
+++ /mnt/extras/src/autopkgtest-4.4+nmu1/virt/autopkgtest-virt-qemu	2017-08-14 03:25:20.382311089 +0200
@@ -328,7 +328,7 @@
     fcntl.fcntl(fin, fcntl.F_SETFL,
                 fcntl.fcntl(fin, fcntl.F_GETFL) | os.O_NONBLOCK)
     count = 0
-    while running:
+    while True:
         try:
             block = os.read(fin, 1000000)
             if flagfile_on_eof and not block:
@@ -343,6 +343,8 @@
                 raise
             block = None
         if not block:
+            if not running:
+                return
             time.sleep(0.01)
             continue
         while True:
