On 13 June 2014, quoth I:
> Ah. I found a spot in ec_master_queue_datagram where I had incorrectly
> applied patch 11 (and jiffies_sent would have been 0). I've been
> sidetracked a little and haven't had a chance to re-test this, but I
> expect it will solve the issue; thanks for the hint!

Unfortunately that wasn't it. But the bisection search I did definitely suggested it was something in this patch that introduced the behaviour.

I did make some other intentional changes from the patch:

- I changed the "protocol" parameter of ec_slave_datagram_to_buffer and ec_slave_buffer_to_datagram to uint8_t from uint16_t (since that seems more consistent).
- I left out the cycles_poll > cycles_sent and jiffies_poll > jiffies_sent checks in the timeout checking, since as I noted before these would not be safe against wraparound.
- Some of the logging levels were changed.
- Otherwise it's only minor formatting changes and nothing that should affect functionality.

If I put the jiffies_poll > jiffies_sent check back in, I do not get these 17171869us timeouts; but I'm unconvinced it's safe to leave this check in.

Incidentally, reversing the calculation:

    time_us = (unsigned int)((jiffies_poll - jiffies_sent) * 1000000 / HZ);

where time_us = 17171869 and HZ = 250, suggests that (poll-sent) is -2 jiffies (17171869 * 250 / 1000000 ~= -2), but then plugging that forwards through the formula I'm not sure why time_us != 4294959296 (aka -8000) instead. (I did notice that (uint32_t)-2 * (uint64_t)(1000000 / 250) = 17179869176000, which at least has the right digits but they're at the wrong magnitude.) I suspect I'm missing something fundamental (and obvious).

In any case, I changed the code where it assigned the current time to datagrams pulled out of the buffer to assign {cycles,jiffies}_poll instead, and that seemed to resolve the issue without the need for the dubious comparison. Do you agree that this is a reasonable solution?

> (Part of the side-track suggested that patch 26 might not be sufficient
> to solve that problem, but I haven't confirmed that yet, and it'll
> probably be a few days before I get a chance to check it again. And of
> course it's possible that this was just another error on my part, or
> affected by the above goof.)

This did work in the end; when it was appearing not to work, it was on a bisection build that didn't have that patch applied yet.
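
On the 17171869us puzzle: one possible explanation, assuming jiffies is a 32-bit unsigned long on this target (the post does not say, so treat this as a guess), is that the multiplication by 1000000 wraps modulo 2^32 before the division by HZ happens. A difference of -2 jiffies then becomes -2000000 mod 2^32 = 4292967296, and 4292967296 / 250 truncates to 17171869. The sketch below is a standalone userspace illustration of that arithmetic, not code from the patch; SKETCH_HZ, elapsed_us_32, elapsed_us_divide_first and stamp_after are made-up names. The last helper shows a wraparound-safe ordering test in the spirit of the kernel's time_after(), as one alternative to a plain jiffies_poll > jiffies_sent comparison.

```c
#include <stdint.h>

/* Assumption: jiffies is a 32-bit unsigned value, as on a 32-bit kernel. */
#define SKETCH_HZ 250u /* HZ value quoted in the post */

/* Mirrors the quoted formula with every step carried out in 32 bits. */
static uint32_t elapsed_us_32(uint32_t poll, uint32_t sent)
{
    uint32_t diff = poll - sent;                            /* wraps mod 2^32 */
    uint32_t scaled = (uint32_t)(diff * UINT32_C(1000000)); /* wraps again */
    return scaled / SKETCH_HZ;                              /* divide last */
}

/* Same inputs, but with 1000000 / HZ folded into a single factor first. */
static uint32_t elapsed_us_divide_first(uint32_t poll, uint32_t sent)
{
    uint32_t diff = poll - sent;
    return (uint32_t)(diff * (UINT32_C(1000000) / SKETCH_HZ));
}

/*
 * Returns nonzero if stamp a is later than stamp b, treating the counter
 * as circular. This stays correct across a wrap as long as the two stamps
 * are less than 2^31 ticks apart. The cast to int32_t relies on the usual
 * two's-complement conversion.
 */
static int stamp_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

With poll = 1000 and sent = 1002 (arbitrary values two jiffies apart), elapsed_us_32 returns 17171869 and elapsed_us_divide_first returns 4294959296, so the order of the multiply and divide accounts for the difference between the two figures. stamp_after(sent, poll) is nonzero for those values, which is the "sent is later than poll" case that the patch's comparison was filtering out.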