I may have had an epiphany... either that or I'm totally confused (no bets here :-)

If send_command doesn't get a result immediately, can't poll() walk over it? So "get radio" might end up getting some other response, since that one is first on the read queue? That is, it sends a command that doesn't get an immediate response, then poll() comes in and does "get frequencies", and the other command sees the response to that? And then that confuses HRD, because it receives an additional command before it has responded to the first one?
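The suspected ordering can be replayed by hand in a few lines. This is only a sketch of the hypothesis above, not WSJT-X code: FakeRig, SerializedClient and the "reply to ..." strings are made-up stand-ins, and the fake rig simply answers in FIFO order. It shows the mismatch when poll() slips in between a write and its read, and that holding a lock across the whole send+receive exchange keeps replies paired with their commands.

```python
import threading
from collections import deque


class FakeRig:
    """Hypothetical stand-in for the rig server: replies come back in FIFO order."""

    def __init__(self):
        self._replies = deque()

    def send(self, command):
        # The fake server queues exactly one reply per command, in arrival order.
        self._replies.append(f"reply to {command}")

    def recv(self):
        # A reader always gets whatever reply is first on the queue.
        return self._replies.popleft()


class SerializedClient:
    """Holds a lock for the whole send+receive exchange."""

    def __init__(self, rig):
        self._rig = rig
        self._lock = threading.Lock()

    def exchange(self, command):
        with self._lock:
            self._rig.send(command)
            return self._rig.recv()


def interleaved_without_lock():
    """Replay the suspected ordering: poll() runs between a write and its read."""
    rig = FakeRig()
    rig.send("get radio")          # send_command writes; its reply is not read yet
    rig.send("get frequencies")    # poll() writes its own command
    poll_sees = rig.recv()         # poll() reads first and gets the other reply
    command_sees = rig.recv()      # send_command reads what is left
    return poll_sees, command_sees


def hammer(client, command, count, mismatches):
    # Repeat one command and record every reply that belongs to another command.
    for _ in range(count):
        if client.exchange(command) != f"reply to {command}":
            mismatches.append(command)


def run_serialized(count=200):
    """Two threads share one client; returns the list of mismatched replies."""
    client = SerializedClient(FakeRig())
    mismatches = []
    threads = [
        threading.Thread(target=hammer, args=(client, cmd, count, mismatches))
        for cmd in ("get radio", "get frequencies")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches


if __name__ == "__main__":
    print(interleaved_without_lock())
    print(run_serialized())
```

In the unlocked replay poll() ends up holding the "get radio" reply, which is the mix-up described above. Whether the real send_command and poll() can actually interleave like this is still the open question.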
Mike W9MDB

From: Bill Somerville [mailto:g4...@classdesign.com]
Sent: Friday, September 05, 2014 7:21 PM
To: wsjt-devel@lists.sourceforge.net
Subject: Re: [wsjt-devel] HRD tcp bug

On 06/09/2014 00:53, Michael Black wrote:

Hi Mike,

OK, current run with rev 4255, but I am running with the 10 second socket timeout modification. I ran WSJT-X for a while. I then started HRD Logbook at 13:10. Port 56169 is WSJT-X and Port 56193 is HRD Logbook. Takes a few seconds for Logbook to get going.

13:10:07.064095 WSJT-X does "get frequencies" and HRD does ACK that packet
13:10:07.171095 Logger connects to HRD
13:10:09.593095 HRD finally responds to WSJT-X "get frequencies"
13:10:10.105095 WSJT-X sends "[743] get radio"
13:10:10.697095 reply to "get radio"
13:10:16.190095 WSJT-X sends "[743] get radio"; ACK is received but no reply from HRD
13:10:26.189095 retransmit; ACK but no reply
13:10:36.189095 retransmit; ACK but no reply
13:10:46.189095 retransmit; ACK but no reply
13:10:56.189095 retransmit; ACK but no reply
WSJTX retries exhausted.

OK, that's not good. So, in this case, HRD is clearly just swallowing the "get radio" command and never responding to it while it's happily chatting with HRD Logger.

It might be related to "get radio". This is what Logger does

13:10:09.595095 IP 127.0.0.1.56193 > 127.0.0.1.7809: P 79:121(42) ack 109 win 127
0x0000: 4500 0052 6865 4000 8006 0000 7f00 0001 E..Rhe@.........
0x0010: 7f00 0001 db81 1e81 8d15 bf8b a1fe 10ac ................
0x0020: 5018 007f e852 0000 2a00 0000 cdab 3412 P....R..*.....4.
0x0030: 3412 cdab 0000 0000 4700 6500 7400 2000 4.......G.e.t...
0x0040: 5200 6100 6400 6900 6f00 7300 0000 0000 R.a.d.i.o.s.....
0x0050: 0000 ..

13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: . ack 121 win 127
0x0000: 4500 0028 6866 4000 8006 0000 7f00 0001 E..(hf@.........
0x0010: 7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5 ................
0x0020: 5010 007f b7da 0000 P.......

13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: P 109:177(68) ack 121 win 127
0x0000: 4500 006c 6867 4000 8006 0000 7f00 0001 E..lhg@.........
0x0010: 7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5 ................
0x0020: 5018 007f 280c 0000 4400 0000 cdab 3412 P...(...D.....4.
0x0030: 3412 cdab 0000 0000 3700 3400 3300 3a00 4.......7.4.3.:.
0x0040: 5400 5400 2d00 4f00 4d00 4e00 4900 2000 T.T.-.O.M.N.I...
0x0050: 5600 4900 4900 2000 2800 5200 6100 6400 V.I.I...(.R.a.d.
0x0060: 6900 6f00 2900 0000 0000 0000 i.o.).......

Once that is sent HRD no longer seems to want to respond to "get radio". Could this be a new command that should be used "get radios" instead of "get radio" ??

My guess when implementing this was that I needed to detect a change of radio so I prefix every command with a "get radio" exchange to check that I am still talking to the same rig. "get radio" seems to get the radio that currently has focus in HRD. IIRC the only way I could be consistent when there was more than one rig connected to HRD was to always address the rig that has focus, otherwise we would have to specify the actual radio name somewhere in WSJT-X which I wanted to avoid.

The "get radios" command is different as it returns all the rigs currently connected to HRD.

It is quite possible that the "get radio" exchange is actually the cause of the issues you are seeing. It is late here; so tomorrow I'll have a look at a version that doesn't repeatedly use "get radio" to check the HRD context number. The change will be non-trivial but I should be able to come up with something we can at least test with to verify we have found the action that triggers the issue.

Mike W9MDB

73
Bill G4WJS.

From: Bill Somerville [mailto:g4...@classdesign.com]
Sent: Friday, September 05, 2014 10:52 AM
To: wsjt-devel@lists.sourceforge.net
Subject: Re: [wsjt-devel] HRD tcp bug

On 05/09/2014 16:29, Michael Black wrote:

Hi Mike,

Still having problems with the HRD tcp bug.
What's of interest to me is that I can run HRD Logger and DM780, both of which use port 7809 too, and they don't seem to have any problems. So I'm going to try and compare these to hopefully figure out what's going on. If you want, I can capture the network traffic with all 3 of these things running and send it to you along with a trace log.

This packet dump shows an example of the problem. After HRD died I clicked on "retry", so this is the sequence that followed.

11:28:27.460700 The first command "get context" goes out
11:28:32.460910 retransmit of "get context" after timeout
11:28:32.592923 HRD responds to first attempt
11:28:32.592923 WSJTX moves on to "get id"
11:28:35.770041 HRD responds to 2nd attempt

I do get rather significant pauses in HRD, especially when switching frequencies. So I bumped up the socket_wait_time to 10000 and now I hit the max # of retries error, which I've never seen before, during which menu operations on WSJTX are pretty much hosed since the timeout takes a minute or so with 5 retries.

Is that the socket wait time in WSJT-X? This is an interesting observation: all the CAT activity in WSJT-X should be asynchronous, as it is running in a separate dedicated thread and the main GUI doesn't block waiting for responses. I need to find out why you are seeing this problem, as it may be an error in the message queuing to and from the transceiver control thread.

Mike W9MDB

73
Bill G4WJS.

------------------------------------------------------------------------------
Slashdot TV. Video for Nerds. Stuff that matters.
http://tv.slashdot.org/
_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel
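
For reference, the two data packets in the 13:10:09.595095 capture earlier in this thread can be decoded with a short script. This is a sketch that assumes a frame layout read off those two packets only, not taken from any HRD documentation: a 32-bit little-endian length, three further 32-bit words (two that look like constants and one that is zero in both packets), then NUL-padded UTF-16LE text. The name decode_frame and the field names are made up for illustration. The hex strings are the bytes that follow the 40 bytes of IP and TCP headers in each dump.

```python
import struct

# Payload of the Logger -> HRD packet (tcpdump reports 42 data bytes).
REQUEST_HEX = (
    "2a00 0000 cdab 3412 3412 cdab 0000 0000 "
    "4700 6500 7400 2000 5200 6100 6400 6900 6f00 7300 "
    "0000 0000 0000"
)

# Payload of the HRD -> Logger packet (tcpdump reports 68 data bytes).
REPLY_HEX = (
    "4400 0000 cdab 3412 3412 cdab 0000 0000 "
    "3700 3400 3300 3a00 5400 5400 2d00 4f00 4d00 4e00 4900 2000 "
    "5600 4900 4900 2000 2800 5200 6100 6400 6900 6f00 2900 "
    "0000 0000 0000"
)

HEADER_BYTES = 16  # four 32-bit words, as assumed above


def decode_frame(payload: bytes):
    """Split one payload into (size, word1, word2, word3, text).

    The layout is an assumption inferred from the two captured packets.
    """
    size, word1, word2, word3 = struct.unpack_from("<IIII", payload, 0)
    # Everything after the header is treated as UTF-16LE text padded with NULs.
    text = payload[HEADER_BYTES:size].decode("utf-16-le").rstrip("\x00")
    return size, word1, word2, word3, text


if __name__ == "__main__":
    for name, hex_dump in (("request", REQUEST_HEX), ("reply", REPLY_HEX)):
        size, word1, word2, word3, text = decode_frame(bytes.fromhex(hex_dump))
        print(f"{name}: size={size} words={word1:#010x},{word2:#010x},{word3:#x} text={text!r}")
```

The decoded length words agree with the 42 and 68 byte counts tcpdump prints for the two packets. The number before the colon in the reply text is the same 743 that appears in the "[743] get radio" commands in the timeline.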