On 06/09/2014 17:37, Michael Black wrote:

Hi Mike,

> I may have had an epiphany… either that or I'm totally confused (no bets
> here :-)
> If the send_command doesn't get a result immediately, can't poll() walk over
> it? So "get radio" might end up getting some other response since it's
> first on the read queue?
> So it sends a command that doesn't get an immediate response, then poll()
> comes in and does "get frequencies" and the other command sees the response
> to that?
> And then that confuses HRD by receiving an additional command before
> responding to the first one?

I don't think so. All the CAT commands run in a single thread and block for replies. That means that send_command cannot be interrupted by a do_sync (the consequence of a poll). The lack of a reply when one is expected will cause the timeout exception, but send_command will not exit until all the retries have been exhausted.
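To illustrate the blocking behaviour described above, here is a minimal Python sketch of a single-threaded request/response call with a per-attempt timeout and resends. It is not the WSJT-X code. The `send_command` signature, the `RetriesExhausted` exception, and the 10 s / 5-attempt defaults (borrowed from the settings discussed later in this thread) are assumptions for illustration only.

```python
import socket


class RetriesExhausted(Exception):
    """Hypothetical error raised when every attempt times out."""


def send_command(sock: socket.socket, frame: bytes,
                 timeout_s: float = 10.0, attempts: int = 5) -> bytes:
    """Send `frame` and block the calling thread until a reply arrives.

    Nothing else can use `sock` (for example a poll-driven "get frequencies")
    until this returns, because the caller's thread is busy in here.
    Simplification: the first chunk received is treated as the whole reply.
    """
    sock.settimeout(timeout_s)
    for _ in range(attempts):
        sock.sendall(frame)  # first attempt, then one resend per timeout
        try:
            reply = sock.recv(4096)
        except socket.timeout:
            continue  # no reply within timeout_s: resend the same frame
        if not reply:
            raise ConnectionError("peer closed the connection")
        return reply
    raise RetriesExhausted(f"no reply after {attempts} attempts of {timeout_s} s")


if __name__ == "__main__":
    client, fake_hrd = socket.socketpair()  # stands in for the TCP link to HRD
    fake_hrd.sendall(b"reply")  # pretend HRD has already answered
    print(send_command(client, b"get radio", timeout_s=0.1))  # b'reply'
```

Because each attempt resends the same frame, a late reply to an earlier attempt can be read as the answer to a later one, leaving an extra reply queued for the next command; the 11:28 "get context"/"get id" sequence further down the thread looks like that pattern. With `timeout_s=10` and `attempts=5` the sketch can block for about 50 s before giving up.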
I don't have access to my shack at present since it is active in the AA contest; I hope to look at this further a bit later.

>
>
> Mike W9MDB

73
Bill
G4WJS.

>
> From: Bill Somerville [mailto:g4...@classdesign.com]
> Sent: Friday, September 05, 2014 7:21 PM
> To: wsjt-devel@lists.sourceforge.net
> Subject: Re: [wsjt-devel] HRD tcp bug
>
> On 06/09/2014 00:53, Michael Black wrote:
> Hi Mike,
> OK… current run with rev 4255, but I am running with the 10 second socket
> timeout modification.
>
> I ran WSJT-X for a while.
> I then started HRD Logbook at 13:10. Port 56169 is WSJT-X and port 56193
> is HRD Logbook.
>
> Takes a few seconds for Logbook to get going…
> 13:10:07.064095 WSJT-X does "get frequencies" and HRD ACKs that packet
> 13:10:07.171095 Logger connects to HRD
>
> 13:10:09.593095 HRD finally responds to WSJT-X "get frequencies"
> 13:10:10.105095 WSJT-X sends "[743] get radio"
> 13:10:10.697095 reply to "get radio"
>
> 13:10:16.190095 WSJT-X sends "[743] get radio" – ACK is received but no
> reply from HRD
> 13:10:26.189095 retransmit – ACK but no reply
> 13:10:36.189095 retransmit – ACK but no reply
> 13:10:46.189095 retransmit – ACK but no reply
> 13:10:56.189095 retransmit – ACK but no reply
>
> WSJT-X retries exhausted.
> OK, that's not good.
>
>
> So, in this case, HRD is clearly just swallowing the "get radio" command and
> never responding to it while it's happily chatting with HRD Logger.
>
> It might be related to "get radio". This is what Logger does:
> 13:10:09.595095 IP 127.0.0.1.56193 > 127.0.0.1.7809: P 79:121(42) ack 109 win 127
>     0x0000:  4500 0052 6865 4000 8006 0000 7f00 0001  E..Rhe@.........
>     0x0010:  7f00 0001 db81 1e81 8d15 bf8b a1fe 10ac  ................
>     0x0020:  5018 007f e852 0000 2a00 0000 cdab 3412  P....R..*.....4.
>     0x0030:  3412 cdab 0000 0000 4700 6500 7400 2000  4.......G.e.t...
>     0x0040:  5200 6100 6400 6900 6f00 7300 0000 0000  R.a.d.i.o.s.....
>     0x0050:  0000                                     ..
> 13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: . ack 121 win 127
>     0x0000:  4500 0028 6866 4000 8006 0000 7f00 0001  E..(hf@.........
>     0x0010:  7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5  ................
>     0x0020:  5010 007f b7da 0000                      P.......
> 13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: P 109:177(68) ack 121 win 127
>     0x0000:  4500 006c 6867 4000 8006 0000 7f00 0001  E..lhg@.........
>     0x0010:  7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5  ................
>     0x0020:  5018 007f 280c 0000 4400 0000 cdab 3412  P...(...D.....4.
>     0x0030:  3412 cdab 0000 0000 3700 3400 3300 3a00  4.......7.4.3.:.
>     0x0040:  5400 5400 2d00 4f00 4d00 4e00 4900 2000  T.T.-.O.M.N.I...
>     0x0050:  5600 4900 4900 2000 2800 5200 6100 6400  V.I.I...(.R.a.d.
>     0x0060:  6900 6f00 2900 0000 0000 0000            i.o.).......
>
> Once that is sent, HRD no longer seems to want to respond to "get radio".
> Could "get radios" be a new command that should be used instead of "get
> radio"??
> My guess when implementing this was that I needed to detect a change of
> radio, so I prefix every command with a "get radio" exchange to check that I
> am still talking to the same rig. "get radio" seems to get the radio that
> currently has focus in HRD.
>
> IIRC the only way I could be consistent when there was more than one rig
> connected to HRD was to always address the rig that has focus; otherwise we
> would have to specify the actual radio name somewhere in WSJT-X, which I
> wanted to avoid.
>
> The "get radios" command is different as it returns all the rigs currently
> connected to HRD.
>
> It is quite possible that the "get radio" exchange is actually the cause of
> the issues you are seeing.
>
> It is late here, so tomorrow I'll have a look at a version that doesn't
> repeatedly use "get radio" to check the HRD context number. The change will
> be non-trivial, but I should be able to come up with something we can at
> least test with to verify we have found the action that triggers the issue.
>
>
> Mike W9MDB
> 73
> Bill
> G4WJS.
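The captures above suggest the frame layout on port 7809: a little-endian 32-bit size (0x2a = 42 and 0x44 = 68, matching the TCP payload lengths), the constants 0x1234ABCD and 0xABCD1234, a 32-bit field that is zero in both packets, then UTF-16LE text followed by six zero bytes. The Python sketch below encodes and decodes frames with that layout. The field names, helper names and the meaning of the zero field are assumptions read off the dump, not HRD documentation. `parse_radio` handles a single `context:name` reply like the one captured; how HRD separates several radios in a "get radios" reply is not shown in this thread. `with_context` mirrors the "[743] get radio" form seen in the log.

```python
import struct

# Layout inferred from the tcpdump capture above (assumption, not HRD documentation).
MAGIC_1 = 0x1234ABCD  # on the wire: cd ab 34 12
MAGIC_2 = 0xABCD1234  # on the wire: 34 12 cd ab
HEADER = struct.Struct("<IIII")  # size, magic 1, magic 2, unknown (zero in capture)
TRAILER = b"\x00" * 6  # UTF-16 NUL plus four more zero bytes, as captured


def encode_frame(text: str) -> bytes:
    """Build a frame shaped like the captured ones."""
    body = text.encode("utf-16-le") + TRAILER
    return HEADER.pack(HEADER.size + len(body), MAGIC_1, MAGIC_2, 0) + body


def decode_frame(frame: bytes) -> str:
    """Return the text carried by one complete frame."""
    size, magic_1, magic_2, _unknown = HEADER.unpack_from(frame)
    if (magic_1, magic_2) != (MAGIC_1, MAGIC_2) or size != len(frame):
        raise ValueError("not a complete frame with the expected magic values")
    text = frame[HEADER.size:size].decode("utf-16-le")
    return text.split("\x00", 1)[0]  # drop the NUL terminator and padding


def with_context(context: int, command: str) -> str:
    """Prefix a command with an HRD context number (hypothetical helper)."""
    return f"[{context}] {command}"


def parse_radio(reply: str) -> tuple[int, str]:
    """Split a single 'context:name' reply into its parts (hypothetical helper)."""
    context, _, name = reply.partition(":")
    return int(context), name


if __name__ == "__main__":
    request = encode_frame("Get Radios")
    print(len(request), request[:16].hex())  # 42 2a000000cdab34123412cdab00000000
    print(parse_radio(decode_frame(encode_frame("743:TT-OMNI VII (Radio)"))))
```

Caching the context number once (for example from `parse_radio`) and reusing it with `with_context` is one way to avoid a "get radio" exchange before every command, which is the kind of change proposed above; whether HRD accepts that without a focus check is untested here.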
>
>
> From: Bill Somerville [mailto:g4...@classdesign.com]
> Sent: Friday, September 05, 2014 10:52 AM
> To: wsjt-devel@lists.sourceforge.net
> Subject: Re: [wsjt-devel] HRD tcp bug
>
> On 05/09/2014 16:29, Michael Black wrote:
> Hi Mike,
> Still having problems with the HRD tcp bug.
> What's of interest to me is that I can run HRD Logger and DM780, both of
> which use port 7809 too, and they don't seem to have any problems. So I'm
> going to try to compare these to hopefully figure out what's going on. If
> you want, I can capture the network traffic with all 3 of these things
> running and send it to you along with a trace log.
>
> This packet dump shows an example of the problem.
> After HRD died I clicked on "retry", so this is the sequence that followed.
> 11:28:27.460700 The first command "get context" goes out
> 11:28:32.460910 retransmit of "get context" after timeout
> 11:28:32.592923 HRD responds to first attempt
> 11:28:32.592923 WSJT-X moves on to "get id"
> 11:28:35.770041 HRD responds to 2nd attempt
>
> I do get rather significant pauses in HRD… especially when switching
> frequencies.
> So I bumped up the socket_wait_time to 10000 and now I hit the max # of
> retries error, which I've never seen before… during which menu operations in
> WSJT-X are pretty much hosed, since the timeout takes a minute or so with 5
> retries.
>
> Is that the socket wait time in WSJT-X?
>
> This is an interesting observation; all the CAT activity in WSJT-X should be
> asynchronous as it is running in a separate dedicated thread and the main
> GUI doesn't block waiting for responses. I need to find out why you are
> seeing this problem as it may be an error in the message queuing to and from
> the transceiver control thread.
> Mike W9MDB
> 73
> Bill
> G4WJS.
>
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds. Stuff that matters.
> http://tv.slashdot.org/
>
> _______________________________________________
> wsjt-devel mailing list
> wsjt-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/wsjt-devel