On 06/09/2014 17:37, Michael Black wrote:
Hi Mike,
> I may have had an epiphany….either that or I'm totally confused (no bets
> here :-)
> If the send_command doesn't get a result immediately can't poll() walk over
> it?  So "get radio" might end up getting some other response since it's
> first on the read queue?
> So it sends a command that doesn't get an immediate response, then poll()
> comes in and does "get frequencies" and the other command sees the response
> to that?
> And then that confuses HRD by receiving an additional command before
> responding to the first one?
I don't think so. All the CAT commands run in a single thread and block
for replies. That means that send_command cannot be interrupted by a
do_sync (the consequence of a poll). The lack of a reply when one is
expected will cause the timeout exception, but send_command will not
exit until all the retries have been exhausted.
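
To illustrate the behaviour described above, here is a minimal Python sketch of a blocking send with retries. It is not the WSJT-X code: ScriptedTransport, RetriesExhausted and the max_attempts parameter are hypothetical names used only for illustration, and the scripted transport stands in for the real TCP socket.

```python
class RetriesExhausted(Exception):
    """Raised when no reply arrives after every attempt (hypothetical name)."""


class ScriptedTransport:
    """Stand-in for the TCP socket: each receive() pops the next scripted result."""

    def __init__(self, results):
        self.results = list(results)
        self.sent = []

    def send(self, command):
        self.sent.append(command)

    def receive(self):
        # None in the script (or an empty script) simulates a socket timeout.
        result = self.results.pop(0) if self.results else None
        if result is None:
            raise TimeoutError("no reply within the socket wait time")
        return result


def send_command(transport, command, max_attempts=5):
    """Send a command and block for its reply, resending after each timeout."""
    for _ in range(max_attempts):
        transport.send(command)
        try:
            return transport.receive()
        except TimeoutError:
            continue  # stay inside send_command and try again
    raise RetriesExhausted(f"{command!r}: no reply after {max_attempts} attempts")


link = ScriptedTransport([None, None, "reply"])
reply = send_command(link, "[743] get radio")
print(reply, len(link.sent))
```

Because nothing else runs on that thread while send_command is waiting, a poll cannot slip another command in between the send and the reply. With the 10 second socket timeout and five attempts mentioned in the quoted messages, a call like this would block for roughly 50 seconds before giving up.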

I don't have access to my shack at present since it is active in the AA
contest; I hope to look at this further a bit later.
>
>
> Mike W9MDB
73
Bill
G4WJS.
>
> From: Bill Somerville [mailto:g4...@classdesign.com]
> Sent: Friday, September 05, 2014 7:21 PM
> To: wsjt-devel@lists.sourceforge.net
> Subject: Re: [wsjt-devel] HRD tcp bug
>
> On 06/09/2014 00:53, Michael Black wrote:
> Hi Mike,
> OK…current run with rev 4255 but I am running with 10 second socket timeout
> modification.
>   
> I ran WSJT-X for a while.
> I then started HRD Logbook at 13:10.    Port 56169 is WSJT-X and Port 56193
> is HRD Logbook.
>   
> Takes a few seconds for Logbook to get going….
> 13:10:07.064095 WSJT-X does "get frequencies" and HRD does ACK that packet
> 13:10:07.171095 Logger connects to HRD
>   
> 13:10:09.593095 HRD finally responds to WSJT-X "get frequencies"
> 13:10:10.105095 WSJT-X sends "[743] get radio"
> 13:10:10.697095 reply to "get radio"
>   
> 13:10:16.190095 WSJT-X sends "[743] get radio" – ACK is received but no
> reply from HRD
> 13:10:26.189095 retransmit – ACK but no reply
> 13:10:36.189095 retransmit – ACK but no reply
> 13:10:46.189095 retransmit – ACK but no reply
> 13:10:56.189095 retransmit – ACK but no reply
>   
> WSJT-X retries exhausted.
> OK, that's not good.
>
>   
> So, in this case, HRD is clearly just swallowing the "get radio" command and
> never responding to it while it's happily chatting with HRD Logger.
>   
> It might be related to "get radio".  This is what Logger does
> 13:10:09.595095 IP 127.0.0.1.56193 > 127.0.0.1.7809: P 79:121(42) ack 109 win 127
>                  0x0000:  4500 0052 6865 4000 8006 0000 7f00 0001  E..Rhe@.........
>                  0x0010:  7f00 0001 db81 1e81 8d15 bf8b a1fe 10ac  ................
>                  0x0020:  5018 007f e852 0000 2a00 0000 cdab 3412  P....R..*.....4.
>                  0x0030:  3412 cdab 0000 0000 4700 6500 7400 2000  4.......G.e.t...
>                  0x0040:  5200 6100 6400 6900 6f00 7300 0000 0000  R.a.d.i.o.s.....
>                  0x0050:  0000                                     ..
> 13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: . ack 121 win 127
>                  0x0000:  4500 0028 6866 4000 8006 0000 7f00 0001  E..(hf@.........
>                  0x0010:  7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5  ................
>                  0x0020:  5010 007f b7da 0000                      P.......
> 13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: P 109:177(68) ack 121 win 127
>                  0x0000:  4500 006c 6867 4000 8006 0000 7f00 0001  E..lhg@.........
>                  0x0010:  7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5  ................
>                  0x0020:  5018 007f 280c 0000 4400 0000 cdab 3412  P...(...D.....4.
>                  0x0030:  3412 cdab 0000 0000 3700 3400 3300 3a00  4.......7.4.3.:.
>                  0x0040:  5400 5400 2d00 4f00 4d00 4e00 4900 2000  T.T.-.O.M.N.I...
>                  0x0050:  5600 4900 4900 2000 2800 5200 6100 6400  V.I.I...(.R.a.d.
>                  0x0060:  6900 6f00 2900 0000 0000 0000            i.o.).......
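
The payload bytes in the dump above can be decoded by hand, and a short Python sketch makes the layout easier to see. This is only a reading of the bytes shown, not a statement of the HRD protocol: the assumption that the first 16 bytes are four little-endian 32-bit words (the first of which matches the TCP payload length) followed by NUL-padded UTF-16 text is inferred from this one capture, and the names size, word_1, word_2 and word_3 are made up for illustration.

```python
import struct

# TCP payloads copied from the dump (the bytes after the 40 bytes of IP and TCP headers).
REQUEST = bytes.fromhex(
    "2a000000 cdab3412 3412cdab 00000000"
    "4700650074002000 52006100640069006f007300"
    "000000000000"
)
REPLY = bytes.fromhex(
    "44000000 cdab3412 3412cdab 00000000"
    "3700340033003a00 540054002d004f004d004e0049002000"
    "5600490049002000 2800520061006400 69006f002900"
    "000000000000"
)


def decode_payload(payload):
    """Split a payload into four little-endian words and the UTF-16 text after them."""
    size, word_1, word_2, word_3 = struct.unpack_from("<4I", payload)
    text = payload[16:size].decode("utf-16-le").rstrip("\x00")
    return size, word_1, word_2, word_3, text


request_fields = decode_payload(REQUEST)
reply_fields = decode_payload(REPLY)
print(request_fields)
print(reply_fields)
```

Read this way, Logger's request text is "Get Radios" and the reply text is "743:TT-OMNI VII (Radio)". The 743 appears to be the same number WSJT-X puts in the "[743]" prefix of its own commands.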
>   
> Once that is sent HRD no longer seems to want to respond to "get radio".
> Could "get radios" be a new command that should be used instead of "get
> radio"?
> My guess when implementing this was that I needed to detect a change of
> radio so I prefix every command with a "get radio" exchange to check that I
> am still talking to the same rig. "get radio" seems to get the radio that
> currently has focus in HRD.
>
> IIRC the only way I could be consistent when there was more than one rig
> connected to HRD was to always address the rig that has focus, otherwise we
> would have to specify the actual radio name somewhere in WSJT-X which I
> wanted to avoid.
>
> The "get radios" command is different as it returns all the rigs currently
> connected to HRD.
>
> It is quite possible that the "get radio" exchange is actually the cause of
> the issues you are seeing.
>
> It is late here; so tomorrow I'll have a look at a version that doesn't
> repeatedly use "get radio" to check the HRD context number. The change will
> be non-trivial but I should be able to come up with something we can at
> least test with to verify we have found the action that triggers the issue.
>
>   
> Mike W9MDB
> 73
> Bill
> G4WJS.
>
>   
> From: Bill Somerville [mailto:g4...@classdesign.com]
> Sent: Friday, September 05, 2014 10:52 AM
> To: wsjt-devel@lists.sourceforge.net
> Subject: Re: [wsjt-devel] HRD tcp bug
>   
> On 05/09/2014 16:29, Michael Black wrote:
> Hi Mike,
> Still having problems with the HRD tcp bug.
> What's of interest to me is that I can run HRD Logger and DM780 both of
> which use port 7809 too and they don't seem to have any problems.  So I'm
> going to try and compare these to hopefully figure out what's going on.  If
> you want I can capture the network traffic with all 3 of these things
> running and send it to you along with a trace log.
>   
> This packet dump shows an example of the problem.
> After HRD died I clicked on "retry" so this is the sequence that followed.
> 11:28:27.460700  The first command "get context" goes out
> 11:28:32.460910  retransmit of "get context" after timeout
> 11:28:32.592923 HRD responds to first attempt
> 11:28:32.592923 WSJTX moves on to "get id"
> 11:28:35.770041 HRD responds to 2nd attempt
>   
> I do get rather significant pauses in HRD…especially when switching
> frequencies.
> So I bumped up the socket_wait_time to 10000 and now I hit the max # of
> retries error which I've never seen before….during which menu operations on
> WSJT-X are pretty much hosed since the timeout takes a minute or so with 5
> retries.
>   
> Is that the socket wait time in WSJT-X?
>
> This is an interesting observation, all the CAT activity in WSJT-X should be
> asynchronous as it is running in a separate dedicated thread and the main
> GUI doesn't block waiting for responses. I need to find out why you are
> seeing this problem as it may be an error in the message queuing to and from
> the transceiver control thread.
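
For readers unfamiliar with the arrangement described above, here is a minimal Python sketch of a dedicated control thread fed by message queues. It is not the WSJT-X implementation (which is not shown in this thread): cat_worker, fake_transact and the two queues are hypothetical names, and fake_transact merely stands in for the blocking exchange with the rig.

```python
import queue
import threading


def cat_worker(commands, replies, transact):
    """Run CAT transactions one at a time until a None sentinel arrives."""
    while True:
        command = commands.get()
        if command is None:
            break
        replies.put((command, transact(command)))


def fake_transact(command):
    # Placeholder for the blocking send/receive exchange with the rig.
    return "reply to " + command


commands, replies = queue.Queue(), queue.Queue()
worker = threading.Thread(target=cat_worker, args=(commands, replies, fake_transact))
worker.start()

commands.put("get frequencies")  # the GUI side returns immediately after queuing
commands.put(None)               # ask the worker to stop
worker.join()

result = replies.get_nowait()
print(result)
```

In a design like this only the worker ever waits on the socket, so a slow or silent rig should delay replies on the queue rather than freeze the GUI. If the GUI stalls anyway, something on the GUI side is presumably waiting on the worker.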
> Mike W9MDB
> 73
> Bill
> G4WJS.
>
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
>
>
>
> _______________________________________________
> wsjt-devel mailing list
> wsjt-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/wsjt-devel


