I may have had an epiphany….either that or I'm totally confused (no bets
here :-) 
If send_command doesn't get a result immediately, can't poll() walk over
it?  So "get radio" might end up with some other response, since that one is
first on the read queue?
So it sends a command that doesn't get an immediate response, then poll()
comes in and does "get frequencies", and the first command sees the response
to that?
And then HRD gets confused by receiving an additional command before it has
responded to the first one?
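
To make the suspected race concrete, here is a minimal Python sketch. It is not the WSJT-X code. It assumes one connection whose replies come back strictly in the order the requests were sent, with nothing in a reply saying which request it answers. FakeHRD, send_only and read_reply are made-up names for illustration only.

```python
from collections import deque


class FakeHRD:
    """Hypothetical stand-in for HRD: answers requests strictly in FIFO order."""

    def __init__(self):
        self.replies = deque()  # the read queue as the client sees it

    def send_only(self, command):
        # Queue the reply belonging to this command; the reply carries no
        # tag saying which request it answers.
        self.replies.append("reply to '%s'" % command)

    def read_reply(self):
        # A reader gets whatever is first on the queue.
        return self.replies.popleft()


hrd = FakeHRD()

# send_command() writes "get radio" but has not read the reply yet ...
hrd.send_only("get radio")
# ... poll() runs in between, sends its own command and reads a reply.
hrd.send_only("get frequencies")
poll_sees = hrd.read_reply()
# Now send_command() finally reads, and gets what is left.
send_command_sees = hrd.read_reply()

print("poll() sees:        ", poll_sees)
print("send_command() sees:", send_command_sees)
```

In this toy model each caller ends up with the other's reply. Holding one lock across the whole send-then-read exchange in both paths would keep the replies from crossing, if this really is what is happening.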


Mike W9MDB

From: Bill Somerville [mailto:g4...@classdesign.com] 
Sent: Friday, September 05, 2014 7:21 PM
To: wsjt-devel@lists.sourceforge.net
Subject: Re: [wsjt-devel] HRD tcp bug

On 06/09/2014 00:53, Michael Black wrote:
Hi Mike,
OK…current run with rev 4255 but I am running with 10 second socket timeout
modification.
 
I ran WSJT-X for a while.
I then started HRD Logbook at 13:10.    Port 56169 is WSJT-X and Port 56193
is HRD Logbook.
 
Takes a few seconds for Logbook to get going….
13:10:07.064095 WSJT-X does "get frequencies" and HRD does ACK that packet
13:10:07.171095 Logger connects to HRD
 
13:10:09.593095 HRD finally responds to WSJT-X "get frequencies"
13:10:10.105095 WSJT-X sends "[743] get radio"
13:10:10.697095 reply to "get radio"
 
13:10:16.190095 WSJT-X sends "[743] get radio" – ACK is received but no
reply from HRD
13:10:26.189095 retransmit – ACK but no reply
13:10:36.189095 retransmit – ACK but no reply
13:10:46.189095 retransmit – ACK but no reply
13:10:56.189095 retransmit – ACK but no reply
 
WSJT-X retries exhausted.
OK, that's not good.

 
So, in this case, HRD is clearly just swallowing the "get radio" command and
never responding to it while it's happily chatting with HRD Logger.
 
It might be related to "get radio".  This is what Logger does
13:10:09.595095 IP 127.0.0.1.56193 > 127.0.0.1.7809: P 79:121(42) ack 109 win 127
                0x0000:  4500 0052 6865 4000 8006 0000 7f00 0001  E..Rhe@.........
                0x0010:  7f00 0001 db81 1e81 8d15 bf8b a1fe 10ac  ................
                0x0020:  5018 007f e852 0000 2a00 0000 cdab 3412  P....R..*.....4.
                0x0030:  3412 cdab 0000 0000 4700 6500 7400 2000  4.......G.e.t...
                0x0040:  5200 6100 6400 6900 6f00 7300 0000 0000  R.a.d.i.o.s.....
                0x0050:  0000                                     ..
13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: . ack 121 win 127
                0x0000:  4500 0028 6866 4000 8006 0000 7f00 0001  E..(hf@.........
                0x0010:  7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5  ................
                0x0020:  5010 007f b7da 0000                      P.......
13:10:09.595095 IP 127.0.0.1.7809 > 127.0.0.1.56193: P 109:177(68) ack 121 win 127
                0x0000:  4500 006c 6867 4000 8006 0000 7f00 0001  E..lhg@.........
                0x0010:  7f00 0001 1e81 db81 a1fe 10ac 8d15 bfb5  ................
                0x0020:  5018 007f 280c 0000 4400 0000 cdab 3412  P...(...D.....4.
                0x0030:  3412 cdab 0000 0000 3700 3400 3300 3a00  4.......7.4.3.:.
                0x0040:  5400 5400 2d00 4f00 4d00 4e00 4900 2000  T.T.-.O.M.N.I...
                0x0050:  5600 4900 4900 2000 2800 5200 6100 6400  V.I.I...(.R.a.d.
                0x0060:  6900 6f00 2900 0000 0000 0000            i.o.).......
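
For anyone reading the hex by eye, here is a small Python sketch that decodes the two data payloads in the dump above. The layout is inferred from these two packets only: a 4-byte little-endian length, two 4-byte constants, a 4-byte field that is zero in both packets, then NUL-padded UTF-16LE text. The names size, magic_1, magic_2 and field_4 are my own labels, not taken from any HRD documentation.

```python
import struct

# TCP payloads copied from the dump above (the bytes after the 40 bytes
# of IP and TCP headers, i.e. starting at offset 0x28).
REQUEST = bytes.fromhex(
    "2a00 0000 cdab 3412 3412 cdab 0000 0000"
    " 4700 6500 7400 2000 5200 6100 6400 6900"
    " 6f00 7300 0000 0000 0000"
)
REPLY = bytes.fromhex(
    "4400 0000 cdab 3412 3412 cdab 0000 0000"
    " 3700 3400 3300 3a00 5400 5400 2d00 4f00"
    " 4d00 4e00 4900 2000 5600 4900 4900 2000"
    " 2800 5200 6100 6400 6900 6f00 2900 0000"
    " 0000 0000"
)


def decode_frame(payload):
    """Split one payload into its assumed header fields and its text."""
    # Four little-endian 32-bit words; the field meanings are guesses.
    size, magic_1, magic_2, field_4 = struct.unpack_from("<IIII", payload, 0)
    # The rest, up to the stated size, is UTF-16LE text padded with NULs.
    text = payload[16:size].decode("utf-16-le").rstrip("\x00")
    return size, magic_1, magic_2, field_4, text


for name, frame in (("request", REQUEST), ("reply", REPLY)):
    size, magic_1, magic_2, field_4, text = decode_frame(frame)
    print("%-8s size=%d magic=%08X/%08X field_4=%d text=%r"
          % (name, size, magic_1, magic_2, field_4, text))
```

The length word matches the TCP payload lengths tcpdump reports, (42) and (68), which is what makes me think the first field is a total size.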
 
Once that is sent HRD no longer seems to want to respond to "get radio".
Could "get radios" be a new command that should be used instead of "get
radio"?
My guess when implementing this was that I needed to detect a change of
radio so I prefix every command with a "get radio" exchange to check that I
am still talking to the same rig. "get radio" seems to get the radio that
currently has focus in HRD.
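
A rough Python sketch of that check-before-every-command pattern, purely for illustration and not the actual WSJT-X code. FakeLink, RigChanged and guarded_command are hypothetical names, the reply to "get radio" is assumed to be simply the focused rig's name, and putting the "[743]" prefix on "get frequencies" is also an assumption.

```python
class FakeLink:
    """Hypothetical transport: records what was sent, returns canned replies."""

    def __init__(self, focused_radio):
        self.focused_radio = focused_radio  # the rig that has focus in "HRD"
        self.sent = []

    def exchange(self, command):
        self.sent.append(command)
        if command.endswith("get radio"):
            return self.focused_radio  # assumed reply: the focused rig's name
        return "OK"  # placeholder reply for everything else


class RigChanged(Exception):
    pass


def guarded_command(link, context, expected_radio, command):
    # First exchange: confirm we are still talking to the same rig.
    current = link.exchange("[%d] get radio" % context)
    if current != expected_radio:
        raise RigChanged("expected %s, got %s" % (expected_radio, current))
    # Second exchange: the command we actually wanted to send.
    return link.exchange("[%d] %s" % (context, command))


link = FakeLink("TT-OMNI VII")
guarded_command(link, 743, "TT-OMNI VII", "get frequencies")
print(link.sent)
```

The point of the sketch is only that every real command costs two exchanges on the wire, so "get radio" is by far the most frequent thing HRD sees from us.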

IIRC the only way I could be consistent when there was more than one rig
connected to HRD was to always address the rig that has focus, otherwise we
would have to specify the actual radio name somewhere in WSJT-X which I
wanted to avoid.

The "get radios" command is different as it returns all the rigs currently
connected to HRD.

It is quite possible that the "get radio" exchange is actually the cause of
the issues you are seeing.

It is late here; so tomorrow I'll have a look at a version that doesn't
repeatedly use "get radio" to check the HRD context number. The change will
be non-trivial but I should be able to come up with something we can at
least test with to verify we have found the action that triggers the issue.

 
Mike W9MDB
73
Bill
G4WJS.

 
From: Bill Somerville [mailto:g4...@classdesign.com] 
Sent: Friday, September 05, 2014 10:52 AM
To: wsjt-devel@lists.sourceforge.net
Subject: Re: [wsjt-devel] HRD tcp bug
 
On 05/09/2014 16:29, Michael Black wrote:
Hi Mike,
Still having problems with the HRD tcp bug.
What's of interest to me is that I can run HRD Logger and DM780 both of
which use port 7809 too and they don't seem to have any problems.  So I'm
going to try and compare these to hopefully figure out what's going on.  If
you want I can capture the network traffic with all 3 of these things
running and send it to you along with a trace log.
 
This packet dump shows an example of the problem.
After HRD died I clicked on "retry" so this is the sequence that followed.
11:28:27.460700  The first command "get context" goes out
11:28:32.460910  retransmit of "get context" after timeout
11:28:32.592923 HRD responds to first attempt
11:28:32.592923 WSJTX moves on to "get id"
11:28:35.770041 HRD responds to 2nd attempt
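
The same sequence as a toy Python model, to show why the late second reply matters. It assumes HRD eventually answers every copy of a command in order, and that a reply carries nothing tying it to a particular request; wire and hrd_answers are made-up names.

```python
from collections import deque

wire = deque()  # replies from HRD in arrival order, oldest first


def hrd_answers(command):
    # Hypothetical: HRD was only slow, so every copy of a command is
    # answered sooner or later.
    wire.append("reply to '%s'" % command)


# "get context" goes out, times out, and is retransmitted.
hrd_answers("get context")  # answer to the first attempt
hrd_answers("get context")  # answer to the retransmit, still to come

first = wire.popleft()      # taken as the "get context" reply

# The client moves on to "get id" while the second answer is pending.
hrd_answers("get id")
second = wire.popleft()     # read as the "get id" reply

print("taken as 'get context' reply:", first)
print("taken as 'get id' reply:     ", second)
print("still unread:                ", list(wire))
```

In this model the client is one reply behind from then on, and each further retransmit would push it further behind.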
 
I do get rather significant pauses in HRD…especially when switching
frequencies.
So I bumped up the socket_wait_time to 10000 and now I hit the max # of
retries error, which I've never seen before….during which menu operations on
WSJT-X are pretty much hosed, since the timeout takes a minute or so with 5
retries.
 
Is that the socket wait time in WSJT-X?

This is an interesting observation, all the CAT activity in WSJT-X should be
asynchronous as it is running in a separate dedicated thread and the main
GUI doesn't block waiting for responses. I need to find out why you are
seeing this problem as it may be an error in the message queuing to and from
the transceiver control thread. 
Mike W9MDB
73
Bill
G4WJS.



------------------------------------------------------------------------------
Slashdot TV.  
Video for Nerds.  Stuff that matters.
http://tv.slashdot.org/



_______________________________________________
wsjt-devel mailing list
wsjt-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wsjt-devel


