Bill,

The length of the UDP packets does not seem to be the issue. It turns out that the packet loss rate I'm seeing has gone *way* down and is now negligible, so I suspect that there was a congested link somewhere. However, I can only spot lost packets if I get the occasional packet through, and it appears that some paths (or some local environments) drop essentially all packets. The network is currently dropping fewer than 1 in 10k packets; at the height of the problem it was dropping 1 in 10!
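Whether datagram length matters comes down to fragmentation. Bill's quoted message mentions roughly 1400-byte datagrams and a 508-byte payload recommendation. The 508 figure is 576 - 60 - 8: the IPv4 minimum reassembly size, minus a worst-case IP header and the UDP header. The sketch below counts fragments and estimates datagram loss. It assumes fragments are dropped independently, and it uses a 576-byte path MTU purely as an illustrative worst case. The function names are hypothetical.

```python
UDP_HEADER = 8          # bytes
IPV4_HEADER_MIN = 20    # IPv4 header without options, bytes
IPV4_HEADER_MAX = 60    # IPv4 header with maximum options, bytes


def fragments_needed(udp_payload: int, path_mtu: int, ip_header: int = IPV4_HEADER_MIN) -> int:
    """Count the IPv4 fragments one UDP datagram needs on a path with this MTU."""
    max_chunk = path_mtu - ip_header          # IP payload room in any single packet
    per_fragment = max_chunk // 8 * 8         # non-final fragments carry multiples of 8 bytes
    if per_fragment <= 0:
        raise ValueError("path MTU too small for the IP header")
    remaining = udp_payload + UDP_HEADER      # the UDP header travels in the first fragment
    count = 1
    while remaining > max_chunk:              # the final fragment may use the full max_chunk
        remaining -= per_fragment
        count += 1
    return count


def datagram_loss(fragment_loss: float, fragments: int) -> float:
    """Chance a datagram is lost when each fragment is dropped independently (an assumption)."""
    return 1.0 - (1.0 - fragment_loss) ** fragments


if __name__ == "__main__":
    for payload in (1400, 508):
        k = fragments_needed(payload, 576)
        print(payload, k, round(datagram_loss(0.01, k), 4))
```

On a 576-byte path a 1400-byte payload needs three fragments. A 1% fragment loss rate therefore becomes roughly a 3% datagram loss rate. A 508-byte payload fits in one packet even with a 60-byte IPv4 header. On an ordinary 1500-byte Ethernet path, 1400 bytes is not fragmented at all, which is consistent with length not being the issue for most users.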
With regard to sending packets more often, I'm happy with a 1-minute timer for FT4/FT8, as those packets will tend to be fairly full anyway. I suspect that the cost of handling a packet is roughly n+1, where n is the number of spots. The five-minute recommendation came from the original PSK world, when the number of spots in five minutes was fairly small.... I'd be happy with getting packets of 1k or more as often as you want to send them. If the band is quiet, then just flush them out after five minutes.

I suspect that TCP only makes a difference if some piece of the path is very lossy. The Marine Mobile use case was one of those, as was the congestion case. Looking at the statistics on the server, fewer than 1 in 10k packets need to be reassembled, and that is (I think) because some other decoders send very large UDP packets. I have had at least one 10k packet over the last couple of days....

For TCP, I have keep-alive set so that I can time out connections when the client goes away silently. The server currently has plenty of CPU, and I don't think that handling lots of TCP connections will be much of an overhead. There is some cost to accepting/closing each connection, but I doubt that it is noticeable, and it probably makes life easier for you to keep the connection open. I see one of the benefits of TCP as being that you can give the user a positive indication that spots are being transmitted (and then it becomes my problem if they don't appear!). You don't need to send the repeated template descriptors on TCP.

With regard to special callsigns: what often happens is that the special callsign is also using WSJT-X and so transmits their locator as the receiver when they send their first report. At this point, I know what their locator is.... There isn't otherwise a good way for me to get the locators. I'm also working on bringing back the mechanism to allow anybody with an LoTW certificate to add/edit their locator.

You can send whatever templates you want.
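Under the rough n+1 cost model above, a packet carrying n spots costs (n+1)/n per spot, so fuller packets are cheaper to handle. The sketch below combines that with the intervals discussed in this thread. Every minute it sends only the datagrams that are full. Every five minutes it also flushes the last, partially filled one. The class and parameter names are hypothetical, the records are treated as already-encoded opaque bytes, and the 508-byte budget is just the figure from Bill's quoted message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpotBatcher:
    """Hypothetical dual-interval batcher for already-encoded spot records."""
    max_payload: int = 508          # assumed per-datagram budget for spot records, bytes
    check_interval: float = 60.0    # seconds: send any *full* datagrams
    flush_interval: float = 300.0   # seconds: send everything, including a partial datagram
    start: float = 0.0              # timer origin, e.g. application start time
    pending: List[bytes] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.next_check = self.start + self.check_interval
        self.next_flush = self.start + self.flush_interval

    def add(self, record: bytes) -> None:
        self.pending.append(record)

    def _pack(self, include_partial: bool) -> List[bytes]:
        """Greedily pack pending records; a datagram is 'full' once the next record won't fit."""
        datagrams: List[bytes] = []
        current: List[bytes] = []
        size = 0
        for record in self.pending:
            if current and size + len(record) > self.max_payload:
                datagrams.append(b"".join(current))
                current, size = [], 0
            current.append(record)
            size += len(record)
        if include_partial and current:
            datagrams.append(b"".join(current))
            current = []
        self.pending = current  # a leftover partial datagram waits for a later pass
        return datagrams

    def tick(self, now: float) -> List[bytes]:
        """Return the datagrams due at `now` (same clock as `start`)."""
        if now >= self.next_flush:
            due = self._pack(include_partial=True)
        elif now >= self.next_check:
            due = self._pack(include_partial=False)
        else:
            return []
        while self.next_check <= now:
            self.next_check += self.check_interval
        while self.next_flush <= now:
            self.next_flush += self.flush_interval
        return due
```

Anchoring both timers to the application start time, rather than to decode cycles, is what spreads the traffic from many clients across the minute instead of bunching it at FT8/FT4 period boundaries.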
Currently, I only process the attributes that you see. Adding a new column to the database to store additional attributes would be very painful (although it would become easier if I upgraded the database server). I'm a little nervous about the volume of WSPR spots: there are already occasional issues on the server due to load, so adding another 50% of load is unlikely to make those problems go away.

Philip

p.s. I'll be looking into your test server issues later.....

On Wed, Jun 17, 2020 at 8:36 AM Bill Somerville <g4...@classdesign.com> wrote:
> On 07/06/2020 02:23, Philip Gladstone wrote:
>
> There are a (small) number of WSJT-X users who have difficulty reporting their spots to pskreporter. Some of these are in "difficult" areas of network connectivity (e.g. Marine Mobile) and I suspect that the UDP transport is losing most of their packets. The general loss rate seems to be around 1%-2%, which is somewhat higher than I would expect, but it is not unbelievable either.
>
> It is also difficult to diagnose these sorts of problems, as the packets appear to leave the PC running WSJT-X and not arrive at my server!
>
> PSKReporter was never supposed to be 100% reliable, but there seem to be a lot of people who think otherwise....
>
> In an effort to improve the situation, I have now stood up a TCP listener that might help. The protocol is identical; the only difference is that you send the same messages as before over a TCP connection to report.pskreporter.info port 4739 rather than over a UDP connection. There is no extra framing required, as the messages already contain a length code.
>
> The listening server should be able to support enough connections. It will close a connection if an invalid message is received.
>
> Is this change something that could be implemented? Also, currently, you send a bunch of packets at the same time (on the five-minute expiry). You could send them as soon as they get "full" rather than waiting.
>
> Thanks
>
> Philip
>
> Hi Philip,
>
> I have a test version of WSJT-X that can use TCP/IP to send spots to PSKReporter. It seems to work OK on the test server, but I have not tried it on the main server yet. I have some thoughts and questions on your issues and suggestions above.
>
> - WSJT-X builds UDP datagrams roughly 1400 bytes long. This could be the root problem of some users not getting traffic through to PSKReporter, due to packet fragmentation. IPv4 UDP allows fragmentation if routers need to, but a receiver discards the whole datagram if any fragment is lost, which increases the likelihood of lost datagrams somewhat. Routers may also just drop UDP datagrams over a certain size, although I have no idea if that really happens. The best recommendation for a UDP datagram payload size that guarantees deliverability (note: *not* guaranteed delivery; that's never the case with UDP) is probably 508 bytes. One of the changes I've implemented is to finally follow your recommendation to only send template descriptors in the first three datagrams of a session and once per hour thereafter. I have also implemented not sending the receiver data set unless there is a change to the information, again sent every hour even if there is no change to the data. These changes would make a datagram payload size of 508 bytes reasonably practical. It would mean sending more datagrams, probably at shorter intervals, say at least one per minute, but less traffic volume overall.
> - Although I understand your suggestion to send datagrams when they are full, that is not easy to implement without causing server load issues, the reason being that datagrams would tend to be sent during the decoding phase of WSJT-X. On busy HF bands, FT8 and FT4 would tend to generate large spikes of traffic at 15 s and 7.5 s intervals from the top of each minute, respectively. I think using a variation on the current mechanism, with all available spots being sent in one or more datagrams on a fixed timer interval, would best randomize the traffic flow, with the timer origin based on application start time alone. Perhaps if the timer interval were 1 minute rather than 5 minutes, the flow of spots would be smoothed somewhat. This is also in line with the UDP datagram size suggestion above.
> - Perhaps a dual-interval approach could be used to ensure better-filled datagrams: say WSJT-X checks every minute (or even every 30 s or less) and sends as many filled datagrams as it has spots for, then once every 5 minutes it flushes any queued spots, including a final partially filled datagram. That would smooth the flow for high volumes and still spot every 5 minutes for the spotter monitoring a quiet band. Other intervals are of course possible; suggestions?
> - Using TCP/IP has some merits, but I am not sure there will be any real gain. It may be better to try smaller UDP datagrams before TCP/IP. You already have metrics for UDP to detect levels of dropped datagrams, so you could easily assess whether smaller datagrams would solve the missing-data issue. For sure, if shorter datagrams solve the root problem, then TCP/IP only gains client knowledge of server outages or network connectivity issues, plus guaranteed in-order delivery when a connection is working, the latter being of no real value here.
> - Also in this thread we discussed fallback strategies if TCP/IP connections failed. Thinking this through, I don't see any benefit. WSJT-X is certainly not going to store spots for any extended time to forward later if the server is not available, and I doubt the UDP service will be available if the TCP/IP one is down, so reverting to UDP has little value IMHO (although we could do it easily). The complexity of a failed TCP/IP connection is the process needed to re-establish the connection, something that is not required with UDP. I think the best strategy for WSJT-X would be to drop spots on the floor if a TCP/IP connection were used and the connection failed. Of course, one benefit would be that WSJT-X could inform the user that spots cannot be delivered, an option that is not available with UDP.
> - Another question with using TCP/IP is how long WSJT-X should keep a connection to the server open. We could have a connection that lives as long as the client program session. Alternatively, we could choose to open a connection for each pass of sending records to PSK Reporter. The latter does not allow much flexibility for transient outages; e.g. we might use a 30 s timeout for sending TCP/IP data, but that makes little sense if we are going to close the connection before that time expires. OTOH a long-running connection might add some unwanted server load to maintain its end of potentially several thousand concurrent connections. A small benefit of long-running connections is that we could set the SO_KEEPALIVE TCP/IP option, which would let us know if the server has gone away even when we are not sending spots because of a silent band. Keep-alive packets are normally only sent after two hours of idle time, so there's not going to be any instant feedback about a server that has gone AWOL.
> - I assume there would be no need to send any repeated template descriptors or receiver data over a TCP/IP connection, other than perhaps for some sort of server availability handshake.
> - We have discussed before whether WSJT-X should send spots where the grid square is unknown. This would be a considerable traffic increase, although WSJT-X might mitigate it a bit by de-duplicating at some level. I have not really thought through de-duplicating spots much, but I suppose it would mean keeping a list of spotted calls in the last N minutes (N yet to be defined) and only spotting if the call is not on the list. That raises secondary questions about whether the spotted frequency should qualify the list entry, maybe the mode too. That then raises the question "what frequency resolution?"; for example, perhaps just band changes in the last N minutes would allow re-spotting ... ??? In summary, what are your feelings on spots with an empty grid square? Another possibility is to only spot non-standard calls with no grid square (non-standard in terms of the FT8/FT4/MSK144 protocol, or Type 2 compound calls for other block modes like JT9 and JT65). This would allow those calls to get spotted on PSK Reporter without having to send special messages to get spotted (currently they would have to send a message like "DE <MYSPECIAL> IO91" after CQ calls or QSOs). The problem for PSK Reporter would be that the spots without a grid square would need a derived coordinate for plotting. Do you have that capability already, and how robust is it with special callsigns?
> - Sending WSPR spots to PSK Reporter could be implemented in two ways. The best option would be a hand-over of the wsprnet.org domain, so PSK Reporter gets traffic from all existing sources in the current format, but I'm not sure that is what is being proposed. Alternatively, a new template could be provided for WSPR spots, and WSJT-X could send to report.pskreporter.info as well as sending existing traffic to wsprnet.org, or instead of it, at the user's discretion. I have no information on the volume of traffic that would be forthcoming if the second option were taken. I suspect many sources of wsprnet.org spots are not WSJT-X, and they may never add spotting to PSK Reporter for various reasons.
> - I am not certain how PSK Reporter handles IPFIX templates. We currently use a template that is different from the ones suggested on your developer information web page: https://pskreporter.info/pskdev.html. Is that because the page is out of date, or is any combination of your IPFIX attributes allowed? For WSPR, a senderPower attribute would be required, with a range of zero to sixty in units of dBm. An 8-bit signed integer would be fine and would allow for any future extension to powers lower than 0 dBm.
>
> 73
> Bill
> G4WJS.
> _______________________________________________
> wsjt-devel mailing list
> wsjt-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/wsjt-devel
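Philip's original note says the TCP listener needs no extra framing because every message already carries a length. In IPFIX (RFC 7011) that is the 16-bit Length field in the 16-byte message header. The sketch below shows both ends. The receiver splits a TCP byte stream back into messages using that field. The sender simply writes each message as-is. The helper names are hypothetical, the 30 s timeout is the figure from Bill's message, and the server-side behaviour on a bad header is modelled loosely on Philip's description.

```python
import socket
import struct
from typing import Iterable, List

IPFIX_VERSION = 10
# RFC 7011 message header: version, total length, export time, sequence number, observation domain
HEADER = struct.Struct("!HHIII")


def make_message(payload: bytes, sequence: int, export_time: int = 0, domain: int = 0) -> bytes:
    """Prefix already-encoded IPFIX sets with a message header (handy for testing the framing)."""
    return HEADER.pack(IPFIX_VERSION, HEADER.size + len(payload), export_time, sequence, domain) + payload


def take_messages(buffer: bytearray) -> List[bytes]:
    """Remove and return every complete IPFIX message at the front of a TCP receive buffer."""
    messages: List[bytes] = []
    while len(buffer) >= HEADER.size:
        version, length, _, _, _ = HEADER.unpack_from(buffer)
        if version != IPFIX_VERSION or length < HEADER.size:
            raise ValueError("invalid IPFIX message header")  # a listener would close the connection
        if len(buffer) < length:
            break  # the rest of this message has not arrived yet
        messages.append(bytes(buffer[:length]))
        del buffer[:length]
    return messages


def send_messages(messages: Iterable[bytes], host: str = "report.pskreporter.info",
                  port: int = 4739, timeout: float = 30.0) -> None:
    """Sender side: messages are self-delimiting, so each is written to the stream unchanged."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for message in messages:
            sock.sendall(message)
```

Because TCP may deliver a message split across several reads, or several messages in one read, the receiver keeps any incomplete tail in the buffer until more bytes arrive.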
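Bill's first bullet describes when the optional sets go out over UDP. Templates are included in the first three datagrams of a session and then once per hour. Receiver data is included when it changes, and at least hourly otherwise. Both Philip and Bill agree that repeated templates are unnecessary over TCP. The sketch below turns those rules into a small decision helper. The class and field names are hypothetical. Treating receiver data on TCP as "send on change only" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class OptionalSetPlanner:
    """Hypothetical helper: decides whether templates/receiver data go in the next message."""
    over_tcp: bool = False
    initial_copies: int = 3            # templates in the first three datagrams of a session
    refresh_interval: float = 3600.0   # seconds: resend at least hourly over UDP
    messages_sent: int = 0
    last_templates: Optional[float] = None
    last_receiver: Optional[float] = None
    receiver_info: Optional[Tuple[str, ...]] = None

    def plan(self, now: float, receiver_info: Tuple[str, ...]) -> Dict[str, bool]:
        changed = receiver_info != self.receiver_info
        if self.over_tcp:
            # In-order stream delivery: templates once at connection start, receiver data on change.
            templates = self.messages_sent == 0
            receiver = changed
        else:
            # The first `initial_copies` messages always carry templates, so last_templates is set
            # before the subtraction is ever evaluated.
            templates = (self.messages_sent < self.initial_copies
                         or now - self.last_templates >= self.refresh_interval)
            receiver = changed or now - self.last_receiver >= self.refresh_interval
        self.messages_sent += 1
        if templates:
            self.last_templates = now
        if receiver:
            self.last_receiver = now
            self.receiver_info = receiver_info
        return {"templates": templates, "receiver": receiver}
```

The hourly timer here restarts from the last message that actually carried the set, so a template refresh lands an hour after the third initial copy rather than an hour after session start.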
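Bill notes that SO_KEEPALIVE on its own only starts probing after about two hours of idle time, and Philip uses keep-alive on the server to time out silent clients. Where the operating system exposes the per-socket knobs, that delay can be shortened. The sketch below uses Python's `socket` constants `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`, and simply skips any constant the platform does not define. The timer values and the `open_session` helper are illustrative assumptions, not anything WSJT-X or PSK Reporter is known to use.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 300, interval: int = 60, probes: int = 5) -> None:
    """Turn on TCP keep-alive and, where the platform exposes the knobs, shorten its timers.

    With these illustrative values a dead peer is noticed after roughly
    idle + interval * probes = 300 + 60 * 5 = 600 seconds, instead of waiting
    the usual two hours before the first probe is even sent.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)   # not every platform defines every constant
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def open_session(host: str = "report.pskreporter.info", port: int = 4739,
                 timeout: float = 30.0) -> socket.socket:
    """Hypothetical helper: one long-lived connection for the whole program session."""
    sock = socket.create_connection((host, port), timeout=timeout)  # timeout also bounds sends
    enable_keepalive(sock)
    return sock
```

A long-lived connection with shortened keep-alive timers gives the "server has gone away" signal Bill mentions without any application-level heartbeat. The cost is a few extra probe packets per idle connection on the server side.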
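Bill's de-duplication idea keeps a list of calls spotted in the last N minutes and re-spots only when something relevant changes. The sketch below shows one possible answer to his "what frequency resolution?" question. It keys each entry on (callsign, band, mode), so a band or mode change allows a re-spot. The key choice, the 10-minute window and the class name are all hypothetical.

```python
from typing import Dict, Tuple

SpotKey = Tuple[str, str, str]  # (callsign, band, mode); an assumed choice of key


class SpotDeduplicator:
    """Hypothetical filter: report each call at most once per band and mode per window."""

    def __init__(self, window: float = 600.0) -> None:
        self.window = window                       # N minutes (illustratively 10), in seconds
        self.last_reported: Dict[SpotKey, float] = {}

    def should_report(self, call: str, band: str, mode: str, now: float) -> bool:
        key = (call.upper(), band, mode)
        last = self.last_reported.get(key)
        if last is not None and now - last < self.window:
            return False                           # already spotted recently on this band/mode
        self.last_reported[key] = now
        return True

    def expire(self, now: float) -> None:
        """Drop entries older than the window so the list stays small."""
        self.last_reported = {k: t for k, t in self.last_reported.items()
                              if now - t < self.window}
```

Suppressed spots do not refresh the timestamp, so a station heard continuously is re-reported once per window rather than being silenced indefinitely.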
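For the proposed senderPower attribute, an IPFIX template would need a field specifier for it. Each data record would then carry the value as one signed byte. Per RFC 7011, an enterprise-specific element sets the high bit of its 16-bit identifier, and the identifier is followed by a 4-byte enterprise number. The element ID and enterprise number below are placeholders only. PSK Reporter would have to define the real ones.

```python
import struct

# Placeholders for illustration only; not values assigned by PSK Reporter.
SENDER_POWER_ELEMENT_ID = 0x0042   # hypothetical information element number
ENTERPRISE_NUMBER = 0              # placeholder; substitute the enterprise number PSK Reporter uses


def field_specifier(element_id: int, length: int, enterprise: int) -> bytes:
    """RFC 7011 template field specifier for an enterprise-specific information element."""
    return struct.pack("!HHI", 0x8000 | element_id, length, enterprise)


def encode_sender_power(dbm: int) -> bytes:
    """Encode a power in dBm as one signed byte; WSPR currently uses 0..60 dBm."""
    if not -128 <= dbm <= 127:
        raise ValueError("power does not fit in a signed 8-bit field")
    return struct.pack("!b", dbm)


if __name__ == "__main__":
    spec = field_specifier(SENDER_POWER_ELEMENT_ID, 1, ENTERPRISE_NUMBER)
    print(spec.hex(), encode_sender_power(37).hex())
```

A signed byte leaves room for negative dBm values, as Bill suggests, while the current 0 to 60 dBm range fits comfortably.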