Makes sense. That could easily be simplified with a subclass of WeeWxIOError. Call it WeeWxZeroBytesError.
On Mon, Dec 13, 2021 at 6:51 AM John Kline <[email protected]> wrote: > Hi Tom, > > In particular, if I get the read too few bytes exception, I simply return > from genDavisLoopPackets(). genDavisLoopPackets() will be immediately > called again asking for another 200 packets. > > Just in case I’m not explaining it well, here’s the code that checks for > such an error and simply returns. With this code, weewx never restarts. > for loop in range(N): > for count in range(self.max_tries): > try: > loop_packet = self._get_packet() > log.debug('%s: Loop packet success!' % weeutil.weeutil.timestamp_to_string > (loop_packet['dateTime'])) > except weewx.WeeWxIOError as e: > if str(e).endswith('instead'): > if str(e).endswith('got 0 instead'): > log.info('get_packet returned 0 bytes. (%d)' % self.pkt_count) > else: > log.info('get_packet returned too few bytes. (%d)' % self.pkt_count) > # If already on a bad read, log that fact. > if self.on_bad_read: > log.info('genDavisLoopPackets: repeated bad read.') > self.on_bad_read = True > return > log.error("LOOP try #%d; error: %s", count + 1, e) > time.sleep(self.wait_before_retry) > else: > self.on_bad_read = False > self.pkt_count += 1 > yield loop_packet > break > else: > log.error("LOOP max tries (%d) exceeded.", self.max_tries) > raise weewx.RetriesExceeded("Max tries exceeded while getting LOOP data.") > > > On Dec 13, 2021, at 4:13 AM, Tom Keffer <[email protected]> wrote: > > > John: good sleuthing! > > Although, I didn't quite understand your comment about exiting > genDavisLoopPackets(). Are you saying you might as well set max_tries=1 > because it never recovers? > > It's possible that after a time sync, the logger is occupied for a bit and > unable to generate new packets. A possible solution might be to sleep for 5 > or 10 seconds after the sync. > > In any case, I'm 1,200 miles from my station, so I can't offer much for a > few months. > > -tk > > > > On Sun, Dec 12, 2021 at 6:00 PM John Kline <[email protected]> wrote: > >> >> I’ve been studying the LOOP errors for over a year now. I see them on >> three different installs, 1 NUC7i5 (w/ console) and two Raspberry Pi 4s (1 >> w/ console, 1 w/ envoy). >> >> I’ve got a modified and instrumented driver. The important mod is to >> exit genDavisLoopPackets() when the error is hit. The error is almost >> always “Expected to read 99 chars; got 0 instead”, but it does not have >> to be 0 that was actually received. I you don’t exit out of that loop, and >> you don’t get lucky to be near the end of the loop, weewx will restart. >> With the change, weewx never restarts. >> >> Next is the question of why. It may very well be what Tom suggests. >> I’ve suspected as such and have been meaning to write a C program to read >> the serial (USB) port and feed weewx. >> >> One thing I can say is that, for me, ON ALL THREE consoles, the error >> occurs *almost* always after a time set. >> >> Here’s an example from the log: >> >> Vantage Clock Set/Short Reads Info: >> Dec 12 00:30:04 judy weewx[491] INFO user.vantagenext: Clock set to >> 2021-12-12 00:30:06 PST (1639297806) (9128, 1639297804.599486, 1.303189) >> Dec 12 00:30:09 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9129) >> Dec 12 00:30:13 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9130) >> Dec 12 00:30:23 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9134) >> Dec 12 00:30:27 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9135) >> Dec 12 00:30:31 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9136) >> Dec 12 00:30:35 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9137) >> Dec 12 00:30:39 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9138) >> Dec 12 00:30:44 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9139) >> Dec 12 00:30:49 judy weewx[491] INFO user.vantagenext: get_packet >> returned 0 bytes. (9141) >> >> In the above, when the clock set happened, the driver had served 9128 >> successful loop packets. It served one more after the clock set and then >> got the zero bytes error (9129). The same is true for the next, one >> success and then one error. After that, 4 successes and one error. And so >> on. >> >> Sometimes after a time set, I see no errors, but that is not often. The >> number of errors varies. This is on the high side. 1 - 4 errors might be >> more common. >> >> Sometimes there is an error that occurs NOT after the time set. This is >> very rare and, when it does happen, it is very often an error that I see on >> all three consoles. And, when it does occur, it’s always a single error. >> >> I’m still studying this (because it bothers the heck out of me). >> >> On Dec 12, 2021, at 5:32 PM, Tom Keffer <[email protected]> wrote: >> >> >> Naturally, that excerpt didn't have any LOOP errors. >> >> I can definitely see a scenario where an overloaded machine could miss >> LOOP packets. The LOOP packets are requested in big bunches, typically 200 >> at a time, then the driver blocks, waiting for them. If a reporting thread >> hogs the CPU, the driver may be starved for time and not get back in time >> to get the waiting packet. The Vantage then assumes the driver has gone >> away and stops sending the packets. >> >> You have a lot going on with your system. Is there anything that could >> prevent the Python runtime engine from switching threads? >> >> On Sun, Dec 12, 2021 at 5:23 PM vince <[email protected]> wrote: >> >>> On Sunday, December 12, 2021 at 4:19:33 PM UTC-8 Tom Keffer wrote: >>> >>>> Vince, you should know better! That's not much of a log excerpt. What >>>> was before that DMPAFT log entry? Did weewx put something in the database? >>>> If not, that looks like a classic case of memory corruption. >>>> >>>> As for the LOOP errors, I see those every once in a while, particularly >>>> when the logger is busy after a large catch up. Again, more of the log >>>> would help. >>>> >>>> >>> Pot. Kettle .Black. Moi ? - Guilty as charged, sir :-) >>> >>> I guess I was trying to get a feel for "occasionally seeing this or >>> that" is not that unusual nor necessarily actionable if everything else >>> works ok. >>> >>> But to answer - everything seems to be working ok. >>> The db is getting updated just fine, as are the web pages and images. >>> All the extensions posting to WU/PWS/CWOP and the MQTT stuff (publish >>> and subscribe) are working fine too. >>> >>> I've attached a gzipped syslog from a restart of weewx and a couple >>> cycles afterward. Typically the rsync stuff completes last, so I added a >>> blank line between cycles to make this a little easier to parse. I did >>> notice a api key error in my forecast setup (fixed) so the initial startup >>> might be a little longer than normal since the forecast stuff had to catch >>> up. Timings for subsequent runs look normal for here - a little over 3 >>> minutes til the rsync stuff ends a 5-minute archive set of stuff... >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "weewx-user" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/weewx-user/61b8a4db-7ee4-4449-9fcc-7780f1775078n%40googlegroups.com >>> <https://groups.google.com/d/msgid/weewx-user/61b8a4db-7ee4-4449-9fcc-7780f1775078n%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> -- >> You received this message because you are subscribed to the Google Groups >> "weewx-user" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/weewx-user/CAPq0zEBV7S%2B7hP55BZ_UU9SRdJTfnpJiw-SG%3DKfc0tmrnkYJTQ%40mail.gmail.com >> <https://groups.google.com/d/msgid/weewx-user/CAPq0zEBV7S%2B7hP55BZ_UU9SRdJTfnpJiw-SG%3DKfc0tmrnkYJTQ%40mail.gmail.com?utm_medium=email&utm_source=footer> >> . >> >> -- >> You received this message because you are subscribed to the Google Groups >> "weewx-user" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/weewx-user/327F1AFB-9303-493E-B6BA-265B31D4A380%40johnkline.com >> <https://groups.google.com/d/msgid/weewx-user/327F1AFB-9303-493E-B6BA-265B31D4A380%40johnkline.com?utm_medium=email&utm_source=footer> >> . >> > -- > You received this message because you are subscribed to the Google Groups > "weewx-user" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/weewx-user/CAPq0zEDUKgHvsK6e5gJdXJVF7Wz%3DdzgqwkhrJ0eCEXDi2yP9Jw%40mail.gmail.com > <https://groups.google.com/d/msgid/weewx-user/CAPq0zEDUKgHvsK6e5gJdXJVF7Wz%3DdzgqwkhrJ0eCEXDi2yP9Jw%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > > -- > You received this message because you are subscribed to the Google Groups > "weewx-user" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/weewx-user/63D224BC-5CBB-4070-B2AB-44AEC4B91624%40johnkline.com > <https://groups.google.com/d/msgid/weewx-user/63D224BC-5CBB-4070-B2AB-44AEC4B91624%40johnkline.com?utm_medium=email&utm_source=footer> > . > -- You received this message because you are subscribed to the Google Groups "weewx-user" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/weewx-user/CAPq0zED_5wh8adJpn_ObJvW%2Bz5_TGz_kE4OrqAApXDALUbbDPw%40mail.gmail.com.
