Jarom,

So you got 503's with my updated code, but not 403's?
Yeah, I have found that it is normal for WU to lazily process re-uploaded records, and it seems as if they drop them, as if some backend process is dying. I found through trial and error that after a weewx "outage" I need to re-upload exactly 10 times, but not back to back. I do one upload every 20 minutes, which means spreading the 10 re-uploads out over a 200-minute period. After that, wunderfixer reports no missing records.

Well, at least it used to work before all these 404's and 503's. :-(

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad Pro)

> On May 23, 2019, at 11:06 AM, Jarom Hatch <[email protected]> wrote:
>
> Those 503's are coming from Akamai. When I have used Akamai in the past,
> those errors were because the origin was not responding correctly.
>
> Related, sort of: when I took out the code to pull the timestamps and just
> force-ran all my records as a big backfill, I got success messages back;
> however, none of the records actually made their way in.
>
>> On Thursday, May 23, 2019 at 8:27:37 AM UTC-6, Leon Shaner wrote:
>> Jarom,
>>
>> Thanks so much! I see what I did wrong, and I was able to make a
>> stubbed-down version for basic testing to prove it's at least trying to
>> connect.
>>
>> Same location (for WeeWX 3.9.1):
>>
>> https://raw.githubusercontent.com/UberEclectic/weewx/master/bin/wunderfixer
>>
>> The thing is, I'm still constantly getting 404 (Not Found) even with CURL,
>> and just a bit ago the site started throwing 503 (Service Unavailable).
>> So... it's kinda hard to test under these conditions. But as long as you
>> don't get a 403, then at least my User-Agent "hack" will be "proven." :-/
>>
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>
>>> On May 23, 2019, at 1:57 AM, Jarom Hatch <[email protected]> wrote:
>>>
>>> I tried the 3.9.1 version and I get "Could not get Weather Underground
>>> data. Exiting."
>>>
>>> Curl still works, even for yesterday's data.
>>> Tracing the script, it doesn't appear to actually be attempting the
>>> download.
>>>
>>>> On Wednesday, May 22, 2019 at 7:03:49 PM UTC-6, Leon Shaner wrote:
>>>> Say, we need a tester who is still on 3.9.1 or thereabouts to try this
>>>> out:
>>>>
>>>> https://raw.githubusercontent.com/UberEclectic/weewx/master/bin/wunderfixer
>>>>
>>>> Can't do anything to work around WU's sporadic 404 and 503 errors, but at
>>>> least the 403 error should be gone.
>>>>
>>>> I was able to test the 4.0 / development version myself on both Python 2
>>>> and 3, so hopefully Tom will merge that soon. It's over here if you are
>>>> impatient. =D
>>>>
>>>> https://raw.githubusercontent.com/UberEclectic/weewx/development/bin/wunderfixer
>>>>
>>>> Regards,
>>>> \Leon
>>>> --
>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>
>>>>> On May 22, 2019, at 7:24 PM, Leon Shaner <[email protected]> wrote:
>>>>>
>>>>> Hey WeeWX'ers!!! =D
>>>>>
>>>>> I have a fix in the hopper.
>>>>>
>>>>> There's nothing that can be done for the occasional HTTP 404, or even
>>>>> the 503's I am now seeing. The HTTP 403, however, was due to a change
>>>>> on WU's part: they are now rejecting certain HTTP User-Agent strings.
>>>>> Putting Akamai in the middle is almost certainly a great thing
>>>>> re: their scalability issues. However, they probably inherited some
>>>>> default settings that filter "bots" and malware and such, which is likely
>>>>> why the HTTP User-Agent now matters.
>>>>>
>>>>> I have set the User-Agent to "CURL" and it works.
>>>>> I have set it to "Mozilla" and it works. I'm going with that one, since
>>>>> it means Mosaic Killer, both of which were among the very first
>>>>> User-Agents I ever worked with, circa 1993, back before there was such a
>>>>> thing as Netscape.
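The sketch below shows the approach described above in plain Python 3 `urllib`. It is not wunderfixer's actual code. It sends an explicit User-Agent (the "Mozilla" string reported to work) and retries the sporadic 404/503 responses a few times, but fails fast on 403 and on any other status. The names `build_request`, `should_retry` and `fetch` are hypothetical, and so are the attempt, delay and timeout values.

```python
import time
import urllib.error
import urllib.request

# Assumption taken from this thread: WU's CDN answers some User-Agent strings
# with 403, while "Mozilla" (or "CURL") was reported to work.
USER_AGENT = "Mozilla"

# Statuses the thread describes as sporadic, so possibly worth another try.
TRANSIENT_STATUSES = {404, 503}


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def should_retry(status, attempt, attempts):
    """True if this HTTP error status is transient and tries remain."""
    return status in TRANSIENT_STATUSES and attempt < attempts


def fetch(url, attempts=3, delay=5.0, timeout=60):
    """Fetch url, retrying 404/503 up to `attempts` times in total.

    403 and other HTTP errors are re-raised immediately. Network-level
    failures (urllib.error.URLError, timeouts) are not retried here.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as e:
            if not should_retry(e.code, attempt, attempts):
                raise  # 403, another status, or the last transient failure
            time.sleep(delay * attempt)  # linear back-off: delay, 2*delay, ...
```

Retrying only helps if the 404/503s really are transient, as in the "bad Akamai node" theory in Leon's 7:35 AM message. A 403 is not retried because the same User-Agent would presumably be rejected again.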
>>>>> =D
>>>>>
>>>>> /ye-olde-farte mode off ;-)
>>>>>
>>>>> My testing has so far been under Python 3, but coincidentally (correlation,
>>>>> not causation), WU started throwing HTTP 503's around the time I also
>>>>> tried validating my code under Python 2.
>>>>>
>>>>> Everything is working against today's date.
>>>>> It's when I go after yesterday's date that I get the HTTP server error
>>>>> 503.
>>>>>
>>>>> I expect the 404's and 503's to go away eventually, but at least for now
>>>>> I have a fix for the 403 (Forbidden) errors, just based on the User-Agent
>>>>> string.
>>>>>
>>>>> I'll submit a change for wunderfixer to both the 3.9.x "master" and 4.0.x
>>>>> "development" branches in a moment and reply back with direct links for
>>>>> anyone who wants a fix sooner. =D
>>>>>
>>>>> Isn't this fun? =D
>>>>>
>>>>> Regards,
>>>>> \Leon
>>>>> --
>>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>>
>>>>>> On May 22, 2019, at 4:20 PM, Leon Shaner <[email protected]> wrote:
>>>>>>
>>>>>> I'm still working on this.
>>>>>> CURL is telling me they are not only using https, but also TLSv1.2.
>>>>>> Here is a transcript, in case one of y'all beats me to the fix. =D
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "weewx-user" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/weewx-user/DA01E425-B99A-4959-8FB2-B564A61B3E77%40isylum.org.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>> <wu.txt>
>>>>>>
>>>>>> Working from here:
>>>>>> https://docs.python.org/2/library/ssl.html
>>>>>>
>>>>>> So far I have tried this, to no avail.
>>>>>> Really just doing the "import ssl", using https in the URL, and
>>>>>> adding context=ssl_context to the urllib.request.urlopen() call.
>>>>>>
>>>>>> A snippet of that looks as follows, but I'm still getting 403 Forbidden. :-(
>>>>>>
>>>>>> # For the new WU interface, which uses SSL and TLSv1.2
>>>>>> import ssl
>>>>>> import urllib.error
>>>>>> import urllib.request
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> _url = ("https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=%s"
>>>>>>         "&month=%d&day=%d&year=%d&format=1"
>>>>>>         % (self.station, dayRequested_tt[1],
>>>>>>            dayRequested_tt[2], dayRequested_tt[0]))
>>>>>>
>>>>>> # Pin the protocol to TLSv1.2, which already excludes SSLv2 and SSLv3.
>>>>>> # (ssl.PROTOCOL_* values select a protocol in the SSLContext
>>>>>> # constructor; they are not option flags to OR into .options.)
>>>>>> ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
>>>>>>
>>>>>> try:
>>>>>>     # Hit the Weather Underground site:
>>>>>>     _wudata = urllib.request.urlopen(_url, context=ssl_context)
>>>>>> except urllib.error.HTTPError as e:
>>>>>>     # This is where the 403 (Forbidden) still shows up.
>>>>>>     print("HTTP error %d: %s" % (e.code, e.reason))
>>>>>>
>>>>>> Regards,
>>>>>> \Leon
>>>>>> --
>>>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>>>
>>>>>>> On May 22, 2019, at 2:42 PM, Leon Shaner <[email protected]> wrote:
>>>>>>>
>>>>>>> Jarom,
>>>>>>>
>>>>>>> CURL is pretty sophisticated in its ability to emulate browser state in
>>>>>>> pretty much any way but JavaScript. When it worked this morning, I saw
>>>>>>> that some cookies were involved.
>>>>>>> It may well be that the Python way isn't handling that part.
>>>>>>> I don't know enough about how Python fetches pages to work that out,
>>>>>>> but I am very familiar with CURL, so if I can find a path that works
>>>>>>> consistently, then I'll go back to the Python code to see about how to
>>>>>>> implement the same.
>>>>>>>
>>>>>>> I was getting 404's even in the browser when I looked at it earlier.
>>>>>>>
>>>>>>> I'll keep working on it, but not too hard, so as to not get on their
>>>>>>> radar in any unwanted sort of way.
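The TLS snippet and the cookie observation above could be combined roughly as follows. This sketch assumes Python 3.7+ (for `ssl.TLSVersion`) and is not the code that went into wunderfixer. The helpers `make_tls_context` and `make_opener` are hypothetical. Unlike the snippet above, this version keeps certificate verification on and sets TLS 1.2 as a minimum rather than pinning a single protocol version.

```python
import http.cookiejar
import ssl
import urllib.request


def make_tls_context():
    """TLS 1.2+ client context that still verifies certificates and hostnames."""
    # create_default_context() enables certificate and hostname checks; a bare
    # ssl.SSLContext(ssl.PROTOCOL_TLSv1_2) does not verify certificates by default.
    context = ssl.create_default_context()
    # Refuse anything older than TLS 1.2 (SSLv3, TLS 1.0, TLS 1.1).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def make_opener(user_agent="Mozilla"):
    """Hypothetical helper: an opener with TLS 1.2+, a cookie jar and a User-Agent."""
    # Cookies the server sets are stored in the jar and sent back on later
    # requests made through this same opener.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=make_tls_context()),
        urllib.request.HTTPCookieProcessor(jar),
    )
    # Replace urllib's default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


# Usage (network access required; `url` is whatever query you build):
#     opener, jar = make_opener()
#     with opener.open(url, timeout=60) as response:
#         data = response.read()
```

Keeping one opener for all requests is what lets cookies carry over between them, which is roughly what curl does when pointed at a cookie jar.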
>>>>>>> ;-)
>>>>>>>
>>>>>>> Regards,
>>>>>>> \Leon
>>>>>>> --
>>>>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>>>>
>>>>>>>> On May 22, 2019, at 2:04 PM, Jarom Hatch <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Interesting: using curl, sometimes I can get it fine, but wunderfixer
>>>>>>>> is always getting a 403 Forbidden, as if it is actively being blocked...
>>>>>>>> When it doesn't work in curl I get `HTTP/1.1 404 Not Found`, and when
>>>>>>>> it does work I get `HTTP/1.1 200 OK`. Curl never gets a 403 error.
>>>>>>>>
>>>>>>>>> On Wednesday, May 22, 2019 at 11:48:08 AM UTC-6, Jarom Hatch wrote:
>>>>>>>>> I was able to get it to work twice in my web browser, but as you
>>>>>>>>> said, it is sporadic. I don't ever recall them using Akamai before,
>>>>>>>>> so that may very well be a contributing factor.
>>>>>>>>>
>>>>>>>>> I wonder if we can find out the origin address and see what happens
>>>>>>>>> if we bypass Akamai...
>>>>>>>>>
>>>>>>>>>> On Wednesday, May 22, 2019 at 7:35:18 AM UTC-6, Leon Shaner wrote:
>>>>>>>>>> For one thing, the URL of this form:
>>>>>>>>>>
>>>>>>>>>> http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>>>>>>>
>>>>>>>>>> is now redirecting to one using HTTPS:
>>>>>>>>>>
>>>>>>>>>> https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>>>>>>>
>>>>>>>>>> Also, the redirect itself takes an excruciatingly long time,
>>>>>>>>>> so I just changed the URL to https directly...
>>>>>>>>>>
>>>>>>>>>> The first time I tried any of the above using CURL this morning, it
>>>>>>>>>> worked, but after that I started getting:
>>>>>>>>>>
>>>>>>>>>> An error occurred while processing your request.
>>>>>>>>>> Reference #30.6f451160.1558531514.16ced4f6
>>>>>>>>>>
>>>>>>>>>> It looks as if they've put some kind of Akamai proxy in the middle,
>>>>>>>>>> which is fine for static content, but not so fine for a query of
>>>>>>>>>> this nature.
>>>>>>>>>> Strange that it worked for me the very first time.
>>>>>>>>>> It's almost as if the Akamai "farm" has lost some "state"
>>>>>>>>>> information and not all nodes have the same content, so if you get
>>>>>>>>>> stuck going through a bad node, you get a bogus response.
>>>>>>>>>>
>>>>>>>>>> Attached is a transcript of a failed attempt. I put SOMESTATION
>>>>>>>>>> there only after the fact. The actual query was for my actual
>>>>>>>>>> station, which used to work.
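Leon's post-outage routine from the top of this thread is 10 re-uploads, one every 20 minutes. It is a trial-and-error observation, not documented WU behavior. The sketch below only computes such a schedule. `reupload_schedule` is a hypothetical name, and actually running wunderfixer at each time (e.g. from cron) is left out.

```python
from datetime import datetime, timedelta


def reupload_schedule(start, count=10, interval_minutes=20):
    """Return `count` start times, `interval_minutes` apart, beginning at `start`.

    Hypothetical helper; the defaults are the trial-and-error numbers from the
    thread (10 passes, one every 20 minutes), not documented WU limits.
    """
    step = timedelta(minutes=interval_minutes)
    return [start + i * step for i in range(count)]


if __name__ == "__main__":
    # Example start time only; use the moment the outage ended.
    for when in reupload_schedule(datetime(2019, 5, 23, 12, 0)):
        print(when.strftime("%H:%M"))
```

With the example start time, this prints 12:00 through 15:00 in 20-minute steps. The tenth pass starts 180 minutes after the first. Counting its own 20-minute slot, the whole cycle spans the 200 minutes mentioned (10 × 20).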
