Re: [weewx-user] Re: We have finally put the last nail on the coffin for Wunderfixer?

Leon Shaner Thu, 23 May 2019 09:10:37 -0700

Jarom,

So you got 503's with my updated code, but not 403's?


Yeah, I have found that it is normal for WU to lazily process re-uploaded 
records, and it seems as if they drop them...as if some backend process is 
dying.

I found through trial and error that after a weewx "outage" I need to re-upload 
exactly 10 times, but not back to back.  I do one upload every 20 minutes, 
which means spreading the 10 re-uploads out over a 200 minute period.
After that wunderfixer reports no missing records.
Well, at least it used to work before all these 404's and 503's.  :-(
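
For anyone who wants to script that spacing, here is a minimal sketch, assuming ten passes 20 minutes apart as described above. The names reupload_schedule and run_spaced are hypothetical helpers, not part of wunderfixer or WeeWX, and the action passed in is whatever command or function performs one re-upload pass.

```python
import time
from datetime import datetime, timedelta


def reupload_schedule(start, attempts=10, interval_minutes=20):
    """Return the planned start time of each re-upload pass."""
    step = timedelta(minutes=interval_minutes)
    return [start + i * step for i in range(attempts)]


def run_spaced(action, attempts=10, interval_minutes=20, sleep=time.sleep):
    """Call action() `attempts` times, pausing between passes.

    No pause is added after the final pass. `sleep` is injectable so the
    loop can be exercised without actually waiting.
    """
    for i in range(attempts):
        action()
        if i < attempts - 1:
            sleep(interval_minutes * 60)
```

Calling reupload_schedule(datetime.now()) only lists the planned times; run_spaced(my_pass) blocks for the whole run, so a cron entry every 20 minutes may be a better fit on a station host.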

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad Pro)

> On May 23, 2019, at 11:06 AM, Jarom Hatch <[email protected]> wrote:
> 
> Those 503's are coming from Akamai.  When I have used Akamai in the past 
> those errors are because the origin is not responding correctly.
> 
> Related sorta, when I took out the code to pull the timestamps and just 
> force-ran all my records as a big backfill I got success messages back 
> however none of the records actually made their way in.
> 
>> On Thursday, May 23, 2019 at 8:27:37 AM UTC-6, Leon Shaner wrote:
>> Jarom,
>> 
>> Thanks so much!  I see what I did wrong and I was able to make a stubbed 
>> down version for basic testing to prove it's at least trying to connect.
>> 
>> Same location (for WeeWX 3.9.1):
>> 
>> https://raw.githubusercontent.com/UberEclectic/weewx/master/bin/wunderfixer
>> 
>> The thing is, I'm still constantly getting 404 (Not Found) even with CURL, 
>> and just a bit ago the site started throwing 503 (Service Unavailable).
>> So...  It's kinda hard to test under these conditions.   But as long as you 
>> don't get a 403, then at least my User-Agent "hack" will be "proven."   :-/
>> 
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>> 
>>> On May 23, 2019, at 1:57 AM, Jarom Hatch <[email protected]> wrote:
>>> 
>>> I tried the 3.9.1 version and I get "Could not get Weather Underground 
>>> data. Exiting."
>>> 
>>> Curl still works, even for yesterday's data.  Tracing the script it doesn't 
>>> appear to be actually attempting the download.  
>>> 
>>> 
>>>> On Wednesday, May 22, 2019 at 7:03:49 PM UTC-6, Leon Shaner wrote:
>>>> Say, we need a tester who is still on 3.9.1 or thereabouts to try this 
>>>> out:
>>>> 
>>>> https://raw.githubusercontent.com/UberEclectic/weewx/master/bin/wunderfixer
>>>> 
>>>> Can't do anything to work around WU's sporadic 404 and 503 errors, but at 
>>>> least the 403 error should be gone.
>>>> 
>>>> I was able to test the 4.0 / development version myself on both Python2 
>>>> and 3, so hopefully Tom will merge that soon.  It's over here if you are 
>>>> impatient.  =D
>>>> 
>>>> https://raw.githubusercontent.com/UberEclectic/weewx/development/bin/wunderfixer
>>>> 
>>>> Regards,
>>>> \Leon
>>>> --
>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>> 
>>>>> On May 22, 2019, at 7:24 PM, Leon Shaner <[email protected]> wrote:
>>>>> 
>>>>> Hey WeeWX'ers!!!  =D
>>>>> 
>>>>> I have a fix in the hopper.
>>>>> 
>>>>> There's nothing that can be done for the occasional HTTP 404, or even 
>>>>> 503's I am now seeing, but the HTTP 403 was due to a change on WU's part 
>>>>> where they are rejecting certain HTTP User-Agent strings.  The fact that 
>>>>> they are putting Akamai in the middle is almost certainly a great thing 
>>>>> re: their scalability issues; however, they probably inherited some 
>>>>> default settings that filter "bots" and malware and such, which is likely 
>>>>> why the HTTP User-Agent now matters.
>>>>> 
>>>>> I have set the User-Agent to "CURL" and it works.
>>>>> I have set it to "Mozilla" and it works.  I'm going with that one, since 
>>>>> it means Mosaic Killer, both of which were among the very first 
>>>>> User-Agents I ever worked with, circa 1993, back before there was such a 
>>>>> thing as Netscape.  =D
>>>>> 
>>>>> /ye-olde-farte mode off  ;-)
>>>>> 
>>>>> My testing has so far been under Python3, but coincidentally (and not a 
>>>>> causation), WU started throwing HTTP 503's around the time that I tried 
>>>>> validating my code also under Python2.
>>>>> 
>>>>> Everything is working against today's date.
>>>>> It's when I go after yesterday's date that I get the HTTP server error 
>>>>> 503.
>>>>> 
>>>>> I expect the 404's and 503's to go away eventually, but at least for now 
>>>>> I have a fix for the 403 (forbidden)'s, just based on the User-Agent 
>>>>> string.
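
A minimal sketch of the User-Agent approach described above, written for Python 3 only. It is not the actual wunderfixer patch: build_wu_request and fetch_wu_history are hypothetical names, the timeout is an assumed value, and the URL pattern is the one quoted elsewhere in this thread.

```python
import urllib.request

# URL pattern as quoted in this thread, with https used directly.
WU_HISTORY_URL = ("https://www.wunderground.com/weatherstation/WXDailyHistory.asp"
                  "?ID=%s&month=%d&day=%d&year=%d&format=1")


def build_wu_request(station, year, month, day, user_agent="Mozilla"):
    """Return a Request for one day's history with an explicit User-Agent."""
    url = WU_HISTORY_URL % (station, month, day, year)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_wu_history(station, year, month, day, timeout=30):
    """Fetch one day's data as text.

    urllib.error.HTTPError (for example 404 or 503) is not caught here,
    so the caller decides whether to retry later.
    """
    req = build_wu_request(station, year, month, day)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Without an explicit header, urllib sends a default User-Agent of the form Python-urllib/x.y, which is presumably the kind of string the new filtering rejects.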
>>>>> 
>>>>> I'll submit a change for wunderfixer both to the 3.9.x "master" and 4.0.x 
>>>>> "development" branches in a moment and reply back with direct links for 
>>>>> anyone who wants a fix sooner.  =D
>>>>> 
>>>>> Isn't this fun?  =D
>>>>> 
>>>>> Regards,
>>>>> \Leon
>>>>> --
>>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>> 
>>>>>> On May 22, 2019, at 4:20 PM, Leon Shaner <[email protected]> wrote:
>>>>>> 
>>>>>> I'm still working on this.
>>>>>> CURL is telling me they are not only using https, but also TLSv1.2.
>>>>>> Here is a transcript, in case one of y'all beats me to the fix.  =D
>>>>>> 
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "weewx-user" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>> an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/weewx-user/DA01E425-B99A-4959-8FB2-B564A61B3E77%40isylum.org.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>> <wu.txt>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Working from here:
>>>>>> https://docs.python.org/2/library/ssl.html
>>>>>> 
>>>>>> So far I have tried this, to no avail.
>>>>>> Really just doing the "import ssl" and using https in the URL, and 
>>>>>> adding context=ssl_context to the urllib.request.
>>>>>> 
>>>>>> A snippet of that looks as follows, but still getting 403 forbidden.  :-(
>>>>>> 
>>>>>> # For new WU interface which uses SSL and TLSv1.2
>>>>>> import ssl
>>>>>> 
>>>>>> ...
>>>>>> 
>>>>>>         _url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=%s" \
>>>>>>                "&month=%d&day=%d&year=%d&format=1" % (self.station,
>>>>>>                                                       dayRequested_tt[1],
>>>>>>                                                       dayRequested_tt[2],
>>>>>>                                                       dayRequested_tt[0])
>>>>>> 
>>>>>>         # Require TLSv1.2.  PROTOCOL_SSLv23 is a protocol constant, not an
>>>>>>         # option flag, so it must not be OR'ed into ssl_context.options.
>>>>>>         ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
>>>>>>         ssl_context.options |= ssl.OP_NO_SSLv3
>>>>>> 
>>>>>>         try:
>>>>>>             # Hit the weather underground site:
>>>>>>             _wudata = urllib.request.urlopen(_url, context=ssl_context)
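
For reference, on Python 3.7 or later the same intent (TLSv1.2 or newer, with certificate checks left on) can be sketched with the default context instead of a hand-built SSLContext. This is an assumption-laden alternative, not what wunderfixer does; make_tls12_context and open_with_tls12 are hypothetical names. As the later messages in this thread show, TLS was not the cause of the 403, so this only tidies the TLS setup.

```python
import ssl
import urllib.request


def make_tls12_context():
    """Default client context that refuses anything older than TLSv1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_with_tls12(url, timeout=30):
    """Open url with the context above; HTTP errors propagate to the caller."""
    return urllib.request.urlopen(url, context=make_tls12_context(),
                                  timeout=timeout)
```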
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> \Leon
>>>>>> --
>>>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>>> 
>>>>>>> On May 22, 2019, at 2:42 PM, Leon Shaner <[email protected]> wrote:
>>>>>>> 
>>>>>>> Jarom,
>>>>>>> 
>>>>>>> CURL is pretty sophisticated in its ability to emulate browser state in 
>>>>>>> pretty much any way but JavaScript.  When it worked this morning, I saw 
>>>>>>> some cookies were involved.
>>>>>>> It may well be that the python way isn't handling that part.
>>>>>>> I don't know enough about how python fetches pages to work that out, 
>>>>>>> but I am very familiar with CURL, so if I can find a path that works 
>>>>>>> consistently, then I'll go back to the python to see about how to 
>>>>>>> implement same.
>>>>>>> 
>>>>>>> I was getting 404's in the browser even, when I looked at it earlier.
>>>>>>> 
>>>>>>> I'll keep working on it, but not too hard, so as to not get on their 
>>>>>>> radar in any unwanted sort of way.  ;-)
>>>>>>> 
>>>>>>> Regards,
>>>>>>> \Leon
>>>>>>> --
>>>>>>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>>>>>> 
>>>>>>>> On May 22, 2019, at 2:04 PM, Jarom Hatch <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Interesting, using curl sometimes I can get it fine, but wunderfixer is 
>>>>>>>> always getting a 403 Forbidden, as if it is actively being blocked...  
>>>>>>>> When it doesn't work in curl I get `HTTP/1.1 404 Not Found` and when 
>>>>>>>> it does work I get `HTTP/1.1 200 OK`.  Curl never gets a 403 error.
>>>>>>>> 
>>>>>>>>> On Wednesday, May 22, 2019 at 11:48:08 AM UTC-6, Jarom Hatch wrote:
>>>>>>>>> I was able to get it to work twice in my web browser, but as you 
>>>>>>>>> said, it is sporadic.  I don't ever recall them using Akamai before 
>>>>>>>>> so that may very well be a contributing factor.
>>>>>>>>> 
>>>>>>>>> I wonder if we can find out the origin address and see what happens 
>>>>>>>>> if we can bypass Akamai...
>>>>>>>>> 
>>>>>>>>>> On Wednesday, May 22, 2019 at 7:35:18 AM UTC-6, Leon Shaner wrote:
>>>>>>>>>> For one thing, the URL of this form:
>>>>>>>>>> 
>>>>>>>>>> http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>>>>>>> 
>>>>>>>>>> Is now redirecting to one using HTTPS:
>>>>>>>>>> 
>>>>>>>>>> https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>>>>>>> 
>>>>>>>>>> Also, the redirect itself takes an excruciatingly long time.
>>>>>>>>>> So I just changed the URL to https directly...
>>>>>>>>>> 
>>>>>>>>>> The first time I tried any of the above using CURL this morning it 
>>>>>>>>>> worked, but then after that I started getting:
>>>>>>>>>> 
>>>>>>>>>> An error occurred while processing your request.
>>>>>>>>>> Reference #30.6f451160.1558531514.16ced4f6
>>>>>>>>>> 
>>>>>>>>>> It looks as if they've put some kind of Akamai proxy in the middle, 
>>>>>>>>>> which is fine for static content, but not so fine for a query of 
>>>>>>>>>> this nature.  Strange that it worked for me the very first time.  
>>>>>>>>>> It's almost as if the Akamai "farm" has lost some "state" 
>>>>>>>>>> information and not all nodes have the same content, so if you get 
>>>>>>>>>> stuck going through a bad node you get a bogus response.
>>>>>>>>>> 
>>>>>>>>>> Attached is a transcript of a failed attempt.  I put SOMESTATION 
>>>>>>>>>> there only after the fact.  The actual query was for my actual 
>>>>>>>>>> station, which used to work.
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
> 

