Sad, now I'm getting the 503.  I'll keep trying.

On Thursday, May 23, 2019 at 8:27:37 AM UTC-6, Leon Shaner wrote:
>
> Jarom,
>
> Thanks so much!  I see what I did wrong, and I was able to make a 
> stripped-down version for basic testing to prove it's at least trying to connect.
>
> Same location (for WeeWX 3.9.1):
>
> https://raw.githubusercontent.com/UberEclectic/weewx/master/bin/wunderfixer
>
> The thing is, I'm still constantly getting 404 (Not Found) even with CURL, 
> and just a bit ago the site started throwing 503 (Service Unavailable).
> So...  It's kinda hard to test under these conditions.   But as long as 
> you don't get a 403, then at least my User-Agent "hack" will be "proven."   
> :-/
>
> Regards,
> \Leon
> --
> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>
> On May 23, 2019, at 1:57 AM, Jarom Hatch <[email protected]> wrote:
>
> I tried the 3.9.1 version and I get "Could not get Weather Underground 
> data. Exiting."
>
> Curl still works, even for yesterday's data.  Tracing the script, it 
> doesn't appear to actually attempt the download.  
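One way to confirm whether a script sends the request at all is to turn on urllib's HTTP debug output, which makes the underlying `http.client` connection print the request it sends and the status line and headers it receives. A minimal sketch, assuming Python 3; `make_debug_opener` is a hypothetical helper name and not part of wunderfixer:

```python
import urllib.request


def make_debug_opener():
    """Build a urllib opener that echoes HTTP(S) traffic to stdout.

    debuglevel=1 makes http.client print each request it sends ("send: ...")
    and the reply status line and headers ("reply: ...", "header: ..."),
    so you can tell whether a request goes out at all.
    """
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


# Usage (performs a real network request):
# opener = make_debug_opener()
# opener.open("https://www.wunderground.com/")
```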
>
>
> On Wednesday, May 22, 2019 at 7:03:49 PM UTC-6, Leon Shaner wrote:
>>
>> Say, we need a tester who is still on 3.9.1 or thereabouts to try this 
>> out:
>>
>>
>> https://raw.githubusercontent.com/UberEclectic/weewx/master/bin/wunderfixer
>>
>> Can't do anything to work around WU's sporadic 404 and 503 errors, but at 
>> least the 403 error should be gone.
>>
>> I was able to test the 4.0 / development version myself on both Python2 
>> and 3, so hopefully Tom will merge that soon.  It's over here if you are 
>> impatient.  =D
>>
>>
>> https://raw.githubusercontent.com/UberEclectic/weewx/development/bin/wunderfixer
>>
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>
>> On May 22, 2019, at 7:24 PM, Leon Shaner <[email protected]> wrote:
>>
>> Hey WeeWX'ers!!!  =D
>>
>> I have a fix in the hopper.
>>
>> There's nothing that can be done for the occasional HTTP 404, or even 
>> 503's I am now seeing, but the HTTP 403 was due to a change on WU's part 
>> where they are rejecting certain HTTP User-Agent strings.  The fact that 
>> they are putting Akamai in the middle is almost certainly a great thing re: 
>> their scalability issues; however, they probably inherited some default 
>> settings that filter "bots" and malware and such, which is likely why the 
>> HTTP User-Agent now matters.
>>
>> I have set the User-Agent to "CURL" and it works.
>> I have set it to "Mozilla" and it works.  I'm going with that one, since 
>> "Mozilla" means "Mosaic Killer," and Mosaic and Mozilla were among the very 
>> first User-Agents I ever worked with, circa 1993, back before there was such 
>> a thing as Netscape.  =D
>>
>> /ye-olde-farte mode off  ;-)
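The User-Agent workaround amounts to sending an explicit header instead of Python's default (`Python-urllib/3.x`). A minimal sketch, assuming Python 3's `urllib.request`; `wu_request` and `WU_USER_AGENT` are hypothetical names, and this is not the actual wunderfixer patch:

```python
import urllib.request

# Python's default User-Agent looks like "Python-urllib/3.x", which the
# CDN in front of WU appeared to reject with 403; "Mozilla" was accepted.
WU_USER_AGENT = "Mozilla"


def wu_request(url, user_agent=WU_USER_AGENT):
    """Return a Request for *url* that carries an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Usage (performs a real network request):
# with urllib.request.urlopen(wu_request(url), timeout=30) as resp:
#     data = resp.read()
```

Note that `Request` stores header names via `str.capitalize()`, so the header is read back with `req.get_header("User-agent")`.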
>>
>> My testing has so far been under Python3, but coincidentally (not 
>> causally), WU started throwing HTTP 503's around the time that I also tried 
>> validating my code under Python2.
>>
>> Everything is working against today's date.
>> It's when I go after yesterday's date that I get the HTTP server error 
>> 503.
>>
>> I expect the 404's and 503's to go away eventually, but at least for now 
>> I have a fix for the 403 (forbidden)'s, just based on the User-Agent string.
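Since the 404s and 503s come and go on WU's side while the 403 was deterministic, a caller could retry the former and fail fast on the latter. A sketch of that idea, assuming Python 3; `fetch_with_retry` and `TRANSIENT_CODES` are hypothetical names, and the retry policy is an assumption not taken from wunderfixer:

```python
import time
import urllib.error

# Hypothetical policy: treat 404/503 as transient (WU side, retry later)
# and everything else, notably 403, as a problem with the request itself.
TRANSIENT_CODES = {404, 503}


def fetch_with_retry(fetch, url, attempts=3, delay=5.0, sleep=time.sleep):
    """Call fetch(url), retrying on transient HTTP errors.

    fetch is any callable that returns data or raises
    urllib.error.HTTPError (e.g. a wrapper around urlopen). Waits
    delay, 2*delay, ... between attempts. Non-transient errors such as
    403 are re-raised immediately; the last transient error is re-raised
    once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in TRANSIENT_CODES or attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real waits.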
>>
>> I'll submit a change for wunderfixer both to the 3.9.x "master" and 4.0.x 
>> "development" branches in a moment and reply back with direct links for 
>> anyone who wants a fix sooner.  =D
>>
>> Isn't this fun?  =D
>>
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>
>> On May 22, 2019, at 4:20 PM, Leon Shaner <[email protected]> wrote:
>>
>> I'm still working on this.
>> CURL is telling me they are not only using https, but also TLSv1.2.
>> Here is a transcript, in case one of y'all beats me to the fix.  =D
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "weewx-user" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/weewx-user/DA01E425-B99A-4959-8FB2-B564A61B3E77%40isylum.org
>>  
>> <https://groups.google.com/d/msgid/weewx-user/DA01E425-B99A-4959-8FB2-B564A61B3E77%40isylum.org?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>> <wu.txt>
>>
>>
>>
>>
>> Working from here:
>> https://docs.python.org/2/library/ssl.html
>>
>> So far I have tried this, to no avail.
>> Really just doing the "import ssl" and using https in the URL, and adding 
>> context=ssl_context to the urllib.request.urlopen() call.
>>
>> A snippet of that looks as follows, but still getting 403 forbidden.  :-(
>>
>> # For new WU interface which uses HTTPS with TLSv1.2
>> import ssl
>> import urllib.error
>> import urllib.request
>>
>> ...
>>
>>         _url = ("https://www.wunderground.com/weatherstation/WXDailyHistory.asp"
>>                 "?ID=%s&month=%d&day=%d&year=%d&format=1"
>>                 % (self.station, dayRequested_tt[1],
>>                    dayRequested_tt[2], dayRequested_tt[0]))
>>
>>         # Verify the server certificate and require TLSv1.2 or newer.
>>         # The default context already refuses SSLv2 and SSLv3.
>>         # (minimum_version needs Python 3.7+.)
>>         ssl_context = ssl.create_default_context()
>>         ssl_context.minimum_version = ssl.TLSVersion.TLSv1_2
>>
>>         try:
>>             # Hit the weather underground site:
>>             _wudata = urllib.request.urlopen(_url, context=ssl_context)
>>         except urllib.error.HTTPError as e:
>>             ...  # error handling not shown (e.code is e.g. 403)
>>
>>
>>
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>
>> On May 22, 2019, at 2:42 PM, Leon Shaner <[email protected]> wrote:
>>
>> Jarom,
>>
>> CURL is pretty sophisticated in its ability to emulate browser state in 
>> pretty much any way but JavaScript.  When it worked this morning, I saw 
>> some cookies were involved.
>> It may well be that the python way isn't handling that part.
>> I don't know enough about how python fetches pages to work that out, but 
>> I am very familiar with CURL, so if I can find a path that works 
>> consistently, then I'll go back to the Python code to see how to implement 
>> the same.
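If cookies are part of what makes curl succeed, note that urllib's default opener does not keep cookies between requests. A minimal sketch of a cookie-aware opener, assuming Python 3; `make_cookie_opener` is a hypothetical helper, and whether cookies actually matter to WU here is unconfirmed:

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies and sends them back.

    Roughly like curl's cookie engine (-b/-c): cookies set by one response,
    including a redirect, are replayed on later requests made through this
    opener. Returns the opener and its CookieJar for inspection.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (performs a real network request):
# opener, jar = make_cookie_opener()
# opener.open(url)
# print([c.name for c in jar])
```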
>>
>> I was even getting 404's in the browser when I looked at it earlier.
>>
>> I'll keep working on it, but not too hard, so as to not get on their 
>> radar in any unwanted sort of way.  ;-)
>>
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>
>> On May 22, 2019, at 2:04 PM, Jarom Hatch <[email protected]> wrote:
>>
>> Interesting: using curl, sometimes I can get it fine, but wunderfixer is 
>> always getting a 403 Forbidden, as if it is actively being blocked...  When 
>> it doesn't work in curl I get `HTTP/1.1 404 Not Found`, and when it does 
>> work I get `HTTP/1.1 200 OK`.  Curl never gets a 403 error.
>>
>> On Wednesday, May 22, 2019 at 11:48:08 AM UTC-6, Jarom Hatch wrote:
>>>
>>> I was able to get it to work twice in my web browser, but as you said, 
>>> it is sporadic.  I don't ever recall them using Akamai before so that may 
>>> very well be a contributing factor.
>>>
>>> I wonder if we can find out the origin address and see what happens if 
>>> we can bypass Akamai...
>>>
>>> On Wednesday, May 22, 2019 at 7:35:18 AM UTC-6, Leon Shaner wrote:
>>>>
>>>> For one thing, the URL of this form:
>>>>
>>>>
>>>> http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>
>>>> Is now redirecting to one using HTTPS:
>>>>
>>>>
>>>> https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>
>>>> Also, the redirect itself takes an excruciatingly long time.
>>>> So I just changed the URL to https directly...
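Building the HTTPS URL directly avoids the slow redirect. A minimal sketch, assuming Python 3; `wu_history_url` and `WU_HISTORY_BASE` are hypothetical names, and the query parameters are copied from the URLs above:

```python
from urllib.parse import urlencode

# Request the HTTPS URL directly instead of the old http:// one, which
# now only redirects (slowly) to HTTPS.
WU_HISTORY_BASE = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp"


def wu_history_url(station, year, month, day):
    """Return the daily-history URL (format=1) for *station* on a date."""
    query = urlencode({"ID": station, "month": month, "day": day,
                       "year": year, "format": 1})
    return WU_HISTORY_BASE + "?" + query
```

For example, `wu_history_url("SOMESTATION", 2019, 5, 22)` reproduces the HTTPS URL quoted above.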
>>>>
>>>> The first time I tried any of the above using CURL this morning it 
>>>> worked, but then after that I started getting:
>>>>
>>>> An error occurred while processing your request.
>>>>
>>>> Reference #30.6f451160.1558531514.16ced4f6
>>>>
>>>> It looks as if they've put some kind of Akamai proxy in the middle, 
>>>> which is fine for static content, but not so fine for a query of this 
>>>> nature.  Strange that it worked for me the very first time.  It's almost 
>>>> as if the Akamai "farm" has lost some "state" information and not all 
>>>> nodes have the same content, so if you get stuck going through a bad node, 
>>>> you get a bogus response.
>>>>
>>>> Attached is a transcript of a failed attempt.  I put SOMESTATION there 
>>>> only after the fact.  The actual query was for my actual station, which 
>>>> used to work.
>>>>