I had a look. It's not trivial, to say the least. Actually, I'd say it's not doable at all. Here's a quick summary of why:
The main page (for your zip code/location) loads a lot of scripts, one of which has the JSONP payload we want. So the first step would still have to parse the main page for your location, find the correct script URL, then load and parse that.

Here's the main page for zip code 10020:

http://www.weather.com/weather/today/l/10020:4:US

Load that in Chrome with the developer tools panel open. Stand well back. What you'll find among the blizzard of other resources that get loaded is this:

http://dsx.weather.com/wxd/v2/(BERecord/en_US;MORecord/en_US)/USNY1252:1:US?api=7bb1c920-7027-4289-9c96-ae5e263980bc&jsonp=angular.callbacks._c

That provides the following JSONP (again, discoverable only via the developer tools panel):

---------------------
angular.callbacks._c({ "status": 200, "body": [
{"id": "/wxd/v2/BERecord/en_US/USNY1252:1:US", "status": 204 }
,
{"id": "/wxd/v2/MORecord/en_US/USNY1252:1:US", "status": 200, "generatedTime": 1417019149, "cacheMaxSeconds": 300, "currentTime": 1417019201
, "doc": {"MOHdr":{"obsStn":"T72503067","procTm":"20141126111035","_procTmLocal":"2014-11-26T11:10:35.000-05:00","procTmISO":"2014-11-26T16:10:35.000Z"},"MOData":{"stnNm":"Glen Head","obsDayGmt":"20141126","obsTmGmt":"160500","dyNght":"D","locObsDay":"20141126","locObsTm":"110416","tmpF":35,"tmpC":2,"sky":11,"wx":"Light Rain","iconExt":1201,"alt":30.1,"baroTrnd":2,"baroTrndAsc":"Falling Rapidly","ceil":800,"ceilM":244,"clds":"OVC","dwptF":33,"dwptC":1,"hIF":35,"hIC":2,"rH":94,"pres":1020.2,"presChnge":-0.05,"visM":5.0,"visK":8.05,"wCF":27,"wCC":-3,"wDir":20,"wDirAsc":"NNE","wSpdM":10,"wSpdK":16,"wSpdKn":9,"tmpMx24F":55,"tmpMx24C":13,"tmpMn24F":35,"tmpMn24C":2,"tmpMx6F":-21,"tmpMx6C":-29,"prcp24":0.27,"prcp3_6hr":0.27,"prcpHr":0.05,"prcpMTD":4.47,"prcpYr":39.69,"prcp2Dy":0.27,"prcp3Dy":0.83,"prcp7Dy":0.83,"snwDep":0.5,"snwIncr":0.2,"snwTot":0.5,"snwTot6hr":0.5,"snwMTD":0.8,"snwSsn":0.8,"snwYr":47.6,"snw2Dy":0.5,"snw3Dy":0.5,"snw7Dy":0.5,"sunrise":"06:52 am","sunset":"04:28 pm","uvIdx":1,"uvDes":"Low","uvWrn":0,"flsLkIdxF":27,"flsLkIdxC":-3,"recTyp":"TECCI","vocalCd":"OIT72503067:OZ201411261605:OT35:OTC27:OTF27:OTH55:OTL35:OTD-21:OU1:OH94:OX1201:OW01S10:OD33:OV50:OC8:OP3010T01:ORH5:ORQ27:ORD27:OSH2:OSQ5:OSD5:ORM447:ORY3969:OMR352:OYR4244:OSM8:OSY476:OSS8:OQ1156","avgMTDPrecip":3.52,"avgYTDPrecip":42.44,"wxMan":"wx2510","qulfr":"OQ1156","qulfrSvrty":2,"_presIn":30.13,"_altMeters":9.17,"_snwDepCm":1.27,"_prcp24Cm":0.69,"_prcp24Mm":6.86,"_prcpYrMm":1008.13,"_prcpMTDMm":113.54,"_prcp2DyMm":6.86,"_prcp3DyMm":21.08,"_prcp7DyMm":21.08,"_snwYrCm":120.9,"_snw2DyCm":1.27,"_snw3DyCm":1.27,"_snw7DyCm":1.27,"_sunriseISOLocal":"2014-11-26T06:52:00.000-05:00","_sunsetISOLocal":"2014-11-26T16:28:00.000-05:00","obsDateTimeISO":"2014-11-26T16:05:00.000Z","sunriseISO":"2014-11-26T11:52:00.000Z","sunsetISO":"2014-11-26T21:28:00.000Z","_obsDateLocalTimeISO":"2014-11-26T11:05:00.000-05:00","_extendedQulfrPhrase":"A mix of wintry precipitation is occurring at other points nearby.","_wDirAsc_en":"NNE"}}
}
] })
---------------------

And in fact you can see various useful bits of information in there, like temperature, etc.: JSON key/value pairs that could, in theory, be extracted.

Here's the thing. That URL doesn't exist if you just download the main page via, say, curl (or via a Perl script, same thing).
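To be clear about the "in theory" part: if you did somehow have that response text in hand, pulling values out of it would be simple. Here's a minimal sketch, in Python purely for illustration (the plugin itself is Perl, and the same idea carries over). SAMPLE is a hand-trimmed copy of the response above, and the function names unwrap_jsonp and current_observation are made up for this example. It also assumes the record we want is the first one in "body" with status 200 and a "doc", which is only what this one response suggests.

```python
import json
import re

# Hand-trimmed copy of the response shown above; only a few MOData fields are kept.
SAMPLE = (
    'angular.callbacks._c({ "status": 200, "body": [ '
    '{"id": "/wxd/v2/BERecord/en_US/USNY1252:1:US", "status": 204 } , '
    '{"id": "/wxd/v2/MORecord/en_US/USNY1252:1:US", "status": 200, '
    '"doc": {"MOData": {"stnNm": "Glen Head", "tmpF": 35, "tmpC": 2, '
    '"wx": "Light Rain", "rH": 94, "wDirAsc": "NNE", "wSpdM": 10}} } ] })'
)


def unwrap_jsonp(text):
    """Strip the callback wrapper, e.g. name({...}), and parse the JSON inside."""
    match = re.match(r'^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


def current_observation(payload):
    """Return MOData from the first body record that has status 200 and a doc."""
    for record in payload.get("body", []):
        if record.get("status") == 200 and "doc" in record:
            return record["doc"].get("MOData", {})
    return {}


obs = current_observation(unwrap_jsonp(SAMPLE))
print(obs["stnNm"], obs["tmpF"], obs["wx"])  # Glen Head 35 Light Rain
```

So the extraction is the easy part. What's missing is a reliable way to find that URL in the first place.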
So in other words, it's only because some other JavaScript is evaluated that the browser then makes a request for that URL. Without going through that process, you can't know what URL to request. And whatever state is necessary? You won't have that either. So a Perl script can't access it unless we play guessing games with the URL, assume it will always be of a certain form, and grab it directly. That's not likely to work.

My recommendation is to abandon weather.com as a source.

If I had to do this insane parsing job for some reason, I'd be looking at using PhantomJS:

http://phantomjs.org/

which is basically a JavaScript-enabled headless browser that you can then interrogate. So you tell it to load the weather.com page, it will happily run their metric f**k ton of JS, and at that point you have access to the DOM and you can go to town, similar to inspecting things via the developer tools panel.

Needless to say, this is not something I think we want our Squeezebox servers doing...

So I think people who want weather need another source and a reboot of the parser for that source. The good news is that screen scraping (DOM scraping, really) with Perl is actually very straightforward. As long as you don't need to run JS to get there...

In fact I had a look at forecast.io. It's global, and although it too uses JSON, it's much more lightweight and straightforward. Again though, fitting it all into the existing SDT framework would be some work no matter what.

SBB
