Guys, thanks so much for your prompt feedback. Basically, what I am doing is to keep sending requests based on date and time until we reach the next day. Specifically, what I have is something like:
api_url = 'http://en.wikipedia.org/w/api.php'
date = '20160504022715'
while True:
    api_params = ('action=query&list=recentchanges&rclimit=5000&rctype=edit'
                  '&rcnamespace=0&rcdir=newer&format=json'
                  '&rcstart={date}').format(date=date)
    f = urllib2.Request(api_url, api_params)
    source = urllib2.urlopen(f, None, 300).read()
    source = json.loads(source)
    # ... increase date ...

Given the above code, I am encountering a weird situation. In the query, if I set rclimit to 500, it runs normally. However, if I set rclimit to 5000 as in my previous email, I see the error. I know that for recentchanges rclimit should be set to 500, but is there anything particular about the value of rclimit that could lead to the break in the JSON?

On 5/5/16, 11:16 PM, "Wikitech-l on behalf of MZMcBride" <[email protected] on behalf of [email protected]> wrote:

>Trung Dinh wrote:
>>Hi all,
>>I have an issue when trying to parse data fetched from the Wikipedia API.
>>This is the piece of code that I am using:
>>
>>api_url = 'http://en.wikipedia.org/w/api.php'
>>api_params = 'action=query&list=recentchanges&rclimit=5000&rctype=edit&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715'
>>
>>f = urllib2.Request(api_url, api_params)
>>print ('requesting ' + api_url + '?' + api_params)
>>source = urllib2.urlopen(f, None, 300).read()
>>source = json.loads(source)
>>
>>json.loads(source) raised the following exception: "Expecting ,
>>delimiter: line 1 column 817105 (char 817104)"
>>
>>I tried to use source.encode('utf-8') and some other encodings but they
>>all didn't help.
>>Do we have any workaround for that issue? Thanks :)
>
>Hi.
>
>Weird, I can't reproduce this error.
>I had to import the "json" and
>"urllib2" modules, but after doing so, executing the code you provided
>here worked fine for me: <https://phabricator.wikimedia.org/P3009>.
>
>You probably want to use 'https://en.wikipedia.org/w/api.php' as your
>end-point (HTTPS, not HTTP).
>
>As far as I know, JSON is always encoded as UTF-8, so you shouldn't need
>to encode or decode the data explicitly.
>
>The error you're getting generally means that the JSON was malformed for
>some reason. It seems unlikely that MediaWiki's api.php is outputting
>invalid JSON, but I suppose it's possible.
>
>Since you're coding in Python, you may be interested in a framework such
>as <https://github.com/alexz-enwp/wikitools>.
>
>MZMcBride
>
>_______________________________________________
>Wikitech-l mailing list
>[email protected]
>https://lists.wikimedia.org/mailman/listinfo/wikitech-l

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
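For what it's worth, the "Expecting , delimiter" exception is what Python's json module raises when a document stops partway through, which is consistent with a partially-read response (for example, the 300-second urlopen timeout firing mid-read on a large result) rather than with api.php emitting invalid JSON. A minimal standalone demonstration (Python 3; the payload here is a made-up stand-in, not real API output):

```python
import json

# A small stand-in for an API response body, then the same body cut off
# partway through, simulating a partially-read HTTP response.
payload = json.dumps({"query": {"recentchanges": [{"title": "A"}, {"title": "B"}]}})
truncated = payload[:-10]

json.loads(payload)          # the complete body parses fine
try:
    json.loads(truncated)
except ValueError as err:    # json.JSONDecodeError subclasses ValueError
    print('parse failed:', err)
```

The larger rclimit is, the bigger the response body and the longer the read takes, so a truncated read becomes more likely — which would explain why 500 works and 5000 fails.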

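The polling loop from the top of the thread can also be sketched in current Python (3.x) under a few assumptions: it pages with the API's own continuation token instead of manually advancing the date, lets urlencode do the escaping (hand-built query strings are where stray characters sneak in), and keeps rclimit at 500, the documented cap for non-bot clients. The endpoint and parameter names come from the MediaWiki Action API; build_params and fetch_changes are hypothetical helper names, and the network loop itself is an untested sketch.

```python
import json
import urllib.parse
import urllib.request

API_URL = 'https://en.wikipedia.org/w/api.php'  # HTTPS, as suggested above

def build_params(rcstart, rccontinue=None, rclimit=500):
    """Build a URL-encoded query string for list=recentchanges."""
    params = {
        'action': 'query',
        'list': 'recentchanges',
        'rclimit': rclimit,      # 500 is the documented cap for non-bot clients
        'rctype': 'edit',
        'rcnamespace': 0,
        'rcdir': 'newer',
        'format': 'json',
        'rcstart': rcstart,
    }
    if rccontinue is not None:
        params['rccontinue'] = rccontinue
    return urllib.parse.urlencode(params)

def fetch_changes(rcstart):
    """Yield edits page by page, following the API's continuation token."""
    rccontinue = None
    while True:
        url = API_URL + '?' + build_params(rcstart, rccontinue)
        with urllib.request.urlopen(url, timeout=300) as resp:
            data = json.load(resp)
        for change in data['query']['recentchanges']:
            yield change
        if 'continue' not in data:
            break                # no more pages for this query
        rccontinue = data['continue']['rccontinue']
```

Following the server-issued rccontinue token avoids both gaps and duplicates that manual date arithmetic can introduce when many edits share the same timestamp.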