Guys, 

Thanks so much for your prompt feedback.
Basically, what I am doing is repeatedly sending the request, advancing the
date and time each iteration, until we reach the next day.
Specifically, what I have is something like:

import json
import urllib2

api_url = 'http://en.wikipedia.org/w/api.php'
date = '20160504022715'

while True:
    api_params = ('action=query&list=recentchanges&rclimit=5000&rctype=edit'
                  '&rcnamespace=0&rcdir=newer&format=json'
                  '&rcstart={date}'.format(date=date))
    f = urllib2.Request(api_url, api_params)
    source = urllib2.urlopen(f, None, 300).read()
    source = json.loads(source)
    # ... increase date here ...
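For the "increase date" step, one possible sketch (a hypothetical helper, not from the original code) is to advance `date` past the newest timestamp in the previous response. Note that reusing that exact timestamp with rcdir=newer would re-fetch the last edit, so real code would add one second or deduplicate:

```python
import json

# Hypothetical helper: derive the next rcstart value from the previous
# response by taking the newest rc timestamp it contains.
def next_rcstart(response_text):
    data = json.loads(response_text)
    changes = data['query']['recentchanges']
    # API timestamps look like '2016-05-04T02:30:00Z'; strip the
    # punctuation to get the compact rcstart form.
    newest = max(rc['timestamp'] for rc in changes)
    return (newest.replace('-', '').replace(':', '')
                  .replace('T', '').replace('Z', ''))

# Stubbed response (illustrative values only).
sample = ('{"query": {"recentchanges": ['
          '{"timestamp": "2016-05-04T02:27:15Z"},'
          '{"timestamp": "2016-05-04T02:30:00Z"}]}}')
print(next_rcstart(sample))  # 20160504023000
```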

Given the above code, I am encountering a weird situation. If I set
rclimit to 500 in the query, it runs normally. However, if I set rclimit
to 5000, as in my previous email, I see the error. I know that for
recentchanges rclimit is supposed to be capped at 500, but is there
anything particular about the value of rclimit that could cause the JSON
to break?
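For what it's worth, the supported way to fetch more than 500 changes is continuation rather than a larger rclimit: keep rclimit at 500 and feed the rccontinue value from each response into the next request. A minimal sketch (the sample continuation value below is illustrative):

```python
import json

# Build the recentchanges query string, optionally carrying the
# rccontinue token from the previous response.
def build_params(date, rccontinue=None):
    params = ('action=query&list=recentchanges&rclimit=500&rctype=edit'
              '&rcnamespace=0&rcdir=newer&format=json&rcstart=' + date)
    if rccontinue:
        params += '&rccontinue=' + rccontinue
    return params

# Stubbed response showing the shape of the continuation field.
sample = ('{"continue": {"rccontinue": "20160504022715|12345"},'
          ' "query": {"recentchanges": []}}')
cont = json.loads(sample).get('continue', {}).get('rccontinue')
print(build_params('20160504022715', cont))
```

A response with no "continue" key means the listing is exhausted, which is the natural point to stop the loop.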

On 5/5/16, 11:16 PM, "Wikitech-l on behalf of MZMcBride"
<[email protected] on behalf of [email protected]>
wrote:

>Trung Dinh wrote:
>>Hi all,
>>I have an issue why trying to parse data fetched from wikipedia api.
>>This is the piece of code that I am using:
>>api_url = 'http://en.wikipedia.org/w/api.php'
>>api_params = 'action=query&list=recentchanges&rclimit=5000&rctype=edit&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715'
>>
>>f = urllib2.Request(api_url, api_params)
>>print ('requesting ' + api_url + '?' + api_params)
>>source = urllib2.urlopen(f, None, 300).read()
>>source = json.loads(source)
>>
>>json.loads(source) raised the following exception: "Expecting ,
>>delimiter: line 1 column 817105 (char 817104)"
>>
>>I tried to use source.encode('utf-8') and some other encodings, but
>>none of them helped.
>>Do we have any workaround for that issue ? Thanks :)
>
>Hi.
>
>Weird, I can't reproduce this error. I had to import the "json" and
>"urllib2" modules, but after doing so, executing the code you provided
>here worked fine for me: <https://phabricator.wikimedia.org/P3009>.
>
>You probably want to use 'https://en.wikipedia.org/w/api.php' as your
>end-point (HTTPS, not HTTP).
>
>As far as I know, JSON is always encoded as UTF-8, so you shouldn't need
>to encode or decode the data explicitly.
>
>The error you're getting generally means that the JSON was malformed for
>some reason. It seems unlikely that MediaWiki's api.php is outputting
>invalid JSON, but I suppose it's possible.
>
>Since you're coding in Python, you may be interested in a framework such
>as <https://github.com/alexz-enwp/wikitools>.
>
>MZMcBride
>
>
>
>_______________________________________________
>Wikitech-l mailing list
>[email protected]
>https://lists.wikimedia.org/mailman/listinfo/wikitech-l

