https://bugzilla.wikimedia.org/show_bug.cgi?id=37536

Merlijn van Deen <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #1 from Merlijn van Deen <[email protected]> 2012-06-13 21:53:59 UTC ---
Simple test script, based on
http://nl.wikipedia.org/wiki/Lijst_van_alle_Radio_2_Top_2000's (1,189,202
bytes). These tests were run from willow.toolserver.org.

/* ----------------------

import wikipedia
import datetime

# Source page (~1.2 MB) and the user page that receives the copy
p_get = wikipedia.Page('nl', "Lijst_van_alle_Radio_2_Top_2000's")
p_put = wikipedia.Page('nl', 'Gebruiker:Valhallasw/lange pagina')
text = p_get.get()
print len(text)
# Prepend the current timestamp so each run saves different text
text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
p_put.put(text)

   ---------------------- */

Over IPv6 (the default), the output is the following:

/* --------------------
(...snip...)
>>> print len(text)
1189202
>>> text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
>>> p_put.put(text)
Sleeping for 3.8 seconds, 2012-06-13 21:50:10
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
<urlopen error timed out>
WARNING: Could not open 'http://nl.wikipedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
   -------------------- */

Over IPv4 (with the patch shown below), the output is the following:

/* --------------------
(...snip...)
>>> print len(text)
1189202
>>> text = datetime.datetime.now().isoformat() + "\n\n" + p_get.get()
>>> p_put.put(text)
Sleeping for 4.0 seconds, 2012-06-13 21:48:27
Updating page [[Gebruiker:Valhallasw/lange pagina]] via API
(302, 'OK', {u'pageid': 2846006, u'title': u'Gebruiker:Valhallasw/lange pagina', u'newtimestamp': u'2012-06-13T21:49:21Z', u'result': u'Success', u'oldrevid': 31455180, u'newrevid': 31455194})
   -------------------- */

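The hack used to force IPv4 is the following. It replaces every language's
hostname with a fixed IPv4 address (91.198.174.225) and sends an explicit
'Host: nl.wikipedia.org' header, so the request still reaches the right wiki: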

Index: families/wikipedia_family.py
===================================================================
--- families/wikipedia_family.py        (revision 10117)
+++ families/wikipedia_family.py        (working copy)
@@ -44,7 +44,7 @@
         if family.config.SSL_connection:
             self.langs = dict([(lang, None) for lang in self.languages_by_size])
         else:
-            self.langs = dict([(lang, '%s.wikipedia.org' % lang) for lang in self.languages_by_size])
+            self.langs = dict([(lang, '91.198.174.225') for lang in self.languages_by_size])

         # Override defaults
         self.namespaces[1]['ja'] = [u'ノート', u'トーク']
Index: wikipedia.py
===================================================================
--- wikipedia.py        (revision 10117)
+++ wikipedia.py        (working copy)
@@ -5437,6 +5437,7 @@
             'User-agent': useragent,
             'Content-Length': str(len(data)),
             'Content-type':contentType,
+            'Host': 'nl.wikipedia.org',
         }
         if cookies:
             headers['Cookie'] = cookies
Index: pywikibot/comms/http.py
===================================================================
--- pywikibot/comms/http.py     (revision 10117)
+++ pywikibot/comms/http.py     (working copy)
@@ -54,6 +54,7 @@

     headers = {
         'User-agent': useragent,
+        'Host': 'nl.wikipedia.org',
         #'Accept-Language': config.mylang,
         #'Accept-Charset': config.textfile_encoding,
         #'Keep-Alive': '115',
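
The patch forces IPv4 by hard-coding one address for every language and
restoring the virtual host through an explicit 'Host' header. A minimal
Python 3 sketch of the same idea, without editing the family file, might look
like this. The helpers ipv4_addresses, ipv4_connection and post_ipv4 are
hypothetical (not part of pywikibot), and the 60-second timeout is an
assumption:

```python
import http.client
import socket


def ipv4_addresses(host, port=80):
    """Return the distinct IPv4 addresses of host, skipping IPv6 results."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # for AF_INET, sockaddr is (ip, port)
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def ipv4_connection(host, port=80, timeout=60):
    """Create (but do not yet open) an HTTP connection to host's first IPv4 address."""
    ip = ipv4_addresses(host, port)[0]
    return http.client.HTTPConnection(ip, port, timeout=timeout)


def post_ipv4(host, path, body, headers, port=80, timeout=60):
    """POST body to http://host/path over IPv4, keeping the original Host header."""
    conn = ipv4_connection(host, port, timeout)
    # Without an explicit Host header, http.client would send the bare IP
    # address; this plays the role of the 'Host': 'nl.wikipedia.org' lines
    # in the patch. (Assumes the default port, so no ':port' suffix.)
    all_headers = dict(headers, Host=host)
    try:
        conn.request('POST', path, body=body, headers=all_headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

For example, post_ipv4('nl.wikipedia.org', '/w/api.php', data,
{'Content-Type': 'application/x-www-form-urlencoded'}) would send an API
request in the same spirit as the patched framework; building the API
parameters in data is left to the caller.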


Note, however, that this could also be a bug in Python's HTTP stack rather
than in the IPv6 connection itself.
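
One way to narrow this down without hard-coding addresses is to filter name
resolution so the HTTP library only ever sees IPv4 results. This is a sketch
assuming the library resolves hosts through socket.getaddrinfo (as
socket.create_connection does); ipv4_only is a hypothetical helper, not part
of pywikibot:

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only():
    """Temporarily restrict socket.getaddrinfo to IPv4 (AF_INET) results.

    Code that resolves host names through socket.getaddrinfo while the
    context is active will then connect over IPv4 only.
    """
    original = socket.getaddrinfo

    def getaddrinfo_ipv4(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask for IPv4 addresses.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = getaddrinfo_ipv4
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original
```

Wrapping the p_put.put(text) call from the test script in "with ipv4_only():"
should then repeat the IPv4 run without touching the family file, provided the
framework's HTTP layer resolves names through socket.getaddrinfo. The patch is
process-wide, so other threads resolving names at the same time are affected
too.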

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
