Xqt has submitted this change and it was merged.

Change subject: Change title whitelist to title blacklist
......................................................................


Change title whitelist to title blacklist

Titles containing characters outside the BMP [1] (code points above
\uFFFF) are no longer detected as illegal. See this thread: [2]

[1] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
[2] http://thread.gmane.org/gmane.comp.python.pywikipediabot.general/13197/
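
A minimal sketch of the behaviour change, written for Python 3 (where a
string is a sequence of code points) rather than the Python 2 used in the
patch. The names OLD_WHITELIST, NEW_BLACKLIST and title are illustrative
only, and only the character-class part of illegal_titles_pattern is
reproduced here, not the additional alternatives such as the percent
encoding check:

```python
import re

# Old whitelist: anything NOT in this class was treated as illegal.
# The upper bound \uFFFF silently excludes every non-BMP code point.
OLD_WHITELIST = re.compile(
    u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')

# New blacklist: only these specific ASCII characters are illegal.
NEW_BLACKLIST = re.compile(
    r'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b\x7c\x7d\x7f]''')

# Hypothetical title containing U+10330 (GOTHIC LETTER AHSA), outside the BMP.
title = 'Gothic \U00010330'

# The old pattern flags the non-BMP character as illegal ...
print(OLD_WHITELIST.search(title) is not None)
# ... while the new pattern finds nothing illegal in the same title.
print(NEW_BLACKLIST.search(title) is None)
# Characters such as '[' and ']' are still rejected by the new pattern.
print(NEW_BLACKLIST.search('Foo[bar]') is not None)
```

Under these assumptions all three checks should print True.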

The list of illegal characters was generated by running the old regular
expression over every code point below 0x80:

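import re
# Old whitelist (Python 2): any character NOT in this class is illegal.
m = re.compile(u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')
# Print the hex code of every ASCII character the old pattern rejects.
for x in range(0, 0x80):
    if m.match(unichr(x)):
        print "%x" % x,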

0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 
23 3c 3e 5b 5d 7b 7c 7d 7f
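
As a cross-check, the enumeration can be repeated for both patterns and
the results compared. This is a Python 3 sketch, not part of the patch;
the helper rejected_ascii and the variable names are hypothetical:

```python
import re

# Character classes copied from the diff (old whitelist, new blacklist).
old = re.compile(
    u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]''')
new = re.compile(r'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b\x7c\x7d\x7f]''')


def rejected_ascii(pattern):
    """Return the code points below 0x80 that the pattern matches."""
    return [x for x in range(0x80) if pattern.match(chr(x))]


old_illegal = rejected_ascii(old)
new_illegal = rejected_ascii(new)

# Within ASCII the two patterns should reject exactly the same characters.
assert old_illegal == new_illegal

# Same hex listing format as in the commit message.
print(' '.join('%x' % x for x in new_illegal))
```

If the assertion holds, the two patterns differ only above 0x7f, where
the old pattern rejected everything beyond \uFFFF and the new one rejects
nothing.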

Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
---
M pywikibot/page.py
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/pywikibot/page.py b/pywikibot/page.py
index e51977c..58debb7 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -2853,8 +2853,8 @@
 
     """
     illegal_titles_pattern = re.compile(
-        # Matching titles will be held as illegal.
-            u'''[^ %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\u0080-\uFFFF+]'''
+            # Matching titles will be held as illegal.
+            ur'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b\x7c\x7d\x7f]'''
             # URL percent encoding sequences interfere with the ability
             # to round-trip titles -- you can't link to them consistently.
             u'|%[0-9A-Fa-f]{2}'

-- 
To view, visit https://gerrit.wikimedia.org/r/78525
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Merlijn van Deen <[email protected]>
Gerrit-Reviewer: Ladsgroup <[email protected]>
Gerrit-Reviewer: Legoktm <[email protected]>
Gerrit-Reviewer: Merlijn van Deen <[email protected]>
Gerrit-Reviewer: Xqt <[email protected]>
Gerrit-Reviewer: jenkins-bot

_______________________________________________
Pywikibot-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikibot-commits
