jayvdb created this task.
jayvdb added a subscriber: jayvdb.
jayvdb added projects: pywikibot-core, Pywikibot-interwiki.py.
Herald added subscribers: pywikibot-bugs-list, Aklapper.
TASK DESCRIPTION
I've always considered `Page.isEmpty` to be a very strange method, with its
definition of empty being unlikely to be useful elsewhere. Why 4? My guess is
that 4 characters is too small to be useful, e.g. {{a}} and [[a]] are 5 chars.
If `isEmpty` is unintentionally used on a category page, which on small wikis
is usually 'empty' except for category links, the category page will be
considered to be 'empty'. This is why interwiki.py also needs to check
`isCategory`.
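To make the category-page problem concrete, here is a rough sketch of the behaviour described above. It is not the pywikibot implementation: `is_empty_sketch` and both regexes are hypothetical stand-ins for the library's link-stripping helpers, they only recognise the English `Category:` prefix and two- or three-letter language codes, and the "fewer than 4 characters" comparison is my reading of the threshold.

```python
import re

# Hypothetical, simplified patterns; a real wiki has localized namespace
# names and a much longer list of language codes.
CATEGORY_LINK = re.compile(r'\[\[\s*Category\s*:[^\]]*\]\]', re.IGNORECASE)
INTERWIKI_LINK = re.compile(r'\[\[\s*[a-z]{2,3}\s*:[^\]]*\]\]')


def is_empty_sketch(text):
    """Return True if fewer than 4 characters remain after stripping links."""
    text = CATEGORY_LINK.sub('', text)
    text = INTERWIKI_LINK.sub('', text)
    return len(text.strip()) < 4


# A category page holding only a parent category and an interwiki link
print(is_empty_sketch('[[Category:Parent]]\n[[de:Kategorie:Beispiel]]'))
# A page with a single short template keeps 5 characters
print(is_empty_sketch('{{a}}'))
```

The first call reports the category page as empty even though it is a perfectly normal category page, which is the case the extra `isCategory` check guards against.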
My gut feeling is that this method should be copied to be a function in
interwiki.py, where you can optimise it how you want, so that it skips useless
pages using faster algorithms. For example, if the page is in a content
namespace, rather than looking at interwiki links and category links, I am
guessing that the following will be much faster and almost as good:
```
def simpleEmptyCheck(page):
    try:
        # index 50 is the 51st character; it exists only on longer pages
        page.text[50]
        return False
    except IndexError:
        return True
```
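The same check can also be written as a plain length comparison, which makes the threshold explicit. This is only a sketch: `simple_empty_check` and its `min_length` parameter are hypothetical names, and the `SimpleNamespace` objects stand in for real page objects, assuming nothing more than a `text` attribute.

```python
from types import SimpleNamespace


def simple_empty_check(page, min_length=51):
    """Return True if page.text has fewer than min_length characters.

    The default of 51 matches indexing page.text[50], which succeeds
    only when at least 51 characters are present.
    """
    return len(page.text) < min_length


# Stand-in objects exposing only a `text` attribute
stub = SimpleNamespace(text='{{stub}}')
article = SimpleNamespace(text='A' * 51)

print(simple_empty_check(stub))     # True
print(simple_empty_check(article))  # False
```

Both forms still need the page text to be loaded; the saving comes from skipping the interwiki and category link parsing.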
Then Page.isEmpty can be marked as `@deprecated`.
The use in page_tests would then be moved into a new test method in
TestPageDeprecation.
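As a rough illustration of the deprecation and its test, the sketch below uses only the standard `warnings` module in place of the project's own decorator. The `Page` class here is a hypothetical stand-in, its `isEmpty` omits the link stripping, and `check_is_empty_warns` is an invented helper showing what a deprecation test would assert.

```python
import warnings


class Page:
    """Hypothetical stand-in for the real page class."""

    def __init__(self, text):
        self.text = text

    def isEmpty(self):
        """Deprecated: emit a warning, then apply a bare length test."""
        warnings.warn('Page.isEmpty is deprecated', DeprecationWarning,
                      stacklevel=2)
        return len(self.text) < 4


def check_is_empty_warns():
    """Return True if calling isEmpty emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones normally shown only once
        warnings.simplefilter('always')
        Page('').isEmpty()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(check_is_empty_warns())
```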
TASK DETAIL
https://phabricator.wikimedia.org/T112340
To: jayvdb
Cc: Aklapper, jayvdb, pywikibot-bugs-list