https://bugzilla.wikimedia.org/show_bug.cgi?id=52777
Web browser: ---
Bug ID: 52777
Summary: user_properties table bloat
Product: Wikimedia
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected], [email protected],
[email protected]
Classification: Unclassified
Mobile Platform: ---
On enwiki, the user_properties table has about 60M rows, for only 20M users.
This is incredible considering that user_properties is meant to only store
non-default options, to reduce DB space. The index length is about 2.2 GB, and
the data size is about 3.7GB.
By sampling, the number of user_properties rows per user can be estimated. The
problem is very dependent on user_id, and is mostly confined to user_id values
less than 10M, i.e. users created before mid-2009.
user_id props/user
---------------------------
0 8.2615
1000000 5.8696
2000000 4.9534
3000000 4.8038
4000000 4.6013
5000000 4.3775
6000000 4.4137
7000000 5.3833
8000000 5.8919
9000000 6.4356
10000000 0.8789
11000000 1.1052
12000000 1.0005
13000000 0.9774
14000000 1.0987
Sampling 1000 users with user_id<10M, we find that the main culprits are:
searchNs-1 : 968 users
skin: 964 users
thumbsize: 912 users
75% of the skin rows have an empty string as their value, which causes
Skin::newFromKey() to return the default skin, same as if the row was missing.
The rest are mostly "monobook", presumably manually set via the
UsabilityInitiative OptIn extension.
"searchNs-1" is a bug, it relates to searching the special namespace, which is
not possible. It is "0" in all sampled rows.
"thumbsize" is "3" in all sampled rows, which is not the default, the default
is "4" on all WMF wikis other than svwiki. In addition to bloat of the
user_properties table, this causes fragmentation of the parser cache. There's
no way 91% of users prior to 2009 manually set this value, it must have been
set by a bug.
We should remove unnecessary or incorrectly inserted rows, and ensure that this
does not happen again (e.g. as a consequence of the resolution of bug 36316).
--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l