Lucas_Werkmeister_WMDE added a comment.
So change Iaaac950b48 implements caching on the most granular level we have, for the “format constraint” check: does arbitrary text match arbitrary regex? This is a very common constraint type: practically all “external identifier” properties have one or more regexes to ensure that the identifier matches a certain format, many “Commons link” properties have regexes to assert that the link has a particular file name extension, and several “URL” properties check the protocol with a regex (e.g. only https?://.+). It is also, according to our statsd tracking, one of the most expensive constraint types. And as long as the regex flavor is stable, the result is valid indefinitely: it depends only on the text and the regex.
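To make the idea concrete, here is a minimal sketch of such a cache in Python. It is not the code from the change: the names `cache_key` and `FormatCheckCache` are hypothetical, a plain dict stands in for the object cache, and Python's `re` module stands in for whatever regex flavor the real check uses. The only point it illustrates is that the key is derived from the (text, regex) pair alone, so an entry never needs to be invalidated while the regex flavor stays the same.

```python
import hashlib
import re


def cache_key(text: str, pattern: str) -> str:
    """Build a key that depends only on the text and the regex (hypothetical scheme)."""
    # Prefix the pattern with its length so that ("ab", "c") and ("a", "bc")
    # cannot collapse into the same payload.
    payload = f"{len(pattern)}:{pattern}{text}".encode("utf-8")
    return "format-check:" + hashlib.sha256(payload).hexdigest()


class FormatCheckCache:
    """In-memory stand-in for an object cache of boolean match results."""

    def __init__(self) -> None:
        self._store: dict[str, bool] = {}
        self.misses = 0  # how often the regex actually had to be evaluated

    def matches(self, text: str, pattern: str) -> bool:
        key = cache_key(text, pattern)
        if key not in self._store:
            self.misses += 1
            # The whole text must match the format, hence fullmatch.
            self._store[key] = re.fullmatch(pattern, text) is not None
        return self._store[key]


cache = FormatCheckCache()
first = cache.matches("https://example.org/", r"https?://.+")   # evaluated
second = cache.matches("https://example.org/", r"https?://.+")  # served from cache
```

Each cached value is a single boolean, so the cost per entry is dominated by the key, which is why hashing the pair to a fixed-length key seems attractive here.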
The question is whether it makes sense to cache this information, given the scale of Wikimedia’s object cache. I wrote a small script (P5921) to check how many format constraints each property has (it can have several), and how many statements, which should tell us how many combinations of text and regex there are to be cached (approximately). The script fails for one property (too many statements to count in WDQS), so I got the count for that property via another tool (see commit message for details). Overall, I estimate there are about one hundred million text-regex combinations for which we could cache the boolean result.
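The arithmetic behind that estimate is roughly the following sketch. This is not the P5921 script itself: the function name and the sample figures are invented purely for illustration, and the real per-property counts come from the script and the other tool mentioned above.

```python
def estimate_combinations(properties):
    """Sum statements x format constraints over all properties.

    This is an upper bound: statements that share the same text also
    share a cache entry for a given regex.
    """
    return sum(p["statements"] * p["format_constraints"] for p in properties)


# Invented sample data, for illustration only (not real Wikidata counts).
sample = [
    {"property": "P-hypothetical-1", "statements": 1000, "format_constraints": 2},
    {"property": "P-hypothetical-2", "statements": 500, "format_constraints": 1},
]
total = estimate_combinations(sample)  # 1000*2 + 500*1 = 2500
```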
I assume that most of these combinations will never be hit (currently, only 193 users have the checkConstraints gadget enabled, and many items with “external identifier” statements will probably never be visited by any user with the gadget enabled), but still, that’s a huge number, and I have no idea if it’s sensible to cache that or not. Can someone from #performance-team comment on this?
Cc: gerritbot, Ladsgroup, daniel, Aklapper, Jonas, Lucas_Werkmeister_WMDE, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, Lewizho99, Maathavan, Agabi10, Izno, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
