Smalyshev added a comment.
I don't think many other regex libraries have been verified as safe for arbitrary user input under bounded time and memory.
PCRE is probably not one of them, especially PCRE+PHP; I've seen a bunch of issues there. I also think full PCRE is way overpowered for what we need here, and consequently the attack surface is much wider. It is also not hard to push PCRE into pathological time and memory behavior, given control over both the regexp and the string. Java regexps suffer from the same problems, but I've recently added a mitigation for that to WDQS.
If we run PCRE on user-supplied regexps, we should either sanitize them (no idea how) or run them in a very tightly sequestered environment: running PHP+PCRE on such input essentially means assuming we are running hostile code. That seems rather dangerous, so we should be very careful here.
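To make the "sequestered environment" idea concrete, here is a rough sketch in Python (not PHP+PCRE, and not what WDQS does). It assumes the match can be pushed into a throwaway child process with a wall-clock timeout and, on Unix, an address-space cap. The name `safe_search`, the 512 MB cap, and the one-second default are illustrative assumptions only.

```python
import json
import subprocess
import sys

# Child program: reads {"pattern": ..., "text": ...} as JSON on stdin and
# prints a JSON boolean. It runs in its own interpreter so it can be killed.
_CHILD = r"""
import json, re, sys
try:
    import resource  # Unix only; caps address space as a rough memory limit
    limit = 512 * 1024 * 1024  # illustrative value, not a recommendation
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
except (ImportError, ValueError, OSError):
    pass  # no memory cap available on this platform
req = json.load(sys.stdin)
print(json.dumps(re.search(req["pattern"], req["text"]) is not None))
"""


def safe_search(pattern, text, timeout=1.0):
    """Return True/False for a match, or None if the child timed out or failed."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=json.dumps({"pattern": pattern, "text": text}),
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:  # invalid regexp, memory limit hit, etc.
        return None
    return json.loads(proc.stdout)
```

A pattern such as `(a+)+$` against a long run of `a` followed by a non-matching character backtracks exponentially in a backtracking engine, so the call above returns `None` instead of hanging the caller. This only bounds time and memory; it does nothing about bugs in the regex engine itself, which would need real process-level sandboxing.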
Cc: Smalyshev, tstarling, daniel, GWicke, Joe, Lucas_Werkmeister_WMDE, Krinkle, Aklapper, GoranSMilovanovic, QZanden, Agabi10, Izno, SBisson, Wikidata-bugs, aude, jayvdb, fbstj, RobLa-WMF, santhosh, Jdforrester-WMF, Mbch331, Rxy, Jay8g, Ltrlg, bd808, Legoktm
