https://bugzilla.wikimedia.org/show_bug.cgi?id=47368
--- Comment #6 from Ori Livneh <[email protected]> ---
(In reply to comment #0)
> Inconsistency is causing issues with scripts which now have to treat the log
> database specially.

This isn't EventLogging exceptionalism, you know: URIs are UTF-8, MediaWiki speaks UTF-8, the Python source is UTF-8, the locale of the system it's running on is UTF-8, the ZeroMQ stream is UTF-8, MongoDB is UTF-8, and it is the most common encoding on Android and iOS. Binary character encoding is a bit of a misnomer, since it's not actually a character encoding but the absence of one.

When all of your interactions with the database are mediated by MediaWiki, then it doesn't matter much how the database is configured as long as MediaWiki knows how to work with it. But EventLogging data has many consumers, working with data in SQL, CSV and XML format, in Python and R and direct database client GUIs and who knows what else. Leaving the character encoding unspecified means leaving it up to application code, which in this case is scattered across a number of different codebases. This opens up a vast gulf of indeterminacy and a large potential for encoding issues.

Getting all these different pieces, written in different languages and running on different systems, to speak with one another was tricky as hell, but it's working now, and we'd need to have a really compelling reason for changing it.

I think this is a WONTFIX, but leaving it open for now in case Yuvi wants to make a thorough case for a migration.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
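
As an illustration of the point in the comment above about leaving the character encoding up to application code, here is a minimal Python sketch. It is not taken from EventLogging: the sample string and variable names are hypothetical. It only shows that the same stored bytes read differently depending on which encoding a consumer assumes.

```python
# Hypothetical sample value; not real EventLogging data.
original = "café"

# A column with no declared character encoding holds raw bytes only.
stored = original.encode("utf-8")

# Each consumer has to guess how to turn those bytes back into text.
as_utf8 = stored.decode("utf-8")      # consumer that assumes UTF-8
as_latin1 = stored.decode("latin-1")  # consumer that assumes Latin-1

print(as_utf8 == original)    # True: the round trip is lossless
print(as_latin1 == original)  # False: the accented letter is mangled
```

Both decodes succeed without raising an error, so a wrong guess of this kind can go unnoticed until the text is displayed or compared.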
