[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Andre Klapper aklap...@wikimedia.org changed:

    What     | Removed       | Added
    Priority | Unprioritized | Low
    Version  | unspecified   | 1.23-git

--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Niklas Laxström niklas.laxst...@gmail.com changed:

    What | Removed | Added
    CC   |         | amir.ahar...@mail.huji.ac.il

--- Comment #6 from Niklas Laxström niklas.laxst...@gmail.com ---

For a definite answer you need to ask Tim, who wrote the code. My guess is that it just followed the pattern of BagOStuff, which serializes values.

I will undermine your use case a bit though:
1) You should not be accessing a cache directly; it may not even exist.
2) WMF is using CDB files, no DB, for the l10n cache.
3) Why don't you spin up MediaWiki and ask it to provide you the information you need? No network is needed for that.

The language engineering team does not have the knowledge or capacity to address all i18n/l10n-related issues [1], but we will do our best to help you and make you depend less on us.

[1] We spend the largest portion of our time on feature development.
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #7 from Nemo federicol...@tiscali.it ---

By the way, Toolserver has a special table for namespace names, and I'm told Labs recently got such a feature too.
https://wiki.toolserver.org/view/Toolserver_database
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #8 from Oliver Keyes oke...@wikimedia.org ---

(In reply to Niklas Laxström from comment #6)
> For definite answer you need to ask Tim who wrote the code. My guess is that
> it just followed the pattern of BagOStuff which serializes values.
>
> I will undermine your use case a bit though:
> 1) You should not be accessing a cache directly, it may not even exist.

Sure; if you can point to a better way to automatically get to this data on a machine with no connection to the internet, I'm happy to hear it.

> 2) WMF is using CDB files, no DB for l10n cache.

Then why is the table there? It's a MediaWiki feature we don't use?

> 3) Why don't you spin up MediaWiki and ask it to provide you the information
> you need. No network needed for that.

Sure; that's not an easily replicable solution, though. To use that scenario: every time the table is updated, I'd need to spin up a local MediaWiki instance, query it, retrieve the data, save that to a file, manually transfer the file to [stat1001/stat1002/analytics*/delete-as-applicable]...and so would anyone else looking at doing things with granularity at the namespace level. That kind of approach would be more easily done by just querying the APIs in a loop, retrieving the data as JSON files, and transferring that, the problem being the 'automation' bit.

> The language engineering team does not have knowledge of or capacity to
> address all i18n/l10n related issues [1], but we will do our best to help
> you and make you depend less on us.

Thanks :).
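Editorial note: the "query the APIs, save JSON, transfer the file" route mentioned in comment #8 could look roughly like the sketch below. It is only an illustration under stated assumptions: the response is assumed to be in the API's default JSON layout (names under the "*" key), the endpoint URL and file path are placeholders, and the helper names siteinfo_url, namespace_ids and load_saved are hypothetical. The fetch itself would have to happen on a machine with network access; only the parsing runs on the offline analytics host.

```python
import json
from urllib.parse import urlencode


def siteinfo_url(api_endpoint):
    """Build the siteinfo query URL for one wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces|namespacealiases",
        "format": "json",
    }
    return api_endpoint + "?" + urlencode(params)


def namespace_ids(payload):
    """Map every namespace name, canonical name and alias to its numeric id.

    Assumes the default JSON layout, where the local name sits under "*".
    The main namespace has an empty name and is skipped.
    """
    query = payload["query"]
    names = {}
    for entry in query.get("namespaces", {}).values():
        for key in ("*", "canonical"):
            label = entry.get(key)
            if label:
                names[label] = int(entry["id"])
    for entry in query.get("namespacealiases", []):
        label = entry.get("*")
        if label:
            names[label] = int(entry["id"])
    return names


def load_saved(path):
    """Read a previously saved API response (hypothetical file) from disk."""
    with open(path, encoding="utf-8") as handle:
        return namespace_ids(json.load(handle))
```

The remaining problem is the one raised in the comment: something still has to run the fetch and the transfer on a schedule for every wiki.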
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #9 from Oliver Keyes oke...@wikimedia.org ---

(In reply to Nemo from comment #7)
> By the way, Toolserver has a special table for namespace names and I'm told
> Labs recently got such feature too.
> https://wiki.toolserver.org/view/Toolserver_database

Huh; interesting. Probably not directly applicable, because to my knowledge there's no direct connection between the analytics machines and the Toolserver DBs, but the way they extracted that could be useful.
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #10 from Nemo federicol...@tiscali.it ---

(In reply to Oliver Keyes from comment #9)
> huh; interesting. Probably not directly applicable, because there's no
> direct connection betwixt the analytics machines and the toolserver dbs to
> my knowledge, but the way they extracted that could be useful.

Or you could start using the infrastructure everyone uses instead of reinventing the wheel.
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #11 from Oliver Keyes oke...@wikimedia.org ---

(In reply to Nemo from comment #10)
> (In reply to Oliver Keyes from comment #9)
> > huh; interesting. Probably not directly applicable, because there's no
> > direct connection betwixt the analytics machines and the toolserver dbs to
> > my knowledge, but the way they extracted that could be useful.
>
> Or you could start using the infrastructure everyone uses instead of
> reinventing the wheel.

You mean, Tool Labs? Sure, I'll just poke Legal and see how they feel about moving our request logs to Labs. Oh, wait ;p.
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #12 from Niklas Laxström niklas.laxst...@gmail.com ---

(In reply to Oliver Keyes from comment #8)
> > 2) WMF is using CDB files, no DB for l10n cache.
>
> Then why is the table there? It's a MediaWiki feature we don't use?

Database tables are created unconditionally. The store can be changed with wgGlobals configuration, and WMF uses CDB files distributed to the application servers for speed.

Can you tell a bit more about where you need the namespaces? Are you parsing the wikitext? Doesn't, for example, the pagelinks table have the namespace resolved to the numerical id already?
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #13 from Oliver Keyes oke...@wikimedia.org ---

(In reply to Niklas Laxström from comment #12)
> Can you tell a bit more where do you need the namespaces? Are you parsing
> the wikitext? Doesn't for example pagelinks table have the namespace
> resolved to the numerical id already?

Nope, the RequestLogs - which are unfortunately totally detached from MediaWiki proper.
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #14 from Niklas Laxström niklas.laxst...@gmail.com ---

Then I currently don't see any solution other than the 3) I mentioned. If you don't want to install MediaWiki on the analytics servers, the second-best thing would be to automate it with some kind of script, keeping the constraints in mind.

After getting the namespaces, you should be prepared to mirror some of the transformations MW handles: spaces/plusses/underscores, URL encoding, charset encoding, case insensitivity, Unicode normalizations, redirect pages... Some of these cause an actual redirect, some of them don't.
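Editorial note: a few of the transformations listed in comment #14 can be sketched as follows. This is a partial illustration, not MediaWiki's actual title handling: it covers percent-decoding, plus signs in query strings, underscores, Unicode NFC normalization and case-insensitive namespace prefixes, and it ignores charset fallbacks, first-letter capitalisation and redirects. The function name split_title and the shape of the namespaces mapping (name to numeric id) are assumptions made for the example.

```python
import unicodedata
from urllib.parse import unquote, unquote_plus


def split_title(raw, namespaces, from_query_string=False):
    """Return (namespace_id, title) for a title taken from a request log.

    raw               -- the title as it appears in the logged URL
    namespaces        -- mapping of namespace name or alias to numeric id
    from_query_string -- True if the title came from ?title=..., where "+"
                         stands for a space; in a path it is a literal plus
    """
    decoded = unquote_plus(raw) if from_query_string else unquote(raw)
    # Underscores and spaces are interchangeable; NFC is assumed here as the
    # normalization form to mirror.
    text = unicodedata.normalize("NFC", decoded).replace("_", " ").strip()

    # Namespace prefixes are matched case-insensitively.
    lookup = {
        name.replace("_", " ").casefold(): ns_id
        for name, ns_id in namespaces.items()
    }
    prefix, sep, rest = text.partition(":")
    if sep:
        ns_id = lookup.get(prefix.strip().casefold())
        if ns_id is not None:
            return ns_id, rest.strip()
    # No known prefix: treat the whole string as a main-namespace title.
    return 0, text
```

For example, split_title("talk%3AFoo_bar", {"Talk": 1}) resolves to namespace 1 with the title "Foo bar", while a colon that does not follow a known namespace name leaves the title in the main namespace.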
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Nemo federicol...@tiscali.it changed:

    What      | Removed              | Added
    Component | Internationalization | Database
    Severity  | normal               | enhancement

--- Comment #4 from Nemo federicol...@tiscali.it ---

But we still lack a use case. One server in the world with DB but without API isn't really convincing. The kind of information you need (like content namespace or not) also has nothing to do with l10n, so there is no reason to believe it would last forever in there, if it's even available.

On the why, that's the format used in other tables too, for instance log_params, so your question should be rephrased as: document why and where the DB tables contain serialised data.
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #5 from Oliver Keyes oke...@wikimedia.org ---

(In reply to Nemo from comment #4)
> But we still lack a use case. One server in the world with DB but without
> API isn't really convincing.

It's more than one server; it's going to be all our analytics machines, and that's the crucial element. If the scenario was reversed - some strange universe in which we had one production machine and the rest were analytics-based - the argument wouldn't hold any water. Part of this, sure, is that production is 90% of our use cases, but that's only part of it. The other is simply that research and analytics /are/ a use case, and one of increasing importance. We now have 6 researchers, 4 analytics engineers, a director-level position and a Product Manager; arguing that this is not something worth addressing, either by justifying it or solving it, simply because those people can get their work done on only a few machines, ignores the tremendous resources being thrown behind analytics.

A use case was provided with the original post ;p.

> The kind of information you need (like content
> namespace or not) also has nothing to do with l10n, so there is no reason to
> believe it would last forever in there if it's even available.

It's absolutely available; please do me the courtesy of assuming I did basic research and attempted to solve the problem through other means before submitting the bug. If you want to check yourself, look for namespaceNames and namespaceAliases in lc_key.

Sure, namespaceNames are not /directly/ a localisation problem - they're wiki-based as well as language-based - but they are, most of the time, language-based. More importantly, whether they do or do not last forever (which seems a very strange test. /nothing/ we do lasts forever. well, other than the projects. That's kind of why we're here), they're currently stored there.

The solution to the problem is for the language engineering team to either (a) explain why serialised PHP is the best plausible way to store this data or (b) change it. I'm not asking for a solution proof against any possible permutation of future events, because that's impossible, but the fact of the matter is that this table is the source of the issue as it stands.

> On the why, that's the format used in other tables too, for instance
> log_params, so your question should be rephrased as: document why and where
> the DB tables contain serialised data.

Sure, if documentation was all I was asking for ;p. log_params is less crucial; to my knowledge the only thing we tend to use it for is retrieving the patrol status of a page (that is, whether it was patrolled automatically or manually), and that's something you can extract with a minimal amount of effort because it's not a complex piece of data - it's just the 1 or 0 closest to the end of the string.
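Editorial note: to make the complaint in comment #5 concrete, reading a serialised PHP value outside PHP means reimplementing the format of PHP's serialize(). The sketch below is a minimal reader for the subset that arrays of names would plausibly need (null, booleans, integers, floats, strings and nested arrays); it does not handle serialised objects or references, and the sample value in the usage note is made up for illustration rather than copied from a real lc_value row. The function names are hypothetical.

```python
def php_unserialize(data):
    """Decode a PHP-serialised value given as bytes (partial support only)."""
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after serialised value")
    return value


def _parse(data, pos):
    tag = data[pos:pos + 1]
    if tag == b"N":                      # N;
        return None, pos + 2
    if tag in (b"i", b"b", b"d"):        # i:42;  b:1;  d:0.5;
        end = data.index(b";", pos)
        raw = data[pos + 2:end]
        if tag == b"i":
            return int(raw), end + 1
        if tag == b"b":
            return raw == b"1", end + 1
        return float(raw), end + 1
    if tag == b"s":                      # s:<byte length>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2                # skip the colon and opening quote
        text = data[start:start + length].decode("utf-8")
        return text, start + length + 2  # skip the closing quote and ";"
    if tag == b"a":                      # a:<count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                  # skip the colon and "{"
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            result[key], pos = _parse(data, pos)
        return result, pos + 1           # skip "}"
    raise ValueError("unsupported type tag %r at offset %d" % (tag, pos))
```

Given the made-up value b'a:2:{i:0;s:0:"";i:1;s:4:"Talk";}', php_unserialize returns a dict mapping 0 to the empty string and 1 to "Talk". Note that string lengths count bytes, not characters, which is why the input has to stay as bytes until each string is decoded.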
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Niklas Laxström niklas.laxst...@gmail.com changed:

    What           | Removed | Added
    Status         | NEW     | UNCONFIRMED
    Ever confirmed | 1       | 0

--- Comment #1 from Niklas Laxström niklas.laxst...@gmail.com ---

We are not using serialized files for localisation cache. The cache is either in CDB format or in the SQL database. Are you saying that the strings in the database are serialized?
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #2 from Oliver Keyes oke...@wikimedia.org ---

(In reply to Niklas Laxström from comment #1)
> Are you saying that the strings in the database are serialized?

Indeed (or, at least, the arrays appear to be).
[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Oliver Keyes oke...@wikimedia.org changed:

    What           | Removed     | Added
    Status         | UNCONFIRMED | NEW
    Ever confirmed | 0           | 1

--- Comment #3 from Oliver Keyes oke...@wikimedia.org ---

https://www.mediawiki.org/wiki/Manual:L10n_cache_table would seem to count as confirmed.