[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-03-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Andre Klapper aklap...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Unprioritized               |Low
            Version|unspecified                 |1.23-git

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Niklas Laxström niklas.laxst...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amir.ahar...@mail.huji.ac.il

--- Comment #6 from Niklas Laxström niklas.laxst...@gmail.com ---
For a definite answer you need to ask Tim, who wrote the code. My guess is that
it just followed the pattern of BagOStuff, which serializes values.

I will undermine your use case a bit though:
1) You should not be accessing a cache directly, it may not even exist.
2) WMF is using CDB files, no DB for l10n cache.
3) Why don't you spin up MediaWiki and ask it to provide you the information
you need? No network is needed for that.

The language engineering team does not have knowledge of or capacity to address
all i18n/l10n related issues [1], but we will do our best to help you and make
you depend less on us.

[1] We spend the largest portion of our time on feature development.



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #7 from Nemo federicol...@tiscali.it ---
By the way, Toolserver has a special table for namespace names and I'm told
Labs recently got such a feature too.
https://wiki.toolserver.org/view/Toolserver_database



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #8 from Oliver Keyes oke...@wikimedia.org ---
(In reply to Niklas Laxström from comment #6)
> For a definite answer you need to ask Tim, who wrote the code. My guess is
> that it just followed the pattern of BagOStuff, which serializes values.
>
> I will undermine your use case a bit though:
> 1) You should not be accessing a cache directly, it may not even exist.

Sure; if you can point to a better way to automatically get to this data on a
machine with no connection to the internet, I'm happy to hear it.

> 2) WMF is using CDB files, no DB for l10n cache.

Then why is the table there? It's a MediaWiki feature we don't use?

> 3) Why don't you spin up MediaWiki and ask it to provide you the information
> you need? No network is needed for that.

Sure; that's not an easily replicable solution, though. To use that scenario;
every time the table is updated, I'd need to spin up a local MediaWiki
instance, query it, retrieve the data, save that to a file, manually transfer
the file to [stat1001/stat1002/analytics*/delete-as-applicable]...and so
would anyone else looking at doing things with granularity at the namespace
level. That kind of approach would be more easily done by just querying the
APIs in a loop, retrieving the data as JSON files, and transferring that; the
problem being the 'automation' bit.

> The language engineering team does not have knowledge of or capacity to
> address all i18n/l10n related issues [1], but we will do our best to help
> you and make you depend less on us.

Thanks :).
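
For readers following the "query the APIs in a loop, retrieve JSON" route mentioned above, here is a minimal sketch. It assumes the standard `action=query&meta=siteinfo` module with `siprop=namespaces|namespacealiases` and the classic JSON layout, where the local name sits under the `*` key. The function names, the `SAMPLE` payload and the example host are made up for illustration and do not come from this thread. Fetching and transferring the file is left out, since the analytics machines reportedly have no network access.

```python
import json
from urllib.parse import urlencode


def siteinfo_url(api_base: str) -> str:
    """Build the siteinfo request URL for one wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces|namespacealiases",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def namespace_map(payload: dict) -> dict:
    """Map every local name, canonical name and alias to its namespace id."""
    query = payload["query"]
    names = {}
    for ns in query["namespaces"].values():
        names[ns["*"]] = ns["id"]              # localised name
        if "canonical" in ns:                  # the main namespace has none
            names[ns["canonical"]] = ns["id"]
    for alias in query.get("namespacealiases", []):
        names[alias["*"]] = alias["id"]
    return names


# Illustrative payload only; a real response carries many more fields.
SAMPLE = """
{"query": {
  "namespaces": {
    "0": {"id": 0, "case": "first-letter", "*": "", "content": ""},
    "1": {"id": 1, "case": "first-letter", "*": "Diskussion", "canonical": "Talk"},
    "4": {"id": 4, "case": "first-letter", "*": "Wikipedia", "canonical": "Project"}
  },
  "namespacealiases": [{"id": 4, "*": "WP"}]
}}
"""
```

A saved response can then be loaded on the offline machine with `namespace_map(json.loads(text))`. Looping `siteinfo_url` over a list of wikis and writing each response to disk would cover the "in a loop" part.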



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #9 from Oliver Keyes oke...@wikimedia.org ---
(In reply to Nemo from comment #7)
> By the way, Toolserver has a special table for namespace names and I'm told
> Labs recently got such a feature too.
> https://wiki.toolserver.org/view/Toolserver_database

huh; interesting. Probably not directly applicable, because there's no direct
connection betwixt the analytics machines and the toolserver dbs to my
knowledge, but the way they extracted that could be useful.



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #10 from Nemo federicol...@tiscali.it ---
(In reply to Oliver Keyes from comment #9)
> huh; interesting. Probably not directly applicable, because there's no
> direct connection betwixt the analytics machines and the toolserver dbs to
> my knowledge, but the way they extracted that could be useful.

Or you could start using the infrastructure everyone uses instead of
reinventing the wheel.



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #11 from Oliver Keyes oke...@wikimedia.org ---
(In reply to Nemo from comment #10)
> (In reply to Oliver Keyes from comment #9)
> > huh; interesting. Probably not directly applicable, because there's no
> > direct connection betwixt the analytics machines and the toolserver dbs to
> > my knowledge, but the way they extracted that could be useful.
>
> Or you could start using the infrastructure everyone uses instead of
> reinventing the wheel.

You mean, Tool Labs? Sure, I'll just poke Legal and see how they feel about
moving our request logs to Labs. Oh, wait ;p.



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #12 from Niklas Laxström niklas.laxst...@gmail.com ---
(In reply to Oliver Keyes from comment #8)
> > 2) WMF is using CDB files, no DB for l10n cache.
> Then why is the table there? It's a MediaWiki feature we don't use?

Database tables are created unconditionally. The store can be changed with
wgGlobals configuration, and WMF uses CDB distributed to the application
servers for speed.

Can you tell a bit more about where you need the namespaces? Are you parsing the
wikitext? Doesn't the pagelinks table, for example, have the namespace resolved
to the numerical id already?



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #13 from Oliver Keyes oke...@wikimedia.org ---
(In reply to Niklas Laxström from comment #12)
 
> Can you tell a bit more about where you need the namespaces? Are you parsing
> the wikitext? Doesn't the pagelinks table, for example, have the namespace
> resolved to the numerical id already?

Nope, the RequestLogs - which are unfortunately totally detached from MediaWiki
proper.



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #14 from Niklas Laxström niklas.laxst...@gmail.com ---
Then I currently don't see any solution other than the 3) I mentioned. If you
don't want to install MediaWiki on the analytics servers, the second best thing
would be to automate it with some kind of script, keeping the constraints in
mind.

After getting the namespaces, you should be prepared to mirror some of the
transformations MW handles: spaces/plusses/underscores, URL encoding, charset
encoding, case insensitivity, Unicode normalizations, redirect pages... Some of
these cause an actual redirect, some of them don't.
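
A rough sketch of what mirroring a few of those transformations could look like when splitting a title taken from a request log. It is not MediaWiki's actual title code and covers only part of the list: percent-decoding, NFC normalization, underscores as spaces, case-insensitive namespace prefixes and first-letter capitalization. The function name `split_title` and the `namespaces` mapping (name to numeric id) are hypothetical. Redirect pages cannot be resolved this way at all. A literal `+` is left alone here, on the assumption that it only stands for a space in query strings, where `urllib.parse.unquote_plus` would be the better fit.

```python
import unicodedata
from urllib.parse import unquote


def split_title(raw_title: str, namespaces: dict) -> tuple:
    """Return (namespace_id, title) for a percent-encoded page title.

    `namespaces` maps namespace names and aliases to numeric ids.
    Unknown or missing prefixes fall back to the main namespace (0).
    """
    text = unquote(raw_title)                         # percent-decoding (UTF-8)
    text = unicodedata.normalize("NFC", text)         # Unicode normalization
    text = " ".join(text.replace("_", " ").split())   # underscores, repeated spaces

    # Namespace prefixes are matched case-insensitively.
    lookup = {
        name.replace("_", " ").casefold(): ns_id
        for name, ns_id in namespaces.items()
    }
    prefix, sep, rest = text.partition(":")
    ns_id = lookup.get(prefix.strip().casefold()) if sep else None
    if ns_id is None:
        ns_id, rest = 0, text                         # no recognised prefix
    rest = rest.strip()

    # Assumes the wiki capitalizes the first letter of titles; some wikis
    # are configured differently, so treat this step as optional.
    return ns_id, rest[:1].upper() + rest[1:]
```

For instance, with `{"": 0, "Benutzer Diskussion": 3}` as the mapping, `split_title("benutzer_diskussion:foo%20bar", ...)` yields `(3, "Foo bar")`.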



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-23 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Nemo federicol...@tiscali.it changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Internationalization        |Database
           Severity|normal                      |enhancement

--- Comment #4 from Nemo federicol...@tiscali.it ---
But we still lack a use case. One server in the world with DB but without API
isn't really convincing. The kind of information you need (like content
namespace or not) also has nothing to do with l10n, so there is no reason to
believe it would last forever in there if it's even available.

On the why, that's the format used in other tables too, for instance
log_params, so your question should be rephrased as: document why and where the
DB tables contain serialised data.



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-23 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #5 from Oliver Keyes oke...@wikimedia.org ---
(In reply to Nemo from comment #4)
> But we still lack a use case. One server in the world with DB but without
> API isn't really convincing.
 
It's more than one server; it's going to be all our analytics machines, and
that's the crucial element. If the scenario was reversed - some strange
universe in which we had one production machine and the rest were
analytics-based - the argument wouldn't hold any water. Part of this, sure, is
that production is 90% of our use cases, but that's only part of it. The other
is simply that research and analytics /are/ a use case, and one of increasing
importance. We now have 6 researchers, 4 analytics engineers, a director-level
position and a Product Manager; arguing that this is not something worth
addressing, either by justifying it or solving it, simply because those people
can get their work done on only a few machines, ignores the tremendous
resources being thrown behind analytics.

A use case was provided with the original post ;p.

> The kind of information you need (like content namespace or not) also has
> nothing to do with l10n, so there is no reason to believe it would last
> forever in there if it's even available.

It's absolutely available; please do me the courtesy of assuming I did basic
research and attempted to solve the problem through other means before
submitting the bug. If you want to check yourself, look for namespaceNames and
namespaceAliases in lc_key.

Sure, namespaceNames are not /directly/ a localisation problem - they're
wiki-based as well as language based - but they are, most of the time,
language-based.

More importantly, whether they do or do not last forever (which seems a very
strange test. /nothing/ we do lasts forever. well, other than the projects.
That's kind of why we're here), they're currently stored there. The solution to
the problem is for the language engineering team to either (a) explain why
serialised PHP is the best plausible way to store this data or (b) change it. 
I'm not asking for a solution proof against any possible permutation of future
events, because that's impossible, but the fact of the matter is that this
table is the source of the issue as it stands.


> On the why, that's the format used in other tables too, for instance
> log_params, so your question should be rephrased as: document why and where
> the DB tables contain serialised data.

Sure, if documentation was all I was asking for ;p. log_params is less crucial;
to my knowledge the only thing we tend to use it for is retrieving the patrol
status of a page (that is, whether it was patrolled automatically or manually),
and that's something you can extract with a minimal amount of effort because
it's not a complex piece of data - it's just the 1 or 0 closest to the end of
the string.
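
The "1 or 0 closest to the end of the string" heuristic can be sketched roughly as follows. The function name is hypothetical, and the two log_params layouts mentioned in the usage note (a newline-separated legacy form and a PHP-serialised form) are assumptions for illustration, not something confirmed in this thread. The lookarounds only make sure a digit inside a longer number, such as a revision id, is not mistaken for the flag.

```python
import re


def patrol_was_automatic(log_params: str):
    """Return True/False from the last standalone 0 or 1, or None if absent."""
    # A "standalone" digit has no other digit directly before or after it.
    flags = re.findall(r"(?<!\d)[01](?!\d)", log_params)
    return flags[-1] == "1" if flags else None
```

Under those assumptions, `"12345\n12340\n1"` would read as an automatic patrol, and a serialised value ending in `i:0;}` as a manual one. This is fragile by design and would misfire on any row whose last standalone digit is not the patrol flag.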



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-22 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Niklas Laxström niklas.laxst...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |UNCONFIRMED
     Ever confirmed|1                           |0

--- Comment #1 from Niklas Laxström niklas.laxst...@gmail.com ---
We are not using serialized files for localisation cache. The cache is either
in CDB format or in the SQL database.

Are you saying that the strings in the database are serialized?



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-22 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

--- Comment #2 from Oliver Keyes oke...@wikimedia.org ---
(In reply to Niklas Laxström from comment #1)

> Are you saying that the strings in the database are serialized?

Indeed (or, at least, the arrays appear to be)
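
To illustrate what reading such values without PHP involves, here is a minimal sketch of a decoder for the scalar and array subset of PHP's serialize() format (`N`, `b`, `i`, `d`, `s`, `a`). It assumes the serialised blob has already been fetched as bytes from the table's value column; the function names are hypothetical. Objects and references are deliberately unsupported and raise `ValueError`. It works on bytes because PHP string lengths are byte counts, not character counts.

```python
def php_unserialize(data: bytes):
    """Decode a PHP-serialised scalar or array into Python values."""
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after serialised value")
    return value


def _parse(data: bytes, pos: int):
    """Parse one value starting at `pos`; return (value, next position)."""
    tag = data[pos:pos + 1]
    if tag == b"N":                                  # N;
        return None, pos + 2
    if tag in (b"i", b"b", b"d"):                    # i:42;  b:1;  d:0.5;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode("ascii")
        if tag == b"i":
            return int(raw), end + 1
        if tag == b"b":
            return raw == "1", end + 1
        return float(raw), end + 1
    if tag == b"s":                                  # s:<byte length>:"...";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        start = colon + 2                            # skip the :" pair
        end = start + length
        return data[start:end].decode("utf-8"), end + 2   # skip the "; pair
    if tag == b"a":                                  # a:<count>:{key value ...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                              # skip the :{ pair
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            result[key], pos = _parse(data, pos)
        return result, pos + 1                       # skip the closing }
    raise ValueError("unsupported type tag %r at offset %d" % (tag, pos))
```

PHP arrays come back as Python dicts keyed by int or str, so a namespace-names style value such as `a:2:{i:0;s:0:"";i:1;s:4:"Talk";}` decodes to `{0: "", 1: "Talk"}`.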



[Bug 61802] Use a different format for l10n_cache (or document why the current one is the best one)

2014-02-22 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=61802

Oliver Keyes oke...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #3 from Oliver Keyes oke...@wikimedia.org ---
https://www.mediawiki.org/wiki/Manual:L10n_cache_table would seem to count as
confirmed.
