https://bugzilla.wikimedia.org/show_bug.cgi?id=54359
--- Comment #2 from [email protected] ---
We recently fixed a bug that is very similar to the problem you describe.
Let me make sure it's only related and not the same thing.

Until a few weeks ago, when looking at the active editors for country X
(e.g. [1]), the graph showed the data for one of the countries in the set
{X, Y1, Y2, Y3}. Reloading the page might show the data for a different
country from the same set {X, Y1, Y2, Y3}. In alphabetical order, Y1, Y2,
and Y3 were really close to each other.

The root cause seems to have been column order mismatches between different
versions of the same file. This problem was solved by trying to make sense
of ~17k files and removing ~15k stale files/duplicates.

(In reply to comment #0)
> [...] than relying on row numbers.)
> [...]
> My theory, confirmed at one point by Evan, was that the graphs relied on row
> numbers

Sorry to be nitpicking here, but since you mention /row/ numbers both here
and some lines above, let me make sure we are talking about the same files.
Do you really mean /row/ numbers (that could totally be the case, but would
hint at you using files that I have not yet discovered in our repos), or
/column/ numbers (as used, for example, in [2])?

The files produced by Evan's geowiki scripts (i.e. "Active Editor" data)
rely on column numbers. Yes, the column number of country X in file Z.csv
might change from one day to the next. In fact, it not only “might” change,
it actually does change often. (For the current geowiki dashboards, graphs,
... those frequent changes are not a problem, as we regenerate the relevant
files for each run using the current csvs.)

> [...], which
> shifted the row numbers, which caused the graph to display false and
> misleading
> data.

If that graph displayed false data, that's a real problem from my perspective.
But since you are using past tense in your description and you also state
that you cannot reproduce the problem on demand… are we still affected by
the problem? If so, could you point me to a concrete file/URL that causes
problems?

> Since I have no insight into the data-generating scripts themselves, [...]

The scripts are at
https://gerrit.wikimedia.org/r/#/admin/projects/analytics/geowiki .
As usual: Patches welcome :-)

You can find a rough overview of the geowiki dataflow at
https://wikitech.wikimedia.org/wiki/Analytics/Geowiki#Dataflow .

[1] http://gp.wmflabs.org/graphs/en_germany_all
[2] http://gp.wmflabs.org/data/datafiles/gp/en_all.csv
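
To illustrate the column-order mismatch discussed in this comment, here is a minimal Python sketch. It is not taken from the geowiki scripts. It assumes a CSV with a header row, a "date" column, and one column per country; the file contents, the column names, and the helper series_for_country are all hypothetical. It only shows why looking a country up by header name stays stable when the column order changes between two versions of the same file, while a positional lookup does not.

```python
import csv
import io


def series_for_country(csv_text, country, date_column="date"):
    """Return [(date, value)] for one country, looked up by header name.

    Hypothetical helper: the layout (header row, 'date' column, one
    column per country) is an assumption, not the real geowiki format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or country not in reader.fieldnames:
        raise KeyError(f"no column named {country!r}")
    return [(row[date_column], row[country]) for row in reader]


def value_at_index(csv_text, column_index):
    """Positional lookup: the value in the first data row at a fixed index."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[1][column_index]


# Two versions of the same made-up file; only the column order differs.
OLD = "date,Germany,Ghana,Georgia\n2013-09-01,10,20,30\n"
NEW = "date,Georgia,Germany,Ghana\n2013-09-01,30,10,20\n"

# By name, both versions agree on Germany's value.
by_name_old = series_for_country(OLD, "Germany")
by_name_new = series_for_country(NEW, "Germany")

# By position, index 1 is Germany in OLD but Georgia in NEW.
by_index_old = value_at_index(OLD, 1)
by_index_new = value_at_index(NEW, 1)
```

With these inputs the name-based lookup returns the same series for both versions, whereas the fixed index silently picks up a neighbouring country's value after the reorder.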
