https://bugzilla.wikimedia.org/show_bug.cgi?id=54359

--- Comment #2 from [email protected] ---
We recently fixed a bug that is very similar to the problem you
describe. Let me make sure it's only related and not the same thing.

Until a few weeks ago, when looking at the active editors for
country X (e.g. [1]), the graph showed the data for one of the
countries in the set {X, Y1, Y2, Y3}. Reloading the page might show
the data for a different country from the same set. Sorted
alphabetically, Y1, Y2, and Y3 were very close to each other.

The root cause seems to have been column order mismatches between
different versions of the same file. We solved this by going through
~17k files and removing ~15k stale files and duplicates.
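
To illustrate what such a mismatch looks like, here is a rough sketch
that compares the header rows of two versions of the same file and
reports every column whose position changed. It assumes each csv
starts with a header row of column names; the function names and the
sample data are made up for this example and are not taken from the
geowiki scripts.

```python
import csv
import io


def read_header(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def column_shifts(old_csv_text, new_csv_text):
    """Return {name: (old_index, new_index)} for columns that moved.

    Only columns present in both versions are compared.
    """
    old_pos = {name: i for i, name in enumerate(read_header(old_csv_text))}
    new_pos = {name: i for i, name in enumerate(read_header(new_csv_text))}
    return {
        name: (old_pos[name], new_pos[name])
        for name in old_pos
        if name in new_pos and old_pos[name] != new_pos[name]
    }


# Hypothetical data: a new country column pushes its neighbours right.
old_version = "date,Germany,Ghana,Greece\n2013-09-01,1,2,3\n"
new_version = "date,Georgia,Germany,Ghana,Greece\n2013-09-01,9,1,2,3\n"
print(column_shifts(old_version, new_version))
```

With this sample data, a graph that still reads column 1 would show
Georgia instead of Germany.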


(In reply to comment #0)
> [...] than relying on row numbers.)
> [...]
> My theory, confirmed at one point by Evan, was that the graphs relied on row
> numbers

Sorry to nitpick, but since you mention /row/ numbers both here and
a few lines above, let me make sure we are talking about the same
files. Do you really mean /row/ numbers (that could well be the case,
but would suggest you are using files that I have not yet discovered
in our repos), or /column/ numbers (as used, for example, in [2])?

The files produced by Evan's geowiki scripts (i.e. the "Active Editor"
data) rely on column numbers. Yes, the column number of country X in
file Z.csv may change from one day to the next.

And in fact, they not only “might” change, they actually do change often.

(For the current geowiki dashboards, graphs, etc., those frequent
changes are not a problem, as we regenerate the relevant files for
each run using the current csvs.)
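
For consumers that cannot regenerate their files on each run, one way
to be robust against shifting column numbers is to look columns up by
name. The following is only a sketch: it assumes the csv has a header
row and a column called "date", which may not match the real files,
and `series_for` is a made-up helper, not part of geowiki.

```python
import csv
import io


def series_for(csv_text, column_name):
    """Return [(date, value), ...] for one named column.

    The column is selected by its header name, so its numeric position
    in the file does not matter.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["date"], row[column_name]) for row in reader]


# Hypothetical data: the same values with two different column orders.
monday = "date,Germany,Ghana\n2013-09-01,10,20\n"
tuesday = "date,Georgia,Germany,Ghana\n2013-09-01,5,10,20\n"

# Both lookups return the Germany series although its index differs.
print(series_for(monday, "Germany"))
print(series_for(tuesday, "Germany"))
```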

> [...], which
> shifted the row numbers, which caused the graph to display false and
> misleading
> data.

If that graph displayed false data, that's a real problem from my
perspective.

But since you use the past tense in your description, and you also
state that you cannot reproduce the problem on demand: are we still
affected by it?

If so, could you point me to a concrete file/URL that causes problems?

> Since I have no insight into the data-generating scripts themselves, [...]

The scripts are at
  https://gerrit.wikimedia.org/r/#/admin/projects/analytics/geowiki
. As usual: Patches welcome :-)
You can find a rough overview of the geowiki dataflow at
  https://wikitech.wikimedia.org/wiki/Analytics/Geowiki#Dataflow
.

[1] http://gp.wmflabs.org/graphs/en_germany_all
[2] http://gp.wmflabs.org/data/datafiles/gp/en_all.csv
