https://bugzilla.wikimedia.org/show_bug.cgi?id=27774

           Summary: Username to user_id match is inconsistent in revisions
                    of dump.
           Product: XML Snapshots
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: analytics
          Severity: enhancement
          Priority: Normal
         Component: General
        AssignedTo: ar...@wikimedia.org
        ReportedBy: dvanli...@gmail.com
                CC: tf...@wikimedia.org, aaron.halfa...@gmail.com


The mapping between username and user_id is inconsistent across revisions in
the dump.
This could be a consequence of how and when the username field gets updated
in the revision table.  If so, it would be nice to have a clear explanation of
what to expect (e.g. deleted users, username changes, etc.).
We see a range of inconsistencies: many usernames matched with the same ID,
many IDs matched with the same username, non-IP usernames with no ID, and
completely missing user information.
Our approach is to associate each user_id with its most recent username and
propagate that username to all revisions carrying that user_id.
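
To make the checks and the workaround described above concrete, here is a
minimal Python sketch. It is not the code used for the analysis: the record
layout (rev_id, timestamp, user_id, username), the sample rows, and the function
names are all assumptions made for illustration, and timestamps are assumed to
share one sortable format.

```python
import ipaddress
from collections import defaultdict

# Hypothetical revision records: (rev_id, timestamp, user_id, username).
# user_id is None when the dump carries no ID for the contributor.
revisions = [
    (1, "2009-01-01T00:00:00Z", 7, "OldName"),
    (2, "2010-06-01T00:00:00Z", 7, "NewName"),
    (3, "2010-07-01T00:00:00Z", 9, "Alice"),
    (4, "2010-08-01T00:00:00Z", None, "Bob"),
    (5, "2010-09-01T00:00:00Z", None, "192.0.2.1"),
]


def is_ip(name):
    """Return True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def find_inconsistencies(revs):
    """Collect the kinds of mismatch listed in the report."""
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    names_without_id = set()
    for _rev_id, _ts, user_id, username in revs:
        if user_id is None:
            # Only non-IP names are suspicious; anonymous edits have no ID.
            if username and not is_ip(username):
                names_without_id.add(username)
            continue
        names_by_id[user_id].add(username)
        ids_by_name[username].add(user_id)
    return {
        "ids_with_many_names": {u: n for u, n in names_by_id.items() if len(n) > 1},
        "names_with_many_ids": {n: u for n, u in ids_by_name.items() if len(u) > 1},
        "names_without_id": names_without_id,
    }


def propagate_latest_username(revs):
    """Give every revision of a user_id that ID's most recent username."""
    latest = {}  # user_id -> (timestamp, username)
    for _rev_id, ts, user_id, username in revs:
        if user_id is None:
            continue
        if user_id not in latest or ts > latest[user_id][0]:
            latest[user_id] = (ts, username)
    return [
        (rev_id, ts, user_id, latest[user_id][1] if user_id is not None else username)
        for rev_id, ts, user_id, username in revs
    ]
```

Revisions without a user_id are left untouched by the propagation step, since
there is no ID to key on.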

Proposed solution:

1) Run an SQL query to synchronize usernames with user IDs.
2) Run an SQL query to replace the cases where a hostname is the username.
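
As a rough sketch of what step 1 could look like, the following Python snippet
runs a single UPDATE against an in-memory SQLite table. The table and column
names (revision, rev_user, rev_user_text, rev_timestamp) are assumed here and
modeled on MediaWiki naming, rev_user = 0 is assumed to mark anonymous edits,
and the sample rows are invented. Step 2 is not sketched because the report
does not say what the hostname should be replaced with.

```python
import sqlite3

# In-memory stand-in for the revision table (assumed schema, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_timestamp TEXT,"
    " rev_user INTEGER,"
    " rev_user_text TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?)",
    [
        (1, "20090101000000", 7, "OldName"),
        (2, "20100601000000", 7, "NewName"),
        (3, "20100701000000", 9, "Alice"),
        (4, "20100801000000", 0, "192.0.2.1"),
    ],
)

# For every revision with a real user ID, copy the username found on that
# user's most recent revision (ties broken by the higher rev_id).
SYNC_SQL = """
UPDATE revision
SET rev_user_text = (
    SELECT r2.rev_user_text
    FROM revision AS r2
    WHERE r2.rev_user = revision.rev_user
    ORDER BY r2.rev_timestamp DESC, r2.rev_id DESC
    LIMIT 1
)
WHERE rev_user IS NOT NULL AND rev_user != 0
"""


def sync_usernames(connection):
    """Apply the synchronization query and commit the change."""
    connection.execute(SYNC_SQL)
    connection.commit()


sync_usernames(conn)
```

The most recent revision of each user keeps its own username, so the outcome
does not depend on the order in which rows are updated.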

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
