https://bugzilla.wikimedia.org/show_bug.cgi?id=27774
Summary: Username to user_id match is inconsistent in revisions of dump.
Product: XML Snapshots
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Keywords: analytics
Severity: enhancement
Priority: Normal
Component: General
AssignedTo: ar...@wikimedia.org
ReportedBy: dvanli...@gmail.com
CC: tf...@wikimedia.org, aaron.halfa...@gmail.com

The mapping from username to user_id is inconsistent across revisions in the dump. This may follow from how and when the username field is updated in the revision table. If so, it would help to have a clear explanation of what to expect (e.g. deleted users, username changes, etc.).

We see a range of inconsistencies, including:
- many usernames matched with the same ID;
- many IDs matched with the same username;
- non-IP usernames with no ID;
- revisions with no user information at all.

Our approach is to associate each user_id with its most recent username and propagate that username to every revision carrying that user_id.

Proposed solution:
1) Run an SQL query to synchronize usernames with user IDs.
2) Run an SQL query to replace cases where a hostname is used as the username.
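The four kinds of inconsistency listed above can be counted with one pass over the revisions. The following is a minimal sketch. It assumes the dump's `<revision>` elements have already been parsed into flat records. The `Revision` type and the `find_inconsistencies` function are hypothetical and not part of any MediaWiki tool. A user ID of 0 (how MediaWiki's revision table marks anonymous edits) is treated the same as a missing ID.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class Revision(NamedTuple):
    # Hypothetical flattened record for one <revision> in the XML dump.
    rev_id: int
    timestamp: str           # ISO 8601 as in the dump, e.g. "2011-03-01T12:00:00Z"
    user_id: Optional[int]   # None when <contributor> has no <id>
    username: Optional[str]  # <username> or <ip>; None when both are absent


def is_ip(name: str) -> bool:
    """True if the name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def find_inconsistencies(revs: Iterable[Revision]) -> dict:
    names_by_id = defaultdict(set)   # user_id -> usernames seen with it
    ids_by_name = defaultdict(set)   # username -> user_ids seen with it
    named_without_id = set()         # non-IP usernames that carry no ID
    missing_user = []                # rev_ids with neither ID nor name
    for r in revs:
        uid = r.user_id or None      # treat 0 (anonymous) like a missing ID
        if uid is None and r.username is None:
            missing_user.append(r.rev_id)
        elif uid is not None and r.username is not None:
            names_by_id[uid].add(r.username)
            ids_by_name[r.username].add(uid)
        elif uid is None and not is_ip(r.username):
            named_without_id.add(r.username)
        # An ID with no username is not one of the listed cases; it is skipped.
    return {
        "names_per_id": {k: v for k, v in names_by_id.items() if len(v) > 1},
        "ids_per_name": {k: v for k, v in ids_by_name.items() if len(v) > 1},
        "named_without_id": named_without_id,
        "missing_user": missing_user,
    }
```

A username that is a hostname rather than an IP address would appear under `named_without_id`. That is one way to locate the rows targeted by step 2 of the proposed solution.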
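The report proposes associating each user_id with its most recent username and propagating that name. The following is an in-memory sketch of that idea, not the SQL query the report refers to. It assumes that "most recent" means the revision with the latest timestamp, with ties broken by rev_id. Because the dump's ISO 8601 timestamps share one fixed format, they compare correctly as strings. The `Revision` type and both function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Revision(NamedTuple):
    # Hypothetical flattened record for one <revision> in the XML dump.
    rev_id: int
    timestamp: str           # fixed-format ISO 8601, e.g. "2011-03-01T12:00:00Z"
    user_id: Optional[int]   # None or 0 for anonymous/unknown contributors
    username: Optional[str]


def latest_username_by_id(revs: List[Revision]) -> Dict[int, str]:
    """Map each registered user_id to the username on its newest revision."""
    latest: Dict[int, Tuple[Tuple[str, int], str]] = {}
    for r in revs:
        if not r.user_id or r.username is None:
            continue  # skip anonymous (0/None) IDs and rows without a name
        key = (r.timestamp, r.rev_id)  # rev_id breaks timestamp ties
        if r.user_id not in latest or key > latest[r.user_id][0]:
            latest[r.user_id] = (key, r.username)
    return {uid: name for uid, (_, name) in latest.items()}


def propagate_usernames(revs: List[Revision]) -> List[Revision]:
    """Rewrite every revision of a known user_id to carry its latest username."""
    latest = latest_username_by_id(revs)
    return [
        r._replace(username=latest[r.user_id]) if r.user_id in latest else r
        for r in revs
    ]
```

Anonymous revisions keep their IP address as the name, because no user_id maps them to a registered account. The two-pass design (first find the latest name, then rewrite) mirrors an SQL `UPDATE ... JOIN` against a per-user "latest name" subquery. A database would be the practical choice for a full dump.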