chasemp added a comment.

Not meant exactly as a direct response to @faidon, he just hit the three 
big-ish options on the nose, though there are variations of them.

>   * migrate accounts off Bugzilla/RT as-is (email as the username for 
> Bugzilla)
>   * prompt users to just recover/rename their account if they wish so
>   * migrate comments as they are, owned by their correct owner and with the 
> correct timestamp


The history surrounding these options is spread out over hundreds of tickets, so 
I'm going to try to hit some highlights to bring this narrative together.  We 
all know bugzilla uses emails as usernames, but bugzilla also does some 
responsible stuff.  If you go to a bugzilla ticket without authentication you 
see:

{F210}

They are smart enough not to leak emails to anonymous crawling, etc.  But 
phabricator isn't set up to handle the emails-as-usernames use case.  We know 
that we want maniphest to be search engine friendly, and indeed I think a few 
of us have found our own issues already while searching for phabricator related 
things.  Andre explained to us that one of the biggest points of contention 
with the community is the problem of bugzilla being cavalier with user emails.  
Users will/have/do actively refuse to sign up and report bugs because they 
don't want their emails broadcast (which I'm not saying is reasonable 
necessarily -- just true), and the ones that have signed up are not happy with 
it either.  The idea of using emails as usernames is just an unsellable, 
indefensible, unworkable solution from this perspective.  Or at least after 
lots of discussion related to this, we all thought it would tank the project 
from the beginning for community support.

So what about some derivative or some generated username or something like 
their anonymous user id from bugzilla?  There are a lot of social problems with 
that, but chief among them is that a static comment that shows the old username 
(as it was shown in the anonymous bugzilla sense) and text is more desirable 
than a "real" username that no one knows and a comment that has lost the 
context of the original user.  So the idea of noise usernames from old systems 
seems good at first, and then when you do it there is a lot missing.  Looking 
through a few hundred of the first tickets in that mode leaves you thinking, 
who is this?  Is this a WMF person?  We could do "1234-WMF" for old WMF 
accounts and "123" for old non-WMF accounts, but it gets super messy and is not 
at all better than a static comment that is at least straightforward.  So far 
there is no one to whom we have had to explain what a static comment is, or how 
it works.  It's an obvious ploy for metadata migration.  There is no
guarantee that many of the people from bugzilla, or rt, or any other system we 
import will ever show up again and so it may never be fixed from this 
perspective.  We could do something in the middle, use the "generic" name 
bugzilla uses which is just the "foo" part of "[email protected]", but bugzilla 
doesn't worry much about overlap or identity.  The [email protected] and 
[email protected] issue.  
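
To make the overlap problem concrete, here is a rough sketch (mine, not anything bugzilla or phabricator ships) that groups addresses by the part before the "@" and reports every name that more than one address would land on.  The addresses and the function names `local_part` and `find_collisions` are made up for illustration.

```python
from collections import defaultdict


def local_part(email: str) -> str:
    """Return the part of an address before the '@', lowercased."""
    return email.split("@", 1)[0].lower()


def find_collisions(emails):
    """Group addresses by local part; keep only names shared by 2+ addresses."""
    by_name = defaultdict(list)
    for email in emails:
        by_name[local_part(email)].append(email)
    return {name: addrs for name, addrs in by_name.items() if len(addrs) > 1}


# Hypothetical addresses, for illustration only.
sample = ["foo@example.org", "foo@example.com", "bar@example.org"]
collisions = find_collisions(sample)
```

Every key in `collisions` is a generated username that two different people would have to fight over.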

Then there is the problem of usernames and alias management in phabricator.  
Renaming users, or trying to merge accounts, especially ones that may be 
claimed a year or two years from now is problematic.  If you attempt to change 
your username in Phabricator the first part of the prompt warning is this:

> The old username will no longer be tied to the user, so anything which uses it 
> (like old commit messages) will no longer associate correctly. (And, if you 
> give a user a username which some other user used to have, username lookups 
> will begin returning the wrong user.)

The long story short there is, we import [email protected] as goo, and let's say 
there is no name conflict with the name from rt or trello or mingle or whatever 
other system we are going to migrate from next.  It's doubtful that goo wants 
goo to be their username in phab; it seems few actually do want their username 
to be the first part of whatever email they were using.  Say goo doesn't show 
back up for another year, but in the meantime their comments and their work 
have been referenced a bunch of times.  When they come back and claim their 
account (however that is done) they immediately want to change their identity, 
as they didn't actually choose the one we assigned.  We lose necessary history in 
phabricator for their account from the point of migration to the point of 
claim.  When we looked at doing this across the many, many, many accounts from 
bugzilla it was not a username shuffle game that seemed sane to get into.  It 
gets weirder as this huge swath of account names we have taken up are chosen
by actual users later and the history of who is really who becomes problematic. 
 The other outcome that is preferable in phabricator now is that you know who 
is a real person.  Of those bugzilla accounts there will not be a 1:1 
translation to phabricator accounts, but if we mock up a bunch of dummy 
accounts and link their emails it is hard to tell who has //actually// 
registered in this system. That is not even taking into account the 
questionable nature of moving users' emails unsolicited to a system they never 
signed up for. That stuff could be handled in a tactful way most likely, but 
the mess of 'who is really here?' is actually more substantial than it seems at 
first.

Then there is the problem of how users can claim accounts.  We could make a 
fake auth provider, we could set up some special page that handles it on its own 
and creates an associated account in phab after we validate an email, we could 
write a real auth provider for bugzilla and/or RT.  If this was desirable 
(because we had created our army of dummy accounts) the logic for it would be a 
merge headache for as long as we want to allow this claiming to happen.  Which 
ideally is forever, even if someday we don't worry about their comment history 
coming along or whatever.  Any way we figured it, there was an ongoing cost we 
would prefer not to incur.  One of the small advantages @mmodell and I have is 
that we have maintained phabricator before, and we know that upstream moves 
pretty fast.  They encourage weekly merges, and the longest spread I know of is 
6 months (facebook), which seems like kind of a support disaster for upstream.  
That kind of rolling release requires an ever-escalating amount of time as 
local patches stack up or get complicated, and coming out of the gate that way 
wasn't taken lightly.

A big part of my thinking on this has been, what is going to put us in the best 
place in 6 months or a year?  The first year is going to be a pain no matter 
what, but after that bumpy ride which approach leaves us in the best position 
for easy tracking of upstream and low ops intervention for future users who are 
late, or very late to the party?  So our conclusion on that front was to find a 
way to bend the native internals of phabricator to provide the needed logic, 
and then to shim a cron and a bit of metadata swapping.  The idea of static 
comments, and users claiming their history, may seem unintuitive, but out of 
the box on the first day of migration anyone can pick up any ticket and it 
makes sense.  Also though, if one or thirty thousand people sign up we may 
trail a bit on associating their old history, but we can do it seamlessly from 
several historical sources, and in the end, after the first/second/third waves 
everything will be "right" -- or right enough.

One of the pieces of metadata that has been critical is dependency 
relationships between tickets, but these don't follow a straight line through 
history.  Issue 1 can depend on issue 200, and issue 300 can depend on issue 3. 
 When importing a series of many thousands of tickets it isn't possible to do 
this linking correctly on the first pass.  If you create an issue and the things 
it depends on are further down the line, a second wave of patch-up massaging 
is likely necessary.  This reality was really the impetus for the current 
approach. It was necessary and turned out to be not that difficult -- at least 
comparatively.
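
The two-wave idea can be sketched roughly like this.  It is only an illustration under my own assumptions: the ticket dicts, the `import_tickets` name and the starting id of 1000 are all hypothetical, and none of it is the actual migration code.

```python
def import_tickets(source_tickets, first_new_id=1000):
    """Two-pass import: create every ticket first, then link dependencies.

    source_tickets: list of dicts like {"id": 1, "depends_on": [200]}.
    Returns (id_map, edges, missing) where id_map maps old id -> new id,
    edges is a list of (new_ticket_id, new_dependency_id) pairs, and missing
    lists (old_ticket_id, old_dependency_id) pairs that could not be linked.
    """
    tickets = list(source_tickets)

    # Pass 1: create everything and remember the old -> new id mapping.
    id_map = {}
    for offset, ticket in enumerate(tickets):
        id_map[ticket["id"]] = first_new_id + offset

    # Pass 2: every target exists now, so forward references resolve.
    edges, missing = [], []
    for ticket in tickets:
        for dep in ticket.get("depends_on", []):
            if dep in id_map:
                edges.append((id_map[ticket["id"]], id_map[dep]))
            else:
                missing.append((ticket["id"], dep))
    return id_map, edges, missing


# Issue 1 depends on issue 200 (not yet created when 1 is imported),
# and issue 300 depends on issue 3.
source = [
    {"id": 1, "depends_on": [200]},
    {"id": 3},
    {"id": 200},
    {"id": 300, "depends_on": [3]},
]
id_map, edges, missing = import_tickets(source)
```

A single pass would have had nothing to point issue 1 at, since issue 200 had not been created yet.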

So whether any of that seems justifiable is probably subjective, but suffice to 
say we came up with a thousand ways not to migrate accounts from multiple 
historical sources.  One of the things discussed early on for this approach of 
'pulling the table cloth off the table with the dishes on it' metadata 
massaging was "can we live with static comments?".   @aklapper and I have 
talked about it a lot, or more to the point I have bugged him about it a lot, 
and the answer for the team was 'yes'.  Static comments are not a usability 
issue, and with the number and size of the other problems we have they are a 
reasonable compromise.  So the issue has never been "why can't we find a way to 
make comments nicer?"; it has always been one of survivable, if not reasonable, 
compromises.  I don't think anyone on the phabricator team felt like it was a 
deal breaker, and honestly, we have been doing a lot more than 4 people's worth 
of work.  If you try to search for issues there is no sort or search 
functionality that is lost with static comments.  There is no way to search for 
things you have commented on anyway; instead the act of commenting subscribes 
you to an issue, and you can search for issues you are subscribed to.  We are 
migrating the "CC" or "Subscribers" list from bugzilla, so we didn't take issue 
with the lack.

Why is comment metadata complex?  Well it's more like it's not simple.  If you 
load a ticket you are fetching an object, and along with it comes a bunch of 
transactions (in the phabricator sense).  Those transactions can be adding 
CC's, dependencies, or whatever, and comments too.  Transactions can also spawn 
other transactions.  If you create a comment and you @mention a user, that 
association is a transaction in and of itself.  Transactions have unique global 
IDs, authors, and time metadata, and if their type is comment they associate 
with a comment history that has the same.  So to do really savvy metadata fixup you 
have to change the right fields on all of those objects, and the objects they 
spawn, and the interactions while not opaque can still be a pain.
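
As a rough picture of why the fixup fans out, here is a toy model of the shape described above.  It is not Phabricator's real schema: the `Transaction` and `Comment` classes, their field names and `reassign_author` are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    author: str
    created: int
    text: str


@dataclass
class Transaction:
    txn_id: str                      # unique global id
    type: str                        # e.g. "comment", "mention", "cc"
    author: str
    created: int
    comment: Optional[Comment] = None
    spawned: List["Transaction"] = field(default_factory=list)


def reassign_author(txn: Transaction, old: str, new: str) -> int:
    """Swap the author on a transaction, its comment, and everything it
    spawned. Returns how many author fields were changed."""
    changed = 0
    if txn.author == old:
        txn.author = new
        changed += 1
    if txn.comment is not None and txn.comment.author == old:
        txn.comment.author = new
        changed += 1
    for child in txn.spawned:
        changed += reassign_author(child, old, new)
    return changed


# One imported comment with an @mention: three author fields to fix.
mention = Transaction("txn-2", "mention", "import-bot", 100)
comment_txn = Transaction(
    "txn-1", "comment", "import-bot", 100,
    comment=Comment("import-bot", 100, "see what @someone said"),
    spawned=[mention],
)
changed = reassign_author(comment_txn, "import-bot", "goo")
```

Even in this toy version a single comment means touching three records, and timestamps would need the same treatment.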

So that's some off-the-cuff back-story, and I'm not saying they are good 
thoughts or bad thoughts, but they were //the// thoughts.  Not really meant to 
form the basis of an argument for or against any one thing, only to convey the 
myriad issues involved.  It's late so if I've said something preposterous I 
apologize.

All that being said, I spent a portion of this week figuring out how we could 
do it if comments are a priority.  When @mmodell and I talked earlier we 
thought the above was the best compromise, and I still honestly don't find 
comment metadata to be all that compelling.  However, after more exploring I 
think it can be done in a semi-reasonable fashion.

________


So here is the proposal from the phabricator team :)

- We can set ourselves up to fix comment metadata in the nicest of nice ways 
for users

- We don't know if we can do this first run during the actual initial migration 
window; the time estimates didn't include this thinking.  But certainly shortly 
thereafter, and maybe say once a week, once a month, or every few months for 
a while, we can batch process things to an amendable state.

- The 'batch run' thinking is twofold.  We are not sure what the impact of 
doing this during normal hours would be (in theory fine), and it involves 
having to invalidate some remarkup caching so that updated things are shown 
correctly.  Anytime you start invalidating cache on a broad scale while users 
are doing their thing chaos can ensue, so at this point the thinking is 
scheduled off-hours runs.

- After said batch runs, user foo will in theory be seamlessly integrated into 
Phab, with history from the source that was previously identifiable only by 
their external reference id (fl, bz, rt, etc).

- I'm thinking 7 or 8 days of upfront work / testing to get this to a solid 
state.  Part of that is lots of testing of partial cache invalidation; Phab 
only has tooling to support global invalidation, but that would be pretty 
heavy-handed.
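
To give a feel for what a batch run with partial invalidation might look like, here is a very rough sketch.  Everything in it is an assumption for illustration: the dict-based cache, the `run_batch` name and the record layout are hypothetical and are not Phabricator's cache tooling.

```python
def run_batch(claims, static_comments, render_cache):
    """Attach claimed history to real users and drop only the affected cache.

    claims: dict of external reference (e.g. "bz:42") -> phab username.
    static_comments: list of dicts with "task", "ext_ref" and "author" keys.
    render_cache: dict of task id -> rendered markup.
    Returns the set of task ids whose cached rendering was invalidated.
    """
    touched = set()
    for comment in static_comments:
        user = claims.get(comment["ext_ref"])
        if user is not None and comment["author"] != user:
            comment["author"] = user
            touched.add(comment["task"])

    # Partial invalidation: only tasks we actually changed lose their cache.
    for task in touched:
        render_cache.pop(task, None)
    return touched


# Hypothetical data: one bugzilla user has claimed their history.
claims = {"bz:42": "goo"}
comments = [
    {"task": "T1", "ext_ref": "bz:42", "author": None},
    {"task": "T2", "ext_ref": "rt:7", "author": None},
]
cache = {"T1": "<old render>", "T2": "<old render>"}
touched = run_batch(claims, comments, cache)
```

Run off-hours, something like this would leave the cache for untouched tasks alone, which is the part that needs the testing time.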

TASK DETAIL
  https://phabricator.wikimedia.org/T572


To: chasemp
Cc: wikibugs-l, chasemp, Aklapper, Qgil, mmodell, Eloquence, faidon, RobLa-WMF, 
mark, jeremyb


