chasemp added a comment.
This isn't meant exactly as a direct response to @faidon; he just hit the three
big-ish options on the nose, though there are variations of them.
> * migrate accounts off Bugzilla/RT as-is (email as the username for
> Bugzilla)
> * prompt users to just recover/rename their account if they wish so
> * migrate comments as they are, owned by their correct owner and with the
> correct timestamp
The history surrounding these options is spread out over hundreds of tickets, so
I'm going to try to hit some highlights to bring the narrative together. We all
know bugzilla uses emails as usernames, but bugzilla also does some responsible
stuff. If you go to a bugzilla ticket without authentication you see:
{F210}
They are smart enough not to leak emails to anonymous crawling, etc. But
phabricator isn't set up to handle the emails-as-usernames use case. We know
that we want maniphest to be search engine friendly, and indeed I think a few
of us have already found our own issues while searching for phabricator-related
things. Andre explained to us that one of the biggest points of contention
with the community is bugzilla being cavalier with user emails. Users
will/have/do actively refuse to sign up and report bugs because they don't
want their emails broadcast (which I'm not saying is reasonable necessarily --
just true), and the ones that have signed up are not happy with it either.
From this perspective, using emails as usernames is an unsellable,
indefensible, unworkable solution. Or at least, after lots of discussion, we
all thought it would tank community support for the project from the
beginning.
So what about some derivative or some generated username or something like
their anonymous user id from bugzilla? There are a lot of social problems with
that, but chief among them is that a static comment that shows the old username
(as it was shown in the anonymous bugzilla sense) and text is more desirable
than a "real" username that no one knows and a comment that has lost the
context of the original user. So the idea of noise usernames from old systems
seems good at first, but when you actually do it there is a lot missing.
Looking through a few hundred of the first tickets in that mode leaves you
thinking: who is this? Is this a WMF person? We could do "1234-WMF" for old
WMF accounts and "123" for old non-WMF accounts, but it gets super messy and is
not at all better than a static comment, which is at least straightforward. We
have not yet had to explain to anyone what a static comment is, or how it
works. It's an obvious ploy for metadata migration. There is no
guarantee that many of the people from bugzilla, or rt, or any other system we
import will ever show up again and so it may never be fixed from this
perspective. We could do something in the middle, use the "generic" name
bugzilla uses which is just the "foo" part of "[email protected]", but bugzilla
doesn't worry much about overlap or identity. The [email protected] and
[email protected] issue.
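To make the overlap problem concrete, here is a rough sketch (not anything we
actually run) that groups addresses by the part before the "@" and reports the
groups that would collide on one username. The addresses and the function name
`local_part_collisions` are made up for illustration.

```python
from collections import defaultdict


def local_part_collisions(emails):
    """Group addresses by the part before '@'; keep groups with 2+ addresses."""
    by_local = defaultdict(list)
    for email in emails:
        local, _, _domain = email.partition("@")
        # Lowercase so "Foo" and "foo" count as the same candidate username.
        by_local[local.lower()].append(email)
    return {local: addrs for local, addrs in by_local.items() if len(addrs) > 1}


# Hypothetical addresses, for illustration only.
emails = ["foo@example.org", "foo@example.com", "bar@example.org"]
collisions = local_part_collisions(emails)
```

Every key in `collisions` is a username that two or more distinct people would
be fighting over after an import.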
Then there is the problem of usernames and alias management in phabricator.
Renaming users, or trying to merge accounts, especially ones that may be
claimed a year or two years from now is problematic. If you attempt to change
your username in Phabricator the first part of the prompt warning is this:
> The old username will no longer be tied to the user, so anything which uses it
> (like old commit messages) will no longer associate correctly. (And, if you
> give a user a username which some other user used to have, username lookups
> will begin returning the wrong user.)
The long story short there: say we import [email protected] as goo, and let's say
there is no name conflict with the name from rt or trello or mingle or whatever
other system we are going to migrate from next. It's doubtful that goo wants
goo to be their username in phab; few people actually want their username to be
the first part of whatever email they were using. Now suppose goo doesn't show
back up for another year, while their comments and their work have been
referenced a bunch of times. When they come back and claim their account
(however that is done) they immediately want to change their identity, as they
didn't actually choose the one we assigned. We lose necessary history in
phabricator for their account from the point of migration to the point of
claim. When we looked at doing this across the many, many, many accounts from
bugzilla, it was not a username shuffle game that seemed sane to get into. It
gets weirder as names from this huge swath we have taken up are chosen by
actual users later, and the history of who is really who becomes problematic.
The other outcome that is preferable in phabricator now is that you know who
is a real person. Of those bugzilla accounts there will not be a 1:1
translation to phabricator accounts, but if we mock up a bunch of dummy
accounts and link their emails it is hard to tell who has //actually//
registered in this system. That is not even taking into account the
questionable nature of moving users' emails unsolicited to a system they never
signed up for. That stuff could be handled in a tactful way most likely, but
the mess of 'who is really here?' is actually more substantial than it seems at
first.
Then there is the problem of how users can claim accounts. We could make a
fake auth provider, we could set up some special page that handles it on its own
and creates an associated account in phab after we validate an email, or we
could write a real auth provider for bugzilla and/or RT. If this were desirable
(because we had created our army of dummy accounts), the logic for it would be a
merge headache for as long as we want to allow this claiming to happen, which
ideally is forever, even if someday we don't worry about their comment history
coming along or whatever. Any way we figured it, there was an ongoing cost we
would prefer not to incur. One of the small advantages @mmodell and I have is
that we have maintained phabricator before, and we know that upstream moves
pretty fast. They encourage weekly merges, and the longest spread I know of is
6 months (Facebook), which seems like kind of a support disaster for upstream.
That kind of rolling release requires an ever-escalating amount of time as
local patches stack up or get complicated, and coming out of the gate that way
wasn't taken lightly.
A big part of my thinking on this has been, what is going to put us in the best
place in 6 months or a year? The first year is going to be a pain no matter
what, but after that bumpy ride which approach leaves us in the best position
for easy tracking of upstream and low ops intervention for future users who are
late, or very late to the party? So our conclusion on that front was to find a
way to bend the native internals of phabricator to provide the needed logic,
and then to shim a cron and a bit of metadata swapping. The idea of static
comments, and users claiming their history, may seem unintuitive, but out of
the box on the first day of migration anyone can pick up any ticket and it
makes sense. Also though, if one or thirty thousand people sign up we may
trail a bit on associating their old history, but we can do it seamlessly from
several historical sources, and in the end, after the first/second/third waves
everything will be "right" -- or right enough.
One of the pieces of metadata that has been critical is dependency
relationships between tickets, but these don't follow a straight line through
history. Issue 1 can depend on issue 200, and issue 300 can depend on issue 3.
When importing a series of many thousands of tickets it isn't possible to do
this linking correctly on the first pass. If you create an issue and the things
it depends on are further down the line, a second wave of patch-up massaging is
likely necessary. This reality was really the impetus for the current
approach. It was necessary and turned out to be not that difficult -- at least
comparatively.
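As a minimal sketch of that two-pass idea (an illustration only, not our actual
import scripts): the first pass creates every task and records an old-id to
new-id map, and the second pass resolves dependencies once every target is
known to exist. The dict-based ticket shape and the name `import_tickets` are
assumptions made up for this example.

```python
def import_tickets(source_tickets):
    """Two-pass import: create every task first, then link dependencies."""
    id_map = {}  # old tracker id -> new task id
    tasks = {}   # new task id -> task record (stand-in for the real store)

    # Pass 1: create tasks without any dependency links.
    for new_id, ticket in enumerate(source_tickets, start=1):
        id_map[ticket["id"]] = new_id
        tasks[new_id] = {"title": ticket["title"], "depends_on": []}

    # Pass 2: every imported task now exists, so links can be resolved.
    unresolved = []
    for ticket in source_tickets:
        task = tasks[id_map[ticket["id"]]]
        for old_dep in ticket.get("depends_on", []):
            if old_dep in id_map:
                task["depends_on"].append(id_map[old_dep])
            else:
                # Target was never imported; keep it for a later patch-up run.
                unresolved.append((ticket["id"], old_dep))
    return tasks, id_map, unresolved


# Hypothetical data mirroring the example above: 1 -> 200 and 300 -> 3.
source = [
    {"id": 1, "title": "Issue 1", "depends_on": [200]},
    {"id": 3, "title": "Issue 3"},
    {"id": 200, "title": "Issue 200"},
    {"id": 300, "title": "Issue 300", "depends_on": [3]},
]
tasks, id_map, unresolved = import_tickets(source)
```

The forward reference (1 depends on 200) and the backward one (300 depends on
3) are handled identically, because nothing is linked until everything exists.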
So whether any of that seems justifiable is probably subjective, but suffice to
say we came up with a thousand ways not to migrate accounts from multiple
historical sources. One of the things discussed early on for this approach of
'pulling the table cloth off the table with the dishes on it' metadata
massaging was "can we live with static comments?". @aklapper and I have
talked about it a lot, or more to the point I have bugged him about it a lot,
and the answer for the team was 'yes'. Static comments are not a usability
issue, and with the number and size of the other problems we have it is a
reasonable compromise. So the issue has never been "why can't we find a way to
make comments nicer?"; it has always been one of survivable, if not reasonable,
compromises. I don't think anyone on the phabricator team felt like it was a
deal breaker, and honestly, we have been doing a lot more than 4 people's worth
of work. If you try to search for issues, there is no sort or search
functionality that is lost with static comments. There is no way to search for
things you have commented on as such, because the act of commenting subscribes
you to an issue, and you can search for issues you are subscribed to. We are
migrating the "CC" or "Subscribers" list from bugzilla, so we didn't take issue
with the lack.
Why is comment metadata complex? Well it's more like it's not simple. If you
load a ticket you are fetching an object, and along with it comes a bunch of
transactions (in the phabricator sense). Those transactions can be adding
CCs, dependencies, or whatever, and comments too. Transactions can also spawn
other transactions. If you create a comment and you @mention a user, that
association is a transaction in and of itself. Transactions have globally
unique IDs, authors, and time metadata, and if their type is comment they
associate with a comment history that has the same. So to do really savvy
metadata fixup you have to change the right fields on all of those objects, and
the objects they spawn, and the interactions, while not opaque, can still be a
pain.
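A toy model of why the fixup fans out, purely as a sketch: the classes below
are simplified stand-ins I made up for illustration, not Phabricator's real
schema or API. The point is only that re-attributing one imported comment means
touching the transaction, its comment history, and anything it spawned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommentVersion:
    author: str
    created: int  # epoch seconds
    text: str


@dataclass
class Transaction:
    txn_id: str
    author: str
    created: int
    kind: str  # e.g. "comment", "mention", "subscribe" (hypothetical labels)
    comment_versions: List[CommentVersion] = field(default_factory=list)
    spawned: List["Transaction"] = field(default_factory=list)


def reassign_author(txn, placeholder, real_author):
    """Re-attribute a transaction tree; return how many records changed."""
    changed = 0
    if txn.author == placeholder:
        txn.author = real_author
        changed += 1
    for version in txn.comment_versions:
        if version.author == placeholder:
            version.author = real_author
            changed += 1
    for child in txn.spawned:
        changed += reassign_author(child, placeholder, real_author)
    return changed


# Hypothetical imported comment that @mentions someone.
mention = Transaction("txn-2", "import-bot", 1400000000, "mention")
comment = Transaction(
    "txn-1", "import-bot", 1400000000, "comment",
    comment_versions=[CommentVersion("import-bot", 1400000000, "hi @bar")],
    spawned=[mention],
)
changed = reassign_author(comment, "import-bot", "goo")
```

Even in this toy version a single comment is three records to update; the real
thing has more fields and more kinds of spawned objects.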
So that's some off-the-cuff back-story, and I'm not saying they are good
thoughts or bad thoughts, but they were //the// thoughts. Not really meant to
form the basis of an argument for or against any one thing, only to convey the
myriad number of issues. It's late so if I've said something preposterous I
apologize.
All that being said, I spent a portion of this week figuring out how we could
do it if comments are a priority. When @mmodell and I talked earlier we
thought the above was the best compromise, and I still honestly don't find
comment metadata to be all that compelling. However, after more exploring I
think it can be done in a semi-reasonable fashion.
________
So here is the proposal from the phabricator team :)
- We can set ourselves up to fix comment metadata in the nicest of nice ways
for users
- We don't know if we can do this first run during the actual initial migration
window; the time estimates didn't include this thinking. But certainly shortly
thereafter, and maybe say once a week, once a month, or every few months for
a while, we can batch process things to an amendable state.
- The 'batch run' thinking is twofold. We are not sure how doing this during
normal hours will impact things (in theory it is fine), and it involves having
to invalidate some remarkup caching so that updated things are shown
correctly. Anytime you start invalidating cache on a broad scale while users
are doing their thing, chaos can ensue, so at this point the thinking is
scheduled off-hours runs.
- After said batch runs, user foo will in theory be seamlessly integrated into
Phab, with their history from the source system identifiable as historical only
by its external reference id (fl, bz, rt, etc).
- I'm thinking 7 or 8 days of upfront work / testing to get this to a solid
state. Part of that is lots of testing of partial cache invalidation; Phab
only has tooling to support global invalidation, and that would be pretty
heavy-handed.
TASK DETAIL
https://phabricator.wikimedia.org/T572
To: chasemp
Cc: wikibugs-l, chasemp, Aklapper, Qgil, mmodell, Eloquence, faidon, RobLa-WMF,
mark, jeremyb