Thanks Dan. I told Toby he should subscribe to this list :)

Regarding #1, another option could be to:
• only allow user_ids as keys (after all, most JSON consumers prefer to work with user_ids) but add user_names as attributes
• return both user_ids and user_names as separate columns in flat CSVs.
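
A minimal sketch of the two output shapes listed above, for illustration only. It is not Wikimetrics code: the `results` structure, the `edits` metric, the example user names, and the helper names `to_json` and `to_csv` are all assumptions made up for this example.

```python
import csv
import io
import json

# Hypothetical report results keyed by user_id (all values are invented).
results = {
    1001: {"user_name": "ExampleUserA", "edits": 12},
    1002: {"user_name": "ExampleUserB", "edits": 3},
}


def to_json(results):
    """JSON output: user_id stays the key, user_name is just another attribute."""
    return json.dumps({str(uid): row for uid, row in results.items()}, sort_keys=True)


def to_csv(results):
    """Flat CSV output: user_id and user_name appear as separate columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user_id", "user_name", "edits"])
    for uid in sorted(results):
        row = results[uid]
        writer.writerow([uid, row["user_name"], row["edits"]])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_json(results))
    print(to_csv(results))
```

Keeping user_id as the key in both shapes means a renamed user still maps to the same row, while the user_name column gives people something readable to follow up on.
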
Either way, it sounds like this would be a great addition for Wikimetrics customers.

Dario

On Nov 26, 2013, at 5:01 PM, Dan Andreescu <[email protected]> wrote:

> These things have definitely been discussed before, so it's time to get them
> prioritized. CC-ed Toby directly so he can follow up:
>
> 1. wikimetrics should allow user_name to be the key in report outputs. Right
> now, only user_id is allowed and this is not great. LiAnna, Jaime, and
> Jessie are definitely interested in this, and have mentioned it a few times.
> 2. wikimetrics should allow "generated cohorts" as implemented by user
> metrics api. These are cohorts defined by reports on other cohorts. For
> example, if we run report R on cohort C, then generated cohort (GC) would be:
> GC = {user | user in C and R(user) is true}. Dario is definitely interested
> in this, and Jaime might be as well.
>
>
> On Tue, Nov 26, 2013 at 6:59 PM, LiAnna Davis <[email protected]> wrote:
> I would LOVE it if the output gave user names instead of user IDs. Often the
> data makes me want to investigate the individual stories of contributors who
> added a lot of content/made a lot of edits/etc., but there's no way of doing
> that with user IDs since I can't convert user IDs to usernames.
>
>
> On Tue, Nov 26, 2013 at 2:46 PM, Dario Taraborelli
> <[email protected]> wrote:
> thanks for the clarification Jaime – it sounds like we should consider
> adding user_names to the output if this is the main cause of the problem
> instead of building functionality at the input to deal with this. Dan, any
> thoughts?
>
> BTW this notion of rerunning cohort analysis for members of a previous cohort
> who meet specific criteria is a use case that Product/Editor Engagement is
> also interested in. We used to call these “generated cohorts” in the old
> design plans for UserMetrics and I’d love if we revisited this feature
> request and its relative priority.
>
> D
>
> On Nov 26, 2013, at 2:35 PM, Jaime Anstee <[email protected]> wrote:
>
>> Missed the question back to me, sorry. Mixed cohorts might occur due to the
>> output as user IDs while collection is of usernames - say someone has
>> repeating events and has a csv output of data for those new users that were
>> retained at a certain activity level from Point A to B and then has new
>> cohort members opt in at Point B but only wants to include those that
>> already survived from Point A and new at Point B cohort members for
>> examining at another Point C. Without the output of usernames to create the
>> active Point B cohort separately this would make the Point C cohort a mix of
>> qualified user ids and new user names. There are several ways of dealing
>> with this, it was just the first scenario I could think of that could cause
>> this. Seems we still need to revisit the possibility of accessing usernames
>> as output, also for reasons of matching to other data points where most
>> users and most program leaders do not know user ids - Jaime
>>
>> --
>>
>> Jaime Anstee, Ph.D
>> Program Evaluation Specialist
>> Wikimedia Foundation
>> +1.415.839.6885 ext 6869
>> www.wikimediafoundation.org
>>
>> Imagine a world in which every single human being can freely share in the
>> sum of all knowledge. Help us make it a reality!
>> https://donate.wikimedia.org
>>
>>
>> On Fri, Nov 22, 2013 at 4:04 PM, Dario Taraborelli
>> <[email protected]> wrote:
>> that works for me, thanks!
>>
>> Jaime – can you give us more details on the use case for mixed cohorts that
>> you had in mind?
>>
>> On Nov 22, 2013, at 3:28 PM, Dan Andreescu <[email protected]> wrote:
>>
>>>
>>> So, for now, until I figure out how to fix this, it will always prefer
>>> user_names before user_ids.
>>>
>>> I think this is an argument for making users specify whether it's names
>>> or ids up front, and not allowing mixtures. Assuming it might be a mixture
>>> and looking for names first is almost certain to produce inaccurate results
>>> at some point. We have ids precisely to avoid collisions with names,
>>> allowing for renaming users, and other cases.
>>>
>>> Yep, I just learned this the hard way and made a fool of myself in front of
>>> a bunch of people I admire. So, I'd be glad if I'm the only one that this
>>> happens to. If nobody objects, I'm going to allow the user to select
>>> whether their cohort contains user_ids OR user_names, and strictly prohibit
>>> mixtures.
>>>
>>> _______________________________________________
>>> Wikimetrics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikimetrics
>
> --
> LiAnna Davis
> Wikipedia Education Program Communications Manager
> Wikimedia Foundation
> http://education.wikimedia.org
> (415) 839-6885 x6649
> [email protected]
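
For reference, the generated-cohort definition quoted in this thread, GC = {user | user in C and R(user) is true}, can be sketched in a few lines. This is only an illustration under stated assumptions, not the UserMetrics or Wikimetrics implementation: the function name `generated_cohort`, the `made_five_edits` report, and the edit counts are hypothetical.

```python
def generated_cohort(cohort, report):
    """Return GC = {user | user in C and R(user) is true}.

    `cohort` is any iterable of user_ids (C) and `report` is a callable
    that returns a truthy value for users who meet the criterion (R).
    """
    return {user for user in cohort if report(user)}


# Hypothetical data: edit counts per user_id, invented for this example.
edit_counts = {1001: 12, 1002: 3, 1003: 0}
cohort = {1001, 1002, 1003}


def made_five_edits(user_id):
    """Hypothetical report R: true if the user made at least five edits."""
    return edit_counts.get(user_id, 0) >= 5


active = generated_cohort(cohort, made_five_edits)

if __name__ == "__main__":
    print(sorted(active))
```

Because the result is itself a set of user_ids, it can be passed back in as the cohort for a later report, which is the rerun-on-survivors use case described in the thread.
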
_______________________________________________ Wikimetrics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikimetrics
