Thanks Dan. I told Toby he should subscribe to this list :)

Regarding #1, another option could be to:
• only allow user_ids as keys (after all, most JSON consumers prefer to work 
with user_ids) but add user_names as attributes
• return both user_ids and user_names as separate columns in flat CSVs.
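
To make the two shapes concrete, here is a rough sketch in Python. The rows, the "edits" metric and the function names are all made up for illustration; this is not the actual Wikimetrics output schema or code.

```python
import csv
import io
import json

# Hypothetical report rows: (user_id, user_name, metric value).
rows = [
    (1001, "ExampleUserA", 12),
    (1002, "ExampleUserB", 0),
]


def to_json(rows):
    """Key each record by user_id and carry user_name as an attribute."""
    return json.dumps(
        {
            str(user_id): {"user_name": user_name, "edits": value}
            for user_id, user_name, value in rows
        },
        sort_keys=True,
    )


def to_csv(rows):
    """Emit user_id and user_name as separate columns in a flat CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user_id", "user_name", "edits"])  # assumed header names
    writer.writerows(rows)
    return buf.getvalue()
```

With this shape the JSON stays keyed on the stable identifier, and anyone reading the CSV can still match rows to usernames.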

Either way, it sounds like this would be a great addition for Wikimetrics 
customers.

Dario

On Nov 26, 2013, at 5:01 PM, Dan Andreescu <[email protected]> wrote:

> These things have definitely been discussed before, so it's time to get them 
> prioritized.  CC-ed Toby directly so he can follow up:
> 
> 1. wikimetrics should allow user_name to be the key in report outputs.  Right 
> now, only user_id is allowed and this is not great.  LiAnna, Jaime, and 
> Jessie are definitely interested in this, and have mentioned it a few times.
> 2. wikimetrics should allow "generated cohorts" as implemented by user 
> metrics api.  These are cohorts defined by reports on other cohorts.  For 
> example, if we run report R on cohort C, then generated cohort (GC) would be: 
> GC = {user | user in C and R(user) is true}.  Dario is definitely interested 
> in this, and Jaime might be as well.
> 
> 
> On Tue, Nov 26, 2013 at 6:59 PM, LiAnna Davis <[email protected]> wrote:
> I would LOVE it if the output gave user names instead of user IDs. Often the 
> data makes me want to investigate the individual stories of contributors who 
> added a lot of content/made a lot of edits/etc., but there's no way of doing 
> that with user IDs since I can't convert user IDs to usernames.
> 
> 
> 
> 
> On Tue, Nov 26, 2013 at 2:46 PM, Dario Taraborelli 
> <[email protected]> wrote:
> Thanks for the clarification, Jaime – it sounds like we should consider 
> adding user_names to the output if this is the main cause of the problem 
> instead of building functionality at the input to deal with this. Dan, any 
> thoughts?
> 
> BTW this notion of rerunning cohort analysis for members of a previous cohort 
> who meet specific criteria is a use case that Product/Editor Engagement is 
> also interested in. We used to call these “generated cohorts” in the old 
> design plans for UserMetrics and I’d love it if we revisited this feature 
> request and its relative priority.
> 
> D
> 
> On Nov 26, 2013, at 2:35 PM, Jaime Anstee <[email protected]> wrote:
> 
>> Missed the question back to me, sorry.  Mixed cohorts might occur because the 
>> output is user IDs while collection is of usernames. Say someone has 
>> repeating events and a CSV output of data for the new users who were 
>> retained at a certain activity level from Point A to Point B, then has new 
>> cohort members opt in at Point B, but only wants to include those who 
>> already survived from Point A plus the new Point B cohort members for 
>> examination at another Point C.  Without usernames in the output to create 
>> the active Point B cohort separately, the Point C cohort would be a mix of 
>> qualified user IDs and new usernames.  There are several ways of dealing 
>> with this; it was just the first scenario I could think of that could cause 
>> it.  It seems we still need to revisit the possibility of getting usernames 
>> as output, also for matching to other data points, since most 
>> users and most program leaders do not know user IDs - Jaime
>> 
>> -- 
>> 
>> Jaime Anstee, Ph.D
>> Program Evaluation Specialist
>> Wikimedia Foundation
>> +1.415.839.6885 ext 6869
>> www.wikimediafoundation.org
>> 
>> Imagine a world in which every single human being can freely share in the 
>> sum of all knowledge. Help us make it a reality!
>> https://donate.wikimedia.org
>> 
>> 
>> 
>> On Fri, Nov 22, 2013 at 4:04 PM, Dario Taraborelli 
>> <[email protected]> wrote:
>> that works for me, thanks! 
>> 
>> Jaime – can you give us more details on the use case for mixed cohorts that 
>> you had in mind?
>> 
>> On Nov 22, 2013, at 3:28 PM, Dan Andreescu <[email protected]> wrote:
>> 
>>> 
>>> So, for now, until I figure out how to fix this, it will always prefer 
>>> user_names before user_ids.
>>> 
>>> I think this is an argument for making users specify whether it's names 
>>> or ids up front, and not allowing mixtures. Assuming it might be a mixture 
>>> and looking for names first is almost certain to produce inaccurate results 
>>> at some point. We have ids precisely to avoid collisions with names, 
>>> allowing for renaming users, and other cases. 
>>> 
>>> Yep, I just learned this the hard way and made a fool of myself in front of 
>>> a bunch of people I admire.  So, I'd be glad if I'm the only one that this 
>>> happens to.  If nobody objects, I'm going to allow the user to select 
>>> whether their cohort contains user_ids OR user_names, and strictly prohibit 
>>> mixtures.
>>> 
>>> _______________________________________________
>>> Wikimetrics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikimetrics
>> 
>> 
> 
> 
> 
> 
> 
> 
> -- 
> LiAnna Davis
> Wikipedia Education Program Communications Manager
> Wikimedia Foundation
> http://education.wikimedia.org
> (415) 839-6885 x6649
> [email protected]
> 
