I think embeddings[1] would be a nice way to create a signature.
Essentially, we could dump data about a person's activities into it (words
added, namespaces edited, time of day of edits, temporal frequency of
editing, # of revisions per session, frequency of citation by type, etc.)
and get a signature that could represent several aspects of behavior.  The
vectors that come out of an embedding would allow us to provide a distance
measure between one editor's behavior and another editor's behavior.
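
To make the distance idea concrete, here is a minimal sketch.  It is not a
learned embedding: it hand-builds a small feature vector per editor,
standardizes each feature, and compares editors by cosine distance.  The
editor names, the four features, and all the numbers are made up for
illustration.

```python
import math

# Hypothetical per-editor activity features (names and values are invented):
# [share of edits in the main namespace, mean words added per edit,
#  mean revisions per session, share of edits made at night (UTC)]
editors = {
    "EditorA": [0.90, 35.0, 4.0, 0.10],
    "EditorB": [0.88, 33.0, 4.5, 0.12],
    "EditorC": [0.20, 5.0, 1.0, 0.70],
}


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / norm


names = list(editors)
vectors = dict(zip(names, standardize([editors[n] for n in names])))


def behavior_distance(a, b):
    """Distance between the behavioral signatures of two editors."""
    return cosine_distance(vectors[a], vectors[b])


for other in ("EditorB", "EditorC"):
    print("EditorA vs", other, round(behavior_distance("EditorA", other), 3))
```

A real embedding would learn the vector from the raw activity data rather
than from hand-picked features, but the comparison step would look much the
same.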

That said, I think it is more likely that we would be able to recognize
behavior that looks like that of experienced editors in general than to
match one specific editor who might be the primary account behind a sock
puppet.
Still, this might be useful for many aspects of newcomer support and
patrolling work.  E.g. if a new account looks like an experienced editor,
they might not need an invite to the Teahouse.  In fact such an editor
account may be a sock or a legitimate alternative account.  On the other
hand, if a newcomer account is getting reverted or warned a lot but doesn't
behave like an advertiser or a POV-pusher, we probably want to reach out to
them to help.
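
As a rough sketch of the "looks like an experienced editor" check, one
simple option is a nearest-centroid comparison: average the signatures of
known experienced editors and of typical newcomers, then see which average a
new account sits closer to.  This is only one possible approach, and the
signature vectors below are invented and assumed to be already scaled.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical, already-scaled behavioral signatures (values are made up).
experienced = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [1.0, 0.7, 0.8]]
typical_newcomers = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [0.0, 0.3, 0.2]]

experienced_center = centroid(experienced)
newcomer_center = centroid(typical_newcomers)


def looks_experienced(signature):
    """True if the signature is closer to the experienced-editor centroid."""
    return euclidean(signature, experienced_center) < euclidean(
        signature, newcomer_center
    )


print(looks_experienced([0.85, 0.80, 0.75]))
print(looks_experienced([0.10, 0.10, 0.10]))
```

A new account that comes out as "experienced" here is only a candidate for a
closer look by a person; as noted above, it could be a sock or a legitimate
alternative account.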

I'm really interested in investing in embedding-based strategies for
tracking the topic-space of content and clustering behaviors but I don't
have the resources on the Scoring Platform team[2] to do any sort of
serious engineering work with embeddings right now.  In the meantime, I'm
interested in talking to external researchers about collaborations and
possibly even short term contracts to dig into these types of modeling
problems.  If anyone out there is interested in that, please reach out.

For now, we're working on more rudimentary AIs that can help us
sort vandals from everyone else. :)

1. https://en.wikipedia.org/wiki/Embedding
2. https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team

-Aaron

On Fri, Aug 23, 2019 at 12:27 AM Kerry Raymond <[email protected]>
wrote:

> To reply to my own question .
>
> Can we find a way to create a "signature" of an account's pattern of
> editing? Perhaps it might be a set of signatures, maybe one for the
> categories that the account appears to be active in, another for the type
> of edit, etc. Then if these signatures were calculated for all banned
> accounts or currently blocked accounts (or at least ones with a long
> enough contribution history to make it worthwhile - we're not interested
> in one-edit vandals), then we could have a tool that could be run to
> quickly compare one account against the signatures of banned/blocked
> accounts as well as the cumulative edits of a set of known sockpuppets
> (i.e. treat them as a single account) to determine if this may be a
> sockpuppet case meriting further investigation. I imagine that it would be
> too expensive computationally to actually run comparisons of the
> contribution histories of all "bad guy" accounts against the suspicious
> account which is why I propose a "signature" approach (but I'm happy to be
> told otherwise).
>
> If we had such a tool and it proves reasonably reliable in identifying
> likely sockpuppets (not asking for guarantees but close enough not to be a
> waste of time to investigate), then we could routinely use it on new
> accounts or reactivating accounts (i.e. possible sleeper accounts) once
> they have a long enough editing history to enable the tool to operate
> effectively to provide automated early warning of new/reactivating
> accounts appearing suspiciously similar to "bad guy" accounts.
>
> But this is a hard problem, both technically and socially (Assume Good
> Faith, Privacy, etc), so I welcome the thoughts of others.
>
> Kerry
>
> _______________________________________________
> Wiki-research-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


-- 

Aaron Halfaker

Principal Research Scientist

Head of the Scoring Platform team
Wikimedia Foundation