On 21 June 2011 17:52, Sebastian Schelter <[email protected]> wrote:
> I guess it depends on what features you want to use to detect those fake
> profiles.

Yes, sometimes networks can be copied wholesale. For example, see

http://en.wikipedia.org/wiki/Ex.plode.us
http://brainstorm.tribe.net/thread/34fb1a79-351d-4251-8318-829623c1c9cb

... when explode.us reproduced the entire social graph of tribe.net on a
new site. Thousands of 'genuine fakes'. From the user's point of view
these were perceived as fake copies of their real profile. From a data
structure point of view the graphs were identical, and you'd need to use
technologies like openid/oauth to address the relevant notion of
authenticity.

There is also sometimes mischief where a profile is copied as a way of
gaining the trust of the profile owner's friends.

But I guess you're looking more for spam accounts etc.? i.e. the victim is
a site, not a user.

> If you want to look at network features of the social graph there is not
> much Mahout has to offer currently. We had a patch starting a graph mining
> module recently but it's only at its very beginning.

This may be of interest:

http://www.amazon.ca/Understanding-Complex-Datasets-Mining-Decompositions/dp/1584888326

... there is a chapter in there on the use of graph decompositions for
social graph analysis, and on the kinds of preprocessing approaches that
have been adopted to make social relationships more 'visible' to later
processing. (The chapter seems to be online at
http://91-641.wiki.uml.edu/file/view/graphschapter.pdf though I've no idea
if it is meant to be.)

I'm curious how much of that could be handled within Mahout's framework,
but I've not got my head around the details (walk Laplacian etc.).

cheers,

Dan
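
For anyone else unsure what the "walk Laplacian" mentioned above refers to, here is a minimal sketch. It assumes the term means the usual random-walk Laplacian, L_rw = I - D^{-1} A, where A is the adjacency matrix of an undirected graph and D is the diagonal matrix of node degrees. That reading is an assumption; the code is not taken from the book chapter or from Mahout, and the function name and toy graph are made up for illustration.

```python
def walk_laplacian(adj):
    """Return L_rw = I - D^{-1} A for a dense adjacency matrix (list of lists).

    Assumes an undirected graph in which every node has at least one edge,
    since a node of degree zero would make D^{-1} undefined.
    """
    n = len(adj)
    laplacian = []
    for i in range(n):
        degree = sum(adj[i])
        if degree == 0:
            raise ValueError("node %d has no edges" % i)
        # Row i is the identity row minus the row of the random-walk
        # transition matrix D^{-1} A.
        row = [(1.0 if i == j else 0.0) - adj[i][j] / degree for j in range(n)]
        laplacian.append(row)
    return laplacian


# Hypothetical toy graph: a three-node path 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
L = walk_laplacian(path)
for row in L:
    print(row)
```

Each row of the result sums to zero, because each row of D^{-1} A sums to one. The preprocessing and decomposition steps the chapter discusses would operate on a matrix of this general kind, but how they map onto Mahout is left open here.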
