Dear R users
 
I'm a novice user of R and have no prior knowledge of social network 
analysis, so apologies if my question is trivial. I've spent a lot of time 
trying to solve this on my own without success, so I hope someone here can 
help me out. Cheers!
 
The dataset:
I'm trying to predict the existence of links (TRUE or FALSE) in a test set 
using a training set. Both data sets are in an "edgelist" format: each row is 
a directed edge, with the user ID in the 1st column pointing to the user ID in 
the 2nd column (see Figure 1 below). Using the AUC to evaluate performance, I 
am looking for the best algorithm to predict the existence of links in the 
test data (50% are true and the rest are false).
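
To make the setup concrete, here is a minimal sketch of loading an edge list into igraph and computing an AUC in base R. The toy data frame, the column names `from` and `to`, and the `auc` helper are assumptions made up for illustration; they are not my real data. `graph_from_data_frame` is the current igraph name for what older versions call `graph.data.frame`.

```r
library(igraph)

# Hypothetical toy edge list: column 1 is the source user, column 2 the target.
train_edges <- data.frame(
  from = c("105", "105", "105", "165", "165", "165"),
  to   = c("850956", "1073420", "1102667", "888346", "579649", "136665"),
  stringsAsFactors = FALSE
)

# Build a directed graph; vertex names are the user IDs as character strings.
g <- graph_from_data_frame(train_edges, directed = TRUE)

# AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks.
# `score` is a numeric vector, `label` a logical vector (TRUE = link exists).
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label)
  n_neg <- sum(!label)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores and labels, only to show the call.
example_auc <- auc(c(0.9, 0.8, 0.3, 0.1), c(TRUE, FALSE, TRUE, FALSE))
```

Because the AUC depends only on how the scores rank the test pairs, any score that is higher for more likely links can be evaluated this way; it does not have to be a calibrated probability.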
 
Figure 1:
> training
Vertices: 1133143 
Edges: 999 
Directed: TRUE 
Edges:
                        
[0]       105 ->  850956
[1]       105 -> 1073420
[2]       105 -> 1102667
[3]       165 ->  888346
[4]       165 ->  579649
[5]       165 ->  136665
etc.
 
I'm having problems obtaining probability scores for the links / edges, as 
most of the available scores are computed for nodes. Examples of this are the 
graph.knn and page.rank functions in igraph. 
 
So my questions are:
1) What do I need to do to obtain scores for the links instead of the nodes 
(I presume there is a data preparation step that I am missing)?
2) Which R package would be best for running the various techniques: 
Jaccard index, Adamic-Adar, common neighbours, PropFlow, etc.?
3) How do I implement a supervised learning method such as random forest (I am 
guessing I need to build a feature list, but again, how can I get the scores 
for the edges)? 
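
To illustrate what I mean by scores for edges rather than nodes, here is a rough sketch that computes three of the measures above for a pair of nodes, using only igraph's `neighbors` and `degree`. The toy graph, the helper names `nbrs` and `pair_features`, and the choice to ignore edge direction when collecting neighbours are all assumptions for illustration, not an established recipe. PropFlow is not covered.

```r
library(igraph)

# Hypothetical toy graph standing in for the training edge list.
edges <- data.frame(
  from = c("a", "a", "b", "b", "c", "d"),
  to   = c("c", "d", "c", "d", "e", "e"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Names of the neighbours of v, ignoring edge direction (an assumption).
nbrs <- function(g, v) unique(neighbors(g, v, mode = "all")$name)

# Pair-level features for a candidate link u -> v.
pair_features <- function(g, u, v) {
  nu <- nbrs(g, u)
  nv <- nbrs(g, v)
  common <- intersect(nu, nv)
  n_union <- length(union(nu, nv))
  # Adamic-Adar: sum of 1 / log(degree) over the common neighbours.
  aa <- if (length(common) > 0) {
    sum(1 / log(degree(g, common, mode = "all")))
  } else {
    0
  }
  c(common_neighbours = length(common),
    jaccard = if (n_union > 0) length(common) / n_union else 0,
    adamic_adar = aa)
}

# One row of features per candidate pair (e.g. the rows of the test set).
candidates <- data.frame(from = c("a", "a"), to = c("b", "c"),
                         stringsAsFactors = FALSE)
feat <- do.call(rbind, lapply(seq_len(nrow(candidates)), function(i) {
  pair_features(g, candidates$from[i], candidates$to[i])
}))
features <- data.frame(candidates, feat)
```

Each feature column can be used directly as a ranking score for the AUC, or the whole table, together with a TRUE/FALSE label for pairs whose status is known, could serve as input to a classifier such as a random forest. igraph also has a `similarity` function with `"jaccard"` and `"invlogweighted"` methods, but it returns a matrix over vertices, which may be too large to compute for all vertices of a graph with over a million nodes.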
 
I hope I've explained my questions well, but do let me know if more 
clarification is needed. 
 
Thanks in advance
Eu Jin                                    

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
