Dear R users I'm a novice user of R and have absolutely no prior knowledge of social network analysis, so apologies if my question is trivial. I've spent alot of time trying to solve this on my own but I really can't so hope someone here can help me out. Cheers! The dataset: I'm trying to predict the existance of links (True or False) in a test set using a training set. Both data sets are in an "edgelist" format, where User IDs represents nodes in both columns with the 1st column directing to the 2nd column (see figure 1 below). Using the AUC to evaluate the performance, I am looking for the best algorithm to predict the existance of links in the test data (50% are true and rest are false). Figure 1: > training Vertices: 1133143 Edges: 999 Directed: TRUE Edges: [0] 105 -> 850956 [1] 105 -> 1073420 [2] 105 -> 1102667 [3] 165 -> 888346 [4] 165 -> 579649 [5] 165 -> 136665 etc.. I'm having problems obtaining the probability scores for the links / edges as most of the scores are for the nodes. An example of this is the graph.knn and page.rank module in igraph. So my questions are: 1) What do I need to do to obtain the scores for the links instead of the nodes (I presume it must be a data preparation step that I must be missing out)? 2) Which R package would be the best for running the various techniques - Jackard index, Adamic-Adar, common neightbours, PropFlow, etc 3) How to implement a supervised learning method such as random forest (I am guessing I need to obtain a feature list but again, how can I get the scores for the edges)? Hope I've explain my questions well but do let me know if more clarification is need. Thanks in advance Eu Jin [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.