Hi!

I am working with a large network dataset consisting of source-target
pairs stored in a tibble. I now need to transform this directed dataset
into an undirected one. This means I need to keep only one
instance of each pair with the same "nodes". In other words, if my data
has one row with A (source) and B (target) and another with B (source) and
A (target), only the pair A-B should be kept.

Here is an example of how I have solved this problem so far:

--- snip ---

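library(tibble)  # tibble()
library(dplyr)   # %>%, mutate(), distinct(), select()

# Create some data
x <- tibble(Source = rep(1:3, 4),
            Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))
x       # print original data

# Remove "undirected" duplicates:
# 1. build an order-independent key per row ("1-2" for both 1->2 and 2->1),
# 2. keep the first row for each key,
# 3. split the key back into Source and Target (both come back as character).
x <- x %>%
  mutate(pair = mapply(function(s, t) paste0(sort(c(s, t)), collapse = "-"),
                       Source, Target)) %>%
  distinct(pair, .keep_all = TRUE) %>%
  mutate(Source = sapply(strsplit(pair, split = "-"), `[`, 1),
         Target = sapply(strsplit(pair, split = "-"), `[`, 2)) %>%
  select(-pair)

x       # print cleaned data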

--- snip ---

A nice feature of my solution is that it can also produce weighted
pairs. One just needs to replace the 'distinct(...)' call with
'count(pair)'.
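
For illustration, the weighted variant could look roughly like the sketch
below. It reuses the toy data from the example; the column name "weight"
and the object name "weighted" are placeholders chosen here, not part of
the original code.

```r
library(tibble)
library(dplyr)

# Same toy data as in the example above
x <- tibble(Source = rep(1:3, 4),
            Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))

weighted <- x %>%
  # order-independent key: 1->2 and 2->1 both become "1-2"
  mutate(pair = mapply(function(s, t) paste0(sort(c(s, t)), collapse = "-"),
                       Source, Target)) %>%
  # count rows per key instead of keeping only the first one
  count(pair, name = "weight") %>%
  # split the key back into its two endpoints (character columns)
  mutate(Source = sapply(strsplit(pair, split = "-"), `[`, 1),
         Target = sapply(strsplit(pair, split = "-"), `[`, 2)) %>%
  select(Source, Target, weight)

weighted
```

Each row then holds one undirected pair together with the number of
directed rows that collapsed into it.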

I have done a lot of searching but have not found any function providing
this functionality. Does anyone know of an alternative, perhaps a more
efficient function or solution?
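
One possible direction, given only as a sketch: if the node IDs are
numeric, pmin() and pmax() can put the smaller endpoint first and the
larger second without any string pasting or row-wise function calls. The
helper columns "a" and "b" and the object name "undirected" are
placeholders, and whether this is actually faster on large data is an
assumption that has not been benchmarked here.

```r
library(tibble)
library(dplyr)

# Same toy data as in the example above
x <- tibble(Source = rep(1:3, 4),
            Target = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)))

undirected <- x %>%
  # temporary columns, so that Source is not overwritten before it is reused
  mutate(a = pmin(Source, Target),    # smaller endpoint of each pair
         b = pmax(Source, Target)) %>%  # larger endpoint of each pair
  distinct(a, b) %>%                  # one row per undirected pair
  rename(Source = a, Target = b)

undirected
```

Replacing 'distinct(a, b)' with 'count(a, b)' in this sketch would give
the weighted version, and the columns stay numeric throughout.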

Best,

Kimmo Elo


-- 
Dr. Kimmo Elo
Senior researcher in European Studies
=====================================================
University of Turku
Centre for Parliamentary Studies
Finland
E-mail: kimmo....@utu.fi
=====================================================