# build off of david's suggestion
x <-
    data.frame(
        patient = 1:20 ,
        disease = sapply( pmin( 2 + rpois( 20 , 2 ) , 6 ) , function( n ) paste0( sample( c('A','B','C','D','E','F') , n ) , collapse = "+" ) )
    )
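
# a possible variation, not part of david's suggestion: the same recipe
# scaled to the 1000 patients from the original question, followed by a check
# that every patient ends up with between two and six diseases.
# `n_patients`, `big` and `n_diseases` are names made up for this sketch.
n_patients <- 1000
big <-
    data.frame(
        patient = seq_len( n_patients ) ,
        disease = sapply( pmin( 2 + rpois( n_patients , 2 ) , 6 ) , function( n ) paste0( sample( c('A','B','C','D','E','F') , n ) , collapse = "+" ) ) ,
        stringsAsFactors = FALSE
    )

# count the diseases per patient by splitting on the literal "+"
n_diseases <- lengths( strsplit( big$disease , "+" , fixed = TRUE ) )

# stops with an error if any patient has fewer than two or more than six
stopifnot( all( n_diseases >= 2 ) , all( n_diseases <= 6 ) )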

# break the diseases into a list, one entry per patient
y <- strsplit( as.character( x$disease ) , "\\+" )

# melt the list
library(reshape2)
z <- melt( y )

# re-name the columns in that result
# ( melt() on a list returns `value` first, then `L1`, the list index )
names( z ) <- c( "disease" , "patient" )

# print the results to the screen
z

# compare the structure to `x` if you like
x


On Wed, Jun 25, 2014 at 2:18 AM, Abhinaba Roy <abhinabaro...@gmail.com> wrote:

> Hi David,
>
> I was thinking something like this:
>
> ID   Disease
>  1   A
>  2   B
>  3   A
>  1   C
>  2   D
>  5   A
>  4   B
>  3   D
>  2   A
> ..   ..
>
> How can this be done?
>
>
> On Wed, Jun 25, 2014 at 11:34 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> > On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:
> >
> > > Dear R helpers,
> > >
> > > I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> > > having suffered from various diseases in the past (say diseases
> > > A,B,C,D,E,F). The only condition imposed is that each patient should've
> > > suffered from *at least* two diseases. So my data frame will have two
> > > columns 'ID' and 'Disease'.
> > >
> > > I want to do a basket analysis with this data, where ID will be the
> > > identifier and we will establish rules based on the 'Disease' column.
> > >
> > > How can I generate this type of data in R?
> >
> > Perhaps something along these lines for 20 cases:
> >
> > > data.frame(patient=1:20, disease = sapply(pmin(2+rpois(20, 2), 6),
> > +     function(n) paste0( sample( c('A','B','C','D','E','F'), n), collapse="+" ) )
> > + )
> >    patient     disease
> > 1        1         F+D
> > 2        2     F+A+D+E
> > 3        3     F+D+C+E
> > 4        4     B+D+C+A
> > 5        5     D+A+F+C
> > 6        6       E+A+D
> > 7        7 E+F+B+C+A+D
> > 8        8   A+B+C+D+E
> > 9        9     B+E+C+F
> > 10      10         C+A
> > 11      11 B+A+D+E+C+F
> > 12      12         B+C
> > 13      13     A+D+B+E
> > 14      14 D+C+E+F+B+A
> > 15      15   C+F+D+E+A
> > 16      16       A+C+B
> > 17      17     C+D+B+E
> > 18      18         A+B
> > 19      19   C+B+D+E+F
> > 20      20       D+C+F
> >
> > > --
> > > Regards
> > > Abhinaba Roy
> > >
> > >         [[alternative HTML version deleted]]
> >
> > You should read the Posting Guide and learn to post in plain text.
> >
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > David Winsemius
> > Alameda, CA, USA
>
> --
> Regards
> Abhinaba Roy
> Statistician
> Radix Analytics Pvt. Ltd
> Ahmedabad

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.