# building on David's suggestion
x <-
    data.frame(
        patient= 1:20 ,
        disease =
            sapply(
                pmin( 2 + rpois( 20 , 2 ) , 6 ) ,
                function( n )
                    paste0( sample( c('A','B','C','D','E','F') , n ) , collapse = "+" )
            )
    )

# break the diseases into a list, one entry per patient
y <- strsplit( as.character( x$disease ) , "\\+" )

# melt the list
library(reshape2)

z <- melt( y )

# re-name the columns in that result
names( z ) <- c( "disease" , "patient" )

# print the results to the screen
z
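
If you would rather not depend on reshape2, the same long layout can probably be built in base R with rep() and unlist(). This is only a sketch on a made-up three-patient input (the strings below are illustrative, not from the data above), and it assumes lengths() is available (R >= 3.2.0; sapply( y , length ) does the same job on older versions).

```r
# toy input: one "+"-separated disease string per patient (made-up values)
disease <- c( "F+D" , "A+B+C" , "C+A" )

# split each string into its individual diseases, one list entry per patient
y <- strsplit( disease , "+" , fixed = TRUE )

# repeat each patient index once per disease, then flatten the list
z2 <-
    data.frame(
        patient = rep( seq_along( y ) , lengths( y ) ) ,
        disease = unlist( y ) ,
        stringsAsFactors = FALSE
    )

# one row per patient-disease pair
z2
```

The column order differs from the melt() result, but the content is the same: one row per patient-disease pair.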

# compare the structure to `x` if you like
x
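
For the 1000-patient case in the original question, one option is to skip the "+"-pasted strings and draw the long ID/Disease table directly. This is a rough sketch, not the only way to do it: the seed value and the names n_patients, diseases, k and long are my own choices, and the Poisson-based count per patient is simply carried over from David's example.

```r
set.seed( 42 )  # arbitrary seed, only so the draw is repeatable

n_patients <- 1000
diseases <- c( 'A' , 'B' , 'C' , 'D' , 'E' , 'F' )

# number of diseases per patient: at least 2, capped at the 6 available
k <- pmin( 2 + rpois( n_patients , 2 ) , length( diseases ) )

# one row per patient-disease pair; sample() draws without replacement,
# so no patient gets the same disease twice
long <-
    data.frame(
        ID = rep( seq_len( n_patients ) , k ) ,
        Disease = unlist( lapply( k , function( n ) sample( diseases , n ) ) ) ,
        stringsAsFactors = FALSE
    )

# check the "at least two diseases" condition for every patient
counts <- table( long$ID )
stopifnot( length( counts ) == n_patients , all( counts >= 2 ) )

head( long )
```

A different count distribution can be swapped in for rpois() as long as every value of k stays between 2 and the number of diseases.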





On Wed, Jun 25, 2014 at 2:18 AM, Abhinaba Roy <abhinabaro...@gmail.com>
wrote:

> Hi David,
>
> I was thinking something like this:
>
> ID   Disease
> 1    A
> 2    B
> 3    A
> 1    C
> 2    D
> 5    A
> 4    B
> 3    D
> 2    A
> ..   ..
>
> How can this be done?
>
>
> On Wed, Jun 25, 2014 at 11:34 AM, David Winsemius <dwinsem...@comcast.net>
> wrote:
>
> >
> > On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:
> >
> > > Dear R helpers,
> > >
> > > I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> > > having suffered from various diseases in the past (say diseases
> > > A,B,C,D,E,F). The only condition imposed is that each patient should've
> > > suffered from *at least* two diseases. So my data frame will have two
> > > columns 'ID' and 'Disease'.
> > >
> > > I want to do a basket analysis with this data, where ID will be the
> > > identifier and we will establish rules based on the 'Disease' column.
> > >
> > > How can I generate this type of data in R?
> > >
> >
> > Perhaps something along these lines for 20 cases:
> >
> > > data.frame(patient=1:20, disease = sapply(pmin(2+rpois(20, 2), 6),
> > +     function(n) paste0( sample( c('A','B','C','D','E','F'), n), collapse="+" ) )
> > + )
> >    patient     disease
> > 1        1         F+D
> > 2        2     F+A+D+E
> > 3        3     F+D+C+E
> > 4        4     B+D+C+A
> > 5        5     D+A+F+C
> > 6        6       E+A+D
> > 7        7 E+F+B+C+A+D
> > 8        8   A+B+C+D+E
> > 9        9     B+E+C+F
> > 10      10         C+A
> > 11      11 B+A+D+E+C+F
> > 12      12         B+C
> > 13      13     A+D+B+E
> > 14      14 D+C+E+F+B+A
> > 15      15   C+F+D+E+A
> > 16      16       A+C+B
> > 17      17     C+D+B+E
> > 18      18         A+B
> > 19      19   C+B+D+E+F
> > 20      20       D+C+F
> >
> > > --
> > > Regards
> > > Abhinaba Roy
> > >
> > >       [[alternative HTML version deleted]]
> >
> > You should read the Posting Guide and learn to post in plain text
> > rather than HTML.
> > >
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> > --
> > David Winsemius
> > Alameda, CA, USA
> >
> >
>
>
> --
> Regards
> Abhinaba Roy
> Statistician
> Radix Analytics Pvt. Ltd
> Ahmedabad
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
