Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27, Ahmed Attia
Thanks Bert, worked nicely. Yes, genotypes with only one ID will be eliminated before partitioning the data.

Best regards,
Ahmed Attia

On Mon, Aug 27, 2018 at 8:09 PM, Bert Gunter wrote:
> Just partition the unique stand_ID's and select on them using %in% , say:
>
> id <-
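Dropping genotypes that have only one stand_ID, as described above, can be done before any splitting. A minimal sketch, assuming a data frame with `Genotype` and `stand_ID` columns; the toy values are made up for illustration and are not the original data.

```r
# Hypothetical toy data standing in for dataGenotype (values are made up)
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "H14", "H14", "H15"),
  stand_ID = c(7, 18, 21, 3, 3, 9),
  stemC    = c(1.2, 1.5, 1.1, 2.0, 2.2, 0.9)
)

# Number of distinct stand_IDs per genotype
n_ids <- tapply(dataGenotype$stand_ID, dataGenotype$Genotype,
                function(v) length(unique(v)))

# Keep only genotypes with at least two distinct stand_IDs,
# so each remaining genotype can contribute to both training and testing
keep <- names(n_ids)[n_ids > 1]
filtered <- dataGenotype[dataGenotype$Genotype %in% keep, ]
filtered
```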

Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27, Bert Gunter
Sorry, my bad (careless reading): you need to do the partitioning within genotype. Something like:

by(dataGenotype, dataGenotype$Genotype, function(x) {
  u <- unique(x$stand_ID)
  tst <- x$stand_ID %in% sample(u, floor(length(u)/2))
  list(test = x[tst, ], train = x[!tst, ])
})

This will give a
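The by() call returns one list(test, train) per genotype. If single test and training data frames are wanted, the per-genotype pieces can be stacked with do.call(rbind, ...). A minimal sketch, assuming `Genotype` and `stand_ID` columns; the toy data are made up, and set.seed() is added only so the random split is reproducible.

```r
# Hypothetical toy data; column names follow the thread, values are made up
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H14"), each = 4),
  stand_ID = c(7, 7, 18, 21, 3, 20, 32, 32),
  stemC    = c(1.2, 1.3, 1.5, 1.1, 2.0, 2.1, 2.2, 2.3)
)

set.seed(1)  # make sample() reproducible

# Split stand_IDs roughly in half within each genotype
parts <- by(dataGenotype, dataGenotype$Genotype, function(x) {
  u <- unique(x$stand_ID)
  tst <- x$stand_ID %in% sample(u, floor(length(u) / 2))
  list(test = x[tst, ], train = x[!tst, ])
})

# Stack the per-genotype pieces back into one test and one train set
test  <- do.call(rbind, lapply(parts, function(p) p$test))
train <- do.call(rbind, lapply(parts, function(p) p$train))
```

Because whole stands are sampled inside each genotype, every genotype with at least two stand_IDs appears in both sets, and no Genotype/stand_ID combination is split across them.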

Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27, MacQueen, Don via R-help
And yes, I ignored Genotype, but for the example data none of the stand_ID values are present in more than one Genotype, so it doesn't matter. If that's not true in general, then constructing the grp variable is a little more complex, but the principle is the same.

--
Don MacQueen
Lawrence
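When the same stand_ID can occur under more than one Genotype, one common way to build grp (not taken from the message above) is a composite key that pastes both columns together, so that e.g. H13/7 and H14/7 count as different stands. A minimal sketch with made-up data; the `training_keys` choice is purely illustrative.

```r
# Hypothetical toy data in which stand_ID 7 occurs under two genotypes
mydata <- data.frame(
  Genotype = c("H13", "H13", "H14", "H14"),
  stand_ID = c(7, 18, 7, 20),
  stemC    = c(1.2, 1.5, 2.0, 2.1)
)

# Composite key: one value per Genotype/stand_ID combination
key <- paste(mydata$Genotype, mydata$stand_ID, sep = "_")

# Same idea as the grp vector, but selecting stands by key
training_keys <- c("H13_7", "H14_20")   # illustrative choice only
grp <- ifelse(key %in% training_keys, "A-training", "B-testing")
result <- split(mydata, grp)
result
```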

Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27, MacQueen, Don via R-help
You could start with split():

grp <- rep('', nrow(mydata))
grp[mydata$stand_ID %in% c(7, 9, 67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3, 18, 20, 21, 32)] <- 'B-testing'
split(mydata, grp)

or perhaps

grp <- ifelse(mydata$stand_ID %in% c(7, 9, 67), 'A-training', 'B-testing')
split(mydata, grp)
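The two grp constructions above are not quite equivalent. With the first, a stand_ID listed in neither vector keeps the empty label '' and ends up in a third group; ifelse() sends everything not in the training list to testing. A small check, assuming a one-column toy data frame; stand_ID 99 is hypothetical and added only to show the difference.

```r
mydata <- data.frame(stand_ID = c(7, 9, 67, 3, 18, 99))

# Version 1: explicit labels; unlisted IDs (here 99) keep the '' label
grp1 <- rep('', nrow(mydata))
grp1[mydata$stand_ID %in% c(7, 9, 67)] <- 'A-training'
grp1[mydata$stand_ID %in% c(3, 18, 20, 21, 32)] <- 'B-testing'

# Version 2: ifelse(); anything not in the training list is testing
grp2 <- ifelse(mydata$stand_ID %in% c(7, 9, 67), 'A-training', 'B-testing')

names(split(mydata, grp1))  # "" "A-training" "B-testing"
names(split(mydata, grp2))  # "A-training" "B-testing"
```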

Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27, Bert Gunter
Just partition the unique stand_IDs and select on them using %in%, say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst  ## logical vector
test <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

There are a million variations on this
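One possible variation, sketched as a hypothetical helper `split_by_stand()` (not from the thread): make the test fraction a parameter and draw positions with sample.int(), which sidesteps the documented sample() pitfall where a single numeric value n is treated as 1:n. The toy data are made up.

```r
# Hypothetical toy data; only the column names come from the thread
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "H14", "H14", "H14"),
  stand_ID = c(7, 18, 21, 3, 20, 20),
  stemC    = c(1.2, 1.5, 1.1, 2.0, 2.2, 2.4)
)

# Hypothetical helper: put a fraction of whole stands into the test set
split_by_stand <- function(df, frac = 0.5) {
  id  <- unique(df$stand_ID)
  # Sample positions rather than values, so a length-one id is safe
  tst <- id[sample.int(length(id), floor(length(id) * frac))]
  wh  <- df$stand_ID %in% tst          # logical vector over rows
  list(test = df[wh, ], train = df[!wh, ])
}

set.seed(42)  # reproducible sample.int()
res <- split_by_stand(dataGenotype)

# Whole stands only: this should be empty
intersect(res$test$stand_ID, res$train$stand_ID)
```

Sampling stands rather than rows keeps all repeated measurements of a stand (e.g. several Inventory_date rows) on the same side of the split.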

[R] r-data partitioning considering two variables (character and numeric)

2018-08-27, Ahmed Attia
I would like to partition the following dataset (dataGenotype) based on two variables, Genotype and stand_ID. For example, for Genotype H13, stand_ID 7 may go to training and stand_IDs 18 and 21 may go to testing.

Genotype  stand_ID  Inventory_date  stemC  mheight
H13