Thanks Sarah, one of my column names was missing a letter, so it was throwing things off. It works super fast now and is exactly what I needed. My actual data set has about 6 other ancillary response data columns. Is there a way to combine the 'full' data set I just created with the original, in case I need any of the other response variables? E.g.

FULL:                 Original:                          Combined:
site year sample      site year sample color shape       site year sample color shape
1    1    10          1    1    10     blue  diamond     1    1    10     blue  diamond
1    1    12          1    1    12     green pyramid     1    1    12     green pyramid
1    1    NA                                             1    1    NA     NA    NA

Thanks

On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Yeah, that's tiny:
>
> > fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> > dim(fullout)
> [1] 14049     3
>
> Almost certainly the problem is that your expand.grid result doesn't
> have the same column names as your actual data file, so merge() is
> trying to make an enormous result. Note how when I made outgrid in the
> example I named the columns.
>
> Make sure that the names are identical!
>
>
> On Tue, Mar 10, 2015 at 4:57 PM, Curtis Burkhalter
> <curtisburkhal...@gmail.com> wrote:
> > Sarah,
> >
> > I have 669 sites and each site has 7 years of data, so if I'm thinking
> > correctly then there should be 4683 possible combinations of site x year.
> > For each year though I need 3 sampling periods so that there is something
> > like the following:
> >
> > site 1 year1 sample 1
> > site 1 year1 sample 2
> > site 1 year1 sample 3
> > site 2 year1 sample 1
> > site 2 year1 sample 2
> > site 2 year1 sample 3.....
> > site 669 year7 sample 1
> > site 669 year7 sample 2
> > site 669 year7 sample 3.
> >
> > I have my max memory allocation set to the amount of RAM (8GB) on my laptop,
> > but it still 'times out' due to memory problems.
> >
> > On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee <sarah.gos...@gmail.com>
> > wrote:
> >>
> >> You said your data only had 14000 rows, which really isn't many.
> >>
> >> How many possible combinations do you have, and how many do you need to
> >> add?
> >>
> >> On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
> >> <curtisburkhal...@gmail.com> wrote:
> >> > Sarah,
> >> >
> >> > This strategy works great for this small dataset, but when I attempt your
> >> > method with my data set I reach the maximum allowable memory allocation and
> >> > the operation just stalls and then stops completely before it is finished.
> >> > Do you know of a way around this?
> >> >
> >> > Thanks
> >> >
> >> > On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee <sarah.gos...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I didn't work through your code, because it looked overly complicated.
> >> >> Here's a more general approach that does what you appear to want:
> >> >>
> >> >> # use dput() to provide reproducible data please!
> >> >> comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird",
> >> >> "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
> >> >> "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
> >> >> 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
> >> >> 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
> >> >> )), .Names = c("animals", "animalYears", "animalMass"), class = "data.frame",
> >> >> row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
> >> >> "13", "14", "15", "16"))
> >> >>
> >> >>
> >> >> # add reps to comAn
> >> >> # assumes comAn is already sorted on animals, animalYears
> >> >> comAn$reps <- unlist(sapply(rle(do.call("paste",
> >> >> comAn[,1:2]))$lengths, seq_len))
> >> >>
> >> >> # create full set of combinations
> >> >> outgrid <- expand.grid(animals=unique(comAn$animals),
> >> >> animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
> >> >> stringsAsFactors=FALSE)
> >> >>
> >> >> # combine with comAn
> >> >> comAn.full <- merge(outgrid, comAn, all.x=TRUE)
> >> >>
> >> >> > comAn.full
> >> >>    animals animalYears reps animalMass
> >> >> 1     bird           1    1         29
> >> >> 2     bird           1    2         48
> >> >> 3     bird           1    3         36
> >> >> 4     bird           2    1         20
> >> >> 5     bird           2    2         34
> >> >> 6     bird           2    3         34
> >> >> 7      cat           1    1         46
> >> >> 8      cat           1    2         33
> >> >> 9      cat           1    3         48
> >> >> 10     cat           2    1         21
> >> >> 11     cat           2    2         NA
> >> >> 12     cat           2    3         NA
> >> >> 13     dog           1    1         21
> >> >> 14     dog           1    2         28
> >> >> 15     dog           1    3         25
> >> >> 16     dog           2    1         35
> >> >> 17     dog           2    2         18
> >> >> 18     dog           2    3         11
> >> >> >
> >> >>
> >> >> On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
> >> >> <curtisburkhal...@gmail.com> wrote:
> >> >> > Hey everyone,
> >> >> >
> >> >> > I've written a function that adds NAs to a dataframe where data is missing
> >> >> > and it seems to work great if I only need to run it once, but if I run it
> >> >> > two times in a row I run into problems. I've created a workable example to
> >> >> > explain what I mean and why I would do this.
> >> >> >
> >> >> > In my dataframe there are areas where I need to add two rows of NAs (b/c I
> >> >> > need to have 3 animal x year combos and for cat in year 2 I only have one)
> >> >> > so I thought that I'd just run my code twice using the function in the code
> >> >> > below. Everything works great when I run it the first time, but when I run
> >> >> > it again it says that the value returned to the list 'x' is of length 0. I
> >> >> > don't understand why the function works the first time around and adds an
> >> >> > NA to the 'animalMass' column, but won't do it again. I've used
> >> >> > (print(str(dataframe)) to see if there is a change in class or type when
> >> >> > the function runs through the original dataframe and there is for
> >> >> > 'animalYears', but I just convert it back before rerunning the function for
> >> >> > second time.
> >> >> >
> >> >> > Any thoughts on this would be greatly appreciated b/c my actual data
> >> >> > dataframe I have to input into WinBUGS is 14000x12, so it's not a trivial
> >> >> > thing to just add in an NA here or there.
> >> >> >
> >> >> >>comAn
> >> >> >    animals animalYears animalMass
> >> >> > 1     bird           1         29
> >> >> > 2     bird           1         48
> >> >> > 3     bird           1         36
> >> >> > 4     bird           2         20
> >> >> > 5     bird           2         34
> >> >> > 6     bird           2         34
> >> >> > 7      dog           1         21
> >> >> > 8      dog           1         28
> >> >> > 9      dog           1         25
> >> >> > 10     dog           2         35
> >> >> > 11     dog           2         18
> >> >> > 12     dog           2         11
> >> >> > 13     cat           1         46
> >> >> > 14     cat           1         33
> >> >> > 15     cat           1         48
> >> >> > 16     cat           2         21
> >> >> >
> >> >> > So every animal has 3 measurements per year, except for the cat in year two
> >> >> > which has only 1. I run the code below and get:
> >> >> >
> >> >> > #combs defines the different combinations of
> >> >> > #animals and animalYears
> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> > #counts defines how long the different combinations are
> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> > #missing defines the combs that have length less than one and puts it in
> >> >> > #the data frame missing
> >> >> > missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
> >> >> >
> >> >> > genRows<-function(dat){
> >> >> >   vals<-strsplit(dat[1],':')[[1]]
> >> >> >   #not sure why dat[2] is being converted to a string
> >> >> >   newRows<-2-as.numeric(dat[2])
> >> >> >   newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >                     animalYears=rep(vals[2],newRows),
> >> >> >                     animalMass=rep(NA,newRows))
> >> >> >   return(newDf)
> >> >> > }
> >> >> >
> >> >> >
> >> >> > x<-apply(missing,1,genRows)
> >> >> > comAn=rbind(comAn,
> >> >> >             do.call(rbind,x))
> >> >> >
> >> >> >> comAn
> >> >> >    animals animalYears animalMass
> >> >> > 1     bird           1         29
> >> >> > 2     bird           1         48
> >> >> > 3     bird           1         36
> >> >> > 4     bird           2         20
> >> >> > 5     bird           2         34
> >> >> > 6     bird           2         34
> >> >> > 7      dog           1         21
> >> >> > 8      dog           1         28
> >> >> > 9      dog           1         25
> >> >> > 10     dog           2         35
> >> >> > 11     dog           2         18
> >> >> > 12     dog           2         11
> >> >> > 13     cat           1         46
> >> >> > 14     cat           1         33
> >> >> > 15     cat           1         48
> >> >> > 16     cat           2         21
> >> >> > 17     cat           2       <NA>
> >> >> >
> >> >> > So far so good, but then I adjust the code so that it reads (**notice the
> >> >> > change in the specification in 'missing' to counts<3**):
> >> >> >
> >> >> > #combs defines the different combinations of
> >> >> > #animals and animalYears
> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> > #counts defines how long the different combinations are
> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> > #missing defines the combs that have length less than one and puts it in
> >> >> > #the data frame missing
> >> >> > missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
> >> >> >
> >> >> > genRows<-function(dat){
> >> >> >   vals<-strsplit(dat[1],':')[[1]]
> >> >> >   #not sure why dat[2] is being converted to a string
> >> >> >   newRows<-2-as.numeric(dat[2])
> >> >> >   newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >                     animalYears=rep(vals[2],newRows),
> >> >> >                     animalMass=rep(NA,newRows))
> >> >> >   return(newDf)
> >> >> > }
> >> >> >
> >> >> >
> >> >> > x<-apply(missing,1,genRows)
> >> >> > comAn=rbind(comAn,
> >> >> >             do.call(rbind,x))
> >> >> >
> >> >> > The result for 'x' then reads:
> >> >> >
> >> >> >> x
> >> >> > [[1]]
> >> >> > [1] animals     animalYears animalMass
> >> >> > <0 rows> (or 0-length row.names)
> >> >> >
> >> >> > Any thoughts on why it might be doing this instead of adding an additional
> >> >> > row to get the result:
> >> >> >
> >> >> >> comAn
> >> >> >    animals animalYears animalMass
> >> >> > 1     bird           1         29
> >> >> > 2     bird           1         48
> >> >> > 3     bird           1         36
> >> >> > 4     bird           2         20
> >> >> > 5     bird           2         34
> >> >> > 6     bird           2         34
> >> >> > 7      dog           1         21
> >> >> > 8      dog           1         28
> >> >> > 9      dog           1         25
> >> >> > 10     dog           2         35
> >> >> > 11     dog           2         18
> >> >> > 12     dog           2         11
> >> >> > 13     cat           1         46
> >> >> > 14     cat           1         33
> >> >> > 15     cat           1         48
> >> >> > 16     cat           2         21
> >> >> > 17     cat           2       <NA>
> >> >> > 18     cat           2       <NA>
> >> >> >
> >> >> > Thanks
> >> >> > --
> >> >> > Curtis Burkhalter
> >> >
> >> >
>

--
Curtis Burkhalter
https://sites.google.com/site/curtisburkhalter/

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
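
On the question at the top of this thread (keeping the ancillary columns when building the padded data set), here is a minimal sketch, not a definitive answer. It assumes the extra columns live in the same data frame as the key columns, and it uses a toy `original` with made-up `color` and `shape` columns plus a helper `reps` index in the style of the approach quoted above; all of these names are illustrative. merge() joins on the key columns and carries every other column of the original along, so the padded rows get NA in all of the response columns at once.

```r
# Toy stand-in for the "Original" table; color and shape play the role of
# the ancillary response columns (hypothetical names and values).
original <- data.frame(
  site   = c(1L, 1L),
  year   = c(1L, 1L),
  sample = c(10L, 12L),
  color  = c("blue", "green"),
  shape  = c("diamond", "pyramid"),
  stringsAsFactors = FALSE
)

# Replicate index within each site x year group.
# Assumes the rows are already sorted by site and year.
original$reps <- unlist(lapply(rle(paste(original$site, original$year))$lengths,
                               seq_len))

# Every site x year x replicate combination that should exist (3 reps each).
grid <- expand.grid(site = unique(original$site),
                    year = unique(original$year),
                    reps = 1:3)

# Left join on the key columns only; non-key columns of `original` come
# along, and grid rows with no match are filled with NA.
combined <- merge(grid, original, by = c("site", "year", "reps"), all.x = TRUE)
print(combined)
```

Naming the key columns explicitly in `by` guards against the mismatch described earlier in the thread, where differing column names made merge() build an enormous result. The `reps` column can be dropped afterwards if it is not needed.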