Re: [R] problem applying the same function twice

2015-03-12 Thread Curtis Burkhalter
Sarah, This strategy works great for this small dataset, but when I attempt your method with my data set I reach the maximum allowable memory allocation and the operation just stalls and then stops completely before it is finished. Do you know of a way around this? Thanks On Tue, Mar 10, 2015 at

Re: [R] problem applying the same function twice

2015-03-12 Thread William Dunlap
The key to your problem may be that x<-apply(missing,1,genRows) converts 'missing' to a matrix, with the same type for all columns, and then makes x either a list or a matrix but never a data.frame. Those features of apply may mess up the rest of your calculations. Don't use apply(). Bill Dunlap T
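A minimal sketch of the coercion described above, under stated assumptions: the data frame below is a made-up stand-in for 'missing' (the real columns are not shown in this snippet), and the lapply() loop over row indices is only one possible replacement for apply(), not necessarily what Bill had in mind.

```r
# Hypothetical toy data; only the name 'missing' comes from the thread.
missing <- data.frame(animals = c("bird", "dog"),
                      count   = c(3L, 7L),
                      stringsAsFactors = FALSE)

# apply() calls as.matrix() first, so each row arrives as a character vector,
# even though 'count' was stored as integer.
x <- apply(missing, 1, function(row) class(row))
print(x)

# One alternative (an assumption, not the thread's solution): loop over row
# indices so that each piece is still a one-row data frame with its own types.
rows   <- lapply(seq_len(nrow(missing)),
                 function(i) missing[i, , drop = FALSE])
result <- do.call(rbind, rows)
str(result)
```

With this approach 'result' is still a data frame and 'count' keeps its integer type, which is what the later steps of a row-generating function would usually rely on.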

Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Sarah, I realized what I was saying after I pressed send on the email. It makes perfect sense now, thanks so much for your help and patience. On Mar 10, 2015 5:57 PM, "Sarah Goslee" wrote: > I think you're kind of missing the way this works: > > the data frame created by expand.grid() should ONL

Re: [R] problem applying the same function twice

2015-03-10 Thread Sarah Goslee
I think you're kind of missing the way this works: the data frame created by expand.grid() should ONLY have site, year, sample (with the exact names used in the data itself). Then the merged data frame will have the full site,year,sample combinations, along with ALL the data variables. Your animal
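The idea above can be sketched as follows. The data are invented for illustration: 'obs', 'count', and 'cover' are hypothetical names, and only the key columns site, year, and sample come from the thread.

```r
# Hypothetical observed data in which some site/year/sample combinations are absent.
obs <- data.frame(site   = c(1L, 1L, 2L),
                  year   = c(1L, 1L, 1L),
                  sample = c(1L, 2L, 1L),
                  count  = c(5, 3, 8),
                  cover  = c(0.2, 0.4, 0.1))

# Key-only grid: just site, year, sample, spelled exactly as in 'obs'.
full <- expand.grid(site = 1:2, year = 1L, sample = 1:3)

# all.x = TRUE keeps every grid row; combinations with no observation
# get NA in all of the data columns.
merged <- merge(full, obs, by = c("site", "year", "sample"), all.x = TRUE)
print(merged)
```

Because the grid carries only the key columns, every other variable in the original data comes along in the merge without having to be listed.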

Re: [R] problem applying the same function twice

2015-03-10 Thread Jeff Newmiller
You may find it beneficial to investigate packages dplyr, data.table, or a combination of the two for handling large data sets in memory. Or, perhaps dplyr with a SQL back end for working on disk (I have not tried that myself yet). I do find your excuse for manufacturing data records uncompelli

Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Thanks Sarah, one of my column names was missing a letter so it was throwing things off. It works super fast now and is exactly what I needed. My actual data set has about 6 other ancillary response data columns, is there a way to combine the 'full' data set I just created with the original i

Re: [R] problem applying the same function twice

2015-03-10 Thread Sarah Goslee
Yeah, that's tiny:
> fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> dim(fullout)
[1] 14049     3
Almost certainly the problem is that your expand.grid result doesn't have the same column names as your actual data file, so merge() is trying to make an enormous result. Note how when I
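A small illustration of why a column-name mismatch makes merge() blow up, using invented data: the capitalised 'Site' column is a deliberate, hypothetical typo, not the actual one from the thread. When the two data frames share no column names, merge() falls back to the Cartesian product of their rows.

```r
grid <- expand.grid(site = 1:3, year = 1:2)         # 6 rows
obs  <- data.frame(Site = 1:3, count = c(4, 0, 9))  # 'Site' vs 'site': hypothetical typo

# No shared column names, so every grid row is paired with every obs row.
bad <- merge(grid, obs)
nrow(bad)

# After fixing the name, merge() joins on 'site' and the result stays small.
names(obs)[names(obs) == "Site"] <- "site"
good <- merge(grid, obs, all.x = TRUE)
nrow(good)
```

Comparing names() of both data frames, for example with intersect(), before merging is a cheap way to catch this kind of mismatch.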

Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Sarah, I have 669 sites and each site has 7 years of data, so if I'm thinking correctly then there should be 4683 possible combinations of site x year. For each year though I need 3 sampling periods so that there is something like the following: site 1 year1 sample 1 site 1 year1

Re: [R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
William, You say not to use apply here, but what would you use in its place? Thanks On Tue, Mar 10, 2015 at 2:13 PM, William Dunlap wrote: > The key to your problem may be that >x<-apply(missing,1,genRows) > converts 'missing' to a matrix, with the same type for all columns > then makes x

Re: [R] problem applying the same function twice

2015-03-10 Thread Sarah Goslee
Hi, I didn't work through your code, because it looked overly complicated. Here's a more general approach that does what you appear to want: # use dput() to provide reproducible data please! comAn <- structure(list(animals = c("bird", "bird", "bird", "bird", "bird", "bird", "dog", "dog", "dog", "

[R] problem applying the same function twice

2015-03-10 Thread Curtis Burkhalter
Hey everyone, I've written a function that adds NAs to a dataframe where data is missing and it seems to work great if I only need to run it once, but if I run it two times in a row I run into problems. I've created a workable example to explain what I mean and why I would do this. In my datafram