Rainer... I can't believe this did the trick.  You're a genius.  Thank you,
sir.


On Thu, May 23, 2013 at 7:07 AM, Rainer Schuermann <rainer.schuerm...@gmx.net> wrote:

> Using the data generated with your code below, does
>
> DF1 <- rbind( DF1, DF2[ !(DF2$X.TIME %in% DF1$X.TIME), ] )
> DF1 <- DF1[ order( DF1$X.DATE, DF1$X.TIME ), ]
>
> do the job?
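>
> If the data ever spans more than one date, matching on X.TIME alone could
> treat equal times on different dates as duplicates. A sketch of the same
> idea keyed on both columns (key1/key2 are just scratch names):
>
> key1 <- paste( DF1$X.DATE, DF1$X.TIME )   # composite date+time key
> key2 <- paste( DF2$X.DATE, DF2$X.TIME )
> DF1 <- rbind( DF1, DF2[ !(key2 %in% key1), ] )
> DF1 <- DF1[ order( DF1$X.DATE, DF1$X.TIME ), ]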
>
> Rgds,
> Rainer
>
>
>
>
> On Thursday 23 May 2013 05:54:26 Adeel - SafeGreenCapital wrote:
> > Thank you Blaser:
> >
> > This is the exact solution I came up with, but when comparing 8M rows,
> > even on an 8GB machine, one runs out of memory.  To run this
> > effectively, I have to break the DF into smaller DFs, loop through
> > them, and then do a massive merge at the end.  That's what takes 8+
> > hours to compute.
> >
> > Even the bigmemory package is causing OOM issues.
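> >
> > If memory is the limit, a keyed data.table not-join might avoid
> > building the full merged copy. A sketch, assuming the data.table
> > package is available (untested at 8M rows):
> >
> > library(data.table)
> > DT1 <- as.data.table(DF1)
> > DT2 <- as.data.table(DF2)
> > setkey(DT1, X.DATE, X.TIME)
> > setkey(DT2, X.DATE, X.TIME)
> > DT1 <- rbind(DT1, DT2[!DT1])   # DT2[!DT1]: rows of DT2 whose key is not in DT1
> > setkey(DT1, X.DATE, X.TIME)    # rbind drops the key; re-key to restore sort order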
> >
> > -----Original Message-----
> > From: Blaser Nello [mailto:nbla...@ispm.unibe.ch]
> > Sent: Thursday, May 23, 2013 12:15 AM
> > To: Adeel Amin; r-help@r-project.org
> > Subject: RE: [R] adding rows without loops
> >
> > Merge should do the trick. How best to use it will depend on what you
> > want to do with the data afterwards.
> > The following is an example of what you could do. It will perform
> > best if the missing rows occur at random and do not cluster.
> >
> > DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
> >   VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
> > DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
> >   VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
> >
> > # Full outer join; times present in only one frame get NA in the other's columns
> > DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
> >
> > # Fill each NA from the cell one row above; repeat until none remain
> > # (one pass per run of consecutive missing rows):
> > while(any(is.na(DFm))){
> >   if (any(is.na(DFm[1,]))) stop("Complete first row required!")
> >   ind <- which(is.na(DFm), arr.ind=TRUE)                  # (row, col) of each NA
> >   prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)  # same cells, one row up
> >   DFm[is.na(DFm)] <- DFm[prind]
> > }
> > DFm
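> >
> > The loop can also be replaced by a last-observation-carried-forward
> > fill, assuming the zoo package is installed (na.rm=FALSE keeps any
> > leading NAs so the column lengths are preserved):
> >
> > DFm[] <- lapply(DFm, zoo::na.locf, na.rm = FALSE)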
> >
> > Best,
> > Nello
> >
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On Behalf Of Adeel Amin
> > Sent: Thursday, 23 May 2013 07:01
> > To: r-help@r-project.org
> > Subject: [R] adding rows without loops
> >
> > I'm comparing a variety of datasets with over 4M rows.  I've solved this
> > problem 5 different ways using a for/while loop, but the processing time
> > is murder (over 8 hours doing this row by row per data set).  As such,
> > I'm trying to find a solution without a loop, or at least one whose
> > processing time is much faster.
> >
> > Each dataset is a time series as such:
> >
> > DF1:
> >
> >     X.DATE X.TIME VALUE VALUE2
> > 1 01052007   0200    37     29
> > 2 01052007   0300    42     24
> > 3 01052007   0400    45     28
> > 4 01052007   0500    45     27
> > 5 01052007   0700    45     35
> > 6 01052007   0800    42     32
> > 7 01052007   0900    45     32
> > ...
> > ...
> > ...
> > n
> >
> > DF2
> >
> >     X.DATE X.TIME VALUE VALUE2
> > 1 01052007   0200    37     29
> > 2 01052007   0300    42     24
> > 3 01052007   0400    45     28
> > 4 01052007   0500    45     27
> > 5 01052007   0600    45     35
> > 6 01052007   0700    42     32
> > 7 01052007   0800    45     32
> >
> > ...
> > ...
> > n+4000
> >
> > In other words, there are 4000 more rows in DF2 than in DF1; thus the
> > datasets are of unequal length.
> >
> > I'm trying to ensure that all dataframes have the same number of X.DATE
> > and X.TIME entries.  Where they are missing, I'd like to insert a new
> > row.
> >
> > In the above example, when comparing DF2 to DF1, the 01052007 0600
> > entry is missing from DF1.  The solution would add a row to DF1 at the
> > appropriate index.
> >
> > so the new dataframe would be
> >
> >
> >     X.DATE X.TIME VALUE VALUE2
> > 1 01052007   0200    37     29
> > 2 01052007   0300    42     24
> > 3 01052007   0400    45     28
> > 4 01052007   0500    45     27
> > 5 01052007   0600    45     27
> > 6 01052007   0700    45     35
> > 7 01052007   0800    42     32
> > 8 01052007   0900    45     32
> >
> > VALUE and VALUE2 would be the same as in row 4.
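> >
> > One vectorized way to get exactly that, sketched on the sample data
> > above (it assumes a single date per frame, since findInterval() only
> > looks at X.TIME):
> >
> > miss <- DF2[!(DF2$X.TIME %in% DF1$X.TIME), c("X.DATE", "X.TIME")]
> > prev <- findInterval(miss$X.TIME, DF1$X.TIME)   # last existing row before each gap
> > DF1 <- rbind(DF1, cbind(miss, DF1[prev, c("VALUE", "VALUE2")]))
> > DF1 <- DF1[order(DF1$X.DATE, DF1$X.TIME), ]     # re-sort into time order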
> >
> > Of course this is simple to accomplish using a row-by-row analysis, but
> > with over 4M rows, destroying and rebinding the datasets row by row is
> > very time consuming and, I believe, highly un-R'ish.  What am I
> > missing?
> >
> > Thanks!
> >
