Re: [R] reshape to wide format takes extremely long

Dennis Murphy Thu, 02 Sep 2010 02:35:37 -0700

Hi:

I did the following test using function ddply() in the plyr package on a toy
data frame with 50000 observations using five studies, 20 subjects per
study, 25 cycles per subject, five days per cycle and four observations by
type per day. No date-time variable was included.


# Test data frame
big <- data.frame(study = factor(rep(1:5, each = 10000)),
                  subject = factor(rep(1:100, each = 500)),
                  cycle = rep(rep(1:25, each = 20), 100),
                  day = rep(rep(1:5, each = 4), 500),
                  type = rep(c('ALB', 'ALP', 'ALT', 'AST'), 12500),
                  obs = rpois(50000, 70) )
> dim(big)
[1] 50000     6

# 64-bit R on a Windows 7 box with 8Gb RAM and a 2.93GHz Core Duo chip.
system.time(bigtab <- ddply(big, .(study, subject, cycle, day), function(x)
xtabs(obs ~ type, data = x)))
   user  system elapsed
  30.22    0.02   30.60

> dim(bigtab)
[1] 12500     8
> head(bigtab)
  study subject cycle day ALB ALP ALT AST
1     1       1     1   1  77  80  67  70
2     1       1     1   2  60  54  70  70
3     1       1     1   3  71  77  69  65
4     1       1     1   4  62  71  73  68
5     1       1     1   5  78  67  69  78
6     1       1     2   1  71  69  74  69
> tail(bigtab)
      study subject cycle day ALB ALP ALT AST
12495     5     100    24   5  75  83  72  70
12496     5     100    25   1  85  52  62  70
12497     5     100    25   2  79  64  84  68
12498     5     100    25   3  67  65  74  81
12499     5     100    25   4  62  86  66  80
12500     5     100    25   5  58  76  85  84

There may be an easier/more efficient way to do this with melt() and cast()
in the reshape package, but moved on when I couldn't figure it out within
ten minutes (probably because I was thinking 'xtabs of obs by type for
study/subject/cycle/day combinations - that's the ticket!' :) Packages sqldf
and data.table are other viable options for this sort of task, and now that
there is a test data set to play with, it would be interesting to see what
else can be done. I'd be surprised if this couldn't be done within a few
seconds because the data frame is not that large.

Anyway, HTH,
Dennis



On Thu, Sep 2, 2010 at 12:24 AM, Coen van Hasselt
<coenvanhass...@gmail.com>wrote:

> Hello,
>
> I have a data.frame with the following format:
>
> > head(clin2)
>    Study Subject  Type      Obs Cycle Day       Date  Time
> 1 A001101   10108   ALB 44.00000    98   1 2004-03-11 14:26
> 2 A001101   10108   ALP 95.00000    98   1 2004-03-11 14:26
> 3 A001101   10108   ALT 61.00000    98   1 2004-03-11 14:26
> 5 A001101   10108   AST 33.00000    98   1 2004-03-11 14:26
>
> I want to transform this data.frame so that I have "Obs" columns for
> each "Type". The full dataset is 45000 rows long. For a short subset
> of 100 rows, reshaping takes 0.2 seconds, and produces what I want.
> All columns are either numeric or character format (incl. date/time).
>
> > reshape(clin2, v.names="Obs", timevar="Type",
> direction="wide",idvar=c("Study","Subject","Cycle","Day","Date","Time"),)
>      Study Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT
> Obs.AST
> 1   A001101   10108    98   1 2004-03-11 14:26      44      95      61
>  33
> 11  A001101   10108     1   1 2004-03-12 14:01      41      85      39
>  33
> 21  A001101   10108     1   8 2004-03-22 10:34      40      90      70
>  34
> 30  A001101   10108     1  15 2004-03-29 09:56      45      97      66
>     48 [........]
>
> However, when using the same reshape command for the full data.frame
> of 45000 rows, it still wasn't finished when run overnight (8 GB RAM +
> 8 GB swap in use).
>
> The time to process this data.frame from a 100-row subset to a
> 1000-row subset increases from 0.2 sec to 60 sec.
>
> I would greatly appreciate a advice why the time for reshaping is
> increasing exponentially with the nr. of rows, and how I can do this
> in an elegant way.
>
> Thanks!
>
> Coen.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reshape to wide format takes extremely long

Reply via email to