[R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Christoph Jäckel

Hi Together,

I have a problem with the plyr package - more precisely with the ddply
function - and would be very grateful for any help. I hope the example
here is precise enough for someone to identify the problem. Basically,
in this step I want to identify observations that are identical in
terms of certain identifiers (ID1, ID2, ID3) and just want to save
those observations (in this step, without deleting any rows or
manipulating any data) in a separate data.frame. However, I get the
warning message below and the column with dates is messed up.
Interestingly, the value column (the type is factor here, but if you
change that with as.integer it doesn't make any difference) is handled
correctly. Any idea what I am doing wrong?

library(plyr)

df <- data.frame(cbind(ID1=c(1,2,2,3,3,4,4),ID2=c('a','b','b','c','d','e','e'),
 ID3=c("v1","v1","v1","v1","v2","v1","v1"),
 Date=c("1985-05-1","1985-05-2","1985-05-3","1985-05-4","1985-05-5","1985-05-6","1985-05-7"),
 Value=c(1,2,3,4,5,6,7)))
df[,1] <- as.character(df[,1])
df[,2] <- as.character(df[,2])
df$Date <- strptime(df$Date,"%Y-%m-%d")

#Apparently there are two sets of observations that share the same IDs: ID1=2 and ID1=4
ddply(df,.(ID1,ID2,ID3),nrow)
#I want to save those IDs in a separate data.frame, so the desired output is:
df[c(2:3,6:7),]

#My idea: Write a custom function that only returns observations with
#multiple rows.
#Seems to work except that the Date column doesn't make any sense anymore
#Warning message: In output[[var]][rng] <- df[[var]]: number of items
#to replace is not a multiple of replacement length
ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

#Notice that it works perfectly if I only have one observation with
#multiple rows
ddply(df[1:6,],.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

Thanks in advance,

Christoph



Christoph Jäckel (Dipl.-Kfm.)



Research Assistant

Chair for Financial Management and Capital Markets | Lehrstuhls für
Finanzmanagement und Kapitalmärkte

TUM School of Management | Technische Universität München

Arcisstr. 21 | D-80333 München | Germany

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Brian Diggs



Works for me:

> df[c(2:3,6:7),]
  ID1 ID2 ID3      Date Value
2   2   b  v1 1985-05-2     2
3   2   b  v1 1985-05-3     3
6   4   e  v1 1985-05-6     6
7   4   e  v1 1985-05-7     7
> ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
  ID1 ID2 ID3      Date Value
1   2   b  v1 1985-05-2     2
2   2   b  v1 1985-05-3     3
3   4   e  v1 1985-05-6     6
4   4   e  v1 1985-05-7     7
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0

A couple of things: there was just an update of plyr to 1.5.2; maybe 
that fixes what you are seeing?  Also, your df consists of only factors. 
 cbind-ing the data before turning it into a data.frame makes it a 
character matrix which gets converted to factors.


> str(df)
'data.frame':   7 obs. of  5 variables:
 $ ID1  : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
 $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
 $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
 $ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
 $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7
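
The cbind() coercion described above can be checked on a small example. This is only a sketch in base R; the names m, wrapped and direct are made up for illustration, and stringsAsFactors is set explicitly because its default changed in R 4.0.0 (with the newer default the columns would come out as character rather than factor).

```r
# cbind() builds a matrix, and a matrix holds a single type, so the numeric
# vectors are coerced to character before data.frame() ever sees them.
m <- cbind(ID1 = c(1, 2, 2), ID2 = c("a", "b", "b"), Value = c(1, 2, 3))
is.character(m)

# Wrapping the matrix: every column starts out as character and, with
# stringsAsFactors = TRUE, ends up as a factor.
wrapped <- data.frame(m, stringsAsFactors = TRUE)

# Passing the vectors directly: each column keeps its own type.
direct <- data.frame(ID1 = c(1, 2, 2), ID2 = c("a", "b", "b"),
                     Value = c(1, 2, 3), stringsAsFactors = TRUE)

sapply(wrapped, class)
sapply(direct, class)
```

Running str() on both results shows the same difference in more detail.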

Maybe that has something to do with the odd dates since they are not 
really dates at all, just string representations of factor levels. 
Compare with:


DF <- data.frame(ID1=c(1,2,2,3,3,4,4),
    ID2=c('a','b','b','c','d','e','e'),
    ID3=c("v1","v1","v1","v1","v2","v1","v1"),
    Date=as.Date(c("1985-05-1","1985-05-2","1985-05-3",
        "1985-05-4","1985-05-5","1985-05-6","1985-05-7")),
    Value=c(1,2,3,4,5,6,7))
str(DF)
#'data.frame':   7 obs. of  5 variables:
# $ ID1  : num  1 2 2 3 3 4 4
# $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
# $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
# $ Date : Date, format: "1985-05-01" "1985-05-02" ...
# $ Value: num  1 2 3 4 5 6 7

This version also works for me.

ddply(DF,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
#  ID1 ID2 ID3       Date Value
#1   2   b  v1 1985-05-02     2
#2   2   b  v1 1985-05-03     3
#3   4   e  v1 1985-05-06     6
#4   4   e  v1 1985-05-07     7






--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Peter Ehlers




I would characterize your problem as:
a) using strptime - this is what gives ddply() fits;

b) not using str() to check whether R agrees with
   you with respect to your data;

c) using cbind() inside data.frame(). This isn't
   wrong, but is rarely (in my experience) useful.

If you use as.Date (or even nothing) on your Date
variable, you'll find that ddply does what you want.
To see why it doesn't work with strptime, check
str(df) and then ?POSIXlt. strptime() returns a POSIXlt
object, which is stored as a list of components (sec,
min, hour, ...), so you've converted your Date values
to lists.

My comment about cbind() is to warn you that your
Value variable, as you have constructed it, is
a factor.
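
The point about POSIXlt being a list can be seen directly in base R. The following is a minimal sketch; the variable names are made up for illustration and tz = "UTC" is an assumption added only to keep the result independent of the local time zone.

```r
x <- c("1985-05-1", "1985-05-2", "1985-05-3")

lt <- strptime(x, "%Y-%m-%d", tz = "UTC")  # POSIXlt: a list of components
d  <- as.Date(x)                           # Date: one number per element
ct <- as.POSIXct(lt)                       # POSIXct: one number per element

class(lt)
typeof(lt)           # "list"
names(unclass(lt))   # sec, min, hour, mday, mon, year, ...
typeof(d)            # "double"
typeof(ct)           # "double"
```

Date and POSIXct are plain numeric vectors with a class attribute, which is why code that combines columns element by element tends to cope with them but not with POSIXlt.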

Peter Ehlers




Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread William Dunlap



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Diggs
 Sent: Monday, April 25, 2011 11:05 AM
 To: christoph.jaec...@wi.tum.de
 Cc: r-help@r-project.org
 Subject: Re: [R] Problem with ddply in the plyr-package: 
 surprising output of a date-column
 
> A couple of things: there was just an update of plyr to 1.5.2; maybe
> that fixes what you are seeing?  Also, your df consists of only factors.
> cbind-ing the data before turning it into a data.frame makes it a
> character matrix which gets converted to factors.
>
> > str(df)
> 'data.frame':   7 obs. of  5 variables:
>  $ ID1  : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
>  $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
>  $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
>  $ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
>  $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7

The OP's data.frame contained a POSIXlt (not factor) object
in the Date column
   > str(df)
   'data.frame':   7 obs. of  5 variables:
    $ ID1  : chr  "1" "2" "2" "3" ...
    $ ID2  : chr  "a" "b" "b" "c" ...
    $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
    $ Date : POSIXlt, format: "1985-05-01" "1985-05-02" ...
    $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7
and apparently plyr's equivalent of rbind doesn't support that class.

If you want to continue using POSIXlt objects you can get your
immediate result without ddply; subscripting will do the job:
   > nDups <- with(df, ave(rep(0,nrow(df)), ID1, ID2, ID3, FUN=length))
   > print(nDups)
   [1] 1 2 2 1 1 2 2
   > df[nDups>1, ]
     ID1 ID2 ID3       Date Value
   2   2   b  v1 1985-05-02     2
   3   2   b  v1 1985-05-03     3
   6   4   e  v1 1985-05-06     6
   7   4   e  v1 1985-05-07     7
   > str(.Last.value)
   'data.frame':   4 obs. of  5 variables:
    $ ID1  : chr  "2" "2" "4" "4"
    $ ID2  : chr  "b" "b" "e" "e"
    $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1
    $ Date : POSIXlt, format: "1985-05-02" "1985-05-03" ...
    $ Value: Factor w/ 7 levels "1","2","3","4",..: 2 3 6 7

If you need plyr for other tasks you ought to use a different
class for your date data (or wait until plyr can deal with
POSIXlt objects).
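
As an alternative to the ave() approach above, the same rows can probably be picked out with duplicated(), applied once from each end of the key columns. This is only a sketch in base R, not something proposed in the thread: the names keys, is_dup and dups are made up, the example data is rebuilt with the Date column last, and tz = "UTC" is an assumption.

```r
df <- data.frame(ID1 = c(1, 2, 2, 3, 3, 4, 4),
                 ID2 = c("a", "b", "b", "c", "d", "e", "e"),
                 ID3 = c("v1", "v1", "v1", "v1", "v2", "v1", "v1"),
                 Value = 1:7)
# Assigning after construction keeps the column as POSIXlt.
df$Date <- strptime(paste0("1985-05-", 1:7), "%Y-%m-%d", tz = "UTC")

keys <- df[, c("ID1", "ID2", "ID3")]

# duplicated() flags every repeat except the first one in its group;
# scanning again from the end flags the first ones as well.
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

dups <- df[is_dup, ]
dups
```

Because this only subscripts the data frame, nothing has to be bound back together, so the POSIXlt column should come through unchanged.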


Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Brian Diggs


On 4/25/2011 11:55 AM, William Dunlap wrote:




> The OP's data.frame contained a POSIXlt (not factor) object
> in the Date column
>    > str(df)
>    'data.frame':   7 obs. of  5 variables:
>     $ ID1  : chr  "1" "2" "2" "3" ...
>     $ ID2  : chr  "a" "b" "b" "c" ...
>     $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
>     $ Date : POSIXlt, format: "1985-05-01" "1985-05-02" ...
>     $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7


Thanks, Bill. Somehow I missed that, despite the OP having it in his 
code; I even copied it into my testing window.  It was my error for not 
running it and noting it.



> and apparently plyr's equivalent of rbind doesn't support that class.

plyr primarily uses rbind.fill to combine the pieces, and testing it
directly shows that it doesn't handle POSIXlt columns. (With only one
argument it just passes the data.frame back, which is why it worked
when there was just a single group of duplicates; that bypassed the
code that couldn't handle POSIXlt objects.)
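
One way to sidestep the limitation, if the data already contains POSIXlt columns, might be to convert them to POSIXct before handing the data frame to ddply. The helper below is hypothetical (lt_to_ct is not part of plyr or base R) and is only a sketch; the ddply call is left as a comment because it assumes plyr is installed.

```r
# Hypothetical helper: replace every POSIXlt column with its POSIXct equivalent.
lt_to_ct <- function(data) {
  for (nm in names(data)) {
    if (inherits(data[[nm]], "POSIXlt")) {
      data[[nm]] <- as.POSIXct(data[[nm]])
    }
  }
  data
}

df <- data.frame(ID1 = c(1, 2, 2), Value = 1:3)
df$Date <- strptime(c("1985-05-1", "1985-05-2", "1985-05-3"),
                    "%Y-%m-%d", tz = "UTC")

fixed <- lt_to_ct(df)
class(df$Date)
class(fixed$Date)

# With plyr loaded, the original call could then be tried on the converted data:
# ddply(fixed, .(ID1), function(d) if (nrow(d) <= 1) NULL else d)
```

Using as.Date instead of strptime when the data is first read, as suggested elsewhere in this thread, avoids the need for any conversion.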




Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Hadley Wickham

> If you need plyr for other tasks you ought to use a different
> class for your date data (or wait until plyr can deal with
> POSIXlt objects).

How do you get POSIXlt objects into a data frame?

> df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
> str(df)
'data.frame':   1 obs. of  1 variable:
 $ x: POSIXct, format: "2008-01-01"

> df <- data.frame(x = I(as.POSIXlt(as.Date(c("2008-01-01")))))
> str(df)
'data.frame':   1 obs. of  1 variable:

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Christoph Jäckel

Hi together,

thank you so much for your help! The problem was indeed the
strptime-function. Replacing that with as.Date solves the problem,
both in the example I provided and in my actual data set.

I think this is a lesson for me to not use types I'm not really
familiar with (POSIXlt in this case).

Thanks again!

Christoph





Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Brian Diggs


On 4/25/2011 1:07 PM, Hadley Wickham wrote:



> How do you get POSIXlt objects into a data frame?




Assigning to a column after the data.frame creation step:

> df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
> str(df)
'data.frame':   1 obs. of  1 variable:
 $ x: POSIXct, format: "2008-01-01"
> dput(df)
structure(list(x = structure(1199145600, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = "x", row.names = c(NA, -1L
), class = "data.frame")
> df$x <- as.POSIXlt(as.Date(c("2008-01-01")))
> str(df)
'data.frame':   1 obs. of  1 variable:
 $ x: POSIXlt, format: "2008-01-01"
> dput(df)
structure(list(x = structure(list(sec = 0, min = 0L, hour = 0L,
mday = 1L, mon = 0L, year = 108L, wday = 2L, yday = 0L, isdst = 0L),
.Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday",
"isdst"), class = c("POSIXlt", "POSIXt"), tzone = "UTC")), .Names = "x",
row.names = c(NA, -1L), class = "data.frame")

This is reminiscent of the 1d array problem; there are types that are 
coerced into other types when passed as part of a data.frame constructor 
(data.frame call), but are not coerced when assigned to a column.


Looking at help pages, calls to data.frame call as.data.frame on each
argument; `[<-.data.frame` has a section on coercion which starts "The
story over when replacement values are coerced is a complicated one, and
one that has changed during R's development. This section is a guide
only." which makes me think it is not all that well defined.


Digging more, there is a as.data.frame.POSIXlt, although the help page 
for it (DateTimeClasses in base) does not mention it or document it.  It 
is documented, though, in as.data.frame (which also has comments about 
coercing 1 dimensional arrays).


So, potentially, there could be differences with any class that has an
as.data.frame method because it will be treated differently if passed to
data.frame versus a column assignment with `[<-.data.frame`
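
The difference between the two routes can be reproduced in a few lines of base R. This is a sketch only; via_constructor and via_assignment are names made up for illustration.

```r
lt <- as.POSIXlt(as.Date("2008-01-01"))

# Route 1: the data.frame() constructor calls as.data.frame() on each argument,
# and the POSIXlt method converts the value to POSIXct.
via_constructor <- data.frame(x = lt)

# Route 2: assigning to a column afterwards stores the object as it is.
via_assignment <- data.frame(id = 1)
via_assignment$x <- lt

class(via_constructor$x)
class(via_assignment$x)
```

Checking class() (or str()) on a column after each construction step is a cheap way to catch this kind of silent coercion.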


> methods(as.data.frame)
 [1] as.data.frame.aovproj*        as.data.frame.array
 [3] as.data.frame.AsIs            as.data.frame.character
 [5] as.data.frame.complex         as.data.frame.data.frame
 [7] as.data.frame.Date            as.data.frame.default
 [9] as.data.frame.difftime        as.data.frame.factor
[11] as.data.frame.ftable*         as.data.frame.function
[13] as.data.frame.idf*            as.data.frame.integer
[15] as.data.frame.list            as.data.frame.logical
[17] as.data.frame.logLik*         as.data.frame.matrix
[19] as.data.frame.model.matrix    as.data.frame.numeric
[21] as.data.frame.numeric_version as.data.frame.ordered
[23] as.data.frame.POSIXct         as.data.frame.POSIXlt
[25] as.data.frame.raw             as.data.frame.table
[27] as.data.frame.ts              as.data.frame.vector

So, I suppose it is working as documented.  Though I wonder how long ago 
it was that someone (who has been using R regularly for at least a year) 
actually read the entire help page for data.frame and/or as.data.frame. 
 It's one of those things you think you know and understand until you 
find out you don't.



Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

2011-04-25 Thread Peter Ehlers


On 2011-04-25 13:07, Hadley Wickham wrote:



> How do you get POSIXlt objects into a data frame?





To mimic the OP's code

  df <- data.frame(x = "2008-01-01")
  df$x <- as.POSIXlt(df$x, format = "%Y-%m-%d")
  str(df)
  #'data.frame':   1 obs. of  1 variable:
  # $ x: POSIXlt, format: "2008-01-01"

Peter Ehlers
