[R] Problem with ddply in the plyr-package: surprising output of a date-column
Hi together,

I have a problem with the plyr package, more precisely with the ddply function, and would be very grateful for any help. I hope the example here is precise enough for someone to identify the problem.

Basically, in this step I want to identify observations that are identical in terms of certain identifiers (ID1, ID2, ID3) and save just those observations in a separate data.frame (in this step, without deleting any rows or manipulating any data). However, I get the warning message below and the column with dates is messed up. Interestingly, the Value column (its type is factor here, but converting it with as.integer makes no difference) is handled correctly. Any idea what I am doing wrong?

library(plyr)

df <- data.frame(cbind(ID1=c(1,2,2,3,3,4,4),
                       ID2=c('a','b','b','c','d','e','e'),
                       ID3=c("v1","v1","v1","v1","v2","v1","v1"),
                       Date=c("1985-05-1","1985-05-2","1985-05-3","1985-05-4",
                              "1985-05-5","1985-05-6","1985-05-7"),
                       Value=c(1,2,3,4,5,6,7)))
df[,1] <- as.character(df[,1])
df[,2] <- as.character(df[,2])
df$Date <- strptime(df$Date,"%Y-%m-%d")

#Apparently there are two observations that have the same IDs: ID1=2 and ID1=4
ddply(df,.(ID1,ID2,ID3),nrow)

#I want to save those IDs in a separate data.frame, so the desired output is:
df[c(2:3,6:7),]

#My idea: Write a custom function that only returns observations with multiple rows.
#Seems to work except that the Date column doesn't make any sense anymore
#Warning message: In output[[var]][rng] <- df[[var]]:
#  number of items to replace is not a multiple of replacement length
ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

#Notice that it works perfectly if I only have one observation with multiple rows
ddply(df[1:6,],.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

Thanks in advance,
Christoph

Christoph Jäckel (Dipl.-Kfm.)
Research Assistant
Chair for Financial Management and Capital Markets | Lehrstuhl für Finanzmanagement und Kapitalmärkte
TUM School of Management | Technische Universität München
Arcisstr. 21 | D-80333 München | Germany

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
On 4/25/2011 10:19 AM, Christoph Jäckel wrote:
> #Seems to work except that the Date column doesn't make any sense anymore
> ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
> [...]

Works for me:

    > df[c(2:3,6:7),]
      ID1 ID2 ID3      Date Value
    2   2   b  v1 1985-05-2     2
    3   2   b  v1 1985-05-3     3
    6   4   e  v1 1985-05-6     6
    7   4   e  v1 1985-05-7     7
    > ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
      ID1 ID2 ID3      Date Value
    1   2   b  v1 1985-05-2     2
    2   2   b  v1 1985-05-3     3
    3   4   e  v1 1985-05-6     6
    4   4   e  v1 1985-05-7     7
    > sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] plyr_1.5.2

    loaded via a namespace (and not attached):
    [1] tools_2.13.0

A couple of things: there was just an update of plyr to 1.5.2; maybe that fixes what you are seeing?

Also, your df consists of only factors. cbind-ing the data before turning it into a data.frame makes it a character matrix, which gets converted to factors.

    > str(df)
    'data.frame': 7 obs. of 5 variables:
     $ ID1  : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
     $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
     $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
     $ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
     $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7

Maybe that has something to do with the odd dates, since they are not really dates at all, just string representations of factor levels. Compare with:

DF <- data.frame(ID1=c(1,2,2,3,3,4,4),
                 ID2=c('a','b','b','c','d','e','e'),
                 ID3=c("v1","v1","v1","v1","v2","v1","v1"),
                 Date=as.Date(c("1985-05-1","1985-05-2","1985-05-3",
                                "1985-05-4","1985-05-5","1985-05-6","1985-05-7")),
                 Value=c(1,2,3,4,5,6,7))
str(DF)
#'data.frame': 7 obs. of 5 variables:
# $ ID1  : num 1 2 2 3 3 4 4
# $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
# $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
# $ Date : Date, format: "1985-05-01" "1985-05-02" ...
# $ Value: num 1 2 3 4 5 6 7

This version also works for me.

ddply(DF,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
#  ID1 ID2 ID3       Date Value
#1   2   b  v1 1985-05-02     2
#2   2   b  v1 1985-05-03     3
#3   4   e  v1 1985-05-06     6
#4   4   e  v1 1985-05-07     7

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
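The cbind() point can be checked in isolation. The following is only a sketch on a three-row toy version of the data; the names m, via_cbind and direct are illustrative and do not come from the thread. stringsAsFactors is set explicitly because its default differs between R versions, so on the R 2.13 used in the thread the cbind route would yield factors rather than character columns.

```r
# cbind() on vectors of mixed type builds a character matrix first,
# so the numeric columns have already lost their type before
# data.frame() ever sees them.
m <- cbind(ID1 = c(1, 2, 2), ID2 = c("a", "b", "b"), Value = c(1, 2, 3))
is.character(m)

# Same data, once routed through cbind() and once passed column by column.
via_cbind <- data.frame(m, stringsAsFactors = FALSE)
direct <- data.frame(ID1 = c(1, 2, 2),
                     ID2 = c("a", "b", "b"),
                     Value = c(1, 2, 3),
                     stringsAsFactors = FALSE)

sapply(via_cbind, class)  # every column is character
sapply(direct, class)     # numeric columns stay numeric
```

Running str() on both objects, as suggested elsewhere in the thread, shows the same difference.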
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
On 2011-04-25 10:19, Christoph Jäckel wrote:
> df$Date <- strptime(df$Date,"%Y-%m-%d")
> [...]
> #Seems to work except that the Date column doesn't make any sense anymore
> ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})

I would characterize your problem as:

a) using strptime - this is what gives ddply() fits;
b) not using str() to check whether R agrees with you with respect to your data;
c) using cbind() inside data.frame(). This isn't wrong, but is rarely (in my experience) useful.

If you use as.Date (or even nothing) on your Date variable, you'll find that ddply does what you want. To see why it doesn't work with strptime, check str(df) and then ?POSIXlt. You've converted Date values to lists.

My comment about cbind() is to warn you that your Value variable, as you have constructed it, is a factor.

Peter Ehlers
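The remark that strptime() turns the dates into lists can be seen directly. This is a minimal sketch using three of the date strings from the original post; the variable names and the explicit tz = "UTC" are assumptions added here to keep the result independent of the local time zone.

```r
x <- c("1985-05-1", "1985-05-2", "1985-05-3")

# strptime() returns a POSIXlt object, which is a list of
# components (sec, min, hour, mday, ...) underneath its class.
lt <- strptime(x, "%Y-%m-%d", tz = "UTC")
class(lt)
is.list(unclass(lt))

# as.Date() and as.POSIXct() both give plain numeric vectors
# with a class attribute, which behave like ordinary columns.
d  <- as.Date(x, format = "%Y-%m-%d")
ct <- as.POSIXct(lt)
is.numeric(unclass(d))
is.numeric(unclass(ct))
```

Either of the last two classes stores one number per date, which is why swapping strptime() for as.Date() is enough to make the column an ordinary vector again.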
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Diggs
Sent: Monday, April 25, 2011 11:05 AM
To: christoph.jaec...@wi.tum.de
Cc: r-help@r-project.org
Subject: Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column

> [...]
> Also, your df consists of only factors. cbind-ing the data before
> turning it into a data.frame makes it a character matrix which gets
> converted to factors.
>
> str(df)
> 'data.frame': 7 obs. of 5 variables:
>  $ ID1  : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
>  [...]
>  $ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
>  $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7

The OP's data.frame contained a POSIXlt (not factor) object in the Date column

    > str(df)
    'data.frame': 7 obs. of 5 variables:
     $ ID1  : chr  "1" "2" "2" "3" ...
     $ ID2  : chr  "a" "b" "b" "c" ...
     $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
     $ Date : POSIXlt, format: "1985-05-01" "1985-05-02" ...
     $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7

and apparently plyr's equivalent of rbind doesn't support that class.

If you want to continue using POSIXlt objects you can get your immediate result without ddply; subscripting will do the job:

    > nDups <- with(df, ave(rep(0,nrow(df)), ID1, ID2, ID3, FUN=length))
    > print(nDups)
    [1] 1 2 2 1 1 2 2
    > df[nDups>1, ]
      ID1 ID2 ID3       Date Value
    2   2   b  v1 1985-05-02     2
    3   2   b  v1 1985-05-03     3
    6   4   e  v1 1985-05-06     6
    7   4   e  v1 1985-05-07     7
    > str(.Last.value)
    'data.frame': 4 obs. of 5 variables:
     $ ID1  : chr  "2" "2" "4" "4"
     $ ID2  : chr  "b" "b" "e" "e"
     $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1
     $ Date : POSIXlt, format: "1985-05-02" "1985-05-03" ...
     $ Value: Factor w/ 7 levels "1","2","3","4",..: 2 3 6 7

If you need plyr for other tasks you ought to use a different class for your date data (or wait until plyr can deal with POSIXlt objects).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
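The subscripting idea above can also be written with duplicated(). This is an alternative sketch in base R, not the code that was posted: the data are retyped from the original example, the names keys, is_dup and dups are illustrative, and tz = "UTC" is an added assumption.

```r
df <- data.frame(ID1 = c(1, 2, 2, 3, 3, 4, 4),
                 ID2 = c("a", "b", "b", "c", "d", "e", "e"),
                 ID3 = c("v1", "v1", "v1", "v1", "v2", "v1", "v1"),
                 Value = 1:7,
                 stringsAsFactors = FALSE)
# Assign the POSIXlt column afterwards so that it stays POSIXlt.
df$Date <- strptime(paste0("1985-05-", 1:7), "%Y-%m-%d", tz = "UTC")

# A row belongs to a duplicated key if the key occurs earlier
# or later in the data; combining both directions flags every member.
keys   <- df[, c("ID1", "ID2", "ID3")]
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

dups <- df[is_dup, ]
dups
```

Because only row subscripting is involved, no rbind-style recombination of pieces takes place, which is the step reported in the thread as failing for POSIXlt columns.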
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
On 4/25/2011 11:55 AM, William Dunlap wrote:
> The OP's data.frame contained a POSIXlt (not factor) object in the
> Date column
>
> str(df)
> 'data.frame': 7 obs. of 5 variables:
>  [...]
>  $ Date : POSIXlt, format: "1985-05-01" "1985-05-02" ...
>  [...]

Thanks, Bill. Somehow I missed that, despite the OP having it in his code; I even copied it into my testing window. It was my error for not running it and noting it.

> and apparently plyr's equivalent of rbind doesn't support that class.

plyr uses rbind.fill primarily, and, based on testing that directly, it doesn't handle POSIXlt columns. (With only one argument, though, it just passes the data.frame back. That is why it worked when there was just a single duplicate: that case bypassed the code that couldn't handle POSIXlt objects.)

> If you want to continue using POSIXlt objects you can get your
> immediate result without ddply; subscripting will do the job:
> [...]
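One way to follow the advice about using a different date class is to convert any POSIXlt columns before handing a data frame to plyr. The helper below is a hypothetical sketch (lt_to_ct is not part of plyr or base R), and it only shows the conversion step, not a call to ddply.

```r
# Hypothetical helper: replace every POSIXlt column of a data frame
# with its POSIXct equivalent and leave all other columns untouched.
lt_to_ct <- function(d) {
  for (nm in names(d)) {
    if (inherits(d[[nm]], "POSIXlt")) {
      d[[nm]] <- as.POSIXct(d[[nm]])
    }
  }
  d
}

df <- data.frame(id = 1:2)
df$when <- strptime(c("1985-05-02", "1985-05-03"), "%Y-%m-%d", tz = "UTC")
class(df$when)   # POSIXlt before the conversion

out <- lt_to_ct(df)
class(out$when)  # POSIXct afterwards
```

POSIXct keeps the same instants as a numeric vector, so nothing is lost by the conversion apart from direct access to the list components such as $mday.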
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
> If you need plyr for other tasks you ought to use a different class
> for your date data (or wait until plyr can deal with POSIXlt objects).

How do you get POSIXlt objects into a data frame?

    > df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
    > str(df)
    'data.frame': 1 obs. of 1 variable:
     $ x: POSIXct, format: "2008-01-01"
    > df <- data.frame(x = I(as.POSIXlt(as.Date(c("2008-01-01")))))
    > str(df)
    'data.frame': 1 obs. of 1 variable:
     $ x: AsIs, format: 0

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
Hi together,

thank you so much for your help! The problem was indeed the strptime function. Replacing it with as.Date solves the problem, both in the example I provided and in my actual data set. I think this is a lesson for me not to use types I'm not really familiar with (POSIXlt in this case).

Thanks again!
Christoph

--
Christoph Jäckel (Dipl.-Kfm.)
Research Assistant
Chair for Financial Management and Capital Markets | Lehrstuhl für Finanzmanagement und Kapitalmärkte
TUM School of Management | Technische Universität München
Arcisstr. 21 | D-80333 München | Germany
Mailto: christoph.jaec...@wi.tum.de | Web: www.fm.wi.tum.de
Phone: +49 89 289 25482 | Fax: +49 89 289 25488
Head of Chair: Univ.-Prof. Dr. Christoph Kaserer
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
On 4/25/2011 1:07 PM, Hadley Wickham wrote:
> How do you get POSIXlt objects into a data frame?
> [...]

Assigning to a column after the data.frame creation step:

    > df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
    > str(df)
    'data.frame': 1 obs. of 1 variable:
     $ x: POSIXct, format: "2008-01-01"
    > dput(df)
    structure(list(x = structure(1199145600, class = c("POSIXct", "POSIXt"),
    tzone = "UTC")), .Names = "x", row.names = c(NA, -1L), class = "data.frame")
    > df$x <- as.POSIXlt(as.Date(c("2008-01-01")))
    > str(df)
    'data.frame': 1 obs. of 1 variable:
     $ x: POSIXlt, format: "2008-01-01"
    > dput(df)
    structure(list(x = structure(list(sec = 0, min = 0L, hour = 0L,
    mday = 1L, mon = 0L, year = 108L, wday = 2L, yday = 0L, isdst = 0L),
    .Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday",
    "isdst"), class = c("POSIXlt", "POSIXt"), tzone = "UTC")), .Names = "x",
    row.names = c(NA, -1L), class = "data.frame")

This is reminiscent of the 1d array problem; there are types that are coerced into other types when passed as part of a data.frame constructor (data.frame call), but are not coerced when assigned to a column.

Looking at help pages, calls to data.frame call as.data.frame on each argument; `[<-.data.frame` has a section on coercion which starts

  "The story over when replacement values are coerced is a complicated one, and one that has changed during R's development. This section is a guide only."

which makes me think it is not all that well defined.

Digging more, there is an as.data.frame.POSIXlt, although the help page for it (DateTimeClasses in base) does not mention it or document it. It is documented, though, in as.data.frame (which also has comments about coercing 1 dimensional arrays). So, potentially, there could be differences with any class that has an as.data.frame method, because it will be treated differently if passed to data.frame versus a column assignment with `[<-.data.frame`.

    > methods(as.data.frame)
     [1] as.data.frame.aovproj*        as.data.frame.array
     [3] as.data.frame.AsIs            as.data.frame.character
     [5] as.data.frame.complex         as.data.frame.data.frame
     [7] as.data.frame.Date            as.data.frame.default
     [9] as.data.frame.difftime        as.data.frame.factor
    [11] as.data.frame.ftable*         as.data.frame.function
    [13] as.data.frame.idf*            as.data.frame.integer
    [15] as.data.frame.list            as.data.frame.logical
    [17] as.data.frame.logLik*         as.data.frame.matrix
    [19] as.data.frame.model.matrix    as.data.frame.numeric
    [21] as.data.frame.numeric_version as.data.frame.ordered
    [23] as.data.frame.POSIXct         as.data.frame.POSIXlt
    [25] as.data.frame.raw             as.data.frame.table
    [27] as.data.frame.ts              as.data.frame.vector

So, I suppose it is working as documented. Though I wonder how long ago it was that someone (who has been using R regularly for at least a year) actually read the entire help page for data.frame and/or as.data.frame. It's one of those things you think you know and understand until you find out you don't.

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
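The two routes described above can be put side by side in a few lines. This is a sketch only: the names lt, via_constructor and via_assignment are illustrative, and tz = "UTC" is an assumption added to keep the example independent of the local time zone.

```r
lt <- as.POSIXlt("2008-01-01", tz = "UTC")

# Route 1: pass the POSIXlt object to the data.frame() constructor.
# as.data.frame.POSIXlt() converts it, so the column ends up POSIXct.
via_constructor <- data.frame(x = lt)
class(via_constructor$x)

# Route 2: assign to a column of an existing data frame.
# No as.data.frame method is involved, so the column stays POSIXlt.
via_assignment <- data.frame(id = 1)
via_assignment$x <- lt
class(via_assignment$x)
```

Checking class() or str() after building a data frame is a cheap way to notice which of the two routes a column actually took.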
Re: [R] Problem with ddply in the plyr-package: surprising output of a date-column
On 2011-04-25 13:07, Hadley Wickham wrote:
> How do you get POSIXlt objects into a data frame?
> [...]

To mimic the OP's code:

df <- data.frame(x = "2008-01-01")
df$x <- as.POSIXlt(df$x, format = "%Y-%m-%d")
str(df)
#'data.frame': 1 obs. of 1 variable:
# $ x: POSIXlt, format: "2008-01-01"

Peter Ehlers