Re: [R] Date handling in R is hard to understand
Hi

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Alemu Tadesse
> Sent: Friday, November 08, 2013 8:41 PM
> To: r-help@r-project.org
> Subject: [R] Date handling in R is hard to understand
>
> Dear All,
>
> I usually work with time series data. The data may come in AM/PM
> format or on a 24-hour basis. R cannot tell the two apart
> automatically, at least for me: I have to tell R explicitly which
> time format the data is in. It seems that Pandas knows how to handle
> dates without being told the format. The problem arises when I try to
> shift the time by a certain amount. Say adding 3600 to shift it
> forward; in that case I have to use something like:
>
>     Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date),
>         tz = "", format = "%m/%d/%Y %I:%M %p") + 3600
>
> or
>
>     Measured_data$Date <- as.POSIXct(as.character(Measured_data$Date),
>         tz = "", format = "%m/%d/%Y %H:%M") + 3600
>
> depending on the format. The date also attaches MDT or MST and so on.
> When merging two data frames with dates in different formats, that may
> create a problem (I think). When I get data from Excel it could be in
> any format, and I need to convert the dates to one of the above
> formats to use them in R. Any TIPS for automatic processing, with no
> need to specify the date format?
>
> Another problem I saw was that when using rbind to bind data frames,
> if a column in one of the data frames is character data (for example
> "none", coming from MySQL), R doesn't know how to concatenate the
> numeric column from the other data frame to it. I needed to change
> the numeric to character, and after binding I had to re-convert it to
> numeric. But this causes problems in an automated environment. Any
> suggestions?

rbind/cbind can use the data.frame method, which keeps each column's own
type. The default method, however, returns a matrix, which must hold a
single type of data in all columns (a matrix is really just a vector with
dimensions).

    str(cbind(airquality, 1:153))
    'data.frame': 153 obs. of 7 variables:
     $ ozone  : int 41 36 12 18 NA 28 23 19 8 NA ...
     $ solar.r: int 190 118 149 313 NA NA 299 99 19 194 ...
     $ wind   : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
     $ temp   : int 67 72 74 62 56 66 65 59 61 69 ...
     $ month  : int 5 5 5 5 5 5 5 5 5 5 ...
     $ day    : int 1 2 3 4 5 6 7 8 9 10 ...
     $ 1:153  : int 1 2 3 4 5 6 7 8 9 10 ...

Regards
Petr

> Thanks
> Mihretu

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
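The difference between the two cbind methods described in the reply above can be seen in a small sketch. The vectors `id` and `label` are made up for illustration and do not come from the thread.

```r
# Hypothetical example data (not from the thread).
id    <- 1:3
label <- c("a", "none", "c")

# Default method: binding plain vectors gives a matrix. A matrix is one
# vector with dimensions, so the integers are coerced to character.
m <- cbind(id, label)
is.matrix(m)   # TRUE
typeof(m)      # "character"

# data.frame method: used as soon as one argument is a data frame.
# Each column keeps its own type.
d <- cbind(data.frame(id = id), label)
is.data.frame(d)   # TRUE
class(d$id)        # "integer"
```

The same dispatch applies to rbind: if at least one argument is a data frame, the result is a data frame rather than a matrix.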
Re: [R] Date handling in R is hard to understand
Thank you all for taking the time to look at this problem. Yes, date
handling is a problem in many languages. For this specific problem, I have
worked around rbind not handling different data types in a column by
converting the column to character and later converting it back to numeric.

Thank you again

On Mon, Nov 11, 2013 at 3:06 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:
> rbind/cbind can use the data.frame method, which keeps each column's
> own type. [...]
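An alternative to the character round trip is to make the column numeric in both data frames before binding. The sketch below is only one way to do that; the data frames `a` and `b`, the helper `to_numeric`, and the decision to treat "none" as a missing value are all assumptions for illustration.

```r
# Hypothetical data: one source delivers numbers, the other delivers text
# in which "none" marks a missing measurement (an assumption).
a <- data.frame(site = "A", value = c(1.5, 2.5), stringsAsFactors = FALSE)
b <- data.frame(site = "B", value = c("none", "3.0"), stringsAsFactors = FALSE)

# Hypothetical helper: map known "missing" markers to NA, then convert.
to_numeric <- function(x, missing = c("none", "")) {
  x <- as.character(x)
  x[x %in% missing] <- NA
  as.numeric(x)
}

a$value <- to_numeric(a$value)
b$value <- to_numeric(b$value)

# Both columns are numeric now, so no conversion is needed after binding.
combined <- rbind(a, b)
str(combined)
```

Because the markers are replaced by NA before `as.numeric` runs, the conversion does not emit coercion warnings for them, which may suit an automated pipeline better than converting after the fact.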
Re: [R] Date handling in R is hard to understand
I agree w/ lubridate. I also would like to mention that date handling is
amazingly difficult in ALL computer languages, not just R. Take a stroll
through sites like thedailywtf.com to see how quickly people get into
tarpits full of thorns when trying to deal with leap years, weeks vs month
ends, etc.

Bert Gunter wrote
> Have a look at the lubridate package. It claims to try to make dealing
> with dates easier.
>
> -- Bert

--
View this message in context: http://r.789695.n4.nabble.com/Date-handling-in-R-is-hard-to-understand-tp4680070p4680125.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Date handling in R is hard to understand
Have a look at the lubridate package. It claims to try to make dealing
with dates easier.

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
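As a rough sketch of what the lubridate suggestion might look like for the two formats in the original question: `parse_date_time()` accepts several candidate orders and tries to pick a matching one for each string. The sample strings and the choice of `tz = "UTC"` are assumptions, and the result should be checked for NA values rather than trusted blindly.

```r
# Assumes the lubridate package is installed; the block is skipped otherwise.
if (requireNamespace("lubridate", quietly = TRUE)) {
  # Hypothetical sample input mixing 12-hour and 24-hour timestamps.
  dates <- c("11/08/2013 08:41 PM", "11/08/2013 20:41")

  # Candidate orders: month-day-Year with a 12-hour clock and AM/PM marker,
  # or month-day-Year with a 24-hour clock.
  parsed <- lubridate::parse_date_time(
    dates,
    orders = c("mdY IMp", "mdY HM"),
    tz = "UTC"
  )

  print(parsed)
  print(is.na(parsed))   # entries that matched no order come back as NA
}
```

The candidate orders still have to be listed, so this is not fully automatic, but one call can cover input that mixes both formats.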
Re: [R] Date handling in R is hard to understand
Hi Mihretu,

Can you grep for AM or PM? If so, build your format string depending upon
whether one of these exists in the date string.

Jim
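Jim's suggestion could be sketched in base R roughly as follows. The helper name `parse_mixed_time`, the sample strings, and the default `tz = "UTC"` are assumptions for illustration; the two format strings are the ones from the original question. Note that `%p` depends on the locale, so this assumes a locale that uses AM/PM.

```r
# Hypothetical helper (not from the thread): pick the format per element,
# depending on whether the string contains an AM/PM marker.
parse_mixed_time <- function(x, tz = "UTC") {
  x <- as.character(x)
  has_ampm <- grepl("[AP]M", x, ignore.case = TRUE)

  # Start with an all-NA POSIXct vector, then fill each group separately.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  out[has_ampm]  <- as.POSIXct(x[has_ampm],
                               format = "%m/%d/%Y %I:%M %p", tz = tz)
  out[!has_ampm] <- as.POSIXct(x[!has_ampm],
                               format = "%m/%d/%Y %H:%M", tz = tz)
  out
}

dates   <- c("11/08/2013 08:41 PM", "11/08/2013 20:41")
parsed  <- parse_mixed_time(dates)
shifted <- parsed + 3600   # adding a number to POSIXct adds seconds
```

Strings that match neither format come back as NA, so an automated job can check `anyNA(parsed)` before going on.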