Re: [R] Date-Time-Stamp input method for user-specific formats

David Winsemius Mon, 05 Oct 2009 15:25:51 -0700


On Oct 5, 2009, at 5:14 PM, esp wrote:

Date-Time-Stamp input method to correctly interpret user-specific
formats: the coding is 90% there, based on the example at
http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
...anyone got the last 10% please?

CONTEXT:
Data is received where one of the columns is a datetimestamp. At midnight,
the value represented as text in this column consists of just the date part,
e.g. "01/09/2009". At other times, the value in the column contains both
date and time, e.g. "01/09/2009 00:00:01". The goal is to read it into R as
an appropriate data type, where for example date arithmetic can be
performed. As far as I can tell, the most appropriate such data type is
POSIXct. The trick then is to read in the datetimestamps in the data as
this type.

PROBLEM:

POSIXct defaults to a text representation almost but not quite like my
received data. The main difference is that the POSIXct date part is in
reverse order, e.g. "2009-09-01". It is possible to define a different
format where date and time parts look like my data, but when encountering
datetimestamps where only the date part is present (as in the case of my
midnight data) then this is interpreted as NA, i.e. undefined.
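
The NA behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the two sample strings are taken from the CONTEXT section, and the fixed `tz = "UTC"` is an assumption made so the result does not depend on the local time zone.

```r
# Sample values mirroring the two shapes described in the post
x <- c("01/09/2009", "01/09/2009 00:00:01")

# A single format that includes the time cannot parse the date-only value
full <- as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC"))

# TRUE marks the element that failed to parse (the midnight, date-only one)
is.na(full)
```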

SOLUTION (ALMOST):

There is a workaround (based on example at
http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to
define a class then read the data in as this class. For such a class it is
possible to define a class method, in terms of a function, for translating a
text (character string) representation into a value. In that function, one
can use a conditional expression to treat midnight datetimestamps
differently from those at other times of day. The example below does that.
In order to apply this function over all of the datetimestamp values in the
column, it is necessary to use something like R's 'sapply' function.

SNAG:
The function below implements this approach. A datetimestamp with only the
date part, including leading zeroes, is always length 10 (characters). It
correctly interprets the datetimestamp values, but unfortunately translates
them into what appear to be numeric type. I am actually uncertain precisely
what is happening, as I am very new to R and have most certainly stretched
myself in writing this code.  I think perhaps it returns a list and
something associated with this aspect makes it "forget" the data type is
POSIXct or at least how such a type should be displayed as text or what to
do about it.
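
A possible explanation for the "forgetting", offered as a sketch rather than a diagnosis of the exact code in this post: `sapply` simplifies its results into a plain vector, which drops the POSIXct class and leaves the underlying seconds since 1970, with the input strings kept as names. One way around it is to avoid the element-by-element loop altogether. `parse_stamp` below is a hypothetical helper name, and `tz = "UTC"` is an assumption.

```r
x <- c("01/09/2009", "01/09/2009 00:00:01")

# sapply() simplifies to a plain numeric vector, so the POSIXct class is lost
via_sapply <- sapply(x, function(s) as.POSIXct(strptime(s, "%d/%m/%Y", tz = "UTC")))
class(via_sapply)

# Vectorized alternative: pad date-only values with midnight, then parse once
parse_stamp <- function(x, tz = "UTC") {   # hypothetical helper, not from the post
  x <- as.character(x)
  short <- !is.na(x) & nchar(x) == 10      # date-only values are 10 characters
  x[short] <- paste(x[short], "00:00:00")
  as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = tz))
}

stamps <- parse_stamp(x)
class(stamps)   # still a date-time class
diff(stamps)    # date arithmetic works on the result
```

Because the whole column is parsed in one call, nothing is simplified away and the class survives.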

PLEA:

Please, can anyone give any help whatsoever, however tenuous?

CODE, DATA & RESULTS:
Function to read required data, intended to make the datetime column of the
data (example given further below) into POSIXct values:
<<<
spot_frequency_readin <- function(file,nrows=-1) {

# create temp class
setClass("t_class2_", representation("character"))
setAs("character", "t_class2_", function(from) {
  sapply(from, function(x) {
    if (nchar(x) == 10) {
      as.POSIXct(strptime(x, format = "%d/%m/%Y"))
    } else {
      as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S"))
    }
  })
})

#(for format symbols, see "R Reference Card")

# read the file (TSV)
file <- read.delim(file, header=TRUE, comment.char = "", nrows=nrows,
as.is=FALSE, col.names=c("DATETIME", "FREQ"),colClasses=c("t_class2_",
"numeric") )

# remove it now that we are done with it
removeClass("t_class2_")

return(file)
}
This appears to work as regards processing each row of data correctly,
but the values returned look like numeric equivalents of POSIXct, as opposed
to the expected character-based (string) equivalents:


Example Data:
<<<
DATETIME        FREQ
01/09/2009      59.036
01/09/2009 00:00:01     58.035
01/09/2009 00:00:02     53.035
01/09/2009 00:00:03     47.033
01/09/2009 00:00:04     52.03
01/09/2009 00:00:05     55.025
Example Function Call:
<<<
spot = spot_frequency_readin("mydatafile.txt",4)
Result of Example Function Call:
<<<
spot[1]
   DATETIME

1 1251759600
2 1251759601
3 1251759602
4 1251759603
What I ideally wanted to see (whether or not the time part of the
datetimestamp at midnight was displayed):
<<<
spot[1]
   DATETIME

01/09/2009 00:00:00
01/09/2009 00:00:01
01/09/2009 00:00:02
01/09/2009 00:00:03
01/09/2009 00:00:04
For the function as defined above using 'sapply'
spot[,1]
         01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
         1251759600          1251759601          1251759602          1251759603
This was unexpected - it seems to have displayed the datetimestamp values
both as per my defined character-string representation and as numeric
values.


> as.POSIXct(spot$DATETIME,  origin="1970-01-01")

               01/09/2009       01/09/2009 00:00:01       01/09/2009 00:00:02
"2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"

      01/09/2009 00:00:03
"2009-09-01 05:00:03 EDT"

If you want to get rid of the somewhat extraneous names:

> unname(as.POSIXct(spot$DATETIME,  origin="1970-01-01") )

[1] "2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"

[4] "2009-09-01 05:00:03 EDT"

If you want a variable that stays that way:

> spot$D2 <- as.POSIXct(spot$DATETIME,  origin="1970-01-01")
> spot
    DATETIME   FREQ                  D2
1 1251777600 59.036 2009-09-01 05:00:00
2 1251777601 58.035 2009-09-01 05:00:01
3 1251777602 53.035 2009-09-01 05:00:02
4 1251777603 47.033 2009-09-01 05:00:03

Or you could overwrite spot$DATETIME.
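
As a hedged variation on the conversion above: passing `tz` explicitly keeps the result from depending on the local time zone, and `format()` can then display the values in the day/month/year layout asked for. The numbers are the ones from the original poster's output; the "Europe/London" zone is an assumption based on the BST shown in that output.

```r
# Seconds since 1970-01-01 UTC, as returned by the sapply-based reader
secs <- c("01/09/2009" = 1251759600, "01/09/2009 00:00:01" = 1251759601)

# Drop the names and restore the POSIXct class with an explicit time zone
stamps <- as.POSIXct(unname(secs), origin = "1970-01-01", tz = "UTC")

# Display in the requested layout, in the poster's (assumed) local zone
format(stamps, "%d/%m/%Y %H:%M:%S", tz = "Europe/London")
```

The stored values remain proper POSIXct date-times; only the printed form changes.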

Alternatively, if I replace the 'sapply' by a 'lapply' then I get something
closer to what I expect. It is at least what looks like R's default text
representation for POSIXct datetimes, even if it is not in my preferred
format.
<<<
spot[,1]
[[1]]
[1] "2009-09-01 BST"

[[2]]
[1] "2009-09-01 00:00:01 BST"

[[3]]
[1] "2009-09-01 00:00:02 BST"

[[4]]
[1] "2009-09-01 00:00:03 BST"
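
If the `lapply` route is kept, the resulting list can be combined back into a single POSIXct vector with `do.call(c, ...)`. This is a sketch only: the sample strings come from the post, and the fixed `tz = "UTC"` is an assumption used for both parsing and display.

```r
x <- c("01/09/2009", "01/09/2009 00:00:01")

# lapply() returns a list whose elements each keep their POSIXct class
as_list <- lapply(x, function(s) {
  fmt <- if (nchar(s) == 10) "%d/%m/%Y" else "%d/%m/%Y %H:%M:%S"
  as.POSIXct(strptime(s, format = fmt, tz = "UTC"))
})

# c() has a POSIXct method, so combining the list gives one POSIXct vector
stamps <- do.call(c, as_list)

# Display in the preferred layout, midnight included
format(stamps, "%d/%m/%Y %H:%M:%S", tz = "UTC")
```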
--



David Winsemius, MD
Heritage Laboratories
West Hartford, CT

