On Oct 5, 2009, at 5:14 PM, esp wrote:


Date-Time-Stamp input method to correctly interpret user-specific
formats: coding is 90% there, based on the example at
http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html
...has anyone got the last 10%, please?

CONTEXT:

Data is received where one of the columns is a datetimestamp. At midnight, the value represented as text in this column consists of just the date part, e.g. "01/09/2009". At other times, the value in the column contains both date and time, e.g. "01/09/2009 00:00:01". The goal is to read it into R as
an appropriate data type, one on which, for example, date arithmetic can be
performed. As far as I can tell, the most appropriate such data type is POSIXct. The trick, then, is to read the datetimestamps in the data as
this type.

PROBLEM:

POSIXct's default text representation is almost, but not quite, like my
received data. The main difference is that the POSIXct date part is in reverse order, e.g. "2009-09-01". It is possible to specify a different format in which the date and time parts look like my data, but datetimestamps where only the date part is present (as in my
midnight data) are then interpreted as NA, i.e. undefined.
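The NA behaviour described above can be reproduced with a minimal sketch. The vector name `stamps` is illustrative, and UTC is assumed only so the result does not depend on the local time zone. `strptime()` returns NA when an input string ends before the format does. The reverse shortcut also fails, for a different reason: `strptime()` ignores trailing characters, so the date-only format "succeeds" on every row but silently drops the time part.

```r
stamps <- c("01/09/2009", "01/09/2009 00:00:01")

# Full date-time format: the date-only (midnight) stamp runs out before "%H", so it becomes NA
full <- as.POSIXct(strptime(stamps, format = "%d/%m/%Y %H:%M:%S", tz = "UTC"))
print(is.na(full))   # TRUE FALSE

# Date-only format: both strings parse, but trailing characters are ignored,
# so the second stamp loses its "00:00:01" time part
date_only <- as.POSIXct(strptime(stamps, format = "%d/%m/%Y", tz = "UTC"))
print(format(date_only, "%d/%m/%Y %H:%M:%S"))
```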

SOLUTION (ALMOST):

There is a workaround (based on the example at
http://tolstoy.newcastle.edu.au/R/help/05/02/12003.html). It is possible to define a class and then read the data in as this class. For such a class it is possible to define a coercion method (with setAs), in terms of a function, that translates a text (character string) representation into a value. In that function, one
can use a conditional expression to treat midnight datetimestamps
differently from those at other times of day. The example below does that. Because that conditional ('if') handles one value at a time, applying it to all of the datetimestamp values in the
column requires something like R's 'sapply' function.
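A per-element conditional is not the only option. The whole column can also be parsed in two vectorized passes. The first pass uses the full date-time format. The second re-parses, with the date-only format, only the entries that came back NA (the midnight stamps). This is a swapped-in technique, not the one from the linked example. `parse_stamp` is a hypothetical name, and UTC is assumed so the numbers are reproducible.

```r
# Hypothetical helper: parse "dd/mm/yyyy" and "dd/mm/yyyy HH:MM:SS" stamps without a per-element loop
parse_stamp <- function(x, tz = "") {
  # Pass 1: full date-time format; date-only (midnight) stamps come back NA
  out <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = tz)
  # Pass 2: re-parse only those entries with the date-only format (midnight)
  midnight <- is.na(out)
  out[midnight] <- as.POSIXct(x[midnight], format = "%d/%m/%Y", tz = tz)
  out   # still a POSIXct vector; strings matching neither format stay NA
}

s <- parse_stamp(c("01/09/2009", "01/09/2009 00:00:01"), tz = "UTC")
print(s)
print(diff(s))
```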

SNAG:

The function below implements this approach. A datetimestamp with only the date part, including leading zeroes, is always 10 characters long. The function correctly interprets the datetimestamp values, but unfortunately translates them into what appear to be numeric values. I am actually uncertain precisely what is happening, as I am very new to R and have most certainly stretched
myself in writing this code. I think perhaps it returns a list, and
something associated with this makes it "forget" that the data type is POSIXct, or at least how values of that type should be displayed as text, and I am not sure what
to do about it.
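The snag can be reproduced in a few lines. The sketch uses a stand-alone copy of the per-element conversion, with UTC assumed, and the names `parse_one` and `stamps` are illustrative. `sapply()` first collects the per-element results in a list. It then simplifies that list with `unlist()`, which drops attributes such as the POSIXct class, leaving only the underlying seconds since 1970-01-01 UTC. Because the input is a character vector, `sapply()` also uses the input strings as names by default. That would account for the date strings that appear as labels when the result is printed.

```r
# Stand-alone copy of the per-element conversion (UTC assumed)
parse_one <- function(x) {
  if (nchar(x) == 10) {
    as.POSIXct(strptime(x, format = "%d/%m/%Y", tz = "UTC"))
  } else {
    as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC"))
  }
}

stamps <- c("01/09/2009", "01/09/2009 00:00:01")

per_element <- lapply(stamps, parse_one)   # a list of length-1 POSIXct values
print(class(per_element[[1]]))             # "POSIXct" "POSIXt"

simplified <- sapply(stamps, parse_one)    # simplified with unlist(): class is lost
print(class(simplified))                   # "numeric"
print(simplified)                          # seconds since the epoch, named by the input strings
```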

PLEA:

Please, can anyone give any help whatsoever, however tenuous?

CODE, DATA & RESULTS:

Function to read the required data, intended to turn the datetime column of the
data (example given further below) into POSIXct values:
<<<
spot_frequency_readin <- function(file, nrows = -1) {

  # create temp class
  setClass("t_class2_", representation("character"))
  setAs("character", "t_class2_", function(from) {
    sapply(from, function(x) {
      if (nchar(x) == 10) {
        as.POSIXct(strptime(x, format = "%d/%m/%Y"))
      } else {
        as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S"))
      }
    })
  })

  # (for format symbols, see "R Reference Card")

  # read the file (TSV)
  file <- read.delim(file, header = TRUE, comment.char = "", nrows = nrows,
                     as.is = FALSE, col.names = c("DATETIME", "FREQ"),
                     colClasses = c("t_class2_", "numeric"))

  # remove it now that we are done with it
  removeClass("t_class2_")

  return(file)
}

This appears to process each row of data correctly, but the values returned look like numeric equivalents of POSIXct values, as opposed
to the expected character-based (string) representations:


Example Data:
<<<
DATETIME        FREQ
01/09/2009      59.036
01/09/2009 00:00:01     58.035
01/09/2009 00:00:02     53.035
01/09/2009 00:00:03     47.033
01/09/2009 00:00:04     52.03
01/09/2009 00:00:05     55.025



Example Function Call:
<<<
spot = spot_frequency_readin("mydatafile.txt",4)



Result of Example Function Call:
<<<
spot[1]
   DATETIME
1 1251759600
2 1251759601
3 1251759602
4 1251759603



What I ideally wanted to see (whether or not the time part of the
datetimestamp at midnight was displayed):
<<<
spot[1]
   DATETIME
01/09/2009 00:00:00
01/09/2009 00:00:01
01/09/2009 00:00:02
01/09/2009 00:00:03
01/09/2009 00:00:04
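POSIXct values carry no display format of their own. The layout above can be produced at output time with `format()`, using the same `%d/%m/%Y %H:%M:%S` codes as for parsing, while the stored column stays POSIXct. A minimal sketch: the data frame is typed in by hand to mirror the example data, and UTC is assumed.

```r
# Illustrative data frame mirroring the first rows of the example data (UTC assumed)
spot <- data.frame(
  DATETIME = as.POSIXct(c("2009-09-01 00:00:00", "2009-09-01 00:00:01"), tz = "UTC"),
  FREQ = c(59.036, 58.035)
)

# The column stays POSIXct, so arithmetic keeps working ...
print(spot$DATETIME[2] - spot$DATETIME[1])

# ... and format() controls only the text shown, including the midnight time part
print(format(spot$DATETIME, "%d/%m/%Y %H:%M:%S", tz = "UTC"))
```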



For the function as defined above, using 'sapply':
spot[,1]
         01/09/2009 01/09/2009 00:00:01 01/09/2009 00:00:02 01/09/2009 00:00:03
         1251759600          1251759601          1251759602          1251759603

This was unexpected - it seems to have displayed the datetimestamp values
both as per my defined character-string representation and as numeric
values.
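One possible repair inside `spot_frequency_readin()` keeps the `setAs()`/`colClasses` mechanism but lets the coercion method handle the whole column at once. It pads the 10-character, date-only stamps with "00:00:00" and converts with a single format, so no `sapply()` step is left to drop the class. This is a sketch, not a tested drop-in, and it makes three assumptions. It declares the class without slots (a virtual class, which appears sufficient for `as()` dispatch). It inlines two data rows through the `text =` argument of `read.delim()` instead of reading a file. It uses the local time zone, as the original code does.

```r
library(methods)

# Throwaway class used only as a colClasses tag (name kept from the original code)
setClass("t_class2_")

setAs("character", "t_class2_", function(from) {
  # Pad date-only (midnight) stamps so that every value matches one format ...
  from <- ifelse(nchar(from) == 10, paste(from, "00:00:00"), from)
  # ... then convert the whole column in one call, so nothing strips the POSIXct class
  as.POSIXct(from, format = "%d/%m/%Y %H:%M:%S")
})

# Two rows of the example data, inlined so the sketch runs without a file
txt <- "DATETIME\tFREQ\n01/09/2009\t59.036\n01/09/2009 00:00:01\t58.035\n"
spot <- read.delim(text = txt, colClasses = c("t_class2_", "numeric"))
removeClass("t_class2_")

str(spot)
```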

> as.POSIXct(spot$DATETIME,  origin="1970-01-01")
               01/09/2009       01/09/2009 00:00:01       01/09/2009 00:00:02
"2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
      01/09/2009 00:00:03
"2009-09-01 05:00:03 EDT"

If you want to get rid of the somewhat extraneous names:

> unname(as.POSIXct(spot$DATETIME,  origin="1970-01-01") )
[1] "2009-09-01 05:00:00 EDT" "2009-09-01 05:00:01 EDT" "2009-09-01 05:00:02 EDT"
[4] "2009-09-01 05:00:03 EDT"

If you want a variable that stays that way:

> spot$D2 <- as.POSIXct(spot$DATETIME,  origin="1970-01-01")
> spot
    DATETIME   FREQ                  D2
1 1251777600 59.036 2009-09-01 05:00:00
2 1251777601 58.035 2009-09-01 05:00:01
3 1251777602 53.035 2009-09-01 05:00:02
4 1251777603 47.033 2009-09-01 05:00:03

Or you could overwrite spot$DATETIME.
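The numbers are seconds since 1970-01-01 00:00:00 UTC, so turning them back into POSIXct only requires re-attaching the class and a time zone. The sketch below overwrites the column in place. The stand-in data frame, the choice of UTC, and the use of `.POSIXct()` are assumptions for illustration. Naming the time zone explicitly matters. Depending on the R version, a character `origin` may be read in the local time zone, which would shift every value by the local UTC offset. That may explain why midnight values print as 05:00 EDT above. In UTC, these particular values (created under BST) fall at 23:00 on 31 August.

```r
# Stand-in for the column produced by the sapply() version: named seconds since the epoch
spot <- data.frame(FREQ = c(59.036, 58.035, 53.035))
spot$DATETIME <- c("01/09/2009" = 1251759600,
                   "01/09/2009 00:00:01" = 1251759601,
                   "01/09/2009 00:00:02" = 1251759602)

# Overwrite the column: unname() drops the labels, .POSIXct() re-attaches the class and a time zone
spot$DATETIME <- .POSIXct(unname(spot$DATETIME), tz = "UTC")

# Equivalent with as.POSIXct(), giving the origin as a UTC date-time rather than a string
alt <- as.POSIXct(c(1251759600, 1251759601, 1251759602),
                  origin = as.POSIXct("1970-01-01", tz = "UTC"), tz = "UTC")

print(format(spot$DATETIME, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
# Date arithmetic works on the overwritten column
print(difftime(spot$DATETIME[3], spot$DATETIME[1], units = "secs"))
```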



Alternatively, if I replace 'sapply' with 'lapply', then I get something closer to what I expect. It is at least what looks like R's default text representation for POSIXct datetimes, even if it is not in my preferred
format.
<<<
spot[,1]

[[1]]
[1] "2009-09-01 BST"

[[2]]
[1] "2009-09-01 00:00:01 BST"

[[3]]
[1] "2009-09-01 00:00:02 BST"

[[4]]
[1] "2009-09-01 00:00:03 BST"

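If the lapply() version is kept, its list (such as `spot[,1]` above) can be collapsed into a single POSIXct vector by concatenating the elements with `c()`. That call dispatches to the POSIXct method and so keeps the class. A minimal sketch: the list is built directly as a stand-in, with UTC assumed, so that the sketch runs on its own.

```r
# Stand-in for the list returned by lapply(): one length-1 POSIXct per stamp (UTC assumed)
per_element <- lapply(c("2009-09-01 00:00:00", "2009-09-01 00:00:01"),
                      as.POSIXct, tz = "UTC")

# c() dispatches to the POSIXct method, so the result is a POSIXct vector rather than a list
stamps <- do.call(c, unname(per_element))

print(class(stamps))   # "POSIXct" "POSIXt"
print(diff(stamps))    # date arithmetic now works on the whole column
```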

--


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
