Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Patrick Connolly
On Sun, 07-Jan-2007 at 12:01PM +, Mark Wardle wrote:

|> Dear all,
|> 
|> The as.Date() function appears to give different results depending on
|> the order of the vector passed into it.
|> 
|> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
|> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
|> as.Date(d1)  # gives correct results
|> as.Date(d2)  # fails with error (* see below)
|> 
|> This problem does not arise if the dates are NA rather than an empty
|> string, but my data is coming via RODBC and I still don't have NAs
|> passed across properly.
|> 
|> I might add that I initially noticed this behaviour when using RODBC's
|> sqlQuery() function call, and I initially had difficulty explaining why
|> one column of dates was passed correctly, but another failed. The
|> failing column was a "date of death" column where it was NA ("") for
|> most patients.
|> 
|> I've come up with two workarounds that work. The first is to sort the
|> data at the SQL level, ensuring the initial record is not null. The
|> second is to use sqlQuery() with as.is=T option, and then do the sorting
|> and conversion afterwards.

Simpler, I think, is to add one line
d2[d2 == ""] <- NA

I've not tested the idea extensively, so there might be occasions
where it falls down.  If you're working with a data frame, you can use
one of the apply functions to affect all columns.
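
A minimal sketch of that idea applied to a whole data frame, assuming the
date columns arrive as character vectors (for instance after fetching with
as.is = TRUE). The data frame `patients` and its column names are invented
for illustration and are not from the original post.

```r
# Hypothetical data standing in for a query result fetched as character.
patients <- data.frame(
  dob = c("1950-02-01", "1961-07-15", "1948-11-30"),
  dod = c("", "2001-05-03", ""),
  stringsAsFactors = FALSE
)

# Replace empty strings with NA in every column.
patients[] <- lapply(patients, function(col) {
  col[!is.na(col) & col == ""] <- NA
  col
})

# With real NAs in place, as.Date() skips them when choosing the format.
patients[] <- lapply(patients, as.Date)
```

Assigning into `patients[]` keeps the object a data frame, whereas a plain
`patients <- lapply(...)` would turn it into a list.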


HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~} Great minds discuss ideas
 _( Y )_Middle minds discuss events 
(:_~*~_:)Small minds discuss people  
 (_)-(_)   . Anon
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Mark Wardle
Prof Brian Ripley wrote:

> The correct work-around is to get non-valid strings returned as NA, not
> "".  That is argument 'na.strings' in RODBC (and elsewhere: read.table
> behaves in the same way).
> 


Thanks for these replies.

As I have mentioned before, my peculiar combination of PostgreSQL,
Actual's ODBC driver on Mac OS X, and RODBC means that for numbers and
dates, NULL values are passed to R as empty strings rather than NAs (2).
This does not occur with PostgreSQL's "text" column type.

For the benefit of others who in the future may use this combination(1),
my workaround for numbers/integers/boolean values is to essentially have
temporary intermediate tables with columns of type "text" whatever the
format, and let R/RODBC parse the strings into the correct resulting
format (which it then does faultlessly). This does not work for dates
however, and so I must use one of the two workarounds I mentioned in my
post.
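
For anyone wanting to do the same conversion by hand, here is a rough sketch
of parsing all-text columns on the R side. It uses type.convert() as a
stand-in and is not meant to describe what RODBC does internally; the data
frame `raw` and its columns are made up for the example.

```r
# Hypothetical text columns, as they might come from a table whose
# columns are all of PostgreSQL type "text".
raw <- data.frame(
  age   = c("34", "", "71"),
  alive = c("TRUE", "FALSE", ""),
  dod   = c("", "2001-05-03", ""),
  stringsAsFactors = FALSE
)

# Numbers and logicals: let type.convert() pick the type, treating "" as NA.
age   <- type.convert(raw$age,   na.strings = c("NA", ""), as.is = TRUE)
alive <- type.convert(raw$alive, na.strings = c("NA", ""), as.is = TRUE)

# Dates: type.convert() does not produce Date objects, so convert
# explicitly with a stated format; "" then becomes NA.
dod <- as.Date(raw$dod, format = "%Y-%m-%d")
```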


Best wishes,

Mark

(1) unlikely as it may be
(2) I still cannot fathom why integers and dates are not handled correctly.



Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Prof Brian Ripley
On Sun, 7 Jan 2007, Mark Wardle wrote:

> Dear all,
>
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
>
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1)   # gives correct results
> as.Date(d2)   # fails with error (* see below)
>
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
>
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
>
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with as.is=T option, and then do the sorting
> and conversion afterwards.
>
> Is the behaviour of as.Date() shown above as expected/designed?

Yes.  It uses the first non-NA string to choose the format *if you do not 
specify it*.

The correct work-around is to get non-valid strings returned as NA, not 
"".  That is argument 'na.strings' in RODBC (and elsewhere: read.table 
behaves in the same way).
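
A small sketch of the na.strings approach using read.table(), since that
runs without a database connection. The inline CSV text and its column names
are invented for illustration; the equivalent RODBC call is not shown because
it needs a live channel.

```r
# Inline CSV standing in for an external data source; the second field is
# empty for patients with no recorded date of death.
csv <- "id,dod
1,
2,2001-05-03
3,
"

# Treat empty fields as NA at import time.
df <- read.table(text = csv, header = TRUE, sep = ",",
                 na.strings = c("NA", ""), stringsAsFactors = FALSE)

# The column now starts with NA rather than "", so the format is taken
# from the first non-NA entry.
dod <- as.Date(df$dod)
```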

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Gavin Simpson
On Sun, 2007-01-07 at 12:01 +, Mark Wardle wrote:
> Dear all,
> 
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
> 
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1)   # gives correct results
> as.Date(d2)   # fails with error (* see below)
> 
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
> 
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
> 
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with as.is=T option, and then do the sorting
> and conversion afterwards.

Why not just tell R what format the dates are in, using the "format"
argument to as.Date?

> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1, "%Y-%m-%d")
[1] "1900-01-01" "2007-01-01" NA           "2001-05-03"
> as.Date(d2, "%Y-%m-%d")
[1] NA           "1900-01-01" "2007-01-01" "2001-05-03"

> 
> Is the behaviour of as.Date() shown above as expected/designed?

I don't know about expected/designed, but I would have thought
explicitly stating the date format would be the most fool-proof way of
making sure R did what you wanted, and the easiest way to work around
your "problem".
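
To make the point concrete, here is a short sketch that wraps the explicit
format in a helper so the result cannot depend on element order. The name
`parse_iso_date` is made up for this example; it is not an existing R
function.

```r
d1 <- c("1900-01-01", "2007-01-01", "", "2001-05-03")
d2 <- c("", "1900-01-01", "2007-01-01", "2001-05-03")

# Hypothetical helper: always parse with one fixed format, so no guessing
# from the first element takes place.
parse_iso_date <- function(x) as.Date(x, format = "%Y-%m-%d")

r1 <- parse_iso_date(d1)
r2 <- parse_iso_date(d2)

# Compare the parsed dates with the NA entries dropped by sort().
same_dates <- all(sort(r1) == sort(r2))
```

Anything that does not match the format, including "", comes back as NA
rather than raising an error, so it is worth checking for unexpected NAs
afterwards.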

HTH

G

> 
> Many thanks,
> 
> Mark
> 
> 
> (*) "Error in fromchar(x) : character string is not in a standard
> unambiguous format"
> 
> sessionInfo():
> R version 2.4.0 (2006-10-03)
> powerpc-apple-darwin8.7.0
>
> locale:
> C/en_GB.UTF-8/C/C/C/C
>
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "base"
>
> other attached packages:
> rcompletion       RODBC
>    "0.0-12"     "1.1-7"
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [t] +44 (0)20 7679 0522
ECRC                              [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building                  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK                        [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT                          [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



[R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Mark Wardle
Dear all,

The as.Date() function appears to give different results depending on
the order of the vector passed into it.

d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
as.Date(d1) # gives correct results
as.Date(d2) # fails with error (* see below)

This problem does not arise if the dates are NA rather than an empty
string, but my data is coming via RODBC and I still don't have NAs
passed across properly.

I might add that I initially noticed this behaviour when using RODBC's
sqlQuery() function call, and I initially had difficulty explaining why
one column of dates was passed correctly, but another failed. The
failing column was a "date of death" column where it was NA ("") for
most patients.

I've come up with two workarounds that work. The first is to sort the
data at the SQL level, ensuring the initial record is not null. The
second is to use sqlQuery() with as.is=T option, and then do the sorting
and conversion afterwards.

Is the behaviour of as.Date() shown above as expected/designed?

Many thanks,

Mark


(*) "Error in fromchar(x) : character string is not in a standard
unambiguous format"

sessionInfo():
R version 2.4.0 (2006-10-03)
powerpc-apple-darwin8.7.0

locale:
C/en_GB.UTF-8/C/C/C/C

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "base"

other attached packages:
rcompletion       RODBC
   "0.0-12"     "1.1-7"
