Re: [R] as.Date() results depend on order of data within vector?
On Sun, 07-Jan-2007 at 12:01PM +, Mark Wardle wrote:

|> Dear all,
|>
|> The as.Date() function appears to give different results depending on
|> the order of the vector passed into it.
|>
|> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
|> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
|> as.Date(d1) # gives correct results
|> as.Date(d2) # fails with error (* see below)
|>
|> This problem does not arise if the dates are NA rather than an empty
|> string, but my data is coming via RODBC and I still don't have NAs
|> passed across properly.
|>
|> I might add that I initially noticed this behaviour when using RODBC's
|> sqlQuery() function call, and I initially had difficulty explaining why
|> one column of dates was passed correctly, but another failed. The
|> failing column was a "date of death" column where it was NA ("") for
|> most patients.
|>
|> I've come up with two workarounds that work. The first is to sort the
|> data at the SQL level, ensuring the initial record is not null. The
|> second is to use sqlQuery() with as.is=T option, and then do the sorting
|> and conversion afterwards.

Simpler, I think, is to add one line:

d2[d2 == ""] <- NA

I've not tested the idea extensively, so there might be occasions where
it falls down. If you're working with a dataframe, you can use one of
the apply functions to apply it to all columns.

HTH

--
Patrick Connolly

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
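A minimal sketch of the one-line fix above, extended to every column of a data frame with lapply(). The data frame `patients`, its column names, and the helper `blank_to_na()` are hypothetical and only stand in for a query result; they are not from the original post.

```r
# Hypothetical stand-in for a query result in which NULL dates arrive as "".
patients <- data.frame(
  date_of_birth = c("1900-01-01", "1950-06-15", "2001-05-03"),
  date_of_death = c("", "2007-01-01", ""),
  stringsAsFactors = FALSE
)

# Replace empty strings with NA, leaving existing NAs untouched.
blank_to_na <- function(x) {
  x[!is.na(x) & x == ""] <- NA
  x
}

# Apply the replacement to every column, then convert each column to Date.
# as.Date() skips leading NAs when it looks for a string to infer the format.
patients[] <- lapply(patients, blank_to_na)
patients[] <- lapply(patients, as.Date)

print(patients)
```

Assigning into `patients[]` keeps the object a data frame rather than turning it into a plain list. This assumes every column holds dates; with mixed column types, only the date columns should be passed to as.Date().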
Re: [R] as.Date() results depend on order of data within vector?
Prof Brian Ripley wrote:

> The correct work-around is to get non-valid strings returned as NA, not
> "". That is argument 'na.strings' in RODBC (and elsewhere: read.table
> behaves in the same way).

Thanks for these replies.

As I have mentioned before, my peculiar combination of PostgreSQL,
Actual's ODBC driver on Mac OS X, and RODBC means that for numbers and
dates, NULL values are passed to R as empty strings rather than NAs (2).
This does not occur with PostgreSQL's "text" column type.

For the benefit of others who in the future may use this combination (1),
my workaround for numbers/integers/boolean values is to essentially have
temporary intermediate tables with columns of type "text" whatever the
format, and let R/RODBC parse the strings into the correct resulting
format (which it then does faultlessly).

This does not work for dates however, and so I must use one of the two
workarounds I mentioned in my post.

Best wishes,

Mark

(1) unlikely as it may be
(2) I still cannot fathom why integers and dates are not handled correctly.
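A rough sketch of the "fetch everything as text, convert afterwards" approach described above. No database is involved: the data frame `raw` is a hypothetical stand-in for what a query might return when every column arrives as character, and the column names are invented for illustration. Supplying the date format explicitly means the position of the empty strings does not matter, so no sorting is needed.

```r
# Hypothetical stand-in for a result set fetched as text:
# every column is character, and NULLs have arrived as "".
raw <- data.frame(
  id            = c("1", "2", "3"),
  date_of_death = c("", "2001-05-03", ""),
  stringsAsFactors = FALSE
)

# Convert each column explicitly after the fetch.
raw$id <- as.integer(raw$id)

# With an explicit format, "" simply fails to parse and becomes NA.
raw$date_of_death <- as.Date(raw$date_of_death, format = "%Y-%m-%d")

str(raw)
```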
Re: [R] as.Date() results depend on order of data within vector?
On Sun, 7 Jan 2007, Mark Wardle wrote:

> Dear all,
>
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
>
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1) # gives correct results
> as.Date(d2) # fails with error (* see below)
>
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
>
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
>
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with as.is=T option, and then do the sorting
> and conversion afterwards.
>
> Is the behaviour of as.Date() shown above as expected/designed?

Yes. It uses the first non-NA string to choose the format *if you do not
specify it*.

The correct work-around is to get non-valid strings returned as NA, not
"". That is argument 'na.strings' in RODBC (and elsewhere: read.table
behaves in the same way).

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
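To illustrate the 'na.strings' advice with the read.table family, here is a small sketch using read.csv() on inline text. The CSV content and column names are made up for the example. The RODBC side is not exercised, since it needs a live database connection.

```r
# Made-up CSV in which the first record has an empty date field.
csv_text <- "id,date_of_death
1,
2,1900-01-01
3,2007-01-01"

# Map empty fields to NA at read time instead of patching them afterwards.
patients <- read.csv(
  text = csv_text,
  na.strings = c("NA", ""),
  stringsAsFactors = FALSE
)

# The leading value is now NA, so as.Date() looks past it and infers
# the format from the first non-NA string.
patients$date_of_death <- as.Date(patients$date_of_death)

print(patients)
```

The design point is that the missing values are fixed where the data enters R, so every later step sees ordinary NAs.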
Re: [R] as.Date() results depend on order of data within vector?
On Sun, 2007-01-07 at 12:01 +, Mark Wardle wrote:

> Dear all,
>
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
>
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1) # gives correct results
> as.Date(d2) # fails with error (* see below)
>
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
>
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
>
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with as.is=T option, and then do the sorting
> and conversion afterwards.

Why not just tell R what format the dates are in, using the "format"
argument to as.Date?

> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1, "%Y-%m-%d")
[1] "1900-01-01" "2007-01-01" NA           "2001-05-03"
> as.Date(d2, "%Y-%m-%d")
[1] NA           "1900-01-01" "2007-01-01" "2001-05-03"

>
> Is the behaviour of as.Date() shown above as expected/designed?

I don't know about expected/designed, but I would have thought
explicitly stating the date format would be the most fool-proof way of
making sure R did what you wanted, and the easiest way to work around
your "problem".

HTH

G

>
> Many thanks,
>
> Mark
>
>
> (*) "Error in fromchar(x) : character string is not in a standard
> unambiguous format"
>
> sessionInfo():
> R version 2.4.0 (2006-10-03)
> powerpc-apple-darwin8.7.0
>
> locale:
> C/en_GB.UTF-8/C/C/C/C
>
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"
> "datasets"  "base"
>
> other attached packages:
> rcompletion       RODBC
>    "0.0-12"     "1.1-7"

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                 [t] +44 (0)20 7679 0522
ECRC                          [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK                    [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT                      [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
[R] as.Date() results depend on order of data within vector?
Dear all,

The as.Date() function appears to give different results depending on
the order of the vector passed into it.

d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
as.Date(d1) # gives correct results
as.Date(d2) # fails with error (* see below)

This problem does not arise if the dates are NA rather than an empty
string, but my data is coming via RODBC and I still don't have NAs
passed across properly.

I might add that I initially noticed this behaviour when using RODBC's
sqlQuery() function call, and I initially had difficulty explaining why
one column of dates was passed correctly, but another failed. The
failing column was a "date of death" column where it was NA ("") for
most patients.

I've come up with two workarounds that work. The first is to sort the
data at the SQL level, ensuring the initial record is not null. The
second is to use sqlQuery() with as.is=T option, and then do the sorting
and conversion afterwards.

Is the behaviour of as.Date() shown above as expected/designed?

Many thanks,

Mark


(*) "Error in fromchar(x) : character string is not in a standard
unambiguous format"

sessionInfo():
R version 2.4.0 (2006-10-03)
powerpc-apple-darwin8.7.0

locale:
C/en_GB.UTF-8/C/C/C/C

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "base"

other attached packages:
rcompletion       RODBC
   "0.0-12"     "1.1-7"