I would add to this that in an important data set I was working with, most of the dates were dd/mm/yy but some of them were mm/dd/yy and that led to the realisation that I couldn't *tell* for about 40% of the dates which they were. If they were all one or the other, no worries, but when you have people from mixed backgrounds writing in mixed formats, you have a problem.
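For what it's worth, here is a minimal sketch in base R of how one might measure the problem before parsing anything. The vector `x` is made-up illustration data, not my real data set, and the names `first`, `second`, `ambiguous` and `dmy` are just my own choices. A date only counts as undecidable when both leading fields could be a month and they differ (05/05/20 reads the same either way).

```r
# Hypothetical sample of slash-separated short dates (illustration only).
x <- c("01/02/03", "10/11/12", "23/11/99", "11/23/99", "31/01/20")

# Split each string on "/" and pull out the first two numeric fields.
parts  <- do.call(rbind, strsplit(x, "/", fixed = TRUE))
first  <- as.integer(parts[, 1])
second <- as.integer(parts[, 2])

# Undecidable: both fields are valid months and swapping them changes the date.
ambiguous <- first <= 12 & second <= 12 & first != second

# Share of dates whose format cannot be inferred from the value itself.
mean(ambiguous)

# Parse with an explicitly stated format; entries that do not fit become NA
# instead of being silently reinterpreted.
dmy <- as.Date(x, format = "%d/%m/%y")
x[is.na(dmy)]   # entries that cannot be dd/mm/yy
format(dmy)     # ISO 8601 strings (NA where parsing failed)
```

The NA entries at least tell you which records cannot be dd/mm/yy. For the ambiguous ones, no code can help: you still need outside knowledge of who wrote them.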

On Thu, 11 Jun 2020 at 19:17, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

> >>>>> Rich Shepard
> >>>>>     on Wed, 10 Jun 2020 07:44:49 -0700 writes:
>
>     > On Wed, 10 Jun 2020, Jeff Newmiller wrote:
>     >> Fix your format specification? ?strptime
>
>     >>> I have been trying to convert European short dates
>     >>> formatted as dd/mm/yy into the ISO 8601 but the function
>     >>> as.Date interprets them as American ones (mm/dd/yy),
>     >>> thus I get:
>
>     > Look at Hadley Wickham's 'tidyverse' collection as
>     > described in R for Data Science. There are date, datetime,
>     > and time functions that will do just what you want.
>
>     > Rich
>
> I strongly disagree that automatic guessing of date format is a
> good idea:
>
> If you have dates such as 01/02/03, 10/11/12, ...
> you cannot have a software (and also not a human) to *guess* for
> you what it means.  You have to *know* or get that knowledge "exogenously",
> i.e., from context (say "meta data" if you want) that you as
> data analyst must have before you can reliably work with that
> data.
>
> There is a global standard (ISO) for dates, 2020-06-11, for today's;
> these have the huge advantage that alphabetical ordering is
> equivalent to time ordering ... and honestly I don't see why
> smart people (such as most? R users) do not all use these much
> more often, notably when it comes to data.
>
> But as long as most people in the world don't use that format
> and practically all default formats for dates (e.g. in
> spreadsheets and computer locales) do not use the ISO
> standard, but rather regional conventions, one must add meta
> data to have a 100% guarantee of using the correct format.
>
> Of course, you can often guess correctly with very high
> (subjective) probability, e.g., 11/23/99 is highly probably
> the 23rd of Nov, 1999.... and indeed if you have more than a few
> dates, it often helps to guess correctly.  But there's no
> guarantee.
>
> No, I state that it is much better to ask from the data analyst
> to use their brains a little bit and enter the date format
> explicitly, than using software that does guess it for them
> correctly most of the time.  How should they find out at all in
> the rare cases the automatic guess will be wrong ?
>
> Martin Maechler
> ETH Zurich  and  R Core team
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.