Re: [Rd] Date vs date (long)

2007-09-21 Thread Terry Therneau
Peter et al

  Thanks for the comments on dates.  Some of the respondents missed the point,
by showing ways that I could work around the problems, when my main argument
is that one shouldn't have to work around problems.  So I hereby present
round 2 of the debate.

 1 Postulates

   a. In my 35 year computing experience, I think that nothing frustrates me
more than a computer program that tries to keep me from doing something
for my own protection, when I know quite well what I am doing.  So postulate
1 is a Bayesian sort of thing: the loss function is so large (hopping mad
user) that one should be very cautious about creating a taboo.

   b. The S language's primary success is as a tool.  Tools get used in ways
that the originator never thought of.  Alternate use is not wrong --- in
fact you want to foster it.  (My farm background plays a role here.  You wouldn't
believe the number of things I've fixed with a hammer and/or wrench, when the
goal was not to get it done right, but just to get whatever done and get the 
crop in.)

  
 2 Key question

  Both a date and a time-span object consist of a numeric value along with
ancillary information about how to interpret that value.  For simplicity call
the latter attributes (ignoring whether they are implemented using the attr
function or slots or whatever).
  For some operations it is fairly clear what to do with both the attr and the
numeric part, e.g., date + 1 is the next day.  No problem here.

  For other operations, e.g., timespan^2, it is only clear that the result is no
longer a timespan, but not what class it should be.  I firmly believe that the
right result is to toss the attribute and return the number.  This makes
the tool optimally useful.  Returning an error message is an unnecessary
and controlling response: what good did the "not legal" message do me?
There are of course many cases where an error message is the only choice,
because I can't see what to do with either the number or the attribute,
e.g.  date + string.
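
To make the three cases concrete, here is a minimal sketch in base R.  The
dates are arbitrary illustrations, not values from this thread, and the use of
try() is only there so the script keeps running whichever way R answers.

```r
# Illustrative values only; none of these dates come from the discussion.
d    <- as.Date("2007-09-21")
span <- as.Date("2007-09-21") - as.Date("2007-09-14")   # a difftime of 7 days

# Clear case: date + 1 is the next day and stays a Date.
next_day <- d + 1

# Unclear case: squaring a time span.  try() records whether R refuses.
squared <- try(span^2, silent = TRUE)
refused <- inherits(squared, "try-error")

# The "toss the attribute, return the number" result, requested explicitly.
squared_days <- as.numeric(span, units = "days")^2

# No sensible answer: date + string can only be an error.
bad <- try(d + "tomorrow", silent = TRUE)
```

The explicit as.numeric() call is exactly the kind of wrapper this message
argues should not be needed; the sketch only shows what each choice looks like.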

  The key question is then what is the right philosophy, flexible tool or
rigorous control?  Rigorous control languages have not fared well historically.

3 Hard cases

  The hardest are cases where the right return value is unclear.  An example
is (date + 1.73) : should one return a true date, which is integer, allow
an invalid internal value that is fixed at print time, return a numeric,
or an error message?

   I put (timespan/constant) in this category.  The author has no hint as to
whether the constant is unitless or not.  In the medical research environment
conversions back and forth from days to months and years are very common,
greatly outnumbering division of an interval into pieces, so if I had to guess
I would assume that I had to drop the units; another environment might be
just the opposite.
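
A short sketch of the two readings of timespan/constant, using the two dates
that appear elsewhere in this thread.  The variable names are made up for the
illustration.

```r
born  <- as.Date("1953-03-10")
today <- as.Date("2007-09-14")
age   <- today - born                 # difftime, units = "days"

# Reading 1: the constant is unitless, so the result keeps its "days" label.
labelled <- age / 365.25
units(labelled)                       # still "days"

# Reading 2: the constant converts days to years, so drop the units first.
age_years <- as.numeric(age, units = "days") / 365.25
```

Both computations produce the same number; they differ only in whether the
result still claims to be a count of days.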

4 Response to particular points:

Peter D, 9/14
  a. as.Date(x)
Peter suggests (as.Date('1960-1-1') + x).  This is a really good idea, as it
makes the code both origin independent and clearer.
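
As a sketch of that suggestion, assuming x holds day counts whose day zero is
1960-01-01 (the vector below is invented for the example):

```r
# Hypothetical day counts as they might arrive from a system whose
# day zero is 1960-01-01 (the base used by the date package and SAS).
x <- c(0, 10, 17423)

# The origin is stated in the code, so nothing depends on R's internal epoch.
dates <- as.Date("1960-01-01") + x

# Equivalent form using the origin argument of as.Date().
same <- as.Date(x, origin = "1960-01-01")
```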

  b. I'd advise against numeric operation on difftime objects in general, 
because of the unspecified units.
  If I carry this idea forward, then R should insist that I specify units for
any variable that corresponds to a physical quantity, e.g. height or 
weight, so that it can slap my hands with an error message when I type

bodyMassIndex = weight/ height^2

or cause plot(height^2, weight) to fail.  This would go a long way towards
making R the most frustrating program available.  (And Microsoft gives some
stiff competition in that area!)

 c. 
It is assumed that the divisor is unit-less. 
Convert to numeric first to avoid this. (The idea has been raised to 
introduce new units: epiyears and epimonths, in which case you might do

x <- as.Date('2007-9-14') - as.Date('1953-3-10')
units(x) <- "epiyears"

which would give you the age in years for those purposes where you don't 
care missing the exact birthday by a day or so.)

   As I said, division is a hard case with no clear answer.  The creation of
other unit schemes is silly --- why in the world would I voluntarily put on
a straightjacket?

d. 
 as.Date('09Sep2007')
 
 Error in fromchar(x) : character string is not in a standard unambiguous 
format

  My off-the-cuff suggestion is to make the message honest
Error in fromchar(x): program is not able to divine the correct format

The problem is not that the format is necessarily wrong or ambiguous, but that
the program can't guess.  (Which is no real fault of the program - such 
a recognition is a hard problem.  It's ok to ask me for a format string).

--
Hadley Wickham
Why not just always use seconds for difftime objects?  An attribute
could control how it was formatted, but would be independent of the
underlying representation.

  This misses the point.

---

Gabor Grothendieck 

as.Date(10)
You can define as.Date.numeric in your package and then it will work.  zoo
has done that.

library(zoo)
as.Date(10)

  This is also a 

Re: [Rd] Date vs date (long)

2007-09-17 Thread Peter Dalgaard
Terry Therneau wrote:
   b. I'd advise against numeric operation on difftime objects in general, 
 because of the unspecified units.
   If I carry this idea forward, then R should insist that I specify units for
 any variable that corresponds to a physical quantity, e.g. height or 
 weight, so that it can slap my hands with an error message when I type

   bodyMassIndex = weight/ height^2

 or cause plot(height^2, weight) to fail.  This would go a long way towards
 making R the most frustrating program available.  (And Microsoft gives some
 stiff competition in that area!)
   
That's not the point. The point is that 2 weeks is 14 days, so do you 
want sqrt(2) or sqrt(14)? It is not my design to have this 
variable-units encoding of difftimes, but as it is there, it  is better 
to play along than to pretend that it is something else. (Once you go to 
faster time scales than in epidemiology, this becomes quite crucial 
because the units chosen can depend on the actual differences computed!)
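
A small sketch of the variable-units behaviour described here.  The timestamps
are invented; the point is only that the unit attached to a difftime can change
with the size of the difference.

```r
t0 <- as.POSIXct("2007-09-17 12:00:00", tz = "UTC")

# With the default automatic choice, the unit depends on the size of the gap.
short <- (t0 + 30)   - t0    # reported in seconds
long  <- (t0 + 7200) - t0    # reported in hours

# The bare numbers (30 and 2) are not comparable until a unit is fixed.
as.numeric(short, units = "secs")
as.numeric(long,  units = "secs")

# Two weeks is 14 days, so the square root depends on the unit chosen.
two_weeks <- as.difftime(2, units = "weeks")
sqrt(as.numeric(two_weeks, units = "weeks"))
sqrt(as.numeric(two_weeks, units = "days"))
```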

  c. 
 It is assumed that the divisor is unit-less. 
 Convert to numeric first to avoid this. (The idea has been raised to 
 introduce new units: epiyears and epimonths, in which case you might do

 x <- as.Date('2007-9-14') - as.Date('1953-3-10')
 units(x) <- "epiyears"

 which would give you the age in years for those purposes where you don't 
 care missing the exact birthday by a day or so.)

As I said, division is a hard case with no clear answer.  The creation of
 other unit schemes is silly --- why in the world would I voluntarily put on
 a straightjacket?
   
We'll put it on for you...

It makes sense to calculate half a difftime or a 12th or a 100th of a 
difftime. You were asking the system to magically conclude that a 
365.25th of a difftime has a different meaning, a units conversion. This 
is the sort of thing that humans can discern, but not machines. The 
design is that you change units by using units(x)<-. Unfortunately the 
largest regular unit is weeks, hence the suggestion of epiyears.
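
A sketch of that mechanism with the units that exist today.  The "years"
attempt at the end is included only to show that no such unit is accepted; it
is not part of the proposal above.

```r
x <- as.Date("2007-09-14") - as.Date("1953-03-10")
units(x)                  # "days"

# Changing the unit rescales the stored number along with the label.
units(x) <- "weeks"
as.numeric(x)             # the same span, now counted in weeks

# "years" is not among the accepted units, so this assignment fails.
res <- try(units(x) <- "years", silent = TRUE)
```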

 d. 
   
 as.Date('09Sep2007')
 
   
 Error in fromchar(x) : character string is not in a standard unambiguous 
 format

   My off-the-cuff suggestion is to make the message honest
   Error in fromchar(x): program is not able to divine the correct format
   
Heh. Pretty close.  Now what is a suitable euphemism for "divine"?
 The problem is not that the format is necessarily wrong or ambiguous, but that
 the program can't guess.  (Which is no real fault of the program - such 
 a recognition is a hard problem.  It's ok to ask me for a format string).

 --
 Hadley Wickham
 Why not just always use seconds for difftime objects?  An attribute
 could control how it was formatted, but would be independent of the
 underlying representation.

   This misses the point.
   
No. It _is_ the point.  The design is that the numeric value of a 
difftime is nonsensical without knowing the units. This might be 
different, although as Brian indicated, the choice is deliberate, and 
some deep thinking was involved.
 ---

 Gabor Grothendieck 

 as.Date(10)
 You can define as.Date.numeric in your package and then it will work.  zoo
 has done that.

 library(zoo)
 as.Date(10)

   This is also a nice idea.  Although adding to a package is possible, it is
 now very hard to take away, given namespaces.  That is, I can't define my
 own Math.Date to do away with the creation of timespan objects.  Am I
 correct?  Is it also true that adding methods is hard if one uses version 4
 classes?

   The rest of Gabor's comments are workarounds for the problem I raised.
 But I don't want to have to wrap as.numeric around all of my date 
 calculations.

   
Just get used to it, I'd say.

 ---
 Brian Ripley

 It fails by design.  Using sqrt() on a measurement that has an arbitrary 
 origin would not have been good design.

   Ah, the classic Unix response of "that's not a bug, it's a feature".

   What is interesting is that this is almost precisely the response I
 got when I first argued for a global na.action default.  John C (I think)
 replied that, essentially, S SHOULD slap you alongside the head when
 there were missing values.  They require careful thought wrt good analysis,
 and allowing a global option was bad design because it would encourage bad
 statistics.  The Insightful side of the debate said they didn't dare because
 it might break something.  After getting nowhere with talking I finally
 gave up and wrote my own version into the survival code.  This leverage 
 eventually forced adoption of the idea.
Not many (any?) people currently set na.action=na.fail because it is
 a better design. 
 --

 Historically, languages designed for other people to use have been bad: Cobol,
 PL/I, Pascal, Ada, C++. The good languages have been those that were designed 
 for their own creators: C, Perl, Smalltalk, Lisp. (Paul Graham)
   
Each of those that I know had its share of trouble with users who relied 
on 

Re: [Rd] Date vs date (long)

2007-09-17 Thread Gabor Grothendieck
On 9/17/07, Terry Therneau [EMAIL PROTECTED] wrote:
 Gabor Grothendieck

 as.Date(10)
 You can define as.Date.numeric in your package and then it will work.  zoo
 has done that.

 library(zoo)
 as.Date(10)

  This is also a nice idea.  Although adding to a package is possible, it is
 now very hard to take away, given namespaces.  That is, I can't define my
 own Math.Date to do away with the creation of timespan objects.  Am I
 correct?  Is it also true that adding methods is hard if one uses version 4
 classes?

  The rest of Gabor's comments are workarounds for the problem I raised.
 But I don't want to have to wrap as.numeric around all of my date
 calculations.

You can define as.Date.numeric and Ops.Date, say, using S3 and these
will be added to whatever is there but won't override the existing
+.Date and -.Date nor would you want them to or else the behavior would
be different depending on whether your package was there or not.  Also
namespaces should not be a problem since zoo uses namespaces and
it defined its own as.Date.numeric.

Try this:

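Ops.Date <- function (e1, e2) {
    e <- if (missing(e2)) {
        # unary operator: defer to the next method
        NextMethod(.Generic)
    }
    else if (any(nchar(.Method) == 0)) {
        # .Method has an empty entry, i.e. only one operand is a Date
        NextMethod(.Generic)
    }
    else {
        # both operands are Dates: convert to plain numbers first
        e1 <- as.numeric(e1)
        e2 <- as.numeric(e2)
        NextMethod(.Generic)
    }
    e
}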

Sys.Date() / Sys.Date()
Sys.Date() + as.numeric(Sys.Date())
as.numeric(Sys.Date()) + as.numeric(Sys.Date())

Sys.Date() + Sys.Date() # error since it's intercepted by +.Date

Thus you will have to issue some as.numeric calls but perhaps not too
many.

However, I think it's better not to implement Ops.Date as above but
just leave the Date operations the way they are, extend it with
as.Date.numeric like zoo has done and force the user to use as.numeric
in other cases to make it clear from the code that there is conversion
going on.  I have done a fair amount of Date manipulation and have not
found the as.numeric to be onerous.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Date vs date

2007-09-15 Thread Prof Brian Ripley
On Fri, 14 Sep 2007, hadley wickham wrote:

   3. temp <- as.Date('1990/1/1') - as.date('1953/2/5')
  sqrt(temp)
  Error in Math.difftime(temp3) : sqrtnot defined for difftime objects

   Minor bug: no space before the word 'not'
   Major: this shouldn't fail.


 Arguably, it should (Is this a difftime object? Which units?).
 I'd advise against numeric operation on difftime objects in general,
 because of the unspecified units. These are always days when working
 with Date objects, but with general time objects it is not predictable.
 So I'd recommend sqrt(as.numeric(temp, units="days")).

It fails by design.  Using sqrt() on a measurement that has an arbitrary 
origin would not have been good design.

 Why not just always use seconds for difftime objects?  An attribute
 could control how it was formatted, but would be independent of the
 underlying representation.

Because of leap seconds and changes to/from DST (which require knowing the 
timezone and its transition times).
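
A sketch of the DST half of that answer.  It assumes the "America/New_York"
zone is available in the local time zone database; the zone and dates are
chosen for the example and do not come from this thread.

```r
# US daylight saving time began on 2007-03-11, so that local day was short.
before <- as.POSIXct("2007-03-11 00:00:00", tz = "America/New_York")
after  <- as.POSIXct("2007-03-12 00:00:00", tz = "America/New_York")

# One calendar day apart, yet only 23 hours of elapsed time.
elapsed <- difftime(after, before, units = "hours")

# The same two calendar days as Date objects differ by exactly one day.
calendar <- as.Date("2007-03-12") - as.Date("2007-03-11")
```

So a count of seconds and a count of days can disagree about the same pair of
timestamps unless the time zone and its transitions are taken into account.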

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] Date vs date

2007-09-14 Thread Peter Dalgaard
Terry Therneau wrote:
  I wrote the date package long ago, and it has been useful.  In my current
 task of reunifying the R (Tom Lumley) and Splus (me) code trees for survival,
 I'm removing the explicit dependence on 'date' objects from the expected
 survival routines so that they better integrate.   Comparison of 'date' to
 'Date' has raised a couple of questions.

   Clearly Date is more mature -- more options for conversion, better
 plotting, etc (a long list of etc).  I see three things where date is better.
 Only the last of these really matters, and is the point on which I would like
 comment.  (Well, actually I'd like to talk you all into a change, of course).

   1. Since date uses 1/1/1960 as the base, and so does SAS, those of us who
 constantly pass files back and forth between those two packages have a
 slightly easier time.

   2. as.date(10) works, as.Date(10) does not.  Sometimes I have done a
 manipulation that the date package does not understand, and I know that the
 result is still of the right type, but the package does not.  However, this
 is fairly rare and I can work around it. (It mostly occurs in processing the
 rate tables for expected survival).
   
The ideology here is that the origin is an implementation detail which 
users are not really expected to know. as.Date("1960-1-1")+10  works, 
and if you insist, so does structure(10, class="Date") (albeit not with 
the same result).

   
   3. temp <- as.Date('1990/1/1') - as.date('1953/2/5')
  sqrt(temp)
  Error in Math.difftime(temp3) : sqrtnot defined for difftime objects

   Minor bug: no space before the word 'not'
   Major: this shouldn't fail.  

   
Arguably, it should (Is this a difftime object? Which units?).
I'd advise against numeric operation on difftime objects in general, 
because of the unspecified units. These are always days when working 
with Date objects, but with general time objects it is not predictable. 
So I'd recommend sqrt(as.numeric(temp, units="days")).

 People will do things with time intervals that you have not thought of.
 Fitting a growth curve that uses a square root, for instance.  I firmly
 believe that the superior behavior in the face of something unexpected is to
 assume that the user knows what they are doing, and return a numeric.
   I recognize that "assume the user knows what they are doing" is an
 anathema to the more zealous OO types, but in designing a class I have found
 that they often know more than me!

4. Variation on #3 above

   
 (as.Date('2007-9-14') - as.Date('1953-3-10')) / 365.25
 
   Time difference of 54.51335 days

 No, I am not 54.5 days old.  Both hair color and knee creaking most 
 definitely proclaim otherwise, I am sorry to say. Time difference / number 
 should be a number.  
   
Same story as above. It is assumed that the divisor is unit-less. 
Convert to numeric first to avoid this. (The idea has been raised to 
introduce new units: epiyears and epimonths, in which case you might do

x <- as.Date('2007-9-14') - as.Date('1953-3-10')
units(x) <- "epiyears"

which would give you the age in years for those purposes where you don't 
care missing the exact birthday by a day or so.)
 
 5. This is only amusing.  I'm not saying that as.Date should necessarily
 work, but the format is certainly not ambiguous.  (Not standard, but not
 ambiguous).  Not important to fix, not something that date does any better.

   
 as.Date('09Sep2007')
 
 Error in fromchar(x) : character string is not in a standard unambiguous 
 format
  

   
Yes. Also: this _is_ ambiguous, but does not cause an error

  as.Date("05-06-07")
[1] "5-06-07"

Not that it should or even could, but it demonstrates that the error 
message above is beside the point.  Can you suggest a better wording?
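
To illustrate why that string is ambiguous, here is a sketch that reads it
three ways by supplying an explicit format each time.  The three formats are
my own choice of plausible readings, not ones proposed in this thread.

```r
s <- "05-06-07"

# Three plausible readings of the same string, each made explicit.
dmy <- as.Date(s, format = "%d-%m-%y")   # 5 June 2007
mdy <- as.Date(s, format = "%m-%d-%y")   # 6 May 2007
ymd <- as.Date(s, format = "%y-%m-%d")   # 7 June 2005
```

With a format string the caller resolves the ambiguity, and as.Date no longer
has to guess.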
 


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [Rd] Date vs date

2007-09-14 Thread hadley wickham
3. temp <- as.Date('1990/1/1') - as.date('1953/2/5')
   sqrt(temp)
   Error in Math.difftime(temp3) : sqrtnot defined for difftime objects
 
Minor bug: no space before the word 'not'
Major: this shouldn't fail.
 
 
 Arguably, it should (Is this a difftime object? Which units?).
 I'd advise against numeric operation on difftime objects in general,
 because of the unspecified units. These are always days when working
 with Date objects, but with general time objects it is not predictable.
 So I'd recommend sqrt(as.numeric(temp, units="days")).

Why not just always use seconds for difftime objects?  An attribute
could control how it was formatted, but would be independent of the
underlying representation.

Hadley



Re: [Rd] Date vs date

2007-09-14 Thread Gabor Grothendieck
On 9/14/07, Terry Therneau [EMAIL PROTECTED] wrote:
  I wrote the date package long ago, and it has been useful.  In my current
 task of reunifying the R (Tom Lumley) and Splus (me) code trees for survival,
 I'm removing the explicit dependence on 'date' objects from the expected
 survival routines so that they better integrate.   Comparison of 'date' to
 'Date' has raised a couple of questions.

  Clearly Date is more mature -- more options for conversion, better plotting,
 etc (a long list of etc).  I see three things where date is better.  Only the
 last of these really matters, and is the point on which I would like comment.
 (Well, actually I'd like to talk you all into a change, of course).

  1. Since date uses 1/1/1960 as the base, and so does SAS, those of us who
 constantly pass files back and forth between those two packages have a slightly
 easier time.

There are some other programs that use 1/1/70.  See the R Help Desk article
in R News 4/1 that discusses a few origins.


  2. as.date(10) works, as.Date(10) does not.  Sometimes I have done a
 manipulation that the date package does not understand, and I know that the
 result is still of the right type, but the package does not.  However, this is
 fairly rare and I can work around it. (It mostly occurs in processing the rate
 tables for expected survival).

You can define as.Date.numeric in your package and then it will work.  zoo
has done that.

library(zoo)
as.Date(10)

Some other things you can do:

today <- Sys.Date()
Epoch <- today - as.numeric(today)

Epoch + 10  # similar to as.Date(10)


  3. temp <- as.Date('1990/1/1') - as.date('1953/2/5')
 sqrt(temp)
 Error in Math.difftime(temp3) : sqrtnot defined for difftime objects

  Minor bug: no space before the word 'not'
  Major: this shouldn't fail.

 People will do things with time intervals that you have not thought of.
 Fitting a growth curve that uses a square root, for instance.  I firmly
 believe that the superior behavior in the face of something unexpected is to
 assume that the user knows what they are doing, and return a numeric.
   I recognize that "assume the user knows what they are doing" is an anathema
 to the more zealous OO types, but in designing a class I have found that they
 often know more than me!

   4. Variation on #3 above

  (as.Date('2007-9-14') - as.Date('1953-3-10')) / 365.25
  Time difference of 54.51335 days

No, I am not 54.5 days old.  Both hair color and knee creaking most
 definitely proclaim otherwise, I am sorry to say. Time difference / number
 should be a number.

Note that you can write:

x <- Sys.Date()
y <- x + 1
as.numeric(x-y)
as.numeric(x) - as.numeric(y)


   5. This is only amusing.  I'm not saying that as.Date should necessarily
 work, but the format is certainly not ambiguous.  (Not standard, but not
 ambiguous).  Not important to fix, not something that date does any better.

  as.Date('09Sep2007')
 Error in fromchar(x) : character string is not in a standard unambiguous 
 format

as.Date("09Sep2007", "%d%b%Y")




Terry Therneau
