Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-16 Thread Duncan Murdoch
On 15/12/2007 5:17 PM, Martin Maechler wrote:
 TP == Tony Plate [EMAIL PROTECTED]
 on Fri, 14 Dec 2007 13:58:30 -0700 writes:
 
 TP Duncan Murdoch wrote:
  On 12/13/2007 1:59 PM, Tony Plate wrote:
  Duncan Murdoch wrote:
  On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
  Full_Name: Petr Simecek
  Version: 2.5.1, 2.6.1
  OS: Windows XP
  Submission from: (NULL) (195.113.231.2)
  
  
  Several times I have experienced that a length of a POSIXt vector 
  has not been
  computed right.
  
  Example:
  
  tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
  ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
  ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
  mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
  = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
  105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
  c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
  163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
  1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
  "hour", "mday", "mon", "year", "wday", "yday", "isdst"
  ), class = c("POSIXt", "POSIXlt"))
  
  print(tv)
  # print 11 time points (right)
  
  length(tv)
  # returns 9 (wrong)
  
  tv is a list of length 9.  The answer is right, your expectation is 
  wrong.
  I have tried that on several computers with/without switching to 
  English
  locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a 
  help pages but I
  cannot imagine how that could be OK.
  
  See this in ?POSIXt:
  
  Class 'POSIXlt' is a named list of vectors...
  
  You could define your own length measurement as
  
  length.POSIXlt <- function(x) length(x$sec)
  
  and you'll get the answer you expect, but be aware that length.XXX 
  methods are quite rare, and you may surprise some of your users.
  
  
  On the other hand, isn't the fact that length() currently always 
  returns 9 for POSIXlt objects likely to be a surprise to many users 
  of POSIXlt?
  
  The back of The New S Language says Easy-to-use facilities allow 
  you to organize, store and retrieve all sorts of data. ... S 
  functions and data organization make applications easy to write.
  
  Now, POSIXlt has methods for c() and vector subsetting [ (and many 
  other vector-manipulation methods - see methods(class=POSIXlt)).  
  Hence, from the point of view of intending to supply easy-to-use 
  facilities ... [for] all sorts of data, isn't it a little 
  incongruous that length() is not also provided -- as 3 functions (any 
  others?) comprise a core set of vector-manipulation functions?
  
  Would it make sense to have an informal prescription (e.g., in 
  R-exts) that a class that implements a vector-like object and 
  provides at least of one of functions 'c', '[' and 'length' should 
  provide all three?  It would also be easy to describe a test-suite 
  that should be included in the 'test' directory of a package 
  implementing such a class, that had some tests of the basic 
  vector-manipulation functionality, such as:
  
   # at this point, x0, x1, x3,  x10 should exist, as vectors of the
   # class being tested, of length 0, 1, 3, and 10, and they should
   # contain no duplicate elements
   length(x0)
  [1] 1
   length(c(x0, x1))
  [1] 2
   length(c(x1,x10))
  [1] 11
   all(x3 == x3[seq(len=length(x3))])
  [1] TRUE
   all(x3 == c(x3[1], x3[2], x3[3]))
  [1] TRUE
   length(c(x3[2], x10[5:7]))
  [1] 4
  
  
  It would also be possible to describe a larger set of vector 
  manipulation functions that should be implemented together, including 
  e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na', 
  head, tail ... (many of which are provided for POSIXlt).
  
  Or is there some good reason that length() cannot be provided (while 
  'c' and '[' can) for some vector-like classes such as POSIXlt?
  
  What you say sounds good in general, but the devil is in the details. 
  Changing the meaning of length(x) for some objects has fairly 
  widespread effects.  Are they all positive?  I don't know.
  
  Adding a prescription like the one you suggest would be good if it's 
  easy to implement, but bad if it's already widely violated.  How many 
  base or CRAN or Bioconductor packages violate it currently?   Do the 
  ones that provide all 3 methods do so in a consistent way, i.e. does 
  length(x) mean the same thing in all of them?
 TP I'm not sure doing something like this would be so bad even if it is 
 TP already widely 

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-16 Thread Tony Plate
Duncan Murdoch wrote:
 On 15/12/2007 5:17 PM, Martin Maechler wrote:
 TP == Tony Plate [EMAIL PROTECTED]
 on Fri, 14 Dec 2007 13:58:30 -0700 writes:
 TP Duncan Murdoch wrote:
  On 12/13/2007 1:59 PM, Tony Plate wrote:
  Duncan Murdoch wrote:
  On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
  Full_Name: Petr Simecek
  Version: 2.5.1, 2.6.1
  OS: Windows XP
  Submission from: (NULL) (195.113.231.2)
  
  
  Several times I have experienced that a length of a POSIXt vector 
  has not been
  computed right.
  
  Example:
  
  tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
  ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
  ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
  mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
  = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
  105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
  c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
  163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
  1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
  "hour", "mday", "mon", "year", "wday", "yday", "isdst"
  ), class = c("POSIXt", "POSIXlt"))
  
  print(tv)
  # print 11 time points (right)
  
  length(tv)
  # returns 9 (wrong)
  
  tv is a list of length 9.  The answer is right, your expectation is 
  wrong.
  I have tried that on several computers with/without switching to 
  English
  locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a 
  help pages but I
  cannot imagine how that could be OK.
  
  See this in ?POSIXt:
  
  Class 'POSIXlt' is a named list of vectors...
  
  You could define your own length measurement as
  
  length.POSIXlt <- function(x) length(x$sec)
  
  and you'll get the answer you expect, but be aware that length.XXX 
  methods are quite rare, and you may surprise some of your users.
  
  
  On the other hand, isn't the fact that length() currently always 
  returns 9 for POSIXlt objects likely to be a surprise to many users 
  of POSIXlt?
  
  The back of The New S Language says Easy-to-use facilities allow 
  you to organize, store and retrieve all sorts of data. ... S 
  functions and data organization make applications easy to write.
  
  Now, POSIXlt has methods for c() and vector subsetting [ (and many 
  other vector-manipulation methods - see methods(class=POSIXlt)).  
  Hence, from the point of view of intending to supply easy-to-use 
  facilities ... [for] all sorts of data, isn't it a little 
  incongruous that length() is not also provided -- as 3 functions 
 (any 
  others?) comprise a core set of vector-manipulation functions?
  
  Would it make sense to have an informal prescription (e.g., in 
  R-exts) that a class that implements a vector-like object and 
  provides at least of one of functions 'c', '[' and 'length' should 
  provide all three?  It would also be easy to describe a test-suite 
  that should be included in the 'test' directory of a package 
  implementing such a class, that had some tests of the basic 
  vector-manipulation functionality, such as:
  
   # at this point, x0, x1, x3,  x10 should exist, as vectors of the
   # class being tested, of length 0, 1, 3, and 10, and they should
   # contain no duplicate elements
   length(x0)
  [1] 1
   length(c(x0, x1))
  [1] 2
   length(c(x1,x10))
  [1] 11
   all(x3 == x3[seq(len=length(x3))])
  [1] TRUE
   all(x3 == c(x3[1], x3[2], x3[3]))
  [1] TRUE
   length(c(x3[2], x10[5:7]))
  [1] 4
  
  
  It would also be possible to describe a larger set of vector 
  manipulation functions that should be implemented together, 
 including 
  e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na', 
  head, tail ... (many of which are provided for POSIXlt).
  
  Or is there some good reason that length() cannot be provided (while 
  'c' and '[' can) for some vector-like classes such as POSIXlt?
  
  What you say sounds good in general, but the devil is in the details. 
  Changing the meaning of length(x) for some objects has fairly 
  widespread effects.  Are they all positive?  I don't know.
  
  Adding a prescription like the one you suggest would be good if it's 
  easy to implement, but bad if it's already widely violated.  How many 
  base or CRAN or Bioconductor packages violate it currently?   Do the 
  ones that provide all 3 methods do so in a consistent way, i.e. does 
  length(x) mean the same thing in all of them?
 TP I'm not sure doing something like this would be so bad even if it 

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-15 Thread Martin Maechler
 TP == Tony Plate [EMAIL PROTECTED]
 on Fri, 14 Dec 2007 13:58:30 -0700 writes:

TP Duncan Murdoch wrote:
 On 12/13/2007 1:59 PM, Tony Plate wrote:
 Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)
 
 
 Several times I have experienced that a length of a POSIXt vector 
 has not been
 computed right.
 
 Example:
 
 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
 "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))
 
 print(tv)
 # print 11 time points (right)
 
 length(tv)
 # returns 9 (wrong)
 
 tv is a list of length 9.  The answer is right, your expectation is 
 wrong.
 I have tried that on several computers with/without switching to 
 English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a 
 help pages but I
 cannot imagine how that could be OK.
 
 See this in ?POSIXt:
 
 Class 'POSIXlt' is a named list of vectors...
 
 You could define your own length measurement as
 
 length.POSIXlt <- function(x) length(x$sec)
 
 and you'll get the answer you expect, but be aware that length.XXX 
 methods are quite rare, and you may surprise some of your users.
 
 
 On the other hand, isn't the fact that length() currently always 
 returns 9 for POSIXlt objects likely to be a surprise to many users 
 of POSIXlt?
 
 The back of The New S Language says Easy-to-use facilities allow 
 you to organize, store and retrieve all sorts of data. ... S 
 functions and data organization make applications easy to write.
 
 Now, POSIXlt has methods for c() and vector subsetting [ (and many 
 other vector-manipulation methods - see methods(class=POSIXlt)).  
 Hence, from the point of view of intending to supply easy-to-use 
 facilities ... [for] all sorts of data, isn't it a little 
 incongruous that length() is not also provided -- as 3 functions (any 
 others?) comprise a core set of vector-manipulation functions?
 
 Would it make sense to have an informal prescription (e.g., in 
 R-exts) that a class that implements a vector-like object and 
 provides at least of one of functions 'c', '[' and 'length' should 
 provide all three?  It would also be easy to describe a test-suite 
 that should be included in the 'test' directory of a package 
 implementing such a class, that had some tests of the basic 
 vector-manipulation functionality, such as:
 
  # at this point, x0, x1, x3,  x10 should exist, as vectors of the
  # class being tested, of length 0, 1, 3, and 10, and they should
  # contain no duplicate elements
  length(x0)
 [1] 1
  length(c(x0, x1))
 [1] 2
  length(c(x1,x10))
 [1] 11
  all(x3 == x3[seq(len=length(x3))])
 [1] TRUE
  all(x3 == c(x3[1], x3[2], x3[3]))
 [1] TRUE
  length(c(x3[2], x10[5:7]))
 [1] 4
 
 
 It would also be possible to describe a larger set of vector 
 manipulation functions that should be implemented together, including 
 e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na', 
 head, tail ... (many of which are provided for POSIXlt).
 
 Or is there some good reason that length() cannot be provided (while 
 'c' and '[' can) for some vector-like classes such as POSIXlt?
 
 What you say sounds good in general, but the devil is in the details. 
 Changing the meaning of length(x) for some objects has fairly 
 widespread effects.  Are they all positive?  I don't know.
 
 Adding a prescription like the one you suggest would be good if it's 
 easy to implement, but bad if it's already widely violated.  How many 
 base or CRAN or Bioconductor packages violate it currently?   Do the 
 ones that provide all 3 methods do so in a consistent way, i.e. does 
 length(x) mean the same thing in all of them?
TP I'm not sure doing something like this would be so bad even if it is 
TP already widely violated.  R has evolved significantly over time, and 
TP many rough edges have been cleaned up, sometimes in ways that were not 
TP backward compatible.  

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-15 Thread Gabor Grothendieck
If it were simply deprecated and then changed, then
everyone using it would get a warning during the period
of deprecation, so it would
not be so bad.  Given that its current behavior is
not very useful, I suspect it's not widely used anyway.
I haven't followed the whole discussion, so sorry if these
points have already been made.
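
A deprecation period of the kind suggested here could be sketched in R as follows. This is only an illustration, not code proposed in the thread: `old_len` and `new_len` are hypothetical names standing in for the current and the intended behaviour, and base R's `.Deprecated()` is used to signal the warning.

```r
# Hypothetical stand-in for the current behaviour: counts list components.
old_len <- function(x) {
  .Deprecated("new_len")  # signals a warning, then execution continues
  length(unclass(x))
}

# Hypothetical stand-in for the intended behaviour: counts the elements
# of the first component.
new_len <- function(x) length(unclass(x)[[1L]])

obj <- list(a = 1:3, b = 4:6)

# Capture the deprecation warning text instead of printing it.
msg <- tryCatch(old_len(obj), warning = function(w) conditionMessage(w))

# During the deprecation period the old result is still returned.
old_result <- suppressWarnings(old_len(obj))
new_result <- new_len(obj)
```

Callers of `old_len` keep getting the old answer but see a warning that names the replacement, which is the behaviour the paragraph above describes.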

On Dec 15, 2007 5:17 PM, Martin Maechler [EMAIL PROTECTED] wrote:
  TP == Tony Plate [EMAIL PROTECTED]
  on Fri, 14 Dec 2007 13:58:30 -0700 writes:


TP Duncan Murdoch wrote:
 On 12/13/2007 1:59 PM, Tony Plate wrote:
 Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector
 has not been
 computed right.

 Example:

 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
 "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)

 tv is a list of length 9.  The answer is right, your expectation is
 wrong.
 I have tried that on several computers with/without switching to
 English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a
 help pages but I
 cannot imagine how that could be OK.

 See this in ?POSIXt:

 Class 'POSIXlt' is a named list of vectors...

 You could define your own length measurement as

 length.POSIXlt <- function(x) length(x$sec)

 and you'll get the answer you expect, but be aware that length.XXX
 methods are quite rare, and you may surprise some of your users.


 On the other hand, isn't the fact that length() currently always
 returns 9 for POSIXlt objects likely to be a surprise to many users
 of POSIXlt?

 The back of The New S Language says Easy-to-use facilities allow
 you to organize, store and retrieve all sorts of data. ... S
 functions and data organization make applications easy to write.

 Now, POSIXlt has methods for c() and vector subsetting [ (and many
 other vector-manipulation methods - see methods(class=POSIXlt)).
 Hence, from the point of view of intending to supply easy-to-use
 facilities ... [for] all sorts of data, isn't it a little
 incongruous that length() is not also provided -- as 3 functions (any
 others?) comprise a core set of vector-manipulation functions?

 Would it make sense to have an informal prescription (e.g., in
 R-exts) that a class that implements a vector-like object and
 provides at least of one of functions 'c', '[' and 'length' should
 provide all three?  It would also be easy to describe a test-suite
 that should be included in the 'test' directory of a package
 implementing such a class, that had some tests of the basic
 vector-manipulation functionality, such as:

  # at this point, x0, x1, x3,  x10 should exist, as vectors of the
  # class being tested, of length 0, 1, 3, and 10, and they should
  # contain no duplicate elements
  length(x0)
 [1] 1
  length(c(x0, x1))
 [1] 2
  length(c(x1,x10))
 [1] 11
  all(x3 == x3[seq(len=length(x3))])
 [1] TRUE
  all(x3 == c(x3[1], x3[2], x3[3]))
 [1] TRUE
  length(c(x3[2], x10[5:7]))
 [1] 4
 

 It would also be possible to describe a larger set of vector
 manipulation functions that should be implemented together, including
 e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na',
 head, tail ... (many of which are provided for POSIXlt).

 Or is there some good reason that length() cannot be provided (while
 'c' and '[' can) for some vector-like classes such as POSIXlt?

 What you say sounds good in general, but the devil is in the details.
 Changing the meaning of length(x) for some objects has fairly
 widespread effects.  Are they all positive?  I don't know.

 Adding a prescription like the one you suggest would be good if it's
 easy to implement, but bad if it's already widely violated.  How many
 base or CRAN or Bioconductor packages violate it currently?   Do the
 ones that provide all 3 methods do so in a 

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-14 Thread Tony Plate
Duncan Murdoch wrote:
 On 12/13/2007 1:59 PM, Tony Plate wrote:
 Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector 
 has not been
 computed right.

 Example:

 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
 "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)

 tv is a list of length 9.  The answer is right, your expectation is 
 wrong.
 I have tried that on several computers with/without switching to 
 English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a 
 help pages but I
 cannot imagine how that could be OK.

 See this in ?POSIXt:

 Class 'POSIXlt' is a named list of vectors...

 You could define your own length measurement as

 length.POSIXlt <- function(x) length(x$sec)

 and you'll get the answer you expect, but be aware that length.XXX 
 methods are quite rare, and you may surprise some of your users.


 On the other hand, isn't the fact that length() currently always 
 returns 9 for POSIXlt objects likely to be a surprise to many users 
 of POSIXlt?

 The back of The New S Language says Easy-to-use facilities allow 
 you to organize, store and retrieve all sorts of data. ... S 
 functions and data organization make applications easy to write.

 Now, POSIXlt has methods for c() and vector subsetting [ (and many 
 other vector-manipulation methods - see methods(class=POSIXlt)).  
 Hence, from the point of view of intending to supply easy-to-use 
 facilities ... [for] all sorts of data, isn't it a little 
 incongruous that length() is not also provided -- as 3 functions (any 
 others?) comprise a core set of vector-manipulation functions?

 Would it make sense to have an informal prescription (e.g., in 
 R-exts) that a class that implements a vector-like object and 
 provides at least of one of functions 'c', '[' and 'length' should 
 provide all three?  It would also be easy to describe a test-suite 
 that should be included in the 'test' directory of a package 
 implementing such a class, that had some tests of the basic 
 vector-manipulation functionality, such as:

   # at this point, x0, x1, x3,  x10 should exist, as vectors of the
   # class being tested, of length 0, 1, 3, and 10, and they should
   # contain no duplicate elements
   length(x0)
 [1] 1
   length(c(x0, x1))
 [1] 2
   length(c(x1,x10))
 [1] 11
   all(x3 == x3[seq(len=length(x3))])
 [1] TRUE
   all(x3 == c(x3[1], x3[2], x3[3]))
 [1] TRUE
   length(c(x3[2], x10[5:7]))
 [1] 4
  

 It would also be possible to describe a larger set of vector 
 manipulation functions that should be implemented together, including 
 e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na', 
 head, tail ... (many of which are provided for POSIXlt).

 Or is there some good reason that length() cannot be provided (while 
 'c' and '[' can) for some vector-like classes such as POSIXlt?

 What you say sounds good in general, but the devil is in the details. 
 Changing the meaning of length(x) for some objects has fairly 
 widespread effects.  Are they all positive?  I don't know.

 Adding a prescription like the one you suggest would be good if it's 
 easy to implement, but bad if it's already widely violated.  How many 
 base or CRAN or Bioconductor packages violate it currently?   Do the 
 ones that provide all 3 methods do so in a consistent way, i.e. does 
 length(x) mean the same thing in all of them?
I'm not sure doing something like this would be so bad even if it is 
already widely violated.  R has evolved significantly over time, and 
many rough edges have been cleaned up, sometimes in ways that were not 
backward compatible.  This is a great thing & my thanks go to the people 
working on R.

If some base or CRAN or Bioconductor packages currently don't implement 
vector operations consistently, wouldn't it be good to know that?  
Wouldn't it be useful to have an automatic way of determining whether a 
particular vector-like class is consistent with a generally agreed set of 
principles for how basic vector operations should work -- things like 
length(x)+length(y)==length(c(x,y))?  This could help developers check, 
document & improve their code, and it could help users understand how to 
use 

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-13 Thread Tony Plate
Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector has not 
 been
 computed right.

 Example:

 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
 "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)
 
 tv is a list of length 9.  The answer is right, your expectation is wrong.
 I have tried that on several computers with/without switching to English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a help pages 
 but I
 cannot imagine how that could be OK.
 
 See this in ?POSIXt:
 
 Class 'POSIXlt' is a named list of vectors...
 
 You could define your own length measurement as
 
 length.POSIXlt <- function(x) length(x$sec)
 
 and you'll get the answer you expect, but be aware that length.XXX 
 methods are quite rare, and you may surprise some of your users.
 

On the other hand, isn't the fact that length() currently always returns 9 
for POSIXlt objects likely to be a surprise to many users of POSIXlt?

The back of The New S Language says "Easy-to-use facilities allow you to 
organize, store and retrieve all sorts of data. ... S functions and data 
organization make applications easy to write."

Now, POSIXlt has methods for c() and vector subsetting "[" (and many other 
vector-manipulation methods - see methods(class="POSIXlt")).  Hence, from 
the point of view of intending to supply "easy-to-use facilities ... [for] 
all sorts of data", isn't it a little incongruous that length() is not also 
provided -- as these 3 functions (any others?) comprise a core set of 
vector-manipulation functions?

Would it make sense to have an informal prescription (e.g., in R-exts) that 
a class that implements a vector-like object and provides at least one of 
the functions 'c', '[' and 'length' should provide all three?  It would also 
be easy to describe a test suite that should be included in the 'tests' 
directory of a package implementing such a class, with some tests of 
the basic vector-manipulation functionality, such as:

  # at this point, x0, x1, x3, & x10 should exist, as vectors of the
  # class being tested, of length 0, 1, 3, and 10, and they should
  # contain no duplicate elements
  length(x0)
[1] 0
  length(c(x0, x1))
[1] 1
  length(c(x1,x10))
[1] 11
  all(x3 == x3[seq(len=length(x3))])
[1] TRUE
  all(x3 == c(x3[1], x3[2], x3[3]))
[1] TRUE
  length(c(x3[2], x10[5:7]))
[1] 4
 

It would also be possible to describe a larger set of vector manipulation 
functions that should be implemented together, including e.g., 'rep', 
'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', 'head', 'tail' ... (many 
of which are provided for POSIXlt).

Or is there some good reason that length() cannot be provided (while 'c' 
and '[' can) for some vector-like classes such as POSIXlt?

-- Tony Plate
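
The consistency checks proposed in this message could be sketched as a small helper. This is only an illustration, not code from the thread: `check_vector_like` is a hypothetical name, and the Date class is used as the test subject on the assumption that its c(), `[` and length() already agree with each other.

```r
# Hypothetical helper: checks a few of the invariants listed above for
# two vectors x and y of the same vector-like class (y needs >= 3 elements).
check_vector_like <- function(x, y) {
  c(
    # concatenation adds lengths
    concat_length   = length(c(x, y)) == length(x) + length(y),
    # subsetting with all indices gives back the same elements
    subset_identity = all(x == x[seq_len(length(x))]),
    # subsets combine to the expected length
    subset_length   = length(c(x[1], y[2:3])) == 3L
  )
}

# Date vectors of length 3 and 10, with no duplicate elements.
x3  <- as.Date("2005-06-13") + 0:2
x10 <- as.Date("2005-07-01") + 0:9

res <- check_vector_like(x3, x10)
```

A class whose length() counts something other than its elements would be expected to fail at least the first check.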

 Duncan Murdoch
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-13 Thread Duncan Murdoch
On 12/13/2007 1:59 PM, Tony Plate wrote:
 Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector has not 
 been
 computed right.

 Example:

 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
 "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)
 
 tv is a list of length 9.  The answer is right, your expectation is wrong.
 I have tried that on several computers with/without switching to English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a help pages 
 but I
 cannot imagine how that could be OK.
 
 See this in ?POSIXt:
 
 Class 'POSIXlt' is a named list of vectors...
 
 You could define your own length measurement as
 
 length.POSIXlt <- function(x) length(x$sec)
 
 and you'll get the answer you expect, but be aware that length.XXX 
 methods are quite rare, and you may surprise some of your users.
 
 
 On the other hand, isn't the fact that length() currently always returns 9 
 for POSIXlt objects likely to be a surprise to many users of POSIXlt?
 
 The back of The New S Language says Easy-to-use facilities allow you to 
 organize, store and retrieve all sorts of data. ... S functions and data 
 organization make applications easy to write.
 
 Now, POSIXlt has methods for c() and vector subsetting [ (and many other 
 vector-manipulation methods - see methods(class=POSIXlt)).  Hence, from 
 the point of view of intending to supply easy-to-use facilities ... [for] 
 all sorts of data, isn't it a little incongruous that length() is not also 
 provided -- as 3 functions (any others?) comprise a core set of 
 vector-manipulation functions?
 
 Would it make sense to have an informal prescription (e.g., in R-exts) that 
 a class that implements a vector-like object and provides at least of one 
 of functions 'c', '[' and 'length' should provide all three?  It would also 
 be easy to describe a test-suite that should be included in the 'test' 
 directory of a package implementing such a class, that had some tests of 
 the basic vector-manipulation functionality, such as:
 
   # at this point, x0, x1, x3,  x10 should exist, as vectors of the
   # class being tested, of length 0, 1, 3, and 10, and they should
   # contain no duplicate elements
   length(x0)
 [1] 1
   length(c(x0, x1))
 [1] 2
   length(c(x1,x10))
 [1] 11
   all(x3 == x3[seq(len=length(x3))])
 [1] TRUE
   all(x3 == c(x3[1], x3[2], x3[3]))
 [1] TRUE
   length(c(x3[2], x10[5:7]))
 [1] 4
  
 
 It would also be possible to describe a larger set of vector manipulation 
 functions that should be implemented together, including e.g., 'rep', 
 'unique', 'duplicated', '==', 'sort', '[-', 'is.na', head, tail ... (many 
 of which are provided for POSIXlt).
 
 Or is there some good reason that length() cannot be provided (while 'c' 
 and '[' can) for some vector-like classes such as POSIXlt?

What you say sounds good in general, but the devil is in the details. 
Changing the meaning of length(x) for some objects has fairly widespread 
effects.  Are they all positive?  I don't know.

Adding a prescription like the one you suggest would be good if it's 
easy to implement, but bad if it's already widely violated.  How many 
base or CRAN or Bioconductor packages violate it currently?   Do the 
ones that provide all 3 methods do so in a consistent way, i.e. does 
length(x) mean the same thing in all of them?

I agree that the current state is less than perfect, but making it 
better would really be a lot of work.  I suspect there are better ways 
to spend my time, so I'm not going to volunteer to do it.  I'm not even 
going to invite someone else to do it, or offer to review your work if 
you volunteer.  I think this falls into the class of "next time we write 
a language, let's handle this better" problems.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-12 Thread simecek
Full_Name: Petr Simecek
Version: 2.5.1, 2.6.1
OS: Windows XP
Submission from: (NULL) (195.113.231.2)


Several times I have experienced that a length of a POSIXt vector has not been
computed right.

Example:

tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 
12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 
13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXt", "POSIXlt"))

print(tv)
# print 11 time points (right)

length(tv)
# returns 9 (wrong)

I have tried that on several computers with/without switching to English
locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help pages but I
cannot imagine how that could be OK.



Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-12 Thread Peter Dalgaard
[EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector has not been
 computed right.

 Example:

 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 
 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 
 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", 
 "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)

 I have tried that on several computers with/without switching to English
 locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help pages
 but I cannot imagine how that could be OK.

   
Given the way you define it, you should be able to imagine it!

It's a list of length 9:  sec, min, hour,..., isdst.
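
The two counts can be told apart by looking at the underlying list directly. This is a minimal sketch using a shortened, hand-built structure with three time points instead of eleven; unclass() is used so that the result does not depend on any length method defined for the class.

```r
# A small POSIXlt-style list built by hand, as in the report,
# but with only three time points.
tv <- structure(
  list(sec = c(50, 0, 55), min = c(1L, 10L, 11L), hour = c(12L, 12L, 12L),
       mday = c(13L, 13L, 13L), mon = c(5L, 5L, 5L),
       year = c(105L, 105L, 105L), wday = c(1L, 1L, 1L),
       yday = c(163L, 163L, 163L), isdst = c(1L, 1L, 1L)),
  class = c("POSIXt", "POSIXlt"))

raw <- unclass(tv)                 # the plain named list underneath
n_components <- length(raw)        # number of list components (sec, ..., isdst)
n_times <- length(raw$sec)         # number of time points stored
```

Here n_components is 9 whatever the number of time points, while n_times follows the data.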


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-12 Thread Duncan Murdoch
On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)
 
 
 Several times I have experienced that a length of a POSIXt vector has not been
 computed right.
 
 Example:
 
 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 
 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 
 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", 
 "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))
 
 print(tv)
 # print 11 time points (right)
 
 length(tv)
 # returns 9 (wrong)

tv is a list of length 9.  The answer is right, your expectation is wrong.
 
 I have tried that on several computers with/without switching to English
 locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help pages
 but I cannot imagine how that could be OK.

See this in ?POSIXt:

Class 'POSIXlt' is a named list of vectors...

You could define your own length measurement as

length.POSIXlt <- function(x) length(x$sec)

and you'll get the answer you expect, but be aware that length.XXX 
methods are quite rare, and you may surprise some of your users.
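
As a rough illustration of the effect of such a method, here is a sketch on a shortened, hand-built object with three time points. It is written with unclass(x)$sec rather than x$sec, which is an assumption made here so that the example does not rely on how `$` behaves for the class; defining the method in the global environment masks any length method R itself may provide for POSIXlt.

```r
# Shortened version of the object from the report: three time points.
tv <- structure(
  list(sec = c(50, 0, 55), min = c(1L, 10L, 11L), hour = c(12L, 12L, 12L),
       mday = c(13L, 13L, 13L), mon = c(5L, 5L, 5L),
       year = c(105L, 105L, 105L), wday = c(1L, 1L, 1L),
       yday = c(163L, 163L, 163L), isdst = c(1L, 1L, 1L)),
  class = c("POSIXt", "POSIXlt"))

# User-defined S3 method: report the number of time points,
# not the number of list components.
length.POSIXlt <- function(x) length(unclass(x)$sec)

n_with_method <- length(tv)            # dispatches to length.POSIXlt
n_underlying <- length(unclass(tv))    # plain list length, unaffected
```

With the method in place, length(tv) counts time points, while the underlying list still has its nine components; code that relied on the list length would see the change.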

Duncan Murdoch



Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-12 Thread ripley
It is right: it is a list of length 9.  You even constructed it as such a 
list!

On Tue, 11 Dec 2007, [EMAIL PROTECTED] wrote:

 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector has not been
 computed right.

 Example:

 tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
 "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)

 I have tried that on several computers with/without switching to English 
 locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help 
 pages but I cannot imagine how that could be OK.



-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
