Re: [Rd] POSIXlt matching bug

2010-07-02 Thread Sklyar, Oleg (London)
POSIXlt is a list, and it is not a list of dates or times; it is a list
of date-time components:

> x <- as.POSIXlt(Sys.Date())
> names(x)
[1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"

So if you want to match these things, you should use POSIXct or any
other numeric-based format (as POSIXct is just a double value for the
number of seconds since 1970-01-01) e.g.

> z <- as.POSIXct(Sys.Date())
> x <- as.POSIXct(Sys.Date())
> z == x
[1] TRUE
> match(z, x)
[1] 1
> z %in% x
[1] TRUE
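
A minimal sketch of the same workaround applied when the inputs are already POSIXlt (for example, from strptime): convert both arguments with as.POSIXct() before calling match(). The helper name match_times() is hypothetical and not part of base R.

```r
# Hypothetical helper: match date-times by their POSIXct (numeric) value
# rather than by the underlying POSIXlt list components.
match_times <- function(x, table, nomatch = NA_integer_) {
  match(as.POSIXct(x), as.POSIXct(table), nomatch = nomatch)
}

x <- as.POSIXlt(Sys.Date())
table <- as.POSIXlt(Sys.Date() + 0:5)
match_times(x, table)  # one result per element of x: 1
```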

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3803
oskl...@maninvestments.com 

 -----Original Message-----
 From: r-devel-boun...@r-project.org 
 [mailto:r-devel-boun...@r-project.org] On Behalf Of McGehee, Robert
 Sent: 29 June 2010 15:46
 To: r-b...@r-project.org; r-devel@r-project.org
 Subject: [Rd] POSIXlt matching bug
 
 I came across the below mis-feature/bug using match with POSIXlt objects
 (from strptime) in R 2.11.1 (though this appears to be an old issue).
 
 > x <- as.POSIXlt(Sys.Date())
 > table <- as.POSIXlt(Sys.Date()+0:5)
 > length(x)
 [1] 1
 > x %in% table  # I expect TRUE
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 > match(x, table) # I expect 1
 [1] NA NA NA NA NA NA NA NA NA
 
 This behavior seemed more plausible when the length of a POSIXlt object
 was 9 (back in the day); however, since the length was redefined, the
 length of x no longer matches the length of the match function output,
 as specified by the ?match documentation: "A vector of the same length
 as 'x'."
 
 I would normally suggest that we add a POSIXlt method for match that
 converts x into POSIXct or character first. However, match does not
 appear to be generic. Below is a possible rewrite of match that appears
 to work as desired.
 
 match <- function(x, table, nomatch = NA_integer_, incomparables = NULL)
     .Internal(match(if(is.factor(x) || inherits(x, "POSIXlt"))
                         as.character(x) else x,
                     if(is.factor(table) || inherits(table, "POSIXlt"))
                         as.character(table) else table,
                     nomatch, incomparables))
 
 That said, I understand some people may be very sensitive to the speed
 of the match function, and may prefer a simple change to the ?match
 documentation noting this (odd) behavior for POSIXlt. 
 
 Thanks, Robert
 
 > R.version
                _
 platform       x86_64-unknown-linux-gnu
 arch           x86_64
 os             linux-gnu
 system         x86_64, linux-gnu
 status
 major          2
 minor          11.1
 year           2010
 month          05
 day            31
 svn rev        52157
 language       R
 version.string R version 2.11.1 (2010-05-31)
 
 Robert McGehee, CFA
 Geode Capital Management, LLC
 One Post Office Square, 28th Floor | Boston, MA | 02109
 Tel: 617/392-8396 | Fax: 617/476-6389
 mailto:robert.mcge...@geodecapital.com
 
 
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 




Re: [Rd] POSIXlt matching bug

2010-07-02 Thread Martin Maechler
 RobMcG == McGehee, Robert robert.mcge...@geodecapital.com
 on Tue, 29 Jun 2010 10:46:06 -0400 writes:

RobMcG I came across the below mis-feature/bug using match with POSIXlt objects
RobMcG (from strptime) in R 2.11.1 (though this appears to be an old issue).

RobMcG > x <- as.POSIXlt(Sys.Date())
RobMcG > table <- as.POSIXlt(Sys.Date()+0:5)
RobMcG > length(x)
RobMcG [1] 1
RobMcG > x %in% table  # I expect TRUE
RobMcG [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
RobMcG > match(x, table) # I expect 1
RobMcG [1] NA NA NA NA NA NA NA NA NA

RobMcG This behavior seemed more plausible when the length of a POSIXlt object
RobMcG was 9 (back in the day), however since the length was redefined, the
RobMcG length of x no longer matches the length of the match function output,
RobMcG as specified by the ?match documentation: "A vector of the same length
RobMcG as 'x'."

RobMcG I would normally suggest that we add a POSIXlt method for match that
RobMcG converts x into POSIXct or character first. However, match does not
RobMcG appear to be generic. Below is a possible rewrite of match that appears
RobMcG to work as desired.

RobMcG match <- function(x, table, nomatch = NA_integer_, incomparables = NULL)
RobMcG     .Internal(match(if(is.factor(x) || inherits(x, "POSIXlt"))
RobMcG                         as.character(x) else x,
RobMcG                     if(is.factor(table) || inherits(table, "POSIXlt"))
RobMcG                         as.character(table) else table,
RobMcG                     nomatch, incomparables))

RobMcG That said, I understand some people may be very sensitive to the speed
RobMcG of the match function,

yes, indeed. 

I'm currently investigating an alternative which costs considerably more
programming time but in the end should lose much less speed: to
.Internal()ize the tests in C code, so that the resulting R code would
simply be

match <- function(x, table, nomatch = NA_integer_, incomparables = NULL)
    .Internal(match(x, table, nomatch, incomparables))


Martin Maechler,
ETH Zurich


RobMcG and may prefer a simple change to the ?match
RobMcG documentation noting this (odd) behavior for POSIXlt. 

RobMcG Thanks, Robert



Re: [Rd] POSIXlt matching bug

2010-07-02 Thread Martin Maechler
 MM == Martin Maechler maech...@stat.math.ethz.ch
 on Fri, 2 Jul 2010 12:22:07 +0200 writes:

 RobMcG == McGehee, Robert robert.mcge...@geodecapital.com
 on Tue, 29 Jun 2010 10:46:06 -0400 writes:

RobMcG I came across the below mis-feature/bug using match with POSIXlt objects
RobMcG (from strptime) in R 2.11.1 (though this appears to be an old issue).

RobMcG > x <- as.POSIXlt(Sys.Date())
RobMcG > table <- as.POSIXlt(Sys.Date()+0:5)
RobMcG > length(x)
RobMcG [1] 1
RobMcG > x %in% table  # I expect TRUE
RobMcG [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
RobMcG > match(x, table) # I expect 1
RobMcG [1] NA NA NA NA NA NA NA NA NA

RobMcG This behavior seemed more plausible when the length of a POSIXlt object
RobMcG was 9 (back in the day), however since the length was redefined, the
RobMcG length of x no longer matches the length of the match function output,
RobMcG as specified by the ?match documentation: "A vector of the same length
RobMcG as 'x'."

RobMcG I would normally suggest that we add a POSIXlt method for match that
RobMcG converts x into POSIXct or character first. However, match does not
RobMcG appear to be generic. Below is a possible rewrite of match that appears
RobMcG to work as desired.

RobMcG match <- function(x, table, nomatch = NA_integer_, incomparables = NULL)
RobMcG     .Internal(match(if(is.factor(x) || inherits(x, "POSIXlt"))
RobMcG                         as.character(x) else x,
RobMcG                     if(is.factor(table) || inherits(table, "POSIXlt"))
RobMcG                         as.character(table) else table,
RobMcG                     nomatch, incomparables))

RobMcG That said, I understand some people may be very sensitive to the speed
RobMcG of the match function,

MM yes, indeed. 

MM I'm currently investigating an alternative which costs considerably more
MM programming time but in the end should lose much less speed: to
MM .Internal()ize the tests in C code, so that the resulting R code would
MM simply be

MM match <- function(x, table, nomatch = NA_integer_, incomparables = NULL)
MM     .Internal(match(x, table, nomatch, incomparables))

I have committed such a change to R-devel, to become R 2.12.x.
This should mean that match() is now actually very slightly
faster than it used to be, though the speed gain may not be measurable.
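
A quick check, assuming R 2.12.0 or later (where the change described above was to land), that match() and %in% now return one result per element of a POSIXlt x, as ?match specifies:

```r
x <- as.POSIXlt(Sys.Date())
table <- as.POSIXlt(Sys.Date() + 0:5)

# Both results should have length(x) elements, not one per list component.
stopifnot(length(match(x, table)) == length(x),
          length(x %in% table) == length(x))
match(x, table)   # expected: 1
x %in% table      # expected: TRUE
```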

Martin Maechler,  ETH Zurich



RobMcG and may prefer a simple change to the ?match
RobMcG documentation noting this (odd) behavior for POSIXlt. 

RobMcG Thanks, Robert



[Rd] Best way to determine if you're running 32 or 64 bit R on windows

2010-07-02 Thread Jeffrey Horner
Hi,

Is this sufficient?

if (.Machine$sizeof.pointer==4){
  cat('32\n')
} else {
  cat('64\n')
}

Or is it better to test something in R.version, say os?

I'd like to use this to specify appropriate linker arguments when
building the RMySQL windows package.

Jeff
-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner



Re: [Rd] Best way to determine if you're running 32 or 64 bit R on windows

2010-07-02 Thread Martin Maechler
Jeffrey Horner jeffrey.horner at gmail.com writes:

 Is this sufficient?
 
 if (.Machine$sizeof.pointer==4){
   cat('32\n')
 } else {
   cat('64\n')
 }
 
 Or is it better to test something in R.version, say os?

No, the above is perfect, as it also works on other platforms to distinguish
32-bit and 64-bit builds of R.
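
A minimal sketch of how the pointer-size test might feed into a build script. The helper r_bits() and the library directories "lib/x64" and "lib/i386" are hypothetical placeholders, not RMySQL's actual layout.

```r
# Pointer size in bytes: 4 on 32-bit builds of R, 8 on 64-bit builds.
r_bits <- function() 8L * .Machine$sizeof.pointer

# Hypothetical use: choose a library sub-directory to link against.
# The directory names are placeholders for illustration only.
lib_dir <- if (r_bits() == 64L) "lib/x64" else "lib/i386"
cat("Building for", r_bits(), "bit R; linking against", lib_dir, "\n")
```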

Regards, Martin



[Rd] Attributes of 1st argument in ...

2010-07-02 Thread Daniel Murphy
R-Devel:

I am trying to get an attribute of the first argument in a call to a
function whose formal arguments consist of dots only, and to do something
(e.g., call 'cbind') based on that attribute:
f <- function(...) {get first attribute; maybe or maybe not call 'cbind'}

I thought of (ignoring deparse.level for the moment)

f <- function(...) {
    x <- attr(list(...)[[1L]], "foo")
    if (x == "bar") cbind(...) else x
}

but I feared my solution might do some extra copying, with a performance
penalty if the dotted objects in the actual call to 'f' are very large.

I thought the following alternative might avoid a potential performance hit
by evaluating the attribute in the parent.frame (and therefore avoid extra
copying?):

f <- function(...)
{
   L <- match.call(expand.dots = FALSE)[[2L]]
   x <- eval(substitute(attr(x, "foo"), list(x = L[[1L]])), parent.frame())
   if (x == "bar") cbind(...) else x
}

system.time tests showed this second form to be only marginally faster.

Is my fear about extra copying unwarranted? If not, is there a better way to
get the foo attribute of the first argument other than my two
alternatives?

Thanks,
Dan Murphy



Re: [Rd] Attributes of 1st argument in ...

2010-07-02 Thread Olaf Mersmann
Hi Daniel,

On 02.07.2010, at 23:26, Daniel Murphy wrote:
 I am trying to get an attribute of the first argument in a call to a
 function whose formal arguments consist of dots only and do something, e.g.,
 call 'cbind', based on the attribute
 f <- function(...) {get first attribute; maybe or maybe not call 'cbind'}
 
 I thought of (ignoring deparse.level for the moment)
 
 f <- function(...) {x <- attr(list(...)[[1L]], "foo"); if (x == "bar")
 cbind(...) else x}

What about using the somewhat obscure ..1 syntax? This version runs quite a bit
faster for me:

 g <- function(...) {
   x <- attr(..1, "foo")
   if (x == "bar")
     cbind(...)
   else
     x
 }

but it will be hard to quantify how this pans out for you unless we know how
many arguments there are, and what size and type they are.
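
A small sketch contrasting the two forms, with made-up attribute values for illustration. One difference worth noting: list(...) evaluates every argument, whereas ..1 evaluates only the first, so the remaining arguments are not forced unless cbind() is actually called. The sketch uses identical() instead of == so that a missing "foo" attribute (NULL) does not make if() fail; that is an editorial choice, not part of the code above.

```r
# Form from the question: list(...) evaluates *all* arguments.
f <- function(...) {
  x <- attr(list(...)[[1L]], "foo")
  if (identical(x, "bar")) cbind(...) else x
}

# ..1 form: only the first argument is evaluated here.
g <- function(...) {
  x <- attr(..1, "foo")
  if (identical(x, "bar")) cbind(...) else x
}

a <- structure(1:3, foo = "bar")
b <- 4:6
dim(g(a, b))   # 3 2: cbind() was called

# With a different attribute value, g() never forces the second argument,
# while list(...) inside f() evaluates it and hits the error.
z <- structure(1, foo = "baz")
g(z, stop("second argument was evaluated"))   # "baz"
try(f(z, stop("second argument was evaluated")))
```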

Cheers,
Olaf



[Rd] kmeans

2010-07-02 Thread Gabor Grothendieck
In kmeans() in stats one gets an error message with the default
clustering algorithm if centers = 1.  It's often useful to calculate
the sum of squares for 1 cluster, 2 clusters, etc., and this error
complicates things since one has to treat 1 cluster as a special case.
A second reason is that easily getting the 1 cluster sum of squares
makes it easy to calculate the between cluster sum of squares when
there is more than 1 cluster.

I suggest adding the line marked ### to the source code of kmeans (the
other lines shown are just there to illustrate context).  Adding this
line forces kmeans to use the code for algorithm 3 if centers is 1.
This is useful since, unlike the code for the default algorithm, the
code for algorithm 3 succeeds for centers = 1.

if(length(centers) == 1) {
    if (centers == 1) nmeth <- 3 ###
    k <- centers
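
Without patching kmeans(), the caller can special-case k = 1 directly. A minimal sketch, assuming the within-cluster sum of squares of a single cluster is the total sum of squares of the rows around the column means; wss_curve() and its nstart default are hypothetical choices for illustration.

```r
# Hypothetical helper: within-cluster sum of squares for k = 1, ..., k_max.
# k = 1 is special-cased: the single "cluster" is the whole data set, so its
# within SS is the total SS of the rows around the column means.
wss_curve <- function(x, k_max, nstart = 10) {
  x <- as.matrix(x)
  one_cluster <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  more_clusters <- vapply(seq_len(k_max)[-1L], function(k) {
    sum(kmeans(x, centers = k, nstart = nstart)$withinss)
  }, numeric(1))
  c(one_cluster, more_clusters)
}

set.seed(1)
wss <- wss_curve(USArrests, k_max = 5)
round(wss)
```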

Also note that KMeans in Rcmdr produces a betweenss and a tot.withinss
and it would be nice if kmeans in stats did that too:

> library(Rcmdr)
> str(KMeans(USArrests, 3))
List of 6
 $ cluster     : Named int [1:50] 1 1 1 2 1 2 3 1 1 2 ...
  ..- attr(*, "names")= chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
 $ centers     : num [1:3, 1:4] 11.81 8.21 4.27 272.56 173.29 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "1" "2" "3"
  .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
 $ withinss    : num [1:3] 19564 9137 19264
 $ size        : int [1:3] 16 14 20
 $ tot.withinss: num 47964   <=
 $ betweenss   : num 307844  <=
 - attr(*, "class")= chr "kmeans"
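
For reference, the between-cluster sum of squares can be derived from components kmeans() already returns (centers, size) plus the grand mean of the data. A minimal sketch; between_ss() is a hypothetical helper, and the identity total SS = within SS + between SS assumes the returned centers are the cluster means.

```r
# Between-cluster SS: size-weighted squared distance of each cluster
# centre from the overall column means.
between_ss <- function(x, fit) {
  x <- as.matrix(x)
  grand_mean <- colMeans(x)
  # sweep() subtracts the grand mean from every row of the centres matrix
  d <- sweep(fit$centers, 2L, grand_mean)
  sum(fit$size * rowSums(d^2))
}

set.seed(1)
fit <- kmeans(USArrests, centers = 3)
total_ss <- sum(scale(as.matrix(USArrests), scale = FALSE)^2)
c(tot.withinss = sum(fit$withinss),
  betweenss    = between_ss(USArrests, fit),
  totss        = total_ss)
```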
