Re: [R] Monotonic interpolation

2007-09-06 Thread Bert Gunter
RSiteSearch("monotone", restr = "func") will give you several packages and
functions for monotone smoothing, including the isoreg() function in the
standard stats package.  You can determine whether any of these does what
you want.
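[Editor's sketch, not part of the original reply: a quick illustration of monotone interpolation with splinefun(); note that method = "hyman" is a later addition to the stats package and requires strictly monotonic input data.]

```r
## Monotone interpolation of a distribution-like function -- a sketch.
## method = "hyman" preserves monotonicity of strictly monotone data;
## isoreg() handles monotone *smoothing* rather than interpolation.
x <- c(0, 1, 2, 3, 4)
y <- c(0, 0.30, 0.55, 0.80, 1)          # increasing, like an ECDF
f <- splinefun(x, y, method = "hyman")
xx <- seq(0, 4, by = 0.05)
all(diff(f(xx)) >= 0)                   # the interpolant stays monotone
```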


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of excalibur
Sent: Thursday, September 06, 2007 8:04 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Monotonic interpolation




On Thu, 6 Sept 2007 at 09:45, excalibur wrote:


 Hello everybody, has anyone got a function for smooth monotonic
 interpolation (splines ...) of a univariate function (like a
 distribution function, for example)?

approxfun() might be what you're looking for.

Is the result of approxfun() inevitably monotonic?
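[Editor's sketch with toy data, added for illustration: approxfun() interpolates linearly between the supplied points, so its result is monotone exactly when the y-values themselves are.]

```r
## approxfun() is piecewise linear, so the interpolant is monotone
## if and only if the supplied y-values are monotone.
x <- 1:5
y.mono    <- c(1, 2, 4, 7, 11)   # increasing
y.notmono <- c(1, 3, 2, 4, 5)    # not monotone
xx <- seq(1, 5, by = 0.01)
all(diff(approxfun(x, y.mono)(xx)) >= 0)     # TRUE
all(diff(approxfun(x, y.notmono)(xx)) >= 0)  # FALSE
```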
-- 
View this message in context:
http://www.nabble.com/Monotonic-interpolation-tf4392288.html#a12524568
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] computing distance in miles or km between 2 street addre

2007-09-06 Thread Bert Gunter
There is a well-known (greedy) algorithm due to Dijkstra for choosing the
shortest path, i.e. the minimum-weight path, on a weighted digraph between
two vertices. I'm sure numerous open source versions of this are available.
optim() is not relevant.
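[Editor's sketch, added for illustration: for toy-sized problems Dijkstra's algorithm is short enough to write directly in R on a weighted adjacency matrix; real applications should use a dedicated graph package.]

```r
## Dijkstra on a weighted adjacency matrix W (Inf = no edge) -- a sketch
dijkstra <- function(W, from) {
  n <- nrow(W)
  dist <- rep(Inf, n); dist[from] <- 0
  done <- rep(FALSE, n)
  for (k in seq_len(n)) {
    u <- which.min(ifelse(done, Inf, dist))  # closest unvisited vertex
    done[u] <- TRUE
    dist <- pmin(dist, dist[u] + W[u, ])     # relax all edges out of u
  }
  dist
}
W <- matrix(Inf, 4, 4); diag(W) <- 0
W[1,2] <- W[2,1] <- 1
W[2,3] <- W[3,2] <- 2
W[1,3] <- W[3,1] <- 5
W[3,4] <- W[4,3] <- 1
dijkstra(W, 1)  # shortest distances from vertex 1: 0 1 3 4
```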


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Thursday, September 06, 2007 3:18 PM
To: Philip James Smith; r-help@stat.math.ethz.ch
Subject: Re: [R] computing distance in miles or km between 2 street addre

On 06-Sep-07 18:42:32, Philip James Smith wrote:
 Hi R-ers:
 
 I need to compute the distance between 2 street addresses in
 either km or miles. I do not care if the distance is a shortest
 driving route or if it is as the crow flies.
 
 Does anybody know how to do this? Can it be done in R? I have
 thousands of addresses, so I think that Mapquest is out of the
 question!
 
 Please reply to: [EMAIL PROTECTED]
 
 Thank you!
 Phil Smith

That's a somewhat ill-posed question! You will for a start
need a database of some kind, either of geographical locations
(coordinates) of street addresses, or of the metric of the
road network with capability to identify the street addresses
in the database.

If it's just as the crow flies, then it can be straightforwardly
computed in R, either by Pythagoras (when they are not too far
apart) or using a function which takes account of the shape of
the Earth.
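[Editor's sketch of the as-the-crow-flies computation, added for illustration: the haversine great-circle distance under a spherical-Earth approximation, with coordinates in decimal degrees.]

```r
## Haversine great-circle distance in km on a spherical Earth (R ~ 6371 km)
haversine <- function(lat1, lon1, lat2, lon2, R = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat/2)^2 + cos(lat1*rad) * cos(lat2*rad) * sin(dlon/2)^2
  2 * R * asin(sqrt(pmin(1, a)))   # pmin guards against rounding above 1
}
haversine(40.71, -74.01, 51.51, -0.13)  # New York to London, roughly 5570 km
```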

There are many R packages which have to do with mapping data.
Search for "map" through the list of R packages at

http://finzi.psych.upenn.edu/R/library/maptools/html/00Index.html

-- maptools in particular. Also look at (for instance) aspace.

For shortest driving route then you need to find the shortest
distance through a network. You may find some hints in the
package optim -- but there must be some R experts out there
on this sort of thing!

However, the primary need is for the database which gives
the distance information in one form or another. What were
you proposing to use for this? As far as I know, R has no
database relevant to street addresses!

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 06-Sep-07   Time: 23:17:57
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Robust linear models and unequal variance

2007-09-04 Thread Bert Gunter
Let me try a reply, although I wish others wiser than I had responded.

1. How do you know the variances are unequal? 

2. If you somehow know what the variances are (or at least their relative
sizes), you can use the weights arguments of the functions you mentioned to
weight inversely proportional to variance (except not for the MM method in
rlm(), according to the docs).
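[Editor's sketch of point 2, added for illustration: simulated data where the relative variances are known by construction, fed to MASS::rlm() through its weights argument.]

```r
## Inverse-variance case weights with MASS::rlm() -- a sketch
library(MASS)
set.seed(1)
x <- rep(1:10, 5)
s <- rep(c(1, 3), length.out = 50)    # two groups with known relative sd
y <- 2 + 0.5 * x + rnorm(50, sd = s)
fit <- rlm(y ~ x, weights = 1 / s^2)  # weight inversely to variance
coef(fit)                             # intercept near 2, slope near 0.5
```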

3. That "ranked regression is robust" is a myth. It also does not deal with
the unequal-variance situation. It is not a panacea for anything. If you
need robust regression, use robust regression.

4. If group sizes are not too dissimilar, then whether you case-weight or
not may not make much difference (alas, hard to tell a priori), especially
for estimation.

The fundamental issue is that outliers and unequal variances must be
operationalized, otherwise they are confounded: "outlier" only has meaning
compared to what is expected from a specified distribution. Outliers are no
longer "out" when the variance is large.

Also look at glm() with the quasi option if you wish to consider fitting a
heterogeneous variance structure to initialize a robust method (which could,
of course, be distorted by your outliers).


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Geertje Van der
Heijden
Sent: Tuesday, September 04, 2007 10:55 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Robust linear models and unequal variance

Hi all,

I have probably a basic question, but I can't seem to find the answer in
the literature or in the R-archives. 

I would like to do a robust ANCOVA (using either rlm or lmRob of the
MASS and robust packages) - my response variable deviates slightly from
normal and I have some outliers. The data consist of 2 factor
variables and 3-5 covariates (depending on the model). However, the
variance between my groups is not equal and I am not sure if it is
therefore appropriate to use a robust statistical method or if a
non-parametric analysis (i.e. ranked regression) might be better. If I
can still use a robust statistical method, which estimator is best to
use to deal with unequal variance? And if it is better to use a
non-parametric analysis, could anyone put me in the direction of the
right non-parametric method to use (the relationship between my response
variable and the covariates is linear)?

Any help on this would be greatly appreciated!

Many thanks,
Geertje


Geertje van der Heijden
PhD student
Tropical Ecology
School of Geography
University of Leeds
Leeds LS2 9JT

Tel: (+44)(0)113 3433345 
Email: [EMAIL PROTECTED]



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Saving plot into file

2007-08-31 Thread Bert Gunter
?Devices
e.g. ?pdf 
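[Editor's sketch spelling that out for the original looping question: pdf() is shown because it is available everywhere, but png() and the other devices listed under ?Devices work the same way.]

```r
## One file per plot: open a device, draw, close -- repeated in a loop
for (i in 1:3) {
  pdf(sprintf("plot%02d.pdf", i), width = 6, height = 4)
  plot(rnorm(100), main = paste("Plot", i))
  dev.off()                 # closing the device writes the file
}
file.exists("plot01.pdf")   # TRUE
```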


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of uv
Sent: Friday, August 31, 2007 5:41 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Saving plot into file


Hello. I am using R with Mac X11. I am looping through a few hundred text
lines, making a plot() for each of them. I would like to save these plots
as graphical images in separate files, but I haven't succeeded in doing
that. I would be grateful for any suggestion.
-- 
View this message in context:
http://www.nabble.com/Saving-plot-into-file-tf4359947.html#a12425669
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] retrieve p-value from a cox.obj

2007-08-29 Thread Bert Gunter
str() is your friend. It tells you about the structure of any R object, from
which you can usually glean what you need to know to get what you want. It
is often useful to use it on summary(object) rather than on the object, as
the summary method for an (S3) classed object often contains what you're
looking for.

Less generally, names() and as.list() can sometimes get you what you want
also.

Alternatively, check the summary.coxph() code (survival:::summary.coxph, as
it's hidden in the namespace). It is clear there how to get what you want,
either directly from the fitted object or from the summary.coxph object.
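[Editor's sketch of that route, added for illustration, using the aml data that ships with the survival package; the component names assume the current summary.coxph structure.]

```r
## Pulling p-values out of a coxph fit -- str(summary(fit)) shows the parts
library(survival)
fit <- coxph(Surv(time, status) ~ x, data = aml)
summary(fit)$sctest["pvalue"]            # overall score (logrank) test
summary(fit)$coefficients[, "Pr(>|z|)"]  # per-coefficient Wald p-values
```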

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of clearsky
Sent: Wednesday, August 29, 2007 8:41 AM
To: r-help@stat.math.ethz.ch
Subject: [R] retrieve p-value from a cox.obj


I have a cox.obj named obj, 
obj <- coxph(Surv(time, status) ~ group, surv.data)
now I want to retrieve the p-value from obj, so that I can run this hundreds
of times and plot out the distribution of the p-value. could anyone tell me
how to get p-value from obj?

thanks,

-- 
View this message in context:
http://www.nabble.com/retrieve-p-value-from-a-cox.obj-tf4348520.html#a123896
52
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Excel

2007-08-29 Thread Bert Gunter
Erich:

This is not a comment either for or against the use of Excel. I only wish to
point out that AFAICS, Hadley Wickham's reshape package offers all the pivot
table functionality and more.

If I am wrong about this, please let me and everyone else know.
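[Editor's sketch for readers unfamiliar with pivot tables: a base-R flavor of the idea, which the reshape package's melt()/cast() pair generalizes to arbitrary reshaping.]

```r
## A pivot-table-style summary: mean warp breaks by wool type and tension
with(warpbreaks, tapply(breaks, list(wool, tension), mean))
```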


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Erich Neuwirth
Sent: Wednesday, August 29, 2007 11:43 AM
To: r-help
Subject: Re: [R] Excel

Excel bashing can be fun but can also be dangerous, because
you are making your life harder than necessary.
Statisticians know by now that the numerics of Excel's statistical
computations can be quite bad, and therefore that one should not use them.
But using our (we = Thomas Baier + Erich Neuwirth) RExcel addin either
with the R(D)COM server or with rcom (package on CRAN) allows you to use
all the nice features of Excel (yes, there are quite a few) and use R
as the computational engine within Excel. The formula
=RApply("var", A1:A1000) in an Excel cell, for example, will use R to
compute the variance of the data in column A in Excel. If you change any
of the values in the range A1:A1000, it will automatically recompute the
variance.

There is one feature in Excel which is extremely convenient, Pivot
tables. Anybody doing any work as statistical consultant really ought to
know about Pivot tables, and I am still surprised how many statisticians
do not know about them. Neither Gnumeric nor OpenOffice Calc offers
comparably convenient ways of working with multidimensional tables.

I think the answer to the question
"Excel or R?" of course is "Excel and R".



-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Experimental Design with R

2007-08-28 Thread Bert Gunter
Please use R's search tools.

RSiteSearch("experimental design", restr = "funct")

finds optBlock() in the AlgDesign package as the 10th hit.

Whether this package will have what you want is another issue. 
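[Editor's sketch for the simplest case, added for illustration: a full factorial candidate set can be built in base R; packages such as AlgDesign then select optimal subsets from a candidate set like this.]

```r
## A full 2 x 2 x 2 factorial design via expand.grid()
design <- expand.grid(temp     = c(-1, 1),
                      pressure = c(-1, 1),
                      catalyst = c("A", "B"))
nrow(design)  # 8 runs
```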

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marc BERVEILLER
Sent: Tuesday, August 28, 2007 6:12 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Experimental Design with R

Dear R-users,

I want to know if there is a package that allows one to define different
experimental designs (factorial, orthogonal, Taguchi)
and to compare them.
I didn't find one on the R website, but it is possible I missed it!

Thank you in advance

Sincerely,
Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] subset using noncontiguous variables by name (not index)

2007-08-26 Thread Bert Gunter
The problem is that x3:x5 does not mean what you think it means. The only
reason it does the right thing in subset() is because a clever trick is used
there (read the code -- it's not hard to understand) to ensure that it does.
Gabor has essentially mimicked that trick in his solution.

However, it is not necessary to do this. You can construct the call directly
as you tried to do. Using the anscombe example, here's how:

chooz <- "c(x1,x3:x4,y2)"  ## enclose the desired expression in quotes
do.call(subset, list(x = anscombe, select = parse(text = chooz)))

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
The business of the statistician is to catalyze the scientific learning
process.  - George E. P. Box
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Gabor 
 Grothendieck
 Sent: Sunday, August 26, 2007 2:10 PM
 To: Muenchen, Robert A (Bob)
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] subset using noncontiguous variables by name 
 (not index)
 
 Using builtin data frame anscombe try this. First we set up a 
 data frame
 anscombe.seq which has one row containing 1, 2, 3, ... .  Then select
 out from that data frame and unlist it to get the desired 
 index vector.
 
  anscombe.seq <- replace(anscombe[1,], TRUE, seq_along(anscombe))
  idx <- unlist(subset(anscombe.seq, select = c(x1, x3:x4, y2)))
  anscombe[idx]
x1 x3 x4   y2
 1  10 10  8 9.14
 2   8  8  8 8.14
 3  13 13  8 8.74
 4   9  9  8 8.77
 5  11 11  8 9.26
 6  14 14  8 8.10
 7   6  6  8 6.13
 8   4  4 19 3.10
 9  12 12  8 9.13
 10  7  7  8 7.26
 11  5  5  8 4.74
 
 
 On 8/26/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
  Hi All,
 
  I'm using the subset function to select a list of variables, some of
  which are contiguous in the data frame, and others of which 
 are not. It
  works fine when I use the form:
 
  subset(mydata,select=c(x1,x3:x5,x7) )
 
  In reality, my list is far more complex. So I would like to 
 store it in
  a variable to substitute in for c(x1,x3:x5,x7) but cannot get it to
  work. That use of the c function seems to violate R rules, 
 so I'm not
  sure how it works at all. A small simulation of the problem 
 is below.
 
  If the variable names & orders were really this simple, I could use
  indices like
 
  summary( mydata[ ,c(1,3:5,7) ] )
 
  but alas, they are not.
 
  How does the c function work this way in the first place, 
 and how can I
  make this substitution?
 
  Thanks,
  Bob
 
  mydata - data.frame(
   x1=c(1,2,3,4,5),
   x2=c(1,2,3,4,5),
   x3=c(1,2,3,4,5),
   x4=c(1,2,3,4,5),
   x5=c(1,2,3,4,5),
   x6=c(1,2,3,4,5),
   x7=c(1,2,3,4,5)
  )
  mydata
 
  # This does what I want.
  summary(
   subset(mydata,select=c(x1,x3:x5,x7) )
  )
 
  # Can I substitute myVars?
  attach(mydata)
  myVars1 <- c(x1,x3:x5,x7)
 
  # Not looking good!
  myVars1
 
  # This doesn't do the right thing.
  summary(
   subset(mydata,select=myVars1 )
  )
 
  # Total desperation on this attempt:
  myVars2 <- "x1,x3:x5,x7"
  myVars2
 
  # This doesn't work either.
  summary(
   subset(mydata,select=myVars2 )
  )
 
 
 
  =
  Bob Muenchen (pronounced Min'-chen), Manager
  Statistical Consulting Center
  U of TN Office of Information Technology
  200 Stokely Management Center, Knoxville, TN 37996-0520
  Voice: (865) 974-5230
  FAX: (865) 974-4810
  Email: [EMAIL PROTECTED]
  Web: http://oit.utk.edu/scc,
  News: http://listserv.utk.edu/archives/statnews.html
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] uneven list to matrix

2007-08-24 Thread Bert Gunter
It can be done straightforwardly -- I don't know about efficiency -- without
recourse to zoo or merge:

## example data
x <- 1:5
names(x) <- letters[1:5]
alph <- list(x[1:4], x[c(1,3,4)], x[c(1,4,5)])

## Solution
rn <- unique(unlist(sapply(alph, names)))
mx <- matrix(nr = length(rn), nc = length(alph), dimnames = list(rn, NULL))
## use dimnames = list(sort(rn), NULL) if you want the rows sorted
for(i in seq(length(alph))) { y <- alph[[i]]; mx[names(y), i] <- y }


Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher Marcum
Sent: Thursday, August 23, 2007 10:27 PM
To: Gabor Grothendieck
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] uneven list to matrix

Hi Gabor,

My apologies. Both solutions work just fine on large lists (n=1000,
n[[i]]=500). A memory problem on my machine caused the error and
fail-to-sort. Thank you!

PS - The zoo method is slightly faster.

Best,
Chris

Gabor Grothendieck wrote:
 On 8/24/07, Christopher Marcum [EMAIL PROTECTED] wrote:
 Hi Gabor,

 Thank you. The native solution works just fine, though there is an
 interesting side effect, namely, that with very large lists the rows of
 the output become scrambled though the corresponding columns are
 correctly
 sorted. The zoo package solution does not work on large lists: there is
 an
 error:

 Error in order(na.last, decreasing, ...) :
argument 1 is not a vector

 They both work on the example data.  Please provide reproducible
 examples to illustrate your comments if you would like a response.


 Gabor Grothendieck wrote:
  Here are two solutions.  The first repeatedly uses merge and the
  second creates a zoo object from each alph component whose time
  index consists of the row labels and uses zoo's multiway merge to
  merge them.
 
  # test data
  m <- matrix(1:5, 5, dimnames = list(LETTERS[1:5], NULL))
  alph - list(m[1:4,,drop=F], m[c(1,3,4),,drop=F], m[c(1,4,5),,drop=F])
  alph
 
  # solution 1
  out <- alph[[1]]
  for(i in 2:length(alph)) {
    out <- merge(out, alph[[i]], by = 0, all = TRUE)
    row.names(out) <- out[[1]]
    out <- out[-1]
  }
  matrix(as.matrix(out), nrow(out), dimnames=list(rownames(out),NULL))
 
  # solution 2
  library(zoo)
  z <- do.call(merge, lapply(alph, function(x) zoo(c(x), rownames(x))))
  matrix(coredata(z), nrow(z), dimnames=list(time(z),NULL))
 
 
  On 8/23/07, Christopher Marcum [EMAIL PROTECTED] wrote:
  Hello,
 
  I am sure I am not the only person with this problem.
 
  I have a list with n elements, each consisting of a single column
 matrix
  with different row lengths. Each row has a name ranging from A to E.
  Here
  is an example:
 
  alph[[1]]
  A 1
  B 2
  C 3
  D 4
 
  alph[[2]]
  A 1
  C 3
  D 4
 
  alph[[3]]
  A 1
  D 4
  E 5
 
 
  I would like to create a matrix from the elements in the list with n
  columns such that the row names are preserved and NAs are inserted
 into
  the cells where the uneven lists do not match up based on their row
  names.
  Here is an example of the desired output:
 
  newmatrix
       [,1] [,2] [,3]
  A      1    1    1
  B      2   NA   NA
  C      3    3   NA
  D      4    4    4
  E     NA   NA    5
 
  Any suggestions?
  I have tried
  do.call(cbind,list)
  I also thought I was on the right track when I tried converting each
  element into a vector and then running this loop (which ultimately
  failed):
 
  newmat <- matrix(NA, ncol=3, nrow=5)
  colnames(newmatrix) <- c(A:E)
  for(j in 1:3){
  for(i in 1:5){
  for(k in 1:length(list[[i]])){
  if(is.na(match(colnames(newmatrix), names(alph[[i]])))[j]==TRUE){
  newmatrix[i,j] <- NA}
  else newmatrix[i,j] <- alph[[i]][k]}}}
 
  Thanks,
  Chris
  UCI Sociology
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Regulatory Compliance and Validation Issues

2007-08-20 Thread Bert Gunter
FWIW:

The few companies with which I'm familiar have significant resources --
i.e. software QC departments -- devoted to validating that internally
developed code (e.g. SAS macros) used in submissions does what it claims to
do. Extensive documentation of the code and the validation process is
required. All changes to such code must of course be documented and
validated. I believe this is all part of CFR Part 11 requirements.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Monday, August 20, 2007 12:17 PM
To: Cody Hamilton
Cc: Thomas Lumley; r-help@stat.math.ethz.ch
Subject: Re: [R] Regulatory Compliance and Validation Issues

Cody Hamilton wrote:
 Dear Thomas,
 
 Thank you for your reply.  You are of course quite right - the R
Foundation couldn't be responsible for any individually contributed package.
 
 I am curious as to how an organization operating in a regulated
environment could safely use a contributed package.  What if the
author/maintainer retires or loses interest in maintaining the package?  The
organization would then find itself in the awkward position of being reliant
on software for which there is no technical support and which may not be
compatible with future versions of the base R software.  I suppose the
organization could take responsibility for maintaining the individual
functions within a package on its own (one option made possible by the open
source nature of R), but this would require outstanding programming
resources which the company may not have (Thomas Lumleys are sadly rare).
In addition, as the organization is claiming the functions as their own (and
not as out-of-the-box software), the level of required validation would be
truly extraordinary. I also wonder if an everyone-maintain-their-own-copy
approach could lead to multiple mutated versions of a package's functions
across the R universe (e.g. Edwards' version of sas.get() vs. Company X's
version of sas.get(), etc.).
 
 Regards,
-Cody

Cody,

I think of this issue as not unlike an organization using its own code 
written by its own analysts or SAS programmers.  Code is reused all the 
time.

Frank

 
 As always, I am speaking for myself and not necessarily for Edwards
Lifesciences.
 
 -Original Message-
 From: Thomas Lumley [mailto:[EMAIL PROTECTED]
 Sent: Sunday, August 19, 2007 8:50 AM
 To: Cody Hamilton
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Regulatory Compliance and Validation Issues
 
 On Fri, 17 Aug 2007, Cody Hamilton wrote:
 
 snip
 I have a few specific comments/questions that I would like to present to
 the R help list.
 snip
 2. While the document's scope is limited to base R plus recommended
 packages, I believe most companies will need access to functionalities
 provided by packages not included in the base or recommended packages.
 (For example, I don't think I could survive without the sas.get()
 function from the Design library.)  How can a company address the issues
 covered in the document for packages outside its scope?  For example,
 what if a package's author does not maintain historical archive versions
 of the package?  What if the author no longer maintains the package?
 Is the solution to add more packages to the recommended list (I'm fairly
 certain that this would not be a simple process) or is there another
 solution?
 
 This will have to be taken up with the package maintainer.  The R
 Foundation doesn't have any definitive knowledge about, eg, Frank
 Harrell's development practices and I don't think the FDA would regard our
 opinions as relevant.
 
 Archiving, at least, is addressed by CRAN: all the previously released
 versions of packages are available
 
 3. At least at my company, each new version must undergo basically the
 same IQ/OQ/PQ as the first installation.  As new versions of R seem to
 come at least once a year, the ongoing validation effort would be
 painful if the most up-to-date version of R is to be maintained within
 the company.  Is there any danger it delaying the updates (say updating
 R within the company every two years or so)?
 
 It's worse than that: there are typically 4 releases of R per year (the
 document you are commenting on actually gives dates).  The ongoing
 validation effort may indeed be painful, and this was mentioned as an
 issue in the talk by David James & Tony Rossini.
 
 The question of what is missed by delaying updates can be answered by
 looking at the NEWS file. The question of whether it is dangerous is
 really an internal risk management issue for you.
 
 -thomas
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-- 
Frank E Harrell Jr   Professor and Chair

Re: [R] Convert factor to numeric vector of labels

2007-08-14 Thread Bert Gunter
Matt:

I believe you have confused issues.

Setting stringsAsFactors = FALSE would dramatically **increase** the amount
of memory used for storing character vectors, which is what factors are for.
So your proposed solution does exactly the opposite of what you want.

The issue you are worried about is when numeric fields are somehow
interpreted as non-numeric. This can happen for a variety of reasons (stray
characters in numeric fields, quotes around numbers, ...). The solution is not
to set a global default that does the opposite of what you want in its
intended use, but to read the documentation and either set the appropriate
arguments (perhaps colClasses of read.table) or fix the original data before
R reads it (e.g. remove quotes and stray characters). Failing that, the
one-off solutions given are the correct way to handle what is a data
problem, not an R problem.

However, I should add that there are arguments for making stringsAsFactors =
FALSE; search the archives for discussions why. The memory penalty will have
to be paid, of course.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Matthew Keller
Sent: Tuesday, August 14, 2007 12:48 PM
To: John Kane
Cc: Falk Lieder; r-help@stat.math.ethz.ch
Subject: Re: [R] Convert factor to numeric vector of labels

Hi all,

If we, the R community, are endeavoring to make R user friendly
(gasp!), I think that one of the first places to start would be in
setting stringsAsFactors = FALSE. Several times I've run into
instances of folks decrying R's "ridiculous" usage of memory in
reading data, only to find out that these folks were
unknowingly importing certain columns as factors. The fix is easy once
you know it, but it isn't obvious to new users, and I'd bet that it
turns some % of people off of the program. Factors are not used often
enough to justify this default behavior in my opinion. When factors
are used, the user knows to treat the variable as a factor, and so it
can be done on a case-by-case (or should I say variable-by-variable?)
basis.
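[Editor's sketch of the behavior under discussion, added for illustration. For the record, R 4.0.0 did eventually change the default to stringsAsFactors = FALSE.]

```r
## data.frame() turning character columns into factors, and opting out
DF.factor <- data.frame(id = c("a", "b"), stringsAsFactors = TRUE)
class(DF.factor$id)   # "factor"
DF.char <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
class(DF.char$id)     # "character"
## and the FAQ 7.10 repair when numbers have already become factor levels:
f <- factor(c("0.71", "1.34", "2.61"))
as.numeric(as.character(f))  # 0.71 1.34 2.61 (not the level codes 1 2 3)
```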

Is this a default that should be changed?

Matt


On 8/13/07, John Kane [EMAIL PROTECTED] wrote:
 This is one of R's rather _endearing_  little
 idiosyncrasies. I ran into it a while ago.
 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98090.html


 For some reason, possibly historical, the option
 stringsAsFactors is set to TRUE.

 As Prof Ripley says FAQ 7.10 will tell you
 as.numeric(as.character(f)) # for a one-off conversion

 From Gabor Grothendieck  A one-off solution for a
 complete data.frame

 DF - data.frame(let = letters[1:3], num = 1:3,
  stringsAsFactors = FALSE)

 str(DF)  # to see what has happened.

 You can reset the option globally, see below.  However
 you might want to read Gabor Grothendieck's comment
 about this in the thread referenced above since it
 could cause problems if you transfer files alot.

 Personally I went with the global option since I don't
 tend to transfer programs to other people and I was
 getting tired of tracking down errors in my programs
 caused by numeric and character variables suddenly
 deciding to become factors.

 From Steven Tucker:

 You can also set this option globally with
  options(stringsAsFactors = TRUE)  # in
 \library\base\R\Rprofile

 --- Falk Lieder [EMAIL PROTECTED] wrote:

  Hi,
 
  I have imported a data file to R. Unfortunately R
  has interpreted some
  numeric variables as factors. Therefore I want to
  reconvert these to numeric
  vectors whose values are the factor levels' labels.
  I tried
  as.numeric(factor),
  but it returns a vector of factor levels (i.e.
  1,2,3,...) instead of labels
  (i.e. 0.71, 1.34, 2.61, ...).
  What can I do instead?
 
  Best wishes, Falk

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics



Re: [R] Legend on graph

2007-08-13 Thread Bert Gunter
You can get the legend outside the plot region by

1. First changing the clipping region via par(xpd = TRUE) ; (or xpd=NA). see
?par

2. Specifying x and y coodinates for legend placement outside the limits of
the plot region.

This allows you to include a legend without adding a bunch of useless
whitespace to the plot region; or to add a grid to the plot without
interfering with the legend.
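A minimal sketch of those two steps (the data and legend coordinates are made up):

```r
# Step 1: disable clipping; Step 2: place the legend above the plot region.
x <- 1:10
y <- rnorm(10)
plot(x, y, ylim = c(-3, 3))
par(xpd = NA)                    # allow drawing outside the plot region
out <- legend(x = 1, y = 4,      # y = 4 lies above ylim's maximum of 3
              legend = "series 1", lty = 1, bty = "n")
```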


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Nguyen Dinh Nguyen
Sent: Monday, August 13, 2007 3:42 PM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: [R] Legend on graph

Hi Akki, 
Then you may need to increase y-axis scale by ylim=c(min,max)
Cheers
Nguyen

On 8/12/07, akki [EMAIL PROTECTED] wrote:
 Hi,
 I have a problem when I want to put a legend on the graph.
 I do:

 legend("topright", names(o), cex = 0.9, col = plot_colors, lty = 1:5, bty = "n")

 but the legend is writen into the graph (graphs' top but into the graph),
 because I have values on this position. How can I write the legend on top
 the graph without the legend writes on graph's values.

 Thanks.



Re: [R] Mixture of Normals with Large Data

2007-08-07 Thread Bert Gunter
Why would anyone want to fit a mixture of normals with 110 million
observations?? Any questions about the distribution that you would care to
ask can be answered directly from the data. Of course, any test of normality
(or anything else) would be rejected.

More to the point, the data are certainly not a random sample of anything.
There will be all kinds of systematic nonrandom structure in them. This is
clearly a situation where the researcher needs to think more carefully about
the substantive questions of interest and how the data may shed light on
them, instead of arbitrarily and perhaps reflexively throwing some silly
statistical methodology at them.  

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tim Victor
Sent: Tuesday, August 07, 2007 3:02 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Mixture of Normals with Large Data

I wasn't aware of this literature, thanks for the references.

On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote:
 Another possibility is to use data squashing methods.  Relevant papers
are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen
(1999).

 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: [EMAIL PROTECTED]


 - Original Message -
 From: Charles C. Berry [EMAIL PROTECTED]
 Date: Saturday, August 4, 2007 8:01 pm
 Subject: Re: [R] Mixture of Normals with Large Data
 To: [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch


  On Sat, 4 Aug 2007, Tim Victor wrote:
 
All:
   
I am trying to fit a mixture of 2 normals with  110 million
  observations. I
am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and
  I
continue to run out of memory. Does anyone have any suggestions.
 
 
   If the first few million observations can be regarded as a SRS of the
 
   rest, then just use them. Or read in blocks of a convenient size and
 
   sample some observations from each block. You can repeat this process
  a
   few times to see if the results are sufficiently accurate.
 
   Otherwise, read in blocks of a convenient size (perhaps 1 million
   observations at a time), quantize the data to a manageable number of
 
   intervals - maybe a few thousand - and tabulate it. Add the counts
  over
   all the blocks.
 
   Then use mle() to fit a multinomial likelihood whose probabilities
  are the
   masses associated with each bin under a mixture of normals law.
 
   Chuck
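Chuck's bin-and-tabulate suggestion might be sketched as follows on simulated data (this uses optim() rather than mle() for brevity; every name, starting value, and bin count here is illustrative):

```r
# Quantize the data into bins, tabulate counts, then fit a 2-component
# normal mixture by maximizing a binned (multinomial) log-likelihood.
set.seed(1)
x <- c(rnorm(5e4, 0, 1), rnorm(5e4, 3, 0.5))   # stand-in for the real data
breaks <- seq(min(x), max(x), length.out = 2001)
counts <- tabulate(findInterval(x, breaks, all.inside = TRUE), nbins = 2000)

negll <- function(par) {
  p  <- plogis(par[1])                  # mixing weight, kept in (0, 1)
  m1 <- par[2]; s1 <- exp(par[3])       # component 1 mean and sd
  m2 <- par[4]; s2 <- exp(par[5])       # component 2 mean and sd
  pr <- p * diff(pnorm(breaks, m1, s1)) +
        (1 - p) * diff(pnorm(breaks, m2, s2))
  -sum(counts * log(pmax(pr, 1e-12)))   # multinomial log-likelihood
}
fit <- optim(c(0, -1, 0, 4, 0), negll, method = "BFGS")
```

Only the bin counts are held in memory, so the fit no longer scales with the number of raw observations.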
 
   
Thanks so much,
   
Tim
   
   [[alternative HTML version deleted]]
   
__
R-help@stat.math.ethz.ch mailing list
   
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
   
 
   Charles C. Berry                 (858) 534-2098
                                    Dept of Family/Preventive Medicine
                                    UC San Diego
                                    La Jolla, San Diego 92093-0901
 
   __
   R-help@stat.math.ethz.ch mailing list
 
   PLEASE do read the posting guide
   and provide commented, minimal, self-contained, reproducible code.




Re: [R] Mixture of Normals with Large Data

2007-08-07 Thread Bert Gunter
 

Have you considered the situation of wanting to characterize
probability densities of prevalence estimates based on a complex
random sample of some large population.

No -- and I stand by my statement. The empirical distribution of the data
themselves are the best characterization of the density. You and others
are free to disagree.

-- Bert



On 8/7/07, Bert Gunter [EMAIL PROTECTED] wrote:
 Why would anyone want to fit a mixture of normals with 110 million
 observations?? Any questions about the distribution that you would care to
 ask can be answered directly from the data. Of course, any test of
normality
 (or anything else) would be rejected.

 More to the point, the data are certainly not a random sample of anything.
 There will be all kinds of systematic nonrandom structure in them. This is
 clearly a situation where the researcher needs to think more carefully
about
 the substantive questions of interest and how the data may shed light on
 them, instead of arbitrarily and perhaps reflexively throwing some silly
 statistical methodology at them.

 Bert Gunter
 Genentech Nonclinical Statistics

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Tim Victor
 Sent: Tuesday, August 07, 2007 3:02 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Mixture of Normals with Large Data

 I wasn't aware of this literature, thanks for the references.

 On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote:
  Another possibility is to use data squashing methods.  Relevant papers
 are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen
 (1999).
 
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: [EMAIL PROTECTED]
 
 
  - Original Message -
  From: Charles C. Berry [EMAIL PROTECTED]
  Date: Saturday, August 4, 2007 8:01 pm
  Subject: Re: [R] Mixture of Normals with Large Data
  To: [EMAIL PROTECTED]
  Cc: r-help@stat.math.ethz.ch
 
 
   On Sat, 4 Aug 2007, Tim Victor wrote:
  
 All:

 I am trying to fit a mixture of 2 normals with  110 million
   observations. I
 am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and
   I
 continue to run out of memory. Does anyone have any suggestions.
  
  
If the first few million observations can be regarded as a SRS of the
  
rest, then just use them. Or read in blocks of a convenient size and
  
sample some observations from each block. You can repeat this process
   a
few times to see if the results are sufficiently accurate.
  
Otherwise, read in blocks of a convenient size (perhaps 1 million
observations at a time), quantize the data to a manageable number of
  
intervals - maybe a few thousand - and tabulate it. Add the counts
   over
all the blocks.
  
Then use mle() to fit a multinomial likelihood whose probabilities
   are the
masses associated with each bin under a mixture of normals law.
  
Chuck
  

 Thanks so much,

 Tim

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list

 PLEASE do read the posting guide
 and provide commented, minimal, self-contained, reproducible code.

  
 Charles C. Berry                 (858) 534-2098
                                  Dept of Family/Preventive Medicine
                                  UC San Diego
                                  La Jolla, San Diego 92093-0901
  
__
R-help@stat.math.ethz.ch mailing list
  
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





Re: [R] Create vectors form matrices

2007-08-06 Thread Bert Gunter
The poster asked for row major representation, not column major
representation.

Matrices **are** vectors -- stored in column major order.
Try:

cat(x,\n)  ## versus...
cat(t(x),\n)

The tabular printout occurs because the print() method for a matrix object
(more generally any array) prints the matrix  (a vector with a dim
attribute) in an appropriate way. However you can manipulate the matrix
**as** a vector, and in most circumstances, the dim attribute will be
preserved so it will remain a matrix object.
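The column-major vs row-major distinction above in a two-line sketch:

```r
x <- matrix(1:6, nrow = 2)  # a 2 x 3 matrix
as.vector(x)                # 1 2 3 4 5 6 -- column major (storage order)
as.vector(t(x))             # 1 3 5 2 4 6 -- row major, via the transpose
```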

Please read An Introduction to R, ?methods and ?print (at least) for
details. R will always be arcane to those who do not make a serious effort
to learn it. It is **not** meant to be intuitive and easy for casual users
to just plunge into. It is far too complex and powerful for that. But the
rewards are great for serious data analysts who put in the effort.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Henrique Dallazuanna
Sent: Monday, August 06, 2007 7:33 AM
To: Niccolò Bassani
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Create vectors form matrices

Try:

dim(matrix) <- NULL

-- 
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" O

On 06/08/07, Niccolò Bassani [EMAIL PROTECTED] wrote:

 Hi, dear R users. I've a kind of stupid question, I hope you can provide
 some help!
 The topic here's really simple: vectors and matrices.
 I have a matrix (616 rows x 22 cols) filled with numbers and NAs;
 something
 like this:

 1  2  3  4  5  6  NA  NA NA NA 
 1  2  3  4  NA  NA  NA  NA NA .
 ..
 

 What I'm trying to do is to put all the rows on a unique row, so to have
 something like this:

 1  2  3  4  5  6  NA  NA NA NA 1  2  3  4  NA  NA  NA  NA NA
 .

 and so on. The matter is that whatever I try, I just get something like
 this:

 1 1 1 1 1 1 1 1 .2 2 2 2 2 2 2 2 2 ..

  Obviously, this is not what is required. I've tried to concatenate, I've
  built a for loop, but nothing seems to produce what I want. Sorry for the dumb
 question, but I'm almost sure I need holidays...
 Thanks in advance!
  niccolò

 [[alternative HTML version deleted]]


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]



Re: [R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

2007-08-03 Thread Bert Gunter
I suspect you'll get some creative answers, but if all you're worried about
is whether a column exists before you do something with it, what's wrong
with:

nm <- ... ## a character vector of names
if(!all(nm %in% names(yourdata))) ## complain
else ## do something


I think this is called defensive programming.
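One possible shape for such a defensive wrapper (the function name and error message are made up, not an existing API):

```r
# Fail loudly when a requested column does not exist, instead of
# silently returning NULL as df[, "badname"] can.
check_cols <- function(df, nm) {
  missing <- setdiff(nm, names(df))
  if (length(missing) > 0)
    stop("undefined columns selected: ", paste(missing, collapse = ", "))
  df[, nm, drop = FALSE]
}
foo <- data.frame(Filename = c("a", "b"))
```

With this, check_cols(foo, "FileName") stops with an error rather than returning NULL.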

Bert Gunter
Genentech


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steven McKinney
Sent: Friday, August 03, 2007 10:38 AM
To: r-help@stat.math.ethz.ch
Subject: [R] FW: Selecting undefined column of a data frame (was
[BioC]read.phenoData vs read.AnnotatedDataFrame)

Hi all,

What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?

Alice Johnson recently tackled this issue
(see [BioC] posting below).

Due to a mis-spelled column name (FileName
instead of Filename) which produced no warning,
Alice spent a fair amount of time tracking down
this bug.  With my fumbling fingers I'll be tracking
down such a bug soon too.

Is there any options() setting, or debug technique
that will flag data frame column extractions that
reference a non-existent column?  It seems to me
that the [.data.frame extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in [.data.frame such as

if (any(is.na(cols))) 
    stop("undefined columns selected")



In R 2.5.1 a NULL is silently returned.

> foo <- data.frame(Filename = c("a", "b"))
> foo[, "FileName"]
NULL

Has something changed so that the code lines
if (any(is.na(cols))) 
    stop("undefined columns selected")
in [.data.frame no longer work properly (if
I am understanding the intention properly)?

If not, could  [.data.frame check an
options() variable setting (say
warn.undefined.colnames) and throw a warning
if a non-existent column name is referenced?




> sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base 

other attached packages:
 plotrix lme4   Matrix  lattice 
 2.2-3  0.99875-4 0.999375-0 0.16-2 
 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: [EMAIL PROTECTED] on behalf of Johnstone, Alice
Sent: Wed 8/1/2007 7:20 PM
To: [EMAIL PROTECTED]
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
 
 For interest sake, I have found out why I wasn't getting my expected
results when using read.AnnotatedDataFrame
Turns out the error was made in the ReadAffy command, where I specified
the filenames to be read from my AnnotatedDataFrame object.  There was a
typo error with a capital N ($FileName) rather than lowercase n
($Filename) as in my target file..whoops.  However this meant the
filename argument was ignored without the error message(!) and instead
of using the information in the AnnotatedDataFrame object (which
included filenames, but not alphabetically) it read the .cel files in
alphabetical order from the working directory - hence the wrong file was
given the wrong label (given by the order of Annotated object) and my
comparisons were confused without being obvious as to why or where.
Our solution: specify that filename is as.character so assignment of
file to target is correct(after correcting $Filename) now that using
read.AnnotatedDataFrame rather than readphenoData.

Data <- ReadAffy(filenames = as.character(pData(pd)$Filename), phenoData = pd)

Hurrah!

It may be beneficial to others, that if the filename argument isn't
specified, that filenames are read from the phenoData object if included
here.

Thanks!

-Original Message-
From: Martin Morgan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 26 July 2007 11:49 a.m.
To: Johnstone, Alice
Cc: [EMAIL PROTECTED]
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

Hi Alice --

Johnstone, Alice [EMAIL PROTECTED] writes:

 Using R2.5.0 and Bioconductor I have been following code to analysis 
 Affymetrix expression data: 2 treatments vs control.  The original 
 code was run last year and used the read.phenoData command, however 
 with the newer version I get the error message Warning messages:
 read.phenoData is deprecated, use read.AnnotatedDataFrame instead The 
 phenoData class is deprecated, use AnnotatedDataFrame (with
 ExpressionSet) instead
  
 I use the read.AnnotatedDataFrame command, but when it comes to the 
 end of the analysis the comparison of the treatment to the controls 
 gets mixed up compared to what you get using the original 
 read.phenoData ie it looks like the 3 groups get labelled wrong and so

 the comparisons are different (but they can still be matched up

Re: [R] Extracting a website text content using R

2007-08-01 Thread Bert Gunter
Yes, there are.

(Please see and follow the posting guide if you wish to obtain something
more specific)


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Am Stat
Sent: Wednesday, August 01, 2007 2:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Extracting a website text content using R

Dear useR,

Just wondering whether there is any function in R that could let me get the
text content of a certain website.

Thanks a lot!

Best,

Leon

[[alternative HTML version deleted]]



Re: [R] Slightly OT - use of R

2007-07-30 Thread Bert Gunter
Why? You might receive more useful replies from a relevant subset of users
if you specify the purpose you have in mind.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Logsdon
Sent: Monday, July 30, 2007 1:28 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Slightly OT - use of R

I am trying to get a measure of how R compares in usage as a statistical 
platform compared to other software.  I would guess it is the most widely 
used among statisticians at least by virtue of it being open source.

But is there any study to which I can refer?  By asking this list I am not 
exactly adopting a rigorous approach!  

Best wishes

John

John Logsdon   Try to make things as simple
Quantex Research Ltd, Manchester UK as possible but not simpler
[EMAIL PROTECTED]  [EMAIL PROTECTED]
+44(0)161 445 4951/G:+44(0)7717758675   www.quantex-research.com



Re: [R] generating symmetric matrices

2007-07-30 Thread Bert Gunter
See ?dist for an object oriented approach that may be better.

Directly, you can do something like (see  ?row ?col):

x <- matrix(NA, 10, 10)
## Lower triangular (including the diagonal):
x[row(x) >= col(x)] <- rnorm(55) 
x[row(x) < col(x)] <- t(x)[row(x) < col(x)]
## or you could have saved the random vector and re-used it.
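A runnable version of the row()/col() approach, with a symmetry check added (set.seed is only for reproducibility):

```r
# Fill the lower triangle plus diagonal, then mirror it upward.
set.seed(1)
x <- matrix(NA, 10, 10)
x[row(x) >= col(x)] <- rnorm(55)              # 55 = 10*11/2 entries
x[row(x) < col(x)] <- t(x)[row(x) < col(x)]   # mirror into the upper triangle
stopifnot(isSymmetric(x))
```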


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gregory Gentlemen
Sent: Friday, July 27, 2007 8:28 PM
To: r-help@stat.math.ethz.ch
Subject: [R] generating symmetric matrices

Greetings,

I have a seemingly simple task which I have not been able to solve today. I
want to construct a symmetric matrix of arbitrary size w/o using loops. The
following I thought would do it:

p <- 6
Rmat <- diag(p)
dat.cor <- rnorm(p*(p-1)/2)
Rmat[outer(1:p, 1:p, "<")] <- Rmat[outer(1:p, 1:p, ">")] <- dat.cor

However, the problem is that the matrix is filled by column and so the
resulting matrix is not symmetric.

I'd be grateful for any advice and/or solutions.

Gregory 

   
 
  

   


[[alternative HTML version deleted]]



Re: [R] Constructing correlation matrices

2007-07-30 Thread Bert Gunter
See ?dist for an object oriented approach that may be better.

Directly, you can do something like (see  ?row ?col):

x <- matrix(NA, 10, 10)
## Lower triangular (including the diagonal):
x[row(x) >= col(x)] <- rnorm(55) 
x[row(x) < col(x)] <- t(x)[row(x) < col(x)]
## or you could have saved the random vector and re-used it.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gregory Gentlemen
Sent: Sunday, July 29, 2007 7:32 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Constructing correlation matrices

Greetings,

I have a seemingly simple task which I have not been able to solve today and
I checked all of the help archives on this and have been unable to find
anything useful. I want to construct a symmetric matrix of arbitrary size
w/o using loops. The following I thought would do it:

p <- 6
Rmat <- diag(p)
dat.cor <- rnorm(p*(p-1)/2)
Rmat[outer(1:p, 1:p, "<")] <- Rmat[outer(1:p, 1:p, ">")] <- dat.cor

However, the problem is that the matrix is filled by column and so the
resulting matrix is not symmetric.

I'd be grateful for any advice and/or solutions.

Gregory 
   

[[alternative HTML version deleted]]



Re: [R] doubt about options(graphics.record=T)

2007-07-23 Thread Bert Gunter
Below is an explicit excerpt from the Help file. How, please, is this not
clear enough?

Bert Gunter
Genentech Nonclinical Statistics


Recorded plot histories are of class "SavedPlots". They have a print method
and a subset method. As the individual plots are of class "recordedplot",
they can be replayed by printing them: see ?recordPlot.

 The active plot history is stored in variable .SavedPlots in the
workspace.  [emphasis added]
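A small sketch of recording and replaying a single plot (the null-PDF device is just a stand-in for an on-screen device, where recording is enabled by the option discussed above):

```r
pdf(file = NULL)                     # any device with a display list works
dev.control(displaylist = "enable")  # required on non-screen devices
plot(1:10)
p <- recordPlot()                    # an object of class "recordedplot"
replayPlot(p)                        # printing/replaying redraws the plot
dev.off()
```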



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of terra
Sent: Monday, July 23, 2007 11:30 AM
To: Prof Brian Ripley; r-help@stat.math.ethz.ch
Subject: Re: [R] doubt about options(graphics.record=T)

 Hi all,

 I've been using R under WindowsXP.

  So, where does R store the graphics history (not yet saved) if I use the
  option
  options(graphics.record=T) inside the Rprofile.site file?
 
 The relevant help file (?windows) does tell you: please read it.

Dear Prof. Ripley,

I read the recommended (?windows) and it was not clear enough!

BTW, I just found a discussion from ([R] RGui: windows-record and command
history Thomas 
Steiner (23 Mar 2006)) where Duncan wrote:

- The graphics history is stored in your current workspace in memory, and it
can get big.

I think it is the answer I was searching. Do you agree?

Regards,

/\/\/\/\
  Jose Claudio Faria
  Brasil/Bahia/UESC/DCET
  Estatistica Experimental/Prof. Titular
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
  Tels:
73-3634.2779 (res - Ilheus/BA)
19-3435.1536 (res - Piracicaba/SP) *
19-9144.8979 (cel - Piracicaba/SP) *
/\/\/\/\



Re: [R] Drawing rectangles in multiple panels

2007-07-11 Thread Bert Gunter
Deepayan et. al.:

A question/comment: I have usually found that the subscripts argument is
what I need when passing *external* information into the panel function, for
example, when I wish to add results from a fit done external to the trellis
call. Fits[subscripts] gives me the fits (or whatever) I want to plot for
each panel. It is not clear to me how the panel layout information from
panel.number(), etc. would be helpful here instead. Am I correct? -- or is
there a smarter way to do this that I've missed?
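For concreteness, the subscripts idiom described above might look like this (the data, model, and variable names are made up):

```r
library(lattice)

# "External" fit computed outside the trellis call, one value per row of dat.
dat <- data.frame(x = rep(1:10, 3), g = rep(letters[1:3], each = 10))
dat$y <- dat$x + rnorm(30)
fits <- fitted(lm(y ~ x + g, dat))

fig <- xyplot(y ~ x | g, dat,
              panel = function(x, y, subscripts, ...) {
                panel.xyplot(x, y, ...)
                # subscripts indexes this panel's rows in the original data
                panel.lines(x, fits[subscripts])
              })
print(fig)
```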

Cheers,

Bert

Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Wednesday, July 11, 2007 10:04 AM
To: Jonathan Williams
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Drawing rectangles in multiple panels

On 7/11/07, Jonathan Williams [EMAIL PROTECTED] wrote:
 Hi folks,

 I'm having some trouble understanding the intricacies of panel
 functions.  I wish to create three side-by-side graphs, each with
 different data-- so far, so good: I rbind() the data, add a column of
 subscripts as a conditioning variable, load up the lattice package,
 specify either a c(3,1) 'layout' or work through 'allow.multiple' and
 'outer' and I'm good to go.

 But now I wish to add three rectangles to each plot, which will be in
 different places on each panel, and I'm terribly stuck.  I can guess
 this requires defining a panel function on the fly, but none of my
 attempts are working.  Suggestions?

You haven't told us what determines the rectangles (only that they are
different in each panel). If they are completely driven by panel data,
here's an example:

panel.qrect <-
function(x, y, ...)
{
    xq <- quantile(x, c(0.1, 0.9))
    yq <- quantile(y, c(0.1, 0.9))
    panel.rect(xq[1], yq[1], xq[2], yq[2],
               col = "grey86", border = NA)
    panel.xyplot(x, y, ...)
}

xyplot(Sepal.Length ~ Sepal.Width | Species, iris,
   panel = panel.qrect)

If the rectangles are somehow determined externally, you probably want
to use one of the accessor functions described in help(panel.number).
There are good and bad (i.e. less robust) ways to use these, but we
need to know your use case before recommending one.

-Deepayan



Re: [R] why doesn't as.character of this factor create a vector ofcharacters?

2007-07-10 Thread Bert Gunter
Andrew:

As you haven't received a reply yet ...

?factor,?UseMethod, and An Introduction to R may help. But it's a bit
subtle.

Factors are objects that are integer vectors (codes) with a levels attribute
that associates the codes with levels as character names. So
df[df$a==Abraham,] is a data.frame in which the columns are still factors.
as.character() is a S3 generic function that calls the (internal) default
method on a data.frame. This obviously just turns the vector of integers
into characters and ignores the levels attribute.

t() is also a S3 generic with a data.frame method. This merely converts the
data.frame to a matrix via as.matrix and then applies t() to the matrix. The
as.matrix() method for data.frames captures the levels and converts the
data.frame to a character matrix with the level names, not their numeric
codes.So another perhaps more intuitive but also more storage intensive way
(I think) of doing what you wantthat avoids the transpose and as.vector()
conversion would be:

mx <- as.matrix(df)
mx[mx[, "a"] == "Abraham", , drop = TRUE]

HTH.

Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee
Sent: Tuesday, July 10, 2007 8:57 AM
To: r-help@stat.math.ethz.ch
Subject: [R] why doesn't as.character of this factor create a vector
ofcharacters?

I'm trying to figure out why when I use as.character() on one row of a
data.frame, I get factor numbers instead of a character vector.  Any
suggestions?

See the following code:

a <- c("Abraham", "Jonah", "Moses")
b <- c("Sarah", "Hannah", "Mary")
c <- c("Billy", "Joe", "Bob")

df <- data.frame(a = a, b = b, c = c)

#Suppose I'm interested in one line of this data frame but as a vector

one.line <- df[df$a == "Abraham", ]

#However the following illustrates the problem I'm having

one.line <- as.vector(df[df$a == "Abraham", ]) #Creates a one-row
data.frame instead of a vector!

#compare above to

one.line <- as.character(df[df$a == "Abraham", ]) #Creates a vector of "1", "3", "1"!

#In the end, this creates the output that I'd like:

one.line <- as.vector(t(df[df$a == "Abraham", ])) #but it seems like a lot of
work!



Re: [R] Salient feature selection

2007-07-02 Thread Bert Gunter
Andy:

See e.g. the pls package. However, be forewarned: this is a vague problem
(what kind of predictors/responses do you want? -- linear combinations?
nonlinear combinations? ...). The problem is also NP-Hard I believe, so
solutions are very algorithm (and even starting value)-dependent. For these
reasons, statistical inference is difficult, at best, and probably not even
meaningful in your context, as I doubt that you have a random sample of
anything. A personal recommendation (with which many disagree, I know): seek
extreme parsimony in both predictors and responses for results to be
replicable/scientifically meaningful.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andy Weller
Sent: Monday, July 02, 2007 8:17 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Salient feature selection

I am relatively new to R. I am hoping that someone will be able to point 
me in the right direction and/or suggest a technique/package/reference 
that will help me with the following. I have:

a) Some explanatory variables (integers, real) - these are real world 
physical descriptions, i.e. counts of features, etc

b) Some response variables (integers, real) - these are image analysis 
measurements (gray-value distributions, textural descriptors, etc) of 
the same things represented in a)

and I want to find out which variables from the two sets correlate best - 
i.e. the salient features from BOTH sets (i.e. not for classification purposes).

For example, if a has 10 explanatory variables and b has 10 response 
variables, I want to test the complete set of explanatory variables with 
each individual response (or vice versa). So, explanatory 1-10 with 
response 1, explanatory 1-10 with response 2, explanatory 1-10 with 
response 3, etc...

This should ultimately tell me which real world physical features are 
related best with the image analysis measurements (with the confidence 
level between them).

I hope this makes sense?

I have used SPSS AnswerTree's Exhaustive CHAID before to select a 
subset of input features for a complete set of output features to aid 
the creation of artificial neural networks. I want to do a similar 
thing, but it is not important that ALL explanatory and response 
variables are used/selected.

I hope that I have been clear in my intentions and I look forward to 
your replies, Andy

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] applying max elementwise to two vectors

2007-06-28 Thread Bert Gunter
Please... use and **read** the docs:

?max  -->  pmax


Bert Gunter


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Afshartous, David
Sent: Thursday, June 28, 2007 1:20 PM
To: r-help@stat.math.ethz.ch
Subject: [R] applying max elementwise to two vectors

 
All,

Is there one liner way to obtain the max per observation for two
vectors?
I looked at apply and lapply but it seems that groundwork would have to
be done before applying either of those.  The code below does it but
seems
like overkill.

Thanks!
Dave

x = rnorm(10)
y = rnorm(10)

ind = which(x > y)
z = x
z[ind] <- y[ind]  ## z now contains the max's
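The one-liner the docs point to, applied to Dave's example:

```r
x <- rnorm(10)
y <- rnorm(10)
z <- pmax(x, y)  # elementwise ("parallel") maximum; pmin() is the minimum
```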

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme correlation structures

2007-06-27 Thread Bert Gunter
Please read ?lme carefully -- the info you seek is there. In particular, the
weights argument for changing variance weighting by covariates and the
correlation argument for specifying correlation structures.

Pinheiro and Bates's MIXED EFFECT MODELS IN S... is the canonical reference
(which you should get if you want to use R as you said) that exposits the
ideas at greater length.
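A sketch of how those two arguments combine, on simulated placeholder data (all variable names hypothetical). Note this gives sex-specific residual *variances*; fully sex-specific correlation parameters -- SAS's group= option in the REPEATED statement -- have no direct lme equivalent:

```r
library(nlme)

set.seed(1)
dat <- expand.grid(time = 1:4, indv = factor(1:20))
dat$sex <- factor(ifelse(as.integer(dat$indv) <= 10, "M", "F"))
dat$y <- 0.5 * dat$time + rnorm(nrow(dat))

fit <- lme(y ~ time * sex,
           random = ~ 1 | indv,
           weights = varIdent(form = ~ 1 | sex),        # separate variances by sex
           correlation = corAR1(form = ~ time | indv),  # AR(1) within subject
           data = dat)
summary(fit)
```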


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gareth Hughes
Sent: Wednesday, June 27, 2007 7:50 AM
To: r-help@stat.math.ethz.ch
Subject: [R] lme correlation structures

Hi all,

I've been using SAS proc mixed to fit linear mixed models and would
like to be able to fit the same models in R. Two things in particular:

1) I have longitudinal data and wish to allow for different repeated
measures covariance parameter estimates for different groups (men and
women), each covariance matrix having the same structure. In proc
mixed this would be done by specifying group= in the REPEATED
statement. Is this simple to do in R? (I've tried form=~time|indv/sex
for example but this doesn't seem to do the job).

2) I've read that other correlation structures can be specified. Does
anyone have any examples of how toeplitz or (first-order)
ante-dependence structures can be specified?

Many thanks,

Gareth

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] moving-window (neighborhood) analysis

2007-06-27 Thread Bert Gunter
See the Spatial section under CRAN's Task views 

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Carlos Guano
Grohmann
Sent: Wednesday, June 27, 2007 8:27 AM
To: r-help@stat.math.ethz.ch
Subject: [R] moving-window (neighborhood) analysis

Hello all

I was wondering what would be the best way to do a moving-window
analysis of a matrix? By moving-window I mean that kind of analysis
common in GIS, where each pixel (matrix element) of the resulting map
is a function of its neighbors, and the neighborhood is a square
matrix.
I was hoping there was some function in R that could do that, where I
could define the size of the neighborhood, and then apply some
function to the values, some function I don't have in GIS packages
(like circular statistics).
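In base R a small helper does this directly (a sketch; the spatial packages in the Task View will be far faster on large grids):

```r
# apply fun to every k x k window of matrix m (k odd); borders stay NA
moving_window <- function(m, k, fun) {
  r <- (k - 1) %/% 2
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in (1 + r):(nrow(m) - r))
    for (j in (1 + r):(ncol(m) - r))
      out[i, j] <- fun(m[(i - r):(i + r), (j - r):(j + r)])
  out
}

m <- matrix(rnorm(100), 10, 10)
smoothed <- moving_window(m, 3, mean)  # 3 x 3 mean filter
```

Any function of a matrix works in place of mean(), including circular statistics the GIS package lacks.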

thanks all.

Carlos


-- 
+---+
  Carlos Henrique Grohmann - Guano
  Visiting Researcher at Kingston University London - UK
  Geologist M.Sc  - Doctorate Student at IGc-USP - Brazil
Linux User #89721  - carlos dot grohmann at gmail dot com
+---+
_
Good morning, doctors. I have taken the liberty of removing Windows
95 from my hard drive.
--The winning entry in a What were HAL's first words contest judged
by 2001: A SPACE ODYSSEY creator Arthur C. Clarke

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exaustive subgrouping or combination

2007-06-27 Thread Bert Gunter
Do you realize that for n items, there are 2^(n-1) such groups -- since you
essentially want all possible subsets divided by 2: all possible subsets and
their complements repeats each split twice, backwards and forwards. So this
will quickly become ummm... rather large.

If you really want to do this, one lazy but inefficient way I can think of
is to use expand.grid() to generate your subsets. Here's a toy example that
shows you how with n = 4.

## generate a list with four components each of which 
## is c(TRUE,FALSE) -- note that a data.frame is a list

z <- data.frame(matrix(rep(c(TRUE,FALSE),4),nrow=2))

## Now use expand.grid to get all 2^4 possible 4 vectors as rows

ix <- do.call(expand.grid, z)

## This is essentially what you want. 

apply(ix[1:8,],1,function(x)(1:4)[x])

## gives you the list of first splits, while apply(ix[16:9,],... gives the
complements (note reversal of order).
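Assembled into a runnable whole (toy n = 4 again; remember the 2^(n-1) blow-up for larger n):

```r
n <- 4
z <- as.data.frame(matrix(rep(c(TRUE, FALSE), n), nrow = 2))
ix <- do.call(expand.grid, z)  # all 2^n inclusion patterns, one per row

# rows 1..2^(n-1) and rows 2^n..2^(n-1)+1 are complementary patterns
half1 <- apply(ix[1:(2^(n - 1)), ], 1, function(x) (1:n)[x])
half2 <- apply(ix[2^n:(2^(n - 1) + 1), ], 1, function(x) (1:n)[x])
# half1[[k]] and half2[[k]] together partition 1:n
```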


 
Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Waverley
Sent: Wednesday, June 27, 2007 1:57 PM
To: r-help@stat.math.ethz.ch
Subject: [R] exaustive subgrouping or combination

Dear Colleagues,

I am looking for a package or previously implemented R code to
exhaustively divide a sequence (vector) into 2 groups.

For example:

1, 2, 3, 4

I want to have a group of
1, (2,3,4)
(1,2), (3,4)
(1,3), (2,4)
(1,4), (2,3)
(1,2,3), 4
(2,3), (1,4)
...

Can someone help me with how to implement this?  I run into problems when
the sequence becomes large.

Thanks much in advance.

-- 
Waverley @ Palo Alto


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] open .r files with double-click

2007-06-11 Thread Bert Gunter
However, do note (on Windows) that you can use an external text/programming
editor (see CRAN's listings) and can register .r / .R files to open
automatically in the chosen editor when clicked on. At least some of these
editors (e.g. TINN-R) can be configured to automatically and simultaneously
open the RGUI, too, I believe -- but someone may correct me on this.

Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch
Sent: Saturday, June 09, 2007 4:29 AM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] open .r files with double-click

On 08/06/2007 2:52 PM, [EMAIL PROTECTED] wrote:
 Hi Folks,
 On Windows XP, R 2.5.0.
 
 After reading the Installation for Windows and Windows FAQs,
 I cannot resolve this.
 
 I set file types so that Rgui.exe will open .r files.
 
 When I try to open a .r file by double-clicking, R begins to launch,
 but I get an error message saying
 
 Argument 'C:\Documents and Settings\Zoology\My Documents\trial.r'
_ignored_
 
 I click OK, and then R GUI opens, but not the script file.
 
 Is there a way to change this?

Not currently. See the appendix Invoking R of the Introduction manual 
for the current command line parameters, which don't include open a 
script.  This would be a reasonable addition, and I'll add it at some 
point, sooner if someone else comes up with a convincing argument for 
the right command line parameter to do this.

It would be better if clicking on a second script opened a new window in 
the same session, but that takes more work; not sure I'll get to this.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rlm results on trellis plot

2007-06-08 Thread Bert Gunter
I don't think the code below does what's requested, as it assumes a single
overall fit for all panels, and I think the requester wanted separate fits
by panel. This can be easily done, of course, by a minor modification:

xyplot( y ~ x | z,
 panel = function(x,y,...){
   panel.xyplot(x,y,...)
   panel.abline(lm(y~x), col = "blue", lwd = 2)
   panel.abline(rlm(y~x), col = "red", lwd = 2)
})

Note that the coefficients do not need to be explicitly extracted by coef(),
as panel.abline will do this automatically.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374



Alan S Barnett wrote:
 How do I add to a trellis plot the best fit line from a robust fit? I
 can use panel.lm to add a least squares fit, but there is no panel.rlm
 function.

  How about using panel.abline() instead of panel.lmline()?

fit1 <- coef(lm(stack.loss ~ Air.Flow, data = stackloss))
fit2 <- coef(rlm(stack.loss ~ Air.Flow, data = stackloss))

xyplot(stack.loss ~ Air.Flow, data=stackloss,
   panel = function(x, y, ...){
 panel.xyplot(x, y, ...)
 panel.abline(fit1, type = "l", col = "blue")
 panel.abline(fit2, type = "l", col = "red")
   }, aspect=1)

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R is not a validated software package..

2007-06-08 Thread Bert Gunter
Frank et. al:

I believe this is a bit too facile. 21 CFR Part 11 does necessitate a
software validation **process** -- but this process does not require any
particular software. Rather, it requires that those using whatever software
demonstrate to the FDA's satisfaction that the software does what it's
supposed to do appropriately. This includes a lot more than assuring, say,
the numerical accuracy of computations; I think it also requires
demonstration that the data are secure, that it is properly transferred
from one source to another, etc. I assume that the statistical validation of
R would be relatively simple, as R already has an extensive test suite, and
it would simply be a matter of providing that test suite info. A bit more
might be required, but I don't think it's such a big deal. 

I think Wensui Liu's characterization of clinical statisticians as having a
mentality related to job security is a canard. Although I work in
nonclinical, my observation is that clinical statistics is complex and
difficult, not only because of many challenging statistical issues, but also
because of the labyrinthian complexities of the regulated and extremely
costly environment in which they work. It is certainly a job that I could
not do.

That said, probably the greatest obstacle to change from SAS is neither
obstinacy nor ignorance, but rather inertia: pharmaceutical companies have
over the decades made a huge investment in SAS infrastructure to support the
collection, organization, analysis, and submission of data for clinical
trials. To convert this to anything else would be a herculean task involving
huge expense, risk, and resources. R, S-Plus (and much else -- e.g. numerous
unvalidated data mining software packages) are routinely used by clinical
statisticians to better understand their data and for exploratory analyses
that are used to supplement official analyses (e.g. for trying to justify
collection of tissue samples or a pivotal study in a patient subpopulation).
But it is difficult for me to see how one could make a business case to
change clinical trial analysis software infrastructure from SAS to S-Plus,
SPSS, or anything else.

**DISCLAIMER**
My opinions only. They do not in any way represent the view of my company or
its employees.


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Friday, June 08, 2007 7:45 AM
To: Giovanni Parrinello
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] R is not a validated software package..

Giovanni Parrinello wrote:
 Dear All,
 discussing with a statistician of a pharmaceutical company I received 
 this answer about the statistical package that I have planned to use:
 
 As R is not a validated software package, we would like to ask if it 
 would rather be possible for you to use SAS, SPSS or another approved 
 statistical software system.
 
 Could someone suggest me a 'polite' answer?
 TIA
 Giovanni
 

Search the archives and you'll find a LOT of responses.

Briefly, in my view there are no requirements, just some pharma 
companies that think there are.  FDA is required to accepted all 
submissions, and they get some where only Excel was used, or Minitab, 
and lots more.  There is a session on this at the upcoming R 
International Users Meeting in Iowa in August.  The session will include 
dicussions of federal regulation compliance for R, for those users who 
feel that such compliance is actually needed.

Frank

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to find how many modes in 2 dimensions case

2007-06-08 Thread Bert Gunter
Note that the number of modes (local maxima??)  is a function of the
bandwidth, so I'm not sure your question is even meaningful. 
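To make that concrete, here is a sketch that fixes a bandwidth, evaluates MASS::kde2d() on a grid, and counts interior grid points that dominate their eight neighbours; rerunning with a different bandwidth (the h argument) will generally change the count, which is the point above:

```r
library(MASS)

set.seed(1)
x <- c(rnorm(100, -2), rnorm(100, 2))
y <- c(rnorm(100, -2), rnorm(100, 2))

d <- kde2d(x, y, n = 50)  # density estimate on a 50 x 50 grid
z <- d$z

# count interior grid cells that are the maximum of their 3 x 3 neighbourhood
modes <- 0
for (i in 2:(nrow(z) - 1))
  for (j in 2:(ncol(z) - 1))
    if (z[i, j] == max(z[(i - 1):(i + 1), (j - 1):(j + 1)]))
      modes <- modes + 1
modes  # number of local maxima at this bandwidth
```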

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Patrick Wang
Sent: Friday, June 08, 2007 11:54 AM
To: R-help@stat.math.ethz.ch
Subject: [R] how to find how many modes in 2 dimensions case

Hi,

Does anyone know how to count the number of modes in 2 dimensions using
kde2d function?

Thanks
Pat

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to input data from the keyboard

2007-06-07 Thread Bert Gunter
Please do your homework:

help.search("input")
 


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Miguel Caro
Sent: Thursday, June 07, 2007 11:01 AM
To: r-help@stat.math.ethz.ch
Subject: [R] how to input data from the keyboard


Hello everybody, I wish to input data from the keyboard. In C++ it would
look like this:

printf("Input parameter Alpha = ");
scanf("%d", &alpha);

how would be in R?

Thanks for your help.

Bye 
Miguel.
-- 
View this message in context:
http://www.nabble.com/how-to-input-data-from-the-keyboard-tf3885387.html#a11
013164
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing multiple distributions

2007-05-31 Thread Bert Gunter
While Ravi's suggestion of the compositions package is certainly
appropriate, I suspect that the complex and extensive statistical homework
you would need to do to use it might be overwhelming (the geometry of
compositions is a simplex, and this makes things hard). As a simple and
perhaps useful alternative, use pairs() or splom() to plot your 5-D data,
distinguishing the different treatments via color and/or symbol.

In addition, it might be useful to do the same sort of plot on the first two
principal components (?prcomp) of the first 4 dimensions of your 5 component
vectors (since the 5th is determined by the first 4). Because of the
simplicial geometry, this PCA approach is not right, but it may nevertheless
be revealing. The same plotting ideas are in the compositions package done
properly (in the correct geometry),so if you are motivated to do so, you can
do these things there. Even if you don't dig into the details, using the
compositions package version of the plots may be realtively easy to
do,interpretable, and revealing -- more so than my simple but wrong
suggestions. You can decide.

I would not trust inference using ad hoc approaches in the untransformed
data. That's what the package is for. But plotting the data should always be
at least the first thing you do anyway. I often find it to be sufficient,
too.
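A minimal version of the plotting suggestions, on simulated compositions (treatment labels and sample sizes hypothetical):

```r
set.seed(1)
p <- prop.table(matrix(runif(40 * 5), 40, 5), 1)  # 40 fake 5-part compositions
trt <- rep(c("A", "B"), each = 20)
cols <- ifelse(trt == "A", "blue", "red")

pairs(p, col = cols)           # raw 5-D view, treatments by colour

pc <- prcomp(p[, 1:4])         # PCA on 4 parts (the 5th is redundant)
plot(pc$x[, 1:2], col = cols)  # first two principal components
```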


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of jiho
Sent: Thursday, May 31, 2007 8:37 AM
To: R-help
Subject: Re: [R] Comparing multiple distributions

Nobody answered my first request. I am sorry if I did not explain my  
problem clearly. English is not my native language and statistical  
English is even more difficult. I'll try to summarize my issue in  
more appropriate statistical terms:

Each of my observations is not a single number but a vector of 5  
proportions (which add up to 1 for each observation). I want to  
compare the shape of those vectors between two treatments (i.e. how  
the quantities are distributed between the 5 values in treatment A  
with respect to treatment B).

I was pointed to Hotelling T-squared. Does it seem appropriate? Are  
there other possibilities (I read many discussions about hotelling  
vs. manova but I could not see how any of those related to my  
particular case)?

Thank you very much in advance for your insights. See below for my  
earlier, more detailed, e-mail.

On 2007-May-21  , at 19:26 , jiho wrote:
 I am studying the vertical distribution of plankton and want to  
 study its variations relatively to several factors (time of day,  
 species, water column structure etc.). So my data is special in  
 that, at each sampling site (each observation), I don't have *one*  
 number, I have *several* numbers (abundance of organisms in each  
 depth bin, I sample 5 depth bins) which describe a vertical  
 distribution.

 Then let say I want to compare speciesA with speciesB, I would end  
 up trying to compare a group of several distributions with another  
 group of several distributions (where a distribution is a vector  
 of 5 numbers: an abundance for each depth bin). Does anyone know  
 how I could do this (with R obviously ;) )?

 Currently I kind of get around the problem and:
 - compute mean abundance per depth bin within each group and  
 compare the two mean distributions with a ks.test but this  
 obviously diminishes the power of the test (I only compare 5*2  
 observations)
 - restrict the information at each sampling site to the mean depth  
 weighted by the abundance of the species of interest. This way I  
 have one observation per station but I reduce the information to  
 the mean depths while the actual repartition is important also.

 I know this is probably not directly R related but I have already  
 searched around for solutions and solicited my local statistics  
 expert... to no avail. So I hope that the stats' experts on this  
 list will help me.

 Thank you very much in advance.

JiHO
---
http://jo.irisson.free.fr/




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] runif with weights

2007-05-30 Thread Bert Gunter
You did not explicitly say it, but your example indicates that you want to
sample from integers only (else what would weights mean?). So...

?sample  -- in particular note the prob argument and read help docs
carefully

e.g.

sample(100,25,prob=c(0,rep.int(.4,9),rep.int(.6,90))) ## without replacement
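If the 40%/60% in the question are meant as *group* probabilities rather than per-number weights, normalise within each group first -- a sketch (drawn with replacement so the group probabilities hold for every draw):

```r
# P(result in 2:10) = 0.4 overall, P(result in 11:100) = 0.6 overall
p <- c(0, rep(0.4 / 9, 9), rep(0.6 / 90, 90))
sample(100, 25, replace = TRUE, prob = p)
```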

Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ken Knoblauch
Sent: Wednesday, May 30, 2007 5:59 PM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: [R] runif with weights

Not sure why you have set the probability of a 1 to 0 but maybe something
like this might be what you want:

round( ifelse( rbinom(25, 1, 0.4), runif(25, 2, 10), runif(25, 11, 100) ) )
 [1]  2  6 34 90 79 71 83  8 47 36 21 32 17 71  3 16  9 65 94  6 30  5  7
10 13



I would like to generate 25 numbers from 1 to 100 but I would like to have
some numbers that could  be more probable to come out. I was thinking of
the function runif:
runif(25, 1, 100), but I don't know how to give more weight to some
numbers.

Example:
each number from 2 to 10 has the probability of 40% to come out but the
probability of each number from 11 to 100 to come out is 60%.


-- 
Ken Knoblauch
Inserm U846
Institut Cellule Souche et Cerveau
Département Neurosciences Intégratives
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.pizzerialesgemeaux.com/u846/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] normality tests [Broadcast]

2007-05-29 Thread Bert Gunter
False. Box showed, ca. 1952, that standard inferences in the linear regression
model are robust to nonnormality, at least for (nearly) balanced designs.
The **crucial** assumption is independence, which I suspect partially
motivated his time series work on arima modeling. More recently, work on
hierarchical models (e.g. repeated measures/mixed effect models) has also
dealt with lack of independence.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of wssecn
Sent: Friday, May 25, 2007 2:59 PM
To: r-help
Subject: Re: [R] normality tests [Broadcast]

 The normality of the residuals is important in the inference procedures for
the classical linear regression model, and normality is very important in
correlation analysis (second moment)...

Washington S. Silva

 Thank you all for your replies they have been most useful... well
 in my case I have chosen to do some parametric tests (more precisely
 correlation and linear regressions among some variables)... so it
 would be nice if I had an extra bit of support on my decisions... If I
 understood well from all your replies... I shouldn't pay so much
 attention to the normality tests, so it wouldn't matter which one/ones
 I use to report... but rather focus on issues such as the power of the
 test...
 
 Thanks again.
 
 On 25/05/07, Lucke, Joseph F [EMAIL PROTECTED] wrote:
   Most standard tests, such as t-tests and ANOVA, are fairly resistant to
  non-normalilty for significance testing. It's the sample means that have
  to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
  for normality prior to choosing a test statistic is generally not a good
  idea.
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
  Sent: Friday, May 25, 2007 12:04 PM
  To: [EMAIL PROTECTED]; Frank E Harrell Jr
  Cc: r-help
  Subject: Re: [R] normality tests [Broadcast]
 
  From: [EMAIL PROTECTED]
  
   On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote:
 Hi all,

 apologies for seeking advice on a general stats question. I ve run
 
 normality tests using 8 different methods:
 - Lilliefors
 - Shapiro-Wilk
 - Robust Jarque Bera
 - Jarque Bera
 - Anderson-Darling
 - Pearson chi-square
 - Cramer-von Mises
 - Shapiro-Francia

 All show that the null hypothesis that the data come from a normal
 
 distro cannot be rejected. Great. However, I don't think
   it looks nice
 to report the values of 8 different tests on a report. One note is
 
 that my sample size is really tiny (less than 20
   independent cases).
 Without wanting to start a flame war, are there any
   advices of which
 one/ones would be more appropriate and should be reported
   (along with
 a Q-Q plot). Thank you.

 Regards,

   
Wow - I have so many concerns with that approach that it's
   hard to know
where to begin.  But first of all, why care about
   normality?  Why not
use distribution-free methods?
   
You should examine the power of the tests for n=20.  You'll probably
 
find it's not good enough to reach a reliable conclusion.
  
   And wouldn't it be even worse if I used non-parametric tests?
 
  I believe what Frank meant was that it's probably better to use a
  distribution-free procedure to do the real test of interest (if there is
  one) instead of testing for normality, and then use a test that assumes
  normality.
 
  I guess the question is, what exactly do you want to do with the outcome
  of the normality tests?  If those are going to be used as basis for
  deciding which test(s) to do next, then I concur with Frank's
  reservation.
 
  Generally speaking, I do not find goodness-of-fit for distributions very
  useful, mostly for the reason that failure to reject the null is no
  evidence in favor of the null.  It's difficult for me to imagine why
  there's insufficient evidence to show that the data did not come from a
  normal distribution would be interesting.
 
  Andy
 
 
   
Frank
   
   
--
Frank E Harrell Jr   Professor and Chair   School
   of Medicine
  Department of Biostatistics
   Vanderbilt University
   
  
  
   --
   yianni
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
  
  
 
 
  
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read

Re: [R] trouble understanding why ...==NaN isn't true

2007-05-29 Thread Bert Gunter
1. "NaN" is a character string, **not** NaN; hence is.nan("NaN") yields
FALSE.

2. Please read the docs!  ?NaN explicitly says:

Do not test equality to NaN, or even use identical, since systems typically
have many different NaN values.
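The supported tests, in a small sketch:

```r
x <- c(1, NaN, NA)
x == NaN    # NA NA NA -- comparison with NaN never yields TRUE
is.nan(x)   # FALSE  TRUE FALSE
is.na(x)    # FALSE  TRUE  TRUE  (NA and NaN both count as missing)
```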


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee
Sent: Tuesday, May 29, 2007 3:33 PM
To: r-help@stat.math.ethz.ch
Subject: [R] trouble understanding why ...==NaN isn't true

I have the following data:

> dataset[2, "Sample.227"]
[1] NaN
1558 Levels: -0.000 -0.001 -0.002 -0.003 -0.004 -0.005 -0.006 -0.007 -0.008 -0.009 ... 2.000


However, I'm not sure why this expression is coming back as FALSE:

> dataset[2, "Sample.227"] == "NaN"
[1] FALSE

Similarly:

> dataset[2, "Sample.227"] == NaN
[1] NA


It seems that since NaN is represented as a character, the expression
== "NaN" should be TRUE, but it's returning FALSE.

Thanks,
Andrew


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Is it possible to print a data.frame without the row names?

2007-05-24 Thread Bert Gunter
?write.table 



Bert Gunter
Genentech Nonclinical Statistics
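A minimal sketch of the suggestion (column names invented for illustration; in recent versions of R, print() on a data frame also accepts row.names = FALSE):

```r
a <- data.frame(x = 1:3, y = 21:23)

# write.table() lets you suppress the row names on output
write.table(a, row.names = FALSE, quote = FALSE)

# print() on a data frame also takes row.names = FALSE in recent R
print(a, row.names = FALSE)
```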


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bos, Roger
Sent: Thursday, May 24, 2007 7:17 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Is it possible to print a data.frame without the row names?

Is it possible to print a data.frame without the row names?  I checked
?data.frame, ?print, ?format and didn't see anything that helped.  In
the example below, I would just like to show the two columns of data and
not the row.names 1:10.
 
> a <- data.frame(1:10, 21:30)
> a
   X1.10 X21.30
1  1 21
2  2 22
3  3 23
4  4 24
5  5 25
6  6 26
7  7 27
8  8 28
9  9 29
1010 30
> row.names(a) <- NULL
> a
   X1.10 X21.30
1  1 21
2  2 22
3  3 23
4  4 24
5  5 25
6  6 26
7  7 27
8  8 28
9  9 29
1010 30
 
 
Thanks,
 
Roger J. Bos, CFA
  
 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] R-help with apply and ccf

2007-05-22 Thread Bert Gunter
I understand you to want correlations of corresponding rows (**not** ccf,
which returns a whole cross-correlation function for each pair of rows). If
that is so,

1. In theory, diag(cor(t(A), t(B))) would work without apply, except that
196,000 rows is probably too large, and it is probably too inefficient to
compute and then throw away all the off-diagonals anyway.

2. ## Use a 3-d array.
ar <- array(c(A, B), dim = c(dim(A), 2)) ## this can also be done by abind()
## in the abind package
apply(ar, 1, function(x) cor(x[, 1], x[, 2])) ## value is a vector

3. ## Probably simplest and best:
sapply(seq_len(nrow(A)), function(i) cor(A[i, ], B[i, ])) ## Note: value is a
vector, not an array (seq_len(), not seq_along(), to index the rows)
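A quick check that the two approaches agree (small random matrices, invented here for illustration):

```r
set.seed(1)
A <- matrix(rnorm(48), nrow = 4)
B <- matrix(rnorm(48), nrow = 4)

# 3-d array plus apply, one correlation per row
ar <- array(c(A, B), dim = c(dim(A), 2))
v1 <- apply(ar, 1, function(x) cor(x[, 1], x[, 2]))

# sapply over row indices
v2 <- sapply(seq_len(nrow(A)), function(i) cor(A[i, ], B[i, ]))

all.equal(v1, v2)  # TRUE
```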


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Andric
Sent: Tuesday, May 22, 2007 8:35 AM
To: r-help@stat.math.ethz.ch
Subject: [R] R-help with apply and ccf

Dear R gurus,

I would like to use the ccf function on two matrices that are each 196000 x
12.  Ideally, I want to be able to go row by row for the two matrices using
apply for the ccf function and get one 196000 X 1 array output.  The apply
function though wants only one array, no?  Basically, is there a way to use
apply when there are two arrays in order to do something like correlation on
a row by row basis?
Thanks for your help

Michael

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Simple programming question

2007-05-18 Thread Bert Gunter
?cut

This would recode to a factor with numeric labels for its levels.
as.numeric(as.character(...)) would then convert the labels to numeric values
that you can manipulate. This presumes that the variable you are coding is
numeric and you want to recode by binning the values into ordered bins. 
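A hedged sketch of the cut-then-convert idiom (the breaks and the numeric scores are invented for illustration):

```r
x <- c(2.3, 7.8, 5.1, 9.4)

# bin into three intervals and label each bin with its numeric score
f <- cut(x, breaks = c(0, 4, 8, 10), labels = c("0", "1", "3"))

# the labels are factor levels; go via as.character() to recover numbers
as.numeric(as.character(f))  # 0 1 1 3
```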


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Lauri Nikkinen
Sent: Friday, May 18, 2007 8:02 AM
To: Gabor Grothendieck
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Simple programming question

Thank you all for your answers. Actually Gabor's first post was right in
that sense that I wanted to have low to all cases which are lower than
second highest. But how about if I want to convert/recode those high,
mid and low to numeric to make some calculations, e.g. 3, 1, 0
respectively. How do I have to modify your solutions? I would also like to
apply this solution to many kinds of recoding situations.

-Lauri


2007/5/18, Gabor Grothendieck [EMAIL PROTECTED]:

 There was a problem in the first line in the case that the highest number
 is not unique within a category.   In this example its not apparent since
 that never occurs.  At any rate, it should be:

 f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
 factor(ave(dfr$var3, dfr$categ, FUN = f), lab = c("low", "mid", "high"))

 Also note that the factor labels were arranged so that
 "low", "mid" and "high" correspond to levels 1, 2 and 3
 respectively.

 On 5/18/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  Try this.  f assigns 1, 2 and 3 to the highest, second highest and third
 highest
  within a category.  ave applies f to each category.  Finally we convert
 it to a
  factor.
 
  f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
  factor(ave(dfr$var3, dfr$categ, FUN = f), lab = c("low", "mid", "high"))
 
 
 
  On 5/18/07, Lauri Nikkinen [EMAIL PROTECTED] wrote:
   Hi R-users,
  
   I have a simple question for R heavy users. If I have a data frame
 like this
  
  
   dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
   var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
   dfr <- dfr[order(dfr$categ),]
  
   and I want to score values or points in variable named var3
 following this
   kind of logic:
  
   1. the highest value of var3 within category (variable named categ) ->
   "high"
   2. the second highest value -> "mid"
   3. lowest value -> "low"
  
   This would be the output of this reasoning:
  
   dfr$score <-
   factor(c("high","mid","low","low","high","mid","mid","low","high","mid",
   "low","low","high","mid","low","low"))
   dfr
   dfr
  
   The question is how I do this programmatically in R (i.e. if I have
 2000
   rows in my dfr)?
  
   I appreciate your help!
  
   Cheers,
   Lauri
  
  [[alternative HTML version deleted]]
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] using lm() with variable formula

2007-05-17 Thread Bert Gunter
... and note that if a matrix of responses is on the left of ~ , separate
regressions will be simultaneously fit to each of the columns of the matrix.
Note that this **is** in TFM -- ?lm.
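For instance, with the built-in iris data (this worked example is mine, not from the thread):

```r
# a matrix response on the left of ~ fits one regression per column
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)
coef(fit)  # one column of coefficients per response variable
```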


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Thursday, May 17, 2007 8:22 AM
To: Chris Elsaesser
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] using lm() with variable formula

Try this:


lm(Sepal.Length ~., iris[1:3])

# or

cn <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
lm(Sepal.Length ~., iris[cn])
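Another option, not mentioned above, is reformulate(), which builds the formula object itself from character vectors chosen at run time (the variable names here are just an illustration):

```r
resp  <- "Sepal.Length"
terms <- c("Sepal.Width", "Petal.Length")

# reformulate() assembles "response ~ term1 + term2 + ..." at run time
fml <- reformulate(terms, response = resp)
fml                    # Sepal.Length ~ Sepal.Width + Petal.Length
lm(fml, data = iris)
```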



On 5/17/07, Chris Elsaesser [EMAIL PROTECTED] wrote:
 New to R; please excuse me if this is a dumb question.  I tried to RTFM;
 didn't help.

 I want to do a series of regressions over the columns in a data.frame,
 systematically varying the response variable and the terms; and not
 necessarily including all the non-response columns.  In my case, the
 columns are time series. I don't know if that makes a difference; it
 does mean I have to call lag() to offset non-response terms. I can not
 assume a specific number of columns in the data.frame; might be 3, might
 be 20.

 My central problem is that the formula given to lm() is different each
 time.  For example, say a data.frame had columns with the following
 headings:  height, weight, BP (blood pressure), and Cals (calorie intake
 per time frame).  In that case, I'd need something like the following:

lm(height ~ weight + BP + Cals)
lm(height ~ weight + BP)
lm(height ~ weight + Cals)
lm(height ~ BP + Cals)
lm(weight ~ height + BP)
lm(weight ~ height + Cals)
etc.

 In general, I'll have to read the header to get the argument labels.

 Do I have to write several functions, each taking a different number of
 arguments?  I'd like to construct a string or list representing the
 variables in the formula and apply lm(), so to speak. [I'm mainly a Lisp
 programmer where that part would be very simple. Anyone have a Lisp API
 for R? :-}]

 Thanks,
 chris

 Chris Elsaesser, PhD
 Principal Scientist, Machine Learning
 SPADAC Inc.
 7921 Jones Branch Dr. Suite 600
 McLean, VA 22102

 703.371.7301 (m)
 703.637.9421 (o)

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] repeated measures regression

2007-05-17 Thread Bert Gunter
You need to gain some background. MIXED EFFECTS MODELS in S and S-PLUS by
Pinheiro and Bates is a canonical reference for how to do this with R.
Chapter 10  of Venables and Ripley's MASS(4th ed.) contains a more compact
but very informative overview that may suffice. Other useful references can
also be found on CRAN.


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Christie
Sent: Thursday, May 17, 2007 10:06 AM
To: R-help@stat.math.ethz.ch
Subject: [R] repeated measures regression


How does one go about doing a repeated measure regression? The  
documentation I have on it (Lorch  Myers 1990) says to use linear /  
(subj x linear) to get your F.  However, if I put subject into glm or  
lm I can't get back a straight error term because it assumes  
(rightly) that subject is a nominal predictor of some sort.

In looking at LME it seems like it just does the right thing here if  
I enter the random effect the same as when looking for ANOVA like  
results out of it.  But, part of the reason I'm asking is that I  
wanted to compare the two methods.  I suppose I could get it out of  
aov but isn't that built on lm?  I guess what I'm asking is how to  
calculate the error terms easily with lm.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] bug or feature?

2007-05-17 Thread Bert Gunter
... but it **is** explicitly documented in ?subset:

For data frames, the subset argument works on the rows. Note that subset
will be evaluated in the data frame, so columns can be referred to (by name)
as variables in the expression (see the examples).  
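A tiny demonstration of that evaluation rule (the objects are invented for illustration):

```r
x <- 100
d <- data.frame(x = 1:5)

# inside subset(), x refers to the data-frame column, not the global x
subset(d, x > 3)  # rows where d$x > 3, even though the global x is 100
```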


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of ivo welch
Sent: Thursday, May 17, 2007 11:53 AM
To: jim holtman
Cc: r-help
Subject: Re: [R] bug or feature?

ahh...it is the silent substitution of the data frame in the subset
statement.   I should have known this.  (PS: this may not be desirable
behavior; maybe it would be useful to issue a warning if the same name
is defined in an upper data frame.  just an opinion...)

mea misunderstanding.

/iaw

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Testing for existence inside a function

2007-05-15 Thread Bert Gunter
I think parent.frame() is what is wanted, not parent.env(environment()) in
your suggested solution:

Consider this: (which does **not** however handle the arbitrary expressions
as argument issue):


foo1 <- function(z){
cat(exists(deparse(substitute(z)), parent.frame()),
exists(deparse(substitute(z)), parent.env(environment())),
exists(deparse(substitute(z))), "\n")
invisible()
}

foo <- function(x){
y <- x
foo1(y)
}

x <- 1

## Then ...
> foo(x)
TRUE FALSE FALSE

Note that parent.env() is the **enclosing environment** i.e. the environment
in which foo1 is defined (lexical scoping); while parent.frame() is the
frame of the caller of foo1, which is what is wanted if foo1 is to work when
called within a function. Note that parent.frame() would also work when foo1
is called at the command line.

Further corrections/clarifications welcome, of course.

Bert Gunter
Genentech Nonclinical Statistics



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Tuesday, May 15, 2007 10:06 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch; Talbot Katz
Subject: Re: [R] Testing for existence inside a function

Maybe this:


chk2 <- function(x) {
chr <- deparse(substitute(x))
e <- parse(text = chr)
structure(exists(chr, parent.env(environment())),
   is.name = length(e) == 1 && is.name(e[[1]]))
}
chk2(1) # structure(FALSE, is.name = FALSE)
ab <- 1
chk2(ab+1) # structure(FALSE, is.name = FALSE)
chk2(ab) # structure(TRUE, is.name = TRUE)
exists("x") # FALSE
chk2(x) # structure(FALSE, is.name = TRUE)
chk2(x+1) # structure(FALSE, is.name = FALSE)


On 5/15/07, Liaw, Andy [EMAIL PROTECTED] wrote:
 Another thing to watch out for is that an argument to a function can be
 an expression (or even literal constants), instead of just the name of
 an object.  exists() wouldn't really do the right thing.  I'm not sure
 how to properly do the exhaustive check.

 Andy

 From: Gabor Grothendieck
 
  Try this modification:
 
    chk <- function(x) exists(deparse(substitute(x)),
   parent.env(environment()))
    ab <- 1
    chk(ab)
   [1] TRUE
    exists("x")
   [1] FALSE
    chk(x)
   [1] FALSE
 
 
 
  On 5/15/07, Talbot Katz [EMAIL PROTECTED] wrote:
   Hi.
  
   Thanks once more for the swift response.  This solution
  works pretty well.
   The only small glitch is if I pass the function an argument
  with the same
   name as the function argument.  That is, suppose x is the
  argument name in
   my user-defined function, and that object x does not
  exist.  If I call the
   function f(x), i.e., using the non-existent object x as the
  argument value,
   then the function says that x exists.
  
   Here is my example log:
  
    chkex5 <- function(objn){
    + c(exob = exists(deparse(substitute(objn))))
    + }
    exists("objn")
   [1] FALSE
   chkex5(objn)
   exob
   TRUE
   
  
   But I suppose I can live with this.  Thanks again!
  
  
   --  TMK  --
   212-460-5430home
   917-656-5351cell
  
  
  
   From: Liaw, Andy [EMAIL PROTECTED]
   To: Talbot Katz [EMAIL PROTECTED],r-help@stat.math.ethz.ch
   Subject: RE: [R] Testing for existence inside a function
   Date: Tue, 15 May 2007 11:41:17 -0400
   
   Just need a bit more work:
   
    R> f <- function(x) exists(deparse(substitute(x)))
    R> f(y)
    [1] FALSE
    R> y <- 1
    R> f(y)
    [1] TRUE
    R> f(z)
    [1] FALSE
   
   Andy
   
   From: Talbot Katz

 Hi, Andy.

 Thank you for the quick response!  Unfortunately, none of
 these are exactly
 what I'm looking for.  I'm looking for the following:
 Suppose object y
 exists and object z does not exist.  If I pass y as the
  value of the
 argument to my function, I want to be able to verify, inside
 my function,
 the existence of y; similarly, if I pass z as the value of
 the argument, I
 want to be able to see, inside the function, that z
  doesn't exist.

 The missing function just checks whether the argument is
 missing; in my
 case, the argument is not missing, but the object may not
 exist.  And the
 way you use the exists function inside the user-defined
 function doesn't
 test the argument to the user-defined function, it's just
 hard-coded for the
 object y.  So I'm sorry if I wasn't clear before, and I hope
 this is clear
 now.  Perhaps what I'm attempting to do is unavailable
 because it's a bad
 programming paradigm.  But even an explanation if that's the
 case would be
 appreciated.

 --  TMK  --
 212-460-5430home
 917-656-5351cell



 From: Liaw, Andy [EMAIL PROTECTED]
 To: Talbot Katz [EMAIL PROTECTED],r-help@stat.math.ethz.ch
 Subject: RE: [R] Testing for existence inside a
  function  [Broadcast]
 Date: Tue, 15 May 2007 11:03:12 -0400
 
 Not sure which one you want, but the following should cover it:
 
  R> f <- function(x) c(x = missing(x), y = exists("y"))
  R> f(1)
   x y

Re: [R] confidence intervals on multiple comparisons

2007-05-15 Thread Bert Gunter
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 [EMAIL PROTECTED]
 Sent: Tuesday, May 15, 2007 12:52 PM
 To: Salvatore Enrico Indiogine
 Cc: R-help@stat.math.ethz.ch; [EMAIL PROTECTED]
 Subject: Re: [R] confidence intervals on multiple comparisons
 
 
 Enrico,
 
 prop.test is for testing proportions two at a time.  If you 
 want to test
 for differences between 4 proportions simultaneously (rather 
 than two at a
 time), try a logistic regression model (from which you can 
 get confidence
 intervals for each of your groups).
 
 Cody Hamilton, PhD
 Staff Biostatistician
 Edwards Lifesciences
 

Yes, but beware: in the default contr.treatment coding for contrasts, you
get estimates and confidence intervals for the first group and for the
**differences** between the first group and the others. As you said, it's
easy to get what you want from this, but you must pay attention to the
details here. 

Bert Gunter
Genentech Nonclinical statistics


   
  
 From: Salvatore Enrico Indiogine [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 To: R-help@stat.math.ethz.ch
 Cc: [EMAIL PROTECTED]
 Date: 05/13/2007 10:51 AM
 Subject: [R] confidence intervals on multiple comparisons
  
   
  
   
  
   
  
   
  
 
 
 
 
 Greetings!
 
 I am using prop.test to compare 4 proportions to find out whether they
 are equal.  According to the help function you can not have confidence
 intervals if you compare more than 2 proportions.
 
 I need to find an effect size or confidence interval for 
 these proportions.
 
 Any suggestions?
 
 Enrico
 
 --
 Enrico Indiogine
 
 Mathematics Education
 Texas A&M University
 [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Looking for a cleaner way to implement a setting certainindices of a matrix to 1 function

2007-05-08 Thread Bert Gunter
Suggestion:

You might make it easier for folks to help if you explained in clear and
simple terms what you are trying to do. Code is hard to deconstruct.


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Leeds, Mark (IED)
Sent: Tuesday, May 08, 2007 2:22 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Looking for a cleaner way to implement a setting certainindices
of a matrix to 1 function

I wrote an ugly algorithm to set certain elements of a matrix to 1
without looping. The code below works, and you can see the output
below it.

K <- 6
lagnum <- 2

restrictmat <- matrix(0, nrow = K, ncol = K*3)
restrictmat[(col(restrictmat) - row(restrictmat) >= 0) &
((col(restrictmat) - row(restrictmat)) %% K == 0)] <- 1
restrictmat[, (lagnum*K + 1):ncol(restrictmat)] <- 0

> restrictmat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    0    0    0    0    0    1    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    1    0    0    0    0    0    1    0     0     0     0     0     0     0     0     0     0
[3,]    0    0    1    0    0    0    0    0    1     0     0     0     0     0     0     0     0     0
[4,]    0    0    0    1    0    0    0    0    0     1     0     0     0     0     0     0     0     0
[5,]    0    0    0    0    1    0    0    0    0     0     1     0     0     0     0     0     0     0
[6,]    0    0    0    0    0    1    0    0    0     0     0     1     0     0     0     0     0     0

For lagnum equal to 1, it also works:

> restrictmat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]    1    0    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[2,]    0    1    0    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[3,]    0    0    1    0    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[4,]    0    0    0    1    0    0    0    0    0     0     0     0     0     0     0     0     0     0
[5,]    0    0    0    0    1    0    0    0    0     0     0     0     0     0     0     0     0     0
[6,]    0    0    0    0    0    1    0    0    0     0     0     0     0     0     0     0     0     0

But I am thinking that there has to be a better way, particularly because
I'll get an error if I set lagnum to 3.
Any improvements or total revampings are appreciated. The number of
columns will always be a multiple of the number of rows, so K doesn't
have to be 6; that was just to show what the commands do.
thanks.
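One possible revamping, under the assumption that the goal is lagnum identity blocks followed by zero columns (a sketch, not a drop-in replacement; it also avoids the lagnum = 3 error because the zero block then simply has zero columns):

```r
K <- 6
lagnum <- 2
nblocks <- 3  # total number of K-column blocks

# repeat the K x K identity lagnum times, then pad with zero columns
restrictmat <- cbind(diag(K)[, rep(1:K, lagnum)],
                     matrix(0, K, (nblocks - lagnum) * K))
```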



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

2007-05-07 Thread Bert Gunter
Folks:

If I understand correctly, the following may be pertinent.

Note that the procedure:

min.nnet = nnet[k] such that error rate of nnet[k] = min[i] {error
rate(nnet(training data) from ith random start) }

does not guarantee a classifier with a lower error rate on **new** data than
any single one of the random starts. That is because you are using the same
training set to choose the model (= nnet parameters) as you are using to
determine the error rate. I know it's tempting to think that choosing the
best among many random starts always gets you a better classifier, but it
need not. The error rate on the training set for any classifier -- be it a
single one or one derived in some way from many -- is a biased estimate of
the true error rate, so that choosing a classifer on this basis does not
assure better performance for future data. In particular, I would guess that
choosing the best among many (hundreds/thousands) random starts is probably
almost guaranteed to produce a poor predictor (ergo the importance of
parsimony/penalization).  I would appreciate comments from anyone, pro or
con, with knowledge and experience of these things, however, as I'm rather
limited on both.

The simple answer to the question of obtaining the error rate using
validation data is: Do whatever you like to choose/fit a classifier on the
training set. **Once you are done,** the estimate of your error rate is the
error rate you get on applying that classifier to the validation set. But
you can do this only once! If you don't like that error rate and go back to
finding a a better predictor in some way, then your validation data have now
been used to derive the classifier and thus has become part of the training
data, so any further assessment of the error rate of a new classifier on it
is now also a biased estimate. You need yet new validation data for that.

Of course, there are all sort of cross validation schemes one can use to
avoid -- or maybe mitigate -- these issues: most books on statistical
classification/machine learning discuss this in detail.
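A minimal sketch of the fit-on-training, score-once-on-validation protocol described above (simulated data, and glm() stands in for the reader's classifier; everything here is invented for illustration):

```r
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(2 * d$x))

# all model choice and tuning happens on the training rows only
train <- sample(nrow(d), 140)
fit <- glm(y ~ x, family = binomial, data = d[train, ])

# the validation error rate is computed once, after model choice is final
pred <- predict(fit, d[-train, ], type = "response") > 0.5
mean(pred != d[-train, "y"])
```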


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of hadley wickham
Sent: Monday, May 07, 2007 5:26 AM
To: Wensui Liu
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

Pick the one with the lowest error rate on your training data?
Hadley

On 5/7/07, Wensui Liu [EMAIL PROTECTED] wrote:
 well, how do you know which ones are the best out of several hundred?
 I will average all results out of several hundreds.

 On 5/7/07, hadley wickham [EMAIL PROTECTED] wrote:
  On 5/6/07, nathaniel Grey [EMAIL PROTECTED] wrote:
   Hello R-Users,
  
   I have been using (nnet) by Ripley  to train a neural net on a test
dataset, I have obtained predictions for a validtion dataset using:
  
    PP <- predict(nnetobject, validationdata)
  
   Using PP I can find the -2 log likelihood for the validation datset.
  
    However what I really want to know is how well my neural net is doing
at classifying my binary output variable. I am new to R and I can't figure
out how you can assess the success rates of predictions.
  
 
  table(PP, binaryvariable)
  should get you started.
 
  Also if you're using nnet with random starts, I strongly suggest
  taking the best out of several hundred (or maybe thousand) trials - it
  makes a big difference!
 
  Hadley
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


 --
 WenSui Liu
 A lousy statistician who happens to know a little programming
 (http://spaces.msn.com/statcompute/blog)


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Multiple scatterplots

2007-05-02 Thread Bert Gunter
Please note: in R you can specify (some of the) graphics parameters as the
appropriate length vectors. So your plot example below can also be done as,
for example:

plot(rep.int(aa, 3), c(cc, bb, dd),
     col = rep(c("red", "blue", "green"), each = length(aa)))

However, this doesn't seem to fit the posted request, where maybe something
like a trellis plot of the different distributions is what is wanted?? --
but I may well misunderstand.
 
Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Kane
Sent: Wednesday, May 02, 2007 11:06 AM
To: Kostadin Cholakov; r-help@stat.math.ethz.ch
Subject: Re: [R] Multiple scatterplots

Your title and your posting do not say the same thing.
 

Assuming you want all three distributions on one
scatter plot does this help?

aa <- 1:10
bb <- 11:2
cc <- bb^2
dd <- c(3,4,7,9,11,32,11,14,5,9)

plot(aa, cc, col = "red")
points(aa, bb, col = "blue")
points(aa, dd, col = "green")

Also in plotting it is a good idea to look at all the
variations etc that you can get with par()

Type  ?par 
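matplot() offers a compact alternative for overlaying several series at once (continuing the example data above):

```r
aa <- 1:10
bb <- 11:2
cc <- bb^2
dd <- c(3,4,7,9,11,32,11,14,5,9)

# one call draws all three y-vectors against aa, one colour each
matplot(aa, cbind(cc, bb, dd), pch = 1,
        col = c("red", "blue", "green"), ylab = "")
```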

--- Kostadin Cholakov [EMAIL PROTECTED] wrote:

 Hi,
 
 I have to plot three Ziph distributions for three
 languages where the
 x value represents the rank of a given word and the
 y value represents
 the relative frequency of this word in the corpus.
 Is there some way
 so that I can plot all three distributions on a
 single scatterplot,
 preferably with different colours :) I tried to find
 something in the
 R manual but there are no such examples :( Thank
 you!
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] thousand separator (was RE: weight)

2007-04-30 Thread Bert Gunter
Except this doesn't work for 1,123,456.789, Marc.

I hesitate to suggest it, but gregexpr() will do it, as it captures the
position of **every** match to ",". This could then be used to process the
vector via some sort of loop/apply statement.

But I think there **must** be a more elegant way using regular expressions
alone, so I, too, await a clever reply.

-- Bert 


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:02 AM
To: Liaw, Andy
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

One possibility would be to use something like the following
post-import:

> WTPP
[1] 1,106.8250 1,336.5138

> str(WTPP)
 Factor w/ 2 levels "1,106.8250","1,336.5138": 1 2

> as.numeric(gsub(",", "", WTPP))
[1] 1106.825 1336.514


Essentially strip the ',' characters from the factors and then coerce
the resultant character vector to numeric. 
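The same idiom handles values with more than one separator (the numbers here are invented):

```r
x <- factor(c("1,106.8250", "1,123,456.789"))

# gsub() removes every comma; the result then coerces cleanly
as.numeric(gsub(",", "", as.character(x)))
# 1106.825 1123456.789
```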

HTH,

Marc Schwartz


On Mon, 2007-04-30 at 12:26 -0400, Liaw, Andy wrote:
 I've run into this occasionally.  My current solution is simply to read
 it into Excel, re-format the offending column(s) by unchecking the
 thousand separator box, and write it back out.  Not exactly ideal to
 say the least.  If anyone can provide a better solution in R, I'm all
 ears...
 
 Andy 
 
 From: Natalie O'Toole
  
  Hi,
  
  These are the variables in my file. I think the variable i'm having 
  problems with is WTPP which is of the Factor type. Does 
  anyone know how to 
  fix this, please?
  
  Thanks,
  
  Nat
  
  data.frame':   290 obs. of  5 variables:
   $ PROV  : num  48 48 48 48 48 48 48 48 48 48 ...
   $ REGION: num  4 4 4 4 4 4 4 4 4 4 ...
   $ GRADE : num  7 7 7 7 7 7 7 7 7 7 ...
   $ Y_Q10A: num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
   $ WTPP  : Factor w/ 1884 levels 
  1,106.8250,1,336.5138,..: 1544 67 
  1568 40 221 1702 1702 1434 310 310 ...
  
  
  __
  
  
  
  --- Douglas Bates [EMAIL PROTECTED] wrote:
  
   On 4/28/07, John Kane [EMAIL PROTECTED] wrote:
IIRC you have a yes/no smoking variable scored 1/2
   ?
   
It is possibly being read in as a factor not as an
integer.
   
try
 class(df$smoking.variable)
to see .
   
   Good point.  In general I would recommend using
   
   str(df)
   
   to check on the class or storage type of all
   variables in a data frame
   if you are getting unexpected results when
   manipulating it.  That
   function is carefully written to provide a maximum
   of information in a
   minimum of space.
  
  Yes but I'm a relative newbie at R and didn't realise
  that str() would do that.  I always thought it was
  some kind of string function. 
  
  Thanks, it makes life much easier.
  
   
--- Natalie O'Toole [EMAIL PROTECTED] wrote:
   
 Hi,

 I'm getting an error message:

 Error in df[, 1:4] * df[, 5] : non-numeric
   argument
 to binary operator
 In addition: Warning message:
 Incompatible methods (Ops.data.frame,
 Ops.factor) for *

 here is my code:


 ##reading in the file
  happyguys <- read.table("c:/test4.dat",
    header=TRUE,
  row.names=1)

 ##subset the file based on Select If

  test <- subset(happyguys, PROV==48 & GRADE == 7 &
  Y_Q10A < 9)

 ##sorting the file

  mydata <- test
  mydataSorted <- mydata[ order(mydata$Y_Q10A), ]
 print(mydataSorted)


 ##assigning  a different name to file

  happyguys <- mydataSorted


 ##trying to weight my data

  data.frame <- happyguys
  df <- data.frame
  df1 <- df[, 1:4] * df[, 5]

 ##getting error message here??

 Error in df[, 1:4] * df[, 5] : non-numeric
   argument
 to binary operator
 In addition: Warning message:
 Incompatible methods (Ops.data.frame,
 Ops.factor) for *

 Does anyone know what this error message means?

 I've been reviewing R code all day  getting
   more
 familiar with it

 Thanks,

 Nat


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] thousand separator (was RE: weight)

2007-04-30 Thread Bert Gunter
Nothing! My mistake! gsub -- not sub -- is what you want to get 'em all.

-- Bert 


Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, April 30, 2007 10:18 AM
To: Bert Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] thousand separator (was RE: weight)

Bert,

What am I missing?

 print(as.numeric(gsub(",", "", "1,123,456.789")), 10)
[1] 1123456.789


FWIW, this is using:

R version 2.5.0 Patched (2007-04-27 r41355)

Marc

On Mon, 2007-04-30 at 10:13 -0700, Bert Gunter wrote:
 Except this doesn't work for 1,123,456.789 Marc.
 
 I hesitate to suggest it, but gregexpr() will do it, as it captures the
 position of **every** match to ,. This could be then used to process the
 vector via some sort of loop/apply statement.
 
 But I think there **must** be a more elegant way using regular expressions
 alone, so I, too, await a clever reply.
 
 -- Bert 
 
 
 Bert Gunter
 Genentech Nonclinical Statistics
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
 Sent: Monday, April 30, 2007 10:02 AM
 To: Liaw, Andy
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] thousand separator (was RE: weight)
 
 One possibility would be to use something like the following
 post-import:
 
  WTPP
 [1] 1,106.8250 1,336.5138
 
  str(WTPP)
  Factor w/ 2 levels 1,106.8250,1,336.5138: 1 2
 
  as.numeric(gsub(",", "", WTPP))
 [1] 1106.825 1336.514
 
 
 Essentially strip the ',' characters from the factors and then coerce
 the resultant character vector to numeric. 
 
 HTH,
 
 Marc Schwartz
 
 
 On Mon, 2007-04-30 at 12:26 -0400, Liaw, Andy wrote:
  I've run into this occasionally.  My current solution is simply to read
  it into Excel, re-format the offending column(s) by unchecking the
  thousand separator box, and write it back out.  Not exactly ideal to
  say the least.  If anyone can provide a better solution in R, I'm all
  ears...
  
  Andy 
  
  From: Natalie O'Toole
   
   Hi,
   
   These are the variables in my file. I think the variable i'm having 
   problems with is WTPP which is of the Factor type. Does 
   anyone know how to 
   fix this, please?
   
   Thanks,
   
   Nat
   
   data.frame':   290 obs. of  5 variables:
$ PROV  : num  48 48 48 48 48 48 48 48 48 48 ...
$ REGION: num  4 4 4 4 4 4 4 4 4 4 ...
$ GRADE : num  7 7 7 7 7 7 7 7 7 7 ...
$ Y_Q10A: num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ WTPP  : Factor w/ 1884 levels 
   1,106.8250,1,336.5138,..: 1544 67 
   1568 40 221 1702 1702 1434 310 310 ...
   
   
   __
   
   
   
   --- Douglas Bates [EMAIL PROTECTED] wrote:
   
On 4/28/07, John Kane [EMAIL PROTECTED] wrote:
 IIRC you have a yes/no smoking variable scored 1/2
?

 It is possibly being read in as a factor not as an
 integer.

 try
  class(df$smoking.variable)
 to see .

Good point.  In general I would recommend using

str(df)

to check on the class or storage type of all
variables in a data frame
if you are getting unexpected results when
manipulating it.  That
function is carefully written to provide a maximum
of information in a
minimum of space.
   
   Yes but I'm a relative newbie at R and didn't realise
   that str() would do that.  I always thought it was
   some kind of string function. 
   
   Thanks, it makes life much easier.
   

 --- Natalie O'Toole [EMAIL PROTECTED] wrote:

  Hi,
 
  I'm getting an error message:
 
  Error in df[, 1:4] * df[, 5] : non-numeric
argument
  to binary operator
  In addition: Warning message:
  Incompatible methods (Ops.data.frame,
  Ops.factor) for *
 
  here is my code:
 
 
  ##reading in the file
   happyguys <- read.table("c:/test4.dat",
 header=TRUE,
   row.names=1)
 
  ##subset the file based on Select If
 
   test <- subset(happyguys, PROV==48 & GRADE == 7 &
   Y_Q10A < 9)
 
  ##sorting the file
 
   mydata <- test
   mydataSorted <- mydata[ order(mydata$Y_Q10A), ]
  print(mydataSorted)
 
 
  ##assigning  a different name to file
 
   happyguys <- mydataSorted
 
 
  ##trying to weight my data
 
   data.frame <- happyguys
   df <- data.frame
   df1 <- df[, 1:4] * df[, 5]
 
  ##getting error message here??
 
  Error in df[, 1:4] * df[, 5] : non-numeric
argument
  to binary operator
  In addition: Warning message:
  Incompatible methods (Ops.data.frame,
  Ops.factor) for *
 
  Does anyone know what this error message means?
 
  I've been reviewing R code all day  getting
more
  familiar with it
 
  Thanks,
 
  Nat
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do

Re: [R] exclude the unfit data from the iteration

2007-04-24 Thread Bert Gunter
?try 
Wrap each iteration in a try() call

Also ?tryCatch if you want to get fancy -- and can understand the rather
arcane docs.
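A minimal sketch of that pattern (the failing fit is simulated here with stop(); substitute the coxph() call on sim.fr from the original post):

```r
## Keep drawing fresh data until a fixed number of clean fits is collected;
## tryCatch() turns errors (and warnings, if trapped) into NULL so the loop
## simply tries again with the next simulated data set.
fit.once <- function() {
  if (runif(1) < 0.3) stop("did not converge")   # stand-in for a bad data set
  lm(rnorm(20) ~ 1)                              # stand-in for the coxph() fit
}
n.ok <- 100
fits <- vector("list", n.ok)
i <- 1
while (i <= n.ok) {
  res <- tryCatch(fit.once(),
                  error   = function(e) NULL,
                  warning = function(w) NULL)
  if (!is.null(res)) {   # keep only runs that fit cleanly
    fits[[i]] <- res
    i <- i + 1
  }                      # otherwise discard and draw fresh data
}
length(fits)             # 100 usable fits
```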

Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mohammad Ehsanul
Karim
Sent: Tuesday, April 24, 2007 3:33 PM
To: r-help@stat.math.ethz.ch
Subject: [R] exclude the unfit data from the iteration

Dear List, 

Trying to explain my situation as simply as possible
for me:

I am running a series of iteration on coxph model on
simulated data (newly generated data on each iteration
to run under coxph; in my example below- sim.fr is the
generated data). However, sometimes i get warning
messages like 
"Ran out of iterations and did not converge" or
"Error in var(x, na.rm = na.rm) : missing observations
in cov/cor"
because in some cases one of my covariate (say, var5
or var6 or both who are binary variables) becomes all
0's!

How do I exclude the unfit data (that does not
fit/converge: that produces warning messages) that may
be generated in any iteration, and still continue by
replacing it by the next iteration data (untill it
generates acceptable data that does not give any
trouble like not converging)? Is there any provision
in R?

sim.result - function(...){
...
fit.gm.em - coxph(Surv(times,censored) ~
var1+var2+var3+var4+var5+var6 +
frailty(id,dist='gamma', method='em'), data= sim.fr)
...
}

I know
options(warn=-1)
can hide warning messages, but I need not hide the
problem, all i need to do is to tell the program to
continue untill fixed number of times (say, 100) it
iterates with good data.


Thank you for your time.

Mohammad Ehsanul Karim (R - 2.3.1 on windows)
Institute of Statistical Research and Training
University of Dhaka

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] fitting mixed models to censored data?

2007-04-23 Thread Bert Gunter
Douglas:

AFAIK, this is subject area of active current research. Diggle, Heagerty,
Liang, and Zeger , 2002, (ANALYSIS OF LONGITUDINAL DATA) say on p.316: An
emerging consensus is that analysis of data with potentially informative
dropouts necessarily involves assumptions which are difficult, or even
impossible, to check from the observed data.  This was ca 1994, I believe,
so I don't know whether this view is still held among experts (which I am
not). But if it is, you may do well to be careful of whatever SAS does even
if you do have to go running off to it.

Cheers,

Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Douglas Grove
Sent: Monday, April 23, 2007 10:58 AM
To: r-help@stat.math.ethz.ch
Subject: [R] fitting mixed models to censored data?

Hi,

I'm trying to figure out if there are any packages allowing
one to fit mixed models (or non-linear mixed models) to data
that includes censoring.

I've done some searching already on CRAN and through the mailing
list archives, but haven't discovered anything.  Since I may well
have done a poor job searching I thought I'd ask here prior to
giving up.

I understand that SAS's proc nlmixed can accommodate censoring
(though proc mixed apparently can't), so if I can't find 
something available in R, I'll have to break down and use
that.  Please, save me from having to use SAS!

Thanks much,
Doug

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] summary and min max

2007-04-23 Thread Bert Gunter
I believe it is fair to say that this is where (S3 to keep it simple)
classes come in handy: Class the sorts of objects you're working with, say
MyClass, and then write your own summary.MyClass() method.
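A minimal S3 sketch of that idea (the class name "MyClass" and the method body are invented here, not from the thread):

```r
## Objects of class "MyClass" get a summary that appends the exact,
## unrounded minimum and maximum to the usual (rounded) summary output.
summary.MyClass <- function(object, ...) {
  x <- unclass(object)
  c(summary(x, ...), exact.min = min(x), exact.max = max(x))
}

x <- structure(c(1.2345678, 2, 10.9876543), class = "MyClass")
summary(x)   # five-number summary plus exact extremes
```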


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Robert Duval
Sent: Monday, April 23, 2007 4:16 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] summary and min max

Has anyone created an alternative summary method where the rounding is
made only for digits to right of the decimal point?

I personally don't like the way summarize works on this particular
issue, but I'm not sure how to modify it generically...

(of course one can always set digits=something_big, but this is
inelegant and impractical when one doesn't know in advance the magnitude
of a number)

robert

On 4/23/07, Mike Prager [EMAIL PROTECTED] wrote:
 Sebastian P. Luque [EMAIL PROTECTED] wrote:

  I came across a case where there's a discrepancy between minimum and
  maximum values reported by 'summary' and the 'min' and 'max' functions:

 summary() rounds by default. Thus its reporting oddball values
 is considered a feature, not a bug.

 --
 Mike Prager, NOAA, Beaufort, NC
 * Opinions expressed are personal and not represented otherwise.
 * Any use of tradenames does not constitute a NOAA endorsement.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] inconsistent output using 'round'

2007-04-19 Thread Bert Gunter
It has nothing to do with round() -- it's the digits argument of the print
method that controls the number of digits in the output, print.default in
this case. And the documentation from print.default says for the digits
argument:

digits: a non-null value for digits specifies the minimum number of
significant digits to be printed in values. The default, NULL, uses
getOption(digits) 

And, lo and behold, your output shows a minimum of 3 **significant**  digits
with more being used in tables to line up values that are both greater and
less than 1.
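A quick illustration of that behaviour (values chosen arbitrarily):

```r
x <- c(0.233, 3.9, 0.714, 1106.825)
print(x, digits = 3)   # at least 3 *significant* digits for every element
signif(x, 3)           # per-element rounding, if that is what is wanted
getOption("digits")    # the default used when digits is not supplied (7)
```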

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 

 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Bob Green
 Sent: Thursday, April 19, 2007 2:05 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] inconsistent output using 'round'
 
 Peter,
 
 Many thanks. I have never seen a  confidence interval from 0.000 to 
 6265941604681544800.000 - this is a worry.
 
 I am also still puzzled why use of digits = 3, produced output which 
 includes 2, 3 and 4 decimal points as per below. The two decimal 
 point values for the coef should have been 2.479, 1.027, 1.614.
 
 regards
 
 Bob
 
   print(exp(coef(mod.multacute)),digits = 3)
   (Intercept) in.acute.dangery violent.convictionsy
  GBH.UW            0.233             3.90                0.714
  homicide          0.183             2.48                0.682
    in.acute.dangery:violent.convictionsy
  GBH.UW                                      1.03
  homicide                                    1.61
   print(exp(confint(mod.multacute)),digits =3)
 , , GBH.UW
 
   2.5 % 97.5 %
 (Intercept)   0.130  0.417
 in.acute.dangery  1.384 10.970
 violent.convictionsy  0.213  2.390
 in.acute.dangery:violent.convictionsy 0.146  7.200
 
 , , homicide
 
   2.5 % 97.5 %
 (Intercept)   0.0964  0.349
 in.acute.dangery  0.7194  8.543
 violent.convictionsy  0.1747  2.660
 in.acute.dangery:violent.convictionsy 0.1767 14.738
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bquote in plot.default vs plot.formula ?

2007-04-13 Thread Bert Gunter
Folks:

If it's not too technical, could someone explain the following:

x - 1:5; y - x

## The following 3 all work as expected:

plot(x,y, main= expression(sin(x+1)))
plot(y~x, main= expression(sin(x+2 )))
plot(x,y, main= bquote(sin(x+3)))

## The following does not:
plot(y~x, main= bquote(sin(x+4)))

## Perhaps more interesting results occur if "log[10]" is substituted for
"sin" in these expressions. The last plot command then produces the error
message: Error in log[10] : object is not subsettable
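For what it's worth, bquote() itself builds the call as expected outside the plot; coercing its result with as.expression() is one workaround sometimes suggested for the formula method (an assumption here, not verified against this R version):

```r
## bquote() substitutes values wrapped in .( ) into an unevaluated call.
a <- 4
e <- bquote(sin(x + .(a)))   # .(a) is replaced by the current value of a
e                            # the call sin(x + 4)

## Possible workaround for the plot.formula quirk (untested assumption):
## plot(y ~ x, main = as.expression(bquote(sin(x + .(a)))))
```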

Feel free to reply offline if you think that's more appropriate. Version
info below.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


Version info:

 R.Version()
$platform
[1] i386-pc-mingw32

$arch
[1] i386

$os
[1] mingw32

$system
[1] i386, mingw32

$status
[1] 

$major
[1] 2

$minor
[1] 4.1

$year
[1] 2006

$month
[1] 12

$day
[1] 18

$`svn rev`
[1] 40228

$language
[1] R

$version.string
[1] R version 2.4.1 (2006-12-18)

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nlm() and optim()

2007-04-10 Thread Bert Gunter
Numerical optimization is sensitive to (at least) the method chosen,
control/convergence specifications, and the parameterization of the function
being optimized (all of this is well known). Defining what you mean by
"reproduce" in a precise, operational way is therefore essential. You have
not done so. For example, if it is the negative (ln)likelihood of a
statistical model that is being minimized, and the model is overparametrized
so that there are near identifiability issues, the confidence region for the
parameters will essentially be a (possibly quite irregular) lower-dimensional
subspace (submanifold) of the full parameter space. Would you say that
results "reproduce" if they fall within this confidence region, even though
they may be quite different from the estimated minima? Issues with possibly
multiple local minima also complicate matters.

Bottom line: Determining when you have "reproduced" results from complex
modelling that relies on numerical optimization for model fitting can be
difficult. Careful and parsimonious modelling is vital.
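On the second question in the original post (recording the parameter values at each evaluation), one common sketch is to wrap the objective so every call appends to a trace (a toy quadratic objective stands in here; substitute your own):

```r
## Each evaluation of f() appends the parameters and objective value to
## `trace`, so after optim() finishes, `trace` holds the path explored --
## enough to overlay on a likelihood surface afterwards.
trace <- list()
f <- function(p) {
  val <- sum((p - c(1, 2))^2)               # toy objective, minimum at (1, 2)
  trace[[length(trace) + 1]] <<- c(p, value = val)
  val
}
out <- optim(c(0, 0), f, method = "CG")
path <- do.call(rbind, trace)               # one row per evaluation
nrow(path)                                  # number of evaluations recorded
out$par                                     # close to c(1, 2)
```

Note that optim() evaluates the objective extra times for its numerical gradient, so the trace contains more rows than "iterations" in the reported sense.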


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Silvia Lucato
Sent: Tuesday, April 10, 2007 7:32 AM
To: r-help@stat.math.ethz.ch
Subject: [R] nlm() and optim()

Dear R-users,

I have just joined the list and would much appreciate any thoughts on 2 issues.

Firstly, I want to reproduce some minimization results conducted in MATLAB.
I have succeeded with nlm and optim's "CG" method. I have been told that I
should also get them with the other optim methods. Actually, I found the
same results when testing a very straightforward equation. However, with a
more complicated model this was not true. Is that really possible? Have I
got it by chance in the simple case?

Secondly, in order to check which optimization is more suitable for our
study, I would like to have the value of the minimized parameters on each
iteration to later plot a likelihood surface. However, for both nlm and
optim, I could only keep the last iteration results. Is there a way to
store/record the minimized values for each iteration? 

Sorry if these questions are reocuring. I have been searching for hints but
did not get too far and I am fairly new to R.

Comments and examples are most  welcome.
Silvia Hadeler

-

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] creating a data frame from a list

2007-04-05 Thread Bert Gunter
Dmitri:

As you apparently have not received a reply

IMHO, one of the glories of R is the ease with which you can create de novo
solutions for little problems like this yourself. While there may be more
efficient,robust, and elegant solutions already available, it can frequently
be considerably more time consuming to find and figure them out, as you
appear to have experienced. (And once outside base R and standard packages,
documentation can be problematic).

Anyway, whether you agree with that propaganda or not, here is a little
function (no claim for elegance or efficiency!) that does what you want, I
think:

makeFrame <- function(xlist)
{
  allnames <- sort(unique(unlist(sapply(xlist, names))))
  data.frame(lapply(xlist,
                    function(y, an) structure(y[match(an, names(y))],
                                              names = NULL),
                    an = allnames),
             row.names = allnames)
}

##test it

 lst
$a
A B 
1 8 

$b
A B C 
2 3 0 

$c
B D 
2 0 

 makeFrame(lst)
   a  b  c
A  1  2 NA
B  8  3  2
C NA  0 NA
D NA NA  0
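An equivalent sketch (same idea, leaning on the fact that indexing a named vector by absent names yields NA):

```r
## Index each vector by the union of all names; names a vector lacks
## come back as NA, so no explicit match() bookkeeping is needed.
makeFrame2 <- function(xlist) {
  allnames <- sort(unique(unlist(lapply(xlist, names))))
  as.data.frame(lapply(xlist, function(v) unname(v[allnames])),
                row.names = allnames)
}

makeFrame2(list(a = c(A = 1, B = 8),
                b = c(A = 2, B = 3, C = 0),
                c = c(B = 2, D = 0)))
```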
 

Cheers,

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dimitri Szerman
Sent: Thursday, April 05, 2007 11:58 AM
To: R-Help
Subject: [R] creating a data frame from a list

Dear all,

A few months ago, I asked for your help on the following problem:

I have a list with three (named) numeric vectors:

 lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
 lst
$a
A B
1 8

$b
A B C
2 3 0

$c
B D
2 0

Now, I'd love to use this list to create the following data frame:

 dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
+  b=c(A=2,B=3,C=0,D=NA),
+  c=c(A=NA,B=2,C=NA,D=0) )

 dtf
    a  b  c
A   1  2 NA
B   8  3  2
C  NA  0 NA
D  NA NA  0

That is, I wish to merge the three vectors in the list into a data frame
by their (row)names.

And I got the following answer:

library(zoo)
z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
rownames(z) <- time(z)
coredata(z)

However, it does not seem to be working. Here's what I get when I try it:

 lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
 library(zoo)
 z <- do.call(merge, lapply(lst, function(x) zoo(x, names(x))))
Error in if (freq  1  identical(all.equal(freq, round(freq)),
TRUE)) freq - round(freq) :
missing value where TRUE/FALSE needed
In addition: Warning message:
NAs introduced by coercion

and z was not created.

Any ideas on what is going on here?
Thank you,
Dimitri

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Wikibooks

2007-03-29 Thread Bert Gunter
Question:

Many (perhaps most?) questions on the list are easily answerable simply by
checking existing R docs (help file/man pages, An Introduction to R, etc.).
Why would a wiki be more effective at deflecting such questions from the
mailing list than those docs are? Why would "too helpful" R experts be more
inclined to refer people to the wiki than to the existing docs? Bottom line:
it's psychology at issue here, I think, not the form of the docs.

Disclaimer 1: None of this is meant to reflect one way or the other on the
usefulness of wikis as a documentation format -- only on their ability to
change the help list culture.

Disclaimer 2: Others have repeatedly made similar comments (asking us to
refer people to the docs rather than providing explicit answers, I mean).

Cheers,
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Thursday, March 29, 2007 3:32 PM
To: Ben Bolker
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Wikibooks

Ben Bolker wrote:
 Alberto Monteiro albmont at centroin.com.br writes:
 
 As a big fan of Wikipedia, it's frustrating to see how little there is
about 
 R in the correlated project, the Wikibooks:

 http://en.wikibooks.org/wiki/R_Programming

 Alberto Monteiro

 
   Well, we do have an R wiki -- http://wiki.r-project.org/rwiki/doku.php
--
 although it is not as active as I'd like.  (We got stuck halfway through
 porting Paul Johnson's R Tips to it ...)   Please contribute!
   Most of the (considerable) effort people expend in answering
 questions about R goes to the mailing lists -- I personally would like it
if some
 tiny fraction of that energy could be redirected toward the wiki, where
 information can be presented in a nicer format and (ideally) polished
 over time -- rather than having to dig back through multiple threads on
the
 mailing lists to get answers.  (After that we have to get people
 to look for the answers on the wiki.)

I would like to strongly second Ben.  In some ways, R experts are too 
nice.  Continuing to answer the same questions over and over does not 
lead to better use of the R wiki.  I would rather see the work go into 
enhancing the wiki and refactoring information, and responses to many 
r-help pleas for help be "see wiki topic x".  While doing this let's 
consider putting a little more burden on new users to look for good 
answers already provided.

Frank

 
   Just my two cents -- and I've been delinquent in my 
 wiki'ing recently too ...
 
   Ben Bolker
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Completely off topic, but amusing?

2007-03-23 Thread Bert Gunter
Folks:

Thought that many on this list might find this amusing, perhaps even a bit
relevant. Hope it's OK:


WASHINGTON - The government's estimate of the number of Americans without
health insurance fell by nearly 2 million Friday, but not because anyone got
health coverage. 

The Census Bureau
http://search.news.yahoo.com/search/news/?p=Census+Bureau  said it has
been overstating the number of people without health insurance since 1995.
The bureau blamed the inflated numbers on a **12-year-old computer
programming error**.[emphasis added -- BG]
**

So what does validated software really mean? (Rhetorical question -- no
reply sought). 

Cheers to all,

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Colored boxes with values in the box

2007-03-22 Thread Bert Gunter
Sounds like ?image is what you are looking for, perhaps?
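A sketch along those lines, assuming the three groups from the original post (0-15 blue, 16-30 yellow, 31-40 red; the break points and layout are illustrative choices, not tested against the poster's data):

```r
## Bin the matrix into three groups via breaks, draw colored cells with
## image(), then write each value into its box with text().
m <- matrix(sample(0:40, 12), nrow = 3)
breaks <- c(-0.5, 15.5, 30.5, 40.5)        # 0-15 / 16-30 / 31-40
cols <- c("blue", "yellow", "red")
image(1:ncol(m), 1:nrow(m), t(m), breaks = breaks, col = cols,
      axes = FALSE, xlab = "", ylab = "")
text(col(m), row(m), labels = m)           # value in each box
```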

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Pappu, Kartik
Sent: Thursday, March 22, 2007 3:42 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Colored boxes with values in the box

Hi all,

I have a x, y matrix of numbers (usually ranging from 0 to 40).  I need 
to group these numbers and assign a color to each group (for example 0 
to 15 - Blue, 16-30- Yellow, and 31-40- Red).  Then I need to draw a 
rectangular matrix which contains X x Y boxes and each box has the  
corresponding value from the input matrix and is also colored according 
to which group (i.e red, yellow, or blue) that value falls into.

I have used the color2D.matplot function from the plotrix package, but 
I cant quite figure out how to group the values to represent red blue 
and yellow colors.

Thanks

Kartik


--
IMPORTANT WARNING:  This email (and any attachments) is only...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Bad points in regression

2007-03-16 Thread Bert Gunter
(mount soapbox...)

While I know the prior discussion represents common practice, I would argue
-- perhaps even plead -- that the modern (~30 years old now) alternative
of robust/resistant estimation be used, especially in the readily available
situation of least-squares regression. RSiteSearch("Robust") will bring up
numerous possibilities. rrcov and robustbase are at least two packages
devoted to this, but the functionality is available in many others (e.g.
rlm() in MASS).
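A small sketch with rlm() on data like the simulated example later in this thread (MASS ships with R; the weight cutoff shown is an arbitrary illustration):

```r
## rlm() downweights gross outliers automatically; the final robustness
## weights then point at the suspect observations directly.
library(MASS)
set.seed(1)
x <- 1:100
err <- rnorm(100)
err[c(15, 25, 50)] <- c(5, -4, 10)   # three contaminated points
y <- 0.3 + 0.4 * x + 0.5 * err
fit <- rlm(y ~ x)
sort(fit$w)[1:3]                     # the smallest robustness weights
which(rank(fit$w) <= 3)              # indices of the most downweighted points
```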

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374





-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Friday, March 16, 2007 6:44 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Bad points in regression

On 16-Mar-07 12:41:50, Alberto Monteiro wrote:
 Ted Harding wrote:
 
 alpha <- 0.3
 beta <- 0.4
 sigma <- 0.5
 err <- rnorm(100)
 err[15] <- 5; err[25] <- -4; err[50] <- 10
 x <- 1:100
 y <- alpha + beta * x + sigma * err
 ll <- lm(y ~ x)
 plot(ll)
 
 ll is the output of a linear model fiited by lm(), and so has
 several components (see ?lm in the section Value), one of
 which is residuals (which can be abbreviated to res).
 
 So, in the case of your example,
 
   which(abs(ll$res) > 2)
   15 25 50
 
 extracts the information you want (and the 2 was inspired by
 looking at the residuals plot from your plot(ll)).

 Ok, but how can I grab those points _in general_? What is the
 criterion that plot used to mark those points as bad points?

Ahh ... ! I see what you're after. OK, look at the plot method
for lm():

?plot.lm
  ## S3 method for class 'lm':
  plot(x, which = 1:4,
       caption = c("Residuals vs Fitted", "Normal Q-Q plot",
                   "Scale-Location plot", "Cook's distance plot"),
       panel = points,
       sub.caption = deparse(x$call), main = "",
       ask = prod(par("mfcol")) < length(which) && dev.interactive(),
       ...,
       id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75)


where (see further down):

  id.n: number of points to be labelled in each plot, starting with
the most extreme.

and note, in the default parameter-values listing above:

  id.n = 3

Hence, the 3 most extreme points (according to the criterion being
plotted in each plot) are marked in each plot.

So, for instance, try

  plot(ll,id.n=5)

and you will get points 10,15,25,28,50. And so on. But that
pre-supposes that you know how many points are exceptional.


What is meant by "extreme" is not stated in the help page ?plot.lm,
but can be identified by inspecting the code for plot.lm(), which
you can see by entering

  plot.lm

In your example, if you omit the line which assigns anomalous values
to err[15], err[25] and err[50], then you are likely to observe that
different points get identified on different plots. For instance,
I just got the following results for the default id.n=3:

[1] Residuals vs Fitted:   41,53,59
[2] Standardised Residuals:41,53,59
[3] sqrt(Stand Res) vs Fitted: 41,53,59
[4] Cook's Distance:   59,96,97


There are several approaches (with somewhat different outcomes)
to identifying outliers. If you apply one of these, you will
probably get the identities of the points anyway.

Again in the context of your example (where in fact you
deliberately set 3 points to have exceptional errors, thus
coincidentally the same as the default value 3 of id.n),
you could try different values for id.n and inspect the graphs
to see whether a given value of id.n marks some points that
do not look exceptional relative to the mass of the other points.

So, the above plot(ll, id.n=5) gave me one extra point, 10, on the
residuals plot, which apparently belonged to the general
distribution of residuals.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 16-Mar-07   Time: 13:43:54
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Connecting R-help and Google Groups?

2007-03-14 Thread Bert Gunter
I know nothing about Google Groups, but FWIW, I think it would be most
unwise for R/CRAN to hook up to **any** commercially sponsored web portals.
Future changes in their policies, interfaces, or access conditions may make
them inaccessible or unfriendly to R users. So long as we have folks willing
and able to host and maintain our lists as part of the CRAN infrastructure,
CRAN maintains control. I think this is wise and prudent.

I am happy to be educated to the contrary if I misunderstand how this would
work.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Paul Lynch
Sent: Wednesday, March 14, 2007 8:48 AM
To: R-help@stat.math.ethz.ch
Subject: [R] Connecting R-help and Google Groups?

This morning I tried to see if I could find the r-help mailing list on
Google Groups, which has an interface that I like.  I found three
Google Groups ("The R Project for Statistical Computing", "rproject",
and "rhelp") but none of them are connected to the r-help list.

Is there perhaps some reason why it wouldn't be a good thing for there
to be a connected Google Group?  I think it should be possible to set
things up so that a post to the Google Group goes to the r-help
mailing list, and vice-versa.

Also, does anyone know why the three existing R Google Groups failed
to get connected to r-help?  It might require some action on the part
of the r-help list administrator.

Thanks,
--Paul

-- 
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] How to modify a column of a matrix

2007-03-12 Thread Bert Gunter
?cut ## if you have several bins, where ifelse becomes messy


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374
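As a minimal sketch of the cut() approach Bert points to (the breakpoints and labels here are made up for illustration):

```r
## Toy product counts (hypothetical data)
products <- c(3253, 4144, 3246, 4087, 4738)

## cut() bins a numeric vector into intervals in one call,
## avoiding nested ifelse() when there are several bins
bins <- cut(products,
            breaks = c(0, 3500, 4000, 4500, Inf),
            labels = c("low", "mid", "high", "very.high"))
as.character(bins)  # "low" "high" "low" "high" "very.high"
```

By default the intervals are right-closed, so 4144 falls in (4000, 4500] and gets the "high" label.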




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Monday, March 12, 2007 11:25 AM
To: Sergio Della Franca
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] How to modify a column of a matrix

On Mon, 2007-03-12 at 18:55 +0100, Sergio Della Franca wrote:
 Dear R-helpers,
 
 I'm trying to create a string-code to modify the contents of a column of a
 matrix.
 
 For example, I have this dataset:
 
   YEAR   PRODUCTS
   1992  3253
   1993  4144
   1994  3246
   1996  4144
   1997  4087
   1998  3836
   1999  4379
   2000  4072
   2001  4202
   2002  4554
   2003  4456
   2004  4738
   2005  4144
 
 I want to convert/update the values of the column PRODUCTS under some
 condition (i.e. when the value of PRODUCTS is greater than 4000, replace
 it with 0, else replace it with 1).
 
 My question is the following:
 is there a function or methodology that allows this operation?
 
 
 Thank you in advance,
 Sergio

If the data above is a matrix (MAT) and not a data frame:

# See ?cbind and ?ifelse

MAT <- cbind(MAT, NewCol = ifelse(MAT[, "PRODUCTS"] > 4000, 0, 1))

> MAT
   YEAR PRODUCTS NewCol
1  1992 3253  1
2  1993 4144  0
3  1994 3246  1
4  1996 4144  0
5  1997 4087  0
6  1998 3836  1
7  1999 4379  0
8  2000 4072  0
9  2001 4202  0
10 2002 4554  0
11 2003 4456  0
12 2004 4738  0
13 2005 4144  0


If it is a data frame:

DF$NewCol <- ifelse(DF$PRODUCTS > 4000, 0, 1)

> DF
   YEAR PRODUCTS NewCol
1  1992 3253  1
2  1993 4144  0
3  1994 3246  1
4  1996 4144  0
5  1997 4087  0
6  1998 3836  1
7  1999 4379  0
8  2000 4072  0
9  2001 4202  0
10 2002 4554  0
11 2003 4456  0
12 2004 4738  0
13 2005 4144  0


HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] hwo can i get a vector that...

2007-03-07 Thread Bert Gunter
apply(yourMatrix,1,which.max) 


Bert Gunter
Nonclinical Statistics
7-7374
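A quick illustration of the one-liner above, on a toy matrix (the data are made up):

```r
## 4 x 3 toy matrix
m <- matrix(c(1, 9, 2,
              7, 3, 5,
              4, 4, 8,
              6, 0, 1), nrow = 4, byrow = TRUE)

## one column index per row: the column holding that row's maximum
apply(m, 1, which.max)  # 2 1 3 1
```

Note that which.max() returns the first maximum in case of ties (row 3 above has a unique maximum in column 3).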

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of bunny ,
lautloscrew.com
Sent: Wednesday, March 07, 2007 2:12 PM
To: r-help@stat.math.ethz.ch
Subject: [R] hwo can i get a vector that...

dear all,

how can I get a vector that shows the number of the column of a matrix
that contains the maximum of the row?
can't believe I need a loop for this...

I have a 100 x 3 matrix and want to get a 100 x 1 vector with values
1, 2, 3.

there must be a simple solution. I just cannot find it. I think I am
searching on the wrong end.

thx for help in advance.

m.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] how to edit my R codes into a efficient way

2007-03-06 Thread Bert Gunter
Have you read "An Introduction to R"? If not, do so before posting any further
questions.

Once you have read it, pay attention to what it says about lists, which is a
very general data structure (indeed, **the** most general) that is very
convenient for this sort of task. The general approach that one uses is
something like:

ContentsOfFiles <- lapply(filenameVector,
    functionThatReadsFile, additionalParametersToFunction)

More specifically,

ContentsOfFiles <- lapply(filenameVector, read.csv, header = TRUE,
    quote = "\"", fill = TRUE)

see ?lapply
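A self-contained sketch of this pattern (the two files here are throwaway temporary files standing in for the poster's data):

```r
## Create two small CSV files to stand in for the real ones
files <- replicate(2, tempfile(fileext = ".csv"))
write.csv(data.frame(avg_value = 1:3), files[1], row.names = FALSE)
write.csv(data.frame(avg_value = 4:6), files[2], row.names = FALSE)

## One lapply() call reads them all into a list of data frames
contents <- lapply(files, read.csv, header = TRUE)
sapply(contents, nrow)     # 3 3
contents[[2]]$avg_value    # 4 5 6
```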


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Xuhong Zhu
Sent: Tuesday, March 06, 2007 7:19 AM
To: r-help@stat.math.ethz.ch
Subject: [R] how to edit my R codes into a efficient way

Hello, Everyone,

I am a student and a new learner of R, and I am trying to do my homework
in R. I have 10 files that need to be read and processed separately. I really
want to write the code as something like a macro, to save lines
instead of repeating similar work 10 times.

The following is part of my codes and I only extracted three lines for
each repeating section.


data.1 <- read.csv("http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/pat1.csv",
header = TRUE, sep = ",", quote = "\"",
fill = TRUE);
data.2 <- read.csv("http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/pat3.csv",
header = TRUE, sep = ",", quote = "\"",
fill = TRUE);
data.3 <- read.csv("http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/pat4.csv",
header = TRUE, sep = ",", quote = "\"",
fill = TRUE);

baby.1 <- data.frame(cuff=data.1$avg_value,
time=seq(1,dim(data.1)[1]), patient=rep(1, dim(data.1)[1]))
baby.2 <- data.frame(cuff=data.2$avg_value,
time=seq(1,dim(data.2)[1]), patient=rep(3, dim(data.2)[1]))
baby.3 <- data.frame(cuff=data.3$avg_value,
time=seq(1,dim(data.3)[1]), patient=rep(4, dim(data.3)[1]))


I also tried the code below but it doesn't work.

for(n in 1:10){
mm <- data.frame(cuff=paste("data", n, sep=".")$avg_value,
time=seq(1,dim(paste("data", n, sep="."))[1]),
patient=rep(1, paste("data", n, sep="."))[1]))
assign(paste("baby", n, sep="."), mm)}

I am looking forward to your help and thanks very much!

Xuhong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Off topic:Spam on R-help increase?

2007-03-06 Thread Bert Gunter
Folks:

In the past 2 days I have seen a large increase of  spam getting into
R-help. Are others experiencing this problem? If so, has there been some
change to the spam filters on the R-servers? If not, is the problem on my
end?

Feel free to reply privately. 

Thanks.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recalling and printing multiple graphs. Is there somethingin the HISTORY menu that will help?

2007-03-06 Thread Bert Gunter
See FAQ for Windows 5.2 and the referenced README.


?win.metafile and ?replayPlot might allow you to replay the saved plot
history (by default in .SavedPlots) into a file in emf or wmf format, I
think, but I haven't actually tried this -- don't know if it will work for
multiple graphs.

Let us know if this approach works if you don't get a definitive answer
elsewhere.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin
Sent: Tuesday, March 06, 2007 9:44 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Recalling and printing multiple graphs. Is there somethingin
the HISTORY menu that will help?

I have written an R function that produces multiple graphs. I use
par(ask=TRUE) to allow for the inspection of each graph before the next
graph is drawn. I am looking for a way to recall all graphs drawn in an
R session, and a method that can be used to print all the graphs at one
time. I know that I could simply print each graph after I inspect the
graph, but this gets tiresome if one's function produces tens of graphs.
I suspect that if I knew more about the history menu (which currently
has an entry RECORDING) I could get the graphs to be replayed and
printed, but alas I have not been able to find instructions for using
the HISTORY menu. Please take pity on me when you let me know that some
easy search or command could get me the information I needed. I have
looked, but clearly in the wrong places.
John 
 
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
[EMAIL PROTECTED]
Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] from function to its name?

2007-03-02 Thread Bert Gunter
 Seth is, of course, correct, but perhaps the following may help:

## function that takes a function as an argument

 foo <- function(f, x) list(deparse(substitute(f)), f(x))

## Value is a list of length 2; first component is a character string giving
the name of the funtion; second component is the result of applying the
function to the x argument.

##pass in the name (UNquoted) of the function as the first argument
## This works because the evaluator looks up the function that the symbol is
bound to in the usual way

> foo(mean, 1:5)
[[1]]
[1] "mean"

[[2]]
[1] 3

## pass in an unnamed function as the first argument
> foo(function(y) sum(y)/length(y), 1:5)
[[1]]
[1] "function(y) sum(y)/length(y)"

[[2]]
[1] 3

## the following gives an error since the first argument is a character
string, not a name/symbol:

> foo(f = "mean", 1:5)
Error in foo(f = "mean", 1:5) : could not find function "f"


Cheers,

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Seth Falcon
Sent: Friday, March 02, 2007 9:18 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] from function to its name?

Ido M. Tamir [EMAIL PROTECTED] writes:
 I wanted to pass a vector of functions as an argument to a function to do
 some calculations and put the results in a list where each list entry has
 the name of the function.
 I thought I could either pass a vector of function names as character, then
 retrieve the functions etc...
 Or do the opposite: pass the functions and then retrieve the names, but
 it occurred to me that this seems not to be possible, hence my question.

Functions don't have to have names, by which I mean that the
definition doesn't have to be bound to a symbol.  If your function
takes a list of functions then:

  yourFunc(theFuncs=list(function(x) x + 1))

You could force the list to have names and use them.  Or you could
force function names to be passed in (your other idea).

+ seth
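Following Seth's suggestion of a named list, a minimal sketch (the names and functions are chosen for illustration):

```r
## lapply() preserves the list's names, so each result
## is automatically labelled with its function's name
funcs <- list(mean = mean, median = median, total = sum)
results <- lapply(funcs, function(f) f(1:5))
names(results)   # "mean" "median" "total"
results$total    # 15
```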

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] R code for Statistical Models in S ?

2007-03-01 Thread Bert Gunter
The White Book provides the original S Language Specification. This was what
existed at Bell labs way back then. Subsequent implementations, both S-Plus
and R, will differ on details.

Also, a lot of development effort has flowed over the dam since publication,
so both implementations contain lots of stuff not even mentioned there. See
also the Green Book.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Charilaos Skiadas
Sent: Thursday, March 01, 2007 12:56 PM
To: R-Mailingliste
Subject: [R] R code for Statistical Models in S ?

I just acquired a copy of "Statistical Models in S", I guess most
commonly known as "the white book", and realized to my dismay that
most of the code is not directly executable in R, and I was wondering
if there was a source discussing the things that are different and  
what the new ways of calling things are.

For instance, the first obstacle was the solder.balance data set. I  
found a solder data set in rpart, which is very close to it except  
for the fact that the Panel variable is not a factor, but that's  
easily fixed.
The first problem is the next two calls, on pages 2 and 3. One is  
plot(solder.balance), which is supposed to produce a very different  
plot than it does in R (I actually don't know the name of the plot,  
which is part of the problem I guess). Then one is supposed to call  
plot.factor(skips ~ Opening + Mask), which I took to mean:
plot(skips ~ Opening + Mask, data=solder), and that worked, though  
I still haven't been able to make a direct call to plot.factor work  
(I keep getting a "could not find function plot.factor" error).

Anyway, just wondered whether there is some page somewhere that  
discusses these little differences here and there, as I am sure there  
will be a number of other problems such as these along the way.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Default par() options

2007-03-01 Thread Bert Gunter

Thomas:
I am not sure exactly what you are asking for below, but I wonder if your
query could be satisfied by the judicious use of the ... argument in a
wrapper function to par(), like

myPar <- function(bg="lightgray", pch=19, ...) par(bg=bg, pch=pch, ...)

or perhaps 

myX11 <- function(width=10, bg="lightgray", pch=19, ...)
{
    X11(width=width)
    par(bg=bg, pch=pch, ...)
}

This would use the existing user-chosen defaults for the respective devices
if no other values were provided, and would allow the user to explicitly
specify any different values for them or additional arguments to par if
needed. I agree that it ain't elegant, though, so I'd welcome better
alternatives, too.

Of course, one can explicitly use formals() and the construction:

dots <- as.list(substitute(list(...)))[-1]   ## VR: S PROGRAMMING p. 46

to obtain all the arguments and their names and appropriately stuff them
into either par() or X11() using do.call() or something similar; but that
seems like more than you need here.

Anyway, HTH.


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
The business of the statistician is to catalyze the scientific learning
process.  - George E. P. Box
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Petr 
 Klasterecky
 Sent: Thursday, March 01, 2007 12:51 PM
 To: Thomas Friedrichsmeier
 Cc: [EMAIL PROTECTED]
 Subject: Re: [R] Default par() options
 
 I am no expert on these topics but currently I am solving a similar 
 issue using the .Rprofile file and the .First function. So maybe it's 
 enough to put
 .First <- function(){
 par(whatever you want)
 further instructions if necessary
 }
 
 Petr
 
 Thomas Friedrichsmeier napsal(a):
  The following question/idea came up on the RKWard 
 development mailing list, 
  but might be of general interest:
  
  Is there a nice way to customize the default look of all 
 graphs on all 
  devices? I.e. a way to - for instance - set the following 
 options before each 
  plot:
  
  par(bg="light gray", las=2, pch=19)
  
  As far as I have found, there would currently be two ways 
 to do this:
  1) Adding the above statement manually after opening the 
 device, and before 
  starting the plot. It could of course be wrapped inside a 
 custom function to 
  save some typing, but you'd still need to make sure to 
 always add the 
  command.
  
  2) Overriding all device functions with something like:
  X11 <- function (...) {
  grDevices::X11 (...)
  par ([custom options])
  }
  This would be feasible, but feels rather dirty. Also, 
 something substantially 
  more elaborate would be needed to honor e.g. fonts and bg 
 arguments, if 
  explicitly specified in the call to X11. Would have to be 
 done for each 
  device separately.
  
  Does a third, more elegant solution exist?
  
  If not, would the following idea have any chances of being 
 added to R?
  
  Create a new options(par.default), similar to the already 
 existing 
  options(par.ask.default). This would take a list of par() 
 options to set a 
  default value for, like e.g.:
  
  options(par.default=list(bg="light gray", las=2, pch=19))
  
  Only those options would need to be specified in the list, 
 for which you 
  actually want to set a default different from the built-in. Options 
  explicitly specified in X11(), plot(), additional calls to 
 par(), etc. would 
  take precedence over options(par.default).
  
  Regards
  Thomas Friedrichsmeier
  
  
  
 --
 --
  
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 Petr Klasterecky
 Dept. of Probability and Statistics
 Charles University in Prague
 Czech Republic
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Packages in R for least median squares regression and computingoutliers (thompson tau technique etc.)

2007-02-28 Thread Bert Gunter
Packages MASS and robustbase both have this functionality. There may also be
others.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of lalitha viswanath
Sent: Wednesday, February 28, 2007 10:04 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Packages in R for least median squares regression and
computingoutliers (thompson tau technique etc.)

Hi
I am looking for suitable packages in R that do
regression analyses using least median squares method
(or better). Additionally, I am also looking for
packages that implement algorithms/methods for
detecting outliers that can be discarded before doing
the regression analyses.

Although some websites refer to lms method under
package lps in R, I am unable to find such a package
on CRAN.

I would greatly appreciate any pointers to suitable
functions/packages for doing the above analyses.

Thanks
Lalitha


 



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] What is a expression good for?

2007-02-28 Thread Bert Gunter
See VR's S PROGRAMMING, esp. section 3.5; and section 6.1 and subsequent of
the R Language Definition.

An expression object is the output of parse(), and so is R's representation
of a parsed expression. It is a type of list -- a parse tree for the
expression. This means that you can actually find the sorts of things you
mention by taking it apart as a list:

> ex <- parse(text = "x + y")
> ex
expression(x + y)
> class(ex)
[1] "expression"
> ex[[1]]
x + y
> ex[[c(1,1)]]
`+`
> ex[[c(1,2)]]
x
> ex[[c(1,3)]]
y


There are few if any circumstances when one should do this: this is the job
of the evaluator. There are also special tools available for when you really
might want to do this sort of thing   -- eg. ?formula, ?terms for altering
model specifications. But it is tricky to do right and in full generality --
e.g. ?eval and the above references for some of the issues. 
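For completeness, handing a parsed expression back to the evaluator is what eval() does (toy variable bindings assumed here):

```r
## Parse a string into an expression object, then evaluate its
## first element in the current environment
ex <- parse(text = "x + y")
x <- 2
y <- 40
eval(ex[[1]])  # 42
```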

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alberto Monteiro
Sent: Wednesday, February 28, 2007 1:03 PM
To: r-help@stat.math.ethz.ch
Subject: [R] What is a expression good for?

I mean, I can generate an expression, for example, with:

z <- expression(x+y)

But then how can I _use_ it? Is it possible to retrieve
information from it, for example, that z is a sum, its
first argument is x (or expression(x)) and its second
argument is y?

Alberto Monteiro

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] fitting of all possible models

2007-02-27 Thread Bert Gunter
... Below

-- Bert 

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Tuesday, February 27, 2007 5:14 AM
To: Indermaur Lukas
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] fitting of all possible models

Indermaur Lukas wrote:
 Hi,
 Fitting all possible models (GLM) with 10 predictors will result in loads
of (2^10 - 1) models. I want to do that in order to get the importance of
variables (having an unbalanced variable design) by summing up the
AIC-weights of models including the same variable, for every variable
separately. It's time consuming and annoying to define all possible models
by hand. 
  
 Is there a command, or easy solution to let R define the set of all
possible models itself? I defined models in the following way to process
them with a batch job:
  
 # e.g. model 1
 preference <- formula(Y ~ Lwd + N + Sex + YY)

 # e.g. model 2
 preference_heterogeneity <- formula(Y ~ Ri + Lwd + N + Sex + YY)
 etc.
 etc.
  
  
 I appreciate any hint
 Cheers
 Lukas

If you choose the model from among the 2^10 - 1 having the best AIC, that model
will be badly biased.  Why look at so many?  Pre-specification of 
models, or fitting full models with penalization, 

--- ...the rub being how much to penalize. My impression from what I've read
is, for prediction, close to "the more, the better" the predictor... .
Nature rewards parsimony.

Cheers,
Bert
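For the mechanical part of the original question, enumerating the subsets can be automated; a hedged sketch with combn() (the predictor names below are made up, and Frank's caution about fitting all subsets still applies):

```r
## All non-empty subsets of a few predictors, pasted into model formulas
preds <- c("Lwd", "N", "Sex")
rhs <- unlist(lapply(seq_along(preds),
                     function(k) combn(preds, k, paste, collapse = " + ")))
formulas <- lapply(rhs, function(s) as.formula(paste("Y ~", s)))
length(formulas)  # 2^3 - 1 = 7
```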


Frank

  
  
  
  
  
 °°° 
 Lukas Indermaur, PhD student 
 eawag / Swiss Federal Institute of Aquatic Science and Technology 
 ECO - Department of Aquatic Ecology
 Überlandstrasse 133
 CH-8600 Dübendorf
 Switzerland
  
 Phone: +41 (0) 71 220 38 25
 Fax: +41 (0) 44 823 53 15 
 Email: [EMAIL PROTECTED]
 www.lukasindermaur.ch
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] looping

2007-02-26 Thread Bert Gunter
You do not say -- and I am unable to divine -- whether you wish to sample
with or without replacement: each time or as a whole.

In general, when you want to do this sort of thing, the fastest way to do it
is just to sample everything you need at once and then form it into a list
or matrix or whatever. For example, for sampling 100 each time, without
replacement, 200 times:

mySamples <- matrix(sample(yourDatavector, 100*200, replace=FALSE), ncol=200)

will give you a 100 row by 200 column matrix of samples without replacement
from yourDatavector. I hope that you can adapt this to suit your needs.

 
Bert Gunter
Nonclinical Statistics
7-7374
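Adapting this to the poster's naming question, the samples can also live in one named list rather than 50 separate variables (toy data and an arbitrary seed assumed):

```r
set.seed(1)  # reproducibility; the seed is arbitrary
dataset <- data.frame(x = rnorm(1000))

## 5 samples of 100 rows each, kept together in a named list
samples <- lapply(1:5,
                  function(i) dataset[sample(nrow(dataset), 100), , drop = FALSE])
names(samples) <- paste("sample", 1:5, sep = ".")

nrow(samples[["sample.3"]])  # 100
```

Indexing by name (samples[["sample.3"]]) then replaces the sample.1, sample.2, ... scheme, and the whole collection can be processed with a single lapply().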

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Neil Hepburn
Sent: Monday, February 26, 2007 4:11 PM
To: r-help@stat.math.ethz.ch
Subject: [R] looping


Greetings:

I am looking for some help (probably really basic) with looping. What I want
to do is repeatedly sample observations (about 100 per sample) from a large
dataset (100,000 observations).  I would like the samples labelled sample.1,
sample.2, and so on (or some other suitably simple naming scheme).  To do
this manually I would 

smp.1 <- sample(10, 100)
sample.1 <- dataset[smp.1,]
smp.2 <- sample(10, 100)
sample.2 <- dataset[smp.2,]
.
.
.
smp.50 <- sample(10, 100)
sample.50 <- dataset[smp.50,]

and so on.

I tried the following loop code to generate 100 samples:

for (i in 1:50){
+ smp.[i] <- sample(10, 100)
+ sample.[i] <- dataset[smp.[i],]}

Unfortunately, that does not work -- specifying the looping variable i in
the way that I have does not work since R uses that to reference places in a
vector (x[i] would be the ith element in the vector x)

Is it possible to assign the value of the looping variable in a name within
the loop structure?

Cheers,
Neil Hepburn

===
Neil Hepburn, Economics Instructor
Social Sciences Department,
The University of Alberta Augustana Campus
4901 - 46 Avenue 
Camrose, Alberta
T4V 2R3

Phone (780) 697-1588
email [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Repeated measures in Classification and Regresssion Trees

2007-02-23 Thread Bert Gunter
Andrew:

Good question! AFAIK most of the so-called machine learning machinery --
regression and classification trees, SVM's, neural nets, random forests,
and other more chic methods (I make no attempt to keep up with all of them)
-- ignore error structure; that is, they assume the data are at least
independent (not necessarily identically distributed). I don't think merely
exchangeable is good enough either, though I may be wrong about this.

But I believe you have put your finger on a key issue: although all this
cool methodology is usually not terribly concerned with inference
(x-validation and bootstrapping being the usual methodology rather than,
say, asymptotics), one wonders how biased the estimators are when there are
various correlations in the data. I suspect a lot, depending on the nature
of the correlations and the methods. I think the moral is: thermodynamics
still rules -- there's no free lunch. You are just as likely to produce
nonsense using all this nonparametric methodology as you are using
parametric methods if you ignore the error structure of the data.
Incidentally, I should point out that George Box fulminated on this very
issue about 50 years ago. In his statistics classes he always used to say
that all the fuss (then) about using non-parametric rank-based methods (e.g.
Mann-Whitney-Wilcoxon) rather than parametric t-statistics was silly since
the t-statistics were relatively insensitive to departures from normality
anyway and it was lack of independence, not exact normality, that was the
key practical issue, and both approaches were sensitive to that. He
published several papers to this effect, of course.

Needless to say, I would welcome other -- especially better informed and
contrary -- views on these issues, either on or off list.

Cheers,

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Park
Sent: Friday, February 23, 2007 7:51 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Repeated measures in Classification and Regresssion Trees

Dear R members,

I have been trying to find out whether one can use multivariate
regression trees (for example mvpart) to analyze repeated measures data.
 As a non-parametric technique, CART is insensitive to most of the
assumptions of parametric regression, but repeated measures data raises
the issue of the independence of several data points measured on the
same subject, or from the same plot over time.

Any perspectives will be welcome,



Andy Park (Assistant Professor)

Centre for Forest Interdisciplinary Research (CFIR),
Department of Biology,
University of Winnipeg,
515 Portage Avenue,
Winnipeg, Manitoba, R3B 2E9,
Canada

Phone: (204) 786-9407

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] investigating interactions with mixed models

2007-02-22 Thread Bert Gunter
?interaction.plot

Should help you. This works on the data, not the model. A 3-way interaction
just means that the 2-way interaction differs among the various levels of
the 3rd factor. Clever use of trellis plots (?xyplot -- especially
?panel.linejoin) gives greater flexibility, but it requires climbing a
steeper learning curve.

In general, the presence of interactions is just another manifestation of
the response varying nonlinearly in the factors (**not** in the parameters,
of course -- it's a linear model after all). This is essentially always the
case, it's just a question of whether the signal/noise ratio (which depends
on sample size) is large enough to see it via P-values. So by all means look
at the plots and try to understand and interpret what's going on; but by no
means assume that p-values above and below a threshold of .05 are a clear
guide to determining this. As usual, statistical significance and scientific
relevance are not equivalent, and the degree of overlap between the two is
often difficult to judge.
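As a minimal illustration of ?interaction.plot (my example, using a built-in data set, not one from this thread): one trace per supplement type shows how the dose effect differs between groups.

```r
## One line per level of supp; non-parallel lines suggest an interaction
with(ToothGrowth,
     interaction.plot(dose, supp, len, type = "b"))
```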

Cheers,
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Robinson
Sent: Thursday, February 22, 2007 2:32 PM
To: R. Baker
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] investigating interactions with mixed models

Hello Rachel,

I don't think that there is any infrastructure for these procedures on
lmer objects, yet.  If you are willing to use lme instead, then the
multcomp package seems to provide post-hoc tests.  It is worth noting
that there is some doubt as to the validity of the reference
distributions for tests of fixed effects in the presence of random
effects. 

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-are-p_002dvalues-not-displa
yed-when-using-lmer_0028_0029_003f

Cheers

Andrew

On Thu, Feb 22, 2007 at 12:32:44PM +, R. Baker wrote:
 I'm investigating a number of dependent variables using mixed models, e.g.
 
 data.lmer45 = lmer(ampStopB ~ (type + stress + MorD)^3 + (1|speaker) + 
 (1|word), data=data)
 
 The p-values for some of the 2-way and 3-way interactions are significant 
 at a 0.05 level and I have been trying to find out how to understand the 
 exact nature of the interactions. Does anyone know if it is possible to
run 
 post-hoc tests on mixed model (lmer) objects? I have read about TukeyHSD 
 but it seems that this can only be run on anova (aov) objects.
 
 Any suggestions would be gratefully appreciated!
 
 Rachel Baker
 
 -- 
 --
 PhD student
 Dept of Linguistics
 Sidgwick Avenue
 University of Cambridge  
 Cambridge
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert to binary to decimal

2007-02-15 Thread Bert Gunter
why not simply:

sum(x * 2^(rev(seq_along(x)) - 1))   ?
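For example, applied to the logical vector from the original post:

```r
x <- c(TRUE, FALSE, TRUE)            # binary 101
sum(x * 2^(rev(seq_along(x)) - 1))   # 5
```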


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374 


Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roland Rau
Sent: Thursday, February 15, 2007 8:22 AM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] convert to binary to decimal

That was a nice quick distraction. Unfortunately, I am not the first to
answer. :-(
Anyway, I offer two solutions (which are different from the one of Marc
Schwartz); I wrote it quickly but I hope they are correct.

Enjoy and thanks,
Roland

a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, FALSE, TRUE, TRUE)

bin2dec.easy <- function(binaryvector) {
  sum(2^(which(rev(binaryvector)==TRUE)-1))
}

bin2dec.recursive <- function(binaryvector) {
  reversed.input <- rev(binaryvector)
  binaryhelper(reversed.input, 0, 0)
}

binaryhelper <- function(binvector, currentpower, currentresult) {
  if (length(binvector) < 1) {
    currentresult
  } else {
    if (binvector[1]) {
      binaryhelper(binvector[-1], currentpower+1,
currentresult+2^currentpower)
    } else {
      binaryhelper(binvector[-1], currentpower+1, currentresult)
    }
  }
}


bin2dec.easy(a)
bin2dec.recursive(a)
bin2dec.easy(b)
bin2dec.recursive(b)





On 2/15/07, Marc Schwartz [EMAIL PROTECTED] wrote:

 On Thu, 2007-02-15 at 16:38 +0100, Martin Feldkircher wrote:
  Hello,
  we need to convert a logical vector to a (decimal) integer. Example:
 
  a=c(TRUE, FALSE, TRUE) (binary number 101)
 
  the function we are looking for should return
 
  dec2bin(a)=5
 
  Is there a package for such a function or is it even implemented in the
  base package? We found the hexmode and octmode command, but not a
  binmode. We know how to program it ourselves however we are looking for
  a computationally efficient algorithm.
 
  Martin and Stefan

 This is a modification of a function that I had posted a while back, so
 that it handles 'x' as a logical vector. I added the first line in the
 function to convert the logical vector to it's numeric equivalent and
 then coerce to character:

 bin2dec <- function(x)
 {
   x <- as.character(as.numeric(x))
   b <- as.numeric(unlist(strsplit(x, "")))
   pow <- 2 ^ ((length(b) - 1):0)
   sum(pow[b == 1])
 }


 a - c(TRUE, FALSE, TRUE)

  bin2dec(a)
 [1] 5

 HTH,

 Marc Schwartz

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] font size in plots

2007-02-14 Thread Bert Gunter
In general, most methods for R's generic plot command (try:
getAnywhere(plot.hclust)) in R's base graphics system accept further
arguments in the (...) portion that provide these sorts of capabilities.
?par will tell you about these further graphical parameters. 
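For example (a hypothetical sketch, not from the thread): label size in a dendrogram plot can be reduced via the cex graphical parameter documented in ?par.

```r
## Shrink the dendrogram labels to 60% of the default size
hc <- hclust(dist(USArrests))
plot(hc, cex = 0.6)
```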


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Federico Abascal
Sent: Wednesday, February 14, 2007 7:42 AM
To: r-help@stat.math.ethz.ch
Subject: [R] font size in plots

Dear members of the list,

it is likely a stupid question but I cannot find the information neither
in R manuals nor in google.

I am generating a plot (from hclust results) but I cannot see properly
the labels because the default font size is too large. How can I change it?

Thanks!
Federico

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Putting splom in a function

2007-02-14 Thread Bert Gunter
Roberto:

You need to do what ?xyplot says. as.symbol(groups) is not a variable and is
certainly not subsettable. If you have a variable named groups in your
data.frame , then ... groups = groups gives the grouping according to the
levels of that variable. If you do not, then it may be picking up a variable
named groups from somewhere else, probably your global workspace, which
may be producing your unexpected results. Or perhaps there is a variable
named groups in your data frame which is not what you think it is. Have you
checked?

In any case, please examine or run the examples in ?xyplot, especially those
that use the group = argument.

One note: I do grant you that the phrase variable or expression may be
confusing in this context. But do note that ?as.expression explicitly says:

 'Expression' here is not being used in its colloquial sense, that of
mathematical expressions. Those are calls (see call) in R, and an R
expression vector is a list of calls etc, typically as returned by parse.  

What is meant by the phrase in the xyplot help is expression in its
colloquial sense of a math (or more generally, any R) expression, not a
formal expression object, which is what the cast as.expression() gives.
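A sketch of one way to make such a wrapper work (my addition, assuming groups is passed as a character string naming a column of data, as in the original call):

```r
library(lattice)

## Look the grouping column up by name and pass the resulting factor directly
multi.scatterplot <- function(data, groups, cols, colors) {
  splom(~data[, cols], groups = data[[groups]],
        panel = panel.superpose, col = colors)
}

p <- multi.scatterplot(iris, "Species", 1:4, c("green", "blue", "red"))
print(p)
```

Passing the actual vector avoids the non-standard evaluation of the groups argument entirely.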

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roberto Perdisci
Sent: Wednesday, February 14, 2007 1:05 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Putting splom in a function

Hello R list,
  I have a little problem with splom. I'd like to wrap it in a
function, for example:

multi.scatterplot <- function(data,groups,cols,colors) {
splom(~data[,cols], groups = as.symbol(groups), data = data, panel
= panel.superpose, col=colors)
}

and then call it like in

multi.scatterplot(iris, "Species", 1:4, c("green","blue","red"))

but the problem is:
Error in form$groups[form$subscr] : object is not subsettable

if I use
  groups = groups
instead of
  groups = as.symbol(groups)

something is plotted, but not the correct scatterplot.

I think the problem is that I don't cast the 'groups' variable to the
correct type. Besides as.symbol() I tried also as.expression(),
because ?xyplot says groups: a variable or expression to be evaluated
in the data frame specified by 'data'.
What is the correct type? What as.* should I use?

thank you,
regards,
Roberto

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Problem with subsets and xyplot

2007-02-07 Thread Bert Gunter
?aggregate says:

... the result is reformatted into a data frame containing the variables in
by and x. The ones arising from by contain the unique combinations of
grouping values used for determining the subsets, and the ones arising from
x the corresponding summary statistics for the subset of the respective
variables in x. 

so meansbymsa does not have the same number of rows as your original data
frame, which it must for subsetting to work properly (meansbymsa[,2] was
recycled to be of the right length by default, which produces the nonsense
you got. See ?xyplot)
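One fix (a sketch with made-up toy data and an assumed column name msaMean, not code from the thread) is to merge the per-MSA means back into the data so the subset condition has exactly one value per row:

```r
## Toy stand-in for the HIV data
dat <- data.frame(msa    = rep(c(200, 520, 720), each = 4),
                  YEAR   = rep(1995:1998, 3),
                  HIVEST = runif(12, 0, 30))

## Per-MSA means, then merge them back row-for-row
meanbymsa <- aggregate(HIVEST ~ msa, data = dat, FUN = mean, na.rm = TRUE)
names(meanbymsa)[2] <- "msaMean"
dat2 <- merge(dat, meanbymsa)   # msaMean now repeats within each MSA

## Then subsetting lines up with the data:
## xyplot(HIVEST ~ YEAR | factor(msa), data = dat2, subset = msaMean < 3.1)
```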


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Peter Flom
Sent: Wednesday, February 07, 2007 12:10 PM
To: [EMAIL PROTECTED]
Subject: [R] Problem with subsets and xyplot

Hello

I have a dataframe that looks like this

 MSA  CITY HIVEST YEAR   YR CAT
1   0200  Albuquerque 0.50 1996 1996   5
2   0520  Atlanta13.00 1997 1997   5
3   0720  Baltimore  29.10 1994 1994   1
4   0720  Baltimore  13.00 1995 1995   5
5   0720  Baltimore   3.68 1996 1996   3
6   0720  Baltimore   9.00 1997 1997   5
7   0720  Baltimore  11.00 1998 1998   5
8   0875  Bergen-Passaic 51.80 1990 1990   5


many more rows

I would like to create some xyplots, but separately for MSAs that are
high, moderate or low on HIVEST.  Here's what I tried

## READ IN DATA AND RECODE SOME VARIABLES
attach(hivest)

cat <- CAT
cat[cat > 5] <- 6


msa <- as.numeric(MSA)
msa[msa == 7361] <- 7360
msa[msa == 7362] <- 7360
msa[msa == 7363] <- 7360

msa[msa == 5601] <- 5600
msa[msa == 5602] <- 5600

msa[msa == 6484] <- 6483


## FIND MEANS FOR EACH MSA, FOR SUBSETTING LATER
meanbymsa <- aggregate(HIVEST, by = list(msa), FUN = mean, na.rm = T)

## meanbymsa[,2] gives me the column I want; the 25th percentile of this
## column is about 3.1.

but when I try

plot1 <- xyplot(HIVEST~YEAR|as.factor(msa), pch = LETTERS[cat],
                subset = (meanbymsa[,2] < 3.1))
plot1


I don't get what I expect.  No errors, and it is a subset, but the
subset is NOT MSAs with low values of HIVEST.


Any help appreciated.


Peter




Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
http://cduhr.ndri.org
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] R in Industry

2007-02-06 Thread Bert Gunter
... two main drawbacks of R at our firm (as viewed by our IT dept) are lack
of
guaranteed support as well as the difficulty in finding candidates.


-- Just an aside: lack of guaranteed support -- absolutely true in theory,
absolutely false in practice. I doubt that the voluntary support found on
r-help and other R lists can be matched by the guaranteed support of any
commercial software product. Not that this makes a difference to the IT
group's requirements, of course...

Cheers,
Bert

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme in R and Splus-7

2007-02-05 Thread Bert Gunter


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of yyan liu
Sent: Monday, February 05, 2007 11:25 AM
To: r-help@stat.math.ethz.ch
Subject: [R] lme in R and Splus-7

Hi:
  I used the function lme in R and Splus-7. With the same dataset and same
argument for the function, I got quite different estimation results from
these two software. Anyone has this experience before?


 Why don't you try searching the archives yourself to see?

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange error in robust package

2007-02-05 Thread Bert Gunter
Probably not worth the effort to try and figure out. Try reinstalling the
latest version of the package and repeating. Maybe something got corrupted.
Also, while you're at it, make sure you have the latest version of R
installed and all your other packages are up to date (robust uses some of
them).
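A guess at the direct cause (my addition, not a substitute for updating): donostah() apparently reads .Random.seed, which exists in the workspace only after the random number generator has been used in the session. Seeding the RNG first creates it:

```r
## .Random.seed is created in the global environment on first RNG use
set.seed(1234)
exists(".Random.seed", envir = globalenv())   # TRUE once the RNG is initialised
```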


Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Monica Pisica
Sent: Monday, February 05, 2007 2:34 PM
To: r-help@stat.math.ethz.ch
Subject: [R] strange error in robust package
Importance: High

Hi everybody,

I am using quite frequently the robust package and until now i never had 
any problems. Actually last time i used it was last Friday very 
successfully.

Anyway, today anytime i want to use the function fit.models i get the 
following error even if i use the example form the help file:

data(woodmod.dat)
woodmod.fm <- fit.models(list(Robust = covRob, Classical = cov), data =
woodmod.dat)

Error in donostah(data, control) : object ".Random.seed" not found
Error in model.list[[i]] : subscript out of bounds

Does anybody know what is wrong?

Thanks,

Monica Palaseanu-Lovejoy
USGS / ETI Pro
St. Petersburg, FL


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Loop with string variable AND customizable summary output

2007-01-29 Thread Bert Gunter
Prior answers are certainly correct, but this is where lists and lapply
shine:

result <- lapply(list(UK, USA), function(z) summary(lm(y ~ x, data = z)))

As in (nearly) all else, simplicity is a virtue.

If you prefer to keep the data sources as a character vector, dataNames,

result <- lapply(dataNames, function(z) summary(lm(y ~ x, data = get(z))))

should work. 

Note: both of these are untested for the general case where they might be
used within a function and may not find the right z unless you pay attention
to scope, especially in the get() construction.


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, January 29, 2007 8:23 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Loop with string variable AND customizable summary output

Dear All,
Thank you very much for your help!
Carlo

-Original Message-
From: Wensui Liu [mailto:[EMAIL PROTECTED]
Sent: Mon 29/01/2007 15:39
To: Rosa,C
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Loop with string variable AND customizable summary output
 
Carlo,

try something like:

for (i in c("UK", "USA"))
{
summ <- summary(lm(y ~ x, subset = (country == i)))
assign(paste('output', i, sep = ''), summ)
}

(note: it is untested, sorry).

On 1/29/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Dear All,

 I am using R for my research and I have two questions about it:

 1) is it possible to create a loop using a string, instead of a numeric
vector? I have in mind a specific problem:

 Suppose you have 2 countries: UK, and USA, one dependent (y) and one
independent variable (y) for each country (vale a dire: yUK, xUK, yUSA,
xUSA) and you want to run automatically the following regressions:



 for (i in c("UK","USA"))

 output{i} <- summary(lm(y{i} ~ x{i}))



 In other words, at the end I would like to have two objects as output:
outputUK and outputUSA, which contain respectively the results of the
first and second regression (yUK on xUK and yUSA on xUSA).



 2) in STATA there is a very nice code (outreg) to display nicely (and as
the user wants to) your regression results.

 Is there anything similar in R / R contributed packages? More precisely, I
am thinking of something that is close in spirit to summary but it is also
customizable. For example, suppose you want different Signif. codes:  0
'***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 or a different format display
(i.e. without t value column) implemented automatically (without manually
editing it every time).

 In alternative, if I was able to see it, I could modify the source code of
the function summary, but I am not able to see its (line by line) code.
Any idea?

 Or may be a customizable regression output already exists?

 Thanks really a lot!

 Carlo

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] strange behaviour with equality after simple subtraction

2007-01-26 Thread Bert Gunter
FAQ on R 7.31. ?all.equal   ?identical  

Have you read these?

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mike Prager
Sent: Friday, January 26, 2007 8:41 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] strange behaviour with equality after simple subtraction

martin sikora [EMAIL PROTECTED] wrote:

 today while trying to extract data from a list for subsequent analysis, i
 stumbled upon this funny behavior on my system:
 
 > x <- c(0.1, 0.9)
 
 > 1 - x[2]
 
 [1] 0.1
 
 > x[1]
 
 [1] 0.1
 
 > x[1] == 1 - x[2]
 
 [1] FALSE
 
 > x[1] > 1 - x[2]
 
 [1] TRUE
 

Not at all strange, an expected property of floating-point
arithmetic and one of the most frequently asked questions here.

 > print(0.1, digits=17)
[1] 0.10000000000000001
 > print(1 - 0.9, digits=17)
[1] 0.099999999999999978
 

A simple description of the issue is at

http://docs.python.org/tut/node16.html

In most cases, it suffices to test for approximate difference or
relative difference. The former would look like this

if (abs(x[1] - x[2]) < eps) ...
 
with eps set to something you think is an insignificant
difference, say 1.0e-10.
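The all.equal() function mentioned earlier in the thread packages exactly this tolerance test (using a relative tolerance of about 1.5e-8 by default):

```r
x <- c(0.1, 0.9)
x[1] == 1 - x[2]                     # FALSE: exact binary comparison
isTRUE(all.equal(x[1], 1 - x[2]))    # TRUE: equal within tolerance
```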


-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Robust PCA?

2007-01-18 Thread Bert Gunter
You seem not to have received a reply.  

You can use cov.rob in MASS or cov.Mcd in robustbase or undoubtedly others
to obtain a robust covariance matrix and then use that for PCA. 
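A minimal sketch of that two-step recipe (my illustration on a built-in data set; MASS ships with R):

```r
library(MASS)

rc <- cov.rob(stackloss)           # robust (MVE-based) covariance estimate
pc <- princomp(covmat = rc$cov)    # PCA on the robust covariance matrix
summary(pc)
```

princomp() accepts a covariance matrix directly via its covmat argument, so any robust covariance estimator can be plugged in this way.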

-- Bert


Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Talbot Katz
Sent: Thursday, January 18, 2007 11:44 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Robust PCA?

Hi.

I'm checking into robust methods for principal components analysis.  There 
seem to be several floating around.  I'm currently focusing my attention on 
a method of Hubert, Rousseeuw, and Vanden Branden 
(http://wis.kuleuven.be/stat/Papers/robpca.pdf) mainly because I'm familiar 
with other work by Rousseeuw and Hubert in robust methodologies.  Of course,

I'd like to obtain code for this method, or another good robust PCA method, 
if there's one out there.  I haven't noticed the existence on CRAN of a 
package for robust PCA (the authors of the ROBPCA method do provide MATLAB 
code).

--  TMK  --
212-460-5430home
917-656-5351cell

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Effect size in GLIM models

2007-01-17 Thread Bert Gunter
Folks:

I think this and several other recent posts on ranking predictors are nice
illustrations of a fundamental conundrum: Empirical models are fit as good
*predictors*; meaningful interpretation of separate parameters/components
of the predictors may well be difficult or impossible, especially in complex
models.  All that the fitting process guarantees if it works well is a good
overall predictor to data sampled from the same process. Unfortunately,
most/much of the time, those who apply the procedures are interested in
interpretation, not prediction.

Addendum: Interpretation is helped by well-designed studies and experiments,
hindered by data mining of observational data.

I don't think any of this is profound, just sometimes forgotten; however, I
would welcome public or private reaction to this comment, and especially
refinement/corrections.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley
Sent: Wednesday, January 17, 2007 6:02 AM
To: Behnke Jerzy
Cc: Reader Tom; r-help@stat.math.ethz.ch
Subject: Re: [R] Effect size in GLIM models

On Wed, 17 Jan 2007, Behnke Jerzy wrote:

 Dear All,
 I wonder if anyone can advise me as to whether there is a consensus as
 to how the effect size should be calculated from GLIM models in R for
 any specified significant main effect or interaction.

I think there is consensus that effect sizes are not measured by 
significance tests.  If you have a log link (you did not say), the model 
coefficients have a direct interpretation via multiplicative increases in 
rates.

 In investigating the causes of variation in infection in wild animals,
 we have fitted 4-way GLIM models in R with negative binomial errors.

What exactly do you mean by 'GLIM models in R with negative binomial 
errors'?  Negative binomial regression is within the GLM framework only 
for fixed shape theta. Package MASS has glm.nb() which extends the 
framework and you may be using without telling us.  (AFAIK GLIM is a 
software package, not a class of models.)

I suspect you are using the code from MASS without reference to the book
it supports, which has a worked example of model selection.

 These are then simplified using the STEP procedure, and finally each of
 the remaining terms is deleted in turn, and the model without that term
 compared to a model with that term to estimate probability

'probability' of what?

 An ANOVA of each model gives the deviance explained by each interaction
 and main effect, and the percentage deviance attributable to each factor
 can be calculated from NULL deviance.

If theta is not held fixed, anova() is probably not appropriate: see the 
help for anova.negbin.

 However, we estimate probabilities by subsequent deletion of terms, and
 this gives the LR statistic. Expressing the value of the LR statistic as
 a percentage of 2xlog-like in a model without any factors, gives lower
 values than the former procedure.

I don't know anything to suggest percentages of LR statistics are 
reasonable summary measures.  There are extensions of R^2 to these models, 
but AFAIK they share the well-attested drawbacks of R^2.

 Are either of these appropriate? If so which is best, or alternatively
 how can % deviance be calculated. We require % deviance explained by
 each factor or interaction,  because we need to compare individual
 factors (say host age) across a range of infections.

 Any advice will be most gratefully appreciated. I can send you a worked
 example if you require more information.

We do ask for more information in the posting guide and the footer of 
every message.  I have had to guess uncomfortably much in formulating my 
answers.

 Jerzy. M. Behnke,
 The School of Biology,
 The University of Nottingham,
 University Park,
 NOTTINGHAM, NG7 2RD
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R editor vs. Tinn-R

2007-01-12 Thread Bert Gunter
Thierry:

Instead of discussing this odd behaviour of TINN-R, I would prefer a
discussion on importing data through the clipboard. In my opinion it isn't a
good a idea to import data with the clipboard. I know that it's a quick and
dirty way to get your data fast into R. 
But I see two major drawbacks. First of all you have no chance of checking
what data you imported. This is important when you need to check your
results a few days (weeks, months or even years) later. A second drawback is
that you won't feel the need to store your data in an orderly fashion. Which
often leads to a huge pile of junk, instead of a valuable dataset...
-

I do not understand this. I do this all the time, easily check the data in R
(which has all sorts of powerful capabilities to do this), and easily store
the data as part of the .Rdata file that also contains functions,
transformations, analyses, etc. that I have used on the data. I do not know
what is more orderly and useful than that! So would you care to
elaborate?

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eval(parse(text vs. get when accessing a function

2007-01-05 Thread Bert Gunter
??

Or to add to what Peter Dalgaard said... (perhaps for the case of many more
functions)

Why eval(parse())? What's wrong with "if" then? 

g <- function(fpost,x){if(fpost==1) f.1 else f.2}(x)

or switch() if you have more than 2 possible arguments? I think your remarks
reinforce the wisdom of Thomas's axiom.
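A minimal sketch of the switch() idea, assuming two toy functions f.1 and f.2 like those in the original question (names and bodies illustrative):

```r
## Toy functions, as in the question being answered
f.1 <- function(x) x + 1
f.2 <- function(x) x + 2

## Dispatch on the postfix without parse(): switch() matches on the
## character form of fpost and returns the chosen function itself
g <- function(x, fpost) {
  f <- switch(as.character(fpost),
              "1" = f.1,
              "2" = f.2,
              stop("unknown postfix: ", fpost))
  f(x)
}

g(10, 1)  # 11
g(10, 2)  # 12
```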

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ramon Diaz-Uriarte
Sent: Friday, January 05, 2007 10:02 AM
To: r-help; [EMAIL PROTECTED]
Subject: [R] eval(parse(text vs. get when accessing a function

Dear All,

I've read Thomas Lumley's fortune "If the answer is parse() you should
usually rethink the question.". But I am not sure if that also applies (and why) to
other situations (Lumley's comment 
http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html
was in reply to accessing a list).

Suppose I have similarly called functions, except for a postfix. E.g.

f.1 <- function(x) {x + 1}
f.2 <- function(x) {x + 2}

And sometimes I want to call f.1 and some other times f.2 inside another 
function. I can either do:

g - function(x, fpost) {
calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
calledf(x)
## do more stuff
}


Or:

h - function(x, fpost) {
calledf <- get(paste("f.", fpost, sep = ""))
calledf(x)
## do more stuff
}


Two questions:
1) Why is the second better? 

2) By changing g or h I could use do.call instead; why would that be
better? 
Because I can handle differences in argument lists?
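For what it is worth, a hedged sketch of the do.call() variant (the function bodies and the extra argument k are purely illustrative):

```r
f.1 <- function(x) x + 1             # illustrative
f.2 <- function(x, k = 0) x + 2 + k  # illustrative: different argument list

## do.call() takes a function name plus a *list* of arguments, so
## variants with differing argument lists can be called uniformly
h2 <- function(x, fpost, ...) {
  do.call(paste("f.", fpost, sep = ""), list(x, ...))
}

h2(10, 1)         # 11
h2(10, 2, k = 5)  # 17
```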



Thanks,


R.



-- 
Ramón Díaz-Uriarte
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)



**NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en s...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] Some Windows code for GUI-izing workspace loading

2007-01-04 Thread Bert Gunter
 
Folks:

Motivated by the recent thread on setting working directories, below are a
couple of functions for GUI-izing saving and loading files **in Windows
only** that sort of takes care of this automatically.  The simple strategy
is just to maintain a file consisting of the filenames of recently saved
workspace (.Rdata, etc.)files. Whenever I save a workspace via the function
mySave() below, the filename is chosen via a standard Windows file browser,
and the filename where the workspace was saved is added to the list if it
isn't already there. The recent() function then reads this file and brings
up a GUI standard Windows list box (via select.list()) of the first k
filenames (default k = 10) to load into the workspace **and** sets the
working directory to that of the first file loaded (several can be brought
in at once).

I offer these functions with some trepidation: they are extremely simple and
unsophisticated, and you definitely use them at your own risk. There is no
checking nor warning for whether object names in one loaded file duplicate
and hence overwrite those in another when more than one is loaded, for
example. Nevertheless, I have found the functions handy, as I use the
recently used files options on all my software all the time and wanted to
emulate this for R.

Suggestions for improvement (or better yet, code!) or information about bugs
or other stupidities gratefully appreciated.

Cheers,

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


 Code Follows  #

mySave <-
function(recentlistFile=paste("c:/Program Files/R", "recentFiles.txt", sep="/"),
         savePlots=FALSE)
{
## DESCRIPTION:
## Use a Windows GUI to save the current workspace.

## ARGUMENTS:
## recentlistFile: a quoted character string giving the full
##   pathname/filename of the file containing the list of recent files.
##   This must be the same as the filename argument of recent().
##   The default saves the file in the global R program directory, which
##   means it does not have to be changed when updating to new versions of R,
##   which I store under the global R directory. You may need to change this
##   if you have a different way of doing things.
##
## savePlots: logical. Should the .SavedPlots plot history be saved? This
##   object can be quite large, and not saving it often makes saving and
##   loading much faster, as well as avoiding memory problems. The default
##   is not to save.

if(!savePlots) if(exists(".SavedPlots", where=1)) rm(.SavedPlots, pos=1)

fname <- choose.files(caption='Save As...',
                      filters=Filters['RData',], multi=FALSE)
if(fname != ""){
    save.image(fname)
    if(!file.exists(recentlistFile)) write(fname, recentlistFile, ncol=1)
    else{
        nm <- scan(recentlistFile, what="", quiet=TRUE, sep="\n")
        ## remove duplicate filenames and list in LIFO order
        write(unique(c(fname, nm)), recentlistFile, ncol=1)
    }
}
else cat('\nWorkspace not saved\n')
}



recent <-
function(filename=paste("c:/Program Files/R", "recentFiles.txt", sep="/"),
         nshow=10, setwork=TRUE)
{

## DESCRIPTION:
## GUI-izes workspace loading by bringing up a select box of files
## containing recently saved workspaces to load into R.

## ARGUMENTS:
## filename: character. The full path name of the file containing the
##   file list, which is a text file with the filenames, one per line.
##
## nshow: the maximum number of paths to show in the list.
##
## setwork: logical. Should the working directory be set to that of the
##   first file loaded?

## find the file containing the filenames if it exists
if(!file.exists(filename))
    stop("File containing recent files list cannot be found.")
filelist <- scan(filename, what=character(), quiet=TRUE, sep='\n')
len <- length(filelist)
if(!len) stop("No recent files")
recentFiles <- select.list(filelist[1:min(nshow, len)], multiple=TRUE)
if(!length(recentFiles)) stop("No files selected")
i <- 0
for(nm in recentFiles){
    if(file.exists(nm)){
        load(nm, envir=.GlobalEnv)
        i <- i + 1
        if(i == 1 && setwork) setwd(dirname(nm))
    }
    else cat('\nFile', nm, 'not found.\n')
}
cat('\n\n', i, paste(' file', ifelse(i == 1, '', 's'), ' loaded\n', sep=''))
}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] na.action and simultaneous regressions

2007-01-03 Thread Bert Gunter
Ravi:

You misinterpreted my reply -- perhaps I was unclear. I did **not** say that
lm() with a matrix response would do it, but that the apply construction or
an explicit loop would. As you and the poster noted, lm() produces a
separate fit to each column of only the rowwise complete data.


Bert Gunter


-Original Message-
From: Ravi Varadhan [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 03, 2007 2:15 PM
To: 'Bert Gunter'; 'Talbot Katz'; r-help@stat.math.ethz.ch
Subject: RE: [R] na.action and simultaneous regressions

No, Bert, lm doesn't produce a list each of whose components is a separate
fit using all the nonmissing data in the column.  It is true that the
regressions are independently performed, but when the response matrix is
passed from lm on to lm.fit, only the complete rows are passed, i.e.
rows with no missing values.  I looked at lm function, but it was not
obvious to me how to fix it.  

In the following toy example, the degrees of freedom for y1 regression
should be 18 and that for y2 should be 15, but both degrees of freedom are
only 15.

> y1 <- runif(20)
> y2 <- c(runif(17), rep(NA,3))
> x <- rnorm(20)
> summary(lm(cbind(y1,y2) ~ x))
Response y1 :

Call:
lm(formula = y1 ~ x)

Residuals:
 Min   1Q   Median   3Q  Max 
-0.52592 -0.22632 -0.00964  0.25117  0.31227 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.56989    0.06902   8.257 5.82e-07 ***
x           -0.12325    0.06516  -1.891    0.078 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.2798 on 15 degrees of freedom
Multiple R-Squared: 0.1926, Adjusted R-squared: 0.1387 
F-statistic: 3.577 on 1 and 15 DF,  p-value: 0.07804 


Response y2 :

Call:
lm(formula = y2 ~ x)

Residuals:
 Min   1Q   Median   3Q  Max 
-0.48880 -0.28552 -0.06022  0.23167  0.54425 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.43712    0.07686   5.687 4.31e-05 ***
x            0.10278    0.07257   1.416    0.177    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.3115 on 15 degrees of freedom
Multiple R-Squared: 0.118,  Adjusted R-squared: 0.05915 
F-statistic: 2.006 on 1 and 15 DF,  p-value: 0.1771 


Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: [EMAIL PROTECTED]

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bert Gunter
Sent: Wednesday, January 03, 2007 4:46 PM
To: 'Talbot Katz'; r-help@stat.math.ethz.ch
Subject: Re: [R] na.action and simultaneous regressions

As the Help page says:

"If response is a matrix a linear model is fitted separately by least-squares
to each column of the matrix"

So there's nothing hidden going on behind the scenes, and
apply(cbind(y1,y2),2,function(z)lm(z~x)) (or an explicit loop, of course)
will produce a list each of whose components is a separate fit using all the
nonmissing data in the column. 
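A small sketch of that construction, using toy data shaped like Ravi's example elsewhere in this thread (y2 has three NAs; the seed is an arbitrary choice for reproducibility):

```r
set.seed(1)
x  <- rnorm(20)
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))    # three missing responses

## One lm() per column: each fit drops only its *own* missing rows,
## so the residual degrees of freedom differ (18 for y1, 15 for y2)
fits <- apply(cbind(y1, y2), 2, function(z) lm(z ~ x))

sapply(fits, df.residual)
```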

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Talbot Katz
Sent: Wednesday, January 03, 2007 11:56 AM
To: r-help@stat.math.ethz.ch
Subject: [R] na.action and simultaneous regressions

Hi.

I am running regressions of several dependent variables using the same set 
of independent variables.  The independent variable values are complete, but

each dependent variable has some missing values for some observations; by 
default, lm(y1~x) will carry out the regressions using only the observations

without missing values of y1.  If I do lm(cbind(y1,y2)~x), the default will 
be to use only the observations for which neither y1 nor y2 is missing.  I'd

like to have the regression for each separate dependent variable use all the

non-missing cases for that variable.  I would think that there should be a 
way to do that using the na.action option, but I haven't seen this in the 
documentation or figured out how to do it on my own.  Can it be done this 
way, or do I have to code the regressions in a loop?  (By the way, since it 
restricts to non-missing values in all the variables simultaneously, is this

because it's doing some sort of SUR or other simultaneous equation 
estimation behind the scenes?)

Thanks!

--  TMK  --
212-460-5430home
917-656-5351cell

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sorting by name

2006-12-14 Thread Bert Gunter
This is trivial.

help("[") and "An Introduction to R" will tell you how.
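For the archives, a minimal sketch of the indexing that is meant (the column names and ordering vector are illustrative):

```r
dat <- data.frame(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12)
ord <- c(2, 3, 1, 4)      # the numbers read from the second file

dat2 <- dat[, ord]        # columns reordered to X2 X3 X1 X4
names(dat2)
```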

P.S. As earlier posts today have mentioned, stepwise variable selection is
generally a bad idea.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brooke LaFlamme
Sent: Thursday, December 14, 2006 4:34 PM
To: r-help@stat.math.ethz.ch
Subject: [R] sorting by name

Hi all,

I'm not sure that there is really a way to do this, but I thought I'd see if
anyone knew.

I have a file with 1 to n columns all named something like X1, X2, X3, ..., Xn.

I have another file that has in one column n number of rows. Each row has a
number in it (not in order; the ordering of the numbers is important but it
isn't in count order).

Basically, I would like to order the columns in the first file by the
numbers in the rows of the second file. So, if file#2 has these numbers in
rows 1-4:

 [,1]  
 [1,]   2 
 [2,]   3 
 [3,]   1 
 [4,]   4

I would like the first file to look like this:

X2 X3 X1 X4 
1
 Instead of the original order:

X1 X2 X3 X4 
1

Is this possible? 

The point of this all is to run a stepwise linear regression that first
regresses on X2, then adds in X3, X1, X4 in that order, stopping at each
step to assess whether to drop one or more of the previously added
variables. 

Thank you in advance for any suggestions!

Brooke LaFlamme

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] 2 questions

2006-12-12 Thread Bert Gunter
Warning: Something of a personal rant, clearly reflecting my own hangups,
and having nothing to do with my company or anyone else! A good reason for
most to stop reading now

While you will find respondents on this list on the whole quite gracious in
their willingness to help newbies learn R, there are limits to this
patience. In particular, questions of the sort below seem to me, at least to
be clear announcements that the original poster has **not** read the posting
guide, nor has made an honest attempt to learn R by studying what I think is
quite good basic documentation (see, for example, An Introduction to R;
CRAN lists many more similar resources). While I grant that sometimes the
online help is a bit terse, I don't think that anyone who has made an honest
attempt to read the basic docs would ask such questions. If I am wrong in
this, I apologize. But, if not, then I consider the questions unworthy of my
time to respond to. Whether right or wrong, queries posted in a way that
conveys that impression are less likely to elicit good replies. I guess the
moral is that on this list anyway, good behavior is rewarded, and bad
behavior is ignored.

Cheers,
Bert Gunter

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Van Campenhout Bjorn
Sent: Tuesday, December 12, 2006 7:51 AM
To: r-help
Cc: [EMAIL PROTECTED]
Subject: Re: [R] 2 questions

Hi!

 I'm new here. Want to ask two possibly quite basic questions:

 1. How can I clear all objects in one stroke?

how about

rm(ls())

try

rm(list=ls())


Bjorn
 2. How can I perform a regression with independent variables specified by
 an object?

Dh, no spontaneous idea.

Greetings,

Sebastian
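Question 2 went unanswered above; one common approach (an assumption on my part, not from the thread) is to build the formula from a character vector of variable names with reformulate():

```r
## Illustrative: predictor names held in an object
vars <- c("Sepal.Width", "Petal.Length")
fm   <- reformulate(vars, response = "Sepal.Length")
fm    # Sepal.Length ~ Sepal.Width + Petal.Length

fit <- lm(fm, data = iris)
coef(fit)
```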


 Thanks,

 Tim

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]



Re: [R] combinations of m objects into r groups

2006-12-12 Thread Bert Gunter
This issue has come up before:

RSiteSearch("nkpartitions") 

will find references for you on CRAN.

You might also try
http://ranau.cs.ui.ac.id/book/AlgDesignManual/BOOK/BOOK4/NODE153.HTM

for some background, or google on set partitions.

Bottom line: it ain't trivial.
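For the special case of two nonempty, unlabeled groups, though, the splits can be enumerated directly with binary codes; a sketch (this yields 2^(m-1) - 1 = 31 splits for m = 6; counts differ if empty groups or group orderings are allowed):

```r
two.group.splits <- function(m) {
  ## Fix object 1 in group 1 so mirror-image splits are not counted twice;
  ## each remaining object goes to group 1 or group 2 via a binary code
  codes <- 1:(2^(m - 1) - 1)            # code 0 (everyone in group 1) excluded
  sapply(codes, function(k) {
    bits <- as.integer(intToBits(k))[1:(m - 1)]
    c(1, bits + 1)                      # group label (1 or 2) per object
  })
}

s <- two.group.splits(6)
dim(s)   # 6 objects (rows) by 31 splits (columns)
```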

Cheers,

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Maria Montez
Sent: Tuesday, December 12, 2006 4:07 PM
To: r-help@stat.math.ethz.ch
Subject: [R] combinations of m objects into r groups

Hi!

Suppose I have m objects. I need to find out what are all possible ways 
I can group those m objects into r groups. Moreover, I need to create a 
matrix that contains what those arrangements are. I've created code for 
when r=2 but I've come to a halt when trying to generalize it into r groups.

For example, if I have m=6 objects and I want to arrange them into 
groups of r=2, there are a total of 41 possible arrangements. I would 
like a matrix of the form (showing only 9 possible arrangements):

  c1 c2 c3 c4 c5 c6 c7 c8 c9
1  1  2  2  2  2  2  1  1  1
2  2  1  2  2  2  2  1  2  2
3  2  2  1  2  2  2  2  1  2
4  2  2  2  1  2  2  2  2  1
5  2  2  2  2  1  2  2  2  2
6  2  2  2  2  2  1  2  2  2

This means that arrangement c1 puts object 1 into group 1 and all other 
objects into group 2.

I've created code for this particular example with two groups. I'm using 
the subsets function which I've found posted online, in a post that 
references page 149 of Venables and Ripley (2nd ed).

# subsets() computes all possible combinations of n objects taken r at a time
subsets <- function(r, n, v = 1:n)
{
if(r <= 0) NULL else
if(r >= n) v[1:n] else
rbind(cbind(v[1], Recall(r-1, n-1, v[-1])), Recall(r, n-1, v[-1]))
}
# labels for objects
r <- c("1100","1010","1001","0110","0101","0011")
m <- length(r)
for (k in 1:trunc(m/2)){
  a <- subsets(k, m)
  for (i in 1:dim(a)[1]){
    sub <- rep(2, m)
    b <- a[i,]
    for (j in 1:length(b)){
      sub[b[j]] <- 1
    }
    r <- data.frame(r, sub)
  }
}
names <- c("xcomb")
for (i in 1:(dim(r)[2]-1)) {
  names <- c(names, paste("c", i, sep=""))
}
names(r) <- names

Any suggestions?

Thanks, Maria









__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Remove "" from a string

2006-12-08 Thread Bert Gunter
 I second Marc's comments below, but for amusement, another alternative to
the (undesirable) eval(call()) construction is:

> foo <- function(x) x^2
> get("foo")(1:5)
[1]  1  4  9 16 25

I believe this is equally undesirable, however, and as Marc said, making
your function a function of two arguments or something similar would be the
better approach.
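A tiny sketch of that two-argument idea: keep the numbered variants in a list and index by number (the function bodies are illustrative):

```r
## Instead of NameOfFunction1 ... NameOfFunction98, store the variants
fns <- list(function(x) x + 1,
            function(x) x * 2)

call.variant <- function(number, x) fns[[number]](x)

call.variant(1, 10)  # 11
call.variant(2, 10)  # 20
```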


Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz
Sent: Friday, December 08, 2006 6:14 AM
To: Katharina Vedovelli
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Remove "" from a string

On Fri, 2006-12-08 at 14:57 +0100, Katharina Vedovelli wrote:
 Hi all!
 
 I have lots of functions called in the following pattern
 'NameOfFunctionNumber' where the name always stays the same and the number
 varies from 1 to 98.
 Another function which I run in advance returns the number of the function
 which has to be called next.
 
 Now I want to combine 'NameOfFunction' with the 'Number' returned so that I
 can call the desired function.
 I do this by:
 
 x <- c("NameOfFunction", Number)
 z <- paste(x, collapse="")
 z
 
 which returns
 
 "NameOfFunctionNumber"
 
 My problem is that R doesn't recognise this as the name of my function
 because of the "" at the beginning and the end.
 Is there a way of getting rid of those? Or does anybody know another way
of
 solving this problem?
 
 Thanks a lot for your help!
 Cheers,
 Katharina

It is not entirely clear what your ultimate goal is, thus there may be a
(much) better approach than calling functions in this manner. What do
the functions actually do and does the output vary based upon some
attribute (ie. the class) of the argument such that using R's typical
function dispatch method would be more suitable.

However, to address the specific question, at least two options:

> NameOfFunction21 <- function(x) x^2

> eval(call(paste("NameOfFunction", 21, sep = ""), 21))
[1] 441

> do.call(paste("NameOfFunction", 21, sep = ""), list(21))
[1] 441

In both cases, the result is to evaluate the function call, with 21 as
the argument.  See ?call, ?eval and ?do.call for more information.

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Summary shows wrong maximum

2006-12-06 Thread Bert Gunter
 
Folks:

Is 

So this is at best a matter of opinion, 
and credentials do matter for opinions.

-- Brian Ripley

an R fortunes candidate?

-- Bert Gunter


On Tue, 5 Dec 2006, Oliver Czoske wrote:

 On Mon, 4 Dec 2006, Uwe Ligges wrote:
 Sebastian Spaeth wrote:
 Hi all,
 I have a list with a numerical column cum_hardreuses. By coincidence I
 discovered this:

 max(libs[,"cum_hardreuses"])
 [1] 1793

 summary(libs[,"cum_hardreuses"])
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       1       2       4      36      14    1790

 (note the max value of 1790) Ouch this is bad! Anything I can do to
remedy
 this? Known bug?

 No, it's a feature! See ?summary: printing is done up to 3 significant
 digits by default.

 Unfortunately, '1790' is printed with *four* significant digits, not
 three. The correct representation with three significant digits would have
 to employ scientific notation, 1.79e3.



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Usage of apply

2006-12-06 Thread Bert Gunter

But do note -- again! -- that the apply family of functions do their magic
**internally through looping**, so that they are generally not much faster
-- and sometimes a bit slower -- than explicit loops. Their chief advantage
(IMO, of course) is in code clarity and correctness, which is why I prefer
them. (They are also written to do their looping as efficiently as possible,
which explicit looping in user code may not.)

Of course, vectorized calculations (colMeans() in the example below) **are**
much faster and usually clearer than explicit loops.
 
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chuck Cleland
Sent: Wednesday, December 06, 2006 6:54 AM
To: R Help
Subject: Re: [R] Usage of apply

Jin Shusong wrote:
 Dear R Users,
   
   Are there any documents on the usage of apply, tapply,
 sapply so that I avoid explicit loops.  I found that these
 three functions were quite hard to be understood.  Thank you
 in advance.

  If you have read the help pages for each and possibly even consulted
the reference on those help pages, you may need to elaborate on what
parts of these functions you don't understand.  You might also describe
a loop you are contemplating and ask how it might be replaced by one of
these functions.
  Here is a very simple example of a loop that could be avoided with one
of these functions:

> for(i in 1:4){print(mean(iris[,i]))}
[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333

  Here is how you would do that with apply():

> apply(iris[,1:4], 2, mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 

  Even better in this particular case would be:

> colMeans(iris[,1:4])
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 

  but you don't always want mean() or sum() as the function, so the
functions you mention above are more general than colMeans() and similar
functions.
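A brief sketch of the other two functions asked about, on the same iris data:

```r
## sapply(): apply a function over a vector/list, simplifying the result
sapply(1:4, function(i) mean(iris[, i]))

## tapply(): apply a function within groups defined by a factor
tapply(iris$Sepal.Length, iris$Species, mean)
```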

 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Summary shows wrong maximum

2006-12-06 Thread Bert Gunter
Mike:

I offered no opinion -- and really didn't have any -- about the worthiness
of any of the comments that were made. I just liked Brian's little quotable
aside.

But since you bait me a bit ...

In general, I believe that showing the 2-3 most important -- **not
significant** -- digits **and no more** is desirable. By  most important I
mean the leftmost digits which are changing in the data (there are some
caveats in the presence of extreme outliers). Printing more digits merely
obfuscates the ability of the eye/brain to perceive the patterns of change
in the data, the presumed intent of displaying it (not of storing it, of
course). Displaying excessive digits to demonstrate (usually falsely) one's
precision is evil. Clarity of communications is the standard we should
aspire to.

These views have been more eloquently expressed by A.S.C. Ehrenberg and
Howard Wainer among others...

-- Bert


Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mike Prager
Sent: Wednesday, December 06, 2006 11:46 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Summary shows wrong maximum

I don't know about candidacy, and I'm not going to argue about
correctness, but it seems to me that the only valid reasons to
limit precision of printing in a statistics program are (1) to
save space and (2) to allow for machine limitations. This is
neither. To chop off information and replace it with zeroes is
just plain nasty.


Bert Gunter [EMAIL PROTECTED] wrote:

  
 Folks:
 
 Is 
 
 So this is at best a matter of opinion, 
 and credentials do matter for opinions.
 
 -- Brian Ripley
 
 an R fortunes candidate?
 
 -- Bert Gunter
 
 
 On Tue, 5 Dec 2006, Oliver Czoske wrote:
 
  On Mon, 4 Dec 2006, Uwe Ligges wrote:
  Sebastian Spaeth wrote:
  Hi all,
  I have a list with a numerical column cum_hardreuses. By coincidence
I
  discovered this:
 
  max(libs[,"cum_hardreuses"])
  [1] 1793
 
  summary(libs[,"cum_hardreuses"])
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        1       2       4      36      14    1790
 
  (note the max value of 1790) Ouch this is bad! Anything I can do to
 remedy
  this? Known bug?
 
  No, it's a feature! See ?summary: printing is done up to 3 significant
  digits by default.
 
  Unfortunately, '1790' is printed with *four* significant digits, not
  three. The correct representation with three significant digits would
have
  to employ scientific notation, 1.79e3.
 
 

-- 
Mike Prager, NOAA, Beaufort, NC
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] stat question - not R question so ignore if not interested

2006-12-05 Thread Bert Gunter
... But of course this is always the question underlying all empirical -- or
maybe even scientific -- analysis: is there some other perhaps more
fundamental variable out there that I'm missing that would explain what's
really going on?

I clearly remember George Box commenting on this in his Monday night beer
and statistics sessions: after you're done and perhaps have written up and
presented your (intricate!) analysis, you're always worried that someone
might come along and say, Well, did you consider...?

Cheers,
Bert 

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jonathan Baron
Sent: Tuesday, December 05, 2006 1:45 PM
To: Richard M. Heiberger
Cc: r-help@stat.math.ethz.ch; C. Park; Leeds,Mark (IED)
Subject: Re: [R] stat question - not R question so ignore if not interested

A classic example used by my colleague Paul Rozin (when he
teaches Psych 1) is to compute the correlation between height
and number of shoes owned, in the class.  Shorter students own
more shoes.  But ...

On 12/05/06 16:34, Richard M. Heiberger wrote:
 The missing piece is why there are two clusters.  There is
 most likely a two-level factor distinguishing the groups
 that was not included in the model.  It might not even have
 been measured and now you need to find it.
 
 Rich
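A small R simulation of the situation described above: an unmeasured two-level factor (here called `group`, a hypothetical name) shifts both variables, producing two clusters and an apparent correlation that vanishes within each group:

```r
set.seed(1)
group <- rep(c(0, 1), each = 50)           # the unmeasured two-level factor
x <- rnorm(100, mean = 10 + 5 * group)     # both variables shift with group
y <- rnorm(100, mean = 20 - 5 * group)

cor(x, y)                                  # strongly negative overall

## within each level of the hidden factor, x and y are unrelated
by(data.frame(x, y), group, function(d) cor(d$x, d$y))
```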

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] tests for NULL objects

2006-11-29 Thread Bert Gunter
Merely convention.

NULL == 2 evaluates to logical(0), that is, a logical vector of length 0. It
makes sense (at least to me) that any(logical(0)) is FALSE, since no elements of
the vector are TRUE. all(logical(0)) is TRUE since no elements of the vector
are FALSE.

I think these are reasonable and fairly standard conventions, but even if
you disagree, they are certainly not worth making a fuss over and certainly
cannot be changed without breaking a lot of code, I'm sure. 
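The convention is easy to verify directly (a minimal vacuous-truth sketch):

```r
NULL == 2                  # logical(0): comparing with NULL yields length 0
length(NULL == 2)          # 0

all(logical(0))            # TRUE  -- no element is FALSE
any(logical(0))            # FALSE -- no element is TRUE

## To test for NULL itself, use is.null() rather than ==
is.null(NULL)              # TRUE
```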

Bert Gunter
Nonclinical Statistics
7-7374

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Benilton Carvalho
Sent: Wednesday, November 29, 2006 2:21 PM
To: R-Mailingliste
Subject: [R] tests for NULL objects

Hi Everyone,

After searching the subject and not being successful, I was wondering  
if any you could explain me the idea behind the following fact:

all(NULL == 2)  ## TRUE
any(NULL == 2) ## FALSE

Thanks a lot,

Benilton

--
Benilton Carvalho
PhD Candidate
Department of Biostatistics
Johns Hopkins University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


