Re: [R] understanding eigen(): getting non-normalized eigenvectors

2003-06-10 Thread Prof Brian Ripley
Eigenvectors are defined only up to a scalar constant (assuming distinct 
eigenvalues).  However, your `by hand' answer does not pass the simple 
test Av = lambda v for some lambda.  So you cannot reproduce incorrect
answers in R!

Your example is unusual: A is of rank 1.

On 9 Jun 2003, Christoph Lehmann wrote:

 Hi, dear R pros
 
 I try to understand eigen(). I have seen, that eigen() gives the
 eigenvectors normalized to unit length.
 
 What shall I do to get the eigenvectors not normalized to unit length?

Multiply them by any randomly chosen non-zero scalar!

 E.g. take the example:
 
  A
  
[,1]   [,2]
   V1  0.7714286 -0.2571429
   V2 -0.4224490  0.1408163
 
 Calculating eigen(A) by hand gives the eigenvectors (example from
 Backhaus, multivariate analysis):
 
  0.77143  and 0.25714
 -0.42245  0.14082

The second is not an eigenvector of A: try it!  They look like rounded
versions of A with a sign error.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] binomial GAM ROC

2003-06-10 Thread David Nogués
Hello

Im a beginner on R and I would like how to develop a ROC statistic to 
evaluate a GAM model with a binomial distribution (Im using  mgcv package)

Thanks in advance

--
David Nogués Bravo
Functional Ecology and Biodiversity Department
Pyrenean Institute of Ecology
Spanish Research Council
Av. Montañana 1005
Zaragoza - CP 50059
976716030 - 976716019 (fax)
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] c(...) and methods

2003-06-10 Thread Marsland, John
I have been writing some S4 classes and have a problem about how I might
pass a signature to c().

Take the following example:

setClass(collection, representation(list, date=POSIXt))

x - new(collection, list(1,2,3), date=Sys.time())
y - new(collection, list(4,5,6), date=Sys.time())

obviously, I can do c(x,y), but this wil be of class list

I would like to do something like:

setMethod(c, signature(collection),
function(...) {
if([EMAIL PROTECTED]@date) stop(at different dates!)
res - c(as(e1, list),as(e2, list))
new(collection, res, date = [EMAIL PROTECTED])
})

but c() takes ... as its arguments and I don't know how I might reference
that with a signature and appropriate arguments etc.

Has anybody an ideas? I presume it doesn't matter that c() is
.Primative(c)?

Regards, 

John Marsland


** 
This is a commercial communication from Commerzbank AG.\ \ This ... {{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] ESRI shapefiles and EMME/2 packages

2003-06-10 Thread Hisaji Ono
Hi.

 Currently I couldn't  your uploaded packages in CRAN at Austria.

 Where could I get them currently?

 And do you know Rmap(http://www.maths.lancs.ac.uk/Software/Rmap/) which
supports shape files and other geo-spatial data formats with map projection?

 Regards.

- Original Message - 
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, June 10, 2003 6:57 AM
Subject: [R] ESRI shapefiles and EMME/2 packages


 I just uploaded two packages to CRAN.

 shapefiles_0.1.tar.gz - functions to read and write ESRI shapefiles
 (including dbfs)
 emme2_0.1.tar.gz - functions to read binary data from an EMME/2 databank
 data (EMME/2 is a transportation modeling program)

 Please let me know if you find any bugs or have some suggestions.  Thanks.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] SOM random seed

2003-06-10 Thread Prof Brian Ripley
First, *which* SOM routine, from which package?
Credit where credit is due and all that 

If you mean the one in class(VR), it uses the standard R random number
generation.  You don't need to `pass the random-seed as an argument
somehow', just set the seed as you would anyway (e.g. via set.seed).

For example

 library(MASS)
 lcrabs - log(crabs[, 4:8])
 crabs.grp - factor(c(B, b, O, o)[rep(1:4, rep(50,4))])
 gr - somgrid(topo = hexagonal)
 old.seed - .Random.seed  ## save old seed if you want to
 set.seed(123)
 crabs.som - SOM(lcrabs, gr)
 .Random.seed - old.seed  ## restore the seed

On Tue, 10 Jun 2003, Jonck van der Kogel wrote:

 I have a question about the SOM routine. You can either supply the 
 initial representatives for the lattice yourself or else they are 
 chosen randomly from the dataset. Is it possible to pass the 
 random-seed as an argument somehow, when choosing the random 
 initialisation of the lattice?
 As it is now, each time I run a SOM on a dataset with the same settings 
 the resulting SOM will still be slightly different due to the random 
 initialisation.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


RE: [R] Regression output labels

2003-06-10 Thread Pfaff, Bernhard
 Hello to all-
   1.  When I run a regression which implements the 
 augmented Dickey-Fuller
 test, I am confused about the names given to the regressors 
 in the output.
 I understand what xGE stands for in a standard lm test 
 involving an
 independent variable GE for instance, but if I lags and or 
 differences are
 included in the model, what do the following output stand for:
   xlag(x,-1)GE
   xD.GE
   xD.lag(diff(x), -i)GE
   xD.D.lag(diff(x), -i)GE
   Thanks for the clarifications -- I don't want to 
 misspeculate on the
 actual interpretations, here...
 

Hello rwatkins,

I assume that you are referring to the function:
http://www.econ.uiuc.edu/~econ472/adf.R.txt


adf-
function(x, L = 2, int = T, trend = T)
{
#Construct Data for Augmented Dickey Fuller Model with L lags.
#This is a modified version for R, in which the command rts was substituted
by ts.
x - ts(x)
D - diff(x)
if(L  0) {
for(i in 1:L)
D - ts.intersect(D, lag(diff(x),  - i))
}
D - ts.intersect(lag(x, -1), D)
if(trend == T)
D - ts.intersect(D, time(x))
y - D[, 2]
x - D[, -2]
if(int == T)
summary(lm(y ~ x))
else summary(lm(y ~ x - 1))
}

As you can see by the code the lagged differences of the series are produced
repetitively in the for-loop; the function argument L sets the maximum
number of lagged differences of the time series to be tested for unit root.
After the for-loop the one period lag of the original variable is included
to D as first columne. In case of a trend inclusion, this variable is
included as last columne:

xD.lag(x, -1): is the one period lag of the original series;
xD.D.D.lag(diff(x), -i): is the one period lag of the differenced series;
xD.D.lag(diff(x), -i): is the two period lag of the differenced series and
xtime(x): is the linear time trend.  

Pls. note, that you should include as many lagged differenced x series
(function argument L) until the residuals of the test regression do not
contain any significant autocorrelations (check their acfs and/or pacfs). 
 Also...
   2.  When an Engle-Granger test is run on multiple 
 independent variables,
 only one cointegration vector is returned.  Can one tell 
 which vector --
 or what two variables' relationship  -- is being identified for the R
 output.  Likewise, if I run a Johansen test, does R tell me 

To which function are you referring in particular?

 specifically
 which pairs of variables are cointegrated or do I just get the rank?
 
   Thanks to all for your time and consideration.
 

If you apply the Engle-Granger Two-Step-procedure in case of more than two
I(1) variables and if more than one cointegration-relationship exists you
are estimating a linear combination of the latter. In case you apply a
Johansen test (rank- or trace-test) you are only testing the space of
cointegrating vectors. In order to draw inferences about the
cointegration-relationships you must test these specifically.

As a suggestion for reading and further elaboration let me allow you to hint
you to the following literature:

1)
Hamilton, James D. (1994). Time Series Analysis. Princeton N.J.: Princeton
University
Press.

2)
Engle, Robert F., and Clive W. J. Granger. (1987). Co-Integration and Error
Correction:
Representation, Estimation and Testing. Econometrica. Vol. 55, pp. 251-276.

3)
Johansen, Søren. (1991). Estimation and Hypotheses Testing of Cointegrating
Vectors
in Gaussian Vector Autoregressive Models. Econometrica. Vol. 59, pp.
1551-1580.

4)
Johansen, Søren. (1995). Likelihood-Based Inference in Cointegrated Vector
Autoregressive
Models. Oxford: Oxford University Press.

HTH,
Bernhard


 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 


--
If you have received this e-mail in error or wish to read our e-mail 
disclaimer statement and monitoring policy, please refer to 
http://www.drkw.com/disc/email/ or contact the sender.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] fitting data to exponential distribution with glm

2003-06-10 Thread Adelchi Azzalini
On Tuesday 10 June 2003 17:31, Masayoshi Hayashi wrote:
(B I am learning glm function, but how do you fit data using exponential
(B distribution with glm?
(B
(BThe Gamma family is parametrised in glm() by two parameters: 
(Bmean and dispersion; the "dispersion" regulates the shape. 
(B
(BSo must fit a GLM with the Gamma family, and then produce a "summary"
(Bwith dispersion parameter set equal to 1, since this value 
(Bcorresponds to the exponential distribution in the Gamma family.
(B
(BIn practice:
(B
(Bfit - glm(formula =...,  family = Gamma)
(Bsummary(fit,dispersion=1)   
(B
(B
(Bbest wishes,
(B
(BAdelchi Azzalini
(B
(B-- 
(BAdelchi Azzalini  [EMAIL PROTECTED]
(BDipart.Scienze Statistiche, Universit(B?(B di Padova, Italia
(Bhttp://azzalini.stat.unipd.it/
(B
(B__
(B[EMAIL PROTECTED] mailing list
(Bhttps://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] fitting data to exponential distribution with glm

2003-06-10 Thread Prof Brian Ripley
An exponential distribution is a gamma distribution, and as far as fitting
the MLE of the coefficients all gammas give the same MLEs.  (You can
specify the dispersion and hence that the gamma is exponential when asking
for summaries, anova, etc.)

On Wed, 11 Jun 2003, Masayoshi Hayashi wrote:

 I am learning glm function, but how do you fit data using exponential
 distribution with glm?
 In the help file, under Family Objects for Models, no ready made option
 seems available for the distribution as well as for other distributions
 satisfying GLM requirements not listed there.

Which ones did you have in mind?  The only other commonly used
distribution which gives a glm is the negative binomial with fixed shape, 
for which see the MASS book and package.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] estimate the number of clusters

2003-06-10 Thread Martin Maechler
 MM == Martin Maechler [EMAIL PROTECTED]
 on Tue, 10 Jun 2003 18:12:36 +0200 writes:

MM Ping, you found another bug in silhouette.default() --
MM which can happen when there's one cluster with exactly
MM one observation.

MM I'll let you know more, once I have a complete fix.

The patch for this bug  {against an *installed* version of cluster}
is this :
---

--- cluster-version-1.7-2/library/cluster/R/cluster Thu Jun  5 04:00:15 
2003
+++ fixed/cluster/R/cluster Tue Jun 10 18:56:17 
2003
@@ -2019,11 +2019,11 @@
 wds[iC, cluster] - j
 a.i - if(Nj  1) colSums(dmatrix[iC, iC])/(Nj - 1) else 0 # length(a.i)= Nj
 ## minimal distances to points in all other clusters:
-diC - rbind(apply(dmatrix[!iC, iC], 2,
+diC - rbind(apply(dmatrix[!iC, iC, drop = FALSE], 2,
function(r) tapply(r, x[!iC], mean)))# (k-1) x Nj
 minC - max.col(-t(diC))
 wds[iC,neighbor] - clid[-j][minC]
-b.i - diC[cbind(minC, seq(minC))]
+b.i - diC[cbind(minC, seq(along = minC))]
 s.i - (b.i - a.i) / pmax(b.i, a.i)
 wds[iC,sil_width] - s.i
 }

---

i.e. you add  , drop = FALSE in line 2022
 and  along =in line 2026
in the appropriate places.

A fixed version of cluster should appear soon, and also together
with R 1.7.1.

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] estimating a density by selecting the bandwidth

2003-06-10 Thread Rafael Bertola
I´ve a data set and i want fit a kernel density estimate to the data.
but using the k-nearest neighbour method.
How i do this with R.

thanks
-- 
  
  [EMAIL PROTECTED]

--

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] estimating a density by selecting the bandwidth

2003-06-10 Thread Prof Brian Ripley
On Tue, 10 Jun 2003, Rafael Bertola wrote:

 I´ve a data set and i want fit a kernel density estimate to the data.
 but using the k-nearest neighbour method.
 How i do this with R.

Well, you define the exact algorithm you want to use, and then you program 
it, R being a fully-featured programming language.

Your description is exceedingly vague: if you are hoping for an already 
implemented solution you might take a look at packge locfit (but that is 
only a rough fit to your `description'.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] color coding a legend

2003-06-10 Thread Robert Schick
I'm using R 1.6.2 on a Windows 2000 machine.

I've plotted the results of an MDS run labeled by a numerical ID, and
color coded by a group code:

plot(cv.mds.spr$points, type=n, main=Non-Metric Multidimensional
Scaling of SprRun CV Watersheds)
text(cv.mds.spr$points, labels = as.character(cv.wshed.id.spr), col =
codes(cv.wshed.grp), cex=.75)

Question is, how do I get the legend to match the color codes?

I have tried different permutations of the following: 
leg.txt - c(LSSJ.NS,LSSJ.SS,US.RD,US.SF)   # the groups in
cv.wshed.grp
legend(-6.5, -2.5, leg.txt, pch=1234, col=
as.character(codes(cv.wshed.grp)))

But this only plots the codes in cv.wshed.grp as they are encountered,
not the levels. Ideally I'd like the legend to have the Group ID label,
and a filled box corresponding to the colors in the text call above.


-- 
Rob Schick
Ecologist
NOAA Fisheries
Santa Cruz Lab
110 Shaffer Road
Santa Cruz, CA 95060
Phone: 831.420.3960

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Rounding problem R vs Excel

2003-06-10 Thread Marc Schwartz
Hi all,

I thought that I would follow up on this thread with some refined
information.

As I mentioned in a prior post in this thread, I posted a query to the
OOo folks regarding the inability to replicate the IEEE representation
issue in OOo Calc.

Recall the following results:

OOo Calc 1.0.2 and 1.1 Beta2:

Cell Formula  Value
= 4.145 * 100 + 0.5   4.1500E+02
= 0.5 - 0.4 - 0.1 0.E+00
=(0.5 - 0.4 - 0.1)0.E+00


MS Excel 2002 (XP)

Cell Formula  Value
= 4.145 * 100 + 0.5   4.1500E+02
= 0.5 - 0.4 - 0.1 0.E+00
=(0.5 - 0.4 - 0.1)-2.7755575615628900E-17


Gnumeric 1.0.12:

Cell Formula  Value
= 4.145 * 100 + 0.5   +4.14943157E+02
= 0.5 - 0.4 - 0.1 -2.77555756156289135106E-17
*Gnumeric does not appear to allow the surrounding parens.


R 1.7.1 Beta:

 print(4.145 * 100 + 0.5, digits = 20)
[1] 414.94
 formatC(4.145 * 100 + 0.5, format = E, digits = 20)
[1] 4.14943157E+02

 print(0.5 - 0.4 - 0.1, digits = 20)
[1] -2.775557561562891e-17
 formatC(0.5 - 0.4 - 0.1, format = E, digits = 20)
[1] -2.77555756156289135106E-17


Today, I received a reply from Niklas Nebel at Sun to indicate that
indeed OOo has included the same (or perhaps more correctly, a similar)
optimization that MS incorporated into Excel starting with Excel 97,
which renders results close to zero as zero.

In OOo Calc 1.1 Beta, Niklas indicated that the relevant source code is
in:

sal/inc/rtl/math.hxx

Since the full source tarball is rather large, a subset of the relevant
source code from that file is below. Note the use of the approxEqual()
function as the basis for subsequent arithmetic. 

This code should, in theory, offer some insight into what Excel is doing
as well, with the exception of course of the formula parsing issue,
which is clearly a bug.

HTH,

Marc Schwartz


***

Copyright: 2002 by Sun Microsystems, Inc.

/** Test equality of two values with an accuracy of the magnitude of the
given values scaled by 2^-48 (4 bits roundoff stripped).

@ATTENTION
approxEqual( value!=0.0, 0.0 ) _never_ yields true.
*/
inline bool approxEqual(double a, double b)
{
  if ( a == b )
return true;
  double x = a - b;
return (x  0.0 ? -x : x)
 ((a  0.0 ? -a : a) * (1.0 / (16777216.0 * 16777216.0)));
}

/** Add two values.

If signs differ and the absolute values are equal according to
approxEqual() the method returns 0.0 instead of calculating the sum.

If you wanted to sum up multiple values it would be convenient not to
call approxAdd() for each value but instead remember the first value not
equal to 0.0, add all other values using normal + operator, and with the
result and the remembered value call approxAdd().
*/
inline double approxAdd(double a, double b)
{
  if ( ((a  0.0  b  0.0) || (b  0.0  a  0.0))
  approxEqual( a, -b ) )
  return 0.0;
  return a + b;
}

/** Substract two values (a-b).

If signs are identical and the values are equal according to
approxEqual() the method returns 0.0 instead of calculating the
substraction.
*/
inline double approxSub(double a, double b)
{
if ( ((a  0.0  b  0.0) || (a  0.0  b  0.0))  
 approxEqual(a, b) )
  return 0.0;
  return a - b;
}

/** floor() method taking approxEqual() into account.

  Use for expected integer values being calculated by double functions.

@ATTENTION
The threshhold value is 3.55271e-015
*/

inline double approxFloor(double a)
{
  double b = floor( a );
  // The second approxEqual() is necessary for values that are near the
limit
  // of numbers representable with 4 bits stripped off. (#i12446#)
  if ( approxEqual( a - 1.0, b )  !approxEqual( a, b ) )
return b + 1.0;
  return b;
}

/** ceil() method taking approxEqual() into account.

Use for expected integer values being calculated by double functions.

@ATTENTION
The threshhold value is 3.55271e-015
*/
inline double approxCeil(double a)
{
  double b = ceil( a );
  // The second approxEqual() is necessary for values that are near the 
limit
  // of numbers representable with 4 bits stripped off. (#i12446#)
  if ( approxEqual( a + 1.0, b )  !approxEqual( a, b ) )
return b - 1.0;
  return b;
}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Bootstraping with MANOVA

2003-06-10 Thread Prof Brian Ripley
It means what it says!  The residuals from the manova fit have a 
degenerate distribution: that's a problem with bootstrapping.

I don't think you've done this correctly: assuming you are intending to 
bootstrap residuals you seem to have resampled the independent variable 
and not added back the mean contribution.  Compare the examples in MASS4 
or Davison  Hinkley.

Also, do remember you need to show the bootstrap is valid in each 
scenario: it is not universally valid and this one looks dodgy to me.

On Wed, 11 Jun 2003, Ko-Kang Kevin Wang wrote:

 Hi,
 
 Does anyone know what the error message mean?
  Boot2.Pillai - function(x, ind) {
 +   x - as.matrix(x[,2:ncol(x)])
 +   boot.x - as.factor(x[ind, 1])
 +   boot.man - manova(x ~ boot.x)
 +   summary(manova(boot.man))[[4]][[3]]
 + }
  
  man.res - manova(as.matrix(pl.nosite) ~
 +   as.factor(plankton.new[,1]))$residuals
  boot2.plank - cbind(plankton.new[, 1], man.res)
  boot.sep - boot(boot2.plank, Boot2.Pillai, R = 1000,
 +  strata = plankton.new[, 1])
 Error in summary.manova(manova(boot.man)) : 
 residuals have rank 5  6
 Execution halted
 
 
 A sample of plankton.new is as follows:
  plankton.new[sample(dim(plankton.new)[1], 5, replace = TRUE),]
site  ACARTIA   EUTERP   GLADIO   TEMORA  FAVELLA OIKOPL
 15M 2.326336 3.168792 0.00 0.00 3.854852  0
 47W 2.699838 2.276462 1.799341 2.495544 2.274158  0
 33W 2.274158 3.301247 0.00 0.00 0.00  0
 8 M 2.875640 2.796574 0.00 0.00 3.051538  0
 4 M 2.100371 2.796574 0.00 0.00 2.100371  0
 
 
 pl.nosite is a data frame like plankton.new, but without the site column.
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] speeding up 1000s of coxph regression?

2003-06-10 Thread Xiao-Jun Ma
I have a gene expression matrix n (genes) X p (cases), where n = 8000 and p
= 100. I want to fit each gene as univariate in a coxph model, i.e., fitting
8000 models. I do something like this:

res - apply(data, 1, coxph.func)

which takes about 4 min, not bad. But I need to do large numbers of
permutations of the data (permuting the columns), for example, 2000, which
would take 5 days. I would like to know if there is way to speed this up?

Any help appreciated.

Xiao-Jun

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] speeding up 1000s of coxph regression?

2003-06-10 Thread Thomas Lumley
On Tue, 10 Jun 2003, Xiao-Jun Ma wrote:

 I have a gene expression matrix n (genes) X p (cases), where n = 8000 and p
 = 100. I want to fit each gene as univariate in a coxph model, i.e., fitting
 8000 models. I do something like this:

 res - apply(data, 1, coxph.func)

 which takes about 4 min, not bad. But I need to do large numbers of
 permutations of the data (permuting the columns), for example, 2000, which
 would take 5 days. I would like to know if there is way to speed this up?


Calling coxph.fit directly would likely be faster.

Also, you probably don't need to do 2000 permutations on all 8000 genes: a
few hundred permutations is probably enough to decide that most of the
genes aren't interesting.

If you are going to be doing a lot of this sort of thing it might be worth
looking at the parallel processing facilities in the `snow' package.
There's a description of their use in another gene expression problem in
the new R Newsletter.


-thomas

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Re: color coding a legend - solved?

2003-06-10 Thread Robert Schick
I did the following, and had it plot as needed: 
legend(3.25, -2.5, leg.txt, pch=15, col= levels(as.factor(codes(cv.wshed.grp

Problem was in my (lack of) understanding of codes()

But my second question is this: is there a way to color code the points
on the first two PCA axes by group id within a biplot. Fig 11.7 in MASS
4 is my inspiration, but I don't understand how to use the col variable
within biplot. The following works within a simple plot, but not a
biplot:

text(cv.mds.spr$points, labels = as.character(cv.wshed.id.spr), col = 
levels(as.factor(codes(cv.wshed.grp))), cex=.75)

Advice?


Robert Schick wrote:
 
 I'm using R 1.6.2 on a Windows 2000 machine.
 
 I've plotted the results of an MDS run labeled by a numerical ID, and
 color coded by a group code:
 
 plot(cv.mds.spr$points, type=n, main=Non-Metric Multidimensional
 Scaling of SprRun CV Watersheds)
 text(cv.mds.spr$points, labels = as.character(cv.wshed.id.spr), col =
 codes(cv.wshed.grp), cex=.75)
 
 Question is, how do I get the legend to match the color codes?
 
 I have tried different permutations of the following:
 leg.txt - c(LSSJ.NS,LSSJ.SS,US.RD,US.SF)   # the groups in
 cv.wshed.grp
 legend(-6.5, -2.5, leg.txt, pch=1234, col=
 as.character(codes(cv.wshed.grp)))
 
 But this only plots the codes in cv.wshed.grp as they are encountered,
 not the levels. Ideally I'd like the legend to have the Group ID label,
 and a filled box corresponding to the colors in the text call above.
 
 --
 Rob Schick
 Ecologist
 NOAA Fisheries
 Santa Cruz Lab
 110 Shaffer Road
 Santa Cruz, CA 95060
 Phone: 831.420.3960

-- 
Rob Schick
Ecologist
NOAA Fisheries
Santa Cruz Lab
110 Shaffer Road
Santa Cruz, CA 95060
Phone: 831.420.3960

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Getting graphs into LaTeX

2003-06-10 Thread Gwendolyn van Paasschen
Hi, A, this is G -- please let me know how to reach you!
[[alternate HTML version deleted]]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Multiple match function?

2003-06-10 Thread Jonck van der Kogel
Hi all,
I have (yet another) question about a function in R. What I would like 
to do is test for the presence of a certain value in a vector, and have 
the positions that this value is at returned to me.
For example, let's say I have a vector:
x - c(1,1,2,2,3,3,4,4)

Now I would like a function that would return positions 3 and 4 should 
I test for the value 2. Or 5 and 6 should I test for 3.

Could someone please tell me how I should do this? The match function 
only returns the first position that a value is found at. Of course I 
could write my own function that loops through the vector and tests for 
the presence of each value manually but it seems likely that a function 
that does this is already present in R. No need to re-invent the wheel 
:-)

Thanks very much, Jonck

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Does the RPM for RH9 know about TCL/Tk

2003-06-10 Thread Morgan Hough
Sorry for the probable repeat post but I can only search the list up to
2002 (is there a better way?). I am using the RH9 RPM from CRAN but
packages like AnalyzeFMRI say that tcltk is not found. Do I need to do
more to get Tk GUIs working on RH9 or does the RPM not have tcltk support
built in (should I compile from source). Thanks in advance.

Take care.

-Morgan

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Does the RPM for RH9 know about TCL/Tk

2003-06-10 Thread Jonathan Baron
On 06/10/03 18:30, Morgan Hough wrote:
Sorry for the probable repeat post but I can only search the list up to
2002 (is there a better way?). 

Yes, see my search page below.

I am using the RH9 RPM from CRAN but
packages like AnalyzeFMRI say that tcltk is not found. Do I need to do
more to get Tk GUIs working on RH9 or does the RPM not have tcltk support
built in (should I compile from source). Thanks in advance.

There was in fact some discussion of this last month.  I am not
sure of the answer.  But I installed 1.7.0 from the RPM for RH 9,
and I got the same error message when trying to get Rcmdr to
work.  I did have tcl and tk installed.  Unfortunately, I did not
do a properly controlled experiment.  I first installed tcllib,
which was not installed originally.  (That didn't help, by
itself.)  Then I re-installed R _from source_ and then everything
worked.  But I did have the basic vanilla installation of RH 9,
and I did have this problem.  So you aren't the only one.  

I still don't know whether tcllib is necessary, and whether the
RPM itself installs different things depending on what is on the
system.  (I would assume not, but I'm not sure.)

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
R page:   http://finzi.psych.upenn.edu/

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Multiple match function?

2003-06-10 Thread Jonathan Baron
On 06/11/03 03:28, Jonck van der Kogel wrote:
Hi all,
I have (yet another) question about a function in R. What I would like 
to do is test for the presence of a certain value in a vector, and have 
the positions that this value is at returned to me.
For example, let's say I have a vector:
x - c(1,1,2,2,3,3,4,4)

Now I would like a function that would return positions 3 and 4 should 
I test for the value 2. Or 5 and 6 should I test for 3.

one way is
which(x==2)

I'm sure there are others.

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
R page:   http://finzi.psych.upenn.edu/

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] qwilcox

2003-06-10 Thread Knut M. Wittkowski
The function 'wilcox.test' in R and S gives (almost) identical results (see 
below). 'qwilcox' however, does not:

 qwilcox(p,5,5)

p:  0.025   0.975

R   3  22
S  18  37
I originally wanted to ask a questions, but then I found the answer. Given 
the confusion I run into, I wonder if this experience is worth reporting.

The S-Plus quantiles are almost correct (they are the limits of the region 
of acceptance, rather than the quantiles). The description in the R help file

Distribution of the Wilcoxon Rank Sum Statistic

suggests that R:qwilcox also gives quantiles for the rank sum (which the 
Wilcoxon rank sum test is based on). In fact, however, it gives quantiles 
for the u-statistic (which the Mann-Whitney test is based upon). While the 
tests are logically equivalent, the particular test statistics

- sum(Xic(X,Y))rank sum (Wilcoxon)
- sum(Xic(  Y))u statistic (Mann-Whitney)
are different (apologies for the non-standard notation). Since 
wilcox.test relates to the rank sums in both R and S, as does qwilcox in 
S, the name 'qwilcox' in R may be misleading. How about renaming it to 
'qmannwhitney' instead and adding 'qwilcoxon' for a function that 
corresponds to S:qwilcox?

 x1 - c(1,2,3,  5,6 )
 x2 - c(  4,7,8,9,10)
 sum(x1)
[1] 17
 sum(x2)
[1] 38
R wilcox.test(x1,x2,alternative=two.sided)
Wilcoxon rank sum test: p-value = 0.03175
R wilcox.exact(x1,x2,alternative=two.sided)
Exact Wilcoxon rank sum test: p-value = 0.03175
S wilcox.test(x1,x2,alternative=two.sided)
Exact Wilcoxon rank-sum test: p-value = 0.0317
 x1 - c(1,2,  4,5,6 )
 x2 - c(3,  7,8,9,10)
 sum(x1)
[1] 18
 sum(x2)
[1] 37
R wilcox.test(x1,x2,alternative=two.sided)
Wilcoxon rank sum test: p-value = 0.05556
R wilcox.exact(x1,x2,alternative=two.sided)
Exact Wilcoxon rank sum test: p-value = 0.05556
S wilcox.test(x1,x2,alternative=two.sided)
Exact Wilcoxon rank sum test: p-value = 0.0556
Knut M. Wittkowski, PhD,DSc
--
The Rockefeller University, GCRC
1230 York Ave #121B, Box 322, NY,NY 10021
+1(212)327-7175, +1(212)327-8450 (Fax)
[EMAIL PROTECTED]
http://www.rucares.org/statist/
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] speeding up 1000s of coxph regression?

2003-06-10 Thread A.J. Rossini
Thomas Lumley [EMAIL PROTECTED] writes:


 If you are going to be doing a lot of this sort of thing it might be worth
 looking at the parallel processing facilities in the `snow' package.
 There's a description of their use in another gene expression problem in
 the new R Newsletter.

Actually, they used the RPVM package directly; however, Thomas is
still correct, it probably would be simple to recast using SNOW.

Some hints and details can be found in a tech report by Luke Tierney,
Michael Li, and myself in the UW Biostat tech report series (can't
recall which #, but it's on http://www.bepress.com/uwbiostat/).

best,
-tony

-- 
A.J. Rossini  /  [EMAIL PROTECTED]  /  [EMAIL PROTECTED]
Biomedical/Health Informatics and Biostatistics, University of Washington.
Biostatistics, HVTN/SCHARP, Fred Hutchinson Cancer Research Center.
FHCRC: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email 

CONFIDENTIALITY NOTICE: This e-mail message and any attachments ... {{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help