Re: [R] prcomp eigenvalues

2005-08-03 Thread Jari Oksanen
On Tue, 2005-08-02 at 19:06 -0700, Rebecca Young wrote:
 Hello,
 
 Can you get eigenvalues in addition to eigenvectors using prcomp?  If so, how?
 I am unable to use princomp due to small sample sizes.
 Thank you in advance for your help!
 Rebecca Young
 
Rebecca, 

This answer is similar to some others, but simpler.

You have two separate problems: running PCA and getting eigenvalues. The
first is easy to solve: use prcomp instead of princomp (which only
exists for historical reasons). Function prcomp can handle cases with
more columns than rows.

pc <- prcomp(x)

Above I assumed that your data are called x (or you can first make x,
say: x <- rcauchy(200); dim(x) <- c(20,10) -- which puts a funny twist
to comments on variances and standard deviations below).

This saves something called 'sdev', or standard deviations, and
you can get values that are (proportional to) eigenvalues simply by
taking their squares:

ev <- pc$sdev^2

These may be good enough for you (they would be good enough for me).
However, if you want to exactly replicate the numbers in some other
piece of software, you may need to multiply these by some constant. If
you don't need this, you may stop reading here.

The eigenvalues above are related to the usual 'unbiased' variance, so that
the following results are approximately equal:

sum(ev)
sum(apply(x, 2, var))

If you want eigenvalues related to the biased estimate of variance, you
can do

eb <- (1 - 1/nrow(x)) * ev

Function princomp uses these, as does some other software, but prcomp
works hard and carefully to get the unbiased eigenvalues it uses instead
of the biased values (which would come naturally and directly from the
algorithm it uses).

Some programs relate their eigenvalues to the sum of squares, and you
can get these by

es <- (nrow(x) - 1) * ev

Finally, some popular programs in ecology (your affiliation) use
proportional eigenvalues which you can get with:

ev/sum(ev)
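Putting the pieces together, a minimal sketch with made-up data (the object names ev, eb, es are only illustrative; pc$sdev is the documented prcomp result component):

```r
set.seed(42)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

pc <- prcomp(x)
ev <- pc$sdev^2                             # eigenvalues, unbiased-variance scale
all.equal(sum(ev), sum(apply(x, 2, var)))   # TRUE: totals agree

eb   <- (1 - 1/nrow(x)) * ev                # biased-variance scale (princomp-style)
es   <- (nrow(x) - 1) * ev                  # sum-of-squares scale
prop <- ev / sum(ev)                        # proportional eigenvalues
```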

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] INDVAL and mvpart

2005-08-09 Thread Jari Oksanen
Agnieszka,

Package 'mvpart' is documented. In this case, ?rpart.object explains
*where* in the rpart object the membership vector is.

cheers, jari oksanen

On Mon, 2005-08-08 at 16:02 +0200, [EMAIL PROTECTED] wrote:
 Hi,
 
 I'd like to perform Dufrene-Legendre Indicator Species Analysis for
 a multivariate regression tree. However, I have problems with the arguments
 of the duleg(veg, class, numitr = 1000) function. How do I obtain a vector of
 numeric class memberships for samples, or a classification object
 returned from mvpart?

-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] invalid 'mode' of argument?

2005-08-10 Thread Jari Oksanen
On Wed, 2005-08-10 at 08:13 -0400, Kang, Sang-Hoon wrote:

 As a novice I was trying to calculate the Shannon diversity index using the
 diversity function in the vegan package, and kept getting the same error
 message: Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument
 
This error (which is from sum()) seems to come up when you have non-numeric
data (factors, character variables, etc.). Check that your data are
strictly numeric. One of the most common causes I've seen is that row
or column names are not read as row and column names but as data rows or
columns.
 
 My dataset is from a microarray and has abundant missing values, so I
 tried labeling them as NA and 0, but still got the same error message.
 
 The Shannon index is the negative sum of proportion times log proportion, so
 I put 1 for missing values to avoid log 0, but still got the same error message.
 
You shouldn't forge your data: the function handles zeros.

PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] vectors of different length in a matrix

2005-08-22 Thread Jari Oksanen
On Mon, 2005-08-22 at 08:56 -0400, Duncan Murdoch wrote:
 On 8/22/2005 8:45 AM, Marten Winter wrote:
  HI!
  
  I've 3 vectors of different length (a, b, c) and want to arrange them in a
  matrix with a, b, c as rows and the figures of these vectors in the columns
  (with that matrix I want to calculate a distance between these vectors:
  vegan -> vegdist -> horn). Is there a possibility to create such a matrix
  and to fill up the missing fields with NAs automatically?
 
 Filling with NA's is the hard part; R normally likes to recycle vectors 
 that are too short.
 
 Here's one way, probably not the best:
 
  x <- matrix(NA, 3, max(length(a), length(b), length(c)))
  x[1, seq(along = a)] <- a
  x[2, seq(along = b)] <- b
  x[3, seq(along = c)] <- c
 
 Another way to do it would be to extend all the vectors to the same 
 length by appending NAs, then using rbind.
 
Another issue is that this would fail at the next step outlined in the
original message (vegan -> vegdist -> horn), since that step won't
accept NAs. So the original plan was flawed. If you fill with zeros,
then the 'vegdist' step would work in the sense that it produces
numbers. I don't know if these numbers would make any sense if the
vectors had nothing to do with each other originally, and columns would
be of mixed meaning after stacking into a matrix. If your vector
elements had identities (names) originally, then you should stack your
data so that entries with the same identity go to the same column. It is
difficult to imagine Horn index used in cases where you don't have these
identities -- specifically species names.  
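Under that assumption, name-based stacking can be sketched as follows (the species names and counts are made up; absences become genuine zeros rather than NAs):

```r
# two "samples" with named elements (species)
a <- c(sp1 = 2, sp2 = 5)
b <- c(sp2 = 1, sp3 = 4)

species <- union(names(a), names(b))
x <- rbind(a = a[species], b = b[species])  # indexing by a missing name gives NA
colnames(x) <- species
x[is.na(x)] <- 0   # absent species are true zeros for an index like Horn
```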

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] vectors of different length in a matrix

2005-08-22 Thread Jari Oksanen
On Mon, 2005-08-22 at 16:13 +0300, Jari Oksanen wrote:
 On Mon, 2005-08-22 at 08:56 -0400, Duncan Murdoch wrote:
  On 8/22/2005 8:45 AM, Marten Winter wrote:
   HI!
   
   I've 3 vectors of different length (a, b, c) and want to arrange them in a
   matrix with a, b, c as rows and the figures of these vectors in the columns
   (with that matrix I want to calculate a distance between these vectors:
   vegan -> vegdist -> horn). Is there a possibility to create such a matrix
   and to fill up the missing fields with NAs automatically?
  
  Filling with NA's is the hard part; R normally likes to recycle vectors 
  that are too short.
  
  Here's one way, probably not the best:
  
   x <- matrix(NA, 3, max(length(a), length(b), length(c)))
   x[1, seq(along = a)] <- a
   x[2, seq(along = b)] <- b
   x[3, seq(along = c)] <- c
  
  Another way to do it would be to extend all the vectors to the same 
  length by appending NAs, then using rbind.
  
 Another issue is that this would fail at the next step outlined in the
 original message (vegan -> vegdist -> horn), since that step won't
 accept NAs.

Uh. It seems that I should read the package documentation (and the posting
guide, which tells me to do so): it seems that vegdist() *can* handle
NAs. I still think, though, that data with NAs probably make no sense with
the 'horn' method.

cheers, jari oksanen



Re: [R] Document clustering for R

2005-09-13 Thread Jari Oksanen
On Mon, 2005-09-12 at 12:47 -0700, Raymond K Pon wrote:
 I'm working on a project related to document clustering. I know that R 
 has clustering algorithms such as clara, but only supports two distance 
 metrics: euclidian and manhattan, which are not very useful for 
 clustering documents. I was wondering how easy it would be to extend the 
 clustering package in R to support other distance metrics, such as 
 cosine distance, or if there was an API for custom distance metrics.
 
You don't have to extend the clustering package in R to support other
distance metrics, but you should take care to produce your
dissimilarities (or distances) in the standard format so that they can
be used in the clustering package, in cmdscale, in isoMDS, or in any
other function accepting a 'dist' object. The clustering package will
support new dissimilarities if they are written in a standard-conforming
way. There are several packages that offer alternative dissimilarities
(and some even distances) that can be used in clustering functions.
Search for 'distances' or 'dissimilarities' with the R Site Search. Some
of these could be the one for you... I would be surprised if the cosine
index is missing (and if needed, I could write it for you in C, but I
don't think that is necessary).
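For instance, a cosine dissimilarity in standard-conforming form takes only a few lines; cosine_dist below is a hypothetical helper (not from any package), and the matrix is a toy document-term example:

```r
cosine_dist <- function(x) {
  x <- as.matrix(x)
  x <- x / sqrt(rowSums(x^2))   # normalise rows to unit length
  as.dist(1 - tcrossprod(x))    # dissimilarity = 1 - cosine similarity
}

m <- matrix(c(1, 0, 1,
              0, 1, 1,
              1, 1, 0), nrow = 3, byrow = TRUE)
d <- cosine_dist(m)
hc <- hclust(d)   # a 'dist' object plugs straight into clustering functions
```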

cheers, jari oksanen



Re: [R] Graphical presentation of logistic regression

2005-09-15 Thread Jari Oksanen
On Wed, 2005-09-14 at 06:29 -0500, Frank E Harrell Jr wrote:
 Beale, Colin wrote:
  Hi,
  
  I wonder if anyone has written any code to implement the suggestions of
  Smart et al (2004) in the Bulletin of the Ecological Society of America
  for a new way of graphically presenting the results of logistic
  regression (see
  www.esapubs.org/bulletin/backissues/085-3/bulletinjuly2004_2column.htm#tools1
  for the full text)? I couldn't find anything relating to this sort
  of graphical representation of logistic models in the archives, but
  maybe someone has solved it already? In short, Smart et al suggest that
  a logistic regression be presented as a combination of the two
  histograms for successes and failures (with one presented upside down at
  the top of the figure, the other the right way up at the bottom)
  overlaid by the probability function (ie logistic curve). It's somewhat
  hard to describe, but is nicely illustrated in the full text version
  above. I think it is a sensible way of presenting these results and am
  keen to do so - at the moment I can only do this by generating the two
  histograms and the logistic curve separately (using hist() and lines()),
  then copying and pasting the graphs out of R and inverting one in a
  graphics package, before overlying the others. I'm sure this could be
  done within R and would be a handy plotting function to develop. Has
  anyone done so, or can anyone give me any pointers to doing this? I
  really need to know how to invert a histogram and how to overlay this
  with another histogram the right way up.
  
  Any thoughts would be welcome.
  
  Thanks in advance,
  Colin
 
  From what you describe, that is a poor way to represent the model 
 except for judging discrimination ability (if the model is calibrated 
 well).  Effect plots, odds ratio charts, and nomograms are better.  See 
 the Design package for details.
 

You're correct when you say that this is a poor way to represent the
model. However, you should have some understanding of us ecologists, who
are simple creatures working with tangible subjects such as animals and
plants (microbiologists work with less tangible things). Therefore we
want a concrete and simple representation. After all, the
example was about the occurrence of an animal against a concrete
environmental variable, and a concrete representation was suggested.
Nomograms and the like are abstractions that you understand only after
long education and training (I tried the Design package and I didn't
understand the nomogram plot).

I tried one concrete example with my own data, and the inverted
histogram method was patently misleading (with Baz Rowlingson's neat and
compact code, sorry for the repetition). The method would be useful with
dense and regular data only, but here the clearest visual cue was the
uneven sampling intensity. With my limited knowledge of R facilities, I
can remember only two ways to preserve the concreteness of display
in base R: jitter() to avoid overplotting of observations, and
sunflowerplot() to show the amount of overplotting.
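A minimal sketch of that base-R approach with simulated presence/absence data (jitter() keeps the 0/1 responses visible, lines() overlays the fitted logistic curve):

```r
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(2 * x))   # simulated occurrence data

fit <- glm(y ~ x, family = binomial)
plot(jitter(y, amount = 0.03) ~ x, ylab = "probability of occurrence")
xs <- seq(min(x), max(x), length.out = 100)
lines(xs, predict(fit, data.frame(x = xs), type = "response"))
```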

I think the Ecological Society of America would be happy to receive
papers suggesting better ways to represent binary response data, if some
of the knowledgeable persons in this group decided to educate them (I'm
not an ESA member, so I wouldn't be educated: therefore 'them' instead
of 'us'). The ESA Bulletin will be influential on manuscripts submitted
to the Society journals in the future, and the time for action is now.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Compare two distance matrices

2005-10-07 Thread Jari Oksanen
On Fri, 2005-10-07 at 09:31 +0200, Mattias de Hollander wrote:
 Hi all,
 
 Thanks for the quick response. I see the ade4 package is not needed
 for distance matrix computation, but as far as I can see you need it for
 comparing two distance matrices. In the stats package I can't find any
 similar functions like mantel.randtest or RVdist.randtest of the ade4
 package. So I think this package is still needed if I would like to
 make a scatter plot of the matrices. Or should I manually compare these
 matrices with a loop, for example, and make a plot of this?

To plot two dissimilarity structures d1 and d2 in base R, you can use the
command

plot(d1, d2)

For the plot() command, the dissimilarity structure looks like a vector.

A dissimilarity structure means a result that you can get from as.dist(),
directly from the dist() function, or from any other alternative
implementation of dissimilarity functions giving compliant results.
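A small sketch with simulated data (the correlation shown is a raw, unpermuted Mantel-type statistic, not a significance test):

```r
set.seed(4)
x1 <- matrix(rnorm(40), nrow = 10)
x2 <- matrix(rnorm(40), nrow = 10)

d1 <- dist(x1)
d2 <- dist(x2)
plot(d1, d2)   # each 'dist' is treated as a plain vector (length 45 here)
cor(d1, d2)    # raw matrix correlation between the two structures
```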

For Mantel tests you may need ade4 (or some other package that has the
same test).

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Under-dispersion - a stats question?

2005-10-11 Thread Jari Oksanen
On Tue, 2005-10-11 at 17:16 -0400, Kjetil Halvorsen wrote:
 Martin Henry H. Stevens wrote:
  Hello all:
  I frequently have glm models in which the residual variance is much  
  lower than the residual degrees of freedom (e.g. Res.Dev=30.5, Res.DF  
  = 82). Is it appropriate for me to use a quasipoisson error  
  distribution and test it with an F distribution? It seems to me that  
  I could stand to gain a much-reduced standard error if I let the  
  procedure estimate my dispersion factor (which is what I assume the  
  quasi- distributions do).
  
 
 I didn't see an answer to this. Maybe you could treat it as a
 quasi-model, but first you should ask why there is underdispersion.
 
 Underdispersion could arise if you have dependent responses; for
 instance, competition (say, between plants) could produce
 underdispersion. Then you would be better off changing to an appropriate
 model. Maybe you could post more about your experimental setup?
 
Some ecologists from Bergen, Norway, suggest using quasipoisson with its
underdispersed residual error (though I wouldn't do that). However, it
indeed would be useful to know a bit more about the setup, like the type
of dependent variable. If the dependent variable happens to be the
number of species (as it has been in some papers by MHHS), this
certainly is *not* Poisson nor quasi-Poisson nor in the exponential
family, although it is so often modelled that way. I've often seen that
species richness (the number of species -- or in R-speak 'tokens' -- in
a collection) is underdispersed relative to Poisson, and for a good
reason. Even there I'd play safe and use poisson() instead of
underdispersed quasipoisson().
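The dispersion comparison itself is easy to inspect; a sketch with simulated (ordinary Poisson) counts, where both families give the same coefficients and only the standard errors differ:

```r
set.seed(5)
d <- data.frame(x = rnorm(100))
d$y <- rpois(100, exp(0.5 * d$x + 1))

f_pois  <- glm(y ~ x, family = poisson, data = d)       # dispersion fixed at 1
f_quasi <- glm(y ~ x, family = quasipoisson, data = d)  # dispersion estimated
summary(f_quasi)$dispersion   # an estimate < 1 would signal underdispersion
```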

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] varimax rotation difference between R and SPSS

2005-10-13 Thread Jari Oksanen
On Thu, 2005-10-13 at 16:13 +0200, Andreas Cordes wrote:
 Hi,
 I am puzzled by a differing result of princomp in R and FACTOR in
 SPSS. Regarding the amount of explained variance, the two results are
 the same. However, the loadings differ substantially, in the unrotated
 as well as in the rotated form.
 In both cases correlation matrices are analyzed. The sums of the squared
 components are one in both programs.

Not in the data that you pasted in your message. After reading in the
data I get from the non-rotated R solution:

> colSums(rpc^2)
V2 V3
 1  1

And the non-rotated SPSS solutions gives:

> colSums(spc^2)
  V2   V3
5.363671 2.136624

After normalizing the SPSS PCs, the solutions are identical (within
numerical accuracy) after reversing the sign of the second PC.

I don't want to look at the data full of holes, like the loadings from
varimax rotation. However, it seems that the raw solutions are
identical.
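The normalization step can be sketched with a made-up loadings matrix (rpc and spc here are constructed for illustration, mimicking the unit-length R columns and the eigenvalue-scaled SPSS columns):

```r
set.seed(6)
A <- qr.Q(qr(matrix(rnorm(9), 3)))[, 1:2]    # unit-length columns ("R-style")
rpc <- A
spc <- sweep(A, 2, sqrt(c(2.5, 1.2)), "*")   # columns scaled by sqrt(eigenvalue)

spc_norm <- sweep(spc, 2, sqrt(colSums(spc^2)), "/")
all.equal(abs(spc_norm), abs(rpc))           # identical up to column signs
```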

cheers, jari oksanen

-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland



Re: [R] noncommutative addition: NA+NaN != NaN+NA

2004-09-07 Thread Jari Oksanen
On Tue, 2004-09-07 at 12:47, Prof Brian Ripley wrote:
 On Tue, 7 Sep 2004, Robin Hankin wrote:
 
  Check this out:
 
 I am unable to reproduce it on any of the 7 different systems I checked
 (Solaris, Linux, Windows with various compilers).
 
 NaN + NA
  [1] NaN
 NA + NaN
  [1] NA
  
  I thought + was commutative by definition.   What's going on?
 
  platform powerpc-apple-darwin6.8
  arch powerpc
  os   darwin6.8
  system   powerpc, darwin6.8
  status
  (Both give NA under linux, so it looks like a version-specific issue).
 
 Linux on that hardware?  It might be a chip issue.

I tried this in Linux on Mac iBook G4, and the results were the same:
NaN+NA was NaN, just like in MacOS X version.  So it looks like a chip
issue. However, the RPM built from the src.rpm packages at CRAN failed
in some checks in Linux/iBook. 

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



RE: [R] isoMDS

2004-09-09 Thread Jari Oksanen
On Wed, 2004-09-08 at 21:31, Doran, Harold wrote:
 Thank you. Quick clarification. isoMDS only works with dissimilarities.
 Converting my similarity matrix into the dissimilarity matrix is done as
 (from an email I found on the archives)
 
  d <- max(tt) - tt
 
 Where tt is the similarity matrix. With this, I tried isoMDS as follows:
 
  tt.mds <- isoMDS(d)
 
 and I get the following error message. 
 
 Error in isoMDS(d) : An initial configuration must be supplied with
 NA/Infs in d.
 
 I was a little confused about exactly how to specify this
 initial configuration. So, from here I ran cmdscale on d as
 
This error message is quite informative: you have either missing or
non-finite entries in your data. The only surprising thing here is that
cmdscale works: it should fail, too. Are you sure that you haven't done
anything with your data matrix in between, like changed it from a matrix
to a dist object? If the Inf/NaN/NA values are on the diagonal, they
will magically disappear with as.dist. Anyway, if you're able to get a
metric scaling result, you can manually feed that into isoMDS as the
initial configuration, and avoid the check. See ?isoMDS.
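The manual two-step looks like this, sketched with clean simulated data (with NA/Inf entries in d you would compute the initial configuration from a cleaned copy):

```r
library(MASS)   # for isoMDS
set.seed(7)
x <- matrix(rnorm(60), nrow = 15)
d <- dist(x)

init <- cmdscale(d, k = 2)   # metric scaling as a starting configuration
fit  <- isoMDS(d, y = init)  # supplying y bypasses the internal cmdscale call
```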

  d.mds <- cmdscale(d)
 
 which seemed to work fine and produce reasonable results. I was able to
 take the coordinates and run them through a k-means cluster and the
 results seemed to correctly match the grouping structure I created for
 this sample analysis.
 
 Cmdscale is for metric scaling, but it seemed to produce the results
 correctly. 
 
 So, did I correctly convert the similarity matrix to the dissimilarity
 matrix? Second, should I have used cmdscale rather than isoMDS as I have
 done? Or, is there a way to specify the initial configuration that I
 have not done correctly.

If you don't know whether you should use isoMDS or cmdscale, you
probably should use cmdscale. If you know, things are different.
Probably isoMDS gives you `better'(TM) results, but it is more
complicated to handle.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] isoMDS

2004-09-09 Thread Jari Oksanen
On Thu, 2004-09-09 at 04:53, Kjetil Brinchmann Halvorsen wrote:

 
 Mardia, Kent & Bibby define the standard transformation from a
 similarity matrix to a dissimilarity
 (distance) matrix by
 
  d_rs = sqrt(c_rr - 2*c_rs + c_ss)
 
 where the c_rs are the similarities. This assures that the diagonal of
 the dissimilarity matrix is zero.
 You could try that.
 
In R notation, this would be

sim2dist <- function(x)
    as.dist(sqrt(outer(diag(x), diag(x), "+") - 2*x))

Mardia, Kent & Bibby indeed say in passing that this is a `standard
transformation' (page 403). However, it is really a canonical way only
if the diagonal elements of the similarity matrix are sums of squares
and the off-diagonal elements are cross products. In that case the
`standard transformation' gives you Euclidean distances (or, if you have
variances/covariances or ones/correlations, it gives you something
similar). However, it is no standard if your similarities are something
else and cannot be transformed into Euclidean distances.

However, in isoMDS this *may* not matter, since NMDS uses only the rank
order of dissimilarities, and any transformation giving dissimilarities
in the same rank order *may* give similar results. The statement is
conditional ('may'), since isoMDS uses cmdscale for the starting
configuration, and cmdscale will give different results with different
transformations. So isoMDS may stop in different (local) optima. Setting
the `tol' parameter low enough in isoMDS (see ?isoMDS) helped in a
couple of cases I tried, and the results were practically identical with
different transformations. So it doesn't matter too much how you change
your similarities to dissimilarities, since isoMDS indeed treats them as
dissimilarities (but cmdscale treats them as distances).
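The cross-product special case is easy to verify: the `standard transformation' then reproduces Euclidean distances exactly. A small check (the sim2dist helper is repeated here so the snippet stands alone; data are made up):

```r
sim2dist <- function(x)
    as.dist(sqrt(outer(diag(x), diag(x), "+") - 2*x))

set.seed(8)
x  <- matrix(rnorm(20), nrow = 5)
cp <- tcrossprod(x)                      # similarities as cross products
all.equal(c(sim2dist(cp)), c(dist(x)))   # TRUE: Euclidean distances recovered
```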

cheers, jari oksanen
-- 
J.Oksanen, Oulu, Finland.
Object-oriented programming is an exceptionally bad idea which could
only have originated in California. E. Dijkstra



Re: [R] Installing packages on OS X

2004-09-09 Thread Jari Oksanen
On Wed, 2004-09-08 at 21:25, hadley wickham wrote:
 On my computer, it seems that (binary?) packages installed through the
 GUI in RAqua are not available to the command-line version of R,
 while (source) packages installed with R CMD INSTALL are available to
 both.  This is a problem when I run R CMD CHECK on a package that I am
 creating that depends on packages I have installed through the GUI.
 
 Is this a problem with my installation of R, or a known limitation?
 (there is no mention of this in the Mac OS X faq, however, the entire
 section entitled Installing packages is blank).
 
It is in some other FAQ... Unfortunately, I don't have a Mac available
now, so I can't check. However, search for environment variables and
setting library paths in some other R FAQ or the R Installation and
Administration guide. There you will find a description of the things
you should do. It is just as crystal clear as unix man pages:
everything is clear *after* you know what is said there, but you may
have a hard time noticing this clarity.

I solved this problem some months ago after a long search among the
official documentation. So it is documented, but well hidden.

I may have a look at a machine where I solved this in the evening
(UTC+3), if you won't get solution before that.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



RE: [R] isoMDS

2004-09-09 Thread Jari Oksanen
On Thu, 2004-09-09 at 14:25, Doran, Harold wrote:
 Thank you. I use the same matrix on cmdscale as I did with isoMDS. I
 have reproduced my steps below for clarification if this happens to
 shed any light.
--- snip ---

Doran,

Your data clarified things. It seems to me now that your data are not a
matrix but a data.frame. A problem for an ordinary user is that
data.frames and matrices look identical, but that's only the surface:
you shouldn't be shallow, but look deep into their souls to see that
they are completely different, and therefore isoMDS fails. At least
isoMDS gives just that error for a data.frame, but cmdscale casts the
data.frame to a matrix and therefore works.

So the following should work (it worked when I tried):

tt <- as.matrix(tt)
isoMDS(tt)

(and you could go down to a dist object with tt <- as.dist(tt), which
seems to handle data.frames directly, too).

Then you will still need to avoid the complaint about zero-distances
among points. This means that you have some identical points in your
data, and isoMDS does not like them. This issue was discussed here in
April 2004 (and many other times). Search the archives for earlier
questions on isoMDS.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] getting started on Bayesian analysis

2004-09-15 Thread Jari Oksanen
On Wed, 2004-09-15 at 03:27, HALL, MARK E wrote:
   I've found 
   
 Bayesian Methods: A Social and Behavioral Sciences Approach
 by Jeff Gill 
 
 useful as an introduction.  The examples are written in R and S with
 generalized scripts for doing a variety of problems.  (Though I never
 got change-point analysis to run successfully in R.)
 
Change point analysis? I haven't seen the book, but I read lecture
handouts of one Bayesian course over here in Finland (Antti Penttinen,
Jyväskylä), and translated his example to R during one (rare) warm
summer day in a garden. So do you mean this (binary case):

> source("/mnt/flash/cb.update.R")
> cb.update
function (y, A = 1, B = 1, C = 1, D = 1, N = 1200, burnin = 200)
{
    n <- length(y)
    lambda <- numeric(N)
    mu <- numeric(N)
    k <- numeric(N)
    lambda[1] <- A/(A + B)
    mu[1] <- C/(C + D)
    k[1] <- n/2
    sn <- sum(y)

    for (i in 2:N) {
        kold <- k[i - 1]
        sk <- sum(y[1:kold])
        lambda[i] <- rbeta(1, A + sk, B + kold - sk)
        mu[i] <- rbeta(1, C + sn - sk, D + n - sn + sk - kold)
        knew <- sample(n - 1, 1)
        sknew <- sum(y[1:knew])
        r <- (sknew - sk) *
            (log(lambda[i]) - log(mu[i])) - (knew - kold) * (lambda[i] - mu[i])
        if (min(0, r) > log(runif(1))) k[i] <- knew
        else k[i] <- k[i - 1]
    }
    out <- cbind(lambda, mu, k)
    out[(burnin + 1):N, ]
}
> y <- c(rbinom(60, 1, 0.8), rbinom(40, 1, 0.3))
> uh <- cb.update(y, N = 5200)
> colMeans(uh)
   lambda        mu         k
0.8189303 0.4169367    59.077
> mean(y[1:60])
[1] 0.783
> mean(y[41:100])
[1] 0.45
> plot(density(uh[, 1]))
> plot(density(uh[, 2]))
> plot(table(uh[, 3]), type = "h")

This was off-topic. So something about business: isn't the (Win)BUGS
author working on an R port?

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Problem installing source packages on OS X

2004-09-15 Thread Jari Oksanen
On 15 Sep 2004, at 20:29, Aric Gregson wrote:
I am attempting to install the Hmisc, rreport and Design packages, but
am not able to do so. I am running R v1.9.1 on Mac OS 10.3.5.
I get the same error for Hmisc (rreport is not on CRAN). It looks like
it is trying to use g77 to compile the source package. How can I change
the default compiler? Will this solve the problem? I cannot find a
binary version of either package.
R is trying to build a Fortran program, and it needs a Fortran 
compiler. A Fortran compiler does not ship with MacOS X, so you have to 
get one. See the MacOS FAQ for R. If I remember correctly, it tells you 
to go to http://hpc.sourceforge.net/ for the compiler.

Normally I wouldn't remember addresses like this, but just today I had 
to make a visit there: I had installed g77 using fink, which puts 
its stuff into /sw instead of /usr/local. Some R routines had hardcoded 
the g77 path as /usr/local/bin/g77, and so building a package failed 
with the false claim of missing g77 (yeah, it was in the path).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] BUGS and OS X

2004-09-16 Thread Jari Oksanen
On Wed, 2004-09-15 at 21:29, Tamas K Papp wrote:
 On Wed, Sep 15, 2004 at 02:21:18PM -0400, Liaw, Andy wrote:
  That's more of a question for the BUGS developers.  BUGS is not open
source,
  so whatever binary is provided, that's all you can use.  If I'm not
  mistaken, WinBUGS is the only version under development.
 
 I found something called JAGS, and I am still exploring it.  It
 appears to be an open-source BUGS replacement, though with
 limitations.
 
MacOS X is a kind of unix (where the emphasis is on the "kind of"), so
you can get and compile any source code developed for unix -- with some
luck. One alternative is Bassist, available at
http://www.cs.helsinki.fi/research/fdk/bassist/. I just tried and found
out that you can compile and install it on MacOS X in the usual way
(./configure && make && sudo make install). That's all I can say about
it. It may not be the easiest to use. The current version seems to be a
bit oldish and not quite complete, but somebody claimed that they may
start developing Bassist again. Actually, Bob O'Hara (who usually calls
himself Anon. in this list) should know more, and hopefully this
message will prompt him to tell us, too.

 I was asking what software people would recommend for the same
 functionality, not a drop-in replacement.  I am just baffled by the
 bewildering array of R packages, and would be so happy if somebody
 told me what THEY use for Bayesian analysis, so I could read the docs
 and get started.  MCMC? Boa? etc.  Suggestions on how experienced
 users do Bayesian analysis in R would be welcome.
 
You need a guru to guide you. That's the holy tradition in Bayesianism.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Signs of loadings from princomp on Windows

2004-09-16 Thread Jari Oksanen
On Thu, 2004-09-16 at 02:38, Tony Plate wrote:
 You could investigate this yourself by looking at the code of princomp (try 
 getAnywhere(princomp.default)).  I'd suggest making a file that in-lines 
 the body of princomp.default into the commands you had below.  See if you 
 still get the difference.  (I'd be surprised if you didn't).  Then try 
 commenting out lines until the second pass through the commands produces the same 
 results as the first.  The very last thing you commented out might help to 
 answer your question "What would be causing the difference?"  (The fact 
 that various people chimed in to say they could reproduce the behavior 
 that bothered you, but didn't bother to dig deeper, suggests it didn't 
 bother them that much, which further suggests that you are the person 
 most motivated by this and thus the best candidate for investigating it 
 further...)
 
People were not too bothered, since the sign of an eigenvector is not
well defined in PCA: vectors x and -x are equally valid. Have you compared
absolute values? Do they differ much (more than, say, 1e-6)? If they
differ too much for you, this could be a symptom of some other problem,
so it may be worth investigating on the machines where you see this
(others can do nothing). Since princomp.default is difficult to find
(either getAnywhere(princomp.default) or stats:::princomp.default -- I hate
this information hiding), and its code is winding, I'd suggest you
concentrate on studying the line:

sol <- eigen(cv, symmetric = TRUE)

where you get the cv with

cv <- cov.wt(x)$cov * (1 - 1/nrow(x))

and x is your data matrix. If cv remains unchanged from time to time,
but there is a change in signs of sol$vectors, then you have localised
your problem. If it's not there, then the rest of the princomp.default
code is worth investigating. If it's in the eigen, then it dives deep
into Fortran, and that may be all you can say. (If your covariance
matrices change with repeated calculations, then the problem is deeper).
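
The comparison suggested above can be sketched like this (with made-up
data standing in for the matrix that shows the problem):

```r
## Check whether repeated eigen() calls differ only in sign.
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)        # placeholder data
cv <- cov.wt(x)$cov * (1 - 1/nrow(x))
s1 <- eigen(cv, symmetric = TRUE)
s2 <- eigen(cv, symmetric = TRUE)
## Sign flips are harmless, so compare absolute values instead
max(abs(abs(s1$vectors) - abs(s2$vectors)))
```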

However, sign doesn't matter if there are 
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Multi-dimensional scaling

2004-09-16 Thread Jari Oksanen
On Thu, 2004-09-16 at 17:28, Luis Rideau Cruz wrote:

 Is there any package/function in R which can perform multi-dimensional
 scaling?
 
Yes.

Ripley's MASS package has isoMDS for non-metric multidimensional
scaling. Moreover, the same package has the function sammon for another
variant. Some people regard SOM as a crude form of multidimensional
scaling, and that is -- surprise -- in MASS, too (but there are other
implementations). Basic R (or its stats component) has principal
co-ordinates analysis, a.k.a. metric multidimensional scaling.
Finally, R has a utility help.search which would show you most of these
and something else, too (perhaps xgvis in the xgobi package, if that's
installed on your system). Try help.search("multidimensional scaling").

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] data(eurodist) and PCA ??

2004-10-13 Thread Jari Oksanen
On Wed, 2004-10-13 at 09:51, Prof Brian Ripley wrote:
 On Wed, 13 Oct 2004, Dan Bolser wrote:

  I have a complex distance matrix, and I am thinking about how to cluster
  it and how to visualize the quality of the resulting clusters. 
 
 Using PCA and plotting the first two components is classical
 multi-dimensional scaling, as implemented by cmdscale().  Look up MDS
 somewhere (e.g. in MASS).  It is exact if the distances are Euclidean in
 2D.  However, eurodist gives road distances on the surface of sphere.
 
 Classic examples for the illustration of MDS are departements of France 
 based on proximity data and cities in the UK based on road distances.
 
These road distances seem to be very non-Euclidean indeed (even
non-metric). It seems to be 2282km from Athens to Milan if you go
directly, but if you go via Rome it is only 1403km:

> trip <- c("Athens", "Rome", "Milan")
> as.matrix(eurodist)[trip, trip]
       Athens Rome Milan
Athens      0  817  2282
Rome      817    0   586
Milan    2282  586     0
> 817 + 586
[1] 1403

I thought that the world is non-Euclidean, but not that obviously.
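
Despite the non-metric distances, cmdscale() will still produce an
approximate 2-D map of eurodist; a minimal sketch:

```r
## Metric MDS (principal co-ordinates) of the road distances;
## with non-Euclidean input the 2-D map is only an approximation.
loc <- cmdscale(eurodist, k = 2)
plot(loc, type = "n", asp = 1)
text(loc, labels = rownames(loc))
```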

cheers, jari oksanen


-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] biplot.princomp with loadings only

2004-10-04 Thread Jari Oksanen
On Thu, 2004-09-30 at 10:33, Christoph Lehmann wrote:
 Hi
 
 is there a way to plot only the loadings in a biplot (with the nice 
 arrows), and to skip the scores?
 
Christoph,

I may have overlooked some email messages, but it seems to me that you
haven't yet got an answer to your practical question. From the practical
point of view, we may skip the point that you rather ask for a
"monoplot" than a biplot if you have only one set of points. Further, I
may forget my surprise when I see that somebody really thinks that these
arrows are "nice". OK, they may be nice if you have only a couple of
them, but anybody plotting 30 or more arrows normally asks how to get
rid of this mess. 

Of course you can plot arrows in your monoplot, since you have got
access to everything in R and you can do anything with R (but the coffee
comes out somewhat bland, so I recommend something else for the task of
cooking coffee). Here is an example:

# Run PCA
data(USArrests)
sol <- princomp(USArrests, cor = TRUE)
# Extract loadings
X <- sol$loadings
# Plot the frame
plot(X, asp = 1, type = "n")
abline(v = 0, lty = 3)
abline(h = 0, lty = 3)
# Plot arrows: see ?arrows for the syntax
arrows(0, 0, X[,1], X[,2], length = 0.1, col = "red")
# Label the arrows
text(1.1*X, rownames(X), col = "red", xpd = TRUE)

Cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] How to use a matrix in pcurve?

2004-10-24 Thread Jari Oksanen
Sun,
On 24 Oct 2004, at 10:24, XP Sun wrote:
Hi, Everyone,
I want to calculate the principal curve of a set of points.
First I read the points' coordinates with the function scan,
then converted them to a matrix with the function matrix,
and fit the curve with the function principal.curve.
Here is my data in the file bmn007.data:
0.023603 -0.086540   -0.001533
0.024349 -0.083877   -0.001454
..
..
0.025004 -0.083690   -0.001829
0.025562 -0.083877   -0.001857
0.026100 -0.083877   0.90
0.025965 -0.083877   0.002574
and the code as follow:
pp <- scan("bmn007.data", quiet = TRUE)
x <- matrix(pp, nc = 2, byrow = TRUE)
fit <- principal.curve(x, plot = TRUE)
points(fit, col = "red")
So far, I got the right result.
But when I changed to using pcurve with the matrix x, as pcurve(x),
an error was thrown as follows:
Estimating starting configuration using : CA
Error in h %*% diag(sqrt(d)) : non-conformable arguments
How do I convert a matrix to a format that pcurve accepts?
Any help is appreciated!
Sun,
The canonical answer is: ask De'ath (the author of the package). The 
rest is guessing. It seems that pcurve uses correspondence analysis 
(CA) to estimate the starting configuration. CA cannot handle cases 
where any of the marginal sums (row or column sums) are negative or 
zero. Do you have cases like that? If so, can you get rid of them? 
Does pcurve have another option than CA for getting the starting 
configuration?

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] persp(), scatterplot3d(), ... argument

2004-10-27 Thread Jari Oksanen
On Wed, 2004-10-27 at 11:11, Uwe Ligges wrote:
 Jari Oksanen wrote:
 
  On Wed, 2004-10-27 at 10:04, Uwe Ligges wrote:
  
  
  This is a larger problem if 
  1. one of the underlying functions does not have ...
  2. you want to relay arguments to two or more underlying functions, and
  3. you don't want to list all possible arguments in your function
  definition, since it is long enough already.
  
  The solution is still there, but it is (black) magic. For instance,
  'arrows' does not have ..., so you must add them with this magical
  mystery string:
  
  formals(arrows) <- c(formals(arrows), alist(... = ))
 
 
 You don't need it for simple things like:
 
foo - function(...){
plot(1:10)
arrows(1,1,7,7,...)
}
 
 foo(lwd=5) # works!
 
That's why I had point 2 above: it really would work with simpler
things. However, the following may fail:

> parrow <-
function (x, y, ...)
{
    plot(x, y, ...)
    arrows(0, 0, x, y, ...)
    invisible()
}
> parrow(runif(10), runif(10), col = "red") # works
> parrow(runif(10), runif(10), col = "red", pch = 16)
Error in arrows(0, 0, x, y, ...) : unused argument(s) (pch ...)

Adding formals would help.

 
 As always, useful patches are welcome.
 

I don't know if this counts as a useful patch, but it is patch anyway:

diff -u2r old/arrows.R new/arrows.R
--- old/arrows.R	2004-10-27 11:32:25.0 +0300
+++ new/arrows.R	2004-10-27 11:32:53.0 +0300
@@ -1,5 +1,5 @@
 arrows <-
 function (x0, y0, x1, y1, length = 0.25, angle = 30, code = 2,
-    col = par("fg"), lty = NULL, lwd = par("lwd"), xpd = NULL)
+    col = par("fg"), lty = NULL, lwd = par("lwd"), xpd = NULL, ...)
 {
 .Internal(arrows(x0, y0, x1, y1, length = length, angle = angle,


cheers, jari oksanen
-- 
J.Oksanen, Oulu, Finland.
Object-oriented programming is an exceptionally bad idea which could
only have originated in California. E. Dijkstra



Re: [R] ploting an ellipse keeps giving errors

2004-10-27 Thread Jari Oksanen
On Wed, 2004-10-27 at 11:34, Sun wrote:
 library (ellipse)
 
 shape1 = c (1, 0, 0,1)
 dim(shape1) = c(2,2)
 ellipse (center = c(0,0), shape = shape1, radius = 1)
 
 =
 Error in plot.xy(xy.coords(x, y), type = type, col = col, lty = lty, ...) : 
 plot.new has not been called yet
 
 
 It is really frustrating. Also, what do the shape matrix and radius correspond 
 to in the ellipse function
 
 (x-x0)^2/a + (y-y0)^2/b = 1
 
 ? Please advise!

Sun, did you read the ?ellipse help page? I just read it, but I didn't find
arguments 'center', 'shape' or 'radius' there. It could be useful to use
the arguments specified in the help page. Section 'Details' of ?ellipse
explains the parametrization.

cheers, jari oksanen

-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] ploting an ellipse keeps giving errors

2004-10-27 Thread Jari Oksanen
On Wed, 2004-10-27 at 12:04, Jari Oksanen wrote:
 On Wed, 2004-10-27 at 11:34, Sun wrote:
  library (ellipse)

Here's your problem! See below.

  shape1 = c (1, 0, 0,1)
  dim(shape1) = c(2,2)
  ellipse (center = c(0,0), shape = shape1, radius = 1)
  
  =
  Error in plot.xy(xy.coords(x, y), type = type, col = col, lty = lty, ...) : 
  plot.new has not been called yet
  
  
  It is really frustrating. Also what do the shape matrix, radius correspond to an 
  ellipse function
  
  (x-x0)^2/a + (y-y0)^2/b = 1
  
  ? Please advise!
 
 Sun, did you read the ?ellipse help page? I just read, but I didn't find
 arguments 'center', 'shape' or 'radius' there. It could be useful to use
 argument specified in the help page.  Section 'Details' of ?ellipse
 explains the parametrization.
 
Sun,

Actually, the problem seems to be that you loaded library(ellipse), but
followed the instructions for the function ellipse in library(car). Would
this help? (One additional note: ellipse::ellipse.default uses the British
spelling 'centre', but 'cent' would work in both ellipse::ellipse
and car::ellipse.)
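
To make that concrete, here is a sketch using the arguments the ellipse
package documents; the exact argument names ('centre', 'level') are my
reading of the help page, so double-check them there:

```r
library(ellipse)
shape1 <- matrix(c(1, 0, 0, 1), 2, 2)
## ellipse::ellipse only returns the boundary points; it does not draw
## by itself, so plot the returned points explicitly.
xy <- ellipse(shape1, centre = c(0, 0), level = 0.95)
plot(xy, type = "l", asp = 1)
```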

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] biplot drawing conc ellipses

2004-11-05 Thread Jari Oksanen
On Thu, 2004-11-04 at 22:44, T. Murlidharan Nair wrote:
 Is there an option to draw concentration ellipses in biplots ? It seems
 really nice to summarize large number of points of each group.

Murli,

If you mean the biplot.prcomp function in the stats package, and you want to
draw the concentration ellipses for row scores, the answer probably is
"not easily". Technically, the problem is that the arrows for loadings are
drawn after the labels for row scores, and the scaling used for drawing row
scores is lost in the process. If you try to add points or segments to
the existing plot, you should use the scaling for arrows on sides 3 and
4 (top and right). If you want to add something for row scores, you just
don't have information on the co-ordinates. I didn't check biplot.princomp,
but the situation may be similar there. 

Drawing of ellipses is possible in some alternative packages. You
already got a hint of ade4. In addition, vegan has PCA as a special case
of its rda function, and there you have tools like ordiellipse (using
the ellipse package), ordispider and ordihull to display the variability
within factor levels. However, vegan doesn't have biplots like
biplot.prcomp, i.e. with arrows for loadings. Moreover, the scaling of
results is different. 

It seems that the only thing you can do is to write your own sweet biplot
function. 

cheers, jari oksanen
-- 
Jari Oksanen -- Oulu, Finland.
But, Mousie, thou art no thy lane, In proving foresight may be vain;
The best-laid schemes o' mice an 'men, Gang aft agley,
An'lea'e us nought but grief an' pain, For promis'd joy! (Robert Burns)



Re: [R] rgl on Mac OS

2004-11-08 Thread Jari Oksanen
On Sun, 2004-11-07 at 02:54, Saiwing Yeung wrote:

 It seems like a number of people on this list can install rgl but have
 problem loading it. I found myself in the same situation too.
 
 I have tried the workaround of removing /usr/X11R6/lib from
 DYLD_LIBRARY_PATH, but it doesn't seem to work for me, I am still getting
 the same error (that everyone else seems to get). Can anyone give me some
 ideas on what else to try?
 
 I have Mac OS 10.3.5, running R2.0. Thanks in advance!
 
I had a quick look at this issue, and indeed, rgl failed to load on my
system (MacOS X 10.3.6, R 2.0.0) with various error messages. It seems
to me that the binary packages at CRAN were incompatible (g++ is
notorious for incompatibilities between versions). The solution was to
use the source package and compile locally. For this you need to have a
compiler installed. The compiler comes on the MacOS X 10.3.* installation
cd/dvd, but you have to install the Developer Tools separately.

One of the early error messages was that libpng was missing. When
installing from source, rgl was configured without png support, and this
message disappeared. However, CRAN binaries failed even after installing
png libraries, but now with other error messages. I got my libpng with
the help of http://www.rna.nl/ii.html (that you need anyway).

It may be that you have to start X11 separately before calling
library(rgl), but this was not necessary in my later attempts. 

Summary: install from source package. Optionally, you may install libpng
as well.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] gdist and gower distance

2004-11-09 Thread Jari Oksanen
On Tue, 2004-11-09 at 12:59, Alessio Boattini wrote:
 Dear All,
  
 I would like to ask for clarification on the Gower distance matrix calculated by 
 the function gdist in the library mvpart.
 Here is a dummy example:
  
  library(mvpart)
 Loading required package: survival 
 Loading required package: splines 
  mvpart package loaded: extends rpart to include
  multivariate and distance-based partitioning
  x = matrix(1:6, byrow = T, ncol = 2)
  x
       [,1] [,2]
  [1,]    1    2
  [2,]    3    4
  [3,]    5    6
  gdist(x, method = "euclid")
           1        2
  2 2.828427 
  3 5.656854 2.828427
  
 ##
 doing the calculations by hand according to the formula in gdist help page I 
 get the same results. The formula given is:
  'euclidean'   d[jk] = sqrt(sum((x[ij]-x[ik])^2))
 #
 
  sqrt(8)
 [1] 2.828427
  gdist(x, method = "gower")
            1         2
  2 0.7071068  
  3 1.4142136 0.7071068
  
 ###
 doing the calculations by hand according to the formula in gdist help page 
 cannot reproduce the same results. The formula given is:
  'gower'   d[jk] = sum(abs(x[ij]-x[ik])/(max(i)-min(i)))
 ##
  
 Could anybody please shed some light?
  

There seems to be a bug in the documentation. The function uses a different
calculation than the help page specifies. Look at the 'gdist' code. Just
to make things easier: in the function body, gower is method 6, and
Euclidean distance is method 2.

Gower's original paper is available through http://www.jstor.org/
(Biometrics Vol. 27, No. 4, p. 857-871; 1971).
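
To see the discrepancy concretely, the help-page formula can be coded
directly and compared with the gdist() output quoted above (this is my
reading of the documented formula, not the package's actual code):

```r
x <- matrix(1:6, byrow = TRUE, ncol = 2)
## Help-page formula: d[jk] = sum(abs(x[ij]-x[ik]) / (max(i)-min(i)))
rng <- apply(x, 2, function(col) diff(range(col)))
gower_jk <- function(j, k) sum(abs(x[j, ] - x[k, ]) / rng)
gower_jk(1, 2)   # gives 1, not the 0.7071068 that gdist() reports
```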

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] R works on Fedora Core 3

2004-11-09 Thread Jari Oksanen
On 9 Nov 2004, at 19:44, Jonathan Baron wrote:
The RPM for Fedora Core 2 seems to work just fine on Core 3.
(The graphics window got smaller, but I'm sure there is a setting
for that.)
That would be good news. I really don't know how the graphics window 
became so big at some stage. (MacOS X is just cute here: a tiny, sharp, 
fast graphics window.)

Has options("printcmd") reappeared, so that dev.print() works without 
changing the default options?

cheers, jazza
--
Jari Oksanen, Oulu, Finland


Re: [R] CDs for R?

2004-11-16 Thread Jari Oksanen
On 16 Nov 2004, at 23:39, (Ted Harding) wrote:
Some of us are on narrow bandwidth dialup connections,
so downloading large quantities of stuff is out of the
question (e.g. at approx. 5min/MB, it would take over
2 days to download a single CD). The meat of CRAN
(including contributed packages and documentation)
is enough to fill 5 CDs, though one individual probably
wouldn't be interested in all of that.
5 CDs sounds 4 too many. I once burnt CDs for my students, and everything 
fitted nicely on one CD (Windows binaries, all packages as Windows 
binaries and sources, contributed documents).  I guess you can fit the 
Windows, Mac and some Linux binaries all on one CD.

Now comes my suggestion to the CRAN maintainers: this all would be easier 
if you produced a CD image file ('iso') containing a snapshot of the 
latest version: main binaries, all contributed packages, and docs. 
Getting somebody to help download this iso would be much easier than 
trying to collect everything first and then making up your own cd image.

Actually, only Windows and Mac users need binary versions of packages: 
the former because they don't have the tools to install from source, the 
latter because they don't know that they have the tools (being 
command-line challenged).

To Dirk Eddelbuettel: Yes indeed, Ubuntu gives a human face to Debian and 
is a much more pleasant experience. However, changing OS for R may be 
asking too much. Further, Ubuntu/Debian comes with a tiny and biased 
selection of packages, and if that's not your kind of bias, you have 
got to go to the Internet again. Further, Ubuntu (and other Linuxes) 
lag behind R. The current Ubuntu release comes with R 1.9.1, and it 
won't be upgraded until the next release, scheduled for April 2005 (and 
just at the same time as the next R, so that Ubuntu will be one R 
version behind again). I guess the lag is even worse for packages.

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] CDs for R?

2004-11-17 Thread Jari Oksanen
On Wed, 2004-11-17 at 16:54, Dirk Eddelbuettel wrote:
 On Wed, Nov 17, 2004 at 08:25:54AM +0200, Jari Oksanen wrote:
  
  On 16 Nov 2004, at 23:39, (Ted Harding) wrote:
  Now comes my suggestion to CRAN maintainer: this all would be easier, 
  if you would produce a CD image file ('iso') that would contain a 
  snapshot of the latest version: main binaries, all contributed 
  packages, and docs. Getting somebody to help downloading this iso would 
  be much easier than trying to collect all first and then make up your 
  own cd image.
 
 It's volunteer effort, so someone actually has to do this. Can you help?
 
Probably not. Not because I wouldn't be willing, but I may not be
able... 

I have done this a couple of times, using wget to build a local subtree
of selected parts of CRAN. Then running mkisofs was pretty simple. I
guess this could be automated pretty easily if you have the repository
already at hand: all you need is mkisofs + info on its targets. However,
I am not that kind of guru.

All this would require that people think it is worthwhile. I think
that the general feeling has been that there is no need for an
R-current.iso snapshot (or the same under a valid Windows name). So this
is an academic issue (which suits me).

  
  To Dirk Eddelbuettel: Yes indeed, Ubuntu gives human face to Debian and 
  is a much more pleasant experience. However, changing OS for R may be 
  asking too much. Further, Ubuntu/Debian comes with a tiny and biased 
  selection of packages, and if that's not your kind of bias, you have 
  got to go to the Internet again. Further, Ubuntu (and other Linuxes) 
 
 Again, it reflects the interests of the volunteers involved. If you want to
 see other things done, come join in and do them.
 
I know this is volunteer work, and I do appreciate this volunteer work.
It is all biased -- hence the formulation "your kind of bias". At the
moment I have no idea how to build a deb package from R packages, so I
don't know what to say. 

  lag behind R. The current Ubuntu release comes with R 1.9.1, and it 
  won't be upgraded but in the next release scheduled for April 2005 (and 
  just in the same time as the next R, so that Ubuntu will be one R 
  version off again). I guess the lag is even worse in packages.
 
 This actually requires a response. Here is a quick log (from my mail folder)
 about what new packages (of mine, can't speak for others) got uploaded
 recently -- in most cases, this is on the day of the source release, so the
 lag would be close to zero.
  
 Now, if and when these get pressed into a release by Debian or Ubuntu I do
 not control. Which is, I guess, why we're discussing archive snapshots in
 this thread. 
 
They go, I guess, through a testing period in Debian, and if they don't
wait for anybody else, they may appear in some version of Debian after
that. In the Debian repository you typically see much older versions. As
for Ubuntu (which I know a bit better), they will go into the next release,
which is nearly six months ahead (they are not upgraded in between). 

Actually, Ubuntu is a bad choice if you just want to have R, since R is
not among the core packages but is unsupported. Moreover, Ubuntu is
a bad choice for the original problem of slow wires: even for an
ordinary install you need an internet connection if you want to get beyond
a very rudimentary system. I just forgot this in my previous message:
when you're wired, you think it's natural to be wired. So forget Ubuntu
if you want to have R without a fast internet connection. 

I have Ubuntu since it was about the only easily managed powerpc system
I found. At the moment, I have R 2.0.0 built from the source distribution
there. The packages are built from source files, too. 

Thanks for the good work with Debian!

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Running R from CD?

2004-11-22 Thread Jari Oksanen
On Mon, 2004-11-22 at 02:41, bogdan romocea wrote:
 Better install and run R from a USB flash drive. This will save you
 the trouble of re-writing the CD as you upgrade and install new
 packages. Also, you can simply copy the R installation on your work
 computer (no install rights needed); R will run.
 
I think there is a niche (= a hole in the wall) for a live CD: it is
cheaper to distribute 20 copies of a CD to your audience than 20 USB
memory sticks. Instructions would be welcome.
 
 From: Hans van Walen hans_at_vanwalen.com

 At work I have no permission to install R. So, would anyone know
 whether it is possible to create a CD with a running R-installation
 for a windows(XP) pc? And of course, how to?
 
Check the file Getting-Started-with-the-Rcmdr.pdf in John Fox's Rcmdr
package. You should be able to reach this package by launching
help.start() and then browsing its directory in the help browser
window. Go to chapter 7, "Some Suggestions for Instructors", which tells
you how to make a live CD of R in Windows. I haven't tried this, since I
don't have Windows, but I surely will if I get to be an instructor in
a Windows class.
 
cheers, jari oksanen 
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] How to insert one element into a vector?

2004-11-23 Thread Jari Oksanen
On Mon, 2004-11-22 at 17:43, Barry Rowlingson wrote:
 Deepayan Sarkar wrote:
 
  Pretty much what 'append' does.
 
   A shame then, that help.search("insert") doesn't find 'append'! I can't 
 think why anyone looking for a way of _inserting_ a value in the middle 
 of a vector would think of looking at append!
 
   Python has separate insert and append methods for vectors.
 
>>> x = [1, 2, 3, 4, 6]
>>> x.insert(4, 5)
>>> x
[1, 2, 3, 4, 5, 6]
>>> x.append(99)
>>> x
[1, 2, 3, 4, 5, 6, 99]

So has R. R's 'insert' is called 'append', and R's 'append' is called
'c'. It is counter-intuitive, though, and I'm happy that Peter Dalgaard
didn't know that 'append' inserts: it gives some hope to us ordinary
mortals.
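
For comparison, the two Python operations above read like this in R:

```r
x <- c(1, 2, 3, 4, 6)
append(x, 5, after = 4)   # "insert": gives 1 2 3 4 5 6
c(x, 99)                  # "append": gives 1 2 3 4 6 99
```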

cheers, jazza
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] The hidden costs of GPL software?

2004-11-23 Thread Jari Oksanen
On Tue, 2004-11-23 at 17:40, roger koenker wrote:
 Having just finished an index I would like to second John's comments.
 Even as an author, it is  difficult to achieve some degree of
 completeness and consistency.
 
 Of course, maybe a real whizz at clustering could assemble something
 very useful quite easily.  All of us who have had the frustration of 
 searching
 for a forgotten function would be grateful.
 
You mean SOM?
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] A basic question

2004-11-30 Thread Jari Oksanen
On Tue, 2004-11-30 at 13:58, Kenneth wrote:
 Hi R users:
 
 I want to know any experience compiling R in other LINUX distributions
 besides FEDORA (Red Hat) or Mandrake, for example in BSD, Debian,
 Gentoo, Slackware, vector LINUX, Knoppix, Yopper or CERN linux?
 
 Hope this is not a basic question
 
 Thank you for your help.
 
I assume that the following will typically work:
Get the source file, gunzip and untar, cd to the created directory and
type:

./configure
make
sudo make install

It is best to check the resulting configuration after ./configure and
get the software (compilers, libraries, packages, utilities) you need
for any missing functionality you want to have. It is also wise to run
'make check' after 'make' so that you see whether you can trust your
compilation. This 'make check' fails in some cases: at least the standard
package 'foreign' failed 'make check' on the ppc architecture both in Red
Hat/Fedora based (Yellowdog) and Debian based (Ubuntu) Linuxes when I
tried last time. Otherwise the compilation seems to run smoothly (and
you may not need 'foreign').

BSD is not Linux, but R is officially supported at least for one version
of BSD with GNU tools: MacOS X.

cheers,jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] can't install r package on debian due to linker problem

2004-12-01 Thread Jari Oksanen
On Wed, 2004-12-01 at 14:38, Robert Sams wrote:
 hi,
 
 my attempt to install the package Hmisc v3.0-1 fails with the message:
 
 /usr/bin/ld: cannot find -lfrtbegin
 collect2: ld returned 1 exit status
 make: *** [Hmisc.so] Error 1
 ERROR: compilation failed for package 'Hmisc'
 
It is funny to see this error message on Debian, which is a GNU/Linux
system. Typically you see the very same error message on MacOS X, which
is a GNU/BSD system, where it is caused by a missing Fortran compiler.
Indeed, at least in Red Hat Linux, libfrtbegin.a is owned by Fortran
(g77). However, you say below that you have installed Fortran (g77). I
suggest you check whether some Fortran related packages are missing,
or you can try to 'locate' libfrtbegin.a on your system and see if it is
in the linker search path.
 
 i'm at a loss here. any hints will be very much appreciated.
 
 i'm running:
 
 debian stable
 R version 2.0.1
 gcc 2.95.4-14
 g77 2.95.4-14
 binutils 2.12.90.0.1-4
 
cheers, jari oksanen

-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] step.gam

2004-12-01 Thread Jari Oksanen
On Wed, 2004-12-01 at 17:09, David Nogués wrote:
 Dear R-users:
 
 I'm trying (using the gam package) to develop a stepwise analysis. My gam 
 object contains five predictor variables (a,b,c,d,e,f). I define the 
 step.gam:
 
 step.gam(gamobject, scope=list(a= ~s(a,4), b= ~s(b,4), c= ~s(c,4), 
 d= ~s(d,4), e= ~s(e,4), f= ~s(f,4)))
 
Your scope doesn't look much like Trevor Hastie's help page. Have you
tried formulating your scope like Hastie tells you to do? That is, for
"a" you should list all the possible cases for stepping instead of only
one: something like a = ~ 1 + a + s(a, 2) + s(a, 4).
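
For illustration, a hedged sketch of such a scope (the variable names and
data are placeholders, and the interface is as I recall it from the gam
package, so check ?step.gam):

```r
## Assumed step.gam interface from Hastie's 'gam' package -- untested sketch
library(gam)
scope <- list(
  "a" = ~ 1 + a + s(a, 2) + s(a, 4),
  "b" = ~ 1 + b + s(b, 2) + s(b, 4)
)
## fit <- gam(y ~ s(a, 4) + s(b, 4), data = mydata, family = gaussian)
## step.gam(fit, scope = scope)
```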

Why do you want to use this kind of stepping, when the standard package
mgcv has a much better way of model building using generalized cross
validation?

Dave Roberts discusses R/S-plus (or mgcv/gam package level) gam fitting
in ecological context at
http://labdsv.nr.usu.edu/splus_R/lab5/lab5.html. You may find some
useful hints here, as Dave is partial to the traditional S-plus gam as
well.

cheers, jari oksanen
 
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Protocol for answering basic questions

2004-12-01 Thread Jari Oksanen
On 1 Dec 2004, at 19:46, [EMAIL PROTECTED] wrote:
I have been a member for only a few days but I find the tone of some
responses inappropriate for a list dubbing itself a help list. I also
completely understand that traffic needs to be kept at a modest level to
keep advanced users interested; therefore, I suggest that a second help
list be created to deal with advanced R help. Belong to both lists if you
wish and filter your email for cursory glances or a detailed reading.
Users must judge themselves the level of their queries, and perhaps add a
note saying something like requests to the advanced list are generally
made by users who already have a very good working knowledge of R, or
some very rough benchmark for judging your level, like 2 years.

I do not know how much work this would involve or the resources available
for this - it is a blind proposal. I think it might deal with many of the
problems both beginner and advanced users have with the present list.

You may not have been on this list long enough to see that some of the 
old-time gurus have reached a demigod-like status. Demigods have every 
right to be `rude' (that's almost a definition of a demi-deity). That 
said, I do understand your sentiments: I'd be afraid to post a question to 
this list myself. I also remember being shocked that the first message I 
sent here got answers from people like VR (both) and many others, and 
these were friendly and useful answers (although I could have found the 
answer to my question with careful reading of the documents -- it was about 
specifying an offset in glm).

This is a subscribed mailing list. As such, it is a restrictive list 
with more stringent rules than open newsgroups. Well, newsgroups can be 
really harsh places, too. I don't think it would be wise to 
establish a parallel novice mailing list. That would only add one extra 
irritation: cross-posting to several lists. However, I do think that 
novice questions could be better served in a newsgroup (Usenet) than 
in a closed mailing list. There have been several suggestions of 
transforming this mailing list into a newsgroup, but these suggestions 
have been rejected, and rightly so. However, if you want a novice 
group with a slacker netiquette, you could try to establish a parallel 
and alternative newsgroup with a different emphasis than this mailing 
list. I am sure that many of the greatest gurus wouldn't follow you 
into this newsgroup, but would keep to this mailing list. If you 
want answers to 'basic', 'silly' or 'simple' questions, you 
don't need them either.

Suggesting a Usenet newsgroup may be a generation thing. I think some of the 
younger users would prefer a Wiki or a Forum (these are words I've 
seen, but I wouldn't visit places like that, talking about my 
g-g-generation).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] depth constrained cluster

2004-12-01 Thread Jari Oksanen
On Wed, 2004-12-01 at 18:36, Emmanuel GANDOUIN wrote:
 Please could you help me to find a package to apply a depth-constrained 
 cluster analysis on palaeoecological data (in order to zone subfossil 
 diagram)?
 
I assume that you made an exhaustive search in CRAN, and the lack of
answers indicates that there is no such function in R. You may check
Pierre Legendre's Progiciel R instead (yes, this is a different R and
has priority to the name, our R being a later homonym). The Progiciel
R is available at http://www.fas.umontreal.ca/biol/casgrain/fr/labo/.
This package seems to have both one-dimensional or chronologically
constrained clustering and two-dimensional or spatially constrained
clustering.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Protocol for answering basic questions

2004-12-02 Thread Jari Oksanen
On Thu, 2004-12-02 at 11:19, John Logsdon wrote:

 There are three ways of tackling this as far as I see:
 
 First would be to make the list a Reply to Sender so that most of us don't
 see the replies. This would keep the traffic down, and if any topic was of
 interest to another member, s/he could ask the originator whether it had
 been solved, or the solution could also be posted as a summary.  One
 advantage of Reply to Sender is that only the Sender sees the
 multiple messages saying the same thing from good souls around the
 world who haven't seen the N-1 other messages...
 
This seems to depend on the mail reader. This already is the default
behaviour for me (the Evolution mail reader). I have to select
Reply-to-All to send the message to r-help as well -- and then it goes
to the Cc list as well. It seems that some other mail software behaves
differently. It seems that R-help mail has two candidate addresses for Reply:

From: this field is the original poster
Sender: [EMAIL PROTECTED]

Obviously my mail reader picks only From, but John's picks both From
and Sender.

Some other mailing lists add to the headers a new field Reply-To which
equals From (the original poster). It seems that this would be sufficient
to make many mail readers use this as the default address.

Another issue is whether it is nice to divert a public discussion to a
private conversation. In several cases the solutions to the problem
remain private as well. After all, the purpose of the mailing list is a
public discussion instead of a public call to a private discussion.

cheers, jari oksanen
-- 
Jari Oksanen -- Oulu, Finland.
But, Mousie, thou art no thy lane, In proving foresight may be vain;
The best-laid schemes o' mice an 'men, Gang aft agley,
An'lea'e us nought but grief an' pain, For promis'd joy! (Robert Burns)



Re: [R] How about a mascot for R?

2004-12-02 Thread Jari Oksanen
On 2 Dec 2004, at 19:46, (Ted Harding) wrote:
On 02-Dec-04 Henrik Bengtsson wrote:
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Damian
Betebenner
Sent: Thursday, December 02, 2004 6:07 PM
To: [EMAIL PROTECTED]
Subject: [R] How about a mascot for R?
Excellent replies,
So a couple of questions about preferences for the mascot:
1. Does the mascot need to have a name that starts with R? Is
that usually the way it works?
So far the possibilities put forward are: Ray, Ram, Inch
Worm, Rhinoceros
R.oo (http://www.maths.lth.se/help/R/R.oo/), ooops Roo, which is
Australian slang for Kangaroo. http://images.google.com/images?q=roo
Cheers
Henrik Bengtsson
(And of course .oo suggests the OO aspect of R as well).
But what appeals to me about this suggestion is that it made
me recall cartoon drawings I saw many years ago, illustrating
leptokurtic and platykurtic.
The platykurtic was a profile drawing of a platypus,
illustrating the flat-topped profile of such a distribution.
The leptokurtic showed two kangaroos in profile, upright,
face-to-face, with tails outstretched on the ground behind them.
The envelope of this drawing illustrated the high peak and the
long tails. (And of course they are good leppers).
Can anyone remember where this appeared?

I can check that tomorrow when I'm at my office. You can have a look at 
the image at

http://cc.oulu.fi/~jarioksa/mascot.html
I think this is a copyrighted picture, and it cannot be used freely as a 
mascot (and it will disappear soon from this address).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] How about a mascot for R?

2004-12-02 Thread Jari Oksanen
On Thu, 2004-12-02 at 22:56, Peter Dalgaard wrote:
 Tim Churches [EMAIL PROTECTED] writes:
 
  Damian Betebenner wrote:
   R users,
   How come R doesn't have a mascot?
  
  Perhaps someone with artistic flair could create a mascot based on
  this image? It would help to give newcomers to R-help the right idea:
  
  http://www.accesscom.com/~alvaro/alien/thepics/ripley1__.jpg
 
 Or maybe this one:
 
 http://www.accesscom.com/~alvaro/alien/thepics/bg10s.jpg
 
 or (apologies to Pat Burns):
 
 http://www.accesscom.com/~alvaro/alien/thepics/alien102_.jpg

It seems that tastes in movies vary. I've never liked movies about
ecologically non-sustainable and energetically impossible life forms.
The current sub-theme brings to my mind something completely different:
http://www.hundland.com/posters/t/TheTalentedMr.Ripley.jpg.

cheers, jari o.
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] How about a mascot for R?

2004-12-04 Thread Jari Oksanen
On 4 Dec 2004, at 16:19, Martin Maechler wrote:
DScottNZ == David Scott [EMAIL PROTECTED]
on Fri, 3 Dec 2004 15:04:52 +1300 (NZDT) writes:
  
DScottNZ As to an animal mascot, I think a New Zealand
DScottNZ mascot is a must,
well, thinking that must is a bit strong, I agree:
I have had the same idea (an NZ animal) before your post.
I first thought of the obvious Kiwi, but hoping for something
more beautiful had been googling around for New Zealand animals,
then had been sidetracked by the Kakapo, which I found nice and
intriguing, but in its fight against extinction it didn't seem to
fit my notion of R..
Firstly, the Kiwi is a rip snorter of a bird. Secondly, there are other 
kinds of kiwis than the kiwi bird. I'm living about as far away from 
NZ as it is possible to be (you're getting closer if you try to get away), 
but even I've heard of 'kiwi fruit', 'kiwi bear' (brushtail possum) and 
'kiwi' as people. So it could be something 'kiwi'. I do think that a 
kiwi bird would be a mascotty kind of creature: cuddly and round and 
easyish to draw. One parallel story brought up here is the penguin 
as the Linux mascot. Actually, this is a not-so-pleasant story: Linus 
Torvalds told somewhere that a penguin (hardly a gentoo but some other 
species) tried to bite off his finger in a zoo, which made him like 
those animals (he's a Swedish-speaking Finn, which helps to explain this 
attitude). With this attitude, you could pick a gray, mouse-like 
nocturnal bird as a mascot. Naturally, this is none of my business, so 
you should not let this message influence your opinion (it wouldn't 
anyway).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] Gam() function in R

2004-12-06 Thread Jari Oksanen
On 6 Dec 2004, at 7:36, Janice Tse wrote:
Thanks for the email. I will check that out
However, when I was doing this: gam(y~s(x1)+s(x2,3), family=gaussian,
data=mydata), it gives me the error:

Error in terms.formula(formula, data = data) :
invalid model formula in ExtractVars
What does it mean ?
When Andy Liaw answered you (below), he asked you to specify which kind  
of 'gam' you used: the one in the standard package 'mgcv' or the one in  
package 'gam'. We need to know this in order to know what your error  
message means. If you used mgcv:::gam, it means that you didn't read its  
help pages, which say that you should specify your model as:

gam(y ~ s(x1) + s(x2, k=3))
Further, it may be useful to read the help pages to understand what it  
means to specify k=3 and how it may influence your model. Simon Wood --  
the mgcv author -- also has a very useful article in the R Newsletter:  
see the CRAN archive. It may be really difficult to understand what you  
do with mgcv:::gam unless you read this paper (it is possible,  
but hard). Simon's article specifically answers your first question  
of deciding the smoothness, and explains how elegantly this is done in  
mgcv:::gam (gam:::gam has another set of tools and philosophy).
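To illustrate the point, a hedged mgcv sketch on made-up data (the data and names here are invented, not the poster's):

```r
# A sketch using mgcv, which ships with R as a recommended package.
# Note that k = 3 sets the basis dimension: an upper bound on the
# smoothness chosen by GCV, not a fixed degrees of freedom as in
# package gam.
library(mgcv)
set.seed(1)
mydata <- data.frame(x1 = runif(100), x2 = runif(100))
mydata$y <- sin(2 * pi * mydata$x1) + mydata$x2 + rnorm(100, sd = 0.2)
fit <- gam(y ~ s(x1) + s(x2, k = 3), family = gaussian, data = mydata)
summary(fit)  # effective degrees of freedom chosen by GCV, bounded by k
```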

If you happened to use gam:::gam, then you have to look at another  
explanation.

cheers, jari oksanen
From: Liaw, Andy [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 05, 2004 11:34 PM
To: 'Janice Tse'; [EMAIL PROTECTED]
Subject: RE: [R] Gam() function in R
Unfortunately that's not really an R question.  I recommend that you  
read up
on the statistical methods underneath.  One that I'd wholeheartedly
recommend is Prof. Harrell's `Regression Modeling Strategies'.

[BTW, there are now two implementations of gam() in R: one in `mgcv',  
which
is fairly different from that in  `gam'.  I'm guessing you're  
referring to
the one in `gam', but please remember to state which contributed  
package
you're using, along with version of R and OS.]

Cheers,
Andy
From: Janice Tse
Hi all,
I'm a new user of the R gam() function. I am wondering how do
we decide on the smooth function to use?
The general form is gam(y~s(x1,df=i)+s(x2,df=j)...); how do we
decide on the degrees of freedom to use for each smoother, and whether we
should apply a smoother to each attribute?
Thanks!!

--
Jari Oksanen, Oulu, Finland


Re: [R] Importing vector graphics into R

2004-12-08 Thread Jari Oksanen
On Wed, 2004-12-08 at 15:53, [EMAIL PROTECTED] wrote:
 On 08-Dec-04 Roger Bivand wrote:
  On Tue, 7 Dec 2004, Hinrich Göhlmann wrote:
  
  Dear R users,
  
  I know of the possibility to import bitmaps via the nice
  pixmap library. 
  But if you later on create a PDF it is somewhat
  disappointing to have such graphics bitmapped. Is there
  a trick (via maps?) to import a vector graphic and have
  them plotted onto a graph? My searching attempts in the
  searchable r-help archive did not seem to result in anything 
  useful...
  
  No, nothing obvious. If you have an Xfig file - or convert to
  one from PS,
 
 How does one do that? None of the tools I can find on my (Linux)
 system seem to include the possibility of PS-Xfig (or any other
 vector format either, except of course PDF).
 
pstoedit. It may not be in standard distros, but it can be compiled from
the source. Here we have even used pstoedit for post-processing eps graphs
from R. It works in some cases, but, for instance, lattice graphics were
made of polygons instead of lines, and we couldn't change the line widths
of the horizontal lines only in the panel headers.

This is what pstoedit gives for version info:

pstoedit: version 3.33 / DLL interface 108 (build Oct 17 2003 - release
build) : Copyright (C) 1993 - 2003 Wolfgang Glunz

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



RE: [R] How about a mascot for R?

2004-12-08 Thread Jari Oksanen
On Wed, 2004-12-08 at 14:10, Rau, Roland wrote:

 Dear all,
 
 browsing through the suggestions, I have the impression that the general
 direction is towards an animal from New Zealand (I guess because of the
 roots of R). But the R Foundation is now located in Vienna,
 Austria. What about a typical Austrian animal? Is there one? Maybe a
 Wolpertinger. A Wolpertinger is a fantasy animal: a rabbit
 with the antlers known from deer and the wings of a bird. In addition
 to the Austrian headquarters, another reason for such an animal which
 does not exist in reality (or does it???) is that coding something with
 R is sometimes so easy that it appears to be almost unreal.

I just wait for someone jumping off and saying this is off-topic and you
should stop posting to this list -- and I'm afraid it could happen just
at this point. However, if you accept stranger animals then the group
called Rhinogradentia gives good candidates (at least as pleasant as
Onychophora suggested previously). First, they have R in their name.
Second, they look like mascots.

The most authoritative guide to the group is:

Stümpke, H. 1957. Bau und Leben der Rhinogradentia. Gustav Fischer
Verlag, Stuttgart

The English translation is The Snouters: Form and Life of the
Rhinogrades. The University of Chicago Press (1981).

Google will find more info for those who don't have access to these
books.

cheers, jaRi oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



RE: [R] How about a mascot for R?

2004-12-08 Thread Jari Oksanen
On Wed, 2004-12-08 at 17:01, Jari Oksanen wrote:

 
 I just wait for someone jumping off and saying this is off-topic and you
 should stop posting to this list -- and I'm afraid it could happen just
 at this point. 

Just to make it clear and to avoid misunderstanding: I was trying to
reach a passive voice with my poor English. I don't want to indicate
that anyone else but me should stop posting to this list...

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]




Re: [R] How to circumvent negative eigenvalues in the capscale function

2004-12-09 Thread Jari Oksanen
On Fri, 2004-12-10 at 06:11, [EMAIL PROTECTED] wrote:
 

 I am trying to do a partial canonical analysis of principal coordinates
 using Bray-Curtis distances. The capscale add-in to R appears to be the only
 way of doing it; however, when I try to calculate a Bray-Curtis distance
 matrix either using capscale or vegdist (capscale, I understand, uses
 vegdist anyway to calculate its distance matrix), R uses up all available
 memory on the computer, stops, and then comes back with errors regarding
 negative eigenvalues.
 
The way to avoid negative eigenvalues is to use a ``positive
semidefinite'' dissimilarity matrix. This may sound cryptic. In simple
words: the underlying functions in capscale assume that your
dissimilarities are like (Euclidean) distances, meaning that the
shortest route between two points is a straight line, and you cannot
find a shorter route by going via a third point. This is possible with
the Bray-Curtis index, and as its symptom you get negative eigenvalues
(which are ignored in capscale, so that only the dimensions with positive
eigenvalues are used). Were negative eigenvalues your problem, you could
avoid them by using another dissimilarity index with better metric
properties. The Jaccard dissimilarity is rank-order similar to Bray-Curtis,
but it should be positive semidefinite.
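As a sketch, assuming the vegan package is installed (varespec is an example data set shipped with it; the object names are arbitrary):

```r
# Sketch with the vegan package (assumed installed): swap Bray-Curtis
# for the rank-order similar Jaccard dissimilarity before ordination.
library(vegan)
data(varespec)                              # example community data
d <- vegdist(varespec, method = "jaccard")  # instead of method = "bray"
# mod <- capscale(d ~ 1)                    # then proceed as before
```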

However, I don't think that negative eigenvalues and memory
problems are coupled. I guess that you simply have memory problems, and
the negative eigenvalues are unrelated. So you need more memory or an
operating system with better memory handling. You may try some
Linux live-CD (such as Quantian) where you can use R in Linux without
installing Linux on your hard drive.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Switching to Mac, suggestions? (was switching to linux)

2004-12-13 Thread Jari Oksanen
On 13 Dec 2004, at 19:53, doktora v wrote:
I'm looking to switch to the Mac platform. Has anyone had any experience
with that? I'm expecting a Power G4 laptop later this week... hope
R behaves...
I have been a Linux user since 1999, and I got my first ever Mac (an iBook 
G4 laptop) last December. There is just as little to comment on MacOS X 
as there is to comment on flavours of Linux distros: there is no large 
difference as regards R. I still prefer emacs + ess as a shell (and 
you can get some kind of real emacs on the Mac as well), but MacOS X R is 
more of an eye candy (though I find it really hard to get any real use 
for transparent windows in R: I still prefer to see what I type instead 
of looking at the background through the text). As regards R, it is 
just the same whether you have any brand of Linux or MacOS X or even a 
fringe system like Windows. The differences are somewhere else than in 
R.

By the way, Ubuntu GNU/Linux works nicely on the Mac, with a blas that 
knows about the vector processor in the G4.

cheers, jazza
--
Jari Oksanen, Oulu, Finland


Re: [R] Switching to Mac, suggestions? (was switching to linux)

2004-12-13 Thread Jari Oksanen
On Mon, 2004-12-13 at 19:53, doktora v wrote:
 I'm looking to switch to the Mac platform. Has anyone had any experience
 with that? I'm expecting a Power G4 laptop later this week... hope
 R behaves...
 
Still one comment on speed. I once (and, actually, just now) had to
analyse a big data set of some 1100 observations using various
multivariate methods, among them isoMDS of MASS and eigenvector methods
in the vegan library. I made a test suite of a typical analysis sequence
for this very special data set. So it is non-general, but something that
matters to me. I have run this data set on a crippled (=Celeron) i686
under Linux and Windows, and on a G4 (iBook and iMac) under MacOS X,
Yellowdog Linux 3 and Ubuntu GNU/Linux 4.10. It may be daring to say
something about G4 performance based on this special case, but this
doesn't stop me from saying it. For my whole sequence, the G4 with MacOS X
is somewhat faster relative to cpu speed than the Celeron, but not nearly
as much as advertised. There were some procedures that ran slower per MHz
than the Celeron (isoMDS). However, MacOS X comes with a G4-optimized blas,
so that eigenvector-based analysis was faster: the 800 MHz iBook ran like a
1400 MHz Celeron, and the 1000 MHz iMac ran like a 1700 MHz Celeron. I guess
the boost depends on the time you spend in blas. Otherwise you may count
that your G4 cpu cycles equal i686 cpu cycles, and you are slower since you
can get faster Intel chips. The vector processor (AltiVec) may be handy, but
most functions can't use it without very tedious and ugly hand-optimized
code. I've seen claims that gcc 3.4 has some automatic G4 optimization.
If this is true, you may get some advantage with a G4.

G5 is a different issue.

Yellowdog Linux 3 didn't have a G4-optimized blas, and it was really slow.
Actually, the 800 MHz iBook ran like a 500 MHz Celeron in a blas-heavy
analysis. YD3 was so old that I couldn't build an optimized blas without
extensive upgrading (gcc, glibc etc.), and I really wasn't motivated for
that. You can get a G4-optimized blas for Ubuntu GNU/Linux, and with that
it runs just as fast as MacOS X.

BTW, this test matters in the sense that I have to run these analyses,
and they take an observable amount of time. The test suite ran on the
800 MHz iBook in 1600 secs, and on a 2 GHz Celeron in 700 secs. We are not
talking about millisecond boosts but about going to lunch versus sitting by
your computer.

Another issue is that graphics are superb on the Mac. The default
plot device (quartz) is small but sharp. It used to scale instantly
when you changed its size, but this deteriorated in the 2.0 series.

cheers, jari oksanen

-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] printing PCA scores

2005-01-24 Thread Jari Oksanen
On Sat, 2005-01-22 at 17:31 -0500, Jérôme Lemaître wrote:
 Hey folks,
 
 I have an environmental dataset on which I conducted a PCA (prcomp) and I
 need the scores of this PCA for each site (=each row) to conduct further
 analyses.
 
 Can you please help me with that?
 
Did you try help(prcomp) ?

It says that prcomp (may) return an item called 'x':

x: if 'retx' is true the value of the rotated data (the centred
  (and scaled if requested) data multiplied by the 'rotation'
  matrix) is returned.

[non-matching parentheses in the original help file]

So this is what you asked for.
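In other words, a minimal base-R illustration on simulated data (the 126-site by 7-variable shape and the numbers are made up for the example):

```r
# Base-R sketch: the per-site PCA scores live in the 'x' component
# of a prcomp result.
set.seed(1)
env <- matrix(rnorm(126 * 7), nrow = 126, ncol = 7)  # 126 sites, 7 variables
pc <- prcomp(env, scale. = TRUE)
scores <- pc$x   # one row of scores per site (row) of the input
dim(scores)      # sites by principal components
```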

cheers, jari oksanen

-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] MacOS X and vectorized files (emf, wmf,...)

2005-01-31 Thread Jari Oksanen
On Mon, 2005-01-31 at 08:06 +0100, Patrick Giraudoux H wrote:
 Dear Listers,
 
 We are organising practical trainings for students with R 2.0.1 under MacOS 
 X. I am used to R 2.0.1 under Windows XP and thus have been surprised not 
 to find functions in the MacOS X version of R providing vectorized chart 
 outputs to a file. For instance, the equivalent of:
 
 win.metafile()
 
 or
 
 savePlot()
 
 ... including a wmf or emf option.
 
 Can one obtain only jpeg or bitmap or eps files with R under MacOS X or did 
 I miss something?
 
Saving a plot from the menu bar (click save) saves the plot as PDF, which
is the vectorized graphic format native to MacOS X. Further, dev.copy2eps()
works normally, as do the postscript() and pdf() devices. See the
appropriate help pages.

Native MS Windows formats (such as wmf) may not work, but who needs
them?
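The device route can be sketched in a few lines of base R (the file name is arbitrary; this behaves the same on MacOS X as on other platforms):

```r
# Write vector output directly with the pdf() device.
f <- file.path(tempdir(), "myplot.pdf")  # arbitrary output path
pdf(f, width = 5, height = 5)            # open the PDF device
plot(1:10, (1:10)^2, type = "b")
dev.off()                                # close the device to write the file
file.exists(f)
```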

cheers, jari oksanen 
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Bootstrapped eigenvector

2005-01-31 Thread Jari Oksanen
Jérôme,


On Sat, 2005-01-29 at 14:14 -0500, Jérôme Lemaître wrote:
 Hello alls,
 
 I found in the literature a technique that has been evaluated as one of the
 more robust to assess statistically the significance of the loadings in a
 PCA: bootstrapping the eigenvector (Jackson, Ecology 1993, 74: 2204-2214;
 Peres-Neto et al. 2003. Ecology 84: 2347-2363). However, I'm not able to
 translate the following steps into an R program by myself yet.
 
 Can someone help me with this?
 
 I thank you very much in advance.
 
 Here are the steps that I need to perform:
 1) Resample 1000 times with replacement entire rows from the original data
 set (7 variables, 126 rows)
 2) Conduct a PCA on each bootstrapped sample
 3) To prevent axis reflection and/or axis reordering in the bootstrap, here
 are two more steps for each bootstrapped sample
 3a) Calculate the correlation matrix between the PCA scores of the original
 and those of the bootstrapped sample
 3b) Examine whether the highest absolute correlation is between the
 corresponding axes for the original and bootstrapped samples. When it is not
 the case, reorder the eigenvectors. This means that if the highest
 correlation is between the first original axis and the second bootstrapped
 axis, the loadings for the second bootstrapped axis are used to estimate the
 confidence interval for the original first PC axis.
 4) Determine the p value for each loading, obtained as follows: the number
 of loadings <= 0 for loadings that were positive in the original matrix,
 divided by the number of bootstrap samples (1000), and/or the number of
 loadings >= 0 for loadings that were negative in the original matrix,
 divided by the number of bootstrap samples (1000).
 
The following function seems to run the analysis as Peres-Neto and
others define it:

> netoboot
function (x, permutations=1000, ...)
{
    pcnull <- princomp(x, ...)
    res <- pcnull$loadings
    out <- matrix(0, nrow=nrow(res), ncol=ncol(res))
    N <- nrow(x)
    for (i in 1:permutations) {
        pc <- princomp(x[sample(N, replace=TRUE), ], ...)
        pred <- predict(pc, newdata = x)
        r <- cor(pcnull$scores, pred)
        k <- apply(abs(r), 2, which.max)
        reve <- sign(diag(r[k, ]))
        sol <- pc$loadings[, k]
        sol <- sweep(sol, 2, reve, "*")
        out <- out + ifelse(res > 0, sol <= 0, sol >= 0)
    }
    out/permutations
}

With typical chemical data, you should pass the option cor = TRUE to
princomp. Another issue is whether you should use this method at all.
Opinions may be divided here, but I'll leave that for a proper
Statistician to comment on.
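A hypothetical usage sketch of the function above, assuming netoboot has been defined as printed; simulated data stand in for the poster's real 126-row, 7-variable matrix:

```r
# Usage sketch for the netoboot function defined above; the data are
# simulated stand-ins for the 126 x 7 matrix from the question.
set.seed(42)
x <- matrix(rnorm(126 * 7), nrow = 126, ncol = 7)
p <- netoboot(x, permutations = 100, cor = TRUE)
round(p, 2)   # one bootstrap p-value per loading
```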

Best wishes, Jari Oksanen

-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Problem installing Hmisc (more info)

2005-02-06 Thread Jari Oksanen
On 5 Feb 2005, at 23:59, Prof Brian Ripley wrote:
On Sat, 5 Feb 2005, Michael Kubovy wrote:
Frank Harrell suggested I re-post with information about the version of R
Thanks, that starts to make sense.  You appear to have g77 installed, 
but not in the place the person who prepared the binary install of R 
has it.
Where do you have it installed?  ('whereis g77' or 'which g77' should 
tell you.)  Then you need to alter FLIBS in R_HOME/etc/Makeconf to 
point to it.  (You can remove -lfrtbegin from FLIBS: it is not needed: 
your R_HOME looks to be 
/Library/Frameworks/R.framework/Versions/2.0.1.)

However, I believe there is a more fundamental problem: because libg2c 
is a static library on current MacOS X, most packages using Fortran 
cannot be compiled there.  That's presumably the case with Hmisc, as 
the automated package builder is not providing a binary build.  (There 
is supposedly a check directory on CRAN, but it is not there at 
present.)

This must be a problem specific to a certain installation. I regularly 
build Fortran files into MacOS X binaries. The specification in this 
machine is:

pomme:~ jarioksanen$ uname -a
Darwin pomme.local 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov  7 
16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC  Power 
Macintosh powerpc
pomme:~ jarioksanen$ locate libg2c
/usr/local/lib/libg2c.0.0.0.dylib
/usr/local/lib/libg2c.0.dylib
/usr/local/lib/libg2c.a
/usr/local/lib/libg2c.dylib
/usr/local/lib/libg2c.la

So there are both static and dynamic libg2c's. I don't know of any 
package management system for the Mac, so I don't know what installed these 
files, but probably it was g77. Probably I got these from 
http://hpc.sourceforge.net/, though (I try to avoid Fink, which is a 
constant source of trouble).

Similarly, g77 is installed in /usr/local:
pomme:~ jarioksanen$ which g77
/usr/local/bin/g77
I got my g77 from a place pointed to in the R MacOS X FAQs.
I have often seen problems in MacOS with hardcoded paths which assume 
certain locations for files. The latest problem was that the 'rgl' library 
assumed 'libpng' to be in a different place than where I had it. For 
instance, Darwin's Fink installs stuff in a unique place called /sw. 
Perhaps that's the problem? However, ideally MacOS software should work 
'just anywhere' (and 'just work') like they say in the ads.

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] randomisation

2005-02-09 Thread Jari Oksanen
On Wed, 2005-02-09 at 14:27 +0100, Yann Clough wrote:
I am working on an ecological problem and dealing with a matrix where
rows correspond to samples and columns correspond to species.
 
The values in the matrix are recorded abundances of the organisms.
 
I want to create a series of randomised datasets where total
abundances per sample (rowSums) and per species (colSums) are equal to
those in the dataset of my observations.
 
Simple example of the kind of thing I have:
 
tempmatrix <- matrix(c(1,0,2,10,1,3,5,6,7,1,0,0), nrow=4, ncol=3, byrow=TRUE) # observed
data
 
rowSums(tempmatrix) #individuals per location,
 
colSums(tempmatrix) #individuals per species
 
An example of a matrix which complies with the two restrictions:
 
tempmatrix2 <- matrix(c(1,0,2,11,0,3,5,6,7,0,1,0), nrow=4, ncol=3, byrow=TRUE)
 
rowSums(tempmatrix2)
 
colSums(tempmatrix2)
 
hope this is clear
 
As already explained, this may not be possible as a simple permutation.
You seem to have something else in mind: not permuting the data, but
moving individuals freely between species, i.e. redistributing the
abundances among species rather than permuting them.
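For count data like these, base R can already simulate matrices with the observed margins: r2dtable() generates random two-way tables with fixed row and column sums (Patefield's algorithm). A minimal sketch with the example matrix above:

```r
tempmatrix <- matrix(c(1,0,2,10,1,3,5,6,7,1,0,0), nrow = 4, ncol = 3, byrow = TRUE)
set.seed(1)
sims <- r2dtable(99, rowSums(tempmatrix), colSums(tempmatrix))
## every simulated matrix keeps both margins of the observed data
all(sapply(sims, function(m) all(rowSums(m) == rowSums(tempmatrix)) &&
                             all(colSums(m) == colSums(tempmatrix))))
```

Note that this redistributes individuals freely within the margin constraints, which matches the question as posed rather than a permutation of the original cells.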

For a traditional permutation, you may have a look at the labdsv package
(for ecological applications). This has function 'rndveg' which
attempts to preserve either species occurrence distributions or plot-
level species richness. Preserving both may be impossible, but check
the function. The 'labdsv' source package is available at CRAN, and
Windows and MacOS X binaries through my web page
(http://cc.oulu.fi/~jarioksa/softhelp/softalist.html).


The Windows binary is not available at CRAN since the package fails R
CMD check in Windows (so you shouldn't check the package but just use
it). The Mac binary is not available at CRAN since the whole Mac binary
package system seems to be dysfunctional (there is nothing after Jan 19,
2005).

cheers, jari oksanen



Re: [R] installing package hier.part on Mac OSX

2005-02-10 Thread Jari Oksanen
On Thu, 2005-02-10 at 08:01 +, Prof Brian Ripley wrote:
 For MacOS we have
 
 Binary packages, foo.tgz
 Source packages, foo.tar.gz
 
 Neither are `zip files' (things created by zip, usually with extension .zip).
 
 It looks like you have not installed a source package before, and you 
 either do not have the development tools installed or they are not in your 
 path.

In this case the tools are probably missing, starting from 'make'. You
should install the Development Tools / Xcode, which come with the MacOS
X installation CD/DVD at least from version 10.3.x onwards. Otherwise you
can get the development tools from http://developer.apple.com/. Moreover,
in this case you need to get a Fortran compiler, which does not come with
MacOS. See the R for Mac OS X FAQ, section "the Fortran compiler g77 gcc
3.3". 

 
 That there is no binary version of a package available usually indicates a 
 problem with it on MacOS X, at least on the autobuilder's version of 
 MacOS.
 
Well, 'hier.part' is younger than the latest entry in the Mac binary
packages: there is nothing after Jan 19, 2005. The binary package builds
beautifully in MacOS X. However, it seems to require package 'gtools',
which I can find neither on CRAN nor in the BioConductor repositories. It
seems that this didn't prevent the package from passing the tests for
inclusion on CRAN or from producing Windows binaries. 

Theresa, I can send you a Mac binary if you don't want to go to the
trouble of installing Xcode and g77. However, it failed with the missing
'gtools' upon loading.

cheers, jari oksanen
 On Wed, 9 Feb 2005, Theresa Talley wrote:
 
  Hi-
  I've been trying to install the hier.part package on
  my mac (OSX 10.3.7) and it is not working for some
  reason. I am downloading the package source called :
  hier.part_1.0.tar.gz.  When I try to auto install from
  the cran site, I get this message:
  * Installing *source* package 'hier.part' ...
  ** libs
  /Library/Frameworks/R.framework/Resources/bin/SHLIB:
  line 1: make: command not found
 
  And when I try to install from the zip file on my
  computer, I get this message:
 
 What precisely did you do here?
 
  gzip: stdin: not in gzip format
  tar: Child returned status 1
  tar: Error exit delayed from previous errors
  Error in file(file, "r") : unable to open connection
  In addition: Warning messages:
  1: Installation of package hier.part had non-zero exit
  status in: install.packages(c("hier.part"), lib =
  "/Library/Frameworks/R.framework/Resources/library")
  2: tar returned non-zero exit code: 512 in: untar(pkg,
  tmpDir)
  3: cannot open file `hier.part_1.0.tar/DESCRIPTION'
 
  I've successfully installed other packages (e.g.,
  vegan, cluster) so am not sure if there is something
  different about this one or if I'm just being dopey.
 
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Failure of update.packages()

2005-02-10 Thread Jari Oksanen
On Thu, 2005-02-10 at 13:52 +0100, Peter Dalgaard wrote:
 I M S White [EMAIL PROTECTED] writes:
 
  Can anyone explain why with latest version of R (2.0.1) on FC3, installed
  from R-2.0.1-0.fdr.2.fc3.i386.rpm, update.packages() produces the message
  
  /usr/lib/R/bin/Rcmd exec: INSTALL: not found.
  
  Indeed /usr/lib/R/bin seems to lack various shell scripts (INSTALL,
  REMOVE, etc).

 You need to install the R-devel package too:
 R-devel-2.0.1-0.fdr.2.fc3.i386.rpm 
 
 The big idea is that this will suck in all the required compilers,
 libraries, and include files via RPM dependencies, but users with
 limited disk space may be content with the binaries of R+recommended
 packages. 
 
This kind of problem was to be anticipated, wasn't it? The great
divide between use-only and devel packages is an rpm packaging standard,
but not very useful in this case: it splits a 568K devel chip off a
15.4M chunk of base R. Moreover, there is no repository of binary
packages for Linux, which means that not many people can use the 568K
saving in download time (the saving in disk space is more considerable,
of course). So are there plans for binary Linux packages for distros
other than Debian, so that people could use only the non-devel piece of R?

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Failure of update.packages()

2005-02-10 Thread Jari Oksanen
On 10 Feb 2005, at 19:26, Peter Dalgaard wrote:
[EMAIL PROTECTED] writes:
Quoting Jari Oksanen [EMAIL PROTECTED]:
On Thu, 2005-02-10 at 13:52 +0100, Peter Dalgaard wrote:
I M S White [EMAIL PROTECTED] writes:
Can anyone explain why with latest version of R (2.0.1) on FC3, 
installed
from R-2.0.1-0.fdr.2.fc3.i386.rpm, update.packages() produces the 
message

/usr/lib/R/bin/Rcmd exec: INSTALL: not found.
Indeed /usr/lib/R/bin seems to lack various shell scripts (INSTALL,
REMOVE, etc).

You need to install the R-devel package too:
R-devel-2.0.1-0.fdr.2.fc3.i386.rpm
The big idea is that this will suck in all the required compilers,
libraries, and include files via RPM dependencies, but users with
limited disk space may be content with the binaries of R+recommended
packages.
This kind of problems were to be anticipated, weren't they? The great
divide between use-only and devel packages is a rpm packaging 
standard,
but not very useful in this case: it splits a 568K devel chip from a
15.4M chunk of base R. Moreover, you don't have a repository of 
binary
packages for Linux which means that not many people can use the 568K
saving in download times (saving in disk space is more considerable 
of
course). So are there plans for binary Linux packages for other 
distros
than Debian so that people could use the non-devel piece of R only?

cheers, jari oksanen
The splitting is an experiment (and I said so when I announced it).
It does have unforeseen consequences, like implicating me in
maintaining a repository of binary RPMs for CRAN packages, which I'm
not particularly keen on.

So I shall probably revert to a single RPM, and force the installation
requirements to be the same as the build requirements.  This was, in 
fact,
Peter's suggestion which shows that not everybody is as short-sighted 
as me.

Martyn
Hmm... Actually, you had sort of convinced me that the split might be
a good idea. Point being of course that it's not the 568K that gets
shaved off in R-devel, it's the 12M for gcc + the 5M for g77 + 28M for
perl + more, which are only needed for installing packages and are
therefore not dependencies of the main R RPM. Maintaining binary
package RPMs was never in the cards as I saw it. However, it then only
makes sense if a sizable proportion of R users are never going to
install packages. Otherwise you get the cost of having to explain the
point repeatedly, at basically zero benefit.
That's a good point. You could look at the MacOS X standard installation 
to see what can be left out of a working installation. In a default Mac, 
you don't have gcc (12M), nor g77, but you certainly need perl for a 
sensibly working machine, and that is in the default MacOS X 
installation. The price is that you then need a way to install binary R 
packages. So not so much saving, but a bit more than what you get by 
shaving off R-devel.

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] R + MacOSX + Emacs(XEmacs) + ESS

2005-02-15 Thread Jari Oksanen
On Tue, 2005-02-15 at 10:34 -0200, Ronaldo Reis-Jr. wrote:
 Hi,
 
 I try to use Emacs or XEmacs with R in a MacOS X Panter without X11.
 
 Anybody can make this work?

Did you try googling for "MacOS X Emacs"? That's the way to get it. I
have found two different versions; both work graphically without X11.
ESS installs quite smoothly. Depending on your configuration, you may
have to use ESC for Meta instead of the Alt of some other systems. So
start R in ESS using ESC-x R (i.e., M-x R). 

(The emacs that comes with MacOS X also is GNU Emacs, but works only
within a terminal window.) 

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] eigen vector question

2005-02-18 Thread Jari Oksanen
On Fri, 2005-02-18 at 11:26 +0100, Uwe Ligges wrote:
 Jessica Higgs wrote:
 
  Sorry to bother everyone, but I've looked in all of the help files and 
  manuals I have and I can't find the answer to this question.  I'm doing 
  principle component analysis by calculating the eigen vectors of a 
  correlation matrix that I have that is composed of 21 parameters.  I 
  have the eigen vectors and their values that R produced for me but I'm 
  not sure how to tell which eigen vector/value corresponds to which 
  parameter because when R produces eigen vectors it does so in decreasing 
  order of significance, meaning that the eigen vector that explains the 
  most of the variance is listed first, followed by the next eigen vector, 
  etc etc. Any help would be appreciated. Feel free to write back if you 
  need more information on my problem.  Thanks!
  
 
 
 Have you considered to use princomp()?
 
It is really weird that people always recommend princomp, although
it is numerically inferior to prcomp and fails with rank-deficient data.
The natural solution would be to define functions:

loadings <- 
function(x) UseMethod("loadings")

loadings.princomp <- 
function(x) x$loadings

loadings.prcomp <- 
function(x) structure(x$rotation, class = "loadings")
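To answer the correspondence question directly: with eigen() on a correlation matrix, the columns of $vectors are the components (in decreasing order of variance) and the rows are the original parameters; eigen() merely drops the names, so put them back. A sketch on the built-in USArrests data (standing in for the 21-parameter matrix):

```r
R <- cor(USArrests)                         # correlation matrix of 4 parameters
e <- eigen(R)
rownames(e$vectors) <- colnames(USArrests)  # rows correspond to the parameters
e$values                                    # eigenvalues, largest first
e$vectors[, 1]                              # loading of every parameter on component 1
```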

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] How to set up number of prin comp.

2005-02-25 Thread Jari Oksanen
On Fri, 2005-02-25 at 20:29 +0800, [EMAIL PROTECTED] wrote:
 Hi Bjørn-Helge,
 
 Thanks for your help.
 
 In my case, there are more variables in the matrix than units, 
 so I have to use prcomp with the covariance matrix to do PCA. The problem I am facing 
 is how to get the first 8 coefficients and scores and how to write the result 
 into a text file. Thanks again.
 
 When I change princomp to prcomp below, I get NULL for pc$scores 
 and pc$loadings.
 
 X <- some matrix
 pc <- prcomp(X)
 pc$scores[,1:4]    # The four first score vectors
 pc$loadings[,1:4]  # The four first loadings
 
The three most useful commands are help(), str() and names().

The first tells you how to use prcomp() and how it names its results:
try help(prcomp).
The second peeks into the result so you can see what is in there: try
(with your result) str(pc).
The third tells you what names are available in your result.
The first (help) is the most useful of these commands, since it tells
you what these names and items are. If you read it, you should say:
pc$x[, 1:4] # The four first score vectors
pc$rotation[, 1:4] # The four first loadings

Also, loadings(pc) should work with prcomp.
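As a sketch of both steps -- extracting the leading components and writing them to text files -- with the built-in USArrests standing in for X (the file names here are only examples):

```r
pc <- prcomp(USArrests)
str(pc, max.level = 1)                          # shows sdev, rotation, center, scale, x
write.table(pc$x[, 1:4], "scores.txt", quote = FALSE)           # leading score vectors
write.table(pc$rotation[, 1:4], "loadings.txt", quote = FALSE)  # leading loadings
```

With a wider matrix than USArrests you would take columns 1:8 in the same way.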

I think I'll write functions as.prcomp.princomp and as.princomp.prcomp
someday. 

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Reconstructing Datasets

2005-03-01 Thread Jari Oksanen

On Tue, 2005-03-01 at 20:30 +, Laura Quinn wrote:
 Hi,
 
 Is it possible to recreate smoothed data sets in R, by performing a PCA
 and then reconstructing a data set from say the first 2/3 EOFs?
 
 I've had a look in the help pages and don't seem to find anything
 relevant.
 
It's not in the R help, but in the books about PCA cited in the help references. 

This can be done, though not quite directly. Most of the hassle comes
from the centring and, I guess in your case, from the scaling of the
results. I guess it is best to first scale the data like PCA would do,
then make the low-rank approximation, and then de-scale:

x <- scale(x, scale = TRUE)
pc <- prcomp(x)

Full rank will be:

xfull <- pc$x %*% pc$rotation

The eigenvalues already are incorporated in pc$x, and you don't have to
care about them.

Then rank=3 approximation will be:

x3 <- pc$x[,1:3] %*% pc$rotation[,1:3]

Then you have to de-scale:

x3 <- sweep(x3, 2, attr(x, "scaled:scale", "*")
x3 <- sweep(x3, 2, attr(x, "scaled:center", "+")

And here you are. I wouldn't call this a smoothing, though.

Library 'vegan' can do this automatically for a PCA run with function
'rda', but there the scaling of the raw results is non-conventional
(geared towards biplots).

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Reconstructing Datasets

2005-03-01 Thread Jari Oksanen
On Wed, 2005-03-02 at 08:30 +0200, Jari Oksanen wrote:
 On Tue, 2005-03-01 at 20:30 +, Laura Quinn wrote:
  Hi,
  
  Is it possible to recreate smoothed data sets in R, by performing a PCA
  and then reconstructing a data set from say the first 2/3 EOFs?
  
  I've had a look in the help pages and don't seem to find anything
  relevant.
  
 It's not in the R help, but in the books about PCA in help references. 
 
 This can be done, not quite directly. Most of the hassle comes from the
 centring, and I guess in your case, from scaling of the results. I guess
 it is best to first scale the results like PCA would do, then make the
 low-rank approximation, and then de-scale:
 
 x <- scale(x, scale = TRUE)
 pc <- prcomp(x)
 
 Full rank will be:
 
 xfull <- pc$x %*% pc$rotation

Naturally, I forgot the transposition:

xfull <- pc$x %*% t(pc$rotation)

and the check:

range(x - xfull) 

which should be something in magnitude 1e-12 or better (6e-15 in the
test I run).

 
 The eigenvalues already are incorporated in pc$x, and you don't have to
 care about them.
 
 Then rank=3 approximation will be:
 
 x3 <- pc$x[,1:3] %*% pc$rotation[,1:3]
 
and the same here:

x3 <- pc$x[,1:3] %*% t(pc$rotation[,1:3])

The moral: cut-and-paste.

 Then you have to de-scale:
 
 x3 <- sweep(x3, 2, attr(x, "scaled:scale", "*")
 x3 <- sweep(x3, 2, attr(x, "scaled:center", "+")
 
And here you need to close the parentheses: 

x3 <- sweep(x3, 2, attr(x, "scaled:scale"), "*")
x3 <- sweep(x3, 2, attr(x, "scaled:center"), "+")

The moral #1: cut-and-paste.
and #2: drink coffee in the morning.
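Putting the corrections together, the whole reconstruction runs as follows (the built-in USArrests standing in for the data):

```r
x <- scale(USArrests, scale = TRUE)
pc <- prcomp(x)
x3 <- pc$x[, 1:3] %*% t(pc$rotation[, 1:3])        # rank-3 approximation
x3 <- sweep(x3, 2, attr(x, "scaled:scale"), "*")   # undo the scaling
x3 <- sweep(x3, 2, attr(x, "scaled:center"), "+")  # undo the centring
range(x3 - as.matrix(USArrests))                   # rank 3 of 4: differences are small
```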

 And here you are. I wouldn't call this a smoothing, though.
 
 Library 'vegan' can do this automatically for PCA run with function
 'rda', but there the scaling of raw results is non-conventional (though
 biplot).
 

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Multidimensional Scaling (MDS) in R

2005-03-09 Thread Jari Oksanen
This nmds seems to be the wrapper function in the labdsv package. 
Please check the documentation in that package. If I remember 
correctly, labdsv is geared towards cases with a large number of points, 
and then you don't want labels because they would be too congested 
to be readable anyway. The recommended procedure is to identify 
interesting points using the 'plotid' function in labdsv.

Function nmds is a very simple wrapper: it uses isoMDS in the MASS 
package and adds a class and some class methods. You may use isoMDS 
directly instead:

dis <- dsvdis(x) # assuming you use labdsv
ord <- isoMDS(dis)
plot(ord$points, asp = 1, type = "n")
text(ord$points, labels = rownames(ord$points))

The posting guide tells you to send package-specific questions to the 
package author directly. In this case, the package author does not read 
R-News.
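The same labelling idea in a self-contained form, with cmdscale() on the built-in eurodist distances standing in for isoMDS on your data:

```r
ord <- cmdscale(eurodist)          # metric MDS; points inherit the city names
plot(ord, asp = 1, type = "n")     # set up the plot without drawing points
text(ord, labels = rownames(ord))  # label each point with its row name
```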

cheers, jari oksanen
On 8 Mar 2005, at 19:43, Isaac Waisberg wrote:
Hi;
I am working with the similarity matrix below and I would like to plot
a two-dimensional MDS solution such as each point in the plot has a
label.
This is what I did:
data <- read.table('c:/multivariate/mds/colour.txt', header=FALSE)
similarity <- as.dist(data)
distance <- 1 - similarity
result.nmds <- nmds(distance)
plot(result.nmds)
(nmds and plot.nmds as defined at
labdsv.nr.usu.edu/splus_R/lab8/lab8.html; nmds simply calls isoMDS)
Colour.txt, containing the similarity matrix, reads as follows:
 1.0 .86 .42 .42 .18 .06 .07 .04 .02 .07 .09 .12 .13 .16
 .86 1.0 .50 .44 .22 .09 .07 .07 .02 .04 .07 .11 .13 .14
 .42 .50 1.0 .81 .47 .17 .10 .08 .02 .01 .02 .01 .05 .03
 .42 .44 .81 1.0 .54 .25 .10 .09 .02 .01 .01 .01 .02 .04
 .18 .22 .47 .54 1.0 .61 .31 .26 .07 .02 .02 .01 .02 .01
 .06 .09 .17 .25 .61 1.0 .62 .45 .14 .08 .02 .02 .02 .01
 .07 .07 .10 .10 .31 .62 1.0 .73 .22 .14 .05 .02 .02 .01
 .04 .07 .08 .09 .26 .45 .73 1.0 .33 .19 .04 .03 .02 .02
 .02 .02 .02 .02 .07 .14 .22 .33 1.0 .58 .37 .27 .20 .23
 .07 .04 .01 .01 .02 .08 .14 .19 .58 1.0 .74 .50 .41 .28
 .09 .07 .02 .01 .02 .02 .05 .04 .37 .74 1.0 .76 .62 .55
 .12 .11 .01 .01 .01 .02 .02 .03 .27 .50 .76 1.0 .85 .68
 .13 .13 .05 .02 .02 .02 .02 .02 .20 .41 .62 .85 1.0 .76
 .16 .14 .03 .04 .01 .01 .01 .02 .23 .28 .55 .68 .76 1.0
The first row corresponds to colour 1 (C1), the second to colour 2
(C2), and so on.
First, I'm not sure if this is correct or not. Second, obviously the
points in the plot are not labeled. I suppose I must add a labels
column and then print the labels together with the results. But, how
should I do it?
Many thanks,
Isaac

--
Jari Oksanen, Oulu, Finland


Re: [R] Significance of Principal Coordinates

2005-03-15 Thread Jari Oksanen
On Mon, 2005-03-14 at 18:32 +0100, Christian Kamenik wrote:
 Dear all,
 
 I was looking for methods in R that allow assessing the number of  
 significant principal coordinates. Unfortunatly I was not very 
 successful. I expanded my search to the web and Current Contents, 
 however, the information I found is very limited.
 Therefore, I tried to write code for doing a randomization. I would 
 highly appriciate if somebody could comment on the following approach. I 
 am neither a statistician, nor an R expert... the data matrix I used has 
 72 species (columns) and 167 samples (rows).
 
Earlier this year (Sat, 29 Jan 2005) Jérôme Lemaître asked something
similar here under the subject "Bootstrapped eigenvector" (but the code
I posted then had one bug I know of, and perhaps some I don't!). Some
ecologists (Donald Jackson, Peres-Neto) have indeed tried to develop
such methods for PCA, and they could easily be modified for PCoA, which
is about the same method, in particular with Euclidean distances like
you used. So the following two solutions are practically identical
(within 2e-15 in the case I tried):

x <- decostand(x, "norm") # in vegan
chordis <- dist(x) # Euclidean is the default, so this is chord distance
pcoa <- cmdscale(chordis)
pca <- prcomp(x)

Verify this with:

procrustes(pcoa, pca, choices=1:2) # in vegan
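The same equivalence can be verified without vegan by computing the chord distance by hand (random data standing in for the species matrix):

```r
set.seed(1)
x <- matrix(runif(30 * 10), nrow = 30)
xn <- x / sqrt(rowSums(x^2))       # row-normalise: Euclidean distance on xn is chord distance
pcoa <- cmdscale(dist(xn), k = 2)
pca <- prcomp(xn)
## the configurations agree up to the arbitrary signs of the axes
max(abs(abs(pcoa) - abs(pca$x[, 1:2])))
```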

PCoA with row weights is something different, but I really don't know
why you would want to do that. I really don't understand what people
mean by "significant eigenvalues", unless they are doing Factor
Analysis. In PCA, you rotate your data, and you can find low-rank
approximations of your data, but how these rotations could be
"significant" is beyond my imagination. Further, resampling with
replacement seems poorly suited to multivariate analysis: it duplicates
some rows and so makes it easier to find similar rows, which is the
ultimate task in PC rotation. It seems that Monte Carlo results are
systematically "better" than any original data (this is not disturbing
only when the number of rows is much lower than the number of columns).
Also, resampling or shuffling species tends to create communities that
are fundamentally different from any real community we have: instead of
a single or a few abundant species, they may have several or none. With
a total-abundance constraint you can hide the traces of anarchistic
community assembly, but not its fundamental fault. So I do think that
(1) you cannot use resampling in assessing PCA and its kin, (2) you
cannot say what "significant" means in this case, and (3) the number of
"significant" axes would only be a function of sample size even here.

Now my hope is that some guru over there gets so irritated that (s)he
chastises me for writing such pieces of stupidity, and sends a correct
solution here with accompanying code and references to the literature.
Let's hope so.

The old truth is that most data sets have 2.5 dimensions (Kruskal):
those two that you can show in a printed plot, and that half a dimension
that you must explain away in the text. Wouldn't that be a sufficient
solution?

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] How to do such MDS in R

2005-03-23 Thread Jari Oksanen
On 21 Mar 2005, at 13:29, ronggui wrote:
I know cmdscale and isoMDS in R can do classical and non-metric MDS, but
I want to know if there are packages that carry out individual
differences scaling and multidimensional analysis of preference? Both
methods are important ones, but I cannot find any clue on how to do them
using R.
Can anyone help?
Thank you!
It may be that individual differences scaling is not available in R. 
The classic piece of software for this purpose is SINDSCAL. It is 
beautiful Fortran (although this sounds like a contradiction in terms), 
and it would be easy to port the software into R, but I think the 
license does not allow this. The hardest bit would be to adapt the 
output for R. I suggest you dig up SINDSCAL somewhere -- it could be 
in netlib -- and compile it yourself. GNU g77 is quite OK.

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] Principle Component Analysis in R

2005-04-05 Thread Jari Oksanen
On Tue, 2005-04-05 at 16:59 +1200, Brett Stansfield wrote:
 Dear R 
 Should I be concerned if the loadings to a Principle Component Analysis are
 as follows:
 
 Loadings:
   Comp.1 Comp.2 Comp.3 Comp.4
 X100m -0.500  0.558 0.661
 X200m -0.508  0.379  0.362 -0.683
 X400m -0.505 -0.274 -0.794 -0.197
 X800m -0.486 -0.686  0.486  0.239
 
Comp.1 Comp.2 Comp.3 Comp.4
 SS loadings  1.00   1.00   1.00   1.00
 Proportion Var   0.25   0.25   0.25   0.25
 Cumulative Var   0.25   0.50   0.75   1.00
 
 I just got concerned that no loading value was given for  X100m, component
 3. I have looked at the data using list() and it all seems OK

You don't have to worry about one empty cell in the loadings: the print
function (called behind the scenes to show the results to you) is so
clever that it doesn't show you small numbers, although they are there.
I guess this happens because people with a Factor Analysis background
expect this. However, I would be worried if I got results like this, and
would not use Princip*al* Components at all, since none of the
components seems to be any more principal than the others. Wouldn't the
original data do?
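To see that the blank cell is only a printing convention and not a missing number, compare (princomp on the built-in USArrests here; print.loadings hides entries below its cutoff, 0.1 by default):

```r
pc <- princomp(USArrests, cor = TRUE)
loadings(pc)                      # small loadings are printed as blanks
print(loadings(pc), cutoff = 0)   # lower the cutoff: every cell has a value
unclass(loadings(pc))             # or drop the class to see the raw matrix
```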

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] two methods for regression, two different results

2005-04-06 Thread Jari Oksanen
On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:
 Please forgive a straight stats question, and the informal notation.
  
 Let us say we wish to perform a linear regression:
 y = b0 + b1*x + b2*z
  
 There are two ways this can be done: the usual way, as a single
 regression, 
 fit1 <- lm(y ~ x + z)
 or by doing two regressions. In the first regression we could have y as
 the dependent variable and x as the independent variable: 
 fit2 <- lm(y ~ x) 
 The second regression would be a regression in which the residuals from
 the first regression are the dependent variable, and the
 independent variable is z:
 fit2 <- lm(fit2$residuals ~ z)
  
 I would think the two methods would give the same p value and the same
 beta coefficient for z. They don't. Can someone help me understand why
 the two methods do not give the same results. Additionally, could
 someone tell me when one method might be better than the other, i.e.
 what question does the first method answer, and what question does the
 second method answer? I have searched a number of textbooks and have not
 found this question addressed.
  
John,

Bill Venables already told you that they don't do that, because they are
not orthogonal. Here is a simpler way of getting the same result as he
suggested for the coefficients of z (but only for z):

 x <- runif(100)
 z <- x + rnorm(100, sd=0.4)
 y <- 3 + x + z + rnorm(100, sd=0.3)
 mod <- lm(y ~ x + z)
 mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)
 summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.96436    0.06070  48.836  < 2e-16 ***
x            0.96272    0.11576   8.317 5.67e-13 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

 summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731    0.06070  -2.592   0.0110 *
x           -0.84459    0.11576  -7.296 8.13e-11 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, even
though you already removed the effect of x... In the orthogonal case the
coefficient for x would be 0.
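A quick numerical check of that claim: if z is constructed to be exactly orthogonal to x, the coefficient of x in the outer regression vanishes (a sketch):

```r
set.seed(42)
x <- rnorm(100)
z <- residuals(lm(rnorm(100) ~ x))    # z exactly orthogonal to x by construction
y <- 3 + x + z + rnorm(100, sd = 0.3)
coef(lm(residuals(lm(y ~ x)) ~ x + z))["x"]  # essentially zero (rounding error only)
```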

Residuals are equal in these two models:

 range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17

But, of course, fitted values are not equal, since you fit the mod2 to
the residuals after removing the effect of x...

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] weird results w/ prcomp-princomp

2005-04-08 Thread Jari Oksanen
On Fri, 2005-04-08 at 11:12 +0200, Alessandro Bigi wrote:
 I am doing a Principal Component Analysis (PCA) on a 44x19 matrix.
 with
   princomp(x,cor=TRUE,scores=TRUE)
 and
   prcomp(x,scale=TRUE,center=TRUE)
 The resulting eigenvectors and rotation matrix are the same (as expected); however, 
 the sum of the eigenvalues is lower than 19 (the number of variables).
 
What about the sum of the squared sdev? (Hint: the prcomp help page says
that the returned sdev are the square roots of the eigenvalues. While
the princomp help does not say this explicitly, it says that sdev are
standard deviations.)
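In other words (a random full-rank matrix standing in for the 44x19 data):

```r
set.seed(1)
x <- matrix(rnorm(44 * 19), nrow = 44)
pc <- prcomp(x, scale = TRUE)
sum(pc$sdev^2)   # sums (up to rounding) to 19, the number of variables
```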

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Error message with nmds

2006-05-17 Thread Jari Oksanen
On Tue, 2006-05-16 at 13:25 -0700, Jonathan Hughes wrote:
 I am trying to apply nmds to a data matrix but I receive the  
 following error message:
 
 Error in isoMDS(dis, y = y, k = k, maxit = maxit) :
   zero or negative distance between objects 5 and 7
 
 The data are in a vegetation cover-class matrix (species in columns,  
 plots in rows, classes 1-8 with lots of zero values) converted to a  
 dissimilarity matrix (bray curtis).
 
 I assumed that objects 5 and 7 refer to rows of my original data; and  
 they do have the same species with the same cover classes.  I deleted  
 one of these rows but I received the same error message with a rerun  
 of nmds.  As it turns out, the new rows 5 and 7 are the same.  How do  
 I avoid this problem?

Jonathan, this is a FAQ in the proper sense of the word: it is
frequently asked. The last thread was in April 2006. See

https://stat.ethz.ch/pipermail/r-help/2006-April/092598.html

and its answers. You may also use RSiteSearch with the keyword isoMDS
to find other (and older) threads.

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] 2 Courses Near You - (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor, (2) R/Splus Fundamentals and Programming Techniques

2006-06-14 Thread Jari Oksanen
On Tue, 2006-06-13 at 21:34 +0200, Uwe Ligges wrote:
 ... and again I wonder which courses are "near". This leads at once to 
 the question: "which metric is in use?".
Possibly this:

### Great Circle distances
### Use different sign to N and S, and to E and W
### (does not matter which sign)
### Lat and long must be in degrees + decimals (sorry)
globedis <-
function(lat0, lon0, lat1, lon1, km = TRUE)
{
    phi0 <- pi/180 * lat0
    phi1 <- pi/180 * lat1
    lambda <- pi/180 * (lon0 - lon1)
    delta <- sin(phi0) * sin(phi1) + cos(phi0) * cos(phi1) * cos(lambda)
    delta <- acos(delta)
    dist <- 60 * 180/pi * delta         # nautical miles (60 nm per degree)
    dist <- dist %% 10800               # wrap at half the circumference
    if (km)
        dist <- 1.852 * dist
    dist
}
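As a sanity check, the function repeated here (so the check is self-contained) reproduces the roughly 6100 km Oulu-to-Boston figure given below; the coordinates are my own assumption (Oulu about 65.01N 25.47E, Boston about 42.36N 71.06W), not from the post.

```r
# The great-circle function from the message, repeated verbatim so this
# check runs on its own.
globedis <- function(lat0, lon0, lat1, lon1, km = TRUE) {
    phi0 <- pi/180 * lat0
    phi1 <- pi/180 * lat1
    lambda <- pi/180 * (lon0 - lon1)
    delta <- acos(sin(phi0) * sin(phi1) + cos(phi0) * cos(phi1) * cos(lambda))
    dist <- 60 * 180/pi * delta
    if (km) dist <- 1.852 * dist
    dist
}
# Assumed coordinates; negative longitude = West.
globedis(65.01, 25.47, 42.36, -71.06)   # about 6100 km
```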

Which says that Boston is nearest to my office (6100 km). The other
alternatives are Baltimore 6720 km, Chicago 6800 km, Raleigh 7050 km and
San Francisco 8240 km.

In the more practical metric of flight time, Baltimore is closest (OUL - BWI
12h55min), but Boston and Chicago are not much further away (OUL - BOS
14h00min, OUL - CHI 14h15min).

cheers, jari oksanen

 Probably some football related 
 metric: the FIFA World Cup takes place in Dortmund and commercials say something 
 like "the world is our guest" ...
 Now, let's escape from football to Austria and Vienna's useR!2006 
 conference!
 
 Uwe Ligges
 
 
 
 [EMAIL PROTECTED] wrote:
  XLSolutions Corporation (www.xlsolutions-corp.com) is proud to announce:
  
  (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor 
   
*** San Francisco / July 17-18, 2006 ***
*** Chicago   / July 24-25, 2006 ***
*** Baltimore / July 27-28, 2006 *** 
*** Raleigh   / July 17-18, 2006 ***
*** Boston/ July 27-28, 2006 ***
http://www.xlsolutions-corp.com/RSmicro
  
  (2) R/Splus Fundamentals and Programming Techniques
  
*** San Francisco / July 10-11, 2006 ***
*** Houston   / July 13-14, 2006 ***
*** San Diego / July 17-18, 2006 ***
*** Chicago   / July 20-21, 2006 ***
*** New York City / July 24-25, 2006 ***
*** Boston/ July 27-28, 2006 ***
   
http://www.xlsolutions-corp.com/Rfund.htm  
  
  Ask for group discount and reserve your seat Now - Earlybird Rates.
  Payment due after the class! Email Sue Turner:  [EMAIL PROTECTED]
  Interested in our Advanced Programming class? 
  
  (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor 

   
  Course Outline:
  
  - R/S System: Overview; Installation and Demonstration 
  - Data Manipulation and Management 
  - Graphics; Enhancing Plots, Trellis 
  - Writing Functions 
  - Connecting to External Software 
  - R/S Packages and Libraries (e.g. BioConductor) 
  - BioConductor: Overview; Installation and Demonstration 
  - Array Quality Inspection 
  - Correction and Normalization; Affymetrix and cDNA arrays 
  - Identification of Differentially Expressed Genes 
  - Visualization of Genomic Information 
  - Clustering Methods in R/Splus 
  - Gene Ontology (GO) and Pathway Analysis 
  - Inference, Strategies for Large Data 
  
  
  
  (2) R/Splus Fundamentals and Programming Techniques
 
  Course outline.
  
  - An Overview of R and S
  - Data Manipulation and Graphics
  - Using Lattice Graphics
  - A Comparison of R and S-Plus
  - How can R Complement SAS?
  - Writing Functions
  - Avoiding Loops
  - Vectorization
  - Statistical Modeling
  - Project Management
  - Techniques for Effective use of R and S
  - Enhancing Plots
  - Using High-level Plotting Functions
  - Building and Distributing Packages (libraries)
  - Connecting; ODBC, Rweb, Orca via sockets and via Rjava
  
  Email us for group discounts.
  Email Sue Turner: [EMAIL PROTECTED]
  Phone: 206-686-1578
  Visit us: www.xlsolutions-corp.com/training.htm
  Please let us know if you and your colleagues are interested in this
  class to take advantage of group discount. Register now to secure your
  seat!
  
  Cheers,
  Elvis Miller, PhD
  Manager Training.
  XLSolutions Corporation
  206 686 1578
  www.xlsolutions-corp.com
  [EMAIL PROTECTED]
  
  2 Courses - (1) Introduction to R/S+ programming: Microarrays Analysis and 
  Bioconductor 
  (2) R/Splus Fundamentals and Programming Techniques
   Interest in our R/Splus Advanced Programming?  Email us for 
  upcoming courses.
  
-- 
Jari Oksanen -- Dept Biology, Univ Oulu

Re: [R] MDS with missing data?

2006-06-14 Thread Jari Oksanen
Dear Context Grey,

On 15 Jun 2006, at 6:42, context grey wrote:


 I will be applying MDS (actually Isomap) to make a
 psychological
 concept map of the similarities between N concepts.

So actually, how do you do isomap? RSiteSearch gave me one hit for 
isomap. I only ask because I've implemented a working version of 
isomap (not ready for prime time yet, but a proof that it works). If 
isomap is already available in R, I won't do anything more with the 
function. I don't understand the rest of the question, but isomap 
really may be able to work with NA dissimilarities: just replace them 
with shortest path distances via non-missing dissimilarities. In fact, 
you need only some ('k') non-missing dissimilarities per item, 
since that is how isomap works. Your dissimilarity structure may become 
disconnected, of course, but that's common in isomap.

If you mean that your raw data has NA, then you may select a 
dissimilarity function that can handle NA input and produce finite 
dissimilarities (I think daisy in the cluster package does this).
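A minimal sketch of that raw-data case with cluster's daisy; the toy data are my own and are chosen so that every pair of rows shares at least one observed column (otherwise daisy would have to return NA for that pair).

```r
library(cluster)

# Toy data with missing entries. daisy() computes each pairwise
# dissimilarity from the columns observed in both rows, so the result
# stays finite as long as every pair shares an observed column.
x <- data.frame(a = c(1, 2, NA, 4),
                b = c(2, NA, 3, 1),
                c = c(0, 1, 2, 3))   # column c is complete: every pair shares it
d <- daisy(x)
any(is.na(d))
```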

Somehow I feel I answered to quite a different question than you asked. 
Sorry.

 I would like to scale to a large number of concepts,
 however, the
 resulting N*(N-1) pairwise similarities is prohibitive
 for a user survey.
 I'm thinking of giving people random subsets of the
 pairwise
 similarities.

 Does anyone have recommendations for this situation?

 My current thoughts are to either

 1) use nonmetric/gradient descent MDS which seems to
 allow missing data, or

Not with the isoMDS function in MASS. If N(N-1) is a problem, then nonmetric 
MDS may not be the solution.

 2) devise some scheme whereby the data that are ranked
 in common
by several people is used to derive a scaling
 factor for each
person's ratings.

 Thanks for any advice,

 _

Cheers, Green Power
--
Green Power, Oulu, Finland



Re: [R] MDS with missing data?

2006-06-15 Thread Jari Oksanen
On Thu, 2006-06-15 at 07:13 +0300, Jari Oksanen wrote:


 
  1) use nonmetric/gradient descent MDS which seems to
  allow missing data, or
 
 Not the isoMDS function in MASS. if N(N-1) is a problem, then nonmetric 
 MDS may not be the solution.

Sorry for the wrong information: isoMDS does handle NA. I was remembering
the old times when I last looked at the issue, but isoMDS has changed since.
Fine work!

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] Ordination of feature film data question

2006-03-13 Thread Jari Oksanen
On Mon, 2006-03-13 at 07:50 +, Prof Brian Ripley wrote:
 `Ordination' is ecologists' terminology for multidimensional scaling.
 You will find worked examples in MASS (the book, see the R FAQ), and the 
 two most commonly used functions, isoMDS and sammon, in MASS the package.
 
'Ordination' in ecologists' terminology also covers principal components
analysis and variants of correspondence analysis. Actually, when an
ecologist speaks about 'ordination', she most often means correspondence
analysis, which also sounds like a natural (though perhaps not the best)
choice for co-occurrence data in movies.

cheers, jari oksanen 
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] transparent background for PDF

2006-03-24 Thread Jari Oksanen

On 24 Mar 2006, at 20:30, Dennis Fisher wrote:

 Colleagues

 Running R2.2.1 on either a Linux (RedHat 9) or Mac (10.4) platform.

 I created a PDF document using pdf("FILENAME.pdf", bg="transparent",
 version="1.4").  I then imported the graphic into PowerPoint -
 background was set to a non-transparent color.  I was hoping that the
 inserted graphic would be transparent - instead, it had a white
 background.

In my experience, this is a feature of PowerPoint, which seems to be 
incapable of displaying a transparent background in PDF. This also 
concerns transparent-background PDFs from programmes other than R. This 
experience is from Linux & Mac (pdf) and PowerPoint on Mac (I never tried 
that with PowerPoint on Linux...).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland



Re: [R] isoMDS and 0 distances

2006-04-19 Thread Jari Oksanen
On Tue, 2006-04-18 at 22:06 -0400, Tyler Smith wrote:

 I'm trying to do a non-metric multidimensional scaling using isoMDS. 
 However, I have some '0' distances in my data, and I'm not sure how to 
 deal with them. I'd rather not drop rows from the original data, as I am 
 comparing several datasets (morphology and molecular data) for the same 
 individuals, and it's interesting to see how much morphological 
 variation can be associated with an identical genotype.
 
 I've tried replacing the 0's with NA, but the isoMDS appears to stop on 
 the first iteration and the stress does not improve:
 
 distA  # a dist object with 13695 elements, 4 of which == 0
 cmdsA <- cmdscale(distA, k=2)
 
 distB <- distA
 distB[which(distB == 0)] <- NA
 
 isoA <- isoMDS(distB, cmdsA)
 initial  value 21.835691
 final  value 21.835691
 converged
 
 The other approach I've tried is replacing the 0's with small numbers. 
 In this case isoMDS does reduce the stress values.
 
 min(distA[which(distA > 0)])
 [1] 0.02325581
 
 distC <- distA
 distC[which(distC == 0)] <- 0.001
 isoC <- isoMDS(distC)
 initial  value 21.682854
 iter   5 value 16.862093
 iter  10 value 16.451800
 final  value 16.339224
 converged
 
 So my questions are: what am I doing wrong in the first example? Why 
 does isoMDS converge without doing anything? Is replacing the 0's with 
 small numbers an appropriate alternative?
 
Tyler,

My experience is that isoMDS *may* fail to go away from the starting
configuration if there are identical values in initial configuration,
and this will happen if you use cmdscale() to get the initial
configuration. You *may* get over this by shifting duplicates a bit:

> con <- cmdscale(dis)
> dups <- duplicated(con)
> sum(dups)
[1] 2
> con[dups, ] <- con[dups, ] + runif(2*sum(dups), -0.01, 0.01)

Then isoMDS may go further.

Another issue is that, at a quick look, isoMDS() seems to do nothing
sensible with missing values, although it accepts them. The only thing
is that they are ordered last, or regarded as very long distances (in
your case they should rather be regarded as very short distances). The
key lines in isoMDS are:

ord <- order(dis)
nd <- sum(!is.na(ord))

Even when 'dis' has missing values, the result of order() ('ord') has
no missing values, but with the default argument na.last=TRUE they are put
last in the list. An obvious-looking change would be to replace the
second line with:

nd <- sum(!is.na(dis))

but this dumps the core of R, at least on my machine: probably the C code
needs the full length of the vectors in addition to the number of
non-missing entries. (This quick look was based on the latest release
version of MASS/VR: there may be a newer version coming with the upcoming
R release, but that's not released yet.)

You may check whether it works with NA: are duplicate points identical in
the results?

Then about replacing zero distances with a tiny number: this has been
discussed before on this list, and Ripley said "no, no!". I do it all
the time, but only in secrecy. A suggested solution was to drop
duplicates, but then there still is a weighting issue, and isoMDS does
not have a weights argument.
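A minimal sketch of the drop-duplicates route on toy data (the data and the back-mapping of scores onto the duplicate rows are my own assumptions; this sidesteps the zero distances but does nothing about the weighting issue):

```r
library(MASS)   # isoMDS

set.seed(7)
x <- matrix(runif(30), 6, 5)
x <- rbind(x, x[3, ])                      # row 7 duplicates row 3 -> a zero distance
keep <- !duplicated(data.frame(x))         # drop duplicate rows before scaling
fit <- isoMDS(dist(x[keep, , drop = FALSE]), trace = FALSE)
# Copy each kept row's scores back to its duplicates:
key <- apply(x, 1, paste, collapse = "/")
scores <- fit$points[match(key, key[keep]), , drop = FALSE]
```

Duplicate rows then get exactly the same coordinates, which is usually what one wants for identical genotypes.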


cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



Re: [R] isoMDS and 0 distances

2006-04-19 Thread Jari Oksanen
On Wed, 2006-04-19 at 07:46 +0100, Prof Brian Ripley wrote:
 Short answer: you cannot compare distances including NAs, so there is no 
 way to find a monotone mapping of distances.
 
The original Kruskal-Young-Shepard-Torgerson programme KYST (version 1
from 1973) could handle missing values. Unfortunately I've lost the
documents, but if I remember correctly, the argument was that you need
only a subset (representative of the points) of (dis)similarities to
get a monotone regression. KYST -- and computers of that time (I used
Burroughs!) -- had limitations on data size, and removing some of the
dissimilarities was a way of getting more than 64 data points into
analysis. However, better not go into details since:

C THIS INFORMATION IS PROPRIETARY AND IS THE
C PROPERTY OF BELL TELEPHONE LABORATORIES,
C INCORPORATED.  ITS REPRODUCTION OR DISCLOSURE
C TO OTHERS, EITHER ORALLY OR IN WRITING, IS
C PROHIBITED WITHOUT WRITTEN PRERMISSION OF
C BELL LABORATORIES.
CKYST-2A AUGUST, 1977   

cheers, jari oksanen
-- 
Jari Oksanen -- Biologian laitos, Oulun yliopisto, 90014 Oulu
sposti [EMAIL PROTECTED], kotisivu http://cc.oulu.fi/~jarioksa/



Re: [R] environmental data as vector in PCA plots

2004-05-10 Thread Jari Oksanen
On 10 May 2004, at 17:15, Heike Schmitt wrote:

I want to include a vector representing the sites - environmental data
correlation in a PCA.
I currently use prcomp (no scaling) to perform the PCA, and envfit to
retrieve the coordinates of the environmental data vector. However, the
vector length is different from the one obtained in Canoco when 
performing
a species - environmental biplot (scaling -2). How can I scale the 
vector
in order to be in accordance with Canoco, or which other scaling 
options
are there?
Canoco scaling abs(2) does not scale sites by eigenvalues, but makes the 
sum of squares of site scores = 1 for all axes. In contrast, prcomp scales 
site axes by eigenvalue, as does Canoco with scaling abs(1). Therefore you 
cannot get similar results as in Canoco. A simple solution that *may* (or 
may not) work is to transpose your data: instead of prcomp(x), try 
prcomp(t(scale(x, scale = FALSE)), center = FALSE). This does the centring 
on the columns of x (as it should be done), then transposes your data and 
runs prcomp without new centring -- which was already done for the columns 
(I didn't test this, but this is the way it was done in the olden times). 
Another alternative is to use the function rda in the same package where 
you found envfit (vegan), since it is not unlike Canoco in its scaling. 
However, it won't give you the negative scalings of PCA (RDA without 
constraints), since its author (that's me) thinks that you shouldn't use 
the negative scalings of Canoco in RDA/PCA. The package ships with a pdf 
document which discusses PCA scaling in prcomp, princomp, rda (of 
vegan) and Canoco (of Cajo ter Braak), and even hints at how to get the 
minus scalings that the author doesn't approve of.

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] BIO-ENV procedure

2004-05-14 Thread Jari Oksanen
On Fri, 2004-05-14 at 00:08, Peter Nelson wrote:
 I've been unable to find a R package that provides the means of 
 performing Clarke & Ainsworth's BIO-ENV procedure or something 
 comparable. Briefly, they describe a method for comparing two separate 
 sample ordinations, one from species data and the second from 
 environmental data. The analysis includes selection of the 'best' 
 subset of environmental variables for explaining the observed spp 
 ordination.  Is there something available or being developed?
 
Send a reference to the exact algorithm (or a recipe for the algorithm) so
that someone can implement the method. Your post is not sufficient to
tell what should be there.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] BIO-ENV procedure

2004-05-14 Thread Jari Oksanen
On Fri, 2004-05-14 at 00:08, Peter Nelson wrote:
 I've been unable to find a R package that provides the means of 
 performing Clarke & Ainsworth's BIO-ENV procedure or something 
 comparable. Briefly, they describe a method for comparing two separate 
 sample ordinations, one from species data and the second from 
 environmental data. The analysis includes selection of the 'best' 
 subset of environmental variables for explaining the observed spp 
 ordination.  Is there something available or being developed?
 
I found a photocopy of Clarke & Ainsworth's paper (our library does
not subscribe to the Marine Ecology Progress Series, because salty seas
are too far away). It may be that you don't have a canned BIO-ENV
routine in R, but you can get very close to the procedure using R (and
the only missing piece looks unessential to me). However, I know that
Bill Venables was working with Clarke's PRIMER, and he may have canned
even BIO-ENV.

The following discussion is based on Clarke & Ainsworth, Mar. Ecol.
Prog. Ser. 92, 205-219; 1993. It seems that this is not an algorithm,
because it uses brute force and we have no idea if it converges to
anywhere. Let's call this a procedure. C&A suggest an analysis with a
separate ordination of community data, and then selecting a subset of
environmental variables that is similar to the species data. Please note
that this is not constrained (or `canonical') ordination: species
ordination is done independently. Further, the similarity of
environmental and biological structure is analysed apart from
ordination, so that you may have cases with a good species -- environment
relationship, but not in ordination.

1. An NMDS of community data using Bray-Curtis dissimilarities. Use
isoMDS function of Ripley's & Venables's MASS library for NMDS.
Bray-Curtis dissimilarity is available at least in vegan, and probably
in ade4 and possibly in many other packages.

2. For evaluating species -- environment relationship, they suggest
using rank correlation between Bray-Curtis dissimilarities of community
data and Euclidean distances of environmental data with a certain set of
environmental variables. You have Euclidean distances in the stats function
dist (or in vegan and N other packages), and you can get rank
correlations with cor.test.  However, Clarke & Ainsworth suggest a new
type of rank correlation that they call ``harmonic rank correlation''
and this may not be in R (I haven't searched, though). I do think it is
unessential for the method, so you can do with the existing rank
correlations.

3. Then comes the hard work. You have to try all possible
combinations of environmental variables (and there may be many of
them). This could be canned, because it is boring and
error prone. Thomas Lumley's leaps package does this for regression
analysis, and a model could be taken from there. For now you can do this
by hand, but it may be a bit of hard work.

4. Now you select the best subset, or the subset giving highest rank
correlation. There is no guarantee that there is such a unique, clear
case, but you may be lucky.

5. Take that subset, get Euclidean distances, and ordinate those
distances using NMDS, and plot your two ordinations side by side. Clarke
& Ainsworth recommend using NMDS instead of metric (or classic) MDS, and
they warn against Procrustes comparison of these two solutions (but I
would suggest Procrustes comparison). 


> library(MASS)
> library(vegan)
> data(varespec)
> data(varechem)
> env <- varechem[, c("N", "P", "K")]
> d <- vegdist(varespec, "bray")
> env <- scale(env)
> cor.test(d, dist(env[,1]), method="spear")$est
      rho 
0.1712362 
> cor.test(d, dist(env[,2]), method="spear")$est
      rho 
0.1803071 
> cor.test(d, dist(env[,3]), method="spear")$est
      rho 
0.2427814 
> cor.test(d, dist(env[,c(1,2)]), method="spear")$est
      rho 
0.2422454 
> cor.test(d, dist(env[,c(1,3)]), method="spear")$est
      rho 
0.2471631 
> cor.test(d, dist(env[,c(2,3)]), method="spear")$est
      rho 
0.2081135 
> cor.test(d, dist(env[,c(1,2,3)]), method="spear")$est
      rho 
0.2441523 

Some warnings on ties were removed. This suggests that the best subset
uses variables 1 and 3, or N and K. In this case we can skip the NMDS of
the environmental data, since it has rank 2 (only two environmental
variables) and can be plotted exactly in 2 dimensions without ordination.

> mds.comm <- isoMDS(d)
> par(mfrow=c(1,2))
> plot(mds.comm$points, asp=1)
> plot(env[, c(1,3)], asp=1)

Or, possibly:

> par(mfrow=c(1,1))
> plot(procrustes(env[,c(1,3)], mds.comm))

So you can do it. The missing pieces are the harmonic rank correlation
(if you think that's essential) and automating variable selection.
Somebody could do them (not me, though).
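Step 3 can nevertheless be sketched as a brute-force loop over all variable subsets. This is a sketch under loud assumptions: toy random data, plain Euclidean distances and Spearman correlation stand in for vegdist's Bray-Curtis and the harmonic rank correlation; all names are illustrative.

```r
set.seed(1)
# Toy stand-ins: `d` for the community dissimilarities, `env` for the
# scaled environmental matrix (in the session above these come from
# vegan's varespec/varechem data).
comm <- matrix(rpois(24 * 5, 3), 24, 5)
d <- dist(comm)
env <- scale(matrix(rnorm(24 * 3), 24, 3,
                    dimnames = list(NULL, c("N", "P", "K"))))

# Try every subset of the environmental variables and keep the one with
# the highest rank correlation to the community dissimilarities.
best <- list(rho = -Inf, vars = NULL)
for (k in seq_len(ncol(env))) {
  for (subs in combn(ncol(env), k, simplify = FALSE)) {
    rho <- cor(d, dist(env[, subs, drop = FALSE]), method = "spearman")
    if (rho > best$rho) best <- list(rho = rho, vars = colnames(env)[subs])
  }
}
best$vars   # the 'best' subset by rank correlation
```

With p variables this tries 2^p - 1 subsets, so it is only feasible for a modest number of environmental variables.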

cheers, jari oksanen

-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Factor loadings and principal component plots

2004-05-04 Thread Jari Oksanen
On Tue, 2004-05-04 at 09:56, Prof Brian Ripley wrote:
 On 4 May 2004, Jari Oksanen wrote:
 
  On Tue, 2004-05-04 at 09:34, Prof Brian Ripley wrote:
  
   
   Yes, but princomp is the recommended way, not prcomp.
  
  But the documentation seems to recommend prcomp:
 
 For numerical accuracy, but not for flexibility.
 
Wouldn't the best alternative be to combine flexibility and accuracy
into one alternative? I mean, I'd still use prcomp after reading the
help pages, and I'd put more weight on accuracy than on flexibility. A
quick adaptation of princomp would yield the attached flexible
prcomp code.

prcomp is more flexible at least on one point: it can handle data with
fewer units than variables.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]
prcomp.default <-
function (x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL,
    subset = rep(TRUE, nrow(as.matrix(x))), ...) 
{
    x <- as.matrix(x)
    x <- x[subset, , drop = FALSE]
    x <- scale(x, center = center, scale = scale.)
    s <- svd(x, nu = 0)
    if (!is.null(tol)) {
        rank <- sum(s$d > (s$d[1] * tol))
        if (rank < ncol(x)) 
            s$v <- s$v[, 1:rank, drop = FALSE]
    }
    s$d <- s$d/sqrt(max(1, nrow(x) - 1))
    dimnames(s$v) <- list(colnames(x), paste("PC", seq(len = ncol(s$v)), 
        sep = ""))
    r <- list(sdev = s$d, rotation = s$v)
    if (retx) 
        r$x <- x %*% s$v
    class(r) <- "prcomp"
    r
}
prcomp.formula <-
function (formula, data = NULL, subset, na.action, ...) 
{
    mt <- terms(formula, data = data)
    if (attr(mt, "response") > 0) 
        stop("response not allowed in formula")
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    mf$... <- NULL
    mf[[1]] <- as.name("model.frame")
    mf <- eval.parent(mf)
    if (any(sapply(mf, function(x) is.factor(x) || !is.numeric(x)))) 
        stop("PCA applies only to numerical variables")
    na.act <- attr(mf, "na.action")
    mt <- attr(mf, "terms")
    attr(mt, "intercept") <- 0
    x <- model.matrix(mt, mf)
    res <- prcomp.default(x, ...)
    cl[[1]] <- as.name("prcomp")
    res$call <- cl
    if (!is.null(na.act)) {
        res$na.action <- na.act
        if (!is.null(sc <- res$x)) 
            res$x <- napredict(na.act, sc)
    }
    res
}
prcomp <-
function (x, ...) 
UseMethod("prcomp")

Re: [R] Factor loadings and principal component plots

2004-05-04 Thread Jari Oksanen
On Tue, 2004-05-04 at 09:34, Prof Brian Ripley wrote:

 
 Yes, but princomp is the recommended way, not prcomp.

But the documentation seems to recommend prcomp:

?prcomp:

 
 The calculation is done by a singular value decomposition of the
 (centered and scaled) data matrix, not by using 'eigen' on the
 covariance matrix.  This is generally the preferred method for
 numerical accuracy.

?princomp:

 The calculation is done using 'eigen' on the correlation or
 covariance matrix, as determined by 'cor'.  This is done for
 compatibility with the S-PLUS result.  A preferred method of
 calculation is to use 'svd' on 'x', as is done in 'prcomp'.

Just confused, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] Common principle components

2004-05-27 Thread Jari Oksanen
On Wed, 2004-05-26 at 18:01, J. Pedro Granadeiro wrote:
 I am sorry for not being clear. I meant the methods detailed in:
 
 Flury, B. (1988). Common Principle Components Analysis and Related 
 Multivariate Models, John Wiley and Sons, New York. 

After writing my previous (long) response to this message, I started to
think that it would be strange if the ade4 people of Lyon had not
written something similar. Indeed they have: there are several
alternative methods for multivariate analysis of K tables in ade4. They
may not be exactly identical to Flury's Common Principal Components, but
they do similar things. Some of the methods may even be identical: The
ade4 people cite French sources, and Flury does not cite French sources
-- and there are at least two parallel universes in multivariate
analysis that rarely cross each other. Just go to CRAN and get ade4, and
try to figure out how to do the analysis you need.

cheers, jari oksanen 
-- 
Jari Oksanen [EMAIL PROTECTED]



[R] distance in the function kmeans

2004-05-28 Thread Jari Oksanen
My thread broke as I am writing this at home, and there were no new
messages on this subject after I got home. I hope this still reaches
interested parties.

There are several methods that find centroids (means) from distance 
data. Centroid clustering methods do so, and so does classic scaling 
a.k.a. metric multidimensional scaling a.k.a. principal co-ordinates 
analysis (in R function cmdscale the means are found in C function 
dblcen.c in R sources). Strictly this centroid finding only works with 
Euclidean distances, but these methods willingly handle any other 
dissimilarities (or distances). Sometimes this results in anomalies 
like upper levels being below lower levels in cluster diagrams or in 
negative eigenvalues in cmdscale. In principle, kmeans could do the 
same if she only wanted.

Is it correct to use non-Euclidean dissimilarities when Euclidean 
distances were assumed? In my field (ecology) we know that Euclidean 
distances are often poor, and some other dissimilarities have better 
properties, and I think it is OK to break the rules (or `violate the 
assumptions'). Now we don't know what kind of dissimilarities were used 
in the original post (I think I never saw this specified), so we don't 
know if they can be euclidized directly using ideas of Petzold or 
Simpson. They might be semimetric or other sinful dissimilarities, too. 
These would be bad in the sense Uwe Ligges wrote: you wouldn't get 
centres of Voronoi polygons in original space, not even non-overlapping 
polygons. Still they might work better than the original space (who 
wants to be in the original space when there are better spaces floating 
around?)

The following trick handles the problem by euclidizing the space implied by 
any dissimilarity measure (metric or semimetric). Here mdata is your 
original (rectangular) data matrix, and dis is any dissimilarity object:

tmp <- cmdscale(dis, k = min(dim(mdata)) - 1, eig = TRUE)
eucspace <- tmp$points[, tmp$eig[seq_len(ncol(tmp$points))] > 0.01]

(Note that cmdscale returns one eigenvalue per point, so the eigenvalues 
must be subset to the returned axes.) The condition removes axes with 
negative or almost-zero eigenvalues 
that you will get with semimetric dissimilarities.

Then just call kmeans with eucspace as argument. If your dis is 
Euclidean, this is only a rotation and kmeans of eucspace and mdata 
should be equal. For other types of dis (even for semimetric 
dissimilarity) this maps your dissimilarities onto Euclidean space 
which in effect is the same as performing kmeans with your original 
dissimilarity.
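A self-contained toy run of the trick (all data and names here are illustrative; Manhattan distances merely stand in for "any dissimilarity"):

```r
set.seed(1)
mdata <- matrix(runif(50), 10, 5)
dis <- dist(mdata, method = "manhattan")   # stand-in for any dissimilarity

# Map the dissimilarities onto a Euclidean space via classic scaling,
# keeping only axes with clearly positive eigenvalues.
tmp <- cmdscale(dis, k = min(dim(mdata)) - 1, eig = TRUE)
keep <- tmp$eig[seq_len(ncol(tmp$points))] > 0.01
eucspace <- tmp$points[, keep, drop = FALSE]

# kmeans in the euclidized space:
cl <- kmeans(eucspace, centers = 3, nstart = 10)
```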

Cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] privileged slots,

2004-06-01 Thread Jari Oksanen
On Tue, 2004-06-01 at 12:21, Torsten Steuernagel wrote:
 On 28 May 2004 at 8:19, Duncan Murdoch wrote:
 
  I'd advise against doing this kind of optimization.  It will make your
  code harder to maintain, and while it might be faster today, if @<-
  is really a major time sink, it's an obvious candidate for
  optimization in R, e.g. by making it .Internal or .Primitive.  When
  that happens, your optimized code will likely be slower (if it even
  works at all).
 
 Agreed. I don't recommend doing this either. I don't believe it makes 
 any difference using slot<- instead of @<- in real life. Anyway, that 
 optimized code should always work (slower or not) because slot<- 
 is fully documented and I don't see why it should be removed or its 
 behaviour should change. That wouldn't only break the kind of code 
 mentioned here but also everything else that makes use of slot<-.
 
There are several other things that were fully documented and still were
removed. One of the latest cases was print.coefmat which was abruptly
made Defunct without warning or grace period: code written for 1.8*
didn't work in 1.9.0 and if corrected for 1.9.0 it wouldn't work in
pre-1.9.0. Anything can change in R without warning, and your code may
be broken anytime. Just be prepared.

cheers, jari oksanen 
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] running R UNIX in a mac computer

2004-06-11 Thread Jari Oksanen
On Fri, 2004-06-11 at 03:49, Tiago R Magalhaes wrote:
 Hi to you all
 
 My question is:
 
 there is a package written in UNIX for which there is no Mac version.
 
 I would like to know if it's possible to install the R UNIX version on the
 MacOSX and run that UNIX package on my Mac (through this UNIX R Vresion on
 a Mac)
 
 I have seen a portfile for R version 1.8.1 on darwin:
 http://r.darwinports.com/
 is that it?
 
 another question related to that:
 if it's possible to use UNIX R on a Mac, does anyone know how fast or how
 slow that is?

Tiago,

If it is a CRAN package *without* a MacOS version, there obviously is a
reason for this handicap, and you cannot run the package. If it is a
stray package, its developer probably just doesn't have the opportunity or
the will to build a Mac binary, but you can build it yourself if you're
lucky. Check the FAQ and ReadMe files that come with your R/Aqua version to
see what you need. With a little effort you can use source packages
with your Mac R. Many tools are already installed in your OS (perl at
least). If the package has only R files, you may be able to install the
source package directly. If it has C source code, you should first
install the MacOS X Developer Tools (Xcode): they come with your OS
installation CD/DVD, but are not installed by default. If the package
has Fortran source code, you have to find an external Fortran compiler:
MacOS X ships with a C compiler, but without a Fortran compiler. See the Mac
R FAQ for the best alternatives for finding the compiler (this FAQ is
installed with your R).
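The source-package route above boils down to one command from a shell (the package name and version below are placeholders, not from the thread):

```shell
# sketch: install a downloaded source package; the tarball name is a placeholder
R CMD INSTALL somepkg_1.0.tar.gz
```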

Installing a Darwin R probably won't help you. It needs and uses exactly
the same tools to build the packages as R/Aqua. If you can't install a
source package in R/Aqua, you cannot install it in R/Darwin, and vice
versa. The toolset is the decisive part, not the R shell. I assume that
both versions of R are just as fast (or slow). R/Aqua uses highly
optimized BLAS for numeric functions, and if R/Darwin uses the same
library, it is just as fast. If it doesn't use optimized BLAS, it will
be clearly slower. 

I have installed Linux on a Mac, but I found that R was clearly (20%)
slower in Linux than in MacOS on the very same piece of hardware. The
main reason seemed to be that Linux R didn't have optimized BLAS because
the largest differences were in functions calling svd and qr (I used
YellowDog Linux) -- the Linux version took 150%(!) longer to run the
same svd-heavy test code. Another reason seemed to be that the Fortran
compiler produces much slower code in Linux than in MacOS X (difference
about 20%).

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] can't get text to appear over individual panels in multi-panel plot

2004-06-18 Thread Jari Oksanen
On 18 Jun 2004, at 8:26, Deepayan Sarkar wrote:
 On Thursday 17 June 2004 22:57, Patrick Bennett wrote:
  yes, i can reproduce that same graph when i print to the pdf-device.
  but the panel titles do not appear when I print to the Quartz-device.
 Hmm. I won't be able to help you then, let's hope someone else can.
I think this is a problem with the quartz device. I have often seen that 
margin texts are not plotted even in ordinary plot() if quartz thinks there 
is no space for them. They do still appear if you copy the screen 
graphics as a pdf file. In Linux (my principal platform) I typically 
reduce the white margins, but if I use the same mar pars in MacOS X I 
won't get axis labels. Quartz is the culprit, I suppose.

Actually, in your example I couldn't get the texts when I saved the 
plot as a pdf (menu entry). However, when I opened an X11 device, the 
text was reproduced OK.  So it looks like a quartz problem.

For X11 in MacOS X: it may not be in the default installation, but it 
is on the installation CD/DVD of MacOS X. You then have to start it 
explicitly before launching x11() within the R shell. In general, I 
wouldn't recommend using x11() on a Mac, since quartz() looks so much 
better: x11 looks just as clumsy as x11 in Linux or the ordinary 
Windows plotting device in some other OS. -- And beware: I have a 
suspicion that if you stop your X11 in MacOS X, your mouse will die at 
logout and you have to reboot (or restart the mouse daemon if you know who 
he is).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] can't get text to appear over individual panels in multi-panel plot

2004-06-18 Thread Jari Oksanen
On Fri, 2004-06-18 at 09:28, Jari Oksanen wrote:
 On 18 Jun 2004, at 8:26, Deepayan Sarkar wrote:
 
 
  On Thursday 17 June 2004 22:57, Patrick Bennett wrote:
 
  yes, i can reproduce that same graph when i print to the pdf-device.
  but the panel titles do not appear when I print to the Quartz-device.
 
  Hmm. I won't be able to help you then, let's hope someone else can.
 
I had a closer look at this, and it indeed looks like quartz() strictly
checks that there is enough space for text, or it refuses to print it
at all. As I wrote, the command worked with the x11() device in MacOS X,
but failed with the default quartz(). I checked again (on another machine),
and it seems that you may get the text if you expand the strip: try
adding

 par.strip.text=list(lines=2) 

to your Lattice plotting command (lines=1.8 was the smallest that worked
in my case).
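A self-contained sketch of that suggestion; the data, formula, and grouping variable are made up, not from the thread:

```r
library(lattice)  # shipped with R as a recommended package
d <- data.frame(x = rep(1:10, 2), y = rnorm(20),
                g = rep(c("a", "b"), each = 10))
## taller strips give quartz() enough room to draw the panel titles
p <- xyplot(y ~ x | g, data = d, par.strip.text = list(lines = 2))
print(p)
```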

This is a fault (``undesirable feature'') in quartz. This doesn't
concern Lattice only, but all graphics commands: quartz() refuses to
show axis labels or titles in too narrow margins, or to write text too
close to axes (if xpd is not set) in quite ordinary plot(). 

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/



RE: [R] Maxima/minima of loess line?

2004-08-24 Thread Jari Oksanen
On Tue, 2004-08-24 at 15:23, Liaw, Andy wrote:
 Just take range() of the fitted loess values.
 
Or if you really want to investigate the *line* instead of some random
*points*, you may need something like:

optimize(function(x, mod) predict(mod, data.frame(speed = x)), c(0, 20),
         maximum = TRUE, mod = cars.lo)
$maximum
[1] 19.5

$objective
[1] 56.44498

This elaborates the ?loess example with its result object cars.lo (built
with cars.lo <- loess(dist ~ speed, cars)), and of course it is a bad
example, since the fit is monotone and the solution is forced to the
margin. Use maximum=FALSE for *a* minimum.

If you have several predictors, you either need to supply constant
values for those in optimize, or for simultaneous search in all use
optim or nlm.
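A sketch of the simultaneous search with optim; the data, predictor names, and box constraints below are invented for illustration:

```r
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50))
d$y <- sin(pi * d$x1) + cos(pi * d$x2) + rnorm(50, sd = 0.1)
mod2 <- loess(y ~ x1 + x2, data = d)
## minimizing the negative prediction maximizes the fitted surface;
## the bounds keep the search inside the data, where loess can predict
f <- function(p, mod) -predict(mod, data.frame(x1 = p[1], x2 = p[2]))
res <- optim(c(0.5, 0.5), f, mod = mod2, method = "L-BFGS-B",
             lower = 0.05, upper = 0.95)
res$par  # approximate location of the maximum
```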

cheers, jari oksanen
 
  From: Fredrik Karlsson
  
  Dear list,
  
  I've produced a loess line that I would like to investigate in terms of
  local/global maxima and minima. How would you do this?
  
  Thank you in advance.
  
  /Fredrik Karlsson
 
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] integrate function

2004-08-26 Thread Jari Oksanen
On Wed, 2004-08-25 at 23:44, Peter Dalgaard wrote:
 Ronaldo Reis Jr. [EMAIL PROTECTED] writes:
 
  Is possible to integrate this diferential equation:
  
  dN/dt = Nr(1-(N/K))
  
  in R using the integrate() function?
 
 No.
  
However, you could use the closed-form solution

N = K/(1 + exp(log((K - N0)/N0) - r*t)),

where N0 is the population size at t = 0 (which you must fix or estimate).
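As a quick check of that closed-form solution (the parameter values are arbitrary):

```r
## closed-form solution of dN/dt = N*r*(1 - N/K); parameter values are arbitrary
logistic <- function(t, N0, K, r) K / (1 + exp(log((K - N0) / N0) - r * t))
logistic(0, N0 = 2, K = 100, r = 0.5)   # returns N0 = 2 at t = 0
curve(logistic(x, N0 = 2, K = 100, r = 0.5), 0, 30)  # the familiar S-curve
```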

Causton has a long discussion of integrating this function in his
Mathematics for Biologists (or something like that). Apart from that,
MuPAD may be free for Linux, and you can buy many other alternatives for
symbolic mathematics (Maple is available for Linux, at least). It may be
that you still have to work to get the solution you need, even with
snappy tools like that.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]



Re: [R] R binaries for UMBUTU Linux?

2005-04-13 Thread Jari Oksanen
On Wed, 2005-04-13 at 15:19 +0200, Derek Eder wrote:
 Has anyone out there compiled R for the Umbutu Linux* (née Debian) v.
 5.04 distribution for Intel-type platforms (32 and 64 bit)?
 
 Thank you,
 
 Derek Eder
 
 * Umbutu, a popular new Linux distribution, not a Nigerian scam, I
 promise!http://www.ubuntulinux.org/
 
Well, if you mean Ubuntu (and Debian is still there: she's not married
to Ubuntu but kept her name), I have some experience (though not on
Intel -- more on that later). First, it seems that R is not in the standard
Ubuntu base, but you can find it in the universe repository and install it
as a binary. However, the rhythms are a bit off. The previous Ubuntu release
came out at about the same time as the R-2.0.x release, and you got R-1.9.1
in Ubuntu. The current release of Ubuntu was last week, and the next R
release is due next week. This means that you're lagging behind by one cycle
in R with these predictable and regular release cycles. However, Ubuntu is a
Linux, which means that you can compile R from the sources quite easily. I
did this with Ubuntu 4.10, and compilation went smoothly (as usual). However,
I did this on ppc (32bit, or G4), and some tests failed (at least in
'foreign': I haven't studied this in more detail). The base R seems to
work OK, though. Alternatively, you can use real Debian packages from
its testing repository. Ubuntu does not recommend using native Debian
packages, but I guess with R you can do this fairly safely (the general
problem is a potential conflict in version naming which may lead to
conflicts in upgrades, but I think this is OK with R). So you may get the
latest Debian (testing) packages -- as soon as they get through the
jungle of dependencies and appear in Debian.
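For reference, compiling R from the sources is the standard build sequence (R-x.y.z is a placeholder version; a C and a Fortran compiler are assumed to be installed):

```shell
# sketch of the standard R source build; R-x.y.z is a placeholder version
tar xzf R-x.y.z.tar.gz
cd R-x.y.z
./configure
make && make check
```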

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Factor Analysis Biplot

2005-04-15 Thread Jari Oksanen
On Fri, 2005-04-15 at 12:49 +1200, Brett Stansfield wrote:
 Dear R
Dear S,

 When I go to do the biplot
 
 biplot(eurofood.fa$scores, eurofood$loadings)
 Error in 1:p : NA/NaN argument

Potential sources of error (guessing: no sufficient detail given in the
message):

- you ask for scores from eurofood.fa but loadings from eurofood: one of
these names may be wrong.
- you did not ask for scores in factanal (they are not returned by default;
you have to specify 'scores').
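A runnable sketch of the second point, with mtcars as stand-in data (the thread's eurofood data are not available here):

```r
## stand-in data: mtcars instead of the poster's eurofood
fa <- factanal(mtcars, factors = 2, scores = "regression")
head(fa$scores)                 # scores exist because we asked for them
biplot(fa$scores, fa$loadings)  # and now the biplot call works
```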

 
 Loadings:
            Factor1 Factor2
 RedMeat     0.561  -0.112
 WhiteMeat   0.593  -0.432
 Eggs        0.839  -0.195
 Milk        0.679
 Fish        0.300   0.951
 Cereals    -0.902  -0.267
 Starch      0.542   0.253
 Nuts       -0.760
 Fr.Veg     -0.145   0.325
 
The values below the cutoff are there; they are just not displayed. To see
them, you may try:

unclass(eurofood.fa$loadings)
print(eurofood.fa$loadings, cutoff = 0)

cheers, J
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R2.0.1 for Mac OS X 10.3 problem

2005-04-18 Thread Jari Oksanen
On Mon, 2005-04-18 at 06:39 -0700, Horacio Montenegro wrote:
  I have the same problem, in Windows, and I think
 the .Rdata is not corrupted. Load R without loading
 the.Rdata file - move or rename it. Then library(lme4)
 and load the .Rdata - it should work.
 
This also is my experience (in Linux & MacOS X). The .RData need not be
corrupted; rather, you may have corrupted your R installation by deleting or
corrupting some package that uses S4 methods (like a failed upgrade of an S4
method package). When you start R so that it tries to restore those S4
objects in .RData, you get the error. Renaming or deleting .RData will
help, of course. Alternatively, in my case it helped to start R with
option --no-restore-data (in Mac when starting R from terminal -- where
you are all the time in Linux). Probably it would help to install again
the original S4 package. (In my case this happened when I tried Thomas
Yee's VGAM and then removed the package.)

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Very Slow Gower Similarity Function

2005-04-18 Thread Jari Oksanen
On 18 Apr 2005, at 19:10, Tyler Smith wrote:
Hello,
I am a relatively new user of R. I have written a basic function to
calculate the Gower similarity function. I was motivated to do so partly
as an exercise in learning R, and partly because the existing option
(vegdist in the vegan package) does not accept missing values.

Speed is the reason to use C instead of R. It should be easy, almost 
trivial, to modify vegdist.c so that it handles missing values. I 
guess this handling means ignoring the value pair if one of the values 
is missing -- which is not so gentle to the metric properties so dear 
to Gower. Package vegan is designed for ecological community data which 
generally do not have missing values (except in environmental data), 
but contributions are welcome.

I think I have succeeded - my function gives me the correct values.
However, now that I'm starting to use it with real data, I realise it's
very slow. It takes more than 45 minutes on my Windows 98 machine (R 2.0.1
Patched (2005-03-29)) with a 185x32 matrix with ca 100 missing values. If
anyone can suggest ways to speed up my function I would appreciate it. I
suspect having a pair of nested for loops is the problem, but I couldn't
figure out how to get rid of them.
cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland


Re: [R] Very Slow Gower Similarity Function

2005-04-18 Thread Jari Oksanen
On 18 Apr 2005, at 20:36, Anon. wrote:
Jari Oksanen wrote:
On 18 Apr 2005, at 19:10, Tyler Smith wrote:
Hello,
I am a relatively new user of R. I have written a basic function to
calculate the Gower similarity function. I was motivated to do so partly
as an exercise in learning R, and partly because the existing option
(vegdist in the vegan package) does not accept missing values.

Speed is the reason to use C instead of R. It should be easy, almost 
trivial, to modify the vegdist.c  so that it handles missing values. 
I guess this handling means ignoring the value pair if one of the 
values is missing -- which is not so gentle to the metric properties 
so dear to Gower. Package vegan is designed for ecological community 
data which generally do not have missing values (except in 
environmental data), but contributions are welcome.

The only reason you never see ecological community data with missing 
values is because the ecologists remove those species/sites from their 
Excel sheets before they give it to you to sort out their mess.
Well, ecologists have plenty of missing species in their community 
data, but these have zero values since they were not observed. I guess 
some Bob O'Hara is going to have a paper about this in JAE.

This is actually one of the few things they know how to do in Excel - 
I'm dreading the day when a paper appears in JAE saying that you can 
use Excel to produce P-values.

The A in JAE stands for Animal: for real things they still have 
Journal of Ecology.

To be slightly more serious, as an exercise the OP could consider 
writing a wrapper function in R that removes the missing data and then 
calls vegdist to calculate his Gower similarity index.

The looping happens inside the C code, and for pairwise deletion of missing 
values a wrapper is difficult. With complete.cases this is trivial (and 
then your result would be better behaved metrically as well).
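For what it's worth, the pairwise-deletion variant can also be written mostly vectorized in plain R; a sketch (not the original poster's function), with columns range-scaled as in Gower's coefficient:

```r
## Gower distance with pairwise deletion of missing values (a sketch)
gower <- function(X) {
  X <- as.matrix(X)
  rng <- apply(X, 2, function(v) diff(range(v, na.rm = TRUE)))
  Xs <- sweep(X, 2, rng, "/")   # scale each column to unit range
  n <- nrow(Xs)
  D <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {   # one loop; the inner work is vectorized
    diffs <- abs(sweep(Xs[-(1:i), , drop = FALSE], 2, Xs[i, ], "-"))
    D[i, (i + 1):n] <- rowMeans(diffs, na.rm = TRUE)
  }
  as.dist(t(D))
}
## the trivial complete-case route mentioned above would be:
# vegan::vegdist(X[complete.cases(X), ], method = "gower")
```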
--
Jari Oksanen, Oulu, Finland



Re: [R] results from sammon()

2005-04-20 Thread Jari Oksanen
On Wed, 2005-04-20 at 10:35 +0200, Domenico Cozzetto wrote:
 Dear all,
 I'm trying to get a two dimensional embedding of some data using different
 meythods, among which princomp(), cmds(), sammon() and isoMDS(). I have a
 problem with sammon() because the coordinates I get are all equal to NA.
 What does it mean? Why the method fails in finding the coordinates? Can I do
 anything to get some meaningful results?

I'm sorry, but I can't reproduce your problem. I have tried hard with
different tricks, but sammon() always gives good numeric results, or
reports problems with the input and refuses to continue. For a
start: which sammon() did you use? I think there may be three or four
implementations in R with that name alone (and some variants may be
named differently). I used sammon() in MASS (Venables & Ripley), and
could not get NA. You need to give more details if you want to get help.

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] results from sammon()

2005-04-20 Thread Jari Oksanen
On Wed, 2005-04-20 at 12:35 +0200, Domenico Cozzetto wrote:
 Thanks for the attention paid to my rpoblem. Please find enclosed
 the matrix with my dissimilarities. This is the only case in
 which sammon(), from the MASS package, gives me this kind of problems.
 I'm using the implementation of sammon provided by the package MASS and the
 starting configuration is the default one.
 Here are the values for the other actual parameters
 niter = 100, trace = FALSE, magic = 0.2, tol = 1e-4
 

Domenico,

I had a look at your dissimilarity matrix, and indeed, it gave all-NaN
results in sammon() of MASS. This is speculation: sammon() uses cmdscale
to get the starting configuration, and cmdscale puts two points (20 and 21)
at zero distance from each other. Sammon scaling checks against zero
dissimilarities in the input, but it seems that it doesn't check against
zero distances in the starting configuration. Moving one point
slightly seems to solve your problem. In the following, diss is the
dissimilarity matrix you sent. The trick is to calculate the same
starting configuration that sammon() would use (y), but then move one of
the conflicting points slightly and give that as the starting
configuration:

y <- cmdscale(diss)
range(dist(y))
[1] 0.00 1.443101
y[21,] <- y[21,] + 0.01
sam <- sammon(diss, y)
Initial stress: 0.23260
stress after  10 iters: 0.09420, magic = 0.461
stress after  20 iters: 0.08072, magic = 0.500
stress after  30 iters: 0.07838, magic = 0.500
stress after  40 iters: 0.07754, magic = 0.500
stress after  50 iters: 0.07710, magic = 0.500
stress after  60 iters: 0.07681, magic = 0.500
stress after  70 iters: 0.07663, magic = 0.500
stress after  80 iters: 0.07653, magic = 0.500

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Pca loading plot lables

2005-04-25 Thread Jari Oksanen
On Mon, 2005-04-25 at 13:21 +0200, Frédéric Ooms wrote:
 Dear colleagues,
 I am a beginner with R and I would like to add labels (i.e. the variable
 names) on a PCA loading plot to determine the most relevant variables. Could
 you please tell me the way to do this kind of stuff.
 The command I use to draw the PCA loading plot is the following:
 plot(molprop.pc$loading[,1] ~ molprop.pc$loading[,2])
 Thanks for your help

Have you tried 'biplot' and found it unsatisfactory for your needs? 

biplot(pr)

Alternatively, you can do it by hand:

plot(pr$loadings, type = "n")
text(pr$loadings, rownames(pr$loadings), xpd = TRUE)
abline(h = 0); abline(v = 0)

If you really want to have Axis 2 as horizontal, then you must replace
all pr$loadings pieces with pr$loadings[,2:1]. 
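A runnable version of the by-hand sketch above, with USArrests standing in for the poster's molprop data:

```r
pr <- princomp(USArrests, cor = TRUE)   # stand-in for molprop.pc
plot(pr$loadings, type = "n")           # empty frame over loadings 1 and 2
text(pr$loadings, rownames(pr$loadings), xpd = TRUE)
abline(h = 0); abline(v = 0)
```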

cheers, jari oksanen
-- 
Jari Oksanen [EMAIL PROTECTED]


