Re: [R] prcomp eigenvalues
On Tue, 2005-08-02 at 19:06 -0700, Rebecca Young wrote:
> Hello, Can you get eigenvalues in addition to eigenvectors using prcomp?
> If so, how? I am unable to use princomp due to small sample sizes.
> Thank you in advance for your help! Rebecca Young

Rebecca,

This answer is similar to some others, but simpler. You have two separate problems: running the PCA and getting the eigenvalues. The first is easy to solve: use prcomp instead of princomp (which only exists for historical reasons). Function prcomp can handle cases with more columns than rows:

    pc <- prcomp(x)

Above I assumed that your data are called x (or you can first make an x, say: x <- rcauchy(200); dim(x) <- c(20, 10) -- which puts a funny twist on the comments about variances and standard deviations below). The result contains a component called 'sdev' (standard deviations), and you can get values that are (proportional to) eigenvalues simply by taking their squares:

    ev <- pc$sdev^2

These may be good enough for you (they would be good enough for me). However, if you want to exactly replicate the numbers of some other piece of software, you may need to multiply these by some constant. If you don't need this, you may stop reading here.

The eigenvalues above are related to the usual 'unbiased' variance, so that the following results are approximately equal:

    sum(ev)
    sum(apply(x, 2, var))

If you want eigenvalues related to the biased estimate of variance, you can do

    eb <- (1 - 1/nrow(x)) * ev

Function princomp uses these, as do some other software, but prcomp works hard and carefully to get the unbiased eigenvalues instead of the biased values (which would come naturally and directly from the algorithm it uses).
Some programs relate their eigenvalues to the sum of squares, and you can get these by

    es <- (nrow(x) - 1) * ev

Finally, some popular programs in ecology (your affiliation) use proportional eigenvalues, which you can get with:

    ev/sum(ev)

cheers, jari oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
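The relations above can be checked with a small self-contained sketch (the data and all object names here are illustrative, not from the original exchange):

```r
## Eigenvalues from prcomp() and their common rescalings.
## Illustrative data only.
set.seed(1)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

pc <- prcomp(x)
ev <- pc$sdev^2                       # eigenvalues ('unbiased' scaling)

## the total variance is preserved:
all.equal(sum(ev), sum(apply(x, 2, var)))   # TRUE

eb <- (1 - 1/nrow(x)) * ev            # biased scaling (princomp-style)
es <- (nrow(x) - 1) * ev              # related to sums of squares
ev/sum(ev)                            # proportional eigenvalues
```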
Re: [R] INDVAL and mvpart
Agnieszka,

Package 'mvpart' is documented. In this case, ?rpart.object explains *where* in the rpart object the membership vector is.

cheers, jari oksanen

On Mon, 2005-08-08 at 16:02 +0200, [EMAIL PROTECTED] wrote:
> Hi, I'd like to perform Dufrene-Legendre Indicator Species Analysis for a
> multivariate regression tree. However, I have problems with the arguments of
> the duleg(veg, class, numitr = 1000) function. How do I obtain a vector of
> numeric class memberships for samples, or a classification object returned
> from mvpart?
Re: [R] invalid 'mode' of argument?
On Wed, 2005-08-10 at 08:13 -0400, Kang, Sang-Hoon wrote:
> As a novice I was trying to calculate the Shannon diversity index using the
> diversity function in the vegan package and kept getting the same error
> message:
>
>     Error in sum(..., na.rm = na.rm) : invalid 'mode' of argument

This error (which is from sum()) seems to come up if you have non-numeric data (factors, character variables etc.). Check that your data are strictly numeric. One of the most common cases I've seen is that row or column names are not read as row and column names but as data rows or columns.

> My dataset is from a microarray and has abundant missing values, so I tried
> labeling them as NA and 0, but still the same error message. The Shannon
> index is the negative sum of proportion times log of proportion, so I put 1
> for missing values to avoid log 0, but still the same error message.

You shouldn't forge your data: the function handles zeros.

cheers, jari oksanen
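A minimal sketch of how this error arises and how to track it down (the data frame below is invented, and the exact wording of the sum() error varies between R versions):

```r
## A label column read in as data makes the whole table non-numeric.
df <- data.frame(site = c("A", "B"),          # labels, not abundances
                 sp1  = c(5, 0),
                 sp2  = c(2, 3))

## diversity() reduces to sum() internally, and sum() fails on characters:
## sum(df$site)   # Error in sum(...) : invalid ... argument

## find the offending columns:
sapply(df, is.numeric)
##  site   sp1   sp2
## FALSE  TRUE  TRUE

## the usual fix is to read labels as row names, e.g.:
## dat <- read.table("yourfile.txt", header = TRUE, row.names = 1)
```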
Re: [R] vectors of different length in a matrix
On Mon, 2005-08-22 at 08:56 -0400, Duncan Murdoch wrote:
> On 8/22/2005 8:45 AM, Marten Winter wrote:
> > Hi! I've 3 vectors of different length (a, b, c) and want to arrange them
> > in a matrix: a, b, c as rows and the figures of these vectors in the
> > columns (with that matrix I want to calculate a distance between these
> > vectors - vegan - vegdist - horn). Is there a possibility to create such a
> > matrix and to fill up the missing fields with NAs automatically?
>
> Filling with NA's is the hard part; R normally likes to recycle vectors that
> are too short. Here's one way, probably not the best:
>
>     x <- matrix(NA, 3, max(length(a), length(b), length(c)))
>     x[1, seq(along = a)] <- a
>     x[2, seq(along = b)] <- b
>     x[3, seq(along = c)] <- c
>
> Another way to do it would be to extend all the vectors to the same length
> by appending NAs, then using rbind.

Another issue is that this would fail at the next step outlined in the original message (vegan - vegdist - horn), since that step won't accept NAs. So the original plan was flawed. If you fill with zeros, the 'vegdist' step would work in the sense that it produces numbers. I don't know if these numbers would make any sense if the vectors had nothing to do with each other originally, and the columns would be of mixed meaning after stacking into a matrix. If your vector elements had identities (names) originally, then you should stack your data so that entries with the same identity go to the same column. It is difficult to imagine the Horn index used in cases where you don't have these identities -- specifically, species names.

cheers, jari oksanen
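The name-based stacking suggested above can be sketched like this (all data and names here are invented for illustration; gaps are filled with 0 rather than NA so a downstream dissimilarity step can use the matrix):

```r
## Align named vectors by their names (species), filling gaps with 0.
a  <- c(oak = 5, ash = 2)
b  <- c(ash = 1, elm = 4, oak = 3)
c_ <- c(elm = 2)                 # 'c_' to avoid masking base c()

species <- sort(unique(c(names(a), names(b), names(c_))))
x <- matrix(0, nrow = 3, ncol = length(species),
            dimnames = list(c("a", "b", "c"), species))
x["a", names(a)]  <- a
x["b", names(b)]  <- b
x["c", names(c_)] <- c_
x
##   ash elm oak
## a   2   0   5
## b   1   4   3
## c   0   2   0
```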
Re: [R] vectors of different length in a matrix
On Mon, 2005-08-22 at 16:13 +0300, Jari Oksanen wrote:
> Another issue is that this would fail at the next step outlined in the
> original message (vegan - vegdist - horn), since that step won't accept NAs.

Uh. It seems that I should read the package documentation (and the posting guide, which tells me to do so): it seems that vegdist() *can* handle NAs. I do still think that data with NA probably make no sense with the alternative Horn index.

cheers, jari oksanen
Re: [R] Document clustering for R
On Mon, 2005-09-12 at 12:47 -0700, Raymond K Pon wrote:
> I'm working on a project related to document clustering. I know that R has
> clustering algorithms such as clara, but it only supports two distance
> metrics: Euclidean and Manhattan, which are not very useful for clustering
> documents. I was wondering how easy it would be to extend the clustering
> package in R to support other distance metrics, such as cosine distance, or
> if there is an API for custom distance metrics.

You don't have to extend the clustering package in R to support other distance metrics, but you should take care that you produce your dissimilarities (or distances) in the standard format, so that they can be used in the clustering package, or in cmdscale, or in isoMDS, or in any other function expecting a dist object. The clustering package will support new dissimilarities if they are written in a standard-conforming way. There are several packages that offer alternative dissimilarities (and some even distances) that can be used in clustering functions. Look for distances or dissimilarities on the R site. Some of these could be the one for you... I would be surprised if the cosine index were missing (and if needed, I could write it for you in C, but I don't think that is necessary).

cheers, jari oksanen
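As a sketch of the "standard format" point: a custom dissimilarity only needs to return a dist object to plug into the standard tools. The function name and toy document-term matrix below are mine, not from the post:

```r
## A cosine dissimilarity that returns a standard 'dist' object,
## so it works with hclust(), cmdscale(), isoMDS(), etc.
cosine_dist <- function(x) {
  x <- as.matrix(x)
  sim <- tcrossprod(x) / tcrossprod(sqrt(rowSums(x^2)))
  as.dist(1 - sim)               # dissimilarity = 1 - cosine similarity
}

docs <- matrix(c(1, 0, 2,        # toy term counts, one row per document
                 2, 0, 4,
                 0, 3, 1), nrow = 3, byrow = TRUE)
d <- cosine_dist(docs)
hc <- hclust(d)                  # accepted like any dist-based input
```

Rows 1 and 2 are proportional, so their cosine dissimilarity is 0 even though their Euclidean distance is not, which is exactly why cosine is preferred for documents.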
Re: [R] Graphical presentation of logistic regression
On Wed, 2005-09-14 at 06:29 -0500, Frank E Harrell Jr wrote:
> Beale, Colin wrote:
> > Hi, I wonder if anyone has written any code to implement the suggestions
> > of Smart et al. (2004) in the Bulletin of the Ecological Society of
> > America for a new way of graphically presenting the results of logistic
> > regression (see
> > www.esapubs.org/bulletin/backissues/085-3/bulletinjuly2004_2column.htm#tools1
> > for the full text)? I couldn't find anything relating to this sort of
> > graphical representation of logistic models in the archives, but maybe
> > someone has solved it already? In short, Smart et al. suggest that a
> > logistic regression be presented as a combination of the two histograms
> > for successes and failures (with one presented upside down at the top of
> > the figure, the other the right way up at the bottom), overlaid by the
> > probability function (i.e. the logistic curve). It's somewhat hard to
> > describe, but is nicely illustrated in the full text version above. I
> > think it is a sensible way of presenting these results and am keen to do
> > so - at the moment I can only do this by generating the two histograms
> > and the logistic curve separately (using hist() and lines()), then
> > copying and pasting the graphs out of R and inverting one in a graphics
> > package before overlaying the others. I'm sure this could be done within
> > R and would be a handy plotting function to develop. Has anyone done so,
> > or can anyone give me any pointers to doing this? I really need to know
> > how to invert a histogram and how to overlay this with another histogram
> > the right way up. Any thoughts would be welcome. Thanks in advance, Colin
>
> From what you describe, that is a poor way to represent the model except
> for judging discrimination ability (if the model is calibrated well).
> Effect plots, odds ratio charts, and nomograms are better. See the Design
> package for details.

You're correct when you say that this is a poor way to represent the model.
However, you should have some understanding of us ecologists, who are simple creatures working with tangible subjects such as animals and plants (microbiologists work with less tangible things). Therefore we want a concrete and simple representation. After all, the example was about the occurrence of an animal against a concrete environmental variable, and a concrete representation was suggested. Nomograms and such are abstractions that you understand only after long education and training (I tried the Design package, and I didn't understand the nomogram plot).

I tried one concrete example with my own data, and the inverted-histogram method was patently misleading (with Baz Rowlingson's neat and compact code, sorry for the repetition). The method would be useful with dense and regular data only; as it was, the clearest visual cue was the uneven sampling intensity. With my limited knowledge of R facilities, I can remember only two ways to preserve the concreteness of the display in base R: jitter() to avoid overplotting of observations, and sunflowerplot() to show the amount of overplotting.

I think the Ecological Society of America would be happy to receive papers suggesting better ways to represent binary response data, if some of the knowledgeable persons in this group decided to educate them (I'm not an ESA member, so I wouldn't be educated: therefore 'them' instead of 'us'). The ESA Bulletin will be influential in manuscripts submitted to the Society journals in the future, and the time for action is now.

cheers, jari oksanen
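For completeness, the Smart et al. display itself can be drawn entirely within base R, without any copy-and-paste into a graphics package. This is only a sketch on simulated data (all names and the simulation are mine), and as discussed above, it can mislead when sampling intensity is uneven:

```r
## Inverted-histogram display of a logistic regression, base R only.
set.seed(2)
x <- runif(200, 0, 10)
y <- rbinom(200, 1, plogis(-3 + 0.8 * x))   # simulated presence/absence

h0 <- hist(x[y == 0], breaks = seq(0, 10, 1), plot = FALSE)  # failures
h1 <- hist(x[y == 1], breaks = seq(0, 10, 1), plot = FALSE)  # successes
ymax <- max(h0$counts, h1$counts)

plot(NA, xlim = c(0, 10), ylim = c(0, 2 * ymax), yaxt = "n",
     xlab = "x", ylab = "counts / fitted probability")
## failures: right way up at the bottom
rect(h0$breaks[-length(h0$breaks)], 0, h0$breaks[-1], h0$counts)
## successes: upside down, hanging from the top
rect(h1$breaks[-length(h1$breaks)], 2 * ymax, h1$breaks[-1],
     2 * ymax - h1$counts)
## fitted logistic curve, rescaled to the full plot height
fit <- glm(y ~ x, family = binomial)
xx <- seq(0, 10, length = 101)
lines(xx, 2 * ymax * predict(fit, list(x = xx), type = "response"))
```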
Re: [R] Compare two distance matrices
On Fri, 2005-10-07 at 09:31 +0200, Mattias de Hollander wrote:
> Hi all, Thanks for the quick response. I see the ade4 package is not needed
> for distance matrix computation, but as far as I can see you need it for
> comparing two distance matrices. In the stats package I can't find any
> similar functions like mantel.randtest or RVdist.randtest of the ade4
> package. So I think this package is still needed if I would like to make a
> scatter plot of the matrices. Or should I manually compare these matrices
> with a loop, for example, and make a plot of this?

To plot two dissimilarity structures d1 and d2 in base R, you can use the command

    plot(d1, d2)

For a plot() command, a dissimilarity structure looks like a vector. 'Dissimilarity structure' means a result that you can get from as.dist(), or directly from the dist() function, or from any other implementation of dissimilarity functions giving compliant results. For Mantel tests you may need ade4 (or some other package that has the same test).

cheers, jari oksanen
Re: [R] Under-dispersion - a stats question?
On Tue, 2005-10-11 at 17:16 -0400, Kjetil Halvorsen wrote:
> Martin Henry H. Stevens wrote:
> > Hello all: I frequently have glm models in which the residual variance is
> > much lower than the residual degrees of freedom (e.g. Res.Dev = 30.5,
> > Res.DF = 82). Is it appropriate for me to use a quasipoisson error
> > distribution and test it with an F distribution? It seems to me that I
> > could stand to gain a much-reduced standard error if I let the procedure
> > estimate my dispersion factor (which is what I assume the quasi-
> > distributions do).
>
> I didn't see an answer to this. Maybe you could treat it as a quasi-model,
> but first you should ask why there is underdispersion. Underdispersion
> could arise if you have dependent responses; for instance, competition
> (say, between plants) could produce underdispersion. Then you would be
> better off changing to an appropriate model. Maybe you could post more
> about your experimental setup?

Some ecologists from Bergen, Norway, suggest using quasipoisson with its underdispersed residual error (though I wouldn't do that). However, it would indeed be useful to know a bit more about the setup, like the type of dependent variable. If the dependent variable happens to be the number of species (as it has been in some papers by MHHS), this certainly is *not* Poisson, nor quasi-Poisson, nor in the exponential family, although it is so often modelled that way. I've often seen that species richness (the number of species -- or in R-speak, 'tokens' -- in a collection) is underdispersed relative to Poisson, and for a good reason. Even there I'd play safe and use poisson() instead of an underdispersed quasipoisson().

cheers, jari oksanen
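The mechanics being discussed can be sketched as follows (simulated data, names mine): quasipoisson() leaves the coefficients untouched and only estimates a dispersion parameter, which rescales the standard errors up (overdispersion) or down (underdispersion):

```r
## Poisson vs quasi-Poisson: same fit, different dispersion handling.
set.seed(3)
x <- runif(100)
y <- rpois(100, exp(1 + x))

fit_p <- glm(y ~ x, family = poisson)
fit_q <- glm(y ~ x, family = quasipoisson)

## residual deviance vs residual df hints at over/underdispersion:
c(dev = deviance(fit_p), df = df.residual(fit_p))

## quasipoisson estimates the dispersion instead of fixing it at 1:
summary(fit_q)$dispersion

## the coefficients are identical; only the standard errors are rescaled:
all.equal(coef(fit_p), coef(fit_q))   # TRUE
```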
Re: [R] varimax rotation difference between R and SPSS
On Thu, 2005-10-13 at 16:13 +0200, Andreas Cordes wrote:
> Hi, I am puzzled by a differing result of princomp in R and FACTOR in SPSS.
> Regarding the amount of explained variance, the two results are the same.
> However, the loadings differ substantially, in the unrotated as well as in
> the rotated form. In both cases correlation matrices are analyzed. The sums
> of the squared components are one in both programs.

Not in the data that you pasted in your message. After reading in the data, I get from the non-rotated R solution:

    colSums(rpc^2)
    V2 V3
     1  1

And the non-rotated SPSS solution gives:

    colSums(spc^2)
          V2       V3
    5.363671 2.136624

After normalizing the SPSS PCs, the solutions are identical (within numerical accuracy) after reversing the sign of the second PC. I don't want to look at the data full of holes, like the loadings from the varimax rotation. However, it seems that the raw solutions are identical.

cheers, jari oksanen
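The normalize-then-flip comparison described above can be sketched like this. Here 'rpc' and 'spc' stand in for the R and SPSS loadings; since the original data are not available, 'spc' is simulated as a rescaled, sign-flipped copy:

```r
## Compare two PCA loading matrices up to column scaling and sign.
set.seed(4)
x <- scale(matrix(rnorm(100), 20, 5))
rpc <- eigen(cor(x), symmetric = TRUE)$vectors[, 1:2]  # unit-length columns
spc <- rpc %*% diag(c(2.3, -1.4))     # rescaled and sign-flipped stand-in

norm_cols <- function(m) sweep(m, 2, sqrt(colSums(m^2)), "/")
a <- norm_cols(rpc)
b <- norm_cols(spc)
signs <- sign(colSums(a * b))         # align the arbitrary signs
all.equal(a, sweep(b, 2, signs, "*"), check.attributes = FALSE)  # TRUE
```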
Re: [R] noncommutative addition: NA+NaN != NaN+NA
On Tue, 2004-09-07 at 12:47, Prof Brian Ripley wrote:
> On Tue, 7 Sep 2004, Robin Hankin wrote:
> > Check this out:
> >
> >     > NaN + NA
> >     [1] NaN
> >     > NA + NaN
> >     [1] NA
> >
> > I thought + was commutative by definition. What's going on?
> >
> >     platform powerpc-apple-darwin6.8
> >     arch     powerpc
> >     os       darwin6.8
> >     system   powerpc, darwin6.8
> >
> > (Both give NA under Linux, so it looks like a version-specific issue.)
>
> I am unable to reproduce it on any of the 7 different systems I checked
> (Solaris, Linux, Windows with various compilers). Linux on that hardware?
> It might be a chip issue.

I tried this in Linux on a Mac iBook G4, and the results were the same: NaN + NA was NaN, just like in the MacOS X version. So it looks like a chip issue. However, the RPM built from the src.rpm packages at CRAN failed some checks in Linux/iBook.

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
RE: [R] isoMDS
On Wed, 2004-09-08 at 21:31, Doran, Harold wrote:
> Thank you. Quick clarification. isoMDS only works with dissimilarities.
> Converting my similarity matrix into the dissimilarity matrix is done as
> (from an email I found in the archives)
>
>     d <- max(tt) - tt
>
> where tt is the similarity matrix. With this, I tried isoMDS as follows:
>
>     tt.mds <- isoMDS(d)
>
> and I get the following error message:
>
>     Error in isoMDS(d) : An initial configuration must be supplied with NA/Infs in d.
>
> I was a little confused on exactly how to specify this initial config. So,
> from here I ran cmdscale on d as

This error message is quite informative: you have either missing or non-finite entries in your data. The only surprising thing here is that cmdscale works: it should fail, too. Are you sure that you haven't done anything with your data matrix in between, like changed it from a matrix to a dist object? If the Inf/NaN/NA values are on the diagonal, they will magically disappear with as.dist. Anyway, if you're able to get a metric scaling result, you can manually feed that into isoMDS as the initial configuration, and avoid the check. See ?isoMDS.

>     d.mds <- cmdscale(d)
>
> which seemed to work fine and produce reasonable results. I was able to
> take the coordinates and run them through a k-means cluster, and the
> results seemed to correctly match the grouping structure I created for this
> sample analysis. cmdscale is for metric scaling, but it seemed to produce
> the results correctly. So, did I correctly convert the similarity matrix to
> the dissimilarity matrix? Second, should I have used cmdscale rather than
> isoMDS as I have done? Or, is there a way to specify the initial
> configuration that I have not done correctly?

If you don't know whether you should use isoMDS or cmdscale, you probably should use cmdscale. If you know, things are different. Probably isoMDS gives you `better' (TM) results, but it is more complicated to handle.

cheers, jari oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
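The similarity-to-dissimilarity conversion and the manual initial configuration can be sketched end to end (the similarity matrix here is invented; the point is the plumbing):

```r
## Similarity -> dissimilarity, then cmdscale() start fed to isoMDS().
library(MASS)
set.seed(5)
p  <- matrix(runif(50), 10, 5)
tt <- tcrossprod(p)             # a made-up similarity matrix
d  <- max(tt) - tt              # similarity -> dissimilarity
diag(d) <- 0                    # self-dissimilarity must be zero
d  <- as.dist(d)

init <- cmdscale(d, k = 2)      # metric solution as starting configuration
mds  <- isoMDS(d, y = init, trace = FALSE)
mds$stress                      # final stress of the non-metric solution
```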
Re: [R] isoMDS
On Thu, 2004-09-09 at 04:53, Kjetil Brinchmann Halvorsen wrote:
> Mardia, Kent & Bibby define the standard transformation from a similarity
> matrix to a dissimilarity (distance) matrix by
>
>     d_rs = sqrt(c_rr - 2*c_rs + c_ss)
>
> where c_rs are the similarities. This ensures that the diagonal of the
> dissimilarity matrix is zero. You could try that.

In R notation, this would be

    sim2dist <- function(x) as.dist(sqrt(outer(diag(x), diag(x), "+") - 2*x))

Mardia, Kent & Bibby indeed say in passing that this is a `standard transformation' (page 403). However, it is really a canonical way only if the diagonal elements of the similarity matrix are sums of squares and the off-diagonal elements are cross products. In that case the `standard transformation' gives you Euclidean distances (or, if you have variances/covariances, or ones/correlations, it gives you something similar). However, it is no standard if your similarities are something else and cannot be transformed into Euclidean distances.

However, in isoMDS this *may* not matter, since NMDS uses only the rank order of the dissimilarities, and any transformation giving dissimilarities in the same rank order *may* give similar results. The statement was conditional (may), since isoMDS uses cmdscale for the starting configuration, and cmdscale will give different results with different transformations. So isoMDS may stop in different (local) optima. Setting the `tol' parameter low enough in isoMDS (see ?isoMDS) helped in a couple of cases I tried, and the results were practically identical with different transformations. So it doesn't matter too much how you change your similarities to dissimilarities, since isoMDS indeed treats them as dissimilarities (but cmdscale treats them as distances).

cheers, jari oksanen
--
J. Oksanen, Oulu, Finland. Object-oriented programming is an exceptionally bad idea which could only have originated in California. -- E. Dijkstra
Re: [R] Installing packages on OS X
On Wed, 2004-09-08 at 21:25, hadley wickham wrote:
> On my computer, it seems that (binary?) packages installed through the GUI
> in RAqua are not available to the command-line version of R, while (source)
> packages installed with R CMD INSTALL are available to both. This is a
> problem when I run R CMD CHECK on a package that I am creating that depends
> on packages I have installed through the GUI. Is this a problem with my
> installation of R, or a known limitation? (There is no mention of this in
> the Mac OS X FAQ; however, the entire section entitled Installing packages
> is blank.)

It is in some other FAQ... Unfortunately, I don't have a Mac available now, so I can't check. However, look for environment variables and setting library paths in some other R FAQ or the R Installation and Administration guide. There you will find a description of the things you should do. It is just as crystal clear as Unix man pages: everything is clear *after* you know what is said there, but you may have a hard time noticing this clarity. I solved this problem some months ago after a long search among the official documentation. So it is documented, but well hidden. I may have a look at a machine where I solved this in the evening (UTC+3), if you don't get a solution before that.

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
RE: [R] isoMDS
On Thu, 2004-09-09 at 14:25, Doran, Harold wrote:
> Thank you. I used the same matrix with cmdscale as I did with isoMDS. I
> have reproduced my steps below for clarification, if this happens to shed
> any light. --- snip ---

Doran,

Your data clarified things. It seems to me now that your data are not a matrix but a data.frame. A problem for an ordinary user is that data.frames and matrices look identical, but that's only the surface: you shouldn't be shallow but look deep into their souls to see that they are completely different, and that is why isoMDS fails. At least isoMDS gives just that error for a data.frame, whereas cmdscale casts the data.frame to a matrix and therefore works. So the following should work (it worked when I tried):

    tt <- as.matrix(tt)
    isoMDS(tt)

(You could also go down to a dist object with tt <- as.dist(tt), which seems to handle data.frames directly, too.)

Then you will still need to avoid the complaint about zero distances among points. This means that you have some identical points in your data, and isoMDS does not like them. This issue was discussed here in April 2004 (and many other times). Search the archives for the subject question on isoMDS.

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
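The zero-distance complaint mentioned at the end can be reproduced and worked around like this (invented data; dropping duplicate rows is one common workaround, not the only one):

```r
## Duplicated rows give zero dissimilarities, which isoMDS() rejects.
library(MASS)
set.seed(6)
m <- matrix(runif(20), 5, 4)
m[2, ] <- m[1, ]                 # duplicate row -> zero distance
d <- dist(m)

## isoMDS(d)  # fails: zero distance between two distinct objects

## one workaround: drop duplicates before scaling
d2  <- dist(unique(m))
mds <- isoMDS(d2, trace = FALSE)
nrow(mds$points)                 # 4 distinct points remain
```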
Re: [R] getting started on Bayesian analysis
On Wed, 2004-09-15 at 03:27, HALL, MARK E wrote:
> I've found Bayesian Methods: A Social and Behavioral Sciences Approach by
> Jeff Gill useful as an introduction. The examples are written in R and S,
> with generalized scripts for doing a variety of problems. (Though I never
> got change-point analysis to run successfully in R.)

Change-point analysis? I haven't seen the book, but I read the lecture handouts of one Bayesian course over here in Finland (Antti Penttinen, Jyväskylä), and translated his example to R during one (rare) warm summer day in a garden. So do you mean this (binary case)?

    source("/mnt/flash/cb.update.R")
    cb.update
    function (y, A = 1, B = 1, C = 1, D = 1, N = 1200, burnin = 200)
    {
        n <- length(y)
        lambda <- numeric(N)
        mu <- numeric(N)
        k <- numeric(N)
        lambda[1] <- A/(A + B)
        mu[1] <- C/(C + D)
        k[1] <- n/2
        sn <- sum(y)
        for (i in 2:N) {
            kold <- k[i - 1]
            sk <- sum(y[1:kold])
            lambda[i] <- rbeta(1, A + sk, B + kold - sk)
            mu[i] <- rbeta(1, C + sn - sk, D + n - sn + sk - kold)
            knew <- sample(n - 1, 1)
            sknew <- sum(y[1:knew])
            r <- (sknew - sk) * (log(lambda[i]) - log(mu[i])) -
                (knew - kold) * (lambda[i] - mu[i])
            if (min(0, r) > log(runif(1)))
                k[i] <- knew
            else k[i] <- k[i - 1]
        }
        out <- cbind(lambda, mu, k)
        out[(burnin + 1):N, ]
    }
    y <- c(rbinom(60, 1, 0.8), rbinom(40, 1, 0.3))
    uh <- cb.update(y, N = 5200)
    colMeans(uh)
       lambda        mu         k
    0.8189303 0.4169367    59.077
    mean(y[1:60])
    [1] 0.783
    mean(y[41:100])
    [1] 0.45
    plot(density(uh[, 1]))
    plot(density(uh[, 2]))
    plot(table(uh[, 3]), type = "h")

This was off-topic. So, something about business: isn't the (Win)BUGS author working on an R port?

cheers, jari oksanen
Re: [R] Problem installing source packages on OS X
On 15 Sep 2004, at 20:29, Aric Gregson wrote:
> I am attempting to install the Hmisc, rreport and Design packages, but am
> not able to do so. I am running R v1.9.1 on Mac OS 10.3.5. I get the same
> error for Hmisc (rreport is not on CRAN). It looks like it is trying to use
> g77 to compile the source package. How can I change the default compiler?
> Will this solve the problem? I cannot find a binary version of either
> package.

R is trying to build a Fortran program, and it needs a Fortran compiler. A Fortran compiler does not ship with MacOS X, but you have to get one. See the MacOS FAQ for R. If I remember correctly, it tells you to go to http://hpc.sourceforge.net/ for the compiler. Normally I wouldn't remember addresses like this, but just today I had to make a visit there: I had installed g77 using fink, and that puts its stuff into /sw instead of /usr/local. Some R routines had hardcoded the g77 path to /usr/local/bin/g77, and so building a package failed with the false claim of a missing g77 (yeah, it was in the path).

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland
Re: [R] BUGS and OS X
On Wed, 2004-09-15 at 21:29, Tamas K Papp wrote:
> On Wed, Sep 15, 2004 at 02:21:18PM -0400, Liaw, Andy wrote:
> > That's more of a question for the BUGS developers. BUGS is not open
> > source, so whatever binary is provided, that's all you can use. If I'm
> > not mistaken, WinBUGS is the only version under development.
>
> I found something called JAGS, and I am still exploring it. It appears to
> be an open-source BUGS replacement, though with limitations.

MacOS X is a kind of unix (where the emphasis is on the "kind of"), so you can get and compile any source code developed for unix -- with some luck. One alternative is Bassist, available at http://www.cs.helsinki.fi/research/fdk/bassist/. I just tried and found out that you can compile and install it on MacOS X in the usual way (./configure; make; sudo make install). That's all I can say about it. It may not be the easiest to use. The current version seems to be a bit oldish and not quite complete, but somebody claimed that they may start developing Bassist again. Actually, Bob O'Hara (who usually calls himself Anon. in this list) should know more, and hopefully this message will prompt him to tell us, too.

> I was asking what software people would recommend for the same
> functionality, not a drop-in replacement. I am just baffled by the
> bewildering array of R packages, and would be so happy if somebody told me
> what THEY use for Bayesian analysis, so I could read the docs and get
> started. MCMC? Boa? etc. Suggestions on how experienced users do Bayesian
> analysis in R would be welcome.

You need a guru to guide you. That's the holy tradition in Bayesianism.

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
Re: [R] Signs of loadings from princomp on Windows
On Thu, 2004-09-16 at 02:38, Tony Plate wrote:
> You could investigate this yourself by looking at the code of princomp (try
> getAnywhere(princomp.default)). I'd suggest making a file that in-lines the
> body of princomp.default into the commands you had below. See if you still
> get the difference. (I'd be surprised if you didn't.) Then try commenting
> out lines until the second pass through the commands produces the same
> results as the first. The very last thing you commented out might help to
> answer your question "What would be causing the difference?" (The fact that
> various people chimed in to say they could reproduce the behavior that
> bothered you, but didn't bother to dig deeper, suggests it didn't bother
> them that much, which further suggests that you are the person most
> motivated by this and thus the best candidate for investigating it
> further...)

People were not too bothered, since the sign of an eigenvector is not well defined in PCA: vectors x and -x are equal. Have you compared absolute values? Do they differ much (more than, say, 1e-6)? If they differ too much for you, this could be a symptom of some other problem, so it may be worth investigating on the machines where you get this (others can do nothing). Since princomp.default is difficult to find (either getAnywhere(princomp.default) or stats:::princomp -- I hate this information hiding), and its code is winding, I'd suggest you concentrate on studying the line:

    sol <- eigen(cv, symmetric = TRUE)

where you get cv from

    cv <- cov.wt(x)$cov * (1 - 1/nrow(x))

and x is your data matrix. If cv remains unchanged from time to time, but there is a change in the signs of sol$vectors, then you have localised your problem. If it's not there, then the rest of the princomp.default code is worth investigating. If it's in eigen, then it dives deep into Fortran, and that may be all you can say. (If your covariance matrices change with repeated calculations, then the problem is deeper.)
However, sign doesn't matter if there are -- Jari Oksanen [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
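The sign indeterminacy mentioned above can be checked directly; a minimal sketch, using the built-in USArrests data purely for illustration (not from the original thread):

```r
# Eigenvectors are defined only up to sign: if v solves cv %*% v = lambda * v,
# then so does -v. USArrests is just an illustrative data set here.
cv <- cov(USArrests)
sol <- eigen(cv, symmetric = TRUE)
v <- sol$vectors[, 1]
lambda <- sol$values[1]
# Both v and -v satisfy the eigen equation to numerical precision:
stopifnot(all.equal(unname(drop(cv %*% v)), lambda * v))
stopifnot(all.equal(unname(drop(cv %*% (-v))), lambda * (-v)))
```

Which of the two signs eigen() returns is an implementation detail of the underlying LAPACK routine, which is why identical covariance matrices can still yield sign flips across builds.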
Re: [R] Multi-dimensional scaling
On Thu, 2004-09-16 at 17:28, Luis Rideau Cruz wrote: Is there any package/function in R which can perform multi-dimensional scaling? Yes. Ripley's MASS package has isoMDS for non-metric multidimensional scaling. Moreover, the same package has the function sammon for another variant. Some people regard SOM as a crude form of multidimensional scaling, and that is -- surprise -- in MASS, too (but there are other implementations). Basic R (or its stats component) has principal co-ordinates analysis, a.k.a. metric multidimensional scaling. Finally, R has a utility help.search which would show you most of these and something else, too (perhaps xgvis in the xgobi package, if that's installed on your system). Try help.search("multidimensional scaling"). cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
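The functions named above can be called like this; a minimal sketch with the built-in USArrests data (illustrative only -- any dissimilarity matrix will do):

```r
library(MASS)                        # isoMDS and sammon live here
d <- dist(USArrests)                 # a dissimilarity matrix
pco  <- cmdscale(d, k = 2)           # metric MDS / principal co-ordinates (stats)
nmds <- isoMDS(d, trace = FALSE)     # non-metric MDS: list with $points, $stress
sam  <- sammon(d, trace = FALSE)     # Sammon mapping: list with $points
```

All three return two-dimensional configurations that can be plotted with plot(pco) or plot(nmds$points).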
Re: [R] data(eurodist) and PCA ??
On Wed, 2004-10-13 at 09:51, Prof Brian Ripley wrote: On Wed, 13 Oct 2004, Dan Bolser wrote: I have a complex distance matrix, and I am thinking about how to cluster it and how to visualize the quality of the resulting clusters. Using PCA and plotting the first two components is classical multi-dimensional scaling, as implemented by cmdscale(). Look up MDS somewhere (e.g. in MASS). It is exact if the distances are Euclidean in 2D. However, eurodist gives road distances on the surface of a sphere. Classic examples for the illustration of MDS are the départements of France based on proximity data and cities in the UK based on road distances. These road distances seem to be very non-Euclidean indeed (even non-metric). It seems to be 2282 km from Athens to Milan if you go directly, but if you go via Rome it is only 1403 km:

trip <- c("Athens", "Rome", "Milan")
as.matrix(eurodist)[trip, trip]
       Athens Rome Milan
Athens      0  817  2282
Rome      817    0   586
Milan    2282  586     0
817 + 586
[1] 1403

I thought that the World is non-Euclidean, but not that obviously. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
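One hedged way to quantify this non-Euclideanness: cmdscale() can return its eigenvalues, and negative ones mean the distances cannot be embedded exactly in any Euclidean space (a sketch, not from the original thread):

```r
# Negative cmdscale eigenvalues flag non-Euclidean dissimilarities.
fit <- cmdscale(eurodist, k = 2, eig = TRUE)
head(sort(fit$eig))   # the smallest eigenvalues are clearly negative
any(fit$eig < 0)      # road distances are not Euclidean
```

The relative size of the negative eigenvalues gives a rough measure of how badly a Euclidean representation distorts the data.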
Re: [R] biplot.princomp with loadings only
On Thu, 2004-09-30 at 10:33, Christoph Lehmann wrote: Hi, is there a way to plot only the loadings in a biplot (with the nice arrows), and to skip the scores? Christoph, I may have overlooked some email messages, but it seems to me that you haven't yet got an answer to your practical question. From the practical point of view, we may skip the point that you are rather asking for a monoplot than a biplot if you have only one set of points. Further, I may forget my surprise when I see that somebody really thinks that these arrows are nice. OK, they may be nice if you have only a couple of them, but anybody plotting 30 or more arrows normally asks how to get rid of this mess. Of course you can plot arrows in your monoplot, since you have access to everything in R and you can do anything with R (but the coffee comes out somewhat bland, so I recommend something else for making coffee). Here is an example:

# Run PCA
data(USArrests)
sol <- princomp(USArrests, cor = TRUE)
# Extract loadings
X <- sol$loadings
# Plot the frame
plot(X, asp = 1, type = "n")
abline(v = 0, lty = 3)
abline(h = 0, lty = 3)
# Plot arrows: see ?arrows for the syntax
arrows(0, 0, X[, 1], X[, 2], length = 0.1, col = "red")
# Label the arrows
text(1.1 * X, rownames(X), col = "red", xpd = TRUE)

Cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] How to use a matrix in pcurve?
On Sun, 24 Oct 2004, at 10:24, XP Sun wrote: Hi, everyone, I want to calculate the principal curve of a point set. First I read the points' coordinates with the function scan, then converted them to a matrix with the function matrix, and fitted the curve with the function principal.curve. Here is my data in the file bmn007.data:

0.023603 -0.086540 -0.001533
0.024349 -0.083877 -0.001454
..
..
0.025004 -0.083690 -0.001829
0.025562 -0.083877 -0.001857
0.026100 -0.083877 0.90
0.025965 -0.083877 0.002574

and the code as follows:

pp <- scan("bmn007.data", quiet = TRUE)
x <- matrix(pp, nc = 2, byrow = TRUE)
fit <- principal.curve(x, plot = TRUE)
points(fit, col = "red")

By now, I get a correct result. But when I changed to use pcurve with the matrix x, as pcurve(x), an error was thrown as follows:

Estimating starting configuration using : CA
Error in h %*% diag(sqrt(d)) : non-conformable arguments

How do I convert a matrix to a format that pcurve accepts? Any help appreciated!

Sun, the canonical answer is to ask De'ath (the author of the package). The rest is guessing. It seems that pcurve uses correspondence analysis (CA) to estimate the starting configuration. CA doesn't handle cases where any of the marginal sums (row or column sums) are negative or zero. Do you have cases like that? If so, can you get rid of them? Does pcurve have another option than CA for getting the starting configuration? cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
Re: [R] persp(), scatterplot3d(), ... argument
On Wed, 2004-10-27 at 11:11, Uwe Ligges wrote: Jari Oksanen wrote: On Wed, 2004-10-27 at 10:04, Uwe Ligges wrote: This is a larger problem if 1. one of the underlying functions does not have ..., 2. you want to relay arguments to two or more underlying functions, and 3. you don't want to list all possible arguments in your function definition, since it is long enough already. The solution is still there, but it is (black) magic. For instance, 'arrows' does not have ..., so you must add it with this magical mystery string:

formals(arrows) <- c(formals(arrows), alist(... = ))

You don't need it for simple things like:

foo <- function(...) {
    plot(1:10)
    arrows(1, 1, 7, 7, ...)
}
foo(lwd = 5)  # works!

That's why I had point 2 above: it really would work with simpler things. However, the following may fail:

parrow <- function (x, y, ...) {
    plot(x, y, ...)
    arrows(0, 0, x, y, ...)
    invisible()
}
parrow(runif(10), runif(10), col = "red")  # works
parrow(runif(10), runif(10), col = "red", pch = 16)
Error in arrows(0, 0, x, y, ...) : unused argument(s) (pch ...)

Adding formals would help. As always, useful patches are welcome. I don't know if this counts as a useful patch, but it is a patch anyway:

diff -u2r old/arrows.R new/arrows.R
--- old/arrows.R  2004-10-27 11:32:25.0 +0300
+++ new/arrows.R  2004-10-27 11:32:53.0 +0300
@@ -1,5 +1,5 @@
 arrows <- function (x0, y0, x1, y1, length = 0.25, angle = 30, code = 2,
-    col = par("fg"), lty = NULL, lwd = par("lwd"), xpd = NULL)
+    col = par("fg"), lty = NULL, lwd = par("lwd"), xpd = NULL, ...)
 {
     .Internal(arrows(x0, y0, x1, y1, length = length, angle = angle,

cheers, jari oksanen -- J. Oksanen, Oulu, Finland. Object-oriented programming is an exceptionally bad idea which could only have originated in California. E. Dijkstra
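The formals() trick above can be demonstrated on any function; a minimal sketch with a hypothetical toy function (not from the original mail):

```r
# A toy function without ...: extra arguments would be an error.
f <- function(x) x + 1
# f(1, junk = 2) would fail with "unused argument"
# Graft a ... formal onto it, as described above:
formals(f) <- c(formals(f), alist(... = ))
f(1, junk = 2)  # now extra arguments are silently swallowed; returns 2
```

This is exactly why the trick is "black magic": the function never uses the extra arguments, it merely agrees to ignore them, which is what relaying a shared ... to several underlying functions requires.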
Re: [R] ploting an ellipse keeps giving errors
On Wed, 2004-10-27 at 11:34, Sun wrote:

library(ellipse)
shape1 = c(1, 0, 0, 1)
dim(shape1) = c(2, 2)
ellipse(center = c(0, 0), shape = shape1, radius = 1)
Error in plot.xy(xy.coords(x, y), type = type, col = col, lty = lty, ...) :
    plot.new has not been called yet

It is really frustrating. Also, what do the shape matrix and radius correspond to in an ellipse function (x-x0)^2/a + (y-y0)^2/b = 1? Please advise! Sun, did you read the ?ellipse help page? I just read it, but I didn't find arguments 'center', 'shape' or 'radius' there. It could be useful to use the arguments specified in the help page. Section 'Details' of ?ellipse explains the parametrization. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] ploting an ellipse keeps giving errors
On Wed, 2004-10-27 at 12:04, Jari Oksanen wrote: On Wed, 2004-10-27 at 11:34, Sun wrote: library(ellipse) -- Here's your problem! See below.

shape1 = c(1, 0, 0, 1)
dim(shape1) = c(2, 2)
ellipse(center = c(0, 0), shape = shape1, radius = 1)
Error in plot.xy(xy.coords(x, y), type = type, col = col, lty = lty, ...) :
    plot.new has not been called yet

It is really frustrating. Also, what do the shape matrix and radius correspond to in an ellipse function (x-x0)^2/a + (y-y0)^2/b = 1? Please advise! Sun, did you read the ?ellipse help page? I just read it, but I didn't find arguments 'center', 'shape' or 'radius' there. It could be useful to use the arguments specified in the help page. Section 'Details' of ?ellipse explains the parametrization. Sun, actually the problem seems to be that you loaded library(ellipse), but followed the instructions for the function ellipse in library(car). Would this help? (One additional note: ellipse::ellipse.default uses British spelling for 'centre', but 'cent' would work in both ellipse::ellipse and car::ellipse.) cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
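As for the parametrization question, a hedged base-R sketch of what a center/shape/radius ellipse means: the shape matrix is a covariance-like matrix whose Cholesky factor maps the unit circle onto the ellipse. This is an illustration of the general idea, not the code of either package, and all values are made up:

```r
# Map the unit circle through the Cholesky factor of the shape matrix,
# then scale by radius and shift by center.
center <- c(1, 2)                        # illustrative values
shape  <- matrix(c(2, 1, 1, 2), 2, 2)    # symmetric positive definite
radius <- 1.5
theta  <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))
ell <- sweep(radius * (circle %*% chol(shape)), 2, center, "+")
plot(ell, type = "l", asp = 1)
```

Every point p on the curve satisfies (p - center)' solve(shape) (p - center) = radius^2, so shape = identity and radius = 1 gives the unit circle.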
Re: [R] biplot drawing conc ellipses
On Thu, 2004-11-04 at 22:44, T. Murlidharan Nair wrote: Is there an option to draw concentration ellipses in biplots? It seems a really nice way to summarize a large number of points in each group. Murli, if you mean the biplot.prcomp function in the stats package, and you want to draw the concentration ellipses for row scores, the answer probably is 'not easily'. Technically, the problem is that the arrows for loadings are drawn after the labels for row scores, and the scaling used for drawing row scores is lost in the process. If you try to add points or segments to the existing plot, you would use the scaling of the arrows on sides 3 and 4 (top and right). If you want to add something for the row scores, you just don't have information on the co-ordinates. I didn't check biplot.princomp, but the situation may be similar there. Drawing ellipses is possible in some alternative packages. You already got a hint of ade4. In addition, vegan has PCA as a special case of its rda function, and there you have tools like ordiellipse (using the ellipse package), ordispider and ordihull to display the variability within factor levels. However, vegan doesn't have biplots like biplot.prcomp, i.e. with arrows for loadings. Moreover, the scaling of results is different. It seems that the only thing you can do is write your own sweet biplot function. cheers, jari oksanen -- Jari Oksanen -- Oulu, Finland. But, Mousie, thou art no thy lane, In proving foresight may be vain; The best-laid schemes o' mice an' men Gang aft agley, An' lea'e us nought but grief an' pain, For promis'd joy! (Robert Burns)
Re: [R] rgl on Mac OS
On Sun, 2004-11-07 at 02:54, Saiwing Yeung wrote: It seems like a number of people on this list can install rgl but have problems loading it. I found myself in the same situation too. I have tried the workaround of removing /usr/X11R6/lib from DYLD_LIBRARY_PATH, but it doesn't seem to work for me; I am still getting the same error (that everyone else seems to get). Can anyone give me some ideas on what else to try? I have Mac OS 10.3.5, running R 2.0. Thanks in advance! I had a quick look at this issue, and indeed, rgl failed to load on my system (MacOS X 10.3.6, R 2.0.0) with various error messages. It seems to me that the binary packages at CRAN were incompatible (g++ is notorious for incompatibilities between versions). The solution was to use the source package and compile locally. For this you need to have a compiler installed. The compiler comes on the MacOS X 10.3.* installation CD/DVD, but you have to install the Developer Tools separately. One of the early error messages was that libpng was missing. When installing from source, rgl was configured without png support, and this message disappeared. However, the CRAN binaries failed even after installing the png libraries, but now with other error messages. I got my libpng with the help of http://www.rna.nl/ii.html (which you need anyway). It may be that you have to start X11 separately before calling library(rgl), but this was not necessary in my later attempts. Summary: install from the source package. Optionally, you may install libpng as well. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
Re: [R] gdist and gower distance
On Tue, 2004-11-09 at 12:59, Alessio Boattini wrote: Dear All, I would like to ask for clarification on the Gower distance matrix calculated by the function gdist in the library mvpart. Here is a dummy example:

library(mvpart)
Loading required package: survival
Loading required package: splines
mvpart package loaded: extends rpart to include multivariate and distance-based partitioning

x = matrix(1:6, byrow = T, ncol = 2)
x
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
gdist(x, method = "euclid")
         1        2
2 2.828427
3 5.656854 2.828427

## Doing the calculations by hand according to the formula in the gdist
## help page, I get the same results. The formula given is:
## 'euclidean' d[jk] = sqrt(sum (x[ij]-x[ik])^2)
sqrt(8)
[1] 2.828427

gdist(x, method = "gower")
          1         2
2 0.7071068
3 1.4142136 0.7071068

### Doing the calculations by hand according to the formula in the gdist
### help page, I cannot reproduce the same results. The formula given is:
### 'gower' d[jk] = sum(abs(x[ij]-x[ik])/(max(i)-min(i)))

## Could anybody please shed some light?

There seems to be a bug in the documentation: the function uses a different calculation than the help page specifies. Look at the gdist code. Just to make things easier: in the function body, gower is method 6, and Euclidean distances are method 2. Gower's original paper is available through http://www.jstor.org/ (Biometrics Vol. 27, No. 4, p. 857-871; 1971). cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
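The discrepancy can be reproduced without gdist. A hedged sketch in base R of one reading that matches the reported numbers (my interpretation, not verified against the mvpart sources): the printed Gower values coincide with a Euclidean distance on range-standardized columns, not with the plain sum in the help page.

```r
x <- matrix(1:6, byrow = TRUE, ncol = 2)
rng <- apply(x, 2, function(col) diff(range(col)))
# Help-page formula between rows 1 and 2:
sum(abs(x[1, ] - x[2, ]) / rng)          # gives 1, not 0.7071068
# Euclidean distance on range-standardized data reproduces the output:
sqrt(sum(((x[1, ] - x[2, ]) / rng)^2))   # 0.7071068
sqrt(sum(((x[1, ] - x[3, ]) / rng)^2))   # 1.4142136
```

If this reading is right, the documented formula and the implemented one differ exactly by the square root of a sum of squares versus a plain sum of absolute values.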
Re: [R] R works on Fedora Core 3
On 9 Nov 2004, at 19:44, Jonathan Baron wrote: The RPM for Fedora Core 2 seems to work just fine on Core 3. (The graphics window got smaller, but I'm sure there is a setting for that.) That would be good news. I really don't know how the graphics window became so big at some stage. (MacOS X is just cute here: a tiny, sharp, fast graphics window.) Has the options("printcmd") setting reappeared, so that dev.print() works without changing the default options? cheers, jazza -- Jari Oksanen, Oulu, Finland
Re: [R] CDs for R?
On 16 Nov 2004, at 23:39, (Ted Harding) wrote: Some of us are on narrow-bandwidth dialup connections, so downloading large quantities of stuff is out of the question (e.g. at approx. 5 min/MB, it would take over 2 days to download a single CD). The meat of CRAN (including contributed packages and documentation) is enough to fill 5 CDs, though one individual probably wouldn't be interested in all of that. 5 CDs sounds 4 too many. I once burnt CDs for my students, and everything fitted nicely on one CD (Windows binaries, all packages as Windows binaries and sources, contributed documents). I guess you can fit Windows, Mac and some Linux binaries all on one CD. Now comes my suggestion to the CRAN maintainers: all this would be easier if you would produce a CD image file ('iso') containing a snapshot of the latest version: main binaries, all contributed packages, and docs. Getting somebody to help download this iso would be much easier than trying to collect everything first and then make up your own CD image. Actually, only Windows and Mac users need binary versions of packages: the former because they don't have the tools to install from source, the latter because they don't know that they have the tools (being command-line challenged). To Dirk Eddelbuettel: Yes indeed, Ubuntu gives a human face to Debian and is a much more pleasant experience. However, changing OS for R may be asking too much. Further, Ubuntu/Debian comes with a tiny and biased selection of packages, and if that's not your kind of bias, you have got to go to the Internet again. Further, Ubuntu (and other Linuxes) lag behind R. The current Ubuntu release comes with R 1.9.1, and it won't be upgraded until the next release, scheduled for April 2005 (just at the same time as the next R, so that Ubuntu will be one R version behind again). I guess the lag is even worse for packages.
cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
Re: [R] CDs for R?
On Wed, 2004-11-17 at 16:54, Dirk Eddelbuettel wrote: On Wed, Nov 17, 2004 at 08:25:54AM +0200, Jari Oksanen wrote: On 16 Nov 2004, at 23:39, (Ted Harding) wrote: Now comes my suggestion to the CRAN maintainers: all this would be easier if you would produce a CD image file ('iso') containing a snapshot of the latest version: main binaries, all contributed packages, and docs. Getting somebody to help download this iso would be much easier than trying to collect everything first and then make up your own CD image. It's a volunteer effort, so someone actually has to do this. Can you help? Probably not. Not because I wouldn't be willing, but I may not be able to... I have done this a couple of times, using wget to build a local subtree of selected parts of CRAN. Then running mkisofs was pretty simple. I guess this could be automated pretty easily if you have the repository already at hand: all you need is mkisofs plus information on its targets. However, I am not that kind of guru. All this would require that people think it worthwhile. I think the general feeling has been that there is no need for an R-current.iso snapshot (or the same as a valid Windows name). So this is an academic issue (suits me). To Dirk Eddelbuettel: Yes indeed, Ubuntu gives a human face to Debian and is a much more pleasant experience. However, changing OS for R may be asking too much. Further, Ubuntu/Debian comes with a tiny and biased selection of packages, and if that's not your kind of bias, you have got to go to the Internet again. Further, Ubuntu (and other Linuxes) Again, it reflects the interests of the volunteers involved. If you want to see other things done, come join in and do them. I know this is volunteer work, and I do appreciate it. It is all biased -- hence the formulation 'your kind of bias'. At the moment I have no idea how to build a deb package of R packages, so I don't know what to say. lag behind R.
The current Ubuntu release comes with R 1.9.1, and it won't be upgraded until the next release, scheduled for April 2005 (just at the same time as the next R, so that Ubuntu will be one R version behind again). I guess the lag is even worse for packages. This actually requires a response. Here is a quick log (from my mail folder) of what new packages (of mine, I can't speak for others) got uploaded recently -- in most cases, this is on the day of the source release, so the lag would be close to zero. Now, if and when these get pressed into a release by Debian or Ubuntu I do not control. Which is, I guess, why we're discussing archive snapshots in this thread. They go, I guess, through a testing period in Debian, and if they don't wait for anybody else, they may appear in some version of Debian after that. In the Debian repository you typically see much older versions. As to Ubuntu (which I know a bit better), they will go into the next release, which is nearly six months ahead (they are not upgraded in between). Actually, Ubuntu is a bad choice if you just want to have R, since R is not among the core packages but in the unsupported section. Moreover, Ubuntu is a bad choice for the original problem of slow wires: even for an ordinary install you need an internet connection if you want to get beyond a very rudimentary system. I just forgot this in my previous message: when you're wired, you think it's natural to be wired. So forget Ubuntu if you want to have R without a fast internet connection. I have Ubuntu since it was about the only easily managed powerpc system I found. At the moment, I have R 2.0.0 built from the source distribution there. Packages are from source files, too. Thanks for the good work with Debian! cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] Running R from CD?
On Mon, 2004-11-22 at 02:41, bogdan romocea wrote: Better to install and run R from a USB flash drive. This will save you the trouble of re-writing the CD as you upgrade and install new packages. Also, you can simply copy the R installation to your work computer (no install rights needed); R will run. I think there is a niche (= a hole in the wall) for a live CD: it is cheaper to distribute 20 copies of a CD to your audience than 20 USB memory sticks. Instructions would be welcome. From: Hans van Walen hans_at_vanwalen.com At work I have no permission to install R. So, would anyone know whether it is possible to create a CD with a running R installation for a Windows (XP) PC? And of course, how to? Check the file Getting-Started-with-the-Rcmdr.pdf in John Fox's Rcmdr package. You should be able to reach this package by launching help.start() and then browsing its directory in the help browser window. Go to chapter 7, 'Some Suggestions for Instructors', which tells you how to make a live CD of R in Windows. I haven't tried this, since I don't have Windows, but I surely will when I get to be an instructor in a Windows class. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] How to insert one element into a vector?
On Mon, 2004-11-22 at 17:43, Barry Rowlingson wrote: Deepayan Sarkar wrote: Pretty much what 'append' does. A shame, then, that help.search("insert") doesn't find 'append'! I can't think why anyone looking for a way of _inserting_ a value in the middle of a vector would think of looking at 'append'! Python has separate insert and append methods for vectors:

>>> x = [1, 2, 3, 4, 6]
>>> x.insert(4, 5)
>>> x
[1, 2, 3, 4, 5, 6]
>>> x.append(99)
>>> x
[1, 2, 3, 4, 5, 6, 99]

So has R. R's 'insert' is called 'append', and R's 'append' is called 'c'. Counter-intuitive, though, and I'm happy that Peter Dalgaard didn't know that 'append' inserts: it gives some hope to us ordinary mortals. cheers, jazza -- Jari Oksanen [EMAIL PROTECTED]
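The R counterparts of the Python session can be sketched directly (a minimal illustration):

```r
# R's append() is really an insert; plain concatenation with c() is the append.
x <- c(1, 2, 3, 4, 6)
x <- append(x, 5, after = 4)  # insert 5 after the 4th element
x                             # 1 2 3 4 5 6
x <- c(x, 99)                 # "append" in the Python sense
x                             # 1 2 3 4 5 6 99
```

The 'after' argument is what makes append() an insert: append(x, values, after = 0) even prepends.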
Re: [R] The hidden costs of GPL software?
On Tue, 2004-11-23 at 17:40, roger koenker wrote: Having just finished an index, I would like to second John's comments. Even as an author, it is difficult to achieve some degree of completeness and consistency. Of course, maybe a real whizz at clustering could assemble something very useful quite easily. All of us who have had the frustration of searching for a forgotten function would be grateful. You mean SOM? -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] A basic question
On Tue, 2004-11-30 at 13:58, Kenneth wrote: Hi R users: I want to know of any experience compiling R on other Linux distributions besides Fedora (Red Hat) or Mandrake, for example on BSD, Debian, Gentoo, Slackware, Vector Linux, Knoppix, Yopper or CERN Linux? Hope this is not a basic question. Thank you for your help. I assume that the following will typically work: get the source file, gunzip and untar it, cd to the created directory and type:

./configure
make
sudo make install

It is best to check the resulting configuration after ./configure and get the software (compilers, libraries, packages, utilities) you need for any missing functionality you want to have. It is also wise to run 'make check' after 'make' so that you see whether you can trust your compilation. This 'make check' fails in some cases: at least the standard package 'foreign' failed 'make check' on the ppc architecture both in Red Hat/Fedora based (Yellow Dog) and Debian based (Ubuntu) Linuxes when I tried last time. Otherwise the compilation seems to run smoothly (and you may not need 'foreign'). BSD is not Linux, but R is officially supported at least for one version of BSD with GNU tools: MacOS X. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] can't install r package on debian due to linker problem
On Wed, 2004-12-01 at 14:38, Robert Sams wrote: hi, my attempt to install the package Hmisc v3.0-1 fails with the message:

/usr/bin/ld: cannot find -lfrtbegin
collect2: ld returned 1 exit status
make: *** [Hmisc.so] Error 1
ERROR: compilation failed for package 'Hmisc'

It is funny to see this error message in Debian, which is a GNU/Linux system. Typically you see the very same error message in MacOS X, which is a GNU/BSD system, where it is caused by a missing Fortran compiler. Indeed, at least in Red Hat Linux, libfrtbegin.a is owned by Fortran (g77). However, you say below that you have installed Fortran (g77). I suggest you check whether some Fortran-related packages are missing, or you can try to 'locate' libfrtbegin.a on your system and see whether it is in the linker search path. i'm at a loss here. any hints will be very much appreciated. i'm running: debian stable, R version 2.0.1, gcc 2.95.4-14, g77 2.95.4-14, binutils 2.12.90.0.1-4 cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] step.gam
On Wed, 2004-12-01 at 17:09, David Nogués wrote: Dear R-users: I'm trying (using the gam package) to develop a stepwise analysis. My gam object contains the predictor variables (a, b, c, d, e, f). I define the step.gam:

step.gam(gamobject, scope = list(a = ~s(a, 4), b = ~s(b, 4), c = ~s(c, 4),
    d = ~s(d, 4), e = ~s(e, 4), f = ~s(f, 4)))

Your scope doesn't look much like Trevor Hastie's help page. Have you tried formulating your scope as Hastie tells you to? That is, for a you should list all possible cases for stepping instead of only one: something like a = ~ 1 + a + s(a, 2) + s(a, 4). Why do you want to use this kind of stepping, when the standard package mgcv has a much better way of model building using generalized cross-validation? Dave Roberts discusses R/S-plus (or mgcv/gam package level) gam fitting in an ecological context at http://labdsv.nr.usu.edu/splus_R/lab5/lab5.html. You may find some useful hints there, as Dave is partial to the traditional S-plus gam as well. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
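A hedged sketch of a scope in the form the help page asks for; the names gamobject, a and b are placeholders carried over from the question, and the actual call is left commented out because it needs the gam package and a fitted model:

```r
# Each scope element enumerates the candidate forms to step among,
# from dropping the term (~ 1) up to the smoothest fit considered.
scope <- list(
  a = ~ 1 + a + s(a, 2) + s(a, 4),
  b = ~ 1 + b + s(b, 2) + s(b, 4)
)
# step.gam(gamobject, scope = scope)   # requires library(gam) and a fit
```

With only one form per variable, as in the original call, there is nothing to step between, which is presumably why the analysis did not behave as expected.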
Re: [R] Protocol for answering basic questions
On 1 Dec 2004, at 19:46, [EMAIL PROTECTED] wrote: I have been a member for only a few days, but I find the tone of some responses inappropriate for a list dubbing itself a help list. I also completely understand that traffic needs to be kept at a modest level to keep advanced users interested; therefore, I suggest that a second help list be created to deal with advanced R help. Belong to both lists if you wish, and filter your email for cursory glances or a detailed reading. Users must judge for themselves the level of their queries, and perhaps a note saying something like 'requests to the advanced list are generally made by users who already have a very good working knowledge of R', or some very rough benchmark for judging your level, like 2 years. I do not know how much work this would involve or what resources are available for it -- it is a blind proposal. I think it might deal with many of the problems both beginner and advanced users have with the present list. You may not have been on this list long enough to see that some of the old-time gurus have reached a demigod-like status. Demigods have every right to be 'rude' (that's almost the definition of a demi-deity). That said, I do know your sentiments: I'd be afraid to post a question to this list. I also remember being shocked that the first message I sent here got answers from people like VR (both) and many others, and these were friendly and useful answers (although I could have found the answer to my question with careful reading of the documents -- it was about specifying an offset in glm). This is a subscribed mailing list. As such, it is a restrictive list with more stringent rules than open newsgroups. Well, newsgroups can be really harsh places, too. I don't think it would be wise to establish a parallel novice mailing list. That would add only one extra irritation: cross-posting to several lists.
However, I do think that novice questions could be better served in a newsgroup (Usenet) than in a closed mailing list. There have been several suggestions to transform this mailing list into a newsgroup, but these suggestions have been rejected, and rightly so. However, if you want a novice group with a slacker netiquette, you could try to establish a parallel and alternative newsgroup with a different emphasis than this mailing list. I am sure that many of the greatest gurus wouldn't follow you into this newsgroup but would keep to this mailing list. If you want answers to 'basic', 'silly' or 'simple' questions, you don't need them either. Suggesting a Usenet newsgroup is a generation thing. I think some of the younger users would prefer a Wiki or a Forum (these are words I've seen, but I wouldn't visit places like that, talking about my g-g-generation). cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
Re: [R] depth constrained cluster
On Wed, 2004-12-01 at 18:36, Emmanuel GANDOUIN wrote: Please could you help me find a package to apply a depth-constrained cluster analysis to palaeoecological data (in order to zone a subfossil diagram)? I assume that you made an exhaustive search of CRAN, and the lack of answers indicates that there is no such function in R. You may check Pierre Legendre's Progiciel R instead (yes, this is a different R, and it has priority to the name, our R being a later homonym). The Progiciel R is available at http://www.fas.umontreal.ca/biol/casgrain/fr/labo/. This package seems to have both one-dimensional or chronologically constrained clustering and two-dimensional or spatially constrained clustering. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] Protocol for answering basic questions
On Thu, 2004-12-02 at 11:19, John Logsdon wrote: There are three ways of tackling this as far as I see: First would be to make the list a Reply to Sender so that most of us don't see the replies. This would keep the traffic down, and if any topic was of interest to another member, s/he could ask the originator whether it had been solved, or the solution could also be posted as a summary. One advantage of Reply to Sender is that only the Sender sees the multiple messages saying the same thing, sent by good souls around the world who haven't seen the N-1 other messages... This seems to depend on the mail reader. This already is the default behaviour for me (Evolution mail reader). I have to select Reply-to-All to send the message to r-help as well -- and then it goes to the Cc list as well. It seems that some other mail software behaves differently. It seems that R-help mail has two candidate addresses for Reply: From: this field is the original poster Sender: [EMAIL PROTECTED] Obviously my mail reader picks only From, but John's picks both From and Sender. Some other mailing lists add to the headers a new field, Reply-To, which equals From (the original poster). It seems that this would be sufficient to make many mail readers use this as the default address. Another issue is whether it is nice to divert a public discussion into a private conversation. In several cases the solutions to the problem remain private as well. After all, the purpose of the mailing list is a public discussion instead of a public call to a private discussion. cheers, jari oksanen -- Jari Oksanen -- Oulu, Finland. But, Mousie, thou art no thy lane, In proving foresight may be vain; The best-laid schemes o' mice an' men Gang aft agley, An' lea'e us nought but grief an' pain, For promis'd joy! (Robert Burns) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How about a mascot for R?
On 2 Dec 2004, at 19:46, (Ted Harding) wrote: On 02-Dec-04 Henrik Bengtsson wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Damian Betebenner Sent: Thursday, December 02, 2004 6:07 PM To: [EMAIL PROTECTED] Subject: [R] How about a mascot for R? Excellent replies, So a couple of questions about preferences for the mascot: 1. Does the mascot need to have a name that starts with R? Is that usually the way it works? So far the possibilities put forward are: Ray, Ram, Inch Worm, Rhinoceros R.oo (http://www.maths.lth.se/help/R/R.oo/), ooops Roo, which is Australian slang for Kangaroo. http://images.google.com/images?q=roo Cheers Henrik Bengtsson (And of course .oo suggests the OO aspect of R as well). But what appeals to me about this suggestion is that it made me recall cartoon drawings I saw many years ago, illustrating leptokurtic and platykurtic. The platykurtic was a profile drawing of a platypus, illustrating the flat-topped profile of such a distribution. The leptokurtic showed two kangaroos in profile, upright, face-to-face, with tails outstretched on the ground behind them. The envelope of this drawing illustrated the high peak and the long tails. (And of course they are good leppers). Can anyone remember where this appeared? I can check that tomorrow when I'm at my office. You can have a look at the image at http://cc.oulu.fi/~jarioksa/mascot.html I think this is a copyrighted picture, and it cannot be used freely as a mascot (and it will disappear soon from this address). cheers, jari oksanen -- Jari Oksanen, Oulu, Finland __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How about a mascot for R?
On Thu, 2004-12-02 at 22:56, Peter Dalgaard wrote: Tim Churches [EMAIL PROTECTED] writes: Damian Betebenner wrote: R users, How come R doesn't have a mascot? Perhaps someone with artistic flair could create a mascot based on this image? It would help to give newcomers to R-help the right idea: http://www.accesscom.com/~alvaro/alien/thepics/ripley1__.jpg Or maybe this one: http://www.accesscom.com/~alvaro/alien/thepics/bg10s.jpg or (apologies to Pat Burns): http://www.accesscom.com/~alvaro/alien/thepics/alien102_.jpg It seems that tastes for movies vary. I've never liked movies about ecologically non-sustainable and energetically impossible life forms. The current sub-theme brings to my mind something completely different: http://www.hundland.com/posters/t/TheTalentedMr.Ripley.jpg. cheers, jari o. -- Jari Oksanen [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How about a mascot for R?
On 4 Dec 2004, at 16:19, Martin Maechler wrote: DScottNZ == David Scott [EMAIL PROTECTED] on Fri, 3 Dec 2004 15:04:52 +1300 (NZDT) writes: DScottNZ As to an animal mascot, I think a New Zealand DScottNZ mascot is a must, well, thinking that must is a bit strong, I agree that I have had the same idea (an NZ animal) before your post. I first thought of the obvious Kiwi, but hoping for something more beautiful I had been googling around for New Zealand animals, and then was sidetracked by the Kakapo, which I found nice and intriguing, but in its fight against extinction it didn't seem to fit my notion of R.. Firstly, the Kiwi is a rip snorter of a bird. Secondly, there are other kinds of kiwis than the kiwi bird. I'm living about as far away from NZ as it is possible to be (you only get closer if you try to get further away), but even I've heard of 'kiwi fruit', 'kiwi bear' (brushtail possum) and 'kiwi' as people. So it could be something 'kiwi'. I do think that a kiwi bird would be a mascot-like creature: cuddly and round and easyish to draw. One parallel story brought up here is the penguin as the Linux mascot. Actually, this is a not-so-pleasant story: Linus Torvalds told somewhere that a penguin (hardly a gentoo, but some other species) tried to bite off his finger in a zoo, which made him like those animals (he's a Swedish-speaking Finn, which helps to explain this attitude). With this attitude, you could pick a gray, mouse-like nocturnal bird as a mascot. Naturally, this is none of my business, so you should not let this message influence your opinion (it wouldn't anyway). cheers, jari oksanen -- Jari Oksanen, Oulu, Finland __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Gam() function in R
On 6 Dec 2004, at 7:36, Janice Tse wrote: Thanks for the email. I will check that out. However, when I was doing this: gam(y ~ s(x1) + s(x2, 3), family=gaussian, data=mydata) it gives me the error: Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars What does it mean? When Andy Liaw answered you (below), he asked you to specify which kind of 'gam' you used: the one in the standard package 'mgcv' or the one in package 'gam'. We need to know this to know what your error message means. If you used mgcv:::gam, it means that you didn't read its help pages, which say that you should specify your model as: gam(y ~ s(x1) + s(x2, k=3)) Further, it may be useful to read the help pages to understand what it means to specify k=3 and how it may influence your model. Simon Wood -- the mgcv author -- also has a very useful article in the R Newsletter: see the CRAN archive. It may be really difficult to understand what you do with mgcv:::gam unless you read this paper (it is possible, but hard). Simon's article specifically answers your first question of deciding the smoothness, and explains how elegantly this is done in mgcv:::gam (gam:::gam has another set of tools and philosophy). If you happened to use gam:::gam, then you have to look for another explanation. cheers, jari oksanen From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Sunday, December 05, 2004 11:34 PM To: 'Janice Tse'; [EMAIL PROTECTED] Subject: RE: [R] Gam() function in R Unfortunately that's not really an R question. I recommend that you read up on the statistical methods underneath. One that I'd wholeheartedly recommend is Prof. Harrell's `Regression Modeling Strategies'. [BTW, there are now two implementations of gam() in R: one in `mgcv', which is fairly different from that in `gam'. I'm guessing you're referring to the one in `gam', but please remember to state which contributed package you're using, along with the version of R and OS.]
Cheers, Andy From: Janice Tse Hi all, I'm a new user of R gam() function. I am wondering how do we decide on the smooth function to use? The general form is gam(y~s(x1,df=i)+s(x2,df=j)...) , how do we decide on the degree freedom to use for each smoother, and if we shold apply smoother to each attribute? Thanks!! __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html --- - -- Notice: This e-mail message, together with any attachments,...{{dropped}} __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Jari Oksanen, Oulu, Finland __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
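A minimal sketch of the corrected mgcv call on simulated data (the variable names x1, x2, y and mydata are the poster's; the data here are made up):

```r
library(mgcv)  # ships with R as a recommended package

set.seed(1)
mydata <- data.frame(x1 = runif(200), x2 = runif(200))
mydata$y <- sin(2 * pi * mydata$x1) + mydata$x2^2 + rnorm(200, sd = 0.2)

## s(x2, 3) is invalid in mgcv: the basis dimension must be given
## as the named argument k
fit <- gam(y ~ s(x1) + s(x2, k = 3), family = gaussian, data = mydata)
summary(fit)
```

With k = 3 the second smoother is limited to a very small basis; see ?s in mgcv before fixing k by hand.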
Re: [R] Importing vector graphics into R
On Wed, 2004-12-08 at 15:53, [EMAIL PROTECTED] wrote: On 08-Dec-04 Roger Bivand wrote: On Tue, 7 Dec 2004, Hinrich Göhlmann wrote: Dear R users, I know of the possibility to import bitmaps via the nice pixmap library. But if you later on create a PDF it is somewhat disappointing to have such graphics bitmapped. Is there a trick (via maps?) to import a vector graphic and have them plotted onto a graph? My searching attempts in the searchable r-help archive did not seem to result in anything useful... No, nothing obvious. If you have an Xfig file - or convert to one from PS, How does one do that? None of the tools I can find on my (Linux) system seem to include the possibility of PS-Xfig (or any other vector format either, except of course PDF). pstoedit. May not be in standard distros, but can be compiled from the source. Here we have even used pstoedit in post-processing eps graphs from R. It works in some cases, but, for instance, lattice graphic was made of polygons instead of lines, and we couldn't change line widths for horizontal lines only in panel headers. This is what pstoedit gives for version info: pstoedit: version 3.33 / DLL interface 108 (build Oct 17 2003 - release build) : Copyright (C) 1993 - 2003 Wolfgang Glunz cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] How about a mascot for R?
On Wed, 2004-12-08 at 14:10, Rau, Roland wrote: Dear all, browsing through the suggestions, I have the impression that the general direction is towards an animal from New Zealand (I guess because of the roots of R). But since the R Foundation is now located in Vienna, Austria. What about a typical Austrian animal? Is there one? Maybe a Wolpertinger. A Wolpertinger is a fantasy animal which is a rabbit with the antlers known from deer and some wings from a bird. In addition to the Austrian headquarters, another reason for such an animal which does not exist in reality (or does it???) is that coding something with R is sometimes so easy that it appears to be almost unreal. I just wait for someone jumping off and saying this is off-topic and you should stop posting to this list -- and I'm afraid it could happen just at this point. However, if you accept stranger animals then the group called Rhinogradentia gives good candidates (at least as pleasant as Onychophora suggested previously). First, they have R in their name. Second, they look like mascots. The most authoritative guide to the group is: Stümpke, H. 1957. Bau und Leben der Rhinogradentia. Gustav Fischer Verlag, Stuttgart. The English translation is The Snouters: Form and Life of the Rhinogrades. The University of Chicago Press (1981). Google will find more info for those who don't have access to these books. cheers, jaRi oksanen -- Jari Oksanen [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] How about a mascot for R?
On Wed, 2004-12-08 at 17:01, Jari Oksanen wrote: I just wait for someone jumping off and saying this is off-topic and you should stop posting to this list -- and I'm afraid it could happen just at this point. Just to make it clear and to avoid misunderstanding: I was trying to reach a passive voice with my poor English. I don't want to indicate that anyone else but me should stop posting to this list... cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How to circumvent negative eigenvalues in the capscale function
On Fri, 2004-12-10 at 06:11, [EMAIL PROTECTED] wrote: I am trying to do a partial canonical analysis of principal coordinates using Bray-Curtis distances. The capscale add-in to R appears to be the only way of doing it; however, when I try to calculate a Bray-Curtis distance matrix either using capscale or vegdist (capscale, I understand, uses vegdist anyway to calculate its distance matrix), R uses up all available memory on the computer, stops, and then comes back with errors regarding negative eigenvalues. The way to avoid negative eigenvalues is to use a ``positive semidefinite'' dissimilarity matrix. This may sound cryptic. In simple words: the underlying functions in capscale assume that your dissimilarities are like (Euclidean) distances, meaning that the shortest route between two points is a straight line, and you cannot find a shorter route by going via a third point. This is possible with the Bray-Curtis index, and as its symptom you get negative eigenvalues (which are ignored in capscale, and only the dimensions with positive eigenvalues are used). Were negative eigenvalues your problem, you could avoid them by using another dissimilarity index with better metric properties. Jaccard dissimilarity is rank-order similar to Bray-Curtis, but it should be positive semidefinite. However, I don't think that negative eigenvalues and memory problems are coupled. I guess that you simply have memory problems, and the negative eigenvalues are unrelated. So you need more memory or an operating system with better memory handling. You may try some Linux live-cd (such as Quantian) where you can use R in Linux without installing Linux on your hard drive. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/ __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
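A hedged sketch of the suggested switch to a Jaccard index, using vegan's built-in example data rather than the poster's (the call names are vegan's real functions; whether Jaccard fully removes the negative eigenvalues for a given data set still has to be checked):

```r
library(vegan)

data(varespec)  # example community matrix shipped with vegan

## Jaccard is rank-order similar to Bray-Curtis but has better
## metric properties, so it should behave better in capscale
d.jac <- vegdist(varespec, method = "jaccard")

## unconstrained model (~ 1) gives plain principal coordinates
ord <- capscale(d.jac ~ 1)
ord$CA$eig  # inspect: are any eigenvalues negative?
```

The same `capscale()` call with a constraint formula on the right-hand side gives the partial canonical analysis the poster asked about.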
Re: [R] Switching to Mac, suggestions? (was switching to linux)
On 13 Dec 2004, at 19:53, doktora v wrote: I'm looking to switch to the Mac platform. Anyone had any experience with that? I'm expecting a power G4 laptop later this week hope R behaves... I have been a Linux user since 1999, and I got my first ever Mac (iBook G4 laptop) last December. There is just as little to comment on MacOS X as there is to comment on flavours of Linux distros: there is no large difference as regards R. I still prefer Emacs/ESS as a shell (and you can get some kind of real Emacs on the Mac as well), but MacOS X R is more of an eye candy (though I find it really hard to get any real use from transparent windows in R: I still prefer to see what I type instead of looking at the background through the text). As regards R, it is just the same whether you have any brand of Linux or MacOS X or even a fringe system like Windows. The differences are somewhere else than in R. By the way, Ubuntu GNU/Linux works nicely on the Mac, with a blas that knows about the vector processor in the G4. cheers, jazza -- Jari Oksanen, Oulu, Finland __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Switching to Mac, suggestions? (was switching to linux)
On Mon, 2004-12-13 at 19:53, doktora v wrote: I'm looking to switch to the Mac platform. Anyone had any experience with that? I'm expecting a power G4 laptop later this week hope R behaves... Still one comment on speed. I once (and, actually, just now) had to analyse a big data set of some 1100 observations using various multivariate methods, among them isoMDS of MASS and eigenvector methods in the vegan library. I made a test suite of a typical analysis sequence for this very special data set. So it is non-general, but something that matters to me. I have run this data set on a crippled (=Celeron) i686 under Linux and Windows, and on G4 (iBook and iMac) under MacOS X, Yellow Dog Linux 3 and Ubuntu GNU/Linux 4.10. It may be daring to say something about G4 performance based on this special case, but this doesn't stop me from saying it. For my whole sequence, G4 with MacOS X is somewhat faster relative to cpu speed than Celeron, but not nearly as much as advertised. There were some procedures that ran slower per MHz than on Celeron (isoMDS). However, MacOS X comes with a G4-optimized blas, so eigenvector-based analysis was faster: an 800 MHz iBook ran like a 1400 MHz Celeron, and a 1000 MHz iMac ran like a 1700 MHz Celeron. I guess the boost depends on the time you spend in blas. Otherwise you may count that your G4 cpu cycles equal i686 cpu cycles, and you are slower since you can get faster Intel chips. The vector processor (AltiVec) may be handy, but most functions can't use it without very tedious and ugly code optimized by hand. I've seen claims that gcc 3.4 has some automatic G4 optimization. If this is true, you may get some advantage with G4. G5 is a different issue. Yellow Dog Linux 3 didn't have a G4-optimized blas, and it was really slow. Actually, the 800 MHz iBook ran like a 500 MHz Celeron in a blas-heavy analysis. YD3 was so old that I couldn't build an optimized blas without extensive upgrading (gcc, glibc etc), and I really wasn't motivated to do that.
You can get a G4-optimized blas for Ubuntu GNU/Linux, and with that it runs just as fast as MacOS X. BTW, this test matters in the sense that I have to run these analyses, and they take an observable amount of time. The test suite ran in the 800 MHz iBook in 1600 secs, and in a 2 GHz Celeron in 700 secs. We are not talking about millisecond boosts but about going to lunch or sitting by your computer. Another issue is that graphics are superb on the Mac. The default plot device (quartz) is small but sharp. It used to scale instantly when you changed its size, but this deteriorated in the 2.0 series. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] printing PCA scores
On Sat, 2005-01-22 at 17:31 -0500, Jérôme Lemaître wrote: Hey folks, I have an environmental dataset on which I conducted a PCA (prcomp) and I need the scores of this PCA for each site (=each row) to conduct further analyses. Can you please help me with that? Did you try help(prcomp) ? It says that prcomp (may) return an item called 'x': x: if 'retx' is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the 'rotation' matrix) is returned. [non-matching parentheses in the original help file] So this is what you asked for. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
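A minimal sketch of pulling the per-site scores from a prcomp fit, on simulated data with the poster's dimensions (126 rows, 7 variables; the real analysis would of course use the environmental data):

```r
set.seed(42)
x <- matrix(rnorm(126 * 7), nrow = 126, ncol = 7)  # 126 sites, 7 variables

pc <- prcomp(x, scale. = TRUE)
scores <- pc$x        # one row of scores per site (row) of the input
dim(scores)           # 126 rows, one column per principal component
head(scores[, 1:2])   # site scores on the first two components
```

The scores are just the centred (and here scaled) data multiplied by the rotation matrix, so `scale(x) %*% pc$rotation` reproduces `pc$x`.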
Re: [R] MacOS X and vectorized files (emf, wmf,...)
On Mon, 2005-01-31 at 08:06 +0100, Patrick Giraudoux wrote: Dear Listers, We are organising practical trainings for students with R 2.0.1 under MacOS X. I am used to R 2.0.1 under Windows XP and thus was surprised not to find functions in the MacOS X version of R providing vectorized chart outputs to a file. For instance the equivalent of: win.metafile() or savePlot() ... including a wmf or emf option. Can one obtain only jpeg or bitmap or eps files with R under MacOS X, or did I miss something? Saving a plot from the menu bar (click Save) saves the plot as pdf, which is the vector graphics format native to MacOS X. Further, dev.copy2eps() works normally, as do the postscript() and pdf() devices. See the appropriate help pages. Native MS Windows formats (such as wmf) may not work, but who needs them? cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
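A hedged sketch of the cross-platform vector devices mentioned above (the file names are made up; the same calls work on MacOS X, Linux and Windows):

```r
## PDF: the vector format native to MacOS X
pdf("myplot.pdf", width = 6, height = 4)
plot(rnorm(100), type = "l")
dev.off()

## EPS via the postscript() device
postscript("myplot.eps", width = 6, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
plot(rnorm(100), type = "l")
dev.off()
```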
Re: [R] Bootstrapped eigenvector
Jérôme, On Sat, 2005-01-29 at 14:14 -0500, Jérôme Lemaître wrote: Hello all, I found in the literature a technique that has been evaluated as one of the more robust to assess statistically the significance of the loadings in a PCA: bootstrapping the eigenvector (Jackson, Ecology 1993, 74: 2204-2214; Peres-Neto et al. 2003. Ecology 84: 2347-2363). However, I'm not able to transform the following steps into an R program by myself yet. Can someone help me with this? I thank you very much in advance. Here are the steps that I need to perform: 1) Resample 1000 times with replacement entire rows from the original data set (7 variables, 126 rows). 2) Conduct a PCA on each bootstrapped sample. 3) To prevent axis reflection and/or axis reordering in the bootstrap, here are two more steps for each bootstrapped sample: 3a) calculate the correlation matrix between the PCA scores of the original and those of the bootstrapped sample; 3b) examine whether the highest absolute correlation is between the corresponding axes for the original and bootstrapped samples. When it is not the case, reorder the eigenvectors. This means that if the highest correlation is between the first original axis and the second bootstrapped axis, the loadings for the second bootstrapped axis are used to estimate the confidence interval for the original first PC axis. 4) Determine the p value for each loading, obtained as follows: number of loadings <= 0 for loadings that were positive in the original matrix, divided by the number of bootstrap samples (1000), and/or number of loadings >= 0 for loadings that were negative in the original matrix, divided by the number of bootstrap samples (1000). The following function seems to run the analysis like Peres-Neto and others defined:

netoboot <- function(x, permutations = 1000, ...)
{
    pcnull <- princomp(x, ...)
    res <- pcnull$loadings
    out <- matrix(0, nrow = nrow(res), ncol = ncol(res))
    N <- nrow(x)
    for (i in 1:permutations) {
        pc <- princomp(x[sample(N, replace = TRUE), ], ...)
        pred <- predict(pc, newdata = x)
        r <- cor(pcnull$scores, pred)
        k <- apply(abs(r), 2, which.max)
        reve <- sign(diag(r[k, ]))
        sol <- pc$loadings[, k]
        sol <- sweep(sol, 2, reve, "*")
        out <- out + ifelse(res > 0, sol <= 0, sol >= 0)
    }
    out/permutations
}

With typical chemical data, you should pass option cor = TRUE to princomp. Another issue is whether you should use this method at all. Opinions may be divided here, but I'll leave that to the proper Statisticians to comment on. Best wishes, Jari Oksanen -- Jari Oksanen [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Problem installing Hmisc (more info)
On 5 Feb 2005, at 23:59, Prof Brian Ripley wrote: On Sat, 5 Feb 2005, Michael Kubovy wrote: Frank Harrell suggested I re-post with information about the version of R Thanks, that starts to make sense. You appear to have g77 installed, but not in the place the person who prepared the binary install of R has it. Where do you have it installed? ('whereis g77' or 'which g77' should tell you.) Then you need to alter FLIBS in R_HOME/etc/Makeconf to point to it. (You can remove -lfrtbegin from FLIBS: it is not needed: your R_HOME looks to be /Library/Frameworks/R.framework/Versions/2.0.1.) However, I believe there is a more fundamental problem: because libg2c is a static library on current MacOS X, most packages using Fortran cannot be compiled there. That's presumably the case with Hmisc, as the automated package builder is not providing a binary build. (There is supposedly a check directory on CRAN, but it is not there at present.) This must be a problem specific to a certain installation. I regularly build Fortran files into MacOS X binaries. The specification in this machine is: pomme:~ jarioksanen$ uname -a Darwin pomme.local 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov 7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC Power Macintosh powerpc pomme:~ jarioksanen$ locate libg2c /usr/local/lib/libg2c.0.0.0.dylib /usr/local/lib/libg2c.0.dylib /usr/local/lib/libg2c.a /usr/local/lib/libg2c.dylib /usr/local/lib/libg2c.la So there are both static and dynamic libg2c's. I don't know of any package management system for Mac, so I don't know who installed these files, but probably it was g77. Probably I got these from http://hpc.sourceforge.net/, though (I try avoid Fink which is a constant source of trouble). Similarly, the g2c is installed in /usr/local: pomme:~ jarioksanen$ which g77 /usr/local/bin/g77 I got my g77 from a place pointed to in R-MacOS X FAQs. I have often seen problems in MacOS with hardcoded paths which assume certain locations for files. 
Latest problem was that the 'rgl' library assumed 'libpng' to be in a different place than I had it. For instance, Darwin's Fink installs stuff in a unique place called /sw. Perhaps that's the problem? However, ideally MacOS software should work 'just anywhere' (and 'just work') like they say in the ads. cheers, jari oksanen -- Jari Oksanen, Oulu, Finland __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] randomisation
On Wed, 2005-02-09 at 14:27 +0100, Yann Clough wrote: I am working on an ecological problem and dealing with a matrix where rows correspond to samples, and columns correspond to species. The values in the matrix are recorded abundances of the organisms. I want to create a series of randomised datasets where total abundances per sample (rowSums) and per species (colSums) are equal to those in the dataset of my observations. Simple example of the kind of thing I have:

tempmatrix <- matrix(c(1,0,2,10,1,3,5,6,7,1,0,0), nrow=4, ncol=3, byrow=TRUE) # observed data
rowSums(tempmatrix) # individuals per location
colSums(tempmatrix) # individuals per species

Example of a matrix which complies with the two restrictions:

tempmatrix2 <- matrix(c(1,0,2,11,0,3,5,6,7,0,1,0), nrow=4, ncol=3, byrow=TRUE)
rowSums(tempmatrix2)
colSums(tempmatrix2)

hope this is clear As already explained, this may not be possible as a simple permutation. You seem to have something else in mind: moving individuals freely between species instead of permuting the data, which means redistributing the abundances among species instead of permutation. For a traditional permutation, you may have a look at the labdsv package (for ecological applications). This has a function 'rndveg' which attempts to preserve either species occurrence distributions or plot-level species richness. Preserving both may be impossible, but check the function. The 'labdsv' source package is available at CRAN, and Windows and MacOS X binaries through my web page (http://cc.oulu.fi/~jarioksa/softhelp/softalist.html). The Windows binary is not available at CRAN since the package fails R CMD check in Windows (so you shouldn't check the package, just use it). The Mac binary is not available at CRAN since the whole Mac binary package system seems to be dysfunctional (there is nothing after Jan 19, 2005). cheers, jari oksanen __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
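An aside not in the original reply: for pure count data, base R's r2dtable() generates random contingency tables with exactly the requested row and column sums (Patefield's algorithm), which matches the stated constraints; whether its null model (independence, conditional on margins) is ecologically appropriate is a separate question:

```r
## the poster's observed matrix
tempmatrix <- matrix(c(1, 0, 2, 10, 1, 3, 5, 6, 7, 1, 0, 0),
                     nrow = 4, ncol = 3, byrow = TRUE)

## 1000 random count matrices, each with the observed row and column sums
perms <- r2dtable(1000, rowSums(tempmatrix), colSums(tempmatrix))

## every draw preserves both margins exactly
all(sapply(perms, function(m)
    all(rowSums(m) == rowSums(tempmatrix)) &&
    all(colSums(m) == colSums(tempmatrix))))  # TRUE
```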
Re: [R] installing package hier.part on Mac OSX
On Thu, 2005-02-10 at 08:01 +, Prof Brian Ripley wrote: For MacOS we have Binary packages, foo.tgz Source packages, foo.tar.gz Neither are `zip files' (things created by zip, usually with extension .zip). It looks like you have not installed a source package before, and you either do not have the development tools installed or they are not in your path. In this case probably the tools are missing, starting from 'make'. You should install the Development Tools / X-Code which come with the MacOS X installation cd/dvd at least from version 10.3.x. Otherwise you can get the development tools from http://developer.apple.com/. Moreover, in this case you need to get a Fortran compiler, which does not come with MacOS. See the R for Mac OS X FAQ, the section on the Fortran compiler g77 and gcc 3.3. That there is no binary version of a package available usually indicates a problem with it on MacOS X, at least on the autobuilder's version of MacOS. Well, 'hier.part' is younger than the latest entry in the Mac binary packages: there is nothing after Jan 19, 2005. The package builds beautifully into a MacOS X binary. However, it seems to require package 'gtools', which I can't find either on CRAN or in the BioConductor repositories. It seems that this didn't prevent passing the tests to be included at CRAN or producing Windows binaries. Theresa, I can send you a Mac binary if you don't want to go to the trouble of installing X-Code and g77. However, it failed with missing 'gtools' upon loading. cheers, jari oksanen On Wed, 9 Feb 2005, Theresa Talley wrote: Hi- I've been trying to install the hier.part package on my mac (OSX 10.3.7) and it is not working for some reason. I am downloading the package source called: hier.part_1.0.tar.gz. When I try to auto install from the cran site, I get this message: * Installing *source* package 'hier.part' ...
** libs /Library/Frameworks/R.framework/Resources/bin/SHLIB: line 1: make: command not found And when I try to install from the zip file on my computer, I get this message: What precisely did you do here? gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error exit delayed from previous errors Error in file(file, r) : unable to open connection In addition: Warning messages: 1: Installation of package hier.part had non-zero exit status in: install.packages(c(hier.part), lib = /Library/Frameworks/R.framework/Resources/library, 2: tar returned non-zero exit code: 512 in: untar(pkg, tmpDir) 3: cannot open file `hier.part_1.0.tar/DESCRIPTION' I've successfully installed other packages (e.g., vegan, cluster) so am not sure if there is something different about this one or if Im just being dopey. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Failure of update.packages()
On Thu, 2005-02-10 at 13:52 +0100, Peter Dalgaard wrote: I M S White [EMAIL PROTECTED] writes: Can anyone explain why, with the latest version of R (2.0.1) on FC3, installed from R-2.0.1-0.fdr.2.fc3.i386.rpm, update.packages() produces the message /usr/lib/R/bin/Rcmd exec: INSTALL: not found. Indeed /usr/lib/R/bin seems to lack various shell scripts (INSTALL, REMOVE, etc). You need to install the R-devel package too: R-devel-2.0.1-0.fdr.2.fc3.i386.rpm. The big idea is that this will suck in all the required compilers, libraries, and include files via RPM dependencies, but users with limited disk space may be content with the binaries of R + recommended packages. This kind of problem was to be anticipated, wasn't it? The great divide between use-only and devel packages is an rpm packaging standard, but not very useful in this case: it splits a 568K devel chip off a 15.4M chunk of base R. Moreover, you don't have a repository of binary packages for Linux, which means that not many people can use the 568K saving in download times (the saving in disk space is more considerable, of course). So are there plans for binary Linux packages for distros other than Debian, so that people could use the non-devel piece of R only? cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] Failure of update.packages()
On 10 Feb 2005, at 19:26, Peter Dalgaard wrote: [EMAIL PROTECTED] writes: Quoting Jari Oksanen [EMAIL PROTECTED]: On Thu, 2005-02-10 at 13:52 +0100, Peter Dalgaard wrote: I M S White [EMAIL PROTECTED] writes: Can anyone explain why, with the latest version of R (2.0.1) on FC3, installed from R-2.0.1-0.fdr.2.fc3.i386.rpm, update.packages() produces the message /usr/lib/R/bin/Rcmd exec: INSTALL: not found. Indeed /usr/lib/R/bin seems to lack various shell scripts (INSTALL, REMOVE, etc). You need to install the R-devel package too: R-devel-2.0.1-0.fdr.2.fc3.i386.rpm. The big idea is that this will suck in all the required compilers, libraries, and include files via RPM dependencies, but users with limited disk space may be content with the binaries of R + recommended packages. This kind of problem was to be anticipated, wasn't it? The great divide between use-only and devel packages is an rpm packaging standard, but not very useful in this case: it splits a 568K devel chip off a 15.4M chunk of base R. Moreover, you don't have a repository of binary packages for Linux, which means that not many people can use the 568K saving in download times (the saving in disk space is more considerable, of course). So are there plans for binary Linux packages for distros other than Debian, so that people could use the non-devel piece of R only? cheers, jari oksanen The splitting is an experiment (and I said so when I announced it). It does have unforeseen consequences, like implicating me in maintaining a repository of binary RPMs for CRAN packages, which I'm not particularly keen on. So I shall probably revert to a single RPM, and force the installation requirements to be the same as the build requirements. This was, in fact, Peter's suggestion, which shows that not everybody is as short-sighted as me. Martyn Hmm... Actually, you had sort of convinced me that the split might be a good idea.
Point being of course that it's not the 568K that gets shaved off in R-devel, it's the 12M for gcc + the 5M for g77 + 28M for perl + more, which are only needed for installing packages and are therefore not dependencies of the main R RPM. Maintaining binary package RPMs was never in the cards as I saw it. However, it then only makes sense if a sizable proportion of R users are never going to install packages. Otherwise you get the cost of having to explain the point repeatedly, at basically zero benefit. That's a good point. You could look at the MacOS X standard installation to see what can be left out of a working installation. In a default Mac, you don't have gcc (12M), nor g77, but you sure need perl for a sensible working machine, and that's in the default MacOS X installation. The price is that you need a possibility to install binary R packages. So not so much saving, but a bit more than what you get by shaving off R-devel. cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
Re: [R] R + MacOSX + Emacs(XEmacs) + ESS
On Tue, 2005-02-15 at 10:34 -0200, Ronaldo Reis-Jr. wrote: Hi, I try to use Emacs or XEmacs with R on a MacOS X Panther without X11. Can anybody make this work? Did you try googling for MacOS X Emacs? That's the way to get it. I have found two different versions, and both work graphically without X11. ESS installs quite smoothly. Depending on your configuration, you may have to use ESC for Meta instead of the Alt of some other systems, so start R in ESS using ESC-R. (The Emacs that comes with MacOS X also is GNU Emacs, but works only within a terminal window.) cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] eigen vector question
On Fri, 2005-02-18 at 11:26 +0100, Uwe Ligges wrote: Jessica Higgs wrote: Sorry to bother everyone, but I've looked in all of the help files and manuals I have and I can't find the answer to this question. I'm doing principal component analysis by calculating the eigenvectors of a correlation matrix that I have, composed of 21 parameters. I have the eigenvectors and their values that R produced for me, but I'm not sure how to tell which eigenvector/value corresponds to which parameter, because when R produces eigenvectors it does so in decreasing order of significance, meaning that the eigenvector that explains the most of the variance is listed first, followed by the next eigenvector, etc. Any help would be appreciated. Feel free to write back if you need more information on my problem. Thanks! Have you considered using princomp()? It is really weird that people always recommend using princomp, although it is numerically inferior to prcomp and fails with rank-deficient data. The natural solution would be to define functions:

loadings <- function(x) UseMethod("loadings")
loadings.princomp <- function(x) x$loadings
loadings.prcomp <- function(x) structure(x$rotation, class = "loadings")

cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
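Repeating those definitions so the sketch is self-contained, this is how they would be used; USArrests is just a stand-in data set for the example:

```r
loadings <- function(x) UseMethod("loadings")
loadings.princomp <- function(x) x$loadings
loadings.prcomp <- function(x) structure(x$rotation, class = "loadings")

## the rows of the loadings matrix are labelled by the original variables,
## which answers the which-row-is-which-parameter question
pc <- prcomp(USArrests, scale. = TRUE)
loadings(pc)
```

Note that this masks stats::loadings in the session; printing still uses the standard method for class "loadings".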
Re: [R] How to set up number of prin comp.
On Fri, 2005-02-25 at 20:29 +0800, [EMAIL PROTECTED] wrote: Hi Bjørn-Helge, Thanks for your help. In my case, there are more variables in the matrix than units, so I have to use prcomp with covariance to do PCA. The problem I am facing is how to get the first 8 coefficients and scores, and how to write the result into a text file. Thanks again. When I change princomp to prcomp below, I get NULL for pc$scores and pc$loadings.

X <- ...   # some matrix
pc <- prcomp(X)
pc$scores[, 1:4]     # The four first score vectors
pc$loadings[, 1:4]   # The four first loadings

The three most useful commands are help(), str() and names(). The first tells you how to use prcomp() and how it names its results: try help(prcomp). The second peeks into the result so that you see what is in there: try (with your result) str(pc). The third tells you what names are available in your result. The first (help) is the most useful of these commands, since it tells you what these names and items are. If you read it, you should say:

pc$x[, 1:4]          # The four first score vectors
pc$rotation[, 1:4]   # The four first loadings

Also, loadings(pc) should work with prcomp. I think I'll write functions as.prcomp.princomp and as.princomp.prcomp someday. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
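The other half of the question, writing the first eight scores and loadings to a text file, was not answered above. A minimal sketch (made-up data and file names; write.table is one option among several):

```r
set.seed(1)
X <- matrix(rnorm(10 * 20), nrow = 10)   # stand-in data: more variables than units
pc <- prcomp(X)

## first 8 score vectors and loadings, written as plain text tables
write.table(round(pc$x[, 1:8], 4), file = "scores.txt", quote = FALSE)
write.table(round(pc$rotation[, 1:8], 4), file = "loadings.txt", quote = FALSE)
```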
Re: [R] Reconstructing Datasets
On Tue, 2005-03-01 at 20:30 +, Laura Quinn wrote: Hi, Is it possible to recreate smoothed data sets in R, by performing a PCA and then reconstructing a data set from say the first 2/3 EOFs? I've had a look in the help pages and don't seem to find anything relevant. It's not in the R help, but in the books about PCA in the help references. This can be done, though not quite directly. Most of the hassle comes from the centring and, I guess in your case, from the scaling of the results. I guess it is best to first scale the results like PCA would do, then make the low-rank approximation, and then de-scale:

x <- scale(x, scale = TRUE)
pc <- prcomp(x)

The full rank will be:

xfull <- pc$x %*% pc$rotation

The eigenvalues are already incorporated in pc$x, and you don't have to care about them. Then the rank=3 approximation will be:

x3 <- pc$x[,1:3] %*% pc$rotation[,1:3]

Then you have to de-scale:

x3 <- sweep(x3, 2, attr(x, "scaled:scale", "*")
x3 <- sweep(x3, 2, attr(x, "scaled:center", "+")

And here you are. I wouldn't call this smoothing, though. Library 'vegan' can do this automatically for PCA run with function 'rda', but there the scaling of raw results is non-conventional (through biplot). cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] Reconstructing Datasets
On Wed, 2005-03-02 at 08:30 +0200, Jari Oksanen wrote: On Tue, 2005-03-01 at 20:30 +, Laura Quinn wrote: Hi, Is it possible to recreate smoothed data sets in R, by performing a PCA and then reconstructing a data set from say the first 2/3 EOFs? I've had a look in the help pages and don't seem to find anything relevant. It's not in the R help, but in the books about PCA in the help references. This can be done, though not quite directly. Most of the hassle comes from the centring and, I guess in your case, from the scaling of the results. I guess it is best to first scale the results like PCA would do, then make the low-rank approximation, and then de-scale:

x <- scale(x, scale = TRUE)
pc <- prcomp(x)

The full rank will be:

xfull <- pc$x %*% pc$rotation

Naturally, I forgot the transposition:

xfull <- pc$x %*% t(pc$rotation)

and the check:

range(x - xfull)

which should be something of magnitude 1e-12 or better (6e-15 in the test I ran). The eigenvalues are already incorporated in pc$x, and you don't have to care about them. Then the rank=3 approximation will be:

x3 <- pc$x[,1:3] %*% pc$rotation[,1:3]

and the same here:

x3 <- pc$x[,1:3] %*% t(pc$rotation[,1:3])

The moral: cut-and-paste. Then you have to de-scale:

x3 <- sweep(x3, 2, attr(x, "scaled:scale", "*")
x3 <- sweep(x3, 2, attr(x, "scaled:center", "+")

And here you need to fix the parentheses:

x3 <- sweep(x3, 2, attr(x, "scaled:scale"), "*")
x3 <- sweep(x3, 2, attr(x, "scaled:center"), "+")

The moral #1: cut-and-paste. And #2: drink coffee in the morning. And here you are. I wouldn't call this smoothing, though. Library 'vegan' can do this automatically for PCA run with function 'rda', but there the scaling of raw results is non-conventional (through biplot). cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
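Collecting the corrected pieces into one self-contained sketch, with a random matrix standing in for the real data:

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)   # stand-in data, 20 rows x 10 columns

xs <- scale(x, scale = TRUE)         # centre and scale, keeping the attributes
pc <- prcomp(xs)

## rank-3 approximation in the scaled space
x3 <- pc$x[, 1:3] %*% t(pc$rotation[, 1:3])

## de-scale back to the original units
x3 <- sweep(x3, 2, attr(xs, "scaled:scale"), "*")
x3 <- sweep(x3, 2, attr(xs, "scaled:center"), "+")

## sanity check: the full-rank reconstruction recovers x exactly
xfull <- pc$x %*% t(pc$rotation)
xfull <- sweep(xfull, 2, attr(xs, "scaled:scale"), "*")
xfull <- sweep(xfull, 2, attr(xs, "scaled:center"), "+")
range(x - xfull)   # differences at rounding-error magnitude
```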
Re: [R] Multidimensional Scaling (MDS) in R
This nmds seems to be the wrapper function in the labdsv package. Please check the documentation in that package. If I remember correctly, labdsv is geared for cases with a large number of points, and then you don't want to get labels, because they would be too congested to be seen anyway. The recommended procedure is to identify interesting points using the 'plotid' function in labdsv. Function nmds is a very simple wrapper: it uses isoMDS in the MASS package, and adds a class and some class methods. You may use isoMDS directly instead:

dis <- dsvdis(x)   # Assuming you use labdsv
ord <- isoMDS(dis)
plot(ord$points, asp = 1, type = "n")
text(ord$points, labels = rownames(ord$points))

The posting guide tells you to address package-specific questions to the package author directly. In this case, the package author does not read R-News. cheers, jari oksanen On 8 Mar 2005, at 19:43, Isaac Waisberg wrote: Hi; I am working with the similarity matrix below and I would like to plot a two-dimensional MDS solution such that each point in the plot has a label.
This is what I did:

data <- read.table('c:/multivariate/mds/colour.txt', header = FALSE)
similarity <- as.dist(data)
distance <- 1 - similarity
result.nmds <- nmds(distance)
plot(result.nmds)

(nmds and plot.nmds as defined at labdsv.nr.usu.edu/splus_R/lab8/lab8.html; nmds simply calls isoMDS) Colour.txt, containing the similarity matrix, reads as follows:

1.0 .86 .42 .42 .18 .06 .07 .04 .02 .07 .09 .12 .13 .16
.86 1.0 .50 .44 .22 .09 .07 .07 .02 .04 .07 .11 .13 .14
.42 .50 1.0 .81 .47 .17 .10 .08 .02 .01 .02 .01 .05 .03
.42 .44 .81 1.0 .54 .25 .10 .09 .02 .01 .01 .01 .02 .04
.18 .22 .47 .54 1.0 .61 .31 .26 .07 .02 .02 .01 .02 .01
.06 .09 .17 .25 .61 1.0 .62 .45 .14 .08 .02 .02 .02 .01
.07 .07 .10 .10 .31 .62 1.0 .73 .22 .14 .05 .02 .02 .01
.04 .07 .08 .09 .26 .45 .73 1.0 .33 .19 .04 .03 .02 .02
.02 .02 .02 .02 .07 .14 .22 .33 1.0 .58 .37 .27 .20 .23
.07 .04 .01 .01 .02 .08 .14 .19 .58 1.0 .74 .50 .41 .28
.09 .07 .02 .01 .02 .02 .05 .04 .37 .74 1.0 .76 .62 .55
.12 .11 .01 .01 .01 .02 .02 .03 .27 .50 .76 1.0 .85 .68
.13 .13 .05 .02 .02 .02 .02 .02 .20 .41 .62 .85 1.0 .76
.16 .14 .03 .04 .01 .01 .01 .02 .23 .28 .55 .68 .76 1.0

The first row corresponds to colour 1 (C1), the second to colour 2 (C2), and so on. First, I'm not sure if this is correct or not. Second, obviously the points in the plot are not labeled. I suppose I must add a labels column and then print the labels together with the results. But how should I do it? Many thanks, Isaac -- Jari Oksanen, Oulu, Finland
Re: [R] Significance of Principal Coordinates
On Mon, 2005-03-14 at 18:32 +0100, Christian Kamenik wrote: Dear all, I was looking for methods in R that allow assessing the number of significant principal coordinates. Unfortunately I was not very successful. I expanded my search to the web and Current Contents; however, the information I found is very limited. Therefore, I tried to write code for doing a randomization. I would highly appreciate it if somebody could comment on the following approach. I am neither a statistician nor an R expert... The data matrix I used has 72 species (columns) and 167 samples (rows). Earlier this year (Sat, 29 Jan 2005) Jérôme Lemaître asked something similar here under the subject Bootstrapped eigenvector (but the code I posted then had one bug I know of, and perhaps some I don't!). Some ecologists (Donald Jackson, Peres-Neto) have indeed tried to develop such methods for PCA, and they could easily be modified for PCoA, which is about the same method, in particular with Euclidean distances like you used. So the following two solutions are practically identical (within 2e-15 in the case I tried):

x <- decostand(x, "norm")   # in vegan
chordis <- dist(x)          # Euclidean is the default, so this is chord distance
pcoa <- cmdscale(chordis)
pca <- prcomp(x)

Verify this with:

procrustes(pcoa, pca, choices = 1:2)   # in vegan

PCoA with row weights is something different, but I really don't know why you would like to do that. I really don't understand what people mean by significant eigenvalues, unless they are doing Factor Analysis. In PCA, you rotate your data, and you can find low-rank approximations of your data, but how these rotations could be significant is beyond my imagination. Further, resampling with replacement seems to suit multivariate analysis poorly: it duplicates some rows, and so makes it easier to find similar rows, which is the ultimate task in PC rotation.
It seems that Monte Carlo results are systematically better than any original data (only when the number of rows is much lower than the number of columns is this not disturbing). Also, resampling or shuffling species tends to create communities that are fundamentally different from any real community we have: instead of a single or a few abundant species, they may have several or none. With a total abundance constraint you can hide the traces of anarchistic community assembly, but not its fundamental fault. So I do think that (1) you cannot use resampling in assessing PCA and its kin, (2) you cannot say what being significant means in this case, and (3) the number of significant axes would only be a function of sample size even here. Now my hope is that some guru over there gets so irritated that (s)he chastises me for writing such pieces of stupidity, and sends a correct solution here with accompanying code and references to the literature. Let's hope so. The old truth is that most data sets have 2.5 dimensions (Kruskal): the two that you can show in a printed plot, and that half a dimension that you must explain away in the text. Wouldn't that be a sufficient solution? cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] How to do such MDS in R
On 21 Mar 2005, at 13:29, ronggui wrote: I know cmdscale and isoMDS in R can do classical and non-metric MDS, but I want to know if there are packages that can carry out individual differences scaling and multidimensional analysis of preference. Both methods are important, but I cannot find any clue on how to do them using R. Can anyone help? Thank you! It may be that individual differences scaling is not available in R. The classic piece of software for this purpose is SINDSCAL. It is beautiful Fortran (although this sounds like a contradiction in terms), and it would be easy to port the software to R, but I think the license does not allow this. The hardest bit would be to adapt the output to R. I suggest you dig up SINDSCAL somewhere -- it could be in netlib -- and compile it yourself. GNU g77 is quite OK. cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
Re: [R] Principle Component Analysis in R
On Tue, 2005-04-05 at 16:59 +1200, Brett Stansfield wrote: Dear R, Should I be concerned if the loadings of a Principle Component Analysis are as follows:

Loadings:
      Comp.1 Comp.2 Comp.3 Comp.4
X100m -0.500  0.558         0.661
X200m -0.508  0.379  0.362 -0.683
X400m -0.505 -0.274 -0.794 -0.197
X800m -0.486 -0.686  0.486  0.239

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

I just got concerned that no loading value was given for X100m, component 3. I have looked at the data using list() and it all seems OK. You don't have to worry about one empty cell in the loadings: the print function (called behind the curtain to show the results to you) is so clever that it doesn't show you small numbers, although they are there. I guess this happens because people with a Factor Analysis background expect this. However, I would be worried if I got results like this, and would not use Princip*al* Components at all, since none of the components seems to be any more principal than the others. Wouldn't the original data do? cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
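To convince yourself that the suppressed small loading really is there, the print method for class "loadings" takes a cutoff argument (0.1 by default); USArrests is just a stand-in example data set:

```r
pc <- princomp(USArrests, cor = TRUE)   # any princomp result will do
print(loadings(pc), cutoff = 0)         # cutoff = 0 prints every loading, however small
```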
Re: [R] two methods for regression, two different results
On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote: Please forgive a straight stats question, and the informal notation. Let us say we wish to perform a linear regression: y = b0 + b1*x + b2*z. There are two ways this can be done: the usual way, as a single regression, fit1 <- lm(y ~ x + z), or by doing two regressions. In the first regression we could have y as the dependent variable and x as the independent variable, fit2 <- lm(y ~ x). The second regression would be a regression in which the residuals from the first regression would be the dependent variable, and the independent variable would be z: fit3 <- lm(fit2$residuals ~ z). I would think the two methods would give the same p value and the same beta coefficient for z. They don't. Can someone help me understand why the two methods do not give the same results? Additionally, could someone tell me when one method might be better than the other, i.e. what question does the first method answer, and what question does the second method answer? I have searched a number of textbooks and have not found this question addressed. John, Bill Venables already told you that they don't do that, because x and z are not orthogonal. Here is a simpler way of getting the same result as he suggested for the coefficient of z (but only for z):

x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)
mod <- lm(y ~ x + z)
mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)

summary(mod)

Call:
lm(formula = y ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.96436    0.06070  48.836  < 2e-16 ***
x            0.96272    0.11576   8.317 5.67e-13 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

summary(mod2)

Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731    0.06070  -2.592   0.0110 *
x           -0.84459    0.11576  -7.296 8.13e-11 ***
z            1.08922    0.06711  16.229  < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom

You can omit x from the outer lm only if x and z are orthogonal, although you already removed the effect of x... In the orthogonal case the coefficient for x would be 0. The residuals are equal in these two models:

range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17

But, of course, the fitted values are not equal, since you fit mod2 to the residuals after removing the effect of x... cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
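To see the orthogonality point in action, a small sketch with a deliberately orthogonal, centred design (made-up data): the two-stage fit then reproduces the coefficient of z exactly.

```r
set.seed(42)
x <- rep(c(-1, 1), 50)          # orthogonal to z by construction
z <- rep(c(-1, 1), each = 50)
sum(x * z)                      # 0: x and z are orthogonal (and both centred)
y <- 3 + x + z + rnorm(100, sd = 0.3)

coef(lm(y ~ x + z))["z"]
coef(lm(residuals(lm(y ~ x)) ~ z))["z"]   # identical to the line above
```

The two coefficients agree because the fitted values of lm(y ~ x) lie in the span of the intercept and x, which is orthogonal to z, so regressing the residuals on z recovers the same slope.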
Re: [R] weird results w/ prcomp-princomp
On Fri, 2005-04-08 at 11:12 +0200, Alessandro Bigi wrote: I am doing a Principal Component Analysis (PCA) on a 44x19 matrix with princomp(x, cor=TRUE, scores=TRUE) and prcomp(x, scale=TRUE, center=TRUE). The resulting eigenv. and rotated matrix are the same (as expected); however, the sum of eigenvalues is lower than 19 (the number of variables). What about the sum of the squared sdev? (Hint: the prcomp help page says that the returned sdev are the square roots of the eigenvalues. While the princomp help does not say this explicitly, it says that sdev are standard deviations.) cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061 email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
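Spelling the hint out with a random stand-in matrix of the same shape (44 rows, 19 columns): the sdev values are standard deviations, and only their squares are the eigenvalues, which sum to the number of variables in a scaled, full-rank PCA.

```r
set.seed(7)
x <- matrix(rnorm(44 * 19), nrow = 44)   # stand-in for the 44x19 data
pc <- prcomp(x, scale. = TRUE)

sum(pc$sdev)     # smaller than 19: these are standard deviations
sum(pc$sdev^2)   # 19: squared, they are the eigenvalues of the correlation matrix
```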
Re: [R] Error message with nmds
On Tue, 2006-05-16 at 13:25 -0700, Jonathan Hughes wrote: I am trying to apply nmds to a data matrix but I receive the following error message: Error in isoMDS(dis, y = y, k = k, maxit = maxit) : zero or negative distance between objects 5 and 7. The data are in a vegetation cover-class matrix (species in columns, plots in rows, classes 1-8 with lots of zero values) converted to a dissimilarity matrix (Bray-Curtis). I assumed that objects 5 and 7 refer to rows of my original data, and they do have the same species with the same cover classes. I deleted one of these rows, but I received the same error message on a rerun of nmds. As it turns out, the new rows 5 and 7 are the same. How do I avoid this problem? Jonathan, this is a FAQ in the proper sense of the word: it is frequently asked. The last thread was in April 2006; see https://stat.ethz.ch/pipermail/r-help/2006-April/092598.html and its answers. You may also use RSiteSearch with the keyword isoMDS to find other (and older) threads. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
Re: [R] 2 Courses Near You - (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor, (2) R/Splus Fundamentals and Programming Techniques
On Tue, 2006-06-13 at 21:34 +0200, Uwe Ligges wrote: ... and again I wonder which courses are near. This leads at once to the question: which metric is in use? Possibly this:

### Great Circle distances
### Use a different sign for N and S, and for E and W
### (does not matter which sign)
### Lat and long must be in degrees + decimals (sorry)
globedis <- function(lat0, lon0, lat1, lon1, km = TRUE)
{
    phi0 <- pi/180 * lat0
    phi1 <- pi/180 * lat1
    lambda <- pi/180 * (lon0 - lon1)
    delta <- sin(phi0) * sin(phi1) + cos(phi0) * cos(phi1) * cos(lambda)
    delta <- acos(delta)
    dist <- 60 * 180/pi * delta
    dist <- dist %% 10800
    if (km)
        dist <- 1.852 * dist
    dist
}

Which says that Boston is nearest to my office (6100 km). The other alternatives are Baltimore 6720 km, Chicago 6800 km, Raleigh 7050 km and San Francisco 8240 km. In the more practical metric of flight time, Baltimore is closest (OUL - BWI 12h55min), but Boston and Chicago are not much further away (OUL - BOS 14h00min, OUL - CHI 14h15min). cheers, jari oksanen Probably some football-related metric: the FIFA WM takes place in Dortmund and commercials say something like the world is our guest ... Now, let's escape from football to Austria and Vienna's useR!2006 conference! Uwe Ligges [EMAIL PROTECTED] wrote: XLSolutions Corporation (www.xlsolutions-corp.com) is proud to announce: (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor *** San Francisco / July 17-18, 2006 *** *** Chicago / July 24-25, 2006 *** *** Baltimore / July 27-28, 2006 *** *** Raleigh / July 17-18, 2006 *** *** Boston / July 27-28, 2006 *** http://www.xlsolutions-corp.com/RSmicro (2) R/Splus Fundamentals and Programming Techniques *** San Francisco / July 10-11, 2006 *** *** Houston / July 13-14, 2006 *** *** San Diego / July 17-18, 2006 *** *** Chicago / July 20-21, 2006 *** *** New York City / July 24-25, 2006 *** *** Boston / July 27-28, 2006 *** http://www.xlsolutions-corp.com/Rfund.htm Ask for group discount and reserve your seat Now - Earlybird Rates.
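For example, the Oulu-Boston figure can be checked with rough coordinates for Oulu (about 65.0 N, 25.5 E) and Boston (about 42.4 N, 71.1 W). The function is repeated so the snippet is self-contained, and the coordinates are approximate:

```r
## the globedis() function from the message above, repeated for self-containment
globedis <- function(lat0, lon0, lat1, lon1, km = TRUE) {
    phi0 <- pi/180 * lat0
    phi1 <- pi/180 * lat1
    lambda <- pi/180 * (lon0 - lon1)
    delta <- acos(sin(phi0) * sin(phi1) + cos(phi0) * cos(phi1) * cos(lambda))
    dist <- (60 * 180/pi * delta) %% 10800   # great-circle distance in nautical miles
    if (km) dist <- 1.852 * dist
    dist
}

## W longitude gets the opposite sign of E, as the comments require
globedis(65.0, 25.5, 42.4, -71.1)   # about 6100 km
```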
Payment due after the class! Email Sue Turner: [EMAIL PROTECTED] Interested in our Advanced Programming class? (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor Course Outline: - R/S System: Overview; Installation and Demonstration - Data Manipulation and Management - Graphics; Enhancing Plots, Trellis - Writing Functions - Connecting to External Software - R/S Packages and Libraries (e.g. BioConductor) - BioConductor: Overview; Installation and Demonstration - Array Quality Inspection - Correction and Normalization; Affymetrix and cDNA arrays - Identification of Differentially Expressed Genes - Visualization of Genomic Information - Clustering Methods in R/Splus - Gene Ontology (GO) and Pathway Analysis - Inference, Strategies for Large Data (2) R/Splus Fundamentals and Programming Techniques Course outline. - An Overview of R and S - Data Manipulation and Graphics - Using Lattice Graphics - A Comparison of R and S-Plus - How can R Complement SAS? - Writing Functions - Avoiding Loops - Vectorization - Statistical Modeling - Project Management - Techniques for Effective use of R and S - Enhancing Plots - Using High-level Plotting Functions - Building and Distributing Packages (libraries) - Connecting; ODBC, Rweb, Orca via sockets and via Rjava Email us for group discounts. Email Sue Turner: [EMAIL PROTECTED] Phone: 206-686-1578 Visit us: www.xlsolutions-corp.com/training.htm Please let us know if you and your colleagues are interested in this class to take advantage of group discount. Register now to secure your seat! Cheers, Elvis Miller, PhD Manager Training. XLSolutions Corporation 206 686 1578 www.xlsolutions-corp.com [EMAIL PROTECTED] 2 Courses - (1) Introduction to R/S+ programming: Microarrays Analysis and Bioconductor (2) R/Splus Fundamentals and Programming Techniques Interest in our R/Splus Advanced Programming? Email us for upcoming courses. 
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Jari Oksanen -- Dept Biology, Univ Oulu
Re: [R] MDS with missing data?
Dear Context Grey, On 15 Jun 2006, at 6:42, context grey wrote: I will be applying MDS (actually Isomap) to make a psychological concept map of the similarities between N concepts. So actually, how do you do isomap? RSiteSearch gave me one hit for isomap. I only ask because I've implemented a working version of isomap (not ready for prime time yet, but a proof that it works). If isomap is already available in R, I won't do anything more with the function. I don't understand the rest of the question, but isomap really may be able to work with NA dissimilarities: just replace them with shortest-path distances via the non-missing dissimilarities. In fact, you need only some ('k') non-missing dissimilarities per item, since that is how isomap works. Your dissimilarity structure may become disconnected, of course, but that's common in isomap. If you mean that your raw data have NA, then you may select a dissimilarity function that can handle NA input and produce finite dissimilarities (I think daisy in the cluster package does this). Somehow I feel I answered quite a different question from the one you asked. Sorry. I would like to scale to a large number of concepts; however, the resulting N*(N-1) pairwise similarities are prohibitive for a user survey. I'm thinking of giving people random subsets of the pairwise similarities. Does anyone have recommendations for this situation? My current thoughts are to either 1) use nonmetric/gradient-descent MDS, which seems to allow missing data, or Not the isoMDS function in MASS. And if N(N-1) is a problem, then nonmetric MDS may not be the solution. 2) devise some scheme whereby the data that are ranked in common by several people are used to derive a scaling factor for each person's ratings. Thanks for any advice, Cheers, Green Power -- Green Power, Oulu, Finland
Re: [R] MDS with missing data?
On Thu, 2006-06-15 at 07:13 +0300, Jari Oksanen wrote: 1) use nonmetric/gradient-descent MDS, which seems to allow missing data, or Not the isoMDS function in MASS. And if N(N-1) is a problem, then nonmetric MDS may not be the solution. Sorry for the wrong information: isoMDS does handle NA. I remembered the old times when I last looked at the issue, but isoMDS has changed since. Fine work! cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
Re: [R] Ordination of feature film data question
On Mon, 2006-03-13 at 07:50, Prof Brian Ripley wrote:
> 'Ordination' is ecologists' terminology for multidimensional scaling. You will find worked examples in MASS (the book, see the R FAQ), and the two most commonly used functions, isoMDS and sammon, in MASS the package.

'Ordination' in ecologists' terminology also covers principal components analysis and variants of correspondence analysis. Actually, when an ecologist speaks about 'ordination', she most often means correspondence analysis, which also sounds like a natural (though perhaps not the best) choice for co-occurrence data in movies. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] transparent background for PDF
On 24 Mar 2006, at 20:30, Dennis Fisher wrote:
> Colleagues, Running R 2.2.1 on either a Linux (RedHat 9) or Mac (10.4) platform. I created a PDF document using pdf("FILENAME.pdf", bg="transparent", version="1.4"). I then imported the graphic into PowerPoint - background was set to a non-transparent color. I was hoping that the inserted graphic would be transparent - instead, it had a white background.

In my experience, this is a feature of PowerPoint, which seems to be incapable of displaying a transparent background in PDF. This also concerns transparent-background PDFs from programmes other than R. This experience is from pdf() on Linux and Mac, and PowerPoint on the Mac (never tried PowerPoint on Linux...). cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
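For reference, the device call under discussion in runnable form (the file name is only an example; PDF version 1.4 is the first to support transparency):

```r
# Write a plot to a PDF with a transparent background.
pdf("transparent-bg.pdf", bg = "transparent", version = "1.4")
plot(1:10)
dev.off()
```

Whether the transparency survives depends on the program that renders the PDF, which is the point of the reply above.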
Re: [R] isoMDS and 0 distances
On Tue, 2006-04-18 at 22:06 -0400, Tyler Smith wrote:
> I'm trying to do a non-metric multidimensional scaling using isoMDS. However, I have some '0' distances in my data, and I'm not sure how to deal with them. I'd rather not drop rows from the original data, as I am comparing several datasets (morphology and molecular data) for the same individuals, and it's interesting to see how much morphological variation can be associated with an identical genotype. I've tried replacing the 0's with NA, but isoMDS appears to stop on the first iteration and the stress does not improve:
>
> distA  # a dist object with 13695 elements, 4 of which == 0
> cmdsA <- cmdscale(distA, k=2)
> distB <- distA
> distB[which(distB==0)] <- NA
> isoA <- isoMDS(distB, cmdsA)
> initial  value 21.835691
> final  value 21.835691
> converged
>
> The other approach I've tried is replacing the 0's with small numbers. In this case isoMDS does reduce the stress values.
>
> min(distA[which(distA > 0)])
> [1] 0.02325581
> distC <- distA
> distC[which(distC==0)] <- 0.001
> isoC <- isoMDS(distC)
> initial  value 21.682854
> iter   5 value 16.862093
> iter  10 value 16.451800
> final  value 16.339224
> converged
>
> So my questions are: what am I doing wrong in the first example? Why does isoMDS converge without doing anything? Is replacing the 0's with small numbers an appropriate alternative?

Tyler, My experience is that isoMDS *may* fail to move away from the starting configuration if there are identical points in the initial configuration, and this will happen if you use cmdscale() to get the initial configuration. You *may* get over this by shifting the duplicates a bit:

con <- cmdscale(dis)
dups <- duplicated(con)
sum(dups)
[1] 2
con[dups, ] <- con[dups, ] + runif(2*sum(dups), -0.01, 0.01)

Then isoMDS may go further. Another issue is that, at a quick look, isoMDS() seems to do nothing sensible with missing values, although it accepts them.
The only thing is that they are ordered last, i.e. regarded as very long distances (in your case they should rather be regarded as very short distances). The key lines in isoMDS are:

ord <- order(dis)
nd <- sum(!is.na(ord))

Even when 'dis' has missing values, the result of order() ('ord') has no missing values, but with the default argument na.last=TRUE they are put last in the list. An obvious-looking change would be to replace the second line with:

nd <- sum(!is.na(dis))

but this dumps core in R, at least on my machine: probably you also need the full length of the vectors in addition to the number of non-missing entries. (This quick look was based on the latest release version of MASS/VR: there may be a newer version coming with the upcoming R release, but that's not released yet.) You may check whether working with NA works: are duplicate points identical in the results? Then about replacing zero distances with a tiny number: this has been discussed before on this list, and Ripley said "no, no!". I do it all the time, but only in secrecy. A suggested solution was to drop the duplicates, but then there still is a weighting issue, and isoMDS has no weights argument. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
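The two pragmatic fixes above (the sinful tiny-number replacement, plus jittering duplicated start coordinates) can be run end to end on simulated data; the 0.001 replacement value and all data here are arbitrary illustrations:

```r
library(MASS)  # isoMDS

set.seed(1)
x <- matrix(runif(40), 10, 4)
x[2, ] <- x[1, ]                 # one duplicated row -> one zero distance
d <- dist(x)
d[d == 0] <- 0.001               # the tiny-number replacement

con <- cmdscale(d, k = 2)        # initial configuration
dups <- duplicated(con)          # jitter exact duplicates, if any remain
if (any(dups))
  con[dups, ] <- con[dups, ] + runif(2 * sum(dups), -0.01, 0.01)

fit <- isoMDS(d, con, trace = FALSE)
```

With the zero replaced before cmdscale(), the start configuration usually has no exact duplicates left, so the jitter step is a belt-and-braces guard.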
Re: [R] isoMDS and 0 distances
On Wed, 2006-04-19 at 07:46 +0100, Prof Brian Ripley wrote:
> Short answer: you cannot compare distances including NAs, so there is no way to find a monotone mapping of distances.

The original Kruskal-Young-Shepard-Torgerson programme KYST (version 1 from 1973) could handle missing values. Unfortunately I've lost the documents, but if I remember correctly, the argument was that you only need a subset of the (dis)similarities (representative for the points) to get a monotone regression. KYST -- and computers of that time (I used a Burroughs!) -- had limitations on data size, and removing some of the dissimilarities was a way of getting more than 64 data points into the analysis. However, better not go into details, since:

C     THIS INFORMATION IS PROPRIETARY AND IS THE
C     PROPERTY OF BELL TELEPHONE LABORATORIES,
C     INCORPORATED.  ITS REPRODUCTION OR DISCLOSURE
C     TO OTHERS, EITHER ORALLY OR IN WRITING, IS
C     PROHIBITED WITHOUT WRITTEN PRERMISSION OF
C     BELL LABORATORIES.
CKYST-2A  AUGUST, 1977

cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
Re: [R] environmental data as vector in PCA plots
On 10 May 2004, at 17:15, Heike Schmitt wrote:
> I want to include a vector representing the sites - environmental data correlation in a PCA. I currently use prcomp (no scaling) to perform the PCA, and envfit to retrieve the coordinates of the environmental data vector. However, the vector length is different from the one obtained in Canoco when performing a species - environment biplot (scaling -2). How can I scale the vector in order to be in accordance with Canoco, or which other scaling options are there?

Canoco scaling abs(2) does not scale sites, but makes the sum of squares of the site scores = 1 for all axes. In contrast, prcomp scales site axes by the eigenvalue, as Canoco does with scaling abs(1). Therefore you cannot get similar results as in Canoco. A simple solution that *may* (or may not) work is to transpose your data: instead of prcomp(x), try prcomp(t(scale(x, scale=FALSE)), center=FALSE). This does the centring on the columns of x (as it should be done), then transposes your data and runs prcomp without new centring -- which was already done for the columns (I didn't test this, but this is the way it was done in the olden times). Another alternative is to use the function rda in the same package where you found envfit (vegan), since it is not unlike Canoco in its scaling. However, it won't give you the negative scalings of PCA (RDA without constraints), since its author (that's me) thinks that you shouldn't use the negative scalings of Canoco in RDA/PCA. The package ships with a pdf document which discusses PCA scaling in prcomp, princomp, rda (of vegan) and Canoco (of Cajo ter Braak), and even hints how to get the minus scalings that the author doesn't approve of. cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
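The transpose trick, run on simulated data. The check below relies on the fact that the rotation matrix of the transposed problem holds the site scores, and those columns have unit sum of squares on every axis, which is the abs(2)-style site scaling described above:

```r
set.seed(42)
x <- matrix(rnorm(50), 10, 5)          # 10 sites, 5 species (simulated)
xc <- scale(x, scale = FALSE)          # centre the columns first
pc <- prcomp(t(xc), center = FALSE)    # PCA of the transposed matrix
colSums(pc$rotation^2)                 # sum of squares of site scores per axis
```

Because the rotation columns come from an orthonormal singular-vector matrix, each column's sum of squares is exactly 1.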
Re: [R] BIO-ENV procedure
On Fri, 2004-05-14 at 00:08, Peter Nelson wrote:
> I've been unable to find an R package that provides the means of performing Clarke & Ainsworth's BIO-ENV procedure or something comparable. Briefly, they describe a method for comparing two separate sample ordinations, one from species data and the second from environmental data. The analysis includes selection of the 'best' subset of environmental variables for explaining the observed spp ordination. Is there something available or being developed?

Send a reference to the exact algorithm (or a recipe for the algorithm) so that someone can implement the method. Your post is not sufficient to know what should be there. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] BIO-ENV procedure
On Fri, 2004-05-14 at 00:08, Peter Nelson wrote:
> I've been unable to find an R package that provides the means of performing Clarke & Ainsworth's BIO-ENV procedure or something comparable. Briefly, they describe a method for comparing two separate sample ordinations, one from species data and the second from environmental data. The analysis includes selection of the 'best' subset of environmental variables for explaining the observed spp ordination. Is there something available or being developed?

I found a photocopy of Clarke & Ainsworth's paper (our library does not subscribe to the Marine Ecology Progress Series, because salty seas are too far away). It may be that you don't have a canned BIO-ENV routine in R, but you can get very close to the procedure using R (and the only missing piece looks unessential to me). However, I know that Bill Venables was working with Clarke's PRIMER, and he may have canned even BIO-ENV. The following discussion is based on Clarke & Ainsworth, Mar. Ecol. Prog. Ser. 92, 205-219; 1993. It seems that this is not an algorithm, because it uses brute force and we have no idea if it converges to anywhere. Let's call this a procedure. C&A suggest analysis with a separate ordination of the community data, and then selecting a subset of environmental variables that is similar to the species data. Please note that this is not constrained (or 'canonical') ordination: the species ordination is done independently. Further, the similarity of the environmental and biological structure is analysed apart from the ordination, so that you may have cases with a good species -- environment relationship, but not in the ordination.

1. An NMDS of community data using Bray-Curtis dissimilarities. Use the isoMDS function of Venables & Ripley's MASS library for NMDS. Bray-Curtis dissimilarity is available at least in vegan, and probably in ade4 and possibly in many other packages.

2.
For evaluating the species -- environment relationship, they suggest using rank correlation between the Bray-Curtis dissimilarities of the community data and the Euclidean distances of the environmental data with a certain set of environmental variables. You have Euclidean distances in the stats function dist (or in vegan and N other packages), and you can get rank correlations with cor.test. However, Clarke & Ainsworth suggest a new type of rank correlation that they call a ``harmonic rank correlation'', and this may not be in R (I haven't searched, though). I do think it is unessential for the method, so you can do with the existing rank correlations.

3. Then comes the hard work. You have to try all possible combinations of environmental variables (and there may be very many of them). This could be canned, because it is boring and error prone. Thomas Lumley's leaps package does this for regression analysis, and a model could be taken from there. For now you can do this by hand, but it may be a bit of hard work.

4. Now you select the best subset, i.e. the subset giving the highest rank correlation. There is no guarantee that there is a unique, clear best case, but you may be lucky.

5. Take that subset, get the Euclidean distances, ordinate those distances using NMDS, and plot your two ordinations side by side. Clarke & Ainsworth recommend using NMDS instead of metric (or classic) MDS, and they warn against Procrustes comparison of these two solutions (but I would suggest Procrustes comparison).
library(MASS)
library(vegan)
data(varespec)
data(varechem)
env <- varechem[, c("N","P","K")]
d <- vegdist(varespec, "bray")
env <- scale(env)
cor.test(d, dist(env[,1]), method="spear")$est
      rho
0.1712362
cor.test(d, dist(env[,2]), method="spear")$est
      rho
0.1803071
cor.test(d, dist(env[,3]), method="spear")$est
      rho
0.2427814
cor.test(d, dist(env[,c(1,2)]), method="spear")$est
      rho
0.2422454
cor.test(d, dist(env[,c(1,3)]), method="spear")$est
      rho
0.2471631
cor.test(d, dist(env[,c(2,3)]), method="spear")$est
      rho
0.2081135
cor.test(d, dist(env[,c(1,2,3)]), method="spear")$est
      rho
0.2441523

Some warnings on ties were removed. This suggests that the best subset uses variables 1 and 3, i.e. N and K. In this case we can skip the NMDS of the environmental data, since it has rank 2 (only two environmental variables) and can be plotted exactly in 2 dimensions without ordination.

mds.comm <- isoMDS(d)
par(mfrow=c(1,2))
plot(mds.comm$points, asp=1)
plot(env[, c(1,3)], asp=1)

Or, possibly:

par(mfrow=c(1,1))
plot(procrustes(env[,c(1,3)], mds.comm))

So you can do it. The missing pieces are the harmonic rank correlation (if you think that's essential) and automating the variable selection. Somebody could do them (not me, though). cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
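The brute-force subset search of step 3 can be sketched in base R alone. Everything here is a simulated stand-in: 'comm_d' plays the Bray-Curtis community dissimilarities, 'env' the scaled environmental matrix, and 'best_subset' is my own name, not an existing function:

```r
set.seed(1)
comm_d <- dist(matrix(runif(60), 12, 5))            # stand-in community distances
env <- scale(matrix(rnorm(36), 12, 3,
                    dimnames = list(NULL, c("N", "P", "K"))))

# Try every non-empty subset of environmental variables and keep the one
# whose Euclidean distances have the highest Spearman rank correlation
# with the community dissimilarities.
best_subset <- function(comm_d, env) {
  vars <- colnames(env)
  best <- list(rho = -Inf, subset = NULL)
  for (k in seq_along(vars)) {
    for (s in combn(vars, k, simplify = FALSE)) {
      rho <- cor(as.vector(comm_d),
                 as.vector(dist(env[, s, drop = FALSE])),
                 method = "spearman")
      if (rho > best$rho) best <- list(rho = rho, subset = s)
    }
  }
  best
}

res <- best_subset(comm_d, env)
```

With p variables this loops over 2^p - 1 subsets, which is exactly why the text calls it brute force.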
Re: [R] Factor loadings and principal component plots
On Tue, 2004-05-04 at 09:56, Prof Brian Ripley wrote:
> On 4 May 2004, Jari Oksanen wrote:
> > On Tue, 2004-05-04 at 09:34, Prof Brian Ripley wrote:
> > > Yes, but princomp is the recommended way, not prcomp.
> > But the documentation seems to recommend prcomp:
> For numerical accuracy, but not for flexibility.

Wouldn't the best alternative be to combine flexibility and accuracy into one alternative? I mean, I'd still use prcomp after reading the help pages, and I'd put more weight on accuracy than on flexibility. A quick adaptation of princomp would yield the attached flexible prcomp code. prcomp is more flexible at least in one point: it can handle data with fewer units than variables. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]

prcomp.default <- function (x, retx = TRUE, center = TRUE, scale. = FALSE,
    tol = NULL, subset = rep(TRUE, nrow(as.matrix(x))), ...)
{
    x <- as.matrix(x)
    x <- x[subset, , drop = FALSE]
    x <- scale(x, center = center, scale = scale.)
    s <- svd(x, nu = 0)
    if (!is.null(tol)) {
        rank <- sum(s$d > (s$d[1] * tol))
        if (rank < ncol(x))
            s$v <- s$v[, 1:rank, drop = FALSE]
    }
    s$d <- s$d/sqrt(max(1, nrow(x) - 1))
    dimnames(s$v) <- list(colnames(x),
        paste("PC", seq(len = ncol(s$v)), sep = ""))
    r <- list(sdev = s$d, rotation = s$v)
    if (retx)
        r$x <- x %*% s$v
    class(r) <- "prcomp"
    r
}

prcomp.formula <- function (formula, data = NULL, subset, na.action, ...)
{
    mt <- terms(formula, data = data)
    if (attr(mt, "response") > 0)
        stop("response not allowed in formula")
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    mf$... <- NULL
    mf[[1]] <- as.name("model.frame")
    mf <- eval.parent(mf)
    if (any(sapply(mf, function(x) is.factor(x) || !is.numeric(x))))
        stop("PCA applies only to numerical variables")
    na.act <- attr(mf, "na.action")
    mt <- attr(mf, "terms")
    attr(mt, "intercept") <- 0
    x <- model.matrix(mt, mf)
    res <- prcomp.default(x, ...)
    cl[[1]] <- as.name("prcomp")
    res$call <- cl
    if (!is.null(na.act)) {
        res$na.action <- na.act
        if (!is.null(sc <- res$x))
            res$x <- napredict(na.act, sc)
    }
    res
}

prcomp <- function (x, ...)
    UseMethod("prcomp")
Re: [R] Factor loadings and principal component plots
On Tue, 2004-05-04 at 09:34, Prof Brian Ripley wrote:
> Yes, but princomp is the recommended way, not prcomp.

But the documentation seems to recommend prcomp:

?prcomp: The calculation is done by a singular value decomposition of the (centered and scaled) data matrix, not by using 'eigen' on the covariance matrix. This is generally the preferred method for numerical accuracy.

?princomp: The calculation is done using 'eigen' on the correlation or covariance matrix, as determined by 'cor'. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use 'svd' on 'x', as is done in 'prcomp'.

Just confused, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] Common principle components
On Wed, 2004-05-26 at 18:01, J. Pedro Granadeiro wrote:
> I am sorry for not being clear. I meant the methods detailed in: Flury, B. (1988). Common Principal Components and Related Multivariate Models, John Wiley and Sons, New York.

After writing my previous (long) response to this message, I started to think that it would be strange if the ade4 people of Lyon had not written something similar. Indeed they have: there are several alternative methods for multivariate analysis of K tables in ade4. They may not be exactly identical to Flury's Common Principal Components, but they do similar things. Some of the methods may even be identical: the ade4 people cite French sources, and Flury does not cite French sources -- and there are at least two parallel universes in multivariate analysis that rarely cross each other. Just go to CRAN and get ade4, and try to figure out how to do the analysis you need. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
[R] distance in the function kmeans
My thread broke as I write this at home, and there were no new messages on this subject after I got home. I hope this still reaches the interested parties. There are several methods that find centroids (means) from distance data. Centroid clustering methods do so, and so does classic scaling, a.k.a. metric multidimensional scaling, a.k.a. principal co-ordinates analysis (in the R function cmdscale the means are found in the C function dblcen.c in the R sources). Strictly, this centroid finding only works with Euclidean distances, but these methods willingly handle any other dissimilarities (or distances). Sometimes this results in anomalies, like upper levels being below lower levels in cluster diagrams, or negative eigenvalues in cmdscale. In principle, kmeans could do the same if she only wanted. Is it correct to use non-Euclidean dissimilarities when Euclidean distances were assumed? In my field (ecology) we know that Euclidean distances are often poor, and some other dissimilarities have better properties, and I think it is OK to break the rules (or 'violate the assumptions'). Now we don't know what kind of dissimilarities were used in the original post (I think I never saw this specified), so we don't know if they can be euclidized directly using the ideas of Petzold or Simpson. They might be semimetric or other sinful dissimilarities, too. These would be bad in the sense Uwe Ligges wrote about: you wouldn't get centres of Voronoi polygons in the original space, not even non-overlapping polygons. Still, they might work better than the original space (who wants to be in the original space when there are better spaces floating around?). The following trick handles the problem by euclidizing the space implied by any dissimilarity measure (metric or semimetric).
Here mdata is your original (rectangular) data matrix, and dis is any dissimilarity data:

tmp <- cmdscale(dis, k=min(dim(mdata))-1, eig=TRUE)
eucspace <- tmp$points[, tmp$eig > 0.01]

The condition removes axes with negative or almost-zero eigenvalues that you will get with semimetric dissimilarities. Then just call kmeans with eucspace as the argument. If your dis is Euclidean, this is only a rotation, and kmeans of eucspace and mdata should be equal. For other types of dis (even for a semimetric dissimilarity) this maps your dissimilarities onto a Euclidean space, which in effect is the same as performing kmeans with your original dissimilarity. Cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
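A runnable version of the trick on simulated data (Manhattan distances as a stand-in for "any dissimilarity"). One caveat for current R: cmdscale() returns all n eigenvalues but may return fewer coordinate columns, so the filter below matches the eigenvalues to the columns actually returned:

```r
set.seed(7)
mdata <- matrix(runif(60), 15, 4)
dis <- dist(mdata, method = "manhattan")   # stand-in non-Euclidean dissimilarity

tmp <- cmdscale(dis, k = nrow(mdata) - 1, eig = TRUE)
# keep only axes with clearly positive eigenvalues; cmdscale may already
# have dropped non-positive axes, hence the seq_len(ncol(...)) matching
keep <- tmp$eig[seq_len(ncol(tmp$points))] > 1e-8
eucspace <- tmp$points[, keep, drop = FALSE]

cl <- kmeans(eucspace, centers = 3)
table(cl$cluster)
```

For a truly Euclidean dis, dist(eucspace) reproduces the input distances (the "only a rotation" claim), which the test below also checks.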
Re: [R] privileged slots,
On Tue, 2004-06-01 at 12:21, Torsten Steuernagel wrote:
> On 28 May 2004 at 8:19, Duncan Murdoch wrote:
> > I'd advise against doing this kind of optimization. It will make your code harder to maintain, and while it might be faster today, if @<- is really a major time sink, it's an obvious candidate for optimization in R, e.g. by making it .Internal or .Primitive. When that happens, your optimized code will likely be slower (if it even works at all).
> Agreed. I don't recommend doing this either. I don't believe it makes any difference using slot<- instead of @<- in real life. Anyway, that optimized code should always work (slower or not) because slot<- is fully documented, and I don't see why it should be removed or its behaviour should change. That wouldn't only break the kind of code mentioned here but also everything else that makes use of slot<-.

There are several other things that were fully documented and still were removed. One of the latest cases was print.coefmat, which was abruptly made Defunct without warning or grace period: code written for 1.8.* didn't work in 1.9.0, and if corrected for 1.9.0 it wouldn't work in pre-1.9.0. Anything can change in R without warning, and your code may be broken at any time. Just be prepared. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] running R UNIX in a mac computer
On Fri, 2004-06-11 at 03:49, Tiago R Magalhaes wrote:
> Hi to you all. My question is: there is a package written in UNIX for which there is no Mac version. I would like to know if it's possible to install the R UNIX version on MacOS X and run that UNIX package on my Mac (through this UNIX R version on a Mac). I have seen a portfile for R version 1.8.1 on darwin: http://r.darwinports.com/ is that it? Another question related to that: if it's possible to use UNIX R on a Mac, does anyone know how fast or how slow that is?

Tiago, If it is a CRAN package *without* a MacOS version, there obviously is a reason for this handicap, and you cannot run the package. If it is a stray package, its developer probably just doesn't have the opportunity or will to build a Mac binary, but you can build it yourself if you're lucky. Check the FAQ and ReadMe files with your R/Aqua version to see what you need. With little trouble you can easily use source packages with your Mac R. Many tools are already installed in your OS (perl at least). If the package has only R files, you may be able to install a source package directly. If it has C source code, you should first install the MacOS X Developer Tools (Xcode): they come with your OS installation CD/DVD, but are not installed by default. If the package has Fortran source code, you have got to find an external Fortran compiler: MacOS X ships with a C compiler, but without a Fortran compiler. See the Mac R FAQ for the best alternatives for finding the compiler (this FAQ is installed with your R). Installing a Darwin R probably won't help you. It needs and uses exactly the same tools to build the packages as R/Aqua. If you can't install a source package in R/Aqua, you cannot install it in R/Darwin, and vice versa. The toolset is the decisive part, not the R shell. I assume that both versions of R are just as fast (or slow). R/Aqua uses a highly optimized BLAS for numeric functions, and if R/Darwin uses the same library, it is just as fast.
If it doesn't use an optimized BLAS, it will be clearly slower. I have installed Linux on a Mac, but I found that R was clearly (20%) slower in Linux than in MacOS on the very same piece of hardware. The main reason seemed to be that the Linux R didn't have an optimized BLAS, because the largest differences were in functions calling svd and qr (I used YellowDog Linux) -- the Linux version took 150%(!) longer to run the same svd-heavy test code. Another reason seemed to be that the Fortran compiler produces much slower code in Linux than in MacOS X (a difference of about 20%). cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
Re: [R] can't get text to appear over individual panels in multi-panel plot
On 18 Jun 2004, at 8:26, Deepayan Sarkar wrote:
> On Thursday 17 June 2004 22:57, Patrick Bennett wrote:
> > yes, i can reproduce that same graph when i print to the pdf-device. but the panel titles do not appear when I print to the Quartz-device.
> Hmm. I won't be able to help you then, let's hope someone else can.

I think this is a problem with the quartz device. I have often seen that margin texts are not plotted even in an ordinary plot() if quartz thinks there is no space for them. They do still appear if you copy the screen graphics as a pdf file. In Linux (my principal platform) I typically reduce the white margins, but if I use the same mar pars in MacOS X I won't get axis labels. Quartz is the culprit, I suppose. Actually, in your example I couldn't get the texts when I saved the plot as a pdf (menu entry). However, when I opened an X11 device, the text was reproduced OK. So it looks like a quartz problem. For X11 in MacOS X: it may not be in the default installation, but it is on the installation CD/DVD of MacOS X. Then you have got to start it explicitly before launching x11() within the R shell. In general, I wouldn't recommend using x11() on a Mac, since quartz() looks so much better: x11 looks just as clumsy as x11 in Linux or the ordinary Windows plotting device in some other OS. -- And beware: I have a suspicion that if you stop your X11 in MacOS X, your mouse will die at logout and you have got to reboot (or restart the mouse daemon if you know who he is). cheers, jari oksanen -- Jari Oksanen, Oulu, Finland
Re: [R] can't get text to appear over individual panels in multi-panel plot
On Fri, 2004-06-18 at 09:28, Jari Oksanen wrote:
> I think this is a problem with the quartz device. I have often seen that margin texts are not plotted even in an ordinary plot() if quartz thinks there is no space for them.

I had a closer look at this, and it indeed looks like quartz() is anally checking that there is enough space for text, or it refuses to print it at all. As I wrote, the command worked with the x11() device in MacOS X, but failed with the default quartz(). I checked again (on another machine), and it seems that you may get the text if you expand the strip: try adding par.strip.text=list(lines=2) to your Lattice plotting command (lines=1.8 was the smallest that worked in my case). This is a fault (``undesirable feature'') in quartz. This doesn't concern Lattice only, but all graphics commands: quartz() refuses to show axis labels or titles in too narrow margins, or to write text too close to the axes (if xpd is not set) in quite ordinary plot(). cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
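The workaround in runnable form, with a built-in dataset standing in for the original plot (mtcars is purely illustrative):

```r
library(lattice)

# Enlarge the strip area so a picky device has room for the strip text.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            par.strip.text = list(lines = 2))
print(p)   # lattice objects draw only when printed
```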
RE: [R] Maxima/minima of loess line?
On Tue, 2004-08-24 at 15:23, Liaw, Andy wrote:
> Just take range() of the fitted loess values.

Or if you really want to investigate the *line* instead of some random *points*, you may need something like:

optimize(function(x, mod) predict(mod, data.frame(speed=x)),
         c(0,20), maximum=TRUE, mod=cars.lo)
$maximum
[1] 19.5
$objective
[1] 56.44498

This elaborates the ?loess example with the result object cars.lo (and, of course, is a bad example, since the fit is monotone and the solution is forced to the margin). Use maximum=FALSE for *a* minimum. If you have several predictors, you either need to supply constant values for the others in optimize, or for a simultaneous search in all of them use optim or nlm.

cheers, jari oksanen

> From: Fredrik Karlsson
> Dear list, I've produced a loess line that I would like to investigate in terms of local/global maxima and minima. How would you do this? Thank you in advance. /Fredrik Karlsson

-- Jari Oksanen [EMAIL PROTECTED]
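The same idea end to end, fitting the ?loess example object first; the search interval is kept inside the data range of cars$speed so predict() never extrapolates, and the exact optimum will depend on the R version, so no output is asserted:

```r
# Fit the example model from ?loess, then search the fitted *line*
# for its maximum over speed in [5, 20].
cars.lo <- loess(dist ~ speed, cars)
opt <- optimize(function(x, mod) predict(mod, data.frame(speed = x)),
                c(5, 20), maximum = TRUE, mod = cars.lo)
opt$maximum    # location of the maximum
opt$objective  # fitted value there
```

Because the fit is monotone increasing here, the optimum lands at the upper end of the interval, illustrating the "forced to the margin" caveat above.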
Re: [R] integrate function
On Wed, 2004-08-25 at 23:44, Peter Dalgaard wrote:
> Ronaldo Reis Jr. [EMAIL PROTECTED] writes:
> > Is it possible to integrate this differential equation: dN/dt = Nr(1-(N/K)) in R using the integrate() function?
> No.

However, you could use the closed-form solution N = K/(1 + exp(log((K-N0)/N0) - r*t)), where N0 is the population size at t=0 (which you must fix or estimate). Causton has a long discussion about integrating this function in his Mathematics for Biologists (or something like that). Apart from that, MuPad may be free for Linux, and you can buy many other alternatives for symbolic mathematics (Maple is available for Linux, at least). It may be that you still have to work to get the solution you need, even with snappy tools like that. cheers, jari oksanen -- Jari Oksanen [EMAIL PROTECTED]
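A quick numerical check that the closed form above really satisfies the differential equation; the parameter values r, K, N0 are arbitrary illustration choices:

```r
# Verify that N(t) = K/(1 + exp(log((K - N0)/N0) - r*t)) satisfies
# dN/dt = r*N*(1 - N/K).
r <- 0.5; K <- 100; N0 <- 5
N <- function(t) K / (1 + exp(log((K - N0) / N0) - r * t))

t0 <- 3; h <- 1e-5
lhs <- (N(t0 + h) - N(t0 - h)) / (2 * h)   # central-difference dN/dt
rhs <- r * N(t0) * (1 - N(t0) / K)
abs(lhs - rhs)                              # should be tiny
```

The same check at t = 0 confirms the initial condition N(0) = N0.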
Re: [R] R binaries for UMBUTU Linux?
On Wed, 2005-04-13 at 15:19 +0200, Derek Eder wrote:
> Has anyone out there compiled R for the Umbutu Linux* (née Debian) v. 5.04 distribution for Intel-type platforms (32 and 64 bit)? Thank you, Derek Eder
> * Umbutu, a popular new Linux distribution, not a Nigerian scam, I promise! http://www.ubuntulinux.org/

Well, if you mean Ubuntu (and Debian is still there: she's not married to Ubuntu but kept her name), I have some experience (though not on Intel -- more about that later). First, it seems that R is not in the standard Ubuntu base, but you can find it in the universe repository and install it as a binary. However, the rhythms are a bit off. The previous Ubuntu release was about simultaneous with the R-2.0.x release, and you got R-1.9.1 in Ubuntu. The current release of Ubuntu was last week, and the next R release is due next week. This means that you're lagging behind by one cycle in R with these predictable and regular release cycles. However, Ubuntu is a Linux, which means that you can compile R from the sources quite easily. I did this with Ubuntu 4.04, and the compilation went smoothly (as usual). However, I did this on ppc (32bit, or G4), and some tests failed (at least in 'foreign': I haven't studied this in more detail). The base R seems to work OK, though. Alternatively, you can use real Debian packages from its testing repository. Ubuntu does not recommend using native Debian packages, but I guess with R you can do this fairly safely (the general problem is a potential conflict in version naming which may lead to conflicts in upgrades, but I think this is OK with R). So you may get the latest Debian (testing) packages -- as soon as they get through the jungle of dependencies and appear in Debian. cheers, jari oksanen -- Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Re: [R] Factor Analysis Biplot
On Fri, 2005-04-15 at 12:49 +1200, Brett Stansfield wrote:
> Dear R,
> When I go to do the biplot:
> biplot(eurofood.fa$scores, eurofood$loadings)
> Error in 1:p : NA/NaN argument

Potential sources of error (guessing: there is not sufficient detail in your message):

- you ask for scores from eurofood.fa but loadings from eurofood: one of these names may be wrong.
- you did not ask for scores in factanal (they are not there by default; you have to specify 'scores').

> Loadings:
>            Factor1 Factor2
> RedMeat     0.561  -0.112
> WhiteMeat   0.593  -0.432
> Eggs        0.839  -0.195
> Milk        0.679
> Fish        0.300   0.951
> Cereals    -0.902  -0.267
> Starch      0.542   0.253
> Nuts       -0.760
> Fr.Veg     -0.145   0.325

The cut values are there, but they are not displayed. To see this, you may try:

unclass(eurofood.fa$loadings)
print(eurofood.fa$loadings, cutoff = 0)

cheers, J
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
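For reference, a self-contained sketch of the second point: factanal() only returns scores when you ask for them, and biplot() then works on the result. The data here are simulated with a two-factor structure purely for illustration (nothing to do with the eurofood data):

```r
set.seed(42)
n  <- 200
f1 <- rnorm(n)   # two latent factors
f2 <- rnorm(n)
x  <- cbind(a = f1 + rnorm(n, sd = 0.5),
            b = f1 + rnorm(n, sd = 0.5),
            c = f1 + rnorm(n, sd = 0.5),
            d = f2 + rnorm(n, sd = 0.5),
            e = f2 + rnorm(n, sd = 0.5),
            f = f2 + rnorm(n, sd = 0.5))

# 'scores' must be requested explicitly: it is not there by default
fa <- factanal(x, factors = 2, scores = "regression")
biplot(fa$scores, fa$loadings)
```

Without scores = "regression" (or "Bartlett"), fa$scores is NULL and the biplot() call fails in just the way reported above.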
Re: [R] R2.0.1 for Mac OS X 10.3 problem
On Mon, 2005-04-18 at 06:39 -0700, Horacio Montenegro wrote:
> I have the same problem, in Windows, and I think the .RData is not corrupted. Load R without loading the .RData file - move or rename it. Then library(lme4) and load the .RData - it should work.

This also is my experience (on Linux and Mac OS X). The .RData need not be corrupted, but you have corrupted your R installation by deleting or corrupting some package using S4 methods (for instance by a failed upgrade of an S4 method package). When you start R so that it tries to restore those S4 objects in .RData, you get the error.

Renaming or deleting .RData will help, of course. Alternatively, in my case it helped to start R with the option --no-restore-data (on the Mac when starting R from the terminal -- where you are all the time on Linux). Probably it would also help to reinstall the original S4 package. (In my case this happened when I tried Thomas Yee's VGAM and then removed the package.)

cheers, jari oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/
Re: [R] Very Slow Gower Similarity Function
On 18 Apr 2005, at 19:10, Tyler Smith wrote:
> Hello, I am a relatively new user of R. I have written a basic function to calculate the Gower similarity function. I was motivated to do so partly as an exercise in learning R, and partly because the existing option (vegdist in the vegan package) does not accept missing values.

Speed is the reason to use C instead of R. It should be easy, almost trivial, to modify vegdist.c so that it handles missing values. I guess this handling means ignoring a value pair when one of the values is missing -- which is not so gentle to the metric properties so dear to Gower. Package vegan is designed for ecological community data, which generally do not have missing values (except in environmental data), but contributions are welcome.

> I think I have succeeded - my function gives me the correct values. However, now that I'm starting to use it with real data, I realise it's very slow. It takes more than 45 minutes on my Windows 98 machine (R 2.0.1 Patched (2005-03-29)) with a 185x32 matrix with ca 100 missing values. If anyone can suggest ways to speed up my function I would appreciate it. I suspect having a pair of nested for loops is the problem, but I couldn't figure out how to get rid of them.

cheers, jari oksanen
--
Jari Oksanen, Oulu, Finland
Re: [R] Very Slow Gower Similarity Function
On 18 Apr 2005, at 20:36, Anon. wrote:
> [quoting the earlier exchange on vegdist and missing values]
> The only reason you never see ecological community data with missing values is because the ecologists remove those species/sites from their Excel sheets before they give it to you to sort out their mess.

Well, ecologists have plenty of missing species in their community data, but these have zero values since they were not observed. I guess some Bob O'Hara is going to have a paper about this in JAE.

> This is actually one of the few things they know how to do in Excel - I'm dreading the day when a paper appears in JAE saying that you can use Excel to produce P-values.

The A in JAE stands for Animal: for real things they still have Journal of Ecology.

> To be slightly more serious, as an exercise the OP could consider writing a wrapper function in R that removes the missing data and then calls vegdist to calculate his Gower similarity index.

The looping goes within the C code, and for pairwise deletion of missing values wrapping is difficult. With complete.cases this is trivial (and then your result would be more metric as well).
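For pairwise deletion without touching the C code, one possibility is a pure-R Gower function that vectorises the inner loop. This is only a sketch (quantitative variables, range-scaled, absolute differences averaged over the pairs where both values are present), not vegdist itself:

```r
# Gower dissimilarity with pairwise deletion of missing values:
# scale each column to unit range, then average absolute differences
# over the variable pairs where both values are non-missing.
gower_dist <- function(x) {
  x   <- as.matrix(x)
  rng <- apply(x, 2, function(v) diff(range(v, na.rm = TRUE)))
  xs  <- sweep(x, 2, rng, "/")
  n   <- nrow(xs)
  d   <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    # one vectorised sweep replaces the inner for loop
    rest  <- xs[(i + 1):n, , drop = FALSE]
    diffs <- abs(sweep(rest, 2, xs[i, ], "-"))
    d[(i + 1):n, i] <- rowMeans(diffs, na.rm = TRUE)  # pairwise deletion
  }
  as.dist(d)
}

# tiny check: 3 x 2 matrix with one missing value
m <- rbind(c(0, 0), c(1, 1), c(0.5, NA))
gower_dist(m)
```

Another route worth knowing: daisy() in the cluster package computes Gower dissimilarities and accepts missing values out of the box, which may be the easiest solution of all.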
--
Jari Oksanen, Oulu, Finland
Re: [R] results from sammon()
On Wed, 2005-04-20 at 10:35 +0200, Domenico Cozzetto wrote:
> Dear all, I'm trying to get a two-dimensional embedding of some data using different methods, among which princomp(), cmds(), sammon() and isoMDS(). I have a problem with sammon() because the coordinates I get are all equal to NA. What does it mean? Why does the method fail to find the coordinates? Can I do anything to get some meaningful results?

I'm sorry, but I can't reproduce your problem. I have tried hard with different tricks, but sammon() always gives good numeric results, or reports on the problems with the input and refuses to continue.

For a starter: which sammon() did you use? I think there may be three or four implementations in R with that name alone (and some variants may be named differently). I used sammon() in MASS (Venables & Ripley), and could not get NA. You need to give more details if you want to get help.

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
Re: [R] results from sammon()
On Wed, 2005-04-20 at 12:35 +0200, Domenico Cozzetto wrote:
> Thanks for the attention paid to my problem. Please find enclosed the matrix with my dissimilarities. This is the only case in which sammon(), from the MASS package, gives me this kind of problem. I'm using the implementation of sammon provided by the package MASS, and the starting configuration is the default one. Here are the values of the other parameters: niter = 100, trace = FALSE, magic = 0.2, tol = 1e-4

Domenico,

I had a look at your dissimilarity matrix, and indeed, it gave all NaN in sammon() of MASS. This is speculation: sammon() uses cmdscale() to get the starting configuration, and cmdscale() puts two points (20 and 21) at zero distance from each other. Sammon scaling checks against zero dissimilarities in the input, but it seems that it doesn't check against zero distances in the starting configuration. Moving one point slightly seems to solve your problem.

In the following, diss is the dissimilarity matrix you sent. The trick is to calculate the same starting configuration that sammon() would use (y), but then move one of the conflicting points slightly and give that as the starting configuration:

y <- cmdscale(diss)
range(dist(y))
[1] 0.00 1.443101
y[21, ] <- y[21, ] + 0.01
sam <- sammon(diss, y)
Initial stress: 0.23260
stress after 10 iters: 0.09420, magic = 0.461
stress after 20 iters: 0.08072, magic = 0.500
stress after 30 iters: 0.07838, magic = 0.500
stress after 40 iters: 0.07754, magic = 0.500
stress after 50 iters: 0.07710, magic = 0.500
stress after 60 iters: 0.07681, magic = 0.500
stress after 70 iters: 0.07663, magic = 0.500
stress after 80 iters: 0.07653, magic = 0.500

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
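The same fix can be written generically, jittering every duplicated row of the classical-scaling start rather than the hard-coded point 21. A sketch, using the built-in eurodist distances only as a stand-in for the poster's matrix:

```r
library(MASS)  # for sammon()

diss <- eurodist                # stand-in for the poster's 'diss'
y <- cmdscale(diss)             # the start sammon() would compute itself
dup <- duplicated(round(y, 8))  # rows that coincide in the start
y[dup, ] <- y[dup, ] + 0.01     # nudge any duplicates apart
sam <- sammon(diss, y, trace = FALSE)
```

With eurodist no rows coincide, so nothing is moved; on the poster's matrix the duplicated() test would catch points 20 and 21 and apply the same 0.01 shift shown above.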
Re: [R] Pca loading plot lables
On Mon, 2005-04-25 at 13:21 +0200, Frédéric Ooms wrote:
> Dear colleagues, I am a beginner with R and I would like to add labels (i.e. the variable names) to a PCA loading plot to determine the most relevant variables. Could you please tell me the way to do this kind of stuff? The command I use to draw the PCA loading plot is the following:
> plot(molprop.pc$loading[,1] ~ molprop.pc$loading[,2])
> Thanks for your help

Have you tried 'biplot' and found it unsatisfactory for your needs?

biplot(pr)

Alternatively, you can do it by hand:

plot(pr$loadings, type = "n")
text(pr$loadings, rownames(pr$loadings), xpd = TRUE)
abline(h = 0)
abline(v = 0)

If you really want to have Axis 2 as horizontal, then you must replace all the pr$loadings pieces with pr$loadings[, 2:1].

cheers, jari oksanen
--
Jari Oksanen [EMAIL PROTECTED]
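A runnable version of the by-hand recipe, using princomp() on the built-in USArrests data as a stand-in for the poster's molprop.pc object:

```r
pr <- princomp(USArrests, cor = TRUE)  # stand-in for molprop.pc

# blank plot of the first two loading columns, then label each variable
plot(pr$loadings[, 1:2], type = "n")
text(pr$loadings[, 1:2], labels = rownames(pr$loadings), xpd = TRUE)
abline(h = 0)
abline(v = 0)
```

The rownames of the loadings matrix are the variable names, so text() places each name at its loading coordinates; xpd = TRUE lets labels near the margin spill outside the plot region instead of being clipped.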