[R] Trouble building R 2.1.0 from source on Linux: package VR
Hi,

Following on from suggestions made last week, I decided to install R 2.1.0 on my Linux machine. I'm running into a problem there, however, as shown:

    make[1]: Entering directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
    make[2]: Entering directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
    begin installing recommended package VR
    WARNING: ignoring environment value of R_HOME
    tar: Skipping to next header
    tar: Archive contains obsolescent base-64 headers
    incomplete literal tree
    gzip: VR.tgz: invalid compressed data--format violated
    tar: Error exit delayed from previous errors
    ERROR: cannot extract package from 'VR.tgz'
    make[2]: *** [VR.ts] Error 1
    make[2]: Leaving directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
    make[1]: *** [recommended-packages] Error 2
    make[1]: Leaving directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
    make: *** [stamp-recommended] Error 2

This is most unusual: I must have built R from source five or six times, going back to 1.5.0, and don't recall any problems like this. Does anyone have any suggestions about where I might look for the source of this problem? In particular, I'm interested in using package 'vsn', whose installation is (I believe) blocked by this problem.

I downloaded the source from CRAN this afternoon, choosing the version 2.1.0 'stable' code. I don't have sysadmin privileges on the build machine, but this has never been a problem before: I just pass --prefix=$HOME to the configure script. FWIW, I believe the sysadmins use Debian.

This is a pretty time-critical matter for me (I wouldn't have chosen to upgrade now, were it not for my earlier problem with merge when there are empty labels), so any assistance would be greatly appreciated. Thanks in advance.

-Frank
PhD, Computational Biologist, Harvard Medical School BCMP/SGM-322, 250 Longwood Ave, Boston MA 02115, USA. 
Tel: 617-432-3555 Fax: 617-432-3557 http://llama.med.harvard.edu/~fgibbons __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
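The gzip and tar messages in the build log above suggest that VR.tgz itself may be damaged, or may not be a gzip archive at all. A minimal sketch for checking that from R, assuming a modern R (utils::untar() and its internal tar reader arrived much later than 2.1.0) and that you run it from src/library/Recommended; check_tgz() is a hypothetical helper, not part of R:

```r
# Hypothetical helper (not part of R): is `path` a readable gzip-compressed tar?
check_tgz <- function(path) {
  if (!file.exists(path)) return(FALSE)
  # gzip files always start with the two magic bytes 1f 8b
  magic <- readBin(path, what = "raw", n = 2L)
  if (length(magic) < 2L || !identical(magic, as.raw(c(0x1f, 0x8b)))) return(FALSE)
  # Try to list the archive with R's internal tar reader; an error means damage
  listing <- tryCatch(utils::untar(path, list = TRUE, tar = "internal"),
                      error = function(e) NULL)
  length(listing) > 0L
}

# Usage, from src/library/Recommended (path is an assumption):
# check_tgz("VR.tgz")
```

A FALSE here is a strong hint that the file should be fetched again; a TRUE only means the archive can be listed, not that every member is intact.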
Re: [R] possible bug in merge with duplicate blank names in 'by' field.
Thanks for your quick responses, Gabor and Brian. I'm currently running R version 1.9.1 on Linux. Actually, I have just tested this on R v.2.1.0 running under Windows XP, and indeed, as you both indicate, the problem does not exist in that version on that OS. So, at an appropriate time, I'll upgrade my Linux installation to the most recent version (1.9.1 is a year old, I guess).

-Frank

At 03:26 AM 6/17/2005, Prof Brian Ripley wrote:
> What version of R is this (please do see the posting guide)?
>
> In both 2.1.0 and 2.1.1 beta I get
>
>     > all
>       Promoter ip.x ip.y ip
>     1            30   40 40
>     2            40   40 40
>     3        a   10   NA NA
>     4        c   20   20 20
>     5        b   NA   15 15
>     6        d   NA   30 30
>
> so cannot reproduce your result. Are you sure that the `blanks' really are empty and not some character that is printing as empty on your unstated OS? BTW ' ' is what is normally called `blank'.
>
> BTW, these are not `names' but character strings: `names' has other meanings in R.
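Brian's question (are the blanks really empty?) can be checked directly. A minimal sketch, assuming a current R; describe_blank_keys() is a hypothetical helper, and it treats any all-whitespace string as a value that merely prints as empty:

```r
# Hypothetical helper: count key values that look empty when printed.
describe_blank_keys <- function(keys) {
  keys <- as.character(keys)                          # factors -> strings
  c(empty      = sum(keys == "", na.rm = TRUE),      # truly zero-length strings
    whitespace = sum(grepl("^[[:space:]]+$", keys)), # spaces, tabs, etc.
    missing    = sum(is.na(keys)))                   # NA values
}

describe_blank_keys(c("a", "", " ", "\t", NA))  # empty 1, whitespace 2, missing 1

# Inspect the raw bytes of a suspicious value; a tab shows up as 09
charToRaw("\t")
```

Running describe_blank_keys(d1$Promoter) on the data from the original report would separate genuinely empty labels from stray spaces or tabs.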
[R] possible bug in merge with duplicate blank names in 'by' field.
Run this:

    p <- c('a', 'c', '', ''); a <- c(10, 20, 30, 40)
    d1 <- data.frame(Promoter=p, ip=a)  # Note duplicate empty names in p.
    p <- c('b', 'c', 'd', ''); a <- c(15, 20, 30, 40)
    d2 <- data.frame(Promoter=p, ip=a)
    all <- merge(x=d1, y=d2, by="Promoter", all=TRUE)
    all <- merge(x=all, y=d2, by="Promoter", all=TRUE)
    all

Data is this:

    > d1
      Promoter ip
    1        a 10
    2        c 20
    3          30
    4          40
    > d2
      Promoter ip
    1        b 15
    2        c 20
    3        d 30
    4          40

Output looks like this:

    > all
      Promoter ip.x ip.y ip
    1            40   30 30
    2            40   40 30
    3            40   30 40
    4            40   40 40
    5        b   15   NA NA
    6        c   20   20 20
    7        d   30   NA NA
    8        a   NA   10 10

The weird thing about this (in my view) is that each instance of '' is considered unique, so with each successive merge all combinatorial possibilities are explored, like a SQL cross join (Cartesian product). For non-empty names, an inner join is performed.

Dealing with genomic data (~10^4 data points), it's easy to have a couple of blanks buried in the middle of things, and to combine several replicates with successive merges. I couldn't understand why my three replicates of 6000 points, in which I expected substantial overlap in the labels, were taking so long to merge and ultimately generating 57000 labels. The culprit turned out to be a few hundred blanks buried in the middle.

Why does the empty (null) name merit special treatment? Perhaps I'm missing something. I hesitate to submit this as a bug, since technically I guess you could say that blank names, especially duplicates, are not kosher. On the other hand, this combinatorial behaviour seems to occur only for blanks.

-Frank
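In the outputs quoted in this thread, the blank rows multiply the way any duplicated key does in a merge: each key value present in both tables yields (copies in x) times (copies in y) rows, so a few hundred blanks compound quickly over successive merges. A minimal sketch that predicts the row count of merge(..., all = TRUE) from the key columns, assuming a single `by` column and a current R; merged_size() and no_blank() are hypothetical helpers:

```r
# Hypothetical helper: predicted nrow(merge(x, y, by = <key>, all = TRUE))
# for a single key column, given the key vectors of x and y.
merged_size <- function(kx, ky) {
  tx <- table(as.character(kx))
  ty <- table(as.character(ky))
  nx <- as.vector(tx)
  ny <- as.vector(ty)
  # match() pairs up key names, including the empty string "";
  # tx[""] would not work, because "" never matches a name when indexing
  iy <- match(names(tx), names(ty))
  ix <- match(names(ty), names(tx))
  in_both <- !is.na(iy)
  sum(nx[in_both] * ny[iy[in_both]]) +  # shared keys: every pairing
    sum(nx[!in_both]) +                 # keys only in x (kept by all = TRUE)
    sum(ny[is.na(ix)])                  # keys only in y
}

d1 <- data.frame(Promoter = c("a", "c", "", ""), ip = c(10, 20, 30, 40))
d2 <- data.frame(Promoter = c("b", "c", "d", ""), ip = c(15, 20, 30, 40))
merged_size(d1$Promoter, d2$Promoter)  # 6: the 2 x 1 blank pairings plus a, b, c, d

# One possible workaround: drop blank keys before merging
no_blank <- function(df, key) df[nzchar(as.character(df[[key]])), , drop = FALSE]
merge(no_blank(d1, "Promoter"), no_blank(d2, "Promoter"), by = "Promoter", all = TRUE)
```

With the blanks dropped, the merge returns one row per label (a, b, c, d); whether discarding those rows is acceptable depends on the data.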
Re: [R] lda source code
Wei Geng,

I asked the same question about six weeks ago, so let me try to answer it. The source for the entire package 'MASS' is in a single file, I believe (at least this is true on my Linux setup). The exact location of that file you'll have to determine by searching the directory/folder where you installed it. The function 'lda' is implemented entirely in R itself, like much of R's functionality. Look at the functions 'lda' and 'predict.lda' in this file for details.

Comments are sparse in most R code, from what I gather, but if you look in Pattern Recognition and Neural Networks by Brian Ripley (one of the authors of this package), you'll find a discussion in section 2.4, 'Predictive classification', that covers much of what's going on, from what I've been able to glean.

I hope that helps. There are certainly others out there who are more au fait with this than me.

-Frank Gibbons

At 05:47 PM 10/1/2003, you wrote:
> Wei Geng wrote:
> > I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was implemented in R. Does anyone know where to find the lda source code? Thanks.
>
> Here: http://cran.r-project.org
>
> Hint: MASS is a *package*. You want to view its *source*. Same with most other R packages. Or just about anything else you want to know about R.
>
> Cheers
> Jason
> --
> Indigo Industrial Controls Ltd.
> http://www.indigoindustrial.co.nz
> 64-21-343-545
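Besides locating the package files, the R code of an installed function can be printed from within R. A minimal sketch, assuming a current version of MASS, in which lda() is an S3 generic whose work is done in methods such as lda.default(), and predict.lda() is not exported (hence ::: and getS3method()):

```r
library(MASS)

lda                            # the generic: just dispatches via UseMethod()
methods(lda)                   # the methods that do the real work
MASS:::lda.default             # print the core fitting code
getS3method("predict", "lda")  # print the unexported predict method
```

Installed code is usually printed without the original comments; the package source tarball on CRAN keeps the R files as written.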
[R] Cook-distance-type plot (vertical bars)
Hi,

Figure 13 of Emmanuel Paradis's R for Beginners was produced by termplot working on an aov object. The lower right-hand plot is labelled Cook's distance plot, and I'd really like to produce a similar type of figure, but in a totally different context. (I'm not even sure what this kind of figure is called; perhaps an impulse plot, where instead of a point at (x, y) there's a vertical bar running from the x-axis up to where the point would be.) Can anyone give me pointers on where to look for more info? I've had a look in the places I could think of (plot.lm.R, termplot.R, plot.R, aov.R) and couldn't find anything. Maybe I overlooked it?

Thanks,
-Frank Gibbons
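The bars described here are what base graphics calls a "histogram-like" plot: type = "h" in plot() draws a vertical line from y = 0 up to each point. A minimal sketch, using the built-in cars data as a stand-in for your own values; impulse_plot() is a hypothetical wrapper:

```r
# Hypothetical wrapper: vertical bars from zero up to each value ("impulse" plot)
impulse_plot <- function(y, x = seq_along(y), ...) {
  plot(x, y, type = "h", ...)  # type = "h": histogram-like vertical lines
  abline(h = 0, col = "grey")  # the baseline the bars start from
  invisible(y)
}

# Example: Cook's distances of a simple linear model on the built-in cars data
fit <- lm(dist ~ speed, data = cars)
impulse_plot(cooks.distance(fit), xlab = "Obs. number", ylab = "Cook's distance")
```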
Re: [R] Cook-distance-type plot (vertical bars)
Thanks to all who responded, and so promptly too: it works exactly as you describe.

> > Figure 13 of Emmanuel Paradis's R for Beginners was produced by termplot working on an aov object.
>
> No, it was produced by plot() working on an aov object, as its caption indicates. The termplot() is Figure 14.

Thomas Lumley is quite right: it's produced by plot() (not termplot()), and it is mentioned on p. 31 of R for Beginners. My mistake.

In the interest of self-education, is there a more comprehensive source for plot types that I should read? Ideally, this would be something with lots of figures, so that I could browse the figures to find what I want to do, and then look up how to do it. R for Beginners goes some way along this path, but perhaps there's something more comprehensive?

Thanks again,
-Frank
Re: [R] discriminant function
Stefan,

I asked the same question last week. As Brian Ripley, its author, and others said then, the only way to see what's going on is to read the code. It's pretty complicated statistically (that's why the performance is so good!); many of the details are in chapter 2 of Pattern Recognition and Neural Networks. The upshot is that it's really not that easy to come up with a simple equation to encompass all of what lda.R does, unfortunately.

You can get the scaling factors for the centered variables programmatically by running coef() on the LDA object, or just by using 'lda_object$scaling'. That's a start, but there's a long way to go from there.

-Frank

At 09:24 AM 8/26/2003, you wrote:
> How can I extract the linear discriminant functions resulting from a LDA analysis? The coefficients are listed as a result from the analysis but I have not found a way to extract these programmatically. No references in the archives were found.
>
> Thank you very much,
> Stefan
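Once the scaling matrix is extracted, the discriminant scores that predict() reports can be rebuilt by hand. A sketch, assuming (from reading MASS's predict.lda) that the variables are centred on the prior-weighted average of the group means before being multiplied by `scaling`; the iris data stand in for your own:

```r
library(MASS)

X   <- as.matrix(iris[, 1:4])
fit <- lda(X, grouping = iris$Species)

# Centre on the prior-weighted mean of the group means, then apply the scaling
centre <- colSums(fit$prior * fit$means)
scores <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# Compare with the LD scores computed by predict()
all.equal(scores, predict(fit, X)$x, check.attributes = FALSE)
```

These scores are only the projection; turning them into class assignments still involves the projected group means and the priors.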
[R] LDA in R: how to extract full equation, especially constant term
Hi,

Having dipped my toe into R a few times over the last year or two, in the last few weeks I've been using it more and more; I'm now a thorough convert. I've just joined the list because, although it's great, I do have this problem...

I'm using linear discriminant analysis for binary classification, and am happy with the classification performance using predict(). What I'd like to do now is extract the equation for this classifier, for use elsewhere (in Perl/Python code). I know that I can get the means and scaling factors from the fitted lda object, but I'm having trouble computing the constant term. From reading Venables & Ripley and Hastie/Tibshirani/Friedman, I know the priors play a role in shifting the cut-point away from zero (its value for equally sized classes), based on the relative sizes of the two classes. But when I try to do the computation, I don't get a value that agrees with that returned by predict().

I've seen a post about this problem in the past, but it was never really answered by anyone who was familiar with R/S-PLUS. Can anyone help me with this? I guess I'm really wondering how R is computing the constant term in its discriminant function.

Thanks,
-Frank Gibbons
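For two classes the whole classifier collapses to one linear function w.x + b whose sign decides the class, with the priors entering only through b. A sketch, assuming the default "plug-in" posterior that MASS's predict.lda appears to compute (squared distance to each projected group mean, minus the log prior); lda_linear_rule() is a hypothetical helper, and the versicolor/virginica subset of iris stands in for your data. The resulting w and b are plain numbers that could be transcribed into Perl or Python:

```r
library(MASS)

# Hypothetical helper: reduce a two-class lda() fit to score(x) = sum(w * x) + b,
# where score(x) is the log posterior odds of class 1 (first level) vs class 2.
lda_linear_rule <- function(fit) {
  stopifnot(length(fit$lev) == 2L)
  prior  <- fit$prior
  centre <- colSums(prior * fit$means)  # prior-weighted mean of group means
  # Group means projected onto the single discriminant axis
  dm <- drop(scale(fit$means, center = centre, scale = FALSE) %*% fit$scaling)
  w  <- fit$scaling[, 1] * (dm[1] - dm[2])
  b  <- -sum(centre * w) - 0.5 * (dm[1]^2 - dm[2]^2) + log(prior[1] / prior[2])
  list(w = w, b = unname(b))
}

# Example on two of the three iris species
X     <- as.matrix(iris[51:150, 1:4])
g     <- factor(iris$Species[51:150])    # drop the unused "setosa" level
fit   <- lda(X, grouping = g)
rule  <- lda_linear_rule(fit)
score <- drop(X %*% rule$w) + rule$b     # > 0 means levels(g)[1]

pred <- predict(fit, X)
all.equal(score, log(pred$posterior[, 1] / pred$posterior[, 2]),
          check.attributes = FALSE)
table(score > 0, pred$class)
```

The log(prior[1] / prior[2]) term is the shift of the cut-point away from zero; with equal priors it vanishes.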